Re: [RFC PATCH v2 00/11] Device Memory TCP

2023-08-23 Thread David Wei
On 17/08/2023 15:18, Mina Almasry wrote:
> On Thu, Aug 17, 2023 at 11:04 AM Pavel Begunkov  
> wrote:
>>
>> On 8/14/23 02:12, David Ahern wrote:
>>> On 8/9/23 7:57 PM, Mina Almasry wrote:
 Changes in RFC v2:
 --
>> ...
 ** Test Setup

 Kernel: net-next with this RFC and memory provider API cherry-picked
 locally.

 Hardware: Google Cloud A3 VMs.

 NIC: GVE with header split & RSS & flow steering support.
>>>
>>> This set seems to depend on Jakub's memory provider patches and a netdev
>>> driver change which is not included. For the testing mentioned here, you
>>> must have a tree + branch with all of the patches. Is it publicly available?
>>>
>>> It would be interesting to see how well (easy) this integrates with
>>> io_uring. Besides avoiding all of the syscalls for receiving the iov and
>>> releasing the buffers back to the pool, io_uring also brings in the
>>> ability to seed a page_pool with registered buffers which provides a
>>> means to get simpler Rx ZC for host memory.
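(A minimal sketch of the cmsg-based flow under discussion -- the struct and
constant names here are modeled on the RFC's description and are assumptions,
not final uapi; an io_uring uapi would replace the per-operation syscalls
below with shared queues:)

	/* Payload stays in device memory; recvmsg() returns descriptors. */
	char ctrl[CMSG_SPACE(16 * sizeof(struct cmsg_devmem))];
	struct msghdr msg = {
		.msg_control = ctrl,
		.msg_controllen = sizeof(ctrl),
	};
	ssize_t n = recvmsg(fd, &msg, MSG_SOCK_DEVMEM);

	for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm;
	     cm = CMSG_NXTHDR(&msg, cm)) {
		if (cm->cmsg_type != SO_DEVMEM_OFFSET)
			continue;
		struct cmsg_devmem *frag = (void *)CMSG_DATA(cm);

		/* ... consume frag->frag_offset/frag->frag_size ... */

		/* One syscall per buffer release; io_uring would batch
		 * these returns through another shared queue. */
		setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED,
			   &frag->frag_token, sizeof(frag->frag_token));
	}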
>>
>> The patchset sounds pretty interesting. I've been working with David Wei
>> (CC'ing) on io_uring zc rx (prototype polishing stage), which takes a
>> similar approach based on allocating an rx queue. It targets host
>> memory, with device memory as an extra feature; the uapi is different, and
>> lifetimes are managed by/bound to io_uring. Completions/buffers are
>> returned to the user via a separate queue instead of cmsg, and pushed back
>> granularly to the kernel via another queue. I'll leave it to David to
>> elaborate.
>>
>> It sounds like we have space for collaboration here, if not merging then
>> reusing internals as much as we can, but we'd need to look into the
>> details deeper.
>>
> 
> I'm happy to look at your implementation and collaborate on something
> that works for both use cases. Feel free to share the unpolished
> prototype so I can start forming a general idea, if possible.

Hi, I'm David, and I'm working with Pavel on this. We will have something to
share with you on the mailing list before the end of the week.

I'm also preparing a submission for NetDev conf. I wonder whether you and
others at Google plan to present there as well? If so, we may want to
coordinate our submissions and talks (if accepted).

Please let me know this week, thanks!

> 
>>> Overall I like the intent and possibilities for extensions, but a lot of
>>> details are missing - perhaps some are answered by seeing an end-to-end
>>> implementation.
>>
>> --
>> Pavel Begunkov
> 
> 
> 


[PATCH] drm/mediatek: mtk_drm_crtc: Avoid inappropriate kfree() in

2023-08-23 Thread Katya Orlova
mtk_drm_crtc_create() and mtk_drm_cmdq_pkt_destroy() are called with the
'pkt' argument pointing to the 'cmdq_handle' field of the 'mtk_crtc'
structure, which is part of the containing allocation, so it must not be
passed to kfree().
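For illustration, a minimal sketch of the bug class (hypothetical
struct/variable names): kfree() may only be given a pointer previously
returned by k*alloc(), never the address of a member embedded in another
allocation.

	struct owner {
		int other_state;
		struct cmdq_pkt cmdq_handle;	/* embedded, not its own allocation */
	};

	struct owner *crtc = kzalloc(sizeof(*crtc), GFP_KERNEL);
	struct cmdq_pkt *pkt = &crtc->cmdq_handle;

	kfree(pkt);	/* WRONG: pkt was never returned by kmalloc(); corrupts the heap */
	kfree(crtc);	/* correct: free the containing object instead */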

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 7627122fd1c0 ("drm/mediatek: Add cmdq_handle in mtk_crtc")
Signed-off-by: Katya Orlova 
---
 drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c 
b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
index d40142842f85..ba7307efa675 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
@@ -117,7 +117,6 @@ static int mtk_drm_cmdq_pkt_create(struct cmdq_client 
*client, struct cmdq_pkt *
 
pkt->va_base = kzalloc(size, GFP_KERNEL);
if (!pkt->va_base) {
-   kfree(pkt);
return -ENOMEM;
}
pkt->buf_size = size;
@@ -129,7 +128,6 @@ static int mtk_drm_cmdq_pkt_create(struct cmdq_client 
*client, struct cmdq_pkt *
if (dma_mapping_error(dev, dma_addr)) {
dev_err(dev, "dma map failed, size=%u\n", (u32)(u64)size);
kfree(pkt->va_base);
-   kfree(pkt);
return -ENOMEM;
}
 
@@ -145,7 +143,6 @@ static void mtk_drm_cmdq_pkt_destroy(struct cmdq_pkt *pkt)
dma_unmap_single(client->chan->mbox->dev, pkt->pa_base, pkt->buf_size,
 DMA_TO_DEVICE);
kfree(pkt->va_base);
-   kfree(pkt);
 }
 #endif
 
-- 
2.30.2



Hardware Requirements to participate in the EVoc

2023-08-23 Thread Raghav Sharma
Hello

I am a student from India pursuing a bachelor's degree in engineering . I
would love to be a part of the EVoC.

I would like to know whether the hardware requirements are a bar to entry
to the program?
Can somebody who does not have prescribed hardware participate?

Thank You
Raghav Sharma


[PATCH v2 6/6] drm/i915/dp_link_training: Emit a link-status=Bad uevent with trigger property

2023-08-23 Thread Gil Dekel
When a link-training attempt fails, emit a uevent to user space that
includes the trigger property, which in this case will be
link-status=Bad.

This will allow userspace to parse the uevent property and better
understand the reason for the previous modeset failure.
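As a rough sketch of the userspace side (standard libdrm calls; the handler
itself is illustrative, not our actual compositor code):

	/* On receiving the uevent, re-check the connector's link-status. */
	drmModeObjectPropertiesPtr props =
		drmModeObjectGetProperties(fd, connector_id,
					   DRM_MODE_OBJECT_CONNECTOR);

	for (uint32_t i = 0; i < props->count_props; i++) {
		drmModePropertyPtr prop = drmModeGetProperty(fd, props->props[i]);

		if (prop && !strcmp(prop->name, "link-status") &&
		    props->prop_values[i] == DRM_MODE_LINK_STATUS_BAD) {
			/* reprobe modes and schedule a corrective modeset */
		}
		drmModeFreeProperty(prop);
	}
	drmModeFreeObjectProperties(props);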

Signed-off-by: Gil Dekel 

V2:
  - init link_status_property inline.
---
 drivers/gpu/drm/i915/display/intel_dp.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index e8b10f59e141..328e9f030033 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include <drm/drm_sysfs.h>

 #include "g4x_dp.h"
 #include "i915_drv.h"
@@ -5995,6 +5996,8 @@ static void intel_dp_modeset_retry_work_fn(struct 
work_struct *work)
struct intel_dp *intel_dp =
container_of(work, typeof(*intel_dp), modeset_retry_work);
struct drm_connector *connector = &intel_dp->attached_connector->base;
+   struct drm_property *link_status_property =
+   connector->dev->mode_config.link_status_property;

/* Set the connector's (and possibly all its downstream MST ports') link
 * status to BAD.
@@ -6011,7 +6014,7 @@ static void intel_dp_modeset_retry_work_fn(struct 
work_struct *work)
}
mutex_unlock(&connector->dev->mode_config.mutex);
/* Send Hotplug uevent so userspace can reprobe */
-   drm_kms_helper_connector_hotplug_event(connector);
+   drm_sysfs_connector_property_event(connector, link_status_property);
 }

 bool
--
Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics


[PATCH v2 5/6] drm/i915/dp_link_training: Set all downstream MST ports to BAD before retrying

2023-08-23 Thread Gil Dekel
Before sending a uevent to userspace in order to trigger a corrective
modeset, we change the failing connector's link-status to BAD. However,
the downstream MST branch ports are left in their original GOOD state.

This patch utilizes the drm helper function
drm_dp_set_mst_topology_link_status() to rectify this and set all
downstream MST connectors' link-status to BAD before emitting the uevent
to userspace.

Signed-off-by: Gil Dekel 
---
 drivers/gpu/drm/i915/display/intel_dp.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index 42353b1ac487..e8b10f59e141 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct 
work_struct *work)
struct intel_dp *intel_dp =
container_of(work, typeof(*intel_dp), modeset_retry_work);
struct drm_connector *connector = &intel_dp->attached_connector->base;
-   drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id,
-   connector->name);

-   /* Grab the locks before changing connector property*/
-   mutex_lock(&connector->dev->mode_config.mutex);
-   /* Set connector link status to BAD and send a Uevent to notify
-* userspace to do a modeset.
+   /* Set the connector's (and possibly all its downstream MST ports') link
+* status to BAD.
 */
+   mutex_lock(&connector->dev->mode_config.mutex);
+   drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n",
+   connector->base.id, connector->name,
+   connector->state->link_status, DRM_MODE_LINK_STATUS_BAD);
drm_connector_set_link_status_property(connector,
   DRM_MODE_LINK_STATUS_BAD);
+   if (intel_dp->is_mst) {
+   drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr,
+   DRM_MODE_LINK_STATUS_BAD);
+   }
mutex_unlock(&connector->dev->mode_config.mutex);
/* Send Hotplug uevent so userspace can reprobe */
drm_kms_helper_connector_hotplug_event(connector);
--
Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics


[PATCH v2 4/6] drm/i915: Move DP modeset_retry_work into intel_dp

2023-08-23 Thread Gil Dekel
Currently, link-training fallback is only implemented for SST, so having
modeset_retry_work in intel_connector makes sense. However, we hope to
implement link-training fallback for MST in a follow-up patchset, so
moving modeset_retry_work to intel_dp will make handling both SST and
MST connectors simpler. This patch does exactly that, and updates all
modeset_retry_work dependencies to use an intel_dp instead.

Credit: this patch is a rebase of Lyude Paul's original patch:
https://patchwork.freedesktop.org/patch/216627/?series=41576&rev=3

Signed-off-by: Gil Dekel 
---
 drivers/gpu/drm/i915/display/intel_display.c   | 14 +++---
 drivers/gpu/drm/i915/display/intel_display_types.h |  6 +++---
 drivers/gpu/drm/i915/display/intel_dp.c| 11 ---
 .../gpu/drm/i915/display/intel_dp_link_training.c  |  3 +--
 4 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index db3c26e013e3..2ec75aa0b4ee 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -7962,20 +7962,28 @@ void i830_disable_pipe(struct drm_i915_private 
*dev_priv, enum pipe pipe)

 void intel_hpd_poll_fini(struct drm_i915_private *i915)
 {
-   struct intel_connector *connector;
struct drm_connector_list_iter conn_iter;
+   struct intel_connector *connector;
+   struct intel_dp *intel_dp;
+   struct intel_encoder *encoder;

/* Kill all the work that may have been queued by hpd. */
drm_connector_list_iter_begin(&i915->drm, &conn_iter);
for_each_intel_connector_iter(connector, &conn_iter) {
-   if (connector->modeset_retry_work.func)
-   cancel_work_sync(&connector->modeset_retry_work);
if (connector->hdcp.shim) {
cancel_delayed_work_sync(&connector->hdcp.check_work);
cancel_work_sync(&connector->hdcp.prop_work);
}
}
drm_connector_list_iter_end(&conn_iter);
+
+   for_each_intel_dp(&i915->drm, encoder) {
+   if (encoder->type == DRM_MODE_CONNECTOR_eDP ||
+   encoder->type == DRM_MODE_CONNECTOR_DisplayPort) {
+   intel_dp = enc_to_intel_dp(encoder);
+   cancel_work_sync(&intel_dp->modeset_retry_work);
+   }
+   }
 }

 bool intel_scanout_needs_vtd_wa(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h 
b/drivers/gpu/drm/i915/display/intel_display_types.h
index 731f2ec04d5c..b92bb69a3fe4 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -620,9 +620,6 @@ struct intel_connector {

struct intel_dp *mst_port;

-   /* Work struct to schedule a uevent on link train failure */
-   struct work_struct modeset_retry_work;
-
struct intel_hdcp hdcp;
 };

@@ -1779,6 +1776,9 @@ struct intel_dp {
/* Displayport compliance testing */
struct intel_dp_compliance compliance;

+   /* Work struct to schedule a uevent on link train failure */
+   struct work_struct modeset_retry_work;
+
/* Downstream facing port caps */
struct {
int min_tmds_clock, max_tmds_clock;
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index 01b180c8d9bd..42353b1ac487 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -5992,12 +5992,9 @@ static bool intel_edp_init_connector(struct intel_dp 
*intel_dp,

 static void intel_dp_modeset_retry_work_fn(struct work_struct *work)
 {
-   struct intel_connector *intel_connector;
-   struct drm_connector *connector;
-
-   intel_connector = container_of(work, typeof(*intel_connector),
-  modeset_retry_work);
-   connector = &intel_connector->base;
+   struct intel_dp *intel_dp =
+   container_of(work, typeof(*intel_dp), modeset_retry_work);
+   struct drm_connector *connector = &intel_dp->attached_connector->base;
drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id,
connector->name);

@@ -6027,7 +6024,7 @@ intel_dp_init_connector(struct intel_digital_port 
*dig_port,
int type;

/* Initialize the work for modeset in case of link train failure */
-   INIT_WORK(&intel_connector->modeset_retry_work,
+   INIT_WORK(&intel_dp->modeset_retry_work,
  intel_dp_modeset_retry_work_fn);

if (drm_WARN(dev, dig_port->max_lanes < 1,
diff --git a/drivers/gpu/drm/i915/display/intel_dp_link_training.c 
b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
index 31d0d7854003..87d13cd03ef5 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_link_training.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
@@ -1063,7 +1063,6 @@ intel_dp_link_train_phy(struct intel_dp 

[PATCH v2 3/6] drm/dp_mst: Add drm_dp_set_mst_topology_link_status()

2023-08-23 Thread Gil Dekel
Unlike SST, MST can support multiple displays connected to a single
connector. However, this also means that if the DisplayPort link to the
top-level MST branch device becomes unstable, then every single branch
device has an unstable link.

Since there are multiple downstream ports per connector, setting the
link status of the parent mstb's port to BAD is not enough. All of the
downstream mstb ports must also have their link status set to BAD.

This aligns with how the DP link-status logic in DRM works: we notify
userspace that all of the mstb ports need retraining, and new, lower
bandwidth constraints apply to all future atomic commits on the
topology.

Since any driver supporting MST needs to figure out which connectors
live downstream on an MST topology and update their link status in order
to retrain MST links properly, we add the
drm_dp_set_mst_topology_link_status() helper. This helper simply marks
the link status of all connectors living in that topology as bad. We
will make use of this helper in i915 later in this series.

Credit: this patch is a refactor of Lyude Paul's original patch:
https://patchwork.kernel.org/project/dri-devel/patch/20180308232421.14049-5-ly...@redhat.com/

Signed-off-by: Gil Dekel 
---
 drivers/gpu/drm/display/drm_dp_mst_topology.c | 39 +++
 include/drm/display/drm_dp_mst_helper.h   |  3 ++
 2 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c 
b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index ed96cfcfa304..17cbadfb6ccb 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -3566,6 +3566,45 @@ int drm_dp_get_vc_payload_bw(const struct 
drm_dp_mst_topology_mgr *mgr,
 }
 EXPORT_SYMBOL(drm_dp_get_vc_payload_bw);

+/**
+ * drm_dp_set_mst_topology_link_status() - set all downstream MST ports' link status
+ * @mgr: MST topology manager to set state for
+ * @status: The new status to set the MST topology to
+ *
+ * Set all downstream ports' link-status within the topology to the given status.
+ */
+void drm_dp_set_mst_topology_link_status(struct drm_dp_mst_topology_mgr *mgr,
+enum drm_link_status status)
+{
+   struct drm_dp_mst_port *port;
+   struct drm_dp_mst_branch *rmstb;
+   struct drm_dp_mst_branch *mstb =
+   drm_dp_mst_topology_get_mstb_validated(mgr, mgr->mst_primary);
+
+   list_for_each_entry_reverse(port, &mstb->ports, next) {
+   struct drm_connector *connector = port->connector;
+
+   if (connector) {
+   mutex_lock(&connector->dev->mode_config.mutex);
+   drm_dbg_kms(
+   connector->dev,
+   "[MST-CONNECTOR:%d:%s] link status %d -> %d\n",
+   connector->base.id, connector->name,
+   connector->state->link_status, status);
+   connector->state->link_status = status;
+   mutex_unlock(&connector->dev->mode_config.mutex);
+   }
+
+   rmstb = drm_dp_mst_topology_get_mstb_validated(mstb->mgr,
+  port->mstb);
+   if (rmstb) {
+   drm_dp_set_mst_topology_link_status(rmstb->mgr, status);
+   drm_dp_mst_topology_put_mstb(rmstb);
+   }
+   }
+}
+EXPORT_SYMBOL(drm_dp_set_mst_topology_link_status);
+
 /**
  * drm_dp_read_mst_cap() - check whether or not a sink supports MST
  * @aux: The DP AUX channel to use
diff --git a/include/drm/display/drm_dp_mst_helper.h 
b/include/drm/display/drm_dp_mst_helper.h
index ed5c9660563c..855d488bf364 100644
--- a/include/drm/display/drm_dp_mst_helper.h
+++ b/include/drm/display/drm_dp_mst_helper.h
@@ -832,6 +832,9 @@ struct edid *drm_dp_mst_get_edid(struct drm_connector 
*connector,
 int drm_dp_get_vc_payload_bw(const struct drm_dp_mst_topology_mgr *mgr,
 int link_rate, int link_lane_count);

+void drm_dp_set_mst_topology_link_status(struct drm_dp_mst_topology_mgr *mgr,
+enum drm_link_status status);
+
 int drm_dp_calc_pbn_mode(int clock, int bpp, bool dsc);

 void drm_dp_mst_update_slots(struct drm_dp_mst_topology_state *mst_state, 
uint8_t link_encoding_cap);
--
Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics


[PATCH v2 2/6] drm/i915/dp_link_training: Add a final failing state to link training fallback for MST

2023-08-23 Thread Gil Dekel
Currently, MST link training has no fallback. This means that if an MST
base connector fails to link-train once, training fails completely,
which makes complete link-training failure significantly more common for
MST than for SST.

Similar to the final failure state of SST, this patch zeros out both
max_link_rate and max_link_lane_count. In addition, it stops resetting
MST params so that the zeroing of the HBR fields sticks. This ensures
that the MST base connector's modes will be completely pruned, since it
is effectively left with 0 Gbps of bandwidth.

Signed-off-by: Gil Dekel 
---
 drivers/gpu/drm/i915/display/intel_dp.c   | 27 ++-
 drivers/gpu/drm/i915/display/intel_dp.h   |  2 +-
 .../drm/i915/display/intel_dp_link_training.c |  8 +++---
 3 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index 2152ddbab557..01b180c8d9bd 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -630,7 +630,7 @@ static bool intel_dp_can_link_train_fallback_for_edp(struct 
intel_dp *intel_dp,
return true;
 }

-int intel_dp_get_link_train_fallback_values(struct intel_dp *intel_dp,
+void intel_dp_get_link_train_fallback_values(struct intel_dp *intel_dp,
int link_rate, u8 lane_count)
 {
struct drm_i915_private *i915 = dp_to_i915(intel_dp);
@@ -638,18 +638,23 @@ int intel_dp_get_link_train_fallback_values(struct 
intel_dp *intel_dp,

/*
 * TODO: Enable fallback on MST links once MST link compute can handle
-* the fallback params.
+* the fallback params. For now, similar to the SST case, ensure all of
+* the base connector's modes are pruned in the next connector probe by
+* effectively reducing its bandwidth to 0 so userspace can ignore it
+* within the next modeset attempt.
 */
if (intel_dp->is_mst) {
drm_err(>drm, "Link Training Unsuccessful\n");
-   return -1;
+   intel_dp->max_link_rate = 0;
+   intel_dp->max_link_lane_count = 0;
+   return;
}

if (intel_dp_is_edp(intel_dp) && !intel_dp->use_max_params) {
drm_dbg_kms(&i915->drm,
"Retrying Link training for eDP with max parameters\n");
intel_dp->use_max_params = true;
-   return 0;
+   return;
}

index = intel_dp_rate_index(intel_dp->common_rates,
@@ -662,7 +667,7 @@ int intel_dp_get_link_train_fallback_values(struct intel_dp 
*intel_dp,
  lane_count)) {
drm_dbg_kms(&i915->drm,
"Retrying Link training for eDP with same parameters\n");
-   return 0;
+   return;
}
intel_dp->max_link_rate = intel_dp_common_rate(intel_dp, index - 1);
intel_dp->max_link_lane_count = lane_count;
@@ -673,7 +678,7 @@ int intel_dp_get_link_train_fallback_values(struct intel_dp 
*intel_dp,
  lane_count >> 1)) {
drm_dbg_kms(&i915->drm,
"Retrying Link training for eDP with same parameters\n");
-   return 0;
+   return;
}
intel_dp->max_link_rate = intel_dp_max_common_rate(intel_dp);
intel_dp->max_link_lane_count = lane_count >> 1;
@@ -686,10 +691,7 @@ int intel_dp_get_link_train_fallback_values(struct 
intel_dp *intel_dp,
 */
intel_dp->max_link_rate = 0;
intel_dp->max_link_lane_count = 0;
-   return 0;
}
-
-   return 0;
 }

 u32 intel_dp_mode_to_fec_clock(u32 mode_clock)
@@ -5310,10 +5312,11 @@ intel_dp_detect(struct drm_connector *connector,
intel_dp_configure_mst(intel_dp);

/*
-* TODO: Reset link params when switching to MST mode, until MST
-* supports link training fallback params.
+* Note: Even though MST link training fallback is not yet implemented,
+* do not reset. This is because the base connector needs to have all
+* its modes pruned when link training for the MST port fails.
 */
-   if (intel_dp->reset_link_params || intel_dp->is_mst) {
+   if (intel_dp->reset_link_params) {
intel_dp_reset_max_link_params(intel_dp);
intel_dp->reset_link_params = false;
}
diff --git a/drivers/gpu/drm/i915/display/intel_dp.h 
b/drivers/gpu/drm/i915/display/intel_dp.h
index 788a577ebe16..7388510e0cb2 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.h
+++ b/drivers/gpu/drm/i915/display/intel_dp.h
@@ -40,7 +40,7 @@ bool intel_dp_init_connector(struct intel_digital_port 

[PATCH v2 1/6] drm/i915/dp_link_training: Add a final failing state to link training fallback

2023-08-23 Thread Gil Dekel
Instead of silently giving up when all link-training fallback values are
exhausted, this patch modifies the fallback's failure branch to reduce
both max_link_lane_count and max_link_rate to zero (0), and continues to
emit uevents until userspace stops attempting to modeset.

By doing so, we ensure the failing connector, which is in
link-status=Bad, has all its modes pruned (due to effectively having a
bandwidth of 0Gbps).

It is then userspace's responsibility to ignore connectors with no
modes, even if they are marked as connected.
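The pruning falls out of the existing mode-validation math; roughly (a
sketch, not the exact i915 code):

	/* With max_link_rate == 0 and max_link_lane_count == 0, the
	 * available link data rate is 0, so every mode fails .mode_valid
	 * and the connector reports zero modes. */
	max_link_clock = intel_dp_max_link_rate(intel_dp);	/* 0 after failure */
	max_lanes = intel_dp_max_lane_count(intel_dp);		/* 0 after failure */

	if (intel_dp_link_required(target_clock, bpp) >
	    intel_dp_max_data_rate(max_link_clock, max_lanes))
		return MODE_CLOCK_HIGH;	/* mode pruned */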

Signed-off-by: Gil Dekel 
---
 drivers/gpu/drm/i915/display/intel_dp.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index 7067ee3a4bd3..2152ddbab557 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -276,8 +276,12 @@ static int intel_dp_common_len_rate_limit(const struct 
intel_dp *intel_dp,

 static int intel_dp_common_rate(struct intel_dp *intel_dp, int index)
 {
+   /* This occurs when max link rate drops to 0 via link training fallback */
+   if (index < 0)
+   return 0;
+
if (drm_WARN_ON(&dp_to_i915(intel_dp)->drm,
-   index < 0 || index >= intel_dp->num_common_rates))
+   index >= intel_dp->num_common_rates))
return 162000;

return intel_dp->common_rates[index];
@@ -318,6 +322,9 @@ static int intel_dp_max_common_lane_count(struct intel_dp 
*intel_dp)
 int intel_dp_max_lane_count(struct intel_dp *intel_dp)
 {
switch (intel_dp->max_link_lane_count) {
+   /* This occurs when max link lane count drops to 0 via link training fallback */
+   case 0:
+   return 0;
case 1:
case 2:
case 4:
@@ -672,7 +679,14 @@ int intel_dp_get_link_train_fallback_values(struct 
intel_dp *intel_dp,
intel_dp->max_link_lane_count = lane_count >> 1;
} else {
drm_err(>drm, "Link Training Unsuccessful\n");
-   return -1;
+   /*
+* Ensure all of the connector's modes are pruned in the next
+* probe by effectively reducing its bandwidth to 0 so userspace
+* can ignore it within the next modeset attempt.
+*/
+   intel_dp->max_link_rate = 0;
+   intel_dp->max_link_lane_count = 0;
+   return 0;
}

return 0;
--
Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics


[PATCH v2 0/6] drm/i915/dp_link_training: Define a final failure state when link training fails

2023-08-23 Thread Gil Dekel
Next version of https://patchwork.freedesktop.org/series/122643/

Reorganized into:
1) Add a final failure state for SST and MST link training fallback.
2) Add a DRM helper for setting downstream MST ports' link-status state.
3) Make handling SST and MST connectors simpler via intel_dp.
4) Update link-status for downstream MST ports.
5) Emit a uevent with the "link-status" trigger property.

Gil Dekel (6):
  drm/i915/dp_link_training: Add a final failing state to link training
fallback
  drm/i915/dp_link_training: Add a final failing state to link training
fallback for MST
  drm/dp_mst: Add drm_dp_set_mst_topology_link_status()
  drm/i915: Move DP modeset_retry_work into intel_dp
  drm/i915/dp_link_training: Set all downstream MST ports to BAD before
retrying
  drm/i915/dp_link_training: Emit a link-status=Bad uevent with trigger
property

 drivers/gpu/drm/display/drm_dp_mst_topology.c | 39 ++
 drivers/gpu/drm/i915/display/intel_display.c  | 14 +++-
 .../drm/i915/display/intel_display_types.h|  6 +-
 drivers/gpu/drm/i915/display/intel_dp.c   | 75 ---
 drivers/gpu/drm/i915/display/intel_dp.h   |  2 +-
 .../drm/i915/display/intel_dp_link_training.c | 11 ++-
 include/drm/display/drm_dp_mst_helper.h   |  3 +
 7 files changed, 110 insertions(+), 40 deletions(-)

--
Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics


[PATCH v3 3/3] drm/mst: adjust the function drm_dp_remove_payload_part2()

2023-08-23 Thread Wayne Lin
[Why]
Currently, drm_dp_remove_payload_part2() uses the payload's time-slot
count from the old atomic state to represent what is in the payload
table at the moment.

It is clearer to use the latest number of time slots allocated for the
port, which is already included in the new mst_state. This also lets us
remove a redundant workaround in the amdgpu driver.

[How]
Remove the "old_payload" input of drm_dp_remove_payload_part2() and get
the latest number of allocated time slots for the port from the new
mst_state instead.

Signed-off-by: Wayne Lin 
---
 .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 70 ---
 drivers/gpu/drm/display/drm_dp_mst_topology.c | 32 ++---
 drivers/gpu/drm/i915/display/intel_dp_mst.c   |  7 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c   |  6 +-
 include/drm/display/drm_dp_mst_helper.h   |  9 ++-
 5 files changed, 40 insertions(+), 84 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
index cbef4ff28cd8..02cb372260f3 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
@@ -203,40 +203,6 @@ void dm_helpers_dp_update_branch_info(
const struct dc_link *link)
 {}
 
-static void dm_helpers_construct_old_payload(
-   struct dc_link *link,
-   int pbn_per_slot,
-   struct drm_dp_mst_atomic_payload *new_payload,
-   struct drm_dp_mst_atomic_payload *old_payload)
-{
-   struct link_mst_stream_allocation_table current_link_table =
-   link->mst_stream_alloc_table;
-   struct link_mst_stream_allocation *dc_alloc;
-   int i;
-
-   *old_payload = *new_payload;
-
-   /* Set correct time_slots/PBN of old payload.
-* other fields (delete & dsc_enabled) in
-* struct drm_dp_mst_atomic_payload are don't care fields
-* while calling drm_dp_remove_payload_part2()
-*/
-   for (i = 0; i < current_link_table.stream_count; i++) {
-   dc_alloc =
-   &current_link_table.stream_allocations[i];
-
-   if (dc_alloc->vcp_id == new_payload->vcpi) {
-   old_payload->time_slots = dc_alloc->slot_count;
-   old_payload->pbn = dc_alloc->slot_count * pbn_per_slot;
-   break;
-   }
-   }
-
-   /* make sure there is an old payload*/
-   ASSERT(i != current_link_table.stream_count);
-
-}
-
 /*
  * Writes payload allocation table in immediate downstream device.
  */
@@ -248,7 +214,7 @@ bool dm_helpers_dp_mst_write_payload_allocation_table(
 {
struct amdgpu_dm_connector *aconnector;
struct drm_dp_mst_topology_state *mst_state;
-   struct drm_dp_mst_atomic_payload *target_payload, *new_payload, 
old_payload;
+   struct drm_dp_mst_atomic_payload *payload;
struct drm_dp_mst_topology_mgr *mst_mgr;
 
aconnector = (struct amdgpu_dm_connector *)stream->dm_stream_context;
@@ -263,28 +229,21 @@ bool dm_helpers_dp_mst_write_payload_allocation_table(
 
mst_mgr = &aconnector->mst_root->mst_mgr;
mst_state = to_drm_dp_mst_topology_state(mst_mgr->base.state);
-   new_payload = drm_atomic_get_mst_payload_state(mst_state, aconnector->mst_output_port);
-
-   if (enable) {
-   target_payload = new_payload;
+   payload = drm_atomic_get_mst_payload_state(mst_state, aconnector->mst_output_port);
 
+   if (enable)
/* It's OK for this to fail */
-   drm_dp_add_payload_part1(mst_mgr, mst_state, new_payload);
-   } else {
-   /* construct old payload by VCPI*/
-   dm_helpers_construct_old_payload(stream->link, mst_state->pbn_div,
-   new_payload, &old_payload);
-   target_payload = &old_payload;
+   drm_dp_add_payload_part1(mst_mgr, mst_state, payload);
+   else
 
-   drm_dp_remove_payload_part1(mst_mgr, mst_state, new_payload);
-   }
+   drm_dp_remove_payload_part1(mst_mgr, mst_state, payload);
 
/* mst_mgr->->payloads are VC payload notify MST branch using DPCD or
 * AUX message. The sequence is slot 1-63 allocated sequence for each
 * stream. AMD ASIC stream slot allocation should follow the same
 * sequence. copy DRM MST allocation to dc
 */
-   fill_dc_mst_payload_table_from_drm(stream->link, enable, target_payload, proposed_table);
+   fill_dc_mst_payload_table_from_drm(stream->link, enable, payload, proposed_table);
 
return true;
 }
@@ -343,7 +302,7 @@ bool dm_helpers_dp_mst_send_payload_allocation(
struct amdgpu_dm_connector *aconnector;
struct drm_dp_mst_topology_state *mst_state;

[PATCH v3 2/3] drm/mst: Refactor the flow for payload allocation/removal

2023-08-23 Thread Wayne Lin
[Why]
Today, the allocation/deallocation steps and status are a bit unclear.

For instance, payload->vc_start_slot = -1 stands for "the failure of
updating the DPCD payload ID table" but can also mean "payload is not
allocated yet". These two cases should be handled differently, so it is
better to distinguish them for clarity.

[How]
Define an enumeration - ALLOCATION_LOCAL, ALLOCATION_DFP and
ALLOCATION_REMOTE - to distinguish the different allocation statuses.
Adjust the code to handle each status accordingly, to make the sequence
of payload allocation and payload removal easier to follow.
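In code, that could look like the following (a sketch; the exact
identifier spellings in the patch may differ):

	/* Track how far allocation has progressed for each payload. */
	enum drm_dp_mst_payload_allocation {
		DRM_DP_MST_PAYLOAD_ALLOCATION_NONE,	/* nothing allocated yet */
		DRM_DP_MST_PAYLOAD_ALLOCATION_LOCAL,	/* only sw mgr state updated */
		DRM_DP_MST_PAYLOAD_ALLOCATION_DFP,	/* written to DFP DPCD table */
		DRM_DP_MST_PAYLOAD_ALLOCATION_REMOTE,	/* ALLOCATE_PAYLOAD sideband sent */
	};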

For payload creation, the procedure should look like this:
DRM part 1:
* step 1 - update sw mst mgr variables to add a new payload
* step 2 - add payload at immediate DFP DPCD payload table

Driver:
* Add new payload in HW and sync up with DFP by sending ACT

DRM Part 2:
* Send ALLOCATE_PAYLOAD sideband message to allocate bandwidth along the
  virtual channel.

And as for payload removal, the procedure should look like this:
DRM part 1:
* step 1 - Send ALLOCATE_PAYLOAD sideband message to release bandwidth
   along the virtual channel
* step 2 - Clear payload allocation at immediate DFP DPCD payload table

Driver:
* Remove the payload in HW and sync up with DFP by sending ACT

DRM part 2:
* update sw mst mgr variables to remove the payload

Note that it's fine for communication with the branch device connected
at the immediate downstream-facing port to fail, but updating the SW mst
mgr variables and the HW configuration should be done anyway. That's
because we're under commit_tail and need to complete the HW programming.
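Put together, a driver's commit path roughly follows this shape (a
sketch; the argument lists track the helpers as renamed by this series
and may not match exactly):

	/* Creation: DRM part 1 -> driver ACT -> DRM part 2 */
	drm_dp_add_payload_part1(mgr, mst_state, payload);	/* sw state + DFP DPCD */
	/* ... program the HW payload, trigger ACT and wait for it ... */
	drm_dp_add_payload_part2(mgr, state, payload);		/* sideband ALLOCATE_PAYLOAD */

	/* Removal: sideband release + DPCD clear -> driver ACT -> sw state */
	drm_dp_remove_payload_part1(mgr, mst_state, payload);
	/* ... remove the HW payload, trigger ACT and wait for it ... */
	drm_dp_remove_payload_part2(mgr, mst_state, old_payload, new_payload);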

Changes since v1:
* Remove the set-but-unused variable 'old_payload' in the function
  'nv50_msto_prepare'. Caught by the kernel test robot.

Changes since v2:
* Fix indentation

Signed-off-by: Wayne Lin 
Reviewed-by: Lyude Paul 
---
 .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  20 ++-
 drivers/gpu/drm/display/drm_dp_mst_topology.c | 159 +++---
 drivers/gpu/drm/i915/display/intel_dp_mst.c   |  18 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c   |  21 +--
 include/drm/display/drm_dp_mst_helper.h   |  23 ++-
 5 files changed, 153 insertions(+), 88 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
index 4b230933b28e..cbef4ff28cd8 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
@@ -219,7 +219,7 @@ static void dm_helpers_construct_old_payload(
/* Set correct time_slots/PBN of old payload.
 * other fields (delete & dsc_enabled) in
 * struct drm_dp_mst_atomic_payload are don't care fields
-* while calling drm_dp_remove_payload()
+* while calling drm_dp_remove_payload_part2()
 */
for (i = 0; i < current_link_table.stream_count; i++) {
dc_alloc =
@@ -263,13 +263,12 @@ bool dm_helpers_dp_mst_write_payload_allocation_table(
 
mst_mgr = &aconnector->mst_root->mst_mgr;
mst_state = to_drm_dp_mst_topology_state(mst_mgr->base.state);
-
-   /* It's OK for this to fail */
new_payload = drm_atomic_get_mst_payload_state(mst_state, aconnector->mst_output_port);
 
if (enable) {
target_payload = new_payload;
 
+   /* It's OK for this to fail */
drm_dp_add_payload_part1(mst_mgr, mst_state, new_payload);
} else {
/* construct old payload by VCPI*/
@@ -277,7 +276,7 @@ bool dm_helpers_dp_mst_write_payload_allocation_table(
new_payload, &old_payload);
target_payload = &old_payload;

-   drm_dp_remove_payload(mst_mgr, mst_state, &old_payload, new_payload);
+   drm_dp_remove_payload_part1(mst_mgr, mst_state, new_payload);
}
 
/* mst_mgr->->payloads are VC payload notify MST branch using DPCD or
@@ -344,7 +343,7 @@ bool dm_helpers_dp_mst_send_payload_allocation(
struct amdgpu_dm_connector *aconnector;
struct drm_dp_mst_topology_state *mst_state;
struct drm_dp_mst_topology_mgr *mst_mgr;
-   struct drm_dp_mst_atomic_payload *payload;
+   struct drm_dp_mst_atomic_payload *new_payload, *old_payload;
enum mst_progress_status set_flag = MST_ALLOCATE_NEW_PAYLOAD;
enum mst_progress_status clr_flag = MST_CLEAR_ALLOCATED_PAYLOAD;
int ret = 0;
@@ -357,15 +356,20 @@ bool dm_helpers_dp_mst_send_payload_allocation(
mst_mgr = >mst_root->mst_mgr;
mst_state = to_drm_dp_mst_topology_state(mst_mgr->base.state);
 
-   payload = drm_atomic_get_mst_payload_state(mst_state, aconnector->mst_output_port);
+   new_payload = drm_atomic_get_mst_payload_state(mst_state, aconnector->mst_output_port);
 
if (!enable) {
set_flag = MST_CLEAR_ALLOCATED_PAYLOAD;
clr_flag = 

[PATCH v3 1/3] drm/mst: delete unnecessary case in drm_dp_add_payload_part2()

2023-08-23 Thread Wayne Lin
[Why]
There is no need to consider the payload->delete case, since we won't
call drm_dp_add_payload_part2() to create a payload when we're about to
remove it.

[How]
Delete unnecessary case to simplify the code.

Signed-off-by: Wayne Lin 
Reviewed-by: Lyude Paul 
---
 drivers/gpu/drm/display/drm_dp_mst_topology.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c 
b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index ed96cfcfa304..4d80426757ab 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -3411,12 +3411,8 @@ int drm_dp_add_payload_part2(struct 
drm_dp_mst_topology_mgr *mgr,
 
ret = drm_dp_create_payload_step2(mgr, payload);
if (ret < 0) {
-   if (!payload->delete)
-   drm_err(mgr->dev, "Step 2 of creating MST payload for %p failed: %d\n",
-   payload->port, ret);
-   else
-   drm_dbg_kms(mgr->dev, "Step 2 of removing MST payload for %p failed: %d\n",
-   payload->port, ret);
+   drm_err(mgr->dev, "Step 2 of creating MST payload for %p failed: %d\n",
+   payload->port, ret);
}
 
return ret;
-- 
2.37.3



[PATCH v3 0/3] Refactor and clean up codes of mst

2023-08-23 Thread Wayne Lin
This patch set mainly tries to organize the mst code a bit: it clarifies
the sequence of mst payload allocation and removal, and also cleans up
some redundant code.

The main refactor is the patch
"drm/mst: Refactor the flow for payload allocation/removal",
which adds a new enum field in struct drm_dp_mst_atomic_payload to
represent the status of payload allocation, and then handles the payload
accordingly. It also renames some drm mst functions to better express
the idea behind them.

The other two patches mainly clean up unnecessary code.

Changes since v1:
* Remove the set-but-unused variable 'old_payload' in the function
  'nv50_msto_prepare'. Caught by the kernel test robot.

Changes since v2:
* Fix indentation

Wayne Lin (3):
  drm/mst: delete unnecessary case in drm_dp_add_payload_part2()
  drm/mst: Refactor the flow for payload allocation/removal
  drm/mst: adjust the function drm_dp_remove_payload_part2()

 .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  60 +-
 drivers/gpu/drm/display/drm_dp_mst_topology.c | 189 +++---
 drivers/gpu/drm/i915/display/intel_dp_mst.c   |  13 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c   |  17 +-
 include/drm/display/drm_dp_mst_helper.h   |  22 +-
 5 files changed, 159 insertions(+), 142 deletions(-)

-- 
2.37.3



Re: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics

2023-08-23 Thread kernel test robot
Hi Adrián,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.5-rc7 next-20230823]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panfrost-Add-cycle-count-GPU-register-definitions/20230824-093848
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:
https://lore.kernel.org/r/20230824013604.466224-3-adrian.larumbe%40collabora.com
patch subject: [PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics
config: alpha-allyesconfig 
(https://download.01.org/0day-ci/archive/20230824/202308241240.ngaywbmr-...@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce: 
(https://download.01.org/0day-ci/archive/20230824/202308241240.ngaywbmr-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202308241240.ngaywbmr-...@intel.com/

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/panfrost/panfrost_drv.c: In function 
'panfrost_gpu_show_fdinfo':
>> drivers/gpu/drm/panfrost/panfrost_drv.c:551:50: warning: format '%u' expects 
>> argument of type 'unsigned int', but argument 4 has type 'long unsigned int' 
>> [-Wformat=]
 551 | drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
 | ~^
 |  |
 |  unsigned int
 | %lu
 552 |ei->name, 
pfdev->pfdevfreq.current_frequency);
 |  
~~
 |  |
 |  long unsigned 
int


vim +551 drivers/gpu/drm/panfrost/panfrost_drv.c

   534  
   535  
   536  static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
   537   struct panfrost_file_priv 
*panfrost_priv,
   538   struct drm_printer *p)
   539  {
   540  int i;
   541  
   542  for (i = 0; i < NUM_JOB_SLOTS - 1; i++) {
   543  struct engine_info *ei = &panfrost_priv->fdinfo.engines[i];
   544  
   545  drm_printf(p, "drm-engine-%s:\t%llu ns\n",
   546 ei->name, ei->elapsed_ns);
   547  drm_printf(p, "drm-cycles-%s:\t%llu\n",
   548 ei->name, ei->cycles);
   549  drm_printf(p, "drm-maxfreq-%s:\t%u Hz\n",
   550 ei->name, panfrost_priv->fdinfo.maxfreq);
 > 551  drm_printf(p, "drm-curfreq-%s:\t%u Hz\n",
   552 ei->name, 
pfdev->pfdevfreq.current_frequency);
   553  }
   554  }
   555  
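The usual fix (a sketch) is to make the format match the type, since
current_frequency is an unsigned long here:

	drm_printf(p, "drm-curfreq-%s:\t%lu Hz\n",
		   ei->name, pfdev->pfdevfreq.current_frequency);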

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


RE: [Patch v2 2/3] drm/mst: Refactor the flow for payload allocation/removal

2023-08-23 Thread Lin, Wayne
[Public]

Hi Lyude,

I'm afraid I don't have permission to push and would like your help.
Thanks!

> -Original Message-
> From: Lyude Paul 
> Sent: Thursday, August 24, 2023 5:00 AM
> To: Lin, Wayne ; dri-devel@lists.freedesktop.org;
> amd-...@lists.freedesktop.org
> Cc: jani.nik...@intel.com; ville.syrj...@linux.intel.com; imre.d...@intel.com;
> Wentland, Harry ; Zuo, Jerry
> 
> Subject: Re: [Patch v2 2/3] drm/mst: Refactor the flow for payload
> allocation/removal
>
> Sure - you're also welcome to push the first two patches after fixing the
> indentation if you'd like
>
> On Wed, 2023-08-23 at 03:19 +, Lin, Wayne wrote:
> > [Public]
> >
> > Thanks, Lyude!
> > Should I push another version to fix the indentation?
> >
> > > -Original Message-
> > > From: Lyude Paul 
> > > Sent: Friday, August 18, 2023 6:17 AM
> > > To: Lin, Wayne ; dri-devel@lists.freedesktop.org;
> > > amd-...@lists.freedesktop.org
> > > Cc: jani.nik...@intel.com; ville.syrj...@linux.intel.com;
> > > imre.d...@intel.com; Wentland, Harry ; Zuo,
> > > Jerry 
> > > Subject: Re: [Patch v2 2/3] drm/mst: Refactor the flow for payload
> > > allocation/removal
> > >
> > > Two small comments:
> > >
> > > On Mon, 2023-08-07 at 10:56 +0800, Wayne Lin wrote:
> > > > [Why]
> > > > Today, the allocation/deallocation steps and status are a bit unclear.
> > > >
> > > > For instance, payload->vc_start_slot = -1 stands for "the failure
> > > > of updating DPCD payload ID table" and can also represent as
> > > > "payload is not allocated yet". These two cases should be handled
> > > > differently and hence better to distinguish them for better 
> > > > understanding.
> > > >
> > > > [How]
> > > > Define enumeration - ALLOCATION_LOCAL, ALLOCATION_DFP and
> > > > ALLOCATION_REMOTE to distinguish different allocation status.
> > > > Adjust the code to handle different status accordingly for better
> > > > understanding the sequence of payload allocation and payload
> > > removal.
> > > >
> > > > For payload creation, the procedure should look like this:
> > > > DRM part 1:
> > > > * step 1 - update sw mst mgr variables to add a new payload
> > > > * step 2 - add payload at immediate DFP DPCD payload table
> > > >
> > > > Driver:
> > > > * Add new payload in HW and sync up with DFP by sending ACT
> > > >
> > > > DRM Part 2:
> > > > * Send ALLOCATE_PAYLOAD sideband message to allocate bandwidth
> > > > along
> > > the
> > > >   virtual channel.
> > > >
> > > > And as for payload removal, the procedure should look like this:
> > > > DRM part 1:
> > > > * step 1 - Send ALLOCATE_PAYLOAD sideband message to release
> bandwidth
> > > >along the virtual channel
> > > > * step 2 - Clear payload allocation at immediate DFP DPCD payload
> > > > table
> > > >
> > > > Driver:
> > > > * Remove the payload in HW and sync up with DFP by sending ACT
> > > >
> > > > DRM part 2:
> > > > * update sw mst mgr variables to remove the payload
> > > >
> > > > Note that it's fine to fail when communicate with the branch
> > > > device connected at the immediate downstream-facing port, but updating
> > > > variables of SW mst mgr and HW configuration should be conducted
> > > > anyway. That's because it's under commit_tail and we need to
> > > > complete the HW
> > > programming.
> > > >
> > > > Changes since v1:
> > > > * Remove the set-but-unused variable 'old_payload' in the function
> > > >   'nv50_msto_prepare'. Caught by the kernel test robot
> > > > 
> > > >
> > > > Signed-off-by: Wayne Lin 
> > > > ---
> > > >  .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  20 ++-
> > > > drivers/gpu/drm/display/drm_dp_mst_topology.c | 159
> > > > +++--
> > > -
> > > >  drivers/gpu/drm/i915/display/intel_dp_mst.c   |  18 +-
> > > >  drivers/gpu/drm/nouveau/dispnv50/disp.c   |  21 +--
> > > >  include/drm/display/drm_dp_mst_helper.h   |  23 ++-
> > > >  5 files changed, 153 insertions(+), 88 deletions(-)
> > > >
> > > > diff --git
> > > a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > > > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > > > index d9a482908380..9ad509279b0a 100644
> > > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > > > +++
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > > > @@ -219,7 +219,7 @@ static void dm_helpers_construct_old_payload(
> > > > /* Set correct time_slots/PBN of old payload.
> > > >  * other fields (delete & dsc_enabled) in
> > > >  * struct drm_dp_mst_atomic_payload are don't care fields
> > > > -* while calling drm_dp_remove_payload()
> > > > +* while calling drm_dp_remove_payload_part2()
> > > >  */
> > > > for (i = 0; i < current_link_table.stream_count; i++) {
> > > > dc_alloc =
> > > > @@ -262,13 +262,12 @@ bool
> > > > dm_helpers_dp_mst_write_payload_allocation_table(
> > > >
> > > > mst_mgr = >mst_root->mst_mgr;
> > > > mst_state = 

Re: [RFC PATCH 3/3] drm/virtio: drm_gem_plane_helper_prepare_fb for obj synchronization

2023-08-23 Thread Dmitry Osipenko
On 8/20/23 23:58, Kim, Dongwon wrote:
> On 8/17/2023 7:33 PM, Dmitry Osipenko wrote:
>> On 7/13/23 01:44, Dongwon Kim wrote:
>>> This helper is needed for framebuffer synchronization. Old
>>> framebuffer data
>>> is often displayed on the guest display without this helper.
>>>
>>> Cc: Gerd Hoffmann 
>>> Cc: Vivek Kasireddy 
>>> Signed-off-by: Dongwon Kim 
>>> ---
>>>   drivers/gpu/drm/virtio/virtgpu_plane.c | 4 
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c
>>> b/drivers/gpu/drm/virtio/virtgpu_plane.c
>>> index a063f06ab6c5..e197299489ce 100644
>>> --- a/drivers/gpu/drm/virtio/virtgpu_plane.c
>>> +++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
>>> @@ -26,6 +26,7 @@
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include <drm/drm_gem_atomic_helper.h>
>>>     #include "virtgpu_drv.h"
>>>   @@ -271,6 +272,9 @@ static int virtio_gpu_plane_prepare_fb(struct
>>> drm_plane *plane,
>>>   vgfb = to_virtio_gpu_framebuffer(new_state->fb);
>>>   vgplane_st = to_virtio_gpu_plane_state(new_state);
>>>   bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
>>> +
>>> +    drm_gem_plane_helper_prepare_fb(plane, new_state);
>> The implicit display BO sync should happen on the host side, unless
>> you're rendering with Venus and then displaying with virgl. Doing it on
>> the guest side would be a major performance hit. Please provide a
>> complete description of your setup: what VMM you use, config options,
>> what tests you're running.
> 
> We use virtio-gpu as a kms device while using i915 as the render device
> in our setup, and we use QEMU as the VMM. The virtio-gpu driver flushes
> the scanout to QEMU as a blob resource (a reference to the buffer). QEMU
> then creates a dmabuf for the blob using udmabuf, then renders it as a
> texture using OGL. The test I ran is simple: just starting a terminal
> app and typing things to see if there is any frame regression. I believe
> this helper is required since the BO on the guest is basically a dmabuf
> that is being shared between the i915 and virtio-gpu drivers. I didn't
> think about the performance impact. If the impact is too much and that
> is not acceptable, are there any other suggestions or some tests I can
> try?

You can do fence-wait in the guest userspace/Mesa after blitting/drawing
to the udmabuf.
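For instance (a sketch relying on dma-buf's implicit-fence poll()
support):

	/* Block until outstanding write fences on the dma-buf signal,
	 * i.e. the GPU has finished rendering into the shared buffer. */
	struct pollfd pfd = {
		.fd = dmabuf_fd,	/* fd of the exported buffer */
		.events = POLLIN,	/* readable == write fences signaled */
	};
	poll(&pfd, 1, -1);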

You may run popular vk/gl gfx benchmarks using gl/sdl outputs to see the
fps impact.

Virglrenderer today supports native contexts. The method you're using
for GPU priming was proven to be slow in comparison to multi-gpu native
contexts. There is ongoing work on supporting fence passing from guest
to host [1], which allows fence-syncing on the host. You'll find links
to the WIP virtio-intel native context in [1] as well. You won't find
GPU priming support using native contexts in [1]; those patches haven't
been published yet.

[1]
https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1138

Note that in general it's not acceptable to upstream patches that serve
downstream only. Your display sync issue is irrelevant to the upstream
stack unless you're going to upstream all the VMM and guest userspace
patches, and in that case you should always publish all the patches and
provide links.

So, you need to check the performance impact and publish all the patches
to the relevant upstream projects.

-- 
Best regards,
Dmitry



[PATCH 1/2] drm/display/dp: Default 8 bpc support when DSC is supported

2023-08-23 Thread Ankit Nautiyal
As per DP v1.4, a DP DSC Sink device shall support 8bpc in DPCD 6Ah.
Apparently, some panels that do support DSC are not setting the bit for
8bpc.

So always assume 8bpc support by the DSC decoder when DSC is claimed to
be supported.

v2: Use helper to check dsc support. (Ankit)

Signed-off-by: Ankit Nautiyal 
---
 drivers/gpu/drm/display/drm_dp_helper.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/display/drm_dp_helper.c 
b/drivers/gpu/drm/display/drm_dp_helper.c
index e6a78fd32380..309fc10cde78 100644
--- a/drivers/gpu/drm/display/drm_dp_helper.c
+++ b/drivers/gpu/drm/display/drm_dp_helper.c
@@ -2447,14 +2447,19 @@ int drm_dp_dsc_sink_supported_input_bpcs(const u8 
dsc_dpcd[DP_DSC_RECEIVER_CAP_S
 u8 dsc_bpc[3])
 {
int num_bpc = 0;
+
+   if (!drm_dp_sink_supports_dsc(dsc_dpcd))
+   return 0;
+
u8 color_depth = dsc_dpcd[DP_DSC_DEC_COLOR_DEPTH_CAP - DP_DSC_SUPPORT];
 
if (color_depth & DP_DSC_12_BPC)
dsc_bpc[num_bpc++] = 12;
if (color_depth & DP_DSC_10_BPC)
dsc_bpc[num_bpc++] = 10;
-   if (color_depth & DP_DSC_8_BPC)
-   dsc_bpc[num_bpc++] = 8;
+
+   /* A DP DSC Sink device shall support 8 bpc. */
+   dsc_bpc[num_bpc++] = 8;
 
return num_bpc;
 }
-- 
2.40.1



[PATCH v5 22/45] drm/panfrost: dynamically allocate the drm-panfrost shrinker

2023-08-23 Thread Qi Zheng
In preparation for implementing lockless slab shrink, use the new APIs
to dynamically allocate the drm-panfrost shrinker, so that it can be
freed asynchronously via RCU. Then we don't need to wait for an RCU
read-side critical section when releasing the struct panfrost_device.

Signed-off-by: Qi Zheng 
Reviewed-by: Steven Price 
Acked-by: Daniel Vetter 
CC: Rob Herring 
CC: Tomeu Vizoso 
CC: Alyssa Rosenzweig 
CC: David Airlie 
CC: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/panfrost/panfrost_device.h|  2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c   |  6 +++-
 drivers/gpu/drm/panfrost/panfrost_gem.h   |  2 +-
 .../gpu/drm/panfrost/panfrost_gem_shrinker.c  | 30 +++
 4 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index b0126b9fbadc..e667e5689353 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -118,7 +118,7 @@ struct panfrost_device {
 
struct mutex shrinker_lock;
struct list_head shrinker_list;
-   struct shrinker shrinker;
+   struct shrinker *shrinker;
 
struct panfrost_devfreq pfdevfreq;
 };
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index a2ab99698ca8..e1d0e3a23757 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -601,10 +601,14 @@ static int panfrost_probe(struct platform_device *pdev)
if (err < 0)
goto err_out1;
 
-   panfrost_gem_shrinker_init(ddev);
+   err = panfrost_gem_shrinker_init(ddev);
+   if (err)
+   goto err_out2;
 
return 0;
 
+err_out2:
+   drm_dev_unregister(ddev);
 err_out1:
pm_runtime_disable(pfdev->dev);
panfrost_device_fini(pfdev);
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h 
b/drivers/gpu/drm/panfrost/panfrost_gem.h
index ad2877eeeccd..863d2ec8d4f0 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -81,7 +81,7 @@ panfrost_gem_mapping_get(struct panfrost_gem_object *bo,
 void panfrost_gem_mapping_put(struct panfrost_gem_mapping *mapping);
 void panfrost_gem_teardown_mappings_locked(struct panfrost_gem_object *bo);
 
-void panfrost_gem_shrinker_init(struct drm_device *dev);
+int panfrost_gem_shrinker_init(struct drm_device *dev);
 void panfrost_gem_shrinker_cleanup(struct drm_device *dev);
 
 #endif /* __PANFROST_GEM_H__ */
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c 
b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
index 6a71a2555f85..3dfe2b7ccdd9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
@@ -18,8 +18,7 @@
 static unsigned long
 panfrost_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control 
*sc)
 {
-   struct panfrost_device *pfdev =
-   container_of(shrinker, struct panfrost_device, shrinker);
+   struct panfrost_device *pfdev = shrinker->private_data;
struct drm_gem_shmem_object *shmem;
unsigned long count = 0;
 
@@ -65,8 +64,7 @@ static bool panfrost_gem_purge(struct drm_gem_object *obj)
 static unsigned long
 panfrost_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control 
*sc)
 {
-   struct panfrost_device *pfdev =
-   container_of(shrinker, struct panfrost_device, shrinker);
+   struct panfrost_device *pfdev = shrinker->private_data;
struct drm_gem_shmem_object *shmem, *tmp;
unsigned long freed = 0;
 
@@ -97,13 +95,22 @@ panfrost_gem_shrinker_scan(struct shrinker *shrinker, 
struct shrink_control *sc)
  *
  * This function registers and sets up the panfrost shrinker.
  */
-void panfrost_gem_shrinker_init(struct drm_device *dev)
+int panfrost_gem_shrinker_init(struct drm_device *dev)
 {
struct panfrost_device *pfdev = dev->dev_private;
-   pfdev->shrinker.count_objects = panfrost_gem_shrinker_count;
-   pfdev->shrinker.scan_objects = panfrost_gem_shrinker_scan;
-   pfdev->shrinker.seeks = DEFAULT_SEEKS;
-   WARN_ON(register_shrinker(&pfdev->shrinker, "drm-panfrost"));
+
+   pfdev->shrinker = shrinker_alloc(0, "drm-panfrost");
+   if (!pfdev->shrinker)
+   return -ENOMEM;
+
+   pfdev->shrinker->count_objects = panfrost_gem_shrinker_count;
+   pfdev->shrinker->scan_objects = panfrost_gem_shrinker_scan;
+   pfdev->shrinker->seeks = DEFAULT_SEEKS;
+   pfdev->shrinker->private_data = pfdev;
+
+   shrinker_register(pfdev->shrinker);
+
+   return 0;
 }
 
 /**
@@ -116,7 +123,6 @@ void panfrost_gem_shrinker_cleanup(struct drm_device *dev)
 {
struct panfrost_device *pfdev = dev->dev_private;
 
-   if (pfdev->shrinker.nr_deferred) {
-   unregister_shrinker(&pfdev->shrinker);
-   }
+   if (pfdev->shrinker)
+   shrinker_free(pfdev->shrinker);

[PATCH v5 21/45] drm/msm: dynamically allocate the drm-msm_gem shrinker

2023-08-23 Thread Qi Zheng
In preparation for implementing lockless slab shrink, use the new APIs
to dynamically allocate the drm-msm_gem shrinker, so that it can be
freed asynchronously via RCU. Then we don't need to wait for an RCU
read-side critical section when releasing the struct msm_drm_private.

Signed-off-by: Qi Zheng 
Reviewed-by: Muchun Song 
Acked-by: Daniel Vetter 
CC: Rob Clark 
CC: Abhinav Kumar 
CC: Dmitry Baryshkov 
CC: Sean Paul 
CC: Marijn Suijten 
CC: David Airlie 
CC: linux-arm-...@vger.kernel.org
CC: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/msm/msm_drv.c  |  4 ++-
 drivers/gpu/drm/msm/msm_drv.h  |  4 +--
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 34 --
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 4bd028fa7500..7f20249d6071 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -462,7 +462,9 @@ static int msm_drm_init(struct device *dev, const struct 
drm_driver *drv)
if (ret)
goto err_msm_uninit;
 
-   msm_gem_shrinker_init(ddev);
+   ret = msm_gem_shrinker_init(ddev);
+   if (ret)
+   goto err_msm_uninit;
 
if (priv->kms_init) {
ret = priv->kms_init(ddev);
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 02fd6c7d0bb7..e2fc56f161b5 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -221,7 +221,7 @@ struct msm_drm_private {
} vram;
 
struct notifier_block vmap_notifier;
-   struct shrinker shrinker;
+   struct shrinker *shrinker;
 
struct drm_atomic_state *pm_state;
 
@@ -283,7 +283,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 unsigned long msm_gem_shrinker_shrink(struct drm_device *dev, unsigned long 
nr_to_scan);
 #endif
 
-void msm_gem_shrinker_init(struct drm_device *dev);
+int msm_gem_shrinker_init(struct drm_device *dev);
 void msm_gem_shrinker_cleanup(struct drm_device *dev);
 
 struct sg_table *msm_gem_prime_get_sg_table(struct drm_gem_object *obj);
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c 
b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index f38296ad8743..2063e4f8 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -34,8 +34,7 @@ static bool can_block(struct shrink_control *sc)
 static unsigned long
 msm_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 {
-   struct msm_drm_private *priv =
-   container_of(shrinker, struct msm_drm_private, shrinker);
+   struct msm_drm_private *priv = shrinker->private_data;
unsigned count = priv->lru.dontneed.count;
 
if (can_swap())
@@ -100,8 +99,7 @@ active_evict(struct drm_gem_object *obj)
 static unsigned long
 msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 {
-   struct msm_drm_private *priv =
-   container_of(shrinker, struct msm_drm_private, shrinker);
+   struct msm_drm_private *priv = shrinker->private_data;
struct {
struct drm_gem_lru *lru;
bool (*shrink)(struct drm_gem_object *obj);
@@ -148,10 +146,11 @@ msm_gem_shrinker_shrink(struct drm_device *dev, unsigned 
long nr_to_scan)
struct shrink_control sc = {
.nr_to_scan = nr_to_scan,
};
-   int ret;
+   unsigned long ret = SHRINK_STOP;
 
fs_reclaim_acquire(GFP_KERNEL);
-   ret = msm_gem_shrinker_scan(&priv->shrinker, &sc);
+   if (priv->shrinker)
+   ret = msm_gem_shrinker_scan(priv->shrinker, &sc);
fs_reclaim_release(GFP_KERNEL);
 
return ret;
@@ -210,16 +209,25 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, unsigned 
long event, void *ptr)
  *
  * This function registers and sets up the msm shrinker.
  */
-void msm_gem_shrinker_init(struct drm_device *dev)
+int msm_gem_shrinker_init(struct drm_device *dev)
 {
struct msm_drm_private *priv = dev->dev_private;
-   priv->shrinker.count_objects = msm_gem_shrinker_count;
-   priv->shrinker.scan_objects = msm_gem_shrinker_scan;
-   priv->shrinker.seeks = DEFAULT_SEEKS;
-   WARN_ON(register_shrinker(&priv->shrinker, "drm-msm_gem"));
+
+   priv->shrinker = shrinker_alloc(0, "drm-msm_gem");
+   if (!priv->shrinker)
+   return -ENOMEM;
+
+   priv->shrinker->count_objects = msm_gem_shrinker_count;
+   priv->shrinker->scan_objects = msm_gem_shrinker_scan;
+   priv->shrinker->seeks = DEFAULT_SEEKS;
+   priv->shrinker->private_data = priv;
+
+   shrinker_register(priv->shrinker);
 
priv->vmap_notifier.notifier_call = msm_gem_shrinker_vmap;
WARN_ON(register_vmap_purge_notifier(&priv->vmap_notifier));
+
+   return 0;
 }
 
 /**
@@ -232,8 +240,8 @@ void msm_gem_shrinker_cleanup(struct drm_device *dev)
 {
struct msm_drm_private *priv = dev->dev_private;
 
-   if 

[PATCH v5 20/45] drm/i915: dynamically allocate the i915_gem_mm shrinker

2023-08-23 Thread Qi Zheng
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the i915_gem_mm shrinker, so that it can be freed
asynchronously via RCU. Then it doesn't need to wait for RCU read-side
critical section when releasing the struct drm_i915_private.

Signed-off-by: Qi Zheng 
Reviewed-by: Muchun Song 
Acked-by: Daniel Vetter 
CC: Jani Nikula 
CC: Joonas Lahtinen 
CC: Rodrigo Vivi 
CC: Tvrtko Ursulin 
CC: David Airlie 
CC: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 30 +++-
 drivers/gpu/drm/i915/i915_drv.h  |  2 +-
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c 
b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index 214763942aa2..4504eb4f31d5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -284,8 +284,7 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private 
*i915)
 static unsigned long
 i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 {
-   struct drm_i915_private *i915 =
-   container_of(shrinker, struct drm_i915_private, mm.shrinker);
+   struct drm_i915_private *i915 = shrinker->private_data;
unsigned long num_objects;
unsigned long count;
 
@@ -302,8 +301,8 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct 
shrink_control *sc)
if (num_objects) {
unsigned long avg = 2 * count / num_objects;
 
-   i915->mm.shrinker.batch =
-   max((i915->mm.shrinker.batch + avg) >> 1,
+   i915->mm.shrinker->batch =
+   max((i915->mm.shrinker->batch + avg) >> 1,
128ul /* default SHRINK_BATCH */);
}
 
@@ -313,8 +312,7 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct 
shrink_control *sc)
 static unsigned long
 i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 {
-   struct drm_i915_private *i915 =
-   container_of(shrinker, struct drm_i915_private, mm.shrinker);
+   struct drm_i915_private *i915 = shrinker->private_data;
unsigned long freed;
 
sc->nr_scanned = 0;
@@ -422,12 +420,18 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, 
unsigned long event, void *ptr
 
 void i915_gem_driver_register__shrinker(struct drm_i915_private *i915)
 {
-   i915->mm.shrinker.scan_objects = i915_gem_shrinker_scan;
-   i915->mm.shrinker.count_objects = i915_gem_shrinker_count;
-   i915->mm.shrinker.seeks = DEFAULT_SEEKS;
-   i915->mm.shrinker.batch = 4096;
-   drm_WARN_ON(&i915->drm, register_shrinker(&i915->mm.shrinker,
- "drm-i915_gem"));
+   i915->mm.shrinker = shrinker_alloc(0, "drm-i915_gem");
+   if (!i915->mm.shrinker) {
+   drm_WARN_ON(&i915->drm, 1);
+   } else {
+   i915->mm.shrinker->scan_objects = i915_gem_shrinker_scan;
+   i915->mm.shrinker->count_objects = i915_gem_shrinker_count;
+   i915->mm.shrinker->seeks = DEFAULT_SEEKS;
+   i915->mm.shrinker->batch = 4096;
+   i915->mm.shrinker->private_data = i915;
+
+   shrinker_register(i915->mm.shrinker);
+   }
 
i915->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
drm_WARN_ON(&i915->drm, register_oom_notifier(&i915->mm.oom_notifier));
@@ -443,7 +447,7 @@ void i915_gem_driver_unregister__shrinker(struct 
drm_i915_private *i915)
unregister_vmap_purge_notifier(&i915->mm.vmap_notifier));
drm_WARN_ON(&i915->drm,
unregister_oom_notifier(&i915->mm.oom_notifier));
-   unregister_shrinker(&i915->mm.shrinker);
+   shrinker_free(i915->mm.shrinker);
 }
 
 void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7a8ce7239bc9..f2f21da4d7f9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -163,7 +163,7 @@ struct i915_gem_mm {
 
struct notifier_block oom_notifier;
struct notifier_block vmap_notifier;
-   struct shrinker shrinker;
+   struct shrinker *shrinker;
 
 #ifdef CONFIG_MMU_NOTIFIER
/**
-- 
2.30.2



[PATCH v5 04/45] drm/ttm: dynamically allocate the drm-ttm_pool shrinker

2023-08-23 Thread Qi Zheng
Use new APIs to dynamically allocate the drm-ttm_pool shrinker.

Signed-off-by: Qi Zheng 
Reviewed-by: Muchun Song 
Acked-by: Daniel Vetter 
CC: Christian Koenig 
CC: Huang Rui 
CC: David Airlie 
CC: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/ttm/ttm_pool.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 648ca70403a7..fe610a3cace0 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -73,7 +73,7 @@ static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 
1];
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
-static struct shrinker mm_shrinker;
+static struct shrinker *mm_shrinker;
 static DECLARE_RWSEM(pool_shrink_rwsem);
 
 /* Allocate pages of size 1 << order with the given gfp_flags */
@@ -749,8 +749,8 @@ static int ttm_pool_debugfs_shrink_show(struct seq_file *m, 
void *data)
struct shrink_control sc = { .gfp_mask = GFP_NOFS };
 
fs_reclaim_acquire(GFP_KERNEL);
-   seq_printf(m, "%lu/%lu\n", ttm_pool_shrinker_count(&mm_shrinker, &sc),
-  ttm_pool_shrinker_scan(&mm_shrinker, &sc));
+   seq_printf(m, "%lu/%lu\n", ttm_pool_shrinker_count(mm_shrinker, &sc),
+  ttm_pool_shrinker_scan(mm_shrinker, &sc));
fs_reclaim_release(GFP_KERNEL);
 
return 0;
@@ -794,10 +794,17 @@ int ttm_pool_mgr_init(unsigned long num_pages)
&ttm_pool_debugfs_shrink_fops);
 #endif
 
-   mm_shrinker.count_objects = ttm_pool_shrinker_count;
-   mm_shrinker.scan_objects = ttm_pool_shrinker_scan;
-   mm_shrinker.seeks = 1;
-   return register_shrinker(&mm_shrinker, "drm-ttm_pool");
+   mm_shrinker = shrinker_alloc(0, "drm-ttm_pool");
+   if (!mm_shrinker)
+   return -ENOMEM;
+
+   mm_shrinker->count_objects = ttm_pool_shrinker_count;
+   mm_shrinker->scan_objects = ttm_pool_shrinker_scan;
+   mm_shrinker->seeks = 1;
+
+   shrinker_register(mm_shrinker);
+
+   return 0;
 }
 
 /**
@@ -817,6 +824,6 @@ void ttm_pool_mgr_fini(void)
ttm_pool_type_fini(&global_dma32_uncached[i]);
}
 
-   unregister_shrinker(&mm_shrinker);
+   shrinker_free(mm_shrinker);
WARN_ON(!list_empty(&shrinker_list));
 }
-- 
2.30.2



[PATCH v3 4/4] drm/ttm: introduce pool_shrink_rwsem

2023-08-23 Thread Qi Zheng
Currently, the synchronize_shrinkers() is only used by TTM pool. It only
requires that no shrinkers run in parallel.

After we use RCU+refcount method to implement the lockless slab shrink,
we can not use shrinker_rwsem or synchronize_rcu() to guarantee that all
shrinker invocations have seen an update before freeing memory.

So we introduce a new pool_shrink_rwsem to implement a private
ttm_pool_synchronize_shrinkers(), so as to achieve the same purpose.
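A minimal sketch of the quiescing pattern this buys us (mirroring the
hunks below):

	/* ttm_pool_shrink(): every shrink pass runs under the read side */
	down_read(&pool_shrink_rwsem);
	/* ... pick a pool and free pages from it ... */
	up_read(&pool_shrink_rwsem);

	/*
	 * ttm_pool_fini(): acquiring and releasing the write side blocks
	 * until every reader that started earlier has finished, so no
	 * shrinker can still be freeing pages from the pool we destroy.
	 */
	down_write(&pool_shrink_rwsem);
	up_write(&pool_shrink_rwsem);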

Signed-off-by: Qi Zheng 
Reviewed-by: Muchun Song 
Reviewed-by: Christian König 
Acked-by: Daniel Vetter 
---
 drivers/gpu/drm/ttm/ttm_pool.c | 17 -
 include/linux/shrinker.h   |  1 -
 mm/shrinker.c  | 15 ---
 3 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index cddb9151d20f..648ca70403a7 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -74,6 +74,7 @@ static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 
1];
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
+static DECLARE_RWSEM(pool_shrink_rwsem);
 
 /* Allocate pages of size 1 << order with the given gfp_flags */
 static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
@@ -317,6 +318,7 @@ static unsigned int ttm_pool_shrink(void)
unsigned int num_pages;
struct page *p;
 
+   down_read(&pool_shrink_rwsem);
spin_lock(&shrinker_lock);
pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
list_move_tail(&pt->shrinker_list, &shrinker_list);
@@ -329,6 +331,7 @@ static unsigned int ttm_pool_shrink(void)
} else {
num_pages = 0;
}
+   up_read(&pool_shrink_rwsem);
 
return num_pages;
 }
@@ -572,6 +575,18 @@ void ttm_pool_init(struct ttm_pool *pool, struct device 
*dev,
 }
 EXPORT_SYMBOL(ttm_pool_init);
 
+/**
+ * ttm_pool_synchronize_shrinkers - Wait for all running shrinkers to complete.
+ *
+ * This is useful to guarantee that all shrinker invocations have seen an
+ * update, before freeing memory, similar to rcu.
+ */
+static void ttm_pool_synchronize_shrinkers(void)
+{
+   down_write(&pool_shrink_rwsem);
+   up_write(&pool_shrink_rwsem);
+}
+
 /**
  * ttm_pool_fini - Cleanup a pool
  *
@@ -593,7 +608,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
/* We removed the pool types from the LRU, but we need to also make sure
 * that no shrinker is concurrently freeing pages from the pool.
 */
-   synchronize_shrinkers();
+   ttm_pool_synchronize_shrinkers();
 }
 EXPORT_SYMBOL(ttm_pool_fini);
 
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 8dc15aa37410..6b5843c3b827 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -103,7 +103,6 @@ extern int __printf(2, 3) register_shrinker(struct shrinker 
*shrinker,
const char *fmt, ...);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
-extern void synchronize_shrinkers(void);
 
 #ifdef CONFIG_SHRINKER_DEBUG
 extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker,
diff --git a/mm/shrinker.c b/mm/shrinker.c
index 043c87ccfab4..a16cd448b924 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -692,18 +692,3 @@ void unregister_shrinker(struct shrinker *shrinker)
shrinker->nr_deferred = NULL;
 }
 EXPORT_SYMBOL(unregister_shrinker);
-
-/**
- * synchronize_shrinkers - Wait for all running shrinkers to complete.
- *
- * This is equivalent to calling unregister_shrink() and register_shrinker(),
- * but atomically and with less overhead. This is useful to guarantee that all
- * shrinker invocations have seen an update, before freeing memory, similar to
- * rcu.
- */
-void synchronize_shrinkers(void)
-{
-   down_write(&shrinker_rwsem);
-   up_write(&shrinker_rwsem);
-}
-EXPORT_SYMBOL(synchronize_shrinkers);
-- 
2.30.2



[PATCH v3 3/4] mm: shrinker: remove redundant shrinker_rwsem in debugfs operations

2023-08-23 Thread Qi Zheng
The debugfs_remove_recursive() will wait for debugfs_file_put() to return,
so the shrinker will not be freed when doing debugfs operations (such as
shrinker_debugfs_count_show() and shrinker_debugfs_scan_write()), so there
is no need to hold shrinker_rwsem during debugfs operations.
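Put differently, the lifetime guarantee now comes from the debugfs core
itself; a simplified sketch (not literal kernel code, the wrapping is
done by the debugfs full-proxy file operations):

	/* around every debugfs file operation: */
	if (debugfs_file_get(dentry))		/* fails once removal started */
		return -EIO;
	ret = real_fops->read(...);	/* e.g. shrinker_debugfs_count_show() */
	debugfs_file_put(dentry);

	/* teardown side: */
	debugfs_remove_recursive(debugfs_entry);	/* waits for all puts */
	/* no debugfs operation can still dereference the shrinker here */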

Signed-off-by: Qi Zheng 
Reviewed-by: Muchun Song 
---
 mm/shrinker_debug.c | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index ee0cddb4530f..e4ce509f619e 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -51,17 +51,12 @@ static int shrinker_debugfs_count_show(struct seq_file *m, 
void *v)
struct mem_cgroup *memcg;
unsigned long total;
bool memcg_aware;
-   int ret, nid;
+   int ret = 0, nid;
 
count_per_node = kcalloc(nr_node_ids, sizeof(unsigned long), 
GFP_KERNEL);
if (!count_per_node)
return -ENOMEM;
 
-   ret = down_read_killable(&shrinker_rwsem);
-   if (ret) {
-   kfree(count_per_node);
-   return ret;
-   }
rcu_read_lock();
 
memcg_aware = shrinker->flags & SHRINKER_MEMCG_AWARE;
@@ -94,7 +89,6 @@ static int shrinker_debugfs_count_show(struct seq_file *m, 
void *v)
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
 
rcu_read_unlock();
-   up_read(&shrinker_rwsem);
 
kfree(count_per_node);
return ret;
@@ -119,7 +113,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file 
*file,
struct mem_cgroup *memcg = NULL;
int nid;
char kbuf[72];
-   ssize_t ret;
 
read_len = size < (sizeof(kbuf) - 1) ? size : (sizeof(kbuf) - 1);
if (copy_from_user(kbuf, buf, read_len))
@@ -148,12 +141,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file 
*file,
return -EINVAL;
}
 
-   ret = down_read_killable(&shrinker_rwsem);
-   if (ret) {
-   mem_cgroup_put(memcg);
-   return ret;
-   }
-
sc.nid = nid;
sc.memcg = memcg;
sc.nr_to_scan = nr_to_scan;
@@ -161,7 +148,6 @@ static ssize_t shrinker_debugfs_scan_write(struct file 
*file,
 
shrinker->scan_objects(shrinker, &sc);
 
-   up_read(&shrinker_rwsem);
mem_cgroup_put(memcg);
 
return size;
-- 
2.30.2



[PATCH v3 2/4] mm: vmscan: move shrinker-related code into a separate file

2023-08-23 Thread Qi Zheng
The mm/vmscan.c file is too large, so separate the shrinker-related
code from it into a separate file. No functional changes.

Signed-off-by: Qi Zheng 
Reviewed-by: Muchun Song 
---
 mm/Makefile   |   4 +-
 mm/internal.h |   2 +
 mm/shrinker.c | 709 ++
 mm/vmscan.c   | 701 -
 4 files changed, 713 insertions(+), 703 deletions(-)
 create mode 100644 mm/shrinker.c

diff --git a/mm/Makefile b/mm/Makefile
index ec65984e2ade..33873c8aedb3 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -48,8 +48,8 @@ endif
 
 obj-y  := filemap.o mempool.o oom_kill.o fadvise.o \
   maccess.o page-writeback.o folio-compat.o \
-  readahead.o swap.o truncate.o vmscan.o shmem.o \
-  util.o mmzone.o vmstat.o backing-dev.o \
+  readahead.o swap.o truncate.o vmscan.o shrinker.o \
+  shmem.o util.o mmzone.o vmstat.o backing-dev.o \
   mm_init.o percpu.o slab_common.o \
   compaction.o show_mem.o shmem_quota.o\
   interval_tree.o list_lru.o workingset.o \
diff --git a/mm/internal.h b/mm/internal.h
index f30bb60e7790..5d4697612073 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1157,6 +1157,8 @@ struct vma_prepare {
 };
 
 /* shrinker related functions */
+unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
+ int priority);
 
 #ifdef CONFIG_SHRINKER_DEBUG
 extern int shrinker_debugfs_add(struct shrinker *shrinker);
diff --git a/mm/shrinker.c b/mm/shrinker.c
new file mode 100644
index ..043c87ccfab4
--- /dev/null
+++ b/mm/shrinker.c
@@ -0,0 +1,709 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+
+#include "internal.h"
+
+LIST_HEAD(shrinker_list);
+DECLARE_RWSEM(shrinker_rwsem);
+
+#ifdef CONFIG_MEMCG
+static int shrinker_nr_max;
+
+/* The shrinker_info is expanded in a batch of BITS_PER_LONG */
+static inline int shrinker_map_size(int nr_items)
+{
+   return (DIV_ROUND_UP(nr_items, BITS_PER_LONG) * sizeof(unsigned long));
+}
+
+static inline int shrinker_defer_size(int nr_items)
+{
+   return (round_up(nr_items, BITS_PER_LONG) * sizeof(atomic_long_t));
+}
+
+void free_shrinker_info(struct mem_cgroup *memcg)
+{
+   struct mem_cgroup_per_node *pn;
+   struct shrinker_info *info;
+   int nid;
+
+   for_each_node(nid) {
+   pn = memcg->nodeinfo[nid];
+   info = rcu_dereference_protected(pn->shrinker_info, true);
+   kvfree(info);
+   rcu_assign_pointer(pn->shrinker_info, NULL);
+   }
+}
+
+int alloc_shrinker_info(struct mem_cgroup *memcg)
+{
+   struct shrinker_info *info;
+   int nid, size, ret = 0;
+   int map_size, defer_size = 0;
+
+   down_write(&shrinker_rwsem);
+   map_size = shrinker_map_size(shrinker_nr_max);
+   defer_size = shrinker_defer_size(shrinker_nr_max);
+   size = map_size + defer_size;
+   for_each_node(nid) {
+   info = kvzalloc_node(sizeof(*info) + size, GFP_KERNEL, nid);
+   if (!info) {
+   free_shrinker_info(memcg);
+   ret = -ENOMEM;
+   break;
+   }
+   info->nr_deferred = (atomic_long_t *)(info + 1);
+   info->map = (void *)info->nr_deferred + defer_size;
+   info->map_nr_max = shrinker_nr_max;
+   rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info);
+   }
+   up_write(&shrinker_rwsem);
+
+   return ret;
+}
+
+static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
+int nid)
+{
+   return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
+lockdep_is_held(&shrinker_rwsem));
+}
+
+static int expand_one_shrinker_info(struct mem_cgroup *memcg,
+   int map_size, int defer_size,
+   int old_map_size, int old_defer_size,
+   int new_nr_max)
+{
+   struct shrinker_info *new, *old;
+   struct mem_cgroup_per_node *pn;
+   int nid;
+   int size = map_size + defer_size;
+
+   for_each_node(nid) {
+   pn = memcg->nodeinfo[nid];
+   old = shrinker_info_protected(memcg, nid);
+   /* Not yet online memcg */
+   if (!old)
+   return 0;
+
+   /* Already expanded this shrinker_info */
+   if (new_nr_max <= old->map_nr_max)
+   continue;
+
+   new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
+   if (!new)
+   return -ENOMEM;
+
+   new->nr_deferred = (atomic_long_t *)(new + 1);
+   

[PATCH v3 1/4] mm: move some shrinker-related function declarations to mm/internal.h

2023-08-23 Thread Qi Zheng
The following functions are only used inside the mm subsystem, so it's
better to move their declarations to the mm/internal.h file.

1. shrinker_debugfs_add()
2. shrinker_debugfs_detach()
3. shrinker_debugfs_remove()

Signed-off-by: Qi Zheng 
Reviewed-by: Muchun Song 
---
 include/linux/shrinker.h | 19 ---
 mm/internal.h| 26 ++
 mm/shrinker_debug.c  |  2 ++
 3 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 224293b2dd06..8dc15aa37410 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -106,28 +106,9 @@ extern void free_prealloced_shrinker(struct shrinker 
*shrinker);
 extern void synchronize_shrinkers(void);
 
 #ifdef CONFIG_SHRINKER_DEBUG
-extern int shrinker_debugfs_add(struct shrinker *shrinker);
-extern struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
- int *debugfs_id);
-extern void shrinker_debugfs_remove(struct dentry *debugfs_entry,
-   int debugfs_id);
 extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker,
  const char *fmt, ...);
 #else /* CONFIG_SHRINKER_DEBUG */
-static inline int shrinker_debugfs_add(struct shrinker *shrinker)
-{
-   return 0;
-}
-static inline struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
-int *debugfs_id)
-{
-   *debugfs_id = -1;
-   return NULL;
-}
-static inline void shrinker_debugfs_remove(struct dentry *debugfs_entry,
-  int debugfs_id)
-{
-}
 static inline __printf(2, 3)
 int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
 {
diff --git a/mm/internal.h b/mm/internal.h
index 7499b5ea1cf6..f30bb60e7790 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1155,4 +1155,30 @@ struct vma_prepare {
struct vm_area_struct *remove;
struct vm_area_struct *remove2;
 };
+
+/* shrinker related functions */
+
+#ifdef CONFIG_SHRINKER_DEBUG
+extern int shrinker_debugfs_add(struct shrinker *shrinker);
+extern struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
+ int *debugfs_id);
+extern void shrinker_debugfs_remove(struct dentry *debugfs_entry,
+   int debugfs_id);
+#else /* CONFIG_SHRINKER_DEBUG */
+static inline int shrinker_debugfs_add(struct shrinker *shrinker)
+{
+   return 0;
+}
+static inline struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
+int *debugfs_id)
+{
+   *debugfs_id = -1;
+   return NULL;
+}
+static inline void shrinker_debugfs_remove(struct dentry *debugfs_entry,
+  int debugfs_id)
+{
+}
+#endif /* CONFIG_SHRINKER_DEBUG */
+
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
index 3ab53fad8876..ee0cddb4530f 100644
--- a/mm/shrinker_debug.c
+++ b/mm/shrinker_debug.c
@@ -6,6 +6,8 @@
 #include 
 #include 
 
+#include "internal.h"
+
 /* defined in vmscan.c */
 extern struct rw_semaphore shrinker_rwsem;
 extern struct list_head shrinker_list;
-- 
2.30.2



[PATCH v3 0/4] cleanups for lockless slab shrink

2023-08-23 Thread Qi Zheng
Hi all,

This series is some cleanups split from the previous patchset[1], I dropped the
[PATCH v2 5/5] which is more related to the main lockless patch.

This series is based on the next-20230823.

Comments and suggestions are welcome.

[1]. 
https://lore.kernel.org/lkml/20230807110936.21819-1-zhengqi.a...@bytedance.com/

Thanks,
Qi

Changelog in part 1 v2 -> part 1 v3:
 - drop [PATCH v2 5/5]
 - collect Acked-by
 - rebase onto the next-20230823

Changelog in part 1 v1 -> part 1 v2:
 - fix compilation warning in [PATCH 1/5]
 - rename synchronize_shrinkers() to ttm_pool_synchronize_shrinkers()
   (pointed by Christian König)
 - collect Reviewed-by

Changelog in v4 -> part 1 v1:
 - split from the previous large patchset
 - fix comment format in [PATCH v4 01/48] (pointed by Muchun Song)
 - change to use kzalloc_node() and fix typo in [PATCH v4 44/48]
   (pointed by Dave Chinner)
 - collect Reviewed-bys
 - rebase onto the next-20230815

Qi Zheng (4):
  mm: move some shrinker-related function declarations to mm/internal.h
  mm: vmscan: move shrinker-related code into a separate file
  mm: shrinker: remove redundant shrinker_rwsem in debugfs operations
  drm/ttm: introduce pool_shrink_rwsem

 drivers/gpu/drm/ttm/ttm_pool.c |  17 +-
 include/linux/shrinker.h   |  20 -
 mm/Makefile|   4 +-
 mm/internal.h  |  28 ++
 mm/shrinker.c  | 694 
 mm/shrinker_debug.c|  18 +-
 mm/vmscan.c| 701 -
 7 files changed, 743 insertions(+), 739 deletions(-)
 create mode 100644 mm/shrinker.c

-- 
2.30.2



Re: [RFC PATCH v2 00/11] Device Memory TCP

2023-08-23 Thread David Ahern
On 8/23/23 3:52 PM, David Wei wrote:
> I'm also preparing a submission for NetDev conf. I wonder if you and others at
> Google plan to present there as well? If so, then we may want to coordinate 
> our
> submissions and talks (if accepted).

personally, I see them as related but separate topics, with Mina's proposal
as infra that io_uring builds on. Both are interesting and needed
discussions.


Re: [PATCH v2 0/9] DRM scheduler changes for Xe

2023-08-23 Thread Matthew Brost
On Thu, Aug 24, 2023 at 02:08:59AM +0200, Danilo Krummrich wrote:
> Hi Matt,
> 
> On 8/11/23 04:31, Matthew Brost wrote:
> > As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
> > have been asked to merge our common DRM scheduler patches first.
> > 
> > This a continuation of a RFC [3] with all comments addressed, ready for
> > a full review, and hopefully in state which can merged in the near
> > future. More details of this series can found in the cover letter of the
> > RFC [3].
> > 
> > These changes have been tested with the Xe driver.
> 
> Do you keep a branch with these patches somewhere?
> 

Pushed a branch for you:
https://gitlab.freedesktop.org/mbrost/nouveau-drm-scheduler/-/tree/xe-sched-changes?ref_type=heads

Matt

> - Danilo
> 
> > 
> > v2:
> >   - Break run job, free job, and process message in own work items
> >   - This might break other drivers as run job and free job now can run in
> > parallel, can fix up if needed
> > 
> > Matt
> > 
> > Matthew Brost (9):
> >drm/sched: Convert drm scheduler to use a work queue  rather than
> >  kthread
> >drm/sched: Move schedule policy to scheduler / entity
> >drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
> >drm/sched: Split free_job into own work item
> >drm/sched: Add generic scheduler message interface
> >drm/sched: Add drm_sched_start_timeout_unlocked helper
> >drm/sched: Start run wq before TDR in drm_sched_start
> >drm/sched: Submit job before starting TDR
> >drm/sched: Add helper to set TDR timeout
> > 
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   3 +-
> >   drivers/gpu/drm/etnaviv/etnaviv_sched.c|   5 +-
> >   drivers/gpu/drm/lima/lima_sched.c  |   5 +-
> >   drivers/gpu/drm/msm/msm_ringbuffer.c   |   5 +-
> >   drivers/gpu/drm/nouveau/nouveau_sched.c|   5 +-
> >   drivers/gpu/drm/panfrost/panfrost_job.c|   5 +-
> >   drivers/gpu/drm/scheduler/sched_entity.c   |  85 -
> >   drivers/gpu/drm/scheduler/sched_fence.c|   2 +-
> >   drivers/gpu/drm/scheduler/sched_main.c | 408 -
> >   drivers/gpu/drm/v3d/v3d_sched.c|  25 +-
> >   include/drm/gpu_scheduler.h|  75 +++-
> >   11 files changed, 487 insertions(+), 136 deletions(-)
> > 
> 


Re: [PATCH drm-misc-next] drm/gpuva_mgr: remove unused prev pointer in __drm_gpuva_sm_map()

2023-08-23 Thread Dave Airlie
On Thu, 24 Aug 2023 at 09:31, Danilo Krummrich  wrote:
>
> The prev pointer in __drm_gpuva_sm_map() was used to implement automatic
> merging of mappings. Since automatic merging did not make its way
> upstream, remove this leftover.
>
> Fixes: e6303f323b1a ("drm: manager to keep track of GPUs VA mappings")
> Signed-off-by: Danilo Krummrich 

Reviewed-by: Dave Airlie 
> ---
>  drivers/gpu/drm/drm_gpuva_mgr.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
> index 1bc91fc60ef3..3e1ca878cb7e 100644
> --- a/drivers/gpu/drm/drm_gpuva_mgr.c
> +++ b/drivers/gpu/drm/drm_gpuva_mgr.c
> @@ -1743,7 +1743,7 @@ __drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
>u64 req_addr, u64 req_range,
>struct drm_gem_object *req_obj, u64 req_offset)
>  {
> -   struct drm_gpuva *va, *next, *prev = NULL;
> +   struct drm_gpuva *va, *next;
> u64 req_end = req_addr + req_range;
> int ret;
>
> @@ -1773,7 +1773,7 @@ __drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
> ret = op_unmap_cb(ops, priv, va, merge);
> if (ret)
> return ret;
> -   goto next;
> +   continue;
> }
>
> if (end > req_end) {
> @@ -1818,7 +1818,7 @@ __drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
> ret = op_remap_cb(ops, priv, &p, NULL, &u);
> if (ret)
> return ret;
> -   goto next;
> +   continue;
> }
>
> if (end > req_end) {
> @@ -1851,7 +1851,7 @@ __drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
> ret = op_unmap_cb(ops, priv, va, merge);
> if (ret)
> return ret;
> -   goto next;
> +   continue;
> }
>
> if (end > req_end) {
> @@ -1872,8 +1872,6 @@ __drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
> break;
> }
> }
> -next:
> -   prev = va;
> }
>
> return op_map_cb(ops, priv,
> --
> 2.41.0
>


[PATCH v2 5/6] drm/panfrost: Implement generic DRM object RSS reporting function

2023-08-23 Thread Adrián Larumbe
A BO's RSS is updated every time new pages are allocated and mapped for
the object, either in its entirety at creation time for non-heap
buffers, or else on demand in the GPU page fault IRQ handler for heap
buffers.

The same calculation had to be done for imported PRIME objects, since
backing storage for them might have already been allocated by the
exporting driver.

Signed-off-by: Adrián Larumbe 
---
 drivers/gpu/drm/panfrost/panfrost_gem.c | 22 ++
 drivers/gpu/drm/panfrost/panfrost_gem.h |  5 +
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 16 +++-
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c 
b/drivers/gpu/drm/panfrost/panfrost_gem.c
index aea16b0e4dda..c6bd1f16a6d4 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -206,6 +206,17 @@ static enum drm_gem_object_status 
panfrost_gem_status(struct drm_gem_object *obj
 
return res;
 }
+
+size_t panfrost_gem_rss(struct drm_gem_object *obj)
+{
+   struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+
+   if (!bo->base.pages)
+   return 0;
+
+   return bo->rss_size;
+}
+
 static const struct drm_gem_object_funcs panfrost_gem_funcs = {
.free = panfrost_gem_free_object,
.open = panfrost_gem_open,
@@ -218,6 +229,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs 
= {
.vunmap = drm_gem_shmem_object_vunmap,
.mmap = drm_gem_shmem_object_mmap,
.status = panfrost_gem_status,
+   .rss = panfrost_gem_rss,
.vm_ops = &drm_gem_shmem_vm_ops,
 };
 
@@ -274,13 +286,23 @@ panfrost_gem_prime_import_sg_table(struct drm_device *dev,
 {
struct drm_gem_object *obj;
struct panfrost_gem_object *bo;
+   struct scatterlist *sgl;
+   unsigned int count;
+   size_t total = 0;
 
obj = drm_gem_shmem_prime_import_sg_table(dev, attach, sgt);
if (IS_ERR(obj))
return ERR_CAST(obj);
 
+   for_each_sgtable_dma_sg(sgt, sgl, count) {
+   size_t len = sg_dma_len(sgl);
+
+   total += len;
+   }
+
bo = to_panfrost_bo(obj);
bo->noexec = true;
+   bo->rss_size = total;
 
return obj;
 }
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h 
b/drivers/gpu/drm/panfrost/panfrost_gem.h
index e06f7ceb8f73..e2a7c46403c7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -36,6 +36,11 @@ struct panfrost_gem_object {
 */
atomic_t gpu_usecount;
 
+   /*
+* Object chunk size currently mapped onto physical memory
+*/
+   size_t rss_size;
+
bool noexec :1;
bool is_heap:1;
bool is_purgable:1;
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index c0123d09f699..e03a5a9da06f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -285,17 +285,19 @@ static void panfrost_mmu_flush_range(struct 
panfrost_device *pfdev,
pm_runtime_put_autosuspend(pfdev->dev);
 }
 
-static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,
+static size_t mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu 
*mmu,
  u64 iova, int prot, struct sg_table *sgt)
 {
unsigned int count;
struct scatterlist *sgl;
struct io_pgtable_ops *ops = mmu->pgtbl_ops;
u64 start_iova = iova;
+   size_t total = 0;
 
for_each_sgtable_dma_sg(sgt, sgl, count) {
unsigned long paddr = sg_dma_address(sgl);
size_t len = sg_dma_len(sgl);
+   total += len;
 
dev_dbg(pfdev->dev, "map: as=%d, iova=%llx, paddr=%lx, 
len=%zx", mmu->as, iova, paddr, len);
 
@@ -315,7 +317,7 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct 
panfrost_mmu *mmu,
 
panfrost_mmu_flush_range(pfdev, mmu, start_iova, iova - start_iova);
 
-   return 0;
+   return total;
 }
 
 int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
@@ -326,6 +328,7 @@ int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
struct panfrost_device *pfdev = to_panfrost_device(obj->dev);
struct sg_table *sgt;
int prot = IOMMU_READ | IOMMU_WRITE;
+   size_t mapped_size;
 
if (WARN_ON(mapping->active))
return 0;
@@ -337,9 +340,10 @@ int panfrost_mmu_map(struct panfrost_gem_mapping *mapping)
if (WARN_ON(IS_ERR(sgt)))
return PTR_ERR(sgt);
 
-   mmu_map_sg(pfdev, mapping->mmu, mapping->mmnode.start << PAGE_SHIFT,
+   mapped_size = mmu_map_sg(pfdev, mapping->mmu, mapping->mmnode.start << 
PAGE_SHIFT,
   prot, sgt);
mapping->active = true;
+   bo->rss_size += mapped_size;
 
return 0;
 }
@@ -447,6 +451,7 @@ static int 

[PATCH v2 3/6] drm/panfrost: Add fdinfo support for memory stats

2023-08-23 Thread Adrián Larumbe
A new DRM GEM object function is added so that drm_show_memory_stats can
provide more accurate memory usage numbers.

Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked
after locking the driver's shrinker mutex, but drm_show_memory_stats holds
the drm file's object handle database spinlock, so there's potential for a
race condition here.

Signed-off-by: Adrián Larumbe 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c |  9 +++--
 drivers/gpu/drm/panfrost/panfrost_gem.c | 12 
 drivers/gpu/drm/panfrost/panfrost_gem.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 3fd372301019..93d5f5538c0b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -440,11 +440,14 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, 
void *data,
args->retained = drm_gem_shmem_madvise(&bo->base, args->madv);
 
if (args->retained) {
-   if (args->madv == PANFROST_MADV_DONTNEED)
+   if (args->madv == PANFROST_MADV_DONTNEED) {
list_move_tail(&bo->base.madv_list,
   &pfdev->shrinker_list);
-   else if (args->madv == PANFROST_MADV_WILLNEED)
+   bo->is_purgable = true;
+   } else if (args->madv == PANFROST_MADV_WILLNEED) {
list_del_init(&bo->base.madv_list);
+   bo->is_purgable = false;
+   }
}
 
 out_unlock_mappings:
@@ -559,6 +562,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, 
struct drm_file *file)
struct panfrost_device *pfdev = dev->dev_private;
 
panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
+
+   drm_show_memory_stats(p, file);
 }
 
 static const struct file_operations panfrost_drm_driver_fops = {
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c 
b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 3c812fbd126f..aea16b0e4dda 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -195,6 +195,17 @@ static int panfrost_gem_pin(struct drm_gem_object *obj)
return drm_gem_shmem_pin(>base);
 }
 
+static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object 
*obj)
+{
+   struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+   enum drm_gem_object_status res = 0;
+
+   res |= (bo->is_purgable) ? DRM_GEM_OBJECT_PURGEABLE : 0;
+
+   res |= (bo->base.pages) ? DRM_GEM_OBJECT_RESIDENT : 0;
+
+   return res;
+}
 static const struct drm_gem_object_funcs panfrost_gem_funcs = {
.free = panfrost_gem_free_object,
.open = panfrost_gem_open,
@@ -206,6 +217,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs 
= {
.vmap = drm_gem_shmem_object_vmap,
.vunmap = drm_gem_shmem_object_vunmap,
.mmap = drm_gem_shmem_object_mmap,
+   .status = panfrost_gem_status,
.vm_ops = &drm_gem_shmem_vm_ops,
 };
 
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h 
b/drivers/gpu/drm/panfrost/panfrost_gem.h
index ad2877eeeccd..e06f7ceb8f73 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -38,6 +38,7 @@ struct panfrost_gem_object {
 
bool noexec :1;
bool is_heap:1;
+   bool is_purgable:1;
 };
 
 struct panfrost_gem_mapping {
-- 
2.42.0



[PATCH v2 6/6] drm/drm-file: Allow size unit selection in drm_show_memory_stats

2023-08-23 Thread Adrián Larumbe
The current implementation always picks the highest available
unit. This is rather inflexible, and letting drivers display BO size
statistics through fdinfo in units of their choice might be desirable.

The new argument to drm_show_memory_stats is the exponent selector for
the unit: sizes are printed in units of 2^(10 * unit) bytes, so passing
1 gives sizes in KiB and 2 in MiB. If we want drm-file functions to
keep picking the highest unit, then 0 should be passed.
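For instance (the panfrost conversion below passes 1; the other calls
here are only illustrative):

	drm_show_memory_stats(p, file, 0);	/* pick the largest fitting unit */
	drm_show_memory_stats(p, file, 1);	/* always print sizes in KiB */
	drm_show_memory_stats(p, file, 2);	/* always print sizes in MiB */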

Signed-off-by: Adrián Larumbe 
---
 drivers/gpu/drm/drm_file.c  | 22 +-
 drivers/gpu/drm/msm/msm_drv.c   |  2 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c |  2 +-
 include/drm/drm_file.h  |  5 +++--
 4 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 762965e3d503..517e1fb8072a 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -873,7 +873,7 @@ void drm_send_event(struct drm_device *dev, struct 
drm_pending_event *e)
 EXPORT_SYMBOL(drm_send_event);
 
 static void print_size(struct drm_printer *p, const char *stat,
-  const char *region, u64 sz)
+  const char *region, u64 sz, unsigned int unit)
 {
const char *units[] = {"", " KiB", " MiB"};
unsigned u;
@@ -881,6 +881,8 @@ static void print_size(struct drm_printer *p, const char 
*stat,
for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
if (sz < SZ_1K)
break;
+   if (unit > 0 && unit == u)
+   break;
sz = div_u64(sz, SZ_1K);
}
 
@@ -898,17 +900,18 @@ static void print_size(struct drm_printer *p, const char 
*stat,
 void drm_print_memory_stats(struct drm_printer *p,
const struct drm_memory_stats *stats,
enum drm_gem_object_status supported_status,
-   const char *region)
+   const char *region,
+   unsigned int unit)
 {
-   print_size(p, "total", region, stats->private + stats->shared);
-   print_size(p, "shared", region, stats->shared);
-   print_size(p, "active", region, stats->active);
+   print_size(p, "total", region, stats->private + stats->shared, unit);
+   print_size(p, "shared", region, stats->shared, unit);
+   print_size(p, "active", region, stats->active, unit);
 
if (supported_status & DRM_GEM_OBJECT_RESIDENT)
-   print_size(p, "resident", region, stats->resident);
+   print_size(p, "resident", region, stats->resident, unit);
 
if (supported_status & DRM_GEM_OBJECT_PURGEABLE)
-   print_size(p, "purgeable", region, stats->purgeable);
+   print_size(p, "purgeable", region, stats->purgeable, unit);
 }
 EXPORT_SYMBOL(drm_print_memory_stats);
 
@@ -916,11 +919,12 @@ EXPORT_SYMBOL(drm_print_memory_stats);
  * drm_show_memory_stats - Helper to collect and show standard fdinfo memory 
stats
  * @p: the printer to print output to
  * @file: the DRM file
+ * @unit: multiplier selecting the power-of-two exponent of the desired unit
  *
  * Helper to iterate over GEM objects with a handle allocated in the specified
  * file.
  */
-void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
+void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, 
unsigned int unit)
 {
struct drm_gem_object *obj;
struct drm_memory_stats status = {};
@@ -967,7 +971,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct 
drm_file *file)
}
spin_unlock(&file->table_lock);
 
-   drm_print_memory_stats(p, &status, supported_status, "memory");
+   drm_print_memory_stats(p, &status, supported_status, "memory", unit);
 }
 EXPORT_SYMBOL(drm_show_memory_stats);
 
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 2a0e3529598b..cd1198151744 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -1067,7 +1067,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
 
msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
 
-   drm_show_memory_stats(p, file);
+   drm_show_memory_stats(p, file, 0);
 }
 
 static const struct file_operations fops = {
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 93d5f5538c0b..79c08cee3e9d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -563,7 +563,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, 
struct drm_file *file)
 
panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
 
-   drm_show_memory_stats(p, file);
+   drm_show_memory_stats(p, file, 1);
 }
 
 static const struct file_operations panfrost_drm_driver_fops = {
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 010239392adf..21a3b022dd63 100644
--- 

[PATCH v2 0/6] Add fdinfo support to Panfrost

2023-08-23 Thread Adrián Larumbe
This patch series adds fdinfo support to the Panfrost DRM driver. It will
display a series of key:value pairs under /proc/pid/fdinfo/fd for render
processes that open the Panfrost DRM file.

The pairs contain basic drm gpu engine and memory region information that
can either be cat by a privileged user or accessed with IGT's gputop
utility.

Changelog:

v2:
 - Changed the way gpu cycles and engine time are calculated, using GPU
 registers and taking into account potential resets.
 - Split render engine values into fragment and vertex/tiler ones.
 - Added more fine-grained calculation of RSS size for BO's.
 - Implemented selection of drm-memory region size units
 - Removed locking of shrinker's mutex in GEM obj status function

Adrián Larumbe (6):
  drm/panfrost: Add cycle count GPU register definitions
  drm/panfrost: Add fdinfo support GPU load metrics
  drm/panfrost: Add fdinfo support for memory stats
  drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
  drm/panfrost: Implement generic DRM object RSS reporting function
  drm/drm-file: Allow size unit selection in drm_show_memory_stats

 drivers/gpu/drm/drm_file.c  | 27 +++
 drivers/gpu/drm/msm/msm_drv.c   |  2 +-
 drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 +++
 drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
 drivers/gpu/drm/panfrost/panfrost_device.h  | 13 +
 drivers/gpu/drm/panfrost/panfrost_drv.c | 54 +++--
 drivers/gpu/drm/panfrost/panfrost_gem.c | 34 +
 drivers/gpu/drm/panfrost/panfrost_gem.h |  6 +++
 drivers/gpu/drm/panfrost/panfrost_job.c | 30 
 drivers/gpu/drm/panfrost/panfrost_job.h |  4 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 16 --
 drivers/gpu/drm/panfrost/panfrost_regs.h|  5 ++
 include/drm/drm_file.h  |  5 +-
 include/drm/drm_gem.h   |  9 
 14 files changed, 195 insertions(+), 21 deletions(-)

-- 
2.42.0



[PATCH v2 4/6] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo

2023-08-23 Thread Adrián Larumbe
Some BO's might be mapped onto physical memory chunkwise and on demand,
like Panfrost's tiler heap. In this case, even though the
drm_gem_shmem_object page array might already be allocated, only a very
small fraction of the BO is currently backed by system memory, but
drm_show_memory_stats will then proceed to add its entire virtual size to
the file's total resident size regardless.

This led to very unrealistic RSS sizes being reckoned for Panfrost, where
said tiler heap buffer is initially allocated with a virtual size of 128
MiB, but only a small part of it will eventually be backed by system memory
after successive GPU page faults.

Provide a new DRM object generic function that would allow drivers to
return a more accurate RSS size for their BOs.
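A driver opting in would wire the hook up along these lines (sketch
with hypothetical driver names; only the .rss member is what this patch
actually adds):

	static size_t my_gem_rss(struct drm_gem_object *obj)
	{
		struct my_gem_object *bo = to_my_bo(obj);

		/* report only the chunk currently backed by pages */
		return bo->rss_size;
	}

	static const struct drm_gem_object_funcs my_gem_funcs = {
		...
		.status = my_gem_status,
		.rss = my_gem_rss,
	};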

Signed-off-by: Adrián Larumbe 
---
 drivers/gpu/drm/drm_file.c | 5 -
 include/drm/drm_gem.h  | 9 +
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 883d83bc0e3d..762965e3d503 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -944,7 +944,10 @@ void drm_show_memory_stats(struct drm_printer *p, struct 
drm_file *file)
}
 
if (s & DRM_GEM_OBJECT_RESIDENT) {
-   status.resident += obj->size;
+   if (obj->funcs && obj->funcs->rss)
+   status.resident += obj->funcs->rss(obj);
+   else
+   status.resident += obj->size;
} else {
/* If already purged or not yet backed by pages, don't
 * count it as purgeable:
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index c0b13c43b459..78ed9fab6044 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -208,6 +208,15 @@ struct drm_gem_object_funcs {
 */
enum drm_gem_object_status (*status)(struct drm_gem_object *obj);
 
+   /**
+* @rss:
+*
+* Return resident size of the object in physical memory.
+*
+* Called by drm_show_memory_stats().
+*/
+   size_t (*rss)(struct drm_gem_object *obj);
+
/**
 * @vm_ops:
 *
-- 
2.42.0



[PATCH v2 2/6] drm/panfrost: Add fdinfo support GPU load metrics

2023-08-23 Thread Adrián Larumbe
The drm-stats fdinfo tags made available to user space are drm-engine,
drm-cycles, drm-max-freq and drm-curfreq, one per job slot.

This deviates from standard practice in other DRM drivers, where a single
set of key:value pairs is provided for the whole render engine. However,
Panfrost has separate queues for fragment and vertex/tiler jobs, so a
decision was made to calculate bus cycles and workload times separately.

Maximum operating frequency is calculated at devfreq initialisation time.
Current frequency is made available to user space because nvtop uses it
when performing engine usage calculations.
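With this in place, a render process's fdinfo would carry one group of
keys per slot, along these lines (values purely illustrative):

	drm-engine-frg:		25716053435 ns
	drm-cycles-frg:		81736921
	drm-max-freq-frg:	800000000 Hz
	drm-curfreq-frg:	550000000 Hz
	drm-engine-vtx:		12118839512 ns
	drm-cycles-vtx:		43876304
	drm-max-freq-vtx:	800000000 Hz
	drm-curfreq-vtx:	550000000 Hz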

Signed-off-by: Adrián Larumbe 
---
 drivers/gpu/drm/panfrost/panfrost_devfreq.c |  8 
 drivers/gpu/drm/panfrost/panfrost_devfreq.h |  3 ++
 drivers/gpu/drm/panfrost/panfrost_device.h  | 13 ++
 drivers/gpu/drm/panfrost/panfrost_drv.c | 45 -
 drivers/gpu/drm/panfrost/panfrost_job.c | 30 ++
 drivers/gpu/drm/panfrost/panfrost_job.h |  4 ++
 6 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c 
b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
index 58dfb15a8757..28caffc689e2 100644
--- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c
+++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c
@@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev,
spin_lock_irqsave(&pfdevfreq->lock, irqflags);
 
panfrost_devfreq_update_utilization(pfdevfreq);
+   pfdevfreq->current_frequency = status->current_frequency;
 
status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time,
   pfdevfreq->idle_time));
@@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
struct devfreq *devfreq;
struct thermal_cooling_device *cooling;
struct panfrost_devfreq *pfdevfreq = &pfdev->pfdevfreq;
+   unsigned long freq = ULONG_MAX;
 
if (pfdev->comp->num_supplies > 1) {
/*
@@ -172,6 +174,12 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev)
return ret;
}
 
+   /* Find the fastest defined rate */
+   opp = dev_pm_opp_find_freq_floor(dev, &freq);
+   if (IS_ERR(opp))
+   return PTR_ERR(opp);
+   pfdevfreq->fast_rate = freq;
+
dev_pm_opp_put(opp);
 
/*
diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.h 
b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
index 1514c1f9d91c..48dbe185f206 100644
--- a/drivers/gpu/drm/panfrost/panfrost_devfreq.h
+++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.h
@@ -19,6 +19,9 @@ struct panfrost_devfreq {
struct devfreq_simple_ondemand_data gov_data;
bool opp_of_table_added;
 
+   unsigned long current_frequency;
+   unsigned long fast_rate;
+
ktime_t busy_time;
ktime_t idle_time;
ktime_t time_last_update;
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index b0126b9fbadc..680f298fd1a9 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -24,6 +24,7 @@ struct panfrost_perfcnt;
 
 #define NUM_JOB_SLOTS 3
 #define MAX_PM_DOMAINS 5
+#define MAX_SLOT_NAME_LEN 10
 
 struct panfrost_features {
u16 id;
@@ -135,12 +136,24 @@ struct panfrost_mmu {
struct list_head list;
 };
 
+struct drm_info_gpu {
+   unsigned int maxfreq;
+
+   struct engine_info {
+   unsigned long long elapsed_ns;
+   unsigned long long cycles;
+   char name[MAX_SLOT_NAME_LEN];
+   } engines[NUM_JOB_SLOTS];
+};
+
 struct panfrost_file_priv {
struct panfrost_device *pfdev;
 
struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
 
struct panfrost_mmu *mmu;
+
+   struct drm_info_gpu fdinfo;
 };
 
 static inline struct panfrost_device *to_panfrost_device(struct drm_device 
*ddev)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index a2ab99698ca8..3fd372301019 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -267,6 +267,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, 
void *data,
job->requirements = args->requirements;
job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
job->mmu = file_priv->mmu;
+   job->priv = file_priv;
 
slot = panfrost_job_get_slot(job);
 
@@ -483,6 +484,14 @@ panfrost_open(struct drm_device *dev, struct drm_file 
*file)
goto err_free;
}
 
+   snprintf(panfrost_priv->fdinfo.engines[0].name, MAX_SLOT_NAME_LEN, 
"frg");
+   snprintf(panfrost_priv->fdinfo.engines[1].name, MAX_SLOT_NAME_LEN, 
"vtx");
+#if 0
+   /* Add compute engine in the future */
+   snprintf(panfrost_priv->fdinfo.engines[2].name, MAX_SLOT_NAME_LEN, 
"cmp");
+#endif
+   panfrost_priv->fdinfo.maxfreq = 

[PATCH v2 1/6] drm/panfrost: Add cycle count GPU register definitions

2023-08-23 Thread Adrián Larumbe
These GPU registers will be used when programming the cycle counter, which
we need for providing accurate fdinfo drm-cycles values to user space.

Signed-off-by: Adrián Larumbe 
---
 drivers/gpu/drm/panfrost/panfrost_regs.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 919f44ac853d..55ec807550b3 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -46,6 +46,8 @@
 #define   GPU_CMD_SOFT_RESET   0x01
 #define   GPU_CMD_PERFCNT_CLEAR0x03
 #define   GPU_CMD_PERFCNT_SAMPLE   0x04
+#define   GPU_CMD_CYCLE_COUNT_START0x05
+#define   GPU_CMD_CYCLE_COUNT_STOP 0x06
 #define   GPU_CMD_CLEAN_CACHES 0x07
 #define   GPU_CMD_CLEAN_INV_CACHES 0x08
 #define GPU_STATUS 0x34
@@ -73,6 +75,9 @@
 #define GPU_PRFCNT_TILER_EN0x74
 #define GPU_PRFCNT_MMU_L2_EN   0x7c
 
+#define GPU_CYCLE_COUNT_LO 0x90
+#define GPU_CYCLE_COUNT_HI 0x94
+
 #define GPU_THREAD_MAX_THREADS 0x0A0   /* (RO) Maximum number of 
threads per core */
 #define GPU_THREAD_MAX_WORKGROUP_SIZE  0x0A4   /* (RO) Maximum workgroup size 
*/
 #define GPU_THREAD_MAX_BARRIER_SIZE0x0A8   /* (RO) Maximum threads waiting 
at a barrier */
-- 
2.42.0



Re: [PATCH v2 0/9] DRM scheduler changes for Xe

2023-08-23 Thread Danilo Krummrich

Hi Matt,

On 8/11/23 04:31, Matthew Brost wrote:

As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
have been asked to merge our common DRM scheduler patches first.

This a continuation of a RFC [3] with all comments addressed, ready for
a full review, and hopefully in state which can merged in the near
future. More details of this series can found in the cover letter of the
RFC [3].

These changes have been tested with the Xe driver.


Do you keep a branch with these patches somewhere?

- Danilo



v2:
  - Break run job, free job, and process message in own work items
  - This might break other drivers as run job and free job now can run in
parallel, can fix up if needed

Matt

Matthew Brost (9):
   drm/sched: Convert drm scheduler to use a work queue  rather than
 kthread
   drm/sched: Move schedule policy to scheduler / entity
   drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
   drm/sched: Split free_job into own work item
   drm/sched: Add generic scheduler message interface
   drm/sched: Add drm_sched_start_timeout_unlocked helper
   drm/sched: Start run wq before TDR in drm_sched_start
   drm/sched: Submit job before starting TDR
   drm/sched: Add helper to set TDR timeout

  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   3 +-
  drivers/gpu/drm/etnaviv/etnaviv_sched.c|   5 +-
  drivers/gpu/drm/lima/lima_sched.c  |   5 +-
  drivers/gpu/drm/msm/msm_ringbuffer.c   |   5 +-
  drivers/gpu/drm/nouveau/nouveau_sched.c|   5 +-
  drivers/gpu/drm/panfrost/panfrost_job.c|   5 +-
  drivers/gpu/drm/scheduler/sched_entity.c   |  85 -
  drivers/gpu/drm/scheduler/sched_fence.c|   2 +-
  drivers/gpu/drm/scheduler/sched_main.c | 408 -
  drivers/gpu/drm/v3d/v3d_sched.c|  25 +-
  include/drm/gpu_scheduler.h|  75 +++-
  11 files changed, 487 insertions(+), 136 deletions(-)





[PATCH drm-misc-next] drm/gpuva_mgr: remove unused prev pointer in __drm_gpuva_sm_map()

2023-08-23 Thread Danilo Krummrich
The prev pointer in __drm_gpuva_sm_map() was used to implement automatic
merging of mappings. Since automatic merging did not make its way
upstream, remove this leftover.

Fixes: e6303f323b1a ("drm: manager to keep track of GPUs VA mappings")
Signed-off-by: Danilo Krummrich 
---
 drivers/gpu/drm/drm_gpuva_mgr.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c
index 1bc91fc60ef3..3e1ca878cb7e 100644
--- a/drivers/gpu/drm/drm_gpuva_mgr.c
+++ b/drivers/gpu/drm/drm_gpuva_mgr.c
@@ -1743,7 +1743,7 @@ __drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
   u64 req_addr, u64 req_range,
   struct drm_gem_object *req_obj, u64 req_offset)
 {
-   struct drm_gpuva *va, *next, *prev = NULL;
+   struct drm_gpuva *va, *next;
u64 req_end = req_addr + req_range;
int ret;
 
@@ -1773,7 +1773,7 @@ __drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
ret = op_unmap_cb(ops, priv, va, merge);
if (ret)
return ret;
-   goto next;
+   continue;
}
 
if (end > req_end) {
@@ -1818,7 +1818,7 @@ __drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
ret = op_remap_cb(ops, priv, &p, NULL, &u);
if (ret)
return ret;
-   goto next;
+   continue;
}
 
if (end > req_end) {
@@ -1851,7 +1851,7 @@ __drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
ret = op_unmap_cb(ops, priv, va, merge);
if (ret)
return ret;
-   goto next;
+   continue;
}
 
if (end > req_end) {
@@ -1872,8 +1872,6 @@ __drm_gpuva_sm_map(struct drm_gpuva_manager *mgr,
break;
}
}
-next:
-   prev = va;
}
 
return op_map_cb(ops, priv,
-- 
2.41.0



Re: [Intel-xe] [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-08-23 Thread Matthew Brost
On Wed, Aug 23, 2023 at 01:26:09PM -0400, Rodrigo Vivi wrote:
> On Wed, Aug 23, 2023 at 11:41:19AM -0400, Alex Deucher wrote:
> > On Wed, Aug 23, 2023 at 11:26 AM Matthew Brost  
> > wrote:
> > >
> > > On Wed, Aug 23, 2023 at 09:10:51AM +0200, Christian König wrote:
> > > > Am 23.08.23 um 05:27 schrieb Matthew Brost:
> > > > > [SNIP]
> > > > > > That is exactly what I want to avoid, tying the TDR to the job is 
> > > > > > what some
> > > > > > AMD engineers pushed for because it looked like a simple solution 
> > > > > > and made
> > > > > > the whole thing similar to what Windows does.
> > > > > >
> > > > > > This turned the previous relatively clean scheduler and TDR design 
> > > > > > into a
> > > > > > complete nightmare. The job contains quite a bunch of things which 
> > > > > > are not
> > > > > > necessarily available after the application which submitted the job 
> > > > > > is torn
> > > > > > down.
> > > > > >
> > > > > Agree the TDR shouldn't be accessing anything application specific
> > > > > rather just internal job state required to tear the job down on the
> > > > > hardware.
> > > > > > So what happens is that you either have stale pointers in the TDR 
> > > > > > which can
> > > > > > go boom extremely easily or we somehow find a way to keep the 
> > > > > > necessary
> > > > > I have not experienced the TDR going boom in Xe.
> > > > >
> > > > > > structures (which include struct thread_info and struct file for 
> > > > > > this driver
> > > > > > connection) alive until all submissions are completed.
> > > > > >
> > > > > In Xe we keep everything alive until all submissions are completed. By
> > > > > everything I mean the drm job, entity, scheduler, and VM via a 
> > > > > reference
> > > > > counting scheme. All of these structures are just kernel state which 
> > > > > can
> > > > > safely be accessed even if the application has been killed.
> > > >
> > > > Yeah, but that might just not be such a good idea from memory management
> > > > point of view.
> > > >
> > > > When you (for example) kill a process, all resources from that process
> > > > should at least be queued to be freed more or less immediately.
> > > >
> > >
> > > We do this, the TDR kicks jobs off the hardware as fast as the hw
> > > interface allows and signals all pending hw fences immediately after.
> > > Free jobs then is immediately called and the reference count goes to
> > > zero. I think max time for all of this to occur is a handful of ms.
> > >
> > > > What Linux is doing for other I/O operations is to keep the relevant 
> > > > pages
> > > > alive until the I/O operation is completed, but for GPUs that usually 
> > > > means
> > > > keeping most of the memory of the process alive and that in turn is 
> > > > really
> > > > not something you can do.
> > > >
> > > > You can of course do this if your driver has a reliable way of killing 
> > > > your
> > > > submissions and freeing resources in a reasonable amount of time. This
> > > > should then be done in the flush callback.
> > > >
> > >
> > > 'flush callback' - Do you mean drm_sched_entity_flush? I looked at that
> > > and think that function doesn't even work, from what I can tell. It flushes
> > > the spsc queue but what about jobs on the hardware, how do those get
> > > killed?
> > >
> > > As stated we do via the TDR which is rather clean design and fits with
> > > our reference couting scheme.
> > >
> > > > > If we need to teardown on demand we just set the TDR to a minimum 
> > > > > value and
> > > > > it kicks the jobs off the hardware, gracefully cleans everything up 
> > > > > and
> > > > > drops all references. This is a benefit of the 1 to 1 relationship, 
> > > > > not
> > > > > sure if this works with how AMDGPU uses the scheduler.
> > > > >
> > > > > > Delaying application tear down is also not an option because then 
> > > > > > you run
> > > > > > into massive trouble with the OOM killer (or more generally OOM 
> > > > > > handling).
> > > > > > See what we do in drm_sched_entity_flush() as well.
> > > > > >
> > > > > Not an issue for Xe; we never call drm_sched_entity_flush, as our
> > > > > reference counting scheme ensures all jobs are finished before we
> > > > > attempt to tear down the entity / scheduler.
> > > >
> > > > I don't think you can do that upstream. Calling 
> > > > drm_sched_entity_flush() is
> > > > a must have from your flush callback for the file descriptor.
> > > >
> > >
> > > Again, 'flush callback'? What are you referring to?
> > >
> > > And why does drm_sched_entity_flush() need to be called? It doesn't seem
> > > to do anything useful.
> > >
> > > > Unless you have some other method for killing your submissions this 
> > > > would
> > > > give a path for a denial-of-service attack vector when the Xe driver is in
> > > > use.
> > > >
> > >
> > > Yes, once the TDR fires it disallows all new submissions at the exec
> > > IOCTL and flushes any pending submissions as fast as possible.
> > >
> > > > > > Since adding the TDR support 

[Bug 217664] Laptop doesnt wake up from suspend mode.

2023-08-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=217664

--- Comment #26 from Mario Limonciello (AMD) (mario.limoncie...@amd.com) ---
I've never used that tool before, but please make sure that you have all the
necessary packages installed.  You need both the linux-image and linux-modules
packages.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v5 00/17] Imagination Technologies PowerVR DRM driver

2023-08-23 Thread Masahiro Yamada
On Fri, Aug 18, 2023 at 4:35 AM Sarah Walker  wrote:
>
> This patch series adds the initial DRM driver for Imagination Technologies 
> PowerVR
> GPUs, starting with those based on our Rogue architecture. It's worth pointing
> out that this is a new driver, written from the ground up, rather than a
> refactored version of our existing downstream driver (pvrsrvkm).
>
> This new DRM driver supports:
> - GEM shmem allocations
> - dma-buf / PRIME
> - Per-context userspace managed virtual address space
> - DRM sync objects (binary and timeline)
> - Power management suspend / resume
> - GPU job submission (geometry, fragment, compute, transfer)
> - META firmware processor
> - MIPS firmware processor
> - GPU hang detection and recovery
>
> Currently our main focus is on the AXE-1-16M GPU. Testing so far has been done
> using a TI SK-AM62 board (AXE-1-16M GPU). Firmware for the AXE-1-16M can be
> found here:
> https://gitlab.freedesktop.org/frankbinns/linux-firmware/-/tree/powervr
>
> A Vulkan driver that works with our downstream kernel driver has already been
> merged into Mesa [1][2]. Support for this new DRM driver is being maintained 
> in
> a merge request [3], with the branch located here:
> https://gitlab.freedesktop.org/frankbinns/mesa/-/tree/powervr-winsys
>
> Job stream formats are documented at:
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/f8d2b42ae65c2f16f36a43e0ae39d288431e4263/src/imagination/csbgen/rogue_kmd_stream.xml
>
> The Vulkan driver is progressing towards Vulkan 1.0. We're code complete, and
> are working towards passing conformance. The current combination of this 
> kernel
> driver with the Mesa Vulkan driver (powervr-mesa-next branch) achieves 88.3% 
> conformance.
>
> The code in this patch series, along with some of its history, can also be 
> found here:
> https://gitlab.freedesktop.org/frankbinns/powervr/-/tree/powervr-next
>
> This patch series has dependencies on a number of patches not yet merged. They
> are listed below:
>
> drm/sched: Convert drm scheduler to use a work queue rather than kthread:
>   
> https://lore.kernel.org/dri-devel/20230404002211.3611376-2-matthew.br...@intel.com/
> drm/sched: Move schedule policy to scheduler / entity:
>   
> https://lore.kernel.org/dri-devel/20230404002211.3611376-3-matthew.br...@intel.com/
> drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy:
>   
> https://lore.kernel.org/dri-devel/20230404002211.3611376-4-matthew.br...@intel.com/
> drm/sched: Start run wq before TDR in drm_sched_start:
>   
> https://lore.kernel.org/dri-devel/20230404002211.3611376-6-matthew.br...@intel.com/
> drm/sched: Submit job before starting TDR:
>   
> https://lore.kernel.org/dri-devel/20230404002211.3611376-7-matthew.br...@intel.com/
> drm/sched: Add helper to set TDR timeout:
>   
> https://lore.kernel.org/dri-devel/20230404002211.3611376-8-matthew.br...@intel.com/
>
> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15243
> [2] 
> https://gitlab.freedesktop.org/mesa/mesa/-/tree/main/src/imagination/vulkan
> [3] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15507
>
> High level summary of changes:
>
> v5:
> * Retrieve GPU device information from firmware image header
> * Address issues with DT binding and example DTS
> * Update VM code for upstream GPU VA manager
> * BOs are always zeroed on allocation
> * Update copyright
>
> v4:
> * Implemented hang recovery via firmware hard reset
> * Add support for partial render jobs
> * Move to a threaded IRQ
> * Remove unnecessary read/write and clock helpers
> * Remove device tree elements not relevant to AXE-1-16M
> * Clean up resource acquisition
> * Remove unused DT binding attributes
>
> v3:
> * Use drm_sched for scheduling
> * Use GPU VA manager
> * Use runtime PM
> * Use drm_gem_shmem
> * GPU watchdog and device loss handling
> * DT binding changes: remove unused attributes, add additionProperties:false
>
> v2:
> * Redesigned and simplified UAPI based on RFC feedback from XDC 2022
> * Support for transfer and partial render jobs
> * Support for timeline sync objects
>
> RFC v1: 
> https://lore.kernel.org/dri-devel/20220815165156.118212-1-sarah.wal...@imgtec.com/
>
> RFC v2: 
> https://lore.kernel.org/dri-devel/20230413103419.293493-1-sarah.wal...@imgtec.com/
>
> v3: 
> https://lore.kernel.org/dri-devel/20230613144800.52657-1-sarah.wal...@imgtec.com/
>
> v4: 
> https://lore.kernel.org/dri-devel/20230714142355.111382-1-sarah.wal...@imgtec.com/
>
> Matt Coster (1):
>   sizes.h: Add entries between 32G and 64T
>
> Sarah Walker (16):
>   dt-bindings: gpu: Add Imagination Technologies PowerVR GPU
>   drm/imagination/uapi: Add PowerVR driver UAPI
>   drm/imagination: Add skeleton PowerVR driver
>   drm/imagination: Get GPU resources
>   drm/imagination: Add GPU register and FWIF headers
>   drm/imagination: Add GPU ID parsing and firmware loading
>   drm/imagination: Add GEM and VM related code
>   drm/imagination: Implement power management
>   drm/imagination: Implement 

Re: [PATCH v2 00/11] Fix typos, comments and copyright

2023-08-23 Thread Bjorn Helgaas
On Wed, Aug 09, 2023 at 06:34:01AM +0800, Sui Jingfeng wrote:
> From: Sui Jingfeng 
> 
> v1:
>   * Various improvements.
> v2:
>   * More fixes, optimizations and improvements.
> 
> Sui Jingfeng (11):
>   PCI/VGA: Use unsigned type for the io_state variable

>   PCI: Add the pci_get_class_masked() helper
>   PCI/VGA: Deal with VGA class devices

I dropped these two patches, at least for now.  There's no other use
of pci_get_class_masked(), and the VGA use seems to be mostly
optimization and possibly some behavior change that isn't 100% clear
yet.  If it's important, we can look at it again later.

>   PCI/VGA: Drop the inline in the vga_update_device_decodes() function.
>   PCI/VGA: Move the new_state assignment out of the loop
>   PCI/VGA: Fix two typos in the comments of pci_notify()
>   PCI/VGA: vga_client_register() return -ENODEV on failure, not -1
>   PCI/VGA: Fix a typo to the comment of vga_default
>   PCI/VGA: Fix a typo to the comments in vga_str_to_iostate() function
>   PCI/VGA: Tidy up the code and comment format
>   PCI/VGA: Replace full MIT license text with SPDX identifier
> 
>  drivers/pci/search.c   |  30 ++
>  drivers/pci/vgaarb.c   | 233 +
>  include/linux/pci.h|   7 ++
>  include/linux/vgaarb.h |  27 +
>  4 files changed, 185 insertions(+), 112 deletions(-)

I applied the rest of the patches on pci/vga for v6.5.

I updated the commit logs and tweaked some of the patches.

I squashed all the typo fixes together and added several more that I
had done previously but not posted.  The diff between your original v2
posting and the current pci/vga branch is attached.  Most of it is
additional typo fixes, but if you look closely you'll see:

  - The omission of the pci_get_class_masked() patches (in
vga_arbiter_add_pci_device(), pci_notify(), vga_arb_device_init())

  - The tweaks I did to:

  vga_update_device_decodes()
  vga_client_register()
  vga_arbiter_notify_clients()

I dropped the Reviewed-bys from the typo fixes because I didn't want
to extend them to everything that got squashed together.  Happy to add
them back if anybody wants to look again.

The branch is at 
https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=vga

Bjorn
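
For reference, a caller of the dropped helper would have looked roughly like
this (hypothetical usage sketch based only on the signature in the diff below;
the ~0xff mask is an assumption matching the VGA use case of ignoring the
programming-interface byte):

	struct pci_dev *pdev = NULL;

	/* Walk all VGA-class devices regardless of prog-if.  Each returned
	 * device holds a reference, which the next call drops. */
	while ((pdev = pci_get_class_masked(PCI_CLASS_DISPLAY_VGA << 8,
					    ~0xff, pdev)) != NULL) {
		/* inspect pdev */
	}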

diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index f1c15aea868b..b4c138a6ec02 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -334,36 +334,6 @@ struct pci_dev *pci_get_device(unsigned int vendor, 
unsigned int device,
 }
 EXPORT_SYMBOL(pci_get_device);
 
-/**
- * pci_get_class_masked - begin or continue searching for a PCI device by 
class and mask
- * @class: search for a PCI device with this class designation
- * @from: Previous PCI device found in search, or %NULL for new search.
- *
- * Iterates through the list of known PCI devices.  If a PCI device is
- * found with a matching @class, the reference count to the device is
- * incremented and a pointer to its device structure is returned.
- * Otherwise, %NULL is returned.
- * A new search is initiated by passing %NULL as the @from argument.
- * Otherwise if @from is not %NULL, searches continue from next device
- * on the global list.  The reference count for @from is always decremented
- * if it is not %NULL.
- */
-struct pci_dev *pci_get_class_masked(unsigned int class, unsigned int mask,
-struct pci_dev *from)
-{
-   struct pci_device_id id = {
-   .vendor = PCI_ANY_ID,
-   .device = PCI_ANY_ID,
-   .subvendor = PCI_ANY_ID,
-   .subdevice = PCI_ANY_ID,
-   .class_mask = mask,
-   .class = class,
-   };
-
-   return pci_get_dev_by_id(&id, from);
-}
-EXPORT_SYMBOL(pci_get_class_masked);
-
 /**
  * pci_get_class - begin or continue searching for a PCI device by class
  * @class: search for a PCI device with this class designation
diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index a2f6e0e6b634..5e6b1eb54c64 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: MIT
 /*
- * vgaarb.c: Implements the VGA arbitration. For details refer to
+ * vgaarb.c: Implements VGA arbitration. For details refer to
  * Documentation/gpu/vgaarbiter.rst
  *
  * (C) Copyright 2005 Benjamin Herrenschmidt 
@@ -42,9 +42,9 @@ static void vga_arbiter_notify_clients(void);
 struct vga_device {
struct list_head list;
struct pci_dev *pdev;
-   unsigned int decodes;   /* what does it decodes */
-   unsigned int owns;  /* what does it owns */
-   unsigned int locks; /* what does it locks */
+   unsigned int decodes;   /* what it decodes */
+   unsigned int owns;  /* what it owns */
+   unsigned int locks; /* what it locks */
unsigned int io_lock_cnt;   /* legacy IO lock count */
unsigned int mem_lock_cnt;  /* legacy MEM lock count */

[Bug 217664] Laptop doesnt wake up from suspend mode.

2023-08-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=217664

--- Comment #25 from popus_czy_to_ty (pentelja...@o2.pl) ---
sudo add-apt-repository ppa:cappelikan/ppa
sudo apt update
sudo apt install mainline
sudo mainline



lsd@Crawler-E25:~$ sudo mainline install 6.4.11
mainline 1.4.8
Updating Kernels...
Downloading 6.4.11
Installing 6.4.11

After this, and a reformat, I'm landing in the initramfs on a fresh Kubuntu
23.04 for some reason.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[PATCH v9 3/3] dma-buf/sw_sync: Add fence deadline support

2023-08-23 Thread Rob Clark
From: Rob Clark 

This consists of simply storing the most recent deadline, and adding an
ioctl to retrieve the deadline.  This can be used in conjunction with
the SET_DEADLINE ioctl on a fence fd for testing.  Ie. create various
sw_sync fences, merge them into a fence-array, set deadline on the
fence-array and confirm that it is propagated properly to each fence.
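
For illustration, the test flow from userspace could look roughly like this
(sketch only, error handling elided; assumes the sw_sync debugfs node, the
UAPI added below, and the SYNC_IOC_MERGE / SYNC_IOC_SET_DEADLINE ioctls from
sync_file):

	int tl = open("/sys/kernel/debug/sync/sw_sync", O_RDWR);
	struct sw_sync_create_fence_data a = { .value = 1 };
	struct sw_sync_get_deadline get = { 0 };

	ioctl(tl, SW_SYNC_IOC_CREATE_FENCE, &a);
	/* ... merge a.fence with other fences via SYNC_IOC_MERGE, then set a
	 * deadline on the merged fd with SYNC_IOC_SET_DEADLINE ... */

	get.fence_fd = a.fence;
	ioctl(tl, SW_SYNC_GET_DEADLINE, &get);
	/* get.deadline_ns now holds the earliest deadline propagated to 'a';
	 * the ioctl fails with ENOENT if no deadline was ever set. */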

v2: Switch UABI to express deadline as u64
v3: More verbose UAPI docs, show how to convert from timespec
v4: Better comments, track the soonest deadline, as a normal fence
implementation would, return an error if no deadline set.

Signed-off-by: Rob Clark 
Reviewed-by: Christian König 
Acked-by: Pekka Paalanen 
---
 drivers/dma-buf/sw_sync.c| 82 
 drivers/dma-buf/sync_debug.h |  2 +
 2 files changed, 84 insertions(+)

diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index f0a35277fd84..c353029789cf 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -52,12 +52,33 @@ struct sw_sync_create_fence_data {
__s32   fence; /* fd of new fence */
 };
 
+/**
+ * struct sw_sync_get_deadline - get the deadline hint of a sw_sync fence
+ * @deadline_ns: absolute time of the deadline
+ * @pad:   must be zero
+ * @fence_fd:  the sw_sync fence fd (in)
+ *
+ * Return the earliest deadline set on the fence.  The timebase for the
+ * deadline is CLOCK_MONOTONIC (same as vblank).  If there is no deadline
+ * set on the fence, this ioctl will return -ENOENT.
+ */
+struct sw_sync_get_deadline {
+   __u64   deadline_ns;
+   __u32   pad;
+   __s32   fence_fd;
+};
+
 #define SW_SYNC_IOC_MAGIC  'W'
 
 #define SW_SYNC_IOC_CREATE_FENCE   _IOWR(SW_SYNC_IOC_MAGIC, 0,\
struct sw_sync_create_fence_data)
 
 #define SW_SYNC_IOC_INC			_IOW(SW_SYNC_IOC_MAGIC, 1, __u32)
+#define SW_SYNC_GET_DEADLINE   _IOWR(SW_SYNC_IOC_MAGIC, 2, \
+   struct sw_sync_get_deadline)
+
+
+#define SW_SYNC_HAS_DEADLINE_BIT   DMA_FENCE_FLAG_USER_BITS
 
 static const struct dma_fence_ops timeline_fence_ops;
 
@@ -171,6 +192,22 @@ static void timeline_fence_timeline_value_str(struct 
dma_fence *fence,
snprintf(str, size, "%d", parent->value);
 }
 
+static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t 
deadline)
+{
+   struct sync_pt *pt = dma_fence_to_sync_pt(fence);
+   unsigned long flags;
+
+   spin_lock_irqsave(fence->lock, flags);
+   if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &pt->flags)) {
+   if (ktime_before(deadline, pt->deadline))
+   pt->deadline = deadline;
+   } else {
+   pt->deadline = deadline;
+   __set_bit(SW_SYNC_HAS_DEADLINE_BIT, &pt->flags);
+   }
+   spin_unlock_irqrestore(fence->lock, flags);
+}
+
 static const struct dma_fence_ops timeline_fence_ops = {
.get_driver_name = timeline_fence_get_driver_name,
.get_timeline_name = timeline_fence_get_timeline_name,
@@ -179,6 +216,7 @@ static const struct dma_fence_ops timeline_fence_ops = {
.release = timeline_fence_release,
.fence_value_str = timeline_fence_value_str,
.timeline_value_str = timeline_fence_timeline_value_str,
+   .set_deadline = timeline_fence_set_deadline,
 };
 
 /**
@@ -387,6 +425,47 @@ static long sw_sync_ioctl_inc(struct sync_timeline *obj, 
unsigned long arg)
return 0;
 }
 
+static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long 
arg)
+{
+   struct sw_sync_get_deadline data;
+   struct dma_fence *fence;
+   unsigned long flags;
+   struct sync_pt *pt;
+   int ret = 0;
+
+   if (copy_from_user(&data, (void __user *)arg, sizeof(data)))
+   return -EFAULT;
+
+   if (data.deadline_ns || data.pad)
+   return -EINVAL;
+
+   fence = sync_file_get_fence(data.fence_fd);
+   if (!fence)
+   return -EINVAL;
+
+   pt = dma_fence_to_sync_pt(fence);
+   if (!pt)
+   return -EINVAL;
+
+   spin_lock_irqsave(fence->lock, flags);
+   if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &pt->flags)) {
+   data.deadline_ns = ktime_to_ns(pt->deadline);
+   } else {
+   ret = -ENOENT;
+   }
+   spin_unlock_irqrestore(fence->lock, flags);
+
+   dma_fence_put(fence);
+
+   if (ret)
+   return ret;
+
+   if (copy_to_user((void __user *)arg, &data, sizeof(data)))
+   return -EFAULT;
+
+   return 0;
+}
+
 static long sw_sync_ioctl(struct file *file, unsigned int cmd,
  unsigned long arg)
 {
@@ -399,6 +478,9 @@ static long sw_sync_ioctl(struct file *file, unsigned int 
cmd,
case SW_SYNC_IOC_INC:
return sw_sync_ioctl_inc(obj, arg);
 
+   case SW_SYNC_GET_DEADLINE:
+   return sw_sync_ioctl_get_deadline(obj, arg);
+
default:
return -ENOTTY;
}
diff --git 

[PATCH v9 1/3] drm/syncobj: Add deadline support for syncobj waits

2023-08-23 Thread Rob Clark
From: Rob Clark 

Add a new flag to let userspace provide a deadline as a hint for syncobj
and timeline waits.  This gives a hint to the driver signaling the
backing fences about how soon userspace needs it to complete work, so it
can adjust GPU frequency accordingly.  An immediate deadline can be
given to provide something equivalent to i915 "wait boost".
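
As a rough userspace sketch (flag/field names per this patch, with
deadline_nsec from the v4 note below; fd, handle and the *_abs_ns values are
placeholders, drmIoctl() is the usual libdrm wrapper, and both times are
absolute CLOCK_MONOTONIC nanoseconds):

	/* probe: count_handles == 0 succeeds only if the kernel knows the flag */
	struct drm_syncobj_wait wait = {
		.flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE,
	};
	bool has_deadline = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait) == 0;

	/* actual wait with a deadline hint */
	wait.handles = (__u64)(uintptr_t)&handle;
	wait.count_handles = 1;
	wait.timeout_nsec = timeout_abs_ns;
	wait.deadline_nsec = deadline_abs_ns;
	drmIoctl(fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait);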

v2: Use absolute u64 ns value for deadline hint, drop cap and driver
feature flag in favor of allowing count_handles==0 as a way for
userspace to probe kernel for support of new flag
v3: More verbose comments about UAPI
v4: Fix negative zero, s/deadline_ns/deadline_nsec/ for consistency with
existing ioctl struct fields
v5: Comment/description typo fixes

Signed-off-by: Rob Clark 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/drm_syncobj.c | 64 ---
 include/uapi/drm/drm.h| 17 ++
 2 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 0c2be8360525..3f86e2b84200 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -126,6 +126,11 @@
  * synchronize between the two.
  * This requirement is inherited from the Vulkan fence API.
  *
+ * If &DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE is set, the ioctl will also set
+ * a fence deadline hint on the backing fences before waiting, to provide the
+ * fence signaler with an appropriate sense of urgency.  The deadline is
+ * specified as an absolute &CLOCK_MONOTONIC value in units of ns.
+ *
 * Similarly, &DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT takes an array of syncobj
  * handles as well as an array of u64 points and does a host-side wait on all
  * of syncobj fences at the given points simultaneously.
@@ -973,7 +978,8 @@ static signed long drm_syncobj_array_wait_timeout(struct 
drm_syncobj **syncobjs,
  uint32_t count,
  uint32_t flags,
  signed long timeout,
- uint32_t *idx)
+ uint32_t *idx,
+ ktime_t *deadline)
 {
struct syncobj_wait_entry *entries;
struct dma_fence *fence;
@@ -1053,6 +1059,15 @@ static signed long drm_syncobj_array_wait_timeout(struct 
drm_syncobj **syncobjs,
drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]);
}
 
+   if (deadline) {
+   for (i = 0; i < count; ++i) {
+   fence = entries[i].fence;
+   if (!fence)
+   continue;
+   dma_fence_set_deadline(fence, *deadline);
+   }
+   }
+
do {
set_current_state(TASK_INTERRUPTIBLE);
 
@@ -1151,7 +1166,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
  struct drm_file *file_private,
  struct drm_syncobj_wait *wait,
  struct drm_syncobj_timeline_wait 
*timeline_wait,
- struct drm_syncobj **syncobjs, bool timeline)
+ struct drm_syncobj **syncobjs, bool timeline,
+ ktime_t *deadline)
 {
signed long timeout = 0;
uint32_t first = ~0;
@@ -1162,7 +1178,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
 NULL,
 wait->count_handles,
 wait->flags,
-timeout, &first);
+timeout, &first,
+deadline);
if (timeout < 0)
return timeout;
wait->first_signaled = first;
@@ -1172,7 +1189,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
 
u64_to_user_ptr(timeline_wait->points),
 
timeline_wait->count_handles,
 timeline_wait->flags,
-timeout, &first);
+timeout, &first,
+deadline);
if (timeout < 0)
return timeout;
timeline_wait->first_signaled = first;
@@ -1243,17 +1261,22 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void 
*data,
 {
struct drm_syncobj_wait *args = data;
struct drm_syncobj **syncobjs;
+   unsigned possible_flags;
+   ktime_t t, *tp = 

[PATCH v9 2/3] dma-buf/sync_file: Add SET_DEADLINE ioctl

2023-08-23 Thread Rob Clark
From: Rob Clark 

The initial purpose is for igt tests, but this would also be useful for
compositors that wait until close to vblank deadline to make decisions
about which frame to show.

The igt tests can be found at:

https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline
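
A compositor-style usage sketch (assumes the SYNC_IOC_SET_DEADLINE ioctl and
struct sync_set_deadline added below; ns_until_vblank is a placeholder the
caller would compute, error handling elided):

	struct sync_set_deadline ts = { 0 };
	struct timespec t;

	clock_gettime(CLOCK_MONOTONIC, &t);
	/* hint the signaler to finish ahead of the upcoming vblank */
	ts.deadline_ns = (t.tv_sec * 1000000000ULL) + t.tv_nsec + ns_until_vblank;

	ioctl(fence_fd, SYNC_IOC_SET_DEADLINE, &ts);	/* fd from a sync_file */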

v2: Clarify the timebase, add link to igt tests
v3: Use u64 value in ns to express deadline.
v4: More doc

Signed-off-by: Rob Clark 
Acked-by: Pekka Paalanen 
---
 drivers/dma-buf/dma-fence.c|  3 ++-
 drivers/dma-buf/sync_file.c| 19 +++
 include/uapi/linux/sync_file.h | 22 ++
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index f177c56269bb..74e36f6d05b0 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -933,7 +933,8 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
  *   the GPU's devfreq to reduce frequency, when in fact the opposite is what 
is
  *   needed.
  *
- * To this end, deadline hint(s) can be set on a &dma_fence via 
&dma_fence_set_deadline.
+ * To this end, deadline hint(s) can be set on a &dma_fence via 
&dma_fence_set_deadline
+ * (or indirectly via userspace facing ioctls like &sync_set_deadline).
  * The deadline hint provides a way for the waiting driver, or userspace, to
  * convey an appropriate sense of urgency to the signaling driver.
  *
diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index af57799c86ce..418021cfb87c 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -350,6 +350,22 @@ static long sync_file_ioctl_fence_info(struct sync_file 
*sync_file,
return ret;
 }
 
+static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
+   unsigned long arg)
+{
+   struct sync_set_deadline ts;
+
+   if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
+   return -EFAULT;
+
+   if (ts.pad)
+   return -EINVAL;
+
+   dma_fence_set_deadline(sync_file->fence, ns_to_ktime(ts.deadline_ns));
+
+   return 0;
+}
+
 static long sync_file_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
 {
@@ -362,6 +378,9 @@ static long sync_file_ioctl(struct file *file, unsigned int 
cmd,
case SYNC_IOC_FILE_INFO:
return sync_file_ioctl_fence_info(sync_file, arg);
 
+   case SYNC_IOC_SET_DEADLINE:
+   return sync_file_ioctl_set_deadline(sync_file, arg);
+
default:
return -ENOTTY;
}
diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
index ff0a931833e2..ff1f38889dcf 100644
--- a/include/uapi/linux/sync_file.h
+++ b/include/uapi/linux/sync_file.h
@@ -76,6 +76,27 @@ struct sync_file_info {
__u64   sync_fence_info;
 };
 
+/**
+ * struct sync_set_deadline - SYNC_IOC_SET_DEADLINE - set a deadline hint on a 
fence
+ * @deadline_ns: absolute time of the deadline
+ * @pad:   must be zero
+ *
+ * Allows userspace to set a deadline on a fence, see &dma_fence_set_deadline
+ *
+ * The timebase for the deadline is CLOCK_MONOTONIC (same as vblank).  For
+ * example
+ *
+ * clock_gettime(CLOCK_MONOTONIC, &t);
+ * deadline_ns = (t.tv_sec * 1000000000L) + t.tv_nsec + ns_until_deadline
+ */
+struct sync_set_deadline {
+   __u64   deadline_ns;
+   /* Not strictly needed for alignment but gives some possibility
+* for future extension:
+*/
+   __u64   pad;
+};
+
 #define SYNC_IOC_MAGIC '>'
 
 /*
@@ -87,5 +108,6 @@ struct sync_file_info {
 
 #define SYNC_IOC_MERGE _IOWR(SYNC_IOC_MAGIC, 3, struct sync_merge_data)
 #define SYNC_IOC_FILE_INFO _IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
+#define SYNC_IOC_SET_DEADLINE  _IOW(SYNC_IOC_MAGIC, 5, struct 
sync_set_deadline)
 
 #endif /* _UAPI_LINUX_SYNC_H */
-- 
2.41.0



[PATCH v9 0/3] dma-fence: Deadline awareness (uabi edition)

2023-08-23 Thread Rob Clark
From: Rob Clark 

This is a re-post of the remaining patches from:
https://patchwork.freedesktop.org/series/114490/

Part of the hold-up of the remaining uabi patches was compositor
support, but now an MR for kwin exists:

  https://invent.kde.org/plasma/kwin/-/merge_requests/4358

The syncobj userspace is:

  https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21973

v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation
to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to
support igt tests and userspace compositors.
v4: Rebase, address various comments, and add syncobj deadline
support, and sync_file EPOLLPRI based on experience with perf/
freq issues with clvk compute workloads on i915 (anv)
v5: Clarify that this is a hint as opposed to a more hard deadline
guarantee, switch to using u64 ns values in UABI (still absolute
CLOCK_MONOTONIC values), drop syncobj related cap and driver
feature flag in favor of allowing count_handles==0 for probing
kernel support.
v6: Re-work vblank helper to calculate time of _start_ of vblank,
and work correctly if the last vblank event was more than a
frame ago.  Add (mostly unrelated) drm/msm patch which also
uses the vblank helper.  Use dma_fence_chain_contained().  More
verbose syncobj UABI comments.  Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v7: Fix kbuild complaints about vblank helper.  Add more docs.
v8: Add patch to surface sync_file UAPI, and more docs updates.
v9: Repost the remaining patches that expose new uabi to userspace.

Rob Clark (3):
  drm/syncobj: Add deadline support for syncobj waits
  dma-buf/sync_file: Add SET_DEADLINE ioctl
  dma-buf/sw_sync: Add fence deadline support

 drivers/dma-buf/dma-fence.c|  3 +-
 drivers/dma-buf/sw_sync.c  | 82 ++
 drivers/dma-buf/sync_debug.h   |  2 +
 drivers/dma-buf/sync_file.c| 19 
 drivers/gpu/drm/drm_syncobj.c  | 64 --
 include/uapi/drm/drm.h | 17 +++
 include/uapi/linux/sync_file.h | 22 +
 7 files changed, 195 insertions(+), 14 deletions(-)

-- 
2.41.0



Re: [PATCH v2] drm/amdgpu: register a dirty framebuffer callback for fbcon

2023-08-23 Thread Hamza Mahfooz

On 8/23/23 16:51, Alex Deucher wrote:

@Mahfooz, Hamza
  can you respin with the NULL check?


sure.



Alex

On Wed, Aug 16, 2023 at 10:25 AM Christian König
 wrote:


On 16.08.23 at 15:41, Hamza Mahfooz wrote:


On 8/16/23 01:55, Christian König wrote:



On 15.08.23 at 19:26, Hamza Mahfooz wrote:

fbcon requires that we implement &drm_framebuffer_funcs.dirty.
Otherwise, the framebuffer might take a while to flush (which would
manifest as noticeable lag). However, we can't enable this callback for
non-fbcon cases since it might cause too many atomic commits to be made
at once. So, implement amdgpu_dirtyfb() and only enable it for fbcon
framebuffers on devices that support atomic KMS.

Cc: Aurabindo Pillai 
Cc: Mario Limonciello 
Cc: sta...@vger.kernel.org # 6.1+
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2519
Signed-off-by: Hamza Mahfooz 
---
v2: update variable names
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 26
-
   1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index d20dd3f852fc..d3b59f99cb7c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -38,6 +38,8 @@
   #include 
   #include 
   #include 
+#include 
+#include 
   #include 
   #include 
   #include 
@@ -532,11 +534,29 @@ bool amdgpu_display_ddc_probe(struct
amdgpu_connector *amdgpu_connector,
   return true;
   }
+static int amdgpu_dirtyfb(struct drm_framebuffer *fb, struct
drm_file *file,
+  unsigned int flags, unsigned int color,
+  struct drm_clip_rect *clips, unsigned int num_clips)
+{
+
+if (strcmp(fb->comm, "[fbcon]"))
+return -ENOSYS;


Once more to the v2 of this patch: Tests like those are a pretty big
NO-GO for upstreaming.


On closer inspection it is actually sufficient to check if `file` is
NULL here (since it means that the request isn't from userspace). So, do
you think that would be palatable for upstream?


That's certainly better than doing a string compare, but I'm not sure if
that's sufficient.

In general drivers shouldn't have any special handling for fbcon.

You should probably have Thomas Zimmermann  take a
look at this.

Regards,
Christian.





Regards,
Christian.


+
+return drm_atomic_helper_dirtyfb(fb, file, flags, color, clips,
+ num_clips);
+}
+
   static const struct drm_framebuffer_funcs amdgpu_fb_funcs = {
   .destroy = drm_gem_fb_destroy,
   .create_handle = drm_gem_fb_create_handle,
   };
+static const struct drm_framebuffer_funcs amdgpu_fb_funcs_atomic = {
+.destroy = drm_gem_fb_destroy,
+.create_handle = drm_gem_fb_create_handle,
+.dirty = amdgpu_dirtyfb
+};
+
   uint32_t amdgpu_display_supported_domains(struct amdgpu_device *adev,
 uint64_t bo_flags)
   {
@@ -1139,7 +1159,11 @@ static int
amdgpu_display_gem_fb_verify_and_init(struct drm_device *dev,
   if (ret)
   goto err;
-ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs);
+if (drm_drv_uses_atomic_modeset(dev))
+ret = drm_framebuffer_init(dev, &rfb->base,
+   &amdgpu_fb_funcs_atomic);
+else
+ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs);
   if (ret)
   goto err;





--
Hamza



Re: [PATCH v5 03/11] PM / QoS: Fix constraints alloc vs reclaim locking

2023-08-23 Thread Rob Clark
On Tue, Aug 22, 2023 at 11:48 AM Rafael J. Wysocki  wrote:
>
> On Tue, Aug 22, 2023 at 8:02 PM Rob Clark  wrote:
> >
> > From: Rob Clark 
> >
> > In the process of adding lockdep annotation for drm GPU scheduler's
> > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > this lockdep splat:
> >
> >==
> >WARNING: possible circular locking dependency detected
> >6.2.0-rc8-debug+ #558 Tainted: GW
> >--
> >ring0/125 is trying to acquire lock:
> >ffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: 
> > dev_pm_qos_update_request+0x38/0x68
> >
> >but task is already holding lock:
> >ff8087239208 (&gpu->active_lock){+.+.}-{3:3}, at: 
> > msm_gpu_submit+0xec/0x178
> >
> >which lock already depends on the new lock.
> >
> >the existing dependency chain (in reverse order) is:
> >
> >-> #4 (&gpu->active_lock){+.+.}-{3:3}:
> >   __mutex_lock+0xcc/0x3c8
> >   mutex_lock_nested+0x30/0x44
> >   msm_gpu_submit+0xec/0x178
> >   msm_job_run+0x78/0x150
> >   drm_sched_main+0x290/0x370
> >   kthread+0xf0/0x100
> >   ret_from_fork+0x10/0x20
> >
> >-> #3 (dma_fence_map){++++}-{0:0}:
> >   __dma_fence_might_wait+0x74/0xc0
> >   dma_resv_lockdep+0x1f4/0x2f4
> >   do_one_initcall+0x104/0x2bc
> >   kernel_init_freeable+0x344/0x34c
> >   kernel_init+0x30/0x134
> >   ret_from_fork+0x10/0x20
> >
> >-> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
> >   fs_reclaim_acquire+0x80/0xa8
> >   slab_pre_alloc_hook.constprop.0+0x40/0x25c
> >   __kmem_cache_alloc_node+0x60/0x1cc
> >   __kmalloc+0xd8/0x100
> >   topology_parse_cpu_capacity+0x8c/0x178
> >   get_cpu_for_node+0x88/0xc4
> >   parse_cluster+0x1b0/0x28c
> >   parse_cluster+0x8c/0x28c
> >   init_cpu_topology+0x168/0x188
> >   smp_prepare_cpus+0x24/0xf8
> >   kernel_init_freeable+0x18c/0x34c
> >   kernel_init+0x30/0x134
> >   ret_from_fork+0x10/0x20
> >
> >-> #1 (fs_reclaim){+.+.}-{0:0}:
> >   __fs_reclaim_acquire+0x3c/0x48
> >   fs_reclaim_acquire+0x54/0xa8
> >   slab_pre_alloc_hook.constprop.0+0x40/0x25c
> >   __kmem_cache_alloc_node+0x60/0x1cc
> >   kmalloc_trace+0x50/0xa8
> >   dev_pm_qos_constraints_allocate+0x38/0x100
> >   __dev_pm_qos_add_request+0xb0/0x1e8
> >   dev_pm_qos_add_request+0x58/0x80
> >   dev_pm_qos_expose_latency_limit+0x60/0x13c
> >   register_cpu+0x12c/0x130
> >   topology_init+0xac/0xbc
> >   do_one_initcall+0x104/0x2bc
> >   kernel_init_freeable+0x344/0x34c
> >   kernel_init+0x30/0x134
> >   ret_from_fork+0x10/0x20
> >
> >-> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
> >   __lock_acquire+0xe00/0x1060
> >   lock_acquire+0x1e0/0x2f8
> >   __mutex_lock+0xcc/0x3c8
> >   mutex_lock_nested+0x30/0x44
> >   dev_pm_qos_update_request+0x38/0x68
> >   msm_devfreq_boost+0x40/0x70
> >   msm_devfreq_active+0xc0/0xf0
> >   msm_gpu_submit+0x10c/0x178
> >   msm_job_run+0x78/0x150
> >   drm_sched_main+0x290/0x370
> >   kthread+0xf0/0x100
> >   ret_from_fork+0x10/0x20
> >
> >other info that might help us debug this:
> >
> >Chain exists of:
> >  dev_pm_qos_mtx --> dma_fence_map --> &gpu->active_lock
> >
> > Possible unsafe locking scenario:
> >
> >   CPU0CPU1
> >   
> >  lock(&gpu->active_lock);
> >   lock(dma_fence_map);
> >   lock(&gpu->active_lock);
> >  lock(dev_pm_qos_mtx);
> >
> > *** DEADLOCK ***
> >
> >3 locks held by ring0/123:
> > #0: ff8087251170 (&gpu->lock){+.+.}-{3:3}, at: 
> > msm_job_run+0x64/0x150
> > #1: ffd00b0e57e8 (dma_fence_map){++++}-{0:0}, at: 
> > msm_job_run+0x68/0x150
> > #2: ff8087251208 (&gpu->active_lock){+.+.}-{3:3}, at: 
> > msm_gpu_submit+0xec/0x178
> >
> >stack backtrace:
> >CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
> >Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
> >Call trace:
> > dump_backtrace.part.0+0xb4/0xf8
> > show_stack+0x20/0x38
> > dump_stack_lvl+0x9c/0xd0
> > dump_stack+0x18/0x34
> > print_circular_bug+0x1b4/0x1f0
> > check_noncircular+0x78/0xac
> > __lock_acquire+0xe00/0x1060
> > lock_acquire+0x1e0/0x2f8
> > __mutex_lock+0xcc/0x3c8
> > mutex_lock_nested+0x30/0x44
> > dev_pm_qos_update_request+0x38/0x68
> > msm_devfreq_boost+0x40/0x70
> > msm_devfreq_active+0xc0/0xf0
> > msm_gpu_submit+0x10c/0x178
> > msm_job_run+0x78/0x150
> > drm_sched_main+0x290/0x370
> > 

Re: [Patch v2 2/3] drm/mst: Refactor the flow for payload allocation/removement

2023-08-23 Thread Lyude Paul
Sure - you're also welcome to push the first two patches after fixing the
indentation if you'd like

On Wed, 2023-08-23 at 03:19 +, Lin, Wayne wrote:
> [Public]
> 
> Thanks, Lyude!
> Should I push another version to fix the indention?
> 
> > -Original Message-
> > From: Lyude Paul 
> > Sent: Friday, August 18, 2023 6:17 AM
> > To: Lin, Wayne ; dri-devel@lists.freedesktop.org;
> > amd-...@lists.freedesktop.org
> > Cc: jani.nik...@intel.com; ville.syrj...@linux.intel.com; 
> > imre.d...@intel.com;
> > Wentland, Harry ; Zuo, Jerry
> > 
> > Subject: Re: [Patch v2 2/3] drm/mst: Refactor the flow for payload
> > allocation/removement
> > 
> > Two small comments:
> > 
> > On Mon, 2023-08-07 at 10:56 +0800, Wayne Lin wrote:
> > > [Why]
> > > Today, the allocation/deallocation steps and status is a bit unclear.
> > > 
> > > For instance, payload->vc_start_slot = -1 stands for "the failure of
> > > updating DPCD payload ID table" and can also represent as "payload is
> > > not allocated yet". These two cases should be handled differently and
> > > hence better to distinguish them for better understanding.
> > > 
> > > [How]
> > > Define enumeration - ALLOCATION_LOCAL, ALLOCATION_DFP and
> > > ALLOCATION_REMOTE to distinguish different allocation status. Adjust
> > > the code to handle different status accordingly for better
> > > understanding the sequence of payload allocation and payload
> > removement.
> > > 
> > > For payload creation, the procedure should look like this:
> > > DRM part 1:
> > > * step 1 - update sw mst mgr variables to add a new payload
> > > * step 2 - add payload at immediate DFP DPCD payload table
> > > 
> > > Driver:
> > > * Add new payload in HW and sync up with DFP by sending ACT
> > > 
> > > DRM Part 2:
> > > * Send ALLOCATE_PAYLOAD sideband message to allocate bandwidth along
> > the
> > >   virtual channel.
> > > 
> > > And as for payload removement, the procedure should look like this:
> > > DRM part 1:
> > > * step 1 - Send ALLOCATE_PAYLOAD sideband message to release bandwidth
> > >along the virtual channel
> > > * step 2 - Clear payload allocation at immediate DFP DPCD payload
> > > table
> > > 
> > > Driver:
> > > * Remove the payload in HW and sync up with DFP by sending ACT
> > > 
> > > DRM part 2:
> > > * update sw mst mgr variables to remove the payload
> > > 
> > > Note that it's fine to fail when communicating with the branch device
> > > connected at the immediate downstream-facing port, but updating variables
> > > of SW mst mgr and HW configuration should be conducted anyway. That's
> > > because it's under commit_tail and we need to complete the HW
> > programming.
> > > 
> > > Changes since v1:
> > > * Remove the set-but-unused variable 'old_payload' in function
> > >   'nv50_msto_prepare'. Caught by kernel test robot 
> > > 
> > > Signed-off-by: Wayne Lin 
> > > ---
> > >  .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  20 ++-
> > > drivers/gpu/drm/display/drm_dp_mst_topology.c | 159 +++--
> > -
> > >  drivers/gpu/drm/i915/display/intel_dp_mst.c   |  18 +-
> > >  drivers/gpu/drm/nouveau/dispnv50/disp.c   |  21 +--
> > >  include/drm/display/drm_dp_mst_helper.h   |  23 ++-
> > >  5 files changed, 153 insertions(+), 88 deletions(-)
> > > 
> > > diff --git
> > a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > > index d9a482908380..9ad509279b0a 100644
> > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > > @@ -219,7 +219,7 @@ static void dm_helpers_construct_old_payload(
> > > /* Set correct time_slots/PBN of old payload.
> > >  * other fields (delete & dsc_enabled) in
> > >  * struct drm_dp_mst_atomic_payload are don't care fields
> > > -* while calling drm_dp_remove_payload()
> > > +* while calling drm_dp_remove_payload_part2()
> > >  */
> > > for (i = 0; i < current_link_table.stream_count; i++) {
> > > dc_alloc =
> > > @@ -262,13 +262,12 @@ bool
> > > dm_helpers_dp_mst_write_payload_allocation_table(
> > > 
> > > mst_mgr = >mst_root->mst_mgr;
> > > mst_state = to_drm_dp_mst_topology_state(mst_mgr->base.state);
> > > -
> > > -   /* It's OK for this to fail */
> > > new_payload = drm_atomic_get_mst_payload_state(mst_state,
> > > aconnector->mst_output_port);
> > > 
> > > if (enable) {
> > > target_payload = new_payload;
> > > 
> > > +   /* It's OK for this to fail */
> > > drm_dp_add_payload_part1(mst_mgr, mst_state,
> > new_payload);
> > > } else {
> > > /* construct old payload by VCPI*/
> > > @@ -276,7 +275,7 @@ bool
> > dm_helpers_dp_mst_write_payload_allocation_table(
> > > new_payload, _payload);
> > > target_payload = _payload;
> > > 
> > > -   drm_dp_remove_payload(mst_mgr, mst_state,
> > 

Re: [PATCH v2] drm/amdgpu: register a dirty framebuffer callback for fbcon

2023-08-23 Thread Alex Deucher
@Mahfooz, Hamza
 can you respin with the NULL check?

Alex

On Wed, Aug 16, 2023 at 10:25 AM Christian König
 wrote:
>
> On 16.08.23 at 15:41, Hamza Mahfooz wrote:
> >
> > On 8/16/23 01:55, Christian König wrote:
> >>
> >>
> >> On 15.08.23 at 19:26, Hamza Mahfooz wrote:
> >>> fbcon requires that we implement &drm_framebuffer_funcs.dirty.
> >>> Otherwise, the framebuffer might take a while to flush (which would
> >>> manifest as noticeable lag). However, we can't enable this callback for
> >>> non-fbcon cases since it might cause too many atomic commits to be made
> >>> at once. So, implement amdgpu_dirtyfb() and only enable it for fbcon
> >>> framebuffers on devices that support atomic KMS.
> >>>
> >>> Cc: Aurabindo Pillai 
> >>> Cc: Mario Limonciello 
> >>> Cc: sta...@vger.kernel.org # 6.1+
> >>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2519
> >>> Signed-off-by: Hamza Mahfooz 
> >>> ---
> >>> v2: update variable names
> >>> ---
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 26
> >>> -
> >>>   1 file changed, 25 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> >>> index d20dd3f852fc..d3b59f99cb7c 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> >>> @@ -38,6 +38,8 @@
> >>>   #include 
> >>>   #include 
> >>>   #include 
> >>> +#include 
> >>> +#include 
> >>>   #include 
> >>>   #include 
> >>>   #include 
> >>> @@ -532,11 +534,29 @@ bool amdgpu_display_ddc_probe(struct
> >>> amdgpu_connector *amdgpu_connector,
> >>>   return true;
> >>>   }
> >>> +static int amdgpu_dirtyfb(struct drm_framebuffer *fb, struct
> >>> drm_file *file,
> >>> +  unsigned int flags, unsigned int color,
> >>> +  struct drm_clip_rect *clips, unsigned int num_clips)
> >>> +{
> >>> +
> >>> +if (strcmp(fb->comm, "[fbcon]"))
> >>> +return -ENOSYS;
> >>
> >> Once more to the v2 of this patch: Tests like those are a pretty big
> >> NO-GO for upstreaming.
> >
> > On closer inspection it is actually sufficient to check if `file` is
> > NULL here (since it means that the request isn't from userspace). So, do
> > you think that would be palatable for upstream?
>
> That's certainly better than doing a string compare, but I'm not sure if
> that's sufficient.
>
> In general drivers shouldn't have any special handling for fbcon.
>
> You should probably have Thomas Zimmermann  take a
> look at this.
>
> Regards,
> Christian.
>
> >
> >>
> >> Regards,
> >> Christian.
> >>
> >>> +
> >>> +return drm_atomic_helper_dirtyfb(fb, file, flags, color, clips,
> >>> + num_clips);
> >>> +}
> >>> +
> >>>   static const struct drm_framebuffer_funcs amdgpu_fb_funcs = {
> >>>   .destroy = drm_gem_fb_destroy,
> >>>   .create_handle = drm_gem_fb_create_handle,
> >>>   };
> >>> +static const struct drm_framebuffer_funcs amdgpu_fb_funcs_atomic = {
> >>> +.destroy = drm_gem_fb_destroy,
> >>> +.create_handle = drm_gem_fb_create_handle,
> >>> +.dirty = amdgpu_dirtyfb
> >>> +};
> >>> +
> >>>   uint32_t amdgpu_display_supported_domains(struct amdgpu_device *adev,
> >>> uint64_t bo_flags)
> >>>   {
> >>> @@ -1139,7 +1159,11 @@ static int
> >>> amdgpu_display_gem_fb_verify_and_init(struct drm_device *dev,
> >>>   if (ret)
> >>>   goto err;
> >>> -ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs);
> >>> +if (drm_drv_uses_atomic_modeset(dev))
> >>> +ret = drm_framebuffer_init(dev, &rfb->base,
> >>> +   &amdgpu_fb_funcs_atomic);
> >>> +else
> >>> +ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs);
> >>>   if (ret)
> >>>   goto err;
> >>
>


Re: [PATCH drm-misc-next v2] drm/nouveau: uapi: don't pass NO_PREFETCH flag implicitly

2023-08-23 Thread Faith Ekstrand
On Wed, Aug 23, 2023 at 1:17 PM Danilo Krummrich  wrote:

> Currently, NO_PREFETCH is passed implicitly through
> drm_nouveau_gem_pushbuf_push::length and drm_nouveau_exec_push::va_len.
>
> Since this is a direct representation of how the HW is programmed it
> isn't really future proof for a uAPI. Hence, fix this up for the new
> uAPI and split up the va_len field of struct drm_nouveau_exec_push,
> such that we keep 32bit for va_len and 32bit for flags.
>
> For drm_nouveau_gem_pushbuf_push::length at least provide
> NOUVEAU_GEM_PUSHBUF_NO_PREFETCH to indicate the bit shift.
>
> While at it, fix up nv50_dma_push() as well, such that the caller
> doesn't need to encode the NO_PREFETCH flag into the length parameter.
>
> Signed-off-by: Danilo Krummrich 
>

Still

Reviewed-by: Faith Ekstrand 


> ---
> Changes in v2:
>   - dma: rename prefetch to no_prefetch in nv50_dma_push() (Faith)
>   - exec: print error message when pushbuf size larger max pushbuf size
> (Faith)
> ---
>  drivers/gpu/drm/nouveau/nouveau_dma.c  |  7 +--
>  drivers/gpu/drm/nouveau/nouveau_dma.h  |  8 ++--
>  drivers/gpu/drm/nouveau/nouveau_exec.c | 19 ---
>  drivers/gpu/drm/nouveau/nouveau_gem.c  |  6 --
>  include/uapi/drm/nouveau_drm.h |  8 +++-
>  5 files changed, 38 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.c
> b/drivers/gpu/drm/nouveau/nouveau_dma.c
> index b90cac6d5772..b01c029f3a90 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dma.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dma.c
> @@ -69,16 +69,19 @@ READ_GET(struct nouveau_channel *chan, uint64_t
> *prev_get, int *timeout)
>  }
>
>  void
> -nv50_dma_push(struct nouveau_channel *chan, u64 offset, int length)
> +nv50_dma_push(struct nouveau_channel *chan, u64 offset, u32 length,
> + bool no_prefetch)
>  {
> struct nvif_user *user = &chan->drm->client.device.user;
> struct nouveau_bo *pb = chan->push.buffer;
> int ip = (chan->dma.ib_put * 2) + chan->dma.ib_base;
>
> BUG_ON(chan->dma.ib_free < 1);
> +   WARN_ON(length > NV50_DMA_PUSH_MAX_LENGTH);
>
> nouveau_bo_wr32(pb, ip++, lower_32_bits(offset));
> -   nouveau_bo_wr32(pb, ip++, upper_32_bits(offset) | length << 8);
> +   nouveau_bo_wr32(pb, ip++, upper_32_bits(offset) | length << 8 |
> +   (no_prefetch ? (1 << 31) : 0));
>
> chan->dma.ib_put = (chan->dma.ib_put + 1) & chan->dma.ib_max;
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.h
> b/drivers/gpu/drm/nouveau/nouveau_dma.h
> index 035a709c7be1..1744d95b233e 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dma.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_dma.h
> @@ -31,7 +31,8 @@
>  #include "nouveau_chan.h"
>
>  int nouveau_dma_wait(struct nouveau_channel *, int slots, int size);
> -void nv50_dma_push(struct nouveau_channel *, u64 addr, int length);
> +void nv50_dma_push(struct nouveau_channel *, u64 addr, u32 length,
> +  bool no_prefetch);
>
>  /*
>   * There's a hw race condition where you can't jump to your PUT offset,
> @@ -45,6 +46,9 @@ void nv50_dma_push(struct nouveau_channel *, u64 addr,
> int length);
>   */
>  #define NOUVEAU_DMA_SKIPS (128 / 4)
>
> +/* Maximum push buffer size. */
> +#define NV50_DMA_PUSH_MAX_LENGTH 0x7fffff
> +
>  /* Object handles - for stuff that's doesn't use handle == oclass. */
>  enum {
> NvDmaFB = 0x80000002,
> @@ -89,7 +93,7 @@ FIRE_RING(struct nouveau_channel *chan)
>
> if (chan->dma.ib_max) {
> nv50_dma_push(chan, chan->push.addr + (chan->dma.put << 2),
> - (chan->dma.cur - chan->dma.put) << 2);
> + (chan->dma.cur - chan->dma.put) << 2, false);
> } else {
> WRITE_PUT(chan->dma.cur);
> }
> diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c
> b/drivers/gpu/drm/nouveau/nouveau_exec.c
> index 0f927adda4ed..a90c4cd8cbb2 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_exec.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
> @@ -164,8 +164,10 @@ nouveau_exec_job_run(struct nouveau_job *job)
> }
>
> for (i = 0; i < exec_job->push.count; i++) {
> -   nv50_dma_push(chan, exec_job->push.s[i].va,
> - exec_job->push.s[i].va_len);
> +   struct drm_nouveau_exec_push *p = _job->push.s[i];
> +   bool no_prefetch = p->flags &
> DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH;
> +
> +   nv50_dma_push(chan, p->va, p->va_len, no_prefetch);
> }
>
> ret = nouveau_fence_emit(fence, chan);
> @@ -223,7 +225,18 @@ nouveau_exec_job_init(struct nouveau_exec_job **pjob,
>  {
> struct nouveau_exec_job *job;
> struct nouveau_job_args args = {};
> -   int ret;
> +   int i, ret;
> +
> +   for (i = 0; i < __args->push.count; i++) {
> +   struct drm_nouveau_exec_push *p = &__args->push.s[i];
> +
> +   if (unlikely(p->va_len > 

Re: [PATCH -next] fbdev/core: Use list_for_each_entry() helper

2023-08-23 Thread Helge Deller

On 8/23/23 09:21, Jinjie Ruan wrote:

Convert list_for_each() to list_for_each_entry() so that the pos
list_head pointer and list_entry() call are no longer needed, which
can reduce a few lines of code. No functional change.
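
For illustration, the conversion pattern looks like this (generic sketch with
hypothetical struct/field names, not the actual fbdev hunk):

	/* before: manual list_head cursor plus list_entry() */
	struct list_head *pos;
	list_for_each(pos, &head) {
		struct item *it = list_entry(pos, struct item, node);
		use(it);
	}

	/* after: the iterator variable is the containing struct itself */
	struct item *it;
	list_for_each_entry(it, &head, node) {
		use(it);
	}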

Signed-off-by: Jinjie Ruan 


applied.

Thanks!
Helge



Re: [REGRESSION] HDMI connector detection broken in 6.3 on Intel(R) Celeron(R) N3060 integrated graphics

2023-08-23 Thread Imre Deak
On Mon, Aug 21, 2023 at 11:27:29AM +0200, Maxime Ripard wrote:
> On Tue, Aug 15, 2023 at 11:12:46AM +0300, Jani Nikula wrote:
> > On Mon, 14 Aug 2023, Imre Deak  wrote:
> > > On Sun, Aug 13, 2023 at 03:41:30PM +0200, Linux regression tracking 
> > > (Thorsten Leemhuis) wrote:
> > > Hi,
> > >
> > >> On 11.08.23 20:10, Mikhail Rudenko wrote:
> > >> > On 2023-08-11 at 08:45 +02, Thorsten Leemhuis 
> > >> >  wrote:
> > >> >> On 10.08.23 21:33, Mikhail Rudenko wrote:
> > >> >>> The following is a copy an issue I posted to drm/i915 gitlab [1] two
> > >> >>> months ago. I repost it to the mailing lists in hope that it will 
> > >> >>> help
> > >> >>> the right people pay attention to it.
> > >> >>
> > >> >> Thx for your report. Wonder why Dmitry (who authored a4e771729a51) or
> > >> >> Thomas (who committed it) didn't look into this, but maybe the i915
> > >> >> devs didn't forward the report to them.
> > >> 
> > >> For the record: they did, and Jani mentioned already. Sorry, should have
> > >> phrased this differently.
> > >> 
> > >> >> Let's see if these mails help. Just wondering: does reverting
> > >> >> a4e771729a51 from 6.5-rc5 or drm-tip help as well?
> > >> > 
> > >> > I've redone my tests with 6.5-rc5, and here are the results:
> > >> > (1) 6.5-rc5 -> still affected
> > >> > (2) 6.5-rc5 + revert a4e771729a51 -> not affected
> > >> > (3) 6.5-rc5 + two patches [1][2] suggested on i915 gitlab by @ideak -> 
> > >> > not affected (!)
> > >> > 
> > >> > Should we somehow tell regzbot about (3)?
> > >> 
> > >> That's good to know, thx. But the more important things are:
> > >> 
> > >> * When will those be merged? They are not yet in next yet afaics, so it
> > >> might take some time to mainline them, especially at this point of the
> > >> devel cycle. Imre, could you try to prod the right people so that these
> > >> are ideally upstreamed rather sooner than later, as they fix a 
> > >> regression?
> > >
> > > I think the patches ([1] and [2]) could be merged via the drm-intel-next
> > > (drm-intel-fixes) tree Cc'ing also stable. Jani, is this ok?
> > 
> > It's fine by me, but need drm-misc maintainer ack to merge [1] via
> > drm-intel.
> 
> That's fine for me

Thanks, I pushed the patches to drm-intel-next.

> Maxime




Re: [PATCH AUTOSEL 5.15 6/6] drm/amdkfd: ignore crat by default

2023-08-23 Thread Felix Kuehling

On 2023-08-22 11:41, Deucher, Alexander wrote:

[Public]


-Original Message-
From: Sasha Levin 
Sent: Tuesday, August 22, 2023 7:37 AM
To: linux-ker...@vger.kernel.org; sta...@vger.kernel.org
Cc: Deucher, Alexander ; Kuehling, Felix
; Koenig, Christian ;
Mike Lothian ; Sasha Levin ; Pan,
Xinhui ; airl...@gmail.com; dan...@ffwll.ch; amd-
g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 5.15 6/6] drm/amdkfd: ignore crat by default

From: Alex Deucher 

[ Upstream commit a6dea2d64ff92851e68cd4e20a35f6534286e016 ]

We are dropping the IOMMUv2 path, so no need to enable this.
It's often buggy on consumer platforms anyway.

This is not needed for stable.


I agree. I was about to comment in the 5.10 patch as well.

Regards,
  Felix




Alex


Reviewed-by: Felix Kuehling 
Acked-by: Christian König 
Tested-by: Mike Lothian 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
  drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index e574aa32a111d..46dfd9baeb013 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1523,11 +1523,7 @@ static bool kfd_ignore_crat(void)
   if (ignore_crat)
   return true;

-#ifndef KFD_SUPPORT_IOMMU_V2
   ret = true;
-#else
- ret = false;
-#endif

   return ret;
  }
--
2.40.1


Re: [PATCH] drm/bridge/analogix/anx78xx: Extend match data support for ID table

2023-08-23 Thread Doug Anderson
Hi,

On Wed, Aug 23, 2023 at 10:10 AM Andy Shevchenko
 wrote:
>
> > No. Please, do not remove the I2C ID table. It had already been
> > discussed a few years ago.
> >
> > > Yes, it make sense, as it saves some memory
>
> Okay, reading code a bit, it seems that it won't work with purely i2c
> ID matching.

OK, so you are in agreement that it would be OK to drop the I2C ID table?


> So the question here is "Do we want to allow enumeration via sysfs or not?"

Is there some pressing need for it? If not, I guess I'd tend to wait
until someone needs this support before adding it.

-Doug


[PATCH drm-misc-next v2] drm/nouveau: uapi: don't pass NO_PREFETCH flag implicitly

2023-08-23 Thread Danilo Krummrich
Currently, NO_PREFETCH is passed implicitly through
drm_nouveau_gem_pushbuf_push::length and drm_nouveau_exec_push::va_len.

Since this is a direct representation of how the HW is programmed it
isn't really future proof for a uAPI. Hence, fix this up for the new
uAPI and split up the va_len field of struct drm_nouveau_exec_push,
such that we keep 32bit for va_len and 32bit for flags.

For drm_nouveau_gem_pushbuf_push::length at least provide
NOUVEAU_GEM_PUSHBUF_NO_PREFETCH to indicate the bit shift.

While at it, fix up nv50_dma_push() as well, such that the caller
doesn't need to encode the NO_PREFETCH flag into the length parameter.
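
From the userspace side the flag is now explicit, e.g. (sketch based on the
uAPI change below; push_addr and push_len are placeholders):

	struct drm_nouveau_exec_push push = {
		.va     = push_addr,
		.va_len = push_len,	/* length only, no flag bits */
		.flags  = DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH,
	};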

Signed-off-by: Danilo Krummrich 
---
Changes in v2:
  - dma: rename prefetch to no_prefetch in nv50_dma_push() (Faith)
  - exec: print error message when pushbuf size larger max pushbuf size (Faith)
---
 drivers/gpu/drm/nouveau/nouveau_dma.c  |  7 +--
 drivers/gpu/drm/nouveau/nouveau_dma.h  |  8 ++--
 drivers/gpu/drm/nouveau/nouveau_exec.c | 19 ---
 drivers/gpu/drm/nouveau/nouveau_gem.c  |  6 --
 include/uapi/drm/nouveau_drm.h |  8 +++-
 5 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.c 
b/drivers/gpu/drm/nouveau/nouveau_dma.c
index b90cac6d5772..b01c029f3a90 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dma.c
@@ -69,16 +69,19 @@ READ_GET(struct nouveau_channel *chan, uint64_t *prev_get, 
int *timeout)
 }
 
 void
-nv50_dma_push(struct nouveau_channel *chan, u64 offset, int length)
+nv50_dma_push(struct nouveau_channel *chan, u64 offset, u32 length,
+ bool no_prefetch)
 {
struct nvif_user *user = &chan->drm->client.device.user;
struct nouveau_bo *pb = chan->push.buffer;
int ip = (chan->dma.ib_put * 2) + chan->dma.ib_base;
 
BUG_ON(chan->dma.ib_free < 1);
+   WARN_ON(length > NV50_DMA_PUSH_MAX_LENGTH);
 
nouveau_bo_wr32(pb, ip++, lower_32_bits(offset));
-   nouveau_bo_wr32(pb, ip++, upper_32_bits(offset) | length << 8);
+   nouveau_bo_wr32(pb, ip++, upper_32_bits(offset) | length << 8 |
+   (no_prefetch ? (1 << 31) : 0));
 
chan->dma.ib_put = (chan->dma.ib_put + 1) & chan->dma.ib_max;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.h 
b/drivers/gpu/drm/nouveau/nouveau_dma.h
index 035a709c7be1..1744d95b233e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dma.h
+++ b/drivers/gpu/drm/nouveau/nouveau_dma.h
@@ -31,7 +31,8 @@
 #include "nouveau_chan.h"
 
 int nouveau_dma_wait(struct nouveau_channel *, int slots, int size);
-void nv50_dma_push(struct nouveau_channel *, u64 addr, int length);
+void nv50_dma_push(struct nouveau_channel *, u64 addr, u32 length,
+  bool no_prefetch);
 
 /*
  * There's a hw race condition where you can't jump to your PUT offset,
@@ -45,6 +46,9 @@ void nv50_dma_push(struct nouveau_channel *, u64 addr, int 
length);
  */
 #define NOUVEAU_DMA_SKIPS (128 / 4)
 
+/* Maximum push buffer size. */
+#define NV50_DMA_PUSH_MAX_LENGTH 0x7fffff
+
 /* Object handles - for stuff that's doesn't use handle == oclass. */
 enum {
NvDmaFB = 0x80000002,
@@ -89,7 +93,7 @@ FIRE_RING(struct nouveau_channel *chan)
 
if (chan->dma.ib_max) {
nv50_dma_push(chan, chan->push.addr + (chan->dma.put << 2),
- (chan->dma.cur - chan->dma.put) << 2);
+ (chan->dma.cur - chan->dma.put) << 2, false);
} else {
WRITE_PUT(chan->dma.cur);
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c 
b/drivers/gpu/drm/nouveau/nouveau_exec.c
index 0f927adda4ed..a90c4cd8cbb2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_exec.c
+++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
@@ -164,8 +164,10 @@ nouveau_exec_job_run(struct nouveau_job *job)
}
 
for (i = 0; i < exec_job->push.count; i++) {
-   nv50_dma_push(chan, exec_job->push.s[i].va,
- exec_job->push.s[i].va_len);
+   struct drm_nouveau_exec_push *p = &exec_job->push.s[i];
+   bool no_prefetch = p->flags & DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH;
+
+   nv50_dma_push(chan, p->va, p->va_len, no_prefetch);
}
 
ret = nouveau_fence_emit(fence, chan);
@@ -223,7 +225,18 @@ nouveau_exec_job_init(struct nouveau_exec_job **pjob,
 {
struct nouveau_exec_job *job;
struct nouveau_job_args args = {};
-   int ret;
+   int i, ret;
+
+   for (i = 0; i < __args->push.count; i++) {
+   struct drm_nouveau_exec_push *p = &__args->push.s[i];
+
+   if (unlikely(p->va_len > NV50_DMA_PUSH_MAX_LENGTH)) {
+   NV_PRINTK(err, nouveau_cli(__args->file_priv),
+ "pushbuf size exceeds limit: 0x%x max 0x%x\n",
+ p->va_len, NV50_DMA_PUSH_MAX_LENGTH);
+   return -EINVAL;
+   }

Re: [PATCH] drm/bridge/analogix/anx78xx: Extend match data support for ID table

2023-08-23 Thread Andy Shevchenko
On Wed, Aug 23, 2023 at 8:14 PM Doug Anderson  wrote:
> On Wed, Aug 23, 2023 at 9:53 AM Andy Shevchenko
>  wrote:
> > On Wed, Aug 23, 2023 at 5:36 PM Biju Das  wrote:

...

> > No. Please, do not remove the I2C ID table. It had already been
> > discussed a few years ago.
>
> If you really want the table kept then it's no skin off my teeth. I
> just happened to see that nobody was responding to the patch and I was
> trying to be helpful. My analysis above showed that the I2C table must
> not be used, but if you feel strongly that we need to add code then
> feel free to provide a Reviewed-by tag to Biju's patch! :-)

Have you seen my reply to my reply?
I agree with your above analysis.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v7] drm/doc: Document DRM device reset expectations

2023-08-23 Thread André Almeida

Hi Rodrigo,

Em 23/08/2023 14:31, Rodrigo Vivi escreveu:

On Fri, Aug 18, 2023 at 05:06:42PM -0300, André Almeida wrote:

Create a section that specifies how to deal with DRM device resets for
kernel and userspace drivers.

Signed-off-by: André Almeida 

---

v7 changes:
  - s/application/graphical API context/ in the robustness part (Michel)
  - Grammar fixes (Randy)

v6: https://lore.kernel.org/lkml/20230815185710.159779-1-andrealm...@igalia.com/

v6 changes:
  - Due to substantial changes in the content, dropped Pekka's Acked-by
  - Grammar fixes (Randy)
  - Add paragraph about disabling device resets
  - Add note about integrating reset tracking in drm/sched
  - Add note that KMD should return failure for contexts affected by
resets and UMD should check for this
  - Add note about lack of consensus around what to do about non-robust
apps

v5: 
https://lore.kernel.org/dri-devel/20230627132323.115440-1-andrealm...@igalia.com/
---
  Documentation/gpu/drm-uapi.rst | 77 ++
  1 file changed, 77 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 65fb3036a580..3694bdb977f5 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -285,6 +285,83 @@ for GPU1 and GPU2 from different vendors, and a third 
handler for
  mmapped regular files. Threads cause additional pain with signal
  handling as well.
  
+Device reset
+============
+
+The GPU stack is really complex and is prone to errors, from hardware bugs,
+faulty applications and everything in between the many layers. Some errors
+require resetting the device in order to make the device usable again. This
+section describes the expectations for DRM and usermode drivers when a
+device resets and how to propagate the reset status.
+
+Device resets can not be disabled without tainting the kernel, which can lead to
+hanging the entire kernel through shrinkers/mmu_notifiers. Userspace's role in
+device resets is to propagate the message to the application and apply any
+special policy for blocking guilty applications, if any. A corollary is that
+debugging a hung GPU context requires hardware support to be able to preempt such
+a GPU context while it's stopped.
+
+Kernel Mode Driver
+------------------
+
+The KMD is responsible for checking if the device needs a reset, and for
+performing it as needed. Usually a hang is detected when a job gets stuck
+executing. The KMD should keep track of resets, because userspace can query the
+reset status for a specific context at any time. This is needed to propagate to
+the rest of the stack that a reset has happened. Currently, this is implemented
+by each driver separately, with no common DRM interface. Ideally this should be
+properly integrated into the DRM scheduler to provide a common ground for all
+drivers. After a reset, the KMD should reject new command submissions for
+affected contexts.


is there any consensus around what exactly 'affected contexts' might mean?
I see i915 pin-points only the context that was executing, with head pointing
at it, and doesn't blame the queued ones, while on Xe it looks like we are
blaming all the queued contexts. Not sure what other drivers are doing for the
'affected contexts'.



"Affected contexts" is a generic term indeed, giving the differences 
from each driver as you already pointed out. amdgpu also tends to affect 
all queued contexts during a reset. This wording was used to fit how 
different drivers works.
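
To make the per-driver nature of this query concrete, here is a minimal
userspace sketch against one existing interface, amdgpu's
AMDGPU_CTX_OP_QUERY_STATE2 (assumed uAPI as of this writing; other drivers
expose different ioctls, which is exactly the gap the document points out):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/amdgpu_drm.h>

/* Returns 1 if a reset touched this context, 0 if not, -1 on error. */
static int context_was_reset(int fd, uint32_t ctx_id)
{
	union drm_amdgpu_ctx args;

	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_CTX_OP_QUERY_STATE2;
	args.in.ctx_id = ctx_id;

	if (ioctl(fd, DRM_IOCTL_AMDGPU_CTX, &args))
		return -1;

	return !!(args.out.state.flags &
		  (AMDGPU_CTX_QUERY2_FLAGS_RESET |
		   AMDGPU_CTX_QUERY2_FLAGS_GUILTY));
}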


Re: [PATCH AUTOSEL 6.4 09/11] drm/amdkfd: ignore crat by default

2023-08-23 Thread Sasha Levin

On Tue, Aug 22, 2023 at 03:41:17PM +, Deucher, Alexander wrote:

[Public]


-Original Message-
From: Sasha Levin 
Sent: Tuesday, August 22, 2023 7:36 AM
To: linux-ker...@vger.kernel.org; sta...@vger.kernel.org
Cc: Deucher, Alexander ; Kuehling, Felix
; Koenig, Christian ;
Mike Lothian ; Sasha Levin ; Pan,
Xinhui ; airl...@gmail.com; dan...@ffwll.ch; amd-
g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 6.4 09/11] drm/amdkfd: ignore crat by default

From: Alex Deucher 

[ Upstream commit a6dea2d64ff92851e68cd4e20a35f6534286e016 ]

We are dropping the IOMMUv2 path, so no need to enable this.
It's often buggy on consumer platforms anyway.



This is not needed for stable.


I'll drop all the patches you've pointed out, thanks!

--
Thanks,
Sasha


Re: [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset

2023-08-23 Thread John Harrison

On 8/23/2023 09:00, Daniel Vetter wrote:

On Tue, Aug 22, 2023 at 11:53:24AM -0700, John Harrison wrote:

On 8/11/2023 11:20, Zhanjun Dong wrote:

This attempts to avoid a circular locking dependency between flushing delayed
work and intel_gt_reset.
When intel_gt_reset is called, the task will hold a lock.
To cancel delayed work here, the _sync version will also acquire a lock,
which might trigger the possible circular locking dependency warning.
When intel_gt_reset is called, the reset_in_progress flag will be set; add code
to check the flag and call the async version if a reset is in progress.

Signed-off-by: Zhanjun Dong
Cc: John Harrison
Cc: Andi Shyti
Cc: Daniel Vetter
---
   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++-
   1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a0e3ef1c65d2..600388c849f7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1359,7 +1359,16 @@ static void guc_enable_busyness_worker(struct intel_guc 
*guc)
   static void guc_cancel_busyness_worker(struct intel_guc *guc)
   {
-   cancel_delayed_work_sync(&guc->timestamp.work);
+   /*
+* When intel_gt_reset is called, the task will hold a lock.
+* To cancel delayed work here, the _sync version will also acquire a lock, which might
+* trigger the possible circular locking dependency warning.
+* Check the reset_in_progress flag, and call the async version if a reset is in progress.
+*/

This needs to explain in much more detail what is going on and why it is not
a problem. E.g.:

The busyness worker needs to be cancelled. In general that means
using the synchronous cancel version to ensure that an in-progress
worker will not keep executing beyond whatever is happening that
needs the cancel. E.g. suspend, driver unload, etc. However, in the
case of a reset, the synchronous version is not required and can
trigger a false deadlock detection warning.

The busyness worker takes the reset mutex to protect against resets
interfering with it. However, it does a trylock and bails out if the
reset lock is already acquired. Thus there is no actual deadlock or
other concern with the worker running concurrently with a reset. So
an asynchronous cancel is safe in the case of a reset rather than a
driver unload or suspend type operation. On the other hand, if the
cancel_sync version is used when a reset is in progress then the
mutex deadlock detection sees the mutex being acquired through
multiple paths and complains.

So just don't bother. That keeps the detection code happy and is
safe because of the trylock code described above.

So why do we even need to cancel anything if it doesn't do anything while
the reset is in progress?
It still needs to be cancelled. The worker only aborts if it is actively 
executing concurrently with the reset. It might not start to execute 
until after the reset has completed. And there is presumably a reason 
why the cancel is being called, a reason not necessarily related to 
resets at all. Leaving the worker to run arbitrarily after the driver is 
expecting it to be stopped will lead to much worse things than a fake 
lockdep splat, e.g. a use after free pointer deref.


John.



Just remove the cancel from the reset path as unneeded instead, and explain
why that's ok? Because that's defacto what the cancel_work with a
potential deadlock scenario for cancel_work_sync does, you either don't
need it at all, or the replacement creates a bug.
-Daniel



John.



+   if (guc_to_gt(guc)->uc.reset_in_progress)
+   cancel_delayed_work(&guc->timestamp.work);
+   else
+   cancel_delayed_work_sync(&guc->timestamp.work);
   }
   static void __reset_guc_busyness_stats(struct intel_guc *guc)
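
For reference, a minimal sketch of the trylock-and-bail pattern John
describes, with hypothetical names (my_guc, reset_lock, update_busyness_stats);
the real worker uses i915's own GT/reset helpers. The point is that the worker
never blocks on the reset lock, which is what makes the asynchronous cancel
safe during a reset:

#include <linux/mutex.h>
#include <linux/workqueue.h>

struct my_guc {
	struct delayed_work work;
	struct mutex *reset_lock;	/* points at the GT reset lock */
};

static void update_busyness_stats(struct my_guc *guc) { /* hypothetical */ }

static void busyness_worker_fn(struct work_struct *w)
{
	struct my_guc *guc = container_of(w, struct my_guc, work.work);

	/* Bail if a reset holds the lock; no blocking, hence no deadlock. */
	if (!mutex_trylock(guc->reset_lock))
		return;

	update_busyness_stats(guc);
	mutex_unlock(guc->reset_lock);
}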




Re: [PATCH v7] drm/doc: Document DRM device reset expectations

2023-08-23 Thread Rodrigo Vivi
On Fri, Aug 18, 2023 at 05:06:42PM -0300, André Almeida wrote:
> Create a section that specifies how to deal with DRM device resets for
> kernel and userspace drivers.
> 
> Signed-off-by: André Almeida 
> 
> ---
> 
> v7 changes:
>  - s/application/graphical API context/ in the robustness part (Michel)
>  - Grammar fixes (Randy)
> 
> v6: 
> https://lore.kernel.org/lkml/20230815185710.159779-1-andrealm...@igalia.com/
> 
> v6 changes:
>  - Due to substantial changes in the content, dropped Pekka's Acked-by
>  - Grammar fixes (Randy)
>  - Add paragraph about disabling device resets
>  - Add note about integrating reset tracking in drm/sched
>  - Add note that KMD should return failure for contexts affected by
>resets and UMD should check for this
>  - Add note about lack of consensus around what to do about non-robust
>apps
> 
> v5: 
> https://lore.kernel.org/dri-devel/20230627132323.115440-1-andrealm...@igalia.com/
> ---
>  Documentation/gpu/drm-uapi.rst | 77 ++
>  1 file changed, 77 insertions(+)
> 
> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> index 65fb3036a580..3694bdb977f5 100644
> --- a/Documentation/gpu/drm-uapi.rst
> +++ b/Documentation/gpu/drm-uapi.rst
> @@ -285,6 +285,83 @@ for GPU1 and GPU2 from different vendors, and a third 
> handler for
>  mmapped regular files. Threads cause additional pain with signal
>  handling as well.
>  
> +Device reset
> +============
> +
> +The GPU stack is really complex and is prone to errors, from hardware bugs,
> +faulty applications and everything in between the many layers. Some errors
> +require resetting the device in order to make the device usable again. This
> +section describes the expectations for DRM and usermode drivers when a
> +device resets and how to propagate the reset status.
> +
> +Device resets can not be disabled without tainting the kernel, which can lead to
> +hanging the entire kernel through shrinkers/mmu_notifiers. Userspace's role in
> +device resets is to propagate the message to the application and apply any
> +special policy for blocking guilty applications, if any. A corollary is that
> +debugging a hung GPU context requires hardware support to be able to preempt such
> +a GPU context while it's stopped.
> +
> +Kernel Mode Driver
> +------------------
> +
> +The KMD is responsible for checking if the device needs a reset, and for
> +performing it as needed. Usually a hang is detected when a job gets stuck
> +executing. The KMD should keep track of resets, because userspace can query the
> +reset status for a specific context at any time. This is needed to propagate to
> +the rest of the stack that a reset has happened. Currently, this is implemented
> +by each driver separately, with no common DRM interface. Ideally this should be
> +properly integrated into the DRM scheduler to provide a common ground for all
> +drivers. After a reset, the KMD should reject new command submissions for
> +affected contexts.

is there any consensus around what exactly 'affected contexts' might mean?
I see i915 pin-points only the context that was executing, with head pointing
at it, and doesn't blame the queued ones, while on Xe it looks like we are
blaming all the queued contexts. Not sure what other drivers are doing for the
'affected contexts'.

> +
> +User Mode Driver
> +----------------
> +
> +After command submission, UMD should check if the submission was accepted or
> +rejected. After a reset, KMD should reject submissions, and UMD can issue an
> +ioctl to the KMD to check the reset status, and this can be checked more 
> often
> +if the UMD requires it. After detecting a reset, UMD will then proceed to 
> report
> +it to the application using the appropriate API error code, as explained in 
> the
> +section below about robustness.
> +
> +Robustness
> +----------
> +
> +The only way to try to keep a graphical API context working after a reset is 
> if
> +it complies with the robustness aspects of the graphical API that it is 
> using.
> +
> +Graphical APIs provide ways for applications to deal with device resets. However,
> +there is no guarantee that the app will use such features correctly, and a
> +userspace that doesn't support robust interfaces (like a non-robust
> +OpenGL context or an API without any robustness support like libva) leaves the
> +robustness handling entirely to the userspace driver. There is no strong
> +community consensus on what the userspace driver should do in that case,
> +since all reasonable approaches have some clear downsides.
> +
> +OpenGL
> +~~~~~~
> +
> +Apps using OpenGL should use the available robust interfaces, like the
> +extension ``GL_ARB_robustness`` (or ``GL_EXT_robustness`` for OpenGL ES). This
> +interface tells if a reset has happened, and if so, all the context state is
> +considered lost and the app proceeds by creating new ones. There's no consensus
> +on what to do if robustness is not in use.
> +
> +Vulkan
> +~~~~~~
> +
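
As a concrete illustration of the OpenGL robustness flow described above, a
minimal sketch, assuming a context already created with reset notification
enabled (via the GLX/EGL robustness attributes) and the extension's entry
point already resolved:

#include <GL/gl.h>
#include <GL/glext.h>

/* Returns 1 if the context was lost and must be recreated, 0 otherwise. */
static int check_for_device_reset(PFNGLGETGRAPHICSRESETSTATUSARBPROC get_status)
{
	GLenum status = get_status();

	if (status == GL_NO_ERROR)
		return 0;	/* context still valid */

	/*
	 * GUILTY/INNOCENT/UNKNOWN all imply the same recovery: all context
	 * state is lost, so destroy the context and create a new one.
	 */
	return 1;
}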

Re: [Intel-xe] [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-08-23 Thread Rodrigo Vivi
On Wed, Aug 23, 2023 at 11:41:19AM -0400, Alex Deucher wrote:
> On Wed, Aug 23, 2023 at 11:26 AM Matthew Brost  
> wrote:
> >
> > On Wed, Aug 23, 2023 at 09:10:51AM +0200, Christian König wrote:
> > > Am 23.08.23 um 05:27 schrieb Matthew Brost:
> > > > [SNIP]
> > > > > That is exactly what I want to avoid, tying the TDR to the job is 
> > > > > what some
> > > > > AMD engineers pushed for because it looked like a simple solution and 
> > > > > made
> > > > > the whole thing similar to what Windows does.
> > > > >
> > > > > This turned the previous relatively clean scheduler and TDR design 
> > > > > into a
> > > > > complete nightmare. The job contains quite a bunch of things which 
> > > > > are not
> > > > > necessarily available after the application which submitted the job 
> > > > > is torn
> > > > > down.
> > > > >
> > > > Agree the TDR shouldn't be accessing anything application specific
> > > > rather just internal job state required to tear the job down on the
> > > > hardware.
> > > > > So what happens is that you either have stale pointers in the TDR 
> > > > > which can
> > > > > go boom extremely easily or we somehow find a way to keep the 
> > > > > necessary
> > > > I have not experienced the TDR going boom in Xe.
> > > >
> > > > > structures (which include struct thread_info and struct file for this 
> > > > > driver
> > > > > connection) alive until all submissions are completed.
> > > > >
> > > > In Xe we keep everything alive until all submissions are completed. By
> > > > everything I mean the drm job, entity, scheduler, and VM via a reference
> > > > counting scheme. All of these structures are just kernel state which can
> > > > safely be accessed even if the application has been killed.
> > >
> > > Yeah, but that might just not be such a good idea from memory management
> > > point of view.
> > >
> > > When you (for example) kill a process all resources from that process should
> > > at least be queued to be freed more or less immediately.
> > >
> >
> > We do this, the TDR kicks jobs off the hardware as fast as the hw
> > interface allows and signals all pending hw fences immediately after.
> > Free job is then immediately called and the reference count goes to
> > zero. I think max time for all of this to occur is a handful of ms.
> >
> > > What Linux is doing for other I/O operations is to keep the relevant pages
> > > alive until the I/O operation is completed, but for GPUs that usually 
> > > means
> > > keeping most of the memory of the process alive and that in turn is really
> > > not something you can do.
> > >
> > > You can of course do this if your driver has a reliable way of killing 
> > > your
> > > submissions and freeing resources in a reasonable amount of time. This
> > > should then be done in the flush callback.
> > >
> >
> > 'flush callback' - Do you mean drm_sched_entity_flush? I looked at that
> > and think that function doesn't even work, from what I can tell. It flushes
> > the spsc queue but what about jobs on the hardware, how do those get
> > killed?
> >
> > As stated we do via the TDR, which is a rather clean design and fits with
> > our reference counting scheme.
> >
> > > > If we need to teardown on demand we just set the TDR to a minimum value 
> > > > and
> > > > it kicks the jobs off the hardware, gracefully cleans everything up and
> > > > drops all references. This is a benefit of the 1 to 1 relationship, not
> > > > sure if this works with how AMDGPU uses the scheduler.
> > > >
> > > > > Delaying application tear down is also not an option because then you 
> > > > > run
> > > > > into massive trouble with the OOM killer (or more generally OOM 
> > > > > handling).
> > > > > See what we do in drm_sched_entity_flush() as well.
> > > > >
> > > > Not an issue for Xe, we never call drm_sched_entity_flush as our
> > > > referencing counting scheme is all jobs are finished before we attempt
> > > > to tear down entity / scheduler.
> > >
> > > I don't think you can do that upstream. Calling drm_sched_entity_flush() 
> > > is
> > > a must have from your flush callback for the file descriptor.
> > >
> >
> > Again 'flush callback'? What are you referring to?
> >
> > And why does drm_sched_entity_flush need to be called, doesn't seem to
> > do anything useful.
> >
> > > Unless you have some other method for killing your submissions this would
> > > give a path for a denial-of-service attack vector when the Xe driver is in
> > > use.
> > >
> >
> > Yes, once the TDR fires it disallows all new submissions at the exec
> > IOCTL plus flushes any pending submissions as fast as possible.
> >
> > > > > Since adding the TDR support we completely exercised this through in 
> > > > > the
> > > > > last two or three years or so. And to sum it up I would really like 
> > > > > to get
> > > > > away from this mess again.
> > > > >
> > > > > Compared to that what i915 does is actually rather clean I think.
> > > > >
> > > > Not even close, resets were a nightmare in the i915 (I 

Re: [PATCH] drm/bridge/analogix/anx78xx: Extend match data support for ID table

2023-08-23 Thread Doug Anderson
Hi,

On Wed, Aug 23, 2023 at 9:53 AM Andy Shevchenko
 wrote:
>
> On Wed, Aug 23, 2023 at 5:36 PM Biju Das  wrote:
> > > On Sun, Aug 13, 2023 at 1:51 AM Biju Das 
> > > wrote:
>
> ...
>
> > > It seems like this is a sign that nobody is actually using the i2c match
> > > table.
>
> You can't know. The I2C ID table allows instantiating a device from
> user space by supplying its address and a name that has to be
> matched with one in the ID table.

In general, right, you can't know. ...and in general, I wouldn't have
suggested removing the table. However, in this specific case I think
we have a very good idea that nobody is using it. Specifically, if you
take a look at Biju's patch you can see that if anyone had been trying
to use the I2C table then they would have been getting a NULL pointer
dereference at probe time for the last ~5 years.

Specifically, I think that as of commit 025910db8057 ("drm/bridge:
analogix-anx78xx: add support for 7808 addresses") that if anyone were
using the I2C ID table:

1. In anx78xx_i2c_probe(), device_get_match_data() would have returned NULL
2. We would have tried to dereference that NULL in the loop.


> > > It was probably added because the original author just copy/pasted
> > > from something else, but obviously it hasn't been kept up to date and 
> > > isn't
> > > working.
>
> How can you be so sure?

Unless I misunderstood the code, they'd be crashing.


> > > While your patch would make it work for "anx7814", it wouldn't
> > > make it work for any of the other similar parts. ...and yes, you could add
> > > support for those parts in your patch too, but IMO it makes more sense to
> > > just delete the i2c table and when someone has an actual need then they 
> > > can
> > > re-add it.
> > >
> > > Sound OK?
>
> No. Please, do not remove the I2C ID table. It had already been
> discussed a few years ago.

If you really want the table kept then it's no skin off my teeth. I
just happened to see that nobody was responding to the patch and I was
trying to be helpful. My analysis above showed that the I2C table must
not be used, but if you feel strongly that we need to add code then
feel free to provide a Reviewed-by tag to Biju's patch! :-)

-Doug
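
For context, a minimal sketch of the pattern Biju's patch moves to: keep the
match data in both tables and let i2c_get_match_data() (the helper discussed
in this series) resolve it for OF, ACPI or plain I2C-ID enumeration. Names
here (my_chip_data, my_probe) are illustrative, not the anx78xx driver's:

#include <linux/i2c.h>
#include <linux/mod_devicetable.h>

struct my_chip_data { const char *name; };

static const struct my_chip_data anx7814_data = { .name = "anx7814" };

static const struct of_device_id my_of_match[] = {
	{ .compatible = "analogix,anx7814", .data = &anx7814_data },
	{ }
};

static const struct i2c_device_id my_i2c_ids[] = {
	{ "anx7814", (kernel_ulong_t)&anx7814_data },
	{ }
};

static int my_probe(struct i2c_client *client)
{
	const struct my_chip_data *data = i2c_get_match_data(client);

	if (!data)
		return -ENODEV;	/* neither table matched */

	/* ... chip setup ... */
	return 0;
}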


Re: [PATCH] drm/vmwgfx: Fix possible invalid drm gem put calls

2023-08-23 Thread Maaz Mombasawala (VMWare)

LGTM!

Reviewed-by: Maaz Mombasawala

Maaz Mombasawala (VMware)

On 8/17/2023 9:13 PM, Zack Rusin wrote:

From: Zack Rusin 

vmw_bo_unreference sets the input buffer to null on exit, resulting in
null ptr derefs on the subsequent drm gem put calls.

This went unnoticed because only very old userspace would be exercising
those paths but it wouldn't be hard to hit on old distros with brand
new kernels.

Introduce a new function that abstracts unrefing of user bo's to make
the code cleaner and more explicit.

Signed-off-by: Zack Rusin 
Reported-by: Ian Forbes 
Fixes: 9ef8d83e8e25 ("drm/vmwgfx: Do not drop the reference to the handle too 
soon")
Cc:  # v6.4+
---
  drivers/gpu/drm/vmwgfx/vmwgfx_bo.c  | 6 ++
  drivers/gpu/drm/vmwgfx/vmwgfx_bo.h  | 8 
  drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 6 ++
  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 6 ++
  drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c | 3 +--
  drivers/gpu/drm/vmwgfx/vmwgfx_shader.c  | 3 +--
  6 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index 82094c137855..c43853597776 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -497,10 +497,9 @@ static int vmw_user_bo_synccpu_release(struct drm_file 
*filp,
if (!(flags & drm_vmw_synccpu_allow_cs)) {
		atomic_dec(&vmw_bo->cpu_writers);
	}
-   ttm_bo_put(&vmw_bo->tbo);
+   vmw_user_bo_unref(vmw_bo);
	}

-	drm_gem_object_put(&vmw_bo->tbo.base);

return ret;
  }
  
@@ -540,8 +539,7 @@ int vmw_user_bo_synccpu_ioctl(struct drm_device *dev, void *data,

return ret;
  
  		ret = vmw_user_bo_synccpu_grab(vbo, arg->flags);

-   vmw_bo_unreference(&vbo);
-   drm_gem_object_put(&vbo->tbo.base);
+   vmw_user_bo_unref(vbo);
if (unlikely(ret != 0)) {
if (ret == -ERESTARTSYS || ret == -EBUSY)
return -EBUSY;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 50a836e70994..1d433fceed3d 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -195,6 +195,14 @@ static inline struct vmw_bo *vmw_bo_reference(struct 
vmw_bo *buf)
return buf;
  }
  
+static inline void vmw_user_bo_unref(struct vmw_bo *vbo)

+{
+   if (vbo) {
+   ttm_bo_put(&vbo->tbo);
+   drm_gem_object_put(&vbo->tbo.base);
+   }
+}
+
  static inline struct vmw_bo *to_vmw_bo(struct drm_gem_object *gobj)
  {
return container_of((gobj), struct vmw_bo, tbo.base);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 6b9aa2b4ef54..25b96821df0f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -1164,8 +1164,7 @@ static int vmw_translate_mob_ptr(struct vmw_private 
*dev_priv,
}
vmw_bo_placement_set(vmw_bo, VMW_BO_DOMAIN_MOB, VMW_BO_DOMAIN_MOB);
ret = vmw_validation_add_bo(sw_context->ctx, vmw_bo);
-   ttm_bo_put(&vmw_bo->tbo);
-   drm_gem_object_put(&vmw_bo->tbo.base);
+   vmw_user_bo_unref(vmw_bo);
if (unlikely(ret != 0))
return ret;
  
@@ -1221,8 +1220,7 @@ static int vmw_translate_guest_ptr(struct vmw_private *dev_priv,

vmw_bo_placement_set(vmw_bo, VMW_BO_DOMAIN_GMR | VMW_BO_DOMAIN_VRAM,
 VMW_BO_DOMAIN_GMR | VMW_BO_DOMAIN_VRAM);
ret = vmw_validation_add_bo(sw_context->ctx, vmw_bo);
-   ttm_bo_put(&vmw_bo->tbo);
-   drm_gem_object_put(&vmw_bo->tbo.base);
+   vmw_user_bo_unref(vmw_bo);
if (unlikely(ret != 0))
return ret;
  
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c

index b62207be3363..1489ad73c103 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -1665,10 +1665,8 @@ static struct drm_framebuffer *vmw_kms_fb_create(struct 
drm_device *dev,
  
  err_out:

/* vmw_user_lookup_handle takes one ref so does new_fb */
-   if (bo) {
-   vmw_bo_unreference(&bo);
-   drm_gem_object_put(&bo->tbo.base);
-   }
+   if (bo)
+   vmw_user_bo_unref(bo);
if (surface)
	vmw_surface_unreference(&surface);
  
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c

index 7e112319a23c..fb85f244c3d0 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
@@ -451,8 +451,7 @@ int vmw_overlay_ioctl(struct drm_device *dev, void *data,
  
  	ret = vmw_overlay_update_stream(dev_priv, buf, arg, true);
  
-	vmw_bo_unreference(&buf);

-   drm_gem_object_put(&buf->tbo.base);
+   vmw_user_bo_unref(buf);
  
  out_unlock:

	mutex_unlock(&overlay->mutex);
diff --git 
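
For clarity, a minimal sketch (illustration only, not driver code) of the bug
pattern the patch removes and the shape of the fix:

/* Before: vmw_bo_unreference() NULLs its argument, so the gem put that
 * follows dereferences a NULL pointer. */
static void buggy_unref(struct vmw_bo *buf)
{
	vmw_bo_unreference(&buf);		/* sets buf = NULL */
	drm_gem_object_put(&buf->tbo.base);	/* NULL pointer deref */
}

/* After: both references drop through the single helper added above. */
static void fixed_unref(struct vmw_bo *buf)
{
	vmw_user_bo_unref(buf);
}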

Re: [PATCH] drm/bridge/analogix/anx78xx: Extend match data support for ID table

2023-08-23 Thread Andy Shevchenko
On Wed, Aug 23, 2023 at 7:52 PM Andy Shevchenko
 wrote:
> On Wed, Aug 23, 2023 at 5:36 PM Biju Das  wrote:
> > > On Sun, Aug 13, 2023 at 1:51 AM Biju Das 
> > > wrote:

...

> > > It seems like this is a sign that nobody is actually using the i2c match
> > > table.
>
> You can't know. The I2C ID table allows instantiating a device from
> user space by supplying its address and a name that has to be
> matched with one in the ID table.
>
> > > It was probably added because the original author just copy/pasted
> > > from something else, but obviously it hasn't been kept up to date and 
> > > isn't
> > > working.
>
> How can you be so sure?
>
> > > While your patch would make it work for "anx7814", it wouldn't
> > > make it work for any of the other similar parts. ...and yes, you could add
> > > support for those parts in your patch too, but IMO it makes more sense to
> > > just delete the i2c table and when someone has an actual need then they 
> > > can
> > > re-add it.
> > >
> > > Sound OK?
>
> No. Please, do not remove the I2C ID table. It had already been
> discussed a few years ago.
>
> > Yes, it makes sense, as it saves some memory

Okay, reading the code a bit, it seems that it won't work with purely i2c
ID matching.
So the question here is "Do we want to allow enumeration via sysfs or not?"


-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH] drm/bridge/analogix/anx78xx: Extend match data support for ID table

2023-08-23 Thread Andy Shevchenko
On Wed, Aug 23, 2023 at 5:36 PM Biju Das  wrote:
> > On Sun, Aug 13, 2023 at 1:51 AM Biju Das 
> > wrote:

...

> > It seems like this is a sign that nobody is actually using the i2c match
> > table.

You can't know. The I2C ID table allows instantiating a device from
user space by supplying its address and a name that has to be
matched with one in the ID table (e.g. by writing a name and address
to /sys/bus/i2c/devices/i2c-N/new_device).

> > It was probably added because the original author just copy/pasted
> > from something else, but obviously it hasn't been kept up to date and isn't
> > working.

How can you be so sure?

> > While your patch would make it work for "anx7814", it wouldn't
> > make it work for any of the other similar parts. ...and yes, you could add
> > support for those parts in your patch too, but IMO it makes more sense to
> > just delete the i2c table and when someone has an actual need then they can
> > re-add it.
> >
> > Sound OK?

No. Please, do not remove the I2C ID table. It had already been
discussed a few years ago.

> Yes, it makes sense, as it saves some memory.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH 1/2] drm/display/dp: Default 8 bpc support when DSC is supported

2023-08-23 Thread kernel test robot
Hi Ankit,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-tip/drm-tip]
[also build test WARNING on linus/master v6.5-rc7 next-20230823]
[cannot apply to drm-intel/for-linux-next drm-intel/for-linux-next-fixes 
drm-misc/drm-misc-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Ankit-Nautiyal/drm-display-dp-Default-8-bpc-support-when-DSC-is-supported/20230823-195946
base:   git://anongit.freedesktop.org/drm/drm-tip drm-tip
patch link:
https://lore.kernel.org/r/20230823115425.715644-2-ankit.k.nautiyal%40intel.com
patch subject: [PATCH 1/2] drm/display/dp: Default 8 bpc support when DSC is 
supported
config: i386-randconfig-r036-20230823 
(https://download.01.org/0day-ci/archive/20230824/202308240007.1eds9xsl-...@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git 
ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce: 
(https://download.01.org/0day-ci/archive/20230824/202308240007.1eds9xsl-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202308240007.1eds9xsl-...@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/display/drm_dp_helper.c:2451:6: warning: logical not is only 
>> applied to the left hand side of this bitwise operator 
>> [-Wlogical-not-parentheses]
   if (!dsc_dpcd[DP_DSC_SUPPORT] & DP_DSC_DECOMPRESSION_IS_SUPPORTED)
   ^ ~
   drivers/gpu/drm/display/drm_dp_helper.c:2451:6: note: add parentheses after 
the '!' to evaluate the bitwise operator first
   if (!dsc_dpcd[DP_DSC_SUPPORT] & DP_DSC_DECOMPRESSION_IS_SUPPORTED)
   ^
(   )
   drivers/gpu/drm/display/drm_dp_helper.c:2451:6: note: add parentheses around 
left hand side expression to silence this warning
   if (!dsc_dpcd[DP_DSC_SUPPORT] & DP_DSC_DECOMPRESSION_IS_SUPPORTED)
   ^
   ()
   1 warning generated.


vim +2451 drivers/gpu/drm/display/drm_dp_helper.c

  2428  
  2429  /**
  2430   * drm_dp_dsc_sink_supported_input_bpcs() - Get all the input bits per 
component
  2431   * values supported by the DSC sink.
  2432   * @dsc_dpcd: DSC capabilities from DPCD
  2433   * @dsc_bpc: An array to be filled by this helper with supported
  2434   *   input bpcs.
  2435   *
  2436   * Read the DSC DPCD from the sink device to parse the supported bits 
per
  2437   * component values. This is used to populate the DSC parameters
  2438   * in the  drm_dsc_config by the driver.
  2439   * Driver creates an infoframe using these parameters to populate
  2440   *  drm_dsc_pps_infoframe. These are sent to the sink using DSC
  2441   * infoframe using the helper function drm_dsc_pps_infoframe_pack()
  2442   *
  2443   * Returns:
  2444   * Number of input BPC values parsed from the DPCD
  2445   */
  2446  int drm_dp_dsc_sink_supported_input_bpcs(const u8 
dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE],
  2447   u8 dsc_bpc[3])
  2448  {
  2449  int num_bpc = 0;
  2450  
> 2451  if (!dsc_dpcd[DP_DSC_SUPPORT] & 
> DP_DSC_DECOMPRESSION_IS_SUPPORTED)
  2452  return 0;
  2453  
  2454  u8 color_depth = dsc_dpcd[DP_DSC_DEC_COLOR_DEPTH_CAP - 
DP_DSC_SUPPORT];
  2455  
  2456  if (color_depth & DP_DSC_12_BPC)
  2457  dsc_bpc[num_bpc++] = 12;
  2458  if (color_depth & DP_DSC_10_BPC)
  2459  dsc_bpc[num_bpc++] = 10;
  2460  
  2461  /* A DP DSC Sink devices shall support 8 bpc. */
  2462  dsc_bpc[num_bpc++] = 8;
  2463  
  2464  return num_bpc;
  2465  }
  2466  EXPORT_SYMBOL(drm_dp_dsc_sink_supported_input_bpcs);
  2467  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
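
For reference, the precedence problem the robot flags, and the presumably
intended form (a sketch of the warning, not a tested patch):

/* As written: '!' binds tighter than '&', so this computes
 * (!dsc_dpcd[DP_DSC_SUPPORT]) & DP_DSC_DECOMPRESSION_IS_SUPPORTED. */
if (!dsc_dpcd[DP_DSC_SUPPORT] & DP_DSC_DECOMPRESSION_IS_SUPPORTED)
	return 0;

/* Presumably intended: evaluate the bitwise AND first. */
if (!(dsc_dpcd[DP_DSC_SUPPORT] & DP_DSC_DECOMPRESSION_IS_SUPPORTED))
	return 0;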


Re: [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset

2023-08-23 Thread Daniel Vetter
On Tue, Aug 22, 2023 at 11:53:24AM -0700, John Harrison wrote:
> On 8/11/2023 11:20, Zhanjun Dong wrote:
> > This attempts to avoid a circular locking dependency between flushing delayed
> > work and intel_gt_reset.
> > When intel_gt_reset is called, the task will hold a lock.
> > To cancel delayed work here, the _sync version will also acquire a lock,
> > which might trigger the possible circular locking dependency warning.
> > When intel_gt_reset is called, the reset_in_progress flag will be set; add code
> > to check the flag and call the async version if a reset is in progress.
> > 
> > Signed-off-by: Zhanjun Dong
> > Cc: John Harrison
> > Cc: Andi Shyti
> > Cc: Daniel Vetter
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++-
> >   1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index a0e3ef1c65d2..600388c849f7 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1359,7 +1359,16 @@ static void guc_enable_busyness_worker(struct 
> > intel_guc *guc)
> >   static void guc_cancel_busyness_worker(struct intel_guc *guc)
> >   {
> > -   cancel_delayed_work_sync(&guc->timestamp.work);
> > +   /*
> > +* When intel_gt_reset is called, the task will hold a lock.
> > +* To cancel delayed work here, the _sync version will also acquire a lock, which might
> > +* trigger the possible circular locking dependency warning.
> > +* Check the reset_in_progress flag, and call the async version if a reset is in progress.
> > +*/
> This needs to explain in much more detail what is going on and why it is not
> a problem. E.g.:
> 
>The busyness worker needs to be cancelled. In general that means
>using the synchronous cancel version to ensure that an in-progress
>worker will not keep executing beyond whatever is happening that
>needs the cancel. E.g. suspend, driver unload, etc. However, in the
>case of a reset, the synchronous version is not required and can
>trigger a false deadlock detection warning.
> 
>The busyness worker takes the reset mutex to protect against resets
>interfering with it. However, it does a trylock and bails out if the
>reset lock is already acquired. Thus there is no actual deadlock or
>other concern with the worker running concurrently with a reset. So
>an asynchronous cancel is safe in the case of a reset rather than a
>driver unload or suspend type operation. On the other hand, if the
>cancel_sync version is used when a reset is in progress then the
>mutex deadlock detection sees the mutex being acquired through
>multiple paths and complains.
> 
>So just don't bother. That keeps the detection code happy and is
>safe because of the trylock code described above.

So why do we even need to cancel anything if it doesn't do anything while
the reset is in progress?

Just remove the cancel from the reset path as unneeded instead, and explain
why that's ok? Because that's defacto what the cancel_work with a
potential deadlock scenario for cancel_work_sync does, you either don't
need it at all, or the replacement creates a bug.
-Daniel

> 
> 
> John.
> 
> 
> > +   if (guc_to_gt(guc)->uc.reset_in_progress)
> > +   cancel_delayed_work(&guc->timestamp.work);
> > +   else
> > +   cancel_delayed_work_sync(&guc->timestamp.work);
> >   }
> >   static void __reset_guc_busyness_stats(struct intel_guc *guc)

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-08-23 Thread Alex Deucher
On Wed, Aug 23, 2023 at 11:26 AM Matthew Brost  wrote:
>
> On Wed, Aug 23, 2023 at 09:10:51AM +0200, Christian König wrote:
> > Am 23.08.23 um 05:27 schrieb Matthew Brost:
> > > [SNIP]
> > > > That is exactly what I want to avoid, tying the TDR to the job is what 
> > > > some
> > > > AMD engineers pushed for because it looked like a simple solution and 
> > > > made
> > > > the whole thing similar to what Windows does.
> > > >
> > > > This turned the previous relatively clean scheduler and TDR design into 
> > > > a
> > > > complete nightmare. The job contains quite a bunch of things which are 
> > > > not
> > > > necessarily available after the application which submitted the job is 
> > > > torn
> > > > down.
> > > >
> > > Agree the TDR shouldn't be accessing anything application specific
> > > rather just internal job state required to tear the job down on the
> > > hardware.
> > > > So what happens is that you either have stale pointers in the TDR which 
> > > > can
> > > > go boom extremely easily or we somehow find a way to keep the necessary
> > > I have not experienced the TDR going boom in Xe.
> > >
> > > > structures (which include struct thread_info and struct file for this 
> > > > driver
> > > > connection) alive until all submissions are completed.
> > > >
> > > In Xe we keep everything alive until all submissions are completed. By
> > > everything I mean the drm job, entity, scheduler, and VM via a reference
> > > counting scheme. All of these structures are just kernel state which can
> > > safely be accessed even if the application has been killed.
> >
> > Yeah, but that might just not be such a good idea from memory management
> > point of view.
> >
> > When you (for example) kill a process all resources from that process should
> > at least be queued to be freed more or less immediately.
> >
>
> We do this, the TDR kicks jobs off the hardware as fast as the hw
> interface allows and signals all pending hw fences immediately after.
> Free job is then immediately called and the reference count goes to
> zero. I think max time for all of this to occur is a handful of ms.
>
> > What Linux is doing for other I/O operations is to keep the relevant pages
> > alive until the I/O operation is completed, but for GPUs that usually means
> > keeping most of the memory of the process alive and that in turn is really
> > not something you can do.
> >
> > You can of course do this if your driver has a reliable way of killing your
> > submissions and freeing resources in a reasonable amount of time. This
> > should then be done in the flush callback.
> >
>
> 'flush callback' - Do you mean drm_sched_entity_flush? I looked at that
> and think that function doesn't even work, from what I can tell. It flushes
> the spsc queue but what about jobs on the hardware, how do those get
> killed?
>
> As stated we do via the TDR, which is a rather clean design and fits with
> our reference counting scheme.
>
> > > If we need to teardown on demand we just set the TDR to a minimum value 
> > > and
> > > it kicks the jobs off the hardware, gracefully cleans everything up and
> > > drops all references. This is a benefit of the 1 to 1 relationship, not
> > > sure if this works with how AMDGPU uses the scheduler.
> > >
> > > > Delaying application tear down is also not an option because then you 
> > > > run
> > > > into massive trouble with the OOM killer (or more generally OOM 
> > > > handling).
> > > > See what we do in drm_sched_entity_flush() as well.
> > > >
> > > Not an issue for Xe, we never call drm_sched_entity_flush as our
> > > referencing counting scheme is all jobs are finished before we attempt
> > > to tear down entity / scheduler.
> >
> > I don't think you can do that upstream. Calling drm_sched_entity_flush() is
> > a must have from your flush callback for the file descriptor.
> >
>
> Again 'flush callback'? What are you referring to?
>
> And why does drm_sched_entity_flush need to be called, doesn't seem to
> do anything useful.
>
> > Unless you have some other method for killing your submissions this would
> > give a path for a denial-of-service attack vector when the Xe driver is in
> > use.
> >
>
> Yes, once the TDR fires it disallows all new submissions at the exec
> IOCTL plus flushes any pending submissions as fast as possible.
>
> > > > Since adding the TDR support we completely exercised this through in the
> > > > last two or three years or so. And to sum it up I would really like to 
> > > > get
> > > > away from this mess again.
> > > >
> > > > Compared to that what i915 does is actually rather clean I think.
> > > >
> > Not even close, resets were a nightmare in the i915 (I spent years
> > trying to get this right and it probably still doesn't completely work) and in Xe
> > we basically got it right on the first attempt.
> > >
> > > > >Also in Xe some of
> > > > > things done in free_job cannot be from an IRQ context, hence calling
> > > > > this from the scheduler worker is rather helpful.
> > > > Well 

Re: [PATCH] drm/prime: Support page array >= 4GB

2023-08-23 Thread Felix Kuehling

On 2023-08-23 01:49, Christian König wrote:

Am 22.08.23 um 20:27 schrieb Philip Yang:


On 2023-08-22 05:43, Christian König wrote:



Am 21.08.23 um 22:02 schrieb Philip Yang:

Without the unsigned long typecast, the size is passed in as zero if the page
array size >= 4GB, nr_pages >= 0x100000; the converted sg list will then
have the first and the last chunk lost.


Good catch, but I'm not sure if this is enough to make it work.

Additional to that I don't think we have an use case for BOs > 4GiB.


>4GB buffers are normal for compute applications. The issue was reported
as "Maelstrom generated exerciser detects miscompares when GPU
accesses larger remote GPU memory." on a GFX 9.4.3 APU, which uses the GTT
domain to allocate VRAM and triggers the bug in this drm prime
helper. With this fix, the test passed.




Why is the application allocating all the data as a single BO?

Usually you have a single texture, image, array etc... in a single BO 
but this here looks a bit like the application tries to allocate all 
their memory in a single BO (could of course be that this isn't the 
case and that's really just one giant data structure).


Compute applications work with pretty big data structures. For example 
huge multi-dimensional matrices are not uncommon in large 
machine-learning models.






Swapping such large BOs out at once is quite impractical, so should we 
ever have a use case like suspend/resume or checkpoint/restore with 
this it will most likely fail.
Checkpointing and restoring multiple GB at a time should not be a 
problem. I'm pretty sure we have tested that. On systems with 100s of 
GBs of memory, HBM memory bandwidth approaching TB/s and PCIe/CXL bus 
bandwidths going into 10s of GB/s, dealing with multi-GB BOs should not 
be a fundamental problem.


That said, if you wanted to impose limits on the size of single 
allocations, then I would expect some policy somewhere that prohibits 
large allocations. On the contrary, I see long or 64-bit data types all 
over the VRAM manager and TTM code, which tells me that >4GB allocations 
must be part of the plan.


This patch is clearly addressing a bug in the code that results in data 
corruption when mapping large BOs on multiple GPUs. You could address 
this with an allocation policy change, if you want, and leave the bug in 
place. Then we have to update ROCm user mode to break large allocations 
into multiple BOs. It would break applications that try to share such 
large allocations via DMABufs (e.g. with an RDMA NIC), because it would 
become impossible to share large allocations with a single DMABuf handle.


Regards,
  Felix




Christian.


Regards,

Philip



Christian.



Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/drm_prime.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index f924b8b4ab6b..2630ad2e504d 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -830,7 +830,7 @@ struct sg_table *drm_prime_pages_to_sg(struct 
drm_device *dev,

  if (max_segment == 0)
  max_segment = UINT_MAX;
  err = sg_alloc_table_from_pages_segment(sg, pages, nr_pages, 0,
-    nr_pages << PAGE_SHIFT,
+    (unsigned long)nr_pages << PAGE_SHIFT,
  max_segment, GFP_KERNEL);
  if (err) {
  kfree(sg);
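
A standalone sketch of the overflow the one-liner fixes, assuming an LP64
build (32-bit int/unsigned int, 64-bit long) and 4 KiB pages:

#include <stdio.h>

#define PAGE_SHIFT 12

int main(void)
{
	unsigned int nr_pages = 0x100000;	/* 1Mi pages == 4 GiB */

	/* Shift done in 32 bits, wraps to 0 before widening. */
	unsigned long broken = nr_pages << PAGE_SHIFT;
	/* Widen first, then shift: correct 4 GiB size. */
	unsigned long fixed = (unsigned long)nr_pages << PAGE_SHIFT;

	printf("broken=%#lx fixed=%#lx\n", broken, fixed);
	return 0;
}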






Re: [PATCH v3 4/7] drm/vkms: Add ConfigFS scaffolding to VKMS

2023-08-23 Thread Marius Vlad

Hi Brandon,

On 8/18/23 10:43, Brandon Pollack wrote:

From: Jim Shargo 

This change adds the basic scaffolding for ConfigFS, including setting
up the default directories. It does not allow for the registration of
configfs-backed devices, which is complex and provided in a follow-up
commit.

This CL includes docs about using ConfigFS with VKMS, but I'll summarize
in brief here as well (assuming ConfigFS is mounted at /config/):

To create a new device, you can do so via `mkdir
/config/vkms/my-device`.

This will create a number of directories and files automatically:

/config
`-- vkms
`-- my-device
|-- connectors
|-- crtcs
|-- encoders
|-- planes
`-- enabled

You can then configure objects by mkdir'ing in each of the directories.

When you're satisfied, you can `echo 1 > /config/vkms/my-device/enabled`.
This will create a new device according to your configuration.

For now, this will fail, but the next change will add support for it.

Signed-off-by: Jim Shargo 
Signed-off-by: Brandon Pollack 
---
  Documentation/gpu/vkms.rst   |  18 +-
  drivers/gpu/drm/Kconfig  |   1 +
  drivers/gpu/drm/vkms/Makefile|   1 +
  drivers/gpu/drm/vkms/vkms_configfs.c | 651 +++
  drivers/gpu/drm/vkms/vkms_drv.c  |  56 ++-
  drivers/gpu/drm/vkms/vkms_drv.h  |  92 +++-
  drivers/gpu/drm/vkms/vkms_output.c   |   5 +
  7 files changed, 807 insertions(+), 17 deletions(-)
  create mode 100644 drivers/gpu/drm/vkms/vkms_configfs.c

diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
index ba04ac7c2167..c3875bf66dba 100644
--- a/Documentation/gpu/vkms.rst
+++ b/Documentation/gpu/vkms.rst
@@ -51,6 +51,12 @@ To disable the driver, use ::
  
sudo modprobe -r vkms
  
+Configuration With ConfigFS
+===========================
+
+.. kernel-doc:: drivers/gpu/drm/vkms/vkms_configfs.c
+   :doc: ConfigFS Support for VKMS
+
  Testing With IGT
  ================
  
@@ -135,22 +141,16 @@ project.

  Runtime Configuration
  ---------------------
  
-We want to be able to reconfigure vkms instance without having to reload the

-module. Use/Test-cases:
+We want to be able to manipulate vkms instances without having to reload the
+module. Such configuration can be added as extensions to vkms's ConfigFS
+support. Use-cases:
  
  - Hotplug/hotremove connectors on the fly (to be able to test DP MST handling

of compositors).
  
-- Configure planes/crtcs/connectors (we'd need some code to have more than 1 of

-  them first).
-
  - Change output configuration: Plug/unplug screens, change EDID, allow 
changing
the refresh rate.
  
-The currently proposed solution is to expose vkms configuration through

-configfs. All existing module options should be supported through configfs
-too.
-
  Writeback support
  -----------------
  
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig

index ab9ef1c20349..e39ee0e8ca06 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -284,6 +284,7 @@ config DRM_VKMS
depends on DRM && MMU
select DRM_KMS_HELPER
select DRM_GEM_SHMEM_HELPER
+   select CONFIGFS_FS
select CRC32
default n
help
diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
index 1b28a6a32948..6b83907ad554 100644
--- a/drivers/gpu/drm/vkms/Makefile
+++ b/drivers/gpu/drm/vkms/Makefile
@@ -1,5 +1,6 @@
  # SPDX-License-Identifier: GPL-2.0-only
  vkms-y := \
+   vkms_configfs.o \
vkms_drv.o \
vkms_plane.o \
vkms_output.o \
diff --git a/drivers/gpu/drm/vkms/vkms_configfs.c 
b/drivers/gpu/drm/vkms/vkms_configfs.c
new file mode 100644
index ..72723427a1ac
--- /dev/null
+++ b/drivers/gpu/drm/vkms/vkms_configfs.c
@@ -0,0 +1,651 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "vkms_drv.h"
+
+/**
+ * DOC: ConfigFS Support for VKMS
+ *
+ * VKMS is instrumented with support for configuration via :doc:`ConfigFS
+ * <../filesystems/configfs>`.
+ *
+ * With VKMS installed, you can mount ConfigFS at ``/config/`` like so::
+ *
+ *   mkdir -p /config/
+ *   sudo mount -t configfs none /config
+ *
+ * This allows you to configure multiple virtual devices. Note
+ * that the default device which can be enabled in the module params with::
+ *
+ *  modprobe vkms default_device=1
+ *
+ * is immutable because we cannot pre-populate ConfigFS directories with normal
+ * files.
+ *
+ * To set up a new device, create a new directory under the VKMS configfs
+ * directory::
+ *
+ *   mkdir /config/vkms/test
+ *
+ * With your device created you'll find an new directory ready to be
+ * configured::
+ *
+ *   /config
+ *   `-- vkms
+ *   `-- test
+ *   |-- connectors
+ *   |-- crtcs
+ *   |-- encoders
+ *   |-- planes
+ *   `-- enabled
+ *
+ * Each 

Re: [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-08-23 Thread Matthew Brost
On Wed, Aug 23, 2023 at 09:10:51AM +0200, Christian König wrote:
> Am 23.08.23 um 05:27 schrieb Matthew Brost:
> > [SNIP]
> > > That is exactly what I want to avoid, tying the TDR to the job is what 
> > > some
> > > AMD engineers pushed for because it looked like a simple solution and made
> > > the whole thing similar to what Windows does.
> > > 
> > > This turned the previous relatively clean scheduler and TDR design into a
> > > complete nightmare. The job contains quite a bunch of things which are not
> > > necessarily available after the application which submitted the job is 
> > > torn
> > > down.
> > > 
> > Agree the TDR shouldn't be accessing anything application specific
> > rather just internal job state required to tear the job down on the
> > hardware.
> > > So what happens is that you either have stale pointers in the TDR which 
> > > can
> > > go boom extremely easily or we somehow find a way to keep the necessary
> > I have not experienced the TDR going boom in Xe.
> > 
> > > structures (which include struct thread_info and struct file for this 
> > > driver
> > > connection) alive until all submissions are completed.
> > > 
> > In Xe we keep everything alive until all submissions are completed. By
> > everything I mean the drm job, entity, scheduler, and VM via a reference
> > counting scheme. All of these structures are just kernel state which can
> > safely be accessed even if the application has been killed.
> 
> Yeah, but that might just not be such a good idea from memory management
> point of view.
> 
> When you (for example) kill a process all resources from that process should
> at least be queued to be freed more or less immediately.
> 

We do this, the TDR kicks jobs off the hardware as fast as the hw
interface allows and signals all pending hw fences immediately after.
Free job is then immediately called and the reference count goes to
zero. I think max time for all of this to occur is a handful of ms.

> What Linux is doing for other I/O operations is to keep the relevant pages
> alive until the I/O operation is completed, but for GPUs that usually means
> keeping most of the memory of the process alive and that in turn is really
> not something you can do.
> 
> You can of course do this if your driver has a reliable way of killing your
> submissions and freeing resources in a reasonable amount of time. This
> should then be done in the flush callback.
> 

'flush callback' - Do you mean drm_sched_entity_flush? I looked at that
and think that function doesn't even work, from what I can tell. It flushes
the spsc queue but what about jobs on the hardware, how do those get
killed?

As stated we do via the TDR, which is a rather clean design and fits with
our reference counting scheme.

> > If we need to teardown on demand we just set the TDR to a minimum value and
> > it kicks the jobs off the hardware, gracefully cleans everything up and
> > drops all references. This is a benefit of the 1 to 1 relationship, not
> > sure if this works with how AMDGPU uses the scheduler.
> > 
> > > Delaying application tear down is also not an option because then you run
> > > into massive trouble with the OOM killer (or more generally OOM handling).
> > > See what we do in drm_sched_entity_flush() as well.
> > > 
> > Not an issue for Xe, we never call drm_sched_entity_flush as our
> > referencing counting scheme is all jobs are finished before we attempt
> > to tear down entity / scheduler.
> 
> I don't think you can do that upstream. Calling drm_sched_entity_flush() is
> a must have from your flush callback for the file descriptor.
> 

Again 'flush callback'? What are you referring to?

And why does drm_sched_entity_flush need to be called, doesn't seem to
do anything useful.

> Unless you have some other method for killing your submissions this would
> give a path for a denial-of-service attack vector when the Xe driver is in
> use.
> 

Yes, once the TDR fires it disallows all new submissions at the exec
IOCTL plus flushes any pending submissions as fast as possible.

> > > Since adding the TDR support we completely exercised this through in the
> > > last two or three years or so. And to sum it up I would really like to get
> > > away from this mess again.
> > > 
> > > Compared to that what i915 does is actually rather clean I think.
> > > 
> > Not even close, resets were a nightmare in the i915 (I spent years
> > trying to get this right and it probably still doesn't completely work) and in Xe
> > we basically got it right on the first attempt.
> > 
> > > >Also in Xe some of
> > > > things done in free_job cannot be from an IRQ context, hence calling
> > > > this from the scheduler worker is rather helpful.
> > > Well putting things for cleanup into a workitem doesn't sounds like
> > > something hard.
> > > 
> > That is exactly what we doing in the scheduler with the free_job
> > workitem.
> 
> Yeah, but I think that we do it in the scheduler and not the driver is
> problematic.
>

Disagree, a common clean callback 
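
For reference, a minimal sketch of the reference-counting scheme Matthew
describes, with hypothetical types (my_exec_queue, my_job): every in-flight
job pins the entity/scheduler/VM chain, and the last free_job drops the
final reference, so the kernel-side state stays valid after the application
dies without pinning the application's memory:

#include <linux/kref.h>
#include <linux/slab.h>

struct my_exec_queue {
	struct kref ref;	/* one reference held per in-flight job */
	/* entity, scheduler and VM pointers would live here */
};

struct my_job {
	struct my_exec_queue *q;
};

static void my_exec_queue_release(struct kref *ref)
{
	kfree(container_of(ref, struct my_exec_queue, ref));
}

/* Called once the hw fence signals; tears down hw state, then unpins. */
static void my_free_job(struct my_job *job)
{
	kref_put(&job->q->ref, my_exec_queue_release);
	kfree(job);
}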

RE: [PATCH] drm/bridge/analogix/anx78xx: Extend match data support for ID table

2023-08-23 Thread Biju Das
Hi Doug Anderson,

> Subject: Re: [PATCH] drm/bridge/analogix/anx78xx: Extend match data support
> for ID table
> 
> Hi,
> 
> On Sun, Aug 13, 2023 at 1:51 AM Biju Das 
> wrote:
> >
> > The driver has an ID table, but still uses device_get_match_data()
> > for retrieving match data. Replace device_get_match_data() with
> > i2c_get_match_data() for retrieving OF/ACPI/I2C match data by adding
> > match data for the ID table similar to the OF table.
> >
> > Signed-off-by: Biju Das 
> > ---
> > This patch is only compile tested
> > ---
> >  drivers/gpu/drm/bridge/analogix/analogix-anx78xx.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> It seems like this is a sign that nobody is actually using the i2c match
> table. It was probably added because the original author just copy/pasted
> from something else, but obviously it hasn't been kept up to date and isn't
> working. While your patch would make it work for "anx7814", it wouldn't
> make it work for any of the other similar parts. ...and yes, you could add
> support for those parts in your patch too, but IMO it makes more sense to
> just delete the i2c table and when someone has an actual need then they can
> re-add it.
> 
> Sound OK?

Yes, it makes sense, as it saves some memory.

Cheers,
Biju


Re: [PATCH] drm/bridge/analogix/anx78xx: Extend match data support for ID table

2023-08-23 Thread Doug Anderson
Hi,

On Sun, Aug 13, 2023 at 1:51 AM Biju Das  wrote:
>
> The driver has an ID table, but still uses device_get_match_data()
> for retrieving match data. Replace device_get_match_data() with
> i2c_get_match_data() for retrieving OF/ACPI/I2C match data by adding
> match data for the ID table similar to the OF table.
>
> Signed-off-by: Biju Das 
> ---
> This patch is only compile tested
> ---
>  drivers/gpu/drm/bridge/analogix/analogix-anx78xx.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

It seems like this is a sign that nobody is actually using the i2c
match table. It was probably added because the original author just
copy/pasted from something else, but obviously it hasn't been kept up
to date and isn't working. While your patch would make it work for
"anx7814", it wouldn't make it work for any of the other similar
parts. ...and yes, you could add support for those parts in your patch
too, but IMO it makes more sense to just delete the i2c table and when
someone has an actual need then they can re-add it.

Sound OK?

-Doug


Re: [PATCH drm-misc-next] drm/nouveau: uapi: don't pass NO_PREFETCH flag implicitly

2023-08-23 Thread Danilo Krummrich

On 8/23/23 04:53, Faith Ekstrand wrote:

On Tue, Aug 22, 2023 at 6:41 PM Danilo Krummrich <d...@redhat.com> wrote:

Currently, NO_PREFETCH is passed implicitly through
drm_nouveau_gem_pushbuf_push::length and drm_nouveau_exec_push::va_len.

Since this is a direct representation of how the HW is programmed it
isn't really future proof for a uAPI. Hence, fix this up for the new
uAPI and split up the va_len field of struct drm_nouveau_exec_push,
such that we keep 32bit for va_len and 32bit for flags.

For drm_nouveau_gem_pushbuf_push::length at least provide
NOUVEAU_GEM_PUSHBUF_NO_PREFETCH to indicate the bit shift.

While at it, fix up nv50_dma_push() as well, such that the caller
doesn't need to encode the NO_PREFETCH flag into the length parameter.

Signed-off-by: Danilo Krummrich <d...@redhat.com>
---
  drivers/gpu/drm/nouveau/nouveau_dma.c  |  7 +--
  drivers/gpu/drm/nouveau/nouveau_dma.h  |  8 ++--
  drivers/gpu/drm/nouveau/nouveau_exec.c | 15 ---
  drivers/gpu/drm/nouveau/nouveau_gem.c  |  6 --
  include/uapi/drm/nouveau_drm.h         |  8 +++-
  5 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.c 
b/drivers/gpu/drm/nouveau/nouveau_dma.c
index b90cac6d5772..059925e5db6a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dma.c
@@ -69,16 +69,19 @@ READ_GET(struct nouveau_channel *chan, uint64_t 
*prev_get, int *timeout)
  }

  void
-nv50_dma_push(struct nouveau_channel *chan, u64 offset, int length)
+nv50_dma_push(struct nouveau_channel *chan, u64 offset, u32 length,
+             bool prefetch)
  {
       struct nvif_user *user = &chan->drm->client.device.user;
         struct nouveau_bo *pb = chan->push.buffer;
         int ip = (chan->dma.ib_put * 2) + chan->dma.ib_base;

         BUG_ON(chan->dma.ib_free < 1);
+       WARN_ON(length > NV50_DMA_PUSH_MAX_LENGTH);

         nouveau_bo_wr32(pb, ip++, lower_32_bits(offset));
-       nouveau_bo_wr32(pb, ip++, upper_32_bits(offset) | length << 8);
+       nouveau_bo_wr32(pb, ip++, upper_32_bits(offset) | length << 8 |
+                       (prefetch ? 0 : (1 << 31)));


It feels a bit weird to be inverting this bit twice. IDK that it matters, 
though.


I usually avoid negated argument names, in this case it kinda makes sense 
though.




         chan->dma.ib_put = (chan->dma.ib_put + 1) & chan->dma.ib_max;

diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.h 
b/drivers/gpu/drm/nouveau/nouveau_dma.h
index 035a709c7be1..fb471c357336 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dma.h
+++ b/drivers/gpu/drm/nouveau/nouveau_dma.h
@@ -31,7 +31,8 @@
  #include "nouveau_chan.h"

  int nouveau_dma_wait(struct nouveau_channel *, int slots, int size);
-void nv50_dma_push(struct nouveau_channel *, u64 addr, int length);
+void nv50_dma_push(struct nouveau_channel *, u64 addr, u32 length,
+                  bool prefetch);

  /*
   * There's a hw race condition where you can't jump to your PUT offset,
@@ -45,6 +46,9 @@ void nv50_dma_push(struct nouveau_channel *, u64 addr, 
int length);
   */
  #define NOUVEAU_DMA_SKIPS (128 / 4)

+/* Maximum push buffer size. */
+#define NV50_DMA_PUSH_MAX_LENGTH 0x7fffff
+
  /* Object handles - for stuff that's doesn't use handle == oclass. */
  enum {
         NvDmaFB         = 0x8002,
@@ -89,7 +93,7 @@ FIRE_RING(struct nouveau_channel *chan)

         if (chan->dma.ib_max) {
                 nv50_dma_push(chan, chan->push.addr + (chan->dma.put << 2),
-                             (chan->dma.cur - chan->dma.put) << 2);
+                             (chan->dma.cur - chan->dma.put) << 2, true);
         } else {
                 WRITE_PUT(chan->dma.cur);
         }
diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c 
b/drivers/gpu/drm/nouveau/nouveau_exec.c
index 0f927adda4ed..a123b07b2adf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_exec.c
+++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
@@ -164,8 +164,10 @@ nouveau_exec_job_run(struct nouveau_job *job)
         }

         for (i = 0; i < exec_job->push.count; i++) {
-               nv50_dma_push(chan, exec_job->push.s[i].va,
-                             exec_job->push.s[i].va_len);
+               struct drm_nouveau_exec_push *p = &exec_job->push.s[i];
+               bool prefetch = !(p->flags & 
DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH);
+
+               nv50_dma_push(chan, p->va, p->va_len, prefetch);
         }

         ret = nouveau_fence_emit(fence, chan);
@@ -223,7 +225,14 @@ nouveau_exec_job_init(struct nouveau_exec_job **pjob,
  {
         struct nouveau_exec_job *job;
         struct nouveau_job_args args = {};
   

Re: [PATCH v6 3/4] drm: Expand max DRM device number to full MINORBITS

2023-08-23 Thread James Zhu

Hi Simon,

Thanks! Yes, this kernel patch should work with latest libdrm.

Best regards!

James Zhu

On 2023-08-23 06:53, Simon Ser wrote:

On Tuesday, August 8th, 2023 at 17:04, James Zhu  wrote:


I have a MR for libdrm to support drm nodes type up to 2^MINORBITS
nodes which can work with these patches,

https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305

FWIW, this MR has been merged, so in theory this kernel patch should
work fine with latest libdrm.


[Bug 217664] Laptop doesnt wake up from suspend mode.

2023-08-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=217664

Mario Limonciello (AMD) (mario.limoncie...@amd.com) changed:

   What|Removed |Added

 CC||mario.limoncie...@amd.com

--- Comment #24 from Mario Limonciello (AMD) (mario.limoncie...@amd.com) ---
> Kernel Version: 6.2.0-25-generic (64-bit)

This is the upstream kernel bug tracker and you're filing a bug on a distro
kernel.

Can you please try against a supported upstream mainline kernel not a distro
kernel?  This might be missing patches.

You can find mainline kernel builds for Ubuntu here:
https://kernel.ubuntu.com/~kernel-ppa/mainline/

I suggest trying 6.4.11.

> [  145.070506] PM: suspend entry (s2idle)
> [  152.723268] amd_pmc AMDI0005:00: Last suspend didn't reach deepest state

This system is using s2idle.  In this case, disabling amdgpu won't be useful to
identify a platform issue because the system won't reach the deepest state
without it.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH] dt-bindings: display: advantech,idk-2121wr: reference common panel

2023-08-23 Thread Rob Herring


On Wed, 23 Aug 2023 10:11:07 +0200, Krzysztof Kozlowski wrote:
> Reference common panel bindings to bring descriptions of common fields
> like panel-timing.
> 
> Signed-off-by: Krzysztof Kozlowski 
> ---
>  .../bindings/display/panel/advantech,idk-2121wr.yaml   | 3 +++
>  1 file changed, 3 insertions(+)
> 

Applied, thanks!



Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-23 Thread Tomasz Figa
On Wed, Aug 23, 2023 at 4:11 PM Hsia-Jun Li  wrote:
>
>
>
> On 8/23/23 12:46, Tomasz Figa wrote:
> >
> > Hi Hsia-Jun,
> >
> > On Tue, Aug 22, 2023 at 8:14 PM Hsia-Jun Li  wrote:
> >>
> >> Hello
> >>
> >> I would like to introduce a usage of SHMEM similar to DMA-buf; the major
> >> purpose of that is sharing metadata, or just a pure container, across
> >> drivers.
> >>
> >> We need to exchange some sort of metadata between drivers, like dynamic
> >> HDR data between video4linux2 and DRM.
> >
> > If the metadata isn't too big, would it be enough to just have the
> > kernel copy_from_user() to a kernel buffer in the ioctl code?
> >
> >> Or the graphics frame buffer is
> >> too complex to be described with a plain plane's DMA-buf fd.
> >> An issue between DRM and V4L2 is that DRM can only support 4 planes
> >> while it is 8 for V4L2. It would be pretty hard for DRM to extend its
> >> interface to support those 4 extra planes, which would lead to revisions
> >> of many standards like Vulkan and EGL.
> >
> > Could you explain how a shmem buffer could be used to support frame
> > buffers with more than 4 planes?
> > If you are asking why we need this:

I'm asking how your proposal to use shmem FD solves the problem for those cases.

> 1. metadata like dynamic HDR tone data
> 2. DRM also struggles with this problem, let me quote what sima said:
> "another trick that we iirc used for afbc is that sometimes the planes
> have a fixed layout, like nv12, and so logically it's multiple planes,
> but you only need one plane slot to describe the buffer, since I think
> afbc had the "we need more than 4 planes" issue too"
>
> Unfortunately, there are vendor pixel formats that are not fixed layout.
>
> 3. Secure(REE, trusted video piepline) info.
>
> As for how to assign such metadata:
> in the case of a drm fb_id, it is simple, we just add a drm plane property
> for it. The V4L2 interface is not as flexible; we can only pass it into
> the CAPTURE request_fd as a control.
> >>
> >> Also, there is no reason to consume a device's memory for content that
> >> the device can't read, or to waste an IOMMU entry on such data.
> >
> > That's right, but DMA-buf doesn't really imply any of those. DMA-buf
> > is just a kernel object with some backing memory. It's up to the
> > allocator to decide how the backing memory is allocated and up to the
> > importer on whether it would be mapped into an IOMMU.
> >
> I just want to say it can't be allocated in the same place as
> those DMA-bufs (graphics or compressed bitstream).
> This could also answer your first question: if we place this kind
> of buffer in a plane for DMABUF (importing) in V4L2, the V4L2 core would
> try to prepare it, which could map it into the IOMMU.
>

V4L2 core will prepare it according to the struct device that is given
to it. For the planes that don't have to go to the hardware a struct
device could be given that doesn't require any DMA mapping. Also you
can check how the uvcvideo driver handles it. It doesn't use the vb2
buffers directly, but always writes to them using CPU (due to how the
UVC protocol is designed).

> >> Usually, such metadata would be values to be written to a hardware's
> >> registers; a 4KiB page would hold 1024 items of 32-bit registers.
> >>
> >> Still, I have some problems with SHMEM:
> >> 1. I don't want the userspace to modify the contents of the SHMEM
> >> allocated by the kernel, is there a way to do so?
> >
> > This is generally impossible without doing any of the two:
> > 1) copying the contents to an internal buffer not accessible to the
> > userspace, OR
> > 2) modifying any of the buffer mappings to read-only
> >
> > 2) can actually be more costly than 1) (depending on the architecture,
> > data size, etc.), so we shouldn't just discard the option of a simple
> > copy_from_user() in the ioctl.
> >
> I don't want the userspace to access it at all. So that won't be a problem.

In this case, wouldn't it be enough to have a DMA-buf exporter that
doesn't provide the mmap op?

> >> 2. Should I create a helper function for installing the SHMEM file as a fd?
> >
> > We already have the udmabuf device [1] to turn a memfd into a DMA-buf,
> > so maybe that would be enough?
> >
> > [1] 
> > https://elixir.bootlin.com/linux/v6.5-rc7/source/drivers/dma-buf/udmabuf.c
> >
> It is the kernel driver that allocates this buffer. For example, v4l2
> CAPTURE allocates a buffer for metadata at VIDIOC_REQBUFS time.
> Or GBM gives you an fd which is assigned to a surface.
>
> So we need a kernel interface.

Sorry, I'm confused. If we're talking about buffers allocated by the
specific allocators like V4L2 or GBM, why do we need SHMEM at all?
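
(For completeness, the udmabuf path mentioned above looks roughly like
the following from userspace - a minimal, untested sketch with error
handling mostly elided:)

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/udmabuf.h>

/* Wrap a page-aligned region of a memfd in a DMA-buf fd. */
static int memfd_to_dmabuf(int memfd, __u64 offset, __u64 size)
{
	struct udmabuf_create create = {
		.memfd  = memfd,
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = offset,	/* must be page-aligned */
		.size   = size,		/* must be page-aligned */
	};
	int devfd, buffd;

	/* udmabuf requires the memfd to be sealed against shrinking. */
	if (fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
		return -1;

	devfd = open("/dev/udmabuf", O_RDWR);
	if (devfd < 0)
		return -1;

	buffd = ioctl(devfd, UDMABUF_CREATE, &create);
	close(devfd);

	return buffd;	/* DMA-buf fd on success, -1 on error */
}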

Best,
Tomasz

> > Best,
> > Tomasz
> >
> >>
> >> --
> >> Hsia-Jun(Randy) Li
>
> --
> Hsia-Jun(Randy) Li


[PATCH v3 09/10] drm/msm/a6xx: Add A740 support

2023-08-23 Thread Konrad Dybcio
A740 builds upon the A730 IP, shuffling some values and registers
around. More differences will appear when things like BCL are
implemented.

adreno_is_a740_family is added in preparation for more A7xx GPUs,
the logic checks will be valid resulting in smaller diffs.

Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Dmitry Baryshkov  # sm8450
Signed-off-by: Konrad Dybcio 
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c  | 88 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 82 +---
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c  | 27 +
 drivers/gpu/drm/msm/adreno/adreno_device.c | 17 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c|  6 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h| 19 ++-
 6 files changed, 201 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 17e1e72f5d7d..14ba407e7fe0 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -516,6 +516,7 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
struct platform_device *pdev = to_platform_device(gmu->dev);
void __iomem *pdcptr = a6xx_gmu_get_mmio(pdev, "gmu_pdc");
+   u32 seqmem0_drv0_reg = REG_A6XX_RSCC_SEQ_MEM_0_DRV0;
void __iomem *seqptr = NULL;
uint32_t pdc_address_offset;
bool pdc_in_aop = false;
@@ -549,21 +550,26 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
gmu_write_rscc(gmu, REG_A6XX_RSCC_HIDDEN_TCS_CMD0_ADDR, 0);
gmu_write_rscc(gmu, REG_A6XX_RSCC_HIDDEN_TCS_CMD0_DATA + 2, 0);
gmu_write_rscc(gmu, REG_A6XX_RSCC_HIDDEN_TCS_CMD0_ADDR + 2, 0);
-   gmu_write_rscc(gmu, REG_A6XX_RSCC_HIDDEN_TCS_CMD0_DATA + 4, 0x80000000);
+   gmu_write_rscc(gmu, REG_A6XX_RSCC_HIDDEN_TCS_CMD0_DATA + 4,
+  adreno_is_a740_family(adreno_gpu) ? 0x80000021 : 
0x80000000);
gmu_write_rscc(gmu, REG_A6XX_RSCC_HIDDEN_TCS_CMD0_ADDR + 4, 0);
gmu_write_rscc(gmu, REG_A6XX_RSCC_OVERRIDE_START_ADDR, 0);
gmu_write_rscc(gmu, REG_A6XX_RSCC_PDC_SEQ_START_ADDR, 0x4520);
gmu_write_rscc(gmu, REG_A6XX_RSCC_PDC_MATCH_VALUE_LO, 0x4510);
gmu_write_rscc(gmu, REG_A6XX_RSCC_PDC_MATCH_VALUE_HI, 0x4514);
 
+   /* The second spin of A7xx GPUs messed with some register offsets.. */
+   if (adreno_is_a740_family(adreno_gpu))
+   seqmem0_drv0_reg = REG_A7XX_RSCC_SEQ_MEM_0_DRV0_A740;
+
/* Load RSC sequencer uCode for sleep and wakeup */
if (adreno_is_a650_family(adreno_gpu) ||
adreno_is_a7xx(adreno_gpu)) {
-   gmu_write_rscc(gmu, REG_A6XX_RSCC_SEQ_MEM_0_DRV0, 0xeaaae5a0);
-   gmu_write_rscc(gmu, REG_A6XX_RSCC_SEQ_MEM_0_DRV0 + 1, 
0xe1a1ebab);
-   gmu_write_rscc(gmu, REG_A6XX_RSCC_SEQ_MEM_0_DRV0 + 2, 
0xa2e0a581);
-   gmu_write_rscc(gmu, REG_A6XX_RSCC_SEQ_MEM_0_DRV0 + 3, 
0xecac82e2);
-   gmu_write_rscc(gmu, REG_A6XX_RSCC_SEQ_MEM_0_DRV0 + 4, 
0x0020edad);
+   gmu_write_rscc(gmu, seqmem0_drv0_reg, 0xeaaae5a0);
+   gmu_write_rscc(gmu, seqmem0_drv0_reg + 1, 0xe1a1ebab);
+   gmu_write_rscc(gmu, seqmem0_drv0_reg + 2, 0xa2e0a581);
+   gmu_write_rscc(gmu, seqmem0_drv0_reg + 3, 0xecac82e2);
+   gmu_write_rscc(gmu, seqmem0_drv0_reg + 4, 0x0020edad);
} else {
gmu_write_rscc(gmu, REG_A6XX_RSCC_SEQ_MEM_0_DRV0, 0xa7a506a0);
gmu_write_rscc(gmu, REG_A6XX_RSCC_SEQ_MEM_0_DRV0 + 1, 
0xa1e6a6e7);
@@ -767,8 +773,8 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, unsigned 
int state)
struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
u32 fence_range_lower, fence_range_upper;
+   u32 chipid, chipid_min = 0;
int ret;
-   u32 chipid;
 
/* Vote veto for FAL10 */
if (adreno_is_a650_family(adreno_gpu) || adreno_is_a7xx(adreno_gpu)) {
@@ -827,16 +833,37 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
unsigned int state)
 */
gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
 
-   /*
-* Note that the GMU has a slightly different layout for
-* chip_id, for whatever reason, so a bit of massaging
-* is needed.  The upper 16b are the same, but minor and
-* patchid are packed in four bits each with the lower
-* 8b unused:
-*/
-   chipid  = adreno_gpu->chip_id & 0xffff0000;
-   chipid |= (adreno_gpu->chip_id << 4) & 0xf000; /* minor */
-   chipid |= (adreno_gpu->chip_id << 8) & 0x0f00; /* patchid */
+   /* NOTE: A730 may also fall in this if-condition with a future GMU fw 
update. */
+   if (adreno_is_a7xx(adreno_gpu) && !adreno_is_a730(adreno_gpu)) {
+   /* A7xx GPUs have obfuscated chip IDs. Use constant maj = 7 */
+   

[PATCH v3 04/10] drm/msm/a6xx: Add missing regs for A7XX

2023-08-23 Thread Konrad Dybcio
Add some missing definitions required for A7 support.

This may be substituted with a mesa header sync.

Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Dmitry Baryshkov  # sm8450
Signed-off-by: Konrad Dybcio 
---
 drivers/gpu/drm/msm/adreno/a6xx.xml.h | 9 +
 drivers/gpu/drm/msm/adreno/a6xx_gmu.xml.h | 8 
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx.xml.h 
b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
index 1c051535fd4a..863b5e3b0e67 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx.xml.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx.xml.h
@@ -1114,6 +1114,12 @@ enum a6xx_tex_type {
 #define REG_A6XX_CP_MISC_CNTL  0x0840
 
 #define REG_A6XX_CP_APRIV_CNTL 0x0844
+#define A6XX_CP_APRIV_CNTL_CDWRITE 0x0040
+#define A6XX_CP_APRIV_CNTL_CDREAD  0x0020
+#define A6XX_CP_APRIV_CNTL_RBRPWB  0x0008
+#define A6XX_CP_APRIV_CNTL_RBPRIVLEVEL 0x0004
+#define A6XX_CP_APRIV_CNTL_RBFETCH 0x0002
+#define A6XX_CP_APRIV_CNTL_ICACHE  0x0001
 
 #define REG_A6XX_CP_PREEMPT_THRESHOLD  0x08c0
 
@@ -1939,6 +1945,8 @@ static inline uint32_t 
REG_A6XX_RBBM_PERFCTR_RBBM_SEL(uint32_t i0) { return 0x00
 
 #define REG_A6XX_RBBM_CLOCK_HYST_TEX_FCHE  0x0122
 
+#define REG_A7XX_RBBM_CLOCK_HYST2_VFD  0x012f
+
 #define REG_A6XX_RBBM_LPAC_GBIF_CLIENT_QOS_CNTL
0x05ff
 
 #define REG_A6XX_DBGC_CFG_DBGBUS_SEL_A 0x0600
@@ -8252,5 +8260,6 @@ static inline uint32_t 
A6XX_CX_DBGC_CFG_DBGBUS_BYTEL_1_BYTEL15(uint32_t val)
 
 #define REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1   0x0002
 
+#define REG_A7XX_CX_MISC_TCM_RET_CNTL  0x0039
 
 #endif /* A6XX_XML */
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.xml.h 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.xml.h
index fcd9eb53baf8..5b66efafc901 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.xml.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.xml.h
@@ -360,6 +360,12 @@ static inline uint32_t A6XX_GMU_GPU_NAP_CTRL_SID(uint32_t 
val)
 
 #define REG_A6XX_GMU_GENERAL_7 0x51cc
 
+#define REG_A6XX_GMU_GENERAL_8 0x51cd
+
+#define REG_A6XX_GMU_GENERAL_9 0x51ce
+
+#define REG_A6XX_GMU_GENERAL_10
0x51cf
+
 #define REG_A6XX_GMU_ISENSE_CTRL   0x515d
 
 #define REG_A6XX_GPU_CS_ENABLE_REG 0x8920
@@ -471,6 +477,8 @@ static inline uint32_t A6XX_GMU_GPU_NAP_CTRL_SID(uint32_t 
val)
 
#define REG_A6XX_RSCC_SEQ_BUSY_DRV0    0x0101
 
+#define REG_A7XX_RSCC_SEQ_MEM_0_DRV0_A740  0x0154
+
 #define REG_A6XX_RSCC_SEQ_MEM_0_DRV0   0x0180
 
 #define REG_A6XX_RSCC_TCS0_DRV0_STATUS 0x0346

-- 
2.42.0



[PATCH v3 01/10] dt-bindings: display/msm/gmu: Add Adreno 7[34]0 GMU

2023-08-23 Thread Konrad Dybcio
The GMU on the A7xx series is pretty much the same as on the A6xx parts.
It's now "smarter", needs a few fewer register writes, and controls more
things (like inter-frame power collapse) mostly internally (instead of
us having to write to G[PM]U_[CG]X registers from APPS).

The only difference worth mentioning is the now-required DEMET clock,
which is strictly required for things like asserting reset lines; not
turning it on results in the GMU not being fully functional (all OOB
requests would fail and HFI would hang after the first submitted OOB).

Describe the A730 and A740 GMU.

Reviewed-by: Krzysztof Kozlowski 
Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Dmitry Baryshkov  # sm8450
Signed-off-by: Konrad Dybcio 
---
 .../devicetree/bindings/display/msm/gmu.yaml   | 40 +-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/display/msm/gmu.yaml 
b/Documentation/devicetree/bindings/display/msm/gmu.yaml
index 5fc4106110ad..20ddb89a4500 100644
--- a/Documentation/devicetree/bindings/display/msm/gmu.yaml
+++ b/Documentation/devicetree/bindings/display/msm/gmu.yaml
@@ -21,7 +21,7 @@ properties:
   compatible:
 oneOf:
   - items:
-  - pattern: '^qcom,adreno-gmu-6[0-9][0-9]\.[0-9]$'
+  - pattern: '^qcom,adreno-gmu-[67][0-9][0-9]\.[0-9]$'
   - const: qcom,adreno-gmu
   - const: qcom,adreno-gmu-wrapper
 
@@ -213,6 +213,44 @@ allOf:
 - const: axi
 - const: memnoc
 
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - qcom,adreno-gmu-730.1
+  - qcom,adreno-gmu-740.1
+then:
+  properties:
+reg:
+  items:
+- description: Core GMU registers
+- description: Resource controller registers
+- description: GMU PDC registers
+reg-names:
+  items:
+- const: gmu
+- const: rscc
+- const: gmu_pdc
+clocks:
+  items:
+- description: GPU AHB clock
+- description: GMU clock
+- description: GPU CX clock
+- description: GPU AXI clock
+- description: GPU MEMNOC clock
+- description: GMU HUB clock
+- description: GPUSS DEMET clock
+clock-names:
+  items:
+- const: ahb
+- const: gmu
+- const: cxo
+- const: axi
+- const: memnoc
+- const: hub
+- const: demet
+
   - if:
   properties:
 compatible:

-- 
2.42.0



[PATCH v3 10/10] drm/msm/a6xx: Poll for GBIF unhalt status in hw_init

2023-08-23 Thread Konrad Dybcio
Some GPUs - particularly A7xx ones - are really really stubborn and
sometimes take a longer-than-expected time to finish unhalting GBIF.

Note that this is not caused by the request a few lines above.

Poll for the unhalt ack to make sure we're not trying to write bits to
an essentially dead GPU that can't receive data on its end of the bus.
Failing to do this will result in inexplicable GMU timeouts or worse.

This is a rather ugly hack which introduces a whole lot of latency.

Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Dmitry Baryshkov  # sm8450
Signed-off-by: Konrad Dybcio 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 2313620084b6..11cb410e0ac7 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1629,6 +1629,10 @@ static int hw_init(struct msm_gpu *gpu)
mb();
}
 
+   /* Some GPUs are stubborn and take their sweet time to unhalt GBIF! */
+   if (adreno_is_a7xx(adreno_gpu) && a6xx_has_gbif(adreno_gpu))
+   spin_until(!gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK));
+
gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0);
 
if (adreno_is_a619_holi(adreno_gpu))

-- 
2.42.0



[PATCH v3 06/10] drm/msm/a6xx: Send ACD state to QMP at GMU resume

2023-08-23 Thread Konrad Dybcio
The QMP mailbox expects to be notified of the ACD (Adaptive Clock
Distribution) state. Get a handle to the mailbox at probe time and
poke it at GMU resume.

Since we don't fully support ACD yet, hardcode the message to "val: 0"
(state = disabled).

Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Dmitry Baryshkov  # sm8450
Signed-off-by: Konrad Dybcio 
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 21 +
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h |  3 +++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 75984260898e..17e1e72f5d7d 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -980,11 +980,13 @@ static void a6xx_gmu_set_initial_bw(struct msm_gpu *gpu, 
struct a6xx_gmu *gmu)
dev_pm_opp_put(gpu_opp);
 }
 
+#define GMU_ACD_STATE_MSG_LEN  36
 int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
 {
struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
struct msm_gpu *gpu = &adreno_gpu->base;
struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
+   char buf[GMU_ACD_STATE_MSG_LEN];
int status, ret;
 
if (WARN(!gmu->initialized, "The GMU is not set up yet\n"))
@@ -992,6 +994,18 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
 
gmu->hung = false;
 
+   /* Notify AOSS about the ACD state (unimplemented for now => disable 
it) */
+   if (!IS_ERR(gmu->qmp)) {
+   ret = snprintf(buf, sizeof(buf),
+  "{class: gpu, res: acd, val: %d}",
+  0 /* Hardcode ACD to be disabled for now */);
+   WARN_ON(ret >= GMU_ACD_STATE_MSG_LEN);
+
+   ret = qmp_send(gmu->qmp, buf, sizeof(buf));
+   if (ret)
+   dev_err(gmu->dev, "failed to send GPU ACD state\n");
+   }
+
/* Turn on the resources */
pm_runtime_get_sync(gmu->dev);
 
@@ -1744,6 +1758,10 @@ int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct 
device_node *node)
goto detach_cxpd;
}
 
+   gmu->qmp = qmp_get(gmu->dev);
+   if (IS_ERR(gmu->qmp) && adreno_is_a7xx(adreno_gpu))
+   return PTR_ERR(gmu->qmp);
+
init_completion(&gmu->pd_gate);
complete_all(&gmu->pd_gate);
gmu->pd_nb.notifier_call = cxpd_notifier_cb;
@@ -1767,6 +1785,9 @@ int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct 
device_node *node)
 
return 0;
 
+   if (!IS_ERR_OR_NULL(gmu->qmp))
+   qmp_put(gmu->qmp);
+
 detach_cxpd:
dev_pm_domain_detach(gmu->cxpd, false);
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
index 236f81a43caa..592b296aab22 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "msm_drv.h"
 #include "a6xx_hfi.h"
 
@@ -96,6 +97,8 @@ struct a6xx_gmu {
/* For power domain callback */
struct notifier_block pd_nb;
struct completion pd_gate;
+
+   struct qmp *qmp;
 };
 
 static inline u32 gmu_read(struct a6xx_gmu *gmu, u32 offset)

-- 
2.42.0



[PATCH v3 05/10] drm/msm/a6xx: Add skeleton A7xx support

2023-08-23 Thread Konrad Dybcio
A7xx GPUs are - from the kernel's POV anyway - basically another generation
of A6xx. They build upon the A650/A660_family advancements, skipping some
writes (presumably more values are preset correctly on reset), adding
some new ones and changing others.

One notable difference is the introduction of a second shadow, called BV.
To handle this with the current code, allocate it right after the current
RPTR shadow.

BV handling and .submit are mostly based on Jonathan Marek's work.

All A7xx GPUs are assumed to have a GMU.
A702 is not an A7xx-class GPU, it's a weird forked A610.

Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Dmitry Baryshkov  # sm8450
Signed-off-by: Konrad Dybcio 
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c   |  95 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 451 
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |   1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  10 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h|   2 +
 5 files changed, 478 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 03fa89bf3e4b..75984260898e 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -200,9 +200,10 @@ int a6xx_gmu_wait_for_idle(struct a6xx_gmu *gmu)
 
 static int a6xx_gmu_start(struct a6xx_gmu *gmu)
 {
+   struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
+   struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+   u32 mask, reset_val, val;
int ret;
-   u32 val;
-   u32 mask, reset_val;
 
val = gmu_read(gmu, REG_A6XX_GMU_CM3_DTCM_START + 0xff8);
if (val <= 0x20010004) {
@@ -218,7 +219,11 @@ static int a6xx_gmu_start(struct a6xx_gmu *gmu)
/* Set the log wptr index
 * note: downstream saves the value in poweroff and restores it here
 */
-   gmu_write(gmu, REG_A6XX_GPU_GMU_CX_GMU_PWR_COL_CP_RESP, 0);
+   if (adreno_is_a7xx(adreno_gpu))
+   gmu_write(gmu, REG_A6XX_GMU_GENERAL_9, 0);
+   else
+   gmu_write(gmu, REG_A6XX_GPU_GMU_CX_GMU_PWR_COL_CP_RESP, 0);
+
 
gmu_write(gmu, REG_A6XX_GMU_CM3_SYSRESET, 0);
 
@@ -518,7 +523,9 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
if (IS_ERR(pdcptr))
goto err;
 
-   if (adreno_is_a650(adreno_gpu) || adreno_is_a660_family(adreno_gpu))
+   if (adreno_is_a650(adreno_gpu) ||
+   adreno_is_a660_family(adreno_gpu) ||
+   adreno_is_a7xx(adreno_gpu))
pdc_in_aop = true;
else if (adreno_is_a618(adreno_gpu) || 
adreno_is_a640_family(adreno_gpu))
pdc_address_offset = 0x30090;
@@ -550,7 +557,8 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
gmu_write_rscc(gmu, REG_A6XX_RSCC_PDC_MATCH_VALUE_HI, 0x4514);
 
/* Load RSC sequencer uCode for sleep and wakeup */
-   if (adreno_is_a650_family(adreno_gpu)) {
+   if (adreno_is_a650_family(adreno_gpu) ||
+   adreno_is_a7xx(adreno_gpu)) {
gmu_write_rscc(gmu, REG_A6XX_RSCC_SEQ_MEM_0_DRV0, 0xeaaae5a0);
gmu_write_rscc(gmu, REG_A6XX_RSCC_SEQ_MEM_0_DRV0 + 1, 
0xe1a1ebab);
gmu_write_rscc(gmu, REG_A6XX_RSCC_SEQ_MEM_0_DRV0 + 2, 
0xa2e0a581);
@@ -635,11 +643,18 @@ static void a6xx_gmu_rpmh_init(struct a6xx_gmu *gmu)
 /* Set up the idle state for the GMU */
 static void a6xx_gmu_power_config(struct a6xx_gmu *gmu)
 {
+   struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
+   struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+
/* Disable GMU WB/RB buffer */
gmu_write(gmu, REG_A6XX_GMU_SYS_BUS_CONFIG, 0x1);
gmu_write(gmu, REG_A6XX_GMU_ICACHE_CONFIG, 0x1);
gmu_write(gmu, REG_A6XX_GMU_DCACHE_CONFIG, 0x1);
 
+   /* A7xx knows better by default! */
+   if (adreno_is_a7xx(adreno_gpu))
+   return;
+
gmu_write(gmu, REG_A6XX_GMU_PWR_COL_INTER_FRAME_CTRL, 0x9c40400);
 
switch (gmu->idle_level) {
@@ -702,7 +717,7 @@ static int a6xx_gmu_fw_load(struct a6xx_gmu *gmu)
u32 itcm_base = 0x00000000;
u32 dtcm_base = 0x00040000;
 
-   if (adreno_is_a650_family(adreno_gpu))
+   if (adreno_is_a650_family(adreno_gpu) || adreno_is_a7xx(adreno_gpu))
dtcm_base = 0x10004000;
 
if (gmu->legacy) {
@@ -751,14 +766,22 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
unsigned int state)
 {
struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+   u32 fence_range_lower, fence_range_upper;
int ret;
u32 chipid;
 
-   if (adreno_is_a650_family(adreno_gpu)) {
+   /* Vote veto for FAL10 */
+   if (adreno_is_a650_family(adreno_gpu) || adreno_is_a7xx(adreno_gpu)) {
gmu_write(gmu, REG_A6XX_GPU_GMU_CX_GMU_CX_FALNEXT_INTF, 1);
gmu_write(gmu, 

[PATCH v3 07/10] drm/msm/a6xx: Mostly implement A7xx gpu_state

2023-08-23 Thread Konrad Dybcio
Provide the necessary alternations to mostly support state dumping on
A7xx. Newer GPUs will probably require more changes here. Crashdumper
and debugbus remain untested.

Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Dmitry Baryshkov  # sm8450
Signed-off-by: Konrad Dybcio 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 52 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h | 61 -
 2 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index 4e5d650578c6..18be2d3bde09 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -948,6 +948,18 @@ static u32 a6xx_get_cp_roq_size(struct msm_gpu *gpu)
return gpu_read(gpu, REG_A6XX_CP_ROQ_THRESHOLDS_2) >> 14;
 }
 
+static u32 a7xx_get_cp_roq_size(struct msm_gpu *gpu)
+{
+   /*
+* The value at CP_ROQ_THRESHOLDS_2[20:31] is in 4dword units.
+* That register however is not directly accessible from APSS on A7xx.
+* Program the SQE_UCODE_DBG_ADDR with offset=0x70d3 and read the value.
+*/
+   gpu_write(gpu, REG_A6XX_CP_SQE_UCODE_DBG_ADDR, 0x70d3);
+
+   return 4 * (gpu_read(gpu, REG_A6XX_CP_SQE_UCODE_DBG_DATA) >> 20);
+}
+
 /* Read a block of data from an indexed register pair */
 static void a6xx_get_indexed_regs(struct msm_gpu *gpu,
struct a6xx_gpu_state *a6xx_state,
@@ -1019,8 +1031,40 @@ static void a6xx_get_indexed_registers(struct msm_gpu 
*gpu,
 
/* Restore the size in the hardware */
gpu_write(gpu, REG_A6XX_CP_MEM_POOL_SIZE, mempool_size);
+}
+
+static void a7xx_get_indexed_registers(struct msm_gpu *gpu,
+   struct a6xx_gpu_state *a6xx_state)
+{
+   int i, indexed_count, mempool_count;
+
+   indexed_count = ARRAY_SIZE(a7xx_indexed_reglist);
+   mempool_count = ARRAY_SIZE(a7xx_cp_bv_mempool_indexed);
 
-   a6xx_state->nr_indexed_regs = count;
+   a6xx_state->indexed_regs = state_kcalloc(a6xx_state,
+   indexed_count + mempool_count,
+   sizeof(*a6xx_state->indexed_regs));
+   if (!a6xx_state->indexed_regs)
+   return;
+
+   a6xx_state->nr_indexed_regs = indexed_count + mempool_count;
+
+   /* First read the common regs */
+   for (i = 0; i < indexed_count; i++)
+   a6xx_get_indexed_regs(gpu, a6xx_state, &a7xx_indexed_reglist[i],
+   &a6xx_state->indexed_regs[i]);
+
+   gpu_rmw(gpu, REG_A6XX_CP_CHICKEN_DBG, 0, BIT(2));
+   gpu_rmw(gpu, REG_A7XX_CP_BV_CHICKEN_DBG, 0, BIT(2));
+
+   /* Get the contents of the CP_BV mempool */
+   for (i = 0; i < mempool_count; i++)
+   a6xx_get_indexed_regs(gpu, a6xx_state, 
a7xx_cp_bv_mempool_indexed,
+   &a6xx_state->indexed_regs[indexed_count - 1 + i]);
+
+   gpu_rmw(gpu, REG_A6XX_CP_CHICKEN_DBG, BIT(2), 0);
+   gpu_rmw(gpu, REG_A7XX_CP_BV_CHICKEN_DBG, BIT(2), 0);
+   return;
 }
 
 struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
@@ -1056,6 +1100,12 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu 
*gpu)
return _state->base;
 
/* Get the banks of indexed registers */
+   if (adreno_is_a7xx(adreno_gpu)) {
+   a7xx_get_indexed_registers(gpu, a6xx_state);
+   /* Further codeflow is untested on A7xx. */
+   return _state->base;
+   }
+
a6xx_get_indexed_registers(gpu, a6xx_state);
 
/*
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h
index e788ed72eb0d..8d7e6f26480a 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h
@@ -338,6 +338,28 @@ static const struct a6xx_registers a6xx_vbif_reglist =
 static const struct a6xx_registers a6xx_gbif_reglist =
REGS(a6xx_gbif_registers, 0, 0);
 
+static const u32 a7xx_ahb_registers[] = {
+   /* RBBM_STATUS */
+   0x210, 0x210,
+   /* RBBM_STATUS2-3 */
+   0x212, 0x213,
+};
+
+static const u32 a7xx_gbif_registers[] = {
+   0x3c00, 0x3c0b,
+   0x3c40, 0x3c42,
+   0x3c45, 0x3c47,
+   0x3c49, 0x3c4a,
+   0x3cc0, 0x3cd1,
+};
+
+static const struct a6xx_registers a7xx_ahb_reglist[] = {
+   REGS(a7xx_ahb_registers, 0, 0),
+};
+
+static const struct a6xx_registers a7xx_gbif_reglist =
+   REGS(a7xx_gbif_registers, 0, 0);
+
 static const u32 a6xx_gmu_gx_registers[] = {
/* GMU GX */
0x, 0x, 0x0010, 0x0013, 0x0016, 0x0016, 0x0018, 0x001b,
@@ -384,14 +406,17 @@ static const struct a6xx_registers a6xx_gmu_reglist[] = {
 };
 
 static u32 a6xx_get_cp_roq_size(struct msm_gpu *gpu);
+static u32 a7xx_get_cp_roq_size(struct msm_gpu *gpu);
 
-static struct a6xx_indexed_registers {
+struct a6xx_indexed_registers {

[PATCH v3 08/10] drm/msm/a6xx: Add A730 support

2023-08-23 Thread Konrad Dybcio
Add support for Adreno 730, also known as GEN7_0_x, found on SM8450.

Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Dmitry Baryshkov  # sm8450
Signed-off-by: Konrad Dybcio 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 126 -
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c  |  61 ++
 drivers/gpu/drm/msm/adreno/adreno_device.c |  13 +++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h|   7 +-
 4 files changed, 203 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 61ce8d053355..522043883290 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -837,6 +837,63 @@ const struct adreno_reglist a690_hwcg[] = {
{}
 };
 
+const struct adreno_reglist a730_hwcg[] = {
+   { REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x0222 },
+   { REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x0202 },
+   { REG_A6XX_RBBM_CLOCK_HYST_SP0, 0xf3cf },
+   { REG_A6XX_RBBM_CLOCK_DELAY_SP0, 0x0080 },
+   { REG_A6XX_RBBM_CLOCK_CNTL_TP0, 0x2220 },
+   { REG_A6XX_RBBM_CLOCK_CNTL2_TP0, 0x },
+   { REG_A6XX_RBBM_CLOCK_CNTL3_TP0, 0x },
+   { REG_A6XX_RBBM_CLOCK_CNTL4_TP0, 0x0022 },
+   { REG_A6XX_RBBM_CLOCK_HYST_TP0, 0x },
+   { REG_A6XX_RBBM_CLOCK_HYST2_TP0, 0x },
+   { REG_A6XX_RBBM_CLOCK_HYST3_TP0, 0x },
+   { REG_A6XX_RBBM_CLOCK_HYST4_TP0, 0x0007 },
+   { REG_A6XX_RBBM_CLOCK_DELAY_TP0, 0x },
+   { REG_A6XX_RBBM_CLOCK_DELAY2_TP0, 0x },
+   { REG_A6XX_RBBM_CLOCK_DELAY3_TP0, 0x },
+   { REG_A6XX_RBBM_CLOCK_DELAY4_TP0, 0x0001 },
+   { REG_A6XX_RBBM_CLOCK_CNTL_UCHE, 0x },
+   { REG_A6XX_RBBM_CLOCK_HYST_UCHE, 0x0004 },
+   { REG_A6XX_RBBM_CLOCK_DELAY_UCHE, 0x0002 },
+   { REG_A6XX_RBBM_CLOCK_CNTL_RB0, 0x },
+   { REG_A6XX_RBBM_CLOCK_CNTL2_RB0, 0x0100 },
+   { REG_A6XX_RBBM_CLOCK_CNTL_CCU0, 0x2220 },
+   { REG_A6XX_RBBM_CLOCK_HYST_RB_CCU0, 0x44000f00 },
+   { REG_A6XX_RBBM_CLOCK_CNTL_RAC, 0x25222022 },
+   { REG_A6XX_RBBM_CLOCK_CNTL2_RAC, 0x0055 },
+   { REG_A6XX_RBBM_CLOCK_DELAY_RAC, 0x0011 },
+   { REG_A6XX_RBBM_CLOCK_HYST_RAC, 0x00440044 },
+   { REG_A6XX_RBBM_CLOCK_CNTL_TSE_RAS_RBBM, 0x0422 },
+   { REG_A7XX_RBBM_CLOCK_MODE2_GRAS, 0x0222 },
+   { REG_A7XX_RBBM_CLOCK_MODE_BV_GRAS, 0x0022 },
+   { REG_A6XX_RBBM_CLOCK_MODE_GPC, 0x0223 },
+   { REG_A6XX_RBBM_CLOCK_MODE_VFD, 0x },
+   { REG_A7XX_RBBM_CLOCK_MODE_BV_GPC, 0x0022 },
+   { REG_A7XX_RBBM_CLOCK_MODE_BV_VFD, 0x },
+   { REG_A6XX_RBBM_CLOCK_HYST_TSE_RAS_RBBM, 0x },
+   { REG_A6XX_RBBM_CLOCK_HYST_GPC, 0x04104004 },
+   { REG_A6XX_RBBM_CLOCK_HYST_VFD, 0x },
+   { REG_A6XX_RBBM_CLOCK_DELAY_TSE_RAS_RBBM, 0x4000 },
+   { REG_A6XX_RBBM_CLOCK_DELAY_GPC, 0x0200 },
+   { REG_A6XX_RBBM_CLOCK_DELAY_VFD, 0x },
+   { REG_A6XX_RBBM_CLOCK_MODE_HLSQ, 0x },
+   { REG_A6XX_RBBM_CLOCK_DELAY_HLSQ, 0x },
+   { REG_A6XX_RBBM_CLOCK_HYST_HLSQ, 0x },
+   { REG_A6XX_RBBM_CLOCK_DELAY_HLSQ_2, 0x0002 },
+   { REG_A7XX_RBBM_CLOCK_MODE_BV_LRZ, 0x5552 },
+   { REG_A7XX_RBBM_CLOCK_MODE_CP, 0x0223 },
+   { REG_A6XX_RBBM_CLOCK_CNTL, 0x8aa8aa82 },
+   { REG_A6XX_RBBM_ISDB_CNT, 0x0182 },
+   { REG_A6XX_RBBM_RAC_THRESHOLD_CNT, 0x },
+   { REG_A6XX_RBBM_SP_HYST_CNT, 0x },
+   { REG_A6XX_RBBM_CLOCK_CNTL_GMU_GX, 0x0222 },
+   { REG_A6XX_RBBM_CLOCK_DELAY_GMU_GX, 0x0111 },
+   { REG_A6XX_RBBM_CLOCK_HYST_GMU_GX, 0x0555 },
+   {},
+};
+
 static void a6xx_set_hwcg(struct msm_gpu *gpu, bool state)
 {
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -1048,6 +1105,59 @@ static const u32 a690_protect[] = {
A6XX_PROTECT_NORDWR(0x11c00, 0x0), /* note: infinite range */
 };
 
+static const u32 a730_protect[] = {
+   A6XX_PROTECT_RDONLY(0x0, 0x04ff),
+   A6XX_PROTECT_RDONLY(0x0050b, 0x0058),
+   A6XX_PROTECT_NORDWR(0x0050e, 0x0000),
+   A6XX_PROTECT_NORDWR(0x00510, 0x0000),
+   A6XX_PROTECT_NORDWR(0x00534, 0x0000),
+   A6XX_PROTECT_RDONLY(0x005fb, 0x009d),
+   A6XX_PROTECT_NORDWR(0x00699, 0x01e9),
+   A6XX_PROTECT_NORDWR(0x008a0, 0x0008),
+   A6XX_PROTECT_NORDWR(0x008ab, 0x0024),
+   /* 0x008d0-0x008dd are unprotected on purpose for tools like perfetto */
+   A6XX_PROTECT_RDONLY(0x008de, 0x0154),
+   A6XX_PROTECT_NORDWR(0x00900, 0x004d),
+   A6XX_PROTECT_NORDWR(0x0098d, 0x00b2),
+   A6XX_PROTECT_NORDWR(0x00a41, 0x01be),
+   A6XX_PROTECT_NORDWR(0x00df0, 0x0001),
+   A6XX_PROTECT_NORDWR(0x00e01, 0x0000),
+   A6XX_PROTECT_NORDWR(0x00e07, 0x0008),
+   A6XX_PROTECT_NORDWR(0x03c00, 0x00c3),
+   A6XX_PROTECT_RDONLY(0x03cc4, 

[PATCH v3 02/10] dt-bindings: display/msm/gmu: Allow passing QMP handle

2023-08-23 Thread Konrad Dybcio
When booting the GMU, the QMP mailbox should be pinged about some tunables
(e.g. adaptive clock distribution state). To achieve that, a reference to
it is necessary. Allow it and require it with A730.

Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Dmitry Baryshkov  # sm8450
Acked-by: Krzysztof Kozlowski 
Signed-off-by: Konrad Dybcio 
---
 Documentation/devicetree/bindings/display/msm/gmu.yaml | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/msm/gmu.yaml 
b/Documentation/devicetree/bindings/display/msm/gmu.yaml
index 20ddb89a4500..e132dbff3c4a 100644
--- a/Documentation/devicetree/bindings/display/msm/gmu.yaml
+++ b/Documentation/devicetree/bindings/display/msm/gmu.yaml
@@ -64,6 +64,10 @@ properties:
   iommus:
 maxItems: 1
 
+  qcom,qmp:
+$ref: /schemas/types.yaml#/definitions/phandle
+description: Reference to the AOSS side-channel message RAM
+
   operating-points-v2: true
 
   opp-table:
@@ -251,6 +255,9 @@ allOf:
 - const: hub
 - const: demet
 
+  required:
+- qcom,qmp
+
   - if:
   properties:
 compatible:

-- 
2.42.0



[PATCH v3 00/10] A7xx support

2023-08-23 Thread Konrad Dybcio
This series attempts to introduce Adreno 700 support (with A730 and A740
found on SM8450 and SM8550 respectively), reusing much of the existing
A6xx code. This submission largely lays the groundwork for expansion and
more or less gives us feature parity (on the kernel side, that is) with
existing A6xx parts.

On top of introducing a very messy set of three (!) separate and
obfuscated device identifiers for each 7xx part, this generation
introduces very sophisticated hardware multi-threading and (on some SKUs)
hardware ray-tracing (not supported yet).

After this series, a long-overdue cleanup of drm/msm/adreno is planned
in preparation for adding more features and removing some hardcoding.

The last patch is a hack that may or may not be necessary depending
on your board's humour.. eh.. :/

Developed atop (and hence depends on) [1]

The corresponding devicetree patches are initially available at [2] and
will be posted after this series gets merged. To test it, you'll also need
firmware that you need to obtain from your board (there's none with a
redistributable license, sorry..). Most likely it will be in one of
these directories on your stock android installation:

* /vendor/firmware
* /vendor/firmware_mnt
* /system

..but some vendors make it hard and you have to do some grepping ;)

Requires [3] to work on the userspace side. You'll almost certainly want
to test it alongside Zink with a lot of debug flags (early impl), like:

TU_DEBUG=sysmem,nolrz,flushall,noubwc MESA_LOADER_DRIVER_OVERRIDE=zink kmscube

[1] 
https://lore.kernel.org/linux-arm-msm/20230517-topic-a7xx_prep-v4-0-b16f273a9...@linaro.org/
[2] https://github.com/SoMainline/linux/commits/topic/a7xx_dt
[3] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23217

Signed-off-by: Konrad Dybcio 
---
Changes in v3:
- Pick up tags
- Drop "increase HFI timeout", will revisit another day
- Use family identifiers in "add skeleton a7xx support"
- Drop patches that Rob already picked up
- Retest on A730, didn't explode
- Link to v2: 
https://lore.kernel.org/linux-arm-msm/20230628-topic-a7xx_drmmsm-v2-0-1439e1b23...@linaro.org/#t

Changes in v2:
- Rebase on chipid changes
- Reuse existing description for qcom,aoss in patch 2
- Pick up tags
- Link to v1: 
https://lore.kernel.org/r/20230628-topic-a7xx_drmmsm-v1-0-a7f4496e0...@linaro.org

---
Konrad Dybcio (10):
  dt-bindings: display/msm/gmu: Add Adreno 7[34]0 GMU
  dt-bindings: display/msm/gmu: Allow passing QMP handle
  dt-bindings: display/msm/gpu: Allow A7xx SKUs
  drm/msm/a6xx: Add missing regs for A7XX
  drm/msm/a6xx: Add skeleton A7xx support
  drm/msm/a6xx: Send ACD state to QMP at GMU resume
  drm/msm/a6xx: Mostly implement A7xx gpu_state
  drm/msm/a6xx: Add A730 support
  drm/msm/a6xx: Add A740 support
  drm/msm/a6xx: Poll for GBIF unhalt status in hw_init

 .../devicetree/bindings/display/msm/gmu.yaml   |  47 +-
 .../devicetree/bindings/display/msm/gpu.yaml   |   4 +-
 drivers/gpu/drm/msm/adreno/a6xx.xml.h  |   9 +
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c  | 204 +--
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h  |   3 +
 drivers/gpu/drm/msm/adreno/a6xx_gmu.xml.h  |   8 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 653 +++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c|  52 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h|  61 +-
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c  |  88 +++
 drivers/gpu/drm/msm/adreno/adreno_device.c |  30 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c|   7 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h|  32 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h   |   2 +
 14 files changed, 1078 insertions(+), 122 deletions(-)
---
base-commit: c26a0f88bc21bf52303b5a5fbf8edb0cc7723037
change-id: 20230628-topic-a7xx_drmmsm-123f30d76cf7

Best regards,
-- 
Konrad Dybcio 



[PATCH v3 03/10] dt-bindings: display/msm/gpu: Allow A7xx SKUs

2023-08-23 Thread Konrad Dybcio
Allow A7xx SKUs, such as the A730 GPU found on SM8450 and friends.
They use GMU for all things DVFS, just like most A6xx GPUs.

Reviewed-by: Krzysztof Kozlowski 
Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Dmitry Baryshkov  # sm8450
Signed-off-by: Konrad Dybcio 
---
 Documentation/devicetree/bindings/display/msm/gpu.yaml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/display/msm/gpu.yaml 
b/Documentation/devicetree/bindings/display/msm/gpu.yaml
index 56b9b247e8c2..b019db954793 100644
--- a/Documentation/devicetree/bindings/display/msm/gpu.yaml
+++ b/Documentation/devicetree/bindings/display/msm/gpu.yaml
@@ -23,7 +23,7 @@ properties:
   The driver is parsing the compat string for Adreno to
   figure out the gpu-id and patch level.
 items:
-  - pattern: '^qcom,adreno-[3-6][0-9][0-9]\.[0-9]$'
+  - pattern: '^qcom,adreno-[3-7][0-9][0-9]\.[0-9]$'
   - const: qcom,adreno
   - description: |
   The driver is parsing the compat string for Imageon to
@@ -203,7 +203,7 @@ allOf:
 properties:
   compatible:
 contains:
-  pattern: '^qcom,adreno-6[0-9][0-9]\.[0-9]$'
+  pattern: '^qcom,adreno-[67][0-9][0-9]\.[0-9]$'
 
   then: # Starting with A6xx, the clocks are usually defined in the GMU 
node
 properties:

-- 
2.42.0



Re: [PATCH 1/2] drm/panfrost: Add fdinfo support to Panfrost

2023-08-23 Thread Adrián Larumbe
Hi Steven, thanks for your feedback.

On 21.08.2023 16:56, Steven Price wrote:
>> We calculate the amount of time the GPU spends on a job with ktime samples,
>> and then add it to the cumulative total for the open DRM file, which is
>> what will be eventually exposed through the 'fdinfo' DRM file descriptor.
>> 
>> Signed-off-by: Adrián Larumbe 
>> ---
>>  drivers/gpu/drm/panfrost/panfrost_device.c | 12 
>>  drivers/gpu/drm/panfrost/panfrost_device.h | 10 +++
>>  drivers/gpu/drm/panfrost/panfrost_drv.c| 32 +-
>>  drivers/gpu/drm/panfrost/panfrost_job.c|  6 
>>  drivers/gpu/drm/panfrost/panfrost_job.h|  3 ++
>>  5 files changed, 62 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c 
>> b/drivers/gpu/drm/panfrost/panfrost_device.c
>> index fa1a086a862b..67a5e894d037 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_device.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_device.c
>> @@ -401,6 +401,18 @@ void panfrost_device_reset(struct panfrost_device 
>> *pfdev)
>>  panfrost_job_enable_interrupts(pfdev);
>>  }
>>  
>> +struct drm_info_gpu panfrost_device_get_counters(struct panfrost_device 
>> *pfdev,
>> + struct panfrost_file_priv 
>> *panfrost_priv)
>> +{
>> +struct drm_info_gpu gpu_info;
>> +
>> +gpu_info.engine =  panfrost_priv->elapsed_ns;
>> +gpu_info.cycles =  panfrost_priv->elapsed_ns * 
>> clk_get_rate(pfdev->clock);
>> +gpu_info.maxfreq =  clk_get_rate(pfdev->clock);
>
>First, calling clk_get_rate() twice here is inefficient.
>
>Second, I'm not sure it's really worth producing these derived values.
>As I understand it the purpose of cycles/maxfreq is to be able to
>provide a utilisation value which accounts for DVFS. I.e. if the GPU is
>clocked down the utilisation of cycles/maxfreq is low even if the GPU is
>active for the whole sample period.

>What we therefore need to report is the *maximum* frequency in
>clk_get_rate(). Also rather than just multiplying elapsed_ns by the
>current clock rate, we need to sum up cycles over time as the clock
>frequency changes. Alternatively it might be possible to use the actual
>GPU register (CYCLE_COUNT_LO/CYCLE_COUNT_HI at offset 0x90,0x94) -
>although note that this is reset when the GPU is reset.
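
(In other words, what fdinfo consumers expect is roughly
utilisation = delta_cycles / (delta_time * max_freq), which stays
meaningful under DVFS.)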

I've fixed this in a second version of the patch and now calculate the maximum
operating frequency during the driver initialisation stage in the following way:

struct dev_pm_opp *opp;
unsigned long freq = ULONG_MAX;

/* Find the fastest defined rate */
opp = dev_pm_opp_find_freq_floor(dev, &freq);
if (IS_ERR(opp))
	return PTR_ERR(opp);
pfdev->features.fast_rate = freq;

dev_pm_opp_put(opp);

Regarding the number of cycles, sampling CYCLE_COUNT would give us the most
accurate figure, however fdinfo must return values that are relative to the file
being queried, whereas that register would give us a raw count for all queues.

There's also the problem of clock frequencies being variable over time because
of DVFS. To get an accurate value for the number of cycles spent on a given
job, we would have to store clock frequencies together with their timestamps
every time there's a rate change, and then in the job dequeue function traverse
that list, find the interval intersections and multiply every frequency by its
respective interval length. This sounds like too much work, so I think until I
can come up with something less complex I'm going to drop reporting of the
drm-cycles value altogether.
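
(A rough, untested sketch of the bookkeeping that would be involved, with
hypothetical names, e.g. driven from a clk rate-change notifier:)

struct panfrost_cycle_acc {
	u64 cycles;             /* cycles accumulated so far */
	u64 last_ns;            /* timestamp of the last rate change */
	unsigned long rate_hz;  /* rate in effect since last_ns */
};

/* Fold the time spent at the old rate into the cycle counter. */
static void panfrost_cycle_acc_rate_change(struct panfrost_cycle_acc *acc,
					   unsigned long new_rate_hz,
					   u64 now_ns)
{
	u64 delta_ns = now_ns - acc->last_ns;

	acc->cycles += mul_u64_u32_div(delta_ns, acc->rate_hz, NSEC_PER_SEC);
	acc->last_ns = now_ns;
	acc->rate_hz = new_rate_hz;
}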

Although, come to think of it, maybe I could sample the number of cycles both
at the beginning and end of a job and add the difference to an overall
per-file tally.

>Finally I doubt elapsed_ns is actually what user space is expecting. The
>GPU has multiple job slots (3, but only 2 are used in almost all cases)
>so can be running more than one job at a time. So there's going to be
>some double counting going on here.
>
>Sorry to poke holes in this, I think this would be a good feature. But
>if we're going to return information we want it to be at least
>reasonably correct.

Thanks for pointing this out, I hadn't considered that the same file could
have two jobs running simultaneously.

I've checked what other drivers do for reporting these values, and they print a
separate drm-engine value for each of their execution units (render, copy,
compute, etc).  In our case, because there are 2 or 3 queues, perhaps we should
do the same.
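
(Following the drm-usage-stats "drm-engine-<keystr>: <uint> ns" convention,
that could look something like the lines below; the key names here are just
illustrative, not settled:)

drm-engine-fragment:	1234567890 ns
drm-engine-vertex-tiler:	987654321 ns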

>Thanks,
>
>Steve
>
>> +
>> +return gpu_info;
>> +}
>> +
>>  static int panfrost_device_resume(struct device *dev)
>>  {
>>  struct panfrost_device *pfdev = dev_get_drvdata(dev);
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
>> b/drivers/gpu/drm/panfrost/panfrost_device.h
>> index b0126b9fbadc..4621a2ece1bb 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
>> @@ -141,6 +141,14 @@ struct panfrost_file_priv {
>>  struct drm_sched_entity 

[Bug 217664] Laptop doesnt wake up from suspend mode.

2023-08-23 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=217664

--- Comment #23 from Alex Deucher (alexdeuc...@gmail.com) ---
Is the system accessible on resume?  I.e., can you get ssh access if the
display is not active?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v3 2/3] dt-bindings: display: novatek,nt35950: define ports

2023-08-23 Thread Rob Herring
On Wed, Aug 23, 2023 at 11:53:56AM +0200, Krzysztof Kozlowski wrote:
> On 23/08/2023 11:08, Rob Herring wrote:
> > 
> > On Wed, 23 Aug 2023 10:14:59 +0200, Krzysztof Kozlowski wrote:
> >> The panel-common schema does not define what "ports" property is, so
> >> bring the definition by referencing the panel-common-dual.yaml. Panels
> >> can be single- or dual-link, thus require only one port@0.
> >>
> >> Signed-off-by: Krzysztof Kozlowski 
> >>
> >> ---
> >>
> >> Changes since v2:
> >> 1. Use panel-common-dual
> >>
> >> Changes since v1:
> >> 1. Rework to add ports to device schema, not to panel-common.
> >> ---
> >>  .../devicetree/bindings/display/panel/novatek,nt35950.yaml | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> > 
> > My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
> > on your patch (DT_CHECKER_FLAGS is new in v5.13):
> > 
> > yamllint warnings/errors:
> 
> Previous patch seems to be missing in Patchwork, thus this error.
> 
> https://patchwork.ozlabs.org/project/devicetree-bindings/list/?submitter=83726=both=*

Must have been some delay on that one as it is there now.

Rob


Re: [PATCH drm-misc-next] drm/nouveau: uapi: don't pass NO_PREFETCH flag implicitly

2023-08-23 Thread kernel test robot
Hi Danilo,

kernel test robot noticed the following build warnings:

[auto build test WARNING on ad1367f831f8743746a1f49705c28e36a7c95525]

url:
https://github.com/intel-lab-lkp/linux/commits/Danilo-Krummrich/drm-nouveau-uapi-don-t-pass-NO_PREFETCH-flag-implicitly/20230823-074237
base:   ad1367f831f8743746a1f49705c28e36a7c95525
patch link:
https://lore.kernel.org/r/20230822234139.11185-1-dakr%40redhat.com
patch subject: [PATCH drm-misc-next] drm/nouveau: uapi: don't pass NO_PREFETCH 
flag implicitly
reproduce: 
(https://download.01.org/0day-ci/archive/20230823/202308232030.0r1irpmu-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202308232030.0r1irpmu-...@intel.com/

All warnings (new ones prefixed by >>):

>> ./include/uapi/drm/nouveau_drm.h:344: warning: Incorrect use of kernel-doc 
>> format:  * flags: the flags for this push buffer mapping
>> ./include/uapi/drm/nouveau_drm.h:348: warning: Function parameter or member 
>> 'flags' not described in 'drm_nouveau_exec_push'

vim +344 ./include/uapi/drm/nouveau_drm.h

   327  
   328  /**
   329   * struct drm_nouveau_exec_push - EXEC push operation
   330   *
   331   * This structure represents a single EXEC push operation. UMDs should 
pass an
   332   * array of this structure via struct drm_nouveau_exec's _ptr 
field.
   333   */
   334  struct drm_nouveau_exec_push {
   335  /**
   336   * @va: the virtual address of the push buffer mapping
   337   */
   338  __u64 va;
   339  /**
   340   * @va_len: the length of the push buffer mapping
   341   */
   342  __u32 va_len;
   343  /**
 > 344   * flags: the flags for this push buffer mapping
   345   */
   346  __u32 flags;
   347  #define DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH 0x1
 > 348  };
   349  
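
As an aside, with this layout a minimal userspace sketch of submitting a
push buffer with prefetching disabled would look something like the
following (hypothetical values, untested):

	struct drm_nouveau_exec_push push = {
		.va     = push_va,   /* VA of the push buffer mapping */
		.va_len = push_len,  /* length of the mapping in bytes */
		.flags  = DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH,
	};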

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


[PATCH 2/2] drivers/drm/i915: Honor limits->max_bpp while computing DSC max input bpp

2023-08-23 Thread Ankit Nautiyal
EDID-specific BPC constraints are stored in limits->max_bpp. Honor these
limits while computing the input bpp for DSC.

Signed-off-by: Ankit Nautiyal 
---
 drivers/gpu/drm/i915/display/intel_dp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index 5b48bfe09d0e..2a7f6cfe2832 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -2061,9 +2061,11 @@ static int intel_edp_dsc_compute_pipe_bpp(struct 
intel_dp *intel_dp,
if (forced_bpp) {
pipe_bpp = forced_bpp;
} else {
+   u8 max_bpc = limits->max_bpp / 3;
+
/* For eDP use max bpp that can be supported with DSC. */
pipe_bpp = intel_dp_dsc_compute_max_bpp(intel_dp,
-   
conn_state->max_requested_bpc);
+   min(max_bpc, 
conn_state->max_requested_bpc));
if (!is_dsc_pipe_bpp_sufficient(i915, conn_state, limits, 
pipe_bpp)) {
drm_dbg_kms(>drm,
"Computed BPC is not in DSC BPC limits\n");
-- 
2.40.1



[PATCH 1/2] drm/display/dp: Default 8 bpc support when DSC is supported

2023-08-23 Thread Ankit Nautiyal
As per DP v1.4, a DP DSC Sink device shall support 8bpc in DPCD 6Ah.
Apparently some panels that do support DSC are not setting the bit for
8bpc.

So always assume 8bpc support by DSC decoder, when DSC is claimed to be
supported.

Signed-off-by: Ankit Nautiyal 
---
 drivers/gpu/drm/display/drm_dp_helper.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/display/drm_dp_helper.c 
b/drivers/gpu/drm/display/drm_dp_helper.c
index e6a78fd32380..0aa4ce17420c 100644
--- a/drivers/gpu/drm/display/drm_dp_helper.c
+++ b/drivers/gpu/drm/display/drm_dp_helper.c
@@ -2447,14 +2447,19 @@ int drm_dp_dsc_sink_supported_input_bpcs(const u8 
dsc_dpcd[DP_DSC_RECEIVER_CAP_S
 u8 dsc_bpc[3])
 {
int num_bpc = 0;
+
+   if (!(dsc_dpcd[DP_DSC_SUPPORT] & DP_DSC_DECOMPRESSION_IS_SUPPORTED))
+   return 0;
+
u8 color_depth = dsc_dpcd[DP_DSC_DEC_COLOR_DEPTH_CAP - DP_DSC_SUPPORT];
 
if (color_depth & DP_DSC_12_BPC)
dsc_bpc[num_bpc++] = 12;
if (color_depth & DP_DSC_10_BPC)
dsc_bpc[num_bpc++] = 10;
-   if (color_depth & DP_DSC_8_BPC)
-   dsc_bpc[num_bpc++] = 8;
+
+   /* A DP DSC Sink device shall support 8 bpc. */
+   dsc_bpc[num_bpc++] = 8;
 
return num_bpc;
 }
-- 
2.40.1



[PATCH 0/2] eDP DSC fixes

2023-08-23 Thread Ankit Nautiyal
Assume 8bpc is supported if Sink claims DSC support.
Also consider bpc constraint coming from EDID while computing
input BPC for DSC.

Ankit Nautiyal (2):
  drm/display/dp: Default 8 bpc support when DSC is supported
  drivers/drm/i915: Honor limits->max_bpp while computing DSC max input
bpp

 drivers/gpu/drm/display/drm_dp_helper.c | 9 +++--
 drivers/gpu/drm/i915/display/intel_dp.c | 4 +++-
 2 files changed, 10 insertions(+), 3 deletions(-)

-- 
2.40.1



Re: [PATCH v6 3/4] drm: Expand max DRM device number to full MINORBITS

2023-08-23 Thread Simon Ser
On Wednesday, August 23rd, 2023 at 12:53, Simon Ser  wrote:

> On Tuesday, August 8th, 2023 at 17:04, James Zhu jam...@amd.com wrote:
> 
> > I have a MR for libdrm to support drm nodes type up to 2^MINORBITS
> > nodes which can work with these patches,
> > 
> > https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305
> 
> FWIW, this MR has been merged, so in theory this kernel patch should
> work fine with latest libdrm.

Hm, we might want to adjust MAX_DRM_NODES still. It's set to 256
currently, which should be enough for 128 DRM devices (each device
typically exposes two nodes, primary plus render), but not more.
Not bumping this value will make drmGetDevices2() print a warning and
not return all devices on systems with more than 128 DRM devices.


  1   2   >