Re: [PATCH] drm/sched: Re-queue run job worker when drm_sched_entity_pop_job() returns NULL

2024-01-29 Thread Christian König

On 30.01.24 04:04, Matthew Brost wrote:

Rather than loop over entities until one with a ready job is found,
re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.

Fixes: 6dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
Signed-off-by: Matthew Brost 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/scheduler/sched_main.c | 15 +--
  1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 8acbef7ae53d..7e90c9f95611 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1178,21 +1178,24 @@ static void drm_sched_run_job_work(struct work_struct *w)
 	struct drm_sched_entity *entity;
 	struct dma_fence *fence;
 	struct drm_sched_fence *s_fence;
-	struct drm_sched_job *sched_job = NULL;
+	struct drm_sched_job *sched_job;
 	int r;
 
 	if (READ_ONCE(sched->pause_submit))
 		return;
 
 	/* Find entity with a ready job */
-	while (!sched_job && (entity = drm_sched_select_entity(sched))) {
-		sched_job = drm_sched_entity_pop_job(entity);
-		if (!sched_job)
-			complete_all(&entity->entity_idle);
-	}
+	entity = drm_sched_select_entity(sched);
 	if (!entity)
 		return;	/* No more work */
 
+	sched_job = drm_sched_entity_pop_job(entity);
+	if (!sched_job) {
+		complete_all(&entity->entity_idle);
+		drm_sched_run_job_queue(sched);
+		return;
+	}
+
 	s_fence = sched_job->s_fence;
 
 	atomic_add(sched_job->credits, &sched->credit_count);
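
In pattern terms, the fix turns the run-job worker into a self-re-queueing
work item. A minimal generic sketch of that pattern (types and helper names
are illustrative assumptions, not the actual scheduler code):

#include <linux/workqueue.h>

static void run_job_work(struct work_struct *w)
{
	struct scheduler *sched = container_of(w, struct scheduler, work_run_job);
	struct entity *entity = select_ready_entity(sched);

	if (!entity)
		return;		/* no more work */

	struct job *job = pop_job(entity);

	if (!job) {
		/* Entity had nothing ready: mark it idle and retry from a
		 * fresh work invocation instead of looping in this one. */
		mark_entity_idle(entity);
		queue_work(sched->wq, &sched->work_run_job);
		return;
	}

	execute_job(sched, job);
}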




Re: Bug#1061449: linux-image-6.7-amd64: a boot message from amdgpu

2024-01-29 Thread Salvatore Bonaccorso
Hi,

[for this reply dropping the Debian bugreport to avoid later followups
sending the ack to the mailinglist and adding noise]

On Sun, Jan 28, 2024 at 11:44:59AM +0100, Linux regression tracking (Thorsten 
Leemhuis) wrote:
> On 27.01.24 14:14, Salvatore Bonaccorso wrote:
> >
> > In Debian (https://bugs.debian.org/1061449) we got the following
> > quoted report:
> > 
> > On Wed, Jan 24, 2024 at 07:38:16PM +0100, Patrice Duroux wrote:
> >>
> >> Giving a try to 6.7, here is a message extracted from dmesg:
> >> [4.177226] [ cut here ]
> >> [4.177227] WARNING: CPU: 6 PID: 248 at
> >> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:387
> >> construct_phy+0xb26/0xd60 [amdgpu]
> > [...]
> 
> Not my area of expertise, but looks a lot like a duplicate of
> https://gitlab.freedesktop.org/drm/amd/-/issues/3122#note_2252835
> 
> Mario (now CCed) already prepared a patch for that issue that seems to work.

#regzbot link: https://gitlab.freedesktop.org/drm/amd/-/issues/3122

Thanks. Indeed the reporter confirmed in
https://bugs.debian.org/1061449#55 that the patch fixes the issue.

So a duplicate of the above.

Regards,
Salvatore


Re: [PATCH v4 0/3] Convert Microchip's HLCDC Text based DT bindings to JSON schema

2024-01-29 Thread Dharma.B
Hi Conor,

On 24/01/24 10:10 pm, Conor Dooley wrote:
> On Wed, Jan 24, 2024 at 03:30:16PM +0530, Dharma Balasubiramani wrote:
>> Converted the text bindings to YAML and validated them individually using
>> the following commands:
>>
>> $ make dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/
>> $ make dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/
>>
>> Changelogs are available in the respective patches.
>>
>> As Sam suggested, I'm sending the PWM binding as-is in this patch series;
>> a clean-up patch will be sent separately.
> Please give discussion on the previous version some time to complete
> before sending a new one. I've still got questions about the clocks
> there.

Could you please give the green light to proceed with the v5 patch series,
with the following changes only in PATCH 3/3?

+  clocks:
+    minItems: 3
+
+  clock-names:
+    items:
+      - const: periph_clk
+      - const: sys_clk
+      - const: slow_clk
+      - const: lvds_pll_clk

> 
> Thanks,
> Conor.

-- 
With Best Regards,
Dharma B.



RE: [PATCH 2/3] udmabuf: Sync buffer mappings for attached devices

2024-01-29 Thread Kasireddy, Vivek
Hi Andrew,

> 
> On 1/26/24 1:25 AM, Kasireddy, Vivek wrote:
>  Currently this driver creates a SGT table using the CPU as the
>  target device, then performs the dma_sync operations against
>  that SGT. This is backwards to how DMA-BUFs are supposed to behave.
>  This may have worked for the case where these buffers were given
>  only back to the same CPU that produced them as in the QEMU case.
>  And only then because the original author had the dma_sync
>  operations also backwards, syncing for the "device" on begin_cpu.
>  This was noticed and "fixed" in this patch[0].
> 
>  That then meant we were sync'ing from the CPU to the CPU using
>  a pseudo-device "miscdevice". Which then caused another issue
>  due to the miscdevice not having a proper DMA mask (and why should
>  it, the CPU is not a DMA device). The fix for that was an even
>  more egregious hack[1] that declares the CPU is coherent with
>  itself and can access its own memory space..
> 
>  Unwind all this and perform the correct action by doing the dma_sync
>  operations for each device currently attached to the backing buffer.
> >>> Makes sense.
> >>>
> 
>  [0] commit 1ffe09590121 ("udmabuf: fix dma-buf cpu access")
>  [1] commit 9e9fa6a9198b ("udmabuf: Set the DMA mask for the
> udmabuf
>  device (v2)")
> 
>  Signed-off-by: Andrew Davis 
>  ---
> drivers/dma-buf/udmabuf.c | 41 +++
> 1 file changed, 16 insertions(+), 25 deletions(-)
> 
>  diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
>  index 3a23f0a7d112a..ab6764322523c 100644
>  --- a/drivers/dma-buf/udmabuf.c
>  +++ b/drivers/dma-buf/udmabuf.c
>  @@ -26,8 +26,6 @@ MODULE_PARM_DESC(size_limit_mb, "Max size
> of a
>  dmabuf, in megabytes. Default is
> struct udmabuf {
>   pgoff_t pagecount;
>   struct page **pages;
>  -struct sg_table *sg;
>  -struct miscdevice *device;
>   struct list_head attachments;
>   struct mutex lock;
> };
>  @@ -169,12 +167,8 @@ static void unmap_udmabuf(struct
>  dma_buf_attachment *at,
> static void release_udmabuf(struct dma_buf *buf)
> {
>   struct udmabuf *ubuf = buf->priv;
>  -struct device *dev = ubuf->device->this_device;
>   pgoff_t pg;
> 
>  -if (ubuf->sg)
>  -put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
> >>> What happens if the last importer maps the dmabuf but erroneously
> >>> closes it immediately? Would unmap somehow get called in this case?
> >>>
> >>
> >> Good question, had to scan the framework code a bit here. I thought
> >> closing a DMABUF handle would automatically unwind any current
> >> attachments/mappings, but it seems nothing in the framework does that.
> >>
> >> Looks like that is up to the importing drivers[0]:
> >>
> >>> Once a driver is done with a shared buffer it needs to call
> >>> dma_buf_detach() (after cleaning up any mappings) and then
> >>> release the reference acquired with dma_buf_get() by
> >>> calling dma_buf_put().
> >>
> >> So closing a DMABUF after mapping without first unmapping it would
> >> be a bug in the importer; it is not the exporter's problem to check
> > It may be a bug in the importer but wouldn't the memory associated
> > with the sg table and attachment get leaked if unmap doesn't get called
> > in this scenario?
> >
> 
> Yes the attachment data would be leaked if unattach was not called,
> but that is true for all DMABUF exporters. The .release() callback
> is meant to be the mirror of the export function and it only cleans
> up that. Same for attach/unattach, map/unmap, etc.. If these calls
> are not balanced then yes they can leak memory.
> 
> Since balance is guaranteed by the API, checking the balance should
> be done at that level, not in each and every exporter. If your
> comment is that we should add those checks into the DMABUF framework
> layer then I would agree.
Yeah, to avoid leaking memory, it would be even better if the framework
can call unmap when the importer fails to do so. Not sure if this is easier
said than done. 
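
For reference, the balanced importer lifecycle the quoted documentation
describes looks roughly like this (a minimal sketch; error handling elided,
with `fd` and `dev` standing in for the buffer fd and the importing device):

#include <linux/dma-buf.h>

static int import_example(int fd, struct device *dev)
{
	struct dma_buf *buf = dma_buf_get(fd);		/* take a reference */
	struct dma_buf_attachment *att = dma_buf_attach(buf, dev);
	struct sg_table *sgt = dma_buf_map_attachment(att, DMA_BIDIRECTIONAL);

	/* ... DMA to/from the buffer ... */

	dma_buf_unmap_attachment(att, sgt, DMA_BIDIRECTIONAL);	/* unmap first */
	dma_buf_detach(buf, att);				/* then detach */
	dma_buf_put(buf);					/* drop the reference */
	return 0;
}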

Thanks,
Vivek

> 
> Andrew
> 
> > Thanks,
> > Vivek
> >
> >> for (although some more warnings in the framework checking for that
> >> might not be a bad idea..).
> >>
> >> Andrew
> >>
> >> [0] https://www.kernel.org/doc/html/v6.7/driver-api/dma-buf.html
> >>
> >>> Thanks,
> >>> Vivek
> >>>
>  -
>   for (pg = 0; pg < ubuf->pagecount; pg++)
>   put_page(ubuf->pages[pg]);
>   kfree(ubuf->pages);
>  @@ -185,33 +179,31 @@ static int begin_cpu_udmabuf(struct
> dma_buf
>  *buf,
>    enum dma_data_direction direction)
> {
>   struct udmabuf *ubuf = buf->priv;
>  -struct device *dev 

Re: [PATCH 07/17] drm/msm/dpu: disallow widebus en in INTF_CONFIG2 when DP is YUV420

2024-01-29 Thread Abhinav Kumar




On 1/29/2024 9:28 PM, Dmitry Baryshkov wrote:

On Tue, 30 Jan 2024 at 06:10, Abhinav Kumar  wrote:




On 1/29/2024 5:43 PM, Dmitry Baryshkov wrote:

On Tue, 30 Jan 2024 at 03:07, Abhinav Kumar  wrote:




On 1/29/2024 4:03 PM, Dmitry Baryshkov wrote:

On Tue, 30 Jan 2024 at 01:51, Abhinav Kumar  wrote:




On 1/27/2024 9:33 PM, Dmitry Baryshkov wrote:

On Sun, 28 Jan 2024 at 07:16, Paloma Arellano  wrote:



On 1/25/2024 1:26 PM, Dmitry Baryshkov wrote:

On 25/01/2024 21:38, Paloma Arellano wrote:

INTF_CONFIG2 register cannot have widebus enabled when DP format is
YUV420. Therefore, program the INTF to send 1 ppc.


I think this is handled in the DP driver, where we disallow wide bus
for YUV 4:2:0 modes.

Yes we do disallow wide bus for YUV420 modes, but we still need to
program the INTF_CFG2_DATA_HCTL_EN. Therefore, it is necessary to add
this check.


As I wrote in my second email, I'd prefer to have one if which guards
HCTL_EN and another one for WIDEN


It's hard to separate out the conditions just for HCTL_EN. It's more
about handling the various pixel-per-clock combinations.

But here is how I can best summarize it.

Let's consider DSI and DP separately:

1) For DSI, for anything > DSI version 2.5 (DPU version 7).

This is the same condition as widebus today in
msm_dsi_host_is_wide_bus_enabled().

Hence no changes are needed for DSI.


Not quite. msm_dsi_host_is_wide_bus_enabled() checks for the DSC being
enabled, while you have written that HCTL_EN should be set in all
cases on a corresponding platform.



Agreed. This is true, we should enable HCTL_EN for DSI irrespective of
widebus for the versions I wrote.

Basically for the non-compressed case.

I will write something up to fix this for DSI. I think this can go as a
bug fix.

But that does not change the DP conditions. In other words, I don't see
anything wrong with this patch yet.



2) For DP: whenever widebus is enabled, AND the YUV420 uncompressed case,
as they are independent cases. We don't support the YUV420 + DSC case.

There are other cases which fall outside of this bucket but they are
optional ones. We only follow the "required" ones.

With this summary in mind, I am fine with what we have except perhaps
better documentation above this block.

When DSC over DP gets added, I am expecting no changes to this block as
it will fall under the widebus_en case.

With this information, how else would you like the check?


What does this bit really change?



This bit basically indicates that the data sent per line is programmed
via INTF_DISPLAY_DATA_HCTL, as this cap suggests.

   if (ctx->cap->features & BIT(DPU_DATA_HCTL_EN)) {
   DPU_REG_WRITE(c, INTF_CONFIG2, intf_cfg2);
   DPU_REG_WRITE(c, INTF_DISPLAY_DATA_HCTL,
display_data_hctl);
   DPU_REG_WRITE(c, INTF_ACTIVE_DATA_HCTL, active_data_hctl);
   }

Prior to that it was programmed with INTF_DISPLAY_HCTL in the same function.


Can we enable it unconditionally for DPU >= 5.0?



There is a corner-case that we should not enable it when compression is
enabled without widebus as per the docs :(


What about explicitly disabling it in such a case?
I mean something like:

if (dpu_core_rev >= 5.0 && !(enc->hw_dsc && !enc->wide_bus_en))
intf_cfg |= INTF_CFG2_HCTL_EN;



The condition is correct now. But we don't have enc or the DPU version in
this function.


We need to pass a new parameter called compression_en to
dpu_hw_intf_timing_params, set it when DSC is used, and then do this.
We already have widebus_en in dpu_hw_intf_timing_params.


This was anyway part of the DSC-over-DP work, but we can add it here;
the if (ctx->cap->features & BIT(DPU_DATA_HCTL_EN)) check is indicative
of DPU version >= 5, so we can move this setup there.
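
A rough sketch of that suggestion (names are illustrative assumptions, not
the actual driver API):

#include <stdbool.h>

/* Hypothetical helper capturing the rule discussed above: on DPU >= 5.0
 * (where DPU_DATA_HCTL_EN is present), DATA_HCTL_EN should be set in all
 * cases except compression enabled in 1 pixel-per-clock (non-widebus)
 * mode. compression_en would be a new timing-params field set when DSC
 * is in use. */
static bool want_data_hctl_en(bool dpu_ge_5, bool compression_en,
			      bool wide_bus_en)
{
	if (!dpu_ge_5)
		return false;

	return !(compression_en && !wide_bus_en);
}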






For DP there will not be a case like that because compression and
widebus go together but for DSI, it is possible.

So I found that the reset value of this register does cover all cases
for DPU >= 7.0, so the fix below will address the DSI concern and will fix
the issue even for YUV420 cases such as this one on DPU >= 7.0:

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
index 6bba531d6dc4..cbd5ebd516cd 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
@@ -168,6 +168,8 @@ static void dpu_hw_intf_setup_timing_engine(struct dpu_hw_intf *ctx,
 	 * video timing. It is recommended to enable it for all cases, except
 	 * if compression is enabled in 1 pixel per clock mode
 	 */
+
+   intf_cfg2 = DPU_REG_READ(c, INTF_CONFIG2);
  if (p->wide_bus_en)
  intf_cfg2 |= INTF_CFG2_DATABUS_WIDEN |
INTF_CFG2_DATA_HCTL_EN;


But this still does not work for DPU < 7.0, such as sc7180, if we try
YUV420 over DP there, because its DPU version is 6.2; so we will have to
keep this patch for those cases.









Signed-off-by: Paloma Arellano 
---
  

Re: [PATCH 07/17] drm/msm/dpu: disallow widebus en in INTF_CONFIG2 when DP is YUV420

2024-01-29 Thread Dmitry Baryshkov
On Tue, 30 Jan 2024 at 06:10, Abhinav Kumar  wrote:
>
>
>
> On 1/29/2024 5:43 PM, Dmitry Baryshkov wrote:
> > On Tue, 30 Jan 2024 at 03:07, Abhinav Kumar  
> > wrote:
> >>
> >>
> >>
> >> On 1/29/2024 4:03 PM, Dmitry Baryshkov wrote:
> >>> On Tue, 30 Jan 2024 at 01:51, Abhinav Kumar  
> >>> wrote:
> 
> 
> 
>  On 1/27/2024 9:33 PM, Dmitry Baryshkov wrote:
> > On Sun, 28 Jan 2024 at 07:16, Paloma Arellano 
> >  wrote:
> >>
> >>
> >> On 1/25/2024 1:26 PM, Dmitry Baryshkov wrote:
> >>> On 25/01/2024 21:38, Paloma Arellano wrote:
>  INTF_CONFIG2 register cannot have widebus enabled when DP format is
>  YUV420. Therefore, program the INTF to send 1 ppc.
> >>>
> >>> I think this is handled in the DP driver, where we disallow wide bus
> >>> for YUV 4:2:0 modes.
> >> Yes we do disallow wide bus for YUV420 modes, but we still need to
> >> program the INTF_CFG2_DATA_HCTL_EN. Therefore, it is necessary to add
> >> this check.
> >
> > As I wrote in my second email, I'd prefer to have one if which guards
> > HCTL_EN and another one for WIDEN
> >
>  It's hard to separate out the conditions just for HCTL_EN. It's more
>  about handling the various pixel-per-clock combinations.
> 
>  But, here is how I can best summarize it.
> 
>  Let's consider DSI and DP separately:
> 
>  1) For DSI, for anything > DSI version 2.5 ( DPU version 7 ).
> 
>  This is the same condition as widebus today in
>  msm_dsi_host_is_wide_bus_enabled().
> 
>  Hence no changes needed for DSI.
> >>>
> >>> Not quite. msm_dsi_host_is_wide_bus_enabled() checks for the DSC being
> >>> enabled, while you have written that HCTL_EN should be set in all
> >>> cases on a corresponding platform.
> >>>
> >>
> >> Agreed. This is true, we should enable HCTL_EN for DSI irrespective of
> >> widebus for the versions I wrote.
> >>
> >> Basically for the non-compressed case.
> >>
> >> I will write something up to fix this for DSI. I think this can go as a
> >> bug fix.
> >>
> >> But that does not change the DP conditions. In other words, I don't see
> >> anything wrong with this patch yet.
> >>
> 
>  2) For DP, whenever widebus is enabled AND YUV420 uncompressed case
>  as they are independent cases. We don't support the YUV420 + DSC case.
> 
>  There are other cases which fall outside of this bucket but they are
>  optional ones. We only follow the "required" ones.
> 
>  With this summary in mind, I am fine with what we have except perhaps
>  better documentation above this block.
> 
>  When DSC over DP gets added, I am expecting no changes to this block as
>  it will fall under the widebus_en case.
> 
>  With this information, how else would you like the check?
> >>>
> >>> What does this bit really change?
> >>>
> >>
> >> This bit basically indicates that the data sent per line is programmed
> >> via INTF_DISPLAY_DATA_HCTL, as this cap suggests.
> >>
> >>   if (ctx->cap->features & BIT(DPU_DATA_HCTL_EN)) {
> >>   DPU_REG_WRITE(c, INTF_CONFIG2, intf_cfg2);
> >>   DPU_REG_WRITE(c, INTF_DISPLAY_DATA_HCTL,
> >> display_data_hctl);
> >>   DPU_REG_WRITE(c, INTF_ACTIVE_DATA_HCTL, 
> >> active_data_hctl);
> >>   }
> >>
> >> Prior to that it was programmed with INTF_DISPLAY_HCTL in the same 
> >> function.
> >
> > Can we enable it unconditionally for DPU >= 5.0?
> >
>
> There is a corner-case that we should not enable it when compression is
> enabled without widebus as per the docs :(

What about explicitly disabling it in such a case?
I mean something like:

if (dpu_core_rev >= 5.0 && !(enc->hw_dsc && !enc->wide_bus_en))
   intf_cfg |= INTF_CFG2_HCTL_EN;


>
> For DP there will not be a case like that because compression and
> widebus go together but for DSI, it is possible.
>
> So I found that the reset value of this register does cover all cases
> for DPU >= 7.0 so below fix will address the DSI concern and will fix
> the issue even for YUV420 cases such as this one for DPU >= 7.0
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
> index 6bba531d6dc4..cbd5ebd516cd 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
> @@ -168,6 +168,8 @@ static void dpu_hw_intf_setup_timing_engine(struct
> dpu_hw_intf *ctx,
>   * video timing. It is recommended to enable it for all cases,
> except
>   * if compression is enabled in 1 pixel per clock mode
>   */
> +
> +   intf_cfg2 = DPU_REG_READ(c, INTF_CONFIG2);
>  if (p->wide_bus_en)
>  intf_cfg2 |= INTF_CFG2_DATABUS_WIDEN |
> INTF_CFG2_DATA_HCTL_EN;
>
>
> But this still does not work for DPU < 7.0, such as sc7180, if we try
> YUV420 over DP there, because its DPU version is 6.2; so we will have to
> keep this patch for those cases.

Re: [PATCH 07/17] drm/msm/dpu: disallow widebus en in INTF_CONFIG2 when DP is YUV420

2024-01-29 Thread Abhinav Kumar




On 1/29/2024 5:43 PM, Dmitry Baryshkov wrote:

On Tue, 30 Jan 2024 at 03:07, Abhinav Kumar  wrote:




On 1/29/2024 4:03 PM, Dmitry Baryshkov wrote:

On Tue, 30 Jan 2024 at 01:51, Abhinav Kumar  wrote:




On 1/27/2024 9:33 PM, Dmitry Baryshkov wrote:

On Sun, 28 Jan 2024 at 07:16, Paloma Arellano  wrote:



On 1/25/2024 1:26 PM, Dmitry Baryshkov wrote:

On 25/01/2024 21:38, Paloma Arellano wrote:

INTF_CONFIG2 register cannot have widebus enabled when DP format is
YUV420. Therefore, program the INTF to send 1 ppc.


I think this is handled in the DP driver, where we disallow wide bus
for YUV 4:2:0 modes.

Yes we do disallow wide bus for YUV420 modes, but we still need to
program the INTF_CFG2_DATA_HCTL_EN. Therefore, it is necessary to add
this check.


As I wrote in my second email, I'd prefer to have one if which guards
HCTL_EN and another one for WIDEN


It's hard to separate out the conditions just for HCTL_EN. It's more
about handling the various pixel-per-clock combinations.

But here is how I can best summarize it.

Let's consider DSI and DP separately:

1) For DSI, for anything > DSI version 2.5 (DPU version 7).

This is the same condition as widebus today in
msm_dsi_host_is_wide_bus_enabled().

Hence no changes are needed for DSI.


Not quite. msm_dsi_host_is_wide_bus_enabled() checks for the DSC being
enabled, while you have written that HCTL_EN should be set in all
cases on a corresponding platform.



Agreed. This is true, we should enable HCTL_EN for DSI irrespective of
widebus for the versions I wrote.

Basically for the non-compressed case.

I will write something up to fix this for DSI. I think this can go as a
bug fix.

But that does not change the DP conditions. In other words, I don't see
anything wrong with this patch yet.



2) For DP: whenever widebus is enabled, AND the YUV420 uncompressed case,
as they are independent cases. We don't support the YUV420 + DSC case.

There are other cases which fall outside of this bucket but they are
optional ones. We only follow the "required" ones.

With this summary in mind, I am fine with what we have except perhaps
better documentation above this block.

When DSC over DP gets added, I am expecting no changes to this block as
it will fall under the widebus_en case.

With this information, how else would you like the check?


What does this bit really change?



This bit basically indicates that the data sent per line is programmed
via INTF_DISPLAY_DATA_HCTL, as this cap suggests.

  if (ctx->cap->features & BIT(DPU_DATA_HCTL_EN)) {
  DPU_REG_WRITE(c, INTF_CONFIG2, intf_cfg2);
  DPU_REG_WRITE(c, INTF_DISPLAY_DATA_HCTL,
display_data_hctl);
  DPU_REG_WRITE(c, INTF_ACTIVE_DATA_HCTL, active_data_hctl);
  }

Prior to that it was programmed with INTF_DISPLAY_HCTL in the same function.


Can we enable it unconditionally for DPU >= 5.0?



There is a corner-case that we should not enable it when compression is 
enabled without widebus as per the docs :(


For DP there will not be a case like that because compression and 
widebus go together but for DSI, it is possible.


So I found that the reset value of this register does cover all cases
for DPU >= 7.0, so the fix below will address the DSI concern and will fix
the issue even for YUV420 cases such as this one on DPU >= 7.0:


diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c

index 6bba531d6dc4..cbd5ebd516cd 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
@@ -168,6 +168,8 @@ static void dpu_hw_intf_setup_timing_engine(struct dpu_hw_intf *ctx,
 	 * video timing. It is recommended to enable it for all cases, except
 	 * if compression is enabled in 1 pixel per clock mode
 	 */
+
+   intf_cfg2 = DPU_REG_READ(c, INTF_CONFIG2);
if (p->wide_bus_en)
intf_cfg2 |= INTF_CFG2_DATABUS_WIDEN | 
INTF_CFG2_DATA_HCTL_EN;



But this still does not work for DPU < 7.0, such as sc7180, if we try
YUV420 over DP there, because its DPU version is 6.2; so we will have to
keep this patch for those cases.










Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
index 6bba531d6dc41..bfb93f02fe7c1 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
@@ -168,7 +168,9 @@ static void
dpu_hw_intf_setup_timing_engine(struct dpu_hw_intf *ctx,
  * video timing. It is recommended to enable it for all cases,
except
  * if compression is enabled in 1 pixel per clock mode
  */
-if (p->wide_bus_en)
+if (dp_intf && fmt->base.pixel_format == DRM_FORMAT_YUV420)
+intf_cfg2 |= INTF_CFG2_DATA_HCTL_EN;
+else if (p->wide_bus_en)

RE: re:Making drm_gpuvm work across gpu devices

2024-01-29 Thread Zeng, Oak
Hi Chunming,

In this email thread, Christian mentioned a very special virtualization
environment where multiple guest processes rely on a host proxy process to
talk to KFD. Such a setup conflicts hard with the SVM concept, as SVM means
a shared virtual address space in *one* process, while the host proxy process
in this setup needs to represent multiple guest processes. Thus SVM doesn't
work in such a setup.

Normal GPU virtualization such as SR-IOV, or system virtualization (such as
passing the whole GPU device to the guest machine), works perfectly fine with
the SVM design.

Regards,
Oak

From: 周春明(日月) 
Sent: Monday, January 29, 2024 10:55 PM
To: Felix Kuehling ; Christian König 
; Daniel Vetter 
Cc: Brost, Matthew ; thomas.hellst...@linux.intel.com; 
Welty, Brian ; Ghimiray, Himal Prasad 
; dri-devel@lists.freedesktop.org; Gupta, 
saurabhg ; Danilo Krummrich ; Zeng, 
Oak ; Bommu, Krishnaiah ; Dave 
Airlie ; Vishwanathapura, Niranjana 
; intel...@lists.freedesktop.org; 毛钧(落提) 

Subject: re:Making drm_gpuvm work across gpu devices


Hi Felix,

Following your thread: you mentioned many times that the SVM API cannot run
in a virtualization env, and I still don't get why.
Why do you keep saying a host proxy process is needed? Can't the HW report
page fault interrupts per VF via vfid? Isn't it an SR-IOV env?

Regards,
-Chunming
--
From: Felix Kuehling <felix.kuehl...@amd.com>
Sent: Tuesday, January 30, 2024 04:24
To: "Christian König" <christian.koe...@amd.com>; Daniel Vetter <dan...@ffwll.ch>
Cc: "Brost, Matthew" <matthew.br...@intel.com>; thomas.hellst...@linux.intel.com; "Welty, Brian" <brian.we...@intel.com>; "Ghimiray, Himal Prasad" <himal.prasad.ghimi...@intel.com>; dri-devel@lists.freedesktop.org; "Gupta, saurabhg" <saurabhg.gu...@intel.com>; Danilo Krummrich <d...@redhat.com>; "Zeng, Oak" <oak.z...@intel.com>; "Bommu, Krishnaiah" <krishnaiah.bo...@intel.com>; Dave Airlie <airl...@redhat.com>; "Vishwanathapura, Niranjana" <niranjana.vishwanathap...@intel.com>; intel...@lists.freedesktop.org
Subject: Re: Making drm_gpuvm work across gpu devices


On 2024-01-29 14:03, Christian König wrote:
> On 29.01.24 18:52, Felix Kuehling wrote:
>> On 2024-01-29 11:28, Christian König wrote:
>>> On 29.01.24 17:24, Felix Kuehling wrote:
 On 2024-01-29 10:33, Christian König wrote:
> On 29.01.24 16:03, Felix Kuehling wrote:
>> On 2024-01-25 13:32, Daniel Vetter wrote:
>>> On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:
 Am 23.01.24 um 20:37 schrieb Zeng, Oak:
> [SNIP]
> Yes most API are per device based.
>
> One exception I know is actually the kfd SVM API. If you look
> at the svm_ioctl function, it is per-process based. Each
> kfd_process represent a process across N gpu devices.
 Yeah and that was a big mistake in my opinion. We should really
 not do that
 ever again.

> Need to say, kfd SVM represent a shared virtual address space
> across CPU and all GPU devices on the system. This is by the
> definition of SVM (shared virtual memory). This is very
> different from our legacy gpu *device* driver which works for
> only one device (i.e., if you want one device to access
> another device's memory, you will have to use dma-buf
> export/import etc).
 Exactly that thinking is what we have currently found as
 blocker for a
 virtualization projects. Having SVM as device independent
 feature which
 somehow ties to the process address space turned out to be an
 extremely bad
 idea.

 The background is that this only works for some use cases but
 not all of
 them.

 What's working much better is to just have a mirror
 functionality which says
 that a range A..B of the process address space is mapped into a
 range C..D
 of the GPU address space.

 Those ranges can then be used to implement the SVM feature
 required for
 higher level APIs and not something you need at the UAPI or
 even inside the
 low level kernel memory management.

 When you talk about migrating memory to a device you also do
 this on a per
 device basis and *not* tied to the process address space. If
 you then get
 crappy performance because userspace gave contradicting
 information where to
 migrate memory then that's a bug in userspace and not something
 the kernel

Re: linux-next: Tree for Jan 29 (drm/xe/ and FB_IOMEM_HELPERS)

2024-01-29 Thread Randy Dunlap


On 1/28/24 19:30, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20240125:
> 
> New trees: i2c-host-fixes, i2c-host
> 

on riscv 64-bit or powerpc 64-bit:

WARNING: unmet direct dependencies detected for FB_IOMEM_HELPERS
  Depends on [n]: HAS_IOMEM [=y] && FB_CORE [=n]
  Selected by [m]:
  - DRM_XE_DISPLAY [=y] && HAS_IOMEM [=y] && DRM_XE [=m] && DRM_XE [=m]=m


riscv 64-bit randconfig file is attached.

-- 
#Randy

config-r5943.gz
Description: application/gzip


re:Making drm_gpuvm work across gpu devices

2024-01-29 Thread 周春明(日月)
Hi Felix,
Following your thread: you mentioned many times that the SVM API cannot run
in a virtualization env, and I still don't get why.
Why do you keep saying a host proxy process is needed? Can't the HW report
page fault interrupts per VF via vfid? Isn't it an SR-IOV env?
Regards,
-Chunming
--
From: Felix Kuehling 
Sent: Tuesday, January 30, 2024 04:24
To: "Christian König" ; Daniel Vetter 
Cc: "Brost, Matthew" ; thomas.hellst...@linux.intel.com ; "Welty, Brian" ; "Ghimiray, Himal Prasad" ; dri-devel@lists.freedesktop.org ; "Gupta, saurabhg" ; Danilo Krummrich ; "Zeng, Oak" ; "Bommu, Krishnaiah" ; Dave Airlie ; "Vishwanathapura, Niranjana" ; intel...@lists.freedesktop.org 
Subject: Re: Making drm_gpuvm work across gpu devices
On 2024-01-29 14:03, Christian König wrote:
> On 29.01.24 18:52, Felix Kuehling wrote:
>> On 2024-01-29 11:28, Christian König wrote:
>>> On 29.01.24 17:24, Felix Kuehling wrote:
 On 2024-01-29 10:33, Christian König wrote:
> On 29.01.24 16:03, Felix Kuehling wrote:
>> On 2024-01-25 13:32, Daniel Vetter wrote:
>>> On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:
 Am 23.01.24 um 20:37 schrieb Zeng, Oak:
> [SNIP]
> Yes most API are per device based.
>
> One exception I know is actually the kfd SVM API. If you look 
> at the svm_ioctl function, it is per-process based. Each 
> kfd_process represent a process across N gpu devices.
 Yeah and that was a big mistake in my opinion. We should really 
 not do that
 ever again.

> Need to say, kfd SVM represent a shared virtual address space 
> across CPU and all GPU devices on the system. This is by the 
> definition of SVM (shared virtual memory). This is very 
> different from our legacy gpu *device* driver which works for 
> only one device (i.e., if you want one device to access 
> another device's memory, you will have to use dma-buf 
> export/import etc).
 Exactly that thinking is what we have currently found as 
 blocker for a
 virtualization projects. Having SVM as device independent 
 feature which
 somehow ties to the process address space turned out to be an 
 extremely bad
 idea.

 The background is that this only works for some use cases but 
 not all of
 them.

 What's working much better is to just have a mirror 
 functionality which says
 that a range A..B of the process address space is mapped into a 
 range C..D
 of the GPU address space.

 Those ranges can then be used to implement the SVM feature 
 required for
 higher level APIs and not something you need at the UAPI or 
 even inside the
 low level kernel memory management.

 When you talk about migrating memory to a device you also do 
 this on a per
 device basis and *not* tied to the process address space. If 
 you then get
 crappy performance because userspace gave contradicting 
 information where to
 migrate memory then that's a bug in userspace and not something 
 the kernel
 should try to prevent somehow.

 [SNIP]
>> I think if you start using the same drm_gpuvm for multiple 
>> devices you
>> will sooner or later start to run into the same mess we have 
>> seen with
>> KFD, where we moved more and more functionality from the KFD 
>> to the DRM
>> render node because we found that a lot of the stuff simply 
>> doesn't work
>> correctly with a single object to maintain the state.
> As I understand it, KFD is designed to work across devices. A 
> single pseudo /dev/kfd device represent all hardware gpu 
> devices. That is why during kfd open, many pdd (process device 
> data) is created, each for one hardware device for this process.
 Yes, I'm perfectly aware of that. And I can only repeat myself 
 that I see
 this design as a rather extreme failure. And I think it's one 
 of the reasons
 why NVidia is so dominant with Cuda.

 This whole approach KFD takes was designed with the idea of 
 extending the
 CPU process into the GPUs, but this idea only works for a few 
 use cases and
 is not something we should apply to drivers in general.

 A very good example are virtualization use cases where you end 
 up with CPU
 address != GPU address because the VAs are actually coming from 
 the guest VM
 and not the host process.

 SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This 
 should not have
 any 

[PATCH] nouveau/gsp: use correct size for registry rpc.

2024-01-29 Thread Dave Airlie
From: Dave Airlie 

Timur pointed this out before, and it just slipped my mind,
but this might help some things work better around PCIe power
management.

Fixes: 8d55b0a940bb ("nouveau/gsp: add some basic registry entries.")
Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 9ee58e2a0eb2..5e1fa176aac4 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -1078,7 +1078,6 @@ r535_gsp_rpc_set_registry(struct nvkm_gsp *gsp)
if (IS_ERR(rpc))
return PTR_ERR(rpc);
 
-   rpc->size = sizeof(*rpc);
rpc->numEntries = NV_GSP_REG_NUM_ENTRIES;
 
str_offset = offsetof(typeof(*rpc), entries[NV_GSP_REG_NUM_ENTRIES]);
@@ -1094,6 +1093,7 @@ r535_gsp_rpc_set_registry(struct nvkm_gsp *gsp)
strings += name_len;
str_offset += name_len;
}
+   rpc->size = str_offset;
 
return nvkm_gsp_rpc_wr(gsp, rpc, false);
 }
-- 
2.43.0
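
To spell out the reasoning as a standalone sketch (simplified types, not the
nouveau code): sizeof(*rpc) covers only the fixed header, while the payload
also contains the entry array and a packed string table, so the true size is
only known after the loop has appended every name.

#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the GSP registry RPC layout. */
struct rpc {
	unsigned int size;
	unsigned int numEntries;
	struct { unsigned int nameOffset; unsigned int value; } entries[];
};

/* Header + entry array + every NUL-terminated registry key name packed
 * after the entries: this is what rpc->size must be set to. */
static size_t rpc_total_size(const char * const *names, unsigned int n)
{
	size_t str_offset = offsetof(struct rpc, entries) +
			    n * sizeof(((struct rpc *)0)->entries[0]);

	for (unsigned int i = 0; i < n; i++)
		str_offset += strlen(names[i]) + 1;

	return str_offset;
}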



[PATCH] drm/sched: Re-queue run job worker when drm_sched_entity_pop_job() returns NULL

2024-01-29 Thread Matthew Brost
Rather than loop over entities until one with a ready job is found,
re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.

Fixes: 6dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
Signed-off-by: Matthew Brost 
---
 drivers/gpu/drm/scheduler/sched_main.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 8acbef7ae53d..7e90c9f95611 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1178,21 +1178,24 @@ static void drm_sched_run_job_work(struct work_struct *w)
struct drm_sched_entity *entity;
struct dma_fence *fence;
struct drm_sched_fence *s_fence;
-   struct drm_sched_job *sched_job = NULL;
+   struct drm_sched_job *sched_job;
int r;
 
if (READ_ONCE(sched->pause_submit))
return;
 
/* Find entity with a ready job */
-   while (!sched_job && (entity = drm_sched_select_entity(sched))) {
-   sched_job = drm_sched_entity_pop_job(entity);
-   if (!sched_job)
-   complete_all(&entity->entity_idle);
-   }
+   entity = drm_sched_select_entity(sched);
if (!entity)
return; /* No more work */
 
+   sched_job = drm_sched_entity_pop_job(entity);
+   if (!sched_job) {
+   complete_all(&entity->entity_idle);
+   drm_sched_run_job_queue(sched);
+   return;
+   }
+
s_fence = sched_job->s_fence;
 
atomic_add(sched_job->credits, &sched->credit_count);
-- 
2.34.1



Re: [PATCH v11 14/26] locking/lockdep, cpu/hotplus: Use a weaker annotation in AP thread

2024-01-29 Thread Byungchul Park
On Fri, Jan 26, 2024 at 06:30:02PM +0100, Thomas Gleixner wrote:
> On Wed, Jan 24 2024 at 20:59, Byungchul Park wrote:
> 
> Why is lockdep in the subsystem prefix here? You are changing the CPU
> hotplug (not hotplus) code, right?
> 
> > cb92173d1f0 ("locking/lockdep, cpu/hotplug: Annotate AP thread") was
> > introduced to make lockdep_assert_cpus_held() work in AP thread.
> >
> > However, the annotation is too strong for that purpose. We don't have to
> > use more than try lock annotation for that.
> 
> This lacks a proper explanation why this is too strong.
> 
> > Furthermore, now that Dept has been introduced, false positive alarms were
> > reported by it. Replaced it with a try-lock annotation.
> 
> I still have zero idea what this is about.

1. Dept can track PG_locked, which is a potential deadlock trigger.

   https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.p...@lge.com/

2. It can track any waits/events, e.g. wait_for_xxx(), dma fences, and so on.

3. It is easy to annotate, using dept_wait() on waits and dept_event() on
   events.

4. It tracks read locks in a better way than the current ugly one, by
   assigning wait or event annotations onto read locks and write locks. For
   instance, a read lock is annotated as a potential waiter for its write
   unlock, and a write lock is annotated as a potential waiter for either
   write unlock or read unlock.

I'd like to remove unnecessary complexity from deadlock detection and add
functionality by making the tool do exactly what this type of tool should do.
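
A conceptual sketch of the wait/event annotation model (the dept_wait() and
dept_event() signatures here are assumed for illustration only, not copied
from the Dept patchset):

#include <linux/completion.h>

static void consumer(struct completion *done)
{
	dept_wait(done);	/* hypothetical: this context may wait on `done` */
	wait_for_completion(done);
}

static void producer(struct completion *done)
{
	dept_event(done);	/* hypothetical: this context may trigger `done` */
	complete(done);
}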

Byungchul

> Thanks,
> 
> tglx


linux-next: build warning after merge of the amdgpu tree

2024-01-29 Thread Stephen Rothwell
Hi all,

After merging the amdgpu tree, today's linux-next build (htmldocs)
produced this warning:

drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h:1: warning: no 
structured comments found
drivers/gpu/drm/amd/display/dc/link/hwss/link_hwss_dio.h:1: warning: no 
structured comments found

Introduced by commit

  d79833f34bdc ("Documentation/gpu: Add entry for the DIO component")

-- 
Cheers,
Stephen Rothwell




linux-next: build warning after merge of the amdgpu tree

2024-01-29 Thread Stephen Rothwell
Hi all,

After merging the amdgpu tree, today's linux-next build (htmldocs)
produced this warning:

drivers/gpu/drm/amd/display/dc/inc/hw/opp.h:1: warning: no structured comments 
found
drivers/gpu/drm/amd/display/dc/inc/hw/opp.h:1: warning: no structured comments 
found

Introduced by commit

  0fba33311e63 ("Documentation/gpu: Add entry for OPP in the kernel doc")

-- 
Cheers,
Stephen Rothwell




linux-next: build warnings after merge of the amdgpu tree

2024-01-29 Thread Stephen Rothwell
Hi all,

After merging the amdgpu tree, today's linux-next build (htmldocs)
produced these warnings:

drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:132: warning: Incorrect use of 
kernel-doc format:  * @@overlap_only: Whether overlapping of different 
planes is allowed.
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:132: warning: Incorrect use of 
kernel-doc format:  * @@overlap_only: Whether overlapping of different 
planes is allowed.
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:1: warning: no structured comments 
found
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:132: warning: Incorrect use of 
kernel-doc format:  * @@overlap_only: Whether overlapping of different 
planes is allowed.
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:162: warning: Function parameter or 
struct member 'pre_multiplied_alpha' not described in 'mpcc_blnd_cfg'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:162: warning: Function parameter or 
struct member 'overlap_only' not described in 'mpcc_blnd_cfg'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'read_mpcc_state' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'mpc_init_single_inst' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'get_mpcc_for_dpp_from_secondary' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'get_mpcc_for_dpp' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'wait_for_idle' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'assert_mpcc_idle_before_connect' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'init_mpcc_list_from_hw' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'set_denorm' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'set_denorm_clamp' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'set_output_csc' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'set_ocsc_default' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'set_output_gamma' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'power_on_mpc_mem_pwr' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'set_dwb_mux' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'disable_dwb_mux' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'is_dwb_idle' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'set_out_rate_control' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'set_gamut_remap' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'program_1dlut' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'program_shaper' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'acquire_rmu' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'program_3dlut' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'release_rmu' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'get_mpc_out_mux' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'set_bg_color' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:548: warning: Function parameter or 
struct member 'set_mpc_mem_lp_mode' not described in 'mpc_funcs'
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:132: warning: Incorrect use of 
kernel-doc format:  * @@overlap_only: Whether overlapping of different 
planes is allowed.
drivers/gpu/drm/amd/display/dc/inc/hw/mpc.h:162: warning: Function parameter or 
struct member 

Re: [PATCH 07/17] drm/msm/dpu: disallow widebus en in INTF_CONFIG2 when DP is YUV420

2024-01-29 Thread Dmitry Baryshkov
On Tue, 30 Jan 2024 at 03:07, Abhinav Kumar  wrote:
>
>
>
> On 1/29/2024 4:03 PM, Dmitry Baryshkov wrote:
> > On Tue, 30 Jan 2024 at 01:51, Abhinav Kumar  
> > wrote:
> >>
> >>
> >>
> >> On 1/27/2024 9:33 PM, Dmitry Baryshkov wrote:
> >>> On Sun, 28 Jan 2024 at 07:16, Paloma Arellano  
> >>> wrote:
> 
> 
>  On 1/25/2024 1:26 PM, Dmitry Baryshkov wrote:
> > On 25/01/2024 21:38, Paloma Arellano wrote:
> >> INTF_CONFIG2 register cannot have widebus enabled when DP format is
> >> YUV420. Therefore, program the INTF to send 1 ppc.
> >
> > I think this is handled in the DP driver, where we disallow wide bus
> > for YUV 4:2:0 modes.
>  Yes we do disallow wide bus for YUV420 modes, but we still need to
>  program the INTF_CFG2_DATA_HCTL_EN. Therefore, it is necessary to add
>  this check.
> >>>
> >>> As I wrote in my second email, I'd prefer to have one if which guards
> >>> HCTL_EN and another one for WIDEN
> >>>
> >> It's hard to separate out the conditions just for HCTL_EN. It's more
> >> about handling the various pixel-per-clock combinations.
> >>
> >> But, here is how I can best summarize it.
> >>
> >> Let's consider DSI and DP separately:
> >>
> >> 1) For DSI, for anything > DSI version 2.5 ( DPU version 7 ).
> >>
> >> This is the same condition as widebus today in
> >> msm_dsi_host_is_wide_bus_enabled().
> >>
> >> Hence no changes needed for DSI.
> >
> > Not quite. msm_dsi_host_is_wide_bus_enabled() checks for the DSC being
> > enabled, while you have written that HCTL_EN should be set in all
> > cases on a corresponding platform.
> >
>
> Agreed. This is true, we should enable HCTL_EN for DSI irrespective of
> widebus for the versions I wrote.
>
> Basically for the non-compressed case.
>
> I will write something up to fix this for DSI. I think this can go as a
> bug fix.
>
> But that does not change the DP conditions. In other words, I don't see
> anything wrong with this patch yet.
>
> >>
> >> 2) For DP, whenever widebus is enabled AND YUV420 uncompressed case
> >> as they are independent cases. We don't support the YUV420 + DSC case.
> >>
> >> There are other cases which fall outside of this bucket but they are
> >> optional ones. We only follow the "required" ones.
> >>
> >> With this summary in mind, I am fine with what we have except perhaps
> >> better documentation above this block.
> >>
> >> When DSC over DP gets added, I am expecting no changes to this block as
> >> it will fall under the widebus_en case.
> >>
> >> With this information, how else would you like the check?
> >
> > What does this bit really change?
> >
>
> This bit basically indicates that the data sent per line is programmed
> via INTF_DISPLAY_DATA_HCTL, as this cap suggests.
>
>  if (ctx->cap->features & BIT(DPU_DATA_HCTL_EN)) {
>  DPU_REG_WRITE(c, INTF_CONFIG2, intf_cfg2);
>  DPU_REG_WRITE(c, INTF_DISPLAY_DATA_HCTL,
> display_data_hctl);
>  DPU_REG_WRITE(c, INTF_ACTIVE_DATA_HCTL, active_data_hctl);
>  }
>
> Prior to that it was programmed with INTF_DISPLAY_HCTL in the same function.

Can we enable it unconditionally for DPU >= 5.0?

>
> >>
> >
> >>
> >> Signed-off-by: Paloma Arellano 
> >> ---
> >> drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c | 4 +++-
> >> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
> >> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
> >> index 6bba531d6dc41..bfb93f02fe7c1 100644
> >> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
> >> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
> >> @@ -168,7 +168,9 @@ static void
> >> dpu_hw_intf_setup_timing_engine(struct dpu_hw_intf *ctx,
> >>  * video timing. It is recommended to enable it for all cases,
> >> except
> >>  * if compression is enabled in 1 pixel per clock mode
> >>  */
> >> -if (p->wide_bus_en)
> >> +if (dp_intf && fmt->base.pixel_format == DRM_FORMAT_YUV420)
> >> +intf_cfg2 |= INTF_CFG2_DATA_HCTL_EN;
> >> +else if (p->wide_bus_en)
> >> intf_cfg2 |= INTF_CFG2_DATABUS_WIDEN | 
> >> INTF_CFG2_DATA_HCTL_EN;
> >>   data_width = p->width;
> >
> >>>
> >>>
> >>>
> >
> >
> >



-- 
With best wishes
Dmitry


Re: [PATCH v2 0/8] drm/lima: fixes and improvements to error recovery

2024-01-29 Thread Qiang Yu
Series is Reviewed-by: Qiang Yu 

On Wed, Jan 24, 2024 at 11:00 AM Erico Nunes  wrote:
>
> v1 reference:
> https://patchwork.kernel.org/project/dri-devel/cover/20240117031212.1104034-1-nunes.er...@gmail.com/
>
> Changes v1 -> v2:
> - Dropped patch 1 which aimed to fix
> https://gitlab.freedesktop.org/mesa/mesa/-/issues/8415 .
> That will require more testing and an actual fix to the irq/timeout
> handler race. It can be solved separately so I am deferring it to a
> followup patch and keeping that issue open.
>
> - Added patches 2 and 4 to cover "reset time out" and bus stop bit to
> hard reset in gp as well.
>
> - Added handling of all processors in synchronize_irq in patch 5 to
> cover multiple pp. Dropped unnecessary duplicate fence in patch 5.
>
> - Added patch 7 in v2. After some discussion in patch 4 (v1), it seems
> to be reasonable to bump our timeout value so that we further decrease
> the chance of users actually hitting any of these timeouts by default.
>
> - Reworked patch 8 in v2. Since I broadened the work to not only focus
> in pp anymore, I also included the change to the other blocks as well.
>
> - Collected some reviews and acks in unmodified patches.
>
>
> Erico Nunes (8):
>   drm/lima: reset async_reset on pp hard reset
>   drm/lima: reset async_reset on gp hard reset
>   drm/lima: set pp bus_stop bit before hard reset
>   drm/lima: set gp bus_stop bit before hard reset
>   drm/lima: handle spurious timeouts due to high irq latency
>   drm/lima: remove guilty drm_sched context handling
>   drm/lima: increase default job timeout to 10s
>   drm/lima: standardize debug messages by ip name
>
>  drivers/gpu/drm/lima/lima_ctx.c  |  2 +-
>  drivers/gpu/drm/lima/lima_ctx.h  |  1 -
>  drivers/gpu/drm/lima/lima_gp.c   | 39 +---
>  drivers/gpu/drm/lima/lima_l2_cache.c |  6 +++--
>  drivers/gpu/drm/lima/lima_mmu.c  | 18 ++---
>  drivers/gpu/drm/lima/lima_pmu.c  |  3 ++-
>  drivers/gpu/drm/lima/lima_pp.c   | 37 --
>  drivers/gpu/drm/lima/lima_sched.c| 38 ++-
>  drivers/gpu/drm/lima/lima_sched.h|  3 +--
>  9 files changed, 107 insertions(+), 40 deletions(-)
>
> --
> 2.43.0
>


Re: [PATCH 07/17] drm/msm/dpu: disallow widebus en in INTF_CONFIG2 when DP is YUV420

2024-01-29 Thread Abhinav Kumar




On 1/29/2024 4:03 PM, Dmitry Baryshkov wrote:

On Tue, 30 Jan 2024 at 01:51, Abhinav Kumar  wrote:




On 1/27/2024 9:33 PM, Dmitry Baryshkov wrote:

On Sun, 28 Jan 2024 at 07:16, Paloma Arellano  wrote:



On 1/25/2024 1:26 PM, Dmitry Baryshkov wrote:

On 25/01/2024 21:38, Paloma Arellano wrote:

INTF_CONFIG2 register cannot have widebus enabled when DP format is
YUV420. Therefore, program the INTF to send 1 ppc.


I think this is handled in the DP driver, where we disallow wide bus
for YUV 4:2:0 modes.

Yes we do disallow wide bus for YUV420 modes, but we still need to
program the INTF_CFG2_DATA_HCTL_EN. Therefore, it is necessary to add
this check.


As I wrote in my second email, I'd prefer to have one if which guards
HCTL_EN and another one for WIDEN


It's hard to separate out the conditions just for HCTL_EN. It's more
about handling the various pixel-per-clock combinations.

But here is how I can best summarize it.

Let's consider DSI and DP separately:

1) For DSI, for anything > DSI version 2.5 (DPU version 7).

This is the same condition as widebus today in
msm_dsi_host_is_wide_bus_enabled().

Hence no changes are needed for DSI.


Not quite. msm_dsi_host_is_wide_bus_enabled() checks for the DSC being
enabled, while you have written that HCTL_EN should be set in all
cases on a corresponding platform.



Agreed. This is true, we should enable HCTL_EN for DSI irrespective of 
widebus for the versions I wrote.


Basically for the non-compressed case.

I will write something up to fix this for DSI. I think this can go as a 
bug fix.


But that does not change the DP conditions. In other words, I don't see
anything wrong with this patch yet.




2) For DP: whenever widebus is enabled, AND the YUV420 uncompressed case,
as they are independent cases. We don't support the YUV420 + DSC case.

There are other cases which fall outside of this bucket but they are
optional ones. We only follow the "required" ones.

With this summary in mind, I am fine with what we have except perhaps
better documentation above this block.

When DSC over DP gets added, I am expecting no changes to this block as
it will fall under the widebus_en case.

With this information, how else would you like the check?


What does this bit really change?



This bit basically indicates that the data sent per line is programmed
via INTF_DISPLAY_DATA_HCTL, as this cap suggests.


if (ctx->cap->features & BIT(DPU_DATA_HCTL_EN)) {
DPU_REG_WRITE(c, INTF_CONFIG2, intf_cfg2);
DPU_REG_WRITE(c, INTF_DISPLAY_DATA_HCTL, 
display_data_hctl);

DPU_REG_WRITE(c, INTF_ACTIVE_DATA_HCTL, active_data_hctl);
}

Prior to that it was programmed with INTF_DISPLAY_HCTL in the same function.







Signed-off-by: Paloma Arellano 
---
drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
index 6bba531d6dc41..bfb93f02fe7c1 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
@@ -168,7 +168,9 @@ static void
dpu_hw_intf_setup_timing_engine(struct dpu_hw_intf *ctx,
 * video timing. It is recommended to enable it for all cases,
except
 * if compression is enabled in 1 pixel per clock mode
 */
-if (p->wide_bus_en)
+if (dp_intf && fmt->base.pixel_format == DRM_FORMAT_YUV420)
+intf_cfg2 |= INTF_CFG2_DATA_HCTL_EN;
+else if (p->wide_bus_en)
intf_cfg2 |= INTF_CFG2_DATABUS_WIDEN | INTF_CFG2_DATA_HCTL_EN;
  data_width = p->width;












Re: [PATCH v2 5/8] drm/lima: handle spurious timeouts due to high irq latency

2024-01-29 Thread Qiang Yu
On Tue, Jan 30, 2024 at 6:55 AM Erico Nunes  wrote:
>
> On Wed, Jan 24, 2024 at 1:38 PM Qiang Yu  wrote:
> >
> > On Wed, Jan 24, 2024 at 11:00 AM Erico Nunes  wrote:
> > >
> > > There are several unexplained and unreproduced cases of rendering
> > > timeouts with lima, for which one theory is high IRQ latency coming from
> > > somewhere else in the system.
> > > This kind of occurrence may cause applications to trigger unnecessary
> > > resets of the GPU or even applications to hang if it hits an issue in
> > > the recovery path.
> > > Panfrost already does some special handling to account for such
> > > "spurious timeouts", it makes sense to have this in lima too to reduce
> > > the chance that it hit users.
> > >
> > > Signed-off-by: Erico Nunes 
> > > ---
> > >  drivers/gpu/drm/lima/lima_sched.c | 31 ---
> > >  1 file changed, 28 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/lima/lima_sched.c 
> > > b/drivers/gpu/drm/lima/lima_sched.c
> > > index c3bf8cda8498..814428564637 100644
> > > --- a/drivers/gpu/drm/lima/lima_sched.c
> > > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > > @@ -1,6 +1,7 @@
> > >  // SPDX-License-Identifier: GPL-2.0 OR MIT
> > >  /* Copyright 2017-2019 Qiang Yu  */
> > >
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -401,9 +402,35 @@ static enum drm_gpu_sched_stat 
> > > lima_sched_timedout_job(struct drm_sched_job *job
> > > struct lima_sched_pipe *pipe = to_lima_pipe(job->sched);
> > > struct lima_sched_task *task = to_lima_task(job);
> > > struct lima_device *ldev = pipe->ldev;
> > > +   struct lima_ip *ip = pipe->processor[0];
> > > +   int i;
> > > +
> > > +   /*
> > > +* If the GPU managed to complete this job's fence, the timeout is
> > > +* spurious. Bail out.
> > > +*/
> > > +   if (dma_fence_is_signaled(task->fence)) {
> > > +   DRM_WARN("%s spurious timeout\n", lima_ip_name(ip));
> > > +   return DRM_GPU_SCHED_STAT_NOMINAL;
> > > +   }
> > > +
> > > +   /*
> > > +* Lima IRQ handler may take a long time to process an interrupt
> > > +* if there is another IRQ handler hogging the processing.
> > > +* In order to catch such cases and not report spurious Lima job
> > > +* timeouts, synchronize the IRQ handler and re-check the fence
> > > +* status.
> > > +*/
> > > +   for (i = 0; i < pipe->num_processor; i++)
> > > +   synchronize_irq(pipe->processor[i]->irq);
> > > +
> > I have a question: this timeout handler will also be called on a GP/PP error IRQ.
> > If we call synchronize_irq() in the IRQ handler, will we block ourselves 
> > here?
>
> If I understand correctly, this handler is only called by drm_sched in
> a workqueue, not by gp or pp IRQ and it also does not run in any IRQ
> context.
> So I think this sort of lockup can't happen here.
>
Oh, right. I misunderstood drm_sched_fault(), which still calls the timeout
handler in a work queue instead of the caller thread.

> I ran some additional tests with both timeouts and actual error IRQs
> (locally modified Mesa to produce some errored jobs) and was not able
> to cause any lockup related to this.
>
> Erico


Re: [PATCH 07/17] drm/msm/dpu: disallow widebus en in INTF_CONFIG2 when DP is YUV420

2024-01-29 Thread Abhinav Kumar




On 1/27/2024 9:33 PM, Dmitry Baryshkov wrote:

On Sun, 28 Jan 2024 at 07:16, Paloma Arellano  wrote:



On 1/25/2024 1:26 PM, Dmitry Baryshkov wrote:

On 25/01/2024 21:38, Paloma Arellano wrote:

INTF_CONFIG2 register cannot have widebus enabled when DP format is
YUV420. Therefore, program the INTF to send 1 ppc.


I think this is handled in the DP driver, where we disallow wide bus
for YUV 4:2:0 modes.

Yes we do disallow wide bus for YUV420 modes, but we still need to
program the INTF_CFG2_DATA_HCTL_EN. Therefore, it is necessary to add
this check.


As I wrote in my second email, I'd prefer to have one if which guards
HCTL_EN and another one for WIDEN

It's hard to separate out the conditions just for HCTL_EN. It's more 
about handling the various pixel per clock combinations.


But, here is how I can best summarize it.

Let's consider DSI and DP separately:

1) For DSI, for anything > DSI version 2.5 (DPU version 7).

This is the same condition as widebus today in 
msm_dsi_host_is_wide_bus_enabled().


Hence no changes needed for DSI.

2) For DP, whenever widebus is enabled AND in the uncompressed YUV420 case,
as they are independent cases. We don't support the YUV420 + DSC case.

There are other cases which fall outside of this bucket, but they are 
optional ones. We only follow the "required" ones.


With this summary in mind, I am fine with what we have except perhaps 
better documentation above this block.


When DSC over DP gets added, I am expecting no changes to this block as 
it will fall under the widebus_en case.


With this information, how else would you like the check?
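For concreteness, here is a sketch of the block with such documentation
folded in (the comment wording is mine, based on this thread, not from the
patch; identifiers are as in dpu_hw_intf.c):

	/*
	 * Uncompressed YUV420 over DP cannot use widebus (2 ppc), but the
	 * INTF still needs DATA_HCTL_EN to run at 1 ppc. Every other
	 * widebus-capable case widens the data bus as well.
	 */
	if (dp_intf && fmt->base.pixel_format == DRM_FORMAT_YUV420)
		intf_cfg2 |= INTF_CFG2_DATA_HCTL_EN;
	else if (p->wide_bus_en)
		intf_cfg2 |= INTF_CFG2_DATABUS_WIDEN | INTF_CFG2_DATA_HCTL_EN;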





Signed-off-by: Paloma Arellano 
---
   drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
index 6bba531d6dc41..bfb93f02fe7c1 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
@@ -168,7 +168,9 @@ static void
dpu_hw_intf_setup_timing_engine(struct dpu_hw_intf *ctx,
* video timing. It is recommended to enable it for all cases,
except
* if compression is enabled in 1 pixel per clock mode
*/
-if (p->wide_bus_en)
+if (dp_intf && fmt->base.pixel_format == DRM_FORMAT_YUV420)
+intf_cfg2 |= INTF_CFG2_DATA_HCTL_EN;
+else if (p->wide_bus_en)
   intf_cfg2 |= INTF_CFG2_DATABUS_WIDEN | INTF_CFG2_DATA_HCTL_EN;
 data_width = p->width;








Re: [PATCH 07/17] drm/msm/dpu: disallow widebus en in INTF_CONFIG2 when DP is YUV420

2024-01-29 Thread Dmitry Baryshkov
On Tue, 30 Jan 2024 at 01:51, Abhinav Kumar  wrote:
>
>
>
> On 1/27/2024 9:33 PM, Dmitry Baryshkov wrote:
> > On Sun, 28 Jan 2024 at 07:16, Paloma Arellano  
> > wrote:
> >>
> >>
> >> On 1/25/2024 1:26 PM, Dmitry Baryshkov wrote:
> >>> On 25/01/2024 21:38, Paloma Arellano wrote:
>  INTF_CONFIG2 register cannot have widebus enabled when DP format is
>  YUV420. Therefore, program the INTF to send 1 ppc.
> >>>
> >>> I think this is handled in the DP driver, where we disallow wide bus
> >>> for YUV 4:2:0 modes.
> >> Yes we do disallow wide bus for YUV420 modes, but we still need to
> >> program the INTF_CFG2_DATA_HCTL_EN. Therefore, it is necessary to add
> >> this check.
> >
> > As I wrote in my second email, I'd prefer to have one if which guards
> > HCTL_EN and another one for WIDEN
> >
> It's hard to separate out the conditions just for HCTL_EN. It's more
> about handling the various pixel per clock combinations.
>
> But, here is how I can best summarize it.
>
> Lets consider DSI and DP separately:
>
> 1) For DSI, for anything > DSI version 2.5 ( DPU version 7 ).
>
> This is the same condition as widebus today in
> msm_dsi_host_is_wide_bus_enabled().
>
> Hence no changes needed for DSI.

Not quite. msm_dsi_host_is_wide_bus_enabled() checks for the DSC being
enabled, while you have written that HCTL_EN should be set in all
cases on a corresponding platform.

>
> 2) For DP, whenever widebus is enabled AND in the uncompressed YUV420 case,
> as they are independent cases. We don't support the YUV420 + DSC case.
>
> There are other cases which fall outside of this bucket but they are
> optional ones. We only follow the "required" ones.
>
> With this summary in mind, I am fine with what we have except perhaps
> better documentation above this block.
>
> When DSC over DP gets added, I am expecting no changes to this block as
> it will fall under the widebus_en case.
>
> With this information, how else would you like the check?

What does this bit really change?

>
> >>>
> 
>  Signed-off-by: Paloma Arellano 
>  ---
> drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
>  diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
>  b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
>  index 6bba531d6dc41..bfb93f02fe7c1 100644
>  --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
>  +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
>  @@ -168,7 +168,9 @@ static void
>  dpu_hw_intf_setup_timing_engine(struct dpu_hw_intf *ctx,
>  * video timing. It is recommended to enable it for all cases,
>  except
>  * if compression is enabled in 1 pixel per clock mode
>  */
>  -if (p->wide_bus_en)
>  +if (dp_intf && fmt->base.pixel_format == DRM_FORMAT_YUV420)
>  +intf_cfg2 |= INTF_CFG2_DATA_HCTL_EN;
>  +else if (p->wide_bus_en)
> intf_cfg2 |= INTF_CFG2_DATABUS_WIDEN | INTF_CFG2_DATA_HCTL_EN;
>   data_width = p->width;
> >>>
> >
> >
> >



-- 
With best wishes
Dmitry


RE: Making drm_gpuvm work across gpu devices

2024-01-29 Thread Zeng, Oak
The example you used to prove that KFD is a design failure is an argument 
against *any* design which utilizes a system allocator and hmm. The way that one 
proxy process running on the host handles many guest processes doesn't fit into 
the concept of “shared address space b/t cpu and gpu”. The shared address space 
has to be within one process. Your proxy process represents many guest processes. 
It is a fundamental conflict.

Also, your userptr proposal doesn't solve this problem either:
Imagine you have guest process 1 mapping CPU address range A…B to GPU address 
range C…D,
and you have guest process 2 mapping CPU address range A…B to GPU address range 
C…D. Since processes 1 and 2 are two different processes, it is legal for process 2 
to do the exact same mapping.
Now when a gpu shader accesses address C…D, a gpu page fault happens. What does 
your proxy process do? Which guest process will this fault be directed to and 
handled by? Unless you have extra information/APIs to tell the proxy process and 
GPU HW, there is no way to figure this out.

Compared to the shared virtual address space concept of HMM, the userptr design 
is nothing new except that it allows CPU and GPU to use different addresses to 
access the same object. If you replace C…D above with A…B, the above description 
becomes a description of the “problem” of the HMM/shared virtual address design.

Both designs have the same difficulty with your example of the special 
virtualization environment setup.

As said, we spent effort scoping the userptr solution some time ago. The problems 
we found when enabling userptr with migration were:

  1.  The user interface of userptr is not as convenient as the system allocator. 
With the userptr solution, the user needs to call userptr_ioctl and vm_bind for 
*every* single cpu pointer that they want to use in a gpu program, while with the 
system allocator the programmer just uses any cpu pointer directly in the gpu 
program without any extra driver ioctls.
  2.  We don't see the real benefit of using a different gpu address C…D than 
A…B, unless you can prove my above reasoning wrong. In most use cases you can 
make GPU C…D == CPU A…B, so why bother?
  3.  Looking at implementation details: since hmm fundamentally assumes a 
shared virtual address space b/t cpu and device, for the userptr solution to 
leverage hmm you need to perform an address space conversion every time you call 
into hmm functions.

In summary, a GPU device is just a piece of HW to accelerate your CPU program. If 
the HW allows, it is more convenient to use a shared address space b/t cpu and GPU. 
On old HW (for example, no gpu page fault support, or a gpu with only a very 
limited address space), we can disable the system allocator/SVM. If you use a 
different address space on a modern GPU, why don't you use a different address 
space on different CPU cores?

Regards,
Oak
From: dri-devel  On Behalf Of 
Christian König
Sent: Monday, January 29, 2024 5:20 AM
To: Zeng, Oak ; Thomas Hellström 
; Daniel Vetter ; Dave 
Airlie 
Cc: Brost, Matthew ; Felix Kuehling 
; Welty, Brian ; 
dri-devel@lists.freedesktop.org; Ghimiray, Himal Prasad 
; Bommu, Krishnaiah 
; Gupta, saurabhg ; 
Vishwanathapura, Niranjana ; 
intel...@lists.freedesktop.org; Danilo Krummrich 
Subject: Re: Making drm_gpuvm work across gpu devices

Well Daniel and Dave noted it as well, so I'm just repeating it: Your design 
choices are not an argument to get something upstream.

It's the job of the maintainers, and in the end Linus, to judge if 
something is acceptable or not.

As far as I can see a good part of this idea has been exercised at length 
with KFD, and it turned out to not be the best approach.

So from what I've seen the design you outlined is extremely unlikely to go 
upstream.

Regards,
Christian.
Am 27.01.24 um 03:21 schrieb Zeng, Oak:
Regarding the idea of expanding userptr to support migration, we explored this 
idea a long time ago. It provides similar functions to the system allocator, but 
its interface is not as convenient as the system allocator's. Besides the shared 
virtual address space, another benefit of a system allocator is that you can 
offload a cpu program to the gpu more easily; you don't need to call driver-specific 
APIs (such as register_userptr and vm_bind in this case) for memory allocation.

We also scoped the implementation. It turned out to be big, and not as 
beautiful as hmm. That is why we gave up this approach.

From: Christian König 

Sent: Friday, January 26, 2024 7:52 AM
To: Thomas Hellström 
; 
Daniel Vetter 
Cc: Brost, Matthew ; 
Felix Kuehling ; Welty, 
Brian ; Ghimiray, Himal 
Prasad 
; 
Zeng, Oak ; Gupta, saurabhg 
; Danilo Krummrich 
; 
dri-devel@lists.freedesktop.org; Bommu, 
Krishnaiah 

Re: [PATCH 14/17] drm/msm/dpu: modify encoder programming for CDM over DP

2024-01-29 Thread Dmitry Baryshkov
On Mon, 29 Jan 2024 at 09:08, Abhinav Kumar  wrote:
>
> On 1/28/2024 10:12 PM, Dmitry Baryshkov wrote:
> > On Mon, 29 Jan 2024 at 07:03, Abhinav Kumar  
> > wrote:
> >>
> >>
> >>
> >> On 1/28/2024 7:42 PM, Dmitry Baryshkov wrote:
> >>> On Mon, 29 Jan 2024 at 04:58, Abhinav Kumar  
> >>> wrote:
> 
> 
> 
>  On 1/27/2024 9:55 PM, Dmitry Baryshkov wrote:
> > On Sun, 28 Jan 2024 at 07:48, Paloma Arellano 
> >  wrote:
> >>
> >>
> >> On 1/25/2024 1:57 PM, Dmitry Baryshkov wrote:
> >>> On 25/01/2024 21:38, Paloma Arellano wrote:
>  Adjust the encoder format programming in the case of video mode for 
>  DP
>  to accommodate CDM related changes.
> 
>  Signed-off-by: Paloma Arellano 
>  ---
>   drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   | 16 +
>   drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h   |  8 +
>   .../drm/msm/disp/dpu1/dpu_encoder_phys_vid.c  | 35 
>  ---
>   drivers/gpu/drm/msm/dp/dp_display.c   | 12 +++
>   drivers/gpu/drm/msm/msm_drv.h |  9 -
>   5 files changed, 75 insertions(+), 5 deletions(-)
> 
>  diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
>  b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
>  index b0896814c1562..99ec53446ad21 100644
>  --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
>  +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
>  @@ -222,6 +222,22 @@ static u32 dither_matrix[DITHER_MATRIX_SZ] = {
>   15, 7, 13, 5, 3, 11, 1, 9, 12, 4, 14, 6, 0, 8, 2, 10
>   };
>   +u32 dpu_encoder_get_drm_fmt(const struct drm_encoder *drm_enc,
>  const struct drm_display_mode *mode)
>  +{
>  +const struct dpu_encoder_virt *dpu_enc;
>  +const struct msm_display_info *disp_info;
>  +struct msm_drm_private *priv;
>  +
>  +dpu_enc = to_dpu_encoder_virt(drm_enc);
>  +disp_info = &dpu_enc->disp_info;
>  +priv = drm_enc->dev->dev_private;
>  +
>  +if (disp_info->intf_type == INTF_DP &&
>  + msm_dp_is_yuv_420_enabled(priv->dp[disp_info->h_tile_instance[0]],
>  mode))
> >>>
> >>> This should not require interacting with DP. If we got here, we must
> >>> be sure that 4:2:0 is supported and can be configured.
> >> Ack. Will drop this function and only check for if the mode is YUV420.
> >>>
>  +return DRM_FORMAT_YUV420;
>  +
>  +return DRM_FORMAT_RGB888;
>  +}
> bool dpu_encoder_is_widebus_enabled(const struct drm_encoder
>  *drm_enc)
>   {
>  diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
>  b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
>  index 7b4afa71f1f96..62255d0aa4487 100644
>  --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
>  +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
>  @@ -162,6 +162,14 @@ int dpu_encoder_get_vsync_count(struct
>  drm_encoder *drm_enc);
>    */
>   bool dpu_encoder_is_widebus_enabled(const struct drm_encoder
>  *drm_enc);
>   +/**
>  + * dpu_encoder_get_drm_fmt - return DRM fourcc format
>  + * @drm_enc:Pointer to previously created drm encoder structure
>  + * @mode:Corresponding drm_display_mode for dpu encoder
>  + */
>  +u32 dpu_encoder_get_drm_fmt(const struct drm_encoder *drm_enc,
>  +const struct drm_display_mode *mode);
>  +
>   /**
>    * dpu_encoder_get_crc_values_cnt - get number of physical 
>  encoders
>  contained
>    *in virtual encoder that can collect CRC values
>  diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
>  b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
>  index e284bf448bdda..a1dde0ff35dc8 100644
>  --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
>  +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
>  @@ -234,6 +234,7 @@ static void
>  dpu_encoder_phys_vid_setup_timing_engine(
>   {
>   struct drm_display_mode mode;
>   struct dpu_hw_intf_timing_params timing_params = { 0 };
>  +struct dpu_hw_cdm *hw_cdm;
>   const struct dpu_format *fmt = NULL;
>   u32 fmt_fourcc = DRM_FORMAT_RGB888;
>   unsigned long lock_flags;
>  @@ -254,17 +255,26 @@ static void
>  dpu_encoder_phys_vid_setup_timing_engine(
>   DPU_DEBUG_VIDENC(phys_enc, "enabling mode:\n");
>   drm_mode_debug_printmodeline();
>   -if 

Re: [PATCH 05/17] drm/msm/dp: add an API to indicate if sink supports VSC SDP

2024-01-29 Thread Paloma Arellano



On 1/26/2024 6:40 PM, Dmitry Baryshkov wrote:

On Sat, 27 Jan 2024 at 02:58, Paloma Arellano  wrote:


On 1/25/2024 1:23 PM, Dmitry Baryshkov wrote:

On 25/01/2024 21:38, Paloma Arellano wrote:

YUV420 format is supported only in the VSC SDP packet and not through
MSA. Hence add an API which indicates the sink support which can be used
by the rest of the DP programming.

This API ideally should go to drm/display/drm_dp_helper.c

I'm not familiar with how other vendors are checking if VSC SDP is supported.
So in moving this API, I'm going to let the other vendors make the
changes themselves.

Let me show it for you:

bool intel_dp_get_colorimetry_status(struct intel_dp *intel_dp)
{
 u8 dprx = 0;

 if (drm_dp_dpcd_readb(&intel_dp->aux, DP_DPRX_FEATURE_ENUMERATION_LIST,
   &dprx) != 1)
 return false;
 return dprx & DP_VSC_SDP_EXT_FOR_COLORIMETRY_SUPPORTED;
}



Signed-off-by: Paloma Arellano 
---
   drivers/gpu/drm/msm/dp/dp_display.c |  3 ++-
   drivers/gpu/drm/msm/dp/dp_panel.c   | 35 +
   drivers/gpu/drm/msm/dp/dp_panel.h   |  1 +
   3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c
b/drivers/gpu/drm/msm/dp/dp_display.c
index ddac55f45a722..f6b3b6ca242f8 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1617,7 +1617,8 @@ void dp_bridge_mode_set(struct drm_bridge
*drm_bridge,
   !!(dp_display->dp_mode.drm_mode.flags & DRM_MODE_FLAG_NHSYNC);
 dp_display->dp_mode.out_fmt_is_yuv_420 =
- drm_mode_is_420_only(&dp->connector->display_info, adjusted_mode);
+ drm_mode_is_420_only(&dp->connector->display_info, adjusted_mode) &&
+dp_panel_vsc_sdp_supported(dp_display->panel);
 /* populate wide_bus_support to different layers */
   dp_display->ctrl->wide_bus_en =
diff --git a/drivers/gpu/drm/msm/dp/dp_panel.c
b/drivers/gpu/drm/msm/dp/dp_panel.c
index 127f6af995cd1..af7820b6d35ec 100644
--- a/drivers/gpu/drm/msm/dp/dp_panel.c
+++ b/drivers/gpu/drm/msm/dp/dp_panel.c
@@ -17,6 +17,9 @@ struct dp_panel_private {
   struct dp_link *link;
   struct dp_catalog *catalog;
   bool panel_on;
+bool vsc_supported;
+u8 major;
+u8 minor;
   };
 static void dp_panel_read_psr_cap(struct dp_panel_private *panel)
@@ -43,9 +46,10 @@ static void dp_panel_read_psr_cap(struct
dp_panel_private *panel)
   static int dp_panel_read_dpcd(struct dp_panel *dp_panel)
   {
   int rc;
+ssize_t rlen;
   struct dp_panel_private *panel;
   struct dp_link_info *link_info;
-u8 *dpcd, major, minor;
+u8 *dpcd, rx_feature;
 panel = container_of(dp_panel, struct dp_panel_private,
dp_panel);
   dpcd = dp_panel->dpcd;
@@ -53,10 +57,19 @@ static int dp_panel_read_dpcd(struct dp_panel
*dp_panel)
   if (rc)
   return rc;
   +rlen = drm_dp_dpcd_read(panel->aux,
DP_DPRX_FEATURE_ENUMERATION_LIST, &rx_feature, 1);
+if (rlen != 1) {
+panel->vsc_supported = false;
+pr_debug("failed to read DP_DPRX_FEATURE_ENUMERATION_LIST\n");
+} else {
+panel->vsc_supported = !!(rx_feature &
DP_VSC_SDP_EXT_FOR_COLORIMETRY_SUPPORTED);
+pr_debug("vsc=%d\n", panel->vsc_supported);
+}
+
   link_info = &dp_panel->link_info;
   link_info->revision = dpcd[DP_DPCD_REV];
-major = (link_info->revision >> 4) & 0x0f;
-minor = link_info->revision & 0x0f;
+panel->major = (link_info->revision >> 4) & 0x0f;
+panel->minor = link_info->revision & 0x0f;
 link_info->rate = drm_dp_max_link_rate(dpcd);
   link_info->num_lanes = drm_dp_max_lane_count(dpcd);
@@ -69,7 +82,7 @@ static int dp_panel_read_dpcd(struct dp_panel
*dp_panel)
   if (link_info->rate > dp_panel->max_dp_link_rate)
   link_info->rate = dp_panel->max_dp_link_rate;
   -drm_dbg_dp(panel->drm_dev, "version: %d.%d\n", major, minor);
+drm_dbg_dp(panel->drm_dev, "version: %d.%d\n", panel->major,
panel->minor);
   drm_dbg_dp(panel->drm_dev, "link_rate=%d\n", link_info->rate);
   drm_dbg_dp(panel->drm_dev, "lane_count=%d\n",
link_info->num_lanes);
   @@ -280,6 +293,20 @@ void dp_panel_tpg_config(struct dp_panel
*dp_panel, bool enable)
   dp_catalog_panel_tpg_enable(catalog,
&panel->dp_panel.dp_mode.drm_mode);
   }
   +bool dp_panel_vsc_sdp_supported(struct dp_panel *dp_panel)
+{
+struct dp_panel_private *panel;
+
+if (!dp_panel) {
+pr_err("invalid input\n");
+return false;
+}
+
+panel = container_of(dp_panel, struct dp_panel_private, dp_panel);
+
+return panel->major >= 1 && panel->minor >= 3 &&
panel->vsc_supported;

Anyway, this check is incorrect. Please compare the whole revision
against DP_DPCD_REV_13 instead of doing a maj/min comparison.

Ack



+}
+
   void dp_panel_dump_regs(struct dp_panel *dp_panel)
   {
   struct dp_catalog *catalog;
diff --git a/drivers/gpu/drm/msm/dp/dp_panel.h
b/drivers/gpu/drm/msm/dp/dp_panel.h
index 
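Putting both review comments together (move the check into
drm/display/drm_dp_helper.c and compare the raw DPCD revision against
DP_DPCD_REV_13), a minimal sketch of such a generic helper could look like
this; the function name and placement are assumptions from this thread, not
an existing API:

bool drm_dp_vsc_sdp_supported(struct drm_dp_aux *aux,
			      const u8 dpcd[DP_RECEIVER_CAP_SIZE])
{
	u8 rx_feature;

	/* The VSC SDP extension for colorimetry requires DPCD 1.3+ */
	if (dpcd[DP_DPCD_REV] < DP_DPCD_REV_13)
		return false;

	if (drm_dp_dpcd_readb(aux, DP_DPRX_FEATURE_ENUMERATION_LIST,
			      &rx_feature) != 1)
		return false;

	return rx_feature & DP_VSC_SDP_EXT_FOR_COLORIMETRY_SUPPORTED;
}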

Re: [PATCH 01/17] drm/msm/dpu: allow dpu_encoder_helper_phys_setup_cdm to work for DP

2024-01-29 Thread Paloma Arellano



On 1/28/2024 9:12 PM, Dmitry Baryshkov wrote:

On Mon, 29 Jan 2024 at 06:33, Abhinav Kumar  wrote:



On 1/28/2024 8:12 PM, Dmitry Baryshkov wrote:

On Mon, 29 Jan 2024 at 06:01, Abhinav Kumar  wrote:



On 1/28/2024 7:23 PM, Dmitry Baryshkov wrote:

On Mon, 29 Jan 2024 at 05:06, Abhinav Kumar  wrote:



On 1/26/2024 4:39 PM, Paloma Arellano wrote:

On 1/25/2024 1:14 PM, Dmitry Baryshkov wrote:

On 25/01/2024 21:38, Paloma Arellano wrote:

Generalize dpu_encoder_helper_phys_setup_cdm to be compatible with DP.

Signed-off-by: Paloma Arellano 
---
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h  |  4 +--
 .../drm/msm/disp/dpu1/dpu_encoder_phys_wb.c   | 31 ++-
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
index 993f263433314..37ac385727c3b 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
@@ -153,6 +153,7 @@ enum dpu_intr_idx {
  * @hw_intf:Hardware interface to the intf registers
  * @hw_wb:Hardware interface to the wb registers
  * @hw_cdm:Hardware interface to the CDM registers
+ * @cdm_cfg:CDM block config needed to store WB/DP block's CDM
configuration

Please realign the description.

Ack

  * @dpu_kms:Pointer to the dpu_kms top level
  * @cached_mode:DRM mode cached at mode_set time, acted on in
enable
  * @vblank_ctl_lock:Vblank ctl mutex lock to protect
vblank_refcount
@@ -183,6 +184,7 @@ struct dpu_encoder_phys {
 struct dpu_hw_intf *hw_intf;
 struct dpu_hw_wb *hw_wb;
 struct dpu_hw_cdm *hw_cdm;
+struct dpu_hw_cdm_cfg cdm_cfg;

It might be slightly better to move it after all the pointers, so
after the dpu_kms.

Ack

 struct dpu_kms *dpu_kms;
 struct drm_display_mode cached_mode;
 struct mutex vblank_ctl_lock;
@@ -213,7 +215,6 @@ static inline int
dpu_encoder_phys_inc_pending(struct dpu_encoder_phys *phys)
  * @wbirq_refcount: Reference count of writeback interrupt
  * @wb_done_timeout_cnt: number of wb done irq timeout errors
  * @wb_cfg:  writeback block config to store fb related details
- * @cdm_cfg: cdm block config needed to store writeback block's CDM
configuration
  * @wb_conn: backpointer to writeback connector
  * @wb_job: backpointer to current writeback job
  * @dest:   dpu buffer layout for current writeback output buffer
@@ -223,7 +224,6 @@ struct dpu_encoder_phys_wb {
 atomic_t wbirq_refcount;
 int wb_done_timeout_cnt;
 struct dpu_hw_wb_cfg wb_cfg;
-struct dpu_hw_cdm_cfg cdm_cfg;
 struct drm_writeback_connector *wb_conn;
 struct drm_writeback_job *wb_job;
 struct dpu_hw_fmt_layout dest;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
index 4cd2d9e3131a4..072fc6950e496 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
@@ -269,28 +269,21 @@ static void
dpu_encoder_phys_wb_setup_ctl(struct dpu_encoder_phys *phys_enc)
  * This API does not handle
DPU_CHROMA_H1V2.
  * @phys_enc:Pointer to physical encoder
  */
-static void dpu_encoder_helper_phys_setup_cdm(struct
dpu_encoder_phys *phys_enc)
+static void dpu_encoder_helper_phys_setup_cdm(struct
dpu_encoder_phys *phys_enc,
+  const struct dpu_format *dpu_fmt,
+  u32 output_type)
 {
 struct dpu_hw_cdm *hw_cdm;
 struct dpu_hw_cdm_cfg *cdm_cfg;
 struct dpu_hw_pingpong *hw_pp;
-struct dpu_encoder_phys_wb *wb_enc;
-const struct msm_format *format;
-const struct dpu_format *dpu_fmt;
-struct drm_writeback_job *wb_job;
 int ret;
   if (!phys_enc)
 return;
 -wb_enc = to_dpu_encoder_phys_wb(phys_enc);
-cdm_cfg = &wb_enc->cdm_cfg;
+cdm_cfg = &phys_enc->cdm_cfg;
 hw_pp = phys_enc->hw_pp;
 hw_cdm = phys_enc->hw_cdm;
-wb_job = wb_enc->wb_job;
-
-format = msm_framebuffer_format(wb_enc->wb_job->fb);
-dpu_fmt = dpu_get_dpu_format_ext(format->pixel_format,
wb_job->fb->modifier);
   if (!hw_cdm)
 return;
@@ -306,10 +299,10 @@ static void
dpu_encoder_helper_phys_setup_cdm(struct dpu_encoder_phys *phys_enc)
   memset(cdm_cfg, 0, sizeof(struct dpu_hw_cdm_cfg));
 -cdm_cfg->output_width = wb_job->fb->width;
-cdm_cfg->output_height = wb_job->fb->height;
+cdm_cfg->output_width = phys_enc->cached_mode.hdisplay;
+cdm_cfg->output_height = phys_enc->cached_mode.vdisplay;

This is a semantic change. Instead of passing the FB size, this passes
the mode dimensions. They are not guaranteed to be the same,
especially for the WB case.


The WB job is storing the output FB of WB. 

Re: [PATCH v2 5/8] drm/lima: handle spurious timeouts due to high irq latency

2024-01-29 Thread Erico Nunes
On Wed, Jan 24, 2024 at 1:38 PM Qiang Yu  wrote:
>
> On Wed, Jan 24, 2024 at 11:00 AM Erico Nunes  wrote:
> >
> > There are several unexplained and unreproduced cases of rendering
> > timeouts with lima, for which one theory is high IRQ latency coming from
> > somewhere else in the system.
> > This kind of occurrence may cause applications to trigger unnecessary
> > resets of the GPU or even applications to hang if it hits an issue in
> > the recovery path.
> > Panfrost already does some special handling to account for such
> > "spurious timeouts", it makes sense to have this in lima too to reduce
> > the chance that it hit users.
> >
> > Signed-off-by: Erico Nunes 
> > ---
> >  drivers/gpu/drm/lima/lima_sched.c | 31 ---
> >  1 file changed, 28 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/lima/lima_sched.c 
> > b/drivers/gpu/drm/lima/lima_sched.c
> > index c3bf8cda8498..814428564637 100644
> > --- a/drivers/gpu/drm/lima/lima_sched.c
> > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > @@ -1,6 +1,7 @@
> >  // SPDX-License-Identifier: GPL-2.0 OR MIT
> >  /* Copyright 2017-2019 Qiang Yu  */
> >
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -401,9 +402,35 @@ static enum drm_gpu_sched_stat 
> > lima_sched_timedout_job(struct drm_sched_job *job
> > struct lima_sched_pipe *pipe = to_lima_pipe(job->sched);
> > struct lima_sched_task *task = to_lima_task(job);
> > struct lima_device *ldev = pipe->ldev;
> > +   struct lima_ip *ip = pipe->processor[0];
> > +   int i;
> > +
> > +   /*
> > +* If the GPU managed to complete this jobs fence, the timeout is
> > +* spurious. Bail out.
> > +*/
> > +   if (dma_fence_is_signaled(task->fence)) {
> > +   DRM_WARN("%s spurious timeout\n", lima_ip_name(ip));
> > +   return DRM_GPU_SCHED_STAT_NOMINAL;
> > +   }
> > +
> > +   /*
> > +* Lima IRQ handler may take a long time to process an interrupt
> > +* if there is another IRQ handler hogging the processing.
> > +* In order to catch such cases and not report spurious Lima job
> > +* timeouts, synchronize the IRQ handler and re-check the fence
> > +* status.
> > +*/
> > +   for (i = 0; i < pipe->num_processor; i++)
> > +   synchronize_irq(pipe->processor[i]->irq);
> > +
> I have a question: this timeout handler will also be called on a GP/PP error IRQ.
> If we call synchronize_irq() in the IRQ handler, will we block ourselves here?

If I understand correctly, this handler is only called by drm_sched in
a workqueue, not by gp or pp IRQ and it also does not run in any IRQ
context.
So I think this sort of lockup can't happen here.

I ran some additional tests with both timeouts and actual error IRQs
(locally modified Mesa to produce some errored jobs) and was not able
to cause any lockup related to this.

Erico


Re: [PATCH 1/5] dt-bindings: display/msm: document MDSS on X1E80100

2024-01-29 Thread Rob Herring


On Mon, 29 Jan 2024 15:18:54 +0200, Abel Vesa wrote:
> Document the MDSS hardware found on the Qualcomm X1E80100 platform.
> 
> Signed-off-by: Abel Vesa 
> ---
>  .../bindings/display/msm/qcom,x1e80100-mdss.yaml   | 249 
> +
>  1 file changed, 249 insertions(+)
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:

dtschema/dtc warnings/errors:
Error: 
Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.example.dts:33.40-41
 syntax error
FATAL ERROR: Unable to parse input tree
make[2]: *** [scripts/Makefile.lib:419: 
Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.example.dtb] 
Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [/builds/robherring/dt-review-ci/linux/Makefile:1428: 
dt_binding_check] Error 2
make: *** [Makefile:240: __sub-make] Error 2

doc reference errors (make refcheckdocs):

See 
https://patchwork.ozlabs.org/project/devicetree-bindings/patch/20240129-x1e80100-display-v1-1-0d9eb8254...@linaro.org

The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.
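
For example, to check only the schema added by this patch (path taken from
the error output above):

make dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.yaml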



RE: [PATCH] drm/xe: Fix a build error

2024-01-29 Thread Zeng, Oak
Hi Thomas,

My patch was based on drm-tip because I found drm-tip was broken.

As long as drm-tip can build, I am all good.

Thanks,
Oak

> -Original Message-
> From: Thomas Hellström 
> Sent: Monday, January 29, 2024 3:26 PM
> To: Christian König ; Zeng, Oak
> ; dri-devel@lists.freedesktop.org; intel-
> x...@lists.freedesktop.org
> Cc: amaranath.somalapu...@amd.com; De Marchi, Lucas
> 
> Subject: Re: [PATCH] drm/xe: Fix a build error
> 
> Hi,
> 
> On 1/29/24 17:48, Christian König wrote:
> > Am 27.01.24 um 16:53 schrieb Oak Zeng:
> >> This fixes a build failure on drm-tip. This issue was introduced during
> >> merge of "drm/ttm: replace busy placement with flags v6". For some
> >> reason, the xe_bo.c part of above change is not merged. Manually merge
> >> the missing part to drm_tip
> >
> > Mhm, I provided this as manual fixup for drm-tip in this rerere commit:
> >
> > commit afc5797e8c03bed3ec47a34f2bc3cf03fce24411
> > Author: Christian König 
> > Date:   Thu Jan 25 10:44:54 2024 +0100
> >
> >     2024y-01m-25d-09h-44m-07s UTC: drm-tip rerere cache update
> >
> >     git version 2.34.1
> >
> >
> > And for me compiling xe in drm-tip worked fine after that. No idea why
> > that didn't work for you.
> >
> > Anyway feel free to add my rb to this patch here if it helps in any way.
> >
> > Regards,
> > Christian.
> 
> I reverted that rerere cache update and added another one, so now it
> works. Not sure exactly what the difference was, but the resulting patch
> was for the drm-misc-next merge in my case, and it was for
> drm-xe-something in your case.
> 
> /Thomas
> 
> 
> >
> >>
> >> Signed-off-by: Oak Zeng 
> >> ---
> >>   drivers/gpu/drm/xe/xe_bo.c | 33 +++--
> >>   1 file changed, 15 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> >> index 686d716c5581..d6a193060cc0 100644
> >> --- a/drivers/gpu/drm/xe/xe_bo.c
> >> +++ b/drivers/gpu/drm/xe/xe_bo.c
> >> @@ -38,22 +38,26 @@ static const struct ttm_place sys_placement_flags
> >> = {
> >>   static struct ttm_placement sys_placement = {
> >>   .num_placement = 1,
> >>   .placement = &sys_placement_flags,
> >> -    .num_busy_placement = 1,
> >> -    .busy_placement = &sys_placement_flags,
> >>   };
> >>   -static const struct ttm_place tt_placement_flags = {
> >> -    .fpfn = 0,
> >> -    .lpfn = 0,
> >> -    .mem_type = XE_PL_TT,
> >> -    .flags = 0,
> >> +static const struct ttm_place tt_placement_flags[] = {
> >> +    {
> >> +    .fpfn = 0,
> >> +    .lpfn = 0,
> >> +    .mem_type = XE_PL_TT,
> >> +    .flags = TTM_PL_FLAG_DESIRED,
> >> +    },
> >> +    {
> >> +    .fpfn = 0,
> >> +    .lpfn = 0,
> >> +    .mem_type = XE_PL_SYSTEM,
> >> +    .flags = TTM_PL_FLAG_FALLBACK,
> >> +    }
> >>   };
> >>     static struct ttm_placement tt_placement = {
> >> -    .num_placement = 1,
> >> -    .placement = &tt_placement_flags,
> >> -    .num_busy_placement = 1,
> >> -    .busy_placement = &tt_placement_flags,
> >> +    .num_placement = 2,
> >> +    .placement = tt_placement_flags,
> >>   };
> >>     bool mem_type_is_vram(u32 mem_type)
> >> @@ -230,8 +234,6 @@ static int __xe_bo_placement_for_flags(struct
> >> xe_device *xe, struct xe_bo *bo,
> >>   bo->placement = (struct ttm_placement) {
> >>   .num_placement = c,
> >>   .placement = bo->placements,
> >> -    .num_busy_placement = c,
> >> -    .busy_placement = bo->placements,
> >>   };
> >>     return 0;
> >> @@ -251,7 +253,6 @@ static void xe_evict_flags(struct
> >> ttm_buffer_object *tbo,
> >>   /* Don't handle scatter gather BOs */
> >>   if (tbo->type == ttm_bo_type_sg) {
> >>   placement->num_placement = 0;
> >> -    placement->num_busy_placement = 0;
> >>   return;
> >>   }
> >>   @@ -1391,8 +1392,6 @@ static int __xe_bo_fixed_placement(struct
> >> xe_device *xe,
> >>   bo->placement = (struct ttm_placement) {
> >>   .num_placement = 1,
> >>   .placement = place,
> >> -    .num_busy_placement = 1,
> >> -    .busy_placement = place,
> >>   };
> >>     return 0;
> >> @@ -2150,9 +2149,7 @@ int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
> >>     xe_place_from_ttm_type(mem_type, &requested);
> >>   placement.num_placement = 1;
> >> -    placement.num_busy_placement = 1;
> >>   placement.placement = &requested;
> >> -    placement.busy_placement = &requested;
> >>     /*
> >>    * Stolen needs to be handled like below VRAM handling if we
> >> ever need
> >


Re: [PATCH] drm/xe: Fix a build error

2024-01-29 Thread Thomas Hellström

Hi,

On 1/29/24 17:48, Christian König wrote:

Am 27.01.24 um 16:53 schrieb Oak Zeng:

This fixes a build failure on drm-tip. This issue was introduced during
merge of "drm/ttm: replace busy placement with flags v6". For some
reason, the xe_bo.c part of above change is not merged. Manually merge
the missing part to drm_tip


Mhm, I provided this as manual fixup for drm-tip in this rerere commit:

commit afc5797e8c03bed3ec47a34f2bc3cf03fce24411
Author: Christian König 
Date:   Thu Jan 25 10:44:54 2024 +0100

    2024y-01m-25d-09h-44m-07s UTC: drm-tip rerere cache update

    git version 2.34.1


And for me compiling xe in drm-tip worked fine after that. No idea why 
that didn't work for you.


Anyway feel free to add my rb to this patch here if it helps in any way.

Regards,
Christian.


I reverted that rerere cache update and added another one, so now it 
works. Not sure exactly what the difference was, but the resulting patch 
was for the drm-misc-next merge in my case, and it was for 
drm-xe-something in your case.


/Thomas






Signed-off-by: Oak Zeng 
---
  drivers/gpu/drm/xe/xe_bo.c | 33 +++--
  1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 686d716c5581..d6a193060cc0 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -38,22 +38,26 @@ static const struct ttm_place sys_placement_flags 
= {

  static struct ttm_placement sys_placement = {
  .num_placement = 1,
  .placement = &sys_placement_flags,
-    .num_busy_placement = 1,
-    .busy_placement = &sys_placement_flags,
  };
  -static const struct ttm_place tt_placement_flags = {
-    .fpfn = 0,
-    .lpfn = 0,
-    .mem_type = XE_PL_TT,
-    .flags = 0,
+static const struct ttm_place tt_placement_flags[] = {
+    {
+    .fpfn = 0,
+    .lpfn = 0,
+    .mem_type = XE_PL_TT,
+    .flags = TTM_PL_FLAG_DESIRED,
+    },
+    {
+    .fpfn = 0,
+    .lpfn = 0,
+    .mem_type = XE_PL_SYSTEM,
+    .flags = TTM_PL_FLAG_FALLBACK,
+    }
  };
    static struct ttm_placement tt_placement = {
-    .num_placement = 1,
-    .placement = &tt_placement_flags,
-    .num_busy_placement = 1,
-    .busy_placement = &tt_placement_flags,
+    .num_placement = 2,
+    .placement = tt_placement_flags,
  };
    bool mem_type_is_vram(u32 mem_type)
@@ -230,8 +234,6 @@ static int __xe_bo_placement_for_flags(struct 
xe_device *xe, struct xe_bo *bo,

  bo->placement = (struct ttm_placement) {
  .num_placement = c,
  .placement = bo->placements,
-    .num_busy_placement = c,
-    .busy_placement = bo->placements,
  };
    return 0;
@@ -251,7 +253,6 @@ static void xe_evict_flags(struct 
ttm_buffer_object *tbo,

  /* Don't handle scatter gather BOs */
  if (tbo->type == ttm_bo_type_sg) {
  placement->num_placement = 0;
-    placement->num_busy_placement = 0;
  return;
  }
  @@ -1391,8 +1392,6 @@ static int __xe_bo_fixed_placement(struct 
xe_device *xe,

  bo->placement = (struct ttm_placement) {
  .num_placement = 1,
  .placement = place,
-    .num_busy_placement = 1,
-    .busy_placement = place,
  };
    return 0;
@@ -2150,9 +2149,7 @@ int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
    xe_place_from_ttm_type(mem_type, &requested);
  placement.num_placement = 1;
-    placement.num_busy_placement = 1;
  placement.placement = &requested;
-    placement.busy_placement = &requested;
    /*
   * Stolen needs to be handled like below VRAM handling if we 
ever need




Re: Making drm_gpuvm work across gpu devices

2024-01-29 Thread Felix Kuehling



On 2024-01-29 14:03, Christian König wrote:

Am 29.01.24 um 18:52 schrieb Felix Kuehling:

On 2024-01-29 11:28, Christian König wrote:

Am 29.01.24 um 17:24 schrieb Felix Kuehling:

On 2024-01-29 10:33, Christian König wrote:

Am 29.01.24 um 16:03 schrieb Felix Kuehling:

On 2024-01-25 13:32, Daniel Vetter wrote:

On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:

Am 23.01.24 um 20:37 schrieb Zeng, Oak:

[SNIP]
Yes most API are per device based.

One exception I know is actually the kfd SVM API. If you look 
at the svm_ioctl function, it is per-process based. Each 
kfd_process represent a process across N gpu devices.
Yeah and that was a big mistake in my opinion. We should really 
not do that

ever again.

Need to say, kfd SVM represent a shared virtual address space 
across CPU and all GPU devices on the system. This is by the 
definition of SVM (shared virtual memory). This is very 
different from our legacy gpu *device* driver which works for 
only one device (i.e., if you want one device to access 
another device's memory, you will have to use dma-buf 
export/import etc).
Exactly that thinking is what we have currently found as 
blocker for a
virtualization projects. Having SVM as device independent 
feature which
somehow ties to the process address space turned out to be an 
extremely bad

idea.

The background is that this only works for some use cases but 
not all of

them.

What's working much better is to just have a mirror 
functionality which says
that a range A..B of the process address space is mapped into a 
range C..D

of the GPU address space.

Those ranges can then be used to implement the SVM feature 
required for
higher level APIs and not something you need at the UAPI or 
even inside the

low level kernel memory management.

When you talk about migrating memory to a device you also do 
this on a per
device basis and *not* tied to the process address space. If 
you then get
crappy performance because userspace gave contradicting 
information where to
migrate memory then that's a bug in userspace and not something 
the kernel

should try to prevent somehow.

[SNIP]
I think if you start using the same drm_gpuvm for multiple 
devices you
will sooner or later start to run into the same mess we have 
seen with
KFD, where we moved more and more functionality from the KFD 
to the DRM
render node because we found that a lot of the stuff simply 
doesn't work

correctly with a single object to maintain the state.
As I understand it, KFD is designed to work across devices. A 
single pseudo /dev/kfd device represent all hardware gpu 
devices. That is why during kfd open, many pdd (process device 
data) is created, each for one hardware device for this process.
Yes, I'm perfectly aware of that. And I can only repeat myself 
that I see
this design as a rather extreme failure. And I think it's one 
of the reasons

why NVidia is so dominant with Cuda.

This whole approach KFD takes was designed with the idea of 
extending the
CPU process into the GPUs, but this idea only works for a few 
use cases and

is not something we should apply to drivers in general.

A very good example are virtualization use cases where you end 
up with CPU
address != GPU address because the VAs are actually coming from 
the guest VM

and not the host process.

SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This 
should not have

any influence on the design of the kernel UAPI.

If you want to do something similar as KFD for Xe I think you 
need to get
explicit permission to do this from Dave and Daniel and maybe 
even Linus.
I think the one and only one exception where an SVM uapi like in 
kfd makes

sense, is if the _hardware_ itself, not the software stack defined
semantics that you've happened to build on top of that hw, 
enforces a 1:1

mapping with the cpu process address space.

Which means your hardware is using PASID, IOMMU based 
translation, PCI-ATS
(address translation services) or whatever your hw calls it and 
has _no_
device-side pagetables on top. Which from what I've seen all 
devices with
device-memory have, simply because they need some place to store 
whether
that memory is currently in device memory or should be 
translated using
PASID. Currently there's no gpu that works with PASID only, but 
there are

some on-cpu-die accelerator things that do work like that.

Maybe in the future there will be some accelerators that are 
fully cpu

cache coherent (including atomics) with something like CXL, and the
on-device memory is managed as normal system memory with struct 
page as
ZONE_DEVICE and accelerator va -> physical address translation 
is only
done with PASID ... but for now I haven't seen that, definitely 
not in

upstream drivers.

And the moment you have some per-device pagetables or per-device 
memory
management of some sort (like using gpuva mgr) then I'm 100% 
agreeing with
Christian that the kfd SVM model is too strict and not a great 
idea.


That basically means, without 

RE: Making drm_gpuvm work across gpu devices

2024-01-29 Thread Zeng, Oak
Hi Christian,

Even though this email thread was started to discuss shared virtual address 
space b/t multiple GPU devices, I eventually found that you don't even agree with 
a shared virtual address space b/t CPU and GPU program. So let's forget about 
multiple GPU devices for now. I will try to explain the shared address space b/t 
cpu and one gpu.

HMM was designed to solve the GPU programmability problem with a very 
fundamental assumption, which is that the GPU program shares the same virtual 
address space with the CPU program. For example, with HMM any CPU pointer (such 
as malloc'ed memory, stack variables and globals) can be used directly in your 
GPU shader program. Are you against this design goal? HMM is already part of 
linux core MM and Linus approved this design. CC'ed Jérôme.

Here is an example of how an application can use the system allocator (hmm); I 
copied it from 
https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/.
 CC'ed a few Nvidia folks.

void sortfile(FILE* fp, int N) {
  char* data;
  data = (char*)malloc(N);

  fread(data, 1, N, fp);
  qsort<<<...>>>(data, N, 1, cmp);
  cudaDeviceSynchronize();

  use_data(data);
  free(data);
}

As you can see, the malloc'ed pointer is used directly in the GPU program; no 
userptr ioctl, no vm_bind. This is the model Intel also wants to support, besides 
AMD and Nvidia.
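
For contrast, here is a rough sketch of the same flow under a userptr/vm_bind
style interface as described above. The ioctl wrappers and the gpu_va handling
are illustrative pseudo-code only, not a real uAPI:

void sortfile(FILE* fp, int N) {
  char* data;
  data = (char*)malloc(N);

  fread(data, 1, N, fp);

  /* Illustrative only: every cpu pointer must be registered and bound
   * to a gpu virtual address before the gpu may touch it. */
  userptr_ioctl(fd, data, N);      /* hypothetical wrapper */
  vm_bind(vm, data, N, gpu_va);    /* hypothetical wrapper */

  qsort<<<...>>>(gpu_va, N, 1, cmp);
  cudaDeviceSynchronize();

  use_data(data);
  free(data);
}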

Lastly, nouveau in the kernel already supports hmm and the system allocator. It 
also supports a shared virtual address space b/t CPU and GPU program. All the 
code is already merged upstream.


See also comments inline to your questions.

I will address your other email separately.

Regards,
Oak

From: Christian König 
Sent: Monday, January 29, 2024 5:11 AM
To: Zeng, Oak ; David Airlie 
Cc: Ghimiray, Himal Prasad ; 
thomas.hellst...@linux.intel.com; Winiarski, Michal 
; Felix Kuehling ; Welty, 
Brian ; Shah, Ankur N ; 
dri-devel@lists.freedesktop.org; intel...@lists.freedesktop.org; Gupta, 
saurabhg ; Danilo Krummrich ; Daniel 
Vetter ; Brost, Matthew ; Bommu, 
Krishnaiah ; Vishwanathapura, Niranjana 

Subject: Re: Making drm_gpuvm work across gpu devices

Am 26.01.24 um 21:13 schrieb Zeng, Oak:

-Original Message-

From: Christian König 


Sent: Friday, January 26, 2024 5:10 AM

To: Zeng, Oak ; David Airlie 


Cc: Ghimiray, Himal Prasad 
;

thomas.hellst...@linux.intel.com; 
Winiarski, Michal

; Felix Kuehling 
; Welty,

Brian ; Shah, Ankur N 
; dri-

de...@lists.freedesktop.org; 
intel...@lists.freedesktop.org; Gupta, 
saurabhg

; Danilo Krummrich 
; Daniel

Vetter ; Brost, Matthew 
; Bommu,

Krishnaiah ; 
Vishwanathapura, Niranjana



Subject: Re: Making drm_gpuvm work across gpu devices



Hi Oak,



you can still use SVM, but it should not be a design criteria for the

kernel UAPI. In other words the UAPI should be designed in such a way

that the GPU virtual address can be equal to the CPU virtual address of

a buffer, but can also be different to support use cases where this

isn't the case.



Terminology:

SVM: any technology which can achieve a shared virtual address space b/t cpu 
and devices. The virtual address space can be managed by user space or kernel 
space. Intel implemented an SVM based on the BO-centric gpu driver (gem-create, 
vm-bind), where the virtual address space is managed by UMD.

System allocator: another way of implementing SVM. Users just use malloc'ed 
memory for gpu submission. The virtual address space is managed by Linux core mm. 
In practice, we leverage HMM to implement the system allocator.

This article described details of all those different model: 
https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/



Our programming model allows a mixed use of the system allocator (even though 
the system allocator is ) and traditional vm_bind (where the cpu address can != 
the gpu address). Let me re-post the pseudo code:



 1.  Fd0 = open("/dev/dri/render0")

 2. Fd1 = open("/dev/dri/render1")

 3. Fd3 = open("/dev/dri/xe-svm")

 4. Gpu_Vm0 =xe_vm_create(fd0)

 5. Gpu_Vm1 = xe_vm_create(fd1)

 6. Queue0 = xe_exec_queue_create(fd0, gpu_vm0)

 7. Queue1 = xe_exec_queue_create(fd1, gpu_vm1)

 8. ptr = malloc()

 9. bo = xe_bo_create(fd0)

 10. Vm_bind(bo, gpu_vm0, va)//va is from UMD, cpu can access bo with same or 
different va. It is UMD's responsibility that va doesn't conflict with 
malloc'ed PTRs.

 11. Xe_exec(queue0, ptr)//submit gpu job which use ptr, on 

Re: [PATCH 0/5] drm/vmwgfx: Various kms related fixes

2024-01-29 Thread Ian Forbes
LGTM.

Reviewed-by: Ian Forbes 



Re: [PATCH RFC 0/4] Support for Simulated Panels

2024-01-29 Thread Abhinav Kumar



Hi Maxime

On 1/26/2024 4:45 AM, Maxime Ripard wrote:

On Wed, Jan 17, 2024 at 09:36:20AM -0800, Abhinav Kumar wrote:

Hi Jani and Maxime

On 1/17/2024 2:16 AM, Jani Nikula wrote:

On Wed, 17 Jan 2024, Maxime Ripard  wrote:

Hi,

On Tue, Jan 16, 2024 at 02:22:03PM -0800, Jessica Zhang wrote:

This series introduces a simulated MIPI DSI panel.

Currently, the only way to validate DSI connectors is with a physical
panel. Since obtaining physical panels for all possible DSI configurations
is logistically infeasible, introduce a way for DSI drivers to simulate a
panel.

This will be helpful in catching DSI misconfiguration bugs and catching
performance issues for high FPS panels that might not be easily
obtainable.

For now, the simulated panel driver only supports setting customized
modes via the panel_simlation.mode modparam. Eventually, we would like
to add more customizations (such as configuring DSC, dual DSI, etc.).


I think that it's more complicated than it needs to be.


Both too complicated and not complicated enough! :p


The end goal is to have a framework to be able to validate the display
pipeline with MIPI panels of any resolution , DSC/non-DSC, different MIPI
flags etc.

Historically, QC has had an in-house framework to validate
panels in a simulated way, as it is logistically not possible to procure every
panel from every vendor. This has been working pretty well, but it is not
upstream yet. So we would like to work with the community on a model
which works for everyone, and this RFC was initiated with that in mind.


I think the goal was pretty clear. My point was more that there's no
reason it should be driver specific, and having a second path for it
doesn't really exert the actual panel path in the driver. I think a
separate driver would be better.



We can make this generic. That would be great actually. One option could 
be to move the modparam we have within msm to drm_of.c so that 
drm_of_find_panel_or_bridge returns the sim panel if the modparam 
is passed to select a sim panel.


So if we make whether to use the real panel or the sim panel a compile-time 
decision and just enable the appropriate config, we don't need the 
modparam; we can implement some policy in drm_of to first check 
if a sim panel is available, and if not, try the real panel, and everything 
will just happen under the hood. But we thought that modparam-based 
switching might be convenient if users don't want to recompile the code 
to switch, though they will need to compile both panels.



There is simulation infrastructure in place in upstream for HDMI/DP in the
form of chamelium based testing in IGT but no such fwk exists for DSI
displays.

Different MIPI panels and resolutions test out not only the DSI controller
but the entire display pipeline as based on resolution, compression and MIPI
mode flags different parts of the pipeline can get exercised.


Why do we need to support (and switch to) both the actual and
"simulated" panel?



As per my discussion on IRC with the panel/bridge maintainers and DT
maintainers, a simulation panel does not qualify for its own devicetree as
it is not real hardware, so we needed to come up with a way to have a module
which can be attached to the encoder without its own bindings and
devicetree. That is what led to this RFC.


I still think it's worth trying, there's plenty of virtual drivers in
the DT already. But even then, DT policies shouldn't dictate general
framework design decisions: we have other ways to probe panels than
using the DT (by loading overlays, registering devices by hand, etc.). I
still think it would be a good idea to try though.



The DT option would be great if accepted and would nicely solve the 
scalability issue here, which this desperately needs.


I have absolutely no concerns and would be glad if it will be accepted.

Can the DT maintainers please comment on whether a device tree for a 
simulation panel would work OR be considered, given the scalability 
of the number of panels which can be tried, as Maxime wrote?



Wouldn't it be simpler if we had a vkms-like panel that we could either
configure from DT or from debugfs that would just be registered the
usual way and would be the only panel we register?




No, we need to validate the actual hardware pipeline with the simulated
panel. With vkms, the actual display pipeline will not be validated.
Display pipeline misconfigurations arising from different panel
combinations can easily be caught with existing IGT CRC testing.
In addition, all performance-related bugs can also be easily caught by
simulating high resolution displays.


That's not what I meant. What I meant was that something like a
user-configurable, generic, panel driver would be a good idea. Just like
vkms (with the debugfs patches) is for a full blown KMS device.



Let me respond to both this question and the one below from you/Jani.

Certainly having user-configurable information is a 

Re: Making drm_gpuvm work across gpu devices

2024-01-29 Thread Christian König

Am 29.01.24 um 18:52 schrieb Felix Kuehling:

On 2024-01-29 11:28, Christian König wrote:

Am 29.01.24 um 17:24 schrieb Felix Kuehling:

On 2024-01-29 10:33, Christian König wrote:

Am 29.01.24 um 16:03 schrieb Felix Kuehling:

On 2024-01-25 13:32, Daniel Vetter wrote:

On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:

Am 23.01.24 um 20:37 schrieb Zeng, Oak:

[SNIP]
Yes most API are per device based.

One exception I know is actually the kfd SVM API. If you look 
at the svm_ioctl function, it is per-process based. Each 
kfd_process represent a process across N gpu devices.
Yeah and that was a big mistake in my opinion. We should really 
not do that

ever again.

Need to say, kfd SVM represent a shared virtual address space 
across CPU and all GPU devices on the system. This is by the 
definition of SVM (shared virtual memory). This is very 
different from our legacy gpu *device* driver which works for 
only one device (i.e., if you want one device to access another 
device's memory, you will have to use dma-buf export/import etc).
Exactly that thinking is what we have currently found as blocker 
for a
virtualization projects. Having SVM as device independent 
feature which
somehow ties to the process address space turned out to be an 
extremely bad

idea.

The background is that this only works for some use cases but 
not all of

them.

What's working much better is to just have a mirror 
functionality which says
that a range A..B of the process address space is mapped into a 
range C..D

of the GPU address space.

Those ranges can then be used to implement the SVM feature 
required for
higher level APIs and not something you need at the UAPI or even 
inside the

low level kernel memory management.

When you talk about migrating memory to a device you also do 
this on a per
device basis and *not* tied to the process address space. If you 
then get
crappy performance because userspace gave contradicting 
information where to
migrate memory then that's a bug in userspace and not something 
the kernel

should try to prevent somehow.

[SNIP]
I think if you start using the same drm_gpuvm for multiple 
devices you
will sooner or later start to run into the same mess we have 
seen with
KFD, where we moved more and more functionality from the KFD 
to the DRM
render node because we found that a lot of the stuff simply 
doesn't work

correctly with a single object to maintain the state.
As I understand it, KFD is designed to work across devices. A 
single pseudo /dev/kfd device represent all hardware gpu 
devices. That is why during kfd open, many pdd (process device 
data) is created, each for one hardware device for this process.
Yes, I'm perfectly aware of that. And I can only repeat myself 
that I see
this design as a rather extreme failure. And I think it's one of 
the reasons

why NVidia is so dominant with Cuda.

This whole approach KFD takes was designed with the idea of 
extending the
CPU process into the GPUs, but this idea only works for a few 
use cases and

is not something we should apply to drivers in general.

A very good example are virtualization use cases where you end 
up with CPU
address != GPU address because the VAs are actually coming from 
the guest VM

and not the host process.

SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This 
should not have

any influence on the design of the kernel UAPI.

If you want to do something similar as KFD for Xe I think you 
need to get
explicit permission to do this from Dave and Daniel and maybe 
even Linus.
I think the one and only one exception where an SVM uapi like in 
kfd makes

sense, is if the _hardware_ itself, not the software stack defined
semantics that you've happened to build on top of that hw, 
enforces a 1:1

mapping with the cpu process address space.

Which means your hardware is using PASID, IOMMU based 
translation, PCI-ATS
(address translation services) or whatever your hw calls it and 
has _no_
device-side pagetables on top. Which from what I've seen all 
devices with
device-memory have, simply because they need some place to store 
whether
that memory is currently in device memory or should be translated 
using
PASID. Currently there's no gpu that works with PASID only, but 
there are

some on-cpu-die accelerator things that do work like that.

Maybe in the future there will be some accelerators that are 
fully cpu

cache coherent (including atomics) with something like CXL, and the
on-device memory is managed as normal system memory with struct 
page as
ZONE_DEVICE and accelerator va -> physical address translation is 
only
done with PASID ... but for now I haven't seen that, definitely 
not in

upstream drivers.

And the moment you have some per-device pagetables or per-device 
memory
management of some sort (like using gpuva mgr) then I'm 100% 
agreeing with

Christian that the kfd SVM model is too strict and not a great idea.


That basically means, without ATS/PRI+PASID you cannot implement a 
unified 

Re: [PATCH] drm/sched: Drain all entities in DRM sched run job worker

2024-01-29 Thread Matthew Brost
On Mon, Jan 29, 2024 at 12:10:52PM -0500, Luben Tuikov wrote:
> On 2024-01-29 02:44, Christian König wrote:
> > Am 26.01.24 um 17:29 schrieb Matthew Brost:
> >> On Fri, Jan 26, 2024 at 11:32:57AM +0100, Christian König wrote:
> >>> Am 25.01.24 um 18:30 schrieb Matthew Brost:
>  On Thu, Jan 25, 2024 at 04:12:58PM +0100, Christian König wrote:
> > Am 24.01.24 um 22:08 schrieb Matthew Brost:
> >> All entities must be drained in the DRM scheduler run job worker to
> >> avoid the following case. An entity found that is ready, no job found
> >> ready on entity, and run job worker goes idle with other entities + 
> >> jobs
> >> ready. Draining all ready entities (i.e. loop over all ready entities)
> >> in the run job worker ensures all job that are ready will be scheduled.
> > That doesn't make sense. drm_sched_select_entity() only returns entities
> > which are "ready", e.g. have a job to run.
> >
>  That is what I thought too, hence my original design but it is not
>  exactly true. Let me explain.
> 
>  drm_sched_select_entity() returns an entity with a non-empty spsc queue
>  (job in queue) and no *current* waiting dependencies [1]. Dependencies for
>  an entity can be added when drm_sched_entity_pop_job() is called [2][3]
>  returning a NULL job. Thus we can get into a scenario where 2 entities
>  A and B both have jobs and no current dependencies. A's job is waiting on
>  B's job, entity A gets selected first, a dependency gets installed in
>  drm_sched_entity_pop_job(), run work goes idle, and now we deadlock.
> >>> And here is the real problem. run work doesn't go idle at that moment.
> >>>
> >>> drm_sched_run_job_work() should restart itself until there is either no
> >>> more space in the ring buffer or it can't find a ready entity any more.
> >>>
> >>> At least that was the original design when that was all still driven by a
> >>> kthread.
> >>>
> >>> It can perfectly be that we messed this up when switching from kthread to 
> >>> a
> >>> work item.
> >>>
> >> Right, that's what this patch does - the run worker does not go idle until
> >> no ready entities are found. That was incorrect in the original patch
> >> and fixed here. Do you have any issues with this fix? It has been tested
> >> three times and clearly fixes the issue.
> > 
> > Ah! Yes in this case that patch here is a little bit ugly as well.
> > 
> > The original idea was that run_job restarts so that we are able to pause 
> > the submission thread without searching for an entity to submit more.
> > 
> I strongly suggest replacing the while loop with a call to 
> > drm_sched_run_job_queue() so that when the entity can't provide a job we 
> > just restart the queuing work.
> 
> I agree with Christian. This more closely preserves the original design
> of the GPU schedulers, so we should go with that.
> -- 
> Regards,
> Luben

As this patch is already in rc2, I will post a patch shortly replacing the
loop with a re-queuing design.

Thanks,
Matt


Re: Making drm_gpuvm work across gpu devices

2024-01-29 Thread Felix Kuehling



On 2024-01-29 11:28, Christian König wrote:

Am 29.01.24 um 17:24 schrieb Felix Kuehling:

On 2024-01-29 10:33, Christian König wrote:

Am 29.01.24 um 16:03 schrieb Felix Kuehling:

On 2024-01-25 13:32, Daniel Vetter wrote:

On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:

Am 23.01.24 um 20:37 schrieb Zeng, Oak:

[SNIP]
Yes, most APIs are per-device based.

One exception I know is actually the kfd SVM API. If you look at 
the svm_ioctl function, it is per-process based. Each 
kfd_process represents a process across N gpu devices.
Yeah and that was a big mistake in my opinion. We should really 
not do that ever again.

Need to say, kfd SVM represents a shared virtual address space 
across CPU and all GPU devices on the system. This is by the 
definition of SVM (shared virtual memory). This is very 
different from our legacy gpu *device* driver which works for 
only one device (i.e., if you want one device to access another 
device's memory, you will have to use dma-buf export/import etc).
Exactly that thinking is what we have currently found as a 
blocker for virtualization projects. Having SVM as a 
device-independent feature which somehow ties to the process 
address space turned out to be an extremely bad idea.

The background is that this only works for some use cases but 
not all of them.

What's working much better is to just have a mirror functionality 
which says that a range A..B of the process address space is 
mapped into a range C..D of the GPU address space.

Those ranges can then be used to implement the SVM feature 
required for higher level APIs and not something you need at the 
UAPI or even inside the low level kernel memory management.

When you talk about migrating memory to a device you also do this 
on a per device basis and *not* tied to the process address 
space. If you then get crappy performance because userspace gave 
contradicting information where to migrate memory then that's a 
bug in userspace and not something the kernel should try to 
prevent somehow.

[SNIP]
I think if you start using the same drm_gpuvm for multiple 
devices you will sooner or later start to run into the same mess 
we have seen with KFD, where we moved more and more functionality 
from the KFD to the DRM render node because we found that a lot 
of the stuff simply doesn't work correctly with a single object 
to maintain the state.
As I understand it, KFD is designed to work across devices. A 
single pseudo /dev/kfd device represents all hardware gpu 
devices. That is why during kfd open, many pdds (process device 
data) are created, one for each hardware device for this process.
Yes, I'm perfectly aware of that. And I can only repeat myself 
that I see this design as a rather extreme failure. And I think 
it's one of the reasons why NVidia is so dominant with Cuda.

This whole approach KFD takes was designed with the idea of 
extending the CPU process into the GPUs, but this idea only works 
for a few use cases and is not something we should apply to 
drivers in general.

A very good example is virtualization use cases where you end up 
with CPU address != GPU address, because the VAs are actually 
coming from the guest VM and not the host process.

SVM is a high-level concept of OpenCL, Cuda, ROCm etc. This 
should not have any influence on the design of the kernel UAPI.

If you want to do something similar to KFD for Xe I think you 
need to get explicit permission to do this from Dave and Daniel 
and maybe even Linus.
I think the one and only exception where an SVM uapi like in kfd 
makes sense is if the _hardware_ itself, not the software stack 
defined semantics that you've happened to build on top of that 
hw, enforces a 1:1 mapping with the cpu process address space.

Which means your hardware is using PASID, IOMMU based 
translation, PCI-ATS (address translation services) or whatever 
your hw calls it and has _no_ device-side pagetables on top. 
Which, from what I've seen, all devices with device-memory have, 
simply because they need some place to store whether that memory 
is currently in device memory or should be translated using 
PASID. Currently there's no gpu that works with PASID only, but 
there are some on-cpu-die accelerator things that do work like 
that.

Maybe in the future there will be some accelerators that are 
fully cpu cache coherent (including atomics) with something like 
CXL, and the on-device memory is managed as normal system memory 
with struct page as ZONE_DEVICE and accelerator va -> physical 
address translation is only done with PASID ... but for now I 
haven't seen that, definitely not in upstream drivers.

And the moment you have some per-device pagetables or per-device 
memory management of some sort (like using gpuva mgr) then I'm 
100% agreeing with Christian that the kfd SVM model is too strict 
and not a great idea.


That basically means, without ATS/PRI+PASID you cannot implement a 
unified memory programming model, where GPUs or 
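
As a concrete illustration of the mirror idea discussed above, a minimal
sketch with purely hypothetical names (this is not an existing API):

/* Userspace (or an SVM runtime above the kernel) asks the per-device
 * driver to mirror a CPU VA range into a GPU VA range, instead of the
 * kernel assuming CPU VA == GPU VA for the whole process. */
struct gpu_mirror_range {
	u64 cpu_start;	/* A..B in the process address space */
	u64 cpu_end;
	u64 gpu_start;	/* C..D in this device's GPU address space */
};

int gpu_vm_mirror_range(struct gpu_vm *vm, const struct gpu_mirror_range *r);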

[PATCH 6.6 259/331] drm: Disable the cursor plane on atomic contexts with virtualized drivers

2024-01-29 Thread Greg Kroah-Hartman
6.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Zack Rusin 

commit 4e3b70da64a53784683cfcbac2deda5d6e540407 upstream.

Cursor planes on virtualized drivers have special meaning and require
that the clients handle them in specific ways, e.g. the cursor plane
should react to the mouse movement the way a mouse cursor would be
expected to and the client is required to set hotspot properties on it
in order for the mouse events to be routed correctly.

This breaks the contract as specified by the "universal planes". Fix it
by disabling the cursor planes on virtualized drivers while adding
a foundation on top of which it's possible to special case mouse cursor
planes for clients that want it.

Disabling the cursor planes makes some kms compositors which were broken,
e.g. Weston, fallback to software cursor which works fine or at least
better than currently while having no effect on others, e.g. gnome-shell
or kwin, which put virtualized drivers on a deny-list when running in
atomic context to make them fallback to legacy kms and avoid this issue.

Signed-off-by: Zack Rusin 
Fixes: 681e7ec73044 ("drm: Allow userspace to ask for universal plane list 
(v2)")
Cc:  # v5.4+
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Dave Airlie 
Cc: Gerd Hoffmann 
Cc: Hans de Goede 
Cc: Gurchetan Singh 
Cc: Chia-I Wu 
Cc: dri-devel@lists.freedesktop.org
Cc: virtualizat...@lists.linux-foundation.org
Cc: spice-de...@lists.freedesktop.org
Acked-by: Pekka Paalanen 
Reviewed-by: Javier Martinez Canillas 
Acked-by: Simon Ser 
Signed-off-by: Javier Martinez Canillas 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20231023074613.41327-2-aest...@redhat.com
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/drm_plane.c  |   13 +
 drivers/gpu/drm/qxl/qxl_drv.c|2 +-
 drivers/gpu/drm/vboxvideo/vbox_drv.c |2 +-
 drivers/gpu/drm/virtio/virtgpu_drv.c |2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |2 +-
 include/drm/drm_drv.h|9 +
 include/drm/drm_file.h   |   12 
 7 files changed, 38 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/drm_plane.c
+++ b/drivers/gpu/drm/drm_plane.c
@@ -678,6 +678,19 @@ int drm_mode_getplane_res(struct drm_dev
!file_priv->universal_planes)
continue;
 
+   /*
+* If we're running on a virtualized driver then,
+* unless userspace advertizes support for the
+* virtualized cursor plane, disable cursor planes
+* because they'll be broken due to missing cursor
+* hotspot info.
+*/
+   if (plane->type == DRM_PLANE_TYPE_CURSOR &&
+   drm_core_check_feature(dev, DRIVER_CURSOR_HOTSPOT) &&
+   file_priv->atomic &&
+   !file_priv->supports_virtualized_cursor_plane)
+   continue;
+
if (drm_lease_held(file_priv, plane->base.id)) {
if (count < plane_resp->count_planes &&
put_user(plane->base.id, plane_ptr + count))
--- a/drivers/gpu/drm/qxl/qxl_drv.c
+++ b/drivers/gpu/drm/qxl/qxl_drv.c
@@ -283,7 +283,7 @@ static const struct drm_ioctl_desc qxl_i
 };
 
 static struct drm_driver qxl_driver = {
-   .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC,
+   .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_CURSOR_HOTSPOT,
 
.dumb_create = qxl_mode_dumb_create,
.dumb_map_offset = drm_gem_ttm_dumb_map_offset,
--- a/drivers/gpu/drm/vboxvideo/vbox_drv.c
+++ b/drivers/gpu/drm/vboxvideo/vbox_drv.c
@@ -182,7 +182,7 @@ DEFINE_DRM_GEM_FOPS(vbox_fops);
 
 static const struct drm_driver driver = {
.driver_features =
-   DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC,
+   DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC | DRIVER_CURSOR_HOTSPOT,
 
.fops = &vbox_fops,
.name = DRIVER_NAME,
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -177,7 +177,7 @@ static const struct drm_driver driver =
 * out via drm_device::driver_features:
 */
.driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | DRIVER_ATOMIC |
-  DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE,
+  DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE | DRIVER_CURSOR_HOTSPOT,
.open = virtio_gpu_driver_open,
.postclose = virtio_gpu_driver_postclose,
 
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -1611,7 +1611,7 @@ static const struct file_operations vmwg
 
 static const struct drm_driver driver = {
.driver_features =
-   DRIVER_MODESET | DRIVER_RENDER | DRIVER_ATOMIC | DRIVER_GEM,
+   DRIVER_MODESET | DRIVER_RENDER | 
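
For illustration, the userspace side this gating expects, as a minimal
sketch: an atomic client opts back in to the cursor plane via the client
cap added in this series (the DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT name is
assumed from that series) and then keeps the HOTSPOT_X/HOTSPOT_Y plane
properties up to date.

#include <xf86drm.h>

/* Hypothetical helper: call before any atomic commit that touches
 * the cursor plane on a virtualized driver. */
static int enable_virtualized_cursor(int drm_fd)
{
	/* Promise the kernel we will set the cursor hotspot properties
	 * so mouse events can be routed correctly by the hypervisor. */
	return drmSetClientCap(drm_fd, DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT, 1);
}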

[PATCH 6.6 257/331] drm: Fix TODO list mentioning non-KMS drivers

2024-01-29 Thread Greg Kroah-Hartman
6.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Thomas Zimmermann 

commit 9cf5ca1f485cae406968947a92bf304603999fa1 upstream.

Non-KMS drivers have been removed from DRM. Update the TODO list
accordingly.

Signed-off-by: Thomas Zimmermann 
Fixes: a276afc19eec ("drm: Remove some obsolete drm pciids(tdfx, mga, i810, 
savage, r128, sis, via)")
Cc: Cai Huoqing 
Cc: Daniel Vetter 
Cc: Dave Airlie 
Cc: Thomas Zimmermann 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Jonathan Corbet 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.3+
Cc: linux-...@vger.kernel.org
Reviewed-by: David Airlie 
Reviewed-by: Daniel Vetter 
Acked-by: Alex Deucher 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20231122122449.11588-3-tzimmerm...@suse.de
Signed-off-by: Greg Kroah-Hartman 
---
 Documentation/gpu/todo.rst | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst
index 503d57c75215..41a264bf84ce 100644
--- a/Documentation/gpu/todo.rst
+++ b/Documentation/gpu/todo.rst
@@ -337,8 +337,8 @@ connector register/unregister fixes
 
 Level: Intermediate
 
-Remove load/unload callbacks from all non-DRIVER_LEGACY drivers
----------------------------------------------------------------
+Remove load/unload callbacks
+----------------------------
 
The load/unload callbacks in struct &drm_driver are very much midlayers, plus
 for historical reasons they get the ordering wrong (and we can't fix that)
@@ -347,8 +347,7 @@ between setting up the &drm_driver structure and calling drm_dev_register().
 - Rework drivers to no longer use the load/unload callbacks, directly coding the
   load/unload sequence into the driver's probe function.
 
-- Once all non-DRIVER_LEGACY drivers are converted, disallow the load/unload
-  callbacks for all modern drivers.
+- Once all drivers are converted, remove the load/unload callbacks.
 
 Contact: Daniel Vetter
 
-- 
2.43.0





Re: [PATCH v3 3/3] dt-bindings: mfd: atmel,hlcdc: Convert to DT schema format

2024-01-29 Thread Conor Dooley
On Mon, Jan 29, 2024 at 03:41:22AM +, dharm...@microchip.com wrote:
> I will proceed with updating the clock names to include "lvds pll" and 
> adjusting the clocks minitems to 3. Does this seem appropriate to you?
> 
> Please let me know if there are any additional considerations or 
> specific aspects that require attention.

That seems okay, thanks.




Re: [PATCH] drm/sched: Drain all entities in DRM sched run job worker

2024-01-29 Thread Luben Tuikov
On 2024-01-29 02:44, Christian König wrote:
> Am 26.01.24 um 17:29 schrieb Matthew Brost:
>> On Fri, Jan 26, 2024 at 11:32:57AM +0100, Christian König wrote:
>>> Am 25.01.24 um 18:30 schrieb Matthew Brost:
 On Thu, Jan 25, 2024 at 04:12:58PM +0100, Christian König wrote:
> Am 24.01.24 um 22:08 schrieb Matthew Brost:
>> All entities must be drained in the DRM scheduler run job worker to
>> avoid the following case. An entity found that is ready, no job found
>> ready on entity, and run job worker goes idle with other entities + jobs
>> ready. Draining all ready entities (i.e. loop over all ready entities)
>> in the run job worker ensures all jobs that are ready will be scheduled.
> That doesn't make sense. drm_sched_select_entity() only returns entities
> which are "ready", e.g. have a job to run.
>
 That is what I thought too, hence my original design but it is not
 exactly true. Let me explain.

 drm_sched_select_entity() returns an entity with a non-empty spsc queue
 (job in queue) and no *current* waiting dependencies [1]. Dependencies for
 an entity can be added when drm_sched_entity_pop_job() is called [2][3]
 returning a NULL job. Thus we can get into a scenario where 2 entities
 A and B both have jobs and no current dependencies. A's job is waiting on
 B's job, entity A gets selected first, a dependency gets installed in
 drm_sched_entity_pop_job(), run work goes idle, and now we deadlock.
>>> And here is the real problem. run work doesn't go idle at that moment.
>>>
>>> drm_sched_run_job_work() should restart itself until there is either no
>>> more space in the ring buffer or it can't find a ready entity any more.
>>>
>>> At least that was the original design when that was all still driven by a
>>> kthread.
>>>
>>> It can perfectly be that we messed this up when switching from kthread to a
>>> work item.
>>>
>> Right, that's what this patch does - the run worker does not go idle until
>> no ready entities are found. That was incorrect in the original patch
>> and fixed here. Do you have any issues with this fix? It has been tested
>> three times and clearly fixes the issue.
> 
> Ah! Yes in this case that patch here is a little bit ugly as well.
> 
> The original idea was that run_job restarts so that we are able to pause 
> the submission thread without searching for an entity to submit more.
> 
> I strongly suggest replacing the while loop with a call to 
> drm_sched_run_job_queue() so that when the entity can't provide a job we 
> just restart the queuing work.

I agree with Christian. This more closely preserves the original design
of the GPU schedulers, so we should go with that.
-- 
Regards,
Luben




[PATCH 6.7 253/346] drm: Disable the cursor plane on atomic contexts with virtualized drivers

2024-01-29 Thread Greg Kroah-Hartman
6.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Zack Rusin 

commit 4e3b70da64a53784683cfcbac2deda5d6e540407 upstream.

Cursor planes on virtualized drivers have special meaning and require
that the clients handle them in specific ways, e.g. the cursor plane
should react to the mouse movement the way a mouse cursor would be
expected to and the client is required to set hotspot properties on it
in order for the mouse events to be routed correctly.

This breaks the contract as specified by the "universal planes". Fix it
by disabling the cursor planes on virtualized drivers while adding
a foundation on top of which it's possible to special case mouse cursor
planes for clients that want it.

Disabling the cursor planes makes some kms compositors which were broken,
e.g. Weston, fallback to software cursor which works fine or at least
better than currently while having no effect on others, e.g. gnome-shell
or kwin, which put virtualized drivers on a deny-list when running in
atomic context to make them fallback to legacy kms and avoid this issue.

Signed-off-by: Zack Rusin 
Fixes: 681e7ec73044 ("drm: Allow userspace to ask for universal plane list 
(v2)")
Cc:  # v5.4+
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Dave Airlie 
Cc: Gerd Hoffmann 
Cc: Hans de Goede 
Cc: Gurchetan Singh 
Cc: Chia-I Wu 
Cc: dri-devel@lists.freedesktop.org
Cc: virtualizat...@lists.linux-foundation.org
Cc: spice-de...@lists.freedesktop.org
Acked-by: Pekka Paalanen 
Reviewed-by: Javier Martinez Canillas 
Acked-by: Simon Ser 
Signed-off-by: Javier Martinez Canillas 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20231023074613.41327-2-aest...@redhat.com
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/drm_plane.c  |   13 +
 drivers/gpu/drm/qxl/qxl_drv.c|2 +-
 drivers/gpu/drm/vboxvideo/vbox_drv.c |2 +-
 drivers/gpu/drm/virtio/virtgpu_drv.c |2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |2 +-
 include/drm/drm_drv.h|9 +
 include/drm/drm_file.h   |   12 
 7 files changed, 38 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/drm_plane.c
+++ b/drivers/gpu/drm/drm_plane.c
@@ -678,6 +678,19 @@ int drm_mode_getplane_res(struct drm_dev
!file_priv->universal_planes)
continue;
 
+   /*
+* If we're running on a virtualized driver then,
+* unless userspace advertizes support for the
+* virtualized cursor plane, disable cursor planes
+* because they'll be broken due to missing cursor
+* hotspot info.
+*/
+   if (plane->type == DRM_PLANE_TYPE_CURSOR &&
+   drm_core_check_feature(dev, DRIVER_CURSOR_HOTSPOT) &&
+   file_priv->atomic &&
+   !file_priv->supports_virtualized_cursor_plane)
+   continue;
+
if (drm_lease_held(file_priv, plane->base.id)) {
if (count < plane_resp->count_planes &&
put_user(plane->base.id, plane_ptr + count))
--- a/drivers/gpu/drm/qxl/qxl_drv.c
+++ b/drivers/gpu/drm/qxl/qxl_drv.c
@@ -285,7 +285,7 @@ static const struct drm_ioctl_desc qxl_i
 };
 
 static struct drm_driver qxl_driver = {
-   .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC,
+   .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_CURSOR_HOTSPOT,
 
.dumb_create = qxl_mode_dumb_create,
.dumb_map_offset = drm_gem_ttm_dumb_map_offset,
--- a/drivers/gpu/drm/vboxvideo/vbox_drv.c
+++ b/drivers/gpu/drm/vboxvideo/vbox_drv.c
@@ -182,7 +182,7 @@ DEFINE_DRM_GEM_FOPS(vbox_fops);
 
 static const struct drm_driver driver = {
.driver_features =
-   DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC,
+   DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC | DRIVER_CURSOR_HOTSPOT,
 
.fops = &vbox_fops,
.name = DRIVER_NAME,
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -177,7 +177,7 @@ static const struct drm_driver driver =
 * out via drm_device::driver_features:
 */
.driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | DRIVER_ATOMIC |
-  DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE,
+  DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE | DRIVER_CURSOR_HOTSPOT,
.open = virtio_gpu_driver_open,
.postclose = virtio_gpu_driver_postclose,
 
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -1611,7 +1611,7 @@ static const struct file_operations vmwg
 
 static const struct drm_driver driver = {
.driver_features =
-   DRIVER_MODESET | DRIVER_RENDER | DRIVER_ATOMIC | DRIVER_GEM,
+   DRIVER_MODESET | DRIVER_RENDER | 

[PATCH 6.7 251/346] drm: Fix TODO list mentioning non-KMS drivers

2024-01-29 Thread Greg Kroah-Hartman
6.7-stable review patch.  If anyone has any objections, please let me know.

--

From: Thomas Zimmermann 

commit 9cf5ca1f485cae406968947a92bf304603999fa1 upstream.

Non-KMS drivers have been removed from DRM. Update the TODO list
accordingly.

Signed-off-by: Thomas Zimmermann 
Fixes: a276afc19eec ("drm: Remove some obsolete drm pciids(tdfx, mga, i810, 
savage, r128, sis, via)")
Cc: Cai Huoqing 
Cc: Daniel Vetter 
Cc: Dave Airlie 
Cc: Thomas Zimmermann 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Jonathan Corbet 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v6.3+
Cc: linux-...@vger.kernel.org
Reviewed-by: David Airlie 
Reviewed-by: Daniel Vetter 
Acked-by: Alex Deucher 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20231122122449.11588-3-tzimmerm...@suse.de
Signed-off-by: Greg Kroah-Hartman 
---
 Documentation/gpu/todo.rst |7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/Documentation/gpu/todo.rst
+++ b/Documentation/gpu/todo.rst
@@ -337,8 +337,8 @@ connector register/unregister fixes
 
 Level: Intermediate
 
-Remove load/unload callbacks from all non-DRIVER_LEGACY drivers
----------------------------------------------------------------
+Remove load/unload callbacks
+----------------------------
 
The load/unload callbacks in struct &drm_driver are very much midlayers, plus
 for historical reasons they get the ordering wrong (and we can't fix that)
@@ -347,8 +347,7 @@ between setting up the &drm_driver structure and calling drm_dev_register().
 - Rework drivers to no longer use the load/unload callbacks, directly coding the
   load/unload sequence into the driver's probe function.
 
-- Once all non-DRIVER_LEGACY drivers are converted, disallow the load/unload
-  callbacks for all modern drivers.
+- Once all drivers are converted, remove the load/unload callbacks.
 
 Contact: Daniel Vetter
 




[PATCH v6 6/6] Documentation: iio: Document high-speed DMABUF based API

2024-01-29 Thread Paul Cercueil
Document the new DMABUF based API.

Signed-off-by: Paul Cercueil 

---
v2: - Explicitly state that the new interface is optional and is
  not implemented by all drivers.
- The IOCTLs can now only be called on the buffer FD returned by
  IIO_BUFFER_GET_FD_IOCTL.
- Move the page up a bit in the index since it is core stuff and not
  driver-specific.

v3: Update the documentation to reflect the new API.

v5: Use description lists for the documentation of the three new IOCTLs
instead of abusing subsections.
---
 Documentation/iio/dmabuf_api.rst | 54 
 Documentation/iio/index.rst  |  2 ++
 2 files changed, 56 insertions(+)
 create mode 100644 Documentation/iio/dmabuf_api.rst

diff --git a/Documentation/iio/dmabuf_api.rst b/Documentation/iio/dmabuf_api.rst
new file mode 100644
index ..1cd6cd51a582
--- /dev/null
+++ b/Documentation/iio/dmabuf_api.rst
@@ -0,0 +1,54 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+High-speed DMABUF interface for IIO
+===
+
+1. Overview
+===
+
+The Industrial I/O subsystem supports access to buffers through a
+file-based interface, with read() and write() access calls through the
+IIO device's dev node.
+
+It additionally supports a DMABUF based interface, where the userspace
can attach DMABUF objects (externally created) to an IIO buffer, and
+subsequently use them for data transfers.
+
+A userspace application can then use this interface to share DMABUF
+objects between several interfaces, allowing it to transfer data in a
+zero-copy fashion, for instance between IIO and the USB stack.
+
+The userspace application can also memory-map the DMABUF objects, and
+access the sample data directly. The advantage of doing this vs. the
+read() interface is that it avoids an extra copy of the data between the
+kernel and userspace. This is particularly useful for high-speed devices
+which produce several megabytes or even gigabytes of data per second.
+It does however increase the userspace-kernelspace synchronization
+overhead, as the DMA_BUF_SYNC_START and DMA_BUF_SYNC_END IOCTLs have to
+be used for data integrity.
+
+2. User API
+===
+
+As part of this interface, three new IOCTLs have been added. These three
+IOCTLs have to be performed on the IIO buffer's file descriptor,
+obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
+
+  ``IIO_BUFFER_DMABUF_ATTACH_IOCTL(int)``
+Attach the DMABUF object, identified by its file descriptor, to the
+IIO buffer. Returns zero on success, and a negative errno value on
+error.
+
+  ``IIO_BUFFER_DMABUF_DETACH_IOCTL(int)``
+Detach the given DMABUF object, identified by its file descriptor,
+from the IIO buffer. Returns zero on success, and a negative errno
+value on error.
+
+Note that closing the IIO buffer's file descriptor will
+automatically detach all previously attached DMABUF objects.
+
+  ``IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *iio_dmabuf)``
+Enqueue a previously attached DMABUF object to the buffer queue.
+Enqueued DMABUFs will be read from (if output buffer) or written to
+(if input buffer) as long as the buffer is enabled.
diff --git a/Documentation/iio/index.rst b/Documentation/iio/index.rst
index 1b7292c58cd0..3eae8fcb1938 100644
--- a/Documentation/iio/index.rst
+++ b/Documentation/iio/index.rst
@@ -9,6 +9,8 @@ Industrial I/O
 
iio_configfs
 
+   dmabuf_api
+
ep93xx_adc
 
bno055
-- 
2.43.0
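
For illustration, a minimal userspace sketch of the flow documented above.
The ioctl names and the iio_dmabuf layout are assumed from patch 3/6 of
this series; the DMABUF fd is assumed to come from an external exporter
(e.g. udmabuf or the USB stack), and error handling is omitted.

#include <sys/ioctl.h>
#include <linux/iio/buffer.h>

static int attach_and_enqueue(int buffer_fd, int dmabuf_fd, __u64 nbytes)
{
	struct iio_dmabuf req = {
		.fd = dmabuf_fd,
		.bytes_used = nbytes,
	};

	/* One-time: attach the externally created DMABUF to the buffer. */
	if (ioctl(buffer_fd, IIO_BUFFER_DMABUF_ATTACH_IOCTL, &dmabuf_fd) < 0)
		return -1;

	/* Queue one transfer; it runs once the buffer is enabled. */
	return ioctl(buffer_fd, IIO_BUFFER_DMABUF_ENQUEUE_IOCTL, &req);
}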



[PATCH v6 4/6] iio: buffer-dma: Enable support for DMABUFs

2024-01-29 Thread Paul Cercueil
Implement iio_dma_buffer_attach_dmabuf(), iio_dma_buffer_detach_dmabuf()
and iio_dma_buffer_transfer_dmabuf(), which can then be used by the IIO
DMA buffer implementations.

Signed-off-by: Paul Cercueil 

---
v3: Update code to provide the functions that will be used as callbacks
for the new IOCTLs.

v6: - Update iio_dma_buffer_enqueue_dmabuf() to take a dma_fence pointer
- Pass that dma_fence pointer along to
  iio_buffer_signal_dmabuf_done()
- Add iio_dma_buffer_lock_queue() / iio_dma_buffer_unlock_queue()
- Do not lock the queue in iio_dma_buffer_enqueue_dmabuf().
  The caller will ensure that it has been locked already.
- Replace "int += bool;" by "if (bool) int++;"
- Use dma_fence_begin/end_signalling in the dma_fence critical
  sections
- Use one "num_dmabufs" field instead of one "num_blocks" and one
  "num_fileio_blocks". Make it an atomic_t, which makes it possible
  to decrement it atomically in iio_buffer_block_release() without
  having to lock the queue mutex; and in turn, it means that we
  don't need to use iio_buffer_block_put_atomic() everywhere to
  avoid locking the queue mutex twice.
- Use cleanup.h guard(mutex) when possible
- Explicitly list all states in the switch in
  iio_dma_can_enqueue_block()
- Rename iio_dma_buffer_fileio_mode() to
  iio_dma_buffer_can_use_fileio(), and add a comment explaining why
  it cannot race vs. DMABUF.
---
 drivers/iio/buffer/industrialio-buffer-dma.c | 181 +--
 include/linux/iio/buffer-dma.h   |  31 
 2 files changed, 201 insertions(+), 11 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c 
b/drivers/iio/buffer/industrialio-buffer-dma.c
index 5610ba67925e..c0f539af98f9 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -4,6 +4,8 @@
  *  Author: Lars-Peter Clausen 
  */
 
+#include <linux/atomic.h>
+#include <linux/cleanup.h>
 #include <linux/slab.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
@@ -14,6 +16,8 @@
 #include <linux/poll.h>
 #include <linux/iio/buffer_impl.h>
 #include <linux/iio/buffer-dma.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-fence.h>
 #include <linux/dma-mapping.h>
 #include <linux/sizes.h>
 
@@ -94,13 +98,18 @@ static void iio_buffer_block_release(struct kref *kref)
 {
struct iio_dma_buffer_block *block = container_of(kref,
struct iio_dma_buffer_block, kref);
+   struct iio_dma_buffer_queue *queue = block->queue;
 
-   WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);
+   WARN_ON(block->fileio && block->state != IIO_BLOCK_STATE_DEAD);
 
-   dma_free_coherent(block->queue->dev, PAGE_ALIGN(block->size),
-   block->vaddr, block->phys_addr);
+   if (block->fileio) {
+   dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
+ block->vaddr, block->phys_addr);
+   } else {
+   atomic_dec(&queue->num_dmabufs);
+   }
 
-   iio_buffer_put(&block->queue->buffer);
+   iio_buffer_put(&queue->buffer);
kfree(block);
 }
 
@@ -163,7 +172,7 @@ static struct iio_dma_buffer_queue 
*iio_buffer_to_queue(struct iio_buffer *buf)
 }
 
 static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
-   struct iio_dma_buffer_queue *queue, size_t size)
+   struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
 {
struct iio_dma_buffer_block *block;
 
@@ -171,13 +180,16 @@ static struct iio_dma_buffer_block 
*iio_dma_buffer_alloc_block(
if (!block)
return NULL;
 
-   block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
-   &block->phys_addr, GFP_KERNEL);
-   if (!block->vaddr) {
-   kfree(block);
-   return NULL;
+   if (fileio) {
+   block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
+ &block->phys_addr, GFP_KERNEL);
+   if (!block->vaddr) {
+   kfree(block);
+   return NULL;
+   }
}
 
+   block->fileio = fileio;
block->size = size;
block->state = IIO_BLOCK_STATE_DONE;
block->queue = queue;
@@ -186,6 +198,9 @@ static struct iio_dma_buffer_block 
*iio_dma_buffer_alloc_block(
 
iio_buffer_get(&queue->buffer);
 
+   if (!fileio)
+   atomic_inc(&queue->num_dmabufs);
+
return block;
 }
 
@@ -206,13 +221,21 @@ void iio_dma_buffer_block_done(struct 
iio_dma_buffer_block *block)
 {
struct iio_dma_buffer_queue *queue = block->queue;
unsigned long flags;
+   bool cookie;
+
+   cookie = dma_fence_begin_signalling();
 
spin_lock_irqsave(&queue->list_lock, flags);
_iio_dma_buffer_block_done(block);
spin_unlock_irqrestore(&queue->list_lock, flags);
 
+   if (!block->fileio)
+   iio_buffer_signal_dmabuf_done(block->fence, 0);
+
iio_buffer_block_put_atomic(block);
wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
+
+   dma_fence_end_signalling(cookie);
 }
 

[PATCH v6 5/6] iio: buffer-dmaengine: Support new DMABUF based userspace API

2024-01-29 Thread Paul Cercueil
Use the functions provided by the buffer-dma core to implement the
DMABUF userspace API in the buffer-dmaengine IIO buffer implementation.

Since we want to be able to transfer an arbitrary number of bytes and
not necessarily the full DMABUF, the associated scatterlist is converted
to an array of DMA addresses + lengths, which is then passed to
dmaengine_prep_slave_dma_vec().

Signed-off-by: Paul Cercueil 

---
v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the code to
work with the new functions introduced in industrialio-buffer-dma.c.

v5: - Use the new dmaengine_prep_slave_dma_vec().
- Restrict to input buffers, since output buffers are not yet
  supported by IIO buffers.

v6: - Populate .lock_queue / .unlock_queue callbacks
- Switch to atomic memory allocations in .submit_queue, because of
  the dma_fence critical section
- Make sure that the size of the scatterlist is enough
---
 .../buffer/industrialio-buffer-dmaengine.c| 58 +--
 1 file changed, 52 insertions(+), 6 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c 
b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index 45fe7d0d42ee..c4cfdb0c1231 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -64,15 +64,54 @@ static int iio_dmaengine_buffer_submit_block(struct 
iio_dma_buffer_queue *queue,
struct dmaengine_buffer *dmaengine_buffer =
iio_buffer_to_dmaengine_buffer(&queue->buffer);
struct dma_async_tx_descriptor *desc;
+   struct scatterlist *sgl;
+   struct dma_vec *vecs;
dma_cookie_t cookie;
+   size_t len_total;
+   size_t max_size;
+   unsigned int i;
+   int nents;
 
-   block->bytes_used = min(block->size, dmaengine_buffer->max_size);
-   block->bytes_used = round_down(block->bytes_used,
-   dmaengine_buffer->align);
+   if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) {
+   /* We do not yet support output buffers. */
+   return -EINVAL;
+   }
 
-   desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
-   block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
-   DMA_PREP_INTERRUPT);
+   if (block->sg_table) {
+   sgl = block->sg_table->sgl;
+   nents = sg_nents_for_len(sgl, block->bytes_used);
+   if (nents < 0)
+   return nents;
+
+   vecs = kmalloc_array(nents, sizeof(*vecs), GFP_ATOMIC);
+   if (!vecs)
+   return -ENOMEM;
+
+   len_total = block->bytes_used;
+
+   for (i = 0; i < nents; i++) {
+   vecs[i].addr = sg_dma_address(sgl);
+   vecs[i].len = min(sg_dma_len(sgl), len_total);
+   len_total -= vecs[i].len;
+
+   sgl = sg_next(sgl);
+   }
+
+   desc = dmaengine_prep_slave_dma_vec(dmaengine_buffer->chan,
+   vecs, nents, DMA_DEV_TO_MEM,
+   DMA_PREP_INTERRUPT);
+   kfree(vecs);
+   } else {
+   max_size = min(block->size, dmaengine_buffer->max_size);
+   max_size = round_down(max_size, dmaengine_buffer->align);
+   block->bytes_used = max_size;
+
+   desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
+  block->phys_addr,
+  block->bytes_used,
+  DMA_DEV_TO_MEM,
+  DMA_PREP_INTERRUPT);
+   }
if (!desc)
return -ENOMEM;
 
@@ -120,6 +159,13 @@ static const struct iio_buffer_access_funcs 
iio_dmaengine_buffer_ops = {
.data_available = iio_dma_buffer_data_available,
.release = iio_dmaengine_buffer_release,
 
+   .enqueue_dmabuf = iio_dma_buffer_enqueue_dmabuf,
+   .attach_dmabuf = iio_dma_buffer_attach_dmabuf,
+   .detach_dmabuf = iio_dma_buffer_detach_dmabuf,
+
+   .lock_queue = iio_dma_buffer_lock_queue,
+   .unlock_queue = iio_dma_buffer_unlock_queue,
+
.modes = INDIO_BUFFER_HARDWARE,
.flags = INDIO_BUFFER_FLAG_FIXED_WATERMARK,
 };
-- 
2.43.0



[PATCH v6 3/6] iio: core: Add new DMABUF interface infrastructure

2024-01-29 Thread Paul Cercueil
Add the necessary infrastructure to the IIO core to support a new
optional DMABUF based interface.

With this new interface, DMABUF objects (externally created) can be
attached to an IIO buffer, and subsequently used for data transfer.

A userspace application can then use this interface to share DMABUF
objects between several interfaces, allowing it to transfer data in a
zero-copy fashion, for instance between IIO and the USB stack.

The userspace application can also memory-map the DMABUF objects, and
access the sample data directly. The advantage of doing this vs. the
read() interface is that it avoids an extra copy of the data between the
kernel and userspace. This is particularly userful for high-speed
devices which produce several megabytes or even gigabytes of data per
second.

As part of the interface, 3 new IOCTLs have been added:

IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd):
 Attach the DMABUF object identified by the given file descriptor to the
 buffer.

IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd):
 Detach the DMABUF object identified by the given file descriptor from
 the buffer. Note that closing the IIO buffer's file descriptor will
 automatically detach all previously attached DMABUF objects.

IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *):
 Request a data transfer to/from the given DMABUF object. Its file
 descriptor, as well as the transfer size and flags are provided in the
 "iio_dmabuf" structure.

These three IOCTLs have to be performed on the IIO buffer's file
descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.

Signed-off-by: Paul Cercueil 

---
v2: Only allow the new IOCTLs on the buffer FD created with
IIO_BUFFER_GET_FD_IOCTL().

v3: - Get rid of the old IOCTLs. The IIO subsystem does not create or
manage DMABUFs anymore, and only attaches/detaches externally
created DMABUFs.
- Add IIO_BUFFER_DMABUF_CYCLIC to the supported flags.

v5: - Use dev_err() instead of pr_err()
- Inline to_iio_dma_fence()
- Add comment to explain why we unref twice when detaching dmabuf
- Remove TODO comment. It is actually safe to free the file's
  private data even when transfers are still pending because it
  won't be accessed.
- Fix documentation of new fields in struct iio_buffer_access_funcs
- iio_dma_resv_lock() does not need to be exported, make it static

v6: - Remove dead code in iio_dma_resv_lock()
- Fix non-block actually blocking
- Cache dma_buf_attachment instead of mapping/unmapping it for every
  transfer
- Return -EINVAL instead of IIO_IOCTL_UNHANDLED for unknown ioctl
- Make .block_enqueue() callback take a dma_fence pointer, which
  will be passed to iio_buffer_signal_dmabuf_done() instead of the
  dma_buf_attachment; and remove the backpointer from the priv
  structure to the dma_fence.
- Use dma_fence_begin/end_signalling in the dma_fence critical
  sections
- Unref dma_fence and dma_buf_attachment in worker, because they
  might try to lock the dma_resv, which would deadlock.
- Add buffer ops to lock/unlock the queue. This is motivated by the
  fact that once the dma_fence has been installed, we cannot lock
  anything anymore - so the queue must be locked before the
  dma_fence is installed.
- Use 'long retl' variable to handle the return value of
  dma_resv_wait_timeout()
- Protect dmabufs list access with a mutex
- Rework iio_buffer_find_attachment() to use the internal dmabufs
  list, instead of messing with dmabufs private data.
- Add an atomically-increasing sequence number for fences
---
 drivers/iio/industrialio-buffer.c | 462 ++
 include/linux/iio/buffer_impl.h   |  33 +++
 include/uapi/linux/iio/buffer.h   |  22 ++
 3 files changed, 517 insertions(+)

diff --git a/drivers/iio/industrialio-buffer.c 
b/drivers/iio/industrialio-buffer.c
index b581a7e80566..0e63a09fa90a 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -9,14 +9,19 @@
  * - Better memory allocation techniques?
  * - Alternative access techniques?
  */
+#include <linux/atomic.h>
 #include <linux/anon_inodes.h>
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <linux/device.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-fence.h>
+#include <linux/dma-resv.h>
 #include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/cdev.h>
 #include <linux/slab.h>
+#include <linux/workqueue.h>
 #include <linux/poll.h>
 #include <linux/sched/signal.h>
 
@@ -28,6 +33,32 @@
 #include <linux/iio/buffer.h>
 #include <linux/iio/buffer_impl.h>
 
+#define DMABUF_ENQUEUE_TIMEOUT_MS 5000
+
+struct iio_dma_fence;
+
+struct iio_dmabuf_priv {
+   struct list_head entry;
+   struct kref ref;
+
+   struct iio_buffer *buffer;
+   struct iio_dma_buffer_block *block;
+
+   u64 context;
+   spinlock_t lock;
+
+   struct dma_buf_attachment *attach;
+   struct sg_table *sgt;
+   enum dma_data_direction dir;
+   atomic_t seqno;
+};
+
+struct iio_dma_fence {
+   struct dma_fence base;
+   struct iio_dmabuf_priv *priv;
+   struct work_struct work;
+};
+
 static const char * const iio_endian_prefix[] = {
[IIO_BE] = "be",
[IIO_LE] = "le",
@@ -332,6 

[PATCH v6 1/6] dmaengine: Add API function dmaengine_prep_slave_dma_vec()

2024-01-29 Thread Paul Cercueil
This function can be used to initiate a scatter-gather DMA transfer,
where the address and size of each segment is located in one entry of
the dma_vec array.

The major difference with dmaengine_prep_slave_sg() is that it supports
specifying the lengths of each DMA transfer; as trying to override the
length of the transfer with dmaengine_prep_slave_sg() is a very tedious
process. The introduction of a new API function is also justified by the
fact that scatterlists are on their way out.

Note that dmaengine_prep_interleaved_dma() is not helpful either in that
case, as it assumes that the address of each segment will be higher than
the one of the previous segment, which we just cannot guarantee in case
of a scatter-gather transfer.

Signed-off-by: Paul Cercueil 

---
v3: New patch

v5: Replace with function dmaengine_prep_slave_dma_vec(), and struct
'dma_vec'.
Note that at some point we will need to support cyclic transfers
using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
parameter to the function?
---
 include/linux/dmaengine.h | 25 +
 1 file changed, 25 insertions(+)

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 3df70d6131c8..ee5931ddb42f 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -160,6 +160,16 @@ struct dma_interleaved_template {
struct data_chunk sgl[];
 };
 
+/**
+ * struct dma_vec - DMA vector
+ * @addr: Bus address of the start of the vector
+ * @len: Length in bytes of the DMA vector
+ */
+struct dma_vec {
+   dma_addr_t addr;
+   size_t len;
+};
+
 /**
  * enum dma_ctrl_flags - DMA flags to augment operation preparation,
  *  control completion, and communicate status.
@@ -910,6 +920,10 @@ struct dma_device {
struct dma_async_tx_descriptor *(*device_prep_dma_interrupt)(
struct dma_chan *chan, unsigned long flags);
 
+   struct dma_async_tx_descriptor *(*device_prep_slave_dma_vec)(
+   struct dma_chan *chan, const struct dma_vec *vecs,
+   size_t nents, enum dma_transfer_direction direction,
+   unsigned long flags);
struct dma_async_tx_descriptor *(*device_prep_slave_sg)(
struct dma_chan *chan, struct scatterlist *sgl,
unsigned int sg_len, enum dma_transfer_direction direction,
@@ -972,6 +986,17 @@ static inline struct dma_async_tx_descriptor 
*dmaengine_prep_slave_single(
  dir, flags, NULL);
 }
 
+static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_dma_vec(
+   struct dma_chan *chan, const struct dma_vec *vecs, size_t nents,
+   enum dma_transfer_direction dir, unsigned long flags)
+{
+   if (!chan || !chan->device || !chan->device->device_prep_slave_dma_vec)
+   return NULL;
+
+   return chan->device->device_prep_slave_dma_vec(chan, vecs, nents,
+  dir, flags);
+}
+
 static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_sg(
struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len,
enum dma_transfer_direction dir, unsigned long flags)
-- 
2.43.0
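
For illustration, a minimal sketch of a client using the new call; the
channel is assumed to be already requested and the addresses already
DMA-mapped (the dma_addr_* and len_* names are hypothetical).

	struct dma_vec vecs[] = {
		{ .addr = dma_addr_a, .len = len_a },
		{ .addr = dma_addr_b, .len = len_b },
	};
	struct dma_async_tx_descriptor *desc;

	/* One descriptor covering both segments, device-to-memory. */
	desc = dmaengine_prep_slave_dma_vec(chan, vecs, ARRAY_SIZE(vecs),
					    DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT);
	if (desc) {
		dmaengine_submit(desc);
		dma_async_issue_pending(chan);
	}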



[PATCH v6 2/6] dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec

2024-01-29 Thread Paul Cercueil
Add implementation of the .device_prep_slave_dma_vec() callback.

Signed-off-by: Paul Cercueil 

---
v3: New patch

v5: Implement .device_prep_slave_dma_vec() instead of v3's
.device_prep_slave_dma_array().

v6: Use new prototype for axi_dmac_alloc_desc() as it changed upstream.
---
 drivers/dma/dma-axi-dmac.c | 40 ++
 1 file changed, 40 insertions(+)

diff --git a/drivers/dma/dma-axi-dmac.c b/drivers/dma/dma-axi-dmac.c
index 4e339c04fc1e..276856a1742d 100644
--- a/drivers/dma/dma-axi-dmac.c
+++ b/drivers/dma/dma-axi-dmac.c
@@ -620,6 +620,45 @@ static struct axi_dmac_sg *axi_dmac_fill_linear_sg(struct 
axi_dmac_chan *chan,
return sg;
 }
 
+static struct dma_async_tx_descriptor *
+axi_dmac_prep_slave_dma_vec(struct dma_chan *c, const struct dma_vec *vecs,
+   size_t nb, enum dma_transfer_direction direction,
+   unsigned long flags)
+{
+   struct axi_dmac_chan *chan = to_axi_dmac_chan(c);
+   struct axi_dmac_desc *desc;
+   unsigned int num_sgs = 0;
+   struct axi_dmac_sg *dsg;
+   size_t i;
+
+   if (direction != chan->direction)
+   return NULL;
+
+   for (i = 0; i < nb; i++)
+   num_sgs += DIV_ROUND_UP(vecs[i].len, chan->max_length);
+
+   desc = axi_dmac_alloc_desc(chan, num_sgs);
+   if (!desc)
+   return NULL;
+
+   dsg = desc->sg;
+
+   for (i = 0; i < nb; i++) {
+   if (!axi_dmac_check_addr(chan, vecs[i].addr) ||
+   !axi_dmac_check_len(chan, vecs[i].len)) {
+   kfree(desc);
+   return NULL;
+   }
+
+   dsg = axi_dmac_fill_linear_sg(chan, direction, vecs[i].addr, 1,
+ vecs[i].len, dsg);
+   }
+
+   desc->cyclic = false;
+
+   return vchan_tx_prep(&chan->vchan, &desc->vdesc, flags);
+}
+
 static struct dma_async_tx_descriptor *axi_dmac_prep_slave_sg(
struct dma_chan *c, struct scatterlist *sgl,
unsigned int sg_len, enum dma_transfer_direction direction,
@@ -1055,6 +1094,7 @@ static int axi_dmac_probe(struct platform_device *pdev)
dma_dev->device_tx_status = dma_cookie_status;
dma_dev->device_issue_pending = axi_dmac_issue_pending;
dma_dev->device_prep_slave_sg = axi_dmac_prep_slave_sg;
+   dma_dev->device_prep_slave_dma_vec = axi_dmac_prep_slave_dma_vec;
dma_dev->device_prep_dma_cyclic = axi_dmac_prep_dma_cyclic;
dma_dev->device_prep_interleaved_dma = axi_dmac_prep_interleaved;
dma_dev->device_terminate_all = axi_dmac_terminate_all;
-- 
2.43.0



[PATCH v6 0/6] iio: new DMABUF based API, v6

2024-01-29 Thread Paul Cercueil
Hi Jonathan,

This is the v6 of my patchset that introduces a new interface based on
DMABUF objects.

The code was updated quite a bit, using the feedback on the list for
this patchset but also the feedback I received on the FunctionFS
patchset that I'm working on upstreaming in parallel [1] where the
DMABUF handling code is very similar.

See below for the full changelog.

I decided to drop the scope-based memory management for dma_buf and
I hope you are OK with that. Christian wants the patch(es) to support
scope-based memory management in dma-buf as a separate patchset; once
it's in, I will gladly send a follow-up patch to use __free() where it
makes sense.

For performance numbers, I'll point you to the cover letter for my v5
patchset [2].

This patchset was based on next-20240129.

Cheers,
-Paul

[1] https://lore.kernel.org/all/20230322092118.9213-1-p...@crapouillou.net/
[2] 
https://lore.kernel.org/linux-iio/219abc43b4fdd4a13b307ed2efaa0e6869e68e3f.ca...@gmail.com/T/

---

Changelog:
* [2/6]:
- Use new prototype for axi_dmac_alloc_desc() as it changed upstream
* [3/6]:
- Remove dead code in iio_dma_resv_lock()
- Fix non-block actually blocking
- Cache dma_buf_attachment instead of mapping/unmapping it for every
  transfer
- Return -EINVAL instead of IIO_IOCTL_UNHANDLED for unknown ioctl
- Make .block_enqueue() callback take a dma_fence pointer, which
  will be passed to iio_buffer_signal_dmabuf_done() instead of the
  dma_buf_attachment; and remove the backpointer from the priv
  structure to the dma_fence.
- Use dma_fence_begin/end_signalling in the dma_fence critical
  sections
- Unref dma_fence and dma_buf_attachment in worker, because they
  might try to lock the dma_resv, which would deadlock.
- Add buffer ops to lock/unlock the queue. This is motivated by the
  fact that once the dma_fence has been installed, we cannot lock
  anything anymore - so the queue must be locked before the
  dma_fence is installed.
- Use 'long retl' variable to handle the return value of
  dma_resv_wait_timeout()
- Protect dmabufs list access with a mutex
- Rework iio_buffer_find_attachment() to use the internal dmabufs
  list, instead of messing with dmabufs private data.
- Add an atomically-increasing sequence number for fences
* [4/6]:
- Update iio_dma_buffer_enqueue_dmabuf() to take a dma_fence pointer
- Pass that dma_fence pointer along to
  iio_buffer_signal_dmabuf_done()
- Add iio_dma_buffer_lock_queue() / iio_dma_buffer_unlock_queue()
- Do not lock the queue in iio_dma_buffer_enqueue_dmabuf().
  The caller will ensure that it has been locked already.
- Replace "int += bool;" by "if (bool) int++;"
- Use dma_fence_begin/end_signalling in the dma_fence critical
  sections
- Use one "num_dmabufs" field instead of one "num_blocks" and one
  "num_fileio_blocks". Make it an atomic_t, which makes it possible
  to decrement it atomically in iio_buffer_block_release() without
  having to lock the queue mutex; and in turn, it means that we
  don't need to use iio_buffer_block_put_atomic() everywhere to
  avoid locking the queue mutex twice.
- Use cleanup.h guard(mutex) when possible
- Explicitly list all states in the switch in
  iio_dma_can_enqueue_block()
- Rename iio_dma_buffer_fileio_mode() to
  iio_dma_buffer_can_use_fileio(), and add a comment explaining why
  it cannot race vs. DMABUF.
* [5/6]:
- Populate .lock_queue / .unlock_queue callbacks
- Switch to atomic memory allocations in .submit_queue, because of
  the dma_fence critical section
- Make sure that the size of the scatterlist is enough

---
Paul Cercueil (6):
  dmaengine: Add API function dmaengine_prep_slave_dma_vec()
  dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
  iio: core: Add new DMABUF interface infrastructure
  iio: buffer-dma: Enable support for DMABUFs
  iio: buffer-dmaengine: Support new DMABUF based userspace API
  Documentation: iio: Document high-speed DMABUF based API

 Documentation/iio/dmabuf_api.rst  |  54 ++
 Documentation/iio/index.rst   |   2 +
 drivers/dma/dma-axi-dmac.c|  40 ++
 drivers/iio/buffer/industrialio-buffer-dma.c  | 181 ++-
 .../buffer/industrialio-buffer-dmaengine.c|  58 ++-
 drivers/iio/industrialio-buffer.c | 462 ++
 include/linux/dmaengine.h |  25 +
 include/linux/iio/buffer-dma.h|  31 ++
 include/linux/iio/buffer_impl.h   |  33 ++
 include/uapi/linux/iio/buffer.h   |  22 +
 10 files changed, 891 insertions(+), 17 deletions(-)
 create mode 100644 Documentation/iio/dmabuf_api.rst

-- 
2.43.0



Re: [PATCH 2/2] drm/amd: Fetch the EDID from _DDC if available for eDP

2024-01-29 Thread Mario Limonciello

On 1/29/2024 10:46, Jani Nikula wrote:

On Mon, 29 Jan 2024, Mario Limonciello  wrote:

On 1/29/2024 03:39, Jani Nikula wrote:

On Fri, 26 Jan 2024, Mario Limonciello  wrote:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
index 9caba10315a8..c7e1563a46d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
@@ -278,6 +278,11 @@ static void amdgpu_connector_get_edid(struct drm_connector 
*connector)
struct amdgpu_device *adev = drm_to_adev(dev);
struct amdgpu_connector *amdgpu_connector = 
to_amdgpu_connector(connector);
   
+   if (amdgpu_connector->edid)
+   return;
+
+   /* if the BIOS specifies the EDID via _DDC, prefer this */
+   amdgpu_connector->edid = amdgpu_acpi_edid(adev, connector);


Imagine the EDID returned by acpi_video_get_edid() has edid->extensions
bigger than 4. Of course it should not, but you have no guarantees, and
it originates outside of the kernel.

The real fix is to have the function return a struct drm_edid which
tracks the allocation size separately. Unfortunately, it requires a
bunch of changes along the way. We've mostly done it in i915, and I've
sent a series to do this in drm/bridge [1].


Looking at it again, perhaps the ACPI code should just return a blob,
and the drm code should have a helper to wrap that around struct
drm_edid, so that the ACPI code does not have to depend on drm. Basic
idea remains.


I'd ideally like to split this stuff and Melissa's rework to be 
independent if possible.  I'll see if that's actually feasible.





Bottom line, we should stop using struct edid in drivers. They'll all
parse the info differently, and from what I've seen, often wrong.




Thanks for the feedback.  In that case this specific change should
probably rebase on the Melissa's work
https://lore.kernel.org/amd-gfx/20240126163429.56714-1-m...@igalia.com/
after she takes into account the feedback.

Let me ask you this though - do you think that after that's done should
we let all drivers get EDID from BIOS as a priority?  Or would you
prefer that this is unique to amdgpu?


If the reason for having this is that the panel EDID contains some
garbage, that's certainly not unique to amdgpu... :p


OK; maybe a helper in DRM that wraps the ACPI code then and amdgpu will 
use the helper for this series.


I'm also thinking it makes sense to have a new /proc/cmdline setup 
option to ignore the BIOS for EDID.  I'm hoping that since Windows uses 
_DDC that BIOS will be higher quality; but you know; BIOS =)





Something like:

1) If user specifies on kernel command line and puts an EDID in
/lib/firmware use that.
2) If BIOS has EDID in _DDC and it's eDP panel, use that.


I think we should also look into this. We currently don't do this, and
it might help with some machines. However, gut feeling says it's
probably better to keep this as a per driver decision instead of trying
to bolt it into drm helpers.


OK; I'll wire up the helper and if you want to use in the future you can 
too then.




BR,
Jani.



3) Get panel EDID.
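
Whatever shape the helper ends up with, the validation asked for above is
roughly the following, as a hedged sketch (names hypothetical; EDID_LENGTH
is the 128-byte block size from drm_edid.h):

static struct edid *sanitize_acpi_edid(void *blob, long len)
{
	struct edid *edid = blob;

	if (!blob || len < EDID_LENGTH)
		return NULL;

	/* Never trust edid->extensions coming from firmware: clamp it
	 * to the number of 128-byte blocks ACPI actually returned. */
	if ((edid->extensions + 1) * EDID_LENGTH > len)
		edid->extensions = len / EDID_LENGTH - 1;

	return edid;
}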







Re: [PATCH] drm: bridge: samsung-dsim: Don't use FORCE_STOP_STATE

2024-01-29 Thread Frieder Schrempf
On 29.01.24 10:20, Frieder Schrempf wrote:
> On 26.01.24 19:28, Dave Airlie wrote:
>> Just FYI this conflicted pretty heavily with drm-misc-next changes in
>> the same area, someone should check drm-tip has the correct
>> resolution, I'm not really sure what is definitely should be.
>>
>> Dave.
> 
> Thanks! I took a quick look at what is now in Linus' tree and it looks
> correct to me. The only thing I'm missing is my Reviewed-by tag which
> got lost somewhere, but I can get over that.

Apparently I missed the point here. I was looking at the wrong trees
(drm-next and master instead of drm-misc-next and drm-tip). Sorry for
the noise. Michael already pointed out the correct details.


Re: [PATCH] drm/xe: Fix a build error

2024-01-29 Thread Christian König

Am 27.01.24 um 16:53 schrieb Oak Zeng:

This fixes a build failure on drm-tip. This issue was introduced during
the merge of "drm/ttm: replace busy placement with flags v6". For some
reason, the xe_bo.c part of the above change was not merged. Manually
merge the missing part to drm-tip.


Mhm, I provided this as manual fixup for drm-tip in this rerere commit:

commit afc5797e8c03bed3ec47a34f2bc3cf03fce24411
Author: Christian König 
Date:   Thu Jan 25 10:44:54 2024 +0100

    2024y-01m-25d-09h-44m-07s UTC: drm-tip rerere cache update

    git version 2.34.1


And for me compiling xe in drm-tip worked fine after that. No idea why 
that didn't work for you.


Anyway feel free to add my rb to this patch here if it helps in any way.

Regards,
Christian.



Signed-off-by: Oak Zeng 
---
  drivers/gpu/drm/xe/xe_bo.c | 33 +++--
  1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 686d716c5581..d6a193060cc0 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -38,22 +38,26 @@ static const struct ttm_place sys_placement_flags = {
  static struct ttm_placement sys_placement = {
.num_placement = 1,
.placement = &sys_placement_flags,
-   .num_busy_placement = 1,
-   .busy_placement = &sys_placement_flags,
  };
  
-static const struct ttm_place tt_placement_flags = {

-   .fpfn = 0,
-   .lpfn = 0,
-   .mem_type = XE_PL_TT,
-   .flags = 0,
+static const struct ttm_place tt_placement_flags[] = {
+   {
+   .fpfn = 0,
+   .lpfn = 0,
+   .mem_type = XE_PL_TT,
+   .flags = TTM_PL_FLAG_DESIRED,
+   },
+   {
+   .fpfn = 0,
+   .lpfn = 0,
+   .mem_type = XE_PL_SYSTEM,
+   .flags = TTM_PL_FLAG_FALLBACK,
+   }
  };
  
  static struct ttm_placement tt_placement = {

-   .num_placement = 1,
-   .placement = &tt_placement_flags,
-   .num_busy_placement = 1,
-   .busy_placement = &tt_placement_flags,
+   .num_placement = 2,
+   .placement = tt_placement_flags,
  };
  
  bool mem_type_is_vram(u32 mem_type)

@@ -230,8 +234,6 @@ static int __xe_bo_placement_for_flags(struct xe_device 
*xe, struct xe_bo *bo,
bo->placement = (struct ttm_placement) {
.num_placement = c,
.placement = bo->placements,
-   .num_busy_placement = c,
-   .busy_placement = bo->placements,
};
  
  	return 0;

@@ -251,7 +253,6 @@ static void xe_evict_flags(struct ttm_buffer_object *tbo,
/* Don't handle scatter gather BOs */
if (tbo->type == ttm_bo_type_sg) {
placement->num_placement = 0;
-   placement->num_busy_placement = 0;
return;
}
  
@@ -1391,8 +1392,6 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,

bo->placement = (struct ttm_placement) {
.num_placement = 1,
.placement = place,
-   .num_busy_placement = 1,
-   .busy_placement = place,
};
  
  	return 0;

@@ -2150,9 +2149,7 @@ int xe_bo_migrate(struct xe_bo *bo, u32 mem_type)
  
xe_place_from_ttm_type(mem_type, &requested);

placement.num_placement = 1;
-   placement.num_busy_placement = 1;
	placement.placement = &requested;
-   placement.busy_placement = &requested;
  
  	/*

 * Stolen needs to be handled like below VRAM handling if we ever need




Re: [PATCH 2/2] drm/amd: Fetch the EDID from _DDC if available for eDP

2024-01-29 Thread Jani Nikula
On Mon, 29 Jan 2024, Mario Limonciello  wrote:
> On 1/29/2024 03:39, Jani Nikula wrote:
>> On Fri, 26 Jan 2024, Mario Limonciello  wrote:
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
>>> index 9caba10315a8..c7e1563a46d3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
>>> @@ -278,6 +278,11 @@ static void amdgpu_connector_get_edid(struct 
>>> drm_connector *connector)
>>> struct amdgpu_device *adev = drm_to_adev(dev);
>>> struct amdgpu_connector *amdgpu_connector = 
>>> to_amdgpu_connector(connector);
>>>   
>>> +   if (amdgpu_connector->edid)
>>> +   return;
>>> +
>>> +   /* if the BIOS specifies the EDID via _DDC, prefer this */
>>> +   amdgpu_connector->edid = amdgpu_acpi_edid(adev, connector);
>> 
>> Imagine the EDID returned by acpi_video_get_edid() has edid->extensions
>> bigger than 4. Of course it should not, but you have no guarantees, and
>> it originates outside of the kernel.
>> 
>> The real fix is to have the function return a struct drm_edid which
>> tracks the allocation size separately. Unfortunately, it requires a
>> bunch of changes along the way. We've mostly done it in i915, and I've
>> sent a series to do this in drm/bridge [1].

Looking at it again, perhaps the ACPI code should just return a blob,
and the drm code should have a helper to wrap struct drm_edid around
it, so that the ACPI code does not have to depend on drm. The basic
idea remains.

>> Bottom line, we should stop using struct edid in drivers. They'll all
>> parse the info differently, and from what I've seen, often wrong.
>> 
>> 
>
> Thanks for the feedback.  In that case this specific change should 
probably be rebased on Melissa's work 
> https://lore.kernel.org/amd-gfx/20240126163429.56714-1-m...@igalia.com/ 
> after she takes into account the feedback.
>
Let me ask you this though - do you think that after that's done we 
should let all drivers get the EDID from the BIOS as a priority?  Or would you 
> prefer that this is unique to amdgpu?

If the reason for having this is that the panel EDID contains some
garbage, that's certainly not unique to amdgpu... :p

> Something like:
>
> 1) If user specifies on kernel command line and puts an EDID in 
> /lib/firmware use that.
> 2) If BIOS has EDID in _DDC and it's eDP panel, use that.

I think we should also look into this. We currently don't do this, and
it might help with some machines. However, gut feeling says it's
probably better to keep this as a per driver decision instead of trying
to bolt it into drm helpers.

BR,
Jani.


> 3) Get panel EDID.
>

-- 
Jani Nikula, Intel


Re: Making drm_gpuvm work across gpu devices

2024-01-29 Thread Christian König

Am 29.01.24 um 17:24 schrieb Felix Kuehling:

On 2024-01-29 10:33, Christian König wrote:

Am 29.01.24 um 16:03 schrieb Felix Kuehling:

On 2024-01-25 13:32, Daniel Vetter wrote:

On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:

Am 23.01.24 um 20:37 schrieb Zeng, Oak:

[SNIP]
Yes most API are per device based.

One exception I know is actually the kfd SVM API. If you look at 
the svm_ioctl function, it is per-process based. Each kfd_process 
represents a process across N gpu devices.
Yeah and that was a big mistake in my opinion. We should really 
not do that

ever again.

Need to say, kfd SVM represents a shared virtual address space 
across CPU and all GPU devices on the system. This is by the 
definition of SVM (shared virtual memory). This is very different 
from our legacy gpu *device* driver which works for only one 
device (i.e., if you want one device to access another device's 
memory, you will have to use dma-buf export/import etc).
Exactly that thinking is what we have currently found as a blocker 
for
virtualization projects. Having SVM as a device-independent feature 
which
somehow ties to the process address space turned out to be an 
extremely bad

idea.

The background is that this only works for some use cases but not 
all of

them.

What's working much better is to just have a mirror functionality 
which says
that a range A..B of the process address space is mapped into a 
range C..D

of the GPU address space.

Those ranges can then be used to implement the SVM feature 
required for
higher level APIs and not something you need at the UAPI or even 
inside the

low level kernel memory management.

When you talk about migrating memory to a device you also do this 
on a per
device basis and *not* tied to the process address space. If you 
then get
crappy performance because userspace gave contradicting 
information where to
migrate memory then that's a bug in userspace and not something 
the kernel

should try to prevent somehow.

[SNIP]
I think if you start using the same drm_gpuvm for multiple 
devices you
will sooner or later start to run into the same mess we have 
seen with
KFD, where we moved more and more functionality from the KFD to 
the DRM
render node because we found that a lot of the stuff simply 
doesn't work

correctly with a single object to maintain the state.
As I understand it, KFD is designed to work across devices. A 
single pseudo /dev/kfd device represents all hardware gpu devices. 
That is why during kfd open, many pdds (process device data) are 
created, each for one hardware device for this process.
Yes, I'm perfectly aware of that. And I can only repeat myself 
that I see
this design as a rather extreme failure. And I think it's one of 
the reasons

why NVidia is so dominant with Cuda.

This whole approach KFD takes was designed with the idea of 
extending the
CPU process into the GPUs, but this idea only works for a few use 
cases and

is not something we should apply to drivers in general.

A very good example are virtualization use cases where you end up 
with CPU
address != GPU address because the VAs are actually coming from 
the guest VM

and not the host process.

SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This 
should not have

any influence on the design of the kernel UAPI.

If you want to do something similar as KFD for Xe I think you need 
to get
explicit permission to do this from Dave and Daniel and maybe even 
Linus.
I think the one and only one exception where an SVM uapi like in 
kfd makes

sense, is if the _hardware_ itself, not the software stack defined
semantics that you've happened to build on top of that hw, enforces 
a 1:1

mapping with the cpu process address space.

Which means your hardware is using PASID, IOMMU based translation, 
PCI-ATS
(address translation services) or whatever your hw calls it and has 
_no_
device-side pagetables on top. Which from what I've seen all 
devices with
device-memory have, simply because they need some place to store 
whether
that memory is currently in device memory or should be translated 
using
PASID. Currently there's no gpu that works with PASID only, but 
there are

some on-cpu-die accelerator things that do work like that.

Maybe in the future there will be some accelerators that are fully cpu
cache coherent (including atomics) with something like CXL, and the
on-device memory is managed as normal system memory with struct 
page as

ZONE_DEVICE and accelerator va -> physical address translation is only
done with PASID ... but for now I haven't seen that, definitely not in
upstream drivers.

And the moment you have some per-device pagetables or per-device 
memory
management of some sort (like using gpuva mgr) then I'm 100% 
agreeing with

Christian that the kfd SVM model is too strict and not a great idea.


That basically means, without ATS/PRI+PASID you cannot implement a 
unified memory programming model, where GPUs or accelerators access 
virtual addresses without pre-registering them with an SVM API call.

Re: Making drm_gpuvm work across gpu devices

2024-01-29 Thread Felix Kuehling



On 2024-01-29 10:33, Christian König wrote:

Am 29.01.24 um 16:03 schrieb Felix Kuehling:

On 2024-01-25 13:32, Daniel Vetter wrote:

On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:

Am 23.01.24 um 20:37 schrieb Zeng, Oak:

[SNIP]
Yes most API are per device based.

One exception I know is actually the kfd SVM API. If you look at 
the svm_ioctl function, it is per-process based. Each kfd_process 
represents a process across N gpu devices.
Yeah and that was a big mistake in my opinion. We should really not 
do that

ever again.

Need to say, kfd SVM represents a shared virtual address space 
across CPU and all GPU devices on the system. This is by the 
definition of SVM (shared virtual memory). This is very different 
from our legacy gpu *device* driver which works for only one 
device (i.e., if you want one device to access another device's 
memory, you will have to use dma-buf export/import etc).

Exactly that thinking is what we have currently found as a blocker for
virtualization projects. Having SVM as a device-independent feature 
which
somehow ties to the process address space turned out to be an 
extremely bad

idea.

The background is that this only works for some use cases but not 
all of

them.

What's working much better is to just have a mirror functionality 
which says
that a range A..B of the process address space is mapped into a 
range C..D

of the GPU address space.

Those ranges can then be used to implement the SVM feature required 
for
higher level APIs and not something you need at the UAPI or even 
inside the

low level kernel memory management.

When you talk about migrating memory to a device you also do this 
on a per
device basis and *not* tied to the process address space. If you 
then get
crappy performance because userspace gave contradicting information 
where to
migrate memory then that's a bug in userspace and not something the 
kernel

should try to prevent somehow.

[SNIP]
I think if you start using the same drm_gpuvm for multiple 
devices you
will sooner or later start to run into the same mess we have seen 
with
KFD, where we moved more and more functionality from the KFD to 
the DRM
render node because we found that a lot of the stuff simply 
doesn't work

correctly with a single object to maintain the state.
As I understand it, KFD is designed to work across devices. A 
single pseudo /dev/kfd device represents all hardware gpu devices. 
That is why during kfd open, many pdds (process device data) are 
created, each for one hardware device for this process.
Yes, I'm perfectly aware of that. And I can only repeat myself that 
I see
this design as a rather extreme failure. And I think it's one of 
the reasons

why NVidia is so dominant with Cuda.

This whole approach KFD takes was designed with the idea of 
extending the
CPU process into the GPUs, but this idea only works for a few use 
cases and

is not something we should apply to drivers in general.

A very good example are virtualization use cases where you end up 
with CPU
address != GPU address because the VAs are actually coming from the 
guest VM

and not the host process.

SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This should 
not have

any influence on the design of the kernel UAPI.

If you want to do something similar as KFD for Xe I think you need 
to get
explicit permission to do this from Dave and Daniel and maybe even 
Linus.
I think the one and only one exception where an SVM uapi like in kfd 
makes

sense, is if the _hardware_ itself, not the software stack defined
semantics that you've happened to build on top of that hw, enforces 
a 1:1

mapping with the cpu process address space.

Which means your hardware is using PASID, IOMMU based translation, 
PCI-ATS
(address translation services) or whatever your hw calls it and has 
_no_
device-side pagetables on top. Which from what I've seen all devices 
with
device-memory have, simply because they need some place to store 
whether

that memory is currently in device memory or should be translated using
PASID. Currently there's no gpu that works with PASID only, but 
there are

some on-cpu-die accelerator things that do work like that.

Maybe in the future there will be some accelerators that are fully cpu
cache coherent (including atomics) with something like CXL, and the
on-device memory is managed as normal system memory with struct page as
ZONE_DEVICE and accelerator va -> physical address translation is only
done with PASID ... but for now I haven't seen that, definitely not in
upstream drivers.

And the moment you have some per-device pagetables or per-device memory
management of some sort (like using gpuva mgr) then I'm 100% 
agreeing with

Christian that the kfd SVM model is too strict and not a great idea.


That basically means, without ATS/PRI+PASID you cannot implement a 
unified memory programming model, where GPUs or accelerators access 
virtual addresses without pre-registering them with an SVM API call.


Unified memory is a feature implemented by the KFD SVM API and used by ROCm.

Re: [PATCH 1/2] ACPI: video: Handle fetching EDID that is longer than 256 bytes

2024-01-29 Thread Mario Limonciello

On 1/29/2024 07:54, Rafael J. Wysocki wrote:

On Fri, Jan 26, 2024 at 7:55 PM Mario Limonciello
 wrote:


The ACPI specification allows for an EDID to be up to 512 bytes but
the _DDC EDID fetching code will only try up to 256 bytes.

Modify the code to instead start at 512 bytes and work its way
down.

Link: 
https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device
Signed-off-by: Mario Limonciello 
---
  drivers/acpi/acpi_video.c | 23 ---
  1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
index 62f4364e4460..b3b15dd4755d 100644
--- a/drivers/acpi/acpi_video.c
+++ b/drivers/acpi/acpi_video.c
@@ -624,6 +624,10 @@ acpi_video_device_EDID(struct acpi_video_device *device,
 arg0.integer.value = 1;
 else if (length == 256)
 arg0.integer.value = 2;
+   else if (length == 384)
+   arg0.integer.value = 3;
+   else if (length == 512)
+   arg0.integer.value = 4;


It looks like switch () would be somewhat better.

Or maybe even

arg0.integer.value = length / 128;

The validation could be added too:

if (arg0.integer.value > 4 || arg0.integer.value * 128 != length)
 return -EINVAL;

but it is pointless, because the caller is never passing an invalid
number to it AFAICS.



Thanks.  I'll swap over to one of these suggestions.

I will also split this patch from the other one, since that one depends 
on refactoring in DRM that will take a cycle 
or two.



 else
 return -EINVAL;

@@ -1443,7 +1447,7 @@ int acpi_video_get_edid(struct acpi_device *device, int 
type, int device_id,

 for (i = 0; i < video->attached_count; i++) {
 video_device = video->attached_array[i].bind_info;
-   length = 256;
+   length = 512;

 if (!video_device)
 continue;
@@ -1478,13 +1482,18 @@ int acpi_video_get_edid(struct acpi_device *device, int 
type, int device_id,

 if (ACPI_FAILURE(status) || !buffer ||
 buffer->type != ACPI_TYPE_BUFFER) {
-   length = 128;
-   status = acpi_video_device_EDID(video_device, &buffer,
-   length);
-   if (ACPI_FAILURE(status) || !buffer ||
-   buffer->type != ACPI_TYPE_BUFFER) {
-   continue;
+   while (length) {


I would prefer a do {} while () loop here, which could include the
first invocation of acpi_video_device_EDID() too (and reduce code
duplication a bit).


+   length -= 128;
+   status = acpi_video_device_EDID(video_device, &buffer,
+   length);


No line break, please.


+   if (ACPI_FAILURE(status) || !buffer ||
+   buffer->type != ACPI_TYPE_BUFFER) {
+   continue;
+   }
+   break;
 }
+   if (!length)
+   continue;
 }

 *edid = buffer->buffer.pointer;
--




Re: [PATCH 2/2] drm/amd: Fetch the EDID from _DDC if available for eDP

2024-01-29 Thread Mario Limonciello

On 1/29/2024 03:39, Jani Nikula wrote:

On Fri, 26 Jan 2024, Mario Limonciello  wrote:

Some manufacturers have intentionally put an EDID that differs from
the EDID on the internal panel on laptops.

Attempt to fetch this EDID if it exists and prefer it over the EDID
that is provided by the panel.

Signed-off-by: Mario Limonciello 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c  | 30 +++
  .../gpu/drm/amd/amdgpu/amdgpu_connectors.c|  5 
  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  8 -
  .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  7 +++--
  5 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c5f3859fd682..99abe12567a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1520,6 +1520,7 @@ int amdgpu_acpi_get_mem_info(struct amdgpu_device *adev, 
int xcc_id,
  
  void amdgpu_acpi_get_backlight_caps(struct amdgpu_dm_backlight_caps *caps);

  bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev);
+void *amdgpu_acpi_edid(struct amdgpu_device *adev, struct drm_connector 
*connector);
  void amdgpu_acpi_detect(void);
  void amdgpu_acpi_release(void);
  #else
@@ -1537,6 +1538,7 @@ static inline int amdgpu_acpi_get_mem_info(struct 
amdgpu_device *adev,
  }
  static inline void amdgpu_acpi_fini(struct amdgpu_device *adev) { }
  static inline bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev) { 
return false; }
+static inline void *amdgpu_acpi_edid(struct amdgpu_device *adev, struct 
drm_connector *connector) { return NULL; }
  static inline void amdgpu_acpi_detect(void) { }
  static inline void amdgpu_acpi_release(void) { }
  static inline bool amdgpu_acpi_is_power_shift_control_supported(void) { 
return false; }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index e550067e5c5d..c106335f1f22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -1380,6 +1380,36 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device 
*adev)
  #endif
  }
  
+/**

+ * amdgpu_acpi_edid
+ * @adev: amdgpu_device pointer
+ * @connector: drm_connector pointer
+ *
+ * Returns the EDID used for the internal panel if present, NULL otherwise.
+ */
+void *
+amdgpu_acpi_edid(struct amdgpu_device *adev, struct drm_connector *connector)
+{
+   struct drm_device *ddev = adev_to_drm(adev);
+   struct acpi_device *acpidev = ACPI_COMPANION(ddev->dev);
+   void *edid;
+   int r;
+
+   if (!acpidev)
+   return NULL;
+
+   if (connector->connector_type != DRM_MODE_CONNECTOR_eDP)
+   return NULL;
+
+   r = acpi_video_get_edid(acpidev, ACPI_VIDEO_DISPLAY_LCD, -1, &edid);
+   if (r < 0) {
+   DRM_DEBUG_DRIVER("Failed to get EDID from ACPI: %d\n", r);
+   return NULL;
+   }
+
+   return kmemdup(edid, r, GFP_KERNEL);
+}
+
  /*
   * amdgpu_acpi_detect - detect ACPI ATIF/ATCS methods
   *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
index 9caba10315a8..c7e1563a46d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
@@ -278,6 +278,11 @@ static void amdgpu_connector_get_edid(struct drm_connector 
*connector)
struct amdgpu_device *adev = drm_to_adev(dev);
struct amdgpu_connector *amdgpu_connector = 
to_amdgpu_connector(connector);
  
+	if (amdgpu_connector->edid)

+   return;
+
+   /* if the BIOS specifies the EDID via _DDC, prefer this */
+   amdgpu_connector->edid = amdgpu_acpi_edid(adev, connector);


Imagine the EDID returned by acpi_video_get_edid() has edid->extensions
bigger than 4. Of course it should not, but you have no guarantees, and
it originates outside of the kernel.

The real fix is to have the function return a struct drm_edid which
tracks the allocation size separately. Unfortunately, it requires a
bunch of changes along the way. We've mostly done it in i915, and I've
sent a series to do this in drm/bridge [1].

Bottom line, we should stop using struct edid in drivers. They'll all
parse the info differently, and from what I've seen, often wrong.
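(To make the first concern concrete: every name below is invented, this
is not from any patch. With a bare struct edid the consumer has to
trust edid->extensions, so a malformed ACPI blob can claim more data
than was actually allocated:)

	size_t claimed = (edid->extensions + 1) * EDID_LENGTH;

	if (claimed > acpi_blob_len)	/* allocation size, tracked separately */
		return NULL;		/* would read past the buffer otherwise */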




Thanks for the feedback.  In that case this specific change should 
probably be rebased on Melissa's work 
https://lore.kernel.org/amd-gfx/20240126163429.56714-1-m...@igalia.com/ 
after she takes into account the feedback.


Let me ask you this though - do you think that after that's done we 
should let all drivers get the EDID from the BIOS as a priority?  Or would you 
prefer that this is unique to amdgpu?


Something like:

1) If user specifies on kernel command line and puts an EDID in 
/lib/firmware use that.

2) If BIOS has EDID in _DDC and it's eDP panel, use that.
3) Get panel EDID.


BR,
Jani.


[1] 

Re: [PATCH] drm: bridge: samsung-dsim: Don't use FORCE_STOP_STATE

2024-01-29 Thread Michael Walle

Just FYI this conflictted pretty heavily with drm-misc-next changes in
the same area, someone should check drm-tip has the correct
resolution, I'm not really sure what it definitely should be.


FWIW, this looks rather messy now. drm-tip doesn't build.

There was a new call to samsung_dsim_set_stop_state() introduced
in commit b2fe2292624ac (drm: bridge: samsung-dsim: enter display
mode in the enable() callback).


I had a closer look at the latest linux-next (which my patch somehow
made it into) and tried to apply commit b2fe2292624ac (drm: bridge:
samsung-dsim: enter display mode in the enable() callback). It looks
like only the following hunk is still needed from that patch. Everything
else is covered by this fixes patch.

Dario, could you rebase your commit onto this patch? I had a quick test
with this change and it seems to work fine for our case.

--snip--
diff --git a/drivers/gpu/drm/bridge/samsung-dsim.c 
b/drivers/gpu/drm/bridge/samsung-dsim.c

index 63a1a0c88be4..92755c90e7d2 100644
--- a/drivers/gpu/drm/bridge/samsung-dsim.c
+++ b/drivers/gpu/drm/bridge/samsung-dsim.c
@@ -1498,6 +1498,8 @@ static void samsung_dsim_atomic_disable(struct 
drm_bridge *bridge,

if (!(dsi->state & DSIM_STATE_ENABLED))
return;

+   samsung_dsim_set_display_enable(dsi, false);
+
dsi->state &= ~DSIM_STATE_VIDOUT_AVAILABLE;
 }

@@ -1506,8 +1508,6 @@ static void 
samsung_dsim_atomic_post_disable(struct drm_bridge *bridge,

 {
struct samsung_dsim *dsi = bridge_to_dsi(bridge);

-   samsung_dsim_set_display_enable(dsi, false);
-
dsi->state &= ~DSIM_STATE_ENABLED;
pm_runtime_put_sync(dsi->dev);
 }
--snip--

-michael


Re: Making drm_gpuvm work across gpu devices

2024-01-29 Thread Christian König

Am 29.01.24 um 16:03 schrieb Felix Kuehling:

On 2024-01-25 13:32, Daniel Vetter wrote:

On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:

Am 23.01.24 um 20:37 schrieb Zeng, Oak:

[SNIP]
Yes most API are per device based.

One exception I know is actually the kfd SVM API. If you look at 
the svm_ioctl function, it is per-process based. Each kfd_process 
represents a process across N gpu devices.
Yeah and that was a big mistake in my opinion. We should really not 
do that

ever again.

Need to say, kfd SVM represents a shared virtual address space 
across CPU and all GPU devices on the system. This is by the 
definition of SVM (shared virtual memory). This is very different 
from our legacy gpu *device* driver which works for only one device 
(i.e., if you want one device to access another device's memory, 
you will have to use dma-buf export/import etc).

Exactly that thinking is what we have currently found as a blocker for
virtualization projects. Having SVM as a device-independent feature which
somehow ties to the process address space turned out to be an 
extremely bad

idea.

The background is that this only works for some use cases but not 
all of

them.

What's working much better is to just have a mirror functionality 
which says
that a range A..B of the process address space is mapped into a 
range C..D

of the GPU address space.

Those ranges can then be used to implement the SVM feature required for
higher level APIs and not something you need at the UAPI or even 
inside the

low level kernel memory management.

When you talk about migrating memory to a device you also do this on 
a per
device basis and *not* tied to the process address space. If you 
then get
crappy performance because userspace gave contradicting information 
where to
migrate memory then that's a bug in userspace and not something the 
kernel

should try to prevent somehow.

[SNIP]
I think if you start using the same drm_gpuvm for multiple devices 
you
will sooner or later start to run into the same mess we have seen 
with
KFD, where we moved more and more functionality from the KFD to 
the DRM
render node because we found that a lot of the stuff simply 
doesn't work

correctly with a single object to maintain the state.
As I understand it, KFD is designed to work across devices. A 
single pseudo /dev/kfd device represents all hardware gpu devices. 
That is why during kfd open, many pdds (process device data) are 
created, each for one hardware device for this process.
Yes, I'm perfectly aware of that. And I can only repeat myself that 
I see
this design as a rather extreme failure. And I think it's one of the 
reasons

why NVidia is so dominant with Cuda.

This whole approach KFD takes was designed with the idea of 
extending the
CPU process into the GPUs, but this idea only works for a few use 
cases and

is not something we should apply to drivers in general.

A very good example are virtualization use cases where you end up 
with CPU
address != GPU address because the VAs are actually coming from the 
guest VM

and not the host process.

SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This should 
not have

any influence on the design of the kernel UAPI.

If you want to do something similar as KFD for Xe I think you need 
to get
explicit permission to do this from Dave and Daniel and maybe even 
Linus.
I think the one and only one exception where an SVM uapi like in kfd 
makes

sense, is if the _hardware_ itself, not the software stack defined
semantics that you've happened to build on top of that hw, enforces a 
1:1

mapping with the cpu process address space.

Which means your hardware is using PASID, IOMMU based translation, 
PCI-ATS

(address translation services) or whatever your hw calls it and has _no_
device-side pagetables on top. Which from what I've seen all devices 
with

device-memory have, simply because they need some place to store whether
that memory is currently in device memory or should be translated using
PASID. Currently there's no gpu that works with PASID only, but there 
are

some on-cpu-die accelerator things that do work like that.

Maybe in the future there will be some accelerators that are fully cpu
cache coherent (including atomics) with something like CXL, and the
on-device memory is managed as normal system memory with struct page as
ZONE_DEVICE and accelerator va -> physical address translation is only
done with PASID ... but for now I haven't seen that, definitely not in
upstream drivers.

And the moment you have some per-device pagetables or per-device memory
management of some sort (like using gpuva mgr) then I'm 100% agreeing 
with

Christian that the kfd SVM model is too strict and not a great idea.


That basically means, without ATS/PRI+PASID you cannot implement a 
unified memory programming model, where GPUs or accelerators access 
virtual addresses without pre-registering them with an SVM API call.


Unified memory is a feature implemented by the KFD SVM API and used by ROCm.

Re: [PATCH v4 0/8] drm/amd/display: Introduce KUnit to Display Mode Library

2024-01-29 Thread Christian König

It seems odd that we include so many C files with relative paths.

Apart from that looks good to me.

Christian.

Am 26.01.24 um 16:48 schrieb Rodrigo Siqueira:

In 2022, we got a great patchset from a GSoC project introducing unit
tests to the amdgpu display. Since version 3, this effort was put on
hold, and now I'm attempting to revive it. I'll add part of the original
cover letter at the bottom of this cover letter, but you can read all
the original messages at:

https://lore.kernel.org/amd-gfx/20220912155919.39877-1-mairaca...@riseup.net/

Anyway, this new version changes are:
- Rebase and adjust conflicts.
- Rewrite part of the dc_dmub_srv_test to represent a real scenario that
   simulates some parameter configuration for using 4k144 and 4k240
   displays.

Thanks
Siqueira

Original cover letter

Hello,

This series is version 3 of the introduction of unit testing to the
AMDPGU driver [1].

Our main goal is to bring unit testing to the AMD display driver; in
particular, we'll focus on the Display Mode Library (DML) for DCN2.0,
DMUB, and some of the DCE functions. This implementation intends to
help developers to recognize bugs before they are merged into the
mainline and also makes it possible for future code refactors of the
AMD display driver.

For the implementation of the tests, we decided to go with the Kernel
Unit Testing Framework (KUnit). KUnit makes it possible to run test
suites on kernel boot or load the tests as a module. It reports all test
case results through a TAP (Test Anything Protocol) in the kernel log.
Moreover, KUnit unifies the test structure and provides tools to
simplify the testing for developers and CI systems.
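As a flavour of what such a test looks like, here is a minimal KUnit
case (illustrative only, not taken from the series; all names are made
up):

	#include <kunit/test.h>

	static void fixpt_roundtrip_test(struct kunit *test)
	{
		/* assert a small-integer round trip; shown with plain
		 * ints for brevity rather than the dc_fixpt types */
		int in = 42, out = 42;

		KUNIT_EXPECT_EQ(test, in, out);
	}

	static struct kunit_case dml_example_cases[] = {
		KUNIT_CASE(fixpt_roundtrip_test),
		{}
	};

	static struct kunit_suite dml_example_suite = {
		.name = "amdgpu_dml_example",
		.test_cases = dml_example_cases,
	};

	kunit_test_suite(dml_example_suite);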

In regards to CI pipelines, we believe kunit_tool [2] provides
ease of use, but we are also working on integrating KUnit into IGT [3].

Since the second version, we've chosen a mix of approaches to integrate
KUnit tests into amdgpu:
 1. Tests that use static functions are included through guards [4].
 2. Tests without static functions are included through a Makefile.

We understand that testing static functions is not ideal, but taking into
consideration that this driver relies heavily on static functions with
complex behavior which would benefit from unit testing, otherwise, black-box
tested through public functions with dozens of arguments and sometimes high
cyclomatic complexity.

The first seven patches represent what we intend to do for the rest of the
DML modules: systematic testing of the DML functions, especially mathematically
complicated functions. Also, it shows how simple it is to add new tests to the 
DML.

Among the tests, we highlight the dcn20_fpu_test, which, had it existed
then, could have caught the defects introduced to dcn20_fpu.c by 8861c27a6c [5],
later fixed by 9ad5d02c2a [6].

In this series, there's also an example of how unit tests can help avoid
regressions and keep track of changes in behavior.

[..]

Isabella Basso (1):
   drm/amd/display: Introduce KUnit tests to display_rq_dlg_calc_20

Magali Lemes (1):
   drm/amd/display: Introduce KUnit tests for dcn20_fpu

Maíra Canal (5):
   drm/amd/display: Introduce KUnit tests to the bw_fixed library
   drm/amd/display: Introduce KUnit tests to the display_mode_vba library
   drm/amd/display: Introduce KUnit to dcn20/display_mode_vba_20 library
   drm/amd/display: Introduce KUnit tests to dc_dmub_srv library
   Documentation/gpu: Add Display Core Unit Test documentation

Tales Aparecida (1):
   drm/amd/display: Introduce KUnit tests for fixed31_32 library

  .../gpu/amdgpu/display/display-test.rst   |  88 ++
  Documentation/gpu/amdgpu/display/index.rst|   1 +
  drivers/gpu/drm/amd/display/Kconfig   |  52 +
  drivers/gpu/drm/amd/display/Makefile  |   2 +-
  drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c  |   4 +
  .../dc/dml/dcn20/display_mode_vba_20.c|   4 +
  .../dc/dml/dcn20/display_rq_dlg_calc_20.c |   4 +
  drivers/gpu/drm/amd/display/tests/Makefile|  18 +
  .../display/tests/dc/basics/fixpt31_32_test.c | 232 +
  .../amd/display/tests/dc/dc_dmub_srv_test.c   | 159 
  .../tests/dc/dml/calcs/bw_fixed_test.c| 323 +++
  .../tests/dc/dml/dcn20/dcn20_fpu_test.c   | 561 +++
  .../dc/dml/dcn20/display_mode_vba_20_test.c   | 888 ++
  .../dml/dcn20/display_rq_dlg_calc_20_test.c   | 124 +++
  .../tests/dc/dml/display_mode_vba_test.c  | 741 +++
  15 files changed, 3200 insertions(+), 1 deletion(-)
  create mode 100644 Documentation/gpu/amdgpu/display/display-test.rst
  create mode 100644 drivers/gpu/drm/amd/display/tests/Makefile
  create mode 100644 
drivers/gpu/drm/amd/display/tests/dc/basics/fixpt31_32_test.c
  create mode 100644 drivers/gpu/drm/amd/display/tests/dc/dc_dmub_srv_test.c
  create mode 100644 
drivers/gpu/drm/amd/display/tests/dc/dml/calcs/bw_fixed_test.c
  create mode 100644 
drivers/gpu/drm/amd/display/tests/dc/dml/dcn20/dcn20_fpu_test.c
  create mode 100644 

Re: [PATCH 5/5] drm/msm/dpu: Add X1E80100 support

2024-01-29 Thread Dmitry Baryshkov
On Mon, 29 Jan 2024 at 15:19, Abel Vesa  wrote:
>
> Add definitions for the display hardware used on the Qualcomm X1E80100
> platform.
>
> Co-developed-by: Abhinav Kumar 
> Signed-off-by: Abhinav Kumar 
> Signed-off-by: Abel Vesa 
> ---
>  .../drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h   | 449 
> +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c |   2 +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h |   1 +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c|   1 +
>  4 files changed, 453 insertions(+)
>


Reviewed-by: Dmitry Baryshkov 


-- 
With best wishes
Dmitry


Re: [PATCH 3/5] drm/msm: mdss: Add X1E80100 support

2024-01-29 Thread Dmitry Baryshkov
On Mon, 29 Jan 2024 at 15:19, Abel Vesa  wrote:
>
> Add support for MDSS on X1E80100.
>
> Signed-off-by: Abel Vesa 
> ---
>  drivers/gpu/drm/msm/msm_mdss.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
> index 455b2e3a0cdd..eddf7fdbb60a 100644
> --- a/drivers/gpu/drm/msm/msm_mdss.c
> +++ b/drivers/gpu/drm/msm/msm_mdss.c
> @@ -564,6 +564,15 @@ static const struct msm_mdss_data sdm670_data = {
> .highest_bank_bit = 1,
>  };
>
> +static const struct msm_mdss_data x1e80100_data = {
> +   .ubwc_enc_version = UBWC_4_0,
> +   .ubwc_dec_version = UBWC_4_3,
> +   .ubwc_swizzle = 6,
> +   .ubwc_static = 1,
> +   .highest_bank_bit = 2,
> +   .macrotile_mode = 1,

Missing .reg_bus_bw, LGTM otherwise
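i.e. presumably something along the lines of the other SoC entries;
the value below is only a placeholder, not taken from any patch:

	.reg_bus_bw = 76800,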

> +};
> +
>  static const struct msm_mdss_data sdm845_data = {
> .ubwc_enc_version = UBWC_2_0,
> .ubwc_dec_version = UBWC_2_0,
> @@ -655,6 +664,7 @@ static const struct of_device_id mdss_dt_match[] = {
> { .compatible = "qcom,sm8450-mdss", .data = _data },
> { .compatible = "qcom,sm8550-mdss", .data = _data },
> { .compatible = "qcom,sm8650-mdss", .data = _data},
> +   { .compatible = "qcom,x1e80100-mdss", .data = _data},
> {}
>  };
>  MODULE_DEVICE_TABLE(of, mdss_dt_match);
>
> --
> 2.34.1
>


-- 
With best wishes
Dmitry


Re: Re: Re: [PATCH 1/3] bits: introduce fixed-type genmasks

2024-01-29 Thread Yury Norov
On Mon, Jan 29, 2024 at 08:49:35AM -0600, Lucas De Marchi wrote:
> On Wed, Jan 24, 2024 at 07:27:58AM -0800, Yury Norov wrote:
> > On Wed, Jan 24, 2024 at 08:03:53AM -0600, Lucas De Marchi wrote:
> > > On Wed, Jan 24, 2024 at 09:58:26AM +0200, Jani Nikula wrote:
> > > > On Tue, 23 Jan 2024, Lucas De Marchi  wrote:
> > > > > From: Yury Norov 
> > > > >
> > > > > Generalize __GENMASK() to support different types, and implement
> > > > > fixed-types versions of GENMASK() based on it. The fixed-type version
> > > > > allows more strict checks to the min/max values accepted, which is
> > > > > useful for defining registers like implemented by i915 and xe drivers
> > > > > with their REG_GENMASK*() macros.
> > > >
> > > > Mmh, the commit message says the fixed-type version allows more strict
> > > > checks, but none are actually added. GENMASK_INPUT_CHECK() remains the
> > > > same.
> > > >
> > > > Compared to the i915 and xe versions, this is more lax now. You could
> > > > specify GENMASK_U32(63,32) without complaints.
> > > 
> > > Doing this on top of the this series:
> > > 
> > > -#define   XELPDP_PORT_M2P_COMMAND_TYPE_MASKREG_GENMASK(30, 
> > > 27)
> > > +#define   XELPDP_PORT_M2P_COMMAND_TYPE_MASKREG_GENMASK(62, 
> > > 32)
> > > 
> > > and I do get a build failure:
> > > 
> > > ../drivers/gpu/drm/i915/display/intel_cx0_phy.c: In function 
> > > ‘__intel_cx0_read_once’:
> > > ../include/linux/bits.h:41:31: error: left shift count >= width of type 
> > > [-Werror=shift-count-overflow]
> > >41 |  (((t)~0ULL - ((t)(1) << (l)) + 1) & \
> > >   |   ^~
> > 
> > I would better include this in commit message to avoid people's
> > confusion. If it comes to v2, can you please do it and mention that
> > this trick relies on shift-count-overflow compiler check?
> 
> either that or an explicit check as it was suggested. What's your
> preference?

Let's put a comment in the code. An argument that shift-count-overflow
may be disabled sounds more like speculation unless we have a solid
example of a build system where the error is disabled for a good sane
reason, but possible GENMASK() overflow is still considered dangerous.

GENMASK() is all about bit shifts, so shift-related error is something
I'd expect when using GENMASK().

Also, the macro is widely used in the kernel:

yury:linux$ git grep GENMASK | wc -l
26879

Explicit check would add pressure on the compiler for nothing. 
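For reference, the core of the trick is just this (simplified sketch,
not the exact series code):

	/* the (t)(1) << (l) shift is what trips -Wshift-count-overflow
	 * when h/l are out of range for the fixed-width type t */
	#define __GENMASK_T(t, h, l) \
		(((t)~0ULL - ((t)(1) << (l)) + 1) & \
		 ((t)~0ULL >> (BITS_PER_TYPE(t) - 1 - (h))))

	#define GENMASK_U32(h, l)	__GENMASK_T(u32, h, l)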

Thanks,
Yury


Re: [PATCH 4/5] drm/msm/dp: Try looking for link-frequencies into the port@0's endpoint first

2024-01-29 Thread Dmitry Baryshkov
On Mon, 29 Jan 2024 at 15:19, Abel Vesa  wrote:
>
> From: Abhinav Kumar 
>
> On platforms where the endpoint used is on port@0, looking for port@1
> instead results in just ignoring the max link-frequencies altogether.
> Look at port@0 first, then, if not found, look for port@1.

NAK. Platforms do not "use port@0". It is for the connection between
DPU and DP, while the link-frequencies property is for the link
between DP controller and the actual display.

>
> Signed-off-by: Abhinav Kumar 
> Signed-off-by: Abel Vesa 
> ---
>  drivers/gpu/drm/msm/dp/dp_parser.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/dp/dp_parser.c 
> b/drivers/gpu/drm/msm/dp/dp_parser.c
> index 7032dcc8842b..eec5b8b83f4b 100644
> --- a/drivers/gpu/drm/msm/dp/dp_parser.c
> +++ b/drivers/gpu/drm/msm/dp/dp_parser.c
> @@ -97,7 +97,11 @@ static u32 dp_parser_link_frequencies(struct device_node 
> *of_node)
> u64 frequency = 0;
> int cnt;
>
> -   endpoint = of_graph_get_endpoint_by_regs(of_node, 1, 0); /* port@1 */
> +   endpoint = of_graph_get_endpoint_by_regs(of_node, 0, 0); /* port@0 */
> +
> +   if (!endpoint)
> +   endpoint = of_graph_get_endpoint_by_regs(of_node, 1, 0); /* 
> port@1 */
> +
> if (!endpoint)
> return 0;
>
>
> --
> 2.34.1
>


-- 
With best wishes
Dmitry


Re: Making drm_gpuvm work across gpu devices

2024-01-29 Thread Felix Kuehling

On 2024-01-25 13:32, Daniel Vetter wrote:

On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:

Am 23.01.24 um 20:37 schrieb Zeng, Oak:

[SNIP]
Yes most API are per device based.

One exception I know is actually the kfd SVM API. If you look at the svm_ioctl 
function, it is per-process based. Each kfd_process represents a process across 
N gpu devices.

Yeah and that was a big mistake in my opinion. We should really not do that
ever again.


Need to say, kfd SVM represents a shared virtual address space across CPU and 
all GPU devices on the system. This is by the definition of SVM (shared virtual 
memory). This is very different from our legacy gpu *device* driver which works 
for only one device (i.e., if you want one device to access another device's 
memory, you will have to use dma-buf export/import etc).

Exactly that thinking is what we have currently found as a blocker for
virtualization projects. Having SVM as a device-independent feature which
somehow ties to the process address space turned out to be an extremely bad
idea.

The background is that this only works for some use cases but not all of
them.

What's working much better is to just have a mirror functionality which says
that a range A..B of the process address space is mapped into a range C..D
of the GPU address space.

Those ranges can then be used to implement the SVM feature required for
higher level APIs and not something you need at the UAPI or even inside the
low level kernel memory management.

When you talk about migrating memory to a device you also do this on a per
device basis and *not* tied to the process address space. If you then get
crappy performance because userspace gave contradicting information where to
migrate memory then that's a bug in userspace and not something the kernel
should try to prevent somehow.
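(As a concrete reading of the A..B -> C..D mapping above; purely
illustrative, every name here is invented:)

	struct gpu_va_mirror {
		u64 cpu_start;	/* A: start of the process VA range */
		u64 cpu_end;	/* B: end of the process VA range   */
		u64 gpu_start;	/* C: where it appears in the GPU VA space */
		/* length is implied: the GPU range covers cpu_end - cpu_start */
	};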

[SNIP]

I think if you start using the same drm_gpuvm for multiple devices you
will sooner or later start to run into the same mess we have seen with
KFD, where we moved more and more functionality from the KFD to the DRM
render node because we found that a lot of the stuff simply doesn't work
correctly with a single object to maintain the state.

As I understand it, KFD is designed to work across devices. A single pseudo 
/dev/kfd device represents all hardware gpu devices. That is why during kfd 
open, many pdds (process device data) are created, each for one hardware device 
for this process.

Yes, I'm perfectly aware of that. And I can only repeat myself that I see
this design as a rather extreme failure. And I think it's one of the reasons
why NVidia is so dominant with Cuda.

This whole approach KFD takes was designed with the idea of extending the
CPU process into the GPUs, but this idea only works for a few use cases and
is not something we should apply to drivers in general.

A very good example are virtualization use cases where you end up with CPU
address != GPU address because the VAs are actually coming from the guest VM
and not the host process.

SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This should not have
any influence on the design of the kernel UAPI.

If you want to do something similar as KFD for Xe I think you need to get
explicit permission to do this from Dave and Daniel and maybe even Linus.

I think the one and only one exception where an SVM uapi like in kfd makes
sense, is if the _hardware_ itself, not the software stack defined
semantics that you've happened to build on top of that hw, enforces a 1:1
mapping with the cpu process address space.

Which means your hardware is using PASID, IOMMU based translation, PCI-ATS
(address translation services) or whatever your hw calls it and has _no_
device-side pagetables on top. Which from what I've seen all devices with
device-memory have, simply because they need some place to store whether
that memory is currently in device memory or should be translated using
PASID. Currently there's no gpu that works with PASID only, but there are
some on-cpu-die accelerator things that do work like that.

Maybe in the future there will be some accelerators that are fully cpu
cache coherent (including atomics) with something like CXL, and the
on-device memory is managed as normal system memory with struct page as
ZONE_DEVICE and accelerator va -> physical address translation is only
done with PASID ... but for now I haven't seen that, definitely not in
upstream drivers.

And the moment you have some per-device pagetables or per-device memory
management of some sort (like using gpuva mgr) then I'm 100% agreeing with
Christian that the kfd SVM model is too strict and not a great idea.


That basically means, without ATS/PRI+PASID you cannot implement a 
unified memory programming model, where GPUs or accelerators access 
virtual addresses without pre-registering them with an SVM API call.


Unified memory is a feature implemented by the KFD SVM API and used by 
ROCm. This is used e.g. to implement OpenMP USM (unified shared memory). 

Re: Re: Re: [PATCH 1/3] bits: introduce fixed-type genmasks

2024-01-29 Thread Lucas De Marchi

On Wed, Jan 24, 2024 at 07:27:58AM -0800, Yury Norov wrote:

On Wed, Jan 24, 2024 at 08:03:53AM -0600, Lucas De Marchi wrote:

On Wed, Jan 24, 2024 at 09:58:26AM +0200, Jani Nikula wrote:
> On Tue, 23 Jan 2024, Lucas De Marchi  wrote:
> > From: Yury Norov 
> >
> > Generalize __GENMASK() to support different types, and implement
> > fixed-types versions of GENMASK() based on it. The fixed-type version
> > allows more strict checks to the min/max values accepted, which is
> > useful for defining registers like implemented by i915 and xe drivers
> > with their REG_GENMASK*() macros.
>
> Mmh, the commit message says the fixed-type version allows more strict
> checks, but none are actually added. GENMASK_INPUT_CHECK() remains the
> same.
>
> Compared to the i915 and xe versions, this is more lax now. You could
> specify GENMASK_U32(63,32) without complaints.

Doing this on top of the this series:

-#define   XELPDP_PORT_M2P_COMMAND_TYPE_MASKREG_GENMASK(30, 27)
+#define   XELPDP_PORT_M2P_COMMAND_TYPE_MASKREG_GENMASK(62, 32)

and I do get a build failure:

../drivers/gpu/drm/i915/display/intel_cx0_phy.c: In function 
‘__intel_cx0_read_once’:
../include/linux/bits.h:41:31: error: left shift count >= width of type 
[-Werror=shift-count-overflow]
   41 |  (((t)~0ULL - ((t)(1) << (l)) + 1) & \
  |   ^~


I would better include this in commit message to avoid people's
confusion. If it comes to v2, can you please do it and mention that
this trick relies on shift-count-overflow compiler check?


either that or an explicit check as it was suggested. What's your
preference?

Lucas De Marchi



Thanks,
Yury


Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

2024-01-29 Thread Paul Cercueil
Le lundi 29 janvier 2024 à 14:32 +0100, Paul Cercueil a écrit :
> Le lundi 29 janvier 2024 à 14:17 +0100, Christian König a écrit :
> > Am 29.01.24 um 14:06 schrieb Paul Cercueil:
> > > Hi Christian,
> > > 
> > > Le lundi 29 janvier 2024 à 13:52 +0100, Christian König a écrit :
> > > > Am 27.01.24 um 17:50 schrieb Jonathan Cameron:
> > > > > > > > +   iio_buffer_dmabuf_put(attach);
> > > > > > > > +
> > > > > > > > +out_dmabuf_put:
> > > > > > > > +   dma_buf_put(dmabuf);
> > > > > > > As below. Feels like a __free(dma_buf_put) bit of magic
> > > > > > > would
> > > > > > > be a
> > > > > > > nice to have.
> > > > > > I'm working on the patches right now, just one quick
> > > > > > question.
> > > > > > 
> > > > > > Having a __free(dma_buf_put) requires that dma_buf_put is
> > > > > > first
> > > > > > "registered" as a freeing function using DEFINE_FREE() in
> > > > > > <linux/dma-buf.h>, which has not been done yet.
> > > > > > 
> > > > > > That would mean carrying a dma-buf specific patch in your
> > > > > > tree,
> > > > > > are you
> > > > > > OK with that?
> > > > > Needs an ACK from appropriate maintainer, but otherwise I'm
> > > > > fine
> > > > > doing
> > > > > so.  Alternative is to circle back to this later after this
> > > > > code is
> > > > > upstream.
> > > > Separate patches for that please, the autocleanup feature is so
> > > > new
> > > > that
> > > > I'm not 100% convinced that everything works out smoothly from
> > > > the
> > > > start.
> > > Separate patches is a given, did you mean outside this patchset?
> > > Because I can send a separate patchset that introduces scope-
> > > based
> > > management for dma_fence and dma_buf, but then it won't have
> > > users.
> > 
> > Outside of the patchset, this is essentially brand new stuff.
> > 
> > IIRC we have quite a number of dma_fence selftests and sw_sync
> > which
> > is 
> > basically code inside the drivers/dma-buf directory only there for 
> > testing DMA-buf functionality.
> > 
> > Convert those over as well and I'm more than happy to upstream this
> > change.
> 
> Well there is very little to convert there; you can use scope-based
> management when the unref is done in all exit points of the
> functional
> block, and the only place I could find that does that in drivers/dma-
> buf/ was in dma_fence_chain_enable_signaling() in dma-fence-chain.c.

Actually - not even that, since it doesn't call dma_fence_get() and
dma_fence_put() on the same fence.

So I cannot use it anywhere in drivers/dma-buf/.
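For reference, the registration being discussed would presumably look
something like this (a sketch against <linux/cleanup.h>; the IS_ERR
guard is my assumption, this is not a merged API):

	DEFINE_FREE(dma_buf_put, struct dma_buf *,
		    if (!IS_ERR_OR_NULL(_T)) dma_buf_put(_T))

	/* usage: the reference is dropped automatically on scope exit */
	struct dma_buf *dmabuf __free(dma_buf_put) = dma_buf_get(fd);

	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);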

-Paul


Re: [PATCH 1/2] ACPI: video: Handle fetching EDID that is longer than 256 bytes

2024-01-29 Thread Rafael J. Wysocki
On Fri, Jan 26, 2024 at 7:55 PM Mario Limonciello
 wrote:
>
> The ACPI specification allows for an EDID to be up to 512 bytes but
> the _DDC EDID fetching code will only try up to 256 bytes.
>
> Modify the code to instead start at 512 bytes and work its way
> down.
>
> Link: 
> https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/acpi/acpi_video.c | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
> index 62f4364e4460..b3b15dd4755d 100644
> --- a/drivers/acpi/acpi_video.c
> +++ b/drivers/acpi/acpi_video.c
> @@ -624,6 +624,10 @@ acpi_video_device_EDID(struct acpi_video_device *device,
> arg0.integer.value = 1;
> else if (length == 256)
> arg0.integer.value = 2;
> +   else if (length == 384)
> +   arg0.integer.value = 3;
> +   else if (length == 512)
> +   arg0.integer.value = 4;

It looks like switch () would be somewhat better.

Or maybe even

arg0.integer.value = length / 128;

The validation could be added too:

if (arg0.integer.value > 4 || arg0.integer.value * 128 != length)
return -EINVAL;

but it is pointless, because the caller is never passing an invalid
number to it AFAICS.
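Spelled out, the switch () variant would read something like this
(untested sketch):

	switch (length) {
	case 128:
		arg0.integer.value = 1;
		break;
	case 256:
		arg0.integer.value = 2;
		break;
	case 384:
		arg0.integer.value = 3;
		break;
	case 512:
		arg0.integer.value = 4;
		break;
	default:
		return -EINVAL;
	}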

> else
> return -EINVAL;
>
> @@ -1443,7 +1447,7 @@ int acpi_video_get_edid(struct acpi_device *device, int 
> type, int device_id,
>
> for (i = 0; i < video->attached_count; i++) {
> video_device = video->attached_array[i].bind_info;
> -   length = 256;
> +   length = 512;
>
> if (!video_device)
> continue;
> @@ -1478,13 +1482,18 @@ int acpi_video_get_edid(struct acpi_device *device, 
> int type, int device_id,
>
> if (ACPI_FAILURE(status) || !buffer ||
> buffer->type != ACPI_TYPE_BUFFER) {
> -   length = 128;
> -   status = acpi_video_device_EDID(video_device, &buffer,
> -   length);
> -   if (ACPI_FAILURE(status) || !buffer ||
> -   buffer->type != ACPI_TYPE_BUFFER) {
> -   continue;
> +   while (length) {

I would prefer a do {} while () loop here, which could include the
first invocation of acpi_video_device_EDID() too (and reduce code
duplication a bit).
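That is, roughly (untested sketch, inside the existing loop; variable
names as in the hunk above):

	length = 512;
	do {
		status = acpi_video_device_EDID(video_device, &buffer, length);
		if (ACPI_SUCCESS(status) && buffer &&
		    buffer->type == ACPI_TYPE_BUFFER)
			break;
		length -= 128;
	} while (length);

	if (!length)
		continue;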

> +   length -= 128;
> +   status = acpi_video_device_EDID(video_device, &buffer,
> +   length);

No line break, please.

> +   if (ACPI_FAILURE(status) || !buffer ||
> +   buffer->type != ACPI_TYPE_BUFFER) {
> +   continue;
> +   }
> +   break;
> }
> +   if (!length)
> +   continue;
> }
>
> *edid = buffer->buffer.pointer;
> --


Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

2024-01-29 Thread Paul Cercueil
Le lundi 29 janvier 2024 à 14:17 +0100, Christian König a écrit :
> Am 29.01.24 um 14:06 schrieb Paul Cercueil:
> > Hi Christian,
> > 
> > Le lundi 29 janvier 2024 à 13:52 +0100, Christian König a écrit :
> > > Am 27.01.24 um 17:50 schrieb Jonathan Cameron:
> > > > > > > + iio_buffer_dmabuf_put(attach);
> > > > > > > +
> > > > > > > +out_dmabuf_put:
> > > > > > > + dma_buf_put(dmabuf);
> > > > > > As below. Feels like a __free(dma_buf_put) bit of magic
> > > > > > would
> > > > > > be a
> > > > > > nice to have.
> > > > > I'm working on the patches right now, just one quick
> > > > > question.
> > > > > 
> > > > > Having a __free(dma_buf_put) requires that dma_buf_put is
> > > > > first
> > > > > "registered" as a freeing function using DEFINE_FREE() in
> > > > > <linux/dma-buf.h>, which has not been done yet.
> > > > > 
> > > > > That would mean carrying a dma-buf specific patch in your
> > > > > tree,
> > > > > are you
> > > > > OK with that?
> > > > Needs an ACK from appropriate maintainer, but otherwise I'm
> > > > fine
> > > > doing
> > > > so.  Alternative is to circle back to this later after this
> > > > code is
> > > > upstream.
> > > Separate patches for that please, the autocleanup feature is so
> > > new
> > > that
> > > I'm not 100% convinced that everything works out smoothly from
> > > the
> > > start.
> > Separate patches is a given, did you mean outside this patchset?
> > Because I can send a separate patchset that introduces scope-based
> > management for dma_fence and dma_buf, but then it won't have users.
> 
> Outside of the patchset, this is essentially brand new stuff.
> 
> IIRC we have quite a number of dma_fence selftests and sw_sync which
> is 
> basically code inside the drivers/dma-buf directory only there for 
> testing DMA-buf functionality.
> 
> Convert those over as well and I'm more than happy to upstream this
> change.

Well there is very little to convert there; you can use scope-based
management when the unref is done in all exit points of the functional
block, and the only place I could find that does that in drivers/dma-
buf/ was in dma_fence_chain_enable_signaling() in dma-fence-chain.c.

Cheers,
-Paul


Re: Implement per-key keyboard backlight as auxdisplay?

2024-01-29 Thread Hans de Goede
Hi Werner,

On 1/19/24 17:04, Werner Sembach wrote:
> Am 19.01.24 um 09:44 schrieb Hans de Goede:



>> So my proposal would be an ioctl interface (ioctl only no r/w)
>> using /dev/rgbkbd0 /dev/rgbkdb1, etc. registered as a misc chardev.
>>
>> For per key controllable rgb LEDs we need to discuss a coordinate
>> system. I propose using a fixed size of 16 rows of 64 keys,
>> so 64x16 in standard WxH notation.
>>
>> And then storing RGB in separate bytes, so userspace will then
>> always send a buffer of 192 bytes per line (64x3) x 14 rows
>> = 3072 bytes. With the kernel driver ignoring parts of
>> the buffer where there are no actual keys.
> Just to be sure, the "14 rows" is a typo? And should be 16 rows?

Yes that should be 16.
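So, spelled out (a sketch of the proposed fixed layout; the names are
made up):

	#define RGBKBD_COLS	64
	#define RGBKBD_ROWS	16
	#define RGBKBD_BUF_SIZE	(RGBKBD_ROWS * RGBKBD_COLS * 3)	/* 3072 */

	/* offset of a key's RGB triplet; R at +0, G at +1, B at +2 */
	static inline size_t rgbkbd_offset(unsigned int row, unsigned int col)
	{
		return (row * RGBKBD_COLS + col) * 3;
	}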



>> This way we can address all the possible keys in the various
>> standard layouts in one standard wat and then the drivers can
>> just skip keys which are not there when preparing the buffer
>> to send to the hw / fw.
> 
> Some remarks here:
> 
> - Some keyboards might have two or more leds for big keys like (iso-)enter, 
> shift, capslock, num+, etc. that in theory are individually controllable by 
> the firmware. In windows drivers this is usually abstracted away, but could 
> be interesting for effects (e.g. if the top of iso-enter is separate from the 
> bottom of iso-enter like with one of our devices).
> 
> - In combination with this: The driver might not be able to tell if the 
> actual physical keyboard is ISO or ANSI, so it might not be able the 
> correctly assign the leds around enter correctly as being an own key or being 
> part of ANSI- or ISO-enter.
> 
> - Should the interface have different addresses for the different enter and 
> num+ styles (or even the different length shifts and spacebars)?
> 
> One idea for this: Actually assign 1 value per line for tall keys per line, 3 
> (or maybe even 4, to have one spare) values per line for wide keys and 6 (or 
> 8) values for space. e.g.:

That sounds workable. OTOH, combined with your remarks about also supporting
lightbars, I'm starting to think that we need to just punt this to userspace.

So basically change things from trying to present a standardized address
space where, say, the 'Q' key is always in the same place, to just modeling
a keyboard as a string of LEDs (1-dimensional, so an array) and leaving the
mapping of which address in the array is which key to userspace; userspace
can then have JSON or whatever files for this per keyboard.

This keeps the kernel interface much more KISS which I think is what
we need to strive for.

So instead of having /dev/rgbkbd we get a /dev/rgbledstring, and that
can then be used for rgb-kbds and also your lightbar example, as well
as actual RGB LED strings, which depending on the controller may
also have zones / effects, etc. just like the keyboards.



> - Right shift would have 3 values in row 10. The first value might be the 
> left side of shift or the additional ABNT/JIS key. The 2nd value might be the 
> left side or middle of shift and the third value might be the right side of 
> shift or the only value for the whole key. The additional ABNT/JIS key still 
> also has a dedicated value which is used by drivers which can differentiate 
> between physical layouts.
> 
> - Enter would have 3 values in row 8 and 3 values in row 9. With the same 
> disambiguation as the additional ABNT/JIS but this time for ansi-/ and iso-#
> 
> - Num+ would have 2 values, one in row 8 and one in row 9. The one in row 9 
> might control the whole key or might just control the lower half. The one in 
> row 8 might be another key or the upper half
> 
> For the left half of the main block the leftmost value should be the "might 
> be the only relevant"-value while the right most value should be the "might 
> be another key"-value. For the right side of the main block this should be 
> swapped. Unused values should be adjacent to the "might be another 
> key"-value, e.g.:
> 
>                                   | Left shift value 1    | Left shift value 2     | Left shift value 3 | Left shift value 4 | 102nd key value
> ISO/ANSI aware                    | Left shift color      | Unused                 | Unused             | Unused             | 102nd key color
> ISO non aware 1 led under shift   | Left shift color      | Unused                 | Unused             | 102nd key color    | Unused
> ANSI non aware 1 led under shift  | Left shift color      | Unused                 | Unused             | Unused             | Unused
> ISO non aware 2 leds under shift  | Left shift left color | Left shift right color | Unused             | 102nd key color    | Unused
> ANSI non aware 2 leds under shift | Left shift left color | Left shift right color | Unused             | Unused             | Unused
> ISO non aware 3 leds under shift  | Left shift left color | Left shift middle 
> 

Re: Implement per-key keyboard backlight as auxdisplay?

2024-01-29 Thread Hans de Goede
Hi,

On 1/19/24 21:15, Pavel Machek wrote:
> Hi!
> 
 2. Implement per-key keyboards as auxdisplay

     - Pro:

         - Already has a concept for led positions

         - Is conceptually closer to "multiple leds forming a singular 
 entity"

     - Con:

         - No preexisting UPower support

         - No concept for special hardware lightning modes

         - No support for arbitrary led outlines yet (e.g. ISO style 
 enter-key)
>>>
>>> Please do this one.
>>
>> Ok, so based on the discussion so far and Pavel's feedback lets try to
>> design a custom userspace API for this. I do not believe that auxdisplay
>> is a good fit because:
> 
> Ok, so lets call this a "display". These days, framebuffers and drm
> handles displays. My proposal is to use similar API as other displays.
> 
>> So my proposal would be an ioctl interface (ioctl only no r/w)
>> using /dev/rgbkbd0 /dev/rgbkdb1, etc. registered as a misc chardev.
>>
>> For per key controllable rgb LEDs we need to discuss a coordinate
>> system. I propose using a fixed size of 16 rows of 64 keys,
>> so 64x16 in standard WxH notation.
>>
>> And then storing RGB in separate bytes, so userspace will then
>> always send a buffer of 192 bytes per line (64x3) x 14 rows
>> = 3072 bytes. With the kernel driver ignoring parts of
>> the buffer where there are no actual keys.
> 
> That's a really, really weird interface. If you are doing RGB888 64x14,
> let's make it a ... display? :-).
> 
> ioctl always sending 3072 bytes is really a hack.
> 
> Small displays exist and are quite common, surely we'd handle this as
> a display:
> https://pajenicko.cz/displeje/graficky-oled-displej-0-66-64x48-i2c-bily-wemos-d1-mini
> It is 64x48.

This is indeed a display and should use display APIs

> And then there's this:
> https://pajenicko.cz/displeje/maticovy-8x8-led-displej-s-radicem-max7219
> and this:
> https://pajenicko.cz/displeje/maticovy-8x32-led-displej-s-radicem-max7219
>
> One of them is 8x8.
> 
> Surely those should be displays, too?

The 8x8 one not really; the other one could be used to scroll
some text on it but cannot display images, so they are not really
displays IMHO.

Anyways we are talking about keyboards here and those do not have
a regular x-y grid like your example above, so they certainly do
not count as displays. See the long discussion earlier in the thread.

Regards,

Hans






[PATCH 4/5] drm/msm/dp: Try looking for link-frequencies into the port@0's endpoint first

2024-01-29 Thread Abel Vesa
From: Abhinav Kumar 

On platforms where the endpoint used is on port@0, looking for port@1
instead results in just ignoring the max link-frequencies altogether.
Look at port@0 first; if not found, fall back to port@1.

Signed-off-by: Abhinav Kumar 
Signed-off-by: Abel Vesa 
---
 drivers/gpu/drm/msm/dp/dp_parser.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_parser.c 
b/drivers/gpu/drm/msm/dp/dp_parser.c
index 7032dcc8842b..eec5b8b83f4b 100644
--- a/drivers/gpu/drm/msm/dp/dp_parser.c
+++ b/drivers/gpu/drm/msm/dp/dp_parser.c
@@ -97,7 +97,11 @@ static u32 dp_parser_link_frequencies(struct device_node 
*of_node)
u64 frequency = 0;
int cnt;
 
-   endpoint = of_graph_get_endpoint_by_regs(of_node, 1, 0); /* port@1 */
+   endpoint = of_graph_get_endpoint_by_regs(of_node, 0, 0); /* port@0 */
+
+   if (!endpoint)
+   endpoint = of_graph_get_endpoint_by_regs(of_node, 1, 0); /* 
port@1 */
+
if (!endpoint)
return 0;
 

-- 
2.34.1



[PATCH 3/5] drm/msm: mdss: Add X1E80100 support

2024-01-29 Thread Abel Vesa
Add support for MDSS on X1E80100.

Signed-off-by: Abel Vesa 
---
 drivers/gpu/drm/msm/msm_mdss.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
index 455b2e3a0cdd..eddf7fdbb60a 100644
--- a/drivers/gpu/drm/msm/msm_mdss.c
+++ b/drivers/gpu/drm/msm/msm_mdss.c
@@ -564,6 +564,15 @@ static const struct msm_mdss_data sdm670_data = {
.highest_bank_bit = 1,
 };
 
+static const struct msm_mdss_data x1e80100_data = {
+   .ubwc_enc_version = UBWC_4_0,
+   .ubwc_dec_version = UBWC_4_3,
+   .ubwc_swizzle = 6,
+   .ubwc_static = 1,
+   .highest_bank_bit = 2,
+   .macrotile_mode = 1,
+};
+
 static const struct msm_mdss_data sdm845_data = {
.ubwc_enc_version = UBWC_2_0,
.ubwc_dec_version = UBWC_2_0,
@@ -655,6 +664,7 @@ static const struct of_device_id mdss_dt_match[] = {
{ .compatible = "qcom,sm8450-mdss", .data = _data },
{ .compatible = "qcom,sm8550-mdss", .data = _data },
{ .compatible = "qcom,sm8650-mdss", .data = _data},
+   { .compatible = "qcom,x1e80100-mdss", .data = _data},
{}
 };
 MODULE_DEVICE_TABLE(of, mdss_dt_match);

-- 
2.34.1



[PATCH 5/5] drm/msm/dpu: Add X1E80100 support

2024-01-29 Thread Abel Vesa
Add definitions for the display hardware used on the Qualcomm X1E80100
platform.

Co-developed-by: Abhinav Kumar 
Signed-off-by: Abhinav Kumar 
Signed-off-by: Abel Vesa 
---
 .../drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h   | 449 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c |   2 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h |   1 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c|   1 +
 4 files changed, 453 insertions(+)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h 
b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h
new file mode 100644
index ..d4f1fbfa420a
--- /dev/null
+++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h
@@ -0,0 +1,449 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2023, Linaro Limited
+ */
+
+#ifndef _DPU_9_2_X1E80100_H
+#define _DPU_9_2_X1E80100_H
+
+static const struct dpu_caps x1e80100_dpu_caps = {
+   .max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH,
+   .max_mixer_blendstages = 0xb,
+   .has_src_split = true,
+   .has_dim_layer = true,
+   .has_idle_pc = true,
+   .has_3d_merge = true,
+   .max_linewidth = 5120,
+   .pixel_ram_size = DEFAULT_PIXEL_RAM_SIZE,
+};
+
+static const struct dpu_mdp_cfg x1e80100_mdp = {
+   .name = "top_0",
+   .base = 0, .len = 0x494,
+   .features = BIT(DPU_MDP_PERIPH_0_REMOVED),
+   .clk_ctrls = {
+   [DPU_CLK_CTRL_REG_DMA] = { .reg_off = 0x2bc, .bit_off = 20 },
+   },
+};
+
+/* FIXME: get rid of DPU_CTL_SPLIT_DISPLAY in favour of proper ACTIVE_CTL 
support */
+static const struct dpu_ctl_cfg x1e80100_ctl[] = {
+   {
+   .name = "ctl_0", .id = CTL_0,
+   .base = 0x15000, .len = 0x290,
+   .features = CTL_SM8550_MASK | BIT(DPU_CTL_SPLIT_DISPLAY),
+   .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 9),
+   }, {
+   .name = "ctl_1", .id = CTL_1,
+   .base = 0x16000, .len = 0x290,
+   .features = CTL_SM8550_MASK | BIT(DPU_CTL_SPLIT_DISPLAY),
+   .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 10),
+   }, {
+   .name = "ctl_2", .id = CTL_2,
+   .base = 0x17000, .len = 0x290,
+   .features = CTL_SM8550_MASK,
+   .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 11),
+   }, {
+   .name = "ctl_3", .id = CTL_3,
+   .base = 0x18000, .len = 0x290,
+   .features = CTL_SM8550_MASK,
+   .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 12),
+   }, {
+   .name = "ctl_4", .id = CTL_4,
+   .base = 0x19000, .len = 0x290,
+   .features = CTL_SM8550_MASK,
+   .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 13),
+   }, {
+   .name = "ctl_5", .id = CTL_5,
+   .base = 0x1a000, .len = 0x290,
+   .features = CTL_SM8550_MASK,
+   .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 23),
+   },
+};
+
+static const struct dpu_sspp_cfg x1e80100_sspp[] = {
+   {
+   .name = "sspp_0", .id = SSPP_VIG0,
+   .base = 0x4000, .len = 0x344,
+   .features = VIG_SDM845_MASK,
+   .sblk = _vig_sblk_qseed3_3_2,
+   .xin_id = 0,
+   .type = SSPP_TYPE_VIG,
+   }, {
+   .name = "sspp_1", .id = SSPP_VIG1,
+   .base = 0x6000, .len = 0x344,
+   .features = VIG_SDM845_MASK,
+   .sblk = _vig_sblk_qseed3_3_2,
+   .xin_id = 4,
+   .type = SSPP_TYPE_VIG,
+   }, {
+   .name = "sspp_2", .id = SSPP_VIG2,
+   .base = 0x8000, .len = 0x344,
+   .features = VIG_SDM845_MASK,
+   .sblk = _vig_sblk_qseed3_3_2,
+   .xin_id = 8,
+   .type = SSPP_TYPE_VIG,
+   }, {
+   .name = "sspp_3", .id = SSPP_VIG3,
+   .base = 0xa000, .len = 0x344,
+   .features = VIG_SDM845_MASK,
+   .sblk = _vig_sblk_qseed3_3_2,
+   .xin_id = 12,
+   .type = SSPP_TYPE_VIG,
+   }, {
+   .name = "sspp_8", .id = SSPP_DMA0,
+   .base = 0x24000, .len = 0x344,
+   .features = DMA_SDM845_MASK,
+   .sblk = _dma_sblk,
+   .xin_id = 1,
+   .type = SSPP_TYPE_DMA,
+   }, {
+   .name = "sspp_9", .id = SSPP_DMA1,
+   .base = 0x26000, .len = 0x344,
+   .features = DMA_SDM845_MASK,
+   .sblk = _dma_sblk,
+   .xin_id = 5,
+   .type = SSPP_TYPE_DMA,
+   }, {
+   .name = "sspp_10", .id = SSPP_DMA2,
+   .base = 0x28000, .len = 0x344,
+   .features = DMA_SDM845_MASK,
+   .sblk = _dma_sblk,
+   .xin_id = 9,
+   .type = SSPP_TYPE_DMA,
+   }, {
+   

[PATCH 2/5] dt-bindings: display/msm: Document the DPU for X1E80100

2024-01-29 Thread Abel Vesa
Document the DPU for the Qualcomm X1E80100 platform in the SM8650 schema, as
they are similar.

Signed-off-by: Abel Vesa 
---
 Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml 
b/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml
index a01d15a03317..f84fa6d5e6a2 100644
--- a/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml
+++ b/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml
@@ -13,7 +13,10 @@ $ref: /schemas/display/msm/dpu-common.yaml#
 
 properties:
   compatible:
-const: qcom,sm8650-dpu
+items:
+  - enum:
+  - qcom,sm8650-dpu
+  - qcom,x1e80100-dpu
 
   reg:
 items:

-- 
2.34.1



[PATCH 1/5] dt-bindings: display/msm: document MDSS on X1E80100

2024-01-29 Thread Abel Vesa
Document the MDSS hardware found on the Qualcomm X1E80100 platform.

Signed-off-by: Abel Vesa 
---
 .../bindings/display/msm/qcom,x1e80100-mdss.yaml   | 249 +
 1 file changed, 249 insertions(+)

diff --git 
a/Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.yaml 
b/Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.yaml
new file mode 100644
index ..eaa91f7d61ac
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.yaml
@@ -0,0 +1,249 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/msm/qcom,x1e80100-mdss.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm X1E80100 Display MDSS
+
+maintainers:
+  - Abel Vesa 
+
+description:
+  X1E80100 MSM Mobile Display Subsystem (MDSS), which encapsulates sub-blocks 
like
+  DPU display controller, DP interfaces, etc.
+
+$ref: /schemas/display/msm/mdss-common.yaml#
+
+properties:
+  compatible:
+const: qcom,x1e80100-mdss
+
+  clocks:
+items:
+  - description: Display AHB
+  - description: Display hf AXI
+  - description: Display core
+
+  iommus:
+maxItems: 1
+
+  interconnects:
+maxItems: 3
+
+  interconnect-names:
+maxItems: 3
+
+patternProperties:
+  "^display-controller@[0-9a-f]+$":
+type: object
+properties:
+  compatible:
+const: qcom,x1e80100-dpu
+
+  "^displayport-controller@[0-9a-f]+$":
+type: object
+properties:
+  compatible:
+const: qcom,x1e80100-dp
+
+  "^phy@[0-9a-f]+$":
+type: object
+properties:
+  compatible:
+const: qcom,x1e80100-dp-phy
+
+required:
+  - compatible
+
+unevaluatedProperties: false
+
+examples:
+  - |
+#include 
+#include 
+#include 
+
+display-subsystem@ae0 {
+compatible = "qcom,x1e80100-mdss";
+reg = <0x0ae0 0x1000>;
+reg-names = "mdss";
+
+interconnects = <_noc MASTER_MDP 0 _noc SLAVE_LLCC 0>,
+<_virt MASTER_LLCC 0 _virt SLAVE_EBI1 0>,
+<_noc MASTER_APPSS_PROC 0 _noc SLAVE_DISPLAY_CFG 0>;
+interconnect-names = "mdp0-mem", "mdp1-mem", "cpu-cfg";
+
+resets = <_core_bcr>;
+
+power-domains = <_gdsc>;
+
+clocks = < DISP_CC_MDSS_AHB_CLK>,
+ < GCC_DISP_AHB_CLK>,
+ < GCC_DISP_HF_AXI_CLK>,
+ < DISP_CC_MDSS_MDP_CLK>;
+clock-names = "iface", "bus", "nrt_bus", "core";
+
+interrupts = ;
+interrupt-controller;
+#interrupt-cells = <1>;
+
+iommus = <_smmu 0x1c00 0x2>;
+
+#address-cells = <1>;
+#size-cells = <1>;
+ranges;
+
+display-controller@ae01000 {
+compatible = "qcom,x1e80100-dpu";
+reg = <0x0ae01000 0x8f000>,
+  <0x0aeb 0x2008>;
+reg-names = "mdp", "vbif";
+
+clocks = <_axi_clk>,
+ <_ahb_clk>,
+ <_mdp_lut_clk>,
+ <_mdp_clk>,
+ <_mdp_vsync_clk>;
+clock-names = "nrt_bus",
+  "iface",
+  "lut",
+  "core",
+  "vsync";
+
+assigned-clocks = <_mdp_vsync_clk>;
+assigned-clock-rates = <1920>;
+
+operating-points-v2 = <_opp_table>;
+power-domains = < RPMHPD_MMCX>;
+
+interrupt-parent = <>;
+interrupts = <0>;
+
+ports {
+#address-cells = <1>;
+#size-cells = <0>;
+
+port@0 {
+reg = <0>;
+dpu_intf1_out: endpoint {
+remote-endpoint = <_in>;
+};
+};
+
+port@1 {
+reg = <1>;
+dpu_intf2_out: endpoint {
+remote-endpoint = <_in>;
+};
+};
+};
+
+mdp_opp_table: opp-table {
+compatible = "operating-points-v2";
+
+opp-2 {
+opp-hz = /bits/ 64 <2>;
+required-opps = <_opp_low_svs>;
+};
+
+opp-32500 {
+opp-hz = /bits/ 64 <32500>;
+required-opps = <_opp_svs>;
+};
+
+opp-37500 {
+opp-hz = /bits/ 64 <37500>;
+required-opps = <_opp_svs_l1>;
+};
+
+opp-51400 {
+opp-hz = /bits/ 64 <51400>;
+required-opps = <_opp_nom>;
+};
+};
+};
+
+displayport-controller@ae9 {
+compatible = "qcom,x1e80100-dp";
+reg = <0 0xae9 0 0x200>,
+   

[PATCH 0/5] drm/msm: Add display support for X1E80100

2024-01-29 Thread Abel Vesa
This patchset adds display support for the X1E80100.
The support for embedded DisplayPort on this platform will not
be enabled using the connector type from driver match data,
but through an 'is-edp' property via DT. That subsequent work
will be part of a separate patchset.

Signed-off-by: Abel Vesa 
---
Abel Vesa (4):
  dt-bindings: display/msm: document MDSS on X1E80100
  dt-bindings: display/msm: Document the DPU for X1E80100
  drm/msm: mdss: Add X1E80100 support
  drm/msm/dpu: Add X1E80100 support

Abhinav Kumar (1):
  drm/msm/dp: Try looking for link-frequencies into the port@0's endpoint 
first

 .../bindings/display/msm/qcom,sm8650-dpu.yaml  |   5 +-
 .../bindings/display/msm/qcom,x1e80100-mdss.yaml   | 249 
 .../drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h   | 449 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c |   2 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h |   1 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c|   1 +
 drivers/gpu/drm/msm/dp/dp_parser.c |   6 +-
 drivers/gpu/drm/msm/msm_mdss.c |  10 +
 8 files changed, 721 insertions(+), 2 deletions(-)
---
base-commit: 6776c8d0924953c6bbd4920d8408f4c1d898af71
change-id: 20231201-x1e80100-display-a46324400baf

Best regards,
-- 
Abel Vesa 



Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

2024-01-29 Thread Christian König

Am 29.01.24 um 14:06 schrieb Paul Cercueil:

Hi Christian,

Le lundi 29 janvier 2024 à 13:52 +0100, Christian König a écrit :

Am 27.01.24 um 17:50 schrieb Jonathan Cameron:

+   iio_buffer_dmabuf_put(attach);
+
+out_dmabuf_put:
+   dma_buf_put(dmabuf);

As below. Feels like a __free(dma_buf_put) bit of magic would be a
nice to have.

I'm working on the patches right now, just one quick question.

Having a __free(dma_buf_put) requires that dma_buf_put is first
"registered" as a freeing function using DEFINE_FREE() in
<linux/dma-buf.h>, which has not been done yet.

That would mean carrying a dma-buf specific patch in your tree,
are you
OK with that?

Needs an ACK from appropriate maintainer, but otherwise I'm fine
doing
so.  Alternative is to circle back to this later after this code is
upstream.

Separate patches for that please, the autocleanup feature is so new
that
I'm not 100% convinced that everything works out smoothly from the
start.

Separate patches is a given, did you mean outside this patchset?
Because I can send a separate patchset that introduces scope-based
management for dma_fence and dma_buf, but then it won't have users.


Outside of the patchset, this is essentially brand new stuff.

IIRC we have quite a number of dma_fence selftests and sw_sync which is 
basically code inside the drivers/dma-buf directory only there for 
testing DMA-buf functionality.


Convert those over as well and I'm more than happy to upstream this change.

Thanks,
Christian.



Cheers,
-Paul
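
For reference, a minimal sketch of what the DEFINE_FREE() registration
and a converted call site could look like, assuming the scope-based
cleanup helpers from <linux/cleanup.h> (the example function name is
made up):

/* In <linux/dma-buf.h> (sketch): teach the cleanup infrastructure how
 * to drop a dma_buf reference when the variable goes out of scope.
 */
DEFINE_FREE(dma_buf_put, struct dma_buf *,
            if (!IS_ERR_OR_NULL(_T)) dma_buf_put(_T))

/* At a call site (sketch): the reference is put automatically on every
 * return path, which removes the out_dmabuf_put-style labels.
 */
static int iio_attach_dmabuf_example(int fd)
{
        struct dma_buf *dmabuf __free(dma_buf_put) = dma_buf_get(fd);

        if (IS_ERR(dmabuf))
                return PTR_ERR(dmabuf);

        /* ... set up the attachment; early returns are now safe ... */
        return 0;
}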




Re: [PATCH v2 1/1] drm/virtio: Implement device_attach

2024-01-29 Thread Christian König

Am 29.01.24 um 11:31 schrieb Julia Zhang:

Vram objects don't have backing pages and thus can't implement the
drm_gem_object_funcs.get_sg_table callback. This removes the generic drm
dma-buf callbacks in virtgpu_gem_map_dma_buf()/virtgpu_gem_unmap_dma_buf()
and implements virtgpu-specific map/unmap/attach callbacks to support
both shmem objects and vram objects.

Signed-off-by: Julia Zhang 


I need to find more time to look into the code, but offhand I would say 
that this is the correct solution.


Regards,
Christian.


---
  drivers/gpu/drm/virtio/virtgpu_prime.c | 40 +++---
  1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c 
b/drivers/gpu/drm/virtio/virtgpu_prime.c
index 44425f20d91a..b490a5343b06 100644
--- a/drivers/gpu/drm/virtio/virtgpu_prime.c
+++ b/drivers/gpu/drm/virtio/virtgpu_prime.c
@@ -49,11 +49,26 @@ virtgpu_gem_map_dma_buf(struct dma_buf_attachment *attach,
  {
struct drm_gem_object *obj = attach->dmabuf->priv;
struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
+   struct sg_table *sgt;
+   int ret;
  
  	if (virtio_gpu_is_vram(bo))

return virtio_gpu_vram_map_dma_buf(bo, attach->dev, dir);
  
-	return drm_gem_map_dma_buf(attach, dir);

+   sgt = drm_prime_pages_to_sg(obj->dev,
+   to_drm_gem_shmem_obj(obj)->pages,
+   obj->size >> PAGE_SHIFT);
+   if (IS_ERR(sgt))
+   return sgt;
+
+   ret = dma_map_sgtable(attach->dev, sgt, dir, DMA_ATTR_SKIP_CPU_SYNC);
+   if (ret) {
+   sg_free_table(sgt);
+   kfree(sgt);
+   return ERR_PTR(ret);
+   }
+
+   return sgt;
  }
  
  static void virtgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach,

@@ -63,12 +78,29 @@ static void virtgpu_gem_unmap_dma_buf(struct 
dma_buf_attachment *attach,
struct drm_gem_object *obj = attach->dmabuf->priv;
struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
  
+	if (!sgt)

+   return;
+
if (virtio_gpu_is_vram(bo)) {
virtio_gpu_vram_unmap_dma_buf(attach->dev, sgt, dir);
-   return;
+   } else {
+   dma_unmap_sgtable(attach->dev, sgt, dir, 
DMA_ATTR_SKIP_CPU_SYNC);
+   sg_free_table(sgt);
+   kfree(sgt);
}
+}
+
+static int virtgpu_gem_device_attach(struct dma_buf *dma_buf,
+struct dma_buf_attachment *attach)
+{
+   struct drm_gem_object *obj = attach->dmabuf->priv;
+   struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
+   int ret = 0;
+
+   if (!virtio_gpu_is_vram(bo) && obj->funcs->pin)
+   ret = obj->funcs->pin(obj);
  
-	drm_gem_unmap_dma_buf(attach, sgt, dir);

+   return ret;
  }
  
  static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops =  {

@@ -83,7 +115,7 @@ static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops =  
{
.vmap = drm_gem_dmabuf_vmap,
.vunmap = drm_gem_dmabuf_vunmap,
},
-   .device_attach = drm_gem_map_attach,
+   .device_attach = virtgpu_gem_device_attach,
.get_uuid = virtgpu_virtio_get_uuid,
  };
  




Re: [PATCH v2 10/10] drm/vboxvideo: fix mapping leaks

2024-01-29 Thread Philipp Stanner
Hi,

On Mon, 2024-01-29 at 12:15 +0100, Hans de Goede wrote:
> Hi Philipp,
> 
> On 1/23/24 10:43, Philipp Stanner wrote:
> > When the PCI devres API was introduced to this driver, it was
> > wrongly
> > assumed that initializing the device with pcim_enable_device()
> > instead
> > of pci_enable_device() will make all PCI functions managed.
> > 
> > This is wrong and was caused by the quite confusing devres API for
> > PCI
> > in which some, but not all, functions become managed that way.
> > 
> > The function pci_iomap_range() is never managed.
> > 
> > Replace pci_iomap_range() with the actually managed function
> > pcim_iomap_range().
> > 
> > Additionally, add a call to pcim_request_region() to ensure
> > exclusive
> > access to BAR 0.
> 
> I'm a bit worried about this last change. There might be
> issues where the pcim_request_region() fails due to
> e.g. a conflict with the simplefb / simpledrm code.
> 
> There is a drm_aperture_remove_conflicting_pci_framebuffers()
> call done before hw_init() gets called, but still this
> has been known to cause issues in the past.
> 
> Can you split out the adding of the pcim_request_region()
> into a separate patch and *not* mark that separate patch
> for stable ?

Yes, that sounds reasonable. I'll split it out and deal with it once
I'll send the other DRM patches from my backlog.

Greetings,
P.

> 
> Regards,
> 
> Hans
> 
> 
> 
> 
> 
> > 
> > CC:  # v5.10+
> > Fixes: 8558de401b5f ("drm/vboxvideo: use managed pci functions")
> > Signed-off-by: Philipp Stanner 
> > ---
> >  drivers/gpu/drm/vboxvideo/vbox_main.c | 24 +---
> >  1 file changed, 13 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/vboxvideo/vbox_main.c
> > b/drivers/gpu/drm/vboxvideo/vbox_main.c
> > index 42c2d8a99509..7f686a0190e6 100644
> > --- a/drivers/gpu/drm/vboxvideo/vbox_main.c
> > +++ b/drivers/gpu/drm/vboxvideo/vbox_main.c
> > @@ -42,12 +42,11 @@ static int vbox_accel_init(struct vbox_private
> > *vbox)
> > /* Take a command buffer for each screen from the end of
> > usable VRAM. */
> > vbox->available_vram_size -= vbox->num_crtcs *
> > VBVA_MIN_BUFFER_SIZE;
> >  
> > -   vbox->vbva_buffers = pci_iomap_range(pdev, 0,
> > -    vbox->available_vram_size,
> > -    vbox->num_crtcs *
> > -    VBVA_MIN_BUFFER_SIZE);
> > -   if (!vbox->vbva_buffers)
> > -   return -ENOMEM;
> > +   vbox->vbva_buffers = pcim_iomap_range(
> > +   pdev, 0, vbox->available_vram_size,
> > +   vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE);
> > +   if (IS_ERR(vbox->vbva_buffers))
> > +   return PTR_ERR(vbox->vbva_buffers);
> >  
> > for (i = 0; i < vbox->num_crtcs; ++i) {
> > vbva_setup_buffer_context(&vbox->vbva_info[i],
> > @@ -115,12 +114,15 @@ int vbox_hw_init(struct vbox_private *vbox)
> >  
> > DRM_INFO("VRAM %08x\n", vbox->full_vram_size);
> >  
> > +   ret = pcim_request_region(pdev, 0, "vboxvideo");
> > +   if (ret)
> > +   return ret;
> > +
> > /* Map guest-heap at end of vram */
> > -   vbox->guest_heap =
> > -   pci_iomap_range(pdev, 0, GUEST_HEAP_OFFSET(vbox),
> > -   GUEST_HEAP_SIZE);
> > -   if (!vbox->guest_heap)
> > -   return -ENOMEM;
> > +   vbox->guest_heap = pcim_iomap_range(pdev, 0,
> > +   GUEST_HEAP_OFFSET(vbox), GUEST_HEAP_SIZE);
> > +   if (IS_ERR(vbox->guest_heap))
> > +   return PTR_ERR(vbox->guest_heap);
> >  
> > /* Create guest-heap mem-pool use 2^4 = 16 byte chunks */
> > vbox->guest_pool = devm_gen_pool_create(vbox->ddev.dev, 4,
> > -1,
> 



Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

2024-01-29 Thread Paul Cercueil
Hi Christian,

Le lundi 29 janvier 2024 à 13:52 +0100, Christian König a écrit :
> Am 27.01.24 um 17:50 schrieb Jonathan Cameron:
> > > > > + iio_buffer_dmabuf_put(attach);
> > > > > +
> > > > > +out_dmabuf_put:
> > > > > + dma_buf_put(dmabuf);
> > > > As below. Feels like a __free(dma_buf_put) bit of magic would
> > > > be a
> > > > nice to have.
> > > I'm working on the patches right now, just one quick question.
> > > 
> > > Having a __free(dma_buf_put) requires that dma_buf_put is first
> > > "registered" as a freeing function using DEFINE_FREE() in
> > > <linux/dma-buf.h>, which has not been done yet.
> > > 
> > > That would mean carrying a dma-buf specific patch in your tree,
> > > are you
> > > OK with that?
> > Needs an ACK from appropriate maintainer, but otherwise I'm fine
> > doing
> > so.  Alternative is to circle back to this later after this code is
> > upstream.
> 
> Separate patches for that please, the autocleanup feature is so new
> that 
> I'm not 100% convinced that everything works out smoothly from the
> start.

Separate patches is a given, did you mean outside this patchset?
Because I can send a separate patchset that introduces scope-based
management for dma_fence and dma_buf, but then it won't have users.

Cheers,
-Paul


Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure

2024-01-29 Thread Christian König

Am 27.01.24 um 17:50 schrieb Jonathan Cameron:

+   iio_buffer_dmabuf_put(attach);
+
+out_dmabuf_put:
+   dma_buf_put(dmabuf);

As below. Feels like a __free(dma_buf_put) bit of magic would be a
nice to have.

I'm working on the patches right now, just one quick question.

Having a __free(dma_buf_put) requires that dma_buf_put is first
"registered" as a freeing function using DEFINE_FREE() in , which has not been done yet.

That would mean carrying a dma-buf specific patch in your tree, are you
OK with that?

Needs an ACK from appropriate maintainer, but otherwise I'm fine doing
so.  Alternative is to circle back to this later after this code is upstream.


Separate patches for that please, the autocleanup feature is so new that 
I'm not 100% convinced that everything works out smoothly from the start.


Regards,
Christian.




Cheers,
-Paul




[PATCH] backlight: ktz8866: Correct the check for of_property_read_u32

2024-01-29 Thread Jianhua Lu
of_property_read_u32() returns 0 on success, so the check of its
return value must be inverted to detect a successful read.

Fixes: f8449c8f7355 ("backlight: ktz8866: Add support for Kinetic KTZ8866 
backlight")
Signed-off-by: Jianhua Lu 
---
 drivers/video/backlight/ktz8866.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/video/backlight/ktz8866.c 
b/drivers/video/backlight/ktz8866.c
index 9c980f2571ee..014877b5a984 100644
--- a/drivers/video/backlight/ktz8866.c
+++ b/drivers/video/backlight/ktz8866.c
@@ -97,20 +97,20 @@ static void ktz8866_init(struct ktz8866 *ktz)
 {
unsigned int val = 0;
 
-   if (of_property_read_u32(ktz->client->dev.of_node, "current-num-sinks", 
&val))
+   if (!of_property_read_u32(ktz->client->dev.of_node, 
"current-num-sinks", &val))
ktz8866_write(ktz, BL_EN, BIT(val) - 1);
else
/* Enable all 6 current sinks if the number of current sinks 
isn't specified. */
ktz8866_write(ktz, BL_EN, BIT(6) - 1);
 
-   if (of_property_read_u32(ktz->client->dev.of_node, 
"kinetic,current-ramp-delay-ms", &val)) {
+   if (!of_property_read_u32(ktz->client->dev.of_node, 
"kinetic,current-ramp-delay-ms", &val)) {
if (val <= 128)
ktz8866_write(ktz, BL_CFG2, BIT(7) | (ilog2(val) << 3) 
| PWM_HYST);
else
ktz8866_write(ktz, BL_CFG2, BIT(7) | ((5 + val / 64) << 
3) | PWM_HYST);
}
 
-   if (of_property_read_u32(ktz->client->dev.of_node, 
"kinetic,led-enable-ramp-delay-ms", &val)) {
+   if (!of_property_read_u32(ktz->client->dev.of_node, 
"kinetic,led-enable-ramp-delay-ms", &val)) {
if (val == 0)
ktz8866_write(ktz, BL_DIMMING, 0);
else {
-- 
2.43.0



Re: [Linaro-mm-sig] [PATCH 2/3] udmabuf: Sync buffer mappings for attached devices

2024-01-29 Thread Christian König

Am 26.01.24 um 18:24 schrieb Andrew Davis:

On 1/25/24 2:30 PM, Daniel Vetter wrote:

On Tue, Jan 23, 2024 at 04:12:26PM -0600, Andrew Davis wrote:

Currently this driver creates a SGT table using the CPU as the
target device, then performs the dma_sync operations against
that SGT. This is backwards to how DMA-BUFs are supposed to behave.
This may have worked for the case where these buffers were given
only back to the same CPU that produced them as in the QEMU case.
And only then because the original author had the dma_sync
operations also backwards, syncing for the "device" on begin_cpu.
This was noticed and "fixed" in this patch[0].

That then meant we were sync'ing from the CPU to the CPU using
a pseudo-device "miscdevice". Which then caused another issue
due to the miscdevice not having a proper DMA mask (and why should
it, the CPU is not a DMA device). The fix for that was an even
more egregious hack[1] that declares the CPU is coherent with
itself and can access its own memory space..

Unwind all this and perform the correct action by doing the dma_sync
operations for each device currently attached to the backing buffer.

[0] commit 1ffe09590121 ("udmabuf: fix dma-buf cpu access")
[1] commit 9e9fa6a9198b ("udmabuf: Set the DMA mask for the udmabuf 
device (v2)")


Signed-off-by: Andrew Davis 


So yeah the above hacks are terrible, but I don't think this is better.
What you're doing now is that you're potentially doing the flushing
multiple times, so if you have a lot of importers with life mappings 
this

is a performance regression.


I'd take lower performing but correct over fast and broken. :)

Syncing for CPU/device is about making sure the CPU/device can see
the data produced by the other. Some devices might be dma-coherent
and syncing for them would be a NOP, but we cant know that here
in this driver. Let's say we have two attached devices, one that
is cache coherent and one that isn't. If we only sync for first
attached device then that is converted to a NOP and we never flush
like the second device needed.

Same is true for devices behind IOMMU or with an L3 cache when
syncing in the other direction for CPU. So we have to sync for all
attached devices to ensure we get even the lowest common denominator
device sync'd. It is up to the DMA-API layer to decide which syncs
need to actually do something. If all attached devices are coherent
then all syncs will be NOPs and we have no performance penalty.



It's probably time to bite the bullet and teach the dma-api about flushing
for multiple devices. Or some way we can figure out which is the one
device we need to pick which gives us the right amount of flushing.



Seems like a constraint solving micro-optimization. The DMA-API layer
would have to track which buffers have already been flushed from CPU
cache and also track that nothing has been written into those caches
since that point, only then could it skip the flush. But that is already
the point of the dirty bit in the caches themselves, cleaning already
clean cache lines is essentially free in hardware. And so is invalidating
lines, it is just flipping a bit.


Well to separate the functionality a bit. What the DMA-API should 
provide is abstracting how the platform does flushing and invalidation 
of caches and the information which devices use which caches and what 
needs to be flushed/invalidated to allow access between devices and the CPU.


In other words what's necessary is the following:
1. sync device to cpu
2. sync cpu to device
3. sync device to device

1 and 2 have already been present and implemented for years, but 3 is missing 
together with some of the necessary infrastructure to actually implement 
this. E.g. we don't know which devices write into which caches etc...
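
For points 1 and 2, the per-attachment variant discussed above boils
down to something like the following sketch. It assumes, as the udmabuf
patch does, that each attachment keeps its device and mapped sg_table in
a small private struct on the attachments list:

/* Sketch: sync the mapping of every currently attached device and let
 * the DMA-API turn the call into a no-op for coherent ones.
 */
struct udmabuf_attachment {
        struct device *dev;
        struct sg_table *table;
        struct list_head list;
};

static int begin_cpu_udmabuf(struct dma_buf *buf,
                             enum dma_data_direction direction)
{
        struct udmabuf *ubuf = buf->priv;
        struct udmabuf_attachment *a;

        mutex_lock(&ubuf->lock);
        list_for_each_entry(a, &ubuf->attachments, list)
                dma_sync_sgtable_for_cpu(a->dev, a->table, direction);
        mutex_unlock(&ubuf->lock);

        return 0;
}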


On top of this we need the functionality to track who has accessed which 
piece of data and what DMA-API functions needs to be called to make 
things work for a specific use case. But that is then DMA-buf, I/O layer 
drivers etc., and does not belong in the DMA-API.


I also strongly think that putting the SWIOTLB bounce buffer 
functionality into the DMA-API was not the right choice.


Regards,
Christian.



Andrew


Cheers, Sima


---
  drivers/dma-buf/udmabuf.c | 41 
+++

  1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 3a23f0a7d112a..ab6764322523c 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -26,8 +26,6 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a 
dmabuf, in megabytes. Default is

  struct udmabuf {
  pgoff_t pagecount;
  struct page **pages;
-    struct sg_table *sg;
-    struct miscdevice *device;
  struct list_head attachments;
  struct mutex lock;
  };
@@ -169,12 +167,8 @@ static void unmap_udmabuf(struct 
dma_buf_attachment *at,

  static void release_udmabuf(struct dma_buf *buf)
  {
  struct udmabuf *ubuf = buf->priv;
-    struct 

Re: [PATCH 8/8] fbdev/efifb: Remove framebuffer relocation tracking

2024-01-29 Thread Javier Martinez Canillas
Thomas Zimmermann  writes:

> If the firmware framebuffer has been relocated, the sysfb code
> fixes the screen_info state before it creates the framebuffer's
> platform device. Efifb will automatically receive a screen_info
> with updated values. Hence remove the tracking from efifb.
>
> Signed-off-by: Thomas Zimmermann 
> ---

Reviewed-by: Javier Martinez Canillas 

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH 7/8] firmware/sysfb: Update screen_info for relocated EFI framebuffers

2024-01-29 Thread Javier Martinez Canillas
Javier Martinez Canillas  writes:

> Thomas Zimmermann  writes:
>
>> On ARM PCI systems, the PCI hierarchy might be reconfigured during
>> boot and the firmware framebuffer might move as a result of that.
>> The values in screen_info will then be invalid.
>>
>> Work around this problem by tracking the framebuffer's initial
>> location before it gets relocated; then fix the screen_info state
>> between relocation and creating the firmware framebuffer's device.
>>
>> This functionality has been lifted from efifb. See the commit message
>> of commit 55d728a40d36 ("efi/fb: Avoid reconfiguration of BAR that
>> covers the framebuffer") for more information.
>>
>> Signed-off-by: Thomas Zimmermann 
>> ---
>
> [...]
>
>>  #if defined(CONFIG_PCI)
>
> Shouldn't this be && !defined(CONFIG_X86) ? Or maybe &&
> defined(CONFIG_ARM64), although I don't know if the same
> also applies to other EFI platforms (e.g: CONFIG_RISCV).
>

Answering my own question, the !defined(CONFIG_X86) was dropped in the commit
dcf8f5ce3165 ("drivers/fbdev/efifb: Allow BAR to be moved instead of claiming
it"). The rationale is explained in that commit message:

While this is less likely to occur on x86, given that the firmware's
PCI resource allocation is more likely to be preserved, this is a
worthwhile sanity check to have in place, and so let's remove the
preprocessor conditional that makes it !X86 only.

So it is OK to just guard with #if defined(CONFIG_PCI).

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH 7/8] firmware/sysfb: Update screen_info for relocated EFI framebuffers

2024-01-29 Thread Javier Martinez Canillas
Thomas Zimmermann  writes:

> On ARM PCI systems, the PCI hierarchy might be reconfigured during
> boot and the firmware framebuffer might move as a result of that.
> The values in screen_info will then be invalid.
>
> Work around this problem by tracking the framebuffer's initial
> location before it gets relocated; then fix the screen_info state
> between relocation and creating the firmware framebuffer's device.
>
> This functionality has been lifted from efifb. See the commit message
> of commit 55d728a40d36 ("efi/fb: Avoid reconfiguration of BAR that
> covers the framebuffer") for more information.
>
> Signed-off-by: Thomas Zimmermann 
> ---

[...]

>  #if defined(CONFIG_PCI)

Shouldn't this be && !defined(CONFIG_X86) ? Or maybe &&
defined(CONFIG_ARM64), although I don't know if the same
also applies to other EFI platforms (e.g: CONFIG_RISCV).

Reviewed-by: Javier Martinez Canillas 

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH 6/8] fbdev/efifb: Do not track parent device status

2024-01-29 Thread Javier Martinez Canillas
Thomas Zimmermann  writes:

> There will be no EFI framebuffer device for disabled parent devices
> and thus we never probe efifb in that case. Hence remove the tracking
> code from efifb.
>
> Signed-off-by: Thomas Zimmermann 
> ---

Nice cleanup.

Reviewed-by: Javier Martinez Canillas 

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH 5/8] firmware/sysfb: Create firmware device only for enabled PCI devices

2024-01-29 Thread Javier Martinez Canillas
Thomas Zimmermann  writes:

> Test if the firmware framebuffer's parent PCI device, if any, has
> been enabled. If not, the firmware framebuffer is most likely not
> working. Hence, do not create a device for the firmware framebuffer
> on disabled PCI devices.
>
> So far, efifb tracked the status of the PCI parent device internally
> and did not bind if it was disabled. This patch implements the
> functionality for all firmware framebuffers.
>
> Signed-off-by: Thomas Zimmermann 
> ---

[...]

>  
> +static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev)
> +{
> +#if defined(CONFIG_PCI)
> + /*
> +  * TODO: Try to integrate this code into the PCI subsystem
> +  */
> + int ret;
> + u16 command;
> +
> + ret = pci_read_config_word(pdev, PCI_COMMAND, &command);
> + if (ret != PCIBIOS_SUCCESSFUL)
> + return false;
> + if (!(command & PCI_COMMAND_MEMORY))
> + return false;
> + return true;
> +#else
> + // Getting here without PCI support is probably a bug.
> + return false;

Should we warn before returning in this case?

Reviewed-by: Javier Martinez Canillas 

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH 4/8] fbdev/efifb: Remove PM for parent device

2024-01-29 Thread Javier Martinez Canillas
Thomas Zimmermann  writes:

> The EFI device has the correct parent device set. This allows Linux
> to handle the power management internally. Hence, remove the manual
> PM management for the parent device from efifb.
>
> Signed-off-by: Thomas Zimmermann 
> ---

Reviewed-by: Javier Martinez Canillas 

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH 3/8] firmware/sysfb: Set firmware-framebuffer parent device

2024-01-29 Thread Javier Martinez Canillas
Thomas Zimmermann  writes:

> Set the firmware framebuffer's parent device, which usually is the
> graphics hardware's physical device. Integrates the framebuffer in
> the Linux device hierarchy and lets Linux handle dependencies among
> devices. For example, the graphics hardware won't be suspended while
> the firmware device is still active.
>
> Signed-off-by: Thomas Zimmermann 
> ---
>  drivers/firmware/sysfb.c  | 11 ++-
>  drivers/firmware/sysfb_simplefb.c |  5 -
>  include/linux/sysfb.h |  3 ++-
>  3 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c
> index 19706bd2642a..8a42da3f67a9 100644
> --- a/drivers/firmware/sysfb.c
> +++ b/drivers/firmware/sysfb.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -72,6 +73,8 @@ EXPORT_SYMBOL_GPL(sysfb_disable);
>  static __init int sysfb_init(void)
>  {
>   const struct screen_info *si = &screen_info;
> + struct device *parent = NULL;
> + struct pci_dev *pparent;

Maybe pci_parent? It's easier to read than pparent IMO.

Reviewed-by: Javier Martinez Canillas 

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH v2 10/10] drm/vboxvideo: fix mapping leaks

2024-01-29 Thread Hans de Goede
Hi Philipp,

On 1/23/24 10:43, Philipp Stanner wrote:
> When the PCI devres API was introduced to this driver, it was wrongly
> assumed that initializing the device with pcim_enable_device() instead
> of pci_enable_device() will make all PCI functions managed.
> 
> This is wrong and was caused by the quite confusing devres API for PCI
> in which some, but not all, functions become managed that way.
> 
> The function pci_iomap_range() is never managed.
> 
> Replace pci_iomap_range() with the actually managed function
> pcim_iomap_range().
> 
> Additionally, add a call to pcim_request_region() to ensure exclusive
> access to BAR 0.

I'm a bit worried about this last change. There might be
issues where the pcim_request_region() fails due to
e.g. a conflict with the simplefb / simpledrm code.

There is a drm_aperture_remove_conflicting_pci_framebuffers()
call done before hw_init() gets called, but still this
has been known to cause issues in the past.

Can you split out the adding of the pcim_request_region()
into a separate patch and *not* mark that separate patch
for stable ?

Regards,

Hans





> 
> CC:  # v5.10+
> Fixes: 8558de401b5f ("drm/vboxvideo: use managed pci functions")
> Signed-off-by: Philipp Stanner 
> ---
>  drivers/gpu/drm/vboxvideo/vbox_main.c | 24 +---
>  1 file changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vboxvideo/vbox_main.c 
> b/drivers/gpu/drm/vboxvideo/vbox_main.c
> index 42c2d8a99509..7f686a0190e6 100644
> --- a/drivers/gpu/drm/vboxvideo/vbox_main.c
> +++ b/drivers/gpu/drm/vboxvideo/vbox_main.c
> @@ -42,12 +42,11 @@ static int vbox_accel_init(struct vbox_private *vbox)
>   /* Take a command buffer for each screen from the end of usable VRAM. */
>   vbox->available_vram_size -= vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE;
>  
> - vbox->vbva_buffers = pci_iomap_range(pdev, 0,
> -  vbox->available_vram_size,
> -  vbox->num_crtcs *
> -  VBVA_MIN_BUFFER_SIZE);
> - if (!vbox->vbva_buffers)
> - return -ENOMEM;
> + vbox->vbva_buffers = pcim_iomap_range(
> + pdev, 0, vbox->available_vram_size,
> + vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE);
> + if (IS_ERR(vbox->vbva_buffers))
> + return PTR_ERR(vbox->vbva_buffers);
>  
>   for (i = 0; i < vbox->num_crtcs; ++i) {
>   vbva_setup_buffer_context(&vbox->vbva_info[i],
> @@ -115,12 +114,15 @@ int vbox_hw_init(struct vbox_private *vbox)
>  
>   DRM_INFO("VRAM %08x\n", vbox->full_vram_size);
>  
> + ret = pcim_request_region(pdev, 0, "vboxvideo");
> + if (ret)
> + return ret;
> +
>   /* Map guest-heap at end of vram */
> - vbox->guest_heap =
> - pci_iomap_range(pdev, 0, GUEST_HEAP_OFFSET(vbox),
> - GUEST_HEAP_SIZE);
> - if (!vbox->guest_heap)
> - return -ENOMEM;
> + vbox->guest_heap = pcim_iomap_range(pdev, 0,
> + GUEST_HEAP_OFFSET(vbox), GUEST_HEAP_SIZE);
> + if (IS_ERR(vbox->guest_heap))
> + return PTR_ERR(vbox->guest_heap);
>  
>   /* Create guest-heap mem-pool use 2^4 = 16 byte chunks */
>   vbox->guest_pool = devm_gen_pool_create(vbox->ddev.dev, 4, -1,



Re: [PATCH 2/8] video: Provide screen_info_get_pci_dev() to find screen_info's PCI device

2024-01-29 Thread Javier Martinez Canillas
Thomas Zimmermann  writes:

> Add screen_info_get_pci_dev() to find the PCI device of an instance
> of screen_info. Does nothing on systems without PCI bus.
>
> Signed-off-by: Thomas Zimmermann 
> ---

[...]

> +struct pci_dev *screen_info_pci_dev(const struct screen_info *si)
> +{
> + struct resource res[SCREEN_INFO_MAX_RESOURCES];
> + size_t i, numres;
> + int ret;
> +
> + ret = screen_info_resources(si, res, ARRAY_SIZE(res));
> + if (ret < 0)
> + return ERR_PTR(ret);
> + numres = ret;
> +

I would just drop the ret variable and assign the screen_info_resources()
return value to numres. I think that makes the code easier to follow.
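
I.e. something like this sketch of the suggestion, with numres (and i)
turned into plain ints so the error check still works:

struct pci_dev *screen_info_pci_dev(const struct screen_info *si)
{
        struct resource res[SCREEN_INFO_MAX_RESOURCES];
        int i, numres;

        numres = screen_info_resources(si, res, ARRAY_SIZE(res));
        if (numres < 0)
                return ERR_PTR(numres);

        for (i = 0; i < numres; ++i) {
                /* ... look up the PCI device covering res[i], as before ... */
        }

        return NULL;
}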

Reviewed-by: Javier Martinez Canillas 

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH v5 0/3] drm/i915: Fix VMA UAF on destroy against deactivate race

2024-01-29 Thread Janusz Krzysztofik
Hi Nirmoy,

On Monday, 29 January 2024 10:24:07 CET Nirmoy Das wrote:
> Hi Janusz,
> 
> There seems to be a regression in CI related to this:
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129026v2/bat-dg1-7/
igt@gem_lmem_swapping@random-engi...@lmem0.html#dmesg-warnings1053
> 
> Please have a look.

Yes, that's a problem, the series can't be merged in its current shape.  
However, I'm not sure if that's really a regression, or rather an exposure of 
another, already existing issue.  It looks like a race between two concurrent 
calls to our __active_retire() used in DMA fence callbacks.  I'm going to 
verify some ideas for a fix on trybot.

Thanks,
Janusz

> 
> 
> Regards,
> 
> Nirmoy
> 
> On 1/24/2024 6:13 PM, Janusz Krzysztofik wrote:
> > Object debugging tools were sporadically reporting illegal attempts to
> > free a still active i915 VMA object when parking a GT believed to be idle.
> >
> > [161.359441] ODEBUG: free active (active state 0) object: 88811643b958 
object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915]
> > [161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 
debug_print_object+0x80/0xb0
> > ...
> > [161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-
CI_DRM_13375-g003f860e5577+ #1
> > [161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/
RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022
> > [161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915]
> > [161.360592] RIP: 0010:debug_print_object+0x80/0xb0
> > ...
> > [161.361347] debug_object_free+0xeb/0x110
> > [161.361362] i915_active_fini+0x14/0x130 [i915]
> > [161.361866] release_references+0xfe/0x1f0 [i915]
> > [161.362543] i915_vma_parked+0x1db/0x380 [i915]
> > [161.363129] __gt_park+0x121/0x230 [i915]
> > [161.363515] intel_wakeref_put_last+0x1f/0x70 [i915]
> >
> > That has been tracked down to be happening when another thread is
> > deactivating the VMA inside __active_retire() helper, after the VMA's
> > active counter has been already decremented to 0, but before deactivation
> > of the VMA's object is reported to the object debugging tool.
> >
> > We could prevent from that race by serializing i915_active_fini() with
> > __active_retire() via ref->tree_lock, but that wouldn't stop the VMA from
> > being used, e.g. from __i915_vma_retire() called at the end of
> > __active_retire(), after that VMA has been already freed by a concurrent
> > i915_vma_destroy() on return from the i915_active_fini().  Then, we should
> > rather fix the issue at the VMA level, not in i915_active.
> >
> > Since __i915_vma_parked() is called from __gt_park() on last put of the
> > GT's wakeref, the issue could be addressed by holding the GT wakeref long
> > enough for __active_retire() to complete before that wakeref is released
> > and the GT parked.
> >
> > A VMA associated with a request doesn't acquire a GT wakeref by itself.
> > Instead, it depends on a wakeref held directly by the request's active
> > intel_context for a GT associated with its VM, and indirectly on that
> > intel_context's engine wakeref if the engine belongs to the same GT as the
> > VMA's VM.  Those wakerefs are released asynchronously to VMA deactivation.
> >
> > In case of single-GT platforms, at least one of those wakerefs is usually
> > held long enough for the request's VMA to be deactivated on time, before
> > it is destroyed on last put of its VM GT wakeref.  However, on multi-GT
> > platforms, a request may use a VMA from a GT other than the one that hosts
> > the request's engine, then it is protected only with the intel_context's
> > VM GT wakeref.
> >
> > There was an attempt to fix the issue on 2-GT Meteor Lake by acquiring an
> > extra wakeref for a Primary GT from i915_gem_do_execbuffer() -- see commit
> > f56fe3e91787 ("drm/i915: Fix a VMA UAF for multi-gt platform").  However,
> > that fix proved insufficient -- the issue was still reported by CI.
> > That wakeref was released on exit from i915_gem_do_execbuffer(), then
> > potentially before completion of the request and deactivation of its
> > associated VMAs.  Moreover, CI reports indicated that single-GT platforms
> > also suffered sporadically from the same race.
> >
> > I believe the issue was introduced by commit d93939730347 ("drm/i915:
> > Remove the vma refcount") which moved a call to i915_active_fini() from
> > a dropped i915_vma_release(), called on last put of the removed VMA kref,
> > to i915_vma_parked() processing path called on last put of a GT wakeref.
> > However, its visibility to the object debugging tool was suppressed by a
> > bug in i915_active that was fixed two weeks later with commit e92eb246feb9
> > ("drm/i915/active: Fix missing debug object activation").
> >
> > Fix the issue by getting a wakeref for the VMA's GT when activating it,
> > and putting that wakeref only after the VMA is deactivated.  However,
> > exclude global GTT from that processing path, otherwise the 

Re: Re: Re: [PATCH 3/5] drm/ttm: replace busy placement with flags v6

2024-01-29 Thread Thomas Hellström
On Fri, 2024-01-26 at 16:22 -0600, Lucas De Marchi wrote:
> On Fri, Jan 26, 2024 at 04:16:58PM -0600, Lucas De Marchi wrote:
> > On Thu, Jan 18, 2024 at 05:38:16PM +0100, Thomas Hellström wrote:
> > > 
> > > On 1/17/24 13:27, Thomas Hellström wrote:
> > > > 
> > > > On 1/17/24 11:47, Thomas Hellström wrote:
> > > > > Hi, Christian
> > > > > 
> > > > > Xe changes look good. Will send the series to xe ci to check
> > > > > for 
> > > > > regressions.
> > > > 
> > > > Hmm, there are some checkpatch warnings about author / SOB
> > > > email 
> > > > mismatch,
> > > 
> > > With those fixed, this patch is
> > > 
> > > Reviewed-by: Thomas Hellström 
> > 
> > 
> > it actually broke drm-tip now that this is merged:
> > 
> > ../drivers/gpu/drm/xe/xe_bo.c:41:10: error: ‘struct ttm_placement’
> > has no member named ‘num_busy_placement’; did you mean
> > ‘num_placement’
> >   41 | .num_busy_placement = 1,
> >  |  ^~
> >  |  num_placement
> > ../drivers/gpu/drm/xe/xe_bo.c:41:31: error: excess elements in
> > struct initializer [-Werror]
> >   41 | .num_busy_placement = 1,
> >  |   ^
> > 
> > 
> > Apparently a conflict with another patch that got applied a few
> > days
> > ago: a201c6ee37d6 ("drm/xe/bo: Evict VRAM to TT rather than to
> > system")
> 
> oh, no... apparently that commit is from a long time ago. The problem
> was that drm-misc-next was not yet in sync with drm-next. Thomas, do
> you have a fixup for this to put in rerere?
> 
> Lucas De Marchi

I added this as a manual fixup and ran some quick igt tests.

Seems to work.
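
For the record, the fixup amounts to dropping the removed busy-placement
fields from the initializer in xe_bo.c, roughly as below (a sketch; the
sys_placement_flags name is assumed, and fallback behaviour is now
expressed with TTM_PL_FLAG_DESIRED / TTM_PL_FLAG_FALLBACK on the
individual ttm_place entries instead):

static struct ttm_placement sys_placement = {
        .num_placement = 1,
        .placement = &sys_placement_flags,
        /* .num_busy_placement and .busy_placement no longer exist */
};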





Re: [PATCH 03/19] drm/i915/dp: Add support to notify MST connectors to retry modesets

2024-01-29 Thread Imre Deak
On Mon, Jan 29, 2024 at 12:36:12PM +0200, Hogander, Jouni wrote:
> On Tue, 2024-01-23 at 12:28 +0200, Imre Deak wrote:
> > [...]
> > +void
> > +intel_dp_queue_modeset_retry_for_link(struct intel_atomic_state *state,
> > + struct intel_encoder *encoder,
> > + const struct intel_crtc_state 
> > *crtc_state,
> > + const struct drm_connector_state 
> > *conn_state)
> > +{
> > +   struct drm_i915_private *i915 = to_i915(crtc_state->uapi.crtc->dev);
> > +   struct intel_connector *connector;
> > +   struct intel_digital_connector_state *iter_conn_state;
> > +   struct intel_dp *intel_dp;
> > +   int i;
> > +
> > +   if (conn_state) {
> > +   connector = to_intel_connector(conn_state->connector);
> > +   intel_dp_queue_modeset_retry_work(connector);
> > +
> > +   return;
> > +   }
> > +
> > +   if (drm_WARN_ON(&i915->drm,
> > +   !intel_crtc_has_type(crtc_state, 
> > INTEL_OUTPUT_DP_MST)))
> > +   return;
> > +
> > +   intel_dp = enc_to_intel_dp(encoder);
> > +
> > +   for_each_new_intel_connector_in_state(state, connector, 
> > iter_conn_state, i) {
> > +   (void)iter_conn_state;
> 
> Checked iter_conn_state->base->crtc documentation:
> 
> @crtc: CRTC to connect connector to, NULL if disabled.
> 
> Do we need to check if connector is "disabled" or is it impossible
> scenario?

Yes, it does show if the connector is disabled and it would make sense
to not notify those. However the check for that would be racy, at least
during a non-blocking commit, but I think also in general where
userspace could be in the middle of enabling this connector.

The point of the notification is that userspace re-checks the mode it
wants on each MST connector to be enabled. To avoid missing that
re-check on connectors with a pending enabling like the above, the
notification is simply sent to all the connectors in the MST topology.

> 
> BR,
> 
> Jouni Högander
> 
> 
> > +
> > +   if (connector->mst_port != intel_dp)
> > +   continue;
> > +
> > +   intel_dp_queue_modeset_retry_work(connector);
> > +   }
> > +}
> > +
> >  int
> >  intel_dp_compute_config(struct intel_encoder *encoder,
> > struct intel_crtc_state *pipe_config,
> > @@ -6436,6 +6480,14 @@ static void
> > intel_dp_modeset_retry_work_fn(struct work_struct *work)
> > mutex_unlock(&connector->dev->mode_config.mutex);
> > /* Send Hotplug uevent so userspace can reprobe */
> > drm_kms_helper_connector_hotplug_event(connector);
> > +
> > +   drm_connector_put(connector);
> > +}
> > +
> > +void intel_dp_init_modeset_retry_work(struct intel_connector
> > *connector)
> > +{
> > +   INIT_WORK(&connector->modeset_retry_work,
> > + intel_dp_modeset_retry_work_fn);
> >  }
> >
> >  bool
> > @@ -6452,8 +6504,7 @@ intel_dp_init_connector(struct
> > intel_digital_port *dig_port,
> > int type;
> >
> > /* Initialize the work for modeset in case of link train
> > failure */
> > -   INIT_WORK(&intel_connector->modeset_retry_work,
> > - intel_dp_modeset_retry_work_fn);
> > +   intel_dp_init_modeset_retry_work(intel_connector);
> >
> > if (drm_WARN(dev, dig_port->max_lanes < 1,
> >  "Not enough lanes (%d) for DP on
> > [ENCODER:%d:%s]\n",
> > diff --git a/drivers/gpu/drm/i915/display/intel_dp.h
> > b/drivers/gpu/drm/i915/display/intel_dp.h
> > index 530cc97bc42f4..105c2086310db 100644
> > --- a/drivers/gpu/drm/i915/display/intel_dp.h
> > +++ b/drivers/gpu/drm/i915/display/intel_dp.h
> > @@ -23,6 +23,8 @@ struct intel_digital_port;
> >  struct intel_dp;
> >  struct intel_encoder;
> >
> > +struct work_struct;
> > +
> >  struct link_config_limits {
> > int min_rate, max_rate;
> > int min_lane_count, max_lane_count;
> > @@ -43,6 +45,12 @@ void intel_dp_adjust_compliance_config(struct
> > intel_dp *intel_dp,
> >  bool intel_dp_limited_color_range(const struct intel_crtc_state
> > *crtc_state,
> >   const struct drm_connector_state
> > *conn_state);
> >  int intel_dp_min_bpp(enum intel_output_format output_format);
> > +void intel_dp_init_modeset_retry_work(struct intel_connector
> > *connector);
> > +void intel_dp_queue_modeset_retry_work(struct intel_connector
> > *connector);
> > +void intel_dp_queue_modeset_retry_for_link(struct intel_atomic_state
> > *state,
> > +  struct intel_encoder
> > *encoder,
> > +  const struct
> > intel_crtc_state *crtc_state,
> > +  const struct
> > drm_connector_state *conn_state);
> >  bool intel_dp_init_connector(struct intel_digital_port *dig_port,
> >  struct intel_connector
> > 

Re: [PATCH v4 00/14] drm: Add a driver for CSF-based Mali GPUs

2024-01-29 Thread Boris Brezillon
On Mon, 29 Jan 2024 17:20:47 +0800 (CST)
"Andy Yan"  wrote:

> Hi Boris:
> 
> Thanks for your great work.
> 
> One thing please take note:
> commit (arm64: dts: rockchip: rk3588: Add GPU nodes) in [1] seems to remove
> the "disabled" status of usb_host2_xhci; this may cause a boot issue on some
> boards that use the combphy2_psu phy for other functions.

Oops, should be fixed in
https://gitlab.freedesktop.org/panfrost/linux/-/commits/panthor-next+rk3588
now.

Thanks,

Boris


[PATCH RESEND v3 0/3] Update STM DSI PHY driver

2024-01-29 Thread Raphael Gallais-Pou


This patch series adds several missing features to the dw-mipi-dsi phy
driver and updates others that need it.

The first patch updates a PM macro.

The second patch adds runtime PM functionality to the driver.

The third patch adds a clock provider generated by the PHY itself.  As
explained in the commit log of the second patch, a clock declaration is
missing.  Since this clock is the parent of 'dsi_k', its absence leads
to an orphan clock.  Most importantly, this patch anticipates future
versions of the DSI PHY and its inclusion within the display subsystem
and the DRM framework.

The last patch fixes a corner case introduced previously.  Since 'dsi'
and 'dsi_k' are gated by the same bit in the same register, both
references work as the peripheral clock in the device-tree.

---
Changes in v3-resend:
- Removed last patch as it has been merged
https://lore.kernel.org/lkml/bf49f4c9-9e81-4c91-972d-13782d996...@foss.st.com/

Changes in v3:
- Fix smatch warning (disable dsi->pclk when clk_register fails)

Changes in v2:
- Added patch 1/4 to use SYSTEM_SLEEP_PM_OPS instead of old macro
  and removed __maybe_used for accordingly
- Changed SET_RUNTIME_PM_OPS to RUNTIME_PM_OPS

Raphael Gallais-Pou (2):
  drm/stm: dsi: use new SYSTEM_SLEEP_PM_OPS() macro
  drm/stm: dsi: expose DSI PHY internal clock

Yannick Fertre (1):
  drm/stm: dsi: add pm runtime ops

 drivers/gpu/drm/stm/dw_mipi_dsi-stm.c | 279 ++
 1 file changed, 238 insertions(+), 41 deletions(-)

-- 
2.25.1



[PATCH RESEND v3 3/3] drm/stm: dsi: expose DSI PHY internal clock

2024-01-29 Thread Raphael Gallais-Pou
                 DSISRC
                  |\
   pll4_p_ck  --->| 1 \
                  |    |---> dsi_k
   ck_dsi_phy --->| 0 /
                  |/

A DSI clock is missing in the clock framework. Looking at the
clk_summary, it appears that 'ck_dsi_phy' is not implemented, since the
DSI kernel clock is based on the internal DSI PLL. The common clock
driver cannot directly expose this 'ck_dsi_phy' clock because it does
not share any registers with the DSI. Thus it needs to be done directly
within the DSI PHY driver, as sketched below.
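
(A rough, assumption-labelled sketch of that approach, not the patch
itself: the helper name 'dsi_register_phy_clk' is invented here for
illustration, while the clk_ops callbacks are the ones added by the
diff below.)

	static const struct clk_ops txbyte_clk_ops = {
		.enable      = dw_mipi_dsi_clk_enable,
		.disable     = dw_mipi_dsi_clk_disable,
		.is_enabled  = dw_mipi_dsi_clk_is_enabled,
		.recalc_rate = dw_mipi_dsi_clk_recalc_rate,
		.round_rate  = dw_mipi_dsi_clk_round_rate,
	};

	static int dsi_register_phy_clk(struct device *dev,
					struct dw_mipi_dsi_stm *dsi)
	{
		/* Parent is the PLL reference clock the driver already holds. */
		const char *parent = __clk_get_name(dsi->pllref_clk);
		struct clk_init_data init = {
			.name = "ck_dsi_phy",
			.ops = &txbyte_clk_ops,
			.parent_names = &parent,
			.num_parents = 1,
		};
		int ret;

		dsi->txbyte_clk.init = &init;
		ret = devm_clk_hw_register(dev, &dsi->txbyte_clk);
		if (ret)
			return ret;

		/* Expose the clock so DT consumers such as 'dsi_k' can find it. */
		return devm_of_clk_add_hw_provider(dev, of_clk_hw_simple_get,
						   &dsi->txbyte_clk);
	}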

Signed-off-by: Raphael Gallais-Pou 
---
Changes in v3:
- Fix smatch warning:
.../dw_mipi_dsi-stm.c:719 dw_mipi_dsi_stm_probe() warn: 'dsi->pclk'
from clk_prepare_enable() not released on lines: 719.
---
 drivers/gpu/drm/stm/dw_mipi_dsi-stm.c | 247 ++
 1 file changed, 216 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c b/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c
index 82fff9e84345..b20123854c4a 100644
--- a/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c
+++ b/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c
@@ -7,7 +7,9 @@
  */
 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -77,9 +79,12 @@ enum dsi_color {
 
 struct dw_mipi_dsi_stm {
void __iomem *base;
+   struct device *dev;
struct clk *pllref_clk;
struct clk *pclk;
+   struct clk_hw txbyte_clk;
struct dw_mipi_dsi *dsi;
+   struct dw_mipi_dsi_plat_data pdata;
u32 hw_version;
int lane_min_kbps;
int lane_max_kbps;
@@ -196,29 +201,198 @@ static int dsi_pll_get_params(struct dw_mipi_dsi_stm *dsi,
return 0;
 }
 
-static int dw_mipi_dsi_phy_init(void *priv_data)
+#define clk_to_dw_mipi_dsi_stm(clk) \
+   container_of(clk, struct dw_mipi_dsi_stm, txbyte_clk)
+
+static void dw_mipi_dsi_clk_disable(struct clk_hw *clk)
 {
-   struct dw_mipi_dsi_stm *dsi = priv_data;
+   struct dw_mipi_dsi_stm *dsi = clk_to_dw_mipi_dsi_stm(clk);
+
+   DRM_DEBUG_DRIVER("\n");
+
+   /* Disable the DSI PLL */
+   dsi_clear(dsi, DSI_WRPCR, WRPCR_PLLEN);
+
+   /* Disable the regulator */
+   dsi_clear(dsi, DSI_WRPCR, WRPCR_REGEN | WRPCR_BGREN);
+}
+
+static int dw_mipi_dsi_clk_enable(struct clk_hw *clk)
+{
+   struct dw_mipi_dsi_stm *dsi = clk_to_dw_mipi_dsi_stm(clk);
u32 val;
int ret;
 
+   DRM_DEBUG_DRIVER("\n");
+
/* Enable the regulator */
dsi_set(dsi, DSI_WRPCR, WRPCR_REGEN | WRPCR_BGREN);
-   ret = readl_poll_timeout(dsi->base + DSI_WISR, val, val & WISR_RRS,
-SLEEP_US, TIMEOUT_US);
+   ret = readl_poll_timeout_atomic(dsi->base + DSI_WISR, val, val & WISR_RRS,
+   SLEEP_US, TIMEOUT_US);
if (ret)
DRM_DEBUG_DRIVER("!TIMEOUT! waiting REGU, let's continue\n");
 
/* Enable the DSI PLL & wait for its lock */
dsi_set(dsi, DSI_WRPCR, WRPCR_PLLEN);
-   ret = readl_poll_timeout(dsi->base + DSI_WISR, val, val & WISR_PLLLS,
-SLEEP_US, TIMEOUT_US);
+   ret = readl_poll_timeout_atomic(dsi->base + DSI_WISR, val, val & WISR_PLLLS,
+   SLEEP_US, TIMEOUT_US);
if (ret)
DRM_DEBUG_DRIVER("!TIMEOUT! waiting PLL, let's continue\n");
 
return 0;
 }
 
+static int dw_mipi_dsi_clk_is_enabled(struct clk_hw *hw)
+{
+   struct dw_mipi_dsi_stm *dsi = clk_to_dw_mipi_dsi_stm(hw);
+
+   return dsi_read(dsi, DSI_WRPCR) & WRPCR_PLLEN;
+}
+
+static unsigned long dw_mipi_dsi_clk_recalc_rate(struct clk_hw *hw,
+unsigned long parent_rate)
+{
+   struct dw_mipi_dsi_stm *dsi = clk_to_dw_mipi_dsi_stm(hw);
+   unsigned int idf, ndiv, odf, pll_in_khz, pll_out_khz;
+   u32 val;
+
+   DRM_DEBUG_DRIVER("\n");
+
+   pll_in_khz = (unsigned int)(parent_rate / 1000);
+
+   val = dsi_read(dsi, DSI_WRPCR);
+
+   idf = (val & WRPCR_IDF) >> 11;
+   if (!idf)
+   idf = 1;
+   ndiv = (val & WRPCR_NDIV) >> 2;
+   odf = int_pow(2, (val & WRPCR_ODF) >> 16);
+
+   /* Get the adjusted pll out value */
+   pll_out_khz = dsi_pll_get_clkout_khz(pll_in_khz, idf, ndiv, odf);
+
+   return (unsigned long)pll_out_khz * 1000;
+}
+
+static long dw_mipi_dsi_clk_round_rate(struct clk_hw *hw, unsigned long rate,
+  unsigned long *parent_rate)
+{
+   struct dw_mipi_dsi_stm *dsi = clk_to_dw_mipi_dsi_stm(hw);
+   unsigned int idf, ndiv, odf, pll_in_khz, pll_out_khz;
+   int ret;
+
+   DRM_DEBUG_DRIVER("\n");
+
+   pll_in_khz = (unsigned int)(*parent_rate / 1000);
+
+   /* Compute best pll parameters */
+   idf = 0;
+   ndiv = 0;
+   odf = 0;
+
+   ret = dsi_pll_get_params(dsi, pll_in_khz, rate / 1000,
+   &idf, &ndiv, &odf);
+   if 
