Re: [PATCH 1/3] drm/buddy: Fix contiguous memory allocation issues

2023-08-22 Thread Christian König

On 21.08.23 at 13:16, Christian König wrote:

On 21.08.23 at 12:14, Arunpravin Paneer Selvam wrote:

Contiguous requests are currently implemented such that the size
is rounded up to the next power of 2 and the corresponding order
block is picked from the freelist.

In addition to the older method, the new method rounds the size
down to a power of 2 and picks the corresponding order block from
the freelist. For the remaining size we traverse the tree and try
to allocate either from the freelist block's buddy or from the
peer block. If the remaining size is not free in the peer/buddy
block, we pick the next freelist block and repeat the same method.
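
To make the split concrete, here is a small stand-alone arithmetic sketch (not drm_buddy code; names and sizes are illustrative) of how a non-power-of-two request could be served as a rounded-down order block plus a remainder, instead of rounding the whole request up:

/* Stand-alone sketch (not drm_buddy code): a 768K contiguous request is
 * served as a 512K order block plus a 256K remainder taken from that
 * block's buddy/peer, instead of rounding the whole request up to 1M. */
#include <stdint.h>
#include <stdio.h>

static uint64_t rounddown_pow_of_two(uint64_t x)
{
        uint64_t r = 1;

        while (r * 2 <= x)
                r *= 2;
        return r;
}

int main(void)
{
        uint64_t size = 768ULL << 10;                   /* requested contiguous size */
        uint64_t head = rounddown_pow_of_two(size);     /* order block from the freelist */
        uint64_t tail = size - head;                    /* remainder from the buddy/peer */

        printf("head block: %llu KiB, remainder: %llu KiB\n",
               (unsigned long long)(head >> 10),
               (unsigned long long)(tail >> 10));
        return 0;
}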


I think it's worth mentioning that Xinhui tried something similar a 
few months ago, but that didn't look like it would work. For this 
approach here I'm more confident.


Off hand the implementation looks clean to me, but Matthew or others 
who have more background in how the implementation works need to 
take a look as well.


One more thing I've just noticed, not sure if Matthew already noted it: 
when you mention "fix" in the subject line people might try to backport 
it; it's better to write "improve" and drop the "issues" at the end.


Regards,
Christian.



Thanks,
Christian.



Moved contiguous/alignment size computation part and trim
function to the drm buddy manager.

Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/drm_buddy.c | 253 ++--
  include/drm/drm_buddy.h |   6 +-
  2 files changed, 248 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 7098f125b54a..220f60c08a03 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -569,6 +569,197 @@ static int __drm_buddy_alloc_range(struct 
drm_buddy *mm,

  return __alloc_range(mm, &dfs, start, size, blocks);
  }
  +static int __alloc_contiguous_block_from_buddy(struct drm_buddy *mm,
+   u64 size,
+   u64 min_block_size,
+   struct drm_buddy_block *block,
+   struct list_head *blocks)
+{
+    struct drm_buddy_block *buddy, *parent = NULL;
+    u64 start, offset = 0;
+    LIST_HEAD(dfs);
+    int err;
+
+    if (!block)
+    return -EINVAL;
+
+    buddy = __get_buddy(block);
+    if (!buddy)
+    return -ENOSPC;
+
+    if (drm_buddy_block_is_allocated(buddy))
+    return -ENOSPC;
+
+    parent = block->parent;
+    if (!parent)
+    return -ENOSPC;
+
+    if (block->parent->right == block) {
+    u64 remaining;
+
+    /* Compute the leftover size for allocation */
+    remaining = max((size - drm_buddy_block_size(mm, buddy)),
+    min_block_size);
+    if (!IS_ALIGNED(remaining, min_block_size))
+    remaining = round_up(remaining, min_block_size);
+
+    /* Check if remaining size is greater than buddy block size */
+    if (drm_buddy_block_size(mm, buddy) < remaining)
+    return -ENOSPC;
+
+    offset = drm_buddy_block_size(mm, buddy) - remaining;
+    }
+
+    list_add(&parent->tmp_link, &dfs);
+    start = drm_buddy_block_offset(parent) + offset;
+
+    err = __alloc_range(mm, &dfs, start, size, blocks);
+    if (err)
+    return -ENOSPC;
+
+    return 0;
+}
+
+static int __alloc_contiguous_block_from_peer(struct drm_buddy *mm,
+  u64 size,
+  u64 min_block_size,
+  struct drm_buddy_block *block,
+  struct list_head *blocks)
+{
+    struct drm_buddy_block *first, *peer, *tmp;
+    struct drm_buddy_block *parent = NULL;
+    u64 start, offset = 0;
+    unsigned int order;
+    LIST_HEAD(dfs);
+    int err;
+
+    if (!block)
+    return -EINVAL;
+
+    order = drm_buddy_block_order(block);
+    /* Add freelist block to dfs list */
+    list_add(&block->tmp_link, &dfs);
+
+    tmp = block;
+    parent = block->parent;
+    while (parent) {
+    if (block->parent->left == block) {
+    if (parent->left != tmp) {
+    peer = parent->left;
+    break;
+    }
+    } else {
+    if (parent->right != tmp) {
+    peer = parent->right;
+    break;
+    }
+    }
+
+    tmp = parent;
+    parent = tmp->parent;
+    }
+
+    if (!parent)
+    return -ENOSPC;
+
+    do {
+    if (drm_buddy_block_is_allocated(peer))
+    return -ENOSPC;
+    /* Exit loop if peer block order is equal to block order */
+    if (drm_buddy_block_order(peer) == order)
+    break;
+
+    if (drm_buddy_block_is_split(peer)) {
+    /* Traverse down to the block order level */
+    if (block->parent->left == block)
+    peer = peer->right;
+    else
+    peer = peer->left;
+    } else {
+    break;
+    }
+    } while (1);
+
+    if (block->parent->left == block) {
+    u64 remaining;
+
+ 

Re: [PATCH] drm/prime: Support page array >= 4GB

2023-08-22 Thread Christian König

On 22.08.23 at 20:27, Philip Yang wrote:


On 2023-08-22 05:43, Christian König wrote:



On 21.08.23 at 22:02, Philip Yang wrote:

Without the unsigned long typecast, the size is passed in as zero if the page
array size is >= 4GB (nr_pages >= 0x100000), and the sg list converted will
have the first and the last chunk lost.
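
A stand-alone illustration of the overflow being fixed (assuming a 64-bit build and PAGE_SHIFT == 12):

/* nr_pages is a 32-bit quantity; shifting it by PAGE_SHIFT wraps to 0
 * once the page array describes 4GiB, unless it is widened first. */
#include <stdio.h>

#define PAGE_SHIFT 12

int main(void)
{
        unsigned int nr_pages = 0x100000;       /* 1M pages == 4GiB */

        unsigned long bad  = nr_pages << PAGE_SHIFT;                    /* 32-bit shift, wraps to 0 */
        unsigned long good = (unsigned long)nr_pages << PAGE_SHIFT;     /* widened first: 4GiB */

        printf("bad=%lu good=%lu\n", bad, good);
        return 0;
}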


Good catch, but I'm not sure if this is enough to make it work.

In addition to that, I don't think we have a use case for BOs > 4GiB.


>4GB buffers are normal for compute applications. The issue was reported as 
"Maelstrom generated exerciser detects miscompares when GPU accesses 
larger remote GPU memory" on a GFX 9.4.3 APU, which uses the GTT domain to 
allocate VRAM and triggers the bug in this drm prime helper. With this 
fix, the test passed.




Why is the application allocating all the data as a single BO?

Usually you have a single texture, image, array etc... in a single BO, 
but this here looks a bit like the application is trying to allocate all 
of its memory in a single BO (it could of course be that this isn't the case 
and that it's really just one giant data structure).


Swapping such large BOs out at once is quite impractical, so should we 
ever have a use case like suspend/resume or checkpoint/restore with 
this it will most likely fail.


Christian.


Regards,

Philip



Christian.



Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/drm_prime.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index f924b8b4ab6b..2630ad2e504d 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -830,7 +830,7 @@ struct sg_table *drm_prime_pages_to_sg(struct 
drm_device *dev,

  if (max_segment == 0)
  max_segment = UINT_MAX;
  err = sg_alloc_table_from_pages_segment(sg, pages, nr_pages, 0,
-    nr_pages << PAGE_SHIFT,
+    (unsigned long)nr_pages << PAGE_SHIFT,
  max_segment, GFP_KERNEL);
  if (err) {
  kfree(sg);






Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-22 Thread Tomasz Figa
Hi Hsia-Jun,

On Tue, Aug 22, 2023 at 8:14 PM Hsia-Jun Li  wrote:
>
> Hello
>
> I would like to introduce a usage of SHMEM similar to DMA-buf, the major
> purpose of which is sharing metadata or just a pure container across
> drivers.
>
> We need to exchange some sort of metadata between drivers, like dynamic
> HDR data between video4linux2 and DRM.

If the metadata isn't too big, would it be enough to just have the
kernel copy_from_user() to a kernel buffer in the ioctl code?
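
A minimal sketch of that suggestion (the ioctl argument struct and device type are hypothetical, just to show the copy_from_user() shape):

/* Hypothetical ioctl handler: copy a small metadata blob from userspace
 * into a kernel buffer instead of passing a shared-buffer fd around. */
struct hdr_metadata_arg {
        __u64 user_ptr;         /* userspace pointer to the metadata blob */
        __u32 size;             /* size of the blob in bytes */
        __u32 pad;
};

static long my_set_hdr_metadata(struct my_device *mdev, void __user *uarg)
{
        struct hdr_metadata_arg arg;
        void *buf;

        if (copy_from_user(&arg, uarg, sizeof(arg)))
                return -EFAULT;

        if (!arg.size || arg.size > SZ_4K)      /* metadata is expected to be small */
                return -EINVAL;

        buf = memdup_user(u64_to_user_ptr(arg.user_ptr), arg.size);
        if (IS_ERR(buf))
                return PTR_ERR(buf);

        /* validate and apply the metadata here, then release the copy */
        kfree(buf);
        return 0;
}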

> Or the graphics frame buffer is
> too complex to be described with a plain plane's DMA-buf fd.
> An issue between DRM and V4L2 is that DRM can only support 4 planes
> while it is 8 for V4L2. It would be pretty hard for DRM to extend its
> interface to support those 4 more planes, which would lead to revision of
> many standards like Vulkan, EGL.

Could you explain how a shmem buffer could be used to support frame
buffers with more than 4 planes?

>
> Also, there is no reason to consume a device's memory for content that
> the device can't read, or to waste an IOMMU entry on such data.

That's right, but DMA-buf doesn't really imply any of those. DMA-buf
is just a kernel object with some backing memory. It's up to the
allocator to decide how the backing memory is allocated and up to the
importer on whether it would be mapped into an IOMMU.

> Usually, such metadata would be values to be written to a
> hardware's registers; a 4KiB page would hold 1024 items of 32-bit registers.
>
> Still, I have some problems with SHMEM:
> 1. I don't want the userspace to modify the contents of the SHMEM allocated
> by the kernel, is there a way to do so?

This is generally impossible without doing any of the two:
1) copying the contents to an internal buffer not accessible to the
userspace, OR
2) modifying any of the buffer mappings to read-only

2) can actually be more costly than 1) (depending on the architecture,
data size, etc.), so we shouldn't just discard the option of a simple
copy_from_user() in the ioctl.

> 2. Should I create a helper function for installing the SHMEM file as a fd?

We already have the udmabuf device [1] to turn a memfd into a DMA-buf,
so maybe that would be enough?

[1] https://elixir.bootlin.com/linux/v6.5-rc7/source/drivers/dma-buf/udmabuf.c
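
For reference, a userspace sketch of that existing path: create and seal a memfd, then convert it with the udmabuf ioctl. Error handling is omitted; udmabuf expects the memfd to carry F_SEAL_SHRINK and the size to be page aligned.

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/* Returns a DMA-buf fd backed by a sealed memfd of 'size' bytes. */
int metadata_dmabuf_create(size_t size)
{
        int memfd = memfd_create("metadata", MFD_ALLOW_SEALING);
        ftruncate(memfd, size);
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL);

        struct udmabuf_create create = {
                .memfd  = memfd,
                .flags  = UDMABUF_FLAGS_CLOEXEC,
                .offset = 0,
                .size   = size,
        };
        int devfd = open("/dev/udmabuf", O_RDWR);
        int dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);

        close(devfd);
        close(memfd);
        return dmabuf_fd;
}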

Best,
Tomasz

>
> --
> Hsia-Jun(Randy) Li


Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-22 Thread Hsia-Jun Li




On 8/23/23 03:55, Nicolas Dufresne wrote:



Hi,

On Tuesday, 22 August 2023 at 19:14 +0800, Hsia-Jun Li wrote:

Hello

I would like to introduce a usage of SHMEM similar to DMA-buf, the major
purpose of which is sharing metadata or just a pure container across
drivers.

We need to exchange some sort of metadata between drivers, like dynamic
HDR data between video4linux2 and DRM. Or the graphics frame buffer is
too complex to be described with a plain plane's DMA-buf fd.
An issue between DRM and V4L2 is that DRM can only support 4 planes
while it is 8 for V4L2. It would be pretty hard for DRM to extend its
interface to support those 4 more planes, which would lead to revision of
many standards like Vulkan, EGL.

Also, there is no reason to consume a device's memory for content
that the device can't read, or to waste an IOMMU entry on such data.
Usually, such metadata would be values to be written to a
hardware's registers; a 4KiB page would hold 1024 items of 32-bit registers.

Still, I have some problems with SHMEM:
1. I don't want the userspace to modify the contents of the SHMEM allocated
by the kernel, is there a way to do so?
2. Should I create a helper function for installing the SHMEM file as a fd?


Please have a look at memfd and the seal feature, it does cover the reason why


That is the implementation I need; it would affect the userspace, not the 
kernel space. Should I expand a kAPI for memfd or just take the 
implementation over for the SHMEM?

This interface needs to offer three things:
1. an fd for userspace to exchange between drivers
2. a kernel virtual address for access
3. userspace SEAL

Meanwhile, I am thinking whether we should offer a generic context 
header for such usage, or whether we need other fields in a driver to describe it.

struct shmem_generic_container {
	u64 format; /* use DRM modifier vendor bits but */
	u32 size; /* size of the payload */
	u8 payload[];
};

/* format linear for nesting dolls context */
struct shmem_nesting_container {
	u32 num;
	u64 formats[num];   /* pseudocode: one format per nested entry */
	u32 sizes[num];     /* pseudocode: size of each nested entry */
	u32 offsets[num];   /* offset from the payload below */
	u8 payload[];
};

unsealed shared memory requires full trust. For controls, SEAL_WRITE is even
needed, as with appropriate timing, a malicious process can modify the data in-
between validation and allocation, causing a possible memory overflow.

https://man7.org/linux/man-pages/man2/memfd_create.2.html
File sealing
In the absence of file sealing, processes that communicate via
shared memory must either trust each other, or take measures to
deal with the possibility that an untrusted peer may manipulate
the shared memory region in problematic ways.  For example, an
untrusted peer might modify the contents of the shared memory at
any time, or shrink the shared memory region.  The former
possibility leaves the local process vulnerable to time-of-check-
to-time-of-use race conditions (typically dealt with by copying
data from the shared memory region before checking and using it).
The latter possibility leaves the local process vulnerable to
SIGBUS signals when an attempt is made to access a now-
nonexistent location in the shared memory region.  (Dealing with
this possibility necessitates the use of a handler for the SIGBUS
signal.)

Dealing with untrusted peers imposes extra complexity on code
that employs shared memory.  Memory sealing enables that extra
complexity to be eliminated, by allowing a process to operate
secure in the knowledge that its peer can't modify the shared
memory in an undesired fashion.

[...]

regards,
Nicolas


--
Hsia-Jun(Randy) Li


Re: [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-08-22 Thread Matthew Brost
On Mon, Aug 21, 2023 at 03:17:29PM +0200, Christian König wrote:
> On 18.08.23 at 15:13, Matthew Brost wrote:
> > On Fri, Aug 18, 2023 at 07:27:33AM +0200, Christian König wrote:
> > > On 17.08.23 at 19:54, Matthew Brost wrote:
> > > > On Thu, Aug 17, 2023 at 03:39:40PM +0200, Christian König wrote:
> > > > > On 11.08.23 at 04:31, Matthew Brost wrote:
> > > > > > Rather than call free_job and run_job in same work item have a 
> > > > > > dedicated
> > > > > > work item for each. This aligns with the design and intended use of 
> > > > > > work
> > > > > > queues.
> > > > > I would rather say we should get completely rid of the free_job 
> > > > > callback.
> > > > > 
> > > > Would we still have work item? e.g. Would we still want to call
> > > > drm_sched_get_cleanup_job which removes the job from the pending list
> > > > and adjusts the TDR? Trying to figure out what this looks like. We
> > > > probably can't do all of this from an IRQ context.
> > > > 
> > > > > Essentially the job is just the container which carries the
> > > > > information which is necessary before you push it to the hw. The real
> > > > > representation of the submission is actually the scheduler fence.
> > > > > 
> > > > Most of the free_jobs call plus drm_sched_job_cleanup + a put on job. In
> > > > Xe this cannot be called from an IRQ context either.
> > > > 
> > > > I'm just confused what exactly you are suggesting here.
> > > To summarize in one sentence: Instead of the job we keep the scheduler and
> > > hardware fences around after pushing the job to the hw.
> > > 
> > > The free_job callback would then be replaced by dropping the reference on
> > > the scheduler and hw fence.
> > > 
> > > Would that work for you?
> > > 
> > I don't think so for a few reasons.
> > 
> > The job and hw fence are different structures (also different allocs) 
> > for a reason. The job is referenced until it is complete (the hw fence is
> > signaled) and free_job is called. This reference is needed for the
> > TDR to work properly and also for some reset flows.
> 
> That is exactly what I want to avoid, tying the TDR to the job is what some
> AMD engineers pushed for because it looked like a simple solution and made
> the whole thing similar to what Windows does.
> 
> This turned the previous relatively clean scheduler and TDR design into a
> complete nightmare. The job contains quite a bunch of things which are not
> necessarily available after the application which submitted the job is torn
> down.
>

Agree, the TDR shouldn't be accessing anything application specific,
rather just the internal job state required to tear the job down on the
hardware.
 
> So what happens is that you either have stale pointers in the TDR which can
> go boom extremely easily or we somehow find a way to keep the necessary

I have not experienced the TDR going boom in Xe.

> structures (which include struct thread_info and struct file for this driver
> connection) alive until all submissions are completed.
> 

In Xe we keep everything alive until all submissions are completed. By
everything I mean the drm job, entity, scheduler, and VM via a reference
counting scheme. All of these structures are just kernel state which can
safely be accessed even if the application has been killed.

If we need to tear down on demand we just set the TDR to a minimum value and
it kicks the jobs off the hardware, gracefully cleans everything up and
drops all references. This is a benefit of the 1 to 1 relationship, not
sure if this works with how AMDGPU uses the scheduler.

> Delaying application tear down is also not an option because then you run
> into massive trouble with the OOM killer (or more generally OOM handling).
> See what we do in drm_sched_entity_flush() as well.
> 

Not an issue for Xe, we never call drm_sched_entity_flush as our
reference counting scheme ensures all jobs are finished before we attempt
to tear down the entity / scheduler.

> Since adding the TDR support we completely exercised this through in the
> last two or three years or so. And to sum it up I would really like to get
> away from this mess again.
> 
> Compared to that what i915 does is actually rather clean I think.
> 

Not even close, resets were a nightmare in the i915 (I spent years
trying to get this right and it probably still doesn't completely work) and in Xe
we basically got it right on the first attempt.

> >   Also in Xe some of the
> > things done in free_job cannot be done from an IRQ context, hence calling
> > this from the scheduler worker is rather helpful.
> 
> Well, putting things for cleanup into a work item doesn't sound like
> something hard.
>

That is exactly what we are doing in the scheduler with the free_job
work item.

> Question is what do you really need for TDR which is not inside the hardware
> fence?
>

A reference to the entity to be able to kick the job off the hardware.
A reference to the entity, job, and VM for error capture.

We also need a reference to the job for recovery after a GPU 

RE: [Patch v2 2/3] drm/mst: Refactor the flow for payload allocation/removement

2023-08-22 Thread Lin, Wayne
[Public]

Thanks, Lyude!
Should I push another version to fix the indentation?

> -Original Message-
> From: Lyude Paul 
> Sent: Friday, August 18, 2023 6:17 AM
> To: Lin, Wayne ; dri-devel@lists.freedesktop.org;
> amd-...@lists.freedesktop.org
> Cc: jani.nik...@intel.com; ville.syrj...@linux.intel.com; imre.d...@intel.com;
> Wentland, Harry ; Zuo, Jerry
> 
> Subject: Re: [Patch v2 2/3] drm/mst: Refactor the flow for payload
> allocation/removement
>
> Two small comments:
>
> On Mon, 2023-08-07 at 10:56 +0800, Wayne Lin wrote:
> > [Why]
> > Today, the allocation/deallocation steps and status are a bit unclear.
> >
> > For instance, payload->vc_start_slot = -1 stands for "the failure of
> > updating the DPCD payload ID table" and can also represent "payload is
> > not allocated yet". These two cases should be handled differently and
> > hence it's better to distinguish them for better understanding.
> >
> > [How]
> > Define enumeration - ALLOCATION_LOCAL, ALLOCATION_DFP and
> > ALLOCATION_REMOTE to distinguish different allocation statuses. Adjust
> > the code to handle the different statuses accordingly for a better
> > understanding of the sequence of payload allocation and payload
> removement.
> >
> > For payload creation, the procedure should look like this:
> > DRM part 1:
> > * step 1 - update sw mst mgr variables to add a new payload
> > * step 2 - add payload at immediate DFP DPCD payload table
> >
> > Driver:
> > * Add new payload in HW and sync up with DFP by sending ACT
> >
> > DRM Part 2:
> > * Send ALLOCATE_PAYLOAD sideband message to allocate bandwidth along
> the
> >   virtual channel.
> >
> > And as for payload removement, the procedure should look like this:
> > DRM part 1:
> > * step 1 - Send ALLOCATE_PAYLOAD sideband message to release bandwidth
> >along the virtual channel
> > * step 2 - Clear payload allocation at immediate DFP DPCD payload
> > table
> >
> > Driver:
> > * Remove the payload in HW and sync up with DFP by sending ACT
> >
> > DRM part 2:
> > * update sw mst mgr variables to remove the payload
> >
> > Note that it's fine to fail when communicating with the branch device
> > connected at the immediate downstream-facing port, but updating the variables
> > of the SW mst mgr and the HW configuration should be conducted anyway. That's
> > because it's under commit_tail and we need to complete the HW
> programming.
> >
> > Changes since v1:
> > * Remove the set-but-not-used variable 'old_payload' in function
> >   'nv50_msto_prepare'. Caught by kernel test robot 
> >
> > Signed-off-by: Wayne Lin 
> > ---
> >  .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  20 ++-
> > drivers/gpu/drm/display/drm_dp_mst_topology.c | 159 +++--
> -
> >  drivers/gpu/drm/i915/display/intel_dp_mst.c   |  18 +-
> >  drivers/gpu/drm/nouveau/dispnv50/disp.c   |  21 +--
> >  include/drm/display/drm_dp_mst_helper.h   |  23 ++-
> >  5 files changed, 153 insertions(+), 88 deletions(-)
> >
> > diff --git
> a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > index d9a482908380..9ad509279b0a 100644
> > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
> > @@ -219,7 +219,7 @@ static void dm_helpers_construct_old_payload(
> > /* Set correct time_slots/PBN of old payload.
> >  * other fields (delete & dsc_enabled) in
> >  * struct drm_dp_mst_atomic_payload are don't care fields
> > -* while calling drm_dp_remove_payload()
> > +* while calling drm_dp_remove_payload_part2()
> >  */
> > for (i = 0; i < current_link_table.stream_count; i++) {
> > dc_alloc =
> > @@ -262,13 +262,12 @@ bool
> > dm_helpers_dp_mst_write_payload_allocation_table(
> >
> > mst_mgr = &aconnector->mst_root->mst_mgr;
> > mst_state = to_drm_dp_mst_topology_state(mst_mgr->base.state);
> > -
> > -   /* It's OK for this to fail */
> > new_payload = drm_atomic_get_mst_payload_state(mst_state,
> > aconnector->mst_output_port);
> >
> > if (enable) {
> > target_payload = new_payload;
> >
> > +   /* It's OK for this to fail */
> > drm_dp_add_payload_part1(mst_mgr, mst_state,
> new_payload);
> > } else {
> > /* construct old payload by VCPI*/
> > @@ -276,7 +275,7 @@ bool
> dm_helpers_dp_mst_write_payload_allocation_table(
> > new_payload, &old_payload);
> > target_payload = &old_payload;
> >
> > -   drm_dp_remove_payload(mst_mgr, mst_state,
> &old_payload, new_payload);
> > +   drm_dp_remove_payload_part1(mst_mgr, mst_state,
> new_payload);
> > }
> >
> > /* mst_mgr->->payloads are VC payload notify MST branch using
> DPCD
> > or @@ -342,7 +341,7 @@ bool
> dm_helpers_dp_mst_send_payload_allocation(
> > struct amdgpu_dm_connector *aconnector;
> > struct drm_dp_mst_topology_state *mst_state;
> > struct drm_dp_mst_topology_mgr 

RE: [PATCH 3/3] drm/mst: adjust the function drm_dp_remove_payload_part2()

2023-08-22 Thread Lin, Wayne
[AMD Official Use Only - General]

> -Original Message-
> From: Imre Deak 
> Sent: Saturday, August 19, 2023 1:46 AM
> To: Lin, Wayne 
> Cc: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org;
> ly...@redhat.com; jani.nik...@intel.com; ville.syrj...@linux.intel.com;
> Wentland, Harry ; Zuo, Jerry
> 
> Subject: Re: [PATCH 3/3] drm/mst: adjust the function
> drm_dp_remove_payload_part2()
>
> On Tue, Aug 08, 2023 at 03:47:47AM +, Lin, Wayne wrote:
> > [AMD Official Use Only - General]
> >
> > > -Original Message-
> > > From: Imre Deak 
> > > Sent: Tuesday, August 8, 2023 12:00 AM
> > > To: Lin, Wayne 
> > > Cc: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org;
> > > ly...@redhat.com; jani.nik...@intel.com;
> > > ville.syrj...@linux.intel.com; Wentland, Harry
> > > ; Zuo, Jerry 
> > > Subject: Re: [PATCH 3/3] drm/mst: adjust the function
> > > drm_dp_remove_payload_part2()
> > >
> > > On Mon, Aug 07, 2023 at 02:43:02AM +, Lin, Wayne wrote:
> > > > [AMD Official Use Only - General]
> > > >
> > > > > -Original Message-
> > > > > From: Imre Deak 
> > > > > Sent: Friday, August 4, 2023 11:32 PM
> > > > > To: Lin, Wayne 
> > > > > Cc: dri-devel@lists.freedesktop.org;
> > > > > amd-...@lists.freedesktop.org; ly...@redhat.com;
> > > > > jani.nik...@intel.com; ville.syrj...@linux.intel.com; Wentland,
> > > > > Harry ; Zuo, Jerry 
> > > > > Subject: Re: [PATCH 3/3] drm/mst: adjust the function
> > > > > drm_dp_remove_payload_part2()
> > > > >
> > > > > On Fri, Aug 04, 2023 at 02:20:29PM +0800, Wayne Lin wrote:
> > > > > > [...]
> > > > > > diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > > > > b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > > > > index e04f87ff755a..4270178f95f6 100644
> > > > > > --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > > > > +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > > > > @@ -3382,8 +3382,7 @@
> > > > > EXPORT_SYMBOL(drm_dp_remove_payload_part1);
> > > > > >   * drm_dp_remove_payload_part2() - Remove an MST payload
> locally
> > > > > >   * @mgr: Manager to use.
> > > > > >   * @mst_state: The MST atomic state
> > > > > > - * @old_payload: The payload with its old state
> > > > > > - * @new_payload: The payload with its latest state
> > > > > > + * @payload: The payload with its latest state
> > > > > >   *
> > > > > >   * Updates the starting time slots of all other payloads
> > > > > > which would have
> > > > > been shifted towards
> > > > > >   * the start of the payload ID table as a result of removing
> > > > > > a payload. Driver should call this @@ -3392,25 +3391,36 @@
> > > > > EXPORT_SYMBOL(drm_dp_remove_payload_part1);
> > > > > >   */
> > > > > >  void drm_dp_remove_payload_part2(struct
> > > drm_dp_mst_topology_mgr
> > > > > *mgr,
> > > > > >  struct drm_dp_mst_topology_state
> > > > > *mst_state,
> > > > > > -const struct drm_dp_mst_atomic_payload
> > > > > *old_payload,
> > > > > > -struct drm_dp_mst_atomic_payload
> > > > > *new_payload)
> > > > > > +struct drm_dp_mst_atomic_payload
> > > > > *payload)
> > > > > >  {
> > > > > > struct drm_dp_mst_atomic_payload *pos;
> > > > > > +   u8 time_slots_to_remove;
> > > > > > +   u8 next_payload_vc_start = mgr->next_start_slot;
> > > > > > +
> > > > > > +   /* Find the current allocated time slot number of the payload */
> > > > > > +   list_for_each_entry(pos, &mst_state->payloads, next) {
> > > > > > +   if (pos != payload &&
> > > > > > +   pos->vc_start_slot > payload->vc_start_slot &&
> > > > > > +   pos->vc_start_slot < next_payload_vc_start)
> > > > > > +   next_payload_vc_start = pos->vc_start_slot;
> > > > > > +   }
> > > > > > +
> > > > > > +   time_slots_to_remove = next_payload_vc_start -
> > > > > > +payload->vc_start_slot;
> > > > >
> > > > > Imo, the intuitive way would be to pass the old payload state to
> > > > > this function - which already contains the required time_slots
> > > > > param
> > > > > - and refactor things instead moving vc_start_slot from the
> > > > > payload state to mgr suggested by Ville earlier.
> > > > >
> > > > > --Imre
> > > >
> > > > Hi Imre,
> > > > Thanks for your feedback!
> > > >
> > > > I understand it's functionally correct. But IMHO, the time slot in the
> > > > old state and the time slot in the current payload table are still a bit
> > > > conceptually different. My thought is that the time slot at
> > > > the moment when we are removing the payload would be a better
> > > > choice.
> > >
> > > Yes, they are different. The old state contains the time slot the
> > > payload was added with in a preceding commit and so the time slot
> > > value which should be used when removing the same payload in the
> current commit.
> > >
> > > The new state contains a time slot value with which the payload will
> > > be added in the current 

Re: [PATCH v4 43/48] drm/ttm: introduce pool_shrink_rwsem

2023-08-22 Thread Qi Zheng

Hi Daniel,

On 2023/8/22 21:56, Daniel Vetter wrote:

On Mon, Aug 07, 2023 at 07:09:31PM +0800, Qi Zheng wrote:

Currently, the synchronize_shrinkers() is only used by TTM pool. It only
requires that no shrinkers run in parallel.

After we use RCU+refcount method to implement the lockless slab shrink,
we can not use shrinker_rwsem or synchronize_rcu() to guarantee that all
shrinker invocations have seen an update before freeing memory.

So we introduce a new pool_shrink_rwsem to implement a private
synchronize_shrinkers(), so as to achieve the same purpose.

Signed-off-by: Qi Zheng 
Reviewed-by: Muchun Song 


On the 5 drm patches (I counted 2 ttm and 3 drivers) for merging through
some other tree (since I'm assuming that's how this will land):


Yeah, there are 5 drm patches: PATCH v4 07/48 23/48 24/48 25/48 43/48.



Acked-by: Daniel Vetter 


Thanks for your review!

Qi




---
  drivers/gpu/drm/ttm/ttm_pool.c | 15 +++
  include/linux/shrinker.h   |  2 --
  mm/shrinker.c  | 15 ---
  3 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index c9c9618c0dce..38b4c280725c 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -74,6 +74,7 @@ static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 
1];
  static spinlock_t shrinker_lock;
  static struct list_head shrinker_list;
  static struct shrinker *mm_shrinker;
+static DECLARE_RWSEM(pool_shrink_rwsem);
  
  /* Allocate pages of size 1 << order with the given gfp_flags */

  static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t 
gfp_flags,
@@ -317,6 +318,7 @@ static unsigned int ttm_pool_shrink(void)
unsigned int num_pages;
struct page *p;
  
+	down_read(&pool_shrink_rwsem);

	spin_lock(&shrinker_lock);
	pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
	list_move_tail(&pt->shrinker_list, &shrinker_list);
@@ -329,6 +331,7 @@ static unsigned int ttm_pool_shrink(void)
} else {
num_pages = 0;
}
+   up_read(&pool_shrink_rwsem);
  
  	return num_pages;

  }
@@ -572,6 +575,18 @@ void ttm_pool_init(struct ttm_pool *pool, struct device 
*dev,
  }
  EXPORT_SYMBOL(ttm_pool_init);
  
+/**

+ * synchronize_shrinkers - Wait for all running shrinkers to complete.
+ *
+ * This is useful to guarantee that all shrinker invocations have seen an
+ * update, before freeing memory, similar to rcu.
+ */
+static void synchronize_shrinkers(void)
+{
+   down_write(&pool_shrink_rwsem);
+   up_write(&pool_shrink_rwsem);
+}
+
  /**
   * ttm_pool_fini - Cleanup a pool
   *
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index c55c07c3f0cb..025c8070dd86 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -103,8 +103,6 @@ struct shrinker *shrinker_alloc(unsigned int flags, const 
char *fmt, ...);
  void shrinker_register(struct shrinker *shrinker);
  void shrinker_free(struct shrinker *shrinker);
  
-extern void synchronize_shrinkers(void);

-
  #ifdef CONFIG_SHRINKER_DEBUG
  extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker,
  const char *fmt, ...);
diff --git a/mm/shrinker.c b/mm/shrinker.c
index 3ab301ff122d..a27779ed3798 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -650,18 +650,3 @@ void shrinker_free(struct shrinker *shrinker)
kfree(shrinker);
  }
  EXPORT_SYMBOL_GPL(shrinker_free);
-
-/**
- * synchronize_shrinkers - Wait for all running shrinkers to complete.
- *
- * This is equivalent to calling unregister_shrink() and register_shrinker(),
- * but atomically and with less overhead. This is useful to guarantee that all
- * shrinker invocations have seen an update, before freeing memory, similar to
- * rcu.
- */
-void synchronize_shrinkers(void)
-{
-   down_write(&shrinker_rwsem);
-   up_write(&shrinker_rwsem);
-}
-EXPORT_SYMBOL(synchronize_shrinkers);
--
2.30.2





Re: [PATCH drm-misc-next] drm/nouveau: uapi: don't pass NO_PREFETCH flag implicitly

2023-08-22 Thread Faith Ekstrand
On Tue, Aug 22, 2023 at 6:41 PM Danilo Krummrich  wrote:

> Currently, NO_PREFETCH is passed implicitly through
> drm_nouveau_gem_pushbuf_push::length and drm_nouveau_exec_push::va_len.
>
> Since this is a direct representation of how the HW is programmed it
> isn't really future proof for a uAPI. Hence, fix this up for the new
> uAPI and split up the va_len field of struct drm_nouveau_exec_push,
> such that we keep 32bit for va_len and 32bit for flags.
>
> For drm_nouveau_gem_pushbuf_push::length at least provide
> NOUVEAU_GEM_PUSHBUF_NO_PREFETCH to indicate the bit shift.
>
> While at it, fix up nv50_dma_push() as well, such that the caller
> doesn't need to encode the NO_PREFETCH flag into the length parameter.
>
> Signed-off-by: Danilo Krummrich 
> ---
>  drivers/gpu/drm/nouveau/nouveau_dma.c  |  7 +--
>  drivers/gpu/drm/nouveau/nouveau_dma.h  |  8 ++--
>  drivers/gpu/drm/nouveau/nouveau_exec.c | 15 ---
>  drivers/gpu/drm/nouveau/nouveau_gem.c  |  6 --
>  include/uapi/drm/nouveau_drm.h |  8 +++-
>  5 files changed, 34 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.c
> b/drivers/gpu/drm/nouveau/nouveau_dma.c
> index b90cac6d5772..059925e5db6a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dma.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dma.c
> @@ -69,16 +69,19 @@ READ_GET(struct nouveau_channel *chan, uint64_t
> *prev_get, int *timeout)
>  }
>
>  void
> -nv50_dma_push(struct nouveau_channel *chan, u64 offset, int length)
> +nv50_dma_push(struct nouveau_channel *chan, u64 offset, u32 length,
> + bool prefetch)
>  {
> struct nvif_user *user = >drm->client.device.user;
> struct nouveau_bo *pb = chan->push.buffer;
> int ip = (chan->dma.ib_put * 2) + chan->dma.ib_base;
>
> BUG_ON(chan->dma.ib_free < 1);
> +   WARN_ON(length > NV50_DMA_PUSH_MAX_LENGTH);
>
> nouveau_bo_wr32(pb, ip++, lower_32_bits(offset));
> -   nouveau_bo_wr32(pb, ip++, upper_32_bits(offset) | length << 8);
> +   nouveau_bo_wr32(pb, ip++, upper_32_bits(offset) | length << 8 |
> +   (prefetch ? 0 : (1 << 31)));
>

It feels a bit weird to be inverting this bit twice. IDK that it matters,
though.


>
> chan->dma.ib_put = (chan->dma.ib_put + 1) & chan->dma.ib_max;
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.h
> b/drivers/gpu/drm/nouveau/nouveau_dma.h
> index 035a709c7be1..fb471c357336 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dma.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_dma.h
> @@ -31,7 +31,8 @@
>  #include "nouveau_chan.h"
>
>  int nouveau_dma_wait(struct nouveau_channel *, int slots, int size);
> -void nv50_dma_push(struct nouveau_channel *, u64 addr, int length);
> +void nv50_dma_push(struct nouveau_channel *, u64 addr, u32 length,
> +  bool prefetch);
>
>  /*
>   * There's a hw race condition where you can't jump to your PUT offset,
> @@ -45,6 +46,9 @@ void nv50_dma_push(struct nouveau_channel *, u64 addr,
> int length);
>   */
>  #define NOUVEAU_DMA_SKIPS (128 / 4)
>
> +/* Maximum push buffer size. */
> +#define NV50_DMA_PUSH_MAX_LENGTH 0x7f
> +
>  /* Object handles - for stuff that's doesn't use handle == oclass. */
>  enum {
> NvDmaFB = 0x8002,
> @@ -89,7 +93,7 @@ FIRE_RING(struct nouveau_channel *chan)
>
> if (chan->dma.ib_max) {
> nv50_dma_push(chan, chan->push.addr + (chan->dma.put << 2),
> - (chan->dma.cur - chan->dma.put) << 2);
> + (chan->dma.cur - chan->dma.put) << 2, true);
> } else {
> WRITE_PUT(chan->dma.cur);
> }
> diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c
> b/drivers/gpu/drm/nouveau/nouveau_exec.c
> index 0f927adda4ed..a123b07b2adf 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_exec.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
> @@ -164,8 +164,10 @@ nouveau_exec_job_run(struct nouveau_job *job)
> }
>
> for (i = 0; i < exec_job->push.count; i++) {
> -   nv50_dma_push(chan, exec_job->push.s[i].va,
> - exec_job->push.s[i].va_len);
> +   struct drm_nouveau_exec_push *p = _job->push.s[i];
> +   bool prefetch = !(p->flags &
> DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH);
> +
> +   nv50_dma_push(chan, p->va, p->va_len, prefetch);
> }
>
> ret = nouveau_fence_emit(fence, chan);
> @@ -223,7 +225,14 @@ nouveau_exec_job_init(struct nouveau_exec_job **pjob,
>  {
> struct nouveau_exec_job *job;
> struct nouveau_job_args args = {};
> -   int ret;
> +   int i, ret;
> +
> +   for (i = 0; i < __args->push.count; i++) {
> +   struct drm_nouveau_exec_push *p = &__args->push.s[i];
> +
> +   if (p->va_len > NV50_DMA_PUSH_MAX_LENGTH)
> +   return -EINVAL;
>

This can probably be wrapped in unlikely().  Also, it'd be nice if we
printed an error 

RE: [PATCH v14 RESEND 1/6] dt-bindings: display: imx: Add i.MX8qxp/qm DPU binding

2023-08-22 Thread Ying Liu
On  Tuesday, August 22, 2023 7:47 PM Maxime Ripard  wrote:
> 
> Hi,

Hi Maxime,

Thanks for your review.

> 
> On Tue, Aug 22, 2023 at 04:59:44PM +0800, Liu Ying wrote:
> > This patch adds bindings for i.MX8qxp/qm Display Processing Unit.
> >
> > Reviewed-by: Rob Herring 
> > Signed-off-by: Liu Ying 
> > ---
> > v7->v14:
> > * No change.
> >
> > v6->v7:
> > * Add Rob's R-b tag back.
> >
> > v5->v6:
> > * Use graph schema. So, drop Rob's R-b tag as review is needed.
> >
> > v4->v5:
> > * No change.
> >
> > v3->v4:
> > * Improve compatible property by using enum instead of oneOf+const.
> (Rob)
> > * Add Rob's R-b tag.
> >
> > v2->v3:
> > * No change.
> >
> > v1->v2:
> > * Fix yamllint warnings.
> > * Require bypass0 and bypass1 clocks for both i.MX8qxp and i.MX8qm, as
> the
> >   display controller subsystem spec does say that they exist.
> > * Use new dt binding way to add clocks in the example.
> > * Trivial tweaks for the example.
> >
> >  .../bindings/display/imx/fsl,imx8qxp-dpu.yaml | 387 ++
> >  1 file changed, 387 insertions(+)
> >  create mode 100644
> Documentation/devicetree/bindings/display/imx/fsl,imx8qxp-dpu.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/display/imx/fsl,imx8qxp-
> dpu.yaml b/Documentation/devicetree/bindings/display/imx/fsl,imx8qxp-
> dpu.yaml
> > new file mode 100644
> > index ..6b05c586cd9d
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/display/imx/fsl,imx8qxp-
> dpu.yaml
> > @@ -0,0 +1,387 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/display/imx/fsl,imx8qxp-dpu.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Freescale i.MX8qm/qxp Display Processing Unit
> > +
> > +maintainers:
> > +  - Liu Ying 
> > +
> > +description: |
> > +  The Freescale i.MX8qm/qxp Display Processing Unit(DPU) is comprised of
> two
> > +  main components that include a blit engine for 2D graphics accelerations
> > +  and a display controller for display output processing, as well as a
> command
> > +  sequencer.
> > +
> > +properties:
> > +  compatible:
> > +enum:
> > +  - fsl,imx8qxp-dpu
> > +  - fsl,imx8qm-dpu
> > +
> > +  reg:
> > +maxItems: 1
> > +
> > +  interrupts:
> > +items:
> > +  - description: |
> > +  store9 shadow load interrupt(blit engine)
> > +  - description: |
> > +  store9 frame complete interrupt(blit engine)
> > +  - description: |
> > +  store9 sequence complete interrupt(blit engine)
> > +  - description: |
> > +  extdst0 shadow load interrupt
> > +  (display controller, content stream 0)
> > +  - description: |
> > +  extdst0 frame complete interrupt
> > +  (display controller, content stream 0)
> > +  - description: |
> > +  extdst0 sequence complete interrupt
> > +  (display controller, content stream 0)
> > +  - description: |
> > +  extdst4 shadow load interrupt
> > +  (display controller, safety stream 0)
> > +  - description: |
> > +  extdst4 frame complete interrupt
> > +  (display controller, safety stream 0)
> > +  - description: |
> > +  extdst4 sequence complete interrupt
> > +  (display controller, safety stream 0)
> > +  - description: |
> > +  extdst1 shadow load interrupt
> > +  (display controller, content stream 1)
> > +  - description: |
> > +  extdst1 frame complete interrupt
> > +  (display controller, content stream 1)
> > +  - description: |
> > +  extdst1 sequence complete interrupt
> > +  (display controller, content stream 1)
> > +  - description: |
> > +  extdst5 shadow load interrupt
> > +  (display controller, safety stream 1)
> > +  - description: |
> > +  extdst5 frame complete interrupt
> > +  (display controller, safety stream 1)
> > +  - description: |
> > +  extdst5 sequence complete interrupt
> > +  (display controller, safety stream 1)
> > +  - description: |
> > +  disengcfg0 shadow load interrupt
> > +  (display controller, display stream 0)
> > +  - description: |
> > +  disengcfg0 frame complete interrupt
> > +  (display controller, display stream 0)
> > +  - description: |
> > +  disengcfg0 sequence complete interrupt
> > +  (display controller, display stream 0)
> > +  - description: |
> > +  framegen0 programmable interrupt0
> > +  (display controller, display stream 0)
> > +  - description: |
> > +  framegen0 programmable interrupt1
> > +  (display controller, display stream 0)
> > +  - description: |
> > +  framegen0 programmable interrupt2
> > +  (display controller, display stream 0)
> > +  - description: |
> > +  framegen0 programmable 

Re: [Intel-gfx] [PATCH] drm/i915/dp: Cable type identification for DP2.1

2023-08-22 Thread Almahallawy, Khaled
On Fri, 2023-06-09 at 11:35 +0300, Jani Nikula wrote:
> On Fri, 09 Jun 2023, Animesh Manna  wrote:
> > For DP alt mode, the display driver gets the information
> > about cable speed and cable type through the TCSS_DDI_STATUS
> > register, which will be updated by the type-c platform driver.
> > Accordingly, update DPCD 0x110 with the cable information before
> > link training starts. This change came as part of the DP2.1 SCR.
> 
> No need to refer to the SCR anymore, as DP 2.1 is out.
> 
> There are a bunch of detailed comments inline.
> 
> High level, this should probably be done much earlier. See Table 5-21 
> in
> DP 2.1. We need to read DPCD 0x2217 before writing 0x110. The DPRX
> updates 0x2217 before asserting hotplug, so we should probably read
> it
> at detect where we read all other DPCD too.
> 
> How early is TCSS_DDI_STATUS available, should we read that at
> hotplug
> too? 

This is available once the cable is inserted and is configured
by TCSS/EC in Chrome and PD in Windows. 
Please check: VLK-42522

> For USB-C we should write to DPCD 0x110 the least common
> denominator between DPCD 0x2217 and 0x110.
> 
> Another question which I didn't find an answer to yet, does writing
> 0x110 impact what the DPRX reports for capabilities, i.e. can we
> proceed

No, DPRX caps will not change. A DP2.1 sink will still report UHBRx even
if the cable doesn't support UHBRx.

> with link training normally from there, *or* should we limit the
> sink_rates/common_rates based on TCSS_DDI_STATUS and DPCD 0x2217
> i.e. filter out UHBR as needed.

Yes, we should limit "common rates" to the intersection of (source,
sink, cable)
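
A generic sketch (not i915 code; names are illustrative) of what limiting the advertised rates to that intersection could look like:

/* Keep only the sink rates that the source also supports and that do not
 * exceed the cable's maximum link rate. */
static int build_common_rates(const int *source_rates, int num_source,
                              const int *sink_rates, int num_sink,
                              int cable_max_rate, int *common_rates)
{
        int i, j, n = 0;

        for (i = 0; i < num_source; i++) {
                if (source_rates[i] > cable_max_rate)
                        continue;

                for (j = 0; j < num_sink; j++) {
                        if (sink_rates[j] == source_rates[i]) {
                                common_rates[n++] = source_rates[i];
                                break;
                        }
                }
        }

        return n;
}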

The question is, do we really need to care about reading/writing DPCD
0x110 & 0x2217 given that TCSS_DDI_STATUS already reflects that?

Thank You
Khaled
> 
> Please read bspec and DP 2.1 further to find answers.
> 
> > Note: This patch is not tested due to unavailability of
> > cable. Sending as RFC for design review.
> > 
> > Signed-off-by: Animesh Manna 
> > ---
> >  drivers/gpu/drm/i915/display/intel_ddi.c | 57
> > 
> >  drivers/gpu/drm/i915/display/intel_tc.c  | 10 +
> >  drivers/gpu/drm/i915/display/intel_tc.h  |  1 +
> >  drivers/gpu/drm/i915/i915_reg.h  |  5 +++
> >  include/drm/display/drm_dp.h |  9 
> >  5 files changed, 82 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c
> > b/drivers/gpu/drm/i915/display/intel_ddi.c
> > index 70d44edd8c6e..3a0f6a3c9f98 100644
> > --- a/drivers/gpu/drm/i915/display/intel_ddi.c
> > +++ b/drivers/gpu/drm/i915/display/intel_ddi.c
> > @@ -2208,6 +2208,55 @@ static void
> > intel_dp_sink_set_msa_timing_par_ignore_state(struct intel_dp
> > *intel
> > str_enable_disable(enable));
> >  }
> >  
> > +#define CABLE_SPEED_SHIFT 4
> > +
> > +enum dp_cable_speed {
> > +   DP_CABLE_HBR3 = 1,
> > +   DP_CABLE_UHBR10,
> > +   DP_CABLE_GEN3_UHBR20,
> > +   DP_CABLE_GEN4_UHBR20
> > +};
> > +
> > +static void intel_dp_set_cable_attributes(struct intel_dp
> > *intel_dp,
> > + u8 cable_attributes)
> 
> There are two "domains" for the cable information, the hardware
> register
> and the DPCD register. However, cable_attributes is neither, but also
> not helpful, which makes this function cumbersome.
> 
> Usually in cases like this, you'd pick one or the other, *or* if you
> want to have a generic middle ground, you'd make it helpful and easy
> to
> use and understand (e.g. a struct).
> 
> In this case, I'd just pick the DPCD as the format, because it's
> platform independent and the whole thing is simple enough.
> 
> So this function would really reduce down to a single DPCD write.
> 
> > +{
> > +   u8 cable_speed;
> > +   bool active_cable, retimer;
> > +   u8 cable_attr_dpcd;
> > +
> > +   cable_speed = cable_attributes >> CABLE_SPEED_SHIFT;
> > +
> > +   switch (cable_speed) {
> > +   case DP_CABLE_HBR3:
> > +   cable_attr_dpcd = 0;
> > +   break;
> > +   case DP_CABLE_UHBR10:
> > +   cable_attr_dpcd = 1;
> > +   break;
> > +   case DP_CABLE_GEN3_UHBR20:
> > +   case DP_CABLE_GEN4_UHBR20:
> > +   cable_attr_dpcd = 2;
> > +   break;
> > +   default:
> > +   cable_attr_dpcd = 0;
> > +   break;
> > +   }
> > +
> > +   active_cable = (cable_attributes <<
> > TCSS_DDI_STATUS_CABLE_ATTR_SHIFT) &
> > +  TCSS_DDI_STATUS_ACTIVE_CABLE;
> > +   retimer = (cable_attributes <<
> > TCSS_DDI_STATUS_CABLE_ATTR_SHIFT) &
> > + TCSS_DDI_STATUS_RETIMER_REDRIVER;
> > +   if (retimer && active_cable)
> > +   cable_attr_dpcd |= DP_CABLE_TYPE_RETIMER_ACTIVE;
> > +   else if (active_cable)
> > +   cable_attr_dpcd |= DP_CABLE_TYPE_LRD_ACTIVE;
> > +   else
> > +   cable_attr_dpcd |= DP_CABLE_TYPE_PASSIVE;
> > +
> > +   drm_dp_dpcd_writeb(_dp->aux,
> > DP_CABLE_ATTRIBUTES_UPDATED_BY_TX,
> > +  cable_attr_dpcd);
> > +}
> > +
> >  static void 

[PATCH] accel/habanalabs: refactor deprecated strncpy

2023-08-22 Thread Justin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1].

A suitable replacement is `strscpy` [2] due to the fact that it
guarantees NUL-termination on its destination buffer argument which is
_not_ the case for `strncpy`!

There is likely no bug happening in this case since HL_STR_MAX is
strictly larger than all source strings. Nonetheless, prefer a safer and
more robust interface.

It should also be noted that `strscpy` will not pad like `strncpy`. If
this NUL-padding behavior is _required_ we should use `strscpy_pad`
instead of `strscpy`.
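
A small userspace illustration of the difference (strscpy itself is kernel-only; its behavior is described in comments based on the kernel documentation):

#include <string.h>

int main(void)
{
        char dst[8];

        /* strncpy: the source is longer than the buffer, so all 8 bytes are
         * copied and no terminating NUL is written -> dst is not a string. */
        strncpy(dst, "operational", sizeof(dst));

        /* strscpy(dst, "operational", sizeof(dst)) would instead copy 7
         * characters, always NUL-terminate, and return -E2BIG to signal
         * truncation; strscpy_pad() additionally zero-fills the remainder,
         * matching strncpy's padding behavior for shorter sources. */
        return 0;
}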

Link: www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-harden...@vger.kernel.org
Signed-off-by: Justin Stitt 
---
Note: build-tested only.
---
 drivers/accel/habanalabs/common/habanalabs_drv.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/accel/habanalabs/common/habanalabs_drv.c 
b/drivers/accel/habanalabs/common/habanalabs_drv.c
index 7263e84c1a4d..d9a3418b5ae4 100644
--- a/drivers/accel/habanalabs/common/habanalabs_drv.c
+++ b/drivers/accel/habanalabs/common/habanalabs_drv.c
@@ -408,13 +408,13 @@ static int create_hdev(struct hl_device **dev, struct 
pci_dev *pdev)
hdev->pdev = pdev;
 
/* Assign status description string */
-   strncpy(hdev->status[HL_DEVICE_STATUS_OPERATIONAL], "operational", 
HL_STR_MAX);
-   strncpy(hdev->status[HL_DEVICE_STATUS_IN_RESET], "in reset", 
HL_STR_MAX);
-   strncpy(hdev->status[HL_DEVICE_STATUS_MALFUNCTION], "disabled", 
HL_STR_MAX);
-   strncpy(hdev->status[HL_DEVICE_STATUS_NEEDS_RESET], "needs reset", 
HL_STR_MAX);
-   strncpy(hdev->status[HL_DEVICE_STATUS_IN_DEVICE_CREATION],
-   "in device creation", HL_STR_MAX);
-   strncpy(hdev->status[HL_DEVICE_STATUS_IN_RESET_AFTER_DEVICE_RELEASE],
+   strscpy(hdev->status[HL_DEVICE_STATUS_OPERATIONAL], "operational", 
HL_STR_MAX);
+   strscpy(hdev->status[HL_DEVICE_STATUS_IN_RESET], "in reset", 
HL_STR_MAX);
+   strscpy(hdev->status[HL_DEVICE_STATUS_MALFUNCTION], "disabled", 
HL_STR_MAX);
+   strscpy(hdev->status[HL_DEVICE_STATUS_NEEDS_RESET], "needs reset", 
HL_STR_MAX);
+   strscpy(hdev->status[HL_DEVICE_STATUS_IN_DEVICE_CREATION],
+   "in device creation", HL_STR_MAX);
+   strscpy(hdev->status[HL_DEVICE_STATUS_IN_RESET_AFTER_DEVICE_RELEASE],
"in reset after device release", 
HL_STR_MAX);
 
 

---
base-commit: 706a741595047797872e669b3101429ab8d378ef
change-id: 
20230823-strncpy-drivers-accel-habanalabs-common-habanalabs_drv-7ffecf6882ed

Best regards,
--
Justin Stitt 



Re: [PATCH v2 2/4] drm/xe/vm: Implement userptr page pinning

2023-08-22 Thread Matthew Brost
On Tue, Aug 22, 2023 at 06:21:34PM +0200, Thomas Hellström wrote:
> Implement pinning of userptrs between VM_BIND and VM_UNBIND, which will
> facilitate avoiding long hangs on non-preemptible workloads. But don't
> hook it up to userspace just yet.
> 
> v2:
> - Avoid marking userptr VMAs as invalid in the mmu invalidation notifier.
>   (Matthew Brost)
> - Add an WARN that we don't try to repin userptr pages (Matthew Brost)
> 
> Signed-off-by: Thomas Hellström 

Reviewed-by: Matthew Brost 

> ---
>  drivers/gpu/drm/xe/xe_vm.c   | 80 +++-
>  drivers/gpu/drm/xe/xe_vm.h   |  9 
>  drivers/gpu/drm/xe/xe_vm_types.h | 12 +
>  3 files changed, 79 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 8bf7f62e6548..037ac42f74a5 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -74,10 +74,6 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
>   if (notifier_seq == vma->userptr.notifier_seq)
>   return 0;
>  
> - pages = kvmalloc_array(num_pages, sizeof(*pages), GFP_KERNEL);
> - if (!pages)
> - return -ENOMEM;
> -
>   if (vma->userptr.sg) {
>   dma_unmap_sgtable(xe->drm.dev,
> vma->userptr.sg,
> @@ -87,6 +83,18 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
>   vma->userptr.sg = NULL;
>   }
>  
> + /* TODO: Convert to xe_assert() */
> + if (XE_WARN_ON(vma->userptr.pinned_pages)) {
> + unpin_user_pages_dirty_lock(vma->userptr.pinned_pages,
> + vma->userptr.num_pinned,
> + !read_only);
> + pages = vma->userptr.pinned_pages;
> + } else {
> + pages = kvmalloc_array(num_pages, sizeof(*pages), GFP_KERNEL);
> + if (!pages)
> + return -ENOMEM;
> + }
> +
>   pinned = ret = 0;
>   if (in_kthread) {
>   if (!mmget_not_zero(vma->userptr.notifier.mm)) {
> @@ -97,11 +105,18 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
>   }
>  
>   while (pinned < num_pages) {
> - ret = get_user_pages_fast(xe_vma_userptr(vma) +
> -   pinned * PAGE_SIZE,
> -   num_pages - pinned,
> -   read_only ? 0 : FOLL_WRITE,
> -   [pinned]);
> + if (xe_vma_is_pinned(vma))
> + ret = pin_user_pages_fast(xe_vma_userptr(vma) +
> +   pinned * PAGE_SIZE,
> +   num_pages - pinned,
> +   read_only ? 0 : FOLL_WRITE,
> +   [pinned]);
> + else
> + ret = get_user_pages_fast(xe_vma_userptr(vma) +
> +   pinned * PAGE_SIZE,
> +   num_pages - pinned,
> +   read_only ? 0 : FOLL_WRITE,
> +   [pinned]);
>   if (ret < 0) {
>   if (in_kthread)
>   ret = 0;
> @@ -137,19 +152,24 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
>   if (ret)
>   goto out_free_sg;
>  
> - for (i = 0; i < pinned; ++i) {
> - if (!read_only) {
> - lock_page(pages[i]);
> - set_page_dirty(pages[i]);
> - unlock_page(pages[i]);
> + if (!xe_vma_is_pinned(vma)) {
> + for (i = 0; i < pinned; ++i) {
> + if (!read_only) {
> + lock_page(pages[i]);
> + set_page_dirty(pages[i]);
> + unlock_page(pages[i]);
> + }
> +
> + mark_page_accessed(pages[i]);
>   }
>  
> - mark_page_accessed(pages[i]);
> + release_pages(pages, pinned);
> + kvfree(pages);
> + } else {
> + vma->userptr.pinned_pages = pages;
> + vma->userptr.num_pinned = pinned;
>   }
>  
> - release_pages(pages, pinned);
> - kvfree(pages);
> -
>   vma->userptr.notifier_seq = notifier_seq;
>   if (xe_vma_userptr_check_repin(vma) == -EAGAIN)
>   goto retry;
> @@ -160,9 +180,14 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
>   sg_free_table(vma->userptr.sg);
>   vma->userptr.sg = NULL;
>  out_release_pages:
> - release_pages(pages, pinned);
> + if (!xe_vma_is_pinned(vma))
> + release_pages(pages, pinned);
> + else
> + unpin_user_pages(pages, pinned);
> + vma->userptr.num_pinned = 0;
>  mm_closed:
>   

[PATCH drm-misc-next] drm/nouveau: uapi: don't pass NO_PREFETCH flag implicitly

2023-08-22 Thread Danilo Krummrich
Currently, NO_PREFETCH is passed implicitly through
drm_nouveau_gem_pushbuf_push::length and drm_nouveau_exec_push::va_len.

Since this is a direct representation of how the HW is programmed it
isn't really future proof for a uAPI. Hence, fix this up for the new
uAPI and split up the va_len field of struct drm_nouveau_exec_push,
such that we keep 32bit for va_len and 32bit for flags.

For drm_nouveau_gem_pushbuf_push::length at least provide
NOUVEAU_GEM_PUSHBUF_NO_PREFETCH to indicate the bit shift.

While at it, fix up nv50_dma_push() as well, such that the caller
doesn't need to encode the NO_PREFETCH flag into the length parameter.
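
From the userspace side, the result is a sketch like the following (push_va and push_len are placeholder values; the struct layout and flag name are the ones proposed in this patch):

/* Sketch: ask the HW not to prefetch this push buffer segment via an
 * explicit flag instead of encoding it into the length field. */
struct drm_nouveau_exec_push push = {
        .va     = push_va,                              /* GPU VA of the push buffer */
        .va_len = push_len,                             /* length in bytes */
        .flags  = DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH,    /* new flag added here */
};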

Signed-off-by: Danilo Krummrich 
---
 drivers/gpu/drm/nouveau/nouveau_dma.c  |  7 +--
 drivers/gpu/drm/nouveau/nouveau_dma.h  |  8 ++--
 drivers/gpu/drm/nouveau/nouveau_exec.c | 15 ---
 drivers/gpu/drm/nouveau/nouveau_gem.c  |  6 --
 include/uapi/drm/nouveau_drm.h |  8 +++-
 5 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.c 
b/drivers/gpu/drm/nouveau/nouveau_dma.c
index b90cac6d5772..059925e5db6a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dma.c
@@ -69,16 +69,19 @@ READ_GET(struct nouveau_channel *chan, uint64_t *prev_get, 
int *timeout)
 }
 
 void
-nv50_dma_push(struct nouveau_channel *chan, u64 offset, int length)
+nv50_dma_push(struct nouveau_channel *chan, u64 offset, u32 length,
+ bool prefetch)
 {
struct nvif_user *user = >drm->client.device.user;
struct nouveau_bo *pb = chan->push.buffer;
int ip = (chan->dma.ib_put * 2) + chan->dma.ib_base;
 
BUG_ON(chan->dma.ib_free < 1);
+   WARN_ON(length > NV50_DMA_PUSH_MAX_LENGTH);
 
nouveau_bo_wr32(pb, ip++, lower_32_bits(offset));
-   nouveau_bo_wr32(pb, ip++, upper_32_bits(offset) | length << 8);
+   nouveau_bo_wr32(pb, ip++, upper_32_bits(offset) | length << 8 |
+   (prefetch ? 0 : (1 << 31)));
 
chan->dma.ib_put = (chan->dma.ib_put + 1) & chan->dma.ib_max;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.h 
b/drivers/gpu/drm/nouveau/nouveau_dma.h
index 035a709c7be1..fb471c357336 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dma.h
+++ b/drivers/gpu/drm/nouveau/nouveau_dma.h
@@ -31,7 +31,8 @@
 #include "nouveau_chan.h"
 
 int nouveau_dma_wait(struct nouveau_channel *, int slots, int size);
-void nv50_dma_push(struct nouveau_channel *, u64 addr, int length);
+void nv50_dma_push(struct nouveau_channel *, u64 addr, u32 length,
+  bool prefetch);
 
 /*
  * There's a hw race condition where you can't jump to your PUT offset,
@@ -45,6 +46,9 @@ void nv50_dma_push(struct nouveau_channel *, u64 addr, int 
length);
  */
 #define NOUVEAU_DMA_SKIPS (128 / 4)
 
+/* Maximum push buffer size. */
+#define NV50_DMA_PUSH_MAX_LENGTH 0x7f
+
 /* Object handles - for stuff that's doesn't use handle == oclass. */
 enum {
NvDmaFB = 0x8002,
@@ -89,7 +93,7 @@ FIRE_RING(struct nouveau_channel *chan)
 
if (chan->dma.ib_max) {
nv50_dma_push(chan, chan->push.addr + (chan->dma.put << 2),
- (chan->dma.cur - chan->dma.put) << 2);
+ (chan->dma.cur - chan->dma.put) << 2, true);
} else {
WRITE_PUT(chan->dma.cur);
}
diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c 
b/drivers/gpu/drm/nouveau/nouveau_exec.c
index 0f927adda4ed..a123b07b2adf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_exec.c
+++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
@@ -164,8 +164,10 @@ nouveau_exec_job_run(struct nouveau_job *job)
}
 
for (i = 0; i < exec_job->push.count; i++) {
-   nv50_dma_push(chan, exec_job->push.s[i].va,
- exec_job->push.s[i].va_len);
+   struct drm_nouveau_exec_push *p = _job->push.s[i];
+   bool prefetch = !(p->flags & DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH);
+
+   nv50_dma_push(chan, p->va, p->va_len, prefetch);
}
 
ret = nouveau_fence_emit(fence, chan);
@@ -223,7 +225,14 @@ nouveau_exec_job_init(struct nouveau_exec_job **pjob,
 {
struct nouveau_exec_job *job;
struct nouveau_job_args args = {};
-   int ret;
+   int i, ret;
+
+   for (i = 0; i < __args->push.count; i++) {
+   struct drm_nouveau_exec_push *p = &__args->push.s[i];
+
+   if (p->va_len > NV50_DMA_PUSH_MAX_LENGTH)
+   return -EINVAL;
+   }
 
job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL);
if (!job)
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index f39360870c70..2f3dc4d71657 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -856,9 +856,11 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void 
*data,
for (i = 0; i < req->nr_push; 

Re: [RFC PATCH v2 06/11] page-pool: add device memory support

2023-08-22 Thread Mina Almasry
On Tue, Aug 22, 2023 at 5:24 AM Jesper Dangaard Brouer
 wrote:
>
>
>
> On 22/08/2023 08.05, Mina Almasry wrote:
> > On Sat, Aug 19, 2023 at 2:51 AM Jesper Dangaard Brouer
> >  wrote:
> >>
> >> On 10/08/2023 03.57, Mina Almasry wrote:
> >>> Overload the LSB of struct page* to indicate that it's a page_pool_iov.
> >>>
> >>> Refactor mm calls on struct page * into helpers, and add page_pool_iov
> >>> handling on those helpers. Modify callers of these mm APIs with calls to
> >>> these helpers instead.
> >>>
> >>
> >> I don't like this approach.
> >> This is adding code to the PP (page_pool) fast-path in multiple places.
> >>
> >> I've not had time to run my usual benchmarks, which are here:
> >>
> >> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
> >>
> >
> > I ported over this benchmark to my tree and ran it, my results:
> >
>
> What CPU is this and GHz?  (I guess 2.6 GHz based on results).
>
> (It looks like this CPU is more efficient, instructions per cycles, than
> my E5-1650 v4 @ 3.60GHz).
>

cat /proc/cpuinfo
...
vendor_id   : GenuineIntel
cpu family  : 6
model   : 143
model name  : Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
stepping: 8
microcode   : 0x
cpu MHz : 2699.998
```

This is a vCPU on the Google Cloud A3 VMs.

> > net-next @ b44693495af8
> > https://pastebin.com/raw/JuU7UQXe
> >
> > + Jakub's memory-provider APIs:
> > https://pastebin.com/raw/StMBhetn
> >
> > + devmem TCP changes:
> > https://pastebin.com/raw/mY1L6U4r
> >
>
> Only a single cycle slowdown for "page_pool01_fast_path".
>  From 10 cycles to 11 cycles.
>
> > + intentional regression just to make sure the benchmark is working:
> > https://pastebin.com/raw/wqWhcJdG
> >
> > I don't seem to be able to detect a regression with this series as-is,
> > but I'm not that familiar with the test and may be doing something
> > wrong or misinterpreting the results. Does this look ok to you?
> >
>
> The performance results are better than I expected.  The small
> regression from 10 cycles to 11 cycles is actually 10%, but I expect
> with some likely/unlikely instrumentation we can "likely" remove this again.
>

So the patch is already optimized carefully (I hope) to put all the
devmem processing in the default unlikely path. Willem showed me that:

if (page_pool_iov())
   return handle_page_pool_iov();

return handle_page();

The handle_page() will be 'likely' by default, which removes the need
for explicit likely/unlikely. I'm not sure we can get better perf with
explicit likely/unlikely, but I can try.
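
For readers following the thread, here is a purely illustrative sketch of the
LSB-tagging idea under discussion. The names (PP_IOV_BIT, the helpers and the
opaque structs) are invented for the example and are not claimed to match the
series:

/* Illustrative only: LSB-tagged pointers multiplexing two types. */
#include <stdbool.h>
#include <stdint.h>

#define PP_IOV_BIT 0x1UL

struct page;            /* opaque for the example */
struct page_pool_iov;   /* opaque for the example */

static inline bool page_is_page_pool_iov(const struct page *page)
{
	/* LSB set means "this is really a page_pool_iov" */
	return (uintptr_t)page & PP_IOV_BIT;
}

static inline struct page_pool_iov *page_to_page_pool_iov(struct page *page)
{
	return (struct page_pool_iov *)((uintptr_t)page & ~PP_IOV_BIT);
}

static inline struct page *page_pool_iov_to_page(struct page_pool_iov *iov)
{
	return (struct page *)((uintptr_t)iov | PP_IOV_BIT);
}

With helpers shaped like this, the fast path keeps the struct page branch as
the fall-through, exactly as in the two-line snippet above.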

> So, this change actually looks acceptable from a performance PoV.
> I still think this page_pool_iov is very invasive to page_pool, but
> maybe it is better to hide this "uglyness" inside page_pool.
>
> The test primarily tests fast-path, and you also add "if" statements to
> all the DMA operations, which is not part of this benchmark.  Perhaps we
> can add unlikely statements, or inspect (objdump) the ASM to check code
> priorities the original page based "provider".
>
> >> But I'm sure it will affect performance.
> >>
>
> Guess, I was wrong ;-)
>
> --Jesper
>
>
> >> Regardless of performance, this approach is using ptr-LSB-bits, to hide
> >> that page-pointer are not really struct-pages, feels like force feeding
> >> a solution just to use the page_pool APIs.
> >>
> >>
> >>> In areas where struct page* is dereferenced, add a check for special
> >>> handling of page_pool_iov.
> >>>
> >>> The memory providers producing page_pool_iov can set the LSB on the
> >>> struct page* returned to the page pool.
> >>>
> >>> Note that instead of overloading the LSB of page pointers, we can
> >>> instead define a new union between struct page & struct page_pool_iov and
> >>> compact it in a new type. However, we'd need to implement the code churn
> >>> to modify the page_pool & drivers to use this new type. For this POC
> >>> that is not implemented (feedback welcome).
> >>>
> >>
> >> I've said before, that I prefer multiplexing on page->pp_magic.
> >> For your page_pool_iov the layout would have to match the offset of
> >> pp_magic, to do this. (And if insisting on using PP infra the refcnt
> >> would also need to align).
> >>
> >> On the allocation side, all drivers already use a driver helper
> >> page_pool_dev_alloc_pages() or we could add another (better named)
> >> helper to multiplex between other types of allocators, e.g. a devmem
> >> allocator.
> >>
> >> On free/return/recycle the functions napi_pp_put_page or skb_pp_recycle
> >> could multiplex on pp_magic and call another API.  The API could be an
> >> extension to PP helpers, but it could also be a devmap allocator helper.
> >>
> >> IMHO forcing/piggy-bagging everything into page_pool is not the right
> >> solution.  I really think netstack need to support different allocator
> >> types. The page pool have been leading the way, yes, but perhaps it is
> >> time to add an API layer that e.g. 

Re: [PATCH] gpu: drm: i915: fix documentation style

2023-08-22 Thread Vivi, Rodrigo
On Mon, 2023-08-21 at 14:00 -0700, Ceraolo Spurio, Daniele wrote:
> 
> 
> On 8/21/2023 9:22 AM, Jani Nikula wrote:
> > On Mon, 21 Aug 2023, "Ricardo B. Marliere" 
> > wrote:
> > > This patch fixes the following sphinx warnings in the htmldocs
> > > make target:
> > > 
> > > Documentation/gpu/i915:546:
> > > ./drivers/gpu/drm/i915/gt/uc/intel_huc.c:29: ERROR: Unexpected
> > > indentation.
> > > Documentation/gpu/i915:546:
> > > ./drivers/gpu/drm/i915/gt/uc/intel_huc.c:30: WARNING: Block quote
> > > ends without a blank line; unexpected unindent.
> > > Documentation/gpu/i915:546:
> > > ./drivers/gpu/drm/i915/gt/uc/intel_huc.c:35: WARNING: Bullet list
> > > ends without a blank line; unexpected unindent.
> > > 
> > > Signed-off-by: Ricardo B. Marliere 
> > Already fixed by commit 175b036472f6 ("drm/i915: fix Sphinx
> > indentation
> > warning") in drm-next.
> 
> Should we send this commit through the -fixes path, so it gets
> included 
> in 6.5?

175b036472f6 cherry-picked to drm-intel-fixes. Should be in this
week's pull request towards 6.5

> 
> Daniele
> 
> > BR,
> > Jani.
> > 
> > > ---
> > >   drivers/gpu/drm/i915/gt/uc/intel_huc.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> > > b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> > > index ddd146265beb..fa70defcb5b2 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> > > @@ -26,6 +26,7 @@
> > >    * The kernel driver is only responsible for loading the HuC
> > > firmware and
> > >    * triggering its security authentication. This is done
> > > differently depending
> > >    * on the platform:
> > > + *
> > >    * - older platforms (from Gen9 to most Gen12s): the load is
> > > performed via DMA
> > >    *   and the authentication via GuC
> > >    * - DG2: load and authentication are both performed via GSC.
> > > @@ -33,6 +34,7 @@
> > >    *   not-DG2 older platforms), while the authentication is done
> > > in 2-steps,
> > >    *   a first auth for clear-media workloads via GuC and a
> > > second one for all
> > >    *   workloads via GSC.
> > > + *
> > >    * On platforms where the GuC does the authentication, to
> > > correctly do so the
> > >    * HuC binary must be loaded before the GuC one.
> > >    * Loading the HuC is optional; however, not using the HuC
> > > might negatively
> 



Re: [PATCH] clk: Annotate struct clk_hw_onecell_data with __counted_by

2023-08-22 Thread Stephen Boyd
Quoting Kees Cook (2023-08-17 13:30:22)
> Prepare for the coming implementation by GCC and Clang of the __counted_by
> attribute. Flexible array members annotated with __counted_by can have
> their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
> (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
> functions).
> 
> As found with Coccinelle[1], add __counted_by for struct clk_hw_onecell_data.
> Additionally, since the element count member must be set before accessing
> the annotated flexible array member, move its initialization earlier.
> 
> [1] 
> https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
> 
> Cc: Michael Turquette 
> Cc: Stephen Boyd 
> Cc: Joel Stanley 
> Cc: Andrew Jeffery 
> Cc: Taichi Sugaya 
> Cc: Takao Orito 
> Cc: Qin Jian 
> Cc: Andrew Lunn 
> Cc: Gregory Clement 
> Cc: Sebastian Hesselbarth 
> Cc: Andy Gross 
> Cc: Bjorn Andersson 
> Cc: Konrad Dybcio 
> Cc: Sergio Paracuellos 
> Cc: Matthias Brugger 
> Cc: AngeloGioacchino Del Regno 
> Cc: Maxime Ripard 
> Cc: Chen-Yu Tsai 
> Cc: Jernej Skrabec 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Samuel Holland 
> Cc: Vinod Koul 
> Cc: Kishon Vijay Abraham I 
> Cc: linux-...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-asp...@lists.ozlabs.org
> Cc: linux-arm-...@vger.kernel.org
> Cc: linux-media...@lists.infradead.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-su...@lists.linux.dev
> Cc: linux-...@lists.infradead.org
> Signed-off-by: Kees Cook 
> ---

Applied to clk-next
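
For readers unfamiliar with the annotation, a minimal sketch of the pattern
being applied follows. The struct is made up for illustration (it is not
clk_hw_onecell_data) and assumes a tree where the __counted_by() macro is
available via the compiler attribute headers:

#include <linux/overflow.h>
#include <linux/slab.h>

struct example_onecell {
	unsigned int num;
	int values[] __counted_by(num);	/* flexible array bounded by ->num */
};

static struct example_onecell *example_alloc(unsigned int n)
{
	struct example_onecell *ex;

	ex = kzalloc(struct_size(ex, values, n), GFP_KERNEL);
	if (!ex)
		return NULL;

	/* Set the element count before touching the annotated array... */
	ex->num = n;

	/* ...so instrumented builds can bounds-check accesses like this one. */
	if (n)
		ex->values[0] = 0;

	return ex;
}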


Re: [PATCH] accel/ivpu/40xx: Fix buttress interrupt handling

2023-08-22 Thread Jeffrey Hugo

On 8/22/2023 3:52 AM, Stanislaw Gruszka wrote:

From: Karol Wachowski 

The Buttress spec requires that the interrupt status is cleared at
the source first (before clearing MTL_BUTTRESS_INTERRUPT_STAT),
which implies that we have to mask out the global interrupt while
handling buttress interrupts.

Fixes: 79cdc56c4a54 ("accel/ivpu: Add initial support for VPU 4")
Signed-off-by: Karol Wachowski 
Signed-off-by: Stanislaw Gruszka 


Reviewed-by: Jeffrey Hugo 
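
For context, a purely illustrative sketch of the ordering described in the
commit message; the register names and helpers below are placeholders and do
not match the ivpu driver:

static irqreturn_t example_buttress_irq(struct example_vdev *vdev)
{
	u32 status = example_reg_read(vdev, EXAMPLE_BUTTRESS_INTERRUPT_STAT);

	/* Mask the global interrupt while buttress sources are handled. */
	example_reg_write(vdev, EXAMPLE_GLOBAL_INT_MASK, 0x1);

	/* Clear each pending interrupt at its source first... */
	example_handle_and_clear_sources(vdev, status);

	/* ...then clear the aggregate buttress status... */
	example_reg_write(vdev, EXAMPLE_BUTTRESS_INTERRUPT_STAT, status);

	/* ...and only then unmask the global interrupt again. */
	example_reg_write(vdev, EXAMPLE_GLOBAL_INT_MASK, 0x0);

	return IRQ_HANDLED;
}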


Re: [RFC]: shmem fd for non-DMA buffer sharing cross drivers

2023-08-22 Thread Nicolas Dufresne
Hi,

Le mardi 22 août 2023 à 19:14 +0800, Hsia-Jun Li a écrit :
> Hello
> 
> I would like to introduce a usage of SHMEM similar to DMA-buf; the major 
> purpose of it is sharing metadata, or just a pure container, across 
> drivers.
> 
> We need to exchange some sort of metadata between drivers, like dynamic 
> HDR data between video4linux2 and DRM. Or the graphics frame buffer is 
> too complex to be described with a plain plane's DMA-buf fd.
> An issue between DRM and V4L2 is that DRM can only support 4 planes 
> while it is 8 for V4L2. It would be pretty hard for DRM to extend its 
> interface to support those 4 extra planes, which would lead to revision of 
> many standards like Vulkan and EGL.
> 
> Also, there is no reason to consume a device's memory for content that 
> the device can't read, or to waste an IOMMU entry for such data.
> Usually, such metadata would be values to be written to a hardware's 
> registers; a 4KiB page would hold 1024 items of 32-bit registers.
> 
> Still, I have some problems with SHMEM:
> 1. I don't want the userspace to modify the contents of the SHMEM allocated 
> by the kernel; is there a way to do so?
> 2. Should I create a helper function for installing the SHMEM file as an fd?

Please have a look at memfd and the seal feature; it does cover the reason why
unsealed shared memory requires full trust. For controls, the SEAL_WRITE is even
needed, as with appropriate timing, a malicious process can modify the data
in-between validation and allocation, causing a possible memory overflow.

https://man7.org/linux/man-pages/man2/memfd_create.2.html
File sealing
   In the absence of file sealing, processes that communicate via
   shared memory must either trust each other, or take measures to
   deal with the possibility that an untrusted peer may manipulate
   the shared memory region in problematic ways.  For example, an
   untrusted peer might modify the contents of the shared memory at
   any time, or shrink the shared memory region.  The former
   possibility leaves the local process vulnerable to time-of-check-
   to-time-of-use race conditions (typically dealt with by copying
   data from the shared memory region before checking and using it).
   The latter possibility leaves the local process vulnerable to
   SIGBUS signals when an attempt is made to access a now-
   nonexistent location in the shared memory region.  (Dealing with
   this possibility necessitates the use of a handler for the SIGBUS
   signal.)

   Dealing with untrusted peers imposes extra complexity on code
   that employs shared memory.  Memory sealing enables that extra
   complexity to be eliminated, by allowing a process to operate
   secure in the knowledge that its peer can't modify the shared
   memory in an undesired fashion.

   [...]
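
As a concrete illustration of that flow, here is a minimal userspace sketch
(error handling trimmed; in the RFC's case the producer would be a kernel
driver rather than userspace, so this only shows the shape of the
memfd/sealing API):

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static int make_sealed_metadata_fd(const void *data, size_t size)
{
	int fd = memfd_create("metadata", MFD_ALLOW_SEALING);
	void *map;

	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) < 0)
		goto err;

	map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto err;
	memcpy(map, data, size);
	munmap(map, size);	/* no writable mappings may remain for F_SEAL_WRITE */

	/* After this, a peer can neither resize nor modify the contents. */
	if (fcntl(fd, F_ADD_SEALS,
		  F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) < 0)
		goto err;

	return fd;	/* hand this fd to the consumer, e.g. over a unix socket */
err:
	close(fd);
	return -1;
}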

regards,
Nicolas


Re: [PATCH v5 03/11] PM / QoS: Fix constraints alloc vs reclaim locking

2023-08-22 Thread Rob Clark
On Tue, Aug 22, 2023 at 11:48 AM Rafael J. Wysocki  wrote:
>
> On Tue, Aug 22, 2023 at 8:02 PM Rob Clark  wrote:
> >
> > From: Rob Clark 
> >
> > In the process of adding lockdep annotation for drm GPU scheduler's
> > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > this lockdep splat:
> >
> >==
> >WARNING: possible circular locking dependency detected
> >6.2.0-rc8-debug+ #558 Tainted: GW
> >--
> >ring0/125 is trying to acquire lock:
> >ffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: 
> > dev_pm_qos_update_request+0x38/0x68
> >
> >but task is already holding lock:
> >ff8087239208 (>active_lock){+.+.}-{3:3}, at: 
> > msm_gpu_submit+0xec/0x178
> >
> >which lock already depends on the new lock.
> >
> >the existing dependency chain (in reverse order) is:
> >
> >-> #4 (>active_lock){+.+.}-{3:3}:
> >   __mutex_lock+0xcc/0x3c8
> >   mutex_lock_nested+0x30/0x44
> >   msm_gpu_submit+0xec/0x178
> >   msm_job_run+0x78/0x150
> >   drm_sched_main+0x290/0x370
> >   kthread+0xf0/0x100
> >   ret_from_fork+0x10/0x20
> >
> >-> #3 (dma_fence_map){}-{0:0}:
> >   __dma_fence_might_wait+0x74/0xc0
> >   dma_resv_lockdep+0x1f4/0x2f4
> >   do_one_initcall+0x104/0x2bc
> >   kernel_init_freeable+0x344/0x34c
> >   kernel_init+0x30/0x134
> >   ret_from_fork+0x10/0x20
> >
> >-> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
> >   fs_reclaim_acquire+0x80/0xa8
> >   slab_pre_alloc_hook.constprop.0+0x40/0x25c
> >   __kmem_cache_alloc_node+0x60/0x1cc
> >   __kmalloc+0xd8/0x100
> >   topology_parse_cpu_capacity+0x8c/0x178
> >   get_cpu_for_node+0x88/0xc4
> >   parse_cluster+0x1b0/0x28c
> >   parse_cluster+0x8c/0x28c
> >   init_cpu_topology+0x168/0x188
> >   smp_prepare_cpus+0x24/0xf8
> >   kernel_init_freeable+0x18c/0x34c
> >   kernel_init+0x30/0x134
> >   ret_from_fork+0x10/0x20
> >
> >-> #1 (fs_reclaim){+.+.}-{0:0}:
> >   __fs_reclaim_acquire+0x3c/0x48
> >   fs_reclaim_acquire+0x54/0xa8
> >   slab_pre_alloc_hook.constprop.0+0x40/0x25c
> >   __kmem_cache_alloc_node+0x60/0x1cc
> >   kmalloc_trace+0x50/0xa8
> >   dev_pm_qos_constraints_allocate+0x38/0x100
> >   __dev_pm_qos_add_request+0xb0/0x1e8
> >   dev_pm_qos_add_request+0x58/0x80
> >   dev_pm_qos_expose_latency_limit+0x60/0x13c
> >   register_cpu+0x12c/0x130
> >   topology_init+0xac/0xbc
> >   do_one_initcall+0x104/0x2bc
> >   kernel_init_freeable+0x344/0x34c
> >   kernel_init+0x30/0x134
> >   ret_from_fork+0x10/0x20
> >
> >-> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
> >   __lock_acquire+0xe00/0x1060
> >   lock_acquire+0x1e0/0x2f8
> >   __mutex_lock+0xcc/0x3c8
> >   mutex_lock_nested+0x30/0x44
> >   dev_pm_qos_update_request+0x38/0x68
> >   msm_devfreq_boost+0x40/0x70
> >   msm_devfreq_active+0xc0/0xf0
> >   msm_gpu_submit+0x10c/0x178
> >   msm_job_run+0x78/0x150
> >   drm_sched_main+0x290/0x370
> >   kthread+0xf0/0x100
> >   ret_from_fork+0x10/0x20
> >
> >other info that might help us debug this:
> >
> >Chain exists of:
> >  dev_pm_qos_mtx --> dma_fence_map --> >active_lock
> >
> > Possible unsafe locking scenario:
> >
> >   CPU0CPU1
> >   
> >  lock(>active_lock);
> >   lock(dma_fence_map);
> >   lock(>active_lock);
> >  lock(dev_pm_qos_mtx);
> >
> > *** DEADLOCK ***
> >
> >3 locks held by ring0/123:
> > #0: ff8087251170 (>lock){+.+.}-{3:3}, at: 
> > msm_job_run+0x64/0x150
> > #1: ffd00b0e57e8 (dma_fence_map){}-{0:0}, at: 
> > msm_job_run+0x68/0x150
> > #2: ff8087251208 (>active_lock){+.+.}-{3:3}, at: 
> > msm_gpu_submit+0xec/0x178
> >
> >stack backtrace:
> >CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
> >Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
> >Call trace:
> > dump_backtrace.part.0+0xb4/0xf8
> > show_stack+0x20/0x38
> > dump_stack_lvl+0x9c/0xd0
> > dump_stack+0x18/0x34
> > print_circular_bug+0x1b4/0x1f0
> > check_noncircular+0x78/0xac
> > __lock_acquire+0xe00/0x1060
> > lock_acquire+0x1e0/0x2f8
> > __mutex_lock+0xcc/0x3c8
> > mutex_lock_nested+0x30/0x44
> > dev_pm_qos_update_request+0x38/0x68
> > msm_devfreq_boost+0x40/0x70
> > msm_devfreq_active+0xc0/0xf0
> > msm_gpu_submit+0x10c/0x178
> > msm_job_run+0x78/0x150
> > drm_sched_main+0x290/0x370
> > 

Re: [PATCH 0/4] drm/amd/display: stop using drm_edid_override_connector_update()

2023-08-22 Thread Alex Hung




On 2023-08-22 06:01, Jani Nikula wrote:

Over the past years I've been trying to unify the override and firmware
EDID handling as well as EDID property updates. It won't work if drivers
do their own random things.
Let's check how to replace these references with appropriate ones or fork 
the function, as reverting these patches causes regressions.


Cheers,
Alex



BR,
Jani.


Cc: Alex Deucher 
Cc: Alex Hung 
Cc: Chao-kai Wang 
Cc: Daniel Wheeler 
Cc: Harry Wentland 
Cc: Hersen Wu 
Cc: Leo Li 
Cc: Rodrigo Siqueira 
Cc: Wenchieh Chien 
Cc: David Airlie 
Cc: Daniel Vetter 

Jani Nikula (4):
   Revert "drm/amd/display: drop unused count variable in
 create_eml_sink()"
   Revert "drm/amd/display: assign edid_blob_ptr with edid from debugfs"
   Revert "drm/amd/display: mark amdgpu_dm_connector_funcs_force static"
   Revert "drm/amd/display: implement force function in
 amdgpu_dm_connector_funcs"

  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 44 +++
  1 file changed, 5 insertions(+), 39 deletions(-)



Re: [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset

2023-08-22 Thread John Harrison

On 8/11/2023 11:20, Zhanjun Dong wrote:

This attempts to avoid a circular locking dependency between flush delayed
work and intel_gt_reset.
When intel_gt_reset is called, the task will hold a lock.
To cancel the delayed work here, the _sync version will also acquire a lock,
which might trigger the possible circular locking dependency warning.
When intel_gt_reset is called, the reset_in_progress flag will be set; add code
to check the flag and call the async version if a reset is in progress.

Signed-off-by: Zhanjun Dong
Cc: John Harrison
Cc: Andi Shyti
Cc: Daniel Vetter
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a0e3ef1c65d2..600388c849f7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1359,7 +1359,16 @@ static void guc_enable_busyness_worker(struct intel_guc 
*guc)
  
  static void guc_cancel_busyness_worker(struct intel_guc *guc)

  {
-   cancel_delayed_work_sync(&guc->timestamp.work);
+   /*
+* When intel_gt_reset is called, the task will hold a lock.
+* To cancel the delayed work here, the _sync version will also acquire a
+* lock, which might trigger the possible circular locking dependency warning.
+* Check the reset_in_progress flag and call the async version if a reset is
+* in progress.
+*/
This needs to explain in much more detail what is going on and why it is 
not a problem. E.g.:


   The busyness worker needs to be cancelled. In general that means
   using the synchronous cancel version to ensure that an in-progress
   worker will not keep executing beyond whatever is happening that
   needs the cancel. E.g. suspend, driver unload, etc. However, in the
   case of a reset, the synchronous version is not required and can
   trigger a false deadlock detection warning.

   The busyness worker takes the reset mutex to protect against resets
   interfering with it. However, it does a trylock and bails out if the
   reset lock is already acquired. Thus there is no actual deadlock or
   other concern with the worker running concurrently with a reset. So
   an asynchronous cancel is safe in the case of a reset rather than a
   driver unload or suspend type operation. On the other hand, if the
   cancel_sync version is used when a reset is in progress then the
   mutex deadlock detection sees the mutex being acquired through
   multiple paths and complains.

   So just don't bother. That keeps the detection code happy and is
   safe because of the trylock code described above.
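
To make that concrete, a schematic sketch of the pattern follows. The names
are invented for illustration (this is not the actual i915 code) and it
assumes the usual workqueue/mutex headers:

struct example_dev {
	struct mutex reset_lock;
	bool reset_in_progress;
	struct delayed_work sample_work;
};

static void sample_worker(struct work_struct *work)
{
	struct example_dev *ed =
		container_of(work, struct example_dev, sample_work.work);

	/* The worker only trylocks and bails out, so it can never block a reset. */
	if (!mutex_trylock(&ed->reset_lock))
		return;

	/* ... sample busyness here ... */

	mutex_unlock(&ed->reset_lock);
}

static void cancel_sample_worker(struct example_dev *ed)
{
	/*
	 * During a reset the synchronous cancel is unnecessary (the worker
	 * bails out anyway) and only upsets lockdep, so use the async variant
	 * there; everywhere else, wait for an in-flight worker to finish.
	 */
	if (ed->reset_in_progress)
		cancel_delayed_work(&ed->sample_work);
	else
		cancel_delayed_work_sync(&ed->sample_work);
}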


John.



+   if (guc_to_gt(guc)->uc.reset_in_progress)
+   cancel_delayed_work(&guc->timestamp.work);
+   else
+   cancel_delayed_work_sync(&guc->timestamp.work);
  }
  
  static void __reset_guc_busyness_stats(struct intel_guc *guc)


Re: [PATCH v5 03/11] PM / QoS: Fix constraints alloc vs reclaim locking

2023-08-22 Thread Rafael J. Wysocki
On Tue, Aug 22, 2023 at 8:02 PM Rob Clark  wrote:
>
> From: Rob Clark 
>
> In the process of adding lockdep annotation for drm GPU scheduler's
> job_run() to detect potential deadlock against shrinker/reclaim, I hit
> this lockdep splat:
>
>==
>WARNING: possible circular locking dependency detected
>6.2.0-rc8-debug+ #558 Tainted: GW
>--
>ring0/125 is trying to acquire lock:
>ffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: 
> dev_pm_qos_update_request+0x38/0x68
>
>but task is already holding lock:
>ff8087239208 (>active_lock){+.+.}-{3:3}, at: 
> msm_gpu_submit+0xec/0x178
>
>which lock already depends on the new lock.
>
>the existing dependency chain (in reverse order) is:
>
>-> #4 (>active_lock){+.+.}-{3:3}:
>   __mutex_lock+0xcc/0x3c8
>   mutex_lock_nested+0x30/0x44
>   msm_gpu_submit+0xec/0x178
>   msm_job_run+0x78/0x150
>   drm_sched_main+0x290/0x370
>   kthread+0xf0/0x100
>   ret_from_fork+0x10/0x20
>
>-> #3 (dma_fence_map){}-{0:0}:
>   __dma_fence_might_wait+0x74/0xc0
>   dma_resv_lockdep+0x1f4/0x2f4
>   do_one_initcall+0x104/0x2bc
>   kernel_init_freeable+0x344/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
>   fs_reclaim_acquire+0x80/0xa8
>   slab_pre_alloc_hook.constprop.0+0x40/0x25c
>   __kmem_cache_alloc_node+0x60/0x1cc
>   __kmalloc+0xd8/0x100
>   topology_parse_cpu_capacity+0x8c/0x178
>   get_cpu_for_node+0x88/0xc4
>   parse_cluster+0x1b0/0x28c
>   parse_cluster+0x8c/0x28c
>   init_cpu_topology+0x168/0x188
>   smp_prepare_cpus+0x24/0xf8
>   kernel_init_freeable+0x18c/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #1 (fs_reclaim){+.+.}-{0:0}:
>   __fs_reclaim_acquire+0x3c/0x48
>   fs_reclaim_acquire+0x54/0xa8
>   slab_pre_alloc_hook.constprop.0+0x40/0x25c
>   __kmem_cache_alloc_node+0x60/0x1cc
>   kmalloc_trace+0x50/0xa8
>   dev_pm_qos_constraints_allocate+0x38/0x100
>   __dev_pm_qos_add_request+0xb0/0x1e8
>   dev_pm_qos_add_request+0x58/0x80
>   dev_pm_qos_expose_latency_limit+0x60/0x13c
>   register_cpu+0x12c/0x130
>   topology_init+0xac/0xbc
>   do_one_initcall+0x104/0x2bc
>   kernel_init_freeable+0x344/0x34c
>   kernel_init+0x30/0x134
>   ret_from_fork+0x10/0x20
>
>-> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
>   __lock_acquire+0xe00/0x1060
>   lock_acquire+0x1e0/0x2f8
>   __mutex_lock+0xcc/0x3c8
>   mutex_lock_nested+0x30/0x44
>   dev_pm_qos_update_request+0x38/0x68
>   msm_devfreq_boost+0x40/0x70
>   msm_devfreq_active+0xc0/0xf0
>   msm_gpu_submit+0x10c/0x178
>   msm_job_run+0x78/0x150
>   drm_sched_main+0x290/0x370
>   kthread+0xf0/0x100
>   ret_from_fork+0x10/0x20
>
>other info that might help us debug this:
>
>Chain exists of:
>  dev_pm_qos_mtx --> dma_fence_map --> >active_lock
>
> Possible unsafe locking scenario:
>
>   CPU0CPU1
>   
>  lock(>active_lock);
>   lock(dma_fence_map);
>   lock(>active_lock);
>  lock(dev_pm_qos_mtx);
>
> *** DEADLOCK ***
>
>3 locks held by ring0/123:
> #0: ff8087251170 (>lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
> #1: ffd00b0e57e8 (dma_fence_map){}-{0:0}, at: 
> msm_job_run+0x68/0x150
> #2: ff8087251208 (>active_lock){+.+.}-{3:3}, at: 
> msm_gpu_submit+0xec/0x178
>
>stack backtrace:
>CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
>Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
>Call trace:
> dump_backtrace.part.0+0xb4/0xf8
> show_stack+0x20/0x38
> dump_stack_lvl+0x9c/0xd0
> dump_stack+0x18/0x34
> print_circular_bug+0x1b4/0x1f0
> check_noncircular+0x78/0xac
> __lock_acquire+0xe00/0x1060
> lock_acquire+0x1e0/0x2f8
> __mutex_lock+0xcc/0x3c8
> mutex_lock_nested+0x30/0x44
> dev_pm_qos_update_request+0x38/0x68
> msm_devfreq_boost+0x40/0x70
> msm_devfreq_active+0xc0/0xf0
> msm_gpu_submit+0x10c/0x178
> msm_job_run+0x78/0x150
> drm_sched_main+0x290/0x370
> kthread+0xf0/0x100
> ret_from_fork+0x10/0x20
>
> The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> freq change) path, but it is also held across allocations that could
> recurse into shrinker.
>
> Solve this by changing dev_pm_qos_constraints_allocate() into a function
> that can be called unconditionally before 

Re: [PATCH] drm/prime: Support page array >= 4GB

2023-08-22 Thread Philip Yang

  


On 2023-08-22 05:43, Christian König wrote:
> Am 21.08.23 um 22:02 schrieb Philip Yang:
>> Without unsigned long typecast, the size is passed in as zero if page
>> array size >= 4GB, nr_pages >= 0x100000, then sg list converted will
>> have the first and the last chunk lost.
>
> Good catch, but I'm not sure if this is enough to make it work.
>
> Additional to that I don't think we have an use case for BOs > 4GiB.

A >4GB buffer is normal for compute applications; the issue is reported by
"Maelstrom generated exerciser detects miscompares when GPU accesses larger
remote GPU memory." on GFX 9.4.3 APU, which uses GTT domain to allocate VRAM,
and triggers the bug in this drm prime helper. With this fix, the test passed.

Regards,
Philip

> Christian.
>
>> Signed-off-by: Philip Yang 
>> ---
>>  drivers/gpu/drm/drm_prime.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>> index f924b8b4ab6b..2630ad2e504d 100644
>> --- a/drivers/gpu/drm/drm_prime.c
>> +++ b/drivers/gpu/drm/drm_prime.c
>> @@ -830,7 +830,7 @@ struct sg_table *drm_prime_pages_to_sg(struct drm_device *dev,
>>  	if (max_segment == 0)
>>  		max_segment = UINT_MAX;
>>  	err = sg_alloc_table_from_pages_segment(sg, pages, nr_pages, 0,
>> -						nr_pages << PAGE_SHIFT,
>> +						(unsigned long)nr_pages << PAGE_SHIFT,
>>  						max_segment, GFP_KERNEL);
>>  	if (err) {
>>  		kfree(sg);
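
For readers unfamiliar with the failure mode, a small standalone demonstration
of the truncation being fixed (assuming 4 KiB pages and an LP64 userspace,
purely for illustration):

#include <stdio.h>

#define PAGE_SHIFT 12	/* assume 4 KiB pages */

int main(void)
{
	unsigned int nr_pages = 0x100000;	/* 4 GiB worth of pages */

	unsigned int truncated = nr_pages << PAGE_SHIFT;		/* wraps to 0 */
	unsigned long widened = (unsigned long)nr_pages << PAGE_SHIFT;

	printf("truncated size: %u\n", truncated);	/* prints 0 */
	printf("widened size:   %lu\n", widened);	/* prints 4294967296 */
	return 0;
}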

  
  

  



[PATCH v5 11/11] drm/msm: Enable fence signalling annotations

2023-08-22 Thread Rob Clark
From: Rob Clark 

Now that the runpm/qos/interconnect lockdep vs reclaim issues are
solved, we can enable the fence signalling annotations without lockdep
making its immediate displeasure known.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_ringbuffer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c 
b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 7f5e0a961bba..cb9cf41bcb9b 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -97,6 +97,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu 
*gpu, int id,
 /* currently managing hangcheck ourselves: */
sched_timeout = MAX_SCHEDULE_TIMEOUT;
 
+   ring->sched.fence_signalling = true;
	ret = drm_sched_init(&ring->sched, &msm_sched_ops,
num_hw_submissions, 0, sched_timeout,
NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
-- 
2.41.0



[PATCH v5 10/11] drm/sched: Add (optional) fence signaling annotation

2023-08-22 Thread Rob Clark
From: Rob Clark 

Based on
https://lore.kernel.org/dri-devel/20200604081224.863494-10-daniel.vet...@ffwll.ch/
but made to be optional.

Signed-off-by: Rob Clark 
Reviewed-by: Luben Tuikov 
---
 drivers/gpu/drm/scheduler/sched_main.c | 9 +
 include/drm/gpu_scheduler.h| 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 23afd70e41ea..6dda18639ac9 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1018,10 +1018,15 @@ static bool drm_sched_blocked(struct drm_gpu_scheduler 
*sched)
 static int drm_sched_main(void *param)
 {
struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param;
+   const bool fence_signalling = sched->fence_signalling;
+   bool fence_cookie;
int r;
 
sched_set_fifo_low(current);
 
+   if (fence_signalling)
+   fence_cookie = dma_fence_begin_signalling();
+
while (!kthread_should_stop()) {
struct drm_sched_entity *entity = NULL;
struct drm_sched_fence *s_fence;
@@ -1077,6 +1082,10 @@ static int drm_sched_main(void *param)
 
		wake_up(&sched->job_scheduled);
}
+
+   if (fence_signalling)
+   dma_fence_end_signalling(fence_cookie);
+
return 0;
 }
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e95b4837e5a3..58d958ad31a1 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -493,6 +493,7 @@ struct drm_sched_backend_ops {
  * @ready: marks if the underlying HW is ready to work
  * @free_guilty: A hit to time out handler to free the guilty job.
  * @dev: system  device
+ * @fence_signalling: Opt in to fence signalling annotations
  *
  * One scheduler is implemented for each hardware ring.
  */
@@ -517,6 +518,7 @@ struct drm_gpu_scheduler {
boolready;
boolfree_guilty;
struct device   *dev;
+   boolfence_signalling;
 };
 
 int drm_sched_init(struct drm_gpu_scheduler *sched,
-- 
2.41.0



[PATCH v5 09/11] drm/msm: Move runpm enable in submit path

2023-08-22 Thread Rob Clark
From: Rob Clark 

Move runpm enable to just before we enqueue the job to the scheduler,
rather than job_run().  This has the disadvantage of potentially
powering up the GPU before waiting for fences, but it is the only
feasible way to move things like clk_prepare() out of the fence
signalling path.  Ideally runpm would have separate prepare and enable
steps so we could just move the prepare step.  But attempting to
separate these without support in runpm doesn't play nicely with
autosuspend.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 2 ++
 drivers/gpu/drm/msm/msm_gpu.c| 2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 99744de6c05a..a908373cf34b 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -981,6 +981,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 
msm_rd_dump_submit(priv->rd, submit, NULL);
 
+   pm_runtime_get_sync(&gpu->pdev->dev);
+
drm_sched_entity_push_job(>base);
 
args->fence = submit->fence_id;
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 243f988c65b7..819140d85205 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -751,8 +751,6 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit)
 
	WARN_ON(!mutex_is_locked(&gpu->lock));
 
-   pm_runtime_get_sync(&gpu->pdev->dev);
-
msm_gpu_hw_init(gpu);
 
submit->seqno = submit->hw_fence->seqno;
-- 
2.41.0



[PATCH v5 08/11] drm/msm/a6xx: Remove GMU lock from runpm paths

2023-08-22 Thread Rob Clark
From: Rob Clark 

The locking is unneeded here as runpm provides sufficient serialization.
Fixes:

   ==
   WARNING: possible circular locking dependency detected
   6.4.3-debug+ #16 Not tainted
   --
   kworker/5:2/211 is trying to acquire lock:
   ffd577cefb98 (prepare_lock){+.+.}-{3:3}, at: clk_prepare_lock+0x70/0x98

   but task is already holding lock:
   ff809db316c0 (_gpu->gmu.lock){+.+.}-{3:3}, at: 
a6xx_gmu_pm_suspend+0x4c/0xb4 [msm]

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #3 (_gpu->gmu.lock){+.+.}-{3:3}:
  __mutex_lock+0xc8/0x388
  mutex_lock_nested+0x2c/0x38
  a6xx_gmu_resume+0xf0/0x7f8 [msm]
  a6xx_gmu_pm_resume+0x38/0x158 [msm]
  adreno_runtime_resume+0x2c/0x38 [msm]
  pm_generic_runtime_resume+0x30/0x44
  __rpm_callback+0x4c/0x134
  rpm_callback+0x78/0x7c
  rpm_resume+0x3a4/0x46c
  __pm_runtime_resume+0x78/0xbc
  pm_runtime_get_sync.isra.0+0x14/0x20 [msm]
  msm_gpu_submit+0x3c/0x130 [msm]
  msm_job_run+0x84/0x11c [msm]
  drm_sched_main+0x264/0x354 [gpu_sched]
  kthread+0xf0/0x100
  ret_from_fork+0x10/0x20

   -> #2 (dma_fence_map){}-{0:0}:
  __dma_fence_might_wait+0x74/0xc0
  dma_fence_wait_timeout+0x50/0x174
  dma_resv_wait_timeout+0x58/0xa8
  active_evict+0x30/0x5c [msm]
  drm_gem_lru_scan+0x15c/0x1c8
  msm_gem_shrinker_scan+0x124/0x204 [msm]
  do_shrink_slab+0x194/0x324
  shrink_slab+0x270/0x2ec
  shrink_node+0x278/0x674
  do_try_to_free_pages+0x2dc/0x41c
  try_to_free_pages+0x13c/0x1e4
  __alloc_pages+0x364/0xb44
  __folio_alloc+0x24/0x60
  __read_swap_cache_async+0x10c/0x1fc
  swap_cluster_readahead+0x1ac/0x234
  shmem_swapin+0x6c/0xb0
  shmem_swapin_folio+0x208/0x66c
  shmem_get_folio_gfp+0x13c/0x650
  shmem_read_folio_gfp+0x68/0xb0
  shmem_read_mapping_page_gfp+0x20/0x44
  drm_gem_get_pages+0xd4/0x1bc
  get_pages+0x54/0x1e4 [msm]
  msm_gem_pin_pages_locked+0x38/0xac [msm]
  msm_gem_pin_vma_locked+0x58/0x88 [msm]
  msm_ioctl_gem_submit+0xde4/0x13ac [msm]
  drm_ioctl_kernel+0xe0/0x15c
  drm_ioctl+0x2e8/0x3f4
  vfs_ioctl+0x30/0x50
  __arm64_sys_ioctl+0x80/0xb4
  invoke_syscall+0x8c/0x128
  el0_svc_common.constprop.0+0xdc/0x110
  do_el0_svc+0x94/0xa4
  el0_svc+0x44/0x88
  el0t_64_sync_handler+0xac/0x13c
  el0t_64_sync+0x190/0x194

   -> #1 (fs_reclaim){+.+.}-{0:0}:
  __fs_reclaim_acquire+0x3c/0x48
  fs_reclaim_acquire+0x50/0x9c
  slab_pre_alloc_hook.constprop.0+0x40/0x250
  __kmem_cache_alloc_node+0x60/0x18c
  kmalloc_trace+0x44/0x88
  clk_rcg2_dfs_determine_rate+0x60/0x214
  clk_core_determine_round_nolock+0xb8/0xf0
  clk_core_round_rate_nolock+0x84/0x118
  clk_core_round_rate_nolock+0xd8/0x118
  clk_round_rate+0x6c/0xd0
  geni_se_clk_tbl_get+0x78/0xc0
  geni_se_clk_freq_match+0x44/0xe4
  get_spi_clk_cfg+0x50/0xf4
  geni_spi_set_clock_and_bw+0x54/0x104
  spi_geni_prepare_message+0x130/0x174
  __spi_pump_transfer_message+0x200/0x4d8
  __spi_sync+0x13c/0x23c
  spi_sync_locked+0x18/0x24
  do_cros_ec_pkt_xfer_spi+0x124/0x3f0
  cros_ec_xfer_high_pri_work+0x28/0x3c
  kthread_worker_fn+0x14c/0x27c
  kthread+0xf0/0x100
  ret_from_fork+0x10/0x20

   -> #0 (prepare_lock){+.+.}-{3:3}:
  __lock_acquire+0xdf8/0x109c
  lock_acquire+0x234/0x284
  __mutex_lock+0xc8/0x388
  mutex_lock_nested+0x2c/0x38
  clk_prepare_lock+0x70/0x98
  clk_unprepare+0x2c/0x48
  clk_bulk_unprepare+0x48/0x4c
  a6xx_gmu_stop+0x94/0x260 [msm]
  a6xx_gmu_pm_suspend+0x54/0xb4 [msm]
  adreno_runtime_suspend+0x38/0x44 [msm]
  pm_generic_runtime_suspend+0x30/0x44
  __rpm_callback+0x4c/0x134
  rpm_callback+0x78/0x7c
  rpm_suspend+0x28c/0x44c
  pm_runtime_work+0xa0/0xa4
  process_one_work+0x288/0x3d8
  worker_thread+0x1f0/0x260
  kthread+0xf0/0x100
  ret_from_fork+0x10/0x20

   other info that might help us debug this:

   Chain exists of:
 prepare_lock --> dma_fence_map --> _gpu->gmu.lock

Possible unsafe locking scenario:

  CPU0CPU1
  
 lock(_gpu->gmu.lock);
  lock(dma_fence_map);
  lock(_gpu->gmu.lock);
 lock(prepare_lock);

*** DEADLOCK ***

   3 locks held by 

[PATCH v5 07/11] interconnect: Teach lockdep about icc_bw_lock order

2023-08-22 Thread Rob Clark
From: Rob Clark 

Teach lockdep that icc_bw_lock is needed in code paths that could
deadlock if they trigger reclaim.

Signed-off-by: Rob Clark 
---
 drivers/interconnect/core.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index e15a92a79df1..1afbc4f7c6e7 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -1041,13 +1041,21 @@ void icc_sync_state(struct device *dev)
}
}
}
+   mutex_unlock(&icc_bw_lock);
	mutex_unlock(&icc_lock);
 }
 EXPORT_SYMBOL_GPL(icc_sync_state);
 
 static int __init icc_init(void)
 {
-   struct device_node *root = of_find_node_by_path("/");
+   struct device_node *root;
+
+   /* Teach lockdep about lock ordering wrt. shrinker: */
+   fs_reclaim_acquire(GFP_KERNEL);
+   might_lock(&icc_bw_lock);
+   fs_reclaim_release(GFP_KERNEL);
+
+   root = of_find_node_by_path("/");
 
providers_count = of_count_icc_providers(root);
of_node_put(root);
-- 
2.41.0



[PATCH v5 06/11] interconnect: Fix locking for runpm vs reclaim

2023-08-22 Thread Rob Clark
From: Rob Clark 

For cases where icc_set_bw() can be called in call paths that could
deadlock against shrinker/reclaim, such as runpm resume, we need to
decouple the icc locking.  Introduce a new icc_bw_lock for cases where
we need to serialize bw aggregation and update, to decouple that from
paths that require memory allocation such as node/link creation/
destruction.
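
A schematic sketch of the split being described (heavily simplified; the real
icc_set_bw() also aggregates per-node requests and handles errors, and
apply_constraints() stands in for the provider programming step):

static DEFINE_MUTEX(icc_lock);		/* topology: node/link create + destroy */
static DEFINE_MUTEX(icc_bw_lock);	/* bandwidth aggregation + apply only  */

int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
{
	int ret;

	/*
	 * Only icc_bw_lock is taken here, and nothing under it allocates
	 * memory, so this path is safe to hit from runpm resume without
	 * recursing into reclaim.  avg_bw/peak_bw would be folded into the
	 * per-node requests before applying.
	 */
	mutex_lock(&icc_bw_lock);
	ret = apply_constraints(path);
	mutex_unlock(&icc_bw_lock);

	return ret;
}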

Fixes this lockdep splat:

   ==
   WARNING: possible circular locking dependency detected
   6.2.0-rc8-debug+ #554 Not tainted
   --
   ring0/132 is trying to acquire lock:
   ff80871916d0 (>lock){+.+.}-{3:3}, at: a6xx_pm_resume+0xf0/0x234

   but task is already holding lock:
   ffdb5aee57e8 (dma_fence_map){}-{0:0}, at: msm_job_run+0x68/0x150

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #4 (dma_fence_map){}-{0:0}:
  __dma_fence_might_wait+0x74/0xc0
  dma_resv_lockdep+0x1f4/0x2f4
  do_one_initcall+0x104/0x2bc
  kernel_init_freeable+0x344/0x34c
  kernel_init+0x30/0x134
  ret_from_fork+0x10/0x20

   -> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
  fs_reclaim_acquire+0x80/0xa8
  slab_pre_alloc_hook.constprop.0+0x40/0x25c
  __kmem_cache_alloc_node+0x60/0x1cc
  __kmalloc+0xd8/0x100
  topology_parse_cpu_capacity+0x8c/0x178
  get_cpu_for_node+0x88/0xc4
  parse_cluster+0x1b0/0x28c
  parse_cluster+0x8c/0x28c
  init_cpu_topology+0x168/0x188
  smp_prepare_cpus+0x24/0xf8
  kernel_init_freeable+0x18c/0x34c
  kernel_init+0x30/0x134
  ret_from_fork+0x10/0x20

   -> #2 (fs_reclaim){+.+.}-{0:0}:
  __fs_reclaim_acquire+0x3c/0x48
  fs_reclaim_acquire+0x54/0xa8
  slab_pre_alloc_hook.constprop.0+0x40/0x25c
  __kmem_cache_alloc_node+0x60/0x1cc
  __kmalloc+0xd8/0x100
  kzalloc.constprop.0+0x14/0x20
  icc_node_create_nolock+0x4c/0xc4
  icc_node_create+0x38/0x58
  qcom_icc_rpmh_probe+0x1b8/0x248
  platform_probe+0x70/0xc4
  really_probe+0x158/0x290
  __driver_probe_device+0xc8/0xe0
  driver_probe_device+0x44/0x100
  __driver_attach+0xf8/0x108
  bus_for_each_dev+0x78/0xc4
  driver_attach+0x2c/0x38
  bus_add_driver+0xd0/0x1d8
  driver_register+0xbc/0xf8
  __platform_driver_register+0x30/0x3c
  qnoc_driver_init+0x24/0x30
  do_one_initcall+0x104/0x2bc
  kernel_init_freeable+0x344/0x34c
  kernel_init+0x30/0x134
  ret_from_fork+0x10/0x20

   -> #1 (icc_lock){+.+.}-{3:3}:
  __mutex_lock+0xcc/0x3c8
  mutex_lock_nested+0x30/0x44
  icc_set_bw+0x88/0x2b4
  _set_opp_bw+0x8c/0xd8
  _set_opp+0x19c/0x300
  dev_pm_opp_set_opp+0x84/0x94
  a6xx_gmu_resume+0x18c/0x804
  a6xx_pm_resume+0xf8/0x234
  adreno_runtime_resume+0x2c/0x38
  pm_generic_runtime_resume+0x30/0x44
  __rpm_callback+0x15c/0x174
  rpm_callback+0x78/0x7c
  rpm_resume+0x318/0x524
  __pm_runtime_resume+0x78/0xbc
  adreno_load_gpu+0xc4/0x17c
  msm_open+0x50/0x120
  drm_file_alloc+0x17c/0x228
  drm_open_helper+0x74/0x118
  drm_open+0xa0/0x144
  drm_stub_open+0xd4/0xe4
  chrdev_open+0x1b8/0x1e4
  do_dentry_open+0x2f8/0x38c
  vfs_open+0x34/0x40
  path_openat+0x64c/0x7b4
  do_filp_open+0x54/0xc4
  do_sys_openat2+0x9c/0x100
  do_sys_open+0x50/0x7c
  __arm64_sys_openat+0x28/0x34
  invoke_syscall+0x8c/0x128
  el0_svc_common.constprop.0+0xa0/0x11c
  do_el0_svc+0xac/0xbc
  el0_svc+0x48/0xa0
  el0t_64_sync_handler+0xac/0x13c
  el0t_64_sync+0x190/0x194

   -> #0 (>lock){+.+.}-{3:3}:
  __lock_acquire+0xe00/0x1060
  lock_acquire+0x1e0/0x2f8
  __mutex_lock+0xcc/0x3c8
  mutex_lock_nested+0x30/0x44
  a6xx_pm_resume+0xf0/0x234
  adreno_runtime_resume+0x2c/0x38
  pm_generic_runtime_resume+0x30/0x44
  __rpm_callback+0x15c/0x174
  rpm_callback+0x78/0x7c
  rpm_resume+0x318/0x524
  __pm_runtime_resume+0x78/0xbc
  pm_runtime_get_sync.isra.0+0x14/0x20
  msm_gpu_submit+0x58/0x178
  msm_job_run+0x78/0x150
  drm_sched_main+0x290/0x370
  kthread+0xf0/0x100
  ret_from_fork+0x10/0x20

   other info that might help us debug this:

   Chain exists of:
 >lock --> mmu_notifier_invalidate_range_start --> dma_fence_map

Possible unsafe locking scenario:

  CPU0CPU1
  
 lock(dma_fence_map);
  

[PATCH v5 03/11] PM / QoS: Fix constraints alloc vs reclaim locking

2023-08-22 Thread Rob Clark
From: Rob Clark 

In the process of adding lockdep annotation for drm GPU scheduler's
job_run() to detect potential deadlock against shrinker/reclaim, I hit
this lockdep splat:

   ==
   WARNING: possible circular locking dependency detected
   6.2.0-rc8-debug+ #558 Tainted: GW
   --
   ring0/125 is trying to acquire lock:
   ffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: 
dev_pm_qos_update_request+0x38/0x68

   but task is already holding lock:
   ff8087239208 (>active_lock){+.+.}-{3:3}, at: 
msm_gpu_submit+0xec/0x178

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #4 (>active_lock){+.+.}-{3:3}:
  __mutex_lock+0xcc/0x3c8
  mutex_lock_nested+0x30/0x44
  msm_gpu_submit+0xec/0x178
  msm_job_run+0x78/0x150
  drm_sched_main+0x290/0x370
  kthread+0xf0/0x100
  ret_from_fork+0x10/0x20

   -> #3 (dma_fence_map){}-{0:0}:
  __dma_fence_might_wait+0x74/0xc0
  dma_resv_lockdep+0x1f4/0x2f4
  do_one_initcall+0x104/0x2bc
  kernel_init_freeable+0x344/0x34c
  kernel_init+0x30/0x134
  ret_from_fork+0x10/0x20

   -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
  fs_reclaim_acquire+0x80/0xa8
  slab_pre_alloc_hook.constprop.0+0x40/0x25c
  __kmem_cache_alloc_node+0x60/0x1cc
  __kmalloc+0xd8/0x100
  topology_parse_cpu_capacity+0x8c/0x178
  get_cpu_for_node+0x88/0xc4
  parse_cluster+0x1b0/0x28c
  parse_cluster+0x8c/0x28c
  init_cpu_topology+0x168/0x188
  smp_prepare_cpus+0x24/0xf8
  kernel_init_freeable+0x18c/0x34c
  kernel_init+0x30/0x134
  ret_from_fork+0x10/0x20

   -> #1 (fs_reclaim){+.+.}-{0:0}:
  __fs_reclaim_acquire+0x3c/0x48
  fs_reclaim_acquire+0x54/0xa8
  slab_pre_alloc_hook.constprop.0+0x40/0x25c
  __kmem_cache_alloc_node+0x60/0x1cc
  kmalloc_trace+0x50/0xa8
  dev_pm_qos_constraints_allocate+0x38/0x100
  __dev_pm_qos_add_request+0xb0/0x1e8
  dev_pm_qos_add_request+0x58/0x80
  dev_pm_qos_expose_latency_limit+0x60/0x13c
  register_cpu+0x12c/0x130
  topology_init+0xac/0xbc
  do_one_initcall+0x104/0x2bc
  kernel_init_freeable+0x344/0x34c
  kernel_init+0x30/0x134
  ret_from_fork+0x10/0x20

   -> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
  __lock_acquire+0xe00/0x1060
  lock_acquire+0x1e0/0x2f8
  __mutex_lock+0xcc/0x3c8
  mutex_lock_nested+0x30/0x44
  dev_pm_qos_update_request+0x38/0x68
  msm_devfreq_boost+0x40/0x70
  msm_devfreq_active+0xc0/0xf0
  msm_gpu_submit+0x10c/0x178
  msm_job_run+0x78/0x150
  drm_sched_main+0x290/0x370
  kthread+0xf0/0x100
  ret_from_fork+0x10/0x20

   other info that might help us debug this:

   Chain exists of:
 dev_pm_qos_mtx --> dma_fence_map --> >active_lock

Possible unsafe locking scenario:

  CPU0CPU1
  
 lock(>active_lock);
  lock(dma_fence_map);
  lock(>active_lock);
 lock(dev_pm_qos_mtx);

*** DEADLOCK ***

   3 locks held by ring0/123:
#0: ff8087251170 (>lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
#1: ffd00b0e57e8 (dma_fence_map){}-{0:0}, at: msm_job_run+0x68/0x150
#2: ff8087251208 (>active_lock){+.+.}-{3:3}, at: 
msm_gpu_submit+0xec/0x178

   stack backtrace:
   CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
   Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
   Call trace:
dump_backtrace.part.0+0xb4/0xf8
show_stack+0x20/0x38
dump_stack_lvl+0x9c/0xd0
dump_stack+0x18/0x34
print_circular_bug+0x1b4/0x1f0
check_noncircular+0x78/0xac
__lock_acquire+0xe00/0x1060
lock_acquire+0x1e0/0x2f8
__mutex_lock+0xcc/0x3c8
mutex_lock_nested+0x30/0x44
dev_pm_qos_update_request+0x38/0x68
msm_devfreq_boost+0x40/0x70
msm_devfreq_active+0xc0/0xf0
msm_gpu_submit+0x10c/0x178
msm_job_run+0x78/0x150
drm_sched_main+0x290/0x370
kthread+0xf0/0x100
ret_from_fork+0x10/0x20

The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
freq change) path, but it is also held across allocations that could
recurse into shrinker.

Solve this by changing dev_pm_qos_constraints_allocate() into a function
that can be called unconditionally before the device qos object is
needed and before acquiring dev_pm_qos_mtx.  This way the allocations can
be done without holding the mutex.  In the case that we raced with
another thread to allocate the qos object, detect this *after* acquiring
the dev_pm_qos_mtx and simply free the redundant allocations.
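
A minimal sketch of that pattern (simplified from the real qos.c code; the
real dev_pm_qos object also carries constraint lists that need initializing):

static int example_add_request(struct device *dev)
{
	/* Allocate up front, where blocking/reclaim is still allowed. */
	struct dev_pm_qos *qos = kzalloc(sizeof(*qos), GFP_KERNEL);
	int ret = 0;

	mutex_lock(&dev_pm_qos_mtx);

	if (dev->power.qos)
		kfree(qos);		/* lost the race: someone else installed one */
	else
		dev->power.qos = qos;	/* won the race: install our allocation */

	if (!dev->power.qos)
		ret = -ENOMEM;
	/* ... add the request under the mutex; no allocations from here on ... */

	mutex_unlock(&dev_pm_qos_mtx);
	return ret;
}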


[PATCH v5 04/11] PM / QoS: Decouple request alloc from dev_pm_qos_mtx

2023-08-22 Thread Rob Clark
From: Rob Clark 

Similar to the previous patch, move the allocation out from under
dev_pm_qos_mtx, by speculatively doing the allocation and handle
any race after acquiring dev_pm_qos_mtx by freeing the redundant
allocation.

Signed-off-by: Rob Clark 
---
 drivers/base/power/qos.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 7e95760d16dc..09834f3354d7 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -930,8 +930,12 @@ s32 dev_pm_qos_get_user_latency_tolerance(struct device 
*dev)
 int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
 {
struct dev_pm_qos *qos = dev_pm_qos_constraints_allocate(dev);
+   struct dev_pm_qos_request *req = NULL;
int ret = 0;
 
+   if (!qos->latency_tolerance_req)
+   req = kzalloc(sizeof(*req), GFP_KERNEL);
+
	mutex_lock(&dev_pm_qos_mtx);
 
dev_pm_qos_constraints_set(dev, qos);
@@ -945,8 +949,6 @@ int dev_pm_qos_update_user_latency_tolerance(struct device 
*dev, s32 val)
goto out;
 
if (!dev->power.qos->latency_tolerance_req) {
-   struct dev_pm_qos_request *req;
-
if (val < 0) {
if (val == PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT)
ret = 0;
@@ -954,17 +956,15 @@ int dev_pm_qos_update_user_latency_tolerance(struct 
device *dev, s32 val)
ret = -EINVAL;
goto out;
}
-   req = kzalloc(sizeof(*req), GFP_KERNEL);
if (!req) {
ret = -ENOMEM;
goto out;
}
ret = __dev_pm_qos_add_request(dev, req, 
DEV_PM_QOS_LATENCY_TOLERANCE, val);
-   if (ret < 0) {
-   kfree(req);
+   if (ret < 0)
goto out;
-   }
dev->power.qos->latency_tolerance_req = req;
+   req = NULL;
} else {
if (val < 0) {
__dev_pm_qos_drop_user_request(dev, 
DEV_PM_QOS_LATENCY_TOLERANCE);
@@ -976,6 +976,7 @@ int dev_pm_qos_update_user_latency_tolerance(struct device 
*dev, s32 val)
 
  out:
	mutex_unlock(&dev_pm_qos_mtx);
+   kfree(req);
return ret;
 }
 EXPORT_SYMBOL_GPL(dev_pm_qos_update_user_latency_tolerance);
-- 
2.41.0



[PATCH v5 05/11] PM / QoS: Teach lockdep about dev_pm_qos_mtx locking order

2023-08-22 Thread Rob Clark
From: Rob Clark 

Annotate dev_pm_qos_mtx to teach lockdep to scream about allocations
that could trigger reclaim under dev_pm_qos_mtx.

Signed-off-by: Rob Clark 
---
 drivers/base/power/qos.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 09834f3354d7..2018c805a6f1 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -1017,3 +1017,14 @@ void dev_pm_qos_hide_latency_tolerance(struct device 
*dev)
pm_runtime_put(dev);
 }
 EXPORT_SYMBOL_GPL(dev_pm_qos_hide_latency_tolerance);
+
+static int __init dev_pm_qos_init(void)
+{
+   /* Teach lockdep about lock ordering wrt. shrinker: */
+   fs_reclaim_acquire(GFP_KERNEL);
+   might_lock(&dev_pm_qos_mtx);
+   fs_reclaim_release(GFP_KERNEL);
+
+   return 0;
+}
+early_initcall(dev_pm_qos_init);
-- 
2.41.0



[PATCH v5 02/11] PM / devfreq: Teach lockdep about locking order

2023-08-22 Thread Rob Clark
From: Rob Clark 

This will make it easier to catch places doing allocations that can
trigger reclaim under devfreq->lock.

Because devfreq->lock is held over various devfreq_dev_profile
callbacks, there might be some fallout if those callbacks do allocations
that can trigger reclaim, but I've looked through the various callback
implementations and don't see anything obvious.  If it does trigger any
lockdep splats, those should be fixed.

Signed-off-by: Rob Clark 
---
 drivers/devfreq/devfreq.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index e5558ec68ce8..81add6064406 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -817,6 +817,12 @@ struct devfreq *devfreq_add_device(struct device *dev,
}
 
	mutex_init(&devfreq->lock);
+
+   /* Teach lockdep about lock ordering wrt. shrinker: */
+   fs_reclaim_acquire(GFP_KERNEL);
+   might_lock(&devfreq->lock);
+   fs_reclaim_release(GFP_KERNEL);
+
devfreq->dev.parent = dev;
devfreq->dev.class = devfreq_class;
devfreq->dev.release = devfreq_dev_release;
-- 
2.41.0



[PATCH v5 01/11] PM / devfreq: Drop unneed locking to appease lockdep

2023-08-22 Thread Rob Clark
From: Rob Clark 

In the process of adding lockdep annotation for GPU job_run() path to
catch potential deadlocks against the shrinker/reclaim path, I turned
up this lockdep splat:

   ==
   WARNING: possible circular locking dependency detected
   6.2.0-rc8-debug+ #556 Not tainted
   --
   ring0/123 is trying to acquire lock:
   ff8087219078 (>lock){+.+.}-{3:3}, at: 
devfreq_monitor_resume+0x3c/0xf0

   but task is already holding lock:
   ffd6f64e57e8 (dma_fence_map){}-{0:0}, at: msm_job_run+0x68/0x150

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #3 (dma_fence_map){}-{0:0}:
  __dma_fence_might_wait+0x74/0xc0
  dma_resv_lockdep+0x1f4/0x2f4
  do_one_initcall+0x104/0x2bc
  kernel_init_freeable+0x344/0x34c
  kernel_init+0x30/0x134
  ret_from_fork+0x10/0x20

   -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
  fs_reclaim_acquire+0x80/0xa8
  slab_pre_alloc_hook.constprop.0+0x40/0x25c
  __kmem_cache_alloc_node+0x60/0x1cc
  __kmalloc+0xd8/0x100
  topology_parse_cpu_capacity+0x8c/0x178
  get_cpu_for_node+0x88/0xc4
  parse_cluster+0x1b0/0x28c
  parse_cluster+0x8c/0x28c
  init_cpu_topology+0x168/0x188
  smp_prepare_cpus+0x24/0xf8
  kernel_init_freeable+0x18c/0x34c
  kernel_init+0x30/0x134
  ret_from_fork+0x10/0x20

   -> #1 (fs_reclaim){+.+.}-{0:0}:
  __fs_reclaim_acquire+0x3c/0x48
  fs_reclaim_acquire+0x54/0xa8
  slab_pre_alloc_hook.constprop.0+0x40/0x25c
  __kmem_cache_alloc_node+0x60/0x1cc
  __kmalloc_node_track_caller+0xb8/0xe0
  kstrdup+0x70/0x90
  kstrdup_const+0x38/0x48
  kvasprintf_const+0x48/0xbc
  kobject_set_name_vargs+0x40/0xb0
  dev_set_name+0x64/0x8c
  devfreq_add_device+0x31c/0x55c
  devm_devfreq_add_device+0x6c/0xb8
  msm_devfreq_init+0xa8/0x16c
  msm_gpu_init+0x38c/0x570
  adreno_gpu_init+0x1b4/0x2b4
  a6xx_gpu_init+0x15c/0x3e4
  adreno_bind+0x218/0x254
  component_bind_all+0x114/0x1ec
  msm_drm_bind+0x2b8/0x608
  try_to_bring_up_aggregate_device+0x88/0x1a4
  __component_add+0xec/0x13c
  component_add+0x1c/0x28
  dsi_dev_attach+0x28/0x34
  dsi_host_attach+0xdc/0x124
  mipi_dsi_attach+0x30/0x44
  devm_mipi_dsi_attach+0x2c/0x70
  ti_sn_bridge_probe+0x298/0x2c4
  auxiliary_bus_probe+0x7c/0x94
  really_probe+0x158/0x290
  __driver_probe_device+0xc8/0xe0
  driver_probe_device+0x44/0x100
  __device_attach_driver+0x64/0xdc
  bus_for_each_drv+0xa0/0xc8
  __device_attach+0xd8/0x168
  device_initial_probe+0x1c/0x28
  bus_probe_device+0x38/0xa0
  deferred_probe_work_func+0xc8/0xe0
  process_one_work+0x2d8/0x478
  process_scheduled_works+0x4c/0x50
  worker_thread+0x218/0x274
  kthread+0xf0/0x100
  ret_from_fork+0x10/0x20

   -> #0 (>lock){+.+.}-{3:3}:
  __lock_acquire+0xe00/0x1060
  lock_acquire+0x1e0/0x2f8
  __mutex_lock+0xcc/0x3c8
  mutex_lock_nested+0x30/0x44
  devfreq_monitor_resume+0x3c/0xf0
  devfreq_simple_ondemand_handler+0x54/0x7c
  devfreq_resume_device+0xa4/0xe8
  msm_devfreq_resume+0x78/0xa8
  a6xx_pm_resume+0x110/0x234
  adreno_runtime_resume+0x2c/0x38
  pm_generic_runtime_resume+0x30/0x44
  __rpm_callback+0x15c/0x174
  rpm_callback+0x78/0x7c
  rpm_resume+0x318/0x524
  __pm_runtime_resume+0x78/0xbc
  pm_runtime_get_sync.isra.0+0x14/0x20
  msm_gpu_submit+0x58/0x178
  msm_job_run+0x78/0x150
  drm_sched_main+0x290/0x370
  kthread+0xf0/0x100
  ret_from_fork+0x10/0x20

   other info that might help us debug this:

   Chain exists of:
 >lock --> mmu_notifier_invalidate_range_start --> dma_fence_map

Possible unsafe locking scenario:

  CPU0CPU1
  
 lock(dma_fence_map);
  lock(mmu_notifier_invalidate_range_start);
  lock(dma_fence_map);
 lock(>lock);

*** DEADLOCK ***

   2 locks held by ring0/123:
#0: ff8087201170 (>lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
#1: ffd6f64e57e8 (dma_fence_map){}-{0:0}, at: msm_job_run+0x68/0x150

   stack backtrace:
   CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #556
   Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
   Call trace:
dump_backtrace.part.0+0xb4/0xf8
show_stack+0x20/0x38
dump_stack_lvl+0x9c/0xd0
dump_stack+0x18/0x34

[PATCH v5 00/11] drm/msm+PM+icc: Make job_run() reclaim-safe

2023-08-22 Thread Rob Clark
From: Rob Clark 

Inspired by 
https://lore.kernel.org/dri-devel/20200604081224.863494-10-daniel.vet...@ffwll.ch/
it seemed like a good idea to get rid of memory allocation in job_run()
fence signaling path, and use lockdep annotations to yell at us about
anything that could deadlock against shrinker/reclaim.  Anything that
can trigger reclaim, or block on any other thread that has triggered
reclaim, can block the GPU shrinker from releasing memory if it is
waiting for the job to complete, causing deadlock.

The first two patches decouple allocation from devfreq->lock, and teach
lockdep that devfreq->lock can be acquired in paths that the shrinker
indirectly depends on.

The next three patches do the same for PM QoS.  And the next two do a
similar thing for interconnect.

And then finally the last two patches enable the lockdep fence-
signalling annotations.


v2: Switch from embedding hw_fence in submit/job object to preallocating
the hw_fence.  Rework "fenced unpin" locking to drop obj lock from
fence signaling path (ie. the part that was still WIP in the first
iteration of the patchset).  Adds the final patch to enable fence
signaling annotations now that job_run() and job_free() are safe.
The PM devfreq/QoS and interconnect patches are unchanged.

v3: Mostly unchanged, but series is much smaller now that drm changes
have landed, mostly consisting of the remaining devfreq/qos/
interconnect fixes.

v4: Re-work PM / QoS patch based on Rafael's suggestion

v5: Add a couple more drm/msm patches for issues I found as making
my way to the bottom of the rabbit hole.  In particular, I had
to move power enable earlier, before enqueueing to the scheduler,
rather than after the scheduler waits for in-fences, which means
we could be powering up slightly earlier than needed.  If runpm
had a separate prepare + enable similar to the clk framework, we
wouldn't need this.

Rob Clark (11):
  PM / devfreq: Drop unneed locking to appease lockdep
  PM / devfreq: Teach lockdep about locking order
  PM / QoS: Fix constraints alloc vs reclaim locking
  PM / QoS: Decouple request alloc from dev_pm_qos_mtx
  PM / QoS: Teach lockdep about dev_pm_qos_mtx locking order
  interconnect: Fix locking for runpm vs reclaim
  interconnect: Teach lockdep about icc_bw_lock order
  drm/msm/a6xx: Remove GMU lock from runpm paths
  drm/msm: Move runpm enable in submit path
  drm/sched: Add (optional) fence signaling annotation
  drm/msm: Enable fence signalling annotations

 drivers/base/power/qos.c   | 98 +++---
 drivers/devfreq/devfreq.c  | 52 +++---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 15 +---
 drivers/gpu/drm/msm/msm_gem_submit.c   |  2 +
 drivers/gpu/drm/msm/msm_gpu.c  |  2 -
 drivers/gpu/drm/msm/msm_ringbuffer.c   |  1 +
 drivers/gpu/drm/scheduler/sched_main.c |  9 +++
 drivers/interconnect/core.c| 18 -
 include/drm/gpu_scheduler.h|  2 +
 9 files changed, 130 insertions(+), 69 deletions(-)

-- 
2.41.0



RE: Implement svm without BO concept in xe driver

2023-08-22 Thread Zeng, Oak

> -Original Message-
> From: Ruhl, Michael J 
> Sent: August 22, 2023 7:44 AM
> To: Felix Kuehling ; Zeng, Oak ;
> Dave Airlie 
> Cc: Brost, Matthew ; Thomas Hellström
> ; Philip Yang ;
> Welty, Brian ; dri-devel@lists.freedesktop.org; 
> Christian
> König ; Vishwanathapura, Niranjana
> ; intel...@lists.freedesktop.org
> Subject: RE: Implement svm without BO concept in xe driver
> 
> >-Original Message-
> >From: Felix Kuehling 
> >Sent: Monday, August 21, 2023 4:57 PM
> >To: Zeng, Oak ; Dave Airlie 
> >Cc: Brost, Matthew ; Thomas Hellström
> >; Philip Yang ;
> >Welty, Brian ; dri-devel@lists.freedesktop.org;
> >Christian König ; Vishwanathapura, Niranjana
> >; intel...@lists.freedesktop.org;
> >Ruhl, Michael J 
> >Subject: Re: Implement svm without BO concept in xe driver
> >
> >
> >On 2023-08-21 15:41, Zeng, Oak wrote:
> >>> I have thought about emulating BO allocation APIs on top of system SVM.
> >>> This was in the context of KFD where memory management is not tied into
> >>> command submissions APIs, which would add a whole other layer of
> >>> complexity. The main unsolved (unsolvable?) problem I ran into was, that
> >>> there is no way to share SVM memory as DMABufs. So there is no good
> >way
> >>> to support applications that expect to share memory in that way.
> >> Great point. I also discussed the dmabuf thing with Mike (cc'ed). dmabuf 
> >> is a
> >particular technology created specially for the BO driver (and other driver) 
> >to
> >share buffer b/t devices. Hmm/system SVM doesn't need this technology:
> >malloc'ed memory by the nature is already shared b/t different devices (in
> >one process) and CPU. We just can simply submit GPU kernel to all devices
> >with malloc'ed memory and let kmd decide the memory placement (such as
> >map in place or migrate). No need of buffer export/import in hmm/system
> >SVM world.
> >
> >I disagree. DMABuf can be used for sharing memory between processes. And
> >it can be used for sharing memory with 3rd-party devices via PCIe P2P
> >(e.g. a Mellanox NIC). You cannot easily do that with malloc'ed memory.
> >POSIX IPC requires that you know that you'll be sharing the memory at
> >allocation time. It adds overhead. And because it's file-backed, it's
> >currently incompatible with migration. And HMM currently doesn't have a
> >solution for P2P. Any access by a different device causes a migration to
> >system memory.
> 
> Hey Oak,
> 
> I think we were discussing this solution in the context of using the P2P_DMA
> feature.  This has an allocation path and a device 2 device capabilities.


I was thinking of sharing malloc'ed memory b/t the CPU and multiple devices 
inside one process. I thought this should work. After Felix's words above, I 
looked into the details more. Now I agree with Felix that this doesn't work 
with HMM.

And as Felix pointed out, POSIX IPC also doesn't work with HMM. Theoretically 
the driver could do a similar migration b/t device memory and file-backed 
memory, just as we did with anonymous memory. But I am not sure whether people 
want to do that.

Anyway, buffer sharing with HMM/system SVM seems to be a big open question. I 
will not try to solve this problem for now.

Cheers,
Oak

> 
> Mike
> 
> 
> >Regards,
> >   Felix
> >
> >
> >>
> >> So yes from buffer sharing perspective, the design philosophy is also very
> >different.
> >>
> >> Thanks,
> >> Oak
> >>


[PATCH v4 0/4] drm/msm: Enable widebus for DSI

2023-08-22 Thread Jessica Zhang
DSI 6G v2.5.x+ and DPU support a data-bus widen mode that allows DSI
to send 48 bits of compressed data per pclk instead of 24.

For all chipsets that support this mode, enable it whenever DSC is
enabled as recommended by the hardware programming guide.

Only enable this for command mode as we are currently unable to validate
widebus for video mode.
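To make the 48-vs-24 bit difference concrete, here is its effect on the
compressed horizontal active width (a rough sketch with a made-up DSC
byte count, not values from a real panel):

    /* With wide bus each pclk carries 6 bytes of compressed data
     * instead of 3, so the DSC hdisplay in pclks is halved. */
    u32 dsc_bytes_per_line = 1440;                            /* example */
    u32 hdisplay_24bit = DIV_ROUND_UP(dsc_bytes_per_line, 3); /* 480 pclk */
    u32 hdisplay_48bit = DIV_ROUND_UP(dsc_bytes_per_line, 6); /* 240 pclk */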

Note: The dsi.xml.h changes were generated using the headergen2 script in
envytools [2], but the changes to the copyright and rules-ng-ng source file
paths were dropped.

[1] https://patchwork.freedesktop.org/series/121742/
[2] https://github.com/freedreno/envytools/

--
Changes in v4:
- *_widebus_* -> *_wide_bus_* (Marijn)
- Moved dpu_enc::wide_bus setting to outside of DSC check (Dmitry)
- Switched order of dpu_enc::widebus setting (Dmitry)
- Added note about INTF_CONFIG2 being present for DPU 5.0+ (Dmitry)
- Added method stub for msm_dsi_is_widebus_enabled() so as not to break 
compilation (Dmitry)
- Whitespace and formatting fixes (Dmitry)
- Edited commit msg for "Move DPU encoder wide_bus_en setting" for clarity 
(Dmitry, Marijn)
- Dropped redundant initialization of disp_info
- Picked up reviewed-by tags
- Link to v3: 
https://lore.kernel.org/r/20230802-add-widebus-support-v3-0-2661706be...@quicinc.com

Changes in v3:
- Split commit into DPU, dsi.xml.h, and DSI changes (Dmitry)
- Add DSC enabled check to DSI *_is_widebus_enabled() helper (Dmitry)
- Dropped mention of DPU in cover letter title
- Moved setting of dpu_enc->wide_bus_en to dpu_encoder_virt_atomic_enable()
- Link to v2: 
https://lore.kernel.org/r/20230713-add-widebus-support-v2-1-ad0added1...@quicinc.com

Changes in v2:
- Rebased on top of "drm/msm/dpu: Re-introduce dpu core revision"
- Squashed all commits to avoid breaking feature if the series is only 
partially applied
- Moved DATABUS_WIDEN bit setting to dsi_ctr_enable() (Marijn)
- Have DPU check if wide bus is requested by output driver (Dmitry)
- Introduced bytes_per_pclk variable for dsi_timing_setup() hdisplay adjustment 
(Marijn)
- Link to v1: 
https://lore.kernel.org/r/20230525-add-widebus-support-v1-0-c7069f2ef...@quicinc.com

---
Jessica Zhang (4):
  drm/msm/dpu: Move setting of dpu_enc::wide_bus_en to atomic enable()
  drm/msm/dpu: Enable widebus for DSI INTF
  drm/msm/dsi: Add DATABUS_WIDEN MDP_CTRL2 bit
  drm/msm/dsi: Enable widebus for DSI

 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c| 14 +++---
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c   |  2 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c|  7 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.h|  1 +
 drivers/gpu/drm/msm/dsi/dsi.c  |  5 
 drivers/gpu/drm/msm/dsi/dsi.h  |  1 +
 drivers/gpu/drm/msm/dsi/dsi.xml.h  |  1 +
 drivers/gpu/drm/msm/dsi/dsi_host.c | 31 +++---
 drivers/gpu/drm/msm/msm_drv.h  |  5 
 9 files changed, 59 insertions(+), 8 deletions(-)
---
base-commit: 00ee72279c963989ab435b0bc90b5dc05a9aab79
change-id: 20230525-add-widebus-support-f785546ee751

Best regards,
-- 
Jessica Zhang 



[PATCH v4 4/4] drm/msm/dsi: Enable widebus for DSI

2023-08-22 Thread Jessica Zhang
DSI 6G v2.5.x+ supports a data-bus widen mode that allows DSI to send
48 bits of compressed data instead of 24.

Enable this mode whenever DSC is enabled for supported chipsets.

Signed-off-by: Jessica Zhang 
---
 drivers/gpu/drm/msm/dsi/dsi.c  |  2 +-
 drivers/gpu/drm/msm/dsi/dsi.h  |  1 +
 drivers/gpu/drm/msm/dsi/dsi_host.c | 31 +++
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/dsi/dsi.c b/drivers/gpu/drm/msm/dsi/dsi.c
index 4cf424b3509f..7327bfc06a84 100644
--- a/drivers/gpu/drm/msm/dsi/dsi.c
+++ b/drivers/gpu/drm/msm/dsi/dsi.c
@@ -19,7 +19,7 @@ struct drm_dsc_config *msm_dsi_get_dsc_config(struct msm_dsi 
*msm_dsi)
 
 bool msm_dsi_wide_bus_enabled(struct msm_dsi *msm_dsi)
 {
-   return false;
+   return msm_dsi_host_is_widebus_enabled(msm_dsi->host);
 }
 
 static int dsi_get_phy(struct msm_dsi *msm_dsi)
diff --git a/drivers/gpu/drm/msm/dsi/dsi.h b/drivers/gpu/drm/msm/dsi/dsi.h
index bd3763a5d723..a557d2c1aaff 100644
--- a/drivers/gpu/drm/msm/dsi/dsi.h
+++ b/drivers/gpu/drm/msm/dsi/dsi.h
@@ -134,6 +134,7 @@ int dsi_calc_clk_rate_6g(struct msm_dsi_host *msm_host, 
bool is_bonded_dsi);
 void msm_dsi_host_snapshot(struct msm_disp_state *disp_state, struct 
mipi_dsi_host *host);
 void msm_dsi_host_test_pattern_en(struct mipi_dsi_host *host);
 struct drm_dsc_config *msm_dsi_host_get_dsc_config(struct mipi_dsi_host *host);
+bool msm_dsi_host_is_widebus_enabled(struct mipi_dsi_host *host);
 
 /* dsi phy */
 struct msm_dsi_phy;
diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c 
b/drivers/gpu/drm/msm/dsi/dsi_host.c
index 645927214871..267c7fda8854 100644
--- a/drivers/gpu/drm/msm/dsi/dsi_host.c
+++ b/drivers/gpu/drm/msm/dsi/dsi_host.c
@@ -710,6 +710,15 @@ static void dsi_ctrl_disable(struct msm_dsi_host *msm_host)
dsi_write(msm_host, REG_DSI_CTRL, 0);
 }
 
+bool msm_dsi_host_is_widebus_enabled(struct mipi_dsi_host *host)
+{
+   struct msm_dsi_host *msm_host = to_msm_dsi_host(host);
+
+   return msm_host->dsc &&
+   (msm_host->cfg_hnd->major == MSM_DSI_VER_MAJOR_6G &&
+msm_host->cfg_hnd->minor >= MSM_DSI_6G_VER_MINOR_V2_5_0);
+}
+
 static void dsi_ctrl_enable(struct msm_dsi_host *msm_host,
struct msm_dsi_phy_shared_timings *phy_shared_timings, 
struct msm_dsi_phy *phy)
 {
@@ -753,10 +762,16 @@ static void dsi_ctrl_enable(struct msm_dsi_host *msm_host,
data |= DSI_CMD_CFG1_INSERT_DCS_COMMAND;
dsi_write(msm_host, REG_DSI_CMD_CFG1, data);
 
-   if (msm_host->cfg_hnd->major == MSM_DSI_VER_MAJOR_6G &&
-   msm_host->cfg_hnd->minor >= MSM_DSI_6G_VER_MINOR_V1_3) {
+   if (cfg_hnd->major == MSM_DSI_VER_MAJOR_6G) {
data = dsi_read(msm_host, REG_DSI_CMD_MODE_MDP_CTRL2);
-   data |= DSI_CMD_MODE_MDP_CTRL2_BURST_MODE;
+
+   if (cfg_hnd->minor >= MSM_DSI_6G_VER_MINOR_V1_3)
+   data |= DSI_CMD_MODE_MDP_CTRL2_BURST_MODE;
+
+   /* TODO: Allow for video-mode support once tested/fixed 
*/
+   if (msm_dsi_host_is_widebus_enabled(&msm_host->base))
+   data |= DSI_CMD_MODE_MDP_CTRL2_DATABUS_WIDEN;
+
dsi_write(msm_host, REG_DSI_CMD_MODE_MDP_CTRL2, data);
}
}
@@ -894,6 +909,7 @@ static void dsi_timing_setup(struct msm_dsi_host *msm_host, 
bool is_bonded_dsi)
u32 hdisplay = mode->hdisplay;
u32 wc;
int ret;
+   bool widebus_enabled = msm_dsi_host_is_widebus_enabled(&msm_host->base);
 
DBG("");
 
@@ -914,6 +930,7 @@ static void dsi_timing_setup(struct msm_dsi_host *msm_host, 
bool is_bonded_dsi)
 
if (msm_host->dsc) {
struct drm_dsc_config *dsc = msm_host->dsc;
+   u32 bytes_per_pclk;
 
/* update dsc params with timing params */
if (!dsc || !mode->hdisplay || !mode->vdisplay) {
@@ -937,7 +954,13 @@ static void dsi_timing_setup(struct msm_dsi_host 
*msm_host, bool is_bonded_dsi)
 * pulse width same
 */
h_total -= hdisplay;
-   hdisplay = 
DIV_ROUND_UP(msm_dsc_get_bytes_per_line(msm_host->dsc), 3);
+   if (widebus_enabled && !(msm_host->mode_flags & 
MIPI_DSI_MODE_VIDEO))
+   bytes_per_pclk = 6;
+   else
+   bytes_per_pclk = 3;
+
+   hdisplay = 
DIV_ROUND_UP(msm_dsc_get_bytes_per_line(msm_host->dsc), bytes_per_pclk);
+
h_total += hdisplay;
ha_end = ha_start + hdisplay;
}

-- 
2.42.0



[PATCH v4 3/4] drm/msm/dsi: Add DATABUS_WIDEN MDP_CTRL2 bit

2023-08-22 Thread Jessica Zhang
Add a DATABUS_WIDEN bit to the MDP_CTRL2 register to allow DSI to enable
databus widen mode.

Reviewed-by: Dmitry Baryshkov 
Signed-off-by: Jessica Zhang 
---
 drivers/gpu/drm/msm/dsi/dsi.xml.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/msm/dsi/dsi.xml.h 
b/drivers/gpu/drm/msm/dsi/dsi.xml.h
index a4a154601114..2a7d980e12c3 100644
--- a/drivers/gpu/drm/msm/dsi/dsi.xml.h
+++ b/drivers/gpu/drm/msm/dsi/dsi.xml.h
@@ -664,6 +664,7 @@ static inline uint32_t 
DSI_CMD_MODE_MDP_CTRL2_INPUT_RGB_SWAP(enum dsi_rgb_swap v
return ((val) << DSI_CMD_MODE_MDP_CTRL2_INPUT_RGB_SWAP__SHIFT) & 
DSI_CMD_MODE_MDP_CTRL2_INPUT_RGB_SWAP__MASK;
 }
 #define DSI_CMD_MODE_MDP_CTRL2_BURST_MODE  0x0001
+#define DSI_CMD_MODE_MDP_CTRL2_DATABUS_WIDEN   0x0010
 
 #define REG_DSI_CMD_MODE_MDP_STREAM2_CTRL  0x01b8
 #define DSI_CMD_MODE_MDP_STREAM2_CTRL_DATA_TYPE__MASK  0x003f

-- 
2.42.0



[PATCH v4 1/4] drm/msm/dpu: Move setting of dpu_enc::wide_bus_en to atomic enable()

2023-08-22 Thread Jessica Zhang
Move the setting of dpu_enc::wide_bus_en to
dpu_encoder_virt_atomic_enable() so that it mirrors how dpu_enc::dsc
is being set.

Since wide bus for DSI is related to DSC, having it mirror how DSC
is set in DPU will also make it easier to accommodate the possibility
of DSC for DSI being set at runtime in the future.

Reviewed-by: Dmitry Baryshkov 
Signed-off-by: Jessica Zhang 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index d34e684a4178..3dcd37c48aac 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -1194,11 +1194,18 @@ static void dpu_encoder_virt_atomic_enable(struct 
drm_encoder *drm_enc,
struct dpu_encoder_virt *dpu_enc = NULL;
int ret = 0;
struct drm_display_mode *cur_mode = NULL;
+   struct msm_drm_private *priv = drm_enc->dev->dev_private;
+   struct msm_display_info *disp_info;
 
dpu_enc = to_dpu_encoder_virt(drm_enc);
+   disp_info = &dpu_enc->disp_info;
 
dpu_enc->dsc = dpu_encoder_get_dsc_config(drm_enc);
 
+   if (disp_info->intf_type == INTF_DP)
+   dpu_enc->wide_bus_en = msm_dp_wide_bus_available(
+   priv->dp[disp_info->h_tile_instance[0]]);
+
	mutex_lock(&dpu_enc->enc_lock);
	cur_mode = &dpu_enc->base.crtc->state->adjusted_mode;
 
@@ -2383,10 +2390,6 @@ struct drm_encoder *dpu_encoder_init(struct drm_device 
*dev,
timer_setup(_enc->frame_done_timer,
dpu_encoder_frame_done_timeout, 0);
 
-   if (disp_info->intf_type == INTF_DP)
-   dpu_enc->wide_bus_en = msm_dp_wide_bus_available(
-   priv->dp[disp_info->h_tile_instance[0]]);
-
INIT_DELAYED_WORK(_enc->delayed_off_work,
dpu_encoder_off_work);
dpu_enc->idle_timeout = IDLE_TIMEOUT;

-- 
2.42.0



[PATCH v4 2/4] drm/msm/dpu: Enable widebus for DSI INTF

2023-08-22 Thread Jessica Zhang
DPU supports a data-bus widen mode for DSI INTF.

Enable this mode for all supported chipsets if widebus is enabled for DSI.

Signed-off-by: Jessica Zhang 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c  | 7 +--
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c | 2 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c  | 7 +++
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.h  | 1 +
 drivers/gpu/drm/msm/dsi/dsi.c| 5 +
 drivers/gpu/drm/msm/msm_drv.h| 5 +
 6 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 3dcd37c48aac..d4a21f172aba 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -1196,15 +1196,18 @@ static void dpu_encoder_virt_atomic_enable(struct 
drm_encoder *drm_enc,
struct drm_display_mode *cur_mode = NULL;
struct msm_drm_private *priv = drm_enc->dev->dev_private;
struct msm_display_info *disp_info;
+   int index;
 
dpu_enc = to_dpu_encoder_virt(drm_enc);
	disp_info = &dpu_enc->disp_info;
+   index = disp_info->h_tile_instance[0];
 
dpu_enc->dsc = dpu_encoder_get_dsc_config(drm_enc);
 
if (disp_info->intf_type == INTF_DP)
-   dpu_enc->wide_bus_en = msm_dp_wide_bus_available(
-   priv->dp[disp_info->h_tile_instance[0]]);
+   dpu_enc->wide_bus_en = 
msm_dp_wide_bus_available(priv->dp[index]);
+   else if (disp_info->intf_type == INTF_DSI)
+   dpu_enc->wide_bus_en = 
msm_dsi_wide_bus_enabled(priv->dsi[index]);
 
	mutex_lock(&dpu_enc->enc_lock);
	cur_mode = &dpu_enc->base.crtc->state->adjusted_mode;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c
index df88358e7037..29a5f88a12ee 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c
@@ -72,6 +72,8 @@ static void _dpu_encoder_phys_cmd_update_intf_cfg(
if (intf_cfg.dsc != 0)
cmd_mode_cfg.data_compress = true;
 
+   cmd_mode_cfg.wide_bus_en = 
dpu_encoder_is_widebus_enabled(phys_enc->parent);
+
if (phys_enc->hw_intf->ops.program_intf_cmd_cfg)
phys_enc->hw_intf->ops.program_intf_cmd_cfg(phys_enc->hw_intf, 
&cmd_mode_cfg);
 }
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
index 8ec6505d9e78..5dcc83dd47ef 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
@@ -521,6 +521,9 @@ static void dpu_hw_intf_program_intf_cmd_cfg(struct 
dpu_hw_intf *ctx,
if (cmd_mode_cfg->data_compress)
intf_cfg2 |= INTF_CFG2_DCE_DATA_COMPRESS;
 
+   if (cmd_mode_cfg->wide_bus_en)
+   intf_cfg2 |= INTF_CFG2_DATABUS_WIDEN;
+
	DPU_REG_WRITE(&ctx->hw, INTF_CONFIG2, intf_cfg2);
 }
 
@@ -545,6 +548,10 @@ static void _setup_intf_ops(struct dpu_hw_intf_ops *ops,
ops->disable_autorefresh = dpu_hw_intf_disable_autorefresh;
}
 
+   /* Technically, INTF_CONFIG2 is present for DPU 5.0+, but
+* we can configure it for DPU 7.0+ since the wide bus and DSC flags
+* would not be set for DPU < 7.0 anyways
+*/
if (mdss_rev->core_major_ver >= 7)
ops->program_intf_cmd_cfg = dpu_hw_intf_program_intf_cmd_cfg;
 }
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.h
index 77f80531782b..c539025c418b 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.h
@@ -50,6 +50,7 @@ struct dpu_hw_intf_status {
 
 struct dpu_hw_intf_cmd_mode_cfg {
u8 data_compress;   /* enable data compress between dpu and dsi */
+   u8 wide_bus_en; /* enable databus widen mode */
 };
 
 /**
diff --git a/drivers/gpu/drm/msm/dsi/dsi.c b/drivers/gpu/drm/msm/dsi/dsi.c
index baab79ab6e74..4cf424b3509f 100644
--- a/drivers/gpu/drm/msm/dsi/dsi.c
+++ b/drivers/gpu/drm/msm/dsi/dsi.c
@@ -17,6 +17,11 @@ struct drm_dsc_config *msm_dsi_get_dsc_config(struct msm_dsi 
*msm_dsi)
return msm_dsi_host_get_dsc_config(msm_dsi->host);
 }
 
+bool msm_dsi_wide_bus_enabled(struct msm_dsi *msm_dsi)
+{
+   return false;
+}
+
 static int dsi_get_phy(struct msm_dsi *msm_dsi)
 {
struct platform_device *pdev = msm_dsi->pdev;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 9d9d5e009163..1f37be53c281 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -344,6 +344,7 @@ void msm_dsi_snapshot(struct msm_disp_state *disp_state, 
struct msm_dsi *msm_dsi
 bool msm_dsi_is_cmd_mode(struct msm_dsi *msm_dsi);
 bool msm_dsi_is_bonded_dsi(struct msm_dsi *msm_dsi);
 bool 

Re: [PATCH] drm/amd/display: register edp_backlight_control() for DCN301

2023-08-22 Thread Harry Wentland



On 2023-08-22 13:03, Hamza Mahfooz wrote:
> As made mention of in commit 099303e9a9bd ("drm/amd/display: eDP
> intermittent black screen during PnP"), we need to turn off the
> display's backlight before powering off an eDP display. Not doing so
> will result in undefined behaviour according to the eDP spec. So, set
> DCN301's edp_backlight_control() function pointer to
> dce110_edp_backlight_control().
> 
> Cc: sta...@vger.kernel.org
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2765
> Fixes: 9c75891feef0 ("drm/amd/display: rework recent update PHY state commit")
> Suggested-by: Swapnil Patel 
> Signed-off-by: Hamza Mahfooz 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c 
> b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
> index 257df8660b4c..61205cdbe2d5 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
> @@ -75,6 +75,7 @@ static const struct hw_sequencer_funcs dcn301_funcs = {
>   .get_hw_state = dcn10_get_hw_state,
>   .clear_status_bits = dcn10_clear_status_bits,
>   .wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect,
> + .edp_backlight_control = dce110_edp_backlight_control,
>   .edp_power_control = dce110_edp_power_control,
>   .edp_wait_for_hpd_ready = dce110_edp_wait_for_hpd_ready,
>   .set_cursor_position = dcn10_set_cursor_position,



Re: [PATCH v8 2/7] phy: Add HDMI configuration options

2023-08-22 Thread Dmitry Baryshkov
On 22/08/2023 16:54, Vinod Koul wrote:
> On 17-08-23, 13:05, Dmitry Baryshkov wrote:
>> On 08/08/2023 11:32, Sandor Yu wrote:
>>> Allow HDMI PHYs to be configured through the generic
>>> functions through a custom structure added to the generic union.
>>>
>>> The parameters added here are based on HDMI PHY
>>> implementation practices.  The current set of parameters
>>> should cover the potential users.
>>>
>>> Signed-off-by: Sandor Yu 
>>> ---
>>>include/linux/phy/phy-hdmi.h | 24 
>>>include/linux/phy/phy.h  |  7 ++-
>>>2 files changed, 30 insertions(+), 1 deletion(-)
>>>create mode 100644 include/linux/phy/phy-hdmi.h
>>
>> I think this looks good now, thank you!
>>
>> Reviewed-by: Dmitry Baryshkov 
>
> Should this go thru drm or phy...?

I'd say, PHY, together with the other PHY patches. If you can merge
them into an immutable branch, then it can also be merged into
drm-misc (?) to provide the dependency between drm and phy parts.


--
With best wishes
Dmitry


Re: [Linaro-mm-sig] [PATCH v2] dma-buf/sw_sync: Avoid recursive lock during fence signal

2023-08-22 Thread Rob Clark
On Tue, Aug 22, 2023 at 6:01 AM Christian König
 wrote:
>
> Am 18.08.23 um 16:59 schrieb Rob Clark:
> > From: Rob Clark 
> >
> > If a signal callback releases the sw_sync fence, that will trigger a
> > deadlock as the timeline_fence_release recurses onto the fence->lock
> > (used both for signaling and the timeline tree).
> >
> > To avoid that, temporarily hold an extra reference to the signalled
> > fences until after we drop the lock.
> >
> > (This is an alternative implementation of 
> > https://patchwork.kernel.org/patch/11664717/
> > which avoids some potential UAF issues with the original patch.)
> >
> > v2: Remove now obsolete comment, use list_move_tail() and
> >  list_del_init()
> >
> > Reported-by: Bas Nieuwenhuizen 
> > Fixes: d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free")
> > Signed-off-by: Rob Clark 
>
> Reviewed-by: Christian König 

Thanks, any chance you could take this via drm-misc?

BR,
-R

>
> > ---
> >   drivers/dma-buf/sw_sync.c | 18 +-
> >   1 file changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> > index 63f0aeb66db6..f0a35277fd84 100644
> > --- a/drivers/dma-buf/sw_sync.c
> > +++ b/drivers/dma-buf/sw_sync.c
> > @@ -191,6 +191,7 @@ static const struct dma_fence_ops timeline_fence_ops = {
> >*/
> >   static void sync_timeline_signal(struct sync_timeline *obj, unsigned int 
> > inc)
> >   {
> > + LIST_HEAD(signalled);
> >   struct sync_pt *pt, *next;
> >
> >   trace_sync_timeline(obj);
> > @@ -203,21 +204,20 @@ static void sync_timeline_signal(struct sync_timeline 
> > *obj, unsigned int inc)
> >   if (!timeline_fence_signaled(&pt->base))
> >   break;
> >
> > - list_del_init(&pt->link);
> > + dma_fence_get(&pt->base);
> > +
> > + list_move_tail(&pt->link, &signalled);
> >   rb_erase(&pt->node, &obj->pt_tree);
> >
> > - /*
> > -  * A signal callback may release the last reference to this
> > -  * fence, causing it to be freed. That operation has to be
> > -  * last to avoid a use after free inside this loop, and must
> > -  * be after we remove the fence from the timeline in order to
> > -  * prevent deadlocking on timeline->lock inside
> > -  * timeline_fence_release().
> > -  */
> >   dma_fence_signal_locked(&pt->base);
> >   }
> >
> >   spin_unlock_irq(&obj->lock);
> > +
> > + list_for_each_entry_safe(pt, next, &signalled, link) {
> > + list_del_init(&pt->link);
> > + dma_fence_put(&pt->base);
> > + }
> >   }
> >
> >   /**
>


[PATCH] drm/amd/display: register edp_backlight_control() for DCN301

2023-08-22 Thread Hamza Mahfooz
As made mention of in commit 099303e9a9bd ("drm/amd/display: eDP
intermittent black screen during PnP"), we need to turn off the
display's backlight before powering off an eDP display. Not doing so
will result in undefined behaviour according to the eDP spec. So, set
DCN301's edp_backlight_control() function pointer to
dce110_edp_backlight_control().

Cc: sta...@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2765
Fixes: 9c75891feef0 ("drm/amd/display: rework recent update PHY state commit")
Suggested-by: Swapnil Patel 
Signed-off-by: Hamza Mahfooz 
---
 drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c 
b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
index 257df8660b4c..61205cdbe2d5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
@@ -75,6 +75,7 @@ static const struct hw_sequencer_funcs dcn301_funcs = {
.get_hw_state = dcn10_get_hw_state,
.clear_status_bits = dcn10_clear_status_bits,
.wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect,
+   .edp_backlight_control = dce110_edp_backlight_control,
.edp_power_control = dce110_edp_power_control,
.edp_wait_for_hpd_ready = dce110_edp_wait_for_hpd_ready,
.set_cursor_position = dcn10_set_cursor_position,
-- 
2.41.0



Re: [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-08-22 Thread Faith Ekstrand
On Tue, Aug 22, 2023 at 4:51 AM Christian König 
wrote:

> Am 21.08.23 um 21:46 schrieb Faith Ekstrand:
>
> On Mon, Aug 21, 2023 at 1:13 PM Christian König 
> wrote:
>
>> [SNIP]
>> So as long as nobody from userspace comes and says we absolutely need to
>> optimize this use case I would rather not do it.
>>
>
> This is a place where nouveau's needs are legitimately different from AMD
> or Intel, I think.  NVIDIA's command streamer model is very different from
> AMD and Intel.  On AMD and Intel, each EXEC turns into a single small
> packet (on the order of 16B) which kicks off a command buffer.  There may
> be a bit of cache management or something around it but that's it.  From
> there, it's userspace's job to make one command buffer chain to another
> until it's finally done and then do a "return", whatever that looks like.
>
> NVIDIA's model is much more static.  Each packet in the HW/FW ring is an
> address and a size and that much data is processed and then it grabs the
> next packet and processes. The result is that, if we use multiple buffers
> of commands, there's no way to chain them together.  We just have to pass
> the whole list of buffers to the kernel.
>
>
> So far that is actually completely identical to what AMD has.
>
> A single EXEC ioctl / job may have 500 such addr+size packets depending on
> how big the command buffer is.
>
>
> And that is what I don't understand. Why would you need hundreds of such
> addr+size packets?
>

Well, we're not really in control of it.  We can control our base pushbuf
size and that's something we can tune but we're still limited by the
client.  We have to submit another pushbuf whenever:

 1. We run out of space (power-of-two growth is also possible but the size
is limited to a maximum of about 4MiB due to hardware limitations.)
 2. The client calls a secondary command buffer.
 3. Any usage of indirect draw or dispatch on pre-Turing hardware.

At some point we need to tune our BO size a bit to avoid (1) while also
avoiding piles of tiny BOs.  However, (2) and (3) are out of our control.

This is basically identical to what AMD has (well on newer hw there is an
> extension in the CP packets to JUMP/CALL subsequent IBs, but this isn't
> widely used as far as I know).
>

According to Bas, RADV chains on recent hardware.


> Previously the limit was something like 4 which we extended to because Bas
> came up with similar requirements for the AMD side from RADV.
>
> But essentially those approaches with hundreds of IBs don't sound like a
> good idea to me.
>

No one's arguing that they like it.  Again, the hardware isn't designed to
have a kernel in the way. It's designed to be fed by userspace. But we're
going to have the kernel in the middle for a while so we need to make it
not suck too bad.

~Faith

It gets worse on pre-Turing hardware where we have to split the batch for
> every single DrawIndirect or DispatchIndirect.
>
> Lest you think NVIDIA is just crazy here, it's a perfectly reasonable
> model if you assume that userspace is feeding the firmware.  When that's
> happening, you just have a userspace thread that sits there and feeds the
> ringbuffer with whatever is next and you can marshal as much data through
> as you want. Sure, it'd be nice to have a 2nd level batch thing that gets
> launched from the FW ring and has all the individual launch commands but
> it's not at all necessary.
>
> What does that mean from a gpu_scheduler PoV? Basically, it means a
> variable packet size.
>
> What does this mean for implementation? IDK.  One option would be to teach
> the scheduler about actual job sizes. Another would be to virtualize it and
> have another layer underneath the scheduler that does the actual feeding of
> the ring. Another would be to decrease the job size somewhat and then have
> the front-end submit as many jobs as it needs to service userspace and only
> put the out-fences on the last job. All the options kinda suck.
>
>
> Yeah, agree. The job size Danilo suggested is still the least painful.
>
> Christian.
>
>
> ~Faith
>
>
>


[PATCH v2 3/4] drm/xe/vm: Perform accounting of userptr pinned pages

2023-08-22 Thread Thomas Hellström
Account these pages against RLIMIT_MEMLOCK, following how RDMA does this,
with CAP_IPC_LOCK bypassing the limit.
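For context, the limit being charged here is the ordinary per-process
RLIMIT_MEMLOCK; userspace can inspect (or a privileged process raise) it
with the standard POSIX API. A minimal check, not part of this series:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
            struct rlimit rl;

            if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0)
                    printf("RLIMIT_MEMLOCK: soft %llu, hard %llu bytes\n",
                           (unsigned long long)rl.rlim_cur,
                           (unsigned long long)rl.rlim_max);
            return 0;
    }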

v2:
- Change the naming of the accounting functions and WARN if we try
  to account anything but userptr pages. (Matthew Brost)

Signed-off-by: Thomas Hellström 
Reviewed-by: Matthew Brost 
---
 drivers/gpu/drm/xe/xe_vm.c | 52 --
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 037ac42f74a5..a645cfa131ca 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -34,6 +34,41 @@
 
 #define TEST_VM_ASYNC_OPS_ERROR
 
+/*
+ * Perform userptr PIN accounting against RLIMIT_MEMLOCK for now, similarly
+ * to how RDMA does this.
+ */
+static int
+xe_vma_userptr_mlock_reserve(struct xe_vma *vma, unsigned long num_pages)
+{
+   unsigned long lock_limit, new_pinned;
+   struct mm_struct *mm = vma->userptr.notifier.mm;
+
+   /* TODO: Convert to xe_assert() */
+   XE_WARN_ON(!xe_vma_is_userptr(vma));
+
+   if (!can_do_mlock())
+   return -EPERM;
+
+   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   new_pinned = atomic64_add_return(num_pages, &mm->pinned_vm);
+   if (new_pinned > lock_limit && !capable(CAP_IPC_LOCK)) {
+   atomic64_sub(num_pages, >pinned_vm);
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
+static void
+xe_vma_userptr_mlock_release(struct xe_vma *vma, unsigned long num_pages)
+{
+   /* TODO: Convert to xe_assert() */
+   XE_WARN_ON(!xe_vma_is_userptr(vma));
+
+   atomic64_sub(num_pages, &vma->userptr.notifier.mm->pinned_vm);
+}
+
 /**
  * xe_vma_userptr_check_repin() - Advisory check for repin needed
  * @vma: The userptr vma
@@ -90,9 +125,17 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
!read_only);
pages = vma->userptr.pinned_pages;
} else {
+   if (xe_vma_is_pinned(vma)) {
+   ret = xe_vma_userptr_mlock_reserve(vma, num_pages);
+   if (ret)
+   return ret;
+   }
+
pages = kvmalloc_array(num_pages, sizeof(*pages), GFP_KERNEL);
-   if (!pages)
-   return -ENOMEM;
+   if (!pages) {
+   ret = -ENOMEM;
+   goto out_account;
+   }
}
 
pinned = ret = 0;
@@ -188,6 +231,9 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
 mm_closed:
kvfree(pages);
vma->userptr.pinned_pages = NULL;
+out_account:
+   if (xe_vma_is_pinned(vma))
+   xe_vma_userptr_mlock_release(vma, num_pages);
return ret;
 }
 
@@ -1010,6 +1056,8 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
unpin_user_pages_dirty_lock(vma->userptr.pinned_pages,
vma->userptr.num_pinned,
!read_only);
+   xe_vma_userptr_mlock_release(vma, xe_vma_size(vma) >>
+PAGE_SHIFT);
kvfree(vma->userptr.pinned_pages);
}
 
-- 
2.41.0



[PATCH v2 4/4] drm/xe/uapi: Support pinning of userptr vmas

2023-08-22 Thread Thomas Hellström
Support pinning of vmas using XE_VM_BIND_FLAG_PIN, initially for userptr
only. Pinned memory is accounted against RLIMIT_MEMLOCK, and processes
with CAP_IPC_LOCK are not subject to the limit. This is pretty similar to
mlock()'ing userptr memory with the added benefit that the driver is
aware and can ignore some actions in the MMU invalidation notifier.

This will initially become useful for compute VMs on hardware without
mid-thread-preemption capability since with pinned pages, the MMU
invalidation notifier never tries to preempt a running compute kernel.

If that were the only usage we could restrict this to a flag that always
pins userptr VMAs on compute VMs on such hardware, but there are
indications that this may become needed in other situations as well.

From a more general point of view, the usage pattern of a system may be
such that in most cases it only ever runs a single workload per system
and then the sysadmin would want to configure the system to allow
extensive pinning for performance reasons.

Hence we might want to extend the pinning capability to bo-backed VMAs
as well. How that pinning will be accounted for remains an open question,
but building on the current drm cgroup work would be an option.

Signed-off-by: Thomas Hellström 
Reviewed-by: Matthew Brost 
---
 drivers/gpu/drm/xe/xe_vm.c   | 33 +---
 drivers/gpu/drm/xe/xe_vm_types.h |  2 ++
 include/uapi/drm/xe_drm.h| 18 +
 3 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a645cfa131ca..fdfe5a411386 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -950,6 +950,7 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
u64 start, u64 end,
bool read_only,
bool is_null,
+   bool pin,
u8 tile_mask)
 {
struct xe_vma *vma;
@@ -981,6 +982,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
vma->gpuva.flags |= XE_VMA_READ_ONLY;
if (is_null)
vma->gpuva.flags |= DRM_GPUVA_SPARSE;
+   if (pin)
+   vma->gpuva.flags |= XE_VMA_PINNED;
 
if (tile_mask) {
vma->tile_mask = tile_mask;
@@ -2382,6 +2385,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo 
*bo,
op->map.read_only =
operation & XE_VM_BIND_FLAG_READONLY;
op->map.is_null = operation & XE_VM_BIND_FLAG_NULL;
+   op->map.pin = operation & XE_VM_BIND_FLAG_PIN;
}
break;
case XE_VM_BIND_OP_UNMAP:
@@ -2446,7 +2450,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo 
*bo,
 }
 
 static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
- u8 tile_mask, bool read_only, bool is_null)
+ u8 tile_mask, bool read_only, bool is_null,
+ bool pin)
 {
struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
struct xe_vma *vma;
@@ -2462,7 +2467,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct 
drm_gpuva_op_map *op,
}
vma = xe_vma_create(vm, bo, op->gem.offset,
op->va.addr, op->va.addr +
-   op->va.range - 1, read_only, is_null,
+   op->va.range - 1, read_only, is_null, pin,
tile_mask);
if (bo)
	xe_bo_unlock(bo, &ww);
@@ -2577,7 +2582,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, 
struct xe_exec_queue *q,
 
	vma = new_vma(vm, &op->base.map,
  op->tile_mask, op->map.read_only,
- op->map.is_null);
+ op->map.is_null, op->map.pin);
if (IS_ERR(vma)) {
err = PTR_ERR(vma);
goto free_fence;
@@ -2602,10 +2607,13 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, 
struct xe_exec_queue *q,
bool is_null =
op->base.remap.unmap->va->flags 
&
DRM_GPUVA_SPARSE;
+   bool pin =
+   op->base.remap.unmap->va->flags 
&
+   XE_VMA_PINNED;
 
vma = new_vma(vm, op->base.remap.prev,
  op->tile_mask, read_only,
- is_null);
+   

[PATCH v2 0/4] drm/xe: Support optional pinning of userptr pages

2023-08-22 Thread Thomas Hellström
This series adds a flag at VM_BIND time to pin the memory backing a VMA.
Initially this is needed for long-running workloads on hardware that
supports neither mid-thread preemption nor pagefaults, since without it
the userptr MMU notifier will wait for preemption until preemption times
out.

Moving forward, this could also be supported for bo-backed VMAs given
that proper accounting takes place. A sysadmin could then optionally configure
a system to be optimized for dealing with a single GPU application
at a time.

The series will be followed up with an igt series to exercise the uAPI.

v2:
- Address review comments by Matthew Brost.

Thomas Hellström (4):
  drm/xe/vm: Use onion unwind for xe_vma_userptr_pin_pages()
  drm/xe/vm: Implement userptr page pinning
  drm/xe/vm: Perform accounting of userptr pinned pages
  drm/xe/uapi: Support pinning of userptr vmas

 drivers/gpu/drm/xe/xe_vm.c   | 194 ---
 drivers/gpu/drm/xe/xe_vm.h   |   9 ++
 drivers/gpu/drm/xe/xe_vm_types.h |  14 +++
 include/uapi/drm/xe_drm.h|  18 +++
 4 files changed, 190 insertions(+), 45 deletions(-)

-- 
2.41.0



[PATCH v2 2/4] drm/xe/vm: Implement userptr page pinning

2023-08-22 Thread Thomas Hellström
Implement pinning of userptrs between VM_BIND and VM_UNBIND, which helps
avoid long hangs on non-preemptible workloads. But don't hook it up to
userspace just yet.
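For reference, the diff below switches between the two core-kernel GUP
reference styles. A minimal sketch of the difference (generic MM API
usage, simplified; not xe code):

    #include <linux/mm.h>

    /* A FOLL_PIN reference is tracked separately by the MM and must be
     * released with unpin_user_pages*(), unlike the plain
     * get_user_pages_fast()/release_pages() pairing. */
    static int example_pin_cycle(unsigned long uaddr, int npages,
                                 struct page **pages)
    {
            int nr = pin_user_pages_fast(uaddr, npages, FOLL_WRITE, pages);

            if (nr <= 0)
                    return nr ? nr : -EFAULT;

            /* ... pages can now be mapped for DMA, etc. ... */

            unpin_user_pages_dirty_lock(pages, nr, true);
            return 0;
    }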

v2:
- Avoid marking userptr VMAs as invalid in the mmu invalidation notifier.
  (Matthew Brost)
- Add an WARN that we don't try to repin userptr pages (Matthew Brost)

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/xe/xe_vm.c   | 80 +++-
 drivers/gpu/drm/xe/xe_vm.h   |  9 
 drivers/gpu/drm/xe/xe_vm_types.h | 12 +
 3 files changed, 79 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 8bf7f62e6548..037ac42f74a5 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -74,10 +74,6 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
if (notifier_seq == vma->userptr.notifier_seq)
return 0;
 
-   pages = kvmalloc_array(num_pages, sizeof(*pages), GFP_KERNEL);
-   if (!pages)
-   return -ENOMEM;
-
if (vma->userptr.sg) {
dma_unmap_sgtable(xe->drm.dev,
  vma->userptr.sg,
@@ -87,6 +83,18 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
vma->userptr.sg = NULL;
}
 
+   /* TODO: Convert to xe_assert() */
+   if (XE_WARN_ON(vma->userptr.pinned_pages)) {
+   unpin_user_pages_dirty_lock(vma->userptr.pinned_pages,
+   vma->userptr.num_pinned,
+   !read_only);
+   pages = vma->userptr.pinned_pages;
+   } else {
+   pages = kvmalloc_array(num_pages, sizeof(*pages), GFP_KERNEL);
+   if (!pages)
+   return -ENOMEM;
+   }
+
pinned = ret = 0;
if (in_kthread) {
if (!mmget_not_zero(vma->userptr.notifier.mm)) {
@@ -97,11 +105,18 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
}
 
while (pinned < num_pages) {
-   ret = get_user_pages_fast(xe_vma_userptr(vma) +
- pinned * PAGE_SIZE,
- num_pages - pinned,
- read_only ? 0 : FOLL_WRITE,
- &pages[pinned]);
+   if (xe_vma_is_pinned(vma))
+   ret = pin_user_pages_fast(xe_vma_userptr(vma) +
+ pinned * PAGE_SIZE,
+ num_pages - pinned,
+ read_only ? 0 : FOLL_WRITE,
+ &pages[pinned]);
+   else
+   ret = get_user_pages_fast(xe_vma_userptr(vma) +
+ pinned * PAGE_SIZE,
+ num_pages - pinned,
+ read_only ? 0 : FOLL_WRITE,
+ &pages[pinned]);
if (ret < 0) {
if (in_kthread)
ret = 0;
@@ -137,19 +152,24 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
if (ret)
goto out_free_sg;
 
-   for (i = 0; i < pinned; ++i) {
-   if (!read_only) {
-   lock_page(pages[i]);
-   set_page_dirty(pages[i]);
-   unlock_page(pages[i]);
+   if (!xe_vma_is_pinned(vma)) {
+   for (i = 0; i < pinned; ++i) {
+   if (!read_only) {
+   lock_page(pages[i]);
+   set_page_dirty(pages[i]);
+   unlock_page(pages[i]);
+   }
+
+   mark_page_accessed(pages[i]);
}
 
-   mark_page_accessed(pages[i]);
+   release_pages(pages, pinned);
+   kvfree(pages);
+   } else {
+   vma->userptr.pinned_pages = pages;
+   vma->userptr.num_pinned = pinned;
}
 
-   release_pages(pages, pinned);
-   kvfree(pages);
-
vma->userptr.notifier_seq = notifier_seq;
if (xe_vma_userptr_check_repin(vma) == -EAGAIN)
goto retry;
@@ -160,9 +180,14 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
sg_free_table(vma->userptr.sg);
vma->userptr.sg = NULL;
 out_release_pages:
-   release_pages(pages, pinned);
+   if (!xe_vma_is_pinned(vma))
+   release_pages(pages, pinned);
+   else
+   unpin_user_pages(pages, pinned);
+   vma->userptr.num_pinned = 0;
 mm_closed:
kvfree(pages);
+   vma->userptr.pinned_pages = NULL;
return ret;
 }
 
@@ -718,6 +743,11 @@ static bool vma_userptr_invalidate(struct 
mmu_interval_notifier 

[PATCH v2 1/4] drm/xe/vm: Use onion unwind for xe_vma_userptr_pin_pages()

2023-08-22 Thread Thomas Hellström
Use onion error unwind since that makes the function easier to read
and extend. No functional change.
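(For anyone unfamiliar with the term: "onion" unwind is the usual kernel
error-handling shape where each failure jumps to a label that undoes only
the steps already completed, in reverse order. A generic sketch with
hypothetical helpers, not the xe code itself:)

    static int example_setup(void)
    {
            int ret;

            ret = step_a();
            if (ret)
                    return ret;

            ret = step_b();
            if (ret)
                    goto undo_a;

            ret = step_c();
            if (ret)
                    goto undo_b;

            return 0;

    undo_b:
            undo_step_b();
    undo_a:
            undo_step_a();
            return ret;
    }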

Signed-off-by: Thomas Hellström 
Reviewed-by: Matthew Brost 
---
 drivers/gpu/drm/xe/xe_vm.c | 37 +++--
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 2e99f865d7ec..8bf7f62e6548 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -116,19 +116,17 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
kthread_unuse_mm(vma->userptr.notifier.mm);
mmput(vma->userptr.notifier.mm);
}
-mm_closed:
if (ret)
-   goto out;
+   goto out_release_pages;
 
	ret = sg_alloc_table_from_pages_segment(&vma->userptr.sgt, pages,
pinned, 0,
(u64)pinned << PAGE_SHIFT,
xe_sg_segment_size(xe->drm.dev),
GFP_KERNEL);
-   if (ret) {
-   vma->userptr.sg = NULL;
-   goto out;
-   }
+   if (ret)
+   goto out_release_pages;
+
	vma->userptr.sg = &vma->userptr.sgt;
 
ret = dma_map_sgtable(xe->drm.dev, vma->userptr.sg,
@@ -136,11 +134,8 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
  DMA_BIDIRECTIONAL,
  DMA_ATTR_SKIP_CPU_SYNC |
  DMA_ATTR_NO_KERNEL_MAPPING);
-   if (ret) {
-   sg_free_table(vma->userptr.sg);
-   vma->userptr.sg = NULL;
-   goto out;
-   }
+   if (ret)
+   goto out_free_sg;
 
for (i = 0; i < pinned; ++i) {
if (!read_only) {
@@ -152,17 +147,23 @@ int xe_vma_userptr_pin_pages(struct xe_vma *vma)
mark_page_accessed(pages[i]);
}
 
-out:
release_pages(pages, pinned);
kvfree(pages);
 
-   if (!(ret < 0)) {
-   vma->userptr.notifier_seq = notifier_seq;
-   if (xe_vma_userptr_check_repin(vma) == -EAGAIN)
-   goto retry;
-   }
+   vma->userptr.notifier_seq = notifier_seq;
+   if (xe_vma_userptr_check_repin(vma) == -EAGAIN)
+   goto retry;
+
+   return 0;
 
-   return ret < 0 ? ret : 0;
+out_free_sg:
+   sg_free_table(vma->userptr.sg);
+   vma->userptr.sg = NULL;
+out_release_pages:
+   release_pages(pages, pinned);
+mm_closed:
+   kvfree(pages);
+   return ret;
 }
 
 static bool preempt_fences_waiting(struct xe_vm *vm)
-- 
2.41.0



[PATCH v3 09/12] drm/bridge: tc358768: Rename dsibclk to hsbyteclk

2023-08-22 Thread Tomi Valkeinen
The Toshiba documentation talks about HSByteClk when referring to the
DSI HS byte clock, whereas the driver uses the name 'dsibclk'. Also, in a
few places the driver calculates the byte clock from the DSI clock, even
if the byte clock is already available in a variable.

To align the driver with the documentation, change the 'dsibclk'
variable to 'hsbyteclk'. This also makes it easier to visually separate
'dsibclk' and 'dsiclk' variables.

Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/bridge/tc358768.c | 48 +++
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index 6297d28250e9..0f117d673b14 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -604,7 +604,7 @@ static int tc358768_setup_pll(struct tc358768_priv *priv,
 
dev_dbg(priv->dev, "PLL: refclk %lu, fbd %u, prd %u, frs %u\n",
clk_get_rate(priv->refclk), fbd, prd, frs);
-   dev_dbg(priv->dev, "PLL: pll_clk: %u, DSIClk %u, DSIByteClk %u\n",
+   dev_dbg(priv->dev, "PLL: pll_clk: %u, DSIClk %u, HSByteClk %u\n",
priv->dsiclk * 2, priv->dsiclk, priv->dsiclk / 4);
dev_dbg(priv->dev, "PLL: pclk %u (panel: %u)\n",
tc358768_pll_to_pclk(priv, priv->dsiclk * 2),
@@ -646,8 +646,8 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
u32 val, val2, lptxcnt, hact, data_type;
s32 raw_val;
const struct drm_display_mode *mode;
-   u32 dsibclk_nsk, dsiclk_nsk, ui_nsk;
-   u32 dsiclk, dsibclk, video_start;
+   u32 hsbyteclk_nsk, dsiclk_nsk, ui_nsk;
+   u32 dsiclk, hsbyteclk, video_start;
const u32 internal_delay = 40;
int ret, i;
struct videomode vm;
@@ -678,7 +678,7 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
	drm_display_mode_to_videomode(mode, &vm);
 
dsiclk = priv->dsiclk;
-   dsibclk = dsiclk / 4;
+   hsbyteclk = dsiclk / 4;
 
/* Data Format Control Register */
val = BIT(2) | BIT(1) | BIT(0); /* rdswap_en | dsitx_en | txdt_en */
@@ -730,67 +730,67 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
	tc358768_write(priv, TC358768_D0W_CNTRL + i * 4, 0x0000);
 
/* DSI Timings */
-   dsibclk_nsk = (u32)div_u64((u64)1000000000 * TC358768_PRECISION,
- dsibclk);
+   hsbyteclk_nsk = (u32)div_u64((u64)1000000000 * TC358768_PRECISION,
+ hsbyteclk);
	dsiclk_nsk = (u32)div_u64((u64)1000000000 * TC358768_PRECISION, dsiclk);
ui_nsk = dsiclk_nsk / 2;
dev_dbg(dev, "dsiclk_nsk: %u\n", dsiclk_nsk);
dev_dbg(dev, "ui_nsk: %u\n", ui_nsk);
-   dev_dbg(dev, "dsibclk_nsk: %u\n", dsibclk_nsk);
+   dev_dbg(dev, "hsbyteclk_nsk: %u\n", hsbyteclk_nsk);
 
/* LP11 > 100us for D-PHY Rx Init */
-   val = tc358768_ns_to_cnt(100 * 1000, dsibclk_nsk) - 1;
+   val = tc358768_ns_to_cnt(100 * 1000, hsbyteclk_nsk) - 1;
dev_dbg(dev, "LINEINITCNT: %u\n", val);
tc358768_write(priv, TC358768_LINEINITCNT, val);
 
/* LPTimeCnt > 50ns */
-   val = tc358768_ns_to_cnt(50, dsibclk_nsk) - 1;
+   val = tc358768_ns_to_cnt(50, hsbyteclk_nsk) - 1;
lptxcnt = val;
dev_dbg(dev, "LPTXTIMECNT: %u\n", val);
tc358768_write(priv, TC358768_LPTXTIMECNT, val);
 
/* 38ns < TCLK_PREPARE < 95ns */
-   val = tc358768_ns_to_cnt(65, dsibclk_nsk) - 1;
+   val = tc358768_ns_to_cnt(65, hsbyteclk_nsk) - 1;
dev_dbg(dev, "TCLK_PREPARECNT %u\n", val);
/* TCLK_PREPARE + TCLK_ZERO > 300ns */
val2 = tc358768_ns_to_cnt(300 - tc358768_to_ns(2 * ui_nsk),
- dsibclk_nsk) - 2;
+ hsbyteclk_nsk) - 2;
dev_dbg(dev, "TCLK_ZEROCNT %u\n", val2);
val |= val2 << 8;
tc358768_write(priv, TC358768_TCLK_HEADERCNT, val);
 
/* TCLK_TRAIL > 60ns AND TEOT <= 105 ns + 12*UI */
-   raw_val = tc358768_ns_to_cnt(60 + tc358768_to_ns(2 * ui_nsk), 
dsibclk_nsk) - 5;
+   raw_val = tc358768_ns_to_cnt(60 + tc358768_to_ns(2 * ui_nsk), 
hsbyteclk_nsk) - 5;
val = clamp(raw_val, 0, 127);
dev_dbg(dev, "TCLK_TRAILCNT: %u\n", val);
tc358768_write(priv, TC358768_TCLK_TRAILCNT, val);
 
/* 40ns + 4*UI < THS_PREPARE < 85ns + 6*UI */
val = 50 + tc358768_to_ns(4 * ui_nsk);
-   val = tc358768_ns_to_cnt(val, dsibclk_nsk) - 1;
+   val = tc358768_ns_to_cnt(val, hsbyteclk_nsk) - 1;
dev_dbg(dev, "THS_PREPARECNT %u\n", val);
/* THS_PREPARE + THS_ZERO > 145ns + 10*UI */
-   raw_val = tc358768_ns_to_cnt(145 - tc358768_to_ns(3 * ui_nsk), 
dsibclk_nsk) - 10;
+   raw_val = tc358768_ns_to_cnt(145 - tc358768_to_ns(3 * ui_nsk), 
hsbyteclk_nsk) - 10;
val2 = 

[PATCH v3 07/12] drm/bridge: tc358768: Print logical values, not raw register values

2023-08-22 Thread Tomi Valkeinen
The driver's debug prints show DSI-related timings as raw register values
in hex. It is much more useful to see the "logical" value of the timing,
not the register value.

Change the prints to print the values separately, in case a single
register contains multiple values, and use %u to present them in a more
human-consumable form.

Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/bridge/tc358768.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index b98c517c4726..88060f961064 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -739,57 +739,59 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
 
/* LP11 > 100us for D-PHY Rx Init */
val = tc358768_ns_to_cnt(100 * 1000, dsibclk_nsk) - 1;
-   dev_dbg(priv->dev, "LINEINITCNT: 0x%x\n", val);
+   dev_dbg(priv->dev, "LINEINITCNT: %u\n", val);
tc358768_write(priv, TC358768_LINEINITCNT, val);
 
/* LPTimeCnt > 50ns */
val = tc358768_ns_to_cnt(50, dsibclk_nsk) - 1;
lptxcnt = val;
-   dev_dbg(priv->dev, "LPTXTIMECNT: 0x%x\n", val);
+   dev_dbg(priv->dev, "LPTXTIMECNT: %u\n", val);
tc358768_write(priv, TC358768_LPTXTIMECNT, val);
 
/* 38ns < TCLK_PREPARE < 95ns */
val = tc358768_ns_to_cnt(65, dsibclk_nsk) - 1;
+   dev_dbg(priv->dev, "TCLK_PREPARECNT %u\n", val);
/* TCLK_PREPARE + TCLK_ZERO > 300ns */
val2 = tc358768_ns_to_cnt(300 - tc358768_to_ns(2 * ui_nsk),
  dsibclk_nsk) - 2;
+   dev_dbg(priv->dev, "TCLK_ZEROCNT %u\n", val2);
val |= val2 << 8;
-   dev_dbg(priv->dev, "TCLK_HEADERCNT: 0x%x\n", val);
tc358768_write(priv, TC358768_TCLK_HEADERCNT, val);
 
/* TCLK_TRAIL > 60ns AND TEOT <= 105 ns + 12*UI */
raw_val = tc358768_ns_to_cnt(60 + tc358768_to_ns(2 * ui_nsk), 
dsibclk_nsk) - 5;
val = clamp(raw_val, 0, 127);
-   dev_dbg(priv->dev, "TCLK_TRAILCNT: 0x%x\n", val);
+   dev_dbg(priv->dev, "TCLK_TRAILCNT: %u\n", val);
tc358768_write(priv, TC358768_TCLK_TRAILCNT, val);
 
/* 40ns + 4*UI < THS_PREPARE < 85ns + 6*UI */
val = 50 + tc358768_to_ns(4 * ui_nsk);
val = tc358768_ns_to_cnt(val, dsibclk_nsk) - 1;
+   dev_dbg(priv->dev, "THS_PREPARECNT %u\n", val);
/* THS_PREPARE + THS_ZERO > 145ns + 10*UI */
raw_val = tc358768_ns_to_cnt(145 - tc358768_to_ns(3 * ui_nsk), 
dsibclk_nsk) - 10;
val2 = clamp(raw_val, 0, 127);
+   dev_dbg(priv->dev, "THS_ZEROCNT %u\n", val2);
val |= val2 << 8;
-   dev_dbg(priv->dev, "THS_HEADERCNT: 0x%x\n", val);
tc358768_write(priv, TC358768_THS_HEADERCNT, val);
 
/* TWAKEUP > 1ms in lptxcnt steps */
	val = tc358768_ns_to_cnt(1020000, dsibclk_nsk);
val = val / (lptxcnt + 1) - 1;
-   dev_dbg(priv->dev, "TWAKEUP: 0x%x\n", val);
+   dev_dbg(priv->dev, "TWAKEUP: %u\n", val);
tc358768_write(priv, TC358768_TWAKEUP, val);
 
/* TCLK_POSTCNT > 60ns + 52*UI */
val = tc358768_ns_to_cnt(60 + tc358768_to_ns(52 * ui_nsk),
 dsibclk_nsk) - 3;
-   dev_dbg(priv->dev, "TCLK_POSTCNT: 0x%x\n", val);
+   dev_dbg(priv->dev, "TCLK_POSTCNT: %u\n", val);
tc358768_write(priv, TC358768_TCLK_POSTCNT, val);
 
/* max(60ns + 4*UI, 8*UI) < THS_TRAILCNT < 105ns + 12*UI */
raw_val = tc358768_ns_to_cnt(60 + tc358768_to_ns(18 * ui_nsk),
 dsibclk_nsk) - 4;
val = clamp(raw_val, 0, 15);
-   dev_dbg(priv->dev, "THS_TRAILCNT: 0x%x\n", val);
+   dev_dbg(priv->dev, "THS_TRAILCNT: %u\n", val);
tc358768_write(priv, TC358768_THS_TRAILCNT, val);
 
val = BIT(0);
@@ -803,10 +805,11 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
/* TXTAGOCNT[26:16] RXTASURECNT[10:0] */
val = tc358768_to_ns((lptxcnt + 1) * dsibclk_nsk * 4);
val = tc358768_ns_to_cnt(val, dsibclk_nsk) / 4 - 1;
+   dev_dbg(priv->dev, "TXTAGOCNT: %u\n", val);
val2 = tc358768_ns_to_cnt(tc358768_to_ns((lptxcnt + 1) * dsibclk_nsk),
  dsibclk_nsk) - 2;
+   dev_dbg(priv->dev, "RXTASURECNT: %u\n", val2);
val = val << 16 | val2;
-   dev_dbg(priv->dev, "BTACNTRL1: 0x%x\n", val);
tc358768_write(priv, TC358768_BTACNTRL1, val);
 
/* START[0] */

-- 
2.34.1



[PATCH v3 06/12] drm/bridge: tc358768: Use struct videomode

2023-08-22 Thread Tomi Valkeinen
The TC358768 documentation uses HFP, HBP, etc. values to deal with the
video mode, while the driver currently uses the DRM display mode
(htotal, hsync_start, etc).

Change the driver to convert the DRM display mode to struct videomode,
which then allows us to use the same units the documentation uses. This
makes it much easier to work on the code when using the TC358768
documentation as a reference.
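For quick reference, the standard mode-to-videomode relationships this
conversion relies on (horizontal shown, vertical is analogous; restated
here for clarity, not new driver logic):

    hactive      = hdisplay
    hfront_porch = hsync_start - hdisplay
    hsync_len    = hsync_end   - hsync_start
    hback_porch  = htotal      - hsync_end
    pixelclock   = clock * 1000        /* kHz -> Hz */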

Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/bridge/tc358768.c | 45 +--
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index a465674f1e2e..b98c517c4726 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -650,6 +650,7 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
u32 dsiclk, dsibclk, video_start;
const u32 internal_delay = 40;
int ret, i;
+   struct videomode vm;
 
if (mode_flags & MIPI_DSI_CLOCK_NON_CONTINUOUS) {
dev_warn_once(priv->dev, "Non-continuous mode unimplemented, 
falling back to continuous\n");
@@ -673,6 +674,8 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
return;
}
 
+   drm_display_mode_to_videomode(mode, &vm);
+
dsiclk = priv->dsiclk;
dsibclk = dsiclk / 4;
 
@@ -681,28 +684,28 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
switch (dsi_dev->format) {
case MIPI_DSI_FMT_RGB888:
val |= (0x3 << 4);
-   hact = mode->hdisplay * 3;
-   video_start = (mode->htotal - mode->hsync_start) * 3;
+   hact = vm.hactive * 3;
+   video_start = (vm.hsync_len + vm.hback_porch) * 3;
data_type = MIPI_DSI_PACKED_PIXEL_STREAM_24;
break;
case MIPI_DSI_FMT_RGB666:
val |= (0x4 << 4);
-   hact = mode->hdisplay * 3;
-   video_start = (mode->htotal - mode->hsync_start) * 3;
+   hact = vm.hactive * 3;
+   video_start = (vm.hsync_len + vm.hback_porch) * 3;
data_type = MIPI_DSI_PACKED_PIXEL_STREAM_18;
break;
 
case MIPI_DSI_FMT_RGB666_PACKED:
val |= (0x4 << 4) | BIT(3);
-   hact = mode->hdisplay * 18 / 8;
-   video_start = (mode->htotal - mode->hsync_start) * 18 / 8;
+   hact = vm.hactive * 18 / 8;
+   video_start = (vm.hsync_len + vm.hback_porch) * 18 / 8;
data_type = MIPI_DSI_PIXEL_STREAM_3BYTE_18;
break;
 
case MIPI_DSI_FMT_RGB565:
val |= (0x5 << 4);
-   hact = mode->hdisplay * 2;
-   video_start = (mode->htotal - mode->hsync_start) * 2;
+   hact = vm.hactive * 2;
+   video_start = (vm.hsync_len + vm.hback_porch) * 2;
data_type = MIPI_DSI_PACKED_PIXEL_STREAM_16;
break;
default:
@@ -814,43 +817,43 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
tc358768_write(priv, TC358768_DSI_EVENT, 0);
 
/* vact */
-   tc358768_write(priv, TC358768_DSI_VACT, mode->vdisplay);
+   tc358768_write(priv, TC358768_DSI_VACT, vm.vactive);
 
/* vsw */
-   tc358768_write(priv, TC358768_DSI_VSW,
-  mode->vsync_end - mode->vsync_start);
+   tc358768_write(priv, TC358768_DSI_VSW, vm.vsync_len);
+
/* vbp */
-   tc358768_write(priv, TC358768_DSI_VBPR,
-  mode->vtotal - mode->vsync_end);
+   tc358768_write(priv, TC358768_DSI_VBPR, vm.vback_porch);
 
/* hsw * byteclk * ndl / pclk */
-   val = (u32)div_u64((mode->hsync_end - mode->hsync_start) *
+   val = (u32)div_u64(vm.hsync_len *
   ((u64)priv->dsiclk / 4) * priv->dsi_lanes,
-  mode->clock * 1000);
+  vm.pixelclock);
tc358768_write(priv, TC358768_DSI_HSW, val);
 
/* hbp * byteclk * ndl / pclk */
-   val = (u32)div_u64((mode->htotal - mode->hsync_end) *
+   val = (u32)div_u64(vm.hback_porch *
   ((u64)priv->dsiclk / 4) * priv->dsi_lanes,
-  mode->clock * 1000);
+  vm.pixelclock);
tc358768_write(priv, TC358768_DSI_HBPR, val);
} else {
/* Set event mode */
tc358768_write(priv, TC358768_DSI_EVENT, 1);
 
/* vact */
-   tc358768_write(priv, TC358768_DSI_VACT, mode->vdisplay);
+   tc358768_write(priv, TC358768_DSI_VACT, vm.vactive);
 
   

[PATCH v3 11/12] drm/bridge: tc358768: Fix tc358768_ns_to_cnt()

2023-08-22 Thread Tomi Valkeinen
The tc358768_ns_to_cnt() function is, most likely, supposed to do a
div-round-up operation, but it misses subtracting one from the dividend.

Fix this by just using DIV_ROUND_UP().
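A worked example of the off-by-one (made-up numbers): with ns = 1 and
period_ps = 500, the old code computes (1000 + 500) / 500 = 3, while
DIV_ROUND_UP(1000, 500) = (1000 + 499) / 500 = 2, the correct round-up
of an exact multiple.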

Fixes: ff1ca6397b1d ("drm/bridge: Add tc358768 driver")
Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/bridge/tc358768.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index 9ce8d120b50c..f41bf56b7d6b 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -630,7 +630,7 @@ static int tc358768_setup_pll(struct tc358768_priv *priv,
 
 static u32 tc358768_ns_to_cnt(u32 ns, u32 period_ps)
 {
-   return (ns * 1000 + period_ps) / period_ps;
+   return DIV_ROUND_UP(ns * 1000, period_ps);
 }
 
 static u32 tc358768_ps_to_ns(u32 ps)

-- 
2.34.1



[PATCH v3 08/12] drm/bridge: tc358768: Use dev for dbg prints, not priv->dev

2023-08-22 Thread Tomi Valkeinen
Simplify the code by capturing the priv->dev value in a 'dev' variable
and using that.

Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/bridge/tc358768.c | 41 ---
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index 88060f961064..6297d28250e9 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -651,9 +651,10 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
const u32 internal_delay = 40;
int ret, i;
struct videomode vm;
+   struct device *dev = priv->dev;
 
if (mode_flags & MIPI_DSI_CLOCK_NON_CONTINUOUS) {
-   dev_warn_once(priv->dev, "Non-continuous mode unimplemented, 
falling back to continuous\n");
+   dev_warn_once(dev, "Non-continuous mode unimplemented, falling 
back to continuous\n");
mode_flags &= ~MIPI_DSI_CLOCK_NON_CONTINUOUS;
}
 
@@ -661,7 +662,7 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
 
ret = tc358768_sw_reset(priv);
if (ret) {
-   dev_err(priv->dev, "Software reset failed: %d\n", ret);
+   dev_err(dev, "Software reset failed: %d\n", ret);
tc358768_hw_disable(priv);
return;
}
@@ -669,7 +670,7 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
	mode = &bridge->encoder->crtc->state->adjusted_mode;
ret = tc358768_setup_pll(priv, mode);
if (ret) {
-   dev_err(priv->dev, "PLL setup failed: %d\n", ret);
+   dev_err(dev, "PLL setup failed: %d\n", ret);
tc358768_hw_disable(priv);
return;
}
@@ -709,7 +710,7 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
data_type = MIPI_DSI_PACKED_PIXEL_STREAM_16;
break;
default:
-   dev_err(priv->dev, "Invalid data format (%u)\n",
+   dev_err(dev, "Invalid data format (%u)\n",
dsi_dev->format);
tc358768_hw_disable(priv);
return;
@@ -733,65 +734,65 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
  dsibclk);
dsiclk_nsk = (u32)div_u64((u64)10 * TC358768_PRECISION, dsiclk);
ui_nsk = dsiclk_nsk / 2;
-   dev_dbg(priv->dev, "dsiclk_nsk: %u\n", dsiclk_nsk);
-   dev_dbg(priv->dev, "ui_nsk: %u\n", ui_nsk);
-   dev_dbg(priv->dev, "dsibclk_nsk: %u\n", dsibclk_nsk);
+   dev_dbg(dev, "dsiclk_nsk: %u\n", dsiclk_nsk);
+   dev_dbg(dev, "ui_nsk: %u\n", ui_nsk);
+   dev_dbg(dev, "dsibclk_nsk: %u\n", dsibclk_nsk);
 
/* LP11 > 100us for D-PHY Rx Init */
val = tc358768_ns_to_cnt(100 * 1000, dsibclk_nsk) - 1;
-   dev_dbg(priv->dev, "LINEINITCNT: %u\n", val);
+   dev_dbg(dev, "LINEINITCNT: %u\n", val);
tc358768_write(priv, TC358768_LINEINITCNT, val);
 
/* LPTimeCnt > 50ns */
val = tc358768_ns_to_cnt(50, dsibclk_nsk) - 1;
lptxcnt = val;
-   dev_dbg(priv->dev, "LPTXTIMECNT: %u\n", val);
+   dev_dbg(dev, "LPTXTIMECNT: %u\n", val);
tc358768_write(priv, TC358768_LPTXTIMECNT, val);
 
/* 38ns < TCLK_PREPARE < 95ns */
val = tc358768_ns_to_cnt(65, dsibclk_nsk) - 1;
-   dev_dbg(priv->dev, "TCLK_PREPARECNT %u\n", val);
+   dev_dbg(dev, "TCLK_PREPARECNT %u\n", val);
/* TCLK_PREPARE + TCLK_ZERO > 300ns */
val2 = tc358768_ns_to_cnt(300 - tc358768_to_ns(2 * ui_nsk),
  dsibclk_nsk) - 2;
-   dev_dbg(priv->dev, "TCLK_ZEROCNT %u\n", val2);
+   dev_dbg(dev, "TCLK_ZEROCNT %u\n", val2);
val |= val2 << 8;
tc358768_write(priv, TC358768_TCLK_HEADERCNT, val);
 
/* TCLK_TRAIL > 60ns AND TEOT <= 105 ns + 12*UI */
raw_val = tc358768_ns_to_cnt(60 + tc358768_to_ns(2 * ui_nsk), 
dsibclk_nsk) - 5;
val = clamp(raw_val, 0, 127);
-   dev_dbg(priv->dev, "TCLK_TRAILCNT: %u\n", val);
+   dev_dbg(dev, "TCLK_TRAILCNT: %u\n", val);
tc358768_write(priv, TC358768_TCLK_TRAILCNT, val);
 
/* 40ns + 4*UI < THS_PREPARE < 85ns + 6*UI */
val = 50 + tc358768_to_ns(4 * ui_nsk);
val = tc358768_ns_to_cnt(val, dsibclk_nsk) - 1;
-   dev_dbg(priv->dev, "THS_PREPARECNT %u\n", val);
+   dev_dbg(dev, "THS_PREPARECNT %u\n", val);
/* THS_PREPARE + THS_ZERO > 145ns + 10*UI */
raw_val = tc358768_ns_to_cnt(145 - tc358768_to_ns(3 * ui_nsk), 
dsibclk_nsk) - 10;
val2 = clamp(raw_val, 0, 127);
-   dev_dbg(priv->dev, "THS_ZEROCNT %u\n", val2);
+   dev_dbg(dev, "THS_ZEROCNT %u\n", val2);
val |= val2 << 8;
tc358768_write(priv, TC358768_THS_HEADERCNT, val);
 
/* TWAKEUP > 1ms in lptxcnt steps */
val = tc358768_ns_to_cnt(102, 

[PATCH v3 10/12] drm/bridge: tc358768: Clean up clock period code

2023-08-22 Thread Tomi Valkeinen
The driver defines TC358768_PRECISION as 1000, and uses "nsk" to refer
to clock periods. The original author does not remember where all this
came from. Effectively the driver is using picoseconds as the unit for
clock periods, yet referring to them by "nsk".

Clean this up by just saying the periods are in picoseconds.
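
For reference, with picosecond units the period calculation becomes a
plain PICO / rate (PICO being 10^12, cf. linux/units.h). A hedged
user-space sketch with a made-up 500 MHz HS byte clock:

#include <stdio.h>
#include <stdint.h>

#define PICO 1000000000000ULL	/* 10^12, cf. PICO in linux/units.h */

int main(void)
{
	uint64_t hsbyteclk = 500000000;			/* 500 MHz, example only */
	uint32_t period_ps = (uint32_t)(PICO / hsbyteclk);

	printf("hsbyteclk period: %u ps\n", period_ps);	/* 2000 ps */

	return 0;
}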

Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/bridge/tc358768.c | 60 +++
 1 file changed, 29 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index 0f117d673b14..9ce8d120b50c 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -627,15 +628,14 @@ static int tc358768_setup_pll(struct tc358768_priv *priv,
return tc358768_clear_error(priv);
 }
 
-#define TC358768_PRECISION 1000
-static u32 tc358768_ns_to_cnt(u32 ns, u32 period_nsk)
+static u32 tc358768_ns_to_cnt(u32 ns, u32 period_ps)
 {
-   return (ns * TC358768_PRECISION + period_nsk) / period_nsk;
+   return (ns * 1000 + period_ps) / period_ps;
 }
 
-static u32 tc358768_to_ns(u32 nsk)
+static u32 tc358768_ps_to_ns(u32 ps)
 {
-   return (nsk / TC358768_PRECISION);
+   return ps / 1000;
 }
 
 static void tc358768_bridge_pre_enable(struct drm_bridge *bridge)
@@ -646,7 +646,7 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
u32 val, val2, lptxcnt, hact, data_type;
s32 raw_val;
const struct drm_display_mode *mode;
-   u32 hsbyteclk_nsk, dsiclk_nsk, ui_nsk;
+   u32 hsbyteclk_ps, dsiclk_ps, ui_ps;
u32 dsiclk, hsbyteclk, video_start;
const u32 internal_delay = 40;
int ret, i;
@@ -730,67 +730,65 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
tc358768_write(priv, TC358768_D0W_CNTRL + i * 4, 0x);
 
/* DSI Timings */
-   hsbyteclk_nsk = (u32)div_u64((u64)10 * TC358768_PRECISION,
- hsbyteclk);
-   dsiclk_nsk = (u32)div_u64((u64)10 * TC358768_PRECISION, dsiclk);
-   ui_nsk = dsiclk_nsk / 2;
-   dev_dbg(dev, "dsiclk_nsk: %u\n", dsiclk_nsk);
-   dev_dbg(dev, "ui_nsk: %u\n", ui_nsk);
-   dev_dbg(dev, "hsbyteclk_nsk: %u\n", hsbyteclk_nsk);
+   hsbyteclk_ps = (u32)div_u64(PICO, hsbyteclk);
+   dsiclk_ps = (u32)div_u64(PICO, dsiclk);
+   ui_ps = dsiclk_ps / 2;
+   dev_dbg(dev, "dsiclk: %u ps, ui %u ps, hsbyteclk %u ps\n", dsiclk_ps,
+   ui_ps, hsbyteclk_ps);
 
/* LP11 > 100us for D-PHY Rx Init */
-   val = tc358768_ns_to_cnt(100 * 1000, hsbyteclk_nsk) - 1;
+   val = tc358768_ns_to_cnt(100 * 1000, hsbyteclk_ps) - 1;
dev_dbg(dev, "LINEINITCNT: %u\n", val);
tc358768_write(priv, TC358768_LINEINITCNT, val);
 
/* LPTimeCnt > 50ns */
-   val = tc358768_ns_to_cnt(50, hsbyteclk_nsk) - 1;
+   val = tc358768_ns_to_cnt(50, hsbyteclk_ps) - 1;
lptxcnt = val;
dev_dbg(dev, "LPTXTIMECNT: %u\n", val);
tc358768_write(priv, TC358768_LPTXTIMECNT, val);
 
/* 38ns < TCLK_PREPARE < 95ns */
-   val = tc358768_ns_to_cnt(65, hsbyteclk_nsk) - 1;
+   val = tc358768_ns_to_cnt(65, hsbyteclk_ps) - 1;
dev_dbg(dev, "TCLK_PREPARECNT %u\n", val);
/* TCLK_PREPARE + TCLK_ZERO > 300ns */
-   val2 = tc358768_ns_to_cnt(300 - tc358768_to_ns(2 * ui_nsk),
- hsbyteclk_nsk) - 2;
+   val2 = tc358768_ns_to_cnt(300 - tc358768_ps_to_ns(2 * ui_ps),
+ hsbyteclk_ps) - 2;
dev_dbg(dev, "TCLK_ZEROCNT %u\n", val2);
val |= val2 << 8;
tc358768_write(priv, TC358768_TCLK_HEADERCNT, val);
 
/* TCLK_TRAIL > 60ns AND TEOT <= 105 ns + 12*UI */
-   raw_val = tc358768_ns_to_cnt(60 + tc358768_to_ns(2 * ui_nsk), 
hsbyteclk_nsk) - 5;
+   raw_val = tc358768_ns_to_cnt(60 + tc358768_ps_to_ns(2 * ui_ps), 
hsbyteclk_ps) - 5;
val = clamp(raw_val, 0, 127);
dev_dbg(dev, "TCLK_TRAILCNT: %u\n", val);
tc358768_write(priv, TC358768_TCLK_TRAILCNT, val);
 
/* 40ns + 4*UI < THS_PREPARE < 85ns + 6*UI */
-   val = 50 + tc358768_to_ns(4 * ui_nsk);
-   val = tc358768_ns_to_cnt(val, hsbyteclk_nsk) - 1;
+   val = 50 + tc358768_ps_to_ns(4 * ui_ps);
+   val = tc358768_ns_to_cnt(val, hsbyteclk_ps) - 1;
dev_dbg(dev, "THS_PREPARECNT %u\n", val);
/* THS_PREPARE + THS_ZERO > 145ns + 10*UI */
-   raw_val = tc358768_ns_to_cnt(145 - tc358768_to_ns(3 * ui_nsk), 
hsbyteclk_nsk) - 10;
+   raw_val = tc358768_ns_to_cnt(145 - tc358768_ps_to_ns(3 * ui_ps), 
hsbyteclk_ps) - 10;
val2 = clamp(raw_val, 0, 127);
dev_dbg(dev, "THS_ZEROCNT %u\n", val2);
val |= val2 << 8;
tc358768_write(priv, TC358768_THS_HEADERCNT, val);
 
/* TWAKEUP > 1ms in 

[PATCH v3 12/12] drm/bridge: tc358768: Attempt to fix DSI horizontal timings

2023-08-22 Thread Tomi Valkeinen
The DSI horizontal timing calculations done by the driver seem to often
lead to underflows or overflows, depending on the videomode.

There are two main things the current driver doesn't seem to get right:
DSI HSW and HFP, and VSDly. However, even following Toshiba's
documentation it seems we don't always get a working display.

This patch attempts to fix the horizontal timings for DSI event mode, and
on a system with a DSI->HDMI encoder, a lot of standard HDMI modes now
seem to work. The work relies on Toshiba's documentation, but also quite
a bit on empirical testing.

This also adds timing related debug prints to make it easier to improve
on this later.

The DSI pulse mode has only been tested with a fixed-resolution panel,
which limits the testing of different modes on DSI pulse mode. However,
as the VSDly calculation also affects pulse mode, this might cause a
regression.
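
As an aid for reading the conversion helpers added below: a duration given
in DPI pixel clocks is scaled by the DSI byte rate (dsiclk / 4 per lane,
times the number of lanes) over the pixel clock, rounded up. A simplified
user-space sketch with made-up example numbers (the in-driver helper is
tc358768_dpi_to_dsi_bytes() in the diff):

#include <stdio.h>
#include <stdint.h>

/* loosely mirrors tc358768_dpi_to_dsi_bytes() from the patch below */
static uint32_t dpi_to_dsi_bytes(uint32_t pixels, uint64_t dsiclk,
				 uint32_t lanes, uint64_t pclk)
{
	uint64_t m = (uint64_t)pixels * (dsiclk / 4) * lanes;

	return (uint32_t)((m + pclk - 1) / pclk);	/* round up */
}

int main(void)
{
	/* e.g. a 44 px hsync at 148.5 MHz pclk, 445.5 MHz dsiclk, 4 lanes */
	printf("%u bytes\n", dpi_to_dsi_bytes(44, 445500000, 4, 148500000));

	return 0;
}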

Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/bridge/tc358768.c | 211 +-
 1 file changed, 183 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index f41bf56b7d6b..b465e0a31d09 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -157,6 +158,7 @@ struct tc358768_priv {
u32 frs;/* PLL Freqency range for HSCK (post divider) */
 
u32 dsiclk; /* pll_clk / 2 */
+   u32 pclk;   /* incoming pclk rate */
 };
 
 static inline struct tc358768_priv *dsi_host_to_tc358768(struct mipi_dsi_host
@@ -380,6 +382,7 @@ static int tc358768_calc_pll(struct tc358768_priv *priv,
priv->prd = best_prd;
priv->frs = frs;
priv->dsiclk = best_pll / 2;
+   priv->pclk = mode->clock * 1000;
 
return 0;
 }
@@ -638,6 +641,28 @@ static u32 tc358768_ps_to_ns(u32 ps)
return ps / 1000;
 }
 
+static u32 tc358768_dpi_to_ns(u32 val, u32 pclk)
+{
+   return (u32)div_u64((u64)val * NANO, pclk);
+}
+
+/* Convert value in DPI pixel clock units to DSI byte count */
+static u32 tc358768_dpi_to_dsi_bytes(struct tc358768_priv *priv, u32 val)
+{
+   u64 m = (u64)val * priv->dsiclk / 4 * priv->dsi_lanes;
+   u64 n = priv->pclk;
+
+   return (u32)div_u64(m + n - 1, n);
+}
+
+static u32 tc358768_dsi_bytes_to_ns(struct tc358768_priv *priv, u32 val)
+{
+   u64 m = (u64)val * NANO;
+   u64 n = priv->dsiclk / 4 * priv->dsi_lanes;
+
+   return (u32)div_u64(m, n);
+}
+
 static void tc358768_bridge_pre_enable(struct drm_bridge *bridge)
 {
struct tc358768_priv *priv = bridge_to_tc358768(bridge);
@@ -647,11 +672,19 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
s32 raw_val;
const struct drm_display_mode *mode;
u32 hsbyteclk_ps, dsiclk_ps, ui_ps;
-   u32 dsiclk, hsbyteclk, video_start;
-   const u32 internal_delay = 40;
+   u32 dsiclk, hsbyteclk;
int ret, i;
struct videomode vm;
struct device *dev = priv->dev;
+   /* In pixelclock units */
+   u32 dpi_htot, dpi_data_start;
+   /* In byte units */
+   u32 dsi_dpi_htot, dsi_dpi_data_start;
+   u32 dsi_hsw, dsi_hbp, dsi_hact, dsi_hfp;
+   const u32 dsi_hss = 4; /* HSS is a short packet (4 bytes) */
+   /* In hsbyteclk units */
+   u32 dsi_vsdly;
+   const u32 internal_dly = 40;
 
if (mode_flags & MIPI_DSI_CLOCK_NON_CONTINUOUS) {
dev_warn_once(dev, "Non-continuous mode unimplemented, falling 
back to continuous\n");
@@ -686,27 +719,23 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
case MIPI_DSI_FMT_RGB888:
val |= (0x3 << 4);
hact = vm.hactive * 3;
-   video_start = (vm.hsync_len + vm.hback_porch) * 3;
data_type = MIPI_DSI_PACKED_PIXEL_STREAM_24;
break;
case MIPI_DSI_FMT_RGB666:
val |= (0x4 << 4);
hact = vm.hactive * 3;
-   video_start = (vm.hsync_len + vm.hback_porch) * 3;
data_type = MIPI_DSI_PACKED_PIXEL_STREAM_18;
break;
 
case MIPI_DSI_FMT_RGB666_PACKED:
val |= (0x4 << 4) | BIT(3);
hact = vm.hactive * 18 / 8;
-   video_start = (vm.hsync_len + vm.hback_porch) * 18 / 8;
data_type = MIPI_DSI_PIXEL_STREAM_3BYTE_18;
break;
 
case MIPI_DSI_FMT_RGB565:
val |= (0x5 << 4);
hact = vm.hactive * 2;
-   video_start = (vm.hsync_len + vm.hback_porch) * 2;
data_type = MIPI_DSI_PACKED_PIXEL_STREAM_16;
break;
default:
@@ -716,9 +745,150 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
return;
}
 
+   /*
+* There are three 

[PATCH v3 01/12] drm/tegra: rgb: Parameterize V- and H-sync polarities

2023-08-22 Thread Tomi Valkeinen
From: Thierry Reding 

The polarities of the V- and H-sync signals are encoded as flags in the
display mode, so use the existing information to setup the signals for
the RGB interface.

Signed-off-by: Thierry Reding 
Cc: Thierry Reding 
[tomi.valkei...@ideasonboard.com: default to positive sync]
Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/tegra/rgb.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/tegra/rgb.c b/drivers/gpu/drm/tegra/rgb.c
index 79566c9ea8ff..fc66bbd913b2 100644
--- a/drivers/gpu/drm/tegra/rgb.c
+++ b/drivers/gpu/drm/tegra/rgb.c
@@ -99,6 +99,7 @@ static void tegra_rgb_encoder_disable(struct drm_encoder 
*encoder)
 
 static void tegra_rgb_encoder_enable(struct drm_encoder *encoder)
 {
+   struct drm_display_mode *mode = >crtc->state->adjusted_mode;
struct tegra_output *output = encoder_to_output(encoder);
struct tegra_rgb *rgb = to_rgb(output);
u32 value;
@@ -108,10 +109,19 @@ static void tegra_rgb_encoder_enable(struct drm_encoder 
*encoder)
value = DE_SELECT_ACTIVE | DE_CONTROL_NORMAL;
tegra_dc_writel(rgb->dc, value, DC_DISP_DATA_ENABLE_OPTIONS);
 
-   /* XXX: parameterize? */
+   /* configure H- and V-sync signal polarities */
value = tegra_dc_readl(rgb->dc, DC_COM_PIN_OUTPUT_POLARITY(1));
-   value &= ~LVS_OUTPUT_POLARITY_LOW;
-   value &= ~LHS_OUTPUT_POLARITY_LOW;
+
+   if (mode->flags & DRM_MODE_FLAG_NHSYNC)
+   value |= LHS_OUTPUT_POLARITY_LOW;
+   else
+   value &= ~LHS_OUTPUT_POLARITY_LOW;
+
+   if (mode->flags & DRM_MODE_FLAG_NVSYNC)
+   value |= LVS_OUTPUT_POLARITY_LOW;
+   else
+   value &= ~LVS_OUTPUT_POLARITY_LOW;
+
tegra_dc_writel(rgb->dc, value, DC_COM_PIN_OUTPUT_POLARITY(1));
 
/* XXX: parameterize? */

-- 
2.34.1



[PATCH v3 05/12] drm/bridge: tc358768: Cleanup PLL calculations

2023-08-22 Thread Tomi Valkeinen
As is quite common, some of TC358768's PLL register fields are to be
programmed with (value - 1). Specifically, the FBD (multiplier) and PRD
(divider) are such fields.

However, what the driver currently does is that it considers that the
formula used for PLL rate calculation is:

RefClk * [(FBD + 1)/ (PRD + 1)] * [1 / (2^FRS)]

where FBD and PRD are values directly from the registers, while a more
sensible way to look at it is:

RefClk * FBD / PRD * (1 / (2^FRS))

and when the FBD and PRD values are written to the registers, they will
be subtracted by one.

Change the driver accordingly, as it simplifies the PLL code.
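
For illustration, the cleaned-up formula is easy to check by hand; a small
user-space sketch (the refclk, FBD, PRD and FRS values below are made up):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t refclk = 38400000;	/* 38.4 MHz reference, example only */
	uint32_t fbd = 116, prd = 5, frs = 1;

	/* pll_clk = RefClk * FBD / PRD * (1 / (2^FRS)) */
	uint64_t pll = refclk * fbd / prd / (1u << frs);

	printf("pll: %llu Hz\n", (unsigned long long)pll);	/* 445440000 */

	return 0;
}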

Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/bridge/tc358768.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index 2af23f1e..a465674f1e2e 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -316,7 +316,7 @@ static int tc358768_calc_pll(struct tc358768_priv *priv,
 
target_pll = tc358768_pclk_to_pll(priv, mode->clock * 1000);
 
-   /* pll_clk = RefClk * [(FBD + 1)/ (PRD + 1)] * [1 / (2^FRS)] */
+   /* pll_clk = RefClk * FBD / PRD * (1 / (2^FRS)) */
 
for (i = 0; i < ARRAY_SIZE(frs_limits); i++)
if (target_pll >= frs_limits[i])
@@ -336,19 +336,19 @@ static int tc358768_calc_pll(struct tc358768_priv *priv,
best_prd = 0;
best_fbd = 0;
 
-   for (prd = 0; prd < 16; ++prd) {
-   u32 divisor = (prd + 1) * (1 << frs);
+   for (prd = 1; prd <= 16; ++prd) {
+   u32 divisor = prd * (1 << frs);
u32 fbd;
 
-   for (fbd = 0; fbd < 512; ++fbd) {
+   for (fbd = 1; fbd <= 512; ++fbd) {
u32 pll, diff, pll_in;
 
-   pll = (u32)div_u64((u64)refclk * (fbd + 1), divisor);
+   pll = (u32)div_u64((u64)refclk * fbd, divisor);
 
if (pll >= max_pll || pll < min_pll)
continue;
 
-   pll_in = (u32)div_u64((u64)refclk, prd + 1);
+   pll_in = (u32)div_u64((u64)refclk, prd);
if (pll_in < 400)
continue;
 
@@ -611,7 +611,7 @@ static int tc358768_setup_pll(struct tc358768_priv *priv,
mode->clock * 1000);
 
/* PRD[15:12] FBD[8:0] */
-   tc358768_write(priv, TC358768_PLLCTL0, (prd << 12) | fbd);
+   tc358768_write(priv, TC358768_PLLCTL0, ((prd - 1) << 12) | (fbd - 1));
 
/* FRS[11:10] LBWS[9:8] CKEN[4] RESETB[1] EN[0] */
tc358768_write(priv, TC358768_PLLCTL1,

-- 
2.34.1



[PATCH v3 03/12] drm/bridge: tc358768: Default to positive h/v syncs

2023-08-22 Thread Tomi Valkeinen
As the TC358768 is a DPI to DSI bridge, the DSI side does not need to
define h/v sync polarities. This means that sometimes we have a mode
without defined sync polarities, which does not work on the DPI side.

Add a mode_fixup hook to default to positive sync polarities.

Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/bridge/tc358768.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index bc97a837955b..963ac550509b 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -963,9 +963,27 @@ tc358768_atomic_get_input_bus_fmts(struct drm_bridge 
*bridge,
return input_fmts;
 }
 
+static bool tc358768_mode_fixup(struct drm_bridge *bridge,
+   const struct drm_display_mode *mode,
+   struct drm_display_mode *adjusted_mode)
+{
+   /* Default to positive sync */
+
+   if (!(adjusted_mode->flags &
+ (DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_NHSYNC)))
+   adjusted_mode->flags |= DRM_MODE_FLAG_PHSYNC;
+
+   if (!(adjusted_mode->flags &
+ (DRM_MODE_FLAG_PVSYNC | DRM_MODE_FLAG_NVSYNC)))
+   adjusted_mode->flags |= DRM_MODE_FLAG_PVSYNC;
+
+   return true;
+}
+
 static const struct drm_bridge_funcs tc358768_bridge_funcs = {
.attach = tc358768_bridge_attach,
.mode_valid = tc358768_bridge_mode_valid,
+   .mode_fixup = tc358768_mode_fixup,
.pre_enable = tc358768_bridge_pre_enable,
.enable = tc358768_bridge_enable,
.disable = tc358768_bridge_disable,

-- 
2.34.1



[PATCH v3 04/12] drm/bridge: tc358768: Fix bit updates

2023-08-22 Thread Tomi Valkeinen
The driver has a few places where it does:

if (thing_is_enabled_in_config)
update_thing_bit_in_hw()

This means that if the thing is _not_ enabled, the bit never gets
cleared. This affects the h/vsyncs and continuous DSI clock bits.

Fix the driver to always update the bit.
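
In other words, the register bit has to be written from the current
configuration on every enable, so that it is also cleared when the feature
is off. A minimal stand-alone sketch of the pattern (the helper here is
hypothetical, not the actual driver code):

#include <stdio.h>
#include <stdint.h>

/* stand-in for tc358768_update_bits() */
static void update_bits(uint32_t *reg, uint32_t mask, uint32_t val)
{
	*reg = (*reg & ~mask) | (val & mask);
}

int main(void)
{
	uint32_t reg = 1u << 5;		/* bit left over from a previous mode */
	int feature_enabled = 0;

	/*
	 * Buggy pattern, the bit is never cleared:
	 *	if (feature_enabled)
	 *		update_bits(&reg, 1u << 5, 1u << 5);
	 *
	 * Fixed pattern, the bit always follows the config:
	 */
	update_bits(&reg, 1u << 5, feature_enabled ? 1u << 5 : 0);

	printf("reg = %#x\n", reg);	/* 0 */

	return 0;
}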

Fixes: ff1ca6397b1d ("drm/bridge: Add tc358768 driver")
Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/bridge/tc358768.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index 963ac550509b..2af23f1e 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -794,8 +794,8 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
val |= BIT(i + 1);
tc358768_write(priv, TC358768_HSTXVREGEN, val);
 
-   if (!(mode_flags & MIPI_DSI_CLOCK_NON_CONTINUOUS))
-   tc358768_write(priv, TC358768_TXOPTIONCNTRL, 0x1);
+   tc358768_write(priv, TC358768_TXOPTIONCNTRL,
+  (mode_flags & MIPI_DSI_CLOCK_NON_CONTINUOUS) ? 0 : 
BIT(0));
 
/* TXTAGOCNT[26:16] RXTASURECNT[10:0] */
val = tc358768_to_ns((lptxcnt + 1) * dsibclk_nsk * 4);
@@ -861,11 +861,12 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
tc358768_write(priv, TC358768_DSI_HACT, hact);
 
/* VSYNC polarity */
-   if (!(mode->flags & DRM_MODE_FLAG_NVSYNC))
-   tc358768_update_bits(priv, TC358768_CONFCTL, BIT(5), BIT(5));
+   tc358768_update_bits(priv, TC358768_CONFCTL, BIT(5),
+(mode->flags & DRM_MODE_FLAG_PVSYNC) ? BIT(5) : 0);
+
/* HSYNC polarity */
-   if (mode->flags & DRM_MODE_FLAG_PHSYNC)
-   tc358768_update_bits(priv, TC358768_PP_MISC, BIT(0), BIT(0));
+   tc358768_update_bits(priv, TC358768_PP_MISC, BIT(0),
+(mode->flags & DRM_MODE_FLAG_PHSYNC) ? BIT(0) : 0);
 
/* Start DSI Tx */
tc358768_write(priv, TC358768_DSI_START, 0x1);

-- 
2.34.1



[PATCH v3 02/12] drm/bridge: tc358768: Fix use of uninitialized variable

2023-08-22 Thread Tomi Valkeinen
smatch reports:

drivers/gpu/drm/bridge/tc358768.c:223 tc358768_update_bits() error: 
uninitialized symbol 'orig'.

Fix this by bailing out from tc358768_update_bits() if the
tc358768_read() produces an error.

Fixes: ff1ca6397b1d ("drm/bridge: Add tc358768 driver")
Reviewed-by: Peter Ujfalusi 
Signed-off-by: Tomi Valkeinen 
---
 drivers/gpu/drm/bridge/tc358768.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index 819a4b6ec2a0..bc97a837955b 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -216,6 +216,10 @@ static void tc358768_update_bits(struct tc358768_priv 
*priv, u32 reg, u32 mask,
u32 tmp, orig;
 
tc358768_read(priv, reg, );
+
+   if (priv->error)
+   return;
+
tmp = orig & ~mask;
tmp |= val & mask;
if (tmp != orig)

-- 
2.34.1



[PATCH v3 00/12] drm/bridge: tc358768: Fixes and timings improvements

2023-08-22 Thread Tomi Valkeinen
This series contains various fixes and cleanups for TC358768. The target
of this work is to get TC358768 working on Toradex's AM62 based board,
which has the following display pipeline:

AM62 DPI -> TC358768 -> LT8912B -> HDMI connector

The main thing the series does is to improve the DSI HSW, HFP and VSDly
calculations.

 Tomi

Signed-off-by: Tomi Valkeinen 
---
Changes in v3:
- Add Peter's reviewed-bys
- Move "Default to positive h/v syncs" earlier in the series to avoid
  regression in the middle of the series
- Link to v2: 
https://lore.kernel.org/r/20230816-tc358768-v2-0-242b9d5f7...@ideasonboard.com

Changes in v2:
- Add "drm/tegra: rgb: Parameterize V- and H-sync polarities" so that
  Tegra can configure the polarities correctly.
- Add "drm/bridge: tc358768: Default to positive h/v syncs" as we don't
  (necessarily) have the polarities set in the mode.
- Drop "drm/bridge: tc358768: Add DRM_BRIDGE_ATTACH_NO_CONNECTOR
  support" as it's not needed for DRM_BRIDGE_ATTACH_NO_CONNECTOR
  support.
- Link to v1: 
https://lore.kernel.org/r/20230804-tc358768-v1-0-1afd44b78...@ideasonboard.com

---
Thierry Reding (1):
  drm/tegra: rgb: Parameterize V- and H-sync polarities

Tomi Valkeinen (11):
  drm/bridge: tc358768: Fix use of uninitialized variable
  drm/bridge: tc358768: Default to positive h/v syncs
  drm/bridge: tc358768: Fix bit updates
  drm/bridge: tc358768: Cleanup PLL calculations
  drm/bridge: tc358768: Use struct videomode
  drm/bridge: tc358768: Print logical values, not raw register values
  drm/bridge: tc358768: Use dev for dbg prints, not priv->dev
  drm/bridge: tc358768: Rename dsibclk to hsbyteclk
  drm/bridge: tc358768: Clean up clock period code
  drm/bridge: tc358768: Fix tc358768_ns_to_cnt()
  drm/bridge: tc358768: Attempt to fix DSI horizontal timings

 drivers/gpu/drm/bridge/tc358768.c | 381 --
 drivers/gpu/drm/tegra/rgb.c   |  16 +-
 2 files changed, 295 insertions(+), 102 deletions(-)
---
base-commit: 25205087df1ffe06ccea9302944ed1f77dc68c6f
change-id: 20230804-tc358768-1b6949ef2e3d

Best regards,
-- 
Tomi Valkeinen 



RE: [PATCH v10 0/4] Add RZ/{G2L, G2LC} and RZ/V2L Display Unit support

2023-08-22 Thread Biju Das
Hi Laurent and all,

Gentle ping. Are we happy with this patch series?

I will send follow up fixes if we find any issues later.

Cheers,
Biju

> Subject: RE: [PATCH v10 0/4] Add RZ/{G2L,G2LC} and RZ/V2L Display Unit
> support
> 
> Hi Laurent and all,
> 
> Gentle ping. Are we ok with this patch series?
> 
> Cheers,
> Biju
> 
> > Subject: [PATCH v10 0/4] Add RZ/{G2L,G2LC} and RZ/V2L Display Unit
> > support
> >
> > This patch series aims to add support for the RZ/G2L DU DRM driver.
> >
> > The RZ/G2L LCD controller is composed of a Frame Compression Processor (FCPVD),
> > a Video Signal Processor (VSPD) and a Display Unit (DU). The output of the LCDC
> > is connected to the Display Parallel Interface and the MIPI link video interface.
> >
> > The output from DSI is connected to ADV7535.
> >
> > Ref:
> >
> >
> >
> > This patch series is tested with [2]
> > [2]
> >
> > v9->v10:
> >  * patch#1 is mainlined, so dropped from this series.
> >  * Added Rb tag from Laurent for the binding patch.
> >  * Updated the commit description.
> >  * Updated description of the port by dropping the text "specified in
> >Documentation/devicetree/bindings/graph.txt."
> >  * Dropped empty endpoint from example.
> >  * Dropped ARM64 dependency from Kconfig.
> >  * Sorted the configs alphabetically in Kconfig.
> >  * Dropped DRM_RCAR_VSP config option and make DRM_RZG2L_DU depend on
> >VIDEO_RENESAS_VSP1.
> >  * On rzg2l_du_crtc_set_display_timing() replaced the setting of parent
> >clk rate with dclk rate.
> >  * Added rzg2l_du_write() wrapper function.
> >  * Updated the comment atomic_begin->atomic_flush.
> >  * Dropped .atomic_check and .atomic_begin callback
> >  * Renamed __rzg2l_du_crtc_plane_atomic_check->__rzg2l_du_vsp_plane_atomic_check
> >    and moved it to rzg2l_du_vsp.c
> >  * Added struct clk in rzg2l_du_crtc.h
> >  * Dropped the variables mmio_offset,index,vblank_lock,vblank_wait,
> >vblank_count from struct rzg2l_du_crtc.
> >  * Replaced the macro to_rzg2l_crtc with static inline functions.
> >  * Dropped the unneeded header files clk.h, io.h, mm.h, pm.h, slab.h,
> >wait.h and drm_managed.h from rzg2l_du_drv.c.
> >  * Replaced DRM_INFO->drm_info
> >  * Dropped the callbacks prime_handle_to_fd, prime_fd_to_handle and
> >gem_prime_mmap.
> >  * Replaced the callback remove->remove_new.
> >  * Dropped header file wait.h and added forward declarations struct
> > clk and
> >rzg2l_du_device from rzg2l_du_drv.h.
> >  * Dropped the dsi and dpad0_source variables from struct
> rzg2l_du_device.
> >  * Replaced the macro to_rzg2l_encoder with static inline functions.
> >  * Dropped header files dma-buf.h and wait.h from rzg2l_du_kms.c.
> >  * Dropped struct sg_table and added the scatterlist.h header file in
> >rzg2l_du_vsp.h
> >  * Added container_of.h header file, forward declarations struct
> > device and
> >struct rzg2l_du_device in rzg2l_du_vsp.h.
> > v8->v9:
> >  * Added Rb tag from Laurent and Acked-by tag from Kieran for patch#1.
> >  * Added Rb tag from Laurent and Geert for patch#3.
> >  * Dropped reset_control_assert() from the error path for
> > rzg2l_du_crtc_get() as
> >suggested by Philipp Zabel.
> >  * Added Rb tag from Laurent for patch#5.
> >  * Updated MAINTAINERS entries for common parts(Makefile and Kconfig).
> > v7->v8:
> >  * Moved rcar-du and shmobile DRM drivers to renesas specific vendor
> > directory.
> >  * Fixed the typo vsp2->du in RZ/V2L DU bindings patch.
> >  * Added Rb tag from Rob for RZ/V2L DU bindings patch.
> >  * Dropped RCar du lib and created RZ/G2L DU DRM driver by creating
> > rz_du folder.
> >  * Updated MAINTAINERS entries.
> > v6->v7:
> >  * Split DU lib and  RZ/G2L du driver as separate patch series as
> >DU support added to more platforms based on RZ/G2L alike SoCs.
> >  * Rebased to latest drm-tip.
> >  * Added patch #2 for binding support for RZ/V2L DU
> >  * Added patch #4 for driver support for RZ/V2L DU
> >  * Added patch #5 for SoC DTSI support for RZ/G2L DU
> >  * Added patch #6 for SoC DTSI support for RZ/V2L DU
> >  * Added patch #7 for Enabling DU on SMARC EVK based on RZ/{G2L,V2L}
> SoCs.
> >  * Added patch #8 for Enabling DU on SMARC EVK based on RZ/G2LC SoC.
> > v5->v6:
> >  * Merged DU lib and RZ/G2L du driver in same patch series
> >  * Rebased to latest drm-misc.
> >  * Merged patch#1 to RZ/G2L Driver patch.
> >  * Updated KConfig dependency from ARCH_RENESAS->ARCH_RZG2L.
> >  * Optimized rzg2l_du_output_name() by removing unsupported outputs.
> >
> > v4->v5:
> >  * Added Rb tag from Rob for binding patch.
> >  * Started using RCar DU libs(kms, vsp and encoder)
> >  * Started using rcar_du_device, rcar_du_write, rcar_du_crtc,
> >rcar_du_format_info and rcar_du_encoder.
> > v3->v4:
> >  * Changed compatible name from
> > renesas,du-r9a07g044->renesas,r9a07g044-du
> >  * started using same compatible for RZ/G2{L,LC}
> >  * Removed rzg2l_du_group.h and struct rzg2l_du_group
> >  * Renamed __rzg2l_du_group_start_stop->rzg2l_du_start_stop
> >  * Removed 

Re: [PATCH v2 03/12] drm/bridge: tc358768: Fix bit updates

2023-08-22 Thread Tomi Valkeinen

On 22/08/2023 01:22, Maxim Schwalm wrote:

Hi Tomi,

On 16.08.23 13:25, Tomi Valkeinen wrote:

The driver has a few places where it does:

if (thing_is_enabled_in_config)
update_thing_bit_in_hw()

This means that if the thing is _not_ enabled, the bit never gets
cleared. This affects the h/vsyncs and continuous DSI clock bits.

Fix the driver to always update the bit.

Fixes: ff1ca6397b1d ("drm/bridge: Add tc358768 driver")
Signed-off-by: Tomi Valkeinen 
---
  drivers/gpu/drm/bridge/tc358768.c | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/bridge/tc358768.c 
b/drivers/gpu/drm/bridge/tc358768.c
index bc97a837955b..b668f77673c3 100644
--- a/drivers/gpu/drm/bridge/tc358768.c
+++ b/drivers/gpu/drm/bridge/tc358768.c
@@ -794,8 +794,8 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
val |= BIT(i + 1);
tc358768_write(priv, TC358768_HSTXVREGEN, val);
  
-	if (!(mode_flags & MIPI_DSI_CLOCK_NON_CONTINUOUS))

-   tc358768_write(priv, TC358768_TXOPTIONCNTRL, 0x1);
+   tc358768_write(priv, TC358768_TXOPTIONCNTRL,
+  (mode_flags & MIPI_DSI_CLOCK_NON_CONTINUOUS) ? 0 : 
BIT(0));
  
  	/* TXTAGOCNT[26:16] RXTASURECNT[10:0] */

val = tc358768_to_ns((lptxcnt + 1) * dsibclk_nsk * 4);
@@ -861,11 +861,12 @@ static void tc358768_bridge_pre_enable(struct drm_bridge 
*bridge)
tc358768_write(priv, TC358768_DSI_HACT, hact);
  
  	/* VSYNC polarity */

-   if (!(mode->flags & DRM_MODE_FLAG_NVSYNC))
-   tc358768_update_bits(priv, TC358768_CONFCTL, BIT(5), BIT(5));
+   tc358768_update_bits(priv, TC358768_CONFCTL, BIT(5),
+(mode->flags & DRM_MODE_FLAG_PVSYNC) ? BIT(5) : 0);
+
/* HSYNC polarity */
-   if (mode->flags & DRM_MODE_FLAG_PHSYNC)
-   tc358768_update_bits(priv, TC358768_PP_MISC, BIT(0), BIT(0));
+   tc358768_update_bits(priv, TC358768_PP_MISC, BIT(0),
+(mode->flags & DRM_MODE_FLAG_PHSYNC) ? BIT(0) : 0);
  
  	/* Start DSI Tx */

tc358768_write(priv, TC358768_DSI_START, 0x1);



shouldn't the last patch of this series be moved before this one?
Currently, this patch will still lead to a temporary regression until
patch #12 is applied.


Indeed, good point. I'll change the patch order.

 Tomi



Re: [PATCH 1/1] drm/fourcc: Add documentation about software color conversion.

2023-08-22 Thread Jocelyn Falempe

On 22/08/2023 10:20, Pekka Paalanen wrote:

On Mon, 21 Aug 2023 17:55:33 +0200
Maxime Ripard  wrote:


Hi Pekka,

Thanks for answering

On Fri, Aug 18, 2023 at 04:24:15PM +0300, Pekka Paalanen wrote:

On Thu, 10 Aug 2023 09:45:27 +0200
Maxime Ripard  wrote:

On Mon, Aug 07, 2023 at 03:45:15PM +0200, Jocelyn Falempe wrote:

After discussions on IRC, the consensus is that the DRM drivers should
not do software color conversion, and only advertise the supported formats.
Update the doc accordingly so that the rule and exceptions are clear for
everyone.

Signed-off-by: Jocelyn Falempe 
---
  include/uapi/drm/drm_fourcc.h | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 8db7fd3f743e..00a29152da9f 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -38,6 +38,13 @@ extern "C" {
   * fourcc code, a Format Modifier may optionally be provided, in order to
   * further describe the buffer's format - for example tiling or compression.
   *
+ * DRM drivers should not do software color conversion, and only advertise the
+ * format they support in hardware. But there are two exceptions:


I would do a bullet list here:
https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks
   

+ * The first is to support XRGB if the hardware doesn't support it, because
+ * it's the de facto standard for userspace applications.


We can also provide a bit more context here, something like:

All drivers must support XRGB, even if the hardware cannot support
it. This has become the de-facto standard and a lot of user-space assume
it will be present.
   

+ * The second is to drop the unused bits when sending the data to the hardware,
+ * to improve the bandwidth, like dropping the "X" in XRGB.


I think it can be made a bit more generic, with something like:

Any driver is free to modify its internal representation of the format,
as long as it doesn't alter the visible content in any way. An example
would be to drop the padding component from a format to save some memory
bandwidth.


to my understanding and desire, the rule to not "fake" pixel format
support is strictly related to performance. When a KMS client does a
page flip, it usually does not expect a massive amount of CPU or GPU
work to occur just because of the flip. A name for such work is "copy",
referring to any kind of copying of large amounts of pixel data,
whether it includes a format conversion or not.


Should we add to the suggested documentation that it shouldn't
degrade performance and shouldn't be something that userspace can
notice?


I would let Sima (or Simon Ser) answer that, and verify my
understanding too.


This is especially important with GPU rendering and hardware video
playback systems, where any such copy could destroy the usability of
the whole system. This is the main reason why KMS must not do any
expensive processing unexpectedly (as in, not documented in UAPI).
Doing any kind of copy could cause a vblank to be missed, ruining
display timings.

I believe the above is the spirit of the rule.


That's totally reasonable to me :)


Then there will be exceptions. I'd like to think that everything below
(except for XRGB) can be derived from the above with common sense
- that's what I did.

XRGB support is the prime exception. I suspect it originates from
the legacy KMS UAPI, and the practise that XRGB has been widely
supported always. This makes it plausible for userspace to exist that
cannot produce any other format. Hence, it is good to support XRGB
through a conversion (copy) in the kernel for dumb buffers (that is,
for software rendered framebuffers). I would be very hesitant to extend
this exception to GPU rendered buffers, but OTOH if you have a GPU,
presumably you also have a display controller capable of scanning out
what the GPU renders, so you wouldn't even consider copying under the
hood.

DRM devices that cannot directly scan out buffers at all are a whole
category of exceptions. They include USB display adapters (literal USB,
not USB-C alt mode), perhaps networked and wireless displays, VKMS
which does everything in software, and so on. They simply have to
process the bulk pixel data with a CPU one way or another, and
hopefully they make use of damage rectangles to minimise the work.

Old-school special cursor planes may have been using special pixel
formats that may not be supported by userspace. Cursors are usually
small images and they can make a huge performance impact, so it makes
sense to support ARGB even with a CPU conversion.

Then we have display controllers without GPUs. Everything is
software-rendered. If it so happens that software rendering into sysram
and then copying (with conversion) into VRAM is more performant than
rendering into VRAM, then the copy is well justified.

Software-rendering into sysram and then copying into VRAM is actually
so 

RE: [PATCH AUTOSEL 5.10 3/3] drm/amdkfd: ignore crat by default

2023-08-22 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Sasha Levin 
> Sent: Tuesday, August 22, 2023 7:37 AM
> To: linux-ker...@vger.kernel.org; sta...@vger.kernel.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Koenig, Christian ;
> Mike Lothian ; Sasha Levin ; Pan,
> Xinhui ; airl...@gmail.com; dan...@ffwll.ch; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Subject: [PATCH AUTOSEL 5.10 3/3] drm/amdkfd: ignore crat by default
>
> From: Alex Deucher 
>
> [ Upstream commit a6dea2d64ff92851e68cd4e20a35f6534286e016 ]
>
> We are dropping the IOMMUv2 path, so no need to enable this.
> It's often buggy on consumer platforms anyway.

This is not needed for stable.

Alex

>
> Reviewed-by: Felix Kuehling 
> Acked-by: Christian König 
> Tested-by: Mike Lothian 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 86b4dadf772e3..61fea0d268b96 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -749,11 +749,7 @@ static bool kfd_ignore_crat(void)
>   if (ignore_crat)
>   return true;
>
> -#ifndef KFD_SUPPORT_IOMMU_V2
>   ret = true;
> -#else
> - ret = false;
> -#endif
>
>   return ret;
>  }
> --
> 2.40.1



RE: [PATCH AUTOSEL 6.1 10/10] drm/amdkfd: disable IOMMUv2 support for Raven

2023-08-22 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Sasha Levin 
> Sent: Tuesday, August 22, 2023 7:36 AM
> To: linux-ker...@vger.kernel.org; sta...@vger.kernel.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Koenig, Christian ;
> Mike Lothian ; Sasha Levin ; Pan,
> Xinhui ; airl...@gmail.com; dan...@ffwll.ch; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Subject: [PATCH AUTOSEL 6.1 10/10] drm/amdkfd: disable IOMMUv2
> support for Raven
>
> From: Alex Deucher 
>
> [ Upstream commit 091ae5473f96ced844af6ba39b94757359b12348 ]
>
> Use the dGPU path instead.  There were a lot of platform issues with IOMMU
> in general on these chips due to windows not enabling IOMMU at the time.
> The dGPU path has been used for a long time with newer APUs and works
> fine.  This also paves the way to simplify the driver significantly.


This is not needed for stable.

Alex

>
> Reviewed-by: Felix Kuehling 
> Acked-by: Christian König 
> Tested-by: Mike Lothian 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 7 ---
>  1 file changed, 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 4cc5debdd119b..af18378e58d9f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -185,11 +185,6 @@ static void kfd_device_info_init(struct kfd_dev *kfd,
>
>   kfd_device_info_set_event_interrupt_class(kfd);
>
> - /* Raven */
> - if (gc_version == IP_VERSION(9, 1, 0) ||
> - gc_version == IP_VERSION(9, 2, 2))
> - kfd->device_info.needs_iommu_device = true;
> -
>   if (gc_version < IP_VERSION(11, 0, 0)) {
>   /* Navi2x+, Navi1x+ */
>   if (gc_version == IP_VERSION(10, 3, 6)) @@ -287,7
> +282,6 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev,
> bool vf)
>   gfx_target_version = 9;
>   f2g = _v9_kfd2kgd;
>   break;
> -#ifdef KFD_SUPPORT_IOMMU_V2
>   /* Raven */
>   case IP_VERSION(9, 1, 0):
>   case IP_VERSION(9, 2, 2):
> @@ -295,7 +289,6 @@ struct kfd_dev *kgd2kfd_probe(struct
> amdgpu_device *adev, bool vf)
>   if (!vf)
>   f2g = _v9_kfd2kgd;
>   break;
> -#endif
>   /* Vega12 */
>   case IP_VERSION(9, 2, 1):
>   gfx_target_version = 90004;
> --
> 2.40.1



RE: [PATCH AUTOSEL 5.15 6/6] drm/amdkfd: ignore crat by default

2023-08-22 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Sasha Levin 
> Sent: Tuesday, August 22, 2023 7:37 AM
> To: linux-ker...@vger.kernel.org; sta...@vger.kernel.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Koenig, Christian ;
> Mike Lothian ; Sasha Levin ; Pan,
> Xinhui ; airl...@gmail.com; dan...@ffwll.ch; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Subject: [PATCH AUTOSEL 5.15 6/6] drm/amdkfd: ignore crat by default
>
> From: Alex Deucher 
>
> [ Upstream commit a6dea2d64ff92851e68cd4e20a35f6534286e016 ]
>
> We are dropping the IOMMUv2 path, so no need to enable this.
> It's often buggy on consumer platforms anyway.

This is not needed for stable.

Alex

>
> Reviewed-by: Felix Kuehling 
> Acked-by: Christian König 
> Tested-by: Mike Lothian 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index e574aa32a111d..46dfd9baeb013 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -1523,11 +1523,7 @@ static bool kfd_ignore_crat(void)
>   if (ignore_crat)
>   return true;
>
> -#ifndef KFD_SUPPORT_IOMMU_V2
>   ret = true;
> -#else
> - ret = false;
> -#endif
>
>   return ret;
>  }
> --
> 2.40.1



RE: [PATCH AUTOSEL 6.1 09/10] drm/amdkfd: disable IOMMUv2 support for KV/CZ

2023-08-22 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Sasha Levin 
> Sent: Tuesday, August 22, 2023 7:36 AM
> To: linux-ker...@vger.kernel.org; sta...@vger.kernel.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Koenig, Christian ;
> Mike Lothian ; Sasha Levin ; Pan,
> Xinhui ; airl...@gmail.com; dan...@ffwll.ch; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Subject: [PATCH AUTOSEL 6.1 09/10] drm/amdkfd: disable IOMMUv2
> support for KV/CZ
>
> From: Alex Deucher 
>
> [ Upstream commit 616f92d188ee7142a95a52068efdbea82645f859 ]
>
> Use the dGPU path instead.  There were a lot of platform issues with IOMMU
> in general on these chips due to windows not enabling IOMMU at the time.
> The dGPU path has been used for a long time with newer APUs and works
> fine.  This also paves the way to simplify the driver significantly.

This is not needed for stable.

Alex

>
> v2: use the dGPU queue manager functions
>
> Reviewed-by: Felix Kuehling 
> Acked-by: Christian König 
> Tested-by: Mike Lothian 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 6 --
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 +---
>  2 files changed, 1 insertion(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 27820f0a282d1..4cc5debdd119b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -216,10 +216,6 @@ static void kfd_device_info_init(struct kfd_dev *kfd,
>   asic_type != CHIP_TONGA)
>   kfd->device_info.supports_cwsr = true;
>
> - if (asic_type == CHIP_KAVERI ||
> - asic_type == CHIP_CARRIZO)
> - kfd->device_info.needs_iommu_device = true;
> -
>   if (asic_type != CHIP_HAWAII && !vf)
>   kfd->device_info.needs_pci_atomics = true;
>   }
> @@ -233,7 +229,6 @@ struct kfd_dev *kgd2kfd_probe(struct
> amdgpu_device *adev, bool vf)
>   uint32_t gfx_target_version = 0;
>
>   switch (adev->asic_type) {
> -#ifdef KFD_SUPPORT_IOMMU_V2
>  #ifdef CONFIG_DRM_AMDGPU_CIK
>   case CHIP_KAVERI:
>   gfx_target_version = 7;
> @@ -246,7 +241,6 @@ struct kfd_dev *kgd2kfd_probe(struct
> amdgpu_device *adev, bool vf)
>   if (!vf)
>   f2g = _v8_kfd2kgd;
>   break;
> -#endif
>  #ifdef CONFIG_DRM_AMDGPU_CIK
>   case CHIP_HAWAII:
>   gfx_target_version = 70001;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index c06ada0844ba1..5616a722578f5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -2335,18 +2335,12 @@ struct device_queue_manager
> *device_queue_manager_init(struct kfd_dev *dev)
>   }
>
>   switch (dev->adev->asic_type) {
> - case CHIP_CARRIZO:
> - device_queue_manager_init_vi(>asic_ops);
> - break;
> -
>   case CHIP_KAVERI:
> - device_queue_manager_init_cik(>asic_ops);
> - break;
> -
>   case CHIP_HAWAII:
>   device_queue_manager_init_cik_hawaii(>asic_ops);
>   break;
>
> + case CHIP_CARRIZO:
>   case CHIP_TONGA:
>   case CHIP_FIJI:
>   case CHIP_POLARIS10:
> --
> 2.40.1



RE: [PATCH AUTOSEL 6.1 08/10] drm/amdkfd: ignore crat by default

2023-08-22 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Sasha Levin 
> Sent: Tuesday, August 22, 2023 7:36 AM
> To: linux-ker...@vger.kernel.org; sta...@vger.kernel.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Koenig, Christian ;
> Mike Lothian ; Sasha Levin ; Pan,
> Xinhui ; airl...@gmail.com; dan...@ffwll.ch; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Subject: [PATCH AUTOSEL 6.1 08/10] drm/amdkfd: ignore crat by default
>
> From: Alex Deucher 
>
> [ Upstream commit a6dea2d64ff92851e68cd4e20a35f6534286e016 ]
>
> We are dropping the IOMMUv2 path, so no need to enable this.
> It's often buggy on consumer platforms anyway.


This is not needed for stable.

Alex

>
> Reviewed-by: Felix Kuehling 
> Acked-by: Christian König 
> Tested-by: Mike Lothian 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index e45c6bc8d10bb..a9fa4772b2d35 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -1543,11 +1543,7 @@ static bool kfd_ignore_crat(void)
>   if (ignore_crat)
>   return true;
>
> -#ifndef KFD_SUPPORT_IOMMU_V2
>   ret = true;
> -#else
> - ret = false;
> -#endif
>
>   return ret;
>  }
> --
> 2.40.1



RE: [PATCH AUTOSEL 6.4 09/11] drm/amdkfd: ignore crat by default

2023-08-22 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Sasha Levin 
> Sent: Tuesday, August 22, 2023 7:36 AM
> To: linux-ker...@vger.kernel.org; sta...@vger.kernel.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Koenig, Christian ;
> Mike Lothian ; Sasha Levin ; Pan,
> Xinhui ; airl...@gmail.com; dan...@ffwll.ch; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Subject: [PATCH AUTOSEL 6.4 09/11] drm/amdkfd: ignore crat by default
>
> From: Alex Deucher 
>
> [ Upstream commit a6dea2d64ff92851e68cd4e20a35f6534286e016 ]
>
> We are dropping the IOMMUv2 path, so no need to enable this.
> It's often buggy on consumer platforms anyway.
>

This is not needed for stable.

Alex


> Reviewed-by: Felix Kuehling 
> Acked-by: Christian König 
> Tested-by: Mike Lothian 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 475e470273540..ee0cc35d68a84 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -1543,11 +1543,7 @@ static bool kfd_ignore_crat(void)
>   if (ignore_crat)
>   return true;
>
> -#ifndef KFD_SUPPORT_IOMMU_V2
>   ret = true;
> -#else
> - ret = false;
> -#endif
>
>   return ret;
>  }
> --
> 2.40.1



RE: [PATCH AUTOSEL 6.4 11/11] drm/amdkfd: disable IOMMUv2 support for Raven

2023-08-22 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Sasha Levin 
> Sent: Tuesday, August 22, 2023 7:36 AM
> To: linux-ker...@vger.kernel.org; sta...@vger.kernel.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Koenig, Christian ;
> Mike Lothian ; Sasha Levin ; Pan,
> Xinhui ; airl...@gmail.com; dan...@ffwll.ch; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Subject: [PATCH AUTOSEL 6.4 11/11] drm/amdkfd: disable IOMMUv2
> support for Raven
>
> From: Alex Deucher 
>
> [ Upstream commit 091ae5473f96ced844af6ba39b94757359b12348 ]
>
> Use the dGPU path instead.  There were a lot of platform issues with IOMMU
> in general on these chips due to windows not enabling IOMMU at the time.
> The dGPU path has been used for a long time with newer APUs and works
> fine.  This also paves the way to simplify the driver significantly.

This is not needed for stable.

Alex

>
> Reviewed-by: Felix Kuehling 
> Acked-by: Christian König 
> Tested-by: Mike Lothian 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 7 ---
>  1 file changed, 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 9c8197573dee7..224e057d2dbbf 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -185,11 +185,6 @@ static void kfd_device_info_init(struct kfd_dev *kfd,
>
>   kfd_device_info_set_event_interrupt_class(kfd);
>
> - /* Raven */
> - if (gc_version == IP_VERSION(9, 1, 0) ||
> - gc_version == IP_VERSION(9, 2, 2))
> - kfd->device_info.needs_iommu_device = true;
> -
>   if (gc_version < IP_VERSION(11, 0, 0)) {
>   /* Navi2x+, Navi1x+ */
>   if (gc_version == IP_VERSION(10, 3, 6)) @@ -283,7
> +278,6 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev,
> bool vf)
>   gfx_target_version = 9;
>   f2g = _v9_kfd2kgd;
>   break;
> -#ifdef KFD_SUPPORT_IOMMU_V2
>   /* Raven */
>   case IP_VERSION(9, 1, 0):
>   case IP_VERSION(9, 2, 2):
> @@ -291,7 +285,6 @@ struct kfd_dev *kgd2kfd_probe(struct
> amdgpu_device *adev, bool vf)
>   if (!vf)
>   f2g = _v9_kfd2kgd;
>   break;
> -#endif
>   /* Vega12 */
>   case IP_VERSION(9, 2, 1):
>   gfx_target_version = 90004;
> --
> 2.40.1



RE: [PATCH AUTOSEL 6.4 10/11] drm/amdkfd: disable IOMMUv2 support for KV/CZ

2023-08-22 Thread Deucher, Alexander
[Public]

> -Original Message-
> From: Sasha Levin 
> Sent: Tuesday, August 22, 2023 7:36 AM
> To: linux-ker...@vger.kernel.org; sta...@vger.kernel.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Koenig, Christian ;
> Mike Lothian ; Sasha Levin ; Pan,
> Xinhui ; airl...@gmail.com; dan...@ffwll.ch; amd-
> g...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Subject: [PATCH AUTOSEL 6.4 10/11] drm/amdkfd: disable IOMMUv2
> support for KV/CZ
>
> From: Alex Deucher 
>
> [ Upstream commit 616f92d188ee7142a95a52068efdbea82645f859 ]
>
> Use the dGPU path instead.  There were a lot of platform issues with IOMMU
> in general on these chips due to windows not enabling IOMMU at the time.
> The dGPU path has been used for a long time with newer APUs and works
> fine.  This also paves the way to simplify the driver significantly.
>
> v2: use the dGPU queue manager functions

This is not needed for stable.

Alex

>
> Reviewed-by: Felix Kuehling 
> Acked-by: Christian König 
> Tested-by: Mike Lothian 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 6 --
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 +---
>  2 files changed, 1 insertion(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 00f528eb98126..9c8197573dee7 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -224,10 +224,6 @@ static void kfd_device_info_init(struct kfd_dev *kfd,
>   asic_type != CHIP_TONGA)
>   kfd->device_info.supports_cwsr = true;
>
> - if (asic_type == CHIP_KAVERI ||
> - asic_type == CHIP_CARRIZO)
> - kfd->device_info.needs_iommu_device = true;
> -
>   if (asic_type != CHIP_HAWAII && !vf)
>   kfd->device_info.needs_pci_atomics = true;
>   }
> @@ -240,7 +236,6 @@ struct kfd_dev *kgd2kfd_probe(struct
> amdgpu_device *adev, bool vf)
>   uint32_t gfx_target_version = 0;
>
>   switch (adev->asic_type) {
> -#ifdef KFD_SUPPORT_IOMMU_V2
>  #ifdef CONFIG_DRM_AMDGPU_CIK
>   case CHIP_KAVERI:
>   gfx_target_version = 7;
> @@ -253,7 +248,6 @@ struct kfd_dev *kgd2kfd_probe(struct
> amdgpu_device *adev, bool vf)
>   if (!vf)
>   f2g = _v8_kfd2kgd;
>   break;
> -#endif
>  #ifdef CONFIG_DRM_AMDGPU_CIK
>   case CHIP_HAWAII:
>   gfx_target_version = 70001;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 7a95698d83f73..c73417e79745e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -2335,18 +2335,12 @@ struct device_queue_manager
> *device_queue_manager_init(struct kfd_dev *dev)
>   }
>
>   switch (dev->adev->asic_type) {
> - case CHIP_CARRIZO:
> - device_queue_manager_init_vi(>asic_ops);
> - break;
> -
>   case CHIP_KAVERI:
> - device_queue_manager_init_cik(>asic_ops);
> - break;
> -
>   case CHIP_HAWAII:
>   device_queue_manager_init_cik_hawaii(>asic_ops);
>   break;
>
> + case CHIP_CARRIZO:
>   case CHIP_TONGA:
>   case CHIP_FIJI:
>   case CHIP_POLARIS10:
> --
> 2.40.1



Re: [PATCH RFC 00/13] drm/connector: Create HDMI Connector infrastructure

2023-08-22 Thread Daniel Vetter
On Tue, Aug 22, 2023 at 05:51:39PM +0300, Jani Nikula wrote:
> On Tue, 22 Aug 2023, Maxime Ripard  wrote:
> > Hi,
> >
> > On Tue, Aug 22, 2023 at 04:16:08PM +0200, Daniel Vetter wrote:
> >> On Mon, Aug 14, 2023 at 03:56:12PM +0200, Maxime Ripard wrote:
> >> > Here's a series that creates a subclass of drm_connector specifically
> >> > targeted at HDMI controllers.
> >> > 
> >> > The idea behind this series came from a recent discussion on IRC during
> >> > which we discussed infoframes generation of i915 vs everything else. 
> >> > 
> >> > Infoframes generation code still requires some decent boilerplate, with
> >> > each driver doing some variation of it.
> >> > 
> >> > In parallel, while working on vc4, we ended up converting a lot of i915
> >> > logic (mostly around format / bpc selection, and scrambler setup) to
> >> > apply on top of a driver that relies only on helpers.
> >> > 
> >> > While currently sitting in the vc4 driver, none of that logic actually
> >> > relies on any driver or hardware-specific behaviour.
> >> > 
> >> > The only missing piece to make it shareable is a bunch of extra
> >> > variables stored in a state (current bpc, format, RGB range selection,
> >> > etc.).
> >> > 
> >> > Thus, I decided to create some generic subclass of drm_connector to
> >> > address HDMI connectors, with a bunch of helpers that will take care of
> >> > all the "HDMI Spec" related code. Scrambler setup is missing at the
> >> > moment but can easily be plugged in.
> >> > 
> >> > Last week, Hans Verkuil also expressed interest in retrieving the
> >> > infoframes generated from userspace to create an infoframe-decode tool.
> >> > This series thus leverages the infoframe generation code to expose it
> >> > through debugfs.
> >> > 
> >> > This entire series is only build-tested at the moment. Let me know what
> >> > you think,
> >>
> >> I think the idea overall makes sense; we probably need it to roll out
> >> actual hdmi support to all the hdmi drivers we have. But there's the
> >> eternal issue of "C sucks at multiple inheritance".
> >> 
> >> Which means if you have a driver that subclasses drm_connector already for
> >> its driver needs, it de facto cannot, or only under some serious pain, use
> >> this.
> >
> > That's what vc4 is doing, and it went fine I think? it was mostly a
> > matter of subclassing drm_hdmi_connector instead of drm_connector, and
> > adjusting the various pointers and accessors here and there.
> >
> > It does create a fairly big diffstat, but nothing too painful.
> 
> The main pain point is not the diffstat per se, but that *all* casts to
> subclass need to check what the connector type is before doing
> so. You'll also get fun NULL conditions that you need to check and
> handle if the type isn't what you'd like it to be.
> 
> Currently i915 can just assume all drm_connectors it encounters are
> intel_connectors that it created, always.
> 
> Basically this has blocked the writeback connector stuff for a few years
> now in i915, because writeback forces a different subclassing, and what
> should be a small change in i915 turns into huge churn.

Yeah after the writeback experience I'm heavily leaning towards "this was
a mistake".

For writeback we could refactor it I think by just moving it all (which I
hope isn't too much churn), and then removing the then empty types (which
is where the big churn kicks in, so maybe just add that to gpu/todo.rst).

Cheers, Sima

> 
> BR,
> Jani.
> 
> 
> >
> >> Which is kinda why in practice we tend to not subclass, but stuff
> >> subclass fields into a named sub-structure. So essentially struct
> >> drm_connector.hdmi and struct drm_connector_state.hdmi instead of
> >> drm_hdmi_connector and drm_hdmi_connector_state. The helper functions to
> >> set it all up would all still be the same roughly. It's less typesafe but
> >> I think the gain in practical use (like you could make i915 use the
> >> helpers probably, which with this approach here is practically
> >> impossible).
> >
> > Ack.
> >
> >> The only other nit is that we probably want to put some of the hdmi
> >> properties into struct drm_mode_config because there's no reason to have
> >> per-connector valid values.
> >
> > What property would you want to move?
> >
> >> Also, it might be really good if you can find a co-conspirator who also
> >> wants to use this in their driver, then with some i915 extracting we'd
> >> have three, which should ensure the helper api is solid.
> >
> > I can convert sunxi (old) HDMI driver if needed. I'm not sure how
> > helpful it would be since it doesn't support bpc > 8, but it could be a
> > nice showcase still for "simple" HDMI controllers.
> >
> > Maxime
> 
> -- 
> Jani Nikula, Intel Open Source Graphics Center

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH RFC 00/13] drm/connector: Create HDMI Connector infrastructure

2023-08-22 Thread Jani Nikula
On Tue, 22 Aug 2023, Maxime Ripard  wrote:
> Hi,
>
> On Tue, Aug 22, 2023 at 04:16:08PM +0200, Daniel Vetter wrote:
>> On Mon, Aug 14, 2023 at 03:56:12PM +0200, Maxime Ripard wrote:
>> > Here's a series that creates a subclass of drm_connector specifically
>> > targeted at HDMI controllers.
>> > 
>> > The idea behind this series came from a recent discussion on IRC during
>> > which we discussed infoframes generation of i915 vs everything else. 
>> > 
>> > Infoframes generation code still requires some decent boilerplate, with
>> > each driver doing some variation of it.
>> > 
>> > In parallel, while working on vc4, we ended up converting a lot of i915
>> > logic (mostly around format / bpc selection, and scrambler setup) to
>> > apply on top of a driver that relies only on helpers.
>> > 
>> > While currently sitting in the vc4 driver, none of that logic actually
>> > relies on any driver or hardware-specific behaviour.
>> > 
>> > The only missing piece to make it shareable is a bunch of extra
>> > variables stored in a state (current bpc, format, RGB range selection,
>> > etc.).
>> > 
>> > Thus, I decided to create some generic subclass of drm_connector to
>> > address HDMI connectors, with a bunch of helpers that will take care of
>> > all the "HDMI Spec" related code. Scrambler setup is missing at the
>> > moment but can easily be plugged in.
>> > 
>> > Last week, Hans Verkuil also expressed interest in retrieving the
>> > infoframes generated from userspace to create an infoframe-decode tool.
>> > This series thus leverages the infoframe generation code to expose it
>> > through debugfs.
>> > 
>> > This entire series is only build-tested at the moment. Let me know what
>> > you think,
>>
>> I think the idea overall makes sense, and we probably need it to roll out
>> actual hdmi support to all the hdmi drivers we have. But there's the
>> eternal issue of "C sucks at multiple inheritance".
>> 
>> Which means if you have a driver that subclasses drm_connector already for
>> its driver needs, it de facto cannot use this, or only with some serious
>> pain.
>
> That's what vc4 is doing, and it went fine I think? it was mostly a
> matter of subclassing drm_hdmi_connector instead of drm_connector, and
> adjusting the various pointers and accessors here and there.
>
> It does create a fairly big diffstat, but nothing too painful.

The main pain point is not the diffstat per se, but that *all* casts to
subclass need to check what the connector type is before doing
so. You'll also get fun NULL conditions that you need to check and
handle if the type isn't what you'd like it to be.

Currently i915 can just assume all drm_connectors it encounters are
intel_connectors that it created, always.

Basically this has blocked the writeback connector stuff for a few years
now in i915, because writeback forces a different subclassing, and what
should be a small change in i915 turns into huge churn.

BR,
Jani.


>
>> Which is kinda why in practice we tend to not subclass, but stuff
>> subclass fields into a named sub-structure. So essentially struct
>> drm_connector.hdmi and struct drm_connector_state.hdmi instead of
>> drm_hdmi_connector and drm_hdmi_connector_state. The helper functions to
>> set it all up would all still be the same roughly. It's less typesafe, but
>> I think the gain in practical use is worth it (you could probably make
>> i915 use the helpers, which with this approach here is practically
>> impossible).
>
> Ack.
>
>> The only other nit is that we probably want to put some of the hdmi
>> properties into struct drm_mode_config because there's no reason to have
>> per-connector valid values.
>
> What property would you want to move?
>
>> Also, it might be really good if you can find a co-conspirator who also
>> wants to use this in their driver, then with some i915 extracting we'd
>> have three, which should ensure the helper api is solid.
>
> I can convert sunxi (old) HDMI driver if needed. I'm not sure how
> helpful it would be since it doesn't support bpc > 8, but it could be a
> nice showcase still for "simple" HDMI controllers.
>
> Maxime

-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [PATCH RFC 00/13] drm/connector: Create HDMI Connector infrastructure

2023-08-22 Thread Daniel Vetter
On Tue, Aug 22, 2023 at 04:35:55PM +0200, Maxime Ripard wrote:
> Hi,
> 
> On Tue, Aug 22, 2023 at 04:16:08PM +0200, Daniel Vetter wrote:
> > On Mon, Aug 14, 2023 at 03:56:12PM +0200, Maxime Ripard wrote:
> > > Here's a series that creates a subclass of drm_connector specifically
> > > targeted at HDMI controllers.
> > > 
> > > The idea behind this series came from a recent discussion on IRC during
> > > which we discussed infoframes generation of i915 vs everything else. 
> > > 
> > > Infoframes generation code still requires some decent boilerplate, with
> > > each driver doing some variation of it.
> > > 
> > > In parallel, while working on vc4, we ended up converting a lot of i915
> > > logic (mostly around format / bpc selection, and scrambler setup) to
> > > apply on top of a driver that relies only on helpers.
> > > 
> > > While currently sitting in the vc4 driver, none of that logic actually
> > > relies on any driver or hardware-specific behaviour.
> > > 
> > > The only missing piece to make it shareable is a bunch of extra
> > > variables stored in a state (current bpc, format, RGB range selection,
> > > etc.).
> > > 
> > > Thus, I decided to create some generic subclass of drm_connector to
> > > address HDMI connectors, with a bunch of helpers that will take care of
> > > all the "HDMI Spec" related code. Scrambler setup is missing at the
> > > moment but can easily be plugged in.
> > > 
> > > Last week, Hans Verkuil also expressed interest in retrieving the
> > > infoframes generated from userspace to create an infoframe-decode tool.
> > > This series thus leverages the infoframe generation code to expose it
> > > through debugfs.
> > > 
> > > This entire series is only build-tested at the moment. Let me know what
> > > you think,
> >
> > I think the idea overall makes sense, and we probably need it to roll out
> > actual hdmi support to all the hdmi drivers we have. But there's the
> > eternal issue of "C sucks at multiple inheritance".
> > 
> > Which means if you have a driver that subclasses drm_connector already for
> > its driver needs, it de facto cannot use this, or only with some serious
> > pain.
> 
> That's what vc4 is doing, and it went fine I think? it was mostly a
> matter of subclassing drm_hdmi_connector instead of drm_connector, and
> adjusting the various pointers and accessors here and there.
> 
> It does create a fairly big diffstat, but nothing too painful.

Yeah it's the massive churn that's the pain for refactoring existing
bigger drivers.

Plus what do you do when you need both a hdmi connector and a dp connector
(or a writeback connector).

> > Which is kinda why in practice we tend to not subclass, but stuff
> > subclass fields into a named sub-structure. So essentially struct
> > drm_connector.hdmi and struct drm_connector_state.hdmi instead of
> > drm_hdmi_connector and drm_hdmi_connector_state. The helper functions to
> > set it all up would all still be the same roughly. It's less typesafe, but
> > I think the gain in practical use is worth it (you could probably make
> > i915 use the helpers, which with this approach here is practically
> > impossible).
> 
> Ack.
> 
> > The only other nit is that we probably want to put some of the hdmi
> > properties into struct drm_mode_config because there's no reason to have
> > per-connector valid values.
> 
> What property would you want to move?

The rgb broadcast property looked very much like it is connector invariant.
Just the one I noticed, I didn't check all the others.

> > Also, it might be really good if you can find a co-conspirator who also
> > wants to use this in their driver, then with some i915 extracting we'd
> > have three, which should ensure the helper api is solid.
> 
> I can convert sunxi (old) HDMI driver if needed. I'm not sure how
> helpful it would be since it doesn't support bpc > 8, but it could be a
> nice showcase still for "simple" HDMI controllers.

Yeah that might be good. Or perhaps poke Rob Clark whether msm is
interested and someone could do a conversion for dpu5 or so?

Cheers, Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH RFC 00/13] drm/connector: Create HDMI Connector infrastructure

2023-08-22 Thread Maxime Ripard
Hi,

On Tue, Aug 22, 2023 at 04:16:08PM +0200, Daniel Vetter wrote:
> On Mon, Aug 14, 2023 at 03:56:12PM +0200, Maxime Ripard wrote:
> > Here's a series that creates a subclass of drm_connector specifically
> > targeted at HDMI controllers.
> > 
> > The idea behind this series came from a recent discussion on IRC during
> > which we discussed infoframes generation of i915 vs everything else. 
> > 
> > Infoframes generation code still requires some decent boilerplate, with
> > each driver doing some variation of it.
> > 
> > In parallel, while working on vc4, we ended up converting a lot of i915
> > logic (mostly around format / bpc selection, and scrambler setup) to
> > apply on top of a driver that relies only on helpers.
> > 
> > While currently sitting in the vc4 driver, none of that logic actually
> > relies on any driver or hardware-specific behaviour.
> > 
> > The only missing piece to make it shareable is a bunch of extra
> > variables stored in a state (current bpc, format, RGB range selection,
> > etc.).
> > 
> > Thus, I decided to create some generic subclass of drm_connector to
> > address HDMI connectors, with a bunch of helpers that will take care of
> > all the "HDMI Spec" related code. Scrambler setup is missing at the
> > moment but can easily be plugged in.
> > 
> > Last week, Hans Verkuil also expressed interest in retrieving the
> > infoframes generated from userspace to create an infoframe-decode tool.
> > This series thus leverages the infoframe generation code to expose it
> > through debugfs.
> > 
> > This entire series is only build-tested at the moment. Let me know what
> > you think,
>
> I think the idea overall makes sense, and we probably need it to roll out
> actual hdmi support to all the hdmi drivers we have. But there's the
> eternal issue of "C sucks at multiple inheritance".
> 
> Which means if you have a driver that subclasses drm_connector already for
> its driver needs, it de facto cannot use this, or only with some serious
> pain.

That's what vc4 is doing, and it went fine I think? it was mostly a
matter of subclassing drm_hdmi_connector instead of drm_connector, and
adjusting the various pointers and accessors here and there.

It does create a fairly big diffstat, but nothing too painful.

> Which is kinda why in practice we tend to not subclass, but stuff
> subclass fields into a named sub-structure. So essentially struct
> drm_connector.hdmi and struct drm_connector_state.hdmi instead of
> drm_hdmi_connector and drm_hdmi_connector_state. The helper functions to
> set it all up would all still be the same roughly. It's less typesafe, but
> I think the gain in practical use is worth it (you could probably make
> i915 use the helpers, which with this approach here is practically
> impossible).

Ack.

> The only other nit is that we probably want to put some of the hdmi
> properties into struct drm_mode_config because there's no reason to have
> per-connector valid values.

What property would you want to move?

> Also, it might be really good if you can find a co-conspirator who also
> wants to use this in their driver, then with some i915 extracting we'd
> have three, which should ensure the helper api is solid.

I can convert sunxi (old) HDMI driver if needed. I'm not sure how
helpful it would be since it doesn't support bpc > 8, but it could be a
nice showcase still for "simple" HDMI controllers.

Maxime




Re: [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset

2023-08-22 Thread Daniel Vetter
On Tue, Aug 22, 2023 at 02:14:28PM +, Dong, Zhanjun wrote:
> 
> 
> > -Original Message-
> > From: Daniel Vetter 
> > Sent: August 22, 2023 9:51 AM
> > To: Dong, Zhanjun 
> > Cc: intel-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; 
> > Harrison,
> > John C ; Andi Shyti ;
> > Daniel Vetter 
> > Subject: Re: [PATCH v5] drm/i915: Avoid circular locking dependency when
> > flush delayed work on gt reset
> > 
> > On Fri, Aug 11, 2023 at 11:20:11AM -0700, Zhanjun Dong wrote:
> > > This attempts to avoid a circular locking dependency between flush delayed
> > > work and intel_gt_reset.
> > > When intel_gt_reset is called, the task will hold a lock.
> > > To cancel the delayed work here, the _sync version will also acquire a lock,
> > > which might trigger the possible circular locking dependency warning.
> > > When intel_gt_reset is called, the reset_in_progress flag will be set; add code
> > > to check the flag and call the async version if a reset is in progress.
> > >
> > > Signed-off-by: Zhanjun Dong 
> > > Cc: John Harrison 
> > > Cc: Andi Shyti 
> > > Cc: Daniel Vetter 
> > > ---
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index a0e3ef1c65d2..600388c849f7 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -1359,7 +1359,16 @@ static void guc_enable_busyness_worker(struct
> > intel_guc *guc)
> > >
> > >  static void guc_cancel_busyness_worker(struct intel_guc *guc)
> > >  {
> > > - cancel_delayed_work_sync(>timestamp.work);
> > > + /*
> > > +  * When intel_gt_reset is called, the task will hold a lock.
> > > +  * To cancel the delayed work here, the _sync version will also acquire
> > > +  * a lock, which might trigger the possible circular locking dependency
> > > +  * warning.
> > 
> > This is not even close to a locking bugfix. Consider this a formal nack,
> > because the issue here is not even close to "needs more comments to
> > explain what's going on".
> > -Daniel
> 
> The purpose of the comment here is to explain the locking issue condition
> > 
> > > +  * Check the reset_in_progress flag and call the async version if a
> > > +  * reset is in progress.
> 
> 
> The comment here explains the check of the flag to avoid the locking condition.
> The reset process is not expected to complete in a short time; other than
> that, did we miss anything?

Either the _sync is not needed at all, in which case you need to explain
why, which this patch doesn't do. And if the _sync isn't needed here, then
it's probably not needed in all/most cases?

Or the _sync is needed, and in that case you just replace a potential
deadlock scenario with a potential race condition.
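
To make the race in that second case concrete, here is a minimal,
driver-agnostic sketch (hypothetical names, not i915 code) of what an
asynchronous cancel leaves open:

#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/workqueue.h>

/* Hypothetical example, not i915 code. */
struct busy_stats {
	struct delayed_work work;
	u64 total;
};

static void busy_fn(struct work_struct *w)
{
	struct busy_stats *s = container_of(to_delayed_work(w),
					    struct busy_stats, work);

	s->total++;	/* touches state that the teardown below frees */
}

static void teardown(struct busy_stats *s)
{
	/*
	 * cancel_delayed_work() only removes a pending work item; if
	 * busy_fn() is already running it keeps running, so the kfree()
	 * below can race with it. cancel_delayed_work_sync() would wait
	 * for it, but that wait is exactly what reintroduces the lock
	 * dependency the patch is trying to avoid.
	 */
	cancel_delayed_work(&s->work);
	kfree(s);
}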

In neither case should this patch here be merged.
-Daniel

> 
> > > +  */
> > > + if (guc_to_gt(guc)->uc.reset_in_progress)
> > > + cancel_delayed_work(>timestamp.work);
> > > + else
> > > + cancel_delayed_work_sync(>timestamp.work);
> > >  }
> > >
> > >  static void __reset_guc_busyness_stats(struct intel_guc *guc)
> > > --
> > > 2.34.1
> > >
> > 
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v4 0/3] drm: simplify support for transparent DRM bridges

2023-08-22 Thread Neil Armstrong

On 22/08/2023 16:19, Laurent Pinchart wrote:

On Tue, Aug 22, 2023 at 05:17:37PM +0300, Laurent Pinchart wrote:

Hi Dmitry,

Thank you for the patches.

On Thu, Aug 17, 2023 at 05:55:13PM +0300, Dmitry Baryshkov wrote:

Supporting DP/USB-C can result in a chain of several transparent
bridges (PHY, redrivers, mux, etc). This results in drivers having
similar boilerplate code for such bridges.


What do you mean by transparent bridge here ? Bridges are a DRM concept,
and as far as I can tell, a PHY isn't a bridge. Why does it need to be
handled as one, especially if it's completely transparent ?


Next, these drivers are susceptible to -EPROBE_DEFER loops: the next
bridge can either be probed from the bridge->attach callback, when it is
too late to return -EPROBE_DEFER, or from the probe() callback, when the
next bridge might not yet be available, because it depends on the
resources provided by the probing device.


Can't device links help avoiding defer probing in those cases ?


Last, but not least, this results in the internal knowledge of the DRM
subsystem slowly diffusing into other subsystems, like PHY or USB/TYPEC.


Why so ? The PHY subsystem should provide a PHY, without considering
what subsystem it will be used by. This patch series seems to me to
actually create this DRM dependency in other subsystems,


I was wrong on this one, there are indeed existing drm_bridge instances
in drivers/usb/ and drivers/phy/. That's certainly not nice. Why do we
even need drm_bridge there, why can't the PHYs be acquired by their
consumers in DRM (and anywhere else) using the PHY API ?


Because with USB-C Altmode/USB4/Thunderbolt, DisplayPort is one of the
data streams handled by PHYs, USB-C PD manager, re-timers, SBU muxes...
and all this must be coordinated with the display controller and can
be considered as bridges between the DP controller and the USB-C connector.

As of today, it has been handled by OOB events on Intel & AMD, but the entirety
of the USB-C chain is handled in firmware, so this scales.
When we need to describe the entire USB-C data stream chain as port/endpoint
in DT, OOB handling doesn't work anymore since we need to sync the entire
USB-C chain (muxes, switches, retimers, phys...) handled by Linux before
starting the DP stream.

Neil




which I don't
think is a very good idea. Resources should be registered in their own
subsystem with the appropriate API, not in a way that is tied to a
particular consumer.


To solve all these issues, define a separate DRM helper, which creates
separate aux device just for the bridge. During probe such aux device
doesn't result in the EPROBE_DEFER loops. Instead it allows the device
drivers to probe properly, according to the actual resource
dependencies. The bridge auxdevs are then probed when the next bridge
becomes available, sparing drivers from drm_bridge_attach() returning
-EPROBE_DEFER.


I'm not thrilled :-( Let's discuss the questions above first.


Proposed merge strategy: immutable branch with the drm commit, which is
then merged into PHY and USB subsystems together with the corresponding
patch.

Changes since v3:
  - Moved bridge driver to gpu/drm/bridge (Neil Armstrong)
  - Renamed it to aux-bridge (since there is already a simple_bridge driver)
  - Made CONFIG_OF mandatory for this driver (Neil Armstrong)
  - Added missing kfree and ida_free (Dan Carpenter)

Changes since v2:
  - ifdef'ed bridge->of_node access (LKP)

Changes since v1:
  - Added EXPORT_SYMBOL_GPL / MODULE_LICENSE / etc. to drm_simple_bridge

Dmitry Baryshkov (3):
   drm/bridge: add transparent bridge helper
   phy: qcom: qmp-combo: switch to DRM_AUX_BRIDGE
   usb: typec: nb7vpq904m: switch to DRM_AUX_BRIDGE

  drivers/gpu/drm/bridge/Kconfig|   9 ++
  drivers/gpu/drm/bridge/Makefile   |   1 +
  drivers/gpu/drm/bridge/aux-bridge.c   | 132 ++
  drivers/phy/qualcomm/Kconfig  |   2 +-
  drivers/phy/qualcomm/phy-qcom-qmp-combo.c |  44 +---
  drivers/usb/typec/mux/Kconfig |   2 +-
  drivers/usb/typec/mux/nb7vpq904m.c|  44 +---
  include/drm/bridge/aux-bridge.h   |  19 
  8 files changed, 167 insertions(+), 86 deletions(-)
  create mode 100644 drivers/gpu/drm/bridge/aux-bridge.c
  create mode 100644 include/drm/bridge/aux-bridge.h






Re: [PATCH v4 0/3] drm: simplify support for transparent DRM bridges

2023-08-22 Thread Laurent Pinchart
On Tue, Aug 22, 2023 at 05:17:37PM +0300, Laurent Pinchart wrote:
> Hi Dmitry,
> 
> Thank you for the patches.
> 
> On Thu, Aug 17, 2023 at 05:55:13PM +0300, Dmitry Baryshkov wrote:
> > Supporting DP/USB-C can result in a chain of several transparent
> > bridges (PHY, redrivers, mux, etc). This results in drivers having
> > similar boilerplate code for such bridges.
> 
> What do you mean by transparent bridge here ? Bridges are a DRM concept,
> and as far as I can tell, a PHY isn't a bridge. Why does it need to be
> handled as one, especially if it's completely transparent ?
> 
> > Next, these drivers are susceptible to -EPROBE_DEFER loops: the next
> > bridge can either be probed from the bridge->attach callback, when it is
> > too late to return -EPROBE_DEFER, or from the probe() callback, when the
> > next bridge might not yet be available, because it depends on the
> > resources provided by the probing device.
> 
> Can't device links help avoiding defer probing in those cases ?
> 
> > Last, but not least, this results in the internal knowledge of the DRM
> > subsystem slowly diffusing into other subsystems, like PHY or USB/TYPEC.
> 
> Why so ? The PHY subsystem should provide a PHY, without considering
> what subsystem it will be used by. This patch series seems to me to
> actually create this DRM dependency in other subsystems,

I was wrong on this one, there are indeed existing drm_bridge instances
in drivers/usb/ and drivers/phy/. That's certainly not nice. Why do we
even need drm_bridge there, why can't the PHYs be acquired by their
consumers in DRM (and anywhere else) using the PHY API ?

> which I don't
> think is a very good idea. Resources should be registered in their own
> subsystem with the appropriate API, not in a way that is tied to a
> particular consumer.
> 
> > To solve all these issues, define a separate DRM helper, which creates
> > separate aux device just for the bridge. During probe such aux device
> > doesn't result in the EPROBE_DEFER loops. Instead it allows the device
> > drivers to probe properly, according to the actual resource
> > dependencies. The bridge auxdevs are then probed when the next bridge
> > becomes available, sparing drivers from drm_bridge_attach() returning
> > -EPROBE_DEFER.
> 
> I'm not thrilled :-( Let's discuss the questions above first.
> 
> > Proposed merge strategy: immutable branch with the drm commit, which is
> > then merged into PHY and USB subsystems together with the corresponding
> > patch.
> > 
> > Changes since v3:
> >  - Moved bridge driver to gpu/drm/bridge (Neil Armstrong)
> >  - Renamed it to aux-bridge (since there is already a simple_bridge driver)
> >  - Made CONFIG_OF mandatory for this driver (Neil Armstrong)
> >  - Added missing kfree and ida_free (Dan Carpenter)
> > 
> > Changes since v2:
> >  - ifdef'ed bridge->of_node access (LKP)
> > 
> > Changes since v1:
> >  - Added EXPORT_SYMBOL_GPL / MODULE_LICENSE / etc. to drm_simple_bridge
> > 
> > Dmitry Baryshkov (3):
> >   drm/bridge: add transparent bridge helper
> >   phy: qcom: qmp-combo: switch to DRM_AUX_BRIDGE
> >   usb: typec: nb7vpq904m: switch to DRM_AUX_BRIDGE
> > 
> >  drivers/gpu/drm/bridge/Kconfig|   9 ++
> >  drivers/gpu/drm/bridge/Makefile   |   1 +
> >  drivers/gpu/drm/bridge/aux-bridge.c   | 132 ++
> >  drivers/phy/qualcomm/Kconfig  |   2 +-
> >  drivers/phy/qualcomm/phy-qcom-qmp-combo.c |  44 +---
> >  drivers/usb/typec/mux/Kconfig |   2 +-
> >  drivers/usb/typec/mux/nb7vpq904m.c|  44 +---
> >  include/drm/bridge/aux-bridge.h   |  19 
> >  8 files changed, 167 insertions(+), 86 deletions(-)
> >  create mode 100644 drivers/gpu/drm/bridge/aux-bridge.c
> >  create mode 100644 include/drm/bridge/aux-bridge.h

-- 
Regards,

Laurent Pinchart


RE: [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset

2023-08-22 Thread Dong, Zhanjun


> -Original Message-
> From: Daniel Vetter 
> Sent: August 22, 2023 9:51 AM
> To: Dong, Zhanjun 
> Cc: intel-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; 
> Harrison,
> John C ; Andi Shyti ;
> Daniel Vetter 
> Subject: Re: [PATCH v5] drm/i915: Avoid circular locking dependency when
> flush delayed work on gt reset
> 
> On Fri, Aug 11, 2023 at 11:20:11AM -0700, Zhanjun Dong wrote:
> > This attempts to avoid a circular locking dependency between flush delayed
> > work and intel_gt_reset.
> > When intel_gt_reset is called, the task will hold a lock.
> > To cancel the delayed work here, the _sync version will also acquire a lock,
> > which might trigger the possible circular locking dependency warning.
> > When intel_gt_reset is called, the reset_in_progress flag will be set; add code
> > to check the flag and call the async version if a reset is in progress.
> >
> > Signed-off-by: Zhanjun Dong 
> > Cc: John Harrison 
> > Cc: Andi Shyti 
> > Cc: Daniel Vetter 
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index a0e3ef1c65d2..600388c849f7 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1359,7 +1359,16 @@ static void guc_enable_busyness_worker(struct
> intel_guc *guc)
> >
> >  static void guc_cancel_busyness_worker(struct intel_guc *guc)
> >  {
> > -   cancel_delayed_work_sync(>timestamp.work);
> > +   /*
> > +* When intel_gt_reset is called, the task will hold a lock.
> > +* To cancel the delayed work here, the _sync version will also acquire
> > +* a lock, which might trigger the possible circular locking dependency
> > +* warning.
> 
> This is not even close to a locking bugfix. Consider this a formal nack,
> because the issue here is not even close to "needs more comments to
> explain what's going on".
> -Daniel

The purpose of the comment here is to explain the locking issue condition
> 
> > +* Check the reset_in_progress flag and call the async version if a
> > +* reset is in progress.


The comment here explains the check of the flag to avoid the locking condition.
The reset process is not expected to complete in a short time; other than
that, did we miss anything?

> > +*/
> > +   if (guc_to_gt(guc)->uc.reset_in_progress)
> > +   cancel_delayed_work(>timestamp.work);
> > +   else
> > +   cancel_delayed_work_sync(>timestamp.work);
> >  }
> >
> >  static void __reset_guc_busyness_stats(struct intel_guc *guc)
> > --
> > 2.34.1
> >
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [PATCH v4 0/3] drm: simplify support for transparent DRM bridges

2023-08-22 Thread Laurent Pinchart
Hi Dmitry,

Thank you for the patches.

On Thu, Aug 17, 2023 at 05:55:13PM +0300, Dmitry Baryshkov wrote:
> Supporting DP/USB-C can result in a chain of several transparent
> bridges (PHY, redrivers, mux, etc). This results in drivers having
> similar boilerplate code for such bridges.

What do you mean by transparent bridge here ? Bridges are a DRM concept,
and as far as I can tell, a PHY isn't a bridge. Why does it need to be
handled as one, especially if it's completely transparent ?

> Next, these drivers are susceptible to -EPROBE_DEFER loops: the next
> bridge can either be probed from the bridge->attach callback, when it is
> too late to return -EPROBE_DEFER, or from the probe() callback, when the
> next bridge might not yet be available, because it depends on the
> resources provided by the probing device.

Can't device links help avoiding defer probing in those cases ?

> Last, but not least, this results in the internal knowledge of the DRM
> subsystem slowly diffusing into other subsystems, like PHY or USB/TYPEC.

Why so ? The PHY subsystem should provide a PHY, without considering
what subsystem it will be used by. This patch series seems to me to
actually create this DRM dependency in other subsystems, which I don't
think is a very good idea. Resources should be registered in their own
subsystem with the appropriate API, not in a way that is tied to a
particular consumer.

> To solve all these issues, define a separate DRM helper, which creates
> separate aux device just for the bridge. During probe such aux device
> doesn't result in the EPROBE_DEFER loops. Instead it allows the device
> drivers to probe properly, according to the actual resource
> dependencies. The bridge auxdevs are then probed when the next bridge
> becomes available, sparing drivers from drm_bridge_attach() returning
> -EPROBE_DEFER.

I'm not thrilled :-( Let's discuss the questions above first.

> Proposed merge strategy: immutable branch with the drm commit, which is
> then merged into PHY and USB subsystems together with the corresponding
> patch.
> 
> Changes since v3:
>  - Moved bridge driver to gpu/drm/bridge (Neil Armstrong)
>  - Renamed it to aux-bridge (since there is already a simple_bridge driver)
>  - Made CONFIG_OF mandatory for this driver (Neil Armstrong)
>  - Added missing kfree and ida_free (Dan Carpenter)
> 
> Changes since v2:
>  - ifdef'ed bridge->of_node access (LKP)
> 
> Changes since v1:
>  - Added EXPORT_SYMBOL_GPL / MODULE_LICENSE / etc. to drm_simple_bridge
> 
> Dmitry Baryshkov (3):
>   drm/bridge: add transparent bridge helper
>   phy: qcom: qmp-combo: switch to DRM_AUX_BRIDGE
>   usb: typec: nb7vpq904m: switch to DRM_AUX_BRIDGE
> 
>  drivers/gpu/drm/bridge/Kconfig|   9 ++
>  drivers/gpu/drm/bridge/Makefile   |   1 +
>  drivers/gpu/drm/bridge/aux-bridge.c   | 132 ++
>  drivers/phy/qualcomm/Kconfig  |   2 +-
>  drivers/phy/qualcomm/phy-qcom-qmp-combo.c |  44 +---
>  drivers/usb/typec/mux/Kconfig |   2 +-
>  drivers/usb/typec/mux/nb7vpq904m.c|  44 +---
>  include/drm/bridge/aux-bridge.h   |  19 
>  8 files changed, 167 insertions(+), 86 deletions(-)
>  create mode 100644 drivers/gpu/drm/bridge/aux-bridge.c
>  create mode 100644 include/drm/bridge/aux-bridge.h

-- 
Regards,

Laurent Pinchart


Re: [PATCH RFC 00/13] drm/connector: Create HDMI Connector infrastructure

2023-08-22 Thread Daniel Vetter
On Mon, Aug 14, 2023 at 03:56:12PM +0200, Maxime Ripard wrote:
> Hi,
> 
> Here's a series that creates a subclass of drm_connector specifically
> targeted at HDMI controllers.
> 
> The idea behind this series came from a recent discussion on IRC during
> which we discussed infoframes generation of i915 vs everything else. 
> 
> Infoframes generation code still requires some decent boilerplate, with
> each driver doing some variation of it.
> 
> In parallel, while working on vc4, we ended up converting a lot of i915
> logic (mostly around format / bpc selection, and scrambler setup) to
> apply on top of a driver that relies only on helpers.
> 
> While currently sitting in the vc4 driver, none of that logic actually
> relies on any driver or hardware-specific behaviour.
> 
> The only missing piece to make it shareable is a bunch of extra
> variables stored in a state (current bpc, format, RGB range selection,
> etc.).
> 
> Thus, I decided to create some generic subclass of drm_connector to
> address HDMI connectors, with a bunch of helpers that will take care of
> all the "HDMI Spec" related code. Scrambler setup is missing at the
> moment but can easily be plugged in.
> 
> Last week, Hans Verkuil also expressed interest in retrieving the
> infoframes generated from userspace to create an infoframe-decode tool.
> This series thus leverages the infoframe generation code to expose it
> through debugfs.
> 
> This entire series is only build-tested at the moment. Let me know what
> you think,
> Maxime

I think the idea overall makes sense, and we probably need it to roll out
actual hdmi support to all the hdmi drivers we have. But there's the
eternal issue of "C sucks at multiple inheritance".

Which means if you have a driver that subclasses drm_connector already for
its driver needs, it de facto cannot use this, or only with some serious
pain. Which is kinda why in practice we tend to not subclass, but stuff
subclass fields into a named sub-structure. So essentially struct
drm_connector.hdmi and struct drm_connector_state.hdmi instead of
drm_hdmi_connector and drm_hdmi_connector_state. The helper functions to
set it all up would all still be the same roughly. It's less typesafe, but
I think the gain in practical use is worth it (you could probably make
i915 use the helpers, which with this approach here is practically
impossible).
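
As a rough sketch of the difference (the field names below are made up for
illustration and are not the actual API proposed in this series):

#include <drm/drm_connector.h>

/* Option 1: subclassing. A driver must embed drm_hdmi_connector, which
 * collides with a driver that already embeds drm_connector in its own
 * connector structure. */
struct drm_hdmi_connector {
	struct drm_connector base;
	unsigned int max_bpc;		/* hypothetical field */
};

/* Option 2: a named sub-structure inside the core object. Any existing
 * driver subclass keeps working, and the helpers operate on
 * connector->hdmi / connector_state->hdmi instead of a new type. */
struct drm_connector_hdmi {
	unsigned int max_bpc;		/* hypothetical field */
};
/* i.e. struct drm_connector would gain: struct drm_connector_hdmi hdmi; */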

The only other nit is that we probably want to put some of the hdmi
properties into struct drm_mode_config because there's no reason to have
per-connector valid values.

Also, it might be really good if you can find a co-conspirator who also
wants to use this in their driver, then with some i915 extracting we'd
have three, which should ensure the helper api is solid.

Cheers, Sima


> 
> Signed-off-by: Maxime Ripard 
> ---
> Maxime Ripard (13):
>   drm/connector: Introduce an HDMI connector
>   drm/connector: hdmi: Create a custom state
>   drm/connector: hdmi: Add Broadcast RGB property
>   drm/connector: hdmi: Add helper to get the RGB range
>   drm/connector: hdmi: Add output BPC to the connector state
>   drm/connector: hdmi: Add support for output format
>   drm/connector: hdmi: Calculate TMDS character rate
>   drm/connector: hdmi: Add custom hook to filter TMDS character rate
>   drm/connector: hdmi: Compute bpc and format automatically
>   drm/connector: hdmi: Add Infoframes generation
>   drm/connector: hdmi: Create Infoframe DebugFS entries
>   drm/vc4: hdmi: Create destroy state implementation
>   drm/vc4: hdmi: Switch to HDMI connector
> 
>  drivers/gpu/drm/Makefile |1 +
>  drivers/gpu/drm/drm_hdmi_connector.c | 1112 
> ++
>  drivers/gpu/drm/vc4/vc4_hdmi.c   |  720 --
>  drivers/gpu/drm/vc4/vc4_hdmi.h   |   37 +-
>  drivers/gpu/drm/vc4/vc4_hdmi_phy.c   |4 +-
>  include/drm/drm_connector.h  |  256 
>  6 files changed, 1508 insertions(+), 622 deletions(-)
> ---
> base-commit: 5d0c230f1de8c7515b6567d9afba1f196fb4e2f4
> change-id: 20230814-kms-hdmi-connector-state-616787e67927
> 
> Best regards,
> -- 
> Maxime Ripard 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: TODO list task: Replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi

2023-08-22 Thread Sharique Mohammad
Ok. So it is already completed.
I have to find something else...

Thanks and regards,
Sharique

On Tue, 22 Aug 2023 at 14:46, Jani Nikula <jani.nik...@linux.intel.com> wrote:

> On Tue, 22 Aug 2023, Sharq Mohammad  wrote:
> > Hello All,
> >
> > I am a usual kernel developer, and wanted to contribute to the open
> source.
> > I saw a small TODO list in the DRM graphics subsystem, with some tasks.
> > So, just wanted to ask, is anyone working on the task:
> > *Replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi*
> >
> > Its on the TODO list.
>
> Yeah, I've got branch
>
>
> https://gitlab.freedesktop.org/jani/linux/-/commits/drm-edid-is-hdmi-has-audio
>
> BR,
> Jani.
>
>
> >
> > Thanks and regards,
> > Sharique
>
> --
> Jani Nikula, Intel Open Source Graphics Center
>


Re: [PATCH 1/3] drm/buddy: Fix contiguous memory allocation issues

2023-08-22 Thread Arunpravin Paneer Selvam



On 21/08/23 10:46, Matthew Auld wrote:

Hi,

On 21/08/2023 11:14, Arunpravin Paneer Selvam wrote:

The way now contiguous requests are implemented such that
the size rounded up to power of 2 and the corresponding order
block picked from the freelist.

In addition to the older method, the new method will rounddown
the size to power of 2 and the corresponding order block picked
from the freelist. And for the remaining size we traverse the
tree and try to allocate either from the freelist block's buddy
or from the peer block. If the remaining size from peer/buddy
block is not free, we pick the next freelist block and repeat
the same method.

Moved contiguous/alignment size computation part and trim
function to the drm buddy manager.


I think we should also mention somewhere what issue this is trying to 
solve. IIUC the roundup_power_of_two() might in some cases trigger 
-ENOSPC even though there might be enough free space, and so to help 
with that we introduce a try harder mechanism.
Yes, we are trying to solve the above issue. I will add the problem 
statement to the commit description.
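
As a rough illustration of the failure mode (the numbers are made up, and
the kernel's generic log2 helpers are used here instead of the drm_buddy
API):

#include <linux/kernel.h>
#include <linux/log2.h>

/*
 * Hypothetical example: a contiguous request of 768M while the allocator
 * only has a free 512M block plus its free 256M neighbour. Rounding up to
 * 1G can only fail, while rounding down and taking the remainder from the
 * buddy/peer block can still satisfy the request.
 */
static void split_contiguous_request(u64 size)
{
	u64 up = roundup_pow_of_two(size);	/* old path: 768M -> 1G */
	u64 down = rounddown_pow_of_two(size);	/* new path: 768M -> 512M */
	u64 rest = size - down;			/* 256M taken from buddy/peer */

	pr_info("up=%llu down=%llu rest=%llu\n",
		(unsigned long long)up,
		(unsigned long long)down,
		(unsigned long long)rest);
}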




Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/drm_buddy.c | 253 ++--
  include/drm/drm_buddy.h |   6 +-
  2 files changed, 248 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 7098f125b54a..220f60c08a03 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -569,6 +569,197 @@ static int __drm_buddy_alloc_range(struct 
drm_buddy *mm,

  return __alloc_range(mm, , start, size, blocks);
  }
  +static int __alloc_contiguous_block_from_buddy(struct drm_buddy *mm,
+   u64 size,
+   u64 min_block_size,
+   struct drm_buddy_block *block,
+   struct list_head *blocks)
+{
+    struct drm_buddy_block *buddy, *parent = NULL;
+    u64 start, offset = 0;
+    LIST_HEAD(dfs);
+    int err;
+
+    if (!block)
+    return -EINVAL;
+
+    buddy = __get_buddy(block);
+    if (!buddy)
+    return -ENOSPC;
+
+    if (drm_buddy_block_is_allocated(buddy))
+    return -ENOSPC;
+
+    parent = block->parent;
+    if (!parent)
+    return -ENOSPC;
+
+    if (block->parent->right == block) {
+    u64 remaining;
+
+    /* Compute the leftover size for allocation */
+    remaining = max((size - drm_buddy_block_size(mm, buddy)),
+    min_block_size);
+    if (!IS_ALIGNED(remaining, min_block_size))
+    remaining = round_up(remaining, min_block_size);
+
+    /* Check if remaining size is greater than buddy block size */
+    if (drm_buddy_block_size(mm, buddy) < remaining)
+    return -ENOSPC;
+
+    offset = drm_buddy_block_size(mm, buddy) - remaining;
+    }
+
+    list_add(>tmp_link, );
+    start = drm_buddy_block_offset(parent) + offset;
+
+    err = __alloc_range(mm, , start, size, blocks);
+    if (err)
+    return -ENOSPC;
+
+    return 0;
+}
+
+static int __alloc_contiguous_block_from_peer(struct drm_buddy *mm,
+  u64 size,
+  u64 min_block_size,
+  struct drm_buddy_block *block,
+  struct list_head *blocks)
+{
+    struct drm_buddy_block *first, *peer, *tmp;
+    struct drm_buddy_block *parent = NULL;
+    u64 start, offset = 0;
+    unsigned int order;
+    LIST_HEAD(dfs);
+    int err;
+
+    if (!block)
+    return -EINVAL;
+
+    order = drm_buddy_block_order(block);
+    /* Add freelist block to dfs list */
+    list_add(>tmp_link, );
+
+    tmp = block;
+    parent = block->parent;
+    while (parent) {
+    if (block->parent->left == block) {
+    if (parent->left != tmp) {
+    peer = parent->left;
+    break;
+    }
+    } else {
+    if (parent->right != tmp) {
+    peer = parent->right;
+    break;
+    }
+    }
+
+    tmp = parent;
+    parent = tmp->parent;
+    }
+
+    if (!parent)
+    return -ENOSPC;
+
+    do {
+    if (drm_buddy_block_is_allocated(peer))
+    return -ENOSPC;
+    /* Exit loop if peer block order is equal to block order */
+    if (drm_buddy_block_order(peer) == order)
+    break;
+
+    if (drm_buddy_block_is_split(peer)) {
+    /* Traverse down to the block order level */
+    if (block->parent->left == block)
+    peer = peer->right;
+    else
+    peer = peer->left;
+    } else {
+    break;
+    }
+    } while (1);
+
+    if (block->parent->left == block) {
+    u64 remaining;
+
+    /* Compute the leftover size for allocation */
+    remaining = max((size - drm_buddy_block_size(mm, block)),
+    min_block_size);
+    if (!IS_ALIGNED(remaining, min_block_size))
+ 

Re: [PATCH v4 43/48] drm/ttm: introduce pool_shrink_rwsem

2023-08-22 Thread Daniel Vetter
On Mon, Aug 07, 2023 at 07:09:31PM +0800, Qi Zheng wrote:
> Currently, the synchronize_shrinkers() is only used by TTM pool. It only
> requires that no shrinkers run in parallel.
> 
> After we use RCU+refcount method to implement the lockless slab shrink,
> we can not use shrinker_rwsem or synchronize_rcu() to guarantee that all
> shrinker invocations have seen an update before freeing memory.
> 
> So we introduce a new pool_shrink_rwsem to implement a private
> synchronize_shrinkers(), so as to achieve the same purpose.
> 
> Signed-off-by: Qi Zheng 
> Reviewed-by: Muchun Song 

On the 5 drm patches (I counted 2 ttm and 3 drivers) for merging through
some other tree (since I'm assuming that's how this will land):

Acked-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/ttm/ttm_pool.c | 15 +++
>  include/linux/shrinker.h   |  2 --
>  mm/shrinker.c  | 15 ---
>  3 files changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index c9c9618c0dce..38b4c280725c 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -74,6 +74,7 @@ static struct ttm_pool_type global_dma32_uncached[MAX_ORDER 
> + 1];
>  static spinlock_t shrinker_lock;
>  static struct list_head shrinker_list;
>  static struct shrinker *mm_shrinker;
> +static DECLARE_RWSEM(pool_shrink_rwsem);
>  
>  /* Allocate pages of size 1 << order with the given gfp_flags */
>  static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t 
> gfp_flags,
> @@ -317,6 +318,7 @@ static unsigned int ttm_pool_shrink(void)
>   unsigned int num_pages;
>   struct page *p;
>  
> + down_read(_shrink_rwsem);
>   spin_lock(_lock);
>   pt = list_first_entry(_list, typeof(*pt), shrinker_list);
>   list_move_tail(>shrinker_list, _list);
> @@ -329,6 +331,7 @@ static unsigned int ttm_pool_shrink(void)
>   } else {
>   num_pages = 0;
>   }
> + up_read(_shrink_rwsem);
>  
>   return num_pages;
>  }
> @@ -572,6 +575,18 @@ void ttm_pool_init(struct ttm_pool *pool, struct device 
> *dev,
>  }
>  EXPORT_SYMBOL(ttm_pool_init);
>  
> +/**
> + * synchronize_shrinkers - Wait for all running shrinkers to complete.
> + *
> + * This is useful to guarantee that all shrinker invocations have seen an
> + * update, before freeing memory, similar to rcu.
> + */
> +static void synchronize_shrinkers(void)
> +{
> + down_write(_shrink_rwsem);
> + up_write(_shrink_rwsem);
> +}
> +
>  /**
>   * ttm_pool_fini - Cleanup a pool
>   *
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index c55c07c3f0cb..025c8070dd86 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -103,8 +103,6 @@ struct shrinker *shrinker_alloc(unsigned int flags, const 
> char *fmt, ...);
>  void shrinker_register(struct shrinker *shrinker);
>  void shrinker_free(struct shrinker *shrinker);
>  
> -extern void synchronize_shrinkers(void);
> -
>  #ifdef CONFIG_SHRINKER_DEBUG
>  extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker,
> const char *fmt, ...);
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index 3ab301ff122d..a27779ed3798 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -650,18 +650,3 @@ void shrinker_free(struct shrinker *shrinker)
>   kfree(shrinker);
>  }
>  EXPORT_SYMBOL_GPL(shrinker_free);
> -
> -/**
> - * synchronize_shrinkers - Wait for all running shrinkers to complete.
> - *
> - * This is equivalent to calling unregister_shrink() and register_shrinker(),
> - * but atomically and with less overhead. This is useful to guarantee that 
> all
> - * shrinker invocations have seen an update, before freeing memory, similar 
> to
> - * rcu.
> - */
> -void synchronize_shrinkers(void)
> -{
> - down_write(_rwsem);
> - up_write(_rwsem);
> -}
> -EXPORT_SYMBOL(synchronize_shrinkers);
> -- 
> 2.30.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v8 2/7] phy: Add HDMI configuration options

2023-08-22 Thread Vinod Koul
On 17-08-23, 13:05, Dmitry Baryshkov wrote:
> On 08/08/2023 11:32, Sandor Yu wrote:
> > Allow HDMI PHYs to be configured through the generic
> > functions through a custom structure added to the generic union.
> > 
> > The parameters added here are based on HDMI PHY
> > implementation practices.  The current set of parameters
> > should cover the potential users.
> > 
> > Signed-off-by: Sandor Yu 
> > ---
> >   include/linux/phy/phy-hdmi.h | 24 
> >   include/linux/phy/phy.h  |  7 ++-
> >   2 files changed, 30 insertions(+), 1 deletion(-)
> >   create mode 100644 include/linux/phy/phy-hdmi.h
> 
> I think this looks good now, thank you!
> 
> Reviewed-by: Dmitry Baryshkov 

Should this go thru drm or phy...?

> 
> -- 
> With best wishes
> Dmitry

-- 
~Vinod


Re: [PATCH v4 2/3] phy: qcom: qmp-combo: switch to DRM_AUX_BRIDGE

2023-08-22 Thread Vinod Koul
On 17-08-23, 17:55, Dmitry Baryshkov wrote:
> Switch to using the new DRM_AUX_BRIDGE helper to create the
> transparent DRM bridge device instead of handcoding corresponding
> functionality.

Acked-by: Vinod Koul 

-- 
~Vinod


Re: [PATCH v5] drm/i915: Avoid circular locking dependency when flush delayed work on gt reset

2023-08-22 Thread Daniel Vetter
On Fri, Aug 11, 2023 at 11:20:11AM -0700, Zhanjun Dong wrote:
> This attempts to avoid a circular locking dependency between flush delayed
> work and intel_gt_reset.
> When intel_gt_reset is called, the task will hold a lock.
> To cancel the delayed work here, the _sync version will also acquire a lock,
> which might trigger the possible circular locking dependency warning.
> When intel_gt_reset is called, the reset_in_progress flag will be set; add code
> to check the flag and call the async version if a reset is in progress.
> 
> Signed-off-by: Zhanjun Dong 
> Cc: John Harrison 
> Cc: Andi Shyti 
> Cc: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index a0e3ef1c65d2..600388c849f7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1359,7 +1359,16 @@ static void guc_enable_busyness_worker(struct 
> intel_guc *guc)
>  
>  static void guc_cancel_busyness_worker(struct intel_guc *guc)
>  {
> - cancel_delayed_work_sync(>timestamp.work);
> + /*
> +  * When intel_gt_reset is called, the task will hold a lock.
> +  * To cancel the delayed work here, the _sync version will also acquire
> +  * a lock, which might trigger the possible circular locking dependency
> +  * warning.

This is not even close to a locking bugfix. Consider this a formal nack,
because the issue here is not even close to "needs more comments to
explain what's going on".
-Daniel

> +  * Check the reset_in_progress flag and call the async version if a
> +  * reset is in progress.
> +  */
> + if (guc_to_gt(guc)->uc.reset_in_progress)
> + cancel_delayed_work(>timestamp.work);
> + else
> + cancel_delayed_work_sync(>timestamp.work);
>  }
>  
>  static void __reset_guc_busyness_stats(struct intel_guc *guc)
> -- 
> 2.34.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] drm/dp_mst: Fix NULL deref in get_mst_branch_device_by_guid_helper()

2023-08-22 Thread Radosław Biernacki
On Wed, 16 Aug 2023 at 11:08, Lukasz Majczak wrote:
>
> On Thu, 3 Aug 2023 at 11:23, Lukasz Majczak wrote:
> >
> > Check mgr->mst_primary, before passing it to
> > the get_mst_branch_device_by_guid_helper(), otherwise NULL dereference
> > may occur in the call to memcpy() and cause:
> >
> > [12579.365869] BUG: kernel NULL pointer dereference, address: 
> > 0049
> > [12579.365878] #PF: supervisor read access in kernel mode
> > [12579.365880] #PF: error_code(0x) - not-present page
> > [12579.365882] PGD 0 P4D 0
> > [12579.365887] Oops:  [#1] PREEMPT SMP NOPTI
> > ...
> > [12579.365895] Workqueue: events_long drm_dp_mst_up_req_work
> > [12579.365899] RIP: 0010:memcmp+0xb/0x29
> > [12579.365921] Call Trace:
> > [12579.365927] get_mst_branch_device_by_guid_helper+0x22/0x64
> > [12579.365930] drm_dp_mst_up_req_work+0x137/0x416
> > [12579.365933] process_one_work+0x1d0/0x419
> > [12579.365935] worker_thread+0x11a/0x289
> > [12579.365938] kthread+0x13e/0x14f
> > [12579.365941] ? process_one_work+0x419/0x419
> > [12579.365943] ? kthread_blkcg+0x31/0x31
> > [12579.365946] ret_from_fork+0x1f/0x30
> >
> > Similar check is done in e.g: drm_dp_mst_topology_get_mstb_validated().
> >
> > Fixes: 5e93b8208d3c ("drm/dp/mst: move GUID storage from mgr, port to only 
> > mst branch")
> > Cc:  # 4.14+
> > Signed-off-by: Lukasz Majczak 
> > ---
> >  drivers/gpu/drm/display/drm_dp_mst_topology.c | 16 
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c 
> > b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > index ed96cfcfa304..703cd97b1d11 100644
> > --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > @@ -2595,19 +2595,19 @@ static struct drm_dp_mst_branch *
> >  drm_dp_get_mst_branch_device_by_guid(struct drm_dp_mst_topology_mgr *mgr,
> >  const uint8_t *guid)
> >  {
> > -   struct drm_dp_mst_branch *mstb;
> > +   struct drm_dp_mst_branch *mstb = NULL;
> > int ret;
> >
> > /* find the port by iterating down */
> > mutex_lock(>lock);
> > -
> > -   mstb = get_mst_branch_device_by_guid_helper(mgr->mst_primary, guid);
> > -   if (mstb) {
> > -   ret = drm_dp_mst_topology_try_get_mstb(mstb);
> > -   if (!ret)
> > -   mstb = NULL;
> > +   if (mgr->mst_primary) {

One suggestion which just came to my mind:
get_mst_branch_device_by_guid_helper() is a recursive function.
This condition might be moved inside that function, as its first line.
That way we would have a single condition, and the similar check used to
step over NULL elements inside the recursive call could be removed, since
NULL would become an acceptable parameter value and there would be no need
to check for it here.
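
Something along those lines, perhaps (an untested sketch of that
suggestion, modelled on the existing helper):

static struct drm_dp_mst_branch *
get_mst_branch_device_by_guid_helper(struct drm_dp_mst_branch *mstb,
				     const uint8_t *guid)
{
	struct drm_dp_mst_branch *found_mstb;
	struct drm_dp_mst_port *port;

	/* NULL becomes an acceptable argument, so neither the top-level
	 * caller nor the recursive calls below need a separate check. */
	if (!mstb)
		return NULL;

	if (memcmp(mstb->guid, guid, 16) == 0)
		return mstb;

	list_for_each_entry(port, &mstb->ports, next) {
		found_mstb = get_mst_branch_device_by_guid_helper(port->mstb,
								  guid);
		if (found_mstb)
			return found_mstb;
	}

	return NULL;
}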

> > +   mstb = 
> > get_mst_branch_device_by_guid_helper(mgr->mst_primary, guid);
> > +   if (mstb) {
> > +   ret = drm_dp_mst_topology_try_get_mstb(mstb);
> > +   if (!ret)
> > +   mstb = NULL;
> > +   }
> > }
> > -
> > mutex_unlock(>lock);
> > return mstb;
> >  }
> > --
> > 2.41.0.640.ga95def55d0-goog
> >
> Hi,
>
> Is there anything more I should do regarding these changes?
>
> Best regards,
> Lukasz


Re: [Intel-gfx] [PATCH] drm/display/dp: Fix the DP DSC Receiver cap size

2023-08-22 Thread Jani Nikula
On Fri, 18 Aug 2023, Ankit Nautiyal  wrote:
> DP DSC Receiver Capabilities are exposed via DPCD 60h-6Fh.
> Fix the DSC RECEIVER CAP SIZE accordingly.
>
> Fixes: ffddc4363c28 ("drm/dp: Add DP DSC DPCD receiver capability size define 
> and missing SHIFT")
> Cc: Anusha Srivatsa 
> Cc: Manasi Navare 
> Cc:  # v5.0+
>
> Signed-off-by: Ankit Nautiyal 
> Reviewed-by: Stanislav Lisovskiy 

Thanks for the patch and review, pushed to drm-misc-fixes.

BR,
Jani.

> ---
>  include/drm/display/drm_dp.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/drm/display/drm_dp.h b/include/drm/display/drm_dp.h
> index 02f2ac4dd2df..e69cece404b3 100644
> --- a/include/drm/display/drm_dp.h
> +++ b/include/drm/display/drm_dp.h
> @@ -1537,7 +1537,7 @@ enum drm_dp_phy {
>  
>  #define DP_BRANCH_OUI_HEADER_SIZE0xc
>  #define DP_RECEIVER_CAP_SIZE 0xf
> -#define DP_DSC_RECEIVER_CAP_SIZE0xf
> +#define DP_DSC_RECEIVER_CAP_SIZE0x10 /* DSC Capabilities 0x60 
> through 0x6F */
>  #define EDP_PSR_RECEIVER_CAP_SIZE2
>  #define EDP_DISPLAY_CTL_CAP_SIZE 3
>  #define DP_LTTPR_COMMON_CAP_SIZE 8

-- 
Jani Nikula, Intel Open Source Graphics Center


[PATCH] drm/mediatek: Add spinlock for setting vblank event in atomic_begin

2023-08-22 Thread Jason-JH . Lin
Add spinlock protection to avoid race condition on vblank event
between mtk_drm_crtc_atomic_begin() and mtk_drm_finish_page_flip().

Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.")
Signed-off-by: Jason-JH.Lin 
---
 drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c 
b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
index d40142842f85..128a672fe3c9 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
@@ -746,6 +746,9 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc,
  crtc);
struct mtk_crtc_state *mtk_crtc_state = to_mtk_crtc_state(crtc_state);
struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc);
+   unsigned long flags;
+
+   spin_lock_irqsave(>dev->event_lock, flags);
 
if (mtk_crtc->event && mtk_crtc_state->base.event)
DRM_ERROR("new event while there is still a pending event\n");
@@ -756,6 +759,8 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc,
mtk_crtc->event = mtk_crtc_state->base.event;
mtk_crtc_state->base.event = NULL;
}
+
+   spin_unlock_irqrestore(>dev->event_lock, flags);
 }
 
 static void mtk_drm_crtc_atomic_flush(struct drm_crtc *crtc,
-- 
2.18.0



Re: [Linaro-mm-sig] [PATCH v2] dma-buf/sw_sync: Avoid recursive lock during fence signal

2023-08-22 Thread Christian König

On 18.08.23 at 16:59, Rob Clark wrote:

From: Rob Clark 

If a signal callback releases the sw_sync fence, that will trigger a
deadlock as the timeline_fence_release recurses onto the fence->lock
(used both for signaling and the timeline tree).

To avoid that, temporarily hold an extra reference to the signalled
fences until after we drop the lock.

(This is an alternative implementation of 
https://patchwork.kernel.org/patch/11664717/
which avoids some potential UAF issues with the original patch.)

v2: Remove now obsolete comment, use list_move_tail() and
 list_del_init()

Reported-by: Bas Nieuwenhuizen 
Fixes: d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free")
Signed-off-by: Rob Clark 


Reviewed-by: Christian König 


---
  drivers/dma-buf/sw_sync.c | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 63f0aeb66db6..f0a35277fd84 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -191,6 +191,7 @@ static const struct dma_fence_ops timeline_fence_ops = {
   */
  static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
  {
+   LIST_HEAD(signalled);
struct sync_pt *pt, *next;
  
  	trace_sync_timeline(obj);

@@ -203,21 +204,20 @@ static void sync_timeline_signal(struct sync_timeline 
*obj, unsigned int inc)
if (!timeline_fence_signaled(>base))
break;
  
-		list_del_init(>link);

+   dma_fence_get(>base);
+
+   list_move_tail(>link, );
rb_erase(>node, >pt_tree);
  
-		/*

-* A signal callback may release the last reference to this
-* fence, causing it to be freed. That operation has to be
-* last to avoid a use after free inside this loop, and must
-* be after we remove the fence from the timeline in order to
-* prevent deadlocking on timeline->lock inside
-* timeline_fence_release().
-*/
dma_fence_signal_locked(>base);
}
  
  	spin_unlock_irq(>lock);

+
+   list_for_each_entry_safe(pt, next, , link) {
+   list_del_init(>link);
+   dma_fence_put(>base);
+   }
  }
  
  /**




Re: [PATCH v14 RESEND 5/6] drm/imx: Introduce i.MX8qm/qxp DPU DRM

2023-08-22 Thread Maxime Ripard
Hi,

Aside from the discussion on the binding and the general architecture, I
have some comments there.

On Tue, Aug 22, 2023 at 04:59:48PM +0800, Liu Ying wrote:
> +int dpu_cf_init(struct dpu_soc *dpu, unsigned int index,
> + unsigned int id, enum dpu_unit_type type,
> + unsigned long pec_base, unsigned long base)
> +{
> + struct dpu_constframe *cf;
> +
> + cf = devm_kzalloc(dpu->dev, sizeof(*cf), GFP_KERNEL);
> + if (!cf)
> + return -ENOMEM;
> +
> + dpu->cf_priv[index] = cf;

You can't store structures related to KMS in a device managed structure.
The DRM KMS device will stick around (and be accessible from userspace)
after the device has been removed until the last application closed its
file descriptor to the device.

This can be checked by enabling KASAN and manually unbinding the driver
through sysfs.
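
One way out is a DRM-managed allocation instead (a sketch only, reusing
the driver's dpu_constframe type and assuming the usual drmm_* helpers):

#include <drm/drm_managed.h>

/* Sketch: tie the allocation to the drm_device lifetime instead of the
 * bound device, so it stays valid until the last file descriptor to the
 * DRM device is closed. */
static struct dpu_constframe *dpu_cf_alloc(struct drm_device *drm)
{
	return drmm_kzalloc(drm, sizeof(struct dpu_constframe), GFP_KERNEL);
}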

> + cf->pec_base = devm_ioremap(dpu->dev, pec_base, SZ_16);
> + if (!cf->pec_base)
> + return -ENOMEM;
> +
> + cf->base = devm_ioremap(dpu->dev, base, SZ_32);
> + if (!cf->base)
> + return -ENOMEM;

For the same reason, you need to protect any access to a device managed
resource (so clocks, registers, regulators, etc.) by a call to
drm_dev_enter/drm_dev_exit and you need to call drm_dev_unplug instead
of drm_dev_unregister.
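
For example (a sketch only, with the drm_device passed in explicitly
rather than looked up through a driver accessor):

#include <drm/drm_drv.h>
#include <linux/io.h>

/* Sketch: skip the hardware access once the device has been unplugged. */
static void dpu_cf_write(struct dpu_constframe *cf, struct drm_device *drm,
			 unsigned int offset, u32 val)
{
	int idx;

	if (!drm_dev_enter(drm, &idx))
		return;		/* device gone, registers unmapped */

	writel(val, cf->base + offset);

	drm_dev_exit(idx);
}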

> +static int dpu_crtc_pm_runtime_get_sync(struct dpu_crtc *dpu_crtc)
> +{
> + int ret;
> +
> + ret = pm_runtime_get_sync(dpu_crtc->dev->parent);
> + if (ret < 0) {
> + pm_runtime_put_noidle(dpu_crtc->dev->parent);
> + dpu_crtc_err(_crtc->base,
> +  "failed to get parent device RPM sync: %d\n", ret);
> + }
> +
> + return ret;
> +}

That's pm_runtime_resume_and_get.
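
i.e. the open-coded get_sync()/put_noidle() dance above is exactly what
that helper does, so the wrapper shrinks to roughly this (a sketch, not
the actual patch):

    static int dpu_crtc_pm_runtime_get_sync(struct dpu_crtc *dpu_crtc)
    {
            int ret;

            /* resumes the parent; drops the usage count again on failure */
            ret = pm_runtime_resume_and_get(dpu_crtc->dev->parent);
            if (ret < 0)
                    dpu_crtc_err(&dpu_crtc->base,
                                 "failed to resume parent device: %d\n", ret);

            return ret;
    }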

> +static int dpu_crtc_pm_runtime_put(struct dpu_crtc *dpu_crtc)
> +{
> + int ret;
> +
> + ret = pm_runtime_put(dpu_crtc->dev->parent);
> + if (ret < 0)
> + dpu_crtc_err(&dpu_crtc->base,
> +  "failed to put parent device RPM: %d\n", ret);
> +
> + return ret;
> +}
> +
> +static void dpu_crtc_mode_set_nofb(struct drm_crtc *crtc)
> +{
> + struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> + struct drm_display_mode *adj = &crtc->state->adjusted_mode;
> + enum dpu_link_id cf_link;
> +
> + dpu_crtc_dbg(crtc, "mode " DRM_MODE_FMT "\n", DRM_MODE_ARG(adj));
> +
> + /* request power-on when we start to set mode for CRTC */
> + dpu_crtc_pm_runtime_get_sync(dpu_crtc);

From the drm_crtc_helper_funcs documentation:

"""
 * Note that the display pipe is completely off when this function is
 * called. Atomic drivers which need hardware to be running before they
 * program the new display mode (e.g. because they implement runtime PM)
 * should not use this hook. This is because the helper library calls
 * this hook only once per mode change and not every time the display
 * pipeline is suspended using either DPMS or the new "ACTIVE" property.
 * Which means register values set in this callback might get reset when
 * the CRTC is suspended, but not restored.  Such drivers should instead
 * move all their CRTC setup into the @atomic_enable callback.
"""

> +static void dpu_crtc_atomic_enable(struct drm_crtc *crtc,
> +struct drm_atomic_state *state)
> +{
> + struct dpu_crtc *dpu_crtc = to_dpu_crtc(crtc);
> + unsigned long flags;
> +
> + drm_crtc_vblank_on(crtc);
> +
> + enable_irq(dpu_crtc->dec_shdld_irq);
> + enable_irq(dpu_crtc->ed_cont_shdld_irq);
> + enable_irq(dpu_crtc->ed_safe_shdld_irq);
> +
> + dpu_fg_enable_clock(dpu_crtc->fg);
> + dpu_ed_pec_sync_trigger(dpu_crtc->ed_cont);
> + dpu_ed_pec_sync_trigger(dpu_crtc->ed_safe);
> + if (crtc->state->gamma_lut)
> + dpu_crtc_set_gammacor(dpu_crtc);
> + else
> + dpu_crtc_disable_gammacor(dpu_crtc);
> + dpu_fg_shdtokgen(dpu_crtc->fg);
> +
> + /* don't relinquish CPU until TCON is set to operation mode */
> + local_irq_save(flags);
> + preempt_disable();
> + dpu_fg_enable(dpu_crtc->fg);

That's super fishy. You shouldn't need that, at all. What is going on
there?

> +
> + /*
> +  * TKT320590:

Those are NXP internal references as far as I can tell. They
shouldn't be here.

> +  * Turn TCON into operation mode as soon as the first dumb
> +  * frame is generated by DPU(we don't relinquish CPU to ensure
> +  * this).  This makes DPR/PRG be able to evade the frame.
> +  */
> + DPU_CRTC_WAIT_FOR_FRAMEGEN_FRAME_CNT_MOVING(dpu_crtc->fg);
> + dpu_tcon_set_operation_mode(dpu_crtc->tcon);
> + local_irq_restore(flags);
> + preempt_enable();
> +
> + DPU_CRTC_WAIT_FOR_COMPLETION_TIMEOUT(ed_safe_shdld_done);
> + DPU_CRTC_WAIT_FOR_COMPLETION_TIMEOUT(ed_cont_shdld_done);
> + DPU_CRTC_WAIT_FOR_COMPLETION_TIMEOUT(dec_shdld_done);
> +
> + 

Re: [PATCH v2 4/7] drm/amdgpu: Add suspend function to clear the GPU power profile.

2023-08-22 Thread Yadav, Arvind



On 8/22/2023 6:24 PM, Lazar, Lijo wrote:



On 8/22/2023 5:52 PM, Yadav, Arvind wrote:


On 8/22/2023 12:01 PM, Lazar, Lijo wrote:



On 8/21/2023 12:17 PM, Arvind Yadav wrote:

This patch adds a suspend function that clears the GPU
power profile before going into the suspend state.

v2:
- Add the new suspend function based on review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 23 
+++

  drivers/gpu/drm/amd/include/amdgpu_workload.h |  2 ++
  3 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index cd3bf641b630..3b70e657b439 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4212,6 +4212,8 @@ int amdgpu_device_suspend(struct drm_device 
*dev, bool fbcon)

    amdgpu_ras_suspend(adev);
  +    amdgpu_workload_profile_suspend(adev);
+
  amdgpu_device_ip_suspend_phase1(adev);
    if (!adev->in_s0ix)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

index 6367eb88a44d..44ca8e986984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -174,6 +174,29 @@ void amdgpu_workload_profile_set(struct 
amdgpu_device *adev,

  mutex_unlock(&workload->workload_lock);
  }
  +void amdgpu_workload_profile_suspend(struct amdgpu_device *adev)
+{
+    struct amdgpu_smu_workload *workload = &adev->smu_workload;
+    int ret;
+
+    mutex_lock(&workload->workload_lock);
+    cancel_delayed_work_sync(&workload->smu_delayed_work);


Another deadlock candidate. Between fini() and suspend(), the only
difference is probably the initialization status. If so, just use one
helper that both fini() and suspend() call.


Before going into suspend(), we need to cancel the work and clear all
the profiles, but in fini() we are also destroying the mutex, and fini()
is only called when we are unloading everything.




What I meant is that for both suspend() and fini() you need to cancel
any scheduled work, clear the refcounts and set the profile back to the
default profile. Keep that in a helper and reuse it.
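
Something along these lines, as a sketch only (the helper name is made
up, and the work is cancelled before taking the lock to sidestep the
deadlock mentioned above):

    static void amdgpu_workload_profile_reset(struct amdgpu_device *adev)
    {
            struct amdgpu_smu_workload *workload = &adev->smu_workload;

            /* cancel before taking the lock in case the work itself grabs it */
            cancel_delayed_work_sync(&workload->smu_delayed_work);

            mutex_lock(&workload->workload_lock);
            for (int index = fls(workload->submit_workload_status);
                 index > 0; index--) {
                    if (workload->submit_workload_status & (1 << index)) {
                            atomic_set(&workload->power_profile_ref[index], 0);
                            amdgpu_power_profile_clear(adev, index);
                    }
            }
            workload->submit_workload_status = 0;
            mutex_unlock(&workload->workload_lock);
    }

suspend() would then be just a call to this helper, and fini() would
call it before destroying the mutex.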



Noted.

Thank you,
~Arvind


Thanks,
Lijo


~Arvind


Thanks,
Lijo


+
+    /* Clear all the set GPU power profile */
+    for (int index = fls(workload->submit_workload_status);
+         index > 0; index--) {
+        if (workload->submit_workload_status & (1 << index)) {
+            atomic_set(&workload->power_profile_ref[index], 0);
+            ret = amdgpu_power_profile_clear(adev, index);
+            if (ret)
+                DRM_WARN("Failed to clear power profile %s, err = %d\n",
+                         amdgpu_workload_mode_name[index], ret);
+        }
+    }
+    workload->submit_workload_status = 0;
+    mutex_unlock(&workload->workload_lock);
+}
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev)
  {
  adev->smu_workload.adev = adev;
diff --git a/drivers/gpu/drm/amd/include/amdgpu_workload.h 
b/drivers/gpu/drm/amd/include/amdgpu_workload.h

index ee1f87257f2d..0acd8769ec52 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_workload.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_workload.h
@@ -52,6 +52,8 @@ void amdgpu_workload_profile_put(struct 
amdgpu_device *adev,

  void amdgpu_workload_profile_set(struct amdgpu_device *adev,
   uint32_t ring_type);
  +void amdgpu_workload_profile_suspend(struct amdgpu_device *adev);
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev);
    void amdgpu_workload_profile_fini(struct amdgpu_device *adev);


Re: [PATCH v2 4/7] drm/amdgpu: Add suspend function to clear the GPU power profile.

2023-08-22 Thread Lazar, Lijo




On 8/22/2023 5:52 PM, Yadav, Arvind wrote:


On 8/22/2023 12:01 PM, Lazar, Lijo wrote:



On 8/21/2023 12:17 PM, Arvind Yadav wrote:

This patch adds a suspend function that clears the GPU
power profile before going into the suspend state.

v2:
- Add the new suspend function based on review comment.

Cc: Shashank Sharma 
Cc: Christian Koenig 
Cc: Alex Deucher 
Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c  | 23 +++
  drivers/gpu/drm/amd/include/amdgpu_workload.h |  2 ++
  3 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index cd3bf641b630..3b70e657b439 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4212,6 +4212,8 @@ int amdgpu_device_suspend(struct drm_device 
*dev, bool fbcon)

    amdgpu_ras_suspend(adev);
  +    amdgpu_workload_profile_suspend(adev);
+
  amdgpu_device_ip_suspend_phase1(adev);
    if (!adev->in_s0ix)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c

index 6367eb88a44d..44ca8e986984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_workload.c
@@ -174,6 +174,29 @@ void amdgpu_workload_profile_set(struct 
amdgpu_device *adev,

  mutex_unlock(&workload->workload_lock);
  }
  +void amdgpu_workload_profile_suspend(struct amdgpu_device *adev)
+{
+    struct amdgpu_smu_workload *workload = &adev->smu_workload;
+    int ret;
+
+    mutex_lock(&workload->workload_lock);
+    cancel_delayed_work_sync(&workload->smu_delayed_work);


Another deadlock candidate. Between fini() and suspend(), the only
difference is probably the initialization status. If so, just use one
helper that both fini() and suspend() call.


Before going into suspend(), we need to cancel the work and clear all
the profiles, but in fini() we are also destroying the mutex, and fini()
is only called when we are unloading everything.




What I meant is that for both suspend() and fini() you need to cancel
any scheduled work, clear the refcounts and set the profile back to the
default profile. Keep that in a helper and reuse it.


Thanks,
Lijo


~Arvind


Thanks,
Lijo


+
+    /* Clear all the set GPU power profile */
+    for (int index = fls(workload->submit_workload_status);
+         index > 0; index--) {
+        if (workload->submit_workload_status & (1 << index)) {
+            atomic_set(&workload->power_profile_ref[index], 0);
+            ret = amdgpu_power_profile_clear(adev, index);
+            if (ret)
+                DRM_WARN("Failed to clear power profile %s, err = %d\n",
+                         amdgpu_workload_mode_name[index], ret);
+        }
+    }
+    workload->submit_workload_status = 0;
+    mutex_unlock(&workload->workload_lock);
+}
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev)
  {
  adev->smu_workload.adev = adev;
diff --git a/drivers/gpu/drm/amd/include/amdgpu_workload.h 
b/drivers/gpu/drm/amd/include/amdgpu_workload.h

index ee1f87257f2d..0acd8769ec52 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_workload.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_workload.h
@@ -52,6 +52,8 @@ void amdgpu_workload_profile_put(struct 
amdgpu_device *adev,

  void amdgpu_workload_profile_set(struct amdgpu_device *adev,
   uint32_t ring_type);
  +void amdgpu_workload_profile_suspend(struct amdgpu_device *adev);
+
  void amdgpu_workload_profile_init(struct amdgpu_device *adev);
    void amdgpu_workload_profile_fini(struct amdgpu_device *adev);


Re: TODO list task: Replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi

2023-08-22 Thread Jani Nikula
On Tue, 22 Aug 2023, Sharq Mohammad  wrote:
> Hello All,
>
> I am a regular kernel developer and wanted to contribute to open source.
> I saw a small TODO list in the DRM graphics subsystem, with some tasks.
> So I just wanted to ask: is anyone working on the task
> *Replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi*?
>
> It's on the TODO list.

Yeah, I've got a branch:

https://gitlab.freedesktop.org/jani/linux/-/commits/drm-edid-is-hdmi-has-audio
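
For reference, the conversion the TODO item asks for is usually
mechanical; a hypothetical driver snippet (assuming the connector's EDID
has already been parsed, e.g. by drm_add_edid_modes()):

    #include <drm/drm_connector.h>
    #include <drm/drm_edid.h>

    /* before: re-parses the raw EDID on every call */
    static bool sink_is_hdmi_old(const struct edid *edid)
    {
            return drm_detect_hdmi_monitor(edid);
    }

    /* after: uses the value cached in connector->display_info while
     * the EDID was parsed
     */
    static bool sink_is_hdmi(struct drm_connector *connector)
    {
            return connector->display_info.is_hdmi;
    }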

BR,
Jani.


>
> Thanks and regards,
> Sharique

-- 
Jani Nikula, Intel Open Source Graphics Center

