Re: [PATCH 08/11] ARM: dts: DRA7xx: Add device tree entry for SGX GPU

2024-01-25 Thread Tony Lindgren
* Tony Lindgren  [240118 08:57]:
> * Andrew Davis  [240117 15:52]:
> > On 1/10/24 2:29 AM, Tony Lindgren wrote:
> > > * Andrew Davis  [240109 17:20]:
> > > > --- a/arch/arm/boot/dts/ti/omap/dra7.dtsi
> > > > +++ b/arch/arm/boot/dts/ti/omap/dra7.dtsi
> > > > @@ -850,12 +850,19 @@ target-module@56000000 {
> > > > <SYSC_IDLE_SMART>;
> > > > ti,sysc-sidle = <SYSC_IDLE_FORCE>,
> > > > <SYSC_IDLE_NO>,
> > > > -   <SYSC_IDLE_SMART>;
> > > > +   <SYSC_IDLE_SMART>,
> > > > +   <SYSC_IDLE_SMART_WKUP>;
> > > 
> > > You probably checked this already.. But just in case, can you please
> > > confirm this is intentional. The documentation lists the smart wakeup
> > > capability bit as reserved for dra7, maybe the documentation is wrong.
> > > 
> > 
> > It was an intentional change, although I'm not sure it is correct :)
> > 
> > This is how we had it in our "evil vendor tree" for years (back when it
> > was hwmod based), so when converting these nodes to use "ti,sysc" I noticed
> > this bit was set, but as you point out the documentation disagrees.
> > 
> > I'd rather go with what has worked before, but it doesn't seem to
> > break anything either way, so we could also break this change out into
> > its own patch if you would prefer.
> 
> I agree it's best to stick with what is known to work. How about we add
> the related information to the patch description?

I'll update the commit message for it and apply these, no need to repost.

Regards,

Tony


[PATCH] dt-bindings: display: bridge: it6505: Add #sound-dai-cells

2024-01-25 Thread Chen-Yu Tsai
The ITE IT6505 display bridge can take one I2S input and transmit it
over the DisplayPort link.

Add #sound-dai-cells (= 0) to the binding for it.

Signed-off-by: Chen-Yu Tsai 
---
The driver side changes [1] are still being worked on, but given the
hardware is very simple, it would be nice if we could land the binding
first and be able to introduce device trees that have this.

[1] 
https://lore.kernel.org/linux-arm-kernel/20230730180803.22570-4-jiaxin...@mediatek.com/

 .../devicetree/bindings/display/bridge/ite,it6505.yaml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml 
b/Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml
index 348b02f26041..7ec4decc9c21 100644
--- a/Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml
+++ b/Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml
@@ -52,6 +52,9 @@ properties:
     maxItems: 1
     description: extcon specifier for the Power Delivery
 
+  "#sound-dai-cells":
+    const: 0
+
   ports:
     $ref: /schemas/graph.yaml#/properties/ports
 
-- 
2.43.0.429.g432eaa2c6b-goog



RE: [PATCH 2/3] udmabuf: Sync buffer mappings for attached devices

2024-01-25 Thread Kasireddy, Vivek
> >> Currently this driver creates a SGT table using the CPU as the
> >> target device, then performs the dma_sync operations against
> >> that SGT. This is backwards to how DMA-BUFs are supposed to behave.
> >> This may have worked for the case where these buffers were given
> >> only back to the same CPU that produced them as in the QEMU case.
> >> And only then because the original author had the dma_sync
> >> operations also backwards, syncing for the "device" on begin_cpu.
> >> This was noticed and "fixed" in this patch[0].
> >>
> >> That then meant we were sync'ing from the CPU to the CPU using
> >> a pseudo-device "miscdevice". Which then caused another issue
> >> due to the miscdevice not having a proper DMA mask (and why should
> >> it, the CPU is not a DMA device). The fix for that was an even
> >> more egregious hack[1] that declares the CPU is coherent with
> >> itself and can access its own memory space..
> >>
> >> Unwind all this and perform the correct action by doing the dma_sync
> >> operations for each device currently attached to the backing buffer.
> > Makes sense.
> >
> >>
> >> [0] commit 1ffe09590121 ("udmabuf: fix dma-buf cpu access")
> >> [1] commit 9e9fa6a9198b ("udmabuf: Set the DMA mask for the udmabuf
> >> device (v2)")
> >>
> >> Signed-off-by: Andrew Davis 
> >> ---
> >>   drivers/dma-buf/udmabuf.c | 41 +++
> >>   1 file changed, 16 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> >> index 3a23f0a7d112a..ab6764322523c 100644
> >> --- a/drivers/dma-buf/udmabuf.c
> >> +++ b/drivers/dma-buf/udmabuf.c
> >> @@ -26,8 +26,6 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a
> >> dmabuf, in megabytes. Default is
> >>   struct udmabuf {
> >>pgoff_t pagecount;
> >>struct page **pages;
> >> -  struct sg_table *sg;
> >> -  struct miscdevice *device;
> >>struct list_head attachments;
> >>struct mutex lock;
> >>   };
> >> @@ -169,12 +167,8 @@ static void unmap_udmabuf(struct
> >> dma_buf_attachment *at,
> >>   static void release_udmabuf(struct dma_buf *buf)
> >>   {
> >>struct udmabuf *ubuf = buf->priv;
> >> -  struct device *dev = ubuf->device->this_device;
> >>pgoff_t pg;
> >>
> >> -  if (ubuf->sg)
> >> -  put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
> > What happens if the last importer maps the dmabuf but erroneously
> > closes it immediately? Would unmap somehow get called in this case?
> >
> 
> Good question, had to scan the framework code a bit here. I thought
> closing a DMABUF handle would automatically unwind any current
> attachments/mappings, but it seems nothing in the framework does that.
> 
> Looks like that is up to the importing drivers[0]:
> 
> > Once a driver is done with a shared buffer it needs to call
> > dma_buf_detach() (after cleaning up any mappings) and then
> > release the reference acquired with dma_buf_get() by
> > calling dma_buf_put().
> 
> So closing a DMABUF after mapping without first unmapping it would
> be a bug in the importer, it is not the exporter's problem to check
It may be a bug in the importer but wouldn't the memory associated
with the sg table and attachment get leaked if unmap doesn't get called
in this scenario?

Thanks,
Vivek

> for (although some more warnings in the framework checking for that
> might not be a bad idea..).
> 
> Andrew
> 
> [0] https://www.kernel.org/doc/html/v6.7/driver-api/dma-buf.html
> 
> > Thanks,
> > Vivek
> >
> >> -
> >>for (pg = 0; pg < ubuf->pagecount; pg++)
> >>put_page(ubuf->pages[pg]);
> >>kfree(ubuf->pages);
> >> @@ -185,33 +179,31 @@ static int begin_cpu_udmabuf(struct dma_buf
> >> *buf,
> >> enum dma_data_direction direction)
> >>   {
> >>struct udmabuf *ubuf = buf->priv;
> >> -  struct device *dev = ubuf->device->this_device;
> >> -  int ret = 0;
> >> -
> >> -  if (!ubuf->sg) {
> >> -  ubuf->sg = get_sg_table(dev, buf, direction);
> >> -  if (IS_ERR(ubuf->sg)) {
> >> -  ret = PTR_ERR(ubuf->sg);
> >> -  ubuf->sg = NULL;
> >> -  }
> >> -  } else {
> >> -  dma_sync_sg_for_cpu(dev, ubuf->sg->sgl, ubuf->sg->nents,
> >> -  direction);
> >> -  }
> >> +  struct udmabuf_attachment *a;
> >>
> >> -  return ret;
> >> +  mutex_lock(&ubuf->lock);
> >> +
> >> +  list_for_each_entry(a, &ubuf->attachments, list)
> >> +  dma_sync_sgtable_for_cpu(a->dev, a->table, direction);
> >> +
> >> +  mutex_unlock(&ubuf->lock);
> >> +
> >> +  return 0;
> >>   }
> >>
> >>   static int end_cpu_udmabuf(struct dma_buf *buf,
> >>   enum dma_data_direction direction)
> >>   {
> >>struct udmabuf *ubuf = buf->priv;
> >> -  struct device *dev = ubuf->device->this_device;
> >> +  struct udmabuf_attachment *a;
> >>
> >> -  if (!ubuf->sg)
> >> -  return -EINVAL;
> >> +  mutex_lock(&ubuf->lock);
> >> +
> >> +  list_for_each_entry(a, &ubuf->attachments, list)
> >> +  
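
For reference, the importer-side teardown order that the documentation
quoted above prescribes would look roughly like this (a sketch; "attach",
"sgt" and "buf" are illustrative local names, not code from this patch):

	/* after the device is done: clean up mappings first... */
	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
	/* ...then drop the attachment... */
	dma_buf_detach(buf, attach);
	/* ...then release the reference taken with dma_buf_get() */
	dma_buf_put(buf);

An importer that closes the buffer without the first two steps is exactly
the buggy case discussed above: the exporter never sees an unmap, so the
sg table and attachment memory would indeed be leaked until the importer
is fixed.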

Re: [PATCH v3] drm/ttm: Make sure the mapped tt pages are decrypted when needed

2024-01-25 Thread Zack Rusin
On Fri, Jan 5, 2024 at 8:51 AM Zack Rusin  wrote:
>
> Some drivers require the mapped tt pages to be decrypted. In an ideal
> world this would have been handled by the dma layer, but the TTM page
> fault handling would have to be rewritten to be able to do that.
>
> A side-effect of the TTM page fault handling is using a dma allocation
> per order (via ttm_pool_alloc_page) which makes it impossible to just
> trivially use dma_mmap_attrs. As a result ttm has to be very careful
> about trying to make its pgprot for the mapped tt pages match what
> the dma layer thinks it is. At the ttm layer it's possible to
> deduce the requirement to have tt pages decrypted by checking
> whether coherent dma allocations have been requested and the system
> is running with confidential computing technologies.
>
> This approach isn't ideal, but keeping TTM matching the DMA layer's
> expectations for the page properties is in general fragile; unfortunately
> a proper fix would require a rewrite of TTM's page fault handling.
>
> Fixes vmwgfx with SEV enabled.
>
> v2: Explicitly include cc_platform.h
> v3: Use CC_ATTR_GUEST_MEM_ENCRYPT instead of CC_ATTR_MEM_ENCRYPT to
> limit the scope to guests and log when memory decryption is enabled.

Hi, Christian.

Gentle ping on that one. This is probably the cleanest we can get this
code. Can we land this or is there anything else you'd like to see?

z
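
For the curious, the deduction described in the commit message boils down
to something like the following sketch (assuming the pool was created with
use_dma_alloc; an illustration, not the literal patch):

	#include <linux/cc_platform.h>

	/* tt pages need a decrypted mapping when the DMA layer handed us
	 * coherent allocations inside a memory-encrypted guest
	 */
	static bool ttm_tt_needs_decrypted(bool use_dma_alloc)
	{
		return use_dma_alloc &&
		       cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT);
	}

The resulting pgprot would then be adjusted with pgprot_decrypted() so the
CPU mapping matches what the DMA layer set up.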


RE: [PATCH] mm: Remove double faults once write a device pfn

2024-01-25 Thread Zhou, Xianrong
[AMD Official Use Only - General]

>  The vmf_insert_pfn_prot could cause unnecessary double faults
>  on a device pfn, because currently vmf_insert_pfn_prot does
>  not make the pfn writable, so the pte entry is normally
>  read-only or dirty-catching.
> >>> What? How did you get to this conclusion?
> >> Sorry. I did not mention that this problem only exists on the arm64
> >> platform.
> > Ok, that makes at least a little bit more sense.
> >
> >> Because on the arm64 platform the PTE_RDONLY bit is automatically
> >> attached to the userspace pte entries even with VM_WRITE +
> >> VM_SHARED.
> >> The PTE_RDONLY bit needs to be cleared in vmf_insert_pfn_prot.
> >> However vmf_insert_pfn_prot do not make the pte writable passing
> >> false @mkwrite to insert_pfn.
> > Question is why is arm64 doing this? As far as I can see they must
> > have some hardware reason for that.
> >
> > The mkwrite parameter to insert_pfn() was added by commit
> > b2770da642540 to make insert_pfn() look more like insert_pfn_pmd()
> > so that the DAX code can insert PTEs which are writable and dirty
> > at the same time.
>  This is one scenario to do so. In fact on arm64 there are many
>  scenarios that could do so. So we can let vmf_insert_pfn_prot
>  support @mkwrite for drivers at the core layer and let drivers
>  decide whether or not to make the pte writable and dirty at one
>  time. The patch does this. Otherwise we get double faults on arm64
>  when calling vmf_insert_pfn_prot.
> >>>
> >>> Well, that doesn't answer my question why arm64 is double faulting
> >>> in the first place,.
> >>>
> >>
> >> Eh.
> >>
> >> On arm64, when userspace mmap()s with PROT_WRITE and MAP_SHARED,
> >> the vma->vm_page_prot has the PTE_RDONLY and PTE_WRITE bits within
> >> PAGE_SHARED_EXEC. (see the arm64 protection_map)
>
> Well that's your observation, but not the explanation why arm64 is doing this.
>
> See this would have quite some negative impact on performance, not only for
> gfx drivers but in general.
>
> So either the observation is incorrect or there is a *really* good reason why
> arm64 is taking this performance penalty.
>
> >> When writing to the userspace virtual address, the first fault happens
> >> and calls into the driver's
> >> .fault->ttm_bo_vm_fault_reserved->vmf_insert_pfn_prot->insert_pfn.
> >> The insert_pfn will establish the pte entry. However,
> >> vmf_insert_pfn_prot passes false @mkwrite to insert_pfn by default,
> >> so insert_pfn cannot make the pfn writable and it does not call
> >> maybe_mkwrite(pte_mkdirty(entry), vma) to clear the PTE_RDONLY bit.
> >> So the pte entry is actually write-protected from the mmu's point of
> >> view.
> >> So when the first fault returns and the store instruction is
> >> re-executed, a second fault happens. And the second fault only does
> >> pte_mkdirty(entry), which clears PTE_RDONLY.
> > It depends on whether the ARM64 CPU in question supports hardware dirty
> > bit management (DBM). If that is the case and PTE_DBM (i.e. PTE_WRITE) is
> > set, HW will automatically clear the PTE_RDONLY bit to mark the entry
> > dirty instead of raising a write fault. So you shouldn't see a double
> > fault if PTE_DBM/WRITE is set.

Thanks. This is reasonable. But I still really hit the double faults on my
project platform.

> >
> > On ARM64 you can kind of think of PTE_RDONLY as the HW dirty bit and
> > PTE_DBM as the read/write permission bit with SW being responsible for
> > updating PTE_RDONLY via the fault handler if DBM is not supported by HW.
> >
> > At least that's my understanding from having hacked on this in the
> > past. You can see all this weirdness happening in the definitions of
> > pte_dirty() and pte_write() for ARM64.
>
> +1
>
> Thanks a lot for that, this was exactly the information I was looking for.
>
> In this light it makes this patch here look unnecessary and questionable at
> best.
>
> Xianrong if you have an arm64 platform which really double faults (confirmed
> through a debugger for example) then you have to ask why this platform
> shows this behavior and not try to work around it.
>
> Behaviors like those usually have a very very good reason and without a
> confirmed explanation I'm not allowing any patch in which would disable stuff
> like that.

Thanks. You are very right. I found CONFIG_ARM64_HW_AFDBM is not enabled
in my project, so PTE_DBM is actually a software bit there, not a hardware
bit. Now I understand why arm64 attaches PTE_RDONLY automatically: it keeps
the scheme compatible whether or not PTE_DBM is implemented in hardware.
This answers your question.

So I met the double faults in my project when CONFIG_ARM64_HW_AFDBM is
false. However, these double faults are eliminated when I replace
vmf_insert_mixed with vmf_insert_mixed_mkwrite in the drivers under
CONFIG_ARM64_HW_AFDBM = false. The vmf_insert_pfn_prot is similar to
vmf_insert_mixed: it should take an @mkwrite parameter rather than
passing false for mkwrite by default.

So I think if you forgot to 
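
For readers following along: the arm64 scheme discussed above can be seen
in arch/arm64/include/asm/pgtable.h, roughly (paraphrased; check your
tree):

	#define pte_hw_dirty(pte) (pte_write(pte) && !(pte_val(pte) & PTE_RDONLY))
	#define pte_sw_dirty(pte) (!!(pte_val(pte) & PTE_DIRTY))
	#define pte_dirty(pte)    (pte_sw_dirty(pte) || pte_hw_dirty(pte))

With hardware DBM, the CPU clears PTE_RDONLY itself on the first store;
without it, the fault handler has to mark the entry dirty in software,
which is where the second fault described in this thread comes from.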

Re: [PATCH] drm/sched: Drain all entities in DRM sched run job worker

2024-01-25 Thread Dave Airlie
 Just FYI I'm pulling this into drm-fixes straight as is, since it
fixes the regression and avoids the revert; however, please keep
discussing until we are sure things are right, and we can deal with
any fixes in a follow-up patch.

Dave.

On Fri, 26 Jan 2024 at 03:32, Matthew Brost  wrote:
>
> On Thu, Jan 25, 2024 at 10:24:24AM +0100, Vlastimil Babka wrote:
> > On 1/24/24 22:08, Matthew Brost wrote:
> > > All entities must be drained in the DRM scheduler run job worker to
> > > avoid the following case: an entity is found that is ready, no job is
> > > found ready on that entity, and the run job worker goes idle while
> > > other entities with jobs are ready. Draining all ready entities (i.e.
> > > looping over all ready entities) in the run job worker ensures all
> > > jobs that are ready will be scheduled.
> > >
> > > Cc: Thorsten Leemhuis 
> > > Reported-by: Mikhail Gavrilov 
> > > Closes: 
> > > https://lore.kernel.org/all/CABXGCsM2VLs489CH-vF-1539-s3in37=bwuowtoeee+q26z...@mail.gmail.com/
> > > Reported-and-tested-by: Mario Limonciello 
> > > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3124
> > > Link: 
> > > https://lore.kernel.org/all/20240123021155.2775-1-mario.limoncie...@amd.com/
> > > Reported-by: Vlastimil Babka 
> >
> > Can change to Reported-and-tested-by: Vlastimil Babka 
> >
>
> +1, got it.
>
> Matt
>
> > Thanks!
> >
> > > Closes: 
> > > https://lore.kernel.org/dri-devel/05ddb2da-b182-4791-8ef7-82179fd15...@amd.com/T/#m0c31d4d1b9ae9995bb880974c4f1dbaddc33a48a
> > > Signed-off-by: Matthew Brost 
> > > ---
> > >  drivers/gpu/drm/scheduler/sched_main.c | 15 +++
> > >  1 file changed, 7 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 550492a7a031..85f082396d42 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -1178,21 +1178,20 @@ static void drm_sched_run_job_work(struct 
> > > work_struct *w)
> > > struct drm_sched_entity *entity;
> > > struct dma_fence *fence;
> > > struct drm_sched_fence *s_fence;
> > > -   struct drm_sched_job *sched_job;
> > > +   struct drm_sched_job *sched_job = NULL;
> > > int r;
> > >
> > > if (READ_ONCE(sched->pause_submit))
> > > return;
> > >
> > > -   entity = drm_sched_select_entity(sched);
> > > +   /* Find entity with a ready job */
> > > +   while (!sched_job && (entity = drm_sched_select_entity(sched))) {
> > > +   sched_job = drm_sched_entity_pop_job(entity);
> > > +   if (!sched_job)
> > > +   complete_all(&entity->entity_idle);
> > > +   }
> > > if (!entity)
> > > -   return;
> > > -
> > > -   sched_job = drm_sched_entity_pop_job(entity);
> > > -   if (!sched_job) {
> > > -   complete_all(&entity->entity_idle);
> > > return; /* No more work */
> > > -   }
> > >
> > > s_fence = sched_job->s_fence;
> > >
> >
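
To spell out the race the commit message describes, an illustrative
interleaving (not code from the patch):

	/*
	 * worker: entity = drm_sched_select_entity(sched); // picks entity A
	 * worker: job = drm_sched_entity_pop_job(entity);  // NULL, A not ready
	 * worker: complete_all(&entity->entity_idle);
	 * worker: return;                                  // goes idle...
	 *
	 * ...while entity B already has a job queued and nothing re-queues
	 * the worker, so B's job is never run. Looping over all ready
	 * entities before going idle closes this window.
	 */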


Re: [PATCH v6 7/7] x86/vmware: Add TDX hypercall support

2024-01-25 Thread Alexey Makhalov




On 1/22/24 4:17 PM, H. Peter Anvin wrote:

On January 22, 2024 4:04:33 PM PST, Alexey Makhalov wrote:



On 1/22/24 10:28 AM, H. Peter Anvin wrote:

On January 22, 2024 8:32:22 AM PST, Dave Hansen  wrote:

On 1/9/24 00:40, Alexey Makhalov wrote:

+#ifdef CONFIG_INTEL_TDX_GUEST
+unsigned long vmware_tdx_hypercall(unsigned long cmd,
+  struct tdx_module_args *args)
+{
+   if (!hypervisor_is_type(X86_HYPER_VMWARE))
+   return ULONG_MAX;
+
+   if (cmd & ~VMWARE_CMD_MASK) {
+   pr_warn_once("Out of range command %lx\n", cmd);
+   return ULONG_MAX;
+   }
+
+   args->r10 = VMWARE_TDX_VENDOR_LEAF;
+   args->r11 = VMWARE_TDX_HCALL_FUNC;
+   args->r12 = VMWARE_HYPERVISOR_MAGIC;
+   args->r13 = cmd;
+   args->r15 = 0; /* CPL */
+
+   __tdx_hypercall(args);
+
+   return args->r12;
+}
+EXPORT_SYMBOL_GPL(vmware_tdx_hypercall);
+#endif


This is the kind of wrapper that I was hoping for.  Thanks.

Acked-by: Dave Hansen 



I'm slightly confused by this TBH.

Why are the arguments passed in as a structure, which is modified by the 
wrapper to boot? This is analogous to a system call interface.

Furthermore, this is an out-of-line function; it should never be called with 
!X86_HYPER_VMWARE or you are introducing overhead for other hypervisors; I 
believe a pr_warn_once() is in order at least, just as you have for the 
out-of-range test.



This patch series introduces a vmware_hypercall family of functions,
similar to kvm_hypercall. Similarity: both the vmware and kvm
implementations are static inline functions, and both of them use
__tdx_hypercall (a global, not exported symbol). Difference: the
kvm_hypercall functions are used _only_ within the kernel, but the
vmware_hypercall functions are also used by modules.
Exporting the __tdx_hypercall function was Dave's original concern.
So we ended up exporting a wrapper: not generic, but VMware specific,
with added checks against arbitrary use.
vmware_tdx_hypercall is not designed for !X86_HYPER_VMWARE callers, but
such calls are not forbidden.
Arguments in a structure are the API of __tdx_hypercall(). Input and
output argument handling is done by the vmware_hypercall callers, while
the VMware-specific dress-up is inside the wrapper.

Peter, do you think code comments are required to make it clear for the reader?




TBH that explanation didn't make much sense to me...


Peter,

I would like to understand your concerns.

1. Are you suggesting to move the structure (tdx parameters)
initialization into one place, instead of one part there and another
part here? Do you prefer to pass all arguments as is to
vmware_tdx_hypercall() and only define tdx_module_args there?


2. And is the second suggestion to add a pr_warn_once() under the
"if (!hypervisor_is_type(X86_HYPER_VMWARE))" check?


--Alexey
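
For illustration, a module-side call through the wrapper would look
roughly like this (which registers carry the inputs is up to the
vmware_hypercall callers mentioned above, so the assignment below is
purely illustrative):

	struct tdx_module_args args = {};
	unsigned long ret;

	args.rbx = in1;		/* illustrative input register */
	ret = vmware_tdx_hypercall(cmd, &args);
	if (ret == ULONG_MAX)
		return -EINVAL;	/* wrapper signals failure via ULONG_MAX */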



[PATCH v3 05/15] drm/msm/dp: fold dp_power into dp_ctrl module

2024-01-25 Thread Dmitry Baryshkov
The dp_power submodule is limited to handling the clocks only, following
previous cleanups. Fold it into the dp_ctrl submodule, removing one
unnecessary level of indirection.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/Makefile|   1 -
 drivers/gpu/drm/msm/dp/dp_ctrl.c| 150 +++
 drivers/gpu/drm/msm/dp/dp_ctrl.h|   6 +-
 drivers/gpu/drm/msm/dp/dp_display.c |  24 +
 drivers/gpu/drm/msm/dp/dp_power.c   | 170 
 drivers/gpu/drm/msm/dp/dp_power.h   |  74 
 6 files changed, 142 insertions(+), 283 deletions(-)

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index b1173128b5b9..8dbdf3fba69e 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -128,7 +128,6 @@ msm-$(CONFIG_DRM_MSM_DP)+= dp/dp_aux.o \
dp/dp_link.o \
dp/dp_panel.o \
dp/dp_parser.o \
-   dp/dp_power.o \
dp/dp_audio.o
 
 msm-$(CONFIG_DRM_FBDEV_EMULATION) += msm_fbdev.o
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index 77a8d9366ed7..da29281c575b 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -76,13 +76,16 @@ struct dp_ctrl_private {
struct drm_dp_aux *aux;
struct dp_panel *panel;
struct dp_link *link;
-   struct dp_power *power;
struct dp_parser *parser;
struct dp_catalog *catalog;
 
struct completion idle_comp;
struct completion psr_op_comp;
struct completion video_comp;
+
+   bool core_clks_on;
+   bool link_clks_on;
+   bool stream_clks_on;
 };
 
 static int dp_aux_link_configure(struct drm_dp_aux *aux,
@@ -1338,6 +1341,83 @@ static void dp_ctrl_set_clock_rate(struct 
dp_ctrl_private *ctrl,
name, rate);
 }
 
+int dp_ctrl_clk_enable(struct dp_ctrl *dp_ctrl,
+  enum dp_pm_type pm_type, bool enable)
+{
+   struct dp_ctrl_private *ctrl;
+   struct dss_module_power *mp;
+   int ret = 0;
+
+   ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
+
+   if (pm_type != DP_CORE_PM &&
+   pm_type != DP_CTRL_PM &&
+   pm_type != DP_STREAM_PM) {
+   DRM_ERROR("unsupported ctrl module: %s\n",
+ dp_parser_pm_name(pm_type));
+   return -EINVAL;
+   }
+
+   if (enable) {
+   if (pm_type == DP_CORE_PM && ctrl->core_clks_on) {
+   drm_dbg_dp(ctrl->drm_dev,
+  "core clks already enabled\n");
+   return 0;
+   }
+
+   if (pm_type == DP_CTRL_PM && ctrl->link_clks_on) {
+   drm_dbg_dp(ctrl->drm_dev,
+  "links clks already enabled\n");
+   return 0;
+   }
+
+   if (pm_type == DP_STREAM_PM && ctrl->stream_clks_on) {
+   drm_dbg_dp(ctrl->drm_dev,
+  "pixel clks already enabled\n");
+   return 0;
+   }
+
+   if ((pm_type == DP_CTRL_PM) && (!ctrl->core_clks_on)) {
+   drm_dbg_dp(ctrl->drm_dev,
+  "Enable core clks before link clks\n");
+   mp = &ctrl->parser->mp[DP_CORE_PM];
+
+   ret = clk_bulk_prepare_enable(mp->num_clk, mp->clocks);
+   if (ret)
+   return ret;
+
+   ctrl->core_clks_on = true;
+   }
+   }
+
+   mp = &ctrl->parser->mp[pm_type];
+   if (enable) {
+   ret = clk_bulk_prepare_enable(mp->num_clk, mp->clocks);
+   if (ret)
+   return ret;
+   } else {
+   clk_bulk_disable_unprepare(mp->num_clk, mp->clocks);
+   }
+
+   if (pm_type == DP_CORE_PM)
+   ctrl->core_clks_on = enable;
+   else if (pm_type == DP_STREAM_PM)
+   ctrl->stream_clks_on = enable;
+   else
+   ctrl->link_clks_on = enable;
+
+   drm_dbg_dp(ctrl->drm_dev, "%s clocks for %s\n",
+  enable ? "enable" : "disable",
+  dp_parser_pm_name(pm_type));
+   drm_dbg_dp(ctrl->drm_dev,
+  "stream_clks:%s link_clks:%s core_clks:%s\n",
+  ctrl->stream_clks_on ? "on" : "off",
+  ctrl->link_clks_on ? "on" : "off",
+  ctrl->core_clks_on ? "on" : "off");
+
+   return 0;
+}
+
 static int dp_ctrl_enable_mainlink_clocks(struct dp_ctrl_private *ctrl)
 {
int ret = 0;
@@ -1354,7 +1434,7 @@ static int dp_ctrl_enable_mainlink_clocks(struct 
dp_ctrl_private *ctrl)
phy_power_on(phy);
 
dev_pm_opp_set_rate(ctrl->dev, ctrl->link->link_params.rate * 1000);
-   ret = dp_power_clk_enable(ctrl->power, DP_CTRL_PM, true);
+

[PATCH v3 14/15] drm/msm/dp: move next_bridge handling to dp_display

2024-01-25 Thread Dmitry Baryshkov
Remove two levels of indirection and fetch next bridge directly in
dp_display_probe_tail().

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_display.c | 43 -
 drivers/gpu/drm/msm/dp/dp_parser.c  | 14 
 drivers/gpu/drm/msm/dp/dp_parser.h  | 14 
 3 files changed, 14 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index f19cb8c7e8cb..de1306a88748 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1195,16 +1195,25 @@ static const struct msm_dp_desc 
*dp_display_get_desc(struct platform_device *pde
return NULL;
 }
 
-static int dp_display_get_next_bridge(struct msm_dp *dp);
-
 static int dp_display_probe_tail(struct device *dev)
 {
struct msm_dp *dp = dev_get_drvdata(dev);
int ret;
 
-   ret = dp_display_get_next_bridge(dp);
-   if (ret)
-   return ret;
+   /*
+* External bridges are mandatory for eDP interfaces: one has to
+* provide at least an eDP panel (which gets wrapped into panel-bridge).
+*
+* For DisplayPort interfaces external bridges are optional, so
+* silently ignore an error if one is not present (-ENODEV).
+*/
+   dp->next_bridge = devm_drm_of_get_bridge(&dp->pdev->dev,
dp->pdev->dev.of_node, 1, 0);
+   if (IS_ERR(dp->next_bridge)) {
+   ret = PTR_ERR(dp->next_bridge);
+   dp->next_bridge = NULL;
+   if (dp->is_edp || ret != -ENODEV)
+   return ret;
+   }
 
ret = component_add(dev, &dp_display_comp_ops);
if (ret)
@@ -1397,30 +1406,6 @@ void dp_display_debugfs_init(struct msm_dp *dp_display, 
struct dentry *root, boo
}
 }
 
-static int dp_display_get_next_bridge(struct msm_dp *dp)
-{
-   int rc;
-   struct dp_display_private *dp_priv;
-
-   dp_priv = container_of(dp, struct dp_display_private, dp_display);
-
-   /*
-* External bridges are mandatory for eDP interfaces: one has to
-* provide at least an eDP panel (which gets wrapped into panel-bridge).
-*
-* For DisplayPort interfaces external bridges are optional, so
-* silently ignore an error if one is not present (-ENODEV).
-*/
-   rc = devm_dp_parser_find_next_bridge(&dp->pdev->dev, dp_priv->parser);
-   if (!dp->is_edp && rc == -ENODEV)
-   return 0;
-
-   if (!rc)
-   dp->next_bridge = dp_priv->parser->next_bridge;
-
-   return rc;
-}
-
 int msm_dp_modeset_init(struct msm_dp *dp_display, struct drm_device *dev,
struct drm_encoder *encoder)
 {
diff --git a/drivers/gpu/drm/msm/dp/dp_parser.c 
b/drivers/gpu/drm/msm/dp/dp_parser.c
index aa135d5cedbd..f95ab3c5c72c 100644
--- a/drivers/gpu/drm/msm/dp/dp_parser.c
+++ b/drivers/gpu/drm/msm/dp/dp_parser.c
@@ -24,20 +24,6 @@ static int dp_parser_ctrl_res(struct dp_parser *parser)
return 0;
 }
 
-int devm_dp_parser_find_next_bridge(struct device *dev, struct dp_parser 
*parser)
-{
-   struct platform_device *pdev = parser->pdev;
-   struct drm_bridge *bridge;
-
-   bridge = devm_drm_of_get_bridge(dev, pdev->dev.of_node, 1, 0);
-   if (IS_ERR(bridge))
-   return PTR_ERR(bridge);
-
-   parser->next_bridge = bridge;
-
-   return 0;
-}
-
 static int dp_parser_parse(struct dp_parser *parser)
 {
int rc = 0;
diff --git a/drivers/gpu/drm/msm/dp/dp_parser.h 
b/drivers/gpu/drm/msm/dp/dp_parser.h
index 21a66932e35e..38fd335d5950 100644
--- a/drivers/gpu/drm/msm/dp/dp_parser.h
+++ b/drivers/gpu/drm/msm/dp/dp_parser.h
@@ -21,7 +21,6 @@
 struct dp_parser {
struct platform_device *pdev;
struct phy *phy;
-   struct drm_bridge *next_bridge;
 };
 
 /**
@@ -37,17 +36,4 @@ struct dp_parser {
  */
 struct dp_parser *dp_parser_get(struct platform_device *pdev);
 
-/**
- * devm_dp_parser_find_next_bridge() - find an additional bridge to DP
- *
- * @dev: device to tie bridge lifetime to
- * @parser: dp_parser data from client
- *
- * This function is used to find any additional bridge attached to
- * the DP controller. The eDP interface requires a panel bridge.
- *
- * Return: 0 if able to get the bridge, otherwise negative errno for failure.
- */
-int devm_dp_parser_find_next_bridge(struct device *dev, struct dp_parser 
*parser);
-
 #endif

-- 
2.39.2



[PATCH v3 12/15] drm/msm/dp: move all IO handling to dp_catalog

2024-01-25 Thread Dmitry Baryshkov
Rather than parsing the I/O addresses from dp_parser and then passing
them via a struct pointer to dp_catalog, handle I/O region parsing in
dp_catalog and drop it from dp_parser.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_catalog.c | 125 ++--
 drivers/gpu/drm/msm/dp/dp_catalog.h |   2 +-
 drivers/gpu/drm/msm/dp/dp_display.c |   6 +-
 drivers/gpu/drm/msm/dp/dp_parser.c  |  73 +
 drivers/gpu/drm/msm/dp/dp_parser.h  |  26 +---
 5 files changed, 114 insertions(+), 118 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index 4c6207797c99..541aac2cb246 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include <linux/platform_device.h>
 #include 
 #include 
 #include 
@@ -53,10 +54,31 @@
(PSR_UPDATE_MASK | PSR_CAPTURE_MASK | PSR_EXIT_MASK | \
PSR_UPDATE_ERROR_MASK | PSR_WAKE_ERROR_MASK)
 
+#define DP_DEFAULT_AHB_OFFSET  0x0000
+#define DP_DEFAULT_AHB_SIZE0x0200
+#define DP_DEFAULT_AUX_OFFSET  0x0200
+#define DP_DEFAULT_AUX_SIZE0x0200
+#define DP_DEFAULT_LINK_OFFSET 0x0400
+#define DP_DEFAULT_LINK_SIZE   0x0C00
+#define DP_DEFAULT_P0_OFFSET   0x1000
+#define DP_DEFAULT_P0_SIZE 0x0400
+
+struct dss_io_region {
+   size_t len;
+   void __iomem *base;
+};
+
+struct dss_io_data {
+   struct dss_io_region ahb;
+   struct dss_io_region aux;
+   struct dss_io_region link;
+   struct dss_io_region p0;
+};
+
 struct dp_catalog_private {
struct device *dev;
struct drm_device *drm_dev;
-   struct dp_io *io;
+   struct dss_io_data io;
u32 (*audio_map)[DP_AUDIO_SDP_HEADER_MAX];
struct dp_catalog dp_catalog;
u8 aux_lut_cfg_index[PHY_AUX_CFG_MAX];
@@ -66,7 +88,7 @@ void dp_catalog_snapshot(struct dp_catalog *dp_catalog, 
struct msm_disp_state *d
 {
struct dp_catalog_private *catalog = container_of(dp_catalog,
struct dp_catalog_private, dp_catalog);
-   struct dss_io_data *dss = &catalog->io->dp_controller;
+   struct dss_io_data *dss = &catalog->io;
 
msm_disp_snapshot_add_block(disp_state, dss->ahb.len, dss->ahb.base, 
"dp_ahb");
msm_disp_snapshot_add_block(disp_state, dss->aux.len, dss->aux.base, 
"dp_aux");
@@ -76,7 +98,7 @@ void dp_catalog_snapshot(struct dp_catalog *dp_catalog, 
struct msm_disp_state *d
 
 static inline u32 dp_read_aux(struct dp_catalog_private *catalog, u32 offset)
 {
-   return readl_relaxed(catalog->io->dp_controller.aux.base + offset);
+   return readl_relaxed(catalog->io.aux.base + offset);
 }
 
 static inline void dp_write_aux(struct dp_catalog_private *catalog,
@@ -86,12 +108,12 @@ static inline void dp_write_aux(struct dp_catalog_private 
*catalog,
 * To make sure aux reg writes happens before any other operation,
 * this function uses writel() instread of writel_relaxed()
 */
-   writel(data, catalog->io->dp_controller.aux.base + offset);
+   writel(data, catalog->io.aux.base + offset);
 }
 
 static inline u32 dp_read_ahb(const struct dp_catalog_private *catalog, u32 
offset)
 {
-   return readl_relaxed(catalog->io->dp_controller.ahb.base + offset);
+   return readl_relaxed(catalog->io.ahb.base + offset);
 }
 
 static inline void dp_write_ahb(struct dp_catalog_private *catalog,
@@ -101,7 +123,7 @@ static inline void dp_write_ahb(struct dp_catalog_private 
*catalog,
 * To make sure phy reg writes happens before any other operation,
 * this function uses writel() instread of writel_relaxed()
 */
-   writel(data, catalog->io->dp_controller.ahb.base + offset);
+   writel(data, catalog->io.ahb.base + offset);
 }
 
 static inline void dp_write_p0(struct dp_catalog_private *catalog,
@@ -111,7 +133,7 @@ static inline void dp_write_p0(struct dp_catalog_private 
*catalog,
 * To make sure interface reg writes happens before any other operation,
 * this function uses writel() instread of writel_relaxed()
 */
-   writel(data, catalog->io->dp_controller.p0.base + offset);
+   writel(data, catalog->io.p0.base + offset);
 }
 
 static inline u32 dp_read_p0(struct dp_catalog_private *catalog,
@@ -121,12 +143,12 @@ static inline u32 dp_read_p0(struct dp_catalog_private 
*catalog,
 * To make sure interface reg writes happens before any other operation,
 * this function uses writel() instread of writel_relaxed()
 */
-   return readl_relaxed(catalog->io->dp_controller.p0.base + offset);
+   return readl_relaxed(catalog->io.p0.base + offset);
 }
 
 static inline u32 dp_read_link(struct dp_catalog_private *catalog, u32 offset)
 {
-   return readl_relaxed(catalog->io->dp_controller.link.base + offset);
+   return readl_relaxed(catalog->io.link.base + offset);
 }
 
 static inline void dp_write_link(struct dp_catalog_private *catalog,
@@ -136,7 +158,7 @@ 
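
The parsing side (truncated above) maps each named region and falls back
to splitting a single legacy region at the DP_DEFAULT_* offsets. A helper
for the first part might look like this (sketch, assuming the usual
platform-device resources):

	static void __iomem *dp_ioremap(struct platform_device *pdev,
					int idx, size_t *len)
	{
		struct resource *res;
		void __iomem *base;

		base = devm_platform_get_and_ioremap_resource(pdev, idx, &res);
		if (!IS_ERR(base))
			*len = resource_size(res);

		return base;
	}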

[PATCH v3 11/15] drm/msm/dp: handle PHY directly in dp_ctrl

2024-01-25 Thread Dmitry Baryshkov
There is little point in going through the dp_parser->io indirection each
time the driver needs to access the PHY. Store the pointer directly in
dp_ctrl_private.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_ctrl.c| 37 +
 drivers/gpu/drm/msm/dp/dp_ctrl.h|  2 +-
 drivers/gpu/drm/msm/dp/dp_display.c |  3 ++-
 3 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index 4aea72a2b8e8..fc7ce315ae41 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -76,9 +76,10 @@ struct dp_ctrl_private {
struct drm_dp_aux *aux;
struct dp_panel *panel;
struct dp_link *link;
-   struct dp_parser *parser;
struct dp_catalog *catalog;
 
+   struct phy *phy;
+
unsigned int num_core_clks;
struct clk_bulk_data *core_clks;
 
@@ -1028,7 +1029,7 @@ static int dp_ctrl_set_vx_px(struct dp_ctrl_private *ctrl,
phy_opts->dp.voltage[0] = v_level;
phy_opts->dp.pre[0] = p_level;
phy_opts->dp.set_voltages = 1;
-   phy_configure(ctrl->parser->io.phy, phy_opts);
+   phy_configure(ctrl->phy, phy_opts);
phy_opts->dp.set_voltages = 0;
 
return 0;
@@ -1442,7 +1443,7 @@ static void dp_ctrl_link_clk_disable(struct dp_ctrl 
*dp_ctrl)
 static int dp_ctrl_enable_mainlink_clocks(struct dp_ctrl_private *ctrl)
 {
int ret = 0;
-   struct phy *phy = ctrl->parser->io.phy;
+   struct phy *phy = ctrl->phy;
const u8 *dpcd = ctrl->panel->dpcd;
 
ctrl->phy_opts.dp.lanes = ctrl->link->link_params.num_lanes;
@@ -1540,12 +1541,10 @@ void dp_ctrl_set_psr(struct dp_ctrl *dp_ctrl, bool 
enter)
 void dp_ctrl_phy_init(struct dp_ctrl *dp_ctrl)
 {
struct dp_ctrl_private *ctrl;
-   struct dp_io *dp_io;
struct phy *phy;
 
ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
-   dp_io = &ctrl->parser->io;
-   phy = dp_io->phy;
+   phy = ctrl->phy;
 
dp_catalog_ctrl_phy_reset(ctrl->catalog);
phy_init(phy);
@@ -1557,12 +1556,10 @@ void dp_ctrl_phy_init(struct dp_ctrl *dp_ctrl)
 void dp_ctrl_phy_exit(struct dp_ctrl *dp_ctrl)
 {
struct dp_ctrl_private *ctrl;
-   struct dp_io *dp_io;
struct phy *phy;
 
ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
-   dp_io = &ctrl->parser->io;
-   phy = dp_io->phy;
+   phy = ctrl->phy;
 
dp_catalog_ctrl_phy_reset(ctrl->catalog);
phy_exit(phy);
@@ -1587,7 +1584,7 @@ static bool dp_ctrl_use_fixed_nvid(struct dp_ctrl_private 
*ctrl)
 
 static int dp_ctrl_reinitialize_mainlink(struct dp_ctrl_private *ctrl)
 {
-   struct phy *phy = ctrl->parser->io.phy;
+   struct phy *phy = ctrl->phy;
int ret = 0;
 
dp_catalog_ctrl_mainlink_ctrl(ctrl->catalog, false);
@@ -1617,11 +1614,9 @@ static int dp_ctrl_reinitialize_mainlink(struct 
dp_ctrl_private *ctrl)
 
 static int dp_ctrl_deinitialize_mainlink(struct dp_ctrl_private *ctrl)
 {
-   struct dp_io *dp_io;
struct phy *phy;
 
-   dp_io = &ctrl->parser->io;
-   phy = dp_io->phy;
+   phy = ctrl->phy;
 
dp_catalog_ctrl_mainlink_ctrl(ctrl->catalog, false);
 
@@ -2047,12 +2042,10 @@ int dp_ctrl_on_stream(struct dp_ctrl *dp_ctrl, bool 
force_link_train)
 void dp_ctrl_off_link_stream(struct dp_ctrl *dp_ctrl)
 {
struct dp_ctrl_private *ctrl;
-   struct dp_io *dp_io;
struct phy *phy;
 
ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
-   dp_io = &ctrl->parser->io;
-   phy = dp_io->phy;
+   phy = ctrl->phy;
 
/* set dongle to D3 (power off) mode */
dp_link_psm_config(ctrl->link, &ctrl->panel->link_info, true);
@@ -2080,12 +2073,10 @@ void dp_ctrl_off_link_stream(struct dp_ctrl *dp_ctrl)
 void dp_ctrl_off_link(struct dp_ctrl *dp_ctrl)
 {
struct dp_ctrl_private *ctrl;
-   struct dp_io *dp_io;
struct phy *phy;
 
ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
-   dp_io = &ctrl->parser->io;
-   phy = dp_io->phy;
+   phy = ctrl->phy;
 
dp_catalog_ctrl_mainlink_ctrl(ctrl->catalog, false);
 
@@ -2103,12 +2094,10 @@ void dp_ctrl_off_link(struct dp_ctrl *dp_ctrl)
 void dp_ctrl_off(struct dp_ctrl *dp_ctrl)
 {
struct dp_ctrl_private *ctrl;
-   struct dp_io *dp_io;
struct phy *phy;
 
ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
-   dp_io = &ctrl->parser->io;
-   phy = dp_io->phy;
+   phy = ctrl->phy;
 
dp_catalog_ctrl_mainlink_ctrl(ctrl->catalog, false);
 
@@ -2225,7 +2214,7 @@ static int dp_ctrl_clk_init(struct dp_ctrl *dp_ctrl)
 struct dp_ctrl *dp_ctrl_get(struct device *dev, struct dp_link *link,
struct dp_panel *panel, struct drm_dp_aux *aux,
struct dp_catalog *catalog,
-   struct dp_parser *parser)
+

[PATCH v3 15/15] drm/msm/dp: drop dp_parser

2024-01-25 Thread Dmitry Baryshkov
Finally drop the separate "parsing" submodule. There is no need for it
anymore. All submodules handle DT properties directly rather than
passing them via a separate structure pointer.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/Makefile|  1 -
 drivers/gpu/drm/msm/dp/dp_aux.h |  1 +
 drivers/gpu/drm/msm/dp/dp_catalog.h |  1 -
 drivers/gpu/drm/msm/dp/dp_ctrl.h|  3 +-
 drivers/gpu/drm/msm/dp/dp_debug.c   |  1 -
 drivers/gpu/drm/msm/dp/dp_display.c | 18 +--
 drivers/gpu/drm/msm/dp/dp_display.h |  2 ++
 drivers/gpu/drm/msm/dp/dp_parser.c  | 61 -
 drivers/gpu/drm/msm/dp/dp_parser.h  | 39 
 9 files changed, 12 insertions(+), 115 deletions(-)

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 8dbdf3fba69e..543e04fa72e3 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -127,7 +127,6 @@ msm-$(CONFIG_DRM_MSM_DP)+= dp/dp_aux.o \
dp/dp_drm.o \
dp/dp_link.o \
dp/dp_panel.o \
-   dp/dp_parser.o \
dp/dp_audio.o
 
 msm-$(CONFIG_DRM_FBDEV_EMULATION) += msm_fbdev.o
diff --git a/drivers/gpu/drm/msm/dp/dp_aux.h b/drivers/gpu/drm/msm/dp/dp_aux.h
index 16d9b1758748..f47d591c1f54 100644
--- a/drivers/gpu/drm/msm/dp/dp_aux.h
+++ b/drivers/gpu/drm/msm/dp/dp_aux.h
@@ -16,6 +16,7 @@ void dp_aux_init(struct drm_dp_aux *dp_aux);
 void dp_aux_deinit(struct drm_dp_aux *dp_aux);
 void dp_aux_reconfig(struct drm_dp_aux *dp_aux);
 
+struct phy;
 struct drm_dp_aux *dp_aux_get(struct device *dev, struct dp_catalog *catalog,
  struct phy *phy,
  bool is_edp);
diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.h 
b/drivers/gpu/drm/msm/dp/dp_catalog.h
index 989e4c4fd6fa..a724a986b6ee 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.h
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.h
@@ -8,7 +8,6 @@
 
 #include 
 
-#include "dp_parser.h"
 #include "disp/msm_disp_snapshot.h"
 
 /* interrupts */
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.h b/drivers/gpu/drm/msm/dp/dp_ctrl.h
index 6e9f375b856a..fa014cee7e21 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.h
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.h
@@ -9,7 +9,6 @@
 #include "dp_aux.h"
 #include "dp_panel.h"
 #include "dp_link.h"
-#include "dp_parser.h"
 #include "dp_catalog.h"
 
 struct dp_ctrl {
@@ -17,6 +16,8 @@ struct dp_ctrl {
bool wide_bus_en;
 };
 
+struct phy;
+
 int dp_ctrl_on_link(struct dp_ctrl *dp_ctrl);
 int dp_ctrl_on_stream(struct dp_ctrl *dp_ctrl, bool force_link_train);
 void dp_ctrl_off_link_stream(struct dp_ctrl *dp_ctrl);
diff --git a/drivers/gpu/drm/msm/dp/dp_debug.c 
b/drivers/gpu/drm/msm/dp/dp_debug.c
index 6c281dc095b9..ac68554801a4 100644
--- a/drivers/gpu/drm/msm/dp/dp_debug.c
+++ b/drivers/gpu/drm/msm/dp/dp_debug.c
@@ -9,7 +9,6 @@
 #include 
 #include 
 
-#include "dp_parser.h"
 #include "dp_catalog.h"
 #include "dp_aux.h"
 #include "dp_ctrl.h"
diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index de1306a88748..67956e34436d 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -9,12 +9,12 @@
 #include 
 #include 
 #include 
+#include <linux/phy/phy.h>
 #include 
 #include 
 
 #include "msm_drv.h"
 #include "msm_kms.h"
-#include "dp_parser.h"
 #include "dp_ctrl.h"
 #include "dp_catalog.h"
 #include "dp_aux.h"
@@ -87,7 +87,6 @@ struct dp_display_private {
struct drm_device *drm_dev;
struct dentry *root;
 
-   struct dp_parser  *parser;
struct dp_catalog *catalog;
struct drm_dp_aux *aux;
struct dp_link*link;
@@ -704,14 +703,11 @@ static int dp_init_sub_modules(struct dp_display_private 
*dp)
struct dp_panel_in panel_in = {
.dev = dev,
};
+   struct phy *phy;
 
-   dp->parser = dp_parser_get(dp->dp_display.pdev);
-   if (IS_ERR(dp->parser)) {
-   rc = PTR_ERR(dp->parser);
-   DRM_ERROR("failed to initialize parser, rc = %d\n", rc);
-   dp->parser = NULL;
-   goto error;
-   }
+   phy = devm_phy_get(dev, "dp");
+   if (IS_ERR(phy))
+   return PTR_ERR(phy);
 
dp->catalog = dp_catalog_get(dev);
if (IS_ERR(dp->catalog)) {
@@ -722,7 +718,7 @@ static int dp_init_sub_modules(struct dp_display_private 
*dp)
}
 
dp->aux = dp_aux_get(dev, dp->catalog,
-dp->parser->phy,
+phy,
 dp->dp_display.is_edp);
if (IS_ERR(dp->aux)) {
rc = PTR_ERR(dp->aux);
@@ -753,7 +749,7 @@ static int dp_init_sub_modules(struct dp_display_private 
*dp)
 
dp->ctrl = dp_ctrl_get(dev, dp->link, dp->panel, dp->aux,
   dp->catalog,
-  dp->parser->phy);
+  phy);
if (IS_ERR(dp->ctrl)) {
rc = PTR_ERR(dp->ctrl);
   

[PATCH v3 13/15] drm/msm/dp: move link property handling to dp_panel

2024-01-25 Thread Dmitry Baryshkov
Instead of passing link properties through the separate struct, parse
them directly in the dp_panel.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_display.c |  8 -
 drivers/gpu/drm/msm/dp/dp_display.h |  1 -
 drivers/gpu/drm/msm/dp/dp_panel.c   | 66 +
 drivers/gpu/drm/msm/dp/dp_parser.c  | 54 --
 drivers/gpu/drm/msm/dp/dp_parser.h  |  4 ---
 5 files changed, 66 insertions(+), 67 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index 5ad96989c5f2..f19cb8c7e8cb 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -356,12 +356,6 @@ static int dp_display_process_hpd_high(struct 
dp_display_private *dp)
int rc = 0;
struct edid *edid;
 
-   dp->panel->max_dp_lanes = dp->parser->max_dp_lanes;
-   dp->panel->max_dp_link_rate = dp->parser->max_dp_link_rate;
-
-   drm_dbg_dp(dp->drm_dev, "max_lanes=%d max_link_rate=%d\n",
-   dp->panel->max_dp_lanes, dp->panel->max_dp_link_rate);
-
rc = dp_panel_read_sink_caps(dp->panel, dp->dp_display.connector);
if (rc)
goto end;
@@ -381,8 +375,6 @@ static int dp_display_process_hpd_high(struct 
dp_display_private *dp)
dp->audio_supported = drm_detect_monitor_audio(edid);
dp_panel_handle_sink_request(dp->panel);
 
-   dp->dp_display.max_dp_lanes = dp->parser->max_dp_lanes;
-
/*
 * set sink to normal operation mode -- D0
 * before dpcd read
diff --git a/drivers/gpu/drm/msm/dp/dp_display.h 
b/drivers/gpu/drm/msm/dp/dp_display.h
index 102f3507d824..70759dd1bfd0 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.h
+++ b/drivers/gpu/drm/msm/dp/dp_display.h
@@ -28,7 +28,6 @@ struct msm_dp {
 
bool wide_bus_en;
 
-   u32 max_dp_lanes;
struct dp_audio *dp_audio;
bool psr_supported;
 };
diff --git a/drivers/gpu/drm/msm/dp/dp_panel.c 
b/drivers/gpu/drm/msm/dp/dp_panel.c
index 127f6af995cd..8242541a81b9 100644
--- a/drivers/gpu/drm/msm/dp/dp_panel.c
+++ b/drivers/gpu/drm/msm/dp/dp_panel.c
@@ -7,8 +7,12 @@
 
 #include 
 #include 
+#include 
 #include 
 
+#define DP_MAX_NUM_DP_LANES	4
+#define DP_LINK_RATE_HBR2	540000 /* kbytes */
+
 struct dp_panel_private {
struct device *dev;
struct drm_device *drm_dev;
@@ -138,6 +142,9 @@ int dp_panel_read_sink_caps(struct dp_panel *dp_panel,
 
panel = container_of(dp_panel, struct dp_panel_private, dp_panel);
 
+   drm_dbg_dp(panel->drm_dev, "max_lanes=%d max_link_rate=%d\n",
+   dp_panel->max_dp_lanes, dp_panel->max_dp_link_rate);
+
rc = dp_panel_read_dpcd(dp_panel);
if (rc) {
DRM_ERROR("read dpcd failed %d\n", rc);
@@ -386,10 +393,65 @@ int dp_panel_init_panel_info(struct dp_panel *dp_panel)
return 0;
 }
 
+static u32 dp_panel_link_frequencies(struct device_node *of_node)
+{
+   struct device_node *endpoint;
+   u64 frequency = 0;
+   int cnt;
+
+   endpoint = of_graph_get_endpoint_by_regs(of_node, 1, 0); /* port@1 */
+   if (!endpoint)
+   return 0;
+
+   cnt = of_property_count_u64_elems(endpoint, "link-frequencies");
+
+   if (cnt > 0)
+   of_property_read_u64_index(endpoint, "link-frequencies",
+   cnt - 1, &frequency);
+   of_node_put(endpoint);
+
+   do_div(frequency,
+   10 * /* from symbol rate to link rate */
+   1000); /* kbytes */
+
+   return frequency;
+}
+
+static int dp_panel_parse_dt(struct dp_panel *dp_panel)
+{
+   struct dp_panel_private *panel;
+   struct device_node *of_node;
+   int cnt;
+
+   panel = container_of(dp_panel, struct dp_panel_private, dp_panel);
+   of_node = panel->dev->of_node;
+
+   /*
+* data-lanes is the property of dp_out endpoint
+*/
+   cnt = drm_of_get_data_lanes_count_ep(of_node, 1, 0, 1, 
DP_MAX_NUM_DP_LANES);
+   if (cnt < 0) {
+   /* legacy code, data-lanes is the property of mdss_dp node */
+   cnt = drm_of_get_data_lanes_count(of_node, 1, 
DP_MAX_NUM_DP_LANES);
+   }
+
+   if (cnt > 0)
+   dp_panel->max_dp_lanes = cnt;
+   else
+   dp_panel->max_dp_lanes = DP_MAX_NUM_DP_LANES; /* 4 lanes */
+
+   dp_panel->max_dp_link_rate = dp_panel_link_frequencies(of_node);
+   if (!dp_panel->max_dp_link_rate)
+   dp_panel->max_dp_link_rate = DP_LINK_RATE_HBR2;
+
+   return 0;
+}
+
 struct dp_panel *dp_panel_get(struct dp_panel_in *in)
 {
struct dp_panel_private *panel;
struct dp_panel *dp_panel;
+   int ret;
 
if (!in->dev || !in->catalog || !in->aux || !in->link) {
DRM_ERROR("invalid input\n");
@@ -408,6 +470,10 @@ struct dp_panel *dp_panel_get(struct dp_panel_in *in)
dp_panel = &panel->dp_panel;
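
As a worked example for dp_panel_link_frequencies() above: an HBR2-capable
endpoint carries a link-frequencies entry of 5400000000 (5.4 GHz symbol
rate), and 5400000000 / 10 / 1000 = 540000, which is exactly the
DP_LINK_RATE_HBR2 fallback used when the property is absent.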

[PATCH v3 07/15] drm/msm/dp: stop parsing clock names from DT

2024-01-25 Thread Dmitry Baryshkov
All supported platforms use the same clock configuration. Instead of
parsing names from DT in a pretty complex manner, use the static
configuration. If at some point newer (or older) platforms have a
different clock configuration, this clock config can be moved to the
device data.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_ctrl.c   |  73 ++--
 drivers/gpu/drm/msm/dp/dp_ctrl.h   |   6 ++
 drivers/gpu/drm/msm/dp/dp_parser.c | 112 -
 drivers/gpu/drm/msm/dp/dp_parser.h |  22 
 4 files changed, 63 insertions(+), 150 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index 56a424a82a1b..cfcf6136ffa6 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -69,6 +69,11 @@ struct dp_vc_tu_mapping_table {
u8 tu_size_minus1;
 };
 
+struct dss_module_power {
+   unsigned int num_clk;
+   struct clk_bulk_data *clocks;
+};
+
 struct dp_ctrl_private {
struct dp_ctrl dp_ctrl;
struct drm_device *drm_dev;
@@ -79,6 +84,7 @@ struct dp_ctrl_private {
struct dp_parser *parser;
struct dp_catalog *catalog;
 
+   struct dss_module_power mp[DP_MAX_PM];
struct clk *pixel_clk;
 
struct completion idle_comp;
@@ -90,6 +96,15 @@ struct dp_ctrl_private {
bool stream_clks_on;
 };
 
+static inline const char *dp_pm_name(enum dp_pm_type module)
+{
+   switch (module) {
+   case DP_CORE_PM:	return "DP_CORE_PM";
+   case DP_CTRL_PM:	return "DP_CTRL_PM";
+   default:	return "???";
+   }
+}
+
 static int dp_aux_link_configure(struct drm_dp_aux *aux,
struct dp_link_info *link)
 {
@@ -1334,7 +1349,7 @@ int dp_ctrl_clk_enable(struct dp_ctrl *dp_ctrl,
if (pm_type != DP_CORE_PM &&
pm_type != DP_CTRL_PM) {
DRM_ERROR("unsupported ctrl module: %s\n",
- dp_parser_pm_name(pm_type));
+ dp_pm_name(pm_type));
return -EINVAL;
}
 
@@ -1354,7 +1369,7 @@ int dp_ctrl_clk_enable(struct dp_ctrl *dp_ctrl,
if ((pm_type == DP_CTRL_PM) && (!ctrl->core_clks_on)) {
drm_dbg_dp(ctrl->drm_dev,
   "Enable core clks before link clks\n");
-   mp = &ctrl->parser->mp[DP_CORE_PM];
+   mp = &ctrl->mp[DP_CORE_PM];
 
ret = clk_bulk_prepare_enable(mp->num_clk, mp->clocks);
if (ret)
@@ -1364,7 +1379,7 @@ int dp_ctrl_clk_enable(struct dp_ctrl *dp_ctrl,
}
}
 
-   mp = &ctrl->parser->mp[pm_type];
+   mp = &ctrl->mp[pm_type];
if (enable) {
ret = clk_bulk_prepare_enable(mp->num_clk, mp->clocks);
if (ret)
@@ -1380,7 +1395,7 @@ int dp_ctrl_clk_enable(struct dp_ctrl *dp_ctrl,
 
drm_dbg_dp(ctrl->drm_dev, "%s clocks for %s\n",
   enable ? "enable" : "disable",
-  dp_parser_pm_name(pm_type));
+  dp_pm_name(pm_type));
drm_dbg_dp(ctrl->drm_dev,
   "stream_clks:%s link_clks:%s core_clks:%s\n",
   ctrl->stream_clks_on ? "on" : "off",
@@ -2158,30 +2173,56 @@ irqreturn_t dp_ctrl_isr(struct dp_ctrl *dp_ctrl)
return ret;
 }
 
+static const char *core_clks[] = {
+   "core_iface",
+   "core_aux",
+};
+
+static const char *ctrl_clks[] = {
+   "ctrl_link",
+   "ctrl_link_iface",
+};
+
 static int dp_ctrl_clk_init(struct dp_ctrl *dp_ctrl)
 {
-   struct dp_ctrl_private *ctrl_private;
-   int rc = 0;
-   struct dss_module_power *core, *ctrl;
+   struct dp_ctrl_private *ctrl;
+   struct dss_module_power *core, *link;
struct device *dev;
+   int i, rc;
+
+   ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
+   dev = ctrl->dev;
 
-   ctrl_private = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
-   dev = ctrl_private->dev;
+   core = &ctrl->mp[DP_CORE_PM];
+   link = &ctrl->mp[DP_CTRL_PM];
 
-   core = &ctrl_private->parser->mp[DP_CORE_PM];
-   ctrl = &ctrl_private->parser->mp[DP_CTRL_PM];
+   core->num_clk = ARRAY_SIZE(core_clks);
+   core->clocks = devm_kcalloc(dev, core->num_clk, sizeof(*core->clocks), 
GFP_KERNEL);
+   if (!core->clocks)
+   return -ENOMEM;
+
+   for (i = 0; i < core->num_clk; i++)
+   core->clocks[i].id = core_clks[i];
 
rc = devm_clk_bulk_get(dev, core->num_clk, core->clocks);
if (rc)
return rc;
 
-   rc = devm_clk_bulk_get(dev, ctrl->num_clk, ctrl->clocks);
+   link->num_clk = ARRAY_SIZE(ctrl_clks);
+   link->clocks = devm_kcalloc(dev, link->num_clk, sizeof(*link->clocks), 
GFP_KERNEL);
+   if (!link->clocks)
+   return -ENOMEM;
+
+   for (i = 0; i < link->num_clk; i++)
+
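
A note on the devm_kcalloc() dance above: struct clk_bulk_data carries the
returned struct clk next to the name, so a shared static table cannot be
handed to devm_clk_bulk_get() by multiple device instances; each instance
needs its own writable array. The consumption pattern is then the usual
bulk one (sketch, using the names from the patch):

	ret = devm_clk_bulk_get(dev, core->num_clk, core->clocks);
	if (ret)
		return ret;

	ret = clk_bulk_prepare_enable(core->num_clk, core->clocks);
	/* ... and on the disable path: */
	clk_bulk_disable_unprepare(core->num_clk, core->clocks);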

[PATCH v3 09/15] drm/msm/dp: move phy_configure_opts to dp_ctrl

2024-01-25 Thread Dmitry Baryshkov
There is little point in sharing the phy configuration structure between
several modules. Move it to dp_ctrl, which becomes the only submodule
re-configuring the PHY.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_catalog.c | 19 -
 drivers/gpu/drm/msm/dp/dp_catalog.h |  2 --
 drivers/gpu/drm/msm/dp/dp_ctrl.c| 41 -
 drivers/gpu/drm/msm/dp/dp_parser.h  |  3 ---
 4 files changed, 27 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index 5142aeb705a4..e07651768805 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -765,25 +765,6 @@ void dp_catalog_ctrl_phy_reset(struct dp_catalog 
*dp_catalog)
dp_write_ahb(catalog, REG_DP_PHY_CTRL, 0x0);
 }
 
-int dp_catalog_ctrl_update_vx_px(struct dp_catalog *dp_catalog,
-   u8 v_level, u8 p_level)
-{
-   struct dp_catalog_private *catalog = container_of(dp_catalog,
-   struct dp_catalog_private, dp_catalog);
-   struct dp_io *dp_io = catalog->io;
-   struct phy *phy = dp_io->phy;
-   struct phy_configure_opts_dp *opts_dp = &dp_io->phy_opts.dp;
-
-   /* TODO: Update for all lanes instead of just first one */
-   opts_dp->voltage[0] = v_level;
-   opts_dp->pre[0] = p_level;
-   opts_dp->set_voltages = 1;
-   phy_configure(phy, &dp_io->phy_opts);
-   opts_dp->set_voltages = 0;
-
-   return 0;
-}
-
 void dp_catalog_ctrl_send_phy_pattern(struct dp_catalog *dp_catalog,
u32 pattern)
 {
diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.h 
b/drivers/gpu/drm/msm/dp/dp_catalog.h
index 38786e855b51..ba7c62ba7ca3 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.h
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.h
@@ -111,8 +111,6 @@ void dp_catalog_ctrl_set_psr(struct dp_catalog *dp_catalog, 
bool enter);
 u32 dp_catalog_link_is_connected(struct dp_catalog *dp_catalog);
 u32 dp_catalog_hpd_get_intr_status(struct dp_catalog *dp_catalog);
 void dp_catalog_ctrl_phy_reset(struct dp_catalog *dp_catalog);
-int dp_catalog_ctrl_update_vx_px(struct dp_catalog *dp_catalog, u8 v_level,
-   u8 p_level);
 int dp_catalog_ctrl_get_interrupt(struct dp_catalog *dp_catalog);
 u32 dp_catalog_ctrl_read_psr_interrupt_status(struct dp_catalog *dp_catalog);
 void dp_catalog_ctrl_update_transfer_unit(struct dp_catalog *dp_catalog,
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index e367eb8e5bea..4aea72a2b8e8 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -87,6 +87,8 @@ struct dp_ctrl_private {
 
struct clk *pixel_clk;
 
+   union phy_configure_opts phy_opts;
+
struct completion idle_comp;
struct completion psr_op_comp;
struct completion video_comp;
@@ -1017,6 +1019,21 @@ static int dp_ctrl_wait4video_ready(struct 
dp_ctrl_private *ctrl)
return ret;
 }
 
+static int dp_ctrl_set_vx_px(struct dp_ctrl_private *ctrl,
+u8 v_level, u8 p_level)
+{
+   union phy_configure_opts *phy_opts = &ctrl->phy_opts;
+
+   /* TODO: Update for all lanes instead of just first one */
+   phy_opts->dp.voltage[0] = v_level;
+   phy_opts->dp.pre[0] = p_level;
+   phy_opts->dp.set_voltages = 1;
+   phy_configure(ctrl->parser->io.phy, phy_opts);
+   phy_opts->dp.set_voltages = 0;
+
+   return 0;
+}
+
 static int dp_ctrl_update_vx_px(struct dp_ctrl_private *ctrl)
 {
struct dp_link *link = ctrl->link;
@@ -1029,7 +1046,7 @@ static int dp_ctrl_update_vx_px(struct dp_ctrl_private 
*ctrl)
drm_dbg_dp(ctrl->drm_dev,
"voltage level: %d emphasis level: %d\n",
voltage_swing_level, pre_emphasis_level);
-   ret = dp_catalog_ctrl_update_vx_px(ctrl->catalog,
+   ret = dp_ctrl_set_vx_px(ctrl,
voltage_swing_level, pre_emphasis_level);
 
if (ret)
@@ -1425,16 +1442,14 @@ static void dp_ctrl_link_clk_disable(struct dp_ctrl 
*dp_ctrl)
 static int dp_ctrl_enable_mainlink_clocks(struct dp_ctrl_private *ctrl)
 {
int ret = 0;
-   struct dp_io *dp_io = &ctrl->parser->io;
-   struct phy *phy = dp_io->phy;
-   struct phy_configure_opts_dp *opts_dp = &dp_io->phy_opts.dp;
+   struct phy *phy = ctrl->parser->io.phy;
const u8 *dpcd = ctrl->panel->dpcd;
 
-   opts_dp->lanes = ctrl->link->link_params.num_lanes;
-   opts_dp->link_rate = ctrl->link->link_params.rate / 100;
-   opts_dp->ssc = drm_dp_max_downspread(dpcd);
+   ctrl->phy_opts.dp.lanes = ctrl->link->link_params.num_lanes;
+   ctrl->phy_opts.dp.link_rate = ctrl->link->link_params.rate / 100;
+   ctrl->phy_opts.dp.ssc = drm_dp_max_downspread(dpcd);
 
-   phy_configure(phy, &dp_io->phy_opts);
+   phy_configure(phy, &ctrl->phy_opts);
phy_power_on(phy);
 
dev_pm_opp_set_rate(ctrl->dev, 

[PATCH v3 10/15] drm/msm/dp: remove PHY handling from dp_catalog.c

2024-01-25 Thread Dmitry Baryshkov
Inline dp_catalog_aux_update_cfg() and call phy_calibrate() from dp_aux
functions directly.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_aux.c |  9 +++--
 drivers/gpu/drm/msm/dp/dp_aux.h |  1 +
 drivers/gpu/drm/msm/dp/dp_catalog.c | 12 
 drivers/gpu/drm/msm/dp/dp_catalog.h |  1 -
 drivers/gpu/drm/msm/dp/dp_display.c |  4 +++-
 5 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_aux.c b/drivers/gpu/drm/msm/dp/dp_aux.c
index 03f4951c49f4..adbd5a367395 100644
--- a/drivers/gpu/drm/msm/dp/dp_aux.c
+++ b/drivers/gpu/drm/msm/dp/dp_aux.c
@@ -4,6 +4,7 @@
  */
 
 #include 
+#include <linux/phy/phy.h>
 #include 
 
 #include "dp_reg.h"
@@ -23,6 +24,8 @@ struct dp_aux_private {
struct device *dev;
struct dp_catalog *catalog;
 
+   struct phy *phy;
+
struct mutex mutex;
struct completion comp;
 
@@ -336,7 +339,7 @@ static ssize_t dp_aux_transfer(struct drm_dp_aux *dp_aux,
if (aux->native) {
aux->retry_cnt++;
if (!(aux->retry_cnt % MAX_AUX_RETRIES))
-   dp_catalog_aux_update_cfg(aux->catalog);
+   phy_calibrate(aux->phy);
}
/* reset aux if link is in connected state */
if (dp_catalog_link_is_connected(aux->catalog))
@@ -439,7 +442,7 @@ void dp_aux_reconfig(struct drm_dp_aux *dp_aux)
 
aux = container_of(dp_aux, struct dp_aux_private, dp_aux);
 
-   dp_catalog_aux_update_cfg(aux->catalog);
+   phy_calibrate(aux->phy);
dp_catalog_aux_reset(aux->catalog);
 }
 
@@ -517,6 +520,7 @@ static int dp_wait_hpd_asserted(struct drm_dp_aux *dp_aux,
 }
 
 struct drm_dp_aux *dp_aux_get(struct device *dev, struct dp_catalog *catalog,
+ struct phy *phy,
  bool is_edp)
 {
struct dp_aux_private *aux;
@@ -537,6 +541,7 @@ struct drm_dp_aux *dp_aux_get(struct device *dev, struct 
dp_catalog *catalog,
 
aux->dev = dev;
aux->catalog = catalog;
+   aux->phy = phy;
aux->retry_cnt = 0;
 
/*
diff --git a/drivers/gpu/drm/msm/dp/dp_aux.h b/drivers/gpu/drm/msm/dp/dp_aux.h
index 511305da4f66..16d9b1758748 100644
--- a/drivers/gpu/drm/msm/dp/dp_aux.h
+++ b/drivers/gpu/drm/msm/dp/dp_aux.h
@@ -17,6 +17,7 @@ void dp_aux_deinit(struct drm_dp_aux *dp_aux);
 void dp_aux_reconfig(struct drm_dp_aux *dp_aux);
 
 struct drm_dp_aux *dp_aux_get(struct device *dev, struct dp_catalog *catalog,
+ struct phy *phy,
  bool is_edp);
 void dp_aux_put(struct drm_dp_aux *aux);
 
diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index e07651768805..4c6207797c99 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -7,8 +7,6 @@
 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
 #include 
@@ -243,16 +241,6 @@ void dp_catalog_aux_enable(struct dp_catalog *dp_catalog, 
bool enable)
dp_write_aux(catalog, REG_DP_AUX_CTRL, aux_ctrl);
 }
 
-void dp_catalog_aux_update_cfg(struct dp_catalog *dp_catalog)
-{
-   struct dp_catalog_private *catalog = container_of(dp_catalog,
-   struct dp_catalog_private, dp_catalog);
-   struct dp_io *dp_io = catalog->io;
-   struct phy *phy = dp_io->phy;
-
-   phy_calibrate(phy);
-}
-
 int dp_catalog_aux_wait_for_hpd_connect_state(struct dp_catalog *dp_catalog)
 {
u32 state;
diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.h 
b/drivers/gpu/drm/msm/dp/dp_catalog.h
index ba7c62ba7ca3..1f3f58d4b8de 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.h
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.h
@@ -84,7 +84,6 @@ int dp_catalog_aux_clear_trans(struct dp_catalog *dp_catalog, 
bool read);
 int dp_catalog_aux_clear_hw_interrupts(struct dp_catalog *dp_catalog);
 void dp_catalog_aux_reset(struct dp_catalog *dp_catalog);
 void dp_catalog_aux_enable(struct dp_catalog *dp_catalog, bool enable);
-void dp_catalog_aux_update_cfg(struct dp_catalog *dp_catalog);
 int dp_catalog_aux_wait_for_hpd_connect_state(struct dp_catalog *dp_catalog);
 u32 dp_catalog_aux_get_irq(struct dp_catalog *dp_catalog);
 
diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index 6fbbd0f93d13..c1a51c498e01 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -729,7 +729,9 @@ static int dp_init_sub_modules(struct dp_display_private 
*dp)
goto error;
}
 
-   dp->aux = dp_aux_get(dev, dp->catalog, dp->dp_display.is_edp);
+   dp->aux = dp_aux_get(dev, dp->catalog,
+dp->parser->io.phy,
+dp->dp_display.is_edp);
if (IS_ERR(dp->aux)) {
rc = PTR_ERR(dp->aux);
DRM_ERROR("failed to initialize aux, rc = %d\n", rc);


[PATCH v3 08/15] drm/msm/dp: split dp_ctrl_clk_enable into four functions

2024-01-25 Thread Dmitry Baryshkov
Split the dp_ctrl_clk_enable() beast into four functions, each of them
doing just a single thing: enabling or disabling the core or link clocks.
This allows us to clean up the dss_module_power structure and makes
several dp_ctrl functions return void.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_ctrl.c| 220 +---
 drivers/gpu/drm/msm/dp/dp_ctrl.h|  16 +--
 drivers/gpu/drm/msm/dp/dp_display.c |   4 +-
 3 files changed, 108 insertions(+), 132 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index cfcf6136ffa6..e367eb8e5bea 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -69,11 +69,6 @@ struct dp_vc_tu_mapping_table {
u8 tu_size_minus1;
 };
 
-struct dss_module_power {
-   unsigned int num_clk;
-   struct clk_bulk_data *clocks;
-};
-
 struct dp_ctrl_private {
struct dp_ctrl dp_ctrl;
struct drm_device *drm_dev;
@@ -84,7 +79,12 @@ struct dp_ctrl_private {
struct dp_parser *parser;
struct dp_catalog *catalog;
 
-   struct dss_module_power mp[DP_MAX_PM];
+   unsigned int num_core_clks;
+   struct clk_bulk_data *core_clks;
+
+   unsigned int num_link_clks;
+   struct clk_bulk_data *link_clks;
+
struct clk *pixel_clk;
 
struct completion idle_comp;
@@ -96,15 +96,6 @@ struct dp_ctrl_private {
bool stream_clks_on;
 };
 
-static inline const char *dp_pm_name(enum dp_pm_type module)
-{
-   switch (module) {
-   case DP_CORE_PM:return "DP_CORE_PM";
-   case DP_CTRL_PM:return "DP_CTRL_PM";
-   default:return "???";
-   }
-}
-
 static int dp_aux_link_configure(struct drm_dp_aux *aux,
struct dp_link_info *link)
 {
@@ -1337,67 +1328,76 @@ static int dp_ctrl_setup_main_link(struct 
dp_ctrl_private *ctrl,
return ret;
 }
 
-int dp_ctrl_clk_enable(struct dp_ctrl *dp_ctrl,
-  enum dp_pm_type pm_type, bool enable)
+int dp_ctrl_core_clk_enable(struct dp_ctrl *dp_ctrl)
 {
struct dp_ctrl_private *ctrl;
-   struct dss_module_power *mp;
int ret = 0;
 
ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
 
-   if (pm_type != DP_CORE_PM &&
-   pm_type != DP_CTRL_PM) {
-   DRM_ERROR("unsupported ctrl module: %s\n",
- dp_pm_name(pm_type));
-   return -EINVAL;
+   if (ctrl->core_clks_on) {
+   drm_dbg_dp(ctrl->drm_dev, "core clks already enabled\n");
+   return 0;
}
 
-   if (enable) {
-   if (pm_type == DP_CORE_PM && ctrl->core_clks_on) {
-   drm_dbg_dp(ctrl->drm_dev,
-  "core clks already enabled\n");
-   return 0;
-   }
+   ret = clk_bulk_prepare_enable(ctrl->num_core_clks, ctrl->core_clks);
+   if (ret)
+   return ret;
 
-   if (pm_type == DP_CTRL_PM && ctrl->link_clks_on) {
-   drm_dbg_dp(ctrl->drm_dev,
-  "links clks already enabled\n");
-   return 0;
-   }
+   ctrl->core_clks_on = true;
 
-   if ((pm_type == DP_CTRL_PM) && (!ctrl->core_clks_on)) {
-   drm_dbg_dp(ctrl->drm_dev,
-  "Enable core clks before link clks\n");
-   mp = &ctrl->mp[DP_CORE_PM];
+   drm_dbg_dp(ctrl->drm_dev, "enable core clocks \n");
+   drm_dbg_dp(ctrl->drm_dev, "stream_clks:%s link_clks:%s core_clks:%s\n",
+  ctrl->stream_clks_on ? "on" : "off",
+  ctrl->link_clks_on ? "on" : "off",
+  ctrl->core_clks_on ? "on" : "off");
 
-   ret = clk_bulk_prepare_enable(mp->num_clk, mp->clocks);
-   if (ret)
-   return ret;
+   return 0;
+}
 
-   ctrl->core_clks_on = true;
-   }
+void dp_ctrl_core_clk_disable(struct dp_ctrl *dp_ctrl)
+{
+   struct dp_ctrl_private *ctrl;
+
+   ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
+
+   clk_bulk_disable_unprepare(ctrl->num_core_clks, ctrl->core_clks);
+
+   ctrl->core_clks_on = false;
+
+   drm_dbg_dp(ctrl->drm_dev, "disable core clocks \n");
+   drm_dbg_dp(ctrl->drm_dev, "stream_clks:%s link_clks:%s core_clks:%s\n",
+  ctrl->stream_clks_on ? "on" : "off",
+  ctrl->link_clks_on ? "on" : "off",
+  ctrl->core_clks_on ? "on" : "off");
+}
+
+static int dp_ctrl_link_clk_enable(struct dp_ctrl *dp_ctrl)
+{
+   struct dp_ctrl_private *ctrl;
+   int ret = 0;
+
+   ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
+
+   if (ctrl->link_clks_on) {
+   drm_dbg_dp(ctrl->drm_dev, "links clks 

[PATCH v3 06/15] drm/msm/dp: simplify stream clocks handling

2024-01-25 Thread Dmitry Baryshkov
There is only a single DP_STREAM_PM clock, stream_pixel. Instead of
using a separate dss_module_power instance for this single clock, handle
this clock directly. This allows us to drop several wrapping functions.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_ctrl.c   | 91 --
 drivers/gpu/drm/msm/dp/dp_parser.c | 41 -
 drivers/gpu/drm/msm/dp/dp_parser.h |  2 -
 3 files changed, 47 insertions(+), 87 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index da29281c575b..56a424a82a1b 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -79,6 +79,8 @@ struct dp_ctrl_private {
struct dp_parser *parser;
struct dp_catalog *catalog;
 
+   struct clk *pixel_clk;
+
struct completion idle_comp;
struct completion psr_op_comp;
struct completion video_comp;
@@ -1320,27 +1322,6 @@ static int dp_ctrl_setup_main_link(struct 
dp_ctrl_private *ctrl,
return ret;
 }
 
-static void dp_ctrl_set_clock_rate(struct dp_ctrl_private *ctrl,
-   enum dp_pm_type module, char *name, unsigned long rate)
-{
-   u32 num = ctrl->parser->mp[module].num_clk;
-   struct clk_bulk_data *cfg = ctrl->parser->mp[module].clocks;
-
-   while (num && strcmp(cfg->id, name)) {
-   num--;
-   cfg++;
-   }
-
-   drm_dbg_dp(ctrl->drm_dev, "setting rate=%lu on clk=%s\n",
-   rate, name);
-
-   if (num)
-   clk_set_rate(cfg->clk, rate);
-   else
-   DRM_ERROR("%s clock doesn't exit to set rate %lu\n",
-   name, rate);
-}
-
 int dp_ctrl_clk_enable(struct dp_ctrl *dp_ctrl,
   enum dp_pm_type pm_type, bool enable)
 {
@@ -1351,8 +1332,7 @@ int dp_ctrl_clk_enable(struct dp_ctrl *dp_ctrl,
ctrl = container_of(dp_ctrl, struct dp_ctrl_private, dp_ctrl);
 
if (pm_type != DP_CORE_PM &&
-   pm_type != DP_CTRL_PM &&
-   pm_type != DP_STREAM_PM) {
+   pm_type != DP_CTRL_PM) {
DRM_ERROR("unsupported ctrl module: %s\n",
  dp_parser_pm_name(pm_type));
return -EINVAL;
@@ -1371,12 +1351,6 @@ int dp_ctrl_clk_enable(struct dp_ctrl *dp_ctrl,
return 0;
}
 
-   if (pm_type == DP_STREAM_PM && ctrl->stream_clks_on) {
-   drm_dbg_dp(ctrl->drm_dev,
-  "pixel clks already enabled\n");
-   return 0;
-   }
-
if ((pm_type == DP_CTRL_PM) && (!ctrl->core_clks_on)) {
drm_dbg_dp(ctrl->drm_dev,
   "Enable core clks before link clks\n");
@@ -1401,8 +1375,6 @@ int dp_ctrl_clk_enable(struct dp_ctrl *dp_ctrl,
 
if (pm_type == DP_CORE_PM)
ctrl->core_clks_on = enable;
-   else if (pm_type == DP_STREAM_PM)
-   ctrl->stream_clks_on = enable;
else
ctrl->link_clks_on = enable;
 
@@ -1734,14 +1706,23 @@ static int dp_ctrl_process_phy_test_request(struct 
dp_ctrl_private *ctrl)
}
 
pixel_rate = ctrl->panel->dp_mode.drm_mode.clock;
-   dp_ctrl_set_clock_rate(ctrl, DP_STREAM_PM, "stream_pixel", pixel_rate * 
1000);
-
-   ret = dp_ctrl_clk_enable(&ctrl->dp_ctrl, DP_STREAM_PM, true);
+   ret = clk_set_rate(ctrl->pixel_clk, pixel_rate * 1000);
if (ret) {
-   DRM_ERROR("Failed to start pixel clocks. ret=%d\n", ret);
+   DRM_ERROR("Failed to set pixel clock rate. ret=%d\n", ret);
return ret;
}
 
+   if (ctrl->stream_clks_on) {
+   drm_dbg_dp(ctrl->drm_dev, "pixel clks already enabled\n");
+   } else {
+   ret = clk_prepare_enable(ctrl->pixel_clk);
+   if (ret) {
+   DRM_ERROR("Failed to start pixel clocks. ret=%d\n", 
ret);
+   return ret;
+   }
+   ctrl->stream_clks_on = true;
+   }
+
dp_ctrl_send_phy_test_pattern(ctrl);
 
return 0;
@@ -1977,14 +1958,23 @@ int dp_ctrl_on_stream(struct dp_ctrl *dp_ctrl, bool 
force_link_train)
}
}
 
-   dp_ctrl_set_clock_rate(ctrl, DP_STREAM_PM, "stream_pixel", pixel_rate * 
1000);
-
-   ret = dp_ctrl_clk_enable(&ctrl->dp_ctrl, DP_STREAM_PM, true);
+   ret = clk_set_rate(ctrl->pixel_clk, pixel_rate * 1000);
if (ret) {
-   DRM_ERROR("Unable to start pixel clocks. ret=%d\n", ret);
+   DRM_ERROR("Failed to set pixel clock rate. ret=%d\n", ret);
goto end;
}
 
+   if (ctrl->stream_clks_on) {
+   drm_dbg_dp(ctrl->drm_dev, "pixel clks already enabled\n");
+   } else {
+   ret = clk_prepare_enable(ctrl->pixel_clk);
+

[PATCH v3 04/15] drm/msm/dp: inline dp_power_(de)init

2024-01-25 Thread Dmitry Baryshkov
In preparation for the cleanup of the dp_power module, inline the
dp_power_init() and dp_power_deinit() functions, which are now just
turning the clocks on and off.

Reviewed-by: Konrad Dybcio 
Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_display.c |  4 ++--
 drivers/gpu/drm/msm/dp/dp_power.c   | 10 --
 drivers/gpu/drm/msm/dp/dp_power.h   | 21 -
 3 files changed, 2 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index 67b48f0a6c83..8cd18705740f 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -434,7 +434,7 @@ static void dp_display_host_init(struct dp_display_private 
*dp)
dp->dp_display.connector_type, dp->core_initialized,
dp->phy_initialized);
 
-   dp_power_init(dp->power);
+   dp_power_clk_enable(dp->power, DP_CORE_PM, true);
dp_ctrl_reset_irq_ctrl(dp->ctrl, true);
dp_aux_init(dp->aux);
dp->core_initialized = true;
@@ -448,7 +448,7 @@ static void dp_display_host_deinit(struct 
dp_display_private *dp)
 
dp_ctrl_reset_irq_ctrl(dp->ctrl, false);
dp_aux_deinit(dp->aux);
-   dp_power_deinit(dp->power);
+   dp_power_clk_enable(dp->power, DP_CORE_PM, false);
dp->core_initialized = false;
 }
 
diff --git a/drivers/gpu/drm/msm/dp/dp_power.c 
b/drivers/gpu/drm/msm/dp/dp_power.c
index b095a5b47c8b..f49e3aede308 100644
--- a/drivers/gpu/drm/msm/dp/dp_power.c
+++ b/drivers/gpu/drm/msm/dp/dp_power.c
@@ -152,16 +152,6 @@ int dp_power_client_init(struct dp_power *dp_power)
return dp_power_clk_init(power);
 }
 
-int dp_power_init(struct dp_power *dp_power)
-{
-   return dp_power_clk_enable(dp_power, DP_CORE_PM, true);
-}
-
-int dp_power_deinit(struct dp_power *dp_power)
-{
-   return dp_power_clk_enable(dp_power, DP_CORE_PM, false);
-}
-
 struct dp_power *dp_power_get(struct device *dev, struct dp_parser *parser)
 {
struct dp_power_private *power;
diff --git a/drivers/gpu/drm/msm/dp/dp_power.h 
b/drivers/gpu/drm/msm/dp/dp_power.h
index 55ada51edb57..eb836b5aa24a 100644
--- a/drivers/gpu/drm/msm/dp/dp_power.h
+++ b/drivers/gpu/drm/msm/dp/dp_power.h
@@ -22,27 +22,6 @@ struct dp_power {
bool stream_clks_on;
 };
 
-/**
- * dp_power_init() - enable power supplies for display controller
- *
- * @power: instance of power module
- * return: 0 if success or error if failure.
- *
- * This API will turn on the regulators and configures gpio's
- * aux/hpd.
- */
-int dp_power_init(struct dp_power *power);
-
-/**
- * dp_power_deinit() - turn off regulators and gpios.
- *
- * @power: instance of power module
- * return: 0 for success
- *
- * This API turns off power and regulators.
- */
-int dp_power_deinit(struct dp_power *power);
-
 /**
  * dp_power_clk_status() - display controller clocks status
  *

-- 
2.39.2



[PATCH v3 03/15] drm/msm/dp: parse DT from dp_parser_get

2024-01-25 Thread Dmitry Baryshkov
It makes little sense to split the submodule get from the actual DT
parsing. Call dp_parser_parse() directly from dp_parser_get(), so that
the parser data is fully initialised once it is returned to the caller.

Reviewed-by: Konrad Dybcio 
Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_display.c | 6 --
 drivers/gpu/drm/msm/dp/dp_parser.c  | 8 +++-
 drivers/gpu/drm/msm/dp/dp_parser.h  | 3 ---
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index d37d599aec27..67b48f0a6c83 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1266,12 +1266,6 @@ static int dp_display_probe(struct platform_device *pdev)
return -EPROBE_DEFER;
}
 
-   rc = dp->parser->parse(dp->parser);
-   if (rc) {
-   DRM_ERROR("device tree parsing failed\n");
-   goto err;
-   }
-
rc = dp_power_client_init(dp->power);
if (rc) {
DRM_ERROR("Power client create failed\n");
diff --git a/drivers/gpu/drm/msm/dp/dp_parser.c 
b/drivers/gpu/drm/msm/dp/dp_parser.c
index 7032dcc8842b..2d9d126c119b 100644
--- a/drivers/gpu/drm/msm/dp/dp_parser.c
+++ b/drivers/gpu/drm/msm/dp/dp_parser.c
@@ -315,13 +315,19 @@ static int dp_parser_parse(struct dp_parser *parser)
 struct dp_parser *dp_parser_get(struct platform_device *pdev)
 {
struct dp_parser *parser;
+   int ret;
 
	parser = devm_kzalloc(&pdev->dev, sizeof(*parser), GFP_KERNEL);
if (!parser)
return ERR_PTR(-ENOMEM);
 
-   parser->parse = dp_parser_parse;
parser->pdev = pdev;
 
+   ret = dp_parser_parse(parser);
+   if (ret) {
+   dev_err(&pdev->dev, "device tree parsing failed\n");
+   return ERR_PTR(ret);
+   }
+
return parser;
 }
diff --git a/drivers/gpu/drm/msm/dp/dp_parser.h 
b/drivers/gpu/drm/msm/dp/dp_parser.h
index 90a2cdbbe344..4ccc432b4142 100644
--- a/drivers/gpu/drm/msm/dp/dp_parser.h
+++ b/drivers/gpu/drm/msm/dp/dp_parser.h
@@ -67,7 +67,6 @@ struct dss_module_power {
  *
  * @pdev: platform data of the client
  * @mp: gpio, regulator and clock related data
- * @parse: function to be called by client to parse device tree.
  */
 struct dp_parser {
struct platform_device *pdev;
@@ -76,8 +75,6 @@ struct dp_parser {
u32 max_dp_lanes;
u32 max_dp_link_rate;
struct drm_bridge *next_bridge;
-
-   int (*parse)(struct dp_parser *parser);
 };
 
 /**

-- 
2.39.2



[PATCH v3 01/15] drm/msm/dp: drop unused parser definitions

2024-01-25 Thread Dmitry Baryshkov
Drop several unused and obsolete definitions from the dp_parser module.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_parser.h | 46 --
 1 file changed, 46 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_parser.h 
b/drivers/gpu/drm/msm/dp/dp_parser.h
index 1f068626d445..90a2cdbbe344 100644
--- a/drivers/gpu/drm/msm/dp/dp_parser.h
+++ b/drivers/gpu/drm/msm/dp/dp_parser.h
@@ -12,7 +12,6 @@
 
 #include "msm_drv.h"
 
-#define DP_LABEL "MDSS DP DISPLAY"
 #define DP_MAX_PIXEL_CLK_KHZ   675000
 #define DP_MAX_NUM_DP_LANES4
 #define DP_LINK_RATE_HBR2  54 /* kbytes */
@@ -21,7 +20,6 @@ enum dp_pm_type {
DP_CORE_PM,
DP_CTRL_PM,
DP_STREAM_PM,
-   DP_PHY_PM,
DP_MAX_PM
 };
 
@@ -43,28 +41,10 @@ static inline const char *dp_parser_pm_name(enum dp_pm_type 
module)
case DP_CORE_PM:return "DP_CORE_PM";
case DP_CTRL_PM:return "DP_CTRL_PM";
case DP_STREAM_PM:  return "DP_STREAM_PM";
-   case DP_PHY_PM: return "DP_PHY_PM";
default:return "???";
}
 }
 
-/**
- * struct dp_display_data  - display related device tree data.
- *
- * @ctrl_node: referece to controller device
- * @phy_node:  reference to phy device
- * @is_active: is the controller currently active
- * @name: name of the display
- * @display_type: type of the display
- */
-struct dp_display_data {
-   struct device_node *ctrl_node;
-   struct device_node *phy_node;
-   bool is_active;
-   const char *name;
-   const char *display_type;
-};
-
 /**
  * struct dp_ctrl_resource - controller's IO related data
  *
@@ -77,28 +57,6 @@ struct dp_io {
union phy_configure_opts phy_opts;
 };
 
-/**
- * struct dp_pinctrl - DP's pin control
- *
- * @pin: pin-controller's instance
- * @state_active: active state pin control
- * @state_hpd_active: hpd active state pin control
- * @state_suspend: suspend state pin control
- */
-struct dp_pinctrl {
-   struct pinctrl *pin;
-   struct pinctrl_state *state_active;
-   struct pinctrl_state *state_hpd_active;
-   struct pinctrl_state *state_suspend;
-};
-
-/* Regulators for DP devices */
-struct dp_reg_entry {
-   char name[32];
-   int enable_load;
-   int disable_load;
-};
-
 struct dss_module_power {
unsigned int num_clk;
struct clk_bulk_data *clocks;
@@ -109,16 +67,12 @@ struct dss_module_power {
  *
  * @pdev: platform data of the client
  * @mp: gpio, regulator and clock related data
- * @pinctrl: pin-control related data
- * @disp_data: controller's display related data
  * @parse: function to be called by client to parse device tree.
  */
 struct dp_parser {
struct platform_device *pdev;
struct dss_module_power mp[DP_MAX_PM];
-   struct dp_pinctrl pinctrl;
struct dp_io io;
-   struct dp_display_data disp_data;
u32 max_dp_lanes;
u32 max_dp_link_rate;
struct drm_bridge *next_bridge;

-- 
2.39.2



[PATCH v3 00/15] drm/msm/dp: clear power and parser submodules away

2024-01-25 Thread Dmitry Baryshkov
Reshuffle code in the DP driver, cleaning up clocks and DT parsing and
dropping the dp_power and dp_parser submodules.

Initially I started by looking into the stream_pixel clock handling,
only to find several wrapping layers around a single clock. After
inlining and/or dropping them (and thus the dp_power submodule), it was
more or less natural to continue cleaning up the dp_parser until it got
removed completely.

---
Changes in v3:
- Fixed crash in the DP when there is no next bridge (Kuogee)
- Removed excess documentation for the removed dp_parser::io field
- Link to v2: 
https://lore.kernel.org/r/20231231-dp-power-parser-cleanup-v2-0-fc3e902a6...@linaro.org

Changes in v2:
- Fixed unrelated power->ctrl change in comment (Konrad)
- Made sure that all functions use reverse-Christmas-tree flow (Konrad)
- Fixed indents in several moved functions
- Added a patch splitting dp_ctlr_clk_enable
- Link to v1: 
https://lore.kernel.org/r/20231229225650.912751-1-dmitry.barysh...@linaro.org

---
Dmitry Baryshkov (15):
  drm/msm/dp: drop unused parser definitions
  drm/msm/dp: drop unused fields from dp_power_private
  drm/msm/dp: parse DT from dp_parser_get
  drm/msm/dp: inline dp_power_(de)init
  drm/msm/dp: fold dp_power into dp_ctrl module
  drm/msm/dp: simplify stream clocks handling
  drm/msm/dp: stop parsing clock names from DT
  drm/msm/dp: split dp_ctrl_clk_enable into four functions
  drm/msm/dp: move phy_configure_opts to dp_ctrl
  drm/msm/dp: remove PHY handling from dp_catalog.c
  drm/msm/dp: handle PHY directly in dp_ctrl
  drm/msm/dp: move all IO handling to dp_catalog
  drm/msm/dp: move link property handling to dp_panel
  drm/msm/dp: move next_bridge handling to dp_display
  drm/msm/dp: drop dp_parser

 drivers/gpu/drm/msm/Makefile|   2 -
 drivers/gpu/drm/msm/dp/dp_aux.c |   9 +-
 drivers/gpu/drm/msm/dp/dp_aux.h |   2 +
 drivers/gpu/drm/msm/dp/dp_catalog.c | 156 +++-
 drivers/gpu/drm/msm/dp/dp_catalog.h |   6 +-
 drivers/gpu/drm/msm/dp/dp_ctrl.c| 358 
 drivers/gpu/drm/msm/dp/dp_ctrl.h|  17 +-
 drivers/gpu/drm/msm/dp/dp_debug.c   |   1 -
 drivers/gpu/drm/msm/dp/dp_display.c | 102 +++---
 drivers/gpu/drm/msm/dp/dp_display.h |   3 +-
 drivers/gpu/drm/msm/dp/dp_panel.c   |  66 +++
 drivers/gpu/drm/msm/dp/dp_parser.c  | 327 
 drivers/gpu/drm/msm/dp/dp_parser.h  | 155 
 drivers/gpu/drm/msm/dp/dp_power.c   | 183 --
 drivers/gpu/drm/msm/dp/dp_power.h   |  95 --
 15 files changed, 465 insertions(+), 1017 deletions(-)
---
base-commit: 39676dfe52331dba909c617f213fdb21015c8d10
change-id: 20231231-dp-power-parser-cleanup-9e3a5f9a6821

Best regards,
-- 
Dmitry Baryshkov 



[PATCH v3 02/15] drm/msm/dp: drop unused fields from dp_power_private

2024-01-25 Thread Dmitry Baryshkov
Drop unused and obsolete fields from struct dp_power_private.

Reviewed-by: Konrad Dybcio 
Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/msm/dp/dp_power.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_power.c 
b/drivers/gpu/drm/msm/dp/dp_power.c
index c4843dd69f47..b095a5b47c8b 100644
--- a/drivers/gpu/drm/msm/dp/dp_power.c
+++ b/drivers/gpu/drm/msm/dp/dp_power.c
@@ -16,9 +16,6 @@ struct dp_power_private {
struct dp_parser *parser;
struct device *dev;
struct drm_device *drm_dev;
-   struct clk *link_clk_src;
-   struct clk *pixel_provider;
-   struct clk *link_provider;
 
struct dp_power dp_power;
 };

-- 
2.39.2



[pull] amdgpu drm-fixes-6.8

2024-01-25 Thread Alex Deucher
Hi Dave, Sima,

Fixes for 6.8.

The following changes since commit b16702be210bb49256f8a32df2c310383134dd57:

  Merge tag 'exynos-drm-fixes-for-v6.8-rc2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes 
(2024-01-25 14:22:15 +1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-fixes-6.8-2024-01-25

for you to fetch changes up to c82eb25c5f005b33aebb1415a8472fc2eeea0deb:

  drm/amd/display: "Enable IPS by default" (2024-01-25 16:00:24 -0500)


amd-drm-fixes-6.8-2024-01-25:

amdgpu:
- AC/DC power supply tracking fix
- Don't show invalid vram vendor data
- SMU 13.0.x fixes
- GART fix for umr on systems without VRAM
- GFX 10/11 UNORD_DISPATCH fixes
- IPS display fixes (required for S0ix on some platforms)
- Misc fixes


Alex Deucher (2):
  drm/amdgpu/gfx10: set UNORD_DISPATCH in compute MQDs
  drm/amdgpu/gfx11: set UNORD_DISPATCH in compute MQDs

Alvin Lee (1):
  drm/amd/display: Add Replay IPS register for DMUB command table

ChunTao Tso (1):
  drm/amd/display: Replay + IPS + ABM in Full Screen VPB

Hawking Zhang (1):
  drm/amdgpu: Fix null pointer dereference

Kenneth Feng (1):
  drm/amd/pm: update the power cap setting

Lijo Lazar (3):
  drm/amdgpu: Avoid fetching vram vendor information
  drm/amdgpu: Show vram vendor only if available
  drm/amd/pm: Fetch current power limit from FW

Ma Jun (1):
  drm/amdgpu/pm: Fix the power source flag error

Nicholas Kazlauskas (1):
  drm/amd/display: Allow IPS2 during Replay

Roman Li (4):
  drm/amd/display: Add IPS checks before dcn register access
  drm/amd/display: Disable ips before dc interrupt setting
  drm/amd: Add a DC debug mask for IPS
  drm/amd/display: "Enable IPS by default"

Srinivasan Shanmugam (1):
  drm/amd/display: Fix uninitialized variable usage in core_link_ 
'read_dpcd() & write_dpcd()' functions

Tom St Denis (1):
  drm/amd/amdgpu: Assign GART pages to AMD device mapping

Yang Wang (1):
  drm/amd/pm: udpate smu v13.0.6 message permission

 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |  8 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c   | 17 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c   |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c   |  1 +
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  | 21 -
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c  |  5 +-
 drivers/gpu/drm/amd/display/dc/dc.h|  1 +
 drivers/gpu/drm/amd/display/dc/dc_types.h  |  5 ++
 .../drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c|  9 +++-
 .../drm/amd/display/dc/link/protocols/link_dpcd.c  |  4 +-
 drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h| 47 +++
 .../drm/amd/display/modules/power/power_helpers.c  |  5 ++
 .../drm/amd/display/modules/power/power_helpers.h  |  1 +
 drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
 drivers/gpu/drm/amd/include/amdgpu_reg_state.h |  2 +-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  | 14 ++
 drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c |  2 +
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c |  2 +
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c   | 54 +-
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c   |  4 +-
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c   | 54 +-
 24 files changed, 229 insertions(+), 36 deletions(-)


Re: Making drm_gpuvm work across gpu devices

2024-01-25 Thread Danilo Krummrich

On 1/24/24 04:57, Zeng, Oak wrote:

Thanks a lot Danilo.

Maybe I wasn't clear enough. In the solution I proposed, each device still has 
separate vm/page tables. Each device still needs to manage the mappings, page 
table flags, etc. It is just that in the SVM use case, all devices share one 
drm_gpuvm instance. As I understand it, drm_gpuvm's main function is va range 
splitting and merging. I don't see why that doesn't work across gpu devices.


I'm pretty sure it does work. You can indeed use GPUVM for tracking mappings 
using the split and merge feature only, ignoring all the other features it 
provides. However, I don't think it's a good idea to have a single GPUVM 
instance track the memory mappings of different devices with different page 
tables, different object lifetimes, etc.



But I read more about drm_gpuvm. Its split/merge functions take a 
drm_gem_object parameter; see drm_gpuvm_sm_map_ops_create and drm_gpuvm_sm_map. 
Actually, the whole drm_gpuvm is designed for BO-centric drivers; for example, 
it has a drm_gpuvm_bo concept to keep track of the 1 BO : N gpuva mapping. The 
whole purpose of leveraging drm_gpuvm is to re-use the va split/merge functions 
for SVM. But in our SVM implementation, there is no buffer object at all. So I 
don't think our SVM code can leverage drm_gpuvm.


Those are all optional features. As mentioned above, you can use GPUVM for 
tracking mappings using the split and merge feature only. The drm_gem_object 
parameter in drm_gpuvm_sm_map_ops_create() can simply be NULL. Afaik, Xe 
already does that for the userptr stuff. But again, I don't think it's a good 
idea to track the memory mappings of multiple independent physical devices and 
driver instances in a single place, whether you use GPUVM or a custom 
implementation.
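
E.g. something like this should be all that's needed (untested sketch;
gpuvm/addr/range are placeholders, and this assumes the in-tree
drm_gpuvm_sm_map_ops_create() signature hasn't changed):

	struct drm_gpuva_ops *ops;

	/* obj == NULL: pure VA split/merge bookkeeping, no GEM backing */
	ops = drm_gpuvm_sm_map_ops_create(gpuvm, addr, range, NULL, 0);
	if (IS_ERR(ops))
		return PTR_ERR(ops);

	/* walk ops and apply the map/remap/unmap steps to the page tables */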

- Danilo



I will give up this approach, unless Matt or Brian can see a way.

A few replies inline. @Welty, Brian: I had more thoughts inline on one of 
your original questions.


-Original Message-
From: Danilo Krummrich 
Sent: Tuesday, January 23, 2024 6:57 PM
To: Zeng, Oak ; Christian König
; Dave Airlie ; Daniel Vetter
; Felix Kuehling 
Cc: Welty, Brian ; dri-devel@lists.freedesktop.org; 
intel-
x...@lists.freedesktop.org; Bommu, Krishnaiah ;
Ghimiray, Himal Prasad ;
thomas.hellst...@linux.intel.com; Vishwanathapura, Niranjana
; Brost, Matthew
; Gupta, saurabhg 
Subject: Re: Making drm_gpuvm work across gpu devices

Hi Oak,

On 1/23/24 20:37, Zeng, Oak wrote:

Thanks Christian. I have some comment inline below.

Danilo, can you also take a look and give your feedback? Thanks.


I agree with everything Christian already wrote. Except for the KFD parts, which
I'm simply not familiar with, I had exactly the same thoughts after reading your
initial mail.

Please find some more comments below.




-Original Message-
From: Christian König 
Sent: Tuesday, January 23, 2024 6:13 AM
To: Zeng, Oak ; Danilo Krummrich ;
Dave Airlie ; Daniel Vetter 
Cc: Welty, Brian ; dri-devel@lists.freedesktop.org;

intel-

x...@lists.freedesktop.org; Bommu, Krishnaiah

;

Ghimiray, Himal Prasad ;
thomas.hellst...@linux.intel.com; Vishwanathapura, Niranjana
; Brost, Matthew

Subject: Re: Making drm_gpuvm work across gpu devices

Hi Oak,

Am 23.01.24 um 04:21 schrieb Zeng, Oak:

Hi Danilo and all,

During the work of Intel's SVM code, we came up the idea of making

drm_gpuvm to work across multiple gpu devices. See some discussion here:
https://lore.kernel.org/dri-


devel/PH7PR11MB70049E7E6A2F40BF6282ECC292742@PH7PR11MB7004.namprd

11.prod.outlook.com/


The reason we try to do this is, for a SVM (shared virtual memory across cpu

program and all gpu program on all gpu devices) process, the address space

has

to be across all gpu devices. So if we make drm_gpuvm to work across devices,
then our SVM code can leverage drm_gpuvm as well.


At a first look, it seems feasible because drm_gpuvm doesn't really use the

drm_device *drm pointer a lot. This param is used only for printing/warning.

So I

think maybe we can delete this drm field from drm_gpuvm.


This way, on a multiple gpu device system, for one process, we can have only

one drm_gpuvm instance, instead of multiple drm_gpuvm instances (one for
each gpu device).


What do you think?


Well from the GPUVM side I don't think it would make much difference if
we have the drm device or not.

But the experience we had with the KFD I think I should mention that we
should absolutely *not* deal with multiple devices at the same time in
the UAPI or VM objects inside the driver.

The background is that all the APIs inside the Linux kernel are build
around the idea that they work with only one device at a time. This
accounts for both low level APIs like the DMA API as well as pretty high
level things like for example file system address space etc...


Yes, most APIs are per-device based.

One exception I know of is actually the KFD SVM API. If you look at the svm_ioctl

function, it is per-process based. Each 

Re: [PATCH 10/17] drm/msm/dp: modify dp_catalog_hw_revision to show major and minor val

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

Modify dp_catalog_hw_revision to expose the major and minor version
values instead of returning the raw hex value of the hardware version
register, in preparation for using them for VSC SDP programming.

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/dp/dp_catalog.c | 12 +---
  drivers/gpu/drm/msm/dp/dp_catalog.h |  2 +-
  2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index 5d84c089e520a..c025786170ba5 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -24,6 +24,9 @@
  #define DP_INTERRUPT_STATUS_ACK_SHIFT 1
  #define DP_INTERRUPT_STATUS_MASK_SHIFT2
  
+#define DP_HW_VERSION_MAJOR(reg)	FIELD_GET(GENMASK(31, 28), reg)

+#define DP_HW_VERSION_MINOR(reg)   FIELD_GET(GENMASK(27, 16), reg)
+
  #define DP_INTF_CONFIG_DATABUS_WIDEN BIT(4)
  
  #define DP_INTERRUPT_STATUS1 \

@@ -531,15 +534,18 @@ int dp_catalog_ctrl_set_pattern_state_bit(struct 
dp_catalog *dp_catalog,
   *
   * @dp_catalog: DP catalog structure
   *
- * Return: DP controller hw revision
+ * Return: void
   *
   */
-u32 dp_catalog_hw_revision(const struct dp_catalog *dp_catalog)
+void dp_catalog_hw_revision(const struct dp_catalog *dp_catalog, u16 *major, 
u16 *minor)
  {
const struct dp_catalog_private *catalog = container_of(dp_catalog,
struct dp_catalog_private, dp_catalog);
+   u32 reg_dp_hw_version;
  
-	return dp_read_ahb(catalog, REG_DP_HW_VERSION);

+   reg_dp_hw_version = dp_read_ahb(catalog, REG_DP_HW_VERSION);
+   *major = DP_HW_VERSION_MAJOR(reg_dp_hw_version);
+   *minor = DP_HW_VERSION_MINOR(reg_dp_hw_version);


After looking at the code, it might be easier to keep 
dp_catalog_hw_revision as is, add a define for hw revision 1.2 and 
compare against it directly.
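
Something like this (untested; DP_HW_VERSION_1_2 is just a suggested
name for the new define):

	#define DP_HW_VERSION_1_2	0x10020000 /* major 31:28 = 1, minor 27:16 = 2 */

	if (dp_catalog_hw_revision(dp_catalog) >= DP_HW_VERSION_1_2)
		mainlink_ctrl |= DP_MAINLINK_FLUSH_MODE_SDE_PERIPH_UPDATE;
	else
		mainlink_ctrl |= DP_MAINLINK_FLUSH_MODE_UPDATE_SDP;

Since the major version sits in the topmost bits, a plain unsigned
comparison of the raw register value orders revisions correctly, so
this keeps working for 2.x and later.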



  }
  
  /**

diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.h 
b/drivers/gpu/drm/msm/dp/dp_catalog.h
index 563903605b3a7..94c377ef90c35 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.h
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.h
@@ -170,7 +170,7 @@ void dp_catalog_ctrl_config_misc(struct dp_catalog 
*dp_catalog, u32 cc, u32 tb);
  void dp_catalog_ctrl_config_msa(struct dp_catalog *dp_catalog, u32 rate,
u32 stream_rate_khz, bool fixed_nvid, bool 
is_ycbcr_420);
  int dp_catalog_ctrl_set_pattern_state_bit(struct dp_catalog *dp_catalog, u32 
pattern);
-u32 dp_catalog_hw_revision(const struct dp_catalog *dp_catalog);
+void dp_catalog_hw_revision(const struct dp_catalog *dp_catalog, u16 *major, 
u16 *minor);
  void dp_catalog_ctrl_reset(struct dp_catalog *dp_catalog);
  bool dp_catalog_ctrl_mainlink_ready(struct dp_catalog *dp_catalog);
  void dp_catalog_ctrl_enable_irq(struct dp_catalog *dp_catalog, bool enable);


--
With best wishes
Dmitry



Re: [PATCH 17/17] drm/msm/dp: allow YUV420 mode for DP connector when VSC SDP supported

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

All the components of YUV420 over DP are now in place. Therefore, let's
set the connector property to true for the DP connector when the DP type
is not eDP and VSC SDP is supported.

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/dp/dp_display.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index 4329435518351..97edd607400b8 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -370,11 +370,14 @@ static int dp_display_process_hpd_high(struct 
dp_display_private *dp)
  
  	dp_link_process_request(dp->link);
  
-	if (!dp->dp_display.is_edp)

+   if (!dp->dp_display.is_edp) {
+   if (dp_panel_vsc_sdp_supported(dp->panel))
+   dp->dp_display.connector->ycbcr_420_allowed = true;


Please consider fixing a TODO in drm_bridge_connector_init().


drm_dp_set_subconnector_property(dp->dp_display.connector,
 connector_status_connected,
 dp->panel->dpcd,
 dp->panel->downstream_ports);
+   }
  
  	edid = dp->panel->edid;
  


--
With best wishes
Dmitry



Re: [PATCH 16/17] drm/msm/dpu: reserve CDM blocks for DP if mode is YUV420

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

Reserve CDM blocks for DP if the mode format is YUV420. Currently this
reservation only works for writeback, and for DP only with YUV420, but
it can easily be extended to other YUV formats for DP.

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 20 +---
  1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 99ec53446ad21..c7dcda3d54ae6 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -613,6 +613,7 @@ static int dpu_encoder_virt_atomic_check(
struct dpu_kms *dpu_kms;
struct drm_display_mode *adj_mode;
struct msm_display_topology topology;
+   struct msm_display_info *disp_info;
struct dpu_global_state *global_state;
struct drm_framebuffer *fb;
struct drm_dsc_config *dsc;
@@ -629,6 +630,7 @@ static int dpu_encoder_virt_atomic_check(
DPU_DEBUG_ENC(dpu_enc, "\n");
  
  	priv = drm_enc->dev->dev_private;

+   disp_info = &dpu_enc->disp_info;
dpu_kms = to_dpu_kms(priv->kms);
adj_mode = _state->adjusted_mode;
global_state = dpu_kms_get_global_state(crtc_state->state);
@@ -656,8 +658,8 @@ static int dpu_encoder_virt_atomic_check(
topology = dpu_encoder_get_topology(dpu_enc, dpu_kms, adj_mode, 
crtc_state, dsc);
  
  	/*

-* Use CDM only for writeback at the moment as other interfaces cannot 
handle it.
-* if writeback itself cannot handle cdm for some reason it will fail 
in its atomic_check()
+* Use CDM only for writeback or DP at the moment as other interfaces 
cannot handle it.
+* If writeback itself cannot handle cdm for some reason it will fail 
in its atomic_check()
 * earlier.
 */
if (dpu_enc->disp_info.intf_type == INTF_WB && 
conn_state->writeback_job) {
@@ -665,12 +667,15 @@ static int dpu_encoder_virt_atomic_check(
  
		if (fb && DPU_FORMAT_IS_YUV(to_dpu_format(msm_framebuffer_format(fb))))

topology.needs_cdm = true;
-   if (topology.needs_cdm && !dpu_enc->cur_master->hw_cdm)
-   crtc_state->mode_changed = true;
-   else if (!topology.needs_cdm && dpu_enc->cur_master->hw_cdm)
-   crtc_state->mode_changed = true;
+   } else if (dpu_enc->disp_info.intf_type == INTF_DP) {


You can use disp_info directly here.


+   if 
(msm_dp_is_yuv_420_enabled(priv->dp[disp_info->h_tile_instance[0]], adj_mode))
+   topology.needs_cdm = true;
}
  
+	if (topology.needs_cdm && !dpu_enc->cur_master->hw_cdm)

+   crtc_state->mode_changed = true;
+   else if (!topology.needs_cdm && dpu_enc->cur_master->hw_cdm)
+   crtc_state->mode_changed = true;
/*
 * Release and Allocate resources on every modeset
 * Dont allocate when active is false.
@@ -,7 +1116,8 @@ static void dpu_encoder_virt_atomic_mode_set(struct 
drm_encoder *drm_enc,
  
  	dpu_enc->dsc_mask = dsc_mask;
  
-	if (dpu_enc->disp_info.intf_type == INTF_WB && conn_state->writeback_job) {

+   if ((dpu_enc->disp_info.intf_type == INTF_WB && 
conn_state->writeback_job) ||
+   dpu_enc->disp_info.intf_type == INTF_DP) {
struct dpu_hw_blk *hw_cdm = NULL;
  
		dpu_rm_get_assigned_resources(&dpu_kms->rm, global_state,


--
With best wishes
Dmitry



Re: [PATCH 15/17] drm/msm/dpu: allow certain formats for CDM for DP

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

The CDM block supports formats other than H1V2 for DP. Since we are now
adding support for CDM over DP, relax the check to allow all formats
other than H1V2 for DP.

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c
index e9cdc7934a499..9016b3ade6bc3 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c
@@ -186,7 +186,7 @@ static int dpu_hw_cdm_enable(struct dpu_hw_cdm *ctx, struct 
dpu_hw_cdm_cfg *cdm)
dpu_hw_cdm_setup_cdwn(ctx, cdm);
  
  	if (cdm->output_type == CDM_CDWN_OUTPUT_HDMI) {

-   if (fmt->chroma_sample != DPU_CHROMA_H1V2)
+   if (fmt->chroma_sample == DPU_CHROMA_H1V2)
return -EINVAL; /*unsupported format */


This means that the original check was incorrect. Please add a 
corresponding Fixes tag and move this to the top of the patchset.



opmode = CDM_HDMI_PACK_OP_MODE_EN;
opmode |= (fmt->chroma_sample << 1);


--
With best wishes
Dmitry



Re: [PATCH 14/17] drm/msm/dpu: modify encoder programming for CDM over DP

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

Adjust the encoder format programming in the case of video mode for DP
to accommodate CDM related changes.

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   | 16 +
  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h   |  8 +
  .../drm/msm/disp/dpu1/dpu_encoder_phys_vid.c  | 35 ---
  drivers/gpu/drm/msm/dp/dp_display.c   | 12 +++
  drivers/gpu/drm/msm/msm_drv.h |  9 -
  5 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index b0896814c1562..99ec53446ad21 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -222,6 +222,22 @@ static u32 dither_matrix[DITHER_MATRIX_SZ] = {
15, 7, 13, 5, 3, 11, 1, 9, 12, 4, 14, 6, 0, 8, 2, 10
  };
  
+u32 dpu_encoder_get_drm_fmt(const struct drm_encoder *drm_enc, const struct drm_display_mode *mode)

+{
+   const struct dpu_encoder_virt *dpu_enc;
+   const struct msm_display_info *disp_info;
+   struct msm_drm_private *priv;
+
+   dpu_enc = to_dpu_encoder_virt(drm_enc);
+   disp_info = &dpu_enc->disp_info;
+   priv = drm_enc->dev->dev_private;
+
+   if (disp_info->intf_type == INTF_DP &&
+   msm_dp_is_yuv_420_enabled(priv->dp[disp_info->h_tile_instance[0]], 
mode))


This should not require interacting with DP. If we got here, we must be 
sure that 4:2:0 is supported and can be configured.
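
A rough sketch of what I mean (untested; how you get to the connector
from the encoder is hand-waved here, but drm_mode_is_420() already
exists for exactly this check):

	/* connector is assumed to be the one backing this encoder */
	if (drm_mode_is_420(&connector->display_info, mode))
		return DRM_FORMAT_YUV420;

	return DRM_FORMAT_RGB888;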



+   return DRM_FORMAT_YUV420;
+
+   return DRM_FORMAT_RGB888;
+}
  
  bool dpu_encoder_is_widebus_enabled(const struct drm_encoder *drm_enc)

  {
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
index 7b4afa71f1f96..62255d0aa4487 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
@@ -162,6 +162,14 @@ int dpu_encoder_get_vsync_count(struct drm_encoder 
*drm_enc);
   */
  bool dpu_encoder_is_widebus_enabled(const struct drm_encoder *drm_enc);
  
+/**

+ * dpu_encoder_get_drm_fmt - return DRM fourcc format
+ * @drm_enc:Pointer to previously created drm encoder structure
+ * @mode:  Corresponding drm_display_mode for dpu encoder
+ */
+u32 dpu_encoder_get_drm_fmt(const struct drm_encoder *drm_enc,
+   const struct drm_display_mode *mode);
+
  /**
   * dpu_encoder_get_crc_values_cnt - get number of physical encoders contained
   *in virtual encoder that can collect CRC values
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
index e284bf448bdda..a1dde0ff35dc8 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
@@ -234,6 +234,7 @@ static void dpu_encoder_phys_vid_setup_timing_engine(
  {
struct drm_display_mode mode;
struct dpu_hw_intf_timing_params timing_params = { 0 };
+   struct dpu_hw_cdm *hw_cdm;
const struct dpu_format *fmt = NULL;
u32 fmt_fourcc = DRM_FORMAT_RGB888;
unsigned long lock_flags;
@@ -254,17 +255,26 @@ static void dpu_encoder_phys_vid_setup_timing_engine(
DPU_DEBUG_VIDENC(phys_enc, "enabling mode:\n");
drm_mode_debug_printmodeline();
  
-	if (phys_enc->split_role != ENC_ROLE_SOLO) {

+   hw_cdm = phys_enc->hw_cdm;
+   if (hw_cdm) {
+   intf_cfg.cdm = hw_cdm->idx;
+   fmt_fourcc = dpu_encoder_get_drm_fmt(phys_enc->parent, &mode);
+   }
+
+   if (phys_enc->split_role != ENC_ROLE_SOLO ||
+   dpu_encoder_get_drm_fmt(phys_enc->parent, &mode) == 
DRM_FORMAT_YUV420) {
mode.hdisplay >>= 1;
mode.htotal >>= 1;
mode.hsync_start >>= 1;
mode.hsync_end >>= 1;
+   mode.hskew >>= 1;


Separate patch.

  
  		DPU_DEBUG_VIDENC(phys_enc,

-   "split_role %d, halve horizontal %d %d %d %d\n",
+   "split_role %d, halve horizontal %d %d %d %d %d\n",
phys_enc->split_role,
mode.hdisplay, mode.htotal,
-   mode.hsync_start, mode.hsync_end);
+   mode.hsync_start, mode.hsync_end,
+   mode.hskew);
}
  
	drm_mode_to_intf_timing_params(phys_enc, &mode, &timing_params);

@@ -412,8 +422,15 @@ static int dpu_encoder_phys_vid_control_vblank_irq(
  static void dpu_encoder_phys_vid_enable(struct dpu_encoder_phys *phys_enc)
  {
struct dpu_hw_ctl *ctl;
+   struct dpu_hw_cdm *hw_cdm;
+   const struct dpu_format *fmt = NULL;
+   u32 fmt_fourcc = DRM_FORMAT_RGB888;
  
  	ctl = phys_enc->hw_ctl;

+   hw_cdm = phys_enc->hw_cdm;
+   if (hw_cdm)
+   fmt_fourcc = dpu_encoder_get_drm_fmt(phys_enc->parent, 
&phys_enc->cached_mode);
+   fmt = 

Re: [PATCH] nouveau: rip out fence irq allow/block sequences.

2024-01-25 Thread Dave Airlie
On Fri, 26 Jan 2024 at 04:28, Daniel Vetter  wrote:
>
> On Tue, Jan 23, 2024 at 05:25:38PM +1000, Dave Airlie wrote:
> > From: Dave Airlie 
> >
> > fences are signalled on nvidia hw using non-stall interrupts.
> >
> > non-stall interrupts are not latched from my reading.
> >
> > When nouveau emits a fence, it requests a NON_STALL signalling,
> > but it only calls the interface to allow the non-stall irq to happen
> > after it has already emitted the fence. A recent change
> > eacabb546271 ("nouveau: push event block/allowing out of the fence context")
> > made this worse by pushing out the fence allow/block to a workqueue.
> >
> > However I can't see how this could ever work great, since when
> > enable signalling is called, the semaphore has already been emitted
> > to the ring, and the hw could already have tried to set the bits,
> > but it's been masked off. Changing the allowed mask later won't make
> > the interrupt get called again.
> >
> > For now rip all of this out.
> >
> > This fixes a bunch of stalls seen running VK CTS sync tests.
> >
> > Signed-off-by: Dave Airlie 
> > ---
> >  drivers/gpu/drm/nouveau/nouveau_fence.c | 77 +
> >  drivers/gpu/drm/nouveau/nouveau_fence.h |  2 -
> >  2 files changed, 16 insertions(+), 63 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
> > b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > index 5057d976fa57..d6d50cdccf75 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> > @@ -50,24 +50,14 @@ nouveau_fctx(struct nouveau_fence *fence)
> >   return container_of(fence->base.lock, struct nouveau_fence_chan, 
> > lock);
> >  }
> >
> > -static int
> > +static void
> >  nouveau_fence_signal(struct nouveau_fence *fence)
> >  {
> > - int drop = 0;
> > -
> >   dma_fence_signal_locked(&fence->base);
> >   list_del(&fence->head);
> >   rcu_assign_pointer(fence->channel, NULL);
> >
> > - if (test_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags)) {
> > - struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
> > -
> > - if (atomic_dec_and_test(&fctx->notify_ref))
> > - drop = 1;
> > - }
> > -
> >   dma_fence_put(&fence->base);
> > - return drop;
> >  }
> >
> >  static struct nouveau_fence *
> > @@ -93,8 +83,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan 
> > *fctx, int error)
> >   if (error)
> >   dma_fence_set_error(&fence->base, error);
> >
> > - if (nouveau_fence_signal(fence))
> > - nvif_event_block(&fctx->event);
> > + nouveau_fence_signal(fence);
> >   }
> >   fctx->killed = 1;
> >   spin_unlock_irqrestore(&fctx->lock, flags);
> > @@ -103,8 +92,8 @@ nouveau_fence_context_kill(struct nouveau_fence_chan 
> > *fctx, int error)
> >  void
> >  nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
> >  {
> > - cancel_work_sync(&fctx->allow_block_work);
> >   nouveau_fence_context_kill(fctx, 0);
> > + nvif_event_block(&fctx->event);
> >   nvif_event_dtor(&fctx->event);
> >   fctx->dead = 1;
> >
> > @@ -127,11 +116,10 @@ nouveau_fence_context_free(struct nouveau_fence_chan 
> > *fctx)
> >   kref_put(&fctx->fence_ref, nouveau_fence_context_put);
> >  }
> >
> > -static int
> > +static void
> >  nouveau_fence_update(struct nouveau_channel *chan, struct 
> > nouveau_fence_chan *fctx)
> >  {
> >   struct nouveau_fence *fence;
> > - int drop = 0;
> >   u32 seq = fctx->read(chan);
> >
> >   while (!list_empty(&fctx->pending)) {
> > @@ -140,10 +128,8 @@ nouveau_fence_update(struct nouveau_channel *chan, 
> > struct nouveau_fence_chan *fc
> >   if ((int)(seq - fence->base.seqno) < 0)
> >   break;
> >
> > - drop |= nouveau_fence_signal(fence);
> > + nouveau_fence_signal(fence);
> >   }
> > -
> > - return drop;
> >  }
> >
> >  static int
> > @@ -160,26 +146,13 @@ nouveau_fence_wait_uevent_handler(struct nvif_event 
> > *event, void *repv, u32 repc
> >
> >   fence = list_entry(fctx->pending.next, typeof(*fence), head);
> >   chan = rcu_dereference_protected(fence->channel, 
> > lockdep_is_held(&fctx->lock));
> > - if (nouveau_fence_update(chan, fctx))
> > - ret = NVIF_EVENT_DROP;
> > + nouveau_fence_update(chan, fctx);
> >   }
> >   spin_unlock_irqrestore(&fctx->lock, flags);
> >
> >   return ret;
> >  }
> >
> > -static void
> > -nouveau_fence_work_allow_block(struct work_struct *work)
> > -{
> > - struct nouveau_fence_chan *fctx = container_of(work, struct 
> > nouveau_fence_chan,
> > -allow_block_work);
> > -
> > - if (atomic_read(&fctx->notify_ref) == 0)
> > - nvif_event_block(&fctx->event);
> > - else
> > - nvif_event_allow(&fctx->event);
> > -}
> > -
> >  void
> >  nouveau_fence_context_new(struct nouveau_channel *chan, struct 
> > 

Re: [PATCH v19 22/30] drm/shmem-helper: Add common memory shrinker

2024-01-25 Thread Dmitry Osipenko
On 1/25/24 13:19, Boris Brezillon wrote:
> On Fri,  5 Jan 2024 21:46:16 +0300
> Dmitry Osipenko  wrote:
> 
>> +static bool drm_gem_shmem_is_evictable(struct drm_gem_shmem_object *shmem)
>> +{
>> +return (shmem->madv >= 0) && shmem->base.funcs->evict &&
>> +refcount_read(>pages_use_count) &&
>> +!refcount_read(>pages_pin_count) &&
>> +!shmem->base.dma_buf && !shmem->base.import_attach &&
>> +!shmem->evicted;
> 
> Are we missing
> 
> && dma_resv_test_signaled(shmem->base.resv,
> DMA_RESV_USAGE_BOOKKEEP)
> 
> to make sure the GPU is done using the BO?
> The same applies to drm_gem_shmem_is_purgeable() BTW.
> 
> If you don't want to do this test here, we need a way to let drivers
> provide a custom is_{evictable,purgeable}() test.
> 
> I guess we should also expose drm_gem_shmem_shrinker_update_lru_locked()
> to let drivers move the GEMs that were used most recently (those
> referenced by a GPU job) at the end of the evictable LRU.

We have the signaled-check in the common drm_gem_evict() helper:

https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/gpu/drm/drm_gem.c#L1496
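
i.e. the eviction path already bails out when the reservation object is
not idle, roughly this shape (paraphrased with the usage level from your
suggestion; see the link for the exact code and the exact dma_resv usage
level it tests):

	if (!dma_resv_test_signaled(obj->resv, DMA_RESV_USAGE_BOOKKEEP))
		return -EBUSY;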

-- 
Best regards,
Dmitry



Re: [PATCH v19 09/30] drm/shmem-helper: Add and use lockless drm_gem_shmem_get_pages()

2024-01-25 Thread Dmitry Osipenko
On 1/25/24 20:24, Daniel Vetter wrote:
> On Fri, Jan 05, 2024 at 09:46:03PM +0300, Dmitry Osipenko wrote:
>> Add lockless drm_gem_shmem_get_pages() helper that skips taking reservation
>> lock if pages_use_count is non-zero, leveraging from atomicity of the
>> refcount_t. Make drm_gem_shmem_mmap() to utilize the new helper.
>>
>> Acked-by: Maxime Ripard 
>> Reviewed-by: Boris Brezillon 
>> Suggested-by: Boris Brezillon 
>> Signed-off-by: Dmitry Osipenko 
>> ---
>>  drivers/gpu/drm/drm_gem_shmem_helper.c | 19 +++
>>  1 file changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
>> b/drivers/gpu/drm/drm_gem_shmem_helper.c
>> index cacf0f8c42e2..1c032513abf1 100644
>> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
>> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
>> @@ -226,6 +226,20 @@ void drm_gem_shmem_put_pages_locked(struct 
>> drm_gem_shmem_object *shmem)
>>  }
>>  EXPORT_SYMBOL_GPL(drm_gem_shmem_put_pages_locked);
>>  
>> +static int drm_gem_shmem_get_pages(struct drm_gem_shmem_object *shmem)
>> +{
>> +int ret;
> 
> Just random drive-by comment: a might_lock annotation here might be good,
> or people could hit some really interesting bugs that are rather hard to
> reproduce ...
> -Sima

Thanks for the suggestion!
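
Something like this, I suppose (sketch only; the exact lockdep
expression for the resv lock might need adjusting):

	static int drm_gem_shmem_get_pages(struct drm_gem_shmem_object *shmem)
	{
		int ret;

		/* the fast path skips the resv lock, so tell lockdep we may take it */
		might_lock(&shmem->base.resv->lock.base);

		if (refcount_inc_not_zero(&shmem->pages_use_count))
			return 0;
	...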

-- 
Best regards,
Dmitry




Re: [PATCH 13/17] drm/msm/dp: enable SDP and SDE periph flush update

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

The DP controller can be set up to operate in either SDP update flush
mode or peripheral flush mode, based on the DP controller hardware
version.

Starting with DP v1.2, the hardware documents require the use of
peripheral flush mode for SDP packets such as PPS or VSC SDP packets.

In line with this guidance, let's program the DP controller to use
peripheral flush mode starting with DP v1.2.

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/dp/dp_catalog.c | 18 ++
  drivers/gpu/drm/msm/dp/dp_catalog.h |  1 +
  drivers/gpu/drm/msm/dp/dp_ctrl.c|  1 +
  drivers/gpu/drm/msm/dp/dp_reg.h |  2 ++
  4 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index 7e4c68be23e56..b43083b9c2df6 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -446,6 +446,24 @@ void dp_catalog_ctrl_config_misc(struct dp_catalog 
*dp_catalog,
dp_write_link(catalog, REG_DP_MISC1_MISC0, misc_val);
  }
  
+void dp_catalog_setup_peripheral_flush(struct dp_catalog *dp_catalog)

+{
+   u32 mainlink_ctrl;
+   u16 major = 0, minor = 0;
+   struct dp_catalog_private *catalog = container_of(dp_catalog,
+   struct dp_catalog_private, dp_catalog);
+
+   mainlink_ctrl = dp_read_link(catalog, REG_DP_MAINLINK_CTRL);
+
+   dp_catalog_hw_revision(dp_catalog, &major, &minor);
+   if (major >= 1 && minor >= 2)


if (major > 1 || (major == 1 && minor >= 2))

As a check, which of the values should be written for maj.min = 2.1?
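
Worked example: v2.1 gives major == 2, minor == 1, so the posted
"major >= 1 && minor >= 2" is false and the controller would fall back
to the SDP-update path, while the suggested

	if (major > 1 || (major == 1 && minor >= 2))

treats every revision from 1.2 onwards, including 2.1, as
peripheral-flush capable.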


+   mainlink_ctrl |= DP_MAINLINK_FLUSH_MODE_SDE_PERIPH_UPDATE;
+   else
+   mainlink_ctrl |= DP_MAINLINK_FLUSH_MODE_UPDATE_SDP;
+
+   dp_write_link(catalog, REG_DP_MAINLINK_CTRL, mainlink_ctrl);
+}
+
  void dp_catalog_ctrl_config_msa(struct dp_catalog *dp_catalog,
u32 rate, u32 stream_rate_khz,
bool fixed_nvid, bool is_ycbcr_420)
diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.h 
b/drivers/gpu/drm/msm/dp/dp_catalog.h
index 6b757249c0698..1d57988aa6689 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.h
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.h
@@ -169,6 +169,7 @@ void dp_catalog_ctrl_config_ctrl(struct dp_catalog 
*dp_catalog, u32 config);
  void dp_catalog_ctrl_lane_mapping(struct dp_catalog *dp_catalog);
  void dp_catalog_ctrl_mainlink_ctrl(struct dp_catalog *dp_catalog, bool 
enable);
  void dp_catalog_ctrl_psr_mainlink_enable(struct dp_catalog *dp_catalog, bool 
enable);
+void dp_catalog_setup_peripheral_flush(struct dp_catalog *dp_catalog);
  void dp_catalog_ctrl_config_misc(struct dp_catalog *dp_catalog, u32 cc, u32 
tb);
  void dp_catalog_ctrl_config_msa(struct dp_catalog *dp_catalog, u32 rate,
u32 stream_rate_khz, bool fixed_nvid, bool 
is_ycbcr_420);
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index ddd92a63d5a67..c375b36f53ce1 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -170,6 +170,7 @@ static void dp_ctrl_configure_source_params(struct 
dp_ctrl_private *ctrl)
  
  	dp_catalog_ctrl_lane_mapping(ctrl->catalog);

dp_catalog_ctrl_mainlink_ctrl(ctrl->catalog, true);
+   dp_catalog_setup_peripheral_flush(ctrl->catalog);
  
  	dp_ctrl_config_ctrl(ctrl);
  
diff --git a/drivers/gpu/drm/msm/dp/dp_reg.h b/drivers/gpu/drm/msm/dp/dp_reg.h

index 756ddf85b1e81..05a1009d2f678 100644
--- a/drivers/gpu/drm/msm/dp/dp_reg.h
+++ b/drivers/gpu/drm/msm/dp/dp_reg.h
@@ -102,6 +102,8 @@
  #define DP_MAINLINK_CTRL_ENABLE   (0x0001)
  #define DP_MAINLINK_CTRL_RESET(0x0002)
  #define DP_MAINLINK_CTRL_SW_BYPASS_SCRAMBLER  (0x0010)
+#define DP_MAINLINK_FLUSH_MODE_UPDATE_SDP  (0x0080)
+#define DP_MAINLINK_FLUSH_MODE_SDE_PERIPH_UPDATE   (0x0180)
  #define DP_MAINLINK_FB_BOUNDARY_SEL   (0x0200)
  
  #define REG_DP_STATE_CTRL			(0x0004)


--
With best wishes
Dmitry



Re: [PATCH 12/17] drm/msm/dpu: add support of new peripheral flush mechanism

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

From: Kuogee Hsieh 

Introduce a peripheral flushing mechanism to decouple peripheral
metadata flushing from the timing-engine-related flush.

Signed-off-by: Kuogee Hsieh 
Signed-off-by: Paloma Arellano 
---
  .../drm/msm/disp/dpu1/dpu_encoder_phys_vid.c|  3 +++
  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c  | 17 +
  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h  | 10 ++
  3 files changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
index d0f56c5c4cce9..e284bf448bdda 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
@@ -437,6 +437,9 @@ static void dpu_encoder_phys_vid_enable(struct 
dpu_encoder_phys *phys_enc)
if (ctl->ops.update_pending_flush_merge_3d && phys_enc->hw_pp->merge_3d)
ctl->ops.update_pending_flush_merge_3d(ctl, 
phys_enc->hw_pp->merge_3d->idx);
  
+	if (ctl->ops.update_pending_flush_periph && phys_enc->hw_intf->cap->type == INTF_DP)

+   ctl->ops.update_pending_flush_periph(ctl, 
phys_enc->hw_intf->idx);
+
  skip_flush:
DPU_DEBUG_VIDENC(phys_enc,
"update pending flush ctl %d intf %d\n",
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
index e76565c3e6a43..bf45afeb616d3 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
@@ -39,6 +39,7 @@
  #define   CTL_WB_FLUSH  0x108
  #define   CTL_INTF_FLUSH0x110
  #define   CTL_CDM_FLUSH0x114
+#define   CTL_PERIPH_FLUSH  0x128
  #define   CTL_INTF_MASTER   0x134
  #define   CTL_DSPP_n_FLUSH(n)   ((0x13C) + ((n) * 4))
  
@@ -49,6 +50,7 @@

  #define  MERGE_3D_IDX   23
  #define  DSC_IDX22
  #define CDM_IDX 26
+#define  PERIPH_IDX 30
  #define  INTF_IDX   31
  #define WB_IDX  16
  #define  DSPP_IDX   29  /* From DPU hw rev 7.x.x */
@@ -151,6 +153,10 @@ static inline void dpu_hw_ctl_trigger_flush_v1(struct 
dpu_hw_ctl *ctx)
ctx->pending_dspp_flush_mask[dspp - DSPP_0]);
}
  
+	if (ctx->pending_flush_mask & BIT(PERIPH_IDX))

+   DPU_REG_WRITE(&ctx->hw, CTL_PERIPH_FLUSH,
+ ctx->pending_periph_flush_mask);
+
if (ctx->pending_flush_mask & BIT(DSC_IDX))
DPU_REG_WRITE(>hw, CTL_DSC_FLUSH,
  ctx->pending_dsc_flush_mask);
@@ -311,6 +317,13 @@ static void dpu_hw_ctl_update_pending_flush_intf_v1(struct 
dpu_hw_ctl *ctx,
ctx->pending_flush_mask |= BIT(INTF_IDX);
  }
  
+static void dpu_hw_ctl_update_pending_flush_periph(struct dpu_hw_ctl *ctx,

+   enum dpu_intf intf)


I assume this is _v1.
Also the argument is misaligned.


+{
+   ctx->pending_periph_flush_mask |= BIT(intf - INTF_0);
+   ctx->pending_flush_mask |= BIT(PERIPH_IDX);
+}
+
  static void dpu_hw_ctl_update_pending_flush_merge_3d_v1(struct dpu_hw_ctl 
*ctx,
enum dpu_merge_3d merge_3d)
  {
@@ -680,6 +693,10 @@ static void _setup_ctl_ops(struct dpu_hw_ctl_ops *ops,
ops->reset_intf_cfg = dpu_hw_ctl_reset_intf_cfg_v1;
ops->update_pending_flush_intf =
dpu_hw_ctl_update_pending_flush_intf_v1;
+
+   ops->update_pending_flush_periph =
+   dpu_hw_ctl_update_pending_flush_periph;
+
ops->update_pending_flush_merge_3d =
dpu_hw_ctl_update_pending_flush_merge_3d_v1;
ops->update_pending_flush_wb = 
dpu_hw_ctl_update_pending_flush_wb_v1;


What about the pre-active platforms?


diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h
index ff85b5ee0acf8..5d86c560b6d3f 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h
@@ -122,6 +122,15 @@ struct dpu_hw_ctl_ops {
void (*update_pending_flush_intf)(struct dpu_hw_ctl *ctx,
enum dpu_intf blk);
  
+	/**

+* OR in the given flushbits to the cached pending_(periph_)flush_mask
+* No effect on hardware
+* @ctx   : ctl path ctx pointer
+* @blk   : interface block index
+*/
+   void (*update_pending_flush_periph)(struct dpu_hw_ctl *ctx,
+   enum dpu_intf blk);
+
/**
 * OR in the given flushbits to the cached pending_(merge_3d_)flush_mask
 * No effect on hardware
@@ -264,6 +273,7 @@ struct dpu_hw_ctl {
u32 pending_flush_mask;
u32 pending_intf_flush_mask;
u32 pending_wb_flush_mask;
+   u32 pending_periph_flush_mask;
u32 pending_merge_3d_flush_mask;
u32 pending_dspp_flush_mask[DSPP_MAX - DSPP_0];

Re: [PATCH 11/17] drm/msm/dp: add VSC SDP support for YUV420 over DP

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

Add support to pack and send the VSC SDP packet for DP. This therefore
allows the transmission of format information to the sinks, which is
needed for YUV420 support over DP.

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/dp/dp_catalog.c | 147 
  drivers/gpu/drm/msm/dp/dp_catalog.h |   4 +
  drivers/gpu/drm/msm/dp/dp_ctrl.c|   4 +
  drivers/gpu/drm/msm/dp/dp_panel.c   |  47 +
  drivers/gpu/drm/msm/dp/dp_reg.h |   3 +
  5 files changed, 205 insertions(+)

diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index c025786170ba5..7e4c68be23e56 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -29,6 +29,9 @@
  
  #define DP_INTF_CONFIG_DATABUS_WIDEN BIT(4)
  
+#define DP_GENERIC0_6_YUV_8_BPC		BIT(0)

+#define DP_GENERIC0_6_YUV_10_BPC   BIT(1)
+
  #define DP_INTERRUPT_STATUS1 \
(DP_INTR_AUX_XFER_DONE| \
DP_INTR_WRONG_ADDR | DP_INTR_TIMEOUT | \
@@ -907,6 +910,150 @@ int dp_catalog_panel_timing_cfg(struct dp_catalog 
*dp_catalog)
return 0;
  }
  
+static void dp_catalog_panel_setup_vsc_sdp(struct dp_catalog *dp_catalog)

+{
+   struct dp_catalog_private *catalog;
+   u32 header, parity, data;
+   u8 bpc, off = 0;
+   u8 buf[SZ_128];
+
+   if (!dp_catalog) {
+   pr_err("invalid input\n");
+   return;
+   }
+
+   catalog = container_of(dp_catalog, struct dp_catalog_private, 
dp_catalog);
+
+   /* HEADER BYTE 1 */
+   header = dp_catalog->sdp.sdp_header.HB1;
+   parity = dp_catalog_calculate_parity(header);
+   data   = ((header << HEADER_BYTE_1_BIT) | (parity << 
PARITY_BYTE_1_BIT));
+   dp_write_link(catalog, MMSS_DP_GENERIC0_0, data);
+   memcpy(buf + off, &data, sizeof(data));
+   off += sizeof(data);
+
+   /* HEADER BYTE 2 */
+   header = dp_catalog->sdp.sdp_header.HB2;
+   parity = dp_catalog_calculate_parity(header);
+   data   = ((header << HEADER_BYTE_2_BIT) | (parity << 
PARITY_BYTE_2_BIT));
+   dp_write_link(catalog, MMSS_DP_GENERIC0_1, data);
+
+   /* HEADER BYTE 3 */
+   header = dp_catalog->sdp.sdp_header.HB3;
+   parity = dp_catalog_calculate_parity(header);
+   data   = ((header << HEADER_BYTE_3_BIT) | (parity << 
PARITY_BYTE_3_BIT));
+   data |= dp_read_link(catalog, MMSS_DP_GENERIC0_1);
+   dp_write_link(catalog, MMSS_DP_GENERIC0_1, data);
+   memcpy(buf + off, &data, sizeof(data));
+   off += sizeof(data);


This seems to be common with the dp_audio code. Please extract this 
header writing too.



+
+   data = 0;
+   dp_write_link(catalog, MMSS_DP_GENERIC0_2, data);
+   memcpy(buf + off, &data, sizeof(data));
+   off += sizeof(data);


Generally this is not how these functions are expected to be written. 
Please take a look at drivers/video/hdmi.c. It should be split into:

- generic function that packs the C structure into a flat byte buffer,
- driver-specific function that formats and writes the buffer to the 
hardware.



+   dp_write_link(catalog, MMSS_DP_GENERIC0_3, data);
+   memcpy(buf + off, &data, sizeof(data));
+   off += sizeof(data);
+
+   dp_write_link(catalog, MMSS_DP_GENERIC0_4, data);
+   memcpy(buf + off, &data, sizeof(data));
+   off += sizeof(data);
+
+   dp_write_link(catalog, MMSS_DP_GENERIC0_5, data);
+   memcpy(buf + off, &data, sizeof(data));
+   off += sizeof(data);
+
+   switch (dp_catalog->vsc_sdp_data.bpc) {
+   case 10:
+   bpc = DP_GENERIC0_6_YUV_10_BPC;
+   break;
+   case 8:
+   default:
+   bpc = DP_GENERIC0_6_YUV_8_BPC;
+   break;
+   }
+
+   /* VSC SDP payload as per table 2-117 of DP 1.4 specification */
+   data = (dp_catalog->vsc_sdp_data.colorimetry & 0xF) |
+  ((dp_catalog->vsc_sdp_data.pixelformat & 0xF) << 4) |
+  (bpc << 8) |
+  ((dp_catalog->vsc_sdp_data.dynamic_range & 0x1) << 15) |
+  ((dp_catalog->vsc_sdp_data.content_type & 0x7) << 16);
+
+   dp_write_link(catalog, MMSS_DP_GENERIC0_6, data);
+   memcpy(buf + off, &data, sizeof(data));
+   off += sizeof(data);
+
+   data = 0;
+   dp_write_link(catalog, MMSS_DP_GENERIC0_7, data);
+   memcpy(buf + off, &data, sizeof(data));
+   off += sizeof(data);
+
+   dp_write_link(catalog, MMSS_DP_GENERIC0_8, data);
+   memcpy(buf + off, &data, sizeof(data));
+   off += sizeof(data);
+
+   dp_write_link(catalog, MMSS_DP_GENERIC0_9, data);
+   memcpy(buf + off, &data, sizeof(data));
+   off += sizeof(data);
+
+   print_hex_dump(KERN_DEBUG, "[drm-dp] VSC: ", DUMP_PREFIX_NONE, 16, 4, 
buf, off, false);
+}
+
+void dp_catalog_panel_config_vsc_sdp(struct dp_catalog *dp_catalog, bool en)
+{
+   struct dp_catalog_private *catalog;
+   u32 cfg, cfg2, misc;
+   u16 major = 0, minor = 0;
+
+   
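
To make the suggested split concrete, something like this (untested
sketch; all names here are made up, the payload offsets follow table
2-117 of the DP 1.4 specification as used in the patch):

#include <linux/string.h>
#include <linux/types.h>

/* Hypothetical C view of the VSC SDP payload fields. */
struct example_vsc_sdp {
	u8 pixelformat;		/* 0x3 = YCbCr 4:2:0 */
	u8 colorimetry;
	u8 bpc;			/* 8 or 10 */
	u8 dynamic_range;	/* 0 = VESA, 1 = CTA */
	u8 content_type;
};

/* Pure packing step: fill a flat buffer, no register access at all. */
static void example_vsc_sdp_pack(const struct example_vsc_sdp *vsc, u8 db[32])
{
	memset(db, 0, 32);

	/* DB16: pixel encoding [7:4], colorimetry format [3:0] */
	db[16] = ((vsc->pixelformat & 0xf) << 4) | (vsc->colorimetry & 0xf);

	/* DB17: bit depth [2:0] (0x1 = 8 bpc, 0x2 = 10 bpc), dynamic range [7] */
	db[17] = (vsc->bpc == 10 ? 0x2 : 0x1) | ((vsc->dynamic_range & 0x1) << 7);

	/* DB18: content type [2:0] */
	db[18] = vsc->content_type & 0x7;
}

The driver-specific half then only loops over the packed buffer and
writes it to the GENERIC0 registers, which also removes the ad-hoc
memcpy bookkeeping above.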

Re: [PATCH v19 17/30] drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr()

2024-01-25 Thread Dmitry Osipenko
On 1/26/24 00:41, Dmitry Osipenko wrote:
> On 1/5/24 21:46, Dmitry Osipenko wrote:
>>  for (i = page_offset; i < page_offset + NUM_FAULT_PAGES; i++) {
>> +/* Can happen if the last fault only partially filled this
>> + * section of the pages array before failing. In that case
>> + * we skip already filled pages.
>> + */
>> +if (pages[i])
>> +continue;
>> +
>>  pages[i] = shmem_read_mapping_page(mapping, i);
> 
> Although, the shmem_read_mapping_page() should return the same page if it
> was already allocated, shouldn't it? I.e. there was no bug here and the
> fixes/stable tags not needed.

Scratch that, I forgot that the patch is about the unbalanced
get/put_pages

-- 
Best regards,
Dmitry



Re: [PATCH v19 17/30] drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr()

2024-01-25 Thread Dmitry Osipenko
On 1/5/24 21:46, Dmitry Osipenko wrote:
>   for (i = page_offset; i < page_offset + NUM_FAULT_PAGES; i++) {
> + /* Can happen if the last fault only partially filled this
> +  * section of the pages array before failing. In that case
> +  * we skip already filled pages.
> +  */
> + if (pages[i])
> + continue;
> +
>   pages[i] = shmem_read_mapping_page(mapping, i);

Although, the shmem_read_mapping_page() should return the same page if it
was already allocated, shouldn't it? I.e. there was no bug here and the
fixes/stable tags not needed.

-- 
Best regards,
Dmitry




Re: [PATCH 09/17] drm/msm/dp: move parity calculation to dp_catalog

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

Parity calculation is necessary for VSC SDP implementation, therefore
move it to dp_catalog so it usable by both SDP programming and
dp_audio.c

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/dp/dp_audio.c   | 100 
  drivers/gpu/drm/msm/dp/dp_catalog.h |  72 
  2 files changed, 86 insertions(+), 86 deletions(-)


There is nothing catalog-uish in the parity calculation. Just add 
dp_utils.c. Another option is to push it to drm/display/.


LGTM otherwise.



diff --git a/drivers/gpu/drm/msm/dp/dp_audio.c 
b/drivers/gpu/drm/msm/dp/dp_audio.c
index 4a2e479723a85..7aa785018155a 100644
--- a/drivers/gpu/drm/msm/dp/dp_audio.c
+++ b/drivers/gpu/drm/msm/dp/dp_audio.c
@@ -16,13 +16,6 @@
  #include "dp_panel.h"
  #include "dp_display.h"
  
-#define HEADER_BYTE_2_BIT	 0

-#define PARITY_BYTE_2_BIT   8
-#define HEADER_BYTE_1_BIT  16
-#define PARITY_BYTE_1_BIT  24
-#define HEADER_BYTE_3_BIT  16
-#define PARITY_BYTE_3_BIT  24
-
  struct dp_audio_private {
struct platform_device *audio_pdev;
struct platform_device *pdev;
@@ -36,71 +29,6 @@ struct dp_audio_private {
struct dp_audio dp_audio;
  };
  
-static u8 dp_audio_get_g0_value(u8 data)

-{
-   u8 c[4];
-   u8 g[4];
-   u8 ret_data = 0;
-   u8 i;
-
-   for (i = 0; i < 4; i++)
-   c[i] = (data >> i) & 0x01;
-
-   g[0] = c[3];
-   g[1] = c[0] ^ c[3];
-   g[2] = c[1];
-   g[3] = c[2];
-
-   for (i = 0; i < 4; i++)
-   ret_data = ((g[i] & 0x01) << i) | ret_data;
-
-   return ret_data;
-}
-
-static u8 dp_audio_get_g1_value(u8 data)
-{
-   u8 c[4];
-   u8 g[4];
-   u8 ret_data = 0;
-   u8 i;
-
-   for (i = 0; i < 4; i++)
-   c[i] = (data >> i) & 0x01;
-
-   g[0] = c[0] ^ c[3];
-   g[1] = c[0] ^ c[1] ^ c[3];
-   g[2] = c[1] ^ c[2];
-   g[3] = c[2] ^ c[3];
-
-   for (i = 0; i < 4; i++)
-   ret_data = ((g[i] & 0x01) << i) | ret_data;
-
-   return ret_data;
-}
-
-static u8 dp_audio_calculate_parity(u32 data)
-{
-   u8 x0 = 0;
-   u8 x1 = 0;
-   u8 ci = 0;
-   u8 iData = 0;
-   u8 i = 0;
-   u8 parity_byte;
-   u8 num_byte = (data & 0xFF00) > 0 ? 8 : 2;
-
-   for (i = 0; i < num_byte; i++) {
-   iData = (data >> i*4) & 0xF;
-
-   ci = iData ^ x1;
-   x1 = x0 ^ dp_audio_get_g1_value(ci);
-   x0 = dp_audio_get_g0_value(ci);
-   }
-
-   parity_byte = x1 | (x0 << 4);
-
-   return parity_byte;
-}
-
  static u32 dp_audio_get_header(struct dp_catalog *catalog,
enum dp_catalog_audio_sdp_type sdp,
enum dp_catalog_audio_header_type header)
@@ -134,7 +62,7 @@ static void dp_audio_stream_sdp(struct dp_audio_private 
*audio)
DP_AUDIO_SDP_STREAM, DP_AUDIO_SDP_HEADER_1);
  
  	new_value = 0x02;

-   parity_byte = dp_audio_calculate_parity(new_value);
+   parity_byte = dp_catalog_calculate_parity(new_value);
value |= ((new_value << HEADER_BYTE_1_BIT)
| (parity_byte << PARITY_BYTE_1_BIT));
drm_dbg_dp(audio->drm_dev,
@@ -147,7 +75,7 @@ static void dp_audio_stream_sdp(struct dp_audio_private 
*audio)
value = dp_audio_get_header(catalog,
DP_AUDIO_SDP_STREAM, DP_AUDIO_SDP_HEADER_2);
new_value = value;
-   parity_byte = dp_audio_calculate_parity(new_value);
+   parity_byte = dp_catalog_calculate_parity(new_value);
value |= ((new_value << HEADER_BYTE_2_BIT)
| (parity_byte << PARITY_BYTE_2_BIT));
drm_dbg_dp(audio->drm_dev,
@@ -162,7 +90,7 @@ static void dp_audio_stream_sdp(struct dp_audio_private 
*audio)
DP_AUDIO_SDP_STREAM, DP_AUDIO_SDP_HEADER_3);
  
  	new_value = audio->channels - 1;

-   parity_byte = dp_audio_calculate_parity(new_value);
+   parity_byte = dp_catalog_calculate_parity(new_value);
value |= ((new_value << HEADER_BYTE_3_BIT)
| (parity_byte << PARITY_BYTE_3_BIT));
drm_dbg_dp(audio->drm_dev,
@@ -184,7 +112,7 @@ static void dp_audio_timestamp_sdp(struct dp_audio_private 
*audio)
DP_AUDIO_SDP_TIMESTAMP, DP_AUDIO_SDP_HEADER_1);
  
  	new_value = 0x1;

-   parity_byte = dp_audio_calculate_parity(new_value);
+   parity_byte = dp_catalog_calculate_parity(new_value);
value |= ((new_value << HEADER_BYTE_1_BIT)
| (parity_byte << PARITY_BYTE_1_BIT));
drm_dbg_dp(audio->drm_dev,
@@ -198,7 +126,7 @@ static void dp_audio_timestamp_sdp(struct dp_audio_private 
*audio)
DP_AUDIO_SDP_TIMESTAMP, DP_AUDIO_SDP_HEADER_2);
  
  	new_value = 0x17;

-   parity_byte = dp_audio_calculate_parity(new_value);
+   parity_byte = dp_catalog_calculate_parity(new_value);
value 
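
Something along these lines for the extracted helper (untested sketch;
the dp_utils name follows the suggestion above, everything else is an
assumption):

/* drivers/gpu/drm/msm/dp/dp_utils.h -- sketch only */
#ifndef _DP_UTILS_H_
#define _DP_UTILS_H_

#include <linux/types.h>

#define HEADER_BYTE_2_BIT	 0
#define PARITY_BYTE_2_BIT	 8
#define HEADER_BYTE_1_BIT	16
#define PARITY_BYTE_1_BIT	24
#define HEADER_BYTE_3_BIT	16
#define PARITY_BYTE_3_BIT	24

/* BCH parity over a 2- or 8-nibble header word, as removed above. */
u8 dp_utils_calculate_parity(u32 data);

#endif /* _DP_UTILS_H_ */

dp_audio.c and the VSC SDP code would then share one copy of the g0/g1
tables and the header bit offsets instead of duplicating them.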

Re: [PATCH 08/17] drm/msm/dp: change YUV420 related programming for DP

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

Change all relevant DP controller related programming for YUV420 cases.
Namely, change the pixel clock math to consider YUV420, program the
configuration control to indicate YUV420, as well as modify the MVID
programming to consider YUV420.


Too many items for a single commit.



Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/dp/dp_catalog.c |  5 -
  drivers/gpu/drm/msm/dp/dp_catalog.h |  2 +-
  drivers/gpu/drm/msm/dp/dp_ctrl.c| 12 +---
  drivers/gpu/drm/msm/dp/dp_display.c |  8 +++-
  drivers/gpu/drm/msm/msm_kms.h   |  3 +++
  5 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index 5142aeb705a44..5d84c089e520a 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -442,7 +442,7 @@ void dp_catalog_ctrl_config_misc(struct dp_catalog 
*dp_catalog,
  
  void dp_catalog_ctrl_config_msa(struct dp_catalog *dp_catalog,

u32 rate, u32 stream_rate_khz,
-   bool fixed_nvid)
+   bool fixed_nvid, bool is_ycbcr_420)
  {
u32 pixel_m, pixel_n;
u32 mvid, nvid, pixel_div = 0, dispcc_input_rate;
@@ -485,6 +485,9 @@ void dp_catalog_ctrl_config_msa(struct dp_catalog 
*dp_catalog,
nvid = temp;
}
  
+	if (is_ycbcr_420)

+   mvid /= 2;
+
if (link_rate_hbr2 == rate)
nvid *= 2;
  
diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.h b/drivers/gpu/drm/msm/dp/dp_catalog.h

index 38786e855b51a..6cb5e2a243de2 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.h
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.h
@@ -96,7 +96,7 @@ void dp_catalog_ctrl_mainlink_ctrl(struct dp_catalog 
*dp_catalog, bool enable);
  void dp_catalog_ctrl_psr_mainlink_enable(struct dp_catalog *dp_catalog, bool 
enable);
  void dp_catalog_ctrl_config_misc(struct dp_catalog *dp_catalog, u32 cc, u32 
tb);
  void dp_catalog_ctrl_config_msa(struct dp_catalog *dp_catalog, u32 rate,
-   u32 stream_rate_khz, bool fixed_nvid);
+   u32 stream_rate_khz, bool fixed_nvid, bool 
is_ycbcr_420);
  int dp_catalog_ctrl_set_pattern_state_bit(struct dp_catalog *dp_catalog, u32 
pattern);
  u32 dp_catalog_hw_revision(const struct dp_catalog *dp_catalog);
  void dp_catalog_ctrl_reset(struct dp_catalog *dp_catalog);
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index 77a8d9366ed7b..209cf2a35642f 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -128,6 +128,9 @@ static void dp_ctrl_config_ctrl(struct dp_ctrl_private 
*ctrl)
/* Default-> LSCLK DIV: 1/4 LCLK  */
config |= (2 << DP_CONFIGURATION_CTRL_LSCLK_DIV_SHIFT);
  
+	if (ctrl->panel->dp_mode.out_fmt_is_yuv_420)

+   config |= DP_CONFIGURATION_CTRL_RGB_YUV; /* YUV420 */
+


This definitely is not related to clock rate calculations.


/* Scrambler reset enable */
if (drm_dp_alternate_scrambler_reset_cap(dpcd))
config |= DP_CONFIGURATION_CTRL_ASSR;
@@ -957,7 +960,7 @@ static void dp_ctrl_calc_tu_parameters(struct 
dp_ctrl_private *ctrl,
in.hporch = drm_mode->htotal - drm_mode->hdisplay;
in.nlanes = ctrl->link->link_params.num_lanes;
in.bpp = ctrl->panel->dp_mode.bpp;
-   in.pixel_enc = 444;
+   in.pixel_enc = ctrl->panel->dp_mode.out_fmt_is_yuv_420 ? 420 : 444;
in.dsc_en = 0;
in.async_en = 0;
in.fec_en = 0;
@@ -1763,6 +1766,8 @@ int dp_ctrl_on_link(struct dp_ctrl *dp_ctrl)
ctrl->link->link_params.rate = rate;
ctrl->link->link_params.num_lanes =
ctrl->panel->link_info.num_lanes;
+   if (ctrl->panel->dp_mode.out_fmt_is_yuv_420)
+   pixel_rate >>= 1;
}
  
  	drm_dbg_dp(ctrl->drm_dev, "rate=%d, num_lanes=%d, pixel_rate=%lu\n",

@@ -1878,7 +1883,7 @@ int dp_ctrl_on_stream(struct dp_ctrl *dp_ctrl, bool 
force_link_train)
  
  	pixel_rate = pixel_rate_orig = ctrl->panel->dp_mode.drm_mode.clock;
  
-	if (dp_ctrl->wide_bus_en)

+   if (dp_ctrl->wide_bus_en || ctrl->panel->dp_mode.out_fmt_is_yuv_420)
pixel_rate >>= 1;
  
  	drm_dbg_dp(ctrl->drm_dev, "rate=%d, num_lanes=%d, pixel_rate=%lu\n",

@@ -1917,7 +1922,8 @@ int dp_ctrl_on_stream(struct dp_ctrl *dp_ctrl, bool 
force_link_train)
  
  	dp_catalog_ctrl_config_msa(ctrl->catalog,

ctrl->link->link_params.rate,
-   pixel_rate_orig, dp_ctrl_use_fixed_nvid(ctrl));
+   pixel_rate_orig, dp_ctrl_use_fixed_nvid(ctrl),
+   ctrl->panel->dp_mode.out_fmt_is_yuv_420);
  
  	dp_ctrl_setup_tr_unit(ctrl);
  
diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c

index 
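
As a quick sanity check of the halving above, take 3840x2160@60 with
YCbCr 4:2:0 output, i.e. a 594000 kHz pixel clock (purely illustrative
numbers, not taken from the patch):

#include <linux/types.h>

/* Illustrative only: effective pixel rate for a 4K60 4:2:0 mode. */
static unsigned long example_effective_pixel_rate(void)
{
	unsigned long pixel_rate = 594000;	/* kHz, dp_mode.drm_mode.clock */
	bool out_fmt_is_yuv_420 = true;		/* two pixels per clock cycle */
	bool wide_bus_en = false;		/* excluded for 4:2:0 */

	if (wide_bus_en || out_fmt_is_yuv_420)
		pixel_rate >>= 1;		/* 297000 kHz */

	return pixel_rate;
}

The same factor of two is what the MVID division in
dp_catalog_ctrl_config_msa() accounts for on the MSA side.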

Re: [PATCH 07/17] drm/msm/dpu: disallow widebus en in INTF_CONFIG2 when DP is YUV420

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

INTF_CONFIG2 register cannot have widebus enabled when DP format is
YUV420. Therefore, program the INTF to send 1 ppc.


I think this is handled in the DP driver, where we disallow wide bus for 
YUV 4:2:0 modes.




Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
index 6bba531d6dc41..bfb93f02fe7c1 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
@@ -168,7 +168,9 @@ static void dpu_hw_intf_setup_timing_engine(struct 
dpu_hw_intf *ctx,
 * video timing. It is recommended to enable it for all cases, except
 * if compression is enabled in 1 pixel per clock mode
 */
-   if (p->wide_bus_en)
+   if (dp_intf && fmt->base.pixel_format == DRM_FORMAT_YUV420)
+   intf_cfg2 |= INTF_CFG2_DATA_HCTL_EN;
+   else if (p->wide_bus_en)
intf_cfg2 |= INTF_CFG2_DATABUS_WIDEN | INTF_CFG2_DATA_HCTL_EN;
  
  	data_width = p->width;


--
With best wishes
Dmitry



Re: [PATCH 06/17] drm/msm/dpu: move widebus logic to its own API

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

Widebus enablement is decided by the interfaces based on their specific
checks and that already happens with DSI/DP specific helpers. Let's
invoke these helpers from dpu_encoder_is_widebus_enabled() to make it
cleaner overall.

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 29 -
  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h |  4 +++
  2 files changed, 20 insertions(+), 13 deletions(-)



Reviewed-by: Dmitry Baryshkov 

--
With best wishes
Dmitry



Re: [PATCH 05/17] drm/msm/dp: add an API to indicate if sink supports VSC SDP

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

YUV420 format is supported only in the VSC SDP packet and not through
MSA. Hence add an API which indicates the sink support which can be used
by the rest of the DP programming.


This API ideally should go to drm/display/drm_dp_helper.c



Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/dp/dp_display.c |  3 ++-
  drivers/gpu/drm/msm/dp/dp_panel.c   | 35 +
  drivers/gpu/drm/msm/dp/dp_panel.h   |  1 +
  3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index ddac55f45a722..f6b3b6ca242f8 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1617,7 +1617,8 @@ void dp_bridge_mode_set(struct drm_bridge *drm_bridge,
!!(dp_display->dp_mode.drm_mode.flags & DRM_MODE_FLAG_NHSYNC);
  
  	dp_display->dp_mode.out_fmt_is_yuv_420 =

-   drm_mode_is_420_only(&dp->connector->display_info,
adjusted_mode);
+   drm_mode_is_420_only(&dp->connector->display_info, adjusted_mode)
&&
+   dp_panel_vsc_sdp_supported(dp_display->panel);
  
  	/* populate wide_bus_support to different layers */

dp_display->ctrl->wide_bus_en =
diff --git a/drivers/gpu/drm/msm/dp/dp_panel.c 
b/drivers/gpu/drm/msm/dp/dp_panel.c
index 127f6af995cd1..af7820b6d35ec 100644
--- a/drivers/gpu/drm/msm/dp/dp_panel.c
+++ b/drivers/gpu/drm/msm/dp/dp_panel.c
@@ -17,6 +17,9 @@ struct dp_panel_private {
struct dp_link *link;
struct dp_catalog *catalog;
bool panel_on;
+   bool vsc_supported;
+   u8 major;
+   u8 minor;
  };
  
  static void dp_panel_read_psr_cap(struct dp_panel_private *panel)

@@ -43,9 +46,10 @@ static void dp_panel_read_psr_cap(struct dp_panel_private 
*panel)
  static int dp_panel_read_dpcd(struct dp_panel *dp_panel)
  {
int rc;
+   ssize_t rlen;
struct dp_panel_private *panel;
struct dp_link_info *link_info;
-   u8 *dpcd, major, minor;
+   u8 *dpcd, rx_feature;
  
  	panel = container_of(dp_panel, struct dp_panel_private, dp_panel);

dpcd = dp_panel->dpcd;
@@ -53,10 +57,19 @@ static int dp_panel_read_dpcd(struct dp_panel *dp_panel)
if (rc)
return rc;
  
+	rlen = drm_dp_dpcd_read(panel->aux, DP_DPRX_FEATURE_ENUMERATION_LIST, &rx_feature, 1);

+   if (rlen != 1) {
+   panel->vsc_supported = false;
+   pr_debug("failed to read DP_DPRX_FEATURE_ENUMERATION_LIST\n");
+   } else {
+   panel->vsc_supported = !!(rx_feature & 
DP_VSC_SDP_EXT_FOR_COLORIMETRY_SUPPORTED);
+   pr_debug("vsc=%d\n", panel->vsc_supported);
+   }
+
link_info = &dp_panel->link_info;
link_info->revision = dpcd[DP_DPCD_REV];
-   major = (link_info->revision >> 4) & 0x0f;
-   minor = link_info->revision & 0x0f;
+   panel->major = (link_info->revision >> 4) & 0x0f;
+   panel->minor = link_info->revision & 0x0f;
  
  	link_info->rate = drm_dp_max_link_rate(dpcd);

link_info->num_lanes = drm_dp_max_lane_count(dpcd);
@@ -69,7 +82,7 @@ static int dp_panel_read_dpcd(struct dp_panel *dp_panel)
if (link_info->rate > dp_panel->max_dp_link_rate)
link_info->rate = dp_panel->max_dp_link_rate;
  
-	drm_dbg_dp(panel->drm_dev, "version: %d.%d\n", major, minor);

+   drm_dbg_dp(panel->drm_dev, "version: %d.%d\n", panel->major, 
panel->minor);
drm_dbg_dp(panel->drm_dev, "link_rate=%d\n", link_info->rate);
drm_dbg_dp(panel->drm_dev, "lane_count=%d\n", link_info->num_lanes);
  
@@ -280,6 +293,20 @@ void dp_panel_tpg_config(struct dp_panel *dp_panel, bool enable)

dp_catalog_panel_tpg_enable(catalog, &panel->dp_panel.dp_mode.drm_mode);
  }
  
+bool dp_panel_vsc_sdp_supported(struct dp_panel *dp_panel)

+{
+   struct dp_panel_private *panel;
+
+   if (!dp_panel) {
+   pr_err("invalid input\n");
+   return false;
+   }
+
+   panel = container_of(dp_panel, struct dp_panel_private, dp_panel);
+
+   return panel->major >= 1 && panel->minor >= 3 && panel->vsc_supported;
+}
+
  void dp_panel_dump_regs(struct dp_panel *dp_panel)
  {
struct dp_catalog *catalog;
diff --git a/drivers/gpu/drm/msm/dp/dp_panel.h 
b/drivers/gpu/drm/msm/dp/dp_panel.h
index 6ec68be9f2366..590eca5ce304b 100644
--- a/drivers/gpu/drm/msm/dp/dp_panel.h
+++ b/drivers/gpu/drm/msm/dp/dp_panel.h
@@ -66,6 +66,7 @@ int dp_panel_get_modes(struct dp_panel *dp_panel,
struct drm_connector *connector);
  void dp_panel_handle_sink_request(struct dp_panel *dp_panel);
  void dp_panel_tpg_config(struct dp_panel *dp_panel, bool enable);
+bool dp_panel_vsc_sdp_supported(struct dp_panel *dp_panel);
  
  /**

   * is_link_rate_valid() - validates the link rate


--
With best wishes
Dmitry
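
Picking up the drm_dp_helper.c suggestion, the sink check could be
shared by all drivers roughly like this (untested sketch; the DPCD
register and flag names exist in drm_dp.h, the helper itself and its
placement are assumptions):

#include <drm/display/drm_dp_helper.h>

/* Sketch of a generic sink-capability check for drm_dp_helper.c. */
bool drm_dp_vsc_sdp_supported(struct drm_dp_aux *aux,
			      const u8 dpcd[DP_RECEIVER_CAP_SIZE])
{
	u8 rx_feature;

	/* The VSC SDP extension for colorimetry requires DPCD r1.3+ */
	if (dpcd[DP_DPCD_REV] < DP_DPCD_REV_13)
		return false;

	if (drm_dp_dpcd_readb(aux, DP_DPRX_FEATURE_ENUMERATION_LIST,
			      &rx_feature) != 1)
		return false;

	return !!(rx_feature & DP_VSC_SDP_EXT_FOR_COLORIMETRY_SUPPORTED);
}

dp_panel would then only cache the result instead of open-coding the
DPCD revision and feature-list parsing.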



Re: [PATCH 04/17] drm/msm/dp: store mode YUV420 information to be used by rest of DP

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

Wide bus is not supported when the mode is YUV420 in DP. In preparation
for changing the DPU programming to reflect this, the value and
assignment location of wide_bus_en for the DP submodules must be
changed. Move it from boot time in dp_init_sub_modules() to run time in
dp_display_mode_set.

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/dp/dp_display.c | 17 +
  drivers/gpu/drm/msm/dp/dp_panel.h   |  1 +
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index 9df2a8b21021e..ddac55f45a722 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -784,10 +784,6 @@ static int dp_init_sub_modules(struct dp_display_private 
*dp)
goto error_ctrl;
}
  
-	/* populate wide_bus_supported to different layers */

-   dp->ctrl->wide_bus_en = dp->wide_bus_supported;
-   dp->catalog->wide_bus_en = dp->wide_bus_supported;
-
return rc;
  
  error_ctrl:

@@ -808,6 +804,7 @@ static int dp_display_set_mode(struct msm_dp *dp_display,
drm_mode_copy(>panel->dp_mode.drm_mode, >drm_mode);
dp->panel->dp_mode.bpp = mode->bpp;
dp->panel->dp_mode.capabilities = mode->capabilities;
+   dp->panel->dp_mode.out_fmt_is_yuv_420 = mode->out_fmt_is_yuv_420;


Why do we need it in dp_panel too?


dp_panel_init_panel_info(dp->panel);
return 0;
  }
@@ -1402,6 +1399,9 @@ bool msm_dp_wide_bus_available(const struct msm_dp 
*dp_display)
  
  	dp = container_of(dp_display, struct dp_display_private, dp_display);
  
+	if (dp->dp_mode.out_fmt_is_yuv_420)

+   return false;
+
return dp->wide_bus_supported;
  }
  
@@ -1615,6 +1615,15 @@ void dp_bridge_mode_set(struct drm_bridge *drm_bridge,
  
  	dp_display->dp_mode.h_active_low =

!!(dp_display->dp_mode.drm_mode.flags & DRM_MODE_FLAG_NHSYNC);
+
+   dp_display->dp_mode.out_fmt_is_yuv_420 =
+   drm_mode_is_420_only(&dp->connector->display_info,
adjusted_mode);
+
+   /* populate wide_bus_support to different layers */
+   dp_display->ctrl->wide_bus_en =
+   dp_display->dp_mode.out_fmt_is_yuv_420 ? false : 
dp_display->wide_bus_supported;
+   dp_display->catalog->wide_bus_en =
+   dp_display->dp_mode.out_fmt_is_yuv_420 ? false : 
dp_display->wide_bus_supported;
  }
  
  void dp_bridge_hpd_enable(struct drm_bridge *bridge)

diff --git a/drivers/gpu/drm/msm/dp/dp_panel.h 
b/drivers/gpu/drm/msm/dp/dp_panel.h
index a0dfc579c5f9f..6ec68be9f2366 100644
--- a/drivers/gpu/drm/msm/dp/dp_panel.h
+++ b/drivers/gpu/drm/msm/dp/dp_panel.h
@@ -19,6 +19,7 @@ struct dp_display_mode {
u32 bpp;
u32 h_active_low;
u32 v_active_low;
+   bool out_fmt_is_yuv_420;
  };
  
  struct dp_panel_in {


--
With best wishes
Dmitry



Re: [PATCH 03/17] drm/msm/dp: rename wide_bus_en to wide_bus_supported

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

Rename wide_bus_en to wide_bus_supported in dp_display_private to
correctly establish that the parameter is referencing if wide bus is
supported instead of enabled.

Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/dp/dp_display.c | 42 ++---
  1 file changed, 21 insertions(+), 21 deletions(-)


Reviewed-by: Dmitry Baryshkov 

--
With best wishes
Dmitry



Re: [PATCH 02/17] drm/msm/dpu: move dpu_encoder_helper_phys_setup_cdm to dpu_encoder

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

Move dpu_encoder_helper_phys_setup_cdm to dpu_encoder in preparation for
implementing CDM compatibility for DP.


Nit: s/CDM compatibility/YUV support/. It might make sense to spell it 
out that YUV over DP requires CDM.




Signed-off-by: Paloma Arellano 
---
  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   | 78 +
  .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h  |  9 ++
  .../drm/msm/disp/dpu1/dpu_encoder_phys_wb.c   | 84 ---
  3 files changed, 87 insertions(+), 84 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 83380bc92a00a..6cef98f046ea6 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -2114,6 +2114,84 @@ void dpu_encoder_helper_phys_cleanup(struct 
dpu_encoder_phys *phys_enc)
ctl->ops.clear_pending_flush(ctl);
  }
  
+void dpu_encoder_helper_phys_setup_cdm(struct dpu_encoder_phys *phys_enc,

+  const struct dpu_format *dpu_fmt,
+  u32 output_type)


My email client suggests that the parameters are not indented properly
anymore.



+{
+   struct dpu_hw_cdm *hw_cdm;
+   struct dpu_hw_cdm_cfg *cdm_cfg;
+   struct dpu_hw_pingpong *hw_pp;
+   int ret;
+
+   if (!phys_enc)
+   return;
+
+   cdm_cfg = &phys_enc->cdm_cfg;
+   hw_pp = phys_enc->hw_pp;
+   hw_cdm = phys_enc->hw_cdm;
+
+   if (!hw_cdm)
+   return;
+
+   if (!DPU_FORMAT_IS_YUV(dpu_fmt)) {
+   DPU_DEBUG("[enc:%d] cdm_disable fmt:%x\n", 
DRMID(phys_enc->parent),
+ dpu_fmt->base.pixel_format);
+   if (hw_cdm->ops.bind_pingpong_blk)
+   hw_cdm->ops.bind_pingpong_blk(hw_cdm, PINGPONG_NONE);
+
+   return;
+   }
+
+   memset(cdm_cfg, 0, sizeof(struct dpu_hw_cdm_cfg));
+
+   cdm_cfg->output_width = phys_enc->cached_mode.hdisplay;
+   cdm_cfg->output_height = phys_enc->cached_mode.vdisplay;
+   cdm_cfg->output_fmt = dpu_fmt;
+   cdm_cfg->output_type = output_type;
+   cdm_cfg->output_bit_depth = DPU_FORMAT_IS_DX(dpu_fmt) ?
+   CDM_CDWN_OUTPUT_10BIT : CDM_CDWN_OUTPUT_8BIT;
+   cdm_cfg->csc_cfg = &dpu_csc10_rgb2yuv_601l;
+
+   /* enable 10 bit logic */
+   switch (cdm_cfg->output_fmt->chroma_sample) {
+   case DPU_CHROMA_RGB:
+   cdm_cfg->h_cdwn_type = CDM_CDWN_DISABLE;
+   cdm_cfg->v_cdwn_type = CDM_CDWN_DISABLE;
+   break;
+   case DPU_CHROMA_H2V1:
+   cdm_cfg->h_cdwn_type = CDM_CDWN_COSITE;
+   cdm_cfg->v_cdwn_type = CDM_CDWN_DISABLE;
+   break;
+   case DPU_CHROMA_420:
+   cdm_cfg->h_cdwn_type = CDM_CDWN_COSITE;
+   cdm_cfg->v_cdwn_type = CDM_CDWN_OFFSITE;
+   break;
+   case DPU_CHROMA_H1V2:
+   default:
+   DPU_ERROR("[enc:%d] unsupported chroma sampling type\n",
+ DRMID(phys_enc->parent));
+   cdm_cfg->h_cdwn_type = CDM_CDWN_DISABLE;
+   cdm_cfg->v_cdwn_type = CDM_CDWN_DISABLE;
+   break;
+   }
+
+   DPU_DEBUG("[enc:%d] cdm_enable:%d,%d,%X,%d,%d,%d,%d]\n",
+ DRMID(phys_enc->parent), cdm_cfg->output_width,
+ cdm_cfg->output_height, 
cdm_cfg->output_fmt->base.pixel_format,
+ cdm_cfg->output_type, cdm_cfg->output_bit_depth,
+ cdm_cfg->h_cdwn_type, cdm_cfg->v_cdwn_type);
+
+   if (hw_cdm->ops.enable) {
+   cdm_cfg->pp_id = hw_pp->idx;
+   ret = hw_cdm->ops.enable(hw_cdm, cdm_cfg);
+   if (ret < 0) {
+   DPU_ERROR("[enc:%d] failed to enable CDM; ret:%d\n",
+ DRMID(phys_enc->parent), ret);
+   return;
+   }
+   }
+}
+
  #ifdef CONFIG_DEBUG_FS
  static int _dpu_encoder_status_show(struct seq_file *s, void *data)
  {
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
index 37ac385727c3b..310944303a056 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
@@ -381,6 +381,15 @@ int dpu_encoder_helper_wait_for_irq(struct 
dpu_encoder_phys *phys_enc,
   */
  void dpu_encoder_helper_phys_cleanup(struct dpu_encoder_phys *phys_enc);
  
+/**

+ * dpu_encoder_helper_phys_setup_cdm - setup chroma down sampling block
+ * @phys_enc: Pointer to physical encoder
+ * @output_type: HDMI/WB
+ */
+void dpu_encoder_helper_phys_setup_cdm(struct dpu_encoder_phys *phys_enc,
+  const struct dpu_format *dpu_fmt,
+  u32 output_type);


Again, indentation.


+
  /**
   * dpu_encoder_vblank_callback - Notify virtual 

Re: [PATCH 01/17] drm/msm/dpu: allow dpu_encoder_helper_phys_setup_cdm to work for DP

2024-01-25 Thread Dmitry Baryshkov

On 25/01/2024 21:38, Paloma Arellano wrote:

Generalize dpu_encoder_helper_phys_setup_cdm to be compatible with DP.

Signed-off-by: Paloma Arellano 
---
  .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h  |  4 +--
  .../drm/msm/disp/dpu1/dpu_encoder_phys_wb.c   | 31 ++-
  2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
index 993f263433314..37ac385727c3b 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
@@ -153,6 +153,7 @@ enum dpu_intr_idx {
   * @hw_intf:  Hardware interface to the intf registers
   * @hw_wb:Hardware interface to the wb registers
   * @hw_cdm:   Hardware interface to the CDM registers
+ * @cdm_cfg:   CDM block config needed to store WB/DP block's CDM configuration


Please realign the description.


   * @dpu_kms:  Pointer to the dpu_kms top level
   * @cached_mode:  DRM mode cached at mode_set time, acted on in enable
   * @vblank_ctl_lock:  Vblank ctl mutex lock to protect vblank_refcount
@@ -183,6 +184,7 @@ struct dpu_encoder_phys {
struct dpu_hw_intf *hw_intf;
struct dpu_hw_wb *hw_wb;
struct dpu_hw_cdm *hw_cdm;
+   struct dpu_hw_cdm_cfg cdm_cfg;


It might be slightly better to move it after all the pointers, so after 
the dpu_kms.



struct dpu_kms *dpu_kms;
struct drm_display_mode cached_mode;
struct mutex vblank_ctl_lock;
@@ -213,7 +215,6 @@ static inline int dpu_encoder_phys_inc_pending(struct 
dpu_encoder_phys *phys)
   * @wbirq_refcount: Reference count of writeback interrupt
   * @wb_done_timeout_cnt: number of wb done irq timeout errors
   * @wb_cfg:  writeback block config to store fb related details
- * @cdm_cfg: cdm block config needed to store writeback block's CDM 
configuration
   * @wb_conn: backpointer to writeback connector
   * @wb_job: backpointer to current writeback job
   * @dest:   dpu buffer layout for current writeback output buffer
@@ -223,7 +224,6 @@ struct dpu_encoder_phys_wb {
atomic_t wbirq_refcount;
int wb_done_timeout_cnt;
struct dpu_hw_wb_cfg wb_cfg;
-   struct dpu_hw_cdm_cfg cdm_cfg;
struct drm_writeback_connector *wb_conn;
struct drm_writeback_job *wb_job;
struct dpu_hw_fmt_layout dest;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
index 4cd2d9e3131a4..072fc6950e496 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
@@ -269,28 +269,21 @@ static void dpu_encoder_phys_wb_setup_ctl(struct 
dpu_encoder_phys *phys_enc)
   * This API does not handle 
DPU_CHROMA_H1V2.
   * @phys_enc:Pointer to physical encoder
   */
-static void dpu_encoder_helper_phys_setup_cdm(struct dpu_encoder_phys 
*phys_enc)
+static void dpu_encoder_helper_phys_setup_cdm(struct dpu_encoder_phys 
*phys_enc,
+ const struct dpu_format *dpu_fmt,
+ u32 output_type)
  {
struct dpu_hw_cdm *hw_cdm;
struct dpu_hw_cdm_cfg *cdm_cfg;
struct dpu_hw_pingpong *hw_pp;
-   struct dpu_encoder_phys_wb *wb_enc;
-   const struct msm_format *format;
-   const struct dpu_format *dpu_fmt;
-   struct drm_writeback_job *wb_job;
int ret;
  
  	if (!phys_enc)

return;
  
-	wb_enc = to_dpu_encoder_phys_wb(phys_enc);

-   cdm_cfg = &wb_enc->cdm_cfg;
+   cdm_cfg = &phys_enc->cdm_cfg;
hw_pp = phys_enc->hw_pp;
hw_cdm = phys_enc->hw_cdm;
-   wb_job = wb_enc->wb_job;
-
-   format = msm_framebuffer_format(wb_enc->wb_job->fb);
-   dpu_fmt = dpu_get_dpu_format_ext(format->pixel_format, 
wb_job->fb->modifier);
  
  	if (!hw_cdm)

return;
@@ -306,10 +299,10 @@ static void dpu_encoder_helper_phys_setup_cdm(struct 
dpu_encoder_phys *phys_enc)
  
  	memset(cdm_cfg, 0, sizeof(struct dpu_hw_cdm_cfg));
  
-	cdm_cfg->output_width = wb_job->fb->width;

-   cdm_cfg->output_height = wb_job->fb->height;
+   cdm_cfg->output_width = phys_enc->cached_mode.hdisplay;
+   cdm_cfg->output_height = phys_enc->cached_mode.vdisplay;


This is a semantic change. Instead of passing the FB size, this passes 
the mode dimensions. They are not guaranteed to be the same, especially 
for the WB case.



cdm_cfg->output_fmt = dpu_fmt;
-   cdm_cfg->output_type = CDM_CDWN_OUTPUT_WB;
+   cdm_cfg->output_type = output_type;
cdm_cfg->output_bit_depth = DPU_FORMAT_IS_DX(dpu_fmt) ?
CDM_CDWN_OUTPUT_10BIT : CDM_CDWN_OUTPUT_8BIT;
cdm_cfg->csc_cfg = &dpu_csc10_rgb2yuv_601l;
@@ -462,6 +455,14 @@ static void dpu_encoder_phys_wb_setup(
struct dpu_hw_wb *hw_wb = phys_enc->hw_wb;

Re: (subset) [PATCH v4 00/29] Add HDMI support for RK3128

2024-01-25 Thread Heiko Stuebner
On Fri, 22 Dec 2023 18:41:51 +0100, Alex Bee wrote:
> This is version 4 of my series that aims to add support for the display
> controller (VOP) and the HDMI controller block of RK3128 (which is very
> similar to the one found in RK3036). The original intention of this series
> was to add support for this slightly different integration but is by now,
> driven by maintainer's feedback, exploded to be a rework of inno-hdmi
> driver in large parts. It is, however, a change for the better.
> 
> [...]

Applied, thanks!

[27/29] ARM: dts: rockchip: Add display subsystem for RK3128
commit: 695b9b57443d88a1c8e0567c88a79d1a4532c75e
[28/29] ARM: dts: rockchip: Add HDMI node for RK3128
commit: 3fd6e33f8fde16869d4cd9cef71ca964b2b0789b
[29/29] ARM: dts: rockchip: Enable HDMI output for XPI-3128
commit: 5aab66e319df2a6fc4ab06bcb4bd974c1ac4927e

Best regards,
-- 
Heiko Stuebner 


RE: Making drm_gpuvm work across gpu devices

2024-01-25 Thread Zeng, Oak



> -Original Message-
> From: Daniel Vetter 
> Sent: Thursday, January 25, 2024 1:33 PM
> To: Christian König 
> Cc: Zeng, Oak ; Danilo Krummrich ;
> Dave Airlie ; Daniel Vetter ; Felix
> Kuehling ; Welty, Brian ; dri-
> de...@lists.freedesktop.org; intel...@lists.freedesktop.org; Bommu, Krishnaiah
> ; Ghimiray, Himal Prasad
> ; thomas.hellst...@linux.intel.com;
> Vishwanathapura, Niranjana ; Brost,
> Matthew ; Gupta, saurabhg
> 
> Subject: Re: Making drm_gpuvm work across gpu devices
> 
> On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:
> > Am 23.01.24 um 20:37 schrieb Zeng, Oak:
> > > [SNIP]
> > > Yes most API are per device based.
> > >
> > > One exception I know is actually the kfd SVM API. If you look at the 
> > > svm_ioctl
> function, it is per-process based. Each kfd_process represent a process 
> across N
> gpu devices.
> >
> > Yeah and that was a big mistake in my opinion. We should really not do that
> > ever again.
> >
> > > Need to say, kfd SVM represent a shared virtual address space across CPU
> and all GPU devices on the system. This is by the definition of SVM (shared 
> virtual
> memory). This is very different from our legacy gpu *device* driver which 
> works
> for only one device (i.e., if you want one device to access another device's
> memory, you will have to use dma-buf export/import etc).
> >
> > Exactly that thinking is what we have currently found as a blocker for
> > virtualization projects. Having SVM as a device-independent feature which
> > somehow ties to the process address space turned out to be an extremely bad
> > idea.
> >
> > The background is that this only works for some use cases but not all of
> > them.
> >
> > What's working much better is to just have a mirror functionality which says
> > that a range A..B of the process address space is mapped into a range C..D
> > of the GPU address space.
> >
> > Those ranges can then be used to implement the SVM feature required for
> > higher level APIs and not something you need at the UAPI or even inside the
> > low level kernel memory management.
> >
> > When you talk about migrating memory to a device you also do this on a per
> > device basis and *not* tied to the process address space. If you then get
> > crappy performance because userspace gave contradicting information where
> to
> > migrate memory then that's a bug in userspace and not something the kernel
> > should try to prevent somehow.
> >
> > [SNIP]
> > > > I think if you start using the same drm_gpuvm for multiple devices you
> > > > will sooner or later start to run into the same mess we have seen with
> > > > KFD, where we moved more and more functionality from the KFD to the
> DRM
> > > > render node because we found that a lot of the stuff simply doesn't work
> > > > correctly with a single object to maintain the state.
> > > As I understand it, KFD is designed to work across devices. A single 
> > > pseudo
> /dev/kfd device represent all hardware gpu devices. That is why during kfd 
> open,
> many pdd (process device data) is created, each for one hardware device for 
> this
> process.
> >
> > Yes, I'm perfectly aware of that. And I can only repeat myself that I see
> > this design as a rather extreme failure. And I think it's one of the reasons
> > why NVidia is so dominant with Cuda.
> >
> > This whole approach KFD takes was designed with the idea of extending the
> > CPU process into the GPUs, but this idea only works for a few use cases and
> > is not something we should apply to drivers in general.
> >
> > A very good example are virtualization use cases where you end up with CPU
> > address != GPU address because the VAs are actually coming from the guest
> VM
> > and not the host process.
> >
> > SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This should not have
> > any influence on the design of the kernel UAPI.
> >
> > If you want to do something similar as KFD for Xe I think you need to get
> > explicit permission to do this from Dave and Daniel and maybe even Linus.
> 
> I think the one and only one exception where an SVM uapi like in kfd makes
> sense, is if the _hardware_ itself, not the software stack defined
> semantics that you've happened to build on top of that hw, enforces a 1:1
> mapping with the cpu process address space.
> 
> Which means your hardware is using PASID, IOMMU based translation, PCI-ATS
> (address translation services) or whatever your hw calls it and has _no_
> device-side pagetables on top. Which from what I've seen all devices with
> device-memory have, simply because they need some place to store whether
> that memory is currently in device memory or should be translated using
> PASID. Currently there's no gpu that works with PASID only, but there are
> some on-cpu-die accelerator things that do work like that.
> 
> Maybe in the future there will be some accelerators that are fully cpu
> cache coherent (including atomics) with something like CXL, and the
> on-device memory is 

Re: [PATCH v5 003/111] pwm: Provide a macro to get the parent device of a given chip

2024-01-25 Thread Uwe Kleine-König
On Thu, Jan 25, 2024 at 09:29:37PM +0100, Uwe Kleine-König wrote:
> On Thu, Jan 25, 2024 at 11:32:47AM -0800, Florian Fainelli wrote:
> > On 1/25/24 04:08, Uwe Kleine-König wrote:
> > > Currently a pwm_chip stores in its struct device *dev member a pointer
> > > to the parent device. Preparing a change that embeds a full struct
> > > device in struct pwm_chip, this accessor macro should be used in all
> > > drivers directly accessing chip->dev now. This way struct pwm_chip and
> > > this macro can be changed without having to touch all drivers in the
> > > same change set.
> > > 
> > > Signed-off-by: Uwe Kleine-König 
> > 
> > Nit: this is not a macro but an inline function.
> 
> Oh right, it used to be a macro, but I changed that. I made the commit
> log read:
> 
> pwm: Provide an inline function to get the parent device of a given chip
> 
> Currently a pwm_chip stores in its struct device *dev member a pointer
> to the parent device. Preparing a change that embeds a full struct
> device in struct pwm_chip, this accessor function should be used in all
> drivers directly accessing chip->dev now. This way struct pwm_chip and
> this new function can be changed without having to touch all drivers in
> the same change set.

While looking into further feedback, I noticed I did the same mistake in
all the patches that convert the drivers to use this function. I did

git filter-branch --msg-filter 'sed "s/Make use of pwmchip_parent() 
macro/Make use of pwmchip_parent() accessor/; s/commit as struct pwm_chip::dev, 
use the macro/commit as struct pwm_chip::dev, use the accessor/; s/provided for 
exactly this purpose./function provided for exactly this purpose./"' 
linus/master..

on my branch to make the typical commit log read:

pwm: atmel: Make use of pwmchip_parent() accessor

struct pwm_chip::dev is about to change. To not have to touch this
driver in the same commit as struct pwm_chip::dev, use the accessor
function provided for exactly this purpose.

I wont resend the whole series if this is the only change to it.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |




Re: [PATCH v5 037/111] drm/bridge: ti-sn65dsi86: Make use of pwmchip_parent() macro

2024-01-25 Thread Uwe Kleine-König
Hello Doug,

On Thu, Jan 25, 2024 at 09:47:42AM -0800, Doug Anderson wrote:
> On Thu, Jan 25, 2024 at 4:11 AM Uwe Kleine-König
>  wrote:
> >
> > struct pwm_chip::dev is about to change. To not have to touch this
> > driver in the same commit as struct pwm_chip::dev, use the macro
> > provided for exactly this purpose.
> >
> > Signed-off-by: Uwe Kleine-König 
> > ---
> >  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> This seems OK with me. Unless someone more senior in the drm-misc
> community contradicts me, feel free to take this through your tree.
> 
> Acked-by: Douglas Anderson 

Thanks.
 
> NOTE: though the patch seems OK to me, I have one small concern. If I
> understand correctly, your eventual goal is to add a separate "dev"
> for the PWM chip without further changes to the ti-sn65dsi86 driver.
> If that's true, you'll have to find some way to magically call
> devm_pm_runtime_enable() on the new "dev" since the code you have here
> is calling pm_runtime functions on what will eventually be this new
> "dev". Maybe you'll do something like enabling runtime PM on it
> automatically if its parent had runtime PM enabled?

The idea is that the pwmchip_parent macro always returns the pwmchip's
parent. So when this patch gets applied, we have

+static inline struct device *pwmchip_parent(struct pwm_chip *chip)
{
   return chip->dev;
}

and when the pwmchip gets its own struct device, it is adapted to return
chip->dev.parent (and not &chip->dev). See patches #3 and #109.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |
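
Spelled out, the two stages look like this (the second body is a sketch
of the planned change; the two definitions are alternatives, not meant
to coexist):

/* Stage 1: pwm_chip::dev is still a pointer to the parent device. */
static inline struct device *pwmchip_parent(struct pwm_chip *chip)
{
	return chip->dev;
}

/* Stage 2 (planned): pwm_chip embeds its own struct device. */
static inline struct device *pwmchip_parent(struct pwm_chip *chip)
{
	return chip->dev.parent;
}

Callers such as ti-sn65dsi86 are untouched by the switch, which is the
whole point of the accessor.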




Re: [PATCH v5 104/111] drm/bridge: ti-sn65dsi86: Make use of devm_pwmchip_alloc() function

2024-01-25 Thread Uwe Kleine-König
Hello Doug,

On Thu, Jan 25, 2024 at 09:48:04AM -0800, Doug Anderson wrote:
> On Thu, Jan 25, 2024 at 4:11 AM Uwe Kleine-König
>  wrote:
> > @@ -1374,7 +1374,7 @@ static void ti_sn_pwm_pin_release(struct ti_sn65dsi86 
> > *pdata)
> >
> >  static struct ti_sn65dsi86 *pwm_chip_to_ti_sn_bridge(struct pwm_chip *chip)
> >  {
> > -   return container_of(chip, struct ti_sn65dsi86, pchip);
> > +   return pwmchip_get_drvdata(chip);
> >  }
> 
> nit: given Linux conventions that I'm aware of, a reader of the code
> would see the name "pwm_chip_to_ti_sn_bridge" and assume it's doing a
> container_of operation. It no longer is, so the name doesn't make as
> much sense. ...and, in fact, the function itself doesn't make as much
> sense. Maybe just have all callers call pwmchip_get_drvdata()
> directly?

The upside of keeping the thin wrapper is that it returns the right
type, so I tend to keep it. Probably subjective, but even if it the
function should be dropped, I'd suggest to do that in a separate change
to keep the changes easier to review.

> In any case, this seems fine to me. I haven't done lots to analyze
> your full plans to fix lifetime issues, but this patch itself looks
> benign and I wouldn't object to it landing. Thus I'm OK with:
> 
> Acked-by: Douglas Anderson 

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |




Re: [PATCH v5 003/111] pwm: Provide a macro to get the parent device of a given chip

2024-01-25 Thread Uwe Kleine-König
On Thu, Jan 25, 2024 at 11:32:47AM -0800, Florian Fainelli wrote:
> On 1/25/24 04:08, Uwe Kleine-König wrote:
> > Currently a pwm_chip stores in its struct device *dev member a pointer
> > to the parent device. Preparing a change that embeds a full struct
> > device in struct pwm_chip, this accessor macro should be used in all
> > drivers directly accessing chip->dev now. This way struct pwm_chip and
> > this macro can be changed without having to touch all drivers in the
> > same change set.
> > 
> > Signed-off-by: Uwe Kleine-König 
> 
> Nit: this is not a macro but an inline function.

Oh right, it used to be a macro, but I changed that. I made the commit
log read:

pwm: Provide an inline function to get the parent device of a given chip

Currently a pwm_chip stores in its struct device *dev member a pointer
to the parent device. Preparing a change that embeds a full struct
device in struct pwm_chip, this accessor function should be used in all
drivers directly accessing chip->dev now. This way struct pwm_chip and
this new function can be changed without having to touch all drivers in
the same change set.

Thanks,
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |




Re: [Linaro-mm-sig] [PATCH 2/3] udmabuf: Sync buffer mappings for attached devices

2024-01-25 Thread Daniel Vetter
On Tue, Jan 23, 2024 at 04:12:26PM -0600, Andrew Davis wrote:
> Currently this driver creates a SGT table using the CPU as the
> target device, then performs the dma_sync operations against
> that SGT. This is backwards to how DMA-BUFs are supposed to behave.
> This may have worked for the case where these buffers were given
> only back to the same CPU that produced them as in the QEMU case.
> And only then because the original author had the dma_sync
> operations also backwards, syncing for the "device" on begin_cpu.
> This was noticed and "fixed" in this patch[0].
> 
> That then meant we were sync'ing from the CPU to the CPU using
> a pseudo-device "miscdevice". Which then caused another issue
> due to the miscdevice not having a proper DMA mask (and why should
> it, the CPU is not a DMA device). The fix for that was an even
> more egregious hack[1] that declares the CPU is coherent with
> itself and can access its own memory space..
> 
> Unwind all this and perform the correct action by doing the dma_sync
> operations for each device currently attached to the backing buffer.
> 
> [0] commit 1ffe09590121 ("udmabuf: fix dma-buf cpu access")
> [1] commit 9e9fa6a9198b ("udmabuf: Set the DMA mask for the udmabuf device 
> (v2)")
> 
> Signed-off-by: Andrew Davis 

So yeah the above hacks are terrible, but I don't think this is better.
What you're doing now is that you're potentially doing the flushing
multiple times, so if you have a lot of importers with life mappings this
is a performance regression.

It's probably time to bite the bullet and teach the dma-api about flushing
for multiple devices. Or some way we can figure out which is the one
device we need to pick which gives us the right amount of flushing.

Cheers, Sima

> ---
>  drivers/dma-buf/udmabuf.c | 41 +++
>  1 file changed, 16 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 3a23f0a7d112a..ab6764322523c 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -26,8 +26,6 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in 
> megabytes. Default is
>  struct udmabuf {
>   pgoff_t pagecount;
>   struct page **pages;
> - struct sg_table *sg;
> - struct miscdevice *device;
>   struct list_head attachments;
>   struct mutex lock;
>  };
> @@ -169,12 +167,8 @@ static void unmap_udmabuf(struct dma_buf_attachment *at,
>  static void release_udmabuf(struct dma_buf *buf)
>  {
>   struct udmabuf *ubuf = buf->priv;
> - struct device *dev = ubuf->device->this_device;
>   pgoff_t pg;
>  
> - if (ubuf->sg)
> - put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
> -
>   for (pg = 0; pg < ubuf->pagecount; pg++)
>   put_page(ubuf->pages[pg]);
>   kfree(ubuf->pages);
> @@ -185,33 +179,31 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
>enum dma_data_direction direction)
>  {
>   struct udmabuf *ubuf = buf->priv;
> - struct device *dev = ubuf->device->this_device;
> - int ret = 0;
> -
> - if (!ubuf->sg) {
> - ubuf->sg = get_sg_table(dev, buf, direction);
> - if (IS_ERR(ubuf->sg)) {
> - ret = PTR_ERR(ubuf->sg);
> - ubuf->sg = NULL;
> - }
> - } else {
> - dma_sync_sg_for_cpu(dev, ubuf->sg->sgl, ubuf->sg->nents,
> - direction);
> - }
> + struct udmabuf_attachment *a;
>  
> - return ret;
> + mutex_lock(&ubuf->lock);
> +
> + list_for_each_entry(a, &ubuf->attachments, list)
> + dma_sync_sgtable_for_cpu(a->dev, a->table, direction);
> +
> + mutex_unlock(&ubuf->lock);
> +
> + return 0;
>  }
>  
>  static int end_cpu_udmabuf(struct dma_buf *buf,
>  enum dma_data_direction direction)
>  {
>   struct udmabuf *ubuf = buf->priv;
> - struct device *dev = ubuf->device->this_device;
> + struct udmabuf_attachment *a;
>  
> - if (!ubuf->sg)
> - return -EINVAL;
> + mutex_lock(&ubuf->lock);
> +
> + list_for_each_entry(a, &ubuf->attachments, list)
> + dma_sync_sgtable_for_device(a->dev, a->table, direction);
> +
> + mutex_unlock(&ubuf->lock);
>  
> - dma_sync_sg_for_device(dev, ubuf->sg->sgl, ubuf->sg->nents, direction);
>   return 0;
>  }
>  
> @@ -307,7 +299,6 @@ static long udmabuf_create(struct miscdevice *device,
>   exp_info.priv = ubuf;
>   exp_info.flags = O_RDWR;
>  
> - ubuf->device = device;
>   buf = dma_buf_export(&exp_info);
>   if (IS_ERR(buf)) {
>   ret = PTR_ERR(buf);
> -- 
> 2.39.2
> 
> ___
> Linaro-mm-sig mailing list -- linaro-mm-...@lists.linaro.org
> To unsubscribe send an email to linaro-mm-sig-le...@lists.linaro.org

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
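
For context, the attachment list iterated above comes from per-importer
bookkeeping introduced earlier in the series, roughly (abbreviated
sketch, not the exact patch; it relies on the udmabuf struct from the
quoted diff):

#include <linux/dma-buf.h>
#include <linux/slab.h>

/* Per-importer state: the device and its SGT mapping of the pages. */
struct udmabuf_attachment {
	struct device *dev;
	struct sg_table *table;
	struct list_head list;	/* link in udmabuf::attachments */
};

static int attach_udmabuf(struct dma_buf *buf,
			  struct dma_buf_attachment *at)
{
	struct udmabuf *ubuf = buf->priv;
	struct udmabuf_attachment *a;

	a = kzalloc(sizeof(*a), GFP_KERNEL);
	if (!a)
		return -ENOMEM;

	a->dev = at->dev;
	at->priv = a;

	mutex_lock(&ubuf->lock);
	list_add(&a->list, &ubuf->attachments);
	mutex_unlock(&ubuf->lock);

	return 0;
}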


Re: fb_defio and page->mapping

2024-01-25 Thread Daniel Vetter
On Tue, Jan 23, 2024 at 05:20:55PM +, Matthew Wilcox wrote:
> We're currently trying to remove page->mapping from the entire kernel.
> This has me interested in fb_defio and since I made such a mess of it
> with commits ccf953d8f3d6 / 0b78f8bcf495, I'd like to discuss what to
> do before diving in.
> 
> Folios continue to have a mapping.  So we can effectively do
> page_folio(page)->mapping (today that's calling compound_head() to get
> to the head page; eventually it's a separate allocation).
> 
> But now I look at commit 56c134f7f1b5, I'm a little scared.
> Apparently pages are being allocated from shmem and being mapped by
> fb_deferred_io_fault()?  This line:
> 
> page->mapping = vmf->vma->vm_file->f_mapping;
> 
> initially appears harmless for shmem files (because that's presumably
> a noop), but it's only a noop for head pages.  If shmem allocates a
> compound page (ie a 2MB THP today), you'll overlay some information
> stored in the second and third pages; looks like _entire_mapcount
> and _deferred_list.prev (but we do shift those fields around without
> regard to what the fb_defio driver is doing).  Even if you've disabled
> THP today, setting page->mapping to NULL in fb_deferred_io_lastclose()
> for a shmem page is a really bad idea.
> 
> I'd like to avoid fb_defio playing with page->mapping at all.
> As I understand it, the only reason to set page->mapping is so that
> page_mkclean() works.  But there are all kinds of assumptions in
> page_mkclean() (now essentially folio_mkclean()) that we're dealing with
> file-backed or anonymous memory.  I wonder if we might be better off
> calling pfn_mkclean_range() for each VMA which maps these allocations?
> You'd have to keep track of each VMA yourself (I think?)  but it would
> avoid messing with page->mapping.
> 
> Anyway, I don't know enough about DRM.  There might be something
> unutterably obvious we could do to fix this.

It's just really old code that's been barely touched to keep it going.

The issue is that the entire defio stuff is a pretty bad layering violation.
Not sure what the cleanest way to do this really would be if it only
touches the ptes and nothing else. Not sure what's the right function for
a bit of pte walking for that.

That would still potentially mess with the mapping by the gpu memory
allocator in bad ways, but I think at least for all current ones it should
avoid problems.

Definitely agree that messing with struct page in any way is really bad,
we simply didn't get that far yet.

I think the cleanest way would be to have an fb_mmap for drm drivers only,
in drm_fbdev_generic.c, sunset fb_deferred_io_mmap, and use that to
replicate the ptes from the kernel's vmap into a mapping that is ok for
userspace. The fbdev implementation in drm_fbdev_generic.c is the only one
left in drm that supports fb_defio, so that would catch all of them. To my
knowledge all the other defio implementations in native fbdev drivers
aren't problematic since none use shmem.

For a while we pondered proxying the vma to the driver's native drm
mmap implementation, but that gets really messy, plus there's no benefit
because fbdev assumes a bit too much that the memory is permanently
pinned. So we need the pinned kernel vmap anyway.

Cheers, Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
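
To make the pfn_mkclean_range() idea above concrete, a rough sketch follows.
This is untested and makes two loud assumptions: fb_defio does not track its
mapping VMAs today (the fb_defio_vma list below is hypothetical), and the
framebuffer is assumed physically contiguous; a shmem-backed buffer would
need per-page pfns. Locking around the list and the rmap walk is elided.

#include <linux/fb.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/pfn.h>
#include <linux/rmap.h>

/* Hypothetical per-mapping bookkeeping; fb_defio keeps no such list today. */
struct fb_defio_vma {
	struct vm_area_struct *vma;
	struct list_head node;
};

static void fb_deferred_io_clean_ptes(struct fb_info *info,
				      struct list_head *vmas)
{
	unsigned long first_pfn = PHYS_PFN(info->fix.smem_start);
	unsigned long nr_pages = DIV_ROUND_UP(info->fix.smem_len, PAGE_SIZE);
	struct fb_defio_vma *fv;

	/* Write-protect and clean the PTEs without touching page->mapping. */
	list_for_each_entry(fv, vmas, node)
		pfn_mkclean_range(first_pfn, nr_pages, fv->vma->vm_pgoff,
				  fv->vma);
}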


[PATCH 16/17] drm/msm/dpu: reserve CDM blocks for DP if mode is YUV420

2024-01-25 Thread Paloma Arellano
Reserve CDM blocks for DP if the mode format is YUV420. Currently this
reservation only works for writeback and DP if the format is YUV420. But
this can be easily extended to other YUV formats for DP.

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 99ec53446ad21..c7dcda3d54ae6 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -613,6 +613,7 @@ static int dpu_encoder_virt_atomic_check(
struct dpu_kms *dpu_kms;
struct drm_display_mode *adj_mode;
struct msm_display_topology topology;
+   struct msm_display_info *disp_info;
struct dpu_global_state *global_state;
struct drm_framebuffer *fb;
struct drm_dsc_config *dsc;
@@ -629,6 +630,7 @@ static int dpu_encoder_virt_atomic_check(
DPU_DEBUG_ENC(dpu_enc, "\n");
 
priv = drm_enc->dev->dev_private;
+   disp_info = _enc->disp_info;
dpu_kms = to_dpu_kms(priv->kms);
adj_mode = _state->adjusted_mode;
global_state = dpu_kms_get_global_state(crtc_state->state);
@@ -656,8 +658,8 @@ static int dpu_encoder_virt_atomic_check(
topology = dpu_encoder_get_topology(dpu_enc, dpu_kms, adj_mode, 
crtc_state, dsc);
 
/*
-* Use CDM only for writeback at the moment as other interfaces cannot 
handle it.
-* if writeback itself cannot handle cdm for some reason it will fail 
in its atomic_check()
+* Use CDM only for writeback or DP at the moment as other interfaces 
cannot handle it.
+* If writeback itself cannot handle cdm for some reason it will fail 
in its atomic_check()
 * earlier.
 */
if (dpu_enc->disp_info.intf_type == INTF_WB && 
conn_state->writeback_job) {
@@ -665,12 +667,15 @@ static int dpu_encoder_virt_atomic_check(
 
if (fb && 
DPU_FORMAT_IS_YUV(to_dpu_format(msm_framebuffer_format(fb
topology.needs_cdm = true;
-   if (topology.needs_cdm && !dpu_enc->cur_master->hw_cdm)
-   crtc_state->mode_changed = true;
-   else if (!topology.needs_cdm && dpu_enc->cur_master->hw_cdm)
-   crtc_state->mode_changed = true;
+   } else if (dpu_enc->disp_info.intf_type == INTF_DP) {
+   if 
(msm_dp_is_yuv_420_enabled(priv->dp[disp_info->h_tile_instance[0]], adj_mode))
+   topology.needs_cdm = true;
}
 
+   if (topology.needs_cdm && !dpu_enc->cur_master->hw_cdm)
+   crtc_state->mode_changed = true;
+   else if (!topology.needs_cdm && dpu_enc->cur_master->hw_cdm)
+   crtc_state->mode_changed = true;
/*
 * Release and Allocate resources on every modeset
 * Dont allocate when active is false.
@@ -,7 +1116,8 @@ static void dpu_encoder_virt_atomic_mode_set(struct 
drm_encoder *drm_enc,
 
dpu_enc->dsc_mask = dsc_mask;
 
-   if (dpu_enc->disp_info.intf_type == INTF_WB && 
conn_state->writeback_job) {
+   if ((dpu_enc->disp_info.intf_type == INTF_WB && 
conn_state->writeback_job) ||
+   dpu_enc->disp_info.intf_type == INTF_DP) {
struct dpu_hw_blk *hw_cdm = NULL;
 
dpu_rm_get_assigned_resources(_kms->rm, global_state,
-- 
2.39.2



[PATCH 17/17] drm/msm/dp: allow YUV420 mode for DP connector when VSC SDP supported

2024-01-25 Thread Paloma Arellano
All the components of YUV420 over DP have been added. Therefore, let's mark
the ycbcr_420_allowed connector property as true for the DP connector when
the DP type is not eDP and VSC SDP is supported.

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/dp/dp_display.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index 4329435518351..97edd607400b8 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -370,11 +370,14 @@ static int dp_display_process_hpd_high(struct 
dp_display_private *dp)
 
dp_link_process_request(dp->link);
 
-   if (!dp->dp_display.is_edp)
+   if (!dp->dp_display.is_edp) {
+   if (dp_panel_vsc_sdp_supported(dp->panel))
+   dp->dp_display.connector->ycbcr_420_allowed = true;
drm_dp_set_subconnector_property(dp->dp_display.connector,
 connector_status_connected,
 dp->panel->dpcd,
 dp->panel->downstream_ports);
+   }
 
edid = dp->panel->edid;
 
-- 
2.39.2



[PATCH 11/17] drm/msm/dp: add VSC SDP support for YUV420 over DP

2024-01-25 Thread Paloma Arellano
Add support to pack and send the VSC SDP packet for DP. This therefore
allows the transmission of format information to the sinks, which is
needed for YUV420 support over DP.
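
For reference, the SDP header bytes that the catalog code below reads from
dp_catalog->sdp are expected to follow the VSC SDP header layout from the
DP 1.4 specification, matching what the drm core's drm_dp_vsc_sdp_pack()
emits (shown here as an illustrative sketch, not part of this patch):

/* VSC SDP header for pixel encoding/colorimetry format indication */
sdp.sdp_header.HB0 = 0;           /* secondary-data packet ID */
sdp.sdp_header.HB1 = DP_SDP_VSC;  /* 0x07, VSC packet type */
sdp.sdp_header.HB2 = 0x5;         /* revision: pixel enc/colorimetry */
sdp.sdp_header.HB3 = 0x13;        /* number of valid data bytes */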

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/dp/dp_catalog.c | 147 
 drivers/gpu/drm/msm/dp/dp_catalog.h |   4 +
 drivers/gpu/drm/msm/dp/dp_ctrl.c|   4 +
 drivers/gpu/drm/msm/dp/dp_panel.c   |  47 +
 drivers/gpu/drm/msm/dp/dp_reg.h |   3 +
 5 files changed, 205 insertions(+)

diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index c025786170ba5..7e4c68be23e56 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -29,6 +29,9 @@
 
 #define DP_INTF_CONFIG_DATABUS_WIDEN BIT(4)
 
+#define DP_GENERIC0_6_YUV_8_BPCBIT(0)
+#define DP_GENERIC0_6_YUV_10_BPC   BIT(1)
+
 #define DP_INTERRUPT_STATUS1 \
(DP_INTR_AUX_XFER_DONE| \
DP_INTR_WRONG_ADDR | DP_INTR_TIMEOUT | \
@@ -907,6 +910,150 @@ int dp_catalog_panel_timing_cfg(struct dp_catalog 
*dp_catalog)
return 0;
 }
 
+static void dp_catalog_panel_setup_vsc_sdp(struct dp_catalog *dp_catalog)
+{
+   struct dp_catalog_private *catalog;
+   u32 header, parity, data;
+   u8 bpc, off = 0;
+   u8 buf[SZ_128];
+
+   if (!dp_catalog) {
+   pr_err("invalid input\n");
+   return;
+   }
+
+   catalog = container_of(dp_catalog, struct dp_catalog_private, 
dp_catalog);
+
+   /* HEADER BYTE 1 */
+   header = dp_catalog->sdp.sdp_header.HB1;
+   parity = dp_catalog_calculate_parity(header);
+   data   = ((header << HEADER_BYTE_1_BIT) | (parity << 
PARITY_BYTE_1_BIT));
+   dp_write_link(catalog, MMSS_DP_GENERIC0_0, data);
+   memcpy(buf + off, , sizeof(data));
+   off += sizeof(data);
+
+   /* HEADER BYTE 2 */
+   header = dp_catalog->sdp.sdp_header.HB2;
+   parity = dp_catalog_calculate_parity(header);
+   data   = ((header << HEADER_BYTE_2_BIT) | (parity << 
PARITY_BYTE_2_BIT));
+   dp_write_link(catalog, MMSS_DP_GENERIC0_1, data);
+
+   /* HEADER BYTE 3 */
+   header = dp_catalog->sdp.sdp_header.HB3;
+   parity = dp_catalog_calculate_parity(header);
+   data   = ((header << HEADER_BYTE_3_BIT) | (parity << 
PARITY_BYTE_3_BIT));
+   data |= dp_read_link(catalog, MMSS_DP_GENERIC0_1);
+   dp_write_link(catalog, MMSS_DP_GENERIC0_1, data);
+   memcpy(buf + off, , sizeof(data));
+   off += sizeof(data);
+
+   data = 0;
+   dp_write_link(catalog, MMSS_DP_GENERIC0_2, data);
+   memcpy(buf + off, , sizeof(data));
+   off += sizeof(data);
+
+   dp_write_link(catalog, MMSS_DP_GENERIC0_3, data);
+   memcpy(buf + off, , sizeof(data));
+   off += sizeof(data);
+
+   dp_write_link(catalog, MMSS_DP_GENERIC0_4, data);
+   memcpy(buf + off, , sizeof(data));
+   off += sizeof(data);
+
+   dp_write_link(catalog, MMSS_DP_GENERIC0_5, data);
+   memcpy(buf + off, , sizeof(data));
+   off += sizeof(data);
+
+   switch (dp_catalog->vsc_sdp_data.bpc) {
+   case 10:
+   bpc = DP_GENERIC0_6_YUV_10_BPC;
+   break;
+   case 8:
+   default:
+   bpc = DP_GENERIC0_6_YUV_8_BPC;
+   break;
+   }
+
+   /* VSC SDP payload as per table 2-117 of DP 1.4 specification */
+   data = (dp_catalog->vsc_sdp_data.colorimetry & 0xF) |
+  ((dp_catalog->vsc_sdp_data.pixelformat & 0xF) << 4) |
+  (bpc << 8) |
+  ((dp_catalog->vsc_sdp_data.dynamic_range & 0x1) << 15) |
+  ((dp_catalog->vsc_sdp_data.content_type & 0x7) << 16);
+
+   dp_write_link(catalog, MMSS_DP_GENERIC0_6, data);
+   memcpy(buf + off, , sizeof(data));
+   off += sizeof(data);
+
+   data = 0;
+   dp_write_link(catalog, MMSS_DP_GENERIC0_7, data);
+   memcpy(buf + off, , sizeof(data));
+   off += sizeof(data);
+
+   dp_write_link(catalog, MMSS_DP_GENERIC0_8, data);
+   memcpy(buf + off, , sizeof(data));
+   off += sizeof(data);
+
+   dp_write_link(catalog, MMSS_DP_GENERIC0_9, data);
+   memcpy(buf + off, , sizeof(data));
+   off += sizeof(data);
+
+   print_hex_dump(KERN_DEBUG, "[drm-dp] VSC: ", DUMP_PREFIX_NONE, 16, 4, 
buf, off, false);
+}
+
+void dp_catalog_panel_config_vsc_sdp(struct dp_catalog *dp_catalog, bool en)
+{
+   struct dp_catalog_private *catalog;
+   u32 cfg, cfg2, misc;
+   u16 major = 0, minor = 0;
+
+   if (!dp_catalog) {
+   pr_err("invalid input\n");
+   return;
+   }
+
+   catalog = container_of(dp_catalog, struct dp_catalog_private, 
dp_catalog);
+
+   cfg = dp_read_link(catalog, MMSS_DP_SDP_CFG);
+   cfg2 = dp_read_link(catalog, MMSS_DP_SDP_CFG2);
+   misc = dp_read_link(catalog, REG_DP_MISC1_MISC0);
+
+   if (en) {
+   cfg |= GEN0_SDP_EN;
+   

[PATCH 04/17] drm/msm/dp: store mode YUV420 information to be used by rest of DP

2024-01-25 Thread Paloma Arellano
Wide bus is not supported when the mode is YUV420 in DP. In preparation
for changing the DPU programming to reflect this, the value and
assignment location of wide_bus_en for the DP submodules must be
changed. Move it from boot time in dp_init_sub_modules() to run time in
dp_display_mode_set().

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/dp/dp_display.c | 17 +
 drivers/gpu/drm/msm/dp/dp_panel.h   |  1 +
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index 9df2a8b21021e..ddac55f45a722 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -784,10 +784,6 @@ static int dp_init_sub_modules(struct dp_display_private 
*dp)
goto error_ctrl;
}
 
-   /* populate wide_bus_supported to different layers */
-   dp->ctrl->wide_bus_en = dp->wide_bus_supported;
-   dp->catalog->wide_bus_en = dp->wide_bus_supported;
-
return rc;
 
 error_ctrl:
@@ -808,6 +804,7 @@ static int dp_display_set_mode(struct msm_dp *dp_display,
drm_mode_copy(>panel->dp_mode.drm_mode, >drm_mode);
dp->panel->dp_mode.bpp = mode->bpp;
dp->panel->dp_mode.capabilities = mode->capabilities;
+   dp->panel->dp_mode.out_fmt_is_yuv_420 = mode->out_fmt_is_yuv_420;
dp_panel_init_panel_info(dp->panel);
return 0;
 }
@@ -1402,6 +1399,9 @@ bool msm_dp_wide_bus_available(const struct msm_dp 
*dp_display)
 
dp = container_of(dp_display, struct dp_display_private, dp_display);
 
+   if (dp->dp_mode.out_fmt_is_yuv_420)
+   return false;
+
return dp->wide_bus_supported;
 }
 
@@ -1615,6 +1615,15 @@ void dp_bridge_mode_set(struct drm_bridge *drm_bridge,
 
dp_display->dp_mode.h_active_low =
!!(dp_display->dp_mode.drm_mode.flags & DRM_MODE_FLAG_NHSYNC);
+
+   dp_display->dp_mode.out_fmt_is_yuv_420 =
+   drm_mode_is_420_only(>connector->display_info, 
adjusted_mode);
+
+   /* populate wide_bus_support to different layers */
+   dp_display->ctrl->wide_bus_en =
+   dp_display->dp_mode.out_fmt_is_yuv_420 ? false : 
dp_display->wide_bus_supported;
+   dp_display->catalog->wide_bus_en =
+   dp_display->dp_mode.out_fmt_is_yuv_420 ? false : 
dp_display->wide_bus_supported;
 }
 
 void dp_bridge_hpd_enable(struct drm_bridge *bridge)
diff --git a/drivers/gpu/drm/msm/dp/dp_panel.h 
b/drivers/gpu/drm/msm/dp/dp_panel.h
index a0dfc579c5f9f..6ec68be9f2366 100644
--- a/drivers/gpu/drm/msm/dp/dp_panel.h
+++ b/drivers/gpu/drm/msm/dp/dp_panel.h
@@ -19,6 +19,7 @@ struct dp_display_mode {
u32 bpp;
u32 h_active_low;
u32 v_active_low;
+   bool out_fmt_is_yuv_420;
 };
 
 struct dp_panel_in {
-- 
2.39.2



[PATCH 12/17] drm/msm/dpu: add support of new peripheral flush mechanism

2024-01-25 Thread Paloma Arellano
From: Kuogee Hsieh 

Introduce a peripheral flushing mechanism to decouple peripheral
metadata flushing from timing engine related flush.

Signed-off-by: Kuogee Hsieh 
Signed-off-by: Paloma Arellano 
---
 .../drm/msm/disp/dpu1/dpu_encoder_phys_vid.c|  3 +++
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c  | 17 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h  | 10 ++
 3 files changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
index d0f56c5c4cce9..e284bf448bdda 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
@@ -437,6 +437,9 @@ static void dpu_encoder_phys_vid_enable(struct 
dpu_encoder_phys *phys_enc)
if (ctl->ops.update_pending_flush_merge_3d && phys_enc->hw_pp->merge_3d)
ctl->ops.update_pending_flush_merge_3d(ctl, 
phys_enc->hw_pp->merge_3d->idx);
 
+   if (ctl->ops.update_pending_flush_periph && 
phys_enc->hw_intf->cap->type == INTF_DP)
+   ctl->ops.update_pending_flush_periph(ctl, 
phys_enc->hw_intf->idx);
+
 skip_flush:
DPU_DEBUG_VIDENC(phys_enc,
"update pending flush ctl %d intf %d\n",
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
index e76565c3e6a43..bf45afeb616d3 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
@@ -39,6 +39,7 @@
 #define   CTL_WB_FLUSH  0x108
 #define   CTL_INTF_FLUSH0x110
 #define   CTL_CDM_FLUSH0x114
+#define   CTL_PERIPH_FLUSH  0x128
 #define   CTL_INTF_MASTER   0x134
 #define   CTL_DSPP_n_FLUSH(n)   ((0x13C) + ((n) * 4))
 
@@ -49,6 +50,7 @@
 #define  MERGE_3D_IDX   23
 #define  DSC_IDX22
 #define CDM_IDX 26
+#define  PERIPH_IDX 30
 #define  INTF_IDX   31
 #define WB_IDX  16
 #define  DSPP_IDX   29  /* From DPU hw rev 7.x.x */
@@ -151,6 +153,10 @@ static inline void dpu_hw_ctl_trigger_flush_v1(struct 
dpu_hw_ctl *ctx)
ctx->pending_dspp_flush_mask[dspp - DSPP_0]);
}
 
+   if (ctx->pending_flush_mask & BIT(PERIPH_IDX))
+   DPU_REG_WRITE(>hw, CTL_PERIPH_FLUSH,
+ ctx->pending_periph_flush_mask);
+
if (ctx->pending_flush_mask & BIT(DSC_IDX))
DPU_REG_WRITE(>hw, CTL_DSC_FLUSH,
  ctx->pending_dsc_flush_mask);
@@ -311,6 +317,13 @@ static void dpu_hw_ctl_update_pending_flush_intf_v1(struct 
dpu_hw_ctl *ctx,
ctx->pending_flush_mask |= BIT(INTF_IDX);
 }
 
+static void dpu_hw_ctl_update_pending_flush_periph(struct dpu_hw_ctl *ctx,
+   enum dpu_intf intf)
+{
+   ctx->pending_periph_flush_mask |= BIT(intf - INTF_0);
+   ctx->pending_flush_mask |= BIT(PERIPH_IDX);
+}
+
 static void dpu_hw_ctl_update_pending_flush_merge_3d_v1(struct dpu_hw_ctl *ctx,
enum dpu_merge_3d merge_3d)
 {
@@ -680,6 +693,10 @@ static void _setup_ctl_ops(struct dpu_hw_ctl_ops *ops,
ops->reset_intf_cfg = dpu_hw_ctl_reset_intf_cfg_v1;
ops->update_pending_flush_intf =
dpu_hw_ctl_update_pending_flush_intf_v1;
+
+   ops->update_pending_flush_periph =
+   dpu_hw_ctl_update_pending_flush_periph;
+
ops->update_pending_flush_merge_3d =
dpu_hw_ctl_update_pending_flush_merge_3d_v1;
ops->update_pending_flush_wb = 
dpu_hw_ctl_update_pending_flush_wb_v1;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h
index ff85b5ee0acf8..5d86c560b6d3f 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h
@@ -122,6 +122,15 @@ struct dpu_hw_ctl_ops {
void (*update_pending_flush_intf)(struct dpu_hw_ctl *ctx,
enum dpu_intf blk);
 
+   /**
+* OR in the given flushbits to the cached pending_(periph_)flush_mask
+* No effect on hardware
+* @ctx   : ctl path ctx pointer
+* @blk   : interface block index
+*/
+   void (*update_pending_flush_periph)(struct dpu_hw_ctl *ctx,
+   enum dpu_intf blk);
+
/**
 * OR in the given flushbits to the cached pending_(merge_3d_)flush_mask
 * No effect on hardware
@@ -264,6 +273,7 @@ struct dpu_hw_ctl {
u32 pending_flush_mask;
u32 pending_intf_flush_mask;
u32 pending_wb_flush_mask;
+   u32 pending_periph_flush_mask;
u32 pending_merge_3d_flush_mask;
u32 pending_dspp_flush_mask[DSPP_MAX - DSPP_0];
u32 pending_dsc_flush_mask;
-- 
2.39.2



[PATCH 14/17] drm/msm/dpu: modify encoder programming for CDM over DP

2024-01-25 Thread Paloma Arellano
Adjust the encoder format programming for the DP video mode case to
accommodate CDM-related changes.

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   | 16 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h   |  8 +
 .../drm/msm/disp/dpu1/dpu_encoder_phys_vid.c  | 35 ---
 drivers/gpu/drm/msm/dp/dp_display.c   | 12 +++
 drivers/gpu/drm/msm/msm_drv.h |  9 -
 5 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index b0896814c1562..99ec53446ad21 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -222,6 +222,22 @@ static u32 dither_matrix[DITHER_MATRIX_SZ] = {
15, 7, 13, 5, 3, 11, 1, 9, 12, 4, 14, 6, 0, 8, 2, 10
 };
 
+u32 dpu_encoder_get_drm_fmt(const struct drm_encoder *drm_enc, const struct 
drm_display_mode *mode)
+{
+   const struct dpu_encoder_virt *dpu_enc;
+   const struct msm_display_info *disp_info;
+   struct msm_drm_private *priv;
+
+   dpu_enc = to_dpu_encoder_virt(drm_enc);
+   disp_info = _enc->disp_info;
+   priv = drm_enc->dev->dev_private;
+
+   if (disp_info->intf_type == INTF_DP &&
+   msm_dp_is_yuv_420_enabled(priv->dp[disp_info->h_tile_instance[0]], 
mode))
+   return DRM_FORMAT_YUV420;
+
+   return DRM_FORMAT_RGB888;
+}
 
 bool dpu_encoder_is_widebus_enabled(const struct drm_encoder *drm_enc)
 {
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
index 7b4afa71f1f96..62255d0aa4487 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
@@ -162,6 +162,14 @@ int dpu_encoder_get_vsync_count(struct drm_encoder 
*drm_enc);
  */
 bool dpu_encoder_is_widebus_enabled(const struct drm_encoder *drm_enc);
 
+/**
+ * dpu_encoder_get_drm_fmt - return DRM fourcc format
+ * @drm_enc:Pointer to previously created drm encoder structure
+ * @mode:  Corresponding drm_display_mode for dpu encoder
+ */
+u32 dpu_encoder_get_drm_fmt(const struct drm_encoder *drm_enc,
+   const struct drm_display_mode *mode);
+
 /**
  * dpu_encoder_get_crc_values_cnt - get number of physical encoders contained
  * in virtual encoder that can collect CRC values
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
index e284bf448bdda..a1dde0ff35dc8 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
@@ -234,6 +234,7 @@ static void dpu_encoder_phys_vid_setup_timing_engine(
 {
struct drm_display_mode mode;
struct dpu_hw_intf_timing_params timing_params = { 0 };
+   struct dpu_hw_cdm *hw_cdm;
const struct dpu_format *fmt = NULL;
u32 fmt_fourcc = DRM_FORMAT_RGB888;
unsigned long lock_flags;
@@ -254,17 +255,26 @@ static void dpu_encoder_phys_vid_setup_timing_engine(
DPU_DEBUG_VIDENC(phys_enc, "enabling mode:\n");
drm_mode_debug_printmodeline();
 
-   if (phys_enc->split_role != ENC_ROLE_SOLO) {
+   hw_cdm = phys_enc->hw_cdm;
+   if (hw_cdm) {
+   intf_cfg.cdm = hw_cdm->idx;
+   fmt_fourcc = dpu_encoder_get_drm_fmt(phys_enc->parent, );
+   }
+
+   if (phys_enc->split_role != ENC_ROLE_SOLO ||
+   dpu_encoder_get_drm_fmt(phys_enc->parent, ) == 
DRM_FORMAT_YUV420) {
mode.hdisplay >>= 1;
mode.htotal >>= 1;
mode.hsync_start >>= 1;
mode.hsync_end >>= 1;
+   mode.hskew >>= 1;
 
DPU_DEBUG_VIDENC(phys_enc,
-   "split_role %d, halve horizontal %d %d %d %d\n",
+   "split_role %d, halve horizontal %d %d %d %d %d\n",
phys_enc->split_role,
mode.hdisplay, mode.htotal,
-   mode.hsync_start, mode.hsync_end);
+   mode.hsync_start, mode.hsync_end,
+   mode.hskew);
}
 
drm_mode_to_intf_timing_params(phys_enc, , _params);
@@ -412,8 +422,15 @@ static int dpu_encoder_phys_vid_control_vblank_irq(
 static void dpu_encoder_phys_vid_enable(struct dpu_encoder_phys *phys_enc)
 {
struct dpu_hw_ctl *ctl;
+   struct dpu_hw_cdm *hw_cdm;
+   const struct dpu_format *fmt = NULL;
+   u32 fmt_fourcc = DRM_FORMAT_RGB888;
 
ctl = phys_enc->hw_ctl;
+   hw_cdm = phys_enc->hw_cdm;
+   if (hw_cdm)
+   fmt_fourcc = dpu_encoder_get_drm_fmt(phys_enc->parent, 
_enc->cached_mode);
+   fmt = dpu_get_dpu_format(fmt_fourcc);
 
DPU_DEBUG_VIDENC(phys_enc, "\n");
 
@@ -422,6 +439,8 @@ static void dpu_encoder_phys_vid_enable(struct 
dpu_encoder_phys *phys_enc)
 

[PATCH 13/17] drm/msm/dp: enable SDP and SDE periph flush update

2024-01-25 Thread Paloma Arellano
The DP controller can be set up to operate in either SDP update flush mode
or peripheral flush mode, based on the DP controller hardware version.

Starting with DP controller v1.2, the hardware documents require the use of
peripheral flush mode for SDP packets such as PPS or VSC SDP packets.

In line with this guidance, let's program the DP controller to use
peripheral flush mode starting with controller v1.2.

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/dp/dp_catalog.c | 18 ++
 drivers/gpu/drm/msm/dp/dp_catalog.h |  1 +
 drivers/gpu/drm/msm/dp/dp_ctrl.c|  1 +
 drivers/gpu/drm/msm/dp/dp_reg.h |  2 ++
 4 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index 7e4c68be23e56..b43083b9c2df6 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -446,6 +446,24 @@ void dp_catalog_ctrl_config_misc(struct dp_catalog 
*dp_catalog,
dp_write_link(catalog, REG_DP_MISC1_MISC0, misc_val);
 }
 
+void dp_catalog_setup_peripheral_flush(struct dp_catalog *dp_catalog)
+{
+   u32 mainlink_ctrl;
+   u16 major = 0, minor = 0;
+   struct dp_catalog_private *catalog = container_of(dp_catalog,
+   struct dp_catalog_private, dp_catalog);
+
+   mainlink_ctrl = dp_read_link(catalog, REG_DP_MAINLINK_CTRL);
+
+   dp_catalog_hw_revision(dp_catalog, , );
+   if (major >= 1 && minor >= 2)
+   mainlink_ctrl |= DP_MAINLINK_FLUSH_MODE_SDE_PERIPH_UPDATE;
+   else
+   mainlink_ctrl |= DP_MAINLINK_FLUSH_MODE_UPDATE_SDP;
+
+   dp_write_link(catalog, REG_DP_MAINLINK_CTRL, mainlink_ctrl);
+}
+
 void dp_catalog_ctrl_config_msa(struct dp_catalog *dp_catalog,
u32 rate, u32 stream_rate_khz,
bool fixed_nvid, bool is_ycbcr_420)
diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.h 
b/drivers/gpu/drm/msm/dp/dp_catalog.h
index 6b757249c0698..1d57988aa6689 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.h
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.h
@@ -169,6 +169,7 @@ void dp_catalog_ctrl_config_ctrl(struct dp_catalog 
*dp_catalog, u32 config);
 void dp_catalog_ctrl_lane_mapping(struct dp_catalog *dp_catalog);
 void dp_catalog_ctrl_mainlink_ctrl(struct dp_catalog *dp_catalog, bool enable);
 void dp_catalog_ctrl_psr_mainlink_enable(struct dp_catalog *dp_catalog, bool 
enable);
+void dp_catalog_setup_peripheral_flush(struct dp_catalog *dp_catalog);
 void dp_catalog_ctrl_config_misc(struct dp_catalog *dp_catalog, u32 cc, u32 
tb);
 void dp_catalog_ctrl_config_msa(struct dp_catalog *dp_catalog, u32 rate,
u32 stream_rate_khz, bool fixed_nvid, bool 
is_ycbcr_420);
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index ddd92a63d5a67..c375b36f53ce1 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -170,6 +170,7 @@ static void dp_ctrl_configure_source_params(struct 
dp_ctrl_private *ctrl)
 
dp_catalog_ctrl_lane_mapping(ctrl->catalog);
dp_catalog_ctrl_mainlink_ctrl(ctrl->catalog, true);
+   dp_catalog_setup_peripheral_flush(ctrl->catalog);
 
dp_ctrl_config_ctrl(ctrl);
 
diff --git a/drivers/gpu/drm/msm/dp/dp_reg.h b/drivers/gpu/drm/msm/dp/dp_reg.h
index 756ddf85b1e81..05a1009d2f678 100644
--- a/drivers/gpu/drm/msm/dp/dp_reg.h
+++ b/drivers/gpu/drm/msm/dp/dp_reg.h
@@ -102,6 +102,8 @@
 #define DP_MAINLINK_CTRL_ENABLE(0x0001)
 #define DP_MAINLINK_CTRL_RESET (0x0002)
 #define DP_MAINLINK_CTRL_SW_BYPASS_SCRAMBLER   (0x0010)
+#define DP_MAINLINK_FLUSH_MODE_UPDATE_SDP  (0x0080)
+#define DP_MAINLINK_FLUSH_MODE_SDE_PERIPH_UPDATE   (0x0180)
 #define DP_MAINLINK_FB_BOUNDARY_SEL(0x0200)
 
 #define REG_DP_STATE_CTRL  (0x0004)
-- 
2.39.2



[PATCH 15/17] drm/msm/dpu: allow certain formats for CDM for DP

2024-01-25 Thread Paloma Arellano
The CDM block supports formats other than H1V2 for DP. Since we are now
adding support for CDM over DP, relax the check to allow all formats for
DP other than H1V2.

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c
index e9cdc7934a499..9016b3ade6bc3 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c
@@ -186,7 +186,7 @@ static int dpu_hw_cdm_enable(struct dpu_hw_cdm *ctx, struct 
dpu_hw_cdm_cfg *cdm)
dpu_hw_cdm_setup_cdwn(ctx, cdm);
 
if (cdm->output_type == CDM_CDWN_OUTPUT_HDMI) {
-   if (fmt->chroma_sample != DPU_CHROMA_H1V2)
+   if (fmt->chroma_sample == DPU_CHROMA_H1V2)
return -EINVAL; /*unsupported format */
opmode = CDM_HDMI_PACK_OP_MODE_EN;
opmode |= (fmt->chroma_sample << 1);
-- 
2.39.2



[PATCH 05/17] drm/msm/dp: add an API to indicate if sink supports VSC SDP

2024-01-25 Thread Paloma Arellano
The YUV420 format is supported only through the VSC SDP packet and not
through the MSA. Hence, add an API which indicates whether the sink
supports VSC SDP, for use by the rest of the DP programming.
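
Concretely, this reads DP_DPRX_FEATURE_ENUMERATION_LIST (DPCD register
0x2210) and checks DP_VSC_SDP_EXT_FOR_COLORIMETRY_SUPPORTED (bit 3); that
register is only meaningful on DPCD 1.3+ sinks, hence the additional
major/minor revision check in dp_panel_vsc_sdp_supported() below.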

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/dp/dp_display.c |  3 ++-
 drivers/gpu/drm/msm/dp/dp_panel.c   | 35 +
 drivers/gpu/drm/msm/dp/dp_panel.h   |  1 +
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index ddac55f45a722..f6b3b6ca242f8 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1617,7 +1617,8 @@ void dp_bridge_mode_set(struct drm_bridge *drm_bridge,
!!(dp_display->dp_mode.drm_mode.flags & DRM_MODE_FLAG_NHSYNC);
 
dp_display->dp_mode.out_fmt_is_yuv_420 =
-   drm_mode_is_420_only(>connector->display_info, 
adjusted_mode);
+   drm_mode_is_420_only(>connector->display_info, 
adjusted_mode) &&
+   dp_panel_vsc_sdp_supported(dp_display->panel);
 
/* populate wide_bus_support to different layers */
dp_display->ctrl->wide_bus_en =
diff --git a/drivers/gpu/drm/msm/dp/dp_panel.c 
b/drivers/gpu/drm/msm/dp/dp_panel.c
index 127f6af995cd1..af7820b6d35ec 100644
--- a/drivers/gpu/drm/msm/dp/dp_panel.c
+++ b/drivers/gpu/drm/msm/dp/dp_panel.c
@@ -17,6 +17,9 @@ struct dp_panel_private {
struct dp_link *link;
struct dp_catalog *catalog;
bool panel_on;
+   bool vsc_supported;
+   u8 major;
+   u8 minor;
 };
 
 static void dp_panel_read_psr_cap(struct dp_panel_private *panel)
@@ -43,9 +46,10 @@ static void dp_panel_read_psr_cap(struct dp_panel_private 
*panel)
 static int dp_panel_read_dpcd(struct dp_panel *dp_panel)
 {
int rc;
+   ssize_t rlen;
struct dp_panel_private *panel;
struct dp_link_info *link_info;
-   u8 *dpcd, major, minor;
+   u8 *dpcd, rx_feature;
 
panel = container_of(dp_panel, struct dp_panel_private, dp_panel);
dpcd = dp_panel->dpcd;
@@ -53,10 +57,19 @@ static int dp_panel_read_dpcd(struct dp_panel *dp_panel)
if (rc)
return rc;
 
+   rlen = drm_dp_dpcd_read(panel->aux, DP_DPRX_FEATURE_ENUMERATION_LIST, 
_feature, 1);
+   if (rlen != 1) {
+   panel->vsc_supported = false;
+   pr_debug("failed to read DP_DPRX_FEATURE_ENUMERATION_LIST\n");
+   } else {
+   panel->vsc_supported = !!(rx_feature & 
DP_VSC_SDP_EXT_FOR_COLORIMETRY_SUPPORTED);
+   pr_debug("vsc=%d\n", panel->vsc_supported);
+   }
+
link_info = _panel->link_info;
link_info->revision = dpcd[DP_DPCD_REV];
-   major = (link_info->revision >> 4) & 0x0f;
-   minor = link_info->revision & 0x0f;
+   panel->major = (link_info->revision >> 4) & 0x0f;
+   panel->minor = link_info->revision & 0x0f;
 
link_info->rate = drm_dp_max_link_rate(dpcd);
link_info->num_lanes = drm_dp_max_lane_count(dpcd);
@@ -69,7 +82,7 @@ static int dp_panel_read_dpcd(struct dp_panel *dp_panel)
if (link_info->rate > dp_panel->max_dp_link_rate)
link_info->rate = dp_panel->max_dp_link_rate;
 
-   drm_dbg_dp(panel->drm_dev, "version: %d.%d\n", major, minor);
+   drm_dbg_dp(panel->drm_dev, "version: %d.%d\n", panel->major, 
panel->minor);
drm_dbg_dp(panel->drm_dev, "link_rate=%d\n", link_info->rate);
drm_dbg_dp(panel->drm_dev, "lane_count=%d\n", link_info->num_lanes);
 
@@ -280,6 +293,20 @@ void dp_panel_tpg_config(struct dp_panel *dp_panel, bool 
enable)
dp_catalog_panel_tpg_enable(catalog, >dp_panel.dp_mode.drm_mode);
 }
 
+bool dp_panel_vsc_sdp_supported(struct dp_panel *dp_panel)
+{
+   struct dp_panel_private *panel;
+
+   if (!dp_panel) {
+   pr_err("invalid input\n");
+   return false;
+   }
+
+   panel = container_of(dp_panel, struct dp_panel_private, dp_panel);
+
+   return panel->major >= 1 && panel->minor >= 3 && panel->vsc_supported;
+}
+
 void dp_panel_dump_regs(struct dp_panel *dp_panel)
 {
struct dp_catalog *catalog;
diff --git a/drivers/gpu/drm/msm/dp/dp_panel.h 
b/drivers/gpu/drm/msm/dp/dp_panel.h
index 6ec68be9f2366..590eca5ce304b 100644
--- a/drivers/gpu/drm/msm/dp/dp_panel.h
+++ b/drivers/gpu/drm/msm/dp/dp_panel.h
@@ -66,6 +66,7 @@ int dp_panel_get_modes(struct dp_panel *dp_panel,
struct drm_connector *connector);
 void dp_panel_handle_sink_request(struct dp_panel *dp_panel);
 void dp_panel_tpg_config(struct dp_panel *dp_panel, bool enable);
+bool dp_panel_vsc_sdp_supported(struct dp_panel *dp_panel);
 
 /**
  * is_link_rate_valid() - validates the link rate
-- 
2.39.2



[PATCH 06/17] drm/msm/dpu: move widebus logic to its own API

2024-01-25 Thread Paloma Arellano
Widebus enablement is decided by the interfaces based on their specific
checks, and that already happens with DSI/DP-specific helpers. Let's
invoke these helpers from dpu_encoder_is_widebus_enabled() to make it
cleaner overall.

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 29 -
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h |  4 +++
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 6cef98f046ea6..b0896814c1562 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -225,9 +225,21 @@ static u32 dither_matrix[DITHER_MATRIX_SZ] = {
 
 bool dpu_encoder_is_widebus_enabled(const struct drm_encoder *drm_enc)
 {
-   const struct dpu_encoder_virt *dpu_enc = to_dpu_encoder_virt(drm_enc);
+   const struct dpu_encoder_virt *dpu_enc;
+   struct msm_drm_private *priv = drm_enc->dev->dev_private;
+   const struct msm_display_info *disp_info;
+   int index;
+
+   dpu_enc = to_dpu_encoder_virt(drm_enc);
+   disp_info = _enc->disp_info;
+   index = disp_info->h_tile_instance[0];
 
-   return dpu_enc->wide_bus_en;
+   if (disp_info->intf_type == INTF_DP)
+   return msm_dp_wide_bus_available(priv->dp[index]);
+   else if (disp_info->intf_type == INTF_DSI)
+   return msm_dsi_wide_bus_enabled(priv->dsi[index]);
+
+   return false;
 }
 
 int dpu_encoder_get_crc_values_cnt(const struct drm_encoder *drm_enc)
@@ -1192,26 +1204,17 @@ static void dpu_encoder_virt_atomic_enable(struct 
drm_encoder *drm_enc,
struct dpu_encoder_virt *dpu_enc = NULL;
int ret = 0;
struct drm_display_mode *cur_mode = NULL;
-   struct msm_drm_private *priv = drm_enc->dev->dev_private;
-   struct msm_display_info *disp_info;
-   int index;
 
dpu_enc = to_dpu_encoder_virt(drm_enc);
-   disp_info = _enc->disp_info;
-   index = disp_info->h_tile_instance[0];
-
dpu_enc->dsc = dpu_encoder_get_dsc_config(drm_enc);
 
atomic_set(_enc->frame_done_timeout_cnt, 0);
 
-   if (disp_info->intf_type == INTF_DP)
-   dpu_enc->wide_bus_en = 
msm_dp_wide_bus_available(priv->dp[index]);
-   else if (disp_info->intf_type == INTF_DSI)
-   dpu_enc->wide_bus_en = 
msm_dsi_wide_bus_enabled(priv->dsi[index]);
-
mutex_lock(_enc->enc_lock);
cur_mode = _enc->base.crtc->state->adjusted_mode;
 
+   dpu_enc->wide_bus_en = dpu_encoder_is_widebus_enabled(drm_enc);
+
trace_dpu_enc_enable(DRMID(drm_enc), cur_mode->hdisplay,
 cur_mode->vdisplay);
 
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
index 4c05fd5e9ed18..7b4afa71f1f96 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h
@@ -156,6 +156,10 @@ int dpu_encoder_get_linecount(struct drm_encoder *drm_enc);
  */
 int dpu_encoder_get_vsync_count(struct drm_encoder *drm_enc);
 
+/**
+ * dpu_encoder_is_widebus_enabled - return bool value if widebus is enabled
+ * @drm_enc:Pointer to previously created drm encoder structure
+ */
 bool dpu_encoder_is_widebus_enabled(const struct drm_encoder *drm_enc);
 
 /**
-- 
2.39.2



[PATCH 10/17] drm/msm/dp: modify dp_catalog_hw_revision to show major and minor val

2024-01-25 Thread Paloma Arellano
Modify dp_catalog_hw_revision to make the major and minor version values
known instead of outputting the entire hex value of the hardware version
register, in preparation for using it for VSC SDP programming.
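
As a worked example of the new decoding (register value hypothetical): a
REG_DP_HW_VERSION readout of 0x10020000 decodes as major = 1 (bits 31:28)
and minor = 2 (bits 27:16), i.e. DP controller v1.2:

u16 major, minor;

dp_catalog_hw_revision(dp_catalog, &major, &minor);
/* reg == 0x10020000  ->  major == 1, minor == 2 */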

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/dp/dp_catalog.c | 12 +---
 drivers/gpu/drm/msm/dp/dp_catalog.h |  2 +-
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index 5d84c089e520a..c025786170ba5 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -24,6 +24,9 @@
 #define DP_INTERRUPT_STATUS_ACK_SHIFT  1
 #define DP_INTERRUPT_STATUS_MASK_SHIFT 2
 
+#define DP_HW_VERSION_MAJOR(reg)   FIELD_GET(GENMASK(31, 28), reg)
+#define DP_HW_VERSION_MINOR(reg)   FIELD_GET(GENMASK(27, 16), reg)
+
 #define DP_INTF_CONFIG_DATABUS_WIDEN BIT(4)
 
 #define DP_INTERRUPT_STATUS1 \
@@ -531,15 +534,18 @@ int dp_catalog_ctrl_set_pattern_state_bit(struct 
dp_catalog *dp_catalog,
  *
  * @dp_catalog: DP catalog structure
  *
- * Return: DP controller hw revision
+ * Return: void
  *
  */
-u32 dp_catalog_hw_revision(const struct dp_catalog *dp_catalog)
+void dp_catalog_hw_revision(const struct dp_catalog *dp_catalog, u16 *major, 
u16 *minor)
 {
const struct dp_catalog_private *catalog = container_of(dp_catalog,
struct dp_catalog_private, dp_catalog);
+   u32 reg_dp_hw_version;
 
-   return dp_read_ahb(catalog, REG_DP_HW_VERSION);
+   reg_dp_hw_version = dp_read_ahb(catalog, REG_DP_HW_VERSION);
+   *major = DP_HW_VERSION_MAJOR(reg_dp_hw_version);
+   *minor = DP_HW_VERSION_MINOR(reg_dp_hw_version);
 }
 
 /**
diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.h 
b/drivers/gpu/drm/msm/dp/dp_catalog.h
index 563903605b3a7..94c377ef90c35 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.h
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.h
@@ -170,7 +170,7 @@ void dp_catalog_ctrl_config_misc(struct dp_catalog 
*dp_catalog, u32 cc, u32 tb);
 void dp_catalog_ctrl_config_msa(struct dp_catalog *dp_catalog, u32 rate,
u32 stream_rate_khz, bool fixed_nvid, bool 
is_ycbcr_420);
 int dp_catalog_ctrl_set_pattern_state_bit(struct dp_catalog *dp_catalog, u32 
pattern);
-u32 dp_catalog_hw_revision(const struct dp_catalog *dp_catalog);
+void dp_catalog_hw_revision(const struct dp_catalog *dp_catalog, u16 *major, 
u16 *minor);
 void dp_catalog_ctrl_reset(struct dp_catalog *dp_catalog);
 bool dp_catalog_ctrl_mainlink_ready(struct dp_catalog *dp_catalog);
 void dp_catalog_ctrl_enable_irq(struct dp_catalog *dp_catalog, bool enable);
-- 
2.39.2



[PATCH 07/17] drm/msm/dpu: disallow widebus en in INTF_CONFIG2 when DP is YUV420

2024-01-25 Thread Paloma Arellano
The INTF_CONFIG2 register cannot have widebus enabled when the DP format
is YUV420. Therefore, program the INTF to send 1 pixel per clock (ppc).

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
index 6bba531d6dc41..bfb93f02fe7c1 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c
@@ -168,7 +168,9 @@ static void dpu_hw_intf_setup_timing_engine(struct 
dpu_hw_intf *ctx,
 * video timing. It is recommended to enable it for all cases, except
 * if compression is enabled in 1 pixel per clock mode
 */
-   if (p->wide_bus_en)
+   if (dp_intf && fmt->base.pixel_format == DRM_FORMAT_YUV420)
+   intf_cfg2 |= INTF_CFG2_DATA_HCTL_EN;
+   else if (p->wide_bus_en)
intf_cfg2 |= INTF_CFG2_DATABUS_WIDEN | INTF_CFG2_DATA_HCTL_EN;
 
data_width = p->width;
-- 
2.39.2



[PATCH 01/17] drm/msm/dpu: allow dpu_encoder_helper_phys_setup_cdm to work for DP

2024-01-25 Thread Paloma Arellano
Generalize dpu_encoder_helper_phys_setup_cdm to be compatible with DP.

Signed-off-by: Paloma Arellano 
---
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h  |  4 +--
 .../drm/msm/disp/dpu1/dpu_encoder_phys_wb.c   | 31 ++-
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
index 993f263433314..37ac385727c3b 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
@@ -153,6 +153,7 @@ enum dpu_intr_idx {
  * @hw_intf:   Hardware interface to the intf registers
  * @hw_wb: Hardware interface to the wb registers
  * @hw_cdm:Hardware interface to the CDM registers
+ * @cdm_cfg:   CDM block config needed to store WB/DP block's CDM configuration
  * @dpu_kms:   Pointer to the dpu_kms top level
  * @cached_mode:   DRM mode cached at mode_set time, acted on in enable
  * @vblank_ctl_lock:   Vblank ctl mutex lock to protect vblank_refcount
@@ -183,6 +184,7 @@ struct dpu_encoder_phys {
struct dpu_hw_intf *hw_intf;
struct dpu_hw_wb *hw_wb;
struct dpu_hw_cdm *hw_cdm;
+   struct dpu_hw_cdm_cfg cdm_cfg;
struct dpu_kms *dpu_kms;
struct drm_display_mode cached_mode;
struct mutex vblank_ctl_lock;
@@ -213,7 +215,6 @@ static inline int dpu_encoder_phys_inc_pending(struct 
dpu_encoder_phys *phys)
  * @wbirq_refcount: Reference count of writeback interrupt
  * @wb_done_timeout_cnt: number of wb done irq timeout errors
  * @wb_cfg:  writeback block config to store fb related details
- * @cdm_cfg: cdm block config needed to store writeback block's CDM 
configuration
  * @wb_conn: backpointer to writeback connector
  * @wb_job: backpointer to current writeback job
  * @dest:   dpu buffer layout for current writeback output buffer
@@ -223,7 +224,6 @@ struct dpu_encoder_phys_wb {
atomic_t wbirq_refcount;
int wb_done_timeout_cnt;
struct dpu_hw_wb_cfg wb_cfg;
-   struct dpu_hw_cdm_cfg cdm_cfg;
struct drm_writeback_connector *wb_conn;
struct drm_writeback_job *wb_job;
struct dpu_hw_fmt_layout dest;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
index 4cd2d9e3131a4..072fc6950e496 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
@@ -269,28 +269,21 @@ static void dpu_encoder_phys_wb_setup_ctl(struct 
dpu_encoder_phys *phys_enc)
  * This API does not handle 
DPU_CHROMA_H1V2.
  * @phys_enc:Pointer to physical encoder
  */
-static void dpu_encoder_helper_phys_setup_cdm(struct dpu_encoder_phys 
*phys_enc)
+static void dpu_encoder_helper_phys_setup_cdm(struct dpu_encoder_phys 
*phys_enc,
+ const struct dpu_format *dpu_fmt,
+ u32 output_type)
 {
struct dpu_hw_cdm *hw_cdm;
struct dpu_hw_cdm_cfg *cdm_cfg;
struct dpu_hw_pingpong *hw_pp;
-   struct dpu_encoder_phys_wb *wb_enc;
-   const struct msm_format *format;
-   const struct dpu_format *dpu_fmt;
-   struct drm_writeback_job *wb_job;
int ret;
 
if (!phys_enc)
return;
 
-   wb_enc = to_dpu_encoder_phys_wb(phys_enc);
-   cdm_cfg = _enc->cdm_cfg;
+   cdm_cfg = _enc->cdm_cfg;
hw_pp = phys_enc->hw_pp;
hw_cdm = phys_enc->hw_cdm;
-   wb_job = wb_enc->wb_job;
-
-   format = msm_framebuffer_format(wb_enc->wb_job->fb);
-   dpu_fmt = dpu_get_dpu_format_ext(format->pixel_format, 
wb_job->fb->modifier);
 
if (!hw_cdm)
return;
@@ -306,10 +299,10 @@ static void dpu_encoder_helper_phys_setup_cdm(struct 
dpu_encoder_phys *phys_enc)
 
memset(cdm_cfg, 0, sizeof(struct dpu_hw_cdm_cfg));
 
-   cdm_cfg->output_width = wb_job->fb->width;
-   cdm_cfg->output_height = wb_job->fb->height;
+   cdm_cfg->output_width = phys_enc->cached_mode.hdisplay;
+   cdm_cfg->output_height = phys_enc->cached_mode.vdisplay;
cdm_cfg->output_fmt = dpu_fmt;
-   cdm_cfg->output_type = CDM_CDWN_OUTPUT_WB;
+   cdm_cfg->output_type = output_type;
cdm_cfg->output_bit_depth = DPU_FORMAT_IS_DX(dpu_fmt) ?
CDM_CDWN_OUTPUT_10BIT : CDM_CDWN_OUTPUT_8BIT;
cdm_cfg->csc_cfg = _csc10_rgb2yuv_601l;
@@ -462,6 +455,14 @@ static void dpu_encoder_phys_wb_setup(
struct dpu_hw_wb *hw_wb = phys_enc->hw_wb;
struct drm_display_mode mode = phys_enc->cached_mode;
struct drm_framebuffer *fb = NULL;
+   struct dpu_encoder_phys_wb *wb_enc = to_dpu_encoder_phys_wb(phys_enc);
+   struct drm_writeback_job *wb_job;
+   const struct msm_format *format;
+   const struct dpu_format *dpu_fmt;
+
+   wb_job = 

[PATCH 09/17] drm/msm/dp: move parity calculation to dp_catalog

2024-01-25 Thread Paloma Arellano
Parity calculation is necessary for the VSC SDP implementation, so
move it to dp_catalog, where it is usable by both the SDP programming
and dp_audio.c.

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/dp/dp_audio.c   | 100 
 drivers/gpu/drm/msm/dp/dp_catalog.h |  72 
 2 files changed, 86 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_audio.c 
b/drivers/gpu/drm/msm/dp/dp_audio.c
index 4a2e479723a85..7aa785018155a 100644
--- a/drivers/gpu/drm/msm/dp/dp_audio.c
+++ b/drivers/gpu/drm/msm/dp/dp_audio.c
@@ -16,13 +16,6 @@
 #include "dp_panel.h"
 #include "dp_display.h"
 
-#define HEADER_BYTE_2_BIT   0
-#define PARITY_BYTE_2_BIT   8
-#define HEADER_BYTE_1_BIT  16
-#define PARITY_BYTE_1_BIT  24
-#define HEADER_BYTE_3_BIT  16
-#define PARITY_BYTE_3_BIT  24
-
 struct dp_audio_private {
struct platform_device *audio_pdev;
struct platform_device *pdev;
@@ -36,71 +29,6 @@ struct dp_audio_private {
struct dp_audio dp_audio;
 };
 
-static u8 dp_audio_get_g0_value(u8 data)
-{
-   u8 c[4];
-   u8 g[4];
-   u8 ret_data = 0;
-   u8 i;
-
-   for (i = 0; i < 4; i++)
-   c[i] = (data >> i) & 0x01;
-
-   g[0] = c[3];
-   g[1] = c[0] ^ c[3];
-   g[2] = c[1];
-   g[3] = c[2];
-
-   for (i = 0; i < 4; i++)
-   ret_data = ((g[i] & 0x01) << i) | ret_data;
-
-   return ret_data;
-}
-
-static u8 dp_audio_get_g1_value(u8 data)
-{
-   u8 c[4];
-   u8 g[4];
-   u8 ret_data = 0;
-   u8 i;
-
-   for (i = 0; i < 4; i++)
-   c[i] = (data >> i) & 0x01;
-
-   g[0] = c[0] ^ c[3];
-   g[1] = c[0] ^ c[1] ^ c[3];
-   g[2] = c[1] ^ c[2];
-   g[3] = c[2] ^ c[3];
-
-   for (i = 0; i < 4; i++)
-   ret_data = ((g[i] & 0x01) << i) | ret_data;
-
-   return ret_data;
-}
-
-static u8 dp_audio_calculate_parity(u32 data)
-{
-   u8 x0 = 0;
-   u8 x1 = 0;
-   u8 ci = 0;
-   u8 iData = 0;
-   u8 i = 0;
-   u8 parity_byte;
-   u8 num_byte = (data & 0xFF00) > 0 ? 8 : 2;
-
-   for (i = 0; i < num_byte; i++) {
-   iData = (data >> i*4) & 0xF;
-
-   ci = iData ^ x1;
-   x1 = x0 ^ dp_audio_get_g1_value(ci);
-   x0 = dp_audio_get_g0_value(ci);
-   }
-
-   parity_byte = x1 | (x0 << 4);
-
-   return parity_byte;
-}
-
 static u32 dp_audio_get_header(struct dp_catalog *catalog,
enum dp_catalog_audio_sdp_type sdp,
enum dp_catalog_audio_header_type header)
@@ -134,7 +62,7 @@ static void dp_audio_stream_sdp(struct dp_audio_private 
*audio)
DP_AUDIO_SDP_STREAM, DP_AUDIO_SDP_HEADER_1);
 
new_value = 0x02;
-   parity_byte = dp_audio_calculate_parity(new_value);
+   parity_byte = dp_catalog_calculate_parity(new_value);
value |= ((new_value << HEADER_BYTE_1_BIT)
| (parity_byte << PARITY_BYTE_1_BIT));
drm_dbg_dp(audio->drm_dev,
@@ -147,7 +75,7 @@ static void dp_audio_stream_sdp(struct dp_audio_private 
*audio)
value = dp_audio_get_header(catalog,
DP_AUDIO_SDP_STREAM, DP_AUDIO_SDP_HEADER_2);
new_value = value;
-   parity_byte = dp_audio_calculate_parity(new_value);
+   parity_byte = dp_catalog_calculate_parity(new_value);
value |= ((new_value << HEADER_BYTE_2_BIT)
| (parity_byte << PARITY_BYTE_2_BIT));
drm_dbg_dp(audio->drm_dev,
@@ -162,7 +90,7 @@ static void dp_audio_stream_sdp(struct dp_audio_private 
*audio)
DP_AUDIO_SDP_STREAM, DP_AUDIO_SDP_HEADER_3);
 
new_value = audio->channels - 1;
-   parity_byte = dp_audio_calculate_parity(new_value);
+   parity_byte = dp_catalog_calculate_parity(new_value);
value |= ((new_value << HEADER_BYTE_3_BIT)
| (parity_byte << PARITY_BYTE_3_BIT));
drm_dbg_dp(audio->drm_dev,
@@ -184,7 +112,7 @@ static void dp_audio_timestamp_sdp(struct dp_audio_private 
*audio)
DP_AUDIO_SDP_TIMESTAMP, DP_AUDIO_SDP_HEADER_1);
 
new_value = 0x1;
-   parity_byte = dp_audio_calculate_parity(new_value);
+   parity_byte = dp_catalog_calculate_parity(new_value);
value |= ((new_value << HEADER_BYTE_1_BIT)
| (parity_byte << PARITY_BYTE_1_BIT));
drm_dbg_dp(audio->drm_dev,
@@ -198,7 +126,7 @@ static void dp_audio_timestamp_sdp(struct dp_audio_private 
*audio)
DP_AUDIO_SDP_TIMESTAMP, DP_AUDIO_SDP_HEADER_2);
 
new_value = 0x17;
-   parity_byte = dp_audio_calculate_parity(new_value);
+   parity_byte = dp_catalog_calculate_parity(new_value);
value |= ((new_value << HEADER_BYTE_2_BIT)
| (parity_byte << PARITY_BYTE_2_BIT));
drm_dbg_dp(audio->drm_dev,
@@ -212,7 +140,7 @@ static void 

[PATCH 08/17] drm/msm/dp: change YUV420 related programming for DP

2024-01-25 Thread Paloma Arellano
Change all relevant DP controller programming for the YUV420 case.
Namely, change the pixel clock math to account for YUV420, program the
configuration control register to indicate YUV420, and modify the MVID
programming to account for YUV420.
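
As a worked example (mode timings assumed for illustration): a 3840x2160@60
mode has a 594000 kHz pixel clock; in YUV420 the controller processes two
pixels per clock, so the effective pixel rate is halved to 297000 kHz
(pixel_rate >>= 1 below), and MVID is divided by two so the MSA M/N ratio
stays consistent with the halved pixel clock.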

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/dp/dp_catalog.c |  5 -
 drivers/gpu/drm/msm/dp/dp_catalog.h |  2 +-
 drivers/gpu/drm/msm/dp/dp_ctrl.c| 12 +---
 drivers/gpu/drm/msm/dp/dp_display.c |  8 +++-
 drivers/gpu/drm/msm/msm_kms.h   |  3 +++
 5 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index 5142aeb705a44..5d84c089e520a 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -442,7 +442,7 @@ void dp_catalog_ctrl_config_misc(struct dp_catalog 
*dp_catalog,
 
 void dp_catalog_ctrl_config_msa(struct dp_catalog *dp_catalog,
u32 rate, u32 stream_rate_khz,
-   bool fixed_nvid)
+   bool fixed_nvid, bool is_ycbcr_420)
 {
u32 pixel_m, pixel_n;
u32 mvid, nvid, pixel_div = 0, dispcc_input_rate;
@@ -485,6 +485,9 @@ void dp_catalog_ctrl_config_msa(struct dp_catalog 
*dp_catalog,
nvid = temp;
}
 
+   if (is_ycbcr_420)
+   mvid /= 2;
+
if (link_rate_hbr2 == rate)
nvid *= 2;
 
diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.h 
b/drivers/gpu/drm/msm/dp/dp_catalog.h
index 38786e855b51a..6cb5e2a243de2 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.h
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.h
@@ -96,7 +96,7 @@ void dp_catalog_ctrl_mainlink_ctrl(struct dp_catalog 
*dp_catalog, bool enable);
 void dp_catalog_ctrl_psr_mainlink_enable(struct dp_catalog *dp_catalog, bool 
enable);
 void dp_catalog_ctrl_config_misc(struct dp_catalog *dp_catalog, u32 cc, u32 
tb);
 void dp_catalog_ctrl_config_msa(struct dp_catalog *dp_catalog, u32 rate,
-   u32 stream_rate_khz, bool fixed_nvid);
+   u32 stream_rate_khz, bool fixed_nvid, bool 
is_ycbcr_420);
 int dp_catalog_ctrl_set_pattern_state_bit(struct dp_catalog *dp_catalog, u32 
pattern);
 u32 dp_catalog_hw_revision(const struct dp_catalog *dp_catalog);
 void dp_catalog_ctrl_reset(struct dp_catalog *dp_catalog);
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index 77a8d9366ed7b..209cf2a35642f 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -128,6 +128,9 @@ static void dp_ctrl_config_ctrl(struct dp_ctrl_private 
*ctrl)
/* Default-> LSCLK DIV: 1/4 LCLK  */
config |= (2 << DP_CONFIGURATION_CTRL_LSCLK_DIV_SHIFT);
 
+   if (ctrl->panel->dp_mode.out_fmt_is_yuv_420)
+   config |= DP_CONFIGURATION_CTRL_RGB_YUV; /* YUV420 */
+
/* Scrambler reset enable */
if (drm_dp_alternate_scrambler_reset_cap(dpcd))
config |= DP_CONFIGURATION_CTRL_ASSR;
@@ -957,7 +960,7 @@ static void dp_ctrl_calc_tu_parameters(struct 
dp_ctrl_private *ctrl,
in.hporch = drm_mode->htotal - drm_mode->hdisplay;
in.nlanes = ctrl->link->link_params.num_lanes;
in.bpp = ctrl->panel->dp_mode.bpp;
-   in.pixel_enc = 444;
+   in.pixel_enc = ctrl->panel->dp_mode.out_fmt_is_yuv_420 ? 420 : 444;
in.dsc_en = 0;
in.async_en = 0;
in.fec_en = 0;
@@ -1763,6 +1766,8 @@ int dp_ctrl_on_link(struct dp_ctrl *dp_ctrl)
ctrl->link->link_params.rate = rate;
ctrl->link->link_params.num_lanes =
ctrl->panel->link_info.num_lanes;
+   if (ctrl->panel->dp_mode.out_fmt_is_yuv_420)
+   pixel_rate >>= 1;
}
 
drm_dbg_dp(ctrl->drm_dev, "rate=%d, num_lanes=%d, pixel_rate=%lu\n",
@@ -1878,7 +1883,7 @@ int dp_ctrl_on_stream(struct dp_ctrl *dp_ctrl, bool 
force_link_train)
 
pixel_rate = pixel_rate_orig = ctrl->panel->dp_mode.drm_mode.clock;
 
-   if (dp_ctrl->wide_bus_en)
+   if (dp_ctrl->wide_bus_en || ctrl->panel->dp_mode.out_fmt_is_yuv_420)
pixel_rate >>= 1;
 
drm_dbg_dp(ctrl->drm_dev, "rate=%d, num_lanes=%d, pixel_rate=%lu\n",
@@ -1917,7 +1922,8 @@ int dp_ctrl_on_stream(struct dp_ctrl *dp_ctrl, bool 
force_link_train)
 
dp_catalog_ctrl_config_msa(ctrl->catalog,
ctrl->link->link_params.rate,
-   pixel_rate_orig, dp_ctrl_use_fixed_nvid(ctrl));
+   pixel_rate_orig, dp_ctrl_use_fixed_nvid(ctrl),
+   ctrl->panel->dp_mode.out_fmt_is_yuv_420);
 
dp_ctrl_setup_tr_unit(ctrl);
 
diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index f6b3b6ca242f8..6d764f5b08727 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -916,9 +916,10 @@ enum drm_mode_status 

[PATCH 03/17] drm/msm/dp: rename wide_bus_en to wide_bus_supported

2024-01-25 Thread Paloma Arellano
Rename wide_bus_en to wide_bus_supported in dp_display_private to make
clear that the field indicates whether wide bus is supported, rather
than whether it is enabled.

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/dp/dp_display.c | 42 ++---
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index d37d599aec273..9df2a8b21021e 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -113,7 +113,7 @@ struct dp_display_private {
struct dp_event event_list[DP_EVENT_Q_MAX];
spinlock_t event_lock;
 
-   bool wide_bus_en;
+   bool wide_bus_supported;
 
struct dp_audio *audio;
 };
@@ -122,7 +122,7 @@ struct msm_dp_desc {
phys_addr_t io_start;
unsigned int id;
unsigned int connector_type;
-   bool wide_bus_en;
+   bool wide_bus_supported;
 };
 
 static const struct msm_dp_desc sc7180_dp_descs[] = {
@@ -131,8 +131,8 @@ static const struct msm_dp_desc sc7180_dp_descs[] = {
 };
 
 static const struct msm_dp_desc sc7280_dp_descs[] = {
-   { .io_start = 0x0ae9, .id = MSM_DP_CONTROLLER_0, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_en = true },
-   { .io_start = 0x0aea, .id = MSM_DP_CONTROLLER_1, .connector_type = 
DRM_MODE_CONNECTOR_eDP, .wide_bus_en = true },
+   { .io_start = 0x0ae9, .id = MSM_DP_CONTROLLER_0, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_supported = true },
+   { .io_start = 0x0aea, .id = MSM_DP_CONTROLLER_1, .connector_type = 
DRM_MODE_CONNECTOR_eDP, .wide_bus_supported = true },
{}
 };
 
@@ -144,22 +144,22 @@ static const struct msm_dp_desc sc8180x_dp_descs[] = {
 };
 
 static const struct msm_dp_desc sc8280xp_dp_descs[] = {
-   { .io_start = 0x0ae9, .id = MSM_DP_CONTROLLER_0, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_en = true },
-   { .io_start = 0x0ae98000, .id = MSM_DP_CONTROLLER_1, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_en = true },
-   { .io_start = 0x0ae9a000, .id = MSM_DP_CONTROLLER_2, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_en = true },
-   { .io_start = 0x0aea, .id = MSM_DP_CONTROLLER_3, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_en = true },
-   { .io_start = 0x2209, .id = MSM_DP_CONTROLLER_0, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_en = true },
-   { .io_start = 0x22098000, .id = MSM_DP_CONTROLLER_1, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_en = true },
-   { .io_start = 0x2209a000, .id = MSM_DP_CONTROLLER_2, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_en = true },
-   { .io_start = 0x220a, .id = MSM_DP_CONTROLLER_3, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_en = true },
+   { .io_start = 0x0ae9, .id = MSM_DP_CONTROLLER_0, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_supported = true },
+   { .io_start = 0x0ae98000, .id = MSM_DP_CONTROLLER_1, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_supported = true },
+   { .io_start = 0x0ae9a000, .id = MSM_DP_CONTROLLER_2, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_supported = true },
+   { .io_start = 0x0aea, .id = MSM_DP_CONTROLLER_3, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_supported = true },
+   { .io_start = 0x2209, .id = MSM_DP_CONTROLLER_0, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_supported = true },
+   { .io_start = 0x22098000, .id = MSM_DP_CONTROLLER_1, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_supported = true },
+   { .io_start = 0x2209a000, .id = MSM_DP_CONTROLLER_2, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_supported = true },
+   { .io_start = 0x220a, .id = MSM_DP_CONTROLLER_3, .connector_type = 
DRM_MODE_CONNECTOR_DisplayPort, .wide_bus_supported = true },
{}
 };
 
 static const struct msm_dp_desc sc8280xp_edp_descs[] = {
-   { .io_start = 0x0ae9a000, .id = MSM_DP_CONTROLLER_2, .connector_type = 
DRM_MODE_CONNECTOR_eDP, .wide_bus_en = true },
-   { .io_start = 0x0aea, .id = MSM_DP_CONTROLLER_3, .connector_type = 
DRM_MODE_CONNECTOR_eDP, .wide_bus_en = true },
-   { .io_start = 0x2209a000, .id = MSM_DP_CONTROLLER_2, .connector_type = 
DRM_MODE_CONNECTOR_eDP, .wide_bus_en = true },
-   { .io_start = 0x220a, .id = MSM_DP_CONTROLLER_3, .connector_type = 
DRM_MODE_CONNECTOR_eDP, .wide_bus_en = true },
+   { .io_start = 0x0ae9a000, .id = MSM_DP_CONTROLLER_2, .connector_type = 
DRM_MODE_CONNECTOR_eDP, .wide_bus_supported = true },
+   { .io_start = 0x0aea, .id = MSM_DP_CONTROLLER_3, .connector_type = 
DRM_MODE_CONNECTOR_eDP, .wide_bus_supported = true },
+   { .io_start = 0x2209a000, .id = MSM_DP_CONTROLLER_2, .connector_type = 

[PATCH 02/17] drm/msm/dpu: move dpu_encoder_helper_phys_setup_cdm to dpu_encoder

2024-01-25 Thread Paloma Arellano
Move dpu_encoder_helper_phys_setup_cdm to dpu_encoder in preparation for
implementing CDM compatibility for DP.

Signed-off-by: Paloma Arellano 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   | 78 +
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h  |  9 ++
 .../drm/msm/disp/dpu1/dpu_encoder_phys_wb.c   | 84 ---
 3 files changed, 87 insertions(+), 84 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 83380bc92a00a..6cef98f046ea6 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -2114,6 +2114,84 @@ void dpu_encoder_helper_phys_cleanup(struct 
dpu_encoder_phys *phys_enc)
ctl->ops.clear_pending_flush(ctl);
 }
 
+void dpu_encoder_helper_phys_setup_cdm(struct dpu_encoder_phys *phys_enc,
+  const struct dpu_format *dpu_fmt,
+  u32 output_type)
+{
+   struct dpu_hw_cdm *hw_cdm;
+   struct dpu_hw_cdm_cfg *cdm_cfg;
+   struct dpu_hw_pingpong *hw_pp;
+   int ret;
+
+   if (!phys_enc)
+   return;
+
+   cdm_cfg = _enc->cdm_cfg;
+   hw_pp = phys_enc->hw_pp;
+   hw_cdm = phys_enc->hw_cdm;
+
+   if (!hw_cdm)
+   return;
+
+   if (!DPU_FORMAT_IS_YUV(dpu_fmt)) {
+   DPU_DEBUG("[enc:%d] cdm_disable fmt:%x\n", 
DRMID(phys_enc->parent),
+ dpu_fmt->base.pixel_format);
+   if (hw_cdm->ops.bind_pingpong_blk)
+   hw_cdm->ops.bind_pingpong_blk(hw_cdm, PINGPONG_NONE);
+
+   return;
+   }
+
+   memset(cdm_cfg, 0, sizeof(struct dpu_hw_cdm_cfg));
+
+   cdm_cfg->output_width = phys_enc->cached_mode.hdisplay;
+   cdm_cfg->output_height = phys_enc->cached_mode.vdisplay;
+   cdm_cfg->output_fmt = dpu_fmt;
+   cdm_cfg->output_type = output_type;
+   cdm_cfg->output_bit_depth = DPU_FORMAT_IS_DX(dpu_fmt) ?
+   CDM_CDWN_OUTPUT_10BIT : CDM_CDWN_OUTPUT_8BIT;
+   cdm_cfg->csc_cfg = &dpu_csc10_rgb2yuv_601l;
+
+   /* enable 10 bit logic */
+   switch (cdm_cfg->output_fmt->chroma_sample) {
+   case DPU_CHROMA_RGB:
+   cdm_cfg->h_cdwn_type = CDM_CDWN_DISABLE;
+   cdm_cfg->v_cdwn_type = CDM_CDWN_DISABLE;
+   break;
+   case DPU_CHROMA_H2V1:
+   cdm_cfg->h_cdwn_type = CDM_CDWN_COSITE;
+   cdm_cfg->v_cdwn_type = CDM_CDWN_DISABLE;
+   break;
+   case DPU_CHROMA_420:
+   cdm_cfg->h_cdwn_type = CDM_CDWN_COSITE;
+   cdm_cfg->v_cdwn_type = CDM_CDWN_OFFSITE;
+   break;
+   case DPU_CHROMA_H1V2:
+   default:
+   DPU_ERROR("[enc:%d] unsupported chroma sampling type\n",
+ DRMID(phys_enc->parent));
+   cdm_cfg->h_cdwn_type = CDM_CDWN_DISABLE;
+   cdm_cfg->v_cdwn_type = CDM_CDWN_DISABLE;
+   break;
+   }
+
+   DPU_DEBUG("[enc:%d] cdm_enable:%d,%d,%X,%d,%d,%d,%d]\n",
+ DRMID(phys_enc->parent), cdm_cfg->output_width,
+ cdm_cfg->output_height, 
cdm_cfg->output_fmt->base.pixel_format,
+ cdm_cfg->output_type, cdm_cfg->output_bit_depth,
+ cdm_cfg->h_cdwn_type, cdm_cfg->v_cdwn_type);
+
+   if (hw_cdm->ops.enable) {
+   cdm_cfg->pp_id = hw_pp->idx;
+   ret = hw_cdm->ops.enable(hw_cdm, cdm_cfg);
+   if (ret < 0) {
+   DPU_ERROR("[enc:%d] failed to enable CDM; ret:%d\n",
+ DRMID(phys_enc->parent), ret);
+   return;
+   }
+   }
+}
+
 #ifdef CONFIG_DEBUG_FS
 static int _dpu_encoder_status_show(struct seq_file *s, void *data)
 {
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
index 37ac385727c3b..310944303a056 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h
@@ -381,6 +381,15 @@ int dpu_encoder_helper_wait_for_irq(struct 
dpu_encoder_phys *phys_enc,
  */
 void dpu_encoder_helper_phys_cleanup(struct dpu_encoder_phys *phys_enc);
 
+/**
+ * dpu_encoder_helper_phys_setup_cdm - setup chroma down sampling block
+ * @phys_enc: Pointer to physical encoder
+ * @output_type: HDMI/WB
+ */
+void dpu_encoder_helper_phys_setup_cdm(struct dpu_encoder_phys *phys_enc,
+  const struct dpu_format *dpu_fmt,
+  u32 output_type);
+
 /**
  * dpu_encoder_vblank_callback - Notify virtual encoder of vblank IRQ reception
  * @drm_enc:Pointer to drm encoder structure
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_wb.c
index 072fc6950e496..400580847bde7 100644
--- 

[PATCH 00/17] Add support for CDM over DP

2024-01-25 Thread Paloma Arellano
The Chroma Down Sampling (CDM) block is a hardware component in the DPU
pipeline that includes a CSC block capable of converting RGB input from
the DPU to YUV data.

This block can be used with either HDMI, DP, or writeback interfaces.
This series adds support for the CDM block to be used with DP in
YUV420 mode format.

This series allows selection of the YUV420 format for monitors which support
certain resolutions only in YUV420, thus unblocking the validation of many
other resolutions which were previously filtered out if the connector did
not support YUV420.

This was validated using a DP connected monitor requiring the use of
YUV420 format.

This series currently works as-is. But it was also validated to function on
top of the following series, in case of future integration:

https://patchwork.freedesktop.org/series/118831/

Kuogee Hsieh (1):
  drm/msm/dpu: add support of new peripheral flush mechanism

Paloma Arellano (16):
  drm/msm/dpu: allow dpu_encoder_helper_phys_setup_cdm to work for DP
  drm/msm/dpu: move dpu_encoder_helper_phys_setup_cdm to dpu_encoder
  drm/msm/dp: rename wide_bus_en to wide_bus_supported
  drm/msm/dp: store mode YUV420 information to be used by rest of DP
  drm/msm/dp: add an API to indicate if sink supports VSC SDP
  drm/msm/dpu: move widebus logic to its own API
  drm/msm/dpu: disallow widebus en in INTF_CONFIG2 when DP is YUV420
  drm/msm/dp: change YUV420 related programming for DP
  drm/msm/dp: move parity calculation to dp_catalog
  drm/msm/dp: modify dp_catalog_hw_revision to show major and minor val
  drm/msm/dp: add VSC SDP support for YUV420 over DP
  drm/msm/dp: enable SDP and SDE periph flush update
  drm/msm/dpu: modify encoder programming for CDM over DP
  drm/msm/dpu: allow certain formats for CDM for DP
  drm/msm/dpu: reserve CDM blocks for DP if mode is YUV420
  drm/msm/dp: allow YUV420 mode for DP connector when VSC SDP supported

 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c   | 143 --
 drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h   |  12 ++
 .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h  |  13 +-
 .../drm/msm/disp/dpu1/dpu_encoder_phys_vid.c  |  36 +++-
 .../drm/msm/disp/dpu1/dpu_encoder_phys_wb.c   | 101 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_cdm.c|   2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c|  17 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h|  10 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c   |   4 +-
 drivers/gpu/drm/msm/dp/dp_audio.c | 100 ++
 drivers/gpu/drm/msm/dp/dp_catalog.c   | 182 +-
 drivers/gpu/drm/msm/dp/dp_catalog.h   |  81 +++-
 drivers/gpu/drm/msm/dp/dp_ctrl.c  |  17 +-
 drivers/gpu/drm/msm/dp/dp_display.c   |  79 +---
 drivers/gpu/drm/msm/dp/dp_panel.c |  82 +++-
 drivers/gpu/drm/msm/dp/dp_panel.h |   2 +
 drivers/gpu/drm/msm/dp/dp_reg.h   |   5 +
 drivers/gpu/drm/msm/msm_drv.h |   9 +-
 drivers/gpu/drm/msm/msm_kms.h |   3 +
 19 files changed, 655 insertions(+), 243 deletions(-)

-- 
2.39.2



Re: [PATCH v5 003/111] pwm: Provide a macro to get the parent device of a given chip

2024-01-25 Thread Florian Fainelli

On 1/25/24 04:08, Uwe Kleine-König wrote:

Currently a pwm_chip stores in its struct device *dev member a pointer
to the parent device. Preparing a change that embeds a full struct
device in struct pwm_chip, this accessor macro should be used in all
drivers directly accessing chip->dev now. This way struct pwm_chip and
this macro can be changed without having to touch all drivers in the
same change set.

Signed-off-by: Uwe Kleine-König 


Nit: this is not a macro but an inline function.
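
For reference, the accessor in question boils down to roughly this inline
(a sketch based on the patch under review, not a verbatim copy):

static inline struct device *pwmchip_parent(const struct pwm_chip *chip)
{
	return chip->dev;
}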
--
Florian





Re: [PATCH] drm/imagination: On device loss, handle unplug after critical section

2024-01-25 Thread Daniel Vetter
On Tue, Jan 23, 2024 at 01:04:24PM +, Matt Coster wrote:
> From: Donald Robson 
> 
> When the kernel driver 'loses' the device, for instance if the firmware
> stops communicating, the driver calls drm_dev_unplug(). This is
> currently done inside the drm_dev_enter() critical section, which isn't
> permitted. In addition, the bool that marks the device as lost is not
> atomic or protected by a lock.
> 
> This fix replaces the bool with an atomic that also acts as a mechanism
> to ensure the device is unplugged after drm_dev_exit(), preventing a
> concurrent call to drm_dev_enter() from succeeding in a race between it
> and drm_dev_unplug().

Uh ... an atomic_t does not give you locking.

From a quick look this entire thing smells a bit like bad design overall,
and my gut feeling is that you probably want to rip out pvr_dev->lost
outright. Or alternatively, explain what exactly this does beyond
drm_dev_enter/exit, and then probably add that functionality there instead
of hand-rolling lockless trickery in drivers.

From a quick look, keeping track of where you realize the device is lost
and then calling drm_dev_unplug() after the drm_dev_exit() is probably the
clean solution. That also means the drm_dev_unplug() is not delayed due to
a drm_dev_enter/exit section on a different thread, which is probably a
good thing.
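
A minimal sketch of that pattern (illustration only; pvr_device_check_lost()
is a hypothetical stand-in for whatever detects the dead firmware):

	bool lost = false;
	int idx;

	if (drm_dev_enter(drm, &idx)) {
		if (pvr_device_check_lost(pvr_dev))
			lost = true;	/* only record it inside the section */
		drm_dev_exit(idx);
	}

	if (lost)
		drm_dev_unplug(drm);	/* now outside any critical section */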

Cheers, Sima

> 
> Reported-by: Steven Price 
> Closes: 
> https://lore.kernel.org/dri-devel/1b957ca4-71cf-42fd-ac81-1920592b9...@arm.com/
> Fixes: cc1aeedb98ad ("drm/imagination: Implement firmware infrastructure and 
> META FW support")
> Signed-off-by: Donald Robson 
> Signed-off-by: Matt Coster 
> ---
>  drivers/gpu/drm/imagination/pvr_ccb.c  |  2 +-
>  drivers/gpu/drm/imagination/pvr_device.c   | 98 +-
>  drivers/gpu/drm/imagination/pvr_device.h   | 72 +---
>  drivers/gpu/drm/imagination/pvr_drv.c  | 87 ++-
>  drivers/gpu/drm/imagination/pvr_fw.c   | 12 +--
>  drivers/gpu/drm/imagination/pvr_fw_trace.c |  4 +-
>  drivers/gpu/drm/imagination/pvr_mmu.c  | 20 ++---
>  drivers/gpu/drm/imagination/pvr_power.c| 42 +++---
>  drivers/gpu/drm/imagination/pvr_power.h|  2 -
>  9 files changed, 237 insertions(+), 102 deletions(-)
> 
> diff --git a/drivers/gpu/drm/imagination/pvr_ccb.c 
> b/drivers/gpu/drm/imagination/pvr_ccb.c
> index 4deeac7ed40a..1fe64adc0c2c 100644
> --- a/drivers/gpu/drm/imagination/pvr_ccb.c
> +++ b/drivers/gpu/drm/imagination/pvr_ccb.c
> @@ -247,7 +247,7 @@ pvr_kccb_send_cmd_reserved_powered(struct pvr_device 
> *pvr_dev,
>   u32 old_write_offset;
>   u32 new_write_offset;
>  
> - WARN_ON(pvr_dev->lost);
> + WARN_ON(pvr_device_is_lost(pvr_dev));
>  
>   mutex_lock(&pvr_ccb->lock);
>  
> diff --git a/drivers/gpu/drm/imagination/pvr_device.c 
> b/drivers/gpu/drm/imagination/pvr_device.c
> index 1704c0268589..397491375b7d 100644
> --- a/drivers/gpu/drm/imagination/pvr_device.c
> +++ b/drivers/gpu/drm/imagination/pvr_device.c
> @@ -6,14 +6,15 @@
>  
>  #include "pvr_fw.h"
>  #include "pvr_params.h"
> -#include "pvr_power.h"
>  #include "pvr_queue.h"
>  #include "pvr_rogue_cr_defs.h"
>  #include "pvr_stream.h"
>  #include "pvr_vm.h"
>  
> +#include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -556,6 +557,101 @@ pvr_device_fini(struct pvr_device *pvr_dev)
>   pvr_device_gpu_fini(pvr_dev);
>  }
>  
> +/**
> + * pvr_device_enter() - Try to enter device critical section.
> + * @pvr_dev: Target PowerVR device.
> + * @idx: Pointer to index that will be passed to the matching 
> pvr_device_exit().
> + *
> + * Use this in place of drm_dev_enter() within this driver.
> + *
> + * Returns:
> + *  * %true if the critical section was entered, or
> + *  * %false otherwise.
> + */
> +bool pvr_device_enter(struct pvr_device *pvr_dev, int *idx)
> +{
> + const enum pvr_device_state old_state =
> + atomic_cmpxchg(&pvr_dev->state,
> +PVR_DEVICE_STATE_PRESENT,
> +PVR_DEVICE_STATE_ENTERED);
> +
> + switch (old_state) {
> + case PVR_DEVICE_STATE_PRESENT:
> + case PVR_DEVICE_STATE_ENTERED:
> + return drm_dev_enter(from_pvr_device(pvr_dev), idx);
> +
> + case PVR_DEVICE_STATE_LOST:
> + case PVR_DEVICE_STATE_LOST_UNPLUGGED:
> + WARN_ONCE(1, "Attempt to use GPU after becoming lost.");
> + break;
> + }
> +
> + return false;
> +}
> +
> +/**
> + * pvr_device_exit() - Exit a device critical section.
> + * @pvr_dev: Target PowerVR device.
> + * @idx: Index given by matching pvr_device_enter().
> + *
> + * Use this in place of drm_dev_exit() within this driver.
> + */
> +void pvr_device_exit(struct pvr_device *pvr_dev, int idx)
> +{
> + const enum pvr_device_state old_state =
> + atomic_cmpxchg(&pvr_dev->state,
> +PVR_DEVICE_STATE_ENTERED,
> +PVR_DEVICE_STATE_PRESENT);
> +
> + switch (old_state) {
> +  

Re: [PATCH 2/2] drm/bridge: samsung-dsim: Fix porch calculation rounding

2024-01-25 Thread Adam Ford
On Mon, Dec 11, 2023 at 9:33 PM Adam Ford  wrote:
>
> When using video sync pulses, the HFP, HBP, and HSA are divided between
> the available lanes if there is more than one lane.  For certain
> timings and lane configurations, the HFP may not be evenly divisible.
> If the HFP is rounded down, it ends up being too small which can cause
> some monitors to not sync properly. In these instances, adjust htotal
> and hsync to round the HFP up, and recalculate the htotal.
>
> Tested-by: Frieder Schrempf  # Kontron BL 
> i.MX8MM with HDMI monitor
> Signed-off-by: Adam Ford 

Gentle nudge on this one.  Basically this fixes an issue with the 8MP;
it's still unknown why it doesn't work on the 8MM or 8MN, but Frieder
confirmed there are no regressions on either of those.

adam


>
> diff --git a/drivers/gpu/drm/bridge/samsung-dsim.c 
> b/drivers/gpu/drm/bridge/samsung-dsim.c
> index 239d253a7d71..f5795da1d8bb 100644
> --- a/drivers/gpu/drm/bridge/samsung-dsim.c
> +++ b/drivers/gpu/drm/bridge/samsung-dsim.c
> @@ -1628,6 +1628,27 @@ static int samsung_dsim_atomic_check(struct drm_bridge 
> *bridge,
> adjusted_mode->flags |= (DRM_MODE_FLAG_PHSYNC | 
> DRM_MODE_FLAG_PVSYNC);
> }
>
> +   /*
> +* When using video sync pulses, the HFP, HBP, and HSA are divided 
> between
> +* the available lanes if there is more than one lane.  For certain
> +* timings and lane configurations, the HFP may not be evenly 
> divisible.
> +* If the HFP is rounded down, it ends up being too small which can 
> cause
> +* some monitors to not sync properly. In these instances, adjust 
> htotal
> +* and hsync to round the HFP up, and recalculate the htotal. Through 
> trial
> +* and error, it appears that the HBP and HSA do not need the same
> +* correction that HFP does.
> +*/
> +   if (dsi->mode_flags & MIPI_DSI_MODE_VIDEO_SYNC_PULSE && dsi->lanes > 
> 1) {
> +   int hfp = adjusted_mode->hsync_start - 
> adjusted_mode->hdisplay;
> +   int remainder = hfp % dsi->lanes;
> +
> +   if (remainder) {
> +   adjusted_mode->hsync_start += remainder;
> +   adjusted_mode->hsync_end   += remainder;
> +   adjusted_mode->htotal  += remainder;
> +   }
> +   }
> +
> return 0;
>  }
>
> --
> 2.40.1
>


Re: [PATCH] drm/syncobj: handle NULL fence in syncobj_eventfd_entry_func

2024-01-25 Thread Erik Kurzinger
Sorry, I realized there is a mistake in this patch after sending it out. It 
results in a use-after-free of "entry". I've sent out an updated version which 
should avoid the issue.

On 1/25/24 10:03, Erik Kurzinger wrote:
> During syncobj_eventfd_entry_func, dma_fence_chain_find_seqno may set
> the fence to NULL if the given seqno is signaled and a later seqno has
> already been submitted. In that case, the eventfd should be signaled
> immediately which currently does not happen.
> 
> This is a similar issue to the one addressed by b19926d4f3a6
> ("drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence")
> 
> As a fix, if the return value of dma_fence_chain_find_seqno indicates
> success but it sets the fence to NULL, we should simply signal the
> eventfd immediately.
> 
> Signed-off-by: Erik Kurzinger 
> Fixes: c7a472297169 ("drm/syncobj: add IOCTL to register an eventfd")
> ---
>  drivers/gpu/drm/drm_syncobj.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index e04965878a08..cc3af1084950 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -1441,10 +1441,21 @@ syncobj_eventfd_entry_func(struct drm_syncobj 
> *syncobj,
>  
>   /* This happens inside the syncobj lock */
>   fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, 1));
> + if (!fence)
> + return;
> +
>   ret = dma_fence_chain_find_seqno(&fence, entry->point);
> - if (ret != 0 || !fence) {
> + if (ret != 0) {
> + /* The given seqno has not been submitted yet. */
>   dma_fence_put(fence);
>   return;
> + } else if (!fence) {
> + /* If dma_fence_chain_find_seqno returns 0 but sets the fence
> +  * to NULL, it implies that the given seqno is signaled and a
> +  * later seqno has already been submitted. Signal the eventfd
> +  * immediately in that case. */
> + eventfd_signal(entry->ev_fd_ctx, 1);
> + syncobj_eventfd_entry_free(entry);
>   }
>  
>   list_del_init(&entry->node);



[PATCH v2] drm/syncobj: handle NULL fence in syncobj_eventfd_entry_func

2024-01-25 Thread Erik Kurzinger
During syncobj_eventfd_entry_func, dma_fence_chain_find_seqno may set
the fence to NULL if the given seqno is signaled and a later seqno has
already been submitted. In that case, the eventfd should be signaled
immediately which currently does not happen.

This is a similar issue to the one addressed by b19926d4f3a6
("drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence")

As a fix, if the return value of dma_fence_chain_find_seqno indicates
success but it sets the fence to NULL, we will assign a stub fence to
ensure the following code still signals the eventfd.

v1 -> v2: assign a stub fence instead of signaling the eventfd

Signed-off-by: Erik Kurzinger 
Fixes: c7a472297169 ("drm/syncobj: add IOCTL to register an eventfd")
---
 drivers/gpu/drm/drm_syncobj.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index e04965878a08..10476204f8b0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1441,10 +1441,20 @@ syncobj_eventfd_entry_func(struct drm_syncobj *syncobj,
 
/* This happens inside the syncobj lock */
fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, 1));
+   if (!fence)
+   return;
+
ret = dma_fence_chain_find_seqno(&fence, entry->point);
-   if (ret != 0 || !fence) {
+   if (ret != 0) {
+   /* The given seqno has not been submitted yet. */
dma_fence_put(fence);
return;
+   } else if (!fence) {
+   /* If dma_fence_chain_find_seqno returns 0 but sets the fence
+* to NULL, it implies that the given seqno is signaled and a
+* later seqno has already been submitted. Assign a stub fence
+* so that the eventfd still gets signaled below. */
+   fence = dma_fence_get_stub();
}
 
list_del_init(&entry->node);
-- 
2.43.0



RE: Making drm_gpuvm work across gpu devices

2024-01-25 Thread Zeng, Oak


> -Original Message-
> From: Felix Kuehling 
> Sent: Thursday, January 25, 2024 12:16 PM
> To: Zeng, Oak ; Christian König
> ; Danilo Krummrich ; Dave
> Airlie ; Daniel Vetter ; Shah, Ankur N
> ; Winiarski, Michal 
> Cc: Welty, Brian ; dri-devel@lists.freedesktop.org; 
> intel-
> x...@lists.freedesktop.org; Bommu, Krishnaiah ;
> Ghimiray, Himal Prasad ;
> thomas.hellst...@linux.intel.com; Vishwanathapura, Niranjana
> ; Brost, Matthew
> ; Gupta, saurabhg 
> Subject: Re: Making drm_gpuvm work across gpu devices
> 
> 
> On 2024-01-24 20:17, Zeng, Oak wrote:
> >
> > Hi Christian,
> >
> > Even though I mentioned KFD design, I didn’t mean to copy the KFD
> > design. I also had hard time to understand the difficulty of KFD under
> > virtualization environment.
> >
> The problem with virtualization is related to virtualization design
> choices. There is a single process that proxies requests from multiple
> processes in one (or more?) VMs to the GPU driver. That means, we need a
> single process with multiple contexts (and address spaces). One proxy
> process on the host must support multiple guest address spaces.

My first response is: why can't processes in the virtual machine open the
/dev/kfd device themselves?

Also try to picture why the base amdgpu driver (which is per hardware device)
doesn't have this problem: it creates multiple contexts under a single amdgpu
device, each context servicing one guest process.
> 
> I don't know much more than these very high level requirements, and I
> only found out about those a few weeks ago. Due to my own bias I can't
> comment whether there are bad design choices in the proxy architecture
> or in KFD or both. The way we are considering fixing this, is to enable
> creating multiple KFD contexts in the same process. Each of those
> contexts will still represent a shared virtual address space across
> devices (but not the CPU). Because the device address space is not
> shared with the CPU, we cannot support our SVM API in this situation.
> 

One kfd process, multiple contexts, each context with a shared address space
across devices... I do see some complications there.

> I still believe that it makes sense to have the kernel mode driver aware
> of a shared virtual address space at some level. A per-GPU API and an
> API that doesn't require matching CPU and GPU virtual addresses would
> enable more flexibility at the cost of duplicate information tracking for
> multiple devices and duplicate overhead for things like MMU notifiers
> and interval tree data structures. Having to coordinate multiple devices
> with potentially different address spaces would probably make it more
> awkward to implement memory migration. The added flexibility would go
> mostly unused, except in some very niche applications.
> 
> Regards,
>    Felix
> 
> 
> > For us, Xekmd doesn't need to know it is running under bare metal or
> > virtualized environment. Xekmd is always a guest driver. All the
> > virtual address used in xekmd is guest virtual address. For SVM, we
> > require all the VF devices share one single shared address space with
> > guest CPU program. So all the design works in bare metal environment
> > can automatically work under virtualized environment. +@Shah, Ankur N
> >  +@Winiarski, Michal
> >  to backup me if I am wrong.
> >
> > Again, shared virtual address space b/t cpu and all gpu devices is a
> > hard requirement for our system allocator design (which means
> > malloc’ed memory, cpu stack variables, globals can be directly used in
> > gpu program. Same requirement as kfd SVM design). This was aligned
> > with our user space software stack.
> >
> > For anyone who want to implement system allocator, or SVM, this is a
> > hard requirement. I started this thread hoping I can leverage the
> > drm_gpuvm design to manage the shared virtual address space (as the
> > address range split/merge function was scary to me and I didn’t want to
> > re-invent it). I guess my takeaway from you and Danilo is that this
> > approach is a NAK. Thomas also mentioned to me that drm_gpuvm is overkill
> > for our svm address range split/merge. So I will make things work
> > first by managing the address ranges inside xekmd internally. I can revisit
> > the drm-gpuvm approach in the future.
> >
> > Maybe a pseudo user program can illustrate our programming model:
> >
> > Fd0 = open(card0)
> >
> > Fd1 = open(card1)
> >
> > Vm0 = xe_vm_create(fd0) //driver creates process xe_svm on the process's
> > first vm_create
> >
> > Vm1 = xe_vm_create(fd1) //driver re-uses xe_svm created above if called
> > from same process
> >
> > Queue0 = xe_exec_queue_create(fd0, vm0)
> >
> > Queue1 = xe_exec_queue_create(fd1, vm1)
> >
> > //check p2p capability calling L0 API….
> >
> > ptr = malloc()//this replace bo_create, vm_bind, dma-import/export
> >
> > Xe_exec(queue0, ptr) //submit gpu job which uses ptr, on card0
> >
> > Xe_exec(queue1, ptr) //submit gpu job which uses ptr, on card1
> 

Re: Making drm_gpuvm work across gpu devices

2024-01-25 Thread Daniel Vetter
On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:
> Am 23.01.24 um 20:37 schrieb Zeng, Oak:
> > [SNIP]
> > Yes most API are per device based.
> > 
> > One exception I know is actually the kfd SVM API. If you look at the 
> > svm_ioctl function, it is per-process based. Each kfd_process represent a 
> > process across N gpu devices.
> 
> Yeah and that was a big mistake in my opinion. We should really not do that
> ever again.
> 
> > Need to say, kfd SVM represent a shared virtual address space across CPU 
> > and all GPU devices on the system. This is by the definition of SVM (shared 
> > virtual memory). This is very different from our legacy gpu *device* driver 
> > which works for only one device (i.e., if you want one device to access 
> > another device's memory, you will have to use dma-buf export/import etc).
> 
> Exactly that thinking is what we have currently found as blocker for a
> virtualization projects. Having SVM as device independent feature which
> somehow ties to the process address space turned out to be an extremely bad
> idea.
> 
> The background is that this only works for some use cases but not all of
> them.
> 
> What's working much better is to just have a mirror functionality which says
> that a range A..B of the process address space is mapped into a range C..D
> of the GPU address space.
> 
> Those ranges can then be used to implement the SVM feature required for
> higher level APIs and not something you need at the UAPI or even inside the
> low level kernel memory management.
> 
> When you talk about migrating memory to a device you also do this on a per
> device basis and *not* tied to the process address space. If you then get
> crappy performance because userspace gave contradicting information where to
> migrate memory then that's a bug in userspace and not something the kernel
> should try to prevent somehow.
> 
> [SNIP]
> > > I think if you start using the same drm_gpuvm for multiple devices you
> > > will sooner or later start to run into the same mess we have seen with
> > > KFD, where we moved more and more functionality from the KFD to the DRM
> > > render node because we found that a lot of the stuff simply doesn't work
> > > correctly with a single object to maintain the state.
> > As I understand it, KFD is designed to work across devices. A single pseudo 
> > /dev/kfd device represent all hardware gpu devices. That is why during kfd 
> > open, many pdd (process device data) is created, each for one hardware 
> > device for this process.
> 
> Yes, I'm perfectly aware of that. And I can only repeat myself that I see
> this design as a rather extreme failure. And I think it's one of the reasons
> why NVidia is so dominant with Cuda.
> 
> This whole approach KFD takes was designed with the idea of extending the
> CPU process into the GPUs, but this idea only works for a few use cases and
> is not something we should apply to drivers in general.
> 
> A very good example are virtualization use cases where you end up with CPU
> address != GPU address because the VAs are actually coming from the guest VM
> and not the host process.
> 
> SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This should not have
> any influence on the design of the kernel UAPI.
> 
> If you want to do something similar as KFD for Xe I think you need to get
> explicit permission to do this from Dave and Daniel and maybe even Linus.

I think the one and only exception where an SVM uapi like in kfd makes
sense, is if the _hardware_ itself, not the software stack defined
semantics that you've happened to build on top of that hw, enforces a 1:1
mapping with the cpu process address space.

Which means your hardware is using PASID, IOMMU based translation, PCI-ATS
(address translation services) or whatever your hw calls it and has _no_
device-side pagetables on top. Which from what I've seen all devices with
device-memory have, simply because they need some place to store whether
that memory is currently in device memory or should be translated using
PASID. Currently there's no gpu that works with PASID only, but there are
some on-cpu-die accelerator things that do work like that.

Maybe in the future there will be some accelerators that are fully cpu
cache coherent (including atomics) with something like CXL, and the
on-device memory is managed as normal system memory with struct page as
ZONE_DEVICE and accelerator va -> physical address translation is only
done with PASID ... but for now I haven't seen that, definitely not in
upstream drivers.

And the moment you have some per-device pagetables or per-device memory
management of some sort (like using gpuva mgr) then I'm 100% agreeing with
Christian that the kfd SVM model is too strict and not a great idea.

Cheers, Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] nouveau: rip out fence irq allow/block sequences.

2024-01-25 Thread Daniel Vetter
On Tue, Jan 23, 2024 at 05:25:38PM +1000, Dave Airlie wrote:
> From: Dave Airlie 
> 
> fences are signalled on nvidia hw using non-stall interrupts.
> 
> non-stall interrupts are not latched from my reading.
> 
> When nouveau emits a fence, it requests a NON_STALL signalling,
> but it only calls the interface to allow the non-stall irq to happen
> after it has already emitted the fence. A recent change
> eacabb546271 ("nouveau: push event block/allowing out of the fence context")
> made this worse by pushing out the fence allow/block to a workqueue.
> 
> However I can't see how this could ever work great, since when
> enable signalling is called, the semaphore has already been emitted
> to the ring, and the hw could already have tried to set the bits,
> but it's been masked off. Changing the allowed mask later won't make
> the interrupt get called again.
> 
> For now rip all of this out.
> 
> This fixes a bunch of stalls seen running VK CTS sync tests.
> 
> Signed-off-by: Dave Airlie 
> ---
>  drivers/gpu/drm/nouveau/nouveau_fence.c | 77 +
>  drivers/gpu/drm/nouveau/nouveau_fence.h |  2 -
>  2 files changed, 16 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
> b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 5057d976fa57..d6d50cdccf75 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -50,24 +50,14 @@ nouveau_fctx(struct nouveau_fence *fence)
>   return container_of(fence->base.lock, struct nouveau_fence_chan, lock);
>  }
>  
> -static int
> +static void
>  nouveau_fence_signal(struct nouveau_fence *fence)
>  {
> - int drop = 0;
> -
>   dma_fence_signal_locked(&fence->base);
>   list_del(&fence->head);
>   rcu_assign_pointer(fence->channel, NULL);
>  
> - if (test_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags)) {
> - struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
> -
> - if (atomic_dec_and_test(&fctx->notify_ref))
> - drop = 1;
> - }
> -
>   dma_fence_put(&fence->base);
> - return drop;
>  }
>  
>  static struct nouveau_fence *
> @@ -93,8 +83,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, 
> int error)
>   if (error)
>   dma_fence_set_error(&fence->base, error);
>  
> - if (nouveau_fence_signal(fence))
> - nvif_event_block(>event);
> + nouveau_fence_signal(fence);
>   }
>   fctx->killed = 1;
>   spin_unlock_irqrestore(&fctx->lock, flags);
> @@ -103,8 +92,8 @@ nouveau_fence_context_kill(struct nouveau_fence_chan 
> *fctx, int error)
>  void
>  nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
>  {
> - cancel_work_sync(&fctx->allow_block_work);
>   nouveau_fence_context_kill(fctx, 0);
> + nvif_event_block(&fctx->event);
>   nvif_event_dtor(&fctx->event);
>   fctx->dead = 1;
>  
> @@ -127,11 +116,10 @@ nouveau_fence_context_free(struct nouveau_fence_chan 
> *fctx)
>   kref_put(&fctx->fence_ref, nouveau_fence_context_put);
>  }
>  
> -static int
> +static void
>  nouveau_fence_update(struct nouveau_channel *chan, struct nouveau_fence_chan 
> *fctx)
>  {
>   struct nouveau_fence *fence;
> - int drop = 0;
>   u32 seq = fctx->read(chan);
>  
>   while (!list_empty(&fctx->pending)) {
> @@ -140,10 +128,8 @@ nouveau_fence_update(struct nouveau_channel *chan, 
> struct nouveau_fence_chan *fc
>   if ((int)(seq - fence->base.seqno) < 0)
>   break;
>  
> - drop |= nouveau_fence_signal(fence);
> + nouveau_fence_signal(fence);
>   }
> -
> - return drop;
>  }
>  
>  static int
> @@ -160,26 +146,13 @@ nouveau_fence_wait_uevent_handler(struct nvif_event 
> *event, void *repv, u32 repc
>  
>   fence = list_entry(fctx->pending.next, typeof(*fence), head);
>   chan = rcu_dereference_protected(fence->channel, 
> lockdep_is_held(&fctx->lock));
> - if (nouveau_fence_update(chan, fctx))
> - ret = NVIF_EVENT_DROP;
> + nouveau_fence_update(chan, fctx);
>   }
>   spin_unlock_irqrestore(&fctx->lock, flags);
>  
>   return ret;
>  }
>  
> -static void
> -nouveau_fence_work_allow_block(struct work_struct *work)
> -{
> - struct nouveau_fence_chan *fctx = container_of(work, struct 
> nouveau_fence_chan,
> -allow_block_work);
> -
> - if (atomic_read(&fctx->notify_ref) == 0)
> - nvif_event_block(&fctx->event);
> - else
> - nvif_event_allow(&fctx->event);
> -}
> -
>  void
>  nouveau_fence_context_new(struct nouveau_channel *chan, struct 
> nouveau_fence_chan *fctx)
>  {
> @@ -191,7 +164,6 @@ nouveau_fence_context_new(struct nouveau_channel *chan, 
> struct nouveau_fence_cha
>   } args;
>   int ret;
>  
> - INIT_WORK(&fctx->allow_block_work, nouveau_fence_work_allow_block);
>   INIT_LIST_HEAD(&fctx->flip);
>   INIT_LIST_HEAD(&fctx->pending);
>   

Re: [PATCH] drm/atomic-helpers: remove legacy_cursor_update hacks

2024-01-25 Thread Daniel Vetter
On Tue, Jan 23, 2024 at 06:09:05AM +, Jason-JH Lin (林睿祥) wrote:
> Hi Maxime, Daniel,
> 
> We encountered similar issue with mediatek SoCs.
> 
> We have found that in drm_atomic_helper_commit_rpm(), when disabling
> the cursor plane, the old_state->legacy_cursor_update in
> drm_atomic_wait_for_vblank() is set to true.
> As a result, we are not actually waiting for a vblank to let our
> hardware finish closing the cursor plane. Subsequently, the execution
> proceeds to drm_atomic_helper_cleanup_planes() to free the cursor
> buffer. This can lead to use-after-free issues with our hardware.
> 
> Could you please apply this patch to fix our problem?
> Or are there any considerations for not applying this patch?

Mostly it needs someone to collect a pile of acks/tested-by and then land
it.

I'd be _very_ happy if someone else can take care of that ...

There's also the potential issue that it might slow down some of the
legacy X11 use-cases that really needed a non-blocking cursor, but I think
all the drivers where this matters have switched over to the async plane
update stuff meanwhile. So hopefully that's good.

Cheers, Sima
> 
> Regards,
> Jason-JH.Lin
> 
> On Tue, 2023-03-07 at 15:56 +0100, Maxime Ripard wrote:
> > Hi,
> > 
> > On Thu, Feb 16, 2023 at 12:12:13PM +0100, Daniel Vetter wrote:
> > > The stuff never really worked, and leads to lots of fun because it
> > > out-of-order frees atomic states. Which upsets KASAN, among other
> > > things.
> > > 
> > > For async updates we now have a more solid solution with the
> > > ->atomic_async_check and ->atomic_async_commit hooks. Support for
> > > that
> > > for msm and vc4 landed. nouveau and i915 have their own commit
> > > routines, doing something similar.
> > > 
> > > For everyone else it's probably better to remove the use-after-free
> > > bug, and encourage folks to use the async support instead. The
> > > affected drivers which register a legacy cursor plane and don't
> > > either
> > > use the new async stuff or their own commit routine are: amdgpu,
> > > atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and
> > > vmwgfx.
> > > 
> > > Inspired by an amdgpu bug report.
> > 
> > Thanks for submitting that patch. It's been in the downstream RPi
> > tree
> > for a while, so I'd really like it to be merged eventually :)
> > 
> > Acked-by: Maxime Ripard 
> > 
> > Maxime

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 0/2] kernel-doc: Do not pre-process comments

2024-01-25 Thread Daniel Vetter
On Mon, Jan 22, 2024 at 10:31:50AM +0100, Anna-Maria Behnsen wrote:
> Hi,
> 
> this is a repost of the RFC queue
> https://lkml.kernel.org/r/20240116151456.48238-1-anna-ma...@linutronix.de
> 
> Jonathan Corbet is fine with this change and mentioned in an answer the
> following:
> 
>   "The kernel-doc change should really go together with the DRM change.
>   I'm happy to carry both with an ack from DRMland or have the kernel-doc
>   patch go through the DRM tree, whichever is easiest."

Agree, that sounds like the simplest merge plan and I don't think we have
anything in-flight for vram helpers that would cause conflicts. For
merging the drm patch through Jon's -doc tree:

Acked-by: Daniel Vetter 

> 
> But back to the patchset: Commit 654784284430 ("kernel-doc: bugfix -
> multi-line macros") introduces pre-processing of backslashes at the end of
> a line to not break multi-line macros. This pre-processing is done
> regardless of whether it is inside code or inside a comment.
> 
> This illustation of a hierarchy as a code block inside a kernel-doc comment
> has a backslash at the end of the line:
> 
> ---8<---
> /**
>  * DOC: hierarchy
>  *
>  *            Top Level
>  *           /         \
>  *      Child A       Child B
>  */
> ---8<---
> 
> It will be displayed as:
> 
> ---8<---
>            Top Level
>           /*      Child A       Child B
> ---8<---
> 
> 
> When I asked for a solution on the linux-doc mailing list, I got some
> suggestions with workarounds and also the suggestion from Matthew Wilcox
> to adapt the backslash preprocessing in the kernel-doc script. I tested it
> and then fixed the newly produced warnings, which are covered in the first
> patch. The processing of the documentation seems to work - but please don't
> rely on my tests alone, as I'm neither a perl nor a kernel-doc expert.
> 
> Thanks,
> 
>   Anna-Maria
> 
> 
> 
> Anna-Maria Behnsen (2):
>   drm/vram-helper: Fix 'multi-line' kernel-doc comments
>   scripts/kernel-doc: Do not process backslash lines in comments
> 
>  drivers/gpu/drm/drm_gem_vram_helper.c | 44 ---
>  include/drm/drm_gem_vram_helper.h | 16 +-
>  scripts/kernel-doc|  2 +-
>  3 files changed, 29 insertions(+), 33 deletions(-)
> 
> -- 
> 2.39.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v2 1/3] drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE flag is set

2024-01-25 Thread Daniel Vetter
On Fri, Jan 19, 2024 at 08:32:06AM -0800, Erik Kurzinger wrote:
> When waiting for a syncobj timeline point whose fence has not yet been
> submitted with the WAIT_FOR_SUBMIT flag, a callback is registered using
> drm_syncobj_fence_add_wait and the thread is put to sleep until the
> timeout expires. If the fence is submitted before then,
> drm_syncobj_add_point will wake up the sleeping thread immediately which
> will proceed to wait for the fence to be signaled.
> 
> However, if the WAIT_AVAILABLE flag is used instead,
> drm_syncobj_fence_add_wait won't get called, meaning the waiting thread
> will always sleep for the full timeout duration, even if the fence gets
> submitted earlier. If it turns out that the fence *has* been submitted
> by the time it eventually wakes up, it will still indicate to userspace
> that the wait completed successfully (it won't return -ETIME), but it
> will have taken much longer than it should have.
> 
> To fix this, we must call drm_syncobj_fence_add_wait if *either* the
> WAIT_FOR_SUBMIT flag or the WAIT_AVAILABLE flag is set. The only
> difference being that with WAIT_FOR_SUBMIT we will also wait for the
> fence to be signaled after it has been submitted while with
> WAIT_AVAILABLE we will return immediately.
> 
> IGT test patch: 
> https://lists.freedesktop.org/archives/igt-dev/2024-January/067537.html
> 
> v1 -> v2: adjust lockdep_assert_none_held_once condition
> 
> Fixes: 01d6c3578379 ("drm/syncobj: add support for timeline point wait v8")
> Signed-off-by: Erik Kurzinger 

Yeah I think this series catches now all the corner cases I spotted in v1.
On the series:

Reviewed-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/drm_syncobj.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 94ebc71e5be5..97be8b140599 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -1058,7 +1058,8 @@ static signed long 
> drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
>   uint64_t *points;
>   uint32_t signaled_count, i;
>  
> - if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT)
> + if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
> +  DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE))
>   lockdep_assert_none_held_once();
>  
>   points = kmalloc_array(count, sizeof(*points), GFP_KERNEL);
> @@ -1127,7 +1128,8 @@ static signed long 
> drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
>* fallthough and try a 0 timeout wait!
>*/
>  
> - if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
> + if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
> +  DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
>   for (i = 0; i < count; ++i)
>   drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]);
>   }
> -- 
> 2.43.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v5 1/6] dma-buf: Add dma_buf_{begin,end}_access()

2024-01-25 Thread Daniel Vetter
On Fri, Jan 19, 2024 at 03:13:57PM +0100, Paul Cercueil wrote:
> These functions should be used by device drivers when they start and
> stop accessing the data of DMABUF. It allows DMABUF importers to cache
> the dma_buf_attachment while ensuring that the data they want to access
> is available for their device when the DMA transfers take place.
> 
> Signed-off-by: Paul Cercueil 

Putting my detailed review comments here just so I don't have to remember
them any longer. We need to reach consensus on the big picture direction
first.

> 
> ---
> v5: New patch
> ---
>  drivers/dma-buf/dma-buf.c | 66 +++
>  include/linux/dma-buf.h   | 37 ++
>  2 files changed, 103 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 8fe5aa67b167..a8bab6c18fcd 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -830,6 +830,8 @@ static struct sg_table * __map_dma_buf(struct 
> dma_buf_attachment *attach,
>   * - dma_buf_mmap()
>   * - dma_buf_begin_cpu_access()
>   * - dma_buf_end_cpu_access()
> + * - dma_buf_begin_access()
> + * - dma_buf_end_access()
>   * - dma_buf_map_attachment_unlocked()
>   * - dma_buf_unmap_attachment_unlocked()
>   * - dma_buf_vmap_unlocked()
> @@ -1602,6 +1604,70 @@ void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, 
> struct iosys_map *map)
>  }
>  EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap_unlocked, DMA_BUF);
>  
> +/**
> + * @dma_buf_begin_access - Call before any hardware access from/to the DMABUF
> + * @attach:  [in]attachment used for hardware access
> + * @sg_table:[in]scatterlist used for the DMA transfer
> + * @direction:  [in]direction of DMA transfer

I think for the kerneldoc it would be good to point at the other function
here, explain why this might be needed and that for most reasonable
devices it's probably not, and link between the function pairs.

Also we need to document that dma_buf_map does an implied
dma_buf_begin_access (because dma_sg_map does an implied
dma_sg_sync_for_device) and vice versa for dma_buf_end_access. Which also
means that dma_buf_map/unmap should link to these functions in their
kerneldoc too.
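
Something along these lines in the kerneldoc would do, as a sketch (the
wording is mine, not from the patch):

 * Most importers don't need these calls: dma_buf_map_attachment() already
 * implies a dma_buf_begin_access() (dma_map_sg() syncs for the device),
 * and dma_buf_unmap_attachment() implies a dma_buf_end_access(). Only
 * importers which cache the attachment mapping across transfers need to
 * bracket each hardware access explicitly.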

Finally I think we should document here that it's ok to call these from
dma_fence signalling critical section and link to the relevant discussion
in the dma_fence docs for that.

> + */
> +int dma_buf_begin_access(struct dma_buf_attachment *attach,
> +  struct sg_table *sgt, enum dma_data_direction dir)
> +{
> + struct dma_buf *dmabuf;
> + bool cookie;
> + int ret;
> +
> + if (WARN_ON(!attach))
> + return -EINVAL;
> +
> + dmabuf = attach->dmabuf;
> +
> + if (!dmabuf->ops->begin_access)
> + return 0;
> +
> + cookie = dma_fence_begin_signalling();
> + ret = dmabuf->ops->begin_access(attach, sgt, dir);
> + dma_fence_end_signalling(cookie);
> +
> + if (WARN_ON_ONCE(ret))
> + return ret;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(dma_buf_begin_access, DMA_BUF);

So explicit device side coherency management is not going to be very
compatible with dynamic buffer management, where the exporter can move the
buffer around. The reason for that is that for a dynamic exporter we cache
the sg mapping, which means any device-side coherency management which
dma_buf_map/unmap would do will not happen (since it's cached),
potentially breaking things for importers that rely on the assumption that
dma_buf_map/unmap already implies dma_buf_begin/end_device_access.

I think for now it's sufficient to put a WARN_ON(dma_buf_is_dynamic() &&
ops->begin|end_access) or similar into dma_buf_export and bail out with an
error to catch that.
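
A rough sketch of that check (an illustration only; it leans on the
convention that a dynamic exporter is one that provides ->pin, which is
what dma_buf_is_dynamic() tests):

	/* in dma_buf_export(), next to the existing ops sanity checks */
	if (WARN_ON(exp_info->ops->pin &&
		    (exp_info->ops->begin_access || exp_info->ops->end_access)))
		return ERR_PTR(-EINVAL);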

Aside from the nits I do think this is roughly what we briefly discussed
well over a decade ago in the original dma-buf kickoff meeting at a linaro
connect in Budapest :-)

Cheers, Sima

> +
> +/**
> + * @dma_buf_end_access - Call after any hardware access from/to the DMABUF
> + * @attach:  [in]attachment used for hardware access
> + * @sg_table:[in]scatterlist used for the DMA transfer
> + * @direction:  [in]direction of DMA transfer
> + */
> +int dma_buf_end_access(struct dma_buf_attachment *attach,
> +struct sg_table *sgt, enum dma_data_direction dir)
> +{
> + struct dma_buf *dmabuf;
> + bool cookie;
> + int ret;
> +
> + if (WARN_ON(!attach))
> + return -EINVAL;
> +
> + dmabuf = attach->dmabuf;
> +
> + if (!dmabuf->ops->end_access)
> + return 0;
> +
> + cookie = dma_fence_begin_signalling();
> + ret = dmabuf->ops->end_access(attach, sgt, dir);
> + dma_fence_end_signalling(cookie);
> +
> + if (WARN_ON_ONCE(ret))
> + return ret;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(dma_buf_end_access, DMA_BUF);
> +
>  #ifdef CONFIG_DEBUG_FS
>  static int dma_buf_debug_show(struct 

[PATCH] drm/syncobj: handle NULL fence in syncobj_eventfd_entry_func

2024-01-25 Thread Erik Kurzinger
During syncobj_eventfd_entry_func, dma_fence_chain_find_seqno may set
the fence to NULL if the given seqno is signaled and a later seqno has
already been submitted. In that case, the eventfd should be signaled
immediately which currently does not happen.

This is a similar issue to the one addressed by b19926d4f3a6
("drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence")

As a fix, if the return value of dma_fence_chain_find_seqno indicates
success but it sets the fence to NULL, we should simply signal the
eventfd immediately.

Signed-off-by: Erik Kurzinger 
Fixes: c7a472297169 ("drm/syncobj: add IOCTL to register an eventfd")
---
 drivers/gpu/drm/drm_syncobj.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index e04965878a08..cc3af1084950 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1441,10 +1441,21 @@ syncobj_eventfd_entry_func(struct drm_syncobj *syncobj,
 
/* This happens inside the syncobj lock */
fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, 1));
+   if (!fence)
+   return;
+
ret = dma_fence_chain_find_seqno(&fence, entry->point);
-   if (ret != 0 || !fence) {
+   if (ret != 0) {
+   /* The given seqno has not been submitted yet. */
dma_fence_put(fence);
return;
+   } else if (!fence) {
+   /* If dma_fence_chain_find_seqno returns 0 but sets the fence
+* to NULL, it implies that the given seqno is signaled and a
+* later seqno has already been submitted. Signal the eventfd
+* immediately in that case. */
+   eventfd_signal(entry->ev_fd_ctx, 1);
+   syncobj_eventfd_entry_free(entry);
}
 
list_del_init(&entry->node);
-- 
2.43.0



Re: BUG [RESEND][NEW BUG]: kernel NULL pointer dereference, address: 0000000000000008

2024-01-25 Thread Mirsad Todorovac

Hi Ma Jun,

Greetings again.

So, I just tested the recommended patch and the issue with the graphical login
screen was successfully resolved.

Thank you very much for your prompt reviews and recommended patches.

God bless.

Best regards,
Mirsad Todorovac

On 1/25/24 10:29, Mirsad Todorovac wrote:

Hi Ma Jun,

Copy that. This appears to be the exact problem, and thank you for
reviewing the bug report at such a short notice.

I apologise for the wrong assertion.

The patch you sent then just triggered another bug, one which does not
manifest without the patch (a NULL pointer dereference appears instead).

But of course, it is no gain to remove your patch and get the NULL ptr
dereference back, so a proper fix is required.

Thanks again.

Best regards,
Mirsad Todorovac

On 1/25/2024 8:38 AM, Ma, Jun wrote:

Hi Mirsad,


On 1/25/2024 1:48 AM, Mirsad Todorovac wrote:

Hi, Ma Jun,

Normally, I would reply under the quoted text, but I will adjust to your 
convention.

I have just discovered that your patch causes the Ubuntu 22.04 LTS GNOME
XWayland session to hang after typing the password and pressing ENTER at the
graphical logon screen (tested several times).


This problem is not caused by my patch.
Based on your syslog, it looks more like a scheduling issue.
I just saw a similar problem; please refer to the link below:
https://gitlab.freedesktop.org/drm/amd/-/issues/3124

Regards,
Ma Jun

After that, I was not even able to log in from another box with ssh, or the
session would block (tested once, then a second time; the third time it passed
after I connected before attempting to log in on the XWayland console).

You might find useful syslog and dmesg of the freeze on this link (they were 
+100K):

https://magrf.grf.hr/~mtodorov/linux/bugreports/6.7.0/amdgpu/6.7.0-xway-09721-g61da593f4458/

The exact applied patch was this:

marvin@defiant:~/linux/kernel/linux_torvalds$ git diff
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 73f6d7e72c73..6ef333df9adf 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -3996,16 +3996,13 @@ static int gfx_v10_0_init_microcode(struct 
amdgpu_device *adev)
   if (!amdgpu_sriov_vf(adev)) {
   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", 
ucode_prefix);
> -   err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, fw_name);
-   /* don't check this.  There are apparently firmwares in the 
wild with
-    * incorrect size in the header
-    */
-   if (err == -ENODEV)
-   goto out;
> +   err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev);
   if (err)
-   dev_dbg(adev->dev,
-   "gfx10: amdgpu_ucode_request() failed \"%s\"\n",
-   fw_name);
+   goto out;
+
+   /* don't validate this firmware.  There are apparently firmwares
+    * in the wild with incorrect size in the header
+    */
   rlc_hdr = (const struct rlc_firmware_header_v2_0 
*)adev->gfx.rlc_fw->data;
   version_major = 
le16_to_cpu(rlc_hdr->header.header_version_major);
   version_minor = 
le16_to_cpu(rlc_hdr->header.header_version_minor);
marvin@defiant:~/linux/kernel/linux_torvalds$ uname -rms
Linux 6.7.0-xway-09721-g61da593f4458 x86_64
marvin@defiant:~/linux/kernel/linux_torvalds$

So, there seems to be a problem with the way the patch affects XWayland.

I checked the exact commit multiple times, with and without the diff.

Hope this helps, because I am not familiar with the amdgpu driver.

Best regards,
Mirsad Todorovac

On 1/22/24 09:34, Ma, Jun wrote:

Perhaps similar to the problem I encountered earlier, you can
try the following patch

https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html

Regards,
Ma Jun

On 1/21/2024 3:54 AM, Mirsad Todorovac wrote:

Hi,

The last email did not reach most of the recipients due to a banned .xz
attachment.

As the .config is too big to send inline or uncompressed, I will omit it in
this attempt. In the meantime, I had some success in decoding the stack trace,
though sadly not complete.

I don't think this Oops is deterministic, but I am working on a reproducer.

The platform is Ubuntu 22.04 LTS.

Complete list of hardware and .config is available here:

https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v02-nokcsan-09928-g052d534373b7/

Best regards,
Mirsad

---
kernel: [    5.576702] BUG: kernel NULL pointer dereference, address: 
0000000000000008
kernel: [    5.576707] #PF: supervisor read access in kernel mode
kernel: [    5.576710] #PF: error_code(0x0000) - not-present page
kernel: [    5.576712] PGD 0 P4D 0
kernel: [    5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
kernel: [  

Re: [Linaro-mm-sig] [PATCH v5 1/6] dma-buf: Add dma_buf_{begin,end}_access()

2024-01-25 Thread Daniel Vetter
On Thu, Jan 25, 2024 at 04:00:16PM +0100, Christian König wrote:
> Am 24.01.24 um 11:58 schrieb Paul Cercueil:
> > [SNIP]
> > > > The problem was then that dma_buf_unmap_attachment cannot be called
> > > > before the dma_fence is signaled, and calling it after is already
> > > > too
> > > > late (because the fence would be signaled before the data is
> > > > sync'd).
> > >   Well what sync are you talking about? CPU sync? In DMA-buf that is
> > > handled differently.
> > >   For importers it's mandatory that they can be coherent with the
> > > exporter. That usually means they can snoop the CPU cache if the
> > > exporter can snoop the CPU cache.
> > I seem to have such a system where one device can snoop the CPU cache
> > and the other cannot. Therefore if I want to support it properly, I do
> > need cache flush/sync. I don't actually try to access the data using
> > the CPU (and when I do, I call the sync start/end ioctls).
> 
> Usually that isn't a problem as long as you don't access the data with the
> CPU.
> 
> [SNIP]
> 
> > > > (and I *think* there is a way to force coherency in the
> > > > Ultrascale's
> > > > interconnect - we're investigating it)
> > >   What you can do is that instead of using udmabuf or dma-heaps is
> > > that the device which can't provide coherency act as exporters of the
> > > buffers.
> > >   The exporter is allowed to call sync_for_cpu/sync_for_device on it's
> > > own buffers and also gets begin/end CPU access notifications. So you
> > > can then handle coherency between the exporter and the CPU.
> > But again that would only work if the importers would call
> > begin_cpu_access() / end_cpu_access(), which they don't, because they
> > don't actually access the data using the CPU.
> 
> Wow, that is a completely new use case then.
> 
> Neither DMA-buf nor the DMA subsystem in Linux actually supports this as far
> as I can see.
> 
> > Unless you mean that the exporter can call sync_for_cpu/sync_for_device
> > before/after every single DMA transfer so that the data appears
> > coherent to the importers, without them having to call
> > begin_cpu_access() / end_cpu_access().
> 
> Yeah, I mean the importers don't have to call begin_cpu_access() /
> end_cpu_access() if they don't do CPU access :)
> 
> What you can still do as exporter is to call sync_for_device() and
> sync_for_cpu() before and after each operation on your non-coherent device.
> Paired with the fence signaling that should still work fine then.
> 
> But taking a step back, this use case is not something even the low level
> DMA subsystem supports. That sync_for_cpu() does the right thing is
> coincident and not proper engineering.
> 
> What you need is a sync_device_to_device() which does the appropriate
> actions depending on which devices are involved.
> 
> > In which case - this would still demultiply the complexity; my USB-
> > functionfs interface here (and IIO interface in the separate patchset)
> > are not device-specific, so I'd rather keep them importers.
> > >   If you really don't have coherency between devices then that would
> > > be a really new use case and we would need much more agreement on how
> > > to do this.
> > [snip]
> > 
> > Agreed. Desiging a good generic solution would be better.
> > 
> > With that said...
> > 
> > Let's keep it out of this USB-functionfs interface for now. The
> > interface does work perfectly fine on platforms that don't have
> > coherency problems. The coherency issue in itself really is a
> > tangential issue.
> 
> Yeah, completely agree.
> 
> > So I will send a v6 where I don't try to force the cache coherency -
> > and instead assume that the attached devices are coherent between
> > themselves.
> > 
> > But it would be even better to have a way to detect non-coherency and
> > return an error on attach.
> 
> Take a look into the DMA subsystem. I'm pretty sure we already have
> something like this in there.
> 
> If nothing else helps you could take a look if the coherent memory access
> mask is non zero or something like that.
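
(For reference, the exporter-side bracketing suggested above would look
roughly like this; a sketch assuming the exporter holds the struct device
doing the DMA and the mapped sg_table:)

	/* before the non-coherent device touches the buffer */
	dma_sync_sg_for_device(dev, sgt->sgl, sgt->orig_nents, dir);

	/* ... run the transfer, signal the dma_fence when it completes ... */

	/* before the data is handed back to the buffer's other users */
	dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->orig_nents, dir);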

Jumping in way late and apolgies to everyone since yes I indeed suggested
this entire mess to Paul in some private thread.

And worse, I think we need it, it's just that we got away without it thus
far.

So way back at the og dma-buf kick-off dma coherency was discussed, and a
few things where noted:
- the dma api only supports device<->cpu coherency
- getting the full coherency model off the ground right away is probably
  too hard, so we made the decision that where it matters, relevant
  flushing needs to be done in dma_buf_map/unmap.

If you look at the earliest patches for dma-buf we had pretty clear
language that all dma-operations should be bracketed with map/unmap. Of
course that didn't work out for drm at all, and we had to first get
dma_resv_lock and dma_fence landed and then your dynamic exporter/importer
support merged in, just to get the buffer migration functionality working,
which was only one of the things discussed that bracketing 

Re: [PATCH v5 104/111] drm/bridge: ti-sn65dsi86: Make use of devm_pwmchip_alloc() function

2024-01-25 Thread Doug Anderson
Hi,

On Thu, Jan 25, 2024 at 4:11 AM Uwe Kleine-König
 wrote:
>
> This prepares the pwm driver of the ti-sn65dsi86 for further changes of
> the pwm core outlined in the commit introducing devm_pwmchip_alloc().
> There is no intended semantic change and the driver should behave as
> before.
>
> Signed-off-by: Uwe Kleine-König 
> ---
>  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c 
> b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> index f1fffbef3324..7fbc307cc025 100644
> --- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> +++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> @@ -197,7 +197,7 @@ struct ti_sn65dsi86 {
> DECLARE_BITMAP(gchip_output, SN_NUM_GPIOS);
>  #endif
>  #if defined(CONFIG_PWM)
> -   struct pwm_chip pchip;
> +   struct pwm_chip *pchip;
> boolpwm_enabled;
> atomic_tpwm_pin_busy;
>  #endif
> @@ -1374,7 +1374,7 @@ static void ti_sn_pwm_pin_release(struct ti_sn65dsi86 
> *pdata)
>
>  static struct ti_sn65dsi86 *pwm_chip_to_ti_sn_bridge(struct pwm_chip *chip)
>  {
> -   return container_of(chip, struct ti_sn65dsi86, pchip);
> +   return pwmchip_get_drvdata(chip);
>  }

nit: given Linux conventions that I'm aware of, a reader of the code
would see the name "pwm_chip_to_ti_sn_bridge" and assume it's doing a
container_of operation. It no longer is, so the name doesn't make as
much sense. ...and, in fact, the function itself doesn't make as much
sense. Maybe just have all callers call pwmchip_get_drvdata()
directly?

In any case, this seems fine to me. I haven't done lots to analyze
your full plans to fix lifetime issues, but this patch itself looks
benign and I wouldn't object to it landing. Thus I'm OK with:

Acked-by: Douglas Anderson 

Similar to the other ti-sn65dsi86 patch in this series, unless someone
more senior in the drm-misc community contradicts me I think it's safe
to assume you could land this through your tree.


-Doug
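
For illustration, Doug's suggestion would amount to dropping the wrapper
and fetching the driver data at each call site. A sketch against the
reworked API from this series, with the function body elided:

static int ti_sn_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
			   const struct pwm_state *state)
{
	/* direct lookup instead of pwm_chip_to_ti_sn_bridge() */
	struct ti_sn65dsi86 *pdata = pwmchip_get_drvdata(chip);

	/* ... program the PWM registers as before ... */
	return 0;
}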


Re: [PATCH v5 037/111] drm/bridge: ti-sn65dsi86: Make use of pwmchip_parent() macro

2024-01-25 Thread Doug Anderson
Hi,

On Thu, Jan 25, 2024 at 4:11 AM Uwe Kleine-König
 wrote:
>
> struct pwm_chip::dev is about to change. To not have to touch this
> driver in the same commit as struct pwm_chip::dev, use the macro
> provided for exactly this purpose.
>
> Signed-off-by: Uwe Kleine-König 
> ---
>  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

This seems OK with me. Unless someone more senior in the drm-misc
community contradicts me, feel free to take this through your tree.

Acked-by: Douglas Anderson 

NOTE: though the patch seems OK to me, I have one small concern. If I
understand correctly, your eventual goal is to add a separate "dev"
for the PWM chip without further changes to the ti-sn65dsi86 driver.
If that's true, you'll have to find some way to magically call
devm_pm_runtime_enable() on the new "dev" since the code you have here
is calling pm_runtime functions on what will eventually be this new
"dev". Maybe you'll do something like enabling runtime PM on it
automatically if its parent had runtime PM enabled?
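
A hedged sketch of that last idea, mirroring the parent's runtime-PM
state onto the pwmchip device at registration time. That the reworked
pwm_chip embeds its own struct device (chip->dev) is an assumption of
this fragment:

	/* hypothetical hook in pwmchip registration */
	if (pm_runtime_enabled(pwmchip_parent(chip)))
		devm_pm_runtime_enable(&chip->dev);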


Re: [PATCH] drm/sched: Drain all entities in DRM sched run job worker

2024-01-25 Thread Matthew Brost
On Thu, Jan 25, 2024 at 10:24:24AM +0100, Vlastimil Babka wrote:
> On 1/24/24 22:08, Matthew Brost wrote:
> > All entities must be drained in the DRM scheduler run job worker to
> > avoid the following case. An entity found that is ready, no job found
> > ready on entity, and run job worker goes idle with other entities + jobs
> > ready. Draining all ready entities (i.e. loop over all ready entities)
> > in the run job worker ensures all jobs that are ready will be scheduled.
> > 
> > Cc: Thorsten Leemhuis 
> > Reported-by: Mikhail Gavrilov 
> > Closes: 
> > https://lore.kernel.org/all/CABXGCsM2VLs489CH-vF-1539-s3in37=bwuowtoeee+q26z...@mail.gmail.com/
> > Reported-and-tested-by: Mario Limonciello 
> > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3124
> > Link: 
> > https://lore.kernel.org/all/20240123021155.2775-1-mario.limoncie...@amd.com/
> > Reported-by: Vlastimil Babka 
> 
> Can change to Reported-and-tested-by: Vlastimil Babka 
> 

+1, got it.

Matt

> Thanks!
> 
> > Closes: 
> > https://lore.kernel.org/dri-devel/05ddb2da-b182-4791-8ef7-82179fd15...@amd.com/T/#m0c31d4d1b9ae9995bb880974c4f1dbaddc33a48a
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 15 +++
> >  1 file changed, 7 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 550492a7a031..85f082396d42 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1178,21 +1178,20 @@ static void drm_sched_run_job_work(struct 
> > work_struct *w)
> > struct drm_sched_entity *entity;
> > struct dma_fence *fence;
> > struct drm_sched_fence *s_fence;
> > -   struct drm_sched_job *sched_job;
> > +   struct drm_sched_job *sched_job = NULL;
> > int r;
> >  
> > if (READ_ONCE(sched->pause_submit))
> > return;
> >  
> > -   entity = drm_sched_select_entity(sched);
> > +   /* Find entity with a ready job */
> > +   while (!sched_job && (entity = drm_sched_select_entity(sched))) {
> > +   sched_job = drm_sched_entity_pop_job(entity);
> > +   if (!sched_job)
> > +   complete_all(&entity->entity_idle);
> > +   }
> > if (!entity)
> > -   return;
> > -
> > -   sched_job = drm_sched_entity_pop_job(entity);
> > -   if (!sched_job) {
> > -   complete_all(&entity->entity_idle);
> > return; /* No more work */
> > -   }
> >  
> > s_fence = sched_job->s_fence;
> >  
> 


Re: [PATCH] drm/sched: Drain all entities in DRM sched run job worker

2024-01-25 Thread Matthew Brost
On Thu, Jan 25, 2024 at 04:12:58PM +0100, Christian König wrote:
> 
> 
On 24.01.24 at 22:08, Matthew Brost wrote:
> > All entities must be drained in the DRM scheduler run job worker to
> > avoid the following case. An entity found that is ready, no job found
> > ready on entity, and run job worker goes idle with other entities + jobs
> > ready. Draining all ready entities (i.e. loop over all ready entities)
> in the run job worker ensures all jobs that are ready will be scheduled.
> 
> That doesn't make sense. drm_sched_select_entity() only returns entities
> which are "ready", e.g. have a job to run.
> 

That is what I thought too, hence my original design but it is not
exactly true. Let me explain.

drm_sched_select_entity() returns an entity with a non-empty spsc queue
(job in queue) and no *current* waiting dependencies [1]. Dependencies for
an entity can be added when drm_sched_entity_pop_job() is called [2][3],
returning a NULL job. Thus we can get into a scenario where 2 entities
A and B both have jobs and no current dependencies. A's job is waiting on
B's job, entity A gets selected first, a dependency gets installed in
drm_sched_entity_pop_job(), the run worker goes idle, and now we deadlock.

The proper solution is to loop over all ready entities until one with a
job is found via drm_sched_entity_pop_job() and then requeue the run
job worker. Or loop over all entities until drm_sched_select_entity()
returns NULL and then let the run job worker go idle. This is what the
old threaded design did too [4]. Hope this clears everything up.

Matt

[1] 
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler/sched_entity.c#L144
[2] 
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler/sched_entity.c#L464
[3] 
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler/sched_entity.c#L397
[4] 
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler/sched_main.c#L1011

> If that's not the case any more then you have broken something else.
> 
> Regards,
> Christian.
> 
> > 
> > Cc: Thorsten Leemhuis 
> > Reported-by: Mikhail Gavrilov 
> > Closes: 
> > https://lore.kernel.org/all/CABXGCsM2VLs489CH-vF-1539-s3in37=bwuowtoeee+q26z...@mail.gmail.com/
> > Reported-and-tested-by: Mario Limonciello 
> > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3124
> > Link: 
> > https://lore.kernel.org/all/20240123021155.2775-1-mario.limoncie...@amd.com/
> > Reported-by: Vlastimil Babka 
> > Closes: 
> > https://lore.kernel.org/dri-devel/05ddb2da-b182-4791-8ef7-82179fd15...@amd.com/T/#m0c31d4d1b9ae9995bb880974c4f1dbaddc33a48a
> > Signed-off-by: Matthew Brost 
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 15 +++
> >   1 file changed, 7 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 550492a7a031..85f082396d42 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1178,21 +1178,20 @@ static void drm_sched_run_job_work(struct 
> > work_struct *w)
> > struct drm_sched_entity *entity;
> > struct dma_fence *fence;
> > struct drm_sched_fence *s_fence;
> > -   struct drm_sched_job *sched_job;
> > +   struct drm_sched_job *sched_job = NULL;
> > int r;
> > if (READ_ONCE(sched->pause_submit))
> > return;
> > -   entity = drm_sched_select_entity(sched);
> > +   /* Find entity with a ready job */
> > +   while (!sched_job && (entity = drm_sched_select_entity(sched))) {
> > +   sched_job = drm_sched_entity_pop_job(entity);
> > +   if (!sched_job)
> > +   complete_all(&entity->entity_idle);
> > +   }
> > if (!entity)
> > -   return;
> > -
> > -   sched_job = drm_sched_entity_pop_job(entity);
> > -   if (!sched_job) {
> > -   complete_all(&entity->entity_idle);
> > return; /* No more work */
> > -   }
> > s_fence = sched_job->s_fence;
> 


Re: [PATCH v19 09/30] drm/shmem-helper: Add and use lockless drm_gem_shmem_get_pages()

2024-01-25 Thread Daniel Vetter
On Fri, Jan 05, 2024 at 09:46:03PM +0300, Dmitry Osipenko wrote:
> Add a lockless drm_gem_shmem_get_pages() helper that skips taking the
> reservation lock if pages_use_count is non-zero, leveraging the atomicity
> of the refcount_t. Make drm_gem_shmem_mmap() utilize the new helper.
> 
> Acked-by: Maxime Ripard 
> Reviewed-by: Boris Brezillon 
> Suggested-by: Boris Brezillon 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/gpu/drm/drm_gem_shmem_helper.c | 19 +++
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
> b/drivers/gpu/drm/drm_gem_shmem_helper.c
> index cacf0f8c42e2..1c032513abf1 100644
> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> @@ -226,6 +226,20 @@ void drm_gem_shmem_put_pages_locked(struct 
> drm_gem_shmem_object *shmem)
>  }
>  EXPORT_SYMBOL_GPL(drm_gem_shmem_put_pages_locked);
>  
> +static int drm_gem_shmem_get_pages(struct drm_gem_shmem_object *shmem)
> +{
> + int ret;

Just random drive-by comment: a might_lock annotation here might be good,
or people could hit some really interesting bugs that are rather hard to
reproduce ...
-Sima

> +
> + if (refcount_inc_not_zero(&shmem->pages_use_count))
> + return 0;
> +
> + dma_resv_lock(shmem->base.resv, NULL);
> + ret = drm_gem_shmem_get_pages_locked(shmem);
> + dma_resv_unlock(shmem->base.resv);
> +
> + return ret;
> +}
> +
>  static int drm_gem_shmem_pin_locked(struct drm_gem_shmem_object *shmem)
>  {
>   int ret;
> @@ -609,10 +623,7 @@ int drm_gem_shmem_mmap(struct drm_gem_shmem_object 
> *shmem, struct vm_area_struct
>   return ret;
>   }
>  
> - dma_resv_lock(shmem->base.resv, NULL);
> - ret = drm_gem_shmem_get_pages_locked(shmem);
> - dma_resv_unlock(shmem->base.resv);
> -
> + ret = drm_gem_shmem_get_pages(shmem);
>   if (ret)
>   return ret;
>  
> -- 
> 2.43.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
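
For illustration, Sima's might_lock suggestion could look roughly like
this in the new helper. This is a sketch; that the right lockdep
expression for a dma_resv is &resv->lock.base is an assumption here:

static int drm_gem_shmem_get_pages(struct drm_gem_shmem_object *shmem)
{
	int ret;

	/* tell lockdep that the slow path may take the resv lock even
	 * though the fast path below avoids it */
	might_lock(&shmem->base.resv->lock.base);

	if (refcount_inc_not_zero(&shmem->pages_use_count))
		return 0;

	dma_resv_lock(shmem->base.resv, NULL);
	ret = drm_gem_shmem_get_pages_locked(shmem);
	dma_resv_unlock(shmem->base.resv);

	return ret;
}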



Re: Making drm_gpuvm work across gpu devices

2024-01-25 Thread Felix Kuehling



On 2024-01-24 20:17, Zeng, Oak wrote:


Hi Christian,

Even though I mentioned KFD design, I didn’t mean to copy the KFD 
design. I also had a hard time understanding the difficulty of KFD under 
a virtualization environment.


The problem with virtualization is related to virtualization design 
choices. There is a single process that proxies requests from multiple 
processes in one (or more?) VMs to the GPU driver. That means, we need a 
single process with multiple contexts (and address spaces). One proxy 
process on the host must support multiple guest address spaces.


I don't know much more than these very high level requirements, and I 
only found out about those a few weeks ago. Due to my own bias I can't 
comment whether there are bad design choices in the proxy architecture 
or in KFD or both. The way we are considering fixing this, is to enable 
creating multiple KFD contexts in the same process. Each of those 
contexts will still represent a shared virtual address space across 
devices (but not the CPU). Because the device address space is not 
shared with the CPU, we cannot support our SVM API in this situation.


I still believe that it makes sense to have the kernel mode driver aware 
of a shared virtual address space at some level. A per-GPU API and an 
API that doesn't require matching CPU and GPU virtual addresses would 
enable more flexibility at the cost of duplicate information tracking for 
multiple devices and duplicate overhead for things like MMU notifiers 
and interval tree data structures. Having to coordinate multiple devices 
with potentially different address spaces would probably make it more 
awkward to implement memory migration. The added flexibility would go 
mostly unused, except in some very niche applications.


Regards,
  Felix


For us, Xekmd doesn't need to know whether it is running on bare metal or 
in a virtualized environment. Xekmd is always a guest driver. All the 
virtual addresses used in xekmd are guest virtual addresses. For SVM, we 
require all the VF devices to share one single address space with the 
guest CPU program. So any design that works in a bare metal environment 
automatically works in a virtualized environment. +@Shah, Ankur N 
+@Winiarski, Michal to back me up if I am wrong.


Again, shared virtual address space b/t cpu and all gpu devices is a 
hard requirement for our system allocator design (which means 
malloc’ed memory, cpu stack variables, globals can be directly used in 
gpu program. Same requirement as kfd SVM design). This was aligned 
with our user space software stack.


For anyone who want to implement system allocator, or SVM, this is a 
hard requirement. I started this thread hoping I can leverage the 
drm_gpuvm design to manage the shared virtual address space (as the 
address range split/merge function was scary to me and I didn’t want 
re-invent). I guess my takeaway from this you and Danilo is this 
approach is a NAK. Thomas also mentioned to me drm_gpuvm is a overkill 
for our svm address range split/merge. So I will make things work 
first by manage address range xekmd internally. I can re-look 
drm-gpuvm approach in the future.


Maybe a pseudo user program can illustrate our programming model:

Fd0 = open(card0)
Fd1 = open(card1)

Vm0 = xe_vm_create(fd0)  // driver creates the process xe_svm on the process's first vm_create
Vm1 = xe_vm_create(fd1)  // driver reuses the xe_svm created above if called from the same process

Queue0 = xe_exec_queue_create(fd0, vm0)
Queue1 = xe_exec_queue_create(fd1, vm1)

// check p2p capability calling L0 API...

ptr = malloc()  // this replaces bo_create, vm_bind, dma-import/export

Xe_exec(queue0, ptr)  // submit gpu job which uses ptr, on card0
Xe_exec(queue1, ptr)  // submit gpu job which uses ptr, on card1

// GPU page faults handle memory allocation/migration/mapping to gpu

As you can see from the above model, our design is a little bit different 
from the KFD design. The user needs to explicitly create a gpuvm (vm0 and 
vm1 above) for each gpu device. The driver internally has an xe_svm that 
represents the shared address space between the cpu and multiple gpu 
devices. But the end user doesn't see it and has no need to create the 
xe_svm. The shared virtual address space is really managed by the linux 
core mm (through the vma struct, mm_struct etc). From each gpu device's 
perspective, it just operates under its own gpuvm, unaware of the 
existence of other gpuvms, even though in reality all those gpuvms share 
the same virtual address space.


See one more comment inline

*From:*Christian König 
*Sent:* Wednesday, January 24, 2024 3:33 AM
*To:* Zeng, Oak ; Danilo Krummrich 
; Dave Airlie ; Daniel Vetter 
; Felix Kuehling 
*Cc:* Welty, Brian ; 
dri-devel@lists.freedesktop.org; intel...@lists.freedesktop.org; 
Bommu, Krishnaiah ; Ghimiray, Himal Prasad 
; thomas.hellst...@linux.intel.com; 
Vishwanathapura, Niranjana ; 
Brost, Matthew ; Gupta, saurabhg 


*Subject:* Re: Making drm_gpuvm work across gpu 

Re: [PATCH] drm/msm/dpu: make "vblank timeout" more useful

2024-01-25 Thread Abhinav Kumar




On 1/5/2024 3:50 PM, Dmitry Baryshkov wrote:

We have several reports of vblank timeout messages. However after some
debugging it was found that there might be different causes to that.
Include the actual CTL_FLUSH value into the timeout message. This allows
us to identify the DPU block that gets stuck.

Signed-off-by: Dmitry Baryshkov 
---
  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
index d0f56c5c4cce..fb34067ab6af 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c
@@ -489,7 +489,7 @@ static int dpu_encoder_phys_vid_wait_for_commit_done(
(hw_ctl->ops.get_flush_register(hw_ctl) == 0),
msecs_to_jiffies(50));
if (ret <= 0) {
-   DPU_ERROR("vblank timeout\n");
+   DPU_ERROR("vblank timeout: %x\n", 
hw_ctl->ops.get_flush_register(hw_ctl));
return -ETIMEDOUT;
}


Nothing wrong with this change.

But I don't know how much information this gives us to really find out 
what is causing the vblank timeout. Sure, we know which flush bit is 
actually stuck, but we don't know why it's stuck.


We should add logic here to capture a snapshot on the first vblank 
timeout; that way we avoid excessive captures, similar to the other 
fatal locations that call snapshot.
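
For illustration, a one-shot variant could look roughly like the sketch
below. msm_disp_snapshot_state() is the existing snapshot entry point; the
atomic guard field (vblank_timeout_reported) is a made-up addition to
dpu_encoder_phys, not existing code:

	if (ret <= 0) {
		DPU_ERROR("vblank timeout: %x\n",
			  hw_ctl->ops.get_flush_register(hw_ctl));
		/* capture a device snapshot only on the first timeout */
		if (!atomic_xchg(&phys_enc->vblank_timeout_reported, 1))
			msm_disp_snapshot_state(phys_enc->parent->dev);
		return -ETIMEDOUT;
	}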


  


---
base-commit: 39676dfe52331dba909c617f213fdb21015c8d10
change-id: 20240106-fd-dpu-debug-timeout-e917f0bc8063

Best regards,


RE: Re: Making drm_gpuvm work across gpu devices

2024-01-25 Thread Zeng, Oak
Hi Chunming,


From: 周春明(日月) 
Sent: Thursday, January 25, 2024 6:01 AM
To: Zeng, Oak ; Christian König ; 
Danilo Krummrich ; Dave Airlie ; Daniel 
Vetter ; Felix Kuehling ; Shah, Ankur 
N ; Winiarski, Michal 
Cc: Brost, Matthew ; thomas.hellst...@linux.intel.com; 
Welty, Brian ; dri-devel@lists.freedesktop.org; 
Ghimiray, Himal Prasad ; Gupta, saurabhg 
; Bommu, Krishnaiah ; 
Vishwanathapura, Niranjana ; 
intel...@lists.freedesktop.org
Subject: Re: Making drm_gpuvm work across gpu devices

[snip]

Fd0 = open(card0)
Fd1 = open(card1)

Vm0 = xe_vm_create(fd0)  // driver creates the process xe_svm on the process's first vm_create
Vm1 = xe_vm_create(fd1)  // driver reuses the xe_svm created above if called from the same process

Queue0 = xe_exec_queue_create(fd0, vm0)
Queue1 = xe_exec_queue_create(fd1, vm1)

// check p2p capability calling L0 API...

ptr = malloc()  // this replaces bo_create, vm_bind, dma-import/export

Xe_exec(queue0, ptr)  // submit gpu job which uses ptr, on card0
Xe_exec(queue1, ptr)  // submit gpu job which uses ptr, on card1

// GPU page faults handle memory allocation/migration/mapping to gpu
[snip]
Hi Oak,
From your sample code, you need a VA manager not only across gpu devices, 
but also for the cpu, right?

No. Per the feedback from Christian and Danilo, I will give up the idea of 
making drm_gpuvm work across gpu devices. I might come back to it later, 
but for now it is no longer the plan.

I think you need a UVA (unified VA) manager in user space and to reserve 
the range of drm_gpuvm from the cpu VA space. That way, malloc'ed VAs and 
gpu VAs are in the same space and will not conflict. And then, via the HMM 
mechanism, gpu devices can safely use VAs passed from HMM.

Under HMM, both the GPU and CPU are simply under the same address space. 
The same virtual address represents the same allocation for both the CPU 
and GPUs. See the hmm doc here: 
https://www.kernel.org/doc/Documentation/vm/hmm.rst. A user space program 
doesn't need to reserve any address range; all the address ranges are 
managed by the linux kernel core mm. Today the GPU kmd driver has some 
structures to save the address-range-based memory attributes.
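
As a rough illustration of this model, a driver mirroring a malloc'ed
range under HMM looks something like the sketch below. The
hmm_range_fault() and mmu_interval_notifier APIs are real; gpu_map_pfns()
is a hypothetical driver hook, and the -EBUSY retry and
mmu_interval_read_retry() check that a production driver needs are elided:

#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/slab.h>

int gpu_map_pfns(unsigned long *pfns, unsigned long npages); /* hypothetical */

static int mirror_cpu_range(struct mmu_interval_notifier *notifier,
			    unsigned long start, unsigned long npages)
{
	unsigned long *pfns;
	struct hmm_range range = {
		.notifier = notifier,
		.start = start,
		.end = start + npages * PAGE_SIZE,
		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
	};
	int ret;

	pfns = kvcalloc(npages, sizeof(*pfns), GFP_KERNEL);
	if (!pfns)
		return -ENOMEM;
	range.hmm_pfns = pfns;

	range.notifier_seq = mmu_interval_read_begin(notifier);
	mmap_read_lock(notifier->mm);
	ret = hmm_range_fault(&range);	/* fault the range into the CPU mm */
	mmap_read_unlock(notifier->mm);

	if (!ret)
		ret = gpu_map_pfns(range.hmm_pfns, npages);

	kvfree(pfns);
	return ret;
}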

Regards,
Oak

By the way, I'm not familiar with drm_gpuvm, traditionally, gpu driver often 
put va-manager in user space, not sure what's benefit we can get from drm_gpuvm 
invented in kernel space. Can anyone help explain more?

- Chunming
--
From: Zeng, Oak mailto:oak.z...@intel.com>>
Sent: Thursday, January 25, 2024 09:17
To: "Christian König" 
mailto:christian.koe...@amd.com>>; Danilo Krummrich 
mailto:d...@redhat.com>>; Dave Airlie 
mailto:airl...@redhat.com>>; Daniel Vetter 
mailto:dan...@ffwll.ch>>; Felix Kuehling 
mailto:felix.kuehl...@amd.com>>; "Shah, Ankur N" 
mailto:ankur.n.s...@intel.com>>; "Winiarski, Michal" 
mailto:michal.winiar...@intel.com>>
Cc: "Brost, Matthew" mailto:matthew.br...@intel.com>>; 
thomas.hellst...@linux.intel.com 
mailto:thomas.hellst...@linux.intel.com>>; 
"Welty, Brian" mailto:brian.we...@intel.com>>; 
dri-devel@lists.freedesktop.org 
mailto:dri-devel@lists.freedesktop.org>>; 
"Ghimiray, Himal Prasad" 
mailto:himal.prasad.ghimi...@intel.com>>; 
"Gupta, saurabhg" mailto:saurabhg.gu...@intel.com>>; 
"Bommu, Krishnaiah" 
mailto:krishnaiah.bo...@intel.com>>; 
"Vishwanathapura, Niranjana" 
mailto:niranjana.vishwanathap...@intel.com>>;
 intel...@lists.freedesktop.org 
mailto:intel...@lists.freedesktop.org>>
Subject: RE: Making drm_gpuvm work across gpu devices

Hi Christian,

Even though I mentioned KFD design, I didn’t mean to copy the KFD design. I 
also had a hard time understanding the difficulty of KFD under a 
virtualization environment.

For us, Xekmd doesn't need to know whether it is running on bare metal or in a 
virtualized environment. Xekmd is always a guest driver. All the virtual 
addresses used in xekmd are guest virtual addresses. For SVM, we require all 
the VF devices to share one single address space with the guest CPU program. 
So any design that works in a bare metal environment automatically works in a 
virtualized environment. +@Shah, Ankur N 
+@Winiarski, Michal to back me up if I am 
wrong.

Again, shared virtual address space b/t cpu and all gpu devices is a hard 
requirement for our system allocator design (which means malloc’ed memory, cpu 
stack variables, globals can be directly used in gpu program. Same requirement 
as kfd SVM design). This was aligned with our user space software stack.

For anyone who wants to implement a system allocator, or SVM, this is a hard 
requirement. I started this thread hoping I could leverage the drm_gpuvm 
design to manage the shared virtual address space (as the address range 
split/merge function was scary to me and I didn't want to re-invent it). I 
guess my takeaway from you and Danilo is that this approach is a NAK. Thomas also 

Re: [PATCH v2 2/3] drm/etnaviv: Turn etnaviv_is_model_rev() into a function

2024-01-25 Thread Lucas Stach
On Thursday, 25.01.2024 at 17:27 +0100, Christian Gmeiner wrote:
> Hi Philipp
> 
> > 
> > Turn the etnaviv_is_model_rev() macro into a static inline function.
> > Use the raw model number as a parameter instead of the chipModel_GC
> > defines. This reduces synchronization requirements for the generated
> > headers. For newer hardware, the GC names are not the correct model
> > names anyway. For example, model 0x8000 NPUs are called VIPNano-QI/SI(+)
> > by VeriSilicon.
> 
> To catch up with your NPU example Vivante's kernel driver has such
> lines in its hw database [0]
> 
> /* vipnano-si+ */
> {
> 0x8000, /* ChipID */
> 0x8002, /* ChipRevision */
> 0x5080009, /* ProductID */
> 0x600, /* EcoID */
> 0x9f, /* CustomerID */
> ...
> 
> I think in reality this function should be called
> etnaviv_is_chip_rev(..) or etnaviv_is_id_rev(..). That would be
> semantically correct and we could even stick with the current macro
> (that gets renamed) and with the current
> GCxxx defines.

The value for what is called ChipID in the downstream driver is read
from a register which is called VIVS_HI_CHIP_MODEL in rnndb. I would
like to stay consistent by calling this model in the etnaviv driver.

I don't see any value in the GCxxx defines, which only add a (pretty)
prefix to a perfectly readable hex number, so I'm fine with changing
the current macro and getting rid of any usage of those defines in the
driver.

Regards,
Lucas

> 
> [0]: 
> https://github.com/nxp-imx/linux-imx/blob/lf-6.1.y/drivers/mxc/gpu-viv/hal/kernel/inc/gc_feature_database.h#L22373
> 
> > 
> > Signed-off-by: Philipp Zabel 
> > ---
> >  drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 66 
> > ++-
> >  1 file changed, 34 insertions(+), 32 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c 
> > b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> > index 9b8445d2a128..c61d50dd3829 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> > @@ -172,10 +172,12 @@ int etnaviv_gpu_get_param(struct etnaviv_gpu *gpu, 
> > u32 param, u64 *value)
> > return 0;
> >  }
> > 
> > +static inline bool etnaviv_is_model_rev(struct etnaviv_gpu *gpu, u32 
> > model, u32 revision)
> > +{
> > +   return gpu->identity.model == model &&
> > +  gpu->identity.revision == revision;
> > +}
> > 
> > -#define etnaviv_is_model_rev(gpu, mod, rev) \
> > -   ((gpu)->identity.model == chipModel_##mod && \
> > -(gpu)->identity.revision == rev)
> >  #define etnaviv_field(val, field) \
> > (((val) & field##__MASK) >> field##__SHIFT)
> > 
> > @@ -281,7 +283,7 @@ static void etnaviv_hw_specs(struct etnaviv_gpu *gpu)
> > 
> > switch (gpu->identity.instruction_count) {
> > case 0:
> > -   if (etnaviv_is_model_rev(gpu, GC2000, 0x5108) ||
> > +   if (etnaviv_is_model_rev(gpu, 0x2000, 0x5108) ||
> > gpu->identity.model == chipModel_GC880)
> > gpu->identity.instruction_count = 512;
> > else
> > @@ -315,17 +317,17 @@ static void etnaviv_hw_specs(struct etnaviv_gpu *gpu)
> >  * For some cores, two varyings are consumed for position, so the
> >  * maximum varying count needs to be reduced by one.
> >  */
> > -   if (etnaviv_is_model_rev(gpu, GC5000, 0x5434) ||
> > -   etnaviv_is_model_rev(gpu, GC4000, 0x5222) ||
> > -   etnaviv_is_model_rev(gpu, GC4000, 0x5245) ||
> > -   etnaviv_is_model_rev(gpu, GC4000, 0x5208) ||
> > -   etnaviv_is_model_rev(gpu, GC3000, 0x5435) ||
> > -   etnaviv_is_model_rev(gpu, GC2200, 0x5244) ||
> > -   etnaviv_is_model_rev(gpu, GC2100, 0x5108) ||
> > -   etnaviv_is_model_rev(gpu, GC2000, 0x5108) ||
> > -   etnaviv_is_model_rev(gpu, GC1500, 0x5246) ||
> > -   etnaviv_is_model_rev(gpu, GC880, 0x5107) ||
> > -   etnaviv_is_model_rev(gpu, GC880, 0x5106))
> > +   if (etnaviv_is_model_rev(gpu, 0x5000, 0x5434) ||
> > +   etnaviv_is_model_rev(gpu, 0x4000, 0x5222) ||
> > +   etnaviv_is_model_rev(gpu, 0x4000, 0x5245) ||
> > +   etnaviv_is_model_rev(gpu, 0x4000, 0x5208) ||
> > +   etnaviv_is_model_rev(gpu, 0x3000, 0x5435) ||
> > +   etnaviv_is_model_rev(gpu, 0x2200, 0x5244) ||
> > +   etnaviv_is_model_rev(gpu, 0x2100, 0x5108) ||
> > +   etnaviv_is_model_rev(gpu, 0x2000, 0x5108) ||
> > +   etnaviv_is_model_rev(gpu, 0x1500, 0x5246) ||
> > +   etnaviv_is_model_rev(gpu, 0x880, 0x5107) ||
> > +   etnaviv_is_model_rev(gpu, 0x880, 0x5106))
> > gpu->identity.varyings_count -= 1;
> >  }
> > 
> > @@ -351,7 +353,7 @@ static void etnaviv_hw_identify(struct etnaviv_gpu *gpu)
> >  * Reading these two registers on GC600 rev 0x19 result in a
> >  * unhandled fault: external abort on non-linefetch
> >

Re: [PATCH v19 18/30] drm/panfrost: Explicitly get and put drm-shmem pages

2024-01-25 Thread Steven Price
On 05/01/2024 18:46, Dmitry Osipenko wrote:
> To simplify the drm-shmem refcnt handling, we're moving away from
> the implicit get_pages() that is used by get_pages_sgt(). From now on
> drivers will have to pin pages while they use sgt. Panfrost's shrinker
> doesn't support swapping out BOs, hence pages are pinned and sgt is valid
> as long as pages' use-count > 0.
> 
> In Panfrost, panfrost_gem_mapping, which is the object representing a
> GPU mapping of a BO, owns a pages ref. This guarantees that any BO being
> mapped GPU side has its pages retained till the mapping is destroyed.
> 
> Since pages are no longer guaranteed to stay pinned for the BO lifetime,
> and MADVISE(DONT_NEED) flagging remains after the GEM handle has been
> destroyed, we need to add an extra 'is_purgeable' check in
> panfrost_gem_purge(), to make sure we're not trying to purge a BO that
> already had its pages released.
> 
> Signed-off-by: Dmitry Osipenko 

Reviewed-by: Steven Price 

Although I don't like the condition in panfrost_gem_mapping_release()
for drm_gem_shmem_put_pages() and assigning NULL to bo->sgts - it feels
very fragile. See below.

> ---
>  drivers/gpu/drm/panfrost/panfrost_gem.c   | 63 ++-
>  .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |  6 ++
>  2 files changed, 52 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c 
> b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index f268bd5c2884..7edfc12f7c1f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -35,20 +35,6 @@ static void panfrost_gem_free_object(struct drm_gem_object 
> *obj)
>*/
>   WARN_ON_ONCE(!list_empty(>mappings.list));
>  
> - if (bo->sgts) {
> - int i;
> - int n_sgt = bo->base.base.size / SZ_2M;
> -
> - for (i = 0; i < n_sgt; i++) {
> - if (bo->sgts[i].sgl) {
> - dma_unmap_sgtable(pfdev->dev, &bo->sgts[i],
> -   DMA_BIDIRECTIONAL, 0);
> - sg_free_table(&bo->sgts[i]);
> - }
> - }
> - kvfree(bo->sgts);
> - }
> -
>   drm_gem_shmem_free(&bo->base);
>  }
>  
> @@ -85,11 +71,40 @@ panfrost_gem_teardown_mapping(struct panfrost_gem_mapping 
> *mapping)
>  
>  static void panfrost_gem_mapping_release(struct kref *kref)
>  {
> - struct panfrost_gem_mapping *mapping;
> -
> - mapping = container_of(kref, struct panfrost_gem_mapping, refcount);
> + struct panfrost_gem_mapping *mapping =
> + container_of(kref, struct panfrost_gem_mapping, refcount);
> + struct panfrost_gem_object *bo = mapping->obj;
> + struct panfrost_device *pfdev = bo->base.base.dev->dev_private;
>  
>   panfrost_gem_teardown_mapping(mapping);
> +
> + /* On heap BOs, release the sgts created in the fault handler path. */
> + if (bo->sgts) {
> + int i, n_sgt = bo->base.base.size / SZ_2M;
> +
> + for (i = 0; i < n_sgt; i++) {
> + if (bo->sgts[i].sgl) {
> + dma_unmap_sgtable(pfdev->dev, &bo->sgts[i],
> +   DMA_BIDIRECTIONAL, 0);
> + sg_free_table(&bo->sgts[i]);
> + }
> + }
> + kvfree(bo->sgts);
> + }
> +
> + /* Pages ref is owned by the panfrost_gem_mapping object. We must
> +  * release our pages ref (if any), before releasing the object
> +  * ref.
> +  * Non-heap BOs acquired the pages at panfrost_gem_mapping creation
> +  * time, and heap BOs may have acquired pages if the fault handler
> +  * was called, in which case bo->sgts should be non-NULL.
> +  */
> + if (!bo->base.base.import_attach && (!bo->is_heap || bo->sgts) &&
> + bo->base.madv >= 0) {
> + drm_gem_shmem_put_pages(&bo->base);
> + bo->sgts = NULL;

The assignment of NULL here really ought to be unconditional - it isn't
a valid pointer because of the kvfree() above.

I also feel that the big condition above suggests there's a need for a
better state machine to keep track of what's going on.

But having said that I do think this series as a whole is an
improvement, it's nice to get the shrinker code generic. And sadly I
don't have an immediate idea for cleaning this up, hence my R-b.

Steve
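
For illustration, the restructured cleanup being asked for could look
roughly like this. Sketch only; had_sgts is a made-up local that remembers
the heap state before the pointer is cleared:

	bool had_sgts = !!bo->sgts;

	/* the sgts were unmapped and freed above, so always clear the
	 * pointer rather than only on the put_pages() path */
	kvfree(bo->sgts);
	bo->sgts = NULL;

	if (!bo->base.base.import_attach && (!bo->is_heap || had_sgts) &&
	    bo->base.madv >= 0)
		drm_gem_shmem_put_pages(&bo->base);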

> + }
> +
>   drm_gem_object_put(&mapping->obj->base.base);
>   panfrost_mmu_ctx_put(mapping->mmu);
>   kfree(mapping);
> @@ -125,6 +140,20 @@ int panfrost_gem_open(struct drm_gem_object *obj, struct 
> drm_file *file_priv)
>   if (!mapping)
>   return -ENOMEM;
>  
> + if (!bo->is_heap && !bo->base.base.import_attach) {
> + /* Pages ref is owned by the panfrost_gem_mapping object.
> +  * For non-heap BOs, we request pages at mapping creation
> +  * time, such that the 

Re: [PATCH v19 17/30] drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr()

2024-01-25 Thread Steven Price
On 05/01/2024 18:46, Dmitry Osipenko wrote:
> From: Boris Brezillon 
> 
> If some of the pages or the sgt allocation failed, we shouldn't release the
> pages ref we got earlier, otherwise we will end up with unbalanced
> get/put_pages() calls. We should instead leave everything in place
> and let the BO release function deal with extra cleanup when the object
> is destroyed, or let the fault handler try again next time it's called.
> 
> Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
> Cc: 
> Signed-off-by: Boris Brezillon 
> Co-developed-by: Dmitry Osipenko 
> Signed-off-by: Dmitry Osipenko 

Reviewed-by: Steven Price 

> ---
>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
> b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index bd5a0073009d..4a0b4bf03f1a 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -502,11 +502,18 @@ static int panfrost_mmu_map_fault_addr(struct 
> panfrost_device *pfdev, int as,
>   mapping_set_unevictable(mapping);
>  
>   for (i = page_offset; i < page_offset + NUM_FAULT_PAGES; i++) {
> + /* Can happen if the last fault only partially filled this
> +  * section of the pages array before failing. In that case
> +  * we skip already filled pages.
> +  */
> + if (pages[i])
> + continue;
> +
>   pages[i] = shmem_read_mapping_page(mapping, i);
>   if (IS_ERR(pages[i])) {
>   ret = PTR_ERR(pages[i]);
>   pages[i] = NULL;
> - goto err_pages;
> + goto err_unlock;
>   }
>   }
>  
> @@ -514,7 +521,7 @@ static int panfrost_mmu_map_fault_addr(struct 
> panfrost_device *pfdev, int as,
>   ret = sg_alloc_table_from_pages(sgt, pages + page_offset,
>   NUM_FAULT_PAGES, 0, SZ_2M, GFP_KERNEL);
>   if (ret)
> - goto err_pages;
> + goto err_unlock;
>  
>   ret = dma_map_sgtable(pfdev->dev, sgt, DMA_BIDIRECTIONAL, 0);
>   if (ret)
> @@ -537,8 +544,6 @@ static int panfrost_mmu_map_fault_addr(struct 
> panfrost_device *pfdev, int as,
>  
>  err_map:
>   sg_free_table(sgt);
> -err_pages:
> - drm_gem_shmem_put_pages_locked(&bo->base);
>  err_unlock:
>   dma_resv_unlock(obj->resv);
>  err_bo:



RE: Making drm_gpuvm work across gpu devices

2024-01-25 Thread Zeng, Oak
Hi Christian,

I got a few more questions inline

From: Christian König 
Sent: Wednesday, January 24, 2024 3:33 AM
To: Zeng, Oak ; Danilo Krummrich ; Dave 
Airlie ; Daniel Vetter ; Felix Kuehling 

Cc: Welty, Brian ; dri-devel@lists.freedesktop.org; 
intel...@lists.freedesktop.org; Bommu, Krishnaiah ; 
Ghimiray, Himal Prasad ; 
thomas.hellst...@linux.intel.com; Vishwanathapura, Niranjana 
; Brost, Matthew 
; Gupta, saurabhg 
Subject: Re: Making drm_gpuvm work across gpu devices

On 23.01.24 at 20:37, Zeng, Oak wrote:

[SNIP]



Yes, most APIs are per-device based.



One exception I know is actually the kfd SVM API. If you look at the svm_ioctl 
function, it is per-process based. Each kfd_process represents a process across 
N gpu devices.

Yeah and that was a big mistake in my opinion. We should really not do that 
ever again.



It needs to be said that kfd SVM represents a shared virtual address space 
across the CPU and all GPU devices on the system. This is by definition of SVM 
(shared virtual memory). This is very different from our legacy gpu *device* 
drivers, which work for only one device (i.e., if you want one device to access 
another device's memory, you have to use dma-buf export/import etc).

Exactly that thinking is what we have currently found to be a blocker for 
virtualization projects. Having SVM as a device-independent feature which 
somehow ties to the process address space turned out to be an extremely bad idea.

The background is that this only works for some use cases but not all of them.

What's working much better is to just have a mirror functionality which says 
that a range A..B of the process address space is mapped into a range C..D of 
the GPU address space.

Those ranges can then be used to implement the SVM feature required for higher 
level APIs and not something you need at the UAPI or even inside the low level 
kernel memory management.


The whole purpose of the HMM design is to create a shared address space between 
the cpu and gpu programs. See here: 
https://www.kernel.org/doc/Documentation/vm/hmm.rst. Mapping process address 
A..B to C..D of the GPU address space is exactly what is referred to as “split 
address space” in the HMM design.



When you talk about migrating memory to a device you also do this on a per 
device basis and *not* tied to the process address space. If you then get 
crappy performance because userspace gave contradicting information where to 
migrate memory then that's a bug in userspace and not something the kernel 
should try to prevent somehow.

[SNIP]


I think if you start using the same drm_gpuvm for multiple devices you

will sooner or later start to run into the same mess we have seen with

KFD, where we moved more and more functionality from the KFD to the DRM

render node because we found that a lot of the stuff simply doesn't work

correctly with a single object to maintain the state.



As I understand it, KFD is designed to work across devices. A single pseudo 
/dev/kfd device represents all hardware gpu devices. That is why during kfd 
open, many pdds (process device data) are created, each for one hardware device 
for this process.

Yes, I'm perfectly aware of that. And I can only repeat myself that I see this 
design as a rather extreme failure. And I think it's one of the reasons why 
NVidia is so dominant with Cuda.

This whole approach KFD takes was designed with the idea of extending the CPU 
process into the GPUs, but this idea only works for a few use cases and is not 
something we should apply to drivers in general.

A very good example are virtualization use cases where you end up with CPU 
address != GPU address because the VAs are actually coming from the guest VM 
and not the host process.


Are you talking about a general virtualization setup such as SRIOV, GPU device 
pass-through, or something else?

In a typical virtualization setup, a gpu driver such as xekmd or amdgpu is 
always a guest driver. In the xekmd case, xekmd doesn't need to know it is 
operating in a virtualized environment, so the virtual addresses in the driver 
are guest virtual addresses. From the kmd driver's perspective, there is no 
difference between bare metal and virtualized.

Are you talking about a special virtualized setup such as 
para-virtualized/VirGL? I need more background info to understand why you end 
up with CPU address != GPU address in SVM…


SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This should not have 
any influence on the design of the kernel UAPI.


Maybe a terminology problem here. I agree with what you said above. We have 
also achieved the SVM design with our BO-centric drivers such as i915 and xekmd.

But we are mainly talking about a system allocator here, i.e. using malloc'ed 
memory directly in a GPU program, and we want to leverage HMM. A system 
allocator can be used to implement the same SVM concept as OpenCL/Cuda/ROCm, 
but SVM can be implemented with a BO-centric driver as well.


If you want to do something similar to KFD for Xe I think you need to get 
explicit permission to do 

  1   2   >