Re: [PATCH] treewide: Fix common grammar mistake "the the"

2024-04-11 Thread Robin Murphy

On 11/04/2024 4:04 pm, Thorsten Blum wrote:

Use `find . -type f -exec sed -i 's/\<the the\>/the/g' {} +` to find all
occurrences of "the the" and replace them with a single "the".


[...]

diff --git a/arch/arm/include/asm/unwind.h b/arch/arm/include/asm/unwind.h
index d60b09a5acfc..a75da9a01f91 100644
--- a/arch/arm/include/asm/unwind.h
+++ b/arch/arm/include/asm/unwind.h
@@ -10,7 +10,7 @@
  
  #ifndef __ASSEMBLY__
  
-/* Unwind reason code according the the ARM EABI documents */

+/* Unwind reason code according the ARM EABI documents */


Well, that's clearly still not right... repeated words aren't *always* 
redundant, sometimes they're meant to be other words ;)


Thanks,
Robin.
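
A minimal sketch of the point made above, assuming a checker that only reports a doubled word and leaves the fix to a human, since the right fix may be a substitution ("according to the") and not a deletion. The helper name find_repeated_word() is hypothetical and is not part of any kernel script.

```c
#include <ctype.h>
#include <string.h>

/*
 * Return the byte offset of the second word of the first immediately
 * repeated word pair in `line` (e.g. "the the"), or -1 if there is none.
 * Words are runs of alphabetic characters; comparison is case-sensitive.
 * Hypothetical helper for illustration only.
 */
static long find_repeated_word(const char *line)
{
	const char *prev = NULL;	/* start of the previous word */
	size_t prev_len = 0;
	const char *p = line;

	while (*p) {
		/* Skip separators up to the start of the next word. */
		while (*p && !isalpha((unsigned char)*p)) {
			/* Anything other than blanks breaks adjacency. */
			if (*p != ' ' && *p != '\t')
				prev = NULL;
			p++;
		}
		if (!*p)
			break;

		const char *start = p;
		while (isalpha((unsigned char)*p))
			p++;
		size_t len = (size_t)(p - start);

		/* Report the repeat instead of rewriting the line. */
		if (prev && len == prev_len && !memcmp(prev, start, len))
			return (long)(start - line);

		prev = start;
		prev_len = len;
	}
	return -1;
}
```

Reporting the offset, and not editing in place, keeps the decision between "delete one word" and "one of the words was meant to be something else" with the reviewer.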


Re: [PATCH] drm/panthor: Don't use virt_to_pfn()

2024-03-18 Thread Robin Murphy

On 18/03/2024 2:51 pm, Steven Price wrote:

virt_to_pfn() isn't available on x86 (except for Xen), so it breaks
COMPILE_TEST builds. Avoid its use completely by instead storing the
struct page pointer allocated in panthor_device_init() and using
page_to_pfn() instead.

Signed-off-by: Steven Price 
---
  drivers/gpu/drm/panthor/panthor_device.c | 10 ++
  drivers/gpu/drm/panthor/panthor_device.h |  2 +-
  2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index 69deb8e17778..3c30da03fa48 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -154,6 +154,7 @@ int panthor_device_init(struct panthor_device *ptdev)
  {
struct resource *res;
struct page *p;
+   u32 *dummy_page_virt;
int ret;
  
  	ptdev->coherent = device_get_dma_attr(ptdev->base.dev) == DEV_DMA_COHERENT;

@@ -172,9 +173,10 @@ int panthor_device_init(struct panthor_device *ptdev)
if (!p)
return -ENOMEM;
  
-	ptdev->pm.dummy_latest_flush = page_address(p);

+   ptdev->pm.dummy_latest_flush = p;
+   dummy_page_virt = page_address(p);
ret = drmm_add_action_or_reset(&ptdev->base, panthor_device_free_page,
-  ptdev->pm.dummy_latest_flush);
+  dummy_page_virt);


Nit: I was about to say I'd be inclined to switch the callback to 
__free_page() instead, but then I realise there's no real need to be 
reinventing that in the first place:


dummy_page_virt = (void *)devm_get_free_pages(ptdev->base.dev,
GFP_KERNEL | __GFP_ZERO, 0);
if (!dummy_page_virt)
return -ENOMEM;

ptdev->pm.dummy_latest_flush = virt_to_page(dummy_page_virt);

Cheers,
Robin.


if (ret)
return ret;
  
@@ -184,7 +186,7 @@ int panthor_device_init(struct panthor_device *ptdev)

 * happens while the dummy page is mapped. Zero cannot be used because
 * that means 'always flush'.
 */
-   *ptdev->pm.dummy_latest_flush = 1;
+   *dummy_page_virt = 1;
  
  	INIT_WORK(&ptdev->reset.work, panthor_device_reset_work);

ptdev->reset.wq = alloc_ordered_workqueue("panthor-reset-wq", 0);
@@ -353,7 +355,7 @@ static vm_fault_t panthor_mmio_vm_fault(struct vm_fault *vmf)
if (active)
pfn = __phys_to_pfn(ptdev->phys_addr + CSF_GPU_LATEST_FLUSH_ID);
else
-   pfn = virt_to_pfn(ptdev->pm.dummy_latest_flush);
+   pfn = page_to_pfn(ptdev->pm.dummy_latest_flush);
break;
  
  	default:

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 51c9d61b6796..c84c27dcc92c 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -160,7 +160,7 @@ struct panthor_device {
 * Used to replace the real LATEST_FLUSH page when the GPU
 * is suspended.
 */
-   u32 *dummy_latest_flush;
+   struct page *dummy_latest_flush;
} pm;
  };
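
A toy userspace model of the approach in this patch, assuming a flat page array and a linear mapping. It shows the device keeping the page itself, taking the virtual address once at init time, and deriving the PFN from the page only where the fault handler needs it. All toy_* names are hypothetical stand-ins; this is not the kernel's struct page, page_address() or page_to_pfn().

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_NR_PAGES	4

/* Toy stand-ins for the kernel's page array and its linear mapping. */
struct toy_page { int unused; };
static struct toy_page toy_mem_map[TOY_NR_PAGES];
static uint32_t toy_memory[TOY_NR_PAGES << (TOY_PAGE_SHIFT - 2)];

/* In this flat model the PFN is just the index into the page array. */
static unsigned long toy_page_to_pfn(const struct toy_page *p)
{
	return (unsigned long)(p - toy_mem_map);
}

static void *toy_page_address(const struct toy_page *p)
{
	return (char *)toy_memory + (toy_page_to_pfn(p) << TOY_PAGE_SHIFT);
}

struct toy_device {
	struct toy_page *dummy_latest_flush;	/* the page, not its VA */
};

static void toy_device_init(struct toy_device *dev, struct toy_page *p)
{
	/* The virtual address is only needed here, once. */
	uint32_t *virt = toy_page_address(p);

	dev->dummy_latest_flush = p;
	*virt = 1;	/* zero would mean 'always flush' */
}

/* The "business end": no virtual-to-PFN conversion is required. */
static unsigned long toy_fault_pfn(const struct toy_device *dev)
{
	return toy_page_to_pfn(dev->dummy_latest_flush);
}
```

Storing the page keeps the fault path independent of any helper that maps a virtual address back to a PFN, which is the part that was not portable in the original code.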
  


Re: [PATCH] drm/panthor: Fix the CONFIG_PM=n case

2024-03-18 Thread Robin Murphy

On 18/03/2024 1:49 pm, Steven Price wrote:

On 18/03/2024 13:08, Boris Brezillon wrote:

On Mon, 18 Mar 2024 11:31:05 +
Steven Price  wrote:


On 18/03/2024 08:58, Boris Brezillon wrote:

Putting a hard dependency on CONFIG_PM is not possible because of a
circular dependency issue, and it's actually not desirable either. In
order to support this use case, we forcibly resume at init time, and
suspend at unplug time.

Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-kbuild-all/202403031944.eoimq8wk-...@intel.com/
Signed-off-by: Boris Brezillon 


Reviewed-by: Steven Price 


---
Tested by faking CONFIG_PM=n in the driver (basically commenting
all pm_runtime calls, and making the panthor_device_suspend/resume()
calls unconditional in the panthor_device_unplug/init() path) since
CONFIG_ARCH_ROCKCHIP selects CONFIG_PM. Seems to work fine, but I
can't be 100% sure this will work correctly on a platform that has
CONFIG_PM=n.


The same - I can't test this properly :(

Note that the other option (which AFAICT doesn't cause any problems) is
to "select PM" rather than depend on it - AIUI the 'select' dependency
is considered in the opposite direction by kconfig so won't cause the
dependency loop.


Doesn't seem to work with COMPILE_TEST though? I mean, we need
something like

depends on ARM || ARM64 || (COMPILE_TEST && PM)
...
select PM

but kconfig doesn't like that


Why do we need the "&& PM" part? Just:

depends on ARM || ARM64 || COMPILE_TEST
...
select PM

Or at least that appears to work for me.


drivers/gpu/drm/panthor/Kconfig:3:error: recursive dependency detected!
drivers/gpu/drm/panthor/Kconfig:3:  symbol DRM_PANTHOR depends on PM
kernel/power/Kconfig:183:   symbol PM is selected by DRM_PANTHOR

which is why I initially went for a depends on PM



Of course if there is actually anyone who has a
platform which can be built !CONFIG_PM then that won't help. But the
inability of anyone to actually properly test this configuration does
worry me a little.


Well, as long as it doesn't regress the PM behavior, I think I'm happy
to take the risk. Worst case scenario, someone complains that this is
not working properly when they do the !PM bringup :-).


Indeed, I've no objection to this patch - although I really should have
compile-tested it as Robin pointed out ;)

But one other thing I've noticed when compile testing it - we don't
appear to have fully fixed the virt_to_pfn() problem. On x86 with
COMPILE_TEST I still get an error. Looking at the code it appears that
virt_to_pfn() isn't available on x86... it overrides asm/page.h and
doesn't provide a definition. The definition on x86 is hiding in
asm/xen/page.h.

Outside of arch code it's only drivers/xen that currently uses that
function. So I guess it's probably best to do a
PFN_DOWN(virt_to_phys(...)) instead. Or look to fix x86 :)


FWIW from a quick look it might be cleaner to store the struct page 
pointer for the dummy page - especially since the VA only seems to be 
used once in panthor_device_init() anyway - then use page_to_pfn() at 
the business end.


Cheers,
Robin.


Re: [PATCH] drm/panthor: Fix the CONFIG_PM=n case

2024-03-18 Thread Robin Murphy

On 18/03/2024 8:58 am, Boris Brezillon wrote:

Putting a hard dependency on CONFIG_PM is not possible because of a
circular dependency issue, and it's actually not desirable either. In
order to support this use case, we forcibly resume at init time, and
suspend at unplug time.

Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-kbuild-all/202403031944.eoimq8wk-...@intel.com/
Signed-off-by: Boris Brezillon 
---
Tested by faking CONFIG_PM=n in the driver (basically commenting
all pm_runtime calls, and making the panthor_device_suspend/resume()
calls unconditional in the panthor_device_unplug/init() path) since
CONFIG_ARCH_ROCKCHIP selects CONFIG_PM. Seems to work fine, but I
can't be 100% sure this will work correctly on a platform that has
CONFIG_PM=n.
---
  drivers/gpu/drm/panthor/panthor_device.c | 13 +++--
  drivers/gpu/drm/panthor/panthor_drv.c|  4 +++-
  2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index 69deb8e17778..ba7aedbb4931 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -87,6 +87,10 @@ void panthor_device_unplug(struct panthor_device *ptdev)
pm_runtime_dont_use_autosuspend(ptdev->base.dev);
pm_runtime_put_sync_suspend(ptdev->base.dev);
  
+	/* If PM is disabled, we need to call the suspend handler manually. */

+   if (!IS_ENABLED(CONFIG_PM))
+   panthor_device_suspend(ptdev->base.dev);
+
/* Report the unplug operation as done to unblock concurrent
 * panthor_device_unplug() callers.
 */
@@ -218,6 +222,13 @@ int panthor_device_init(struct panthor_device *ptdev)
if (ret)
return ret;
  
+	/* If PM is disabled, we need to call panthor_device_resume() manually. */

+   if (!IS_ENABLED(CONFIG_PM)) {
+   ret = panthor_device_resume(ptdev->base.dev);
+   if (ret)
+   return ret;
+   }
+
ret = panthor_gpu_init(ptdev);
if (ret)
goto err_rpm_put;
@@ -402,7 +413,6 @@ int panthor_device_mmap_io(struct panthor_device *ptdev, struct vm_area_struct *
return 0;
  }
  
-#ifdef CONFIG_PM

  int panthor_device_resume(struct device *dev)
  {
struct panthor_device *ptdev = dev_get_drvdata(dev);
@@ -547,4 +557,3 @@ int panthor_device_suspend(struct device *dev)
mutex_unlock(&ptdev->pm.mmio_lock);
return ret;
  }
-#endif
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index ff484506229f..2ea6a9f436db 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1407,17 +1407,19 @@ static const struct of_device_id dt_match[] = {
  };
  MODULE_DEVICE_TABLE(of, dt_match);
  
+#ifdef CONFIG_PM


This #ifdef isn't necessary, and in fact will break the !PM build - 
pm_ptr() already takes care of allowing the compiler to optimise out the 
ops structure itself without any further annotations.


Thanks,
Robin.


  static DEFINE_RUNTIME_DEV_PM_OPS(panthor_pm_ops,
 panthor_device_suspend,
 panthor_device_resume,
 NULL);
+#endif
  
  static struct platform_driver panthor_driver = {

.probe = panthor_probe,
.remove_new = panthor_remove,
.driver = {
.name = "panthor",
-   .pm = &panthor_pm_ops,
+   .pm = pm_ptr(&panthor_pm_ops),
.of_match_table = dt_match,
},
  };
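
A minimal userspace sketch of why the #ifdef is unnecessary, assuming a build-time constant TOY_CONFIG_PM standing in for CONFIG_PM. The macros have the same shape as the kernel's PTR_IF() and pm_ptr() helpers; every toy_* name is hypothetical. The ops structure is defined unconditionally and the conditional pointer is what lets the compiler discard it when PM is off. Wrapping the definition in an #ifdef would instead leave the name undeclared at the point where the pointer macro still mentions it.

```c
#include <stddef.h>

/* Hypothetical build-time switch standing in for CONFIG_PM. */
#ifndef TOY_CONFIG_PM
#define TOY_CONFIG_PM 0
#endif

/* Same shape as the kernel's PTR_IF() and pm_ptr() helpers. */
#define TOY_PTR_IF(cond, ptr)	((cond) ? (ptr) : NULL)
#define toy_pm_ptr(ptr)		TOY_PTR_IF(TOY_CONFIG_PM, (ptr))

struct toy_pm_ops {
	int (*suspend)(void);
	int (*resume)(void);
};

static int toy_suspend(void) { return 0; }
static int toy_resume(void) { return 0; }

/* Defined unconditionally: no #ifdef around the ops or the callbacks. */
static const struct toy_pm_ops toy_ops = {
	.suspend = toy_suspend,
	.resume = toy_resume,
};

struct toy_driver {
	const char *name;
	const struct toy_pm_ops *pm;
};

/* With TOY_CONFIG_PM == 0 this is NULL and toy_ops becomes dead data. */
static const struct toy_driver toy_driver = {
	.name = "toy",
	.pm = toy_pm_ptr(&toy_ops),
};
```

Because the callbacks stay visible to the compiler in both configurations, they are also still available to be called by hand in the init and unplug paths, which is what the patch does when PM is disabled.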


Re: [PATCH 3/3] drm/panthor: Fix undefined panthor_device_suspend/resume symbol issue

2024-03-11 Thread Robin Murphy

On 2024-03-11 1:22 pm, Boris Brezillon wrote:

On Mon, 11 Mar 2024 13:11:28 +
Robin Murphy  wrote:


On 2024-03-11 11:52 am, Boris Brezillon wrote:

On Mon, 11 Mar 2024 13:49:56 +0200
Jani Nikula  wrote:
   

On Mon, 11 Mar 2024, Boris Brezillon  wrote:

On Mon, 11 Mar 2024 13:05:01 +0200
Jani Nikula  wrote:
 

This breaks the config for me:

  SYNC    include/config/auto.conf.cmd
  GEN     Makefile
drivers/iommu/Kconfig:14:error: recursive dependency detected!
drivers/iommu/Kconfig:14:   symbol IOMMU_SUPPORT is selected by DRM_PANTHOR
drivers/gpu/drm/panthor/Kconfig:3:  symbol DRM_PANTHOR depends on PM
kernel/power/Kconfig:183:   symbol PM is selected by PM_SLEEP
kernel/power/Kconfig:117:   symbol PM_SLEEP depends on HIBERNATE_CALLBACKS
kernel/power/Kconfig:35:    symbol HIBERNATE_CALLBACKS is selected by XEN_SAVE_RESTORE
arch/x86/xen/Kconfig:67:    symbol XEN_SAVE_RESTORE depends on XEN
arch/x86/xen/Kconfig:6: symbol XEN depends on PARAVIRT
arch/x86/Kconfig:781:   symbol PARAVIRT is selected by HYPERV
drivers/hv/Kconfig:5:   symbol HYPERV depends on X86_LOCAL_APIC
arch/x86/Kconfig:1106:  symbol X86_LOCAL_APIC depends on X86_UP_APIC
arch/x86/Kconfig:1081:  symbol X86_UP_APIC prompt is visible depending on PCI_MSI
drivers/pci/Kconfig:39: symbol PCI_MSI is selected by AMD_IOMMU
drivers/iommu/amd/Kconfig:3:    symbol AMD_IOMMU depends on IOMMU_SUPPORT


Uh, I guess we want a "depends on IOMMU_SUPPORT" instead of "select
IOMMU_SUPPORT" in panthor then.


That works for me.


Let's revert the faulty commit first. We'll see if Steve has a
different solution for the original issue.


FWIW, the reasoning in the offending commit seems incredibly tenuous.
There are far more practical reasons for building an arm/arm64 kernel
without PM - for debugging or whatever, and where one may even still
want a usable GPU, let alone just a non-broken build - than there are
for building this driver for x86. Using pm_ptr() is trivial, and if you
want to support COMPILE_TEST then there's really no justifiable excuse
not to.


The problem is not just about using pm_ptr(), but also making sure
panthor_device_resume/suspend() are called in the init/unplug
path when !PM, as I don't think the PM helpers automate that for us. I
was just aiming for a simple fix that wouldn't force me to test the !PM
case...


Fair enough, at worst we could always have a runtime check and refuse to 
probe in conditions we don't think are worth the bother of implementing 
fully-functional support for. However if we want to make an argument for 
only supporting "realistic" configs at build time then that is an 
argument for dropping COMPILE_TEST as well.


Thanks,
Robin.
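
A rough sketch of the two options discussed above, assuming a build-time constant DEMO_HAVE_PM in place of IS_ENABLED(CONFIG_PM). One probe function refuses to bind when runtime PM is unavailable; the other powers the device up by hand. The demo_* names are hypothetical and only model the control flow, not the driver.

```c
#include <errno.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_PM). */
#ifndef DEMO_HAVE_PM
#define DEMO_HAVE_PM 0
#endif

struct demo_dev {
	int powered;
};

static int demo_resume(struct demo_dev *dev)
{
	dev->powered = 1;
	return 0;
}

/* Option 1: refuse to probe in a configuration we do not support. */
static int demo_probe_strict(struct demo_dev *dev)
{
	if (!DEMO_HAVE_PM)
		return -ENODEV;

	dev->powered = 0;	/* runtime PM would resume on demand */
	return 0;
}

/* Option 2: without runtime PM, call the resume handler manually. */
static int demo_probe_manual(struct demo_dev *dev)
{
	if (!DEMO_HAVE_PM)
		return demo_resume(dev);

	dev->powered = 0;	/* runtime PM would resume on demand */
	return 0;
}
```

Either way the check is an ordinary C condition on a constant, so both branches are always compiled and type-checked, which a preprocessor conditional would not give.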


Re: [PATCH 3/3] drm/panthor: Fix undefined panthor_device_suspend/resume symbol issue

2024-03-11 Thread Robin Murphy

On 2024-03-11 11:52 am, Boris Brezillon wrote:

On Mon, 11 Mar 2024 13:49:56 +0200
Jani Nikula  wrote:


On Mon, 11 Mar 2024, Boris Brezillon  wrote:

On Mon, 11 Mar 2024 13:05:01 +0200
Jani Nikula  wrote:
  

This breaks the config for me:

  SYNC    include/config/auto.conf.cmd
  GEN     Makefile
drivers/iommu/Kconfig:14:error: recursive dependency detected!
drivers/iommu/Kconfig:14:   symbol IOMMU_SUPPORT is selected by DRM_PANTHOR
drivers/gpu/drm/panthor/Kconfig:3:  symbol DRM_PANTHOR depends on PM
kernel/power/Kconfig:183:   symbol PM is selected by PM_SLEEP
kernel/power/Kconfig:117:   symbol PM_SLEEP depends on HIBERNATE_CALLBACKS
kernel/power/Kconfig:35:    symbol HIBERNATE_CALLBACKS is selected by XEN_SAVE_RESTORE
arch/x86/xen/Kconfig:67:    symbol XEN_SAVE_RESTORE depends on XEN
arch/x86/xen/Kconfig:6: symbol XEN depends on PARAVIRT
arch/x86/Kconfig:781:   symbol PARAVIRT is selected by HYPERV
drivers/hv/Kconfig:5:   symbol HYPERV depends on X86_LOCAL_APIC
arch/x86/Kconfig:1106:  symbol X86_LOCAL_APIC depends on X86_UP_APIC
arch/x86/Kconfig:1081:  symbol X86_UP_APIC prompt is visible depending on PCI_MSI
drivers/pci/Kconfig:39: symbol PCI_MSI is selected by AMD_IOMMU
drivers/iommu/amd/Kconfig:3:    symbol AMD_IOMMU depends on IOMMU_SUPPORT


Uh, I guess we want a "depends on IOMMU_SUPPORT" instead of "select
IOMMU_SUPPORT" in panthor then.


That works for me.


Let's revert the faulty commit first. We'll see if Steve has a
different solution for the original issue.


FWIW, the reasoning in the offending commit seems incredibly tenuous. 
There are far more practical reasons for building an arm/arm64 kernel 
without PM - for debugging or whatever, and where one may even still 
want a usable GPU, let alone just a non-broken build - than there are 
for building this driver for x86. Using pm_ptr() is trivial, and if you 
want to support COMPILE_TEST then there's really no justifiable excuse 
not to.


Thanks,
Robin.


Re: [PATCH 10/10] ACPI: IORT: Allow COMPILE_TEST of IORT

2023-11-30 Thread Robin Murphy

On 29/11/2023 12:48 am, Jason Gunthorpe wrote:

The arm-smmu driver can COMPILE_TEST on x86, so expand this to also
enable the IORT code so it can be COMPILE_TEST'd too.

Signed-off-by: Jason Gunthorpe 
---
  drivers/acpi/Kconfig| 2 --
  drivers/acpi/Makefile   | 2 +-
  drivers/acpi/arm64/Kconfig  | 1 +
  drivers/acpi/arm64/Makefile | 2 +-
  drivers/iommu/Kconfig   | 1 +
  5 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index f819e760ff195a..3b7f77b227d13a 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -541,9 +541,7 @@ config ACPI_PFRUT
  To compile the drivers as modules, choose M here:
  the modules will be called pfr_update and pfr_telemetry.
  
-if ARM64

  source "drivers/acpi/arm64/Kconfig"
-endif
  
  config ACPI_PPTT

bool
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index eaa09bf52f1760..4e77ae37b80726 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -127,7 +127,7 @@ obj-y   += pmic/
  video-objs+= acpi_video.o video_detect.o
  obj-y += dptf/
  
-obj-$(CONFIG_ARM64)		+= arm64/

+obj-y  += arm64/
  
  obj-$(CONFIG_ACPI_VIOT)		+= viot.o
  
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig

index b3ed6212244c1e..537d49d8ace69e 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -11,6 +11,7 @@ config ACPI_GTDT
  
  config ACPI_AGDI

bool "Arm Generic Diagnostic Dump and Reset Device Interface"
+   depends on ARM64
depends on ARM_SDE_INTERFACE
help
  Arm Generic Diagnostic Dump and Reset Device Interface (AGDI) is
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 143debc1ba4a9d..71d0e635599390 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -4,4 +4,4 @@ obj-$(CONFIG_ACPI_IORT) += iort.o
  obj-$(CONFIG_ACPI_GTDT)   += gtdt.o
  obj-$(CONFIG_ACPI_APMT)   += apmt.o
  obj-$(CONFIG_ARM_AMBA)+= amba.o
-obj-y  += dma.o init.o
+obj-$(CONFIG_ARM64)+= dma.o init.o
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7673bb82945b6c..309378e76a9bc9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -318,6 +318,7 @@ config ARM_SMMU
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU if ARM
+   select ACPI_IORT if ACPI


This is incomplete. If you want the driver to be responsible for 
enabling its own probing mechanisms then you need to select OF and ACPI 
too. And all the other drivers which probe from IORT should surely also 
select ACPI_IORT, and thus ACPI as well. And maybe the PCI core should 
as well because there are general properties of PCI host bridges and 
devices described in there?


But of course that's clearly backwards nonsense, because drivers do not 
and should not do that, so this change is not appropriate either. The 
IORT code may not be *functionally* arm64-specific, but logically it 
very much is - it serves a specification which is tied to the Arm 
architecture and describes Arm-architecture-specific concepts, within 
the wider context of ACPI on Arm itself only supporting AArch64, and not 
AArch32. It's also not like it's driver code that someone might use as 
an example and copy to a similar driver which could then run on 
different architectures where a latent theoretical bug becomes real. 
There's really no practical value to be had from compile-testing IORT.


Thanks,
Robin.


Re: [PATCH 06/10] iommu: Replace iommu_device_lock with iommu_probe_device_lock

2023-11-29 Thread Robin Murphy

On 29/11/2023 12:48 am, Jason Gunthorpe wrote:

The iommu_device_lock protects the iommu_device_list which is only read by
iommu_ops_from_fwnode().

This is now always called under the iommu_probe_device_lock, so we don't
need to double lock the linked list. Use the iommu_probe_device_lock on
the write side too.


Please no, iommu_probe_device_lock() is a hack and we need to remove the 
*reason* it exists at all. And IMO just because iommu_present() is 
deprecated doesn't justify making it look utterly nonsensical - in no 
way does that have any relationship with probe_device, much less need to 
serialise against it!


Thanks,
Robin.


Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/iommu.c | 30 +-
  1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 08f29a1dfcd5f8..9557c2ec08d915 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -146,7 +146,6 @@ struct iommu_group_attribute iommu_group_attr_##_name = \
container_of(_kobj, struct iommu_group, kobj)
  
  static LIST_HEAD(iommu_device_list);

-static DEFINE_SPINLOCK(iommu_device_lock);
  
  static const struct bus_type * const iommu_buses[] = {

&platform_bus_type,
@@ -262,9 +261,9 @@ int iommu_device_register(struct iommu_device *iommu,
if (hwdev)
iommu->fwnode = dev_fwnode(hwdev);
  
-	spin_lock(&iommu_device_lock);
+   mutex_lock(&iommu_probe_device_lock);
list_add_tail(&iommu->list, &iommu_device_list);
-   spin_unlock(&iommu_device_lock);
+   mutex_unlock(&iommu_probe_device_lock);
  
  	for (int i = 0; i < ARRAY_SIZE(iommu_buses) && !err; i++)

err = bus_iommu_probe(iommu_buses[i]);
@@ -279,9 +278,9 @@ void iommu_device_unregister(struct iommu_device *iommu)
for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++)
bus_for_each_dev(iommu_buses[i], NULL, iommu, remove_iommu_group);
  
-	spin_lock(&iommu_device_lock);
+   mutex_lock(&iommu_probe_device_lock);
list_del(&iommu->list);
-   spin_unlock(&iommu_device_lock);
+   mutex_unlock(&iommu_probe_device_lock);
  
  	/* Pairs with the alloc in generic_single_device_group() */

iommu_group_put(iommu->singleton_group);
@@ -316,9 +315,9 @@ int iommu_device_register_bus(struct iommu_device *iommu,
if (err)
return err;
  
-	spin_lock(&iommu_device_lock);
+   mutex_lock(&iommu_probe_device_lock);
list_add_tail(&iommu->list, &iommu_device_list);
-   spin_unlock(&iommu_device_lock);
+   mutex_unlock(&iommu_probe_device_lock);
  
  	err = bus_iommu_probe(bus);

if (err) {
@@ -2033,9 +2032,9 @@ bool iommu_present(const struct bus_type *bus)
  
  	for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) {

if (iommu_buses[i] == bus) {
-   spin_lock(&iommu_device_lock);
+   mutex_lock(&iommu_probe_device_lock);
ret = !list_empty(&iommu_device_list);
-   spin_unlock(&iommu_device_lock);
+   mutex_unlock(&iommu_probe_device_lock);
}
}
return ret;
@@ -2980,17 +2979,14 @@ EXPORT_SYMBOL_GPL(iommu_default_passthrough);
  
  const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)

  {
-   const struct iommu_ops *ops = NULL;
struct iommu_device *iommu;
  
-	spin_lock(&iommu_device_lock);
+   lockdep_assert_held(&iommu_probe_device_lock);
+
list_for_each_entry(iommu, &iommu_device_list, list)
-   if (iommu->fwnode == fwnode) {
-   ops = iommu->ops;
-   break;
-   }
-   spin_unlock(&iommu_device_lock);
-   return ops;
+   if (iommu->fwnode == fwnode)
+   return iommu->ops;
+   return NULL;
  }
  
  int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
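
A single-threaded toy model of the arrangement the reply argues for keeping, assuming a registry list guarded by its own dedicated lock. The presence check and the lookup each take only that lock, so neither has any relationship with a probe-time lock. The toy_* names and the ownership-tracking "lock" are hypothetical; this is not the IOMMU core's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded stand-in for a real lock; it only tracks ownership. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_iommu {
	const void *fwnode;		/* lookup key */
	const char *ops;		/* stand-in for an ops pointer */
	struct toy_iommu *next;
};

static struct toy_iommu *toy_device_list;
static struct toy_lock toy_device_lock;	/* protects toy_device_list only */

static void toy_device_register(struct toy_iommu *iommu)
{
	toy_lock_acquire(&toy_device_lock);
	iommu->next = toy_device_list;
	toy_device_list = iommu;
	toy_lock_release(&toy_device_lock);
}

/* Presence check: needs the list lock and nothing else. */
static bool toy_present(void)
{
	bool ret;

	toy_lock_acquire(&toy_device_lock);
	ret = toy_device_list != NULL;
	toy_lock_release(&toy_device_lock);
	return ret;
}

/* Lookup: copy the result out while the list lock is held. */
static const char *toy_ops_from_fwnode(const void *fwnode)
{
	const char *ops = NULL;
	struct toy_iommu *iommu;

	toy_lock_acquire(&toy_device_lock);
	for (iommu = toy_device_list; iommu; iommu = iommu->next) {
		if (iommu->fwnode == fwnode) {
			ops = iommu->ops;
			break;
		}
	}
	toy_lock_release(&toy_device_lock);
	return ops;
}
```

With a dedicated lock, the scope of what is protected is visible from the lock's name alone, and callers do not need to hold anything before calling the lookup.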


[PATCH v3] drm/mediatek: Stop using iommu_present()

2023-11-23 Thread Robin Murphy

Remove the pointless check. If an IOMMU is providing transparent DMA API
ops for any device(s) we care about, the DT code will have enforced the
appropriate probe ordering already. And if the IOMMU *is* entirely
absent, then attempting to go ahead with CMA and either succeeding or
failing decisively seems more useful than deferring forever.

Signed-off-by: Robin Murphy 
---

I realised that last time I sent this I probably should have CCed a
wider audience of reviewers, so here's one with an updated commit
message as well to make the resend more worthwhile.

 drivers/gpu/drm/mediatek/mtk_drm_drv.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index 2dfaa613276a..48581da51857 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -5,7 +5,6 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -608,9 +607,6 @@ static int mtk_drm_bind(struct device *dev)
struct drm_device *drm;
int ret, i;
 
-   if (!iommu_present(&platform_bus_type))
-   return -EPROBE_DEFER;
-
pdev = of_find_device_by_node(private->mutex_node);
if (!pdev) {
dev_err(dev, "Waiting for disp-mutex device %pOF\n",
-- 
2.39.2.101.g768bb238c484.dirty



Re: [PATCH v2 6/8] dt-bindings: reserved-memory: Add secure CMA reserved memory range

2023-11-14 Thread Robin Murphy

On 13/11/2023 6:37 am, Yong Wu (吴勇) wrote:
[...]

+properties:
+  compatible:
+const: secure_cma_region


Still wrong compatible. Look at other bindings - there is no underscore
anywhere. Look at other reserved memory bindings especially.

Also, CMA is a Linux thingy, so either not suitable for bindings at all,
or you need Linux specific compatible. I don't quite get why do you even
put CMA there - adding Linux specific stuff will get obvious pushback...


Thanks. I will change to: secure-region. Is this ok?


No, the previous discussion went off in entirely the wrong direction. To 
reiterate, the point of the binding is not to describe the expected 
usage of the thing nor the general concept of the thing, but to describe 
the actual thing itself. There are any number of different ways software 
may interact with a "secure region", so that is meaningless as a 
compatible. It needs to describe *this* secure memory interface offered 
by *this* TEE, so that software knows that to use it requires making 
those particular SiP calls with that particular UUID etc.


Thanks,
Robin.


Re: [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU

2023-09-29 Thread Robin Murphy

On 29/09/2023 4:45 pm, Will Deacon wrote:

On Mon, Sep 25, 2023 at 06:54:42PM +0100, Robin Murphy wrote:

On 2023-04-10 19:52, Dmitry Baryshkov wrote:

If the Adreno SMMU is dma-coherent, allocation will fail unless we
disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the
coherent SMMUs (like we have on sm8350 platform).


Hmm, but is it right that it should fail in the first place? The fact is
that if the SMMU is coherent then walks *will* be outer-WBWA, so I honestly
can't see why the io-pgtable code is going out of its way to explicitly
reject a request to give them the same attribute it's already giving them
anyway :/

Even if the original intent was for the quirk to have an over-specific
implication of representing inner-NC as well, that hardly seems useful if
what we've ended up with in practice is a nonsensical-looking check in one
place and then a weird hacky bodge in another purely to work around it.

Does anyone know a good reason why this is the way it is?


I think it was mainly because the quirk doesn't make sense for a coherent
page-table walker and we could in theory use that bit for something else
in that case.


Yuck, even if we did want some horrible notion of quirks being 
conditional on parts of the config rather than just the format, then the 
users would need to be testing for the same condition as the pagetable 
code itself (i.e. cfg->coherent_walk), rather than hoping some other 
property of something else indirectly reflects the right information - 
e.g. there'd be no hope of backporting this particular bodge before 5.19 
where the old iommu_capable(IOMMU_CAP_CACHE_COHERENCY) always returned 
true, and in future we could conceivably support coherent SMMUs being 
configured for non-coherent walks on a per-domain basis.


Furthermore, if we did overload a flag to have multiple meanings, then 
we'd have no way of knowing which one the caller was actually expecting, 
thus the illusion of being able to validate calls in the meantime isn't 
necessarily as helpful as it seems, particularly in a case where the 
"wrong" interpretation would be to have no effect anyway. Mostly though 
I'd hope that if we ever got anywhere near the point of running out of 
quirk bits we'd have already realised that it's time for a better 
interface :(


Based on that, I think that when I do get round to needing to touch this 
code, I'll propose just streamlining the whole quirk.


Cheers,
Robin.
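
A toy model of the check being questioned, assuming two booleans are enough to describe a walk: whether it is coherent and whether the outer-WBWA quirk was requested. The first function mirrors the behaviour described in the thread (the quirk is rejected when the walk is already coherent); the second treats the quirk as redundant in that case. The toy_* names and the quirk bit value are hypothetical and are not taken from io-pgtable.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_QUIRK_OUTER_WBWA	(1UL << 0)	/* hypothetical bit value */

enum toy_rgn { TOY_RGN_NC, TOY_RGN_WBWA };

struct toy_cfg {
	bool coherent_walk;
	unsigned long quirks;
};

struct toy_tcr {
	enum toy_rgn irgn;	/* inner cacheability of table walks */
	enum toy_rgn orgn;	/* outer cacheability of table walks */
};

/* Behaviour questioned in the thread: coherent walk plus quirk fails. */
static int toy_walk_attrs_strict(const struct toy_cfg *cfg, struct toy_tcr *tcr)
{
	bool quirk = cfg->quirks & TOY_QUIRK_OUTER_WBWA;

	if (cfg->coherent_walk) {
		if (quirk)
			return -EINVAL;
		tcr->irgn = TOY_RGN_WBWA;
		tcr->orgn = TOY_RGN_WBWA;
	} else {
		tcr->irgn = TOY_RGN_NC;
		tcr->orgn = quirk ? TOY_RGN_WBWA : TOY_RGN_NC;
	}
	return 0;
}

/* Alternative: a coherent walk is outer-WBWA anyway, so accept the quirk. */
static int toy_walk_attrs_lenient(const struct toy_cfg *cfg, struct toy_tcr *tcr)
{
	bool quirk = cfg->quirks & TOY_QUIRK_OUTER_WBWA;

	tcr->irgn = cfg->coherent_walk ? TOY_RGN_WBWA : TOY_RGN_NC;
	tcr->orgn = (cfg->coherent_walk || quirk) ? TOY_RGN_WBWA : TOY_RGN_NC;
	return 0;
}
```

In the lenient form the caller never has to guess the walk's coherency from some other property before deciding whether to set the quirk, since both choices lead to the same outer attribute.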


Re: [PATCH] drm/msm/a6xx: don't set IO_PGTABLE_QUIRK_ARM_OUTER_WBWA with coherent SMMU

2023-09-25 Thread Robin Murphy

On 2023-04-10 19:52, Dmitry Baryshkov wrote:

If the Adreno SMMU is dma-coherent, allocation will fail unless we
disable IO_PGTABLE_QUIRK_ARM_OUTER_WBWA. Skip setting this quirk for the
coherent SMMUs (like we have on sm8350 platform).


Hmm, but is it right that it should fail in the first place? The fact is 
that if the SMMU is coherent then walks *will* be outer-WBWA, so I 
honestly can't see why the io-pgtable code is going out of its way to 
explicitly reject a request to give them the same attribute it's already 
giving them anyway :/


Even if the original intent was for the quirk to have an over-specific 
implication of representing inner-NC as well, that hardly seems useful 
if what we've ended up with in practice is a nonsensical-looking check 
in one place and then a weird hacky bodge in another purely to work 
around it.


Does anyone know a good reason why this is the way it is?

[ just came across this code in the tree while trying to figure out what 
to do with iommu_set_pgtable_quirks()... ]


Thanks,
Robin.


Fixes: 54af0ceb7595 ("arm64: dts: qcom: sm8350: add GPU, GMU, GPU CC and SMMU nodes")
Reported-by: David Heidelberg 
Signed-off-by: Dmitry Baryshkov 
---
  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 2942d2548ce6..f74495dcbd96 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1793,7 +1793,8 @@ a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
 * This allows GPU to set the bus attributes required to use system
 * cache on behalf of the iommu page table walker.
 */
-   if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice))
+   if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice) &&
+   !device_iommu_capable(&pdev->dev, IOMMU_CAP_CACHE_COHERENCY))
quirks |= IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
  
  	return adreno_iommu_create_address_space(gpu, pdev, quirks);


Re: [PATCH 8/9] dt-bindings: reserved-memory: MediaTek: Add reserved memory for SVP

2023-09-12 Thread Robin Murphy

On 12/09/2023 4:53 pm, Rob Herring wrote:

On Tue, Sep 12, 2023 at 11:13:50AM +0100, Robin Murphy wrote:

On 12/09/2023 9:28 am, Krzysztof Kozlowski wrote:

On 12/09/2023 08:16, Yong Wu (吴勇) wrote:

Hi Rob,

Thanks for your review.

On Mon, 2023-09-11 at 10:44 -0500, Rob Herring wrote:


   On Mon, Sep 11, 2023 at 10:30:37AM +0800, Yong Wu wrote:

This adds the binding for describing a CMA memory for MediaTek SVP(Secure
Video Path).


CMA is a Linux thing. How is this related to CMA?


Signed-off-by: Yong Wu
---
   .../mediatek,secure_cma_chunkmem.yaml | 42 +++
   1 file changed, 42 insertions(+)
   create mode 100644 Documentation/devicetree/bindings/reserved-memory/mediatek,secure_cma_chunkmem.yaml

diff --git a/Documentation/devicetree/bindings/reserved-memory/mediatek,secure_cma_chunkmem.yaml b/Documentation/devicetree/bindings/reserved-memory/mediatek,secure_cma_chunkmem.yaml
new file mode 100644
index ..cc10e00d35c4
--- /dev/null
+++ b/Documentation/devicetree/bindings/reserved-memory/mediatek,secure_cma_chunkmem.yaml
@@ -0,0 +1,42 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/reserved-memory/mediatek,secure_cma_chunkmem.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MediaTek Secure Video Path Reserved Memory


What makes this specific to Mediatek? Secure video path is fairly
common, right?


Here we just reserve a buffer and would like to create a dma-buf secure
heap for SVP, then the secure engines(Vcodec and DRM) could prepare
secure buffer through it.
But the heap driver is pure SW driver, it is not platform device and


All drivers are pure SW.


we don't have a corresponding HW unit for it. Thus I don't think I
could create a platform dtsi node and use "memory-region" pointer to
the region. I used RESERVEDMEM_OF_DECLARE currently(The code is in
[9/9]). Sorry if this is not right.


If this is not for any hardware and you already understand this (since
you cannot use other bindings) then you cannot have custom bindings for
it either.



Then in our usage case, is there some similar method to do this? or
any other suggestion?


Don't stuff software into DTS.


Aren't most reserved-memory bindings just software policy if you look at it
that way, though? IIUC this is a pool of memory that is visible and
available to the Non-Secure OS, but is fundamentally owned by the Secure
TEE, and pages that the TEE allocates from it will become physically
inaccessible to the OS. Thus the platform does impose constraints on how the
Non-Secure OS may use it, and per the rest of the reserved-memory bindings,
describing it as a "reusable" reservation seems entirely appropriate. If
anything that's *more* platform-related and so DT-relevant than typical
arbitrary reservations which just represent "save some memory to dedicate to
a particular driver" and don't actually bear any relationship to firmware or
hardware at all.


Yes, a memory range defined by hardware or firmware is within scope of
DT. (CMA at arbitrary address was questionable.)

My issue here is more that 'secure video memory' is not any way Mediatek
specific. AIUI, it's a requirement from certain content providers for
video playback to work. So why the Mediatek specific binding?


Based on the implementation, I'd ask the question the other way round - 
the way it works looks to be at least somewhat dependent on Mediatek's 
TEE, in ways where other vendors' equivalent implementations may be 
functionally incompatible, however nothing suggests it's actually 
specific to video (beyond that presumably being the primary use-case 
they had in mind).


Thanks,
Robin.


Re: [PATCH 8/9] dt-bindings: reserved-memory: MediaTek: Add reserved memory for SVP

2023-09-12 Thread Robin Murphy

On 12/09/2023 9:28 am, Krzysztof Kozlowski wrote:

On 12/09/2023 08:16, Yong Wu (吴勇) wrote:

Hi Rob,

Thanks for your review.

On Mon, 2023-09-11 at 10:44 -0500, Rob Herring wrote:


  On Mon, Sep 11, 2023 at 10:30:37AM +0800, Yong Wu wrote:

This adds the binding for describing a CMA memory for MediaTek SVP (Secure
Video Path).


CMA is a Linux thing. How is this related to CMA?




Signed-off-by: Yong Wu 
---
  .../mediatek,secure_cma_chunkmem.yaml | 42 +++
  1 file changed, 42 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/reserved-memory/mediatek,secure_cma_chunkmem.yaml

diff --git a/Documentation/devicetree/bindings/reserved-memory/mediatek,secure_cma_chunkmem.yaml b/Documentation/devicetree/bindings/reserved-memory/mediatek,secure_cma_chunkmem.yaml
new file mode 100644
index ..cc10e00d35c4
--- /dev/null
+++ b/Documentation/devicetree/bindings/reserved-memory/mediatek,secure_cma_chunkmem.yaml
@@ -0,0 +1,42 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/reserved-memory/mediatek,secure_cma_chunkmem.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MediaTek Secure Video Path Reserved Memory


What makes this specific to Mediatek? Secure video path is fairly
common, right?


Here we just reserve a buffer and would like to create a dma-buf secure
heap for SVP, then the secure engines(Vcodec and DRM) could prepare
secure buffer through it.
  
But the heap driver is pure SW driver, it is not platform device and


All drivers are pure SW.


we don't have a corresponding HW unit for it. Thus I don't think I
could create a platform dtsi node and use "memory-region" pointer to
the region. I used RESERVEDMEM_OF_DECLARE currently(The code is in
[9/9]). Sorry if this is not right.


If this is not for any hardware and you already understand this (since
you cannot use other bindings) then you cannot have custom bindings for
it either.



Then in our usage case, is there some similar method to do this? or
any other suggestion?


Don't stuff software into DTS.


Aren't most reserved-memory bindings just software policy if you look at 
it that way, though? IIUC this is a pool of memory that is visible and 
available to the Non-Secure OS, but is fundamentally owned by the Secure 
TEE, and pages that the TEE allocates from it will become physically 
inaccessible to the OS. Thus the platform does impose constraints on how 
the Non-Secure OS may use it, and per the rest of the reserved-memory 
bindings, describing it as a "reusable" reservation seems entirely 
appropriate. If anything that's *more* platform-related and so 
DT-relevant than typical arbitrary reservations which just represent 
"save some memory to dedicate to a particular driver" and don't actually 
bear any relationship to firmware or hardware at all.


However, the fact that Linux's implementation of how to reuse reserved 
memory areas is called CMA is indeed still irrelevant and has no place 
in the binding itself.


Thanks,
Robin.


Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-04 Thread Robin Murphy

On 2023-09-04 17:16, Boris Brezillon wrote:

On Mon, 4 Sep 2023 16:22:19 +0100
Steven Price  wrote:


On 04/09/2023 10:26, Boris Brezillon wrote:

On Mon, 4 Sep 2023 08:42:08 +0100
Steven Price  wrote:
   

On 01/09/2023 17:10, Boris Brezillon wrote:

On Wed,  9 Aug 2023 18:53:15 +0200
Boris Brezillon  wrote:
 

+/**
+ * DOC: MMIO regions exposed to userspace.
+ *
+ * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
+ *
+ * File offset for all MMIO regions being exposed to userspace. Don't use
+ * this value directly, use DRM_PANTHOR_USER_<name>_OFFSET values instead.
+ *
+ * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
+ *
+ * File offset for the LATEST_FLUSH_ID register. The Userspace driver controls
+ * GPU cache flushing through CS instructions, but the flush reduction
+ * mechanism requires a flush_id. This flush_id could be queried with an
+ * ioctl, but Arm provides a well-isolated register page containing only this
+ * read-only register, so let's expose this page through a static mmap offset
+ * and allow direct mapping of this MMIO region so we can avoid the
+ * user <-> kernel round-trip.
+ */
+#define DRM_PANTHOR_USER_MMIO_OFFSET   (0x1ull << 56)


I'm playing with a 32-bit kernel/userspace, and this is problematic,
because vm_pgoff is limited to 32-bit there, meaning we can only map up
to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let
userspace set the mmio range?
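
To put numbers on that limit, here is a minimal userspace sketch, assuming 4KiB pages (a PAGE_SHIFT of 12). mmap_offset_representable() is a hypothetical helper written for illustration, not a kernel function; it only checks whether a byte offset still fits once it has been converted to a page offset of a given width.

```c
#include <stdbool.h>
#include <stdint.h>

#define DRM_PANTHOR_USER_MMIO_OFFSET (0x1ull << 56)

/*
 * Hypothetical helper: report whether a byte offset passed to mmap() can be
 * stored as a page offset in a field that is pgoff_bits wide.
 */
bool mmap_offset_representable(uint64_t offset, unsigned int page_shift,
                               unsigned int pgoff_bits)
{
    uint64_t pgoff = offset >> page_shift;

    /* A 64-bit field can hold any page offset derived from a 64-bit value. */
    if (pgoff_bits >= 64)
        return true;

    return pgoff < (1ull << pgoff_bits);
}
```

With a 32-bit page offset and 4KiB pages the last representable byte offset is (1ull << 44) - 4096, so an offset of 1ull << 56 can only be expressed where the page offset is 64 bits wide.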


Hmm, I was rather hoping we could ignore 32 bit these days ;) But while
I can't see why anyone would be running a 32 bit kernel, I guess 32 bit
user space is likely to still be needed.


Uh, I just hit a new problem with 32-bit kernels: the io-pgtable
interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning
the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any
chance you could advise me on what to do here?

1. assume this limitation is here for a good reason, and limit the GPU
VA space to 32-bits on 32-bit kernels

or

2. update the interface to make iova an u64


I'm not sure I can answer the question from a technical perspective,
hopefully Robin will be able to.


Had a quick chat with Robin, and he's recommending going for #1 too.



But why do we care about 32-bit kernels on a platform which is new
enough to have a CSF-GPU (and by extension a recent 64-bit CPU)?


Apparently the memory you save by switching to a 32-bit kernel matters
to some people. To clarify, the CPU is aarch64, but they want to use it
in 32-bit mode.



Given the other limitations present in a 32-bit kernel I'd be tempted to
say '1' just for simplicity. Especially since apparently we've lived
with this for panfrost which presumably has the same limitation (even
though all Bifrost/Midgard GPUs have at least 33 bits of VA space).


Well, Panfrost is simpler in that you don't have this kernel VA range,
and, IIRC, we are using the old format that naturally limits the GPU VA
space to 4G.


FWIW the legacy pagetable format itself should be fine going up to 
however many bits the GPU supports, however there were various ISA 
limitations around crossing 4GB boundaries, and the easiest way to avoid 
having to think about those was to just not use more than 4GB of VA at 
all (minus chunks at the ends for similar weird ISA reasons).


Cheers,
Robin.


Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-04 Thread Robin Murphy

On 2023-08-09 17:53, Boris Brezillon wrote:
[...]

+/**
+ * struct drm_panthor_vm_create - Arguments passed to DRM_PANTHOR_IOCTL_VM_CREATE
+ */
+struct drm_panthor_vm_create {
+   /** @flags: VM flags, MBZ. */
+   __u32 flags;
+
+   /** @id: Returned VM ID. */
+   __u32 id;
+
+   /**
+    * @kernel_va_range: Size of the VA space reserved for kernel objects.
+    *
+    * If kernel_va_range is zero, we pick half of the VA space for kernel objects.
+    *
+    * Kernel VA space is always placed at the top of the supported VA range.
+    */
+   __u64 kernel_va_range;


Off the back of the "IOVA as unsigned long" concern, Boris and I 
reasoned through the 64-bit vs. 32-bit vs. compat cases on IRC, and it 
seems like this kernel_va_range argument is a source of much of the pain.


Rather than have userspace specify a quantity which it shouldn't care 
about and depend on assumptions of kernel behaviour to infer the 
quantity which *is* relevant (i.e. how large the usable range of the VM 
will actually be), I think it would be considerably more logical for 
userspace to simply request the size of usable VM it actually wants. 
Then it would be straightforward and consistent to define the default 
value in terms of the minimum of half the GPU VA size or TASK_SIZE (the 
latter being the largest *meaningful* value in all 3 cases), and it's 
still easy enough for the kernel to deduce for itself whether there's a 
reasonable amount of space left between the requested limit and 
ULONG_MAX for it to use. 32-bit kernels should then get at least 1GB to 
play with, for compat the kernel BOs can get well out of the way into 
the >32-bit range, and it's only really 64-bit where userspace is liable 
to see "kernel" VA space impinging on usable process VAs. Even then 
we're not sure that's a significant concern beyond OpenCL SVM.
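
As a rough userspace model of that default, purely a sketch and not driver code: default_user_va_size() and kernel_va_space_left() are hypothetical names, TASK_SIZE and ULONG_MAX are passed in as plain parameters so the three cases can be compared, and gpu_va_bits is assumed to be between 1 and 64.

```c
#include <stdint.h>

/* Hypothetical: usable VM size when userspace does not request one. */
uint64_t default_user_va_size(unsigned int gpu_va_bits, uint64_t task_size)
{
    uint64_t half_gpu_va = 1ull << (gpu_va_bits - 1);

    return half_gpu_va < task_size ? half_gpu_va : task_size;
}

/*
 * Hypothetical: bytes left for kernel objects between the user limit and the
 * highest address that is both a valid GPU VA and representable in an
 * unsigned long on the kernel in question.
 */
uint64_t kernel_va_space_left(unsigned int gpu_va_bits, uint64_t user_va_size,
                              uint64_t ulong_max)
{
    uint64_t gpu_va_last = gpu_va_bits >= 64 ? UINT64_MAX
                                             : (1ull << gpu_va_bits) - 1;
    uint64_t last = gpu_va_last < ulong_max ? gpu_va_last : ulong_max;

    return last >= user_va_size ? last - user_va_size + 1 : 0;
}
```

Assuming a 48-bit GPU VA space and the common 3G/1G split (a TASK_SIZE of 0xC0000000), a 32-bit kernel ends up with a 3GiB user range and 1GiB left for kernel objects. With a 4GiB compat TASK_SIZE on a 64-bit kernel, everything above 4GiB stays available to the kernel.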


Thanks,
Robin.


Re: [PATCH v2 13/15] drm/panthor: Allow driver compilation

2023-08-21 Thread Robin Murphy

On 2023-08-14 12:18, Steven Price wrote:

On 11/08/2023 20:26, Robin Murphy wrote:

On 2023-08-11 17:56, Daniel Stone wrote:

Hi,

On 11/08/2023 17:35, Robin Murphy wrote:

On 2023-08-09 17:53, Boris Brezillon wrote:

+obj-$(CONFIG_DRM_PANTHOR) += panthor.o


FWIW I still think it would be nice to have a minor
directory/Kconfig/Makefile reshuffle and a trivial bit of extra
registration glue to build both drivers into a single module. It
seems like it could be a perpetual source of confusion to end users
where Mesa "panfrost" is the right option but kernel "panfrost" is
the wrong one. Especially when pretty much every other GPU driver is
also just one big top-level module to load for many different
generations of hardware. Plus it would mean that if someone did want
to have a go at deduplicating the resource-wrangling boilerplate for
OPPs etc. in future, there's more chance of being able to do so
meaningfully.


It might be nice to point it out, but to be fair Intel and AMD both
have two (or more) drivers, as does Broadcom/RPi. As does, err ... Mali.


Indeed, I didn't mean to imply that I'm not aware that e.g. gma500 is to
i915 what lima is to panfrost. It was more that unlike the others where
there's a pretty clear line in the sand between "driver for old
hardware" and "driver for the majority of recent hardware", this one
happens to fall splat in the middle of the current major generation such
that panfrost is the correct module for Mali Bifrost but also the wrong
one for Mali Bifrost... :/


Well panfrost.ko is the correct module for all Bifrost ;) It's Valhall
that's the confusing one.


Bah, you see? If even developers sufficiently involved to be CCed on the 
patches can't remember what's what, what hope does Joe User have? :D



I would hope that for most users they can just build both panfrost and
panthor and everything will "Just Work (tm)". I'm not sure how much
users are actually aware of the architecture family of their GPU.

I think at the moment (until marketing mess it up) there's also the
'simple' rule:

* Mali T* is Midgard and supported by panfrost.ko
* Mali Gxx (two digits) is Bifrost or first-generation Valhall and
supported by panfrost.ko
* Mali Gxxx (three digits) is Valhall CSF and supported by panthor.

(and Immortalis is always three digits and Valhall CSF).
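
The rule of thumb above can be written down as a small userspace sketch. mali_kernel_module() is a hypothetical helper, and the "Mali-" and "Immortalis-" name prefixes are an assumption about how the marketing name is spelled; names outside the rule, such as the Utgard parts handled by lima, simply return NULL.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Map a marketing name such as "Mali-T860" or "Mali-G610" to the kernel
 * module suggested by the rule of thumb. Returns NULL when the name does not
 * match any of the patterns.
 */
const char *mali_kernel_module(const char *name)
{
    const char *p;
    size_t digits = 0;

    /* Immortalis is always three digits and Valhall CSF. */
    if (strncmp(name, "Immortalis-", 11) == 0)
        return "panthor";

    if (strncmp(name, "Mali-", 5) != 0)
        return NULL;

    p = name + 5;
    if (*p == 'T')      /* Midgard */
        return "panfrost";
    if (*p != 'G')
        return NULL;

    /* Count the digits that follow the 'G'. */
    for (p++; isdigit((unsigned char)*p); p++)
        digits++;

    if (digits == 2)    /* Bifrost or first-generation Valhall */
        return "panfrost";
    if (digits == 3)    /* Valhall CSF */
        return "panthor";
    return NULL;
}
```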


With brain now engaged, indeed that sounds right. However if the 
expectation is that most people would steer clear even of marketing's 
alphabet soup and just enable everything, that could also be seen as 
somewhat of an argument for just putting it all together and not 
bothering with a separate option.



I can see the point, but otoh if someone's managed to build all the
right regulator/clock/etc modules to get a working system, they'll
probably manage to figure the GPU side out?


Maybe; either way I guess it's not really my concern, since I'm the only
user that *I* have to support, and I do already understand it. From the
upstream perspective I mostly just want to hold on to the hope of not
having to write my io-pgtable bugs twice over if at all possible :)


I agree it would be nice to merge some of the common code, I'm hoping
this is something that might be possible in the future. But at the
moment the focus is on trying to get basic support for the new GPUs
without the danger of regressing the old GPUs.


Yup, I get that, it's just the niggling concern I have is whether what 
we do at the moment might paint us into a corner with respect to what 
we're then able to change later; I know KConfig symbols are explicitly 
not ABI, but module names and driver names might be more of a grey area.



And, to be honest, for a fair bit of the common code in
panfrost/panthor it's common to a few other drivers too. So the correct
answer might well be to try to add more generic helpers (devfreq,
clocks, power domains all spring to mind - there's a lot of boiler plate
and nothing very special about Mali).


That much is true, however I guess there's also stuff like perf counter 
support which is less likely to be DRM-level generic but perhaps still 
sufficiently similar between JM and CSF. The main thing I don't know, 
and thus feel compelled to poke at, is whether there's any possibility 
that once the new UAPI is mature, it might eventually become preferable 
to move Job Manager support over to some subset of that rather than 
maintain two whole UAPIs in parallel (particularly at the Mesa end). My 
(limited) understanding is that all the BO-wrangling and MMU code is 
primarily different here for the sake of supporting new shiny UAPI 
features, not because of anything inherent to CSF itself (other than CSF 
being the thing which makes supporting said features feasible). If 
that's a preposterous idea and absolutely never ever going to be 
realistic, then fine, but if not, then it feels like the kind of thing 
that my all-too-great experience of technical debt and bad short-term

Re: [PATCH v2 05/15] drm/panthor: Add the GPU logical block

2023-08-21 Thread Robin Murphy

On 2023-08-14 11:54, Steven Price wrote:
[...]

+/**
+ * panthor_gpu_l2_power_on() - Power-on the L2-cache
+ * @ptdev: Device.
+ *
+ * Return: 0 on success, a negative error code otherwise.
+ */
+int panthor_gpu_l2_power_on(struct panthor_device *ptdev)
+{
+   u64 core_mask = U64_MAX;
+
+   if (ptdev->gpu_info.l2_present != 1) {
+   /*
+* Only support one core group now.
+* ~(l2_present - 1) unsets all bits in l2_present except
+* the bottom bit. (l2_present - 2) has all the bits in
+* the first core group set. AND them together to generate
+* a mask of cores in the first core group.
+*/
+   core_mask = ~(ptdev->gpu_info.l2_present - 1) &
+(ptdev->gpu_info.l2_present - 2);
+   drm_info_once(&ptdev->base, "using only 1st core group (%lu cores from %lu)\n",
+ hweight64(core_mask),
+ hweight64(ptdev->gpu_info.shader_present));


I'm not sure what the point of this complexity is. This boils down to
the equivalent of:

if (ptdev->gpu_info.l2_present != 1)
core_mask = 1;


Hmm, that doesn't look right - the idiom here should be to set all bits 
of the output below the *second* set bit of the input, i.e. 0x11 -> 
0x0f. However since panthor is (somewhat ironically) unlikely to ever 
run on T628, and everything newer should pretend to have a single L2 
because software-managed coherency is a terrible idea, I would agree 
that ultimately it does all seem a bit pointless.
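
The idiom is easy to check in isolation. Below is a minimal userspace sketch of the same expression; first_core_group_mask() is a name made up for illustration, and the assert documents the assumption that bit 0 is set and at least one more L2 is present.

```c
#include <assert.h>
#include <stdint.h>

/*
 * For an l2_present bitmap with bit 0 set and at least one more bit set,
 * return a mask of every bit below the second set bit.
 */
uint64_t first_core_group_mask(uint64_t l2_present)
{
    assert((l2_present & 1) && l2_present != 1);

    /*
     * l2_present - 1 clears bit 0, so ~(l2_present - 1) keeps bit 0 and
     * drops every other bit that is set in l2_present. l2_present - 2
     * borrows from the second set bit, which sets all the bits below it.
     * The AND of the two leaves exactly the bits below the second set bit.
     */
    return ~(l2_present - 1) & (l2_present - 2);
}
```

For the 0x11 input mentioned above this returns 0x0f, and 0x101 gives 0xff.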



If we were doing shader-core power management manually (like on pre-CSF
GPUs, rather than letting the firmware control it) then the computed
core_mask would be useful. So I guess it comes down to the
drm_info_once() output and counting the cores - which is nice to have
but it took me some time figuring out what was going on here.


As for the complexity, I'd suggest you can have some choice words with
the guy who originally suggested that code[1] ;)


Cheers,
Robin.

[1] 
https://lore.kernel.org/dri-devel/b009b4c4-0396-58c2-7779-30c844f36...@arm.com/


Re: [PATCH] iommu: Remove the device_lock_assert() from __iommu_probe_device()

2023-08-21 Thread Robin Murphy

On 2023-08-18 22:32, Jason Gunthorpe wrote:

It turns out several drivers are calling of_dma_configure() outside the
expected bus_type.dma_configure op. This ends up being mis-locked and
triggers a lockdep assertion, for instance:

   iommu_probe_device_locked+0xd4/0xe4
   of_iommu_configure+0x10c/0x200
   of_dma_configure_id+0x104/0x3b8
   a6xx_gmu_init+0x4c/0xccc [msm]
   a6xx_gpu_init+0x3ac/0x770 [msm]
   adreno_bind+0x174/0x2ac [msm]
   component_bind_all+0x118/0x24c
   msm_drm_bind+0x1e8/0x6c4 [msm]
   try_to_bring_up_aggregate_device+0x168/0x1d4
   __component_add+0xa8/0x170
   component_add+0x14/0x20
   dsi_dev_attach+0x20/0x2c [msm]
   dsi_host_attach+0x9c/0x144 [msm]
   devm_mipi_dsi_attach+0x34/0xb4
   lt9611uxc_attach_dsi.isra.0+0x84/0xfc [lontium_lt9611uxc]
   lt9611uxc_probe+0x5c8/0x68c [lontium_lt9611uxc]
   i2c_device_probe+0x14c/0x290
   really_probe+0x148/0x2b4
   __driver_probe_device+0x78/0x12c
   driver_probe_device+0x3c/0x160
   __device_attach_driver+0xb8/0x138
   bus_for_each_drv+0x84/0xe0
   __device_attach+0xa8/0x1b0
   device_initial_probe+0x14/0x20
   bus_probe_device+0xb0/0xb4
   deferred_probe_work_func+0x8c/0xc8
   process_one_work+0x1ec/0x53c
   worker_thread+0x298/0x408
   kthread+0x124/0x128
   ret_from_fork+0x10/0x20

It is subtle and was never documented or enforced, but there has always
been an assumption that of_dma_configure_id() is not concurrent. It makes
several calls into the iommu layer that require this, including
dev_iommu_get(). The majority of cases have been preventing concurrency
using the device_lock().

Thus the new lock debugging added exposes an existing problem in
drivers. On inspection this looks like a theoretical locking problem as
generally the cases are already assuming they are the exclusive (single
threaded) user of the target device.


Sorry to be blunt, but the only problem is that you've introduced an 
idealistic new locking scheme which failed to take into account how 
things currently actually work, and is broken and achieving nothing but 
causing problems.


The solution is to drop those locking patches entirely and rethink the 
whole thing. When their sole purpose was to improve the locking and make 
it easier to reason about, and the latest "fix" is now to remove one 
of the assertions which forms the fundamental basis for that reasoning, 
then the point has clearly been lost. All we've done is churned a dodgy 
and incomplete locking scheme into a *different* dodgy and incomplete 
locking scheme. I do not think that continuing to dig in deeper is the 
way out of the hole...


It's now rc7, and I have little confidence that there aren't still more latent 
problems which just haven't been hit yet (e.g. acpi_dma_configure() is 
also called in different contexts relative to the device lock, which is 
absolutely by design and not broken).


And on the subject of idealism, the fact is that doing IOMMU 
configuration based on driver probe via bus->dma_configure is 
*fundamentally wrong* and breaking a bunch of other IOMMU API 
assumptions, so it is not a robust foundation to build anything upon in 
the first place. The problem it causes with broken groups has been known 
about for several years now, however it's needed a lot of work to get to 
the point of being able to fix it properly (FWIW that is now #2 on my 
priority list after getting the bus ops stuff done, which should also 
make it easier).


Thanks,
Robin.


Sadly, there are deeper technical problems with all of the places doing
this. There are several problematic patterns:

1) Probe a driver on device A and then steal device B and use it as part
of the driver operation.

Since no driver was probed to device B it means we never called
bus_type.dma_configure and thus the drivers hackily try to open code
this.

Unfortunately nothing prevents another driver from binding to device B
and creating total chaos. eg vfio bind triggered by userspace

2) Probe a driver on device A and then create a new platform driver B for a
fwnode that doesn't have one, then do #1

This has the same essential problem as #1, the new device is never
probed so the hack call to of_dma_configure() is needed to setup DMA,
and we are at risk of something else trying to use the device.

3) Probe a driver on device A but the of_node was incorrect for DMA so fix
it by figuring out the right node and calling of_dma_configure()

This will blow up in the iommu code if the driver is unprobed because
the bus_type now assumes that dma_configure and dma_cleanup are
strictly paired. Since dma_configure will have done the wrong thing due
to the missing of_node, dma_cleanup will be unpaired and
iommu_device_unuse_default_domain() will blow up.

Further the driver operating on device A will not be protected against
changes to the iommu domain since it never called
iommu_device_use_default_domain()

At least this case will not throw a lockdep warning as 

Re: [PATCH v3] misc: sram: Add DMA-BUF Heap exporting of SRAM areas

2023-08-17 Thread Robin Murphy

On 2023-07-13 20:13, Andrew Davis wrote:

This new export type exposes to userspace the SRAM area as a DMA-BUF Heap,
this allows for allocations of DMA-BUFs that can be consumed by various
DMA-BUF supporting devices.

Signed-off-by: Andrew Davis 
---

Changes from v2:
  - Make sram_dma_heap_allocate static (kernel test robot)
  - Rebase on v6.5-rc1

  drivers/misc/Kconfig |   7 +
  drivers/misc/Makefile|   1 +
  drivers/misc/sram-dma-heap.c | 245 +++
  drivers/misc/sram.c  |   6 +
  drivers/misc/sram.h  |  16 +++
  5 files changed, 275 insertions(+)
  create mode 100644 drivers/misc/sram-dma-heap.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 75e427f124b28..ee34dfb61605f 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -448,6 +448,13 @@ config SRAM
  config SRAM_EXEC
bool
  
+config SRAM_DMA_HEAP

+   bool "Export on-chip SRAM pools using DMA-Heaps"
+   depends on DMABUF_HEAPS && SRAM
+   help
+ This driver allows the export of on-chip SRAM marked as both pool
+ and exportable to userspace using the DMA-Heaps interface.
+
  config DW_XDATA_PCIE
depends on PCI
tristate "Synopsys DesignWare xData PCIe driver"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index f2a4d1ff65d46..5e7516bfaa8de 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/
  obj-$(CONFIG_LATTICE_ECP3_CONFIG) += lattice-ecp3-config.o
  obj-$(CONFIG_SRAM)+= sram.o
  obj-$(CONFIG_SRAM_EXEC)   += sram-exec.o
+obj-$(CONFIG_SRAM_DMA_HEAP)+= sram-dma-heap.o
  obj-$(CONFIG_GENWQE)  += genwqe/
  obj-$(CONFIG_ECHO)+= echo/
  obj-$(CONFIG_CXL_BASE)+= cxl/
diff --git a/drivers/misc/sram-dma-heap.c b/drivers/misc/sram-dma-heap.c
new file mode 100644
index 0..c054c04dff33e
--- /dev/null
+++ b/drivers/misc/sram-dma-heap.c
@@ -0,0 +1,245 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * SRAM DMA-Heap userspace exporter
+ *
+ * Copyright (C) 2019-2022 Texas Instruments Incorporated - https://www.ti.com/
+ * Andrew Davis 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "sram.h"
+
+struct sram_dma_heap {
+   struct dma_heap *heap;
+   struct gen_pool *pool;
+};
+
+struct sram_dma_heap_buffer {
+   struct gen_pool *pool;
+   struct list_head attachments;
+   struct mutex attachments_lock;
+   unsigned long len;
+   void *vaddr;
+   phys_addr_t paddr;
+};
+
+struct dma_heap_attachment {
+   struct device *dev;
+   struct sg_table *table;
+   struct list_head list;
+};
+
+static int dma_heap_attach(struct dma_buf *dmabuf,
+  struct dma_buf_attachment *attachment)
+{
+   struct sram_dma_heap_buffer *buffer = dmabuf->priv;
+   struct dma_heap_attachment *a;
+   struct sg_table *table;
+
+   a = kzalloc(sizeof(*a), GFP_KERNEL);
+   if (!a)
+   return -ENOMEM;
+
+   table = kmalloc(sizeof(*table), GFP_KERNEL);
+   if (!table) {
+   kfree(a);
+   return -ENOMEM;
+   }
+   if (sg_alloc_table(table, 1, GFP_KERNEL)) {
+   kfree(table);
+   kfree(a);
+   return -ENOMEM;
+   }
+   sg_set_page(table->sgl, pfn_to_page(PFN_DOWN(buffer->paddr)), buffer->len, 0);


What happens if someone (reasonably) assumes that this struct page 
pointer isn't completely made up, and dereferences it?


(That's if pfn_to_page() itself doesn't blow up, which it potentially 
might, at least under CONFIG_SPARSEMEM)


I think this needs to be treated as P2PDMA if it's going to have any 
hope of working robustly.



+
+   a->table = table;
+   a->dev = attachment->dev;
+   INIT_LIST_HEAD(&a->list);
+
+   attachment->priv = a;
+
+   mutex_lock(&buffer->attachments_lock);
+   list_add(&a->list, &buffer->attachments);
+   mutex_unlock(&buffer->attachments_lock);
+
+   return 0;
+}
+
+static void dma_heap_detatch(struct dma_buf *dmabuf,
+struct dma_buf_attachment *attachment)
+{
+   struct sram_dma_heap_buffer *buffer = dmabuf->priv;
+   struct dma_heap_attachment *a = attachment->priv;
+
+   mutex_lock(&buffer->attachments_lock);
+   list_del(&a->list);
+   mutex_unlock(&buffer->attachments_lock);
+
+   sg_free_table(a->table);
+   kfree(a->table);
+   kfree(a);
+}
+
+static struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment *attachment,
+enum dma_data_direction direction)
+{
+   struct dma_heap_attachment *a = attachment->priv;
+   struct sg_table *table = a->table;
+
+   /*
+* As this heap is backed by uncached SRAM memory we do not need to
+* perform any sync operations on the buffer before allowing device
+  

Re: [PATCH v2 13/15] drm/panthor: Allow driver compilation

2023-08-11 Thread Robin Murphy

On 2023-08-11 17:56, Daniel Stone wrote:

Hi,

On 11/08/2023 17:35, Robin Murphy wrote:

On 2023-08-09 17:53, Boris Brezillon wrote:

+obj-$(CONFIG_DRM_PANTHOR) += panthor.o


FWIW I still think it would be nice to have a minor 
directory/Kconfig/Makefile reshuffle and a trivial bit of extra 
registration glue to build both drivers into a single module. It seems 
like it could be a perpetual source of confusion to end users where 
Mesa "panfrost" is the right option but kernel "panfrost" is the wrong 
one. Especially when pretty much every other GPU driver is also just 
one big top-level module to load for many different generations of 
hardware. Plus it would mean that if someone did want to have a go at 
deduplicating the resource-wrangling boilerplate for OPPs etc. in 
future, there's more chance of being able to do so meaningfully.


It might be nice to point it out, but to be fair Intel and AMD both have 
two (or more) drivers, as does Broadcom/RPi. As does, err ... Mali.


Indeed, I didn't mean to imply that I'm not aware that e.g. gma500 is to 
i915 what lima is to panfrost. It was more that unlike the others where 
there's a pretty clear line in the sand between "driver for old 
hardware" and "driver for the majority of recent hardware", this one 
happens to fall splat in the middle of the current major generation such 
that panfrost is the correct module for Mali Bifrost but also the wrong 
one for Mali Bifrost... :/


I can see the point, but otoh if someone's managed to build all the 
right regulator/clock/etc modules to get a working system, they'll 
probably manage to figure the GPU side out?


Maybe; either way I guess it's not really my concern, since I'm the only 
user that *I* have to support, and I do already understand it. From the 
upstream perspective I mostly just want to hold on to the hope of not 
having to write my io-pgtable bugs twice over if at all possible :)


Cheers,
Robin.


Re: [PATCH v2 13/15] drm/panthor: Allow driver compilation

2023-08-11 Thread Robin Murphy

On 2023-08-09 17:53, Boris Brezillon wrote:

Now that all blocks are available, we can add/update Kconfig/Makefile
files to allow compilation.

v2:
- Rename the driver (pancsf -> panthor)
- Change the license (GPL2 -> MIT + GPL2)
- Split the driver addition commit
- Add new dependencies on GPUVA and DRM_SCHED

Signed-off-by: Boris Brezillon 
---
  drivers/gpu/drm/Kconfig  |  2 ++
  drivers/gpu/drm/Makefile |  1 +
  drivers/gpu/drm/panthor/Kconfig  | 16 
  drivers/gpu/drm/panthor/Makefile | 15 +++
  4 files changed, 34 insertions(+)
  create mode 100644 drivers/gpu/drm/panthor/Kconfig
  create mode 100644 drivers/gpu/drm/panthor/Makefile

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 2a44b9419d4d..bddfbdb2ffee 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -358,6 +358,8 @@ source "drivers/gpu/drm/lima/Kconfig"
  
  source "drivers/gpu/drm/panfrost/Kconfig"
  
+source "drivers/gpu/drm/panthor/Kconfig"

+
  source "drivers/gpu/drm/aspeed/Kconfig"
  
  source "drivers/gpu/drm/mcde/Kconfig"

diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 215e78e79125..0a260727505f 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -188,6 +188,7 @@ obj-$(CONFIG_DRM_TVE200) += tve200/
  obj-$(CONFIG_DRM_XEN) += xen/
  obj-$(CONFIG_DRM_VBOXVIDEO) += vboxvideo/
  obj-$(CONFIG_DRM_LIMA)  += lima/
+obj-$(CONFIG_DRM_PANTHOR) += panthor/
  obj-$(CONFIG_DRM_PANFROST) += panfrost/
  obj-$(CONFIG_DRM_ASPEED_GFX) += aspeed/
  obj-$(CONFIG_DRM_MCDE) += mcde/
diff --git a/drivers/gpu/drm/panthor/Kconfig b/drivers/gpu/drm/panthor/Kconfig
new file mode 100644
index ..a9d17b1bbb75
--- /dev/null
+++ b/drivers/gpu/drm/panthor/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0 or MIT
+
+config DRM_PANTHOR
+   tristate "Panthor (DRM support for ARM Mali CSF-based GPUs)"
+   depends on DRM
+   depends on ARM || ARM64 || (COMPILE_TEST && !GENERIC_ATOMIC64)
+   depends on MMU
+   select DRM_EXEC
+   select DRM_SCHED
+   select IOMMU_SUPPORT
+   select IOMMU_IO_PGTABLE_LPAE
+   select DRM_GEM_SHMEM_HELPER
+   select PM_DEVFREQ
+   select DEVFREQ_GOV_SIMPLE_ONDEMAND
+   help
+ DRM driver for ARM Mali CSF-based GPUs.
diff --git a/drivers/gpu/drm/panthor/Makefile b/drivers/gpu/drm/panthor/Makefile
new file mode 100644
index ..64193a484879
--- /dev/null
+++ b/drivers/gpu/drm/panthor/Makefile
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0 or MIT
+
+panthor-y := \
+   panthor_devfreq.o \
+   panthor_device.o \
+   panthor_drv.o \
+   panthor_gem.o \
+   panthor_gpu.o \
+   panthor_heap.o \
+   panthor_heap.o \
+   panthor_fw.o \
+   panthor_mmu.o \
+   panthor_sched.o
+
+obj-$(CONFIG_DRM_PANTHOR) += panthor.o


FWIW I still think it would be nice to have a minor 
directory/Kconfig/Makefile reshuffle and a trivial bit of extra 
registration glue to build both drivers into a single module. It seems 
like it could be a perpetual source of confusion to end users where Mesa 
"panfrost" is the right option but kernel "panfrost" is the wrong one. 
Especially when pretty much every other GPU driver is also just one big 
top-level module to load for many different generations of hardware. 
Plus it would mean that if someone did want to have a go at 
deduplicating the resource-wrangling boilerplate for OPPs etc. in 
future, there's more chance of being able to do so meaningfully.


Cheers,
Robin.


Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

2023-06-23 Thread Robin Murphy

On 2023-06-20 10:47, Sui Jingfeng wrote:

From: Sui Jingfeng 

Loongson CPUs maintain cache coherency by hardware, which means that the
data in the CPU cache is identical to the data in main system memory. As
for the peripheral device, most of Loongson chips chose to define the
peripherals as DMA coherent by default, device drivers do not need to
maintain the coherency between a processor and an I/O device manually.

There are exceptions, for LS2K1000 SoC, part of peripheral device can be
configured as DMA non-coherent. But there is no released version of such
firmware exist in the market. Peripherals of older LS2K1000 is also DMA
non-coherent, but they are nearly outdated. So, those are trivial cases.

Nevertheless, kernel space still needs to do the probe work, because the
Vivante GPU IP has been integrated into various platforms. Hence, this
patch adds runtime detection code to probe whether a specific GPU is DMA
coherent. If the answer is yes, we are going to utilize such features. On
Loongson platforms, when a buffer is accessed by both the GPU and the CPU,
the driver should prefer ETNA_BO_CACHED over ETNA_BO_WC.

This patch also adds a new parameter, etnaviv_param_gpu_coherent, which
allows userspace to know whether such a feature is available, because a
write-combined BO is still preferred in some cases, especially where CPU
reads are not needed, for example when uploading a compiled shader binary.

Cc: Lucas Stach 
Cc: Christian Gmeiner 
Cc: Philipp Zabel 
Cc: Bjorn Helgaas 
Cc: Daniel Vetter 
Signed-off-by: Sui Jingfeng 
---
  drivers/gpu/drm/etnaviv/etnaviv_drv.c   | 35 +
  drivers/gpu/drm/etnaviv/etnaviv_drv.h   |  6 
  drivers/gpu/drm/etnaviv/etnaviv_gem.c   | 22 ++---
  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c |  7 -
  drivers/gpu/drm/etnaviv/etnaviv_gpu.c   |  4 +++
  include/uapi/drm/etnaviv_drm.h  |  1 +
  6 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c 
b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
index 0a365e96d371..d8e788aa16cb 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
@@ -5,7 +5,9 @@
  
  #include 

  #include 
+#include 


/*
 * This header is for implementations of dma_map_ops and related code.
 * It should not be included in drivers just using the DMA API.
 */


  #include 
+#include 
  #include 
  #include 
  
@@ -24,6 +26,34 @@

  #include "etnaviv_pci_drv.h"
  #include "etnaviv_perfmon.h"
  
+static struct device_node *etnaviv_of_first_available_node(void)

+{
+   struct device_node *core_node;
+
+   for_each_compatible_node(core_node, NULL, "vivante,gc") {
+   if (of_device_is_available(core_node))
+   return core_node;
+   }
+
+   return NULL;
+}
+
+static bool etnaviv_is_dma_coherent(struct device *dev)
+{
+   struct device_node *np;
+   bool coherent;
+
+   np = etnaviv_of_first_available_node();
+   if (np) {
+   coherent = of_dma_is_coherent(np);
+   of_node_put(np);
+   } else {
+   coherent = dev_is_dma_coherent(dev);
+   }


Please use device_get_dma_attr() like other well-behaved drivers.


+
+   return coherent;
+}
+
  /*
   * etnaviv private data construction and destructions:
   */
@@ -52,6 +82,11 @@ etnaviv_alloc_private(struct device *dev, struct drm_device 
*drm)
return ERR_PTR(-ENOMEM);
}
  
+	priv->dma_coherent = etnaviv_is_dma_coherent(dev);

+
+   if (priv->dma_coherent)
+   drm_info(drm, "%s is dma coherent\n", dev_name(dev));


I'm pretty sure the end-user doesn't care.


+
return priv;
  }
  
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h

index 9cd72948cfad..644e5712c050 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
@@ -46,6 +46,12 @@ struct etnaviv_drm_private {
struct xarray active_contexts;
u32 next_context_id;
  
+	/*

+* If true, the GPU is capable of snooping cpu cache. Here, it
+* also means that cache coherency is enforced by the hardware.
+*/
+   bool dma_coherent;
+
/* list of GEM objects: */
struct mutex gem_lock;
struct list_head gem_list;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index b5f73502e3dd..39bdc3774f2d 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -343,6 +343,7 @@ void *etnaviv_gem_vmap(struct drm_gem_object *obj)
  static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)
  {
struct page **pages;
+   pgprot_t prot;
  
  	lockdep_assert_held(&obj->lock);
  
@@ -350,8 +351,19 @@ static void *etnaviv_gem_vmap_impl(struct etnaviv_gem_object *obj)

if (IS_ERR(pages))
return NULL;
  
-	return vmap(pages, obj->base.size >> PAGE_SHIFT,

-   

Re: [PATCH 4/7] drm/apu: Add support of IOMMU

2023-05-18 Thread Robin Murphy

On 2023-05-17 15:52, Alexandre Bailon wrote:

Some APU devices are behind an IOMMU.
For some of these devices, we can't use DMA API because
they use static addresses so we have to manually use
IOMMU API to correctly map the buffers.


Except you still need to use the DMA API for the sake of cache coherency 
and any other aspects :(



This adds support of IOMMU.

Signed-off-by: Alexandre Bailon 
Reviewed-by: Julien Stephan 
---
  drivers/gpu/drm/apu/apu_drv.c  |   4 +
  drivers/gpu/drm/apu/apu_gem.c  | 174 +
  drivers/gpu/drm/apu/apu_internal.h |  16 +++
  drivers/gpu/drm/apu/apu_sched.c|  28 +
  include/uapi/drm/apu_drm.h |  12 +-
  5 files changed, 233 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/apu/apu_drv.c b/drivers/gpu/drm/apu/apu_drv.c
index b6bd340b2bc8..a0dce785a02a 100644
--- a/drivers/gpu/drm/apu/apu_drv.c
+++ b/drivers/gpu/drm/apu/apu_drv.c
@@ -23,6 +23,10 @@ static const struct drm_ioctl_desc ioctls[] = {
  DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue,
  DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map,
+ DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap,
+ DRM_RENDER_ALLOW),
  };
  
  DEFINE_DRM_GEM_DMA_FOPS(apu_drm_ops);

diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c
index 0e7b3b27942c..0a91363754c5 100644
--- a/drivers/gpu/drm/apu/apu_gem.c
+++ b/drivers/gpu/drm/apu/apu_gem.c
@@ -2,6 +2,9 @@
  //
  // Copyright 2020 BayLibre SAS
  
+#include 

+#include 
+
  #include 
  
  #include 

@@ -42,6 +45,7 @@ int ioctl_gem_new(struct drm_device *dev, void *data,
 */
apu_obj->size = args->size;
apu_obj->offset = 0;
+   apu_obj->iommu_refcount = 0;
+   mutex_init(&apu_obj->mutex);
  
  	ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle);

@@ -54,3 +58,173 @@ int ioctl_gem_new(struct drm_device *dev, void *data,
  
  	return 0;

  }
+
+void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj)
+{
+   int iova_pfn;
+   int i;
+
+   if (!obj->iommu_sgt)
+   return;
+
+   mutex_lock(&obj->mutex);
+   obj->iommu_refcount--;
+   if (obj->iommu_refcount) {
+   mutex_unlock(&obj->mutex);
+   return;
+   }
+
+   iova_pfn = PHYS_PFN(obj->iova);


Using mm layer operations on IOVAs looks wrong. In practice I don't 
think it's ultimately harmful, other than potentially making less 
efficient use of IOVA space if the CPU page size is larger than the 
IOMMU page size, but it's still a bad code smell when you're using an 
IOVA abstraction that is deliberately decoupled from CPU pages.
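
To put a rough number on the IOVA-space point above, here is a minimal user-space sketch. The page sizes used in the usage note (64 KiB CPU pages, a 4 KiB IOMMU granule) are assumptions for illustration, and align_up() and iova_waste() are hypothetical helper names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to a multiple of granule (granule must be a power of two). */
static size_t align_up(size_t len, size_t granule)
{
	return (len + granule - 1) & ~(granule - 1);
}

/*
 * IOVA space lost by padding a buffer to the CPU page size instead of
 * the (possibly smaller) IOMMU granule.
 */
static size_t iova_waste(size_t len, size_t cpu_page, size_t iommu_page)
{
	return align_up(len, cpu_page) - align_up(len, iommu_page);
}
```

Under those assumptions a 20 KiB buffer takes 64 KiB of IOVA space when rounded to the CPU page size, versus 20 KiB when rounded to the IOMMU granule, so iova_waste(20480, 65536, 4096) comes to 45056 bytes.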



+   for (i = 0; i < obj->iommu_sgt->nents; i++) {
+   iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn),
+   PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
+   iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));


You can unmap a set of IOVA-contiguous mappings as a single range with 
one call.
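
As a rough illustration of the single-range idea, the sketch below only computes the size of an IOVA-contiguous run built from page-aligned segments; in the driver that one size would presumably be handed to a single iommu_unmap() call starting at the base IOVA. The 4 KiB granule and the names SKETCH_PAGE_SIZE, page_align() and contiguous_span() are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed granule, for illustration only */

/* Round a segment length up to the assumed page size. */
static size_t page_align(size_t len)
{
	return (len + SKETCH_PAGE_SIZE - 1) & ~((size_t)SKETCH_PAGE_SIZE - 1);
}

/* Total size of an IOVA-contiguous run made of page-aligned segments. */
static size_t contiguous_span(const size_t *seg_len, size_t nents)
{
	size_t total = 0;

	for (size_t i = 0; i < nents; i++)
		total += page_align(seg_len[i]);

	return total;
}
```

This only holds if the segments really were mapped back to back, which is what the quoted loop does by advancing iova_pfn by each aligned length.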



+   }
+
+   sg_free_table(obj->iommu_sgt);
+   kfree(obj->iommu_sgt);
+
+   free_iova(&apu_drm->iovad, PHYS_PFN(obj->iova));
+   mutex_unlock(&obj->mutex);
+}
+
+static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj)
+{
+   if (obj->funcs)
+   return obj->funcs->get_sg_table(obj);
+   return NULL;
+}
+
+int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj)
+{
+   struct apu_gem_object *apu_obj = to_apu_bo(obj);
+   struct scatterlist *sgl;
+   phys_addr_t phys;
+   int total_buf_space;
+   int iova_pfn;
+   int iova;
+   int ret;
+   int i;
+
+   mutex_lock(&apu_obj->mutex);
+   apu_obj->iommu_refcount++;
+   if (apu_obj->iommu_refcount != 1) {
+   mutex_unlock(&apu_obj->mutex);
+   return 0;
+   }
+
+   apu_obj->iommu_sgt = apu_get_sg_table(obj);
+   if (IS_ERR(apu_obj->iommu_sgt)) {
+   mutex_unlock(&apu_obj->mutex);
+   return PTR_ERR(apu_obj->iommu_sgt);
+   }
+
+   total_buf_space = obj->size;
+   iova_pfn = alloc_iova_fast(&apu_drm->iovad,
+  total_buf_space >> PAGE_SHIFT,
+  apu_drm->iova_limit_pfn, true);


If you need things mapped at specific addresses like the commit message 
claims, the DMA IOVA allocator is a terrible tool for the job. DRM 
already has its own more flexible abstraction for address space 
management in the form of drm_mm, so as a DRM driver it would seem a lot 
more sensible to use one of those.


And even if you could justify using this allocator, I can't imagine 
there's any way you'd need the _fast version (further illustrated by the 
fact that you're freeing the IOVAs wrongly for that).



+   apu_obj->iova = 

[PATCH v2] drm/mediatek: Stop using iommu_present()

2023-05-10 Thread Robin Murphy
Remove the pointless check. If an IOMMU is providing transparent DMA API
ops for any device(s) we care about, the DT code will have enforced the
appropriate probe ordering already.

Signed-off-by: Robin Murphy 
---

v2: Rebase to 6.4-rc1

 drivers/gpu/drm/mediatek/mtk_drm_drv.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c 
b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index 6dcb4ba2466c..3e677eb0dc70 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -5,7 +5,6 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -582,9 +581,6 @@ static int mtk_drm_bind(struct device *dev)
struct drm_device *drm;
int ret, i;
 
-   if (!iommu_present(&platform_bus_type))
-   return -EPROBE_DEFER;
-
pdev = of_find_device_by_node(private->mutex_node);
if (!pdev) {
dev_err(dev, "Waiting for disp-mutex device %pOF\n",
-- 
2.39.2.101.g768bb238c484.dirty



Re: [PATCH 1/3] iommu/dma: Clean up Kconfig

2023-05-05 Thread Robin Murphy

On 2023-05-05 15:50, Jason Gunthorpe wrote:

On Tue, Aug 16, 2022 at 06:28:03PM +0100, Robin Murphy wrote:

Although iommu-dma is a per-architecture choice, that is currently
implemented in a rather haphazard way. Selecting from the arch Kconfig
was the original logical approach, but is complicated by having to
manage dependencies; conversely, selecting from drivers ends up hiding
the architecture dependency *too* well. Instead, let's just have it
enable itself automatically when IOMMU API support is enabled for the
relevant architectures. It can't get much clearer than that.

Signed-off-by: Robin Murphy 
---
  arch/arm64/Kconfig  | 1 -
  drivers/iommu/Kconfig   | 3 +--
  drivers/iommu/amd/Kconfig   | 1 -
  drivers/iommu/intel/Kconfig | 1 -
  4 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 571cc234d0b3..59af600445c2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -209,7 +209,6 @@ config ARM64
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_GENERIC_VDSO
-   select IOMMU_DMA if IOMMU_SUPPORT
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 5c5cb5bee8b6..1d99c2d984fb 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -137,7 +137,7 @@ config OF_IOMMU
  
  # IOMMU-agnostic DMA-mapping layer

  config IOMMU_DMA
-   bool
+   def_bool ARM64 || IA64 || X86


Robin, do you remember why you added IA64 here? What is the Itanium
IOMMU driver?


config INTEL_IOMMU
bool "Support for Intel IOMMU using DMA Remapping Devices"
depends on PCI_MSI && ACPI && (X86 || IA64)

Yes, really :)

Robin.


Re: [PATCH v2 5/5] fbdev: Define framebuffer I/O from Linux' I/O functions

2023-04-28 Thread Robin Murphy

On 2023-04-28 10:27, Thomas Zimmermann wrote:

Implement framebuffer I/O helpers, such as fb_read*() and fb_write*()
with Linux' regular I/O functions. Remove all ifdef cases for the
various architectures.

Most of the supported architectures use __raw_() I/O functions or treat
framebuffer memory like regular memory. This is also implemented by the
architectures' I/O function, so we can use them instead.

Sparc uses SBus to connect to framebuffer devices. It provides respective
implementations of the framebuffer I/O helpers. The involved sbus_()
I/O helpers map to the same code as Sparc's regular I/O functions. As
with other platforms, we can use those instead.

We leave a TODO item to replace all fb_() functions with their regular
I/O counterparts throughout the fbdev drivers.

Signed-off-by: Thomas Zimmermann 
---
  include/linux/fb.h | 63 +++---
  1 file changed, 15 insertions(+), 48 deletions(-)

diff --git a/include/linux/fb.h b/include/linux/fb.h
index 08cb47da71f8..4aa9e90edd17 100644
--- a/include/linux/fb.h
+++ b/include/linux/fb.h
@@ -15,7 +15,6 @@
  #include 
  #include 
  #include 
-#include 
  
  struct vm_area_struct;

  struct fb_info;
@@ -511,58 +510,26 @@ struct fb_info {
   */
  #define STUPID_ACCELF_TEXT_SHIT
  
-// This will go away

-#if defined(__sparc__)
-
-/* We map all of our framebuffers such that big-endian accesses
- * are what we want, so the following is sufficient.
+/*
+ * TODO: Update fbdev drivers to call the I/O helpers directly and
+ *   remove the fb_() tokens.
   */
-
-// This will go away
-#define fb_readb sbus_readb
-#define fb_readw sbus_readw
-#define fb_readl sbus_readl
-#define fb_readq sbus_readq
-#define fb_writeb sbus_writeb
-#define fb_writew sbus_writew
-#define fb_writel sbus_writel
-#define fb_writeq sbus_writeq
-#define fb_memset sbus_memset_io
-#define fb_memcpy_fromfb sbus_memcpy_fromio
-#define fb_memcpy_tofb sbus_memcpy_toio
-
-#elif defined(__i386__) || defined(__alpha__) || defined(__x86_64__) ||
\
-   defined(__hppa__) || defined(__sh__) || defined(__powerpc__) || \
-   defined(__arm__) || defined(__aarch64__) || defined(__mips__)
-
-#define fb_readb __raw_readb
-#define fb_readw __raw_readw
-#define fb_readl __raw_readl
-#define fb_readq __raw_readq
-#define fb_writeb __raw_writeb
-#define fb_writew __raw_writew
-#define fb_writel __raw_writel
-#define fb_writeq __raw_writeq


Note that on at least some architectures, the __raw variants are 
native-endian, whereas the regular accessors are explicitly 
little-endian, so there is a slight risk of inadvertently changing 
behaviour on big-endian systems (MIPS most likely, but a few old ARM 
platforms run BE as well).
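
The endianness difference can be shown with a small user-space model. This is only a sketch: load_native32() stands in for a __raw-style access and load_le32() for a regular little-endian accessor; both names, and host_is_big_endian(), are hypothetical and operate on a plain byte buffer rather than real MMIO.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Native-endian load: models what a __raw-style accessor would return. */
static uint32_t load_native32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Explicit little-endian load: models a readl()-style accessor. */
static uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Nonzero if the host stores the most significant byte first. */
static int host_is_big_endian(void)
{
	const uint16_t probe = 1;

	return *(const unsigned char *)&probe == 0;
}
```

On a little-endian host the two loads agree, so the substitution is invisible; on a big-endian host the same four bytes yield byte-swapped values, which is the behaviour change being flagged.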



+#define fb_readb readb
+#define fb_readw readw
+#define fb_readl readl
+#if defined(CONFIG_64BIT)
+#define fb_readq readq
+#endif


You probably don't need to bother making these conditional - 32-bit 
architectures aren't forbidden from providing readq/writeq if they 
really want to, and drivers can also use the io-64-nonatomic headers for 
portability. The build will still fail in a sufficiently obvious manner 
if neither is true.
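
For readers unfamiliar with the io-64-nonatomic approach, the idea is roughly that a 64-bit access is composed from two 32-bit ones in a fixed order. The following is a user-space sketch of the low-word-first variant; sketch_lo_hi_read64() is a hypothetical name and reads from an ordinary array rather than device registers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a non-atomic 64-bit read: two 32-bit loads, low word first.
 * 'regs' stands in for a pair of adjacent 32-bit registers.
 */
static uint64_t sketch_lo_hi_read64(const volatile uint32_t *regs)
{
	uint32_t lo = regs[0];
	uint32_t hi = regs[1];

	return ((uint64_t)hi << 32) | lo;
}
```

The two halves are read at different times, so this is only suitable where the device tolerates a torn read, which is a per-driver decision.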


Thanks,
Robin.


+#define fb_writeb writeb
+#define fb_writew writew
+#define fb_writel writel
+#if defined(CONFIG_64BIT)
+#define fb_writeq writeq
+#endif
  #define fb_memset memset_io
  #define fb_memcpy_fromfb memcpy_fromio
  #define fb_memcpy_tofb memcpy_toio
  
-#else

-
-#define fb_readb(addr) (*(volatile u8 *) (addr))
-#define fb_readw(addr) (*(volatile u16 *) (addr))
-#define fb_readl(addr) (*(volatile u32 *) (addr))
-#define fb_readq(addr) (*(volatile u64 *) (addr))
-#define fb_writeb(b,addr) (*(volatile u8 *) (addr) = (b))
-#define fb_writew(b,addr) (*(volatile u16 *) (addr) = (b))
-#define fb_writel(b,addr) (*(volatile u32 *) (addr) = (b))
-#define fb_writeq(b,addr) (*(volatile u64 *) (addr) = (b))
-#define fb_memset memset
-#define fb_memcpy_fromfb memcpy
-#define fb_memcpy_tofb memcpy
-
-#endif
-
  #define FB_LEFT_POS(p, bpp)  (fb_be_math(p) ? (32 - (bpp)) : 0)
  #define FB_SHIFT_HIGH(p, val, bits)  (fb_be_math(p) ? (val) >> (bits) : \
  (val) << (bits))


Re: [PATCH v2 04/10] iommu/dma: Use the gfp parameter in __iommu_dma_alloc_noncontiguous()

2023-01-20 Thread Robin Murphy

On 2023-01-18 18:00, Jason Gunthorpe wrote:

Change the sg_alloc_table_from_pages() allocation that was hardwired to
GFP_KERNEL to use the gfp parameter like the other allocations in this
function.

Auditing says this is never called from an atomic context, so it is safe
as is, but reads wrong.


I think the point may have been that the sgtable metadata is a 
logically-distinct allocation from the buffer pages themselves. Much 
like the allocation of the pages array itself further down in 
__iommu_dma_alloc_pages(). I see these days it wouldn't be catastrophic 
to pass GFP_HIGHMEM into __get_free_page() via sg_kmalloc(), but still, 
allocating implementation-internal metadata with all the same 
constraints as a DMA buffer has just as much smell of wrong about it IMO.


I'd say the more confusing thing about this particular context is why 
we're using iommu_map_sg_atomic() further down - that seems to have been 
an oversight in 781ca2de89ba, since this particular path has never 
supported being called in atomic context.


Overall I'm starting to wonder if it might not be better to stick a "use 
GFP_KERNEL_ACCOUNT if you allocate" flag in the domain for any level of 
the API internals to pick up as appropriate, rather than propagate 
per-call gfp flags everywhere. As it stands we're still missing 
potential pagetable and other domain-related allocations by drivers in 
.attach_dev and even (in probably-shouldn't-really-happen cases) 
.unmap_pages...


Thanks,
Robin.


Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/dma-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8c2788633c1766..e4bf1bb159f7c7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -822,7 +822,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct 
device *dev,
if (!iova)
goto out_free_pages;
  
-	if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL))

+   if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, gfp))
goto out_free_iova;
  
  	if (!(ioprot & IOMMU_CACHE)) {


Re: [PATCH] drm/panfrost: fix GENERIC_ATOMIC64 dependency

2023-01-18 Thread Robin Murphy

On 2023-01-18 11:09, Steven Price wrote:

On 17/01/2023 16:44, Arnd Bergmann wrote:

From: Arnd Bergmann 

On ARMv5 and earlier, a randconfig build can still run into

WARNING: unmet direct dependencies detected for IOMMU_IO_PGTABLE_LPAE
   Depends on [n]: IOMMU_SUPPORT [=y] && (ARM [=y] || ARM64 || COMPILE_TEST [=y]) 
&& !GENERIC_ATOMIC64 [=y]
   Selected by [y]:
   - DRM_PANFROST [=y] && HAS_IOMEM [=y] && DRM [=y] && (ARM [=y] || ARM64 || COMPILE_TEST [=y] 
&& !GENERIC_ATOMIC64 [=y]) && MMU [=y]

Rework the dependencies to always require a working cmpxchg64.

Fixes: db594ba3fcf9 ("drm/panfrost: depend on !GENERIC_ATOMIC64 when using 
COMPILE_TEST")


Looking at db594ba3fcf9 - it states:


 Since panfrost has a 'select' on IOMMU_IO_PGTABLE_LPAE we must depend on
 the same set of flags. Otherwise IOMMU_IO_PGTABLE_LPAE will be forced on
 even though it cannot build (no support for cmpxchg64).


And at the time the dependencies on IOMMU_IO_PGTABLE_LPAE were exactly
these.

However d286a58bc8f4 ("iommu: Tidy up io-pgtable dependencies")
(currently in the iommu tree) changed the depends to split the
!GENERIC_ATOMIC64 out. So we could argue that really that's the commit
that should be blamed in the fixes line.


Oh bum... indeed this is entirely my fault for forgetting about our one 
"foreign" io-pgtable user in that commit, sorry about that.



However there's no harm in this being backported further than it
strictly needs to be, and it's clearly better having the
!GENERIC_ATOMIC64 split out. So I'll merge this to drm-misc-fixes.


Thanks both!

Robin.



Reviewed-by: Steven Price 

Thanks!

Steve


Signed-off-by: Arnd Bergmann 
---
  drivers/gpu/drm/panfrost/Kconfig | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/Kconfig b/drivers/gpu/drm/panfrost/Kconfig
index 079600328be1..e6403a9d66ad 100644
--- a/drivers/gpu/drm/panfrost/Kconfig
+++ b/drivers/gpu/drm/panfrost/Kconfig
@@ -3,7 +3,8 @@
  config DRM_PANFROST
tristate "Panfrost (DRM support for ARM Mali Midgard/Bifrost GPUs)"
depends on DRM
-   depends on ARM || ARM64 || (COMPILE_TEST && !GENERIC_ATOMIC64)
+   depends on ARM || ARM64 || COMPILE_TEST
+   depends on !GENERIC_ATOMIC64# for IOMMU_IO_PGTABLE_LPAE
depends on MMU
select DRM_SCHED
select IOMMU_SUPPORT




Re: [PATCH 1/8] iommu: Add a gfp parameter to iommu_map()

2023-01-06 Thread Robin Murphy

On 2023-01-06 16:42, Jason Gunthorpe wrote:

The internal mechanisms support this, but instead of exposing the gfp to
the caller, it is wrapped into iommu_map() and iommu_map_atomic().

Fix this instead of adding more variants for GFP_KERNEL_ACCOUNT.


FWIW, since we *do* have two variants already, I think I'd have a mild 
preference for leaving the regular map calls as-is (i.e. implicit 
GFP_KERNEL), and just generalising the _atomic versions for the special 
cases.


However, echoing the recent activity over on the DMA API side of things, 
I think it's still worth proactively constraining the set of permissible 
flags, lest we end up with more weird problems if stuff that doesn't 
really make sense, like GFP_COMP or zone flags, manages to leak through 
(that may have been part of the reason for having the current wrappers 
rather than a bare gfp argument in the first place, I forget now).
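
One way to picture "constraining the set of permissible flags" is a simple mask check at the API boundary. The sketch below is a user-space model only: the SK_GFP_* values are made-up stand-ins, not the kernel's real gfp bits, and sk_check_map_gfp() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins only: NOT the kernel's real gfp bit values. */
#define SK_GFP_DMA	0x01u
#define SK_GFP_HIGHMEM	0x02u
#define SK_GFP_DMA32	0x04u
#define SK_GFP_COMP	0x08u
#define SK_GFP_ACCOUNT	0x10u

/* Zone selectors and compound-page requests make no sense for metadata. */
#define SK_GFP_FORBIDDEN \
	(SK_GFP_DMA | SK_GFP_HIGHMEM | SK_GFP_DMA32 | SK_GFP_COMP)

/* Return 0 if the flags are acceptable, -EINVAL otherwise. */
static int sk_check_map_gfp(unsigned int gfp)
{
	return (gfp & SK_GFP_FORBIDDEN) ? -EINVAL : 0;
}
```

Whether to reject such flags outright or silently mask them off is a design choice; rejecting makes misuse visible to the caller.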


Thanks,
Robin.


Signed-off-by: Jason Gunthorpe 
---
  arch/arm/mm/dma-mapping.c   | 11 +++
  .../gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c |  3 ++-
  drivers/gpu/drm/tegra/drm.c |  2 +-
  drivers/gpu/host1x/cdma.c   |  2 +-
  drivers/infiniband/hw/usnic/usnic_uiom.c|  4 ++--
  drivers/iommu/dma-iommu.c   |  2 +-
  drivers/iommu/iommu.c   | 17 ++---
  drivers/iommu/iommufd/pages.c   |  6 --
  drivers/media/platform/qcom/venus/firmware.c|  2 +-
  drivers/net/ipa/ipa_mem.c   |  6 --
  drivers/net/wireless/ath/ath10k/snoc.c  |  2 +-
  drivers/net/wireless/ath/ath11k/ahb.c   |  4 ++--
  drivers/remoteproc/remoteproc_core.c|  5 +++--
  drivers/vfio/vfio_iommu_type1.c |  9 +
  drivers/vhost/vdpa.c|  2 +-
  include/linux/iommu.h   |  4 ++--
  16 files changed, 43 insertions(+), 38 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c135f6e37a00ca..8bc01071474ab7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -984,7 +984,8 @@ __iommu_create_mapping(struct device *dev, struct page 
**pages, size_t size,
  
  		len = (j - i) << PAGE_SHIFT;

ret = iommu_map(mapping->domain, iova, phys, len,
-   __dma_info_to_prot(DMA_BIDIRECTIONAL, attrs));
+   __dma_info_to_prot(DMA_BIDIRECTIONAL, attrs),
+   GFP_KERNEL);
if (ret < 0)
goto fail;
iova += len;
@@ -1207,7 +1208,8 @@ static int __map_sg_chunk(struct device *dev, struct 
scatterlist *sg,
  
  		prot = __dma_info_to_prot(dir, attrs);
  
-		ret = iommu_map(mapping->domain, iova, phys, len, prot);

+   ret = iommu_map(mapping->domain, iova, phys, len, prot,
+   GFP_KERNEL);
if (ret < 0)
goto fail;
count += len >> PAGE_SHIFT;
@@ -1379,7 +1381,8 @@ static dma_addr_t arm_iommu_map_page(struct device *dev, 
struct page *page,
  
  	prot = __dma_info_to_prot(dir, attrs);
  
-	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);

+   ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len,
+   prot, GFP_KERNEL);
if (ret < 0)
goto fail;
  
@@ -1443,7 +1446,7 @@ static dma_addr_t arm_iommu_map_resource(struct device *dev,
  
  	prot = __dma_info_to_prot(dir, attrs) | IOMMU_MMIO;
  
-	ret = iommu_map(mapping->domain, dma_addr, addr, len, prot);

+   ret = iommu_map(mapping->domain, dma_addr, addr, len, prot, GFP_KERNEL);
if (ret < 0)
goto fail;
  
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c

index 648ecf5a8fbc2a..a4ac94a2ab57fc 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -475,7 +475,8 @@ gk20a_instobj_ctor_iommu(struct gk20a_instmem *imem, u32 
npages, u32 align,
u32 offset = (r->offset + i) << imem->iommu_pgshift;
  
  		ret = iommu_map(imem->domain, offset, node->dma_addrs[i],

-   PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
+   PAGE_SIZE, IOMMU_READ | IOMMU_WRITE,
+   GFP_KERNEL);
if (ret < 0) {
nvkm_error(subdev, "IOMMU mapping failure: %d\n", ret);
  
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c

index 7bd2e65c2a16c5..6ca9f396e55be4 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1057,7 +1057,7 @@ void *tegra_drm_alloc(struct tegra_drm *tegra, size_t 
size, dma_addr_t *dma)
  
  	*dma = iova_dma_addr(&tegra->carveout.domain, alloc);

err = 

Re: [PATCH v2 2/3] iommu/sound: Use component_match_add_of helper

2023-01-03 Thread Robin Murphy

On 03/01/2023 4:15 pm, Maxime Ripard wrote:

Hi Robin,

On Tue, Jan 03, 2023 at 01:01:07PM +, Robin Murphy wrote:

Hi Sean,

On 22/12/2022 11:37 pm, Sean Anderson wrote:

Convert users of component_match_add_release with component_release_of
and component_compare_of to component_match_add_of.

Signed-off-by: Sean Anderson 
Acked-by: Mark Brown 
---

Changes in v2:
- Split off from helper addition

   drivers/iommu/mtk_iommu.c| 3 +--
   drivers/iommu/mtk_iommu_v1.c | 3 +--
   sound/soc/codecs/wcd938x.c   | 6 ++
   3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2ab2ecfe01f8..483b7a9e4410 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1079,8 +1079,7 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, 
struct component_match **m
}
data->larb_imu[id].dev = >dev;
-   component_match_add_release(dev, match, component_release_of,
-   component_compare_of, larbnode);
+   component_match_add_of(dev, match, larbnode);


I've long since given up trying to make sense of how the DRM tree works, but
the conflicting change is definitely already in mainline:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=b5765a1b44bea9dfcae69c53ffeb4c689d0922a7


As far as I can see, that patch doesn't affect DRM at all, and the
commit you pointed to doesn't either, nor has it been merged through the
DRM tree.


Right it doesn't affect DRM, and was merged via the IOMMU tree, but it 
does affect *this* patch, which Sean has based on a drm-next branch that 
seemingly still wasn't up to date with 6.2-rc1 at the time.


Since v2 had already been posted, it seemed like a bright idea to 
comment here to clarify that it was still relevant, rather than bumping 
the old thread to reply directly. Apologies for any confusion.


In practical terms I think it's merely a case of dropping this hunk; the 
other one in mtk_iommu_v1.c looks fine to me.


Cheers,
Robin.


Can you expand a bit on how we're involved in this, what we should
clarify or help with?

Maxime


Re: [PATCH v2 2/3] iommu/sound: Use component_match_add_of helper

2023-01-03 Thread Robin Murphy

Hi Sean,

On 22/12/2022 11:37 pm, Sean Anderson wrote:

Convert users of component_match_add_release with component_release_of
and component_compare_of to component_match_add_of.

Signed-off-by: Sean Anderson 
Acked-by: Mark Brown 
---

Changes in v2:
- Split off from helper addition

  drivers/iommu/mtk_iommu.c| 3 +--
  drivers/iommu/mtk_iommu_v1.c | 3 +--
  sound/soc/codecs/wcd938x.c   | 6 ++
  3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2ab2ecfe01f8..483b7a9e4410 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1079,8 +1079,7 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, 
struct component_match **m
}
data->larb_imu[id].dev = >dev;
  
-		component_match_add_release(dev, match, component_release_of,

-   component_compare_of, larbnode);
+   component_match_add_of(dev, match, larbnode);


I've long since given up trying to make sense of how the DRM tree works, 
but the conflicting change is definitely already in mainline:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=b5765a1b44bea9dfcae69c53ffeb4c689d0922a7

Thanks,
Robin.


}
  
  	/* Get smi-(sub)-common dev from the last larb. */

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 6e0e65831eb7..fb09ed6bf550 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -672,8 +672,7 @@ static int mtk_iommu_v1_probe(struct platform_device *pdev)
}
data->larb_imu[i].dev = >dev;
  
-		component_match_add_release(dev, , component_release_of,

-   component_compare_of, larbnode);
+   component_match_add_of(dev, , larbnode);
}
  
  	platform_set_drvdata(pdev, data);

diff --git a/sound/soc/codecs/wcd938x.c b/sound/soc/codecs/wcd938x.c
index aca06a4026f3..2f8444e54083 100644
--- a/sound/soc/codecs/wcd938x.c
+++ b/sound/soc/codecs/wcd938x.c
@@ -4474,8 +4474,7 @@ static int wcd938x_add_slave_components(struct 
wcd938x_priv *wcd938x,
}
  
  	of_node_get(wcd938x->rxnode);

-   component_match_add_release(dev, matchptr, component_release_of,
-   component_compare_of, wcd938x->rxnode);
+   component_match_add_of(dev, matchptr, wcd938x->rxnode);
  
  	wcd938x->txnode = of_parse_phandle(np, "qcom,tx-device", 0);

if (!wcd938x->txnode) {
@@ -4483,8 +4482,7 @@ static int wcd938x_add_slave_components(struct 
wcd938x_priv *wcd938x,
return -ENODEV;
}
of_node_get(wcd938x->txnode);
-   component_match_add_release(dev, matchptr, component_release_of,
-   component_compare_of, wcd938x->txnode);
+   component_match_add_of(dev, matchptr, wcd938x->txnode);
return 0;
  }
  


Re: [PATCH 0/2] drm: Add component_match_add_of and convert users of drm_of_component_match_add

2022-12-16 Thread Robin Murphy

On 2022-12-16 17:08, Sean Anderson wrote:

On 11/3/22 14:22, Sean Anderson wrote:

This series adds a new function component_match_add_of to simplify the
common case of calling component_match_add_release with
component_release_of and component_compare_of. There is already
drm_of_component_match_add, which allows for a custom compare function.
However, all existing users just use component_compare_of (or an
equivalent).

I can split the second commit up if that is easier to review.


Sean Anderson (2):
   component: Add helper for device nodes
   drm: Convert users of drm_of_component_match_add to
 component_match_add_of

  .../gpu/drm/arm/display/komeda/komeda_drv.c   |  6 ++--
  drivers/gpu/drm/arm/hdlcd_drv.c   |  9 +-
  drivers/gpu/drm/arm/malidp_drv.c  | 11 +--
  drivers/gpu/drm/armada/armada_drv.c   | 10 ---
  drivers/gpu/drm/drm_of.c  | 29 +++
  drivers/gpu/drm/etnaviv/etnaviv_drv.c |  4 +--
  .../gpu/drm/hisilicon/kirin/kirin_drm_drv.c   |  3 +-
  drivers/gpu/drm/ingenic/ingenic-drm-drv.c |  3 +-
  drivers/gpu/drm/mediatek/mtk_drm_drv.c|  4 +--
  drivers/gpu/drm/msm/msm_drv.c | 14 -
  drivers/gpu/drm/sti/sti_drv.c |  3 +-
  drivers/gpu/drm/sun4i/sun4i_drv.c |  3 +-
  drivers/gpu/drm/tilcdc/tilcdc_external.c  | 10 ++-
  drivers/iommu/mtk_iommu.c |  3 +-
  drivers/iommu/mtk_iommu_v1.c  |  3 +-
  include/drm/drm_of.h  | 12 
  include/linux/component.h |  9 ++
  sound/soc/codecs/wcd938x.c|  6 ++--
  18 files changed, 46 insertions(+), 96 deletions(-)



ping?

Should I send a v2 broken up like Mark suggested?


FWIW you'll need to rebase the IOMMU changes on 6.2-rc1 anyway - 
mtk_iommu stops using component_match_add_release() at all.


Thanks,
Robin.


Re: [PATCH] drm/radeon: Fix screen corruption (v2)

2022-12-15 Thread Robin Murphy

On 2022-12-15 11:40, Luben Tuikov wrote:

On 2022-12-15 06:27, Christian König wrote:

Am 15.12.22 um 11:19 schrieb Luben Tuikov:

On 2022-12-15 04:46, Christian König wrote:

Am 15.12.22 um 10:08 schrieb Luben Tuikov:

On 2022-12-15 03:07, Christian König wrote:

Am 15.12.22 um 00:08 schrieb Robin Murphy:

On 2022-12-14 22:02, Alex Deucher wrote:

On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy 
wrote:

On 2022-12-12 02:08, Luben Tuikov wrote:

Fix screen corruption on older 32-bit systems using AGP chips.

On older systems with little memory, for instance 1.5 GiB, using an AGP chip,
the device's DMA mask is 0xFFFFFFFF, but the memory mask is 0x7FFFFFF, and
subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
false. As such the result of this static inline isn't suitable for the last
argument to ttm_device_init()--it simply needs to know whether to use GFP_DMA32
when allocating DMA buffers.

This sounds wrong to me. If the issues happen on systems without PAE it
clearly can't have anything to do with the actual DMA address size. Not to
mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
GFP_DMA32 would be functionally meaningless anyway. Although the
reported symptoms initially sounded like they could be caused by DMA
going to the wrong place, that is also equally consistent with a loss of
cache coherency.

My (limited) understanding of AGP is that the GART can effectively alias
memory to a second physical address, so I could well believe that
something somewhere in the driver stack needs to perform some cache
maintenance to avoid coherency issues, and that in these particular
setups whatever that is might be assuming the memory is direct-mapped
and thus going wrong for highmem pages.

So as I said before, I really think this is not about using GFP_DMA32 at
all, but about *not* using GFP_HIGHUSER.
One of the wonderful features of AGP is that it has to be used with
uncached memory.  The aperture basically just provides a remapping of
physical pages into a linear aperture that you point the GPU at.  TTM
has to jump through quite a few hoops to get uncached memory in the
first place, so it's likely that that somehow isn't compatible with
HIGHMEM.  Can you get uncached HIGHMEM?

I guess in principle yes, if you're careful not to use regular
kmap()/kmap_atomic(), and always use pgprot_noncached() for
userspace/vmalloc mappings, but clearly that leaves lots of scope for
slipping up.

In theory we should do exactly that in TTM, but we have very few users
who actually still exercise that functionality.


Working backwards from primitives like set_memory_uc(), I see various
paths in TTM where manipulating the caching state is skipped for
highmem pages, but I wouldn't even know where to start looking for
whether the right state is propagated to all the places where they
might eventually be mapped somewhere.

The tt object has the caching state for the pages and
ttm_prot_from_caching() then uses pgprot_noncached() and co for the
userspace/vmalloc mappings.


The point of this patch is that dma_addressing_limited() is unsuitable as
the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
(gfp_flags |= GFP_HIGHUSER in ttm_pool_alloc())
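
To make the comparison under discussion concrete, here is a minimal userspace sketch in C. It is not the kernel code: addressing_limited() and pool_gfp_zone() are hypothetical stand-ins for dma_addressing_limited() and the use_dma32 decision in the TTM pool, and the model leaves out the bus DMA limit that the real helper also takes into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: the device counts as "limited" only if its DMA mask
 * cannot cover every address the system might ask it to reach.
 */
bool addressing_limited(uint64_t dma_mask, uint64_t required_mask)
{
	return dma_mask < required_mask;
}

/* Hypothetical model of the allocation choice driven by use_dma32. */
const char *pool_gfp_zone(bool use_dma32)
{
	return use_dma32 ? "GFP_DMA32" : "GFP_HIGHUSER";
}

void demo_addressing_limited(void)
{
	/* 32-bit device, 1.5 GiB of RAM: not limited, so highmem is used. */
	assert(!addressing_limited(UINT64_C(0xFFFFFFFF), UINT64_C(0x7FFFFFFF)));
	/* The same device on a machine with RAM above 4 GiB is limited. */
	assert(addressing_limited(UINT64_C(0xFFFFFFFF), UINT64_C(0x1FFFFFFFF)));
}
```

Under these assumptions a 32-bit mask on a 1.5 GiB machine is never "limited", which is why the flag ends up selecting GFP_HIGHUSER in the case reported here.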

Well I would rather say that dma_addressing_limited() works, but the
default value from dma_get_required_mask() is broken.


dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF.


This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB
addressable memory (27 bits set)? Or is there another F missing?


Yeah, I'm missing an F--it is correctly described at the top of the thread above,
i.e. in the commit of v2 patch.

0x7FFF_FFFF, which seems correct, no?


While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init().


32 bits only work with bounce buffers and we can't use those on graphics
hardware.


Is there an objection to this patch, if it fixes the screen corruption?

Not from my side, but fixing the underlying issues would be better I think.


Have they been identified?


I'm not 100% sure. I think by using GFP_DMA32 we just work around the
issue somehow.


Right. Using GFP_DMA32, we don't touch high-mem. I was looking at the DRM
code trying to understand what we do when GFP_DMA32 is not set, and the immediate
thing I see is that we set GFP_HIGHUSER when use_dma32 is unset in the device struct.
(Then I got down to the caching attributes...)

It'd be nice if we can find the actual issue--what else would it show us that needs fixing...?

So what do we do with this patch?

Shouldn't leave it in a limbo--some OSes ship their kernel
with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations") wholly
reverted.


Removing dma_addressing_limited() is still wrong, for the reasons given 
in that commit. What we need is an *additional* condition t

Re: [PATCH] drm/radeon: Fix screen corruption (v2)

2022-12-14 Thread Robin Murphy

On 2022-12-14 22:02, Alex Deucher wrote:

On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy  wrote:


On 2022-12-12 02:08, Luben Tuikov wrote:

Fix screen corruption on older 32-bit systems using AGP chips.

On older systems with little memory, for instance 1.5 GiB, using an AGP chip,
the device's DMA mask is 0xFFFFFFFF, but the memory mask is 0x7FFFFFF, and
subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
false. As such the result of this static inline isn't suitable for the last
argument to ttm_device_init()--it simply needs to know whether to use GFP_DMA32
when allocating DMA buffers.


This sounds wrong to me. If the issues happen on systems without PAE it
clearly can't have anything to do with the actual DMA address size. Not to
mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
GFP_DMA32 would be functionally meaningless anyway. Although the
reported symptoms initially sounded like they could be caused by DMA
going to the wrong place, that is also equally consistent with a loss of
cache coherency.

My (limited) understanding of AGP is that the GART can effectively alias
memory to a second physical address, so I could well believe that
something somewhere in the driver stack needs to perform some cache
maintenance to avoid coherency issues, and that in these particular
setups whatever that is might be assuming the memory is direct-mapped
and thus going wrong for highmem pages.

So as I said before, I really think this is not about using GFP_DMA32 at
all, but about *not* using GFP_HIGHUSER.


One of the wonderful features of AGP is that it has to be used with
uncached memory.  The aperture basically just provides a remapping of
physical pages into a linear aperture that you point the GPU at.  TTM
has to jump through quite a few hoops to get uncached memory in the
first place, so it's likely that that somehow isn't compatible with
HIGHMEM.  Can you get uncached HIGHMEM?


I guess in principle yes, if you're careful not to use regular 
kmap()/kmap_atomic(), and always use pgprot_noncached() for 
userspace/vmalloc mappings, but clearly that leaves lots of scope for 
slipping up.


Working backwards from primitives like set_memory_uc(), I see various 
paths in TTM where manipulating the caching state is skipped for highmem 
pages, but I wouldn't even know where to start looking for whether the 
right state is propagated to all the places where they might eventually 
be mapped somewhere.


Cheers,
Robin.


Re: [PATCH] drm/radeon: Fix screen corruption (v2)

2022-12-14 Thread Robin Murphy

On 2022-12-12 02:08, Luben Tuikov wrote:

Fix screen corruption on older 32-bit systems using AGP chips.

On older systems with little memory, for instance 1.5 GiB, using an AGP chip,
the device's DMA mask is 0xFFFFFFFF, but the memory mask is 0x7FFFFFF, and
subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
false. As such the result of this static inline isn't suitable for the last
argument to ttm_device_init()--it simply needs to know whether to use GFP_DMA32
when allocating DMA buffers.


This sounds wrong to me. If the issues happen on systems without PAE it 
clearly can't have anything to do with the actual DMA address size. Not to 
mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so 
GFP_DMA32 would be functionally meaningless anyway. Although the 
reported symptoms initially sounded like they could be caused by DMA 
going to the wrong place, that is also equally consistent with a loss of 
cache coherency.


My (limited) understanding of AGP is that the GART can effectively alias 
memory to a second physical address, so I could well believe that 
something somewhere in the driver stack needs to perform some cache 
maintenance to avoid coherency issues, and that in these particular 
setups whatever that is might be assuming the memory is direct-mapped 
and thus going wrong for highmem pages.


So as I said before, I really think this is not about using GFP_DMA32 at 
all, but about *not* using GFP_HIGHUSER.


Thanks,
Robin.


Partially reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.

v2: Amend the commit description.

Cc: Mikhail Krylov 
Cc: Alex Deucher 
Cc: Robin Murphy 
Cc: Direct Rendering Infrastructure - Development 

Cc: AMD Graphics 
Fixes: 33b3ad3788aba8 ("drm/radeon: handle PCIe root ports with addressing limitations")
Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/radeon/radeon.h| 1 +
  drivers/gpu/drm/radeon/radeon_device.c | 2 +-
  drivers/gpu/drm/radeon/radeon_ttm.c| 2 +-
  3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 37dec92339b16a..4fe38fd9be3267 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2426,6 +2426,7 @@ struct radeon_device {
 	struct radeon_wb		wb;
 	struct radeon_dummy_page	dummy_page;
 	bool				shutdown;
+	bool				need_dma32;
 	bool				need_swiotlb;
 	bool				accel_working;
 	bool				fastfb_working; /* IGP feature*/
diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index 6344454a772172..3643a3cfe061bd 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1370,7 +1370,7 @@ int radeon_device_init(struct radeon_device *rdev,
if (rdev->family == CHIP_CEDAR)
dma_bits = 32;
  #endif
-
+   rdev->need_dma32 = dma_bits == 32;
r = dma_set_mask_and_coherent(&rdev->pdev->dev, DMA_BIT_MASK(dma_bits));
if (r) {
pr_warn("radeon: No suitable DMA available\n");
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index bdb4c0e0736ba2..3debaeb720d173 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -696,7 +696,7 @@ int radeon_ttm_init(struct radeon_device *rdev)
   rdev->ddev->anon_inode->i_mapping,
   rdev->ddev->vma_offset_manager,
   rdev->need_swiotlb,
-  dma_addressing_limited(&rdev->pdev->dev));
+  rdev->need_dma32);
if (r) {
DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
return r;

base-commit: 20e03e7f6e8efd42168db6d3fe044b804e0ede8f


Re: [PATCH] drm: mali-dp: Add check for kzalloc

2022-12-07 Thread Robin Murphy

On 2022-12-07 15:29, Liviu Dudau wrote:

On Wed, Dec 07, 2022 at 01:59:04PM +0000, Robin Murphy wrote:

On 2022-12-07 09:21, Jiasheng Jiang wrote:

As kzalloc may fail and return NULL pointer, it should be better to check
the return value in order to avoid the NULL pointer dereference in
__drm_atomic_helper_connector_reset.


This commit message is nonsense; if __drm_atomic_helper_connector_reset()
would dereference the NULL implied by &mw_state->base, it would equally
still dereference the explicit NULL pointer passed after this patch.


Where?


Exactly, that function already checks conn_state for NULL anyway, so any 
reasoning based on it not doing that is clearly erroneous. Even if 
something else changed in future to actually make this a bug, it still 
wouldn't strictly dereference NULL, but some small non-NULL value.



The current code works out OK because "base" is the first member of struct
malidp_mw_connector_state, thus if mw_state is NULL then &mw_state->base ==
NULL + 0 == NULL. Now you *could* argue that this isn't robust if the layout
of struct malidp_mw_connector_state ever changes, and that could be a valid
justification for making this change, but the reason given certainly isn't.


I appreciate the input and I agree with your analysis, however I don't have the same
confidence that compilers will always do the NULL + 0 math to get address of base.
Would this always work when you have authenticated pointers or is the compiler going
to generate some plumbing code that checks the pointer before doing the math?


For the current definition of struct malidp_mw_connector_state, 
&mw_state->base is equal to mw_state, that's just how C works:


"A pointer to a structure object, suitably converted, points to its 
initial member (or if that member is a bit-field, then to the unit in 
which it resides), and vice versa. There may be unnamed padding within a 
structure object, but not at its beginning."


Indeed a C compiler is technically at liberty to make checks for whether 
any pointer points to a valid object when evaluating it, but in practice 
no compiler is going to do that because it would be horrendously 
inefficient, and since the behaviour of dereferencing an invalid pointer 
is undefined, compilers are also able to simply assume all pointers are 
valid and generate good code based on that. Don't forget that there are 
several compiler optimisations that Linux actually depends on; AFAICT 
this is one of them.
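
As a small illustration of the layout rule quoted above, the following C sketch uses hypothetical stand-in structures (base_state and wrapper_state are not the driver's types). It only checks the guarantee on a valid object and via offsetof(), so it does not itself perform arithmetic on a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct base_state {
	int dummy;
};

struct wrapper_state {
	struct base_state base;	/* first member, so it sits at offset 0 */
	int extra;
};

/* Address of the embedded base for a valid wrapper object. */
struct base_state *to_base(struct wrapper_state *w)
{
	return &w->base;
}

void demo_first_member(void)
{
	struct wrapper_state w = { { 1 }, 2 };

	/* No padding is allowed before the first member. */
	assert(offsetof(struct wrapper_state, base) == 0);
	/* Hence both pointers refer to the same address. */
	assert((void *)to_base(&w) == (void *)&w);
}
```

If "base" were ever moved away from the start of the structure, the first assertion would fail, which is the robustness concern mentioned earlier in the thread.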



Arithmetic on a (potentially) NULL pointer may well be a sign that it's
worth a closer look to check whether it really is what the code intended to
do, but don't automatically assume it has to be a bug. Otherwise, good luck
with "fixing" every user of container_of() throughout the entire kernel.


My understanding is that you're supposed to use container_of() only when you're sure
that your pointer is valid. container_of_safe() seems to be the one to use when you
don't care about NULL pointers.


I was thinking more along the lines of the "((type *)0)->member" 
expression in the definition, but fair enough, that's perhaps not the 
best example since you can argue it's an operand of typeof() which won't 
actually be evaluated. Try `git grep '&((.\+ *)\(0\|NULL\))->'` for more 
examples that will be. If none of those are going to work as intended, 
the kernel likely has bigger problems than how one driver might behave 
in OOM conditions.
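
For readers unfamiliar with the pattern, here is a simplified userspace sketch of container_of(). The macro below is a reduced version without the kernel's type checking, and struct inner, struct outer and outer_from_inner() are hypothetical names used only for illustration; the NULL-tolerant wrapper is merely in the spirit of container_of_safe(), not its actual definition.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified macro; the kernel's definition adds type checking on top. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct inner {
	int v;
};

struct outer {
	int tag;
	struct inner in;	/* deliberately not the first member */
};

/* Hypothetical NULL-tolerant wrapper: NULL in, NULL out. */
struct outer *outer_from_inner(struct inner *p)
{
	return p ? container_of(p, struct outer, in) : NULL;
}

void demo_container_of(void)
{
	struct outer o = { 7, { 42 } };

	/* A valid member pointer maps back to its enclosing object. */
	assert(outer_from_inner(&o.in) == &o);
	/* NULL is passed through instead of being offset. */
	assert(outer_from_inner(NULL) == NULL);
}
```

Because "in" has a non-zero offset here, feeding NULL straight into the plain macro would produce a small non-NULL value rather than NULL, which is the distinction being drawn in this exchange.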


Anyway, like I say I'm not objecting to the code change - even if the 
current non-bug wasn't an oversight, it's still a bit too clever for its 
own good. However, if the *justification* for making that change is 
going to go beyond "do this because static analysis suggested it", then 
it needs to explain a potential issue that actually exists and is worthy 
of fixing, not make up one that doesn't.


Cheers,
Robin.


Re: [PATCH] drm: mali-dp: Add check for kzalloc

2022-12-07 Thread Robin Murphy

On 2022-12-07 09:21, Jiasheng Jiang wrote:

As kzalloc may fail and return NULL pointer, it should be better to check
the return value in order to avoid the NULL pointer dereference in
__drm_atomic_helper_connector_reset.


This commit message is nonsense; if 
__drm_atomic_helper_connector_reset() would dereference the NULL implied 
by &mw_state->base, it would equally still dereference the explicit NULL 
pointer passed after this patch.


The current code works out OK because "base" is the first member of 
struct malidp_mw_connector_state, thus if mw_state is NULL then 
&mw_state->base == NULL + 0 == NULL. Now you *could* argue that this 
isn't robust if the layout of struct malidp_mw_connector_state ever 
changes, and that could be a valid justification for making this change, 
but the reason given certainly isn't.


Arithmetic on a (potentially) NULL pointer may well be a sign that it's 
worth a closer look to check whether it really is what the code intended 
to do, but don't automatically assume it has to be a bug. Otherwise, 
good luck with "fixing" every user of container_of() throughout the 
entire kernel.


Thanks,
Robin.


Fixes: 8cbc5caf36ef ("drm: mali-dp: Add writeback connector")
Signed-off-by: Jiasheng Jiang 
---
  drivers/gpu/drm/arm/malidp_mw.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/arm/malidp_mw.c b/drivers/gpu/drm/arm/malidp_mw.c
index ef76d0e6ee2f..fe4474c2ddcf 100644
--- a/drivers/gpu/drm/arm/malidp_mw.c
+++ b/drivers/gpu/drm/arm/malidp_mw.c
@@ -72,7 +72,11 @@ static void malidp_mw_connector_reset(struct drm_connector 
*connector)
__drm_atomic_helper_connector_destroy_state(connector->state);
  
  	kfree(connector->state);

-   __drm_atomic_helper_connector_reset(connector, &mw_state->base);
+
+   if (mw_state)
+   __drm_atomic_helper_connector_reset(connector, &mw_state->base);
+   else
+   __drm_atomic_helper_connector_reset(connector, NULL);
  }
  
  static enum drm_connector_status


Re: Screen corruption using radeon kernel driver

2022-12-01 Thread Robin Murphy

On 2022-11-30 19:59, Mikhail Krylov wrote:

On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote:

On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy  wrote:


On 2022-11-30 14:28, Alex Deucher wrote:

On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy  wrote:


On 2022-11-29 17:11, Mikhail Krylov wrote:

On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:

On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  wrote:


On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:

On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:


On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:


[excessive quoting removed]



So, is there any progress on this issue? I do understand it's not a high
priority one, and today I've checked it on 6.0 kernel, and
unfortunately, it still persists...

I'm considering writing a patch that will allow user to override
need_dma32/dma_bits setting with a module parameter. I'll have some time
after the New Year for that.

Is it at all possible that such a patch will be merged into kernel?


On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  wrote:
Unless someone familiar with HIGHMEM can figure out what is going wrong
we should just revert the patch.

Alex



Okay, I was suggesting that mostly because

a) it works for me with dma_bits = 40 (I understand that's what it is
without the original patch applied);

b) there's a hint of uncertainty on this line
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
saying that for AGP dma_bits = 32 is the safest option, so apparently there are
setups, unlike mine, where dma_bits = 32 is better than 40.

But I'm in no position to argue, just wanted to make myself clear.
I'm okay with rebuilding the kernel for my machine until the original
patch is reverted or any other fix is applied.


What GPU do you have and is it AGP?  If it is AGP, does setting
radeon.agpmode=-1 also fix it?

Alex


That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
help, it just makes 3D acceleration in games such as OpenArena stop
working.


Just to confirm, is the board AGP or PCIe?

Alex


It is AGP. That's an old machine.


Can you check whether dma_addressing_limited() is actually returning the
expected result at the point of radeon_ttm_init()? Disabling highmem is
presumably just hiding whatever problem exists, by throwing away all
   >32-bit RAM such that use_dma32 doesn't matter.


The device in question only supports a 32 bit DMA mask so
dma_addressing_limited() should return true.  Bounce buffers are not
really usable on GPUs because they map so much memory.  If
dma_addressing_limited() returns false, that would explain it.


Right, it appears to be the only part of the offending commit that
*could* reasonably make any difference, so I'm primarily wondering if
dma_get_required_mask() somehow gets confused.


Mikhail,

Can you see what dma_addressing_limited() and dma_get_required_mask()
return in this case?

Alex




Thanks,
Robin.


Unfortunately, right now I don't have enough time for kernel
modifications and rebuilds (I will later!), so I did a quick-and-dirty
research with kprobe.

The problem is that dma_addressing_limited() seems to be inlined and
kprobe fails to intercept it.

But I managed to get the result of dma_get_required_mask(). It returns
0x7fffffff (!) on the vanilla (with the patch, buggy) kernel:

$ sudo kprobe-perf 'r:dma_get_required_mask $retval'

Tracing kprobe dma_get_required_mask. Ctrl-C to end.
 modprobe-1244[000] d...   105.582816: dma_get_required_mask: (radeon_ttm_init+0x61/0x240 [radeon] <- dma_get_required_mask) arg1=0x7fffffff

This function does not even get called in the kernel without the patch
that I built myself. I believe that's because ttm_bo_device_init()
doesn't call it without the patch.

Hope that helps at least a bit. If not, I'll be able to do more thorough
research in a couple of weeks, probably.


Hmm, just to clarify, what's your actual RAM layout? I've been assuming
that the issue must be caused by unexpected DMA address truncation, but
double-checking the older threads it seems that might not be the case.
I just did a quick sanity-check of both HIGHMEM4G and HIGHMEM64G configs
in a VM with either 2GB or 4GB of RAM assigned, and the
dma_direct_get_required_mask() calculation seemed to return the
appropriate result for all combinations.
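
As a rough cross-check of the values seen in this thread, the calculation can be modelled in userspace C. This is a sketch under stated assumptions, not the kernel implementation: required_mask() and highest_bit() are hypothetical helpers, RAM is assumed to be one contiguous range starting at address 0, and no DMA offset is applied.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the highest set bit, counting from 1; 0 when x is 0. */
static unsigned int highest_bit(uint64_t x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Smallest all-ones mask that covers the highest RAM address. */
uint64_t required_mask(uint64_t top_of_ram)
{
	unsigned int bits = highest_bit(top_of_ram);

	if (bits == 0)
		return 0;
	/* Written as (2^(bits-1)) * 2 - 1 so that bits == 64 wraps safely. */
	return (UINT64_C(1) << (bits - 1)) * 2 - 1;
}

void demo_required_mask(void)
{
	/* 1.5 GiB of RAM: last byte at 0x5FFFFFFF needs a 31-bit mask. */
	assert(required_mask(UINT64_C(0x5FFFFFFF)) == UINT64_C(0x7FFFFFFF));
	/* 4 GiB of RAM: last byte at 0xFFFFFFFF needs a 32-bit mask. */
	assert(required_mask(UINT64_C(0xFFFFFFFF)) == UINT64_C(0xFFFFFFFF));
}
```

With those assumptions, 1.5 GiB of RAM yields 0x7FFFFFFF, matching the kprobe result reported above, so the mask itself does not look confused.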

Otherwise, the only significant difference of use_dma32 seems to be to
switch TTM's allocation flags from GFP_HIGHUSER to GFP_DMA32. Could it
just be that the highmem support somewhere between TTM and radeon has
bitrotted, and it hasn't been noticed until this change because everyone
still using a 32-bit system with highmem also happens not to be using a
newer 40-bit-capable GPU? Or perhaps it never worked for AGP at all, in
which case an explicit special case might be clearer?

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon

Re: Screen corruption using radeon kernel driver

2022-11-30 Thread Robin Murphy

On 2022-11-30 14:28, Alex Deucher wrote:

On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy  wrote:


On 2022-11-29 17:11, Mikhail Krylov wrote:

On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:

On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  wrote:


On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:

On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:


On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:


[excessive quoting removed]



So, is there any progress on this issue? I do understand it's not a high
priority one, and today I've checked it on 6.0 kernel, and
unfortunately, it still persists...

I'm considering writing a patch that will allow user to override
need_dma32/dma_bits setting with a module parameter. I'll have some time
after the New Year for that.

Is it at all possible that such a patch will be merged into kernel?


On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  wrote:
Unless someone familiar with HIGHMEM can figure out what is going wrong
we should just revert the patch.

Alex



Okay, I was suggesting that mostly because

a) it works for me with dma_bits = 40 (I understand that's what it is
without the original patch applied);

b) there's a hint of uncertainty on this line
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
saying that for AGP dma_bits = 32 is the safest option, so apparently there are
setups, unlike mine, where dma_bits = 32 is better than 40.

But I'm in no position to argue, just wanted to make myself clear.
I'm okay with rebuilding the kernel for my machine until the original
patch is reverted or any other fix is applied.


What GPU do you have and is it AGP?  If it is AGP, does setting
radeon.agpmode=-1 also fix it?

Alex


That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
help, it just makes 3D acceleration in games such as OpenArena stop
working.


Just to confirm, is the board AGP or PCIe?

Alex


It is AGP. That's an old machine.


Can you check whether dma_addressing_limited() is actually returning the
expected result at the point of radeon_ttm_init()? Disabling highmem is
presumably just hiding whatever problem exists, by throwing away all
  >32-bit RAM such that use_dma32 doesn't matter.


The device in question only supports a 32 bit DMA mask so
dma_addressing_limited() should return true.  Bounce buffers are not
really usable on GPUs because they map so much memory.  If
dma_addressing_limited() returns false, that would explain it.


Right, it appears to be the only part of the offending commit that 
*could* reasonably make any difference, so I'm primarily wondering if 
dma_get_required_mask() somehow gets confused.


Thanks,
Robin.


Re: Screen corruption using radeon kernel driver

2022-11-30 Thread Robin Murphy

On 2022-11-29 17:11, Mikhail Krylov wrote:

On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:

On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  wrote:


On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:

On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:


On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:


[excessive quoting removed]



So, is there any progress on this issue? I do understand it's not a high
priority one, and today I've checked it on 6.0 kernel, and
unfortunately, it still persists...

I'm considering writing a patch that will allow user to override
need_dma32/dma_bits setting with a module parameter. I'll have some time
after the New Year for that.

Is it at all possible that such a patch will be merged into kernel?


On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  wrote:
Unless someone familiar with HIGHMEM can figure out what is going wrong
we should just revert the patch.

Alex



Okay, I was suggesting that mostly because

a) it works for me with dma_bits = 40 (I understand that's what it is
without the original patch applied);

b) there's a hint of uncertainty on this line
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
saying that for AGP dma_bits = 32 is the safest option, so apparently there are
setups, unlike mine, where dma_bits = 32 is better than 40.

But I'm in no position to argue, just wanted to make myself clear.
I'm okay with rebuilding the kernel for my machine until the original
patch is reverted or any other fix is applied.


What GPU do you have and is it AGP?  If it is AGP, does setting
radeon.agpmode=-1 also fix it?

Alex


That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
help, it just makes 3D acceleration in games such as OpenArena stop
working.


Just to confirm, is the board AGP or PCIe?

Alex


It is AGP. That's an old machine.


Can you check whether dma_addressing_limited() is actually returning the 
expected result at the point of radeon_ttm_init()? Disabling highmem is 
presumably just hiding whatever problem exists, by throwing away all 
>32-bit RAM such that use_dma32 doesn't matter.


Robin.


Re: [PATCH] drm/dma-helpers: Don't change vma flags

2022-11-24 Thread Robin Murphy

On 2022-11-24 17:24, Daniel Vetter wrote:

On Thu, Nov 24, 2022 at 11:11:21AM +0000, Robin Murphy wrote:

On 2022-11-23 17:28, Daniel Vetter wrote:

This code was added in b65e64f7ccd4 ("drm/cma: Use
dma_mmap_writecombine() to mmap buffer"), but does not explain why
it's needed.

It should be entirely unnecessary, because remap_pfn_range(), which is
what the various dma_mmap functions are built on top of, does already
unconditionally adjust the vma flags:


Not all dma_mmap_*() implementations use remap_pfn_range() though. For
instance, on ARM one set of DMA ops uses vm_map_pages(), but AFAICS not
all the relevant drivers would set VM_MIXEDMAP to prevent reaching the
BUG_ON(vma->vm_flags & VM_PFNMAP) in there.


Uh a dma_mmap which does not use VM_PFNMAP has pretty good chances of
being busted, since that allows get_user_pages on these memory
allocations. And I'm really not sure that's a bright idea ...

Can you please point me at these dma-ops so that I can try and understand
what they're trying to do?


See arm_iommu_mmap_attrs(), but also one of the paths in 
iommu_dma_mmap() for both arm64 and x86. These aren't using 
remap_pfn_range() because they're mapping a potentially-disjoint set of 
arbitrary pages, not a single physically-contiguous range. And for the 
avoidance of doubt, yes, in those cases they will always be real kernel 
pages. dma_mmap_attrs() can be relied upon to do the right thing for 
whatever dma_alloc_attrs() did; what isn't reliable is trying to 
second-guess from outside exactly what that might be.


I forgot to mention also that removing the VM_DONTEXPAND line will 
seemingly just reintroduce the annoying warning spam for which we added 
it in the first place (and 59f39bfa6553 does document this same reasoning).


Thanks,
Robin.


-Daniel



Robin.


https://elixir.bootlin.com/linux/v6.1-rc6/source/mm/memory.c#L2518

More importantly, it does unconditionally set VM_PFNMAP, so clearing
that does not make much sense.

Patch motivated by discussions around enforcing VM_PFNMAP semantics for
all dma-buf users, where Thomas asked why dma helpers will work with
that dma_buf_mmap() contract.

References: 
https://lore.kernel.org/dri-devel/5c3c8d4f-2c06-9210-b00a-4d0ff6f6f...@suse.de/
Cc: Laurent Pinchart 
Cc: Dave Airlie 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: Sumit Semwal 
Cc: "Christian König" 
Signed-off-by: Daniel Vetter 
---
   drivers/gpu/drm/drm_gem_dma_helper.c | 7 ++-
   1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_dma_helper.c 
b/drivers/gpu/drm/drm_gem_dma_helper.c
index 1e658c448366..637a5cc62457 100644
--- a/drivers/gpu/drm/drm_gem_dma_helper.c
+++ b/drivers/gpu/drm/drm_gem_dma_helper.c
@@ -525,13 +525,10 @@ int drm_gem_dma_mmap(struct drm_gem_dma_object *dma_obj, 
struct vm_area_struct *
int ret;
/*
-* Clear the VM_PFNMAP flag that was set by drm_gem_mmap(), and set the
-* vm_pgoff (used as a fake buffer offset by DRM) to 0 as we want to map
-* the whole buffer.
+* Set the vm_pgoff (used as a fake buffer offset by DRM) to 0 as we
+* want to map the whole buffer.
 */
vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
-   vma->vm_flags &= ~VM_PFNMAP;
-   vma->vm_flags |= VM_DONTEXPAND;
if (dma_obj->map_noncoherent) {
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);




Re: [PATCH] drm/dma-helpers: Don't change vma flags

2022-11-24 Thread Robin Murphy

On 2022-11-23 17:28, Daniel Vetter wrote:

This code was added in b65e64f7ccd4 ("drm/cma: Use
dma_mmap_writecombine() to mmap buffer"), but does not explain why
it's needed.

It should be entirely unnecessary, because remap_pfn_range(), which is
what the various dma_mmap functions are built on top of, does already
unconditionally adjust the vma flags:


Not all dma_mmap_*() implementations use remap_pfn_range() though. For 
instance, on ARM one set of DMA ops uses vm_map_pages(), but AFAICS 
not all the relevant drivers would set VM_MIXEDMAP to prevent reaching 
the BUG_ON(vma->vm_flags & VM_PFNMAP) in there.


Robin.


https://elixir.bootlin.com/linux/v6.1-rc6/source/mm/memory.c#L2518

More importantly, it does unconditionally set VM_PFNMAP, so clearing
that does not make much sense.

Patch motivated by discussions around enforcing VM_PFNMAP semantics for
all dma-buf users, where Thomas asked why dma helpers will work with
that dma_buf_mmap() contract.

References: 
https://lore.kernel.org/dri-devel/5c3c8d4f-2c06-9210-b00a-4d0ff6f6f...@suse.de/
Cc: Laurent Pinchart 
Cc: Dave Airlie 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: Sumit Semwal 
Cc: "Christian König" 
Signed-off-by: Daniel Vetter 
---
  drivers/gpu/drm/drm_gem_dma_helper.c | 7 ++-
  1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_dma_helper.c 
b/drivers/gpu/drm/drm_gem_dma_helper.c
index 1e658c448366..637a5cc62457 100644
--- a/drivers/gpu/drm/drm_gem_dma_helper.c
+++ b/drivers/gpu/drm/drm_gem_dma_helper.c
@@ -525,13 +525,10 @@ int drm_gem_dma_mmap(struct drm_gem_dma_object *dma_obj, 
struct vm_area_struct *
int ret;
  
  	/*

-* Clear the VM_PFNMAP flag that was set by drm_gem_mmap(), and set the
-* vm_pgoff (used as a fake buffer offset by DRM) to 0 as we want to map
-* the whole buffer.
+* Set the vm_pgoff (used as a fake buffer offset by DRM) to 0 as we
+* want to map the whole buffer.
 */
vma->vm_pgoff -= drm_vma_node_start(>vma_node);
-   vma->vm_flags &= ~VM_PFNMAP;
-   vma->vm_flags |= VM_DONTEXPAND;
  
  	if (dma_obj->map_noncoherent) {

vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);


[PATCH] drm/panfrost: Split io-pgtable requests properly

2022-11-08 Thread Robin Murphy
Although we don't use 1GB block mappings, we still need to split
map/unmap requests at 1GB boundaries to match what io-pgtable expects.
Fix that, and add some explanation to make sense of it all.

Fixes: 3740b081795a ("drm/panfrost: Update io-pgtable API")
Reported-by: Dmitry Osipenko 
Signed-off-by: Robin Murphy 
---
The previous diff turned out to be not quite right, so I've not
included Dmitry's Tested-by given for that.
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e246d914e7f6..4e83a1891f3e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -250,13 +250,22 @@ void panfrost_mmu_reset(struct panfrost_device *pfdev)
 
 static size_t get_pgsize(u64 addr, size_t size, size_t *count)
 {
+   /*
+* io-pgtable only operates on multiple pages within a single table
+* entry, so we need to split at boundaries of the table size, i.e.
+* the next block size up. The distance from address A to the next
+* boundary of block size B is logically B - A % B, but in unsigned
+* two's complement where B is a power of two we get the equivalence
+* B - A % B == (B - A) % B == (n * B - A) % B, and choose n = 0 :)
+*/
size_t blk_offset = -addr % SZ_2M;
 
if (blk_offset || size < SZ_2M) {
*count = min_not_zero(blk_offset, size) / SZ_4K;
return SZ_4K;
}
-   *count = size / SZ_2M;
+   blk_offset = -addr % SZ_1G ?: SZ_1G;
+   *count = min(blk_offset, size) / SZ_2M;
return SZ_2M;
 }
 
-- 
2.36.1.dirty
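
The arithmetic in the comment above can be checked in userspace. What
follows is a minimal sketch, assuming 64-bit unsigned arithmetic; it is
not the kernel code. The SZ_* constants are redefined locally, and
dist_to_boundary() and min_not_zero_u64() are hypothetical helper names
standing in for the open-coded expression and the kernel's
min_not_zero() macro.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K 0x1000ULL
#define SZ_2M 0x200000ULL
#define SZ_1G 0x40000000ULL

/*
 * Distance from addr up to the next multiple of blk (a power of two),
 * or 0 if addr is already aligned. Unsigned negation wraps modulo 2^64,
 * which is itself a multiple of blk, so -addr % blk is the same as
 * (n * blk - addr) % blk for any n.
 */
static uint64_t dist_to_boundary(uint64_t addr, uint64_t blk)
{
    assert(blk != 0 && (blk & (blk - 1)) == 0);
    return -addr % blk;
}

/* Userspace stand-in for the kernel's min_not_zero(). */
static uint64_t min_not_zero_u64(uint64_t x, uint64_t y)
{
    if (x == 0)
        return y;
    if (y == 0)
        return x;
    return x < y ? x : y;
}

/* Mirrors the patched get_pgsize(): returns the page size, sets *count. */
static uint64_t get_pgsize(uint64_t addr, uint64_t size, uint64_t *count)
{
    uint64_t blk_offset = dist_to_boundary(addr, SZ_2M);

    if (blk_offset || size < SZ_2M) {
        /* 4K pages, but only up to the next 2MB boundary */
        *count = min_not_zero_u64(blk_offset, size) / SZ_4K;
        return SZ_4K;
    }

    /* 2MB blocks, but only up to the next 1GB boundary */
    blk_offset = dist_to_boundary(addr, SZ_1G);
    if (blk_offset == 0)
        blk_offset = SZ_1G;    /* aligned: a whole table is available */
    *count = (blk_offset < size ? blk_offset : size) / SZ_2M;
    return SZ_2M;
}
```

With these definitions, a 2MB-aligned address sitting 2MB below a 1GB
boundary yields a count of 1 regardless of how much larger the request
is, which is the split that io-pgtable expects.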



Re: [PATCH] drm/panfrost: Update io-pgtable API

2022-11-07 Thread Robin Murphy

On 2022-11-04 20:48, Dmitry Osipenko wrote:

On 11/4/22 23:37, Robin Murphy wrote:

On 2022-11-04 20:11, Dmitry Osipenko wrote:

On 8/23/22 01:01, Robin Murphy wrote:

Convert to io-pgtable's bulk {map,unmap}_pages() APIs, to help the old
single-page interfaces eventually go away. Unmapping heap BOs still
wants to be done a page at a time, but everything else can get the full
benefit of the more efficient interface.

Signed-off-by: Robin Murphy 
---
   drivers/gpu/drm/panfrost/panfrost_mmu.c | 40 +++--
   1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index b285a8001b1d..e246d914e7f6 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -248,11 +248,15 @@ void panfrost_mmu_reset(struct panfrost_device
*pfdev)
   mmu_write(pfdev, MMU_INT_MASK, ~0);
   }
   -static size_t get_pgsize(u64 addr, size_t size)
+static size_t get_pgsize(u64 addr, size_t size, size_t *count)
   {
-    if (addr & (SZ_2M - 1) || size < SZ_2M)
-    return SZ_4K;
+    size_t blk_offset = -addr % SZ_2M;
   +    if (blk_offset || size < SZ_2M) {
+    *count = min_not_zero(blk_offset, size) / SZ_4K;
+    return SZ_4K;
+    }
+    *count = size / SZ_2M;
   return SZ_2M;
   }
   @@ -287,12 +291,16 @@ static int mmu_map_sg(struct panfrost_device
*pfdev, struct panfrost_mmu *mmu,
   dev_dbg(pfdev->dev, "map: as=%d, iova=%llx, paddr=%lx,
len=%zx", mmu->as, iova, paddr, len);
     while (len) {
-    size_t pgsize = get_pgsize(iova | paddr, len);
+    size_t pgcount, mapped = 0;
+    size_t pgsize = get_pgsize(iova | paddr, len, &pgcount);
   -    ops->map(ops, iova, paddr, pgsize, prot, GFP_KERNEL);
-    iova += pgsize;
-    paddr += pgsize;
-    len -= pgsize;
+    ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot,
+   GFP_KERNEL, &mapped);
+    /* Don't get stuck if things have gone wrong */
+    mapped = max(mapped, pgsize);
+    iova += mapped;
+    paddr += mapped;
+    len -= mapped;
   }
   }
   @@ -344,15 +352,17 @@ void panfrost_mmu_unmap(struct
panfrost_gem_mapping *mapping)
   mapping->mmu->as, iova, len);
     while (unmapped_len < len) {
-    size_t unmapped_page;
-    size_t pgsize = get_pgsize(iova, len - unmapped_len);
+    size_t unmapped_page, pgcount;
+    size_t pgsize = get_pgsize(iova, len - unmapped_len, &pgcount);
   -    if (ops->iova_to_phys(ops, iova)) {
-    unmapped_page = ops->unmap(ops, iova, pgsize, NULL);
-    WARN_ON(unmapped_page != pgsize);
+    if (bo->is_heap)
+    pgcount = 1;
+    if (!bo->is_heap || ops->iova_to_phys(ops, iova)) {
+    unmapped_page = ops->unmap_pages(ops, iova, pgsize,
pgcount, NULL);
+    WARN_ON(unmapped_page != pgsize * pgcount);


This patch causes this WARN_ON to trigger. It doesn't happen all the
time, I see that the whole unmapped area is mapped. Initially, I thought
that this happens because it tries to unmap a partially mapped range,
but I checked that ops->iova_to_phys() returns address for all 4k chunks.

For example the pgsize * pgcount = 0x800, while returned
unmapped_page = 0x600.

I don't see this problem with this patch reverted. This is using today's
linux-next. Any ideas?


What's the base IOVA in such a case? I'm wondering if the truncated size
lines up to any interesting boundary. Presumably you're not seeing any
additional warnings from io-pgtable itself?


No warnings from io-pgtable. It succeeds for 0x3200 and fails for
0x3a00 using same size 0x800. It actually fails only for the
0x3a00 as far as I see from my logs. Perhaps it indeed has
something to do with the boundary.


Bleh, indeed even though we don't use 1GB block mappings, we still need 
to split at 1GB boundaries to match what the IOMMU API will do, and thus 
what io-pgtable expects. I guess I hadn't really considered that we 
might ever have that much graphics memory in play at once...


The fix probably looks like this:

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c

index e246d914e7f6..6abc7d3726dd 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -256,7 +256,9 @@ static size_t get_pgsize(u64 addr, size_t size, 
size_t *count)

*count = min_not_zero(blk_offset, size) / SZ_4K;
return SZ_4K;
}
-   *count = size / SZ_2M;
+
+   blk_offset = -addr % SZ_1G;
+   *count = min_not_zero(blk_offset, size) / SZ_2M;
return SZ_2M;
 }


Thanks,
Robin.
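
To make the boundary effect concrete, here is a small userspace sketch
comparing the page count before and after the proposed change. The
addresses in the usage note are hypothetical and chosen only to
illustrate the shape of the report above (a request larger than the
room left before a 1GB boundary); min_not_zero_u64() and the two
count_2m_*() helpers are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M 0x200000ULL
#define SZ_1G 0x40000000ULL

/* Userspace stand-in for the kernel's min_not_zero(). */
static uint64_t min_not_zero_u64(uint64_t x, uint64_t y)
{
    if (x == 0)
        return y;
    if (y == 0)
        return x;
    return x < y ? x : y;
}

/* Before the fix: ask for every remaining 2MB block in one request. */
static uint64_t count_2m_unbounded(uint64_t addr, uint64_t size)
{
    (void)addr;
    return size / SZ_2M;
}

/* With the proposed change: stop at the next 1GB boundary. */
static uint64_t count_2m_bounded(uint64_t addr, uint64_t size)
{
    uint64_t blk_offset = -addr % SZ_1G;

    /* Only meaningful on the 2MB path of get_pgsize() */
    assert(addr % SZ_2M == 0 && size >= SZ_2M);
    return min_not_zero_u64(blk_offset, size) / SZ_2M;
}
```

For a hypothetical 8MB request starting 6MB below a 1GB boundary (say
addr 0x3fa00000, size 0x800000), the unbounded version asks for 4
blocks while the bounded one asks for 3, so the amount requested matches
what a single table can actually cover.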


Re: [PATCH] drm/panfrost: Update io-pgtable API

2022-11-04 Thread Robin Murphy

On 2022-11-04 20:11, Dmitry Osipenko wrote:

On 8/23/22 01:01, Robin Murphy wrote:

Convert to io-pgtable's bulk {map,unmap}_pages() APIs, to help the old
single-page interfaces eventually go away. Unmapping heap BOs still
wants to be done a page at a time, but everything else can get the full
benefit of the more efficient interface.

Signed-off-by: Robin Murphy 
---
  drivers/gpu/drm/panfrost/panfrost_mmu.c | 40 +++--
  1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index b285a8001b1d..e246d914e7f6 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -248,11 +248,15 @@ void panfrost_mmu_reset(struct panfrost_device *pfdev)
mmu_write(pfdev, MMU_INT_MASK, ~0);
  }
  
-static size_t get_pgsize(u64 addr, size_t size)

+static size_t get_pgsize(u64 addr, size_t size, size_t *count)
  {
-   if (addr & (SZ_2M - 1) || size < SZ_2M)
-   return SZ_4K;
+   size_t blk_offset = -addr % SZ_2M;
  
+	if (blk_offset || size < SZ_2M) {

+   *count = min_not_zero(blk_offset, size) / SZ_4K;
+   return SZ_4K;
+   }
+   *count = size / SZ_2M;
return SZ_2M;
  }
  
@@ -287,12 +291,16 @@ static int mmu_map_sg(struct panfrost_device *pfdev, struct panfrost_mmu *mmu,

dev_dbg(pfdev->dev, "map: as=%d, iova=%llx, paddr=%lx, len=%zx", 
mmu->as, iova, paddr, len);
  
  		while (len) {

-   size_t pgsize = get_pgsize(iova | paddr, len);
+   size_t pgcount, mapped = 0;
+   size_t pgsize = get_pgsize(iova | paddr, len, &pgcount);
  
-			ops->map(ops, iova, paddr, pgsize, prot, GFP_KERNEL);

-   iova += pgsize;
-   paddr += pgsize;
-   len -= pgsize;
+   ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot,
+  GFP_KERNEL, &mapped);
+   /* Don't get stuck if things have gone wrong */
+   mapped = max(mapped, pgsize);
+   iova += mapped;
+   paddr += mapped;
+   len -= mapped;
}
}
  
@@ -344,15 +352,17 @@ void panfrost_mmu_unmap(struct panfrost_gem_mapping *mapping)

mapping->mmu->as, iova, len);
  
  	while (unmapped_len < len) {

-   size_t unmapped_page;
-   size_t pgsize = get_pgsize(iova, len - unmapped_len);
+   size_t unmapped_page, pgcount;
+   size_t pgsize = get_pgsize(iova, len - unmapped_len, &pgcount);
  
-		if (ops->iova_to_phys(ops, iova)) {

-   unmapped_page = ops->unmap(ops, iova, pgsize, NULL);
-   WARN_ON(unmapped_page != pgsize);
+   if (bo->is_heap)
+   pgcount = 1;
+   if (!bo->is_heap || ops->iova_to_phys(ops, iova)) {
+   unmapped_page = ops->unmap_pages(ops, iova, pgsize, 
pgcount, NULL);
+   WARN_ON(unmapped_page != pgsize * pgcount);


This patch causes this WARN_ON to trigger. It doesn't happen all the
time, I see that the whole unmapped area is mapped. Initially, I thought
that this happens because it tries to unmap a partially mapped range,
but I checked that ops->iova_to_phys() returns address for all 4k chunks.

For example the pgsize * pgcount = 0x800, while returned
unmapped_page = 0x600.

I don't see this problem with this patch reverted. This is using today's
linux-next. Any ideas?


What's the base IOVA in such a case? I'm wondering if the truncated size 
lines up to any interesting boundary. Presumably you're not seeing any 
additional warnings from io-pgtable itself?


Thanks,
Robin.


[PATCH v2] gpu: host1x: Avoid trying to use GART on Tegra20

2022-10-20 Thread Robin Murphy
Since commit c7e3ca515e78 ("iommu/tegra: gart: Do not register with
bus") quite some time ago, the GART driver has effectively disabled
itself to avoid issues with the GPU driver expecting it to work in ways
that it doesn't. As of commit 57365a04c921 ("iommu: Move bus setup to
IOMMU device registration") that bodge no longer works, but really the
GPU driver should be responsible for its own behaviour anyway. Make the
workaround explicit.

Reported-by: Jon Hunter 
Suggested-by: Dmitry Osipenko 
Signed-off-by: Robin Murphy 
---

v2: Cover DRM instance too, move into *_wants_iommu() for consistency

 drivers/gpu/drm/tegra/drm.c | 4 
 drivers/gpu/host1x/dev.c| 4 
 2 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 6748ec1e0005..a1f909dac89a 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1093,6 +1093,10 @@ static bool host1x_drm_wants_iommu(struct host1x_device 
*dev)
struct host1x *host1x = dev_get_drvdata(dev->dev.parent);
struct iommu_domain *domain;
 
+   /* Our IOMMU usage policy doesn't currently play well with GART */
+   if (of_machine_is_compatible("nvidia,tegra20"))
+   return false;
+
/*
 * If the Tegra DRM clients are backed by an IOMMU, push buffers are
 * likely to be allocated beyond the 32-bit boundary if sufficient
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 0cd3f97e7e49..f60ea24db0ec 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -292,6 +292,10 @@ static void host1x_setup_virtualization_tables(struct 
host1x *host)
 
 static bool host1x_wants_iommu(struct host1x *host1x)
 {
+   /* Our IOMMU usage policy doesn't currently play well with GART */
+   if (of_machine_is_compatible("nvidia,tegra20"))
+   return false;
+
/*
 * If we support addressing a maximum of 32 bits of physical memory
 * and if the host1x firewall is enabled, there's no need to enable
-- 
2.36.1.dirty



Re: [PATCH] gpu: host1x: Avoid trying to use GART on Tegra20

2022-10-20 Thread Robin Murphy

On 2022-10-20 13:25, Jon Hunter wrote:

Hi Robin,

On 19/10/2022 18:23, Robin Murphy wrote:

Since commit c7e3ca515e78 ("iommu/tegra: gart: Do not register with
bus") quite some time ago, the GART driver has effectively disabled
itself to avoid issues with the GPU driver expecting it to work in ways
that it doesn't. As of commit 57365a04c921 ("iommu: Move bus setup to
IOMMU device registration") that bodge no longer works, but really the
GPU driver should be responsible for its own behaviour anyway. Make the
workaround explicit.

Reported-by: Jon Hunter 
Suggested-by: Dmitry Osipenko 
Signed-off-by: Robin Murphy 
---
  drivers/gpu/host1x/dev.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index a13fd9441edc..1cae8eea92cf 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -352,6 +352,10 @@ static struct iommu_domain 
*host1x_iommu_attach(struct host1x *host)

  if (!host1x_wants_iommu(host) || domain)
  return domain;
+    /* Our IOMMU usage policy doesn't currently play well with GART */
+    if (of_machine_is_compatible("nvidia,tegra20"))
+    return NULL;
+
  host->group = iommu_group_get(host->dev);
  if (host->group) {
  struct iommu_domain_geometry *geometry;



Thanks for sending. I gave this a quick test, but I still see ...

[    2.901739] tegra-gr2d 5414.gr2d: failed to attach to domain: -19
[    2.908373] drm drm: failed to initialize 5414.gr2d: -19


Urgh, of course it's the same-but-different logic in host1x_drm_probe() 
that matters for that one. Am I allowed to mention how much these 
drivers drive me to despair?


v2 coming soon...

Thanks,
Robin.


[PATCH] gpu: host1x: Avoid trying to use GART on Tegra20

2022-10-19 Thread Robin Murphy
Since commit c7e3ca515e78 ("iommu/tegra: gart: Do not register with
bus") quite some time ago, the GART driver has effectively disabled
itself to avoid issues with the GPU driver expecting it to work in ways
that it doesn't. As of commit 57365a04c921 ("iommu: Move bus setup to
IOMMU device registration") that bodge no longer works, but really the
GPU driver should be responsible for its own behaviour anyway. Make the
workaround explicit.

Reported-by: Jon Hunter 
Suggested-by: Dmitry Osipenko 
Signed-off-by: Robin Murphy 
---
 drivers/gpu/host1x/dev.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index a13fd9441edc..1cae8eea92cf 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -352,6 +352,10 @@ static struct iommu_domain *host1x_iommu_attach(struct 
host1x *host)
if (!host1x_wants_iommu(host) || domain)
return domain;
 
+   /* Our IOMMU usage policy doesn't currently play well with GART */
+   if (of_machine_is_compatible("nvidia,tegra20"))
+   return NULL;
+
host->group = iommu_group_get(host->dev);
if (host->group) {
struct iommu_domain_geometry *geometry;
-- 
2.36.1.dirty



Re: [PATCH v2 0/2] drm/rockchip: dw_hdmi: Add 4k@30 support

2022-10-05 Thread Robin Murphy

On 2022-10-05 12:10, Sascha Hauer wrote:

On Wed, Oct 05, 2022 at 12:51:57PM +0200, Dan Johansen wrote:


Den 05.10.2022 kl. 12.06 skrev Sascha Hauer:

On Wed, Sep 28, 2022 at 10:39:27AM +0200, Dan Johansen wrote:

Den 28.09.2022 kl. 10.37 skrev Sascha Hauer:

On Tue, Sep 27, 2022 at 07:53:54PM +0200, Dan Johansen wrote:

Den 26.09.2022 kl. 12.30 skrev Michael Riesch:

Hi Sascha,

On 9/26/22 10:04, Sascha Hauer wrote:

This series adds support for 4k@30 to the rockchip HDMI controller. This
has been tested on a rk3568 rock3a board. It should be possible to add
4k@60 support the same way, but it doesn't work for me, so let's add
4k@30 as a first step.

 Sascha

Changes since v1:
- Allow non standard clock rates only on Synopsys phy as suggested by
  Robin Murphy

Sascha Hauer (2):
  drm/rockchip: dw_hdmi: relax mode_valid hook
  drm/rockchip: dw_hdmi: Add support for 4k@30 resolution

 drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 34 -
 1 file changed, 27 insertions(+), 7 deletions(-)

Thanks for the v2! On a RK3568 EVB1 with a HP 27f 4k monitor

Tested-by: Michael Riesch 

Sadly this still doesn't give me any display output on my 2k monitor. Not even
the 1080p picture that the current implementation produces.

By "like the old current implementation" you mean that this patchset
introduces a regression for you?

Yes. What's currently in the kernel at least shows 1080p on my 2K monitor,
while this patchset turns off the screen.

Which SoC are you testing this on? I assume RK3568, right? Which patch
introduces that regression, the first or the second one?

I tested on the Odroid M, which is rk3568.
I have only applied them both, as I was under the impression that both are
needed for the 4k support.


Yes, both are needed, but I am interested in which one introduces the
regression as I can't reproduce it.


One thing that might be worthwhile is to compare what "drm.debug=4" 
output says about the chosen mode and its clock rate vs. what 
/sys/kernel/debug/clk/clk_summary says about how things ended up in 
practice, to see whether it's a case of the clock not being able to get 
close enough to the correct rate at all.


Robin.


Re: [PATCH 1/2] drm/rockchip: dw_hdmi: relax mode_valid hook

2022-09-22 Thread Robin Murphy

On 25/08/2022 12:40 pm, Sascha Hauer wrote:

On Wed, Aug 24, 2022 at 05:07:50PM +0100, Robin Murphy wrote:

On 2022-08-22 16:20, Sascha Hauer wrote:

The driver checks if the pixel clock of the given mode matches an entry
in the mpll config table. The frequencies in the mpll table are meant as
a frequency range up to which the entry works, not as a frequency that
must match the pixel clock. Return MODE_OK when the pixelclock is
smaller than one of the mpll frequencies to allow for more display
resolutions.


Has the issue been fixed that this table is also used to validate modes on
RK3328, which doesn't even *have* the Synopsys phy? Last time I looked, that
tended to lead to complete display breakage when the proper phy driver later
decides it doesn't like a pixel clock that mode_valid already said was OK.

The more general concern is that these known-good clock rates are good, but
others may not be even when nominally supported, which I suspect is the
dirty secret of why it was implemented this way to begin with. I would
really really love this patch so my RK3399 board can drive my 1920x1200
monitor at native resolution, but on the other hand my RK3288 box generates
such a crap 154MHz clock for that mode that - unless that's been improved in
the meantime too - patch #2 might almost be considered a regression if it
means such a setup would start defaulting to an unusably glitchy display
instead of falling back to 1920x1080 which does at least work perfectly
(even if the slightly squished aspect ratio is ugly).


I could limit the change to rk3568 only. Would that be an option?
Not sure if I should add rk3399 as well then as this would work, at least in
your setup.


I think for now it might be enough to force an exact match if 
hdmi->plat_data.phy_force_vendor is set, with a big fat comment that 
it's to preserve the previous behaviour until vendor phy support can be 
sorted out properly. Beyond that, given that RK3288 and RK3399 do 
nominally support 4K as well, I don't think we actually have to leave 
them out, I just wanted to flag up that untested non-standard clock 
rates are a known source of potential issues once we open the door to them.


Cheers,
Robin.
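
The suggestion above can be sketched as a small userspace function. This
is only an illustration under stated assumptions: the table contents are
hypothetical, mode_clock_ok() is an invented name, and treating each
entry as an inclusive upper bound is a guess at the intended range
semantics rather than the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the mpll config table: each entry is the
 * highest pixel clock (in Hz) that the corresponding config covers.
 */
static const unsigned long example_mpll_rates[] = {
    74250000UL, 148500000UL, 297000000UL,
};

/*
 * With a vendor phy forced, keep the old behaviour and accept only
 * exact table matches; otherwise accept any clock that falls within
 * the range of some entry.
 */
static bool mode_clock_ok(unsigned long pclk, const unsigned long *tbl,
                          size_t n, bool phy_force_vendor)
{
    size_t i;

    assert(tbl != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (phy_force_vendor ? pclk == tbl[i] : pclk <= tbl[i])
            return true;
    }
    return false;
}
```

Under these assumptions a 154MHz mode is rejected when the vendor phy is
forced, since it matches no entry exactly, and accepted otherwise.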


Re: [PATCH 2/2] drm/panfrost: replace endian-specific types with generic ones

2022-09-21 Thread Robin Murphy

On 2022-09-21 09:48, Steven Price wrote:

On 20/09/2022 23:13, Alyssa Rosenzweig wrote:

Tentative r-b, but we *do* need to make a decision on how we want to
handle endianness. I don't have strong feelings but the results of that
discussion should go in the commit message.


Linux currently treats the dump objects specially - the headers are
little endian. All the other (Panfrost) DRM structures are native endian
(although I doubt anyone has tested it so I expect bugs).


If there can be *any* native-endian data included in the dump, then the 
original endianness needs to be recorded to be able to analyse it 
correctly anyway. The dumping code can't know the granularity at which 
arbitrary BOs may or may not need to be byteswapped to make everything 
consistently LE.



I've no
particularly strong views on this, but since the dump objects are likely
to be saved to disk and transferred between computers it makes sense to
fix the endianness for those. The __le types currently mean sparse can
warn if we screw up in the kernel, so it would be a shame to lose that
type checking.

Another option would be to extend the list of typedefs in
include/uapi/drm/drm.h to include the __le types. We'd need wider buy-in
for that change though.

Finally etnaviv 'solves' the issue by not including the dump structures
in the UABI header...

Or of course we could just actually use native endian and detect from
the magic which endian is in use. That would require ripping out the
cpu_to_lexx() calls in Linux and making the user space tool more
intelligent. I'm happy with that, but it's pushing the complexity onto Mesa.


If there's a clearly identifiable header, then I'd say making the whole 
dump native-endian is probably the way to go. Unless and until anyone 
actually demands to be able to do cross-endian post-mortem GPU 
debugging, the realistic extent of the complexity in Mesa is that it 
doesn't recognise the foreign dump format and gives up, which I assume 
is already implemented :)


Cheers,
Robin.
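
Detecting the byte order from the header magic, as discussed above,
could look roughly like the following on the consumer side. This is a
sketch only: DUMP_MAGIC is an assumed value used for illustration and
should be checked against the real uAPI header, and
detect_dump_endian() is a hypothetical helper, not something Mesa or
the kernel provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed magic value, for illustration only. */
#define DUMP_MAGIC 0x464E4150u

enum dump_endian { DUMP_LITTLE, DUMP_BIG, DUMP_UNKNOWN };

/* Read 4 bytes as a little-endian 32-bit value, independent of host. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read 4 bytes as a big-endian 32-bit value, independent of host. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decide which byte order the dump's producer used for its header. */
static enum dump_endian detect_dump_endian(const uint8_t *hdr)
{
    assert(hdr != NULL);
    if (load_le32(hdr) == DUMP_MAGIC)
        return DUMP_LITTLE;
    if (load_be32(hdr) == DUMP_MAGIC)
        return DUMP_BIG;
    return DUMP_UNKNOWN;    /* not a dump we recognise: give up */
}
```

A tool that only handles its own byte order would simply refuse any dump
for which the result does not match the host.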



Steve


On Tue, Sep 20, 2022 at 10:15:45PM +0100, Adrián Larumbe wrote:

__le32 and __le64 endian-specific types aren't portable and not available on
FreeBSD, for which there's a uAPI compatible reimplementation of Panfrost.

Replace these specific types with more generic unsigned ones, to prevent
FreeBSD Mesa build errors.

Bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7252
Fixes: 730c2bf4ad39 ("drm/panfrost: Add support for devcoredump")
Signed-off-by: Adrián Larumbe 
---
  include/uapi/drm/panfrost_drm.h | 30 +++---
  1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index bd77254be121..c1a10a9366a9 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -236,24 +236,24 @@ struct drm_panfrost_madvise {
  #define PANFROSTDUMP_BUF_TRAILER (PANFROSTDUMP_BUF_BO + 1)
  
  struct panfrost_dump_object_header {

-   __le32 magic;
-   __le32 type;
-   __le32 file_size;
-   __le32 file_offset;
+   __u32 magic;
+   __u32 type;
+   __u32 file_size;
+   __u32 file_offset;
  
  	union {

struct {
-   __le64 jc;
-   __le32 gpu_id;
-   __le32 major;
-   __le32 minor;
-   __le64 nbos;
+   __u64 jc;
+   __u32 gpu_id;
+   __u32 major;
+   __u32 minor;
+   __u64 nbos;
} reghdr;
  
  		struct {

-   __le32 valid;
-   __le64 iova;
-   __le32 data[2];
+   __u32 valid;
+   __u64 iova;
+   __u32 data[2];
} bomap;
  
  		/*

@@ -261,14 +261,14 @@ struct panfrost_dump_object_header {
 * with new fields and also keep it 512-byte aligned
 */
  
-		__le32 sizer[496];

+   __u32 sizer[496];
};
  };
  
  /* Registers object, an array of these */

  struct panfrost_dump_registers {
-   __le32 reg;
-   __le32 value;
+   __u32 reg;
+   __u32 value;
  };
  
  #if defined(__cplusplus)

--
2.37.0





Re: [BUG] ls1046a: eDMA does not transfer data from I2C

2022-09-20 Thread Robin Murphy

On 2022-09-19 23:24, Sean Anderson wrote:

Hi all,

I discovered a bug in either imx_i2c or fsl-edma on the LS1046A where no
data is read in i2c_imx_dma_read except for the last two bytes (which
are not read using DMA). This is perhaps best illustrated with the
following example:

# hexdump -C /sys/bus/nvmem/devices/0-00540/nvmem
[  308.914884] i2c i2c-0: 00080938 0x00088938 
0xf5401000 75401000
[  308.923529] src= 2180004 dst=f5401000 attr=   0 soff=   0 nbytes=1 slast=
   0
[  308.923529] citer=  7e biter=  7e doff=   1 dlast_sga=   0
[  308.923529] major_int=1 disable_req=1 enable_sg=0
[  308.942113] fsl-edma 2c0.edma: vchan 1b4371fc: txd 
d9dd26c5[4]: submitted
[  308.974049] fsl-edma 2c0.edma: txd d9dd26c5[4]: marked complete
[  308.981339] i2c i2c-0: 00080938 = [2e 2e 2f 2e 2e 2f 2e 2e 2f 64 65 
76 69 63 65 73 2f 70 6c 61 74 66 6f 72 6d 2f 73 6f 63 2f 32 31 38 30 30 30 30 
2e 69 32 63 2f 69 32 63 2d 30 2f 30 2d 30 30 35 34 2f 30 2d 30 30 35 34 30 00 
00]
[  309.002226] i2c i2c-0: 75401000 = [2e 2e 2f 2e 2e 2f 2e 2e 2f 64 65 
76 69 63 65 73 2f 70 6c 61 74 66 6f 72 6d 2f 73 6f 63 2f 32 31 38 30 30 30 30 
2e 69 32 63 2f 69 32 63 2d 30 2f 30 2d 30 30 35 34 2f 30 2d 30 30 35 34 30 00 
00]
[  309.024649] i2c i2c-0: 000809380080 0x000889380080 
0xf5401800 75401800
[  309.033270] src= 2180004 dst=f5401800 attr=   0 soff=   0 nbytes=1 slast=
   0
[  309.033270] citer=  7e biter=  7e doff=   1 dlast_sga=   0
[  309.033270] major_int=1 disable_req=1 enable_sg=0
[  309.051633] fsl-edma 2c0.edma: vchan 1b4371fc: txd 
d9dd26c5[5]: submitted
[  309.083526] fsl-edma 2c0.edma: txd d9dd26c5[5]: marked complete
[  309.090807] i2c i2c-0: 000809380080 = [00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00]
[  309.111694] i2c i2c-0: 75401800 = [00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00]
  2e 2e 2f 2e 2e 2f 2e 2e  2f 64 65 76 69 63 65 73  |../../../devices|
0010  2f 70 6c 61 74 66 6f 72  6d 2f 73 6f 63 2f 32 31  |/platform/soc/21|
0020  38 30 30 30 30 2e 69 32  63 2f 69 32 63 2d 30 2f  |8.i2c/i2c-0/|
0030  30 2d 30 30 35 34 2f 30  2d 30 30 35 34 30 00 00  |0-0054/0-00540..|
0040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
0070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 ff ff  ||
0080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
00f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 ff 5b  |...[|
0100

(patch with my debug prints appended below)

Despite the DMA completing successfully, no data was copied into the
buffer, leaving the original (now junk) contents. I probed the I2C bus
with an oscilloscope, and I verified that the transfer did indeed occur.
The timing between submission and completion seems reasonable for the
bus speed (50 kHz for whatever reason).

I had a look over the I2C driver, and nothing looked obviously
incorrect. If anyone has ideas on what to try, I'm more than willing.


Is the DMA controller cache-coherent? I see the mainline LS1046A DT 
doesn't have a "dma-coherent" property for it, but the behaviour is 
entirely consistent with that being wrong - dma_map_single() cleans the 
cache, coherent DMA write hits the still-present cache lines, 
dma_unmap_single() invalidates the cache, and boom, the data is gone and 
you read back the previous content of the buffer that was cleaned out to 
DRAM beforehand.


Robin.
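
The sequence described above can be walked through with a toy model of
a single write-back cache line in front of DRAM. This is a deliberately
simplified sketch, assuming the device's coherent write lands in the
cache whenever the line is present; every name in it (struct toy_mem,
dma_read_roundtrip() and the helpers) is hypothetical and nothing here
is kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One byte of DRAM with one cache line in front of it. */
struct toy_mem {
    uint8_t dram;
    uint8_t line;
    bool valid;
    bool dirty;
};

static void cpu_write(struct toy_mem *m, uint8_t v)
{
    m->line = v;
    m->valid = true;
    m->dirty = true;
}

static uint8_t cpu_read(struct toy_mem *m)
{
    if (!m->valid) {            /* miss: refill from DRAM */
        m->line = m->dram;
        m->valid = true;
        m->dirty = false;
    }
    return m->line;
}

/* Write dirty data back to DRAM; the line stays present. */
static void cache_clean(struct toy_mem *m)
{
    if (m->valid && m->dirty) {
        m->dram = m->line;
        m->dirty = false;
    }
}

/* Drop the line without writing it back. */
static void cache_invalidate(struct toy_mem *m)
{
    m->valid = false;
    m->dirty = false;
}

/* A coherent device write hits the cache line if it is present. */
static void coherent_dev_write(struct toy_mem *m, uint8_t v)
{
    if (m->valid) {
        m->line = v;
        m->dirty = true;
    } else {
        m->dram = v;
    }
}

/*
 * Model a device-to-memory transfer into a buffer that the CPU last
 * wrote with 'old'. If the OS wrongly treats the coherent device as
 * non-coherent, it cleans on map and invalidates on unmap.
 */
static uint8_t dma_read_roundtrip(bool os_treats_as_coherent,
                                  uint8_t old, uint8_t incoming)
{
    struct toy_mem m = { 0 };

    cpu_write(&m, old);
    if (!os_treats_as_coherent)
        cache_clean(&m);            /* map: old data goes to DRAM */
    coherent_dev_write(&m, incoming);   /* lands in the cache line */
    if (!os_treats_as_coherent)
        cache_invalidate(&m);       /* unmap: new data thrown away */
    return cpu_read(&m);
}
```

In this model the mismatched case returns the old buffer contents and
the matched case returns the transferred byte, which is consistent with
the symptom in the report.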


--Sean

diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
index 15896e2413c4..1d9d4a55d2af 100644
--- a/drivers/dma/fsl-edma-common.c
+++ b/drivers/dma/fsl-edma-common.c
@@ -391,6 +391,12 @@ void fsl_edma_fill_tcd(struct fsl_edma_hw_tcd *tcd, u32 
src, u32 dst,
  {
 u16 csr = 0;
  
+   pr_info("src=%8x dst=%8x attr=%4x soff=%4x nbytes=%u slast=%8x\n"

+   "citer=%4x biter=%4x doff=%4x dlast_sga=%8x\n"
+   "major_int=%d disable_req=%d enable_sg=%d\n",
+   src, dst, attr, soff, nbytes, slast, citer, biter, doff,
+   dlast_sga, major_int, disable_req, enable_sg);
+
 /*
  * eDMA hardware SGs require the TCDs to be stored in little
  * endian format irrespective of the register endian model.
diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index 3576b63a6c03..0217f0cb1331 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -402,6 +402,9 @@ static int i2c_imx_dma_xfer(struct imx_i2c_struct *i2c_imx,
 dev_err(dev, "DMA mapping failed\n");
 

Re: [PATCH v2 3/3] arm64: dts: rockchip: enable gamma control on RK3399

2022-09-15 Thread Robin Murphy

On 2022-09-15 17:53, Hugh Cole-Baker wrote:



On 15 Sep 2022, at 15:40, Robin Murphy  wrote:

On 2021-10-19 22:58, Hugh Cole-Baker wrote:

Define the memory region on RK3399 VOPs containing the gamma LUT at
base+0x2000.
Signed-off-by: Hugh Cole-Baker 
---
Changes from v1: no changes in this patch
  arch/arm64/boot/dts/rockchip/rk3399.dtsi | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi 
b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index 3871c7fd83b0..9cbf6ccdd256 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -1619,7 +1619,7 @@ i2s2: i2s@ff8a {
vopl: vop@ff8f {
compatible = "rockchip,rk3399-vop-lit";
-   reg = <0x0 0xff8f 0x0 0x3efc>;
+   reg = <0x0 0xff8f 0x0 0x2000>, <0x0 0xff8f2000 0x0 0x400>;
interrupts = ;
assigned-clocks = < ACLK_VOP1>, < HCLK_VOP1>;
assigned-clock-rates = <4>, <1>;
@@ -1676,7 +1676,7 @@ vopl_mmu: iommu@ff8f3f00 {
vopb: vop@ff90 {
compatible = "rockchip,rk3399-vop-big";
-   reg = <0x0 0xff90 0x0 0x3efc>;
+   reg = <0x0 0xff90 0x0 0x2000>, <0x0 0xff902000 0x0 0x1000>;


Doesn't the second range still need to be shorter than 0xf00 to avoid 
overlapping the IOMMU?

Robin.


This should be OK, the other registers are in the range ff90-ff902000, the
gamma LUT occupies the range ff902000-ff903000, and then the IOMMU registers
begin at ff903f00. I don't see any overlaps with the IOMMU unless I'm
misreading the dts.


Oh dear, you're quite right, apparently I can't add up in hex today. 
Sorry for the noise!


Robin.
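
The hex arithmetic in this exchange is easy to check mechanically. The
sketch below uses only the second (gamma LUT) ranges and the IOMMU base
addresses quoted in the patch and the reply; the IOMMU region length of
0x100 used in the usage note is an assumption for illustration, and
ranges_overlap() and room_before() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [a, a + a_len) and [b, b + b_len) share at least one byte. */
static bool ranges_overlap(uint64_t a, uint64_t a_len,
                           uint64_t b, uint64_t b_len)
{
    assert(a_len != 0 && b_len != 0);
    return a < b + b_len && b < a + a_len;
}

/* Bytes available between the start of a region and the next block. */
static uint64_t room_before(uint64_t start, uint64_t next)
{
    assert(next >= start);
    return next - start;
}
```

For vopb, room_before(0xff902000, 0xff903f00) is 0x1f00, so a 0x1000
byte gamma LUT region ends at 0xff903000 and leaves 0xf00 bytes to
spare before the IOMMU; the 0x400 byte vopl region starting at
0xff8f2000 is likewise clear of the IOMMU at 0xff8f3f00.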


Re: [PATCH v2 3/3] arm64: dts: rockchip: enable gamma control on RK3399

2022-09-15 Thread Robin Murphy

On 2021-10-19 22:58, Hugh Cole-Baker wrote:

Define the memory region on RK3399 VOPs containing the gamma LUT at
base+0x2000.

Signed-off-by: Hugh Cole-Baker 
---

Changes from v1: no changes in this patch

  arch/arm64/boot/dts/rockchip/rk3399.dtsi | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi 
b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index 3871c7fd83b0..9cbf6ccdd256 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -1619,7 +1619,7 @@ i2s2: i2s@ff8a {
  
  	vopl: vop@ff8f {

compatible = "rockchip,rk3399-vop-lit";
-   reg = <0x0 0xff8f 0x0 0x3efc>;
+   reg = <0x0 0xff8f 0x0 0x2000>, <0x0 0xff8f2000 0x0 0x400>;
interrupts = ;
assigned-clocks = < ACLK_VOP1>, < HCLK_VOP1>;
assigned-clock-rates = <4>, <1>;
@@ -1676,7 +1676,7 @@ vopl_mmu: iommu@ff8f3f00 {
  
  	vopb: vop@ff90 {

compatible = "rockchip,rk3399-vop-big";
-   reg = <0x0 0xff90 0x0 0x3efc>;
+   reg = <0x0 0xff90 0x0 0x2000>, <0x0 0xff902000 0x0 0x1000>;


Doesn't the second range still need to be shorter than 0xf00 to avoid 
overlapping the IOMMU?


Robin.


interrupts = ;
assigned-clocks = < ACLK_VOP0>, < HCLK_VOP0>;
assigned-clock-rates = <4>, <1>;


Re: [PATCH v2 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf

2022-09-07 Thread Robin Murphy

On 2022-09-07 16:23, Jason Gunthorpe wrote:

On Wed, Sep 07, 2022 at 07:29:58AM -0700, Christoph Hellwig wrote:

On Wed, Sep 07, 2022 at 09:33:11AM -0300, Jason Gunthorpe wrote:

Yes, you said that, and I said that when the AMD driver first merged
it - but it went in anyhow and now people are using it in a bunch of
places.


drm folks made up their own weird rules, if they internally stick
to it they have to listen to it given that they ignore review comments,
but it violates the scatterlist API and has no business anywhere
else in the kernel.  And yes, there probably is a reason or two why
the drm code is unusually error prone.


That may be, but it is creating problems if DRM gets to do X crazy
thing and nobody else can..

So, we have two issues here

  1) DMABUF abuses the scatter list, but this is very constrained: we have
 this weird special "DMABUF scatterlist" that is only touched by DMABUF
 importers. The importers signal that they understand the format with
 a flag. This is ugly and would be nice to clean up into a dma mapped
 address list of some sort.

 I spent a lot of time a few years ago removing driver touches of
 the SGL and preparing the RDMA stack to do this kind of change, at
 least.

  2) DMABUF abuses dma_map_resource() for P2P and thus doesn't work in
 certain special cases.


FWIW, dma_map_resource() *is* for P2P in general. The classic case of 
one device poking at another's registers that was the original 
motivation is a standalone DMA engine reading/writing a peripheral 
device's FIFO, so the very similar inter-device doorbell signal is 
absolutely in scope too; VRAM might be a slightly greyer area, but if 
it's still not page-backed kernel memory then I reckon that's fair game.


The only trouble is that it's not geared for *PCI* P2P when that may or 
may not happen entirely upstream of IOMMU translation.


Robin.


Re: [PATCH v2 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf

2022-09-07 Thread Robin Murphy

On 2022-09-07 13:33, Jason Gunthorpe wrote:

On Wed, Sep 07, 2022 at 05:05:57AM -0700, Christoph Hellwig wrote:

On Tue, Sep 06, 2022 at 08:48:28AM -0300, Jason Gunthorpe wrote:

Right, this whole thing is the "standard" that dmabuf has adopted
instead of the struct pages. Once the AMD GPU driver started doing
this some time ago other drivers followed.


But it is simply wrong.  The scatterlist requires struct page backing.
In theory a physical address would be enough, but when Dan Williams
sent patches for that Linus shot them down.


Yes, you said that, and I said that when the AMD driver first merged
it - but it went in anyhow and now people are using it in a bunch of
places.

I'm happy that Christian wants to start trying to fix it, and will
help him, but it doesn't really impact this. Whatever fix is cooked up
will apply equally to vfio and habana.


We've just added support for P2P segments in scatterlists, can that not 
be used here?


Robin.


That being said the scatterlist is the wrong interface here (and
probably for most of its uses).  We really want a low-level struct
with just the dma_address and length for the DMA side, and leave it
separate from what is used to generate it (in most cases that
would be a bio_vec).


Oh definitely


Now we have struct pages, almost, but I'm not sure if their limits are
compatible with VFIO? This has to work for small bars as well.


Why would small BARs be problematic for the pages?  The pages are more
a problem for gigantic BARs due to the memory overhead.


How do I get a struct page * for a 4k BAR in vfio?

The docs say:

  ..hotplug api on memory block boundaries. The implementation relies on
  this lack of user-api constraint to allow sub-section sized memory
  ranges to be specified to :c:func:`arch_add_memory`, the top-half of
  memory hotplug. Sub-section support allows for 2MB as the cross-arch
  common alignment granularity for :c:func:`devm_memremap_pages`.

Jason


Re: [PATCH] drm/amdgpu: fix repeated words in comments

2022-09-07 Thread Robin Murphy

On 2022-09-07 12:34, Jilin Yuan wrote:

Delete the redundant word 'and'.
Delete the redundant word 'in'.
Delete the redundant word 'the'.
Delete the redundant word 'are'.

Signed-off-by: Jilin Yuan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index afaa1056e039..71367b9dd590 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -958,7 +958,7 @@ static void amdgpu_device_vram_scratch_fini(struct 
amdgpu_device *adev)
   * @registers: pointer to the register array
   * @array_size: size of the register array
   *
- * Programs an array or registers with and and or masks.


Not redundant - the first "and" refers to the boolean operation, the 
second is a conjunction. This is clear from the code if you look at it. 
You could perhaps restyle the comment as "with AND and OR masks" to make 
that stand out a bit better, but either way, please try to actually 
understand the changes you're proposing.


Robin.


+ * Programs an array or registers with and or masks.
   * This is a helper for setting golden registers.
   */
  void amdgpu_device_program_register_sequence(struct amdgpu_device *adev,
@@ -1569,7 +1569,7 @@ static int amdgpu_device_check_arguments(struct 
amdgpu_device *adev)
   * @state: vga_switcheroo state
   *
   * Callback for the switcheroo driver.  Suspends or resumes the
- * the asics before or after it is powered up using ACPI methods.
+ * asics before or after it is powered up using ACPI methods.
   */
  static void amdgpu_switcheroo_set_state(struct pci_dev *pdev,
enum vga_switcheroo_state state)
@@ -3203,7 +3203,7 @@ static int amdgpu_device_ip_resume_phase2(struct 
amdgpu_device *adev)
   *
   * Main resume function for hardware IPs.  The hardware IPs
   * are split into two resume functions because they are
- * are also used in in recovering from a GPU reset and some additional
+ * also used in recovering from a GPU reset and some additional
   * steps need to be take between them.  In this case (S3/S4) they are
   * run sequentially.
   * Returns 0 on success, negative error code on failure.
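
To illustrate why "and and or masks" is not a repeated word, here is a minimal sketch of one common read-modify-write convention, assuming the AND mask selects the bits to clear and the OR mask supplies the bits to set. `demo_apply_masks()` is a hypothetical helper for illustration only and is not the amdgpu implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Apply an AND mask and an OR mask to a register value.
 * Assumed convention: bits set in and_mask are cleared first,
 * then bits set in or_mask are set.
 */
static uint32_t demo_apply_masks(uint32_t val, uint32_t and_mask, uint32_t or_mask)
{
	val &= ~and_mask;	/* clear the field selected by the AND mask */
	val |= or_mask;		/* then set the requested bits with the OR mask */
	return val;
}
```

Both masks are needed to describe one register update, which is why the comment names both operations.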


Re: [PATCH] drm/amdgpu: fix repeated words in comments

2022-09-07 Thread Robin Murphy

On 2022-09-07 12:26, Jilin Yuan wrote:

Delete the redundant word 'we'.


FWIW, to me it's not redundant because while indeed it is not correct, 
it looks exactly like the kind of typo I might make of "if we", and 
parsing it as *that* does make sense. The sentence you end up with here 
can hardly be considered an improvement since it is still ungrammatical 
nonsense.


Robin.


Signed-off-by: Jilin Yuan 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 02cb3a12dd76..6d6cc4637d41 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -694,7 +694,7 @@ static int amdgpu_vce_cs_reloc(struct amdgpu_cs_parser *p, 
struct amdgpu_ib *ib,
   * @allocated: allocated a new handle?
   *
   * Validates the handle and return the found session index or -EINVAL
- * we we don't have another free session index.
+ * we don't have another free session index.
   */
  static int amdgpu_vce_validate_handle(struct amdgpu_cs_parser *p,
  uint32_t handle, uint32_t *allocated)


Re: mesa-22.3.0-devel + linux-5.19.6 + mediapipe: panfrost js fault

2022-09-06 Thread Robin Murphy

On 2022-09-04 05:13, Chris Ruehl wrote:

Hi,

Something you might want a heads-up on:

I have a mediapipe application for POSE which uses the T860 GPU for the
calculation, but the kernel driver reports an error (js fault). I only see
one or two calculation frames on the mat-picture output before the pipe
stops working.

Linux bullseye64 5.19.6 #1 SMP PREEMPT Fri Sep 2 02:25:59 UTC 2022 
aarch64 GNU/Linux


[    5.164415] panfrost ff9a.gpu: clock rate = 5
[    5.169845] panfrost ff9a.gpu: [drm:panfrost_devfreq_init 
[panfrost]] Failed to register cooling device
[    5.169989] panfrost ff9a.gpu: mali-t860 id 0x860 major 0x2 minor 
0x0 status 0x0
[    5.16] panfrost ff9a.gpu: features: ,0407, 
issues: ,24040400
[    5.170008] panfrost ff9a.gpu: Features: L2:0x07120206 
Shader:0x Tiler:0x0809 Mem:0x1 MMU:0x2830 AS:0xff JS:0x7

[    5.170017] panfrost ff9a.gpu: shader_present=0xf l2_present=0x1
[    5.206827] [drm] Initialized panfrost 1.2.0 20180908 for 
ff9a.gpu on minor 1

...
[  162.862064] panfrost ff9a.gpu: js fault, js=1, 
status=DATA_INVALID_FAULT, head=0xaba7100, tail=0xaba7100
[  162.862269] panfrost ff9a.gpu: js fault, js=1, 
status=DATA_INVALID_FAULT, head=0xa1e0100, tail=0xa1e0100


I have a customized RK3399 board and compiled the Mesa drivers for it:

meson $1 . build/ \
-D dri-drivers= \
-D vulkan-drivers= \
-D gallium-drivers=panfrost,kmsro \
-D llvm=disabled \
-D libunwind=false \
-D platforms=x11,wayland

glmark2 runs flawlessly and scores 588.

The same code runs fine on x86_64 with an AMD GPU.

Any help fixing the bug is welcome.


This is almost certainly a userspace problem, so you're best off raising 
a Mesa issue with some more details - an apitrace of the failing 
application and/or PAN_MESA_DEBUG=trace output capturing the offending 
invalid descriptors would probably be a good starting point.


Robin.


Re: [PATCH] drm/amdgpu: cleanup coding style in amdgpu_drv.c

2022-09-06 Thread Robin Murphy

On 2022-09-04 20:15, Jingyu Wang wrote:
[...]

@@ -565,8 +566,8 @@ module_param_named(timeout_period, 
amdgpu_watchdog_timer.period, uint, 0644);
   */
  #ifdef CONFIG_DRM_AMDGPU_SI
  
-#if defined(CONFIG_DRM_RADEON) || defined(CONFIG_DRM_RADEON_MODULE)

-int amdgpu_si_support = 0;
+#if IS_ENABLED(CONFIG_DRM_RADEON) || defined(CONFIG_DRM_RADEON_MODULE)


Hint: read the checkpatch warning again more closely, and consider what 
IS_ENABLED() does and therefore why this is still not quite right.


Robin.


+int amdgpu_si_support;
  MODULE_PARM_DESC(si_support, "SI support (1 = enabled, 0 = disabled 
(default))");
  #else
  int amdgpu_si_support = 1;
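
For readers unsure what the hint above is getting at, the following is a minimal userspace sketch of the preprocessor trick behind IS_ENABLED(), written from memory of include/linux/kconfig.h with the identifiers renamed to a DEMO_ prefix, so treat the details as an assumption rather than the kernel's exact text. The CONFIG_DEMO_* options are made up for illustration.

```c
#include <assert.h>

/* A defined-as-1 option pastes to this placeholder, which injects an extra comma. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val

/* Evaluates to 1 if x is defined as 1, otherwise 0. */
#define DEMO_IS_DEFINED(x)	DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(v)	DEMO_IS_DEFINED__(DEMO_ARG_PLACEHOLDER_##v)
#define DEMO_IS_DEFINED__(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)

/* Preprocessor-level OR of two 0/1 values. */
#define DEMO_OR(x, y)		DEMO_OR_(x, y)
#define DEMO_OR_(x, y)		DEMO_OR__(DEMO_ARG_PLACEHOLDER_##x, y)
#define DEMO_OR__(arg1_or_junk, y) DEMO_TAKE_SECOND(arg1_or_junk 1, y, 0)

#define DEMO_IS_BUILTIN(option)	DEMO_IS_DEFINED(option)
#define DEMO_IS_MODULE(option)	DEMO_IS_DEFINED(option##_MODULE)
/* True for =y *or* =m, i.e. it already checks the _MODULE symbol. */
#define DEMO_IS_ENABLED(option)	DEMO_OR(DEMO_IS_BUILTIN(option), DEMO_IS_MODULE(option))

#define CONFIG_DEMO_BUILTIN_OPT 1		/* as if set to =y */
#define CONFIG_DEMO_MODULAR_OPT_MODULE 1	/* as if set to =m */
/* CONFIG_DEMO_DISABLED_OPT is intentionally left undefined */

static int demo_builtin_enabled(void)  { return DEMO_IS_ENABLED(CONFIG_DEMO_BUILTIN_OPT); }
static int demo_modular_enabled(void)  { return DEMO_IS_ENABLED(CONFIG_DEMO_MODULAR_OPT); }
static int demo_disabled_enabled(void) { return DEMO_IS_ENABLED(CONFIG_DEMO_DISABLED_OPT); }
```

Since the enabled check already covers the modular case, pairing it with a separate defined(..._MODULE) test adds nothing, which appears to be the point of the hint.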


[PATCH 3/3] iommu/dma: Make header private

2022-08-24 Thread Robin Murphy
Now that dma-iommu.h only contains internal interfaces, make it
private to the IOMMU subsystem.

Signed-off-by: Robin Murphy 
---
 drivers/acpi/viot.c  |  1 -
 drivers/gpu/drm/exynos/exynos_drm_dma.c  |  1 -
 drivers/iommu/amd/iommu.c|  2 +-
 drivers/iommu/apple-dart.c   |  3 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c  |  2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c|  2 +-
 drivers/iommu/dma-iommu.c|  3 ++-
 {include/linux => drivers/iommu}/dma-iommu.h | 17 +
 drivers/iommu/intel/iommu.c  |  2 +-
 drivers/iommu/iommu.c|  3 ++-
 drivers/iommu/virtio-iommu.c |  3 ++-
 11 files changed, 13 insertions(+), 26 deletions(-)
 rename {include/linux => drivers/iommu}/dma-iommu.h (67%)

diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
index 6132092dab2a..ed752cbbe636 100644
--- a/drivers/acpi/viot.c
+++ b/drivers/acpi/viot.c
@@ -19,7 +19,6 @@
 #define pr_fmt(fmt) "ACPI: VIOT: " fmt
 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_dma.c 
b/drivers/gpu/drm/exynos/exynos_drm_dma.c
index d819ee69dfb7..7012aa8ed4c6 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_dma.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_dma.c
@@ -4,7 +4,6 @@
 // Author: Inki Dae 
 // Author: Andrzej Hajda 
 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2df1bfa884e5..b339bf13259d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -18,7 +18,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -40,6 +39,7 @@
 #include 
 
 #include "amd_iommu.h"
+#include "../dma-iommu.h"
 #include "../irq_remapping.h"
 
 #define CMD_SET_TYPE(cmd, t) ((cmd)->data[1] |= ((t) << 28))
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index ed6b5fa538af..716f34a768b1 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -33,6 +32,8 @@
 #include 
 #include 
 
+#include "dma-iommu.h"
+
 #define DART_MAX_STREAMS 16
 #define DART_MAX_TTBR 4
 #define MAX_DARTS_PER_DEVICE 2
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c13b46a15dcb..f1785e518a90 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -29,6 +28,7 @@
 #include 
 
 #include "arm-smmu-v3.h"
+#include "../../dma-iommu.h"
 #include "../../iommu-sva-lib.h"
 
 static bool disable_bypass = true;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2cece34f4824..c30f82c19240 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -21,7 +21,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -40,6 +39,7 @@
 #include 
 
 #include "arm-smmu.h"
+#include "../../dma-iommu.h"
 
 /*
  * Apparently, some Qualcomm arm64 platforms which appear to expose their SMMU
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 6809b33ac9df..9297b741f5e8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -30,6 +29,8 @@
 #include 
 #include 
 
+#include "dma-iommu.h"
+
 struct iommu_dma_msi_page {
struct list_headlist;
dma_addr_t  iova;
diff --git a/include/linux/dma-iommu.h b/drivers/iommu/dma-iommu.h
similarity index 67%
rename from include/linux/dma-iommu.h
rename to drivers/iommu/dma-iommu.h
index e83de4f1f3d6..c6d0235feb6e 100644
--- a/include/linux/dma-iommu.h
+++ b/drivers/iommu/dma-iommu.h
@@ -5,15 +5,10 @@
 #ifndef __DMA_IOMMU_H
 #define __DMA_IOMMU_H
 
-#include 
-#include 
+#include 
 
 #ifdef CONFIG_IOMMU_DMA
-#include 
-#include 
-#include 
 
-/* Domain management interface for IOMMU drivers */
 int iommu_get_dma_cookie(struct iommu_domain *domain);
 void iommu_put_dma_cookie(struct iommu_domain *domain);
 
@@ -21,16 +16,10 @@ int iommu_dma_init_fq(struct iommu_domain *domain);
 
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
 
-void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
-   struct iommu_domain *domain);
-
 extern bool iommu_dma_forcedac;
 
 #else /* CONFIG_IOMMU_DMA */
 
-struct iommu_domain;
-struct device;
-
 static inline int iommu_dma_init_fq(struct iommu_domain *domain)
 {
return -EINVAL;
@@ -45,9 +34,5 @@ static inline void iommu_put_dma_cookie(str

Re: [PATCH 1/2] drm/rockchip: dw_hdmi: relax mode_valid hook

2022-08-24 Thread Robin Murphy

On 2022-08-22 16:20, Sascha Hauer wrote:

The driver checks if the pixel clock of the given mode matches an entry
in the mpll config table. The frequencies in the mpll table are meant as
a frequency range up to which the entry works, not as a frequency that
must match the pixel clock. Return MODE_OK when the pixelclock is
smaller than one of the mpll frequencies to allow for more display
resolutions.


Has the issue been fixed that this table is also used to validate modes 
on RK3328, which doesn't even *have* the Synopsys phy? Last time I 
looked, that tended to lead to complete display breakage when the proper 
phy driver later decides it doesn't like a pixel clock that mode_valid 
already said was OK.


The more general concern is that these known-good clock rates are good, 
but others may not be even when nominally supported, which I suspect is 
the dirty secret of why it was implemented this way to begin with. I 
would really really love this patch so my RK3399 board can drive my 
1920x1200 monitor at native resolution, but on the other hand my RK3288 
box generates such a crap 154MHz clock for that mode that - unless 
that's been improved in the meantime too - patch #2 might almost be 
considered a regression if it means such a setup would start defaulting 
to an unusably glitchy display instead of falling back to 1920x1080 
which does at least work perfectly (even if the slightly squished aspect 
ratio is ugly).


Thanks,
Robin.


Signed-off-by: Sascha Hauer 
---
  drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
index c14f888938688..b6b662dabedc6 100644
--- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
@@ -251,7 +251,7 @@ dw_hdmi_rockchip_mode_valid(struct dw_hdmi *hdmi, void 
*data,
int i;
  
  	for (i = 0; mpll_cfg[i].mpixelclock != (~0UL); i++) {

-   if (pclk == mpll_cfg[i].mpixelclock) {
+   if (pclk <= mpll_cfg[i].mpixelclock) {
valid = true;
break;
}
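
The behavioural difference between the old and new comparison can be sketched in userspace as follows. This is only an illustration: the table entries are hypothetical upper bounds in Hz, not the driver's real mpll_cfg values, and the `demo_` names are invented here.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_mpll_cfg {
	unsigned long mpixelclock;	/* upper bound this entry is good for */
};

/* Hypothetical bounds in Hz, terminated by a ~0UL sentinel as in the driver loop. */
static const struct demo_mpll_cfg demo_mpll_cfg[] = {
	{ 40000000UL }, { 65000000UL }, { 148500000UL }, { 297000000UL }, { ~0UL },
};

/* Old behaviour: the pixel clock must match a table entry exactly. */
static bool demo_mode_valid_exact(unsigned long pclk)
{
	for (int i = 0; demo_mpll_cfg[i].mpixelclock != ~0UL; i++)
		if (pclk == demo_mpll_cfg[i].mpixelclock)
			return true;
	return false;
}

/* New behaviour: any pixel clock at or below some entry is accepted. */
static bool demo_mode_valid_range(unsigned long pclk)
{
	for (int i = 0; demo_mpll_cfg[i].mpixelclock != ~0UL; i++)
		if (pclk <= demo_mpll_cfg[i].mpixelclock)
			return true;
	return false;
}
```

With these made-up entries, a 154MHz mode is rejected by the exact match but accepted by the range match. As the reply above points out, passing this check says nothing about whether the clock tree can actually generate a clean clock at that rate.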


Re: [PATCH] drm/exynos: fix repeated words in comments

2022-08-23 Thread Robin Murphy

On 2022-08-23 13:21, Jilin Yuan wrote:

  Delete the redundant word 'next'.


From the context, I'm not sure it is redundant - as far as I can tell 
this comment seems to be describing a sequence of 3 commands, where 
"current" is the first, "next" is the second, and "next next" implies 
the third. The whole comment could certainly be reworded more clearly, 
but as it stands I suspect a replacement like s/next next/next+1/ is 
more likely to be correct.


Robin.


Signed-off-by: Jilin Yuan 
---
  drivers/gpu/drm/exynos/exynos_drm_g2d.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 471fd6c8135f..4f9edca66632 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -1195,7 +1195,7 @@ int exynos_g2d_set_cmdlist_ioctl(struct drm_device 
*drm_dev, void *data,
 * If don't clear SFR registers, the cmdlist is affected by register
 * values of previous cmdlist. G2D hw executes SFR clear command and
 * a next command at the same time then the next command is ignored and
-* is executed rightly from next next command, so needs a dummy command
+* is executed rightly from next command, so needs a dummy command
 * to next command of SFR clear command.
 */
cmdlist->data[cmdlist->last++] = G2D_SOFT_RESET;


Re: [PATCH] drm/panfrost: Update io-pgtable API

2022-08-23 Thread Robin Murphy

On 2022-08-23 03:51, Alyssa Rosenzweig wrote:

-static size_t get_pgsize(u64 addr, size_t size)
+static size_t get_pgsize(u64 addr, size_t size, size_t *count)
  {
-   if (addr & (SZ_2M - 1) || size < SZ_2M)
-   return SZ_4K;
+   size_t blk_offset = -addr % SZ_2M;


addr is unsigned. if this is correct, it's magic.


Eh, it's just well-defined unsigned integer overflow. Take "SZ_2M - 
(addr % SZ_2M)", realise the first term can be anything that's zero 
modulo SZ_2M, including zero, then also that the operations can be done 
in either order to give the same result, and there you go.


Cheers,
Robin.
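
The identity described above can be checked in userspace. A minimal sketch, assuming 64-bit unsigned arithmetic as with the kernel's u64; `DEMO_SZ_2M` and the two helper names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SZ_2M ((uint64_t)0x200000)

/*
 * Bytes from addr up to the next 2MiB boundary, or 0 if already aligned.
 * Unsigned negation wraps modulo 2^64, which is itself a multiple of 2MiB.
 */
static uint64_t demo_blk_offset_negate(uint64_t addr)
{
	return -addr % DEMO_SZ_2M;
}

/* The same quantity written out longhand. */
static uint64_t demo_blk_offset_longhand(uint64_t addr)
{
	return (DEMO_SZ_2M - addr % DEMO_SZ_2M) % DEMO_SZ_2M;
}
```

The two forms agree for every input because subtracting from zero (mod 2^64) and subtracting from SZ_2M differ only by a multiple of SZ_2M, which the final modulo discards.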


[PATCH] drm/panfrost: Update io-pgtable API

2022-08-22 Thread Robin Murphy
Convert to io-pgtable's bulk {map,unmap}_pages() APIs, to help the old
single-page interfaces eventually go away. Unmapping heap BOs still
wants to be done a page at a time, but everything else can get the full
benefit of the more efficient interface.

Signed-off-by: Robin Murphy 
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 40 +++--
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index b285a8001b1d..e246d914e7f6 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -248,11 +248,15 @@ void panfrost_mmu_reset(struct panfrost_device *pfdev)
mmu_write(pfdev, MMU_INT_MASK, ~0);
 }
 
-static size_t get_pgsize(u64 addr, size_t size)
+static size_t get_pgsize(u64 addr, size_t size, size_t *count)
 {
-   if (addr & (SZ_2M - 1) || size < SZ_2M)
-   return SZ_4K;
+   size_t blk_offset = -addr % SZ_2M;
 
+   if (blk_offset || size < SZ_2M) {
+   *count = min_not_zero(blk_offset, size) / SZ_4K;
+   return SZ_4K;
+   }
+   *count = size / SZ_2M;
return SZ_2M;
 }
 
@@ -287,12 +291,16 @@ static int mmu_map_sg(struct panfrost_device *pfdev, 
struct panfrost_mmu *mmu,
dev_dbg(pfdev->dev, "map: as=%d, iova=%llx, paddr=%lx, 
len=%zx", mmu->as, iova, paddr, len);
 
while (len) {
-   size_t pgsize = get_pgsize(iova | paddr, len);
+   size_t pgcount, mapped = 0;
+   size_t pgsize = get_pgsize(iova | paddr, len, &pgcount);
 
-   ops->map(ops, iova, paddr, pgsize, prot, GFP_KERNEL);
-   iova += pgsize;
-   paddr += pgsize;
-   len -= pgsize;
+   ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot,
+  GFP_KERNEL, &mapped);
+   /* Don't get stuck if things have gone wrong */
+   mapped = max(mapped, pgsize);
+   iova += mapped;
+   paddr += mapped;
+   len -= mapped;
}
}
 
@@ -344,15 +352,17 @@ void panfrost_mmu_unmap(struct panfrost_gem_mapping 
*mapping)
mapping->mmu->as, iova, len);
 
while (unmapped_len < len) {
-   size_t unmapped_page;
-   size_t pgsize = get_pgsize(iova, len - unmapped_len);
+   size_t unmapped_page, pgcount;
+   size_t pgsize = get_pgsize(iova, len - unmapped_len, &pgcount);
 
-   if (ops->iova_to_phys(ops, iova)) {
-   unmapped_page = ops->unmap(ops, iova, pgsize, NULL);
-   WARN_ON(unmapped_page != pgsize);
+   if (bo->is_heap)
+   pgcount = 1;
+   if (!bo->is_heap || ops->iova_to_phys(ops, iova)) {
+   unmapped_page = ops->unmap_pages(ops, iova, pgsize, 
pgcount, NULL);
+   WARN_ON(unmapped_page != pgsize * pgcount);
}
-   iova += pgsize;
-   unmapped_len += pgsize;
+   iova += pgsize * pgcount;
+   unmapped_len += pgsize * pgcount;
}
 
panfrost_mmu_flush_range(pfdev, mapping->mmu,
-- 
2.36.1.dirty
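
As a rough illustration of what the bulk interface buys, here is a userspace port of the new get_pgsize() logic together with a loop that counts how many calls are needed to cover a range. It is a sketch under assumptions: the `demo_` names are invented, `demo_min_not_zero()` stands in for the kernel's min_not_zero(), and sizes are assumed to be multiples of 4K.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SZ_4K ((size_t)0x1000)
#define DEMO_SZ_2M ((size_t)0x200000)

/* Smaller of the two values, ignoring a zero operand. */
static size_t demo_min_not_zero(size_t a, size_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Pick a page size and report how many pages of that size can be handled at once. */
static size_t demo_get_pgsize(uint64_t addr, size_t size, size_t *count)
{
	size_t blk_offset = (size_t)(-addr % DEMO_SZ_2M);

	if (blk_offset || size < DEMO_SZ_2M) {
		/* 4K pages up to the next 2M boundary, or to the end of the range */
		*count = demo_min_not_zero(blk_offset, size) / DEMO_SZ_4K;
		return DEMO_SZ_4K;
	}
	*count = size / DEMO_SZ_2M;
	return DEMO_SZ_2M;
}

/* Number of calls needed to cover [addr, addr + size); size must be 4K-aligned. */
static unsigned int demo_count_calls(uint64_t addr, size_t size)
{
	unsigned int calls = 0;

	while (size) {
		size_t count;
		size_t pgsize = demo_get_pgsize(addr, size, &count);

		addr += pgsize * count;
		size -= pgsize * count;
		calls++;
	}
	return calls;
}
```

For a range starting one 4K page past a 2M boundary and covering 511 small pages plus two 2M blocks, this takes two calls, where a page-at-a-time loop would take 513.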



Re: [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables

2022-08-22 Thread Robin Murphy

On 2022-08-22 14:52, Robin Murphy wrote:

On 2022-08-21 19:19, Rob Clark wrote:

From: Rob Clark 

Optimize TLB invalidation by using different ASID for each set of
pgtables.  There can be scenarios where multiple processes end up
with the same ASID (such as >256 processes using the GPU), but this
is harmless, it will only result in some over-invalidation (but
less over-invalidation compared to using ASID=0 for all processes)


Um, if you're still relying on the GPU doing an invalidate-all-by-ASID 
whenever it switches a TTBR, then there's only ever one ASID live in the 
TLBs at once, so it really doesn't matter whether its value stays the 
same or changes. This seems like a whole chunk of complexity to achieve 
nothing :/


If you could actually use multiple ASIDs in a meaningful way to avoid 
any invalidations, you'd need to do considerably more work to keep track 
of reuse, and those races would probably be a lot less benign.


Oh, and on top of that, if you did want to go down that route then 
chances are you'd then also want to start looking at using global 
mappings in TTBR1 to avoid increased TLB pressure from kernel buffers, 
and then we'd run up against some pretty horrid MMU-500 errata which so 
far I've been happy to ignore on the basis that Linux doesn't use global 
mappings. Spoiler alert: unless you can additionally convert everything 
to invalidate by VA, the workaround for #752357 most likely makes the 
whole idea moot.


Robin.


Signed-off-by: Rob Clark 
---
  drivers/gpu/drm/msm/msm_iommu.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
b/drivers/gpu/drm/msm/msm_iommu.c

index a54ed354578b..94c8c09980d1 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu 
*mmu, u64 iova,

  size_t size)
  {
  struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+    struct adreno_smmu_priv *adreno_smmu =
+    dev_get_drvdata(pagetable->parent->dev);
  struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
  size_t unmapped = 0;
@@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu 
*mmu, u64 iova,

  size -= 4096;
  }
-    iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain);
+    adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
  return (unmapped == size) ? 0 : -EINVAL;
  }
@@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain 
*domain, struct device *dev,

  struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
  {
+    static atomic_t asid = ATOMIC_INIT(1);
  struct adreno_smmu_priv *adreno_smmu = 
dev_get_drvdata(parent->dev);

  struct msm_iommu *iommu = to_msm_iommu(parent);
  struct msm_iommu_pagetable *pagetable;
@@ -210,12 +213,14 @@ struct msm_mmu 
*msm_iommu_pagetable_create(struct msm_mmu *parent)

  pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
  /*
- * TODO we would like each set of page tables to have a unique ASID
- * to optimize TLB invalidation.  But iommu_flush_iotlb_all() will
- * end up flushing the ASID used for TTBR1 pagetables, which is not
- * what we want.  So for now just use the same ASID as TTBR1.
+ * ASID 0 is used for kernel mapped buffers in TTBR1, which we
+ * do not need to invalidate when unmapping from TTBR0 pgtables.
+ * The hw ASID is at *least* 8b, but can be 16b.  We just assume
+ * the worst:
   */
  pagetable->asid = 0;
+    while (!pagetable->asid)
+    pagetable->asid = atomic_inc_return(&asid) & 0xff;
  return &pagetable->base;
  }


Re: [PATCH 4/5] drm/msm: Use separate ASID for each set of pgtables

2022-08-22 Thread Robin Murphy

On 2022-08-21 19:19, Rob Clark wrote:

From: Rob Clark 

Optimize TLB invalidation by using different ASID for each set of
pgtables.  There can be scenarios where multiple processes end up
with the same ASID (such as >256 processes using the GPU), but this
is harmless, it will only result in some over-invalidation (but
less over-invalidation compared to using ASID=0 for all processes)


Um, if you're still relying on the GPU doing an invalidate-all-by-ASID 
whenever it switches a TTBR, then there's only ever one ASID live in the 
TLBs at once, so it really doesn't matter whether its value stays the 
same or changes. This seems like a whole chunk of complexity to achieve 
nothing :/


If you could actually use multiple ASIDs in a meaningful way to avoid 
any invalidations, you'd need to do considerably more work to keep track 
of reuse, and those races would probably be a lot less benign.


Robin.

Signed-off-by: Rob Clark 

---
  drivers/gpu/drm/msm/msm_iommu.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index a54ed354578b..94c8c09980d1 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -33,6 +33,8 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 
iova,
size_t size)
  {
struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct adreno_smmu_priv *adreno_smmu =
+   dev_get_drvdata(pagetable->parent->dev);
struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
size_t unmapped = 0;
  
@@ -43,7 +45,7 @@ static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,

size -= 4096;
}
  
-	iommu_flush_iotlb_all(to_msm_iommu(pagetable->parent)->domain);

+   adreno_smmu->tlb_inv_by_id(adreno_smmu->cookie, pagetable->asid);
  
  	return (unmapped == size) ? 0 : -EINVAL;

  }
@@ -147,6 +149,7 @@ static int msm_fault_handler(struct iommu_domain *domain, 
struct device *dev,
  
  struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)

  {
+   static atomic_t asid = ATOMIC_INIT(1);
struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(parent->dev);
struct msm_iommu *iommu = to_msm_iommu(parent);
struct msm_iommu_pagetable *pagetable;
@@ -210,12 +213,14 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu 
*parent)
pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
  
  	/*

-* TODO we would like each set of page tables to have a unique ASID
-* to optimize TLB invalidation.  But iommu_flush_iotlb_all() will
-* end up flushing the ASID used for TTBR1 pagetables, which is not
-* what we want.  So for now just use the same ASID as TTBR1.
+* ASID 0 is used for kernel mapped buffers in TTBR1, which we
+* do not need to invalidate when unmapping from TTBR0 pgtables.
+* The hw ASID is at *least* 8b, but can be 16b.  We just assume
+* the worst:
 */
pagetable->asid = 0;
+   while (!pagetable->asid)
+   pagetable->asid = atomic_inc_return(&asid) & 0xff;
  
  	return &pagetable->base;

  }
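
The wraparound behaviour of the allocation loop in the patch can be modelled in userspace. This is a minimal sketch using C11 atomics in place of the kernel's atomic_t; `demo_asid_counter` and `demo_alloc_asid()` are invented names, and the 8-bit mask follows the patch's worst-case assumption.

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared counter, starting at 1 like the ATOMIC_INIT(1) in the patch. */
static atomic_uint demo_asid_counter = 1;

/*
 * Hand out the next 8-bit ASID, skipping 0 (reserved for TTBR1 in the patch).
 * atomic_fetch_add() returns the old value, so adding 1 mimics atomic_inc_return().
 */
static unsigned int demo_alloc_asid(void)
{
	unsigned int asid = 0;

	while (!asid)
		asid = (atomic_fetch_add(&demo_asid_counter, 1) + 1) & 0xff;
	return asid;
}
```

Only 255 distinct non-zero values exist, so the 256th allocation returns the same ASID as the first. Whether that reuse is harmless depends entirely on how invalidation is done, which is the concern raised in the reply above.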


Re: [PATCH 2/3] iommu/dma: Move public interfaces to linux/iommu.h

2022-08-22 Thread Robin Murphy

On 2022-08-22 12:21, Christoph Hellwig wrote:

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 70393fbb57ed..79cb6eb560a8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1059,4 +1059,40 @@ void iommu_debugfs_setup(void);
  static inline void iommu_debugfs_setup(void) {}
  #endif
  
+#ifdef CONFIG_IOMMU_DMA

+#include 


I don't think msi.h is actually needed here.

Just make the struct msi_desc and struct msi_msg forward declarations
unconditional and we should be fine.


dma-iommu.c still needs to pick up msi.h for the actual definitions 
somehow, so it seemed logical to keep things the same shape as before. 
However I don't have a particularly strong preference either way.


Thanks,
Robin.


[PATCH 2/3] iommu/dma: Move public interfaces to linux/iommu.h

2022-08-16 Thread Robin Murphy
The iommu-dma layer is now mostly encapsulated by iommu_dma_ops, with
only a couple more public interfaces left pertaining to MSI integration.
Since these depend on the main IOMMU API header anyway, move their
declarations there, taking the opportunity to update the half-baked
comments to proper kerneldoc along the way.

Signed-off-by: Robin Murphy 
---

Note that iommu_setup_dma_ops() should also become internal in a future
phase of the great IOMMU API upheaval - for now as the last bit of true
arch code glue I consider it more "necessarily exposed" than "public".

 arch/arm64/mm/dma-mapping.c   |  2 +-
 drivers/iommu/dma-iommu.c | 15 ++--
 drivers/irqchip/irq-gic-v2m.c |  2 +-
 drivers/irqchip/irq-gic-v3-its.c  |  2 +-
 drivers/irqchip/irq-gic-v3-mbi.c  |  2 +-
 drivers/irqchip/irq-ls-scfg-msi.c |  2 +-
 drivers/vfio/vfio_iommu_type1.c   |  1 -
 include/linux/dma-iommu.h | 40 ---
 include/linux/iommu.h | 36 
 9 files changed, 54 insertions(+), 48 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 599cf81f5685..7d7e9a046305 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -7,7 +7,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #include 
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 17dd683b2fce..6809b33ac9df 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1633,6 +1633,13 @@ static struct iommu_dma_msi_page 
*iommu_dma_get_msi_page(struct device *dev,
return NULL;
 }
 
+/**
+ * iommu_dma_prepare_msi() - Map the MSI page in the IOMMU domain
+ * @desc: MSI descriptor, will store the MSI page
+ * @msi_addr: MSI target address to be mapped
+ *
+ * Return: 0 on success or negative error code if the mapping failed.
+ */
 int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
 {
struct device *dev = msi_desc_to_dev(desc);
@@ -1661,8 +1668,12 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, 
phys_addr_t msi_addr)
return 0;
 }
 
-void iommu_dma_compose_msi_msg(struct msi_desc *desc,
-  struct msi_msg *msg)
+/**
+ * iommu_dma_compose_msi_msg() - Apply translation to an MSI message
+ * @desc: MSI descriptor prepared by iommu_dma_prepare_msi()
+ * @msg: MSI message containing target physical address
+ */
+void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
 {
struct device *dev = msi_desc_to_dev(desc);
const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index b249d4df899e..6e1ac330d7a6 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -13,7 +13,7 @@
 #define pr_fmt(fmt) "GICv2m: " fmt
 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 5ff09de6c48f..e7d8d4208ee6 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -11,9 +11,9 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/irqchip/irq-gic-v3-mbi.c b/drivers/irqchip/irq-gic-v3-mbi.c
index a2163d32f17d..e1efdec9e9ac 100644
--- a/drivers/irqchip/irq-gic-v3-mbi.c
+++ b/drivers/irqchip/irq-gic-v3-mbi.c
@@ -6,7 +6,7 @@
 
 #define pr_fmt(fmt) "GICv3: " fmt
 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/irqchip/irq-ls-scfg-msi.c 
b/drivers/irqchip/irq-ls-scfg-msi.c
index b4927e425f7b..527c90e0920e 100644
--- a/drivers/irqchip/irq-ls-scfg-msi.c
+++ b/drivers/irqchip/irq-ls-scfg-msi.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -18,7 +19,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #define MSI_IRQS_PER_MSIR  32
 #define MSI_MSIR_OFFSET4
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c766aa683110..e65861fdba7b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,7 +37,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include "vfio.h"
 
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 24607dc3c2ac..e83de4f1f3d6 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -15,27 +15,10 @@
 
 /* Domain management interface for IOMMU drivers */
 int iommu_get_dma_cookie(struct iommu_domain *domain);
-int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
 void iommu_put_dma_cookie(struct iommu_domain *domain);
 
-/* Setup call for arch DMA mapping code */
-void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
 int iommu_dma_init_fq(struct iommu_domain *domain);
 
-/*

[PATCH 1/3] iommu/dma: Clean up Kconfig

2022-08-16 Thread Robin Murphy
Although iommu-dma is a per-architecture choice, that is currently
implemented in a rather haphazard way. Selecting from the arch Kconfig
was the original logical approach, but is complicated by having to
manage dependencies; conversely, selecting from drivers ends up hiding
the architecture dependency *too* well. Instead, let's just have it
enable itself automatically when IOMMU API support is enabled for the
relevant architectures. It can't get much clearer than that.

Signed-off-by: Robin Murphy 
---
 arch/arm64/Kconfig  | 1 -
 drivers/iommu/Kconfig   | 3 +--
 drivers/iommu/amd/Kconfig   | 1 -
 drivers/iommu/intel/Kconfig | 1 -
 4 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 571cc234d0b3..59af600445c2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -209,7 +209,6 @@ config ARM64
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_GENERIC_VDSO
-   select IOMMU_DMA if IOMMU_SUPPORT
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 5c5cb5bee8b6..1d99c2d984fb 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -137,7 +137,7 @@ config OF_IOMMU
 
 # IOMMU-agnostic DMA-mapping layer
 config IOMMU_DMA
-   bool
+   def_bool ARM64 || IA64 || X86
select DMA_OPS
select IOMMU_API
select IOMMU_IOVA
@@ -476,7 +476,6 @@ config VIRTIO_IOMMU
depends on VIRTIO
depends on (ARM64 || X86)
select IOMMU_API
-   select IOMMU_DMA
select INTERVAL_TREE
select ACPI_VIOT if ACPI
help
diff --git a/drivers/iommu/amd/Kconfig b/drivers/iommu/amd/Kconfig
index a3cbafb603f5..9b5fc3356bf2 100644
--- a/drivers/iommu/amd/Kconfig
+++ b/drivers/iommu/amd/Kconfig
@@ -9,7 +9,6 @@ config AMD_IOMMU
select PCI_PASID
select IOMMU_API
select IOMMU_IOVA
-   select IOMMU_DMA
select IOMMU_IO_PGTABLE
depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
help
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 39a06d245f12..c48005147ac5 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -19,7 +19,6 @@ config INTEL_IOMMU
select DMAR_TABLE
select SWIOTLB
select IOASID
-   select IOMMU_DMA
select PCI_ATS
help
  DMA remapping (DMAR) devices support enables independent address
-- 
2.36.1.dirty



[PATCH 0/3] iommu/dma: Some housekeeping

2022-08-16 Thread Robin Murphy
Hi All,

It's been a while now since iommu-dma grew from a library of DMA ops
helpers for arch code into something more abstracted and closely coupled
to the IOMMU API core, so it seemed about time to do some housekeeping
in the more neglected areas to reflect that.

The header reorganisation does touch a range of areas (a couple of which
seemingly had no reason to be involved anyway), but hopefully these are
all low-impact changes that nobody minds going through the IOMMU tree.

Now for the build-bots to tell me what I've missed...

Thanks,
Robin.


Robin Murphy (3):
  iommu/dma: Clean up Kconfig
  iommu/dma: Move public interfaces to linux/iommu.h
  iommu/dma: Make header private

 arch/arm64/Kconfig  |  1 -
 arch/arm64/mm/dma-mapping.c |  2 +-
 drivers/acpi/viot.c |  1 -
 drivers/gpu/drm/exynos/exynos_drm_dma.c |  1 -
 drivers/iommu/Kconfig   |  3 +-
 drivers/iommu/amd/Kconfig   |  1 -
 drivers/iommu/amd/iommu.c   |  2 +-
 drivers/iommu/apple-dart.c  |  3 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c   |  2 +-
 drivers/iommu/dma-iommu.c   | 18 +++-
 drivers/iommu/dma-iommu.h   | 38 +
 drivers/iommu/intel/Kconfig |  1 -
 drivers/iommu/intel/iommu.c |  2 +-
 drivers/iommu/iommu.c   |  3 +-
 drivers/iommu/virtio-iommu.c|  3 +-
 drivers/irqchip/irq-gic-v2m.c   |  2 +-
 drivers/irqchip/irq-gic-v3-its.c|  2 +-
 drivers/irqchip/irq-gic-v3-mbi.c|  2 +-
 drivers/irqchip/irq-ls-scfg-msi.c   |  2 +-
 drivers/vfio/vfio_iommu_type1.c |  1 -
 include/linux/dma-iommu.h   | 93 -
 include/linux/iommu.h   | 36 
 23 files changed, 105 insertions(+), 116 deletions(-)
 create mode 100644 drivers/iommu/dma-iommu.h
 delete mode 100644 include/linux/dma-iommu.h

-- 
2.36.1.dirty



Re: [PATCH 3/3] drm/komeda - Fix handling of pending crtc state commit to avoid lock-up

2022-07-14 Thread Robin Murphy

On 2022-07-11 11:13, Liviu Dudau wrote:
[...]

But nothing worrying. It does work, though doesn't compile due to:

drivers/gpu/drm/arm/display/komeda/komeda_kms.c: In function
‘komeda_kms_atomic_commit_hw_done’:
drivers/gpu/drm/arm/display/komeda/komeda_kms.c:77:9: error: ‘for’ loop
initial declarations are only allowed in C99 or C11 mode
77 | for (int i = 0; i < kms->n_crtcs; i++) {
   | ^~~
drivers/gpu/drm/arm/display/komeda/komeda_kms.c:77:9: note: use option
‘-std=c99’, ‘-std=gnu99’, ‘-std=c11’ or ‘-std=gnu11’ to compile your code

but that was a trivial fixup.


Interesting that I'm not seeing that, probably due to using GCC12? Anyway,
I'll fix that and send a proper patch.


FWIW we do use -std=gnu11 since 5.18 (see e8c07082a810), but I'm not 
entirely sure what the status quo is for using the new features in fixes 
which might also warrant backporting to stable. I believe Carsten's 
stuck on an older kernel thanks to constraints of the rest of that 
project ;)


Cheers,
Robin.
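
For reference, a minimal userspace sketch of the construct the compiler error above is about. This is not code from the komeda driver; count_enabled_c99() and count_enabled_c89() are made-up names for illustration. The first form declares the counter inside the for statement and needs C99 or later (e.g. -std=gnu11); the second declares it up front and also builds in older C modes, which is the usual fixup when a fix may need backporting to a kernel built before the switch to gnu11.

```c
#include <stddef.h>

/* Needs C99 or later: the loop counter is declared in the for statement. */
static int count_enabled_c99(const int *flags, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if (flags[i])
			count++;

	return count;
}

/* Also builds in pre-C99 modes: the counter is declared at the top of the block. */
static int count_enabled_c89(const int *flags, size_t n)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (flags[i])
			count++;

	return count;
}
```

Both functions compute the same result; only the placement of the declaration differs.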


Re: [PATCH v11 20/24] arm64: dts: rockchip: enable vop2 and hdmi tx on rock-3a

2022-07-11 Thread Robin Murphy

On 2022-07-11 12:04, Piotr Oniszczuk wrote:




Message written by Robin Murphy on 11.07.2022, at 12:41:

On 2022-06-25 16:31, Piotr Oniszczuk wrote:

Message written by Peter Geis on 25.06.2022, at 16:00:


The first issue you have is the TV isn't responding until the absolute
end.

I suspect this is because of a lack of idle gaps between CEC commands sent from 
the board to the TV.
Maybe the TV software can't deal with consecutive commands without any idle time between them?
It is interesting that with the TV disconnected - so the CEC line is driven only by the board - 
rock3a still doesn't have any idle gaps, while rock3b (and the Radxa 4.19 BSP) has 
them (very similar between 5.18 mainline and the 4.19 BSP).
How is it possible that changing the I/O from m0->m1 impacts _timings_ on a 
free-hanging CEC line?


Check all the pinctrl settings beyond just the function mux - pulls, drive 
strength, output type, etc. - the defaults tend to be all over the place, and 
rarely what you want.

Robin.


Robin,

I'm not sure if I looked in the right place...

but:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3568-pinctrl.dtsi?h=v5.18.10#n788

vs.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3568-pinctrl.dtsi?h=v5.18.10#n795

are looking ok?


I meant more in terms of dumping out the actual hardware state to 
compare across both axes of cec_m0 vs. cec_m1 and mainline vs. BSP. 
However from a quick skim of the Rock3 schematic there doesn't appear to 
be an external pull-up, so the internal pull-up also being disabled is a 
clear suspect to start with.


Robin.


Re: [PATCH v11 20/24] arm64: dts: rockchip: enable vop2 and hdmi tx on rock-3a

2022-07-11 Thread Robin Murphy

On 2022-06-25 16:31, Piotr Oniszczuk wrote:




Message written by Peter Geis on 25.06.2022, at 16:00:


The first issue you have is the TV isn't responding until the absolute
end.


I suspect this is because of a lack of idle gaps between CEC commands sent from 
the board to the TV.
Maybe the TV software can't deal with consecutive commands without any idle time between them?

It is interesting that with the TV disconnected - so the CEC line is driven only by the board - 
rock3a still doesn't have any idle gaps, while rock3b (and the Radxa 4.19 BSP) has 
them (very similar between 5.18 mainline and the 4.19 BSP).

How is it possible that changing the I/O from m0->m1 impacts _timings_ on a 
free-hanging CEC line?


Check all the pinctrl settings beyond just the function mux - pulls, 
drive strength, output type, etc. - the defaults tend to be all over the 
place, and rarely what you want.


Robin.


This strikes me as a signal integrity issue. Do you have an
oscilloscope (not a logic analyzer, you need voltages and ramp times)
to compare the working vs non-working signals? Check both sides of the
level shifter.


Indeed - I will verify this with a digital oscilloscope.
It's already ordered, but I must wait a week or two for delivery :-(

My analog oscilloscope shows correct levels, and the slopes "seem" to be the same 
as in the working case (it has no memory, so I can only compare visually on a fuzzy screen).

For me the key is to understand why on rock3a there are no idle gaps between CEC 
commands - even when nothing is connected to the board (so CEC is only sending and 
nothing external impacts the CEC state machine)


Re: [PATCH] gpu: host1x: Register context bus unconditionally

2022-07-08 Thread Robin Murphy

On 2022-07-08 15:32, Thierry Reding wrote:

On Thu, Jul 07, 2022 at 06:30:44PM +0100, Robin Murphy wrote:

Conditional registration is a problem for other subsystems which may
unwittingly try to interact with host1x_context_device_bus_type in an
uninitialised state on non-Tegra platforms. A look under /sys/bus on a
typical system already reveals plenty of entries from enabled but
otherwise irrelevant configs, so let's keep things simple and register
our context bus unconditionally too.

Signed-off-by: Robin Murphy 
---
  drivers/gpu/host1x/context_bus.c | 5 -
  1 file changed, 5 deletions(-)


Applied, thanks.

Do we need this in v5.19 or is it enough if this gets into v5.20?


It's not strictly a critical fix, so I think 5.20 is fine. I plan to 
post v4 of my bus_set_iommu() series next week as the hopefully-final 
version, but at this point I think it might be safer to hold off 
actually merging that until early next cycle, to give it plenty of time 
in -next.


Thanks,
Robin.


[PATCH] gpu: host1x: Register context bus unconditionally

2022-07-07 Thread Robin Murphy
Conditional registration is a problem for other subsystems which may
unwittingly try to interact with host1x_context_device_bus_type in an
uninitialised state on non-Tegra platforms. A look under /sys/bus on a
typical system already reveals plenty of entries from enabled but
otherwise irrelevant configs, so let's keep things simple and register
our context bus unconditionally too.

Signed-off-by: Robin Murphy 
---
 drivers/gpu/host1x/context_bus.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/host1x/context_bus.c b/drivers/gpu/host1x/context_bus.c
index b0d35b2bbe89..d9421179d7b4 100644
--- a/drivers/gpu/host1x/context_bus.c
+++ b/drivers/gpu/host1x/context_bus.c
@@ -15,11 +15,6 @@ static int __init host1x_context_device_bus_init(void)
 {
int err;
 
-   if (!of_machine_is_compatible("nvidia,tegra186") &&
-   !of_machine_is_compatible("nvidia,tegra194") &&
-   !of_machine_is_compatible("nvidia,tegra234"))
-   return 0;
-
err = bus_register(&host1x_context_device_bus_type);
if (err < 0) {
pr_err("bus type registration failed: %d\n", err);
-- 
2.36.1.dirty



Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

2022-06-28 Thread Robin Murphy

On 2022-05-27 00:50, Dmitry Osipenko wrote:

Hello,

This patchset introduces memory shrinker for the VirtIO-GPU DRM driver
and adds memory purging and eviction support to VirtIO-GPU driver.

The new dma-buf locking convention is introduced here as well.

During OOM, the shrinker will release BOs that are marked as "not needed"
by userspace using the new madvise IOCTL, it will also evict idling BOs
to SWAP. The userspace in this case is the Mesa VirGL driver, it will mark
the cached BOs as "not needed", allowing kernel driver to release memory
of the cached shmem BOs on lowmem situations, preventing OOM kills.

The Panfrost driver is switched to use generic memory shrinker.


I think we still have some outstanding issues here - Alyssa reported 
some weirdness yesterday, so I just tried provoking a low-memory 
condition locally with this series applied and a few debug options 
enabled, and the results as below were... interesting.


Thanks,
Robin.

->8-
[   68.295951] ==
[   68.295956] WARNING: possible circular locking dependency detected
[   68.295963] 5.19.0-rc3+ #400 Not tainted
[   68.295972] --
[   68.295977] cc1/295 is trying to acquire lock:
[   68.295986] 08d7f1a0 
(reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_free+0x7c/0x198

[   68.296036]
[   68.296036] but task is already holding lock:
[   68.296041] 8c14b820 (fs_reclaim){+.+.}-{0:0}, at: 
__alloc_pages_slowpath.constprop.0+0x4d8/0x1470

[   68.296080]
[   68.296080] which lock already depends on the new lock.
[   68.296080]
[   68.296085]
[   68.296085] the existing dependency chain (in reverse order) is:
[   68.296090]
[   68.296090] -> #1 (fs_reclaim){+.+.}-{0:0}:
[   68.296111]fs_reclaim_acquire+0xb8/0x150
[   68.296130]dma_resv_lockdep+0x298/0x3fc
[   68.296148]do_one_initcall+0xe4/0x5f8
[   68.296163]kernel_init_freeable+0x414/0x49c
[   68.296180]kernel_init+0x2c/0x148
[   68.296195]ret_from_fork+0x10/0x20
[   68.296207]
[   68.296207] -> #0 (reservation_ww_class_mutex){+.+.}-{3:3}:
[   68.296229]__lock_acquire+0x1724/0x2398
[   68.296246]lock_acquire+0x218/0x5b0
[   68.296260]__ww_mutex_lock.constprop.0+0x158/0x2378
[   68.296277]ww_mutex_lock+0x7c/0x4d8
[   68.296291]drm_gem_shmem_free+0x7c/0x198
[   68.296304]panfrost_gem_free_object+0x118/0x138
[   68.296318]drm_gem_object_free+0x40/0x68
[   68.296334]drm_gem_shmem_shrinker_run_objects_scan+0x42c/0x5b8
[   68.296352]drm_gem_shmem_shrinker_scan_objects+0xa4/0x170
[   68.296368]do_shrink_slab+0x220/0x808
[   68.296381]shrink_slab+0x11c/0x408
[   68.296392]shrink_node+0x6ac/0xb90
[   68.296403]do_try_to_free_pages+0x1dc/0x8d0
[   68.296416]try_to_free_pages+0x1ec/0x5b0
[   68.296429]__alloc_pages_slowpath.constprop.0+0x528/0x1470
[   68.296444]__alloc_pages+0x4e0/0x5b8
[   68.296455]__folio_alloc+0x24/0x60
[   68.296467]vma_alloc_folio+0xb8/0x2f8
[   68.296483]alloc_zeroed_user_highpage_movable+0x58/0x68
[   68.296498]__handle_mm_fault+0x918/0x12a8
[   68.296513]handle_mm_fault+0x130/0x300
[   68.296527]do_page_fault+0x1d0/0x568
[   68.296539]do_translation_fault+0xa0/0xb8
[   68.296551]do_mem_abort+0x68/0xf8
[   68.296562]el0_da+0x74/0x100
[   68.296572]el0t_64_sync_handler+0x68/0xc0
[   68.296585]el0t_64_sync+0x18c/0x190
[   68.296596]
[   68.296596] other info that might help us debug this:
[   68.296596]
[   68.296601]  Possible unsafe locking scenario:
[   68.296601]
[   68.296604]CPU0CPU1
[   68.296608]
[   68.296612]   lock(fs_reclaim);
[   68.296622] 
lock(reservation_ww_class_mutex);

[   68.296633]lock(fs_reclaim);
[   68.296644]   lock(reservation_ww_class_mutex);
[   68.296654]
[   68.296654]  *** DEADLOCK ***
[   68.296654]
[   68.296658] 3 locks held by cc1/295:
[   68.29]  #0: 0616e898 (>mmap_lock){}-{3:3}, at: 
do_page_fault+0x144/0x568
[   68.296702]  #1: 8c14b820 (fs_reclaim){+.+.}-{0:0}, at: 
__alloc_pages_slowpath.constprop.0+0x4d8/0x1470
[   68.296740]  #2: 8c1215b0 (shrinker_rwsem){}-{3:3}, at: 
shrink_slab+0xc0/0x408

[   68.296774]
[   68.296774] stack backtrace:
[   68.296780] CPU: 2 PID: 295 Comm: cc1 Not tainted 5.19.0-rc3+ #400
[   68.296794] Hardware name: ARM LTD ARM Juno Development Platform/ARM 
Juno Development Platform, BIOS EDK II Sep  3 2019

[   68.296803] Call trace:
[   68.296808]  dump_backtrace+0x1e4/0x1f0
[   68.296821]  show_stack+0x20/0x70
[   68.296832]  dump_stack_lvl+0x8c/0xb8
[   68.296849]  dump_stack+0x1c/0x38
[   68.296864]  print_circular_bug.isra.0+0x284/0x378
[   68.296881]  check_noncircular+0x1d8/0x1f8

Re: [PATCH v2] drm/sun4i: Add DMA mask and segment size

2022-06-21 Thread Robin Murphy

On 2022-06-20 19:13, Jernej Skrabec wrote:

Kernel occasionally complains that there is mismatch in segment size
when trying to render HW decoded videos and rendering them directly with
sun4i DRM driver. Following message can be observed on H6 SoC:

[  184.298308] [ cut here ]
[  184.298326] DMA-API: sun4i-drm display-engine: mapping sg segment longer 
than device claims to support [len=6144000] [max=65536]
[  184.298364] WARNING: CPU: 1 PID: 382 at kernel/dma/debug.c:1162 
debug_dma_map_sg+0x2b0/0x350
[  184.322997] CPU: 1 PID: 382 Comm: ffmpeg Not tainted 5.19.0-rc1+ #1331
[  184.329533] Hardware name: Tanix TX6 (DT)
[  184.333544] pstate: 6005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  184.340512] pc : debug_dma_map_sg+0x2b0/0x350
[  184.344882] lr : debug_dma_map_sg+0x2b0/0x350
[  184.349250] sp : 89f33a50
[  184.352567] x29: 89f33a50 x28: 0001 x27: 01b86c00
[  184.359725] x26:  x25: 05d8cc80 x24: 
[  184.366879] x23: 8939ab18 x22: 0001 x21: 0001
[  184.374031] x20:  x19: 018a7410 x18: 
[  184.381186] x17:  x16:  x15: 
[  184.388338] x14: 0001 x13: 89534e86 x12: 6f70707573206f74
[  184.395493] x11: 20736d69616c6320 x10: 000a x9 : 0001
[  184.402647] x8 : 893b6d40 x7 : 89f33850 x6 : 000c
[  184.409800] x5 : bf997940 x4 :  x3 : 0027
[  184.416953] x2 :  x1 :  x0 : 03960e80
[  184.424106] Call trace:
[  184.426556]  debug_dma_map_sg+0x2b0/0x350
[  184.430580]  __dma_map_sg_attrs+0xa0/0x110
[  184.434687]  dma_map_sgtable+0x28/0x4c
[  184.438447]  vb2_dc_dmabuf_ops_map+0x60/0xcc
[  184.442729]  __map_dma_buf+0x2c/0xd4
[  184.446321]  dma_buf_map_attachment+0xa0/0x130
[  184.450777]  drm_gem_prime_import_dev+0x7c/0x18c
[  184.455410]  drm_gem_prime_fd_to_handle+0x1b8/0x214
[  184.460300]  drm_prime_fd_to_handle_ioctl+0x2c/0x40
[  184.465190]  drm_ioctl_kernel+0xc4/0x174
[  184.469123]  drm_ioctl+0x204/0x420
[  184.472534]  __arm64_sys_ioctl+0xac/0xf0
[  184.476474]  invoke_syscall+0x48/0x114
[  184.480240]  el0_svc_common.constprop.0+0x44/0xec
[  184.484956]  do_el0_svc+0x2c/0xc0
[  184.488283]  el0_svc+0x2c/0x84
[  184.491354]  el0t_64_sync_handler+0x11c/0x150
[  184.495723]  el0t_64_sync+0x18c/0x190
[  184.499397] ---[ end trace  ]---

Fix that by setting DMA mask and segment size.

Signed-off-by: Jernej Skrabec 
---
Changes from v1:
- added comment
- updated commit message with kernel report

  drivers/gpu/drm/sun4i/sun4i_drv.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/sun4i/sun4i_drv.c 
b/drivers/gpu/drm/sun4i/sun4i_drv.c
index 275f7e4a03ae..f135a6b3cadb 100644
--- a/drivers/gpu/drm/sun4i/sun4i_drv.c
+++ b/drivers/gpu/drm/sun4i/sun4i_drv.c
@@ -7,6 +7,7 @@
   */
  
  #include 

+#include 
  #include 
  #include 
  #include 
@@ -367,6 +368,13 @@ static int sun4i_drv_probe(struct platform_device *pdev)
  
  	INIT_KFIFO(list.fifo);
  
+	/*

+* DE2 and DE3 cores actually supports 40-bit addresses, but
+* driver does not.
+*/
+   dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+   dma_set_max_seg_size(&pdev->dev, DMA_BIT_MASK(32));


Nit: this one is a number, not a bitmask, so UINT_MAX would be more 
appropriate semantically.


Thanks,
Robin.
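
To illustrate the nit in userspace terms: the DMA_BIT_MASK() macro below is copied from the kernel's definition, while max_seg_size_bytes() and addr_mask_32bit() are hypothetical helpers that exist only for this sketch. The two values happen to be numerically equal where unsigned int is 32 bits wide, but one is a byte count and the other a mask of address bits, so spelling the former as UINT_MAX says what is meant.

```c
#include <limits.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: largest segment length, in bytes, to advertise. */
static unsigned int max_seg_size_bytes(void)
{
	return UINT_MAX;	/* a size, not a mask of address bits */
}

/* Hypothetical helper: mask of the address bits a 32-bit device can drive. */
static unsigned long long addr_mask_32bit(void)
{
	return DMA_BIT_MASK(32);
}
```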


+
for (i = 0;; i++) {
struct device_node *pipeline = of_parse_phandle(np,

"allwinner,pipelines",


[PATCH] drm/arm/hdlcd: Simplify IRQ install/uninstall

2022-06-15 Thread Robin Murphy
Since we no longer need to conform to the structure of the various DRM
IRQ callbacks, we can streamline the code by consolidating the piecemeal
functions and passing around our private data structure directly. We're
also a platform device so should never see IRQ_NOTCONNECTED either.

Furthermore we can also get rid of all the unnecessary read-modify-write
operations, since on install we know we cleared the whole interrupt mask
before enabling the debug IRQs, and thus on uninstall we're always
clearing everything as well.

Signed-off-by: Robin Murphy 
---
 drivers/gpu/drm/arm/hdlcd_drv.c | 62 +
 1 file changed, 16 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c
index 1f1171f2f16a..7d6aa9b3b577 100644
--- a/drivers/gpu/drm/arm/hdlcd_drv.c
+++ b/drivers/gpu/drm/arm/hdlcd_drv.c
@@ -41,8 +41,7 @@
 
 static irqreturn_t hdlcd_irq(int irq, void *arg)
 {
-   struct drm_device *drm = arg;
-   struct hdlcd_drm_private *hdlcd = drm->dev_private;
+   struct hdlcd_drm_private *hdlcd = arg;
unsigned long irq_status;
 
irq_status = hdlcd_read(hdlcd, HDLCD_REG_INT_STATUS);
@@ -70,61 +69,32 @@ static irqreturn_t hdlcd_irq(int irq, void *arg)
return IRQ_HANDLED;
 }
 
-static void hdlcd_irq_preinstall(struct drm_device *drm)
-{
-   struct hdlcd_drm_private *hdlcd = drm->dev_private;
-   /* Ensure interrupts are disabled */
-   hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, 0);
-   hdlcd_write(hdlcd, HDLCD_REG_INT_CLEAR, ~0);
-}
-
-static void hdlcd_irq_postinstall(struct drm_device *drm)
-{
-#ifdef CONFIG_DEBUG_FS
-   struct hdlcd_drm_private *hdlcd = drm->dev_private;
-   unsigned long irq_mask = hdlcd_read(hdlcd, HDLCD_REG_INT_MASK);
-
-   /* enable debug interrupts */
-   irq_mask |= HDLCD_DEBUG_INT_MASK;
-
-   hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, irq_mask);
-#endif
-}
-
-static int hdlcd_irq_install(struct drm_device *drm, int irq)
+static int hdlcd_irq_install(struct hdlcd_drm_private *hdlcd)
 {
int ret;
 
-   if (irq == IRQ_NOTCONNECTED)
-   return -ENOTCONN;
+   /* Ensure interrupts are disabled */
+   hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, 0);
+   hdlcd_write(hdlcd, HDLCD_REG_INT_CLEAR, ~0);
 
-   hdlcd_irq_preinstall(drm);
-
-   ret = request_irq(irq, hdlcd_irq, 0, drm->driver->name, drm);
+   ret = request_irq(hdlcd->irq, hdlcd_irq, 0, "hdlcd", hdlcd);
if (ret)
return ret;
 
-   hdlcd_irq_postinstall(drm);
+#ifdef CONFIG_DEBUG_FS
+   /* enable debug interrupts */
+   hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, HDLCD_DEBUG_INT_MASK);
+#endif
 
return 0;
 }
 
-static void hdlcd_irq_uninstall(struct drm_device *drm)
+static void hdlcd_irq_uninstall(struct hdlcd_drm_private *hdlcd)
 {
-   struct hdlcd_drm_private *hdlcd = drm->dev_private;
/* disable all the interrupts that we might have enabled */
-   unsigned long irq_mask = hdlcd_read(hdlcd, HDLCD_REG_INT_MASK);
+   hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, 0);
 
-#ifdef CONFIG_DEBUG_FS
-   /* disable debug interrupts */
-   irq_mask &= ~HDLCD_DEBUG_INT_MASK;
-#endif
-
-   /* disable vsync interrupts */
-   irq_mask &= ~HDLCD_INTERRUPT_VSYNC;
-   hdlcd_write(hdlcd, HDLCD_REG_INT_MASK, irq_mask);
-
-   free_irq(hdlcd->irq, drm);
+   free_irq(hdlcd->irq, hdlcd);
 }
 
 static int hdlcd_load(struct drm_device *drm, unsigned long flags)
@@ -184,7 +154,7 @@ static int hdlcd_load(struct drm_device *drm, unsigned long 
flags)
goto irq_fail;
hdlcd->irq = ret;
 
-   ret = hdlcd_irq_install(drm, hdlcd->irq);
+   ret = hdlcd_irq_install(hdlcd);
if (ret < 0) {
DRM_ERROR("failed to install IRQ handler\n");
goto irq_fail;
@@ -342,7 +312,7 @@ static int hdlcd_drm_bind(struct device *dev)
 err_unload:
of_node_put(hdlcd->crtc.port);
hdlcd->crtc.port = NULL;
-   hdlcd_irq_uninstall(drm);
+   hdlcd_irq_uninstall(hdlcd);
of_reserved_mem_device_release(drm->dev);
 err_free:
drm_mode_config_cleanup(drm);
@@ -364,7 +334,7 @@ static void hdlcd_drm_unbind(struct device *dev)
hdlcd->crtc.port = NULL;
pm_runtime_get_sync(dev);
drm_atomic_helper_shutdown(drm);
-   hdlcd_irq_uninstall(drm);
+   hdlcd_irq_uninstall(hdlcd);
pm_runtime_put(dev);
if (pm_runtime_enabled(dev))
pm_runtime_disable(dev);
-- 
2.36.1.dirty
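
The read-modify-write argument in the commit message above can be checked with a small userspace model. Everything here is a stand-in: int_mask_reg plays the role of the interrupt mask register, and the DEBUG_INT_MASK and VSYNC_INT bit values are invented for illustration rather than taken from the driver's register header. The point is only that once the mask is known to be zero, OR-ing bits in and writing them directly give the same result, and that clearing the only bits that can ever be set is the same as writing zero.

```c
#include <stdint.h>

/* Illustrative bit values only, not the real register layout. */
#define DEBUG_INT_MASK	0x0fu
#define VSYNC_INT	0x10u

static uint32_t int_mask_reg;	/* stand-in for the interrupt mask register */

/* Old install sequence: clear the mask, then read-modify-write the debug bits in. */
static uint32_t install_rmw(void)
{
	int_mask_reg = 0;
	int_mask_reg |= DEBUG_INT_MASK;
	return int_mask_reg;
}

/* New install sequence: clear the mask, then write the debug bits directly. */
static uint32_t install_direct(void)
{
	int_mask_reg = 0;
	int_mask_reg = DEBUG_INT_MASK;
	return int_mask_reg;
}

/* Old uninstall: clear the debug and vsync bits from whatever is currently set. */
static uint32_t uninstall_rmw(uint32_t mask)
{
	return mask & ~(DEBUG_INT_MASK | VSYNC_INT);
}
```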



[PATCH v2] drm/arm/hdlcd: Take over EFI framebuffer properly

2022-06-15 Thread Robin Murphy
The Arm Juno board EDK2 port has provided an EFI GOP display via HDLCD0
for some time now, which works nicely as an early framebuffer. However,
once the HDLCD driver probes and takes over the hardware, it should
take over the logical framebuffer as well, otherwise the now-defunct GOP
device hangs about and virtual console output inevitably disappears into
the wrong place most of the time.

We'll do this after binding the HDMI encoder, since that's the most
likely thing to fail, and the EFI console is still better than nothing
when that happens. However, the two HDLCD controllers on Juno are
independent, and many users will still be using older firmware without
any display support, so we'll only bother if we find that the HDLCD
we're probing is already enabled. And if it is, then we'll also stop it,
since otherwise the display can end up shifted if it's still scanning
out while the rest of the registers are subsequently reconfigured.

Signed-off-by: Robin Murphy 
---

Since I ended up adding (relatively) a lot here, I didn't want to
second-guess Javier's opinion so left off the R-b tag from v1.

 drivers/gpu/drm/arm/hdlcd_drv.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c
index e89ae0ec60eb..1f1171f2f16a 100644
--- a/drivers/gpu/drm/arm/hdlcd_drv.c
+++ b/drivers/gpu/drm/arm/hdlcd_drv.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -314,6 +315,12 @@ static int hdlcd_drm_bind(struct device *dev)
goto err_vblank;
}
 
+   /* If EFI left us running, take over from efifb/sysfb */
+   if (hdlcd_read(hdlcd, HDLCD_REG_COMMAND)) {
+   hdlcd_write(hdlcd, HDLCD_REG_COMMAND, 0);
+   drm_aperture_remove_framebuffers(false, &hdlcd_driver);
+   }
+
drm_mode_config_reset(drm);
drm_kms_helper_poll_init(drm);
 
-- 
2.36.1.dirty



Re: [PATCH] drm/arm/hdlcd: Take over EFI framebuffer properly

2022-06-14 Thread Robin Murphy

On 2022-06-14 14:48, Thomas Zimmermann wrote:

Hi

Am 14.06.22 um 15:04 schrieb Robin Murphy:

The Arm Juno board EDK2 port has provided an EFI GOP display via HDLCD0
for some time now, which works nicely as an early framebuffer. However,
once the HDLCD driver probes and takes over the hardware, it should
take over the logical framebuffer as well, otherwise the now-defunct GOP
device hangs about and virtual console output inevitably disappears into
the wrong place most of the time.

Signed-off-by: Robin Murphy 
---
  drivers/gpu/drm/arm/hdlcd_drv.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c 
b/drivers/gpu/drm/arm/hdlcd_drv.c

index af59077a5481..a5d04884658b 100644
--- a/drivers/gpu/drm/arm/hdlcd_drv.c
+++ b/drivers/gpu/drm/arm/hdlcd_drv.c
@@ -331,6 +331,8 @@ static int hdlcd_drm_bind(struct device *dev)
  goto err_vblank;
  }
+    drm_fb_helper_remove_conflicting_framebuffers(NULL, "hdlcd", false);
+


In addition to what Javier said, it appears to be too late to call this 
function. If anything here touches hardware, you might accidentally 
interfere with the EFI-related driver. Rather call it at the top of 
hdlcd_drm_bind().


OK, thanks for the info. I mostly just copied the pattern from the 
simplest-looking other users (sun4i, tegra, vc4) who all seemed to call 
it fairly late, and indeed naively it seemed logical not to do it *too* 
early when there's more chance we might fail to bind and leave the user 
with no framebuffer at all. In particular, waiting until we've bound the 
HDMI encoder seems like a good idea in the case of the Juno board (which 
is the only real HDLCD user), as the I2C bus often gets stuck if the 
System Control Processor is having a bad day. I also don't believe 
there's anything here that would affect efifb more than the fact that 
once the DRM CRTC is alive we simply stop scanning out from the region 
of memory that efifb is managing, but if it's considered good practice 
to do this early then I can certainly make that change too.


Cheers,
Robin.


Re: [PATCH] drm/arm/hdlcd: Take over EFI framebuffer properly

2022-06-14 Thread Robin Murphy

On 2022-06-14 14:26, Javier Martinez Canillas wrote:

Hello Robin,

On 6/14/22 15:04, Robin Murphy wrote:

The Arm Juno board EDK2 port has provided an EFI GOP display via HDLCD0
for some time now, which works nicely as an early framebuffer. However,
once the HDLCD driver probes and takes over the hardware, it should
take over the logical framebuffer as well, otherwise the now-defunct GOP
device hangs about and virtual console output inevitably disappears into
the wrong place most of the time.

Signed-off-by: Robin Murphy 
---
  drivers/gpu/drm/arm/hdlcd_drv.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c
index af59077a5481..a5d04884658b 100644
--- a/drivers/gpu/drm/arm/hdlcd_drv.c
+++ b/drivers/gpu/drm/arm/hdlcd_drv.c
@@ -331,6 +331,8 @@ static int hdlcd_drm_bind(struct device *dev)
goto err_vblank;
}
  
+	drm_fb_helper_remove_conflicting_framebuffers(NULL, "hdlcd", false);

+


Seems you are using an older base, since this function doesn't exist anymore
after commit 603dc7ed917f ("drm/aperture: Inline fbdev conflict helpers into
aperture helpers").


Ah, you got me! I'm having to work with a 5.10 kernel at the moment, but 
the randomly-disappearing console had finally sufficiently annoyed me 
into figuring out and fixing it.



Instead, you should use the drm_aperture_remove_framebuffers() function, i.e:

  + drm_aperture_remove_framebuffers(false, &hdlcd_driver);

If you do that and re-spin the patch, feel free to add:

Reviewed-by: Javier Martinez Canillas 


Thanks for the advice and review - I'll send a v2 later once I've had 
time to build and boot test 5.19-rc.


Cheers,
Robin.


[PATCH] drm/arm/hdlcd: Take over EFI framebuffer properly

2022-06-14 Thread Robin Murphy
The Arm Juno board EDK2 port has provided an EFI GOP display via HDLCD0
for some time now, which works nicely as an early framebuffer. However,
once the HDLCD driver probes and takes over the hardware, it should
take over the logical framebuffer as well, otherwise the now-defunct GOP
device hangs about and virtual console output inevitably disappears into
the wrong place most of the time.

Signed-off-by: Robin Murphy 
---
 drivers/gpu/drm/arm/hdlcd_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c
index af59077a5481..a5d04884658b 100644
--- a/drivers/gpu/drm/arm/hdlcd_drv.c
+++ b/drivers/gpu/drm/arm/hdlcd_drv.c
@@ -331,6 +331,8 @@ static int hdlcd_drm_bind(struct device *dev)
goto err_vblank;
}
 
+   drm_fb_helper_remove_conflicting_framebuffers(NULL, "hdlcd", false);
+
drm_mode_config_reset(drm);
drm_kms_helper_poll_init(drm);
 
-- 
2.36.1.dirty



Re: [PATCH v4] dma-buf: Add a capabilities directory

2022-06-06 Thread Robin Murphy

On 2022-06-06 16:22, Greg KH wrote:

On Mon, Jun 06, 2022 at 04:10:09PM +0100, Robin Murphy wrote:

On 2022-06-02 07:47, Daniel Vetter wrote:

On Thu, 2 Jun 2022 at 08:34, Simon Ser  wrote:


On Thursday, June 2nd, 2022 at 08:25, Greg KH  wrote:


On Thu, Jun 02, 2022 at 06:17:31AM +, Simon Ser wrote:


On Thursday, June 2nd, 2022 at 07:40, Greg KH g...@kroah.com wrote:


On Wed, Jun 01, 2022 at 04:13:14PM +, Simon Ser wrote:


To discover support for new DMA-BUF IOCTLs, user-space has no
choice but to try to perform the IOCTL on an existing DMA-BUF.


Which is correct and how all kernel features work (sorry I missed the
main goal of this patch earlier and focused only on the sysfs stuff).


However, user-space may want to figure out whether or not the
IOCTL is available before it has a DMA-BUF at hand, e.g. at
initialization time in a Wayland compositor.


Why not just do the ioctl in a test way? That's how we determine kernel
features, we do not poke around in sysfs to determine what is, or is
not, present at runtime.


Add a /sys/kernel/dmabuf/caps directory which allows the DMA-BUF
subsystem to advertise supported features. Add a
sync_file_import_export entry which indicates that importing and
exporting sync_files from/to DMA-BUFs is supported.


No, sorry, this is not a sustainable thing to do for all kernel features
over time. Please just do the ioctl and go from there. sysfs is not
for advertising what is and is not enabled/present in a kernel with
regards to functionality or capabilities of the system.

If sysfs were to export this type of thing, it would have to do it for
everything, not just some random tiny thing of one kernel driver.


I'd argue that DMA-BUF is a special case here.


So this is special and unique just like everything else? :)


To check whether the import/export IOCTLs are available, user-space
needs a DMA-BUF to try to perform the IOCTL. To get a DMA-BUF,
user-space needs to enumerate GPUs, pick one at random, load GBM or
Vulkan, use that heavy-weight API to allocate a "fake" buffer on the
GPU, export that buffer into a DMA-BUF, try the IOCTL, then teardown
all of this. There is no other way.

This sounds like a roundabout way to answer the simple question "is the
IOCTL available?". Do you have another suggestion to address this
problem?


What does userspace do differently if the ioctl is present or not?


Globally enable a synchronization API for Wayland clients, for instance
in the case of a Wayland compositor.


And why is this somehow more special than the tens of thousands of
other ioctl calls where you have to do exactly the same thing you list
above to determine if it is present or not?


For other IOCTLs it's not as complicated to obtain a FD to do the test
with.


To expand on this:

- compositor opens the drm render /dev node
- compositor initializes the opengl or vulkan userspace driver on top of that
- compositor asks that userspace driver to allocate some buffer, which
can be pretty expensive
- compositor asks the userspace driver to export that buffer into a dma-buf
- compositor can finally do the test ioctl, realizes support isn't
there and tosses the entire thing

read() on a sysfs file is so much more reasonable it's not even funny.


Just a drive-by observation, so apologies if I'm overlooking something
obvious, but it sounds like the ideal compromise would be to expose a sysfs
file which behaves as a dummy exported dma-buf. That way userspace could
just open() it and try ioctl() directly - assuming that supported operations
can fail distinctly from unsupported ones, or succeed as a no-op - which
seems even simpler still.


ioctl() will not work on a sysfs file, sorry.


Ah, fair enough - TBH I should have just said "a file", since I presume 
some sort of /dev/dma-buf might also be an option, if a bit more work to 
implement.
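
For illustration, the "open it and try the ioctl" idea could look roughly
like the sketch below from userspace. It assumes the kernel reports an
unimplemented request as ENOTTY, which is the usual convention but not
something every driver honours; probe_ioctl() and the enum names are
hypothetical, not an existing API.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical three-way result for a feature probe. */
enum probe_result {
	PROBE_SUPPORTED,	/* the request was recognised and succeeded */
	PROBE_UNSUPPORTED,	/* ENOTTY: the fd does not implement the request */
	PROBE_INCONCLUSIVE,	/* any other error: cannot tell either way */
};

/*
 * Issue the ioctl once and classify the outcome. Anything other than
 * success or ENOTTY is reported as inconclusive, since a recognised
 * request that fails and an unrecognised one are not guaranteed to be
 * distinguishable by errno alone.
 */
enum probe_result probe_ioctl(int fd, unsigned long request, void *arg)
{
	if (ioctl(fd, request, arg) == 0)
		return PROBE_SUPPORTED;

	if (errno == ENOTTY)
		return PROBE_UNSUPPORTED;

	return PROBE_INCONCLUSIVE;
}
```

A caller would open whatever dummy file the kernel ends up exposing, call
probe_ioctl() with the request it cares about, and only enable the feature
on PROBE_SUPPORTED.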


I'll scuttle back to my low-level DMA corner now :)

Cheers,
Robin.


Re: [PATCH v4] dma-buf: Add a capabilities directory

2022-06-06 Thread Robin Murphy

On 2022-06-02 07:47, Daniel Vetter wrote:

On Thu, 2 Jun 2022 at 08:34, Simon Ser  wrote:


On Thursday, June 2nd, 2022 at 08:25, Greg KH  wrote:


On Thu, Jun 02, 2022 at 06:17:31AM +, Simon Ser wrote:


On Thursday, June 2nd, 2022 at 07:40, Greg KH g...@kroah.com wrote:


On Wed, Jun 01, 2022 at 04:13:14PM +, Simon Ser wrote:


To discover support for new DMA-BUF IOCTLs, user-space has no
choice but to try to perform the IOCTL on an existing DMA-BUF.


Which is correct and how all kernel features work (sorry I missed the
main goal of this patch earlier and focused only on the sysfs stuff).


However, user-space may want to figure out whether or not the
IOCTL is available before it has a DMA-BUF at hand, e.g. at
initialization time in a Wayland compositor.


Why not just do the ioctl in a test way? That's how we determine kernel
features, we do not poke around in sysfs to determine what is, or is
not, present at runtime.


Add a /sys/kernel/dmabuf/caps directory which allows the DMA-BUF
subsystem to advertise supported features. Add a
sync_file_import_export entry which indicates that importing and
exporting sync_files from/to DMA-BUFs is supported.


No, sorry, this is not a sustainable thing to do for all kernel features
over time. Please just do the ioctl and go from there. sysfs is not
for advertising what is and is not enabled/present in a kernel with
regards to functionality or capabilities of the system.

If sysfs were to export this type of thing, it would have to do it for
everything, not just some random tiny thing of one kernel driver.


I'd argue that DMA-BUF is a special case here.


So this is special and unique just like everything else? :)


To check whether the import/export IOCTLs are available, user-space
needs a DMA-BUF to try to perform the IOCTL. To get a DMA-BUF,
user-space needs to enumerate GPUs, pick one at random, load GBM or
Vulkan, use that heavy-weight API to allocate a "fake" buffer on the
GPU, export that buffer into a DMA-BUF, try the IOCTL, then teardown
all of this. There is no other way.

This sounds like a roundabout way to answer the simple question "is the
IOCTL available?". Do you have another suggestion to address this
problem?


What does userspace do differently if the ioctl is present or not?


Globally enable a synchronization API for Wayland clients, for instance
in the case of a Wayland compositor.


And why is this somehow more special than the tens of thousands of
other ioctl calls where you have to do exactly the same thing you list
above to determine if it is present or not?


For other IOCTLs it's not as complicated to obtain a FD to do the test
with.


To expand on this:

- compositor opens the drm render /dev node
- compositor initializes the opengl or vulkan userspace driver on top of that
- compositor asks that userspace driver to allocate some buffer, which
can be pretty expensive
- compositor asks the userspace driver to export that buffer into a dma-buf
- compositor can finally do the test ioctl, realizes support isn't
there and tosses the entire thing

read() on a sysfs file is so much more reasonable it's not even funny.


Just a drive-by observation, so apologies if I'm overlooking something 
obvious, but it sounds like the ideal compromise would be to expose a 
sysfs file which behaves as a dummy exported dma-buf. That way userspace 
could just open() it and try ioctl() directly - assuming that supported 
operations can fail distinctly from unsupported ones, or succeed as a 
no-op - which seems even simpler still.


Robin.


Re: [PATCH v5 5/9] iommu/arm-smmu: Attach to host1x context device bus

2022-05-16 Thread Robin Murphy

On 2022-05-16 11:13, Mikko Perttunen wrote:

On 5/16/22 13:07, Will Deacon wrote:

On Mon, May 16, 2022 at 11:52:54AM +0300, cyn...@kapsi.fi wrote:

From: Mikko Perttunen 

Set itself as the IOMMU for the host1x context device bus, containing
"dummy" devices used for Host1x context isolation.

Signed-off-by: Mikko Perttunen 
---
  drivers/iommu/arm/arm-smmu/arm-smmu.c | 13 +
  1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c

index 568cce590ccc..9ff54eaecf81 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -39,6 +39,7 @@
  #include 
  #include 
+#include 
  #include "arm-smmu.h"
@@ -2053,8 +2054,20 @@ static int arm_smmu_bus_init(struct iommu_ops 
*ops)

  goto err_reset_pci_ops;
  }
  #endif
+#ifdef CONFIG_TEGRA_HOST1X_CONTEXT_BUS
+    if (!iommu_present(_context_device_bus_type)) {
+    err = bus_set_iommu(_context_device_bus_type, ops);
+    if (err)
+    goto err_reset_fsl_mc_ops;
+    }
+#endif
+
  return 0;
+err_reset_fsl_mc_ops: __maybe_unused;
+#ifdef CONFIG_FSL_MC_BUS
+    bus_set_iommu(_mc_bus_type, NULL);
+#endif


bus_set_iommu() is going away:

https://lore.kernel.org/r/cover.1650890638.git.robin.mur...@arm.com

Will


Thanks for the heads-up. Robin had pointed out that this work was 
ongoing but I hadn't seen the patches yet. I'll look into it.


Although that *is* currently blocked on the mystery intel-iommu problem 
that I can't reproduce... If this series is ready to land right now for 
5.19 then in principle that might be the easiest option overall. 
Hopefully at least patch #2 could sneak in so that the compile-time 
dependencies are ready for me to roll up host1x into the next rebase of 
"iommu: Always register bus notifiers".


Cheers,
Robin.


Re: [PATCH v11 20/24] arm64: dts: rockchip: enable vop2 and hdmi tx on rock-3a

2022-05-12 Thread Robin Murphy

On 2022-05-08 17:53, Peter Geis wrote:

On Sun, May 8, 2022 at 9:40 AM Piotr Oniszczuk
 wrote:





Message written by Sascha Hauer  on 22.04.2022, at 09:28:

From: Michael Riesch 

Enable the RK356x Video Output Processor (VOP) 2 on the Radxa
ROCK3 Model A.

Signed-off-by: Michael Riesch 
Reported-by: kernel test robot 
Link: 
https://lore.kernel.org/r/20220310210352.451136-4-michael.rie...@wolfvision.net
Signed-off-by: Sascha Hauer 
---


Sascha, Michael,


Good Afternoon,


I'm using v11 series on 5.18-rc5 on rk3566 tvbox with great success.
Recently I started to work on rock3-a (rk3568).
v11 gives me video, audio - but cec is not working on rock3-a.

I was told:

32k clock needed for cec and this clock is generated by the rtc which is 
embedded in the rk8xx regulator.
So you should make sure it is enabled when hdmi is powered on, e.g. adding it to 
the RK3568_PD_VO powerdomain should help

I was trying to do this in dts https://pastebin.com/67wu9QrH but cec is still 
non-functional

Maybe you have some hints/pointers here?


Add the following to the HDMI node:
assigned-clocks = <&cru CLK_HDMI_CEC>;
assigned-clock-rates = <32768>;

The issue is the clk_rtc32k_frac clock that feeds clk_rtc_32k which
feeds clk_hdmi_cec is 24 MHz at boot, which is too high for CEC to
function.
I submitted a patch to have the hdmi driver handle this, but it broke
other SoCs because 32k is an optional clock.
Since this is the case, I'd like Robin to weigh in on going the
assigned-clock route again.


(did you mean to CC me or have I missed another thread elsewhere?)

FWIW I still think it would be good to fix the clock driver(s) and/or 
DTs to correctly deal with the availability and configuration of xin_32k 
where appropriate. However, much like the HCLK_VO mess I guess that's a 
larger cleanup tangent in its own right, so using "assigned-clocks" for 
this one case in the meantime doesn't seem unreasonable. I was 
optimistic for the cleanest, most generic solution, but if reality gets 
in the way then oh well.


Judging by the datasheet, RK3568 might actually have a similar situation 
with its clk32k_in pin, so you may want "assigned-clock-parents" as well 
to ensure the whole clk_rtc32k branch is really set up the way you 
currently expect - baking any more assumptions into DTBs now only seems 
to add potential for breakage if kernel behaviour changes in future.


Robin.


Re: [PATCH v2 5/5] drm/msm: switch msm_kms_init_aspace() to use device_iommu_mapped()

2022-05-05 Thread Robin Murphy

On 2022-05-05 01:16, Dmitry Baryshkov wrote:

Change msm_kms_init_aspace() to use generic function
device_iommu_mapped() instead of the fwnode-specific interface
dev_iommu_fwspec_get(). While we are at it, stop referencing
platform_bus_type directly and use the bus of the IOMMU device.


FWIW, I'd have squashed these changes across the previous patches, such 
that the dodgy fwspec calls are never introduced in the first place, but 
it's your driver, and if that's the way you want to work it and Rob's 
happy with it too, then fine by me.


For the end result,

Reviewed-by: Robin Murphy 

I'm guessing MDP4 could probably use msm_kms_init_aspace() now as well, 
but unless there's any other reason to respin this series, that's 
something we could do as a follow-up. Thanks for sorting this out!


Robin.


Suggested-by: Robin Murphy 
Signed-off-by: Dmitry Baryshkov 
---
  drivers/gpu/drm/msm/msm_drv.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 98ae0036ab57..2fc3f820cd59 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -272,21 +272,21 @@ struct msm_gem_address_space *msm_kms_init_aspace(struct 
drm_device *dev)
struct device *mdss_dev = mdp_dev->parent;
struct device *iommu_dev;
  
-	domain = iommu_domain_alloc(_bus_type);

-   if (!domain) {
-   drm_info(dev, "no IOMMU, fallback to phys contig buffers for 
scanout\n");
-   return NULL;
-   }
-
/*
 * IOMMUs can be a part of MDSS device tree binding, or the
 * MDP/DPU device.
 */
-   if (dev_iommu_fwspec_get(mdp_dev))
+   if (device_iommu_mapped(mdp_dev))
iommu_dev = mdp_dev;
else
iommu_dev = mdss_dev;
  
+	domain = iommu_domain_alloc(iommu_dev->bus);

+   if (!domain) {
+   drm_info(dev, "no IOMMU, fallback to phys contig buffers for 
scanout\n");
+   return NULL;
+   }
+
mmu = msm_iommu_new(iommu_dev, domain);
if (IS_ERR(mmu)) {
iommu_domain_free(domain);


Re: [PATCH v2] drm/tegra: Stop using iommu_present()

2022-05-04 Thread Robin Murphy

On 2022-05-04 01:52, Dmitry Osipenko wrote:

On 4/11/22 16:46, Robin Murphy wrote:

@@ -1092,6 +1092,19 @@ static bool host1x_drm_wants_iommu(struct host1x_device 
*dev)
struct host1x *host1x = dev_get_drvdata(dev->dev.parent);
struct iommu_domain *domain;
  
+	/* For starters, this is moot if no IOMMU is available */

+   if (!device_iommu_mapped(&dev->dev))
+   return false;


Unfortunately this returns false on T30 with enabled IOMMU because we
don't use IOMMU for Host1x on T30 [1] to optimize performance. We can't
change it until we update the drivers to support Host1x-dedicated buffers.


Huh, so is dev->dev here not the DRM device? If it is, and 
device_iommu_mapped() returns false, then the later iommu_attach_group() 
call is going to fail anyway, so there's not much point allocating a 
domain. If it's not, then what the heck is host1x_drm_wants_iommu() 
actually testing for?


In the not-too-distant future we'll need to pass an appropriate IOMMU 
client device to iommu_domain_alloc() as well, so the sooner we can get 
this code straight the better.


Thanks,
Robin.



[1]
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/host1x/dev.c#L258



Re: [PATCH 2/3] drm/msm/mdp5: move iommu_domain_alloc() call close to its usage

2022-05-03 Thread Robin Murphy

On 2022-05-03 14:30, Dmitry Baryshkov wrote:

On Tue, 3 May 2022 at 13:57, Robin Murphy  wrote:


On 2022-05-01 11:10, Dmitry Baryshkov wrote:

Move iommu_domain_alloc() in front of address space/IOMMU initialization.
This allows us to drop final bits of struct mdp5_cfg_platform which
remained from the pre-DT days.

Signed-off-by: Dmitry Baryshkov 
---
   drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 16 
   drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h |  6 --
   drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  6 --
   3 files changed, 4 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c
index 1bf9ff5dbabc..714effb967ff 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c
@@ -1248,8 +1248,6 @@ static const struct mdp5_cfg_handler cfg_handlers_v3[] = {
   { .revision = 3, .config = { .hw = _config } },
   };

-static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev);
-
   const struct mdp5_cfg_hw *mdp5_cfg_get_hw_config(struct mdp5_cfg_handler 
*cfg_handler)
   {
   return cfg_handler->config.hw;
@@ -1274,10 +1272,8 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms 
*mdp5_kms,
   uint32_t major, uint32_t minor)
   {
   struct drm_device *dev = mdp5_kms->dev;
- struct platform_device *pdev = to_platform_device(dev->dev);
   struct mdp5_cfg_handler *cfg_handler;
   const struct mdp5_cfg_handler *cfg_handlers;
- struct mdp5_cfg_platform *pconfig;
   int i, ret = 0, num_handlers;

   cfg_handler = kzalloc(sizeof(*cfg_handler), GFP_KERNEL);
@@ -1320,9 +1316,6 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms 
*mdp5_kms,
   cfg_handler->revision = minor;
   cfg_handler->config.hw = mdp5_cfg;

- pconfig = mdp5_get_config(pdev);
- memcpy(_handler->config.platform, pconfig, sizeof(*pconfig));
-
   DBG("MDP5: %s hw config selected", mdp5_cfg->name);

   return cfg_handler;
@@ -1333,12 +1326,3 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms 
*mdp5_kms,

   return ERR_PTR(ret);
   }
-
-static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev)
-{
- static struct mdp5_cfg_platform config = {};
-
- config.iommu = iommu_domain_alloc(_bus_type);
-
- return 
-}
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h
index 6b03d7899309..c2502cc33864 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h
@@ -104,14 +104,8 @@ struct mdp5_cfg_hw {
   uint32_t max_clk;
   };

-/* platform config data (ie. from DT, or pdata) */
-struct mdp5_cfg_platform {
- struct iommu_domain *iommu;
-};
-
   struct mdp5_cfg {
   const struct mdp5_cfg_hw *hw;
- struct mdp5_cfg_platform platform;
   };

   struct mdp5_kms;
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index 9b7bbc3adb97..1c67c2c828cd 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -558,6 +558,7 @@ static int mdp5_kms_init(struct drm_device *dev)
   struct msm_gem_address_space *aspace;
   int irq, i, ret;
   struct device *iommu_dev;
+ struct iommu_domain *iommu;

   ret = mdp5_init(to_platform_device(dev->dev), dev);

@@ -601,14 +602,15 @@ static int mdp5_kms_init(struct drm_device *dev)
   }
   mdelay(16);

- if (config->platform.iommu) {
+ iommu = iommu_domain_alloc(_bus_type);


To preempt the next change down the line as well, could this be
rearranged to work as iommu_domain_alloc(iommu_dev->bus)?


I'd prefer to split this into the separate change, if you don't mind.


Oh, for sure, divide the patches however you see fit - I'm just hoping 
to save your time overall by getting all the IOMMU-related refactoring 
done now as a single series rather than risk me coming back and breaking 
things again in a few months :)


Cheers,
Robin.






+ if (iommu) {
   struct msm_mmu *mmu;

   iommu_dev = >dev;
   if (!dev_iommu_fwspec_get(iommu_dev))


The fwspec helpers are more of an internal thing between the IOMMU
drivers and the respective firmware code - I'd rather that external API
users stuck consistently to using device_iommu_mapped() (it should give
the same result).


Let me check that it works correctly and spin a v2 afterwards.



Otherwise, thanks for sorting this out!

Robin.


   iommu_dev = iommu_dev->parent;

- mmu = msm_iommu_new(iommu_dev, config->platform.iommu);
+ mmu = msm_iommu_new(iommu_dev, iommu);

   aspace = msm_gem_address_space_create(mmu, "mdp5",
   0x1000, 0x1 - 0x1000);






Re: [PATCH v11 11/24] drm/rockchip: dw_hdmi: Use auto-generated tables

2022-05-03 Thread Robin Murphy

On 2022-05-03 12:02, Heiko Stübner wrote:

On Friday, 22 April 2022, 09:28:28 CEST, Sascha Hauer wrote:

From: Douglas Anderson 

The previous tables for mpll_cfg and curr_ctrl were created using the
20-pages of example settings provided by the PHY vendor.  Those
example settings weren't particularly dense, so there were places
where we were guessing what the settings would be for 10-bit and
12-bit (not that we use those anyway).  It was also always a lot of
extra work every time we wanted to add a new clock rate since we had
to cross-reference several tables.

In  I've gone through the work to figure
out how to generate this table automatically.  Let's now use the
automatically generated table and then we'll never need to look at it
again.

We only support 8-bit mode right now and only support a small number
of clock rates and I've verified that the only 8-bit rate that was
affected was 148.5.  That mode appears to have been wrong in the old
table.

Signed-off-by: Douglas Anderson 
Signed-off-by: Yakir Yang 
Signed-off-by: Sascha Hauer 


This breaks hdmi on my rk3328-rock64 which then ends up in a
[CRTC:37:crtc-0] vblank wait timed out

warning-loop.


Oh yeah, that... IIRC from back when I was looking at it, it's because 
the inno-hdmi phy does its own rate validation at a point when it's 
already far too late to actually reject the mode. It manages to work at 
the moment because its set of supported rates mostly line up with those 
for the Synopsys phy which dw-hdmi-rockchip still insists on validating 
against even when a vendor phy is present.
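
As a generic illustration of validating a rate up front (not the actual
dw-hdmi or inno-hdmi code), a table-driven check might look like the sketch
below. The struct, the function names and the "setting" values are
hypothetical; only the two thresholds 30666000 and 61333000 are taken from
the table in the patch, and treating each entry as an inclusive upper bound
is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table entry: covers pixel clocks up to max_pixel_clock (Hz). */
struct rate_entry {
	unsigned long max_pixel_clock;
	unsigned int setting;	/* placeholder for per-rate PHY settings */
};

/* Example table; the final entry is a sentinel marking the end. */
const struct rate_entry example_table[] = {
	{ 30666000UL, 0 },
	{ 61333000UL, 1 },
	{ ~0UL, 0 },
};

/* Return the first entry covering pixel_clock, or NULL if none does. */
const struct rate_entry *lookup_rate(const struct rate_entry *tbl,
				     unsigned long pixel_clock)
{
	for (; tbl->max_pixel_clock != ~0UL; tbl++) {
		if (pixel_clock <= tbl->max_pixel_clock)
			return tbl;
	}
	return NULL;
}

/*
 * A mode-validation hook would call this early, so that an unsupported
 * rate is rejected before anything tries to program the hardware.
 */
bool rate_is_supported(const struct rate_entry *tbl, unsigned long pixel_clock)
{
	return lookup_rate(tbl, pixel_clock) != NULL;
}
```

The point of doing the lookup at validation time is that a failure there
simply drops the mode from the list, whereas a failure at configuration
time leaves nothing sensible to fall back to.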



Some part of the patch11-14 range also creates sparking horizontal
lines on my rk3288-pinky.


I figure that's the PLL jitter issue that's come up before. Similarly, 
when I last tried hacking in a 154MHz rate for my monitor's 1920x1200 
mode, it was rock solid on RK3399, but intolerably glitchy on RK3288.


Robin.


I haven't the time to dig overly deep into that, so left out the
hdmi-rate patches (11-14) for now.


Heiko



---

Notes:
 Changes since v5:
 - Add missing Signed-off-by me
 
 Changes since v3:

 - new patch

  drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 130 +++-
  1 file changed, 69 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
index fe4f9556239ac..cb43e7b47157d 100644
--- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
@@ -91,80 +91,88 @@ static struct rockchip_hdmi *to_rockchip_hdmi(struct 
drm_encoder *encoder)
  
  static const struct dw_hdmi_mpll_config rockchip_mpll_cfg[] = {

{
-   2700, {
-   { 0x00b3, 0x},
-   { 0x2153, 0x},
-   { 0x40f3, 0x}
+   30666000, {
+   { 0x00b3, 0x },
+   { 0x2153, 0x },
+   { 0x40f3, 0x },
},
-   }, {
-   3600, {
-   { 0x00b3, 0x},
-   { 0x2153, 0x},
-   { 0x40f3, 0x}
+   },  {
+   3680, {
+   { 0x00b3, 0x },
+   { 0x2153, 0x },
+   { 0x40a2, 0x0001 },
},
-   }, {
-   4000, {
-   { 0x00b3, 0x},
-   { 0x2153, 0x},
-   { 0x40f3, 0x}
+   },  {
+   4600, {
+   { 0x00b3, 0x },
+   { 0x2142, 0x0001 },
+   { 0x40a2, 0x0001 },
},
-   }, {
-   5400, {
-   { 0x0072, 0x0001},
-   { 0x2142, 0x0001},
-   { 0x40a2, 0x0001},
+   },  {
+   61333000, {
+   { 0x0072, 0x0001 },
+   { 0x2142, 0x0001 },
+   { 0x40a2, 0x0001 },
},
-   }, {
-   6500, {
-   { 0x0072, 0x0001},
-   { 0x2142, 0x0001},
-   { 0x40a2, 0x0001},
+   },  {
+   7360, {
+   { 0x0072, 0x0001 },
+   { 0x2142, 0x0001 },
+   { 0x4061, 0x0002 },
},
-   }, {
-   6600, {
-   { 0x013e, 0x0003},
-   { 0x217e, 0x0002},
-   { 0x4061, 0x0002}
+   },  {
+   9200, {
+   { 0x0072, 0x0001 },
+   { 0x2145, 0x0002 },
+   { 0x4061, 0x0002 },
},
-   }, {
-   7425, {
-   { 0x0072, 0x0001},
-   { 0x2145, 0x0002},
-

Re: [PATCH 2/3] drm/msm/mdp5: move iommu_domain_alloc() call close to its usage

2022-05-03 Thread Robin Murphy

On 2022-05-01 11:10, Dmitry Baryshkov wrote:

Move iommu_domain_alloc() in front of address space/IOMMU initialization.
This allows us to drop final bits of struct mdp5_cfg_platform which
remained from the pre-DT days.

Signed-off-by: Dmitry Baryshkov 
---
  drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 16 
  drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h |  6 --
  drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  6 --
  3 files changed, 4 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c
index 1bf9ff5dbabc..714effb967ff 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c
@@ -1248,8 +1248,6 @@ static const struct mdp5_cfg_handler cfg_handlers_v3[] = {
{ .revision = 3, .config = { .hw = _config } },
  };
  
-static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev);

-
  const struct mdp5_cfg_hw *mdp5_cfg_get_hw_config(struct mdp5_cfg_handler 
*cfg_handler)
  {
return cfg_handler->config.hw;
@@ -1274,10 +1272,8 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms 
*mdp5_kms,
uint32_t major, uint32_t minor)
  {
struct drm_device *dev = mdp5_kms->dev;
-   struct platform_device *pdev = to_platform_device(dev->dev);
struct mdp5_cfg_handler *cfg_handler;
const struct mdp5_cfg_handler *cfg_handlers;
-   struct mdp5_cfg_platform *pconfig;
int i, ret = 0, num_handlers;
  
  	cfg_handler = kzalloc(sizeof(*cfg_handler), GFP_KERNEL);

@@ -1320,9 +1316,6 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms 
*mdp5_kms,
cfg_handler->revision = minor;
cfg_handler->config.hw = mdp5_cfg;
  
-	pconfig = mdp5_get_config(pdev);

-   memcpy(_handler->config.platform, pconfig, sizeof(*pconfig));
-
DBG("MDP5: %s hw config selected", mdp5_cfg->name);
  
  	return cfg_handler;

@@ -1333,12 +1326,3 @@ struct mdp5_cfg_handler *mdp5_cfg_init(struct mdp5_kms 
*mdp5_kms,
  
  	return ERR_PTR(ret);

  }
-
-static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev)
-{
-   static struct mdp5_cfg_platform config = {};
-
-   config.iommu = iommu_domain_alloc(_bus_type);
-
-   return 
-}
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h
index 6b03d7899309..c2502cc33864 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.h
@@ -104,14 +104,8 @@ struct mdp5_cfg_hw {
uint32_t max_clk;
  };
  
-/* platform config data (ie. from DT, or pdata) */

-struct mdp5_cfg_platform {
-   struct iommu_domain *iommu;
-};
-
  struct mdp5_cfg {
const struct mdp5_cfg_hw *hw;
-   struct mdp5_cfg_platform platform;
  };
  
  struct mdp5_kms;

diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index 9b7bbc3adb97..1c67c2c828cd 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -558,6 +558,7 @@ static int mdp5_kms_init(struct drm_device *dev)
struct msm_gem_address_space *aspace;
int irq, i, ret;
struct device *iommu_dev;
+   struct iommu_domain *iommu;
  
  	ret = mdp5_init(to_platform_device(dev->dev), dev);
  
@@ -601,14 +602,15 @@ static int mdp5_kms_init(struct drm_device *dev)

}
mdelay(16);
  
-	if (config->platform.iommu) {

+   iommu = iommu_domain_alloc(&platform_bus_type);


To preempt the next change down the line as well, could this be 
rearranged to work as iommu_domain_alloc(iommu_dev->bus)?



+   if (iommu) {
struct msm_mmu *mmu;
  
  		iommu_dev = >dev;

if (!dev_iommu_fwspec_get(iommu_dev))


The fwspec helpers are more of an internal thing between the IOMMU 
drivers and the respective firmware code - I'd rather that external API 
users stuck consistently to using device_iommu_mapped() (it should give 
the same result).
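
To make the suggestion concrete, here is a small userspace model of the
"use this device if it is IOMMU-mapped, otherwise fall back to its parent"
selection. It is only a sketch: struct mock_device, mock_iommu_mapped() and
pick_iommu_dev() are hypothetical stand-ins, with mock_iommu_mapped()
mirroring my understanding that device_iommu_mapped() boils down to
checking whether the device has an IOMMU group.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the kernel's struct device. */
struct mock_device {
	struct mock_device *parent;
	void *iommu_group;	/* non-NULL once an IOMMU has claimed the device */
};

/* Stand-in for device_iommu_mapped(): is the device behind an IOMMU? */
bool mock_iommu_mapped(const struct mock_device *dev)
{
	return dev->iommu_group != NULL;
}

/*
 * Pick the device to hand to the IOMMU layer: the display device itself
 * if it is translated, otherwise its parent (the MDSS-style wrapper).
 */
struct mock_device *pick_iommu_dev(struct mock_device *mdp_dev)
{
	if (mock_iommu_mapped(mdp_dev))
		return mdp_dev;

	return mdp_dev->parent;
}
```

Once the device is chosen this way, its bus can be used for the domain
allocation instead of hard-coding the platform bus.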


Otherwise, thanks for sorting this out!

Robin.


iommu_dev = iommu_dev->parent;
  
-		mmu = msm_iommu_new(iommu_dev, config->platform.iommu);

+   mmu = msm_iommu_new(iommu_dev, iommu);
  
  		aspace = msm_gem_address_space_create(mmu, "mdp5",

0x1000, 0x1 - 0x1000);


Re: [PATCH] drm/msm: Revert "drm/msm: Stop using iommu_present()"

2022-04-19 Thread Robin Murphy

On 2022-04-19 22:08, Dmitry Baryshkov wrote:

On 20/04/2022 00:04, Robin Murphy wrote:

On 2022-04-19 14:04, Dmitry Baryshkov wrote:

This reverts commit e2a88eabb02410267519b838fb9b79f5206769be. The commit
in question makes msm_use_mmu() check whether the DRM 'component master'
device is translated by the IOMMU. At this moment it is the 'mdss'
device.
However on platforms using the MDP5 driver (e.g. MSM8916/APQ8016,
MSM8996/APQ8096) it's the mdp5 device, which has the iommus property
(and thus is "translated by the IOMMU"). This results in these devices
being broken with the following lines in the dmesg.

[drm] Initialized msm 1.9.0 20130625 for 1a0.mdss on minor 0
msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pm4.fw 
from new location
msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pfp.fw 
from new location

msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28
msm 1a0.mdss: could not allocate stolen bo
msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28
msm 1a0.mdss: [drm:msm_alloc_stolen_fb] *ERROR* failed to 
allocate buffer object

msm 1a0.mdss: [drm:msm_fbdev_create] *ERROR* failed to allocate fb

Getting the mdp5 device pointer from this function is not that easy at
this moment. Thus this patch is reverted till the MDSS rework [1] lands.
It will make the mdp5/dpu1 device component master and the check will be
legit.

[1] https://patchwork.freedesktop.org/series/98525/


Oh, DRM...

If that series is going to land for 5.19, could you please implement 
the correct equivalent of this patch within it?


Yes, that's the plan. I'm sending a reworked version of your patch 
shortly (but it still depends on [1]).




I'm fine with the revert for now if this patch doesn't work properly 
in all cases, but I have very little sympathy left for DRM drivers 
riding roughshod over all the standard driver model abstractions 
because they're "special". iommu_present() *needs* to go away, so if 
it's left to me to have a second go at fixing this driver next cycle, 
you're liable to get some abomination based on 
of_find_compatible_node() or similar, and I'll probably be demanding 
an ack to take it through the IOMMU tree ;)


No need for such measures :-)


Awesome, thanks!

Robin.


Re: [PATCH] drm/msm: Revert "drm/msm: Stop using iommu_present()"

2022-04-19 Thread Robin Murphy

On 2022-04-19 14:04, Dmitry Baryshkov wrote:

This reverts commit e2a88eabb02410267519b838fb9b79f5206769be. The commit
in question makes msm_use_mmu() check whether the DRM 'component master'
device is translated by the IOMMU. At this moment it is the 'mdss'
device.
However on platforms using the MDP5 driver (e.g. MSM8916/APQ8016,
MSM8996/APQ8096) it's the mdp5 device, which has the iommus property
(and thus is "translated by the IOMMU"). This results in these devices
being broken with the following lines in the dmesg.

[drm] Initialized msm 1.9.0 20130625 for 1a0.mdss on minor 0
msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pm4.fw from new 
location
msm 1a0.mdss: [drm:adreno_request_fw] loaded qcom/a300_pfp.fw from new 
location
msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28
msm 1a0.mdss: could not allocate stolen bo
msm 1a0.mdss: [drm:get_pages] *ERROR* could not get pages: -28
msm 1a0.mdss: [drm:msm_alloc_stolen_fb] *ERROR* failed to allocate buffer 
object
msm 1a0.mdss: [drm:msm_fbdev_create] *ERROR* failed to allocate fb

Getting the mdp5 device pointer from this function is not that easy at
this moment. Thus this patch is reverted till the MDSS rework [1] lands.
It will make the mdp5/dpu1 device component master and the check will be
legit.

[1] https://patchwork.freedesktop.org/series/98525/
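To illustrate why the choice of device matters here, the following is a minimal userspace sketch, not the driver's code. `struct dev_sketch` and `use_mmu_sketch()` are hypothetical names, and the sketch assumes (as the message above implies) that on the MDP5 platforms the mdss node has no iommus property while the mdp5 node does.

```c
#include <stdbool.h>

/* Hypothetical, simplified device: a name and whether it has an "iommus" property. */
struct dev_sketch {
	const char *name;
	bool has_iommus;
};

/*
 * Stand-in for a per-device "is this translated by the IOMMU" check.
 * The answer depends entirely on which device is passed in.
 */
static bool use_mmu_sketch(bool is_a2xx, const struct dev_sketch *checked)
{
	/* a2xx comes with its own MMU, so it short-circuits the device check. */
	return is_a2xx || checked->has_iommus;
}
```

Under those assumptions, passing the mdss device yields false even though the mdp5 device is behind an IOMMU, which is consistent with the stolen-memory allocation failures in the log above.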


Oh, DRM...

If that series is going to land for 5.19, could you please implement the 
correct equivalent of this patch within it?


I'm fine with the revert for now if this patch doesn't work properly in 
all cases, but I have very little sympathy left for DRM drivers riding 
roughshod over all the standard driver model abstractions because 
they're "special". iommu_present() *needs* to go away, so if it's left 
to me to have a second go at fixing this driver next cycle, you're 
liable to get some abomination based on of_find_compatible_node() or 
similar, and I'll probably be demanding an ack to take it through the 
IOMMU tree ;)


Thanks,
Robin.


Fixes: e2a88eabb024 ("drm/msm: Stop using iommu_present()")
Signed-off-by: Dmitry Baryshkov 
---
  drivers/gpu/drm/msm/msm_drv.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index b6702b0fafcb..e2b5307b2360 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -263,7 +263,7 @@ bool msm_use_mmu(struct drm_device *dev)
struct msm_drm_private *priv = dev->dev_private;
  
  	/* a2xx comes with its own MMU */

-   return priv->is_a2xx || device_iommu_mapped(dev->dev);
+   return priv->is_a2xx || iommu_present(&platform_bus_type);
  }
  
  static int msm_init_vram(struct drm_device *dev)


[PATCH v2] drm: tegra: Include DMA API header where used

2022-04-11 Thread Robin Murphy
Even though the IOVA API never actually needed it, iova.h is still
carrying an include of dma-mapping.h, now solely for the sake of not
breaking tegra-drm. Fix that properly.

Signed-off-by: Robin Murphy 
---

v2: Apparently nvdec.c needs one now too.

 drivers/gpu/drm/tegra/dc.c| 1 +
 drivers/gpu/drm/tegra/hub.c   | 1 +
 drivers/gpu/drm/tegra/nvdec.c | 1 +
 drivers/gpu/drm/tegra/plane.c | 1 +
 4 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index c6951cf5d2ca..bfc79c61bca6 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include <linux/dma-mapping.h>
 #include 
 #include 
 #include 
diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c
index b8d3174c04c9..5f9b85959fae 100644
--- a/drivers/gpu/drm/tegra/hub.c
+++ b/drivers/gpu/drm/tegra/hub.c
@@ -5,6 +5,7 @@
 
 #include 
 #include 
+#include <linux/dma-mapping.h>
 #include 
 #include 
 #include 
diff --git a/drivers/gpu/drm/tegra/nvdec.c b/drivers/gpu/drm/tegra/nvdec.c
index 79e1e88203cf..b412cc5c6db2 100644
--- a/drivers/gpu/drm/tegra/nvdec.c
+++ b/drivers/gpu/drm/tegra/nvdec.c
@@ -5,6 +5,7 @@
 
 #include 
 #include 
+#include <linux/dma-mapping.h>
 #include 
 #include 
 #include 
diff --git a/drivers/gpu/drm/tegra/plane.c b/drivers/gpu/drm/tegra/plane.c
index e0e6938c6200..e46adb107f77 100644
--- a/drivers/gpu/drm/tegra/plane.c
+++ b/drivers/gpu/drm/tegra/plane.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2017 NVIDIA CORPORATION.  All rights reserved.
  */
 
+#include <linux/dma-mapping.h>
 #include 
 #include 
 
-- 
2.28.0.dirty



[PATCH v2] drm/tegra: Stop using iommu_present()

2022-04-11 Thread Robin Murphy
Refactor the confusing logic to make it both clearer and more robust. If
the host1x parent device does have an IOMMU domain then iommu_present()
is redundantly true, while otherwise for the 32-bit DMA mask case it
still doesn't say whether the IOMMU driver actually knows about the DRM
device or not.

Signed-off-by: Robin Murphy 
---

v2: Fix logic for older SoCs and clarify.

 drivers/gpu/drm/tegra/drm.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 9464f522e257..4f2bdab31064 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1092,6 +1092,19 @@ static bool host1x_drm_wants_iommu(struct host1x_device *dev)
struct host1x *host1x = dev_get_drvdata(dev->dev.parent);
struct iommu_domain *domain;
 
+   /* For starters, this is moot if no IOMMU is available */
+   if (!device_iommu_mapped(&dev->dev))
+   return false;
+
+   /*
+* Tegra20 and Tegra30 don't support addressing memory beyond the
+* 32-bit boundary, so the regular GATHER opcodes will always be
+* sufficient and whether or not the host1x is attached to an IOMMU
+* doesn't matter.
+*/
+   if (host1x_get_dma_mask(host1x) <= DMA_BIT_MASK(32))
+   return true;
+
/*
 * If the Tegra DRM clients are backed by an IOMMU, push buffers are
 * likely to be allocated beyond the 32-bit boundary if sufficient
@@ -1122,14 +1135,13 @@ static bool host1x_drm_wants_iommu(struct host1x_device *dev)
domain = iommu_get_domain_for_dev(dev->dev.parent);
 
/*
-* Tegra20 and Tegra30 don't support addressing memory beyond the
-* 32-bit boundary, so the regular GATHER opcodes will always be
-* sufficient and whether or not the host1x is attached to an IOMMU
-* doesn't matter.
+* At the moment, the exact type of domain doesn't actually matter.
+* Only for 64-bit kernels might this be a managed DMA API domain, and
+* then only on newer SoCs using arm-smmu, since tegra-smmu doesn't
+* support default domains at all, and since those SoCs are the same
+* ones with extended GATHER support, even if it's a passthrough domain
+* it can still work out OK.
 */
-   if (!domain && host1x_get_dma_mask(host1x) <= DMA_BIT_MASK(32))
-   return true;
-
return domain != NULL;
 }
 
@@ -1149,7 +1161,7 @@ static int host1x_drm_probe(struct host1x_device *dev)
goto put;
}
 
-   if (host1x_drm_wants_iommu(dev) && iommu_present(&platform_bus_type)) {
+   if (host1x_drm_wants_iommu(dev)) {
tegra->domain = iommu_domain_alloc(&platform_bus_type);
if (!tegra->domain) {
err = -ENOMEM;
-- 
2.28.0.dirty



Re: [PATCH] drm/tegra: Stop using iommu_present()

2022-04-08 Thread Robin Murphy

On 2022-04-07 18:51, Dmitry Osipenko wrote:

On 4/6/22 21:06, Robin Murphy wrote:

On 2022-04-06 15:32, Dmitry Osipenko wrote:

On 4/5/22 17:19, Robin Murphy wrote:

Remove the pointless check. host1x_drm_wants_iommu() cannot return true
unless an IOMMU exists for the host1x platform device, which at the moment
means the iommu_present() test could never fail.

Signed-off-by: Robin Murphy 
---
   drivers/gpu/drm/tegra/drm.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 9464f522e257..bc4321561400 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1149,7 +1149,7 @@ static int host1x_drm_probe(struct host1x_device *dev)
   goto put;
   }
   -    if (host1x_drm_wants_iommu(dev) && iommu_present(&platform_bus_type)) {
+    if (host1x_drm_wants_iommu(dev)) {
   tegra->domain = iommu_domain_alloc(&platform_bus_type);
   if (!tegra->domain) {
   err = -ENOMEM;


host1x_drm_wants_iommu() returns true if there is no IOMMU for the
host1x platform device of Tegra20/30 SoCs.


Ah, apparently this is another example of what happens when I write
patches late on a Friday night...

So on second look, what we want to ascertain here is whether dev has an
IOMMU, but only if the host1x parent is not addressing-limited, either
because it can also use the IOMMU, or because all possible addresses are
small enough anyway, right?


Yes


Are we specifically looking for the host1x
having a DMA-API-managed domain, or can that also end up using the
tegra->domain or another unmanaged domain too?


We have host1x DMA that could have:

1. No IOMMU domain, depending on kernel/DT config
2. Managed domain, on newer SoCs
3. Unmanaged domain, on older SoCs

We have Tegra DRM devices which can:

1. Be attached to a shared unmanaged tegra->domain, on older SoCs
2. Have own managed domains, on newer SoCs


I can't quite figure out
from the comments whether it's physical addresses, IOVAs, or both that
we're concerned with here.


Tegra DRM allocates buffers and submits jobs to h/w using host1x's
channel DMA. DRM framebuffer addresses are inserted into host1x
command buffers by the kernel driver, and addresses beyond the 32-bit
space need to be treated specially; we don't support such addresses
upstream.

The IOMMU address space is limited to 32 bits on Tegra in the upstream
kernel for pre-T186 SoCs, which hides 64-bit addresses from host1x.
Post-T186 SoCs have extra features that allow the kernel driver not to
bother about addresses.

For newer ARM64 SoCs there is an assumption in the Tegra drivers that an
IOMMU is always present, to simplify things.


That summary helps a lot, thank you!

I was particularly worried about the case where the host1x has a 
passthrough domain, which we'll assume is a DMA domain and leave in 
place, but if all the SoCs with the 32-bit gather limitation are also 
the ones with tegra-smmu, which doesn't support default domains anyway, 
then it sounds like that's a non-issue.
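As a rough illustration of the decision order discussed in this thread (and as the v2 patch arranges the visible hunks), here is a minimal userspace sketch. It is not the driver's code: `wants_iommu_sketch()`, its three parameters, and `SKETCH_DMA_BIT_MASK()` are hypothetical stand-ins, and any logic in the parts of the function not shown in the diff is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(); redefined so the sketch builds in userspace. */
#define SKETCH_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical stand-in for host1x_drm_wants_iommu(). The parameters are
 * illustrative inputs, not kernel API:
 *   client_iommu_mapped - the DRM client device sits behind an IOMMU
 *   host1x_dma_mask     - DMA mask of the host1x parent
 *   host1x_has_domain   - the host1x parent has any IOMMU domain attached
 */
static bool wants_iommu_sketch(bool client_iommu_mapped,
			       uint64_t host1x_dma_mask,
			       bool host1x_has_domain)
{
	/* No IOMMU behind the DRM client: the question is moot. */
	if (!client_iommu_mapped)
		return false;

	/* host1x limited to 32 bits: regular GATHER opcodes always suffice. */
	if (host1x_dma_mask <= SKETCH_DMA_BIT_MASK(32))
		return true;

	/* Otherwise the host1x parent needs a domain of its own. */
	return host1x_has_domain;
}
```

The point of the ordering is that the 32-bit case returns true only once an IOMMU is known to exist for the client, which is the case the original one-line patch got wrong.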


I'll give this a bit more thought to make sure I really get it right, 
and send a v2 next week.


Thanks,
Robin.


Re: [PATCH] drm/tegra: Stop using iommu_present()

2022-04-06 Thread Robin Murphy

On 2022-04-06 15:32, Dmitry Osipenko wrote:

On 4/5/22 17:19, Robin Murphy wrote:

Remove the pointless check. host1x_drm_wants_iommu() cannot return true
unless an IOMMU exists for the host1x platform device, which at the moment
means the iommu_present() test could never fail.

Signed-off-by: Robin Murphy 
---
  drivers/gpu/drm/tegra/drm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 9464f522e257..bc4321561400 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1149,7 +1149,7 @@ static int host1x_drm_probe(struct host1x_device *dev)
goto put;
}
  
-	if (host1x_drm_wants_iommu(dev) && iommu_present(&platform_bus_type)) {

+   if (host1x_drm_wants_iommu(dev)) {
tegra->domain = iommu_domain_alloc(&platform_bus_type);
if (!tegra->domain) {
err = -ENOMEM;


host1x_drm_wants_iommu() returns true if there is no IOMMU for the
host1x platform device of Tegra20/30 SoCs.


Ah, apparently this is another example of what happens when I write 
patches late on a Friday night...


So on second look, what we want to ascertain here is whether dev has an 
IOMMU, but only if the host1x parent is not addressing-limited, either 
because it can also use the IOMMU, or because all possible addresses are 
small enough anyway, right? Are we specifically looking for the host1x 
having a DMA-API-managed domain, or can that also end up using the 
tegra->domain or another unmanaged domain too? I can't quite figure out 
from the comments whether it's physical addresses, IOVAs, or both that 
we're concerned with here.


Thanks,
Robin.


Re: [PATCH v9 01/23] clk: rk3568: Mark hclk_vo as critical

2022-04-06 Thread Robin Murphy

On 2022-03-28 16:10, Sascha Hauer wrote:

Whenever pclk_vo is enabled hclk_vo must be enabled as well. This is
described in the Reference Manual as:

| 2.8.6 NIU Clock gating reliance
|
| A part of niu clocks have a dependence on another niu clock in order to
| sharing the internal bus. When these clocks are in use, another niu
| clock must be opened, and cannot be gated.  These clocks and the special
| clock on which they are relied are as following:
|
| Clocks which have dependency The clock which can not be gated
| -
| ...
| pclk_vo_niu, hclk_vo_s_niu   hclk_vo_niu
| ...

The clock framework doesn't offer a way to enable clock B whenever clock A is
enabled, at least not when B is not an ancestor of A. Work around this by
marking hclk_vo as critical so it is never disabled. This is suboptimal in
terms of power consumption, but a stopgap solution until the clock framework
has a way to deal with this.
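The effect of a critical-clock list can be sketched in plain userspace C as follows. This is only an illustration of the idea, not the Rockchip driver or the clock framework: the helper names are hypothetical and the list holds just the entries visible in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of a critical-clock list; only names visible in the patch. */
static const char *const critical_clocks_sketch[] = {
	"hclk_php",
	"pclk_php",
	"hclk_usb",
	"hclk_vo",
};

/* Hypothetical helper: true if the named clock is on the critical list. */
static bool clk_is_critical_sketch(const char *name)
{
	size_t n = sizeof(critical_clocks_sketch) / sizeof(critical_clocks_sketch[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(critical_clocks_sketch[i], name) == 0)
			return true;
	return false;
}

/* Hypothetical helper: a clock may be gated only if it is not critical. */
static bool clk_may_gate_sketch(const char *name)
{
	return !clk_is_critical_sketch(name);
}
```

In this model hclk_vo can never be gated, so consumers that only enable pclk_vo still find it running, at the cost of keeping it on when nothing needs it.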

We have this clock tree:

|  aclk_vo  220   3  0 0  5 Y
| aclk_hdcp 000   3  0 0  5 N
| pclk_vo   2307500  0 0  5 Y
|pclk_edp_ctrl  0007500  0 0  5 N
|pclk_dsitx_1   0007500  0 0  5 N
|pclk_dsitx_0   1207500  0 0  5 Y
|pclk_hdmi_host 1207500  0 0  5 Y
|pclk_hdcp  0007500  0 0  5 N
| hclk_vo   250   15000  0 0  5 Y
|hclk_hdcp  000   15000  0 0  5 N
|hclk_vop   020   15000  0 0  5 N

Without this patch the edp, dsitx, hdmi and hdcp driver would enable their
clocks which then enables pclk_vo, but hclk_vo stays disabled and register
accesses just hang. hclk_vo is enabled by the VOP2 driver, so reproducibility
of this issue depends on the probe order.


FWIW,

Reviewed-by: Robin Murphy 


Signed-off-by: Sascha Hauer 
---

Notes:
 Changes since v8:
 - new patch

  drivers/clk/rockchip/clk-rk3568.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/clk/rockchip/clk-rk3568.c b/drivers/clk/rockchip/clk-rk3568.c
index 63dfbeeeb06d9..62694d95173ab 100644
--- a/drivers/clk/rockchip/clk-rk3568.c
+++ b/drivers/clk/rockchip/clk-rk3568.c
@@ -1591,6 +1591,7 @@ static const char *const rk3568_cru_critical_clocks[] __initconst = {
"hclk_php",
"pclk_php",
"hclk_usb",
+   "hclk_vo",
  };
  
  static const char *const rk3568_pmucru_critical_clocks[] __initconst = {

