Re: [PATCH v2 5/5] vfio/iommu_type1: Simplify group attachment

2022-06-17 Thread Nicolin Chen via iommu
On Fri, Jun 17, 2022 at 02:53:13AM +0000, Tian, Kevin wrote:
> > > ...
> > > > - if (resv_msi) {
> > > > + if (resv_msi && !domain->msi_cookie) {
> > > >   ret = iommu_get_msi_cookie(domain->domain,
> > > > resv_msi_base);
> > > >   if (ret && ret != -ENODEV)
> > > >   goto out_detach;
> > > > + domain->msi_cookie = true;
> > > >   }
> > >
> > > why not moving to alloc_attach_domain() then no need for the new
> > > domain field? It's required only when a new domain is allocated.
> >
> > When reusing an existing domain that doesn't have an msi_cookie,
> > we can do iommu_get_msi_cookie() if resv_msi is found. So it is
> > not limited to a new domain.
> 
> Looks msi_cookie requirement is per platform (currently only
> for smmu. see arm_smmu_get_resv_regions()). If there is
> no mixed case then above check is not required.

Do you mean "reusing existing domain" for the "mixed case"?

> But let's hear whether Robin has a different thought here.

Yea, sure.

> > > > - iommu_domain_free(domain->domain);
> > > > - list_del(&domain->next);
> > > > - kfree(domain);
> > > > - vfio_iommu_aper_expand(iommu, &iova_copy);
> > >
> > > Previously the aperture is adjusted when a domain is freed...
> > >
> > > > - vfio_update_pgsize_bitmap(iommu);
> > > > - }
> > > > - /*
> > > > -  * Removal of a group without dirty tracking may allow
> > > > -  * the iommu scope to be promoted.
> > > > -  */
> > > > - if (!group->pinned_page_dirty_scope) {
> > > > - iommu->num_non_pinned_groups--;
> > > > - if (iommu->dirty_page_tracking)
> > > > - vfio_iommu_populate_bitmap_full(iommu);
> > > > - }
> > > > + vfio_iommu_detach_destroy_domain(domain, iommu,
> > > > group);
> > > >   kfree(group);
> > > >   break;
> > > >   }
> > > >
> > > > + vfio_iommu_aper_expand(iommu, &iova_copy);
> > >
> > > but now it's done for every group detach. The aperture is decided
> > > by domain geometry which is not affected by attached groups.
> >
> > Yea, I've noticed this part. Actually Jason did this change for
> > simplicity, and I think it'd be safe to do so?
> 
> Perhaps detach_destroy() can return a Boolean to indicate whether
> a domain is destroyed.

It could be a solution but doesn't feel that common for a clean
function to have a return value indicating a special case. Maybe
passing in "" so that we can check if it's NULL after?


Aw: Re: helping with remapping vmem for dma

2022-06-17 Thread Frank Wunderlich
> Sent: Friday, 17 June 2022 at 19:22
> From: "Robin Murphy"
> To: "Frank Wunderlich", "Christoph Hellwig"
> Cc: linux-ker...@vger.kernel.org, iommu@lists.linux-foundation.org, "Marek Szyprowski"
> Subject: Re: helping with remapping vmem for dma
>
> On 2022-06-17 17:17, Frank Wunderlich wrote:
> > On 15 June 2022 15:17:00 CEST, Christoph Hellwig wrote:
> >> On Wed, Jun 15, 2022 at 02:15:33PM +0100, Robin Murphy wrote:
> >>> Put simply, if you want to call dma_map_single() on a buffer, then that
> >>> buffer needs to be allocated with kmalloc() (or technically alloc_pages(),
> >>> but then dma_map_page() would make more sense when dealing with entire
> >>> pages.
> >>
> >> Yes.  It sounds like the memory here comes from the dma coherent
> >> allocator, in which case the code need to use the address returned
> >> by that and not create another mapping.
> >
> > Hi
> >
> > traced it to buffer allocated as simple uint8-array [1]:
> >
> > UINT_8 aucBuffer[sizeof(INIT_HIF_RX_HEADER_T) + 
> > sizeof(INIT_EVENT_CMD_RESULT)];
>
> Ah, so it's trying to do DMA with a stack variable? CONFIG_DMA_API_DEBUG
> is your friend; it should have screamed about that specifically.
> Allocate this buffer properly to begin with, and free it again on the
> way out of the function (it's surely not worth having to make a
> temporary copy further down the callchain). The kmalloc flags can
> probably be regular GFP_KERNEL, unless this can be called from more
> restrictive contexts like an IRQ handler - the fact that it might be
> mapped for DMA later is essentially irrelevant in that respect.

Hi,

I simply replaced the stack variables with UINT_8 pointers and use kmalloc()/kfree() for
the memory handling (some returns had to be changed to gotos so the memory is always freed).
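
Roughly, the pattern ends up like this (a simplified, hypothetical sketch of the
change; the wrapper function name is made up, the types and nicRxWaitResponse()
come from the driver sources linked in this thread):

static WLAN_STATUS wlanExampleWaitForEvent(IN P_ADAPTER_T prAdapter)
{
	UINT_32 u4RxLen;
	UINT_8 *aucBuffer;
	WLAN_STATUS status;

	/* heap allocation instead of an on-stack array, so it is DMA-safe */
	aucBuffer = kmalloc(sizeof(INIT_HIF_RX_HEADER_T) +
			    sizeof(INIT_EVENT_CMD_RESULT), GFP_KERNEL);
	if (!aucBuffer)
		return WLAN_STATUS_FAILURE;

	status = nicRxWaitResponse(prAdapter, 0, aucBuffer,
				   sizeof(INIT_HIF_RX_HEADER_T) +
				   sizeof(INIT_EVENT_CMD_RESULT), &u4RxLen);

	/* every former early "return" becomes a jump here, so the buffer is freed */
	kfree(aucBuffer);
	return status;
}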

Thanks very much for support, driver is now working again :)

https://github.com/frank-w/BPI-R2-4.14/commit/7f3a721d5b0d8ca44935c23d5513a19cc57786c0

> Thanks,
> Robin.


Re: helping with remapping vmem for dma

2022-06-17 Thread Robin Murphy

On 2022-06-17 17:17, Frank Wunderlich wrote:

Am 15. Juni 2022 15:17:00 MESZ schrieb Christoph Hellwig :

On Wed, Jun 15, 2022 at 02:15:33PM +0100, Robin Murphy wrote:

Put simply, if you want to call dma_map_single() on a buffer, then that
buffer needs to be allocated with kmalloc() (or technically alloc_pages(),
but then dma_map_page() would make more sense when dealing with entire
pages.


Yes.  It sounds like the memory here comes from the dma coherent
allocator, in which case the code need to use the address returned
by that and not create another mapping.


Hi

traced it to buffer allocated as simple uint8-array [1]:

UINT_8 aucBuffer[sizeof(INIT_HIF_RX_HEADER_T) + sizeof(INIT_EVENT_CMD_RESULT)];


Ah, so it's trying to do DMA with a stack variable? CONFIG_DMA_API_DEBUG 
is your friend; it should have screamed about that specifically. 
Allocate this buffer properly to begin with, and free it again on the 
way out of the function (it's surely not worth having to make a 
temporary copy further down the callchain). The kmalloc flags can 
probably be regular GFP_KERNEL, unless this can be called from more 
restrictive contexts like an IRQ handler - the fact that it might be 
mapped for DMA later is essentially irrelevant in that respect.


Thanks,
Robin.



and then called as

nicRxWaitResponse(prAdapter,0,aucBuffer,sizeof(INIT_HIF_RX_HEADER_T) + 
sizeof(INIT_EVENT_CMD_RESULT),/* 4B + 4B */
)

this calls [2]:

WLAN_STATUS
nicRxWaitResponse(IN P_ADAPTER_T prAdapter,
  IN UINT_8 ucPortIdx, OUT PUINT_8 pucRspBuffer, IN UINT_32 
u4MaxRespBufferLen, OUT PUINT_32 pu4Length)
{
...
HAL_PORT_RD(prAdapter,ucPortIdx == 0 ? MCR_WRDR0 : MCR_WRDR1, u4PktLen, 
pucRspBuffer, u4MaxRespBufferLen);
}


nicRxWaitResponse contains a do-while(true) loop, but it looks like this is 
failing on the first call (I see my debug output before the call of HAL_PORT_RD only once)...

Do I need kmalloc before the call of nicRxWaitResponse, and if yes, which flags are 
right here?
https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html

callstack is like this:

[  126.919569]  __dma_page_dev_to_cpu from kalDevPortRead+0x3a0/0x6b4 
[wlan_gen2]
[  126.928643]  kalDevPortRead [wlan_gen2] from nicRxWaitResponse+0x4c0/0x61c 
[wlan_gen2]
[  126.939752]  nicRxWaitResponse [wlan_gen2] from 
wlanImageSectionDownloadStatus.part.0+0xd0/0x26c [wlan_gen2]
[  126.952783]  wlanImageSectionDownloadStatus.part.0 [wlan_gen2] from 
wlanImageSectionDownload+0x168/0x36c [wlan_gen2]
[  126.966511]  wlanImageSectionDownload [wlan_gen2] from 
wlanAdapterStart+0x674/0xf94 [wlan_gen2]
[  126.978367]  wlanAdapterStart [wlan_gen2] from wlanProbe+0x318/0xbe8 
[wlan_gen2]
[  126.988951]  wlanProbe [wlan_gen2] from HifAhbProbe+0x54/0x88 [wlan_gen2]
[  126.998905]  HifAhbProbe [wlan_gen2] from wmt_func_wifi_on+0x4c/0x150

regards Frank

[1] 
https://github.com/frank-w/BPI-R2-4.14/blob/5.18-main/drivers/misc/mediatek/connectivity/wlan/gen2/common/wlan_lib.c#L3046
[2] 
https://github.com/frank-w/BPI-R2-4.14/blob/5.18-main/drivers/misc/mediatek/connectivity/wlan/gen2/nic/nic_rx.c#L3604



Re: helping with remapping vmem for dma

2022-06-17 Thread Frank Wunderlich
On 15 June 2022 15:17:00 CEST, Christoph Hellwig wrote:
>On Wed, Jun 15, 2022 at 02:15:33PM +0100, Robin Murphy wrote:
>> Put simply, if you want to call dma_map_single() on a buffer, then that
>> buffer needs to be allocated with kmalloc() (or technically alloc_pages(),
>> but then dma_map_page() would make more sense when dealing with entire
>> pages.
>
>Yes.  It sounds like the memory here comes from the dma coherent
>allocator, in which case the code need to use the address returned
>by that and not create another mapping.

Hi

traced it to buffer allocated as simple uint8-array [1]:

UINT_8 aucBuffer[sizeof(INIT_HIF_RX_HEADER_T) + sizeof(INIT_EVENT_CMD_RESULT)];

and then called as

nicRxWaitResponse(prAdapter,0,aucBuffer,sizeof(INIT_HIF_RX_HEADER_T) + 
sizeof(INIT_EVENT_CMD_RESULT),/* 4B + 4B */
)

this calls [2]:

WLAN_STATUS
nicRxWaitResponse(IN P_ADAPTER_T prAdapter,
  IN UINT_8 ucPortIdx, OUT PUINT_8 pucRspBuffer, IN UINT_32 
u4MaxRespBufferLen, OUT PUINT_32 pu4Length)
{
...
HAL_PORT_RD(prAdapter,ucPortIdx == 0 ? MCR_WRDR0 : MCR_WRDR1, u4PktLen, 
pucRspBuffer, u4MaxRespBufferLen);
}


nicRxWaitResponse contains a do-while(true) loop, but it looks like this is 
failing on the first call (I see my debug output before the call of HAL_PORT_RD only once)...

Do I need kmalloc before the call of nicRxWaitResponse, and if yes, which flags are 
right here?
https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html

callstack is like this:

[  126.919569]  __dma_page_dev_to_cpu from kalDevPortRead+0x3a0/0x6b4 
[wlan_gen2]
[  126.928643]  kalDevPortRead [wlan_gen2] from nicRxWaitResponse+0x4c0/0x61c 
[wlan_gen2]
[  126.939752]  nicRxWaitResponse [wlan_gen2] from 
wlanImageSectionDownloadStatus.part.0+0xd0/0x26c [wlan_gen2]
[  126.952783]  wlanImageSectionDownloadStatus.part.0 [wlan_gen2] from 
wlanImageSectionDownload+0x168/0x36c [wlan_gen2]
[  126.966511]  wlanImageSectionDownload [wlan_gen2] from 
wlanAdapterStart+0x674/0xf94 [wlan_gen2]
[  126.978367]  wlanAdapterStart [wlan_gen2] from wlanProbe+0x318/0xbe8 
[wlan_gen2]
[  126.988951]  wlanProbe [wlan_gen2] from HifAhbProbe+0x54/0x88 [wlan_gen2]
[  126.998905]  HifAhbProbe [wlan_gen2] from wmt_func_wifi_on+0x4c/0x150

regards Frank

[1] 
https://github.com/frank-w/BPI-R2-4.14/blob/5.18-main/drivers/misc/mediatek/connectivity/wlan/gen2/common/wlan_lib.c#L3046
[2] 
https://github.com/frank-w/BPI-R2-4.14/blob/5.18-main/drivers/misc/mediatek/connectivity/wlan/gen2/nic/nic_rx.c#L3604


[RFC PATCH V4 1/1] swiotlb: Split up single swiotlb lock

2022-06-17 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is returned to the area where the memory was originally
allocated from.

The number of areas can be set via swiotlb_adjust_nareas() and the swiotlb
kernel parameter.
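
As background, a rough sketch of the allocation path described above (hypothetical
code, not the patch itself; area_find_slots() is a made-up stand-in for the per-area
slot search, the struct fields match the ones added below):

static int swiotlb_area_alloc_sketch(struct io_tlb_mem *mem, size_t alloc_size)
{
	unsigned int start = raw_smp_processor_id() & (mem->nareas - 1);
	unsigned int i = start;

	do {
		struct io_tlb_area *area = &mem->areas[i];
		unsigned long flags;
		int index;

		/* per-area lock instead of one global swiotlb lock */
		spin_lock_irqsave(&area->lock, flags);
		index = area_find_slots(mem, area, i, alloc_size);
		spin_unlock_irqrestore(&area->lock, flags);
		if (index >= 0)
			return index;

		/* this CPU's area is full, try the next one */
		i = (i + 1) & (mem->nareas - 1);	/* nareas is a power of 2 */
	} while (i != start);

	return -1;
}

With the documentation change below, the area count can also be chosen on the
command line, e.g. "swiotlb=65536,4" for 65536 slabs split across 4 areas.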

This idea comes from Andi Kleen's patch (https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen 
Signed-off-by: Tianyu Lan 
---
 .../admin-guide/kernel-parameters.txt |   4 +-
 include/linux/swiotlb.h   |  27 +++
 kernel/dma/swiotlb.c  | 202 ++
 3 files changed, 194 insertions(+), 39 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 8090130b544b..5d46271982d5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5869,8 +5869,10 @@
it if 0 is given (See 
Documentation/admin-guide/cgroup-v1/memory.rst)
 
 	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
-			Format: { <int> | force | noforce }
+			Format: { <int> [,<int>] | force | noforce }
 			<int> -- Number of I/O TLB slabs
+			<int> -- Second integer after comma. Number of swiotlb
+				 areas with their own lock. Must be power of 2.
 			force -- force using of bounce buffers even if they
 				 wouldn't be automatically used by the kernel
 			noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..7157428cf3ac 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,6 +62,22 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
 
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:  The number of used IO TLB block.
+ * @index: The slot index to start searching in this area for next round.
+ * @lock:  The lock to protect the above data structures in the map and
+ * unmap calls.
+ */
+struct io_tlb_area {
+   unsigned long used;
+   unsigned int index;
+   spinlock_t lock;
+};
+
 /**
  * struct io_tlb_mem - IO TLB Memory Pool Descriptor
  *
@@ -89,6 +105,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @nareas:  The area number in the pool.
+ * @area_nslabs: The slot number in the area.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +120,9 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -130,6 +151,7 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+void __init swiotlb_adjust_nareas(unsigned int nareas);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -162,6 +184,11 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline void swiotlb_adjust_nareas(unsigned int nareas)
+{
+}
+
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..139d08068912 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -62,6 +62,8 @@
 
 #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
+#define DEFAULT_NUM_AREAS 1
+
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
@@ -70,6 +72,7 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas = DEFAULT_NUM_AREAS;
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +82,10 @@ setup_io_tlb_npages(char *str)
default_nslabs =

Re: [PATCH V2 2/6] iommu: iova: properly handle 0 as a valid IOVA address

2022-06-17 Thread Marek Szyprowski
Hi Robin,

On 06.06.2022 14:38, Robin Murphy wrote:
> On 2022-06-02 16:58, Marek Szyprowski wrote:
>> On 23.05.2022 19:30, Robin Murphy wrote:
>>> On 2022-05-11 13:15, Ajay Kumar wrote:
 From: Marek Szyprowski 

 Zero is a valid DMA and IOVA address on many architectures, so adjust
 the
 IOVA management code to properly handle it. A new value IOVA_BAD_ADDR
 (~0UL) is introduced as a generic value for the error case. Adjust all
 callers of the alloc_iova_fast() function for the new return value.
>>>
>>> And when does anything actually need this? In fact if you were to stop
>>> iommu-dma from reserving IOVA 0 - which you don't - it would only show
>>> how patch #3 is broken.
>>
>> I don't get why you say that patch #3 is broken.
>
> My mistake, in fact it's already just as broken either way - I forgot 
> that that's done implicitly via iovad->start_pfn rather than an actual 
> reserve_iova() entry. I see there is an initial calculation attempting 
> to honour start_pfn in patch #3, but it's always immediately 
> overwritten. More critically, the rb_first()/rb_next walk means the 
> starting point can only creep inevitably upwards as older allocations 
> are freed, so will end up pathologically stuck at or above limit_pfn 
> and unable to recover. At best it's more "next-fit" than "first-fit", 
> and TBH the fact that it could ever even allocate anything at all in 
> an empty domain makes me want to change the definition of IOVA_ANCHOR ;)
>
>> The main issue with
>> address 0 being reserved appears when one uses the first-fit allocation
>> algorithm. In such a case the first allocation will be at address 0,
>> which causes some issues. This patch simply makes the whole iova related
>> code capable of such case. This mimics the similar code already used on
>> ARM 32bit. There are drivers that rely on the way the IOVAs are
>> allocated. This is especially important for the drivers for devices with
>> limited addressing capabilities. And there exists such and they can be
>> even behind the IOMMU.
>>
>> Lets focus on the s5p-mfc driver and the way one of the supported
>> hardware does the DMA. The hardware has very limited DMA window (256M)
>> and addresses all memory buffers as an offset from the firmware base.
>> Currently it has been assumed that the first allocated buffer will be on
>> the beginning of the IOVA space. This worked fine on ARM 32bit and
>> supporting such case is a target of this patchset.
>
> I still understand the use-case; I object to a solution where PFN 0 is 
> always reserved yet sometimes allocated. Not to mention the slightly 
> more fundamental thing of the whole lot not actually working the way 
> it's supposed to.
>
> I also remain opposed to baking in secret ABIs where allocators are 
> expected to honour a particular undocumented behaviour. FWIW I've had 
> plans for a while now to make the IOVA walk bidirectional to make the 
> existing retry case more efficient, at which point it's then almost 
> trivial to implement "bottom-up" allocation, which I'm thinking I 
> might then force on by default for CONFIG_ARM to minimise any further 
> surprises for v2 of the default domain conversion. However I'm 
> increasingly less convinced that it's really sufficient for your case, 
> and am leaning back towards the opinion that any driver with really 
> specific constraints on how its DMA addresses are laid out should not 
> be passively relying on a generic allocator to do exactly what it 
> wants. A simple flag saying "try to allocate DMA addresses bottom-up" 
> doesn't actually mean "allocate this thing on a 256MB-aligned address 
> and everything subsequent within a 256MB window above it", so let's 
> not build a fragile house of cards on top of pretending that it does.


Okay, I see your point. I'm fine with adding more of the IOVA allocation 
logic to the respective driver (s5p-mfc), however I would still like to 
use the dma-mapping framework and helpers, which depend on it (like 
v4l2-videobuf2, dma-buf, and so on). Would it be fine for you to let 
device drivers to access the IOVA domains hidden by the DMA-IOMMU glue 
code to mark certain parts of IOVA space as reserved or to manually 
remap the buffer (like firmware) with specific address requirements?


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-17 Thread Zhangfei Gao



On 2022/6/17 4:20 PM, Zhangfei Gao wrote:



On 2022/6/17 2:05 PM, Zhangfei Gao wrote:



On 2022/6/16 4:14 PM, Jean-Philippe Brucker wrote:

On Thu, Jun 16, 2022 at 12:10:18PM +0800, Zhangfei Gao wrote:

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 281c54003edc..b6219c6bfb48 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode 
*inode, struct file *filep)

   if (!q)
   return -ENOMEM;
+    mutex_lock(&uacce->queues_lock);
+
+    if (!uacce->parent->driver) {
I don't think this is useful, because the core clears 
parent->driver after

having run uacce_remove():

    rmmod hisi_zip    open()
 ... uacce_fops_open()
 __device_release_driver()  ...
  pci_device_remove()
   hisi_zip_remove()
    hisi_qm_uninit()
 uacce_remove()
  ...  ...
   mutex_lock(uacce->queues_lock)
  ...  if (!uacce->parent->driver)
  device_unbind_cleanup()  /* driver still valid, proceed */
   dev->driver = NULL
The check if (!uacce->parent->driver) is required, otherwise a NULL pointer
dereference may happen.

I agree we need something, what I mean is that this check is not
sufficient.


iommu_sva_bind_device
const struct iommu_ops *ops = dev_iommu_ops(dev);  ->
dev->iommu->iommu_dev->ops

rmmod has no issue, but remove parent pci device has the issue.
Ah right, relying on the return value of bind() wouldn't be enough 
even if

we mandated SVA.

[...]
I think we need the global uacce_mutex to serialize uacce_remove() 
and

uacce_fops_open(). uacce_remove() would do everything, including
xa_erase(), while holding that mutex. And uacce_fops_open() would 
try to
obtain the uacce object from the xarray while holding the mutex, 
which

fails if the uacce object is being removed.
Since fops_open gets the char device refcount, uacce_release will not
happen until open returns.

The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And after
uacce_remove()

runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

uacce_fops_open()
 if (!uacce->parent->driver)
 /* Still valid, keep going */
 ...    rmmod
 uacce_remove()
 ... free_module()
 uacce->ops->get_queue() /* BUG */


uacce_remove should wait on uacce->queues_lock until fops_open
releases the lock.
If open happens just after uacce_remove unlocks, uacce_bind_queue
in open should fail.


Accessing uacce->ops after free_module() is a use-after-free. We need all
the fops to synchronize with uacce_remove() to ensure they don't use any
resource of the parent after it's been freed.

You mean the parent releases the resources.
After fops_open, we are currently counting on the parent driver to stop all
DMA first and then call uacce_remove, which is an assumption.
For example drivers/crypto/hisilicon/zip/zip_main.c:
hisi_qm_wait_task_finish, which will wait for uacce_release.

If I comment this out, there may be another issue:
Unable to handle kernel paging request at virtual address
8b700204

pc : hisi_qm_cache_wb.part.0+0x2c/0xa0


I see uacce_fops_poll() may have the same problem, and should be inside
uacce_mutex.
Do we need to consider this case, where uacce_remove can happen anytime
without waiting for DMA to stop?


Not sure uacce_mutex can do this.
Currently the sequence is
mutex_lock(&uacce->queues_lock);
mutex_lock(&uacce_mutex);

Or we set all the callbacks of uacce_ops to NULL?

How about in uacce_remove
mutex_lock(&uacce_mutex);
uacce->ops = NULL;
mutex_unlock(&uacce_mutex);

And check uacce->ops  first when using.
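
Putting the two ideas together, a hypothetical sketch of the open path with the
global mutex plus the NULL check (not the diff below; uacce_xa, uacce_mutex and
iminor() are as used by the existing driver):

static int uacce_fops_open_sketch(struct inode *inode, struct file *filep)
{
	struct uacce_device *uacce;
	int ret = 0;

	uacce = xa_load(&uacce_xa, iminor(inode));
	if (!uacce)
		return -ENODEV;

	mutex_lock(&uacce_mutex);
	if (!uacce->ops) {		/* uacce_remove() has already run */
		ret = -ENXIO;
		goto out_unlock;
	}
	/* ... allocate the queue, call uacce->ops->get_queue(), etc. ... */
out_unlock:
	mutex_unlock(&uacce_mutex);
	return ret;
}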



Diff like this, will merge together.

 drivers/misc/uacce/uacce.c | 65 --
 1 file changed, 56 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index b6219c6bfb48..311192728132 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -23,6 +23,11 @@ static int uacce_start_queue(struct uacce_queue *q)
     goto out_with_lock;
 }

+    if (!q->uacce->ops) {
+        ret = -EINVAL;
+        goto out_with_lock;
+    }
+
 if (q->uacce->ops->start_queue) {
     ret = q->uacce->ops->start_queue(q);
     if (ret < 0)
@@ -46,6 +51,9 @@ static int uacce_put_queue(struct uacce_queue *q)
 if (q->state == UACCE_Q_ZOMBIE)
     goto out;

+    if (!uacce->ops)
+        goto out;
+
 if ((q->state == UACCE_Q_STARTED) && uacce->ops->stop_queue)
     uacce->ops->stop_queue(q);

@@ -65,6 +73,7 @@ static long uacce_fops_unl_ioctl(struct file *filep,
 {
 struct uacce_queue *q = filep->private_data;
 struct uacce_device 

Re: [PATCH v2 2/3] scsi: BusLogic remove bus_to_virt

2022-06-17 Thread Robin Murphy

On 2022-06-17 13:57, Arnd Bergmann wrote:

From: Arnd Bergmann 

The BusLogic driver is the last remaining driver that relies on the
deprecated bus_to_virt() function, which in turn only works on a few
architectures, and is incompatible with both swiotlb and iommu support.

Before commit 391e2f25601e ("[SCSI] BusLogic: Port driver to 64-bit."),
the driver had a dependency on x86-32, presumably because of this
problem. However, the change introduced another bug that made it still
impossible to use the driver on any 64-bit machine.

This was in turn fixed in commit 56f396146af2 ("scsi: BusLogic: Fix
64-bit system enumeration error for Buslogic"), 8 years later, which
shows that there are not a lot of users.

Maciej is still using the driver on 32-bit hardware, and Khalid mentioned
that the driver works with the device emulation used in VirtualBox
and VMware. Both of those only emulate it for Windows 2000 and older
operating systems that did not ship with the better LSI logic driver.

Do a minimum fix that searches through the list of descriptors to find
one that matches the bus address. This is clearly as inefficient as
was indicated in the code comment about the lack of a bus_to_virt()
replacement. A better fix would likely involve changing out the entire
descriptor allocation for a simpler one, but that would be much
more invasive.


FWIW, if efficiency *is* a practical concern, even under the current 
allocation scheme it looks like there are only 4 actual DMA allocations 
to search through. From a quick scan (since it's too hot here not to get 
distracted...), if I wanted to optimise this in future I'd probably 
remove the semi-redundant allocgrp_* fields from struct blogic_ccb and 
hang a dedicated list of the block allocations off the adapter - at that 
point the lookup could likely already be more efficient than a 
theoretical dma_to_virt() interface would be if it had to go off and 
walk an IOMMU pagetable. Then the next question would be whether it's 
viable to make a single 32KB allocation rather than 4*8KB, so it's no longer 
even a list.


For now, though, I agree with the simple change that's clear and easy to 
reason about:


Reviewed-by: Robin Murphy 

Thanks,
Robin.


Cc: Maciej W. Rozycki 
Cc: Matt Wang 
Cc: Khalid Aziz 
Signed-off-by: Arnd Bergmann 
---
  drivers/scsi/BusLogic.c | 27 ---
  drivers/scsi/Kconfig|  2 +-
  2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/BusLogic.c b/drivers/scsi/BusLogic.c
index a897c8f914cf..d057abfcdd5c 100644
--- a/drivers/scsi/BusLogic.c
+++ b/drivers/scsi/BusLogic.c
@@ -2515,12 +2515,26 @@ static int blogic_resultcode(struct blogic_adapter 
*adapter,
return (hoststatus << 16) | tgt_status;
  }
  
+/*

+ * turn the dma address from an inbox into a ccb pointer
+ * This is rather inefficient.
+ */
+static struct blogic_ccb *
+blogic_inbox_to_ccb(struct blogic_adapter *adapter, struct blogic_inbox *inbox)
+{
+   struct blogic_ccb *ccb;
+
+   for (ccb = adapter->all_ccbs; ccb; ccb = ccb->next_all)
+   if (inbox->ccb == ccb->dma_handle)
+   break;
+
+   return ccb;
+}
  
  /*

blogic_scan_inbox scans the Incoming Mailboxes saving any
Incoming Mailbox entries for completion processing.
  */
-
  static void blogic_scan_inbox(struct blogic_adapter *adapter)
  {
/*
@@ -2540,16 +2554,7 @@ static void blogic_scan_inbox(struct blogic_adapter 
*adapter)
enum blogic_cmplt_code comp_code;
  
  	while ((comp_code = next_inbox->comp_code) != BLOGIC_INBOX_FREE) {

-   /*
-  We are only allowed to do this because we limit our
-  architectures we run on to machines where bus_to_virt(
-  actually works.  There *needs* to be a dma_addr_to_virt()
-  in the new PCI DMA mapping interface to replace
-  bus_to_virt() or else this code is going to become very
-  innefficient.
-*/
-   struct blogic_ccb *ccb =
-   (struct blogic_ccb *) bus_to_virt(next_inbox->ccb);
+   struct blogic_ccb *ccb = blogic_inbox_to_ccb(adapter, 
adapter->next_inbox);
if (comp_code != BLOGIC_CMD_NOTFOUND) {
if (ccb->status == BLOGIC_CCB_ACTIVE ||
ccb->status == BLOGIC_CCB_RESET) {
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index cf75588a2587..56bdc08d0b77 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -513,7 +513,7 @@ config SCSI_HPTIOP
  
  config SCSI_BUSLOGIC

tristate "BusLogic SCSI support"
-   depends on PCI && SCSI && VIRT_TO_BUS
+   depends on PCI && SCSI
help
  This is support for BusLogic MultiMaster and FlashPoint SCSI Host
  Adapters. Consult the SCSI-HOWTO, available from


[PATCH v2 3/3] arch/*/: remove CONFIG_VIRT_TO_BUS

2022-06-17 Thread Arnd Bergmann
From: Arnd Bergmann 

All architecture-independent users of virt_to_bus() and bus_to_virt()
have been fixed to use the dma mapping interfaces or have been
removed now.  This means the definitions on most architectures, and the
CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed.

The only exceptions to this are a few network and scsi drivers for m68k
Amiga and VME machines and ppc32 Macintosh. These drivers work correctly
with the old interfaces and are probably not worth changing.

On alpha and parisc, virt_to_bus() were still used in asm/floppy.h.
alpha can use isa_virt_to_bus() like x86 does, and parisc can just
open-code the virt_to_phys() here, as this is architecture specific
code.

I tried updating the bus-virt-phys-mapping.rst documentation, which
started as an email from Linus to explain some details of the Linux-2.0
driver interfaces. The bits about virt_to_bus() were declared obsolete
back in 2000, and the rest is not all that relevant any more, so in the
end I just decided to remove the file completely.

Reviewed-by: Geert Uytterhoeven 
Acked-by: Geert Uytterhoeven 
Acked-by: Michael Ellerman  (powerpc)
Signed-off-by: Arnd Bergmann 
---
 .../core-api/bus-virt-phys-mapping.rst| 220 --
 Documentation/core-api/dma-api-howto.rst  |  14 --
 Documentation/core-api/index.rst  |   1 -
 .../translations/zh_CN/core-api/index.rst |   1 -
 arch/alpha/Kconfig|   1 -
 arch/alpha/include/asm/floppy.h   |   2 +-
 arch/alpha/include/asm/io.h   |   8 +-
 arch/ia64/Kconfig |   1 -
 arch/ia64/include/asm/io.h|   8 -
 arch/m68k/Kconfig |   1 -
 arch/m68k/include/asm/virtconvert.h   |   4 +-
 arch/microblaze/Kconfig   |   1 -
 arch/microblaze/include/asm/io.h  |   2 -
 arch/mips/Kconfig |   1 -
 arch/mips/include/asm/io.h|   9 -
 arch/parisc/Kconfig   |   1 -
 arch/parisc/include/asm/floppy.h  |   4 +-
 arch/parisc/include/asm/io.h  |   2 -
 arch/powerpc/Kconfig  |   1 -
 arch/powerpc/include/asm/io.h |   2 -
 arch/riscv/include/asm/page.h |   1 -
 arch/x86/Kconfig  |   1 -
 arch/x86/include/asm/io.h |   9 -
 arch/xtensa/Kconfig   |   1 -
 arch/xtensa/include/asm/io.h  |   3 -
 include/asm-generic/io.h  |  14 --
 mm/Kconfig|   8 -
 27 files changed, 10 insertions(+), 311 deletions(-)
 delete mode 100644 Documentation/core-api/bus-virt-phys-mapping.rst

diff --git a/Documentation/core-api/bus-virt-phys-mapping.rst 
b/Documentation/core-api/bus-virt-phys-mapping.rst
deleted file mode 100644
index c72b24a7d52c..
--- a/Documentation/core-api/bus-virt-phys-mapping.rst
+++ /dev/null
@@ -1,220 +0,0 @@
-==
-How to access I/O mapped memory from within device drivers
-==
-
-:Author: Linus
-
-.. warning::
-
-   The virt_to_bus() and bus_to_virt() functions have been
-   superseded by the functionality provided by the PCI DMA interface
-   (see Documentation/core-api/dma-api-howto.rst).  They continue
-   to be documented below for historical purposes, but new code
-   must not use them. --davidm 00/12/12
-
-::
-
-  [ This is a mail message in response to a query on IO mapping, thus the
-strange format for a "document" ]
-
-The AHA-1542 is a bus-master device, and your patch makes the driver give the
-controller the physical address of the buffers, which is correct on x86
-(because all bus master devices see the physical memory mappings directly). 
-
-However, on many setups, there are actually **three** different ways of looking
-at memory addresses, and in this case we actually want the third, the
-so-called "bus address". 
-
-Essentially, the three ways of addressing memory are (this is "real memory",
-that is, normal RAM--see later about other details): 
-
- - CPU untranslated.  This is the "physical" address.  Physical address 
-   0 is what the CPU sees when it drives zeroes on the memory bus.
-
- - CPU translated address. This is the "virtual" address, and is 
-   completely internal to the CPU itself with the CPU doing the appropriate
-   translations into "CPU untranslated". 
-
- - bus address. This is the address of memory as seen by OTHER devices, 
-   not the CPU. Now, in theory there could be many different bus 
-   addresses, with each device seeing memory in some device-specific way, but
-   happily most hardware designers aren't actually actively trying to make
-   things any more complex than necessary, so you can assume that all 
-   external 

[PATCH v2 2/3] scsi: BusLogic remove bus_to_virt

2022-06-17 Thread Arnd Bergmann
From: Arnd Bergmann 

The BusLogic driver is the last remaining driver that relies on the
deprecated bus_to_virt() function, which in turn only works on a few
architectures, and is incompatible with both swiotlb and iommu support.

Before commit 391e2f25601e ("[SCSI] BusLogic: Port driver to 64-bit."),
the driver had a dependency on x86-32, presumably because of this
problem. However, the change introduced another bug that made it still
impossible to use the driver on any 64-bit machine.

This was in turn fixed in commit 56f396146af2 ("scsi: BusLogic: Fix
64-bit system enumeration error for Buslogic"), 8 years later, which
shows that there are not a lot of users.

Maciej is still using the driver on 32-bit hardware, and Khalid mentioned
that the driver works with the device emulation used in VirtualBox
and VMware. Both of those only emulate it for Windows 2000 and older
operating systems that did not ship with the better LSI logic driver.

Do a minimum fix that searches through the list of descriptors to find
one that matches the bus address. This is clearly as inefficient as
was indicated in the code comment about the lack of a bus_to_virt()
replacement. A better fix would likely involve changing out the entire
descriptor allocation for a simpler one, but that would be much
more invasive.

Cc: Maciej W. Rozycki 
Cc: Matt Wang 
Cc: Khalid Aziz 
Signed-off-by: Arnd Bergmann 
---
 drivers/scsi/BusLogic.c | 27 ---
 drivers/scsi/Kconfig|  2 +-
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/BusLogic.c b/drivers/scsi/BusLogic.c
index a897c8f914cf..d057abfcdd5c 100644
--- a/drivers/scsi/BusLogic.c
+++ b/drivers/scsi/BusLogic.c
@@ -2515,12 +2515,26 @@ static int blogic_resultcode(struct blogic_adapter 
*adapter,
return (hoststatus << 16) | tgt_status;
 }
 
+/*
+ * turn the dma address from an inbox into a ccb pointer
+ * This is rather inefficient.
+ */
+static struct blogic_ccb *
+blogic_inbox_to_ccb(struct blogic_adapter *adapter, struct blogic_inbox *inbox)
+{
+   struct blogic_ccb *ccb;
+
+   for (ccb = adapter->all_ccbs; ccb; ccb = ccb->next_all)
+   if (inbox->ccb == ccb->dma_handle)
+   break;
+
+   return ccb;
+}
 
 /*
   blogic_scan_inbox scans the Incoming Mailboxes saving any
   Incoming Mailbox entries for completion processing.
 */
-
 static void blogic_scan_inbox(struct blogic_adapter *adapter)
 {
/*
@@ -2540,16 +2554,7 @@ static void blogic_scan_inbox(struct blogic_adapter 
*adapter)
enum blogic_cmplt_code comp_code;
 
while ((comp_code = next_inbox->comp_code) != BLOGIC_INBOX_FREE) {
-   /*
-  We are only allowed to do this because we limit our
-  architectures we run on to machines where bus_to_virt(
-  actually works.  There *needs* to be a dma_addr_to_virt()
-  in the new PCI DMA mapping interface to replace
-  bus_to_virt() or else this code is going to become very
-  innefficient.
-*/
-   struct blogic_ccb *ccb =
-   (struct blogic_ccb *) bus_to_virt(next_inbox->ccb);
+   struct blogic_ccb *ccb = blogic_inbox_to_ccb(adapter, 
adapter->next_inbox);
if (comp_code != BLOGIC_CMD_NOTFOUND) {
if (ccb->status == BLOGIC_CCB_ACTIVE ||
ccb->status == BLOGIC_CCB_RESET) {
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index cf75588a2587..56bdc08d0b77 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -513,7 +513,7 @@ config SCSI_HPTIOP
 
 config SCSI_BUSLOGIC
tristate "BusLogic SCSI support"
-   depends on PCI && SCSI && VIRT_TO_BUS
+   depends on PCI && SCSI
help
  This is support for BusLogic MultiMaster and FlashPoint SCSI Host
  Adapters. Consult the SCSI-HOWTO, available from
-- 
2.29.2



[PATCH v2 0/3] phase out CONFIG_VIRT_TO_BUS

2022-06-17 Thread Arnd Bergmann
From: Arnd Bergmann 

The virt_to_bus/bus_to_virt interface has been deprecated for
decades. After Jakub Kicinski put a lot of work into cleaning out the
network drivers using them, there are only a couple of other drivers
left, which can all be removed or otherwise cleaned up, to remove the
old interface for good.

Any out of tree drivers using virt_to_bus() should be converted to
using the dma-mapping interfaces, typically dma_alloc_coherent()
or dma_map_single().
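
For reference, a minimal before/after sketch of such a conversion for the
streaming-DMA case (hypothetical driver code; the register offset and all names
here are made up):

#define EXAMPLE_REG_DMA_ADDR	0x10	/* made-up device register */

static int example_start_dma(struct device *dev, void __iomem *regs,
			     void *buf, size_t len)
{
	dma_addr_t handle;

	/* was: writel(virt_to_bus(buf), regs + EXAMPLE_REG_DMA_ADDR); */
	handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, handle))
		return -ENOMEM;

	writel(lower_32_bits(handle), regs + EXAMPLE_REG_DMA_ADDR);

	/* ... once the device has finished with the buffer: */
	dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
	return 0;
}

For long-lived buffers that the device keeps accessing, dma_alloc_coherent() is
usually the better fit, since it returns both the CPU address and the device
address in one call.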

There are a few m68k and ppc32 specific drivers that keep using the
interfaces, but these are all guarded with architecture-specific
Kconfig dependencies, and are not actually broken.

There are still a number of drivers that are using virt_to_phys()
and phys_to_virt() in place of dma-mapping operations, and these
are often broken, but they are out of scope for this series.

I would like the first two patches to either get merged through
the SCSI tree, or get an Ack from the SCSI maintainers so I can
merge them through the asm-generic tree.

  Arnd

---
Changes since v1:
 - dropped VME patches that are already in staging-next
 - dropped media patch that gets merged independently
 - added a networking patch and dropped it again after it got merged
 - replace BusLogic removal with a workaround

Cc: Jakub Kicinski 
Cc: Christoph Hellwig  # dma-mapping
Cc: Marek Szyprowski  # dma-mapping
Cc: Robin Murphy  # dma-mapping
Cc: iommu@lists.linux-foundation.org
Cc: Khalid Aziz  # buslogic
Cc: Maciej W. Rozycki  # buslogic
Cc: Matt Wang  # buslogic
Cc: Miquel van Smoorenburg  # dpt_i2o
Cc: Mark Salyzyn  # dpt_i2o
Cc: linux-s...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-a...@vger.kernel.org
Cc: linux-al...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-par...@vger.kernel.org
Cc: Denis Efremov  # floppy

Arnd Bergmann (3):
  scsi: dpt_i2o: drop stale VIRT_TO_BUS dependency
  scsi: BusLogic remove bus_to_virt
  arch/*/: remove CONFIG_VIRT_TO_BUS

 .../core-api/bus-virt-phys-mapping.rst| 220 --
 Documentation/core-api/dma-api-howto.rst  |  14 --
 Documentation/core-api/index.rst  |   1 -
 .../translations/zh_CN/core-api/index.rst |   1 -
 arch/alpha/Kconfig|   1 -
 arch/alpha/include/asm/floppy.h   |   2 +-
 arch/alpha/include/asm/io.h   |   8 +-
 arch/ia64/Kconfig |   1 -
 arch/ia64/include/asm/io.h|   8 -
 arch/m68k/Kconfig |   1 -
 arch/m68k/include/asm/virtconvert.h   |   4 +-
 arch/microblaze/Kconfig   |   1 -
 arch/microblaze/include/asm/io.h  |   2 -
 arch/mips/Kconfig |   1 -
 arch/mips/include/asm/io.h|   9 -
 arch/parisc/Kconfig   |   1 -
 arch/parisc/include/asm/floppy.h  |   4 +-
 arch/parisc/include/asm/io.h  |   2 -
 arch/powerpc/Kconfig  |   1 -
 arch/powerpc/include/asm/io.h |   2 -
 arch/riscv/include/asm/page.h |   1 -
 arch/x86/Kconfig  |   1 -
 arch/x86/include/asm/io.h |   9 -
 arch/xtensa/Kconfig   |   1 -
 arch/xtensa/include/asm/io.h  |   3 -
 drivers/scsi/BusLogic.c   |  27 ++-
 drivers/scsi/Kconfig  |   4 +-
 drivers/scsi/dpt_i2o.c|   4 +-
 include/asm-generic/io.h  |  14 --
 mm/Kconfig|   8 -
 30 files changed, 30 insertions(+), 326 deletions(-)
 delete mode 100644 Documentation/core-api/bus-virt-phys-mapping.rst

-- 
2.29.2



[PATCH v2 1/3] scsi: dpt_i2o: drop stale VIRT_TO_BUS dependency

2022-06-17 Thread Arnd Bergmann
From: Arnd Bergmann 

The dpt_i2o driver was fixed to stop using virt_to_bus() in 2008, but
it still has a stale reference in an error handling code path that could
never work.

Fix it up to build without VIRT_TO_BUS and remove the Kconfig dependency.

The alternative to this would be to just remove the driver, as it is
clearly obsolete. The i2o driver layer was removed in 2015 with commit
4a72a7af462d ("staging: remove i2o subsystem"), but the even older
dpt_i2o scsi driver stayed around.

The last non-cleanup patches I could find were from Miquel van Smoorenburg
and Mark Salyzyn back in 2008, they might know if there is any chance
of the hardware still being used anywhere.

Fixes: 67af2b060e02 ("[SCSI] dpt_i2o: move from virt_to_bus/bus_to_virt to 
dma_alloc_coherent")
Cc: Miquel van Smoorenburg 
Cc: Mark Salyzyn 
Signed-off-by: Arnd Bergmann 
---
 drivers/scsi/Kconfig   | 2 +-
 drivers/scsi/dpt_i2o.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index a9fe5152addd..cf75588a2587 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -460,7 +460,7 @@ config SCSI_MVUMI
 
 config SCSI_DPT_I2O
tristate "Adaptec I2O RAID support "
-   depends on SCSI && PCI && VIRT_TO_BUS
+   depends on SCSI && PCI
help
  This driver supports all of Adaptec's I2O based RAID controllers as 
  well as the DPT SmartRaid V cards.  This is an Adaptec maintained
diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c
index 2e9155ba7408..55dfe7011912 100644
--- a/drivers/scsi/dpt_i2o.c
+++ b/drivers/scsi/dpt_i2o.c
@@ -52,11 +52,11 @@ MODULE_DESCRIPTION("Adaptec I2O RAID Driver");
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 #include  /* for boot_cpu_data */
-#include /* for virt_to_bus, etc. */
 
 #include 
 #include 
@@ -2112,7 +2112,7 @@ static irqreturn_t adpt_isr(int irq, void *dev_id)
} else {
/* Ick, we should *never* be here */
printk(KERN_ERR "dpti: reply frame not from pool\n");
-   reply = (u8 *)bus_to_virt(m);
+   goto out;
}
 
if (readl(reply) & MSG_FAIL) {
-- 
2.29.2



Re: [PATCH v13 0/9] ACPI/IORT: Support for IORT RMR node

2022-06-17 Thread Steven Price
On 15/06/2022 11:10, Shameer Kolothum wrote:
> Hi
> 
> v12 --> v13
>   -No changes. Rebased to 5.19-rc1.
>   -Picked up tags received from Laurentiu, Hanjun and Will. Thanks!.

You've already got my Tested-by tags, but just to confirm I gave this a
spin and it works fine.

Thanks,

Steve

> 
> Thanks,
> Shameer
> 
> From old:
> We have faced issues with 3408iMR RAID controller cards which
> fail to boot when SMMU is enabled. This is because these
> controllers make use of host memory for various caching related
> purposes and when SMMU is enabled the iMR firmware fails to
> access these memory regions as there is no mapping for them.
> IORT RMR provides a way for UEFI to describe and report these
> memory regions so that the kernel can make a unity mapping for
> these in SMMU.
> 
> Change History:
> 
> v11 --> v12
>   -Minor fix in patch #4 to address the issue reported by the kernel test 
> robot.
>   -Added R-by tags by Christoph(patch #1) and Lorenzo(patch #4).
>   -Added T-by from Steve to all relevant patches. Many thanks!.
> 
> v10 --> v11
>  -Addressed Christoph's comments. We now have a  callback to 
>   struct iommu_resv_region to free all related memory and also dropped
>   the FW specific union and now has a container struct iommu_iort_rmr_data.
>   See patches #1 & #4
>  -Added R-by from Christoph.
>  -Dropped R-by from Lorenzo for patches #4 & #5 due to the above changes.
>  -Also dropped T-by from Steve and Laurentiu. Many thanks for your test
>   efforts. I have done basic sanity testing on my platform but please
>   do it again at your end.
> 
> v9 --> v10
>  - Dropped patch #1 ("Add temporary RMR node flag definitions") since
>the ACPICA header updates patch is now in the mailing list
>  - Based on the suggestion from Christoph, introduced a 
>resv_region_free_fw_data() callback in struct iommu_resv_region and
>used that to free RMR specific memory allocations.
> 
> v8 --> v9
>  - Addressed comments from Robin on interfaces.
>  - Addressed comments from Lorenzo.
> 
> v7 --> v8
>   - Patch #1 has temp definitions for RMR related changes till
>     the ACPICA header changes are part of kernel.
>   - No early parsing of RMR node info and is only parsed at the
>     time of use.
>   - Changes to the RMR get/put API format compared to the
>     previous version.
>   - Support for RMR descriptor shared by multiple stream IDs.
> 
> v6 --> v7
>  -fix pointed out by Steve to the SMMUv2 SMR bypass install in patch #8.
> 
> v5 --> v6
> - Addressed comments from Robin & Lorenzo.
>   : Moved iort_parse_rmr() to acpi_iort_init() from
>     iort_init_platform_devices().
>   : Removed use of struct iort_rmr_entry during the initial
>     parse. Using struct iommu_resv_region instead.
>   : Report RMR address alignment and overlap errors, but continue.
>   : Reworked arm_smmu_init_bypass_stes() (patch # 6).
> - Updated SMMUv2 bypass SMR code. Thanks to Jon N (patch #8).
> - Set IOMMU protection flags(IOMMU_CACHE, IOMMU_MMIO) based
>   on Type of RMR region. Suggested by Jon N.
> 
> v4 --> v5
>  -Added a fw_data union to struct iommu_resv_region and removed
>   struct iommu_rmr (Based on comments from Joerg/Robin).
>  -Added iommu_put_rmrs() to release mem.
>  -Thanks to Steve for verifying on SMMUv2, but not added the Tested-by
>   yet because of the above changes.
> 
> v3 -->v4
> -Included the SMMUv2 SMR bypass install changes suggested by
>  Steve(patch #7)
> -As per Robin's comments, RMR reserve implementation is now
>  more generic  (patch #8) and dropped v3 patches 8 and 10.
> -Rebase to 5.13-rc1
> 
> RFC v2 --> v3
>  -Dropped RFC tag as the ACPICA header changes are now ready to be
>   part of 5.13[0]. But this series still has a dependency on that patch.
>  -Added IORT E.b related changes(node flags, _DSM function 5 checks for
>   PCIe).
>  -Changed RMR to stream id mapping from M:N to M:1 as per the spec and
>   discussion here[1].
>  -Last two patches add support for SMMUv2(Thanks to Jon Nettleton!)
> 
> Jon Nettleton (1):
>   iommu/arm-smmu: Get associated RMR info and install bypass SMR
> 
> Shameer Kolothum (8):
>   iommu: Introduce a callback to struct iommu_resv_region
>   ACPI/IORT: Make iort_iommu_msi_get_resv_regions() return void
>   ACPI/IORT: Provide a generic helper to retrieve reserve regions
>   ACPI/IORT: Add support to retrieve IORT RMR reserved regions
>   ACPI/IORT: Add a helper to retrieve RMR info directly
>   iommu/arm-smmu-v3: Introduce strtab init helper
>   iommu/arm-smmu-v3: Refactor arm_smmu_init_bypass_stes() to force
> bypass
>   iommu/arm-smmu-v3: Get associated RMR info and install bypass STE
> 
>  drivers/acpi/arm64/iort.c   | 360 ++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  78 -
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   |  52 +++
>  drivers/iommu/dma-iommu.c   |   2 +-
>  drivers/iommu/iommu.c   |  16 +-
>  include/linux/acpi_iort.h   |  14 +-
>  

Re: [PATCH v3 6/6] iommu: mtk_iommu: Lookup phandle to retrieve syscon to pericfg

2022-06-17 Thread Matthias Brugger



On 15/06/2022 14:28, AngeloGioacchino Del Regno wrote:

On 15/06/22 14:09, Matthias Brugger wrote:



On 09/06/2022 12:08, AngeloGioacchino Del Regno wrote:

On some SoCs (of which only MT8195 is supported at the time of writing),
the "R" and "W" (I/O) enable bits for the IOMMUs are in the pericfg_ao
register space and not in the IOMMU space: as it happened already with
infracfg, it is expected that this list will grow.

Instead of specifying pericfg compatibles on a per-SoC basis, following
what was done with infracfg, let's lookup the syscon by phandle instead.

Signed-off-by: AngeloGioacchino Del Regno 


---
  drivers/iommu/mtk_iommu.c | 23 +--
  1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 90685946fcbe..0ea0848581e9 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -138,6 +138,8 @@
  /* PM and clock always on. e.g. infra iommu */
  #define PM_CLK_AO    BIT(15)
  #define IFA_IOMMU_PCIE_SUPPORT    BIT(16)
+/* IOMMU I/O (r/w) is enabled using PERICFG_IOMMU_1 register */
+#define HAS_PERI_IOMMU1_REG    BIT(17)


 From what I can see MTK_IOMMU_TYPE_INFRA is only set in MT8195 which uses 
pericfg. So we don't need a new flag here. For me the flag name 
MTK_IOMMU_TYPE_INFRA was confusing as it has nothing to do with the use of 
infracfg. I'll hijack this patch to provide some feedback on the actual code, 
please see below.



  #define MTK_IOMMU_HAS_FLAG_MASK(pdata, _x, mask)    \
   ((((pdata)->flags) & (mask)) == (_x))
@@ -187,7 +189,6 @@ struct mtk_iommu_plat_data {
  u32    flags;
  u32    inv_sel_reg;
-    char    *pericfg_comp_str;
  struct list_head    *hw_list;
  unsigned int    iova_region_nr;
  const struct mtk_iommu_iova_region    *iova_region;
@@ -1218,14 +1219,16 @@ static int mtk_iommu_probe(struct platform_device *pdev)
  goto out_runtime_disable;
  }
  } else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) &&
-   data->plat_data->pericfg_comp_str) {


Check for pericfg_comp_str is not needed, we only have one platform that uses 
MTK_IOMMU_TYPE_INFRA.




Fair enough. I agree.

-    infracfg = 
syscon_regmap_lookup_by_compatible(data->plat_data->pericfg_comp_str);


We can do something like this to make the code clearer:
data->pericfg = 
syscon_regmap_lookup_by_compatible(data->plat_data->pericfg_comp_str);

 if (IS_ERR(data->pericfg)) {

Using infracfg variable here is confusing as it has nothing to do with 
infracfg used with HAS_4GB_MODE flag.


Yes Matthias, using the infracfg variable is confusing - that's why I changed 
that
already



Regards,
Matthias


-    if (IS_ERR(infracfg)) {
-    ret = PTR_ERR(infracfg);
-    goto out_runtime_disable;
+   MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_PERI_IOMMU1_REG)) {




+    data->pericfg = syscon_regmap_lookup_by_phandle(dev->of_node, 
"mediatek,pericfg");


Here, where I'm assigning directly to data->pericfg :-P



Uuuups, sorry, did look too much on the existing code and not enough on your 
patch.


By the way, since it was only about one platform, my intention was to remove the
pericfg_comp_str from struct iommu_plat_data (as you can see), but then, with 
the
current code, I had to assign .



+    if (IS_ERR(data->pericfg)) {
+    p = "mediatek,mt8195-pericfg_ao";


...the string to 'p', because otherwise it would go over 100 columns.

In any case, I just checked and, apparently, MT8195 is really the one and only 
SoC
that needs this pericfg register to be managed by Linux... even the latest and
greatest smartphone chip (Dimensity 9000, MT6983) doesn't need this (at least,
from what I can read on a downstream kernel).

On an afterthought, perhaps the best idea is to just leave this as it is and, as
you proposed, avoid using that confusing infracfg variable, without adding the
pericfg handle at all.


Either this or get also rid of the pericfg_comp_str in the _plat_data. I'm 
unemotional about this :)


Regards,
Matthias



After all, it's just one single SoC.

I'll send a new version soon!

Cheers,
Angelo
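
For reference, a minimal sketch of the simplified probe branch discussed above,
i.e. keeping pericfg_comp_str but assigning straight to data->pericfg and dropping
the extra flag (hypothetical fragment, not the posted or final patch):

	} else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA)) {
		data->pericfg = syscon_regmap_lookup_by_compatible(
					data->plat_data->pericfg_comp_str);
		if (IS_ERR(data->pericfg)) {
			ret = PTR_ERR(data->pericfg);
			goto out_runtime_disable;
		}
	}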



[CFP LPC 2022] VFIO/IOMMU/PCI Microconference

2022-06-17 Thread Jörg Rödel
Hello everyone!

We are pleased to announce that there will be another

VFIO/IOMMU/PCI Microconference

at this year's Linux Plumbers Conference (LPC), from 12th to the 14th of
September in Dublin, Ireland. LPC is a hybrid event this year;
attendance can be in person or remote.

In this microconference we want to discuss ongoing developments around
the VFIO, IOMMU and PCI subsystems and their interactions in Linux.

Tentative topics that are under consideration for this year include (but
not limited to):

* PCI:
  - Cache Coherent Interconnect for Accelerators (CCIX)/Compute
Express Link (CXL) expansion memory and accelerators
management
  - Data Object Exchange (DOE)
  - Integrity and Data Encryption (IDE)
  - Component Measurement and Authentication (CMA)
  - Security Protocol and Data Model (SPDM)
  - I/O Address Space ID Allocator (IOASID)
  - INTX/MSI IRQ domain consolidation
  - Gen-Z interconnect fabric
  - ARM64 architecture and hardware
  - PCI native host controllers/endpoints drivers current
challenges and improvements (e.g., state of PCI quirks, etc.)
  - PCI error handling and management e.g., Advanced Error
Reporting (AER), Downstream Port Containment (DPC), ACPI
Platform Error Interface (APEI) and Error Disconnect Recover
(EDR)
  - Power management and devices supporting Active-state Power
Management (ASPM)
  - Peer-to-Peer DMA (P2PDMA)
  - Resources claiming/assignment consolidation
  - Probing of native PCIe controllers and general reset
implementation
  - Prefetchable vs non-prefetchable BAR address mappings
  - Untrusted/external devices management
  - DMA ownership models
  - Thunderbolt, DMA, RDMA and USB4 security

* VFIO:
  - Write-combine on non-x86 architectures
  - I/O Page Fault (IOPF) for passthrough devices
  - Shared Virtual Addressing (SVA) interface
  - Single-root I/O Virtualization(SRIOV)/Process Address Space
ID (PASID) integration
  - PASID in SRIOV virtual functions
  - Device assignment/sub-assignment

* IOMMU:
  - /dev/iommufd development
  - IOMMU virtualization
  - IOMMU drivers SVA interface
  - DMA-API layer interactions and the move towards generic
dma-ops for IOMMU drivers
  - Possible IOMMU core changes (e.g., better integration with
device-driver core, etc.)

Please submit your proposals on the LPC website at:

https://lpc.events/event/16/abstracts/

Make sure to select the "VFIO/IOMMU/PCI MC" in the Track pulldown
menu.

Looking forward to seeing you all there, either in Dublin or virtual! :)

The organizers,

Alex Williamson
Bjorn Helgaas
Jörg Rödel
Lorenzo Pieralisi
Krzysztof Wilczyński


Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-17 Thread Zhangfei Gao



On 2022/6/17 2:05 PM, Zhangfei Gao wrote:



On 2022/6/16 4:14 PM, Jean-Philippe Brucker wrote:

On Thu, Jun 16, 2022 at 12:10:18PM +0800, Zhangfei Gao wrote:

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 281c54003edc..b6219c6bfb48 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode 
*inode, struct file *filep)

   if (!q)
   return -ENOMEM;
+    mutex_lock(&uacce->queues_lock);
+
+    if (!uacce->parent->driver) {
I don't think this is useful, because the core clears 
parent->driver after

having run uacce_remove():

    rmmod hisi_zip    open()
 ... uacce_fops_open()
 __device_release_driver()  ...
  pci_device_remove()
   hisi_zip_remove()
    hisi_qm_uninit()
 uacce_remove()
  ...  ...
   mutex_lock(uacce->queues_lock)
  ...  if (!uacce->parent->driver)
  device_unbind_cleanup()  /* driver still valid, proceed */
   dev->driver = NULL
The check if (!uacce->parent->driver) is required, otherwise a NULL pointer
dereference may happen.

I agree we need something, what I mean is that this check is not
sufficient.


iommu_sva_bind_device
const struct iommu_ops *ops = dev_iommu_ops(dev);  ->
dev->iommu->iommu_dev->ops

rmmod has no issue, but remove parent pci device has the issue.
Ah right, relying on the return value of bind() wouldn't be enough 
even if

we mandated SVA.

[...]

I think we need the global uacce_mutex to serialize uacce_remove() and
uacce_fops_open(). uacce_remove() would do everything, including
xa_erase(), while holding that mutex. And uacce_fops_open() would try to
obtain the uacce object from the xarray while holding the mutex, which
fails if the uacce object is being removed.
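
(For illustration, a minimal sketch of that serialization; the uacce_xa
lookup, the dev_id field and the exact error paths are assumptions, not a
quote of the current uacce.c:)

/* one global mutex serializes open against remove */
static DEFINE_MUTEX(uacce_mutex);

static int uacce_fops_open(struct inode *inode, struct file *filep)
{
	struct uacce_device *uacce;
	int ret = 0;

	mutex_lock(&uacce_mutex);
	/* fails once uacce_remove() has erased the entry */
	uacce = xa_load(&uacce_xa, iminor(inode));
	if (!uacce) {
		ret = -ENODEV;
		goto out;
	}
	/* ... allocate the queue, call uacce->ops->get_queue(), etc. ... */
out:
	mutex_unlock(&uacce_mutex);
	return ret;
}

void uacce_remove(struct uacce_device *uacce)
{
	mutex_lock(&uacce_mutex);
	/* disable queues, unmap regions, then make the device unreachable */
	xa_erase(&uacce_xa, uacce->dev_id);
	mutex_unlock(&uacce_mutex);
}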

Since fops_open gets the char device refcount, uacce_release will not happen
until open returns.

The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And after uacce_remove()
runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

uacce_fops_open()
  if (!uacce->parent->driver)
      /* Still valid, keep going */
  ...                                  rmmod
                                        uacce_remove()
  ...                                   free_module()
  uacce->ops->get_queue() /* BUG */


uacce_remove should wait for uacce->queues_lock, until fops_open releases
the lock. If open happens just after uacce_remove unlocks, uacce_bind_queue
in open should fail.


Accessing uacce->ops after free_module() is a use-after-free. We need all
the fops to synchronize with uacce_remove() to ensure they don't use any
resource of the parent after it's been freed.

You mean the parent releases the resources?
After fops_open, currently we are counting on the parent driver stopping all
DMA first and then calling uacce_remove, which is an assumption.
Like drivers/crypto/hisilicon/zip/zip_main.c: hisi_qm_wait_task_finish,
which will wait for uacce_release.

If this is commented out, there may be other issues, e.g.:
Unable to handle kernel paging request at virtual address 8b700204
pc : hisi_qm_cache_wb.part.0+0x2c/0xa0


I see uacce_fops_poll() may have the same problem, and should be inside
uacce_mutex.
Do we need to consider this: uacce_remove can happen anytime, without
waiting for DMA to stop?


Not sure uacce_mutex can do this.
Currently the sequence is
mutex_lock(&uacce->queues_lock);
mutex_lock(&uacce_mutex);

Or we set all the callbacks of uacce_ops to NULL?

How about in uacce_remove:
mutex_lock(&uacce_mutex);
uacce->ops = NULL;
mutex_unlock(&uacce_mutex);

And check uacce->ops first when using.

Or set all ops of uacce->ops to NULL.
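
(A rough sketch of that idea; the helper name uacce_bind_queue_checked and
the exact call into uacce->ops->get_queue() are assumptions, only the
pattern of clearing and re-checking the ops pointer under uacce_mutex is
the point:)

void uacce_remove(struct uacce_device *uacce)
{
	/* ... existing queue teardown ... */
	mutex_lock(&uacce_mutex);
	uacce->ops = NULL;	/* parent driver resources are going away */
	mutex_unlock(&uacce_mutex);
}

/* callers re-check the pointer under the same mutex before using it */
static int uacce_bind_queue_checked(struct uacce_device *uacce,
				    struct uacce_queue *q)
{
	int ret = -ENODEV;

	mutex_lock(&uacce_mutex);
	if (uacce->ops && uacce->ops->get_queue)
		ret = uacce->ops->get_queue(uacce, q->pasid, q);
	mutex_unlock(&uacce_mutex);

	return ret;
}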


module_get/put only works for module unload, but not for device removal.

Thanks



Thanks,
Jean





RE: [PATCH v8 05/11] iommu/vt-d: Add SVA domain support

2022-06-17 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, June 7, 2022 9:50 AM
> 
> +
> +static const struct iommu_domain_ops intel_svm_domain_ops = {
> + .set_dev_pasid  = intel_svm_attach_dev_pasid,
> + .block_dev_pasid= intel_svm_detach_dev_pasid,
> + .free   = intel_svm_domain_free,
> +};
> +

It's clearer to use set/block for intel callbacks.
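
For illustration, a minimal sketch of the suggested naming (the prototypes
are assumed to match the ops signatures in this series; only the
identifiers change):

static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
				   struct device *dev, ioasid_t pasid);
static void intel_svm_block_dev_pasid(struct iommu_domain *domain,
				      struct device *dev, ioasid_t pasid);

static const struct iommu_domain_ops intel_svm_domain_ops = {
	.set_dev_pasid		= intel_svm_set_dev_pasid,
	.block_dev_pasid	= intel_svm_block_dev_pasid,
	.free			= intel_svm_domain_free,
};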


RE: [PATCH v8 04/11] iommu: Add sva iommu_domain support

2022-06-17 Thread Tian, Kevin
> From: Baolu Lu
> Sent: Friday, June 10, 2022 3:16 PM
> >
> >> +#define __IOMMU_DOMAIN_HOST_VA	(1U << 5)	/* Host CPU virtual address */
> >
> > Do you mean general CPU VA? or Host CPU VA, I'm reading the latter as
> > 2nd stage?
> 
> Host CPU VA. In the near future, we will add another flag _GUEST_VA, so
> that the shared page table types are distinguished.

How does the kernel know that a user page table translates guest VA?
IMHO I don't think the kernel should care about it except managing
all the aspects related to the user page table itself...

> 
> >
> >> +
> >>   /*
> >>    * This are the possible domain-types
> >>    *
> >> @@ -86,15 +89,24 @@ struct iommu_domain_geometry {
> >>   #define IOMMU_DOMAIN_DMA_FQ	(__IOMMU_DOMAIN_PAGING |	\
> >> 				 __IOMMU_DOMAIN_DMA_API |	\
> >> 				 __IOMMU_DOMAIN_DMA_FQ)
> >> +#define IOMMU_DOMAIN_SVA	(__IOMMU_DOMAIN_SHARED |	\
> >> +				 __IOMMU_DOMAIN_HOST_VA)
> >
> > Doesn't shared automatically mean CPU VA? Do we need another flag?
> 
> Yes. Shared means CPU VA, but there're many types. Besides above two, we
> also see the shared KVM/EPT.
> 

Will the two sharing scenarios share any common code? If not then
having a separate flag bit is meaningless.

It might be more straightforward to be:

#define IOMMU_DOMAIN_SVA	__IOMMU_DOMAIN_SVA
#define IOMMU_DOMAIN_KVM	__IOMMU_DOMAIN_KVM
#define IOMMU_DOMAIN_USER	__IOMMU_DOMAIN_USER
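
(Purely an illustrative sketch of one way the underlying bits could be laid
out under that scheme; the bit positions and comments are assumptions, not
values from the patch:)

#define __IOMMU_DOMAIN_SVA	(1U << 5)	/* Shares the CPU page table of a process */
#define __IOMMU_DOMAIN_KVM	(1U << 6)	/* Shares a KVM/EPT page table */
#define __IOMMU_DOMAIN_USER	(1U << 7)	/* User-managed I/O page table */

#define IOMMU_DOMAIN_SVA	__IOMMU_DOMAIN_SVA
#define IOMMU_DOMAIN_KVM	__IOMMU_DOMAIN_KVM
#define IOMMU_DOMAIN_USER	__IOMMU_DOMAIN_USER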

Thanks
Kevin


Re: [PATCH] iommu/ipmmu-vmsa: Fix compatible for rcar-gen4

2022-06-17 Thread Geert Uytterhoeven
On Fri, Jun 17, 2022 at 3:16 AM Yoshihiro Shimoda wrote:
> Fix compatible string for R-Car Gen4.
>
> Fixes: ae684caf465b ("iommu/ipmmu-vmsa: Add support for R-Car Gen4")
> Signed-off-by: Yoshihiro Shimoda 

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-17 Thread Zhangfei Gao



On 2022/6/16 4:14 PM, Jean-Philippe Brucker wrote:

On Thu, Jun 16, 2022 at 12:10:18PM +0800, Zhangfei Gao wrote:

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 281c54003edc..b6219c6bfb48 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode *inode, struct file *filep)
 	if (!q)
 		return -ENOMEM;
+	mutex_lock(&uacce->queues_lock);
+
+	if (!uacce->parent->driver) {

I don't think this is useful, because the core clears parent->driver after
having run uacce_remove():

rmmod hisi_zip                          open()
 ...                                     uacce_fops_open()
 __device_release_driver()               ...
  pci_device_remove()
   hisi_zip_remove()
    hisi_qm_uninit()
     uacce_remove()
      ...                                ...
                                         mutex_lock(uacce->queues_lock)
      ...                                if (!uacce->parent->driver)
 device_unbind_cleanup()                 /* driver still valid, proceed */
  dev->driver = NULL

The check if (!uacce->parent->driver) is required, otherwise a NULL pointer
dereference may happen.

I agree we need something, what I mean is that this check is not
sufficient.


iommu_sva_bind_device()
    const struct iommu_ops *ops = dev_iommu_ops(dev);   /* -> dev->iommu->iommu_dev->ops */

rmmod has no issue, but removing the parent PCI device has the issue.

Ah right, relying on the return value of bind() wouldn't be enough even if
we mandated SVA.

[...]

I think we need the global uacce_mutex to serialize uacce_remove() and
uacce_fops_open(). uacce_remove() would do everything, including
xa_erase(), while holding that mutex. And uacce_fops_open() would try to
obtain the uacce object from the xarray while holding the mutex, which
fails if the uacce object is being removed.

Since fops_open gets the char device refcount, uacce_release will not happen
until open returns.

The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And after uacce_remove()
runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

uacce_fops_open()
  if (!uacce->parent->driver)
      /* Still valid, keep going */
  ...                                  rmmod
                                        uacce_remove()
  ...                                   free_module()
  uacce->ops->get_queue() /* BUG */


uacce_remove should wait for uacce->queues_lock, until fops_open releases
the lock. If open happens just after uacce_remove unlocks, uacce_bind_queue
in open should fail.



Accessing uacce->ops after free_module() is a use-after-free. We need all
the fops to synchronize with uacce_remove() to ensure they don't use any
resource of the parent after it's been freed.

You mean the parent releases the resources?
After fops_open, currently we are counting on the parent driver stopping all
DMA first and then calling uacce_remove, which is an assumption.
Like drivers/crypto/hisilicon/zip/zip_main.c: hisi_qm_wait_task_finish,
which will wait for uacce_release.

If this is commented out, there may be other issues, e.g.:
Unable to handle kernel paging request at virtual address 8b700204
pc : hisi_qm_cache_wb.part.0+0x2c/0xa0


I see uacce_fops_poll() may have the same problem, and should be inside
uacce_mutex.
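
(For illustration, a sketch of what guarding uacce_fops_poll() could look
like; taking uacce_mutex here and re-checking uacce->ops are assumptions
about the eventual fix, not current code:)

static __poll_t uacce_fops_poll(struct file *file, poll_table *wait)
{
	struct uacce_queue *q = file->private_data;
	struct uacce_device *uacce = q->uacce;
	__poll_t ret = 0;

	poll_wait(file, &q->wait, wait);

	mutex_lock(&uacce_mutex);
	/* skip the callback once uacce_remove() has torn the device down */
	if (uacce->ops && uacce->ops->is_q_updated && uacce->ops->is_q_updated(q))
		ret = EPOLLIN | EPOLLRDNORM;
	mutex_unlock(&uacce_mutex);

	return ret;
}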
Do we need to consider this: uacce_remove can happen anytime, without
waiting for DMA to stop?


Not sure uacce_mutex can do this.
Currently the sequence is
mutex_lock(&uacce->queues_lock);
mutex_lock(&uacce_mutex);

Or we set all the callbacks of uacce_ops to NULL?
module_get/put only works for module unload, but not for device removal.

Thanks



Thanks,
Jean

