Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-23 Thread John Garry via iommu

On 10/06/2022 16:37, John Garry via iommu wrote:



On 6/9/22 10:54, John Garry wrote:
ok, but do you have a system where the UFS host controller is behind 
an IOMMU? I had the impression that UFS controllers would be mostly 
found in embedded systems and IOMMUs are not as common on there.


Modern phones have an IOMMU. Below one can find an example from a 
Pixel 6 phone. The UFS storage controller is not controller by the 
IOMMU as far as I can see but I wouldn't be surprised if the security 
team would ask us one day to enable the IOMMU for the UFS controller.


OK, then unfortunately it seems that you have no method to test. I might 
be able to test USB MSC but I am not even sure if I can even get DMA 
mappings who length exceeds the IOVA rcache limit there.


I was able to do some testing on USB MSC for an XHCI controller. The 
result is that limiting the max HW sectors there does not affect 
performance in normal conditions.


However if I hack the USB driver and fiddle with request queue settings 
then it can:

- lift max_sectors limit in usb_stor_host_template 120KB -> 256KB
- lift request queue read_ahead_kb 128KB -> 256KB

In this scenario I can get 42.5MB/s read throughput, as opposed to 
39.5MB/s in normal conditions. Since .can_queue=1 for that host it would 
not fall foul of some issues I experience in IOVA allocator performance, 
so limiting max_sectors would not be required for that reason.


So this is an artificial test, but it may be worth considering only 
applying this DMA mapping optimal max_sectors limit to SAS controllers 
which I know can benefit.


Christoph, any opinion?

thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-10 Thread John Garry via iommu

On 09/06/2022 21:34, Bart Van Assche wrote:

On 6/9/22 10:54, John Garry wrote:
ok, but do you have a system where the UFS host controller is behind 
an IOMMU? I had the impression that UFS controllers would be mostly 
found in embedded systems and IOMMUs are not as common on there.


Modern phones have an IOMMU. Below one can find an example from a Pixel 
6 phone. The UFS storage controller is not controller by the IOMMU as 
far as I can see but I wouldn't be surprised if the security team would 
ask us one day to enable the IOMMU for the UFS controller.


OK, then unfortunately it seems that you have no method to test. I might 
be able to test USB MSC but I am not even sure if I can even get DMA 
mappings who length exceeds the IOVA rcache limit there.


Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-09 Thread Bart Van Assche

On 6/9/22 10:54, John Garry wrote:
ok, but do you have a system where the UFS host controller is behind an 
IOMMU? I had the impression that UFS controllers would be mostly found 
in embedded systems and IOMMUs are not as common on there.


Modern phones have an IOMMU. Below one can find an example from a Pixel 
6 phone. The UFS storage controller is not controller by the IOMMU as 
far as I can see but I wouldn't be surprised if the security team would 
ask us one day to enable the IOMMU for the UFS controller.


# (cd /sys/class/iommu && ls */devices)
1a09.sysmmu/devices:
1900.aoc

1a51.sysmmu/devices:
1a44.lwis_csi

1a54.sysmmu/devices:
1aa4.lwis_pdp

1a88.sysmmu/devices:
1a84.lwis_g3aa

1ad0.sysmmu/devices:
1ac4.lwis_ipp  1ac8.lwis_gtnr_align

1b08.sysmmu/devices:
1b45.lwis_itp

1b78.sysmmu/devices:

1b7b.sysmmu/devices:
1b76.lwis_mcsc

1b7e.sysmmu/devices:

1baa.sysmmu/devices:
1a4e.lwis_votf  1ba4.lwis_gdc

1bad.sysmmu/devices:
1ba6.lwis_gdc

1bb0.sysmmu/devices:
1ba8.lwis_scsc

1bc7.sysmmu/devices:
1bc4.lwis_gtnr_merge

1bca.sysmmu/devices:

1bcd.sysmmu/devices:

1bd0.sysmmu/devices:

1bd3.sysmmu/devices:

1c10.sysmmu/devices:
1c30.drmdecon  1c302000.drmdecon

1c11.sysmmu/devices:

1c12.sysmmu/devices:

1c66.sysmmu/devices:
1c64.g2d

1c69.sysmmu/devices:

1c71.sysmmu/devices:
1c70.smfc

1c87.sysmmu/devices:
1c8d.MFC-0  mfc

1c8a.sysmmu/devices:

1ca4.sysmmu/devices:
1cb0.bigocean

1cc4.sysmmu/devices:
1ce0.abrolhos

Bart.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-09 Thread John Garry via iommu

On 09/06/2022 18:18, Bart Van Assche wrote:


SCSI host bus adapters that support 64-bit DMA may support much 
larger transfer sizes than 128 KiB.


Indeed, and that is my problem today, as my storage controller is 
generating DMA mapping lengths which exceeds 128K and they slow 
everything down.


If you say that SRP enjoys best peformance with larger transfers then 
can you please test this with an IOMMU enabled (iommu group type DMA 
or DMA-FQ)?


Hmm ... what exactly do you want me to test? Do you perhaps want me to 
measure how much performance drops with an IOMMU enabled? 


Yes, I would like to know of any performance change with an IOMMU 
enabled and then with an IOMMU enabled and including my series.


I don't have 
access anymore to the SRP setup I referred to in my previous email. But 
I do have access to devices that boot from UFS storage. For these 
devices we need to transfer 2 MiB per request to achieve full bandwidth.


ok, but do you have a system where the UFS host controller is behind an 
IOMMU? I had the impression that UFS controllers would be mostly found 
in embedded systems and IOMMUs are not as common on there.


Thanks,
John

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-09 Thread Bart Van Assche

On 6/9/22 01:00, John Garry wrote:

On 08/06/2022 22:07, Bart Van Assche wrote:

On 6/8/22 10:50, John Garry wrote:
Please note that this limit only applies if we have an IOMMU enabled 
for the scsi host dma device. Otherwise we are limited by dma direct 
or swiotlb max mapping size, as before.


SCSI host bus adapters that support 64-bit DMA may support much larger 
transfer sizes than 128 KiB.


Indeed, and that is my problem today, as my storage controller is 
generating DMA mapping lengths which exceeds 128K and they slow 
everything down.


If you say that SRP enjoys best peformance with larger transfers then 
can you please test this with an IOMMU enabled (iommu group type DMA or 
DMA-FQ)?


Hmm ... what exactly do you want me to test? Do you perhaps want me to 
measure how much performance drops with an IOMMU enabled? I don't have 
access anymore to the SRP setup I referred to in my previous email. But 
I do have access to devices that boot from UFS storage. For these 
devices we need to transfer 2 MiB per request to achieve full bandwidth.


Thanks,

Bart.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-09 Thread John Garry via iommu

On 08/06/2022 22:07, Bart Van Assche wrote:

On 6/8/22 10:50, John Garry wrote:
Please note that this limit only applies if we have an IOMMU enabled 
for the scsi host dma device. Otherwise we are limited by dma direct 
or swiotlb max mapping size, as before.


SCSI host bus adapters that support 64-bit DMA may support much larger 
transfer sizes than 128 KiB.


Indeed, and that is my problem today, as my storage controller is 
generating DMA mapping lengths which exceeds 128K and they slow 
everything down.


If you say that SRP enjoys best peformance with larger transfers then 
can you please test this with an IOMMU enabled (iommu group type DMA or 
DMA-FQ)?


Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-08 Thread Bart Van Assche

On 6/8/22 10:50, John Garry wrote:
Please note that this limit only applies if we have an IOMMU enabled for 
the scsi host dma device. Otherwise we are limited by dma direct or 
swiotlb max mapping size, as before.


SCSI host bus adapters that support 64-bit DMA may support much larger 
transfer sizes than 128 KiB.


Thanks,

Bart.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-08 Thread John Garry via iommu

On 08/06/2022 18:33, Bart Van Assche wrote:

On 6/6/22 02:30, John Garry wrote:

+    if (dma_dev->dma_mask) {
+    shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+    dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
+    }


Since IOVA_RANGE_CACHE_MAX_SIZE = 6 this limits max_sectors to 2**6 * 
PAGE_SIZE or 256 KiB if the page size is 4 KiB.


It's actually 128K for 4K page size, as any IOVA size is roundup to 
power-of-2 when testing if we may cache it, which means anything >128K 
would roundup to 256K and cannot be cached.


I think that's too 
small. Some (SRP) storage arrays require much larger transfers to 
achieve optimal performance.


Have you tried this achieve this optimal performance with an IOMMU enabled?

Please note that this limit only applies if we have an IOMMU enabled for 
the scsi host dma device. Otherwise we are limited by dma direct or 
swiotlb max mapping size, as before.


Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-08 Thread Bart Van Assche

On 6/6/22 02:30, John Garry wrote:

+   if (dma_dev->dma_mask) {
+   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+   dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
+   }


Since IOVA_RANGE_CACHE_MAX_SIZE = 6 this limits max_sectors to 2**6 * 
PAGE_SIZE or 256 KiB if the page size is 4 KiB. I think that's too 
small. Some (SRP) storage arrays require much larger transfers to 
achieve optimal performance.


Thanks,

Bart.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu