Re: iommu_iova slab eats too much memory

2020-04-24 Thread John Garry

On 24/04/2020 17:30, Robin Murphy wrote:

On 2020-04-24 2:20 pm, Bin wrote:

Dear Robin:
  Thank you for your explanation. Now I understand that this could be the
NIC driver's fault, but how can I confirm it? Do I have to debug the
driver myself?


I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
memory about an order of magnitude faster than the IOVAs alone, but it
should shed some light on whether DMA API usage looks suspicious, and
dumping the mappings should help track down the responsible driver(s).
Although the debugfs code doesn't show the stacktrace of where each
mapping was made, I guess it would be fairly simple to tweak that for a
quick way to narrow down where to start looking in an offending driver.

Robin.


Just mentioning this in case it's relevant - we found that a long-term aging
throughput test causes the RB tree to grow very large (and would, I assume,
eat lots of memory):


https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/

John




Robin Murphy wrote on Fri, Apr 24, 2020 at 8:15 PM:


On 2020-04-24 1:06 pm, Bin wrote:

I'm not familiar with the MMU stuff, so regarding "some driver leaking DMA
mappings" - is it possible that some other kernel module, like KVM or a NIC
driver, causes the leak rather than the iommu module itself?


Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
should, since I'd expect a lot of people to have noticed that. It's far
more likely that some driver is failing to call dma_unmap_* when it's
finished with a buffer - with the IOMMU disabled that would be a no-op
on x86 with a modern 64-bit-capable device, so such a latent bug could
have been easily overlooked.

Robin.
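
To make the failure mode concrete, the kind of latent bug described above would
look roughly like this - a hypothetical sketch (the foo_* names and ring layout
are made up; dma_map_single()/dma_unmap_single() are the standard DMA API), not
code from any real driver:

```c
#include <linux/dma-mapping.h>
#include <linux/skbuff.h>

/* Hypothetical NIC TX ring: buffers are mapped on transmit... */
struct foo_tx_ring {
	struct device  *dev;		/* the device doing DMA */
	dma_addr_t	dma[256];
	struct sk_buff *skb[256];
	unsigned int	head, tail;
};

static int foo_xmit(struct foo_tx_ring *r, struct sk_buff *skb)
{
	dma_addr_t addr = dma_map_single(r->dev, skb->data, skb->len,
					 DMA_TO_DEVICE);

	if (dma_mapping_error(r->dev, addr))
		return -ENOMEM;

	r->dma[r->head] = addr;
	r->skb[r->head] = skb;
	r->head = (r->head + 1) % 256;
	/* ... post the descriptor to hardware ... */
	return 0;
}

/* ...but the completion path forgets to unmap them. */
static void foo_tx_complete(struct foo_tx_ring *r)
{
	while (r->tail != r->head) {
		/*
		 * BUG: missing
		 *   dma_unmap_single(r->dev, r->dma[r->tail],
		 *                    r->skb[r->tail]->len, DMA_TO_DEVICE);
		 * With the IOMMU disabled this goes unnoticed on x86; with
		 * intel-iommu enabled, the IOVA is never freed and the
		 * iommu_iova slab grows without bound.
		 */
		dev_kfree_skb(r->skb[r->tail]);
		r->tail = (r->tail + 1) % 256;
	}
}
```

Every transmitted packet then leaks one IOVA, which is the kind of unbounded
iommu_iova growth seen in the slabinfo below.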


Bin wrote on Fri, Apr 24, 2020 at 20:00:


Well, that's the problem! I'm assuming the iommu kernel module is leaking
memory. But I don't know why and how.

Do you have any idea about it? Or is any further information needed?

Robin Murphy wrote on Fri, Apr 24, 2020 at 19:20:


On 2020-04-24 1:40 am, Bin wrote:

Hello? anyone there?

Bin wrote on Thu, Apr 23, 2020 at 5:14 PM:


Forgot to mention: I've already disabled slab merging, so this is what it is.

Bin wrote on Thu, Apr 23, 2020 at 5:11 PM:


Hey, guys:

I'm running a batch of CoreOS boxes, the lsb_release is:

```
# cat /etc/lsb-release
DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=2303.3.0
DISTRIB_CODENAME="Rhyolite"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
```

```
# uname -a
Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019
x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
```
Recently, I found my VMs constantly being killed due to OOM, and after
digging into the problem, I finally realized that the kernel is leaking
memory.

Here's my slabinfo:

 Active / Total Objects (% used): 83818306 / 84191607 (99.6%)
 Active / Total Slabs (% used)  : 1336293 / 1336293 (100.0%)
 Active / Total Caches (% used) : 152 / 217 (70.0%)
 Active / Total Size (% used)   : 5828768.08K / 5996848.72K (97.2%)
 Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME

80253888 80253888 100%    0.06K 1253967       64   5015868K iommu_iova


Do you really have a peak demand of ~80 million simultaneous DMA
buffers, or is some driver leaking DMA mappings?

Robin.


489472 489123  99%    0.03K   3824      128     15296K kmalloc-32

297444 271112  91%    0.19K   7082       42     56656K dentry

254400 252784  99%    0.06K   3975       64     15900K anon_vma_chain

222528  39255  17%    0.50K   6954       32    111264K kmalloc-512

202482 201814  99%    0.19K   4821       42     38568K vm_area_struct

200192 200192 100%    0.01K    391      512      1564K kmalloc-8

170528 169359  99%    0.25K   5329       32     42632K filp

158144 153508  97%    0.06K   2471       64      9884K kmalloc-64

149914 149365  99%    0.09K   3259       46     13036K anon_vma

146640 143123  97%    0.10K   3760       39     15040K buffer_head

130368  32791  25%    0.09K   3104       42     12416K kmalloc-96

129752 129752 100%    0.07K   2317       56      9268K Acpi-Operand

105468 105106  99%    0.04K   1034      102      4136K selinux_inode_security

 73080  73080 100%    0.13K   2436       30      9744K kernfs_node_cache

 72360  70261  97%    0.59K   1340       54     42880K inode_cache

 71040  71040 100%    0.12K   2220       32      8880K eventpoll_epi

 68096  59262  87%    0.02K    266      256      1064K kmalloc-16

 53652  53652 100%    0.04K    526      102      2104K pde_opener

 50496  31654  62%    2.00K   3156       16    100992K kmalloc-2048

 46242  46242 100%    0.19K   1101       42      8808K cred_jar

 44496  43013  96%    0.66K    927       48     29664K proc_inode_cache

 44352  44352 100%    0.06K    693       64      2772K task_delay_info

 43516  43471  99%    0.69K    946       46 

Re: iommu_iova slab eats too much memory

2020-04-24 Thread Robin Murphy

On 2020-04-24 2:20 pm, Bin wrote:

Dear Robin:
 Thank you for your explanation. Now I understand that this could be the
NIC driver's fault, but how can I confirm it? Do I have to debug the
driver myself?


I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through 
memory about an order of magnitude faster than the IOVAs alone, but it 
should shed some light on whether DMA API usage looks suspicious, and 
dumping the mappings should help track down the responsible driver(s). 
Although the debugfs code doesn't show the stacktrace of where each 
mapping was made, I guess it would be fairly simple to tweak that for a 
quick way to narrow down where to start looking in an offending driver.


Robin.
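
A cruder alternative, if a particular driver is already suspected, is to route
that driver's map/unmap calls through counted wrappers and watch whether the
balance only ever grows - a hypothetical sketch (the foo_* names are made up),
using only the standard DMA API:

```c
#include <linux/atomic.h>
#include <linux/device.h>
#include <linux/dma-mapping.h>

/* Counter of mappings the suspect driver currently holds; it should drop
 * back towards zero when the device is idle. A value that only climbs
 * means the driver never releases its mappings. */
static atomic64_t foo_dma_outstanding = ATOMIC64_INIT(0);

static inline dma_addr_t foo_map_single(struct device *dev, void *buf,
					size_t len, enum dma_data_direction dir)
{
	dma_addr_t addr = dma_map_single(dev, buf, len, dir);

	if (!dma_mapping_error(dev, addr))
		atomic64_inc(&foo_dma_outstanding);
	return addr;
}

static inline void foo_unmap_single(struct device *dev, dma_addr_t addr,
				    size_t len, enum dma_data_direction dir)
{
	dma_unmap_single(dev, addr, len, dir);
	atomic64_dec(&foo_dma_outstanding);
}

/* e.g. print the balance from a periodic work item or on driver unload: */
static void foo_report_dma_balance(struct device *dev)
{
	dev_info(dev, "outstanding DMA mappings: %lld\n",
		 (long long)atomic64_read(&foo_dma_outstanding));
}
```

If that balance tracks the iommu_iova object count, the suspect driver is the
culprit.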


Robin Murphy wrote on Fri, Apr 24, 2020 at 8:15 PM:


On 2020-04-24 1:06 pm, Bin wrote:

I'm not familiar with the MMU stuff, so regarding "some driver leaking DMA
mappings" - is it possible that some other kernel module, like KVM or a NIC
driver, causes the leak rather than the iommu module itself?


Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
should, since I'd expect a lot of people to have noticed that. It's far
more likely that some driver is failing to call dma_unmap_* when it's
finished with a buffer - with the IOMMU disabled that would be a no-op
on x86 with a modern 64-bit-capable device, so such a latent bug could
have been easily overlooked.

Robin.


Bin wrote on Fri, Apr 24, 2020 at 20:00:


Well, that's the problem! I'm assuming the iommu kernel module is leaking
memory. But I don't know why and how.

Do you have any idea about it? Or is any further information needed?

Robin Murphy wrote on Fri, Apr 24, 2020 at 19:20:


On 2020-04-24 1:40 am, Bin wrote:

Hello? anyone there?

Bin wrote on Thu, Apr 23, 2020 at 5:14 PM:


Forgot to mention: I've already disabled slab merging, so this is what it is.

Bin wrote on Thu, Apr 23, 2020 at 5:11 PM:


Hey, guys:

I'm running a batch of CoreOS boxes, the lsb_release is:

```
# cat /etc/lsb-release
DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=2303.3.0
DISTRIB_CODENAME="Rhyolite"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
```

```
# uname -a
Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019
x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
```
Recently, I found my VMs constantly being killed due to OOM, and after
digging into the problem, I finally realized that the kernel is leaking
memory.

Here's my slabinfo:

Active / Total Objects (% used): 83818306 / 84191607 (99.6%)
Active / Total Slabs (% used)  : 1336293 / 1336293 (100.0%)
Active / Total Caches (% used) : 152 / 217 (70.0%)
Active / Total Size (% used)   : 5828768.08K / 5996848.72K (97.2%)
Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K

 OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME

80253888 80253888 100%    0.06K 1253967       64   5015868K iommu_iova


Do you really have a peak demand of ~80 million simultaneous DMA
buffers, or is some driver leaking DMA mappings?

Robin.


489472 489123  99%    0.03K   3824      128     15296K kmalloc-32

297444 271112  91%    0.19K   7082       42     56656K dentry

254400 252784  99%    0.06K   3975       64     15900K anon_vma_chain

222528  39255  17%    0.50K   6954       32    111264K kmalloc-512

202482 201814  99%    0.19K   4821       42     38568K vm_area_struct

200192 200192 100%    0.01K    391      512      1564K kmalloc-8

170528 169359  99%    0.25K   5329       32     42632K filp

158144 153508  97%    0.06K   2471       64      9884K kmalloc-64

149914 149365  99%    0.09K   3259       46     13036K anon_vma

146640 143123  97%    0.10K   3760       39     15040K buffer_head

130368  32791  25%    0.09K   3104       42     12416K kmalloc-96

129752 129752 100%    0.07K   2317       56      9268K Acpi-Operand

105468 105106  99%    0.04K   1034      102      4136K selinux_inode_security

 73080  73080 100%    0.13K   2436       30      9744K kernfs_node_cache

 72360  70261  97%    0.59K   1340       54     42880K inode_cache

 71040  71040 100%    0.12K   2220       32      8880K eventpoll_epi

 68096  59262  87%    0.02K    266      256      1064K kmalloc-16

 53652  53652 100%    0.04K    526      102      2104K pde_opener

 50496  31654  62%    2.00K   3156       16    100992K kmalloc-2048

 46242  46242 100%    0.19K   1101       42      8808K cred_jar

 44496  43013  96%    0.66K    927       48     29664K proc_inode_cache

 44352  44352 100%    0.06K    693       64      2772K task_delay_info

 43516  43471  99%    0.69K    946       46     30272K sock_inode_cache

 37856  27626  72%    1.00K   1183       32     37856K kmalloc-1024

 36736  36736 100%    0.07K    656       56      2624K eventpoll_pwq

 34076  31282  91%    0.57K   1217       28     19472K radix_tree_node

 33660  30528  90%    1.05K   1122       30     35904K


Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-04-24 Thread Shaik Ameer Basha
On Fri, Apr 24, 2020 at 8:59 PM Robin Murphy  wrote:
>
> On 2020-04-24 4:04 pm, Ajay kumar wrote:
> > Can someone check this?
> >
> > On Mon, Apr 20, 2020 at 9:24 PM Ajay kumar  wrote:
> >>
> >> Hi All,
> >>
> >> I have an IOMMU master which has limitations as mentioned below:
> >> 1) The IOMMU master internally executes a firmware, and the firmware memory
> >> is allocated by the same master driver.
> >> The firmware buffer address should be of the lowest range than other 
> >> address
> >> allocated by the device, or in other words, all the remaining buffer 
> >> addresses
> >> should always be in a higher range than the firmware address.
> >> 2) None of the buffer addresses should go beyond 0xC000_
>
> That particular constraint could (and perhaps should) be expressed as a
> DMA mask/limit for the device, but if you have specific requirements to

Yes Robin. We do use 0xC000_ address to set the DMA mask in our driver.

> place buffers at particular addresses then you might be better off
> managing your own IOMMU domain like some other (mostly DRM) drivers do.

If you remember any such drivers, could you please point me to the driver path?

> The DMA APIs don't offer any guarantees about what addresses you'll get
> other than that they won't exceed the appropriate mask.

True, we have gone through most of the APIs and didn't find any way to match our
requirements with the existing DMA APIs

>
> >> example:
> >> If firmware buffer address is buf_fw = 0x8000_5000;
> >> All other addresses given to the device should be greater than
> >> (0x8000_5000 + firmware size) and less than 0xC000_
>
> Out of curiosity, how do you control that in the no-IOMMU or IOMMU
> passthrough cases?

We manage the no-IOMMU or pass through cases using the reserved-memory.

>
> Robin.
>
> >> Currently, this is being handled with one of the below hacks:
> >> 1) By keeping dma_mask in lower range while allocating firmware buffer,
> >> and then increasing the dma_mask to higher range for other buffers.
> >> 2) By reserving IOVA for firmware at the lowest range and creating direct 
> >> mappings for the same.
> >>
> >> I want to know if there is a better way this can be handled with current 
> >> framework,
> >> or if anybody is facing similar problems with their devices,
> >> please share how it is taken care.
> >>
> >> I also think there should be some way the masters can specify the IOVA
> >> range they want to limit to for current allocation.
> >> Something like a new iommu_ops callback like below:
> >> limit_iova_alloc_range(dev, iova_start, iova_end)
> >>
> >> And, in my driver, the sequence will be:
> >> limit_iova_alloc_range(dev, 0x_, 0x1000_); /* via helpers */
> >> alloc( ) firmware buffer using DMA API
> >> limit_iova_alloc_range(dev, 0x1000_, 0xC000_); /* via helpers */
> >> alloc( ) other buffers using DMA API
> >>

We just want to understand more from you about the new iommu_ops callback we
suggested. Shouldn't a device have the flexibility to allocate IOVAs as per
its requirements? Taking our device as an example, we need to have control
over the allocated IOVA region based on where the device is using the buffer.

If we have these callbacks in place, then the low-level IOMMU driver can
implement and manage such requests when needed.

If this can't be taken forward for good reasons, then we will definitely try
to understand how to manage the IOMMU domain from our driver, as per your
suggestion.

- Shaik.

> >> Thanks,
> >> Ajay Kumar


Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-04-24 Thread Robin Murphy

On 2020-04-24 4:04 pm, Ajay kumar wrote:

Can someone check this?

On Mon, Apr 20, 2020 at 9:24 PM Ajay kumar  wrote:


Hi All,

I have an IOMMU master which has limitations as mentioned below:
1) The IOMMU master internally executes a firmware, and the firmware memory
is allocated by the same master driver.
The firmware buffer address should be in a lower range than the other addresses
allocated for the device; in other words, all the remaining buffer addresses
should always be in a higher range than the firmware address.
2) None of the buffer addresses should go beyond 0xC000_


That particular constraint could (and perhaps should) be expressed as a 
DMA mask/limit for the device, but if you have specific requirements to 
place buffers at particular addresses then you might be better off 
managing your own IOMMU domain like some other (mostly DRM) drivers do. 
The DMA APIs don't offer any guarantees about what addresses you'll get 
other than that they won't exceed the appropriate mask.
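
For reference, the basic shape of that driver-managed-domain approach is
roughly the following - a rough sketch only (the foo_* names and the
FW_IOVA/BUF_IOVA values are made up, error handling is trimmed, and sizes must
respect the IOMMU's minimum page-size alignment):

```c
#include <linux/iommu.h>
#include <linux/dma-mapping.h>

/* Sketch: driver-managed IOMMU domain with a hand-picked IOVA layout. */
#define FW_IOVA		0x00001000UL	/* arbitrary example addresses */
#define BUF_IOVA	0x10000000UL

static int foo_setup_domain(struct device *dev, phys_addr_t fw_pa, size_t fw_sz,
			    phys_addr_t buf_pa, size_t buf_sz)
{
	struct iommu_domain *dom;
	int ret;

	dom = iommu_domain_alloc(dev->bus);
	if (!dom)
		return -ENOMEM;

	ret = iommu_attach_device(dom, dev);
	if (ret)
		goto err_free;

	/* Firmware lives at the lowest IOVA... */
	ret = iommu_map(dom, FW_IOVA, fw_pa, fw_sz, IOMMU_READ | IOMMU_WRITE);
	if (ret)
		goto err_detach;

	/* ...and everything else is mapped above it, below the device limit. */
	ret = iommu_map(dom, BUF_IOVA, buf_pa, buf_sz, IOMMU_READ | IOMMU_WRITE);
	if (ret)
		goto err_unmap;

	return 0;

err_unmap:
	iommu_unmap(dom, FW_IOVA, fw_sz);
err_detach:
	iommu_detach_device(dom, dev);
err_free:
	iommu_domain_free(dom);
	return ret;
}
```

The point is simply that with an unmanaged domain the driver, not the DMA API,
picks the addresses; real users of this pattern also track which IOVA ranges
they have handed out.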



example:
If firmware buffer address is buf_fw = 0x8000_5000;
All other addresses given to the device should be greater than
(0x8000_5000 + firmware size) and less than 0xC000_


Out of curiosity, how do you control that in the no-IOMMU or IOMMU 
passthrough cases?


Robin.


Currently, this is being handled with one of the below hacks:
1) By keeping dma_mask in lower range while allocating firmware buffer,
and then increasing the dma_mask to higher range for other buffers.
2) By reserving IOVA for firmware at the lowest range and creating direct 
mappings for the same.
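
Hack (1) above, in rough code form - a hedged sketch with made-up sizes and
mask widths; dma_set_mask_and_coherent() and dma_alloc_coherent() are the
standard APIs, but relying on where the resulting IOVAs land is exactly what
makes this a hack:

```c
#include <linux/dma-mapping.h>
#include <linux/sizes.h>

/* Sketch of the "narrow mask first, widen it later" trick described above.
 * In real code the pointers would be stashed in the driver's private struct. */
static int foo_alloc_buffers(struct device *dev)
{
	dma_addr_t fw_dma, buf_dma;
	void *fw, *buf;
	int ret;

	/* Narrow mask so the firmware buffer lands in the lowest IOVA range. */
	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(28));
	if (ret)
		return ret;
	fw = dma_alloc_coherent(dev, SZ_1M, &fw_dma, GFP_KERNEL);
	if (!fw)
		return -ENOMEM;

	/* Widen the mask (still below the device limit) for the other buffers. */
	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(31));
	if (ret)
		goto err_fw;
	buf = dma_alloc_coherent(dev, SZ_4M, &buf_dma, GFP_KERNEL);
	if (!buf) {
		ret = -ENOMEM;
		goto err_fw;
	}
	return 0;

err_fw:
	dma_free_coherent(dev, SZ_1M, fw, fw_dma);
	return ret;
}
```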

I want to know if there is a better way this can be handled with current 
framework,
or if anybody is facing similar problems with their devices,
please share how it is taken care.

I also think there should be some way the masters can specify the IOVA
range they want to limit to for current allocation.
Something like a new iommu_ops callback like below:
limit_iova_alloc_range(dev, iova_start, iova_end)

And, in my driver, the sequence will be:
limit_iova_alloc_range(dev, 0x_, 0x1000_); /* via helpers */
alloc( ) firmware buffer using DMA API
limit_iova_alloc_range(dev, 0x1000_, 0xC000_); /* via helpers */
alloc( ) other buffers using DMA API

Thanks,
Ajay Kumar



Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-04-24 Thread Ajay kumar
Can someone check this?

On Mon, Apr 20, 2020 at 9:24 PM Ajay kumar  wrote:
>
> Hi All,
>
> I have an IOMMU master which has limitations as mentioned below:
> 1) The IOMMU master internally executes a firmware, and the firmware memory
> is allocated by the same master driver.
> The firmware buffer address should be of the lowest range than other address
> allocated by the device, or in other words, all the remaining buffer addresses
> should always be in a higher range than the firmware address.
> 2) None of the buffer addresses should go beyond 0xC000_
>
> example:
> If firmware buffer address is buf_fw = 0x8000_5000;
> All other addresses given to the device should be greater than
> (0x8000_5000 + firmware size) and less than 0xC000_
>
> Currently, this is being handled with one of the below hacks:
> 1) By keeping dma_mask in lower range while allocating firmware buffer,
> and then increasing the dma_mask to higher range for other buffers.
> 2) By reserving IOVA for firmware at the lowest range and creating direct 
> mappings for the same.
>
> I want to know if there is a better way this can be handled with current 
> framework,
> or if anybody is facing similar problems with their devices,
> please share how it is taken care.
>
> I also think there should be some way the masters can specify the IOVA
> range they want to limit to for current allocation.
> Something like a new iommu_ops callback like below:
> limit_iova_alloc_range(dev, iova_start, iova_end)
>
> And, in my driver, the sequence will be:
> limit_iova_alloc_range(dev, 0x_, 0x1000_); /* via helpers */
> alloc( ) firmware buffer using DMA API
> limit_iova_alloc_range(dev, 0x1000_, 0xC000_); /* via helpers */
> alloc( ) other buffers using DMA API
>
> Thanks,
> Ajay Kumar
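
Spelled out, the proposed sequence would look something like this - purely
hypothetical, since limit_iova_alloc_range() is not an existing kernel API and
the range values in the thread are elided, so placeholders are used:

```c
#include <linux/dma-mapping.h>
#include <linux/sizes.h>

/* Hypothetical helper proposed above - NOT an existing kernel API:
 * constrain the IOVA range the DMA API may hand out for this device. */
int limit_iova_alloc_range(struct device *dev, dma_addr_t iova_start,
			   dma_addr_t iova_end);

/* Placeholder range boundaries; the real values are elided in the thread. */
#define FOO_FW_IOVA_END		SZ_256M
#define FOO_IOVA_LIMIT		SZ_2G

static int foo_alloc_in_order(struct device *dev)
{
	dma_addr_t fw_dma, buf_dma;
	void *fw, *buf;

	/* Firmware buffer must take the lowest IOVAs... */
	limit_iova_alloc_range(dev, 0, FOO_FW_IOVA_END);
	fw = dma_alloc_coherent(dev, SZ_1M, &fw_dma, GFP_KERNEL);

	/* ...and every later buffer must sit above it, below the device limit. */
	limit_iova_alloc_range(dev, FOO_FW_IOVA_END, FOO_IOVA_LIMIT);
	buf = dma_alloc_coherent(dev, SZ_4M, &buf_dma, GFP_KERNEL);

	return (fw && buf) ? 0 : -ENOMEM;
}
```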


[PATCH] iommu/omap: Add registration for DT fwnode pointer

2020-04-24 Thread Tero Kristo via iommu
The fwnode pointer must be passed to the iommu core, so that the core
can map the IOMMU towards device requests properly. Without this, some
IOMMU clients like OMAP remoteproc will fail the iommu configuration
multiple times with -EPROBE_DEFER, which will eventually be ignored with
a kernel warning banner.

Signed-off-by: Tero Kristo 
---
 drivers/iommu/omap-iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 887fefcb03b4..934726e20b72 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1248,6 +1248,7 @@ static int omap_iommu_probe(struct platform_device *pdev)
goto out_group;
 
iommu_device_set_ops(&obj->iommu, &omap_iommu_ops);
+   iommu_device_set_fwnode(&obj->iommu, &of->fwnode);
 
err = iommu_device_register(&obj->iommu);
if (err)
-- 
2.17.1

--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. 
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


Re: iommu_iova slab eats too much memory

2020-04-24 Thread Bin
Dear Robin:
Thank you for your explanation. Now I understand that this could be the
NIC driver's fault, but how can I confirm it? Do I have to debug the
driver myself?

Robin Murphy wrote on Fri, Apr 24, 2020 at 8:15 PM:

> On 2020-04-24 1:06 pm, Bin wrote:
> > I'm not familiar with the mmu stuff, so what you mean by "some driver
> > leaking DMA mappings", is it possible that some other kernel module like
> > KVM or NIC driver leads to the leaking problem instead of the iommu
> module
> > itself?
>
> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
> should, since I'd expect a lot of people to have noticed that. It's far
> more likely that some driver is failing to call dma_unmap_* when it's
> finished with a buffer - with the IOMMU disabled that would be a no-op
> on x86 with a modern 64-bit-capable device, so such a latent bug could
> have been easily overlooked.
>
> Robin.
>
> > Bin wrote on Fri, Apr 24, 2020 at 20:00:
> >
> >> Well, that's the problem! I'm assuming the iommu kernel module is
> leaking
> >> memory. But I don't know why and how.
> >>
> >> Do you have any idea about it? Or any further information is needed?
> >>
> >> Robin Murphy wrote on Fri, Apr 24, 2020 at 19:20:
> >>
> >>> On 2020-04-24 1:40 am, Bin wrote:
>  Hello? anyone there?
> 
>  Bin wrote on Thu, Apr 23, 2020 at 5:14 PM:
> 
> > Forget to mention, I've already disabled the slab merge, so this is
> >>> what
> > it is.
> >
> > Bin wrote on Thu, Apr 23, 2020 at 5:11 PM:
> >
> >> Hey, guys:
> >>
> >> I'm running a batch of CoreOS boxes, the lsb_release is:
> >>
> >> ```
> >> # cat /etc/lsb-release
> >> DISTRIB_ID="Container Linux by CoreOS"
> >> DISTRIB_RELEASE=2303.3.0
> >> DISTRIB_CODENAME="Rhyolite"
> >> DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
> >> ```
> >>
> >> ```
> >> # uname -a
> >> Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00
> >>> 2019
> >> x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel
> >>> GNU/Linux
> >> ```
> >> Recently, I found my vms constently being killed due to OOM, and
> after
> >> digging into the problem, I finally realized that the kernel is
> >>> leaking
> >> memory.
> >>
> >> Here's my slabinfo:
> >>
> >>Active / Total Objects (% used): 83818306 / 84191607 (99.6%)
> >>Active / Total Slabs (% used)  : 1336293 / 1336293 (100.0%)
> >>Active / Total Caches (% used) : 152 / 217 (70.0%)
> >>Active / Total Size (% used)   : 5828768.08K / 5996848.72K
> >>> (97.2%)
> >>Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K
> >>
> >> OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> >>
> >> 80253888 80253888 100%0.06K 1253967   64   5015868K
> iommu_iova
> >>>
> >>> Do you really have a peak demand of ~80 million simultaneous DMA
> >>> buffers, or is some driver leaking DMA mappings?
> >>>
> >>> Robin.
> >>>
> >> 489472 489123  99%0.03K   3824  128 15296K kmalloc-32
> >>
> >> 297444 271112  91%0.19K   7082   42 56656K dentry
> >>
> >> 254400 252784  99%0.06K   3975   64 15900K
> anon_vma_chain
> >>
> >> 222528  39255  17%0.50K   6954   32111264K kmalloc-512
> >>
> >> 202482 201814  99%0.19K   4821   42 38568K
> vm_area_struct
> >>
> >> 200192 200192 100%0.01K391  512  1564K kmalloc-8
> >>
> >> 170528 169359  99%0.25K   5329   32 42632K filp
> >>
> >> 158144 153508  97%0.06K   2471   64  9884K kmalloc-64
> >>
> >> 149914 149365  99%0.09K   3259   46 13036K anon_vma
> >>
> >> 146640 143123  97%0.10K   3760   39 15040K buffer_head
> >>
> >> 130368  32791  25%0.09K   3104   42 12416K kmalloc-96
> >>
> >> 129752 129752 100%0.07K   2317   56  9268K Acpi-Operand
> >>
> >> 105468 105106  99%0.04K   1034  102  4136K
> >> selinux_inode_security
> >>73080  73080 100%0.13K   2436   30  9744K
> >>> kernfs_node_cache
> >>
> >>72360  70261  97%0.59K   1340   54 42880K inode_cache
> >>
> >>71040  71040 100%0.12K   2220   32  8880K
> eventpoll_epi
> >>
> >>68096  59262  87%0.02K266  256  1064K kmalloc-16
> >>
> >>53652  53652 100%0.04K526  102  2104K pde_opener
> >>
> >>50496  31654  62%2.00K   3156   16100992K
> kmalloc-2048
> >>
> >>46242  46242 100%0.19K   1101   42  8808K cred_jar
> >>
> >>44496  43013  96%0.66K927   48 29664K
> >>> proc_inode_cache
> >>
> >>44352  44352 100%0.06K693   64  2772K
> >>> task_delay_info
> >>
> >>43516  43471  99%0.69K946   46 30272K
> >>> 

Re: iommu_iova slab eats too much memory

2020-04-24 Thread Robin Murphy

On 2020-04-24 1:06 pm, Bin wrote:

I'm not familiar with the MMU stuff, so regarding "some driver leaking DMA
mappings" - is it possible that some other kernel module, like KVM or a NIC
driver, causes the leak rather than the iommu module itself?


Yes - I doubt that intel-iommu itself is failing to free IOVAs when it 
should, since I'd expect a lot of people to have noticed that. It's far 
more likely that some driver is failing to call dma_unmap_* when it's 
finished with a buffer - with the IOMMU disabled that would be a no-op 
on x86 with a modern 64-bit-capable device, so such a latent bug could 
have been easily overlooked.


Robin.


Bin wrote on Fri, Apr 24, 2020 at 20:00:


Well, that's the problem! I'm assuming the iommu kernel module is leaking
memory. But I don't know why and how.

Do you have any idea about it? Or is any further information needed?

Robin Murphy wrote on Fri, Apr 24, 2020 at 19:20:


On 2020-04-24 1:40 am, Bin wrote:

Hello? anyone there?

Bin wrote on Thu, Apr 23, 2020 at 5:14 PM:


Forgot to mention: I've already disabled slab merging, so this is what it is.

Bin wrote on Thu, Apr 23, 2020 at 5:11 PM:


Hey, guys:

I'm running a batch of CoreOS boxes, the lsb_release is:

```
# cat /etc/lsb-release
DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=2303.3.0
DISTRIB_CODENAME="Rhyolite"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
```

```
# uname -a
Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019
x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
```
Recently, I found my VMs constantly being killed due to OOM, and after
digging into the problem, I finally realized that the kernel is leaking
memory.

Here's my slabinfo:

   Active / Total Objects (% used): 83818306 / 84191607 (99.6%)
   Active / Total Slabs (% used)  : 1336293 / 1336293 (100.0%)
   Active / Total Caches (% used) : 152 / 217 (70.0%)
   Active / Total Size (% used)   : 5828768.08K / 5996848.72K (97.2%)
   Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K

OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME

80253888 80253888 100%0.06K 1253967   64   5015868K iommu_iova


Do you really have a peak demand of ~80 million simultaneous DMA
buffers, or is some driver leaking DMA mappings?

Robin.


489472 489123  99%0.03K   3824  128 15296K kmalloc-32

297444 271112  91%0.19K   7082   42 56656K dentry

254400 252784  99%0.06K   3975   64 15900K anon_vma_chain

222528  39255  17%0.50K   6954   32111264K kmalloc-512

202482 201814  99%0.19K   4821   42 38568K vm_area_struct

200192 200192 100%0.01K391  512  1564K kmalloc-8

170528 169359  99%0.25K   5329   32 42632K filp

158144 153508  97%0.06K   2471   64  9884K kmalloc-64

149914 149365  99%0.09K   3259   46 13036K anon_vma

146640 143123  97%0.10K   3760   39 15040K buffer_head

130368  32791  25%0.09K   3104   42 12416K kmalloc-96

129752 129752 100%0.07K   2317   56  9268K Acpi-Operand

105468 105106  99%0.04K   1034  102  4136K
selinux_inode_security
   73080  73080 100%0.13K   2436   30  9744K

kernfs_node_cache


   72360  70261  97%0.59K   1340   54 42880K inode_cache

   71040  71040 100%0.12K   2220   32  8880K eventpoll_epi

   68096  59262  87%0.02K266  256  1064K kmalloc-16

   53652  53652 100%0.04K526  102  2104K pde_opener

   50496  31654  62%2.00K   3156   16100992K kmalloc-2048

   46242  46242 100%0.19K   1101   42  8808K cred_jar

   44496  43013  96%0.66K927   48 29664K

proc_inode_cache


   44352  44352 100%0.06K693   64  2772K

task_delay_info


   43516  43471  99%0.69K946   46 30272K

sock_inode_cache


   37856  27626  72%1.00K   1183   32 37856K kmalloc-1024

   36736  36736 100%0.07K656   56  2624K eventpoll_pwq

   34076  31282  91%0.57K   1217   28 19472K

radix_tree_node


   33660  30528  90%1.05K   1122   30 35904K

ext4_inode_cache


   32760  30959  94%0.19K780   42  6240K kmalloc-192

   32028  32028 100%0.04K314  102  1256K

ext4_extent_status


   30048  30048 100%0.25K939   32  7512K

skbuff_head_cache


   28736  28736 100%0.06K449   64  1796K fs_cache

   24702  24702 100%0.69K537   46 17184K files_cache

   23808  23808 100%0.66K496   48 15872K ovl_inode

   23104  22945  99%0.12K722   32  2888K kmalloc-128

   22724  21307  93%0.69K494   46 15808K

shmem_inode_cache


   21472  21472 100%0.12K671   32  2684K seq_file

   19904  19904 100%1.00K622   32 19904K UNIX

   17340  17340 100%1.06K578   30 18496K mm_struct

 

Re: iommu_iova slab eats too much memory

2020-04-24 Thread Bin
I'm not familiar with the MMU stuff, so regarding "some driver leaking DMA
mappings" - is it possible that some other kernel module, like KVM or a NIC
driver, causes the leak rather than the iommu module itself?

Bin wrote on Fri, Apr 24, 2020 at 20:00:

> Well, that's the problem! I'm assuming the iommu kernel module is leaking
> memory. But I don't know why and how.
>
> Do you have any idea about it? Or any further information is needed?
>
> Robin Murphy wrote on Fri, Apr 24, 2020 at 19:20:
>
>> On 2020-04-24 1:40 am, Bin wrote:
>> > Hello? anyone there?
>> >
>> > Bin wrote on Thu, Apr 23, 2020 at 5:14 PM:
>> >
>> >> Forget to mention, I've already disabled the slab merge, so this is
>> what
>> >> it is.
>> >>
>> >> Bin wrote on Thu, Apr 23, 2020 at 5:11 PM:
>> >>
>> >>> Hey, guys:
>> >>>
>> >>> I'm running a batch of CoreOS boxes, the lsb_release is:
>> >>>
>> >>> ```
>> >>> # cat /etc/lsb-release
>> >>> DISTRIB_ID="Container Linux by CoreOS"
>> >>> DISTRIB_RELEASE=2303.3.0
>> >>> DISTRIB_CODENAME="Rhyolite"
>> >>> DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
>> >>> ```
>> >>>
>> >>> ```
>> >>> # uname -a
>> >>> Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00
>> 2019
>> >>> x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel
>> GNU/Linux
>> >>> ```
>> >>> Recently, I found my vms constently being killed due to OOM, and after
>> >>> digging into the problem, I finally realized that the kernel is
>> leaking
>> >>> memory.
>> >>>
>> >>> Here's my slabinfo:
>> >>>
>> >>>   Active / Total Objects (% used): 83818306 / 84191607 (99.6%)
>> >>>   Active / Total Slabs (% used)  : 1336293 / 1336293 (100.0%)
>> >>>   Active / Total Caches (% used) : 152 / 217 (70.0%)
>> >>>   Active / Total Size (% used)   : 5828768.08K / 5996848.72K
>> (97.2%)
>> >>>   Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K
>> >>>
>> >>>OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>> >>>
>> >>> 80253888 80253888 100%0.06K 1253967   64   5015868K iommu_iova
>>
>> Do you really have a peak demand of ~80 million simultaneous DMA
>> buffers, or is some driver leaking DMA mappings?
>>
>> Robin.
>>
>> >>> 489472 489123  99%0.03K   3824  128 15296K kmalloc-32
>> >>>
>> >>> 297444 271112  91%0.19K   7082   42 56656K dentry
>> >>>
>> >>> 254400 252784  99%0.06K   3975   64 15900K anon_vma_chain
>> >>>
>> >>> 222528  39255  17%0.50K   6954   32111264K kmalloc-512
>> >>>
>> >>> 202482 201814  99%0.19K   4821   42 38568K vm_area_struct
>> >>>
>> >>> 200192 200192 100%0.01K391  512  1564K kmalloc-8
>> >>>
>> >>> 170528 169359  99%0.25K   5329   32 42632K filp
>> >>>
>> >>> 158144 153508  97%0.06K   2471   64  9884K kmalloc-64
>> >>>
>> >>> 149914 149365  99%0.09K   3259   46 13036K anon_vma
>> >>>
>> >>> 146640 143123  97%0.10K   3760   39 15040K buffer_head
>> >>>
>> >>> 130368  32791  25%0.09K   3104   42 12416K kmalloc-96
>> >>>
>> >>> 129752 129752 100%0.07K   2317   56  9268K Acpi-Operand
>> >>>
>> >>> 105468 105106  99%0.04K   1034  102  4136K
>> >>> selinux_inode_security
>> >>>   73080  73080 100%0.13K   2436   30  9744K
>> kernfs_node_cache
>> >>>
>> >>>   72360  70261  97%0.59K   1340   54 42880K inode_cache
>> >>>
>> >>>   71040  71040 100%0.12K   2220   32  8880K eventpoll_epi
>> >>>
>> >>>   68096  59262  87%0.02K266  256  1064K kmalloc-16
>> >>>
>> >>>   53652  53652 100%0.04K526  102  2104K pde_opener
>> >>>
>> >>>   50496  31654  62%2.00K   3156   16100992K kmalloc-2048
>> >>>
>> >>>   46242  46242 100%0.19K   1101   42  8808K cred_jar
>> >>>
>> >>>   44496  43013  96%0.66K927   48 29664K
>> proc_inode_cache
>> >>>
>> >>>   44352  44352 100%0.06K693   64  2772K
>> task_delay_info
>> >>>
>> >>>   43516  43471  99%0.69K946   46 30272K
>> sock_inode_cache
>> >>>
>> >>>   37856  27626  72%1.00K   1183   32 37856K kmalloc-1024
>> >>>
>> >>>   36736  36736 100%0.07K656   56  2624K eventpoll_pwq
>> >>>
>> >>>   34076  31282  91%0.57K   1217   28 19472K
>> radix_tree_node
>> >>>
>> >>>   33660  30528  90%1.05K   1122   30 35904K
>> ext4_inode_cache
>> >>>
>> >>>   32760  30959  94%0.19K780   42  6240K kmalloc-192
>> >>>
>> >>>   32028  32028 100%0.04K314  102  1256K
>> ext4_extent_status
>> >>>
>> >>>   30048  30048 100%0.25K939   32  7512K
>> skbuff_head_cache
>> >>>
>> >>>   28736  28736 100%0.06K449   64  1796K fs_cache
>> >>>
>> >>>   24702  24702 100%0.69K537   46 17184K files_cache
>> >>>
>> >>>   23808  23808 100%0.66K496   48 15872K ovl_inode
>> >>>
>> >>>   23104  22945  99%0.12K722   32  2888K kmalloc-128
>> >>>

Re: iommu_iova slab eats too much memory

2020-04-24 Thread Bin
Well, that's the problem! I'm assuming the iommu kernel module is leaking
memory. But I don't know why and how.

Do you have any idea about it? Or is any further information needed?

Robin Murphy wrote on Fri, Apr 24, 2020 at 19:20:

> On 2020-04-24 1:40 am, Bin wrote:
> > Hello? anyone there?
> >
> > Bin wrote on Thu, Apr 23, 2020 at 5:14 PM:
> >
> >> Forget to mention, I've already disabled the slab merge, so this is what
> >> it is.
> >>
> >> Bin wrote on Thu, Apr 23, 2020 at 5:11 PM:
> >>
> >>> Hey, guys:
> >>>
> >>> I'm running a batch of CoreOS boxes, the lsb_release is:
> >>>
> >>> ```
> >>> # cat /etc/lsb-release
> >>> DISTRIB_ID="Container Linux by CoreOS"
> >>> DISTRIB_RELEASE=2303.3.0
> >>> DISTRIB_CODENAME="Rhyolite"
> >>> DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
> >>> ```
> >>>
> >>> ```
> >>> # uname -a
> >>> Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019
> >>> x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
> >>> ```
> >>> Recently, I found my vms constently being killed due to OOM, and after
> >>> digging into the problem, I finally realized that the kernel is leaking
> >>> memory.
> >>>
> >>> Here's my slabinfo:
> >>>
> >>>   Active / Total Objects (% used): 83818306 / 84191607 (99.6%)
> >>>   Active / Total Slabs (% used)  : 1336293 / 1336293 (100.0%)
> >>>   Active / Total Caches (% used) : 152 / 217 (70.0%)
> >>>   Active / Total Size (% used)   : 5828768.08K / 5996848.72K
> (97.2%)
> >>>   Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K
> >>>
> >>>OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> >>>
> >>> 80253888 80253888 100%0.06K 1253967   64   5015868K iommu_iova
>
> Do you really have a peak demand of ~80 million simultaneous DMA
> buffers, or is some driver leaking DMA mappings?
>
> Robin.
>
> >>> 489472 489123  99%0.03K   3824  128 15296K kmalloc-32
> >>>
> >>> 297444 271112  91%0.19K   7082   42 56656K dentry
> >>>
> >>> 254400 252784  99%0.06K   3975   64 15900K anon_vma_chain
> >>>
> >>> 222528  39255  17%0.50K   6954   32111264K kmalloc-512
> >>>
> >>> 202482 201814  99%0.19K   4821   42 38568K vm_area_struct
> >>>
> >>> 200192 200192 100%0.01K391  512  1564K kmalloc-8
> >>>
> >>> 170528 169359  99%0.25K   5329   32 42632K filp
> >>>
> >>> 158144 153508  97%0.06K   2471   64  9884K kmalloc-64
> >>>
> >>> 149914 149365  99%0.09K   3259   46 13036K anon_vma
> >>>
> >>> 146640 143123  97%0.10K   3760   39 15040K buffer_head
> >>>
> >>> 130368  32791  25%0.09K   3104   42 12416K kmalloc-96
> >>>
> >>> 129752 129752 100%0.07K   2317   56  9268K Acpi-Operand
> >>>
> >>> 105468 105106  99%0.04K   1034  102  4136K
> >>> selinux_inode_security
> >>>   73080  73080 100%0.13K   2436   30  9744K
> kernfs_node_cache
> >>>
> >>>   72360  70261  97%0.59K   1340   54 42880K inode_cache
> >>>
> >>>   71040  71040 100%0.12K   2220   32  8880K eventpoll_epi
> >>>
> >>>   68096  59262  87%0.02K266  256  1064K kmalloc-16
> >>>
> >>>   53652  53652 100%0.04K526  102  2104K pde_opener
> >>>
> >>>   50496  31654  62%2.00K   3156   16100992K kmalloc-2048
> >>>
> >>>   46242  46242 100%0.19K   1101   42  8808K cred_jar
> >>>
> >>>   44496  43013  96%0.66K927   48 29664K
> proc_inode_cache
> >>>
> >>>   44352  44352 100%0.06K693   64  2772K task_delay_info
> >>>
> >>>   43516  43471  99%0.69K946   46 30272K
> sock_inode_cache
> >>>
> >>>   37856  27626  72%1.00K   1183   32 37856K kmalloc-1024
> >>>
> >>>   36736  36736 100%0.07K656   56  2624K eventpoll_pwq
> >>>
> >>>   34076  31282  91%0.57K   1217   28 19472K radix_tree_node
> >>>
> >>>   33660  30528  90%1.05K   1122   30 35904K
> ext4_inode_cache
> >>>
> >>>   32760  30959  94%0.19K780   42  6240K kmalloc-192
> >>>
> >>>   32028  32028 100%0.04K314  102  1256K
> ext4_extent_status
> >>>
> >>>   30048  30048 100%0.25K939   32  7512K
> skbuff_head_cache
> >>>
> >>>   28736  28736 100%0.06K449   64  1796K fs_cache
> >>>
> >>>   24702  24702 100%0.69K537   46 17184K files_cache
> >>>
> >>>   23808  23808 100%0.66K496   48 15872K ovl_inode
> >>>
> >>>   23104  22945  99%0.12K722   32  2888K kmalloc-128
> >>>
> >>>   22724  21307  93%0.69K494   46 15808K
> shmem_inode_cache
> >>>
> >>>   21472  21472 100%0.12K671   32  2684K seq_file
> >>>
> >>>   19904  19904 100%1.00K622   32 19904K UNIX
> >>>
> >>>   17340  17340 100%1.06K578   30 18496K mm_struct
> >>>
> >>>   15980  15980 100%0.02K 94  170   376K avtab_node
> >>>
> >>>   14070 

Re: iommu_iova slab eats too much memory

2020-04-24 Thread Robin Murphy

On 2020-04-24 1:40 am, Bin wrote:

Hello? anyone there?

Bin wrote on Thu, Apr 23, 2020 at 5:14 PM:


Forgot to mention: I've already disabled slab merging, so this is what it is.

Bin wrote on Thu, Apr 23, 2020 at 5:11 PM:


Hey, guys:

I'm running a batch of CoreOS boxes, the lsb_release is:

```
# cat /etc/lsb-release
DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=2303.3.0
DISTRIB_CODENAME="Rhyolite"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
```

```
# uname -a
Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019
x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
```
Recently, I found my VMs constantly being killed due to OOM, and after
digging into the problem, I finally realized that the kernel is leaking
memory.

Here's my slabinfo:

  Active / Total Objects (% used): 83818306 / 84191607 (99.6%)
  Active / Total Slabs (% used)  : 1336293 / 1336293 (100.0%)
  Active / Total Caches (% used) : 152 / 217 (70.0%)
  Active / Total Size (% used)   : 5828768.08K / 5996848.72K (97.2%)
  Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K

   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME

80253888 80253888 100%0.06K 1253967   64   5015868K iommu_iova


Do you really have a peak demand of ~80 million simultaneous DMA 
buffers, or is some driver leaking DMA mappings?


Robin.


489472 489123  99%0.03K   3824  128 15296K kmalloc-32

297444 271112  91%0.19K   7082   42 56656K dentry

254400 252784  99%0.06K   3975   64 15900K anon_vma_chain

222528  39255  17%0.50K   6954   32111264K kmalloc-512

202482 201814  99%0.19K   4821   42 38568K vm_area_struct

200192 200192 100%0.01K391  512  1564K kmalloc-8

170528 169359  99%0.25K   5329   32 42632K filp

158144 153508  97%0.06K   2471   64  9884K kmalloc-64

149914 149365  99%0.09K   3259   46 13036K anon_vma

146640 143123  97%0.10K   3760   39 15040K buffer_head

130368  32791  25%0.09K   3104   42 12416K kmalloc-96

129752 129752 100%0.07K   2317   56  9268K Acpi-Operand

105468 105106  99%0.04K   1034  102  4136K
selinux_inode_security
  73080  73080 100%0.13K   2436   30  9744K kernfs_node_cache

  72360  70261  97%0.59K   1340   54 42880K inode_cache

  71040  71040 100%0.12K   2220   32  8880K eventpoll_epi

  68096  59262  87%0.02K266  256  1064K kmalloc-16

  53652  53652 100%0.04K526  102  2104K pde_opener

  50496  31654  62%2.00K   3156   16100992K kmalloc-2048

  46242  46242 100%0.19K   1101   42  8808K cred_jar

  44496  43013  96%0.66K927   48 29664K proc_inode_cache

  44352  44352 100%0.06K693   64  2772K task_delay_info

  43516  43471  99%0.69K946   46 30272K sock_inode_cache

  37856  27626  72%1.00K   1183   32 37856K kmalloc-1024

  36736  36736 100%0.07K656   56  2624K eventpoll_pwq

  34076  31282  91%0.57K   1217   28 19472K radix_tree_node

  33660  30528  90%1.05K   1122   30 35904K ext4_inode_cache

  32760  30959  94%0.19K780   42  6240K kmalloc-192

  32028  32028 100%0.04K314  102  1256K ext4_extent_status

  30048  30048 100%0.25K939   32  7512K skbuff_head_cache

  28736  28736 100%0.06K449   64  1796K fs_cache

  24702  24702 100%0.69K537   46 17184K files_cache

  23808  23808 100%0.66K496   48 15872K ovl_inode

  23104  22945  99%0.12K722   32  2888K kmalloc-128

  22724  21307  93%0.69K494   46 15808K shmem_inode_cache

  21472  21472 100%0.12K671   32  2684K seq_file

  19904  19904 100%1.00K622   32 19904K UNIX

  17340  17340 100%1.06K578   30 18496K mm_struct

  15980  15980 100%0.02K 94  170   376K avtab_node

  14070  14070 100%1.06K469   30 15008K signal_cache

  13248  13248 100%0.12K414   32  1656K pid

  12128  11777  97%0.25K379   32  3032K kmalloc-256

  11008  11008 100%0.02K 43  256   172K
selinux_file_security
  10812  10812 100%0.04K106  102   424K Acpi-Namespace

This information shows that 'iommu_iova' is the top memory consumer.
In order to optimize the network performance of the OpenStack virtual machines,
I enabled the VT-d feature in the BIOS and the SR-IOV feature of the Intel 82599
10G NIC. I'm assuming this is the root cause of this issue.
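
(A quick sanity check of those numbers: 64 objects per 4 KiB slab means each
iommu_iova object slot is 64 bytes, so

\[
80{,}253{,}888 \times 64\ \mathrm{B} = 5{,}015{,}868\ \mathrm{KiB} \approx 4.8\ \mathrm{GiB},
\]

which matches the reported CACHE SIZE - i.e. the iommu_iova entries alone
account for almost 5 GB of unreclaimable slab memory.)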

Is there anything I can do to fix it?









RE: [PATCH v12 4/8] iommu/vt-d: Add bind guest PASID support

2020-04-24 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Wednesday, April 22, 2020 2:53 AM
> 
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.

Above description is not accurate. Guest PASID table updates don't
'result in' PASID cache flush automatically. What about:
--
The guest needs to invalidate the PASID cache for any update to
guest PASID table. Those invalidation requests are intercepted
by the VMM and passed down to the host as binding guest PASID
calls.
--
> 
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.
> 
> .-------------.  .---------------------------.
> |   vIOMMU    |  | Guest process CR3, FL only|
> |             |  '---------------------------'
> .----------------/
> | PASID Entry |--- PASID cache flush ---+
> '-------------'                         |
> |             |                         V
> |             |                  CR3 in GPA
> '-------------'
> Guest
> ------| Shadow |--------------------------|------------
>       v        v                          v
> Host
> .-------------.  .----------------------.
> |   pIOMMU    |  | Bind FL for GVA-GPA  |
> |             |  '----------------------'
> .----------------/  |
> | PASID Entry |     V (Nested xlate)
> '----------------\.------------------------------.
> |             |   |SL for GPA-HPA, default domain|
> |             |   '------------------------------'
> '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> ---
>  drivers/iommu/intel-iommu.c |   4 +
>  drivers/iommu/intel-svm.c   | 204
> 
>  include/linux/intel-iommu.h |   8 +-
>  include/linux/intel-svm.h   |  17 
>  4 files changed, 232 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 9c01e391a931..8862d6b0ef21 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -6179,6 +6179,10 @@ const struct iommu_ops intel_iommu_ops = {
>   .dev_disable_feat   = intel_iommu_dev_disable_feat,
>   .is_attach_deferred = intel_iommu_is_attach_deferred,
>   .pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + .sva_bind_gpasid= intel_svm_bind_gpasid,
> + .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
> +#endif
>  };
> 
>  static void quirk_iommu_igfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 2998418f0a38..69b2070b843d 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -226,6 +226,210 @@ static LIST_HEAD(global_svm_list);
>   list_for_each_entry((sdev), &(svm)->devs, list) \
>   if ((d) != (sdev)->dev) {} else
> 
> +static inline void intel_svm_free_if_empty(struct intel_svm *svm, u64 pasid)
> +{
> + if (list_empty(&svm->devs)) {
> + ioasid_set_data(pasid, NULL);
> + kfree(svm);
> + }
> +}

Do we really need a function form instead of putting the 4 lines directly 
after the 'out' label?
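
Open-coded at the error path, the suggestion would look roughly like this -
illustrative only, since the full function body is not shown in the quoted
patch; the IS_ERR_OR_NULL() guard and the unlock at 'out' are assumptions:

```c
	/* ... bind path above, with "goto out" on failure ... */
out:
	/*
	 * The four lines from intel_svm_free_if_empty(), open-coded. The
	 * IS_ERR_OR_NULL() check and the mutex_unlock() are assumptions
	 * about the surrounding (not fully quoted) function body.
	 */
	if (!IS_ERR_OR_NULL(svm) && list_empty(&svm->devs)) {
		ioasid_set_data(data->hpasid, NULL);
		kfree(svm);
	}
	mutex_unlock(&pasid_mutex);
	return ret;
```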

> +
> +int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device
> *dev,
> +   struct iommu_gpasid_bind_data *data)
> +{
> + struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> + struct dmar_domain *dmar_domain;
> + struct intel_svm_dev *sdev;
> + struct intel_svm *svm;
> + int ret = 0;
> +
> + if (WARN_ON(!iommu) || !data)
> + return -EINVAL;

well, why not checking !dev together?

> +
> + if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> + data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> + return -EINVAL;
> +
> + if (dev_is_pci(dev)) {
> + /* VT-d supports devices with full 20 bit PASIDs only */
> + if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> + return -EINVAL;
> + } else {
> + return -ENOTSUPP;
> + }
> +
> + /*
> +  * We only check host PASID range, we have no knowledge to check
> +  * guest PASID range.
> +  */
> + if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> + return -EINVAL;
> +
> + dmar_domain = to_dmar_domain(domain);
> +
> + mutex_lock(&pasid_mutex);
> + svm = ioasid_find(NULL, data->hpasid, NULL);
> + if (IS_ERR(svm)) {
> + ret = PTR_ERR(svm);
> + goto out;
> + }
> +
> + if (svm) {
> + /*
> +  * If we found svm for the PASID, there must be at
> +  * least one device bond, otherwise svm should be freed.
> +  */
> + if 

Re: [PATCH v3] of_device: removed #include that caused a recursion in included headers

2020-04-24 Thread Lee Jones
On Mon, 20 Apr 2020, Hadar Gat wrote:

> Both of_platform.h and of_device.h included each other.
> In of_device.h, removed the unneeded #include of of_platform.h
> and added an include of of_platform.h in the files that need it.
> 
> Signed-off-by: Hadar Gat 
> Reported-by: kbuild test robot 
> Acked-by: Jonathan Cameron  #for-iio
> Acked-by: Stephen Boyd  # clk
> ---
> v3: add include to of_platform.h in more files. (reported due other builds)
> v2: add include to of_platform.h in more files. (reported due other builds)
> 
>  arch/sparc/kernel/pci.c   | 1 +
>  arch/sparc/kernel/pci_sabre.c | 1 +
>  arch/sparc/kernel/pci_schizo.c| 1 +
>  arch/sparc/kernel/sbus.c  | 1 +
>  arch/sparc/mm/io-unit.c   | 1 +
>  arch/sparc/mm/iommu.c | 1 +
>  drivers/base/platform.c   | 1 +
>  drivers/bus/imx-weim.c| 1 +
>  drivers/bus/vexpress-config.c | 1 +
>  drivers/clk/mediatek/clk-mt7622-aud.c | 1 +
>  drivers/dma/at_hdmac.c| 1 +
>  drivers/dma/stm32-dmamux.c| 1 +
>  drivers/dma/ti/dma-crossbar.c | 1 +
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 1 +
>  drivers/gpu/drm/msm/hdmi/hdmi.c   | 1 +
>  drivers/gpu/drm/msm/msm_drv.c | 1 +
>  drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c   | 1 +
>  drivers/gpu/drm/sun4i/sun4i_tcon.c| 1 +
>  drivers/iio/adc/stm32-adc-core.c  | 1 +
>  drivers/iio/adc/stm32-dfsdm-adc.c | 1 +
>  drivers/iio/adc/stm32-dfsdm-core.c| 1 +
>  drivers/iommu/tegra-smmu.c| 1 +
>  drivers/media/platform/coda/coda-common.c | 1 +
>  drivers/memory/atmel-ebi.c| 1 +

>  drivers/mfd/palmas.c  | 1 +
>  drivers/mfd/ssbi.c| 1 +

Acked-by: Lee Jones 

>  drivers/mtd/nand/raw/omap2.c  | 1 +
>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 1 +
>  drivers/net/ethernet/ti/cpsw.c| 1 +
>  drivers/phy/tegra/xusb.c  | 1 +
>  drivers/pinctrl/freescale/pinctrl-imx1-core.c | 1 +
>  drivers/pinctrl/nomadik/pinctrl-nomadik.c | 1 +
>  drivers/soc/samsung/exynos-pmu.c  | 1 +
>  drivers/soc/sunxi/sunxi_sram.c| 1 +
>  include/linux/of_device.h | 2 --
>  lib/genalloc.c| 1 +
>  36 files changed, 35 insertions(+), 2 deletions(-)

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH] dma-contiguous: fix comment for dma_release_from_contiguous

2020-04-24 Thread Christoph Hellwig
Thanks,

applied to the dma-mapping for-next branch.