Re: Consult on ARM SMMU debugfs

2021-02-05 Thread Sai Prakash Ranjan

On 2021-01-15 22:47, Robin Murphy wrote:

On 2021-01-15 15:14, Russell King - ARM Linux admin wrote:

On Mon, Jan 11, 2021 at 08:01:48PM +, Robin Murphy wrote:

On 2021-01-07 02:45, chenxiang (M) wrote:

Hi Will,� Robin or other guys,

When debugging SMMU/SVA issue on huawei ARM64 board, we find that it
lacks of enough debugfs for ARM SMMU driver (such as

the value of STE/CD which we need to check sometimes). Currently it
creates top-level iommu directory in debugfs, but there is no 
debugfs


for ARM SMMU driver specially. Do you know whether ARM have the plan 
to

do that recently?


FWIW I don't think I've ever felt the need to need to inspect the 
Stream
Table on a live system. So far the nature of the STE code has been 
simple
enough that it's very hard for any given STE to be *wrong* - either 
it's set
up as expected and thus works fine, or it's not initialised at all 
and you
get C_BAD_STE, where 99% of the time you then just cross-reference 
the

Stream ID against the firmware and find that the DT/IORT is wrong.

Similarly I don't think I've even even *seen* an issue that could be
attributed to a context descriptor, although I appreciate that as we 
start

landing more PASID and SVA support the scope for that starts to widen
considerably.

Feel free to propose a patch if you believe it would be genuinely 
useful and
won't just bit-rot into a maintenance burden, but it's not something 
that's

on our roadmap here.


I do think that the IOMMU stuff needs better debugging. I've hit the
WARN_ON() in __arm_lpae_map(), and it's been pretty much undebuggable,
so I've resorted to putting the IOMMU into bypass mode permanently to
work around the issue.

The reason that it's undebuggable is if one puts printk() or trace
statements in the code, boots the platform, you get flooded with those
debugging messages, because every access to the rootfs generates and
tears down a mapping.

It would be nice to be able to inspect the IOMMU page tables and state
of the IOMMU, rather than having to resort to effectively disabling
the IOMMU.


Certainly once we get to stuff like unpinned VFIO, having the ability
to inspect pagetables for arbitrary IOMMU API usage will indeed be
useful. From the DMA mapping perspective, though, unless you're
working on the io-pgtable code itself it's not really going to tell
you much that dumping the mappings from dma-debug can't already.

FWIW whenever I encounter that particular warning in iommu-dma
context, I don't care where the existing mapping is pointing, since
it's merely a symptom of the damage already having been done. At that
point I'd usually go off and audit all the DMA API calls in the
offending driver, since it's typically caused by corruption in the
IOVA allocator from passing the wrong size in a dma_unmap_*() call,
and those can often be spotted by inspection. For active debugging,
what you really want to know is the *history* of operations around
that IOVA, since you're primarily interested in the request that last
mapped it, then the corresponding unmap request for nominally the same
buffer (which allowed the IOVA region to be freed for reuse) that for
some reason didn't cover one or more pages that it should have. The
IOMMU API tracepoints can be a handy tool there.



Currently IOMMU trace events are not straight forward to decode if
there are multiple devices attached. For ex: consider below:

map: IOMMU: iova=0x00035000 paddr=0x000113be2000 size=4096
unmap: IOMMU: iova=0x00034000 size=4096 unmapped_size=4096
unmap: IOMMU: iova=0x00035000 size=4096 unmapped_size=4096
map: IOMMU: iova=0x00036000 paddr=0x0001164d8000 size=4096
map: IOMMU: iova=0x00037000 paddr=0x0001164da000 size=4096
unmap: IOMMU: iova=0x00036000 size=4096 unmapped_size=4096
unmap: IOMMU: iova=0x00037000 size=4096 unmapped_size=4096

How about making it more useful adding the device name as well? Ex:

map: IOMMU:ae0.mdss iova=0x0002b000 paddr=0x00010a9e6000 
size=8192
map: IOMMU:ae0.mdss iova=0x0002d000 paddr=0x00010a9ec000 
size=21790
map: IOMMU:ae0.mdss iova=0x00241000 paddr=0x00010c40 
size=59392
map: IOMMU:a60.dwc3 iova=0x0004a000 paddr=0x00010a821000 
size=4096
map: IOMMU:a60.dwc3 iova=0x00049000 paddr=0x00010a82 
size=4096
unmap: IOMMU:a60.dwc3 iova=0x0004a000 size=4096 
unmapped_size=4096
unmap: IOMMU:a60.dwc3 iova=0x00049000 size=4096 
unmapped_size=4096


We have been carrying a local patch downstream like forever, I can post 
a

patch if you guys think it is useful in general.

Thanks
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member

of Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Consult on ARM SMMU debugfs

2021-01-15 Thread Robin Murphy

On 2021-01-15 15:14, Russell King - ARM Linux admin wrote:

On Mon, Jan 11, 2021 at 08:01:48PM +, Robin Murphy wrote:

On 2021-01-07 02:45, chenxiang (M) wrote:

Hi Will,� Robin or other guys,

When debugging SMMU/SVA issue on huawei ARM64 board, we find that it
lacks of enough debugfs for ARM SMMU driver (such as

the value of STE/CD which we need to check sometimes). Currently it
creates top-level iommu directory in debugfs, but there is no debugfs

for ARM SMMU driver specially. Do you know whether ARM have the plan to
do that recently?


FWIW I don't think I've ever felt the need to need to inspect the Stream
Table on a live system. So far the nature of the STE code has been simple
enough that it's very hard for any given STE to be *wrong* - either it's set
up as expected and thus works fine, or it's not initialised at all and you
get C_BAD_STE, where 99% of the time you then just cross-reference the
Stream ID against the firmware and find that the DT/IORT is wrong.

Similarly I don't think I've even even *seen* an issue that could be
attributed to a context descriptor, although I appreciate that as we start
landing more PASID and SVA support the scope for that starts to widen
considerably.

Feel free to propose a patch if you believe it would be genuinely useful and
won't just bit-rot into a maintenance burden, but it's not something that's
on our roadmap here.


I do think that the IOMMU stuff needs better debugging. I've hit the
WARN_ON() in __arm_lpae_map(), and it's been pretty much undebuggable,
so I've resorted to putting the IOMMU into bypass mode permanently to
work around the issue.

The reason that it's undebuggable is if one puts printk() or trace
statements in the code, boots the platform, you get flooded with those
debugging messages, because every access to the rootfs generates and
tears down a mapping.

It would be nice to be able to inspect the IOMMU page tables and state
of the IOMMU, rather than having to resort to effectively disabling
the IOMMU.


Certainly once we get to stuff like unpinned VFIO, having the ability to 
inspect pagetables for arbitrary IOMMU API usage will indeed be useful. 
From the DMA mapping perspective, though, unless you're working on the 
io-pgtable code itself it's not really going to tell you much that 
dumping the mappings from dma-debug can't already.


FWIW whenever I encounter that particular warning in iommu-dma context, 
I don't care where the existing mapping is pointing, since it's merely a 
symptom of the damage already having been done. At that point I'd 
usually go off and audit all the DMA API calls in the offending driver, 
since it's typically caused by corruption in the IOVA allocator from 
passing the wrong size in a dma_unmap_*() call, and those can often be 
spotted by inspection. For active debugging, what you really want to 
know is the *history* of operations around that IOVA, since you're 
primarily interested in the request that last mapped it, then the 
corresponding unmap request for nominally the same buffer (which allowed 
the IOVA region to be freed for reuse) that for some reason didn't cover 
one or more pages that it should have. The IOMMU API tracepoints can be 
a handy tool there.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Consult on ARM SMMU debugfs

2021-01-15 Thread Russell King - ARM Linux admin
On Mon, Jan 11, 2021 at 08:01:48PM +, Robin Murphy wrote:
> On 2021-01-07 02:45, chenxiang (M) wrote:
> > Hi Will,  Robin or other guys,
> > 
> > When debugging SMMU/SVA issue on huawei ARM64 board, we find that it
> > lacks of enough debugfs for ARM SMMU driver (such as
> > 
> > the value of STE/CD which we need to check sometimes). Currently it
> > creates top-level iommu directory in debugfs, but there is no debugfs
> > 
> > for ARM SMMU driver specially. Do you know whether ARM have the plan to
> > do that recently?
> 
> FWIW I don't think I've ever felt the need to need to inspect the Stream
> Table on a live system. So far the nature of the STE code has been simple
> enough that it's very hard for any given STE to be *wrong* - either it's set
> up as expected and thus works fine, or it's not initialised at all and you
> get C_BAD_STE, where 99% of the time you then just cross-reference the
> Stream ID against the firmware and find that the DT/IORT is wrong.
> 
> Similarly I don't think I've even even *seen* an issue that could be
> attributed to a context descriptor, although I appreciate that as we start
> landing more PASID and SVA support the scope for that starts to widen
> considerably.
> 
> Feel free to propose a patch if you believe it would be genuinely useful and
> won't just bit-rot into a maintenance burden, but it's not something that's
> on our roadmap here.

I do think that the IOMMU stuff needs better debugging. I've hit the
WARN_ON() in __arm_lpae_map(), and it's been pretty much undebuggable,
so I've resorted to putting the IOMMU into bypass mode permanently to
work around the issue.

The reason that it's undebuggable is if one puts printk() or trace
statements in the code, boots the platform, you get flooded with those
debugging messages, because every access to the rootfs generates and
tears down a mapping.

It would be nice to be able to inspect the IOMMU page tables and state
of the IOMMU, rather than having to resort to effectively disabling
the IOMMU.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Consult on ARM SMMU debugfs

2021-01-15 Thread chenxiang (M)



在 2021/1/12 4:01, Robin Murphy 写道:

On 2021-01-07 02:45, chenxiang (M) wrote:

Hi Will,  Robin or other guys,

When debugging SMMU/SVA issue on huawei ARM64 board, we find that it 
lacks of enough debugfs for ARM SMMU driver (such as


the value of STE/CD which we need to check sometimes). Currently it 
creates top-level iommu directory in debugfs, but there is no debugfs


for ARM SMMU driver specially. Do you know whether ARM have the plan 
to do that recently?


FWIW I don't think I've ever felt the need to need to inspect the 
Stream Table on a live system. So far the nature of the STE code has 
been simple enough that it's very hard for any given STE to be *wrong* 
- either it's set up as expected and thus works fine, or it's not 
initialised at all and you get C_BAD_STE, where 99% of the time you 
then just cross-reference the Stream ID against the firmware and find 
that the DT/IORT is wrong.


Similarly I don't think I've even even *seen* an issue that could be 
attributed to a context descriptor, although I appreciate that as we 
start landing more PASID and SVA support the scope for that starts to 
widen considerably.


Thank you for your reply.  I aggree that it is very hard for the content 
of STE/CD to be wrong in current code, but  there are more upsteaming 
features(such as SVA/vSMMU) which are related to SMMU,
when debugging with those features, it is useful for us to locate issues 
if there are interfaces to dump those info. And also when debugging 
together with hardware guys, the content of dump is important for them too.




Feel free to propose a patch if you believe it would be genuinely 
useful and won't just bit-rot into a maintenance burden, but it's not 
something that's on our roadmap here.


OK, we are considering about incorporating following requirements into 
the plan:

- Dump the value of STE/CD of devices
- Dump page table entries of devices
- Dump the entries info of CMDQ/EVENTQ



Thanks,
Robin.

.




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Consult on ARM SMMU debugfs

2021-01-11 Thread Robin Murphy

On 2021-01-07 02:45, chenxiang (M) wrote:

Hi Will,  Robin or other guys,

When debugging SMMU/SVA issue on huawei ARM64 board, we find that it 
lacks of enough debugfs for ARM SMMU driver (such as


the value of STE/CD which we need to check sometimes). Currently it 
creates top-level iommu directory in debugfs, but there is no debugfs


for ARM SMMU driver specially. Do you know whether ARM have the plan to 
do that recently?


FWIW I don't think I've ever felt the need to need to inspect the Stream 
Table on a live system. So far the nature of the STE code has been 
simple enough that it's very hard for any given STE to be *wrong* - 
either it's set up as expected and thus works fine, or it's not 
initialised at all and you get C_BAD_STE, where 99% of the time you then 
just cross-reference the Stream ID against the firmware and find that 
the DT/IORT is wrong.


Similarly I don't think I've even even *seen* an issue that could be 
attributed to a context descriptor, although I appreciate that as we 
start landing more PASID and SVA support the scope for that starts to 
widen considerably.


Feel free to propose a patch if you believe it would be genuinely useful 
and won't just bit-rot into a maintenance burden, but it's not something 
that's on our roadmap here.


Thanks,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Consult on ARM SMMU debugfs

2021-01-06 Thread chenxiang (M)

Hi Will,  Robin or other guys,

When debugging SMMU/SVA issue on huawei ARM64 board, we find that it 
lacks of enough debugfs for ARM SMMU driver (such as


the value of STE/CD which we need to check sometimes). Currently it 
creates top-level iommu directory in debugfs, but there is no debugfs


for ARM SMMU driver specially. Do you know whether ARM have the plan to 
do that recently?



Best regards,

Shawn


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu