Re: IOAT DMA w/IOMMU

2018-08-21 Thread Eric Pilmore
On Tue, Aug 21, 2018 at 4:53 PM, Logan Gunthorpe  wrote:
>
> CPUs don't always have good support for routing PCI P2P transactions.
> However, I would have expected a relatively new i7 CPU to support it
> well. It may simply be that this CPU does not have good support, though
> that comes as a bit of a surprise to me. Our plan for the P2P
> patch-set was to have a white-list for CPUs that work.
>
> If you can, it would be worth hooking up a PCI analyzer to see what's
> happening to the TLPs in both cases.

We do have one, but it will be a bit of a logistical effort to get it set-up
in this config. Will be a while before I can gather that data. Will forward
along when I do.

Thanks,
Eric


Re: IOAT DMA w/IOMMU

2018-08-21 Thread Logan Gunthorpe



On 21/08/18 05:45 PM, Eric Pilmore wrote:
> Well, the only difference between success and failure is running with the
> call to dma_map_resource for the destination address, which is a PCI BAR
> address. Prior to Kit introducing this call, we never created a mapping for
> the destination PCI BAR address and it worked fine on all systems when using
> PLX DMA. It was only when we went to a Xeon system and attempted to use
> IOAT DMA that we found we needed a mapping for that destination PCI BAR
> address.
> 
> The only thing the PLX driver does related to "mappings" is a call to
> dma_descriptor_unmap when the descriptor is freed; however, that is more
> of an administrative step to clean up the unmap-data data structure used
> when the mapping was originally established.

Ah, so there's a big difference here in hardware. Without the mapping
call you are going to essentially be doing a P2P transaction. So the
TLPs won't even hit the CPU. With the mapping call, the TLPs will now
go through the CPU and IOMMU.

CPUs don't always have good support for routing PCI P2P transactions.
However, I would have expected a relatively new i7 CPU to support it
well. It may simply be that this CPU does not have good support, though
that comes as a bit of a surprise to me. Our plan for the P2P
patch-set was to have a white-list for CPUs that work.

If you can, it would be worth hooking up a PCI analyzer to see what's
happening to the TLPs in both cases.

Logan


Re: IOAT DMA w/IOMMU

2018-08-21 Thread Eric Pilmore
On Tue, Aug 21, 2018 at 4:35 PM, Logan Gunthorpe  wrote:
>
>
> On 21/08/18 05:28 PM, Eric Pilmore wrote:
>>
>>
>> On Tue, Aug 21, 2018 at 4:20 PM, Logan Gunthorpe wrote:
>>
>>
>>
>> On 21/08/18 05:18 PM, Eric Pilmore wrote:
>> > We have been running locally with Kit's change for dma_map_resource and
>> > its incorporation in ntb_async_tx_submit for the destination address,
>> > and it runs fine under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4
>> > @ 2.40GHz) based system, regardless of whether the DMA engine being used
>> > is IOAT or a PLX device sitting in the PCIe tree. However, when we go
>> > back to an i7 (i7-7700K CPU @ 4.20GHz) based system it seems to run into
>> > issues, specifically when put under a load. In this case, just having a
>> > load using a single ping command with an interval=0, i.e. no delay
>> > between ping packets, after a few thousand packets the system just
>> > hangs. No panic or watchdogs. Note that in this scenario I can only use
>> > a PLX DMA engine.
>>
>> This is just my best guess, but it sounds to me like a bug in the PLX
>> DMA driver or hardware.
>>
>>
>> The PLX DMA driver?  But the PLX driver isn't really even involved in
>> the mapping stage.  Are you thinking maybe the stage at which the DMA
>> descriptor is freed and the PLX DMA driver does a dma_descriptor_unmap?
>
> Hmm, well what would make you think the hang is during
> mapping/unmapping?

Well, the only difference between success and failure is running with the
call to dma_map_resource for the destination address, which is a PCI BAR
address. Prior to Kit introducing this call, we never created a mapping for the
destination PCI BAR address and it worked fine on all systems when using
PLX DMA. It was only when we went to a Xeon system and attempted to use
IOAT DMA that we found we needed a mapping for that destination PCI BAR
address.

The only thing the PLX driver does related to "mappings" is a call to
dma_descriptor_unmap when the descriptor is freed; however, that is more
of an administrative step to clean up the unmap-data data structure used
when the mapping was originally established.

> I would expect a hang to be in handling completions
> from the DMA engine or something like that.
>
>> Again, PLX did not exhibit any issues on the Xeon system.
>
> Oh, I missed that. That puts a crinkle in my theory but, as you say, it
> could be a timing issue.
>
> Also, it's VERY strange that it would hang the entire system. That makes
> things very hard to debug...

Tell me about it!  ;-)

Eric


Re: IOAT DMA w/IOMMU

2018-08-21 Thread Logan Gunthorpe


On 21/08/18 05:28 PM, Eric Pilmore wrote:
> 
> 
> On Tue, Aug 21, 2018 at 4:20 PM, Logan Gunthorpe wrote:
> 
> 
> 
> On 21/08/18 05:18 PM, Eric Pilmore wrote:
> > We have been running locally with Kit's change for dma_map_resource and
> > its incorporation in ntb_async_tx_submit for the destination address,
> > and it runs fine under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4
> > @ 2.40GHz) based system, regardless of whether the DMA engine being used
> > is IOAT or a PLX device sitting in the PCIe tree. However, when we go
> > back to an i7 (i7-7700K CPU @ 4.20GHz) based system it seems to run into
> > issues, specifically when put under a load. In this case, just having a
> > load using a single ping command with an interval=0, i.e. no delay
> > between ping packets, after a few thousand packets the system just
> > hangs. No panic or watchdogs. Note that in this scenario I can only use
> > a PLX DMA engine.
> 
> This is just my best guess, but it sounds to me like a bug in the PLX
> DMA driver or hardware.
> 
> 
> The PLX DMA driver?  But the PLX driver isn't really even involved in
> the mapping stage.  Are you thinking maybe the stage at which the DMA
> descriptor is freed and the PLX DMA driver does a dma_descriptor_unmap?

Hmm, well what would make you think the hang is during
mapping/unmapping? I would expect a hang to be in handling completions
from the DMA engine or something like that.

> Again, PLX did not exhibit any issues on the Xeon system.

Oh, I missed that. That puts a crinkle in my theory but, as you say, it
could be a timing issue.

Also, it's VERY strange that it would hang the entire system. That makes
things very hard to debug...

Logan

Re: IOAT DMA w/IOMMU

2018-08-21 Thread Eric Pilmore
On Tue, Aug 21, 2018 at 4:20 PM, Logan Gunthorpe  wrote:
>
>
> On 21/08/18 05:18 PM, Eric Pilmore wrote:
>> We have been running locally with Kit's change for dma_map_resource and
>> its incorporation in ntb_async_tx_submit for the destination address,
>> and it runs fine under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4
>> @ 2.40GHz) based system, regardless of whether the DMA engine being used
>> is IOAT or a PLX device sitting in the PCIe tree. However, when we go
>> back to an i7 (i7-7700K CPU @ 4.20GHz) based system it seems to run into
>> issues, specifically when put under a load. In this case, just having a
>> load using a single ping command with an interval=0, i.e. no delay
>> between ping packets, after a few thousand packets the system just
>> hangs. No panic or watchdogs. Note that in this scenario I can only use
>> a PLX DMA engine.
>
> This is just my best guess, but it sounds to me like a bug in the PLX
> DMA driver or hardware.
>
> Logan

The PLX DMA driver?  But the PLX driver isn't really even involved in
the mapping stage.  Are you thinking maybe the stage at which the DMA
descriptor is freed and the PLX DMA driver does a dma_descriptor_unmap?

Again, PLX did not exhibit any issues on the Xeon system.

Eric



Re: IOAT DMA w/IOMMU

2018-08-21 Thread Logan Gunthorpe



On 21/08/18 05:18 PM, Eric Pilmore wrote:
> We have been running locally with Kit's change for dma_map_resource and
> its incorporation in ntb_async_tx_submit for the destination address,
> and it runs fine under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4
> @ 2.40GHz) based system, regardless of whether the DMA engine being used
> is IOAT or a PLX device sitting in the PCIe tree. However, when we go
> back to an i7 (i7-7700K CPU @ 4.20GHz) based system it seems to run into
> issues, specifically when put under a load. In this case, just having a
> load using a single ping command with an interval=0, i.e. no delay
> between ping packets, after a few thousand packets the system just
> hangs. No panic or watchdogs. Note that in this scenario I can only use
> a PLX DMA engine.

This is just my best guess, but it sounds to me like a bug in the PLX
DMA driver or hardware.

Logan


Re: IOAT DMA w/IOMMU

2018-08-21 Thread Eric Pilmore
On Thu, Aug 16, 2018 at 11:56 AM, Logan Gunthorpe  wrote:
>
>
>
> On 16/08/18 12:53 PM, Kit Chow wrote:
> >
> >
> > On 08/16/2018 10:21 AM, Logan Gunthorpe wrote:
> >>
> >> On 16/08/18 11:16 AM, Kit Chow wrote:
> >>> I only have access to intel hosts for testing (and possibly an AMD
> >>> host currently collecting dust) and am not sure how to go about getting
> >>> the proper test coverage for other architectures.
> >> Well, I thought you were only changing the Intel IOMMU implementation...
> >> So testing on Intel hardware seems fine to me.
> > For the ntb change, I wasn't sure if there was some other arch where
> > ntb_async_tx_submit would work ok with the pci bar address but
> > would break with the dma map (assuming map_resource has been implemented).
> > If it's all x86 at the moment, I guess I'm good to go. Please confirm.
>
> I expect lots of arches have issues with dma_map_resource(), it's pretty
> new and if it didn't work correctly on x86 I imagine many other arches
> are wrong too. If an arch doesn't work, someone will have to fix that
> arch's IOMMU implementation. But using dma_map_resource() in
> ntb_transport is definitely correct compared to what was done before.
>
> Logan


We have been running locally with Kit's change for dma_map_resource and its
incorporation in ntb_async_tx_submit for the destination address, and it
runs fine under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4 @ 2.40GHz)
based system, regardless of whether the DMA engine being used is IOAT or a
PLX device sitting in the PCIe tree. However, when we go back to an i7
(i7-7700K CPU @ 4.20GHz) based system it seems to run into issues,
specifically when put under a load. In this case, just having a load using
a single ping command with an interval=0, i.e. no delay between ping
packets, after a few thousand packets the system just hangs. No panic or
watchdogs. Note that in this scenario I can only use a PLX DMA engine.
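
(For reference, the mapping in question amounts to something like the
sketch below; "dest_phys" and "len" are illustrative names, not the
actual patch.)

    /*
     * Sketch only: in ntb_async_tx_submit(), map the destination (the
     * peer memory window's PCI BAR address) for the DMA device instead
     * of handing the engine a raw bus address.
     */
    struct dma_device *device = chan->device;
    dma_addr_t dest;

    dest = dma_map_resource(device->dev, dest_phys, len,
                            DMA_FROM_DEVICE, 0);
    if (dma_mapping_error(device->dev, dest))
        goto err;               /* fall back to a CPU copy */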

Just wondering if anybody might have a clue as to why the i7 system might run
into issues but a Xeon system does not. IOMMU differences? My gut reaction is that
the problem may not necessarily be IOMMU hardware related, but perhaps simply
a timing issue given the difference in cpu speeds between these systems and
maybe we're hitting some window in the Intel IOMMU mapping/unmapping logic.

Thanks,
Eric


Re: IOAT DMA w/IOMMU

2018-08-16 Thread Logan Gunthorpe



On 16/08/18 12:53 PM, Kit Chow wrote:
> 
> 
> On 08/16/2018 10:21 AM, Logan Gunthorpe wrote:
>>
>> On 16/08/18 11:16 AM, Kit Chow wrote:
>>> I only have access to intel hosts for testing (and possibly an AMD
>>> host currently collecting dust) and am not sure how to go about getting
>>> the proper test coverage for other architectures.
>> Well, I thought you were only changing the Intel IOMMU implementation...
>> So testing on Intel hardware seems fine to me.
> For the ntb change, I wasn't sure if there was some other arch where
> ntb_async_tx_submit would work ok with the pci bar address but
> would break with the dma map (assuming map_resource has been implemented).
> If it's all x86 at the moment, I guess I'm good to go. Please confirm.

I expect lots of arches have issues with dma_map_resource(), it's pretty
new and if it didn't work correctly on x86 I imagine many other arches
are wrong too. If an arch doesn't work, someone will have to fix that
arch's IOMMU implementation. But using dma_map_resource() in
ntb_transport is definitely correct compared to what was done before.

Logan


Re: IOAT DMA w/IOMMU

2018-08-16 Thread Kit Chow



On 08/16/2018 10:21 AM, Logan Gunthorpe wrote:


On 16/08/18 11:16 AM, Kit Chow wrote:

I only have access to intel hosts for testing (and possibly an AMD
host currently collecting dust) and am not sure how to go about getting
the proper test coverage for other architectures.

Well, I thought you were only changing the Intel IOMMU implementation...
So testing on Intel hardware seems fine to me.

For the ntb change, I wasn't sure if there was some other arch where
ntb_async_tx_submit would work ok with the pci bar address but
would break with the dma map (assuming map_resource has been implemented).
If it's all x86 at the moment, I guess I'm good to go. Please confirm.

Thanks
Kit


Any suggestions? Do you anticipate any issues with dma mapping the pci bar
address in ntb_async_tx_submit on non-x86 archs?  Do you know of folks
who have ntb test setups with non-x86 hosts who might be able to help?

I expect other IOMMUs will need to be updated to properly support
dma_map_resource but that will probably need to be done on a case by
case basis as the need arises. Unfortunately it seems the work to add
dma_map_resource() didn't really consider that the IOMMUs had to have
proper support and probably should have errored out instead of just
passing along the physical address if the IOMMU didn't have support.

I don't know of anyone with an NTB setup on anything but x86 at the moment.

Logan



Re: IOAT DMA w/IOMMU

2018-08-16 Thread Logan Gunthorpe


On 16/08/18 11:16 AM, Kit Chow wrote:
> I only have access to intel hosts for testing (and possibly an AMD
> host currently collecting dust) and am not sure how to go about getting
> the proper test coverage for other architectures.

Well, I thought you were only changing the Intel IOMMU implementation...
So testing on Intel hardware seems fine to me.

> Any suggestions? Do you anticipate any issues with dma mapping the pci bar
> address in ntb_async_tx_submit on non-x86 archs?  Do you know of folks
> who have ntb test setups with non-x86 hosts who might be able to help?

I expect other IOMMUs will need to be updated to properly support
dma_map_resource but that will probably need to be done on a case by
case basis as the need arises. Unfortunately it seems the work to add
dma_map_resource() didn't really consider that the IOMMUs had to have
proper support and probably should have errored out instead of just
passing along the physical address if the IOMMU didn't have support.
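
A cheap guard along those lines might look like this (sketch;
dev_has_map_resource is a hypothetical helper, not an existing API):

    #include <linux/dma-mapping.h>

    /*
     * Hypothetical helper (sketch): lets a caller refuse early when the
     * arch has no real map_resource implementation, instead of silently
     * DMA-ing to an untranslated physical address.
     */
    static bool dev_has_map_resource(struct device *dev)
    {
        const struct dma_map_ops *ops = get_dma_ops(dev);

        return ops && ops->map_resource;
    }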

I don't know of anyone with an NTB setup on anything but x86 at the moment.

Logan

Re: IOAT DMA w/IOMMU

2018-08-16 Thread Kit Chow


On 08/09/2018 02:36 PM, Logan Gunthorpe wrote:


On 09/08/18 03:31 PM, Eric Pilmore wrote:

On Thu, Aug 9, 2018 at 12:35 PM, Logan Gunthorpe  wrote:

Hey,

On 09/08/18 12:51 PM, Eric Pilmore wrote:

Was wondering if anybody here has used IOAT DMA engines with an
IOMMU turned on (Xeon based system)? My specific question is really
whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
destination without having to map that address to the IOVA space of
the DMA engine first (assuming the IOMMU is on)?

I haven't tested this scenario but my guess would be that IOAT would
indeed go through the IOMMU and the PCI BAR address would need to be
properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
is probably a good indication that this is the case. I really don't know
why you'd want to DMA something without mapping it.

The thought was to avoid paying the price of having to go through yet another
translation and also because it was not believed to be necessary anyway since
the DMA device could go straight to a PCI BAR address without the need for a
mapping.  We have been playing with two DMA engines, IOAT and PLX. The
PLX does not have any issues going straight to the PCI BAR address, but unlike
IOAT, PLX is sitting "in the PCI tree".

Yes, you can do true P2P transactions with the PLX DMA engine. (Though
you do have to be careful as technically you need to translate to a PCI
bus address not the CPU physical address which are the same on x86; and
you need to watch that ACS doesn't mess it up).
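
(For reference, that bus-address translation exists for BARs; a
one-line sketch, with the BAR index as an illustrative value:)

    /* Bus address of BAR 2; may differ from the CPU physical address
     * on systems with a host-bridge offset (they match on x86). */
    dma_addr_t bus_addr = pci_bus_address(pdev, 2);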

It doesn't sound like it's possible to avoid the IOMMU when using the
IOAT so you need the mapping. I would not expect the extra translation
in the IOMMU to be noticeable, performance wise.

Logan

Hi Logan,

Based on Dave Jiang's suggestion, I am looking into upstreaming the
ntb change to dma map the pci bar address in ntb_async_tx_submit()
(along with the intel iommu change to support dma_map_resource).

I only have access to intel hosts for testing (and possibly an AMD
host currently collecting dust) and am not sure how to go about getting
the proper test coverage for other architectures.

Any suggestions? Do you anticipate any issues with dma mapping the pci bar
address in ntb_async_tx_submit on non-x86 archs?  Do you know of folks
who have ntb test setups with non-x86 hosts who might be able to help?

Thanks
Kit



Re: IOAT DMA w/IOMMU

2018-08-14 Thread Robin Murphy

On 14/08/18 00:50, Logan Gunthorpe wrote:

On 13/08/18 05:48 PM, Kit Chow wrote:

On 08/13/2018 04:39 PM, Logan Gunthorpe wrote:


On 13/08/18 05:30 PM, Kit Chow wrote:

In arch/x86/include/asm/page.h, there is the following comment in
regards to validating the virtual address.

/*
   * virt_to_page(kaddr) returns a valid pointer if and only if
   * virt_addr_valid(kaddr) returns true.
   */
#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)

So it looks like the validation by virt_addr_valid was somehow dropped
from the virt_to_page code path. Does anyone have any ideas what
happened to it?

I don't think it was ever validated (though I haven't been around long
enough to say for certain). What the comment is saying is that you
shouldn't rely on virt_to_page() unless you know virt_addr_valid() is
true (which in most cases you can without the extra check). virt_to_page
is meant to be really fast so adding an extra validation would probably
be a significant performance regression for the entire kernel.

The fact that this can happen through dma_map_single() is non-ideal at
best. It assumes the caller is mapping regular memory and doesn't check
this at all. It may make sense to fix that but I think people expect
dma_map_single() to be as fast as possible as well...


dma_map_single() is already documented as only supporting lowmem (for 
which virt_to_page() can be assumed to be valid). You might get away 
with feeding it bogus addresses on x86, but on non-coherent 
architectures which convert the page back to a virtual address to 
perform cache maintenance you can expect that to crash and burn rapidly.


There may be some minimal-overhead sanity checking of fundamentals, but 
in general it's not really the DMA API's job to police its callers 
exhaustively; consider that the mm layer doesn't go out of its way to 
stop you from doing things like "kfree(kfree);" either.





Perhaps include the validation with some debug turned on?


The problem is how often do you develop code with any of the debug
config options turned on?

There's already a couple of BUG_ONs in dma_map_single so maybe another
one with virt_addr_valid wouldn't be so bad.


Note that virt_addr_valid() may be pretty heavyweight in itself. For 
example the arm64 implementation involves memblock_search(); that really 
isn't viable in a DMA mapping fastpath.


Robin.

Re: IOAT DMA w/IOMMU

2018-08-14 Thread Kit Chow



On 08/13/2018 04:50 PM, Logan Gunthorpe wrote:


On 13/08/18 05:48 PM, Kit Chow wrote:

On 08/13/2018 04:39 PM, Logan Gunthorpe wrote:

On 13/08/18 05:30 PM, Kit Chow wrote:

In arch/x86/include/asm/page.h, there is the following comment in
regards to validating the virtual address.

/*
   * virt_to_page(kaddr) returns a valid pointer if and only if
   * virt_addr_valid(kaddr) returns true.
   */
#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)

So it looks like the validation by virt_addr_valid was somehow dropped
from the virt_to_page code path. Does anyone have any ideas what
happened to it?

I don't think it was ever validated (though I haven't been around long
enough to say for certain). What the comment is saying is that you
shouldn't rely on virt_to_page() unless you know virt_addr_valid() is
true (which in most cases you can without the extra check). virt_to_page
is meant to be really fast so adding an extra validation would probably
be a significant performance regression for the entire kernel.

The fact that this can happen through dma_map_single() is non-ideal at
best. It assumes the caller is mapping regular memory and doesn't check
this at all. It may make sense to fix that but I think people expect
dma_map_single() to be as fast as possible as well...


Perhaps include the validation with some debug turned on?

The problem is how often do you develop code with any of the debug
config options turned on?

There's already a couple of BUG_ONs in dma_map_single so maybe another
one with virt_addr_valid wouldn't be so bad.
Had my very first Linux crash on the dma direction BUG_ON when I tried 
DMA_NONE :).



Re: IOAT DMA w/IOMMU

2018-08-13 Thread Logan Gunthorpe


On 13/08/18 05:48 PM, Kit Chow wrote:
> On 08/13/2018 04:39 PM, Logan Gunthorpe wrote:
>>
>> On 13/08/18 05:30 PM, Kit Chow wrote:
>>> In arch/x86/include/asm/page.h, there is the following comment in
>>> regards to validating the virtual address.
>>>
>>> /*
>>>   * virt_to_page(kaddr) returns a valid pointer if and only if
>>>   * virt_addr_valid(kaddr) returns true.
>>>   */
>>> #define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
>>>
>>> So it looks like the validation by virt_addr_valid was somehow dropped
>>> from the virt_to_page code path. Does anyone have any ideas what
>> happened to it?
>> I don't think it was ever validated (though I haven't been around long
>> enough to say for certain). What the comment is saying is that you
>> shouldn't rely on virt_to_page() unless you know virt_addr_valid() is
>> true (which in most cases you can without the extra check). virt_to_page
>> is meant to be really fast so adding an extra validation would probably
>> be a significant performance regression for the entire kernel.
>>
>> The fact that this can happen through dma_map_single() is non-ideal at
>> best. It assumes the caller is mapping regular memory and doesn't check
>> this at all. It may make sense to fix that but I think people expect
>> dma_map_single() to be as fast as possible as well...
>>
> Perhaps include the validation with some debug turned on?

The problem is how often do you develop code with any of the debug
config options turned on?

There's already a couple of BUG_ONs in dma_map_single so maybe another
one with virt_addr_valid wouldn't be so bad.
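
Concretely, something like this (sketch; the first check is the one
that already exists in dma_map_single_attrs()):

    BUG_ON(!valid_dma_direction(dir));      /* existing check */
    BUG_ON(!virt_addr_valid(ptr));          /* proposed extra check */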

Logan

Re: IOAT DMA w/IOMMU

2018-08-13 Thread Kit Chow



On 08/13/2018 04:39 PM, Logan Gunthorpe wrote:


On 13/08/18 05:30 PM, Kit Chow wrote:

In arch/x86/include/asm/page.h, there is the following comment in
regards to validating the virtual address.

/*
  * virt_to_page(kaddr) returns a valid pointer if and only if
  * virt_addr_valid(kaddr) returns true.
  */
#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)

So it looks like the validation by virt_addr_valid was somehow dropped
from the virt_to_page code path. Does anyone have any ideas what
happened to it?

I don't think it was ever validated (though I haven't been around long
enough to say for certain). What the comment is saying is that you
shouldn't rely on virt_to_page() unless you know virt_addr_valid() is
true (which in most cases you can without the extra check). virt_to_page
is meant to be really fast so adding an extra validation would probably
be a significant performance regression for the entire kernel.

The fact that this can happen through dma_map_single() is non-ideal at
best. It assumes the caller is mapping regular memory and doesn't check
this at all. It may make sense to fix that but I think people expect
dma_map_single() to be as fast as possible as well...


Perhaps include the validation with some debug turned on?


Re: IOAT DMA w/IOMMU

2018-08-13 Thread Logan Gunthorpe


On 13/08/18 05:30 PM, Kit Chow wrote:
> In arch/x86/include/asm/page.h, there is the following comment in
> regards to validating the virtual address.
> 
> /*
>  * virt_to_page(kaddr) returns a valid pointer if and only if
>  * virt_addr_valid(kaddr) returns true.
>  */
> #define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
> 
> So it looks like the validation by virt_addr_valid was somehow dropped
> from the virt_to_page code path. Does anyone have any ideas what
> happened to it?

I don't think it was ever validated (though I haven't been around long
enough to say for certain). What the comment is saying is that you
shouldn't rely on virt_to_page() unless you know virt_addr_valid() is
true (which in most cases you can without the extra check). virt_to_page
is meant to be really fast so adding an extra validation would probably
be a significant performance regression for the entire kernel.

The fact that this can happen through dma_map_single() is non-ideal at
best. It assumes the caller is mapping regular memory and doesn't check
this at all. It may make sense to fix that but I think people expect
dma_map_single() to be as fast as possible as well...

Logan

Re: IOAT DMA w/IOMMU

2018-08-13 Thread Kit Chow


Taking a step back, I was a little surprised that dma_map_single
successfully returned an iommu address for the pci bar address passed
into it during my initial experiment...

Internally, dma_map_single calls virt_to_page() to translate the
"virtual address" into a page, and intel_map_page then calls
page_to_phys() to convert the page to a dma_addr_t.

The virt_to_page and page_to_phys routines don't appear to do any
validation and just use arithmetic to do the conversions.

The pci bar address (0x383c70e51578) does fall into a valid VA range in
x86_64, so it could conceivably be a valid VA.  So I tried a virtual
address inside the VA hole and it too returned without any errors.

virt_to_page(0x8000) -> 0xede0
page_to_phys(0xede0) -> 0xf800

In arch/x86/include/asm/page.h, there is the following comment in regards to
validating the virtual address.

/*
 * virt_to_page(kaddr) returns a valid pointer if and only if
 * virt_addr_valid(kaddr) returns true.
 */
#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)

So it looks like the validation by virt_addr_valid was somehow dropped
from the virt_to_page code path. Does anyone have any ideas what
happened to it?
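
(A guarded variant per that comment would be something like the sketch
below; virt_to_page_checked is a made-up name:)

    /* Sketch: honor the page.h comment explicitly before converting. */
    static struct page *virt_to_page_checked(const void *kaddr)
    {
        if (!virt_addr_valid(kaddr))
            return NULL;        /* not a valid lowmem kernel address */
        return virt_to_page(kaddr);
    }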

Kit

On 08/13/2018 08:21 AM, Kit Chow wrote:



On 08/13/2018 07:59 AM, Robin Murphy wrote:

On 13/08/18 15:23, Kit Chow wrote:

On 08/10/2018 07:10 PM, Logan Gunthorpe wrote:


On 10/08/18 06:53 PM, Kit Chow wrote:
I was able to finally succeed in doing the dma transfers over ioat only
when prot has DMA_PTE_WRITE set by setting the direction to either
DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings
need to be changed? Are there any bad side effects if I used
DMA_BIDIRECTIONAL?

Good to hear it. Without digging into the direction much all I can say
is that it can sometimes be very confusing what the direction is. Adding
another PCI device just adds to the confusion.

Yep, confusing :).

=================  =============================================
DMA_NONE           no direction (used for debugging)
DMA_TO_DEVICE      data is going from the memory to the device
DMA_FROM_DEVICE    data is coming from the device to the memory
DMA_BIDIRECTIONAL  direction isn't known
=================  =============================================

I believe, the direction should be from the IOAT's point of view. So if
the IOAT is writing to the BAR you'd set DMA_FROM_DEVICE (ie. data is
coming from the IOAT) and if it's reading you'd set DMA_TO_DEVICE (ie.
data is going to the IOAT).
It would certainly seem like DMA_TO_DEVICE would be the proper 
choice; IOAT is the plumbing to move host data (memory) to the bar 
address (device).


Except that the "device" in question is the IOAT itself (more 
generally, it means the device represented by the first argument to 
dma_map_*() - the one actually emitting the reads and writes). The 
context of a DMA API call is the individual mapping in question, not 
whatever overall operation it may be part of - your example already 
involves two separate mappings: one "from" system memory "to" the DMA 
engine, and one "from" the DMA engine "to" PCI BAR memory.


OK, that makes sense.  The middleman (aka DMA engine device) is the 
key in the to/from puzzle. Thanks!





Note that the DMA API's dma_direction is also distinct from the 
dmaengine API's dma_transfer_direction, and there's plenty of fun to 
be had mapping between the two - see pl330.c or rcar-dmac.c for other 
examples of dma_map_resource() for slave devices - no guarantees that 
those implementations are entirely correct (especially the one I 
did!), but in practice they do make the "DMA engine behind an IOMMU" 
case work for UARTs and similar straightforward slaves.



Will go with what works and set DMA_FROM_DEVICE.

In ntb_async_tx_submit, does the direction used for the dma_map 
routines for the src and dest addresses need to be consistent?


In general, the mappings of source and destination addresses would 
typically have opposite directions as above, unless they're both 
bidirectional.


And does the direction setting for the dmaengine_unmap_data have to 
be consistent with the direction used in dma_map_*?


Yes, the arguments to an unmap are expected to match whatever was 
passed to the corresponding map call. CONFIG_DMA_API_DEBUG should 
help catch any mishaps.


Robin.

BTW, dmaengine_unmap routine only calls dma_unmap_page. Should it 
keep track of the dma_map routine used and call the corresponding 
dma_unmap routine?  In the case of the intel iommu, it doesn't matter.


Thanks
Kit



Using DMA_BIDIRECTIONAL just forgoes any hardware security / protection
that the buffer would have in terms of direction. Generally it's good
practice to use the strictest direction you can.


Given that using the pci bar address as is without getting an iommu
address results in the same "PTE Write access" error, I wonder if there
is some internal 'prot' associated with the non-translated pci bar
address that just needs to be tweaked to include DMA_PTE_WRITE???

No, I don't think so. The 'prot' will be a property of the IOMMU. Not
having an entry is probably just the same (from the perspective of the
error you see) as only having an entry for reading.

Logan


Re: IOAT DMA w/IOMMU

2018-08-13 Thread Kit Chow



On 08/13/2018 07:59 AM, Robin Murphy wrote:

On 13/08/18 15:23, Kit Chow wrote:

On 08/10/2018 07:10 PM, Logan Gunthorpe wrote:


On 10/08/18 06:53 PM, Kit Chow wrote:
I was able to finally succeed in doing the dma transfers over ioat only
when prot has DMA_PTE_WRITE set by setting the direction to either
DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings
need to be changed? Are there any bad side effects if I used
DMA_BIDIRECTIONAL?

Good to hear it. Without digging into the direction much all I can say
is that it can sometimes be very confusing what the direction is. Adding
another PCI device just adds to the confusion.

Yep, confusing :).

=================  =============================================
DMA_NONE           no direction (used for debugging)
DMA_TO_DEVICE      data is going from the memory to the device
DMA_FROM_DEVICE    data is coming from the device to the memory
DMA_BIDIRECTIONAL  direction isn't known
=================  =============================================


I believe, the direction should be from the IOAT's point of view. So if
the IOAT is writing to the BAR you'd set DMA_FROM_DEVICE (ie. data is
coming from the IOAT) and if it's reading you'd set DMA_TO_DEVICE (ie.
data is going to the IOAT).
It would certainly seem like DMA_TO_DEVICE would be the proper 
choice; IOAT is the plumbing to move host data (memory) to the bar 
address (device).


Except that the "device" in question is the IOAT itself (more 
generally, it means the device represented by the first argument to 
dma_map_*() - the one actually emitting the reads and writes). The 
context of a DMA API call is the individual mapping in question, not 
whatever overall operation it may be part of - your example already 
involves two separate mappings: one "from" system memory "to" the DMA 
engine, and one "from" the DMA engine "to" PCI BAR memory.


OK, that makes sense.  The middleman (aka DMA engine device) is the key 
in the to/from puzzle. Thanks!





Note that the DMA API's dma_direction is also distinct from the 
dmaengine API's dma_transfer_direction, and there's plenty of fun to 
be had mapping between the two - see pl330.c or rcar-dmac.c for other 
examples of dma_map_resource() for slave devices - no guarantees that 
those implementations are entirely correct (especially the one I 
did!), but in practice they do make the "DMA engine behind an IOMMU" 
case work for UARTs and similar straightforward slaves.



Will go with what works and set DMA_FROM_DEVICE.

In ntb_async_tx_submit, does the direction used for the dma_map 
routines for the src and dest addresses need to be consistent?


In general, the mappings of source and destination addresses would 
typically have opposite directions as above, unless they're both 
bidirectional.


And does the direction setting for the dmaengine_unmap_data have to 
be consistent with the direction used in dma_map_*?


Yes, the arguments to an unmap are expected to match whatever was 
passed to the corresponding map call. CONFIG_DMA_API_DEBUG should help 
catch any mishaps.


Robin.

BTW, dmaengine_unmap routine only calls dma_unmap_page. Should it 
keep track of the dma_map routine used and call the corresponding 
dma_unmap routine?  In the case of the intel iommu, it doesn't matter.


Thanks
Kit



Using DMA_BIDIRECTIONAL just forgoes any hardware security / protection
that the buffer would have in terms of direction. Generally it's good
practice to use the strictest direction you can.


Given that using the pci bar address as is without getting an iommu
address results in the same "PTE Write access" error, I wonder if there
is some internal 'prot' associated with the non-translated pci bar
address that just needs to be tweaked to include DMA_PTE_WRITE???

No, I don't think so. The 'prot' will be a property of the IOMMU. Not
having an entry is probably just the same (from the perspective of the
error you see) as only having an entry for reading.

Logan



Re: IOAT DMA w/IOMMU

2018-08-13 Thread Robin Murphy

On 13/08/18 15:23, Kit Chow wrote:

On 08/10/2018 07:10 PM, Logan Gunthorpe wrote:


On 10/08/18 06:53 PM, Kit Chow wrote:

I was able to finally succeed in doing the dma transfers over ioat only
when prot has DMA_PTE_WRITE set by setting the direction to either
DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings
need to be changed? Are there any bad side effects if I used
DMA_BIDIRECTIONAL?

Good to hear it. Without digging into the direction much all I can say
is that it can sometimes be very confusing what the direction is. Adding
another PCI device just adds to the confusion.

Yep, confusing :).

=================  =============================================
DMA_NONE           no direction (used for debugging)
DMA_TO_DEVICE      data is going from the memory to the device
DMA_FROM_DEVICE    data is coming from the device to the memory
DMA_BIDIRECTIONAL  direction isn't known
=================  =============================================


I believe, the direction should be from the IOAT's point of view. So if
the IOAT is writing to the BAR you'd set DMA_FROM_DEVICE (ie. data is
coming from the IOAT) and if it's reading you'd set DMA_TO_DEVICE (ie.
data is going to the IOAT).
It would certainly seem like DMA_TO_DEVICE would be the proper choice; 
IOAT is the plumbing to move host data (memory) to the bar address 
(device).


Except that the "device" in question is the IOAT itself (more generally, 
it means the device represented by the first argument to dma_map_*() - 
the one actually emitting the reads and writes). The context of a DMA 
API call is the individual mapping in question, not whatever overall 
operation it may be part of - your example already involves two separate 
mappings: one "from" system memory "to" the DMA engine, and one "from" 
the DMA engine "to" PCI BAR memory.
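
(In code terms, sketched with illustrative names and the DMA engine as
the mapping device:)

    /*
     * Sketch: a memory-to-BAR copy is two mappings, each direction
     * taken from the DMA engine's point of view.
     */
    src = dma_map_single(dma_dev, src_buf, len,
                         DMA_TO_DEVICE);        /* engine reads memory */
    dst = dma_map_resource(dma_dev, bar_phys, len,
                           DMA_FROM_DEVICE, 0); /* engine writes the BAR */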


Note that the DMA API's dma_direction is also distinct from the 
dmaengine API's dma_transfer_direction, and there's plenty of fun to be 
had mapping between the two - see pl330.c or rcar-dmac.c for other 
examples of dma_map_resource() for slave devices - no guarantees that 
those implementations are entirely correct (especially the one I did!), 
but in practice they do make the "DMA engine behind an IOMMU" case work 
for UARTs and similar straightforward slaves.



Will go with what works and set DMA_FROM_DEVICE.

In ntb_async_tx_submit, does the direction used for the dma_map routines 
for the src and dest addresses need to be consistent?


In general, the mappings of source and destination addresses would 
typically have opposite directions as above, unless they're both 
bidirectional.


And does the direction setting for the dmaengine_unmap_data have to be 
consistent with the direction used in dma_map_*?


Yes, the arguments to an unmap are expected to match whatever was passed 
to the corresponding map call. CONFIG_DMA_API_DEBUG should help catch 
any mishaps.


Robin.

BTW, dmaengine_unmap routine only calls dma_unmap_page. Should it keep 
track of the dma_map routine used and call the corresponding dma_unmap 
routine?  In the case of the intel iommu, it doesn't matter.


Thanks
Kit



Using DMA_BIDIRECTIONAL just forgoes any hardware security / protection
that the buffer would have in terms of direction. Generally it's good
practice to use the strictest direction you can.


Given that using the pci bar address as is without getting an iommu
address results in the same "PTE Write access" error, I wonder if there
is some internal 'prot' associated with the non-translated pci bar
address that just needs to be tweaked to include DMA_PTE_WRITE???

No, I don't think so. The 'prot' will be a property of the IOMMU. Not
having an entry is probably just the same (from the perspective of the
error you see) as only having an entry for reading.

Logan



Re: IOAT DMA w/IOMMU

2018-08-13 Thread Kit Chow



On 08/10/2018 07:10 PM, Logan Gunthorpe wrote:


On 10/08/18 06:53 PM, Kit Chow wrote:

I was able to finally succeed in doing the dma transfers over ioat only
when prot has DMA_PTE_WRITE set by setting the direction to either
DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings
need to be changed? Are there any bad side effects if I used
DMA_BIDIRECTIONAL?

Good to hear it. Without digging into the direction much all I can say
is that it can sometimes be very confusing what the direction is. Adding
another PCI device just adds to the confusion.

Yep, confusing :).

=================  =============================================
DMA_NONE           no direction (used for debugging)
DMA_TO_DEVICE      data is going from the memory to the device
DMA_FROM_DEVICE    data is coming from the device to the memory
DMA_BIDIRECTIONAL  direction isn't known
=================  =============================================


I believe, the direction should be from the IOAT's point of view. So if
the IOAT is writing to the BAR you'd set DMA_FROM_DEVICE (ie. data is
coming from the IOAT) and if it's reading you'd set DMA_TO_DEVICE (ie.
data is going to the IOAT).
It would certainly seem like DMA_TO_DEVICE would be the proper choice; 
IOAT is the plumbing to move host data (memory) to the bar address (device).


Will go with what works and set DMA_FROM_DEVICE.

In ntb_async_tx_submit, does the direction used for the dma_map routines 
for the src and dest addresses need to be consistent?


And does the direction setting for the dmaengine_unmap_data have to be 
consistent with the direction used in dma_map_*?


BTW, dmaengine_unmap routine only calls dma_unmap_page. Should it keep 
track of the dma_map routine used and call the corresponding dma_unmap 
routine?  In the case of the intel iommu, it doesn't matter.
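
(For reference, the bookkeeping in question looks roughly like this;
an illustrative sketch, not the actual ntb_transport code:)

    struct dmaengine_unmap_data *unmap;

    unmap = dmaengine_get_unmap_data(device->dev, 2, GFP_NOWAIT);
    if (!unmap)
        return -ENOMEM;

    unmap->len = len;
    unmap->addr[0] = src_dma;   /* was mapped with dma_map_page() */
    unmap->to_cnt = 1;
    unmap->addr[1] = dest_dma;  /* was mapped with dma_map_resource() */
    unmap->from_cnt = 1;

    /* ... descriptor prep and submit elided ... */

    dmaengine_unmap_put(unmap); /* unmaps both via dma_unmap_page() */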


Thanks
Kit



Using DMA_BIDIRECTIONAL just forgoes any hardware security / protection
that the buffer would have in terms of direction. Generally it's good
practice to use the strictest direction you can.


Given that using the pci bar address as is without getting an iommu
address results in the same "PTE Write access" error, I wonder if there
is some internal 'prot' associated with the non-translated pci bar
address that just needs to be tweaked to include DMA_PTE_WRITE???

No, I don't think so. The 'prot' will be a property of the IOMMU. Not
having an entry is probably just the same (from the perspective of the
error you see) as only having an entry for reading.

Logan



Re: IOAT DMA w/IOMMU

2018-08-10 Thread Logan Gunthorpe



On 10/08/18 06:53 PM, Kit Chow wrote:
> I was able to finally succeed in doing the dma transfers over ioat only
> when prot has DMA_PTE_WRITE set by setting the direction to either
> DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings
> need to be changed? Are there any bad side effects if I used
> DMA_BIDIRECTIONAL?

Good to hear it. Without digging into the direction much all I can say
is that it can sometimes be very confusing what the direction is. Adding
another PCI device just adds to the confusion.

I believe, the direction should be from the IOAT's point of view. So if
the IOAT is writing to the BAR you'd set DMA_FROM_DEVICE (ie. data is
coming from the IOAT) and if it's reading you'd set DMA_TO_DEVICE (ie.
data is going to the IOAT).

Using DMA_BIDIRECTIONAL just forgoes any hardware security / protection
that the buffer would have in terms of direction. Generally it's good
practice to use the strictest direction you can.

> Given that using the pci bar address as is without getting an iommu
> address results in the same "PTE Write access" error, I wonder if there
> is some internal 'prot' associated with the non-translated pci bar
> address that just needs to be tweaked to include DMA_PTE_WRITE???

No, I don't think so. The 'prot' will be a property of the IOMMU. Not
having an entry is probably just the same (from the perspective of the
error you see) as only having an entry for reading.

Logan


Re: IOAT DMA w/IOMMU

2018-08-10 Thread Kit Chow

Success!

I've implemented a new intel_map_resource (and intel_unmap_resource) 
routine which is called by dma_map_resource. As mentioned previously, 
the primary job of dma_map_resource/intel_map_resource is to call the 
intel iommu internal mapping routine (__intel_map_single) without 
translating the pci bar address into a page and then back to a phys addr 
as dma_map_page and dma_map_single are doing.
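
Roughly along these lines (a sketch of the shape of the change, not the
exact patch):

    static dma_addr_t intel_map_resource(struct device *dev,
                                         phys_addr_t phys_addr, size_t size,
                                         enum dma_data_direction dir,
                                         unsigned long attrs)
    {
        /* No page round-trip: hand the phys_addr_t straight through. */
        return __intel_map_single(dev, phys_addr, size, dir,
                                  *dev->dma_mask);
    }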


With the new dma_map_resource routine mapping the pci bar address in 
ntb_async_tx_submit, I was still getting the "PTE Write access is
not set" error.


The __intel_map_single routine sets prot based on the DMA direction.

    if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL ||
        !cap_zlr(iommu->cap))
            prot |= DMA_PTE_READ;
    if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
            prot |= DMA_PTE_WRITE;

I was able to finally succeed in doing the dma transfers over ioat only 
when prot has DMA_PTE_WRITE set by setting the direction to either 
DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings 
need to be changed? Are there any bad side effects if I used 
DMA_BIDIRECTIONAL?


Given that using the pci bar address as is without getting an iommu 
address results in the same "PTE Write access" error, I wonder if there 
is some internal 'prot' associated with the non-translated pci bar 
address that just needs to be tweaked to include DMA_PTE_WRITE???


Thanks!




On 08/10/2018 10:46 AM, Dave Jiang wrote:


On 08/10/2018 10:15 AM, Logan Gunthorpe wrote:


On 10/08/18 11:01 AM, Dave Jiang wrote:

Or if the BIOS has provided mapping for the Intel NTB device
specifically? Is that a possibility? NTB does go through the IOMMU.

I don't know if the BIOS is doing it, but that would at best only
work for Intel NTB. I see no hope in getting the BIOS to map the MW
for a Switchtec NTB device...

Right. I'm just speculating why it may possibly work. But yes, I think
kernel will need appropriate mapping if it's not happening right now.
Hopefully Kit is onto something.


Logan




Re: IOAT DMA w/IOMMU

2018-08-10 Thread Dave Jiang



On 08/10/2018 10:15 AM, Logan Gunthorpe wrote:
> 
> 
> On 10/08/18 11:01 AM, Dave Jiang wrote:
>> Or if the BIOS has provided mapping for the Intel NTB device
>> specifically? Is that a possibility? NTB does go through the IOMMU.
> 
> I don't know if the BIOS is doing it, but that would at best only
> work for Intel NTB. I see no hope in getting the BIOS to map the MW
> for a Switchtec NTB device...

Right. I'm just speculating why it may possibly work. But yes, I think
kernel will need appropriate mapping if it's not happening right now.
Hopefully Kit is onto something.

> 
> Logan
> 


Re: IOAT DMA w/IOMMU

2018-08-10 Thread Logan Gunthorpe



On 10/08/18 11:01 AM, Dave Jiang wrote:
> Or if the BIOS has provided mapping for the Intel NTB device
> specifically? Is that a possibility? NTB does go through the IOMMU.

I don't know if the BIOS is doing it, but that would at best only
work for Intel NTB. I see no hope in getting the BIOS to map the MW
for a Switchtec NTB device...

Logan


Re: IOAT DMA w/IOMMU

2018-08-10 Thread Dave Jiang


On 08/10/2018 09:33 AM, Logan Gunthorpe wrote:
> 
> 
> On 10/08/18 10:31 AM, Dave Jiang wrote:
>>
>>
>> On 08/10/2018 09:24 AM, Logan Gunthorpe wrote:
>>>
>>>
>>> On 10/08/18 10:02 AM, Kit Chow wrote:
 Turns out there is no dma_map_resource routine on x86. get_dma_ops 
 returns intel_dma_ops which has map_resource pointing to NULL.
>>>
>>> Oh, yup. I wasn't aware of that. From a cursory view, it looks like it
>>> shouldn't be too hard to implement though.
>>>
 Will poke around some in the intel_map_page code but can you actually 
 get a valid struct page for a pci bar address (dma_map_single calls 
 virt_to_page)?  If not, does a map_resource routine that can properly 
 map a pci bar address need to be implemented?
>>>
>>> Yes, you can not get a struct page for a PCI bar address unless it's
>>> mapped with ZONE_DEVICE like in my p2p work. So that would explain why
>>> dma_map_single() didn't work.
>>>
>>> This all implies that ntb_transport doesn't work with DMA and the IOMMU
>>> turned on. I'm not sure I've ever tried that configuration myself but it
>>> is a bit surprising.
>>
>> Hmm... that's surprising because it seems to work on Skylake platform
>> when I tested it yesterday with Intel NTB. Kit is using a Haswell
>> platform at the moment I think. Although I'm curious if it works with
>> the PLX NTB he's using on Skylake.
> 
> Does that mean on Skylake the IOAT can bypass the IOMMU? Because it
> looks like the ntb_transport code doesn't map the physical address of
> the NTB MW into the IOMMU when doing DMA...

Or if the BIOS has provided mapping for the Intel NTB device
specifically? Is that a possibility? NTB does go through the IOMMU.

Re: IOAT DMA w/IOMMU

2018-08-10 Thread Logan Gunthorpe


On 10/08/18 10:31 AM, Dave Jiang wrote:
> 
> 
> On 08/10/2018 09:24 AM, Logan Gunthorpe wrote:
>>
>>
>> On 10/08/18 10:02 AM, Kit Chow wrote:
>>> Turns out there is no dma_map_resource routine on x86. get_dma_ops 
>>> returns intel_dma_ops which has map_resource pointing to NULL.
>>
>> Oh, yup. I wasn't aware of that. From a cursory view, it looks like it
>> shouldn't be too hard to implement though.
>>
>>> Will poke around some in the intel_map_page code but can you actually 
>>> get a valid struct page for a pci bar address (dma_map_single calls 
>>> virt_to_page)?  If not, does a map_resource routine that can properly 
>>> map a pci bar address need to be implemented?
>>
>> Yes, you can not get a struct page for a PCI bar address unless it's
>> mapped with ZONE_DEVICE like in my p2p work. So that would explain why
>> dma_map_single() didn't work.
>>
>> This all implies that ntb_transport doesn't work with DMA and the IOMMU
>> turned on. I'm not sure I've ever tried that configuration myself but it
>> is a bit surprising.
> 
> Hmm...that's surprising because it seems to work on Skylake platform
> when I tested it yesterday with Intel NTB. Kit is using a Haswell
> platform at the moment I think. Although I'm curious if it works with
> the PLX NTB he's using on Skylake.

Does that mean on Skylake the IOAT can bypass the IOMMU? Because it
looks like the ntb_transport code doesn't map the physical address of
the NTB MW into the IOMMU when doing DMA...

Logan

Re: IOAT DMA w/IOMMU

2018-08-10 Thread Dave Jiang


On 08/10/2018 09:24 AM, Logan Gunthorpe wrote:
> 
> 
> On 10/08/18 10:02 AM, Kit Chow wrote:
>> Turns out there is no dma_map_resource routine on x86. get_dma_ops 
>> returns intel_dma_ops which has map_resource pointing to NULL.
> 
> Oh, yup. I wasn't aware of that. From a cursory view, it looks like it
> shouldn't be too hard to implement though.
> 
>> Will poke around some in the intel_map_page code but can you actually 
>> get a valid struct page for a pci bar address (dma_map_single calls 
>> virt_to_page)?  If not, does a map_resource routine that can properly 
>> map a pci bar address need to be implemented?
> 
> Yes, you can not get a struct page for a PCI bar address unless it's
> mapped with ZONE_DEVICE like in my p2p work. So that would explain why
> dma_map_single() didn't work.
> 
> This all implies that ntb_transport doesn't work with DMA and the IOMMU
> turned on. I'm not sure I've ever tried that configuration myself but it
> is a bit surprising.

Hmm...that's surprising because it seems to work on Skylake platform
when I tested it yesterday with Intel NTB. Kit is using a Haswell
platform at the moment I think. Although I'm curious if it works with
the PLX NTB he's using on Skylake.

> 
> Logan
> 

Re: IOAT DMA w/IOMMU

2018-08-10 Thread Logan Gunthorpe



On 10/08/18 10:23 AM, Kit Chow wrote:
> There is an internal routine (__intel_map_single) inside the intel iommu 
> code that does the actual mapping using a phys_addr_t. Think I'll try to 
> implement an intel_map_resource routine that calls that routine directly 
> without all of the conversions done for dma_map_{single,page} (pci bar 
> addr -> page -> phys_addr)...

Nice, yes, that was what I was thinking.

Logan


Re: IOAT DMA w/IOMMU

2018-08-10 Thread Logan Gunthorpe


On 10/08/18 10:02 AM, Kit Chow wrote:
> Turns out there is no dma_map_resource routine on x86. get_dma_ops 
> returns intel_dma_ops which has map_resource pointing to NULL.

Oh, yup. I wasn't aware of that. From a cursory view, it looks like it
shouldn't be too hard to implement though.

> Will poke around some in the intel_map_page code but can you actually 
> get a valid struct page for a pci bar address (dma_map_single calls 
> virt_to_page)?  If not, does a map_resource routine that can properly 
> map a pci bar address need to be implemented?

Yes, you can not get a struct page for a PCI bar address unless it's
mapped with ZONE_DEVICE like in my p2p work. So that would explain why
dma_map_single() didn't work.

This all implies that ntb_transport doesn't work with DMA and the IOMMU
turned on. I'm not sure I've ever tried that configuration myself but it
is a bit surprising.

Logan

Re: IOAT DMA w/IOMMU

2018-08-10 Thread Kit Chow
There is an internal routine (__intel_map_single) inside the intel iommu 
code that does the actual mapping using a phys_addr_t. Think I'll try to 
implement an intel_map_resource routine that calls that routine directly 
without all of the conversions done for dma_map_{single,page} (pci bar 
addr -> page -> phys_addr)...
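
For illustration only, a minimal sketch of what such a routine might look
like, assuming the 4.15-era __intel_map_single(dev, paddr, size, dir,
dma_mask) helper mentioned above; the name intel_map_resource is
hypothetical here and this is untested:

static dma_addr_t intel_map_resource(struct device *dev, phys_addr_t phys_addr,
                                     size_t size, enum dma_data_direction dir,
                                     unsigned long attrs)
{
        /* The PCI BAR address arrives as a plain phys_addr_t, so no
         * struct page or virt_to_page() is involved, unlike the
         * dma_map_{single,page} paths. */
        return __intel_map_single(dev, phys_addr, size, dir, *dev->dma_mask);
}

Wired into intel_dma_ops as .map_resource (with a matching .unmap_resource),
this would stop dma_map_resource() from seeing a NULL op.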



On 08/10/2018 09:02 AM, Kit Chow wrote:
Turns out there is no dma_map_resource routine on x86. get_dma_ops 
returns intel_dma_ops which has map_resource pointing to NULL.


(gdb) p intel_dma_ops
$7 = {alloc = 0x8150f310 <intel_alloc_coherent>,
  free = 0x8150ec20 <intel_free_coherent>,
  mmap = 0x0, get_sgtable = 0x0,
  map_page = 0x8150f2d0 <intel_map_page>,
  unmap_page = 0x8150ec10 <intel_unmap_page>,
  map_sg = 0x8150ef40 <intel_map_sg>,
  unmap_sg = 0x8150eb80 <intel_unmap_sg>,
  map_resource = 0x0,
  unmap_resource = 0x0,
  sync_single_for_cpu = 0x0,
  sync_single_for_device = 0x0,
  sync_sg_for_cpu = 0x0,
  sync_sg_for_device = 0x0,
  cache_sync = 0x0,
  mapping_error = 0x815095f0 <intel_mapping_error>,
  dma_supported = 0x81033830 <x86_dma_supported>, is_phys = 0}

Will poke around some in the intel_map_page code but can you actually 
get a valid struct page for a pci bar address (dma_map_single calls 
virt_to_page)?  If not, does a map_resource routine that can properly 
map a pci bar address need to be implemented?


Kit

---


static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
  size_t size,
  enum dma_data_direction dir,
  unsigned long attrs)
{
    const struct dma_map_ops *ops = get_dma_ops(dev);
    dma_addr_t addr;

    BUG_ON(!valid_dma_direction(dir));
    addr = ops->map_page(dev, virt_to_page(ptr),
 offset_in_page(ptr), size,
 dir, attrs);
    debug_dma_map_page(dev, virt_to_page(ptr),
   offset_in_page(ptr), size,
   dir, addr, true);
    return addr;
}






On 08/09/2018 04:00 PM, Kit Chow wrote:



On 08/09/2018 03:50 PM, Logan Gunthorpe wrote:


On 09/08/18 04:48 PM, Kit Chow wrote:

Based on Logan's comments, I am very hopeful that the dma_map_resource
will make things work on the older platforms...

Well, I *think* dma_map_single() would still work. So I'm not that
confident that's the root of your problem. I'd still like to see the
actual code snippet you are using.

Logan
Here's the code snippet - (ntbdebug & 4) path does dma_map_resource 
of the pci bar address.


It was:
    unmap->addr[1] = dma_map_single(device->dev, (void *)dest, len,
                                    DMA_TO_DEVICE);

Kit
---


static int ntb_async_tx_submit(struct ntb_transport_qp *qp,
   struct ntb_queue_entry *entry)
{
    struct dma_async_tx_descriptor *txd;
    struct dma_chan *chan = qp->tx_dma_chan;
    struct dma_device *device;
    size_t len = entry->len;
    void *buf = entry->buf;
    size_t dest_off, buff_off;
    struct dmaengine_unmap_data *unmap;
    dma_addr_t dest;
    dma_cookie_t cookie;
    int unmapcnt;

    device = chan->device;

    dest = qp->tx_mw_phys + qp->tx_max_frame * entry->tx_index;

    buff_off = (size_t)buf & ~PAGE_MASK;
    dest_off = (size_t)dest & ~PAGE_MASK;

    if (!is_dma_copy_aligned(device, buff_off, dest_off, len))
    goto err;


    if (ntbdebug & 0x4) {
    unmapcnt = 2;
    } else {
    unmapcnt = 1;
    }

    unmap = dmaengine_get_unmap_data(device->dev, unmapcnt, GFP_NOWAIT);

    if (!unmap)
    goto err;

    unmap->len = len;
    unmap->addr[0] = dma_map_page(device->dev, virt_to_page(buf),
  buff_off, len, DMA_TO_DEVICE);
    if (dma_mapping_error(device->dev, unmap->addr[0]))
    goto err_get_unmap;

    if (ntbdebug & 0x4) {
    unmap->addr[1] = dma_map_resource(device->dev,
    (phys_addr_t)dest, len, DMA_TO_DEVICE, 0);
    if (dma_mapping_error(device->dev, unmap->addr[1]))
    goto err_get_unmap;
    unmap->to_cnt = 2;
    } else {
    unmap->addr[1] = dest;
    unmap->to_cnt = 1;
    }

    txd = device->device_prep_dma_memcpy(chan, unmap->addr[1],
    unmap->addr[0], len, DMA_PREP_INTERRUPT);

    if (!txd)
    goto err_get_unmap;

    txd->callback_result = ntb_tx_copy_callback;
    txd->callback_param = entry;
    dma_set_unmap(txd, unmap);

    cookie = dmaengine_submit(txd);
    if (dma_submit_error(cookie))
    goto err_set_unmap;

    dmaengine_unmap_put(unmap);

    dma_async_issue_pending(chan);

    return 0;

err_set_unmap:
    dma_descriptor_unmap(txd);
    txd->desc_free(txd);
err_get_unmap:
    dmaengine_unmap_put(unmap);
err:
    return -ENXIO;
}

Re: IOAT DMA w/IOMMU

2018-08-10 Thread Kit Chow
Turns out there is no dma_map_resource routine on x86. get_dma_ops 
returns intel_dma_ops which has map_resource pointing to NULL.


(gdb) p intel_dma_ops
$7 = {alloc = 0x8150f310 <intel_alloc_coherent>,
  free = 0x8150ec20 <intel_free_coherent>,
  mmap = 0x0, get_sgtable = 0x0,
  map_page = 0x8150f2d0 <intel_map_page>,
  unmap_page = 0x8150ec10 <intel_unmap_page>,
  map_sg = 0x8150ef40 <intel_map_sg>,
  unmap_sg = 0x8150eb80 <intel_unmap_sg>,
  map_resource = 0x0,
  unmap_resource = 0x0,
  sync_single_for_cpu = 0x0,
  sync_single_for_device = 0x0,
  sync_sg_for_cpu = 0x0,
  sync_sg_for_device = 0x0,
  cache_sync = 0x0,
  mapping_error = 0x815095f0 <intel_mapping_error>,
  dma_supported = 0x81033830 <x86_dma_supported>, is_phys = 0}

Will poke around some in the intel_map_page code but can you actually 
get a valid struct page for a pci bar address (dma_map_single calls 
virt_to_page)?  If not, does a map_resource routine that can properly 
map a pci bar address need to be implemented?


Kit

---


static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
  size_t size,
  enum dma_data_direction dir,
  unsigned long attrs)
{
    const struct dma_map_ops *ops = get_dma_ops(dev);
    dma_addr_t addr;

    BUG_ON(!valid_dma_direction(dir));
    addr = ops->map_page(dev, virt_to_page(ptr),
 offset_in_page(ptr), size,
 dir, attrs);
    debug_dma_map_page(dev, virt_to_page(ptr),
   offset_in_page(ptr), size,
   dir, addr, true);
    return addr;
}
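
A hedged note on the snippet above: for a destination that is a PCI BAR
physical address cast to a pointer, the virt_to_page() step computes a
bogus struct page, because the value is not a direct-map kernel virtual
address. A sketch of the kind of sanity check that would catch this
(illustrative only; the helper name is ours):

#include <linux/mm.h>

/* True only for direct-map kernel virtual addresses, i.e. addresses
 * for which virt_to_page() is meaningful. */
static bool addr_ok_for_map_single(const void *ptr)
{
        return virt_addr_valid(ptr);
}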






On 08/09/2018 04:00 PM, Kit Chow wrote:



On 08/09/2018 03:50 PM, Logan Gunthorpe wrote:


On 09/08/18 04:48 PM, Kit Chow wrote:

Based on Logan's comments, I am very hopeful that the dma_map_resource
will make things work on the older platforms...

Well, I *think* dma_map_single() would still work. So I'm not that
confident that's the root of your problem. I'd still like to see the
actual code snippet you are using.

Logan
Here's the code snippet - (ntbdebug & 4) path does dma_map_resource of 
the pci bar address.


It was:
    unmap->addr[1] = dma_map_single(device->dev, (void *)dest, len,
                                    DMA_TO_DEVICE);

Kit
---


static int ntb_async_tx_submit(struct ntb_transport_qp *qp,
   struct ntb_queue_entry *entry)
{
    struct dma_async_tx_descriptor *txd;
    struct dma_chan *chan = qp->tx_dma_chan;
    struct dma_device *device;
    size_t len = entry->len;
    void *buf = entry->buf;
    size_t dest_off, buff_off;
    struct dmaengine_unmap_data *unmap;
    dma_addr_t dest;
    dma_cookie_t cookie;
    int unmapcnt;

    device = chan->device;

    dest = qp->tx_mw_phys + qp->tx_max_frame * entry->tx_index;

    buff_off = (size_t)buf & ~PAGE_MASK;
    dest_off = (size_t)dest & ~PAGE_MASK;

    if (!is_dma_copy_aligned(device, buff_off, dest_off, len))
    goto err;


    if (ntbdebug & 0x4) {
    unmapcnt = 2;
    } else {
    unmapcnt = 1;
    }

    unmap = dmaengine_get_unmap_data(device->dev, unmapcnt, GFP_NOWAIT);

    if (!unmap)
    goto err;

    unmap->len = len;
    unmap->addr[0] = dma_map_page(device->dev, virt_to_page(buf),
  buff_off, len, DMA_TO_DEVICE);
    if (dma_mapping_error(device->dev, unmap->addr[0]))
    goto err_get_unmap;

    if (ntbdebug & 0x4) {
    unmap->addr[1] = dma_map_resource(device->dev,
    (phys_addr_t)dest, len, DMA_TO_DEVICE, 0);
    if (dma_mapping_error(device->dev, unmap->addr[1]))
    goto err_get_unmap;
    unmap->to_cnt = 2;
    } else {
    unmap->addr[1] = dest;
    unmap->to_cnt = 1;
    }

    txd = device->device_prep_dma_memcpy(chan, unmap->addr[1],
    unmap->addr[0], len, DMA_PREP_INTERRUPT);

    if (!txd)
    goto err_get_unmap;

    txd->callback_result = ntb_tx_copy_callback;
    txd->callback_param = entry;
    dma_set_unmap(txd, unmap);

    cookie = dmaengine_submit(txd);
    if (dma_submit_error(cookie))
    goto err_set_unmap;

    dmaengine_unmap_put(unmap);

    dma_async_issue_pending(chan);

    return 0;

err_set_unmap:
    dma_descriptor_unmap(txd);
    txd->desc_free(txd);
err_get_unmap:
    dmaengine_unmap_put(unmap);
err:
    return -ENXIO;
}
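
A detail worth flagging with the snippet above, as a hedged aside: the
generic dmaengine unmap path appears to release the addr[] entries with
dma_unmap_page(), which does not match a mapping created by
dma_map_resource(). A sketch of an explicit teardown, with a hypothetical
helper name, would be:

/* Hypothetical helper: undo the BAR mapping made with dma_map_resource()
 * in the (ntbdebug & 0x4) path above, once the copy has completed. */
static void ntb_unmap_bar_dest(struct device *dev, dma_addr_t addr, size_t len)
{
        dma_unmap_resource(dev, addr, len, DMA_TO_DEVICE, 0);
}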




Re: IOAT DMA w/IOMMU

2018-08-09 Thread Kit Chow



On 08/09/2018 03:50 PM, Logan Gunthorpe wrote:


On 09/08/18 04:48 PM, Kit Chow wrote:

Based on Logan's comments, I am very hopeful that the dma_map_resource
will make things work on the older platforms...

Well, I *think* dma_map_single() would still work. So I'm not that
confident that's the root of your problem. I'd still like to see the
actual code snippet you are using.

Logan
Here's the code snippet - (ntbdebug & 4) path does dma_map_resource of 
the pci bar address.


It was:
    unmap->addr[1] = dma_map_single(device->dev, (void *)dest, len,
                                    DMA_TO_DEVICE);

Kit
---


static int ntb_async_tx_submit(struct ntb_transport_qp *qp,
   struct ntb_queue_entry *entry)
{
    struct dma_async_tx_descriptor *txd;
    struct dma_chan *chan = qp->tx_dma_chan;
    struct dma_device *device;
    size_t len = entry->len;
    void *buf = entry->buf;
    size_t dest_off, buff_off;
    struct dmaengine_unmap_data *unmap;
    dma_addr_t dest;
    dma_cookie_t cookie;
    int unmapcnt;

    device = chan->device;

    dest = qp->tx_mw_phys + qp->tx_max_frame * entry->tx_index;

    buff_off = (size_t)buf & ~PAGE_MASK;
    dest_off = (size_t)dest & ~PAGE_MASK;

    if (!is_dma_copy_aligned(device, buff_off, dest_off, len))
    goto err;


    if (ntbdebug & 0x4) {
    unmapcnt = 2;
    } else {
    unmapcnt = 1;
    }

    unmap = dmaengine_get_unmap_data(device->dev, unmapcnt, GFP_NOWAIT);

    if (!unmap)
    goto err;

    unmap->len = len;
    unmap->addr[0] = dma_map_page(device->dev, virt_to_page(buf),
  buff_off, len, DMA_TO_DEVICE);
    if (dma_mapping_error(device->dev, unmap->addr[0]))
    goto err_get_unmap;

    if (ntbdebug & 0x4) {
    unmap->addr[1] = dma_map_resource(device->dev,
    (phys_addr_t)dest, len, DMA_TO_DEVICE, 0);
    if (dma_mapping_error(device->dev, unmap->addr[1]))
    goto err_get_unmap;
    unmap->to_cnt = 2;
    } else {
    unmap->addr[1] = dest;
    unmap->to_cnt = 1;
    }

    txd = device->device_prep_dma_memcpy(chan, unmap->addr[1],
    unmap->addr[0], len, DMA_PREP_INTERRUPT);

    if (!txd)
    goto err_get_unmap;

    txd->callback_result = ntb_tx_copy_callback;
    txd->callback_param = entry;
    dma_set_unmap(txd, unmap);

    cookie = dmaengine_submit(txd);
    if (dma_submit_error(cookie))
    goto err_set_unmap;

    dmaengine_unmap_put(unmap);

    dma_async_issue_pending(chan);

    return 0;

err_set_unmap:
    dma_descriptor_unmap(txd);
    txd->desc_free(txd);
err_get_unmap:
    dmaengine_unmap_put(unmap);
err:
    return -ENXIO;
}


Re: IOAT DMA w/IOMMU

2018-08-09 Thread Logan Gunthorpe



On 09/08/18 04:48 PM, Kit Chow wrote:
> Based on Logan's comments, I am very hopeful that the dma_map_resource 
> will make things work on the older platforms...

Well, I *think* dma_map_single() would still work. So I'm not that
confident that's the root of your problem. I'd still like to see the
actual code snippet you are using.

Logan


Re: IOAT DMA w/IOMMU

2018-08-09 Thread Kit Chow



On 08/09/2018 03:40 PM, Jiang, Dave wrote:

-Original Message-
From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org] 
On Behalf Of Kit Chow
Sent: Thursday, August 9, 2018 2:48 PM
To: Logan Gunthorpe ; Eric Pilmore ; Bjorn 
Helgaas 
Cc: linux-...@vger.kernel.org; David Woodhouse ; Alex Williamson 
;
iommu@lists.linux-foundation.org
Subject: Re: IOAT DMA w/IOMMU



On 08/09/2018 02:11 PM, Logan Gunthorpe wrote:

On 09/08/18 02:57 PM, Kit Chow wrote:

On 08/09/2018 01:11 PM, Logan Gunthorpe wrote:

On 09/08/18 01:47 PM, Kit Chow wrote:

I haven't tested this scenario but my guess would be that IOAT would
indeed go through the IOMMU and the PCI BAR address would need to be
properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
is probably a good indication that this is the case. I really don't know
why you'd want to DMA something without mapping it.

I have experimented with changing ntb_async_tx_submit to dma_map the PCI BAR
address. With this, I get a different DMAR error:

What code did you use to do this?

If you mean version of linux, it is 4.15.7.  Or specific dma_map call, I
believe it was dma_map_single.

I mean the complete code you use to do the mapping so we can see if it's
correct. dma_map_single() seems like an odd choice, I expected to see
dma_map_resource().

Thanks for the suggestion!  Will try out dma_map_resource and report back.

Kit

Kit, I was able to try this on the Skylake Xeon platform with Intel NTB and 
ioatdma and I did not encounter any issues. Unfortunately I do not have my 
pre-Skylake platform to test anymore to verify.
We are expecting to get Skylakes within a couple of weeks and I'll have
a chance to give it a go there.


Based on Logan's comments, I am very hopeful that the dma_map_resource 
will make things work on the older platforms...


Thanks
Kit



DMAR: [DMA Write] Request device [07:00.4] fault addr ffd0
[fault reason 12] non-zero reserved fields in PTE

Also, what device corresponds to 07:00.4 on your system?

I believe 07:00.4 was the PLX dma device. I get the same error with ioat.

Using the mapping with the PLX dma device likely converts it from a pure
P2P request to one where the TLPs pass through the IOMMU. So the fact
that you get the same error with both means IOAT almost certainly goes
through the IOMMU and there's something wrong with the mapping setup.

Logan



RE: IOAT DMA w/IOMMU

2018-08-09 Thread Jiang, Dave
> -Original Message-
> From: linux-pci-ow...@vger.kernel.org 
> [mailto:linux-pci-ow...@vger.kernel.org] On Behalf Of Kit Chow
> Sent: Thursday, August 9, 2018 2:48 PM
> To: Logan Gunthorpe ; Eric Pilmore 
> ; Bjorn Helgaas 
> Cc: linux-...@vger.kernel.org; David Woodhouse ; Alex 
> Williamson ;
> iommu@lists.linux-foundation.org
> Subject: Re: IOAT DMA w/IOMMU
> 
> 
> 
> On 08/09/2018 02:11 PM, Logan Gunthorpe wrote:
> >
> > On 09/08/18 02:57 PM, Kit Chow wrote:
> >>
> >> On 08/09/2018 01:11 PM, Logan Gunthorpe wrote:
> >>> On 09/08/18 01:47 PM, Kit Chow wrote:
> >>>>> I haven't tested this scenario but my guess would be that IOAT would
> >>>>> indeed go through the IOMMU and the PCI BAR address would need to be
> >>>>> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
> >>>>> is probably a good indication that this is the case. I really don't know
> >>>>> why you'd want to DMA something without mapping it.
> >>>> I have experimented with changing ntb_async_tx_submit to dma_map the PCI
> >>>> BAR
> >>>> address. With this, I get a different DMAR error:
> >>> What code did you use to do this?
> >> If you mean version of linux, it is 4.15.7.  Or specific dma_map call, I
> >> believe it was dma_map_single.
> > I mean the complete code you use to do the mapping so we can see if it's
> > correct. dma_map_single() seems like an odd choice, I expected to see
> > dma_map_resource().
> Thanks for the suggestion!  Will try out dma_map_resource and report back.
> 
> Kit

Kit, I was able to try this on the Skylake Xeon platform with Intel NTB and 
ioatdma and I did not encounter any issues. Unfortunately I do not have my 
pre-Skylake platform to test anymore to verify.

> 
> >>>> DMAR: [DMA Write] Request device [07:00.4] fault addr ffd0
> >>>> [fault reason 12] non-zero reserved fields in PTE
> >>> Also, what device corresponds to 07:00.4 on your system?
> >> I believe 07:00.4 was the PLX dma device. I get the same error with ioat.
> > Using the mapping with the PLX dma device likely converts it from a pure
> > P2P request to one where the TLPs pass through the IOMMU. So the fact
> > that you get the same error with both means IOAT almost certainly goes
> > through the IOMMU and there's something wrong with the mapping setup.
> >
> > Logan


Re: IOAT DMA w/IOMMU

2018-08-09 Thread Kit Chow



On 08/09/2018 02:11 PM, Logan Gunthorpe wrote:


On 09/08/18 02:57 PM, Kit Chow wrote:


On 08/09/2018 01:11 PM, Logan Gunthorpe wrote:

On 09/08/18 01:47 PM, Kit Chow wrote:

I haven't tested this scenario but my guess would be that IOAT would
indeed go through the IOMMU and the PCI BAR address would need to be
properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
is probably a good indication that this is the case. I really don't know
why you'd want to DMA something without mapping it.

I have experimented with changing ntb_async_tx_submit to dma_map the PCI BAR
address. With this, I get a different DMAR error:

What code did you use to do this?

If you mean version of linux, it is 4.15.7.  Or specific dma_map call, I
believe it was dma_map_single.

I mean the complete code you use to do the mapping so we can see if it's
correct. dma_map_single() seems like an odd choice, I expected to see
dma_map_resource().

Thanks for the suggestion!  Will try out dma_map_resource and report back.

Kit


DMAR: [DMA Write] Request device [07:00.4] fault addr ffd0
[fault reason 12] non-zero reserved fields in PTE

Also, what device corresponds to 07:00.4 on your system?

I believe 07:00.4 was the PLX dma device. I get the same error with ioat.

Using the mapping with the PLX dma device likely converts it from a pure
P2P request to one where the TLPs pass through the IOMMU. So the fact
that you get the same error with both means IOAT almost certainly goes
through the IOMMU and there's something wrong with the mapping setup.

Logan



Re: IOAT DMA w/IOMMU

2018-08-09 Thread Logan Gunthorpe



On 09/08/18 03:31 PM, Eric Pilmore wrote:
> On Thu, Aug 9, 2018 at 12:35 PM, Logan Gunthorpe  wrote:
>> Hey,
>>
>> On 09/08/18 12:51 PM, Eric Pilmore wrote:
>>>>> Was wondering if anybody here has used IOAT DMA engines with an
>>>>> IOMMU turned on (Xeon based system)? My specific question is really
>>>>> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
>>>>> destination without having to map that address to the IOVA space of
>>>>> the DMA engine first (assuming the IOMMU is on)?
>>
>> I haven't tested this scenario but my guess would be that IOAT would
>> indeed go through the IOMMU and the PCI BAR address would need to be
>> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
>> is probably a good indication that this is the case. I really don't know
>> why you'd want to DMA something without mapping it.
> 
> The thought was to avoid paying the price of having to go through yet another
> translation and also because it was not believed to be necessary anyway since
> the DMA device could go straight to a PCI BAR address without the need for a
> mapping.  We have been playing with two DMA engines, IOAT and PLX. The
> PLX does not have any issues going straight to the PCI BAR address, but unlike
> IOAT, PLX is sitting "in the PCI tree".

Yes, you can do true P2P transactions with the PLX DMA engine. (Though
you do have to be careful: technically you need to translate to a PCI
bus address rather than a CPU physical address, even if the two are the
same on x86; and you need to watch that ACS doesn't mess it up.)

It doesn't sound like it's possible to avoid the IOMMU when using the
IOAT so you need the mapping. I would not expect the extra translation
in the IOMMU to be noticeable, performance wise.

Logan
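
As a hedged aside, the bus-address translation mentioned above has a stock
helper, pci_bus_address(); a minimal sketch (the wrapper name is ours):

#include <linux/pci.h>

/* Return the bus address a peer must emit to reach this BAR. On x86 it
 * equals the CPU physical address from pci_resource_start(), but that is
 * not true on every platform. */
static pci_bus_addr_t peer_bar_bus_addr(struct pci_dev *pdev, int bar)
{
        return pci_bus_address(pdev, bar);
}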


Re: IOAT DMA w/IOMMU

2018-08-09 Thread Eric Pilmore
On Thu, Aug 9, 2018 at 12:35 PM, Logan Gunthorpe  wrote:
> Hey,
>
> On 09/08/18 12:51 PM, Eric Pilmore wrote:
>>>> Was wondering if anybody here has used IOAT DMA engines with an
>>>> IOMMU turned on (Xeon based system)? My specific question is really
>>>> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
>>>> destination without having to map that address to the IOVA space of
>>>> the DMA engine first (assuming the IOMMU is on)?
>
> I haven't tested this scenario but my guess would be that IOAT would
> indeed go through the IOMMU and the PCI BAR address would need to be
> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
> is probably a good indication that this is the case. I really don't know
> why you'd want to DMA something without mapping it.

The thought was to avoid paying the price of having to go through yet another
translation and also because it was not believed to be necessary anyway since
the DMA device could go straight to a PCI BAR address without the need for a
mapping.  We have been playing with two DMA engines, IOAT and PLX. The
PLX does not have any issues going straight to the PCI BAR address, but unlike
IOAT, PLX is sitting "in the PCI tree".

>
>>> So is this a peer-to-peer DMA scenario?  You mention DMA, which would
>>> be a transaction initiated by a PCI device, to a PCI BAR address, so
>>> it doesn't sound like system memory is involved.
>>
>> No, not peer-to-peer.  This is from system memory (e.g. SKB buffer which
>> has had an IOMMU mapping created) to a PCI BAR address.
>
> It's definitely peer-to-peer in the case where you are using a DMA
> engine in the PCI tree. You have the DMA PCI device sending TLPs
> directly to the PCI BAR device. So, if everything is done right, the
> TLPs will avoid the root complex completely. (Though, ACS settings could
> also prevent this from working and you'd either get similar DMAR errors
> or they'd disappear into a black hole).
>
> When using the IOAT, it is part of the CPU so I wouldn't say it's really
> peer-to-peer. But an argument could be made that it is. Though, this is
> exactly what the existing ntb_transport is doing: DMAing from system
> memory to a PCI BAR and vice versa using IOAT.
>
> Logan
>



-- 
Eric Pilmore
epilm...@gigaio.com
http://gigaio.com
Phone: (858) 775 2514



Re: IOAT DMA w/IOMMU

2018-08-09 Thread Logan Gunthorpe


On 09/08/18 02:57 PM, Kit Chow wrote:
> 
> 
> On 08/09/2018 01:11 PM, Logan Gunthorpe wrote:
>>
>> On 09/08/18 01:47 PM, Kit Chow wrote:
>>>> I haven't tested this scenario but my guess would be that IOAT would
>>>> indeed go through the IOMMU and the PCI BAR address would need to be
>>>> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
>>>> is probably a good indication that this is the case. I really don't know
>>>> why you'd want to DMA something without mapping it.
>>> I have experimented with changing ntb_async_tx_submit to dma_map the PCI BAR
>>> address. With this, I get a different DMAR error:
>> What code did you use to do this?
> If you mean version of linux, it is 4.15.7.  Or specific dma_map call, I 
> believe it was dma_map_single.

I mean the complete code you use to do the mapping so we can see if it's
correct. dma_map_single() seems like an odd choice, I expected to see
dma_map_resource().

>>> DMAR: [DMA Write] Request device [07:00.4] fault addr ffd0
>>> [fault reason 12] non-zero reserved fields in PTE
>> Also, what device corresponds to 07:00.4 on your system?
> I believe 07:00.4 was the PLX dma device. I get the same error with ioat.

Using the mapping with the PLX dma device likely converts it from a pure
P2P request to one where the TLPs pass through the IOMMU. So the fact
that you get the same error with both means IOAT almost certainly goes
through the IOMMU and there's something wrong with the mapping setup.

Logan
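
For reference, a minimal sketch of the suggested call, under the assumption
that bar_phys holds the CPU physical address of the destination BAR window
(names here are hypothetical):

#include <linux/dma-mapping.h>

/* Map a peer PCI BAR for a DMA engine sitting behind the IOMMU. Unlike
 * dma_map_single(), no kernel virtual address or struct page is needed;
 * the physical address is used directly. */
static dma_addr_t map_bar_for_dma(struct device *dma_dev,
                                  phys_addr_t bar_phys, size_t len)
{
        /* Caller checks the result with dma_mapping_error(). */
        return dma_map_resource(dma_dev, bar_phys, len, DMA_TO_DEVICE, 0);
}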

Re: IOAT DMA w/IOMMU

2018-08-09 Thread Kit Chow



On 08/09/2018 01:11 PM, Logan Gunthorpe wrote:


On 09/08/18 01:47 PM, Kit Chow wrote:

I haven't tested this scenario but my guess would be that IOAT would
indeed go through the IOMMU and the PCI BAR address would need to be
properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
is probably a good indication that this is the case. I really don't know
why you'd want to DMA something without mapping it.

I have experimented with changing ntb_async_tx_submit to dma_map the PCI BAR
address. With this, I get a different DMAR error:

What code did you use to do this?
If you mean version of linux, it is 4.15.7.  Or specific dma_map call, I 
believe it was dma_map_single.





DMAR: [DMA Write] Request device [07:00.4] fault addr ffd0
[fault reason 12] non-zero reserved fields in PTE

Also, what device corresponds to 07:00.4 on your system?

I believe 07:00.4 was the PLX dma device. I get the same error with ioat.

Kit



Logan



Re: IOAT DMA w/IOMMU

2018-08-09 Thread Logan Gunthorpe



On 09/08/18 01:47 PM, Kit Chow wrote:
>> I haven't tested this scenario but my guess would be that IOAT would
>> indeed go through the IOMMU and the PCI BAR address would need to be
>> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
>> is probably a good indication that this is the case. I really don't know
>> why you'd want to DMA something without mapping it.
> I have experimented with changing ntb_async_tx_submit to dma_map the PCI BAR
> address. With this, I get a different DMAR error:

What code did you use to do this?


> DMAR: [DMA Write] Request device [07:00.4] fault addr ffd0
> [fault reason 12] non-zero reserved fields in PTE

Also, what device corresponds to 07:00.4 on your system?

Logan


Re: IOAT DMA w/IOMMU

2018-08-09 Thread Logan Gunthorpe
Hey,

On 09/08/18 12:51 PM, Eric Pilmore wrote:
>>> Was wondering if anybody here has used IOAT DMA engines with an
>>> IOMMU turned on (Xeon based system)? My specific question is really
>>> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
>>> destination without having to map that address to the IOVA space of
>>> the DMA engine first (assuming the IOMMU is on)?

I haven't tested this scenario but my guess would be that IOAT would
indeed go through the IOMMU and the PCI BAR address would need to be
properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
is probably a good indication that this is the case. I really don't know
why you'd want to DMA something without mapping it.

>> So is this a peer-to-peer DMA scenario?  You mention DMA, which would
>> be a transaction initiated by a PCI device, to a PCI BAR address, so
>> it doesn't sound like system memory is involved.
> 
> No, not peer-to-peer.  This is from system memory (e.g. SKB buffer which
> has had an IOMMU mapping created) to a PCI BAR address.

It's definitely peer-to-peer in the case where you are using a DMA
engine in the PCI tree. You have the DMA PCI device sending TLPs
directly to the PCI BAR device. So, if everything is done right, the
TLPs will avoid the root complex completely. (Though, ACS settings could
also prevent this from working and you'd either get similar DMAR errors
or they'd disappear into a black hole).

When using the IOAT, it is part of the CPU so I wouldn't say it's really
peer-to-peer. But an argument could be made that it is. Though, this is
exactly what the existing ntb_transport is doing: DMAing from system
memory to a PCI BAR and vice versa using IOAT.

Logan
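
As a hedged illustration of the ACS point, the control bits that can
redirect or swallow those peer-to-peer TLPs live in the ACS extended
capability and can be inspected like this (the helper name is ours):

#include <linux/pci.h>

/* Read a port's ACS control word; P2P Request Redirect (PCI_ACS_RR) or
 * egress controls being set can force TLPs up through the root complex
 * or drop them entirely. */
static u16 port_acs_ctrl(struct pci_dev *pdev)
{
        u16 ctrl = 0;
        int pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);

        if (pos)
                pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
        return ctrl;
}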



Re: IOAT DMA w/IOMMU

2018-08-09 Thread Kit Chow



On 08/09/2018 12:35 PM, Logan Gunthorpe wrote:

Hey,

On 09/08/18 12:51 PM, Eric Pilmore wrote:

Was wondering if anybody here has used IOAT DMA engines with an
IOMMU turned on (Xeon based system)? My specific question is really
whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
destination without having to map that address to the IOVA space of
the DMA engine first (assuming the IOMMU is on)?

I haven't tested this scenario but my guess would be that IOAT would
indeed go through the IOMMU and the PCI BAR address would need to be
properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
is probably a good indication that this is the case. I really don't know
why you'd want to DMA something without mapping it.
I have experimented with changing ntb_async_tx_submit to dma_map the PCI BAR
address. With this, I get a different DMAR error:

DMAR: [DMA Write] Request device [07:00.4] fault addr ffd0
[fault reason 12] non-zero reserved fields in PTE

Kit

So is this a peer-to-peer DMA scenario?  You mention DMA, which would
be a transaction initiated by a PCI device, to a PCI BAR address, so
it doesn't sound like system memory is involved.

No, not peer-to-peer.  This is from system memory (e.g. SKB buffer which
has had an IOMMU mapping created) to a PCI BAR address.

It's definitely peer-to-peer in the case where you are using a DMA
engine in the PCI tree. You have the DMA PCI device sending TLPs
directly to the PCI BAR device. So, if everything is done right, the
TLPs will avoid the root complex completely. (Though, ACS settings could
also prevent this from working and you'd either get similar DMAR errors
or they'd disappear into a black hole).

When using the IOAT, it is part of the CPU so I wouldn't say it's really
peer-to-peer. But an argument could be made that it is. Though, this is
exactly what the existing ntb_transport is doing: DMAing from system
memory to a PCI BAR and vice versa using IOAT.

Logan





Re: IOAT DMA w/IOMMU

2018-08-09 Thread Eric Pilmore
On Thu, Aug 9, 2018 at 11:43 AM, Bjorn Helgaas  wrote:
> [+cc David, Logan, Alex, iommu list]
>
> On Thu, Aug 09, 2018 at 11:14:13AM -0700, Eric Pilmore wrote:
>> Didn't get any response on the IRC channel so trying here.
>>
>> Was wondering if anybody here has used IOAT DMA engines with an
>> IOMMU turned on (Xeon based system)? My specific question is really
>> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
>> destination without having to map that address to the IOVA space of
>> the DMA engine first (assuming the IOMMU is on)?
>
> So is this a peer-to-peer DMA scenario?  You mention DMA, which would
> be a transaction initiated by a PCI device, to a PCI BAR address, so
> it doesn't sound like system memory is involved.

No, not peer-to-peer.  This is from system memory (e.g. SKB buffer which
has had an IOMMU mapping created) to a PCI BAR address.

>
> I copied some folks who know a lot more about this than I do.

Thank you!


>
>> I am encountering issues where I see PTE Errors reported from DMAR
>> in this scenario, but I do not see them if I use a different DMA engine
>> that's sitting downstream off the PCI tree. I'm wondering if the
>> IOAT DMA failure is some artifact of these engines sitting behind
>> the Host Bridge.
>>
>> Thanks in advance!
>> Eric


Re: IOAT DMA w/IOMMU

2018-08-09 Thread Bjorn Helgaas
[+cc David, Logan, Alex, iommu list]

On Thu, Aug 09, 2018 at 11:14:13AM -0700, Eric Pilmore wrote:
> Didn't get any response on the IRC channel so trying here.
> 
> Was wondering if anybody here has used IOAT DMA engines with an
> IOMMU turned on (Xeon based system)? My specific question is really
> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
> destination without having to map that address to the IOVA space of
> the DMA engine first (assuming the IOMMU is on)?

So is this a peer-to-peer DMA scenario?  You mention DMA, which would
be a transaction initiated by a PCI device, to a PCI BAR address, so
it doesn't sound like system memory is involved.

I copied some folks who know a lot more about this than I do.

> I am encountering issues where I see PTE Errors reported from DMAR
> in this scenario, but I do not see them if I use a different DMA engine
> that's sitting downstream off the PCI tree. I'm wondering if the
> IOAT DMA failure is some artifact of these engines sitting behind
> the Host Bridge.
> 
> Thanks in advance!
> Eric