Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-19 Thread Jacob Pan
Hi Jason,

On Mon, 17 May 2021 11:37:58 -0300, Jason Gunthorpe  wrote:

> On Thu, May 13, 2021 at 04:40:28PM -0700, Jacob Pan wrote:
> 
> > Looks like we are converging. Let me summarize the takeaways:
> > 1. Remove IOMMU_SVA_BIND_SUPERVISOR flag from this patch, in fact there
> > will be no flags at all for iommu_sva_bind_device()
> > 2. Remove all supervisor SVA related vt-d, idxd code.
> > 3. Create API iommu_setup_system_pasid_direct_map(option_flag)
> > if (option_flag == 1)
> > iommu_domain_alloc(IOMMU_DOMAIN_DMA);
> > if (option_flag == 2)
> > iommu_domain_alloc(IOMMU_DOMAIN_DIRECT); //new domain
> > type? setup IOMMU page tables mirroring the direct map
> > 4. Create API iommu_enable_dev_direct_map(struct dev, , )
> > - Drivers call this API to get the system PASID and which
> > option is available on the system PASID
> > - mark device as PASID only, perhaps a new flag in struct
> > device->dev_iommu->pasid_only = 1
> > 5. DMA API IOMMU vendor ops will take action based on the pasid_only
> > flag to decide if the mapping is for system PASID page tables.
> > 
> > Does it make sense?  
> 
> I think you will run into trouble with that approach when you get to
> patches..
> 
> For 'option 1' what you want is an API that is 'give me a PASID that
> is equivalent to the RID'.
> 
> Then all the DMA API operations map IO page tables to both RID and
> PASID access. For the direct mode the PASID and RID will both point at
> the shared all physical memory IO page table.
> 
> Otherwise the DMA API won't care if the device is using RID or PASID,
> if it needs to map a range it does it to the shared IO page table and
> flushes both the RID and PASID based caches.
> 
> Then the driver will use the normal DMA API with its normal struct
> pci_device and simply tell the HW to do DMA TLP's with the returned
> PASID.
> 
> For 'option 2' it should be a completely different API family.
> 
Make sense, thanks for the suggestions.

> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-17 Thread Jason Gunthorpe
On Thu, May 13, 2021 at 04:40:28PM -0700, Jacob Pan wrote:

> Looks like we are converging. Let me summarize the takeaways:
> 1. Remove IOMMU_SVA_BIND_SUPERVISOR flag from this patch, in fact there
> will be no flags at all for iommu_sva_bind_device()
> 2. Remove all supervisor SVA related vt-d, idxd code.
> 3. Create API iommu_setup_system_pasid_direct_map(option_flag)
>   if (option_flag == 1)
>   iommu_domain_alloc(IOMMU_DOMAIN_DMA);
>   if (option_flag == 2)
>   iommu_domain_alloc(IOMMU_DOMAIN_DIRECT); //new domain type?
>   setup IOMMU page tables mirroring the direct map
> 4. Create API iommu_enable_dev_direct_map(struct dev, , )
>   - Drivers call this API to get the system PASID and which option is
>   available on the system PASID
>   - mark device as PASID only, perhaps a new flag in struct
>   device->dev_iommu->pasid_only = 1
> 5. DMA API IOMMU vendor ops will take action based on the pasid_only flag to
> decide if the mapping is for system PASID page tables.
> 
> Does it make sense?

I think you will run into trouble with that approach when you get to
patches..

For 'option 1' what you want is an API that is 'give me a PASID that
is equivalent to the RID'.

Then all the DMA API operations map IO page tables to both RID and
PASID access. For the direct mode the PASID and RID will both point at
the shared all physical memory IO page table.

Otherwise the DMA API won't care if the device is using RID or PASID,
if it needs to map a range it does it to the shared IO page table and
flushes both the RID and PASID based caches.

Then the driver will use the normal DMA API with its normal struct
pci_device and simply tell the HW to do DMA TLP's with the returned
PASID.

For 'option 2' it should be a completely different API family.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Jacob Pan
Hi Jason,

On Thu, 13 May 2021 19:31:22 -0300, Jason Gunthorpe  wrote:

> On Thu, May 13, 2021 at 01:22:51PM -0700, Jacob Pan wrote:
> > Hi Tony,
> > 
> > On Thu, 13 May 2021 12:57:49 -0700, "Luck, Tony" 
> > wrote:
> >   
> > > On Thu, May 13, 2021 at 12:46:21PM -0700, Jacob Pan wrote:  
> > > > It seems there are two options:
> > > > 1. Add a new IOMMU API to set up a system PASID with a *separate*
> > > > IOMMU page table/domain, mark the device is PASID only with a flag.
> > > > Use DMA APIs to explicit map/unmap. Based on this PASID-only flag,
> > > > Vendor IOMMU driver will decide whether to use system PASID domain
> > > > during map/unmap. Not clear if we also need to make IOVA==kernel VA.
> > > > 
> > > > 2. Add a new IOMMU API to setup a system PASID which points to
> > > > init_mm.pgd. This API only allows trusted device to bind with the
> > > > system PASID at its own risk. There is no need for DMA API. This is
> > > > the same as the current code except with an explicit API.
> > > > 
> > > > Which option?
> > > 
> > > Option #1 looks cleaner to me. Option #2 gives access to bits
> > > of memory that the users of system PASID shouldn't ever need
> > > to touch ... just map regions of memory that the kernel has
> > > a "struct page" for.
> > > 
> > > What does "use DMA APIs to explicitly map/unmap" mean? Is that
> > > for the whole region?
> > >   
> > If we map the entire kernel direct map during system PASID setup, then
> > we don't need to use DMA API to map/unmap certain range.
> > 
> > I was thinking this system PASID page table could be on-demand. The
> > mapping is built by explicit use of DMA map/unmap APIs.  
> 
> Option 1 should be the PASID works exactly like a normal RID and uses
> all the normal DMA APIs and IOMMU mechanisms, whatever the platform
> implements. This might mean an iommu update on every operation or not.
>  
> > > I'm expecting that once this system PASID has been initialized,
> > > then any accelerator device with a kernel use case would use the
> > > same PASID. I.e. DSA for page clearing, IAX for ZSwap compression
> > > & decompression, etc.
> > >   
> > OK, sounds like we have to map the entire kernel VA with struct page as
> > you said. So we still by-pass DMA APIs, can we all agree on that?  
> 
> Option 2 should be the faster option, but not available in all cases.
> 
> Option 1 isn't optional. DMA and IOMMU code has to be portable and
> this is the portable API.
> 
> If you want to do option 1 and option 2 then give it a go, but in most
> common cases with the IOMMU in a direct map you shouldn't get a
> notable performance win.
> 
Looks like we are converging. Let me summarize the takeaways:
1. Remove IOMMU_SVA_BIND_SUPERVISOR flag from this patch, in fact there
will be no flags at all for iommu_sva_bind_device()
2. Remove all supervisor SVA related vt-d, idxd code.
3. Create API iommu_setup_system_pasid_direct_map(option_flag)
if (option_flag == 1)
iommu_domain_alloc(IOMMU_DOMAIN_DMA);
if (option_flag == 2)
iommu_domain_alloc(IOMMU_DOMAIN_DIRECT); //new domain type?
setup IOMMU page tables mirroring the direct map
4. Create API iommu_enable_dev_direct_map(struct dev, , )
- Drivers call this API to get the system PASID and which option is
available on the system PASID
- mark device as PASID only, perhaps a new flag in struct
device->dev_iommu->pasid_only = 1
5. DMA API IOMMU vendor ops will take action based on the pasid_only flag to
decide if the mapping is for system PASID page tables.

Does it make sense?


> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Jason Gunthorpe
On Thu, May 13, 2021 at 01:22:51PM -0700, Jacob Pan wrote:
> Hi Tony,
> 
> On Thu, 13 May 2021 12:57:49 -0700, "Luck, Tony" 
> wrote:
> 
> > On Thu, May 13, 2021 at 12:46:21PM -0700, Jacob Pan wrote:
> > > It seems there are two options:
> > > 1. Add a new IOMMU API to set up a system PASID with a *separate* IOMMU
> > > page table/domain, mark the device is PASID only with a flag. Use DMA
> > > APIs to explicit map/unmap. Based on this PASID-only flag, Vendor IOMMU
> > > driver will decide whether to use system PASID domain during map/unmap.
> > > Not clear if we also need to make IOVA==kernel VA.
> > > 
> > > 2. Add a new IOMMU API to setup a system PASID which points to
> > > init_mm.pgd. This API only allows trusted device to bind with the
> > > system PASID at its own risk. There is no need for DMA API. This is the
> > > same as the current code except with an explicit API.
> > > 
> > > Which option?  
> > 
> > Option #1 looks cleaner to me. Option #2 gives access to bits
> > of memory that the users of system PASID shouldn't ever need
> > to touch ... just map regions of memory that the kernel has
> > a "struct page" for.
> > 
> > What does "use DMA APIs to explicitly map/unmap" mean? Is that
> > for the whole region?
> > 
> If we map the entire kernel direct map during system PASID setup, then we
> don't need to use DMA API to map/unmap certain range.
> 
> I was thinking this system PASID page table could be on-demand. The mapping
> is built by explicit use of DMA map/unmap APIs.

Option 1 should be the PASID works exactly like a normal RID and uses
all the normal DMA APIs and IOMMU mechanisms, whatever the platform
implements. This might mean an iommu update on every operation or not.
 
> > I'm expecting that once this system PASID has been initialized,
> > then any accelerator device with a kernel use case would use the
> > same PASID. I.e. DSA for page clearing, IAX for ZSwap compression
> > & decompression, etc.
> > 
> OK, sounds like we have to map the entire kernel VA with struct page as you
> said. So we still by-pass DMA APIs, can we all agree on that?

Option 2 should be the faster option, but not available in all cases.

Option 1 isn't optional. DMA and IOMMU code has to be portable and
this is the portable API.

If you want to do option 1 and option 2 then give it a go, but in most
common cases with the IOMMU in a direct map you shouldn't get a
notable performance win.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Jacob Pan
Hi Tony,

On Thu, 13 May 2021 12:57:49 -0700, "Luck, Tony" 
wrote:

> On Thu, May 13, 2021 at 12:46:21PM -0700, Jacob Pan wrote:
> > It seems there are two options:
> > 1. Add a new IOMMU API to set up a system PASID with a *separate* IOMMU
> > page table/domain, mark the device is PASID only with a flag. Use DMA
> > APIs to explicit map/unmap. Based on this PASID-only flag, Vendor IOMMU
> > driver will decide whether to use system PASID domain during map/unmap.
> > Not clear if we also need to make IOVA==kernel VA.
> > 
> > 2. Add a new IOMMU API to setup a system PASID which points to
> > init_mm.pgd. This API only allows trusted device to bind with the
> > system PASID at its own risk. There is no need for DMA API. This is the
> > same as the current code except with an explicit API.
> > 
> > Which option?  
> 
> Option #1 looks cleaner to me. Option #2 gives access to bits
> of memory that the users of system PASID shouldn't ever need
> to touch ... just map regions of memory that the kernel has
> a "struct page" for.
> 
> What does "use DMA APIs to explicitly map/unmap" mean? Is that
> for the whole region?
> 
If we map the entire kernel direct map during system PASID setup, then we
don't need to use DMA API to map/unmap certain range.

I was thinking this system PASID page table could be on-demand. The mapping
is built by explicit use of DMA map/unmap APIs.

> I'm expecting that once this system PASID has been initialized,
> then any accelerator device with a kernel use case would use the
> same PASID. I.e. DSA for page clearing, IAX for ZSwap compression
> & decompression, etc.
> 
OK, sounds like we have to map the entire kernel VA with struct page as you
said. So we still by-pass DMA APIs, can we all agree on that?

> -Tony


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Luck, Tony
On Thu, May 13, 2021 at 12:46:21PM -0700, Jacob Pan wrote:
> It seems there are two options:
> 1. Add a new IOMMU API to set up a system PASID with a *separate* IOMMU page
> table/domain, mark the device is PASID only with a flag. Use DMA APIs
> to explicit map/unmap. Based on this PASID-only flag, Vendor IOMMU driver
> will decide whether to use system PASID domain during map/unmap. Not clear
> if we also need to make IOVA==kernel VA.
> 
> 2. Add a new IOMMU API to setup a system PASID which points to init_mm.pgd.
> This API only allows trusted device to bind with the system PASID at its
> own risk. There is no need for DMA API. This is the same as the current
> code except with an explicit API.
> 
> Which option?

Option #1 looks cleaner to me. Option #2 gives access to bits
of memory that the users of system PASID shouldn't ever need
to touch ... just map regions of memory that the kernel has
a "struct page" for.

What does "use DMA APIs to explicitly map/unmap" mean? Is that
for the whole region?

I'm expecting that once this system PASID has been initialized,
then any accelerator device with a kernel use case would use the
same PASID. I.e. DSA for page clearing, IAX for ZSwap compression
& decompression, etc.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Jacob Pan
Hi Jason,

On Thu, 13 May 2021 16:20:14 -0300, Jason Gunthorpe  wrote:

> On Thu, May 13, 2021 at 07:14:54PM +, Luck, Tony wrote:
> > > If you want this then be explicit about what it is you are making when
> > > building the API. Don't try to hide it under some generic idea of
> > > "kernel PCI DMA SVA"  
> > 
> > So, a special API call (that Dave can call from IDXD) to set up this
> > kernel PASID. With suitable documentation to explain the scope.
> > Maybe with a separate CONFIG option so it can be completely
> > stubbed out (IDXD does *NOT* "select" this option ... users have
> > to explicitly pick it).
> >   
> > > I could easily see an admin option to "turn this off" entirely as
> > > being too dangerous, especially if the users have no interest in
> > > IDXD.  
> > 
> > And a kernel command line option to block IDXD from using that
> > special API ... for users on generic kernels who want to block
> > this use model (but still use IDXD in non-kernel cases). Users
> > who don't want IDXD at all can block loading of the driver.  
> 
> A generic IOMMU API should not be IDXD specific, if you want to allow
> some special "integrated to the SOC accelerator PASID" mode then it
> should be a global IOMMU API and any security toggles for it should be
> global and unrelated to IDXD.
> 
> Concurrently it seems necessary to have a solution for secure kernel
> PASID use under the DMA API and reserve this special mode for getting
> higher performance.
> 
> I think you need to come with a proposal, and it is a bit alarming a
> noteworthy security hole was added under the guise of "kernel SVA" :(
> 
It seems there are two options:
1. Add a new IOMMU API to set up a system PASID with a *separate* IOMMU page
table/domain, mark the device is PASID only with a flag. Use DMA APIs
to explicit map/unmap. Based on this PASID-only flag, Vendor IOMMU driver
will decide whether to use system PASID domain during map/unmap. Not clear
if we also need to make IOVA==kernel VA.

2. Add a new IOMMU API to setup a system PASID which points to init_mm.pgd.
This API only allows trusted device to bind with the system PASID at its
own risk. There is no need for DMA API. This is the same as the current
code except with an explicit API.

Which option?

> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Jason Gunthorpe
On Thu, May 13, 2021 at 07:14:54PM +, Luck, Tony wrote:
> > If you want this then be explicit about what it is you are making when
> > building the API. Don't try to hide it under some generic idea of
> > "kernel PCI DMA SVA"
> 
> So, a special API call (that Dave can call from IDXD) to set up this
> kernel PASID. With suitable documentation to explain the scope.
> Maybe with a separate CONFIG option so it can be completely
> stubbed out (IDXD does *NOT* "select" this option ... users have
> to explicitly pick it).
> 
> > I could easily see an admin option to "turn this off" entirely as
> > being too dangerous, especially if the users have no interest in IDXD.
> 
> And a kernel command line option to block IDXD from using that
> special API ... for users on generic kernels who want to block
> this use model (but still use IDXD in non-kernel cases). Users
> who don't want IDXD at all can block loading of the driver.

A generic IOMMU API should not be IDXD specific, if you want to allow
some special "integrated to the SOC accelerator PASID" mode then it
should be a global IOMMU API and any security toggles for it should be
global and unrelated to IDXD.

Concurrently it seems necessary to have a solution for secure kernel
PASID use under the DMA API and reserve this special mode for getting
higher performance.

I think you need to come with a proposal, and it is a bit alarming a
noteworthy security hole was added under the guise of "kernel SVA" :(

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Luck, Tony
> If you want this then be explicit about what it is you are making when
> building the API. Don't try to hide it under some generic idea of
> "kernel PCI DMA SVA"

So, a special API call (that Dave can call from IDXD) to set up this
kernel PASID. With suitable documentation to explain the scope.
Maybe with a separate CONFIG option so it can be completely
stubbed out (IDXD does *NOT* "select" this option ... users have
to explicitly pick it).

> I could easily see an admin option to "turn this off" entirely as
> being too dangerous, especially if the users have no interest in IDXD.

And a kernel command line option to block IDXD from using that
special API ... for users on generic kernels who want to block
this use model (but still use IDXD in non-kernel cases). Users
who don't want IDXD at all can block loading of the driver.

Would that work?

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Jason Gunthorpe
On Thu, May 13, 2021 at 11:53:49AM -0700, Luck, Tony wrote:

> I'd like people to think of DSA as an extension to the instruction
> set. It implements asynchronous instructions like "MEMFILL" and
> "MEMCOPY". These can be limited in scope when executed in user
> processes or guests. But when executed by the host OS ring0 code
> they can have all the same access that ring0 code has when it
> dereferences a pointer.

If you want this then be explicit about what it is you are making when
building the API. Don't try to hide it under some generic idea of
"kernel PCI DMA SVA"

Add appropriately safe APIs that might have a chance of making it
secure, eg by confirming it is a physically trusted on-socket device.

It is not just Thunderbolt devices that could trigger this, many
devices with programmable firmware can spoof PCI DID/VID - having an
exploit chain that compromises a in-system FW device, chains it to
faking a IDXD HW, then accessing the all-memory PASID is pretty
alarming if the admin thought they had protection against DMA attacks.

I could easially see an admin option to "turn this off" entirely as
being too dangerous, especially if the users have no interest in IDXD.

Which is why being explicit of intent is really important.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Luck, Tony
On Thu, May 13, 2021 at 02:33:03PM -0300, Jason Gunthorpe wrote:
> The page table under the kernel PASID should behave the same way that
> the kernel would operate the page table assigned to a kernel RID.
> 
> If the kernel has security off then the PASID should map to all
> physical memory, just like the RID does.
> 
> If security is on then every DMA map needs to be loaded into the
> PASID's io page table no different than a RID page table.
> 
> "kernel SVA" is, IMHO, not a desirable thing, it completely destroys
> the kernel's DMA security model.
> 
> > If people want to use an accelerator on memory allocated by vmalloc()
> > things will get more complicated. But maybe we can delay solving that
> > problem until someone comes up with a real use case that needs to
> > do this?
> 
> If you have a HW limitation that the device can only issue TLPs
> with a PASID, even for kernel users, then I think the proper thing is
> to tell the IOMMU layer than a certain 'struct device' enters
> PASID-only mode and the IOMMU layer should construct an appropriate
> PASID and flow the dma operations through it.
> 
> Pretending the DMA layer doesn't exist and that PASID gets a free pass
> is not OK in the kernel.

I can see why a tight security model is needed to stop
random devices having access to mamory that they should
not be able to access.  Now that PCIe devices can be plugged
into Thunderbolt ports on computers, nobody wants to repeat
the disaster that Firewire ports created for systems over
a decade ago.

But I'd like to challege the one-size-fits-all policy. There's
a big difference between a random device plugged into a port
(which may even lie about its VendorID/DeviceID) and a device
that is part of the CPU socket.

I'd like people to think of DSA as an extension to the instruction
set. It implements asynchronous instructions like "MEMFILL" and
"MEMCOPY". These can be limited in scope when executed in user
processes or guests. But when executed by the host OS ring0 code
they can have all the same access that ring0 code has when it
dereferences a pointer.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Jason Gunthorpe
On Thu, May 13, 2021 at 04:44:14PM +, Luck, Tony wrote:
> > For shared workqueue, it can only generate DMA request with PASID. The
> > submission is done by ENQCMDS (S for supervisor) instruction.
> >
> > If we were not to share page tables with init_mm, we need a system PASID
> > that doing the same direct mapping in IOMMU page tables.
> 
> Note that for the currently envisioned kernel use cases for accelerators it
> would be OK for this system PASID to just provide either:
> 
> 1) A 1:1 mapping for physical addresses.  Kernel users of the accelerators
>   would provide physical addresses in descriptors.
> 2) The same mapping that the kernel uses for its "1:1" map of all physical
> memory. Users would use kernel virtual addresses in that "1:1" range
> (e.g. those obtained from page_to_virt(struct page *p);)

Well, no, neither of those are OK.

The page table under the kernel PASID should behave the same way that
the kernel would operate the page table assigned to a kernel RID.

If the kernel has security off then the PASID should map to all
physical memory, just like the RID does.

If security is on then every DMA map needs to be loaded into the
PASID's io page table no different than a RID page table.

"kernel SVA" is, IMHO, not a desirable thing, it completely destroys
the kernel's DMA security model.

> If people want to use an accelerator on memory allocated by vmalloc()
> things will get more complicated. But maybe we can delay solving that
> problem until someone comes up with a real use case that needs to
> do this?

If you have a HW limitation that the device can only issue TLPs
with a PASID, even for kernel users, then I think the proper thing is
to tell the IOMMU layer than a certain 'struct device' enters
PASID-only mode and the IOMMU layer should construct an appropriate
PASID and flow the dma operations through it.

Pretending the DMA layer doesn't exist and that PASID gets a free pass
is not OK in the kernel.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Luck, Tony
> For shared workqueue, it can only generate DMA request with PASID. The
> submission is done by ENQCMDS (S for supervisor) instruction.
>
> If we were not to share page tables with init_mm, we need a system PASID
> that doing the same direct mapping in IOMMU page tables.

Note that for the currently envisioned kernel use cases for accelerators it
would be OK for this system PASID to just provide either:

1) A 1:1 mapping for physical addresses.  Kernel users of the accelerators
  would provide physical addresses in descriptors.
2) The same mapping that the kernel uses for its "1:1" map of all physical
memory. Users would use kernel virtual addresses in that "1:1" range
(e.g. those obtained from page_to_virt(struct page *p);)

If people want to use an accelerator on memory allocated by vmalloc()
things will get more complicated. But maybe we can delay solving that
problem until someone comes up with a real use case that needs to
do this?

-Tony


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Jacob Pan
Hi Jason,

On Thu, 13 May 2021 10:38:34 -0300, Jason Gunthorpe  wrote:

> On Thu, May 13, 2021 at 06:00:12AM -0700, Jacob Pan wrote:
> > > > If you want to do SVA PASID then it also must come with DMA APIs to
> > > > manage the CPU cache coherence that are all NOP's on x86.
> > > 
> > > Yes.  And we have plenty of precende where an IOMMU is in "bypass"
> > > mode to allow access to all memory and then uses the simple
> > > dma-direct case.  
> > I agree it is better not to expose the entire direct map. But the
> > missing piece of using DMA APIs is the PASID. The caller needs the
> > PASID value to do work submission once buffer is mapped.  
> 
> You still haven't explained why the kernel driver should have a PASID at
> all.
> 
For shared workqueue, it can only generate DMA request with PASID. The
submission is done by ENQCMDS (S for supervisor) instruction.

If we were not to share page tables with init_mm, we need a system PASID
that doing the same direct mapping in IOMMU page tables.

> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Jason Gunthorpe
On Thu, May 13, 2021 at 06:00:12AM -0700, Jacob Pan wrote:
> > > If you want to do SVA PASID then it also must come with DMA APIs to
> > > manage the CPU cache coherence that are all NOP's on x86.  
> > 
> > Yes.  And we have plenty of precende where an IOMMU is in "bypass" mode
> > to allow access to all memory and then uses the simple dma-direct case.
> I agree it is better not to expose the entire direct map. But the missing
> piece of using DMA APIs is the PASID. The caller needs the PASID value to
> do work submission once buffer is mapped.

You still haven't explained why the kernel driver should have a PASID at all.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Jacob Pan
Hi Christoph,

On Wed, 12 May 2021 07:37:40 +0100, Christoph Hellwig 
wrote:

> On Tue, May 11, 2021 at 04:47:26PM -0300, Jason Gunthorpe wrote:
> > > Let me try to break down your concerns:
> > > 1. portability - driver uses DMA APIs can function w/ and w/o IOMMU.
> > > is that your concern? But PASID is intrinsically tied with IOMMU and
> > > if the drivers are using a generic sva-lib API, why they are not
> > > portable? SVA by its definition is to avoid map/unmap every time.  
> > 
> > Kernel explicitly does not support this programming model. All DMA is
> > explicit and the DMA API hides platform details like IOMMU and CPU
> > cache coherences. Just because x86 doesn't care about this doesn't
> > make any of it optional.  
> 
> Exactly.
Perhaps we can view these SVA capable devices as FPU or a device that can
be fused onto the CPU by PASID binding. We don't require DMA map/unmap for
FPU to use kernel virtual address, right?

> 
> > If you want to do SVA PASID then it also must come with DMA APIs to
> > manage the CPU cache coherence that are all NOP's on x86.  
> 
> Yes.  And we have plenty of precende where an IOMMU is in "bypass" mode
> to allow access to all memory and then uses the simple dma-direct case.
I agree it is better not to expose the entire direct map. But the missing
piece of using DMA APIs is the PASID. The caller needs the PASID value to
do work submission once buffer is mapped.

Perhaps we can add a parameter to the DMA API to specify the address space?
As Jason suggested the definition of IOASID, which represents a page table.
Just my quick thought, can you help us with a viable solution?

I know we are supposed to hide IOMMU by using DMA APIs which makes drivers
portable w/ and w/o IOMMU. This IOASID can be optional.

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-12 Thread Jean-Philippe Brucker
On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:
> The void* drvdata parameter isn't really used in iommu_sva_bind_device()
> API, the current IDXD code "borrows" the drvdata for a VT-d private flag
> for supervisor SVA usage.
> 
> Supervisor/Privileged mode request is a generic feature. It should be
> promoted from the VT-d vendor driver to the generic code.
> 
> This patch replaces void* drvdata with a unsigned int flags parameter
> and adjusts callers accordingly.
> 
> Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
> Suggested-by: Jean-Philippe Brucker 
> Signed-off-by: Jacob Pan 

Thanks for cleaning this up. Whether Vt-d's supervisor mode will need
rework or not, this is still good to have. One nit below if you resend

Reviewed-by: Jean-Philippe Brucker 

[...]
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 32d448050bf7..fcc9d36b4b01 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -152,6 +152,19 @@ enum iommu_dev_features {
>  
>  #define IOMMU_PASID_INVALID  (-1U)
>  
> +/*
> + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be used only
> + * for access to kernel addresses. No IOTLB flushes are automatically done
> + * for kernel mappings; it is valid only for access to the kernel's static
> + * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
> + * A future API addition may permit the use of such ranges, by means of an
> + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> + *
> + * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
> + * do such IOTLB flushes automatically.

I would add that this is platform specific and not all IOMMU drivers
support the feature.

Thanks,
Jean

> + */
> +#define IOMMU_SVA_BIND_SUPERVISOR   BIT(0)
> +
>  #ifdef CONFIG_IOMMU_API
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-12 Thread Christoph Hellwig
On Tue, May 11, 2021 at 04:47:26PM -0300, Jason Gunthorpe wrote:
> > Let me try to break down your concerns:
> > 1. portability - driver uses DMA APIs can function w/ and w/o IOMMU. is
> > that your concern? But PASID is intrinsically tied with IOMMU and if
> > the drivers are using a generic sva-lib API, why they are not portable?
> > SVA by its definition is to avoid map/unmap every time.
> 
> Kernel explicitly does not support this programming model. All DMA is
> explicit and the DMA API hides platform details like IOMMU and CPU
> cache coherences. Just because x86 doesn't care about this doesn't
> make any of it optional.

Exactly.

> If you want to do SVA PASID then it also must come with DMA APIs to
> manage the CPU cache coherence that are all NOP's on x86.

Yes.  And we have plenty of precende where an IOMMU is in "bypass" mode
to allow access to all memory and then uses the simple dma-direct case.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-11 Thread Jason Gunthorpe
On Tue, May 11, 2021 at 11:05:50AM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Tue, 11 May 2021 13:35:21 -0300, Jason Gunthorpe  wrote:
> 
> > On Tue, May 11, 2021 at 09:14:52AM -0700, Jacob Pan wrote:
> > 
> > > > Honestly, I'm not convinced we should have "kernel SVA" at all.. Why
> > > > does IDXD use normal DMA on the RID for kernel controlled accesses?  
> > > 
> > > Using SVA simplifies the work submission, there is no need to do
> > > map/unmap. Just bind PASID with init_mm, then submit work directly
> > > either with ENQCMDS (supervisor version of ENQCMD) to a shared
> > > workqueue or put the supervisor PASID in the descriptor for dedicated
> > > workqueue.  
> > 
> > That is not OK, protable drivers in Linux have to sue dma map/unmap
> > calls to manage cache coherence. PASID does not opt out of any of
> > that.
> > 
> Let me try to break down your concerns:
> 1. portability - driver uses DMA APIs can function w/ and w/o IOMMU. is
> that your concern? But PASID is intrinsically tied with IOMMU and if
> the drivers are using a generic sva-lib API, why they are not portable?
> SVA by its definition is to avoid map/unmap every time.

Kernel explicitly does not support this programming model. All DMA is
explicit and the DMA API hides platform details like IOMMU and CPU
cache coherences. Just because x86 doesn't care about this doesn't
make any of it optional.

If you want to do SVA PASID then it also must come with DMA APIs to
manage the CPU cache coherence that are all NOP's on x86.

> > I dislike this whole idea a lot. A single driver should not opt itself
> > out of IOMMU based security "just because"
>
> Perhaps I missed your point here. This is NOT a single driver, privileged
> access is a PCIe spec defined feature. We are using IOMMU sva-lib APIs, not
> sure why it is by-passing.

Until today the overal security of the IOMMU configuration is
controlled by kernel boot parameters set by the sysadmin.

This magic PASID effectively disables all IOMMU security and allows an
IO device unrestricted access to the entire system memory. This is the
opposite of what the admin may have specified in the various boot
options.

I don't like it all. Kust because the PCI sig defined this mechanism
doesn't mean it is mandatory for Linux to use it.

If you want the performance gain of no iommu updates then use RID
based access and boot flags that have an identity translation for all
RIDs.

Forcing the system to create an identiy translation in all cases by
having a single kernel driver create a PASID feels really
wrong. 

Especially since there isn't any real gain in functionality or
performance as it it just exposing the same things the normal DMA API
would expose.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-11 Thread Jacob Pan
Hi Jason,

On Tue, 11 May 2021 13:35:21 -0300, Jason Gunthorpe  wrote:

> On Tue, May 11, 2021 at 09:14:52AM -0700, Jacob Pan wrote:
> 
> > > Honestly, I'm not convinced we should have "kernel SVA" at all.. Why
> > > does IDXD use normal DMA on the RID for kernel controlled accesses?  
> > 
> > Using SVA simplifies the work submission, there is no need to do
> > map/unmap. Just bind PASID with init_mm, then submit work directly
> > either with ENQCMDS (supervisor version of ENQCMD) to a shared
> > workqueue or put the supervisor PASID in the descriptor for dedicated
> > workqueue.  
> 
> That is not OK, protable drivers in Linux have to sue dma map/unmap
> calls to manage cache coherence. PASID does not opt out of any of
> that.
> 
Let me try to break down your concerns:
1. portability - driver uses DMA APIs can function w/ and w/o IOMMU. is
that your concern? But PASID is intrinsically tied with IOMMU and if
the drivers are using a generic sva-lib API, why they are not portable?
SVA by its definition is to avoid map/unmap every time.

2. cache coherence - as you suggested if we name the flag "direct_map",
there is no mapping change, then there is no need for mmu_notifier like tlb
flush calls, right? it is caller's responsibility to make sure vmalloc are
not used.

> I dislike this whole idea a lot. A single driver should not opt itself
> out of IOMMU based security "just because"
> 
Perhaps I missed your point here. This is NOT a single driver, privileged
access is a PCIe spec defined feature. We are using IOMMU sva-lib APIs, not
sure why it is by-passing.

Perhaps we should add check for struct pci_dev->untrusted && rciep to
address security?

> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-11 Thread Jason Gunthorpe
On Tue, May 11, 2021 at 09:14:52AM -0700, Jacob Pan wrote:

> > Honestly, I'm not convinced we should have "kernel SVA" at all.. Why
> > does IDXD use normal DMA on the RID for kernel controlled accesses?
> 
> Using SVA simplifies the work submission, there is no need to do map/unmap.
> Just bind PASID with init_mm, then submit work directly either with ENQCMDS
> (supervisor version of ENQCMD) to a shared workqueue or put the supervisor
> PASID in the descriptor for dedicated workqueue.

That is not OK, protable drivers in Linux have to sue dma map/unmap
calls to manage cache coherence. PASID does not opt out of any of
that.

I dislike this whole idea a lot. A single driver should not opt itself
out of IOMMU based security "just because"

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-11 Thread Jacob Pan
Hi Jason,

On Tue, 11 May 2021 08:48:48 -0300, Jason Gunthorpe  wrote:

> On Mon, May 10, 2021 at 08:31:45PM -0700, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Mon, 10 May 2021 20:37:49 -0300, Jason Gunthorpe 
> > wrote: 
> > > On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:
> > >   
> > > > +/*
> > > > + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be
> > > > used only
> > > > + * for access to kernel addresses. No IOTLB flushes are
> > > > automatically done
> > > > + * for kernel mappings; it is valid only for access to the kernel's
> > > > static
> > > > + * 1:1 mapping of physical memory — not to vmalloc or even module
> > > > mappings.
> > > > + * A future API addition may permit the use of such ranges, by
> > > > means of an
> > > > + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> > > > + *
> > > > + * It is unlikely that we will ever hook into
> > > > flush_tlb_kernel_range() to
> > > > + * do such IOTLB flushes automatically.
> > > > + */
> > > > +#define IOMMU_SVA_BIND_SUPERVISOR   BIT(0)
> > > 
> > > Huh? That isn't really SVA, can you call it something saner please?
> > >   
> > This is shared kernel virtual address, I am following the SVA lib naming
> > since this is where the flag will be used. Why this is not SVA? Kernel
> > virtual address is still virtual address. Is it due to direct map?  
> 
> As the above explains it doesn't actually synchronize the kernel's
> address space it just shoves the direct map into the IOMMU.
> 
There is no duplicated kernel direct map in IOMMU.

> I suppose a different IOMMU implementation might point the PASID directly
> at the kernel's page table and avoid those limitations - but since
> that isn't portable it seems irrelevant.
> 
This is what we are doing here. We allocate a supervisor PASID and put
the kernel page table (init_mm pgd) in this PASID entry.

> Since the only thing it really maps is the direct map I would just
> call it direct_map, or all physical or something.
> 
Good idea. It makes things clear to the callers. They must only use direct
map memory for DMA.

> How does this interact with the DMA APIs?
DMA API would use RID2PASID (PASID 0), so it is separated by PASIDs.

> How do you get CPU cache
> flushing/etc into PASID operations that don't trigger IOMMU updates?
> 
Sorry, I am not following. This is used for direct map only.

> Honestly, I'm not convinced we should have "kernel SVA" at all.. Why
> does IDXD use normal DMA on the RID for kernel controlled accesses?
> 
Using SVA simplifies the work submission, there is no need to do map/unmap.
Just bind PASID with init_mm, then submit work directly either with ENQCMDS
(supervisor version of ENQCMD) to a shared workqueue or put the supervisor
PASID in the descriptor for dedicated workqueue.

> > > Is it really a PASID that always has all of physical memory mapped
> > > into it? Sounds dangerous. What is it for?  
> > 
> > Yes. It is to bind DMA request w/ PASID with init_mm/init_top_pgt. Per
> > PCIe spec PASID TLP prefix has "Privileged Mode Requested" bit. VT-d
> > supports this with "Privileged-mode-Requested (PR) flag (to distinguish
> > user versus supervisor access)". Each PASID entry has a SRE (Supervisor
> > Request Enable) bit.  
> 
> The PR flag is only needed if the underlying IOMMU is directly
> processing the CPU page tables. For cases where the IOMMU is using its
> own page table format and has its own copies the PR flag shouldn't be
> used.
> 
We are doing the former case. There is no IOMMU page tables for the direct
map.

> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-11 Thread Jason Gunthorpe
On Mon, May 10, 2021 at 08:31:45PM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Mon, 10 May 2021 20:37:49 -0300, Jason Gunthorpe  wrote:
> 
> > On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:
> > 
> > > +/*
> > > + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be
> > > used only
> > > + * for access to kernel addresses. No IOTLB flushes are automatically
> > > done
> > > + * for kernel mappings; it is valid only for access to the kernel's
> > > static
> > > + * 1:1 mapping of physical memory — not to vmalloc or even module
> > > mappings.
> > > + * A future API addition may permit the use of such ranges, by means
> > > of an
> > > + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> > > + *
> > > + * It is unlikely that we will ever hook into flush_tlb_kernel_range()
> > > to
> > > + * do such IOTLB flushes automatically.
> > > + */
> > > +#define IOMMU_SVA_BIND_SUPERVISOR   BIT(0)  
> > 
> > Huh? That isn't really SVA, can you call it something saner please?
> > 
> This is shared kernel virtual address, I am following the SVA lib naming
> since this is where the flag will be used. Why this is not SVA? Kernel
> virtual address is still virtual address. Is it due to direct map?

As the above explains it doesn't actually synchronize the kernel's
address space it just shoves the direct map into the IOMMU.

I suppose a different IOMMU implementation might point the PASID directly
at the kernel's page table and avoid those limitations - but since
that isn't portable it seems irrelevant.

Since the only thing it really maps is the direct map I would just
call it direct_map, or all physical or something.

How does this interact with the DMA APIs? How do you get CPU cache
flushing/etc into PASID operations that don't trigger IOMMU updates?

Honestly, I'm not convinced we should have "kernel SVA" at all.. Why
does IDXD use normal DMA on the RID for kernel controlled accesses?

> > Is it really a PASID that always has all of physical memory mapped
> > into it? Sounds dangerous. What is it for?
> 
> Yes. It is to bind DMA request w/ PASID with init_mm/init_top_pgt. Per PCIe
> spec PASID TLP prefix has "Privileged Mode Requested" bit. VT-d supports
> this with "Privileged-mode-Requested (PR) flag (to distinguish user versus
> supervisor access)". Each PASID entry has a SRE (Supervisor Request Enable)
> bit.

The PR flag is only needed if the underlying IOMMU is directly
processing the CPU page tables. For cases where the IOMMU is using its
own page table format and has its own copies the PR flag shouldn't be
used.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-10 Thread Jacob Pan
Hi Jason,

On Mon, 10 May 2021 20:37:49 -0300, Jason Gunthorpe  wrote:

> On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:
> 
> > +/*
> > + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be
> > used only
> > + * for access to kernel addresses. No IOTLB flushes are automatically
> > done
> > + * for kernel mappings; it is valid only for access to the kernel's
> > static
> > + * 1:1 mapping of physical memory — not to vmalloc or even module
> > mappings.
> > + * A future API addition may permit the use of such ranges, by means
> > of an
> > + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> > + *
> > + * It is unlikely that we will ever hook into flush_tlb_kernel_range()
> > to
> > + * do such IOTLB flushes automatically.
> > + */
> > +#define IOMMU_SVA_BIND_SUPERVISOR   BIT(0)  
> 
> Huh? That isn't really SVA, can you call it something saner please?
> 
This is shared kernel virtual address, I am following the SVA lib naming
since this is where the flag will be used. Why this is not SVA? Kernel
virtual address is still virtual address. Is it due to direct map?

> Is it really a PASID that always has all of physical memory mapped
> into it? Sounds dangerous. What is it for?
> 

Yes. It is to bind DMA request w/ PASID with init_mm/init_top_pgt. Per PCIe
spec PASID TLP prefix has "Privileged Mode Requested" bit. VT-d supports
this with "Privileged-mode-Requested (PR) flag (to distinguish user versus
supervisor access)". Each PASID entry has a SRE (Supervisor Request Enable)
bit.

Perhaps we should limit that to trusted device, e.g. RCIEP device?

> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-10 Thread Jason Gunthorpe
On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:

> +/*
> + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be used only
> + * for access to kernel addresses. No IOTLB flushes are automatically done
> + * for kernel mappings; it is valid only for access to the kernel's static
> + * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
> + * A future API addition may permit the use of such ranges, by means of an
> + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> + *
> + * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
> + * do such IOTLB flushes automatically.
> + */
> +#define IOMMU_SVA_BIND_SUPERVISOR   BIT(0)

Huh? That isn't really SVA, can you call it something saner please?

Is it really a PASID that always has all of physical memory mapped
into it? Sounds dangerous. What is it for?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu