Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Andy Lutomirski
On Wed, Apr 17, 2019 at 5:00 PM Linus Torvalds
 wrote:
>
> On Wed, Apr 17, 2019 at 4:42 PM Thomas Gleixner  wrote:
> >
> > On Wed, 17 Apr 2019, Linus Torvalds wrote:
> >
> > > With SMEP, user space pages are always NX.
> >
> > We talk past each other. The user space page in the ring3 valid virtual
> > address space (non negative) is of course protected by SMEP.
> >
> > The attack utilizes the kernel linear mapping of the physical
> > memory. I.e. user space address 0x43210 has a kernel equivalent at
> > 0xfxx. So if the attack manages to trick the kernel to that valid
> > kernel address and that is mapped X --> game over. SMEP does not help
> > there.
>
> Oh, agreed.
>
> But that would simply be a kernel bug. We should only map kernel pages
> executable when we have kernel code in them, and we should certainly
> not allow those pages to be mapped writably in user space.
>
> That kind of "executable in kernel, writable in user" would be a
> horrendous and major bug.
>
> So i think it's a non-issue.
>
> > From the top of my head I'd say this is a non issue as those kernel address
> > space mappings _should_ be NX, but we got bitten by _should_ in the past:)
>
> I do agree that bugs can happen, obviously, and we might have missed 
> something.
>
> But in the context of XPFO, I would argue (*very* strongly) that the
> likelihood of the above kind of bug is absolutely *miniscule* compared
> to the likelihood that we'd have something wrong in the software
> implementation of XPFO.
>
> So if the argument is "we might have bugs in software", then I think
> that's an argument _against_ XPFO rather than for it.
>

I don't think this type of NX goof was ever the argument for XPFO.
The main argument I've heard is that a malicious user program writes a
ROP payload into user memory (regular anonymous user memory) and then
gets the kernel to erroneously set RSP (*not* RIP) to point there.

I find this argument fairly weak for a couple reasons.  First, if
we're worried about this, let's do in-kernel CFI, not XPFO, to
mitigate it.  Second, I don't see why the exact same attack can't be
done using, say, page cache, and unless I'm missing something, XPFO
doesn't protect page cache.  Or network buffers, or pipe buffers, etc.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Linus Torvalds
On Wed, Apr 17, 2019 at 4:42 PM Thomas Gleixner  wrote:
>
> On Wed, 17 Apr 2019, Linus Torvalds wrote:
>
> > With SMEP, user space pages are always NX.
>
> We talk past each other. The user space page in the ring3 valid virtual
> address space (non negative) is of course protected by SMEP.
>
> The attack utilizes the kernel linear mapping of the physical
> memory. I.e. user space address 0x43210 has a kernel equivalent at
> 0xfxx. So if the attack manages to trick the kernel to that valid
> kernel address and that is mapped X --> game over. SMEP does not help
> there.

Oh, agreed.

But that would simply be a kernel bug. We should only map kernel pages
executable when we have kernel code in them, and we should certainly
not allow those pages to be mapped writably in user space.

That kind of "executable in kernel, writable in user" would be a
horrendous and major bug.

So i think it's a non-issue.

> From the top of my head I'd say this is a non issue as those kernel address
> space mappings _should_ be NX, but we got bitten by _should_ in the past:)

I do agree that bugs can happen, obviously, and we might have missed something.

But in the context of XPFO, I would argue (*very* strongly) that the
likelihood of the above kind of bug is absolutely *miniscule* compared
to the likelihood that we'd have something wrong in the software
implementation of XPFO.

So if the argument is "we might have bugs in software", then I think
that's an argument _against_ XPFO rather than for it.

Linus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Thomas Gleixner
On Wed, 17 Apr 2019, Linus Torvalds wrote:

> On Wed, Apr 17, 2019, 14:20 Thomas Gleixner  wrote:
> 
> >
> > It's not necessarily a W+X issue. The user space text is mapped in the
> > kernel as well and even if it is mapped RX then this can happen. So any
> > kernel mappings of user space text need to be mapped NX!
> 
> With SMEP, user space pages are always NX.

We talk past each other. The user space page in the ring3 valid virtual
address space (non negative) is of course protected by SMEP.

The attack utilizes the kernel linear mapping of the physical
memory. I.e. user space address 0x43210 has a kernel equivalent at
0xfxx. So if the attack manages to trick the kernel to that valid
kernel address and that is mapped X --> game over. SMEP does not help
there.

>From the top of my head I'd say this is a non issue as those kernel address
space mappings _should_ be NX, but we got bitten by _should_ in the past:)

Thanks,

tglx


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Linus Torvalds
On Wed, Apr 17, 2019, 14:20 Thomas Gleixner  wrote:

>
> It's not necessarily a W+X issue. The user space text is mapped in the
> kernel as well and even if it is mapped RX then this can happen. So any
> kernel mappings of user space text need to be mapped NX!


With SMEP, user space pages are always NX.

I really think SM[AE]P is something we can already take for granted. People
who have old CPU's workout it are simply not serious about security anyway.
There is no point in saying "we can do it badly in software".

   Linus

>
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-04-17 Thread Thiago Jung Bauermann


David Gibson  writes:

> On Sat, Mar 23, 2019 at 05:01:35PM -0400, Michael S. Tsirkin wrote:
>> On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
>> > Michael S. Tsirkin  writes:
> [snip]
>> > >> > Is there any justification to doing that beyond someone putting
>> > >> > out slow code in the past?
>> > >>
>> > >> The definition of the ACCESS_PLATFORM flag is generic and captures the
>> > >> notion of memory access restrictions for the device. Unfortunately, on
>> > >> powerpc pSeries guests it also implies that the IOMMU is turned on
>> > >
>> > > IIUC that's really because on pSeries IOMMU is *always* turned on.
>> > > Platform has no way to say what you want it to say
>> > > which is bypass the iommu for the specific device.
>> >
>> > Yes, that's correct. pSeries guests running on KVM are in a gray area
>> > where theoretically they use an IOMMU but in practice KVM ignores it.
>> > It's unfortunate but it's the reality on the ground today. :-/
>
> Um.. I'm not sure what you mean by this.  As far as I'm concerned
> there is always a guest-visible (paravirtualized) IOMMU, and that will
> be backed onto the host IOMMU when necessary.

There is, but vhost will ignore it and directly map the guest memory
when ACCESS_PLATFORM (the flag previously known as IOMMU_PLATFORM) isn't
set. From QEMU's hw/virtio/vhost.c:

static int vhost_dev_has_iommu(struct vhost_dev *dev)
{
VirtIODevice *vdev = dev->vdev;

return virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
}

static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
  hwaddr *plen, int is_write)
{
if (!vhost_dev_has_iommu(dev)) {
return cpu_physical_memory_map(addr, plen, is_write);
} else {
return (void *)(uintptr_t)addr;
}
}

--
Thiago Jung Bauermann
IBM Linux Technology Center

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-04-17 Thread Thiago Jung Bauermann


Michael S. Tsirkin  writes:

> On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
>>
>> Michael S. Tsirkin  writes:
>>
>> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
>> >> >From what I understand of the ACCESS_PLATFORM definition, the host will
>> >> only ever try to access memory addresses that are supplied to it by the
>> >> guest, so all of the secure guest memory that the host cares about is
>> >> accessible:
>> >>
>> >> If this feature bit is set to 0, then the device has same access to
>> >> memory addresses supplied to it as the driver has. In particular,
>> >> the device will always use physical addresses matching addresses
>> >> used by the driver (typically meaning physical addresses used by the
>> >> CPU) and not translated further, and can access any address supplied
>> >> to it by the driver. When clear, this overrides any
>> >> platform-specific description of whether device access is limited or
>> >> translated in any way, e.g. whether an IOMMU may be present.
>> >>
>> >> All of the above is true for POWER guests, whether they are secure
>> >> guests or not.
>> >>
>> >> Or are you saying that a virtio device may want to access memory
>> >> addresses that weren't supplied to it by the driver?
>> >
>> > Your logic would apply to IOMMUs as well.  For your mode, there are
>> > specific encrypted memory regions that driver has access to but device
>> > does not. that seems to violate the constraint.
>>
>> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
>> the device can ignore the IOMMU for all practical purposes I would
>> indeed say that the logic would apply to IOMMUs as well. :-)
>>
>> I guess I'm still struggling with the purpose of signalling to the
>> driver that the host may not have access to memory addresses that it
>> will never try to access.
>
> For example, one of the benefits is to signal to host that driver does
> not expect ability to access all memory. If it does, host can
> fail initialization gracefully.

But why would the ability to access all memory be necessary or even
useful? When would the host access memory that the driver didn't tell it
to access?

>> >> >> > But the name "sev_active" makes me scared because at least AMD guys 
>> >> >> > who
>> >> >> > were doing the sensible thing and setting ACCESS_PLATFORM
>> >> >>
>> >> >> My understanding is, AMD guest-platform knows in advance that their
>> >> >> guest will run in secure mode and hence sets the flag at the time of VM
>> >> >> instantiation. Unfortunately we dont have that luxury on our platforms.
>> >> >
>> >> > Well you do have that luxury. It looks like that there are existing
>> >> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
>> >> > with how that path is slow. So you are trying to optimize for
>> >> > them by clearing ACCESS_PLATFORM and then you have lost ability
>> >> > to invoke DMA API.
>> >> >
>> >> > For example if there was another flag just like ACCESS_PLATFORM
>> >> > just not yet used by anyone, you would be all fine using that right?
>> >>
>> >> Yes, a new flag sounds like a great idea. What about the definition
>> >> below?
>> >>
>> >> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
>> >> VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
>> >> exception that the IOMMU is explicitly defined to be off or bypassed
>> >> when accessing memory addresses supplied to the device by the
>> >> driver. This flag should be set by the guest if offered, but to
>> >> allow for backward-compatibility device implementations allow for it
>> >> to be left unset by the guest. It is an error to set both this flag
>> >> and VIRTIO_F_ACCESS_PLATFORM.
>> >
>> > It looks kind of narrow but it's an option.
>>
>> Great!
>>
>> > I wonder how we'll define what's an iommu though.
>>
>> Hm, it didn't occur to me it could be an issue. I'll try.

I rephrased it in terms of address translation. What do you think of
this version? The flag name is slightly different too:


VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
with the exception that address translation is guaranteed to be
unnecessary when accessing memory addresses supplied to the device
by the driver. Which is to say, the device will always use physical
addresses matching addresses used by the driver (typically meaning
physical addresses used by the CPU) and not translated further. This
flag should be set by the guest if offered, but to allow for
backward-compatibility device implementations allow for it to be
left unset by the guest. It is an error to set both this flag and
VIRTIO_F_ACCESS_PLATFORM.

>> > Another idea is maybe something like virtio-iommu?
>>
>> You mean, have legacy guests use virtio-iommu to request an IOMMU
>> bypass? If so, it's an 

Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Thomas Gleixner
On Wed, 17 Apr 2019, Nadav Amit wrote:
> > On Apr 17, 2019, at 10:26 AM, Ingo Molnar  wrote:
> >> As I was curious, I looked at the paper. Here is a quote from it:
> >> 
> >> "In x86-64, however, the permissions of physmap are not in sane state.
> >> Kernels up to v3.8.13 violate the W^X property by mapping the entire region
> >> as “readable, writeable, and executable” (RWX)—only very recent kernels
> >> (≥v3.9) use the more conservative RW mapping.”
> > 
> > But v3.8.13 is a 5+ years old kernel, it doesn't count as a "modern" 
> > kernel in any sense of the word. For any proposed patchset with 
> > significant complexity and non-trivial costs the benchmark version 
> > threshold is the "current upstream kernel".
> > 
> > So does that quote address my followup questions:
> > 
> >> Is this actually true of modern x86-64 kernels? We've locked down W^X
> >> protections in general.
> >> 
> >> I.e. this conclusion:
> >> 
> >>  "Therefore, by simply overwriting kfptr with 0x87FF9F08 and
> >>   triggering the kernel to dereference it, an attacker can directly
> >>   execute shell code with kernel privileges."
> >> 
> >> ... appears to be predicated on imperfect W^X protections on the x86-64
> >> kernel.
> >> 
> >> Do such holes exist on the latest x86-64 kernel? If yes, is there a
> >> reason to believe that these W^X holes cannot be fixed, or that any fix
> >> would be more expensive than XPFO?
> > 
> > ?
> > 
> > What you are proposing here is a XPFO patch-set against recent kernels 
> > with significant runtime overhead, so my questions about the W^X holes 
> > are warranted.
> > 
> 
> Just to clarify - I am an innocent bystander and have no part in this work.
> I was just looking (again) at the paper, as I was curious due to the recent
> patches that I sent that improve W^X protection.

It's not necessarily a W+X issue. The user space text is mapped in the
kernel as well and even if it is mapped RX then this can happen. So any
kernel mappings of user space text need to be mapped NX!

Thanks,

tglx___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[RFC] arm64: swiotlb: cma_alloc error spew

2019-04-17 Thread dann frazier
hey,
  I'm seeing an issue on a couple of arm64 systems[*] where they spew
~10K "cma: cma_alloc: alloc failed" messages at boot. The errors are
non-fatal, and bumping up cma to a large enough size (~128M) gets rid
of them - but that seems suboptimal. Bisection shows that this started
after commit fafadcd16595 ("swiotlb: don't dip into swiotlb pool for
coherent allocations"). It looks like __dma_direct_alloc_pages()
is opportunistically using CMA memory but falls back to non-CMA if CMA
disabled or unavailable. I've demonstrated that this fallback is
indeed returning a valid pointer. So perhaps the issue is really just
the warning emission.

The following naive patch solves the problem for me - just silence the
cma errors, since it looks like a soft error. But is there a better
approach?

[*] APM X-Gene & HiSilicon Hi1620 w/ SMMU disabled

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 6310ad01f915b..0324aa606c173 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -112,7 +112,7 @@ struct page *__dma_direct_alloc_pages(struct device *dev, 
size_t size,
/* CMA can be used only in the context which permits sleeping */
if (gfpflags_allow_blocking(gfp)) {
page = dma_alloc_from_contiguous(dev, count, page_order,
-gfp & __GFP_NOWARN);
+true);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_release_from_contiguous(dev, page, count);
page = NULL;




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Khalid Aziz
On 4/17/19 1:49 PM, Andy Lutomirski wrote:
> On Wed, Apr 17, 2019 at 10:33 AM Khalid Aziz  wrote:
>>
>> On 4/17/19 11:09 AM, Ingo Molnar wrote:
>>>
>>> * Khalid Aziz  wrote:
>>>
> I.e. the original motivation of the XPFO patches was to prevent execution
> of direct kernel mappings. Is this motivation still present if those
> mappings are non-executable?
>
> (Sorry if this has been asked and answered in previous discussions.)

 Hi Ingo,

 That is a good question. Because of the cost of XPFO, we have to be very
 sure we need this protection. The paper from Vasileios, Michalis and
 Angelos - ,
 does go into how ret2dir attacks can bypass SMAP/SMEP in sections 6.1
 and 6.2.
>>>
>>> So it would be nice if you could generally summarize external arguments
>>> when defending a patchset, instead of me having to dig through a PDF
>>> which not only causes me to spend time that you probably already spent
>>> reading that PDF, but I might also interpret it incorrectly. ;-)
>>
>> Sorry, you are right. Even though that paper explains it well, a summary
>> is always useful.
>>
>>>
>>> The PDF you cited says this:
>>>
>>>   "Unfortunately, as shown in Table 1, the W^X prop-erty is not enforced
>>>in many platforms, including x86-64.  In our example, the content of
>>>user address 0xBEEF000 is also accessible through kernel address
>>>0x87FF9F08 as plain, executable code."
>>>
>>> Is this actually true of modern x86-64 kernels? We've locked down W^X
>>> protections in general.
>>>
>>> I.e. this conclusion:
>>>
>>>   "Therefore, by simply overwriting kfptr with 0x87FF9F08 and
>>>triggering the kernel to dereference it, an attacker can directly
>>>execute shell code with kernel privileges."
>>>
>>> ... appears to be predicated on imperfect W^X protections on the x86-64
>>> kernel.
>>>
>>> Do such holes exist on the latest x86-64 kernel? If yes, is there a
>>> reason to believe that these W^X holes cannot be fixed, or that any fix
>>> would be more expensive than XPFO?
>>
>> Even if physmap is not executable, return-oriented programming (ROP) can
>> still be used to launch an attack. Instead of placing executable code at
>> user address 0xBEEF000, attacker can place an ROP payload there. kfptr
>> is then overwritten to point to a stack-pivoting gadget. Using the
>> physmap address aliasing, the ROP payload becomes kernel-mode stack. The
>> execution can then be hijacked upon execution of ret instruction. This
>> is a gist of the subsection titled "Non-executable physmap" under
>> section 6.2 and it looked convincing enough to me. If you have a
>> different take on this, I am very interested in your point of view.
> 
> My issue with all this is that XPFO is really very expensive.  I think
> that, if we're going to seriously consider upstreaming expensive
> exploit mitigations like this, we should consider others first, in
> particular CFI techniques.  grsecurity's RAP would be a great start.
> I also proposed using a gcc plugin (or upstream gcc feature) to add
> some instrumentation to any code that pops RSP to verify that the
> resulting (unsigned) change in RSP is between 0 and THREAD_SIZE bytes.
> This will make ROP quite a bit harder.
> 

Yes, XPFO is expensive. I have been able to reduce the overhead of XPFO
from 2537% to 28% (on large servers) but 28% is still quite significant.
Alternative mitigation techniques with lower impact would easily be more
acceptable as long as they provide same level of protection. If we have
to go with XPFO, we will continue to look for more performance
improvement to bring that number down further from 28%. Hopefully what
Tycho is working on will yield better results. I am continuing to look
for improvements to XPFO in parallel.

Thanks,
Khalid

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Tycho Andersen
On Wed, Apr 17, 2019 at 12:49:04PM -0700, Andy Lutomirski wrote:
> I also proposed using a gcc plugin (or upstream gcc feature) to add
> some instrumentation to any code that pops RSP to verify that the
> resulting (unsigned) change in RSP is between 0 and THREAD_SIZE bytes.
> This will make ROP quite a bit harder.

I've been playing around with this for a bit, and hope to have
something to post Soon :)

Tycho
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Andy Lutomirski
On Wed, Apr 17, 2019 at 10:33 AM Khalid Aziz  wrote:
>
> On 4/17/19 11:09 AM, Ingo Molnar wrote:
> >
> > * Khalid Aziz  wrote:
> >
> >>> I.e. the original motivation of the XPFO patches was to prevent execution
> >>> of direct kernel mappings. Is this motivation still present if those
> >>> mappings are non-executable?
> >>>
> >>> (Sorry if this has been asked and answered in previous discussions.)
> >>
> >> Hi Ingo,
> >>
> >> That is a good question. Because of the cost of XPFO, we have to be very
> >> sure we need this protection. The paper from Vasileios, Michalis and
> >> Angelos - ,
> >> does go into how ret2dir attacks can bypass SMAP/SMEP in sections 6.1
> >> and 6.2.
> >
> > So it would be nice if you could generally summarize external arguments
> > when defending a patchset, instead of me having to dig through a PDF
> > which not only causes me to spend time that you probably already spent
> > reading that PDF, but I might also interpret it incorrectly. ;-)
>
> Sorry, you are right. Even though that paper explains it well, a summary
> is always useful.
>
> >
> > The PDF you cited says this:
> >
> >   "Unfortunately, as shown in Table 1, the W^X prop-erty is not enforced
> >in many platforms, including x86-64.  In our example, the content of
> >user address 0xBEEF000 is also accessible through kernel address
> >0x87FF9F08 as plain, executable code."
> >
> > Is this actually true of modern x86-64 kernels? We've locked down W^X
> > protections in general.
> >
> > I.e. this conclusion:
> >
> >   "Therefore, by simply overwriting kfptr with 0x87FF9F08 and
> >triggering the kernel to dereference it, an attacker can directly
> >execute shell code with kernel privileges."
> >
> > ... appears to be predicated on imperfect W^X protections on the x86-64
> > kernel.
> >
> > Do such holes exist on the latest x86-64 kernel? If yes, is there a
> > reason to believe that these W^X holes cannot be fixed, or that any fix
> > would be more expensive than XPFO?
>
> Even if physmap is not executable, return-oriented programming (ROP) can
> still be used to launch an attack. Instead of placing executable code at
> user address 0xBEEF000, attacker can place an ROP payload there. kfptr
> is then overwritten to point to a stack-pivoting gadget. Using the
> physmap address aliasing, the ROP payload becomes kernel-mode stack. The
> execution can then be hijacked upon execution of ret instruction. This
> is a gist of the subsection titled "Non-executable physmap" under
> section 6.2 and it looked convincing enough to me. If you have a
> different take on this, I am very interested in your point of view.

My issue with all this is that XPFO is really very expensive.  I think
that, if we're going to seriously consider upstreaming expensive
exploit mitigations like this, we should consider others first, in
particular CFI techniques.  grsecurity's RAP would be a great start.
I also proposed using a gcc plugin (or upstream gcc feature) to add
some instrumentation to any code that pops RSP to verify that the
resulting (unsigned) change in RSP is between 0 and THREAD_SIZE bytes.
This will make ROP quite a bit harder.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/9] PCI: Add a stub for pci_ats_disabled()

2019-04-17 Thread Bjorn Helgaas
On Wed, Apr 17, 2019 at 07:24:41PM +0100, Jean-Philippe Brucker wrote:
> Currently pci_ats_disabled() is only defined when CONFIG_PCI is enabled.
> Since we're about to use the function in the Arm SMMUv3 driver, which
> could be built with CONFIG_PCI disabled, add a definition of
> pci_ats_disabled() for !CONFIG_PCI.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 

> ---
>  include/linux/pci.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 169c6a18d0b0..61d7cd888bad 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1713,6 +1713,7 @@ static inline int pci_irqd_intx_xlate(struct irq_domain 
> *d,
>  static inline const struct pci_device_id *pci_match_id(const struct 
> pci_device_id *ids,
>struct pci_dev *dev)
>  { return NULL; }
> +static inline bool pci_ats_disabled(void) { return true; }
>  #endif /* CONFIG_PCI */
>  
>  #ifdef CONFIG_PCI_ATS
> -- 
> 2.21.0
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/9] PCI: Move ATS declarations outside of CONFIG_PCI

2019-04-17 Thread Bjorn Helgaas
On Wed, Apr 17, 2019 at 07:24:40PM +0100, Jean-Philippe Brucker wrote:
> At the moment, the ATS functions are only defined when CONFIG_PCI is
> enabled. Since we're about to use them in the Arm SMMUv3 driver, which
> could be built with CONFIG_PCI disabled, and they are already guarded by
> CONFIG_PCI_ATS which depends on CONFIG_PCI, move the definitions outside
> of CONFIG_PCI.
> 
> Signed-off-by: Jean-Philippe Brucker 

I guess this is OK with me, although AFAICS they're only used in
arm_smmu_enable_ats() and arm_smmu_disable_ats() and I personally
wouldn't find it objectionable to wrap the bodies of those functions
in "#ifdef CONFIG_PCI".  That might even be a useful hint to the
reader, as opposed to relying on all these stub functions
(dev_is_pci(), pci_ats_disabled(), pci_enable_ats(),
pci_disable_ats(), as well as the complete struct pci_dev declaration)
that depend on config settings that aren't obvious in the caller.

Acked-by: Bjorn Helgaas 

> ---
>  include/linux/pci.h | 30 +++---
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 77448215ef5b..169c6a18d0b0 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1521,21 +1521,6 @@ static inline void pcie_ecrc_get_policy(char *str) { }
>  
>  bool pci_ats_disabled(void);
>  
> -#ifdef CONFIG_PCI_ATS
> -/* Address Translation Service */
> -void pci_ats_init(struct pci_dev *dev);
> -int pci_enable_ats(struct pci_dev *dev, int ps);
> -void pci_disable_ats(struct pci_dev *dev);
> -int pci_ats_queue_depth(struct pci_dev *dev);
> -int pci_ats_page_aligned(struct pci_dev *dev);
> -#else
> -static inline void pci_ats_init(struct pci_dev *d) { }
> -static inline int pci_enable_ats(struct pci_dev *d, int ps) { return 
> -ENODEV; }
> -static inline void pci_disable_ats(struct pci_dev *d) { }
> -static inline int pci_ats_queue_depth(struct pci_dev *d) { return -ENODEV; }
> -static inline int pci_ats_page_aligned(struct pci_dev *dev) { return 0; }
> -#endif
> -
>  #ifdef CONFIG_PCIE_PTM
>  int pci_enable_ptm(struct pci_dev *dev, u8 *granularity);
>  #else
> @@ -1730,6 +1715,21 @@ static inline const struct pci_device_id 
> *pci_match_id(const struct pci_device_i
>  { return NULL; }
>  #endif /* CONFIG_PCI */
>  
> +#ifdef CONFIG_PCI_ATS
> +/* Address Translation Service */
> +void pci_ats_init(struct pci_dev *dev);
> +int pci_enable_ats(struct pci_dev *dev, int ps);
> +void pci_disable_ats(struct pci_dev *dev);
> +int pci_ats_queue_depth(struct pci_dev *dev);
> +int pci_ats_page_aligned(struct pci_dev *dev);
> +#else
> +static inline void pci_ats_init(struct pci_dev *d) { }
> +static inline int pci_enable_ats(struct pci_dev *d, int ps) { return 
> -ENODEV; }
> +static inline void pci_disable_ats(struct pci_dev *d) { }
> +static inline int pci_ats_queue_depth(struct pci_dev *d) { return -ENODEV; }
> +static inline int pci_ats_page_aligned(struct pci_dev *dev) { return 0; }
> +#endif
> +
>  /* Include architecture-dependent settings and functions */
>  
>  #include 
> -- 
> 2.21.0
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 9/9] iommu/arm-smmu-v3: Disable tagged pointers

2019-04-17 Thread Jean-Philippe Brucker
The ARM architecture has a "Top Byte Ignore" (TBI) option that makes the
MMU mask out bits [63:56] of an address, allowing a userspace application
to store data in its pointers. This option is incompatible with PCI ATS.

If TBI is enabled in the SMMU and userspace triggers DMA transactions on
tagged pointers, the endpoint might create ATC entries for addresses that
include a tag. Software would then have to send ATC invalidation packets
for each 255 possible alias of an address, or just wipe the whole address
space. This is not a viable option, so disable TBI.

The impact of this change is unclear, since there are very few users of
tagged pointers, much less SVA. But the requirement introduced by this
patch doesn't seem excessive: a userspace application using both tagged
pointers and SVA should now sanitize addresses (clear the tag) before
using them for device DMA.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3bde137a3755..99b83fda11e4 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1057,7 +1057,6 @@ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
val |= ARM_SMMU_TCR2CD(tcr, EPD0);
val |= ARM_SMMU_TCR2CD(tcr, EPD1);
val |= ARM_SMMU_TCR2CD(tcr, IPS);
-   val |= ARM_SMMU_TCR2CD(tcr, TBI0);
 
return val;
 }
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 8/9] iommu/arm-smmu-v3: Add support for PCI ATS

2019-04-17 Thread Jean-Philippe Brucker
PCIe devices can implement their own TLB, named Address Translation Cache
(ATC). Enable Address Translation Service (ATS) for devices that support
it and send them invalidation requests whenever we invalidate the IOTLBs.

ATC invalidation is allowed to take up to 90 seconds, according to the
PCIe spec, so it is possible to get a SMMU command queue timeout during
normal operations. However we expect implementations to complete
invalidation in reasonable time.

We only enable ATS for "trusted" devices, and currently rely on the
pci_dev->untrusted bit. For ATS we have to trust that:

(a) The device doesn't issue "translated" memory requests for addresses
that weren't returned by the SMMU in a Translation Completion. In
particular, if we give control of a device or device partition to a VM
or userspace, software cannot program the device to access arbitrary
"translated" addresses.

(b) The device follows permissions granted by the SMMU in a Translation
Completion. If the device requested read+write permission and only
got read, then it doesn't write.

(c) The device doesn't send Translated transactions for an address that
was invalidated by an ATC invalidation.

Note that the PCIe specification explicitly requires all of these, so we
can assume that implementations will cleanly shield ATCs from software.

All ATS translated requests still go through the SMMU, to walk the stream
table and check that the device is actually allowed to send translated
requests.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 201 ++--
 1 file changed, 195 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3e7198ee9530..3bde137a3755 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -86,6 +87,7 @@
 #define IDR5_VAX_52_BIT1
 
 #define ARM_SMMU_CR0   0x20
+#define CR0_ATSCHK (1 << 4)
 #define CR0_CMDQEN (1 << 3)
 #define CR0_EVTQEN (1 << 2)
 #define CR0_PRIQEN (1 << 1)
@@ -294,6 +296,7 @@
 #define CMDQ_ERR_CERROR_NONE_IDX   0
 #define CMDQ_ERR_CERROR_ILL_IDX1
 #define CMDQ_ERR_CERROR_ABT_IDX2
+#define CMDQ_ERR_CERROR_ATC_INV_IDX3
 
 #define CMDQ_0_OP  GENMASK_ULL(7, 0)
 #define CMDQ_0_SSV (1UL << 11)
@@ -312,6 +315,12 @@
 #define CMDQ_TLBI_1_VA_MASKGENMASK_ULL(63, 12)
 #define CMDQ_TLBI_1_IPA_MASK   GENMASK_ULL(51, 12)
 
+#define CMDQ_ATC_0_SSIDGENMASK_ULL(31, 12)
+#define CMDQ_ATC_0_SID GENMASK_ULL(63, 32)
+#define CMDQ_ATC_0_GLOBAL  (1UL << 9)
+#define CMDQ_ATC_1_SIZEGENMASK_ULL(5, 0)
+#define CMDQ_ATC_1_ADDR_MASK   GENMASK_ULL(63, 12)
+
 #define CMDQ_PRI_0_SSIDGENMASK_ULL(31, 12)
 #define CMDQ_PRI_0_SID GENMASK_ULL(63, 32)
 #define CMDQ_PRI_1_GRPID   GENMASK_ULL(8, 0)
@@ -433,6 +442,16 @@ struct arm_smmu_cmdq_ent {
u64 addr;
} tlbi;
 
+   #define CMDQ_OP_ATC_INV 0x40
+   #define ATC_INV_SIZE_ALL52
+   struct {
+   u32 sid;
+   u32 ssid;
+   u64 addr;
+   u8  size;
+   boolglobal;
+   } atc;
+
#define CMDQ_OP_PRI_RESP0x41
struct {
u32 sid;
@@ -580,10 +599,12 @@ struct arm_smmu_device {
 /* SMMU private data for each master */
 struct arm_smmu_master {
struct arm_smmu_device  *smmu;
+   struct device   *dev;
struct arm_smmu_domain  *domain;
struct list_headdomain_head;
u32 *sids;
unsigned intnum_sids;
+   boolats_enabled :1;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -813,6 +834,14 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct 
arm_smmu_cmdq_ent *ent)
case CMDQ_OP_TLBI_S12_VMALL:
cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid);
break;
+   case CMDQ_OP_ATC_INV:
+   cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid);
+   cmd[0] |= FIELD_PREP(CMDQ_ATC_0_GLOBAL, ent->atc.global);
+   cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SSID, ent->atc.ssid);
+   cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SID, ent->atc.sid);
+   cmd[1] |= 

[PATCH v3 3/9] ACPI/IORT: Check ATS capability in root complex nodes

2019-04-17 Thread Jean-Philippe Brucker
Root complex node in IORT has a bit telling whether it supports ATS or
not. Store this bit in the IOMMU fwspec when setting up a device, so it
can be accessed later by an IOMMU driver. In the future we'll probably
want to store this bit at the host bridge or SMMU rather than in each
endpoint.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/arm64/iort.c | 11 +++
 include/linux/iommu.h |  4 
 2 files changed, 15 insertions(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index e48894e002ba..4000902e57f0 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1028,6 +1028,14 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, 
u64 *dma_size)
dev_dbg(dev, "dma_pfn_offset(%#08llx)\n", offset);
 }
 
+static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
+{
+   struct acpi_iort_root_complex *pci_rc;
+
+   pci_rc = (struct acpi_iort_root_complex *)node->node_data;
+   return pci_rc->ats_attribute & ACPI_IORT_ATS_SUPPORTED;
+}
+
 /**
  * iort_iommu_configure - Set-up IOMMU configuration for a device.
  *
@@ -1063,6 +1071,9 @@ const struct iommu_ops *iort_iommu_configure(struct 
device *dev)
info.node = node;
err = pci_for_each_dma_alias(to_pci_dev(dev),
 iort_pci_iommu_init, );
+
+   if (!err && iort_pci_rc_supports_ats(node))
+   dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS;
} else {
int i = 0;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 480921dfbadf..51ab006d348e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -446,6 +446,7 @@ struct iommu_fwspec {
const struct iommu_ops  *ops;
struct fwnode_handle*iommu_fwnode;
void*iommu_priv;
+   u32 flags;
unsigned intnum_ids;
u32 ids[1];
 };
@@ -458,6 +459,9 @@ struct iommu_sva {
const struct iommu_sva_ops  *ops;
 };
 
+/* ATS is supported */
+#define IOMMU_FWSPEC_PCI_RC_ATS(1 << 0)
+
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
  const struct iommu_ops *ops);
 void iommu_fwspec_free(struct device *dev);
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/9] Add PCI ATS support to Arm SMMUv3

2019-04-17 Thread Jean-Philippe Brucker
This series enables PCI ATS in SMMUv3. Changes since v2 [1]:

* Fix build failure when building arm-smmu-v3 without CONFIG_PCI
  Patches 1 and 2 are new.

* Only enable ATS if the root complex supports it. For the moment, only
  IORT provides this information. I have patches for devicetree but
  they are less mature and I'd rather make it a separate series.

* Tried to address most comments. I'll see if I can improve the firmware
  code when adding devicetree support (see [2]).

Note that there is a small conflict with the SVA API. This series
applies on top of Joerg's api-features branch for v5.2.

[1] https://www.spinics.net/lists/arm-kernel/msg719722.html
[2] git://linux-arm.org/linux-jpb.git ats/current

Jean-Philippe Brucker (9):
  PCI: Move ATS declarations outside of CONFIG_PCI
  PCI: Add a stub for pci_ats_disabled()
  ACPI/IORT: Check ATS capability in root complex nodes
  iommu/arm-smmu-v3: Rename arm_smmu_master_data to arm_smmu_master
  iommu/arm-smmu-v3: Store SteamIDs in master
  iommu/arm-smmu-v3: Add a master->domain pointer
  iommu/arm-smmu-v3: Link domains and devices
  iommu/arm-smmu-v3: Add support for PCI ATS
  iommu/arm-smmu-v3: Disable tagged pointers

 drivers/acpi/arm64/iort.c   |  11 ++
 drivers/iommu/arm-smmu-v3.c | 345 
 include/linux/iommu.h   |   4 +
 include/linux/pci.h |  31 ++--
 4 files changed, 306 insertions(+), 85 deletions(-)

-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 7/9] iommu/arm-smmu-v3: Link domains and devices

2019-04-17 Thread Jean-Philippe Brucker
When removing a mapping from a domain, we need to send an invalidation to
all devices that might have stored it in their Address Translation Cache
(ATC). In addition when updating the context descriptor of a live domain,
we'll need to send invalidations for all devices attached to it.

Maintain a list of devices in each domain, protected by a spinlock. It is
updated every time we attach or detach devices to and from domains.

It needs to be a spinlock because we'll invalidate ATC entries from
within hardirq-safe contexts, but it may be possible to relax the read
side with RCU later.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7b425483f4b6..3e7198ee9530 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -581,6 +581,7 @@ struct arm_smmu_device {
 struct arm_smmu_master {
struct arm_smmu_device  *smmu;
struct arm_smmu_domain  *domain;
+   struct list_headdomain_head;
u32 *sids;
unsigned intnum_sids;
 };
@@ -607,6 +608,9 @@ struct arm_smmu_domain {
};
 
struct iommu_domain domain;
+
+   struct list_headdevices;
+   spinlock_t  devices_lock;
 };
 
 struct arm_smmu_option_prop {
@@ -1504,6 +1508,9 @@ static struct iommu_domain 
*arm_smmu_domain_alloc(unsigned type)
}
 
mutex_init(_domain->init_mutex);
+   INIT_LIST_HEAD(_domain->devices);
+   spin_lock_init(_domain->devices_lock);
+
return _domain->domain;
 }
 
@@ -1721,9 +1728,16 @@ static void arm_smmu_install_ste_for_dev(struct 
arm_smmu_master *master)
 
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 {
-   if (!master->domain)
+   unsigned long flags;
+   struct arm_smmu_domain *smmu_domain = master->domain;
+
+   if (!smmu_domain)
return;
 
+   spin_lock_irqsave(_domain->devices_lock, flags);
+   list_del(>domain_head);
+   spin_unlock_irqrestore(_domain->devices_lock, flags);
+
master->domain = NULL;
arm_smmu_install_ste_for_dev(master);
 }
@@ -1731,6 +1745,7 @@ static void arm_smmu_detach_dev(struct arm_smmu_master 
*master)
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
int ret = 0;
+   unsigned long flags;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct arm_smmu_device *smmu;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -1764,6 +1779,10 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
 
master->domain = smmu_domain;
 
+   spin_lock_irqsave(_domain->devices_lock, flags);
+   list_add(>domain_head, _domain->devices);
+   spin_unlock_irqrestore(_domain->devices_lock, flags);
+
if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
arm_smmu_write_ctx_desc(smmu, _domain->s1_cfg);
 
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 5/9] iommu/arm-smmu-v3: Store SteamIDs in master

2019-04-17 Thread Jean-Philippe Brucker
Simplify the attach/detach code a bit by keeping a pointer to the stream
IDs in the master structure. Although not completely obvious here, it does
make the subsequent support for ATS, PRI and PASID a bit simpler.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 50cb037f3d8a..25ba546cae7f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -594,6 +594,8 @@ struct arm_smmu_device {
 struct arm_smmu_master {
struct arm_smmu_device  *smmu;
struct arm_smmu_strtab_ent  ste;
+   u32 *sids;
+   unsigned intnum_sids;
 };
 
 /* SMMU private data for an IOMMU domain */
@@ -1688,19 +1690,18 @@ static __le64 *arm_smmu_get_step_for_sid(struct 
arm_smmu_device *smmu, u32 sid)
return step;
 }
 
-static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
+static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
 {
int i, j;
-   struct arm_smmu_master *master = fwspec->iommu_priv;
struct arm_smmu_device *smmu = master->smmu;
 
-   for (i = 0; i < fwspec->num_ids; ++i) {
-   u32 sid = fwspec->ids[i];
+   for (i = 0; i < master->num_sids; ++i) {
+   u32 sid = master->sids[i];
__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
 
/* Bridged PCI devices may end up with duplicated IDs */
for (j = 0; j < i; j++)
-   if (fwspec->ids[j] == sid)
+   if (master->sids[j] == sid)
break;
if (j < i)
continue;
@@ -1709,13 +1710,10 @@ static void arm_smmu_install_ste_for_dev(struct 
iommu_fwspec *fwspec)
}
 }
 
-static void arm_smmu_detach_dev(struct device *dev)
+static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 {
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct arm_smmu_master *master = fwspec->iommu_priv;
-
master->ste.assigned = false;
-   arm_smmu_install_ste_for_dev(fwspec);
+   arm_smmu_install_ste_for_dev(master);
 }
 
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
@@ -1736,7 +1734,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
 
/* Already attached to a different domain? */
if (ste->assigned)
-   arm_smmu_detach_dev(dev);
+   arm_smmu_detach_dev(master);
 
mutex_lock(_domain->init_mutex);
 
@@ -1770,7 +1768,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
ste->s2_cfg = _domain->s2_cfg;
}
 
-   arm_smmu_install_ste_for_dev(fwspec);
+   arm_smmu_install_ste_for_dev(master);
 out_unlock:
mutex_unlock(_domain->init_mutex);
return ret;
@@ -1883,12 +1881,14 @@ static int arm_smmu_add_device(struct device *dev)
return -ENOMEM;
 
master->smmu = smmu;
+   master->sids = fwspec->ids;
+   master->num_sids = fwspec->num_ids;
fwspec->iommu_priv = master;
}
 
/* Check the SIDs are in range of the SMMU and our stream table */
-   for (i = 0; i < fwspec->num_ids; i++) {
-   u32 sid = fwspec->ids[i];
+   for (i = 0; i < master->num_sids; i++) {
+   u32 sid = master->sids[i];
 
if (!arm_smmu_sid_in_range(smmu, sid))
return -ERANGE;
@@ -1922,7 +1922,7 @@ static void arm_smmu_remove_device(struct device *dev)
master = fwspec->iommu_priv;
smmu = master->smmu;
if (master && master->ste.assigned)
-   arm_smmu_detach_dev(dev);
+   arm_smmu_detach_dev(master);
iommu_group_remove_device(dev);
iommu_device_unlink(>iommu, dev);
kfree(master);
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 6/9] iommu/arm-smmu-v3: Add a master->domain pointer

2019-04-17 Thread Jean-Philippe Brucker
As we're going to track domain-master links more closely for ATS and CD
invalidation, add pointer to the attached domain in struct
arm_smmu_master. As a result, arm_smmu_strtab_ent is redundant and can be
removed.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 92 ++---
 1 file changed, 45 insertions(+), 47 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 25ba546cae7f..7b425483f4b6 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -505,19 +505,6 @@ struct arm_smmu_s2_cfg {
u64 vtcr;
 };
 
-struct arm_smmu_strtab_ent {
-   /*
-* An STE is "assigned" if the master emitting the corresponding SID
-* is attached to a domain. The behaviour of an unassigned STE is
-* determined by the disable_bypass parameter, whereas an assigned
-* STE behaves according to s1_cfg/s2_cfg, which themselves are
-* configured according to the domain type.
-*/
-   boolassigned;
-   struct arm_smmu_s1_cfg  *s1_cfg;
-   struct arm_smmu_s2_cfg  *s2_cfg;
-};
-
 struct arm_smmu_strtab_cfg {
__le64  *strtab;
dma_addr_t  strtab_dma;
@@ -593,7 +580,7 @@ struct arm_smmu_device {
 /* SMMU private data for each master */
 struct arm_smmu_master {
struct arm_smmu_device  *smmu;
-   struct arm_smmu_strtab_ent  ste;
+   struct arm_smmu_domain  *domain;
u32 *sids;
unsigned intnum_sids;
 };
@@ -1087,8 +1074,8 @@ static void arm_smmu_sync_ste_for_sid(struct 
arm_smmu_device *smmu, u32 sid)
arm_smmu_cmdq_issue_sync(smmu);
 }
 
-static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
- __le64 *dst, struct arm_smmu_strtab_ent 
*ste)
+static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
+ __le64 *dst)
 {
/*
 * This is hideously complicated, but we only really care about
@@ -1108,6 +1095,10 @@ static void arm_smmu_write_strtab_ent(struct 
arm_smmu_device *smmu, u32 sid,
 */
u64 val = le64_to_cpu(dst[0]);
bool ste_live = false;
+   struct arm_smmu_device *smmu = NULL;
+   struct arm_smmu_s1_cfg *s1_cfg = NULL;
+   struct arm_smmu_s2_cfg *s2_cfg = NULL;
+   struct arm_smmu_domain *smmu_domain = NULL;
struct arm_smmu_cmdq_ent prefetch_cmd = {
.opcode = CMDQ_OP_PREFETCH_CFG,
.prefetch   = {
@@ -1115,6 +1106,25 @@ static void arm_smmu_write_strtab_ent(struct 
arm_smmu_device *smmu, u32 sid,
},
};
 
+   if (master) {
+   smmu_domain = master->domain;
+   smmu = master->smmu;
+   }
+
+   if (smmu_domain) {
+   switch (smmu_domain->stage) {
+   case ARM_SMMU_DOMAIN_S1:
+   s1_cfg = _domain->s1_cfg;
+   break;
+   case ARM_SMMU_DOMAIN_S2:
+   case ARM_SMMU_DOMAIN_NESTED:
+   s2_cfg = _domain->s2_cfg;
+   break;
+   default:
+   break;
+   }
+   }
+
if (val & STRTAB_STE_0_V) {
switch (FIELD_GET(STRTAB_STE_0_CFG, val)) {
case STRTAB_STE_0_CFG_BYPASS:
@@ -1135,8 +1145,8 @@ static void arm_smmu_write_strtab_ent(struct 
arm_smmu_device *smmu, u32 sid,
val = STRTAB_STE_0_V;
 
/* Bypass/fault */
-   if (!ste->assigned || !(ste->s1_cfg || ste->s2_cfg)) {
-   if (!ste->assigned && disable_bypass)
+   if (!smmu_domain || !(s1_cfg || s2_cfg)) {
+   if (!smmu_domain && disable_bypass)
val |= FIELD_PREP(STRTAB_STE_0_CFG, 
STRTAB_STE_0_CFG_ABORT);
else
val |= FIELD_PREP(STRTAB_STE_0_CFG, 
STRTAB_STE_0_CFG_BYPASS);
@@ -1154,7 +1164,7 @@ static void arm_smmu_write_strtab_ent(struct 
arm_smmu_device *smmu, u32 sid,
return;
}
 
-   if (ste->s1_cfg) {
+   if (s1_cfg) {
BUG_ON(ste_live);
dst[1] = cpu_to_le64(
 FIELD_PREP(STRTAB_STE_1_S1CIR, 
STRTAB_STE_1_S1C_CACHE_WBRA) |
@@ -1169,22 +1179,22 @@ static void arm_smmu_write_strtab_ent(struct 
arm_smmu_device *smmu, u32 sid,
   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
-   val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
+   val |= (s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS);
}
 
-   if 

[PATCH v3 4/9] iommu/arm-smmu-v3: Rename arm_smmu_master_data to arm_smmu_master

2019-04-17 Thread Jean-Philippe Brucker
The arm_smmu_master_data structure already represents more than just the
firmware data associated to a master, and will be used extensively to
represent a device's state when implementing more SMMU features. Rename
the structure to arm_smmu_master.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm-smmu-v3.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d3880010c6cf..50cb037f3d8a 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -591,7 +591,7 @@ struct arm_smmu_device {
 };
 
 /* SMMU private data for each master */
-struct arm_smmu_master_data {
+struct arm_smmu_master {
struct arm_smmu_device  *smmu;
struct arm_smmu_strtab_ent  ste;
 };
@@ -1691,7 +1691,7 @@ static __le64 *arm_smmu_get_step_for_sid(struct 
arm_smmu_device *smmu, u32 sid)
 static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec)
 {
int i, j;
-   struct arm_smmu_master_data *master = fwspec->iommu_priv;
+   struct arm_smmu_master *master = fwspec->iommu_priv;
struct arm_smmu_device *smmu = master->smmu;
 
for (i = 0; i < fwspec->num_ids; ++i) {
@@ -1712,7 +1712,7 @@ static void arm_smmu_install_ste_for_dev(struct 
iommu_fwspec *fwspec)
 static void arm_smmu_detach_dev(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct arm_smmu_master_data *master = fwspec->iommu_priv;
+   struct arm_smmu_master *master = fwspec->iommu_priv;
 
master->ste.assigned = false;
arm_smmu_install_ste_for_dev(fwspec);
@@ -1724,7 +1724,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct arm_smmu_device *smmu;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-   struct arm_smmu_master_data *master;
+   struct arm_smmu_master *master;
struct arm_smmu_strtab_ent *ste;
 
if (!fwspec)
@@ -1860,7 +1860,7 @@ static int arm_smmu_add_device(struct device *dev)
 {
int i, ret;
struct arm_smmu_device *smmu;
-   struct arm_smmu_master_data *master;
+   struct arm_smmu_master *master;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct iommu_group *group;
 
@@ -1913,7 +1913,7 @@ static int arm_smmu_add_device(struct device *dev)
 static void arm_smmu_remove_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-   struct arm_smmu_master_data *master;
+   struct arm_smmu_master *master;
struct arm_smmu_device *smmu;
 
if (!fwspec || fwspec->ops != _smmu_ops)
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 2/9] PCI: Add a stub for pci_ats_disabled()

2019-04-17 Thread Jean-Philippe Brucker
Currently pci_ats_disabled() is only defined when CONFIG_PCI is enabled.
Since we're about to use the function in the Arm SMMUv3 driver, which
could be built with CONFIG_PCI disabled, add a definition of
pci_ats_disabled() for !CONFIG_PCI.

Signed-off-by: Jean-Philippe Brucker 
---
 include/linux/pci.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 169c6a18d0b0..61d7cd888bad 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1713,6 +1713,7 @@ static inline int pci_irqd_intx_xlate(struct irq_domain 
*d,
 static inline const struct pci_device_id *pci_match_id(const struct 
pci_device_id *ids,
 struct pci_dev *dev)
 { return NULL; }
+static inline bool pci_ats_disabled(void) { return true; }
 #endif /* CONFIG_PCI */
 
 #ifdef CONFIG_PCI_ATS
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/9] PCI: Move ATS declarations outside of CONFIG_PCI

2019-04-17 Thread Jean-Philippe Brucker
At the moment, the ATS functions are only defined when CONFIG_PCI is
enabled. Since we're about to use them in the Arm SMMUv3 driver, which
could be built with CONFIG_PCI disabled, and they are already guarded by
CONFIG_PCI_ATS which depends on CONFIG_PCI, move the definitions outside
of CONFIG_PCI.

Signed-off-by: Jean-Philippe Brucker 
---
 include/linux/pci.h | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 77448215ef5b..169c6a18d0b0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1521,21 +1521,6 @@ static inline void pcie_ecrc_get_policy(char *str) { }
 
 bool pci_ats_disabled(void);
 
-#ifdef CONFIG_PCI_ATS
-/* Address Translation Service */
-void pci_ats_init(struct pci_dev *dev);
-int pci_enable_ats(struct pci_dev *dev, int ps);
-void pci_disable_ats(struct pci_dev *dev);
-int pci_ats_queue_depth(struct pci_dev *dev);
-int pci_ats_page_aligned(struct pci_dev *dev);
-#else
-static inline void pci_ats_init(struct pci_dev *d) { }
-static inline int pci_enable_ats(struct pci_dev *d, int ps) { return -ENODEV; }
-static inline void pci_disable_ats(struct pci_dev *d) { }
-static inline int pci_ats_queue_depth(struct pci_dev *d) { return -ENODEV; }
-static inline int pci_ats_page_aligned(struct pci_dev *dev) { return 0; }
-#endif
-
 #ifdef CONFIG_PCIE_PTM
 int pci_enable_ptm(struct pci_dev *dev, u8 *granularity);
 #else
@@ -1730,6 +1715,21 @@ static inline const struct pci_device_id 
*pci_match_id(const struct pci_device_i
 { return NULL; }
 #endif /* CONFIG_PCI */
 
+#ifdef CONFIG_PCI_ATS
+/* Address Translation Service */
+void pci_ats_init(struct pci_dev *dev);
+int pci_enable_ats(struct pci_dev *dev, int ps);
+void pci_disable_ats(struct pci_dev *dev);
+int pci_ats_queue_depth(struct pci_dev *dev);
+int pci_ats_page_aligned(struct pci_dev *dev);
+#else
+static inline void pci_ats_init(struct pci_dev *d) { }
+static inline int pci_enable_ats(struct pci_dev *d, int ps) { return -ENODEV; }
+static inline void pci_disable_ats(struct pci_dev *d) { }
+static inline int pci_ats_queue_depth(struct pci_dev *d) { return -ENODEV; }
+static inline int pci_ats_page_aligned(struct pci_dev *dev) { return 0; }
+#endif
+
 /* Include architecture-dependent settings and functions */
 
 #include 
-- 
2.21.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Nadav Amit
> On Apr 17, 2019, at 10:26 AM, Ingo Molnar  wrote:
> 
> 
> * Nadav Amit  wrote:
> 
>>> On Apr 17, 2019, at 10:09 AM, Ingo Molnar  wrote:
>>> 
>>> 
>>> * Khalid Aziz  wrote:
>>> 
> I.e. the original motivation of the XPFO patches was to prevent execution 
> of direct kernel mappings. Is this motivation still present if those 
> mappings are non-executable?
> 
> (Sorry if this has been asked and answered in previous discussions.)
 
 Hi Ingo,
 
 That is a good question. Because of the cost of XPFO, we have to be very
 sure we need this protection. The paper from Vasileios, Michalis and
 Angelos - ,
 does go into how ret2dir attacks can bypass SMAP/SMEP in sections 6.1
 and 6.2.
>>> 
>>> So it would be nice if you could generally summarize external arguments 
>>> when defending a patchset, instead of me having to dig through a PDF 
>>> which not only causes me to spend time that you probably already spent 
>>> reading that PDF, but I might also interpret it incorrectly. ;-)
>>> 
>>> The PDF you cited says this:
>>> 
>>> "Unfortunately, as shown in Table 1, the W^X prop-erty is not enforced 
>>>  in many platforms, including x86-64.  In our example, the content of 
>>>  user address 0xBEEF000 is also accessible through kernel address 
>>>  0x87FF9F08 as plain, executable code."
>>> 
>>> Is this actually true of modern x86-64 kernels? We've locked down W^X 
>>> protections in general.
>> 
>> As I was curious, I looked at the paper. Here is a quote from it:
>> 
>> "In x86-64, however, the permissions of physmap are not in sane state.
>> Kernels up to v3.8.13 violate the W^X property by mapping the entire region
>> as “readable, writeable, and executable” (RWX)—only very recent kernels
>> (≥v3.9) use the more conservative RW mapping.”
> 
> But v3.8.13 is a 5+ years old kernel, it doesn't count as a "modern" 
> kernel in any sense of the word. For any proposed patchset with 
> significant complexity and non-trivial costs the benchmark version 
> threshold is the "current upstream kernel".
> 
> So does that quote address my followup questions:
> 
>> Is this actually true of modern x86-64 kernels? We've locked down W^X
>> protections in general.
>> 
>> I.e. this conclusion:
>> 
>>  "Therefore, by simply overwriting kfptr with 0x87FF9F08 and
>>   triggering the kernel to dereference it, an attacker can directly
>>   execute shell code with kernel privileges."
>> 
>> ... appears to be predicated on imperfect W^X protections on the x86-64
>> kernel.
>> 
>> Do such holes exist on the latest x86-64 kernel? If yes, is there a
>> reason to believe that these W^X holes cannot be fixed, or that any fix
>> would be more expensive than XPFO?
> 
> ?
> 
> What you are proposing here is a XPFO patch-set against recent kernels 
> with significant runtime overhead, so my questions about the W^X holes 
> are warranted.
> 

Just to clarify - I am an innocent bystander and have no part in this work.
I was just looking (again) at the paper, as I was curious due to the recent
patches that I sent that improve W^X protection.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Khalid Aziz
On 4/17/19 11:09 AM, Ingo Molnar wrote:
> 
> * Khalid Aziz  wrote:
> 
>>> I.e. the original motivation of the XPFO patches was to prevent execution 
>>> of direct kernel mappings. Is this motivation still present if those 
>>> mappings are non-executable?
>>>
>>> (Sorry if this has been asked and answered in previous discussions.)
>>
>> Hi Ingo,
>>
>> That is a good question. Because of the cost of XPFO, we have to be very
>> sure we need this protection. The paper from Vasileios, Michalis and
>> Angelos - ,
>> does go into how ret2dir attacks can bypass SMAP/SMEP in sections 6.1
>> and 6.2.
> 
> So it would be nice if you could generally summarize external arguments 
> when defending a patchset, instead of me having to dig through a PDF 
> which not only causes me to spend time that you probably already spent 
> reading that PDF, but I might also interpret it incorrectly. ;-)

Sorry, you are right. Even though that paper explains it well, a summary
is always useful.

> 
> The PDF you cited says this:
> 
>   "Unfortunately, as shown in Table 1, the W^X prop-erty is not enforced 
>in many platforms, including x86-64.  In our example, the content of 
>user address 0xBEEF000 is also accessible through kernel address 
>0x87FF9F08 as plain, executable code."
> 
> Is this actually true of modern x86-64 kernels? We've locked down W^X 
> protections in general.
> 
> I.e. this conclusion:
> 
>   "Therefore, by simply overwriting kfptr with 0x87FF9F08 and 
>triggering the kernel to dereference it, an attacker can directly 
>execute shell code with kernel privileges."
> 
> ... appears to be predicated on imperfect W^X protections on the x86-64 
> kernel.
> 
> Do such holes exist on the latest x86-64 kernel? If yes, is there a 
> reason to believe that these W^X holes cannot be fixed, or that any fix 
> would be more expensive than XPFO?

Even if physmap is not executable, return-oriented programming (ROP) can
still be used to launch an attack. Instead of placing executable code at
user address 0xBEEF000, attacker can place an ROP payload there. kfptr
is then overwritten to point to a stack-pivoting gadget. Using the
physmap address aliasing, the ROP payload becomes kernel-mode stack. The
execution can then be hijacked upon execution of ret instruction. This
is a gist of the subsection titled "Non-executable physmap" under
section 6.2 and it looked convincing enough to me. If you have a
different take on this, I am very interested in your point of view.

Thanks,
Khalid


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Ingo Molnar

* Nadav Amit  wrote:

> > On Apr 17, 2019, at 10:09 AM, Ingo Molnar  wrote:
> > 
> > 
> > * Khalid Aziz  wrote:
> > 
> >>> I.e. the original motivation of the XPFO patches was to prevent execution 
> >>> of direct kernel mappings. Is this motivation still present if those 
> >>> mappings are non-executable?
> >>> 
> >>> (Sorry if this has been asked and answered in previous discussions.)
> >> 
> >> Hi Ingo,
> >> 
> >> That is a good question. Because of the cost of XPFO, we have to be very
> >> sure we need this protection. The paper from Vasileios, Michalis and
> >> Angelos - ,
> >> does go into how ret2dir attacks can bypass SMAP/SMEP in sections 6.1
> >> and 6.2.
> > 
> > So it would be nice if you could generally summarize external arguments 
> > when defending a patchset, instead of me having to dig through a PDF 
> > which not only causes me to spend time that you probably already spent 
> > reading that PDF, but I might also interpret it incorrectly. ;-)
> > 
> > The PDF you cited says this:
> > 
> >  "Unfortunately, as shown in Table 1, the W^X prop-erty is not enforced 
> >   in many platforms, including x86-64.  In our example, the content of 
> >   user address 0xBEEF000 is also accessible through kernel address 
> >   0x87FF9F08 as plain, executable code."
> > 
> > Is this actually true of modern x86-64 kernels? We've locked down W^X 
> > protections in general.
> 
> As I was curious, I looked at the paper. Here is a quote from it:
> 
> "In x86-64, however, the permissions of physmap are not in sane state.
> Kernels up to v3.8.13 violate the W^X property by mapping the entire region
> as “readable, writeable, and executable” (RWX)—only very recent kernels
> (≥v3.9) use the more conservative RW mapping.”

But v3.8.13 is a 5+ years old kernel, it doesn't count as a "modern" 
kernel in any sense of the word. For any proposed patchset with 
significant complexity and non-trivial costs the benchmark version 
threshold is the "current upstream kernel".

So does that quote address my followup questions:

> Is this actually true of modern x86-64 kernels? We've locked down W^X
> protections in general.
>
> I.e. this conclusion:
>
>   "Therefore, by simply overwriting kfptr with 0x87FF9F08 and
>triggering the kernel to dereference it, an attacker can directly
>execute shell code with kernel privileges."
>
> ... appears to be predicated on imperfect W^X protections on the x86-64
> kernel.
>
> Do such holes exist on the latest x86-64 kernel? If yes, is there a
> reason to believe that these W^X holes cannot be fixed, or that any fix
> would be more expensive than XPFO?

?

What you are proposing here is a XPFO patch-set against recent kernels 
with significant runtime overhead, so my questions about the W^X holes 
are warranted.

Thanks,

Ingo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Nadav Amit
> On Apr 17, 2019, at 10:09 AM, Ingo Molnar  wrote:
> 
> 
> * Khalid Aziz  wrote:
> 
>>> I.e. the original motivation of the XPFO patches was to prevent execution 
>>> of direct kernel mappings. Is this motivation still present if those 
>>> mappings are non-executable?
>>> 
>>> (Sorry if this has been asked and answered in previous discussions.)
>> 
>> Hi Ingo,
>> 
>> That is a good question. Because of the cost of XPFO, we have to be very
>> sure we need this protection. The paper from Vasileios, Michalis and
>> Angelos - ,
>> does go into how ret2dir attacks can bypass SMAP/SMEP in sections 6.1
>> and 6.2.
> 
> So it would be nice if you could generally summarize external arguments 
> when defending a patchset, instead of me having to dig through a PDF 
> which not only causes me to spend time that you probably already spent 
> reading that PDF, but I might also interpret it incorrectly. ;-)
> 
> The PDF you cited says this:
> 
>  "Unfortunately, as shown in Table 1, the W^X prop-erty is not enforced 
>   in many platforms, including x86-64.  In our example, the content of 
>   user address 0xBEEF000 is also accessible through kernel address 
>   0x87FF9F08 as plain, executable code."
> 
> Is this actually true of modern x86-64 kernels? We've locked down W^X 
> protections in general.

As I was curious, I looked at the paper. Here is a quote from it:

"In x86-64, however, the permissions of physmap are not in sane state.
Kernels up to v3.8.13 violate the W^X property by mapping the entire region
as “readable, writeable, and executable” (RWX)—only very recent kernels
(≥v3.9) use the more conservative RW mapping.”

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Ingo Molnar


* Khalid Aziz  wrote:

> > I.e. the original motivation of the XPFO patches was to prevent execution 
> > of direct kernel mappings. Is this motivation still present if those 
> > mappings are non-executable?
> > 
> > (Sorry if this has been asked and answered in previous discussions.)
> 
> Hi Ingo,
> 
> That is a good question. Because of the cost of XPFO, we have to be very
> sure we need this protection. The paper from Vasileios, Michalis and
> Angelos - ,
> does go into how ret2dir attacks can bypass SMAP/SMEP in sections 6.1
> and 6.2.

So it would be nice if you could generally summarize external arguments 
when defending a patchset, instead of me having to dig through a PDF 
which not only causes me to spend time that you probably already spent 
reading that PDF, but I might also interpret it incorrectly. ;-)

The PDF you cited says this:

  "Unfortunately, as shown in Table 1, the W^X prop-erty is not enforced 
   in many platforms, including x86-64.  In our example, the content of 
   user address 0xBEEF000 is also accessible through kernel address 
   0x87FF9F08 as plain, executable code."

Is this actually true of modern x86-64 kernels? We've locked down W^X 
protections in general.

I.e. this conclusion:

  "Therefore, by simply overwriting kfptr with 0x87FF9F08 and 
   triggering the kernel to dereference it, an attacker can directly 
   execute shell code with kernel privileges."

... appears to be predicated on imperfect W^X protections on the x86-64 
kernel.

Do such holes exist on the latest x86-64 kernel? If yes, is there a 
reason to believe that these W^X holes cannot be fixed, or that any fix 
would be more expensive than XPFO?

Thanks,

Ingo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Khalid Aziz
On 4/17/19 10:15 AM, Ingo Molnar wrote:
> 
> [ Sorry, had to trim the Cc: list from hell. Tried to keep all the 
>   mailing lists and all x86 developers. ]
> 
> * Khalid Aziz  wrote:
> 
>> From: Juerg Haefliger 
>>
>> This patch adds basic support infrastructure for XPFO which protects 
>> against 'ret2dir' kernel attacks. The basic idea is to enforce 
>> exclusive ownership of page frames by either the kernel or userspace, 
>> unless explicitly requested by the kernel. Whenever a page destined for 
>> userspace is allocated, it is unmapped from physmap (the kernel's page 
>> table). When such a page is reclaimed from userspace, it is mapped back 
>> to physmap. Individual architectures can enable full XPFO support using 
>> this infrastructure by supplying architecture specific pieces.
> 
> I have a higher level, meta question:
> 
> Is there any updated analysis outlining why this XPFO overhead would be 
> required on x86-64 kernels running on SMAP/SMEP CPUs which should be all 
> recent Intel and AMD CPUs, and with kernel that mark all direct kernel 
> mappings as non-executable - which should be all reasonably modern 
> kernels later than v4.0 or so?
> 
> I.e. the original motivation of the XPFO patches was to prevent execution 
> of direct kernel mappings. Is this motivation still present if those 
> mappings are non-executable?
> 
> (Sorry if this has been asked and answered in previous discussions.)

Hi Ingo,

That is a good question. Because of the cost of XPFO, we have to be very
sure we need this protection. The paper from Vasileios, Michalis and
Angelos - ,
does go into how ret2dir attacks can bypass SMAP/SMEP in sections 6.1
and 6.2.

Thanks,
Khalid


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-17 Thread Ingo Molnar


[ Sorry, had to trim the Cc: list from hell. Tried to keep all the 
  mailing lists and all x86 developers. ]

* Khalid Aziz  wrote:

> From: Juerg Haefliger 
> 
> This patch adds basic support infrastructure for XPFO which protects 
> against 'ret2dir' kernel attacks. The basic idea is to enforce 
> exclusive ownership of page frames by either the kernel or userspace, 
> unless explicitly requested by the kernel. Whenever a page destined for 
> userspace is allocated, it is unmapped from physmap (the kernel's page 
> table). When such a page is reclaimed from userspace, it is mapped back 
> to physmap. Individual architectures can enable full XPFO support using 
> this infrastructure by supplying architecture specific pieces.

I have a higher level, meta question:

Is there any updated analysis outlining why this XPFO overhead would be 
required on x86-64 kernels running on SMAP/SMEP CPUs which should be all 
recent Intel and AMD CPUs, and with kernel that mark all direct kernel 
mappings as non-executable - which should be all reasonably modern 
kernels later than v4.0 or so?

I.e. the original motivation of the XPFO patches was to prevent execution 
of direct kernel mappings. Is this motivation still present if those 
mappings are non-executable?

(Sorry if this has been asked and answered in previous discussions.)

Thanks,

Ingo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/mediatek: fix leaked of_node references

2019-04-17 Thread Markus Elfring
> The call to of_parse_phandle returns a node pointer with refcount
> incremented thus it must be explicitly decremented after the last
> usage.

Can a splitting of this information into two sentences help?


> 581 static int mtk_iommu_probe(struct platform_device *pdev)
> 582 {
> ...

I suggest to reconsider such a commit description once more.
Would it be better to mention that another function call should be added
in two if branches so that the exception handling would be completed
(instead of copying source code where a software update should become
clear also from the provided diff hunk)?

Regards,
Markus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/mediatek: fix leaked of_node references

2019-04-17 Thread Markus Elfring
> The call to of_parse_phandle returns a node pointer with refcount
> incremented thus it must be explicitly decremented after the last
> usage.

Can a splitting of this information into two sentences help?


> 581 static int mtk_iommu_probe(struct platform_device *pdev)
> 582 {
> ...

I suggest to reconsider such a commit description once more.
Would it be better to mention that another function call should be added
in two if branches so that the exception handling would be completed
(instead of copying source code where a software update should become
clear also from the provided diff hunk)?

Regards,
Markus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers

2019-04-17 Thread Robin Murphy

On 17/04/2019 07:33, Christoph Hellwig wrote:

On Wed, Apr 10, 2019 at 08:11:57AM +0200, Christoph Hellwig wrote:

On Tue, Apr 09, 2019 at 06:59:32PM +0100, Robin Murphy wrote:

On 27/03/2019 08:04, Christoph Hellwig wrote:

This keeps the code together and will simplify compiling the code
out on architectures that are always dma coherent.


And this is where things take a turn in the direction I just can't get on
with - I'm looking at the final result and the twisty maze of little
disjoint helpers all overlapping each other in functionality is really
difficult to follow. And I would *much* rather have things rely on
compile-time constant optimisation than spend the future having to fix the
#ifdefed parts for arm64 whenever x86-centric changes fail to test them.


Can you draft up a patch on top of my series to show me what you
want?  I can take care of finishing it up and moving the changes
into the right patches in the series.


Any chance to make some progress on this?  Or at least a better
description of what you want?


Heh, I did actually start writing a reply last night to apologise for 
the delay - I've been clearing the decks a bit so that I can sit down 
and actually concentrate on this (plus the PCI DT mask fix). That's now 
my plan for the rest of today :)


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


iommu/vt-d: drop mm use count if address is not canonical

2019-04-17 Thread Pan Bian
The use count of svm->mm is incremented by mmget_not_zero. However, it
is not dropped when the address is not canonical. This patch fixes the
bug.

Fixes: 9d8c3af31607("iommu/vt-d: IOMMU Page Request needs to check if
address is canonical.")

Signed-off-by: Pan Bian 
---
 drivers/iommu/intel-svm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 3a4b09a..2630d2e 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -574,8 +574,10 @@ static irqreturn_t prq_event_thread(int irq, void *d)
goto bad_req;
 
/* If address is not canonical, return invalid response */
-   if (!is_canonical_address(address))
+   if (!is_canonical_address(address)) {
+   mmput(svm->mm);
goto bad_req;
+   }
 
down_read(>mm->mmap_sem);
vma = find_extend_vma(svm->mm, address);
-- 
2.7.4


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/mediatek: fix leaked of_node references

2019-04-17 Thread Matthias Brugger



On 17/04/2019 04:41, Wen Yang wrote:
> The call to of_parse_phandle returns a node pointer with refcount
> incremented thus it must be explicitly decremented after the last
> usage.
> 
> 581 static int mtk_iommu_probe(struct platform_device *pdev)
> 582 {
> ...
> 626 for (i = 0; i < larb_nr; i++) {
> 627 struct device_node *larbnode;
> ...
> 631 larbnode = of_parse_phandle(...);
> 632 if (!larbnode)
> 633 return -EINVAL;
> 634
> 635 if (!of_device_is_available(larbnode))
> 636 continue; ---> leaked here
> 637
> ...
> 643 if (!plarbdev)
> 644 return -EPROBE_DEFER; ---> leaked here
> ...
> 647 component_match_add_release(dev, , release_of,
> 648 compare_of, larbnode);
>---> release_of will call of_node_put
> 649 }
> ...
> 650
> 
> Detected by coccinelle with the following warnings:
> ./drivers/iommu/mtk_iommu.c:644:3-9: ERROR: missing of_node_put; acquired a 
> node pointer with refcount incremented on line 631, but without a 
> corresponding object release within this function.
> 
> Signed-off-by: Wen Yang 
> Cc: Joerg Roedel 
> Cc: Matthias Brugger 
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-media...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org

Reviewed-by: Matthias Brugger 

> ---
>  drivers/iommu/mtk_iommu.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index de3e022..b66d11b 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -632,16 +632,20 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>   if (!larbnode)
>   return -EINVAL;
>  
> - if (!of_device_is_available(larbnode))
> + if (!of_device_is_available(larbnode)) {
> + of_node_put(larbnode);
>   continue;
> + }
>  
>   ret = of_property_read_u32(larbnode, "mediatek,larb-id", );
>   if (ret)/* The id is consecutive if there is no this property */
>   id = i;
>  
>   plarbdev = of_find_device_by_node(larbnode);
> - if (!plarbdev)
> + if (!plarbdev) {
> + of_node_put(larbnode);
>   return -EPROBE_DEFER;
> + }
>   data->smi_imu.larb_imu[id].dev = >dev;
>  
>   component_match_add_release(dev, , release_of,
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers

2019-04-17 Thread Christoph Hellwig
On Wed, Apr 10, 2019 at 08:11:57AM +0200, Christoph Hellwig wrote:
> On Tue, Apr 09, 2019 at 06:59:32PM +0100, Robin Murphy wrote:
> > On 27/03/2019 08:04, Christoph Hellwig wrote:
> >> This keeps the code together and will simplify compiling the code
> >> out on architectures that are always dma coherent.
> >
> > And this is where things take a turn in the direction I just can't get on 
> > with - I'm looking at the final result and the twisty maze of little 
> > disjoint helpers all overlapping each other in functionality is really 
> > difficult to follow. And I would *much* rather have things rely on 
> > compile-time constant optimisation than spend the future having to fix the 
> > #ifdefed parts for arm64 whenever x86-centric changes fail to test them.
> 
> Can you draft up a patch on top of my series to show me what you
> want?  I can take care of finishing it up and moving the changes
> into the right patches in the series.

Any chance to make some progress on this?  Or at least a better
description of what you want?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu