RE: Plan for /dev/ioasid RFC v2

2021-06-25 Thread Tian, Kevin
Hi, Alex/Joerg/Jason,

I want to draw your attention to an updated proposal below. Let's see
whether there is a converged direction to move forward.

> From: Jason Gunthorpe 
> Sent: Saturday, June 19, 2021 2:23 AM
> 
> On Fri, Jun 18, 2021 at 04:57:40PM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Friday, June 18, 2021 8:20 AM
> > >
> > > On Thu, Jun 17, 2021 at 03:14:52PM -0600, Alex Williamson wrote:
> > >
> > > > I've referred to this as a limitation of type1, that we can't put
> > > > devices within the same group into different address spaces, such as
> > > > behind separate vRoot-Ports in a vIOMMU config, but really, who cares?
> > > > As isolation support improves we see fewer multi-device groups, this
> > > > scenario becomes the exception.  Buy better hardware to use the
> devices
> > > > independently.
> > >
> > > This is basically my thinking too, but my conclusion is that we should
> > > not continue to make groups central to the API.
> > >
> > > As I've explained to David this is actually causing functional
> > > problems and mess - and I don't see a clean way to keep groups central
> > > but still have the device in control of what is happening. We need
> > > this device <-> iommu connection to be direct to robustly model all
> > > the things that are in the RFC.
> > >
> > > To keep groups central someone needs to sketch out how to solve
> > > today's mdev SW page table and mdev PASID issues in a clean
> > > way. Device centric is my suggestion on how to make it clean, but I
> > > haven't heard an alternative??
> > >
> > > So, I view the purpose of this discussion to scope out what a
> > > device-centric world looks like and then if we can securely fit in the
> > > legacy non-isolated world on top of that clean future oriented
> > > API. Then decide if it is work worth doing or not.
> > >
> > > To my mind it looks like it is not so bad; granted, not every detail
> > > is clear and no code has been sketched, but I don't see a big scary
> > > blocker emerging. An extra ioctl or two, some special logic that
> > > activates for >1 device groups that looks a lot like VFIO's current
> > > logic..
> > >
> > > At some level I would be perfectly fine if we made the group FD part
> > > of the API for >1 device groups - except that it complicates every
> > > user space implementation that has to deal with it. It doesn't feel
> > > like a good trade off.
> > >
> >
> > Would it be an acceptable tradeoff to leave >1 device groups
> > supported only via legacy VFIO (which is anyway kept for backward
> > compatibility), if we think such scenarios are being deprecated over
> > time (thus there is little value in adding new features for them)?
> > Then all new sub-systems including vdpa and new vfio would only
> > support singleton device groups via /dev/iommu...
> 
> That might just be a great idea - userspace has to support those APIs
> anyhow; if it can be made trivially obvious to use this fallback even
> though /dev/iommu is available, it is a great place to start. It also
> means PASID/etc are naturally blocked off.
> 
> Maybe years down the road we will want to harmonize them, so I would
> still sketch it out enough to be confident it could be implemented..
> 

First let's align on the high level goal of supporting multi-device groups
via the IOMMU fd. Based on previous discussions I feel it's fair to say that
we will not provide new features beyond what a vfio group delivers today,
which implies:

1) All devices within the group must share the same address space.

Though it's possible to support multiple address spaces (e.g. when a
group is only formed due to lack of ACS), there are some scenarios
(DMA aliasing, RID sharing, etc.) where a single address space is
mandatory. Since isolation support keeps improving over time, the
effort to support multiple spaces is not worthwhile.

2) It's not necessary to bind all devices within the group to the IOMMU fd.

Other devices could be left unused, or bound to a known driver which
doesn't do DMA. This implies that a group viability mechanism must be
in place which can identify when the group is viable for operation, and
BUG_ON() when viability is broken by user action.

3) The user must be denied access to a device before its group is attached
to a known security context.

If the above goals are agreed upon, below is the updated proposal for
supporting multi-device groups via a device-centric API. Most ideas come
from Jason; here I try to expand and compose them into a full picture.

In general:

-   vfio keeps the existing uAPI sequence, with slightly different semantics:

a) VFIO_GROUP_SET_CONTAINER, as today

b) VFIO_SET_IOMMU with a new iommu type (VFIO_EXTERNAL_IOMMU)
 which, once set, tells VFIO not to establish its own security
 context.

c)  VFIO_GROUP_GET_DEVICE_FD_NEW, carrying additional info
 about the external iommu driver (iommu_fd, device_cookie). This
 call automatically binds the device to the iommu_fd.

Re: [PATCH v5 0/5] Add support for ACPI VIOT

2021-06-25 Thread Joerg Roedel
On Fri, Jun 18, 2021 at 05:20:55PM +0200, Jean-Philippe Brucker wrote:
> Jean-Philippe Brucker (5):
>   ACPI: arm64: Move DMA setup operations out of IORT
>   ACPI: Move IOMMU setup code out of IORT
>   ACPI: Add driver for the VIOT table
>   iommu/dma: Pass address limit rather than size to
> iommu_setup_dma_ops()
>   iommu/virtio: Enable x86 support

Applied all for v5.14, thanks Jean-Philippe for driving this effort and
the patience to do it right!
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v15 00/12] Restricted DMA

2021-06-25 Thread Will Deacon
On Thu, Jun 24, 2021 at 03:19:48PM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Jun 24, 2021 at 11:55:14PM +0800, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> > 
> > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> > not behind an IOMMU. As PCI-e, by design, gives the device full access to
> > system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> > full chain of exploits; [2], [3]).
> > 
> > To mitigate the security concerns, we introduce restricted DMA. Restricted
> > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> > specially allocated region and does memory allocation from the same region.
> > The feature on its own provides a basic level of protection against the DMA
> > overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system needs
> > to provide a way to restrict the DMA to a predefined memory region (this is
> > usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
> > 
> > [1a] 
> > https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> > [1b] 
> > https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> > [2] https://blade.tencent.com/en/advisories/qualpwn/
> > [3] 
> > https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> > [4] 
> > https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> > 
> > v15:
> > - Apply Will's diff 
> > (https://lore.kernel.org/patchwork/patch/1448957/#1647521)
> >   to fix the crash reported by Qian.
> > - Add Stefano's Acked-by tag for patch 01/12 from v14
> 
> That all should be now be on
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/
> devel/for-linus-5.14 (and linux-next)

Thanks Konrad!

Will


Re: [PATCH v2 0/3] iommu: Enable non-strict DMA on QCom SD/MMC

2021-06-25 Thread Joerg Roedel
Hi Douglas,

On Thu, Jun 24, 2021 at 10:17:56AM -0700, Douglas Anderson wrote:
> The goal of this patch series is to get better SD/MMC performance on
> Qualcomm eMMC controllers and generally nudge us forward on the
> path of allowing some devices to be in strict mode and others to be in
> non-strict mode.

So if I understand it right, this patch-set wants a per-device decision
about setting dma-mode to strict vs. non-strict.

I think we should first get to the reason why strict mode is used by
default. Is strict mode the default on ARM platforms, and if so, why?

The x86 IOMMUs use non-strict mode by default (yes, it is a security
trade-off).

Regards,

Joerg


Re: [PATCH v5 3/5] drm/msm: Improve the a6xx page fault handler

2021-06-25 Thread Rob Clark
On Thu, Jun 24, 2021 at 8:39 PM Bjorn Andersson
 wrote:
>
> On Thu 10 Jun 16:44 CDT 2021, Rob Clark wrote:
> [..]
> > diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
> > b/drivers/gpu/drm/msm/msm_iommu.c
> > index 50d881794758..6975b95c3c29 100644
> > --- a/drivers/gpu/drm/msm/msm_iommu.c
> > +++ b/drivers/gpu/drm/msm/msm_iommu.c
> > @@ -211,8 +211,17 @@ static int msm_fault_handler(struct iommu_domain 
> > *domain, struct device *dev,
> >   unsigned long iova, int flags, void *arg)
> >  {
> >   struct msm_iommu *iommu = arg;
> > + struct adreno_smmu_priv *adreno_smmu = 
> > dev_get_drvdata(iommu->base.dev);
> > + struct adreno_smmu_fault_info info, *ptr = NULL;
> > +
> > + if (adreno_smmu->get_fault_info) {
>
> This seemed reasonable when I read it last time, but I didn't realize
> that the msm_fault_handler() is installed for all msm_iommu instances.
>
> So while we're trying to recover from the boot splash and setup the new
> framebuffer we end up here with iommu->base.dev being the mdss device.
> Naturally drvdata of mdss is not a struct adreno_smmu_priv.
>
> > + adreno_smmu->get_fault_info(adreno_smmu->cookie, &info);
>
> So here we just jump straight out into hyperspace, never to return.
>
> Not sure how to wire this up to avoid the problem, but right now I don't
> think we can boot any device with a boot splash.
>

I think we could do:


diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index eed2a762e9dd..30ee8866154e 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -29,6 +29,9 @@ static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu)
  return container_of(mmu, struct msm_iommu_pagetable, base);
 }

+static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
+ unsigned long iova, int flags, void *arg);
+
 static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
  size_t size)
 {
@@ -151,6 +154,8 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
  struct io_pgtable_cfg ttbr0_cfg;
  int ret;

+ iommu_set_fault_handler(iommu->domain, msm_fault_handler, iommu);
+
  /* Get the pagetable configuration from the domain */
  if (adreno_smmu->cookie)
  ttbr1_cfg = adreno_smmu->get_ttbr1_cfg(adreno_smmu->cookie);
@@ -300,7 +305,6 @@ struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain)

  iommu->domain = domain;
  msm_mmu_init(&iommu->base, dev, &funcs, MSM_MMU_IOMMU);
- iommu_set_fault_handler(domain, msm_fault_handler, iommu);

  atomic_set(&iommu->pagetables, 0);



That would have the result of setting the same fault handler multiple
times, but that looks harmless.  Mostly the fault handling stuff is there
to make it easier to debug userspace issues; the fallback dmesg spam from
arm-smmu should be sufficient for any kernel-side issues.

BR,
-R


Re: [PATCH] x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()

2021-06-25 Thread Luck, Tony
On Wed, Jun 09, 2021 at 04:34:31PM -0700, Andy Lutomirski wrote:
> On 6/9/21 10:32 AM, Luck, Tony wrote:
> >>> I've told Fenghua to dig out the previous iteration of this patch where
> >>> the plan was to lazily fix the PASID_MSR in other threads in the #GP
> >>> handler.
> >>
> >> Blech.  Also this won't work for other PASID-like features.
> >>
> >> I have a half-written patch to fix this up for real.  Stay tuned.
> > 
> > Andy: Any update on this?
> > 
> > -Tony
> > 
> 
> Let me try to merge my pile with tglx's pile and come up with something
> halfway sane.

Looks like Thomas' pile is done (well done enough to be queued in TIP).

Can we see your "pile" soon?

-Tony


Re: Plan for /dev/ioasid RFC v2

2021-06-25 Thread Jason Gunthorpe
On Fri, Jun 25, 2021 at 10:27:18AM +, Tian, Kevin wrote:

> -   When receiving the binding call for the 1st device in a group, iommu_fd 
> calls iommu_group_set_block_dma(group, dev->driver) which does 
> several things:

The whole problem here is trying to match this new world where we want
devices to be in charge of their own IOMMU configuration and the old
world where groups are in charge.

Inserting the group fd and then calling a device-centric
VFIO_GROUP_GET_DEVICE_FD_NEW doesn't solve this conflict, and isn't
necessary. We can always get the group back from the device at any
point in the sequence to do a group-wide operation.

What I saw as the appeal of this sort of idea was to just completely
leave all the difficult multi-device-group scenarios behind on the old
group-centric API; then we don't have to deal with them at all, or at
least not right away.

I'd see some progression where iommu_fd only works with 1:1 groups at
the start. Other scenarios continue with the old API.

Then maybe groups where all devices use the same IOASID.

Then 1:N groups, if the source device is reliably identifiable. This
requires iommu subsystem work to attach domains to sub-group objects -
not sure it is worthwhile.

But at least we can talk about each step with well-thought-out patches.

The only thing that needs to be done to get the 1:1 step is to broadly
define how the other two cases will work so we don't get into trouble
and set some way to exclude the problematic cases from even getting to
iommu_fd in the first place.

For instance if we go ahead and create /dev/vfio/device nodes we could
do this only if the group was 1:1, otherwise the group cdev has to be
used, along with its API.

> a) Check group viability. A group is viable only when all devices in
> the group are in one of below states:
> 
> * driver-less
> * bound to a driver which is same as dev->driver (vfio in 
> this case)
> * bound to an otherwise allowed driver (same list as in vfio)

This really shouldn't use hardwired driver checks. Attached drivers
should generically indicate to the iommu layer that they are safe for
iommu_fd usage by calling some function around probe()

Thus a group must contain only iommu_fd-safe drivers, or driver-less
devices, before any of it can be used. It is the more general
refactoring of what VFIO is doing.

> c) The iommu layer also verifies group viability on the
> BUS_NOTIFY_BOUND_DRIVER event. BUG_ON if viability is broken while
> block_dma is set.

And with this concept of iommu_fd safety being first-class, maybe we
can eliminate this gross BUG_ON (and the hundreds of lines of code
used to create it) by somehow denying probe to non-iommu-safe drivers.

> -   Binding other devices in the group to iommu_fd just succeeds since 
> the group is already in block_dma.

I think the rest of this more or less describes the device-centric
logic for multi-device groups we've already talked about. I don't
think it benefits from having the group fd.

Jason


Re: [PATCH v2 0/3] iommu: Enable non-strict DMA on QCom SD/MMC

2021-06-25 Thread Doug Anderson
Hi,

On Fri, Jun 25, 2021 at 6:19 AM Joerg Roedel  wrote:
>
> Hi Douglas,
>
> On Thu, Jun 24, 2021 at 10:17:56AM -0700, Douglas Anderson wrote:
> > The goal of this patch series is to get better SD/MMC performance on
> > Qualcomm eMMC controllers and generally nudge us forward on the
> > path of allowing some devices to be in strict mode and others to be in
> > non-strict mode.
>
> So if I understand it right, this patch-set wants a per-device decision
> about setting dma-mode to strict vs. non-strict.
>
> I think we should first get to the reason why strict mode is used by
> default. Is strict mode the default on ARM platforms, and if so, why?
>
> The x86 IOMMUs use non-strict mode by default (yes, it is a security
> trade-off).

It is certainly a good question. I will say that, as per usual, I'm
fumbling around trying to solve problems in subsystems I'm not an
expert at, so if something I'm saying sounds like nonsense it probably
is. Please correct me.

I guess I'd start out by thinking about what devices I think need to
be in "strict" mode. Most of my thoughts on this are in the 3rd patch
in the series. I think devices where it's important to be in strict
mode fall into "Case 1" from that patch description, copied here:

Case 1: IOMMUs prevent malicious code running on the peripheral (maybe
a malicious peripheral or maybe someone exploited a benign peripheral)
from turning into an exploit of the Linux kernel. This is particularly
important if the peripheral has loadable / updatable firmware or if
the peripheral has some type of general purpose processor and is
processing untrusted inputs. It's also important if the device is
something that can be easily plugged into the host and the device has
direct DMA access itself, like a PCIe device.


Using sc7180 as an example (searching for iommus in sc7180.dtsi), I'd
expect these peripherals to be in strict mode:

* WiFi / LTE - I'm almost certain we want this in "strict" mode. Both
have loadable / updatable firmware and both do complex processing on
untrusted inputs. Both have a history of being compromised over the
air just by being near an attacker. Note that on sc7180 these are
_not_ connected over PCI so we can't leverage any PCI mechanism for
deciding strict / non-strict.

* Video decode / encode - pretty sure we want this in strict. It's got
loadable / updatable firmware and processes complex / untrusted
inputs.

* LPASS (low power audio subsystem) - I don't know a ton and I think
we don't use this much on our designs, but I believe it meets the
definitions for needing "strict".

* The QUPs (handles UART, SPI, and i2c) - I'm not as sure here. These
are much "smarter" than you'd expect. They have loadable / updatable
firmware and certainly have a sort of general purpose processor in
them. They also might be processing untrusted inputs, but presumably
in a pretty simple way. At the moment we don't use a ton of DMA here
anyway and these are pretty low speed, so I would tend to leave them
as strict just to be on the safe side.


I'd expect these to be non-strict:

* SD/MMC - as described in this patch series.

* USB - As far as I know firmware isn't updatable and has no history
of being compromised.


Special:

* GPU - This already has a bunch of special cases, so we probably
don't need to discuss here.


As far as I can tell everything in sc7180.dtsi that has an "iommus"
property is classified above. So, unless I'm wrong and it's totally
fine to run LTE / WiFi / Video / LPASS in non-strict mode then:

* We still need some way to pick strict vs. non-strict.

* Since I've only identified two peripherals that I think should be
non-strict, having "strict" the default seems like fewer special
cases. It's also safer.


In terms of thinking about x86 / AMD where the default is non-strict,
I don't have any historical knowledge there. I guess the use of PCI
for connecting WiFi is more common (so you can use the PCI special
cases) and I'd sorta hope that WiFi is running in strict mode. For
video encode / decode, perhaps x86 / AMD are just accepting the risk
here because there was no kernel infrastructure for doing better? I'd
also expect that x86/AMD don't have something quite as crazy as the
QUPs for UART/I2C/SPI, but even if they do I wouldn't be terribly
upset if they were in non-strict mode.

...so I guess maybe the super short answer to everything above is that
I believe that at least WiFi ought to be in "strict" mode and it's not
on PCI so we need to come up with some type of per-device solution.


-Doug


Re: [PATCH v14 0/6] iommu: Enhance IOMMU default DMA mode build options

2021-06-25 Thread John Garry

On 18/06/2021 12:34, John Garry wrote:

This is a reboot of Zhen Lei's series from a couple of years ago, which
never made it across the line.

I still think that it has some value, so taking up the mantle.

Motivation:
Allow lazy mode to be the default mode for DMA domains on all ARCHs, not
only those which hardcode it (to be lazy). For ARM64, currently we must use
a kernel command line parameter to use lazy mode, which is less than
ideal.

I have now included the print for strict/lazy mode, which I originally
sent in:
https://lore.kernel.org/linux-iommu/72eb3de9-1d1c-ae46-c5a9-95f26525d...@huawei.com/

There was some concern there about drivers and their custom prints
conflicting with the print in that patch, but I think that it
should be ok.

Based on next-20210611 + "iommu: Update "iommu.strict" documentation"


Hi Joerg, Will,

We think that this series is ready to go.

There would be a build conflict with the following:
https://lore.kernel.org/linux-iommu/20210616100500.174507-1-na...@vmware.com/

So please let us know where you stand on it, so that it can be resolved.

Robin and Baolu have kindly reviewed all the patches, apart from the AMD 
one.


Thanks,
John


Re: IOMMU regression in 5.11 kernel related to device removal from PCI topology

2021-06-25 Thread Robin Murphy

Hi Andrey,

On 2021-06-24 17:19, Andrey Grodzovsky via iommu wrote:

Hello,

We are testing AMD GPU removal from the PCI topology using the sysfs
'remove' hook. We encountered a crash in IOMMU code related to a double
free. Note that the crash happens without even loading the AMD graphics
driver - amdgpu. The dmesg with the KASAN report and PCI topology is
attached here in the log. I went a bit ahead to the 5.12-rc3 kernel, and
already there this crash is gone. We also didn't have this issue in the
5.9 kernel. It's essential for us to use the 5.11 kernel, so we cannot
currently just skip to 5.12 to avoid this issue. I would very much
appreciate it if someone on the list is aware of a fix for this issue
that we could back-port to our private 5.11 kernel tree.


Unfortunately I don't recall any specific issues with the IOVA and 
iommu-dma code in that timeframe, and I fear it may be nastier than it 
looks. The KASAN report implies that put_iova_domain() (probably the 
free_iova_mem() loop) is trying to free something allocated by 
iommu_get_dma_cookie() - that's a big red flag to start with. The only 
thing that iommu_get_dma_cookie() allocates is the iova_domain itself, 
so the most logical cause would seem to be put_iova_domain() trying to 
free iovad->anchor, except I'm struggling to make that 136-byte offset 
line up in a way that makes sense, and even then free_iova_mem() 
explicitly knows not to do that so something would have had to corrupt 
anchor->pfn_lo in the meantime :/


Is this reproducible reliably enough that you can bisect it?

Robin.