Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-06-07 Thread Jean-Philippe Brucker
On 06/06/18 22:22, Jacob Pan wrote:
> On Wed, 6 Jun 2018 12:20:51 +0100
> Jean-Philippe Brucker  wrote:
> 
>> On 05/06/18 18:32, Jacob Pan wrote:
 "bytes" could be passed by VFIO as argument to bind_pasid_table,
 since it can deduce it from argsz
  
>>> Are you suggesting we wrap this struct in a vfio struct with argsz?
>>> or we directly use this struct?
>>>
>>> I need to clarify how vfio will use this.  
>>
>> Right, I think we've diverged a bit since the last discussion :)
>>
>>> - User program:
>>> struct pasid_table_config ptc = { .bytes = sizeof(ptc) };
>>> ptc.version = 1;
>>> ioctl(device, VFIO_DEVICE_BIND_PASID_TABLE, );  
>>
>> Any reason to do the ioctl on device instead of container? As we're
>> binding address spaces we probably want a consistent view for the
>> whole container, like the MAP/UNMAP ioctls do.
>>
> I was thinking the pasid table storage is per device, it would be
> more secure if the pasid table is contained within the device. We
> should have one device per container in most cases.
> in case of two or more devices in the same container shares the same
> pasid table, isolation may not be good in that the second device can
> dma with pasids it does not own but in the shared pasid table.

The situation seems similar to map/unmap interface: if two devices are
in the same container, they are not isolated from each others, they
access the same address space. One device can access mappings that were
created for the other, and it's a feature rather than a security issue.
In a non-SVA configuration, if user wants to isolate two devices (the
usual case), they will use different containers. With SVA I think they
should keep doing that. But that's probably a matter of taste more than
a technical problem.

My issue with doing the ioctl on device, though, is that we tell users
that we can isolate PASIDs at device granularity, which isn't
necessarily the case. If two PCI devices are in the same group because
they aren't isolated by ACS (they can do p2p), then a BIND_PASID_TABLE
call on one device might allow the other device to see the same address
spaces, even if that other device doesn't have a pasid table.

In my host-sva patches I don't allow bind if there's more than one
device in the group, but that's only to keep the series simple, and I
don't think we should prevent SVA support for multi-device groups from
being added later (some people might actually want p2p + PASID). So if
not on containers, the ioctl should at least be on groups. Otherwise
we'll make false promises to users and might run into trouble later.

>> As I remember it the userspace interface would use a VFIO header and
>> the BIND ioctl. I can't find the email in my archive though, so I
>> might be imagining it. This is what I remember, on the user side:
>>
>> struct {
>>  struct vfio_iommu_type1_bindhdr;
>>  struct pasid_table_config   cfg;
>> } bind = {
>>  .hdr.argsz  = sizeof(bind),
>>  .hdr.flags  = VFIO_IOMMU_BIND_PASID_TABLE,
>>  /* cfg data here */
>> };
>>
>> ioctl(container, VFIO_DEVICE_BIND, );
>>
> or maybe just use your VFIO_IOMMU_BIND command and vfio_iommu_type1_bind
> with a new flag and PTC as the data. there can be future extensions,
> bind pasid table can be too narrow. And i agree below using argsz and
> flags are more flexible.
> 
> i.e.
> /* takes pasid_table_config as data for flag VFIO_IOMMU_BIND_PASIDTBL */
> struct vfio_iommu_type1_bind {
>   __u32   argsz;
>   __u32   flags;
> #define VFIO_IOMMU_BIND_PROCESS   (1 << 0)
> #define VFIO_IOMMU_BIND_PASIDTBL  (1 << 1)
>   __u8data[];
> };
> 
> pseudo code in kernel:
>   switch (bind.flags) {
>   case VFIO_IOMMU_BIND_PROCESS:
>   return vfio_iommu_type1_bind_process(iommu, (void *)arg,
>);
>   case VFIO_IOMMU_BIND_PASIDTBL:
>   return vfio_iommu_type1_bind_pasid_tbl(iommu, );
> }
> 
> vfio_iommu_type1_bind_pasid_tbl(iommu, bind)
> {
>   /* loop through domain list, group, device */
>   struct pasid_table_cfg *ptc = bind->data;
>   iommu_bind_pasid_table(domain, device, ptc);
> }

Seems sensible

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-06-06 Thread Jacob Pan
On Wed, 6 Jun 2018 12:20:51 +0100
Jean-Philippe Brucker  wrote:

> On 05/06/18 18:32, Jacob Pan wrote:
> >> "bytes" could be passed by VFIO as argument to bind_pasid_table,
> >> since it can deduce it from argsz
> >>  
> > Are you suggesting we wrap this struct in a vfio struct with argsz?
> > or we directly use this struct?
> > 
> > I need to clarify how vfio will use this.  
> 
> Right, I think we've diverged a bit since the last discussion :)
> 
> > - User program:
> > struct pasid_table_config ptc = { .bytes = sizeof(ptc) };
> > ptc.version = 1;
> > ioctl(device, VFIO_DEVICE_BIND_PASID_TABLE, );  
> 
> Any reason to do the ioctl on device instead of container? As we're
> binding address spaces we probably want a consistent view for the
> whole container, like the MAP/UNMAP ioctls do.
> 
I was thinking the pasid table storage is per device, it would be
more secure if the pasid table is contained within the device. We
should have one device per container in most cases.
in case of two or more devices in the same container shares the same
pasid table, isolation may not be good in that the second device can
dma with pasids it does not own but in the shared pasid table.

> As I remember it the userspace interface would use a VFIO header and
> the BIND ioctl. I can't find the email in my archive though, so I
> might be imagining it. This is what I remember, on the user side:
> 
> struct {
>   struct vfio_iommu_type1_bindhdr;
>   struct pasid_table_config   cfg;
> } bind = {
>   .hdr.argsz  = sizeof(bind),
>   .hdr.flags  = VFIO_IOMMU_BIND_PASID_TABLE,
>   /* cfg data here */
> };
> 
> ioctl(container, VFIO_DEVICE_BIND, );
> 
or maybe just use your VFIO_IOMMU_BIND command and vfio_iommu_type1_bind
with a new flag and PTC as the data. there can be future extensions,
bind pasid table can be too narrow. And i agree below using argsz and
flags are more flexible.

i.e.
/* takes pasid_table_config as data for flag VFIO_IOMMU_BIND_PASIDTBL */
struct vfio_iommu_type1_bind {
__u32   argsz;
__u32   flags;
#define VFIO_IOMMU_BIND_PROCESS (1 << 0)
#define VFIO_IOMMU_BIND_PASIDTBL(1 << 1)
__u8data[];
};

pseudo code in kernel:
switch (bind.flags) {
case VFIO_IOMMU_BIND_PROCESS:
return vfio_iommu_type1_bind_process(iommu, (void *)arg,
 );
case VFIO_IOMMU_BIND_PASIDTBL:
return vfio_iommu_type1_bind_pasid_tbl(iommu, );
}

vfio_iommu_type1_bind_pasid_tbl(iommu, bind)
{
/* loop through domain list, group, device */
struct pasid_table_cfg *ptc = bind->data;
iommu_bind_pasid_table(domain, device, ptc);
}


> 
> But I don't feel strongly about the interface. However I'd suggest to
> keep incremental versioning like the rest of VFIO, with argsz and
> flags, instead of version numbers, because it's more flexible.
> 
> Initially the PTC struct would look like:
> struct pasid_table_config {
>   u32 argsz;  /* sizeof(pasid_table_config) */
>   u32 flags;  /* Should be zero */
>   u64 base_ptr;
>   u8 model;
>   u8 pasid_bits;
> };
> 
> (Even though it doesn't use a version field let's call this version 1
> for the sake of the example)
> 
> --
> If someone wants to add a new field to the structure, then they also
> add a flag (let's call this version 2):
> 
> struct pasid_table_config {
>   u32 argsz;
> #define PASID_TABLE_CONFIG_EXTN (1 << 0)
>   u32 flags;
>   u64 base_ptr;
>   u8 model;
>   u8 pasid_bits;
>   u64 some_extension;
> };
> 
> * Assume user has a version 2 header and kernel has a version 1
> header.
>   * If user doesn't want the extension, it doesn't set the EXTN flag.
> The ioctl succeeds because the kernel checks that argsz >=
> offsetofend(pasid_bits) and that (flags == 0).
>   * If user wants to use the extension, it sets the EXTN flag. The
> ioctl fails because the kernel doesn't recognize the flag.
> * Assume user has version 1 and kernel has version 2.
>   * User doesn't use the extension. The kernel still checks that
> argsz >= offsetofend(pasid_bits), but also that (flags &
> ~PASID_TABLE_CONFIG_EXTN), which succeeds.
>   * User wants the extension, sets PASID_TABLE_CONFIG_EXTN. When
> seeing the flag, the kernel additionally checks that argsz >=
> offsetofend(some_extension), which succeeds.
> 
> --
> Adding model-specific fields is a bit more complicated, because I
> think they should always stay at the end of the struct. One solution
> is to add padding for common extensions:
> 
> struct pasid_table_config {
>   u32 argsz;
>   u32 flags;
>   u64 base_ptr;
>   u8 model;
>   u8 pasid_bits;
>   u8 padding[64];
> 
>   union {
> struct {
>   u8 s1dss;
>   u8 s1fmt;
> } model_arm;
> struct {
>   u64 foo;
> } model_bar;
>   };
> };
> 
> (we might call this version 3, but can be added before or after
> version 2, it doesn't matter)
> 

Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-06-06 Thread Jean-Philippe Brucker
On 05/06/18 18:32, Jacob Pan wrote:
>> "bytes" could be passed by VFIO as argument to bind_pasid_table, since
>> it can deduce it from argsz
>>
> Are you suggesting we wrap this struct in a vfio struct with argsz? or
> we directly use this struct?
> 
> I need to clarify how vfio will use this.

Right, I think we've diverged a bit since the last discussion :)

> - User program:
> struct pasid_table_config ptc = { .bytes = sizeof(ptc) };
> ptc.version = 1;
> ioctl(device, VFIO_DEVICE_BIND_PASID_TABLE, );

Any reason to do the ioctl on device instead of container? As we're
binding address spaces we probably want a consistent view for the whole
container, like the MAP/UNMAP ioctls do.

As I remember it the userspace interface would use a VFIO header and the
BIND ioctl. I can't find the email in my archive though, so I might be
imagining it. This is what I remember, on the user side:

struct {
struct vfio_iommu_type1_bindhdr;
struct pasid_table_config   cfg;
} bind = {
.hdr.argsz  = sizeof(bind),
.hdr.flags  = VFIO_IOMMU_BIND_PASID_TABLE,
/* cfg data here */
};

ioctl(container, VFIO_DEVICE_BIND, );


But I don't feel strongly about the interface. However I'd suggest to
keep incremental versioning like the rest of VFIO, with argsz and flags,
instead of version numbers, because it's more flexible.

Initially the PTC struct would look like:
struct pasid_table_config {
  u32 argsz;/* sizeof(pasid_table_config) */
  u32 flags;/* Should be zero */
  u64 base_ptr;
  u8 model;
  u8 pasid_bits;
};

(Even though it doesn't use a version field let's call this version 1
for the sake of the example)

--
If someone wants to add a new field to the structure, then they also add
a flag (let's call this version 2):

struct pasid_table_config {
  u32 argsz;
#define PASID_TABLE_CONFIG_EXTN (1 << 0)
  u32 flags;
  u64 base_ptr;
  u8 model;
  u8 pasid_bits;
  u64 some_extension;
};

* Assume user has a version 2 header and kernel has a version 1 header.
  * If user doesn't want the extension, it doesn't set the EXTN flag.
The ioctl succeeds because the kernel checks that argsz >=
offsetofend(pasid_bits) and that (flags == 0).
  * If user wants to use the extension, it sets the EXTN flag. The ioctl
fails because the kernel doesn't recognize the flag.
* Assume user has version 1 and kernel has version 2.
  * User doesn't use the extension. The kernel still checks that
argsz >= offsetofend(pasid_bits), but also that (flags &
~PASID_TABLE_CONFIG_EXTN), which succeeds.
  * User wants the extension, sets PASID_TABLE_CONFIG_EXTN. When
seeing the flag, the kernel additionally checks that argsz >=
offsetofend(some_extension), which succeeds.

--
Adding model-specific fields is a bit more complicated, because I think
they should always stay at the end of the struct. One solution is to add
padding for common extensions:

struct pasid_table_config {
  u32 argsz;
  u32 flags;
  u64 base_ptr;
  u8 model;
  u8 pasid_bits;
  u8 padding[64];

  union {
struct {
  u8 s1dss;
  u8 s1fmt;
} model_arm;
struct {
  u64 foo;
} model_bar;
  };
};

(we might call this version 3, but can be added before or after version
2, it doesn't matter)

A subsequent extension can still add the "some_extension" field and a
flag. If the kernel sees model "ARM", then it checks argsz >=
offsetofend(model_arm). If it sees model "BAR" then it checks argsz >=
offsetofend(model_bar). A model could also have flags to make the
model-specific structure extensible.

The problem is when we run out of space in the padding area, but we
might not need much extensibility in the common part.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-06-05 Thread Jacob Pan
On Thu, 31 May 2018 10:09:46 +0100
Jean-Philippe Brucker  wrote:

> On 30/05/18 20:52, Jacob Pan wrote:
> >> However I think the model number should be added to
> >> pasid_table_config. For one thing it gives us a simple
> >> sanity-check, but it also tells which other fields are valid in
> >> pasid_table_config. Arm-smmu-v3 needs at least two additional
> >> 8-bit fields describing the PASID table format (number of levels
> >> and PASID0 behaviour), which are written to device context tables
> >> when installing the PASID table pointer.
> >>  
> > We had model number field in v2 of this patchset. My thought was
> > that since the config info is meant to be generic, we shouldn't
> > include model info. But I also think a simple sanity check can be
> > useful, would that be sufficient to address Alex's concern? Of
> > course we still need sysfs for more specific IOMMU features.
> > 
> > Would this work?
> > enum pasid_table_model {
> > PASID_TABLE_FORMAT_HOST,
> > PASID_TABLE_FORMAT_ARM_1LVL,
> > PASID_TABLE_FORMAT_ARM_2LVL,  
> 
> I'd rather use a single PASID_TABLE_FORMAT_ARM, because "2LVL" may be
> further split into 2LVL_4k or 2LVL_64k leaf tables... I think it's
> best if I add an arch-specific field in pasid_table_config for that,
> and for the PASID0 configuration, when adding FORMAT_ARM in a future
> patch
> 
sounds good. will only use PASID_TABLE_FORMAT_ARM.
> > PASID_TABLE_FORMAT_AMD,
> > PASID_TABLE_FORMAT_INTEL,
> > };
> > 
> > /**
> >  * PASID table data used to bind guest PASID table to the host
> > IOMMU. This will
> >  * enable guest managed first level page tables.
> >  * @version: for future extensions and identification of the data
> > format
> >  * @bytes: size of this structure
> >  * @model: PASID table format for different IOMMU models
> >  * @base_ptr:   PASID table pointer
> >  * @pasid_bits: number of bits supported in the guest PASID
> > table, must be less
> >  *  or equal than the host supported PASID size.
> >  */
> > struct pasid_table_config {
> > __u32 version;
> > #define PASID_TABLE_CFG_VERSION_1 1
> > __u32 bytes;  
> 
> "bytes" could be passed by VFIO as argument to bind_pasid_table, since
> it can deduce it from argsz
> 
Are you suggesting we wrap this struct in a vfio struct with argsz? or
we directly use this struct?

I need to clarify how vfio will use this.
- User program:
struct pasid_table_config ptc = { .bytes = sizeof(ptc) };
ptc.version = 1;
ioctl(device, VFIO_DEVICE_BIND_PASID_TABLE, );

- Kernel:
minsz = offsetofend(struct pasid_table_config, pasid_bits);
if (ptc.bytes < minsz)
return -EINVAL;
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-05-31 Thread Jean-Philippe Brucker
On 30/05/18 20:52, Jacob Pan wrote:
>> However I think the model number should be added to
>> pasid_table_config. For one thing it gives us a simple sanity-check,
>> but it also tells which other fields are valid in pasid_table_config.
>> Arm-smmu-v3 needs at least two additional 8-bit fields describing the
>> PASID table format (number of levels and PASID0 behaviour), which are
>> written to device context tables when installing the PASID table
>> pointer.
>>
> We had model number field in v2 of this patchset. My thought was that
> since the config info is meant to be generic, we shouldn't include
> model info. But I also think a simple sanity check can be useful,
> would that be sufficient to address Alex's concern? Of course we still
> need sysfs for more specific IOMMU features.
> 
> Would this work?
> enum pasid_table_model {
>   PASID_TABLE_FORMAT_HOST,
>   PASID_TABLE_FORMAT_ARM_1LVL,
>   PASID_TABLE_FORMAT_ARM_2LVL,

I'd rather use a single PASID_TABLE_FORMAT_ARM, because "2LVL" may be
further split into 2LVL_4k or 2LVL_64k leaf tables... I think it's best
if I add an arch-specific field in pasid_table_config for that, and for
the PASID0 configuration, when adding FORMAT_ARM in a future patch

>   PASID_TABLE_FORMAT_AMD,
>   PASID_TABLE_FORMAT_INTEL,
> };
> 
> /**
>  * PASID table data used to bind guest PASID table to the host IOMMU. This 
> will
>  * enable guest managed first level page tables.
>  * @version: for future extensions and identification of the data format
>  * @bytes: size of this structure
>  * @model: PASID table format for different IOMMU models
>  * @base_ptr: PASID table pointer
>  * @pasid_bits:   number of bits supported in the guest PASID table, must 
> be less
>  *or equal than the host supported PASID size.
>  */
> struct pasid_table_config {
>   __u32 version;
> #define PASID_TABLE_CFG_VERSION_1 1
>   __u32 bytes;

"bytes" could be passed by VFIO as argument to bind_pasid_table, since
it can deduce it from argsz

Thanks,
Jean

>   enum pasid_table_model model;
>   __u64 base_ptr;
>   __u8 pasid_bits;
> };
> 
> 
> 
>> Compatibility: new optional features are easy to add to a given model,
>> just add a new sysfs file. If in the future, the host describes a new
>> feature that is mandatory, or implements a different PASID table
>> format, how does it ensure that user understands it? Perhaps use a
>> new model number for this, e.g. "arm-smmu-v3-a=3", with similar
>> features. I think it would be the same if the host stops supporting a
>> feature for a given model, because they are ABI. But we can also
>> define default values from the start, for example "if ssid_bits file
>> isn't present, default value is 0 - PASID not supported"
>>
>> Thanks,
>> Jean
> 
> [Jacob Pan]
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-05-30 Thread Jacob Pan
On Wed, 30 May 2018 12:53:53 +0100
Jean-Philippe Brucker  wrote:

> On 30/05/18 04:45, Tian, Kevin wrote:
> >> On SMMUv3 the minimum alignment for base_ptr is 64 bytes, so
> >> a  
>  guest  
> >> under a vSMMU might pass a pointer that's not aligned on 4k.
> >>  
> > PASID table pointer for VT-d is 4K aligned.  
> >> Maybe this information could be part of the data passed to  
> >> userspace  
> >> about IOMMU table formats and features? They're not part of
> >> this series, but I think we wanted to communicate
> >> IOMMU-specific  
> >> features  
> >> via sysfs.
> >>  
> > Agreed, I believe Yi Liu is working on a sysfs interface such
> > that QEMU can match IOMMU model and features.  
> 
>  Digging this up again since v5 still has this issue.  The IOMMU
>  API is a kernel internal abstraction of the IOMMU.  sysfs is a
>  userspace interface.  Are we suggesting that the /only/ way to
>  make use of the internal IOMMU API here is to have a user
>  provided opaque pasid table that we can't even do minimal
>  compatibility sanity testing on and we simply hope that hardware
>  covers all the fault conditions without taking the host down
>  with it?  I guess we have to assume the latter since the user
>  has full control of the table, but I have a hard time getting
>  past lack of internal ability to use the interface and no
>  ability to provide even the slimmest sanity testing.  Thanks, 
> >>>
> >>> checking size, alignment, ... is OK, which I think is already
> >>> considered by vendor IOMMU driver. However sanity testing table
> >>> format might be difficult. The initial table provided by guest is
> >>> likely just all ZEROs. whatever format violation may be caught
> >>> only when a PASID entry is updated...  
> >>
> >> There's sanity testing the actual contents of the table, which I
> >> agree would be difficult and would likely require some sort of
> >> shadowing at additional overhead, but what about even basic
> >> consistency checking? For example, is it possible that due to
> >> hardware variations a user might generate a table which works on
> >> some systems but not others? What
> >> if two table formats are sufficiently similar that the IOMMU driver
> >> puts an incompatible table in place but it continuously generates
> >> faults, how do we debug that?  As an intermediary in this whole
> >> process I'd really rather be able to identify that the user claims
> >> to be providing a TypeA table but the IOMMU only supports TypeB,
> >> so clearly this won't work.  I don't see that we have that
> >> capability.  Thanks,  
> >
> > I remember we ever discussed to define some vendor/model ID,
> > which can be retrieved by user space and then passed back when
> > doing table binding. Then above simple model matching check can
> > be done accordingly. It is actually a basic requirement when using 
> > virtio-iommu, same driver expecting to work on all vendor IOMMUs.
> > 
> > However I don't remember whether/where that logic is implemented
> > in this series (especially when there are two tracks moving in
> > parallel). I'll leave to Jacob/Jean to further comment.  
> 
> For Arm we do need some form of sanity checking. As each architecture
> version brings a new set of features that may be supported and enabled
> individually, we need to communicate fine-grained features to users.
> They describes the general capability of the physical IOMMU, and also
> which fields are available in the PASID table (entries are 512-bits
> and leave some space for future extensions).
> 
> In the past I briefly tried using a ioctl-based interface through VFIO
> only, but it seemed more complicated to extend than sysfs for this
> kind of probing.
> 
> Note that the following is from my own prototype. I'm not sure how
> much Yi Liu's implementation differs but I think this was roughly
> what we agreed on last time. In sysfs an IOMMU device is described
> with:
> 
> * A model number, for example intel-vtd=1, arm-smmu-v3=2.
> * Properties and features, describing in detail what the pIOMMU device
>   and driver support.
> 
> /sys/class/iommu///
> 
> For example an SMMUv3:
> 
> The model number is described as a property
> /sys/class/iommu/smmu.0xe060/arm-smmu-v3/model = 2
> 
> A few feature bits and values:
> .../arm-smmu-v3/asid_bits // max address space ID bits, %d
> .../arm-smmu-v3/ssid_bits // max substream ID (PASID) bits, %d
> .../arm-smmu-v3/input_bits// max input address size, %d
> .../arm-smmu-v3/output_bits   // max output address size, %d
> .../arm-smmu-v3/btm   // broadcast TLB maintenance,
> enabled/disabled .../arm-smmu-v3/httu // Hardware
> table update,
> access+dirty/access/none .../arm-smmu-v3/stall//
> transaction stalling, enabled/disabled/force
> 
> (Note that the base pointer alignment previously discussed could be
> implied by the model number, or 

Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-05-30 Thread Jean-Philippe Brucker
On 30/05/18 04:45, Tian, Kevin wrote:
>> On SMMUv3 the minimum alignment for base_ptr is 64 bytes, so a
 guest
>> under a vSMMU might pass a pointer that's not aligned on 4k.
>>
> PASID table pointer for VT-d is 4K aligned.
>> Maybe this information could be part of the data passed to
>> userspace
>> about IOMMU table formats and features? They're not part of this
>> series, but I think we wanted to communicate IOMMU-specific
>> features
>> via sysfs.
>>
> Agreed, I believe Yi Liu is working on a sysfs interface such that QEMU
> can match IOMMU model and features.

 Digging this up again since v5 still has this issue.  The IOMMU API is
 a kernel internal abstraction of the IOMMU.  sysfs is a userspace
 interface.  Are we suggesting that the /only/ way to make use of the
 internal IOMMU API here is to have a user provided opaque pasid table
 that we can't even do minimal compatibility sanity testing on and we
 simply hope that hardware covers all the fault conditions without
 taking the host down with it?  I guess we have to assume the latter
 since the user has full control of the table, but I have a hard time
 getting past lack of internal ability to use the interface and no
 ability to provide even the slimmest sanity testing.  Thanks,

>>>
>>> checking size, alignment, ... is OK, which I think is already considered
>>> by vendor IOMMU driver. However sanity testing table format might
>>> be difficult. The initial table provided by guest is likely just all ZEROs.
>>> whatever format violation may be caught only when a PASID entry
>>> is updated...
>>
>> There's sanity testing the actual contents of the table, which I agree
>> would be difficult and would likely require some sort of shadowing at
>> additional overhead, but what about even basic consistency checking?
>> For example, is it possible that due to hardware variations a user
>> might generate a table which works on some systems but not others?
>> What
>> if two table formats are sufficiently similar that the IOMMU driver
>> puts an incompatible table in place but it continuously generates
>> faults, how do we debug that?  As an intermediary in this whole process
>> I'd really rather be able to identify that the user claims to be
>> providing a TypeA table but the IOMMU only supports TypeB, so clearly
>> this won't work.  I don't see that we have that capability.  Thanks,
>
> I remember we ever discussed to define some vendor/model ID,
> which can be retrieved by user space and then passed back when
> doing table binding. Then above simple model matching check can
> be done accordingly. It is actually a basic requirement when using 
> virtio-iommu, same driver expecting to work on all vendor IOMMUs.
> 
> However I don't remember whether/where that logic is implemented
> in this series (especially when there are two tracks moving in parallel).
> I'll leave to Jacob/Jean to further comment.

For Arm we do need some form of sanity checking. As each architecture
version brings a new set of features that may be supported and enabled
individually, we need to communicate fine-grained features to users.
They describes the general capability of the physical IOMMU, and also
which fields are available in the PASID table (entries are 512-bits and
leave some space for future extensions).

In the past I briefly tried using a ioctl-based interface through VFIO
only, but it seemed more complicated to extend than sysfs for this kind
of probing.

Note that the following is from my own prototype. I'm not sure how much
Yi Liu's implementation differs but I think this was roughly what we
agreed on last time. In sysfs an IOMMU device is described with:

* A model number, for example intel-vtd=1, arm-smmu-v3=2.
* Properties and features, describing in detail what the pIOMMU device
  and driver support.

/sys/class/iommu///

For example an SMMUv3:

The model number is described as a property
/sys/class/iommu/smmu.0xe060/arm-smmu-v3/model = 2

A few feature bits and values:
.../arm-smmu-v3/asid_bits   // max address space ID bits, %d
.../arm-smmu-v3/ssid_bits   // max substream ID (PASID) bits, %d
.../arm-smmu-v3/input_bits  // max input address size, %d
.../arm-smmu-v3/output_bits // max output address size, %d
.../arm-smmu-v3/btm // broadcast TLB maintenance, enabled/disabled
.../arm-smmu-v3/httu// Hardware table update, 
access+dirty/access/none
.../arm-smmu-v3/stall   // transaction stalling, enabled/disabled/force

(Note that the base pointer alignment previously discussed could be
implied by the model number, or added explicitly here.)

Which page table formats are supported:
.../arm-smmu-v3/pgtable_format/lpae-64
.../arm-smmu-v3/pgtable_format/v7s
I'm not sure yet what values these will have, they might simply contain
arbitrary format numbers because fields available in the page tables can
be deduced from the 

RE: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-05-29 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, May 30, 2018 11:18 AM
> 
> On Wed, 30 May 2018 01:41:43 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Wednesday, May 30, 2018 4:09 AM
> > >
> > > On Fri, 20 Apr 2018 16:42:51 -0700
> > > Jacob Pan  wrote:
> > >
> > > > On Fri, 20 Apr 2018 19:25:34 +0100
> > > > Jean-Philippe Brucker  wrote:
> > > >
> > > > > On Tue, Apr 17, 2018 at 08:10:47PM +0100, Alex Williamson wrote:
> > > > > [...]
> > > > > > > + /* Assign guest PASID table pointer and size order */
> > > > > > > + ctx_lo = (pasidt_binfo->base_ptr & VTD_PAGE_MASK) |
> > > > > > > + (pasidt_binfo->pasid_bits - MIN_NR_PASID_BITS);
> > > > > >
> > > > > > Where does this IOMMU API interface define that base_ptr is 4K
> > > > > > aligned or the format of the PASID table?  Are these all
> > > > > > standardized or do they vary by host IOMMU?  If they're standards,
> > > > > > maybe we could note that and the spec which defines them when
> we
> > > > > > declare base_ptr.  If they're IOMMU specific then I don't
> > > > > > understand how we'll match a user provided PASID table to the
> > > > > > requirements and format of the host IOMMU. Thanks,
> > > > >
> > > > > On SMMUv3 the minimum alignment for base_ptr is 64 bytes, so a
> > > guest
> > > > > under a vSMMU might pass a pointer that's not aligned on 4k.
> > > > >
> > > > PASID table pointer for VT-d is 4K aligned.
> > > > > Maybe this information could be part of the data passed to
> userspace
> > > > > about IOMMU table formats and features? They're not part of this
> > > > > series, but I think we wanted to communicate IOMMU-specific
> features
> > > > > via sysfs.
> > > > >
> > > > Agreed, I believe Yi Liu is working on a sysfs interface such that QEMU
> > > > can match IOMMU model and features.
> > >
> > > Digging this up again since v5 still has this issue.  The IOMMU API is
> > > a kernel internal abstraction of the IOMMU.  sysfs is a userspace
> > > interface.  Are we suggesting that the /only/ way to make use of the
> > > internal IOMMU API here is to have a user provided opaque pasid table
> > > that we can't even do minimal compatibility sanity testing on and we
> > > simply hope that hardware covers all the fault conditions without
> > > taking the host down with it?  I guess we have to assume the latter
> > > since the user has full control of the table, but I have a hard time
> > > getting past lack of internal ability to use the interface and no
> > > ability to provide even the slimmest sanity testing.  Thanks,
> > >
> >
> > checking size, alignment, ... is OK, which I think is already considered
> > by vendor IOMMU driver. However sanity testing table format might
> > be difficult. The initial table provided by guest is likely just all ZEROs.
> > whatever format violation may be caught only when a PASID entry
> > is updated...
> 
> There's sanity testing the actual contents of the table, which I agree
> would be difficult and would likely require some sort of shadowing at
> additional overhead, but what about even basic consistency checking?
> For example, is it possible that due to hardware variations a user
> might generate a table which works on some systems but not others?
> What
> if two table formats are sufficiently similar that the IOMMU driver
> puts an incompatible table in place but it continuously generates
> faults, how do we debug that?  As an intermediary in this whole process
> I'd really rather be able to identify that the user claims to be
> providing a TypeA table but the IOMMU only supports TypeB, so clearly
> this won't work.  I don't see that we have that capability.  Thanks,
> 

I remember we ever discussed to define some vendor/model ID,
which can be retrieved by user space and then passed back when
doing table binding. Then above simple model matching check can
be done accordingly. It is actually a basic requirement when using 
virtio-iommu, same driver expecting to work on all vendor IOMMUs.

However I don't remember whether/where that logic is implemented
in this series (especially when there are two tracks moving in parallel).
I'll leave to Jacob/Jean to further comment.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-05-29 Thread Alex Williamson
On Wed, 30 May 2018 01:41:43 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Wednesday, May 30, 2018 4:09 AM
> > 
> > On Fri, 20 Apr 2018 16:42:51 -0700
> > Jacob Pan  wrote:
> >   
> > > On Fri, 20 Apr 2018 19:25:34 +0100
> > > Jean-Philippe Brucker  wrote:
> > >  
> > > > On Tue, Apr 17, 2018 at 08:10:47PM +0100, Alex Williamson wrote:
> > > > [...]  
> > > > > > +   /* Assign guest PASID table pointer and size order */
> > > > > > +   ctx_lo = (pasidt_binfo->base_ptr & VTD_PAGE_MASK) |
> > > > > > +   (pasidt_binfo->pasid_bits - MIN_NR_PASID_BITS);  
> > > > >
> > > > > Where does this IOMMU API interface define that base_ptr is 4K
> > > > > aligned or the format of the PASID table?  Are these all
> > > > > standardized or do they vary by host IOMMU?  If they're standards,
> > > > > maybe we could note that and the spec which defines them when we
> > > > > declare base_ptr.  If they're IOMMU specific then I don't
> > > > > understand how we'll match a user provided PASID table to the
> > > > > requirements and format of the host IOMMU. Thanks,  
> > > >
> > > > On SMMUv3 the minimum alignment for base_ptr is 64 bytes, so a  
> > guest  
> > > > under a vSMMU might pass a pointer that's not aligned on 4k.
> > > >  
> > > PASID table pointer for VT-d is 4K aligned.  
> > > > Maybe this information could be part of the data passed to userspace
> > > > about IOMMU table formats and features? They're not part of this
> > > > series, but I think we wanted to communicate IOMMU-specific features
> > > > via sysfs.
> > > >  
> > > Agreed, I believe Yi Liu is working on a sysfs interface such that QEMU
> > > can match IOMMU model and features.  
> > 
> > Digging this up again since v5 still has this issue.  The IOMMU API is
> > a kernel internal abstraction of the IOMMU.  sysfs is a userspace
> > interface.  Are we suggesting that the /only/ way to make use of the
> > internal IOMMU API here is to have a user provided opaque pasid table
> > that we can't even do minimal compatibility sanity testing on and we
> > simply hope that hardware covers all the fault conditions without
> > taking the host down with it?  I guess we have to assume the latter
> > since the user has full control of the table, but I have a hard time
> > getting past lack of internal ability to use the interface and no
> > ability to provide even the slimmest sanity testing.  Thanks,
> >   
> 
> checking size, alignment, ... is OK, which I think is already considered
> by vendor IOMMU driver. However sanity testing table format might 
> be difficult. The initial table provided by guest is likely just all ZEROs. 
> whatever format violation may be caught only when a PASID entry 
> is updated...

There's sanity testing the actual contents of the table, which I agree
would be difficult and would likely require some sort of shadowing at
additional overhead, but what about even basic consistency checking?
For example, is it possible that due to hardware variations a user
might generate a table which works on some systems but not others?  What
if two table formats are sufficiently similar that the IOMMU driver
puts an incompatible table in place but it continuously generates
faults, how do we debug that?  As an intermediary in this whole process
I'd really rather be able to identify that the user claims to be
providing a TypeA table but the IOMMU only supports TypeB, so clearly
this won't work.  I don't see that we have that capability.  Thanks,

Alex
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-05-29 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, May 30, 2018 4:09 AM
> 
> On Fri, 20 Apr 2018 16:42:51 -0700
> Jacob Pan  wrote:
> 
> > On Fri, 20 Apr 2018 19:25:34 +0100
> > Jean-Philippe Brucker  wrote:
> >
> > > On Tue, Apr 17, 2018 at 08:10:47PM +0100, Alex Williamson wrote:
> > > [...]
> > > > > + /* Assign guest PASID table pointer and size order */
> > > > > + ctx_lo = (pasidt_binfo->base_ptr & VTD_PAGE_MASK) |
> > > > > + (pasidt_binfo->pasid_bits - MIN_NR_PASID_BITS);
> > > >
> > > > Where does this IOMMU API interface define that base_ptr is 4K
> > > > aligned or the format of the PASID table?  Are these all
> > > > standardized or do they vary by host IOMMU?  If they're standards,
> > > > maybe we could note that and the spec which defines them when we
> > > > declare base_ptr.  If they're IOMMU specific then I don't
> > > > understand how we'll match a user provided PASID table to the
> > > > requirements and format of the host IOMMU. Thanks,
> > >
> > > On SMMUv3 the minimum alignment for base_ptr is 64 bytes, so a
> guest
> > > under a vSMMU might pass a pointer that's not aligned on 4k.
> > >
> > PASID table pointer for VT-d is 4K aligned.
> > > Maybe this information could be part of the data passed to userspace
> > > about IOMMU table formats and features? They're not part of this
> > > series, but I think we wanted to communicate IOMMU-specific features
> > > via sysfs.
> > >
> > Agreed, I believe Yi Liu is working on a sysfs interface such that QEMU
> > can match IOMMU model and features.
> 
> Digging this up again since v5 still has this issue.  The IOMMU API is
> a kernel internal abstraction of the IOMMU.  sysfs is a userspace
> interface.  Are we suggesting that the /only/ way to make use of the
> internal IOMMU API here is to have a user provided opaque pasid table
> that we can't even do minimal compatibility sanity testing on and we
> simply hope that hardware covers all the fault conditions without
> taking the host down with it?  I guess we have to assume the latter
> since the user has full control of the table, but I have a hard time
> getting past lack of internal ability to use the interface and no
> ability to provide even the slimmest sanity testing.  Thanks,
> 

checking size, alignment, ... is OK, which I think is already considered
by vendor IOMMU driver. However sanity testing table format might 
be difficult. The initial table provided by guest is likely just all ZEROs. 
whatever format violation may be caught only when a PASID entry 
is updated...

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-05-29 Thread Alex Williamson
On Fri, 20 Apr 2018 16:42:51 -0700
Jacob Pan  wrote:

> On Fri, 20 Apr 2018 19:25:34 +0100
> Jean-Philippe Brucker  wrote:
> 
> > On Tue, Apr 17, 2018 at 08:10:47PM +0100, Alex Williamson wrote:
> > [...]  
> > > > +   /* Assign guest PASID table pointer and size order */
> > > > +   ctx_lo = (pasidt_binfo->base_ptr & VTD_PAGE_MASK) |
> > > > +   (pasidt_binfo->pasid_bits - MIN_NR_PASID_BITS);
> > > 
> > > Where does this IOMMU API interface define that base_ptr is 4K
> > > aligned or the format of the PASID table?  Are these all
> > > standardized or do they vary by host IOMMU?  If they're standards,
> > > maybe we could note that and the spec which defines them when we
> > > declare base_ptr.  If they're IOMMU specific then I don't
> > > understand how we'll match a user provided PASID table to the
> > > requirements and format of the host IOMMU. Thanks,
> > 
> > On SMMUv3 the minimum alignment for base_ptr is 64 bytes, so a guest
> > under a vSMMU might pass a pointer that's not aligned on 4k.
> >   
> PASID table pointer for VT-d is 4K aligned. 
> > Maybe this information could be part of the data passed to userspace
> > about IOMMU table formats and features? They're not part of this
> > series, but I think we wanted to communicate IOMMU-specific features
> > via sysfs.
> >   
> Agreed, I believe Yi Liu is working on a sysfs interface such that QEMU
> can match IOMMU model and features.

Digging this up again since v5 still has this issue.  The IOMMU API is
a kernel internal abstraction of the IOMMU.  sysfs is a userspace
interface.  Are we suggesting that the /only/ way to make use of the
internal IOMMU API here is to have a user provided opaque pasid table
that we can't even do minimal compatibility sanity testing on and we
simply hope that hardware covers all the fault conditions without
taking the host down with it?  I guess we have to assume the latter
since the user has full control of the table, but I have a hard time
getting past lack of internal ability to use the interface and no
ability to provide even the slimmest sanity testing.  Thanks,

Alex
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-04-20 Thread Jacob Pan
On Fri, 20 Apr 2018 19:25:34 +0100
Jean-Philippe Brucker  wrote:

> On Tue, Apr 17, 2018 at 08:10:47PM +0100, Alex Williamson wrote:
> [...]
> > > + /* Assign guest PASID table pointer and size order */
> > > + ctx_lo = (pasidt_binfo->base_ptr & VTD_PAGE_MASK) |
> > > + (pasidt_binfo->pasid_bits - MIN_NR_PASID_BITS);  
> > 
> > Where does this IOMMU API interface define that base_ptr is 4K
> > aligned or the format of the PASID table?  Are these all
> > standardized or do they vary by host IOMMU?  If they're standards,
> > maybe we could note that and the spec which defines them when we
> > declare base_ptr.  If they're IOMMU specific then I don't
> > understand how we'll match a user provided PASID table to the
> > requirements and format of the host IOMMU. Thanks,  
> 
> On SMMUv3 the minimum alignment for base_ptr is 64 bytes, so a guest
> under a vSMMU might pass a pointer that's not aligned on 4k.
> 
PASID table pointer for VT-d is 4K aligned. 
> Maybe this information could be part of the data passed to userspace
> about IOMMU table formats and features? They're not part of this
> series, but I think we wanted to communicate IOMMU-specific features
> via sysfs.
> 
Agreed, I believe Yi Liu is working on a sysfs interface such that QEMU
can match IOMMU model and features.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-04-20 Thread Jacob Pan
On Tue, 17 Apr 2018 13:10:47 -0600
Alex Williamson  wrote:

> On Mon, 16 Apr 2018 14:48:53 -0700
> Jacob Pan  wrote:
> 
> > Add Intel VT-d ops to the generic iommu_bind_pasid_table API
> > functions.
> > 
> > The primary use case is for direct assignment of SVM capable
> > device. Originated from emulated IOMMU in the guest, the request
> > goes through many layers (e.g. VFIO). Upon calling host IOMMU
> > driver, caller passes guest PASID table pointer (GPA) and size.
> > 
> > Device context table entry is modified by Intel IOMMU specific
> > bind_pasid_table function. This will turn on nesting mode and
> > matching translation type.
> > 
> > The unbind operation restores default context mapping.
> > 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Liu, Yi L 
> > Signed-off-by: Ashok Raj 
> > ---
> >  drivers/iommu/intel-iommu.c   | 119
> > ++
> > include/linux/dma_remapping.h |   1 + 2 files changed, 120
> > insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index a0f81a4..d8058be 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -2409,6 +2409,7 @@ static struct dmar_domain
> > *dmar_insert_one_dev_info(struct intel_iommu *iommu,
> > info->ats_supported = info->pasid_supported = info->pri_supported =
> > 0; info->ats_enabled = info->pasid_enabled = info->pri_enabled = 0;
> > info->ats_qdep = 0;
> > +   info->pasid_table_bound = 0;
> > info->dev = dev;
> > info->domain = domain;
> > info->iommu = iommu;
> > @@ -5132,6 +5133,7 @@ static void
> > intel_iommu_put_resv_regions(struct device *dev, 
> >  #ifdef CONFIG_INTEL_IOMMU_SVM
> >  #define MAX_NR_PASID_BITS (20)
> > +#define MIN_NR_PASID_BITS (5)
> >  static inline unsigned long intel_iommu_get_pts(struct intel_iommu
> > *iommu) {
> > /*
> > @@ -5258,6 +5260,119 @@ struct intel_iommu
> > *intel_svm_device_to_iommu(struct device *dev) 
> > return iommu;
> >  }
> > +
> > +static int intel_iommu_bind_pasid_table(struct iommu_domain
> > *domain,
> > +   struct device *dev, struct pasid_table_config
> > *pasidt_binfo) +{  
> 
> Never validates pasidt_binfo->{version,bytes}
> 
good catch, will do.
> > +   struct intel_iommu *iommu;
> > +   struct context_entry *context;
> > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +   struct device_domain_info *info;
> > +   struct pci_dev *pdev;
> > +   u8 bus, devfn, host_table_pasid_bits;
> > +   u16 did, sid;
> > +   int ret = 0;
> > +   unsigned long flags;
> > +   u64 ctx_lo;
> > +
> > +   iommu = device_to_iommu(dev, , );
> > +   if (!iommu)
> > +   return -ENODEV;
> > +   /* VT-d spec section 9.4 says pasid table size is encoded
> > as 2^(x+5) */
> > +   host_table_pasid_bits = intel_iommu_get_pts(iommu) +
> > MIN_NR_PASID_BITS;
> > +   if (!pasidt_binfo || pasidt_binfo->pasid_bits >
> > host_table_pasid_bits ||
> > +   pasidt_binfo->pasid_bits < MIN_NR_PASID_BITS) {
> > +   pr_err("Invalid gPASID bits %d, host range %d -
> > %d\n",
> > +   pasidt_binfo->pasid_bits,
> > +   MIN_NR_PASID_BITS, host_table_pasid_bits);
> > +   return -ERANGE;
> > +   }
> > +   if (!ecap_nest(iommu->ecap)) {
> > +   dev_err(dev, "Cannot bind PASID table, no nested
> > translation\n");
> > +   ret = -EINVAL;
> > +   goto out;
> > +   }  
> 
> Gratuitous use of pr_err, some of these look user reachable, for
> instance can vfio know in advance the supported widths or can the user
> trigger that pr_err at will?
Yes, the current IOMMU sysfs for vt-d does show the content of
capability registers so user could know in advance whether the nested
mode is supported. But I think we are in need of some generic interface
to enumerate IOMMU features. Here I am trying to prepare for the worst.
Are you concerned about security if user can trigger that error at
will? Sorry I didn't get the point.

>  Some of these errno values are also
> maybe not as descriptive as they could be.  For instance if the iommu
> doesn't support nesting, that's not a calling argument error, that's
> an unsupported device error, right?
> 
your are right, that is not invalid argument. You mean use ENODEV?

> > +   pdev = to_pci_dev(dev);
> > +   sid = PCI_DEVID(bus, devfn);
> > +   info = dev->archdata.iommu;
> > +
> > +   if (!info) {
> > +   dev_err(dev, "Invalid device domain info\n");
> > +   ret = -EINVAL;
> > +   goto out;
> > +   }
> > +   if (info->pasid_table_bound) {
> > +   dev_err(dev, "Device PASID table already bound\n");
> > +   ret = -EBUSY;
> > +   goto out;
> > +   }
> > +   if (!info->pasid_enabled) {
> > +   ret = pci_enable_pasid(pdev, info->pasid_supported
> > & ~1);
> > +   if (ret) {
> > +   

Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-04-20 Thread Jean-Philippe Brucker
On Tue, Apr 17, 2018 at 08:10:47PM +0100, Alex Williamson wrote:
[...]
> > +   /* Assign guest PASID table pointer and size order */
> > +   ctx_lo = (pasidt_binfo->base_ptr & VTD_PAGE_MASK) |
> > +   (pasidt_binfo->pasid_bits - MIN_NR_PASID_BITS);
> 
> Where does this IOMMU API interface define that base_ptr is 4K
> aligned or the format of the PASID table?  Are these all standardized
> or do they vary by host IOMMU?  If they're standards, maybe we could
> note that and the spec which defines them when we declare base_ptr.  If
> they're IOMMU specific then I don't understand how we'll match a user
> provided PASID table to the requirements and format of the host IOMMU.
> Thanks,

On SMMUv3 the minimum alignment for base_ptr is 64 bytes, so a guest
under a vSMMU might pass a pointer that's not aligned on 4k.

Maybe this information could be part of the data passed to userspace
about IOMMU table formats and features? They're not part of this series,
but I think we wanted to communicate IOMMU-specific features via sysfs.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 04/22] iommu/vt-d: add bind_pasid_table function

2018-04-17 Thread Alex Williamson

On Mon, 16 Apr 2018 14:48:53 -0700
Jacob Pan  wrote:

> Add Intel VT-d ops to the generic iommu_bind_pasid_table API
> functions.
> 
> The primary use case is for direct assignment of SVM capable
> device. Originated from emulated IOMMU in the guest, the request goes
> through many layers (e.g. VFIO). Upon calling host IOMMU driver, caller
> passes guest PASID table pointer (GPA) and size.
> 
> Device context table entry is modified by Intel IOMMU specific
> bind_pasid_table function. This will turn on nesting mode and matching
> translation type.
> 
> The unbind operation restores default context mapping.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> ---
>  drivers/iommu/intel-iommu.c   | 119 
> ++
>  include/linux/dma_remapping.h |   1 +
>  2 files changed, 120 insertions(+)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index a0f81a4..d8058be 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2409,6 +2409,7 @@ static struct dmar_domain 
> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   info->ats_supported = info->pasid_supported = info->pri_supported = 0;
>   info->ats_enabled = info->pasid_enabled = info->pri_enabled = 0;
>   info->ats_qdep = 0;
> + info->pasid_table_bound = 0;
>   info->dev = dev;
>   info->domain = domain;
>   info->iommu = iommu;
> @@ -5132,6 +5133,7 @@ static void intel_iommu_put_resv_regions(struct device 
> *dev,
>  
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  #define MAX_NR_PASID_BITS (20)
> +#define MIN_NR_PASID_BITS (5)
>  static inline unsigned long intel_iommu_get_pts(struct intel_iommu *iommu)
>  {
>   /*
> @@ -5258,6 +5260,119 @@ struct intel_iommu *intel_svm_device_to_iommu(struct 
> device *dev)
>  
>   return iommu;
>  }
> +
> +static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
> + struct device *dev, struct pasid_table_config *pasidt_binfo)
> +{

Never validates pasidt_binfo->{version,bytes}

> + struct intel_iommu *iommu;
> + struct context_entry *context;
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct device_domain_info *info;
> + struct pci_dev *pdev;
> + u8 bus, devfn, host_table_pasid_bits;
> + u16 did, sid;
> + int ret = 0;
> + unsigned long flags;
> + u64 ctx_lo;
> +
> + iommu = device_to_iommu(dev, , );
> + if (!iommu)
> + return -ENODEV;
> + /* VT-d spec section 9.4 says pasid table size is encoded as 2^(x+5) */
> + host_table_pasid_bits = intel_iommu_get_pts(iommu) + MIN_NR_PASID_BITS;
> + if (!pasidt_binfo || pasidt_binfo->pasid_bits > host_table_pasid_bits ||
> + pasidt_binfo->pasid_bits < MIN_NR_PASID_BITS) {
> + pr_err("Invalid gPASID bits %d, host range %d - %d\n",
> + pasidt_binfo->pasid_bits,
> + MIN_NR_PASID_BITS, host_table_pasid_bits);
> + return -ERANGE;
> + }
> + if (!ecap_nest(iommu->ecap)) {
> + dev_err(dev, "Cannot bind PASID table, no nested 
> translation\n");
> + ret = -EINVAL;
> + goto out;
> + }

Gratuitous use of pr_err, some of these look user reachable, for
instance can vfio know in advance the supported widths or can the user
trigger that pr_err at will?  Some of these errno values are also maybe
not as descriptive as they could be.  For instance if the iommu doesn't
support nesting, that's not a calling argument error, that's an
unsupported device error, right?

> + pdev = to_pci_dev(dev);
> + sid = PCI_DEVID(bus, devfn);
> + info = dev->archdata.iommu;
> +
> + if (!info) {
> + dev_err(dev, "Invalid device domain info\n");
> + ret = -EINVAL;
> + goto out;
> + }
> + if (info->pasid_table_bound) {
> + dev_err(dev, "Device PASID table already bound\n");
> + ret = -EBUSY;
> + goto out;
> + }
> + if (!info->pasid_enabled) {
> + ret = pci_enable_pasid(pdev, info->pasid_supported & ~1);
> + if (ret) {
> + dev_err(dev, "Failed to enable PASID\n");
> + goto out;
> + }
> + }
> + spin_lock_irqsave(>lock, flags);
> + context = iommu_context_addr(iommu, bus, devfn, 0);
> + if (!context_present(context)) {
> + dev_err(dev, "Context not present\n");
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> +
> + /* Anticipate guest to use SVM and owns the first level, so we turn
> +  * nested mode on
> +  */
> + ctx_lo = context[0].lo;
> + ctx_lo |= CONTEXT_NESTE | CONTEXT_PRS | CONTEXT_PASIDE;
> + ctx_lo &= ~CONTEXT_TT_MASK;
> + ctx_lo |= CONTEXT_TT_DEV_IOTLB <<