Re: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-24 Thread Jason Gunthorpe
On Mon, May 20, 2024 at 09:24:09AM +0800, Baolu Lu wrote:
> On 5/15/24 4:37 PM, Tian, Kevin wrote:
> > > +static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
> > > +					size_t count, loff_t *ppos)
> > > +{
> > > +	size_t response_size = sizeof(struct iommu_hwpt_page_response);
> > > +	struct iommufd_fault *fault = filep->private_data;
> > > +	struct iommu_hwpt_page_response response;
> > > +	struct iommufd_device *idev = NULL;
> > > +	struct iopf_group *group;
> > > +	size_t done = 0;
> > > +	int rc;
> > > +
> > > +	if (*ppos || count % response_size)
> > > +		return -ESPIPE;
> > > +
> > > +	mutex_lock(&fault->mutex);
> > > +	while (count > done) {
> > > +		rc = copy_from_user(&response, buf + done, response_size);
> > > +		if (rc)
> > > +			break;
> > > +
> > > +		if (!idev || idev->obj.id != response.dev_id)
> > > +			idev = container_of(iommufd_get_object(fault->ictx,
> > > +							       response.dev_id,
> > > +							       IOMMUFD_OBJ_DEVICE),
> > > +					    struct iommufd_device, obj);
> > > +		if (IS_ERR(idev))
> > > +			break;
> > > +
> > > +		group = xa_erase(&idev->faults, response.cookie);
> > > +		if (!group)
> > > +			break;
> > is 'continue' better?
> 
> If we can't find a matching iopf group here, it means userspace provided
> something wrong. The current logic is that we stop here and tell
> userspace that only part of the faults have been responded to and it
> should retry the remaining responses with the right message.

The usual fd-ish error handling here should be to return a short write
(success) and then userspace will retry with the failing entry at the
start of the buffer and collect the errno.

Jason



RE: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-19 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Monday, May 20, 2024 9:34 AM
> 
> On 5/15/24 4:37 PM, Tian, Kevin wrote:
> >> +
> >> +  iopf_group_response(group, response.code);
> > PCIe spec states that a response failure disables the PRI interface. For
> > SR-IOV it'd be dangerous to allow the user to deliver such a code to a VF
> > and close the entire shared PRI interface.
> >
> > Just another example of the lack of coordination for shared capabilities
> > between PF/VF. But exposing such a gap to userspace makes it worse.
> 
> Yes. You are right.
> 
> >
> > I guess we don't want to make this work depending on that cleanup. The
> > minimal correct thing is to disallow attaching VF to a fault-capable hwpt
> > with a note here that once we turn on support for VF the response failure
> > code should not be forwarded to the hardware. Instead it's an indication
> > that the user cannot serve more requests and such situation waits for
> > a vPRI reset to recover.
> 
> Is it the same thing to disallow PRI for VF in IOMMUFD?
> 

yes


Re: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-19 Thread Baolu Lu

On 5/20/24 11:26 AM, Tian, Kevin wrote:

From: Baolu Lu 
Sent: Monday, May 20, 2024 8:41 AM

On 5/15/24 3:57 PM, Tian, Kevin wrote:

From: Baolu Lu 
Sent: Wednesday, May 8, 2024 6:05 PM

On 2024/5/8 8:11, Jason Gunthorpe wrote:

On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-

priv.h

index ae65e0b85d69..1a0450a83bd0 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -36,6 +36,10 @@ struct iommu_attach_handle {
struct device   *dev;
refcount_t  users;
};
+   /* attach data for IOMMUFD */
+   struct {
+   void*idev;
+   };

We can use a proper type here, just forward declare it.

But this sequence in the other patch:

+   ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
+   if (ret) {
+   iommufd_fault_iopf_disable(idev);
+   return ret;
+   }
+
+   handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+   handle->idev = idev;

Is why I was imagining the caller would allocate, because now we have
the issue that a fault capable domain was installed into the IOMMU
before its handle could be fully set up, so we have a race where a
fault could come in right between those things. Then what happens?
I suppose we can retry the fault and by the time it comes back the
race should resolve. A bit ugly I suppose.


You are right. It makes more sense if the attached data is allocated and
managed by the caller. I will go in this direction and update my series.
I will also consider other review comments you have given in other
places.



Does this direction imply a new iommu_attach_group_handle() helper
to pass in the caller-allocated handle pointer or exposing a new
iommu_group_set_handle() to set the handle to the group pasid_array
and then having iommu_attach_group() to update the domain info in
the handle?


I will add new iommu_attach/replace/detach_group_handle() helpers. Like
below:

+/**
+ * iommu_attach_group_handle - Attach an IOMMU domain to an IOMMU
group
+ * @domain: IOMMU domain to attach
+ * @group: IOMMU group that will be attached
+ * @handle: attach handle
+ *
+ * Returns 0 on success and error code on failure.
+ *
+ * This is a variant of iommu_attach_group(). It allows the caller to provide
+ * an attach handle and use it when the domain is attached. This is currently
+ * only designed for IOMMUFD to deliver the I/O page faults.
+ */
+int iommu_attach_group_handle(struct iommu_domain *domain,
+ struct iommu_group *group,
+ struct iommu_attach_handle *handle)



"currently only designed for IOMMUFD" doesn't sound correct.

design-wise this can be used by anyone which relies on the handle.
There is nothing tied to IOMMUFD.

s/designed for/used by/ is more accurate.


Done.

Best regards,
baolu



RE: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-19 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Monday, May 20, 2024 8:41 AM
> 
> On 5/15/24 3:57 PM, Tian, Kevin wrote:
> >> From: Baolu Lu 
> >> Sent: Wednesday, May 8, 2024 6:05 PM
> >>
> >> On 2024/5/8 8:11, Jason Gunthorpe wrote:
> >>> On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
>  diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-
> priv.h
>  index ae65e0b85d69..1a0450a83bd0 100644
>  --- a/drivers/iommu/iommu-priv.h
>  +++ b/drivers/iommu/iommu-priv.h
>  @@ -36,6 +36,10 @@ struct iommu_attach_handle {
>   struct device   *dev;
>   refcount_t  users;
>   };
>  +/* attach data for IOMMUFD */
>  +struct {
>  +void*idev;
>  +};
> >>> We can use a proper type here, just forward declare it.
> >>>
> >>> But this sequence in the other patch:
> >>>
> >>> +   ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
> >>> +   if (ret) {
> >>> +   iommufd_fault_iopf_disable(idev);
> >>> +   return ret;
> >>> +   }
> >>> +
> >>> +   handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
> >>> +   handle->idev = idev;
> >>>
> >>> Is why I was imagining the caller would allocate, because now we have
> >>> the issue that a fault capable domain was installed into the IOMMU
> >>> before its handle could be fully set up, so we have a race where a
> >>> fault could come in right between those things. Then what happens?
> >>> I suppose we can retry the fault and by the time it comes back the
> >>> race should resolve. A bit ugly I suppose.
> >>
> >> You are right. It makes more sense if the attached data is allocated and
> >> managed by the caller. I will go in this direction and update my series.
> >> I will also consider other review comments you have given in other
> >> places.
> >>
> >
> > Does this direction imply a new iommu_attach_group_handle() helper
> > to pass in the caller-allocated handle pointer or exposing a new
> > iommu_group_set_handle() to set the handle to the group pasid_array
> > and then having iommu_attach_group() to update the domain info in
> > the handle?
> 
> I will add new iommu_attach/replace/detach_group_handle() helpers. Like
> below:
> 
> +/**
> + * iommu_attach_group_handle - Attach an IOMMU domain to an IOMMU
> group
> + * @domain: IOMMU domain to attach
> + * @group: IOMMU group that will be attached
> + * @handle: attach handle
> + *
> + * Returns 0 on success and error code on failure.
> + *
> + * This is a variant of iommu_attach_group(). It allows the caller to provide
> + * an attach handle and use it when the domain is attached. This is currently
> + * only designed for IOMMUFD to deliver the I/O page faults.
> + */
> +int iommu_attach_group_handle(struct iommu_domain *domain,
> + struct iommu_group *group,
> + struct iommu_attach_handle *handle)
> 

"currently only designed for IOMMUFD" doesn't sound correct.

design-wise this can be used by anyone which relies on the handle.
There is nothing tied to IOMMUFD.

s/designed for/used by/ is more accurate.




Re: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-19 Thread Baolu Lu

On 5/15/24 4:37 PM, Tian, Kevin wrote:

+   iopf_free_group(group);
+   done += response_size;
+
+   iommufd_put_object(fault->ictx, &idev->obj);

get/put is unpaired:

if (!idev || idev->obj.id != response.dev_id)
idev = iommufd_get_object();

...

iommufd_put_object(idev);

The intention might be reusing idev if multiple fault responses are
for the same idev. But idev is always put in each iteration, so the
following messages will access the idev w/o holding the reference.


Good catch. Let me fix it by putting the response queue in the fault
object.

Best regards,
baolu



Re: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-19 Thread Baolu Lu

On 5/15/24 4:37 PM, Tian, Kevin wrote:

+
+   iopf_group_response(group, response.code);

PCIe spec states that a response failure disables the PRI interface. For SR-IOV
it'd be dangerous to allow the user to deliver such a code to a VF and close
the entire shared PRI interface.

Just another example of the lack of coordination for shared capabilities
between PF/VF. But exposing such a gap to userspace makes it worse.


Yes. You are right.



I guess we don't want to make this work depending on that cleanup. The
minimal correct thing is to disallow attaching VF to a fault-capable hwpt
with a note here that once we turn on support for VF the response failure
code should not be forwarded to the hardware. Instead it's an indication
that the user cannot serve more requests and such a situation waits for
a vPRI reset to recover.


Is it the same thing to disallow PRI for VF in IOMMUFD?

Best regards,
baolu



Re: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-19 Thread Baolu Lu

On 5/15/24 4:37 PM, Tian, Kevin wrote:

+static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
+					size_t count, loff_t *ppos)
+{
+	size_t response_size = sizeof(struct iommu_hwpt_page_response);
+	struct iommufd_fault *fault = filep->private_data;
+	struct iommu_hwpt_page_response response;
+	struct iommufd_device *idev = NULL;
+	struct iopf_group *group;
+	size_t done = 0;
+	int rc;
+
+	if (*ppos || count % response_size)
+		return -ESPIPE;
+
+	mutex_lock(&fault->mutex);
+	while (count > done) {
+		rc = copy_from_user(&response, buf + done, response_size);
+		if (rc)
+			break;
+
+		if (!idev || idev->obj.id != response.dev_id)
+			idev = container_of(iommufd_get_object(fault->ictx,
+							       response.dev_id,
+							       IOMMUFD_OBJ_DEVICE),
+					    struct iommufd_device, obj);
+		if (IS_ERR(idev))
+			break;
+
+		group = xa_erase(&idev->faults, response.cookie);
+		if (!group)
+			break;

is 'continue' better?


If we can't find a matching iopf group here, it means userspace provided
something wrong. The current logic is that we stop here and tell
userspace that only part of the faults have been responded to and it
should retry the remaining responses with the right message.

Best regards,
baolu



Re: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-19 Thread Baolu Lu

On 5/15/24 4:37 PM, Tian, Kevin wrote:

@@ -395,6 +396,8 @@ struct iommufd_device {
/* always the physical device */
struct device *dev;
bool enforce_cache_coherency;
+   /* outstanding faults awaiting response indexed by fault group id */
+   struct xarray faults;

this...


+struct iommufd_fault {
+   struct iommufd_object obj;
+   struct iommufd_ctx *ictx;
+   struct file *filep;
+
+   /* The lists of outstanding faults protected by below mutex. */
+   struct mutex mutex;
+   struct list_head deliver;
+   struct list_head response;

...and here worth a discussion.

First, the response list is not used. If we continue queueing faults per
device, it should be removed.


You are right. I have removed the response list in the new version.



But I wonder whether it makes more sense to keep this response
queue per fault object. It sounds simpler to me.

Also it's unclear why we need the response message to carry the
same info as the request while only id/code/cookie are used.

+struct iommu_hwpt_page_response {
+   __u32 size;
+   __u32 flags;
+   __u32 dev_id;
+   __u32 pasid;
+   __u32 grpid;
+   __u32 code;
+   __u32 cookie;
+   __u32 reserved;
+};

If we keep the response queue in the fault object, the response message
only needs to carry size/flags/code/cookie and cookie can identify the
pending message uniquely in the response queue.


It seems fine from the code's point of view. Let's wait and see whether
there are any concerns from the uAPI's perspective.

Best regards,
baolu



Re: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-19 Thread Baolu Lu

On 5/15/24 3:57 PM, Tian, Kevin wrote:

From: Baolu Lu 
Sent: Wednesday, May 8, 2024 6:05 PM

On 2024/5/8 8:11, Jason Gunthorpe wrote:

On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index ae65e0b85d69..1a0450a83bd0 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -36,6 +36,10 @@ struct iommu_attach_handle {
struct device   *dev;
refcount_t  users;
};
+   /* attach data for IOMMUFD */
+   struct {
+   void*idev;
+   };

We can use a proper type here, just forward declare it.

But this sequence in the other patch:

+   ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
+   if (ret) {
+   iommufd_fault_iopf_disable(idev);
+   return ret;
+   }
+
+   handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+   handle->idev = idev;

Is why I was imagining the caller would allocate, because now we have
the issue that a fault capable domain was installed into the IOMMU
before its handle could be fully set up, so we have a race where a
fault could come in right between those things. Then what happens?
I suppose we can retry the fault and by the time it comes back the
race should resolve. A bit ugly I suppose.


You are right. It makes more sense if the attached data is allocated and
managed by the caller. I will go in this direction and update my series.
I will also consider other review comments you have given in other
places.



Does this direction imply a new iommu_attach_group_handle() helper
to pass in the caller-allocated handle pointer or exposing a new
iommu_group_set_handle() to set the handle to the group pasid_array
and then having iommu_attach_group() to update the domain info in
the handle?


I will add new iommu_attach/replace/detach_group_handle() helpers. Like
below:

+/**
+ * iommu_attach_group_handle - Attach an IOMMU domain to an IOMMU group
+ * @domain: IOMMU domain to attach
+ * @group: IOMMU group that will be attached
+ * @handle: attach handle
+ *
+ * Returns 0 on success and error code on failure.
+ *
+ * This is a variant of iommu_attach_group(). It allows the caller to provide
+ * an attach handle and use it when the domain is attached. This is currently
+ * only designed for IOMMUFD to deliver the I/O page faults.
+ */
+int iommu_attach_group_handle(struct iommu_domain *domain,
+ struct iommu_group *group,
+ struct iommu_attach_handle *handle)

Best regards,
baolu



RE: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-15 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Tuesday, April 30, 2024 10:57 PM
> 
> @@ -131,6 +131,9 @@ struct iopf_group {
>   struct iommu_attach_handle *attach_handle;
>   /* The device's fault data parameter. */
>   struct iommu_fault_param *fault_param;
> + /* Used by handler provider to hook the group on its own lists. */
> + struct list_head node;
> + u32 cookie;

better put together with attach_handle.

rename 'node' to 'handle_node'

> @@ -128,6 +128,7 @@ enum iommufd_object_type {
>   IOMMUFD_OBJ_HWPT_NESTED,
>   IOMMUFD_OBJ_IOAS,
>   IOMMUFD_OBJ_ACCESS,
> + IOMMUFD_OBJ_FAULT,

Agree with Jason that 'FAULT_QUEUE' sounds a clearer object name.

> @@ -395,6 +396,8 @@ struct iommufd_device {
>   /* always the physical device */
>   struct device *dev;
>   bool enforce_cache_coherency;
> + /* outstanding faults awaiting response indexed by fault group id */
> + struct xarray faults;

this...

> +struct iommufd_fault {
> + struct iommufd_object obj;
> + struct iommufd_ctx *ictx;
> + struct file *filep;
> +
> + /* The lists of outstanding faults protected by below mutex. */
> + struct mutex mutex;
> + struct list_head deliver;
> + struct list_head response;

...and here worth a discussion.

First, the response list is not used. If we continue queueing faults per
device, it should be removed.

But I wonder whether it makes more sense to keep this response
queue per fault object. It sounds simpler to me.

Also it's unclear why we need the response message to carry the
same info as the request while only id/code/cookie are used.

+struct iommu_hwpt_page_response {
+   __u32 size;
+   __u32 flags;
+   __u32 dev_id;
+   __u32 pasid;
+   __u32 grpid;
+   __u32 code;
+   __u32 cookie;
+   __u32 reserved;
+};

If we keep the response queue in the fault object, the response message
only needs to carry size/flags/code/cookie and cookie can identify the
pending message uniquely in the response queue.

> +static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
> +					size_t count, loff_t *ppos)
> +{
> +	size_t response_size = sizeof(struct iommu_hwpt_page_response);
> +	struct iommufd_fault *fault = filep->private_data;
> +	struct iommu_hwpt_page_response response;
> +	struct iommufd_device *idev = NULL;
> +	struct iopf_group *group;
> +	size_t done = 0;
> +	int rc;
> +
> +	if (*ppos || count % response_size)
> +		return -ESPIPE;
> +
> +	mutex_lock(&fault->mutex);
> +	while (count > done) {
> +		rc = copy_from_user(&response, buf + done, response_size);
> +		if (rc)
> +			break;
> +
> +		if (!idev || idev->obj.id != response.dev_id)
> +			idev = container_of(iommufd_get_object(fault->ictx,
> +							       response.dev_id,
> +							       IOMMUFD_OBJ_DEVICE),
> +					    struct iommufd_device, obj);
> +		if (IS_ERR(idev))
> +			break;
> +
> +		group = xa_erase(&idev->faults, response.cookie);
> +		if (!group)
> +			break;

is 'continue' better?

> +
> + iopf_group_response(group, response.code);

PCIe spec states that a response failure disables the PRI interface. For SR-IOV
it'd be dangerous to allow the user to deliver such a code to a VF and close
the entire shared PRI interface.

Just another example of the lack of coordination for shared capabilities
between PF/VF. But exposing such a gap to userspace makes it worse.

I guess we don't want to make this work depending on that cleanup. The
minimal correct thing is to disallow attaching VF to a fault-capable hwpt
with a note here that once we turn on support for VF the response failure
code should not be forwarded to the hardware. Instead it's an indication
that the user cannot serve more requests and such a situation waits for
a vPRI reset to recover.

> + iopf_free_group(group);
> + done += response_size;
> +
> +		iommufd_put_object(fault->ictx, &idev->obj);

get/put is unpaired:

if (!idev || idev->obj.id != response.dev_id)
idev = iommufd_get_object();

...

iommufd_put_object(idev);

The intention might be reusing idev if multiple fault responses are
for the same idev. But idev is always put in each iteration, so the
following messages will access the idev w/o holding the reference.



RE: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-15 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Wednesday, May 8, 2024 6:05 PM
> 
> On 2024/5/8 8:11, Jason Gunthorpe wrote:
> > On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
> >> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> >> index ae65e0b85d69..1a0450a83bd0 100644
> >> --- a/drivers/iommu/iommu-priv.h
> >> +++ b/drivers/iommu/iommu-priv.h
> >> @@ -36,6 +36,10 @@ struct iommu_attach_handle {
> >>struct device   *dev;
> >>refcount_t  users;
> >>};
> >> +  /* attach data for IOMMUFD */
> >> +  struct {
> >> +  void*idev;
> >> +  };
> > We can use a proper type here, just forward declare it.
> >
> > But this sequence in the other patch:
> >
> > +   ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
> > +   if (ret) {
> > +   iommufd_fault_iopf_disable(idev);
> > +   return ret;
> > +   }
> > +
> > +   handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
> > +   handle->idev = idev;
> >
> > Is why I was imagining the caller would allocate, because now we have
> > the issue that a fault capable domain was installed into the IOMMU
> > before its handle could be fully set up, so we have a race where a
> > fault could come in right between those things. Then what happens?
> > I suppose we can retry the fault and by the time it comes back the
> > race should resolve. A bit ugly I suppose.
> 
> You are right. It makes more sense if the attached data is allocated and
> managed by the caller. I will go in this direction and update my series.
> I will also consider other review comments you have given in other
> places.
> 

Does this direction imply a new iommu_attach_group_handle() helper
to pass in the caller-allocated handle pointer or exposing a new
iommu_group_set_handle() to set the handle to the group pasid_array 
and then having iommu_attach_group() to update the domain info in
the handle?


Re: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-10 Thread Baolu Lu

On 2024/5/8 8:22, Jason Gunthorpe wrote:

On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:

+static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
+  size_t count, loff_t *ppos)
+{
+   size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
+   struct iommufd_fault *fault = filep->private_data;
+   struct iommu_hwpt_pgfault data;
+   struct iommufd_device *idev;
+   struct iopf_group *group;
+   struct iopf_fault *iopf;
+   size_t done = 0;
+   int rc;
+
+   if (*ppos || count % fault_size)
+   return -ESPIPE;
+
+	mutex_lock(&fault->mutex);
+	while (!list_empty(&fault->deliver) && count > done) {
+		group = list_first_entry(&fault->deliver,
+					 struct iopf_group, node);
+
+		if (list_count_nodes(&group->faults) * fault_size > count - done)
+			break;


Can this list_count be precomputed when we build the fault group?


Yes. Done.




+
+   idev = group->attach_handle->idev;
+   if (!idev)
+   break;


This check should be done before adding the fault to the linked
list. See my other note about the race.


Done.




+
+	rc = xa_alloc(&idev->faults, &group->cookie, group,
+		      xa_limit_32b, GFP_KERNEL);
+   if (rc)
+   break;


This error handling is not quite right; if done == 0 then this should
return rc.



+
+	list_for_each_entry(iopf, &group->faults, list) {
+		iommufd_compose_fault_message(&iopf->fault,
+					      &data, idev,
+					      group->cookie);
+		rc = copy_to_user(buf + done, &data, fault_size);
+		if (rc) {
+			xa_erase(&idev->faults, group->cookie);
+			break;


Same here

(same comment on the write side too)


All fixed. Thank you!

Best regards,
baolu




Re: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-08 Thread Baolu Lu

On 2024/5/8 8:11, Jason Gunthorpe wrote:

On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index ae65e0b85d69..1a0450a83bd0 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -36,6 +36,10 @@ struct iommu_attach_handle {
struct device   *dev;
refcount_t  users;
};
+   /* attach data for IOMMUFD */
+   struct {
+   void*idev;
+   };

We can use a proper type here, just forward declare it.

But this sequence in the other patch:

+   ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
+   if (ret) {
+   iommufd_fault_iopf_disable(idev);
+   return ret;
+   }
+
+   handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+   handle->idev = idev;

Is why I was imagining the caller would allocate, because now we have
the issue that a fault capable domain was installed into the IOMMU
before its handle could be fully set up, so we have a race where a
fault could come in right between those things. Then what happens?
I suppose we can retry the fault and by the time it comes back the
race should resolve. A bit ugly I suppose.


You are right. It makes more sense if the attached data is allocated and
managed by the caller. I will go in this direction and update my series.
I will also consider other review comments you have given in other
places.

Best regards,
baolu



Re: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-07 Thread Jason Gunthorpe
On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
> +static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
> +size_t count, loff_t *ppos)
> +{
> + size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
> + struct iommufd_fault *fault = filep->private_data;
> + struct iommu_hwpt_pgfault data;
> + struct iommufd_device *idev;
> + struct iopf_group *group;
> + struct iopf_fault *iopf;
> + size_t done = 0;
> + int rc;
> +
> + if (*ppos || count % fault_size)
> + return -ESPIPE;
> +
> +	mutex_lock(&fault->mutex);
> +	while (!list_empty(&fault->deliver) && count > done) {
> +		group = list_first_entry(&fault->deliver,
> +					 struct iopf_group, node);
> +
> +		if (list_count_nodes(&group->faults) * fault_size > count - done)
> +			break;

Can this list_count be precomputed when we build the fault group?

> +
> + idev = group->attach_handle->idev;
> + if (!idev)
> + break;

This check should be done before adding the fault to the linked
list. See my other note about the race.

> +
> +	rc = xa_alloc(&idev->faults, &group->cookie, group,
> +		      xa_limit_32b, GFP_KERNEL);
> + if (rc)
> + break;

This error handling is not quite right; if done == 0 then this should
return rc.


> +
> +	list_for_each_entry(iopf, &group->faults, list) {
> +		iommufd_compose_fault_message(&iopf->fault,
> +					      &data, idev,
> +					      group->cookie);
> +		rc = copy_to_user(buf + done, &data, fault_size);
> +		if (rc) {
> +			xa_erase(&idev->faults, group->cookie);
> +			break;

Same here

(same comment on the write side too)

Jason



Re: [PATCH v5 5/9] iommufd: Add iommufd fault object

2024-05-07 Thread Jason Gunthorpe
On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> index ae65e0b85d69..1a0450a83bd0 100644
> --- a/drivers/iommu/iommu-priv.h
> +++ b/drivers/iommu/iommu-priv.h
> @@ -36,6 +36,10 @@ struct iommu_attach_handle {
>   struct device   *dev;
>   refcount_t  users;
>   };
> + /* attach data for IOMMUFD */
> + struct {
> + void*idev;
> + };

We can use a proper type here, just forward declare it.

But this sequence in the other patch:

+   ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
+   if (ret) {
+   iommufd_fault_iopf_disable(idev);
+   return ret;
+   }
+
+   handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+   handle->idev = idev;

Is why I was imagining the caller would allocate, because now we have
the issue that a fault capable domain was installed into the IOMMU
before its handle could be fully set up, so we have a race where a
fault could come in right between those things. Then what happens?
I suppose we can retry the fault and by the time it comes back the
race should resolve. A bit ugly I suppose.

> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 83b45dce94a4..1819b28e9e6b 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -50,6 +50,7 @@ enum {
>   IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING,
>   IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP,
>   IOMMUFD_CMD_HWPT_INVALIDATE,
> + IOMMUFD_CMD_FAULT_ALLOC,
>  };

I think I'd call this a CMD_FAULT_QUEUE_ALLOC - does that make sense?

Jason