Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-01 Thread Jason Wang


在 2021/7/1 下午6:26, Yongji Xie 写道:

On Thu, Jul 1, 2021 at 3:55 PM Jason Wang  wrote:


在 2021/7/1 下午2:50, Yongji Xie 写道:

On Wed, Jun 30, 2021 at 5:51 PM Stefan Hajnoczi  wrote:

On Tue, Jun 29, 2021 at 10:59:51AM +0800, Yongji Xie wrote:

On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  wrote:

On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:

+/* ioctls */
+
+struct vduse_dev_config {
+ char name[VDUSE_NAME_MAX]; /* vduse device name */
+ __u32 vendor_id; /* virtio vendor id */
+ __u32 device_id; /* virtio device id */
+ __u64 features; /* device features */
+ __u64 bounce_size; /* bounce buffer size for iommu */
+ __u16 vq_size_max; /* the max size of virtqueue */

The VIRTIO specification allows per-virtqueue sizes. A device can have
two virtqueues, where the first one allows up to 1024 descriptors and
the second one allows only 128 descriptors, for example.


Good point! But it looks like virtio-vdpa/virtio-pci doesn't support
that now. All virtqueues have the same maximum size.

I see struct vpda_config_ops only supports a per-device max vq size:
u16 (*get_vq_num_max)(struct vdpa_device *vdev);

virtio-pci supports per-virtqueue sizes because the struct
virtio_pci_common_cfg->queue_size register is per-queue (controlled by
queue_select).


Oh, yes. I miss queue_select.


I guess this is a question for Jason: will vdpa will keep this limitation?
If yes, then VDUSE can stick to it too without running into problems in
the future.


I think it's better to extend the get_vq_num_max() per virtqueue.

Currently, vDPA assumes the parent to have a global max size. This seems
to work on most of the parents but not vp-vDPA (which could be backed by
QEMU, in that case cvq's size is smaller).

Fortunately, we haven't enabled had cvq support in the userspace now.

I can post the fixes.


OK. If so, it looks like we need to support the per-vq configuration.
I wonder if it's better to use something like: VDUSE_CREATE_DEVICE ->
VDUSE_SETUP_VQ -> VDUSE_SETUP_VQ -> ... -> VDUSE_ENABLE_DEVICE to do
initialization rather than only use VDUSE_CREATE_DEVICE.



This should be fine.

Thanks




Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v15 12/12] of: Add plumbing for restricted DMA pool

2021-07-01 Thread Guenter Roeck
Hi,

On Thu, Jun 24, 2021 at 11:55:26PM +0800, Claire Chang wrote:
> If a device is not behind an IOMMU, we look up the device node and set
> up the restricted DMA when the restricted-dma-pool is presented.
> 
> Signed-off-by: Claire Chang 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

With this patch in place, all sparc and sparc64 qemu emulations
fail to boot. Symptom is that the root file system is not found.
Reverting this patch fixes the problem. Bisect log is attached.

Guenter

---
# bad: [fb0ca446157a86b75502c1636b0d81e642fe6bf1] Add linux-next specific files 
for 20210701
# good: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13
git bisect start 'HEAD' 'v5.13'
# bad: [f63c4fda987a19b1194cc45cb72fd5bf968d9d90] Merge remote-tracking branch 
'rdma/for-next'
git bisect bad f63c4fda987a19b1194cc45cb72fd5bf968d9d90
# good: [46bb5dd1d2a63e906e374e97dfd4a5e33934b1c4] Merge remote-tracking branch 
'ipsec/master'
git bisect good 46bb5dd1d2a63e906e374e97dfd4a5e33934b1c4
# good: [43ba6969cfb8185353a7a6fc79070f13b9e3d6d3] Merge remote-tracking branch 
'clk/clk-next'
git bisect good 43ba6969cfb8185353a7a6fc79070f13b9e3d6d3
# good: [1ca5eddcf8dca1d6345471c6404e7364af0d7019] Merge remote-tracking branch 
'fuse/for-next'
git bisect good 1ca5eddcf8dca1d6345471c6404e7364af0d7019
# good: [8f6d7b3248705920187263a4e7147b0752ec7dcf] Merge remote-tracking branch 
'pci/next'
git bisect good 8f6d7b3248705920187263a4e7147b0752ec7dcf
# good: [df1885a755784da3ef285f36d9230c1d090ef186] RDMA/rtrs_clt: Alloc less 
memory with write path fast memory registration
git bisect good df1885a755784da3ef285f36d9230c1d090ef186
# good: [93d31efb58c8ad4a66bbedbc2d082df458c04e45] Merge remote-tracking branch 
'cpufreq-arm/cpufreq/arm/linux-next'
git bisect good 93d31efb58c8ad4a66bbedbc2d082df458c04e45
# good: [46308965ae6fdc7c25deb2e8c048510ae51bbe66] RDMA/irdma: Check contents 
of user-space irdma_mem_reg_req object
git bisect good 46308965ae6fdc7c25deb2e8c048510ae51bbe66
# good: [6de7a1d006ea9db235492b288312838d6878385f] 
thermal/drivers/int340x/processor_thermal: Split enumeration and processing part
git bisect good 6de7a1d006ea9db235492b288312838d6878385f
# good: [081bec2577cda3d04f6559c60b6f4e2242853520] dt-bindings: of: Add 
restricted DMA pool
git bisect good 081bec2577cda3d04f6559c60b6f4e2242853520
# good: [bf95ac0bcd69979af146852f6a617a60285ebbc1] Merge remote-tracking branch 
'thermal/thermal/linux-next'
git bisect good bf95ac0bcd69979af146852f6a617a60285ebbc1
# good: [3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc] RDMA/core: Always release 
restrack object
git bisect good 3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc
# bad: [cff1f23fad6e0bd7d671acce0d15285c709f259c] Merge remote-tracking branch 
'swiotlb/linux-next'
git bisect bad cff1f23fad6e0bd7d671acce0d15285c709f259c
# bad: [b655006619b7bccd0dc1e055bd72de5d613e7b5c] of: Add plumbing for 
restricted DMA pool
git bisect bad b655006619b7bccd0dc1e055bd72de5d613e7b5c
# first bad commit: [b655006619b7bccd0dc1e055bd72de5d613e7b5c] of: Add plumbing 
for restricted DMA pool
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/5] dt-bindings: reserved-memory: Document memory region specifier

2021-07-01 Thread Thierry Reding
On Tue, Jun 08, 2021 at 06:51:40PM +0200, Thierry Reding wrote:
> On Fri, May 28, 2021 at 06:54:55PM +0200, Thierry Reding wrote:
> > On Thu, May 20, 2021 at 05:03:06PM -0500, Rob Herring wrote:
> > > On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding wrote:
> > > > From: Thierry Reding 
> > > > 
> > > > Reserved memory region phandle references can be accompanied by a
> > > > specifier that provides additional information about how that specific
> > > > reference should be treated.
> > > > 
> > > > One use-case is to mark a memory region as needing an identity mapping
> > > > in the system's IOMMU for the device that references the region. This is
> > > > needed for example when the bootloader has set up hardware (such as a
> > > > display controller) to actively access a memory region (e.g. a boot
> > > > splash screen framebuffer) during boot. The operating system can use the
> > > > identity mapping flag from the specifier to make sure an IOMMU identity
> > > > mapping is set up for the framebuffer before IOMMU translations are
> > > > enabled for the display controller.
> > > > 
> > > > Signed-off-by: Thierry Reding 
> > > > ---
> > > >  .../reserved-memory/reserved-memory.txt   | 21 +++
> > > >  include/dt-bindings/reserved-memory.h |  8 +++
> > > >  2 files changed, 29 insertions(+)
> > > >  create mode 100644 include/dt-bindings/reserved-memory.h
> > > 
> > > Sorry for being slow on this. I have 2 concerns.
> > > 
> > > First, this creates an ABI issue. A DT with cells in 'memory-region' 
> > > will not be understood by an existing OS. I'm less concerned about this 
> > > if we address that with a stable fix. (Though I'm pretty sure we've 
> > > naively added #?-cells in the past ignoring this issue.)
> > 
> > A while ago I had proposed adding memory-region*s* as an alternative
> > name for memory-region to make the naming more consistent with other
> > types of properties (think clocks, resets, gpios, ...). If we added
> > that, we could easily differentiate between the "legacy" cases where
> > no #memory-region-cells was allowed and the new cases where it was.
> > 
> > > Second, it could be the bootloader setting up the reserved region. If a 
> > > node already has 'memory-region', then adding more regions is more 
> > > complicated compared to adding new properties. And defining what each 
> > > memory-region entry is or how many in schemas is impossible.
> > 
> > It's true that updating the property gets a bit complicated, but it's
> > not exactly rocket science. We really just need to splice the array. I
> > have a working implemention for this in U-Boot.
> > 
> > For what it's worth, we could run into the same issue with any new
> > property that we add. Even if we renamed this to iommu-memory-region,
> > it's still possible that a bootloader may have to update this property
> > if it already exists (it could be hard-coded in DT, or it could have
> > been added by some earlier bootloader or firmware).
> > 
> > > Both could be addressed with a new property. Perhaps something like 
> > > 'iommu-memory-region = <>;'. I think the 'iommu' prefix is 
> > > appropriate given this is entirely because of the IOMMU being in the 
> > > mix. I might feel differently if we had other uses for cells, but I 
> > > don't really see it in this case. 
> > 
> > I'm afraid that down the road we'll end up with other cases and then we
> > might proliferate a number of *-memory-region properties with varying
> > prefixes.
> > 
> > I am aware of one other case where we might need something like this: on
> > some Tegra SoCs we have audio processors that will access memory buffers
> > using a DMA engine. These processors are booted from early firmware
> > using firmware from system memory. In order to avoid trashing the
> > firmware, we need to reserve memory. We can do this using reserved
> > memory nodes. However, the audio DMA engine also uses the SMMU, so we
> > need to make sure that the firmware memory is marked as reserved within
> > the SMMU. This is similar to the identity mapping case, but not exactly
> > the same. Instead of creating a 1:1 mapping, we just want that IOVA
> > region to be reserved (i.e. IOMMU_RESV_RESERVED instead of
> > IOMMU_RESV_DIRECT{,_RELAXABLE}).
> > 
> > That would also fall into the IOMMU domain, but we can't reuse the
> > iommu-memory-region property for that because then we don't have enough
> > information to decide which type of reservation we need.
> > 
> > We could obviously make iommu-memory-region take a specifier, but we
> > could just as well use memory-regions in that case since we have
> > something more generic anyway.
> > 
> > With the #memory-region-cells proposal, we can easily extend the cell in
> > the specifier with an additional MEMORY_REGION_IOMMU_RESERVE flag to
> > take that other use case into account. If we than also change to the new
> > memory-regions property name, we avoid the ABI issue (and we gain a bit
> > of 

Re: Re: Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE

2021-07-01 Thread Stefan Hajnoczi
On Thu, Jul 01, 2021 at 06:00:48PM +0800, Yongji Xie wrote:
> On Wed, Jun 30, 2021 at 6:06 PM Stefan Hajnoczi  wrote:
> >
> > On Tue, Jun 29, 2021 at 01:43:11PM +0800, Yongji Xie wrote:
> > > On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  
> > > wrote:
> > > > On Tue, Jun 15, 2021 at 10:13:31PM +0800, Xie Yongji wrote:
> > > > > + static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t 
> > > > > *len)
> > > > > + {
> > > > > + int fd;
> > > > > + void *addr;
> > > > > + size_t size;
> > > > > + struct vduse_iotlb_entry entry;
> > > > > +
> > > > > + entry.start = iova;
> > > > > + entry.last = iova + 1;
> > > >
> > > > Why +1?
> > > >
> > > > I expected the request to include *len so that VDUSE can create a bounce
> > > > buffer for the full iova range, if necessary.
> > > >
> > >
> > > The function is used to translate iova to va. And the *len is not
> > > specified by the caller. Instead, it's used to tell the caller the
> > > length of the contiguous iova region from the specified iova. And the
> > > ioctl VDUSE_IOTLB_GET_FD will get the file descriptor to the first
> > > overlapped iova region. So using iova + 1 should be enough here.
> >
> > Does the entry.last field have any purpose with VDUSE_IOTLB_GET_FD? I
> > wonder why userspace needs to assign a value at all if it's always +1.
> >
> 
> If we need to get some iova regions in the specified range, we need
> the entry.last field. For example, we can use [0, ULONG_MAX] to get
> the first overlapped iova region which might be [4096, 8192]. But in
> this function, we don't use VDUSE_IOTLB_GET_FD like this. We need to
> get the iova region including the specified iova.

I see, thanks for explaining!

> > > > > + return addr + iova - entry.start;
> > > > > + }
> > > > > +
> > > > > +- VDUSE_DEV_GET_FEATURES: Get the negotiated features
> > > >
> > > > Are these VIRTIO feature bits? Please explain how feature negotiation
> > > > works. There must be a way for userspace to report the device's
> > > > supported feature bits to the kernel.
> > > >
> > >
> > > Yes, these are VIRTIO feature bits. Userspace will specify the
> > > device's supported feature bits when creating a new VDUSE device with
> > > ioctl(VDUSE_CREATE_DEV).
> >
> > Can the VDUSE device influence feature bit negotiation? For example, if
> > the VDUSE virtio-blk device does not implement discard/write-zeroes, how
> > does QEMU or the guest find out about this?
> >
> 
> There is a "features" field in struct vduse_dev_config which is used
> to do feature negotiation.

This approach is more restrictive than required by the VIRTIO
specification:

  "The device SHOULD accept any valid subset of features the driver
  accepts, otherwise it MUST fail to set the FEATURES_OK device status
  bit when the driver writes it."

  
https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-130002

The spec allows a device to reject certain subsets of features. For
example, if feature B depends on feature A and can only be enabled when
feature A is also enabled.

From your description I think VDUSE would accept feature B without
feature A since the device implementation has no opportunity to fail
negotiation with custom logic.

Ideally VDUSE would send a SET_FEATURES message to userspace, allowing
the device implementation full flexibility in which subsets of features
to accept.

This is a corner case. Many or maybe even all existing VIRTIO devices
don't need this flexibility, but I want to point out this limitation in
the VDUSE interface because it may cause issues in the future.

> > > > > +- VDUSE_DEV_UPDATE_CONFIG: Update the configuration space and inject 
> > > > > a config interrupt
> > > >
> > > > Does this mean the contents of the configuration space are cached by
> > > > VDUSE?
> > >
> > > Yes, but the kernel will also store the same contents.
> > >
> > > > The downside is that the userspace code cannot generate the
> > > > contents on demand. Most devices doin't need to generate the contents
> > > > on demand, so I think this is okay but I had expected a different
> > > > interface:
> > > >
> > > > kernel->userspace VDUSE_DEV_GET_CONFIG
> > > > userspace->kernel VDUSE_DEV_INJECT_CONFIG_IRQ
> > > >
> > >
> > > The problem is how to handle the failure of VDUSE_DEV_GET_CONFIG. We
> > > will need lots of modification of virtio codes to support that. So to
> > > make it simple, we choose this way:
> > >
> > > userspace -> kernel VDUSE_DEV_SET_CONFIG
> > > userspace -> kernel VDUSE_DEV_INJECT_CONFIG_IRQ
> > >
> > > > I think you can leave it the way it is, but I wanted to mention this in
> > > > case someone thinks it's important to support generating the contents of
> > > > the configuration space on demand.
> > > >
> > >
> > > Sorry, I didn't get you here. Can't VDUSE_DEV_SET_CONFIG and
> > > VDUSE_DEV_INJECT_CONFIG_IRQ achieve that?
> >
> > If the contents of the configuration 

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-01 Thread Yongji Xie
On Thu, Jul 1, 2021 at 3:55 PM Jason Wang  wrote:
>
>
> 在 2021/7/1 下午2:50, Yongji Xie 写道:
> > On Wed, Jun 30, 2021 at 5:51 PM Stefan Hajnoczi  wrote:
> >> On Tue, Jun 29, 2021 at 10:59:51AM +0800, Yongji Xie wrote:
> >>> On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  
> >>> wrote:
>  On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:
> > +/* ioctls */
> > +
> > +struct vduse_dev_config {
> > + char name[VDUSE_NAME_MAX]; /* vduse device name */
> > + __u32 vendor_id; /* virtio vendor id */
> > + __u32 device_id; /* virtio device id */
> > + __u64 features; /* device features */
> > + __u64 bounce_size; /* bounce buffer size for iommu */
> > + __u16 vq_size_max; /* the max size of virtqueue */
>  The VIRTIO specification allows per-virtqueue sizes. A device can have
>  two virtqueues, where the first one allows up to 1024 descriptors and
>  the second one allows only 128 descriptors, for example.
> 
> >>> Good point! But it looks like virtio-vdpa/virtio-pci doesn't support
> >>> that now. All virtqueues have the same maximum size.
> >> I see struct vpda_config_ops only supports a per-device max vq size:
> >> u16 (*get_vq_num_max)(struct vdpa_device *vdev);
> >>
> >> virtio-pci supports per-virtqueue sizes because the struct
> >> virtio_pci_common_cfg->queue_size register is per-queue (controlled by
> >> queue_select).
> >>
> > Oh, yes. I miss queue_select.
> >
> >> I guess this is a question for Jason: will vdpa will keep this limitation?
> >> If yes, then VDUSE can stick to it too without running into problems in
> >> the future.
>
>
> I think it's better to extend the get_vq_num_max() per virtqueue.
>
> Currently, vDPA assumes the parent to have a global max size. This seems
> to work on most of the parents but not vp-vDPA (which could be backed by
> QEMU, in that case cvq's size is smaller).
>
> Fortunately, we haven't enabled had cvq support in the userspace now.
>
> I can post the fixes.
>

OK. If so, it looks like we need to support the per-vq configuration.
I wonder if it's better to use something like: VDUSE_CREATE_DEVICE ->
VDUSE_SETUP_VQ -> VDUSE_SETUP_VQ -> ... -> VDUSE_ENABLE_DEVICE to do
initialization rather than only use VDUSE_CREATE_DEVICE.

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Re: Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE

2021-07-01 Thread Yongji Xie
On Wed, Jun 30, 2021 at 6:06 PM Stefan Hajnoczi  wrote:
>
> On Tue, Jun 29, 2021 at 01:43:11PM +0800, Yongji Xie wrote:
> > On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  wrote:
> > > On Tue, Jun 15, 2021 at 10:13:31PM +0800, Xie Yongji wrote:
> > > > + static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t *len)
> > > > + {
> > > > + int fd;
> > > > + void *addr;
> > > > + size_t size;
> > > > + struct vduse_iotlb_entry entry;
> > > > +
> > > > + entry.start = iova;
> > > > + entry.last = iova + 1;
> > >
> > > Why +1?
> > >
> > > I expected the request to include *len so that VDUSE can create a bounce
> > > buffer for the full iova range, if necessary.
> > >
> >
> > The function is used to translate iova to va. And the *len is not
> > specified by the caller. Instead, it's used to tell the caller the
> > length of the contiguous iova region from the specified iova. And the
> > ioctl VDUSE_IOTLB_GET_FD will get the file descriptor to the first
> > overlapped iova region. So using iova + 1 should be enough here.
>
> Does the entry.last field have any purpose with VDUSE_IOTLB_GET_FD? I
> wonder why userspace needs to assign a value at all if it's always +1.
>

If we need to get some iova regions in the specified range, we need
the entry.last field. For example, we can use [0, ULONG_MAX] to get
the first overlapped iova region which might be [4096, 8192]. But in
this function, we don't use VDUSE_IOTLB_GET_FD like this. We need to
get the iova region including the specified iova.

> >
> > > > + fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD, );
> > > > + if (fd < 0)
> > > > + return NULL;
> > > > +
> > > > + size = entry.last - entry.start + 1;
> > > > + *len = entry.last - iova + 1;
> > > > + addr = mmap(0, size, perm_to_prot(entry.perm), MAP_SHARED,
> > > > + fd, entry.offset);
> > > > + close(fd);
> > > > + if (addr == MAP_FAILED)
> > > > + return NULL;
> > > > +
> > > > + /* do something to cache this iova region */
> > >
> > > How is userspace expected to manage iotlb mmaps? When should munmap(2)
> > > be called?
> > >
> >
> > The simple way is using a list to store the iotlb mappings. And we
> > should call the munmap(2) for the old mappings when VDUSE_UPDATE_IOTLB
> > or VDUSE_STOP_DATAPLANE message is received.
>
> Thanks for explaining. It would be helpful to have a description of
> IOTLB operation in this document.
>

Sure.

> > > Should userspace expect VDUSE_IOTLB_GET_FD to return a full chunk of
> > > guest RAM (e.g. multiple gigabytes) that can be cached permanently or
> > > will it return just enough pages to cover [start, last)?
> > >
> >
> > It should return one iotlb mapping that covers [start, last). In
> > vhost-vdpa cases, it might be a full chunk of guest RAM. In
> > virtio-vdpa cases, it might be the whole bounce buffer or one coherent
> > mapping (produced by dma_alloc_coherent()).
>
> Great, thanks. Adding something about this to the documentation would
> help others implementing VDUSE devices or libraries.
>

OK.

> > > > +
> > > > + return addr + iova - entry.start;
> > > > + }
> > > > +
> > > > +- VDUSE_DEV_GET_FEATURES: Get the negotiated features
> > >
> > > Are these VIRTIO feature bits? Please explain how feature negotiation
> > > works. There must be a way for userspace to report the device's
> > > supported feature bits to the kernel.
> > >
> >
> > Yes, these are VIRTIO feature bits. Userspace will specify the
> > device's supported feature bits when creating a new VDUSE device with
> > ioctl(VDUSE_CREATE_DEV).
>
> Can the VDUSE device influence feature bit negotiation? For example, if
> the VDUSE virtio-blk device does not implement discard/write-zeroes, how
> does QEMU or the guest find out about this?
>

There is a "features" field in struct vduse_dev_config which is used
to do feature negotiation.

> > > > +- VDUSE_DEV_UPDATE_CONFIG: Update the configuration space and inject a 
> > > > config interrupt
> > >
> > > Does this mean the contents of the configuration space are cached by
> > > VDUSE?
> >
> > Yes, but the kernel will also store the same contents.
> >
> > > The downside is that the userspace code cannot generate the
> > > contents on demand. Most devices doin't need to generate the contents
> > > on demand, so I think this is okay but I had expected a different
> > > interface:
> > >
> > > kernel->userspace VDUSE_DEV_GET_CONFIG
> > > userspace->kernel VDUSE_DEV_INJECT_CONFIG_IRQ
> > >
> >
> > The problem is how to handle the failure of VDUSE_DEV_GET_CONFIG. We
> > will need lots of modification of virtio codes to support that. So to
> > make it simple, we choose this way:
> >
> > userspace -> kernel VDUSE_DEV_SET_CONFIG
> > userspace -> kernel VDUSE_DEV_INJECT_CONFIG_IRQ
> >
> > > I think you can leave it the way 

Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

2021-07-01 Thread Marek Szyprowski
On 01.07.2021 11:11, Robin Murphy wrote:
> On 2021-07-01 10:01, Will Deacon wrote:
>> On Thu, Jul 01, 2021 at 10:29:29AM +0200, Marek Szyprowski wrote:
>>> Hi Robin,
>>>
>>> On 30.06.2021 16:01, Robin Murphy wrote:
 On 2021-06-30 14:48, Marek Szyprowski wrote:
> On 30.06.2021 14:59, Will Deacon wrote:
>> On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote:
>>> On 08.06.2021 18:45, Amey Narkhede wrote:
 If device registration fails, remove sysfs attribute
 and if setting bus callbacks fails, unregister the device
 and cleanup the sysfs attribute.

 Signed-off-by: Amey Narkhede 
>>> This patch landed in linux-next some time ago as commit 
>>> 249c9dc6aa0d
>>> ("iommu/arm: Cleanup resources in case of probe error path"). After
>>> bisecting and some manual searching I finally found that it is
>>> responsible for breaking s2idle on DragonBoard 410c. Here is the 
>>> log
>>> (captured with no_console_suspend):
>>>
>>> # time rtcwake -s10 -mmem
>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan� 1 
>>> 00:02:13 1970
>>> PM: suspend entry (s2idle)
>>> Filesystems sync: 0.002 seconds
>>> Freezing user space processes ... (elapsed 0.006 seconds) done.
>>> OOM killer disabled.
>>> Freezing remaining freezable tasks ... (elapsed 0.004 seconds) 
>>> done.
>>> Unable to handle kernel NULL pointer dereference at virtual address
>>> 0070
>>> Mem abort info:
>>> �� � ESR = 0x9606
>>> �� � EC = 0x25: DABT (current EL), IL = 32 bits
>>> �� � SET = 0, FnV = 0
>>> �� � EA = 0, S1PTW = 0
>>> �� � FSC = 0x06: level 2 translation fault
>>> Data abort info:
>>> �� � ISV = 0, ISS = 0x0006
>>> �� � CM = 0, WnR = 0
>>> user pgtable: 4k pages, 48-bit VAs, pgdp=8ad08000
>>> [0070] pgd=080085c3c003, p4d=080085c3c003,
>>> pud=080088dcf003, pmd=
>>> Internal error: Oops: 9606 [#1] PREEMPT SMP
>>> Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b
>>> venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511
>>> snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon
>>> qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm
>>> venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital
>>> videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu 
>>> v4l2_mem2mem
>>> snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2
>>> snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci
>>> qnoc_msm8916
>>> videodev mc icc_smd_rpm mdt_loader socinfo display_connector 
>>> rmtfs_mem
>>> CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 
>>> #3592
>>> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>>> pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>> pc : msm_runtime_suspend+0x1c/0x60 [msm]
>>> lr : msm_pm_suspend+0x18/0x38 [msm]
>>> ...
>>> Call trace:
>>> �� �msm_runtime_suspend+0x1c/0x60 [msm]
>>> �� �msm_pm_suspend+0x18/0x38 [msm]
>>> �� �dpm_run_callback+0x84/0x378
>> I wonder if we're missing a pm_runtime_disable() call on the failure
>> path?
>> i.e. something like the diff below...
>
> I've checked and it doesn't fix anything.

 What's happened previously? Has an IOMMU actually failed to probe, or
 is this a fiddly "code movement unveils latent bug elsewhere" kind of
 thing? There doesn't look to be much capable of going wrong in
 msm_runtime_suspend() itself, so is the DRM driver also in a broken
 half-probed state where it's left its pm_runtime_ops behind without
 its drvdata being valid?

>>> I finally had some time to analyze this issue. It turned out that with
>>> this patch, iommu fails to probe for soc:iommu@1f08000 device, while it
>>> worked fine before. This happens because this patch adds a check for 
>>> the
>>> return value of the bus_set_iommu() in
>>> drivers/iommu/arm/arm-smmu/qcom_iommu.c. When I removed that check, it
>>> probes successfully again. It looks that there are already iommu ops
>>> registered for platform bus, before qcom_iommu probes. On the other
>>> hand, if I remember correctly they are not used during the device
>>> registration, but they are needed for some legacy stuff. I can send a
>>> patch restoring old code flow if you think that this is a right 
>>> solution.
>>
>> Yes, let's just revert the qcom_iommu.c changes from that patch for now.
>> The pm runtime stuff looks dodgy anyway so I think this needs more 
>> thought.
>
> Oh, right, blindly returning the -EBUSY from bus_set_iommu() because 
> we're not the first instance to probe is definitely the wrong thing to 
> do as well. It's still not clear why failing makes the DRM driver fall 
> over, 

Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

2021-07-01 Thread Robin Murphy

On 2021-07-01 10:01, Will Deacon wrote:

On Thu, Jul 01, 2021 at 10:29:29AM +0200, Marek Szyprowski wrote:

Hi Robin,

On 30.06.2021 16:01, Robin Murphy wrote:

On 2021-06-30 14:48, Marek Szyprowski wrote:

On 30.06.2021 14:59, Will Deacon wrote:

On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote:

On 08.06.2021 18:45, Amey Narkhede wrote:

If device registration fails, remove sysfs attribute
and if setting bus callbacks fails, unregister the device
and cleanup the sysfs attribute.

Signed-off-by: Amey Narkhede 

This patch landed in linux-next some time ago as commit 249c9dc6aa0d
("iommu/arm: Cleanup resources in case of probe error path"). After
bisecting and some manual searching I finally found that it is
responsible for breaking s2idle on DragonBoard 410c. Here is the log
(captured with no_console_suspend):

# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan� 1 00:02:13 1970
PM: suspend entry (s2idle)
Filesystems sync: 0.002 seconds
Freezing user space processes ... (elapsed 0.006 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done.
Unable to handle kernel NULL pointer dereference at virtual address
0070
Mem abort info:
�� � ESR = 0x9606
�� � EC = 0x25: DABT (current EL), IL = 32 bits
�� � SET = 0, FnV = 0
�� � EA = 0, S1PTW = 0
�� � FSC = 0x06: level 2 translation fault
Data abort info:
�� � ISV = 0, ISS = 0x0006
�� � CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=8ad08000
[0070] pgd=080085c3c003, p4d=080085c3c003,
pud=080088dcf003, pmd=
Internal error: Oops: 9606 [#1] PREEMPT SMP
Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b
venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511
snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon
qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm
venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital
videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem
snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2
snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci
qnoc_msm8916
videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem
CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
pc : msm_runtime_suspend+0x1c/0x60 [msm]
lr : msm_pm_suspend+0x18/0x38 [msm]
...
Call trace:
�� �msm_runtime_suspend+0x1c/0x60 [msm]
�� �msm_pm_suspend+0x18/0x38 [msm]
�� �dpm_run_callback+0x84/0x378

I wonder if we're missing a pm_runtime_disable() call on the failure
path?
i.e. something like the diff below...


I've checked and it doesn't fix anything.


What's happened previously? Has an IOMMU actually failed to probe, or
is this a fiddly "code movement unveils latent bug elsewhere" kind of
thing? There doesn't look to be much capable of going wrong in
msm_runtime_suspend() itself, so is the DRM driver also in a broken
half-probed state where it's left its pm_runtime_ops behind without
its drvdata being valid?


I finally had some time to analyze this issue. It turned out that with
this patch, iommu fails to probe for soc:iommu@1f08000 device, while it
worked fine before. This happens because this patch adds a check for the
return value of the bus_set_iommu() in
drivers/iommu/arm/arm-smmu/qcom_iommu.c. When I removed that check, it
probes successfully again. It looks that there are already iommu ops
registered for platform bus, before qcom_iommu probes. On the other
hand, if I remember correctly they are not used during the device
registration, but they are needed for some legacy stuff. I can send a
patch restoring old code flow if you think that this is a right solution.


Yes, let's just revert the qcom_iommu.c changes from that patch for now.
The pm runtime stuff looks dodgy anyway so I think this needs more thought.


Oh, right, blindly returning the -EBUSY from bus_set_iommu() because 
we're not the first instance to probe is definitely the wrong thing to 
do as well. It's still not clear why failing makes the DRM driver fall 
over, but +1 to qcom-iommu needing some deeper consideration.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

2021-07-01 Thread Will Deacon
On Thu, Jul 01, 2021 at 10:29:29AM +0200, Marek Szyprowski wrote:
> Hi Robin,
> 
> On 30.06.2021 16:01, Robin Murphy wrote:
> > On 2021-06-30 14:48, Marek Szyprowski wrote:
> >> On 30.06.2021 14:59, Will Deacon wrote:
> >>> On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote:
>  On 08.06.2021 18:45, Amey Narkhede wrote:
> > If device registration fails, remove sysfs attribute
> > and if setting bus callbacks fails, unregister the device
> > and cleanup the sysfs attribute.
> >
> > Signed-off-by: Amey Narkhede 
>  This patch landed in linux-next some time ago as commit 249c9dc6aa0d
>  ("iommu/arm: Cleanup resources in case of probe error path"). After
>  bisecting and some manual searching I finally found that it is
>  responsible for breaking s2idle on DragonBoard 410c. Here is the log
>  (captured with no_console_suspend):
> 
>  # time rtcwake -s10 -mmem
>  rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan  1 00:02:13 1970
>  PM: suspend entry (s2idle)
>  Filesystems sync: 0.002 seconds
>  Freezing user space processes ... (elapsed 0.006 seconds) done.
>  OOM killer disabled.
>  Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done.
>  Unable to handle kernel NULL pointer dereference at virtual address
>  0070
>  Mem abort info:
>       ESR = 0x9606
>       EC = 0x25: DABT (current EL), IL = 32 bits
>       SET = 0, FnV = 0
>       EA = 0, S1PTW = 0
>       FSC = 0x06: level 2 translation fault
>  Data abort info:
>       ISV = 0, ISS = 0x0006
>       CM = 0, WnR = 0
>  user pgtable: 4k pages, 48-bit VAs, pgdp=8ad08000
>  [0070] pgd=080085c3c003, p4d=080085c3c003,
>  pud=080088dcf003, pmd=
>  Internal error: Oops: 9606 [#1] PREEMPT SMP
>  Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b
>  venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511
>  snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon
>  qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm
>  venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital
>  videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem
>  snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2
>  snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci 
>  qnoc_msm8916
>  videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem
>  CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592
>  Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>  pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>  pc : msm_runtime_suspend+0x1c/0x60 [msm]
>  lr : msm_pm_suspend+0x18/0x38 [msm]
>  ...
>  Call trace:
>      msm_runtime_suspend+0x1c/0x60 [msm]
>      msm_pm_suspend+0x18/0x38 [msm]
>      dpm_run_callback+0x84/0x378
> >>> I wonder if we're missing a pm_runtime_disable() call on the failure 
> >>> path?
> >>> i.e. something like the diff below...
> >>
> >> I've checked and it doesn't fix anything.
> >
> > What's happened previously? Has an IOMMU actually failed to probe, or 
> > is this a fiddly "code movement unveils latent bug elsewhere" kind of 
> > thing? There doesn't look to be much capable of going wrong in 
> > msm_runtime_suspend() itself, so is the DRM driver also in a broken 
> > half-probed state where it's left its pm_runtime_ops behind without 
> > its drvdata being valid?
> >
> I finally had some time to analyze this issue. It turned out that with 
> this patch, iommu fails to probe for soc:iommu@1f08000 device, while it 
> worked fine before. This happens because this patch adds a check for the 
> return value of the bus_set_iommu() in 
> drivers/iommu/arm/arm-smmu/qcom_iommu.c. When I removed that check, it 
> probes successfully again. It looks that there are already iommu ops 
> registered for platform bus, before qcom_iommu probes. On the other 
> hand, if I remember correctly they are not used during the device 
> registration, but they are needed for some legacy stuff. I can send a 
> patch restoring old code flow if you think that this is a right solution.

Yes, let's just revert the qcom_iommu.c changes from that patch for now.
The pm runtime stuff looks dodgy anyway so I think this needs more thought.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

2021-07-01 Thread Marek Szyprowski
Hi Robin,

On 30.06.2021 16:01, Robin Murphy wrote:
> On 2021-06-30 14:48, Marek Szyprowski wrote:
>> On 30.06.2021 14:59, Will Deacon wrote:
>>> On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote:
 On 08.06.2021 18:45, Amey Narkhede wrote:
> If device registration fails, remove sysfs attribute
> and if setting bus callbacks fails, unregister the device
> and cleanup the sysfs attribute.
>
> Signed-off-by: Amey Narkhede 
 This patch landed in linux-next some time ago as commit 249c9dc6aa0d
 ("iommu/arm: Cleanup resources in case of probe error path"). After
 bisecting and some manual searching I finally found that it is
 responsible for breaking s2idle on DragonBoard 410c. Here is the log
 (captured with no_console_suspend):

 # time rtcwake -s10 -mmem
 rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan  1 00:02:13 1970
 PM: suspend entry (s2idle)
 Filesystems sync: 0.002 seconds
 Freezing user space processes ... (elapsed 0.006 seconds) done.
 OOM killer disabled.
 Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done.
 Unable to handle kernel NULL pointer dereference at virtual address
 0070
 Mem abort info:
      ESR = 0x9606
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x06: level 2 translation fault
 Data abort info:
      ISV = 0, ISS = 0x0006
      CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=8ad08000
 [0070] pgd=080085c3c003, p4d=080085c3c003,
 pud=080088dcf003, pmd=
 Internal error: Oops: 9606 [#1] PREEMPT SMP
 Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b
 venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511
 snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon
 qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm
 venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital
 videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem
 snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2
 snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci 
 qnoc_msm8916
 videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem
 CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592
 Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
 pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
 pc : msm_runtime_suspend+0x1c/0x60 [msm]
 lr : msm_pm_suspend+0x18/0x38 [msm]
 ...
 Call trace:
     msm_runtime_suspend+0x1c/0x60 [msm]
     msm_pm_suspend+0x18/0x38 [msm]
     dpm_run_callback+0x84/0x378
>>> I wonder if we're missing a pm_runtime_disable() call on the failure 
>>> path?
>>> i.e. something like the diff below...
>>
>> I've checked and it doesn't fix anything.
>
> What's happened previously? Has an IOMMU actually failed to probe, or 
> is this a fiddly "code movement unveils latent bug elsewhere" kind of 
> thing? There doesn't look to be much capable of going wrong in 
> msm_runtime_suspend() itself, so is the DRM driver also in a broken 
> half-probed state where it's left its pm_runtime_ops behind without 
> its drvdata being valid?
>
I finally had some time to analyze this issue. It turned out that with 
this patch, iommu fails to probe for soc:iommu@1f08000 device, while it 
worked fine before. This happens because this patch adds a check for the 
return value of the bus_set_iommu() in 
drivers/iommu/arm/arm-smmu/qcom_iommu.c. When I removed that check, it 
probes successfully again. It looks that there are already iommu ops 
registered for platform bus, before qcom_iommu probes. On the other 
hand, if I remember correctly they are not used during the device 
registration, but they are needed for some legacy stuff. I can send a 
patch restoring old code flow if you think that this is a right solution.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R Institute Poland

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-01 Thread Jason Wang


在 2021/7/1 下午2:50, Yongji Xie 写道:

On Wed, Jun 30, 2021 at 5:51 PM Stefan Hajnoczi  wrote:

On Tue, Jun 29, 2021 at 10:59:51AM +0800, Yongji Xie wrote:

On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  wrote:

On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:

+/* ioctls */
+
+struct vduse_dev_config {
+ char name[VDUSE_NAME_MAX]; /* vduse device name */
+ __u32 vendor_id; /* virtio vendor id */
+ __u32 device_id; /* virtio device id */
+ __u64 features; /* device features */
+ __u64 bounce_size; /* bounce buffer size for iommu */
+ __u16 vq_size_max; /* the max size of virtqueue */

The VIRTIO specification allows per-virtqueue sizes. A device can have
two virtqueues, where the first one allows up to 1024 descriptors and
the second one allows only 128 descriptors, for example.


Good point! But it looks like virtio-vdpa/virtio-pci doesn't support
that now. All virtqueues have the same maximum size.

I see struct vpda_config_ops only supports a per-device max vq size:
u16 (*get_vq_num_max)(struct vdpa_device *vdev);

virtio-pci supports per-virtqueue sizes because the struct
virtio_pci_common_cfg->queue_size register is per-queue (controlled by
queue_select).


Oh, yes. I miss queue_select.


I guess this is a question for Jason: will vdpa will keep this limitation?
If yes, then VDUSE can stick to it too without running into problems in
the future.



I think it's better to extend the get_vq_num_max() per virtqueue.

Currently, vDPA assumes the parent to have a global max size. This seems 
to work on most of the parents but not vp-vDPA (which could be backed by 
QEMU, in that case cvq's size is smaller).


Fortunately, we haven't enabled had cvq support in the userspace now.

I can post the fixes.





+ __u16 padding; /* padding */
+ __u32 vq_num; /* the number of virtqueues */
+ __u32 vq_align; /* the allocation alignment of virtqueue's metadata */

I'm not sure what this is?


  This will be used by vring_create_virtqueue() too.

If there is no official definition for the meaning of this value then
"/* same as vring_create_virtqueue()'s vring_align parameter */" would
be clearer. That way the reader knows what to research in order to
understand how this field works.


OK.


I don't remember but maybe it was used to support vrings when the
host/guest have non-4KB page sizes. I wonder if anyone has an official
definition for this value?

Not sure. Maybe we might need some alignment which is less than
PAGE_SIZE sometimes.



So I see CCW always use 4096, but I'm not sure whether or not it's 
smaller than PAGE_SIZE.


Thanks




Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-01 Thread Will Deacon
On Wed, Jun 30, 2021 at 08:56:51AM -0700, Nathan Chancellor wrote:
> On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
> > On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> > > `BUG: unable to handle page fault for address: 003a8290` and
> > > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> > > (maybe dev->dma_io_tlb_mem) was corrupted?
> > > The dev->dma_io_tlb_mem should be set here
> > > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> > > through device_initialize.
> > 
> > I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
> > 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
> > The spinlock is at offset 0x24 in that structure, and looking at the
> > register dump from the crash:
> > 
> > Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:adb4013db9e8 EFLAGS: 00010006
> > Jun 29 18:28:42 hp-4300G kernel: RAX: 003a8290 RBX: 
> >  RCX: 8900572ad580
> > Jun 29 18:28:42 hp-4300G kernel: RDX: 89005653f024 RSI: 
> > 000c RDI: 1d17
> > Jun 29 18:28:42 hp-4300G kernel: RBP: 0a20d000 R08: 
> > 000c R09: 
> > Jun 29 18:28:42 hp-4300G kernel: R10: 0a20d000 R11: 
> > 89005653f000 R12: 0212
> > Jun 29 18:28:42 hp-4300G kernel: R13: 1000 R14: 
> > 0002 R15: 0020
> > Jun 29 18:28:42 hp-4300G kernel: FS:  7f1f8898ea40() 
> > GS:89005728() knlGS:
> > Jun 29 18:28:42 hp-4300G kernel: CS:  0010 DS:  ES:  CR0: 
> > 80050033
> > Jun 29 18:28:42 hp-4300G kernel: CR2: 003a8290 CR3: 
> > 0001020d CR4: 00350ee0
> > Jun 29 18:28:42 hp-4300G kernel: Call Trace:
> > Jun 29 18:28:42 hp-4300G kernel:  _raw_spin_lock_irqsave+0x39/0x50
> > Jun 29 18:28:42 hp-4300G kernel:  swiotlb_tbl_map_single+0x12b/0x4c0
> > 
> > Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
> > RDX pointing at the spinlock. Yet RAX is holding junk :/
> > 
> > I agree that enabling KASAN would be a good idea, but I also think we
> > probably need to get some more information out of swiotlb_tbl_map_single()
> > to see see what exactly is going wrong in there.
> 
> I can certainly enable KASAN and if there is any debug print I can add
> or dump anything, let me know!

I bit the bullet and took v5.13 with swiotlb/for-linus-5.14 merged in, built
x86 defconfig and ran it on my laptop. However, it seems to work fine!

Please can you share your .config?

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-01 Thread Yongji Xie
On Wed, Jun 30, 2021 at 5:51 PM Stefan Hajnoczi  wrote:
>
> On Tue, Jun 29, 2021 at 10:59:51AM +0800, Yongji Xie wrote:
> > On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi  wrote:
> > >
> > > On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote:
> > > > +/* ioctls */
> > > > +
> > > > +struct vduse_dev_config {
> > > > + char name[VDUSE_NAME_MAX]; /* vduse device name */
> > > > + __u32 vendor_id; /* virtio vendor id */
> > > > + __u32 device_id; /* virtio device id */
> > > > + __u64 features; /* device features */
> > > > + __u64 bounce_size; /* bounce buffer size for iommu */
> > > > + __u16 vq_size_max; /* the max size of virtqueue */
> > >
> > > The VIRTIO specification allows per-virtqueue sizes. A device can have
> > > two virtqueues, where the first one allows up to 1024 descriptors and
> > > the second one allows only 128 descriptors, for example.
> > >
> >
> > Good point! But it looks like virtio-vdpa/virtio-pci doesn't support
> > that now. All virtqueues have the same maximum size.
>
> I see struct vpda_config_ops only supports a per-device max vq size:
> u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>
> virtio-pci supports per-virtqueue sizes because the struct
> virtio_pci_common_cfg->queue_size register is per-queue (controlled by
> queue_select).
>

Oh, yes. I miss queue_select.

> I guess this is a question for Jason: will vdpa will keep this limitation?
> If yes, then VDUSE can stick to it too without running into problems in
> the future.
>
> > > > + __u16 padding; /* padding */
> > > > + __u32 vq_num; /* the number of virtqueues */
> > > > + __u32 vq_align; /* the allocation alignment of virtqueue's 
> > > > metadata */
> > >
> > > I'm not sure what this is?
> > >
> >
> >  This will be used by vring_create_virtqueue() too.
>
> If there is no official definition for the meaning of this value then
> "/* same as vring_create_virtqueue()'s vring_align parameter */" would
> be clearer. That way the reader knows what to research in order to
> understand how this field works.
>

OK.

> I don't remember but maybe it was used to support vrings when the
> host/guest have non-4KB page sizes. I wonder if anyone has an official
> definition for this value?

Not sure. Maybe we might need some alignment which is less than
PAGE_SIZE sometimes.

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu