Re: [PATCH RFC 4/5] vhost-vdpa: support IOTLB batching hints

2020-06-29 Thread Jason Wang


On 2020/6/29 11:49 PM, Michael S. Tsirkin wrote:

On Mon, Jun 29, 2020 at 05:26:03PM +0800, Jason Wang wrote:

On 2020/6/28 5:58 PM, Michael S. Tsirkin wrote:

On Thu, Jun 18, 2020 at 01:56:25PM +0800, Jason Wang wrote:

This patch extends the vhost IOTLB API to accept batch updating hints
from userspace. When userspace wants to update the device IOTLB in a
batch, it may do:

1) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_BEGIN flag
2) Perform a batch of IOTLB updating via VHOST_IOTLB_UPDATE/INVALIDATE
3) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_END flag
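
For illustration, a minimal userspace sketch of that flow (assuming the
RFC's VHOST_IOTLB_BATCH_BEGIN/END additions to <linux/vhost_types.h>, an
already-open vhost-vdpa fd with VHOST_BACKEND_F_IOTLB_BATCH negotiated,
and a made-up struct mapping type) might look like:

#include <string.h>
#include <unistd.h>
#include <linux/vhost.h>
#include <linux/vhost_types.h>

struct mapping { __u64 iova, size, uaddr; };	/* made-up example type */

static int iotlb_send(int fd, __u8 type, __u64 iova, __u64 size,
		      __u64 uaddr, __u8 perm)
{
	struct vhost_msg_v2 msg;

	memset(&msg, 0, sizeof(msg));
	msg.type = VHOST_IOTLB_MSG_V2;
	msg.iotlb.type = type;
	msg.iotlb.iova = iova;
	msg.iotlb.size = size;
	msg.iotlb.uaddr = uaddr;
	msg.iotlb.perm = perm;

	return write(fd, &msg, sizeof(msg)) == sizeof(msg) ? 0 : -1;
}

static int update_mappings_batched(int fd, const struct mapping *m, int n)
{
	int i;

	/* 1) open the batch */
	if (iotlb_send(fd, VHOST_IOTLB_BATCH_BEGIN, 0, 0, 0, 0))
		return -1;
	/* 2) any number of UPDATE/INVALIDATE messages */
	for (i = 0; i < n; i++)
		if (iotlb_send(fd, VHOST_IOTLB_UPDATE, m[i].iova, m[i].size,
			       m[i].uaddr, VHOST_ACCESS_RW))
			return -1;
	/* 3) close the batch; with set_map() devices this is where the
	 *    whole mapping is pushed to the device in one go */
	return iotlb_send(fd, VHOST_IOTLB_BATCH_END, 0, 0, 0, 0);
}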

As long as we are extending the interface,
is there some way we could cut down the number of system calls needed
here?


I'm not sure it's worth doing that since usually we only have fewer than
10 regions.


Well with a guest iommu I'm guessing it can go up significantly, right?



The batching flag is not mandatory, so userspace can still choose to
update a single IOTLB mapping through a single system call without the
batch flag.






A possible method is to carry multiple vhost_iotlb_msg entries in one
system call.

That's an option.
Also, can kernel handle a batch that is too large by applying
parts of it iteratively?



I think so; we can iterate the vhost iotlb messages one by one through
the iov iterator.




Or must all changes take place atomically after BATCH_END?



For changes:

- if you mean the changes in the vhost IOTLB, they are not applied atomically
- if you mean the mapping seen by the vDPA device driver, it can be
atomic for a driver that implements the set_map() method, since we pass
an interval tree to the device (a sketch of walking it follows below).
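
For illustration, a rough sketch of what a set_map() implementation could
do with that tree, using the iterator helpers from drivers/vhost/iotlb.c;
the demo_hw_*() calls are hypothetical hardware hooks, not a real driver:

#include <linux/limits.h>
#include <linux/vdpa.h>
#include <linux/vhost_iotlb.h>

static int demo_set_map(struct vdpa_device *vdpa, struct vhost_iotlb *iotlb)
{
	struct vhost_iotlb_map *map;
	int r;

	/* The tree describes the complete mapping state, so the device can
	 * rebuild its whole translation table in one pass. */
	demo_hw_reset_translation(vdpa);		/* hypothetical */

	for (map = vhost_iotlb_itree_first(iotlb, 0, ULLONG_MAX); map;
	     map = vhost_iotlb_itree_next(map, 0, ULLONG_MAX)) {
		r = demo_hw_add_translation(vdpa, map->start,
					    map->last - map->start + 1,
					    map->addr, map->perm);	/* hypothetical */
		if (r)
			return r;
	}

	return 0;
}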




If atomically, we might need to limit batch size to make
sure kernel can keep a copy in memory.



It's the responsibility of the guest driver to make sure there won't be a
race between DMA and map updating, so it looks to me like we don't
need to care about that here.









Vhost-vdpa may decide to batch the IOMMU/IOTLB updating in step 3 when
the vDPA device supports the set_map() op. This is useful for vDPA devices
that want to know all the mappings to tweak their own DMA translation
logic.

For vDPA devices that don't require set_map(), there is no behavior change.

This capability is advertised via VHOST_BACKEND_F_IOTLB_BATCH capability.
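
For illustration, userspace detection of that capability could look roughly
like this; VHOST_GET_BACKEND_FEATURES and VHOST_SET_BACKEND_FEATURES are the
backend-features ioctls from the existing uAPI, the rest is a sketch:

#include <sys/ioctl.h>
#include <linux/vhost.h>
#include <linux/vhost_types.h>

/* Returns 1 if batching was negotiated, 0 if not supported, -1 on error. */
static int negotiate_iotlb_batch(int vdpa_fd)
{
	__u64 features;

	if (ioctl(vdpa_fd, VHOST_GET_BACKEND_FEATURES, &features))
		return -1;

	if (!(features & (1ULL << VHOST_BACKEND_F_IOTLB_BATCH)))
		return 0;	/* old kernel: send plain UPDATE/INVALIDATE only */

	features = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) |
		   (1ULL << VHOST_BACKEND_F_IOTLB_BATCH);
	if (ioctl(vdpa_fd, VHOST_SET_BACKEND_FEATURES, &features))
		return -1;

	return 1;
}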

Signed-off-by: Jason Wang 
---
   drivers/vhost/vdpa.c | 30 +++---
   include/uapi/linux/vhost.h   |  2 ++
   include/uapi/linux/vhost_types.h |  7 +++
   3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 453057421f80..8f624bbafee7 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -56,7 +56,9 @@ enum {
   };
   enum {
-   VHOST_VDPA_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
+   VHOST_VDPA_BACKEND_FEATURES =
+   (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) |
+   (1ULL << VHOST_BACKEND_F_IOTLB_BATCH),
   };
   /* Currently, only network backend w/o multiqueue is supported. */
@@ -77,6 +79,7 @@ struct vhost_vdpa {
int virtio_id;
int minor;
struct eventfd_ctx *config_ctx;
+   int in_batch;
   };
   static DEFINE_IDA(vhost_vdpa_ida);
@@ -125,6 +128,7 @@ static void vhost_vdpa_reset(struct vhost_vdpa *v)
const struct vdpa_config_ops *ops = vdpa->config;
ops->set_status(vdpa, 0);
+   v->in_batch = 0;
   }
   static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
@@ -540,9 +544,10 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
if (ops->dma_map)
r = ops->dma_map(vdpa, iova, size, pa, perm);
-   else if (ops->set_map)
-   r = ops->set_map(vdpa, dev->iotlb);
-   else
+   else if (ops->set_map) {
+   if (!v->in_batch)
+   r = ops->set_map(vdpa, dev->iotlb);
+   } else
r = iommu_map(v->domain, iova, pa, size,
  perm_to_iommu_flags(perm));
@@ -559,9 +564,10 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 
iova, u64 size)
if (ops->dma_map)
ops->dma_unmap(vdpa, iova, size);
-   else if (ops->set_map)
-   ops->set_map(vdpa, dev->iotlb);
-   else
+   else if (ops->set_map) {
+   if (!v->in_batch)
+   ops->set_map(vdpa, dev->iotlb);
+   } else
iommu_unmap(v->domain, iova, size);
   }
@@ -655,6 +661,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev 
*dev,
struct vhost_iotlb_msg *msg)
   {
struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
int r = 0;
r = vhost_dev_check_owner(dev);
@@ -668,6 +676,14 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev 
*dev,
case VHOST_IOTLB_INVALIDATE:
vhost_vdpa_unmap(v, msg->iova, msg->size);
break;
+  

RE: [PATCH] xen: introduce xen_vring_use_dma

2020-06-29 Thread Peng Fan
> Subject: RE: [PATCH] xen: introduce xen_vring_use_dma
> 
> On Mon, 29 Jun 2020, Peng Fan wrote:
> > > > If that is the case, how is it possible that virtio breaks on ARM
> > > > using the default dma_ops? The breakage is not Xen related (except
> > > > that Xen turns dma_ops on). The original message from Peng was:
> > > >
> > > >   vring_map_one_sg -> vring_use_dma_api
> > > >-> dma_map_page
> > > >-> __swiotlb_map_page
> > > > ->swiotlb_map_page
> > > > 
> > > > ->__dma_map_area(phys_to_virt(dma_to_phys(dev,
> > > dev_addr)), size, dir);
> > > >   However we are using per device dma area for rpmsg, phys_to_virt
> > > >   could not return a correct virtual address for virtual address in
> > > >   vmalloc area. Then kernel panic.
> > > >
> > > > I must be missing something. Maybe it is because it has to do with
> RPMesg?
> > >
> > > I think it's an RPMesg bug, yes
> >
> > rpmsg bug is another issue, it should not use dma_alloc_coherent for
> > reserved area, and use vmalloc_to_page.
> >
> > Anyway here using dma api will also trigger issue.
> 
> Is the stack trace above for the RPMesg issue or for the Trusty issue?

The stack trace you pasted is rpmsg issue.

> If it is the stack trace for RPMesg, can you also post the Trusty stack 
> trace? Or
> are they identical?

There is no stack dump here. It successfully uses swiotlb to do a map,
but we actually don't need swiotlb in domU to do the map.

Thanks,
Peng.


RE: [PATCH] xen: introduce xen_vring_use_dma

2020-06-29 Thread Stefano Stabellini
On Mon, 29 Jun 2020, Peng Fan wrote:
> > > If that is the case, how is it possible that virtio breaks on ARM
> > > using the default dma_ops? The breakage is not Xen related (except
> > > that Xen turns dma_ops on). The original message from Peng was:
> > >
> > >   vring_map_one_sg -> vring_use_dma_api
> > >-> dma_map_page
> > >  -> __swiotlb_map_page
> > >   ->swiotlb_map_page
> > >   
> > > ->__dma_map_area(phys_to_virt(dma_to_phys(dev,
> > dev_addr)), size, dir);
> > >   However we are using per device dma area for rpmsg, phys_to_virt
> > >   could not return a correct virtual address for virtual address in
> > >   vmalloc area. Then kernel panic.
> > >
> > > I must be missing something. Maybe it is because it has to do with RPMesg?
> > 
> > I think it's an RPMesg bug, yes
> 
> rpmsg bug is another issue, it should not use dma_alloc_coherent for reserved 
> area,
> and use vmalloc_to_page.
> 
> Anyway here using dma api will also trigger issue.

Is the stack trace above for the RPMesg issue or for the Trusty issue?
If it is the stack trace for RPMesg, can you also post the Trusty stack
trace? Or are they identical?


Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-29 Thread Stefano Stabellini
On Fri, 26 Jun 2020, Michael S. Tsirkin wrote:
> On Thu, Jun 25, 2020 at 10:31:27AM -0700, Stefano Stabellini wrote:
> > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote:
> > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> > > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > > > > > > Export xen_swiotlb for all platforms using xen swiotlb
> > > > > > > > 
> > > > > > > > Use xen_swiotlb to determine when vring should use dma APIs to 
> > > > > > > > map the
> > > > > > > > ring: when xen_swiotlb is enabled the dma API is required. When 
> > > > > > > > it is
> > > > > > > > disabled, it is not required.
> > > > > > > > 
> > > > > > > > Signed-off-by: Peng Fan 
> > > > > > > 
> > > > > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
> > > > > > > Xen was there first, but everyone else is using that now.
> > > > > > 
> > > > > > Unfortunately it is complicated and it is not related to
> > > > > > VIRTIO_F_IOMMU_PLATFORM :-(
> > > > > > 
> > > > > > 
> > > > > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to translate
> > > > > > foreign mappings (memory coming from other VMs) to physical 
> > > > > > addresses.
> > > > > > On x86, it also uses dma_ops to translate Linux's idea of a physical
> > > > > > address into a real physical address (this is unneeded on ARM.)
> > > > > > 
> > > > > > 
> > > > > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be used on 
> > > > > > Xen/x86
> > > > > > always and on Xen/ARM if Linux is Dom0 (because it has foreign
> > > > > > mappings.) That is why we have the if (xen_domain) return true; in
> > > > > > vring_use_dma_api.
> > > > > 
> > > > > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.
> > > > > 
> > > > > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also*
> > > > > forces DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.
> > > > >
> > > > > Unfortunately as a result Xen never got around to
> > > > > properly setting VIRTIO_F_IOMMU_PLATFORM.
> > > > 
> > > > I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for this because
> > > > the usage of swiotlb_xen is not a property of virtio,
> > > 
> > > 
> > > Basically any device without VIRTIO_F_ACCESS_PLATFORM
> > > (that is its name in the latest virtio spec; VIRTIO_F_IOMMU_PLATFORM is
> > > what Linux calls it) is declared as "special, don't follow normal rules
> > > for access".
> > > 
> > > So yes swiotlb_xen is not a property of virtio, but what *is* a property
> > > of virtio is that it's not special, just a regular device from DMA POV.
> > 
> > I am trying to understand what you meant but I think I am missing
> > something.
> > 
> > Are you saying that modern virtio should always have
> > VIRTIO_F_ACCESS_PLATFORM, hence use normal dma_ops as any other devices?
> 
> I am saying it's a safe default. Clear VIRTIO_F_ACCESS_PLATFORM if you
> have some special needs e.g. you are very sure it's ok to bypass DMA
> ops, or you need to support a legacy guest (produced in the window
> between virtio 1 support in 2014 and support for
> VIRTIO_F_ACCESS_PLATFORM in 2016).

Ok thanks. I understand and it makes sense to me.

 
> > > > > > You might have noticed that I missed one possible case above: 
> > > > > > Xen/ARM
> > > > > > DomU :-)
> > > > > > 
> > > > > > Xen/ARM domUs don't need swiotlb_xen, it is not even initialized. 
> > > > > > So if
> > > > > > (xen_domain) return true; would give the wrong answer in that case.
> > > > > > Linux would end up calling the "normal" dma_ops, not swiotlb-xen, 
> > > > > > and
> > > > > > the "normal" dma_ops fail.
> > > > > > 
> > > > > > 
> > > > > > The solution I suggested was to make the check in vring_use_dma_api 
> > > > > > more
> > > > > > flexible by returning true if the swiotlb_xen is supposed to be 
> > > > > > used,
> > > > > > not in general for all Xen domains, because that is what the check 
> > > > > > was
> > > > > > really meant to do.
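
To make that concrete, a paraphrased sketch of the check being discussed
(not the exact upstream virtio_ring.c source), with the proposed narrowing
written as a hypothetical helper name:

#include <linux/virtio_config.h>
#include <xen/xen.h>

static bool vring_use_dma_api(struct virtio_device *vdev)
{
	if (!virtio_has_iommu_quirk(vdev))
		return true;	/* VIRTIO_F_IOMMU_PLATFORM negotiated: use DMA API */

	/*
	 * Today every Xen domain is forced onto the DMA API.  The suggestion
	 * above is to narrow this to "swiotlb-xen is actually in use"
	 * (written here as a hypothetical xen_swiotlb_in_use() helper), so
	 * Xen/ARM DomUs without foreign mappings would not be affected.
	 */
	if (xen_domain())
		return true;

	return false;
}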
> > > > > 
> > > > > Why not fix DMA ops so they DTRT (nop) on Xen/ARM DomU? What is wrong 
> > > > > with that?
> > > > 
> > > > swiotlb-xen is not used on Xen/ARM DomU, the default dma_ops are the
> > > > ones that are used. So you are saying, why don't we fix the default
> > > > dma_ops to work with virtio?
> > > > 
> > > > It is bad that the default dma_ops crash with virtio, so yes I think it
> > > > would be good to fix that. However, even if we fixed that, the if
> > > > (xen_domain()) check in vring_use_dma_api is still a problem.
> > > 
> > > Why is it a problem? It just makes virtio use DMA API.
> > > If that in turn works, problem solved.
> > 
> > You are correct in the sense that it would work. However I do think it
> > is wrong for vring_use_dma_api to enable dma_ops/swiotlb-xen for Xen/ARM
> > DomUs that don't need it. There are many 

Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature

2020-06-29 Thread Michael S. Tsirkin
On Mon, Jun 29, 2020 at 06:48:28PM +0200, Pierre Morel wrote:
> 
> 
> On 2020-06-29 18:09, Michael S. Tsirkin wrote:
> > On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote:
> > > An architecture protecting the guest memory against unauthorized host
> > > access may want to enforce VIRTIO I/O device protection through the
> > > use of VIRTIO_F_IOMMU_PLATFORM.
> > > Let's give a chance to the architecture to accept or not devices
> > > without VIRTIO_F_IOMMU_PLATFORM.
> > 
> > I agree it's a bit misleading. Protection is enforced by memory
> > encryption, you can't trust the hypervisor to report the bit correctly
> > so using that as a security measure would be pointless.
> > The real gain here is that broken configs are easier to
> > debug.
> > 
> > Here's an attempt at a better description:
> > 
> > On some architectures, guest knows that VIRTIO_F_IOMMU_PLATFORM is
> > required for virtio to function: e.g. this is the case on s390 protected
> > virt guests, since otherwise guest passes encrypted guest memory to 
> > devices,
> > which the device can't read. Without VIRTIO_F_IOMMU_PLATFORM the
> > result is that affected memory (or even a whole page containing it)
> > is corrupted. Detect and fail probe instead - that is easier
> > to debug.
> 
> Thanks, indeed better, aside from the "encrypted guest memory": the mechanism
> used by s390 to prevent host access to guest memory is not
> encryption but a hardware feature denying general host access and
> allowing pieces of memory to be shared between guest and host.

s/encrypted/protected/

> As a consequence the data read from memory is not corrupted but not read at
> all, and the read error kills the hypervisor with a SIGSEGV.

s/(or even a whole page containing it is corrupted)/can not be
read and the read error kills the hypervisor with a SIGSEGV/


As an aside, we could maybe handle that more gracefully
on the hypervisor side.

> 
> > 
> > however, now that we have described what it is (hypervisor
> > misconfiguration) I ask a question: can we be sure this will never
> > ever work? E.g. what if some future hypervisor gains ability to
> > access the protected guest memory in some abstractly secure manner?
> 
> The goal of the s390 PV feature is to avoid this possibility so I don't
> think so; however, there is a possibility that some hardware VIRTIO device
> gains access to the guest's protected memory, even though such a device does
> not exist yet.
> 
> The moment such a device exists we will need a driver for it, at least to
> enable the feature and apply policies; it is also one of the reasons why a
> hook into the architecture is interesting.


Not necessarily, it could also be fully transparent. See e.g.
recent AMD advances allowing unmodified guests with SEV.


> > We are blocking this here, and it's hard to predict the future,
> > and a broken hypervisor can always find ways to crash the guest ...
> 
> yes, this is also something to fix on the hypervisor side, Halil is working
> on it.
> 
> > 
> > IMHO it would be safer to just print a warning.
> > What do you think?
> 
> Sadly, putting a warning may not help as qemu is killed if it accesses the
> protected memory.
> Also note that the crash occurs not only on start but also on hotplug.
> 
> Thanks,
> Pierre

Well, that depends on where the warning goes. If it's on a serial port
it might be reported host-side before the crash triggers.  But it's an
interesting point generally. How about a feature to send a warning code
or string to the host then?

> -- 
> Pierre Morel
> IBM Lab Boeblingen



Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature

2020-06-29 Thread Pierre Morel




On 2020-06-29 18:09, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote:

An architecture protecting the guest memory against unauthorized host
access may want to enforce VIRTIO I/O device protection through the
use of VIRTIO_F_IOMMU_PLATFORM.
Let's give a chance to the architecture to accept or not devices
without VIRTIO_F_IOMMU_PLATFORM.


I agree it's a bit misleading. Protection is enforced by memory
encryption, you can't trust the hypervisor to report the bit correctly
so using that as a security measure would be pointless.
The real gain here is that broken configs are easier to
debug.

Here's an attempt at a better description:

On some architectures, guest knows that VIRTIO_F_IOMMU_PLATFORM is
required for virtio to function: e.g. this is the case on s390 protected
virt guests, since otherwise guest passes encrypted guest memory to 
devices,
which the device can't read. Without VIRTIO_F_IOMMU_PLATFORM the
result is that affected memory (or even a whole page containing it)
is corrupted. Detect and fail probe instead - that is easier
to debug.


Thanks, indeed better, aside from the "encrypted guest memory": the
mechanism used by s390 to prevent host access to guest memory is not
encryption but a hardware feature denying general host access while
allowing pieces of memory to be shared between guest and host.
As a consequence the data read from memory is not corrupted but not read
at all, and the read error kills the hypervisor with a SIGSEGV.





however, now that we have described what it is (hypervisor
misconfiguration) I ask a question: can we be sure this will never
ever work? E.g. what if some future hypervisor gains ability to
access the protected guest memory in some abstractly secure manner?


The goal of the s390 PV feature is to avoid this possibility so I don't 
think so; however, there is a possibility that some hardware VIRTIO 
device gain access to the guest's protected memory, even such device 
does not exist yet.


At the moment such device exists we will need a driver for it, at least 
to enable the feature and apply policies, it is also one of the reasons 
why a hook to the architecture is interesting.



We are blocking this here, and it's hard to predict the future,
and a broken hypervisor can always find ways to crash the guest ...


yes, this is also something to fix on the hypervisor side, Halil is
working on it.




IMHO it would be safer to just print a warning.
What do you think?


Sadly, putting a warning may not help as qemu is killed if it accesses 
the protected memory.

Also note that the crash occurs not only on start but also on hotplug.

Thanks,
Pierre

--
Pierre Morel
IBM Lab Boeblingen


Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature

2020-06-29 Thread Pierre Morel




On 2020-06-29 15:44, Cornelia Huck wrote:

On Mon, 29 Jun 2020 15:14:04 +0200
Pierre Morel  wrote:


On 2020-06-19 11:20, Cornelia Huck wrote:

On Thu, 18 Jun 2020 00:29:56 +0200
Halil Pasic  wrote:
   

On Wed, 17 Jun 2020 12:43:57 +0200
Pierre Morel  wrote:



@@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev)
if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
return 0;
   
+	if (arch_needs_virtio_iommu_platform(dev) &&
+	    !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
+		dev_warn(&dev->dev,
+			 "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");


[Side note: wasn't there a patch renaming this bit on the list? I think
this name is only kept for userspace compat.]


Sorry, I do not understand what you expect from me.
On which mailing list do you think there is such a patch?





I'm not sure, divulging the current Linux name of this feature bit is a
good idea, but if everybody else is fine with this, I don't care that


Not sure if that feature name will ever change, as it is exported in
headers. At most, we might want to add the new ACCESS_PLATFORM define
and keep the old one, but that would still mean some churn.
   

much. An alternative would be:
"virtio: device falsely claims to have full access to the memory,
aborting the device"


"virtio: device does not work with limited memory access" ?

But no issue with keeping the current message.
   


If it is OK, I would like to specify that the arch is responsible for
accepting the device or not.
The reason why the device is not accepted without IOMMU_PLATFORM is
arch-specific.


Hm, I'd think the reason is always the same (the device cannot access
the memory directly), just the way to figure out whether that is the
case or not is arch-specific, as with so many other things. No real
need to go into detail here, I think.



As you like; I will change the message to:
"virtio: device does not work with limited memory access"

Regards,

Pierre


--
Pierre Morel
IBM Lab Boeblingen


Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature

2020-06-29 Thread Pierre Morel




On 2020-06-29 17:57, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote:

An architecture protecting the guest memory against unauthorized host
access may want to enforce VIRTIO I/O device protection through the
use of VIRTIO_F_IOMMU_PLATFORM.

Let's give a chance to the architecture to accept or not devices
without VIRTIO_F_IOMMU_PLATFORM.

Signed-off-by: Pierre Morel 
Acked-by: Jason Wang 
Acked-by: Christian Borntraeger 
---
  arch/s390/mm/init.c |  6 ++
  drivers/virtio/virtio.c | 22 ++
  include/linux/virtio.h  |  2 ++
  3 files changed, 30 insertions(+)

diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 6dc7c3b60ef6..215070c03226 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -45,6 +45,7 @@
  #include 
  #include 
  #include 
+#include 
  
  pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir);
  
@@ -161,6 +162,11 @@ bool force_dma_unencrypted(struct device *dev)

return is_prot_virt_guest();
  }
  
+int arch_needs_virtio_iommu_platform(struct virtio_device *dev)

+{
+   return is_prot_virt_guest();
+}
+
  /* protected virtualization */
  static void pv_init(void)
  {
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index a977e32a88f2..aa8e01104f86 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -167,6 +167,21 @@ void virtio_add_status(struct virtio_device *dev, unsigned 
int status)
  }
  EXPORT_SYMBOL_GPL(virtio_add_status);
  
+/*

+ * arch_needs_virtio_iommu_platform - provide arch specific hook when 
finalizing
+ *   features for VIRTIO device dev
+ * @dev: the VIRTIO device being added
+ *
+ * Permits the platform to provide architecture specific functionality when
+ * devices features are finalized. This is the default implementation.
+ * Architecture implementations can override this.
+ */
+
+int __weak arch_needs_virtio_iommu_platform(struct virtio_device *dev)
+{
+   return 0;
+}
+
  int virtio_finalize_features(struct virtio_device *dev)
  {
int ret = dev->config->finalize_features(dev);
@@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev)
if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
return 0;
  
+	if (arch_needs_virtio_iommu_platform(dev) &&
+	    !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
+		dev_warn(&dev->dev,
+			 "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");
+   return -ENODEV;
+   }
+
virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
status = dev->config->get_status(dev);
if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {


Well don't you need to check it *before* VIRTIO_F_VERSION_1, not after?


Yes, you are right,

Thanks,

Pierre

--
Pierre Morel
IBM Lab Boeblingen


Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature

2020-06-29 Thread Michael S. Tsirkin
On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote:
> An architecture protecting the guest memory against unauthorized host
> access may want to enforce VIRTIO I/O device protection through the
> use of VIRTIO_F_IOMMU_PLATFORM.
> Let's give a chance to the architecture to accept or not devices
> without VIRTIO_F_IOMMU_PLATFORM.

I agree it's a bit misleading. Protection is enforced by memory
encryption, you can't trust the hypervisor to report the bit correctly
> so using that as a security measure would be pointless.
The real gain here is that broken configs are easier to
debug.

Here's an attempt at a better description:

On some architectures, guest knows that VIRTIO_F_IOMMU_PLATFORM is
required for virtio to function: e.g. this is the case on s390 protected
virt guests, since otherwise guest passes encrypted guest memory to 
devices,
which the device can't read. Without VIRTIO_F_IOMMU_PLATFORM the
result is that affected memory (or even a whole page containing it)
is corrupted. Detect and fail probe instead - that is easier
to debug.

however, now that we have described what it is (hypervisor
misconfiguration) I ask a question: can we be sure this will never
ever work? E.g. what if some future hypervisor gains ability to
access the protected guest memory in some abstractly secure manner?
We are blocking this here, and it's hard to predict the future,
and a broken hypervisor can always find ways to crash the guest ...

IMHO it would be safer to just print a warning.
What do you think?



> 
> Signed-off-by: Pierre Morel 
> Acked-by: Jason Wang 
> Acked-by: Christian Borntraeger 
> ---
>  arch/s390/mm/init.c |  6 ++
>  drivers/virtio/virtio.c | 22 ++
>  include/linux/virtio.h  |  2 ++
>  3 files changed, 30 insertions(+)
> 
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 6dc7c3b60ef6..215070c03226 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -45,6 +45,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir);
>  
> @@ -161,6 +162,11 @@ bool force_dma_unencrypted(struct device *dev)
>   return is_prot_virt_guest();
>  }
>  
> +int arch_needs_virtio_iommu_platform(struct virtio_device *dev)
> +{
> + return is_prot_virt_guest();
> +}
> +
>  /* protected virtualization */
>  static void pv_init(void)
>  {
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index a977e32a88f2..aa8e01104f86 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -167,6 +167,21 @@ void virtio_add_status(struct virtio_device *dev, 
> unsigned int status)
>  }
>  EXPORT_SYMBOL_GPL(virtio_add_status);
>  
> +/*
> + * arch_needs_virtio_iommu_platform - provide arch specific hook when 
> finalizing
> + * features for VIRTIO device dev
> + * @dev: the VIRTIO device being added
> + *
> + * Permits the platform to provide architecture specific functionality when
> + * devices features are finalized. This is the default implementation.
> + * Architecture implementations can override this.
> + */
> +
> +int __weak arch_needs_virtio_iommu_platform(struct virtio_device *dev)
> +{
> + return 0;
> +}
> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>   int ret = dev->config->finalize_features(dev);
> @@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev)
>   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>   return 0;
>  
> + if (arch_needs_virtio_iommu_platform(dev) &&
> + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> +	dev_warn(&dev->dev,
> +		 "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");
> + return -ENODEV;
> + }
> +
>   virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
>   status = dev->config->get_status(dev);
>   if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index a493eac08393..e8526ae3463e 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -195,4 +195,6 @@ void unregister_virtio_driver(struct virtio_driver *drv);
>  #define module_virtio_driver(__virtio_driver) \
>   module_driver(__virtio_driver, register_virtio_driver, \
>   unregister_virtio_driver)
> +
> +int arch_needs_virtio_iommu_platform(struct virtio_device *dev);
>  #endif /* _LINUX_VIRTIO_H */
> -- 
> 2.25.1



Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature

2020-06-29 Thread Cornelia Huck
On Mon, 29 Jun 2020 11:57:14 -0400
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote:
> > An architecture protecting the guest memory against unauthorized host
> > access may want to enforce VIRTIO I/O device protection through the
> > use of VIRTIO_F_IOMMU_PLATFORM.
> > 
> > Let's give a chance to the architecture to accept or not devices
> > without VIRTIO_F_IOMMU_PLATFORM.
> > 
> > Signed-off-by: Pierre Morel 
> > Acked-by: Jason Wang 
> > Acked-by: Christian Borntraeger 
> > ---
> >  arch/s390/mm/init.c |  6 ++
> >  drivers/virtio/virtio.c | 22 ++
> >  include/linux/virtio.h  |  2 ++
> >  3 files changed, 30 insertions(+)

> > @@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev)
> > if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
> > return 0;
> >  
> > +   if (arch_needs_virtio_iommu_platform(dev) &&
> > +   !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> > +   dev_warn(&dev->dev,
> > +"virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");
> > +   return -ENODEV;
> > +   }
> > +
> > virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> > status = dev->config->get_status(dev);
> > if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {  
> 
> Well don't you need to check it *before* VIRTIO_F_VERSION_1, not after?

But it's only available with VERSION_1 anyway, isn't it? So it probably
also needs to fail when this feature is needed if VERSION_1 has not been
negotiated, I think.
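
Putting Michael's and Cornelia's points together, the reordering being
discussed would look roughly like this (a sketch, not an actual patch):

int virtio_finalize_features(struct virtio_device *dev)
{
	int ret = dev->config->finalize_features(dev);

	if (ret)
		return ret;

	/* Check before the VERSION_1 early return, and require VERSION_1
	 * itself when the arch needs restricted device access. */
	if (arch_needs_virtio_iommu_platform(dev) &&
	    !(virtio_has_feature(dev, VIRTIO_F_VERSION_1) &&
	      virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM))) {
		dev_warn(&dev->dev,
			 "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");
		return -ENODEV;
	}

	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
		return 0;

	/* ... FEATURES_OK handling as in the patch above ... */
	return 0;
}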



Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature

2020-06-29 Thread Michael S. Tsirkin
On Wed, Jun 17, 2020 at 12:43:57PM +0200, Pierre Morel wrote:
> An architecture protecting the guest memory against unauthorized host
> access may want to enforce VIRTIO I/O device protection through the
> use of VIRTIO_F_IOMMU_PLATFORM.
> 
> Let's give a chance to the architecture to accept or not devices
> without VIRTIO_F_IOMMU_PLATFORM.
> 
> Signed-off-by: Pierre Morel 
> Acked-by: Jason Wang 
> Acked-by: Christian Borntraeger 
> ---
>  arch/s390/mm/init.c |  6 ++
>  drivers/virtio/virtio.c | 22 ++
>  include/linux/virtio.h  |  2 ++
>  3 files changed, 30 insertions(+)
> 
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 6dc7c3b60ef6..215070c03226 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -45,6 +45,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir);
>  
> @@ -161,6 +162,11 @@ bool force_dma_unencrypted(struct device *dev)
>   return is_prot_virt_guest();
>  }
>  
> +int arch_needs_virtio_iommu_platform(struct virtio_device *dev)
> +{
> + return is_prot_virt_guest();
> +}
> +
>  /* protected virtualization */
>  static void pv_init(void)
>  {
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index a977e32a88f2..aa8e01104f86 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -167,6 +167,21 @@ void virtio_add_status(struct virtio_device *dev, 
> unsigned int status)
>  }
>  EXPORT_SYMBOL_GPL(virtio_add_status);
>  
> +/*
> + * arch_needs_virtio_iommu_platform - provide arch specific hook when 
> finalizing
> + * features for VIRTIO device dev
> + * @dev: the VIRTIO device being added
> + *
> + * Permits the platform to provide architecture specific functionality when
> + * devices features are finalized. This is the default implementation.
> + * Architecture implementations can override this.
> + */
> +
> +int __weak arch_needs_virtio_iommu_platform(struct virtio_device *dev)
> +{
> + return 0;
> +}
> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>   int ret = dev->config->finalize_features(dev);
> @@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev)
>   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>   return 0;
>  
> + if (arch_needs_virtio_iommu_platform(dev) &&
> + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> +	dev_warn(&dev->dev,
> +	 "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");
> + return -ENODEV;
> + }
> +
>   virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
>   status = dev->config->get_status(dev);
>   if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {

Well don't you need to check it *before* VIRTIO_F_VERSION_1, not after?



> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index a493eac08393..e8526ae3463e 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -195,4 +195,6 @@ void unregister_virtio_driver(struct virtio_driver *drv);
>  #define module_virtio_driver(__virtio_driver) \
>   module_driver(__virtio_driver, register_virtio_driver, \
>   unregister_virtio_driver)
> +
> +int arch_needs_virtio_iommu_platform(struct virtio_device *dev);
>  #endif /* _LINUX_VIRTIO_H */
> -- 
> 2.25.1



Re: [PATCH RFC 4/5] vhost-vdpa: support IOTLB batching hints

2020-06-29 Thread Michael S. Tsirkin
On Mon, Jun 29, 2020 at 05:26:03PM +0800, Jason Wang wrote:
> 
> On 2020/6/28 5:58 PM, Michael S. Tsirkin wrote:
> > On Thu, Jun 18, 2020 at 01:56:25PM +0800, Jason Wang wrote:
> > > This patch extends the vhost IOTLB API to accept batch updating hints
> > > from userspace. When userspace wants to update the device IOTLB in a
> > > batch, it may do:
> > > 
> > > 1) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_BEGIN flag
> > > 2) Perform a batch of IOTLB updating via VHOST_IOTLB_UPDATE/INVALIDATE
> > > 3) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_END flag
> > As long as we are extending the interface,
> > is there some way we could cut down the number of system calls needed
> > here?
> 
> 
> I'm not sure it's worth doing that since usually we only have fewer than 10
> regions.


Well with a guest iommu I'm guessing it can go up significantly, right?

> A possible method is to carry multiple vhost_iotlb_msg entries in one system
> call.

That's an option.
Also, can kernel handle a batch that is too large by applying
parts of it iteratively?
Or must all changes take place atomically after BATCH_END?
If atomically, we might need to limit batch size to make
sure kernel can keep a copy in memory.


> 
> > 
> > 
> > > Vhost-vdpa may decide to batch the IOMMU/IOTLB updating in step 3 when
> > > the vDPA device supports the set_map() op. This is useful for vDPA devices
> > > that want to know all the mappings to tweak their own DMA translation
> > > logic.
> > > 
> > > For vDPA devices that don't require set_map(), there is no behavior change.
> > > 
> > > This capability is advertised via VHOST_BACKEND_F_IOTLB_BATCH capability.
> > > 
> > > Signed-off-by: Jason Wang 
> > > ---
> > >   drivers/vhost/vdpa.c | 30 +++---
> > >   include/uapi/linux/vhost.h   |  2 ++
> > >   include/uapi/linux/vhost_types.h |  7 +++
> > >   3 files changed, 32 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > index 453057421f80..8f624bbafee7 100644
> > > --- a/drivers/vhost/vdpa.c
> > > +++ b/drivers/vhost/vdpa.c
> > > @@ -56,7 +56,9 @@ enum {
> > >   };
> > >   enum {
> > > - VHOST_VDPA_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
> > > + VHOST_VDPA_BACKEND_FEATURES =
> > > + (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) |
> > > + (1ULL << VHOST_BACKEND_F_IOTLB_BATCH),
> > >   };
> > >   /* Currently, only network backend w/o multiqueue is supported. */
> > > @@ -77,6 +79,7 @@ struct vhost_vdpa {
> > >   int virtio_id;
> > >   int minor;
> > >   struct eventfd_ctx *config_ctx;
> > > + int in_batch;
> > >   };
> > >   static DEFINE_IDA(vhost_vdpa_ida);
> > > @@ -125,6 +128,7 @@ static void vhost_vdpa_reset(struct vhost_vdpa *v)
> > >   const struct vdpa_config_ops *ops = vdpa->config;
> > >   ops->set_status(vdpa, 0);
> > > + v->in_batch = 0;
> > >   }
> > >   static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user 
> > > *argp)
> > > @@ -540,9 +544,10 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
> > >   if (ops->dma_map)
> > >   r = ops->dma_map(vdpa, iova, size, pa, perm);
> > > - else if (ops->set_map)
> > > - r = ops->set_map(vdpa, dev->iotlb);
> > > - else
> > > + else if (ops->set_map) {
> > > + if (!v->in_batch)
> > > + r = ops->set_map(vdpa, dev->iotlb);
> > > + } else
> > >   r = iommu_map(v->domain, iova, pa, size,
> > > perm_to_iommu_flags(perm));
> > > @@ -559,9 +564,10 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, 
> > > u64 iova, u64 size)
> > >   if (ops->dma_map)
> > >   ops->dma_unmap(vdpa, iova, size);
> > > - else if (ops->set_map)
> > > - ops->set_map(vdpa, dev->iotlb);
> > > - else
> > > + else if (ops->set_map) {
> > > + if (!v->in_batch)
> > > + ops->set_map(vdpa, dev->iotlb);
> > > + } else
> > >   iommu_unmap(v->domain, iova, size);
> > >   }
> > > @@ -655,6 +661,8 @@ static int vhost_vdpa_process_iotlb_msg(struct 
> > > vhost_dev *dev,
> > >   struct vhost_iotlb_msg *msg)
> > >   {
> > >   struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, 
> > > vdev);
> > > + struct vdpa_device *vdpa = v->vdpa;
> > > + const struct vdpa_config_ops *ops = vdpa->config;
> > >   int r = 0;
> > >   r = vhost_dev_check_owner(dev);
> > > @@ -668,6 +676,14 @@ static int vhost_vdpa_process_iotlb_msg(struct 
> > > vhost_dev *dev,
> > >   case VHOST_IOTLB_INVALIDATE:
> > >   vhost_vdpa_unmap(v, msg->iova, msg->size);
> > >   break;
> > > + case VHOST_IOTLB_BATCH_BEGIN:
> > > + v->in_batch = true;
> > > + break;
> > > + case VHOST_IOTLB_BATCH_END:
> > > + if (v->in_batch && ops->set_map)
> > > + ops->set_map(vdpa, dev->iotlb);
> > > + v->in_batch = 

Re: [RFC 0/3] virtio: NUMA-aware memory allocation

2020-06-29 Thread Michael S. Tsirkin
On Mon, Jun 29, 2020 at 10:26:46AM +0100, Stefan Hajnoczi wrote:
> On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote:
> > 
> > On 2020/6/25 9:57 PM, Stefan Hajnoczi wrote:
> > > These patches are not ready to be merged because I was unable to measure a
> > > performance improvement. I'm publishing them so they are archived in case
> > > someone picks up this work again in the future.
> > > 
> > > The goal of these patches is to allocate virtqueues and driver state from 
> > > the
> > > device's NUMA node for optimal memory access latency. Only guests with a 
> > > vNUMA
> > > topology and virtio devices spread across vNUMA nodes benefit from this.  
> > > In
> > > other cases the memory placement is fine and we don't need to take NUMA 
> > > into
> > > account inside the guest.
> > > 
> > > These patches could be extended to virtio_net.ko and other devices in the
> > > future. I only tested virtio_blk.ko.
> > > 
> > > The benchmark configuration was designed to trigger worst-case NUMA 
> > > placement:
> > >   * Physical NVMe storage controller on host NUMA node 0

It's possible that NUMA is not such a big deal for NVMe.
And it's possible that the BIOS misconfigures ACPI, reporting NUMA placement
incorrectly.
I think that the best thing to try is to use a ramdisk
on a specific numa node.

> > >   * IOThread pinned to host NUMA node 0
> > >   * virtio-blk-pci device in vNUMA node 1
> > >   * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0
> > >   * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1
> > > 
> > > The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host 
> > > NUMA
> > > node 1) so that memory is in the wrong NUMA node for the virtio-blk-pci
> > > device.
> > > Applying these patches fixes memory placement so that virtqueues and 
> > > driver
> > > state is allocated in vNUMA node 1 where the virtio-blk-pci device is 
> > > located.
> > > 
> > > The fio 4KB randread benchmark results do not show a significant 
> > > improvement:
> > > 
> > > Name  IOPS   Error
> > > virtio-blk        42373.79 ± 0.54%
> > > virtio-blk-numa   42517.07 ± 0.79%
> > 
> > 
> > I remember I did something similar in vhost by using page_to_nid() for
> > descriptor ring. And I get little improvement as shown here.
> > 
> > Michael reminded me that it was probably because all data were cached. So I
> > suspect the test lacks sufficient stress on the cache ...
> 
> Yes, that sounds likely. If there's no real-world performance
> improvement then I'm happy to leave these patches unmerged.
> 
> Stefan


Well that was for vhost though. This is virtio, which is different.
Doesn't some benchmark put pressure on the CPU cache?


I kind of feel there should be a difference, and the fact there isn't
means there's some other bottleneck somewhere. Might be worth
figuring out.

-- 
MST


Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature

2020-06-29 Thread Cornelia Huck
On Mon, 29 Jun 2020 15:14:04 +0200
Pierre Morel  wrote:

> On 2020-06-19 11:20, Cornelia Huck wrote:
> > On Thu, 18 Jun 2020 00:29:56 +0200
> > Halil Pasic  wrote:
> >   
> >> On Wed, 17 Jun 2020 12:43:57 +0200
> >> Pierre Morel  wrote:  

> >>> @@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device 
> >>> *dev)
> >>>   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
> >>>   return 0;
> >>>   
> >>> + if (arch_needs_virtio_iommu_platform(dev) &&
> >>> + !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
> >>> +	dev_warn(&dev->dev,
> >>> +	 "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");

[Side note: wasn't there a patch renaming this bit on the list? I think
this name is only kept for userspace compat.]

> >>
> >> I'm not sure, divulging the current Linux name of this feature bit is a
> >> good idea, but if everybody else is fine with this, I don't care that  
> > 
> > Not sure if that feature name will ever change, as it is exported in
> > headers. At most, we might want to add the new ACCESS_PLATFORM define
> > and keep the old one, but that would still mean some churn.
> >   
> >> much. An alternative would be:
> >> "virtio: device falsely claims to have full access to the memory,
> >> aborting the device"  
> > 
> > "virtio: device does not work with limited memory access" ?
> > 
> > But no issue with keeping the current message.
> >   
> 
> If it is OK, I would like to specify that the arch is responsible for
> accepting the device or not.
> The reason why the device is not accepted without IOMMU_PLATFORM is
> arch-specific.

Hm, I'd think the reason is always the same (the device cannot access
the memory directly), just the way to figure out whether that is the
case or not is arch-specific, as with so many other things. No real
need to go into detail here, I think.



Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature

2020-06-29 Thread Pierre Morel




On 2020-06-18 00:29, Halil Pasic wrote:

On Wed, 17 Jun 2020 12:43:57 +0200
Pierre Morel  wrote:


An architecture protecting the guest memory against unauthorized host
access may want to enforce VIRTIO I/O device protection through the
use of VIRTIO_F_IOMMU_PLATFORM.

Let's give a chance to the architecture to accept or not devices
without VIRTIO_F_IOMMU_PLATFORM.


[..]


I'm still not really satisfied with your commit message, furthermore
I did some thinking about the abstraction you introduce here. I will
give a short analysis of that, but first things first. Your patch does
the job of preventing calamity, and the details can be changed any time,
thus:

Acked-by: Halil Pasic 


Thanks,

Connie already answered the other points you raised and I have nothing
to add to that.


Regards,
Pierre


--
Pierre Morel
IBM Lab Boeblingen


Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature

2020-06-29 Thread Pierre Morel




On 2020-06-19 14:02, Halil Pasic wrote:

On Fri, 19 Jun 2020 11:20:51 +0200
Cornelia Huck  wrote:


+   if (arch_needs_virtio_iommu_platform(dev) &&
+   !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
+   dev_warn(&dev->dev,
+"virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");


I'm not sure, divulging the current Linux name of this feature bit is a
good idea, but if everybody else is fine with this, I don't care that


Not sure if that feature name will ever change, as it is exported in
headers. At most, we might want to add the new ACCESS_PLATFORM define
and keep the old one, but that would still mean some churn.


much. An alternative would be:
"virtio: device falsely claims to have full access to the memory,
aborting the device"


"virtio: device does not work with limited memory access" ?

But no issue with keeping the current message.


I think I prefer Conny's version, but no strong feelings here.




The reason why the device is not accepted without IOMMU_PLATFORM is
arch-specific; I think it should be clearly stated.

If no strong opposition...

Thanks,
Pierre


--
Pierre Morel
IBM Lab Boeblingen


Re: [PATCH v3 1/1] s390: virtio: let arch accept devices without IOMMU feature

2020-06-29 Thread Pierre Morel




On 2020-06-19 11:20, Cornelia Huck wrote:

On Thu, 18 Jun 2020 00:29:56 +0200
Halil Pasic  wrote:


On Wed, 17 Jun 2020 12:43:57 +0200
Pierre Morel  wrote:

...


But since this can be rewritten any time, let's go with the option
people already agree with, instead of more discussion.


Yes, there's nothing wrong with the patch as-is.

Acked-by: Cornelia Huck 


Thanks,




Which tree should this go through? Virtio? s390?


Just another question. Do we want this backported? Do we need cc stable?


It does change behaviour of virtio-ccw devices; but then, it only
fences off configurations that would not have worked anyway.
Distributions should probably pick this; but I do not consider it
strictly a "fix" (more a mitigation for broken configurations), so I'm
not sure whether stable applies.


[..]



  int virtio_finalize_features(struct virtio_device *dev)
  {
int ret = dev->config->finalize_features(dev);
@@ -179,6 +194,13 @@ int virtio_finalize_features(struct virtio_device *dev)
if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
return 0;
  
+	if (arch_needs_virtio_iommu_platform(dev) &&
+	    !virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
+		dev_warn(&dev->dev,
+			 "virtio: device must provide VIRTIO_F_IOMMU_PLATFORM\n");


I'm not sure, divulging the current Linux name of this feature bit is a
good idea, but if everybody else is fine with this, I don't care that


Not sure if that feature name will ever change, as it is exported in
headers. At most, we might want to add the new ACCESS_PLATFORM define
and keep the old one, but that would still mean some churn.


much. An alternative would be:
"virtio: device falsely claims to have full access to the memory,
aborting the device"


"virtio: device does not work with limited memory access" ?

But no issue with keeping the current message.



If it is OK, I would like to specify that the arch is responsible for
accepting the device or not.
The reason why the device is not accepted without IOMMU_PLATFORM is
arch-specific.


Regards,
Pierre

--
Pierre Morel
IBM Lab Boeblingen


Re: [RFC 0/3] virtio: NUMA-aware memory allocation

2020-06-29 Thread Stefan Hajnoczi
On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote:
> 
> On 2020/6/25 9:57 PM, Stefan Hajnoczi wrote:
> > These patches are not ready to be merged because I was unable to measure a
> > performance improvement. I'm publishing them so they are archived in case
> > someone picks up this work again in the future.
> > 
> > The goal of these patches is to allocate virtqueues and driver state from 
> > the
> > device's NUMA node for optimal memory access latency. Only guests with a 
> > vNUMA
> > topology and virtio devices spread across vNUMA nodes benefit from this.  In
> > other cases the memory placement is fine and we don't need to take NUMA into
> > account inside the guest.
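
For illustration of the approach (this is a sketch, not the actual
patches; alloc_on_device_node() is a hypothetical helper), per-queue
allocations would key off the device's NUMA node:

#include <linux/device.h>
#include <linux/numa.h>
#include <linux/slab.h>
#include <linux/topology.h>
#include <linux/virtio.h>

/* Hypothetical helper: allocate driver/queue state on the NUMA node of the
 * underlying device (for virtio-pci, the parent PCI device). */
static void *alloc_on_device_node(struct virtio_device *vdev, size_t size)
{
	int node = dev_to_node(vdev->dev.parent);

	if (node == NUMA_NO_NODE)
		node = numa_mem_id();

	return kzalloc_node(size, GFP_KERNEL, node);
}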
> > 
> > These patches could be extended to virtio_net.ko and other devices in the
> > future. I only tested virtio_blk.ko.
> > 
> > The benchmark configuration was designed to trigger worst-case NUMA 
> > placement:
> >   * Physical NVMe storage controller on host NUMA node 0
> >   * IOThread pinned to host NUMA node 0
> >   * virtio-blk-pci device in vNUMA node 1
> >   * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0
> >   * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1
> > 
> > The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host NUMA
> > node 1) so that memory is in the wrong NUMA node for the virtio-blk-pci
> > device.
> > Applying these patches fixes memory placement so that virtqueues and driver
> > state is allocated in vNUMA node 1 where the virtio-blk-pci device is 
> > located.
> > 
> > The fio 4KB randread benchmark results do not show a significant 
> > improvement:
> > 
> > Name  IOPS   Error
> > virtio-blk        42373.79 ± 0.54%
> > virtio-blk-numa   42517.07 ± 0.79%
> 
> 
> I remember I did something similar in vhost by using page_to_nid() for
> descriptor ring. And I get little improvement as shown here.
> 
> Michael reminded me that it was probably because all data were cached. So I
> suspect the test lacks sufficient stress on the cache ...

Yes, that sounds likely. If there's no real-world performance
improvement then I'm happy to leave these patches unmerged.

Stefan



Re: [PATCH RFC 4/5] vhost-vdpa: support IOTLB batching hints

2020-06-29 Thread Jason Wang


On 2020/6/28 5:58 PM, Michael S. Tsirkin wrote:

On Thu, Jun 18, 2020 at 01:56:25PM +0800, Jason Wang wrote:

This patch extends the vhost IOTLB API to accept batch updating hints
from userspace. When userspace wants to update the device IOTLB in a
batch, it may do:

1) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_BEGIN flag
2) Perform a batch of IOTLB updating via VHOST_IOTLB_UPDATE/INVALIDATE
3) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_END flag

As long as we are extending the interface,
is there some way we could cut down the number of system calls needed
here?



I'm not sure it's worth doing that since usually we only have fewer than
10 regions.


A possible method is to carry multiple vhost_iotlb_msg entries in one
system call.







Vhost-vdpa may decide to batch the IOMMU/IOTLB updating in step 3 when
the vDPA device supports the set_map() op. This is useful for vDPA devices
that want to know all the mappings to tweak their own DMA translation
logic.

For vDPA devices that don't require set_map(), there is no behavior change.

This capability is advertised via VHOST_BACKEND_F_IOTLB_BATCH capability.

Signed-off-by: Jason Wang 
---
  drivers/vhost/vdpa.c | 30 +++---
  include/uapi/linux/vhost.h   |  2 ++
  include/uapi/linux/vhost_types.h |  7 +++
  3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 453057421f80..8f624bbafee7 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -56,7 +56,9 @@ enum {
  };
  
  enum {

-   VHOST_VDPA_BACKEND_FEATURES = (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)
+   VHOST_VDPA_BACKEND_FEATURES =
+   (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2) |
+   (1ULL << VHOST_BACKEND_F_IOTLB_BATCH),
  };
  
  /* Currently, only network backend w/o multiqueue is supported. */

@@ -77,6 +79,7 @@ struct vhost_vdpa {
int virtio_id;
int minor;
struct eventfd_ctx *config_ctx;
+   int in_batch;
  };
  
  static DEFINE_IDA(vhost_vdpa_ida);

@@ -125,6 +128,7 @@ static void vhost_vdpa_reset(struct vhost_vdpa *v)
const struct vdpa_config_ops *ops = vdpa->config;
  
  	ops->set_status(vdpa, 0);

+   v->in_batch = 0;
  }
  
  static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)

@@ -540,9 +544,10 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
  
  	if (ops->dma_map)

r = ops->dma_map(vdpa, iova, size, pa, perm);
-   else if (ops->set_map)
-   r = ops->set_map(vdpa, dev->iotlb);
-   else
+   else if (ops->set_map) {
+   if (!v->in_batch)
+   r = ops->set_map(vdpa, dev->iotlb);
+   } else
r = iommu_map(v->domain, iova, pa, size,
  perm_to_iommu_flags(perm));
  
@@ -559,9 +564,10 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size)
  
  	if (ops->dma_map)

ops->dma_unmap(vdpa, iova, size);
-   else if (ops->set_map)
-   ops->set_map(vdpa, dev->iotlb);
-   else
+   else if (ops->set_map) {
+   if (!v->in_batch)
+   ops->set_map(vdpa, dev->iotlb);
+   } else
iommu_unmap(v->domain, iova, size);
  }
  
@@ -655,6 +661,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,

struct vhost_iotlb_msg *msg)
  {
struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
int r = 0;
  
  	r = vhost_dev_check_owner(dev);

@@ -668,6 +676,14 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev 
*dev,
case VHOST_IOTLB_INVALIDATE:
vhost_vdpa_unmap(v, msg->iova, msg->size);
break;
+   case VHOST_IOTLB_BATCH_BEGIN:
+   v->in_batch = true;
+   break;
+   case VHOST_IOTLB_BATCH_END:
+   if (v->in_batch && ops->set_map)
+   ops->set_map(vdpa, dev->iotlb);
+   v->in_batch = false;
+   break;
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 0c2349612e77..565da96f55d5 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -91,6 +91,8 @@
  
  /* Use message type V2 */

  #define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
+/* IOTLB can accpet batching hints */

typo



Will fix.





+#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
  
  #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)

  #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 669457ce5c48..5c12faffdde9 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -60,6 +60,13 @@ struct vhost_iotlb_msg {
  

RE: [PATCH] xen: introduce xen_vring_use_dma

2020-06-29 Thread Peng Fan
> Subject: Re: [PATCH] xen: introduce xen_vring_use_dma
> 
> On Mon, Jun 29, 2020 at 06:25:41AM +, Peng Fan wrote:
> > > > > > Anyway, re-reading the last messages of the original thread
> > > > > > [1], it looks like Peng had a clear idea on how to fix the general 
> > > > > > issue.
> > > > > > Peng, what happened with that?
> > > >
> > > > We shrank the rpmsg reserved area to work around the issue.
> > > > So rpmsg still uses the dma apis.
> > > >
> > > > But here I am going to address domu android trusty issue using virtio.
> > >
> > > My suggestion is to first of all fix DMA API so it works properly.
> >
> > Could you please elaborate more details?
> >
> > You mean the DMA API usage of rpmsg? Or xen domu dma_ops?
> >
> > Thanks,
> > Peng.
> 
> Not 100% sure but I think xen dma ops.

Sorry to ask again, but could you explain in more detail what issues you
see with the current xen ARM domu dma_ops? Or with the Dom0 swiotlb xen
dma_ops?

Thanks,
Peng.

> 
> --
> MST



Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-29 Thread Michael S. Tsirkin
On Mon, Jun 29, 2020 at 06:25:41AM +, Peng Fan wrote:
> > > > > Anyway, re-reading the last messages of the original thread [1],
> > > > > it looks like Peng had a clear idea on how to fix the general issue.
> > > > > Peng, what happened with that?
> > >
> > > We shrank the rpmsg reserved area to work around the issue,
> > > so rpmsg still uses the dma apis.
> > >
> > > But here I am going to address the domu android trusty issue using virtio.
> > 
> > My suggestion is to first of all fix DMA API so it works properly.
> 
> Could you please elaborate more details?
> 
> You mean the DMA API usage of rpmsg? Or xen domu dma_ops?
> 
> Thanks,
> Peng. 

Not 100% sure but I think xen dma ops.

-- 
MST



RE: [PATCH] xen: introduce xen_vring_use_dma

2020-06-29 Thread Peng Fan
> Subject: Re: [PATCH] xen: introduce xen_vring_use_dma
> 
> On Mon, Jun 29, 2020 at 03:05:19AM +, Peng Fan wrote:
> > > Subject: Re: [PATCH] xen: introduce xen_vring_use_dma
> > >
> > > On Thu, Jun 25, 2020 at 10:31:27AM -0700, Stefano Stabellini wrote:
> > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote:
> > > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> > > > > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > > > > > > > > Export xen_swiotlb for all platforms using xen swiotlb
> > > > > > > > > >
> > > > > > > > > > Use xen_swiotlb to determine when vring should use dma
> > > > > > > > > > APIs to map the
> > > > > > > > > > ring: when xen_swiotlb is enabled the dma API is required.
> > > > > > > > > > When it is disabled, it is not required.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Peng Fan 
> > > > > > > > >
> > > > > > > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
> > > > > > > > > Xen was there first, but everyone else is using that now.
> > > > > > > >
> > > > > > > > Unfortunately it is complicated and it is not related to
> > > > > > > > VIRTIO_F_IOMMU_PLATFORM :-(
> > > > > > > >
> > > > > > > >
> > > > > > > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to
> > > > > > > > translate foreign mappings (memory coming from other VMs)
> > > > > > > > to physical addresses.
> > > > > > > > On x86, it also uses dma_ops to translate Linux's idea of
> > > > > > > > a physical address into a real physical address (this is
> > > > > > > > unneeded on ARM.)
> > > > > > > >
> > > > > > > >
> > > > > > > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should
> > > > > > > > be used on Xen/x86 always and on Xen/ARM if Linux is Dom0
> > > > > > > > (because it has foreign
> > > > > > > > mappings.) That is why we have the if (xen_domain) return
> > > > > > > > true; in vring_use_dma_api.
> > > > > > >
> > > > > > > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.
> > > > > > >
> > > > > > > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also*
> > > > > > > forces DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.
> > > > > > >
> > > > > > > Unfortunately as a result Xen never got around to properly
> > > > > > > setting VIRTIO_F_IOMMU_PLATFORM.
> > > > > >
> > > > > > I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for
> > > > > > this because the usage of swiotlb_xen is not a property of
> > > > > > virtio,
> > > > >
> > > > >
> > > > > Basically any device without VIRTIO_F_ACCESS_PLATFORM (that is
> > > > > it's name in latest virtio spec, VIRTIO_F_IOMMU_PLATFORM is what
> > > > > linux calls it) is declared as "special, don't follow normal
> > > > > rules for access".
> > > > >
> > > > > So yes swiotlb_xen is not a property of virtio, but what *is* a
> > > > > property of virtio is that it's not special, just a regular
> > > > > device from DMA POV.
> > > >
> > > > I am trying to understand what you meant but I think I am missing
> > > > something.
> > > >
> > > > Are you saying that modern virtio should always have
> > > > VIRTIO_F_ACCESS_PLATFORM, hence use normal dma_ops as any other devices?
> > >
> > > I am saying it's a safe default. Clear VIRTIO_F_ACCESS_PLATFORM if
> > > you have some special needs e.g. you are very sure it's ok to bypass
> > > DMA ops, or you need to support a legacy guest (produced in the
> > > window between virtio 1 support in 2014 and support for
> > > VIRTIO_F_ACCESS_PLATFORM in 2016).
> > >
> > >
> > > > If that is the case, how is it possible that virtio breaks on ARM
> > > > using the default dma_ops? The breakage is not Xen related (except
> > > > that Xen turns dma_ops on). The original message from Peng was:
> > > >
> > > >   vring_map_one_sg -> vring_use_dma_api
> > > >     -> dma_map_page
> > > >       -> __swiotlb_map_page
> > > >         -> swiotlb_map_page
> > > >           -> __dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
> > > >   However we are using per device dma area for rpmsg, phys_to_virt
> > > >   could not return a correct virtual address for virtual address in
> > > >   vmalloc area. Then kernel panic.
> > > >
> > > > I must be missing something. Maybe it is because it has to do with RPMesg?
> > >
> > > I think it's an RPMesg bug, yes
> >
> > The rpmsg bug is another issue: it should not use dma_alloc_coherent for
> > the reserved area, but vmalloc_to_page instead.
> >
> > Anyway, using the dma api here will also trigger the issue.
> >
> > >
> > > >
> > > > > > > > You might have noticed that I missed one possible case above:
> > > > > > > > Xen/ARM DomU :-)
> > > > > > > >
> > > > > > > > Xen/ARM domUs don't need 

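An aside for readers following the "if (xen_domain()) return true;" special
case mentioned several times above: it lives in vring_use_dma_api() in
drivers/virtio/virtio_ring.c. A rough, paraphrased sketch of that logic at the
time of this thread is below (comments reworded here, and the quirk helper has
been renamed across kernel versions, so treat the exact identifiers as
approximate):

/* Paraphrased sketch of drivers/virtio/virtio_ring.c, not a verbatim copy. */
static bool vring_use_dma_api(struct virtio_device *vdev)
{
	/* VIRTIO_F_ACCESS_PLATFORM negotiated: the device is not "special",
	 * so go through the normal DMA API like any other device. */
	if (!virtio_has_iommu_quirk(vdev))
		return true;

	/* Otherwise we are left to guess.  The Xen check predates
	 * VIRTIO_F_ACCESS_PLATFORM: on Xen the DMA API (swiotlb-xen) is
	 * needed for foreign mappings, so DMA ops are forced even when the
	 * feature bit is clear -- which is the behaviour being debated in
	 * this thread. */
	if (xen_domain())
		return true;

	return false;
}
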
Re: [PATCH] xen: introduce xen_vring_use_dma

2020-06-29 Thread Michael S. Tsirkin
On Mon, Jun 29, 2020 at 03:05:19AM +, Peng Fan wrote:
> > Subject: Re: [PATCH] xen: introduce xen_vring_use_dma
> > 
> > On Thu, Jun 25, 2020 at 10:31:27AM -0700, Stefano Stabellini wrote:
> > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 24, 2020 at 02:53:54PM -0700, Stefano Stabellini wrote:
> > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jun 24, 2020 at 10:59:47AM -0700, Stefano Stabellini wrote:
> > > > > > > On Wed, 24 Jun 2020, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Jun 24, 2020 at 05:17:32PM +0800, Peng Fan wrote:
> > > > > > > > > Export xen_swiotlb for all platforms using xen swiotlb
> > > > > > > > >
> > > > > > > > > Use xen_swiotlb to determine when vring should use dma
> > > > > > > > > APIs to map the
> > > > > > > > > ring: when xen_swiotlb is enabled the dma API is required.
> > > > > > > > > When it is disabled, it is not required.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Peng Fan 
> > > > > > > >
> > > > > > > > Isn't there some way to use VIRTIO_F_IOMMU_PLATFORM for this?
> > > > > > > > Xen was there first, but everyone else is using that now.
> > > > > > >
> > > > > > > Unfortunately it is complicated and it is not related to
> > > > > > > VIRTIO_F_IOMMU_PLATFORM :-(
> > > > > > >
> > > > > > >
> > > > > > > The Xen subsystem in Linux uses dma_ops via swiotlb_xen to
> > > > > > > translate foreign mappings (memory coming from other VMs) to
> > > > > > > physical addresses.
> > > > > > > On x86, it also uses dma_ops to translate Linux's idea of a
> > > > > > > physical address into a real physical address (this is
> > > > > > > unneeded on ARM.)
> > > > > > >
> > > > > > >
> > > > > > > So regardless of VIRTIO_F_IOMMU_PLATFORM, dma_ops should be
> > > > > > > used on Xen/x86 always and on Xen/ARM if Linux is Dom0
> > > > > > > (because it has foreign
> > > > > > > mappings.) That is why we have the if (xen_domain) return
> > > > > > > true; in vring_use_dma_api.
> > > > > >
> > > > > > VIRTIO_F_IOMMU_PLATFORM makes guest always use DMA ops.
> > > > > >
> > > > > > Xen hack predates VIRTIO_F_IOMMU_PLATFORM so it *also* forces
> > > > > > DMA ops even if VIRTIO_F_IOMMU_PLATFORM is clear.
> > > > > >
> > > > > > Unfortunately as a result Xen never got around to properly
> > > > > > setting VIRTIO_F_IOMMU_PLATFORM.
> > > > >
> > > > > I don't think VIRTIO_F_IOMMU_PLATFORM would be correct for this
> > > > > because the usage of swiotlb_xen is not a property of virtio,
> > > >
> > > >
> > > > Basically any device without VIRTIO_F_ACCESS_PLATFORM (that is it's
> > > > name in latest virtio spec, VIRTIO_F_IOMMU_PLATFORM is what linux
> > > > calls it) is declared as "special, don't follow normal rules for
> > > > access".
> > > >
> > > > So yes swiotlb_xen is not a property of virtio, but what *is* a
> > > > property of virtio is that it's not special, just a regular device from
> > > > DMA POV.
> > >
> > > I am trying to understand what you meant but I think I am missing
> > > something.
> > >
> > > Are you saying that modern virtio should always have
> > > VIRTIO_F_ACCESS_PLATFORM, hence use normal dma_ops as any other devices?
> > 
> > I am saying it's a safe default. Clear VIRTIO_F_ACCESS_PLATFORM if you have
> > some special needs e.g. you are very sure it's ok to bypass DMA ops, or you
> > need to support a legacy guest (produced in the window between virtio 1
> > support in 2014 and support for VIRTIO_F_ACCESS_PLATFORM in 2016).
> > 
> > 
> > > If that is the case, how is it possible that virtio breaks on ARM
> > > using the default dma_ops? The breakage is not Xen related (except
> > > that Xen turns dma_ops on). The original message from Peng was:
> > >
> > >   vring_map_one_sg -> vring_use_dma_api
> > >     -> dma_map_page
> > >       -> __swiotlb_map_page
> > >         -> swiotlb_map_page
> > >           -> __dma_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
> > >   However we are using per device dma area for rpmsg, phys_to_virt
> > >   could not return a correct virtual address for virtual address in
> > >   vmalloc area. Then kernel panic.
> > >
> > > I must be missing something. Maybe it is because it has to do with RPMesg?
> > 
> > I think it's an RPMesg bug, yes
> 
> The rpmsg bug is another issue: it should not use dma_alloc_coherent for the
> reserved area, but vmalloc_to_page instead.
> 
> Anyway, using the dma api here will also trigger the issue.
> 
> > 
> > >
> > > > > > > You might have noticed that I missed one possible case above:
> > > > > > > Xen/ARM DomU :-)
> > > > > > >
> > > > > > > Xen/ARM domUs don't need swiotlb_xen, it is not even
> > > > > > > initialized. So if
> > > > > > > (xen_domain) return true; would give the wrong answer in that 
> > > > > > > case.
> > > > > > > Linux would end up calling the "normal" dma_ops, not
> > > > > > > swiotlb-xen, and the "normal" dma_ops fail.
> > > >