Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-11-11 Thread Michael S. Tsirkin
On Tue, Nov 10, 2015 at 10:54:21AM -0800, Andy Lutomirski wrote:
> On Nov 10, 2015 7:02 AM, "Michael S. Tsirkin"  wrote:
> >
> > On Sun, Nov 08, 2015 at 12:49:46PM +0100, Joerg Roedel wrote:
> > > On Sun, Nov 08, 2015 at 12:37:47PM +0200, Michael S. Tsirkin wrote:
> > > > I have no problem with that. For example, can we teach
> > > > the DMA API on intel x86 to use PT for virtio by default?
> > > > That would allow merging Andy's patches with
> > > > full compatibility with old guests and hosts.
> > >
> > > Well, the only incompatibility comes from an experimental qemu feature,
> > > more precisely from a bug in that feature's implementation. So why
> > > should we work around that in the kernel? I think it is not too hard to
> > > fix qemu to generate a correct DMAR table which excludes the virtio
> > > devices from iommu translation.
> > >
> > >
> > >   Joerg
> >
> > It's not that easy - you'd have to dedicate some buses
> > for iommu bypass, and teach management tools to only put
> > virtio there - but it's possible.
> >
> > This will absolutely address guests that don't need to set up IOMMU for
> > virtio devices, and virtio that bypasses the IOMMU.
> >
> > But the problem is that we do want to *allow* guests
> > to set up IOMMU for virtio devices.
> > In that case, these are two other usecases:
> >
> > A- monolithic virtio within QEMU:
> > iommu only needed for VFIO ->
> > guest should always use iommu=pt
> > iommu=on works but is just useless overhead.
> >
> > B- modular out of process virtio outside QEMU:
> > iommu needed for VFIO or kernel driver ->
> > guest should use iommu=pt or iommu=on
> > depending on security/performance requirements
> >
> > Note that there could easily be a mix of these in the same system.
> >
> > So for these cases we do need QEMU to specify to guest that IOMMU covers
> > the virtio devices.  Also, once one does this, the default on linux is
> > iommu=on and not pt, which works but ATM is very slow.
> >
> > This poses three problems:
> >
> > 1. How do we address the different needs of A and B?
> >One way would be for virtio to pass the information to guest
> >using some virtio specific way, and have drivers
> >specify what kind of DMA access they want.
> >
> > 2. (Kind of a subset of 1) once we do allow IOMMU, how do we make sure most
> >    guests use the more sensible iommu=pt?
> >
> > 3. Once we do allow IOMMU, how can we keep existing guests working in this
> >    configuration?
> >    Creating different hypervisor configurations depending on guest is very
> >    nasty.
> >    Again, one way would be some virtio specific interface.
> >
> > I'd rather we figured the answers to this before merging Andy's patches
> > because I'm concerned that instead of 1 broken configuration
> > (virtio always bypasses IOMMU) we'll get two bad configurations
> > (in the second one, virtio uses the slow default with no
> > gain in security).
> >
> > Suggestions welcome.
> 
> I think there's still no downside of using my patches, even on x86.
> 
> Old kernels on new QEMU work unless IOMMU is enabled on the host.  I
> think that's the best we can possibly do.
> New kernels work at full speed on old QEMU.

Only if IOMMU is disabled, right?

> New kernels with new QEMU and iommu enabled work slower.  Even newer
> kernels with default passthrough work at full speed, and there's no
> obvious downside to the existence of kernels with just my patches.
> 
> --Andy
> 

I tried to explain the possible downside. Let me try again.  Imagine
that guest kernel notifies hypervisor that it wants IOMMU to actually
work.  This will make old kernel on new QEMU work even with IOMMU
enabled on host - better than "the best we can do" that you described
above.  Specifically, QEMU will assume that if it didn't get
notification, it's an old kernel so it should ignore the IOMMU.

But if we apply your patches this trick won't work.

Without implementing it all, I think the easiest incremental step would
be to teach linux to make passthrough the default when running as a
guest on top of QEMU, put your patches on top. If someone specifies
non passthrough on command line it'll still be broken,
but not too bad.


> >
> > --
> > MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-11-11 Thread Michael S. Tsirkin
On Sat, Oct 31, 2015 at 12:16:12AM +0900, Joerg Roedel wrote:
> On Thu, Oct 29, 2015 at 11:01:41AM +0200, Michael S. Tsirkin wrote:
> > Example: you have a mix of assigned devices and virtio devices. You
> > don't trust your assigned device vendor not to corrupt your memory so
> > you want to limit the damage your assigned device can do to your guest,
> > so you use an IOMMU for that.  Thus existing iommu=pt within guest is out.
> > 
> > But you trust your hypervisor (you have no choice anyway),
> > and you don't want the overhead of tweaking IOMMU
> > on data path for virtio. Thus iommu=on is out too.
> 
> IOMMUs on x86 usually come with an ACPI table that describes which
> IOMMUs are in the system and which devices they translate. So you can
> easily describe all devices there that are not behind an IOMMU.
> 
> The ACPI table is built by the BIOS, and the platform initialization code
> sets the device dma_ops accordingly. If the BIOS provides wrong
> information in the ACPI table this is a platform bug.

It doesn't look like I managed to put the point across.
My point is that an IOMMU is required to do things like
userspace drivers; what we need is a way to express
"there is an IOMMU but it is part of the device itself, use passthrough
 unless your driver is untrusted".

> > I'm not sure what ACPI has to do with it.  It's about a way for guest
> > users to specify whether they want to bypass an IOMMU for a given
> > device.
> 
> We have no way yet to request passthrough-mode per-device from the IOMMU
> drivers, but that can easily be added. But as I see it:
> 
> > By the way, a bunch of code is missing on the QEMU side
> > to make this useful:
> > 1. virtio ignores the iommu
> > 2. vhost user ignores the iommu
> > 3. dataplane ignores the iommu
> > 4. vhost-net ignores the iommu
> > 5. VFIO ignores the iommu
> 
> Qemu does not implement IOMMU translation for virtio devices anyway
> (which is fine), so it just should tell the guest so in the ACPI table
> built to describe the emulated IOMMU.
> 
> 
>   Joerg

This is a short term limitation.




WorldCIST'2016 - Deadline extended: November 22, 2015

2015-11-11 Thread Maria Lemos
We apologize if you receive multiple copies of this email, or if its content
is irrelevant to you. Please forward it to your contacts. Thank you!

-
WorldCIST'16 - 4th World Conference on Information Systems and Technologies 
Recife, PE, Brazil
22nd-24th of March 2016
http://www.aisti.eu/worldcist16/
---


SCOPE

The WorldCist'16 - 4th World Conference on Information Systems and Technologies 
( http://www.aisti.eu/worldcist16/ ), to be held at Recife, PE, Brazil, 22 - 24 
March 2016, is a global forum for researchers and practitioners to present and 
discuss the most recent innovations, trends, results, experiences and concerns 
in the several perspectives of Information Systems and Technologies.

We are pleased to invite you to submit your papers to WorldCist'16. All 
submissions will be reviewed on the basis of relevance, originality, importance 
and clarity.


THEMES

Submitted papers should be related with one or more of the main themes proposed 
for the Conference:

A) Information and Knowledge Management (IKM);
B) Organizational Models and Information Systems (OMIS);
C) Software and Systems Modeling (SSM);
D) Software Systems, Architectures, Applications and Tools (SSAAT);
E) Multimedia Systems and Applications (MSA);
F) Computer Networks, Mobility and Pervasive Systems (CNMPS);
G) Intelligent and Decision Support Systems (IDSS);
H) Big Data Analytics and Applications (BDAA);
I) Human-Computer Interaction (HCI);
J) Health Informatics (HIS);
K) Information Technologies in Education (ITE);
L) Information Technologies in Radiocommunications (ITR).


TYPES OF SUBMISSIONS AND DECISIONS

Four types of papers can be submitted:

- Full paper: Finished or consolidated R&D works, to be included in one of the 
Conference themes. These papers are assigned a 10-page limit.

- Short paper: Ongoing works with relevant preliminary results, open to 
discussion. These papers are assigned a 7-page limit.

- Poster paper: Initial work with relevant ideas, open to discussion. These 
papers are assigned a 4-page limit.

- Company paper: Companies' papers that show practical experience, R&D, 
tools, etc., focused on some topics of the conference. These papers are 
assigned a 4-page limit.

Submitted papers must comply with the format of the Advances in Intelligent 
Systems and Computing Series (see Instructions for Authors at the Springer 
Website or download a DOC example), be written in English, must not have been 
published before, must not be under review for any other conference or 
publication, and must not include any information leading to the authors' 
identification. Therefore, the authors' names, affiliations and bibliographic 
references should not be included in the version for evaluation by the Program 
Committee. This information should only be included in the camera-ready 
version, saved in Word or LaTeX format and also in PDF format. These files 
must be accompanied by the filled-out Consent to Publication form, in a ZIP 
file, and uploaded at the conference management system.

All papers will be subjected to a “double-blind review” by at least two members 
of the Program Committee.

Based on the Program Committee's evaluation, a paper can be rejected or 
accepted by the Conference Chairs. In the latter case, it can be accepted as 
the type originally submitted or as another type. Thus, full papers can be 
accepted as short papers or poster papers only. Similarly, short papers can be 
accepted as poster papers only. In these cases, the authors will be allowed to 
maintain the original number of pages in the camera-ready version.

The authors of accepted poster papers must also build and print a poster to be 
exhibited during the Conference. This poster must follow an A1 or A2 vertical 
format. The Conference can include Work Sessions where these posters are 
presented and orally discussed, with a 5-minute limit per poster.

The authors of accepted full papers will have 15 minutes to present their work 
in a Conference Work Session; approximately 5 minutes of discussion will follow 
each presentation. The authors of accepted short papers and company papers will 
have 11 minutes to present their work in a Conference Work Session; 
approximately 4 minutes of discussion will follow each presentation.


PUBLICATION AND INDEXING

To ensure that a full paper, short paper, poster paper or company paper is 
published in the Proceedings, at least one of the authors must be fully 
registered by the 27th of December 2015, and the paper must comply with the 
suggested layout and page-limit. Additionally, all recommended changes must be 
addressed by the authors before they submit the camera-ready version.

No more than one paper per registration will be published in the Conference 
Proceedings. An extra fee must be paid for publication of additional papers, 
with a maximum of one additional paper per registration.

Full and short papers will be published in 

Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-11-11 Thread Andy Lutomirski
On Wed, Nov 11, 2015 at 2:05 AM, Michael S. Tsirkin  wrote:
> On Tue, Nov 10, 2015 at 10:54:21AM -0800, Andy Lutomirski wrote:
>> On Nov 10, 2015 7:02 AM, "Michael S. Tsirkin"  wrote:
>> >
>> > On Sun, Nov 08, 2015 at 12:49:46PM +0100, Joerg Roedel wrote:
>> > > On Sun, Nov 08, 2015 at 12:37:47PM +0200, Michael S. Tsirkin wrote:
>> > > > I have no problem with that. For example, can we teach
>> > > > the DMA API on intel x86 to use PT for virtio by default?
>> > > > That would allow merging Andy's patches with
>> > > > full compatibility with old guests and hosts.
>> > >
>> > > Well, the only incompatibility comes from an experimental qemu feature,
>> > > more precisely from a bug in that feature's implementation. So why
>> > > should we work around that in the kernel? I think it is not too hard to
>> > > fix qemu to generate a correct DMAR table which excludes the virtio
>> > > devices from iommu translation.
>> > >
>> > >
>> > >   Joerg
>> >
>> > It's not that easy - you'd have to dedicate some buses
>> > for iommu bypass, and teach management tools to only put
>> > virtio there - but it's possible.
>> >
>> > This will absolutely address guests that don't need to set up IOMMU for
>> > virtio devices, and virtio that bypasses the IOMMU.
>> >
>> > But the problem is that we do want to *allow* guests
>> > to set up IOMMU for virtio devices.
>> > In that case, these are two other usecases:
>> >
>> > A- monolithic virtio within QEMU:
>> > iommu only needed for VFIO ->
>> > guest should always use iommu=pt
>> > iommu=on works but is just useless overhead.
>> >
>> > B- modular out of process virtio outside QEMU:
>> > iommu needed for VFIO or kernel driver ->
>> > guest should use iommu=pt or iommu=on
>> > depending on security/performance requirements
>> >
>> > Note that there could easily be a mix of these in the same system.
>> >
>> > So for these cases we do need QEMU to specify to guest that IOMMU covers
>> > the virtio devices.  Also, once one does this, the default on linux is
>> > iommu=on and not pt, which works but ATM is very slow.
>> >
>> > This poses three problems:
>> >
>> > 1. How do we address the different needs of A and B?
>> >One way would be for virtio to pass the information to guest
>> >using some virtio specific way, and have drivers
>> >specify what kind of DMA access they want.
>> >
>> > 2. (Kind of a subset of 1) once we do allow IOMMU, how do we make sure most
>> >    guests use the more sensible iommu=pt?
>> >
>> > 3. Once we do allow IOMMU, how can we keep existing guests working in this
>> >    configuration?
>> >    Creating different hypervisor configurations depending on guest is very
>> >    nasty.
>> >    Again, one way would be some virtio specific interface.
>> >
>> > I'd rather we figured the answers to this before merging Andy's patches
>> > because I'm concerned that instead of 1 broken configuration
>> > (virtio always bypasses IOMMU) we'll get two bad configurations
>> > (in the second one, virtio uses the slow default with no
>> > gain in security).
>> >
>> > Suggestions welcome.
>>
>> I think there's still no downside of using my patches, even on x86.
>>
>> Old kernels on new QEMU work unless IOMMU is enabled on the host.  I
>> think that's the best we can possibly do.
>> New kernels work at full speed on old QEMU.
>
> Only if IOMMU is disabled, right?
>
>> New kernels with new QEMU and iommu enabled work slower.  Even newer
>> kernels with default passthrough work at full speed, and there's no
>> obvious downside to the existence of kernels with just my patches.
>>
>> --Andy
>>
>
> I tried to explain the possible downside. Let me try again.  Imagine
> that guest kernel notifies hypervisor that it wants IOMMU to actually
> work.  This will make old kernel on new QEMU work even with IOMMU
> enabled on host - better than "the best we can do" that you described
> above.  Specifically, QEMU will assume that if it didn't get
> notification, it's an old kernel so it should ignore the IOMMU.

Can you flesh out this trick?

On x86 IIUC the IOMMU more-or-less defaults to passthrough.  If the
kernel wants, it can switch it to a non-passthrough mode.  My patches
cause the virtio driver to do exactly this, except that the host
implementation doesn't actually exist yet, so the patches will instead
have no particular effect.

On powerpc and sparc, we *already* screwed up.  The host already tells
the guest that there's an IOMMU and that it's *enabled* because those
platforms don't have selective IOMMU coverage the way that x86 does.
So we need to work around it.

I think that, if we want fancy virt-friendly IOMMU stuff like you're
talking about, then the right thing to do is to create a virtio bus
instead of pretending to be PCI.  That bus could have a virtio IOMMU
and its own cross-platform enumeration mechanism for devices on the
bus, and everything would be peachy.

In the meantime, 

Re: [PATCH] virtio_ring: Shadow available ring flags & index

2015-11-11 Thread Michael S. Tsirkin
On Tue, Nov 10, 2015 at 04:21:07PM -0800, Venkatesh Srinivas wrote:
> Improves cacheline transfer flow of available ring header.
> 
> Virtqueues are implemented as a pair of rings, one producer->consumer
> avail ring and one consumer->producer used ring; preceding the
> avail ring in memory are two contiguous u16 fields -- avail->flags
> and avail->idx. A producer posts work by writing to avail->idx and
> a consumer reads avail->idx.
> 
> The flags and idx fields only need to be written by a producer CPU
> and only read by a consumer CPU; when the producer and consumer are
> running on different CPUs and the virtio_ring code is structured to
> only have source writes/sink reads, we can continuously transfer the
> avail header cacheline between cores in the 'M' (modified) state. This
> flow optimizes core -> core bandwidth on certain CPUs.
> 
> (see: "Software Optimization Guide for AMD Family 15h Processors",
> Section 11.6; similar language appears in the 10h guide and should
> apply to CPUs w/ exclusive caches, using LLC as a transfer cache)
> 
> Unfortunately the existing virtio_ring code issued reads to the
> avail->idx and read-modify-writes to avail->flags on the producer.
> 
> This change shadows the flags and index fields in producer memory;
> the vring code now reads from the shadows and only ever writes to
> avail->flags and avail->idx, allowing the cacheline to transfer
> core -> core optimally.

Sounds logical, I'll apply this after a  bit of testing
of my own, thanks!

> In a concurrent version of vring_bench, the time required for
> 10,000,000 buffer checkout/returns was reduced by ~2% (average
> across many runs) on an AMD Piledriver (15h) CPU:
> 
> (w/o shadowing):
>  Performance counter stats for './vring_bench':
>  5,451,082,016  L1-dcache-loads
>  ...
>2.221477739 seconds time elapsed
> 
> (w/ shadowing):
>  Performance counter stats for './vring_bench':
>  5,405,701,361  L1-dcache-loads
>  ...
>2.168405376 seconds time elapsed

Could you supply the full command line you used
to test this?

> The further away (in a NUMA sense) virtio producers and consumers are
> from each other, the more we expect to benefit. Physical implementations
> of virtio devices and implementations of virtio where the consumer polls
> vring avail indexes (vhost) should also benefit.
> 
> Signed-off-by: Venkatesh Srinivas 

Here's a similar patch for the ring itself:
https://lkml.org/lkml/2015/9/10/111

Does it help you as well?


> ---
>  drivers/virtio/virtio_ring.c | 46 ++++++++++++++++++++++++++++++++++------------
>  1 file changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 096b857..6262015 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -80,6 +80,12 @@ struct vring_virtqueue {
>   /* Last used index we've seen. */
>   u16 last_used_idx;
>  
> + /* Last written value to avail->flags */
> + u16 avail_flags_shadow;
> +
> + /* Last written value to avail->idx in guest byte order */
> + u16 avail_idx_shadow;
> +
>   /* How to notify other side. FIXME: commonalize hcalls! */
>   bool (*notify)(struct virtqueue *vq);
>  
> @@ -235,13 +241,14 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>  
>   /* Put entry in available array (but don't update avail->idx until they
>* do sync). */
> - avail = virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) & (vq->vring.num - 1);
> + avail = vq->avail_idx_shadow & (vq->vring.num - 1);
>   vq->vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
>  
>   /* Descriptors and available array need to be set before we expose the
>* new available array entries. */
>   virtio_wmb(vq->weak_barriers);
> - vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) + 1);
> + vq->avail_idx_shadow++;
> + vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
>   vq->num_added++;
>  
>   pr_debug("Added buffer head %i to %p\n", head, vq);
> @@ -354,8 +361,8 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
>* event. */
>   virtio_mb(vq->weak_barriers);
>  
> - old = virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) - vq->num_added;
> - new = virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx);
> + old = vq->avail_idx_shadow - vq->num_added;
> + new = vq->avail_idx_shadow;
>   vq->num_added = 0;
>  
>  #ifdef DEBUG
> @@ -510,7 +517,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
>   /* If we expect an interrupt for the next entry, tell host
>* by writing event index and flush out the write before
>* the read in the next get_buf call. */
> - if (!(vq->vring.avail->flags & cpu_to_virtio16(_vq->vdev, VRING_AVAIL_F_NO_INTERRUPT))) {
> + if (!(vq->avail_flags_shadow & 

Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-11-11 Thread David Woodhouse
On Wed, 2015-11-11 at 07:56 -0800, Andy Lutomirski wrote:
> 
> Can you flesh out this trick?
> 
> On x86 IIUC the IOMMU more-or-less defaults to passthrough.  If the
> kernel wants, it can switch it to a non-passthrough mode.  My patches
> cause the virtio driver to do exactly this, except that the host
> implementation doesn't actually exist yet, so the patches will instead
> have no particular effect.

At some level, yes — we're compatible with a 1982 IBM PC and thus the
IOMMU is entirely disabled at boot until the kernel turns it on —
except in TXT mode where we abandon that compatibility.

But no, the virtio driver has *nothing* to do with switching the device
out of passthrough mode. It is either in passthrough mode, or it isn't.

If the VMM *doesn't* expose an IOMMU to the guest, obviously the
devices are in passthrough mode. If the guest kernel doesn't have IOMMU
support enabled, then obviously the devices are in passthrough mode.
And if the ACPI tables exposed to the guest kernel *tell* it that the
virtio devices are not actually behind the IOMMU (which qemu gets
wrong), then it'll be in passthrough mode.

If the IOMMU is exposed, and enabled, and telling the guest kernel that
it *does* cover the virtio devices, then those virtio devices will
*not* be in passthrough mode.

Your choosing to use the DMA API in the virtio device drivers instead of
being buggy has nothing to do with whether the device is actually in
passthrough mode or not. Whether it's in passthrough mode or not, using
the DMA API is technically the right thing to do — because it should
either *do* the translation, or return a 1:1 mapped IOVA, as
appropriate.


> On powerpc and sparc, we *already* screwed up.  The host already tells
> the guest that there's an IOMMU and that it's *enabled* because those
> platforms don't have selective IOMMU coverage the way that x86 does.
> So we need to work around it.

No, we need it on x86 too because once we fix the virtio device driver
bug and make it start using the DMA API, then we start to trip up on
the qemu bug where it lies about which devices are covered by the
IOMMU.

Of course, we still have that same qemu bug w.r.t. assigned devices,
which it *also* claims are behind its IOMMU when they're not...

> I think that, if we want fancy virt-friendly IOMMU stuff like you're
> talking about, then the right thing to do is to create a virtio bus
> instead of pretending to be PCI.  That bus could have a virtio IOMMU
> and its own cross-platform enumeration mechanism for devices on the
> bus, and everything would be peachy.

That doesn't really help very much for the x86 case where the problem
is compatibility with *existing* (arguably broken) qemu
implementations.

Having said that, if this were real hardware I'd just be blacklisting
it and saying "Another BIOS with broken DMAR tables --> IOMMU
completely disabled". So perhaps we should just do that.


> I still don't understand what trick.  If we want virtio devices to be
> assignable, then they should be translated through the IOMMU, and the
> DMA API is the right interface for that.

The DMA API is the right interface *regardless* of whether there's
actual translation to be done. The device driver itself should not be
involved in any way with that decision.

When you want to access MMIO, you use ioremap() and writel() instead of
doing random crap for yourself. When you want DMA, you use the DMA API
to get a bus address for your device *even* if you expect there to be
no IOMMU and you expect it to precisely match the physical address. No
excuses.

-- 
dwmw2



