Re: virtio fixes pull for 4.0?
Michael S. Tsirkin m...@redhat.com writes: On Mon, Mar 09, 2015 at 05:43:19PM +1030, Rusty Russell wrote: I think it's a good idea to merge these patches (maybe except the !TASK_RUNNING thing) sooner rather than later, to make sure people have the time to test the fixes properly. Would you like me to pack up (some of them) them up and do a pull request? I'm waiting a bit longer, since they seem to still be tricking in. I don't think there was a pull request since rc1 which makes me a bit anxious. Any outstanding issues? Well, I was hoping for an ack on your mmio patches. But the weekend is over, so I've pushed and sent Linus a pull request. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 log fixed] virtio_mmio: fix endian-ness for mmio
Michael S. Tsirkin m...@redhat.com writes: Subject: [PATCH] virtio_mmio: fix access width for mmio Just for the record: Applied. Thanks, Rusty. Going over the virtio mmio code, I noticed that it doesn't correctly access modern device config values using natural accessors: it uses readb to get/set them byte by byte, while the virtio 1.0 spec explicitly states: 4.2.2.2 Driver Requirements: MMIO Device Register Layout ... The driver MUST only use 32 bit wide and aligned reads and writes to access the control registers described in table 4.1. For the device-specific configuration space, the driver MUST use 8 bit wide accesses for 8 bit wide fields, 16 bit wide and aligned accesses for 16 bit wide fields and 32 bit wide and aligned accesses for 32 and 64 bit wide fields. Borrow code from virtio_pci_modern to do this correctly. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- This is exactly the same as PATCH virtio_mmio: fix endian-ness for mmio but with corrected subject and commit log, to make it easier to apply. Note: untested: QEMU doesn't support virtio 1.0 for virtio-mmio, but seems pretty obvious. Let's apply ASAP so we don't ship incompliant drivers for virtio 1.0? drivers/virtio/virtio_mmio.c | 79 +++- 1 file changed, 71 insertions(+), 8 deletions(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index cad5698..0375456 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -156,22 +156,85 @@ static void vm_get(struct virtio_device *vdev, unsigned offset, void *buf, unsigned len) { struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); - u8 *ptr = buf; - int i; + void __iomem *base = vm_dev-base + VIRTIO_MMIO_CONFIG; + u8 b; + __le16 w; + __le32 l; - for (i = 0; i len; i++) - ptr[i] = readb(vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i); + if (vm_dev-version == 1) { + u8 *ptr = buf; + int i; + + for (i = 0; i len; i++) + ptr[i] = readb(base + offset + i); + return; + } + + switch (len) { + case 1: + b = readb(base + offset); + memcpy(buf, b, sizeof b); + break; + case 2: + w = cpu_to_le16(readw(base + offset)); + memcpy(buf, w, sizeof w); + break; + case 4: + l = cpu_to_le32(readl(base + offset)); + memcpy(buf, l, sizeof l); + break; + case 8: + l = cpu_to_le32(readl(base + offset)); + memcpy(buf, l, sizeof l); + l = cpu_to_le32(ioread32(base + offset + sizeof l)); + memcpy(buf + sizeof l, l, sizeof l); + break; + default: + BUG(); + } } static void vm_set(struct virtio_device *vdev, unsigned offset, const void *buf, unsigned len) { struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); - const u8 *ptr = buf; - int i; + void __iomem *base = vm_dev-base + VIRTIO_MMIO_CONFIG; + u8 b; + __le16 w; + __le32 l; - for (i = 0; i len; i++) - writeb(ptr[i], vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i); + if (vm_dev-version == 1) { + const u8 *ptr = buf; + int i; + + for (i = 0; i len; i++) + writeb(ptr[i], base + offset + i); + + return; + } + + switch (len) { + case 1: + memcpy(b, buf, sizeof b); + writeb(b, base + offset); + break; + case 2: + memcpy(w, buf, sizeof w); + writew(le16_to_cpu(w), base + offset); + break; + case 4: + memcpy(l, buf, sizeof l); + writel(le32_to_cpu(l), base + offset); + break; + case 8: + memcpy(l, buf, sizeof l); + writel(le32_to_cpu(l), base + offset); + memcpy(l, buf + sizeof l, sizeof l); + writel(le32_to_cpu(l), base + offset + sizeof l); + break; + default: + BUG(); + } } static u8 vm_get_status(struct virtio_device *vdev) -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_rpmsg: set DRIVER_OK before using device
Ohad Ben-Cohen o...@wizery.com writes: On Mon, Mar 9, 2015 at 10:41 AM, Michael S. Tsirkin m...@redhat.com wrote: On Sat, Mar 07, 2015 at 08:06:56PM +0100, Michael S. Tsirkin wrote: virtio spec requires that all drivers set DRIVER_OK before using devices. While rpmsg isn't yet included in the virtio 1 spec, previous spec versions also required this. virtio rpmsg violates this rule: is calls kick before setting DRIVER_OK. The fix isn't trivial since simply calling virtio_device_ready earlier would mean we might get an interrupt in parallel with adding buffers. Instead, split kick out to prepare+notify calls. prepare before virtio_device_ready - when we know we won't get interrupts. notify right afterwards. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Ohad, can you review and ack pls? Sure, Acked-by: Ohad Ben-Cohen o...@wizery.com Applied. Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] virtio-balloon: do not call blocking ops when !TASK_RUNNING
Thomas Huth th...@linux.vnet.ibm.com writes: On Wed, 25 Feb 2015 16:11:27 +0100 Cornelia Huck cornelia.h...@de.ibm.com wrote: On Wed, 25 Feb 2015 15:36:02 +0100 Michael S. Tsirkin m...@redhat.com wrote: virtio balloon has this code: wait_event_interruptible(vb-config_change, (diff = towards_target(vb)) != 0 || vb-need_stats_update || kthread_should_stop() || freezing(current)); Which is a problem because towards_target() call might block after wait_event_interruptible sets task state to TAST_INTERRUPTIBLE, causing the task_struct::state collision typical of nesting of sleeping primitives See also http://lwn.net/Articles/628628/ or Thomas's bug report http://article.gmane.org/gmane.linux.kernel.virtualization/24846 for a fuller explanation. To fix, rewrite using wait_woken. Cc: sta...@vger.kernel.org Reported-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- changes from v1: remove wait_event_interruptible noticed by Cornelia Huck cornelia.h...@de.ibm.com drivers/virtio/virtio_balloon.c | 19 ++- 1 file changed, 14 insertions(+), 5 deletions(-) I was able to reproduce Thomas' original problem and can confirm that it is gone with this patch. Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Right, I just applied the patch on my system, too, and the problem is indeed gone! Thanks for the quick fix! Tested-by: Thomas Huth th...@linux.vnet.ibm.com Applied. Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio balloon: do not call blocking ops when !TASK_RUNNING
Cornelia Huck cornelia.h...@de.ibm.com writes: On Wed, 4 Mar 2015 11:25:56 +0100 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Mar 04, 2015 at 04:44:54PM +1030, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: On Mon, Mar 02, 2015 at 10:37:26AM +1030, Rusty Russell wrote: Thomas Huth th...@linux.vnet.ibm.com writes: On Thu, 26 Feb 2015 11:50:42 +1030 Rusty Russell ru...@rustcorp.com.au wrote: Thomas Huth th...@linux.vnet.ibm.com writes: Hi all, with the recent kernel 3.19, I get a kernel warning when I start my KVM guest on s390 with virtio balloon enabled: The deeper problem is that virtio_ccw_get_config just silently fails on OOM. Neither get_config nor set_config are expected to fail. AFAIK this is currently not a problem. According to http://lwn.net/Articles/627419/ these kmalloc calls never fail because they allocate less than a page. I strongly suggest you unlearn that fact. The fix for this is in two parts: 1) Annotate using sched_annotate_sleep() and add a comment: we may spin a few times in low memory situations, but this isn't a high performance path. 2) Handle get_config (and other) failure in some more elegant way. Cheers, Rusty. I agree, but I'd like to point out that even without kmalloc, on s390 get_config is blocking - it's waiting for a hardware interrupt. And it makes sense: config is not data path, I don't think we should spin there. So I think besides these two parts, we still need my two patches: virtio-balloon: do not call blocking ops when !TASK_RUNNING I prefer to annotate, over trying to fix this. Because it's not important. We might spin a few times, but it's very unlikely, and it's certainly not performance critical. Thanks, Rusty. Subject: virtio_balloon: annotate possible sleep waiting for event. CCW (s390) does this. Reported-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 0413157f3b49..3f4d5acdbde0 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -340,6 +340,15 @@ static int balloon(void *_vballoon) s64 diff; try_to_freeze(); + + /* + * Reading the config on the ccw backend involves an + * allocation, so we may actually sleep and have an + * extra iteration. It's extremely unlikely, Hmm, this part of the comment seems wrong to me. Reading the config on the ccw backend always sleeps because it's interrupt driven. (...) So I suspect http://mid.gmane.org/1424874878-17155-1-git-send-email-...@redhat.com is better. What do you think? I'd prefer to fix this as well. While the I/O request completes instantly on current qemu (the ssch backend handles the start function immediately, not asynchronously as on real hardware), this (a) is an implementation detail that may change and (b) doesn't account for the need to deliver the interrupt to the guest - which might take non-zero time. Ah, I see. My mistake. I've thrown out my patch, applied that one. Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_rpmsg: set DRIVER_OK before using device
Michael S. Tsirkin m...@redhat.com writes: virtio spec requires that all drivers set DRIVER_OK before using devices. While rpmsg isn't yet included in the virtio 1 spec, previous spec versions also required this. virtio rpmsg violates this rule: is calls kick before setting DRIVER_OK. The fix isn't trivial since simply calling virtio_device_ready earlier would mean we might get an interrupt in parallel with adding buffers. Instead, split kick out to prepare+notify calls. prepare before virtio_device_ready - when we know we won't get interrupts. notify right afterwards. Signed-off-by: Michael S. Tsirkin m...@redhat.com Applied. I'll wait for Ohad to ack before sending to Linus. BTW I assume you have a version of qemu which warns on these kind of failures? That'd be nice to have! Thanks, Rusty. --- Note: compile-tested only. drivers/rpmsg/virtio_rpmsg_bus.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c index 92f6af6..73354ee 100644 --- a/drivers/rpmsg/virtio_rpmsg_bus.c +++ b/drivers/rpmsg/virtio_rpmsg_bus.c @@ -951,6 +951,7 @@ static int rpmsg_probe(struct virtio_device *vdev) void *bufs_va; int err = 0, i; size_t total_buf_space; + bool notify; vrp = kzalloc(sizeof(*vrp), GFP_KERNEL); if (!vrp) @@ -1030,8 +1031,22 @@ static int rpmsg_probe(struct virtio_device *vdev) } } + /* + * Prepare to kick but don't notify yet - we can't do this before + * device is ready. + */ + notify = virtqueue_kick_prepare(vrp-rvq); + + /* From this point on, we can notify and get callbacks. */ + virtio_device_ready(vdev); + /* tell the remote processor it can start sending messages */ - virtqueue_kick(vrp-rvq); + /* + * this might be concurrent with callbacks, but we are only + * doing notify, not a full kick here, so that's ok. + */ + if (notify) + virtqueue_notify(vrp-rvq); dev_info(vdev-dev, rpmsg host is online\n); -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio fixes pull for 4.0?
Michael S. Tsirkin m...@redhat.com writes: Hi Rusty! There are a bunch of (mostly virtio 1.0 related) fixes for virtio that need to go into 4.0 I think. virtio_blk: typo fix virtio_blk: fix comment for virtio 1.0 OK, I've added these two. I tend to be overcautious after the merge window. virtio_console: init work unconditionally virtio_console: avoid config access from irq virtio_balloon: set DRIVER_OK before using device seem ready? These are in my virtio-next tree already. virtio_mmio: generation support virtio_mmio: fix endian-ness for mmio these two are waiting for ack by Pawel These two fix bugs in virtio 1.0 code for mmio. Host code for that was AFAIK not posted, so I can't test properly. Pawel? I'm waiting on Acks for these two. virtio-balloon: do not call blocking ops when !TASK_RUNNING Rusty, it seems you prefer a different fix for this issue, while Cornelia prefers mine. Maybe both me and Cornelia misunderstand the issue? I know you dealt with a ton of similar issues recently so you are more likely to be right, but I'd like to understand what's going on better all the same. Could you help please? In the longer run, we should handle failures from these callbacks. But we don't need to do that right now. So we want the minimal fix. And an annotation is the minimal fix. The bug has been there for ages; it's just the warning that is new (if we *always* slept, we would spin, but in practice we'll rarely sleep). virtio_rpmsg: set DRIVER_OK before using device Just posted this, but seems pretty obvious. Yep, I've applied this too. Thanks! I think it's a good idea to merge these patches (maybe except the !TASK_RUNNING thing) sooner rather than later, to make sure people have the time to test the fixes properly. Would you like me to pack up (some of them) them up and do a pull request? I'm waiting a bit longer, since they seem to still be tricking in. I'm still chasing a QEMU bug, where it seems to fill in a number too large in the 'len' field for a block device. It should be 1 byte for a block device write, for example. See patch which causes assert() in qemu, but I had to stop at that point (should get back tomorrow I hope). Thanks, Rusty. diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 882a31b..98e99b8 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -243,16 +243,21 @@ int virtio_queue_empty(VirtQueue *vq) } void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len, unsigned int idx) +unsigned int len_written, unsigned int idx) { -unsigned int offset; +unsigned int offset, tot_wlen; int i; -trace_virtqueue_fill(vq, elem, len, idx); +trace_virtqueue_fill(vq, elem, len_written, idx); + +for (tot_wlen = i = 0; i elem-out_num; i++) { +tot_wlen += elem-out_sg[i].iov_len; +} +assert(len_written = tot_wlen); offset = 0; for (i = 0; i elem-in_num; i++) { -size_t size = MIN(len - offset, elem-in_sg[i].iov_len); +size_t size = MIN(len_written - offset, elem-in_sg[i].iov_len); cpu_physical_memory_unmap(elem-in_sg[i].iov_base, elem-in_sg[i].iov_len, @@ -270,7 +275,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, /* Get a pointer to the next entry in the used ring. */ vring_used_ring_id(vq, idx, elem-index); -vring_used_ring_len(vq, idx, len); +vring_used_ring_len(vq, idx, len_written); } void virtqueue_flush(VirtQueue *vq, unsigned int count) @@ -288,9 +293,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count) } void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len) +unsigned int len_written) { -virtqueue_fill(vq, elem, len, 0); +virtqueue_fill(vq, elem, len_written, 0); virtqueue_flush(vq, 1); } diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index df09993..153374f 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -191,10 +191,10 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size, void virtio_del_queue(VirtIODevice *vdev, int n); void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len); +unsigned int len_written); void virtqueue_flush(VirtQueue *vq, unsigned int count); void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len, unsigned int idx); +unsigned int len_written, unsigned int idx); void virtqueue_map_sg(struct iovec *sg, hwaddr *addr, size_t num_sg, int is_write); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at
Re: virtio balloon: do not call blocking ops when !TASK_RUNNING
Michael S. Tsirkin m...@redhat.com writes: On Mon, Mar 02, 2015 at 10:37:26AM +1030, Rusty Russell wrote: Thomas Huth th...@linux.vnet.ibm.com writes: On Thu, 26 Feb 2015 11:50:42 +1030 Rusty Russell ru...@rustcorp.com.au wrote: Thomas Huth th...@linux.vnet.ibm.com writes: Hi all, with the recent kernel 3.19, I get a kernel warning when I start my KVM guest on s390 with virtio balloon enabled: The deeper problem is that virtio_ccw_get_config just silently fails on OOM. Neither get_config nor set_config are expected to fail. AFAIK this is currently not a problem. According to http://lwn.net/Articles/627419/ these kmalloc calls never fail because they allocate less than a page. I strongly suggest you unlearn that fact. The fix for this is in two parts: 1) Annotate using sched_annotate_sleep() and add a comment: we may spin a few times in low memory situations, but this isn't a high performance path. 2) Handle get_config (and other) failure in some more elegant way. Cheers, Rusty. I agree, but I'd like to point out that even without kmalloc, on s390 get_config is blocking - it's waiting for a hardware interrupt. And it makes sense: config is not data path, I don't think we should spin there. So I think besides these two parts, we still need my two patches: virtio-balloon: do not call blocking ops when !TASK_RUNNING I prefer to annotate, over trying to fix this. Because it's not important. We might spin a few times, but it's very unlikely, and it's certainly not performance critical. Thanks, Rusty. Subject: virtio_balloon: annotate possible sleep waiting for event. CCW (s390) does this. Reported-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 0413157f3b49..3f4d5acdbde0 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -340,6 +340,15 @@ static int balloon(void *_vballoon) s64 diff; try_to_freeze(); + + /* +* Reading the config on the ccw backend involves an +* allocation, so we may actually sleep and have an +* extra iteration. It's extremely unlikely, and this +* isn't a fast path in any sense. +*/ + sched_annotate_sleep(); + wait_event_interruptible(vb-config_change, (diff = towards_target(vb)) != 0 || vb-need_stats_update -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio balloon: do not call blocking ops when !TASK_RUNNING
Thomas Huth th...@linux.vnet.ibm.com writes: On Thu, 26 Feb 2015 11:50:42 +1030 Rusty Russell ru...@rustcorp.com.au wrote: Thomas Huth th...@linux.vnet.ibm.com writes: Hi all, with the recent kernel 3.19, I get a kernel warning when I start my KVM guest on s390 with virtio balloon enabled: The deeper problem is that virtio_ccw_get_config just silently fails on OOM. Neither get_config nor set_config are expected to fail. AFAIK this is currently not a problem. According to http://lwn.net/Articles/627419/ these kmalloc calls never fail because they allocate less than a page. I strongly suggest you unlearn that fact. The fix for this is in two parts: 1) Annotate using sched_annotate_sleep() and add a comment: we may spin a few times in low memory situations, but this isn't a high performance path. 2) Handle get_config (and other) failure in some more elegant way. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio balloon: do not call blocking ops when !TASK_RUNNING
Thomas Huth th...@linux.vnet.ibm.com writes: Hi all, with the recent kernel 3.19, I get a kernel warning when I start my KVM guest on s390 with virtio balloon enabled: The deeper problem is that virtio_ccw_get_config just silently fails on OOM. Neither get_config nor set_config are expected to fail. Cornelia, I think ccw and config_area should be allocated inside vcdev. You could either use pointers, or simply allocate vcdev with GDP_DMA. This would avoid the kmalloc inside these calls. Thanks, Rusty. [0.839687] do not call blocking ops when !TASK_RUNNING; state=1 set at [00174a1e] prepare_to_wait_event+0x7e/0x108 [0.839694] [ cut here ] [0.839697] WARNING: at kernel/sched/core.c:7326 [0.839698] Modules linked in: [0.839702] CPU: 0 PID: 46 Comm: vballoon Not tainted 3.19.0 #233 [0.839705] task: 021d ti: 021d8000 task.ti: 021d8000 [0.839707] Krnl PSW : 0704c0018000 0015bf8e (__might_sleep+0x8e/0x98) [0.839713]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3 Krnl GPRS: 000d 021d 0071 0001 [0.839718]00675ace 01998c50 [0.839720]00982134 0058f824 00a008a8 [0.839722]04d9 007ea992 0015bf8a 021dbc28 [0.839731] Krnl Code: 0015bf7e: c0200033e838 larl %r2,7d8fee 0015bf84: c0e50028cd62 brasl %r14,675a48 #0015bf8a: a7f40001 brc 15,15bf8c 0015bf8e: 9201a000 mvi 0(%r10),1 0015bf92: a7f4ffe2 brc 15,15bf56 0015bf96: 0707 bcr 0,%r7 0015bf98: ebdff0800024 stmg%r13,%r15,128(%r15) 0015bf9e: a7f13fe0 tmll%r15,16352 [0.839749] Call Trace: [0.839751] ([0015bf8a] __might_sleep+0x8a/0x98) [0.839756] [0028a562] __kmalloc+0x272/0x350 [0.839759] [0058f824] virtio_ccw_get_config+0x3c/0x100 [0.839762] [0049fcb0] balloon+0x1b8/0x330 [0.839765] [001529c8] kthread+0x120/0x138 [0.839767] [00683c22] kernel_thread_starter+0x6/0xc [0.839770] [00683c1c] kernel_thread_starter+0x0/0xc [0.839772] no locks held by vballoon/46. [0.839773] Last Breaking-Event-Address: [0.839776] [0015bf8a] __might_sleep+0x8a/0x98 [0.839778] ---[ end trace d27fcdfa27273d7c ]--- The problem seems to be this code in balloon() in drivers/virtio/virtio_balloon.c: wait_event_interruptible(vb-config_change, (diff = towards_target(vb)) != 0 || vb-need_stats_update || kthread_should_stop() || freezing(current)); wait_event_interruptible() sets the state of the current task to TASK_INTERRUPTIBLE, then checks the condition. The condition contains towards_target() which reads the virtio config space via virtio_cread(). On s390, this then triggers virtio_ccw_get_config() - and this function calls some other functions again that might sleep (e.g. kzalloc or wait_event in ccw_io_helper) ... and this causes the new kernel warning message with kernel 3.19. I think it would be quite difficult or at least ugly to rewrite virtio_ccw_get_config() so that it does not call sleepable functions anymore. So would it be feasible to rewrite the balloon() function that it does not call the towards_target() in its wait_event condition anymore? I am unfortunately not that familiar with the balloon code semantics, so any help is very appreciated here! Thanks, Thomas -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Qemu and virtio 1.0
OK, I am trying to experiment with virtio 1.0 support using the latest kernel and MST's qemu tree: https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/?h=virtio-1.0 The first issue is that the device config endian was wrong (see attached patch). I'm now setting up a BE guest on my x86 laptop, and a BE and LE guest on a BE powerpc machine, to check that all combinations work correctly. If others test too, that would be appreciated! Cheers, Rusty. From 95ac91554ed602f856a2a5fcc25eaffcad1b1c8d Mon Sep 17 00:00:00 2001 From: Rusty Russell ru...@rustcorp.com.au Date: Tue, 24 Feb 2015 14:47:44 +1030 Subject: [PATCH] virtio_config_write*/virtio_config_read*: Don't endian swap for virtio 1.0. Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 079944c..882a31b 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -662,7 +662,12 @@ uint32_t virtio_config_readw(VirtIODevice *vdev, uint32_t addr) k-get_config(vdev, vdev-config); -val = lduw_p(vdev-config + addr); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +val = lduw_le_p(vdev-config + addr); +} else { +val = lduw_p(vdev-config + addr); +} return val; } @@ -677,7 +682,12 @@ uint32_t virtio_config_readl(VirtIODevice *vdev, uint32_t addr) k-get_config(vdev, vdev-config); -val = ldl_p(vdev-config + addr); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +val = ldl_le_p(vdev-config + addr); +} else { +val = ldl_p(vdev-config + addr); +} return val; } @@ -706,7 +716,12 @@ void virtio_config_writew(VirtIODevice *vdev, uint32_t addr, uint32_t data) return; } -stw_p(vdev-config + addr, val); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +stw_le_p(vdev-config + addr, val); +} else { +stw_p(vdev-config + addr, val); +} if (k-set_config) { k-set_config(vdev, vdev-config); @@ -722,7 +737,12 @@ void virtio_config_writel(VirtIODevice *vdev, uint32_t addr, uint32_t data) return; } -stl_p(vdev-config + addr, val); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +stl_le_p(vdev-config + addr, val); +} else { +stl_p(vdev-config + addr, val); +} if (k-set_config) { k-set_config(vdev, vdev-config); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL] virtio-next
The following changes since commit b97f880c8342fd6e49a02c9ef7507a678722b2b3: Merge branch 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata (2015-01-21 07:54:16 +1200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git tags/virtio-next-for-linus for you to fetch changes up to 5b40a7daf51812b35cf05d1601a779a7043f8414: virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice. (2015-02-17 16:19:29 +1030) OK, this has the big virtio 1.0 implementation, as specified by OASIS. On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to double-check the implementation. Then comes the inevitable fixes and cleanups from that work. Thanks, Rusty. Luis R. Rodriguez (1): virtual: Documentation: simplify and generalize paravirt_ops.txt Michael S. Tsirkin (23): virtio_pci: drop virtio_config dependency virtio/9p: verify device has config space virtio/blk: verify device has config space virtio/console: verify device has config space virtio/net: verify device has config space virtio/scsi: verify device has config space virtio/balloon: verify device has config space mn10300: drop dead code pci: add pci_iomap_range s390: add pci_iomap_range virtio_pci: move probe/remove code to common virtio_pci: modern driver virtio_pci_modern: reduce number of mappings virtio_pci_modern: support devices with no config virtio_balloon: coding style fixes virtio_blk: coding style fixes virtio_ring: coding style fix virtio_rng: drop extra empty line virtio_pci: Kconfig grammar fix virtio_pci: drop Kconfig warnings virtio_pci: add an option to disable legacy driver virtio_pci: add module param to force legacy mode virtio_pci_modern: drop an unused function Pawel Moll (1): virtio-mmio: Update the device to OASIS spec version Rusty Russell (53): virtio-pci: define layout for virtio 1.0 virtio_pci: macros for PCI layout offsets virtio: define VIRTIO_PCI_CAP_PCI_CFG in header. virtio: Don't expose legacy block features when VIRTIO_BLK_NO_LEGACY defined. virtio: Don't expose legacy config features when VIRTIO_CONFIG_NO_LEGACY defined. virtio_pci: use 16-bit accessor for queue_enable. virtio: don't require a config space on the console device. lguest: have --rng read from /dev/urandom not /dev/random. lguest: add operations to get/set a register from the Launcher. lguest: write more information to userspace about pending traps. lguest: add infrastructure for userspace to deliver a trap to the guest. lguest: add infrastructure to check mappings. lguest: send trap 13 through to userspace. lguest: suppress PS/2 keyboard polling. lguest: don't disable iospace. lguest: add iomem region, where guest page faults get sent to userspace. lguest: disable ACPI explicitly. lguest: Override pcibios_enable_irq/pcibios_disable_irq to our stupid PIC lguest: add MMIO region allocator in example launcher. lguest: decode mmio accesses for PCI in example launcher. lguest: add PCI config space emulation to example launcher. lguest: implement virtio-PCI MMIO accesses. lguest: fix failure to find linux/virtio_types.h lguest: add a dummy PCI host bridge. lguest: Convert block device to virtio 1.0 PCI. lguest: Convert net device to virtio 1.0 PCI. lguest: Convert entropy device to virtio 1.0 PCI. lguest: Convert console device to virtio 1.0 PCI. lguest: define VIRTIO_CONFIG_NO_LEGACY in example launcher. lguest: remove support for lguest bus. lguest: remove support for lguest bus in demonstration launcher. lguest: remove lguest bus definitions from header. lguest: support emerg_wr in console device in example launcher. lguest: support backdoor window. lguest: always put console in PCI slot #1. lguest: use the PCI console device's emerg_wr for early boot messages. lguest: remove NOTIFY facility from demonstration launcher. lguest: remove NOTIFY call and eventfd facility. tools/lguest: handle device reset correctly in example launcher. tools/lguest: fix features_accepted logic in example launcher. tools/lguest: rename virtio_pci_cfg_cap field to match spec. tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI) tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI) tools/lguest: handle indirect partway through chain. tools/lguest: don't start devices until DRIVER_OK status set. lguest: don't look in console features to find emerg_wr. tools/lguest: more documentation
Re: [PATCH] virtual: Documentation: simplify and generalize paravirt_ops.txt
Luis R. Rodriguez mcg...@do-not-panic.com writes: From: Luis R. Rodriguez mcg...@suse.com The general documentation we have for pv_ops is currenty present on the IA64 docs, but since this documentation covers IA64 xen enablement and IA64 Xen support got ripped out a while ago through commit d52eefb47 present since v3.14-rc1 lets just simplify, generalize and move the pv_ops documentation to a shared place. OK, I've applied this. Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 4/6] hw_random: fix unregister race.
Amos Kong ak...@redhat.com writes: From: Rusty Russell ru...@rustcorp.com.au The previous patch added one potential problem: we can still be reading from a hwrng when it's unregistered. Add a wait for zero in the hwrng_unregister path. v4: add cleanup_done flag to insure that cleanup is done That's a bit weird. The usual pattern would be to hold a reference until we're actually finished, but this reference is a bit weird. We hold the mutex across cleanup, so we could grab that but we have to take care sleeping inside wait_event, otherwise Peter will have to fix my code again :) AFAICT the wake_woken() stuff isn't merged yet, so your patch will have to do for now. @@ -98,6 +99,8 @@ static inline void cleanup_rng(struct kref *kref) if (rng-cleanup) rng-cleanup(rng); + rng-cleanup_done = true; + wake_up_all(rng_done); } static void set_current_rng(struct hwrng *rng) @@ -536,6 +539,11 @@ void hwrng_unregister(struct hwrng *rng) kthread_stop(hwrng_fill); } else mutex_unlock(rng_mutex); + + /* Just in case rng is reading right now, wait. */ + wait_event(rng_done, rng-cleanup_done +atomic_read(rng-ref.refcount) == 0); + The atomic_read() isn't necessary here. However, you should probably init cleanup_done in hwrng_register(). (Probably noone does unregister then register, but let's be clear). Thanks, Rusty. } EXPORT_SYMBOL_GPL(hwrng_unregister); diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h index c212e71..7832e50 100644 --- a/include/linux/hw_random.h +++ b/include/linux/hw_random.h @@ -46,6 +46,7 @@ struct hwrng { /* internal. */ struct list_head list; struct kref ref; + bool cleanup_done; }; /** Register a new Hardware Random Number Generator driver. */ -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 3/6] hw_random: use reference counts on each struct hwrng.
Amos Kong ak...@redhat.com writes: From: Rusty Russell ru...@rustcorp.com.au current_rng holds one reference, and we bump it every time we want to do a read from it. This means we only hold the rng_mutex to grab or drop a reference, so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't block on read of /dev/hwrng. Using a kref is overkill (we're always under the rng_mutex), but a standard pattern. This also solves the problem that the hwrng_fillfn thread was accessing current_rng without a lock, which could change (eg. to NULL) underneath it. v4: decrease last reference for triggering the cleanup This doesn't make any sense: +static void drop_current_rng(void) +{ + struct hwrng *rng = current_rng; + + BUG_ON(!mutex_is_locked(rng_mutex)); + if (!current_rng) + return; + + /* release current_rng reference */ + kref_put(current_rng-ref, cleanup_rng); + current_rng = NULL; + + /* decrease last reference for triggering the cleanup */ + kref_put(rng-ref, cleanup_rng); +} Why would it drop the refcount twice? This doesn't make sense. Hmm, because you added kref_init, which initializes the reference count to 1, you created this bug. Leave out the kref_init, and let it naturally be 0 (until, and if, it becomes current_rng). Add a comment if you want. Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 4/6] hw_random: fix unregister race.
Herbert Xu herb...@gondor.apana.org.au writes: On Thu, Sep 18, 2014 at 08:37:45PM +0800, Amos Kong wrote: From: Rusty Russell ru...@rustcorp.com.au The previous patch added one potential problem: we can still be reading from a hwrng when it's unregistered. Add a wait for zero in the hwrng_unregister path. Signed-off-by: Rusty Russell ru...@rustcorp.com.au You totally corrupted Rusty's patch. If you're going to repost his series you better make sure that you've got the right patches. Just as well though as it made me think a little more about this patch :) OK Amos, can you please repost the complete series? Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/6] hw_random: use reference counts on each struct hwrng.
Amos Kong ak...@redhat.com writes: We got a warning in boot stage when above set_current_rng() is executed, it can be fixed by init rng-ref in hwrng_init(). @@ -166,6 +169,8 @@ static inline int hwrng_init(struct hwrng *rng) if (current_quality 0 !hwrng_fill) start_khwrngd(); + kref_init(rng-ref); + return 0; } OK, I folded this fix on. Thanks, Rusty. hw_random: use reference counts on each struct hwrng. current_rng holds one reference, and we bump it every time we want to do a read from it. This means we only hold the rng_mutex to grab or drop a reference, so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't block on read of /dev/hwrng. Using a kref is overkill (we're always under the rng_mutex), but a standard pattern. This also solves the problem that the hwrng_fillfn thread was accessing current_rng without a lock, which could change (eg. to NULL) underneath it. v3: initialize kref (thanks Amos Kong) v2: fix missing put_rng() on exit path (thanks Amos Kong) Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index a0905c818e13..0ecac38da954 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -42,6 +42,7 @@ #include linux/delay.h #include linux/slab.h #include linux/random.h +#include linux/err.h #include asm/uaccess.h @@ -91,6 +92,59 @@ static void add_early_randomness(struct hwrng *rng) add_device_randomness(bytes, bytes_read); } +static inline void cleanup_rng(struct kref *kref) +{ + struct hwrng *rng = container_of(kref, struct hwrng, ref); + + if (rng-cleanup) + rng-cleanup(rng); +} + +static void set_current_rng(struct hwrng *rng) +{ + BUG_ON(!mutex_is_locked(rng_mutex)); + kref_get(rng-ref); + current_rng = rng; +} + +static void drop_current_rng(void) +{ + BUG_ON(!mutex_is_locked(rng_mutex)); + if (!current_rng) + return; + + kref_put(current_rng-ref, cleanup_rng); + current_rng = NULL; +} + +/* Returns ERR_PTR(), NULL or refcounted hwrng */ +static struct hwrng *get_current_rng(void) +{ + struct hwrng *rng; + + if (mutex_lock_interruptible(rng_mutex)) + return ERR_PTR(-ERESTARTSYS); + + rng = current_rng; + if (rng) + kref_get(rng-ref); + + mutex_unlock(rng_mutex); + return rng; +} + +static void put_rng(struct hwrng *rng) +{ + /* +* Hold rng_mutex here so we serialize in case they set_current_rng +* on rng again immediately. +*/ + mutex_lock(rng_mutex); + if (rng) + kref_put(rng-ref, cleanup_rng); + mutex_unlock(rng_mutex); +} + static inline int hwrng_init(struct hwrng *rng) { if (rng-init) { @@ -110,13 +164,9 @@ static inline int hwrng_init(struct hwrng *rng) if (current_quality 0 !hwrng_fill) start_khwrngd(); - return 0; -} + kref_init(rng-ref); -static inline void hwrng_cleanup(struct hwrng *rng) -{ - if (rng rng-cleanup) - rng-cleanup(rng); + return 0; } static int rng_dev_open(struct inode *inode, struct file *filp) @@ -154,21 +204,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, ssize_t ret = 0; int err = 0; int bytes_read, len; + struct hwrng *rng; while (size) { - if (mutex_lock_interruptible(rng_mutex)) { - err = -ERESTARTSYS; + rng = get_current_rng(); + if (IS_ERR(rng)) { + err = PTR_ERR(rng); goto out; } - - if (!current_rng) { + if (!rng) { err = -ENODEV; - goto out_unlock; + goto out; } mutex_lock(reading_mutex); if (!data_avail) { - bytes_read = rng_get_data(current_rng, rng_buffer, + bytes_read = rng_get_data(rng, rng_buffer, rng_buffer_size(), !(filp-f_flags O_NONBLOCK)); if (bytes_read 0) { @@ -200,8 +251,8 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, ret += len; } - mutex_unlock(rng_mutex); mutex_unlock(reading_mutex); + put_rng(rng); if (need_resched()) schedule_timeout_interruptible(1); @@ -213,12 +264,11 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, } out: return ret ? : err; -out_unlock: - mutex_unlock(rng_mutex); - goto out; + out_unlock_reading: mutex_unlock(reading_mutex); - goto out_unlock
Re: [PATCH v2 3/6] hw_random: use reference counts on each struct hwrng.
Amos Kong ak...@redhat.com writes: From: Rusty Russell ru...@rustcorp.com.au current_rng holds one reference, and we bump it every time we want to do a read from it. This means we only hold the rng_mutex to grab or drop a reference, so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't block on read of /dev/hwrng. Using a kref is overkill (we're always under the rng_mutex), but a standard pattern. This also solves the problem that the hwrng_fillfn thread was accessing current_rng without a lock, which could change (eg. to NULL) underneath it. V2: reduce reference count in signal_pending path OK, I changed it to do the put_rng before the check, so instead of: @@ -208,17 +256,19 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, if (signal_pending(current)) { err = -ERESTARTSYS; + put_rng(rng); goto out; } + + put_rng(rng); } out: return ret ? : err; -out_unlock: - mutex_unlock(rng_mutex); - goto out; + out_unlock_reading: mutex_unlock(reading_mutex); - goto out_unlock; + put_rng(rng); + goto out; } We have: mutex_unlock(reading_mutex); put_rng(rng); if (need_resched()) schedule_timeout_interruptible(1); if (signal_pending(current)) { err = -ERESTARTSYS; goto out; } } out: return ret ? : err; out_unlock_reading: mutex_unlock(reading_mutex); put_rng(rng); goto out; } Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next RFC 1/3] virtio: support for urgent descriptors
Jason Wang jasow...@redhat.com writes: Below should be useful for some experiments Jason is doing. I thought I'd send it out for early review/feedback. event idx feature allows us to defer interrupts until a specific # of descriptors were used. Sometimes it might be useful to get an interrupt after a specific descriptor, regardless. This adds a descriptor flag for this, and an API to create an urgent output descriptor. This is still an RFC: we'll need a feature bit for drivers to detect this, but we've run out of feature bits for virtio 0.X. For experimentation purposes, drivers can assume this is set, or add a driver-specific feature bit. Signed-off-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com The new VRING_DESC_F_URGENT bit is theoretically nicer, but for networking (which tends to take packets in order) couldn't we just set the event counter to give us a tx interrupt at the packet we want? Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 04/25] virtio: defer config changed notifications
Michael S. Tsirkin m...@redhat.com writes: Defer config changed notifications that arrive during probe/scan/freeze/restore. This will allow drivers to set DRIVER_OK earlier, without worrying about racing with config change interrupts. This change will also benefit old hypervisors (before 2009) that send interrupts without checking DRIVER_OK: previously, the callback could race with driver-specific initialization. This will also help simplify drivers. But AFAICT you never *read* dev-config_changed. You unconditionally trigger a config_changed event in virtio_config_enable(). That's a bit weird, but probably OK. How about the following change (on top of your patch). I think the renaming is clearer, and note the added if() test in virtio_config_enable(). If you approve, I'll fold it in. Cheers, Rusty. diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 2536701b098b..df598dd8c5c8 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -122,7 +122,7 @@ static void __virtio_config_changed(struct virtio_device *dev) struct virtio_driver *drv = drv_to_virtio(dev-dev.driver); if (!dev-config_enabled) - dev-config_changed = true; + dev-config_change_pending = true; else if (drv drv-config_changed) drv-config_changed(dev); } @@ -148,8 +148,9 @@ static void virtio_config_enable(struct virtio_device *dev) { spin_lock_irq(dev-config_lock); dev-config_enabled = true; - __virtio_config_changed(dev); - dev-config_changed = false; + if (dev-config_change_pending) + __virtio_config_changed(dev); + dev-config_change_pending = false; spin_unlock_irq(dev-config_lock); } @@ -253,7 +254,7 @@ int register_virtio_device(struct virtio_device *dev) spin_lock_init(dev-config_lock); dev-config_enabled = false; - dev-config_changed = false; + dev-config_change_pending = false; /* We always start by resetting the device, in case a previous * driver messed it up. This also tests that code path a little. */ diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 5636b119dc25..65261a7244fc 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -80,7 +80,7 @@ bool virtqueue_is_broken(struct virtqueue *vq); * @index: unique position on the virtio bus * @failed: saved value for CONFIG_S_FAILED bit (for restore) * @config_enabled: configuration change reporting enabled - * @config_changed: configuration change reported while disabled + * @config_change_pending: configuration change reported while disabled * @config_lock: protects configuration change reporting * @dev: underlying device. * @id: the device type identification (used to match it with a driver). @@ -94,7 +94,7 @@ struct virtio_device { int index; bool failed; bool config_enabled; - bool config_changed; + bool config_change_pending; spinlock_t config_lock; struct device dev; struct virtio_device_id id; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 08/11] virtio_blk: use virtio v1.0 endian
Cornelia Huck cornelia.h...@de.ibm.com writes: Note that we care only about the fields still in use for virtio v1.0. Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com Hi Cornelia, These patches all look good; I'm a bit nervous about our testing missing some conversion, so we'll need qemu patches for PCI so we can test on other platforms too. Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 03/11] virtio: support more feature bits
Cornelia Huck cornelia.h...@de.ibm.com writes: With virtio-1, we support more than 32 feature bits. Let's make vdev-guest_features depend on the number of supported feature bits, allowing us to grow the feature bits automatically. It's a judgement call, but I would say that simply using uint64_t will be sufficient for quite a while. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 10/25] virtio: add API to enable VQs early
Michael S. Tsirkin m...@redhat.com writes: virtio spec 0.9.X requires DRIVER_OK to be set before VQs are used, but some drivers use VQs before probe function returns. Since DRIVER_OK is set after probe, this violates the spec. Even though under virtio 1.0 transitional devices support this behaviour, we want to make it possible for those early callers to become spec compliant and eventually support non-transitional devices. Add API for drivers to call before using VQs. Sets DRIVER_OK internally. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com OK, this all looks good, but I think we should do two things: 1) rename virtio_enable_vqs_early() to virtio_device_ready(). 2) Add a BUG_ON in the virtio_ring code to make sure device is ready before any ops are called. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes
Amos Kong ak...@redhat.com writes: I started a QEMU (non-smp) guest with one virtio-rng device, and read random data from /dev/hwrng by dd: # dd if=/dev/hwrng of=/dev/null In the same time, if I check hwrng attributes from sysfs by cat: # cat /sys/class/misc/hw_random/rng_* The cat process always gets stuck with slow backend (5 k/s), if we use a quick backend (1.2 M/s), the cat process will cost 1 to 2 minutes. The stuck doesn't exist for smp guest. Reading syscall enters kernel and call rng_dev_read(), it's user context. We used need_resched() to check if other tasks need to be run, but it almost always return false, and re-hold the mutex lock. The attributes accessing process always fails to hold the lock, so the cat gets stuck. User context doesn't allow other user contexts run on that CPU, unless the kernel code sleeps for some reason. This is why the need_reshed() always return false here. This patch removed need_resched() and always schedule other tasks then other tasks can have chance to hold the lock and execute protected code. OK, this is going to be a rant. Your explanation doesn't make sense at all. Worse, your solution breaks the advice of Kernighan Plaugher: Don't patch bad code - rewrite it.. But worst of all, this detailed explanation might have convinced me you understood the problem better than I did, and applied your patch. I did some tests. For me, as expected, the process spends its time inside the virtio rng read function, holding the mutex and thus blocking sysfs access; it's not a failure of this code at all. Your schedule_timeout() fix probably just helps by letting the host refresh entropy, so we spend less time waiting in the read fn. I will post a series, which unfortunately is only lightly tested, then I'm going to have some beer to begin my holiday. That may help me forget my disappointment at seeing respected fellow developers monkey-patching random code they don't understand. Grrr Rusty. Signed-off-by: Amos Kong ak...@redhat.com --- drivers/char/hw_random/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index c591d7e..263a370 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, mutex_unlock(rng_mutex); - if (need_resched()) - schedule_timeout_interruptible(1); + schedule_timeout_interruptible(1); if (signal_pending(current)) { err = -ERESTARTSYS; -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] hw_random: use reference counts on each struct hwrng.
current_rng holds one reference, and we bump it every time we want to do a read from it. This means we only hold the rng_mutex to grab or drop a reference, so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't block on read of /dev/hwrng. Using a kref is overkill (we're always under the rng_mutex), but a standard pattern. This also solves the problem that the hwrng_fillfn thread was accessing current_rng without a lock, which could change (eg. to NULL) underneath it. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 135 -- include/linux/hw_random.h | 2 + 2 files changed, 94 insertions(+), 43 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index b1b6042ad85c..dc9092a1075d 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -42,6 +42,7 @@ #include linux/delay.h #include linux/slab.h #include linux/random.h +#include linux/err.h #include asm/uaccess.h @@ -91,6 +92,59 @@ static void add_early_randomness(struct hwrng *rng) add_device_randomness(bytes, bytes_read); } +static inline void cleanup_rng(struct kref *kref) +{ + struct hwrng *rng = container_of(kref, struct hwrng, ref); + + if (rng-cleanup) + rng-cleanup(rng); +} + +static void set_current_rng(struct hwrng *rng) +{ + BUG_ON(!mutex_is_locked(rng_mutex)); + kref_get(rng-ref); + current_rng = rng; +} + +static void drop_current_rng(void) +{ + BUG_ON(!mutex_is_locked(rng_mutex)); + if (!current_rng) + return; + + kref_put(current_rng-ref, cleanup_rng); + current_rng = NULL; +} + +/* Returns ERR_PTR(), NULL or refcounted hwrng */ +static struct hwrng *get_current_rng(void) +{ + struct hwrng *rng; + + if (mutex_lock_interruptible(rng_mutex)) + return ERR_PTR(-ERESTARTSYS); + + rng = current_rng; + if (rng) + kref_get(rng-ref); + + mutex_unlock(rng_mutex); + return rng; +} + +static void put_rng(struct hwrng *rng) +{ + /* +* Hold rng_mutex here so we serialize in case they set_current_rng +* on rng again immediately. +*/ + mutex_lock(rng_mutex); + if (rng) + kref_put(rng-ref, cleanup_rng); + mutex_unlock(rng_mutex); +} + static inline int hwrng_init(struct hwrng *rng) { if (rng-init) { @@ -113,12 +167,6 @@ static inline int hwrng_init(struct hwrng *rng) return 0; } -static inline void hwrng_cleanup(struct hwrng *rng) -{ - if (rng rng-cleanup) - rng-cleanup(rng); -} - static int rng_dev_open(struct inode *inode, struct file *filp) { /* enforce read-only access to this chrdev */ @@ -154,21 +202,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, ssize_t ret = 0; int err = 0; int bytes_read, len; + struct hwrng *rng; while (size) { - if (mutex_lock_interruptible(rng_mutex)) { - err = -ERESTARTSYS; + rng = get_current_rng(); + if (IS_ERR(rng)) { + err = PTR_ERR(rng); goto out; } - - if (!current_rng) { + if (!rng) { err = -ENODEV; - goto out_unlock; + goto out; } mutex_lock(reading_mutex); if (!data_avail) { - bytes_read = rng_get_data(current_rng, rng_buffer, + bytes_read = rng_get_data(rng, rng_buffer, rng_buffer_size(), !(filp-f_flags O_NONBLOCK)); if (bytes_read 0) { @@ -200,7 +249,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, ret += len; } - mutex_unlock(rng_mutex); mutex_unlock(reading_mutex); if (need_resched()) @@ -210,15 +258,16 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, err = -ERESTARTSYS; goto out; } + + put_rng(rng); } out: return ret ? : err; -out_unlock: - mutex_unlock(rng_mutex); - goto out; + out_unlock_reading: mutex_unlock(reading_mutex); - goto out_unlock; + put_rng(rng); + goto out; } @@ -257,8 +306,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev, err = hwrng_init(rng); if (err) break; - hwrng_cleanup(current_rng); - current_rng = rng; + drop_current_rng(); + set_current_rng
[PATCH 5/5] hw_random: don't init list element we're about to add to list.
Another interesting anti-pattern. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 6a34feca6b43..96fa06716e95 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -485,7 +485,6 @@ int hwrng_register(struct hwrng *rng) goto out_unlock; } } - INIT_LIST_HEAD(rng-list); list_add_tail(rng-list, rng_list); if (old_rng !rng-init) { -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] hw_random: place mutex around read functions and buffers.
There's currently a big lock around everything, and it means that we can't query sysfs (eg /sys/devices/virtual/misc/hw_random/rng_current) while the rng is reading. This is a real problem when the rng is slow, or blocked (eg. virtio_rng with qemu's default /dev/random backend) This doesn't help (it leaves the current lock untouched), just adds a lock to protect the read function and the static buffers, in preparation for transition. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index aa30a25c8d49..b1b6042ad85c 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -53,7 +53,10 @@ static struct hwrng *current_rng; static struct task_struct *hwrng_fill; static LIST_HEAD(rng_list); +/* Protects rng_list and current_rng */ static DEFINE_MUTEX(rng_mutex); +/* Protects rng read functions, data_avail, rng_buffer and rng_fillbuf */ +static DEFINE_MUTEX(reading_mutex); static int data_avail; static u8 *rng_buffer, *rng_fillbuf; static unsigned short current_quality; @@ -81,7 +84,9 @@ static void add_early_randomness(struct hwrng *rng) unsigned char bytes[16]; int bytes_read; + mutex_lock(reading_mutex); bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1); + mutex_unlock(reading_mutex); if (bytes_read 0) add_device_randomness(bytes, bytes_read); } @@ -128,6 +133,7 @@ static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size, int wait) { int present; + BUG_ON(!mutex_is_locked(reading_mutex)); if (rng-read) return rng-read(rng, (void *)buffer, size, wait); @@ -160,13 +166,14 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, goto out_unlock; } + mutex_lock(reading_mutex); if (!data_avail) { bytes_read = rng_get_data(current_rng, rng_buffer, rng_buffer_size(), !(filp-f_flags O_NONBLOCK)); if (bytes_read 0) { err = bytes_read; - goto out_unlock; + goto out_unlock_reading; } data_avail = bytes_read; } @@ -174,7 +181,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, if (!data_avail) { if (filp-f_flags O_NONBLOCK) { err = -EAGAIN; - goto out_unlock; + goto out_unlock_reading; } } else { len = data_avail; @@ -186,7 +193,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, if (copy_to_user(buf + ret, rng_buffer + data_avail, len)) { err = -EFAULT; - goto out_unlock; + goto out_unlock_reading; } size -= len; @@ -194,6 +201,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, } mutex_unlock(rng_mutex); + mutex_unlock(reading_mutex); if (need_resched()) schedule_timeout_interruptible(1); @@ -208,6 +216,9 @@ out: out_unlock: mutex_unlock(rng_mutex); goto out; +out_unlock_reading: + mutex_unlock(reading_mutex); + goto out_unlock; } @@ -348,13 +359,16 @@ static int hwrng_fillfn(void *unused) while (!kthread_should_stop()) { if (!current_rng) break; + mutex_lock(reading_mutex); rc = rng_get_data(current_rng, rng_fillbuf, rng_buffer_size(), 1); + mutex_unlock(reading_mutex); if (rc = 0) { pr_warn(hwrng: no data available\n); msleep_interruptible(1); continue; } + /* Outside lock, sure, but y'know: randomness. */ add_hwgenerator_randomness((void *)rng_fillbuf, rc, rc * current_quality * 8 10); } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] hw_random: don't double-check old_rng.
Interesting anti-pattern. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index b4a21e9521cf..6a34feca6b43 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -472,14 +472,13 @@ int hwrng_register(struct hwrng *rng) } old_rng = current_rng; + err = 0; if (!old_rng) { err = hwrng_init(rng); if (err) goto out_unlock; set_current_rng(rng); - } - err = 0; - if (!old_rng) { + err = register_miscdev(); if (err) { drop_current_rng(); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] hw_random: fix unregister race.
The previous patch added one potential problem: we can still be reading from a hwrng when it's unregistered. Add a wait for zero in the hwrng_unregister path. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index dc9092a1075d..b4a21e9521cf 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex); static DEFINE_MUTEX(reading_mutex); static int data_avail; static u8 *rng_buffer, *rng_fillbuf; +static DECLARE_WAIT_QUEUE_HEAD(rng_done); static unsigned short current_quality; static unsigned short default_quality; /* = 0; default to off */ @@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref) if (rng-cleanup) rng-cleanup(rng); + wake_up_all(rng_done); } static void set_current_rng(struct hwrng *rng) @@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng) } mutex_unlock(rng_mutex); + + /* Just in case rng is reading right now, wait. */ + wait_event(rng_done, atomic_read(rng-ref.refcount) == 0); } EXPORT_SYMBOL_GPL(hwrng_unregister); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
Amos Kong ak...@redhat.com writes: On Sun, Sep 14, 2014 at 01:12:58AM +0800, Amos Kong wrote: On Thu, Sep 11, 2014 at 09:08:03PM +0930, Rusty Russell wrote: Amos Kong ak...@redhat.com writes: When I check hwrng attributes in sysfs, cat process always gets stuck if guest has only 1 vcpu and uses a slow rng backend. Currently we check if there is any tasks waiting to be run on current cpu in rng_dev_read() by need_resched(). But need_resched() doesn't work because rng_dev_read() is executing in user context. I don't understand this explanation? I'd expect the sysfs process to be woken by the mutex_unlock(). But actually sysfs process's not woken always, this is they the process gets stuck. %s/they/why/ Hi Rusty, Reference: http://www.linuxgrill.com/anonymous/fire/netfilter/kernel-hacking-HOWTO-2.html Sure, that was true when I wrote it, and is still true when preempt is off. read() syscall of /dev/hwrng will enter into kernel, the read operation is rng_dev_read(), it's userspace context (not interrupt context). Userspace context doesn't allow other user contexts run on that CPU, unless the kernel code sleeps for some reason. This is true assuming preempt is off, yes. In this case, the need_resched() doesn't work. This is exactly what need_resched() is for: it should return true if there's another process of sufficient priority waiting to be run. It implies that schedule() would run it. git blame doesn't offer any enlightenment here, as to why we use schedule_timeout_interruptible() at all. I would expect mutex_unlock() to wake the other reader. The code certainly seems to, so it should now be runnable and need_resched() should return true. I suspect something else is happening which makes this work. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] virtio-rng: skip reading when we start to remove the device
Amit Shah amit.s...@redhat.com writes: On (Wed) 10 Sep 2014 [14:11:37], Amos Kong wrote: Before we really unregister the hwrng device, reading will get stuck if the virtio device is reset. We should return error for reading when we start to remove the device. Signed-off-by: Amos Kong ak...@redhat.com Cc: sta...@vger.kernel.org Reviewed-by: Amit Shah amit.s...@redhat.com Thanks, applied. They're sitting in my fixes branch. If there are no screams from linux-next, I'll push to Linus Monday. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
Amos Kong ak...@redhat.com writes: When I check hwrng attributes in sysfs, cat process always gets stuck if guest has only 1 vcpu and uses a slow rng backend. Currently we check if there is any tasks waiting to be run on current cpu in rng_dev_read() by need_resched(). But need_resched() doesn't work because rng_dev_read() is executing in user context. I don't understand this explanation? I'd expect the sysfs process to be woken by the mutex_unlock(). If we're really high priority (vs. the sysfs process) then I can see why we'd need schedule_timeout_interruptible() instead of just schedule(), and in that case, need_resched() would be false too. You could argue that's intended behaviour, but I can't see how it happens in the normal case anyway. What am I missing? Thanks, Rusty. This patch removed need_resched() and increase delay to 10 jiffies, then other tasks can have chance to execute protected code. Delaying 1 jiffy also works, but 10 jiffies is safer. Signed-off-by: Amos Kong ak...@redhat.com --- drivers/char/hw_random/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index c591d7e..b5d1b6f 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, mutex_unlock(rng_mutex); - if (need_resched()) - schedule_timeout_interruptible(1); + schedule_timeout_interruptible(10); if (signal_pending(current)) { err = -ERESTARTSYS; -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/2] virtio-rng: fix stuck of hot-unplugging busy device
Amit Shah amit.s...@redhat.com writes: On (Wed) 10 Sep 2014 [14:11:36], Amos Kong wrote: When we try to hot-remove a busy virtio-rng device from QEMU monitor, the device can't be hot-removed. Because virtio-rng driver hangs at wait_for_completion_killable(). This patch exits the waiting by completing have_data completion before unregistering, resets data_avail to avoid the hwrng core use wrong buffer bytes. Signed-off-by: Amos Kong ak...@redhat.com Cc: sta...@vger.kernel.org Reviewed-by: Amit Shah amit.s...@redhat.com Thanks, applied. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 1/2] virtio: support for urgent descriptors
Michael S. Tsirkin m...@redhat.com writes: Below should be useful for some experiments Jason is doing. I thought I'd send it out for early review/feedback. Compiled-only at this point. It's not a terrible idea, but it will come down to how effective it is in practice. I'm tempted to make it v1.0 only though. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Using virtio for inter-VM communication
Jan Kiszka jan.kis...@siemens.com writes: On 2014-06-12 04:27, Rusty Russell wrote: Henning Schild henning.sch...@siemens.com writes: It was also never implemented, and remains a thought experiment. However, implementing it in lguest should be fairly easy. The reason why a trusted helper, i.e. additional logic in the hypervisor, is not our favorite solution is that we'd like to keep the hypervisor as small as possible. I wouldn't exclude such an approach categorically, but we have to weigh the costs (lines of code, additional hypervisor interface) carefully against the gain (existing specifications and guest driver infrastructure). Reasonable, but I think you'll find it is about the minimal implementation in practice. Unfortunately, I don't have time during the next 6 months to implement it myself :( Back to VIRTIO_F_RING_SHMEM_ADDR (which you once brought up in an MCA working group discussion): What speaks against introducing an alternative encoding of addresses inside virtio data structures? The idea of this flag was to replace guest-physical addresses with offsets into a shared memory region associated with or part of a virtio device. We would also need a way of defining the shared memory region. But that's not the problem. If such a feature is not accepted by the guest? How to you fall back? We don't add features which unmake the standard. That would preserve zero-copy capabilities (as long as you can work against the shared mem directly, e.g. doing DMA from a physical NIC or storage device into it) and keep the hypervisor out of the loop. This seems ill thought out. How will you program a NIC via the virtio protocol without a hypervisor? And how will you make it safe? You'll need an IOMMU. But if you have an IOMMU you don't need shared memory. Is it too invasive to existing infrastructure or does it have some other pitfalls? You'll have to convince every vendor to implement your addition to the standard. Which is easier than inventing a completely new system, but it's not quite virtio. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Using virtio for inter-VM communication
Henning Schild henning.sch...@siemens.com writes: Hi, i am working on the jailhouse[1] project and am currently looking at inter-VM communication. We want to connect guests directly with virtual consoles based on shared memory. The code complexity in the hypervisor should be minimal, it should just make the shared memory discoverable and provide a signaling mechanism. Hi Henning, The virtio assumption was that the host can see all of guest memory. This simplifies things significantly, and makes it efficient. If you don't have this, *someone* needs to do a copy. Usually the guest OS does a bounce buffer into your shared region. Goodbye performance. Or you can play remapping tricks. Goodbye performance again. My preferred model is to have a trusted helper (ie. host) which understands how to copy between virtio rings. The backend guest (to steal Xen vocab) R/O maps the descriptor, avail ring and used rings in the guest. It then asks the trusted helper to do various operation (copy into writable descriptor, copy out of readable descriptor, mark used). The virtio ring itself acts as a grant table. Note: that helper mechanism is completely protocol agnostic. It was also explicitly designed into the virtio mechanism (with its 4k boundaries for data structures and its 'len' field to indicate how much was written into the descriptor). It was also never implemented, and remains a thought experiment. However, implementing it in lguest should be fairly easy. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio specification: OOO completion of the xmit buffers in the networking device
Vlad Zolotarov vl...@cloudius-systems.com writes: Rusty, hi! I'd like to ask for a small clarification about the virtio spec. The virtio specification is put the way that it allows the out-of-order completions in general. Although it states that it's mostly relevant to the virtio-blk it doesn't explicitly bans it for virtio networking devices (for generality?). Indeed. (AFAICS the current virtio HV implementations make sure there is no OOO Tx buffers completions). I wonder if this feature (OOO Tx buffers completions) seems practical to u and if not can I hope that it could be clearly stated in the future virtio specification releases that it's forbidden? Are you sure that vhost-net doesn't allow async completion already? I can't immediately see why it can't happen, but MST is CC'd. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 9/9] kvmtool: virtio: enable arm/arm64 support for bi-endianness
Alexander Graf ag...@suse.de writes: On 05/07/2014 11:57 AM, Marc Zyngier wrote: How virtio implementations should determine their endianness is a spec question, I think; at any rate QEMU and kvmtool ought to agree on how it's done. I think the most recent suggestion on the QEMU mailing list (for PPC) is that we should care about the guest kernel endianness, but I don't know if anybody thought of the pass-through-to-userspace usecase... Current opinion on the qemu-devel thread seems to be that we should just define that the endianness of the virtio device is the endianness of the guest kernel at the point where the guest triggers a reset of the virtio device by writing zero the QueuePFN or Status registers. Virtio by design has full access to guest physical memory. It doesn't route DMA via PCI. So user space drivers simply don't make sense here. Huh? What if my guest has usespace using an idmap, with Stage-1 MMU for isolation only (much like an MPU)? R-class guests anyone? Agreed, this is not the general use case, but that doesn't seem to be completely unrealistic either. Yes, and once that user tries the same without idmap virtio ends up overwriting random memory. It's just not a good idea and I'd much rather see us solve this properly with virtio 1.0 really. Slightly orthogonal: virtio 1.0 is LE, so endianness is solved. Of course alternatively we can continue bikeshedding about this until everything becomes moot because we switched to virtio 1.0 ;). Transition will be long... Rusty / Michael, virtio 1.0 does go via normal DMA channels, right? No. We argued about this; it's more PCI-like to do, but there's a performance cost and it's really unclear that passing through a virtio PCI device to a (sub)guest is a scenario worth supporting. Maybe someone will come up with a convincing reason, and we'll add a feature bit in a future revision... Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio: make udp more efficient by avoiding indirect desc
Qin Chuanyu qinchua...@huawei.com writes: udp packet use 2 buffers at least, one for vnet_hdr and one for skb-data. we could change the threshold from 2 to 3, so the udp packet which data buff only using single desc will gain from this. the guest would avoid from allocating memory dynamically. the host would avoid from translating indirect desc. Signed-off-by: Chuanyu Qin qinchua...@huawei.com We should really adapt the threshold depending on how full the ring is. If it's gotten cramped recently, prefer indirect. See linux/average.h. Such a change would have to be benchmarked carefully though! Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue
Luiz Capitulino lcapitul...@redhat.com writes: On Fri, 17 Jan 2014 09:10:47 +1030 Rusty Russell ru...@rustcorp.com.au wrote: Luiz Capitulino lcapitul...@redhat.com writes: From: Luiz capitulino lcapitul...@redhat.com This commit adds support to a new virtqueue called message virtqueue. OK, this needs a lot of thought (especially since reworking the virtio balloon is on the TODO list for the OASIS virtio technical committee...) But AFAICT this is a way of explicitly saying no to the host's target (vs the implicit method of just not meeting the target). I'm not sure that gives enough information to the host. On the other hand, I'm not sure what information we *can* give. Should we instead be counter-proposing a target? The problem is how to estimate a target value. I found it simpler to just try to obey what the host is asking for (and fail if not possible) than trying to make the guest negotiate with the host. Understood, but we already do this the other way where the host tells the guest how much memory to give up. And is a guest expected to automatically inflate the balloon even if the host doesn't want the memory, or wait to be asked? What does qemu do with this information? There are two possible scenarios: 1. The balloon driver is currently inflating when it gets under pressure QEMU resets num_pages to the current balloon size. This cancels the on-going inflate 2. The balloon driver is not inflating, eg. it's possibly sleeping QEMU issues a deflate But note that those scenarios are not supposed to be used with the current device, they are part of the automatic ballooning feature. I CC'ed you on the QEMU patch, you can find it here case you didn't see it: http://marc.info/?l=kvmm=138988966315690w=2 Yes, caught that after I replied; I should have checked first. It seems like we are still figuring out how to do ballooning. The best approach in cases like this is to make it simple, so I don't hate this. But note that Daniel Kiper and I have been discussing a new virtio balloon for draft 2 of the standard. I'll CC you when I post that to one of the OASIS virtio mailing lists. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue
Luiz Capitulino lcapitul...@redhat.com writes: From: Luiz capitulino lcapitul...@redhat.com This commit adds support to a new virtqueue called message virtqueue. OK, this needs a lot of thought (especially since reworking the virtio balloon is on the TODO list for the OASIS virtio technical committee...) But AFAICT this is a way of explicitly saying no to the host's target (vs the implicit method of just not meeting the target). I'm not sure that gives enough information to the host. On the other hand, I'm not sure what information we *can* give. Should we instead be counter-proposing a target? What does qemu do with this information? Thanks, Rusty. The message virtqueue can be used by guests to notify the host about important memory-related state changes in the guest. Currently, the only implemented notification is the guest is under pressure one, which informs the host that the guest is under memory pressure. This notification can be used to implement automatic memory ballooning in the host. For example, once learning that the guest is under pressure, the host could cancel an on-going inflate and/or start a deflate operation. Doing this through a virtqueue might seem like overkill, as all we're doing is to transfer an integer between guest and host. However, using a virtqueue offers the following advantages: 1. We can realibly synchronize host and guest. That is, the guest will only continue once the host acknowledges the notification. This is important, because if the guest gets under pressure while inflating the balloon, it has to stop to give the host a chance to reset num_pages (see next commit) 2. It's extensible. We may (or may not) want to tell the host which pressure level the guest finds itself in (ie. low, medium or critical) The lightweight alternative is to use a configuration space parameter. For this to work though, the guest would have to wait the for host to acknowloedge the receipt of a configuration change update. I could try this if the virtqueue is too overkill. Finally, the guest learns it's under pressure by registering a callback with the in-kernel vmpressure API. FIXMEs: - vmpressure API is missing an de-registration routine - not completely sure my changes in virtballoon_probe() are correct Signed-off-by: Luiz capitulino lcapitul...@redhat.com --- drivers/virtio/virtio_balloon.c | 93 + include/uapi/linux/virtio_balloon.h | 1 + 2 files changed, 84 insertions(+), 10 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 5c4a95b..1c3ee71 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -29,6 +29,9 @@ #include linux/module.h #include linux/balloon_compaction.h +#include linux/cgroup.h +#include linux/vmpressure.h + /* * Balloon device works in 4K page units. So each page is pointed to by * multiple balloon pages. All memory counters in this driver are in balloon @@ -37,10 +40,12 @@ #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE VIRTIO_BALLOON_PFN_SHIFT) #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256 +#define VIRTIO_BALLOON_MSG_PRESSURE 1 + struct virtio_balloon { struct virtio_device *vdev; - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *message_vq; /* Where the ballooning thread waits for config to change. */ wait_queue_head_t config_change; @@ -51,6 +56,8 @@ struct virtio_balloon /* Waiting for host to ack the pages we released. */ wait_queue_head_t acked; + wait_queue_head_t message_acked; + /* Number of balloon pages we've told the Host we're not using. */ unsigned int num_pages; /* @@ -71,6 +78,9 @@ struct virtio_balloon /* Memory statistics */ int need_stats_update; struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR]; + + /* Message virtqueue */ + atomic_t guest_pressure; }; static struct virtio_device_id id_table[] = { @@ -78,6 +88,41 @@ static struct virtio_device_id id_table[] = { { 0 }, }; +static inline bool guest_under_pressure(const struct virtio_balloon *vb) +{ + return atomic_read(vb-guest_pressure) == 1; +} + +static void vmpressure_event_handler(void *data, int level) +{ + struct virtio_balloon *vb = data; + + atomic_set(vb-guest_pressure, 1); + wake_up(vb-config_change); +} + +static void tell_host_pressure(struct virtio_balloon *vb) +{ + const uint32_t msg = VIRTIO_BALLOON_MSG_PRESSURE; + struct scatterlist sg; + unsigned int len; + int err; + + sg_init_one(sg, msg, sizeof(msg)); + + err = virtqueue_add_outbuf(vb-message_vq, sg, 1, vb, GFP_KERNEL); + if (err 0) { + printk(KERN_WARNING virtio-balloon: failed to send host message
Re: [PATCH V2] virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze
Jason Wang jasow...@redhat.com writes: From: Asias He asias.he...@gmail.com vqs are freed in virtscsi_freeze but the hotcpu_notifier is not unregistered. We will have a use-after-free usage when the notifier callback is called after virtscsi_freeze. Fixes: 285e71ea6f3583a85e27cb2b9a7d8c35d4c0d558 (virtio-scsi: reset virtqueue affinity when doing cpu hotplug) Sorry, I missed this in my earlier sweep (I was only skimming emails during the break). Applied, Rusty. Cc: sta...@vger.kernel.org Signed-off-by: Asias He asias.he...@gmail.com Reviewed-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- Changes from V1: - Add Fixes line - CC stable --- drivers/scsi/virtio_scsi.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index c3173dc..16bfd50 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -956,6 +956,10 @@ static void virtscsi_remove(struct virtio_device *vdev) #ifdef CONFIG_PM_SLEEP static int virtscsi_freeze(struct virtio_device *vdev) { + struct Scsi_Host *sh = virtio_scsi_host(vdev); + struct virtio_scsi *vscsi = shost_priv(sh); + + unregister_hotcpu_notifier(vscsi-nb); virtscsi_remove_vqs(vdev); return 0; } @@ -964,8 +968,17 @@ static int virtscsi_restore(struct virtio_device *vdev) { struct Scsi_Host *sh = virtio_scsi_host(vdev); struct virtio_scsi *vscsi = shost_priv(sh); + int err; + + err = virtscsi_init(vdev, vscsi); + if (err) + return err; + + err = register_hotcpu_notifier(vscsi-nb); + if (err) + vdev-config-del_vqs(vdev); - return virtscsi_init(vdev, vscsi); + return err; } #endif -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze
Jason Wang jasow...@redhat.com writes: On 10/28/2013 04:01 PM, Asias He wrote: vqs are freed in virtscsi_freeze but the hotcpu_notifier is not unregistered. We will have a use-after-free usage when the notifier callback is called after virtscsi_freeze. Signed-off-by: Asias He as...@redhat.com Please include a Fixes: line, especially if you want the CC: stable. Thanks, Rusty. --- drivers/scsi/virtio_scsi.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 74b88ef..b26f1a5 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -957,6 +957,10 @@ static void virtscsi_remove(struct virtio_device *vdev) #ifdef CONFIG_PM static int virtscsi_freeze(struct virtio_device *vdev) { +struct Scsi_Host *sh = virtio_scsi_host(vdev); +struct virtio_scsi *vscsi = shost_priv(sh); + +unregister_hotcpu_notifier(vscsi-nb); virtscsi_remove_vqs(vdev); return 0; } @@ -965,8 +969,17 @@ static int virtscsi_restore(struct virtio_device *vdev) { struct Scsi_Host *sh = virtio_scsi_host(vdev); struct virtio_scsi *vscsi = shost_priv(sh); +int err; + +err = virtscsi_init(vdev, vscsi); +if (err) +return err; + +err = register_hotcpu_notifier(vscsi-nb); +if (err) +vdev-config-del_vqs(vdev); -return virtscsi_init(vdev, vscsi); +return err; } #endif Ping. Rusty, could you please review and apply this patch? Thanks -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_balloon: update_balloon_size(): update correct field
Luiz Capitulino lcapitul...@redhat.com writes: According to the virtio spec, the device configuration field that should be updated after an inflation or deflation operation is the 'actual' field, not the 'num_pages' one. Commit 855e0c5288177bcb193f6f6316952d2490478e1c swapped them in update_balloon_size(). Fix it. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com Damn, exactly right. Good catch. Applied, Rusty. drivers/virtio/virtio_balloon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index c444654..5c4a95b 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -285,7 +285,7 @@ static void update_balloon_size(struct virtio_balloon *vb) { __le32 actual = cpu_to_le32(vb-num_pages); - virtio_cwrite(vb-vdev, struct virtio_balloon_config, num_pages, + virtio_cwrite(vb-vdev, struct virtio_balloon_config, actual, actual); } -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] virtio: mmio: fix signature checking for BE guests
Pawel Moll pawel.m...@arm.com writes: On Tue, 2013-11-05 at 03:36 +, Rusty Russell wrote: This particular can is empty - all worms already escaped :-) I just thought that if you wait for 1.0, it will always be little-endian, and if the current qemu only supported little-endian your life might be simpler inside qemu. But I'm sure Marc doesn't want to wait :) It's a bugfix, not changing anything for the existing hosts. So - yes please, do merge this one. Done, thanks. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] virtio: mmio: fix signature checking for BE guests
Rusty Russell ru...@rustcorp.com.au writes: Pawel Moll pawel.m...@arm.com writes: On Fri, 2013-10-11 at 15:36 +0100, Marc Zyngier wrote: As virtio-mmio config registers are specified to be little-endian, using readl() to read the magic value and then memcmp() to check it fails on BE (as readl() has an implicit swab). Fix it by encoding the magic value as an integer instead of a string. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: Pawel Moll pawel.m...@arm.com Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- drivers/virtio/virtio_mmio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 1ba0d68..57f24fd 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -470,7 +470,7 @@ static int virtio_mmio_probe(struct platform_device *pdev) /* Check magic value */ magic = readl(vm_dev-base + VIRTIO_MMIO_MAGIC_VALUE); - if (memcmp(magic, virt, 4) != 0) { + if (magic != ('v' | 'i' 8 | 'r' 16 | 't' 24)) { dev_warn(pdev-dev, Wrong magic value 0x%08lx!\n, magic); return -ENODEV; } The new spec will clarify this: * 0x000 | R | MagicValue Magic value. Must be 0x74726976 (a Little Endian equivalent of a virt string). ... but I like the 'v'i'r't' characters still being there :-) Acked-by: Pawel Moll pawel.m...@arm.com Applied, thanks. OK, I *said* applied, but left it in my pending queue. Pawel, do you want to open this can of worms? If so, I'l merge this now. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] virtio-mmio: handle BE guests on LE hosts
Marc Zyngier marc.zyng...@arm.com writes: On 14/10/13 09:21, Rusty Russell wrote: Marc Zyngier marc.zyng...@arm.com writes: This small patch series adds just enough kernel infrastructure and fixes to allow a BE guest to use virtio-mmio on a LE host, provided that the host actually supports such madness. This has been tested on arm64, with some fixes to KVM and a set of changes to kvmtool, both which I am posting separately. OK, so I already have a patch which supports config space accessors. I've posted the series below, and since you want it, I'll put it in virtio-next after more testing and rebasing... Yes, that's definitely something I'd like to see being merged, as it would allow me to drop a significant chunk of changes. OK, I've just put them into virtio-next. But we won't be using feature bits for endianness, since we're defining endian in the 1.0 spec (whether LE-everywhere or LE-for-mmio-and-PCI is still debated). Since we need to guess for backwards compat anyway, let's keep doing this until v1.0? When is 1.0 going to happen? When will actual implementation of drivers and devices show up in my favourite platform emulation? 1.0 won't be finalized until early next year. We aim to publish the first draft next month, but noone should finalize implementations until the feedback process is complete. Having a grand plan for the future is great, but I need something working right now, or at least fairly soonish... And I need it to be backward compatible, as none of the above is going to show up overnight. Well, mmio BE won't work at all right now due to the signature check being wrong in Linux. So there are two choices: (1) fix it, and use heuristics to figure out if the guest is BE, or (2) say there's no BE mmio, and wait for 1.0. I don't know what your timeline is: you might need to chat with Pawel internally. BTW, for qemu and PPC (though virtio-pci, not mmio) I look at the guest interrupt delivery endian bit upon any virtio device reset to guess endian. Since Linux guests reset the device before doing anything else, this works well, and supports crazy stuff like kexec a kernel in the other endian. (Yeah, I made this mess with native endian. I promise I have learnt my lesson). Paracetamol bill coming you way... ;-) Now I just have to figure out where to send my equivalent bill... Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] virtio-mmio: handle BE guests on LE hosts
Marc Zyngier marc.zyng...@arm.com writes: This small patch series adds just enough kernel infrastructure and fixes to allow a BE guest to use virtio-mmio on a LE host, provided that the host actually supports such madness. This has been tested on arm64, with some fixes to KVM and a set of changes to kvmtool, both which I am posting separately. OK, so I already have a patch which supports config space accessors. I've posted the series below, and since you want it, I'll put it in virtio-next after more testing and rebasing... But we won't be using feature bits for endianness, since we're defining endian in the 1.0 spec (whether LE-everywhere or LE-for-mmio-and-PCI is still debated). Since we need to guess for backwards compat anyway, let's keep doing this until v1.0? (Yeah, I made this mess with native endian. I promise I have learnt my lesson). Thanks, Rusty. Subject: virtio_config: introduce size-based accessors. This lets the us do endian conversion if necessary, and insulates the drivers from that change. Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 29b9104..e62acdd 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -162,5 +135,139 @@ int virtqueue_set_affinity(struct virtqueue *vq, int cpu) return 0; } +/* Config space accessors. */ +#define virtio_cread(vdev, structname, member, ptr)\ + do {\ + /* Must match the member's type, and be integer */ \ + if (!typecheck(typeofstructname*)0)-member)), *(ptr))) \ + (*ptr) = 1; \ + \ + switch (sizeof(*ptr)) { \ + case 1: \ + *(ptr) = virtio_cread8(vdev,\ + offsetof(structname, member)); \ + break; \ + case 2: \ + *(ptr) = virtio_cread16(vdev, \ + offsetof(structname, member)); \ + break; \ + case 4: \ + *(ptr) = virtio_cread32(vdev, \ + offsetof(structname, member)); \ + break; \ + case 8: \ + *(ptr) = virtio_cread64(vdev, \ + offsetof(structname, member)); \ + break; \ + default:\ + BUG(); \ + } \ + } while(0) + +/* Config space accessors. */ +#define virtio_cwrite(vdev, structname, member, ptr) \ + do {\ + /* Must match the member's type, and be integer */ \ + if (!typecheck(typeofstructname*)0)-member)), *(ptr))) \ + BUG_ON((*ptr) == 1);\ + \ + switch (sizeof(*ptr)) { \ + case 1: \ + virtio_cwrite8(vdev,\ + offsetof(structname, member),\ + *(ptr)); \ + break; \ + case 2: \ + virtio_cwrite16(vdev, \ + offsetof(structname, member), \ + *(ptr));\ + break; \ + case 4: \ + virtio_cwrite32(vdev, \ + offsetof(structname, member), \ + *(ptr));\ + break
Re: [PATCH 2/3] virtio: mmio: fix signature checking for BE guests
Pawel Moll pawel.m...@arm.com writes: On Fri, 2013-10-11 at 15:36 +0100, Marc Zyngier wrote: As virtio-mmio config registers are specified to be little-endian, using readl() to read the magic value and then memcmp() to check it fails on BE (as readl() has an implicit swab). Fix it by encoding the magic value as an integer instead of a string. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: Pawel Moll pawel.m...@arm.com Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- drivers/virtio/virtio_mmio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 1ba0d68..57f24fd 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -470,7 +470,7 @@ static int virtio_mmio_probe(struct platform_device *pdev) /* Check magic value */ magic = readl(vm_dev-base + VIRTIO_MMIO_MAGIC_VALUE); -if (memcmp(magic, virt, 4) != 0) { +if (magic != ('v' | 'i' 8 | 'r' 16 | 't' 24)) { dev_warn(pdev-dev, Wrong magic value 0x%08lx!\n, magic); return -ENODEV; } The new spec will clarify this: * 0x000 | R | MagicValue Magic value. Must be 0x74726976 (a Little Endian equivalent of a virt string). ... but I like the 'v'i'r't' characters still being there :-) Acked-by: Pawel Moll pawel.m...@arm.com Applied, thanks. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] virtio-mmio: handle BE guests on LE hosts
Marc Zyngier marc.zyng...@arm.com writes: Hi Michael, On 12/10/13 19:28, Michael S. Tsirkin wrote: On Fri, Oct 11, 2013 at 03:36:08PM +0100, Marc Zyngier wrote: This small patch series adds just enough kernel infrastructure and fixes to allow a BE guest to use virtio-mmio on a LE host, provided that the host actually supports such madness. This has been tested on arm64, with some fixes to KVM and a set of changes to kvmtool, both which I am posting separately. A branch containing all the relevant changes is at: git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/be-on-le-3.12-rc4 Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: Pawel Moll pawel.m...@arm.com We are changing the spec to make everything LE instead of the native endian. I think that'll fix the issue in a cleaner way. While I agree that it would solve the issue completely, it would also break all BE users. Is that really an option? The whole goal of this series is to implement something that gracefully falls back to what we have today, not breaking anything in the process. Pawel was in favor of incrementing the version number. OTOH, clearly there are no big endian users at the moment, since they fail the magic tests... Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] virtio-mmio: handle BE guests on LE hosts
Pawel Moll pawel.m...@arm.com writes: On Mon, 2013-10-14 at 11:46 +0100, Michael S. Tsirkin wrote: What I meant is that under the proposed scheme, users with existing v1 drivers must configure a v1 device explicitly. According to the plan, drivers will be updated so they can work with both v1 devices and new v2 devices. But if you configure a v2 device, old drivers will not work. That's correct, yes. If qemu 2015(16? 17?).01 will provide v2 devices only, kernel 3.11 is not going to recognize any of them. What will happen in reality, though, is that the implementation of the Linux v1/v2 driver will pre-date *any* v2 device by far. First implementation of v1 devices appeared almost 2 years after the driver appeared. It will take year for people to upgrade to v2, and probably only after they experience problems with page-based addressing mode. I think I can live with this. Yes, this makes sense. In particular, MMIO doesn't have any users you don't control at this point. I expect it to be less than 2 years to get into qemu, but there'll be no compelling reason to use v2 devices for at least that long. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] virtio-mmio: handle BE guests on LE hosts
Michael S. Tsirkin m...@redhat.com writes: On Mon, Oct 14, 2013 at 06:50:48PM +0200, Paolo Bonzini wrote: Il 14/10/2013 17:36, Marc Zyngier ha scritto: Devices are fine in QEMU, it's only the generic parts (rings) that are missing AFAICT. So if I understand correctly how it works, target endianness is set at compile time, and you have a BE specific QEMU? Yes. Though as Alex said, this will have to change for PPC little-endian support. Paolo Or we'll just say that this platform requires virtio 1.0. To come full circle, I have implemented virtio support for BE PPC in qemu :) And we'll be supporting pre-1.0 for the moment. I'm awaiting some ppc-specific hooks, but it could be merged without them. It's icky: we don't really want to use the current endianness of the CPU, since in theory a BE kernel could be running an LE process. So at the time of virtio device reset (the first thing the Linux drivers do) we read the endianness of the interrupt entry point, which is presumably the OS endianness. The result looks like this. In virtio_reset(void *opaque): /* We assume all devices are the same endian. */ virtio_byteswap = virtio_get_byteswap(); And accessors like this are used in all virtio devices and routines: /* Initialized by virtio_get_byteswap() at any virtio device reset. */ extern bool virtio_byteswap; static inline uint16_t virtio_lduw_phys(hwaddr pa) { if (virtio_byteswap) { return bswap16(lduw_phys(pa)); } return lduw_phys(pa); } static inline uint32_t virtio_ldl_phys(hwaddr pa) { if (virtio_byteswap) { return bswap32(ldl_phys(pa)); } return ldl_phys(pa); } static inline uint64_t virtio_ldq_phys(hwaddr pa) { if (virtio_byteswap) { return bswap64(ldq_phys(pa)); } return ldq_phys(pa); } static inline void virtio_stw_phys(hwaddr pa, uint16_t value) { if (virtio_byteswap) { stw_phys(pa, bswap16(value)); } else { stw_phys(pa, value); } } static inline void virtio_stl_phys(hwaddr pa, uint32_t value) { if (virtio_byteswap) { stl_phys(pa, bswap32(value)); } else { stl_phys(pa, value); } } static inline void virtio_stw_p(void *ptr, uint16_t v) { if (virtio_byteswap) { stw_p(ptr, bswap16(v)); } else { stw_p(ptr, v); } } static inline void virtio_stl_p(void *ptr, uint32_t v) { if (virtio_byteswap) { stl_p(ptr, bswap32(v)); } else { stl_p(ptr, v); } } static inline void virtio_stq_p(void *ptr, uint64_t v) { if (virtio_byteswap) { stq_p(ptr, bswap64(v)); } else { stq_p(ptr, v); } } static inline int virtio_lduw_p(const void *ptr) { if (virtio_byteswap) { return bswap16(lduw_p(ptr)); } else { return lduw_p(ptr); } } static inline int virtio_ldl_p(const void *ptr) { if (virtio_byteswap) { return bswap32(ldl_p(ptr)); } else { return ldl_p(ptr); } } static inline uint64_t virtio_ldq_p(const void *ptr) { if (virtio_byteswap) { return bswap64(ldq_p(ptr)); } else { return ldq_p(ptr); } } static inline uint32_t virtio_tswap32(uint32_t s) { if (virtio_byteswap) { return bswap32(tswap32(s)); } else { return tswap32(s); } } static inline void virtio_tswap32s(uint32_t *s) { tswap32s(s); if (virtio_byteswap) { *s = bswap32(*s); } } There are obvious optimizations to this, such as making virtio_byteswap a compile time constant on non-endian-ambivalent targets, but this is the current approach. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-scsi: Fix virtqueue affinity setup
Asias He as...@redhat.com writes: vscsi-num_queues counts the number of request virtqueue which does not include the control and event virtqueue. It is wrong to subtract VIRTIO_SCSI_VQ_BASE from vscsi-num_queues. This patch fixes the following panic. Applied. Thanks, Rusty. (qemu) device_del scsi0 BUG: unable to handle kernel NULL pointer dereference at 0020 IP: [8179b29f] __virtscsi_set_affinity+0x6f/0x120 PGD 0 Oops: [#1] SMP Modules linked in: CPU: 0 PID: 659 Comm: kworker/0:1 Not tainted 3.11.0-rc2+ #1172 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: kacpi_hotplug _handle_hotplug_event_func task: 88007bee1cc0 ti: 88007bfe4000 task.ti: 88007bfe4000 RIP: 0010:[8179b29f] [8179b29f] __virtscsi_set_affinity+0x6f/0x120 RSP: 0018:88007bfe5a38 EFLAGS: 00010202 RAX: 0010 RBX: 880077fd0d28 RCX: 0050 RDX: RSI: 0246 RDI: RBP: 88007bfe5a58 R08: 880077f6ff00 R09: 0001 R10: 8143e673 R11: 0001 R12: 0001 R13: 880077fd0800 R14: R15: 88007bf489b0 FS: () GS:88007ea0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 0020 CR3: 79f8b000 CR4: 06f0 Stack: 880077fd0d28 880077fd0800 0008 88007bfe5a78 8179b37d 88007bccc800 88007bccc800 88007bfe5a98 8179b3b6 88007bccc800 880077fd0d28 Call Trace: [8179b37d] virtscsi_set_affinity+0x2d/0x40 [8179b3b6] virtscsi_remove_vqs+0x26/0x50 [8179c7d2] virtscsi_remove+0x82/0xa0 [814cb6b2] virtio_dev_remove+0x22/0x70 [8167ca49] __device_release_driver+0x69/0xd0 [8167cb9d] device_release_driver+0x2d/0x40 [8167bb96] bus_remove_device+0x116/0x150 [81679936] device_del+0x126/0x1e0 [81679a06] device_unregister+0x16/0x30 [814cb889] unregister_virtio_device+0x19/0x30 [814cdad6] virtio_pci_remove+0x36/0x80 [81464ae7] pci_device_remove+0x37/0x70 [8167ca49] __device_release_driver+0x69/0xd0 [8167cb9d] device_release_driver+0x2d/0x40 [8167bb96] bus_remove_device+0x116/0x150 [81679936] device_del+0x126/0x1e0 [8145edfc] pci_stop_bus_device+0x9c/0xb0 [8145f036] pci_stop_and_remove_bus_device+0x16/0x30 [81474a9e] acpiphp_disable_slot+0x8e/0x150 [81474f6a] hotplug_event_func+0xba/0x1a0 [814906c8] ? acpi_os_release_object+0xe/0x12 [81475911] _handle_hotplug_event_func+0x31/0x70 [810b5333] process_one_work+0x183/0x500 [810b66e2] worker_thread+0x122/0x400 [810b65c0] ? manage_workers+0x2d0/0x2d0 [810bc5de] kthread+0xce/0xe0 [810bc510] ? kthread_freezable_should_stop+0x70/0x70 [81ca045c] ret_from_fork+0x7c/0xb0 [810bc510] ? kthread_freezable_should_stop+0x70/0x70 Code: 01 00 00 00 74 59 45 31 e4 83 bb c8 01 00 00 02 74 46 66 2e 0f 1f 84 00 00 00 00 00 49 63 c4 48 c1 e0 04 48 8b bc 0 3 10 02 00 00 48 8b 47 20 48 8b 80 d0 01 00 00 48 8b 40 50 48 85 c0 74 07 be RIP [8179b29f] __virtscsi_set_affinity+0x6f/0x120 RSP 88007bfe5a38 CR2: 0020 ---[ end trace 99679331a3775f48 ]--- CC: sta...@vger.kernel.org Signed-off-by: Asias He as...@redhat.com --- drivers/scsi/virtio_scsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index 2168258..74b88ef 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -751,7 +751,7 @@ static void __virtscsi_set_affinity(struct virtio_scsi *vscsi, bool affinity) vscsi-affinity_hint_set = true; } else { - for (i = 0; i vscsi-num_queues - VIRTIO_SCSI_VQ_BASE; i++) + for (i = 0; i vscsi-num_queues; i++) virtqueue_set_affinity(vscsi-req_vqs[i].vq, -1); vscsi-affinity_hint_set = false; -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio: include asm/barrier explicitly
Michael S. Tsirkin m...@redhat.com writes: virtio_ring.h uses mb() and friends, make it pull in asm/barrier.h itself, not rely on other headers to do it. Signed-off-by: Michael S. Tsirkin m...@redhat.com Applied. Thanks, Rusty. PS. I'll squeeze these into this merge window, since they're trivial. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] tools/virtio: move module license stub to module.h
Michael S. Tsirkin m...@redhat.com writes: This fixes build for the vringh test: [linux]$ make -C tools/virtio/ make: Entering directory `/home/mst/scm/linux/tools/virtio' cc -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -c -o vringh.o ../../drivers/vhost/vringh.c ../../drivers/vhost/vringh.c:1010:16: error: expected declaration specifiers or ‘...’ before string constant Signed-off-by: Michael S. Tsirkin m...@redhat.com --- tools/virtio/linux/module.h | 5 + tools/virtio/linux/virtio.h | 3 --- 2 files changed, 5 insertions(+), 3 deletions(-) Applied, thanks. Cheers, Rusty. diff --git a/tools/virtio/linux/module.h b/tools/virtio/linux/module.h index 3039a7e..28ce95a 100644 --- a/tools/virtio/linux/module.h +++ b/tools/virtio/linux/module.h @@ -1 +1,6 @@ #include linux/export.h + +#define MODULE_LICENSE(__MODULE_LICENSE_value) \ + static __attribute__((unused)) const char *__MODULE_LICENSE_name = \ + __MODULE_LICENSE_value + diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h index cd80183..8447830 100644 --- a/tools/virtio/linux/virtio.h +++ b/tools/virtio/linux/virtio.h @@ -45,9 +45,6 @@ struct virtqueue { void *priv; }; -#define MODULE_LICENSE(__MODULE_LICENSE_value) \ - const char *__MODULE_LICENSE_name = __MODULE_LICENSE_value - /* Interfaces exported by virtio_ring. */ int virtqueue_add_sgs(struct virtqueue *vq, struct scatterlist *sgs[], -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm_intel: Could not allocate 42 bytes percpu data
Chegu Vinod chegu_vi...@hp.com writes: On 6/30/2013 11:22 PM, Rusty Russell wrote: Chegu Vinod chegu_vi...@hp.com writes: Hello, Lots (~700+) of the following messages are showing up in the dmesg of a 3.10-rc1 based kernel (Host OS is running on a large socket count box with HT-on). [ 82.270682] PERCPU: allocation failed, size=42 align=16, alloc from reserved chunk failed [ 82.272633] kvm_intel: Could not allocate 42 bytes percpu data Woah, weird Oh. Shit. Um, this is embarrassing. Thanks, Rusty. Thanks for your response! === module: do percpu allocation after uniqueness check. No, really! v3.8-rc1-5-g1fb9341 was supposed to stop parallel kvm loads exhausting percpu memory on large machines: Now we have a new state MODULE_STATE_UNFORMED, we can insert the module into the list (and thus guarantee its uniqueness) before we allocate the per-cpu region. In my defence, it didn't actually say the patch did this. Just that we can. This patch actually *does* it. Signed-off-by: Rusty Russell ru...@rustcorp.com.au Tested-by: Noone it seems. Your following updated fix seems to be working fine on the larger socket count machine with HT-on. OK, did you definitely revert every other workaround? If so, please give me a Tested-by: line... Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm_intel: Could not allocate 42 bytes percpu data
Chegu Vinod chegu_vi...@hp.com writes: On 7/1/2013 10:49 PM, Rusty Russell wrote: Chegu Vinod chegu_vi...@hp.com writes: On 6/30/2013 11:22 PM, Rusty Russell wrote: module: do percpu allocation after uniqueness check. No, really! v3.8-rc1-5-g1fb9341 was supposed to stop parallel kvm loads exhausting percpu memory on large machines: Now we have a new state MODULE_STATE_UNFORMED, we can insert the module into the list (and thus guarantee its uniqueness) before we allocate the per-cpu region. In my defence, it didn't actually say the patch did this. Just that we can. This patch actually *does* it. Signed-off-by: Rusty Russell ru...@rustcorp.com.au Tested-by: Noone it seems. Your following updated fix seems to be working fine on the larger socket count machine with HT-on. OK, did you definitely revert every other workaround? Yes no other workarounds were there when your change was tested. If so, please give me a Tested-by: line... FYI The actual verification of your change was done by my esteemed colleague :Jim Hull (cc'd) who had access to this larger socket count box. Tested-by: Jim Hull jim.h...@hp.com Thanks, I've put this in my -next tree, and CC'd stable. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
Alex Williamson alex.william...@redhat.com writes: On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote: On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote: IOMMU groups themselves don't provide security, they're accessed by interfaces like VFIO, which provide the security. Given a brief look, I agree, this looks like a possible backdoor. The typical VFIO way to handle this would be to pass a VFIO file descriptor here to prove that the process has access to the IOMMU group. This is how /dev/vfio/vfio gains the ability to setup an IOMMU domain an do mappings with the SET_CONTAINER ioctl using a group fd. Thanks, How do you envision that in the kernel ? IE. I'm in KVM code, gets that vfio fd, what do I do with it ? Basically, KVM needs to know that the user is allowed to use that iommu group. I don't think we want KVM however to call into VFIO directly right ? Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, It's actually not that bad. eg. struct vfio_container *vfio_container_from_file(struct file *filp) { if (filp-f_op != vfio_device_fops) return ERR_PTR(-EINVAL); /* OK it really is a vfio fd, return the data. */ } EXPORT_SYMBOL_GPL(vfio_container_from_file); ... inside KVM_CREATE_SPAPR_TCE_IOMMU: struct file *vfio_filp; struct vfio_container *(lookup)(struct file *filp); vfio_filp = fget(create_tce_iommu.fd); if (!vfio) ret = -EBADF; lookup = symbol_get(vfio_container_from_file); if (!lookup) ret = -EINVAL; else { container = lookup(vfio_filp); if (IS_ERR(container)) ret = PTR_ERR(container); else ... symbol_put(vfio_container_from_file); } symbol_get() won't try to load a module; it'll just fail. This is what you want, since they must have vfio in the kernel to get a valid fd... Hope that helps, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
Alex Williamson alex.william...@redhat.com writes: On Mon, 2013-06-17 at 13:56 +1000, Benjamin Herrenschmidt wrote: On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote: IOMMU groups themselves don't provide security, they're accessed by interfaces like VFIO, which provide the security. Given a brief look, I agree, this looks like a possible backdoor. The typical VFIO way to handle this would be to pass a VFIO file descriptor here to prove that the process has access to the IOMMU group. This is how /dev/vfio/vfio gains the ability to setup an IOMMU domain an do mappings with the SET_CONTAINER ioctl using a group fd. Thanks, How do you envision that in the kernel ? IE. I'm in KVM code, gets that vfio fd, what do I do with it ? Basically, KVM needs to know that the user is allowed to use that iommu group. I don't think we want KVM however to call into VFIO directly right ? Right, we don't want to create dependencies across modules. I don't have a vision for how this should work. This is effectively a complete side-band to vfio, so we're really just dealing in the iommu group space. Maybe there needs to be some kind of registration of ownership for the group using some kind of token. It would need to include some kind of notification when that ownership ends. That might also be a convenient tag to toggle driver probing off for devices in the group. Other ideas? Thanks, It's actually not that bad. eg. struct vfio_container *vfio_container_from_file(struct file *filp) { if (filp-f_op != vfio_device_fops) return ERR_PTR(-EINVAL); /* OK it really is a vfio fd, return the data. */ } EXPORT_SYMBOL_GPL(vfio_container_from_file); ... inside KVM_CREATE_SPAPR_TCE_IOMMU: struct file *vfio_filp; struct vfio_container *(lookup)(struct file *filp); vfio_filp = fget(create_tce_iommu.fd); if (!vfio) ret = -EBADF; lookup = symbol_get(vfio_container_from_file); if (!lookup) ret = -EINVAL; else { container = lookup(vfio_filp); if (IS_ERR(container)) ret = PTR_ERR(container); else ... symbol_put(vfio_container_from_file); } symbol_get() won't try to load a module; it'll just fail. This is what you want, since they must have vfio in the kernel to get a valid fd... Hope that helps, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mod filenames != modnames? (inconsistent name changing)
Linda Walsh l...@tlinx.org writes: Should we have any expectation that a module name and it's filename should be equivalent? I was writing an auto-complete script for modprobe so it wouldn't give me the option to double load a module (I'd have to manually type it in if I really wanted it). Trouble is some modules with names w/underscores ('_') change register their names with a 'dash'. But modprobe kvm_intel already works. Just convert any - to _? Would it be too much trouble to ask that the same names be used in both places? Well, you could get kernel developers to rename their files with underscores. I've CC'd them. We use _ because we turn the module names into identifiers; it's also the most common choice, and having both is just confusing. I have about 30 modules loaded now and out of that list, 2 are name changes: acpi_cpufreq (filename acip-cpufreq.ko) kvm_intel (filename kvm-intel.ko) If I load a bunch more... I find more that change names. Could the naming in the modules that do this be made consistent? Not doing so makes automatically tying a mod-name to it's file tend toward being non-deterministic. Many modules use '-' or '_' without being confused. It would be helpful if the few confused modules could get their identity straight. Is there a problem with making these names consistent? It shouldn't break anything to rename them, if there's consensus I'm happy to create a patch. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori aligu...@us.ibm.com writes: Hi Rusty, Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori aligu...@us.ibm.com writes: 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. Now you have a different compatibility problem; how do you know the guest supports the new virtio-pcie net? We don't care. We would still use virtio-pci for existing devices. Only new devices would use virtio-pcie. My concern is that new features make the virtio-pcie driver so desirable that libvirt is pressured to use it ASAP. Have we just punted the problem upstream? (Of course, feature bits were supposed to avoid such a transition issue, but mistakes accumulate). Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: By my count, net still has 7 feature bits left, so I don't think the feature bits are likely to be a limitation in the next 6 months? Yes but you wanted a generic transport feature bit for flexible SG layout. Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then? If we need it within 6 months, I'd rather do that: this feature would then be assumed for 1.0, and reserved. We may do that for other features, too (the committee will have to review). MMIO is a bigger problem. Linux guests are happy with it: does it break the Windows drivers? Thanks, Rusty. You mean make BAR0 an MMIO BAR? Yes, it would break current windows guests. Further, as long as we use same address to notify all queues, we would also need to decode the instruction on x86 and that's measureably slower than PIO. We could go back to discussing hypercall use for notifications, but that has its own set of issues... We might have something ready to deploy in 3 months, but realistically, I'd rather have it ready and tested outside the main git tree(s) and push it once the standard is committed. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori aligu...@us.ibm.com writes: 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. Now you have a different compatibility problem; how do you know the guest supports the new virtio-pcie net? If you put a virtio-pci card behind a PCI-e bridge today, it's not compliant, but AFAICT it will Just Work. (Modulo the 16-dev limit). I've been assuming we'd avoid a flag day change; that devices would look like existing virtio-pci with capabilities indicating the new config layout. I think 4 is the best path forward. It's better for users (guests continue to work as they always have). There's less confusion about enabling PCI-e support--you must ask for the virtio-pcie variant and you must have a virtio-pcie driver. It's easy to explain. Removing both forward and backward compatibility is easy to explain, but I think it'll be harder to deploy. This is your area though, so perhaps I'm wrong. It also maps to what regular hardware does. I highly doubt that there are any real PCI cards that made the shift from PCI to PCI-e without bumping at least a revision ID. Noone expected the new cards to Just Work with old OSes: a new machine meant a new OS and new drivers. Hardware vendors like that. Since virtualization often involves legacy, our priorities might be different. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote: Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori aligu...@us.ibm.com writes: Forcing a guest driver change is a really big deal and I see no reason to do that unless there's a compelling reason to. So we're stuck with the 1.0 config layout for a very long time. We definitely must not force a guest change. The explicit aim of the standard is that legacy and 1.0 be backward compatible. One deliverable is a document detailing how this is done (effectively a summary of changes between what we have and 1.0). If 2.0 is fully backwards compatible, great. It seems like such a difference that that would be impossible but I need to investigate further. Regards, Anthony Liguori If you look at my patches you'll see how it works. Basically old guests use BAR0 new ones don't, so it's easy: BAR0 access means legacy guest. Only started testing but things seem to work fine with old guests so far. I think we need a spec, not just driver code. Rusty what's the plan? Want me to write it? We need both, of course, but the spec work will happen in the OASIS WG. A draft is good, but let's not commit anything to upstream QEMU until we get the spec finalized. And that is proposed to be late this year. Well that would be quite sad really. This means we can't make virtio a spec compliant pci express device, and we can't add any more feature bits, so no flexible buffer optimizations for virtio net. There are probably more projects that will be blocked. So how about we keep extending legacy layout for a bit longer: - add a way to access device with MMIO - use feature bit 31 to signal 64 bit features (and shift device config accordingly) By my count, net still has 7 feature bits left, so I don't think the feature bits are likely to be a limitation in the next 6 months? MMIO is a bigger problem. Linux guests are happy with it: does it break the Windows drivers? Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote: Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori aligu...@us.ibm.com writes: Forcing a guest driver change is a really big deal and I see no reason to do that unless there's a compelling reason to. So we're stuck with the 1.0 config layout for a very long time. We definitely must not force a guest change. The explicit aim of the standard is that legacy and 1.0 be backward compatible. One deliverable is a document detailing how this is done (effectively a summary of changes between what we have and 1.0). If 2.0 is fully backwards compatible, great. It seems like such a difference that that would be impossible but I need to investigate further. Regards, Anthony Liguori If you look at my patches you'll see how it works. Basically old guests use BAR0 new ones don't, so it's easy: BAR0 access means legacy guest. Only started testing but things seem to work fine with old guests so far. I think we need a spec, not just driver code. Rusty what's the plan? Want me to write it? We need both, of course, but the spec work will happen in the OASIS WG. A draft is good, but let's not commit anything to upstream QEMU until we get the spec finalized. And that is proposed to be late this year. Since I'm going to have to reformat the spec and adapt it into OASIS style anyway, perhaps you should prepare a description as a standalone text document. Easier to email and work with... Now, the idea is that if you want to support 0.9 and 1.0 (or whatever we call them; I used the term legacy for existing implementations in the OASIS WG proposal), you add capabilities and don't point them into (the start of?) BAR0. Old drivers use BAR0 as now. One trick to note: while drivers shouldn't use both old and new style on the same device, you need to allow it for kexec, particularly reset via BAR0. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PATCH] virtio-spec: small English/punctuation corrections
Luiz Capitulino lcapitul...@redhat.com writes: 1. s/These are devices are/These devices are 2. s/Thefirst/The first 3. s/, Guest should/. Guest should Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- virtio-spec.lyx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) Thanks... I changed , Guest to . The guest, and re-entered the changes by hand to get change tracking. Thanks, Rusty. diff --git a/virtio-spec.lyx b/virtio-spec.lyx index 6e188d0..7e4ce71 100644 --- a/virtio-spec.lyx +++ b/virtio-spec.lyx @@ -116,7 +116,7 @@ description Peripheral Component Interconnect; a common device bus. Seehtt \end_inset devices. - These are devices are found in + These devices are found in \emph on virtual \emph default @@ -1558,7 +1558,7 @@ name sub:Feature-Bits \end_layout \begin_layout Standard -Thefirst configuration field indicates the features that the device supports. +The first configuration field indicates the features that the device supports. The bits are allocated as follows: \end_layout @@ -2919,7 +2919,7 @@ For each ring, guest should then disable interrupts by writing VRING_AVAIL_F_NO_ INTERRUPT flag in avail structure, if required. It can then process used ring entries finally enabling interrupts by clearing the VRING_AVAIL_F_NO_INTERRUPT flag or updating the EVENT_IDX field in - the available structure, Guest should then execute a memory barrier, and + the available structure. Guest should then execute a memory barrier, and then recheck the ring empty condition. This is necessary to handle the case where, after the last check and before enabling interrupts, an interrupt has been suppressed by the device: -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote: Yet the structure definitions are descriptive, capturing layout, size and endianness in natural a format readable by any C programmer. From an API design point of view, here are the problems I see: 1) C makes no guarantees about structure layout beyond the first member. Yes, if it's naturally aligned or has a packed attribute, GCC does the right thing. But this isn't kernel land anymore, portability matters and there are more compilers than GCC. [ I argued in detail here, then deleted. Damn bikeshedding... ] I think the best thing is to add the decoded integer versions as macros, and have a heap of BUILD_BUG_ON() in the kernel source to make sure they match. Hmm you want to have both in the header? That would confuse users for sure :) I guess we could put in a big comment explaning why do we have both, and which version should userspace use. I tried to write it: /* * On all known C compilers, you can use the * offsetof macro to find the correct offset of each field - * if your compiler doesn't implement this correctly, use the * integer versions below, instead. * In that case please don't use the structures above, to avoid confusion. */ My version was a little less verbose :) Subject: virtio_pci: macros for PCI layout offsets. QEMU wants it, so why not? Trust, but verify. Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 24bcd9b..6510009 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -453,6 +453,64 @@ static void __iomem *map_capability(struct pci_dev *dev, int off, size_t minlen, return p; } +/* This is part of the ABI. Don't screw with it. */ +static inline void check_offsets(void) +{ + /* Note: disk space was harmed in compilation of this function. */ + BUILD_BUG_ON(VIRTIO_PCI_CAP_VNDR != +offsetof(struct virtio_pci_cap, cap_vndr)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_NEXT != +offsetof(struct virtio_pci_cap, cap_next)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_LEN != +offsetof(struct virtio_pci_cap, cap_len)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_TYPE_AND_BAR != +offsetof(struct virtio_pci_cap, type_and_bar)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_OFFSET != +offsetof(struct virtio_pci_cap, offset)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_LENGTH != +offsetof(struct virtio_pci_cap, length)); + BUILD_BUG_ON(VIRTIO_PCI_NOTIFY_CAP_MULT != +offsetof(struct virtio_pci_notify_cap, + notify_off_multiplier)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_DFSELECT != +offsetof(struct virtio_pci_common_cfg, + device_feature_select)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_DF != +offsetof(struct virtio_pci_common_cfg, device_feature)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_GFSELECT != +offsetof(struct virtio_pci_common_cfg, + guest_feature_select)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_GF != +offsetof(struct virtio_pci_common_cfg, guest_feature)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_MSIX != +offsetof(struct virtio_pci_common_cfg, msix_config)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_NUMQ != +offsetof(struct virtio_pci_common_cfg, num_queues)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_STATUS != +offsetof(struct virtio_pci_common_cfg, device_status)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SELECT != +offsetof(struct virtio_pci_common_cfg, queue_select)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SIZE != +offsetof(struct virtio_pci_common_cfg, queue_size)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_MSIX != +offsetof(struct virtio_pci_common_cfg, queue_msix_vector)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_ENABLE != +offsetof(struct virtio_pci_common_cfg, queue_enable)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_NOFF != +offsetof(struct virtio_pci_common_cfg, queue_notify_off)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCLO != +offsetof(struct virtio_pci_common_cfg, queue_desc_lo)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCHI != +offsetof(struct virtio_pci_common_cfg, queue_desc_hi)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILLO != +offsetof(struct virtio_pci_common_cfg, queue_avail_lo)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILHI != +offsetof(struct virtio_pci_common_cfg, queue_avail_hi)); + BUILD_BUG_ON
Re: updated: kvm networking todo wiki
Anthony Liguori anth...@codemonkey.ws writes: Michael S. Tsirkin m...@redhat.com writes: On Thu, May 30, 2013 at 08:40:47AM -0500, Anthony Liguori wrote: Stefan Hajnoczi stefa...@gmail.com writes: On Thu, May 30, 2013 at 7:23 AM, Rusty Russell ru...@rustcorp.com.au wrote: Anthony Liguori anth...@codemonkey.ws writes: Rusty Russell ru...@rustcorp.com.au writes: On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: FWIW, I think what's more interesting is using vhost-net as a networking backend with virtio-net in QEMU being what's guest facing. In theory, this gives you the best of both worlds: QEMU acts as a first line of defense against a malicious guest while still getting the performance advantages of vhost-net (zero-copy). It would be an interesting idea if we didn't already have the vhost model where we don't need the userspace bounce. The model is very interesting for QEMU because then we can use vhost as a backend for other types of network adapters (like vmxnet3 or even e1000). It also helps for things like fault tolerance where we need to be able to control packet flow within QEMU. (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts). Then I'm really confused as to what this would look like. A zero copy sendmsg? We should be able to implement that today. On the receive side, what can we do better than readv? If we need to return to userspace to tell the guest that we've got a new packet, we don't win on latency. We might reduce syscall overhead with a multi-dimensional readv to read multiple packets at once? Sounds like recvmmsg(2). Could we map this to mergable rx buffers though? Regards, Anthony Liguori Yes because we don't have to complete buffers in order. What I meant though was for GRO, we don't know how large the received packet is going to be. Mergable rx buffers lets us allocate a pool of data for all incoming packets instead of allocating max packet size * max packets. recvmmsg expects an array of msghdrs and I presume each needs to be given a fixed size. So this seems incompatible with mergable rx buffers. Good point. You'd need to build 64k buffers to pass to recvmmsg, then reuse the parts it didn't touch on the next call. This limits us to about a 16th of what we could do with an interface which understood buffer merging, but I don't know how much that would matter in practice. We'd need some benchmarks Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: updated: kvm networking todo wiki
Stefan Hajnoczi stefa...@gmail.com writes: On Thu, May 30, 2013 at 7:23 AM, Rusty Russell ru...@rustcorp.com.au wrote: On the receive side, what can we do better than readv? If we need to return to userspace to tell the guest that we've got a new packet, we don't win on latency. We might reduce syscall overhead with a multi-dimensional readv to read multiple packets at once? Sounds like recvmmsg(2). Wow... the future is here, today! Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: updated: kvm networking todo wiki
Anthony Liguori anth...@codemonkey.ws writes: Rusty Russell ru...@rustcorp.com.au writes: On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: FWIW, I think what's more interesting is using vhost-net as a networking backend with virtio-net in QEMU being what's guest facing. In theory, this gives you the best of both worlds: QEMU acts as a first line of defense against a malicious guest while still getting the performance advantages of vhost-net (zero-copy). It would be an interesting idea if we didn't already have the vhost model where we don't need the userspace bounce. The model is very interesting for QEMU because then we can use vhost as a backend for other types of network adapters (like vmxnet3 or even e1000). It also helps for things like fault tolerance where we need to be able to control packet flow within QEMU. (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts). Then I'm really confused as to what this would look like. A zero copy sendmsg? We should be able to implement that today. On the receive side, what can we do better than readv? If we need to return to userspace to tell the guest that we've got a new packet, we don't win on latency. We might reduce syscall overhead with a multi-dimensional readv to read multiple packets at once? Confused, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_pci: fix capability format, comments
Michael S. Tsirkin m...@redhat.com writes: - queue size can actually be 0 which is not a power of 2 Actually, that points to a flaw in the code. When we shut down the queue, we should ideally reset it to what the device started with, rather than 0. See below. - fix capability format. PCI spec says: The layout of the information is vendor specific, except that the byte immediately following the “Next” pointer in the capability structure is defined to be a length field. This length field provides the number of bytes in the capability structure (including the ID and Next pointer bytes). That part's definitely correct: applied. Thanks, Rusty. Subjet: virtio_pci: save the desired ringsize. MST points out that 0 isn't a power of 2. This means we can't re-open a virtio device once we write 0 into the queue length. We should restore the amount the device originally asked for. Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/virtio/virtio_pci-common.h b/drivers/virtio/virtio_pci-common.h index ba1bf81..0e3143b 100644 --- a/drivers/virtio/virtio_pci-common.h +++ b/drivers/virtio/virtio_pci-common.h @@ -82,6 +82,9 @@ struct virtio_pci_vq_info { /* MSI-X vector (or none) */ unsigned msix_vector; + + /* What size did the device *want* this to be? */ + u16 desired_num; }; /* the notify function used when creating a virt queue */ diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c index 0f0e3a6..8b35c2e 100644 --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -230,6 +230,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, return ERR_PTR(-ENOMEM); info-msix_vector = msix_vec; + info-desired_num = num; /* get offset of notification word for this vq (shouldn't wrap) */ off = ioread16(vp_dev-common-queue_notify_off); @@ -350,7 +351,7 @@ static void del_vq(struct virtqueue *vq) vring_del_virtqueue(vq); /* This is for our own benefit, not the device's! */ - iowrite16(0, vp_dev-common-queue_size); + iowrite16(info-desired_num, vp_dev-common-queue_size); iowrite64_twopart(0, vp_dev-common-queue_desc); iowrite64_twopart(0, vp_dev-common-queue_avail); iowrite64_twopart(0, vp_dev-common-queue_used); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio-rng only returns zeros with CONFIG_HW_RANDOM=m
Jens Axboe ax...@kernel.dk writes: On Wed, Feb 27 2013, Rusty Russell wrote: Aurelien Jarno aurel...@aurel32.net writes: Hi, I have noticed that virtio-rng only returns zero for kernels = 2.6.33 built with CONFIG_HW_RANDOM=m. This is a bit much too predictable for a random generator ;-). Wow. Fortunately, all of SLES, RHEL, Ubuntu or Fedora set CONFIG_HW_RANDOM=y. What do they know that we don't? Oops, looks like Debian testing: config-3.2.0-4-amd64:CONFIG_HW_RANDOM=m The reason for that is virtio expects guest real addresses, while rng_core.ko (ie when built as a module) is passing a vmalloced buffer to the virtio-rng read function, declared as such: static u8 rng_buffer[SMP_CACHE_BYTES 32 ? 32 : SMP_CACHE_BYTES] __cacheline_aligned; Yuck... It would be nice if this has oopsed. Jens, what about this patch? Cheers, Rusty. Subject: scatterlist: sg_set_buf() argument must be in linear mapping. Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index 4bd6c06..9365375 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -111,6 +111,9 @@ static inline struct page *sg_page(struct scatterlist *sg) static inline void sg_set_buf(struct scatterlist *sg, const void *buf, unsigned int buflen) { +#ifdef CONFIG_DEBUG_SG +BUG_ON(!virt_addr_valid(buf)); +#endif sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf)); } Looks good to me, in lieu of being able to return an error. Want me to queue it up? -- Jens Axboe Ping? I haven't seen this go into Linus' tree... Subject: scatterlist: sg_set_buf() argument must be in linear mapping. Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index 4bd6c06..9365375 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -111,6 +111,9 @@ static inline struct page *sg_page(struct scatterlist *sg) static inline void sg_set_buf(struct scatterlist *sg, const void *buf, unsigned int buflen) { +#ifdef CONFIG_DEBUG_SG + BUG_ON(!virt_addr_valid(buf)); +#endif sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf)); } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori aligu...@us.ibm.com writes: Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori aligu...@us.ibm.com writes: Michael S. Tsirkin m...@redhat.com writes: +case offsetof(struct virtio_pci_common_cfg, device_feature_select): +return proxy-device_feature_select; Oh dear no... Please use defines like the rest of QEMU. It is pretty ugly. I think beauty is in the eye of the beholder here... Pretty much every device we have has a switch statement like this. Consistency wins when it comes to qualitative arguments like this. I was agreeing with you here, actually. Yet the structure definitions are descriptive, capturing layout, size and endianness in natural a format readable by any C programmer. From an API design point of view, here are the problems I see: 1) C makes no guarantees about structure layout beyond the first member. Yes, if it's naturally aligned or has a packed attribute, GCC does the right thing. But this isn't kernel land anymore, portability matters and there are more compilers than GCC. [ I argued in detail here, then deleted. Damn bikeshedding... ] I think the best thing is to add the decoded integer versions as macros, and have a heap of BUILD_BUG_ON() in the kernel source to make sure they match. 3) It suspect it's harder to review because a subtle change could more easily have broad impact. If someone changed the type of a field from u32 to u16, it changes the offset of every other field. That's not terribly obvious in the patch itself unless you understand how the structure is used elsewhere. MST's patch just did this, so point taken. (MST: I'm going to combine the cfg_type and bar bytes to fix this, patch coming). No change, but there's an open question on whether we should nail it to little endian (or define the endian by the transport). Of course, I can't rule out that the 1.0 standard *may* decide to frob the ring layout somehow, Well, given that virtio is widely deployed today, I would think the 1.0 standard should strictly reflect what's deployed today, no? That will be up to the committee. I think we want to fix some obvious pain points, though qemu will not benefit from them in the next 5 years. Any new config layout would be 2.0 material, right? Re: the new config layout, I don't think we would want to use it for anything but new devices. There are enough concrete reasons that I think we want it for existing devices: 1) Negotiated ring size/alignment. Coreboot wants smaller, others want larger. 2) Remove assertion that it has to be an I/O bar. PowerPC wants this. 3) Notification location flexibility. MST wanted this for performance. 4) More feature bits. Forcing a guest driver change is a really big deal and I see no reason to do that unless there's a compelling reason to. So we're stuck with the 1.0 config layout for a very long time. We definitely must not force a guest change. The explicit aim of the standard is that legacy and 1.0 be backward compatible. One deliverable is a document detailing how this is done (effectively a summary of changes between what we have and 1.0). It's a delicate balancing act. My plan is to accompany any changes in the standard with a qemu implementation, so we can see how painful those changes are. And if there are performance implications, measure them. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: updated: kvm networking todo wiki
Michael S. Tsirkin m...@redhat.com writes: On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Fri, May 24, 2013 at 05:41:11PM +0800, Jason Wang wrote: On 05/23/2013 04:50 PM, Michael S. Tsirkin wrote: Hey guys, I've updated the kvm networking todo wiki with current projects. Will try to keep it up to date more often. Original announcement below. Thanks a lot. I've added the tasks I'm currently working on to the wiki. btw. I notice the virtio-net data plane were missed in the wiki. Is the project still being considered? It might have been interesting several years ago, but now that linux has vhost-net in kernel, the only point seems to be to speed up networking on non-linux hosts. Data plane just means having a dedicated thread for virtqueue processing that doesn't hold qemu_mutex. Of course we're going to do this in QEMU. It's a no brainer. But not as a separate device, just as an improvement to the existing userspace virtio-net. Since non-linux does not have kvm, I doubt virtio is a bottleneck. FWIW, I think what's more interesting is using vhost-net as a networking backend with virtio-net in QEMU being what's guest facing. In theory, this gives you the best of both worlds: QEMU acts as a first line of defense against a malicious guest while still getting the performance advantages of vhost-net (zero-copy). Great idea, that sounds very intresting. I'll add it to the wiki. In fact a bit of complexity in vhost was put there in the vague hope to support something like this: virtio rings are not translated through regular memory tables, instead, vhost gets a pointer to ring address. This allows qemu acting as a man in the middle, verifying the descriptors but not touching the Anyone interested in working on such a project? It would be an interesting idea if we didn't already have the vhost model where we don't need the userspace bounce. We already have two sets of host side ring code in the kernel (vhost and vringh, though they're being unified). All an accelerator can offer on the tx side is zero copy and direct update of the used ring. On rx userspace could register the buffers and the accelerator could fill them and update the used ring. It still needs to deal with merged buffers, for example. You avoid the address translation in the kernel, but I'm not convinced that's a key problem. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori aligu...@us.ibm.com writes: Michael S. Tsirkin m...@redhat.com writes: +case offsetof(struct virtio_pci_common_cfg, device_feature_select): +return proxy-device_feature_select; Oh dear no... Please use defines like the rest of QEMU. It is pretty ugly. Yet the structure definitions are descriptive, capturing layout, size and endianness in natural a format readable by any C programmer. So AFAICT the question is, do we put the required #define VIRTIO_PCI_CFG_FEATURE_SEL \ (offsetof(struct virtio_pci_common_cfg, device_feature_select)) etc. in the kernel headers or qemu? Haven't looked at the proposed new ring layout yet. No change, but there's an open question on whether we should nail it to little endian (or define the endian by the transport). Of course, I can't rule out that the 1.0 standard *may* decide to frob the ring layout somehow, but I'd think it would require a compelling reason. I suggest that's 2.0 material... Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Paolo Bonzini pbonz...@redhat.com writes: Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto: + +switch (addr) { +case offsetof(struct virtio_pci_common_cfg, device_feature_select): +return proxy-device_feature_select; Oh dear no... Please use defines like the rest of QEMU. Any good reason not to use offsetof? I'm not sure it's portable to use it in case labels. IIRC, the definition ((int)(((T *)0)-field)) is not a valid C integer constant expression. Laszlo? It's defined to yield an integer constant expression in the ISO standard (and I think ANSI too, though that's not at hand): 7.19, para 3: ...offsetof(type, member-designator) which expands to an integer constant expression that has type size_t, ... The real question is whether compilers qemu cares about meet the standard (there's some evidence that older compilers fail this). If not, we'll have to define them as raw offsets... Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH rusty/virtio-pci-new-layout] virtio: new layout minor header fixups
Michael S. Tsirkin m...@redhat.com writes: Fix issues observed with the new layout code, seen when implementing device in qemu: - use of uXX in uapi header - incorrect readonly tag on one field - unconditional warning breaks builds with -Werr Signed-off-by: Michael S. Tsirkin m...@redhat.com Hmm, this means if we ever *do* remove those defines, qemu will break. But perhaps that's OK, because we'll probably remove legacy support at the same time as we break the header. Applied, thanks! Rusty. This patch is on top of rusty/virtio-pci-new-layout. include/uapi/linux/virtio_pci.h | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h index 3e61d55..cda688f 100644 --- a/include/uapi/linux/virtio_pci.h +++ b/include/uapi/linux/virtio_pci.h @@ -86,8 +86,10 @@ #define VIRTIO_PCI_LEGACY_VRING_ALIGN4096 #ifndef VIRTIO_PCI_NO_LEGACY +#ifndef VIRTIO_PCI_LEGACY_COMPAT_NAMES /* Don't break compile of old userspace code. These will go away. */ #warning Please support virtio_pci non-legacy mode! +#endif #define VIRTIO_PCI_HOST_FEATURES VIRTIO_PCI_LEGACY_HOST_FEATURES #define VIRTIO_PCI_GUEST_FEATURES VIRTIO_PCI_LEGACY_GUEST_FEATURES #define VIRTIO_PCI_QUEUE_PFN VIRTIO_PCI_LEGACY_QUEUE_PFN @@ -125,10 +127,10 @@ /* This is the PCI capability header: */ struct virtio_pci_cap { - u8 cap_vndr;/* Generic PCI field: PCI_CAP_ID_VNDR */ - u8 cap_next;/* Generic PCI field: next ptr. */ - u8 cfg_type;/* One of the VIRTIO_PCI_CAP_*_CFG. */ - u8 bar; /* Where to find it. */ + __u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */ + __u8 cap_next; /* Generic PCI field: next ptr. */ + __u8 cfg_type; /* One of the VIRTIO_PCI_CAP_*_CFG. */ + __u8 bar; /* Where to find it. */ __le32 offset; /* Offset within bar. */ __le32 length; /* Length. */ }; @@ -144,7 +146,7 @@ struct virtio_pci_common_cfg { __le32 device_feature_select; /* read-write */ __le32 device_feature; /* read-only */ __le32 guest_feature_select;/* read-write */ - __le32 guest_feature; /* read-only */ + __le32 guest_feature; /* read-write */ __le16 msix_config; /* read-write */ __le16 num_queues; /* read-only */ __u8 device_status; /* read-write */ -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: provide vhost thread per virtqueue for forwarding scenario
Jason Wang jasow...@redhat.com writes: On 05/22/2013 05:59 PM, Zang Hongyong wrote: On 2013/5/20 15:43, Michael S. Tsirkin wrote: On Mon, May 20, 2013 at 02:11:19AM +, Qinchuanyu wrote: Yes, I don't think we want to create threads even more aggressively in all cases. I'm worried about scalability as it is. I think we should explore a flexible approach, use a thread pool (for example, a wq) to share threads between virtqueues, switch to a separate thread only if there's free CPU and existing threads are busy. Hopefully share threads between vhost instances too. On Xen platform, network backend pv driver model has evolved to this way. Netbacks from all DomUs share a thread pool, and thread number eaqual to cpu core number. Is there any plan for kvm paltform? There used to be two related RFCs for this, one is the multiple vhost workers from Anthony another is percpu vhost thread from Shirley. You can search the archives on netdev or kvm for the patches. As I've said to MST before, I think our entire model is wrong. Userspace should create the threads and call in. If you're doing kernel acceleration, two extra threads per NIC is a tiny overhead. Of course, such radical changes to vhost doesn't help existing users as Qinchuanyu asked... Cheers, Rusty, -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] virtio_pci: better macro exported in uapi
Michael S. Tsirkin m...@redhat.com writes: Macro VIRTIO_PCI_CONFIG assumes that userspace actually has a structure with a field named msix_enabled. Add VIRTIO_PCI_CONFIG_OFF that gets the msix_enabled by value instead, to make it useful for userspace. We still keep VIRTIO_PCI_CONFIG around for now, in case some userspace uses it. Signed-off-by: Michael S. Tsirkin m...@redhat.com Applied. Thanks, Rusty. --- Changes from v1: - rename macro to VIRTIO_PCI_CONFIG_OFF so preprocessor can be used to check for its presence, as suggested by Rusty - keep old macro around as suggested by David Howells include/uapi/linux/virtio_pci.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h index ea66f3f..e5ec1ca 100644 --- a/include/uapi/linux/virtio_pci.h +++ b/include/uapi/linux/virtio_pci.h @@ -80,7 +80,9 @@ /* The remaining space is defined by each driver as the per-driver * configuration space */ -#define VIRTIO_PCI_CONFIG(dev) ((dev)-msix_enabled ? 24 : 20) +#define VIRTIO_PCI_CONFIG_OFF(msix_enabled) ((msix_enabled) ? 24 : 20) +/* Deprecated: please use VIRTIO_PCI_CONFIG_OFF instead */ +#define VIRTIO_PCI_CONFIG(dev) VIRTIO_PCI_CONFIG_OFF((dev)-msix_enabled) /* Virtio ABI version, this must match exactly */ #define VIRTIO_PCI_ABI_VERSION 0 -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_pci: fix macro exported in uapi
David Howells dhowe...@redhat.com writes: Rusty Russell ru...@rustcorp.com.au wrote: Macro still isn't usable, because userspace can't know whether it's the new or old. We need to either remove it from UAPI, or rename it to VIRTIO_PCI_CONFIG_OFF. Surely, if userspace is using it as is, you can't remove it, rename it or alter it? The point of the patch is that it's unusable: #define VIRTIO_PCI_CONFIG(dev) ((dev)-msix_enabled ? 24 : 20) ie. it's accessing a member of the kernel's virtio_pci_dev structure. In theory, userspace could have a structure with the same field and be using it now, but that's unlikely (qemu certainly doesn't). No harm no foul. In theory it's actually a useful macro, so we could expose it to userspace, but we'd need to have a new name so userspace can #ifdef for older headers... Hope that clarifies, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio_pci: fix macro exported in uapi
Michael S. Tsirkin m...@redhat.com writes: macro VIRTIO_PCI_CONFIG except in the unlikely event userspace actually has a structure with a field named msix_enabled. Get the msix_enabled by value instead, to make it useful for userspace. Signed-off-by: Michael S. Tsirkin m...@redhat.com Macro still isn't usable, because userspace can't know whether it's the new or old. We need to either remove it from UAPI, or rename it to VIRTIO_PCI_CONFIG_OFF. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost-scsi: Depend on NET for memcpy_fromiovec
Joe Perches j...@perches.com writes: On Thu, 2013-05-16 at 13:04 +0930, Rusty Russell wrote: Asias He as...@redhat.com writes: On Wed, May 15, 2013 at 02:47:53PM +0930, Rusty Russell wrote: [] Other users are using memcpy_fromiovec and friends outside net. It seems a good idea to put it in a util library. e.g. crypto/algif_skcipher.c which also depends on NET for it. [] Subject: Hoist memcpy_fromiovec into lib/ You'll need the friends memcpy_toiovec too. $ git grep -E \bmemcpy\w+iovec\w* crypto/algif_hash.c:err = memcpy_toiovec(msg-msg_iov, ctx-result, len); crypto/algif_skcipher.c:err = memcpy_fromiovec(page_address(sg_page(sg)) + crypto/algif_skcipher.c:err = memcpy_fromiovec(page_address(sg_page(sg + i)), drivers/dma/iovlock.c:#include net/tcp.h /* for memcpy_toiovec */ drivers/dma/iovlock.c: return memcpy_toiovec(iov, kdata, len); drivers/dma/iovlock.c: err = memcpy_toiovec(iov, vaddr + offset, len); drivers/isdn/mISDN/socket.c:if (memcpy_fromiovec(skb_put(skb, len), msg-msg_iov, len)) { drivers/misc/vmw_vmci/vmci_queue_pair.c:err = memcpy_fromiovec((u8 *)va + page_o drivers/misc/vmw_vmci/vmci_queue_pair.c:err = memcpy_toiovec(iov, (u8 *)va + pag Fascinating. These all indirectly depend on NET, so there's no problem at the moment. But it is a bit weird... crypto/algif_hash.c: depends on CRYPTO_USER_API_HASH - NET crypto/algif_skcipher.c: depends on CRYPTO_USER_API_SKCIPHER - NET drivers/dma/iovlock.c: depends on NET_DMA - NET drivers/isdn/mISDN/socket.c: depends on MISDN - ISDN - NET drivers/misc/vmw_vmci/vmci_queue_pair.c: depends on VMCI - NET Patch welcome. Meanwhile, to avoid more bikeshedding I've put the patch I posted with all acks in my fixes branch. One cycle through linux-next, then straight to Linus. Subject: Hoist memcpy_fromiovec into lib/ ERROR: memcpy_fromiovec [drivers/vhost/vhost_scsi.ko] undefined! That function is only present with CONFIG_NET. Turns out that crypto/algif_skcipher.c also uses that outside net, but it actually needs sockets anyway. socket.h already include uio.h, so no callers need updating. Reported-by: Randy Dunlap rdun...@infradead.org Acked-by: David S. Miller da...@davemloft.net Acked-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/include/linux/socket.h b/include/linux/socket.h index 428c37a..7266775 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -305,7 +305,6 @@ struct ucred { extern void cred_to_ucred(struct pid *pid, const struct cred *cred, struct ucred *ucred); -extern int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len); extern int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov, int offset, int len); extern int csum_partial_copy_fromiovecend(unsigned char *kdata, diff --git a/include/linux/uio.h b/include/linux/uio.h index 629aaf5..21628d3 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -35,4 +35,6 @@ static inline size_t iov_length(const struct iovec *iov, unsigned long nr_segs) } unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to); + +int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len); #endif diff --git a/lib/Makefile b/lib/Makefile index e9c52e1..2377211 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -9,7 +9,7 @@ endif lib-y := ctype.o string.o vsprintf.o cmdline.o \ rbtree.o radix-tree.o dump_stack.o timerqueue.o\ -idr.o int_sqrt.o extable.o \ +idr.o int_sqrt.o extable.o iovec.o \ sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \ proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \ is_single_threaded.o plist.o decompress.o kobject_uevent.o \ diff --git a/lib/iovec.c b/lib/iovec.c new file mode 100644 index 000..632c5ea --- /dev/null +++ b/lib/iovec.c @@ -0,0 +1,29 @@ +#include linux/uaccess.h +#include linux/export.h +#include linux/uio.h + +/* + * Copy iovec to kernel. Returns -EFAULT on error. + * + * Note: this modifies the original iovec. + */ + +int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len) +{ + while (len 0) { + if (iov-iov_len) { + int copy = min_t(unsigned int, len, iov-iov_len); + if (copy_from_user(kdata, iov-iov_base, copy)) + return -EFAULT; + len -= copy; + kdata += copy; + iov-iov_base += copy; + iov-iov_len -= copy; + } + iov++; + } + + return 0; +} +EXPORT_SYMBOL(memcpy_fromiovec); + diff --git a/net/core/iovec.c b/net/core/iovec.c index 7e7aeb0..d81257f 100644 --- a/net/core/iovec.c +++ b/net/core/iovec.c
Re: [PATCH] virtio_console: fix uapi header
Michael S. Tsirkin m...@redhat.com writes: uapi should use __u32 not u32. Fix a macro in virtio_console.h which uses u32. Signed-off-by: Michael S. Tsirkin m...@redhat.com Ouch. Added CC:stable, and put into fixes. Mainly because it's embarrassing :) Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost-scsi: Depend on NET for memcpy_fromiovec
Nicholas A. Bellinger n...@linux-iscsi.org writes: On Wed, 2013-05-15 at 14:47 +0930, Rusty Russell wrote: Asias He as...@redhat.com writes: scsi.c includes vhost.c which uses memcpy_fromiovec. This patch fixes this build failure. From Randy Dunlap: ''' on x86_64: ERROR: memcpy_fromiovec [drivers/vhost/vhost_scsi.ko] undefined! It needs to depend on NET since net/core/ provides that function. ''' Proper fix please. Though I can't see why you thought this was a good idea. Nonetheless, I shan't highlight why: I have far too much respect for your intellects and abilities. No, don't thank me! Hi Rusty Asias, I assume you mean something like the following patch to allow kbuild to work when VHOST_NET + VHOST_SCSI are both enabled and sharing vhost.o, yes..? No, that's a separate issue. memcpy_fromiovec() has nothing to do with networking: that was just the first user. Note that crypto/algif_skcipher.c also uses it. The obvious answer is to move it into lib/. OTOH making vhost_scsi depend on CONFIG_NET is breathtakingly lazy. I expect better from experienced kernel hackers :( Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost-scsi: Depend on NET for memcpy_fromiovec
Asias He as...@redhat.com writes: On Wed, May 15, 2013 at 02:47:53PM +0930, Rusty Russell wrote: Asias He as...@redhat.com writes: scsi.c includes vhost.c which uses memcpy_fromiovec. This patch fixes this build failure. From Randy Dunlap: ''' on x86_64: ERROR: memcpy_fromiovec [drivers/vhost/vhost_scsi.ko] undefined! It needs to depend on NET since net/core/ provides that function. ''' Proper fix please. --verbose please ;-) Making VHOST_SCSI depends on NET looks weird but this is because vhost core depends on it. A bunch of patches are cleaning this up. Since MST wanted do the vhost.ko split up in 3.11, plus your WIP vringh work, so I wanted the fix for 3.10 as minimum as possible. If this isn't the only symbol causing the problem, then this should be mentioned in the changelog. If it is, it should be fixed: we have plenty of time for that. Either way, your commit message or the commit itself needs to justify it! Other users are using memcpy_fromiovec and friends outside net. It seems a good idea to put it in a util library. e.g. crypto/algif_skcipher.c which also depends on NET for it. Though I can't see why you thought this was a good idea. Nonetheless, I shan't highlight why: I have far too much respect for your intellects and abilities. No, don't thank me! Interesting. Heh... I originally wrote an explanation, then found it a bit insulting: I knew I didn't need to tell you :) How's this? From: Rusty Russell ru...@rustcorp.com.au Subject: Hoist memcpy_fromiovec into lib/ ERROR: memcpy_fromiovec [drivers/vhost/vhost_scsi.ko] undefined! That function is only present with CONFIG_NET. Turns out that crypto/algif_skcipher.c also uses that outside net, but it actually needs sockets anyway. socket.h already include uio.h, so no callers need updating. Reported-by: Randy Dunlap rdun...@infradead.org Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/include/linux/socket.h b/include/linux/socket.h index 428c37a..7266775 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -305,7 +305,6 @@ struct ucred { extern void cred_to_ucred(struct pid *pid, const struct cred *cred, struct ucred *ucred); -extern int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len); extern int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov, int offset, int len); extern int csum_partial_copy_fromiovecend(unsigned char *kdata, diff --git a/include/linux/uio.h b/include/linux/uio.h index 629aaf5..21628d3 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -35,4 +35,6 @@ static inline size_t iov_length(const struct iovec *iov, unsigned long nr_segs) } unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to); + +int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len); #endif diff --git a/lib/Makefile b/lib/Makefile index e9c52e1..2377211 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -9,7 +9,7 @@ endif lib-y := ctype.o string.o vsprintf.o cmdline.o \ rbtree.o radix-tree.o dump_stack.o timerqueue.o\ -idr.o int_sqrt.o extable.o \ +idr.o int_sqrt.o extable.o iovec.o \ sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \ proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \ is_single_threaded.o plist.o decompress.o kobject_uevent.o \ diff --git a/lib/iovec.c b/lib/iovec.c new file mode 100644 index 000..632c5ea --- /dev/null +++ b/lib/iovec.c @@ -0,0 +1,29 @@ +#include linux/uaccess.h +#include linux/export.h +#include linux/uio.h + +/* + * Copy iovec to kernel. Returns -EFAULT on error. + * + * Note: this modifies the original iovec. + */ + +int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len) +{ + while (len 0) { + if (iov-iov_len) { + int copy = min_t(unsigned int, len, iov-iov_len); + if (copy_from_user(kdata, iov-iov_base, copy)) + return -EFAULT; + len -= copy; + kdata += copy; + iov-iov_base += copy; + iov-iov_len -= copy; + } + iov++; + } + + return 0; +} +EXPORT_SYMBOL(memcpy_fromiovec); + diff --git a/net/core/iovec.c b/net/core/iovec.c index 7e7aeb0..d81257f 100644 --- a/net/core/iovec.c +++ b/net/core/iovec.c @@ -125,31 +125,6 @@ int memcpy_toiovecend(const struct iovec *iov, unsigned char *kdata, EXPORT_SYMBOL(memcpy_toiovecend); /* - * Copy iovec to kernel. Returns -EFAULT on error. - * - * Note: this modifies the original iovec. - */ - -int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len) -{ - while (len 0) { - if (iov-iov_len) { - int copy = min_t(unsigned int, len, iov-iov_len
Re: [PATCH] vhost-scsi: Depend on NET for memcpy_fromiovec
Asias He as...@redhat.com writes: scsi.c includes vhost.c which uses memcpy_fromiovec. This patch fixes this build failure. From Randy Dunlap: ''' on x86_64: ERROR: memcpy_fromiovec [drivers/vhost/vhost_scsi.ko] undefined! It needs to depend on NET since net/core/ provides that function. ''' Proper fix please. Though I can't see why you thought this was a good idea. Nonetheless, I shan't highlight why: I have far too much respect for your intellects and abilities. No, don't thank me! Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] vhost cleanups and separate module
Michael S. Tsirkin m...@redhat.com writes: On Tue, May 07, 2013 at 02:13:44PM +0930, Rusty Russell wrote: AFAICT we should always do zero copy. It seems not to be a win for small packets. I speculate the issue is that ring space isn't released as promptly. Further, we can't do it safely for guest to guest and guest to host. And if we try, net core just does a packet copy later (which is less efficient). So there's a hack in place to detect that and suppress zero copy. AFAICT there are two places we should copy. One is small packets: latency plus refcount/callbacks aren't a win. The other is weird packets (eg. more than 1000 segements), which we currently drop. Though I do wonder if we should use a dedicated hook to get an skb into the tun driver and generate it ourselves, rather than going sg - iov - skb. Cheers, Rusty. I think we'd have to export two interfaces: - alloc_skb() add frags ... - send_skb the code to add frags could maybe use some library functions ... I think we just need send_skb for the socket. We can build the skb ourselves. But yes that frag-handling code should be factored out. I'll see how I go. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] vhost cleanups and separate module
Michael S. Tsirkin m...@redhat.com writes: On Mon, May 06, 2013 at 03:41:36PM +0930, Rusty Russell wrote: Asias He as...@redhat.com writes: Asias He (3): vhost: Remove vhost_enable_zcopy in vhost.h vhost: Move VHOST_NET_FEATURES to net.c vhost: Make vhost a separate module I like these cleanups, MST pleasee apply. Absolutely. Except it's 3.11 material and I can only usefully create a -next branch once -rc1 is out. I have some other cleanups which are on hold for the moment pending MST's vhost_net simplification. MST, how's that going? Not too well. The array of status bytes which was designed to complete packets in order turns out to be a very efficient datastructure: It gives us a way to signal completions that is completely lockless for multiple completers, and using the producer/consumer model saves extra scans for the common case. Overall I can save some memory and clean up some code but can't get rid of the producer/consumer idices (currently named upend/done indices) which is what you asked me to do. Your cleanups basically don't work with zcopy because they ignore the upend/done indices? Would you like to post them, noting they only work with zcopy off, and we'll look for a way to apply them, together? Not quite; it's just that I don't understand that code. It seemed to be achieving something (ordered completion) which was entirely unnecessary, so I went on with other things while you removed it. Now that's not possible, I'll revisit. AFAICT we should always do zero copy. Though I do wonder if we should use a dedicated hook to get an skb into the tun driver and generate it ourselves, rather than going sg - iov - skb. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] vhost cleanups and separate module
Asias He as...@redhat.com writes: Asias He (3): vhost: Remove vhost_enable_zcopy in vhost.h vhost: Move VHOST_NET_FEATURES to net.c vhost: Make vhost a separate module I like these cleanups, MST pleasee apply. I have some other cleanups which are on hold for the moment pending MST's vhost_net simplification. MST, how's that going? Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: suggesting wording fixes for virtio-spec 0.9.5
Laszlo Ersek ler...@redhat.com writes: On 04/23/13 06:05, Rusty Russell wrote: Laszlo Ersek ler...@redhat.com writes: Hi, (I'm not subscribed to either list,) using the word descriptor is misleading in the following sections: Yes, I like the use of 'descriptor chains'. This is a definite improvement. ... Not sure if it's customary here or if you need it / want it, but anyway Reviewed-by: Laszlo Ersek ler...@redhat.com Not needed, but always useful to ahve confirmation that I didn't introduce another mistake while fixing my old ones! (Also I've fixed the OVMF driver; just reposting the patch today with a better commit message.) Thanks much! Laszlo Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm tools: virtio-net mergable rx buffers
Michael S. Tsirkin m...@redhat.com writes: On Wed, Apr 24, 2013 at 09:51:57AM +0300, Pekka Enberg wrote: Hi, On 04/23/2013 12:35 PM, Eric Northup wrote: Do you care about guests with drivers that don't negotiate VIRTIO_NET_F_MRG_RXBUF? On Wed, Apr 24, 2013 at 5:51 AM, Sasha Levin sasha.le...@oracle.com wrote: We usually try to keep backward compatibility, but in this case mergable RX buffers are about 5 years old now, so it's safe to assume they'll be running in any guest. Unless there is a specific reason to allow working without them I'd rather keep the code simple in this case. Are there such guests around? What's the failure scenario for them after this patch? Pekka Warning: have not looked at the patch, just a general comment. I think it's reasonable to assume embedded guests such as PXE won't negotiate any features. And, running old guests is one of the reasons people use virtualization at all. So 5 years is not a lot. In any case, stick to the device spec please, if you want it changed please send a spec patch, don't deviate from it randomly. Supporting old guests is an quality of implementation issue. It's like any ABI: if noone will notice, you can remove stuff. But the case of I can receive GSO packets but I don't support mergeable buffers is a trivial one: you can support it by pretending the guest can't handle GSO :) If you want to support non-Linux guests (eg. bootloaders), you probably want to keep support for very dumb drivers with no mergable rxbufs though. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] virtio-net: fill only rx queues which are being used
Sasha Levin sasha.le...@oracle.com writes: Due to MQ support we may allocate a whole bunch of rx queues but never use them. With this patch we'll safe the space used by the receive buffers until they are actually in use: Thanks, applied! Cheers, Rusty. sh-4.2# free -h total used free sharedbuffers cached Mem: 490M35M 455M 0B 0B 4.1M -/+ buffers/cache:31M 459M Swap: 0B 0B 0B sh-4.2# ethtool -L eth0 combined 8 sh-4.2# free -h total used free sharedbuffers cached Mem: 490M 162M 327M 0B 0B 4.1M -/+ buffers/cache: 158M 331M Swap: 0B 0B 0B Signed-off-by: Sasha Levin sasha.le...@oracle.com --- drivers/net/virtio_net.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 6bfc511..196e721 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -581,7 +581,7 @@ static void refill_work(struct work_struct *work) bool still_empty; int i; - for (i = 0; i vi-max_queue_pairs; i++) { + for (i = 0; i vi-curr_queue_pairs; i++) { struct receive_queue *rq = vi-rq[i]; napi_disable(rq-napi); @@ -636,7 +636,7 @@ static int virtnet_open(struct net_device *dev) struct virtnet_info *vi = netdev_priv(dev); int i; - for (i = 0; i vi-max_queue_pairs; i++) { + for (i = 0; i vi-curr_queue_pairs; i++) { /* Make sure we have some buffers: if oom use wq. */ if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) schedule_delayed_work(vi-refill, 0); @@ -900,6 +900,7 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs) struct scatterlist sg; struct virtio_net_ctrl_mq s; struct net_device *dev = vi-dev; + int i; if (!vi-has_cvq || !virtio_has_feature(vi-vdev, VIRTIO_NET_F_MQ)) return 0; @@ -912,8 +913,12 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs) dev_warn(dev-dev, Fail to set num of queue pairs to %d\n, queue_pairs); return -EINVAL; - } else + } else { + for (i = vi-curr_queue_pairs; i queue_pairs; i++) + if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) + schedule_delayed_work(vi-refill, 0); vi-curr_queue_pairs = queue_pairs; + } return 0; } @@ -1568,7 +1573,7 @@ static int virtnet_probe(struct virtio_device *vdev) } /* Last of all, set up some receive buffers. */ - for (i = 0; i vi-max_queue_pairs; i++) { + for (i = 0; i vi-curr_queue_pairs; i++) { try_fill_recv(vi-rq[i], GFP_KERNEL); /* If we didn't even get one input buffer, we're useless. */ @@ -1692,7 +1697,7 @@ static int virtnet_restore(struct virtio_device *vdev) netif_device_attach(vi-dev); - for (i = 0; i vi-max_queue_pairs; i++) + for (i = 0; i vi-curr_queue_pairs; i++) if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) schedule_delayed_work(vi-refill, 0); -- 1.8.2.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-net: fill only rx queues which are being used
Sasha Levin sasha.le...@oracle.com writes: Due to MQ support we may allocate a whole bunch of rx queues but never use them. With this patch we'll safe the space used by the receive buffers until they are actually in use: Idea is good, implementation needs a tiny tweak: @@ -912,8 +913,13 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs) dev_warn(dev-dev, Fail to set num of queue pairs to %d\n, queue_pairs); return -EINVAL; - } else + } else { + if (queue_pairs vi-curr_queue_pairs) + for (i = 0; i queue_pairs; i++) + if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) + schedule_delayed_work(vi-refill, 0); vi-curr_queue_pairs = queue_pairs; + } return 0; } You don't want to refill existing queues, so you don't need the if. for (i = vi-curr_queue_pairs; i queue_pairs; i++) { if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) schedule_delayed_work(vi-refill, 0); We don't free up buffers when we're reducing queues, but I consider that a corner case. Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 3/3] tcm_vhost: Add hotplug/hotunplug support
Asias He as...@redhat.com writes: On Mon, Apr 22, 2013 at 04:17:04PM +0300, Michael S. Tsirkin wrote: + evt = kzalloc(sizeof(*evt), GFP_KERNEL); I think kzalloc not needed here, you init all fields. Not really! evt-event.lun[4-7] is not initialized. It needs to be 0. So that is 4 bytes just init them when you set rest of lun. It is not in the fast path. You can do it this way but not a must. I think that's a bit cleaner than relying on kzalloc to zero-initialize. I think it is a bit shorter, 4 lines saved. OTOH you don't have to think what about the rest of the bytes? then hunt through the code and go ah, it's kzalloc. Looks it's a style issue but I prefer explicit initialization. It is just make none sense to argue about this. If you really want you can make patches to remove all the lines where use kzalloc because it is cleaner. I prefer explicit initialization too, FWIW. +static void tcm_vhost_do_evt_work(struct vhost_scsi *vs, + struct tcm_vhost_evt *evt) +{ + struct vhost_virtqueue *vq = vs-vqs[VHOST_SCSI_VQ_EVT]; + struct virtio_scsi_event *event = evt-event; + struct virtio_scsi_event __user *eventp; + unsigned out, in; + int head, ret; + + if (!tcm_vhost_check_endpoint(vq)) + return; + + mutex_lock(vs-vs_events_lock); + mutex_lock(vq-mutex); +again: + vhost_disable_notify(vs-dev, vq); + head = vhost_get_vq_desc(vs-dev, vq, vq-iov, + ARRAY_SIZE(vq-iov), out, in, + NULL, NULL); + if (head 0) { + vs-vs_events_dropped = true; + goto out; + } + if (head == vq-num) { + if (vhost_enable_notify(vs-dev, vq)) + goto again; Please write loops with while() or for(). Not with goto. goto is for error handling. This makes extra indention which is more ugly. I don't care. No loops with goto and that's a hard rule. It is not a loop. If same code repeats, it's a loop. We really need here is to call vhost_get_vq_desc once. It's not a loop like other places to use while() or for() with vhost_get_vq_desc. Exactly. Like a mutex for example? We only need to lock it once. See __mutex_lock_common - uses a for loop, doesn't it? So what? So fix this please, use while or for or even until. I do not think it is necessary. I tend to use again: in such corner cases, where I don't expect it to be invoked very often. A do loop looks too normal. I do exactly the same thing in virtio-net.c, for the same case. Would you like me to review the entire thing? Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: suggesting wording fixes for virtio-spec 0.9.5
Laszlo Ersek ler...@redhat.com writes: Hi, (I'm not subscribed to either list,) using the word descriptor is misleading in the following sections: Yes, I like the use of 'descriptor chains'. This is a definite improvement. Here's the diff I ended up with (massaged to minimize it). Thanks! Rusty. --- virtio-spec.txt-old 2013-04-23 13:22:21.339158214 +0930 +++ virtio-spec.txt 2013-04-23 13:34:14.055176464 +0930 @@ -482,10 +482,10 @@ 2.3.4 Available Ring -The available ring refers to what descriptors we are offering the -device: it refers to the head of a descriptor chain. The “flags” +The available ring refers to what descriptor chains we are offering the +device: each entry refers to the head of a descriptor chain. The “flags” field is currently 0 or 1: 1 indicating that we do not need an -interrupt when the device consumes a descriptor from the +interrupt when the device consumes a descriptor chain from the available ring. Alternatively, the guest can ask the device to delay interrupts until an entry with an index specified by the “ used_event” field is written in the used ring (equivalently, @@ -671,16 +671,16 @@ avail-ring[avail-idx % qsz] = head; -However, in general we can add many descriptors before we update -the “idx” field (at which point they become visible to the -device), so we keep a counter of how many we've added: +However, in general we can add many separate descriptor chains before we update +the “idx” field (at which point they become visible to the device), +so we keep a counter of how many we've added: avail-ring[(avail-idx + added++) % qsz] = head; 2.4.1.3 Updating The Index Field Once the idx field of the virtqueue is updated, the device will -be able to access the descriptor entries we've created and the +be able to access the descriptor chains we've created and the memory they refer to. This is why a memory barrier is generally used before the idx update, to ensure it sees the most up-to-date copy. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-spec: document virtio-9p
Sasha Levin sasha.le...@oracle.com writes: Add basic documentation for virtio-9p. I can expand more on device operation, but I don't think there's anything significant enough for the spec to be mentioned there. Please let me know if I'm wrong. Signed-off-by: Sasha Levin sasha.le...@oracle.com Hmm, I guess so. I was happy to leave it undocumented. Applied, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio-net mq vq initialization
Michael S. Tsirkin m...@redhat.com writes: On Sun, Apr 14, 2013 at 11:59:42AM -0400, Sasha Levin wrote: On 04/14/2013 11:53 AM, Michael S. Tsirkin wrote: Initializing them only when they're actually needed will do the trick here. Not initializing, adding the buffers. In the current spec, initialization is always done before DRIVER_OK. Yeah, that's better, but we're going to need a spec change either way since even adding the buffers is specifically stated to happen before DRIVER_OK: 6. The receive virtqueues should be filled with receive buffers. This is described in detail below in “Setting Up Receive Buffers”. Thanks, Sasha Yes but that's a minor point, specific to virio-net. The main point was to ensure that VQs should have buffers before they get any packets. OTOH VQ init bere DRIVER_OK is a generic virtio thing: DRIVER_OK means we can start sending interrupts and we expect driver to be ready to get them. Let's solve the real problem with buffers, a couple of pages per VQ shouldn't be an issue. Yeah, let's not populate the extra virtqueues until someone turns multiqueue on. I appended to the last paragraph under Setting Up Receive Buffers: If VIRTIO_NET_F_MQ is negotiated, each of receiveq0...receiveqN that will be used should be populated with receive buffers Added: before multiqueue is activated (See [sub:Automatic-receive-steering]). Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio-net mq vq initialization (was: [PATCH 0/5] Usual batch of random ARM fixes for kvmtool)
Sasha Levin sasha.le...@oracle.com writes: On 04/11/2013 12:36 PM, Will Deacon wrote: Hello folks, Here's the latest round of ARM fixes and updates for kvmtool. Most of this is confined to the arm/ subdirectory, with the exception of a fix to the virtio-mmio vq definitions due to the multi-queue work from Sasha. I'm not terribly happy about that code though, since it seriously increases the memory footprint of the guest. Without multi-queue, we can boot Debian Wheezy to a prompt in 38MB. With the new changes, that increases to 170MB! Any chance we can try and tackle this regression please? I keep getting bitten by the OOM killer :( (cc Rusty, MST) The spec defines the operation of a virtio-net device with regards to multiple queues as follows: Device Initialization 1. The initialization routine should identify the receive and transmission virtqueues, up to N+1 of each kind. If VIRTIO_NET_F_MQ feature bit is negotiated, N=max_virtqueue_pairs-1, otherwise identify N=0. [...] 5. Only receiveq0, transmitq0 and controlq are used by default. To use more queues driver must negotiate the VIRTIO_NET_F_MQ feature; initialize up to max_virtqueue_pairs of each of transmit and receive queues; execute_ VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command specifying the number of the transmit and receive queues that is going to be used and wait until the device consumes the controlq buffer and acks this command. And kvmtool follows that to the letter: It will initialize the maximum amount of queues it can support during initialization and will start using them only when the device tells it it should use them. As Will has stated, this causes a memory issue since all the data structures that hold all possible queues get initialized regardless of whether we actually need them or not, which is quite troublesome for systems with small RAM. Rusty, MST, would you be open to a spec and code change that would initialize the RX/TX vqs on demand instead of on device initialization? Or is there an easier way to work around this issue? I'm confused. kvmtool is using too much memory, or the guest? If kvmtool, the Device Initialization section above applies to the driver, not the device. If the guest, well, the language says UP TO N+1. You want a small guest, don't use them all. Or any... What am I missing? Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V7 0/5] virtio-scsi multiqueue
Asias He as...@redhat.com writes: On Sat, Apr 06, 2013 at 09:40:13AM +0100, James Bottomley wrote: Well, I haven't had time to look at anything other than the patch I commented on. I'm happy with your fix, so you can add my acked by to that one. Since it's going through the virtio tree, don't wait for me, put it in and I'll make you fix up anything I find later that I'm unhappy with. So, Rusty, could you pick this up in your virtio-next tree? Done! Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 2/2] tcm_vhost: Use vq-private_data to indicate if the endpoint is setup
Michael S. Tsirkin m...@redhat.com writes: Rusty's currently doing some reorgs of -net let's delay cleanups there to avoid stepping on each other's toys. Let's focus on scsi here. E.g. any chance framing assumptions can be fixed in 3.10? I am waiting for your removal of the dma-compelete ordering stuff in vhost-net. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html