Re: [PATCH] maintainers: drop Chris Wright from pvops
Chris CC'd: He wasn't that hard to find. (linkedin says he's CTO of RedHat now. I feel like an underachiever!) Cheers, Rusty. Juergen Gross <jgr...@suse.com> writes: > Mails to chr...@sous-sol.org are not deliverable since several months. > Drop him as PARAVIRT_OPS maintainer. > > Signed-off-by: Juergen Gross <jgr...@suse.com> > --- > MAINTAINERS | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index d85c08956875..af0cb69f6a3e 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -10179,7 +10179,6 @@ F:Documentation/parport*.txt > > PARAVIRT_OPS INTERFACE > M: Juergen Gross <jgr...@suse.com> > -M: Chris Wright <chr...@sous-sol.org> > M: Alok Kataria <akata...@vmware.com> > M: Rusty Russell <ru...@rustcorp.com.au> > L: virtualization@lists.linux-foundation.org > -- > 2.12.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] paravirt: remove paravirt ops pmd_update[_defer] and pte_update_defer
Juergen Gross <jgr...@suse.com> writes: > Ping? Acked-by: Rusty Russell <ru...@rustcorp.com.au> Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] x86/paravirt: remove unused operation
Juergen Gross jgr...@suse.com writes: Ping? Acked-by: Rusty Russell ru...@rustcorp.com.au Cheers, Rusty. On 08/06/2015 01:55 PM, Juergen Gross wrote: Remove the paravirt operation get_tsc_khz as it is used nowhere. Signed-off-by: Juergen Gross jgr...@suse.com --- arch/x86/include/asm/paravirt_types.h | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index a6b8f9f..5a18a66 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -97,7 +97,6 @@ struct pv_lazy_ops { struct pv_time_ops { unsigned long long (*sched_clock)(void); unsigned long long (*steal_clock)(int cpu); -unsigned long (*get_tsc_khz)(void); }; struct pv_cpu_ops { ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: virtio-net: why not always to set avail-flags to VRING_AVAIL_F_NO_INTERRUPT
Linhaifeng haifeng@huawei.com writes: On 2015/6/4 9:13, Rusty Russell wrote: Linhaifeng haifeng@huawei.com writes: Hi, I'm a newbie and have a question about vring_new_virtqueue function. Why we set avail-flags to VRING_AVAIL_F_NO_INTERRUPT when no callbacks? I think we should set avail-flags to VRING_AVAIL_F_NO_INTERRUPT even if no callbacks. Hi Linhaifeng, Not sure I understand your question, but I'll try to answer. We don't set VRING_AVAIL_F_NO_INTERRUPT if there's a callback because we want that callback called. Otherwise callback will never be used. Cheers, Rusty. Hi Rusty, Thank you for your response. I mean should we set VRING_AVAIL_F_NO_INTERRUPT when virtqueue is initialized whether there is callback or not? As it would be set in function virtqueue_disable_cb and virtqueue_enable_cb_prepare later. You're right, the callers could virtqueue_enable() specifically, but I think most callers will want the virtqueue enabled immediately. And remember, the VRING_AVAIL_F_NO_INTERRUPT is just a *hint*: the device is allows to ignore it (this can happen due to race conditions, for example). Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: virtio-net: why not always to set avail-flags to VRING_AVAIL_F_NO_INTERRUPT
Linhaifeng haifeng@huawei.com writes: Hi, I'm a newbie and have a question about vring_new_virtqueue function. Why we set avail-flags to VRING_AVAIL_F_NO_INTERRUPT when no callbacks? I think we should set avail-flags to VRING_AVAIL_F_NO_INTERRUPT even if no callbacks. Hi Linhaifeng, Not sure I understand your question, but I'll try to answer. We don't set VRING_AVAIL_F_NO_INTERRUPT if there's a callback because we want that callback called. Otherwise callback will never be used. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC 4/4] rpmsg: DMA map sgs passed to virtio
Edgar E. Iglesias edgar.igles...@gmail.com writes: From: Edgar E. Iglesias edgar.igles...@xilinx.com Signed-off-by: Edgar E. Iglesias edgar.igles...@xilinx.com First off, I have handed maintainership off to Michael S. Tsirkin, so his word is now law. That said... there's nothing fundamentally *wrong* with this, but it's not how standard virtio works. We decided some time ago that as we're paravirtualized, we would not be doing address mapping. rpmsg uses virtio, but it's with a twist: they're not talking to a host. Thus my preference, in order, would be: 1) Don't use non-kmalloc addresses. 2) If that's not possible, call these _dma interfaces _rpmsg instead, so normal virtio users don't get confused and try to use them. Cheers, Rusty. --- drivers/rpmsg/virtio_rpmsg_bus.c | 28 ++-- 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c index 73354ee..9ae53a0 100644 --- a/drivers/rpmsg/virtio_rpmsg_bus.c +++ b/drivers/rpmsg/virtio_rpmsg_bus.c @@ -210,6 +210,22 @@ static void __ept_release(struct kref *kref) kfree(ept); } +static inline dma_addr_t msg_dma_address(struct virtproc_info *vrp, void *msg) +{ + unsigned long offset = msg - vrp-rbufs; + + return vrp-bufs_dma + offset; +} + +static inline void rpmsg_msg_sg_init(struct virtproc_info *vrp, + struct scatterlist *sg, + void *msg, unsigned int len) +{ + sg_init_table(sg, 1); + sg_dma_address(sg) = msg_dma_address(vrp, msg); + sg_dma_len(sg) = len; +} + /* for more info, see below documentation of rpmsg_create_ept() */ static struct rpmsg_endpoint *__rpmsg_create_ept(struct virtproc_info *vrp, struct rpmsg_channel *rpdev, rpmsg_rx_cb_t cb, @@ -754,12 +770,12 @@ int rpmsg_send_offchannel_raw(struct rpmsg_channel *rpdev, u32 src, u32 dst, print_hex_dump(KERN_DEBUG, rpmsg_virtio TX: , DUMP_PREFIX_NONE, 16, 1, msg, sizeof(*msg) + msg-len, true); - sg_init_one(sg, msg, sizeof(*msg) + len); + rpmsg_msg_sg_init(vrp, sg, msg, sizeof(*msg) + len); mutex_lock(vrp-tx_lock); /* add message to the remote processor's virtqueue */ - err = virtqueue_add_outbuf(vrp-svq, sg, 1, msg, GFP_KERNEL); + err = dma_virtqueue_add_outbuf(vrp-svq, sg, 1, msg, GFP_KERNEL); if (err) { /* * need to reclaim the buffer here, otherwise it's lost @@ -828,10 +844,10 @@ static int rpmsg_recv_single(struct virtproc_info *vrp, struct device *dev, dev_warn(dev, msg received with no recipient\n); /* publish the real size of the buffer */ - sg_init_one(sg, msg, RPMSG_BUF_SIZE); + rpmsg_msg_sg_init(vrp, sg, msg, RPMSG_BUF_SIZE); /* add the buffer back to the remote processor's virtqueue */ - err = virtqueue_add_inbuf(vrp-rvq, sg, 1, msg, GFP_KERNEL); + err = dma_virtqueue_add_inbuf(vrp-rvq, sg, 1, msg, GFP_KERNEL); if (err 0) { dev_err(dev, failed to add a virtqueue buffer: %d\n, err); return err; @@ -1007,9 +1023,9 @@ static int rpmsg_probe(struct virtio_device *vdev) struct scatterlist sg; void *cpu_addr = vrp-rbufs + i * RPMSG_BUF_SIZE; - sg_init_one(sg, cpu_addr, RPMSG_BUF_SIZE); + rpmsg_msg_sg_init(vrp, sg, cpu_addr, RPMSG_BUF_SIZE); - err = virtqueue_add_inbuf(vrp-rvq, sg, 1, cpu_addr, + err = dma_virtqueue_add_inbuf(vrp-rvq, sg, 1, cpu_addr, GFP_KERNEL); WARN_ON(err); /* sanity check; this can't really happen */ } -- 1.9.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] virtio: pass baton to Michael Tsirkin
With my job change kernel work will be own time; I'm keeping lguest and modules (and the virtio standards work), but virtio kernel has to go. This makes it clear that Michael is in charge. He's good, but having me watch over his shoulder won't help. Good luck Michael! Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/MAINTAINERS b/MAINTAINERS index 2e5bbc0d68b2..16227759dfa8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10523,7 +10523,6 @@ F: include/linux/virtio_console.h F: include/uapi/linux/virtio_console.h VIRTIO CORE, NET AND BLOCK DRIVERS -M: Rusty Russell ru...@rustcorp.com.au M: Michael S. Tsirkin m...@redhat.com L: virtualization@lists.linux-foundation.org S: Maintained ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio: fix typo in vring_need_event() doc comment
Michael S. Tsirkin m...@redhat.com writes: On Sun, Apr 19, 2015 at 02:36:38PM +0930, Rusty Russell wrote: Stefan Hajnoczi stefa...@redhat.com writes: Here the other side refers to the guest or host. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com Applied. Thanks! Rusty. Just to make sure, are you applying this for 4.1? No, I am going to hand you my pending queue. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio: fix typo in vring_need_event() doc comment
Stefan Hajnoczi stefa...@redhat.com writes: Here the other side refers to the guest or host. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com Applied. Thanks! Rusty. --- include/uapi/linux/virtio_ring.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h index a3318f3..915980a 100644 --- a/include/uapi/linux/virtio_ring.h +++ b/include/uapi/linux/virtio_ring.h @@ -155,7 +155,7 @@ static inline unsigned vring_size(unsigned int num, unsigned long align) } /* The following is used with USED_EVENT_IDX and AVAIL_EVENT_IDX */ -/* Assuming a given event_idx value from the other size, if +/* Assuming a given event_idx value from the other side, if * we have just incremented index from old to new_idx, * should we trigger an event? */ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old) -- 2.1.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 0/6] virtio_balloon: virtio 1 support
Michael S. Tsirkin m...@redhat.com writes: On Tue, Apr 14, 2015 at 11:50:53AM +0200, Cornelia Huck wrote: On Tue, 14 Apr 2015 11:21:11 +0200 Michael S. Tsirkin m...@redhat.com wrote: diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h index f81b220..164e0c2 100644 --- a/include/uapi/linux/virtio_balloon.h +++ b/include/uapi/linux/virtio_balloon.h @@ -52,15 +52,31 @@ struct virtio_balloon_config { #define VIRTIO_BALLOON_S_MEMTOT 5 /* Total amount of memory */ #define VIRTIO_BALLOON_S_NR 6 +/* + * Memory statistics structure. + * Driver fills an array of these structures and passes to device. + * + * NOTE: fields are laid out in a way that would make compiler add padding + * between and after fields, so we have to use compiler-specific attributes to + * pack it, to disable this padding. This also often causes compiler to + * generate suboptimal code. + * + * We maintain this for backwards compatibility, but don't follow this example. s/this/the existing statistics structure/ existing seems redundant. What else? non-existing? + * + * Do something like the below instead: If you want to implement a similar structure, do... Just that nobody gets the idea that they are supposed to implement new balloon statistics ;) + * struct virtio_balloon_stat { + * __virtio16 tag; + * __u8 reserved[6]; + * __virtio64 val; + * }; (...) @@ -225,13 +219,8 @@ static inline void update_stat(struct virtio_balloon *vb, int idx, u16 tag, u64 val) { BUG_ON(idx = VIRTIO_BALLOON_S_NR); - if (virtio_has_feature(vb-vdev, VIRTIO_F_VERSION_1)) { - vb-stats[idx].tag = cpu_to_le32(tag); - vb-stats[idx].val = cpu_to_le64(val); - } else { - vb-legacy_stats[idx].tag = tag; - vb-legacy_stats[idx].val = val; - } + vb-stats[idx].tag = cpu_to_virtio16(vb-vdev, tag); Seems that nobody seemed to care much about statistics... Or about BE guests ;) + vb-stats[idx].val = cpu_to_virtio64(vb-vdev, val); } #define pages_to_bytes(x) ((u64)(x) PAGE_SHIFT) With these changes merged in: Acked-by: Cornelia Huck cornelia.h...@de.ibm.com OK, here's an updated incremental patch: only comment changed. OK, I've merged this with one change: +static void stats_sg_init(struct virtio_balloon *vb, struct scatterlist *sg) +{ + sg_init_one(sg, vb-stats, sizeof(vb-stats)); +} + ... - sg_init_one(sg, vb-stats, sizeof(vb-stats)); + stats_sg_init(vb, sg); This is no longer a meaningful change, so I removed it. Here's the final result: From: Michael S. Tsirkin m...@redhat.com Subject: virtio_balloon: transitional interface Virtio 1.0 doesn't include a modern balloon device. But it's not a big change to support a transitional balloon device: this has the advantage of supporting existing drivers, transparently. Signed-off-by: Michael S. Tsirkin m...@redhat.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 6a356e344f82..9db546ebe5a1 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -214,8 +219,8 @@ static inline void update_stat(struct virtio_balloon *vb, int idx, u16 tag, u64 val) { BUG_ON(idx = VIRTIO_BALLOON_S_NR); - vb-stats[idx].tag = tag; - vb-stats[idx].val = val; + vb-stats[idx].tag = cpu_to_virtio16(vb-vdev, tag); + vb-stats[idx].val = cpu_to_virtio64(vb-vdev, val); } #define pages_to_bytes(x) ((u64)(x) PAGE_SHIFT) @@ -283,18 +288,27 @@ static void virtballoon_changed(struct virtio_device *vdev) static inline s64 towards_target(struct virtio_balloon *vb) { - __le32 v; s64 target; + u32 num_pages; - virtio_cread(vb-vdev, struct virtio_balloon_config, num_pages, v); + virtio_cread(vb-vdev, struct virtio_balloon_config, num_pages, +num_pages); - target = le32_to_cpu(v); + /* Legacy balloon config space is LE, unlike all other devices. */ + if (!virtio_has_feature(vb-vdev, VIRTIO_F_VERSION_1)) + num_pages = le32_to_cpu((__force __le32)num_pages); + + target = num_pages; return target - vb-num_pages; } static void update_balloon_size(struct virtio_balloon *vb) { - __le32 actual = cpu_to_le32(vb-num_pages); + u32 actual = vb-num_pages; + + /* Legacy balloon config space is LE, unlike all other devices. */ + if (!virtio_has_feature(vb-vdev, VIRTIO_F_VERSION_1)) + actual = (__force u32)cpu_to_le32(actual); virtio_cwrite(vb-vdev, struct virtio_balloon_config, actual, actual); diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux
Re: [PATCH] virtio_balloon: drop virtio_balloon_stat_modern
Cornelia Huck cornelia.h...@de.ibm.com writes: On Tue, 14 Apr 2015 12:01:13 +0200 Michael S. Tsirkin m...@redhat.com wrote: Looks like we are better off sticking with the misaligned stat struct, to reduce the amount of virtio 1 specific code in balloon. So let's do it. Add a detailed comment to reduce the chance people copy this bad example. This also fixes a bug on BE architectures: tag should use cpu_to_le16, not cpu_to_le32. Acked-by: Cornelia Huck cornelia.h...@de.ibm.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Just reposting so it's easier to apply. Feel free to squash into previous patch if you think it's neater. +1 for squashing from me. My A-b would also apply to the squashed patch. Yes, I already squashed. TBH, I would have squashed the next patches into one too, that simply got rid of the virtio_device_is_legacy_only() function and all callers. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 0/6] virtio_balloon: virtio 1 support
Michael S. Tsirkin m...@redhat.com writes: On Wed, Apr 01, 2015 at 02:57:35PM +0200, Michael S. Tsirkin wrote: Virtio 1.0 doesn't include a modern balloon device. At some point we'll likely define an incompatible interface with a different ID and different semantics. But for now, it's not a big effort to support a transitional balloon device: this has the advantage of supporting existing drivers, transparently, as well as transports that don't allow mixing virtio 0 and virtio 1 devices. And balloon is an easy device to test, so it's also useful for people to test virtio core handling of transitional devices. The only interface issue is with the stats buffer, which has misaligned fields. We could leave it as is, but this sets a bad precedent that others might copy by mistake. As we need to change stats code to do byteswaps for virtio 1.0, it seems easy to fix by prepending a 6 byte reserved field. I also had to change config structure field types from __le32 to __u32 to match other devices. This means we need a couple of __force tags for legacy path but that seems minor. Rusty, what are your thoughts here? How about merging this for 4.1? I disagree with making any changes, other than allowing it to be used with VIRTIO_F_VERSION_1. However it doesn't seem to bother anyone else, so I've applied it for 4.1. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_config: reorder functions
Cornelia Huck cornelia.h...@de.ibm.com writes: On Wed, 8 Apr 2015 16:49:46 +0200 Michael S. Tsirkin m...@redhat.com wrote: On Tue, Mar 31, 2015 at 05:17:25PM +0200, Michael S. Tsirkin wrote: This simply reorders functions in virtio_config so width access wrapper helpers are all together. Drops an extra empty line while we are at it. Signed-off-by: Michael S. Tsirkin m...@redhat.com Ping --- include/linux/virtio_config.h | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) Acked-by: Cornelia Huck cornelia.h...@de.ibm.com Sorry, it's in my virtio-next tree, I obviously failed to Ack. Applied, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_ring: Update weak barriers to use dma_wmb/rmb
Alexander Duyck alexander.h.du...@redhat.com writes: This change makes it so that instead of using smp_wmb/rmb which varies depending on the kernel configuration we can can use dma_wmb/rmb which for most architectures should be equal to or slightly more strict than smp_wmb/rmb. The advantage to this is that these barriers are available to uniprocessor builds as well so the performance should improve under such a configuration. Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com This seems OK to me, since it's really as much a cleanup as anything, but like you I do wonder if there benefit on ARM in practice. Applied, thanks. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH net-next] netdevice: document NETDEV_TX_BUSY deprecation.
This paraphrases DaveM (and steals some of his words) explaining why a device shouldn't return NETDEV_TX_BUSY, even though it looks so inviting to driver authors. See http://www.spinics.net/lists/netdev/msg322350.html Inspired-by: David Miller da...@davemloft.net Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index dcf6ec27739b..a2cad44b8630 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -795,7 +795,10 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev, * netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, * struct net_device *dev); * Called when a packet needs to be transmitted. - * Must return NETDEV_TX_OK , NETDEV_TX_BUSY. + * Returns NETDEV_TX_OK. Can return NETDEV_TX_BUSY, but you should stop + * the queue before that can happen; it's for obsolete devices and weird + * corner cases, but the stack really does a non-trivial amount + * of useless work if you return NETDEV_TX_BUSY. *(can also return NETDEV_TX_LOCKED iff NETIF_F_LLTX) * Required can not be NULL. * ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio: drop a useless config read
Cornelia Huck cornelia.h...@de.ibm.com writes: On Tue, 31 Mar 2015 13:55:42 +0200 Michael S. Tsirkin m...@redhat.com wrote: commit d71de9ec6ba806104439d3a669befda84757b5af virtio: core support for config generation fixed reading up 64 bit values, adding generation checks for such reads. By mistake, it left an explicit get call in place as well. the result is that the value is read twice, the first result is discarded. Not a big deal since this only happens with virtio blk and only on boot ATM, so performance isn't affected, but let's clean it up. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- include/linux/virtio_config.h | 1 - 1 file changed, 1 deletion(-) Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Applied. Thanks! Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 1/6] virtio_balloon: transitional interface
Michael S. Tsirkin m...@redhat.com writes: Virtio 1.0 doesn't include a modern balloon device. But it's not a big change to support a transitional balloon device: this has the advantage of supporting existing drivers, transparently. You decided to fix the packed struct... diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 6a356e3..574267f 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -77,7 +77,7 @@ struct virtio_balloon { /* Memory statistics */ int need_stats_update; - struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR]; + struct virtio_balloon_stat_modern stats[VIRTIO_BALLOON_S_NR]; /* To register callback in oom notifier call chain */ struct notifier_block nb; @@ -269,7 +269,11 @@ static void stats_handle_request(struct virtio_balloon *vb) vq = vb-stats_vq; if (!virtqueue_get_buf(vq, len)) return; - sg_init_one(sg, vb-stats, sizeof(vb-stats)); + if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) + sg_init_one(sg, vb-stats, sizeof(vb-stats)); + else + sg_init_one(sg, vb-stats-tag, sizeof(vb-stats) - + offsetof(typeof(*vb-stats, tag); This makes it compile, but definitely won't work. virtqueue_add_outbuf(vq, sg, 1, vb, GFP_KERNEL); virtqueue_kick(vq); } @@ -283,21 +287,30 @@ static void virtballoon_changed(struct virtio_device *vdev) static inline s64 towards_target(struct virtio_balloon *vb) { - __le32 v; s64 target; + u32 num_pages; - virtio_cread(vb-vdev, struct virtio_balloon_config, num_pages, v); + virtio_cread(vb-vdev, struct virtio_balloon_config, + num_pages, num_pages); - target = le32_to_cpu(v); + /* Legacy balloon config space is LE, unlike all other devices. */ + if (!virtio_has_feature(vb-vdev, VIRTIO_F_VERSION_1)) + num_pages = le32_to_cpu((__force le32)num_pages); + + target = num_pages; return target - vb-num_pages; } static void update_balloon_size(struct virtio_balloon *vb) { - __le32 actual = cpu_to_le32(vb-num_pages); + u32 actual = vb-num_pages; + + /* Legacy balloon config space is LE, unlike all other devices. */ + if (!virtio_has_feature(vb-vdev, VIRTIO_F_VERSION_1)) + actual = (__force u32)cpu_to_le32(num_pages); - virtio_cwrite(vb-vdev, struct virtio_balloon_config, actual, - actual); + virtio_cwrite(vb-vdev, struct virtio_balloon_config, + actual, actual); } Final line is gratitous reformatting. I would leave the device *exactly* as is, ugly structure packing and all. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5] Add virtio-input driver.
Michael S. Tsirkin m...@redhat.com writes: On Thu, Mar 26, 2015 at 11:49:25AM +0100, Gerd Hoffmann wrote: virtio-input is basically evdev-events-over-virtio, so this driver isn't much more than reading configuration from config space and forwarding incoming events to the linux input layer. Signed-off-by: Gerd Hoffmann kra...@redhat.com Still a bit worried about using input.h as host/guest interface (can't we use some formal standard, e.g. USB HID?), but I'll let Rusty decide that. I like the simplicity, and this API is pretty well proven. Since this is under drivers/virtio rather than drivers/input, I've applied it to my tree. I have a dream where Denys Vlasenko adds this and the virtio gl device to lguest, and we have X running under lguest :) Applied! Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4] Add virtio-input driver.
Vojtech Pavlik vojt...@suse.com writes: On Wed, Mar 25, 2015 at 01:51:43PM +1030, Rusty Russell wrote: Imagine a future virtio standard which incorporates this. And a Windows or FreeBSD implementation of the device and or driver. How ugly would they be? A windows translation layer is fairly easy, people porting software from Windows to Linux told me numerous times that adapting isn't hard. I also believe that at least one of the BSD's has a compatible implementation these days based on the fact that I was asked to allow copying the headers in the past. Thanks Vojtech, that answers my questions. I figure we apply this and see where it leads... Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next] virtio: change comment in transmit
Stephen Hemminger step...@networkplumber.org writes: The original comment was not really informative or funny as well as sexist. Replace it with a better explanation of why the driver does stop and what the impacts are. Signed-off-by: Stephen Hemminger step...@networkplumber.org Fair call. Comment was certainly snarky, probably sexist. I think it expressed my feelings perfectly, however. I note that there's still no comment saying don't do this in netdevice.h; I gather returning NETDEV_TX_BUSY is still considered a Bad Thing? (Does it really BUG_ON?) Thanks, Rusty. --- a/drivers/net/virtio_net.c2015-03-24 15:20:25.174671000 -0700 +++ b/drivers/net/virtio_net.c2015-03-24 16:17:28.478525333 -0700 @@ -939,8 +939,12 @@ static netdev_tx_t start_xmit(struct sk_ skb_orphan(skb); nf_reset(skb); - /* Apparently nice girls don't return TX_BUSY; stop the queue - * before it gets out of hand. Naturally, this wastes entries. */ + /* It is better to stop queue if running out of space + * instead of forcing queuing layer to requeue the skb + * by returning TX_BUSY (and cause a BUG message). + * Since most packets only take 1 or 2 ring slots + * this means 16 slots are typically wasted. + */ if (sq-vq-num_free 2+MAX_SKB_FRAGS) { netif_stop_subqueue(dev, qnum); if (unlikely(!virtqueue_enable_cb_delayed(sq-vq))) { -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4] Add virtio-input driver.
Gerd Hoffmann kra...@redhat.com writes: virtio-input is basically evdev-events-over-virtio, so this driver isn't much more than reading configuration from config space and forwarding incoming events to the linux input layer. Signed-off-by: Gerd Hoffmann kra...@redhat.com Is the input layer sane? I've never dealt with it, so I don't know. What's it used for? Imagine a future virtio standard which incorporates this. And a Windows or FreeBSD implementation of the device and or driver. How ugly would they be? Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_mmio: fix endian-ness for mmio
Michael S. Tsirkin m...@redhat.com writes: On Thu, Mar 12, 2015 at 12:33:36PM +1030, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: Going over the virtio mmio code, I noticed that it doesn't correctly return device config values in LE format when using virtio 1.0. Borrow code from virtio_pci_modern to do this correctly. AFAICT, it doesn't need to. The endian correction is done by the callers. The only reason that virtio_pci_modern() does it is because readl() etc do endian conversion, and we don't want them to. And the PCI part of the spec says to use natural accessors, so we don't do byte-at-a-time. Cheers, Rusty. You are right, the endina-ness is not an issue, so the commit log I wrote is wrong, but I still think the patch is required, because MMIO spec says the same as PCI. The driver MUST only use 32 bit wide and aligned reads and writes to access the control registers described in table 4.1. For the device-specific configuration space, the driver MUST use 8 bit wide accesses for 8 bit wide fields, 16 bit wide and aligned accesses for 16 bit wide fields and 32 bit wide and aligned accesses for 32 and 64 bit wide fields. Ah, good point. I'm sure Pawel is on a beach somewhere sipping cocktails, so I'll apply this immediately (with your updated commit message) to my pending-rebases and give him the weekend to Ack. Thanks, Rusty. Here's a better commit log: --- virtio_mmio: fix access width for mmio Going over the virtio mmio code, I noticed that it doesn't correctly access modern device config values using natural accessors: it uses readb to get/set them byte by byte, while the virtio 1.0 spec explicitly states: 4.2.2.2 Driver Requirements: MMIO Device Register Layout ... The driver MUST only use 32 bit wide and aligned reads and writes to access the control registers described in table 4.1. For the device-specific configuration space, the driver MUST use 8 bit wide accesses for 8 bit wide fields, 16 bit wide and aligned accesses for 16 bit wide fields and 32 bit wide and aligned accesses for 32 and 64 bit wide fields. Borrow code from virtio_pci_modern to do this correctly. Signed-off-by: Michael S. Tsirkin m...@redhat.com Makes sense now, right? Want me to repost or can you just tweak the commit log? Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Note: untested: QEMU doesn't support virtio 1.0 for virtio-mmio. Pawel, could you please confirm this patch makes sense? drivers/virtio/virtio_mmio.c | 79 +++- 1 file changed, 71 insertions(+), 8 deletions(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index cad5698..0375456 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -156,22 +156,85 @@ static void vm_get(struct virtio_device *vdev, unsigned offset, void *buf, unsigned len) { struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); - u8 *ptr = buf; - int i; + void __iomem *base = vm_dev-base + VIRTIO_MMIO_CONFIG; + u8 b; + __le16 w; + __le32 l; - for (i = 0; i len; i++) - ptr[i] = readb(vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i); + if (vm_dev-version == 1) { + u8 *ptr = buf; + int i; + + for (i = 0; i len; i++) + ptr[i] = readb(base + offset + i); + return; + } + + switch (len) { + case 1: + b = readb(base + offset); + memcpy(buf, b, sizeof b); + break; + case 2: + w = cpu_to_le16(readw(base + offset)); + memcpy(buf, w, sizeof w); + break; + case 4: + l = cpu_to_le32(readl(base + offset)); + memcpy(buf, l, sizeof l); + break; + case 8: + l = cpu_to_le32(readl(base + offset)); + memcpy(buf, l, sizeof l); + l = cpu_to_le32(ioread32(base + offset + sizeof l)); + memcpy(buf + sizeof l, l, sizeof l); + break; + default: + BUG(); + } } static void vm_set(struct virtio_device *vdev, unsigned offset, const void *buf, unsigned len) { struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); - const u8 *ptr = buf; - int i; + void __iomem *base = vm_dev-base + VIRTIO_MMIO_CONFIG; + u8 b; + __le16 w; + __le32 l; - for (i = 0; i len; i++) - writeb(ptr[i], vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i); + if (vm_dev-version == 1) { + const u8 *ptr = buf; + int i; + + for (i = 0; i len; i++) + writeb(ptr[i], base + offset + i); + + return; + } + + switch (len) { + case 1: + memcpy(b, buf, sizeof b
Re: [PATCH v2 log fixed] virtio_mmio: fix endian-ness for mmio
Michael S. Tsirkin m...@redhat.com writes: Subject: [PATCH] virtio_mmio: fix access width for mmio Just for the record: Applied. Thanks, Rusty. Going over the virtio mmio code, I noticed that it doesn't correctly access modern device config values using natural accessors: it uses readb to get/set them byte by byte, while the virtio 1.0 spec explicitly states: 4.2.2.2 Driver Requirements: MMIO Device Register Layout ... The driver MUST only use 32 bit wide and aligned reads and writes to access the control registers described in table 4.1. For the device-specific configuration space, the driver MUST use 8 bit wide accesses for 8 bit wide fields, 16 bit wide and aligned accesses for 16 bit wide fields and 32 bit wide and aligned accesses for 32 and 64 bit wide fields. Borrow code from virtio_pci_modern to do this correctly. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- This is exactly the same as PATCH virtio_mmio: fix endian-ness for mmio but with corrected subject and commit log, to make it easier to apply. Note: untested: QEMU doesn't support virtio 1.0 for virtio-mmio, but seems pretty obvious. Let's apply ASAP so we don't ship incompliant drivers for virtio 1.0? drivers/virtio/virtio_mmio.c | 79 +++- 1 file changed, 71 insertions(+), 8 deletions(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index cad5698..0375456 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -156,22 +156,85 @@ static void vm_get(struct virtio_device *vdev, unsigned offset, void *buf, unsigned len) { struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); - u8 *ptr = buf; - int i; + void __iomem *base = vm_dev-base + VIRTIO_MMIO_CONFIG; + u8 b; + __le16 w; + __le32 l; - for (i = 0; i len; i++) - ptr[i] = readb(vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i); + if (vm_dev-version == 1) { + u8 *ptr = buf; + int i; + + for (i = 0; i len; i++) + ptr[i] = readb(base + offset + i); + return; + } + + switch (len) { + case 1: + b = readb(base + offset); + memcpy(buf, b, sizeof b); + break; + case 2: + w = cpu_to_le16(readw(base + offset)); + memcpy(buf, w, sizeof w); + break; + case 4: + l = cpu_to_le32(readl(base + offset)); + memcpy(buf, l, sizeof l); + break; + case 8: + l = cpu_to_le32(readl(base + offset)); + memcpy(buf, l, sizeof l); + l = cpu_to_le32(ioread32(base + offset + sizeof l)); + memcpy(buf + sizeof l, l, sizeof l); + break; + default: + BUG(); + } } static void vm_set(struct virtio_device *vdev, unsigned offset, const void *buf, unsigned len) { struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); - const u8 *ptr = buf; - int i; + void __iomem *base = vm_dev-base + VIRTIO_MMIO_CONFIG; + u8 b; + __le16 w; + __le32 l; - for (i = 0; i len; i++) - writeb(ptr[i], vm_dev-base + VIRTIO_MMIO_CONFIG + offset + i); + if (vm_dev-version == 1) { + const u8 *ptr = buf; + int i; + + for (i = 0; i len; i++) + writeb(ptr[i], base + offset + i); + + return; + } + + switch (len) { + case 1: + memcpy(b, buf, sizeof b); + writeb(b, base + offset); + break; + case 2: + memcpy(w, buf, sizeof w); + writew(le16_to_cpu(w), base + offset); + break; + case 4: + memcpy(l, buf, sizeof l); + writel(le32_to_cpu(l), base + offset); + break; + case 8: + memcpy(l, buf, sizeof l); + writel(le32_to_cpu(l), base + offset); + memcpy(l, buf + sizeof l, sizeof l); + writel(le32_to_cpu(l), base + offset + sizeof l); + break; + default: + BUG(); + } } static u8 vm_get_status(struct virtio_device *vdev) -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] uapi/virtio_scsi: allow overriding CDB/SENSE size
Michael S. Tsirkin m...@redhat.com writes: On Wed, Mar 11, 2015 at 02:19:03PM +0100, Michael S. Tsirkin wrote: QEMU wants to use virtio scsi structures with a different VIRTIO_SCSI_CDB_SIZE/VIRTIO_SCSI_SENSE_SIZE, let's add ifdefs to allow overriding them. Keep the old defines under new names: VIRTIO_SCSI_CDB_DEFAULT_SIZE/VIRTIO_SCSI_SENSE_DEFAULT_SIZE, since that's what these values really are: defaults for cdb/sense size fields. Suggested-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Rusty, pls note, if possible I'd like this one merged for 4.0 because it's important for QEMU (which now imports linux headers). OK, applied. Thanks, Rusty. include/uapi/linux/virtio_scsi.h | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/virtio_scsi.h b/include/uapi/linux/virtio_scsi.h index 42b9370..cc18ef8 100644 --- a/include/uapi/linux/virtio_scsi.h +++ b/include/uapi/linux/virtio_scsi.h @@ -29,8 +29,16 @@ #include linux/virtio_types.h -#define VIRTIO_SCSI_CDB_SIZE 32 -#define VIRTIO_SCSI_SENSE_SIZE 96 +/* Default values of the CDB and sense data size configuration fields */ +#define VIRTIO_SCSI_CDB_DEFAULT_SIZE 32 +#define VIRTIO_SCSI_SENSE_DEFAULT_SIZE 96 + +#ifndef VIRTIO_SCSI_CDB_SIZE +#define VIRTIO_SCSI_CDB_SIZE VIRTIO_SCSI_CDB_DEFAULT_SIZE +#endif +#ifndef VIRTIO_SCSI_SENSE_SIZE +#define VIRTIO_SCSI_SENSE_SIZE VIRTIO_SCSI_SENSE_DEFAULT_SIZE +#endif /* SCSI command request, followed by data-out */ struct virtio_scsi_cmd_req { -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_rpmsg: set DRIVER_OK before using device
Ohad Ben-Cohen o...@wizery.com writes: On Mon, Mar 9, 2015 at 10:41 AM, Michael S. Tsirkin m...@redhat.com wrote: On Sat, Mar 07, 2015 at 08:06:56PM +0100, Michael S. Tsirkin wrote: virtio spec requires that all drivers set DRIVER_OK before using devices. While rpmsg isn't yet included in the virtio 1 spec, previous spec versions also required this. virtio rpmsg violates this rule: is calls kick before setting DRIVER_OK. The fix isn't trivial since simply calling virtio_device_ready earlier would mean we might get an interrupt in parallel with adding buffers. Instead, split kick out to prepare+notify calls. prepare before virtio_device_ready - when we know we won't get interrupts. notify right afterwards. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Ohad, can you review and ack pls? Sure, Acked-by: Ohad Ben-Cohen o...@wizery.com Applied. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: virtio balloon: do not call blocking ops when !TASK_RUNNING
Cornelia Huck cornelia.h...@de.ibm.com writes: On Wed, 4 Mar 2015 11:25:56 +0100 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Mar 04, 2015 at 04:44:54PM +1030, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: On Mon, Mar 02, 2015 at 10:37:26AM +1030, Rusty Russell wrote: Thomas Huth th...@linux.vnet.ibm.com writes: On Thu, 26 Feb 2015 11:50:42 +1030 Rusty Russell ru...@rustcorp.com.au wrote: Thomas Huth th...@linux.vnet.ibm.com writes: Hi all, with the recent kernel 3.19, I get a kernel warning when I start my KVM guest on s390 with virtio balloon enabled: The deeper problem is that virtio_ccw_get_config just silently fails on OOM. Neither get_config nor set_config are expected to fail. AFAIK this is currently not a problem. According to http://lwn.net/Articles/627419/ these kmalloc calls never fail because they allocate less than a page. I strongly suggest you unlearn that fact. The fix for this is in two parts: 1) Annotate using sched_annotate_sleep() and add a comment: we may spin a few times in low memory situations, but this isn't a high performance path. 2) Handle get_config (and other) failure in some more elegant way. Cheers, Rusty. I agree, but I'd like to point out that even without kmalloc, on s390 get_config is blocking - it's waiting for a hardware interrupt. And it makes sense: config is not data path, I don't think we should spin there. So I think besides these two parts, we still need my two patches: virtio-balloon: do not call blocking ops when !TASK_RUNNING I prefer to annotate, over trying to fix this. Because it's not important. We might spin a few times, but it's very unlikely, and it's certainly not performance critical. Thanks, Rusty. Subject: virtio_balloon: annotate possible sleep waiting for event. CCW (s390) does this. Reported-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 0413157f3b49..3f4d5acdbde0 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -340,6 +340,15 @@ static int balloon(void *_vballoon) s64 diff; try_to_freeze(); + + /* + * Reading the config on the ccw backend involves an + * allocation, so we may actually sleep and have an + * extra iteration. It's extremely unlikely, Hmm, this part of the comment seems wrong to me. Reading the config on the ccw backend always sleeps because it's interrupt driven. (...) So I suspect http://mid.gmane.org/1424874878-17155-1-git-send-email-...@redhat.com is better. What do you think? I'd prefer to fix this as well. While the I/O request completes instantly on current qemu (the ssch backend handles the start function immediately, not asynchronously as on real hardware), this (a) is an implementation detail that may change and (b) doesn't account for the need to deliver the interrupt to the guest - which might take non-zero time. Ah, I see. My mistake. I've thrown out my patch, applied that one. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] virtio-balloon: do not call blocking ops when !TASK_RUNNING
Thomas Huth th...@linux.vnet.ibm.com writes: On Wed, 25 Feb 2015 16:11:27 +0100 Cornelia Huck cornelia.h...@de.ibm.com wrote: On Wed, 25 Feb 2015 15:36:02 +0100 Michael S. Tsirkin m...@redhat.com wrote: virtio balloon has this code: wait_event_interruptible(vb-config_change, (diff = towards_target(vb)) != 0 || vb-need_stats_update || kthread_should_stop() || freezing(current)); Which is a problem because towards_target() call might block after wait_event_interruptible sets task state to TAST_INTERRUPTIBLE, causing the task_struct::state collision typical of nesting of sleeping primitives See also http://lwn.net/Articles/628628/ or Thomas's bug report http://article.gmane.org/gmane.linux.kernel.virtualization/24846 for a fuller explanation. To fix, rewrite using wait_woken. Cc: sta...@vger.kernel.org Reported-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- changes from v1: remove wait_event_interruptible noticed by Cornelia Huck cornelia.h...@de.ibm.com drivers/virtio/virtio_balloon.c | 19 ++- 1 file changed, 14 insertions(+), 5 deletions(-) I was able to reproduce Thomas' original problem and can confirm that it is gone with this patch. Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Right, I just applied the patch on my system, too, and the problem is indeed gone! Thanks for the quick fix! Tested-by: Thomas Huth th...@linux.vnet.ibm.com Applied. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_balloon: set DRIVER_OK before using device
Michael S. Tsirkin m...@redhat.com writes: On Thu, Mar 05, 2015 at 01:24:47PM +1030, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: virtio spec requires that all drivers set DRIVER_OK before using devices. While balloon isn't yet included in the virtio 1 spec, previous spec versions also required this. virtio balloon might violate this rule: probe calls kthread_run before setting DRIVER_OK, which might run immediately and cause balloon to inflate/deflate. To fix, call virtio_device_ready before running the kthread. Signed-off-by: Michael S. Tsirkin m...@redhat.com Replied, CC'd stable. Did you mean applied? I stand erected. Franks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_rpmsg: set DRIVER_OK before using device
Michael S. Tsirkin m...@redhat.com writes: virtio spec requires that all drivers set DRIVER_OK before using devices. While rpmsg isn't yet included in the virtio 1 spec, previous spec versions also required this. virtio rpmsg violates this rule: is calls kick before setting DRIVER_OK. The fix isn't trivial since simply calling virtio_device_ready earlier would mean we might get an interrupt in parallel with adding buffers. Instead, split kick out to prepare+notify calls. prepare before virtio_device_ready - when we know we won't get interrupts. notify right afterwards. Signed-off-by: Michael S. Tsirkin m...@redhat.com Applied. I'll wait for Ohad to ack before sending to Linus. BTW I assume you have a version of qemu which warns on these kind of failures? That'd be nice to have! Thanks, Rusty. --- Note: compile-tested only. drivers/rpmsg/virtio_rpmsg_bus.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c index 92f6af6..73354ee 100644 --- a/drivers/rpmsg/virtio_rpmsg_bus.c +++ b/drivers/rpmsg/virtio_rpmsg_bus.c @@ -951,6 +951,7 @@ static int rpmsg_probe(struct virtio_device *vdev) void *bufs_va; int err = 0, i; size_t total_buf_space; + bool notify; vrp = kzalloc(sizeof(*vrp), GFP_KERNEL); if (!vrp) @@ -1030,8 +1031,22 @@ static int rpmsg_probe(struct virtio_device *vdev) } } + /* + * Prepare to kick but don't notify yet - we can't do this before + * device is ready. + */ + notify = virtqueue_kick_prepare(vrp-rvq); + + /* From this point on, we can notify and get callbacks. */ + virtio_device_ready(vdev); + /* tell the remote processor it can start sending messages */ - virtqueue_kick(vrp-rvq); + /* + * this might be concurrent with callbacks, but we are only + * doing notify, not a full kick here, so that's ok. + */ + if (notify) + virtqueue_notify(vrp-rvq); dev_info(vdev-dev, rpmsg host is online\n); -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: virtio fixes pull for 4.0?
Michael S. Tsirkin m...@redhat.com writes: Hi Rusty! There are a bunch of (mostly virtio 1.0 related) fixes for virtio that need to go into 4.0 I think. virtio_blk: typo fix virtio_blk: fix comment for virtio 1.0 OK, I've added these two. I tend to be overcautious after the merge window. virtio_console: init work unconditionally virtio_console: avoid config access from irq virtio_balloon: set DRIVER_OK before using device seem ready? These are in my virtio-next tree already. virtio_mmio: generation support virtio_mmio: fix endian-ness for mmio these two are waiting for ack by Pawel These two fix bugs in virtio 1.0 code for mmio. Host code for that was AFAIK not posted, so I can't test properly. Pawel? I'm waiting on Acks for these two. virtio-balloon: do not call blocking ops when !TASK_RUNNING Rusty, it seems you prefer a different fix for this issue, while Cornelia prefers mine. Maybe both me and Cornelia misunderstand the issue? I know you dealt with a ton of similar issues recently so you are more likely to be right, but I'd like to understand what's going on better all the same. Could you help please? In the longer run, we should handle failures from these callbacks. But we don't need to do that right now. So we want the minimal fix. And an annotation is the minimal fix. The bug has been there for ages; it's just the warning that is new (if we *always* slept, we would spin, but in practice we'll rarely sleep). virtio_rpmsg: set DRIVER_OK before using device Just posted this, but seems pretty obvious. Yep, I've applied this too. Thanks! I think it's a good idea to merge these patches (maybe except the !TASK_RUNNING thing) sooner rather than later, to make sure people have the time to test the fixes properly. Would you like me to pack up (some of them) them up and do a pull request? I'm waiting a bit longer, since they seem to still be tricking in. I'm still chasing a QEMU bug, where it seems to fill in a number too large in the 'len' field for a block device. It should be 1 byte for a block device write, for example. See patch which causes assert() in qemu, but I had to stop at that point (should get back tomorrow I hope). Thanks, Rusty. diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 882a31b..98e99b8 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -243,16 +243,21 @@ int virtio_queue_empty(VirtQueue *vq) } void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len, unsigned int idx) +unsigned int len_written, unsigned int idx) { -unsigned int offset; +unsigned int offset, tot_wlen; int i; -trace_virtqueue_fill(vq, elem, len, idx); +trace_virtqueue_fill(vq, elem, len_written, idx); + +for (tot_wlen = i = 0; i elem-out_num; i++) { +tot_wlen += elem-out_sg[i].iov_len; +} +assert(len_written = tot_wlen); offset = 0; for (i = 0; i elem-in_num; i++) { -size_t size = MIN(len - offset, elem-in_sg[i].iov_len); +size_t size = MIN(len_written - offset, elem-in_sg[i].iov_len); cpu_physical_memory_unmap(elem-in_sg[i].iov_base, elem-in_sg[i].iov_len, @@ -270,7 +275,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, /* Get a pointer to the next entry in the used ring. */ vring_used_ring_id(vq, idx, elem-index); -vring_used_ring_len(vq, idx, len); +vring_used_ring_len(vq, idx, len_written); } void virtqueue_flush(VirtQueue *vq, unsigned int count) @@ -288,9 +293,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count) } void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len) +unsigned int len_written) { -virtqueue_fill(vq, elem, len, 0); +virtqueue_fill(vq, elem, len_written, 0); virtqueue_flush(vq, 1); } diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index df09993..153374f 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -191,10 +191,10 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size, void virtio_del_queue(VirtIODevice *vdev, int n); void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len); +unsigned int len_written); void virtqueue_flush(VirtQueue *vq, unsigned int count); void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, -unsigned int len, unsigned int idx); +unsigned int len_written, unsigned int idx); void virtqueue_map_sg(struct iovec *sg, hwaddr *addr, size_t num_sg, int is_write); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org
Re: [PATCH] virtio_balloon: set DRIVER_OK before using device
Michael S. Tsirkin m...@redhat.com writes: virtio spec requires that all drivers set DRIVER_OK before using devices. While balloon isn't yet included in the virtio 1 spec, previous spec versions also required this. virtio balloon might violate this rule: probe calls kthread_run before setting DRIVER_OK, which might run immediately and cause balloon to inflate/deflate. To fix, call virtio_device_ready before running the kthread. Signed-off-by: Michael S. Tsirkin m...@redhat.com Replied, CC'd stable. Thanks, Rusty. --- drivers/virtio/virtio_balloon.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 5a6ad6d..6a356e3 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -508,6 +508,8 @@ static int virtballoon_probe(struct virtio_device *vdev) if (err 0) goto out_oom_notify; + virtio_device_ready(vdev); + vb-thread = kthread_run(balloon, vb, vballoon); if (IS_ERR(vb-thread)) { err = PTR_ERR(vb-thread); -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio-pci: switch to modern accessors for 1.0
Michael S. Tsirkin m...@redhat.com writes: On Wed, Mar 04, 2015 at 11:06:08AM +1030, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: virtio 1.0 config space is in LE format for all devices, use modern wrappers when accessed through the 1.0 BAR. Hmm, I'm not so sure about these patches. It's easy to miss the existence of the _modern variants, and they're 90% the same as the legacy variants. But as long as it's fixed... Thanks, Rusty. PS. rng, block, net and 9p virtio 1.0 seem to work OK on BE guests (LE host). Hmm good point. What if I'll rework this to get bool legacy parameter? The functions have access to vdev, so they can determine that already. Simply renaming the old version to _legacy would be enough. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: virtio balloon: do not call blocking ops when !TASK_RUNNING
Michael S. Tsirkin m...@redhat.com writes: On Mon, Mar 02, 2015 at 10:37:26AM +1030, Rusty Russell wrote: Thomas Huth th...@linux.vnet.ibm.com writes: On Thu, 26 Feb 2015 11:50:42 +1030 Rusty Russell ru...@rustcorp.com.au wrote: Thomas Huth th...@linux.vnet.ibm.com writes: Hi all, with the recent kernel 3.19, I get a kernel warning when I start my KVM guest on s390 with virtio balloon enabled: The deeper problem is that virtio_ccw_get_config just silently fails on OOM. Neither get_config nor set_config are expected to fail. AFAIK this is currently not a problem. According to http://lwn.net/Articles/627419/ these kmalloc calls never fail because they allocate less than a page. I strongly suggest you unlearn that fact. The fix for this is in two parts: 1) Annotate using sched_annotate_sleep() and add a comment: we may spin a few times in low memory situations, but this isn't a high performance path. 2) Handle get_config (and other) failure in some more elegant way. Cheers, Rusty. I agree, but I'd like to point out that even without kmalloc, on s390 get_config is blocking - it's waiting for a hardware interrupt. And it makes sense: config is not data path, I don't think we should spin there. So I think besides these two parts, we still need my two patches: virtio-balloon: do not call blocking ops when !TASK_RUNNING I prefer to annotate, over trying to fix this. Because it's not important. We might spin a few times, but it's very unlikely, and it's certainly not performance critical. Thanks, Rusty. Subject: virtio_balloon: annotate possible sleep waiting for event. CCW (s390) does this. Reported-by: Thomas Huth th...@linux.vnet.ibm.com Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 0413157f3b49..3f4d5acdbde0 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -340,6 +340,15 @@ static int balloon(void *_vballoon) s64 diff; try_to_freeze(); + + /* +* Reading the config on the ccw backend involves an +* allocation, so we may actually sleep and have an +* extra iteration. It's extremely unlikely, and this +* isn't a fast path in any sense. +*/ + sched_annotate_sleep(); + wait_event_interruptible(vb-config_change, (diff = towards_target(vb)) != 0 || vb-need_stats_update ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio-pci: switch to modern accessors for 1.0
Michael S. Tsirkin m...@redhat.com writes: virtio 1.0 config space is in LE format for all devices, use modern wrappers when accessed through the 1.0 BAR. Hmm, I'm not so sure about these patches. It's easy to miss the existence of the _modern variants, and they're 90% the same as the legacy variants. But as long as it's fixed... Thanks, Rusty. PS. rng, block, net and 9p virtio 1.0 seem to work OK on BE guests (LE host). ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_console: avoid config access from irq
Amit Shah amit.s...@redhat.com writes: On (Sat) 28 Feb 2015 [18:42:25], Michael S. Tsirkin wrote: when multiport is off, virtio console invokes config access from irq context, config access is blocking on s390. Fix this up by scheduling work from config irq - similar to what we do for multiport configs. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Amit Shah amit.s...@redhat.com Applied, thanks. Should these be CC: stable? Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_console: init work unconditionally
Amit Shah amit.s...@redhat.com writes: On (Sat) 28 Feb 2015 [18:41:34], Michael S. Tsirkin wrote: when multiport is off, we don't initialize config work, but we then cancel uninitialized control_work on freeze. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Amit Shah amit.s...@redhat.com Thanks, applied. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio_blk: fix comment for virtio 1.0
Michael S. Tsirkin m...@redhat.com writes: Fix up comment to match virtio 1.0 logic: virtio_blk_outhdr isn't the first elements anymore, the only requirement is that it comes first in the s/g list. Signed-off-by: Michael S. Tsirkin m...@redhat.com Thanks, both applied. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: virtio balloon: do not call blocking ops when !TASK_RUNNING
Thomas Huth th...@linux.vnet.ibm.com writes: On Thu, 26 Feb 2015 11:50:42 +1030 Rusty Russell ru...@rustcorp.com.au wrote: Thomas Huth th...@linux.vnet.ibm.com writes: Hi all, with the recent kernel 3.19, I get a kernel warning when I start my KVM guest on s390 with virtio balloon enabled: The deeper problem is that virtio_ccw_get_config just silently fails on OOM. Neither get_config nor set_config are expected to fail. AFAIK this is currently not a problem. According to http://lwn.net/Articles/627419/ these kmalloc calls never fail because they allocate less than a page. I strongly suggest you unlearn that fact. The fix for this is in two parts: 1) Annotate using sched_annotate_sleep() and add a comment: we may spin a few times in low memory situations, but this isn't a high performance path. 2) Handle get_config (and other) failure in some more elegant way. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: virtio balloon: do not call blocking ops when !TASK_RUNNING
Thomas Huth th...@linux.vnet.ibm.com writes: Hi all, with the recent kernel 3.19, I get a kernel warning when I start my KVM guest on s390 with virtio balloon enabled: The deeper problem is that virtio_ccw_get_config just silently fails on OOM. Neither get_config nor set_config are expected to fail. Cornelia, I think ccw and config_area should be allocated inside vcdev. You could either use pointers, or simply allocate vcdev with GDP_DMA. This would avoid the kmalloc inside these calls. Thanks, Rusty. [0.839687] do not call blocking ops when !TASK_RUNNING; state=1 set at [00174a1e] prepare_to_wait_event+0x7e/0x108 [0.839694] [ cut here ] [0.839697] WARNING: at kernel/sched/core.c:7326 [0.839698] Modules linked in: [0.839702] CPU: 0 PID: 46 Comm: vballoon Not tainted 3.19.0 #233 [0.839705] task: 021d ti: 021d8000 task.ti: 021d8000 [0.839707] Krnl PSW : 0704c0018000 0015bf8e (__might_sleep+0x8e/0x98) [0.839713]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3 Krnl GPRS: 000d 021d 0071 0001 [0.839718]00675ace 01998c50 [0.839720]00982134 0058f824 00a008a8 [0.839722]04d9 007ea992 0015bf8a 021dbc28 [0.839731] Krnl Code: 0015bf7e: c0200033e838 larl %r2,7d8fee 0015bf84: c0e50028cd62 brasl %r14,675a48 #0015bf8a: a7f40001 brc 15,15bf8c 0015bf8e: 9201a000 mvi 0(%r10),1 0015bf92: a7f4ffe2 brc 15,15bf56 0015bf96: 0707 bcr 0,%r7 0015bf98: ebdff0800024 stmg%r13,%r15,128(%r15) 0015bf9e: a7f13fe0 tmll%r15,16352 [0.839749] Call Trace: [0.839751] ([0015bf8a] __might_sleep+0x8a/0x98) [0.839756] [0028a562] __kmalloc+0x272/0x350 [0.839759] [0058f824] virtio_ccw_get_config+0x3c/0x100 [0.839762] [0049fcb0] balloon+0x1b8/0x330 [0.839765] [001529c8] kthread+0x120/0x138 [0.839767] [00683c22] kernel_thread_starter+0x6/0xc [0.839770] [00683c1c] kernel_thread_starter+0x0/0xc [0.839772] no locks held by vballoon/46. [0.839773] Last Breaking-Event-Address: [0.839776] [0015bf8a] __might_sleep+0x8a/0x98 [0.839778] ---[ end trace d27fcdfa27273d7c ]--- The problem seems to be this code in balloon() in drivers/virtio/virtio_balloon.c: wait_event_interruptible(vb-config_change, (diff = towards_target(vb)) != 0 || vb-need_stats_update || kthread_should_stop() || freezing(current)); wait_event_interruptible() sets the state of the current task to TASK_INTERRUPTIBLE, then checks the condition. The condition contains towards_target() which reads the virtio config space via virtio_cread(). On s390, this then triggers virtio_ccw_get_config() - and this function calls some other functions again that might sleep (e.g. kzalloc or wait_event in ccw_io_helper) ... and this causes the new kernel warning message with kernel 3.19. I think it would be quite difficult or at least ugly to rewrite virtio_ccw_get_config() so that it does not call sleepable functions anymore. So would it be feasible to rewrite the balloon() function that it does not call the towards_target() in its wait_event condition anymore? I am unfortunately not that familiar with the balloon code semantics, so any help is very appreciated here! Thanks, Thomas ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Qemu and virtio 1.0
OK, I am trying to experiment with virtio 1.0 support using the latest kernel and MST's qemu tree: https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/?h=virtio-1.0 The first issue is that the device config endian was wrong (see attached patch). I'm now setting up a BE guest on my x86 laptop, and a BE and LE guest on a BE powerpc machine, to check that all combinations work correctly. If others test too, that would be appreciated! Cheers, Rusty. From 95ac91554ed602f856a2a5fcc25eaffcad1b1c8d Mon Sep 17 00:00:00 2001 From: Rusty Russell ru...@rustcorp.com.au Date: Tue, 24 Feb 2015 14:47:44 +1030 Subject: [PATCH] virtio_config_write*/virtio_config_read*: Don't endian swap for virtio 1.0. Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 079944c..882a31b 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -662,7 +662,12 @@ uint32_t virtio_config_readw(VirtIODevice *vdev, uint32_t addr) k-get_config(vdev, vdev-config); -val = lduw_p(vdev-config + addr); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +val = lduw_le_p(vdev-config + addr); +} else { +val = lduw_p(vdev-config + addr); +} return val; } @@ -677,7 +682,12 @@ uint32_t virtio_config_readl(VirtIODevice *vdev, uint32_t addr) k-get_config(vdev, vdev-config); -val = ldl_p(vdev-config + addr); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +val = ldl_le_p(vdev-config + addr); +} else { +val = ldl_p(vdev-config + addr); +} return val; } @@ -706,7 +716,12 @@ void virtio_config_writew(VirtIODevice *vdev, uint32_t addr, uint32_t data) return; } -stw_p(vdev-config + addr, val); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +stw_le_p(vdev-config + addr, val); +} else { +stw_p(vdev-config + addr, val); +} if (k-set_config) { k-set_config(vdev, vdev-config); @@ -722,7 +737,12 @@ void virtio_config_writel(VirtIODevice *vdev, uint32_t addr, uint32_t data) return; } -stl_p(vdev-config + addr, val); +/* Virtio 1.0 is always LE */ +if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) { +stl_le_p(vdev-config + addr, val); +} else { +stl_p(vdev-config + addr, val); +} if (k-set_config) { k-set_config(vdev, vdev-config); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PULL] virtio-next
The following changes since commit b97f880c8342fd6e49a02c9ef7507a678722b2b3: Merge branch 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata (2015-01-21 07:54:16 +1200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git tags/virtio-next-for-linus for you to fetch changes up to 5b40a7daf51812b35cf05d1601a779a7043f8414: virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice. (2015-02-17 16:19:29 +1030) OK, this has the big virtio 1.0 implementation, as specified by OASIS. On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to double-check the implementation. Then comes the inevitable fixes and cleanups from that work. Thanks, Rusty. Luis R. Rodriguez (1): virtual: Documentation: simplify and generalize paravirt_ops.txt Michael S. Tsirkin (23): virtio_pci: drop virtio_config dependency virtio/9p: verify device has config space virtio/blk: verify device has config space virtio/console: verify device has config space virtio/net: verify device has config space virtio/scsi: verify device has config space virtio/balloon: verify device has config space mn10300: drop dead code pci: add pci_iomap_range s390: add pci_iomap_range virtio_pci: move probe/remove code to common virtio_pci: modern driver virtio_pci_modern: reduce number of mappings virtio_pci_modern: support devices with no config virtio_balloon: coding style fixes virtio_blk: coding style fixes virtio_ring: coding style fix virtio_rng: drop extra empty line virtio_pci: Kconfig grammar fix virtio_pci: drop Kconfig warnings virtio_pci: add an option to disable legacy driver virtio_pci: add module param to force legacy mode virtio_pci_modern: drop an unused function Pawel Moll (1): virtio-mmio: Update the device to OASIS spec version Rusty Russell (53): virtio-pci: define layout for virtio 1.0 virtio_pci: macros for PCI layout offsets virtio: define VIRTIO_PCI_CAP_PCI_CFG in header. virtio: Don't expose legacy block features when VIRTIO_BLK_NO_LEGACY defined. virtio: Don't expose legacy config features when VIRTIO_CONFIG_NO_LEGACY defined. virtio_pci: use 16-bit accessor for queue_enable. virtio: don't require a config space on the console device. lguest: have --rng read from /dev/urandom not /dev/random. lguest: add operations to get/set a register from the Launcher. lguest: write more information to userspace about pending traps. lguest: add infrastructure for userspace to deliver a trap to the guest. lguest: add infrastructure to check mappings. lguest: send trap 13 through to userspace. lguest: suppress PS/2 keyboard polling. lguest: don't disable iospace. lguest: add iomem region, where guest page faults get sent to userspace. lguest: disable ACPI explicitly. lguest: Override pcibios_enable_irq/pcibios_disable_irq to our stupid PIC lguest: add MMIO region allocator in example launcher. lguest: decode mmio accesses for PCI in example launcher. lguest: add PCI config space emulation to example launcher. lguest: implement virtio-PCI MMIO accesses. lguest: fix failure to find linux/virtio_types.h lguest: add a dummy PCI host bridge. lguest: Convert block device to virtio 1.0 PCI. lguest: Convert net device to virtio 1.0 PCI. lguest: Convert entropy device to virtio 1.0 PCI. lguest: Convert console device to virtio 1.0 PCI. lguest: define VIRTIO_CONFIG_NO_LEGACY in example launcher. lguest: remove support for lguest bus. lguest: remove support for lguest bus in demonstration launcher. lguest: remove lguest bus definitions from header. lguest: support emerg_wr in console device in example launcher. lguest: support backdoor window. lguest: always put console in PCI slot #1. lguest: use the PCI console device's emerg_wr for early boot messages. lguest: remove NOTIFY facility from demonstration launcher. lguest: remove NOTIFY call and eventfd facility. tools/lguest: handle device reset correctly in example launcher. tools/lguest: fix features_accepted logic in example launcher. tools/lguest: rename virtio_pci_cfg_cap field to match spec. tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI) tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI) tools/lguest: handle indirect partway through chain. tools/lguest: don't start devices until DRIVER_OK status set. lguest: don't look in console features to find emerg_wr. tools/lguest: more documentation
Re: [PATCH 1/2] virtio_pci_modern: type-safe io accessors
Rusty Russell ru...@rustcorp.com.au writes: Michael S. Tsirkin m...@redhat.com writes: The spec is very clear on this: 4.1.3.1 Driver Requirements: PCI Device Layout The driver MUST access each field using the “natural” access method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit fields and 8-bit accesses for 8-bit fields. Add type-safe wrappers to prevent access with incorrect width. Applied both (for *next* merge window). ... and fixed this: +static inline void vp_iowrite32(u32 value, u32 __iomem *addr) +{ +iowrite16(value, addr); +} Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
Cornelia Huck cornelia.h...@de.ibm.com writes: On Fri, 13 Feb 2015 13:22:09 +1030 Rusty Russell ru...@rustcorp.com.au wrote: Michael S. Tsirkin m...@redhat.com writes: On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote: Jason Wang jasow...@redhat.com writes: This patch enables the interrupt coalescing setting through ethtool. The problem is that there's nothing network specific about interrupt coalescing. I can see other devices wanting exactly the same thing, which means we'd deprecate this in the next virtio standard. I think the right answer is to extend like we did with vring_used_event(), eg: 1) Add a new feature VIRTIO_F_RING_COALESCE. 2) Add another a 32-bit field after vring_used_event(), eg: #define vring_used_delay(vr) (*(u32 *)((vr)-avail-ring[(vr)-num + 2])) This loses the ability to coalesce by number of frames, but we can still do number of sg entries, as we do now with used_event, and we could change virtqueue_enable_cb_delayed() to take a precise number if we wanted. But do we expect delay to be update dynamically? If not, why not stick it in config space? Hmm, we could update it dynamically (and will, in the case of ethtool). But it won't be common, so we could append a field to virtio_pci_common_cfg for PCI. I think MMIO and CCW would be easy to extend too, but CC'd to check. If this is a simple extension of the config space, it should just work for ccw (the Linux guest driver currently uses 0x100 as max config space size, which I grabbed from pci at the time I wrote it). But looking at this virtio_pci_common_cfg stuff, it seems to contain a lot of things that are handled via ccws on virtio-ccw (like number of queues or device status). Having an extra ccw just for changing this delay value seems like overkill. Yes, possibly. On the basic topic of interrupt coalescing: With adapter interrupts, virtio-ccw already has some kind of coalescing: The summary indicator is set just once and an interrupt is made pending, then individual queue indicators are switched on and no further interrupt is generated if the summary indicator has not been cleared by the guest yet. I'm not sure how it would be different if an individual queue indicator is switched on later. Chances are that the guest code processing the indicators has not even yet processed to that individual indicator, so it wouldn't matter if it was set delayed. It is probably something that has to be tried out. The network driver will do this at the virtio level too: no more rx interrupts will be received until all packets have been processed. But it is particularly useful for network transmit interrupts: we want to be notified of the packet's finishing, but a little delay (hence more batching) is better. For rx, I can envision a case where the guest is too fast and thus keeps getting interrupted after each packet. A user might decide to trade off some latency to increase batching; seems like a bit like a benchmark hack to me, though... Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
Pawel Moll pawel.m...@arm.com writes: On Fri, 2015-02-13 at 02:52 +, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote: Jason Wang jasow...@redhat.com writes: This patch enables the interrupt coalescing setting through ethtool. The problem is that there's nothing network specific about interrupt coalescing. I can see other devices wanting exactly the same thing, which means we'd deprecate this in the next virtio standard. I think the right answer is to extend like we did with vring_used_event(), eg: 1) Add a new feature VIRTIO_F_RING_COALESCE. 2) Add another a 32-bit field after vring_used_event(), eg: #define vring_used_delay(vr) (*(u32 *)((vr)-avail-ring[(vr)-num + 2])) This loses the ability to coalesce by number of frames, but we can still do number of sg entries, as we do now with used_event, and we could change virtqueue_enable_cb_delayed() to take a precise number if we wanted. But do we expect delay to be update dynamically? If not, why not stick it in config space? Hmm, we could update it dynamically (and will, in the case of ethtool). But it won't be common, so we could append a field to virtio_pci_common_cfg for PCI. I think MMIO and CCW would be easy to extend too, but CC'd to check. As far as I understand the virtio_pci_common_cfg principle (just had a look, for the first time ;-), it's now an equivalent of the MMIO control registers block. I see no major problem with adding another one. OK, thanks. Or were you thinking about introducing some standard for the real config space? (fine with me as well - the transport will have nothing to do :-) No, that'd not be possible at this point. I think it's a per-transport decision. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/2] virtio_pci_modern: type-safe io accessors
Michael S. Tsirkin m...@redhat.com writes: The spec is very clear on this: 4.1.3.1 Driver Requirements: PCI Device Layout The driver MUST access each field using the “natural” access method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit fields and 8-bit accesses for 8-bit fields. Add type-safe wrappers to prevent access with incorrect width. Applied both (for *next* merge window). Thanks, Rusty. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/virtio/virtio_pci_modern.c | 37 + 1 file changed, 37 insertions(+) diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 2aa38e5..4e73ef7 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -20,6 +20,43 @@ #define VIRTIO_PCI_NO_LEGACY #include virtio_pci_common.h +/* + * Type-safe wrappers for io accesses. + * Use these to enforce at compile time the following spec requirement: + * + * The driver MUST access each field using the “natural” access + * method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses + * for 16-bit fields and 8-bit accesses for 8-bit fields. + */ +static inline u8 vp_ioread8(u8 __iomem *addr) +{ + return ioread8(addr); +} +static inline u16 vp_ioread16 (u16 __iomem *addr) +{ + return ioread16(addr); +} + +static inline u32 vp_ioread32(u32 __iomem *addr) +{ + return ioread32(addr); +} + +static inline void vp_iowrite8(u8 value, u8 __iomem *addr) +{ + iowrite8(value, addr); +} + +static inline void vp_iowrite16(u16 value, u16 __iomem *addr) +{ + iowrite16(value, addr); +} + +static inline void vp_iowrite32(u32 value, u32 __iomem *addr) +{ + iowrite16(value, addr); +} + static void __iomem *map_capability(struct pci_dev *dev, int off, size_t minlen, u32 align, -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.
I noticed this with the console device. It's not *wrong*, just a bit weird. Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index b9f70dfc4751..5ce2aa48fc6e 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -236,7 +236,10 @@ static int virtio_dev_probe(struct device *_d) if (err) goto err; - add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK); + /* If probe didn't do it, mark device DRIVER_OK ourselves. */ + if (!(dev-config-get_status(dev) VIRTIO_CONFIG_S_DRIVER_OK)) + virtio_device_ready(dev); + if (drv-scan) drv-scan(dev); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
Michael S. Tsirkin m...@redhat.com writes: On Tue, Feb 10, 2015 at 12:02:37PM +1030, Rusty Russell wrote: Jason Wang jasow...@redhat.com writes: This patch enables the interrupt coalescing setting through ethtool. The problem is that there's nothing network specific about interrupt coalescing. I can see other devices wanting exactly the same thing, which means we'd deprecate this in the next virtio standard. I think the right answer is to extend like we did with vring_used_event(), eg: 1) Add a new feature VIRTIO_F_RING_COALESCE. 2) Add another a 32-bit field after vring_used_event(), eg: #define vring_used_delay(vr) (*(u32 *)((vr)-avail-ring[(vr)-num + 2])) This loses the ability to coalesce by number of frames, but we can still do number of sg entries, as we do now with used_event, and we could change virtqueue_enable_cb_delayed() to take a precise number if we wanted. But do we expect delay to be update dynamically? If not, why not stick it in config space? Hmm, we could update it dynamically (and will, in the case of ethtool). But it won't be common, so we could append a field to virtio_pci_common_cfg for PCI. I think MMIO and CCW would be easy to extend too, but CC'd to check. My feeling is that this should be a v1.0-only feature though (eg. feature bit 33). Yes, e.g. we can't extend config space for legacy virtio pci. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtual: Documentation: simplify and generalize paravirt_ops.txt
Luis R. Rodriguez mcg...@do-not-panic.com writes: From: Luis R. Rodriguez mcg...@suse.com The general documentation we have for pv_ops is currenty present on the IA64 docs, but since this documentation covers IA64 xen enablement and IA64 Xen support got ripped out a while ago through commit d52eefb47 present since v3.14-rc1 lets just simplify, generalize and move the pv_ops documentation to a shared place. OK, I've applied this. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 2/2] virtio: Introducing virtio_pci.no_mmio, the worst boot option in history.
Sometimes, devices are just too damn fast. Wouldn't it be nice if we could (1) have an option to access them in the most baroque way possible, and (2) ensure that even the normal case caused extra bloat? Well, wish no longer: This patch does exactly that! Since every complaint virtio 1.0 device has to have a capability to allow backdoor access into the BARs, we can use that: 4.1.4.7.1 Device Requirements: PCI configuration access capability The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG capability. Upon detecting driver write access to pci_cfg_data, the device MUST execute a write access at offset cap.offset at BAR selected by cap.bar using the first cap.length bytes from pci_cfg_data. Upon detecting driver read access to pci_cfg_data, the device MUST execute a read access of length cap.length at offset cap.offset at BAR selected by cap.bar and store the first cap.length bytes in pci_cfg_data. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/virtio/virtio_pci_common.c | 31 ++- drivers/virtio/virtio_pci_common.h | 21 + drivers/virtio/virtio_pci_legacy.c | 1 + drivers/virtio/virtio_pci_modern.c | 183 ++--- 4 files changed, 221 insertions(+), 15 deletions(-) diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index 4e6132dd0ca3..0030180411cc 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -29,45 +29,68 @@ MODULE_PARM_DESC(force_legacy, u8 vp_read_isr(const struct virtio_pci_device *dev) { + if (dev-no_mmio) + return dev-no_mmio-read8(dev, dev-isr, 0); return ioread8(dev-isr); } void vp_write_notify(const struct virtqueue *vq, u16 vqindex) { - iowrite16(vqindex, (void __iomem *)vq-priv); + struct virtio_pci_device *vp_dev = to_vp_device(vq-vdev); + + if (vp_dev-no_mmio) + vp_dev-no_mmio-write16(vp_dev, (void __iomem *)vq-priv, +0, vqindex); + else + iowrite16(vqindex, (void __iomem *)vq-priv); } u32 __vp_read_common32(const struct virtio_pci_device *dev, size_t offset) { + if (dev-no_mmio) + return dev-no_mmio-read32(dev, dev-common, offset); return ioread32((void __iomem *)dev-common + offset); } u16 __vp_read_common16(const struct virtio_pci_device *dev, size_t offset) { + if (dev-no_mmio) + return dev-no_mmio-read16(dev, dev-common, offset); return ioread16((void __iomem *)dev-common + offset); } u8 __vp_read_common8(const struct virtio_pci_device *dev, size_t offset) { + if (dev-no_mmio) + return dev-no_mmio-read8(dev, dev-common, offset); return ioread8((void __iomem *)dev-common + offset); } void __vp_write_common32(const struct virtio_pci_device *dev, size_t offset, u32 val) { - iowrite32(val, (void __iomem *)dev-common + offset); + if (dev-no_mmio) + dev-no_mmio-write32(dev, dev-common, offset, val); + else + iowrite32(val, (void __iomem *)dev-common + offset); } void __vp_write_common16(const struct virtio_pci_device *dev, size_t offset, u16 val) { - iowrite16(val, (void __iomem *)dev-common + offset); + if (dev-no_mmio) + dev-no_mmio-write16(dev, dev-common, offset, val); + else + iowrite16(val, (void __iomem *)dev-common + offset); } void __vp_write_common8(const struct virtio_pci_device *dev, size_t offset, u8 val) { - iowrite8(val, (void __iomem *)dev-common + offset); + if (dev-no_mmio) + dev-no_mmio-write8(dev, dev-common, offset, val); + else + iowrite8(val, (void __iomem *)dev-common + offset); } /* wait for pending irq handlers */ diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index 15a20c968ae7..d891e3123cdd 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -53,6 +53,9 @@ struct virtio_pci_device { struct virtio_device vdev; struct pci_dev *pci_dev; + /* This is only valid for modern devices. */ + const struct virtio_pci_no_mmio_ops *no_mmio; + /* In legacy mode, these two point to within -legacy. */ /* Where to read and clear interrupt */ u8 __iomem *isr; @@ -75,6 +78,9 @@ struct virtio_pci_device { /* Multiply queue_notify_off by this value. (non-legacy mode). */ u32 notify_offset_multiplier; + /* PCI config window for BAR access (non-legacy mode). */ + int window; + /* Legacy only field */ /* the IO mapping for the PCI config space */ void __iomem *ioaddr; @@ -113,6 +119,21 @@ struct virtio_pci_device
[PATCH 1/2] virtio_pci: abstract all MMIO accesses.
This is in preparation for testing the virtio pci config mmio backdoor. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/virtio/virtio_pci_common.c | 47 - drivers/virtio/virtio_pci_common.h | 41 +++ drivers/virtio/virtio_pci_modern.c | 140 - 3 files changed, 177 insertions(+), 51 deletions(-) diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index e894eb278d83..4e6132dd0ca3 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -27,6 +27,49 @@ MODULE_PARM_DESC(force_legacy, Force legacy mode for transitional virtio 1 devices); #endif +u8 vp_read_isr(const struct virtio_pci_device *dev) +{ + return ioread8(dev-isr); +} + +void vp_write_notify(const struct virtqueue *vq, u16 vqindex) +{ + iowrite16(vqindex, (void __iomem *)vq-priv); +} + +u32 __vp_read_common32(const struct virtio_pci_device *dev, size_t offset) +{ + return ioread32((void __iomem *)dev-common + offset); +} + +u16 __vp_read_common16(const struct virtio_pci_device *dev, size_t offset) +{ + return ioread16((void __iomem *)dev-common + offset); +} + +u8 __vp_read_common8(const struct virtio_pci_device *dev, size_t offset) +{ + return ioread8((void __iomem *)dev-common + offset); +} + +void __vp_write_common32(const struct virtio_pci_device *dev, +size_t offset, u32 val) +{ + iowrite32(val, (void __iomem *)dev-common + offset); +} + +void __vp_write_common16(const struct virtio_pci_device *dev, +size_t offset, u16 val) +{ + iowrite16(val, (void __iomem *)dev-common + offset); +} + +void __vp_write_common8(const struct virtio_pci_device *dev, + size_t offset, u8 val) +{ + iowrite8(val, (void __iomem *)dev-common + offset); +} + /* wait for pending irq handlers */ void vp_synchronize_vectors(struct virtio_device *vdev) { @@ -45,7 +88,7 @@ bool vp_notify(struct virtqueue *vq) { /* we write the queue's selector into the notification register to * signal the other end */ - iowrite16(vq-index, (void __iomem *)vq-priv); + vp_write_notify(vq, vq-index); return true; } @@ -89,7 +132,7 @@ static irqreturn_t vp_interrupt(int irq, void *opaque) /* reading the ISR has the effect of also clearing it so it's very * important to save off the value. */ - isr = ioread8(vp_dev-isr); + isr = vp_read_isr(vp_dev); /* It's definitely not us if the ISR was not high */ if (!isr) diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index 28ee4e56badf..15a20c968ae7 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -113,6 +113,47 @@ struct virtio_pci_device { u16 (*config_vector)(struct virtio_pci_device *vp_dev, u16 vector); }; +/* Accessor functions. */ +u8 vp_read_isr(const struct virtio_pci_device *dev); +void vp_write_notify(const struct virtqueue *vq, u16 vqindex); +u32 __vp_read_common32(const struct virtio_pci_device *dev, size_t offset); +u16 __vp_read_common16(const struct virtio_pci_device *dev, size_t offset); +u8 __vp_read_common8(const struct virtio_pci_device *dev, size_t offset); +void __vp_write_common32(const struct virtio_pci_device *dev, +size_t offset, u32 val); +void __vp_write_common16(const struct virtio_pci_device *dev, +size_t offset, u16 val); +void __vp_write_common8(const struct virtio_pci_device *dev, + size_t offset, u8 val); + +#define vp_read_common32(dev, fieldname) \ + __vp_read_common32((dev), \ + offsetof(struct virtio_pci_common_cfg, fieldname) + \ + BUILD_BUG_ON_ZERO(sizeof((dev)-common-fieldname) != 4)) +#define vp_read_common16(dev, fieldname) \ + __vp_read_common16((dev), \ + offsetof(struct virtio_pci_common_cfg, fieldname) + \ + BUILD_BUG_ON_ZERO(sizeof((dev)-common-fieldname) != 2)) +#define vp_read_common8(dev, fieldname)\ + __vp_read_common8((dev),\ + offsetof(struct virtio_pci_common_cfg, fieldname) + \ + BUILD_BUG_ON_ZERO(sizeof((dev)-common-fieldname) != 1)) +#define vp_write_common32(dev, fieldname, val) \ + __vp_write_common32((dev), \ + offsetof(struct virtio_pci_common_cfg, fieldname) + \ + BUILD_BUG_ON_ZERO(sizeof((dev)-common-fieldname) != 4), \ + (val)) +#define vp_write_common16(dev, fieldname, val) \ + __vp_write_common16
[RFC 0/2] virtio_pci: patches never to apply.
This should allow testing when QEMU gets VIRTIO_PCI_CAP_PCI_CFG support, but I'm pretty sure we should never allow these patches upstream. Tested with lguest (in virtio-next), which supports VIRTIO_PCI_CAP_PCI_CFG. Rusty Russell (2): virtio_pci: abstract all MMIO accesses. virtio: Introducing virtio_pci.no_mmio, the worst boot option in history. drivers/virtio/virtio_pci_common.c | 70 +++- drivers/virtio/virtio_pci_common.h | 62 drivers/virtio/virtio_pci_legacy.c | 1 + drivers/virtio/virtio_pci_modern.c | 317 ++--- 4 files changed, 391 insertions(+), 59 deletions(-) -- 2.1.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending
Michael S. Tsirkin m...@redhat.com writes: On Tue, Feb 10, 2015 at 11:33:52AM +1030, Rusty Russell wrote: Jason Wang jasow...@redhat.com writes: We currently does: bufs = (avail-idx - last_used_idx) * 3 / 4; This is ok now since we only try to enable the delayed callbacks when the queue is about to be full. This may not work well when there is only one pending buffer in the virtqueue (this may be the case after tx interrupt was enabled). Since virtqueue_enable_cb() will return false which may cause unnecessary triggering of napis. This patch correct this by only calculate the four thirds when bufs is not one. I mildly prefer to avoid the branch, by changing the calculation like so: /* Set bufs = 1, even if there's only one pending buffer */ bufs = (bufs + 1) * 3 / 4; Or bus * 3/4 + 1 But it's not clear to me how much this happens. I'm happy with the patch though, as currently virtqueue_enable_cb_delayed() is the same as virtqueue_enable_cb() if there's only been one buffer added. Cheers, Rusty. But isn't this by design? The documentation says: * This re-enables callbacks but hints to the other side to delay * interrupts until most of the available buffers have been processed; Clearly, this implies that when there's one buffer it must behave exactly the same. Yes, my mistake. We could hit the never gets notified case with this change, and that's a much bigger problem. So I don't think we can accept this patch... Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC v5 net-next 2/6] virtio_ring: try to disable event index callbacks in virtqueue_disable_cb()
Jason Wang jasow...@redhat.com writes: Currently, we do nothing to prevent the callbacks in virtqueue_disable_cb() when event index is used. This may cause spurious interrupts which may damage the performance. This patch tries to publish avail event as the used even to prevent the callbacks. Signed-off-by: Jason Wang jasow...@redhat.com Acked-by: Rusty Russell ru...@rustcorp.com.au --- drivers/virtio/virtio_ring.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 545fed5..e9ffbfb 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -539,6 +539,8 @@ void virtqueue_disable_cb(struct virtqueue *_vq) struct vring_virtqueue *vq = to_vvq(_vq); vq-vring.avail-flags |= cpu_to_virtio16(_vq-vdev, VRING_AVAIL_F_NO_INTERRUPT); + vring_used_event(vq-vring) = cpu_to_virtio16(_vq-vdev, +vq-vring.avail-idx); } EXPORT_SYMBOL_GPL(virtqueue_disable_cb); -- 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC v5 net-next 4/6] virtio-net: add basic interrupt coalescing support
Jason Wang jasow...@redhat.com writes: This patch enables the interrupt coalescing setting through ethtool. The problem is that there's nothing network specific about interrupt coalescing. I can see other devices wanting exactly the same thing, which means we'd deprecate this in the next virtio standard. I think the right answer is to extend like we did with vring_used_event(), eg: 1) Add a new feature VIRTIO_F_RING_COALESCE. 2) Add another a 32-bit field after vring_used_event(), eg: #define vring_used_delay(vr) (*(u32 *)((vr)-avail-ring[(vr)-num + 2])) This loses the ability to coalesce by number of frames, but we can still do number of sg entries, as we do now with used_event, and we could change virtqueue_enable_cb_delayed() to take a precise number if we wanted. My feeling is that this should be a v1.0-only feature though (eg. feature bit 33). Cheers, Rusty. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/net/virtio_net.c| 67 + include/uapi/linux/virtio_net.h | 12 2 files changed, 79 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index cc5f5de..2b958fb 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -145,6 +145,11 @@ struct virtnet_info { /* Budget for polling tx completion */ u32 tx_work_limit; + + __u32 rx_coalesce_usecs; + __u32 rx_max_coalesced_frames; + __u32 tx_coalesce_usecs; + __u32 tx_max_coalesced_frames; }; struct padded_vnet_hdr { @@ -1404,12 +1409,73 @@ static void virtnet_get_channels(struct net_device *dev, channels-other_count = 0; } +static int virtnet_set_coalesce(struct net_device *dev, + struct ethtool_coalesce *ec) +{ + struct virtnet_info *vi = netdev_priv(dev); + struct scatterlist sg; + struct virtio_net_ctrl_coalesce c; + + if (!vi-has_cvq || + !virtio_has_feature(vi-vdev, VIRTIO_NET_F_CTRL_COALESCE)) + return -EOPNOTSUPP; + if (vi-rx_coalesce_usecs != ec-rx_coalesce_usecs || + vi-rx_max_coalesced_frames != ec-rx_max_coalesced_frames) { + c.coalesce_usecs = ec-rx_coalesce_usecs; + c.max_coalesced_frames = ec-rx_max_coalesced_frames; + sg_init_one(sg, c, sizeof(c)); + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE, + VIRTIO_NET_CTRL_COALESCE_RX_SET, + sg)) { + dev_warn(dev-dev, Fail to set rx coalescing\n); + return -EINVAL; + } + vi-rx_coalesce_usecs = ec-rx_coalesce_usecs; + vi-rx_max_coalesced_frames = ec-rx_max_coalesced_frames; + } + + if (vi-tx_coalesce_usecs != ec-tx_coalesce_usecs || + vi-tx_max_coalesced_frames != ec-tx_max_coalesced_frames) { + c.coalesce_usecs = ec-tx_coalesce_usecs; + c.max_coalesced_frames = ec-tx_max_coalesced_frames; + sg_init_one(sg, c, sizeof(c)); + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_COALESCE, + VIRTIO_NET_CTRL_COALESCE_TX_SET, + sg)) { + dev_warn(dev-dev, Fail to set tx coalescing\n); + return -EINVAL; + } + vi-tx_coalesce_usecs = ec-tx_coalesce_usecs; + vi-tx_max_coalesced_frames = ec-tx_max_coalesced_frames; + } + + vi-tx_work_limit = ec-tx_max_coalesced_frames_irq; + + return 0; +} + +static int virtnet_get_coalesce(struct net_device *dev, + struct ethtool_coalesce *ec) +{ + struct virtnet_info *vi = netdev_priv(dev); + + ec-rx_coalesce_usecs = vi-rx_coalesce_usecs; + ec-rx_max_coalesced_frames = vi-rx_max_coalesced_frames; + ec-tx_coalesce_usecs = vi-tx_coalesce_usecs; + ec-tx_max_coalesced_frames = vi-tx_max_coalesced_frames; + ec-tx_max_coalesced_frames_irq = vi-tx_work_limit; + + return 0; +} + static const struct ethtool_ops virtnet_ethtool_ops = { .get_drvinfo = virtnet_get_drvinfo, .get_link = ethtool_op_get_link, .get_ringparam = virtnet_get_ringparam, .set_channels = virtnet_set_channels, .get_channels = virtnet_get_channels, + .set_coalesce = virtnet_set_coalesce, + .get_coalesce = virtnet_get_coalesce, }; #define MIN_MTU 68 @@ -2048,6 +2114,7 @@ static unsigned int features[] = { VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, VIRTIO_NET_F_CTRL_MAC_ADDR, VIRTIO_F_ANY_LAYOUT, + VIRTIO_NET_F_CTRL_COALESCE, }; static struct virtio_driver virtio_net_driver = { diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux
Re: [PATCH RFC v5 net-next 1/6] virtio_ring: fix virtqueue_enable_cb() when only 1 buffers were pending
Jason Wang jasow...@redhat.com writes: We currently does: bufs = (avail-idx - last_used_idx) * 3 / 4; This is ok now since we only try to enable the delayed callbacks when the queue is about to be full. This may not work well when there is only one pending buffer in the virtqueue (this may be the case after tx interrupt was enabled). Since virtqueue_enable_cb() will return false which may cause unnecessary triggering of napis. This patch correct this by only calculate the four thirds when bufs is not one. I mildly prefer to avoid the branch, by changing the calculation like so: /* Set bufs = 1, even if there's only one pending buffer */ bufs = (bufs + 1) * 3 / 4; But it's not clear to me how much this happens. I'm happy with the patch though, as currently virtqueue_enable_cb_delayed() is the same as virtqueue_enable_cb() if there's only been one buffer added. Cheers, Rusty. Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/virtio/virtio_ring.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 00ec6b3..545fed5 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -636,7 +636,10 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq) * entry. Always do both to keep code simple. */ vq-vring.avail-flags = cpu_to_virtio16(_vq-vdev, ~VRING_AVAIL_F_NO_INTERRUPT); /* TODO: tune this threshold */ - bufs = (u16)(virtio16_to_cpu(_vq-vdev, vq-vring.avail-idx) - vq-last_used_idx) * 3 / 4; + bufs = (u16)(virtio16_to_cpu(_vq-vdev, vq-vring.avail-idx) - + vq-last_used_idx); + if (bufs != 1) + bufs = bufs * 3 / 4; vring_used_event(vq-vring) = cpu_to_virtio16(_vq-vdev, vq-last_used_idx + bufs); virtio_mb(vq-weak_barriers); if (unlikely((u16)(virtio16_to_cpu(_vq-vdev, vq-vring.used-idx) - vq-last_used_idx) bufs)) { -- 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] virtio_pci: use 16-bit accessor for queue_enable.
Since PCI is little endian, 8-bit access might work, but the spec section is very clear on this: 4.1.3.1 Driver Requirements: PCI Device Layout The driver MUST access each field using the “natural” access method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit fields and 8-bit accesses for 8-bit fields. Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index f16e462cb4ef..2aa38e59db2e 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -294,7 +294,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, /* Check if queue is either not available or already active. */ num = ioread16(cfg-queue_size); - if (!num || ioread8(cfg-queue_enable)) + if (!num || ioread16(cfg-queue_enable)) return ERR_PTR(-ENOENT); if (num (num - 1)) { @@ -394,7 +394,7 @@ static int vp_modern_find_vqs(struct virtio_device *vdev, unsigned nvqs, */ list_for_each_entry(vq, vdev-vqs, list) { iowrite16(vq-index, vp_dev-common-queue_select); - iowrite8(1, vp_dev-common-queue_enable); + iowrite16(1, vp_dev-common-queue_enable); } return 0; ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio: Avoid possible kernel panic if DEBUG is enabled.
Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp writes: From 11fd997d724f520ca628615e7ffbfd7901c40b62 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp Date: Fri, 6 Feb 2015 13:28:38 +0900 Subject: [PATCH] virtio: Avoid possible kernel panic if DEBUG is enabled. The virtqueue_add() calls START_USE() upon entry. The virtqueue_kick() is called if vq-num_added == (1 16) - 1 before calling END_USE(). The virtqueue_kick_prepare() called via virtqueue_kick() calls START_USE() upon entry, and will call panic() if DEBUG is enabled. Move this virtqueue_kick() call to after END_USE() call. Thanks, applied. Cheers, Rusty. Signed-off-by: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp --- drivers/virtio/virtio_ring.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 00ec6b3..596735b 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -245,14 +245,14 @@ static inline int virtqueue_add(struct virtqueue *_vq, vq-vring.avail-idx = cpu_to_virtio16(_vq-vdev, virtio16_to_cpu(_vq-vdev, vq-vring.avail-idx) + 1); vq-num_added++; + pr_debug(Added buffer head %i to %p\n, head, vq); + END_USE(vq); + /* This is very unlikely, but theoretically possible. Kick * just in case. */ if (unlikely(vq-num_added == (1 16) - 1)) virtqueue_kick(_vq); - pr_debug(Added buffer head %i to %p\n, head, vq); - END_USE(vq); - return 0; } -- 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio-mmio: Update the device to OASIS spec version
Pawel Moll pawel.m...@arm.com writes: This patch add a support for second version of the virtio-mmio device, which follows OASIS Virtual I/O Device (VIRTIO) Version 1.0 specification. Nice job, that turned out quite neat! Applied, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] virtio-mmio: Update the device to OASIS spec version
Michael S. Tsirkin m...@redhat.com writes: On Tue, Jan 20, 2015 at 06:12:11PM +, Pawel Moll wrote: This patch add a support for second version of the virtio-mmio device, which follows OASIS Virtual I/O Device (VIRTIO) Version 1.0 specification. OK, applied this instead :) I'll leave it in my pending queue for a few days in case there's a v3. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: virtio for 3.20
Michael S. Tsirkin m...@redhat.com writes: On Mon, Jan 19, 2015 at 12:56:28PM +1030, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: Hi Rusty, all I parked outstanding virtio patches here: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-next this way they are merged in linux-next as well. Rusty if you'll be doing 3.20 window, would you mind checking that tree to verify nothing's lost? Hi Michael, Thanks, now I've returned from linux.conf.au, I'm starting to organize the 3.20 queue. I've merged this branch into my virtio-next branch. Cheers, Rusty. Thanks, good to know. I tweaked some patches since due to test failure reports by Gerd. Updated patches are tagged post-squash, if you want to see what changed, see fixup patches in the series tagged pre-squash. Or you can look at my tree: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git virtio-next is pre-squash. git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-next is post-squash. OK, I merged your vhost-next with my virtio-next, and the merge was empty. So we're up-to-date. Please send patches against my virtio-next from now on. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 10/16] s390: add pci_iomap_range
Sebastian Ott seb...@linux.vnet.ibm.com writes: On Wed, 14 Jan 2015, Michael S. Tsirkin wrote: } -EXPORT_SYMBOL_GPL(pci_iomap); +EXPORT_SYMBOL_GPL(pci_iomap_range); + +void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen) +{ +return pci_iomap_range(dev, bar, 0, maxlen); +} +EXPORT_SYMBOL(pci_iomap); As discussed earlier, could you please leave that as EXPORT_SYMBOL_GPL. If there's a reason to have these interfaces as EXPORT_SYMBOL, we could change all of them in an extra patch. With this change integrated you can add Acked-by: Sebastian Ott seb...@linux.vnet.ibm.com OK, I've gone back and rebased virtio-next to fix this (I don't really want to think too hard about the legal consequences of having a version which isn't EXPORT_SYMBOL_GPL). Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/6] virtio: graceful failure with get == NULL
Michael S. Tsirkin m...@redhat.com writes: On Tue, Jan 13, 2015 at 04:43:07PM +0200, Michael S. Tsirkin wrote: virtio 1.0 says device configuration is optional, but most drivers treat it as mandatory. Even if presented by device, guest bios might disable the BAR holding that configuration, so we can't assume it's there, but we also don't want to fail if not in case drivers can cope with it's absence - such as caif or rng. Add code to drivers to check presence of get callback and fail gracefully. Rusty, in case it's not clear: I'd like to hear your opinion on these patches, since virtio pci modern driver I'm now preparing for submission, depends on this. Just to note for anyone following, that these are now in my virtio-next tree. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: virtio for 3.20
Michael S. Tsirkin m...@redhat.com writes: Hi Rusty, all I parked outstanding virtio patches here: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-next this way they are merged in linux-next as well. Rusty if you'll be doing 3.20 window, would you mind checking that tree to verify nothing's lost? Hi Michael, Thanks, now I've returned from linux.conf.au, I'm starting to organize the 3.20 queue. I've merged this branch into my virtio-next branch. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_pci: add module param to force legacy mode
Michael S. Tsirkin m...@redhat.com writes: If set, try legacy interface first, modern one if that fails. Useful to work around device/driver bugs, and for compatibility testing. Signed-off-by: Michael S. Tsirkin m...@redhat.com Thanks, I've applied all these to virtio-next. Cheers! Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 0/3] Sharing MSIX irq for tx/rx queue pairs
Jason Wang jasow...@redhat.com writes: Hi all: This series try to share MSIX irq for each tx/rx queue pair. This is done through: - introducing virtio pci channel which are group of virtqueues that sharing a single MSIX irq (Patch 1) - expose channel setting to virtio core api (Patch 2) - try to use channel setting in virtio-net (Patch 3) Hi Jason, Is channel a term you created yourself, or something I was just unaware of? irq_group would seem more obvious, if the former. For the transport that does not support channel, channel paramters were simply ignored. For devices that does not use channel, it can simply pass NULL or zero to virito core. With the patch, 1 MSIX irq were saved for each TX/RX queue pair. It seems fairly straightforward. Acked-by: Rusty Russell ru...@rustcorp.com.au Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 6/6] virtio: core support for config generation
Michael S. Tsirkin m...@redhat.com writes: virtio 1.0 spec says: Drivers MUST NOT assume reads from fields greater than 32 bits wide are atomic, nor are reads from multiple fields: drivers SHOULD read device configuration space fields like so: u32 before, after; do { before = get_config_generation(device); // read config entry/entries. after = get_config_generation(device); } while (after != before); Do exactly this, for transports that support it. static inline void virtio_cwrite8(struct virtio_device *vdev, @@ -352,6 +375,7 @@ static inline u64 virtio_cread64(struct virtio_device *vdev, { u64 ret; vdev-config-get(vdev, offset, ret, sizeof(ret)); + __virtio_cread_many(vdev, offset, ret, 1, sizeof(ret)); return virtio64_to_cpu(vdev, (__force __virtio64)ret); } The vdev-config-get(vdev, offset, ret, sizeof(ret)); should be deleted. Harmless if not, though. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/3] fix up vringh/mic sparse errors
Michael S. Tsirkin m...@redhat.com writes: This fixes remaining sparse warnings in vringh and mic by using virtio 1.0 compliant wrappers. This also needs by get_user patches to avoid getting warnings from these calls. Tested by running vringh_test. Rusty, I prefer fixing all these warnings for 3.19, any objections? OK, I've queued these too. Cheers, Rusty. Michael S. Tsirkin (3): vringh: 64 bit features vringh: initial virtio 1.0 support mic/host: initial virtio 1.0 support include/linux/vringh.h | 37 ++- drivers/misc/mic/host/mic_debugfs.c | 18 -- drivers/vhost/vringh.c | 125 ++-- 3 files changed, 123 insertions(+), 57 deletions(-) -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/6] tools/virtio: more stubs
Michael S. Tsirkin m...@redhat.com writes: As usual, add more stubs to fix test build after main codebase changes. This doesn't apply, since: diff --git a/tools/virtio/linux/virtio_config.h b/tools/virtio/linux/virtio_config.h index dafe1c9..806d683 100644 --- a/tools/virtio/linux/virtio_config.h +++ b/tools/virtio/linux/virtio_config.h @@ -1,8 +1,7 @@ #include linux/virtio_byteorder.h #include linux/virtio.h +#include uapi/linux/virtio_config.h -#define VIRTIO_TRANSPORT_F_START 28 -#define VIRTIO_TRANSPORT_F_END 32 /* Version in master has no headers in this file. What's this based on? Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/6] virtio 1.0 fixups, tweaks
Michael S. Tsirkin m...@redhat.com writes: Fixes a couple of minor compliance issues in new virtio 1.0 code. Plus, adds a couple of minor cleanups - not bugfixes, but seem safe enough for 3.19. OK, I've applied these. Thanks, Rusty. Michael S. Tsirkin (6): virtio: set VIRTIO_CONFIG_S_FEATURES_OK on restore virtio_config: fix virtio_cread_bytes virtio_pci_common.h: drop VIRTIO_PCI_NO_LEGACY virtio_pci: move probe to common file virtio_pci: add VIRTIO_PCI_NO_LEGACY virtio: core support for config generation drivers/virtio/virtio_pci_common.h | 7 +++ include/linux/virtio_config.h | 29 - include/uapi/linux/virtio_pci.h| 15 ++- drivers/virtio/virtio.c| 37 +++-- drivers/virtio/virtio_pci_common.c | 34 +- drivers/virtio/virtio_pci_legacy.c | 24 ++-- 6 files changed, 99 insertions(+), 47 deletions(-) -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: BUG_ON in virtio-ring.c
Alexey Lapitsky lex.pub...@gmail.com writes: Hi, Sorry for the long delay. It prints exactly the same: [3.792033] virtqueue elements = 128, max_segments = 126 (1 queues) [3.802191] vda: vda1 vda2 vda5 A little bit more about my setup (if it helps): OK, I think this is fixed by Ming Lei's recent count updates, which interestingly did not seem to be CC:stable. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 9/9] virtio_pci: update file descriptions and copyright
Michael S. Tsirkin m...@redhat.com writes: There's been a lot of changes since 2007. List main authors, add Red Hat copyright. Acked-by: Rusty Russell ru...@rustcorp.com.au Cheers, Rusty. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/virtio/virtio_pci.h| 5 - drivers/virtio/virtio_pci.c| 5 - drivers/virtio/virtio_pci_legacy.c | 5 - 3 files changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/virtio/virtio_pci.h b/drivers/virtio/virtio_pci.h index a3b1259..fba383c 100644 --- a/drivers/virtio/virtio_pci.h +++ b/drivers/virtio/virtio_pci.h @@ -1,15 +1,18 @@ #ifndef _DRIVERS_VIRTIO_VIRTIO_PCI_H #define _DRIVERS_VIRTIO_VIRTIO_PCI_H /* - * Virtio PCI driver + * Virtio PCI driver - APIs for common functionality for all device versions * * This module allows virtio devices to be used over a virtual PCI device. * This can be used with QEMU based VMMs like KVM or Xen. * * Copyright IBM Corp. 2007 + * Copyright Red Hat, Inc. 2014 * * Authors: * Anthony Liguori aligu...@us.ibm.com + * Rusty Russell ru...@rustcorp.com.au + * Michael S. Tsirkin m...@redhat.com * * This work is licensed under the terms of the GNU GPL, version 2 or later. * See the COPYING file in the top-level directory. diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c index 6c7e186..c263f67 100644 --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -1,13 +1,16 @@ /* - * Virtio PCI driver + * Virtio PCI driver - common functionality for all device versions * * This module allows virtio devices to be used over a virtual PCI device. * This can be used with QEMU based VMMs like KVM or Xen. * * Copyright IBM Corp. 2007 + * Copyright Red Hat, Inc. 2014 * * Authors: * Anthony Liguori aligu...@us.ibm.com + * Rusty Russell ru...@rustcorp.com.au + * Michael S. Tsirkin m...@redhat.com * * This work is licensed under the terms of the GNU GPL, version 2 or later. * See the COPYING file in the top-level directory. diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c index 9fdec9a..1ca4422 100644 --- a/drivers/virtio/virtio_pci_legacy.c +++ b/drivers/virtio/virtio_pci_legacy.c @@ -1,13 +1,16 @@ /* - * Virtio PCI driver + * Virtio PCI driver - legacy device support * * This module allows virtio devices to be used over a virtual PCI device. * This can be used with QEMU based VMMs like KVM or Xen. * * Copyright IBM Corp. 2007 + * Copyright Red Hat, Inc. 2014 * * Authors: * Anthony Liguori aligu...@us.ibm.com + * Rusty Russell ru...@rustcorp.com.au + * Michael S. Tsirkin m...@redhat.com * * This work is licensed under the terms of the GNU GPL, version 2 or later. * See the COPYING file in the top-level directory. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] virtio_ccw: minor enhancements
Michael S. Tsirkin m...@redhat.com writes: Hi Rusty, I guess you are busy. I'd like to send all these patch series combined to Linus. To see what I have queued, you can check the vhost-next branch in my tree. Will wait until tomorrow to give you a chance to respond. Hi Michael, Indeed, this is the last week of my sabbatical. I am back full time next week. See my Ack on your patches. Thanks. Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/2] virito: introduce methods of fixing device features
Jason Wang jasow...@redhat.com writes: Buggy host may advertised buggy host features (a usual case is that host advertise a feature whose dependencies were missed). In this case, driver should detect and disable the buggy features by itself. Sorry, I've been focussing elsewhere. I would really prefer that drivers offer a feature_depends table, which can indicate that feature A depends on feature B, and have the core iterate, complain and fixup as necessary. Is that expressive enough, or do we need more? Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 3/6] hw_random: use reference counts on each struct hwrng.
Amos Kong ak...@redhat.com writes: From: Rusty Russell ru...@rustcorp.com.au current_rng holds one reference, and we bump it every time we want to do a read from it. This means we only hold the rng_mutex to grab or drop a reference, so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't block on read of /dev/hwrng. Using a kref is overkill (we're always under the rng_mutex), but a standard pattern. This also solves the problem that the hwrng_fillfn thread was accessing current_rng without a lock, which could change (eg. to NULL) underneath it. v4: decrease last reference for triggering the cleanup This doesn't make any sense: +static void drop_current_rng(void) +{ + struct hwrng *rng = current_rng; + + BUG_ON(!mutex_is_locked(rng_mutex)); + if (!current_rng) + return; + + /* release current_rng reference */ + kref_put(current_rng-ref, cleanup_rng); + current_rng = NULL; + + /* decrease last reference for triggering the cleanup */ + kref_put(rng-ref, cleanup_rng); +} Why would it drop the refcount twice? This doesn't make sense. Hmm, because you added kref_init, which initializes the reference count to 1, you created this bug. Leave out the kref_init, and let it naturally be 0 (until, and if, it becomes current_rng). Add a comment if you want. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 4/6] hw_random: fix unregister race.
Amos Kong ak...@redhat.com writes: From: Rusty Russell ru...@rustcorp.com.au The previous patch added one potential problem: we can still be reading from a hwrng when it's unregistered. Add a wait for zero in the hwrng_unregister path. v4: add cleanup_done flag to insure that cleanup is done That's a bit weird. The usual pattern would be to hold a reference until we're actually finished, but this reference is a bit weird. We hold the mutex across cleanup, so we could grab that but we have to take care sleeping inside wait_event, otherwise Peter will have to fix my code again :) AFAICT the wake_woken() stuff isn't merged yet, so your patch will have to do for now. @@ -98,6 +99,8 @@ static inline void cleanup_rng(struct kref *kref) if (rng-cleanup) rng-cleanup(rng); + rng-cleanup_done = true; + wake_up_all(rng_done); } static void set_current_rng(struct hwrng *rng) @@ -536,6 +539,11 @@ void hwrng_unregister(struct hwrng *rng) kthread_stop(hwrng_fill); } else mutex_unlock(rng_mutex); + + /* Just in case rng is reading right now, wait. */ + wait_event(rng_done, rng-cleanup_done +atomic_read(rng-ref.refcount) == 0); + The atomic_read() isn't necessary here. However, you should probably init cleanup_done in hwrng_register(). (Probably noone does unregister then register, but let's be clear). Thanks, Rusty. } EXPORT_SYMBOL_GPL(hwrng_unregister); diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h index c212e71..7832e50 100644 --- a/include/linux/hw_random.h +++ b/include/linux/hw_random.h @@ -46,6 +46,7 @@ struct hwrng { /* internal. */ struct list_head list; struct kref ref; + bool cleanup_done; }; /** Register a new Hardware Random Number Generator driver. */ -- 1.9.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: kernel BUG at drivers/block/virtio_blk.c:172!
Jeff Layton jlay...@poochiereds.net writes: In the latest Fedora rawhide kernel in the repos, I'm seeing the following oops when mounting xfs. rc2-ish kernels seem to be fine: [ 64.669633] [ cut here ] [ 64.670008] kernel BUG at drivers/block/virtio_blk.c:172! Hmm, that's: BUG_ON(req-nr_phys_segments + 2 vblk-sg_elems); But during our probe routine we said: /* We can handle whatever the host told us to handle. */ blk_queue_max_segments(q, vblk-sg_elems-2); Jens? Thanks, Rusty. [ 64.670008] invalid opcode: [#1] SMP [ 64.670008] Modules linked in: xfs libcrc32c snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm ppdev snd_timer snd virtio_net virtio_balloon soundcore serio_raw parport_pc virtio_console pvpanic parport i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc qxl virtio_blk drm_kms_helper ttm drm ata_generic virtio_pci virtio_ring virtio pata_acpi [ 64.670008] CPU: 1 PID: 705 Comm: mount Not tainted 3.18.0-0.rc3.git2.1.fc22.x86_64 #1 [ 64.670008] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 64.670008] task: 8800d94a4ec0 ti: 8800d9f38000 task.ti: 8800d9f38000 [ 64.670008] RIP: 0010:[a00287c0] [a00287c0] virtio_queue_rq+0x290/0x2a0 [virtio_blk] [ 64.670008] RSP: 0018:8800d9f3b778 EFLAGS: 00010202 [ 64.670008] RAX: 0082 RBX: 8800d8375700 RCX: dead00200200 [ 64.670008] RDX: 0001 RSI: 8800d8375700 RDI: 8800d82c4c00 [ 64.670008] RBP: 8800d9f3b7b8 R08: 8800d8375700 R09: 0001 [ 64.670008] R10: 0001 R11: 0004 R12: 8800d9f3b7e0 [ 64.670008] R13: 8800d82c4c00 R14: 880118629200 R15: [ 64.670008] FS: 7f5c64dfd840() GS:88011b00() knlGS: [ 64.670008] CS: 0010 DS: ES: CR0: 8005003b [ 64.670008] CR2: 7fffe6458fb8 CR3: d06d3000 CR4: 06e0 [ 64.670008] Stack: [ 64.670008] 8801 8800d8375870 0001 8800d82c4c00 [ 64.670008] 8800d9f3b7e0 8800d8375700 8800d82c4c48 [ 64.670008] 8800d9f3b828 813ec258 8800d82c8000 0001 [ 64.670008] Call Trace: [ 64.670008] [813ec258] __blk_mq_run_hw_queue+0x1c8/0x330 [ 64.670008] [813ecd80] blk_mq_run_hw_queue+0x70/0x90 [ 64.670008] [813ee0cd] blk_sq_make_request+0x24d/0x5c0 [ 64.670008] [813dec68] generic_make_request+0xf8/0x150 [ 64.670008] [813ded38] submit_bio+0x78/0x190 [ 64.670008] [a02fc27e] _xfs_buf_ioapply+0x2be/0x5f0 [xfs] [ 64.670008] [a0333628] ? xlog_bread_noalign+0xa8/0xe0 [xfs] [ 64.670008] [a02ffe21] xfs_buf_submit_wait+0x91/0x840 [xfs] [ 64.670008] [a0333628] xlog_bread_noalign+0xa8/0xe0 [xfs] [ 64.670008] [a0333ea7] xlog_bread+0x27/0x60 [xfs] [ 64.670008] [a03357f3] xlog_find_verify_cycle+0xf3/0x1b0 [xfs] [ 64.670008] [a0335de5] xlog_find_head+0x2f5/0x3e0 [xfs] [ 64.670008] [a0335f0c] xlog_find_tail+0x3c/0x410 [xfs] [ 64.670008] [a033b12d] xlog_recover+0x2d/0x120 [xfs] [ 64.670008] [a033cfdb] ? xfs_trans_ail_init+0xcb/0x100 [xfs] [ 64.670008] [a0329c3d] xfs_log_mount+0xdd/0x2c0 [xfs] [ 64.670008] [a031f744] xfs_mountfs+0x514/0x9c0 [xfs] [ 64.670008] [a0320c8d] ? xfs_mru_cache_create+0x18d/0x1f0 [xfs] [ 64.670008] [a0322ed0] xfs_fs_fill_super+0x330/0x3b0 [xfs] [ 64.670008] [8126d4ac] mount_bdev+0x1bc/0x1f0 [ 64.670008] [a0322ba0] ? xfs_parseargs+0xbe0/0xbe0 [xfs] [ 64.670008] [a0320fd5] xfs_fs_mount+0x15/0x20 [xfs] [ 64.670008] [8126de58] mount_fs+0x38/0x1c0 [ 64.670008] [81202c15] ? __alloc_percpu+0x15/0x20 [ 64.670008] [812908f8] vfs_kern_mount+0x68/0x160 [ 64.670008] [81293d6c] do_mount+0x22c/0xc20 [ 64.670008] [8120d92e] ? might_fault+0x5e/0xc0 [ 64.670008] [811fcf1b] ? memdup_user+0x4b/0x90 [ 64.670008] [81294a8e] SyS_mount+0x9e/0x100 [ 64.670008] [8185e169] system_call_fastpath+0x12/0x17 [ 64.670008] Code: 00 00 c7 86 78 01 00 00 02 00 00 00 48 c7 86 80 01 00 00 00 00 00 00 89 86 7c 01 00 00 e9 02 fe ff ff 66 0f 1f 84 00 00 00 00 00 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 [ 64.670008] RIP [a00287c0] virtio_queue_rq+0x290/0x2a0 [virtio_blk] [ 64.670008] RSP 8800d9f3b778 [ 64.715347] ---[ end trace c0ff4a0f2fb21f7f ]--- It's reliably reproducible and I don't see this oops when I convert the same block device to ext4 and mount it. In this setup, the KVM guest has a virtio block device that has a LVM2 PV on it with an
Re: [PATCH v2 4/6] hw_random: fix unregister race.
Herbert Xu herb...@gondor.apana.org.au writes: On Thu, Sep 18, 2014 at 08:37:45PM +0800, Amos Kong wrote: From: Rusty Russell ru...@rustcorp.com.au The previous patch added one potential problem: we can still be reading from a hwrng when it's unregistered. Add a wait for zero in the hwrng_unregister path. Signed-off-by: Rusty Russell ru...@rustcorp.com.au You totally corrupted Rusty's patch. If you're going to repost his series you better make sure that you've got the right patches. Just as well though as it made me think a little more about this patch :) OK Amos, can you please repost the complete series? Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 1/4] virtio_net: pass vi around
Michael S. Tsirkin m...@redhat.com writes: Too many places poke at [rs]q-vq-vdev-priv just to get the the vi structure. Let's just pass the pointer around: seems cleaner, and might even be faster. Agreed, it's neater. Acked-by: Rusty Russell ru...@rustcorp.com.au Thanks, Rusty. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/net/virtio_net.c | 36 +++- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 57cbc7d..36f3dfc 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -241,11 +241,11 @@ static unsigned long mergeable_buf_to_ctx(void *buf, unsigned int truesize) } /* Called from bottom half context */ -static struct sk_buff *page_to_skb(struct receive_queue *rq, +static struct sk_buff *page_to_skb(struct virtnet_info *vi, +struct receive_queue *rq, struct page *page, unsigned int offset, unsigned int len, unsigned int truesize) { - struct virtnet_info *vi = rq-vq-vdev-priv; struct sk_buff *skb; struct skb_vnet_hdr *hdr; unsigned int copy, hdr_len, hdr_padded_len; @@ -328,12 +328,13 @@ static struct sk_buff *receive_small(void *buf, unsigned int len) } static struct sk_buff *receive_big(struct net_device *dev, +struct virtnet_info *vi, struct receive_queue *rq, void *buf, unsigned int len) { struct page *page = buf; - struct sk_buff *skb = page_to_skb(rq, page, 0, len, PAGE_SIZE); + struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE); if (unlikely(!skb)) goto err; @@ -359,7 +360,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, int offset = buf - page_address(page); unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx)); - struct sk_buff *head_skb = page_to_skb(rq, page, offset, len, truesize); + struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len, +truesize); struct sk_buff *curr_skb = head_skb; if (unlikely(!curr_skb)) @@ -433,9 +435,9 @@ err_buf: return NULL; } -static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len) +static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq, + void *buf, unsigned int len) { - struct virtnet_info *vi = rq-vq-vdev-priv; struct net_device *dev = vi-dev; struct virtnet_stats *stats = this_cpu_ptr(vi-stats); struct sk_buff *skb; @@ -459,9 +461,9 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len) if (vi-mergeable_rx_bufs) skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len); else if (vi-big_packets) - skb = receive_big(dev, rq, buf, len); + skb = receive_big(dev, vi, rq, buf, len); else - skb = receive_small(buf, len); + skb = receive_small(vi, buf, len); if (unlikely(!skb)) return; @@ -530,9 +532,9 @@ frame_err: dev_kfree_skb(skb); } -static int add_recvbuf_small(struct receive_queue *rq, gfp_t gfp) +static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, + gfp_t gfp) { - struct virtnet_info *vi = rq-vq-vdev-priv; struct sk_buff *skb; struct skb_vnet_hdr *hdr; int err; @@ -655,9 +657,9 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp) * before we're receiving packets, or from refill_work which is * careful to disable receiving (using napi_disable). */ -static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp) +static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq, + gfp_t gfp) { - struct virtnet_info *vi = rq-vq-vdev-priv; int err; bool oom; @@ -668,7 +670,7 @@ static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp) else if (vi-big_packets) err = add_recvbuf_big(rq, gfp); else - err = add_recvbuf_small(rq, gfp); + err = add_recvbuf_small(vi, rq, gfp); oom = err == -ENOMEM; if (err) @@ -717,7 +719,7 @@ static void refill_work(struct work_struct *work) struct receive_queue *rq = vi-rq[i]; napi_disable(rq-napi); - still_empty = !try_fill_recv(rq, GFP_KERNEL); + still_empty = !try_fill_recv(vi, rq, GFP_KERNEL); virtnet_napi_enable(rq); /* In theory, this can happen
Re: [PATCH] virtio_blk: fix race at module removal
Ming Lei tom.leim...@gmail.com writes: On Fri, Oct 24, 2014 at 12:12 AM, Michael S. Tsirkin m...@redhat.com wrote: If a device appears while module is being removed, driver will get a callback after we've given up on the major number. In theory this means this major number can get reused by something else, resulting in a conflict. Yes, there is a tiny race window. Applied. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 2/4] virtio_net: get rid of virtio_net_hdr/skb_vnet_hdr
Michael S. Tsirkin m...@redhat.com writes: virtio 1.0 doesn't use virtio_net_hdr anymore, and in fact, it's not really useful since virtio_net_hdr_mrg_rxbuf includes that as the first field anyway. Let's drop it, precalculate header len and store within vi instead. This way we can also remove struct skb_vnet_hdr. Yes, this is definitely a win. Acked-by: Rusty Russell ru...@rustcorp.com.au Thanks, Rusty. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/net/virtio_net.c | 88 ++-- 1 file changed, 40 insertions(+), 48 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 36f3dfc..a795a23 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -123,6 +123,9 @@ struct virtnet_info { /* Host can handle any s/g split between our header and packet data */ bool any_header_sg; + /* Packet virtio header size */ + u8 hdr_len; + /* Active statistics */ struct virtnet_stats __percpu *stats; @@ -139,21 +142,14 @@ struct virtnet_info { struct notifier_block nb; }; -struct skb_vnet_hdr { - union { - struct virtio_net_hdr hdr; - struct virtio_net_hdr_mrg_rxbuf mhdr; - }; -}; - struct padded_vnet_hdr { - struct virtio_net_hdr hdr; + struct virtio_net_hdr_mrg_rxbuf hdr; /* - * virtio_net_hdr should be in a separated sg buffer because of a - * QEMU bug, and data sg buffer shares same page with this header sg. - * This padding makes next sg 16 byte aligned after virtio_net_hdr. + * hdr is in a separate sg buffer, and data sg buffer shares same page + * with this header sg. This padding makes next sg 16 byte aligned + * after the header. */ - char padding[6]; + char padding[4]; }; /* Converting between virtqueue no. and kernel tx/rx queue no. @@ -179,9 +175,9 @@ static int rxq2vq(int rxq) return rxq * 2; } -static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb) +static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb) { - return (struct skb_vnet_hdr *)skb-cb; + return (struct virtio_net_hdr_mrg_rxbuf *)skb-cb; } /* @@ -247,7 +243,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, unsigned int len, unsigned int truesize) { struct sk_buff *skb; - struct skb_vnet_hdr *hdr; + struct virtio_net_hdr_mrg_rxbuf *hdr; unsigned int copy, hdr_len, hdr_padded_len; char *p; @@ -260,13 +256,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, hdr = skb_vnet_hdr(skb); - if (vi-mergeable_rx_bufs) { - hdr_len = sizeof hdr-mhdr; - hdr_padded_len = sizeof hdr-mhdr; - } else { - hdr_len = sizeof hdr-hdr; + hdr_len = vi-hdr_len; + if (vi-mergeable_rx_bufs) + hdr_padded_len = sizeof *hdr; + else hdr_padded_len = sizeof(struct padded_vnet_hdr); - } memcpy(hdr, p, hdr_len); @@ -317,11 +311,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, return skb; } -static struct sk_buff *receive_small(void *buf, unsigned int len) +static struct sk_buff *receive_small(struct virtnet_info *vi, void *buf, unsigned int len) { struct sk_buff * skb = buf; - len -= sizeof(struct virtio_net_hdr); + len -= vi-hdr_len; skb_trim(skb, len); return skb; @@ -354,8 +348,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, unsigned int len) { void *buf = mergeable_ctx_to_buf_address(ctx); - struct skb_vnet_hdr *hdr = buf; - u16 num_buf = virtio16_to_cpu(rq-vq-vdev, hdr-mhdr.num_buffers); + struct virtio_net_hdr_mrg_rxbuf *hdr = buf; + u16 num_buf = virtio16_to_cpu(vi-vdev, hdr-num_buffers); struct page *page = virt_to_head_page(buf); int offset = buf - page_address(page); unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx)); @@ -373,8 +367,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, if (unlikely(!ctx)) { pr_debug(%s: rx error: %d buffers out of %d missing\n, dev-name, num_buf, - virtio16_to_cpu(rq-vq-vdev, - hdr-mhdr.num_buffers)); + virtio16_to_cpu(vi-vdev, + hdr-num_buffers)); dev-stats.rx_length_errors++; goto err_buf; } @@ -441,7 +435,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq, struct net_device *dev = vi-dev; struct
Re: [PATCH RFC 4/4] virtio_net: bigger header when VERSION_1 is set
Michael S. Tsirkin m...@redhat.com writes: With VERSION_1 virtio_net uses same header size whether mergeable buffers are enabled or not. Signed-off-by: Michael S. Tsirkin m...@redhat.com These two are great too, thanks: Acked-by: Rusty Russell ru...@rustcorp.com.au Cheers, Rusty. --- drivers/net/virtio_net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 9c6d50f..a2fe340 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1764,7 +1764,8 @@ static int virtnet_probe(struct virtio_device *vdev) if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) vi-mergeable_rx_bufs = true; - if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF)) + if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) || + virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) vi-hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf); else vi-hdr_len = sizeof(struct virtio_net_hdr); -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 3/6] hw_random: use reference counts on each struct hwrng.
Amos Kong ak...@redhat.com writes: We got a warning in boot stage when above set_current_rng() is executed, it can be fixed by init rng-ref in hwrng_init(). @@ -166,6 +169,8 @@ static inline int hwrng_init(struct hwrng *rng) if (current_quality 0 !hwrng_fill) start_khwrngd(); + kref_init(rng-ref); + return 0; } OK, I folded this fix on. Thanks, Rusty. hw_random: use reference counts on each struct hwrng. current_rng holds one reference, and we bump it every time we want to do a read from it. This means we only hold the rng_mutex to grab or drop a reference, so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't block on read of /dev/hwrng. Using a kref is overkill (we're always under the rng_mutex), but a standard pattern. This also solves the problem that the hwrng_fillfn thread was accessing current_rng without a lock, which could change (eg. to NULL) underneath it. v3: initialize kref (thanks Amos Kong) v2: fix missing put_rng() on exit path (thanks Amos Kong) Signed-off-by: Rusty Russell ru...@rustcorp.com.au diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index a0905c818e13..0ecac38da954 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -42,6 +42,7 @@ #include linux/delay.h #include linux/slab.h #include linux/random.h +#include linux/err.h #include asm/uaccess.h @@ -91,6 +92,59 @@ static void add_early_randomness(struct hwrng *rng) add_device_randomness(bytes, bytes_read); } +static inline void cleanup_rng(struct kref *kref) +{ + struct hwrng *rng = container_of(kref, struct hwrng, ref); + + if (rng-cleanup) + rng-cleanup(rng); +} + +static void set_current_rng(struct hwrng *rng) +{ + BUG_ON(!mutex_is_locked(rng_mutex)); + kref_get(rng-ref); + current_rng = rng; +} + +static void drop_current_rng(void) +{ + BUG_ON(!mutex_is_locked(rng_mutex)); + if (!current_rng) + return; + + kref_put(current_rng-ref, cleanup_rng); + current_rng = NULL; +} + +/* Returns ERR_PTR(), NULL or refcounted hwrng */ +static struct hwrng *get_current_rng(void) +{ + struct hwrng *rng; + + if (mutex_lock_interruptible(rng_mutex)) + return ERR_PTR(-ERESTARTSYS); + + rng = current_rng; + if (rng) + kref_get(rng-ref); + + mutex_unlock(rng_mutex); + return rng; +} + +static void put_rng(struct hwrng *rng) +{ + /* +* Hold rng_mutex here so we serialize in case they set_current_rng +* on rng again immediately. +*/ + mutex_lock(rng_mutex); + if (rng) + kref_put(rng-ref, cleanup_rng); + mutex_unlock(rng_mutex); +} + static inline int hwrng_init(struct hwrng *rng) { if (rng-init) { @@ -110,13 +164,9 @@ static inline int hwrng_init(struct hwrng *rng) if (current_quality 0 !hwrng_fill) start_khwrngd(); - return 0; -} + kref_init(rng-ref); -static inline void hwrng_cleanup(struct hwrng *rng) -{ - if (rng rng-cleanup) - rng-cleanup(rng); + return 0; } static int rng_dev_open(struct inode *inode, struct file *filp) @@ -154,21 +204,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, ssize_t ret = 0; int err = 0; int bytes_read, len; + struct hwrng *rng; while (size) { - if (mutex_lock_interruptible(rng_mutex)) { - err = -ERESTARTSYS; + rng = get_current_rng(); + if (IS_ERR(rng)) { + err = PTR_ERR(rng); goto out; } - - if (!current_rng) { + if (!rng) { err = -ENODEV; - goto out_unlock; + goto out; } mutex_lock(reading_mutex); if (!data_avail) { - bytes_read = rng_get_data(current_rng, rng_buffer, + bytes_read = rng_get_data(rng, rng_buffer, rng_buffer_size(), !(filp-f_flags O_NONBLOCK)); if (bytes_read 0) { @@ -200,8 +251,8 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, ret += len; } - mutex_unlock(rng_mutex); mutex_unlock(reading_mutex); + put_rng(rng); if (need_resched()) schedule_timeout_interruptible(1); @@ -213,12 +264,11 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, } out: return ret ? : err; -out_unlock: - mutex_unlock(rng_mutex); - goto out; + out_unlock_reading: mutex_unlock(reading_mutex); - goto out_unlock
Re: [PATCH v2 3/6] hw_random: use reference counts on each struct hwrng.
Amos Kong ak...@redhat.com writes: From: Rusty Russell ru...@rustcorp.com.au current_rng holds one reference, and we bump it every time we want to do a read from it. This means we only hold the rng_mutex to grab or drop a reference, so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't block on read of /dev/hwrng. Using a kref is overkill (we're always under the rng_mutex), but a standard pattern. This also solves the problem that the hwrng_fillfn thread was accessing current_rng without a lock, which could change (eg. to NULL) underneath it. V2: reduce reference count in signal_pending path OK, I changed it to do the put_rng before the check, so instead of: @@ -208,17 +256,19 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, if (signal_pending(current)) { err = -ERESTARTSYS; + put_rng(rng); goto out; } + + put_rng(rng); } out: return ret ? : err; -out_unlock: - mutex_unlock(rng_mutex); - goto out; + out_unlock_reading: mutex_unlock(reading_mutex); - goto out_unlock; + put_rng(rng); + goto out; } We have: mutex_unlock(reading_mutex); put_rng(rng); if (need_resched()) schedule_timeout_interruptible(1); if (signal_pending(current)) { err = -ERESTARTSYS; goto out; } } out: return ret ? : err; out_unlock_reading: mutex_unlock(reading_mutex); put_rng(rng); goto out; } Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio_balloon: free some memory from balloon on OOM
Denis V. Lunev d...@openvz.org writes: From: Raushaniya Maksudova rmaksud...@parallels.com Excessive virtio_balloon inflation can cause invocation of OOM-killer, when Linux is under severe memory pressure. Various mechanisms are responsible for correct virtio_balloon memory management. Nevertheless it is often the case that these control tools does not have enough time to react on fast changing memory load. As a result OS runs out of memory and invokes OOM-killer. The balancing of memory by use of the virtio balloon should not cause the termination of processes while there are pages in the balloon. Now there is no way for virtio balloon driver to free some memory at the last moment before some process will be get killed by OOM-killer. This does not provide a security breach as balloon itself is running inside guest OS and is working in the cooperation with the host. Thus some improvements from guest side should be considered as normal. To solve the problem, introduce a virtio_balloon callback which is expected to be called from the oom notifier call chain in out_of_memory() function. If virtio balloon could release some memory, it will make the system to return and retry the allocation that forced the out of memory killer to run. Allocate virtio feature bit for this: it is not set by default, the the guest will not deflate virtio balloon on OOM without explicit permission from host. Signed-off-by: Raushaniya Maksudova rmaksud...@parallels.com Signed-off-by: Denis V. Lunev d...@openvz.org CC: Rusty Russell ru...@rustcorp.com.au CC: Michael S. Tsirkin m...@redhat.com --- drivers/virtio/virtio_balloon.c | 52 + include/uapi/linux/virtio_balloon.h | 1 + 2 files changed, 53 insertions(+) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 66cac10..88d73a0 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -28,6 +28,7 @@ #include linux/slab.h #include linux/module.h #include linux/balloon_compaction.h +#include linux/oom.h /* * Balloon device works in 4K page units. So each page is pointed to by @@ -36,6 +37,12 @@ */ #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE VIRTIO_BALLOON_PFN_SHIFT) #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256 +#define OOM_VBALLOON_DEFAULT_PAGES 256 +#define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80 + +static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES; +module_param(oom_pages, int, S_IRUSR | S_IWUSR); +MODULE_PARM_DESC(oom_pages, pages to free on OOM); struct virtio_balloon { @@ -71,6 +78,9 @@ struct virtio_balloon /* Memory statistics */ int need_stats_update; struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR]; + + /* To register callback in oom notifier call chain */ + struct notifier_block nb; }; static struct virtio_device_id id_table[] = { @@ -290,6 +300,38 @@ static void update_balloon_size(struct virtio_balloon *vb) actual); } +/* + * virtballoon_oom_notify - release pages when system is under severe + * memory pressure (called from out_of_memory()) + * @self : notifier block struct + * @dummy: not used + * @parm : returned - number of freed pages + * + * The balancing of memory by use of the virtio balloon should not cause + * the termination of processes while there are pages in the balloon. + * If virtio balloon manages to release some memory, it will make the + * system return and retry the allocation that forced the OOM killer + * to run. + */ +static int virtballoon_oom_notify(struct notifier_block *self, + unsigned long dummy, void *parm) +{ + struct virtio_balloon *vb; + unsigned long *freed; + unsigned num_freed_pages; + + vb = container_of(self, struct virtio_balloon, nb); + if (!virtio_has_feature(vb-vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) + return NOTIFY_OK; + + freed = parm; + num_freed_pages = leak_balloon(vb, oom_pages); + update_balloon_size(vb); + *freed += num_freed_pages; + + return NOTIFY_OK; +} + static int balloon(void *_vballoon) { struct virtio_balloon *vb = _vballoon; OK, I applied these. I thought it a bit weird that you're checking the feature in the callback, rather than only registering the callback if the feature exists. But it's clear enough. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PULL] More virtio fun
The following changes since commit 7ec62d421bdf29cb31101ae2689f7f3a9906289a: Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs (2014-09-10 14:04:17 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git tags/virtio-next-for-linus for you to fetch changes up to 1bbc26062754b012656d34103215f7552e02b999: virtio-rng: refactor probe error handling (2014-10-15 10:25:14 +1030) One cc: stable commit, the rest are a series of minor cleanups which have been sitting in MST's tree during my vacation. I changed a function name and made one trivial change, then they spent two days in linux-next. Thanks, Rusty. Michael S. Tsirkin (24): virtio_pci: fix virtio spec compliance on restore virtio: unify config_changed handling virtio-pci: move freeze/restore to virtio core virtio: defer config changed notifications virtio_blk: drop config_enable virtio-blk: drop config_mutex virtio_net: drop config_enable virtio-net: drop config_mutex virtio_net: minor cleanup virtio: add API to enable VQs early virtio_net: enable VQs early virtio_blk: enable VQs early virtio_console: enable VQs early 9p/trans_virtio: enable VQs early virtio_net: fix use after free on allocation failure virtio_scsi: move kick event out from virtscsi_init virtio_blk: enable VQs early on restore virtio_scsi: enable VQs early on restore virtio_console: enable VQs early on restore virtio_net: enable VQs early on restore virtio_scsi: fix race on device removal virtio_balloon: enable VQs early on restore virtio_scsi: drop scan callback virtio-rng: refactor probe error handling Paolo Bonzini (1): virito_scsi: use freezable WQ for events drivers/block/virtio_blk.c | 40 -- drivers/char/hw_random/virtio-rng.c | 15 +++--- drivers/char/virtio_console.c | 4 ++ drivers/misc/mic/card/mic_virtio.c | 6 +-- drivers/net/virtio_net.c| 44 +-- drivers/s390/kvm/kvm_virtio.c | 9 +--- drivers/s390/kvm/virtio_ccw.c | 6 +-- drivers/scsi/virtio_scsi.c | 42 +-- drivers/virtio/virtio.c | 103 drivers/virtio/virtio_balloon.c | 2 + drivers/virtio/virtio_mmio.c| 7 +-- drivers/virtio/virtio_pci.c | 33 ++-- include/linux/virtio.h | 14 + include/linux/virtio_config.h | 17 ++ net/9p/trans_virtio.c | 2 + 15 files changed, 207 insertions(+), 137 deletions(-) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next RFC 1/3] virtio: support for urgent descriptors
Jason Wang jasow...@redhat.com writes: Below should be useful for some experiments Jason is doing. I thought I'd send it out for early review/feedback. event idx feature allows us to defer interrupts until a specific # of descriptors were used. Sometimes it might be useful to get an interrupt after a specific descriptor, regardless. This adds a descriptor flag for this, and an API to create an urgent output descriptor. This is still an RFC: we'll need a feature bit for drivers to detect this, but we've run out of feature bits for virtio 0.X. For experimentation purposes, drivers can assume this is set, or add a driver-specific feature bit. Signed-off-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Jason Wang jasow...@redhat.com The new VRING_DESC_F_URGENT bit is theoretically nicer, but for networking (which tends to take packets in order) couldn't we just set the event counter to give us a tx interrupt at the packet we want? Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM
Michael S. Tsirkin m...@redhat.com writes: On Tue, Oct 14, 2014 at 10:14:05AM +1030, Rusty Russell wrote: Michael S. Tsirkin m...@redhat.com writes: On Mon, Oct 13, 2014 at 04:02:52PM +1030, Rusty Russell wrote: Denis V. Lunev d...@parallels.com writes: From: Raushaniya Maksudova rmaksud...@parallels.com Excessive virtio_balloon inflation can cause invocation of OOM-killer, when Linux is under severe memory pressure. Various mechanisms are responsible for correct virtio_balloon memory management. Nevertheless it is often the case that these control tools does not have enough time to react on fast changing memory load. As a result OS runs out of memory and invokes OOM-killer. The balancing of memory by use of the virtio balloon should not cause the termination of processes while there are pages in the balloon. Now there is no way for virtio balloon driver to free some memory at the last moment before some process will be get killed by OOM-killer. This makes some amount of sense. This reminds me of the balloon fs that Google once proposed. This really needs to be controlled from host though. At the moment host does not expect guest to deflate before requests. So as a minimum, add a feature bit for this. what if you want a mix of mandatory and optional balooning? I guess we can use multiple balloons, is that the idea? Trying to claw back some pages on OOM is almost certainly correct, even if the host doesn't expect it. It's roughly equivalent to not giving up pages in the first place. Well the difference is that there are management tools that poll balloon in host until they see balloon size reaches the expected value. They don't expect balloon to shrink below num_pages and will respond in various unexpected ways like e.g. killing the VM if it does. Killing a userspace process within the guest might be better for VM health. Besides the fact that we always did it like this, these tools seem to have basis in the spec. Specifically, this is based on this text from the spec: the device asks for a certain amount of memory, and the driver supplies it (or withdraws it, if the device has more than it asks for). This allows the guest to adapt to changes in allowance of underlying physical memory. and The device is driven by the receipt of a configuration change interrupt. Cheers, Rusty. PS. Yes, a real guest-driven balloon is preferable, but that's a much larger task. Any objection to making the feature depend on a feature flag? If you believe a guest which does this will cause drastic failure on the host side (ie. killing the VM), then yes, we can do this. However, I'm not aware of anything that sophisticated... Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_balloon: Convert vballon kthread into a workqueue
Petr Mladek pmla...@suse.cz writes: Workqueues have clean and rich API for all basic operations. The code is usually easier and better readable. It can be easily tuned for the given purpose. OK, sure. -static void fill_balloon(struct virtio_balloon *vb, size_t num) +static void fill_balloon(struct virtio_balloon *vb, size_t diff) { struct balloon_dev_info *vb_dev_info = vb-vb_dev_info; + size_t num; + bool done; /* We can only do one array worth at a time. */ - num = min(num, ARRAY_SIZE(vb-pfns)); + num = min(diff, ARRAY_SIZE(vb-pfns)); + done = (num == diff) ? true : false; mutex_lock(vb-balloon_lock); for (vb-num_pfns = 0; vb-num_pfns num; @@ -143,6 +143,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) VIRTIO_BALLOON_PAGES_PER_PAGE); /* Sleep for at least 1/5 of a second before retry. */ msleep(200); + done = false; break; } set_page_pfns(vb-pfns + vb-num_pfns, page); @@ -154,6 +155,9 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) if (vb-num_pfns != 0) tell_host(vb, vb-inflate_vq); mutex_unlock(vb-balloon_lock); + + if (!done) + queue_work(vb-wq, vb-wq_work); } Hmm, this is a bit convoluted. I would spell it out by keeping a num_done counter: if (num_done diff) queue_work(vb-wq, vb-wq_work); static void release_pages_by_pfn(const u32 pfns[], unsigned int num) @@ -168,20 +172,25 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num) } } -static void leak_balloon(struct virtio_balloon *vb, size_t num) +static void leak_balloon(struct virtio_balloon *vb, size_t diff) { struct page *page; struct balloon_dev_info *vb_dev_info = vb-vb_dev_info; + size_t num; + bool done; /* We can only do one array worth at a time. */ - num = min(num, ARRAY_SIZE(vb-pfns)); + num = min(diff, ARRAY_SIZE(vb-pfns)); + done = (num == diff) ? true : false; mutex_lock(vb-balloon_lock); for (vb-num_pfns = 0; vb-num_pfns num; vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { page = balloon_page_dequeue(vb_dev_info); - if (!page) + if (!page) { + done = false; break; + } set_page_pfns(vb-pfns + vb-num_pfns, page); vb-num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE; } @@ -195,6 +204,9 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) tell_host(vb, vb-deflate_vq); mutex_unlock(vb-balloon_lock); release_pages_by_pfn(vb-pfns, vb-num_pfns); + + if (!done) + queue_work(vb-wq, vb-wq_work); } Here too. @@ -471,11 +469,13 @@ static int virtballoon_probe(struct virtio_device *vdev) if (err) goto out_free_vb_mapping; - vb-thread = kthread_run(balloon, vb, vballoon); - if (IS_ERR(vb-thread)) { - err = PTR_ERR(vb-thread); + vb-wq = alloc_workqueue(vballoon_wq, + WQ_FREEZABLE | WQ_MEM_RECLAIM, 0); + if (!vb-wq) { + err = -ENOMEM; goto out_del_vqs; } + INIT_WORK(vb-wq_work, balloon); That looks racy to me; shouldn't we init vq-work before allocating wq? Otherwise, looks good! Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 04/25] virtio: defer config changed notifications
Michael S. Tsirkin m...@redhat.com writes: Defer config changed notifications that arrive during probe/scan/freeze/restore. This will allow drivers to set DRIVER_OK earlier, without worrying about racing with config change interrupts. This change will also benefit old hypervisors (before 2009) that send interrupts without checking DRIVER_OK: previously, the callback could race with driver-specific initialization. This will also help simplify drivers. But AFAICT you never *read* dev-config_changed. You unconditionally trigger a config_changed event in virtio_config_enable(). That's a bit weird, but probably OK. How about the following change (on top of your patch). I think the renaming is clearer, and note the added if() test in virtio_config_enable(). If you approve, I'll fold it in. Cheers, Rusty. diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 2536701b098b..df598dd8c5c8 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -122,7 +122,7 @@ static void __virtio_config_changed(struct virtio_device *dev) struct virtio_driver *drv = drv_to_virtio(dev-dev.driver); if (!dev-config_enabled) - dev-config_changed = true; + dev-config_change_pending = true; else if (drv drv-config_changed) drv-config_changed(dev); } @@ -148,8 +148,9 @@ static void virtio_config_enable(struct virtio_device *dev) { spin_lock_irq(dev-config_lock); dev-config_enabled = true; - __virtio_config_changed(dev); - dev-config_changed = false; + if (dev-config_change_pending) + __virtio_config_changed(dev); + dev-config_change_pending = false; spin_unlock_irq(dev-config_lock); } @@ -253,7 +254,7 @@ int register_virtio_device(struct virtio_device *dev) spin_lock_init(dev-config_lock); dev-config_enabled = false; - dev-config_changed = false; + dev-config_change_pending = false; /* We always start by resetting the device, in case a previous * driver messed it up. This also tests that code path a little. */ diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 5636b119dc25..65261a7244fc 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -80,7 +80,7 @@ bool virtqueue_is_broken(struct virtqueue *vq); * @index: unique position on the virtio bus * @failed: saved value for CONFIG_S_FAILED bit (for restore) * @config_enabled: configuration change reporting enabled - * @config_changed: configuration change reported while disabled + * @config_change_pending: configuration change reported while disabled * @config_lock: protects configuration change reporting * @dev: underlying device. * @id: the device type identification (used to match it with a driver). @@ -94,7 +94,7 @@ struct virtio_device { int index; bool failed; bool config_enabled; - bool config_changed; + bool config_change_pending; spinlock_t config_lock; struct device dev; struct virtio_device_id id; ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/3] virtio_net: pass well-formed sgs to virtqueue_add_*()
Michael S. Tsirkin m...@redhat.com writes: On Fri, Sep 05, 2014 at 12:40:50PM +0200, Paolo Bonzini wrote: Il 03/09/2014 06:29, Rusty Russell ha scritto: + sg_init_table(rq-sg, MAX_SKB_FRAGS + 2); I think 2 is enough here. That said... sg_set_buf(rq-sg, hdr-hdr, sizeof hdr-hdr); - skb_to_sgvec(skb, rq-sg + 1, 0, skb-len); err = virtqueue_add_inbuf(rq-vq, rq-sg, 2, skb, gfp); ... skb_to_sgvec will already make the sg well formed, so the sg_init_table is _almost_ redundant; it is only there to remove intermediate end marks. The block layer takes care to remove them, but skb_to_sgvec doesn't. sg_init_table is still needed if CONFIG_DEBUG_SG, so I don't think it's worth it. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM
Michael S. Tsirkin m...@redhat.com writes: On Mon, Oct 13, 2014 at 04:02:52PM +1030, Rusty Russell wrote: Denis V. Lunev d...@parallels.com writes: From: Raushaniya Maksudova rmaksud...@parallels.com Excessive virtio_balloon inflation can cause invocation of OOM-killer, when Linux is under severe memory pressure. Various mechanisms are responsible for correct virtio_balloon memory management. Nevertheless it is often the case that these control tools does not have enough time to react on fast changing memory load. As a result OS runs out of memory and invokes OOM-killer. The balancing of memory by use of the virtio balloon should not cause the termination of processes while there are pages in the balloon. Now there is no way for virtio balloon driver to free some memory at the last moment before some process will be get killed by OOM-killer. This makes some amount of sense. This reminds me of the balloon fs that Google once proposed. This really needs to be controlled from host though. At the moment host does not expect guest to deflate before requests. So as a minimum, add a feature bit for this. what if you want a mix of mandatory and optional balooning? I guess we can use multiple balloons, is that the idea? Trying to claw back some pages on OOM is almost certainly correct, even if the host doesn't expect it. It's roughly equivalent to not giving up pages in the first place. Cheers, Rusty. PS. Yes, a real guest-driven balloon is preferable, but that's a much larger task. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 08/11] virtio_blk: use virtio v1.0 endian
Cornelia Huck cornelia.h...@de.ibm.com writes: Note that we care only about the fields still in use for virtio v1.0. Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com Hi Cornelia, These patches all look good; I'm a bit nervous about our testing missing some conversion, so we'll need qemu patches for PCI so we can test on other platforms too. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 03/11] virtio: support more feature bits
Cornelia Huck cornelia.h...@de.ibm.com writes: With virtio-1, we support more than 32 feature bits. Let's make vdev-guest_features depend on the number of supported feature bits, allowing us to grow the feature bits automatically. It's a judgement call, but I would say that simply using uint64_t will be sufficient for quite a while. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio_balloon: free some memory from baloon on OOM
Denis V. Lunev d...@parallels.com writes: From: Raushaniya Maksudova rmaksud...@parallels.com Excessive virtio_balloon inflation can cause invocation of OOM-killer, when Linux is under severe memory pressure. Various mechanisms are responsible for correct virtio_balloon memory management. Nevertheless it is often the case that these control tools does not have enough time to react on fast changing memory load. As a result OS runs out of memory and invokes OOM-killer. The balancing of memory by use of the virtio balloon should not cause the termination of processes while there are pages in the balloon. Now there is no way for virtio balloon driver to free some memory at the last moment before some process will be get killed by OOM-killer. This makes some amount of sense. But I suggest a few minor changes: +static int oom_vballoon_pages = OOM_VBALLOON_DEFAULT_PAGES; +module_param(oom_vballoon_pages, int, S_IRUSR | S_IWUSR); +MODULE_PARM_DESC(oom_vballoon_pages, pages to free on OOM); Since this is already prefixed with virtio_balloon. I suggest just calling it oom_pages. +static int virtballoon_oom_notify(struct notifier_block *self, + unsigned long dummy, void *parm) +{ + unsigned int num_freed_pages; + unsigned long *freed = (unsigned long *)parm; + struct virtio_balloon *vb = container_of((struct notifier_block *)self, + struct virtio_balloon, nb); Why cast self here? + num_freed_pages = leak_balloon(vb, oom_vballoon_pages); + update_balloon_size(vb); + *freed += num_freed_pages; + + return NOTIFY_OK; +} Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: BUG_ON in virtio-ring.c
Alexey Lapitsky lex.pub...@gmail.com writes: Hi, I'm hitting this bug with both ext4 and btrfs. Here's an example of the backtrace: https://gist.github.com/vzctl/e888a821333979120932 I tried raising this BUG only for direct ring and it solved the problem: - BUG_ON(total_sg vq-vring.num); + BUG_ON(total_sg vq-vring.num !vq-indirect); Shall I submit the patch or is a more elaborate fix required? This is wrong. It looks like the block layer is spending down more sg elements than we have ring entries, yet we tell it not to: blk_queue_max_segments(q, vblk-sg_elems-2); If you apply this patch, what happens? Here it prints: [0.616564] virtqueue elements = 128, max_segments = 126 (1 queues) [0.621244] vda: vda1 vda2 vda5 [0.632290] virtqueue elements = 128, max_segments = 126 (1 queues) [0.683526] vdb: vdb1 vdb2 vdb5 Cheers, Rusty. diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 0a581400de0f..aa9d4d313587 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -683,6 +683,13 @@ static int virtblk_probe(struct virtio_device *vdev) /* We can handle whatever the host told us to handle. */ blk_queue_max_segments(q, vblk-sg_elems-2); + printk(virtqueue elements = %u, max_segments = %u (%u queues), + virtqueue_get_vring_size(vblk-vqs[0].vq), + vblk-sg_elems-2, + vblk-num_vqs); + + BUG_ON(vblk-sg_elems-2 virtqueue_get_vring_size(vblk-vqs[0].vq)); + /* No need to bounce any requests */ blk_queue_bounce_limit(q, BLK_BOUNCE_ANY); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 10/25] virtio: add API to enable VQs early
Michael S. Tsirkin m...@redhat.com writes: virtio spec 0.9.X requires DRIVER_OK to be set before VQs are used, but some drivers use VQs before probe function returns. Since DRIVER_OK is set after probe, this violates the spec. Even though under virtio 1.0 transitional devices support this behaviour, we want to make it possible for those early callers to become spec compliant and eventually support non-transitional devices. Add API for drivers to call before using VQs. Sets DRIVER_OK internally. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com OK, this all looks good, but I think we should do two things: 1) rename virtio_enable_vqs_early() to virtio_device_ready(). 2) Add a BUG_ON in the virtio_ring code to make sure device is ready before any ops are called. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio: pci: Use SIMPLE_DEV_PM_OPS macro
Jingoo Han jg1@samsung.com writes: On Tuesday, September 09, 2014 9:14 AM, Rusty Russell wrote: Jingoo Han jg1@samsung.com writes: Use SIMPLE_DEV_PM_OPS macro in order to make the code simpler. Signed-off-by: Jingoo Han jg1@samsung.com This patch is obviously wrong. It won't compile without CONFIG_PM_SLEEP. No, there is no compile issue. When, CONFIG_PM_SLEEP=n, there is no build error. My mistake. Thanks, I've applied it. It probably won't go in until the next merge window, however, since I'm travelling for this one. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes
Amos Kong ak...@redhat.com writes: I started a QEMU (non-smp) guest with one virtio-rng device, and read random data from /dev/hwrng by dd: # dd if=/dev/hwrng of=/dev/null In the same time, if I check hwrng attributes from sysfs by cat: # cat /sys/class/misc/hw_random/rng_* The cat process always gets stuck with slow backend (5 k/s), if we use a quick backend (1.2 M/s), the cat process will cost 1 to 2 minutes. The stuck doesn't exist for smp guest. Reading syscall enters kernel and call rng_dev_read(), it's user context. We used need_resched() to check if other tasks need to be run, but it almost always return false, and re-hold the mutex lock. The attributes accessing process always fails to hold the lock, so the cat gets stuck. User context doesn't allow other user contexts run on that CPU, unless the kernel code sleeps for some reason. This is why the need_reshed() always return false here. This patch removed need_resched() and always schedule other tasks then other tasks can have chance to hold the lock and execute protected code. OK, this is going to be a rant. Your explanation doesn't make sense at all. Worse, your solution breaks the advice of Kernighan Plaugher: Don't patch bad code - rewrite it.. But worst of all, this detailed explanation might have convinced me you understood the problem better than I did, and applied your patch. I did some tests. For me, as expected, the process spends its time inside the virtio rng read function, holding the mutex and thus blocking sysfs access; it's not a failure of this code at all. Your schedule_timeout() fix probably just helps by letting the host refresh entropy, so we spend less time waiting in the read fn. I will post a series, which unfortunately is only lightly tested, then I'm going to have some beer to begin my holiday. That may help me forget my disappointment at seeing respected fellow developers monkey-patching random code they don't understand. Grrr Rusty. Signed-off-by: Amos Kong ak...@redhat.com --- drivers/char/hw_random/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index c591d7e..263a370 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, mutex_unlock(rng_mutex); - if (need_resched()) - schedule_timeout_interruptible(1); + schedule_timeout_interruptible(1); if (signal_pending(current)) { err = -ERESTARTSYS; -- 1.9.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 5/5] hw_random: don't init list element we're about to add to list.
Another interesting anti-pattern. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 6a34feca6b43..96fa06716e95 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -485,7 +485,6 @@ int hwrng_register(struct hwrng *rng) goto out_unlock; } } - INIT_LIST_HEAD(rng-list); list_add_tail(rng-list, rng_list); if (old_rng !rng-init) { -- 1.9.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 3/5] hw_random: fix unregister race.
The previous patch added one potential problem: we can still be reading from a hwrng when it's unregistered. Add a wait for zero in the hwrng_unregister path. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/char/hw_random/core.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index dc9092a1075d..b4a21e9521cf 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex); static DEFINE_MUTEX(reading_mutex); static int data_avail; static u8 *rng_buffer, *rng_fillbuf; +static DECLARE_WAIT_QUEUE_HEAD(rng_done); static unsigned short current_quality; static unsigned short default_quality; /* = 0; default to off */ @@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref) if (rng-cleanup) rng-cleanup(rng); + wake_up_all(rng_done); } static void set_current_rng(struct hwrng *rng) @@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng) } mutex_unlock(rng_mutex); + + /* Just in case rng is reading right now, wait. */ + wait_event(rng_done, atomic_read(rng-ref.refcount) == 0); } EXPORT_SYMBOL_GPL(hwrng_unregister); -- 1.9.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization