Re: [PATCH RFC 4/5] tun: vringfd xmit support.
From: Rusty Russell <[EMAIL PROTECTED]>
Date: Mon, 7 Apr 2008 17:24:51 +1000

> On Monday 07 April 2008 15:13:44 Herbert Xu wrote:
> > On second thought, this is not going to work. The network stack
> > can clone individual pages out of this skb and put it into a new
> > skb. Therefore whatever scheme we come up with will either need
> > to be page-based, or add a flag to tell the network stack that it
> > can't clone those pages.
>
> Erk... I'll put in the latter for now. A page-level solution is not really
> an option: if userspace hands us mmaped pages for example.

Keep in mind that the core of the TCP stack really depends upon being
able to slice and dice paged SKBs as it pleases in order to send
packets out.

In fact, it also does such splitting during SACK processing.

It really is a base requirement for efficient TSO support. Otherwise
the above operations would be so incredibly expensive we might as
well rip all of the TSO support out.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/5] /dev/vring: simple userspace-kernel ringbuffer interface.
From: Rusty Russell <[EMAIL PROTECTED]> Date: Sun, 20 Apr 2008 02:41:14 +1000 > If only there were some kind of, I don't know... summit... for kernel > people... I'm starting to disbelieve the myth that because we can discuss technical issues on mailing lists, we should talk primarily about process issues during the kernel summit. There is a distinct advantage to discussing and hashing things out in person. You can't say "screw you, your idea sucks" when you're face to face with the other person, whereas online it's way too easy. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [6/6] [VIRTIO] net: Allow receiving SG packets
From: Rusty Russell <[EMAIL PROTECTED]>
Date: Tue, 22 Apr 2008 05:06:16 +1000

> I'm not sure what the right number is here. Say worst case is header which
> goes over a page boundary then MAX_SKB_FRAGS in the skb, but for some reason
> that already has a +2:
>
> /* To allow 64K frame to be packed as single skb without frag_list */
> #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
>
> Unless someone explains, I'll change the xmit sg to 2+MAX_SKB_FRAGS as well.

MAX_SKB_FRAGS + 1 is what you ought to need.

MAX_SKB_FRAGS is only accounting for the skb frag pages. If you want
to know how many segments skb->data might consume as well, you have
to add one.

skb->data is linear, therefore it's not possible to need more than
one scatterlist entry for it.
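The sizing rule in this reply can be checked with a few lines of arithmetic. The sketch below is illustrative only: max_skb_frags() and xmit_sg_entries() are hypothetical helper names, and the page size is passed as a parameter because PAGE_SIZE is fixed at kernel build time.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the MAX_SKB_FRAGS definition quoted above, with the page size
 * as a parameter (hypothetical helper, not a kernel function). */
static size_t max_skb_frags(size_t page_size)
{
	assert(page_size > 0);
	return 65536 / page_size + 2;
}

/* One scatterlist entry for the linear skb->data area, plus one entry
 * per page fragment: MAX_SKB_FRAGS + 1. */
static size_t xmit_sg_entries(size_t page_size)
{
	return max_skb_frags(page_size) + 1;
}
```

Assuming 4096-byte pages, max_skb_frags(4096) is 65536/4096 + 2 = 18, so xmit_sg_entries(4096) is 19.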
Re: [6/6] [VIRTIO] net: Allow receiving SG packets
From: Rusty Russell <[EMAIL PROTECTED]>
Date: Tue, 22 Apr 2008 12:50:27 +1000

> But I was curious as to why the +2 in the MAX_SKB_FRAGS definition?

To be honest I have no idea.

When Alexey added the TSO changeset way back then, it had the
"+2", from the history-2.6 tree:

commit 80223d5186f73bf42a7e260c66c9cb9f7d8ec9cf
Author: Alexey Kuznetsov <[EMAIL PROTECTED]>
Date: Wed Aug 28 11:52:03 2002 -0700

    [NET]: Add TCP segmentation offload core infrastructure.

...
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a812681..9b6e6ad 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -109,7 +109,8 @@ struct sk_buff_head {
 struct sk_buff;
-#define MAX_SKB_FRAGS 6
+/* To allow 64K frame to be packed as single skb without frag_list */
+#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
 typedef struct skb_frag_struct skb_frag_t;
Re: [PATCH 5/5] Remove now unused structs from kvm_para.h
You sent these patches to "kvm-owner", ie. the mailing list owner, and not the list itself which would be plain "kvm".
Re: [PATCH 1/4] tun: Interface to query tun/tap features.
From: Max Krasnyansky <[EMAIL PROTECTED]> Date: Tue, 01 Jul 2008 21:59:02 -0700 > Dave, do you want me to put all outstanding TUN patches into a git tree so > that you can pull them in one shot ? > Otherwise if you're ok with applying them one by one please apply this one. > > Acked-by: Max Krasnyansky <[EMAIL PROTECTED]> I'll apply Rusty's patches after I give them a review too. Thanks Max. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 3/3] tun: Allow GSO using virtio_net_hdr
From: Rusty Russell <[EMAIL PROTECTED]>
Date: Thu, 3 Jul 2008 11:34:14 +1000

> Add a IFF_VNET_HDR flag. This uses the same ABI as virtio_net (ie. prepending
> struct virtio_net_hdr to packets) to indicate GSO and checksum information.
>
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

Also applied to net-next-2.6
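For readers unfamiliar with the ABI the patch description refers to, here is a minimal userspace sketch of what "prepending struct virtio_net_hdr to packets" looks like for a frame that needs no offload. It assumes a Linux system with the <linux/virtio_net.h> UAPI header installed; vnet_wrap_plain() is a hypothetical helper invented for illustration, and the sketch does not open a tun device or set IFF_VNET_HDR itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/virtio_net.h>

/* Hypothetical helper: copy `frame` into `out` behind a zeroed
 * struct virtio_net_hdr (no checksum offload, no GSO).
 * Returns the total length written, or 0 if `out` is too small. */
static size_t vnet_wrap_plain(const void *frame, size_t frame_len,
			      void *out, size_t out_len)
{
	struct virtio_net_hdr hdr;

	assert(frame != NULL && out != NULL);
	if (out_len < sizeof(hdr) + frame_len)
		return 0;

	memset(&hdr, 0, sizeof(hdr));
	hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE;	/* plain packet */

	memcpy(out, &hdr, sizeof(hdr));
	memcpy((char *)out + sizeof(hdr), frame, frame_len);
	return sizeof(hdr) + frame_len;
}
```

A sender that wanted checksum offload or GSO would fill in flags, gso_type, gso_size, csum_start and csum_offset instead of leaving them zero; those cases are not shown here.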
Re: [PATCH 1/3] tun: Interface to query tun/tap features.
From: Rusty Russell <[EMAIL PROTECTED]> Date: Thu, 3 Jul 2008 11:32:12 +1000 > The problem with introducing checksum offload and gso to tun is they > need to set dev->features to enable GSO and/or checksumming, which is > supposed to be done before register_netdevice(), ie. as part of > TUNSETIFF. ... > Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> Applied to net-next-2.6, thanks! ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/3] tun: TUNSETFEATURES to set gso features.
From: Rusty Russell <[EMAIL PROTECTED]> Date: Thu, 3 Jul 2008 11:33:11 +1000 > ethtool is useful for setting (some) device fields, but it's > root-only. Finer feature control is available through a tun-specific > ioctl. > > (Includes Mark McLoughlin <[EMAIL PROTECTED]>'s fix to hold rtnl sem). > > Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> Applied to net-next-2.6 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] tun: Fix/rewrite packet filtering logic
From: David Miller <[EMAIL PROTECTED]> Date: Mon, 14 Jul 2008 22:16:02 -0700 (PDT) > It doesn't apply cleanly to net-next-2.6, as I just tried to > stick this into my tree. Ignore this, I did something stupid.
Re: [PATCH] tun: Fix/rewrite packet filtering logic
From: Max Krasnyansky <[EMAIL PROTECTED]> Date: Sat, 12 Jul 2008 01:52:54 -0700 > This is on top of the latest and greatest :). Assuming virt folks are ok with > the API this should go into 2.6.27. Really? :-) It doesn't apply cleanly to net-next-2.6, as I just tried to stick this into my tree. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] tun: Fix/rewrite packet filtering logic
From: Jeff Garzik <[EMAIL PROTECTED]> Date: Tue, 22 Jul 2008 19:41:47 -0400 > looks mostly OK, but stuff like the above should be > > (void __user *) arg > > Did you check this with sparse (Documentation/sparse.txt)? Jeff, I already added this particular patch to the tree a week or so ago. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] tun: Fix/rewrite packet filtering logic
From: Max Krasnyansky <[EMAIL PROTECTED]> Date: Tue, 22 Jul 2008 21:45:30 -0700 > Jeff Garzik wrote: > > David Miller wrote: > >> Jeff, I already added this particular patch to the tree > >> a week or so ago. > > > > Yeah, later on in my queue were the fixes. > > I'm not sure I'm following. What fixes ? Are you talking about fixing sparse > warnings or something else ? He's talking about sparse fixes. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/1] tun: TUNGETIFF interface to query name and flags
From: Max Krasnyansky <[EMAIL PROTECTED]> Date: Fri, 15 Aug 2008 11:00:19 -0700 > Rusty Russell wrote: > > On Thursday 14 August 2008 00:30:16 Mark McLoughlin wrote: > >> A very simple approach is attached; I did consider doing a TUNGETFLAGS > >> that would return tun->flags, but I think it's nicer to have a companion > >> to TUNGETIFF since it also allows one to query the interface name from > >> the file descriptor. > > > > This seems really sensible to me. > > > > If Max acks it, I'd say Dave should merge it. > > Makes perfect sense to me. > Definitely Ack. It has zero impact on existing user and I'd be ok if this goes > in during .27-rc series. I've applied Mark's patch, thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio_net: large tx MTU support
From: Mark McLoughlin <[EMAIL PROTECTED]> Date: Wed, 26 Nov 2008 13:58:11 + > We don't really have a max tx packet size limit, so allow configuring > the device with up to 64k tx MTU. > > Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]> Rusty, ACK? If so, I'll toss this into net-next-2.6, thanks! ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] AF_VMCHANNEL address family for guest<->host communication.
From: Gleb Natapov Date: Sun, 14 Dec 2008 13:50:55 +0200 > It is undesirable to use TCP/IP for this purpose since network > connectivity may not exist between host and guest and if it exists the > traffic can be not routable between host and guest for security reasons > or TCP/IP traffic can be firewalled (by mistake) by unsuspecting VM user. I don't really accept this argument, sorry. If you can't use TCP because it might be security protected or misconfigured, adding this new stream protocol thing is not one bit better. It doesn't make any sense at all. Also, if TCP could be "misconfigured" this new thing could just as easily be screwed up too. And I wouldn't be surprised to see a whole bunch of SELINUX and netfilter features proposed later for this and then we're back to square one. You guys really need to rethink this. Either a stream protocol is a workable solution to your problem, or it isn't. And don't bring up any "virtualization is special because..." arguments into your reply because virtualization has nothing to do with my objections stated above. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] AF_VMCHANNEL address family for guest<->host communication.
From: Gleb Natapov
Date: Mon, 15 Dec 2008 09:48:19 +0200

> On Sun, Dec 14, 2008 at 10:44:36PM -0800, David Miller wrote:
> > You guys really need to rethink this. Either a stream protocol is a
> > workable solution to your problem, or it isn't.
>
> Stream protocol is workable solution for us, but we need it out of band
> in regard to networking and as much zero config as possible. If we will
> use networking how can it be done without additional configuration (and
> reconfiguration can be required after migration BTW)

You miss the whole point and you also missed the part where I said
(and the one part of my comments you conveniently did NOT quote):

    And don't bring up any "virtualization is special because..."
    arguments into your reply because virtualization has nothing to do
    with my objections stated above.

What part of that do you not understand? Don't give me this junk
about zero config, it's not a plausible argument against anything I
said.

You want to impose a new burden onto the kernel in the form of a whole
new socket layer, when existing ones can solve any communications
problem.

Performance is not a good argument because we have (repeatedly) made
TCP/IP go fast in just about any environment.

If you have a configuration problem, you can solve it in userspace in
a number of different ways, building on top of things we have, and
the user need not know anything about that.

I would even be OK with features such as "permanent" links or special
attributes for devices or IP addresses that by default prevent
tampering and filtering by things like netfilter.

But not this new thing that duplicates existing functionality, no way.
Re: [PATCH] AF_VMCHANNEL address family for guest<->host communication.
From: Anthony Liguori
Date: Mon, 15 Dec 2008 09:02:23 -0600

> There is already an AF_IUCV for s390.

This is a scarecrow and irrelevant to this discussion.

And this is exactly why I asked that any arguments in this thread
avoid talking about virtualization technology and why it's "special."

This proposed patch here is asking to add new infrastructure for
hypervisor facilities that will be _ADDED_ and over which we have
complete control.

Whereas the S390 folks have to deal with existing infrastructure which
is largely outside of their control. So if they implement access
mechanisms for that, it's fine.

I would be doing the same thing if I added a protocol socket layer for
accessing the Niagara hypervisor virtualization channels.
Re: [PATCH] AF_VMCHANNEL address family for guest<->host communication.
From: Anthony Liguori Date: Mon, 15 Dec 2008 14:44:26 -0600 > We want this communication mechanism to be simple and reliable as we > want to implement the backends drivers in the host userspace with > minimum mess. One implication of your statement here is that TCP is unreliable. That's absolutely not true. > Within the guest, we need the interface to be always available and > we need an addressing scheme that is hypervisor specific. Yes, we > can build this all on top of TCP/IP. We could even build it on top > of a serial port. Both have their down-sides wrt reliability and > complexity. I don't know of any zero-copy through the hypervisor mechanisms for serial ports, but I know we do that with the various virtualization network devices. > Do you have another recommendation? I don't have to make alternative recommendations until you can show that what we have can't solve the problem acceptably, and TCP emphatically can. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH] AF_VMCHANNEL address family for guest<->host communication.
From: Anthony Liguori Date: Mon, 15 Dec 2008 17:01:14 -0600 > No, TCP falls under the not simple category because it requires the > backend to have access to a TCP/IP stack. I'm at a loss for words if you need TCP in the hypervisor, if that's what you're implying here. You only need it in the guest and the host, which you already have, in the Linux kernel. Just transport that over virtio or whatever and be done with it. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [2/2] tun: Fix sk_sleep races when attaching/detaching
From: Herbert Xu Date: Mon, 20 Apr 2009 16:35:50 +0800 > On Thu, Apr 16, 2009 at 07:09:52PM +0800, Herbert Xu wrote: >> >> tun: Fix sk_sleep races when attaching/detaching > > That patch doesn't apply anymore because of contextual changes > caused by the first patch. Here's an update. > > tun: Fix sk_sleep races when attaching/detaching Do you think these two patches are ready to go into net-2.6 now? Thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [2/2] tun: Fix sk_sleep races when attaching/detaching
From: Herbert Xu Date: Mon, 20 Apr 2009 17:35:49 +0800 > On Mon, Apr 20, 2009 at 02:26:35AM -0700, David Miller wrote: >> >> Do you think these two patches are ready to go into net-2.6 >> now? > > I think so. Great, applied, thanks Herbert.
Re: [RFC] virtio: orphan skbs if we're relying on timer to free them
From: Rusty Russell Date: Mon, 18 May 2009 22:18:47 +0930 > We check for finished xmit skbs on every xmit, or on a timer (unless > the host promises to force an interrupt when the xmit ring is empty). > This can penalize userspace tasks which fill their sockbuf. Not much > difference with TSO, but measurable with large numbers of packets. > > There are a finite number of packets which can be in the transmission > queue. We could fire the timer more than every 100ms, but that would > just hurt performance for a corner case. This seems neatest. ... > Signed-off-by: Rusty Russell If this is so great for virtio it would also be a great idea universally, but we don't do it. What you're doing by orphan'ing is creating a situation where a single UDP socket can loop doing sends and monopolize the TX queue of a device. The only control we have over a sender for fairness in datagram protocols is that send buffer allocation. I'm guilty of doing this too in the NIU driver, also because there I lack a "TX queue empty" interrupt and this can keep TCP sockets from getting stuck. I think we need a generic solution to this issue because it is getting quite common to see cases where the packets in the TX queue of a device can sit there indefinitely. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [RFC] virtio: orphan skbs if we're relying on timer to free them
From: Rusty Russell
Date: Thu, 21 May 2009 16:27:05 +0930

> On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
>> What you're doing by orphan'ing is creating a situation where a single
>> UDP socket can loop doing sends and monopolize the TX queue of a
>> device. The only control we have over a sender for fairness in
>> datagram protocols is that send buffer allocation.
>
> Urgh, that hadn't even occurred to me. Good point.

Now this all is predicated on this actually mattering. :-)

You could argue that the scheduler as well as the size of the
TX queue should be limiting and enforcing fairness.

Someone really needs to test this. Just skb_orphan() every packet
at the beginning of dev_hard_start_xmit(), then run some test
program with two clients looping out UDP packets to see if one
can monopolize the device and get a significantly larger amount
of TX resources than the other. Repeat for 3, 4, 5, etc. clients.

> I haven't thought this through properly, but how about a hack where
> we don't orphan packets if the ring is over half full?

That would also work. And for the NIU case this would be great because
I DO have a marker bit for triggering interrupts in the TX descriptors.
There's just no "all empty" interrupt on TX (who designs these
things? :( ).

> Then I guess we could overload the watchdog as a more general
> timer-after-no-xmit?

Yes, but it means that teardown of a socket can be delayed up to
the amount of that timer. Factor in all of this crazy
round_jiffies() stuff people do these days and it could cause
pauses for real use cases and drive users batty.

Probably the most profitable avenue is to see if this is a real issue
after all (see above). If we can get away with having the socket
buffer represent socket --> device space only, that's the most ideal
solution. It will probably also improve performance a lot across the
board, especially on NUMA/SMP boxes as our TX complete events tend to
be in different places than the SKB producer.
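The "test program with two clients looping out UDP packets" could start from something like the sender below. This is only a sketch of the userspace half: udp_blast() is a hypothetical name, the destination address, port and payload size are whatever the tester chooses, and the kernel-side change (calling skb_orphan() early) is not part of it.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical test client: try to send `attempts` UDP datagrams of
 * `payload_len` bytes (capped at 1400) to ip:port and return how many
 * the kernel accepted. Returns 0 if the address is not valid IPv4 or
 * the socket cannot be created. */
static unsigned long udp_blast(const char *ip, unsigned short port,
			       size_t payload_len, unsigned long attempts)
{
	char payload[1400];
	struct sockaddr_in dst;
	unsigned long sent = 0;
	int fd;

	assert(ip != NULL);
	if (payload_len > sizeof(payload))
		payload_len = sizeof(payload);
	memset(payload, 0, sizeof(payload));

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
		return 0;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return 0;

	for (unsigned long i = 0; i < attempts; i++) {
		if (sendto(fd, payload, payload_len, 0,
			   (struct sockaddr *)&dst, sizeof(dst)) ==
		    (ssize_t)payload_len)
			sent++;
	}

	close(fd);
	return sent;
}
```

One way to use it would be to run two (then 3, 4, 5, ...) processes calling udp_blast() at the same time toward a host behind the device under test, and compare how many datagrams each one got out over a fixed wall-clock interval.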
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Patrick Ohly
Date: Mon, 01 Jun 2009 21:47:22 +0200

> On Fri, 2009-05-29 at 23:44 +0930, Rusty Russell wrote:
>> This patch adds skb_orphan to the start of dev_hard_start_xmit(): it
>> can be premature in the NETDEV_TX_BUSY case, but that's uncommon.
>
> Would it be possible to make the new skb_orphan() at the start of
> dev_hard_start_xmit() conditionally so that it is not executed for
> packets that are to be time stamped?
>
> As discussed before
> (http://article.gmane.org/gmane.linux.network/121378/), the skb->sk
> socket pointer is required for sending back the send time stamp from
> inside the device driver. Calling skb_orphan() unconditionally as in
> this patch would break the hardware time stamping of outgoing packets.

Indeed, we need to check that case, at a minimum.

And there are potentially other problems. For example, I
wonder how this interacts with the new TX MMAP af_packet support
in net-next-2.6 :-/
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Rusty Russell Date: Tue, 2 Jun 2009 23:38:29 +0930 > On Tue, 2 Jun 2009 04:55:53 pm David Miller wrote: >> From: Patrick Ohly >> Date: Mon, 01 Jun 2009 21:47:22 +0200 >> >> > On Fri, 2009-05-29 at 23:44 +0930, Rusty Russell wrote: >> >> This patch adds skb_orphan to the start of dev_hard_start_xmit(): it >> >> can be premature in the NETDEV_TX_BUSY case, but that's uncommon. >> > >> > Would it be possible to make the new skb_orphan() at the start of >> > dev_hard_start_xmit() conditionally so that it is not executed for >> > packets that are to be time stamped? >> > >> > As discussed before >> > (http://article.gmane.org/gmane.linux.network/121378/), the skb->sk >> > socket pointer is required for sending back the send time stamp from >> > inside the device driver. Calling skb_orphan() unconditionally as in >> > this patch would break the hardware time stamping of outgoing packets. >> >> Indeed, we need to check that case, at a minimum. >> >> And there are other potentially other problems. For example, I >> wonder how this interacts with the new TX MMAP af_packet support >> in net-next-2.6 :-/ > > I think I'll do this in the driver for now, and let's revisit doing it > generically later? That might be the best course of action for the time being. This whole area is a rat's nest. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Rusty Russell Date: Thu, 4 Jun 2009 13:24:57 +0930 > On Thu, 4 Jun 2009 06:32:53 am Eric Dumazet wrote: >> Also, taking a reference on socket for each xmit packet in flight is very >> expensive, since it slows down receiver in __udp4_lib_lookup(). Several >> cpus are fighting for sk->refcnt cache line. > > Now we have decent dynamic per-cpu, we can finally implement bigrefs. More > obvious for device counts than sockets, but perhaps applicable here as well? It might be very beneficial for longer lasting, active, connections, but for high connection rates it's going to be a lose in my estimation. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Eric Dumazet Date: Thu, 04 Jun 2009 06:54:24 +0200 > We also can avoid the sock_put()/sock_hold() pair for each tx packet, > to only touch sk_wmem_alloc (with appropriate atomic_sub_return() in > sock_wfree() > and atomic_dec_test in sk_free > > We could initialize sk->sk_wmem_alloc to one instead of 0, so that > sock_wfree() could just synchronize itself with sk_free() Excellent idea Eric. > Patch will follow after some testing I look forward to it :-) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
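Eric's suggestion can be modelled in userspace with C11 atomics. This is a toy, not the eventual kernel patch: struct toy_sock and every toy_* function are invented for illustration, and the "free" step just sets a flag. The counter starts at one, so the owner holds a bias of 1 that only toy_sk_free() drops; whichever path brings the counter to zero does the final teardown, with no separate hold/put per packet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_sock {
	atomic_int wmem_alloc;	/* bytes in flight + owner bias of 1 */
	bool destroyed;		/* stands in for the real teardown */
};

static void toy_sock_init(struct toy_sock *sk)
{
	atomic_init(&sk->wmem_alloc, 1);	/* one instead of 0 */
	sk->destroyed = false;
}

/* Charge a queued TX packet to the socket (no refcount taken). */
static void toy_set_owner_w(struct toy_sock *sk, int truesize)
{
	assert(truesize > 0);
	atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion: uncharge, and tear down if this was the last user.
 * atomic_fetch_sub() returns the old value, so old == truesize means
 * the counter just reached zero. */
static void toy_sock_wfree(struct toy_sock *sk, int truesize)
{
	if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
		sk->destroyed = true;
}

/* Owner is done with the socket: drop the bias of 1. */
static void toy_sk_free(struct toy_sock *sk)
{
	if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
		sk->destroyed = true;
}
```

If the owner calls toy_sk_free() while packets are still queued, the teardown is deferred to the last toy_sock_wfree(); if all packets have completed first, toy_sk_free() itself performs it.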
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Herbert Xu Date: Fri, 3 Jul 2009 15:55:30 +0800 > Calling skb_orphan like this should be forbidden. Apart from the > problems already raised, it is a sign that the driver is trying to > paper over a more serious issue of not cleaning up skb's timely. > > Yes skb_orphan will work for the cases where calling the skb > destructor allows forward progress, but for the cases where you > really need to the skb to be freed (e.g., iSCSI or Xen), this > simply doesn't work. > > So anytime someone tries to propose such a solution it is a sign > that they have bigger problems. Agreed, but alas we are foaming at the mouth until we have a truly usable alternative. In particular the case of handling a device without usable TX completion event indications is still quite troublesome. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Herbert Xu
Date: Sat, 4 Jul 2009 11:08:30 +0800

> On Fri, Jul 03, 2009 at 08:02:54PM -0700, David Miller wrote:
>>
>> In particular the case of handling a device without usable TX
>> completion event indications is still quite troublesome.
>
> Which particular devices do you have in mind?

NIU

I basically can't defer interrupts because the chip supports
per-TX-desc interrupt indications but it lacks an "all TX queue
sent" event. So if I, say, tell it to interrupt every 1/4 of the
TX queue, then up to 1/4 of the queue can have packets "stuck"
in there if TX activity all of a sudden ceases.

The only thing I've come up with to be able to mitigate interrupts
is to use an hrtimer of some sort. But that's going to be hard to
get right, and who knows what kind of latencies will be introduced
for TX completion packet freeing unless I am very careful.

And finally this belongs in generic code, not in the NIU driver,
whatever we come up with. Especially since my understanding is
that this is similar to what Rusty needs.
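The "stuck" packets described here are easy to see in a toy model. The sketch below is not NIU driver code: struct toy_tx_ring and its helpers are hypothetical, and it assumes the hardware completes each descriptor immediately, so the only thing delaying reclaim is the lack of an interrupt.

```c
#include <assert.h>

struct toy_tx_ring {
	unsigned int mark_interval;	/* IRQ requested on every Nth descriptor */
	unsigned int posted;		/* descriptors handed to the hardware */
	unsigned int reclaimed;		/* descriptors whose packets were freed */
};

static void toy_ring_init(struct toy_tx_ring *r, unsigned int mark_interval)
{
	assert(mark_interval > 0);
	r->mark_interval = mark_interval;
	r->posted = 0;
	r->reclaimed = 0;
}

/* Post one packet. Only a marked descriptor raises an interrupt, and
 * the handler then reclaims everything posted so far. */
static void toy_ring_post(struct toy_tx_ring *r)
{
	r->posted++;
	if (r->posted % r->mark_interval == 0)
		r->reclaimed = r->posted;
}

/* Packets already sent but not freed because no interrupt reported them. */
static unsigned int toy_ring_stuck(const struct toy_tx_ring *r)
{
	return r->posted - r->reclaimed;
}
```

Assuming a 256-entry ring with an interrupt requested every 64 descriptors (1/4 of the ring), a burst of 100 packets followed by silence leaves 36 packets unreclaimed, and the worst case is 63.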
Re: [PATCH] net/bridge: Add 'hairpin' port forwarding mode
From: "Fischer, Anna" Date: Thu, 13 Aug 2009 16:55:16 + > This patch adds a 'hairpin' (also called 'reflective relay') mode > port configuration to the Linux Ethernet bridge kernel module. > A bridge supporting hairpin forwarding mode can send frames back > out through the port the frame was received on. > > Hairpin mode is required to support basic VEPA (Virtual > Ethernet Port Aggregator) capabilities. > > You can find additional information on VEPA here: > http://tech.groups.yahoo.com/group/evb/ > http://www.ieee802.org/1/files/public/docs2009/new-hudson-vepa_seminar-20090514d.pdf > http://www.internet2.edu/presentations/jt2009jul/20090719-congdon.pdf > > An additional patch 'bridge-utils: Add 'hairpin' port forwarding mode' > is provided to allow configuring hairpin mode from userspace tools. > > Signed-off-by: Paul Congdon > Signed-off-by: Anna Fischer Applied to net-next-2.6 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Herbert Xu
Date: Sun, 5 Jul 2009 11:34:08 +0800

> Here's an even crazier idea that doesn't use dummy descriptors.
>
> xmit(skb)
>
>	if (TX queue contains no interrupting descriptor &&
>	    qdisc is empty)
>		mark TX descriptor as interrupting
>
>	if (TX queue now contains an interrupting descriptor &&
>	    qdisc len < 2)
>		stop queue
>
>	if (TX ring full)
>		stop queue
>
> clean()
>
>	do work
>	wake queue as per usual

I'm pretty sure that for normal TCP and UDP workloads, this is just
going to set the interrupt bit on the first packet that gets into the
queue, and then not in the rest.

TCP just loops over packets in the send queue, and at initial state
the qdisc will be empty.

It's very hard to get this to work as well as if we had a real queue
empty interrupt status event.

Even if you get upstream status from the protocols saying "there's
more packets coming" via some flag in the SKB, that only says what one
client feeding the TX ring is about to do.

It says nothing about other threads of control which are about to
start feeding packets to the same device.
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Herbert Xu Date: Wed, 19 Aug 2009 13:19:26 +1000 > I'm in the process of repeating the same experiment with cxgb3 > which hopefully should let me turn interrupts off on descriptors > while still reporting completion status. Ok, I look forward to seeing your work however it turns out. Once I see what you've done, I'll give it a spin on niu. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
From: Rusty Russell <[EMAIL PROTECTED]>
Date: Sat, 17 Mar 2007 21:33:58 +1100

> On Fri, 2007-03-16 at 13:38 -0700, Jeremy Fitzhardinge wrote:
> > David Miller wrote:
> > > Perhaps the problem can be dealt with using ELF relocations.
> > >
> > > There is another case, discussed yesterday on netdev, where run-time
> > > resolution of ELF relocations would be useful (for
> > > very-very-very-read-only variables) so if it can solve this problem
> > > too it would be nice to have a generic infrastructure for it.
> >
> > That's an interesting idea. Have you or anyone else looked at what it
> > would take to code up?
> >
> > For this case, I guess you'd walk the relocs looking for references into
> > the paravirt_ops structure. You'd need to check that was a reference
> > from an indirect jump or call instruction, which would identify a
> > patchable callsite. The offset into the pv_ops structure would identify
> > which operation is involved.
>
> I wrote a whole email on ways to do this, BUT...

The idea is _NOT_ that you go look for references to the members of
the paravirt_ops structure. That would be stupid: you wouldn't be
able to use the most efficient addressing mode on a given cpu, and
you'd be patching up indirect calls and crap like that. Just say
no...

Instead you get rid of paravirt ops completely, and you call
functions whose symbol name will not resolve in the initial kernel
link. You do an initial link of the kernel, look for the unresolved
symbols in the ELF relocation tables (just like the linker does), and
put those references into a table that is used to patch things up.
You can use standard ELF relocation code to handle this, exactly like
the code we already have for module loading in the kernel.

This idea is about 15 years old, sparc32 has been doing exactly this
via something called "btfixup" to handle the page table, TLB, and
cache differences of 15 different cpu+cache type combinations.

> #define pv_patch(call, args...) \
>	asm volatile(":");
>	call(args);
>	asm volatile("8889:"
>	[ stuff to put 8889, and call in fixup section ]

Please, use ELF and its powerful and clean existing way to do this.
:-)

> > What are the netdev requirements?
>
> Reading Ben LaHaise's (very cool!) patch, it's not clear that using
> reloc postprocessing is going to be clearer than open-coding it as he
> has done.

Ben's case can be handled in the same way. Just do not define the
symbols, pre-link, look for the references in the relocation tables,
and run through that when you do the set_very_readonly() or
install_paravirt_ops() thing.
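The "table that is used to patch things up" can be pictured with a userspace analogy. This is a loose sketch under heavy assumptions, not btfixup and not real ELF relocation processing: each fixup_entry records one location that referenced an unresolved symbol, and apply_fixups() walks the table and writes the chosen target into every recorded location. Plain function-pointer slots stand in for the call instructions that a real implementation would rewrite in place (which is what lets the kernel avoid indirect calls), and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*fixup_fn)(void);

/* Symbols deliberately left unresolved at the initial link (hypothetical). */
enum fixup_sym { SYM_IRQ_DISABLE, SYM_IRQ_ENABLE, SYM_COUNT };

/* One entry per reference found in the relocation tables. */
struct fixup_entry {
	fixup_fn *site;		/* location to patch */
	enum fixup_sym sym;	/* symbol that location referred to */
};

static int irq_state = 1;
static void native_irq_disable(void) { irq_state = 0; }
static void native_irq_enable(void)  { irq_state = 1; }

/* Stand-ins for three callsites scattered through the image. */
static fixup_fn site_a, site_b, site_c;

static const struct fixup_entry fixup_table[] = {
	{ &site_a, SYM_IRQ_DISABLE },
	{ &site_b, SYM_IRQ_ENABLE },
	{ &site_c, SYM_IRQ_DISABLE },
};

/* Walk the whole table and point every site at the implementation
 * selected for its symbol. Returns the number of sites patched. */
static size_t apply_fixups(const fixup_fn targets[SYM_COUNT])
{
	size_t n = sizeof(fixup_table) / sizeof(fixup_table[0]);

	assert(targets != NULL);
	for (size_t i = 0; i < n; i++)
		*fixup_table[i].site = targets[fixup_table[i].sym];
	return n;
}
```

Because the table is kept around, the same walk can be repeated with a different set of targets, which is roughly the role install_paravirt_ops() or set_very_readonly() plays in the message above.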
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
From: Andi Kleen <[EMAIL PROTECTED]> Date: Mon, 19 Mar 2007 11:57:28 +0100 > On Monday 19 March 2007 00:46, Jeremy Fitzhardinge wrote: > > Andi Kleen wrote: > > For example, say we wanted to put a general call for sti into entry.S, > > where its expected it won't touch any registers. In that case, we'd > > have a sequence like: > > > > push %eax > > push %ecx > > push %edx > > call paravirt_cli > > pop %edx > > pop %ecx > > pop %eax > > This cannot right now be expressed as inline assembly in the unwinder at all > because there is no way to inject the push/pops into the compiler generated > ehframe tables. > > [BTW I plan to resubmit the unwinder with some changes] Its inability to handle sequences like the above sounds to me like a very good argument to _not_ merge the unwinder back into the tree. To me, that unwinder is nothing but trouble: it severely limits the cases in which you can use special calling conventions via inline asm (and we have done that on several occasions), and even ignoring that, the unwinder only works half the time. Please don't subject us to another couple months of hair-pulling only to have Linus yank the thing out again, there are certainly more useful things to spend time on :-) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
From: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Date: Mon, 19 Mar 2007 12:10:08 -0700 > All this is doable; I'd probably end up hacking boot/compressed/relocs.c > to generate the appropriate reloc table. My main concern is hacking the > kernel build process itself; I'm unsure of what it would actually take > to implement all this. 32-bit Sparc's btfixup might be usable as a guide. Another point worth making is that for function calls you can fix things up lazily if you want. So you link, build the reloc tables, then link in a *.o file that does provide the functions in the form of stubs. The stubs intercept the call, and patch the callsite, then revector to the real handler. I don't like this idea actually because it essentially means you either: 1) Only allow one setting of the operations OR 2) Need to have code which walks the whole reloc table anyways to handle settings after the first so you can revector everyone back to the stubs and lazy reloc properly again In fact forget I mentioned this idea :) As another note, I do agree with Linus about the register usage arguments. It is important. I think it's been mentioned but what you could do is save nothing (so that "sti" and "cli" are just that and cost nothing), but the more complicated versions save and restore enough registers to operate. It all depends upon what you're trying to do. For example, it's easy to use patching to make different PTE layouts be supportable in the same kernel image. We do this on sparc64 since sun4v has a different PTE layout than sun4u, you can see the code in asm-sparc64/pgtable.h for details (search for "sun4v_*_patch") ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
From: Linus Torvalds <[EMAIL PROTECTED]> Date: Mon, 19 Mar 2007 20:18:14 -0700 (PDT) > > > Please don't subject us to another couple months of hair-pulling only > > > to have Linus yank the thing out again, there are certainly more > > > useful things to spend time on :-) > > Good call. Dwarf2 unwinding simply isn't worth doing. But I won't yank it > out, I simply won't merge it. It was more than just totally buggy code, it > was an inability of the people to understand that even bugfree code > isn't enough - you have to be able to also handle buggy data. Thank you. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
From: Paul Mackerras <[EMAIL PROTECTED]> Date: Wed, 21 Mar 2007 11:03:14 +1100 > Linus Torvalds writes: > > > We should just do this natively. There's been several tests over the years > > saying that it's much more efficient to do sti/cli as a simple store, and > > handling the "oops, we got an interrupt while interrupts were disabled" as > > a special case. > > > > I have this dim memory that ARM has done it that way for a long time > > because it's so expensive to do a "real" cli/sti. > > > > And I think -rt does it for other reasons. It's just more flexible. > > 64-bit powerpc does this now as well. I was curious about this so I had a look. There appear to be three pieces of state used to manage this on powerpc, PACASOFTIRQEN(r13), PACAHARDIRQEN(r13) and the SOFTE() in the stackframe. Plus there is all of this complicated logic on trap entry and exit to manage these three values properly. local_irq_restore() doesn't look like a simple piece of code either. Logically it should be simple, update the software binary state, and if enabling see if any interrupts came in while we were disabled so we can run them. Given all of that, is it really cheaper than just flipping the bit in the cpu control register? :-/ ___ Virtualization mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/virtualization
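The "logically simple" version of software interrupt masking described above can be sketched as follows. This is a userspace model under stated assumptions, not the powerpc implementation: struct soft_irq_state and all function names are hypothetical, a single pending flag stands in for whatever the hardware would latch, and none of the trap entry/exit bookkeeping mentioned in the message is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; this is not the powerpc PACA layout. */
struct soft_irq_state {
    bool soft_enabled;   /* toggled by the software cli/sti */
    bool pending;        /* an interrupt arrived while soft-disabled */
    int handled;         /* number of interrupts actually run */
};

static void run_handler(struct soft_irq_state *s)
{
    s->handled++;
}

/* Software "cli": a plain store, no privileged instruction. */
static void soft_irq_disable(struct soft_irq_state *s)
{
    s->soft_enabled = false;
}

/* Interrupt entry: run the handler now, or only note the interrupt if
 * we are soft-disabled. Real hardware would also be masked at this
 * point so the same interrupt does not fire again immediately. */
static void irq_arrives(struct soft_irq_state *s)
{
    if (s->soft_enabled)
        run_handler(s);
    else
        s->pending = true;
}

/* Software "sti": store, then replay anything that arrived while we
 * were disabled. Several arrivals collapse into one replay here. */
static void soft_irq_enable(struct soft_irq_state *s)
{
    s->soft_enabled = true;
    if (s->pending) {
        s->pending = false;
        run_handler(s);
    }
}
```

The disable path is a single store, which is the claimed win. The cost moves into the enable path and the interrupt entry path, which is exactly where the message above questions whether the trade is worth it.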
Re: [2/2] [NET] link_watch: Remove delay for up even when we're down
From: Herbert Xu <[EMAIL PROTECTED]> Date: Tue, 8 May 2007 22:16:09 +1000 > [NET]: Remove link_watch delay for up even when we're down > > Currently all link carrier events are delayed by up to a second > before they're processed to prevent link storms. This causes > unnecessary packet loss during that interval. > > In fact, we can achieve the same effect in preventing storms by > only delaying down events and unnecssary up events. The latter > is defined as up events when we're already up. > > Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Also applied, thanks Herbert. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [1/2] [NET] link_watch: Move link watch list into net_device
From: Herbert Xu <[EMAIL PROTECTED]> Date: Tue, 8 May 2007 22:13:22 +1000 > [NET] link_watch: Move link watch list into net_device > > These days the link watch mechanism is an integral part of the > network subsystem as it manages the carrier status. So it now > makes sense to allocate some memory for it in net_device rather > than allocating it on demand. > > In fact, this is necessary because we can't tolerate a memory > allocation failure since that means we'd have to potentially > throw a link up event away. > > It also simplifies the code greatly. > > In doing so I discovered a subtle race condition in the use > of singleevent. This race condition still exists (and is > somewhat magnified) without singleevent but it's now plugged > thanks to an smp_mb__before_clear_bit. > > Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Applied, thanks Herbert. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [1/2] [NET] link_watch: Move link watch list into net_device
From: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Date: Thu, 10 May 2007 15:00:05 -0700 > Herbert Xu wrote: > > [NET] link_watch: Move link watch list into net_device > > > > These days the link watch mechanism is an integral part of the > > network subsystem as it manages the carrier status. So it now > > makes sense to allocate some memory for it in net_device rather > > than allocating it on demand. > > I think there's a problem with one of these two patches. Yes, there are :-) Did you catch the follow-on bug fixes? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [1/2] [NET] link_watch: Move link watch list into net_device
From: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Date: Thu, 10 May 2007 15:22:17 -0700 > Andrew Morton wrote: > > Five minutes after boot is when jiffies wraps. Are you sure it's > > a list-screwup rather than a jiffy-wrap screwup? > > > > > Hm, its suggestive, isn't it? Apparently they've already fixed this in > the sekret networking clubhouse, so I'll need to track it down. I'm not so certain now that we know it's the jiffies wrap point :-) The fixes in question are attached below and they were posted and discussed on netdev: commit fe47cdba83b3041e4ac1aa1418431020a4afe1e0 Author: Herbert Xu <[EMAIL PROTECTED]> Date: Tue May 8 23:22:43 2007 -0700 [NET] link_watch: Eliminate potential delay on wrap-around When the jiffies wrap around or when the system boots up for the first time, down events can be delayed indefinitely since we no longer update linkwatch_nextevent when only urgent events are processed. This patch fixes this by setting linkwatch_nextevent when a wrap-around occurs. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Signed-off-by: David S. Miller <[EMAIL PROTECTED]> diff --git a/net/core/link_watch.c b/net/core/link_watch.c index b5f4579..4674ae5 100644 --- a/net/core/link_watch.c +++ b/net/core/link_watch.c @@ -101,8 +101,10 @@ static void linkwatch_schedule_work(unsigned long delay) return; /* If we wrap around we'll delay it by at most HZ. */ - if (delay > HZ) + if (delay > HZ) { + linkwatch_nextevent = jiffies; delay = 0; + } schedule_delayed_work(&linkwatch_work, delay); } commit 4cba637dbb9a13706494a1c85174c8e736914010 Author: Herbert Xu <[EMAIL PROTECTED]> Date: Wed May 9 00:17:30 2007 -0700 [NET] link_watch: Always schedule urgent events Urgent events may be delayed if we already have a non-urgent event queued for that device. This patch changes this by making sure that an urgent event is always looked at immediately. I've replaced the LW_RUNNING flag by LW_URGENT since whether work is scheduled is already kept track by the work queue system. 
The only complication is that we have to provide some exclusion for the setting linkwatch_nextevent which is available in the actual work function. Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Signed-off-by: David S. Miller <[EMAIL PROTECTED]> diff --git a/net/core/link_watch.c b/net/core/link_watch.c index 4674ae5..a5e372b 100644 --- a/net/core/link_watch.c +++ b/net/core/link_watch.c @@ -26,7 +26,7 @@ enum lw_bits { - LW_RUNNING = 0, + LW_URGENT = 0, }; static unsigned long linkwatch_flags; @@ -95,18 +95,41 @@ static void linkwatch_add_event(struct net_device *dev) } -static void linkwatch_schedule_work(unsigned long delay) +static void linkwatch_schedule_work(int urgent) { - if (test_and_set_bit(LW_RUNNING, &linkwatch_flags)) + unsigned long delay = linkwatch_nextevent - jiffies; + + if (test_bit(LW_URGENT, &linkwatch_flags)) return; - /* If we wrap around we'll delay it by at most HZ. */ - if (delay > HZ) { - linkwatch_nextevent = jiffies; + /* Minimise down-time: drop delay for up event. */ + if (urgent) { + if (test_and_set_bit(LW_URGENT, &linkwatch_flags)) + return; delay = 0; } - schedule_delayed_work(&linkwatch_work, delay); + /* If we wrap around we'll delay it by at most HZ. */ + if (delay > HZ) + delay = 0; + + /* +* This is true if we've scheduled it immeditately or if we don't +* need an immediate execution and it's already pending. +*/ + if (schedule_delayed_work(&linkwatch_work, delay) == !delay) + return; + + /* Don't bother if there is nothing urgent. */ + if (!test_bit(LW_URGENT, &linkwatch_flags)) + return; + + /* It's already running which is good enough. */ + if (!cancel_delayed_work(&linkwatch_work)) + return; + + /* Otherwise we reschedule it again for immediate exection. */ + schedule_delayed_work(&linkwatch_work, 0); } @@ -123,7 +146,11 @@ static void __linkwatch_run_queue(int urgent_only) */ if (!urgent_only) linkwatch_nextevent = jiffies + HZ; - clear_bit(LW_RUNNING, &linkwatch_flags); + /* Limit wrap-around effect on delay. 
*/ + else if (time_after(linkwatch_nextevent, jiffies + HZ)) + linkwatch_nextevent = jiffies; + + clear_bit(LW_URGENT, &linkwatch_flags); spin_lock_irq(&lweventlist_lock); next = lweventlist; @@ -166,7 +193,7 @@ static void __linkwatch_run_queue(int urgent_only) } if (lweventlist) - linkwatch_schedule_work(linkwatch_nextev
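The wrap-around handling in these two fixes relies on unsigned tick arithmetic. A minimal standalone sketch of that arithmetic follows; the 32-bit counter width, the HZ value, and the names tick_after and clamp_delay are assumptions made for illustration, and the kernel itself uses unsigned long jiffies and the time_after() macro.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 1000u   /* assumed tick rate for this sketch */

/* Same idea as the kernel's time_after(a, b): true if a is later than
 * b. Comparing through a signed difference keeps the answer correct
 * across counter wrap-around (two's complement assumed). */
static int tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Ticks to wait until next_event. The subtraction is unsigned, so a
 * next_event that is already in the past shows up as a huge value;
 * anything larger than HZ is treated as "run now". */
static uint32_t clamp_delay(uint32_t next_event, uint32_t now)
{
    uint32_t delay = next_event - now;   /* wraps modulo 2^32 */

    assert(HZ > 0);
    if (delay > HZ)
        delay = 0;
    return delay;
}
```

With now just below the wrap point and next_event just above zero, the unsigned subtraction still yields the small positive distance between them, which is why the clamp only needs the single "delay > HZ" test.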
Re: [1/2] [NET] link_watch: Move link watch list into net_device
From: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Date: Thu, 10 May 2007 15:45:42 -0700 > David Miller wrote: > > I'm not so certain now that we know it's the jiffies wrap point :-) > > > > The fixes in question are attached below and they were posted and > > discussed on netdev: > > > > Yep, this patch gets rid of my spinning thread. I can't find this patch > or any discussion on marc.info; is there a better netdev list archive? I don't see it there either... let me check my mail archive... Indeed, they were "posted" to netdev but were blocked by the vger regexp filters on the keyword "urgent", so the postings never made it to the list. I removed that filter regexp so that never happens again, sorry. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 3/3] Virtio draft IV: the net driver
From: Christian Borntraeger <[EMAIL PROTECTED]> Date: Wed, 11 Jul 2007 12:45:40 +0200 > On Wednesday, 4 July 2007, Rusty Russell wrote: > > +static void receive_skb(struct net_device *dev, struct sk_buff *skb, > [...] > > + netif_rx(skb); > > In the NAPI case, we should use netif_receive_skb, no? NAPI doesn't make sense for virtual devices, my Sun LDOM network driver won't use NAPI either. It's also too cumbersome to use NAPI with the way virtualized network drivers work (multiple ports, each with an interrupt source, not just one) until the NAPI split patches are ported and applied upstream and that won't be for a while. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 3/3] Virtio draft IV: the net driver
From: Rusty Russell <[EMAIL PROTECTED]> Date: Thu, 12 Jul 2007 12:21:33 +1000 > Dave, I think you're the only one (so far?) with multiple irqs. Luckily there are known hw implementations with that issue so I won't be weird for long :) > It's not clear that guest-controlled interrupt mitigation is the best > approach for virtual devices, but at the moment it doesn't hurt. It would be nice for consistency's sake, once it is easy to do so. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/3] skb_partial_csum_set
From: Rusty Russell <[EMAIL PROTECTED]> Date: Tue, 15 Jan 2008 21:41:55 +1100 > Implement skb_partial_csum_set, for setting partial csums on untrusted > packets. > > Use it in virtio_net (replacing buggy version there), it's also going > to be used by TAP for partial csum support. > > Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> Looks fine to me. Acked-by: David S. Miller <[EMAIL PROTECTED]> If you like I can merge this into my net-2.6.25 tree, or alternatively, if it makes your life easier, you can handle it yourself. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH for-3.7] vhost: fix mergeable bufs on BE hosts
From: "Michael S. Tsirkin" Date: Sun, 21 Oct 2012 14:49:01 +0200 > On Mon, Oct 15, 2012 at 07:55:34PM +0200, Michael S. Tsirkin wrote: >> We copy head count to a 16 bit field, >> this works by chance on LE but on BE >> guest gets 0. Fix it up. >> >> Signed-off-by: Michael S. Tsirkin >> Tested-by: Alexander Graf >> Cc: sta...@kernel.org > > Ping. Dave, could you apply this to -net please? Pinging me but not cc:'ing me? That's really strange. What if I operate by just mass deleting things that I'm not explicitly on the To: or CC: when I'm very backlogged? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH for-3.7] vhost: fix mergeable bufs on BE hosts
From: "Michael S. Tsirkin" Date: Wed, 24 Oct 2012 18:24:38 +0200 > Would you like me to repost the patch? This question is almost rhetorical. I said I don't reliably read things I'm not explicitly CC:'d on, therefore it's possible (and in fact, likely) I don't have the patch in my inbox. What do you think you should do? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH repost for-3.7] vhost: fix mergeable bufs on BE hosts
From: "Michael S. Tsirkin" Date: Wed, 24 Oct 2012 20:37:51 +0200 > We copy head count to a 16 bit field, this works by chance on LE but on > BE guest gets 0. Fix it up. > > Signed-off-by: Michael S. Tsirkin > Tested-by: Alexander Graf > Cc: sta...@vger.kernel.org Applied, thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
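The commit message's "works by chance on LE but on BE guest gets 0" can be reproduced on any host by laying the bytes out explicitly. The vhost code itself is not shown in this thread, so the sketch below only models the pattern described (copying the first two bytes of a wider count into a 16-bit field); store32, load16, buggy_copy, and fixed_copy are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Lay out a 32-bit value in memory the way a little- or big-endian
 * CPU would store it. */
static void store32(uint8_t out[4], uint32_t v, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        out[i] = (uint8_t)(v >> shift);
    }
}

/* Read two bytes back as a 16-bit value in the given byte order. */
static uint16_t load16(const uint8_t in[2], int big_endian)
{
    return big_endian ? (uint16_t)((in[0] << 8) | in[1])
                      : (uint16_t)((in[1] << 8) | in[0]);
}

/* Buggy pattern: copy the first two bytes of a 32-bit count into a
 * 16-bit field. On LE those are the low bytes, on BE the high bytes. */
static uint16_t buggy_copy(uint32_t headcount, int big_endian)
{
    uint8_t wide[4], field[2];

    store32(wide, headcount, big_endian);
    memcpy(field, wide, sizeof(field));
    return load16(field, big_endian);
}

/* Fixed pattern: narrow to 16 bits first, then store the narrow value
 * in the target byte order. */
static uint16_t fixed_copy(uint32_t headcount, int big_endian)
{
    uint16_t narrow = (uint16_t)headcount;
    uint8_t field[2];

    assert(headcount <= UINT16_MAX);   /* the count must fit the field */
    field[0] = big_endian ? (uint8_t)(narrow >> 8) : (uint8_t)narrow;
    field[1] = big_endian ? (uint8_t)narrow : (uint8_t)(narrow >> 8);
    return load16(field, big_endian);
}
```

For a head count of 3, the buggy copy yields 3 with little-endian layout and 0 with big-endian layout, matching the symptom in the commit message.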
Re: [PATCHv2 net-next 1/8] skb: report completion status for zero copy skbs
From: "Michael S. Tsirkin" Date: Wed, 31 Oct 2012 12:31:06 +0200 > -void vhost_zerocopy_callback(struct ubuf_info *ubuf) > +void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status) If you're only reporting true/false values, even just for now, please use 'bool' for this. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCHv2 net-next 1/8] skb: report completion status for zero copy skbs
From: "Michael S. Tsirkin" Date: Thu, 1 Nov 2012 18:16:11 +0200 > Do you think it's over-engineering, or a good idea? Engineer what you need, not what you might need. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCHv3 net-next 0/8] enable/disable zero copy tx dynamically
From: "Michael S. Tsirkin" Date: Thu, 1 Nov 2012 21:16:17 +0200 > > tun supports zero copy transmit since > 0690899b4d4501b3505be069b9a687e68ccbe15b, > however you can only enable this mode if you know your workload does not > trigger heavy guest to host/host to guest traffic - otherwise you > get a (minor) performance regression. > This patchset addresses this problem by notifying the owner > device when callback is invoked because of a data copy. > This makes it possible to detect whether zero copy is appropriate > dynamically: we start in zero copy mode, when we detect > data copied we disable zero copy for a while. > > With this patch applied, I get the same performance for > guest to host and guest to guest both with and without zero copy tx. Series applied, thanks Michael. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
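The behaviour described in the cover letter (start in zero copy mode, and when a completion reports that the data was copied anyway, fall back to plain copy for a while) can be sketched as a small state machine. The patchset's actual heuristic is not shown in this thread, so the back-off window, struct zcopy_state, and both function names are assumptions. The completion status is a bool, in line with the review comment on patch 1/8.

```c
#include <assert.h>
#include <stdbool.h>

#define ZCOPY_BACKOFF_PKTS 64u   /* hypothetical back-off window */

struct zcopy_state {
    unsigned int backoff;   /* packets still to be sent with plain copy */
};

/* Decide how to send the next packet. Returns true for zero copy.
 * While backing off, each call consumes one packet of the window. */
static bool zcopy_select(struct zcopy_state *s)
{
    if (s->backoff == 0)
        return true;
    s->backoff--;
    return false;
}

/* Completion callback. `copied` is true when the stack had to copy
 * the data after all, which makes zero copy a net loss. */
static void zcopy_complete(struct zcopy_state *s, bool copied)
{
    assert(s);
    if (copied)
        s->backoff = ZCOPY_BACKOFF_PKTS;
}
```

A fixed window is the simplest possible policy; a ratio of copied to zero-copied completions over a recent interval would be an equally plausible design.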
Re: [PATCH 0/6] VSOCK for Linux upstreaming
The big and only question is whether anyone can actually use any of this stuff without your proprietary bits? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/6] VSOCK for Linux upstreaming
From: David Miller Date: Mon, 05 Nov 2012 13:09:17 -0500 (EST) > The big and only question is whether anyone can actually use any of > this stuff without your proprietary bits? And BTW vm-crosst...@vmware.com bounces, take it out of the CC: list on all future emails. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/1] vhost: Remove duplicate inclusion of linux/vhost.h
From: Sachin Kamat Date: Mon, 19 Nov 2012 16:58:28 +0530 > linux/vhost.h was included twice. > > Signed-off-by: Sachin Kamat Michael, are you gonna take this? Thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/1] vhost: Remove duplicate inclusion of linux/vhost.h
From: "Michael S. Tsirkin" Date: Mon, 19 Nov 2012 21:49:55 +0200 > On Mon, Nov 19, 2012 at 02:18:13PM -0500, David Miller wrote: >> From: Sachin Kamat >> Date: Mon, 19 Nov 2012 16:58:28 +0530 >> >> > linux/vhost.h was included twice. >> > >> > Signed-off-by: Sachin Kamat >> >> Michael, are you gonna take this? >> >> Thanks. > > Pls pick it up. > > Acked-by: Michael S. Tsirkin Applied to net-next, thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vhost-blk: Add vhost-blk support v5
From: "Michael S. Tsirkin" Date: Mon, 26 Nov 2012 17:14:16 +0200 > On Mon, Nov 19, 2012 at 10:26:41PM +0200, Michael S. Tsirkin wrote: >> > >> > Userspace bits: >> > - >> > 1) LKVM >> > The latest vhost-blk userspace bits for kvm tool can be found here: >> > g...@github.com:asias/linux-kvm.git blk.vhost-blk >> > >> > 2) QEMU >> > The latest vhost-blk userspace prototype for QEMU can be found here: >> > g...@github.com:asias/qemu.git blk.vhost-blk >> > >> > Changes in v5: >> > - Do not assume the buffer layout >> > - Fix wakeup race >> > >> > Changes in v4: >> > - Mark req->status as userspace pointer >> > - Use __copy_to_user() instead of copy_to_user() in vhost_blk_set_status() >> > - Add if (need_resched()) schedule() in blk thread >> > - Kill vhost_blk_stop_vq() and move it into vhost_blk_stop() >> > - Use vq_err() instead of pr_warn() >> > - Fail un Unsupported request >> > - Add flush in vhost_blk_set_features() >> > >> > Changes in v3: >> > - Sending REQ_FLUSH bio instead of vfs_fsync, thanks Christoph! >> > - Check file passed by user is a raw block device file >> > >> > Signed-off-by: Asias He >> >> Since there are files shared by this and vhost net >> it's easiest for me to merge this all through the >> vhost tree. > > Hi Dave, are you ok with this proposal? I have no problems with this, for networking parts: Acked-by: David S. Miller ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vhost-net: initialize zcopy packet counters
From: "Michael S. Tsirkin" Date: Mon, 3 Dec 2012 19:31:51 +0200 > These packet counters are used to drive the zercopy > selection heuristic so nothing too bad happens if they are off a bit - > and they are also reset once in a while. > But it's cleaner to clear them when backend is set so that > we start in a known state. > > Signed-off-by: Michael S. Tsirkin Applied to net-next, thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v3 0/3] Multiqueue support in virtio-net
From: Jason Wang Date: Sat, 8 Dec 2012 01:04:54 +0800 > This series is an update version (hope the final version) of multiqueue > (VIRTIO_NET_F_MQ) support in virtio-net driver. All previous comments were > addressed, the work were based on Krishna Kumar's work to let virtio-net use > multiple rx/tx queues to do the packets reception and transmission. > Performance > test show the aggregate latency were increased greately but may get some > regression in small packet transmission. Due to this, multiqueue were disabled > by default. If user want to benefit form the multiqueue, ethtool -L could be > used to enable the feature. > > Please review and comments. > > A protype implementation of qemu-kvm support could by found in > git://github.com/jasowang/qemu-kvm-mq.git. To start a guest with two queues, > you > could specify the queues parameters to both tap and virtio-net like: > > ./qemu-kvm -netdev tap,queues=2,... -device virtio-net-pci,queues=2,... > > then enable the multiqueue through ethtool by: > > ethtool -L eth0 combined 2 It seems like most, if not all, of the feedback given for this series has been addressed by Jason. Can I get some ACKs? Thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v3 0/3] Multiqueue support in virtio-net
From: Jason Wang Date: Sat, 8 Dec 2012 01:04:54 +0800 > This series is an update version (hope the final version) of multiqueue > (VIRTIO_NET_F_MQ) support in virtio-net driver. All previous comments were > addressed, the work were based on Krishna Kumar's work to let virtio-net use > multiple rx/tx queues to do the packets reception and transmission. > Performance > test show the aggregate latency were increased greately but may get some > regression in small packet transmission. Due to this, multiqueue were disabled > by default. If user want to benefit form the multiqueue, ethtool -L could be > used to enable the feature. These changes look fine to me, applied, thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/6] VSOCK for Linux upstreaming
From: Greg KH Date: Tue, 8 Jan 2013 16:21:10 -0800 > On Tue, Jan 08, 2013 at 03:59:08PM -0800, George Zhang wrote: >> >> * * * >> >> This series of VSOCK linux upstreaming patches include latest udpate from >> VMware to address Greg's and all other's code review comments. > > Dave, you acked these patches a while ago, Really? I'd like to see where I did that. Instead, what I remember doing was deferring to the feedback these folks received, stating that ideas that the virtio people had mentioned should be considered instead. http://marc.info/?l=linux-netdev&m=135301515818462&w=2 So definitely NACK this code and any infrastructure you've merged which essentially depends upon it. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Pv-drivers] [PATCH 0/6] VSOCK for Linux upstreaming
From: Dmitry Torokhov Date: Tue, 08 Jan 2013 17:41:44 -0800 > On Tuesday, January 08, 2013 05:30:56 PM David Miller wrote: >> From: Greg KH >> Date: Tue, 8 Jan 2013 16:21:10 -0800 >> >> > On Tue, Jan 08, 2013 at 03:59:08PM -0800, George Zhang wrote: >> >> * * * >> >> >> >> This series of VSOCK linux upstreaming patches include latest udpate from >> >> VMware to address Greg's and all other's code review comments. >> > >> > Dave, you acked these patches a while ago, >> >> Really? I'd like to see where I did that. >> >> Instead, what I remember doing was deferring to the feedback these >> folks received, stating that ideas that the virtio people had >> mentioned should be considered instead. >> >> http://marc.info/?l=linux-netdev&m=135301515818462&w=2 > > I believe Andy replied to Anthony's AF_VMCHANNEL post and the differences > between the proposed solutions. I'd much rather see a hypervisor neutral solution than a hypervisor specific one which this certainly is. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 2/3] net: split eth_mac_addr for better error handling
From: ak...@redhat.com Date: Sun, 20 Jan 2013 10:43:08 +0800 > From: Stefan Hajnoczi > > When we set mac address, software mac address in system and hardware mac > address all need to be updated. Current eth_mac_addr() doesn't allow > callers to implement error handling nicely. > > This patch split eth_mac_addr() to prepare part and real commit part, > then we can prepare first, and try to change hardware address, then do > the real commit if hardware address is set successfully. > > Signed-off-by: Stefan Hajnoczi > Signed-off-by: Amos Kong This patch doesn't apply to net-next. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
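The prepare/commit split described in the commit message can be sketched like this. It is a userspace model, not the patch: struct fake_dev, the hw_fails knob, and the mac_* function names are hypothetical, and the validity check (reject multicast and all-zero addresses) is an assumption about what "prepare" would verify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

struct fake_dev {
    uint8_t mac[MAC_LEN];   /* software copy of the address */
    bool hw_fails;          /* test knob: make the hardware step fail */
};

/* Step 1: validate only, change nothing. */
static int mac_prepare(const uint8_t addr[MAC_LEN])
{
    static const uint8_t zero[MAC_LEN];

    if ((addr[0] & 1) || memcmp(addr, zero, MAC_LEN) == 0)
        return -1;
    return 0;
}

/* Step 2 (driver specific): program the hardware, which may fail. */
static int mac_set_hw(struct fake_dev *dev, const uint8_t addr[MAC_LEN])
{
    (void)addr;
    return dev->hw_fails ? -1 : 0;
}

/* Step 3: commit to the software state. This step cannot fail. */
static void mac_commit(struct fake_dev *dev, const uint8_t addr[MAC_LEN])
{
    memcpy(dev->mac, addr, MAC_LEN);
}

/* Driver entry point: the software address changes only after the
 * hardware accepted the new one, so no rollback is needed. */
static int set_mac(struct fake_dev *dev, const uint8_t addr[MAC_LEN])
{
    assert(dev && addr);
    if (mac_prepare(addr) < 0)
        return -1;
    if (mac_set_hw(dev, addr) < 0)
        return -1;
    mac_commit(dev, addr);
    return 0;
}
```

Ordering the steps this way means a hardware failure leaves the software and hardware addresses consistent with each other, which is the error-handling improvement the commit message is after.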
Re: [PATCH v5 0/3] make mac programming for virtio net more robust
From: Amos Kong Date: Mon, 21 Jan 2013 19:17:20 +0800 > Currenly mac is programmed byte by byte. This means that we > have an intermediate step where mac is wrong. > > Third patch introduced a new vq control command to set mac > address, it's atomic. > > V2: check return of sending command, delay eth_mac_addr() > V3: restore software address when fail to set hardware address > V4: split eth_mac_addr, fix error handle > V5: rebase patches to net-next tree I'll apply this series, thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V8 1/3] virtio-net: fix the set affinity bug when CPU IDs are not consecutive
From: Wanlong Gao Date: Fri, 25 Jan 2013 17:51:29 +0800 > As Michael mentioned, set affinity and select queue will not work very > well when CPU IDs are not consecutive, this can happen with hot unplug. > Fix this bug by traversal the online CPUs, and create a per cpu variable > to find the mapping from CPU to the preferable virtual-queue. > > Cc: Rusty Russell > Cc: "Michael S. Tsirkin" > Cc: Jason Wang > Cc: Eric Dumazet > Cc: "David S. Miller" > Cc: virtualization@lists.linux-foundation.org > Cc: net...@vger.kernel.org > Signed-off-by: Wanlong Gao > Acked-by: Michael S. Tsirkin Applied. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
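The fix described above (walk the online CPUs and record a preferred queue per CPU, rather than using the CPU ID itself as the queue index) can be sketched as follows. This is a model under assumptions: NR_CPUS, the plain arrays standing in for the online mask and the per-cpu variable, and the wrap-around when there are more CPUs than queues are all illustrative choices, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8   /* assumed maximum CPU count for this sketch */

/* Fill cpu_to_vq[] by walking the online CPUs in order and handing out
 * queue indexes sequentially. Offline CPUs get -1. If there are more
 * online CPUs than queues, the indexes wrap around. */
static void map_cpus_to_queues(const bool online[NR_CPUS], int nqueues,
                               int cpu_to_vq[NR_CPUS])
{
    int next = 0;

    assert(nqueues > 0);
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!online[cpu]) {
            cpu_to_vq[cpu] = -1;
            continue;
        }
        cpu_to_vq[cpu] = next;
        next = (next + 1) % nqueues;
    }
}
```

With CPUs 0, 2, and 5 online and two queues, indexing queues by raw CPU ID would ask for queues 2 and 5, which do not exist; the walk above yields queues 0, 1, and 0 instead.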
Re: [PATCH V8 2/3] virtio-net: split out clean affinity function
From: Wanlong Gao Date: Fri, 25 Jan 2013 17:51:30 +0800 > Split out the clean affinity function to virtnet_clean_affinity(). > > Cc: Rusty Russell > Cc: "Michael S. Tsirkin" > Cc: Jason Wang > Cc: Eric Dumazet > Cc: "David S. Miller" > Cc: virtualization@lists.linux-foundation.org > Cc: net...@vger.kernel.org > Signed-off-by: Wanlong Gao > Acked-by: Michael S. Tsirkin Applied. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V8 3/3] virtio-net: reset virtqueue affinity when doing cpu hotplug
From: Wanlong Gao Date: Fri, 25 Jan 2013 17:51:31 +0800 > Add a cpu notifier to virtio-net, so that we can reset the > virtqueue affinity if the cpu hotplug happens. It improve > the performance through enabling or disabling the virtqueue > affinity after doing cpu hotplug. > > Cc: Rusty Russell > Cc: "Michael S. Tsirkin" > Cc: Jason Wang > Cc: Eric Dumazet > Cc: "David S. Miller" > Cc: virtualization@lists.linux-foundation.org > Cc: net...@vger.kernel.org > Signed-off-by: Wanlong Gao > Acked-by: Michael S. Tsirkin Applied. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/8] drivers/net: Remove unnecessary alloc/OOM messages
From: Joe Perches Date: Sun, 3 Feb 2013 19:28:07 -0800 > Remove all the OOM messages that follow kernel alloc > failures as there is already a generic equivalent to > these messages in the mm subsystem. > > Joe Perches (8): > caif: Remove unnecessary alloc/OOM messages > can: Remove unnecessary alloc/OOM messages > ethernet: Remove unnecessary alloc/OOM messages, alloc cleanups > drivers: net: usb: Remove unnecessary alloc/OOM messages > wan: Remove unnecessary alloc/OOM messages > wimax: Remove unnecessary alloc/OOM messages, alloc cleanups > wireless: Remove unnecessary alloc/OOM messages, alloc cleanups > drivers:net:misc: Remove unnecessary alloc/OOM messages Series applied, thanks Joe. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Pv-drivers] [PATCH 0/1] VM Sockets for Linux upstreaming
From: Dmitry Torokhov
Date: Fri, 8 Feb 2013 17:20:44 -0800

> Hi David,
>
> On Wed, Feb 06, 2013 at 04:23:55PM -0800, Andy King wrote:
>> In an effort to improve the out-of-the-box experience with Linux kernels for
>> VMware users, VMware is working on readying the VM Sockets (VSOCK, formerly
>> VMCI Sockets) (vsock) kernel module for inclusion in the Linux kernel. The
>> purpose of this post is to acquire feedback on the vsock kernel module.
>>
>> Unlike previous submissions, where the new socket family was entirely reliant
>> on VMware's VMCI PCI device (and thus VMware's hypervisor), VM Sockets is now
>> completely[1] separated out into two parts, each in its own module:
>>
>> o Core socket code, which is transport-neutral and invokes transport
>>   callbacks to communicate with the hypervisor. This is vsock.ko.
>> o A VMCI transport, which communicates over VMCI with the VMware hypervisor.
>>   This is vmw_vsock_vmci_transport.ko, and it registers with the core module
>>   as a transport.
>>
>> This should provide a path to introducing additional transports, for example
>> virtio, with the ultimate goal being to make this new socket family
>> hypervisor-neutral.
>
> As Andy mentioned in another e-mail, we would like very much to get
> vsock in 3.9 release, so now that it is split into hypervisor neutral
> and transport parts are there any high level issues that we need to
> resolve before the code can be accepted?

I have no idea, I haven't gotten to reviewing your changes yet, and
I will do so at a time of my own choosing.

Pressing me about the matter is unlikely to make me review things
any faster, and in fact will have the opposite effect.

Therefore, just be patient like everyone else is.

Re: [PATCH 0/1] VM Sockets for Linux upstreaming
From: Andy King
Date: Wed, 6 Feb 2013 16:23:55 -0800

> In an effort to improve the out-of-the-box experience with Linux kernels for
> VMware users, VMware is working on readying the VM Sockets (VSOCK, formerly
> VMCI Sockets) (vsock) kernel module for inclusion in the Linux kernel. The
> purpose of this post is to acquire feedback on the vsock kernel module.
>
> Unlike previous submissions, where the new socket family was entirely reliant
> on VMware's VMCI PCI device (and thus VMware's hypervisor), VM Sockets is now
> completely[1] separated out into two parts, each in its own module:
>
> o Core socket code, which is transport-neutral and invokes transport
>   callbacks to communicate with the hypervisor. This is vsock.ko.
> o A VMCI transport, which communicates over VMCI with the VMware hypervisor.
>   This is vmw_vsock_vmci_transport.ko, and it registers with the core module
>   as a transport.
>
> This should provide a path to introducing additional transports, for example
> virtio, with the ultimate goal being to make this new socket family
> hypervisor-neutral.

Applied, thanks.

Re: [PATCH 0/4] Minor vSockets fixes
From: Andy King
Date: Mon, 18 Feb 2013 08:04:09 -0800

> Minor vSockets fixes, two of which were reported on LKML.

Series applied, thanks.

Re: [PATCH] vhost_net: remove tx polling state
From: Jason Wang
Date: Thu, 7 Mar 2013 12:31:56 +0800

> After commit 2b8b328b61c799957a456a5a8dab8cc7dea68575 (vhost_net: handle
> polling errors when setting backend), we in fact track the polling state
> through poll->wqh, so there's no need to duplicate the work with an extra
> vhost_net_polling_state. So this patch removes this and makes the code simpler.
>
> Signed-off-by: Jason Wang

Can I get an ACK or two from some VHOST folks?

Re: [PATCH] VSOCK: Split vm_sockets.h into kernel/uapi
From: David Howells
Date: Fri, 08 Mar 2013 01:09:18 +

> Greg KH wrote:
>
>> David, is there any rush to get stuff like this into 3.9 for any
>> uapi-type changes, or can it just wait for 3.10?
>
> Not especially. It won't appear in userspace due to the __KERNEL__ guards.

I've applied this to net-next, thanks.

Re: [PATCHv3 vringh] caif_virtio: Introduce caif over virtio
From: Erwan Yvin
Date: Fri, 15 Mar 2013 10:42:17 +0100

> caif-virtio is going to replace caif-shm.
> This patch should be merged in rusty's tree. (vringh)
> because there is a dependency with vringh wrapper.

Feel free to add my:

Acked-by: David S. Miller

Re: [PATCH net] vhost/net: fix heads usage of ubuf_info
From: "Michael S. Tsirkin"
Date: Sun, 17 Mar 2013 14:46:09 +0200

> ubuf info allocator uses guest controlled head as an index,
> so a malicious guest could put the same head entry in the ring twice,
> and we will get two callbacks on the same value.
> To fix use upend_idx which is guaranteed to be unique.
>
> Reported-by: Rusty Russell
> Signed-off-by: Michael S. Tsirkin

Applied and queued up for -stable, thanks.

And thankfully you got the stable URL wrong, please do not CC:
networking patches to stable, just make sure I apply them and in
your post-commit text explicitly ask me to queue it up to my
-stable queue.

Thanks.

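For readers following the fix described above, here is a minimal userspace sketch of the idea, not the vhost code itself. All names (demo_ring, demo_slot, demo_claim) are hypothetical. The bookkeeping slot is chosen from a counter that only the host advances, and the guest-supplied head is merely stored inside the slot, so a repeated head can never land two requests on the same slot.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING 8

/* Hypothetical per-request bookkeeping slot (not a vhost structure). */
struct demo_slot {
    unsigned int guest_head;  /* value supplied by the untrusted side */
    int in_use;
};

struct demo_ring {
    struct demo_slot slots[DEMO_RING];
    unsigned int next;        /* host-owned index, advanced per request */
};

/*
 * Claim a slot using the host-owned counter and record the guest's
 * head value in it. Returns the slot index, or -1 if that slot is
 * still busy. The guest value is never used as an array index.
 */
int demo_claim(struct demo_ring *ring, unsigned int guest_head)
{
    unsigned int idx;

    assert(ring != NULL);
    idx = ring->next;
    if (ring->slots[idx].in_use)
        return -1;
    ring->slots[idx].guest_head = guest_head;
    ring->slots[idx].in_use = 1;
    ring->next = (idx + 1) % DEMO_RING;
    return (int)idx;
}
```

Calling demo_claim() twice with the same guest_head yields two different slot indices, which is the property the patch description relies on.
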
Re: [PATCH] vhost_net: remove tx polling state
From: "Michael S. Tsirkin"
Date: Thu, 11 Apr 2013 10:24:30 +0300

> On Thu, Apr 11, 2013 at 02:50:48PM +0800, Jason Wang wrote:
>> After commit 2b8b328b61c799957a456a5a8dab8cc7dea68575 (vhost_net: handle
>> polling errors when setting backend), we in fact track the polling state
>> through poll->wqh, so there's no need to duplicate the work with an extra
>> vhost_net_polling_state. So this patch removes this and makes the code
>> simpler.
>>
>> This patch also removes all the tx starting/stopping code in tx path
>> according to Michael's suggestion.
>>
>> Netperf test shows almost the same result in stream test, but gets
>> improvements on TCP_RR tests (both zerocopy or copy) especially on low
>> load cases.
>>
>> Tested between multiqueue kvm guest and external host with two direct
>> connected 82599s.
 ...
>> Signed-off-by: Jason Wang
>
> Less code and better speed, what's not to like.
> Davem, could you pick this up for 3.10 please?
>
> Acked-by: Michael S. Tsirkin

Applied to net-next, thanks everyone.

Re: [PATCH] vhost-scsi: Depend on NET for memcpy_fromiovec
From: Rusty Russell
Date: Thu, 16 May 2013 09:05:38 +0930

> memcpy_fromiovec() has nothing to do with networking: that was just the
> first user. Note that crypto/algif_skcipher.c also uses it. The
> obvious answer is to move it into lib/.

+1

Re: [PATCH] vhost-scsi: Depend on NET for memcpy_fromiovec
From: "Michael S. Tsirkin"
Date: Thu, 16 May 2013 09:46:21 +0300

> On Wed, May 15, 2013 at 08:10:55PM -0700, David Miller wrote:
>> From: Rusty Russell
>> Date: Thu, 16 May 2013 09:05:38 +0930
>>
>> > memcpy_fromiovec() has nothing to do with networking: that was just the
>> > first user. Note that crypto/algif_skcipher.c also uses it. The
>> > obvious answer is to move it into lib/.
>>
>> +1
>
> Rusty sent a patch that does this:
> http://patchwork.ozlabs.org/patch/244207/
>
> David, looks like you weren't CC'd.
> If you agree could you please Ack that patch and then I can merge it
> through the vhost tree?
> Or if you prefer merge it directly and I'll sort out the dependencies...

Acked-by: David S. Miller

Re: [PATCH V2] virtio_net: enable napi for all possible queues during open
From: Jason Wang
Date: Wed, 22 May 2013 14:03:58 +0800

> Commit 55257d72bd1c51f25106350f4983ec19f62ed1fa (virtio-net: fill only rx
> queues which are being used) only does the napi enabling during open for
> curr_queue_pairs. This will break multiqueue receiving since napi of new
> queues were still disabled after changing the number of queues.
>
> This patch fixes this by enabling napi for all possible queues during open.
>
> Cc: Sasha Levin
> Signed-off-by: Jason Wang

Applied, thanks.

Re: [PATCH] vhost_net: clear msg.control for non-zerocopy case during tx
From: "Michael S. Tsirkin"
Date: Wed, 5 Jun 2013 12:02:52 +0300

> On Wed, Jun 05, 2013 at 03:40:46PM +0800, Jason Wang wrote:
>> When we decide not to use zero-copy, msg.control should be set to NULL
>> otherwise macvtap/tap may set zerocopy callbacks which may decrease the
>> kref of ubufs wrongly.
>>
>> The bug was introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84
>> (vhost-net: skip head management if no outstanding).
>>
>> This solves the following warnings:
>>
>> WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
>> Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp
>> llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
>> CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
>> Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
>> a0198323 88007c9ebd08 81796b73 88007c9ebd48
>> 8103d66b 7b773e20 8800779f 8800779f43f0
>> 8800779f8418 015c 0062 88007c9ebd58
>> Call Trace:
>> [] dump_stack+0x19/0x1e
>> [] warn_slowpath_common+0x6b/0xa0
>> [] warn_slowpath_null+0x15/0x20
>> [] handle_tx+0x477/0x4b0 [vhost_net]
>> [] handle_tx_kick+0x10/0x20 [vhost_net]
>> [] vhost_worker+0xfe/0x1a0 [vhost_net]
>> [] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
>> [] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
>> [] kthread+0xc6/0xd0
>> [] ? kthread_freezable_should_stop+0x70/0x70
>> [] ret_from_fork+0x7c/0xb0
>> [] ? kthread_freezable_should_stop+0x70/0x70
>>
>> Signed-off-by: Jason Wang
>
> Good catch.
>
> Acked-by: Michael S. Tsirkin
>
> This needs to go into stable as well.

Applied and queued up for -stable.

Re: [PATCH net] vhost-net: fix use-after-free in vhost_net_flush
From: "Michael S. Tsirkin"
Date: Thu, 20 Jun 2013 14:48:13 +0300

> vhost_net_ubuf_put_and_wait has a confusing name:
> it will actually also free its argument.
> Thus since commit 1280c27f8e29acf4af2da914e80ec27c3dbd5c01

Never reference commits only by SHA1 ID, it is never sufficient.

Always provide, after the SHA1 ID, in parentheses, the header line
from the commit message.

To be honest, I'm kind of tired of telling people they need to do
this over and over again. Maybe people keep forgetting because the
reason why this is an issue hasn't really sunk in.

If the patch you reference got backported into another tree, it will
not have the SHA1 ID, and therefore someone reading the "fix" won't
be able to find the fault causing change without going through a lot
of trouble.

By providing the commit header line you remove that problem
altogether, no ambiguity is possible.

Re: [PATCHv2] vhost-net: fix use-after-free in vhost_net_flush
From: "Michael S. Tsirkin"
Date: Tue, 25 Jun 2013 17:29:46 +0300

> vhost_net_ubuf_put_and_wait has a confusing name:
> it will actually also free its argument.
> Thus since commit 1280c27f8e29acf4af2da914e80ec27c3dbd5c01
> "vhost-net: flush outstanding DMAs on memory change"
> vhost_net_flush tries to use the argument after passing it
> to vhost_net_ubuf_put_and_wait, this results
> in use after free.
> To fix, don't free the argument in vhost_net_ubuf_put_and_wait,
> add a new API for callers that want to free ubufs.
>
> Acked-by: Asias He
> Acked-by: Jason Wang
> Signed-off-by: Michael S. Tsirkin

This doesn't apply cleanly to the 'net' tree, please fix this up
and resubmit.

Thanks.

Re: [RFC 2/5] VSOCK: Introduce virtio-vsock-common.ko
From: Asias He
Date: Thu, 27 Jun 2013 16:00:01 +0800

> +static void
> +virtio_transport_recv_dgram(struct sock *sk,
> +			    struct virtio_vsock_pkt *pkt)
 ...
> +	memcpy(skb->data, pkt, sizeof(*pkt));
> +	memcpy(skb->data + sizeof(*pkt), pkt->buf, pkt->len);

Are you sure this is right?

Shouldn't you be using "sizeof(struct virtio_vsock_hdr)" instead of
"sizeof(*pkt)"? 'pkt' is "struct virtio_vsock_pkt" and has all kinds
of meta-data you probably don't mean to include in the SKB.

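To make the sizeof() concern above concrete, here is a small userspace sketch under stated assumptions. The structures are hypothetical stand-ins (demo_hdr, demo_pkt) and do not reproduce the layout of struct virtio_vsock_hdr or struct virtio_vsock_pkt; the point is only that copying sizeof(*pkt) bytes would drag local bookkeeping fields into the outgoing buffer, while copying sizeof(pkt->hdr) bytes copies just the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header: the only part the receiver should see. */
struct demo_hdr {
    uint32_t src_port;
    uint32_t dst_port;
    uint32_t len;
};

/* Hypothetical packet object: header plus local-only metadata. */
struct demo_pkt {
    struct demo_hdr hdr;  /* wire header */
    void *owner;          /* local metadata, must not be copied out */
    void *buf;            /* payload pointer */
    uint32_t len;         /* payload length in bytes */
};

/* Copy the header followed by the payload; returns bytes written. */
size_t demo_pack(unsigned char *dst, size_t cap, const struct demo_pkt *pkt)
{
    size_t need = sizeof(pkt->hdr) + pkt->len;

    assert(cap >= need);
    /* sizeof(pkt->hdr), not sizeof(*pkt): skip owner/buf/len fields */
    memcpy(dst, &pkt->hdr, sizeof(pkt->hdr));
    memcpy(dst + sizeof(pkt->hdr), pkt->buf, pkt->len);
    return need;
}
```

With these stand-in types, sizeof(struct demo_pkt) is strictly larger than sizeof(struct demo_hdr), so the two choices produce different buffer contents and different payload offsets.
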
Re: [PATCH next] xen: Use more current logging styles
From: Ian Campbell
Date: Fri, 28 Jun 2013 08:59:50 +0100

> On Thu, 2013-06-27 at 21:57 -0700, Joe Perches wrote:
>> Instead of mixing printk and pr_ forms,
>> just use pr_
>>
>> Miscellaneous changes around these conversions:
>>
>> Add a missing newline to avoid message interleaving,
>> coalesce formats, reflow modified lines to 80 columns.
>>
>> Signed-off-by: Joe Perches
>
> Acked-by: Ian Campbell

Applied.

Re: [PATCH net] virtio-net: fix the race between channels setting and refill
From: Jason Wang
Date: Wed, 3 Jul 2013 20:15:52 +0800

> Commit 55257d72bd1c51f25106350f4983ec19f62ed1fa (virtio-net: fill only rx
> queues which are being used) tries to refill on demand when changing the
> number of channels by calling try_refill_recv() directly, this may race:
>
> - the refill work which may do the refill at the same time
> - the try_refill_recv() called in bh since napi was not disabled
>
> Which may lead the guest to complain during setting channels:
>
> virtio_net virtio0: input.1:id 0 is not a head!
>
> Solve this issue by scheduling a refill work which can guarantee the
> serialization of refill.
>
> Cc: Sasha Levin
> Cc: Rusty Russell
> Cc: Michael S. Tsirkin
> Signed-off-by: Jason Wang

Michael, please review.

Thanks.

Re: [PATCHv3] vhost-net: fix use-after-free in vhost_net_flush
From: "Michael S. Tsirkin"
Date: Sun, 7 Jul 2013 14:26:53 +0300

> vhost_net_ubuf_put_and_wait has a confusing name:
> it will actually also free its argument.
> Thus since commit 1280c27f8e29acf4af2da914e80ec27c3dbd5c01
> "vhost-net: flush outstanding DMAs on memory change"
> vhost_net_flush tries to use the argument after passing it
> to vhost_net_ubuf_put_and_wait, this results
> in use after free.
> To fix, don't free the argument in vhost_net_ubuf_put_and_wait,
> add a new API for callers that want to free ubufs.
>
> Acked-by: Asias He
> Acked-by: Jason Wang
> Signed-off-by: Michael S. Tsirkin

Applied and queued up for -stable.

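The shape of the fix described above (stop freeing inside the "put and wait" helper, add a separate entry point for callers that want the free) can be sketched in plain userspace C. This is an illustration only: demo_ref and the demo_* functions are hypothetical names, the "wait" step is omitted, and nothing here is taken from the vhost sources.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for the ubuf reference. */
struct demo_ref {
    int refcount;
};

struct demo_ref *demo_ref_alloc(void)
{
    struct demo_ref *r = malloc(sizeof(*r));

    if (r)
        r->refcount = 1;
    return r;
}

/* Drop one reference. The object is NOT freed here. */
void demo_ref_put_and_wait(struct demo_ref *r)
{
    assert(r->refcount > 0);
    r->refcount--;
}

/* Separate entry point for callers that are finished with the object. */
void demo_ref_put_wait_and_free(struct demo_ref *r)
{
    demo_ref_put_and_wait(r);
    free(r);
}

/* Flush-style caller: drops its reference, then reuses the object. */
int demo_flush(struct demo_ref *r)
{
    demo_ref_put_and_wait(r);
    r->refcount = 1;  /* safe only because r was not freed above */
    return r->refcount;
}
```

If demo_ref_put_and_wait() also called free(), the write in demo_flush() would be exactly the use after free the patch description talks about; splitting the two behaviours makes each caller's intent explicit.
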
Re: [PATCH] virtio-net: put virtio net header inline with data
From: Rusty Russell
Date: Tue, 09 Jul 2013 17:38:51 +0930

> If you convince DaveM, I won't object :)

Simplifications are great, but not when the merge window opens up.

Sorry, this isn't appropriate now.

Re: [PATCH] virtio-net: put virtio net header inline with data
From: Rusty Russell
Date: Mon, 15 Jul 2013 11:13:25 +0930

> From: Michael S. Tsirkin
>
> For small packets we can simplify xmit processing
> by linearizing buffers with the header:
> most packets seem to have enough head room
> we can use for this purpose.
> Since existing hypervisors require that header
> is the first s/g element, we need a feature bit
> for this.
>
> Signed-off-by: Michael S. Tsirkin
> Signed-off-by: Rusty Russell

I really think this has to wait until the next merge window, sorry.

Please resubmit this when I open net-next back up, thanks.

Re: [PATCH] virtio-net: put virtio net header inline with data
From: "Michael S. Tsirkin"
Date: Wed, 17 Jul 2013 08:00:32 +0300

> On Tue, Jul 16, 2013 at 12:33:26PM -0700, David Miller wrote:
>> From: Rusty Russell
>> Date: Mon, 15 Jul 2013 11:13:25 +0930
>>
>> > From: Michael S. Tsirkin
>> >
>> > For small packets we can simplify xmit processing
>> > by linearizing buffers with the header:
>> > most packets seem to have enough head room
>> > we can use for this purpose.
>> > Since existing hypervisors require that header
>> > is the first s/g element, we need a feature bit
>> > for this.
>> >
>> > Signed-off-by: Michael S. Tsirkin
>> > Signed-off-by: Rusty Russell
>>
>> I really think this has to wait until the next merge window, sorry.
>>
>> Please resubmit this when I open net-next back up, thanks.
>
> I assumed since -rc1 is out net-next is already open?

-rc1 being released never makes net-next open.

Instead, I explicitly open it up at some point in time after -rc1
when I feel that things have settled down enough.

And when that happens, I announce so here.

So you have to follow my announcements here on netdev to know when
net-next is actually open.

Re: [PATCH V3 0/3] networking: Use ETH_ALEN where appropriate
From: Joe Perches
Date: Thu, 1 Aug 2013 16:17:46 -0700

> Convert the uses mac addresses to ETH_ALEN so
> it's easier to find and verify where mac addresses
> need to be __aligned(2)

Series applied to net-next, thanks.

Re: [PATCH] vhost: Drop linux/socket.h
From: Asias He
Date: Thu, 15 Aug 2013 11:20:16 +0800

> memcpy_fromiovec is moved to lib/iovec.c. No need to include
> linux/socket.h for it.
>
> Signed-off-by: Asias He

You can't do this.

Because this file doesn't include the header file that
provides the declaration, which is linux/uio.h

linux/socket.h includes linux/uio.h, so honestly leaving things the
way they are is a 1000 times better than your patch.

Re: [PATCH] vhost: Drop linux/socket.h
From: Asias He
Date: Fri, 16 Aug 2013 09:27:43 +0800

> On Thu, Aug 15, 2013 at 02:07:40PM -0700, David Miller wrote:
>> From: Asias He
>> Date: Thu, 15 Aug 2013 11:20:16 +0800
>>
>> > memcpy_fromiovec is moved to lib/iovec.c. No need to include
>> > linux/socket.h for it.
>> >
>> > Signed-off-by: Asias He
>>
>> You can't do this.
>>
>> Because this file doesn't include the header file that
>> provides the declaration, which is linux/uio.h
>
> vhost.c includes drivers/vhost/vhost.h. In drivers/vhost/vhost.h, we
> have linux/uio.h included.

Nothing in vhost.h needs linux/uio.h right? That's very poor style,
include the header where the dependency exists which is vhost.c

Re: [PATCH] vhost: Drop linux/socket.h
From: Asias He
Date: Fri, 16 Aug 2013 17:27:43 +0800

> On Fri, Aug 16, 2013 at 12:31:59AM -0700, David Miller wrote:
>> From: Asias He
>> Date: Fri, 16 Aug 2013 09:27:43 +0800
>>
>> > On Thu, Aug 15, 2013 at 02:07:40PM -0700, David Miller wrote:
>> >> From: Asias He
>> >> Date: Thu, 15 Aug 2013 11:20:16 +0800
>> >>
>> >> > memcpy_fromiovec is moved to lib/iovec.c. No need to include
>> >> > linux/socket.h for it.
>> >> >
>> >> > Signed-off-by: Asias He
>> >>
>> >> You can't do this.
>> >>
>> >> Because this file doesn't include the header file that
>> >> provides the declaration, which is linux/uio.h
>> >
>> > vhost.c includes drivers/vhost/vhost.h. In drivers/vhost/vhost.h, we
>> > have linux/uio.h included.
>>
>> Nothing in vhost.h needs linux/uio.h right? That's very poor style,
>> include the header where the dependency exists which is vhost.c
>
> We use 'struct iovec' in vhost.h which needs linux/uio.h, no?
>
> So, how about including linux/uio.h in both vhost.c and vhost.h.

That sounds good.

Re: [PATCH v2] vhost: Include linux/uio.h instead of linux/socket.h
From: Asias He
Date: Mon, 19 Aug 2013 09:23:19 +0800

> memcpy_fromiovec is moved from net/core/iovec.c to lib/iovec.c.
> linux/uio.h provides the declaration for memcpy_fromiovec.
>
> Include linux/uio.h instead of linux/socket.h for it.
>
> Signed-off-by: Asias He

Applied.

Re: [PATCH] VMXNET3: Add support for virtual IOMMU
From: Andy King
Date: Tue, 20 Aug 2013 10:33:32 -0700

> We can't just do virt_to_phys() on memory that we pass to the device and
> expect it to work in presence of a virtual IOMMU. We need to add IOMMU
> mappings for such DMAs to work correctly. Fix that with
> pci_alloc_consistent() where possible, or pci_map_single() where the
> mapping is short-lived or we don't control the allocation (netdev).
>
> Also fix two small bugs:
> 1) use after free of rq->buf_info in vmxnet3_rq_destroy()
> 2) a cpu_to_le32() that should have been a cpu_to_le64()
>
> Acked-by: George Zhang
> Acked-by: Aditya Sarwade
> Signed-off-by: Andy King

Please use dma_alloc_coherent() (or in fact dma_zalloc_coherent()),
dma_map_single() et al., because they are preferred and in particular
allow specification of GFP_* flags.

Re: [PATCH] VMXNET3: Add support for virtual IOMMU
From: Andy King
Date: Fri, 23 Aug 2013 09:33:49 -0700

> This patch adds support for virtual IOMMU to the vmxnet3 module. We
> switch to DMA consistent mappings for anything we pass to the device.
> There were a few places where we already did this, but using pci_blah();
> these have been fixed to use dma_blah(), along with all new occurrences
> where we've replaced kmalloc() and friends.
>
> Also fix two small bugs:
> 1) use after free of rq->buf_info in vmxnet3_rq_destroy()
> 2) a cpu_to_le32() that should have been a cpu_to_le64()
>
> Acked-by: George Zhang
> Acked-by: Aditya Sarwade
> Signed-off-by: Andy King

Applied, thanks.

Re: [PATCH] virtio-net: Set RXCSUM feature if GUEST_CSUM is available
From: Thomas Huth
Date: Tue, 27 Aug 2013 17:09:02 +0200

> If the VIRTIO_NET_F_GUEST_CSUM virtio feature is available, the guest
> does not have to calculate the checksums on all received packets. This
> is pretty much the same feature as RX checksum offloading on real
> network cards, so the virtio-net driver should report this by setting
> the NETIF_F_RXCSUM flag. When the user now runs "ethtool -k", he or she
> can see whether the virtio-net interface has to calculate RX checksums
> or not.
>
> Signed-off-by: Thomas Huth

Can one of the virtio_net folks please review this?

Thanks.

Re: [PATCH V3 0/6] vhost code cleanup and minor enhancement
From: Jason Wang
Date: Mon, 2 Sep 2013 16:40:55 +0800

> This series tries to unify and simplify vhost codes especially for
> zerocopy. With this series, 5% - 10% improvement for per cpu throughput was
> seen during netperf guest sending test.
>
> Please review.

Applied and patch #5 queued up for -stable, thanks.

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi
From: "Michael S. Tsirkin"
Date: Mon, 12 Apr 2021 18:33:45 -0400

> On Mon, Apr 12, 2021 at 06:08:21PM -0400, Michael S. Tsirkin wrote:
>> OK I started looking at this again. My idea is simple.
>> A. disable callbacks before we try to drain skbs
>> B. actually do disable callbacks even with event idx
>>
>> To make B not regress, we need to
>> C. detect the common case of disable after event triggering and skip the
>> write then.
>>
>> I added a new event_triggered flag for that.
>> Completely untested - but then I could not see the warnings either.
>> Would be very much interested to know whether this patch helps
>> resolve the spurious interrupt problem at all ...
>>
>>
>> Signed-off-by: Michael S. Tsirkin
>
> Hmm a slightly cleaner alternative is to clear the flag when enabling
> interrupts ...
> I wonder which cacheline it's best to use for this.
>
> Signed-off-by: Michael S. Tsirkin

Please make a fresh new submission if you want to use this approach,
thanks.

Re: [net-next PATCH V2] virtio-net: switch to use XPS to choose txq
From: Jason Wang
Date: Mon, 30 Sep 2013 15:37:17 +0800

> We used to use a percpu structure vq_index to record the cpu to queue
> mapping, this is suboptimal since it duplicates the work of XPS and
> loses all other XPS functionality such as allowing users to configure
> their own transmission steering strategy.
>
> So this patch switches to use XPS and suggests a default mapping when
> the number of cpus is equal to the number of queues. With XPS support,
> there's no need for keeping per-cpu vq_index and .ndo_select_queue(),
> so they were removed also.
>
> Cc: Rusty Russell
> Cc: Michael S. Tsirkin
> Signed-off-by: Jason Wang
> ---
> Changes from V1:
> - use cpumask_of() instead of allocate dynamically

This generates build warnings:

drivers/net/virtio_net.c: In function ‘virtnet_set_affinity’:
drivers/net/virtio_net.c:1093:3: warning: passing argument 2 of ‘netif_set_xps_queue’ discards ‘const’ qualifier from pointer target type [enabled by default]
In file included from drivers/net/virtio_net.c:20:0:
include/linux/netdevice.h:2275:5: note: expected ‘struct cpumask *’ but argument is of type ‘const struct cpumask *’

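The warning above is the generic C diagnostic for handing a pointer-to-const to a function whose parameter is a plain pointer. A minimal userspace sketch of the situation follows, with stand-in names (demo_mask, demo_mask_of, demo_set_queue_mask) that are assumptions for illustration and not kernel definitions. One way to resolve such a warning, when the callee only reads the data, is to declare the parameter const; whether that is the right change for netif_set_xps_queue() is for the patch author and maintainers to decide.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct cpumask; not the kernel definition. */
struct demo_mask {
    unsigned long bits;
};

/* Stand-in for a cpumask_of()-style helper returning read-only data. */
const struct demo_mask *demo_mask_of(unsigned int cpu)
{
    static const struct demo_mask table[4] = {
        { 1UL }, { 2UL }, { 4UL }, { 8UL }
    };

    assert(cpu < 4);
    return &table[cpu];
}

/*
 * A setter that only reads the mask. If the parameter were declared
 * "struct demo_mask *mask", passing demo_mask_of(cpu) would discard
 * the const qualifier and draw the warning quoted above. A const
 * parameter accepts both const and non-const arguments.
 */
unsigned long demo_set_queue_mask(const struct demo_mask *mask)
{
    return mask->bits;
}
```

Casting the const away at the call site would also silence the compiler, but it hides the question of whether the callee might write through the pointer, so it is usually the less attractive option.
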