Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
From: Rusty Russell [EMAIL PROTECTED] Date: Sat, 17 Mar 2007 21:33:58 +1100

On Fri, 2007-03-16 at 13:38 -0700, Jeremy Fitzhardinge wrote: David Miller wrote: Perhaps the problem can be dealt with using ELF relocations. There is another case, discussed yesterday on netdev, where run-time resolution of ELF relocations would be useful (for very-very-very-read-only variables) so if it can solve this problem too it would be nice to have a generic infrastructure for it.

That's an interesting idea. Have you or anyone else looked at what it would take to code up? For this case, I guess you'd walk the relocs looking for references into the paravirt_ops structure. You'd need to check that was a reference from an indirect jump or call instruction, which would identify a patchable callsite. The offset into the pv_ops structure would identify which operation is involved.

I wrote a whole email on ways to do this, BUT...

The idea is _NOT_ that you go look for references to the paravirt_ops members structure, that would be stupid and you wouldn't be able to use the most efficient addressing mode on a given cpu, you'd be patching up indirect calls and crap like that. Just say no...

Instead you get rid of paravirt ops completely, and you call functions whose symbol name will not resolve in the initial kernel link. You do an initial link of the kernel, look for the unresolved symbols in the ELF relocation tables (just like the linker does), and put those references into a table that is used to patch things up, and you can use standard ELF relocation code to handle this, exactly like the code we already have for module loading in the kernel.

This idea is about 15 years old, sparc32 has been doing exactly this via something called btfixup to handle the page table, TLB, and cache differences of 15 different cpu+cache type combinations.

#define pv_patch(call, args...) \
	asm volatile(:); call(args); asm volatile(8889: [ stuff to put 8889, and call in fixup section ]

Please, use ELF and its powerful and clean existing way to do this please. :-)

What are the netdev requirements? Reading Ben LaHaise's (very cool!) patch, it's not clear that using reloc postprocessing is going to be clearer than open-coding it as he has done.

Ben's case can be handled in the same way. Just do not define the symbols, pre-link, look for the references in the relocation tables, and run through that when you do the set_very_readonly() or install_paravirt_ops() thing.

___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
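For readers who want a feel for the "table of references that gets patched up" idea, here is a rough user-space sketch. It is only an analogy: btfixup and real ELF relocation processing rewrite instructions or addresses in the kernel image, whereas this toy just fills in function-pointer slots. Every name in it (struct fixup, apply_fixups, native_op, hyper_op) is made up for illustration and none of it comes from the thread.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*op_fn)(long);

/* One record per call site, playing the role of a relocation entry. */
struct fixup {
    op_fn *site;      /* location to patch */
    unsigned int op;  /* index of the operation this site wants */
};

/* Two hypothetical backends for the same operation slots. */
static long native_op(long x) { return x + 1; }
static long hyper_op(long x)  { return x + 100; }

/* Walk the table once at boot and bind every site to its backend. */
static void apply_fixups(const struct fixup *tab, size_t n,
                         const op_fn *impl, size_t nr_ops)
{
    for (size_t i = 0; i < n; i++) {
        assert(tab[i].op < nr_ops);   /* reject entries naming an unknown op */
        *tab[i].site = impl[tab[i].op];
    }
}
```

A caller would build the table from its call sites, pick an impl[] array for the platform it booted on, and call apply_fixups() once before any site is used.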
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
From: Linus Torvalds [EMAIL PROTECTED] Date: Mon, 19 Mar 2007 20:18:14 -0700 (PDT) Please don't subject us to another couple months of hair-pulling only to have Linus yank the thing out again, there are certainly more useful things to spend time on :-) Good call. Dwarf2 unwinding simply isn't worth doing. But I won't yank it out, I simply won't merge it. It was more than just totally buggy code, it was an inability of the people to understand that even bugfree code isn't enough - you have to be able to also handle buggy data. Thank you.
Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable
From: Paul Mackerras [EMAIL PROTECTED] Date: Wed, 21 Mar 2007 11:03:14 +1100

Linus Torvalds writes: We should just do this natively. There's been several tests over the years saying that it's much more efficient to do sti/cli as a simple store, and handling the "oops, we got an interrupt while interrupts were disabled" as a special case. I have this dim memory that ARM has done it that way for a long time because it's so expensive to do a real cli/sti. And I think -rt does it for other reasons. It's just more flexible.

64-bit powerpc does this now as well.

I was curious about this so I had a look. There appear to be three pieces of state used to manage this on powerpc: PACASOFTIRQEN(r13), PACAHARDIRQEN(r13) and the SOFTE() in the stackframe. Plus there is all of this complicated logic on trap entry and exit to manage these three values properly. local_irq_restore() doesn't look like a simple piece of code either.

Logically it should be simple: update the software binary state, and if enabling, see if any interrupts came in while we were disabled so we can run them.

Given all of that, is it really cheaper than just flipping the bit in the cpu control register? :-/
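The "logically it should be simple" description above can be sketched in a few lines of user-space C. This is a toy model under stated assumptions, not the powerpc code: the struct and function names are hypothetical, there is a single pending flag rather than the three pieces of state mentioned, and any number of interrupts arriving while soft-disabled collapse into one replay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; not the PACA fields named in the thread. */
struct soft_irq_state {
    bool soft_enabled;  /* what the cheap disable/enable calls toggle */
    bool pending;       /* an interrupt arrived while soft-disabled */
    int handled;        /* how many interrupts were actually serviced */
};

/* "cli" as a plain store: no control-register access. */
static void soft_irq_disable(struct soft_irq_state *s)
{
    assert(s != NULL);
    s->soft_enabled = false;
}

/* Simulated interrupt entry: service it now or remember it for later. */
static void soft_irq_arrive(struct soft_irq_state *s)
{
    if (!s->soft_enabled) {
        s->pending = true;  /* the "oops" special case: defer */
        return;
    }
    s->handled++;
}

/* "sti": update the software state, then replay anything deferred. */
static void soft_irq_enable(struct soft_irq_state *s)
{
    s->soft_enabled = true;
    if (s->pending) {
        s->pending = false;
        s->handled++;
    }
}
```

The cost question raised above is visible even in the toy: the disable path is a single store, but the enable path and the interrupt entry path both pick up a test and a branch.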
Re: [1/2] [NET] link_watch: Move link watch list into net_device
From: Herbert Xu [EMAIL PROTECTED] Date: Tue, 8 May 2007 22:13:22 +1000 [NET] link_watch: Move link watch list into net_device These days the link watch mechanism is an integral part of the network subsystem as it manages the carrier status. So it now makes sense to allocate some memory for it in net_device rather than allocating it on demand. In fact, this is necessary because we can't tolerate a memory allocation failure since that means we'd have to potentially throw a link up event away. It also simplifies the code greatly. In doing so I discovered a subtle race condition in the use of singleevent. This race condition still exists (and is somewhat magnified) without singleevent but it's now plugged thanks to an smp_mb__before_clear_bit. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Applied, thanks Herbert.
Re: [1/2] [NET] link_watch: Move link watch list into net_device
From: Jeremy Fitzhardinge [EMAIL PROTECTED] Date: Thu, 10 May 2007 15:00:05 -0700 Herbert Xu wrote: [NET] link_watch: Move link watch list into net_device These days the link watch mechanism is an integral part of the network subsystem as it manages the carrier status. So it now makes sense to allocate some memory for it in net_device rather than allocating it on demand. I think there's a problem with one of these two patches. Yes, there are :-) Did you catch the follow-on bug fixes?
Re: [1/2] [NET] link_watch: Move link watch list into net_device
From: Jeremy Fitzhardinge [EMAIL PROTECTED] Date: Thu, 10 May 2007 15:22:17 -0700

Andrew Morton wrote: Five minutes after boot is when jiffies wraps. Are you sure it's a list-screwup rather than a jiffy-wrap screwup?

Hm, it's suggestive, isn't it? Apparently they've already fixed this in the sekret networking clubhouse, so I'll need to track it down.

I'm not so certain now that we know it's the jiffies wrap point :-) The fixes in question are attached below and they were posted and discussed on netdev:

commit fe47cdba83b3041e4ac1aa1418431020a4afe1e0
Author: Herbert Xu [EMAIL PROTECTED]
Date: Tue May 8 23:22:43 2007 -0700

    [NET] link_watch: Eliminate potential delay on wrap-around

    When the jiffies wrap around or when the system boots up for the first time, down events can be delayed indefinitely since we no longer update linkwatch_nextevent when only urgent events are processed. This patch fixes this by setting linkwatch_nextevent when a wrap-around occurs.

    Signed-off-by: Herbert Xu [EMAIL PROTECTED]
    Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index b5f4579..4674ae5 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -101,8 +101,10 @@ static void linkwatch_schedule_work(unsigned long delay)
 		return;
 	/* If we wrap around we'll delay it by at most HZ. */
-	if (delay > HZ)
+	if (delay > HZ) {
+		linkwatch_nextevent = jiffies;
 		delay = 0;
+	}
 	schedule_delayed_work(&linkwatch_work, delay);
 }

commit 4cba637dbb9a13706494a1c85174c8e736914010
Author: Herbert Xu [EMAIL PROTECTED]
Date: Wed May 9 00:17:30 2007 -0700

    [NET] link_watch: Always schedule urgent events

    Urgent events may be delayed if we already have a non-urgent event queued for that device. This patch changes this by making sure that an urgent event is always looked at immediately. I've replaced the LW_RUNNING flag by LW_URGENT since whether work is scheduled is already kept track by the work queue system. The only complication is that we have to provide some exclusion for the setting linkwatch_nextevent which is available in the actual work function.

    Signed-off-by: Herbert Xu [EMAIL PROTECTED]
    Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 4674ae5..a5e372b 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -26,7 +26,7 @@
 enum lw_bits {
-	LW_RUNNING = 0,
+	LW_URGENT = 0,
 };
 static unsigned long linkwatch_flags;
@@ -95,18 +95,41 @@ static void linkwatch_add_event(struct net_device *dev)
 }
-static void linkwatch_schedule_work(unsigned long delay)
+static void linkwatch_schedule_work(int urgent)
 {
-	if (test_and_set_bit(LW_RUNNING, &linkwatch_flags))
+	unsigned long delay = linkwatch_nextevent - jiffies;
+
+	if (test_bit(LW_URGENT, &linkwatch_flags))
 		return;
-	/* If we wrap around we'll delay it by at most HZ. */
-	if (delay > HZ) {
-		linkwatch_nextevent = jiffies;
+	/* Minimise down-time: drop delay for up event. */
+	if (urgent) {
+		if (test_and_set_bit(LW_URGENT, &linkwatch_flags))
+			return;
 		delay = 0;
 	}
-	schedule_delayed_work(&linkwatch_work, delay);
+	/* If we wrap around we'll delay it by at most HZ. */
+	if (delay > HZ)
+		delay = 0;
+
+	/*
+	 * This is true if we've scheduled it immeditately or if we don't
+	 * need an immediate execution and it's already pending.
+	 */
+	if (schedule_delayed_work(&linkwatch_work, delay) == !delay)
+		return;
+
+	/* Don't bother if there is nothing urgent. */
+	if (!test_bit(LW_URGENT, &linkwatch_flags))
+		return;
+
+	/* It's already running which is good enough. */
+	if (!cancel_delayed_work(&linkwatch_work))
+		return;
+
+	/* Otherwise we reschedule it again for immediate exection. */
+	schedule_delayed_work(&linkwatch_work, 0);
 }
@@ -123,7 +146,11 @@ static void __linkwatch_run_queue(int urgent_only)
 	 */
 	if (!urgent_only)
 		linkwatch_nextevent = jiffies + HZ;
-	clear_bit(LW_RUNNING, &linkwatch_flags);
+	/* Limit wrap-around effect on delay. */
+	else if (time_after(linkwatch_nextevent, jiffies + HZ))
+		linkwatch_nextevent = jiffies;
+
+	clear_bit(LW_URGENT, &linkwatch_flags);
 	spin_lock_irq(&lweventlist_lock);
 	next = lweventlist;
@@ -166,7 +193,7 @@ static void __linkwatch_run_queue(int urgent_only)
 	}
 	if (lweventlist)
-		linkwatch_schedule_work(linkwatch_nextevent - jiffies);
+
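Since the wrap-around handling is the crux of both fixes, here is a small user-space sketch of the arithmetic involved. It assumes nothing about link_watch.c beyond what the diffs show: clamp_delay() mimics the "if the unsigned difference is larger than HZ, treat it as zero" test, and time_after_ul() is a stand-in for the kernel's time_after() macro (the real macro also type-checks its arguments). Both helper names and the hz parameter are invented for this example.

```c
#include <assert.h>
#include <limits.h>

/*
 * Wrap-safe "a is later than b" for a free-running unsigned counter.
 * Relies on the usual two's-complement conversion to signed long,
 * which is what gcc and clang provide.
 */
static int time_after_ul(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Ticks to wait until next_event. If next_event is already in the
 * past, the unsigned subtraction yields a huge value, so anything
 * larger than hz is clamped to "run now".
 */
static unsigned long clamp_delay(unsigned long next_event,
                                 unsigned long now, unsigned long hz)
{
    unsigned long delay = next_event - now;  /* modular arithmetic */

    assert(hz > 0);
    if (delay > hz)
        delay = 0;
    return delay;
}
```

For instance, with the counter five ticks short of wrapping (now == ULONG_MAX - 4) and next_event == 5, the modular subtraction gives 10, so a timer armed across the wrap still fires after the intended ten ticks.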
Re: [1/2] [NET] link_watch: Move link watch list into net_device
From: Jeremy Fitzhardinge [EMAIL PROTECTED] Date: Thu, 10 May 2007 15:45:42 -0700 David Miller wrote: I'm not so certain now that we know it's the jiffies wrap point :-) The fixes in question are attached below and they were posted and discussed on netdev: Yep, this patch gets rid of my spinning thread. I can't find this patch or any discussion on marc.info; is there a better netdev list archive? I don't see it there either... let me check my mail archive... Indeed, they were posted to netdev but were blocked by the vger regexp filters on the keyword "urgent" so that the postings never made it to the list. I removed that filter regexp so that never happens again, sorry.
Re: [PATCH 1/3] skb_partial_csum_set
From: Rusty Russell [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 21:41:55 +1100 Implement skb_partial_csum_set, for setting partial csums on untrusted packets. Use it in virtio_net (replacing buggy version there), it's also going to be used by TAP for partial csum support. Signed-off-by: Rusty Russell [EMAIL PROTECTED] Looks fine to me. Acked-by: David S. Miller [EMAIL PROTECTED] If you like I can merge this into my net-2.6.25 tree, or alternatively if it makes your life easier then you can handle it yourself.
Re: [PATCH RFC 4/5] tun: vringfd xmit support.
From: Rusty Russell [EMAIL PROTECTED] Date: Mon, 7 Apr 2008 17:24:51 +1000 On Monday 07 April 2008 15:13:44 Herbert Xu wrote: On second thought, this is not going to work. The network stack can clone individual pages out of this skb and put it into a new skb. Therefore whatever scheme we come up with will either need to be page-based, or add a flag to tell the network stack that it can't clone those pages. Erk... I'll put in the latter for now. A page-level solution is not really an option: if userspace hands us mmaped pages for example. Keep in mind that the core of the TCP stack really depends upon being able to slice and dice paged SKBs as it pleases in order to send packets out. In fact, it also does such splitting during SACK processing. It really is a base requirement for efficient TSO support. Otherwise the above operations would be so incredibly expensive we might as well rip all of the TSO support out.
Re: [PATCH 2/5] /dev/vring: simple userspace-kernel ringbuffer interface.
From: Rusty Russell [EMAIL PROTECTED] Date: Sun, 20 Apr 2008 02:41:14 +1000 If only there were some kind of, I don't know... summit... for kernel people... I'm starting to disbelieve the myth that because we can discuss technical issues on mailing lists, we should talk primarily about process issues during the kernel summit. There is a distinct advantage to discussing and hashing things out in person. You can't say "screw you, your idea sucks" when you're face to face with the other person, whereas online it's way too easy.
Re: [6/6] [VIRTIO] net: Allow receiving SG packets
From: Rusty Russell [EMAIL PROTECTED] Date: Tue, 22 Apr 2008 05:06:16 +1000

I'm not sure what the right number is here. Say worst case is header which goes over a page boundary then MAX_SKB_FRAGS in the skb, but for some reason that already has a +2:

/* To allow 64K frame to be packed as single skb without frag_list */
#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)

Unless someone explains, I'll change the xmit sg to 2+MAX_SKB_FRAGS as well.

MAX_SKB_FRAGS + 1 is what you ought to need. MAX_SKB_FRAGS is only accounting for the skb frag pages. If you want to know how many segments skb->data might consume as well, you have to add one. skb->data is linear, therefore it's not possible to need more than one scatterlist entry for it.
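To put numbers on the sizing argument above, here is a tiny worked example. It assumes 4 KiB pages, which the thread does not state, and uses EXAMPLE_-prefixed macro names and a hypothetical helper so as not to be mistaken for the kernel definitions.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. Other page sizes give a different count. */
#define EXAMPLE_PAGE_SIZE 4096u
/* Same formula as the definition quoted above. */
#define EXAMPLE_MAX_SKB_FRAGS (65536u / EXAMPLE_PAGE_SIZE + 2u)

/* One entry for the linear skb->data area plus one per page fragment. */
static unsigned int sg_entries_needed(unsigned int nr_frags)
{
    assert(nr_frags <= EXAMPLE_MAX_SKB_FRAGS);
    return 1u + nr_frags;
}
```

With 4 KiB pages the formula gives 65536/4096 + 2 = 18 fragments, so the worst case under the "MAX_SKB_FRAGS + 1" rule is 19 scatterlist entries.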
Re: [6/6] [VIRTIO] net: Allow receiving SG packets
From: Rusty Russell [EMAIL PROTECTED] Date: Tue, 22 Apr 2008 12:50:27 +1000

But I was curious as to why the +2 in the MAX_SKB_FRAGS definition?

To be honest I have no idea. When Alexey added the TSO changeset way back then, it had the +2, from the history-2.6 tree:

commit 80223d5186f73bf42a7e260c66c9cb9f7d8ec9cf
Author: Alexey Kuznetsov [EMAIL PROTECTED]
Date: Wed Aug 28 11:52:03 2002 -0700

    [NET]: Add TCP segmentation offload core infrastructure.

...
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a812681..9b6e6ad 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -109,7 +109,8 @@ struct sk_buff_head {
 struct sk_buff;
-#define MAX_SKB_FRAGS 6
+/* To allow 64K frame to be packed as single skb without frag_list */
+#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
 typedef struct skb_frag_struct skb_frag_t;
Re: [PATCH 5/5] Remove now unused structs from kvm_para.h
You sent these patches to kvm-owner, ie. the mailing list owner, and not the list itself which would be plain kvm.
Re: [PATCH 1/4] tun: Interface to query tun/tap features.
From: Max Krasnyansky [EMAIL PROTECTED] Date: Tue, 01 Jul 2008 21:59:02 -0700 Dave, do you want me to put all outstanding TUN patches into a git tree so that you can pull them in one shot ? Otherwise if you're ok with applying them one by one please apply this one. Acked-by: Max Krasnyansky [EMAIL PROTECTED] I'll apply Rusty's patches after I give them a review too. Thanks Max.
Re: [PATCH 1/3] tun: Interface to query tun/tap features.
From: Rusty Russell [EMAIL PROTECTED] Date: Thu, 3 Jul 2008 11:32:12 +1000 The problem with introducing checksum offload and gso to tun is they need to set dev->features to enable GSO and/or checksumming, which is supposed to be done before register_netdevice(), ie. as part of TUNSETIFF. ... Signed-off-by: Rusty Russell [EMAIL PROTECTED] Applied to net-next-2.6, thanks!
Re: [PATCH] tun: Fix/rewrite packet filtering logic
From: David Miller [EMAIL PROTECTED] Date: Mon, 14 Jul 2008 22:16:02 -0700 (PDT) It doesn't apply cleanly to net-next-2.6, as I just tried to stick this into my tree. Ignore this, I did something stupid.
Re: [PATCH] tun: Fix/rewrite packet filtering logic
From: Max Krasnyansky [EMAIL PROTECTED] Date: Sat, 12 Jul 2008 01:52:54 -0700 This is on top of the latest and greatest :). Assuming virt folks are ok with the API this should go into 2.6.27. Really? :-) It doesn't apply cleanly to net-next-2.6, as I just tried to stick this into my tree.
Re: [PATCH] tun: Fix/rewrite packet filtering logic
From: Jeff Garzik [EMAIL PROTECTED] Date: Tue, 22 Jul 2008 19:41:47 -0400 looks mostly OK, but stuff like the above should be (void __user *) arg Did you check this with sparse (Documentation/sparse.txt)? Jeff, I already added this particular patch to the tree a week or so ago.
Re: [PATCH 1/1] tun: TUNGETIFF interface to query name and flags
From: Max Krasnyansky [EMAIL PROTECTED] Date: Fri, 15 Aug 2008 11:00:19 -0700 Rusty Russell wrote: On Thursday 14 August 2008 00:30:16 Mark McLoughlin wrote: A very simple approach is attached; I did consider doing a TUNGETFLAGS that would return tun->flags, but I think it's nicer to have a companion to TUNGETIFF since it also allows one to query the interface name from the file descriptor. This seems really sensible to me. If Max acks it, I'd say Dave should merge it. Makes perfect sense to me. Definitely Ack. It has zero impact on existing users and I'd be ok if this goes in during .27-rc series. I've applied Mark's patch, thanks.
Re: [PATCH] virtio_net: large tx MTU support
From: Mark McLoughlin [EMAIL PROTECTED] Date: Wed, 26 Nov 2008 13:58:11 + We don't really have a max tx packet size limit, so allow configuring the device with up to 64k tx MTU. Signed-off-by: Mark McLoughlin [EMAIL PROTECTED] Rusty, ACK? If so, I'll toss this into net-next-2.6, thanks!
Re: [PATCH] AF_VMCHANNEL address family for guest-host communication.
From: Gleb Natapov g...@redhat.com Date: Sun, 14 Dec 2008 13:50:55 +0200

It is undesirable to use TCP/IP for this purpose since network connectivity may not exist between host and guest and if it exists the traffic can be not routable between host and guest for security reasons or TCP/IP traffic can be firewalled (by mistake) by unsuspecting VM user.

I don't really accept this argument, sorry. If you can't use TCP because it might be security protected or misconfigured, adding this new stream protocol thing is not one bit better. It doesn't make any sense at all.

Also, if TCP could be misconfigured this new thing could just as easily be screwed up too. And I wouldn't be surprised to see a whole bunch of SELINUX and netfilter features proposed later for this and then we're back to square one.

You guys really need to rethink this. Either a stream protocol is a workable solution to your problem, or it isn't. And don't bring up any "virtualization is special because..." arguments into your reply because virtualization has nothing to do with my objections stated above.
Re: [PATCH] AF_VMCHANNEL address family for guest-host communication.
From: Anthony Liguori anth...@codemonkey.ws Date: Mon, 15 Dec 2008 09:02:23 -0600 There is already an AF_IUCV for s390. This is a scarecrow and irrelevant to this discussion. And this is exactly why I asked that any arguments in this thread avoid talking about virtualization technology and why it's special. This proposed patch here is asking to add new infrastructure for hypervisor facilities that will be _ADDED_ and for which we have complete control over. Whereas the S390 folks have to deal with existing infrastructure which is largely outside of their control. So if they implement access mechanisms for that, it's fine. I would be doing the same thing if I added a protocol socket layer for accessing the Niagara hypervisor virtualization channels.
Re: [PATCH] AF_VMCHANNEL address family for guest-host communication.
From: Anthony Liguori anth...@codemonkey.ws Date: Mon, 15 Dec 2008 14:44:26 -0600 We want this communication mechanism to be simple and reliable as we want to implement the backends drivers in the host userspace with minimum mess. One implication of your statement here is that TCP is unreliable. That's absolutely not true. Within the guest, we need the interface to be always available and we need an addressing scheme that is hypervisor specific. Yes, we can build this all on top of TCP/IP. We could even build it on top of a serial port. Both have their down-sides wrt reliability and complexity. I don't know of any zero-copy through the hypervisor mechanisms for serial ports, but I know we do that with the various virtualization network devices. Do you have another recommendation? I don't have to make alternative recommendations until you can show that what we have can't solve the problem acceptably, and TCP emphatically can.
Re: [2/2] tun: Fix sk_sleep races when attaching/detaching
From: Herbert Xu herb...@gondor.apana.org.au Date: Mon, 20 Apr 2009 16:35:50 +0800 On Thu, Apr 16, 2009 at 07:09:52PM +0800, Herbert Xu wrote: tun: Fix sk_sleep races when attaching/detaching That patch doesn't apply anymore because of contextual changes caused by the first patch. Here's an update. tun: Fix sk_sleep races when attaching/detaching Do you think these two patches are ready to go into net-2.6 now? Thanks.
Re: [RFC] virtio: orphan skbs if we're relying on timer to free them
From: Rusty Russell ru...@rustcorp.com.au Date: Mon, 18 May 2009 22:18:47 +0930 We check for finished xmit skbs on every xmit, or on a timer (unless the host promises to force an interrupt when the xmit ring is empty). This can penalize userspace tasks which fill their sockbuf. Not much difference with TSO, but measurable with large numbers of packets. There are a finite number of packets which can be in the transmission queue. We could fire the timer more than every 100ms, but that would just hurt performance for a corner case. This seems neatest. ... Signed-off-by: Rusty Russell ru...@rustcorp.com.au If this is so great for virtio it would also be a great idea universally, but we don't do it. What you're doing by orphan'ing is creating a situation where a single UDP socket can loop doing sends and monopolize the TX queue of a device. The only control we have over a sender for fairness in datagram protocols is that send buffer allocation. I'm guilty of doing this too in the NIU driver, also because there I lack a TX queue empty interrupt and this can keep TCP sockets from getting stuck. I think we need a generic solution to this issue because it is getting quite common to see cases where the packets in the TX queue of a device can sit there indefinitely.
Re: [RFC] virtio: orphan skbs if we're relying on timer to free them
From: Rusty Russell ru...@rustcorp.com.au Date: Thu, 21 May 2009 16:27:05 +0930

On Tue, 19 May 2009 12:10:13 pm David Miller wrote: What you're doing by orphan'ing is creating a situation where a single UDP socket can loop doing sends and monopolize the TX queue of a device. The only control we have over a sender for fairness in datagram protocols is that send buffer allocation.

Urgh, that hadn't even occurred to me. Good point.

Now this all is predicated on this actually mattering. :-) You could argue that the scheduler as well as the size of the TX queue should be limiting and enforcing fairness.

Someone really needs to test this. Just skb_orphan() every packet at the beginning of dev_hard_start_xmit(), then run some test program with two clients looping out UDP packets to see if one can monopolize the device and get a significantly larger amount of TX resources than the other. Repeat for 3, 4, 5, etc. clients.

I haven't thought this through properly, but how about a hack where we don't orphan packets if the ring is over half full?

That would also work. And for the NIU case this would be great because I DO have a marker bit for triggering interrupts in the TX descriptors. There's just no "all empty" interrupt on TX (who designs these things? :( ).

Then I guess we could overload the watchdog as a more general timer-after-no-xmit?

Yes, but it means that teardown of a socket can be delayed up to the amount of that timer. Factor in all of this crazy round_jiffies() stuff people do these days and it could cause pauses for real use cases and drive users batty.

Probably the most profitable avenue is to see if this is a real issue after all (see above). If we can get away with having the socket buffer represent socket -- device space only, that's the most ideal solution. It will probably also improve performance a lot across the board, especially on NUMA/SMP boxes as our TX complete events tend to be in different places than the SKB producer.
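The "don't orphan packets if the ring is over half full" hack mentioned above boils down to a one-line predicate. The sketch below is only that predicate, with a made-up function name and plain counters standing in for whatever the driver actually tracks; it does not call skb_orphan() or touch any real ring.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a packet may be orphaned at transmit time.
 * in_flight: descriptors currently occupied; ring_size: total slots.
 * Orphan only while the ring is at most half full, so a sender that
 * floods the ring is still throttled by its socket send buffer.
 */
static bool may_orphan(unsigned int in_flight, unsigned int ring_size)
{
    assert(ring_size > 0);
    return in_flight <= ring_size / 2;
}
```

A driver's xmit path would consult this before orphaning, leaving ownership with the socket once the ring passes the halfway mark.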
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Patrick Ohly patrick.o...@intel.com Date: Mon, 01 Jun 2009 21:47:22 +0200 On Fri, 2009-05-29 at 23:44 +0930, Rusty Russell wrote: This patch adds skb_orphan to the start of dev_hard_start_xmit(): it can be premature in the NETDEV_TX_BUSY case, but that's uncommon. Would it be possible to make the new skb_orphan() at the start of dev_hard_start_xmit() conditionally so that it is not executed for packets that are to be time stamped? As discussed before (http://article.gmane.org/gmane.linux.network/121378/), the skb->sk socket pointer is required for sending back the send time stamp from inside the device driver. Calling skb_orphan() unconditionally as in this patch would break the hardware time stamping of outgoing packets. Indeed, we need to check that case, at a minimum. And there are potentially other problems. For example, I wonder how this interacts with the new TX MMAP af_packet support in net-next-2.6 :-/
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Rusty Russell ru...@rustcorp.com.au Date: Tue, 2 Jun 2009 23:38:29 +0930 On Tue, 2 Jun 2009 04:55:53 pm David Miller wrote: From: Patrick Ohly patrick.o...@intel.com Date: Mon, 01 Jun 2009 21:47:22 +0200 On Fri, 2009-05-29 at 23:44 +0930, Rusty Russell wrote: This patch adds skb_orphan to the start of dev_hard_start_xmit(): it can be premature in the NETDEV_TX_BUSY case, but that's uncommon. Would it be possible to make the new skb_orphan() at the start of dev_hard_start_xmit() conditionally so that it is not executed for packets that are to be time stamped? As discussed before (http://article.gmane.org/gmane.linux.network/121378/), the skb->sk socket pointer is required for sending back the send time stamp from inside the device driver. Calling skb_orphan() unconditionally as in this patch would break the hardware time stamping of outgoing packets. Indeed, we need to check that case, at a minimum. And there are potentially other problems. For example, I wonder how this interacts with the new TX MMAP af_packet support in net-next-2.6 :-/ I think I'll do this in the driver for now, and let's revisit doing it generically later? That might be the best course of action for the time being. This whole area is a rat's nest.
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Rusty Russell ru...@rustcorp.com.au Date: Thu, 4 Jun 2009 13:24:57 +0930 On Thu, 4 Jun 2009 06:32:53 am Eric Dumazet wrote: Also, taking a reference on socket for each xmit packet in flight is very expensive, since it slows down receiver in __udp4_lib_lookup(). Several cpus are fighting for sk->refcnt cache line. Now we have decent dynamic per-cpu, we can finally implement bigrefs. More obvious for device counts than sockets, but perhaps applicable here as well? It might be very beneficial for longer lasting, active, connections, but for high connection rates it's going to be a lose in my estimation.
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Eric Dumazet eric.duma...@gmail.com Date: Thu, 04 Jun 2009 06:54:24 +0200 We also can avoid the sock_put()/sock_hold() pair for each tx packet, to only touch sk_wmem_alloc (with appropriate atomic_sub_return() in sock_wfree() and atomic_dec_test in sk_free). We could initialize sk->sk_wmem_alloc to one instead of 0, so that sock_wfree() could just synchronize itself with sk_free() Excellent idea Eric. Patch will follow after some testing I look forward to it :-)
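As a rough illustration of the "start the counter at one" idea above, here is a user-space sketch using C11 atomics. It is a toy under stated assumptions: struct toy_sock and the toy_* functions are invented for this example and only model the counting rule (whoever drops the counter to zero does the final teardown); it is not the patch that was eventually posted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock. */
struct toy_sock {
    atomic_int wmem_alloc;  /* bytes in flight, plus 1 while the owner holds it */
    bool destroyed;
};

static void toy_sock_init(struct toy_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1);  /* the bias stands in for a refcount */
    sk->destroyed = false;
}

/* Charge a packet to the socket at transmit time: no separate hold. */
static void toy_charge(struct toy_sock *sk, int truesize)
{
    assert(truesize > 0);
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* Packet destructor: tear down only if this drop reaches zero. */
static void toy_wfree(struct toy_sock *sk, int truesize)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        sk->destroyed = true;
}

/* Owner release: drop the initial bias of one. */
static void toy_sk_free(struct toy_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        sk->destroyed = true;
}
```

Because the owner's bias and the in-flight bytes share one counter, each transmitted packet costs a single atomic add and a single atomic subtract, with no second cache line to fight over.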
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Herbert Xu herb...@gondor.apana.org.au Date: Fri, 3 Jul 2009 15:55:30 +0800 Calling skb_orphan like this should be forbidden. Apart from the problems already raised, it is a sign that the driver is trying to paper over a more serious issue of not cleaning up skb's timely. Yes skb_orphan will work for the cases where calling the skb destructor allows forward progress, but for the cases where you really need the skb to be freed (e.g., iSCSI or Xen), this simply doesn't work. So anytime someone tries to propose such a solution it is a sign that they have bigger problems. Agreed, but alas we are foaming at the mouth until we have a truly usable alternative. In particular the case of handling a device without usable TX completion event indications is still quite troublesome.
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Herbert Xu herb...@gondor.apana.org.au Date: Sat, 4 Jul 2009 11:08:30 +0800 On Fri, Jul 03, 2009 at 08:02:54PM -0700, David Miller wrote: In particular the case of handling a device without usable TX completion event indications is still quite troublesome. Which particular devices do you have in mind? NIU. I basically can't defer interrupts because the chip supports per-TX-desc interrupt indications but it lacks an all TX queue sent event. So if I, say, tell it to interrupt every 1/4 of the TX queue then up to 1/4 of the queue can have packets stuck in there if TX activity all of a sudden ceases. The only thing I've come up with to be able to mitigate interrupts is to use an hrtimer of some sort. But that's going to be hard to get right, and who knows what kind of latencies will be introduced for TX completion packet freeing unless I am very careful. And finally this belongs in generic code, not in the NIU driver, whatever we come up with. Especially since my understanding is that this is similar to what Rusty needs.
Re: [PATCH] net/bridge: Add 'hairpin' port forwarding mode
From: Fischer, Anna anna.fisc...@hp.com Date: Thu, 13 Aug 2009 16:55:16 + This patch adds a 'hairpin' (also called 'reflective relay') mode port configuration to the Linux Ethernet bridge kernel module. A bridge supporting hairpin forwarding mode can send frames back out through the port the frame was received on. Hairpin mode is required to support basic VEPA (Virtual Ethernet Port Aggregator) capabilities. You can find additional information on VEPA here: http://tech.groups.yahoo.com/group/evb/ http://www.ieee802.org/1/files/public/docs2009/new-hudson-vepa_seminar-20090514d.pdf http://www.internet2.edu/presentations/jt2009jul/20090719-congdon.pdf An additional patch 'bridge-utils: Add 'hairpin' port forwarding mode' is provided to allow configuring hairpin mode from userspace tools. Signed-off-by: Paul Congdon paul.cong...@hp.com Signed-off-by: Anna Fischer anna.fisc...@hp.com Applied to net-next-2.6
Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit
From: Herbert Xu herb...@gondor.apana.org.au Date: Wed, 19 Aug 2009 13:19:26 +1000 I'm in the process of repeating the same experiment with cxgb3 which hopefully should let me turn interrupts off on descriptors while still reporting completion status. Ok, I look forward to seeing your work however it turns out. Once I see what you've done, I'll give it a spin on niu.
Re: [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Shreyas Bhatewara sbhatew...@vmware.com Date: Mon, 28 Sep 2009 16:56:45 -0700 + uint32_t rxdIdx:12;/* Index of the RxDesc */ Don't use uint32_t et al. sized types, use u32 and friends throughout.
Re: [PATCH 2.6.32-rc1] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Stephen Hemminger shemmin...@vyatta.com Date: Wed, 30 Sep 2009 17:39:23 -0700 Why not use NETIF_F_LRO and ethtool to control LRO support? In fact, you must, in order to handle bridging and routing correctly. Bridging and routing is illegal with LRO enabled, so the kernel automatically issues the necessary ethtool commands to disable LRO in the relevant devices. Therefore you must support the ethtool LRO operation in order to support LRO at all.
Re: [PATCH 2.6.32-rc1] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Shreyas Bhatewara sbhatew...@vmware.com Date: Wed, 30 Sep 2009 14:34:57 -0700 (PDT)
+{
+ struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+ u8 *base;
+ int i;
+
+ VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_STATS);
+
+ /* this does assume each counter is 64-bit wide */
+
+ base = (u8 *)&adapter->tqd_start->stats;
+ for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_dev_stats); i++)
+ *buf++ = *(u64 *)(base + vmxnet3_tq_dev_stats[i].offset);
+
+ base = (u8 *)&adapter->tx_queue.stats;
+ for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_driver_stats); i++)
+ *buf++ = *(u64 *)(base + vmxnet3_tq_driver_stats[i].offset);
+
+ base = (u8 *)&adapter->rqd_start->stats;
There's a lot of code like this that isn't indented properly. Either that or your email client has corrupted the patch by breaking up long lines or similar. Another example:
+static int
+vmxnet3_set_rx_csum(struct net_device *netdev, u32 val)
+{
+ struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+
+ if (adapter->rxcsum != val) {
+ adapter->rxcsum = val;
+ if (netif_running(netdev)) {
+ if (val)
+ adapter->shared->devRead.misc.uptFeatures |=
+ UPT1_F_RXCSUM;
+ else
+ adapter->shared->devRead.misc.uptFeatures &=
+ ~UPT1_F_RXCSUM;
+
+ VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
+ VMXNET3_CMD_UPDATE_FEATURE);
+ }
+ }
+ return 0;
+}
Yikes! :-)
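The second quoted function toggles a single feature bit in a shared flags word. As a minimal sketch of that pattern in isolation (the names `FEATURE_RXCSUM` and `update_feature()` are made up for illustration and are not part of the vmxnet3 driver), a bit is set with `|=` and cleared with `&= ~mask`. A plain assignment of `~mask` would instead turn on every other bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit, standing in for a flag such as UPT1_F_RXCSUM. */
#define FEATURE_RXCSUM 0x0001u

/* Return the flags word with one feature bit enabled or disabled,
 * leaving every other bit untouched. */
static uint32_t update_feature(uint32_t flags, uint32_t mask, bool enable)
{
    if (enable)
        flags |= mask;   /* set only the requested bit(s) */
    else
        flags &= ~mask;  /* clear only the requested bit(s) */
    return flags;
}
```

For example, `update_feature(0x0011u, FEATURE_RXCSUM, false)` yields `0x0010u`: the checksum bit is cleared and the unrelated bit survives.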
Re: [PATCH 2.6.32-rc4] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Shreyas Bhatewara sbhatew...@vmware.com Date: Mon, 12 Oct 2009 15:18:42 -0700 (PDT) Ethernet NIC driver for VMware's vmxnet3 From: Shreyas Bhatewara sbhatew...@vmware.com This patch adds driver support for VMware's virtual Ethernet NIC: vmxnet3 Guests running on VMware hypervisors supporting vmxnet3 device will thus have access to improved network functionalities and performance. Signed-off-by: Shreyas Bhatewara sbhatew...@vmware.com Signed-off-by: Bhavesh Davda bhav...@vmware.com Signed-off-by: Ronghua Zhang rong...@vmware.com Ok, looks good, applied to net-2.6, thanks!
Re: [PATCHv7 1/3] tun: export underlying socket
From: Michael S. Tsirkin m...@redhat.com Date: Tue, 3 Nov 2009 19:24:00 +0200 Assuming it's okay with davem, I think it makes sense to merge this patch through Rusty's tree because vhost is the first user of the new interface. Posted here for completeness. I'm fine with that, please add my: Acked-by: David S. Miller da...@davemloft.net
Re: [PATCHv3 0/4] macvlan: add vepa and bridge mode
From: Patrick McHardy ka...@trash.net Date: Thu, 26 Nov 2009 17:26:17 +0100 Arnd Bergmann wrote: Version 2 description: The patch to iproute2 has not changed, so I'm not including it this time. Patch 4/4 (the netlink interface) is basically unchanged as well but included for completeness. The other changes have moved forward a bit, to the point where I find them a lot cleaner and am more confident in the code being ready for inclusion. The implementation hardly resembles Eric's original patch now, so I've dropped his signed-off-by. Please take a look and ack if you are happy so we can get it into 2.6.33. Looks good to me, nice work. Acked-by: Patrick McHardy ka...@trash.net for the entire series. All applied to net-next-2.6, thanks everyone!
Re: [RFC] macvlan: add tap device backend
From: Arnd Bergmann a...@arndb.de Date: Mon, 14 Dec 2009 13:00:36 +0100 c) prepare a combined patch for net-next.git, or This is probably fine. I'll be taking patches into net-next-2.6 right after Linus releases 2.6.33-rc1.
Re: [PATCH 0/2] virtio net improvements
From: Rusty Russell ru...@rustcorp.com.au Date: Wed, 3 Feb 2010 09:57:06 +1030 On Fri, 29 Jan 2010 11:46:43 pm Rusty Russell wrote: Hi Dave, Nice driver optimization from Shirley, but requires a new virtio hook. Do you want to take both? I have nothing else overlapping it. Dave, any news on this? Just slowly creeping up the backlog :-)
Re: [PATCH 1/2] virtio: Add ability to detach unused buffers from vrings
From: Rusty Russell ru...@rustcorp.com.au Date: Fri, 29 Jan 2010 23:49:05 +1030 From: Shirley Ma mashi...@us.ibm.com There's currently no way for a virtio driver to ask for unused buffers, so it has to keep a list itself to reclaim them at shutdown. This is redundant, since virtio_ring stores that information. So add a new hook to do this. Signed-off-by: Shirley Ma x...@us.ibm.com Signed-off-by: Amit Shah amit.s...@redhat.com Signed-off-by: Rusty Russell ru...@rustcorp.com.au Applied.
Re: [PATCH 2/2] virtio_net: Defer skb allocation in receive path Date: Wed, 13 Jan 2010 12:53:38 -0800
From: Rusty Russell ru...@rustcorp.com.au Date: Fri, 29 Jan 2010 23:50:04 +1030 From: Shirley Ma mashi...@us.ibm.com virtio_net receives packets from its pre-allocated vring buffers, then it delivers these packets to upper layer protocols as skb buffs. So it's not necessary to pre-allocate an skb for each mergeable buffer, then free extra skbs when buffers are merged into a large packet. This patch defers skb allocation in the receive path for both big packets and mergeable buffers to reduce skb pre-allocations and skb frees. It frees unused buffers by calling detach_unused_buf in vring, so the recv skb queue is not needed. Signed-off-by: Shirley Ma x...@us.ibm.com Signed-off-by: Rusty Russell ru...@rustcorp.com.au Applied.
Re: [PATCH 0/3 v4] macvtap driver
From: Arnd Bergmann a...@arndb.de Date: Sat, 30 Jan 2010 23:22:15 +0100 This is the fourth version of the macvtap driver, based on the comments I got for the last version I got a few days ago. Very few changes: * release netdev in chardev open function so we can destroy it properly. * Implement TUNSETSNDBUF * fix sleeping call in rcu_read_lock * Fix comment in namespace isolation patch * Fix small context difference to make it apply to net-next I can't really test here while travelling, so please give it a go if you're interested in this driver. All applied to net-next-2.6, thanks!
Re: [PATCH] vhost-net: switch to smp barriers
From: Michael S. Tsirkin m...@redhat.com Date: Sat, 13 Feb 2010 19:39:11 +0200 Dave, I see it's marked not applicable: http://patchwork.ozlabs.org/patch/44207/ the patch applies to net-next as of b3b3f04fb587ecb61b5baa6c1c5f0e666fd12d73. Can this be queued up please? Should I resubmit with Rusty's ack? Sorry about that, I must have thought Rusty would queue it up. I'll fix the state to under-review and process it in my backlog. Thanks.
Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling
Just for the record I'm generally not interested in vhost patches. If it's a specific network one that will be merged via the networking tree, yes please CC: me. But if it's a bunch of changes to vhost.c and other pieces of infrastructure, feel free to leave me out of it. It just clutters my already overflowing inbox. Thanks.
Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling
From: Michael S. Tsirkin m...@redhat.com Date: Wed, 24 Feb 2010 07:37:37 +0200 Dave, so while Rusty's on vacation, what's the best way to get vhost infrastructure fixes in? Are you ok with getting pull requests and merging them into net-next? That should keep the clutter in your inbox to the minimum. Of course network changes would still go the usual way. Well, who is providing oversight of vhost work while he's gone? Has he, implicitly or explicitly, appointed a maintainer while he's away?
Re: [PATCH 3/3] vhost: fix get_user_pages_fast error handling
From: Michael S. Tsirkin m...@redhat.com Date: Wed, 24 Feb 2010 09:34:25 +0200 Implicitly, I guess. He said if there's an issue Michael Tsirkin is the best person to resolve it, this was wrt merging his virtio&lguest tree. He didn't mention vhost, I wrote all of vhost though, there shouldn't be an issue with that. That's good enough for me. Feel free to setup a tree for me to pull from.
Re: [GIT PULL] vhost-net fixes for 2.6.34
From: Michael S. Tsirkin m...@redhat.com Date: Sun, 28 Feb 2010 20:44:40 +0200 The following changes since commit 655ffee284dfcf9a24ac0343f3e5ee6db85b85c5: Jiri Pirko (1): wireless: convert to use netdev_for_each_mc_addr are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost Pulled, thanks.
Re: [GIT PULL] vhost-net fixes for issues in 2.6.34-rc1
From: Michael S. Tsirkin m...@redhat.com Date: Thu, 18 Mar 2010 11:53:55 +0200 The following tree includes patches fixing issues with vhost-net in 2.6.34-rc1. Please pull them for 2.6.34. Pulled, thanks a lot.
Re: [GIT PULL] vhost-net fix for 2.6.34-rc3
From: Michael S. Tsirkin m...@redhat.com Date: Wed, 7 Apr 2010 20:35:02 +0300 David, The following tree includes a patch fixing an issue with vhost-net in 2.6.34-rc3. Please pull for 2.6.34. Pulled, thanks Michael.
Re: [GIT PULL] first round of vhost-net enhancements for net-next
From: David Miller da...@davemloft.net Date: Mon, 03 May 2010 15:07:29 -0700 (PDT) From: Michael S. Tsirkin m...@redhat.com Date: Tue, 4 May 2010 00:32:45 +0300 The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Pulled, thanks. Nevermind, reverted. Do you even compile test what you send to people? drivers/net/macvtap.c: In function ‘macvtap_ioctl’: drivers/net/macvtap.c:713: warning: control reaches end of non-void function You're really batting 1000 today Michael...
Re: [GIT PULL] first round of vhost-net enhancements for net-next
From: Michael S. Tsirkin m...@redhat.com Date: Tue, 4 May 2010 00:32:45 +0300 The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Pulled, thanks.
Re: [GIT PULL] amended: first round of vhost-net enhancements for net-next
From: Michael S. Tsirkin m...@redhat.com Date: Tue, 4 May 2010 14:21:01 +0300 This is an amended pull request: I have rebased the tree to the correct patches. This has been through basic tests and seems to work fine here. The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Pulled, thanks Michael.
Re: [PATCHv2] vhost-net: add dhclient work-around from userspace
From: Michael S. Tsirkin m...@redhat.com Date: Mon, 28 Jun 2010 13:08:07 +0300 Userspace virtio server has the following hack so guests rely on it, and we have to replicate it, too: Use port number to detect incoming IPv4 DHCP response packets, and fill in the checksum for these. The issue we are solving is that on linux guests, some apps that use recvmsg with AF_PACKET sockets, don't know how to handle CHECKSUM_PARTIAL; The interface to return the relevant information was added in 8dc4194474159660d7f37c495e3fc3f10d0db8cc, and older userspace does not use it. One important user of recvmsg with AF_PACKET is dhclient, so we add a work-around just for DHCP. Don't bother applying the hack to IPv6 as userspace virtio does not have a work-around for that - let's hope guests will do the right thing wrt IPv6. Signed-off-by: Michael S. Tsirkin m...@redhat.com Yikes, this is awful too. Nothing in the kernel should be mucking around with protocol packets like this by default. In particular, what the heck does port 67 mean? Locally I can use it for whatever I want for my own purposes, I don't have to follow the conventions for service ports as specified by the IETF. But I can't have the packet checksum state be left alone for port 67 traffic on a box using virtio because you have this hack there. And yes it's broken on machines using the qemu thing, but at least the hack there is restricted to userspace. I really don't want anything in the kernel that looks like this. These applications are broken, and we've provided a way for them to work properly. What's the point of having fixed applications if all of these hacks grow like fungus over every virtualization transport? It just means that people won't fix the apps, since they don't have to. There is no incentive, and the mechanism we created to properly handle this loses its value. At best, you can write a netfilter module that mucks up the packet checksum state in these situations. 
At least in that case, you can make it generic (it mangles iff a packet matches a certain rule, so for your virtio guests you'd make it match for DHCP frames) instead of being some hard-coded DHCP thing by design. And since this is so cleanly separated and portable you don't even need to push it upstream. It's a temporary workaround for a temporary problem. You can just delete it as soon as the majority of guests have the fixed dhcp. The qemu crap should disappear similarly.
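To make the objection concrete, here is a rough userspace sketch of what "use port number to detect incoming IPv4 DHCP response packets" amounts to. It is not the code from the patch under review: the function name `looks_like_dhcp_reply()` and the constants are invented for illustration, and the check works on a raw IPv4 packet buffer rather than an skb. It shows that the match is purely a heuristic on the UDP source port, which is why any local service that happens to use port 67 would be affected as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROTO_UDP        17 /* IPv4 protocol number for UDP */
#define DHCP_SERVER_PORT 67 /* conventional BOOTP/DHCP server port */

/* Heuristic only: returns true for any IPv4/UDP packet whose source port
 * is 67, whether or not the payload is actually DHCP. Fragments are not
 * handled in this sketch. */
static bool looks_like_dhcp_reply(const uint8_t *pkt, size_t len)
{
    size_t ihl;
    uint16_t sport;

    if (len < 20 || (pkt[0] >> 4) != 4)   /* need a full IPv4 header */
        return false;

    ihl = (size_t)(pkt[0] & 0x0f) * 4;    /* header length in bytes */
    if (ihl < 20 || len < ihl + 8)        /* need a full UDP header too */
        return false;

    if (pkt[9] != PROTO_UDP)              /* protocol field */
        return false;

    /* UDP source port is the first 16-bit big-endian field after the IP header. */
    sport = (uint16_t)((pkt[ihl] << 8) | pkt[ihl + 1]);
    return sport == DHCP_SERVER_PORT;
}
```

A rule-based filter, as suggested in the reply, would keep this kind of match configurable instead of hard-coding it into a transport.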
Re: [PATCHv2] vhost-net: add dhclient work-around from userspace
From: Michael S. Tsirkin m...@redhat.com Date: Tue, 29 Jun 2010 16:04:39 +0300 Since using the module involves updating the management tools as well, if we go down this route it will be much less painful for everyone to push it upstream. Ok, you can make your case to Patrick McHardy and if he'll merge it into his netfilter GIT tree I guess I'll have to take it :)
Re: [GIT PULL net-2.6] vhost-net: more error handling fixes
From: Michael S. Tsirkin m...@redhat.com Date: Thu, 1 Jul 2010 19:41:27 +0300 David, The following tree includes more fixes dealing with error handling in vhost-net. It is on top of net-2.6. Please merge it for 2.6.35. Thanks! The following changes since commit 38000a94a902e94ca8b5498f7871c6316de8957a: sky2: enable rx/tx in sky2_phy_reinit() (2010-06-23 14:37:04 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net Michael S. Tsirkin (2): vhost: break out of polling loop on error vhost: add unlikely annotations to error path Pulled.
Re: [Pv-drivers] RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Pankaj Thakkar pthak...@vmware.com Date: Wed, 14 Jul 2010 10:18:22 -0700 The plugin is guest agnostic and hence we did not want to rely on any kernel provided functions. While I disagree entirely with this kind of approach, even that doesn't justify what you're doing here. memcpy() and memset() are on a much more fundamental ground than kernel provided functions. They had better be available no matter where you build this thing. And doing what you're doing is foolish on so many levels. One more duplication of code, one more place for unnecessary bugs to live, one more place that might need optimizations and thus require duplication of even more work people have done over the years.
Re: [GIT PULL] vhost-net fixes
From: Michael S. Tsirkin m...@redhat.com Date: Fri, 16 Jul 2010 15:25:30 +0300 David, please pull the following fixes for 2.6.35. Thanks! The following changes since commit 91a72a70594e5212c97705ca6a694bd307f7a26b: net/core: neighbour update Oops (2010-07-14 18:02:16 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net Michael S. Tsirkin (2): vhost-net: avoid flush under lock vhost: avoid pr_err on condition guest can trigger Pulled, thanks!
Re: [GIT PULL net-next-2.6] vhost-net patchset for 2.6.36
From: Michael S. Tsirkin m...@redhat.com Date: Wed, 28 Jul 2010 16:32:31 +0300 The following changes since commit 4cfa580e7eebb8694b875d2caff3b989ada2efac: r6040: Fix args to phy_mii_ioctl(). (2010-07-21 21:10:49 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next Pulled, thanks Michael.
Re: [GIT PULL net-2.6] vhost-net: 2.6.36 regression fixes
From: Michael S. Tsirkin m...@redhat.com Date: Mon, 6 Sep 2010 14:36:06 +0300 The following tree includes more regression fixes for vhost-net in 2.6.36. It is on top of net-2.6. Please merge it for 2.6.36. Pulled, thanks Michael.
Re: [GIT PULL net-2.6] vhost-net: fix range checking
From: Michael S. Tsirkin m...@redhat.com Date: Mon, 20 Sep 2010 19:42:22 +0200 git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net Pulled, thanks!
Re: [GIT PULL net-next-2.6] vhost-net patchset for 2.6.37
From: Michael S. Tsirkin m...@redhat.com Date: Tue, 5 Oct 2010 20:27:32 +0200 It looks like it was a quiet cycle for vhost-net: probably because most of the energy was spent on bugfixes that went in for 2.6.36. People are working on multiqueue and tracing, but I'm not sure it'll get done in time for 2.6.37 - so here's a tree with a single patch that helps windows guests which we definitely want in the next kernel. Please merge for 2.6.37. Pulled, thanks Michael.
Re: [GIT PULL net-2.6] vhost-net: access_ok fix
From: Michael S. Tsirkin m...@redhat.com Date: Tue, 19 Oct 2010 16:59:01 +0200 David, Not sure if it's too late for 2.6.36 - in case it's not, the following tree includes a last minute bugfix for vhost-net, found by code inspection. It is on top of net-2.6. Thanks! The following changes since commit b0057c51db66c5f0f38059f242c57d61c4741d89: tg3: restore rx_dropped accounting (2010-10-11 16:06:24 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net Even though it's too late, I've pulled this.
Re: [GIT PULL net-2.6] vhost-net: rcu fixup
From: Michael S. Tsirkin m...@redhat.com Date: Thu, 25 Nov 2010 14:23:01 +0200 Please merge the following fix for 2.6.36. Thanks! The following changes since commit a27e13d370415add3487949c60810e36069a23a6: econet: fix CVE-2010-3848 (2010-11-24 11:51:47 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net Michael S. Tsirkin (1): vhost/net: fix rcu check usage Pulled, thanks Michael.
Re: [GIT PULL net-2.6] vhost-net: logging fixup
From: Michael S. Tsirkin m...@redhat.com Date: Sun, 12 Dec 2010 12:09:43 +0200 Please merge the following fix for 2.6.37. It is also applicable to -stable. Thanks! The following changes since commit a19faf0250e09b16cac169354126404bc8aa342b: net: fix skb_defer_rx_timestamp() (2010-12-10 16:20:56 -0800) Pulled, thanks.
Re: [GIT PULL net-next-2.6] vhost-net: tools, cleanups, optimizations
From: Michael S. Tsirkin m...@redhat.com Date: Tue, 14 Dec 2010 14:23:26 +0200 On Mon, Dec 13, 2010 at 12:44:13PM +0200, Michael S. Tsirkin wrote: Please merge the following tree for 2.6.38. Thanks! Rusty Acked it as is, so please pull the below. Thanks very much! The following changes since commit ad1184c6cf067a13e8cb2a4e7ccc407f947027d0: net: au1000_eth: remove unused global variable. (2010-12-11 12:01:48 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next Pulled, thanks a lot.
Re: [PULL] vhost-net: 2.6.38 - warning fix
From: Michael S. Tsirkin m...@redhat.com Date: Tue, 1 Feb 2011 17:44:40 +0200 git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net Pulled, thanks Michael.
Re: [PATCH] virtio_net: Add schedule check to napi_enable call
From: Michael S. Tsirkin m...@redhat.com Date: Thu, 10 Feb 2011 19:57:26 +0200 On Thu, Feb 10, 2011 at 12:32:50PM +1030, Rusty Russell wrote: From: Bruce Rogers brog...@novell.com Under harsh testing conditions, including low memory, the guest would stop receiving packets. With this patch applied we no longer see any problems in the driver while performing these tests for extended periods of time. Make sure napi is scheduled subsequent to each napi_enable. Signed-off-by: Bruce Rogers brog...@novell.com Signed-off-by: Olaf Kirch o...@suse.de Cc: sta...@kernel.org Signed-off-by: Rusty Russell ru...@rustcorp.com.au Rusty, so this is 2.6.38 material - you'll send this to Linus? Or DaveM? Don't worry I'll apply this to net-2.6, thanks.
Re: [PATCH] virtio_net: Add schedule check to napi_enable call
From: Rusty Russell ru...@rustcorp.com.au Date: Thu, 10 Feb 2011 12:32:50 +1030 From: Bruce Rogers brog...@novell.com Under harsh testing conditions, including low memory, the guest would stop receiving packets. With this patch applied we no longer see any problems in the driver while performing these tests for extended periods of time. Make sure napi is scheduled subsequent to each napi_enable. Signed-off-by: Bruce Rogers brog...@novell.com Signed-off-by: Olaf Kirch o...@suse.de Cc: sta...@kernel.org Signed-off-by: Rusty Russell ru...@rustcorp.com.au Applied, thanks everyone.
Re: [PATCH] vhost: copy_from_user -> __copy_from_user
From: Michael S. Tsirkin m...@redhat.com Date: Sun, 6 Mar 2011 13:33:49 +0200 copy_from_user is pretty high on perf top profile, replacing it with __copy_from_user helps. It's also safe because we do access_ok checks during setup. Signed-off-by: Michael S. Tsirkin m...@redhat.com Is Rusty going to take this or should I?
Re: [PULL net-2.6] vhost: cleanups and fixes
From: Michael S. Tsirkin m...@redhat.com Date: Thu, 17 Mar 2011 16:04:04 +0200 The following changes since commit 1fc050a13473348f5c439de2bb41c8e92dba5588: ipv4: Cache source address in nexthop entries. (2011-03-07 20:54:48 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next Jason Wang (3): vhost-net: check the support of mergeable buffer outside the receive loop vhost-net: Unify the code of mergeable and big buffer handling vhost: lock receive queue, not the socket Krishna Kumar (1): vhost: Cleanup vhost.c and net.c Michael S. Tsirkin (2): vhost: copy_from_user -> __copy_from_user vhost-net: remove unlocked use of receive_queue Pulled, thanks.
Re: [PATCH] virtio_net: convert to hw_features
From: Michał Mirosław mirq-li...@rere.qmqm.pl Date: Thu, 31 Mar 2011 13:01:35 +0200 (CEST) Signed-off-by: Michał Mirosław mirq-li...@rere.qmqm.pl Applied.
Re: [PATCH RESEND] net: convert xen-netfront to hw_features
From: Michał Mirosław mirq-li...@rere.qmqm.pl Date: Thu, 31 Mar 2011 13:01:35 +0200 (CEST) Not tested in any way. The original code for offload setting seems broken as it resets the features on every netback reconnect. This will set GSO_ROBUST at device creation time (earlier than connect time). RX checksum offload is forced on - so advertise as it is. Signed-off-by: Michał Mirosław mirq-li...@rere.qmqm.pl Applied.
Re: [PATCH RESEND] net: convert xen-netfront to hw_features
From: Ian Campbell ian.campb...@eu.citrix.com Date: Mon, 4 Apr 2011 13:29:19 +0100 From 0b56469abe56efae415b4603ef508ce9aec0e4c1 Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Mon, 4 Apr 2011 10:58:50 +0100 Subject: [PATCH] xen: netfront: assume all hw features are available until backend connection setup We need to assume that all features will be available when registering the netdev, otherwise they are omitted from the initial set of dev->wanted_features. When we connect to the backend we reduce the set as necessary due to the call to netdev_update_features() in xennet_connect(). Signed-off-by: Ian Campbell ian.campb...@citrix.com I've applied this, thanks Ian.
Re: Signed bit field; int have_hotplug_status_watch:1
From: Ian Campbell ian.campb...@eu.citrix.com Date: Mon, 4 Apr 2011 09:26:24 +0100 Subject: [PATCH] xen: netback: use unsigned type for one-bit bitfield. Fixes error from sparse: CHECK drivers/net/xen-netback/xenbus.c drivers/net/xen-netback/xenbus.c:29:40: error: dubious one-bit signed bitfield int have_hotplug_status_watch:1; Reported-by: Dr. David Alan Gilbert li...@treblig.org Signed-off-by: Ian Campbell ian.campb...@citrix.com Applied to net-next-2.6, thanks.
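A note on why sparse flags this: a signed one-bit bitfield has room for the sign bit only, so on the usual two's-complement targets it can hold 0 and -1 but never 1, and a test such as "flag == 1" silently fails. The sketch below is a userspace illustration, not code from the patch; the struct names and helper functions are made up for this example, and only the field name is taken from the sparse message.

```c
#include <stdio.h>

/* Hypothetical struct mirroring the flagged declaration.
 * "signed int" is spelled out because the signedness of a plain
 * "int" bitfield is implementation-defined. */
struct watch_signed {
    signed int have_hotplug_status_watch:1;   /* holds 0 or -1 */
};

/* Hypothetical struct mirroring the fix. */
struct watch_unsigned {
    unsigned int have_hotplug_status_watch:1; /* holds 0 or 1 */
};

/* Store v in the signed one-bit field and read it back. */
int store_signed_flag(int v)
{
    struct watch_signed w = { 0 };
    w.have_hotplug_status_watch = v;
    return w.have_hotplug_status_watch;
}

/* Store v in the unsigned one-bit field and read it back. */
int store_unsigned_flag(int v)
{
    struct watch_unsigned w = { 0 };
    w.have_hotplug_status_watch = v;
    return (int)w.have_hotplug_status_watch;
}

/* Print what each variant reads back after storing 1. */
void show_bitfield_values(void)
{
    printf("signed:   %d\n", store_signed_flag(1));
    printf("unsigned: %d\n", store_unsigned_flag(1));
}
```

With GCC and Clang on common targets the signed variant reads back as -1 after storing 1, while the unsigned variant reads back as 1, which is the behaviour the patch relies on.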
Re: [PATCH] xen: drop anti-dependency on X86_VISWS
From: Ian Campbell ian.campb...@eu.citrix.com Date: Mon, 4 Apr 2011 10:55:55 +0100 You mean the !X86_VISWS I presume? It doesn't make sense to me either. No, I think 32-bit x86 allmodconfig elides XEN because of its X86_TSC dependency. And, well, you could type make allmodconfig on your tree and see for yourself instead of asking me :-)
Re: [PATCHv2 00/14] virtio and vhost-net performance enhancements
From: Michael S. Tsirkin m...@redhat.com Date: Fri, 20 May 2011 02:10:07 +0300 Rusty, I think it will be easier to merge vhost and virtio bits in one go. Can it all go in through your tree (Dave in the past acked sending a very similar patch through you so should not be a problem)? And in case you want an explicit ack for the net bits: Acked-by: David S. Miller da...@davemloft.net :-)
Re: [PATCH] virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID
From: Michael S. Tsirkin m...@redhat.com Date: Fri, 10 Jun 2011 14:28:28 +0300 On Fri, Jun 10, 2011 at 06:56:17PM +0800, Jason Wang wrote: There's no need for the guest to validate the checksum if it has been validated by host NICs. So this patch introduces a new flag - VIRTIO_NET_HDR_F_DATA_VALID which is used to bypass the checksum examination in the guest. The backend (tap/macvtap) may set this flag when it meets skbs with CHECKSUM_UNNECESSARY to save cpu utilization. No feature negotiation is needed as old drivers just ignore this flag. This wasn't required by the spec, but maybe it should be. Iperf shows 12%-30% performance improvement for UDP traffic. For TCP, there is no difference when gro is on, as it produces skbs with partial checksum. But when gro is disabled, 20% or even higher improvement could be measured by netperf. Signed-off-by: Jason Wang jasow...@redhat.com Acked-by: Michael S. Tsirkin m...@redhat.com Applied to net-next-2.6
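As a rough illustration of the receive-side decision described above, the sketch below maps the header flags to a checksum action. It is a userspace sketch under stated assumptions, not the virtio_net driver code: the enum and the function rx_checksum_action() are hypothetical names, and the two flag values are written out locally to match the virtio-net header definitions.

```c
#include <stdint.h>

/* Flag bits of the virtio-net header "flags" field, defined locally
 * so the sketch is self-contained. */
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
#define VIRTIO_NET_HDR_F_DATA_VALID 2

/* Hypothetical result type: what the receiver does about the checksum. */
enum rx_csum_action {
    RX_CSUM_VERIFY,           /* nothing known: verify in software */
    RX_CSUM_COMPLETE_PARTIAL, /* checksum is partial and must be completed */
    RX_CSUM_SKIP              /* host already validated it: skip the check */
};

/* Decide the checksum action from the header flags (hypothetical helper). */
enum rx_csum_action rx_checksum_action(uint8_t flags)
{
    if (flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)
        return RX_CSUM_COMPLETE_PARTIAL;
    if (flags & VIRTIO_NET_HDR_F_DATA_VALID)
        return RX_CSUM_SKIP;
    return RX_CSUM_VERIFY;
}
```

A driver that predates the flag never tests the DATA_VALID bit and falls through to software verification, which is why the patch description says no feature negotiation is needed.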
Re: [PATCH] virtio-net: per cpu 64 bit stats (v2)
From: Stephen Hemminger shemmin...@vyatta.com Date: Wed, 15 Jun 2011 12:36:29 -0400 Use per-cpu variables to maintain 64 bit statistics. Signed-off-by: Stephen Hemminger shemmin...@vyatta.com I'll apply this, thanks.
Re: [PATCH 4/4] xen/netback: Add module alias for autoloading
From: Konrad Rzeszutek Wilk konrad.w...@oracle.com Date: Thu, 30 Jun 2011 12:39:54 -0400 On Wed, Jun 29, 2011 at 02:41:32PM +0200, Bastian Blank wrote: Add xen-backend:vif module alias to the xen-netback module. This allows automatic loading of the module. Dave, Could you queue this up for 3.1 please? I've the other two patches in my tree for 3.1 and the block patch ready for Jens. Done.
Re: Large Patch Series in Email
From: Michael Witten mfwit...@gmail.com Date: Fri, 15 Jul 2011 22:55:29 - On Sat, 16 Jul 2011 00:09:03 +0300, Dan Carpenter wrote: On Fri, Jul 15, 2011 at 06:25:55PM -, Michael Witten wrote: Do not send more than 15 patches at once to the vger mailing lists!!! ... Don't be a whinge bucket. Or be respectful of bandwidth, differing email environments, and the official guidelines for submitting patches, which I will quote again: If you cannot condense your patch set into a smaller set of patches, then only post say 15 or so at a time and wait for review and integration. ... Do not send more than 15 patches at once to the vger mailing lists!!! Indeed, it really sucks when people send huge patch sets, do not do it. If the official SubmittingPatches document isn't convincing enough, then maybe me (the vger postmaster) saying it will.
Re: [PATCHv9] vhost: experimental tx zero-copy support
From: Michael S. Tsirkin m...@redhat.com Date: Sun, 17 Jul 2011 22:36:14 +0300 The below is what I came up with. We add the feature enabled by default ... s/enabled/disabled/ Well, at least you got it right in the commit message where it counts :-)
Re: [PATCHv11] vhost: vhost TX zero-copy support
From: Michael S. Tsirkin m...@redhat.com Date: Mon, 18 Jul 2011 16:48:46 +0300 From: Shirley Ma mashi...@us.ibm.com This adds experimental zero copy support in vhost-net, disabled by default. To enable, set experimental_zcopytx module option to 1. This patch maintains the outstanding userspace buffers in the sequence in which they are delivered to vhost. The outstanding userspace buffers will be marked as done once the lower device buffers DMA has finished. This is monitored through last reference of kfree_skb callback. Two buffer indices are used for this purpose. The vhost-net device passes the userspace buffers info to lower device skb through message control. DMA done status check and guest notification are handled by handle_tx: in the worst case all buffers in the vq are in pending/done status, so we need to notify guest to release DMA done buffers first before we get any new buffers from the vq. One known problem is that if the guest stops submitting buffers, buffers might never get used until some further action, e.g. device reset. This does not seem to affect linux guests. Signed-off-by: Shirley x...@us.ibm.com Signed-off-by: Michael S. Tsirkin m...@redhat.com Applied, thanks!
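The two-index bookkeeping described above can be sketched in userspace as a small ring: one index marks where the next outstanding buffer is recorded, the other marks the oldest buffer not yet reported back to the guest. This is an assumption-laden sketch, not the vhost-net implementation; struct zc_ring, RING_SIZE and the zc_* functions are hypothetical names chosen for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8 /* hypothetical capacity; one slot is kept empty */

/* Outstanding buffers, kept in the order they were submitted. */
struct zc_ring {
    bool done[RING_SIZE]; /* set once the lower device finished its DMA */
    size_t upend_idx;     /* slot for the next submitted buffer */
    size_t done_idx;      /* oldest slot not yet reported to the guest */
};

/* Record a buffer handed to the lower device.
 * Returns the slot used, or -1 if the ring is full. */
int zc_submit(struct zc_ring *r)
{
    size_t next = (r->upend_idx + 1) % RING_SIZE;
    if (next == r->done_idx)
        return -1;
    int slot = (int)r->upend_idx;
    r->done[slot] = false;
    r->upend_idx = next;
    return slot;
}

/* Completion callback: DMA for this slot finished (any order). */
void zc_mark_done(struct zc_ring *r, int slot)
{
    r->done[slot] = true;
}

/* Report finished buffers in submission order, stopping at the first
 * one that is still pending. Returns how many were reported. */
size_t zc_reap(struct zc_ring *r)
{
    size_t n = 0;
    while (r->done_idx != r->upend_idx && r->done[r->done_idx]) {
        r->done_idx = (r->done_idx + 1) % RING_SIZE;
        n++;
    }
    return n;
}
```

Because zc_reap() stops at the first pending slot, a buffer that completes early is not reported until every buffer submitted before it has completed, which keeps guest notifications in submission order.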
Re: [PATCH repost] Fix panic in virtnet_remove
From: Michael S. Tsirkin m...@redhat.com Date: Wed, 20 Jul 2011 17:31:15 +0300 On Wed, Jul 20, 2011 at 07:26:02PM +0530, Krishna Kumar wrote: Fix a panic in virtnet_remove. unregister_netdev has already freed up the netdev (and virtnet_info) due to dev->destructor being set, while virtnet_info is still required. Remove virtnet_free altogether, and move the freeing of the per-cpu statistics from virtnet_free to virtnet_remove. Tested patch below. Signed-off-by: Krishna Kumar krkum...@in.ibm.com Also note that the crash was apparently introduced by 3fa2a1df909482cc234524906e4bd30dee3514df in net-next, so this is a net-next only patch. Stephen, was there any special reason to free the memory in the destructor like you did? Acked-by: Michael S. Tsirkin m...@redhat.com Applied.
Re: [PULL net] vhost-net: zercopy mode fixes
From: Michael S. Tsirkin m...@redhat.com Date: Fri, 22 Jul 2011 09:00:46 +0300 The following includes vhost-net fixes - both in the experimental zero copy mode. Please pull for 3.1. Thanks! Where is this "the following"? I don't see any GIT url to pull from or anything :-)
Re: [PULL net (try 2)] vhost-net: zercopy mode fixes
From: Michael S. Tsirkin m...@redhat.com Date: Fri, 22 Jul 2011 09:32:38 +0300 Fixing a corrupted pull request sent earlier. Sorry about the noise! The following includes vhost-net fixes - both in the experimental zero copy mode. Please pull for 3.1. Pulled, thanks!
Re: [RFC 0/0] Introducing a generic socket offload framework
I'm not reading any RFC without any example code, sorry.
Re: [RFC PATCH net-next v2] enable virtio_net to return bus_info in ethtool -i consistent with emulated NICs
From: r...@tardy.cup.hp.com (Rick Jones) Date: Mon, 14 Nov 2011 16:17:08 -0800 (PST) From: Rick Jones rick.jon...@hp.com Add a new .bus_name to virtio_config_ops then modify virtio_net to call through to it in an ethtool .get_drvinfo routine to report bus_info in ethtool -i output which is consistent with other emulated NICs and the output of lspci. Signed-off-by: Rick Jones rick.jon...@hp.com --- The changes to drivers/lguest/lguest_device.c, drivers/s390/kvm/kvm_virtio.c, and drivers/virtio/virtio_mmio.c code inspected only, not compiled. Applied, thanks Rick. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] macvtap: Fix macvtap_get_queue to use rxhash first
From: Krishna Kumar2 krkum...@in.ibm.com Date: Fri, 25 Nov 2011 09:39:11 +0530 Jason Wang jasow...@redhat.com wrote on 11/25/2011 08:51:57 AM: My description is not clear again :( I mean the same vhost thread: vhost thread #0 transmits packets of flow A on processor M ... vhost thread #0 moves to another processor N and starts to transmit packets of flow A Thanks for clarifying. Yes, binding vhosts to CPUs makes the incoming packet go to the same vhost each time. BTW, are you doing any binding and/or irqbalance when you run your tests? I am not running either at this time, but thought both might be useful. So are we going with this patch or are we saying that vhost binding is a requirement?
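For readers following the thread, the subject line suggests that the queue is picked from the receive hash before any other criterion, so that packets of one flow keep landing on the same queue. The sketch below illustrates that ordering in userspace. It is not the macvtap code: the function pick_queue(), its parameters, and the fallback order after the hash are assumptions made for this example.

```c
#include <stdint.h>

/* Pick a queue index for a packet (hypothetical helper).
 *
 * rxhash            flow hash of the packet, 0 if none was computed
 * rx_queue_recorded nonzero if the NIC recorded a receive queue
 * rx_queue          the recorded receive queue, if any
 * num_queues        number of queues currently available
 *
 * Returns the queue index, or -1 when there are no queues. */
int pick_queue(uint32_t rxhash, int rx_queue_recorded,
               uint16_t rx_queue, unsigned int num_queues)
{
    if (num_queues == 0)
        return -1;

    /* First choice: the flow hash, so one flow maps to one queue. */
    if (rxhash != 0)
        return (int)(rxhash % num_queues);

    /* Assumed fallback: the receive queue recorded by the NIC. */
    if (rx_queue_recorded)
        return (int)(rx_queue % num_queues);

    /* Assumed last resort: the first queue. */
    return 0;
}
```

With a stable hash per flow, the chosen queue does not depend on which processor the sending thread happens to run on, which is the property discussed in the exchange above.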
Re: [PATCH] vhost-net: Acquire device lock when releasing device
From: Sasha Levin levinsasha...@gmail.com Date: Fri, 18 Nov 2011 11:19:42 +0200 Device lock should be held when releasing a device, and specifically when calling vhost_dev_cleanup(). Otherwise, RCU complains about it: ... Cc: Michael S. Tsirkin m...@redhat.com Cc: k...@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: net...@vger.kernel.org Signed-off-by: Sasha Levin levinsasha...@gmail.com Michael et al., are you guys going to gather this fix or should I apply it directly to the net tree? Thanks.
Re: [PATCH] macvtap: Fix macvtap_get_queue to use rxhash first
From: Michael S. Tsirkin m...@redhat.com Date: Wed, 7 Dec 2011 18:10:02 +0200 On Fri, Nov 25, 2011 at 01:35:52AM -0500, David Miller wrote: From: Krishna Kumar2 krkum...@in.ibm.com Date: Fri, 25 Nov 2011 09:39:11 +0530 Jason Wang jasow...@redhat.com wrote on 11/25/2011 08:51:57 AM: My description is not clear again :( I mean the same vhost thread: vhost thread #0 transmits packets of flow A on processor M ... vhost thread #0 moves to another processor N and starts to transmit packets of flow A Thanks for clarifying. Yes, binding vhosts to CPUs makes the incoming packet go to the same vhost each time. BTW, are you doing any binding and/or irqbalance when you run your tests? I am not running either at this time, but thought both might be useful. So are we going with this patch or are we saying that vhost binding is a requirement? OK we didn't come to a conclusion so I would be inclined to merge this patch as is for 3.2, and revisit later. One question though: do these changes affect userspace in any way? For example, will this commit us to ensure that a single flow gets a unique hash even for strange configurations that transmit the same flow from multiple cpus? Once you sort this out, reply with an Acked-by: for me, thanks.
Re: [PATCH v3 REPOST] xen-netfront: delay gARP until backend switches to Connected
From: Laszlo Ersek ler...@redhat.com Date: Fri, 9 Dec 2011 12:38:58 +0100 These two together provide complete ordering. Sub-condition (1) is satisfied by pvops commit 43223efd9bfd. I don't see this commit in Linus's tree, so I doubt it's valid for me to apply this as a bug fix to my 'net' tree since the precondition pvops commit isn't upstream as far as I can tell. Where did you intend me to apply this patch, and how did you expect the dependent commit to make its way into the tree so that this fix is complete? BTW, you should always explicitly specify the answers to all the questions in the previous paragraph, otherwise (like right now) we go back and forth wasting time establishing these facts. The way to say which tree the patch is intended for is to specify it in the Subject like, f.e. [PATCH net-next v3 REPOST] ...
Re: [PATCH v3 REPOST] xen-netfront: delay gARP until backend switches to Connected
From: Ian Campbell ian.campb...@citrix.com Date: Fri, 9 Dec 2011 21:23:00 + On Fri, 2011-12-09 at 18:45 +, David Miller wrote: From: Laszlo Ersek ler...@redhat.com Date: Fri, 9 Dec 2011 12:38:58 +0100 These two together provide complete ordering. Sub-condition (1) is satisfied by pvops commit 43223efd9bfd. I don't see this commit in Linus's tree, The referenced commit is in git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git#xen/next-2.6.32, which some people call the pvops tree, but there's no reason to expect someone outside the Xen world to know that... A better reference would have been 6b0b80ca7165 in git://xenbits.xen.org/people/ianc/linux-2.6.git#upstream/dom0/backend/netback-history which is the precise branch that was flattened to make f942dc2552b8, which is the upstream commit that added netback, so this change is already in upstream. I want the commit message fixed so someone seeing the commit ID can figure out what it actually refers to.
Re: [PATCH] macvtap: Fix macvtap_get_queue to use rxhash first
From: Michael S. Tsirkin m...@redhat.com Date: Tue, 20 Dec 2011 13:15:12 +0200 On Wed, Dec 07, 2011 at 01:52:35PM -0500, David Miller wrote: Once you sort this out, reply with an Acked-by: for me, thanks. Acked-by: Michael S. Tsirkin m...@redhat.com Applied.
Re: [RFC PATCH v1 1/2] virtio_net: Pass gfp flags when allocating rx buffers.
From: Rusty Russell ru...@rustcorp.com.au Date: Thu, 05 Jan 2012 10:40:02 +1030 On Wed, 04 Jan 2012 14:52:32 -0800, Mike Waychison mi...@google.com wrote: Currently, the refill path for RX buffers will always allocate the buffers as GFP_ATOMIC, even if we are in process context. This will fail to apply memory pressure as the worker thread will not contribute to the freeing of memory. Fix this by changing add_recvbuf_small to use the gfp variant allocator, __netdev_alloc_skb_ip_align(). Signed-off-by: Mike Waychison mi...@google.com OK, this is a no-brainer. Thanks! Dave, can you pick this up? Acked-by: Rusty Russell ru...@rustcorp.com.au Applied.
Re: [PATCH] vhost-net: add module alias (v2.1)
From: Stephen Hemminger shemmin...@vyatta.com Date: Wed, 11 Jan 2012 21:30:38 -0800 Subject: vhost-net: add module alias (v2.1) By adding some module aliases, programs (or users) won't have to explicitly call modprobe. Vhost-net will always be available if built into the kernel. It does require assigning a permanent minor number for depmod to work. Also: - use C99 style initialization. - add missing entry in documentation for loop-control Signed-off-by: Stephen Hemminger shemmin...@vyatta.com ACKs, NACKs? What is happening here?
Re: [PATCH] vhost-net: add module alias (v2.1)
From: Kay Sievers kay.siev...@vrfy.org Date: Fri, 13 Jan 2012 05:19:05 +0100 On Fri, Jan 13, 2012 at 05:07, David Miller da...@davemloft.net wrote: From: Stephen Hemminger shemmin...@vyatta.com Date: Wed, 11 Jan 2012 21:30:38 -0800 Subject: vhost-net: add module alias (v2.1) By adding some module aliases, programs (or users) won't have to explicitly call modprobe. Vhost-net will always be available if built into the kernel. It does require assigning a permanent minor number for depmod to work. Also: - use C99 style initialization. - add missing entry in documentation for loop-control Signed-off-by: Stephen Hemminger shemmin...@vyatta.com ACKs, NACKs? What is happening here? In general, static minors are acceptable and very useful to make on-demand loading of kernel modules working. They should be used only for single-instance devices though, which usually means: One single static device name associated with a module. That looks all fine here, and for what it's worth: Acked-By: Kay Sievers kay.siev...@vrfy.org Ok, applied, thanks everyone.
Re: [PATCH] vhost-net: add module alias (v2.1)
From: Stephen Hemminger shemmin...@vyatta.com Date: Mon, 16 Jan 2012 07:52:36 -0800 On Mon, 16 Jan 2012 12:26:45 + Alan Cox a...@linux.intel.com wrote: ACKs, NACKs? What is happening here? I would like an Ack from Alan Cox who switched vhost-net to a dynamic minor in the first place, in commit 79907d89c397b8bc2e05b347ec94e928ea919d33. Sorry dev...@lanana.org isn't yet back from the kernel hack incident. I don't read netdev so someone needs to summarise the issue and send me a copy of the patch to look at. Alan Subject: vhost-net: add module alias (v2.1) By adding some module aliases, programs (or users) won't have to explicitly call modprobe. Vhost-net will always be available if built into the kernel. It does require assigning a permanent minor number for depmod to work. Also: - use C99 style initialization. - add missing entry in documentation for loop-control Signed-off-by: Stephen Hemminger shemmin...@vyatta.com I already applied your first patch, so you need to give me something relative to apply on top of your original one. And it also shows that you're really not generating these patches against current 'net', otherwise you'd have noticed your other patch already there.