from:"David Miller"

Re: [PATCH RFC 4/5] tun: vringfd xmit support.

2008-04-07 Thread David Miller

From: Rusty Russell <[EMAIL PROTECTED]>
Date: Mon, 7 Apr 2008 17:24:51 +1000

> On Monday 07 April 2008 15:13:44 Herbert Xu wrote:
> > On second thought, this is not going to work.  The network stack
> > can clone individual pages out of this skb and put it into a new
> > skb.  Therefore whatever scheme we come up with will either need
> > to be page-based, or add a flag to tell the network stack that it
> > can't clone those pages.
> 
> Erk... I'll put in the latter for now.   A page-level solution is not really 
> an option: if userspace hands us mmaped pages for example.

Keep in mind that the core of the TCP stack really depends
upon being able to slice and dice paged SKBs as it pleases
in order to send packets out.

In fact, it also does such splitting during SACK processing.

It really is a base requirement for efficient TSO support.
Otherwise the above operations would be so incredibly
expensive we might as well rip all of the TSO support out.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCH 2/5] /dev/vring: simple userspace-kernel ringbuffer interface.

2008-04-19 Thread David Miller

From: Rusty Russell <[EMAIL PROTECTED]>
Date: Sun, 20 Apr 2008 02:41:14 +1000

> If only there were some kind of, I don't know... summit... for kernel 
> people... 

I'm starting to disbelieve the myth that because we can discuss
technical issues on mailing lists, we should talk primarily about
process issues during the kernel summit.

There is a distinct advantage to discussing and hashing things out in
person.  You can't say "screw you, your idea sucks" when you're face
to face with the other person, whereas online it's way too easy.

Re: [6/6] [VIRTIO] net: Allow receiving SG packets

2008-04-21 Thread David Miller

From: Rusty Russell <[EMAIL PROTECTED]>
Date: Tue, 22 Apr 2008 05:06:16 +1000

> I'm not sure what the right number is here.  Say worst case is header which 
> goes over a page boundary then MAX_SKB_FRAGS in the skb, but for some reason 
> that already has a +2:
> 
> /* To allow 64K frame to be packed as single skb without frag_list */
> #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
> 
> Unless someone explains, I'll change the xmit sg to 2+MAX_SKB_FRAGS as well.

MAX_SKB_FRAGS + 1 is what you ought to need.

MAX_SKB_FRAGS is only accounting for the skb frag pages.
If you want to know how many segments skb->data might
consume as well, you have to add one.

skb->data is linear, therefore it's not possible to need
more than one scatterlist entry for it.
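
To make the arithmetic concrete, here is a minimal user-space sketch.
TOY_PAGE_SIZE and the helper name are illustrative assumptions, not
kernel definitions; the macro only mirrors the shape of the
MAX_SKB_FRAGS definition quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative page size; the real PAGE_SIZE is architecture-dependent. */
#define TOY_PAGE_SIZE 4096

/* Same shape as the MAX_SKB_FRAGS definition quoted above. */
#define TOY_MAX_SKB_FRAGS (65536 / TOY_PAGE_SIZE + 2)

/*
 * Hypothetical helper: scatterlist entries needed for one skb, assuming
 * (as stated above) that the linear skb->data area takes a single entry
 * and each page fragment takes one entry.
 */
static size_t toy_sg_entries(size_t nr_frags)
{
	assert(nr_frags <= TOY_MAX_SKB_FRAGS);
	return 1 + nr_frags;
}
```

With 4 KiB pages that works out to 18 fragments, so 19 entries in the
worst case.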

Re: [6/6] [VIRTIO] net: Allow receiving SG packets

2008-04-21 Thread David Miller

From: Rusty Russell <[EMAIL PROTECTED]>
Date: Tue, 22 Apr 2008 12:50:27 +1000

> But I was curious as to why the +2 in the MAX_SKB_FRAGS definition?

To be honest I have no idea.

When Alexey added the TSO changeset way back then, it had the
"+2", from the history-2.6 tree:

commit 80223d5186f73bf42a7e260c66c9cb9f7d8ec9cf
Author: Alexey Kuznetsov <[EMAIL PROTECTED]>
Date:   Wed Aug 28 11:52:03 2002 -0700

[NET]: Add TCP segmentation offload core infrastructure.

 ...
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a812681..9b6e6ad 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -109,7 +109,8 @@ struct sk_buff_head {
 
 struct sk_buff;
 
-#define MAX_SKB_FRAGS 6
+/* To allow 64K frame to be packed as single skb without frag_list */
+#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
 
 typedef struct skb_frag_struct skb_frag_t;
 

Re: [PATCH 5/5] Remove now unused structs from kvm_para.h

2008-06-03 Thread David Miller


You sent these patches to "kvm-owner", ie. the mailing list owner, and
not the list itself which would be plain "kvm".


Re: [PATCH 1/4] tun: Interface to query tun/tap features.

2008-07-01 Thread David Miller

From: Max Krasnyansky <[EMAIL PROTECTED]>
Date: Tue, 01 Jul 2008 21:59:02 -0700

> Dave, do you want me to put all outstanding TUN patches into a git tree so
> that you can pull them in one shot ?
> Otherwise if you're ok with applying them one by one please apply this one.
> 
> Acked-by: Max Krasnyansky <[EMAIL PROTECTED]>

I'll apply Rusty's patches after I give them a review too.

Thanks Max.

Re: [PATCH 3/3] tun: Allow GSO using virtio_net_hdr

2008-07-03 Thread David Miller

From: Rusty Russell <[EMAIL PROTECTED]>
Date: Thu, 3 Jul 2008 11:34:14 +1000

> Add a IFF_VNET_HDR flag.  This uses the same ABI as virtio_net (ie. prepending
> struct virtio_net_hdr to packets) to indicate GSO and checksum information.
> 
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

Also applied to net-next-2.6
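
For readers unfamiliar with the framing being described, a rough sketch
follows.  The struct below is a toy mirror of struct virtio_net_hdr (the
real definition lives in <linux/virtio_net.h>), and toy_frame_packet()
is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy mirror of struct virtio_net_hdr, reproduced here so the sketch
 * builds without kernel headers.
 */
struct toy_vnet_hdr {
	uint8_t  flags;       /* e.g. "checksum still needed" */
	uint8_t  gso_type;    /* 0 means no GSO */
	uint16_t hdr_len;     /* length of the protocol headers */
	uint16_t gso_size;    /* segment payload size when GSO is used */
	uint16_t csum_start;  /* where checksumming starts */
	uint16_t csum_offset; /* checksum location, relative to csum_start */
};

/*
 * Hypothetical helper: lay out "header, then packet" in buf, which is
 * the framing the patch description refers to.  Returns the number of
 * bytes used, or 0 if buf is too small.
 */
static size_t toy_frame_packet(uint8_t *buf, size_t buflen,
			       const struct toy_vnet_hdr *hdr,
			       const uint8_t *pkt, size_t pktlen)
{
	assert(buf && hdr && pkt);
	if (buflen < sizeof(*hdr) + pktlen)
		return 0;
	memcpy(buf, hdr, sizeof(*hdr));
	memcpy(buf + sizeof(*hdr), pkt, pktlen);
	return sizeof(*hdr) + pktlen;
}
```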

Re: [PATCH 1/3] tun: Interface to query tun/tap features.

2008-07-03 Thread David Miller

From: Rusty Russell <[EMAIL PROTECTED]>
Date: Thu, 3 Jul 2008 11:32:12 +1000

> The problem with introducing checksum offload and gso to tun is they
> need to set dev->features to enable GSO and/or checksumming, which is
> supposed to be done before register_netdevice(), ie. as part of
> TUNSETIFF.
 ...
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

Applied to net-next-2.6, thanks!

Re: [PATCH 2/3] tun: TUNSETFEATURES to set gso features.

2008-07-03 Thread David Miller

From: Rusty Russell <[EMAIL PROTECTED]>
Date: Thu, 3 Jul 2008 11:33:11 +1000

> ethtool is useful for setting (some) device fields, but it's
> root-only.  Finer feature control is available through a tun-specific
> ioctl.
> 
> (Includes Mark McLoughlin <[EMAIL PROTECTED]>'s fix to hold rtnl sem).
> 
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

Applied to net-next-2.6

Re: [PATCH] tun: Fix/rewrite packet filtering logic

2008-07-14 Thread David Miller

From: David Miller <[EMAIL PROTECTED]>
Date: Mon, 14 Jul 2008 22:16:02 -0700 (PDT)

> It doesn't apply cleanly to net-next-2.6, as I just tried to
> stick this into my tree.

Ignore this, I did something stupid.

Re: [PATCH] tun: Fix/rewrite packet filtering logic

2008-07-14 Thread David Miller

From: Max Krasnyansky <[EMAIL PROTECTED]>
Date: Sat, 12 Jul 2008 01:52:54 -0700

> This is on top of the latest and greatest :). Assuming virt folks are ok with
> the API this should go into 2.6.27.

Really? :-)

It doesn't apply cleanly to net-next-2.6, as I just tried to
stick this into my tree.

Re: [PATCH] tun: Fix/rewrite packet filtering logic

2008-07-22 Thread David Miller

From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Tue, 22 Jul 2008 19:41:47 -0400

> looks mostly OK, but stuff like the above should be
> 
>   (void __user *) arg
> 
> Did you check this with sparse (Documentation/sparse.txt)?

Jeff, I already added this particular patch to the tree
a week or so ago.

Re: [PATCH] tun: Fix/rewrite packet filtering logic

2008-07-22 Thread David Miller

From: Max Krasnyansky <[EMAIL PROTECTED]>
Date: Tue, 22 Jul 2008 21:45:30 -0700

> Jeff Garzik wrote:
> > David Miller wrote:
> >> Jeff, I already added this particular patch to the tree
> >> a week or so ago.
> > 
> > Yeah, later on in my queue were the fixes.
> 
> I'm not sure I'm following. What fixes ? Are you talking about fixing sparse
> warnings or something else ?

He's talking about sparse fixes.

Re: [PATCH 1/1] tun: TUNGETIFF interface to query name and flags

2008-08-15 Thread David Miller

From: Max Krasnyansky <[EMAIL PROTECTED]>
Date: Fri, 15 Aug 2008 11:00:19 -0700

> Rusty Russell wrote:
> > On Thursday 14 August 2008 00:30:16 Mark McLoughlin wrote:
> >> A very simple approach is attached; I did consider doing a TUNGETFLAGS
> >> that would return tun->flags, but I think it's nicer to have a companion
> >> to TUNGETIFF since it also allows one to query the interface name from
> >> the file descriptor.
> > 
> > This seems really sensible to me.
> > 
> > If Max acks it, I'd say Dave should merge it.
> 
> Makes perfect sense to me.
> Definitely Ack. It has zero impact on existing user and I'd be ok if this goes
>  in during .27-rc series.

I've applied Mark's patch, thanks.

Re: [PATCH] virtio_net: large tx MTU support

2008-11-26 Thread David Miller

From: Mark McLoughlin <[EMAIL PROTECTED]>
Date: Wed, 26 Nov 2008 13:58:11 +

> We don't really have a max tx packet size limit, so allow configuring
> the device with up to 64k tx MTU.
> 
> Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>

Rusty, ACK?

If so, I'll toss this into net-next-2.6, thanks!

Re: [PATCH] AF_VMCHANNEL address family for guest<->host communication.

2008-12-14 Thread David Miller

From: Gleb Natapov 
Date: Sun, 14 Dec 2008 13:50:55 +0200

> It is undesirable to use TCP/IP for this purpose since network
> connectivity may not exist between host and guest and if it exists the
> traffic can be not routable between host and guest for security reasons
> or TCP/IP traffic can be firewalled (by mistake) by unsuspecting VM user.

I don't really accept this argument, sorry.

If you can't use TCP because it might be security protected or
misconfigured, adding this new stream protocol thing is not one
bit better.  It doesn't make any sense at all.

Also, if TCP could be "misconfigured" this new thing could just as
easily be screwed up too.  And I wouldn't be surprised to see a whole
bunch of SELINUX and netfilter features proposed later for this and
then we're back to square one.

You guys really need to rethink this.  Either a stream protocol is a
workable solution to your problem, or it isn't.

And don't bring up any "virtualization is special because..."
arguments into your reply because virtualization has nothing to do
with my objections stated above.

Re: [PATCH] AF_VMCHANNEL address family for guest<->host communication.

2008-12-15 Thread David Miller

From: Gleb Natapov 
Date: Mon, 15 Dec 2008 09:48:19 +0200

> On Sun, Dec 14, 2008 at 10:44:36PM -0800, David Miller wrote:
> > You guys really need to rethink this.  Either a stream protocol is a
> > workable solution to your problem, or it isn't.
>
> Stream protocol is workable solution for us, but we need it out of band
> in regard to networking and as much zero config as possible. If we will
> use networking how can it be done without additional configuration (and
> reconfiguration can be required after migration BTW)

You miss the whole point and you also missed the part where I said
(and the one part of my comments you conveniently did NOT quote):

And don't bring up any "virtualization is special because..."
arguments into your reply because virtualization has nothing to do
with my objections stated above.

What part of that do you not understand?  Don't give me this
junk about zero config, it's not a plausible argument against
anything I said.

You want to impose a new burden onto the kernel in the form of a whole
new socket layer.  When existing ones can solve any communications
problem.

Performance is not a good argument because we have (repeatedly) made
TCP/IP go fast in just about any environment.

If you have a configuration problem, you can solve it in userspace in
a number of different ways.  Building on top of things we have and the
user need not know anything about that.

I would even be OK with features such as "permanent" links or special
attributes for devices or IP addresses that by default prevent
tampering and filtering by things like netfilter.

But not this new thing that duplicates existing functionality, no way.

Re: [PATCH] AF_VMCHANNEL address family for guest<->host communication.

2008-12-15 Thread David Miller

From: Anthony Liguori 
Date: Mon, 15 Dec 2008 09:02:23 -0600

> There is already an AF_IUCV for s390.

This is a scarecrow and irrelevant to this discussion.

And this is exactly why I asked that any arguments in this thread
avoid talking about virtualization technology and why it's "special."

This proposed patch here is asking to add new infrastructure for
hypervisor facilities that will be _ADDED_ and for which we have
complete control over.

Whereas the S390 folks have to deal with existing infrastructure which
is largely outside of their control.  So if they implement access
mechanisms for that, it's fine.

I would be doing the same thing if I added a protocol socket layer for
accessing the Niagara hypervisor virtualization channels.

Re: [PATCH] AF_VMCHANNEL address family for guest<->host communication.

2008-12-15 Thread David Miller

From: Anthony Liguori 
Date: Mon, 15 Dec 2008 14:44:26 -0600

> We want this communication mechanism to be simple and reliable as we
> want to implement the backends drivers in the host userspace with
> minimum mess.

One implication of your statement here is that TCP is unreliable.
That's absolutely not true.

> Within the guest, we need the interface to be always available and
> we need an addressing scheme that is hypervisor specific.  Yes, we
> can build this all on top of TCP/IP.  We could even build it on top
> of a serial port.  Both have their down-sides wrt reliability and
> complexity.

I don't know of any zero-copy through the hypervisor mechanisms for
serial ports, but I know we do that with the various virtualization
network devices.

> Do you have another recommendation?

I don't have to make alternative recommendations until you can
show that what we have can't solve the problem acceptably, and
TCP emphatically can.

Re: [PATCH] AF_VMCHANNEL address family for guest<->host communication.

2008-12-15 Thread David Miller

From: Anthony Liguori 
Date: Mon, 15 Dec 2008 17:01:14 -0600

> No, TCP falls under the not simple category because it requires the
> backend to have access to a TCP/IP stack.

I'm at a loss for words if you need TCP in the hypervisor, if that's
what you're implying here.

You only need it in the guest and the host, which you already have,
in the Linux kernel.  Just transport that over virtio or whatever
and be done with it.

Re: [2/2] tun: Fix sk_sleep races when attaching/detaching

2009-04-20 Thread David Miller

From: Herbert Xu 
Date: Mon, 20 Apr 2009 16:35:50 +0800

> On Thu, Apr 16, 2009 at 07:09:52PM +0800, Herbert Xu wrote:
>> 
>> tun: Fix sk_sleep races when attaching/detaching
> 
> That patch doesn't apply anymore because of contextual changes
> caused by the first patch.  Here's an update.
> 
> tun: Fix sk_sleep races when attaching/detaching

Do you think these two patches are ready to go into net-2.6
now?

Thanks.

Re: [2/2] tun: Fix sk_sleep races when attaching/detaching

2009-04-20 Thread David Miller

From: Herbert Xu 
Date: Mon, 20 Apr 2009 17:35:49 +0800

> On Mon, Apr 20, 2009 at 02:26:35AM -0700, David Miller wrote:
>> 
>> Do you think these two patches are ready to go into net-2.6
>> now?
> 
> I think so.

Great, applied, thanks Herbert.

Re: [RFC] virtio: orphan skbs if we're relying on timer to free them

2009-05-18 Thread David Miller

From: Rusty Russell 
Date: Mon, 18 May 2009 22:18:47 +0930

> We check for finished xmit skbs on every xmit, or on a timer (unless
> the host promises to force an interrupt when the xmit ring is empty).
> This can penalize userspace tasks which fill their sockbuf.  Not much
> difference with TSO, but measurable with large numbers of packets.
> 
> There are a finite number of packets which can be in the transmission
> queue.  We could fire the timer more than every 100ms, but that would
> just hurt performance for a corner case.  This seems neatest.
 ...
> Signed-off-by: Rusty Russell 

If this is so great for virtio it would also be a great idea
universally, but we don't do it.

What you're doing by orphan'ing is creating a situation where a single
UDP socket can loop doing sends and monopolize the TX queue of a
device.  The only control we have over a sender for fairness in
datagram protocols is that send buffer allocation.

I'm guilty of doing this too in the NIU driver, also because there I
lack a "TX queue empty" interrupt and this can keep TCP sockets from
getting stuck.

I think we need a generic solution to this issue because it is getting
quite common to see cases where the packets in the TX queue of a
device can sit there indefinitely.
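
The fairness concern above can be illustrated with a toy model.  This is
not kernel code: the function, its parameters, and the "whole packets"
unit are assumptions made for the sketch.  It models one datagram socket
sending in a tight loop while no TX completions arrive.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model: returns how many packets a single looping sender manages
 * to place in the device TX queue before it is stopped, either by its
 * own send buffer limit or by the queue filling up.
 */
static unsigned int toy_burst(unsigned int sndbuf_pkts,
			      unsigned int txq_pkts,
			      bool orphan_on_xmit)
{
	unsigned int charged = 0; /* packets still charged to the socket */
	unsigned int queued = 0;  /* packets sitting in the TX queue */

	while (queued < txq_pkts) {
		if (charged >= sndbuf_pkts)
			break;     /* send buffer full: the sender blocks */
		charged++;         /* charge the socket for the packet */
		queued++;          /* hand the packet to the device */
		if (orphan_on_xmit)
			charged--; /* charge dropped before completion */
	}
	assert(queued <= txq_pkts);
	return queued;
}
```

In this model, toy_burst(4, 16, false) stops at 4 packets, while
toy_burst(4, 16, true) fills all 16 slots, which is the monopolization
case described above.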

Re: [RFC] virtio: orphan skbs if we're relying on timer to free them

2009-05-21 Thread David Miller

From: Rusty Russell 
Date: Thu, 21 May 2009 16:27:05 +0930

> On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
>> What you're doing by orphan'ing is creating a situation where a single
>> UDP socket can loop doing sends and monopolize the TX queue of a
>> device.  The only control we have over a sender for fairness in
>> datagram protocols is that send buffer allocation.
> 
> Urgh, that hadn't even occurred to me.  Good point.

Now this all is predicated on this actually mattering. :-)

You could argue that the scheduler as well as the size of the
TX queue should be limiting and enforcing fairness.

Someone really needs to test this.  Just skb_orphan() every packet
at the beginning of dev_hard_start_xmit(), then run some test
program with two clients looping out UDP packets to see if one
can monopolize the device and get a significantly larger amount
of TX resources than the other.  Repeat for 3, 4, 5, etc. clients.

> I haven't thought this through properly, but how about a hack where
> we don't orphan packets if the ring is over half full?

That would also work.  And for the NIU case this would be great
because I DO have a marker bit for triggering interrupts in the TX
descriptors.  There's just no "all empty" interrupt on TX (who
designs these things? :( ).

> Then I guess we could overload the watchdog as a more general
> timer-after-no-xmit?

Yes, but it means that teardown of a socket can be delayed up to
the amount of that timer.  Factor in all of this crazy
round_jiffies() stuff people do these days and it could cause
pauses for real use cases and drive users batty.

Probably the most profitable avenue is to see if this is a real issue
after all (see above).  If we can get away with having the socket
buffer represent socket --> device space only, that's the most ideal
solution.  It will probably also improve performance a lot across the
board, especially on NUMA/SMP boxes as our TX complete events tend to
be in different places than the SKB producer.

Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit

2009-06-02 Thread David Miller

From: Patrick Ohly 
Date: Mon, 01 Jun 2009 21:47:22 +0200

> On Fri, 2009-05-29 at 23:44 +0930, Rusty Russell wrote:
>> This patch adds skb_orphan to the start of dev_hard_start_xmit(): it
>> can be premature in the NETDEV_TX_BUSY case, but that's uncommon.
> 
> Would it be possible to make the new skb_orphan() at the start of
> dev_hard_start_xmit() conditionally so that it is not executed for
> packets that are to be time stamped?
> 
> As discussed before
> (http://article.gmane.org/gmane.linux.network/121378/), the skb->sk
> socket pointer is required for sending back the send time stamp from
> inside the device driver. Calling skb_orphan() unconditionally as in
> this patch would break the hardware time stamping of outgoing packets.

Indeed, we need to check that case, at a minimum.

And there are potentially other problems.  For example, I
wonder how this interacts with the new TX MMAP af_packet support
in net-next-2.6 :-/


Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit

2009-06-02 Thread David Miller

From: Rusty Russell 
Date: Tue, 2 Jun 2009 23:38:29 +0930

> On Tue, 2 Jun 2009 04:55:53 pm David Miller wrote:
>> From: Patrick Ohly 
>> Date: Mon, 01 Jun 2009 21:47:22 +0200
>>
>> > On Fri, 2009-05-29 at 23:44 +0930, Rusty Russell wrote:
>> >> This patch adds skb_orphan to the start of dev_hard_start_xmit(): it
>> >> can be premature in the NETDEV_TX_BUSY case, but that's uncommon.
>> >
>> > Would it be possible to make the new skb_orphan() at the start of
>> > dev_hard_start_xmit() conditionally so that it is not executed for
>> > packets that are to be time stamped?
>> >
>> > As discussed before
>> > (http://article.gmane.org/gmane.linux.network/121378/), the skb->sk
>> > socket pointer is required for sending back the send time stamp from
>> > inside the device driver. Calling skb_orphan() unconditionally as in
>> > this patch would break the hardware time stamping of outgoing packets.
>>
>> Indeed, we need to check that case, at a minimum.
>>
>> And there are potentially other problems.  For example, I
>> wonder how this interacts with the new TX MMAP af_packet support
>> in net-next-2.6 :-/
> 
> I think I'll do this in the driver for now, and let's revisit doing it 
> generically later?

That might be the best course of action for the time being.
This whole area is a rat's nest.

Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit

2009-06-03 Thread David Miller

From: Rusty Russell 
Date: Thu, 4 Jun 2009 13:24:57 +0930

> On Thu, 4 Jun 2009 06:32:53 am Eric Dumazet wrote:
>> Also, taking a reference on socket for each xmit packet in flight is very
>> expensive, since it slows down receiver in __udp4_lib_lookup(). Several
>> cpus are fighting for sk->refcnt cache line.
> 
> Now we have decent dynamic per-cpu, we can finally implement bigrefs.  More 
> obvious for device counts than sockets, but perhaps applicable here as well?

It might be very beneficial for longer lasting, active, connections, but
for high connection rates it's going to be a lose in my estimation.

Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit

2009-06-03 Thread David Miller

From: Eric Dumazet 
Date: Thu, 04 Jun 2009 06:54:24 +0200

> We also can avoid the sock_put()/sock_hold() pair for each tx packet,
> to only touch sk_wmem_alloc (with appropriate atomic_sub_return() in
> sock_wfree() and atomic_dec_test in sk_free)
> 
> We could initialize sk->sk_wmem_alloc to one instead of 0, so that
> sock_wfree() could just synchronize itself with sk_free()

Excellent idea Eric.

> Patch will follow after some testing

I look forward to it :-)
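
A rough user-space sketch of the idea quoted above, using C11 atomics.
The toy_* names and the struct are illustrative assumptions, not the
kernel's struct sock or the eventual patch; the point is only that a
counter biased by one lets the last uncharge and the owner's release
agree on who destroys the socket.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a socket. */
struct toy_sock {
	atomic_int wmem_alloc; /* bytes charged for in-flight tx, plus 1 */
	bool destroyed;        /* set when the last holder lets go */
};

static void toy_sock_init(struct toy_sock *sk)
{
	/* Start at one: the extra unit belongs to the socket's owner. */
	atomic_init(&sk->wmem_alloc, 1);
	sk->destroyed = false;
}

/* Charge a queued tx packet; no separate reference count is taken. */
static void toy_charge(struct toy_sock *sk, int len)
{
	assert(len > 0);
	atomic_fetch_add(&sk->wmem_alloc, len);
}

/* TX completion: uncharge, and destroy if the owner already let go. */
static void toy_wfree(struct toy_sock *sk, int len)
{
	/* fetch_sub returns the old value; old == len means now zero. */
	if (atomic_fetch_sub(&sk->wmem_alloc, len) == len)
		sk->destroyed = true;
}

/* Owner drops the socket: destroy now only if nothing is in flight. */
static void toy_sk_free(struct toy_sock *sk)
{
	if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
		sk->destroyed = true;
}
```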

Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit

2009-07-03 Thread David Miller

From: Herbert Xu 
Date: Fri, 3 Jul 2009 15:55:30 +0800

> Calling skb_orphan like this should be forbidden.  Apart from the
> problems already raised, it is a sign that the driver is trying to
> paper over a more serious issue of not cleaning up skb's timely.
> 
> Yes skb_orphan will work for the cases where calling the skb
> destructor allows forward progress, but for the cases where you
> really need to the skb to be freed (e.g., iSCSI or Xen), this
> simply doesn't work.
> 
> So anytime someone tries to propose such a solution it is a sign
> that they have bigger problems.

Agreed, but alas we are foaming at the mouth until we have a truly
usable alternative.

In particular the case of handling a device without usable TX
completion event indications is still quite troublesome.

Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit

2009-07-03 Thread David Miller

From: Herbert Xu 
Date: Sat, 4 Jul 2009 11:08:30 +0800

> On Fri, Jul 03, 2009 at 08:02:54PM -0700, David Miller wrote:
>>
>> In particular the case of handling a device without usable TX
>> completion event indications is still quite troublesome.
> 
> Which particular devices do you have in mind?

NIU

I basically can't defer interrupts because the chip supports
per-TX-desc interrupt indications but it lacks an "all TX queue sent"
event.  So if I, say, tell it to interrupt every 1/4 of the TX queue
then up to 1/4 of the queue can have packets "stuck" in there
if TX activity all of a sudden ceases.

The only thing I've come up with to be able to mitigate interrupts is
to use an hrtimer of some sort.  But that's going to be hard to get
right, and who knows what kind of latencies will be introduced for TX
completion packet freeing unless I am very careful.

And finally this belongs in generic code, not in the NIU driver,
whatever we come up with.  Especially since my understanding is that
this is similar to what Rusty needs.
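
The "stuck" case above reduces to simple arithmetic, sketched here as a
toy model.  The function name and the ring numbers in the note below are
illustrative assumptions, not NIU driver values.

```c
#include <assert.h>

/*
 * Toy model: the device raises a TX interrupt only on every
 * `interval`-th descriptor, and there is no "queue empty" event.
 * Returns how many transmitted packets have not been signalled (and so
 * cannot yet be freed) if the sender goes idle after `sent` packets.
 */
static unsigned int toy_unsignalled(unsigned int sent, unsigned int interval)
{
	assert(interval > 0);
	return sent % interval;
}
```

For a hypothetical 256-entry ring interrupting every 64 descriptors,
going idle after 300 packets leaves 44 of them waiting for a completion
indication that never comes without a timer or further traffic.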


Re: [PATCH] net/bridge: Add 'hairpin' port forwarding mode

2009-08-13 Thread David Miller

From: "Fischer, Anna" 
Date: Thu, 13 Aug 2009 16:55:16 +

> This patch adds a 'hairpin' (also called 'reflective relay') mode
> port configuration to the Linux Ethernet bridge kernel module.
> A bridge supporting hairpin forwarding mode can send frames back
> out through the port the frame was received on.
> 
> Hairpin mode is required to support basic VEPA (Virtual
> Ethernet Port Aggregator) capabilities.
> 
> You can find additional information on VEPA here:
> http://tech.groups.yahoo.com/group/evb/
> http://www.ieee802.org/1/files/public/docs2009/new-hudson-vepa_seminar-20090514d.pdf
> http://www.internet2.edu/presentations/jt2009jul/20090719-congdon.pdf
> 
> An additional patch 'bridge-utils: Add 'hairpin' port forwarding mode'
> is provided to allow configuring hairpin mode from userspace tools.
> 
> Signed-off-by: Paul Congdon 
> Signed-off-by: Anna Fischer 

Applied to net-next-2.6
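
As a rough illustration of what hairpin mode changes, here is a toy
forwarding check in user-space C.  The struct, its field names, and
toy_should_deliver() are assumptions made for this sketch, not the
bridge module's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy bridge port. */
struct toy_port {
	int id;
	bool hairpin;    /* hairpin ("reflective relay") mode enabled */
	bool forwarding; /* port is in the forwarding state */
};

/*
 * Should a frame that arrived on `in` be sent out through `out`?
 * Without hairpin mode a bridge never reflects a frame back out of its
 * ingress port; with hairpin mode enabled on that port, it may.
 */
static bool toy_should_deliver(const struct toy_port *out,
			       const struct toy_port *in)
{
	assert(out && in);
	return (out->hairpin || out->id != in->id) && out->forwarding;
}
```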

Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit

2009-08-17 Thread David Miller

From: Herbert Xu 
Date: Sun, 5 Jul 2009 11:34:08 +0800

> Here's an even crazier idea that doesn't use dummy descriptors.
> 
> xmit(skb)
> 
>   if (TX queue contains no interrupting descriptor &&
>       qdisc is empty)
>           mark TX descriptor as interrupting
> 
>   if (TX queue now contains an interrupting descriptor &&
>       qdisc len < 2)
>           stop queue
> 
>   if (TX ring full)
>           stop queue
> 
> clean()
> 
>   do work
>   wake queue as per usual

I'm pretty sure that for normal TCP and UDP workloads, this is just
going to set the interrupt bit on the first packet that gets into the
queue, and then not in the rest.

TCP just loops over packets in the send queue, and at initial state
the qdisc will be empty.

It's very hard to get this to work as well as if we had a real
queue empty interrupt status event.

Even if you get upstream status from the protocols saying "there's
more packets coming" via some flag in the SKB, that only says what one
client feeding the TX ring is about to do.

It says nothing about other threads of control which are about to start
feeding packets to the same device.
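
The first point above can be checked with a toy model of just the
marking rule from the quoted pseudocode.  It deliberately ignores the
stop-queue steps, and it assumes a single sender, a qdisc that stays
empty, and no completions during the burst; the function name is made
up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model: a descriptor is marked "interrupting" when the TX queue
 * holds no interrupting descriptor and the qdisc is empty.  Returns how
 * many descriptors get marked when `burst` packets are queued back to
 * back.
 */
static unsigned int toy_marked_in_burst(unsigned int burst)
{
	bool irq_desc_pending = false; /* interrupting descriptor queued? */
	const bool qdisc_empty = true; /* assumed for the whole burst */
	unsigned int marked = 0;

	for (unsigned int i = 0; i < burst; i++) {
		if (!irq_desc_pending && qdisc_empty) {
			irq_desc_pending = true;
			marked++;
		}
	}
	assert(marked <= 1);
	return marked;
}
```

Under those assumptions only the first packet of the burst is marked,
however long the burst is.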

Re: [PATCH 1/4] net: skb_orphan on dev_hard_start_xmit

2009-08-18 Thread David Miller

From: Herbert Xu 
Date: Wed, 19 Aug 2009 13:19:26 +1000

> I'm in the process of repeating the same experiment with cxgb3
> which hopefully should let me turn interrupts off on descriptors
> while still reporting completion status.

Ok, I look forward to seeing your work however it turns out.

Once I see what you've done, I'll give it a spin on niu.

Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-17 Thread David Miller

From: Rusty Russell <[EMAIL PROTECTED]>
Date: Sat, 17 Mar 2007 21:33:58 +1100

> On Fri, 2007-03-16 at 13:38 -0700, Jeremy Fitzhardinge wrote:
> > David Miller wrote:
> > > Perhaps the problem can be dealt with using ELF relocations.
> > >
> > > There is another case, discussed yesterday on netdev, where run-time
> > > resolution of ELF relocations would be useful (for
> > > very-very-very-read-only variables) so if it can solve this problem
> > > too it would be nice to have a generic infrastructure for it.
> > 
> > That's an interesting idea.  Have you or anyone else looked at what it
> > would take to code up?
> > 
> > For this case, I guess you'd walk the relocs looking for references into
> > the paravirt_ops structure.  You'd need to check that was a reference
> > from an indirect jump or call instruction, which would identify a
> > patchable callsite.  The offset into the pv_ops structure would identify
> > which operation is involved.
> 
> I wrote a whole email on ways to do this, BUT...

The idea is _NOT_ that you go look for references to the members of the
paravirt_ops structure.  That would be stupid: you wouldn't be able to
use the most efficient addressing mode on a given cpu, and you'd be
patching up indirect calls and crap like that.  Just say no...

Instead you get rid of paravirt ops completely, and you call functions
whose symbol name will not resolve in the initial kernel link.

You do an initial link of the kernel, look for the unresolved symbols
in the ELF relocation tables (just like the linker does), and put
those references into a table that is used to patch things up.  You
can use standard ELF relocation code to handle this, exactly like the
code we already have for module loading in the kernel.

This idea is about 15 years old, sparc32 has been doing exactly this
via something called "btfixup" to handle the page table, TLB, and
cache differences of 15 different cpu+cache type combinations.
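
A minimal user-space sketch of the table-driven patching described above, offered as an illustration and not as the btfixup implementation. It assumes each "callsite" can be modelled as a function-pointer slot, and that the relocation table reduces to a list of (slot, symbol) pairs. Every name in it (fixup_entry, apply_fixups, the native_* and hv_* functions) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol ids standing in for unresolved symbol names. */
enum fixup_sym { SYM_IRQ_DISABLE, SYM_IRQ_ENABLE, SYM_COUNT };

typedef int (*op_fn)(void);

/* One record per callsite, like a relocation entry: where to patch
 * and which symbol the callsite refers to. */
struct fixup_entry {
	op_fn *site;
	enum fixup_sym sym;
};

static int native_irq_disable(void) { return 1; }
static int native_irq_enable(void)  { return 2; }
static int hv_irq_disable(void)     { return 101; }
static int hv_irq_enable(void)      { return 102; }

/* Two "callsites".  In a kernel these would be call instructions in
 * the text segment; here each one is just a pointer slot. */
static op_fn site_a, site_b;

/* The table a pre-link pass would build from the unresolved references. */
static const struct fixup_entry fixup_table[] = {
	{ &site_a, SYM_IRQ_DISABLE },
	{ &site_b, SYM_IRQ_ENABLE },
};

static const op_fn native_ops[SYM_COUNT] = {
	[SYM_IRQ_DISABLE] = native_irq_disable,
	[SYM_IRQ_ENABLE]  = native_irq_enable,
};

static const op_fn hv_ops[SYM_COUNT] = {
	[SYM_IRQ_DISABLE] = hv_irq_disable,
	[SYM_IRQ_ENABLE]  = hv_irq_enable,
};

/* Walk the table and resolve every recorded callsite against one
 * set of implementations. */
static void apply_fixups(const op_fn impl[SYM_COUNT])
{
	size_t i;

	assert(impl != NULL);
	for (i = 0; i < sizeof(fixup_table) / sizeof(fixup_table[0]); i++)
		*fixup_table[i].site = impl[fixup_table[i].sym];
}
```

Calling apply_fixups(native_ops) or apply_fixups(hv_ops) once at boot then fixes every callsite in a single pass, and the callers never go through an ops structure.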

> #define pv_patch(call, args...) \
>   asm volatile(":"); 
>   call(args);
>   asm volatile("8889:"
>[ stuff to put 8889,  and call in fixup section ]

Please use ELF and its powerful, clean, existing way to
do this. :-)

> > What are the netdev requirements?
> 
> Reading Ben LaHaise's (very cool!) patch, it's not clear that using
> reloc postprocessing is going to be clearer than open-coding it as he
> has done.

Ben's case can be handled in the same way.  Just do not define the
symbols, pre-link, look for the references in the relocation tables,
and run through that when you do the set_very_readonly() or
install_paravirt_ops() thing.

Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread David Miller

From: Andi Kleen <[EMAIL PROTECTED]>
Date: Mon, 19 Mar 2007 11:57:28 +0100

> On Monday 19 March 2007 00:46, Jeremy Fitzhardinge wrote:
> > Andi Kleen wrote:
> > For example, say we wanted to put a general call for sti into entry.S,
> > where its expected it won't touch any registers.  In that case, we'd
> > have a sequence like:
> >
> > push %eax
> > push %ecx
> > push %edx
> > call paravirt_cli
> > pop %edx
> > pop %ecx
> > pop %eax
> 
> This cannot right now be expressed as inline assembly in the unwinder at all 
> because there is no way to inject the push/pops into the compiler generated
> ehframe tables.
> 
> [BTW I plan to resubmit the unwinder with some changes]

Its inability to handle sequences like the above sounds to me like
a very good argument to _not_ merge the unwinder back into the tree.

To me, that unwinder is nothing but trouble.  It severely limits the
cases in which you can use special calling conventions via inline asm
(and we have done that on several occasions), and even ignoring that,
the unwinder only works half the time.

Please don't subject us to another couple months of hair-pulling only
to have Linus yank the thing out again, there are certainly more
useful things to spend time on :-)

Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread David Miller

From: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Date: Mon, 19 Mar 2007 12:10:08 -0700

> All this is doable; I'd probably end up hacking boot/compressed/relocs.c
> to generate the appropriate reloc table.  My main concern is hacking the
> kernel build process itself; I'm unsure of what it would actually take
> to implement all this.

32-bit Sparc's btfixup might be usable as a guide.

Another point worth making is that for function calls you
can fix things up lazily if you want.

So you link, build the reloc tables, then link in a *.o file that
does provide the functions in the form of stubs.  The stubs intercept
the call, and patch the callsite, then revector to the real handler.

I don't like this idea actually because it essentially means you
either:

1) Only allow one setting of the operations

OR

2) Need to have code which walks the whole reloc table anyways
   to handle settings after the first so you can revector
   everyone back to the stubs and lazy reloc properly again

In fact forget I mentioned this idea :)

As another note, I do agree with Linus about the register usage
arguments.  It is important.  I think it's been mentioned but what you
could do is save nothing (so that "sti" and "cli" are just that and
cost nothing), but the more complicated versions save and restore
enough registers to operate.

It all depends upon what you're trying to do.  For example, it's
easy to use patching to make different PTE layouts be supportable
in the same kernel image.  We do this on sparc64 since sun4v
has a different PTE layout than sun4u, you can see the code in
asm-sparc64/pgtable.h for details (search for "sun4v_*_patch")

Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-19 Thread David Miller

From: Linus Torvalds <[EMAIL PROTECTED]>
Date: Mon, 19 Mar 2007 20:18:14 -0700 (PDT)

> > > Please don't subject us to another couple months of hair-pulling only
> > > to have Linus yank the thing out again, there are certainly more
> > > useful things to spend time on :-)
> 
> Good call. Dwarf2 unwinding simply isn't worth doing. But I won't yank it 
> out, I simply won't merge it. It was more than just totally buggy code, it 
> was an inability of the people to understand that even bugfree code 
> isn't enough - you have to be able to also handle buggy data.

Thank you.

Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-04-12 Thread David Miller

From: Paul Mackerras <[EMAIL PROTECTED]>
Date: Wed, 21 Mar 2007 11:03:14 +1100

> Linus Torvalds writes:
> 
> > We should just do this natively. There's been several tests over the years 
> > saying that it's much more efficient to do sti/cli as a simple store, and 
> > handling the "oops, we got an interrupt while interrupts were disabled" as 
> > a special case.
> > 
> > I have this dim memory that ARM has done it that way for a long time 
> > because it's so expensive to do a "real" cli/sti.
> > 
> > And I think -rt does it for other reasons. It's just more flexible.
> 
> 64-bit powerpc does this now as well.

I was curious about this so I had a look.

There appear to be three pieces of state used to manage this
on powerpc, PACASOFTIRQEN(r13), PACAHARDIRQEN(r13) and the
SOFTE() in the stackframe.

Plus there is all of this complicated logic on trap entry and
exit to manage these three values properly.

local_irq_restore() doesn't look like a simple piece of code
either.  Logically it should be simple: update the software
binary state and, if enabling, see if any interrupts came in
while we were disabled so we can run them.

Given all of that, is it really cheaper than just flipping the
bit in the cpu control register? :-/
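
For reference, the "logically simple" version of software interrupt masking can be sketched as below. This is a user-space model under stated assumptions and not the powerpc code: the state is a hypothetical struct soft_irq_state, an "interrupt" is a plain function call, and a single pending flag means interrupts that arrive while masked are coalesced into one replay.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; a real kernel keeps this in per-CPU data. */
struct soft_irq_state {
	bool enabled;	/* software "interrupts enabled" flag */
	bool pending;	/* an interrupt arrived while soft-disabled */
	int handled;	/* number of times the handler actually ran */
};

static void run_handler(struct soft_irq_state *s)
{
	s->handled++;
}

/* "cli": nothing more than a store. */
static void soft_irq_disable(struct soft_irq_state *s)
{
	s->enabled = false;
}

/* Interrupt entry: if soft-disabled, only record that it happened. */
static void irq_arrives(struct soft_irq_state *s)
{
	if (!s->enabled) {
		s->pending = true;
		return;
	}
	run_handler(s);
}

/* "sti": a store, plus a replay of anything that came in meanwhile. */
static void soft_irq_enable(struct soft_irq_state *s)
{
	s->enabled = true;
	if (s->pending) {
		s->pending = false;
		run_handler(s);
	}
}
```

The sketch leaves out the trap entry/exit bookkeeping and the interaction with the real hardware enable bit, which are the parts the message above identifies as the source of the complexity.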

Re: [2/2] [NET] link_watch: Remove delay for up even when we're down

2007-05-09 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 8 May 2007 22:16:09 +1000

> [NET]: Remove link_watch delay for up even when we're down
> 
> Currently all link carrier events are delayed by up to a second
> before they're processed to prevent link storms.  This causes
> unnecessary packet loss during that interval.
> 
> In fact, we can achieve the same effect in preventing storms by
> only delaying down events and unnecessary up events.  The latter
> is defined as up events when we're already up.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Also applied, thanks Herbert.

Re: [1/2] [NET] link_watch: Move link watch list into net_device

2007-05-09 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 8 May 2007 22:13:22 +1000

> [NET] link_watch: Move link watch list into net_device
> 
> These days the link watch mechanism is an integral part of the
> network subsystem as it manages the carrier status.  So it now
> makes sense to allocate some memory for it in net_device rather
> than allocating it on demand.
> 
> In fact, this is necessary because we can't tolerate a memory
> allocation failure since that means we'd have to potentially
> throw a link up event away.
> 
> It also simplifies the code greatly.
> 
> In doing so I discovered a subtle race condition in the use
> of singleevent.  This race condition still exists (and is
> somewhat magnified) without singleevent but it's now plugged
> thanks to an smp_mb__before_clear_bit.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied, thanks Herbert.

Re: [1/2] [NET] link_watch: Move link watch list into net_device

2007-05-10 Thread David Miller

From: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Date: Thu, 10 May 2007 15:00:05 -0700

> Herbert Xu wrote:
> > [NET] link_watch: Move link watch list into net_device
> >
> > These days the link watch mechanism is an integral part of the
> > network subsystem as it manages the carrier status.  So it now
> > makes sense to allocate some memory for it in net_device rather
> > than allocating it on demand.
> 
> I think there's a problem with one of these two patches.

Yes, there are :-)

Did you catch the follow-on bug fixes?


Re: [1/2] [NET] link_watch: Move link watch list into net_device

2007-05-10 Thread David Miller

From: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Date: Thu, 10 May 2007 15:22:17 -0700

> Andrew Morton wrote:
> > Five minutes after boot is when jiffies wraps.  Are you sure it's
> > a list-screwup rather than a jiffy-wrap screwup?
> >   
> 
> 
> Hm, its suggestive, isn't it?  Apparently they've already fixed this in
> the sekret networking clubhouse, so I'll need to track it down.

I'm not so certain now that we know it's the jiffies wrap point :-)

The fixes in question are attached below and they were posted and
discussed on netdev:


commit fe47cdba83b3041e4ac1aa1418431020a4afe1e0
Author: Herbert Xu <[EMAIL PROTECTED]>
Date:   Tue May 8 23:22:43 2007 -0700

[NET] link_watch: Eliminate potential delay on wrap-around

When the jiffies wrap around or when the system boots up for the first
time, down events can be delayed indefinitely since we no longer
update linkwatch_nextevent when only urgent events are processed.

This patch fixes this by setting linkwatch_nextevent when a
wrap-around occurs.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index b5f4579..4674ae5 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -101,8 +101,10 @@ static void linkwatch_schedule_work(unsigned long delay)
 		return;
 
 	/* If we wrap around we'll delay it by at most HZ. */
-	if (delay > HZ)
+	if (delay > HZ) {
+		linkwatch_nextevent = jiffies;
 		delay = 0;
+	}
 
 	schedule_delayed_work(&linkwatch_work, delay);
 }
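
The wrap-around check in the patch above relies on unsigned arithmetic: a deadline that is already in the past shows up as a huge positive difference. The following stand-alone sketch shows just that arithmetic. HZ is given an assumed value of 100, and clamp_delay() is a hypothetical helper, not a function in link_watch.c.

```c
#include <assert.h>

#define HZ 100UL	/* assumed tick rate, for illustration only */

/*
 * Compute the delay until *nextevent.  The subtraction is done on
 * unsigned longs, so it wraps modulo 2^N: a deadline in the past
 * produces a very large value, which the "> HZ" test catches.  In
 * that case the deadline is reset to "now", as the fix above does.
 */
static unsigned long clamp_delay(unsigned long *nextevent, unsigned long now)
{
	unsigned long delay;

	assert(nextevent != 0);
	delay = *nextevent - now;
	if (delay > HZ) {
		*nextevent = now;
		delay = 0;
	}
	return delay;
}
```

A deadline slightly ahead of a counter that is about to wrap still yields a small, valid delay, because the subtraction wraps the same way the counter does.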

commit 4cba637dbb9a13706494a1c85174c8e736914010
Author: Herbert Xu <[EMAIL PROTECTED]>
Date:   Wed May 9 00:17:30 2007 -0700

[NET] link_watch: Always schedule urgent events

Urgent events may be delayed if we already have a non-urgent event
queued for that device.  This patch changes this by making sure that
an urgent event is always looked at immediately.

I've replaced the LW_RUNNING flag by LW_URGENT since whether work
is scheduled is already kept track by the work queue system.

The only complication is that we have to provide some exclusion for
the setting linkwatch_nextevent which is available in the actual
work function.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 4674ae5..a5e372b 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -26,7 +26,7 @@
 
 
 enum lw_bits {
-	LW_RUNNING = 0,
+	LW_URGENT = 0,
 };
 
 static unsigned long linkwatch_flags;
@@ -95,18 +95,41 @@ static void linkwatch_add_event(struct net_device *dev)
 }
 
 
-static void linkwatch_schedule_work(unsigned long delay)
+static void linkwatch_schedule_work(int urgent)
 {
-	if (test_and_set_bit(LW_RUNNING, &linkwatch_flags))
+	unsigned long delay = linkwatch_nextevent - jiffies;
+
+	if (test_bit(LW_URGENT, &linkwatch_flags))
 		return;
 
-	/* If we wrap around we'll delay it by at most HZ. */
-	if (delay > HZ) {
-		linkwatch_nextevent = jiffies;
+	/* Minimise down-time: drop delay for up event. */
+	if (urgent) {
+		if (test_and_set_bit(LW_URGENT, &linkwatch_flags))
+			return;
 		delay = 0;
 	}
 
-	schedule_delayed_work(&linkwatch_work, delay);
+	/* If we wrap around we'll delay it by at most HZ. */
+	if (delay > HZ)
+		delay = 0;
+
+	/*
+	 * This is true if we've scheduled it immeditately or if we don't
+	 * need an immediate execution and it's already pending.
+	 */
+	if (schedule_delayed_work(&linkwatch_work, delay) == !delay)
+		return;
+
+	/* Don't bother if there is nothing urgent. */
+	if (!test_bit(LW_URGENT, &linkwatch_flags))
+		return;
+
+	/* It's already running which is good enough. */
+	if (!cancel_delayed_work(&linkwatch_work))
+		return;
+
+	/* Otherwise we reschedule it again for immediate exection. */
+	schedule_delayed_work(&linkwatch_work, 0);
 }
 
 
@@ -123,7 +146,11 @@ static void __linkwatch_run_queue(int urgent_only)
 	 */
 	if (!urgent_only)
 		linkwatch_nextevent = jiffies + HZ;
-	clear_bit(LW_RUNNING, &linkwatch_flags);
+	/* Limit wrap-around effect on delay. */
+	else if (time_after(linkwatch_nextevent, jiffies + HZ))
+		linkwatch_nextevent = jiffies;
+
+	clear_bit(LW_URGENT, &linkwatch_flags);
 
 	spin_lock_irq(&lweventlist_lock);
 	next = lweventlist;
@@ -166,7 +193,7 @@ static void __linkwatch_run_queue(int urgent_only)
 	}
 
 	if (lweventlist)
-		linkwatch_schedule_work(linkwatch_nextev

Re: [1/2] [NET] link_watch: Move link watch list into net_device

2007-05-10 Thread David Miller

From: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Date: Thu, 10 May 2007 15:45:42 -0700

> David Miller wrote:
> > I'm not so certain now that we know it's the jiffies wrap point :-)
> >
> > The fixes in question are attached below and they were posted and
> > discussed on netdev:
> >   
> 
> Yep, this patch gets rid of my spinning thread.  I can't find this patch
> or any discussion on marc.info; is there a better netdev list archive?

I don't see it there either... let me check my mail archive...

Indeed, they were "posted" to netdev but were blocked by the vger
regexp filters on the keyword "urgent", so the postings never made it
to the list.  I removed that filter regexp so that never happens
again, sorry.

Re: [PATCH 3/3] Virtio draft IV: the net driver

2007-07-11 Thread David Miller

From: Christian Borntraeger <[EMAIL PROTECTED]>
Date: Wed, 11 Jul 2007 12:45:40 +0200

> Am Mittwoch, 4. Juli 2007 schrieb Rusty Russell:
> > +static void receive_skb(struct net_device *dev, struct sk_buff *skb,
> [...]
> > +   netif_rx(skb);
> 
> In the NAPI case, we should use netif_receive_skb, no?

NAPI doesn't make sense for virtual devices; my Sun LDOM network
driver won't use NAPI either.

It's also too cumbersome to use NAPI with the way virtualized
network drivers work (multiple ports, each with an interrupt
source, not just one) until the NAPI split patches are ported
and applied upstream and that won't be for a while.

Re: [PATCH 3/3] Virtio draft IV: the net driver

2007-07-11 Thread David Miller

From: Rusty Russell <[EMAIL PROTECTED]>
Date: Thu, 12 Jul 2007 12:21:33 +1000

> Dave, I think you're the only one (so far?) with multiple irqs.

Luckily there are known hw implementations with that issue
so I won't be weird for long :)

> It's not clear that guest-controlled interrupt mitigation is the best
> approach for virtual devices, but at the moment it doesn't hurt.

It would be nice for consistency's sake, once it is easy to do so.

Re: [PATCH 1/3] skb_partial_csum_set

2008-01-15 Thread David Miller

From: Rusty Russell <[EMAIL PROTECTED]>
Date: Tue, 15 Jan 2008 21:41:55 +1100

> Implement skb_partial_csum_set, for setting partial csums on untrusted 
> packets.
> 
> Use it in virtio_net (replacing buggy version there), it's also going
> to be used by TAP for partial csum support.
> 
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

Looks fine to me.

Acked-by: David S. Miller <[EMAIL PROTECTED]>

If you like I can merge this into my net-2.6.25 tree, or alternatively,
if it makes your life easier, you can handle it yourself.

Re: [PATCH for-3.7] vhost: fix mergeable bufs on BE hosts

2012-10-21 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Sun, 21 Oct 2012 14:49:01 +0200

> On Mon, Oct 15, 2012 at 07:55:34PM +0200, Michael S. Tsirkin wrote:
>> We copy head count to a 16 bit field,
>> this works by chance on LE but on BE
>> guest gets 0. Fix it up.
>> 
>> Signed-off-by: Michael S. Tsirkin 
>> Tested-by: Alexander Graf 
>> Cc: sta...@kernel.org
> 
> Ping. Dave, could you apply this to -net please?

Pinging me but not cc:'ing me?  That's really strange.

What if I operate by just mass deleting things where I'm
not explicitly on the To: or CC: line when I'm very backlogged?

Re: [PATCH for-3.7] vhost: fix mergeable bufs on BE hosts

2012-10-24 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Wed, 24 Oct 2012 18:24:38 +0200

> Would you like me to repost the patch?

This question is almost rhetorical.

I said I don't reliably read things I'm not explicitly CC:'d
on, therefore it's possible (and in fact, likely) I don't have
the patch in my inbox.

What do you think you should do?

Re: [PATCH repost for-3.7] vhost: fix mergeable bufs on BE hosts

2012-10-24 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Wed, 24 Oct 2012 20:37:51 +0200

> We copy head count to a 16 bit field, this works by chance on LE but on
> BE guest gets 0. Fix it up.
> 
> Signed-off-by: Michael S. Tsirkin 
> Tested-by: Alexander Graf 
> Cc: sta...@vger.kernel.org

Applied, thanks.
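
The patch itself is not quoted here, so the following is only a sketch of the general class of bug the description points at, not the vhost code. It assumes the failure comes from copying the leading bytes of a wider integer into a 16-bit field. Byte order is simulated explicitly, so the result does not depend on the host; all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Serialise a 32-bit value into bytes in the requested byte order. */
static void store32(uint8_t out[4], uint32_t v, int big_endian)
{
	int i;

	for (i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		out[i] = (uint8_t)(v >> shift);
	}
}

/* Read a 16-bit value back from bytes in the requested byte order. */
static uint16_t load16(const uint8_t in[2], int big_endian)
{
	return big_endian ? (uint16_t)((in[0] << 8) | in[1])
			  : (uint16_t)((in[1] << 8) | in[0]);
}

/*
 * Buggy pattern: copy the first two bytes of a 32-bit count into a
 * 16-bit field.  On little-endian those bytes hold the low half of
 * the value; on big-endian they hold the high half.
 */
static uint16_t copy_first_two_bytes(uint32_t count, int big_endian)
{
	uint8_t wide[4], narrow[2];

	store32(wide, count, big_endian);
	memcpy(narrow, wide, sizeof(narrow));
	return load16(narrow, big_endian);
}

/* Fixed pattern: narrow the value first, then store the narrow value. */
static uint16_t convert_then_copy(uint32_t count)
{
	assert(count <= UINT16_MAX);
	return (uint16_t)count;
}
```

With a small count such as 3, the byte copy gives 3 under the little-endian layout and 0 under the big-endian layout, which matches the "works by chance on LE" wording in the patch description.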

Re: [PATCHv2 net-next 1/8] skb: report completion status for zero copy skbs

2012-11-01 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Wed, 31 Oct 2012 12:31:06 +0200

> -void vhost_zerocopy_callback(struct ubuf_info *ubuf)
> +void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status)

If you're only reporting true/false values, even just for now,
please use 'bool' for this.

Re: [PATCHv2 net-next 1/8] skb: report completion status for zero copy skbs

2012-11-01 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Thu, 1 Nov 2012 18:16:11 +0200

> Do you think it's over-engineering, or a good idea?

Engineer what you need, not what you might need.

Re: [PATCHv3 net-next 0/8] enable/disable zero copy tx dynamically

2012-11-02 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Thu, 1 Nov 2012 21:16:17 +0200

> 
> tun supports zero copy transmit since 
> 0690899b4d4501b3505be069b9a687e68ccbe15b,
> however you can only enable this mode if you know your workload does not
> trigger heavy guest to host/host to guest traffic - otherwise you
> get a (minor) performance regression.
> This patchset addresses this problem by notifying the owner
> device when callback is invoked because of a data copy.
> This makes it possible to detect whether zero copy is appropriate
> dynamically: we start in zero copy mode, when we detect
> data copied we disable zero copy for a while.
> 
> With this patch applied, I get the same performance for
> guest to host and guest to guest both with and without zero copy tx.

Series applied, thanks Michael.
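
The "start in zero copy mode, back off for a while when a copy is detected" behaviour described in the cover letter can be sketched as a small state machine. This is an illustration under assumptions, not the vhost-net implementation: the back-off is counted in packets, the window size ZC_BACKOFF_PACKETS is invented, and zc_select()/zc_report() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define ZC_BACKOFF_PACKETS 64	/* hypothetical back-off window */

struct zc_state {
	unsigned int backoff;	/* packets left to send with zero copy off */
};

/* Decide whether the next packet should be sent with zero copy. */
static bool zc_select(struct zc_state *s)
{
	if (s->backoff > 0) {
		s->backoff--;
		return false;
	}
	return true;
}

/*
 * Completion callback for a zero copy packet.  If the stack had to
 * copy the data anyway, zero copy was not a win, so turn it off for
 * the next ZC_BACKOFF_PACKETS packets.
 */
static void zc_report(struct zc_state *s, bool data_was_copied)
{
	if (data_was_copied)
		s->backoff = ZC_BACKOFF_PACKETS;
}
```

Once the back-off window has drained, zc_select() returns true again, so the sender keeps probing whether zero copy has become worthwhile.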

Re: [PATCH 0/6] VSOCK for Linux upstreaming

2012-11-05 Thread David Miller


The big and only question is whether anyone can actually use any of
this stuff without your proprietary bits?

Re: [PATCH 0/6] VSOCK for Linux upstreaming

2012-11-05 Thread David Miller

From: David Miller 
Date: Mon, 05 Nov 2012 13:09:17 -0500 (EST)

> The big and only question is whether anyone can actually use any of
> this stuff without your proprietary bits?

And BTW vm-crosst...@vmware.com bounces, take it out of the CC: list
on all future emails.

Re: [PATCH 1/1] vhost: Remove duplicate inclusion of linux/vhost.h

2012-11-19 Thread David Miller

From: Sachin Kamat 
Date: Mon, 19 Nov 2012 16:58:28 +0530

> linux/vhost.h was included twice.
> 
> Signed-off-by: Sachin Kamat 

Michael, are you gonna take this?

Thanks.

Re: [PATCH 1/1] vhost: Remove duplicate inclusion of linux/vhost.h

2012-11-19 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Mon, 19 Nov 2012 21:49:55 +0200

> On Mon, Nov 19, 2012 at 02:18:13PM -0500, David Miller wrote:
>> From: Sachin Kamat 
>> Date: Mon, 19 Nov 2012 16:58:28 +0530
>> 
>> > linux/vhost.h was included twice.
>> > 
>> > Signed-off-by: Sachin Kamat 
>> 
>> Michael, are you gonna take this?
>> 
>> Thanks.
> 
> Pls pick it up.
> 
> Acked-by: Michael S. Tsirkin 

Applied to net-next, thanks.

Re: [PATCH] vhost-blk: Add vhost-blk support v5

2012-11-28 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Mon, 26 Nov 2012 17:14:16 +0200

> On Mon, Nov 19, 2012 at 10:26:41PM +0200, Michael S. Tsirkin wrote:
>> > 
>> > Userspace bits:
>> > -
>> > 1) LKVM
>> > The latest vhost-blk userspace bits for kvm tool can be found here:
>> > g...@github.com:asias/linux-kvm.git blk.vhost-blk
>> > 
>> > 2) QEMU
>> > The latest vhost-blk userspace prototype for QEMU can be found here:
>> > g...@github.com:asias/qemu.git blk.vhost-blk
>> > 
>> > Changes in v5:
>> > - Do not assume the buffer layout
>> > - Fix wakeup race
>> > 
>> > Changes in v4:
>> > - Mark req->status as userspace pointer
>> > - Use __copy_to_user() instead of copy_to_user() in vhost_blk_set_status()
>> > - Add if (need_resched()) schedule() in blk thread
>> > - Kill vhost_blk_stop_vq() and move it into vhost_blk_stop()
>> > - Use vq_err() instead of pr_warn()
>> > - Fail on unsupported request
>> > - Add flush in vhost_blk_set_features()
>> > 
>> > Changes in v3:
>> > - Sending REQ_FLUSH bio instead of vfs_fsync, thanks Christoph!
>> > - Check file passed by user is a raw block device file
>> > 
>> > Signed-off-by: Asias He 
>> 
>> Since there are files shared by this and vhost net
>> it's easiest for me to merge this all through the
>> vhost tree.
> 
> Hi Dave, are you ok with this proposal?

I have no problems with this, for networking parts:

Acked-by: David S. Miller 

Re: [PATCH] vhost-net: initialize zcopy packet counters

2012-12-03 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Mon, 3 Dec 2012 19:31:51 +0200

> These packet counters are used to drive the zerocopy
> selection heuristic so nothing too bad happens if they are off a bit -
> and they are also reset once in a while.
> But it's cleaner to clear them when backend is set so that
> we start in a known state.
> 
> Signed-off-by: Michael S. Tsirkin 

Applied to net-next, thanks.

Re: [PATCH net-next v3 0/3] Multiqueue support in virtio-net

2012-12-07 Thread David Miller

From: Jason Wang 
Date: Sat,  8 Dec 2012 01:04:54 +0800

> This series is an updated version (hopefully the final version) of multiqueue
> (VIRTIO_NET_F_MQ) support in the virtio-net driver. All previous comments were
> addressed; the work is based on Krishna Kumar's work to let virtio-net use
> multiple rx/tx queues for packet reception and transmission. Performance
> tests show the aggregate latency was increased greatly, but there may be some
> regression in small packet transmission. Because of this, multiqueue is disabled
> by default. A user who wants to benefit from multiqueue can use ethtool -L
> to enable the feature.
> 
> Please review and comments.
> 
> A prototype implementation of qemu-kvm support can be found in
> git://github.com/jasowang/qemu-kvm-mq.git. To start a guest with two queues,
> you can specify the queues parameter for both tap and virtio-net like:
> 
> ./qemu-kvm -netdev tap,queues=2,... -device virtio-net-pci,queues=2,...
> 
> then enable the multiqueue through ethtool by:
> 
> ethtool -L eth0 combined 2

It seems like most, if not all, of the feedback given for this series
has been addressed by Jason.

Can I get some ACKs?

Thanks.

Re: [PATCH net-next v3 0/3] Multiqueue support in virtio-net

2012-12-08 Thread David Miller

From: Jason Wang 
Date: Sat,  8 Dec 2012 01:04:54 +0800

> This series is an updated version (hopefully the final version) of multiqueue
> (VIRTIO_NET_F_MQ) support in the virtio-net driver. All previous comments were
> addressed; the work is based on Krishna Kumar's work to let virtio-net use
> multiple rx/tx queues for packet reception and transmission. Performance
> tests show the aggregate latency was increased greatly, but there may be some
> regression in small packet transmission. Because of this, multiqueue is disabled
> by default. A user who wants to benefit from multiqueue can use ethtool -L
> to enable the feature.

These changes look fine to me, applied, thanks.

Re: [PATCH 0/6] VSOCK for Linux upstreaming

2013-01-08 Thread David Miller

From: Greg KH 
Date: Tue, 8 Jan 2013 16:21:10 -0800

> On Tue, Jan 08, 2013 at 03:59:08PM -0800, George Zhang wrote:
>> 
>> * * *
>> 
>> This series of VSOCK linux upstreaming patches includes the latest update from
>> VMware to address Greg's and all others' code review comments.
> 
> Dave, you acked these patches a while ago,

Really?  I'd like to see where I did that.

Instead, what I remember doing was deferring to the feedback these
folks received, stating that ideas that the virtio people had
mentioned should be considered instead.

http://marc.info/?l=linux-netdev&m=135301515818462&w=2

So definitely NACK this code and any infrastructure you've
merged which essentially depends upon it.

Re: [Pv-drivers] [PATCH 0/6] VSOCK for Linux upstreaming

2013-01-08 Thread David Miller

From: Dmitry Torokhov 
Date: Tue, 08 Jan 2013 17:41:44 -0800

> On Tuesday, January 08, 2013 05:30:56 PM David Miller wrote:
>> From: Greg KH 
>> Date: Tue, 8 Jan 2013 16:21:10 -0800
>> 
>> > On Tue, Jan 08, 2013 at 03:59:08PM -0800, George Zhang wrote:
>> >> * * *
>> >> 
>> >> This series of VSOCK linux upstreaming patches includes the latest update from
>> >> VMware to address Greg's and all others' code review comments.
>> > 
>> > Dave, you acked these patches a while ago,
>> 
>> Really?  I'd like to see where I did that.
>> 
>> Instead, what I remember doing was deferring to the feedback these
>> folks received, stating that ideas that the virtio people had
>> mentioned should be considered instead.
>> 
>> http://marc.info/?l=linux-netdev&m=135301515818462&w=2
> 
> I believe Andy replied to Anthony's AF_VMCHANNEL post and the differences
> between the proposed solutions.

I'd much rather see a hypervisor-neutral solution than a
hypervisor-specific one, which this certainly is.

Re: [PATCH v4 2/3] net: split eth_mac_addr for better error handling

2013-01-20 Thread David Miller

From: ak...@redhat.com
Date: Sun, 20 Jan 2013 10:43:08 +0800

> From: Stefan Hajnoczi 
> 
> When we set mac address, software mac address in system and hardware mac
> address all need to be updated. Current eth_mac_addr() doesn't allow
> callers to implement error handling nicely.
> 
> This patch split eth_mac_addr() to prepare part and real commit part,
> then we can prepare first, and try to change hardware address, then do
> the real commit if hardware address is set successfully.
> 
> Signed-off-by: Stefan Hajnoczi 
> Signed-off-by: Amos Kong 

This patch doesn't apply to net-next.
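
The prepare/commit split that the patch description argues for can be sketched in user space as follows. This models the pattern only. struct fake_netdev, the hw_fails test knob, and every function name are hypothetical, and the validation rules (reject multicast and all-zero addresses) are an assumption about what a prepare step would check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical device model: a software copy and a hardware copy. */
struct fake_netdev {
	uint8_t sw_addr[MAC_LEN];
	uint8_t hw_addr[MAC_LEN];
	bool hw_fails;	/* test knob: make hardware programming fail */
};

/* Prepare: validate only, change nothing. */
static int mac_prepare(const uint8_t addr[MAC_LEN])
{
	static const uint8_t zero[MAC_LEN];

	if (addr[0] & 1)			/* multicast bit set */
		return -1;
	if (memcmp(addr, zero, MAC_LEN) == 0)	/* all-zero address */
		return -1;
	return 0;
}

/* Program the "hardware"; this step is allowed to fail. */
static int hw_set_mac(struct fake_netdev *dev, const uint8_t addr[MAC_LEN])
{
	if (dev->hw_fails)
		return -1;
	memcpy(dev->hw_addr, addr, MAC_LEN);
	return 0;
}

/* Commit: update the software copy; cannot fail. */
static void mac_commit(struct fake_netdev *dev, const uint8_t addr[MAC_LEN])
{
	memcpy(dev->sw_addr, addr, MAC_LEN);
}

/* Driver-side sequence: prepare, touch hardware, then commit. */
static int set_mac(struct fake_netdev *dev, const uint8_t addr[MAC_LEN])
{
	int err = mac_prepare(addr);

	if (err)
		return err;
	err = hw_set_mac(dev, addr);
	if (err)
		return err;	/* software address is left untouched */
	mac_commit(dev, addr);
	return 0;
}
```

Because the commit step runs only after the hardware accepted the address, a hardware failure needs no rollback of the software state.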

Re: [PATCH v5 0/3] make mac programming for virtio net more robust

2013-01-21 Thread David Miller

From: Amos Kong 
Date: Mon, 21 Jan 2013 19:17:20 +0800

> Currently mac is programmed byte by byte. This means that we
> have an intermediate step where mac is wrong.
> 
> Third patch introduced a new vq control command to set mac
> address, it's atomic.
> 
> V2: check return of sending command, delay eth_mac_addr()
> V3: restore software address when fail to set hardware address
> V4: split eth_mac_addr, fix error handle
> V5: rebase patches to net-next tree

I'll apply this series, thanks.

Re: [PATCH V8 1/3] virtio-net: fix the set affinity bug when CPU IDs are not consecutive

2013-01-26 Thread David Miller

From: Wanlong Gao 
Date: Fri, 25 Jan 2013 17:51:29 +0800

> As Michael mentioned, set affinity and select queue will not work very
> well when CPU IDs are not consecutive; this can happen with hot unplug.
> Fix this bug by traversing the online CPUs and creating a per-CPU variable
> to find the mapping from CPU to the preferable virtual-queue.
> 
> Cc: Rusty Russell 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: Eric Dumazet 
> Cc: "David S. Miller" 
> Cc: virtualization@lists.linux-foundation.org
> Cc: net...@vger.kernel.org
> Signed-off-by: Wanlong Gao 
> Acked-by: Michael S. Tsirkin 

Applied.
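
To illustrate why non-consecutive CPU IDs matter, here is a rough userspace
sketch of building a CPU-to-queue table by walking only the online CPUs. The
bitmask representation, TOY_NR_CPUS, and toy_map_cpus_to_queues() are
assumptions made for this example; the wrap-around policy is also an
assumption and is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 16
#define TOY_NO_QUEUE (-1)

/*
 * Hypothetical sketch: walk the online CPUs in ascending order and hand out
 * queue indices 0, 1, 2, ... so that gaps in the CPU ID space do not matter.
 * online_mask has bit N set when CPU N is online.
 */
static void toy_map_cpus_to_queues(uint32_t online_mask, int nr_queues,
				   int cpu_to_queue[TOY_NR_CPUS])
{
	int next = 0;

	assert(nr_queues > 0);
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!(online_mask & (UINT32_C(1) << cpu))) {
			cpu_to_queue[cpu] = TOY_NO_QUEUE;
			continue;
		}
		cpu_to_queue[cpu] = next;
		next = (next + 1) % nr_queues;  /* wrap if more CPUs than queues */
	}
}
```

With CPUs 0, 2 and 5 online, indexing queues directly by CPU ID would skip
queue 1 and overrun a three-queue device, while the table above assigns
queues 0, 1 and 2.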

Re: [PATCH V8 2/3] virtio-net: split out clean affinity function

2013-01-26 Thread David Miller

From: Wanlong Gao 
Date: Fri, 25 Jan 2013 17:51:30 +0800

> Split out the clean affinity function to virtnet_clean_affinity().
> 
> Cc: Rusty Russell 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: Eric Dumazet 
> Cc: "David S. Miller" 
> Cc: virtualization@lists.linux-foundation.org
> Cc: net...@vger.kernel.org
> Signed-off-by: Wanlong Gao 
> Acked-by: Michael S. Tsirkin 

Applied.

Re: [PATCH V8 3/3] virtio-net: reset virtqueue affinity when doing cpu hotplug

2013-01-26 Thread David Miller

From: Wanlong Gao 
Date: Fri, 25 Jan 2013 17:51:31 +0800

> Add a cpu notifier to virtio-net, so that we can reset the
> virtqueue affinity when cpu hotplug happens. It improves
> performance by enabling or disabling the virtqueue
> affinity after cpu hotplug.
> 
> Cc: Rusty Russell 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: Eric Dumazet 
> Cc: "David S. Miller" 
> Cc: virtualization@lists.linux-foundation.org
> Cc: net...@vger.kernel.org
> Signed-off-by: Wanlong Gao 
> Acked-by: Michael S. Tsirkin 

Applied.

Re: [PATCH 0/8] drivers/net: Remove unnecessary alloc/OOM messages

2013-02-04 Thread David Miller

From: Joe Perches 
Date: Sun,  3 Feb 2013 19:28:07 -0800

> Remove all the OOM messages that follow kernel alloc
> failures as there is already a generic equivalent to
> these messages in the mm subsystem.
> 
> Joe Perches (8):
>   caif: Remove unnecessary alloc/OOM messages
>   can: Remove unnecessary alloc/OOM messages
>   ethernet: Remove unnecessary alloc/OOM messages, alloc cleanups
>   drivers: net: usb: Remove unnecessary alloc/OOM messages
>   wan: Remove unnecessary alloc/OOM messages
>   wimax: Remove unnecessary alloc/OOM messages, alloc cleanups
>   wireless: Remove unnecessary alloc/OOM messages, alloc cleanups
>   drivers:net:misc: Remove unnecessary alloc/OOM messages

Series applied, thanks Joe.

Re: [Pv-drivers] [PATCH 0/1] VM Sockets for Linux upstreaming

2013-02-08 Thread David Miller

From: Dmitry Torokhov 
Date: Fri, 8 Feb 2013 17:20:44 -0800

> Hi David,
> 
> On Wed, Feb 06, 2013 at 04:23:55PM -0800, Andy King wrote:
>> In an effort to improve the out-of-the-box experience with Linux kernels for
>> VMware users, VMware is working on readying the VM Sockets (VSOCK, formerly
>> VMCI Sockets) (vsock) kernel module for inclusion in the Linux kernel. The
>> purpose of this post is to acquire feedback on the vsock kernel module.
>> 
>> Unlike previous submissions, where the new socket family was entirely reliant
>> on VMware's VMCI PCI device (and thus VMware's hypervisor), VM Sockets is now
>> completely[1] separated out into two parts, each in its own module:
>> 
>> o Core socket code, which is transport-neutral and invokes transport
>>   callbacks to communicate with the hypervisor.  This is vsock.ko.
>> o A VMCI transport, which communicates over VMCI with the VMware hypervisor.
>>   This is vmw_vsock_vmci_transport.ko, and it registers with the core module
>>   as a transport.
>> 
>> This should provide a path to introducing additional transports, for example
>> virtio, with the ultimate goal being to make this new socket family
>> hypervisor-neutral.
> 
> As Andy mentioned in another e-mail, we would very much like to get
> vsock into the 3.9 release. Now that it is split into hypervisor-neutral
> and transport parts, are there any high-level issues that we need to
> resolve before the code can be accepted?

I have no idea, I haven't gotten to reviewing your changes yet, and
I will do so at a time of my own choosing.  Pressing me about the matter
is unlikely to make me review things any faster, and in fact will have
the opposite effect.

Therefore, just be patient like everyone else is.

Re: [PATCH 0/1] VM Sockets for Linux upstreaming

2013-02-10 Thread David Miller

From: Andy King 
Date: Wed,  6 Feb 2013 16:23:55 -0800

> In an effort to improve the out-of-the-box experience with Linux kernels for
> VMware users, VMware is working on readying the VM Sockets (VSOCK, formerly
> VMCI Sockets) (vsock) kernel module for inclusion in the Linux kernel. The
> purpose of this post is to acquire feedback on the vsock kernel module.
> 
> Unlike previous submissions, where the new socket family was entirely reliant
> on VMware's VMCI PCI device (and thus VMware's hypervisor), VM Sockets is now
> completely[1] separated out into two parts, each in its own module:
> 
> o Core socket code, which is transport-neutral and invokes transport
>   callbacks to communicate with the hypervisor.  This is vsock.ko.
> o A VMCI transport, which communicates over VMCI with the VMware hypervisor.
>   This is vmw_vsock_vmci_transport.ko, and it registers with the core module
>   as a transport.
> 
> This should provide a path to introducing additional transports, for example
> virtio, with the ultimate goal being to make this new socket family
> hypervisor-neutral.

Applied, thanks.

Re: [PATCH 0/4] Minor vSockets fixes

2013-02-18 Thread David Miller

From: Andy King 
Date: Mon, 18 Feb 2013 08:04:09 -0800

> Minor vSockets fixes, two of which were reported on LKML.

Series applied, thanks.

Re: [PATCH] vhost_net: remove tx polling state

2013-03-07 Thread David Miller

From: Jason Wang 
Date: Thu,  7 Mar 2013 12:31:56 +0800

> After commit 2b8b328b61c799957a456a5a8dab8cc7dea68575 (vhost_net: handle polling
> errors when setting backend), we in fact track the polling state through
> poll->wqh, so there's no need to duplicate the work with an extra
> vhost_net_polling_state. So this patch removes it and makes the code simpler.
> 
> Signed-off-by: Jason Wang 

Can I get an ACK or two from some VHOST folks?

Re: [PATCH] VSOCK: Split vm_sockets.h into kernel/uapi

2013-03-08 Thread David Miller

From: David Howells 
Date: Fri, 08 Mar 2013 01:09:18 +

> Greg KH  wrote:
> 
>> David, is there any rush to get stuff like this into 3.9 for any
>> uapi-type changes, or can it just wait for 3.10?
> 
> Not especially.  It won't appear in userspace due to the __KERNEL__ guards.

I've applied this to net-next, thanks.

Re: [PATCHv3 vringh] caif_virtio: Introduce caif over virtio

2013-03-17 Thread David Miller

From: Erwan Yvin 
Date: Fri, 15 Mar 2013 10:42:17 +0100

> caif-virtio is going to replace caif-shm.
> This patch should be merged in Rusty's tree (vringh)
> because it depends on the vringh wrapper.

Feel free to add my:

Acked-by: David S. Miller 

Re: [PATCH net] vhost/net: fix heads usage of ubuf_info

2013-03-17 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Sun, 17 Mar 2013 14:46:09 +0200

> The ubuf info allocator uses the guest-controlled head as an index,
> so a malicious guest could put the same head entry in the ring twice,
> and we will get two callbacks on the same value.
> To fix, use upend_idx, which is guaranteed to be unique.
> 
> Reported-by: Rusty Russell 
> Signed-off-by: Michael S. Tsirkin 

Applied and queued up for -stable, thanks.

And thankfully you got the stable URL wrong, please do not CC:
networking patches to stable, just make sure I apply them and in
your post-commit text explicitly ask me to queue it up to my
-stable queue.

Thanks.
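
The hazard in the quoted fix, indexing host-side state by a value the guest
controls, can be shown with a small userspace sketch. The table layout and the
toy_* names are hypothetical; only the idea of preferring a host-owned counter
(upend_idx in the quoted text) over the guest-supplied head comes from the
thread.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING 8

/* Hypothetical completion-tracking table, one slot per in-flight buffer. */
struct toy_ubufs {
	bool in_flight[TOY_RING];
	unsigned int upend_idx;   /* host-owned counter, advanced by the host only */
};

/*
 * Unsafe variant: the slot is chosen by a guest-supplied head value.
 * Returns false when the slot was already in use, i.e. two callbacks
 * would later fire for the same slot.
 */
static bool toy_track_by_head(struct toy_ubufs *u, unsigned int guest_head)
{
	unsigned int slot = guest_head % TOY_RING;
	bool fresh = !u->in_flight[slot];

	u->in_flight[slot] = true;
	return fresh;
}

/* Safer variant: the slot comes from the host-owned counter. */
static bool toy_track_by_upend(struct toy_ubufs *u)
{
	unsigned int slot = u->upend_idx;
	bool fresh = !u->in_flight[slot];

	u->in_flight[slot] = true;
	u->upend_idx = (u->upend_idx + 1) % TOY_RING;
	return fresh;
}
```

A guest that submits head 3 twice makes the first variant reuse a live slot;
the second variant hands out distinct slots no matter what the guest submits,
as long as fewer than TOY_RING buffers are outstanding.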


Re: [PATCH] vhost_net: remove tx polling state

2013-04-11 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Thu, 11 Apr 2013 10:24:30 +0300

> On Thu, Apr 11, 2013 at 02:50:48PM +0800, Jason Wang wrote:
>> After commit 2b8b328b61c799957a456a5a8dab8cc7dea68575 (vhost_net: handle polling
>> errors when setting backend), we in fact track the polling state through
>> poll->wqh, so there's no need to duplicate the work with an extra
>> vhost_net_polling_state. So this patch removes it and makes the code simpler.
>> 
>> This patch also removes all the tx starting/stopping code in the tx path,
>> according to Michael's suggestion.
>> 
>> Netperf tests show almost the same result in the stream test, but improvements
>> on TCP_RR tests (both zerocopy and copy), especially in low-load cases.
>> 
>> Tested between multiqueue kvm guest and external host with two direct
>> connected 82599s.
 ...
>> Signed-off-by: Jason Wang 
> 
> Less code and better speed, what's not to like.
> Davem, could you pick this up for 3.10 please?
> 
> Acked-by: Michael S. Tsirkin 

Applied to net-next, thanks everyone.

Re: [PATCH] vhost-scsi: Depend on NET for memcpy_fromiovec

2013-05-15 Thread David Miller

From: Rusty Russell 
Date: Thu, 16 May 2013 09:05:38 +0930

> memcpy_fromiovec() has nothing to do with networking: that was just the
> first user.  Note that crypto/algif_skcipher.c also uses it.  The
> obvious answer is to move it into lib/.

+1

Re: [PATCH] vhost-scsi: Depend on NET for memcpy_fromiovec

2013-05-16 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Thu, 16 May 2013 09:46:21 +0300

> On Wed, May 15, 2013 at 08:10:55PM -0700, David Miller wrote:
>> From: Rusty Russell 
>> Date: Thu, 16 May 2013 09:05:38 +0930
>> 
>> > memcpy_fromiovec() has nothing to do with networking: that was just the
>> > first user.  Note that crypto/algif_skcipher.c also uses it.  The
>> > obvious answer is to move it into lib/.
>> 
>> +1
> 
> Rusty sent a patch that does this:
> http://patchwork.ozlabs.org/patch/244207/
> 
> David, looks like you weren't CC'd.
> If you agree could you please Ack that patch and then I can merge it
> through the vhost tree?
> Or if you prefer merge it directly and I'll sort out the dependencies...

Acked-by: David S. Miller 

Re: [PATCH V2] virtio_net: enable napi for all possible queues during open

2013-05-23 Thread David Miller

From: Jason Wang 
Date: Wed, 22 May 2013 14:03:58 +0800

> Commit 55257d72bd1c51f25106350f4983ec19f62ed1fa (virtio-net: fill only rx
> queues which are being used) only does the napi enabling during open for
> curr_queue_pairs. This will break multiqueue receiving since napi of new queues
> was still disabled after changing the number of queues.
> 
> This patch fixes this by enabling napi for all possible queues during open.
> 
> Cc: Sasha Levin 
> Signed-off-by: Jason Wang 

Applied, thanks.

Re: [PATCH] vhost_net: clear msg.control for non-zerocopy case during tx

2013-06-10 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Wed, 5 Jun 2013 12:02:52 +0300

> On Wed, Jun 05, 2013 at 03:40:46PM +0800, Jason Wang wrote:
>> When we decide not to use zero-copy, msg.control should be set to NULL; otherwise
>> macvtap/tap may set zerocopy callbacks, which may wrongly decrease the kref of
>> ubufs.
>> 
>> The bug was introduced by commit cedb9bdce099206290a2bdd02ce47a7b253b6a84
>> (vhost-net: skip head management if no outstanding).
>> 
>> This solves the following warnings:
>> 
>> WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
>> Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
>> CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
>> Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
>> a0198323 88007c9ebd08 81796b73 88007c9ebd48
>> 8103d66b 7b773e20 8800779f 8800779f43f0
>> 8800779f8418 015c 0062 88007c9ebd58
>> Call Trace:
>> [] dump_stack+0x19/0x1e
>> [] warn_slowpath_common+0x6b/0xa0
>> [] warn_slowpath_null+0x15/0x20
>> [] handle_tx+0x477/0x4b0 [vhost_net]
>> [] handle_tx_kick+0x10/0x20 [vhost_net]
>> [] vhost_worker+0xfe/0x1a0 [vhost_net]
>> [] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
>> [] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
>> [] kthread+0xc6/0xd0
>> [] ? kthread_freezable_should_stop+0x70/0x70
>> [] ret_from_fork+0x7c/0xb0
>> [] ? kthread_freezable_should_stop+0x70/0x70
>> 
>> Signed-off-by: Jason Wang 
> 
> Good catch.
> 
> Acked-by: Michael S. Tsirkin 
> 
> This needs to go into stable as well.

Applied and queued up for -stable.

Re: [PATCH net] vhost-net: fix use-after-free in vhost_net_flush

2013-06-24 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Thu, 20 Jun 2013 14:48:13 +0300

> vhost_net_ubuf_put_and_wait has a confusing name:
> it will actually also free its argument.
> Thus since commit 1280c27f8e29acf4af2da914e80ec27c3dbd5c01

Never reference commits only by SHA1 ID, it is never sufficient.

Always provide, after the SHA1 ID, in parentheses, the header line
from the commit message.

To be honest, I'm kind of tired of telling people they need to do
this over and over again.

Maybe people keep forgetting because the reason why this is an issue
hasn't really sunk in.

If the patch you reference got backported into another tree, it will
not have the SHA1 ID, and therefore someone reading the "fix" won't
be able to find the fault causing change without going through a lot
of trouble.  By providing the commit header line you remove that
problem altogether, no ambiguity is possible.

Re: [PATCHv2] vhost-net: fix use-after-free in vhost_net_flush

2013-06-25 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Tue, 25 Jun 2013 17:29:46 +0300

> vhost_net_ubuf_put_and_wait has a confusing name:
> it will actually also free its argument.
> Thus since commit 1280c27f8e29acf4af2da914e80ec27c3dbd5c01
> "vhost-net: flush outstanding DMAs on memory change"
> vhost_net_flush tries to use the argument after passing it
> to vhost_net_ubuf_put_and_wait, which results
> in a use after free.
> To fix, don't free the argument in vhost_net_ubuf_put_and_wait;
> add a new API for callers that want to free ubufs.
> 
> Acked-by: Asias He 
> Acked-by: Jason Wang 
> Signed-off-by: Michael S. Tsirkin 

This doesn't apply cleanly to the 'net' tree, please fix this up
and resubmit.

Thanks.

Re: [RFC 2/5] VSOCK: Introduce virtio-vsock-common.ko

2013-06-28 Thread David Miller

From: Asias He 
Date: Thu, 27 Jun 2013 16:00:01 +0800

> +static void
> +virtio_transport_recv_dgram(struct sock *sk,
> + struct virtio_vsock_pkt *pkt)
 ...
> + memcpy(skb->data, pkt, sizeof(*pkt));
> + memcpy(skb->data + sizeof(*pkt), pkt->buf, pkt->len);

Are you sure this is right?

Shouldn't you be using "sizeof(struct virtio_vsock_hdr)" instead of
"sizeof(*pkt)".  'pkt' is "struct virtio_vsock_pkt" and has all kinds
of meta-data you probably don't mean to include in the SKB.
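
The distinction raised here can be sketched with hypothetical structures.
Neither struct toy_hdr nor struct toy_pkt reflects the real virtio_vsock
layouts; they only model a wire header embedded in a larger host-side wrapper,
which is the situation the review comment describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header; the field list is illustrative only. */
struct toy_hdr {
	uint32_t src_port;
	uint32_t dst_port;
	uint32_t len;
};

/* Hypothetical in-memory wrapper: wire header plus host-side bookkeeping. */
struct toy_pkt {
	struct toy_hdr hdr;
	void *owner;        /* bookkeeping that must not reach the receiver */
	uint8_t *buf;       /* payload pointer */
	uint32_t len;       /* payload length */
};

/*
 * Copy only the wire header and the payload into dst.
 * Returns the number of bytes written, or 0 if dst is too small.
 */
static size_t toy_fill(uint8_t *dst, size_t cap, const struct toy_pkt *pkt)
{
	size_t need = sizeof(pkt->hdr) + pkt->len;

	assert(dst && pkt);
	if (need > cap)
		return 0;
	memcpy(dst, &pkt->hdr, sizeof(pkt->hdr));          /* not sizeof(*pkt) */
	memcpy(dst + sizeof(pkt->hdr), pkt->buf, pkt->len);
	return need;
}
```

Using sizeof(*pkt) for the first copy would also copy the owner and buf
pointers into the receiver's buffer and shift the payload offset.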

Re: [PATCH next] xen: Use more current logging styles

2013-07-01 Thread David Miller

From: Ian Campbell 
Date: Fri, 28 Jun 2013 08:59:50 +0100

> On Thu, 2013-06-27 at 21:57 -0700, Joe Perches wrote:
>> Instead of mixing printk and pr_ forms,
>> just use pr_
>> 
>> Miscellaneous changes around these conversions:
>> 
>> Add a missing newline to avoid message interleaving,
>> coalesce formats, reflow modified lines to 80 columns.
>> 
>> Signed-off-by: Joe Perches 
> 
> Acked-by: Ian Campbell 

Applied.

Re: [PATCH net] virtio-net: fix the race between channels setting and refill

2013-07-03 Thread David Miller

From: Jason Wang 
Date: Wed,  3 Jul 2013 20:15:52 +0800

> Commit 55257d72bd1c51f25106350f4983ec19f62ed1fa (virtio-net: fill only rx queues
> which are being used) tries to refill on demand when changing the number of
> channels by calling try_refill_recv() directly. This may race with:
> 
> - the refill work, which may do the refill at the same time
> - the try_refill_recv() called in bh, since napi was not disabled
> 
> This may lead the guest to complain while setting channels:
> 
> virtio_net virtio0: input.1:id 0 is not a head!
> 
> Solve this issue by scheduling a refill work which can guarantee the
> serialization of refill.
> 
> Cc: Sasha Levin 
> Cc: Rusty Russell 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 

Michael, please review.

Thanks.

Re: [PATCHv3] vhost-net: fix use-after-free in vhost_net_flush

2013-07-09 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Sun, 7 Jul 2013 14:26:53 +0300

> vhost_net_ubuf_put_and_wait has a confusing name:
> it will actually also free its argument.
> Thus since commit 1280c27f8e29acf4af2da914e80ec27c3dbd5c01
> "vhost-net: flush outstanding DMAs on memory change"
> vhost_net_flush tries to use the argument after passing it
> to vhost_net_ubuf_put_and_wait, which results
> in a use after free.
> To fix, don't free the argument in vhost_net_ubuf_put_and_wait;
> add a new API for callers that want to free ubufs.
> 
> Acked-by: Asias He 
> Acked-by: Jason Wang 
> Signed-off-by: Michael S. Tsirkin 

Applied and queued up for -stable.
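
The shape of this fix, separating "drop the reference and wait" from "free",
can be modelled in userspace. The sketch below is an assumption-laden toy: the
struct, the single-holder refcount, and all toy_* names are hypothetical, and
the real code waits on outstanding DMAs rather than asserting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for the ubuf reference. */
struct toy_ubufs {
	int refs;
};

static struct toy_ubufs *toy_ubufs_alloc(void)
{
	struct toy_ubufs *u = malloc(sizeof(*u));

	if (u)
		u->refs = 1;
	return u;
}

/* Drop the caller's reference and "wait" for zero; does NOT free. */
static void toy_put_and_wait(struct toy_ubufs *u)
{
	u->refs--;
	assert(u->refs == 0);   /* in this toy there are no other holders */
}

/* For callers that are done with the object: wait, then free. */
static void toy_put_wait_and_free(struct toy_ubufs *u)
{
	toy_put_and_wait(u);
	free(u);
}

/* Flush path: waits, then keeps using the same object by re-arming it. */
static void toy_flush(struct toy_ubufs *u)
{
	toy_put_and_wait(u);
	u->refs = 1;            /* safe only because put_and_wait did not free */
}
```

If toy_put_and_wait() freed its argument, the re-arm in toy_flush() would
write to freed memory, which is the use after free the quoted patch describes.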

Re: [PATCH] virtio-net: put virtio net header inline with data

2013-07-09 Thread David Miller

From: Rusty Russell 
Date: Tue, 09 Jul 2013 17:38:51 +0930

> If you convince DaveM, I won't object :)

Simplifications are great, but not when the merge window opens up.

Sorry, this isn't appropriate now.

Re: [PATCH] virtio-net: put virtio net header inline with data

2013-07-16 Thread David Miller

From: Rusty Russell 
Date: Mon, 15 Jul 2013 11:13:25 +0930

> From: Michael S. Tsirkin 
> 
> For small packets we can simplify xmit processing
> by linearizing buffers with the header:
> most packets seem to have enough head room
> we can use for this purpose.
> Since existing hypervisors require that header
> is the first s/g element, we need a feature bit
> for this.
> 
> Signed-off-by: Michael S. Tsirkin 
> Signed-off-by: Rusty Russell 

I really think this has to wait until the next merge window, sorry.

Please resubmit this when I open net-next back up, thanks.

Re: [PATCH] virtio-net: put virtio net header inline with data

2013-07-16 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Wed, 17 Jul 2013 08:00:32 +0300

> On Tue, Jul 16, 2013 at 12:33:26PM -0700, David Miller wrote:
>> From: Rusty Russell 
>> Date: Mon, 15 Jul 2013 11:13:25 +0930
>> 
>> > From: Michael S. Tsirkin 
>> > 
>> > For small packets we can simplify xmit processing
>> > by linearizing buffers with the header:
>> > most packets seem to have enough head room
>> > we can use for this purpose.
>> > Since existing hypervisors require that header
>> > is the first s/g element, we need a feature bit
>> > for this.
>> > 
>> > Signed-off-by: Michael S. Tsirkin 
>> > Signed-off-by: Rusty Russell 
>> 
>> I really think this has to wait until the next merge window, sorry.
>> 
>> Please resubmit this when I open net-next back up, thanks.
> 
> I assumed since -rc1 is out net-next is already open?

-rc1 being released never makes net-next open.  Instead, I explicitly
open it up at some point in time after -rc1 when I feel that things
have settled down enough.

And when that happens, I announce so here.

So you have to follow my announcements here on netdev to know
when net-next is actually open.

Re: [PATCH V3 0/3] networking: Use ETH_ALEN where appropriate

2013-08-02 Thread David Miller

From: Joe Perches 
Date: Thu,  1 Aug 2013 16:17:46 -0700

> Convert the uses of mac addresses to ETH_ALEN so
> it's easier to find and verify where mac addresses
> need to be __aligned(2)

Series applied to net-next, thanks.
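
As a rough illustration of why a named length constant and 2-byte alignment go
together, the sketch below declares addresses with an explicit length and
alignment and compares them as 16-bit words. TOY_ETH_ALEN, TOY_ALIGNED2, and
toy_mac_equal() are stand-ins invented for this example; the attribute syntax
assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6   /* same value as the kernel's ETH_ALEN */

/* GCC/Clang attribute; assumed available in this sketch. */
#define TOY_ALIGNED2 __attribute__((aligned(2)))

struct toy_frame_hdr {
	unsigned char dst[TOY_ETH_ALEN] TOY_ALIGNED2;
	unsigned char src[TOY_ETH_ALEN] TOY_ALIGNED2;
};

/*
 * Compare two addresses as three 16-bit words.  memcpy keeps this sketch
 * free of undefined behaviour; code that dereferences u16 pointers
 * directly is what actually needs the 2-byte alignment.
 */
static bool toy_mac_equal(const unsigned char *a, const unsigned char *b)
{
	uint16_t wa[3], wb[3];

	memcpy(wa, a, TOY_ETH_ALEN);
	memcpy(wb, b, TOY_ETH_ALEN);
	return ((wa[0] ^ wb[0]) | (wa[1] ^ wb[1]) | (wa[2] ^ wb[2])) == 0;
}
```

Declaring every address buffer with the same constant makes it possible to
search for the constant and check each declaration's alignment.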

Re: [PATCH] vhost: Drop linux/socket.h

2013-08-15 Thread David Miller

From: Asias He 
Date: Thu, 15 Aug 2013 11:20:16 +0800

> memcpy_fromiovec is moved to lib/iovec.c. No need to include
> linux/socket.h for it.
> 
> Signed-off-by: Asias He 

You can't do this.

Because this file doesn't include the header file that
provides the declaration, which is linux/uio.h

linux/socket.h includes linux/uio.h, so honestly leaving
things the way they are is a 1000 times better than your
patch.

Re: [PATCH] vhost: Drop linux/socket.h

2013-08-16 Thread David Miller

From: Asias He 
Date: Fri, 16 Aug 2013 09:27:43 +0800

> On Thu, Aug 15, 2013 at 02:07:40PM -0700, David Miller wrote:
>> From: Asias He 
>> Date: Thu, 15 Aug 2013 11:20:16 +0800
>> 
>> > memcpy_fromiovec is moved to lib/iovec.c. No need to include
>> > linux/socket.h for it.
>> > 
>> > Signed-off-by: Asias He 
>> 
>> You can't do this.
>> 
>> Because this file doesn't include the header file that
>> provides the declaration, which is linux/uio.h
> 
> vhost.c includes drivers/vhost/vhost.h. In drivers/vhost/vhost.h, we
> have linux/uio.h included.

Nothing in vhost.h needs linux/uio.h right?  That's very poor style,
include the header where the dependency exists which is vhost.c

Re: [PATCH] vhost: Drop linux/socket.h

2013-08-16 Thread David Miller

From: Asias He 
Date: Fri, 16 Aug 2013 17:27:43 +0800

> On Fri, Aug 16, 2013 at 12:31:59AM -0700, David Miller wrote:
>> From: Asias He 
>> Date: Fri, 16 Aug 2013 09:27:43 +0800
>> 
>> > On Thu, Aug 15, 2013 at 02:07:40PM -0700, David Miller wrote:
>> >> From: Asias He 
>> >> Date: Thu, 15 Aug 2013 11:20:16 +0800
>> >> 
>> >> > memcpy_fromiovec is moved to lib/iovec.c. No need to include
>> >> > linux/socket.h for it.
>> >> > 
>> >> > Signed-off-by: Asias He 
>> >> 
>> >> You can't do this.
>> >> 
>> >> Because this file doesn't include the header file that
>> >> provides the declaration, which is linux/uio.h
>> > 
>> > vhost.c includes drivers/vhost/vhost.h. In drivers/vhost/vhost.h, we
>> > have linux/uio.h included.
>> 
>> Nothing in vhost.h needs linux/uio.h right?  That's very poor style,
>> include the header where the dependency exists which is vhost.c
> 
> We use 'struct iovec' in vhost.h which needs linux/uio.h, no?
> 
> So, how about including linux/uio.h in both vhost.c and vhost.h.

That sounds good.
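
The style point in this exchange, including the header that declares what a
file uses rather than relying on a transitive include, can be shown with a
userspace analogue. The function below is a hypothetical gather copy written
for this example, not the kernel's memcpy_fromiovec(); it includes <sys/uio.h>
directly because it uses struct iovec itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec: included directly by the file that uses it */

/*
 * Userspace analogue of a gather copy: copy len bytes described by iov[]
 * into dst, consuming the iovec entries as it goes.
 * Returns 0 on success, -1 if the iovec holds fewer than len bytes.
 */
static int toy_copy_from_iovec(unsigned char *dst, struct iovec *iov,
			       int iovcnt, size_t len)
{
	assert(dst && iov);
	for (int i = 0; i < iovcnt && len > 0; i++) {
		size_t n = iov[i].iov_len < len ? iov[i].iov_len : len;

		memcpy(dst, iov[i].iov_base, n);
		dst += n;
		len -= n;
		iov[i].iov_base = (unsigned char *)iov[i].iov_base + n;
		iov[i].iov_len -= n;
	}
	return len == 0 ? 0 : -1;
}
```

If another header stopped pulling in <sys/uio.h>, this file would still
compile, which is the property the direct include buys.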

Re: [PATCH v2] vhost: Include linux/uio.h instead of linux/socket.h

2013-08-20 Thread David Miller

From: Asias He 
Date: Mon, 19 Aug 2013 09:23:19 +0800

> memcpy_fromiovec is moved from net/core/iovec.c to lib/iovec.c.
> linux/uio.h provides the declaration for memcpy_fromiovec.
> 
> Include linux/uio.h instead of linux/socket.h for it.
> 
> Signed-off-by: Asias He 

Applied.

Re: [PATCH] VMXNET3: Add support for virtual IOMMU

2013-08-21 Thread David Miller

From: Andy King 
Date: Tue, 20 Aug 2013 10:33:32 -0700

> We can't just do virt_to_phys() on memory that we pass to the device and
> expect it to work in presence of a virtual IOMMU.  We need to add IOMMU
> mappings for such DMAs to work correctly.  Fix that with
> pci_alloc_consistent() where possible, or pci_map_single() where the
> mapping is short-lived or we don't control the allocation (netdev).
> 
> Also fix two small bugs:
> 1) use after free of rq->buf_info in vmxnet3_rq_destroy()
> 2) a cpu_to_le32() that should have been a cpu_to_le64()
> 
> Acked-by: George Zhang 
> Acked-by: Aditya Sarwade 
> Signed-off-by: Andy King 

Please use dma_alloc_coherent() (or in fact dma_zalloc_coherent()),
dma_map_single() et al., because they are preferred and in particular
allow specification of GFP_* flags.

Re: [PATCH] VMXNET3: Add support for virtual IOMMU

2013-08-27 Thread David Miller

From: Andy King 
Date: Fri, 23 Aug 2013 09:33:49 -0700

> This patch adds support for virtual IOMMU to the vmxnet3 module.  We
> switch to DMA consistent mappings for anything we pass to the device.
> There were a few places where we already did this, but using pci_blah();
> these have been fixed to use dma_blah(), along with all new occurrences
> where we've replaced kmalloc() and friends.
> 
> Also fix two small bugs:
> 1) use after free of rq->buf_info in vmxnet3_rq_destroy()
> 2) a cpu_to_le32() that should have been a cpu_to_le64()
> 
> Acked-by: George Zhang 
> Acked-by: Aditya Sarwade 
> Signed-off-by: Andy King 

Applied, thanks.
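
The second small bug in the quoted list, a 32-bit conversion where a 64-bit
one was needed, amounts to dropping the upper half of a value. The sketch
below models that in portable C with hypothetical helpers; it does not use the
kernel's cpu_to_le32()/cpu_to_le64() and the 8-byte field is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor field holding a 64-bit little-endian bus address. */
static void toy_store_le64(uint8_t out[8], uint64_t v)
{
	assert(out);
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* The buggy pattern: converting through a 32-bit type drops the high half. */
static void toy_store_le32_bug(uint8_t out[8], uint64_t v)
{
	uint32_t low = (uint32_t)v;

	assert(out);
	for (int i = 0; i < 4; i++)
		out[i] = (uint8_t)(low >> (8 * i));
	for (int i = 4; i < 8; i++)
		out[i] = 0;
}
```

For addresses below 4 GiB both helpers write the same bytes, which is why this
kind of bug can go unnoticed until a buffer lands in high memory.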

Re: [PATCH] virtio-net: Set RXCSUM feature if GUEST_CSUM is available

2013-08-29 Thread David Miller

From: Thomas Huth 
Date: Tue, 27 Aug 2013 17:09:02 +0200

> If the VIRTIO_NET_F_GUEST_CSUM virtio feature is available, the guest
> does not have to calculate the checksums on all received packets. This
> is pretty much the same feature as RX checksum offloading on real
> network cards, so the virtio-net driver should report this by setting
> the NETIF_F_RXCSUM flag. When the user now runs "ethtool -k", he or she
> can see whether the virtio-net interface has to calculate RX checksums
> or not.
> 
> Signed-off-by: Thomas Huth 

Can one of the virtio_net folks please review this?

Thanks.

Re: [PATCH V3 0/6] vhost code cleanup and minor enhancement

2013-09-03 Thread David Miller

From: Jason Wang 
Date: Mon,  2 Sep 2013 16:40:55 +0800

> This series tries to unify and simplify the vhost code, especially for
> zerocopy. With this series, a 5% - 10% improvement in per-cpu throughput was
> seen during the netperf guest sending test.
> 
> Please review.

Applied and patch #5 queued up for -stable, thanks.

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-04-12 Thread David Miller

From: "Michael S. Tsirkin" 
Date: Mon, 12 Apr 2021 18:33:45 -0400

> On Mon, Apr 12, 2021 at 06:08:21PM -0400, Michael S. Tsirkin wrote:
>> OK I started looking at this again. My idea is simple.
>> A. disable callbacks before we try to drain skbs
>> B. actually do disable callbacks even with event idx
>> 
>> To make B not regress, we need to
>> C. detect the common case of disable after event triggering and skip the write then.
>> 
>> I added a new event_triggered flag for that.
>> Completely untested - but then I could not see the warnings either.
>> Would be very much interested to know whether this patch helps
>> resolve the spurious interrupt problem at all ...
>> 
>> 
>> Signed-off-by: Michael S. Tsirkin 
> 
> Hmm a slightly cleaner alternative is to clear the flag when enabling interrupts ...
> I wonder which cacheline it's best to use for this.
> 
> Signed-off-by: Michael S. Tsirkin 

Please make a fresh new submission if you want to use this approach, thanks.


Re: [net-next PATCH V2] virtio-net: switch to use XPS to choose txq

2013-09-30 Thread David Miller

From: Jason Wang 
Date: Mon, 30 Sep 2013 15:37:17 +0800

> We used to use a percpu structure vq_index to record the cpu to queue
> mapping. This is suboptimal since it duplicates the work of XPS and
> loses all other XPS functionality, such as allowing users to configure
> their own transmission steering strategy.
> 
> So this patch switches to XPS and suggests a default mapping when
> the number of cpus is equal to the number of queues. With XPS support,
> there's no need to keep the per-cpu vq_index and .ndo_select_queue(),
> so they are removed as well.
> 
> Cc: Rusty Russell 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 
> ---
> Changes from V1:
> - use cpumask_of() instead of allocate dynamically

This generates build warnings:

drivers/net/virtio_net.c: In function ‘virtnet_set_affinity’:
drivers/net/virtio_net.c:1093:3: warning: passing argument 2 of ‘netif_set_xps_queue’ discards ‘const’ qualifier from pointer target type [enabled by default]
In file included from drivers/net/virtio_net.c:20:0:
include/linux/netdevice.h:2275:5: note: expected ‘struct cpumask *’ but argument is of type ‘const struct cpumask *’
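
This class of warning can be reproduced and avoided in a few lines of plain C.
The sketch uses a hypothetical struct toy_mask and toy_* functions in place of
struct cpumask, cpumask_of() and netif_set_xps_queue(); it shows one way to
resolve the mismatch (a const parameter on a callee that only reads) and makes
no claim about how the driver or core was actually changed.

```c
#include <assert.h>

/* Hypothetical stand-in for a CPU mask. */
struct toy_mask {
	unsigned long bits;
};

/* Returns a pointer to read-only data, like a cpumask_of()-style helper. */
static const struct toy_mask *toy_mask_of(unsigned int cpu)
{
	static const struct toy_mask masks[4] = {
		{ 1UL << 0 }, { 1UL << 1 }, { 1UL << 2 }, { 1UL << 3 },
	};

	assert(cpu < 4);
	return &masks[cpu];
}

/*
 * Because the callee only reads the mask, the parameter is declared const.
 * With a non-const parameter, passing toy_mask_of(cpu) would draw the
 * "discards 'const' qualifier" warning quoted above.
 */
static unsigned long toy_set_queue(const struct toy_mask *mask)
{
	return mask->bits;
}
```

Casting the const away at the call site would also silence the compiler, but
it hides the fact that the pointed-to data is read-only.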