Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-08 Thread Paul E. McKenney
On Sun, Nov 08, 2009 at 02:39:59PM +1030, Rusty Russell wrote:
 On Sat, 7 Nov 2009 03:00:07 am Paul E. McKenney wrote:
  On Fri, Nov 06, 2009 at 03:31:20PM +1030, Rusty Russell wrote:
   But it's still nasty to use half an API.  If it were a few places I would
   have open-coded it with a comment, or wrapped it.  As it is, I don't think
   that would be a win.
  
  So would it help to have a rcu_read_lock_workqueue() and
  rcu_read_unlock_workqueue() that checked nesting and whether they were
  actually running in the context of a workqueue item?  Or did you have
  something else in mind?  Or am I misjudging the level of sarcasm in
  your reply?  ;-)
 
 You read correctly.  If we get a second user, creating an API makes sense.

Makes sense to me as well.  Which does provide some time to come up with
a primitive designed to answer the question Am I currently executing in
the context of a workqueue item?.  ;-)

Thanx, Paul
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-07 Thread Rusty Russell
On Sat, 7 Nov 2009 03:00:07 am Paul E. McKenney wrote:
 On Fri, Nov 06, 2009 at 03:31:20PM +1030, Rusty Russell wrote:
  But it's still nasty to use half an API.  If it were a few places I would
  have open-coded it with a comment, or wrapped it.  As it is, I don't think
  that would be a win.
 
 So would it help to have a rcu_read_lock_workqueue() and
 rcu_read_unlock_workqueue() that checked nesting and whether they were
 actually running in the context of a workqueue item?  Or did you have
 something else in mind?  Or am I misjudging the level of sarcasm in
 your reply?  ;-)

You read correctly.  If we get a second user, creating an API makes sense.

Thanks,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-05 Thread Rusty Russell
On Thu, 5 Nov 2009 03:55:42 am Paul E. McKenney wrote:
 On Wed, Nov 04, 2009 at 01:57:29PM +0200, Michael S. Tsirkin wrote:
  Can you ack this usage please?
 
 I thought I had done so in my paragraph above, but if you would like
 something a bit more formal...

snip verbose super-ack with qualifications

That's great guys.  And yes, this is a kind of read-copy-update.  And no,
there's nothing wrong with it.

But it's still nasty to use half an API.  If it were a few places I would
have open-coded it with a comment, or wrapped it.  As it is, I don't think
that would be a win.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Michael S. Tsirkin
On Tue, Nov 03, 2009 at 10:11:12PM +0100, Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
  
  Paul, you acked this previously. Should I add you acked-by line so
  people calm down?  If you would rather I replace
  rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this.
  Or maybe patch Documentation to explain this RCU usage?
  
 
 So you believe I am over-reacting to this dubious use of RCU ?
 
 RCU documentation is already very complex, we dont need to add yet another
 subtle use, and makes it less readable.
 
 It seems you use 'RCU api' in drivers/vhost/net.c as convenient macros :
 
 #define rcu_dereference(p) ({ \
 typeof(p) _p1 = ACCESS_ONCE(p); \
 smp_read_barrier_depends(); \
 (_p1); \
 })
 
 #define rcu_assign_pointer(p, v) \
 ({ \
 if (!__builtin_constant_p(v) || \
 ((v) != NULL)) \
 smp_wmb(); \
 (p) = (v); \
 })
 
 
 There are plenty regular uses of smp_wmb() in kernel, not related to Read 
 Copy Update,
 there is nothing wrong to use barriers with appropriate comments.

Well, what I do has classic RCU characteristics: readers do not take
locks, writers take a lock and flush after update. This is why I believe
rcu_dereference and rcu_assign_pointer are more appropriate here than
open-coding barriers.

Before deciding whether it's a good idea to open-code barriers
instead, I would like to hear Paul's opinion.

 
 (And you already use mb(), wmb(), rmb(), smp_wmb() in your patch)

Yes, virtio guest pretty much forces this, there's no way to share
a lock with the guest.

 BTW there is at least one locking bug in vhost_net_set_features()
 
 Apparently, mutex_unlock() doesnt trigger a fault if mutex is not locked
 by current thread... even with DEBUG_MUTEXES / DEBUG_LOCK_ALLOC
 
 
 static void vhost_net_set_features(struct vhost_net *n, u64 features)
 {
size_t hdr_size = features  (1  VHOST_NET_F_VIRTIO_NET_HDR) ?
sizeof(struct virtio_net_hdr) : 0;
int i;
 !  mutex_unlock(n-dev.mutex);
n-dev.acked_features = features;
smp_wmb();
for (i = 0; i  VHOST_NET_VQ_MAX; ++i) {
mutex_lock(n-vqs[i].mutex);
n-vqs[i].hdr_size = hdr_size;
mutex_unlock(n-vqs[i].mutex);
}
mutex_unlock(n-dev.mutex);
vhost_net_flush(n);
 }

Thanks very much for spotting this! Will fix.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Michael S. Tsirkin
On Wed, Nov 04, 2009 at 12:08:47PM +0100, Andi Kleen wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
 Haven't really read the whole thing, just noticed something at a glance.
 
  +/* Expects to be always run from workqueue - which acts as
  + * read-size critical section for our kind of RCU. */
  +static void handle_tx(struct vhost_net *net)
  +{
  +   struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
  +   unsigned head, out, in, s;
  +   struct msghdr msg = {
  +   .msg_name = NULL,
  +   .msg_namelen = 0,
  +   .msg_control = NULL,
  +   .msg_controllen = 0,
  +   .msg_iov = vq-iov,
  +   .msg_flags = MSG_DONTWAIT,
  +   };
  +   size_t len, total_len = 0;
  +   int err, wmem;
  +   size_t hdr_size;
  +   struct socket *sock = rcu_dereference(vq-private_data);
  +   if (!sock)
  +   return;
  +
  +   wmem = atomic_read(sock-sk-sk_wmem_alloc);
  +   if (wmem = sock-sk-sk_sndbuf)
  +   return;
  +
  +   use_mm(net-dev.mm);
 
 I haven't gone over all this code in detail, but that isolated reference count
 use looks suspicious. What prevents the mm from going away before
 you increment, if it's not the current one?

We take a reference to it before we start any virtqueues,
and stop all virtqueues before we drop the reference:
/* Caller should have device mutex */
static long vhost_dev_set_owner(struct vhost_dev *dev)
{
/* Is there an owner already? */
if (dev-mm)
return -EBUSY;
/* No owner, become one */
dev-mm = get_task_mm(current);
return 0;
}

And 
vhost_dev_cleanup:


if (dev-mm)
mmput(dev-mm);
dev-mm = NULL;
}


Fine?

 -Andi 
 
 -- 
 a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Andi Kleen
 Fine?

I cannot say -- are there paths that could drop the device beforehand?
(as in do you hold a reference to it?)

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Michael S. Tsirkin
On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote:
  Fine?
 
 I cannot say -- are there paths that could drop the device beforehand?

Do you mean drop the mm reference?

 (as in do you hold a reference to it?)

By design I think I always have a reference to mm before I use it.

This works like this:
ioctl SET_OWNER - calls get_task_mm, I think this gets a reference to mm
ioctl SET_BACKEND - checks that SET_OWNER was run, starts virtqueue
ioctl RESET_OWNER - stops virtqueues, drops the reference to mm
file close - stops virtqueues, if we still have it then drops mm

This is why I think I can call use_mm/unuse_mm while virtqueue is running,
safely.
Makes sense?

 -Andi
 -- 
 a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Andi Kleen
On Wed, Nov 04, 2009 at 03:08:28PM +0200, Michael S. Tsirkin wrote:
 On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote:
   Fine?
  
  I cannot say -- are there paths that could drop the device beforehand?
 
 Do you mean drop the mm reference?

No the reference to the device, which owns the mm for you.

 
  (as in do you hold a reference to it?)
 
 By design I think I always have a reference to mm before I use it.
 
 This works like this:
 ioctl SET_OWNER - calls get_task_mm, I think this gets a reference to mm
 ioctl SET_BACKEND - checks that SET_OWNER was run, starts virtqueue
 ioctl RESET_OWNER - stops virtqueues, drops the reference to mm
 file close - stops virtqueues, if we still have it then drops mm
 
 This is why I think I can call use_mm/unuse_mm while virtqueue is running,
 safely.
 Makes sense?

Do you protect against another thread doing RESET_OWNER in parallel while
RESET_OWNER runs?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Michael S. Tsirkin
On Wed, Nov 04, 2009 at 02:15:33PM +0100, Andi Kleen wrote:
 On Wed, Nov 04, 2009 at 03:08:28PM +0200, Michael S. Tsirkin wrote:
  On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote:
Fine?
   
   I cannot say -- are there paths that could drop the device beforehand?
  
  Do you mean drop the mm reference?
 
 No the reference to the device, which owns the mm for you.

The device is created when file is open and destroyed
when file is closed. So I think the fs code handles the
reference counting for me: it won't call file cleanup
callback while some userspace process has the file open.
Right?

  
   (as in do you hold a reference to it?)
  
  By design I think I always have a reference to mm before I use it.
  
  This works like this:
  ioctl SET_OWNER - calls get_task_mm, I think this gets a reference to mm
  ioctl SET_BACKEND - checks that SET_OWNER was run, starts virtqueue
  ioctl RESET_OWNER - stops virtqueues, drops the reference to mm
  file close - stops virtqueues, if we still have it then drops mm
  
  This is why I think I can call use_mm/unuse_mm while virtqueue is running,
  safely.
  Makes sense?
 
 Do you protect against another thread doing RESET_OWNER in parallel while
 RESET_OWNER runs?

Yes, I have a mutex in the device for that. Same with SET_BACKEND.

 -Andi
 
 -- 
 a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Andi Kleen
On Wed, Nov 04, 2009 at 03:17:36PM +0200, Michael S. Tsirkin wrote:
 On Wed, Nov 04, 2009 at 02:15:33PM +0100, Andi Kleen wrote:
  On Wed, Nov 04, 2009 at 03:08:28PM +0200, Michael S. Tsirkin wrote:
   On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote:
 Fine?

I cannot say -- are there paths that could drop the device beforehand?
   
   Do you mean drop the mm reference?
  
  No the reference to the device, which owns the mm for you.
 
 The device is created when file is open and destroyed
 when file is closed. So I think the fs code handles the
 reference counting for me: it won't call file cleanup
 callback while some userspace process has the file open.
 Right?

Yes.

But the semantics when someone inherits such a fd through exec
or through file descriptor passing would be surely interesting
You would still do IO on the old VM.

I guess it would be a good way to confuse memory accounting schemes 
or administrators @)

It would be all saner if this was all a single atomic step.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Michael S. Tsirkin
On Wed, Nov 04, 2009 at 02:37:28PM +0100, Andi Kleen wrote:
 On Wed, Nov 04, 2009 at 03:17:36PM +0200, Michael S. Tsirkin wrote:
  On Wed, Nov 04, 2009 at 02:15:33PM +0100, Andi Kleen wrote:
   On Wed, Nov 04, 2009 at 03:08:28PM +0200, Michael S. Tsirkin wrote:
On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote:
  Fine?
 
 I cannot say -- are there paths that could drop the device beforehand?

Do you mean drop the mm reference?
   
   No the reference to the device, which owns the mm for you.
  
  The device is created when file is open and destroyed
  when file is closed. So I think the fs code handles the
  reference counting for me: it won't call file cleanup
  callback while some userspace process has the file open.
  Right?
 
 Yes.
 
 But the semantics when someone inherits such a fd through exec
 or through file descriptor passing would be surely interesting
 You would still do IO on the old VM.
 
 I guess it would be a good way to confuse memory accounting schemes 
 or administrators @)
 It would be all saner if this was all a single atomic step.
 
 -Andi

I have this atomic actually. A child process will first thing
do SET_OWNER: this is required before any other operation.

SET_OWNER atomically (under mutex) does two things:
- check that there is no other owner
- get mm and set current process as owner

I hope this addresses your concern?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Michael S. Tsirkin
On Wed, Nov 04, 2009 at 03:41:47PM +0200, Michael S. Tsirkin wrote:
 On Wed, Nov 04, 2009 at 02:37:28PM +0100, Andi Kleen wrote:
  On Wed, Nov 04, 2009 at 03:17:36PM +0200, Michael S. Tsirkin wrote:
   On Wed, Nov 04, 2009 at 02:15:33PM +0100, Andi Kleen wrote:
On Wed, Nov 04, 2009 at 03:08:28PM +0200, Michael S. Tsirkin wrote:
 On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote:
   Fine?
  
  I cannot say -- are there paths that could drop the device 
  beforehand?
 
 Do you mean drop the mm reference?

No the reference to the device, which owns the mm for you.
   
   The device is created when file is open and destroyed
   when file is closed. So I think the fs code handles the
   reference counting for me: it won't call file cleanup
   callback while some userspace process has the file open.
   Right?
  
  Yes.
  
  But the semantics when someone inherits such a fd through exec
  or through file descriptor passing would be surely interesting
  You would still do IO on the old VM.
  
  I guess it would be a good way to confuse memory accounting schemes 
  or administrators @)
  It would be all saner if this was all a single atomic step.
  
  -Andi
 
 I have this atomic actually. A child process will first thing
 do SET_OWNER: this is required before any other operation.
 
 SET_OWNER atomically (under mutex) does two things:
 - check that there is no other owner
 - get mm and set current process as owner
 
 I hope this addresses your concern?

Andrea, since you looked at this design at the early stages,
maybe you can provide feedback on the following question:

vhost has an ioctl to do get_task_mm and store the mm in per-file device
structure.  mmput is called when file is closed.  vhost is careful not
to reference the mm after is has been put. There is also an atomic
mutual exclusion mechanism to ensure that vhost does not allow one
process to access another's mm, even if they share a vhost file
descriptor.  But, this still means that mm structure can outlive the
task if the file descriptor is shared with another process.

Other drivers, such as kvm, have the same property.

Do you think this is OK?

Thanks!

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Paul E. McKenney
On Wed, Nov 04, 2009 at 01:57:29PM +0200, Michael S. Tsirkin wrote:
 On Tue, Nov 03, 2009 at 03:57:44PM -0800, Paul E. McKenney wrote:
  On Tue, Nov 03, 2009 at 01:14:06PM -0500, Gregory Haskins wrote:
   Gregory Haskins wrote:
Eric Dumazet wrote:
Michael S. Tsirkin a écrit :
+static void handle_tx(struct vhost_net *net)
+{
+ struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
+ unsigned head, out, in, s;
+ struct msghdr msg = {
+ .msg_name = NULL,
+ .msg_namelen = 0,
+ .msg_control = NULL,
+ .msg_controllen = 0,
+ .msg_iov = vq-iov,
+ .msg_flags = MSG_DONTWAIT,
+ };
+ size_t len, total_len = 0;
+ int err, wmem;
+ size_t hdr_size;
+ struct socket *sock = rcu_dereference(vq-private_data);
+ if (!sock)
+ return;
+
+ wmem = atomic_read(sock-sk-sk_wmem_alloc);
+ if (wmem = sock-sk-sk_sndbuf)
+ return;
+
+ use_mm(net-dev.mm);
+ mutex_lock(vq-mutex);
+ vhost_no_notify(vq);
+
using rcu_dereference() and mutex_lock() at the same time seems wrong, 
I suspect
that your use of RCU is not correct.
   
1) rcu_dereference() should be done inside a read_rcu_lock() section, 
and
   we are not allowed to sleep in such a section.
   (Quoting Documentation/RCU/whatisRCU.txt :
 It is illegal to block while in an RCU read-side critical 
section, )
   
2) mutex_lock() can sleep (ie block)
   


Michael,
  I warned you that this needed better documentation ;)

Eric,
  I think I flagged this once before, but Michael convinced me that it
was indeed ok, if but perhaps a bit unconventional.  I will try to
find the thread.

Kind Regards,
-Greg

   
   Here it is:
   
   http://lkml.org/lkml/2009/8/12/173
  
  What was happening in that case was that the rcu_dereference()
  was being used in a workqueue item.  The role of rcu_read_lock()
  was taken on be the start of execution of the workqueue item, of
  rcu_read_unlock() by the end of execution of the workqueue item, and
  of synchronize_rcu() by flush_workqueue().  This does work, at least
  assuming that flush_workqueue() operates as advertised, which it appears
  to at first glance.
  
  The above code looks somewhat different, however -- I don't see
  handle_tx() being executed in the context of a work queue.  Instead
  it appears to be in an interrupt handler.
  So what is the story?  Using synchronize_irq() or some such?
  
  Thanx, Paul
 
 No, there has been no change (I won't be able to use a mutex in an
 interrupt handler, will I?).  handle_tx is still called in the context
 of a work queue: either from handle_tx_kick or from handle_tx_net which
 are work queue items.

Ah, my mistake -- I was looking at 2.6.31 rather than latest git with
your patches.

 Can you ack this usage please?

I thought I had done so in my paragraph above, but if you would like
something a bit more formal...

I, Paul E. McKenney, maintainer of the RCU implmentation
embodied in the Linux kernel and co-inventor of RCU, being of
sound mind and body, notwithstanding the wear and tear inherent
in my numerous decades sojourn on this planet, hereby declare
that the following usage of work queues constitutes a valid
RCU implementation:

1.  Execution of a full workqueue item being substituted
for a conventional RCU read-side critical section, so
that the start of execution of the function specified to
INIT_WORK() corresponds to rcu_read_lock(), and the end of
this self-same function corresponds to rcu_read_unlock().

2.  Execution of flush_workqueue() being substituted for
the conventional synchronize_rcu().

The kernel developer availing himself or herself of this
declaration must observe the following caveats:

a.  The function specified to INIT_WORK() may only be
invoked via the workqueue mechanism.  Invoking said
function directly renders this declaration null
and void, as it prevents the flush_workqueue() function
from delivering the fundamental guarantee inherent in RCU.

b.  At some point in the future, said developer may be
required to apply some gcc attribute or sparse annotation
to the function passed to INIT_WORK().  Beyond that
point, failure to comply will render this declaration
null and void, as such failure would render inoperative
some potential RCU-validation tools, as duly noted by
Eric Dumazet.

c.  This 

Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Eric Dumazet
Paul E. McKenney a écrit :
 
 (Sorry, but, as always, I could not resist!)

Yes :)

Thanks Paul for this masterpiece of diplomatic Acked-by ;)



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Michael S. Tsirkin
On Wed, Nov 04, 2009 at 09:25:42AM -0800, Paul E. McKenney wrote:
 (Sorry, but, as always, I could not resist!)
 
   Thanx, Paul

Thanks Paul!
Jonathan: are you reading this?
Another one for your quotes of the week collection :)

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-04 Thread Gregory Haskins
Michael S. Tsirkin wrote:
 On Wed, Nov 04, 2009 at 09:25:42AM -0800, Paul E. McKenney wrote:
 (Sorry, but, as always, I could not resist!)

  Thanx, Paul
 
 Thanks Paul!
 Jonathan: are you reading this?
 Another one for your quotes of the week collection :)
 

I second that.



signature.asc
Description: OpenPGP digital signature


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet
Michael S. Tsirkin a écrit :
 +static void handle_tx(struct vhost_net *net)
 +{
 + struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
 + unsigned head, out, in, s;
 + struct msghdr msg = {
 + .msg_name = NULL,
 + .msg_namelen = 0,
 + .msg_control = NULL,
 + .msg_controllen = 0,
 + .msg_iov = vq-iov,
 + .msg_flags = MSG_DONTWAIT,
 + };
 + size_t len, total_len = 0;
 + int err, wmem;
 + size_t hdr_size;
 + struct socket *sock = rcu_dereference(vq-private_data);
 + if (!sock)
 + return;
 +
 + wmem = atomic_read(sock-sk-sk_wmem_alloc);
 + if (wmem = sock-sk-sk_sndbuf)
 + return;
 +
 + use_mm(net-dev.mm);
 + mutex_lock(vq-mutex);
 + vhost_no_notify(vq);
 +

using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect
that your use of RCU is not correct.

1) rcu_dereference() should be done inside a read_rcu_lock() section, and
   we are not allowed to sleep in such a section.
   (Quoting Documentation/RCU/whatisRCU.txt :
 It is illegal to block while in an RCU read-side critical section, )

2) mutex_lock() can sleep (ie block)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Gregory Haskins
Gregory Haskins wrote:
 Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
 +static void handle_tx(struct vhost_net *net)
 +{
 +   struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
 +   unsigned head, out, in, s;
 +   struct msghdr msg = {
 +   .msg_name = NULL,
 +   .msg_namelen = 0,
 +   .msg_control = NULL,
 +   .msg_controllen = 0,
 +   .msg_iov = vq-iov,
 +   .msg_flags = MSG_DONTWAIT,
 +   };
 +   size_t len, total_len = 0;
 +   int err, wmem;
 +   size_t hdr_size;
 +   struct socket *sock = rcu_dereference(vq-private_data);
 +   if (!sock)
 +   return;
 +
 +   wmem = atomic_read(sock-sk-sk_wmem_alloc);
 +   if (wmem = sock-sk-sk_sndbuf)
 +   return;
 +
 +   use_mm(net-dev.mm);
 +   mutex_lock(vq-mutex);
 +   vhost_no_notify(vq);
 +
 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.

 1) rcu_dereference() should be done inside a read_rcu_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )

 2) mutex_lock() can sleep (ie block)

 
 
 Michael,
   I warned you that this needed better documentation ;)
 
 Eric,
   I think I flagged this once before, but Michael convinced me that it
 was indeed ok, if but perhaps a bit unconventional.  I will try to
 find the thread.
 
 Kind Regards,
 -Greg
 

Here it is:

http://lkml.org/lkml/2009/8/12/173

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Gregory Haskins
Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
 +static void handle_tx(struct vhost_net *net)
 +{
 +struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
 +unsigned head, out, in, s;
 +struct msghdr msg = {
 +.msg_name = NULL,
 +.msg_namelen = 0,
 +.msg_control = NULL,
 +.msg_controllen = 0,
 +.msg_iov = vq-iov,
 +.msg_flags = MSG_DONTWAIT,
 +};
 +size_t len, total_len = 0;
 +int err, wmem;
 +size_t hdr_size;
 +struct socket *sock = rcu_dereference(vq-private_data);
 +if (!sock)
 +return;
 +
 +wmem = atomic_read(sock-sk-sk_wmem_alloc);
 +if (wmem = sock-sk-sk_sndbuf)
 +return;
 +
 +use_mm(net-dev.mm);
 +mutex_lock(vq-mutex);
 +vhost_no_notify(vq);
 +
 
 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.
 
 1) rcu_dereference() should be done inside a read_rcu_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )
 
 2) mutex_lock() can sleep (ie block)
 


Michael,
  I warned you that this needed better documentation ;)

Eric,
  I think I flagged this once before, but Michael convinced me that it
was indeed ok, if but perhaps a bit unconventional.  I will try to
find the thread.

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet
Gregory Haskins a écrit :
 Gregory Haskins wrote:
 Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
 +static void handle_tx(struct vhost_net *net)
 +{
 +  struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
 +  unsigned head, out, in, s;
 +  struct msghdr msg = {
 +  .msg_name = NULL,
 +  .msg_namelen = 0,
 +  .msg_control = NULL,
 +  .msg_controllen = 0,
 +  .msg_iov = vq-iov,
 +  .msg_flags = MSG_DONTWAIT,
 +  };
 +  size_t len, total_len = 0;
 +  int err, wmem;
 +  size_t hdr_size;
 +  struct socket *sock = rcu_dereference(vq-private_data);
 +  if (!sock)
 +  return;
 +
 +  wmem = atomic_read(sock-sk-sk_wmem_alloc);
 +  if (wmem = sock-sk-sk_sndbuf)
 +  return;
 +
 +  use_mm(net-dev.mm);
 +  mutex_lock(vq-mutex);
 +  vhost_no_notify(vq);
 +
 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.

 1) rcu_dereference() should be done inside a read_rcu_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )

 2) mutex_lock() can sleep (ie block)


 Michael,
   I warned you that this needed better documentation ;)

 Eric,
   I think I flagged this once before, but Michael convinced me that it
 was indeed ok, if but perhaps a bit unconventional.  I will try to
 find the thread.

 Kind Regards,
 -Greg

 
 Here it is:
 
 http://lkml.org/lkml/2009/8/12/173
 

Yes, this doesnt convince me at all, and could be a precedent for a wrong RCU 
use.
People wanting to use RCU do a grep on kernel sources to find how to correctly
use RCU.

Michael, please use existing locking/barrier mechanisms, and not pretend to use 
RCU.

Some automatic tools might barf later.

For example, we could add a debugging facility to check that rcu_dereference() 
is used
in an appropriate context, ie conflict with existing mutex_lock() debugging 
facility.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Michael S. Tsirkin
On Tue, Nov 03, 2009 at 07:51:35PM +0100, Eric Dumazet wrote:
 Gregory Haskins a écrit :
  Gregory Haskins wrote:
  Eric Dumazet wrote:
  Michael S. Tsirkin a écrit :
  +static void handle_tx(struct vhost_net *net)
  +{
  +struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
  +unsigned head, out, in, s;
  +struct msghdr msg = {
  +.msg_name = NULL,
  +.msg_namelen = 0,
  +.msg_control = NULL,
  +.msg_controllen = 0,
  +.msg_iov = vq-iov,
  +.msg_flags = MSG_DONTWAIT,
  +};
  +size_t len, total_len = 0;
  +int err, wmem;
  +size_t hdr_size;
  +struct socket *sock = rcu_dereference(vq-private_data);
  +if (!sock)
  +return;
  +
  +wmem = atomic_read(sock-sk-sk_wmem_alloc);
  +if (wmem = sock-sk-sk_sndbuf)
  +return;
  +
  +use_mm(net-dev.mm);
  +mutex_lock(vq-mutex);
  +vhost_no_notify(vq);
  +
  using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
  suspect
  that your use of RCU is not correct.
 
  1) rcu_dereference() should be done inside a read_rcu_lock() section, and
 we are not allowed to sleep in such a section.
 (Quoting Documentation/RCU/whatisRCU.txt :
   It is illegal to block while in an RCU read-side critical section, )
 
  2) mutex_lock() can sleep (ie block)
 
 
  Michael,
I warned you that this needed better documentation ;)
 
  Eric,
I think I flagged this once before, but Michael convinced me that it
  was indeed ok, if but perhaps a bit unconventional.  I will try to
  find the thread.
 
  Kind Regards,
  -Greg
 
  
  Here it is:
  
  http://lkml.org/lkml/2009/8/12/173
  
 
 Yes, this doesnt convince me at all, and could be a precedent for a wrong RCU 
 use.
 People wanting to use RCU do a grep on kernel sources to find how to correctly
 use RCU.
 
 Michael, please use existing locking/barrier mechanisms, and not pretend to 
 use RCU.
 
 Some automatic tools might barf later.
 
 For example, we could add a debugging facility to check that 
 rcu_dereference() is used
 in an appropriate context, ie conflict with existing mutex_lock() debugging 
 facility.


Paul, you acked this previously. Should I add you acked-by line so
people calm down?  If you would rather I replace
rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this.
Or maybe patch Documentation to explain this RCU usage?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Michael S. Tsirkin
On Tue, Nov 03, 2009 at 07:03:55PM +0100, Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
  +static void handle_tx(struct vhost_net *net)
  +{
  +   struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
  +   unsigned head, out, in, s;
  +   struct msghdr msg = {
  +   .msg_name = NULL,
  +   .msg_namelen = 0,
  +   .msg_control = NULL,
  +   .msg_controllen = 0,
  +   .msg_iov = vq-iov,
  +   .msg_flags = MSG_DONTWAIT,
  +   };
  +   size_t len, total_len = 0;
  +   int err, wmem;
  +   size_t hdr_size;
  +   struct socket *sock = rcu_dereference(vq-private_data);
  +   if (!sock)
  +   return;
  +
  +   wmem = atomic_read(sock-sk-sk_wmem_alloc);
  +   if (wmem = sock-sk-sk_sndbuf)
  +   return;
  +
  +   use_mm(net-dev.mm);
  +   mutex_lock(vq-mutex);
  +   vhost_no_notify(vq);
  +
 
 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.
 
 1) rcu_dereference() should be done inside a read_rcu_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )
 
 2) mutex_lock() can sleep (ie block)

This use is correct. See comment in vhost.h This use of RCU has been
acked by Paul E. McKenney (paul...@linux.vnet.ibm.com) as well.
There are many ways to use RCU not all of which involve read_rcu_lock.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Paul E. McKenney
On Tue, Nov 03, 2009 at 01:14:06PM -0500, Gregory Haskins wrote:
 Gregory Haskins wrote:
  Eric Dumazet wrote:
  Michael S. Tsirkin a écrit :
  +static void handle_tx(struct vhost_net *net)
  +{
  + struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
  + unsigned head, out, in, s;
  + struct msghdr msg = {
  + .msg_name = NULL,
  + .msg_namelen = 0,
  + .msg_control = NULL,
  + .msg_controllen = 0,
  + .msg_iov = vq-iov,
  + .msg_flags = MSG_DONTWAIT,
  + };
  + size_t len, total_len = 0;
  + int err, wmem;
  + size_t hdr_size;
  + struct socket *sock = rcu_dereference(vq-private_data);
  + if (!sock)
  + return;
  +
  + wmem = atomic_read(sock-sk-sk_wmem_alloc);
  + if (wmem = sock-sk-sk_sndbuf)
  + return;
  +
  + use_mm(net-dev.mm);
  + mutex_lock(vq-mutex);
  + vhost_no_notify(vq);
  +
  using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
  suspect
  that your use of RCU is not correct.
 
  1) rcu_dereference() should be done inside a read_rcu_lock() section, and
 we are not allowed to sleep in such a section.
 (Quoting Documentation/RCU/whatisRCU.txt :
   It is illegal to block while in an RCU read-side critical section, )
 
  2) mutex_lock() can sleep (ie block)
 
  
  
  Michael,
I warned you that this needed better documentation ;)
  
  Eric,
I think I flagged this once before, but Michael convinced me that it
  was indeed ok, if but perhaps a bit unconventional.  I will try to
  find the thread.
  
  Kind Regards,
  -Greg
  
 
 Here it is:
 
 http://lkml.org/lkml/2009/8/12/173

What was happening in that case was that the rcu_dereference()
was being used in a workqueue item.  The role of rcu_read_lock()
was taken on be the start of execution of the workqueue item, of
rcu_read_unlock() by the end of execution of the workqueue item, and
of synchronize_rcu() by flush_workqueue().  This does work, at least
assuming that flush_workqueue() operates as advertised, which it appears
to at first glance.

The above code looks somewhat different, however -- I don't see
handle_tx() being executed in the context of a work queue.  Instead
it appears to be in an interrupt handler.

So what is the story?  Using synchronize_irq() or some such?

Thanx, Paul
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Gregory Haskins
Eric Dumazet wrote:
 Gregory Haskins a écrit :
 Gregory Haskins wrote:
 Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :

 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.

 1) rcu_dereference() should be done inside a read_rcu_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )

 2) mutex_lock() can sleep (ie block)

 Michael,
   I warned you that this needed better documentation ;)

 Eric,
   I think I flagged this once before, but Michael convinced me that it
 was indeed ok, if but perhaps a bit unconventional.  I will try to
 find the thread.

 Kind Regards,
 -Greg

 Here it is:

 http://lkml.org/lkml/2009/8/12/173

 
 Yes, this doesnt convince me at all, and could be a precedent for a wrong RCU 
 use.
 People wanting to use RCU do a grep on kernel sources to find how to correctly
 use RCU.
 
 Michael, please use existing locking/barrier mechanisms, and not pretend to 
 use RCU.

Yes, I would tend to agree with you.  In fact, I think I suggested that
a normal barrier should be used instead of abusing rcu_dereference().

But as far as his code is concerned, I think it technically works
properly, and that was my main point.  Also note that the usage
rcu_dereference+mutex_lock() are not necessarily broken, per se:  it
could be an srcu-based critical section created by the caller, for
instance.  It would be perfectly legal to sleep on the mutex if that
were the case.

To me, the bigger issue is that the rcu_dereference() without any
apparent hint of a corresponding RSCS is simply confusing as a reviewer.
 smp_rmb() (or whatever is proper in this case) is probably more
appropriate.

Kind Regards,
-Greg




signature.asc
Description: OpenPGP digital signature


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet
Michael S. Tsirkin a écrit :
 
 Paul, you acked this previously. Should I add you acked-by line so
 people calm down?  If you would rather I replace
 rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this.
 Or maybe patch Documentation to explain this RCU usage?
 

So you believe I am over-reacting to this dubious use of RCU ?

RCU documentation is already very complex, we dont need to add yet another
subtle use, and makes it less readable.

It seems you use 'RCU api' in drivers/vhost/net.c as convenient macros :

#define rcu_dereference(p) ({ \
typeof(p) _p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); \
(_p1); \
})

#define rcu_assign_pointer(p, v) \
({ \
if (!__builtin_constant_p(v) || \
((v) != NULL)) \
smp_wmb(); \
(p) = (v); \
})


There are plenty regular uses of smp_wmb() in kernel, not related to Read Copy 
Update,
there is nothing wrong to use barriers with appropriate comments.

(And you already use mb(), wmb(), rmb(), smp_wmb() in your patch)


BTW there is at least one locking bug in vhost_net_set_features()

Apparently, mutex_unlock() doesnt trigger a fault if mutex is not locked
by current thread... even with DEBUG_MUTEXES / DEBUG_LOCK_ALLOC


static void vhost_net_set_features(struct vhost_net *n, u64 features)
{
   size_t hdr_size = features  (1  VHOST_NET_F_VIRTIO_NET_HDR) ?
   sizeof(struct virtio_net_hdr) : 0;
   int i;
!  mutex_unlock(n-dev.mutex);
   n-dev.acked_features = features;
   smp_wmb();
   for (i = 0; i  VHOST_NET_VQ_MAX; ++i) {
   mutex_lock(n-vqs[i].mutex);
   n-vqs[i].hdr_size = hdr_size;
   mutex_unlock(n-vqs[i].mutex);
   }
   mutex_unlock(n-dev.mutex);
   vhost_net_flush(n);
}
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html