Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Sun, Nov 08, 2009 at 02:39:59PM +1030, Rusty Russell wrote: On Sat, 7 Nov 2009 03:00:07 am Paul E. McKenney wrote: On Fri, Nov 06, 2009 at 03:31:20PM +1030, Rusty Russell wrote: But it's still nasty to use half an API. If it were a few places I would have open-coded it with a comment, or wrapped it. As it is, I don't think that would be a win. So would it help to have a rcu_read_lock_workqueue() and rcu_read_unlock_workqueue() that checked nesting and whether they were actually running in the context of a workqueue item? Or did you have something else in mind? Or am I misjudging the level of sarcasm in your reply? ;-) You read correctly. If we get a second user, creating an API makes sense. Makes sense to me as well. Which does provide some time to come up with a primitive designed to answer the question Am I currently executing in the context of a workqueue item?. ;-) Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Sat, 7 Nov 2009 03:00:07 am Paul E. McKenney wrote: On Fri, Nov 06, 2009 at 03:31:20PM +1030, Rusty Russell wrote: But it's still nasty to use half an API. If it were a few places I would have open-coded it with a comment, or wrapped it. As it is, I don't think that would be a win. So would it help to have a rcu_read_lock_workqueue() and rcu_read_unlock_workqueue() that checked nesting and whether they were actually running in the context of a workqueue item? Or did you have something else in mind? Or am I misjudging the level of sarcasm in your reply? ;-) You read correctly. If we get a second user, creating an API makes sense. Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Thu, 5 Nov 2009 03:55:42 am Paul E. McKenney wrote: On Wed, Nov 04, 2009 at 01:57:29PM +0200, Michael S. Tsirkin wrote: Can you ack this usage please? I thought I had done so in my paragraph above, but if you would like something a bit more formal... snip verbose super-ack with qualifications That's great guys. And yes, this is a kind of read-copy-update. And no, there's nothing wrong with it. But it's still nasty to use half an API. If it were a few places I would have open-coded it with a comment, or wrapped it. As it is, I don't think that would be a win. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Tue, Nov 03, 2009 at 10:11:12PM +0100, Eric Dumazet wrote: Michael S. Tsirkin a écrit : Paul, you acked this previously. Should I add you acked-by line so people calm down? If you would rather I replace rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this. Or maybe patch Documentation to explain this RCU usage? So you believe I am over-reacting to this dubious use of RCU ? RCU documentation is already very complex, we dont need to add yet another subtle use, and makes it less readable. It seems you use 'RCU api' in drivers/vhost/net.c as convenient macros : #define rcu_dereference(p) ({ \ typeof(p) _p1 = ACCESS_ONCE(p); \ smp_read_barrier_depends(); \ (_p1); \ }) #define rcu_assign_pointer(p, v) \ ({ \ if (!__builtin_constant_p(v) || \ ((v) != NULL)) \ smp_wmb(); \ (p) = (v); \ }) There are plenty regular uses of smp_wmb() in kernel, not related to Read Copy Update, there is nothing wrong to use barriers with appropriate comments. Well, what I do has classic RCU characteristics: readers do not take locks, writers take a lock and flush after update. This is why I believe rcu_dereference and rcu_assign_pointer are more appropriate here than open-coding barriers. Before deciding whether it's a good idea to open-code barriers instead, I would like to hear Paul's opinion. (And you already use mb(), wmb(), rmb(), smp_wmb() in your patch) Yes, virtio guest pretty much forces this, there's no way to share a lock with the guest. BTW there is at least one locking bug in vhost_net_set_features() Apparently, mutex_unlock() doesnt trigger a fault if mutex is not locked by current thread... even with DEBUG_MUTEXES / DEBUG_LOCK_ALLOC static void vhost_net_set_features(struct vhost_net *n, u64 features) { size_t hdr_size = features (1 VHOST_NET_F_VIRTIO_NET_HDR) ? sizeof(struct virtio_net_hdr) : 0; int i; ! mutex_unlock(n-dev.mutex); n-dev.acked_features = features; smp_wmb(); for (i = 0; i VHOST_NET_VQ_MAX; ++i) { mutex_lock(n-vqs[i].mutex); n-vqs[i].hdr_size = hdr_size; mutex_unlock(n-vqs[i].mutex); } mutex_unlock(n-dev.mutex); vhost_net_flush(n); } Thanks very much for spotting this! Will fix. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Wed, Nov 04, 2009 at 12:08:47PM +0100, Andi Kleen wrote: Michael S. Tsirkin m...@redhat.com writes: Haven't really read the whole thing, just noticed something at a glance. +/* Expects to be always run from workqueue - which acts as + * read-size critical section for our kind of RCU. */ +static void handle_tx(struct vhost_net *net) +{ + struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; + unsigned head, out, in, s; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, + .msg_control = NULL, + .msg_controllen = 0, + .msg_iov = vq-iov, + .msg_flags = MSG_DONTWAIT, + }; + size_t len, total_len = 0; + int err, wmem; + size_t hdr_size; + struct socket *sock = rcu_dereference(vq-private_data); + if (!sock) + return; + + wmem = atomic_read(sock-sk-sk_wmem_alloc); + if (wmem = sock-sk-sk_sndbuf) + return; + + use_mm(net-dev.mm); I haven't gone over all this code in detail, but that isolated reference count use looks suspicious. What prevents the mm from going away before you increment, if it's not the current one? We take a reference to it before we start any virtqueues, and stop all virtqueues before we drop the reference: /* Caller should have device mutex */ static long vhost_dev_set_owner(struct vhost_dev *dev) { /* Is there an owner already? */ if (dev-mm) return -EBUSY; /* No owner, become one */ dev-mm = get_task_mm(current); return 0; } And vhost_dev_cleanup: if (dev-mm) mmput(dev-mm); dev-mm = NULL; } Fine? -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Fine? I cannot say -- are there paths that could drop the device beforehand? (as in do you hold a reference to it?) -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote: Fine? I cannot say -- are there paths that could drop the device beforehand? Do you mean drop the mm reference? (as in do you hold a reference to it?) By design I think I always have a reference to mm before I use it. This works like this: ioctl SET_OWNER - calls get_task_mm, I think this gets a reference to mm ioctl SET_BACKEND - checks that SET_OWNER was run, starts virtqueue ioctl RESET_OWNER - stops virtqueues, drops the reference to mm file close - stops virtqueues, if we still have it then drops mm This is why I think I can call use_mm/unuse_mm while virtqueue is running, safely. Makes sense? -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Wed, Nov 04, 2009 at 03:08:28PM +0200, Michael S. Tsirkin wrote: On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote: Fine? I cannot say -- are there paths that could drop the device beforehand? Do you mean drop the mm reference? No the reference to the device, which owns the mm for you. (as in do you hold a reference to it?) By design I think I always have a reference to mm before I use it. This works like this: ioctl SET_OWNER - calls get_task_mm, I think this gets a reference to mm ioctl SET_BACKEND - checks that SET_OWNER was run, starts virtqueue ioctl RESET_OWNER - stops virtqueues, drops the reference to mm file close - stops virtqueues, if we still have it then drops mm This is why I think I can call use_mm/unuse_mm while virtqueue is running, safely. Makes sense? Do you protect against another thread doing RESET_OWNER in parallel while RESET_OWNER runs? -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Wed, Nov 04, 2009 at 02:15:33PM +0100, Andi Kleen wrote: On Wed, Nov 04, 2009 at 03:08:28PM +0200, Michael S. Tsirkin wrote: On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote: Fine? I cannot say -- are there paths that could drop the device beforehand? Do you mean drop the mm reference? No the reference to the device, which owns the mm for you. The device is created when file is open and destroyed when file is closed. So I think the fs code handles the reference counting for me: it won't call file cleanup callback while some userspace process has the file open. Right? (as in do you hold a reference to it?) By design I think I always have a reference to mm before I use it. This works like this: ioctl SET_OWNER - calls get_task_mm, I think this gets a reference to mm ioctl SET_BACKEND - checks that SET_OWNER was run, starts virtqueue ioctl RESET_OWNER - stops virtqueues, drops the reference to mm file close - stops virtqueues, if we still have it then drops mm This is why I think I can call use_mm/unuse_mm while virtqueue is running, safely. Makes sense? Do you protect against another thread doing RESET_OWNER in parallel while RESET_OWNER runs? Yes, I have a mutex in the device for that. Same with SET_BACKEND. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Wed, Nov 04, 2009 at 03:17:36PM +0200, Michael S. Tsirkin wrote: On Wed, Nov 04, 2009 at 02:15:33PM +0100, Andi Kleen wrote: On Wed, Nov 04, 2009 at 03:08:28PM +0200, Michael S. Tsirkin wrote: On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote: Fine? I cannot say -- are there paths that could drop the device beforehand? Do you mean drop the mm reference? No the reference to the device, which owns the mm for you. The device is created when file is open and destroyed when file is closed. So I think the fs code handles the reference counting for me: it won't call file cleanup callback while some userspace process has the file open. Right? Yes. But the semantics when someone inherits such a fd through exec or through file descriptor passing would be surely interesting You would still do IO on the old VM. I guess it would be a good way to confuse memory accounting schemes or administrators @) It would be all saner if this was all a single atomic step. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Wed, Nov 04, 2009 at 02:37:28PM +0100, Andi Kleen wrote: On Wed, Nov 04, 2009 at 03:17:36PM +0200, Michael S. Tsirkin wrote: On Wed, Nov 04, 2009 at 02:15:33PM +0100, Andi Kleen wrote: On Wed, Nov 04, 2009 at 03:08:28PM +0200, Michael S. Tsirkin wrote: On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote: Fine? I cannot say -- are there paths that could drop the device beforehand? Do you mean drop the mm reference? No the reference to the device, which owns the mm for you. The device is created when file is open and destroyed when file is closed. So I think the fs code handles the reference counting for me: it won't call file cleanup callback while some userspace process has the file open. Right? Yes. But the semantics when someone inherits such a fd through exec or through file descriptor passing would be surely interesting You would still do IO on the old VM. I guess it would be a good way to confuse memory accounting schemes or administrators @) It would be all saner if this was all a single atomic step. -Andi I have this atomic actually. A child process will first thing do SET_OWNER: this is required before any other operation. SET_OWNER atomically (under mutex) does two things: - check that there is no other owner - get mm and set current process as owner I hope this addresses your concern? -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Wed, Nov 04, 2009 at 03:41:47PM +0200, Michael S. Tsirkin wrote: On Wed, Nov 04, 2009 at 02:37:28PM +0100, Andi Kleen wrote: On Wed, Nov 04, 2009 at 03:17:36PM +0200, Michael S. Tsirkin wrote: On Wed, Nov 04, 2009 at 02:15:33PM +0100, Andi Kleen wrote: On Wed, Nov 04, 2009 at 03:08:28PM +0200, Michael S. Tsirkin wrote: On Wed, Nov 04, 2009 at 01:59:57PM +0100, Andi Kleen wrote: Fine? I cannot say -- are there paths that could drop the device beforehand? Do you mean drop the mm reference? No the reference to the device, which owns the mm for you. The device is created when file is open and destroyed when file is closed. So I think the fs code handles the reference counting for me: it won't call file cleanup callback while some userspace process has the file open. Right? Yes. But the semantics when someone inherits such a fd through exec or through file descriptor passing would be surely interesting You would still do IO on the old VM. I guess it would be a good way to confuse memory accounting schemes or administrators @) It would be all saner if this was all a single atomic step. -Andi I have this atomic actually. A child process will first thing do SET_OWNER: this is required before any other operation. SET_OWNER atomically (under mutex) does two things: - check that there is no other owner - get mm and set current process as owner I hope this addresses your concern? Andrea, since you looked at this design at the early stages, maybe you can provide feedback on the following question: vhost has an ioctl to do get_task_mm and store the mm in per-file device structure. mmput is called when file is closed. vhost is careful not to reference the mm after is has been put. There is also an atomic mutual exclusion mechanism to ensure that vhost does not allow one process to access another's mm, even if they share a vhost file descriptor. But, this still means that mm structure can outlive the task if the file descriptor is shared with another process. Other drivers, such as kvm, have the same property. Do you think this is OK? Thanks! -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Wed, Nov 04, 2009 at 01:57:29PM +0200, Michael S. Tsirkin wrote: On Tue, Nov 03, 2009 at 03:57:44PM -0800, Paul E. McKenney wrote: On Tue, Nov 03, 2009 at 01:14:06PM -0500, Gregory Haskins wrote: Gregory Haskins wrote: Eric Dumazet wrote: Michael S. Tsirkin a écrit : +static void handle_tx(struct vhost_net *net) +{ + struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; + unsigned head, out, in, s; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, + .msg_control = NULL, + .msg_controllen = 0, + .msg_iov = vq-iov, + .msg_flags = MSG_DONTWAIT, + }; + size_t len, total_len = 0; + int err, wmem; + size_t hdr_size; + struct socket *sock = rcu_dereference(vq-private_data); + if (!sock) + return; + + wmem = atomic_read(sock-sk-sk_wmem_alloc); + if (wmem = sock-sk-sk_sndbuf) + return; + + use_mm(net-dev.mm); + mutex_lock(vq-mutex); + vhost_no_notify(vq); + using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect that your use of RCU is not correct. 1) rcu_dereference() should be done inside a read_rcu_lock() section, and we are not allowed to sleep in such a section. (Quoting Documentation/RCU/whatisRCU.txt : It is illegal to block while in an RCU read-side critical section, ) 2) mutex_lock() can sleep (ie block) Michael, I warned you that this needed better documentation ;) Eric, I think I flagged this once before, but Michael convinced me that it was indeed ok, if but perhaps a bit unconventional. I will try to find the thread. Kind Regards, -Greg Here it is: http://lkml.org/lkml/2009/8/12/173 What was happening in that case was that the rcu_dereference() was being used in a workqueue item. The role of rcu_read_lock() was taken on be the start of execution of the workqueue item, of rcu_read_unlock() by the end of execution of the workqueue item, and of synchronize_rcu() by flush_workqueue(). This does work, at least assuming that flush_workqueue() operates as advertised, which it appears to at first glance. The above code looks somewhat different, however -- I don't see handle_tx() being executed in the context of a work queue. Instead it appears to be in an interrupt handler. So what is the story? Using synchronize_irq() or some such? Thanx, Paul No, there has been no change (I won't be able to use a mutex in an interrupt handler, will I?). handle_tx is still called in the context of a work queue: either from handle_tx_kick or from handle_tx_net which are work queue items. Ah, my mistake -- I was looking at 2.6.31 rather than latest git with your patches. Can you ack this usage please? I thought I had done so in my paragraph above, but if you would like something a bit more formal... I, Paul E. McKenney, maintainer of the RCU implmentation embodied in the Linux kernel and co-inventor of RCU, being of sound mind and body, notwithstanding the wear and tear inherent in my numerous decades sojourn on this planet, hereby declare that the following usage of work queues constitutes a valid RCU implementation: 1. Execution of a full workqueue item being substituted for a conventional RCU read-side critical section, so that the start of execution of the function specified to INIT_WORK() corresponds to rcu_read_lock(), and the end of this self-same function corresponds to rcu_read_unlock(). 2. Execution of flush_workqueue() being substituted for the conventional synchronize_rcu(). The kernel developer availing himself or herself of this declaration must observe the following caveats: a. The function specified to INIT_WORK() may only be invoked via the workqueue mechanism. Invoking said function directly renders this declaration null and void, as it prevents the flush_workqueue() function from delivering the fundamental guarantee inherent in RCU. b. At some point in the future, said developer may be required to apply some gcc attribute or sparse annotation to the function passed to INIT_WORK(). Beyond that point, failure to comply will render this declaration null and void, as such failure would render inoperative some potential RCU-validation tools, as duly noted by Eric Dumazet. c. This
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Paul E. McKenney a écrit : (Sorry, but, as always, I could not resist!) Yes :) Thanks Paul for this masterpiece of diplomatic Acked-by ;) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Wed, Nov 04, 2009 at 09:25:42AM -0800, Paul E. McKenney wrote: (Sorry, but, as always, I could not resist!) Thanx, Paul Thanks Paul! Jonathan: are you reading this? Another one for your quotes of the week collection :) -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Michael S. Tsirkin wrote: On Wed, Nov 04, 2009 at 09:25:42AM -0800, Paul E. McKenney wrote: (Sorry, but, as always, I could not resist!) Thanx, Paul Thanks Paul! Jonathan: are you reading this? Another one for your quotes of the week collection :) I second that. signature.asc Description: OpenPGP digital signature
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Michael S. Tsirkin a écrit : +static void handle_tx(struct vhost_net *net) +{ + struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; + unsigned head, out, in, s; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, + .msg_control = NULL, + .msg_controllen = 0, + .msg_iov = vq-iov, + .msg_flags = MSG_DONTWAIT, + }; + size_t len, total_len = 0; + int err, wmem; + size_t hdr_size; + struct socket *sock = rcu_dereference(vq-private_data); + if (!sock) + return; + + wmem = atomic_read(sock-sk-sk_wmem_alloc); + if (wmem = sock-sk-sk_sndbuf) + return; + + use_mm(net-dev.mm); + mutex_lock(vq-mutex); + vhost_no_notify(vq); + using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect that your use of RCU is not correct. 1) rcu_dereference() should be done inside a read_rcu_lock() section, and we are not allowed to sleep in such a section. (Quoting Documentation/RCU/whatisRCU.txt : It is illegal to block while in an RCU read-side critical section, ) 2) mutex_lock() can sleep (ie block) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Gregory Haskins wrote: Eric Dumazet wrote: Michael S. Tsirkin a écrit : +static void handle_tx(struct vhost_net *net) +{ + struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; + unsigned head, out, in, s; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, + .msg_control = NULL, + .msg_controllen = 0, + .msg_iov = vq-iov, + .msg_flags = MSG_DONTWAIT, + }; + size_t len, total_len = 0; + int err, wmem; + size_t hdr_size; + struct socket *sock = rcu_dereference(vq-private_data); + if (!sock) + return; + + wmem = atomic_read(sock-sk-sk_wmem_alloc); + if (wmem = sock-sk-sk_sndbuf) + return; + + use_mm(net-dev.mm); + mutex_lock(vq-mutex); + vhost_no_notify(vq); + using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect that your use of RCU is not correct. 1) rcu_dereference() should be done inside a read_rcu_lock() section, and we are not allowed to sleep in such a section. (Quoting Documentation/RCU/whatisRCU.txt : It is illegal to block while in an RCU read-side critical section, ) 2) mutex_lock() can sleep (ie block) Michael, I warned you that this needed better documentation ;) Eric, I think I flagged this once before, but Michael convinced me that it was indeed ok, if but perhaps a bit unconventional. I will try to find the thread. Kind Regards, -Greg Here it is: http://lkml.org/lkml/2009/8/12/173 Kind Regards, -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Eric Dumazet wrote: Michael S. Tsirkin a écrit : +static void handle_tx(struct vhost_net *net) +{ +struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; +unsigned head, out, in, s; +struct msghdr msg = { +.msg_name = NULL, +.msg_namelen = 0, +.msg_control = NULL, +.msg_controllen = 0, +.msg_iov = vq-iov, +.msg_flags = MSG_DONTWAIT, +}; +size_t len, total_len = 0; +int err, wmem; +size_t hdr_size; +struct socket *sock = rcu_dereference(vq-private_data); +if (!sock) +return; + +wmem = atomic_read(sock-sk-sk_wmem_alloc); +if (wmem = sock-sk-sk_sndbuf) +return; + +use_mm(net-dev.mm); +mutex_lock(vq-mutex); +vhost_no_notify(vq); + using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect that your use of RCU is not correct. 1) rcu_dereference() should be done inside a read_rcu_lock() section, and we are not allowed to sleep in such a section. (Quoting Documentation/RCU/whatisRCU.txt : It is illegal to block while in an RCU read-side critical section, ) 2) mutex_lock() can sleep (ie block) Michael, I warned you that this needed better documentation ;) Eric, I think I flagged this once before, but Michael convinced me that it was indeed ok, if but perhaps a bit unconventional. I will try to find the thread. Kind Regards, -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Gregory Haskins a écrit : Gregory Haskins wrote: Eric Dumazet wrote: Michael S. Tsirkin a écrit : +static void handle_tx(struct vhost_net *net) +{ + struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; + unsigned head, out, in, s; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, + .msg_control = NULL, + .msg_controllen = 0, + .msg_iov = vq-iov, + .msg_flags = MSG_DONTWAIT, + }; + size_t len, total_len = 0; + int err, wmem; + size_t hdr_size; + struct socket *sock = rcu_dereference(vq-private_data); + if (!sock) + return; + + wmem = atomic_read(sock-sk-sk_wmem_alloc); + if (wmem = sock-sk-sk_sndbuf) + return; + + use_mm(net-dev.mm); + mutex_lock(vq-mutex); + vhost_no_notify(vq); + using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect that your use of RCU is not correct. 1) rcu_dereference() should be done inside a read_rcu_lock() section, and we are not allowed to sleep in such a section. (Quoting Documentation/RCU/whatisRCU.txt : It is illegal to block while in an RCU read-side critical section, ) 2) mutex_lock() can sleep (ie block) Michael, I warned you that this needed better documentation ;) Eric, I think I flagged this once before, but Michael convinced me that it was indeed ok, if but perhaps a bit unconventional. I will try to find the thread. Kind Regards, -Greg Here it is: http://lkml.org/lkml/2009/8/12/173 Yes, this doesnt convince me at all, and could be a precedent for a wrong RCU use. People wanting to use RCU do a grep on kernel sources to find how to correctly use RCU. Michael, please use existing locking/barrier mechanisms, and not pretend to use RCU. Some automatic tools might barf later. For example, we could add a debugging facility to check that rcu_dereference() is used in an appropriate context, ie conflict with existing mutex_lock() debugging facility. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Tue, Nov 03, 2009 at 07:51:35PM +0100, Eric Dumazet wrote: Gregory Haskins a écrit : Gregory Haskins wrote: Eric Dumazet wrote: Michael S. Tsirkin a écrit : +static void handle_tx(struct vhost_net *net) +{ +struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; +unsigned head, out, in, s; +struct msghdr msg = { +.msg_name = NULL, +.msg_namelen = 0, +.msg_control = NULL, +.msg_controllen = 0, +.msg_iov = vq-iov, +.msg_flags = MSG_DONTWAIT, +}; +size_t len, total_len = 0; +int err, wmem; +size_t hdr_size; +struct socket *sock = rcu_dereference(vq-private_data); +if (!sock) +return; + +wmem = atomic_read(sock-sk-sk_wmem_alloc); +if (wmem = sock-sk-sk_sndbuf) +return; + +use_mm(net-dev.mm); +mutex_lock(vq-mutex); +vhost_no_notify(vq); + using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect that your use of RCU is not correct. 1) rcu_dereference() should be done inside a read_rcu_lock() section, and we are not allowed to sleep in such a section. (Quoting Documentation/RCU/whatisRCU.txt : It is illegal to block while in an RCU read-side critical section, ) 2) mutex_lock() can sleep (ie block) Michael, I warned you that this needed better documentation ;) Eric, I think I flagged this once before, but Michael convinced me that it was indeed ok, if but perhaps a bit unconventional. I will try to find the thread. Kind Regards, -Greg Here it is: http://lkml.org/lkml/2009/8/12/173 Yes, this doesnt convince me at all, and could be a precedent for a wrong RCU use. People wanting to use RCU do a grep on kernel sources to find how to correctly use RCU. Michael, please use existing locking/barrier mechanisms, and not pretend to use RCU. Some automatic tools might barf later. For example, we could add a debugging facility to check that rcu_dereference() is used in an appropriate context, ie conflict with existing mutex_lock() debugging facility. Paul, you acked this previously. Should I add you acked-by line so people calm down? If you would rather I replace rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this. Or maybe patch Documentation to explain this RCU usage? -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Tue, Nov 03, 2009 at 07:03:55PM +0100, Eric Dumazet wrote: Michael S. Tsirkin a écrit : +static void handle_tx(struct vhost_net *net) +{ + struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; + unsigned head, out, in, s; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, + .msg_control = NULL, + .msg_controllen = 0, + .msg_iov = vq-iov, + .msg_flags = MSG_DONTWAIT, + }; + size_t len, total_len = 0; + int err, wmem; + size_t hdr_size; + struct socket *sock = rcu_dereference(vq-private_data); + if (!sock) + return; + + wmem = atomic_read(sock-sk-sk_wmem_alloc); + if (wmem = sock-sk-sk_sndbuf) + return; + + use_mm(net-dev.mm); + mutex_lock(vq-mutex); + vhost_no_notify(vq); + using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect that your use of RCU is not correct. 1) rcu_dereference() should be done inside a read_rcu_lock() section, and we are not allowed to sleep in such a section. (Quoting Documentation/RCU/whatisRCU.txt : It is illegal to block while in an RCU read-side critical section, ) 2) mutex_lock() can sleep (ie block) This use is correct. See comment in vhost.h This use of RCU has been acked by Paul E. McKenney (paul...@linux.vnet.ibm.com) as well. There are many ways to use RCU not all of which involve read_rcu_lock. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Tue, Nov 03, 2009 at 01:14:06PM -0500, Gregory Haskins wrote: Gregory Haskins wrote: Eric Dumazet wrote: Michael S. Tsirkin a écrit : +static void handle_tx(struct vhost_net *net) +{ + struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; + unsigned head, out, in, s; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, + .msg_control = NULL, + .msg_controllen = 0, + .msg_iov = vq-iov, + .msg_flags = MSG_DONTWAIT, + }; + size_t len, total_len = 0; + int err, wmem; + size_t hdr_size; + struct socket *sock = rcu_dereference(vq-private_data); + if (!sock) + return; + + wmem = atomic_read(sock-sk-sk_wmem_alloc); + if (wmem = sock-sk-sk_sndbuf) + return; + + use_mm(net-dev.mm); + mutex_lock(vq-mutex); + vhost_no_notify(vq); + using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect that your use of RCU is not correct. 1) rcu_dereference() should be done inside a read_rcu_lock() section, and we are not allowed to sleep in such a section. (Quoting Documentation/RCU/whatisRCU.txt : It is illegal to block while in an RCU read-side critical section, ) 2) mutex_lock() can sleep (ie block) Michael, I warned you that this needed better documentation ;) Eric, I think I flagged this once before, but Michael convinced me that it was indeed ok, if but perhaps a bit unconventional. I will try to find the thread. Kind Regards, -Greg Here it is: http://lkml.org/lkml/2009/8/12/173 What was happening in that case was that the rcu_dereference() was being used in a workqueue item. The role of rcu_read_lock() was taken on be the start of execution of the workqueue item, of rcu_read_unlock() by the end of execution of the workqueue item, and of synchronize_rcu() by flush_workqueue(). This does work, at least assuming that flush_workqueue() operates as advertised, which it appears to at first glance. The above code looks somewhat different, however -- I don't see handle_tx() being executed in the context of a work queue. Instead it appears to be in an interrupt handler. So what is the story? Using synchronize_irq() or some such? Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Eric Dumazet wrote: Gregory Haskins a écrit : Gregory Haskins wrote: Eric Dumazet wrote: Michael S. Tsirkin a écrit : using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect that your use of RCU is not correct. 1) rcu_dereference() should be done inside a read_rcu_lock() section, and we are not allowed to sleep in such a section. (Quoting Documentation/RCU/whatisRCU.txt : It is illegal to block while in an RCU read-side critical section, ) 2) mutex_lock() can sleep (ie block) Michael, I warned you that this needed better documentation ;) Eric, I think I flagged this once before, but Michael convinced me that it was indeed ok, if but perhaps a bit unconventional. I will try to find the thread. Kind Regards, -Greg Here it is: http://lkml.org/lkml/2009/8/12/173 Yes, this doesnt convince me at all, and could be a precedent for a wrong RCU use. People wanting to use RCU do a grep on kernel sources to find how to correctly use RCU. Michael, please use existing locking/barrier mechanisms, and not pretend to use RCU. Yes, I would tend to agree with you. In fact, I think I suggested that a normal barrier should be used instead of abusing rcu_dereference(). But as far as his code is concerned, I think it technically works properly, and that was my main point. Also note that the usage rcu_dereference+mutex_lock() are not necessarily broken, per se: it could be an srcu-based critical section created by the caller, for instance. It would be perfectly legal to sleep on the mutex if that were the case. To me, the bigger issue is that the rcu_dereference() without any apparent hint of a corresponding RSCS is simply confusing as a reviewer. smp_rmb() (or whatever is proper in this case) is probably more appropriate. Kind Regards, -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
Michael S. Tsirkin a écrit : Paul, you acked this previously. Should I add you acked-by line so people calm down? If you would rather I replace rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this. Or maybe patch Documentation to explain this RCU usage? So you believe I am over-reacting to this dubious use of RCU ? RCU documentation is already very complex, we dont need to add yet another subtle use, and makes it less readable. It seems you use 'RCU api' in drivers/vhost/net.c as convenient macros : #define rcu_dereference(p) ({ \ typeof(p) _p1 = ACCESS_ONCE(p); \ smp_read_barrier_depends(); \ (_p1); \ }) #define rcu_assign_pointer(p, v) \ ({ \ if (!__builtin_constant_p(v) || \ ((v) != NULL)) \ smp_wmb(); \ (p) = (v); \ }) There are plenty regular uses of smp_wmb() in kernel, not related to Read Copy Update, there is nothing wrong to use barriers with appropriate comments. (And you already use mb(), wmb(), rmb(), smp_wmb() in your patch) BTW there is at least one locking bug in vhost_net_set_features() Apparently, mutex_unlock() doesnt trigger a fault if mutex is not locked by current thread... even with DEBUG_MUTEXES / DEBUG_LOCK_ALLOC static void vhost_net_set_features(struct vhost_net *n, u64 features) { size_t hdr_size = features (1 VHOST_NET_F_VIRTIO_NET_HDR) ? sizeof(struct virtio_net_hdr) : 0; int i; ! mutex_unlock(n-dev.mutex); n-dev.acked_features = features; smp_wmb(); for (i = 0; i VHOST_NET_VQ_MAX; ++i) { mutex_lock(n-vqs[i].mutex); n-vqs[i].hdr_size = hdr_size; mutex_unlock(n-vqs[i].mutex); } mutex_unlock(n-dev.mutex); vhost_net_flush(n); } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html