Re: [PATCHv4 1/6] qemu/virtio: move features to an inline function

2009-11-03 Thread Michael S. Tsirkin
On Mon, Nov 02, 2009 at 04:33:53PM -0600, Anthony Liguori wrote:
 Michael S. Tsirkin wrote:
 devices should have the final say over which virtio features they
 support. E.g. indirect entries may or may not make sense in the context
 of virtio-console. In particular, for vhost, we do not want to report to
 guest bits not supported by kernel backend.  Move the common bits from
 virtio-pci to an inline function and let each device call it.

 No functional changes.
   

 This is a layering violation.  There are transport specific features and  
 device specific features.  The virtio-net device should have no  
 knowledge or nack'ing ability for transport features.

We could pass vhost flag to virtio, and have virtio query the device
for features. Would that be better?

 If you need to change transport features, it suggests you're modeling  
 things incorrectly and should be supplying an alternative transport  
 implementation.
 Regards,

 Anthony Liguori

Yes, you can make vhost an alternative transport in qemu.  This might be
one way to handle this. However, this seems to go contrary to your
previous proposal to make vhost a networking back end. Which will it be?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv6 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Michael S. Tsirkin
On Mon, Nov 02, 2009 at 04:05:58PM -0800, Daniel Walker wrote:
 
 Random style issues below .. Part of this is just stuff checkpatch
 found.

Thanks very much, I'll fix these.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv6 1/3] tun: export underlying socket

2009-11-03 Thread Arnd Bergmann
On Monday 02 November 2009, Michael S. Tsirkin wrote:
 Tun device looks similar to a packet socket
 in that both pass complete frames from/to userspace.
 
 This patch fills in enough fields in the socket underlying tun driver
 to support sendmsg/recvmsg operations, and message flags
 MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
 to modules.  Regular read/write behaviour is unchanged.
 
 This way, code using raw sockets to inject packets
 into a physical device, can support injecting
 packets into host network stack almost without modification.
 
 First user of this interface will be vhost virtualization
 accelerator.

You mentioned before that you wanted to export the socket
using some ioctl function returning an open file descriptor,
which seemed to be a cleaner approach than this one.

What was your reason for changing?

 index 3f5fd52..404abe0 100644
 --- a/include/linux/if_tun.h
 +++ b/include/linux/if_tun.h
 @@ -86,4 +86,18 @@ struct tun_filter {
 __u8   addr[0][ETH_ALEN];
  };
  
 +#ifdef __KERNEL__
 +#if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 +struct socket *tun_get_socket(struct file *);
 +#else
 +#include linux/err.h
 +#include linux/errno.h
 +struct file;
 +struct socket;
 +static inline struct socket *tun_get_socket(struct file *f)
 +{
 +   return ERR_PTR(-EINVAL);
 +}
 +#endif /* CONFIG_TUN */
 +#endif /* __KERNEL__ */
  #endif /* __IF_TUN_H */

Is this a leftover from testing? Exporting the function for !__KERNEL__
seems pointless.

Arnd 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv6 1/3] tun: export underlying socket

2009-11-03 Thread Michael S. Tsirkin
On Tue, Nov 03, 2009 at 01:12:33PM +0100, Arnd Bergmann wrote:
 On Monday 02 November 2009, Michael S. Tsirkin wrote:
  Tun device looks similar to a packet socket
  in that both pass complete frames from/to userspace.
  
  This patch fills in enough fields in the socket underlying tun driver
  to support sendmsg/recvmsg operations, and message flags
  MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
  to modules.  Regular read/write behaviour is unchanged.
  
  This way, code using raw sockets to inject packets
  into a physical device, can support injecting
  packets into host network stack almost without modification.
  
  First user of this interface will be vhost virtualization
  accelerator.
 
 You mentioned before that you wanted to export the socket
 using some ioctl function returning an open file descriptor,
 which seemed to be a cleaner approach than this one.

Note that a similar feature can be implemented on top of tun_get_socket,
as seen from patch below.

 What was your reason for changing?

It turns out socket structure is really bound to specific a file, so we
can not have 2 files referencing the same socket.  Instead, as I say
above, it's possible to make sendmsg/recvmsg work on tap file directly.

For vhost, the advantage of such a feature over using tun_get_socket
directly would be that vhost module won't depend on tun module then.  I
have implemented this (patch below), but decided to go with the simple
thing first.  Since no userspace-visible changes are involved, let's do
this by small steps: it will be easier to figure out when vhost
is upstream.


---

Note: patch below aplies on top of patch tun: export underlying socket.
It is not intended for merge yet.

net: convert tun device to socket

Add callback to file_ops to retrieve socket from
file structure. Use this to make tun character device
accept sendmsg/recvmsg calls.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b58095a..53e1806 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1405,7 +1405,8 @@ static const struct file_operations tun_fops = {
.unlocked_ioctl = tun_chr_ioctl,
.open   = tun_chr_open,
.release = tun_chr_close,
-   .fasync = tun_chr_fasync
+   .fasync = tun_chr_fasync,
+   .get_socket = tun_get_socket,
 };
 
 static struct miscdevice tun_miscdev = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2620a8c..f2b381f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1506,6 +1506,9 @@ struct file_operations {
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t 
*, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info 
*, size_t, unsigned int);
int (*setlease)(struct file *, long, struct file_lock **);
+#ifdef CONFIG_NET
+   struct socket *(*get_socket)(struct file *file);
+#endif
 };
 
 struct inode_operations {
diff --git a/net/socket.c b/net/socket.c
index 9dff31c..700efcb 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -119,6 +119,11 @@ static ssize_t sock_splice_read(struct file *file, loff_t 
*ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);
 
+static struct socket *sock_get_socket(struct file *file)
+{
+   return file-private_data;  /* set in sock_map_fd */
+}
+
 /*
  * Socket files have a set of 'special' operations as well as the generic 
file ones. These don't appear
  * in the operation structures but are done directly via the socketcall() 
multiplexor.
@@ -141,6 +146,7 @@ static const struct file_operations socket_file_ops = {
.sendpage = sock_sendpage,
.splice_write = generic_splice_sendpage,
.splice_read =  sock_splice_read,
+   .get_socket =   sock_get_socket,
 };
 
 /*
@@ -416,8 +422,8 @@ int sock_map_fd(struct socket *sock, int flags)
 
 static struct socket *sock_from_file(struct file *file, int *err)
 {
-   if (file-f_op == socket_file_ops)
-   return file-private_data;  /* set in sock_map_fd */
+   if (file-f_op-get_socket)
+   return file-f_op-get_socket(file);
 
*err = -ENOTSOCK;
return NULL;


-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv6 1/3] tun: export underlying socket

2009-11-03 Thread Arnd Bergmann
On Tuesday 03 November 2009, Michael S. Tsirkin wrote:
  What was your reason for changing?
 
 It turns out socket structure is really bound to specific a file, so we
 can not have 2 files referencing the same socket.  Instead, as I say
 above, it's possible to make sendmsg/recvmsg work on tap file directly.

Ah, I see.

 I have implemented this (patch below), but decided to go with the simple
 thing first.  Since no userspace-visible changes are involved, let's do
 this by small steps: it will be easier to figure out when vhost
 is upstream.

This may even make it easier for me to do the same with macvtap
if I resume work on that.

 @@ -416,8 +422,8 @@ int sock_map_fd(struct socket *sock, int flags)
  
  static struct socket *sock_from_file(struct file *file, int *err)
  {
 -   if (file-f_op == socket_file_ops)
 -   return file-private_data;  /* set in sock_map_fd */
 +   if (file-f_op-get_socket)
 +   return file-f_op-get_socket(file);
  
 *err = -ENOTSOCK;

Or maybe do both (socket_file_ops and get_socket), to avoid an indirect
function call.

Arnd 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH v10 1/1] virtio_console: Add support for multiple ports for generic guest and host communication

2009-11-03 Thread Amit Shah
Expose multiple char devices (ports) for simple communication
between the host userspace and guest.

Sample offline usages can be: poking around in a guest to find out
the file systems used, applications installed, etc. Online usages
can be sharing of clipboard data between the guest and the host,
sending information about logged-in users to the host, locking the
screen or session when a vnc session is closed, and so on.

This patch also enables spawning of multiple console ports that
can be attached to hvc consoles.

The interface presented to guest userspace is of a simple char
device, so it can be used like this:

fd = open(/dev/vcon2, O_RDWR);
ret = read(fd, buf, 100);
ret = write(fd, string, strlen(string));

A name property, if set by the host, is exposed in

/sys/class/virtio-console/vconNN/name

that can be used to create symlinks by udev scripts, example:

/dev/vcon/org.libvirt.channel.0  - /dev/vcon1

For requirements, use-cases, test cases, udev scripts and some
history see

http://www.linux-kvm.org/page/VMchannel_Requirements

Signed-off-by: Amit Shah amit.s...@redhat.com
Acked-by: Christian Borntraeger borntrae...@de.ibm.com (console)
CC: Christian Borntraeger borntrae...@de.ibm.com
---
 drivers/char/Kconfig   |6 +
 drivers/char/virtio_console.c  | 1412 
 include/linux/virtio_console.h |   50 ++-
 3 files changed, 1332 insertions(+), 136 deletions(-)

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 6aad99e..bc71c60 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -679,6 +679,12 @@ config VIRTIO_CONSOLE
help
  Virtio console for use with lguest and other hypervisors.
 
+ Also serves as a general-purpose serial device for data
+ transfer between the guest and host. Character devices at
+ /dev/vconNN will be created when corresponding ports are
+ found. If specified by the host, a sysfs attribute called
+ 'name' will be populated with a name for the port which can
+ be used by udev scripts to create a symlink to /dev/vconNN.
 
 config HVCS
tristate IBM Hypervisor Virtual Console Server support
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index a035ae3..48c533c 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -9,10 +9,8 @@
  * functions.
  :*/
 
-/*M:002 The console can be flooded: while the Guest is processing input the
- * Host can send more.  Buffering in the Host could alleviate this, but it is a
- * difficult problem in general. :*/
 /* Copyright (C) 2006, 2007 Rusty Russell, IBM Corporation
+ * Copyright (C) 2009, Amit Shah, Red Hat, Inc.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -28,114 +26,582 @@
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
+
+#include linux/cdev.h
+#include linux/debugfs.h
+#include linux/device.h
 #include linux/err.h
+#include linux/fs.h
 #include linux/init.h
+#include linux/list.h
+#include linux/poll.h
+#include linux/sched.h
+#include linux/spinlock.h
 #include linux/virtio.h
 #include linux/virtio_console.h
+#include linux/wait.h
+#include linux/workqueue.h
 #include hvc_console.h
 
-/*D:340 These represent our input and output console queues, and the virtio
- * operations for them. */
-static struct virtqueue *in_vq, *out_vq;
-static struct virtio_device *vdev;
+/* This struct stores data that's common to all the ports */
+struct virtio_console_struct {
+   /*
+* Workqueue handlers where we process deferred work after an
+* interrupt
+*/
+   struct work_struct rx_work;
+   struct work_struct tx_work;
+   struct work_struct config_work;
 
-/* This is our input buffer, and how much data is left in it. */
-static unsigned int in_len;
-static char *in, *inbuf;
+   struct list_head port_head;
+   struct list_head unused_read_head;
+   struct list_head unused_write_head;
 
-/* The operations for our console. */
-static struct hv_ops virtio_cons;
+   /* To protect the list of unused write buffers and the out_vq */
+   spinlock_t write_list_lock;
+
+   struct virtio_device *vdev;
+   struct class *class;
+   /* The input and the output queues */
+   struct virtqueue *in_vq, *out_vq;
+
+   /* Used for exporting per-port information to debugfs */
+   struct dentry *debugfs_dir;
+
+   /* The current config space is stored here */
+   struct virtio_console_config config;
+};
+
+/* This struct holds individual buffers received for each port */
+struct virtio_console_port_buffer {
+   struct list_head next;
+
+   char *buf;
+
+   /* length of the buffer */
+   size_t len;
+   /* offset in the buf from which to consume 

[PATCH v10 0/1] virtio-console: Support for generic ports and multiple consoles

2009-11-03 Thread Amit Shah
(Rusty, would it improve the chances of getting this patch in your
tree if this description were written with latex instead of without
it?)

Here is a new iteration of the patch series that implements a
transport for guest and host communications.

I've tested for compatibility (old qemu  new kernel, new qemu  old
kernel, new qemu  new kernel) and it all works fine. To maintain
backward compatibility, the first port to be spawned (port at id 0) is
to be a console port, to be bound to hvc. This keeps new host, old
guest happy.

A testsuite is put up at the following site that has an interactive
test program that can be run in the guest. It also has an automated
test suite that tests the major functionalities presented here:

http://fedorapeople.org/gitweb?p=amitshah/public_git/test-virtserial.git

This version passes all the automated tests there.

Christian has tested the console functionality on s390.

v10:
- only one process can open a port in the guest
- minor fixes for throttling and caching cases


Please review and apply,
Amit

Amit Shah (1):
  virtio_console: Add support for multiple ports for generic guest and
host communication

 drivers/char/Kconfig   |6 +
 drivers/char/virtio_console.c  | 1412 
 include/linux/virtio_console.h |   50 ++-
 3 files changed, 1332 insertions(+), 136 deletions(-)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCHv7 2/3] mm: export use_mm/unuse_mm to modules

2009-11-03 Thread Michael S. Tsirkin
vhost net module wants to do copy to/from user from a kernel thread,
which needs use_mm. Export it to modules.

Acked-by: Andrea Arcangeli aarca...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 mm/mmu_context.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/mmu_context.c b/mm/mmu_context.c
index ded9081..0777654 100644
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -5,6 +5,7 @@
 
 #include linux/mm.h
 #include linux/mmu_context.h
+#include linux/module.h
 #include linux/sched.h
 
 #include asm/mmu_context.h
@@ -37,6 +38,7 @@ void use_mm(struct mm_struct *mm)
if (active_mm != mm)
mmdrop(active_mm);
 }
+EXPORT_SYMBOL_GPL(use_mm);
 
 /*
  * unuse_mm
@@ -56,3 +58,4 @@ void unuse_mm(struct mm_struct *mm)
enter_lazy_tlb(mm, tsk);
task_unlock(tsk);
 }
+EXPORT_SYMBOL_GPL(unuse_mm);
-- 
1.6.5.2.143.g8cc62

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Gregory Haskins
Gregory Haskins wrote:
 Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
 +static void handle_tx(struct vhost_net *net)
 +{
 +   struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
 +   unsigned head, out, in, s;
 +   struct msghdr msg = {
 +   .msg_name = NULL,
 +   .msg_namelen = 0,
 +   .msg_control = NULL,
 +   .msg_controllen = 0,
 +   .msg_iov = vq-iov,
 +   .msg_flags = MSG_DONTWAIT,
 +   };
 +   size_t len, total_len = 0;
 +   int err, wmem;
 +   size_t hdr_size;
 +   struct socket *sock = rcu_dereference(vq-private_data);
 +   if (!sock)
 +   return;
 +
 +   wmem = atomic_read(sock-sk-sk_wmem_alloc);
 +   if (wmem = sock-sk-sk_sndbuf)
 +   return;
 +
 +   use_mm(net-dev.mm);
 +   mutex_lock(vq-mutex);
 +   vhost_no_notify(vq);
 +
 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.

 1) rcu_dereference() should be done inside a read_rcu_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )

 2) mutex_lock() can sleep (ie block)

 
 
 Michael,
   I warned you that this needed better documentation ;)
 
 Eric,
   I think I flagged this once before, but Michael convinced me that it
 was indeed ok, if but perhaps a bit unconventional.  I will try to
 find the thread.
 
 Kind Regards,
 -Greg
 

Here it is:

http://lkml.org/lkml/2009/8/12/173

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Gregory Haskins
Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
 +static void handle_tx(struct vhost_net *net)
 +{
 +struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
 +unsigned head, out, in, s;
 +struct msghdr msg = {
 +.msg_name = NULL,
 +.msg_namelen = 0,
 +.msg_control = NULL,
 +.msg_controllen = 0,
 +.msg_iov = vq-iov,
 +.msg_flags = MSG_DONTWAIT,
 +};
 +size_t len, total_len = 0;
 +int err, wmem;
 +size_t hdr_size;
 +struct socket *sock = rcu_dereference(vq-private_data);
 +if (!sock)
 +return;
 +
 +wmem = atomic_read(sock-sk-sk_wmem_alloc);
 +if (wmem = sock-sk-sk_sndbuf)
 +return;
 +
 +use_mm(net-dev.mm);
 +mutex_lock(vq-mutex);
 +vhost_no_notify(vq);
 +
 
 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.
 
 1) rcu_dereference() should be done inside a read_rcu_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )
 
 2) mutex_lock() can sleep (ie block)
 


Michael,
  I warned you that this needed better documentation ;)

Eric,
  I think I flagged this once before, but Michael convinced me that it
was indeed ok, if but perhaps a bit unconventional.  I will try to
find the thread.

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv7 2/3] mm: export use_mm/unuse_mm to modules

2009-11-03 Thread Gregory Haskins
Michael S. Tsirkin wrote:
 vhost net module wants to do copy to/from user from a kernel thread,
 which needs use_mm. Export it to modules.
 
 Acked-by: Andrea Arcangeli aarca...@redhat.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

I need this too:

Acked-by: Gregory Haskins ghask...@novell.com

 ---
  mm/mmu_context.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/mm/mmu_context.c b/mm/mmu_context.c
 index ded9081..0777654 100644
 --- a/mm/mmu_context.c
 +++ b/mm/mmu_context.c
 @@ -5,6 +5,7 @@
  
  #include linux/mm.h
  #include linux/mmu_context.h
 +#include linux/module.h
  #include linux/sched.h
  
  #include asm/mmu_context.h
 @@ -37,6 +38,7 @@ void use_mm(struct mm_struct *mm)
   if (active_mm != mm)
   mmdrop(active_mm);
  }
 +EXPORT_SYMBOL_GPL(use_mm);
  
  /*
   * unuse_mm
 @@ -56,3 +58,4 @@ void unuse_mm(struct mm_struct *mm)
   enter_lazy_tlb(mm, tsk);
   task_unlock(tsk);
  }
 +EXPORT_SYMBOL_GPL(unuse_mm);




signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

[PATCHv7 0/3] vhost: a kernel-level virtio server

2009-11-03 Thread Michael S. Tsirkin
Rusty, ok, I think I've addressed all comments so far here.  In
particular I have added write logging for live migration, indirect
buffers and virtio net header (enables gso).  I'd like this to go
into linux-next, through your tree, and hopefully 2.6.33.
What do you think?

---

This implements vhost: a kernel-level backend for virtio,
The main motivation for this work is to reduce virtualization
overhead for virtio by removing system calls on data path,
without guest changes. For virtio-net, this removes up to
4 system calls per packet: vm exit for kick, reentry for kick,
iothread wakeup for packet, interrupt injection for packet.

This driver is pretty minimal, but it's fully functional (including
migration support interfaces), and already shows performance (especially
latency) improvement over userspace.

Some more detailed description attached to the patch itself.

The patches apply to both 2.6.32-rc5 and kvm.git.  I'd like them to go
into linux-next if possible.  Please comment.

Changelog from v6:
- review comments by Daniel Walker addressed
- checkpatch cleanup
- fix build on 32 bit
- maintainers entry corrected

Changelog from v5:
- tun support
- backends with virtio net header support (enables GSO, checksum etc)
- 32 bit compat fixed
- support indirect buffers, tx exit mitigation,
  tx interrupt mitigation
- support write logging (allows migration without virtio ring code in userspace)

Changelog from v4:
- disable rx notification when have rx buffers
- addressed all comments from Rusty's review
- copy bugfixes from lguest commits:
ebf9a5a99c1a464afe0b4dfa64416fc8b273bc5c
e606490c440900e50ccf73a54f6fc6150ff40815

Changelog from v3:
- checkpatch fixes

Changelog from v2:
- Comments on RCU usage
- Compat ioctl support
- Make variable static
- Copied more idiomatic english from Rusty

Changes from v1:
- Move use_mm/unuse_mm from fs/aio.c to mm instead of copying.
- Reorder code to avoid need for forward declarations
- Kill a couple of debugging printks

Michael S. Tsirkin (3):
  tun: export underlying socket
  mm: export use_mm/unuse_mm to modules
  vhost_net: a kernel-level virtio server

 MAINTAINERS|9 +
 arch/x86/kvm/Kconfig   |1 +
 drivers/Makefile   |1 +
 drivers/net/tun.c  |  101 -
 drivers/vhost/Kconfig  |   11 +
 drivers/vhost/Makefile |2 +
 drivers/vhost/net.c|  633 +
 drivers/vhost/vhost.c  |  970 
 drivers/vhost/vhost.h  |  158 +++
 include/linux/Kbuild   |1 +
 include/linux/if_tun.h |   14 +
 include/linux/miscdevice.h |1 +
 include/linux/vhost.h  |  126 ++
 mm/mmu_context.c   |3 +
 14 files changed, 2012 insertions(+), 19 deletions(-)
 create mode 100644 drivers/vhost/Kconfig
 create mode 100644 drivers/vhost/Makefile
 create mode 100644 drivers/vhost/net.c
 create mode 100644 drivers/vhost/vhost.c
 create mode 100644 drivers/vhost/vhost.h
 create mode 100644 include/linux/vhost.h
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCHv7 1/3] tun: export underlying socket

2009-11-03 Thread Michael S. Tsirkin
Tun device looks similar to a packet socket
in that both pass complete frames from/to userspace.

This patch fills in enough fields in the socket underlying tun driver
to support sendmsg/recvmsg operations, and message flags
MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket
to modules.  Regular read/write behaviour is unchanged.

This way, code using raw sockets to inject packets
into a physical device, can support injecting
packets into host network stack almost without modification.

First user of this interface will be vhost virtualization
accelerator.

Acked-by: Herbert Xu herb...@gondor.apana.org.au
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

This was posted to netdev separately as well.
Assuming it's okay with davem, I think it makes sense to merge this
patch through Rusty's tree because vhost is the first user of the new
interface.  Posted here for completeness.


 drivers/net/tun.c  |  101 +++-
 include/linux/if_tun.h |   14 +++
 2 files changed, 96 insertions(+), 19 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4fdfa2a..18f8876 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -144,6 +144,7 @@ static int tun_attach(struct tun_struct *tun, struct file 
*file)
err = 0;
tfile-tun = tun;
tun-tfile = tfile;
+   tun-socket.file = file;
dev_hold(tun-dev);
sock_hold(tun-socket.sk);
atomic_inc(tfile-count);
@@ -158,6 +159,7 @@ static void __tun_detach(struct tun_struct *tun)
/* Detach from net device */
netif_tx_lock_bh(tun-dev);
tun-tfile = NULL;
+   tun-socket.file = NULL;
netif_tx_unlock_bh(tun-dev);
 
/* Drop read queue */
@@ -387,7 +389,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct 
net_device *dev)
/* Notify and wake up reader process */
if (tun-flags  TUN_FASYNC)
kill_fasync(tun-fasync, SIGIO, POLL_IN);
-   wake_up_interruptible(tun-socket.wait);
+   wake_up_interruptible_poll(tun-socket.wait, POLLIN |
+  POLLRDNORM | POLLRDBAND);
return NETDEV_TX_OK;
 
 drop:
@@ -743,7 +746,7 @@ static __inline__ ssize_t tun_put_user(struct tun_struct 
*tun,
len = min_t(int, skb-len, len);
 
skb_copy_datagram_const_iovec(skb, 0, iv, total, len);
-   total += len;
+   total += skb-len;
 
tun-dev-stats.tx_packets++;
tun-dev-stats.tx_bytes += len;
@@ -751,34 +754,23 @@ static __inline__ ssize_t tun_put_user(struct tun_struct 
*tun,
return total;
 }
 
-static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
-   unsigned long count, loff_t pos)
+static ssize_t tun_do_read(struct tun_struct *tun,
+  struct kiocb *iocb, const struct iovec *iv,
+  ssize_t len, int noblock)
 {
-   struct file *file = iocb-ki_filp;
-   struct tun_file *tfile = file-private_data;
-   struct tun_struct *tun = __tun_get(tfile);
DECLARE_WAITQUEUE(wait, current);
struct sk_buff *skb;
-   ssize_t len, ret = 0;
-
-   if (!tun)
-   return -EBADFD;
+   ssize_t ret = 0;
 
DBG(KERN_INFO %s: tun_chr_read\n, tun-dev-name);
 
-   len = iov_length(iv, count);
-   if (len  0) {
-   ret = -EINVAL;
-   goto out;
-   }
-
add_wait_queue(tun-socket.wait, wait);
while (len) {
current-state = TASK_INTERRUPTIBLE;
 
/* Read frames from the queue */
if (!(skb=skb_dequeue(tun-socket.sk-sk_receive_queue))) {
-   if (file-f_flags  O_NONBLOCK) {
+   if (noblock) {
ret = -EAGAIN;
break;
}
@@ -805,6 +797,27 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const 
struct iovec *iv,
current-state = TASK_RUNNING;
remove_wait_queue(tun-socket.wait, wait);
 
+   return ret;
+}
+
+static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
+   unsigned long count, loff_t pos)
+{
+   struct file *file = iocb-ki_filp;
+   struct tun_file *tfile = file-private_data;
+   struct tun_struct *tun = __tun_get(tfile);
+   ssize_t len, ret;
+
+   if (!tun)
+   return -EBADFD;
+   len = iov_length(iv, count);
+   if (len  0) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   ret = tun_do_read(tun, iocb, iv, len, file-f_flags  O_NONBLOCK);
+   ret = min_t(ssize_t, ret, len);
 out:
tun_put(tun);
return ret;
@@ -847,7 +860,8 @@ static void tun_sock_write_space(struct sock *sk)
return;
 
if (sk-sk_sleep  waitqueue_active(sk-sk_sleep))
-   wake_up_interruptible_sync(sk-sk_sleep);
+   

Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet
Michael S. Tsirkin a écrit :
 +static void handle_tx(struct vhost_net *net)
 +{
 + struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
 + unsigned head, out, in, s;
 + struct msghdr msg = {
 + .msg_name = NULL,
 + .msg_namelen = 0,
 + .msg_control = NULL,
 + .msg_controllen = 0,
 + .msg_iov = vq-iov,
 + .msg_flags = MSG_DONTWAIT,
 + };
 + size_t len, total_len = 0;
 + int err, wmem;
 + size_t hdr_size;
 + struct socket *sock = rcu_dereference(vq-private_data);
 + if (!sock)
 + return;
 +
 + wmem = atomic_read(sock-sk-sk_wmem_alloc);
 + if (wmem = sock-sk-sk_sndbuf)
 + return;
 +
 + use_mm(net-dev.mm);
 + mutex_lock(vq-mutex);
 + vhost_no_notify(vq);
 +

using rcu_dereference() and mutex_lock() at the same time seems wrong, I suspect
that your use of RCU is not correct.

1) rcu_dereference() should be done inside a read_rcu_lock() section, and
   we are not allowed to sleep in such a section.
   (Quoting Documentation/RCU/whatisRCU.txt :
 It is illegal to block while in an RCU read-side critical section, )

2) mutex_lock() can sleep (ie block)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Gregory Haskins
Eric Dumazet wrote:
 Gregory Haskins a écrit :
 Gregory Haskins wrote:
 Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :

 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.

 1) rcu_dereference() should be done inside a read_rcu_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )

 2) mutex_lock() can sleep (ie block)

 Michael,
   I warned you that this needed better documentation ;)

 Eric,
   I think I flagged this once before, but Michael convinced me that it
 was indeed ok, if but perhaps a bit unconventional.  I will try to
 find the thread.

 Kind Regards,
 -Greg

 Here it is:

 http://lkml.org/lkml/2009/8/12/173

 
 Yes, this doesnt convince me at all, and could be a precedent for a wrong RCU 
 use.
 People wanting to use RCU do a grep on kernel sources to find how to correctly
 use RCU.
 
 Michael, please use existing locking/barrier mechanisms, and not pretend to 
 use RCU.

Yes, I would tend to agree with you.  In fact, I think I suggested that
a normal barrier should be used instead of abusing rcu_dereference().

But as far as his code is concerned, I think it technically works
properly, and that was my main point.  Also note that the usage
rcu_dereference+mutex_lock() are not necessarily broken, per se:  it
could be an srcu-based critical section created by the caller, for
instance.  It would be perfectly legal to sleep on the mutex if that
were the case.

To me, the bigger issue is that the rcu_dereference() without any
apparent hint of a corresponding RSCS is simply confusing as a reviewer.
 smp_rmb() (or whatever is proper in this case) is probably more
appropriate.

Kind Regards,
-Greg




signature.asc
Description: OpenPGP digital signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Eric Dumazet
Michael S. Tsirkin a écrit :
 
 Paul, you acked this previously. Should I add you acked-by line so
 people calm down?  If you would rather I replace
 rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this.
 Or maybe patch Documentation to explain this RCU usage?
 

So you believe I am over-reacting to this dubious use of RCU ?

RCU documentation is already very complex, we dont need to add yet another
subtle use, and makes it less readable.

It seems you use 'RCU api' in drivers/vhost/net.c as convenient macros :

#define rcu_dereference(p) ({ \
typeof(p) _p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); \
(_p1); \
})

#define rcu_assign_pointer(p, v) \
({ \
if (!__builtin_constant_p(v) || \
((v) != NULL)) \
smp_wmb(); \
(p) = (v); \
})


There are plenty regular uses of smp_wmb() in kernel, not related to Read Copy 
Update,
there is nothing wrong to use barriers with appropriate comments.

(And you already use mb(), wmb(), rmb(), smp_wmb() in your patch)


BTW there is at least one locking bug in vhost_net_set_features()

Apparently, mutex_unlock() doesnt trigger a fault if mutex is not locked
by current thread... even with DEBUG_MUTEXES / DEBUG_LOCK_ALLOC


static void vhost_net_set_features(struct vhost_net *n, u64 features)
{
   size_t hdr_size = features  (1  VHOST_NET_F_VIRTIO_NET_HDR) ?
   sizeof(struct virtio_net_hdr) : 0;
   int i;
!  mutex_unlock(n-dev.mutex);
   n-dev.acked_features = features;
   smp_wmb();
   for (i = 0; i  VHOST_NET_VQ_MAX; ++i) {
   mutex_lock(n-vqs[i].mutex);
   n-vqs[i].hdr_size = hdr_size;
   mutex_unlock(n-vqs[i].mutex);
   }
   mutex_unlock(n-dev.mutex);
   vhost_net_flush(n);
}
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Paul E. McKenney
On Tue, Nov 03, 2009 at 01:14:06PM -0500, Gregory Haskins wrote:
 Gregory Haskins wrote:
  Eric Dumazet wrote:
  Michael S. Tsirkin a écrit :
  +static void handle_tx(struct vhost_net *net)
  +{
  + struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
  + unsigned head, out, in, s;
  + struct msghdr msg = {
  + .msg_name = NULL,
  + .msg_namelen = 0,
  + .msg_control = NULL,
  + .msg_controllen = 0,
  + .msg_iov = vq-iov,
  + .msg_flags = MSG_DONTWAIT,
  + };
  + size_t len, total_len = 0;
  + int err, wmem;
  + size_t hdr_size;
  + struct socket *sock = rcu_dereference(vq-private_data);
  + if (!sock)
  + return;
  +
  + wmem = atomic_read(sock-sk-sk_wmem_alloc);
  + if (wmem = sock-sk-sk_sndbuf)
  + return;
  +
  + use_mm(net-dev.mm);
  + mutex_lock(vq-mutex);
  + vhost_no_notify(vq);
  +
  using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
  suspect
  that your use of RCU is not correct.
 
  1) rcu_dereference() should be done inside a read_rcu_lock() section, and
 we are not allowed to sleep in such a section.
 (Quoting Documentation/RCU/whatisRCU.txt :
   It is illegal to block while in an RCU read-side critical section, )
 
  2) mutex_lock() can sleep (ie block)
 
  
  
  Michael,
I warned you that this needed better documentation ;)
  
  Eric,
I think I flagged this once before, but Michael convinced me that it
  was indeed ok, if but perhaps a bit unconventional.  I will try to
  find the thread.
  
  Kind Regards,
  -Greg
  
 
 Here it is:
 
 http://lkml.org/lkml/2009/8/12/173

What was happening in that case was that the rcu_dereference()
was being used in a workqueue item.  The role of rcu_read_lock()
was taken on be the start of execution of the workqueue item, of
rcu_read_unlock() by the end of execution of the workqueue item, and
of synchronize_rcu() by flush_workqueue().  This does work, at least
assuming that flush_workqueue() operates as advertised, which it appears
to at first glance.

The above code looks somewhat different, however -- I don't see
handle_tx() being executed in the context of a work queue.  Instead
it appears to be in an interrupt handler.

So what is the story?  Using synchronize_irq() or some such?

Thanx, Paul
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Michael S. Tsirkin
On Tue, Nov 03, 2009 at 07:03:55PM +0100, Eric Dumazet wrote:
 Michael S. Tsirkin a écrit :
  +static void handle_tx(struct vhost_net *net)
  +{
  +   struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
  +   unsigned head, out, in, s;
  +   struct msghdr msg = {
  +   .msg_name = NULL,
  +   .msg_namelen = 0,
  +   .msg_control = NULL,
  +   .msg_controllen = 0,
  +   .msg_iov = vq-iov,
  +   .msg_flags = MSG_DONTWAIT,
  +   };
  +   size_t len, total_len = 0;
  +   int err, wmem;
  +   size_t hdr_size;
  +   struct socket *sock = rcu_dereference(vq-private_data);
  +   if (!sock)
  +   return;
  +
  +   wmem = atomic_read(sock-sk-sk_wmem_alloc);
  +   if (wmem = sock-sk-sk_sndbuf)
  +   return;
  +
  +   use_mm(net-dev.mm);
  +   mutex_lock(vq-mutex);
  +   vhost_no_notify(vq);
  +
 
 using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
 suspect
 that your use of RCU is not correct.
 
 1) rcu_dereference() should be done inside a read_rcu_lock() section, and
we are not allowed to sleep in such a section.
(Quoting Documentation/RCU/whatisRCU.txt :
  It is illegal to block while in an RCU read-side critical section, )
 
 2) mutex_lock() can sleep (ie block)

This use is correct. See comment in vhost.h This use of RCU has been
acked by Paul E. McKenney (paul...@linux.vnet.ibm.com) as well.
There are many ways to use RCU not all of which involve read_rcu_lock.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Michael S. Tsirkin
What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.

There's similarity with vringfd, with some differences and reduced scope
- uses eventfd for signalling
- structures can be moved around in memory at any time (good for
  migration, bug work-arounds in userspace)
- write logging is supported (good for migration)
- support memory table and not just an offset (needed for kvm)

common virtio related code has been put in a separate file vhost.c and
can be made into a separate module if/when more backends appear.  I used
Rusty's lguest.c as the source for developing this part : this supplied
me with witty comments I wouldn't be able to write myself.

What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.

How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a tap
device.  Backend is also configured by userspace, including vlan/mac
etc.

Status: This works for me, and I haven't see any crashes.
Compared to userspace, people reported improved latency (as I save up to
4 system calls per packet), as well as better bandwidth and CPU
utilization.

Features that I plan to look at in the future:
- mergeable buffers
- zero copy
- scalability tuning: figure out the best threading model to use

Acked-by: Arnd Bergmann a...@arndb.de
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 MAINTAINERS|9 +
 arch/x86/kvm/Kconfig   |1 +
 drivers/Makefile   |1 +
 drivers/vhost/Kconfig  |   11 +
 drivers/vhost/Makefile |2 +
 drivers/vhost/net.c|  633 +
 drivers/vhost/vhost.c  |  970 
 drivers/vhost/vhost.h  |  158 +++
 include/linux/Kbuild   |1 +
 include/linux/miscdevice.h |1 +
 include/linux/vhost.h  |  126 ++
 11 files changed, 1913 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vhost/Kconfig
 create mode 100644 drivers/vhost/Makefile
 create mode 100644 drivers/vhost/net.c
 create mode 100644 drivers/vhost/vhost.c
 create mode 100644 drivers/vhost/vhost.h
 create mode 100644 include/linux/vhost.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 8824115..980a69b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5619,6 +5619,15 @@ S:   Maintained
 F: Documentation/filesystems/vfat.txt
 F: fs/fat/
 
+VIRTIO HOST (VHOST)
+M: Michael S. Tsirkin m...@redhat.com
+L: k...@vger.kernel.org
+L: virtualizat...@lists.osdl.org
+L: net...@vger.kernel.org
+S: Maintained
+F: drivers/vhost/
+F: include/linux/vhost.h
+
 VIA RHINE NETWORK DRIVER
 M: Roger Luethi r...@hellgate.ch
 S: Maintained
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index b84e571..94f44d9 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -64,6 +64,7 @@ config KVM_AMD
 
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
+source drivers/vhost/Kconfig
 source drivers/lguest/Kconfig
 source drivers/virtio/Kconfig
 
diff --git a/drivers/Makefile b/drivers/Makefile
index 6ee53c7..81e3659 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_HID)   += hid/
 obj-$(CONFIG_PPC_PS3)  += ps3/
 obj-$(CONFIG_OF)   += of/
 obj-$(CONFIG_SSB)  += ssb/
+obj-$(CONFIG_VHOST_NET)+= vhost/
 obj-$(CONFIG_VIRTIO)   += virtio/
 obj-$(CONFIG_VLYNQ)+= vlynq/
 obj-$(CONFIG_STAGING)  += staging/
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
new file mode 100644
index 000..d955406
--- /dev/null
+++ b/drivers/vhost/Kconfig
@@ -0,0 +1,11 @@
+config VHOST_NET
+   tristate Host kernel accelerator for virtio net
+   depends on NET  EVENTFD
+   ---help---
+ This kernel module can be loaded in host kernel to accelerate
+ guest networking with virtio_net. Not to be confused with virtio_net
+ module itself which needs to be loaded in guest kernel.
+
+ To compile this driver as a module, choose M here: the module will
+ be called vhost_net.
+
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
new file mode 100644
index 000..72dd020
--- /dev/null
+++ b/drivers/vhost/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_VHOST_NET) += vhost_net.o
+vhost_net-y := vhost.o net.o
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
new file mode 100644
index 000..4af3b98
--- /dev/null
+++ b/drivers/vhost/net.c
@@ -0,0 +1,633 @@
+/* Copyright (C) 2009 Red Hat, Inc.
+ * Author: Michael S. Tsirkin m...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * 

Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server

2009-11-03 Thread Michael S. Tsirkin
On Tue, Nov 03, 2009 at 07:51:35PM +0100, Eric Dumazet wrote:
 Gregory Haskins a écrit :
  Gregory Haskins wrote:
  Eric Dumazet wrote:
  Michael S. Tsirkin a écrit :
  +static void handle_tx(struct vhost_net *net)
  +{
  +struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX];
  +unsigned head, out, in, s;
  +struct msghdr msg = {
  +.msg_name = NULL,
  +.msg_namelen = 0,
  +.msg_control = NULL,
  +.msg_controllen = 0,
  +.msg_iov = vq-iov,
  +.msg_flags = MSG_DONTWAIT,
  +};
  +size_t len, total_len = 0;
  +int err, wmem;
  +size_t hdr_size;
  +struct socket *sock = rcu_dereference(vq-private_data);
  +if (!sock)
  +return;
  +
  +wmem = atomic_read(sock-sk-sk_wmem_alloc);
  +if (wmem = sock-sk-sk_sndbuf)
  +return;
  +
  +use_mm(net-dev.mm);
  +mutex_lock(vq-mutex);
  +vhost_no_notify(vq);
  +
  using rcu_dereference() and mutex_lock() at the same time seems wrong, I 
  suspect
  that your use of RCU is not correct.
 
  1) rcu_dereference() should be done inside a read_rcu_lock() section, and
 we are not allowed to sleep in such a section.
 (Quoting Documentation/RCU/whatisRCU.txt :
   It is illegal to block while in an RCU read-side critical section, )
 
  2) mutex_lock() can sleep (ie block)
 
 
  Michael,
I warned you that this needed better documentation ;)
 
  Eric,
I think I flagged this once before, but Michael convinced me that it
  was indeed ok, if but perhaps a bit unconventional.  I will try to
  find the thread.
 
  Kind Regards,
  -Greg
 
  
  Here it is:
  
  http://lkml.org/lkml/2009/8/12/173
  
 
 Yes, this doesnt convince me at all, and could be a precedent for a wrong RCU 
 use.
 People wanting to use RCU do a grep on kernel sources to find how to correctly
 use RCU.
 
 Michael, please use existing locking/barrier mechanisms, and not pretend to 
 use RCU.
 
 Some automatic tools might barf later.
 
 For example, we could add a debugging facility to check that 
 rcu_dereference() is used
 in an appropriate context, ie conflict with existing mutex_lock() debugging 
 facility.


Paul, you acked this previously. Should I add you acked-by line so
people calm down?  If you would rather I replace
rcu_dereference/rcu_assign_pointer with rmb/wmb, I can do this.
Or maybe patch Documentation to explain this RCU usage?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation

2009-11-03 Thread Gerd Hoffmann
On 10/30/09 09:12, Hannes Reinecke wrote:
 Gerd Hoffmann wrote:
 http://repo.or.cz/w/qemu/kraxel.git?a=shortlog;h=refs/heads/scsi.v1

 It is far from being completed, will continue tomorrow.  Should give a
 idea of the direction I'm heading to though.  Comments welcome.

 Yep, this looks good.

More bits available now at:

http://repo.or.cz/w/qemu/kraxel.git/shortlog/refs/heads/scsi.v3

Please have a look at the new interface, this commit:

http://repo.or.cz/w/qemu/kraxel.git/commitdiff/9c825dac540282dd4d5f5f660ca13af617888037

The workflow I have in mind is:

   -request_new()

Returns a parsed request, i.e. all fields of req-cmd are filled 
already, so the host adapter can easily check whenever it is a read or 
write and how many bytes will be transfered.

   -request_run()

Execute the command.  iovec is passed to the dma_bdrv_*() functions, 
which means it should contain guest physical addresses.

When the request is done the complete callback is called.  For commands 
which complete immediately the callback might be called before 
request_run() returns.  Or maybe don't call the callback at all then? 
We can easily indicate using the return value that we are done already.

   -request_del()

Release request when done.

Questions  comments are welcome.

cheers,
   Gerd
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization