Re: [PATCH V3 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-07-27 Thread Rusty Russell
On Fri, 13 Jul 2012 16:38:51 +0800, Asias He as...@redhat.com wrote:
 This patch introduces bio-based IO path for virtio-blk.

Acked-by: Rusty Russell ru...@rustcorp.com.au

I just hope we can do better than a module option in future.

Thanks,
Rusty.


Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace

2012-07-27 Thread Yoshihiro YUNOMAE

Hi Amit,

Thank you for commenting on our work.

(2012/07/26 20:35), Amit Shah wrote:

On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote:


[...]



Therefore, we propose a new system, virtio-trace, which uses an enhanced
virtio-serial and the existing ring-buffer of ftrace to collect guest kernel
tracing data. In this system, there are 5 main components:
  (1) Ring-buffer of ftrace in a guest
  - When the trace agent reads the ring-buffer, a page is removed from the ring-buffer.
  (2) Trace agent in the guest
  - Splices a page of the ring-buffer to read_pipe using splice() without
copying memory. Then, the page is spliced from write_pipe to virtio, again
without copying memory. (A minimal sketch of this step follows the component
list below.)


I really like the splicing idea.


Thanks. We will improve this patch set.


  (3) Virtio-console driver in the guest
  - Pass the page to virtio-ring
  (4) Virtio-serial bus in QEMU
  - Copy the page to kernel pipe
  (5) Reader in the host
  - Read guest tracing data via FIFO(named pipe)
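
To make step (2) concrete, here is a minimal user-space sketch of the double
splice(): one call moves a page from the ftrace ring-buffer into a pipe, a
second moves it from the pipe to the virtio-serial port. The ftrace path and
the port name below are illustrative assumptions, not taken from the patch set.

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int pfd[2];
	/* per-cpu raw ring-buffer file exported by ftrace (path assumed) */
	int trace_fd = open("/sys/kernel/debug/tracing/per_cpu/cpu0/trace_pipe_raw",
			    O_RDONLY);
	/* virtio-serial port visible inside the guest (name assumed) */
	int out_fd = open("/dev/virtio-ports/trace-cpu0", O_WRONLY);

	if (trace_fd < 0 || out_fd < 0 || pipe(pfd) < 0)
		return 1;

	for (;;) {
		/* move one page from the ring-buffer into the pipe ... */
		ssize_t n = splice(trace_fd, NULL, pfd[1], NULL,
				   4096, SPLICE_F_MOVE);
		if (n <= 0)
			break;
		/* ... and from the pipe to the virtio-serial port, never
		 * copying the data through user-space buffers */
		splice(pfd[0], NULL, out_fd, NULL, n, SPLICE_F_MOVE);
	}
	return 0;
}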


So will this be useful only if guest and host run the same kernel?

I'd like to see the host kernel not being used at all -- collect all
relevant info from the guest and send it out to qemu, where it can be
consumed directly by apps driving the tracing.


No, this patch set is used only for guest kernels, so guest and host
don't need to run the same kernel.


***Evaluation***
When a host collects tracing data of a guest, the performance of using
virtio-trace is compared with that of using native (just running ftrace),
IVRing, and virtio-serial (normal method of read/write).


Why is tracing performance-sensitive?  i.e. why try to optimise this
at all?


To minimize the effect on applications running on the guests when a host
collects tracing data from the guests.
For example, assume a situation where guests A and B are running on a host
and sharing an I/O device. An I/O delay problem occurs in guest A, but
guest B has no such problem. In this case, we need to collect tracing data
from both guests A and B, but a usual method using the network puts a high
load on the applications of guest B even though guest B is running normally.
Therefore, we try to decrease the load on the guests.
We also use this feature for performance analysis on production
virtualization systems.

[...]



***Just enhancement ideas***
  - Support for trace-cmd
  - Support for 9pfs protocol
  - Support for non-blocking mode in QEMU


There were patches long back (by me) to make chardevs non-blocking but
they didn't make it upstream.  Fedora carries them, if you want to try
out.  Though we want to converge on a reasonable solution that's
acceptable upstream as well.  Just that no one's working on it
currently.  Any help here will be appreciated.


Thanks! In this case, since a guest will stop running while the host reads
the guest's trace data, the char device needs a non-blocking
mode. I'll read your patch series. Is the latest version 8?
http://lists.gnu.org/archive/html/qemu-devel/2010-12/msg00035.html


  - Make vhost-serial


I need to understand a) why it's perf-critical, and b) why should the
host be involved at all, to comment on these.


a) To reduce the collection overhead for applications on a guest.
   (see above)
b) Trace data of the host kernel is not involved even if we introduce this
   patch set.

Thank you,

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae...@hitachi.com




Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace

2012-07-27 Thread Amit Shah
On (Fri) 27 Jul 2012 [17:55:11], Yoshihiro YUNOMAE wrote:
 Hi Amit,
 
 Thank you for commenting on our work.
 
 (2012/07/26 20:35), Amit Shah wrote:
 On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote:
 
 [...]
 
 
 Therefore, we propose a new system, virtio-trace, which uses an enhanced
 virtio-serial and the existing ring-buffer of ftrace to collect guest kernel
 tracing data. In this system, there are 5 main components:
   (1) Ring-buffer of ftrace in a guest
   - When the trace agent reads the ring-buffer, a page is removed from the
  ring-buffer.
   (2) Trace agent in the guest
   - Splices a page of the ring-buffer to read_pipe using splice() without
 copying memory. Then, the page is spliced from write_pipe to virtio, again
 without copying memory.
 
 I really like the splicing idea.
 
 Thanks. We will improve this patch set.
 
   (3) Virtio-console driver in the guest
   - Pass the page to virtio-ring
   (4) Virtio-serial bus in QEMU
   - Copy the page to kernel pipe
   (5) Reader in the host
   - Read guest tracing data via FIFO(named pipe)
 
 So will this be useful only if guest and host run the same kernel?
 
 I'd like to see the host kernel not being used at all -- collect all
 relevant info from the guest and send it out to qemu, where it can be
 consumed directly by apps driving the tracing.
 
 No, this patch set is used only for guest kernels, so guest and host
 don't need to run the same kernel.

OK - that's good to know.

 ***Evaluation***
 When a host collects tracing data of a guest, the performance of using
 virtio-trace is compared with that of using native(just running ftrace),
 IVRing, and virtio-serial(normal method of read/write).
 
 Why is tracing performance-sensitive?  i.e. why try to optimise this
 at all?
 
 To minimize effects for applications on guests when a host collects
 tracing data of guests.
 For example, we assume the situation where guests A and B are running
 on a host sharing I/O device. An I/O delay problem occur in guest A,
 but it doesn't for the requirement in guest B. In this case, we need to
 collect tracing data of guests A and B, but a usual method using
 network takes high load for applications of guest B even if guest B is
 normally running. Therefore, we try to decrease the load on guests.
 We also use this feature for performance analysis on production
 virtualization systems.

OK, got it.

 
 [...]
 
 
 ***Just enhancement ideas***
   - Support for trace-cmd
   - Support for 9pfs protocol
   - Support for non-blocking mode in QEMU
 
 There were patches long back (by me) to make chardevs non-blocking but
 they didn't make it upstream.  Fedora carries them, if you want to try
 out.  Though we want to converge on a reasonable solution that's
 acceptable upstream as well.  Just that no one's working on it
 currently.  Any help here will be appreciated.
 
 Thanks! In this case, since a guest will stop to run when host reads
 trace data of the guest, char device is needed to add a non-blocking
 mode. I'll read your patch series. Is the latest version 8?
 http://lists.gnu.org/archive/html/qemu-devel/2010-12/msg00035.html

I suppose the latest version on-list is what you quote above.  The
objections to the patch series are mentioned in Anthony's mails.

Hans maintains a rebased version of the patches in his tree at

http://cgit.freedesktop.org/~jwrdegoede/qemu/

those patches are included in Fedora's qemu-kvm, so you can try that
out and see if it improves performance for you.

   - Make vhost-serial
 
 I need to understand a) why it's perf-critical, and b) why should the
 host be involved at all, to comment on these.
 
 a) To make collecting overhead decrease for application on a guest.
(see above)
 b) Trace data of host kernel is not involved even if we introduce this
patch set.

I see, so you suggested vhost-serial only because you saw the guest
stopping problem due to the absence of non-blocking code?  If so, it
now makes sense.  I don't think we need vhost-serial in any way yet.

BTW where do you parse the trace data obtained from guests?  On a
remote host?

Thanks,
Amit


Re: [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Alan Cox
 +enum {
 + VMCI_SUCCESS_QUEUEPAIR_ATTACH   =  5,
 + VMCI_SUCCESS_QUEUEPAIR_CREATE   =  4,
 + VMCI_SUCCESS_LAST_DETACH=  3,
 + VMCI_SUCCESS_ACCESS_GRANTED =  2,
 + VMCI_SUCCESS_ENTRY_DEAD =  1,

We've got a nicer collection of Linux error codes than you, and it would
make the driver enormously more readable on the Linux side if it started
using Linux error codes at as low a level as possible. (A sketch of such a
mapping follows the quoted list below.)


 + VMCI_SUCCESS=  0,
 + VMCI_ERROR_INVALID_RESOURCE = (-1),
 + VMCI_ERROR_INVALID_ARGS = (-2),
 + VMCI_ERROR_NO_MEM   = (-3),
 + VMCI_ERROR_DATAGRAM_FAILED  = (-4),
 + VMCI_ERROR_MORE_DATA= (-5),
 + VMCI_ERROR_NO_MORE_DATAGRAMS= (-6),
 + VMCI_ERROR_NO_ACCESS= (-7),
 + VMCI_ERROR_NO_HANDLE= (-8),
 + VMCI_ERROR_DUPLICATE_ENTRY  = (-9),
 + VMCI_ERROR_DST_UNREACHABLE  = (-10),
 + VMCI_ERROR_PAYLOAD_TOO_LARGE= (-11),
 + VMCI_ERROR_INVALID_PRIV = (-12),
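
To illustrate what Alan is suggesting, a low-level translation from the VMCI
status values above into standard Linux error codes could look roughly like
the sketch below; the helper name and the exact mapping are assumptions made
for illustration, not code from the posted series.

static int vmci_status_to_errno(int status)
{
	switch (status) {
	case VMCI_SUCCESS:
		return 0;
	case VMCI_ERROR_INVALID_ARGS:
		return -EINVAL;
	case VMCI_ERROR_NO_MEM:
		return -ENOMEM;
	case VMCI_ERROR_NO_ACCESS:
		return -EPERM;
	case VMCI_ERROR_DST_UNREACHABLE:
		return -EHOSTUNREACH;
	case VMCI_ERROR_PAYLOAD_TOO_LARGE:
		return -EMSGSIZE;
	default:
		return -EIO;	/* remaining codes collapsed for brevity */
	}
}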


Re: [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Sam Ravnborg
Hi Andrew.

A few things noted in the following..

 
 diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
 index 2661f6e..fe38c7a 100644
 --- a/drivers/misc/Kconfig
 +++ b/drivers/misc/Kconfig
 @@ -517,4 +517,5 @@ source drivers/misc/lis3lv02d/Kconfig
  source drivers/misc/carma/Kconfig
  source drivers/misc/altera-stapl/Kconfig
  source drivers/misc/mei/Kconfig
 +source drivers/misc/vmw_vmci/Kconfig
  endmenu
 diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
 index 456972f..af9e413 100644
 --- a/drivers/misc/Makefile
 +++ b/drivers/misc/Makefile
 @@ -51,3 +51,4 @@ obj-y   += carma/
  obj-$(CONFIG_USB_SWITCH_FSA9480) += fsa9480.o
  obj-$(CONFIG_ALTERA_STAPL)   +=altera-stapl/
  obj-$(CONFIG_INTEL_MEI)  += mei/
 +obj-y+= vmw_vmci/

Please use obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/

like we do in the other cases. This prevents us from visiting the directory
when this feature is not enabled.

 +++ b/drivers/misc/vmw_vmci/Makefile
 @@ -0,0 +1,43 @@
 +
 +#
 +# Linux driver for VMware's VMCI device.
 +#
 +# Copyright (C) 2007-2012, VMware, Inc. All Rights Reserved.
 +#
 +# This program is free software; you can redistribute it and/or modify it
 +# under the terms of the GNU General Public License as published by the
 +# Free Software Foundation; version 2 of the License and no later version.
 +#
 +# This program is distributed in the hope that it will be useful, but
 +# WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
 +# NON INFRINGEMENT.  See the GNU General Public License for more
 +# details.
 +#
 +# You should have received a copy of the GNU General Public License
 +# along with this program; if not, write to the Free Software
 +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 +#
 +# The full GNU General Public License is included in this distribution in
 +# the file called COPYING.
 +#
 +# Maintained by: Andrew Stiegmann pv-driv...@vmware.com
 +#
 +
Lots of boilerplate noise for such a simple file...

 +
 +#
 +# Makefile for the VMware VMCI
 +#
 +
 +obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci.o
 +
 +vmw_vmci-objs += vmci_context.o
 +vmw_vmci-objs += vmci_datagram.o
 +vmw_vmci-objs += vmci_doorbell.o
 +vmw_vmci-objs += vmci_driver.o
 +vmw_vmci-objs += vmci_event.o
 +vmw_vmci-objs += vmci_handle_array.o
 +vmw_vmci-objs += vmci_hash_table.o
 +vmw_vmci-objs += vmci_queue_pair.o
 +vmw_vmci-objs += vmci_resource.o
 +vmw_vmci-objs += vmci_route.o

please use:
vmw_vmci-y += vmci_context.o
vmw_vmci-y += vmci_datagram.o
vmw_vmci-y += vmci_doorbell.o

This is recommended these days and allows you to enable/disable
single files later using a config option.



 diff --git a/drivers/misc/vmw_vmci/vmci_common_int.h 
 b/drivers/misc/vmw_vmci/vmci_common_int.h
 +
 +#ifndef _VMCI_COMMONINT_H_
 +#define _VMCI_COMMONINT_H_
 +
 +#include linux/printk.h
 +#include linux/vmw_vmci_defs.h

Use inverse Christmas tree ordering here:
longer include lines first, and sort alphabetically when
lines are of the same length.
This likely applies in many other places.

 +#include vmci_handle_array.h
 +
 +#define ASSERT(cond) BUG_ON(!(cond))
 +
 +#define CAN_BLOCK(_f) (!((_f) & VMCI_QPFLAG_NONBLOCK))
 +#define QP_PINNED(_f) ((_f) & VMCI_QPFLAG_PINNED)

Looks like poor obfuscation.
Use a static inline function if you need a helper for this.
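
For reference, a sketch of the static inline form being suggested, assuming
the flags are carried in a u32 as requested elsewhere in this review:

static inline bool vmci_can_block(u32 flags)
{
	/* the queue pair may block unless the caller asked for non-blocking */
	return !(flags & VMCI_QPFLAG_NONBLOCK);
}

static inline bool vmci_qp_pinned(u32 flags)
{
	return flags & VMCI_QPFLAG_PINNED;
}

Unlike the macros, these helpers are type-checked and cannot accidentally
evaluate their argument more than once.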

 +
 +/*
 + * Utilility function that checks whether two entities are allowed
 + * to interact. If one of them is restricted, the other one must
 + * be trusted.
 + */
 +static inline bool vmci_deny_interaction(uint32_t partOne,
 +  uint32_t partTwo)

The kernel types are u32, not uint32_t - these types belong in user-space.

 +++ b/include/linux/vmw_vmci_api.h
 +
 +#ifndef __VMW_VMCI_API_H__
 +#define __VMW_VMCI_API_H__
 +
 +#include linux/vmw_vmci_defs.h
 +
 +#undef  VMCI_KERNEL_API_VERSION
 +#define VMCI_KERNEL_API_VERSION_2 2
 +#define VMCI_KERNEL_API_VERSION   VMCI_KERNEL_API_VERSION_2
 +
 +typedef void (VMCI_DeviceShutdownFn) (void *deviceRegistration, void 
 *userData);
 +
 +bool VMCI_DeviceGet(uint32_t *apiVersion,
 + VMCI_DeviceShutdownFn *deviceShutdownCB,
 + void *userData, void **deviceRegistration);

The kernel style is to use lower_case for everything.
So this would become:

vmci_device_get()

This is obviously a very general comment and applies everywhere.

Sam


Re: [net-next RFC V5 3/5] virtio: intorduce an API to set affinity for a virtqueue

2012-07-27 Thread Paolo Bonzini
On 05/07/2012 12:29, Jason Wang wrote:
 Sometimes, a virtio device needs to configure the irq affinity hint to maximize
 performance. Instead of just exposing the irq of a virtqueue, this patch
 introduces an API to set the affinity for a virtqueue.
 
 The API is best-effort; the affinity hint may not be set as expected due to
 platform support, irq sharing or irq type. Currently, only the pci method is
 implemented, and we set the affinity as follows:
 
 - if device uses INTX, we just ignore the request
 - if device has per vq vector, we force the affinity hint
 - if the virtqueues share MSI, make the affinity OR over all affinities
  requested
 
 Signed-off-by: Jason Wang jasow...@redhat.com

Hmm, I don't see any benefit from this patch; I need to use
irq_set_affinity (which however is not exported) to actually bind IRQs
to CPUs.  Example:

with irq_set_affinity_hint:
 43:   89  107  100   97   PCI-MSI-edge   virtio0-request
 44:  178  195  268  199   PCI-MSI-edge   virtio0-request
 45:   97  100   97  155   PCI-MSI-edge   virtio0-request
 46:  234  261  213  218   PCI-MSI-edge   virtio0-request

with irq_set_affinity:
 43:  721001   PCI-MSI-edge   virtio0-request
 44:0  74601   PCI-MSI-edge   virtio0-request
 45:00  6580   PCI-MSI-edge   virtio0-request
 46:001  547   PCI-MSI-edge   virtio0-request

I gathered these quickly after boot, but real benchmarks show the same
behavior, and performance actually gets worse with virtio-scsi
multiqueue+irq_set_affinity_hint than with irq_set_affinity.
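
For reference, the hint-based pinning being compared here boils down to a
per-vector loop along these lines; the vector numbering and the
one-queue-per-CPU mapping are assumptions for illustration only.

#include <linux/interrupt.h>
#include <linux/cpumask.h>

/* give each of the nvqs MSI-X vectors a preferred CPU */
static void pin_request_queue_irqs(unsigned int first_irq, unsigned int nvqs)
{
	unsigned int i;

	for (i = 0; i < nvqs; i++)
		/* only a hint: irqbalance is still free to move the
		 * interrupt, which is the behaviour observed above */
		irq_set_affinity_hint(first_irq + i,
				      cpumask_of(i % num_online_cpus()));
}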

I also tried adding IRQ_NO_BALANCING, but the only effect is that I
cannot set the affinity.

The queue steering algorithm I use in virtio-scsi is extremely simple
and based on your tx code.  See how my nice pinning is destroyed:

# taskset -c 0 dd if=/dev/sda bs=1M count=1000 of=/dev/null iflag=direct
# cat /proc/interrupts
 43:  2690 2709 2691 2696   PCI-MSI-edge  virtio0-request
 44:   109  122  199  124   PCI-MSI-edge  virtio0-request
 45:   170  183  170  237   PCI-MSI-edge  virtio0-request
 46:   143  166  125  125   PCI-MSI-edge  virtio0-request

All my requests come from CPU#0 and thus go to the first virtqueue, but
the interrupts are serviced all over the place.

Did you set the affinity manually in your experiments, or perhaps there
is a difference between scsi and networking... (interrupt mitigation?)

Paolo


Re: [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Andrew Stiegmann
Hi Sam,

- Original Message -
 From: Sam Ravnborg s...@ravnborg.org
 To: Andrew Stiegmann (stieg) astiegm...@vmware.com
 Cc: linux-ker...@vger.kernel.org, virtualization@lists.linux-foundation.org, 
 pv-driv...@vmware.com,
 vm-crosst...@vmware.com, csch...@vmware.com, gre...@linuxfoundation.org
 Sent: Friday, July 27, 2012 3:34:55 AM
 Subject: Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
 
 Hi Andrew.
 
 A few things noted in the following..
 
  
  diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
  index 2661f6e..fe38c7a 100644
  --- a/drivers/misc/Kconfig
  +++ b/drivers/misc/Kconfig
  @@ -517,4 +517,5 @@ source drivers/misc/lis3lv02d/Kconfig
   source drivers/misc/carma/Kconfig
   source drivers/misc/altera-stapl/Kconfig
   source drivers/misc/mei/Kconfig
  +source drivers/misc/vmw_vmci/Kconfig
   endmenu
  diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
  index 456972f..af9e413 100644
  --- a/drivers/misc/Makefile
  +++ b/drivers/misc/Makefile
  @@ -51,3 +51,4 @@ obj-y += carma/
   obj-$(CONFIG_USB_SWITCH_FSA9480) += fsa9480.o
   obj-$(CONFIG_ALTERA_STAPL) +=altera-stapl/
   obj-$(CONFIG_INTEL_MEI)+= mei/
  +obj-y  += vmw_vmci/
 
 Please use obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/
 
 like we do in the other cases. This prevents us from visiting the
 directory
 when this feature is not enabled.

Ok.

  +++ b/drivers/misc/vmw_vmci/Makefile
  @@ -0,0 +1,43 @@
  +
  +#
  +# Linux driver for VMware's VMCI device.
  +#
  +# Copyright (C) 2007-2012, VMware, Inc. All Rights Reserved.
  +#
  +# This program is free software; you can redistribute it and/or
  modify it
  +# under the terms of the GNU General Public License as published
  by the
  +# Free Software Foundation; version 2 of the License and no later
  version.
  +#
  +# This program is distributed in the hope that it will be useful,
  but
  +# WITHOUT ANY WARRANTY; without even the implied warranty of
  +# MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE
  or
  +# NON INFRINGEMENT.  See the GNU General Public License for more
  +# details.
  +#
  +# You should have received a copy of the GNU General Public
  License
  +# along with this program; if not, write to the Free Software
  +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
  02110-1301 USA.
  +#
  +# The full GNU General Public License is included in this
  distribution in
  +# the file called COPYING.
  +#
  +# Maintained by: Andrew Stiegmann pv-driv...@vmware.com
  +#
  +
 Lot's of boilerplate noise for such a simple file...

I removed the section containing the FSF address, and the section below it as
well, per Greg KH's request.

  +
  +#
  +# Makefile for the VMware VMCI
  +#
  +
  +obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci.o
  +
  +vmw_vmci-objs += vmci_context.o
  +vmw_vmci-objs += vmci_datagram.o
  +vmw_vmci-objs += vmci_doorbell.o
  +vmw_vmci-objs += vmci_driver.o
  +vmw_vmci-objs += vmci_event.o
  +vmw_vmci-objs += vmci_handle_array.o
  +vmw_vmci-objs += vmci_hash_table.o
  +vmw_vmci-objs += vmci_queue_pair.o
  +vmw_vmci-objs += vmci_resource.o
  +vmw_vmci-objs += vmci_route.o
 
 please use:
 vmw_vmci-y += vmci_context.o
 vmw_vmci-y += vmci_datagram.o
 vmw_vmci-y += vmci_doorbell.o
 
 This is recommended these days and allows you to enable/disable
 single files later using a config option.

Ok.
 
  diff --git a/drivers/misc/vmw_vmci/vmci_common_int.h
  b/drivers/misc/vmw_vmci/vmci_common_int.h
  +
  +#ifndef _VMCI_COMMONINT_H_
  +#define _VMCI_COMMONINT_H_
  +
  +#include linux/printk.h
  +#include linux/vmw_vmci_defs.h
 
  Use inverse Christmas tree ordering here:
  longer include lines first, and sort alphabetically when
  lines are of the same length.
  This likely applies in many other places.
 
  +#include vmci_handle_array.h
  +
  +#define ASSERT(cond) BUG_ON(!(cond))
  +
  +#define CAN_BLOCK(_f) (!((_f) & VMCI_QPFLAG_NONBLOCK))
  +#define QP_PINNED(_f) ((_f) & VMCI_QPFLAG_PINNED)
 
  Looks like poor obfuscation.
  Use a static inline function if you need a helper for this.

These definitions are intended more as a helper to make reading the code 
easier.  IMHO it's a lot easier to read

if (CAN_BLOCK(flags))

compared to 

if (!(flags & VMCI_QPFLAG_NONBLOCK))

Wouldn't you agree?  I'm not sure something this simple warrants a static 
inline function but I don't see any harm in converting it over to that.
 
  +
  +/*
  + * Utilility function that checks whether two entities are allowed
  + * to interact. If one of them is restricted, the other one must
  + * be trusted.
  + */
  +static inline bool vmci_deny_interaction(uint32_t partOne,
  +uint32_t partTwo)
 
 The kernel types are u32 not uint32_t - these types belongs in
 user-space.

Ok.

  +++ b/include/linux/vmw_vmci_api.h
  +
  

Re: [Pv-drivers] [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Dmitry Torokhov
Hi Alan,

On Fri, Jul 27, 2012 at 10:53:57AM +0100, Alan Cox wrote:
  +enum {
  +   VMCI_SUCCESS_QUEUEPAIR_ATTACH   =  5,
  +   VMCI_SUCCESS_QUEUEPAIR_CREATE   =  4,
  +   VMCI_SUCCESS_LAST_DETACH=  3,
  +   VMCI_SUCCESS_ACCESS_GRANTED =  2,
  +   VMCI_SUCCESS_ENTRY_DEAD =  1,
 
 We've got a nicer collection of Linux error codes than you, and it would
 make the driver enormously more readable on the Linux side if it started
 using Linux error codes at as low a level as possible.

If VMCI was only used on Linux we'd definitely do that; however VMCI
core is shared among several operating systems (much like ACPI is) and
we'd like to limit divergences between them while conforming to the
kernel coding style as much as possible.

We'll make sure that we will not leak VMCI-specific errors to the
standard kernel APIs.

Thanks,
Dmitry


Re: [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Greg KH
On Fri, Jul 27, 2012 at 10:20:43AM -0700, Andrew Stiegmann wrote:
  The kernel style is to use lower_case for everything.
  So this would become:
  
  vmci_device_get()
  
  This is obviously a very general comment and applies everywhere.
 
 I wish I could lower case these symbols but VMCI has already existed
 outside the mainline Linux tree for some time now and changing these
 exported symbols would mean that other drivers that depend on VMCI
 (vSock, vmhgfs) would need to change as well.   One thought that did
 come to mind was exporting both VMCI_Device_Get and vmci_device_get
 but that would likely just confuse people.  So in short I have made
 function names lower case where possible, but exported symbols could
 not be changed.

Not true at all.  You want those drivers to be merged as well, right?
So they will need to have their functions changed, and their code as
well.

Just wait until we get to the "change your functionality around
requests" stage; those changes will require those drivers to change.  Right now we are
at the "silly and obvious things you did wrong" stage of the review
process :)

So please fix these, and also, post these drivers as well, so we can see
how they interact with the core code.

Actually, if you are going to need lots of refactoring for these
drivers, and the core, I would recommend putting this all in the staging
tree, to allow that to happen over time.  That would ensure that your
users keep having working systems, and let you modify the interfaces
more easily than you could by keeping it all out-of-tree.

What do you think?

greg k-h


Re: [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Andrew Stiegmann


- Original Message -
 From: Greg KH gre...@linuxfoundation.org
 To: Andrew Stiegmann astiegm...@vmware.com
 Cc: Sam Ravnborg s...@ravnborg.org, linux-ker...@vger.kernel.org, 
 virtualization@lists.linux-foundation.org,
 pv-driv...@vmware.com, vm-crosst...@vmware.com, csch...@vmware.com
 Sent: Friday, July 27, 2012 11:16:39 AM
 Subject: Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
 
 On Fri, Jul 27, 2012 at 10:20:43AM -0700, Andrew Stiegmann wrote:
   The kernel style is to use lower_case for everything.
   So this would become:
   
   vmci_device_get()
   
   This is obviously a very general comment and applies everywhere.
  
  I wish I could lower case these symbols but VMCI has already
  existed
  outside the mainline Linux tree for some time now and changing
  these
  exported symbols would mean that other drivers that depend on VMCI
  (vSock, vmhgfs) would need to change as well.   One thought that
  did
  come to mind was exporting both VMCI_Device_Get and vmci_device_get
  but that would likely just confuse people.  So in short I have made
  function names lower case where possible, but exported symbols
  could
  not be changed.
 
 Not true at all.  You want those drivers to be merged as well, right?
 So they will need to have their functions changed, and their code as
 well.

As previously mentioned, VMware is working on upstreaming our vSock driver (one
of a few drivers that use vmw_vmci).  However, there are no plans to upstream
the other drivers that depend on vmw_vmci.  Because of this, these symbols
cannot change.

 Just wait until we get to the "change your functionality around
 requests" stage; those changes will require those drivers to change.  Right now
 we are at the "silly and obvious things you did wrong" stage of the review
 process :)

 So please fix these, and also, post these drivers as well, so we can
 see how they interact with the core code.
 
 Actually, if you are going to need lots of refactoring for these
 drivers, and the core, I would recommend putting this all in the
 staging tree, to allow that to happen over time. That would ensure that your
 users keep having working systems, and let you modify the interfaces
 better and easier, than having to keep it all out-of-tree.
 
 What do you think?

We will discuss this internally and let you know.
 
 greg k-h
 


Re: [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Greg KH
On Fri, Jul 27, 2012 at 11:39:23AM -0700, Andrew Stiegmann wrote:
 
 
 - Original Message -
  From: Greg KH gre...@linuxfoundation.org
  To: Andrew Stiegmann astiegm...@vmware.com
  Cc: Sam Ravnborg s...@ravnborg.org, linux-ker...@vger.kernel.org, 
  virtualization@lists.linux-foundation.org,
  pv-driv...@vmware.com, vm-crosst...@vmware.com, csch...@vmware.com
  Sent: Friday, July 27, 2012 11:16:39 AM
  Subject: Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
  
  On Fri, Jul 27, 2012 at 10:20:43AM -0700, Andrew Stiegmann wrote:
The kernel style is to use lower_case for everything.
So this would become:

vmci_device_get()

This is obviously a very general comment and applies everywhere.
   
   I wish I could lower case these symbols but VMCI has already existed
   outside the mainline Linux tree for some time now and changing these
   exported symbols would mean that other drivers that depend on VMCI
   (vSock, vmhgfs) would need to change as well.  One thought that did
   come to mind was exporting both VMCI_Device_Get and vmci_device_get
   but that would likely just confuse people.  So in short I have made
   function names lower case where possible, but exported symbols could
   not be changed.
  
  Not true at all.  You want those drivers to be merged as well, right?
  So they will need to have their functions changed, and their code as
  well.
 
 As previously mentioned VMware is working on upstreaming our vSock
 driver (one of a few drivers that uses vmw_vmci).

Great.

 However there are no plans to upstream the other drivers that depend
 on vmw_vmci.

Why not?  That seems quite short-sighted.

 Because of this these symbols can not change.

Then I would argue that we can not accept this code at all, because it
will change over time, both symbol names, and functionality (see my
previous comment about how that is going to have to change.)

sorry,

greg k-h


Re: Re: [Qemu-devel] [RFC PATCH 0/6] virtio-trace: Support virtio-trace

2012-07-27 Thread Blue Swirl
On Wed, Jul 25, 2012 at 8:15 AM, Masami Hiramatsu
masami.hiramatsu...@hitachi.com wrote:
 (2012/07/25 5:26), Blue Swirl wrote:
 The following patch set provides a low-overhead system for collecting kernel
 tracing data of guests by a host in a virtualization environment.

 A guest OS generally shares some devices with other guests or a host, so
 the causes of problems occurring in a guest may lie in other guests or in
 the host.
 Therefore, when some problems occur in a virtualization environment, we need
 to collect tracing data from a number of guests and the host. One way to
 realize that is to collect the tracing data of guests on the host. To do
 this, the network is generally used. However, network I/O puts a high load
 on applications running on the guests because there are many network stack
 layers. Therefore, a communication method for collecting the data without
 using the network is needed.

 I implemented something similar earlier by passing trace data from
 OpenBIOS to QEMU using the firmware configuration device. The data
 format was the same as QEMU used for simpletrace event structure
 instead of ftrace. I didn't commit it because of a few problems.

 Sounds interesting :)
 I guess you traced BIOS events, right?

Yes, I converted a few DPRINTFs to tracepoints as a proof of concept.


 I'm not familiar with ftrace; is it possible to trace two guest
 applications (BIOS and kernel) at the same time?

 Since ftrace itself is a tracing feature in the Linux kernel, it
 can trace two or more applications (processes) if they run on the Linux
 kernel. However, I think OpenBIOS runs *under* the guest kernel.
 If so, ftrace currently can't trace OpenBIOS from the guest side.

No, OpenBIOS boots the machine and then passes control to the boot loader,
which passes it to the kernel. The kernel will make a few calls to OpenBIOS at
start but not later. OpenBIOS is used by QEMU as the Sparc and PowerPC
BIOS.


 I think tracing BIOS events from the Linux kernel may need enhancements
 to both OpenBIOS and the Linux kernel.


Ideally both OpenBIOS and Linux should be able to feed trace events
back to QEMU independently.

 Or could this be
 handled by opening two different virtio-serial pipes, one for BIOS and
 the other for the kernel?

 Of course, virtio-serial itself can open multiple channels; thus, if
 OpenBIOS can handle virtio, it can pass trace data via another
 channel.

Currently OpenBIOS probes the PCI bus and identifies virtio devices
but ignores them; adding virtio-serial support shouldn't be too hard.
There's a time window between CPU boot and PCI probe when the
device will not be available, though.


 In my version, the tracepoint ID would have been used to demultiplex
 QEMU tracepoints from BIOS tracepoints, but something like separate ID
 spaces would have been better.

 I guess your feature notifies QEMU of events and QEMU records them in
 its own buffer. Therefore it must have different tracepoint IDs.
 On the other hand, with this feature, QEMU just passes the trace data to a
 host-side pipe. Since an external tracing tool collects the trace
 data separately, we don't need to demultiplex the data.

 Perhaps, in the analysis phase (after tracing), we have to mix the events
 again. At that time, we'll add a guest ID to each event ID, but
 that can be done offline.

Yes, the multiplexing/demultiplexing is only needed in my version
because the feeds are not independent.


 Best Regards,

 --
 Masami HIRAMATSU
 Software Platform Research Dept. Linux Technology Center
 Hitachi, Ltd., Yokohama Research Laboratory
 E-mail: masami.hiramatsu...@hitachi.com


Re: [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Sam Ravnborg
   +
   +#define CAN_BLOCK(_f) (!((_f) & VMCI_QPFLAG_NONBLOCK))
   +#define QP_PINNED(_f) ((_f) & VMCI_QPFLAG_PINNED)
  
  Looks like poor obfuscation.
  Use a static inline function if you need a helper for this.
 
 These definitions are intended more as a helper to make reading the code 
 easier.  IMHO it's a lot easier to read
 
 if (CAN_BLOCK(flags))
 
 compared to 
 
 if (!(flags & VMCI_QPFLAG_NONBLOCK))
 
 Wouldn't you agree?  I'm not sure something this simple warrants a static 
 inline
 function but I don't see any harm in converting it over to that.

I would put it the other way around: I cannot see that such simple stuff
warrants a #define.
A static inline is (almost) always preferable to hiding code in a macro.

For one, you get better type-checks.
And the semantics are also much simpler. With a macro you can do so many silly
things.

Sam


Re: [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Andrew Stiegmann


- Original Message -
 From: Sam Ravnborg s...@ravnborg.org
 To: Andrew Stiegmann astiegm...@vmware.com
 Cc: linux-ker...@vger.kernel.org, virtualization@lists.linux-foundation.org, 
 pv-driv...@vmware.com,
 vm-crosst...@vmware.com, csch...@vmware.com, gre...@linuxfoundation.org
 Sent: Friday, July 27, 2012 12:53:20 PM
 Subject: Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
 
+
+#define CAN_BLOCK(_f) (!((_f) & VMCI_QPFLAG_NONBLOCK))
+#define QP_PINNED(_f) ((_f) & VMCI_QPFLAG_PINNED)
   
   Looks like poor obfuscation.
   Use a static inline function if you need a helper for this.
  
  These definitions are intended more as a helper to make reading the
   code easier.  IMHO it's a lot easier to read
  
  if (CAN_BLOCK(flags))
  
  compared to
  
   if (!(flags & VMCI_QPFLAG_NONBLOCK))
  
  Wouldn't you agree?  I'm not sure something this simple warrants a
  static inline
  function but I don't see any harm in converting it over to that.
 
 I would put it the other way around. I cannot see that such simple
 stuff warrants a #define.
 A static inline is (almost) always preferable to hide code in a
 macro.
 
 For once you get better type-checks.
 And semantics are also much simpler. With a macro you can do so many
 silly things.

Fair enough.  I'll make them into static inline functions.

   Sam
 


Re: [Pv-drivers] [vmw_vmci 11/11] Apply the header code to make VMCI build

2012-07-27 Thread Dmitry Torokhov
On Fri, Jul 27, 2012 at 11:16:39AM -0700, Greg KH wrote:
 On Fri, Jul 27, 2012 at 10:20:43AM -0700, Andrew Stiegmann wrote:
   The kernel style is to use lower_case for everything.
   So this would become:
   
   vmci_device_get()
   
   This is obviously a very general comment and applies everywhere.
  
  I wish I could lower case these symbols but VMCI has already existed
  outside the mainline Linux tree for some time now and changing these
  exported symbols would mean that other drivers that depend on VMCI
  (vSock, vmhgfs) would need to change as well.   One thought that did
  come to mind was exporting both VMCI_Device_Get and vmci_device_get
  but that would likely just confuse people.  So in short I have made
  function names lower case where possible, but exported symbols could
  not be changed.
 
 Not true at all.  You want those drivers to be merged as well, right?
 So they will need to have their functions changed, and their code as
 well.
 
 Just wait until we get to the change your functionality around
 requests, those will require those drivers to change.  Right now we are
 at the silly and obvious things you did wrong stage of the review
 process :)
 
 So please fix these, and also, post these drivers as well, so we can see
 how they interact with the core code.
 
 Actually, if you are going to need lots of refactoring for these
 drivers, and the core, I would recommend putting this all in the staging
 tree, to allow that to happen over time.  That would ensure that your
 users keep having working systems, and let you modify the interfaces
 better and easier, than having to keep it all out-of-tree.
 
 What do you think?

Actually, I think that we'd prefer to keep this in a patch-based form, at
least for now, because the majority of our users get these drivers with
VMware Tools and will continue doing so until distributions start
enabling VMCI in their kernels -- which they probably won't until VMCI
moves out of staging. We'd also have to constantly adjust drivers that we
are not working on getting upstream at this time to work with the
rapidly changing version of VMCI in staging, which would just add work
for us.

So we'd like to get more feedback and have a chance to address issues
and then decide whether staying in staging makes sense or not.

Thanks.

-- 
Dmitry


KVM Forum 2012 Call For Participation

2012-07-27 Thread KVM Forum 2012 Program Committee
=
KVM Forum 2012: Call For Participation
November 7-9, 2012 - Hotel Fira Palace - Barcelona, Spain

(All submissions must be received before midnight Aug 31st, 2012)
=

KVM is an industry leading open source hypervisor that provides
an ideal platform for datacenter virtualization, virtual desktop
infrastructure, and cloud computing.  Once again, it's time to bring
together the community of developers and users that define the KVM
ecosystem for our annual technical conference.  We will discuss the
current state of affairs and plan for the future of KVM, its surrounding
infrastructure, and management tools.  We are also excited to announce
the oVirt Workshop will run in parallel with the KVM Forum, bringing in
a community focused on enterprise datacenter virtualization management
built on KVM.  For topics which overlap we will have shared sessions.
So mark your calendar and join us in advancing KVM.

http://events.linuxfoundation.org/events/kvm-forum/

Once again we are colocated with The Linux Foundation's LinuxCon.
Based on feedback from last year, this time it's LinuxCon Europe!
KVM Forum attendees will be able to attend oVirt Workshop sessions and
are eligible to attend LinuxCon Europe for a discounted rate.

http://events.linuxfoundation.org/events/kvm-forum/register

We invite you to lead part of the discussion by submitting a speaking
proposal for KVM Forum 2012.

http://events.linuxfoundation.org/cfp

Suggested topics:

 KVM
 - Scaling and performance
 - Nested virtualization
 - I/O improvements
 - PCI device assignment
 - Driver domains
 - Time keeping
 - Resource management (cpu, memory, i/o)
 - Memory management (page sharing, swapping, huge pages, etc)
 - VEPA, VN-Link, vswitch
 - Security
 - Architecture ports
 
 QEMU
 - Device model improvements
 - New devices and chipsets
 - Scaling and performance
 - Desktop virtualization
 - Spice
 - Increasing robustness and hardening
 - Security model
 - Management interfaces
 - QMP protocol and implementation
 - Image formats
 - Firmware (SeaBIOS, OVMF, UEFI, etc)
 - Live migration
 - Live snapshots and merging
 - Fault tolerance, high availability, continuous backup
 - Real-time guest support
 
 Virtio
 - Speeding up existing devices
 - Alternatives
 - Virtio on non-Linux or non-virtualized
 
 Management infrastructure
 - oVirt (shared track w/ oVirt Workshop)
 - Libvirt
 - KVM autotest
 - OpenStack
 - Network virtualization management
 - Enterprise storage management
 
 Cloud computing
 - Scalable storage
 - Virtual networking
 - Security
 - Provisioning

SUBMISSION REQUIREMENTS

Abstracts due: Aug 31st, 2012
Notification: Sep 14th, 2012

Please submit a short abstract (~150 words) describing your presentation
proposal.  In your submission please note how long your talk will take.
Slots vary in length up to 45 minutes.  Also include in your proposal
the proposal type -- one of:

- technical talk
- end-user talk
- birds of a feather (BOF) session

Submit your proposal here:

http://events.linuxfoundation.org/cfp

You will receive a notification whether or not your presentation proposal
was accepted by Sep 14th.

END-USER COLLABORATION

One of the big challenges as developers is to know what, where and how
people actually use our software.  We will reserve a few slots for end
users talking about their deployment challenges and achievements.

If you are using KVM in production, you are encouraged to submit a speaking
proposal.  Simply mark it as an end-user collaboration proposal.  As an
end user, this is a unique opportunity to get your input to developers.

BOF SESSION

We will reserve some slots in the evening after the main conference
tracks, for birds of a feather (BOF) sessions. These sessions will be
less formal than presentation tracks and targeted at people who would
like to discuss specific issues with other developers and/or users.
If you are interested in getting developers and/or users together to
discuss a specific problem, please submit a BOF proposal.

LIGHTNING TALKS

In addition to submitted talks we will also have some room for lightning
talks. These are short (5 minute) discussions to highlight new work or
ideas that aren't complete enough to warrant a full presentation slot.
Lightning talk submissions and scheduling will be handled on-site at
KVM Forum.

HOTEL / TRAVEL

The KVM Forum 2012 will be held in Barcelona, Spain at the Hotel Fira Palace.

http://events.linuxfoundation.org/events/kvm-forum/hotel

Thank you for your interest in KVM.  We're looking forward to your
submissions and seeing you at the KVM Forum 2012 in November!

Thanks,
your KVM Forum 2012 Program Committee

Please contact us with any questions or comments.
kvm-forum-2012...@redhat.com

[PATCH V4 0/3] Improve virtio-blk performance

2012-07-27 Thread Asias He
Hi, Jens & Rusty

This version is rebased against linux-next, which resolves the conflict with
Paolo Bonzini's 'virtio-blk: allow toggling host cache between writeback and
writethrough' patch.

Patches 1/3 and 2/3 apply on Linus's master as well. Since Rusty will pick up
patch 3/3, the changes to the block core (adding blk_bio_map_sg()) will have a
user.

Jens, could you please consider picking up the dependencies 1/3 and 2/3 in your
tree? Thanks!

This patchset implements a bio-based IO path for virtio-blk to improve
performance.

Fio test shows bio-based IO path gives the following performance improvement:

1) Ramdisk device
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost : 28%, 24%, 21%, 16%
 Latency improvement: 32%, 17%, 21%, 16%
2) Fusion IO device
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost : 11%, 11%, 13%, 10%
 Latency improvement: 10%, 10%, 12%, 10%

Asias He (3):
  block: Introduce __blk_segment_map_sg() helper
  block: Add blk_bio_map_sg() helper
  virtio-blk: Add bio-based IO path for virtio-blk

 block/blk-merge.c  |  117 +
 drivers/block/virtio_blk.c |  203 +++-
 include/linux/blkdev.h |2 +
 3 files changed, 247 insertions(+), 75 deletions(-)

-- 
1.7.10.4



[PATCH V4 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-07-27 Thread Asias He
This patch introduces bio-based IO path for virtio-blk.

Compared to the request-based IO path, the bio-based IO path uses the
driver-provided ->make_request_fn() method to bypass the IO scheduler. It
hands the bio to the device directly without allocating a request in the
block layer. This shortens the IO path in the guest kernel to achieve
higher IOPS and lower latency. The downside is that the guest cannot use
the IO scheduler to merge and sort requests. However, this is not a big
problem if the backend disk on the host side is a fast device.

When the bio-based IO path is not enabled, virtio-blk still uses the
original request-based IO path, and no performance difference is observed.
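
For readers unfamiliar with the pattern, a minimal sketch of the registration
a bio-based path relies on is below; the virtblk_* helpers are illustrative
names, not the code in this patch.

#include <linux/blkdev.h>
#include <linux/bio.h>

/* hand each bio straight to the device: no struct request is allocated,
 * so the IO scheduler never sees (or merges/sorts) these bios */
static void virtblk_bio_make_request(struct request_queue *q, struct bio *bio)
{
	struct virtio_blk *vblk = q->queuedata;

	virtblk_add_bio_to_vq(vblk, bio);	/* hypothetical helper */
}

static struct request_queue *virtblk_alloc_bio_queue(struct virtio_blk *vblk)
{
	struct request_queue *q = blk_alloc_queue(GFP_KERNEL);

	if (!q)
		return NULL;
	q->queuedata = vblk;
	blk_queue_make_request(q, virtblk_bio_make_request);
	return q;
}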

Performance evaluation:
-
1) The fio test is performed in an 8-vcpu guest with a ramdisk-based disk, using
kvm tool.

Short version:
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost : 28%, 24%, 21%, 16%
 Latency improvement: 32%, 17%, 21%, 16%

Long version:
 With bio-based IO path:
  seq-read  : io=2048.0MB, bw=116996KB/s, iops=233991 , runt= 17925msec
  seq-write : io=2048.0MB, bw=100829KB/s, iops=201658 , runt= 20799msec
  rand-read : io=3095.7MB, bw=112134KB/s, iops=224268 , runt= 28269msec
  rand-write: io=3095.7MB, bw=96198KB/s,  iops=192396 , runt= 32952msec
clat (usec): min=0 , max=2631.6K, avg=58716.99, stdev=191377.30
clat (usec): min=0 , max=1753.2K, avg=66423.25, stdev=81774.35
clat (usec): min=0 , max=2915.5K, avg=61685.70, stdev=120598.39
clat (usec): min=0 , max=1933.4K, avg=76935.12, stdev=96603.45
  cpu : usr=74.08%, sys=703.84%, ctx=29661403, majf=21354, minf=22460954
  cpu : usr=70.92%, sys=702.81%, ctx=77219828, majf=13980, minf=27713137
  cpu : usr=72.23%, sys=695.37%, ctx=88081059, majf=18475, minf=28177648
  cpu : usr=69.69%, sys=654.13%, ctx=145476035, majf=15867, minf=26176375
 With request-based IO path:
  seq-read  : io=2048.0MB, bw=91074KB/s, iops=182147 , runt= 23027msec
  seq-write : io=2048.0MB, bw=80725KB/s, iops=161449 , runt= 25979msec
  rand-read : io=3095.7MB, bw=92106KB/s, iops=184211 , runt= 34416msec
  rand-write: io=3095.7MB, bw=82815KB/s, iops=165630 , runt= 38277msec
clat (usec): min=0 , max=1932.4K, avg=77824.17, stdev=170339.49
clat (usec): min=0 , max=2510.2K, avg=78023.96, stdev=146949.15
clat (usec): min=0 , max=3037.2K, avg=74746.53, stdev=128498.27
clat (usec): min=0 , max=1363.4K, avg=89830.75, stdev=114279.68
  cpu : usr=53.28%, sys=724.19%, ctx=37988895, majf=17531, minf=23577622
  cpu : usr=49.03%, sys=633.20%, ctx=205935380, majf=18197, minf=27288959
  cpu : usr=55.78%, sys=722.40%, ctx=101525058, majf=19273, minf=28067082
  cpu : usr=56.55%, sys=690.83%, ctx=228205022, majf=18039, minf=26551985

2) The fio test is performed in an 8-vcpu guest with a Fusion-IO based disk, using
kvm tool.

Short version:
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost : 11%, 11%, 13%, 10%
 Latency improvement: 10%, 10%, 12%, 10%
Long Version:
 With bio-based IO path:
  read : io=2048.0MB, bw=58920KB/s, iops=117840 , runt= 35593msec
  write: io=2048.0MB, bw=64308KB/s, iops=128616 , runt= 32611msec
  read : io=3095.7MB, bw=59633KB/s, iops=119266 , runt= 53157msec
  write: io=3095.7MB, bw=62993KB/s, iops=125985 , runt= 50322msec
clat (usec): min=0 , max=1284.3K, avg=128109.01, stdev=71513.29
clat (usec): min=94 , max=962339 , avg=116832.95, stdev=65836.80
clat (usec): min=0 , max=1846.6K, avg=128509.99, stdev=89575.07
clat (usec): min=0 , max=2256.4K, avg=121361.84, stdev=82747.25
  cpu : usr=56.79%, sys=421.70%, ctx=147335118, majf=21080, minf=19852517
  cpu : usr=61.81%, sys=455.53%, ctx=143269950, majf=16027, minf=24800604
  cpu : usr=63.10%, sys=455.38%, ctx=178373538, majf=16958, minf=24822612
  cpu : usr=62.04%, sys=453.58%, ctx=226902362, majf=16089, minf=23278105
 With request-based IO path:
  read : io=2048.0MB, bw=52896KB/s, iops=105791 , runt= 39647msec
  write: io=2048.0MB, bw=57856KB/s, iops=115711 , runt= 36248msec
  read : io=3095.7MB, bw=52387KB/s, iops=104773 , runt= 60510msec
  write: io=3095.7MB, bw=57310KB/s, iops=114619 , runt= 55312msec
clat (usec): min=0 , max=1532.6K, avg=142085.62, stdev=109196.84
clat (usec): min=0 , max=1487.4K, avg=129110.71, stdev=114973.64
clat (usec): min=0 , max=1388.6K, avg=145049.22, stdev=107232.55
clat (usec): min=0 , max=1465.9K, avg=133585.67, stdev=110322.95
  cpu : usr=44.08%, sys=590.71%, ctx=451812322, majf=14841, minf=17648641
  cpu : usr=48.73%, sys=610.78%, ctx=418953997, majf=22164, minf=26850689
  cpu : usr=45.58%, sys=581.16%, ctx=714079216, majf=21497, minf=22558223
  cpu : usr=48.40%, sys=599.65%, ctx=656089423, majf=16393, minf=23824409

How to use:
-
Add 'virtio_blk.use_bio=1' to the kernel cmdline or 'modprobe virtio_blk
use_bio=1' to enable the ->make_request_fn() based I/O path.
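
The switch itself is just an ordinary module parameter; a sketch of the
declaration behind the cmdline option above (permissions and description text
are assumptions):

#include <linux/module.h>

/* false = request-based path (default), true = bio-based path */
static bool use_bio;
module_param(use_bio, bool, 0444);
MODULE_PARM_DESC(use_bio, "Use bio-based IO path instead of the request-based path");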

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: 

Re: [PATCH V3 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-07-27 Thread Asias He

On 07/27/2012 08:33 AM, Rusty Russell wrote:

On Fri, 13 Jul 2012 16:38:51 +0800, Asias He as...@redhat.com wrote:

Add 'virtio_blk.use_bio=1' to the kernel cmdline or 'modprobe virtio_blk
use_bio=1' to enable the ->make_request_fn() based I/O path.


This patch conflicts with Paolo Bonzini's 'virtio-blk: allow toggling
host cache between writeback and writethrough', which is also queued (see
linux-next).


Rebased against Paolo's patch in V4.


I'm not sure what the correct behavior for bio & cacheflush is, if any.


REQ_FLUSH is not supported in the bio path.


But as to the patch itself: it's a hack.

1) Leaving the guest's admin to turn on the switch is a terrible choice.
2) The block layer should stop merging and sorting when a device is
fast, not the driver.
3) I pointed out that slow disks have low IOPS, so why is this
conditional?  Sure, more guest exits, but it's still a small number
for a slow device.
4) The only case where we want merging is on a slow device when the host
isn't doing it.

Now, despite this, I'm prepared to commit it.  But in my mind it's a
hack: we should aim for use_bio to be based on a feature bit fed from
the host, and use the module parameter only if we want to override it.
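
A rough sketch of the direction Rusty describes might look like the
following; the feature bit name and number are invented here for
illustration and do not exist in the virtio spec, and the override flag is
hypothetical.

/* hypothetical feature bit a host could set when its backing device is fast
 * enough that guest-side merging/sorting buys nothing */
#define VIRTIO_BLK_F_FAST_BACKEND	13	/* bit number chosen arbitrarily */

static bool virtblk_wants_bio_path(struct virtio_device *vdev)
{
	if (use_bio_forced)		/* hypothetical: admin set the module param */
		return use_bio;
	return virtio_has_feature(vdev, VIRTIO_BLK_F_FAST_BACKEND);
}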


OK. A feature bit from the host sounds like a choice, but a switch is also
needed on the host side. And for other OSes, e.g. Windows, the bio thing does
not apply at all.


Anyway, I have to admit that adding a module parameter here is not the 
best choice. Let's think more.


--
Asias