date:20121122

On Wed, Nov 21, 2012 at 03:27:48PM +0100, lementec fabien wrote:
 usage
 -
 PCIEFW devices are instanciated using the following QEMU options:
 -device \
  pciefw,\
  laddr=local_addr,\
  lport=local_port,\
  raddr=remote_addr,\
  rport=remote_port

Take a look at qemu_socket.h:socket_parse().  It should allow you to
support TCP, UNIX domain sockets, and arbitrary file descriptors.

 implementation
 --
 PCIEFW is a PCIE accesses forwarding device added to the QEMU source tree. At
 initialization, this device opens a bidirectionnal point to point 
 communication
 channel with an external process. This process actually implements the PCIE
 endpoint. That is, a PCIE access made by QEMU is forwarded to the process.
 Reciprocally, replies and interrupts messages from the process are forwarded
 to QEMU.

 The commnication currently relies on a bidirectionnal point to point TCP

s/commnication/communication/

 socket based channel. Byte ordering is little endian.

 PCIEFW initiates a request upon access from QEMU. It sends a message whose
 format is described by the pciefw_msg_t type:
 
 typedef struct pciefw_msg
 {
 #define PCIEFW_MSG_MAX_SIZE (offsetof(pciefw_msg_t, data) + 0x1000)

The size field is uint16_t.  Do you really want to limit to 4 KB of
data?


   pciefw_header_t header;

 #define PCIEFW_OP_READ_CONFIG 0
 #define PCIEFW_OP_WRITE_CONFIG 1
 #define PCIEFW_OP_READ_MEM 2
 #define PCIEFW_OP_WRITE_MEM 3
 #define PCIEFW_OP_READ_IO 4
 #define PCIEFW_OP_WRITE_IO 5
 #define PCIEFW_OP_INT 6
 #define PCIEFW_OP_MSI 7
 #define PCIEFW_OP_MSIX 8

   uint8_t op; /* in PCIEFW_OP_XXX */
   uint8_t bar; /* in [0:5] */
   uint8_t width; /* access in 1, 2, 4, 8 */
   uint64_t addr;
   uint16_t size; /* data size, in bytes */

Why is are both width and size fields?  For read-type operations the
size field would indicate how many bytes to read.  For write-type
operations the size field would indicate how many bytes are included in
data[].

   uint8_t data[1];

 } __attribute__((packed)) pciefw_msg_t;

 Note that data is a variable length field.

 The PCIE endpoint process replies with a pciefw_reply_t formatted message:

 typedef struct pciefw_reply
 {
   pciefw_header_t header;
   uint8_t status;

What values does this field take?

   uint8_t data[8];
 } __attribute__((packed)) pciefw_reply_t;

 The PCIE endpoint process can initiate pciefw_msg_t to perform write 
 operations
 of its own. This is used to perform data transfer (DMA engines ...) and send
 interrupts.

Any flow control rules?  For example, can the endpoint raise an
interrupt while processing a message (before it sends a reply)?

 Both types start with a pciefw_header containing the total size:

 typedef struct pciefw_header
 {
   uint16_t size;
 } __attribute__((packed)) pciefw_header_t;

A hello message type would be useful so that you can extend the
protocol in the future.  The message would contain feature bits or a
version number.

Stefan

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

2012-11-22 Thread Stefan Priebe - Profihost AG


Am 21.11.2012 23:32, schrieb Peter Maydell:

On 21 November 2012 17:03, Stefan Weil s...@weilnetz.de wrote:

Why do you use int64_t instead of off_t?
If the value is related to file sizes, off_t would be a good choice.


Looking at the librbd API (which is what the size and ret
values come from), it uses size_t and ssize_t for these.
So I think probably ssize_t is the right type for ret
(and size) in our structs here.


This sounds reasonable but does ssize_t support negative values? For 
error values.


Greets,
Stefan

Re: [Qemu-devel] [PATCH] inet_parse: fix ,to= parsing

Stefano Stabellini stefano.stabell...@eu.citrix.com writes:

 Fix inet_parse to parse the ,to= command line option correctly.

 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com

 diff --git a/qemu-sockets.c b/qemu-sockets.c
 index cfed9c5..f8740dd 100644
 --- a/qemu-sockets.c
 +++ b/qemu-sockets.c
 @@ -529,8 +529,9 @@ static InetSocketAddress *inet_parse(const char *str, 
 Error **errp)
  optstr = str + pos;
  h = strstr(optstr, ,to=);
  if (h) {
 -if (1 != sscanf(str, %d%n, to, pos) ||
 -(str[pos] != '\0'  str[pos] != ',')) {
 +h += 4;
 +if (1 != sscanf(h, %d%n, to, pos) ||
 +(h[pos] != '\0'  h[pos] != ',')) {
  error_setg(errp, error parsing to= argument);
  goto fail;
  }

Similar patch just got committed as 1ccbc285.  Thanks anyway!

Re: [Qemu-devel] [PATCH] configure: Default to 'cc', not 'gcc', on MacOS X

On 22 November 2012 00:04, Andreas Färber andreas.faer...@web.de wrote:
 Am 22.11.2012 00:19, schrieb Peter Maydell:
 On 17 November 2012 13:10, Peter Maydell peter.mayd...@linaro.org wrote:
 On 17 November 2012 13:02, Andreas Färber andreas.faer...@web.de wrote:
 Am 16.11.2012 17:37, schrieb Peter Maydell:
 +if test $(uname -s) = Darwin; then
 +  # On MacOS X the standard supported system compiler is 'cc' (usually 
 clang),
 +  # and 'gcc' is a legacy llvm-gcc which is rather elderly and best 
 avoided.

 This comment strikes me as wrong in this generality. It should at least
 be qualified with OSX version numbers.

 How about and if 'gcc' is not the same as 'cc' then it is a legacy llvm-gcc
 which is rather elderly and best avoided ? I'd rather not get into having
 to research which versions of OSX shipped with which compiler as 'cc',
 when really the point is that 'cc' will always give you whichever compiler
 Apple thought was the best default for that version.

 Andreas: ping? are you happy with this suggested rephrasing?

 Not quite... clang is a relatively new thing. On v10.5.8 ppc64 'cc' is a
 symlink to a real (well, Apple-flavoured) 'gcc-4.0'.

Yes, that's the case where gcc is the same as cc, ie the if condition
in the comment is false. (I guess this is saying my rephrasing is at
best not very clear...)

 What about ... (clang on recent systems) and 'gcc' may be a legacy
 llvm-gcc ...?

Sounds ok.

 Do you
 think this is 1.3 material? (now the static-stublib stuff is in it's
 less critical, but it still seems like the right idea...)

 I wouldn't be opposed to taking the default change into 1.3 as long as
 we can still override it to a specific compiler.

--cc=whatever remains available.

 But then again there's the question of why not doing it on Linux as well
 now that we seem to compile under clang, we have cc - gcc-4.7 on
 openSUSE 12.2. Among our supported platforms only Solaris comes to my
 mind where cc might be an incompatible proprietary compiler.

Do any of the BSDs ship with some odd non-GPL thing as cc ?

-- PMM

Re: [Qemu-devel] [PATCH V3 for-1.3] Build system fix distclean error for pixman

On 22 November 2012 02:07, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:
   Currently Makefile test if pixman have configure log, but the script 
 directly
 return error if that file do not exist. This patch fix it.

 v2: print out the command.
 v3: resend as a stand alone fix patch, add reviewer.

Really v2/v3 info should go below '---' but never mind.

You didn't put for-1.3 in the patch tag in the subject
so I have added it.

-- PMM

Re: [Qemu-devel] [PATCH] configure: Default to 'cc', not 'gcc', on MacOS X

2012-11-22 Thread Brad Smith

On Thu, Nov 22, 2012 at 08:33:17AM +, Peter Maydell wrote:
 On 22 November 2012 00:04, Andreas F??rber andreas.faer...@web.de wrote:
  Am 22.11.2012 00:19, schrieb Peter Maydell:
  On 17 November 2012 13:10, Peter Maydell peter.mayd...@linaro.org wrote:
  On 17 November 2012 13:02, Andreas F??rber andreas.faer...@web.de wrote:
  Am 16.11.2012 17:37, schrieb Peter Maydell:
  +if test $(uname -s) = Darwin; then
  +  # On MacOS X the standard supported system compiler is 'cc' (usually 
  clang),
  +  # and 'gcc' is a legacy llvm-gcc which is rather elderly and best 
  avoided.
 
  This comment strikes me as wrong in this generality. It should at least
  be qualified with OSX version numbers.
 
  How about and if 'gcc' is not the same as 'cc' then it is a legacy 
  llvm-gcc
  which is rather elderly and best avoided ? I'd rather not get into having
  to research which versions of OSX shipped with which compiler as 'cc',
  when really the point is that 'cc' will always give you whichever compiler
  Apple thought was the best default for that version.
 
  Andreas: ping? are you happy with this suggested rephrasing?
 
  Not quite... clang is a relatively new thing. On v10.5.8 ppc64 'cc' is a
  symlink to a real (well, Apple-flavoured) 'gcc-4.0'.
 
 Yes, that's the case where gcc is the same as cc, ie the if condition
 in the comment is false. (I guess this is saying my rephrasing is at
 best not very clear...)
 
  What about ... (clang on recent systems) and 'gcc' may be a legacy
  llvm-gcc ...?
 
 Sounds ok.
 
  Do you
  think this is 1.3 material? (now the static-stublib stuff is in it's
  less critical, but it still seems like the right idea...)
 
  I wouldn't be opposed to taking the default change into 1.3 as long as
  we can still override it to a specific compiler.
 
 --cc=whatever remains available.
 
  But then again there's the question of why not doing it on Linux as well
  now that we seem to compile under clang, we have cc - gcc-4.7 on
  openSUSE 12.2. Among our supported platforms only Solaris comes to my
  mind where cc might be an incompatible proprietary compiler.
 
 Do any of the BSDs ship with some odd non-GPL thing as cc ?

non-GPL yes, odd no.

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

On 22 November 2012 08:23, Stefan Priebe - Profihost AG
s.pri...@profihost.ag wrote:
 Am 21.11.2012 23:32, schrieb Peter Maydell:
 Looking at the librbd API (which is what the size and ret
 values come from), it uses size_t and ssize_t for these.
 So I think probably ssize_t is the right type for ret
 (and size) in our structs here.


 This sounds reasonable but does ssize_t support negative values? For error
 values.

Yes, the first 's' in ssize_t means 'signed' and is the
difference between it and size_t.

-- PMM

Re: [Qemu-devel] [PATCH] configure: Default to 'cc', not 'gcc', on MacOS X

On 22 November 2012 08:38, Brad Smith b...@comstyle.com wrote:
 On Thu, Nov 22, 2012 at 08:33:17AM +, Peter Maydell wrote:
 On 22 November 2012 00:04, Andreas F??rber andreas.faer...@web.de wrote:
  But then again there's the question of why not doing it on Linux as well
  now that we seem to compile under clang, we have cc - gcc-4.7 on
  openSUSE 12.2. Among our supported platforms only Solaris comes to my
  mind where cc might be an incompatible proprietary compiler.

 Do any of the BSDs ship with some odd non-GPL thing as cc ?

 non-GPL yes, odd no.

Does QEMU build OK with them? Should we be preferring them
to gcc?

-- PMM

Re: [Qemu-devel] [PATCH] configure: Default to 'cc', not 'gcc', on MacOS X

2012-11-22 Thread Brad Smith

On Thu, Nov 22, 2012 at 08:41:19AM +, Peter Maydell wrote:
 On 22 November 2012 08:38, Brad Smith b...@comstyle.com wrote:
  On Thu, Nov 22, 2012 at 08:33:17AM +, Peter Maydell wrote:
  On 22 November 2012 00:04, Andreas F??rber andreas.faer...@web.de wrote:
   But then again there's the question of why not doing it on Linux as well
   now that we seem to compile under clang, we have cc - gcc-4.7 on
   openSUSE 12.2. Among our supported platforms only Solaris comes to my
   mind where cc might be an incompatible proprietary compiler.
 
  Do any of the BSDs ship with some odd non-GPL thing as cc ?
 
  non-GPL yes, odd no.
 
 Does QEMU build OK with them? Should we be preferring them
 to gcc?

The release notes for 1.3 seem to indicate Clang is supported
now so I would think so. AFAIK FreeBSD -current does not
ship with gcc anymore. Bitrig is a fork of OpenBSD that only
ships with Clang.

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Re: [Qemu-devel] [PATCH for 1.3 V3] Build system fix distclean error for pixman

2012-11-22 Thread Wenchao Xia




On 22 November 2012 02:07, Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:

   Currently Makefile test if pixman have configure log, but the script directly
return error if that file do not exist. This patch fix it.

v2: print out the command.
v3: resend as a stand alone fix patch, add reviewer.


Really v2/v3 info should go below '---' but never mind.

You didn't put for-1.3 in the patch tag in the subject
so I have added it.

-- PMM


  Thanks for the correcting, by referencing to other patches for v1.3,
I changed the subject to PATCH for 1.3 V3, hope it can draw attention.

--
Best Regards

Wenchao Xia

Re: [Qemu-devel] [PATCH] coroutine: Fix win32 variant for older mingw32 compilers

Il 21/11/2012 20:11, Stefan Weil ha scritto:
 
 Debian cross development is full of difficulties. Passing an extra
 compiler option like -mthread is only one of these difficulties.
 I updated http://wiki.qemu.org/Hosts/W32, so anybody who really
 wants to run cross compilations on Debian can get more information
 there.

Thanks very much!

Paolo

[Qemu-devel] [Bug 156085] Re: Could not open /proc/bus/usb/devices

2012-11-22 Thread Bug Watch Updater

** Changed in: usbview (Debian)
   Status: New = Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/156085

Title:
  Could not open /proc/bus/usb/devices

Status in QEMU:
  Fix Released
Status in Virtualbox:
  Fix Released
Status in “apcupsd” package in Ubuntu:
  Invalid
Status in “fxload” package in Ubuntu:
  Fix Released
Status in “kvm” package in Ubuntu:
  Fix Released
Status in “madfuload” package in Ubuntu:
  Fix Released
Status in “midisport-firmware” package in Ubuntu:
  Fix Released
Status in “qemu” package in Ubuntu:
  Fix Released
Status in “sysvinit” package in Ubuntu:
  Won't Fix
Status in “usbview” package in Ubuntu:
  Triaged
Status in “vmware-server” package in Ubuntu:
  Invalid
Status in “apcupsd” source package in Gutsy:
  Invalid
Status in “fxload” source package in Gutsy:
  Won't Fix
Status in “kvm” source package in Gutsy:
  Won't Fix
Status in “madfuload” source package in Gutsy:
  Won't Fix
Status in “midisport-firmware” source package in Gutsy:
  Won't Fix
Status in “qemu” source package in Gutsy:
  Won't Fix
Status in “sysvinit” source package in Gutsy:
  Invalid
Status in “usbview” source package in Gutsy:
  Won't Fix
Status in “vmware-server” source package in Gutsy:
  Won't Fix
Status in “apcupsd” source package in Hardy:
  Invalid
Status in “fxload” source package in Hardy:
  New
Status in “kvm” source package in Hardy:
  Confirmed
Status in “madfuload” source package in Hardy:
  New
Status in “midisport-firmware” source package in Hardy:
  New
Status in “qemu” source package in Hardy:
  Confirmed
Status in “sysvinit” source package in Hardy:
  Won't Fix
Status in “usbview” source package in Hardy:
  Confirmed
Status in “vmware-server” source package in Hardy:
  New
Status in “apcupsd” source package in Jaunty:
  Invalid
Status in “fxload” source package in Jaunty:
  Fix Released
Status in “kvm” source package in Jaunty:
  Fix Released
Status in “madfuload” source package in Jaunty:
  Fix Released
Status in “midisport-firmware” source package in Jaunty:
  Fix Released
Status in “qemu” source package in Jaunty:
  Fix Released
Status in “sysvinit” source package in Jaunty:
  Won't Fix
Status in “usbview” source package in Jaunty:
  Won't Fix
Status in “vmware-server” source package in Jaunty:
  Invalid
Status in “usbview” package in Debian:
  Fix Released
Status in “qemu” package in openSUSE:
  Fix Released

Bug description:
  On Gutsy 64-bit using qemu/kvm  (also VirtualBox and, I believe,
  VMWare) and the -usb switch no USB devices are available since
  qemu/kvm looks for the description of them in the 'devices' file of a
  usbfs mount at

  /proc/bus/usb/

  but the code previously providing the usbfs mount that is created at
  start-up via /etc/rcS/S11mountdevsubfs.sh --
  /etc/init.d/mountdevsubfs.sh was commented out. The USB devices are
  now reported at

  /dev/bus/usb/

  but currently there is no usbfs file-system mounted/linked there to
  provide the 'devices' text-file.

  qemu/kvm reports:

  qemuctl -qemu vdeq kvm  -boot c -m 512 -hda /home/all/VirtualMachines
  /WindowsXP-SP2.ovl -net nic,vlan=0 -soundhw es1370 -usb -net vde

  info usbhost
  Could not open /proc/bus/usb/devices

  $ ls -l /proc/bus/usb
  total 0

  
  WORKAROUND:
  To re-enable it edit the file '/etc/init.d/mountdevsubfs.sh' and uncomment 
the code after line 40 so it reads:
   # Magic to make /proc/bus/usb work
#
mkdir -p /dev/bus/usb/.usbfs
domount usbfs  /dev/bus/usb/.usbfs 
-obusmode=0700,devmode=0600,listmode=0644
ln -s .usbfs/devices /dev/bus/usb/devices
mount --rbind /dev/bus/usb /proc/bus/usb
  Execute the script manually to enable the change before a system reboot:
  $ sudo /etc/init.d/mountdevsubfs.sh start

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/156085/+subscriptions

[Qemu-devel] [PATCH] overflow of int ret: use ssize_t for ret

2012-11-22 Thread Stefan Priebe

When acb-cmd is WRITE or DISCARD block/rbd stores rcb-size into acb-ret

Look here:
   if (acb-cmd == RBD_AIO_WRITE ||
acb-cmd == RBD_AIO_DISCARD) {
if (r  0) {
acb-ret = r;
acb-error = 1;
} else if (!acb-error) {
acb-ret = rcb-size;
}

right now acb-ret is just an int and we might get an overflow if size is too 
big.
For discards rcb-size holds the size of the discard - this might be some TB if 
you
discard a whole device.

The steps to reproduce are:
mkfs.xfs -f a whole device bigger than int in bytes. mkfs.xfs sends a discard. 
Important is that you use scsi-hd and set discard_granularity=512. Otherwise 
rbd disabled discard support.
---
 block/rbd.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 5a0f79f..0384c6c 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -69,7 +69,7 @@ typedef enum {
 typedef struct RBDAIOCB {
 BlockDriverAIOCB common;
 QEMUBH *bh;
-int ret;
+ssize_t ret;
 QEMUIOVector *qiov;
 char *bounce;
 RBDAIOCmd cmd;
@@ -86,7 +86,7 @@ typedef struct RADOSCB {
 int done;
 int64_t size;
 char *buf;
-int ret;
+ssize_t ret;
 } RADOSCB;
 
 #define RBD_FD_READ 0
-- 
1.7.10.4

Re: [Qemu-devel] [PATCH 2/2] Documentation: Update image format information

2012-11-22 Thread Kevin Wolf

Am 22.11.2012 08:37, schrieb Stefan Hajnoczi:
 On Wed, Nov 21, 2012 at 04:22:07PM +0100, Kevin Wolf wrote:
 Am 21.11.2012 16:14, schrieb Stefan Hajnoczi:
 On Wed, Nov 21, 2012 at 02:23:57PM +0100, Kevin Wolf wrote:
  @item qed
 -Image format with support for backing files and compact image files (when 
 your
 -filesystem or transport medium does not support holes).  Good performance 
 due
 -to less metadata than the more featureful qcow2 format, especially with
 -cache=writethrough or cache=directsync.  Consider using qcow2 which will 
 soon
 -have a similar optimization and is most actively developed.
 +Old QEMU image format. Left for compatibility.
 +
 +For new images, use qcow2 instead. You might want to consider using the
 +@code{lazy_refcounts=on} option to get a more QED-like behaviour.

 The first sentence should be kept, it describes the general feature set
 and scope of this image format.  I agree that the rest of the paragraph
 can be dropped.

 You could insert a statement saying that qcow2 is now preferred because
 it is actively developed and offers advanced features and performance as
 the very first sentence.

 It's the same terse description as for qcow1. If we decided to describe
 the format in more detail, we should do so consistently, and probably
 also for non-native formats. (But why? The only use case is
 compatibility with other/older hypervisors, which is mentioned.)
 
 Yes, we should have descriptions of other formats too.
 
 Users dealing with existing guests using these formats should still have
 access to general information about the formats.  For example, if I'm a
 new user and it's my job to work with an existing qcow1 disk image then
 it's not very helpful to just see a terse Use qcow2 instead message.

 It's not good to drop documentation on a feature because it is
 deprecated.

I can see your point, but the whole section starts to become a bit
lengthy then for a man page. Also, I don't think that your qcow1 user
really needs a detailed description - he already has a format and
doesn't have to choose one, so the characteristics aren't that
important. If he considers converting the image, we're back to raw and
qcow2.

Maybe we should keep the current descriptions for raw and qcow2 (as they
are what users really choose between for new images), and extend and
move the documentation of other formats into qemu-doc.texi and mention
them in the man page of qemu-img only as Also supported for
compatibility with other hypervisors or older QEMU versions: vmdk, vdi,
qed, ... See the QEMU Emulator User Documentation for more information
on these.

Kevin

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

2012-11-22 Thread Stefan Priebe - Profihost AG


Hello,

i send a new patch using ssize_t. (Subject [PATCH] overflow of int ret: 
use ssize_t for ret)


Stefan

Am 22.11.2012 09:40, schrieb Peter Maydell:

On 22 November 2012 08:23, Stefan Priebe - Profihost AG
s.pri...@profihost.ag wrote:

Am 21.11.2012 23:32, schrieb Peter Maydell:

Looking at the librbd API (which is what the size and ret
values come from), it uses size_t and ssize_t for these.
So I think probably ssize_t is the right type for ret
(and size) in our structs here.



This sounds reasonable but does ssize_t support negative values? For error
values.


Yes, the first 's' in ssize_t means 'signed' and is the
difference between it and size_t.

-- PMM

Re: [Qemu-devel] [PATCH v3] Add compare subcommand for qemu-img

On Wed, Nov 21, 2012 at 04:50:14AM -0500, Miroslav Rezanina wrote:
 diff --git a/qemu-img.c b/qemu-img.c
 index e29e01b..6294b11 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -101,7 +101,12 @@ static void help(void)
   '-a' applies a snapshot (revert disk to saved state)\n
   '-c' creates a snapshot\n
   '-d' deletes a snapshot\n
 - '-l' lists all snapshots in the given image\n;
 + '-l' lists all snapshots in the given image\n

Please add \n to separate like the other parameter docs for
subcommands.

 +   Parameters to compare subcommand:\n
 + '-f' First image format\n
 + '-F' Second image format\n
 + '-q' Quiet mode - do not print any output (except errors)\n
 + '-s' Strict mode - fail on different image size or sector 
 allocation\n;
  
  printf(%s\nSupported formats:, help_msg);
  bdrv_iterate_format(format_print, NULL);
 @@ -657,6 +662,277 @@ static int compare_sectors(const uint8_t *buf1, const 
 uint8_t *buf2, int n,
  }
  
  #define IO_BUF_SIZE (2 * 1024 * 1024)
 +#define sector_to_bytes(sector) ((sector)  BDRV_SECTOR_BITS)

No macros, please.  Hiding the sector conversion isn't consistent anyway
since further down in your patch you explicitly use  BDRV_SECTOR_BITS.

Either open code or define static inline functions so we have type
information.

 +
 +/*
 + * Get number of sectors that can be stored in IO buffer.
 + */
 +
 +static int64_t sectors_to_process(int64_t total, int64_t from)

Doc comments fit snuggly against the function definition, no newline
please.  QEMU code isn't very consistent in doc comment formatting in
general but it does not use a newline here.

 +for (;;) {
 +c = getopt(argc, argv, pf:F:sq);
 +if (c == -1) {
 +break;
 +}
 +switch (c) {
 +case 'f':
 +fmt1 = optarg;
 +break;
 +case 'F':
 +fmt2 = optarg;
 +break;
 +case 'p':
 +progress = (quiet == 0) ? 1 : 0;
 +break;
 +case 'q':
 +quiet = 1;
 +if (progress == 1) {
 +progress = 0;
 +}
 +break;

It's cleaner to silence progress after all options have been parsed than
to duplicate the quiet/progress checking.

/* -q overrides -p */
if (quiet) {
progress = 0;
}

 +case 's':
 +strict = 1;
 +break;
 +}

case 'h':
case '?':
help();
break;

 +}
 +if (optind = argc) {

This subcommand takes two filenames, so check the number of arguments is
correct:

if (optind + 1 = argc) {

 +help();
 +goto out3;
 +}
 +filename1 = argv[optind++];
 +filename2 = argv[optind++];
 +
 +/* Initialize before goto out */
 +qemu_progress_init(progress, 2.0);
 +
 +bs1 = bdrv_new_open(filename1, fmt1, flags, true);
 +if (!bs1) {
 +error_report(Can't open file %s, filename1);
 +ret = 2;
 +goto out3;
 +}
 +
 +bs2 = bdrv_new_open(filename2, fmt2, flags, true);
 +if (!bs2) {
 +error_report(Can't open file %s:, filename2);

Stray ':'?

 +ret = 2;
 +goto out2;
 +}
 +
 +buf1 = qemu_blockalign(bs1, IO_BUF_SIZE);
 +buf2 = qemu_blockalign(bs2, IO_BUF_SIZE);
 +bdrv_get_geometry(bs1, bs_sectors);
 +total_sectors1 = bs_sectors;
 +bdrv_get_geometry(bs2, bs_sectors);
 +total_sectors2 = bs_sectors;
 +total_sectors = total_sectors1;
 +progress_base = total_sectors;
 +
 +qemu_progress_print(0, 100);
 +
 +if (total_sectors1 != total_sectors2) {
 +BlockDriverState *bsover;
 +int64_t lo_total_sectors, lo_sector_num;
 +const char *filename_over;
 +
 +if (strict) {
 +ret = 1;
 +if (!quiet) {
 +printf(Strict mode: Image size mismatch!\n);
 +}
 +goto out;
 +} else {
 +error_report(Image size mismatch!);
 +}
 +
 +if (total_sectors1  total_sectors2) {
 +total_sectors = total_sectors2;
 +lo_total_sectors = total_sectors1;
 +lo_sector_num = total_sectors2;
 +bsover = bs1;
 +filename_over = filename1;
 +} else {
 +total_sectors = total_sectors1;
 +lo_total_sectors = total_sectors2;
 +lo_sector_num = total_sectors1;
 +bsover = bs2;
 +filename_over = filename2;
 +}
 +
 +progress_base = lo_total_sectors;
 +
 +for (;;) {
 +nb_sectors = sectors_to_process(lo_total_sectors, lo_sector_num);
 +if (nb_sectors = 0) {
 +break;
 +}
 +rv = bdrv_is_allocated(bsover, lo_sector_num, nb_sectors, pnum);

We should error out if a backing image has non-zero data:

rv = bdrv_is_allocated_above(bsover, NULL, lo_sector_num, nb_sectors

Re: [Qemu-devel] [PATCH] Fix missing TRACE exception

2012-11-22 Thread Julio Guerra

2012/10/19 Julio Guerra gu...@julio.in:
 This patch fixes bug 1031698 :
 https://bugs.launchpad.net/qemu/+bug/1031698

 If we look at the (truncated) translation of the conditional branch
 instruction in the test submitted in the bug post, the call to the
 exception helper is missing in the bne-false chunk of translated
 code :

 IN:
 bne-0x1800278

 OUT:
 0xb544236d:  jne0xb5442396

 0xb5442373:  mov%ebp,(%esp)
 0xb5442376:  mov$0x44,%ebx
 0xb544237b:  mov%ebx,0x4(%esp)
 0xb544237f:  mov$0x1800278,%ebx
 0xb5442384:  mov%ebx,0x25c(%ebp)
 0xb544238a:  call   0x827475a
  ^^
 # OK : call the exception helper function

 0xb5442396:  mov%ebp,(%esp)
 0xb5442399:  mov$0x44,%ebx
 0xb544239e:  mov%ebx,0x4(%esp)
 0xb54423a2:  mov$0x1800270,%ebx
 0xb54423a7:  mov%ebx,0x25c(%ebp)
 # KO : missing call 0x827475a


 Indeed, gen_exception(ctx, excp) called by gen_goto_tb (called by
 gen_bcond) changes ctx-exception's value to excp's :

 gen_bcond()
 {
   gen_goto_tb(ctx, 0, ctx-nip + li - 4);
   /* ctx-exception value is POWERPC_EXCP_BRANCH */

   gen_goto_tb(ctx, 1, ctx-nip);
   /* ctx-exception now value is POWERPC_EXCP_TRACE */
 }


 Making the following gen_goto_tb()'s test false during the second call :

 if ((ctx-singlestep_enabled 
 (CPU_BRANCH_STEP | CPU_SINGLE_STEP)) 
 ctx-exception == POWERPC_EXCP_BRANCH /* false...*/) {
  target_ulong tmp = ctx-nip;
  ctx-nip = dest;
  /* ... and this is the missing call */
  gen_exception(ctx, POWERPC_EXCP_TRACE);
  ctx-nip = tmp;
 }

 So the patch simply adds the missing matching case, fixing our problem.

 Signed-off-by: Julio Guerra gu...@julio.in
 ---
  target-ppc/translate.c |4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff -up a/target-ppc/translate.c b/target-ppc/translate.c
 --- a/target-ppc/translate.c
 +++ b/target-ppc/translate.c
 @@ -3466,7 +3466,8 @@ static inline void gen_goto_tb(DisasCont
  if (unlikely(ctx-singlestep_enabled)) {
  if ((ctx-singlestep_enabled 
  (CPU_BRANCH_STEP | CPU_SINGLE_STEP)) 
 -ctx-exception == POWERPC_EXCP_BRANCH) {
 +(ctx-exception == POWERPC_EXCP_BRANCH ||
 + ctx-exception == POWERPC_EXCP_TRACE)) {
  target_ulong tmp = ctx-nip;
  ctx-nip = dest;
  gen_exception(ctx, POWERPC_EXCP_TRACE);


ping

--
Julio Guerra

Re: [Qemu-devel] [Bug 1081416] [NEW] Qemu 1.2.0 crashes when using tcp serial console and GRUB boots

On Wed, Nov 21, 2012 at 03:14:28AM -, Jérôme Poulin wrote:
 When booting OpenWRT Attitude Adjustement ( 
 http://downloads.openwrt.org/attitude_adjustment/12.09-beta2/x86/generic/openwrt-x86-generic-combined-ext4.img.gz
  ) with this command line:
 qemu-system-x86_64 -serial tcp:127.0.0.1: -hda 
 openwrt-x86-generic-combined-ext4.img
 
 Qemu crashes as soon as GRUB starts, after network cards start.
[...]
 Program received signal SIGABRT, Aborted.
 0x7452bfa5 in raise () from /usr/lib/libc.so.6
 (gdb) bt
 #0  0x7452bfa5 in raise () from /usr/lib/libc.so.6
 #1  0x7452d428 in abort () from /usr/lib/libc.so.6
 #2  0x7456acfb in __libc_message () from /usr/lib/libc.so.6
 #3  0x745f2ad7 in __fortify_fail () from /usr/lib/libc.so.6
 #4  0x745f0bb0 in __chk_fail () from /usr/lib/libc.so.6
 #5  0x745f2a47 in __fdelt_warn () from /usr/lib/libc.so.6
 #6  0x0046a628 in qemu_iohandler_poll (readfds=0xdb7da0 rfds, 
 writefds=0xdb7e20 wfds, xfds=0x6, xfds@entry=0xdb7ea0 xfds, ret=-1, 
 ret@entry=1) at iohandler.c:121
 #7  0x004e8a14 in main_loop_wait (nonblocking=optimized out)
 at main-loop.c:497
 #8  0x004e802b in main_loop ()
 at /usr/src/aur/qemu/src/qemu-1.2.0/vl.c:1643
 #9  main (argc=optimized out, argv=optimized out, envp=optimized out)
 at /usr/src/aur/qemu/src/qemu-1.2.0/vl.c:3755

Can't reproduce on qemu.git/master (1ccbc2851282564308f790753d7158487b6af8e2) or
qemu-system-x86-1.2.0-23.fc18.x86_64.

I get to the OpenWRT root prompt.

Please build qemu.git/master from source to verify whether this issue
still exists:

  $ git clone git://git.qemu-project.org/qemu.git
  $ cd qemu
  $ ./configure --target-list=x86_64-softmmu  make
  $ x86_64-softmmu/qemu-system-x86_64 -serial tcp:127.0.0.1: -hda 
openwrt-x86-generic-combined-ext4.img

Note that if you want to connect to the serial port you should use
-serial tcp:127.0.0.1:,server.  The command-line you specified tries
to connect to 127.0.0.1: as a client instead of listening as a
server.

Thanks,
Stefan

Re: [Qemu-devel] [PATCH v3 09/12] iov: add iov_get_ptr() to reference vector data

Il 21/11/2012 19:32, Stefan Hajnoczi ha scritto:
 The iov_get_ptr() data returns a pointer to contiguous data within a
 vector.  This allows the caller to manipulate data inside the vector
 without copying in/out using iov_from_buf()/iov_to_buf() when we know
 that data is contiguous within an iovec element.

This works for you because you have a single byte to write.  It would
not work for the SG_IO inhdr, which would need iov_to_buf().

What about the following alternative API:

void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
  ssize_t offset, size_t *bytes);

which would place the number of valid bytes (i.e. the length of the
remainder of the iovec entry) in *bytes?

Also, I think that offset == iov_size(iov, iov_cnt) should be
acceptable, and it would be the only case in which *bytes == 0.

Otherwise looks good.

Paolo

 Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
 ---
  iov.c | 25 +
  iov.h | 11 +++
  2 files changed, 36 insertions(+)
 
 diff --git a/iov.c b/iov.c
 index 6eed089..bc78c34 100644
 --- a/iov.c
 +++ b/iov.c
 @@ -395,3 +395,28 @@ size_t iov_discard(struct iovec **iov, unsigned int 
 *iov_cnt, ssize_t bytes)
  }
  return total;
  }
 +
 +void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
 +  ssize_t offset, size_t bytes)
 +{
 +if (offset  0) {
 +offset += iov_size(iov, iov_cnt);
 +if (offset  0) {
 +return NULL; /* offset before beginning of vector */
 +}
 +}
 +
 +while (iov_cnt  0) {
 +if (offset  iov-iov_len) {
 +if (bytes  iov-iov_len - offset) {
 +return NULL; /* would span iovec elements */
 +}
 +return iov-iov_base + offset;
 +}
 +
 +offset -= iov-iov_len;
 +iov_cnt--;
 +iov++;
 +}
 +return NULL; /* offset beyond end of vector */
 +}
 diff --git a/iov.h b/iov.h
 index d6d1fa6..674dd51 100644
 --- a/iov.h
 +++ b/iov.h
 @@ -108,3 +108,14 @@ unsigned iov_copy(struct iovec *dst_iov, unsigned int 
 dst_iov_cnt,
   * smaller than requested if the vector is too small.
   */
  size_t iov_discard(struct iovec **iov, unsigned int *iov_cnt, ssize_t bytes);
 +
 +/*
 + * Get a pointer into a vector at offset if the given number of bytes is
 + * contiguous and not split across iovec elements.  NULL is returned if
 + * memory would span iovec elements or exceed the length of the vector.
 + *
 + * The offset is ssize_t so that an offset from the end of the vector can
 + * be specified with a negative number.
 + */
 +void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt, ssize_t offset,
 +  size_t bytes);

Re: [Qemu-devel] [GitHub] [Qemu-commits] [qemu/qemu] ecdffb: tcg/ppc: Remove unused s_bits variable

malc av1...@comtv.ru writes:

 On Tue, 20 Nov 2012, Paolo Bonzini wrote:

 Il 20/11/2012 10:59, malc ha scritto:

 [..snip..]

 
  FSF says
  
  And what it says about CC0 or whatnot is utterly irrelevent.
 
 Surely their lawyers are better at copyright law than malc.

 Surely they are masters of Civil code of RF.

Let's assume /* public domain */ suffices under the laws that apply to
your contributions, because you said so.

Even then it's still unadvisable.  Because legal boilerplate gets
copied.  And the copiers may well be in jurisdictions where your note
doesn't suffice.

You, as a commiter, ought to set a good example.

[...]

Re: [Qemu-devel] [PATCH v3 09/12] iov: add iov_get_ptr() to reference vector data

On Thu, Nov 22, 2012 at 10:34:13AM +0100, Paolo Bonzini wrote:
 Il 21/11/2012 19:32, Stefan Hajnoczi ha scritto:
  The iov_get_ptr() data returns a pointer to contiguous data within a
  vector.  This allows the caller to manipulate data inside the vector
  without copying in/out using iov_from_buf()/iov_to_buf() when we know
  that data is contiguous within an iovec element.
 
 This works for you because you have a single byte to write.  It would
 not work for the SG_IO inhdr, which would need iov_to_buf().
 
 What about the following alternative API:
 
 void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
   ssize_t offset, size_t *bytes);
 
 which would place the number of valid bytes (i.e. the length of the
 remainder of the iovec entry) in *bytes?
 
 Also, I think that offset == iov_size(iov, iov_cnt) should be
 acceptable, and it would be the only case in which *bytes == 0.
 
 Otherwise looks good.
 
 Paolo

All this looks suspiciously like premature optimization to me.
Do we have data to show avoiding header copy is a win?

The caller would have to handle the case where the header is not
contigious, which seems just too complex - in practice the code
seems to simply fail in this case.

Why not just copy the header using iov_from_buf()/iov_to_buf()
and be done with it?


  Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
  ---
   iov.c | 25 +
   iov.h | 11 +++
   2 files changed, 36 insertions(+)
  
  diff --git a/iov.c b/iov.c
  index 6eed089..bc78c34 100644
  --- a/iov.c
  +++ b/iov.c
  @@ -395,3 +395,28 @@ size_t iov_discard(struct iovec **iov, unsigned int 
  *iov_cnt, ssize_t bytes)
   }
   return total;
   }
  +
  +void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
  +  ssize_t offset, size_t bytes)
  +{
  +if (offset  0) {
  +offset += iov_size(iov, iov_cnt);
  +if (offset  0) {
  +return NULL; /* offset before beginning of vector */
  +}
  +}
  +
  +while (iov_cnt  0) {
  +if (offset  iov-iov_len) {
  +if (bytes  iov-iov_len - offset) {
  +return NULL; /* would span iovec elements */
  +}
  +return iov-iov_base + offset;
  +}
  +
  +offset -= iov-iov_len;
  +iov_cnt--;
  +iov++;
  +}
  +return NULL; /* offset beyond end of vector */
  +}
  diff --git a/iov.h b/iov.h
  index d6d1fa6..674dd51 100644
  --- a/iov.h
  +++ b/iov.h
  @@ -108,3 +108,14 @@ unsigned iov_copy(struct iovec *dst_iov, unsigned int 
  dst_iov_cnt,
* smaller than requested if the vector is too small.
*/
   size_t iov_discard(struct iovec **iov, unsigned int *iov_cnt, ssize_t 
  bytes);
  +
  +/*
  + * Get a pointer into a vector at offset if the given number of bytes is
  + * contiguous and not split across iovec elements.  NULL is returned if
  + * memory would span iovec elements or exceed the length of the vector.
  + *
  + * The offset is ssize_t so that an offset from the end of the vector can
  + * be specified with a negative number.
  + */
  +void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt, ssize_t offset,
  +  size_t bytes);

Re: [Qemu-devel] [PATCH v3 09/12] iov: add iov_get_ptr() to reference vector data

Il 22/11/2012 10:45, Michael S. Tsirkin ha scritto:
 All this looks suspiciously like premature optimization to me.
 Do we have data to show avoiding header copy is a win?

The code is a little simpler, because we know the footer is 1 byte only.

Paolo

 The caller would have to handle the case where the header is not
 contigious, which seems just too complex - in practice the code
 seems to simply fail in this case.
 
 Why not just copy the header using iov_from_buf()/iov_to_buf()
 and be done with it?

[Qemu-devel] [PATCH] rbd block driver fix race between aio completition and aio cancel

2012-11-22 Thread Stefan Priebe

This one fixes a race which qemu had also in iscsi block driver
between cancellation and io completition.

qemu_rbd_aio_cancel was not synchronously waiting for the end of
the command.

To archieve this it introduces a new status flag which uses
-EINPROGRESS.

Signed-off-by: Stefan Priebe s.pri...@profihost.ag
---
 block/rbd.c |   23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 0384c6c..783c3d7 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -77,6 +77,7 @@ typedef struct RBDAIOCB {
 int error;
 struct BDRVRBDState *s;
 int cancelled;
+int status;
 } RBDAIOCB;
 
 typedef struct RADOSCB {
@@ -376,12 +377,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 RBDAIOCB *acb = rcb-acb;
 int64_t r;
 
-if (acb-cancelled) {
-qemu_vfree(acb-bounce);
-qemu_aio_release(acb);
-goto done;
-}
-
 r = rcb-ret;
 
 if (acb-cmd == RBD_AIO_WRITE ||
@@ -406,10 +401,11 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 acb-ret = r;
 }
 }
+acb-status = 0;
+
 /* Note that acb-bh can be NULL in case where the aio was cancelled */
 acb-bh = qemu_bh_new(rbd_aio_bh_cb, acb);
 qemu_bh_schedule(acb-bh);
-done:
 g_free(rcb);
 }
 
@@ -574,6 +570,12 @@ static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 RBDAIOCB *acb = (RBDAIOCB *) blockacb;
 acb-cancelled = 1;
+
+while (acb-status == -EINPROGRESS) {
+qemu_aio_wait();
+}
+
+qemu_aio_release(acb);
 }
 
 static AIOPool rbd_aio_pool = {
@@ -646,7 +648,8 @@ static void rbd_aio_bh_cb(void *opaque)
 qemu_bh_delete(acb-bh);
 acb-bh = NULL;
 
-qemu_aio_release(acb);
+if (!acb-cancelled)
+qemu_aio_release(acb);
 }
 
 static int rbd_aio_discard_wrapper(rbd_image_t image,
@@ -691,6 +694,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
 acb-s = s;
 acb-cancelled = 0;
 acb-bh = NULL;
+acb-status = -EINPROGRESS;
 
 if (cmd == RBD_AIO_WRITE) {
 qemu_iovec_to_buf(acb-qiov, 0, acb-bounce, qiov-size);
@@ -737,7 +741,8 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
 failed:
 g_free(rcb);
 s-qemu_aio_count--;
-qemu_aio_release(acb);
+if (!acb-cancelled)
+qemu_aio_release(acb);
 return NULL;
 }
 
-- 
1.7.10.4

Re: [Qemu-devel] [PATCH] rbd block driver fix race between aio completition and aio cancel

2012-11-22 Thread Stefan Priebe - Profihost AG


Hello,

i've send a new patch which hopefully cares about all your comments.

[PATCH] rbd block driver fix race between aio completition and aio cancel

Greets
Stefan

Am 21.11.2012 10:07, schrieb Stefan Hajnoczi:

On Mon, Nov 19, 2012 at 09:39:45PM +0100, Stefan Priebe wrote:

@@ -376,9 +376,7 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
  RBDAIOCB *acb = rcb-acb;
  int64_t r;

-if (acb-cancelled) {
-qemu_vfree(acb-bounce);
-qemu_aio_release(acb);
+if (acb-bh) {
  goto done;
  }


When is acb-bh != NULL here?



@@ -406,9 +404,12 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
  acb-ret = r;
  }
  }
+acb-status = acb-ret;


How about initializing acb-ret with -EINPROGRESS and using it instead
of adding a new status field?


@@ -573,7 +574,10 @@ static void qemu_rbd_close(BlockDriverState *bs)
  static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
  {
  RBDAIOCB *acb = (RBDAIOCB *) blockacb;
-acb-cancelled = 1;
+
+while (acb-status == -EINPROGRESS) {
+qemu_aio_wait();
+}
  }


Need to take care that acb is still valid (not yet released!) when the
while loop iterates.

One way of doing this is to mark the acb as cancelled so the completion
handler won't release it.  Then the cancellation function can release
the acb - it's the last piece of code that needs a reference to acb.  In
this case the acb-cancelled field is useful.


@@ -642,10 +646,11 @@ static void rbd_aio_bh_cb(void *opaque)
  qemu_iovec_from_buf(acb-qiov, 0, acb-bounce, acb-qiov-size);
  }
  qemu_vfree(acb-bounce);
-acb-common.cb(acb-common.opaque, (acb-ret  0 ? 0 : acb-ret));
  qemu_bh_delete(acb-bh);
  acb-bh = NULL;

+acb-common.cb(acb-common.opaque, (acb-ret  0 ? 0 : acb-ret));
+
  qemu_aio_release(acb);
  }



What does this hunk do?

Re: [Qemu-devel] TCP based PCIE request forwarding

2012-11-22 Thread lementec fabien

Hi,

Thanks for the feedback, I will modify the previous document
to include the changes you mentionned. I reply here too.

2012/11/22 Stefan Hajnoczi stefa...@gmail.com:
 On Wed, Nov 21, 2012 at 03:27:48PM +0100, lementec fabien wrote:
 usage
 -
 PCIEFW devices are instanciated using the following QEMU options:
 -device \
  pciefw,\
  laddr=local_addr,\
  lport=local_port,\
  raddr=remote_addr,\
  rport=remote_port

 Take a look at qemu_socket.h:socket_parse().  It should allow you to
 support TCP, UNIX domain sockets, and arbitrary file descriptors.


ok, I will have a look what it implies to support arbitrary file descriptor.
For instance, my current implementation does not work with UDP sockets.
It assumes a reliable, ordered transport layer, whose OS API is not datagram
oriented.

 implementation
 --
 PCIEFW is a PCIE accesses forwarding device added to the QEMU source tree. At
 initialization, this device opens a bidirectionnal point to point 
 communication
 channel with an external process. This process actually implements the PCIE
 endpoint. That is, a PCIE access made by QEMU is forwarded to the process.
 Reciprocally, replies and interrupts messages from the process are forwarded
 to QEMU.

 The commnication currently relies on a bidirectionnal point to point TCP

 s/commnication/communication/

 socket based channel. Byte ordering is little endian.

 PCIEFW initiates a request upon access from QEMU. It sends a message whose
 format is described by the pciefw_msg_t type:

 typedef struct pciefw_msg
 {
 #define PCIEFW_MSG_MAX_SIZE (offsetof(pciefw_msg_t, data) + 0x1000)

 The size field is uint16_t.  Do you really want to limit to 4 KB of
 data?


My first implementation required to allocate a fixed size buffer. It
is no longer
the case (with non datagram oriented IO operations) since I included the header
that contains the message size. Since PCIE maximum payload size is 0x1000,
it was an obvious choice. Of course, it is an arbitrary choice.


   pciefw_header_t header;

 #define PCIEFW_OP_READ_CONFIG 0
 #define PCIEFW_OP_WRITE_CONFIG 1
 #define PCIEFW_OP_READ_MEM 2
 #define PCIEFW_OP_WRITE_MEM 3
 #define PCIEFW_OP_READ_IO 4
 #define PCIEFW_OP_WRITE_IO 5
 #define PCIEFW_OP_INT 6
 #define PCIEFW_OP_MSI 7
 #define PCIEFW_OP_MSIX 8

   uint8_t op; /* in PCIEFW_OP_XXX */
   uint8_t bar; /* in [0:5] */
   uint8_t width; /* access in 1, 2, 4, 8 */
   uint64_t addr;
   uint16_t size; /* data size, in bytes */

 Why is are both width and size fields?  For read-type operations the
 size field would indicate how many bytes to read.  For write-type
 operations the size field would indicate how many bytes are included in
 data[].


Actually, the width field is currently not required. I included it to
allow multiple
contiguous accesses in 1 operation (where count = size / width). The device
would still need know the width of individual accesses in this case. But this is
not used.

   uint8_t data[1];

 } __attribute__((packed)) pciefw_msg_t;

 Note that data is a variable length field.

 The PCIE endpoint process replies with a pciefw_reply_t formatted message:

 typedef struct pciefw_reply
 {
   pciefw_header_t header;
   uint8_t status;

 What values does this field take?


I will define a PCIEFW_STATUS_XXX

   uint8_t data[8];
 } __attribute__((packed)) pciefw_reply_t;

 The PCIE endpoint process can initiate pciefw_msg_t to perform write 
 operations
 of its own. This is used to perform data transfer (DMA engines ...) and send
 interrupts.

 Any flow control rules?  For example, can the endpoint raise an
 interrupt while processing a message (before it sends a reply)?


Currently, messages are not identified so the transport is assumed
to be made in order. In practice, it works because of the LINUX
application I use does not starts 2 DMA transfers in parallel. But a
protocol cannot rely on such assumptions. Plus, I assume QEMU
can eventually make 2 PCIE concurrent accesses on the same
device which would lead to 2 replies. so I will add an identifier field.

 Both types start with a pciefw_header containing the total size:

 typedef struct pciefw_header
 {
   uint16_t size;
 } __attribute__((packed)) pciefw_header_t;

 A hello message type would be useful so that you can extend the
 protocol in the future.  The message would contain feature bits or a
 version number.


I did think about it. More generally, it would be useful to have a control
message to allow an endpoint to be disconnected then reconnected
without having to reboot QEMU. It is very useful when developping a
new device.

 Stefan

I will send you the modified document,

Thanks,

Fabien.

Re: [Qemu-devel] [PATCH for 1.3 v2] block: Fix regression for MinGW (assertion caused by short string)

On Thu, Nov 22, 2012 at 07:25:48AM +0100, Stefan Weil wrote:
 The local string tmp_filename is passed to function get_tmp_filename
 which expects a string with minimum size MAX_PATH for w32 hosts.
 
 MAX_PATH is 260 and PATH_MAX is 259, so tmp_filename was too short.
 
 Commit eba25057b9a5e19d10ace2bc7716667a31297169 introduced this
 regression.
 
 Signed-off-by: Stefan Weil s...@weilnetz.de
 ---
 
 v2: Added TODO comment as suggested by Stefan Hajnoczi.
 
 Hi Anthony, hi Blue,
 
 this fix is needed for 1.3 to avoid a fatal assertion on Windows.
 As discussed on qemu-devel, it is a minimal solution and can be
 replaced by a better one after 1.3.
 
 Please commit it to git master.
 
 Regards
 Stefan
 
 
  block.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com

Re: [Qemu-devel] [PATCH V3] Build system fix distclean error for pixman

On Thu, Nov 22, 2012 at 10:07:46AM +0800, Wenchao Xia wrote:
   Currently Makefile test if pixman have configure log, but the script 
 directly
 return error if that file do not exist. This patch fix it.
 
 v2: print out the command.
 v3: resend as a stand alone fix patch, add reviewer.
 
 Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
 Reviewed-by: Peter Maydell peter.mayd...@linaro.org
 ---
  Makefile |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com

Re: [Qemu-devel] TCP based PCIE request forwarding

Il 22/11/2012 09:19, Stefan Hajnoczi ha scritto:
  usage
  -
  PCIEFW devices are instanciated using the following QEMU options:
  -device \
   pciefw,\
   laddr=local_addr,\
   lport=local_port,\
   raddr=remote_addr,\
   rport=remote_port
 Take a look at qemu_socket.h:socket_parse().  It should allow you to
 support TCP, UNIX domain sockets, and arbitrary file descriptors.
 

Even better it could just be a chardev.  socket_parse() is only used by
the (human) monitor interface.

Paolo

Re: [Qemu-devel] Interrupt controller updates

Il 21/11/2012 02:07, Benjamin Herrenschmidt ha scritto:
 David (CC) want to make some progress with our in-kernel PIC. From
 memory, one of the outcomes of the BOF was that we need to move the
 existing enable in-kernel PIC from generic KVM init to machine init in
 order to be able to add an argument indicating the model use by the
 arch/platform since some like ours support several different models and
 since that all needs to be selected before the VCPUs are created.
 
 Again, from memory, you were volunteered to do the initial x86 change so
 we could piggy back on it :-) Or do I remember wrong ?

Please suggest an API, then we can work out the x86 changes.  I can
volunteer myself, but I wasn't in the BOF so I need something more concrete.

Paolo

Re: [Qemu-devel] [PATCH v3 09/12] iov: add iov_get_ptr() to reference vector data

On Thu, Nov 22, 2012 at 10:52:27AM +0100, Paolo Bonzini wrote:
 Il 22/11/2012 10:45, Michael S. Tsirkin ha scritto:
  All this looks suspiciously like premature optimization to me.
  Do we have data to show avoiding header copy is a win?
 
 The code is a little simpler, because we know the footer is 1 byte only.
 
 Paolo

Yes but the APIs don't make sense in the generic case
of 1 byte: users will have to code up two paths for when
the buffer they want to access gets scattered across.
So this looks like a future source of errors:
it's better to avoid poking at guest memory directly.

If the point is to avoid scanning iov vector when data is towards the
end of the iov, then this does sound reasonable.  In that case IMHO we
should just have accessors that work back from end of the iov. E.g.

size_t iov_from_buf_end(const struct iovec *iov, unsigned int iov_cnt,
size_t offset, const void *buf, size_t bytes)

[Qemu-devel] [Bug 955379] Re: cmake hangs with qemu-arm-static

If you can provide a simple straightforward reproduce case that would be
useful.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/955379

Title:
  cmake hangs with qemu-arm-static

Status in QEMU:
  New
Status in Linaro QEMU:
  New
Status in “qemu-linaro” package in Ubuntu:
  New

Bug description:
  I'm using git commit 3e7ecd976b06f... configured with --target-list
  =arm-linux-user --static in a chroot environment to compile some
  things. I ran into this problem with both pcl and opencv-2.3.1. cmake
  consistently freezes at some point during its execution, though in a
  different spot each time, usually during a step when it's searching
  for some libraries. For instance, pcl most commonly stops after:

  [snip]
  -- Boost version: 1.46.1
  -- Found the following Boost libraries:
  --   system
  --   filesystem
  --   thread
  --   date_time
  -- checking for module 'eigen3'
  --   found eigen3, version 3.0.1

  which is perplexing because it freezes after finding what it wants,
  not during the search. When it does get past that point, it does so
  almost immediately but freezes somewhere else.

  I'm using 64-bit Ubuntu 11.10 with kernel release 3.0.0-16-generic
  with an Intel i5.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/955379/+subscriptions

Re: [Qemu-devel] [PATCH v3 09/12] iov: add iov_get_ptr() to reference vector data

Il 22/11/2012 11:38, Michael S. Tsirkin ha scritto:
  The code is a little simpler, because we know the footer is 1 byte only.

 Yes but the APIs don't make sense in the generic case
 of 1 byte: users will have to code up two paths for when
 the buffer they want to access gets scattered across.

That would be premature optimization; with 1 byte you just use
iov_from/to_buf.

BTW, something like this function is also useful for the broken SCSI
outhdr (the CDB starts after the common outhdr and is in a single
iovec).  But the API must be changed slightly, as in my answer to Stefan.

 If the point is to avoid scanning iov vector when data is towards the
 end of the iov, then this does sound reasonable.  In that case IMHO we
 should just have accessors that work back from end of the iov. E.g.
 
 size_t iov_from_buf_end(const struct iovec *iov, unsigned int iov_cnt,
   size_t offset, const void *buf, size_t bytes)

That's also a possibility.

Paolo

Re: [Qemu-devel] [PATCH v3 09/12] iov: add iov_get_ptr() to reference vector data

On Thu, Nov 22, 2012 at 11:54:49AM +0100, Paolo Bonzini wrote:
 Il 22/11/2012 11:38, Michael S. Tsirkin ha scritto:
   The code is a little simpler, because we know the footer is 1 byte only.
 
  Yes but the APIs don't make sense in the generic case
  of 1 byte: users will have to code up two paths for when
  the buffer they want to access gets scattered across.
 
 That would be premature optimization; with 1 byte you just use
 iov_from/to_buf.

If the API makes it very clear that it only works for 1 byte
I would have no objection.

 BTW, something like this function is also useful for the broken SCSI
 outhdr (the CDB starts after the common outhdr and is in a single
 iovec).  But the API must be changed slightly, as in my answer to Stefan.

The scsi command is special in that the length is
iov_length. It's all legacy anyway so I think we can
just assume it's the second s/g entry:
iov[1].iov_length.
We should probably have a wrapper that gets it
after checking iov_cnt  1.

size_t iov_get_seg_length(...)
{
assert(iov_cnt = idx);
return iov[idx].iov_length;
}

Rusty says he'll put the length of the command
in the buffer in the future, so we will have
if (new_cmd_feature)
iov_to_buf( len)
else
len = iov_get_seg_length



  If the point is to avoid scanning iov vector when data is towards the
  end of the iov, then this does sound reasonable.  In that case IMHO we
  should just have accessors that work back from end of the iov. E.g.
  
  size_t iov_from_buf_end(const struct iovec *iov, unsigned int iov_cnt,
  size_t offset, const void *buf, size_t bytes)
 
 That's also a possibility.
 
 Paolo

Re: [Qemu-devel] [PATCH 1/5] RFC: Efficient VM backup for qemu (v1)

On Wed, Nov 21, 2012 at 10:01:00AM +0100, Dietmar Maurer wrote:
 +==Disadvantages==
 +
 +* we need to define a new archive format
 +
 +Note: Most existing archive formats are optimized to store small files
 +including file attributes. We simply do not need that for VM archives.

Did you look at the VMDK Stream-Optimized Compressed subformat?

http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?src=vmdk

It is a stream of compressed grains (data).  They are out-of-order and
each grain comes with the virtual disk lba where the data should be
visible to the guest.

The stream also contains grain tables and grain directories.  This
metadata makes random read access to the file possible once you have
downloaded the entire file (i.e. it is seekable).  Although tools can
choose to consume the stream in sequential order too and ignore the
metadata.

In other words, the format is an out-of-order stream of data chunks plus
random access lookup tables at the end.

QEMU's block/vmdk.c already has some support for this format although I
don't think we generate out-of-order yet.

The benefit of reusing this code is that existing tools can consume
these files.

Stefan

Re: [Qemu-devel] [PATCH v4 00/14] Add Q35 base support

2012-11-22 Thread Gerd Hoffmann

On 11/14/12 21:53, Jason Baron wrote:
 These patches are intened to give us a base set of patches for Q35 upon which
 to build. The major change in this series is to add the memory controller hub,
 or 'mch' as proper member of the q35 host structure. This change refactors the

Fails to build with all targets enabled:

[ ... ]

  LINK  alpha-softmmu/qemu-system-alpha
../hw/acpi_ich9.o: In function `ich9_pm_init':
acpi_ich9.c:(.text+0x5d): undefined reference to `acpi_pm_tmr_init'
acpi_ich9.c:(.text+0x65): undefined reference to `acpi_pm1_cnt_init'
acpi_ich9.c:(.text+0x72): undefined reference to `acpi_gpe_init'

[ ... ]

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 2/5] add basic backup support to block driver

On Wed, Nov 21, 2012 at 10:01:01AM +0100, Dietmar Maurer wrote:
Two comments from skimming the code, not a full review.

 +/* #define DEBUG_BACKUP */
 +
 +#ifdef DEBUG_BACKUP
 +#define DPRINTF(fmt, ...) \
 +do { printf(backup:  fmt, ## __VA_ARGS__); } while (0)
 +#else
 +#define DPRINTF(fmt, ...) \
 +do { } while (0)
 +#endif

Please don't #define debug macros like this.  It hides build breakage
because the debug code is #ifdef'd out.

Either use docs/tracing.txt or do the following so the code always gets
parsed by the compiler:

#define DEBUG_BACKUP 0
#define DPRINTF(fmt, ...) \
do { \
if (DEBUG_BACKUP) { \
printf(backup:  fmt, ## __VA_ARGS__); \
} \
} while (0)

 +BackupInfo *
 +bdrv_backup_init(BlockDriverState *bs, BackupDumpFunc *backup_dump_cb,
 + BlockDriverCompletionFunc *backup_complete_cb,
 + void *opaque)
 +{
 +assert(bs);
 +assert(backup_dump_cb);
 +assert(backup_complete_cb);
 +
 +if (bs-backup_info) {
 +DPRINTF(bdrv_backup_init already initialized %s\n,
 +bdrv_get_device_name(bs));
 +return NULL;
 +}
 +
 +BackupInfo *info = g_malloc0(sizeof(BackupInfo));
 +int64_t bitmap_size;
 +const char *devname = bdrv_get_device_name(bs);
 +
 +if (!devname || !devname[0]) {
 +return NULL;
 +}
 +
 +DPRINTF(bdrv_backup_init %s\n, bdrv_get_device_name(bs));
 +
 +bitmap_size = bs-total_sectors +
 +BACKUP_BLOCKS_PER_CLUSTER * BITS_PER_LONG - 1;
 +bitmap_size /= BACKUP_BLOCKS_PER_CLUSTER * BITS_PER_LONG;
 +
 +info-backup_dump_cb = backup_dump_cb;
 +info-backup_complete_cb = backup_complete_cb;
 +info-opaque = opaque;
 +info-bitmap_size = bitmap_size;
 +info-bitmap = g_new0(unsigned long, bitmap_size);
 +
 +Error *errp;
 +BackupBlockJob *job = block_job_create(backup_job_type, bs, 0,
 +   backup_job_cleanup_cb, bs, errp);

Is there a 1:1 relationship between BackupInfo and BackupBlockJob?  Then
it would be nicer to move BlockupInfo fields into BackupBlockJob (which
is empty right now).  Then you also don't need to add fields to
BlockDriverState because you know that if your blockjob is running you
can access it via bs-job, if necessary.  You can then drop BackupInfo
and any memory management code for it.

Re: [Qemu-devel] TCP based PCIE request forwarding

2012-11-22 Thread lementec fabien

Hi,

I modified the protocol so that new message types can be
added easily. It is necessary for control related messages,
such as the hello one (I called it init). A type field has
been added to the header.

I did not include a is_reply (or is_request) field, and
prefered having 2 distinct message types. This is because
one may imagine a message type that has no reply (ie. ping ...)

Out of order reception is allowed by the use of a tag field
in request and replies. I did not included the tag in the
header, since not all the messages may need a tag. I plan
to implement this tag as a simple incrementing counter, so
made it large enough.

I did not implement these modifications yet, since I prefer
having feedbacks first. Neither did I have a look to the
the command line option parsing.

Regards,

Fabien.

2012/11/22 Paolo Bonzini pbonz...@redhat.com:
 Il 22/11/2012 09:19, Stefan Hajnoczi ha scritto:
  usage
  -
  PCIEFW devices are instanciated using the following QEMU options:
  -device \
   pciefw,\
   laddr=local_addr,\
   lport=local_port,\
   raddr=remote_addr,\
   rport=remote_port
 Take a look at qemu_socket.h:socket_parse().  It should allow you to
 support TCP, UNIX domain sockets, and arbitrary file descriptors.


 Even better it could just be a chardev.  socket_parse() is only used by
 the (human) monitor interface.

 Paolo


pciefw.protocol
Description: Binary data

Re: [Qemu-devel] [Qemu-ppc] [PATCH 12/12] pseries: Generate unique LIOBNs for PCI host bridges

2012-11-22 Thread Alexander Graf



On 22.11.2012, at 08:23, Michael S. Tsirkin m...@redhat.com wrote:

 On Thu, Nov 22, 2012 at 01:27:18PM +1100, David Gibson wrote:
 On Wed, Nov 21, 2012 at 05:27:37PM +0200, Michael S. Tsirkin wrote:
 On Wed, Nov 21, 2012 at 02:27:08PM +0100, Alexander Graf wrote:
 On 11/21/2012 02:21 PM, David Gibson wrote:
 On Wed, Nov 21, 2012 at 03:13:39PM +0200, Michael S. Tsirkin wrote:
 On Wed, Nov 21, 2012 at 11:36:00PM +1100, David Gibson wrote:
 On Wed, Nov 21, 2012 at 01:34:48PM +0200, Michael S. Tsirkin wrote:
 On Wed, Nov 21, 2012 at 11:57:05AM +1100, David Gibson wrote:
 On Tue, Nov 20, 2012 at 02:26:09PM +0200, Michael S. Tsirkin wrote:
 On Tue, Nov 20, 2012 at 10:27:11AM +0100, Alexander Graf wrote:
 On 19.11.2012, at 23:51, David Gibson wrote:
 
 On Mon, Nov 19, 2012 at 05:34:12PM +0100, Alexander Graf wrote:
 On 13.11.2012, at 03:47, David Gibson wrote:
 
 From: Alexey Kardashevskiya...@ozlabs.ru
 
 In future (with VFIO) we will have multiple PCI host bridges on
 pseries.  Each one needs a unique LIOBN (IOMMU id).  At the 
 moment we
 derive these from the pci domain number, but the whole notion of
 domain numbers on the qemu side is bogus and in any case they're 
 not
 actually uniquely allocated at this point.
 
 This patch, therefore uses a simple sequence counter to generate
 unique LIOBNs for PCI host bridges.
 
 Signed-off-by: Alexey Kardashevskiya...@ozlabs.ru
 Signed-off-by: David Gibsonda...@gibson.dropbear.id.au
 I don't really like the idea of having a global variable just
 because our domain ID generation seems to not work as
 expected. Michael, any comments here?
 Well, the patch I sent which changed domain id generation was
 ignored.  In any case, as I said, the whole concept of domain 
 numbers
 Michael?
 This is user visible, right?
 So IMHO we should have the user specify LIOBN through a property,
 rather than assign what's essentially a random value.
 Well, I can implement an override through a property, which could be
 useful in some circumstances.  But we still need to have qemu generate
 unique defaults, rather than forcing it to be specified in every case.
 I don't see why.
 And if you want automatic defaults then they need to be generated in a
 way that does not depend on implementation detail such as order of
 device initialization.
 Because requiring explicit unique liobns to be supplied whenever there
 is more than one PHB is horrible for usability.
 We should make simple things simple and complex things possible.
 More than one PHB seems like an advanced feature
 Not for pseries.  On real hardware of this type, dozens of PHBs is
 routine.  Plus, vfio passthrough is coming, we need at minimum one PHB
 for emulated devices and one for passthrough devices.
 
 Yeah, I second Davids opinion here. We need to make this easy for users.
 
 I think users don't normally create PHBs. They request a disk, a network
 device, a pass-through device ... In this case if this requires
 more PHBs create them internally but I imagine this doesn't
 require any allocation scheme - simply set some address
 for virtual devices, some other one for assigned devices ... no?
 
 No.  One PHB for passthrough and one for emulated is the minimum.
 Since we don't emulated p2p bridges,
 
 Actually qemu does emulate p2p bridges.
 
 each PHB can only support a small
 number of PCI devices, so if enough PCI devices are requested, we will
 still need to create - and assign numbers to - additional PHBs.
 
 Each PHB can support up to 32 slots right? This seems ample for
 a typical use. If you want many tens of devices you need to
 supply addresses manually, this looks reasonable to me.
 
 Allocating PHBs on the fly seems unencessarily tricky.

IIRC it's required for fault isolation.

Alex

 
 If user wants to play with low level detail such as PHBs I don't see why
 it's not reasonable to require full specification.
 
 How do we assign PCI slot IDs today when all you do is a -device?
 This should probably follow the same scheme.
 
 
 Alex
 
 What we do is find a first free slow. But it's exactly why I worry:
 changing pci addresses between qemu releases has been a source of pain
 and compatibility hassles in the past.
 The problem would be more manageable if you simply reserve some fixed
 addresses for internal use, like we reserve slots for VGA and IDE,
 though even that becomes a problem as we switch to q35.
 
 
 -- 
 David Gibson| I'll have my music baroque, and my code
 david AT gibson.dropbear.id.au| minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
 http://www.ozlabs.org/~dgibson

Re: [Qemu-devel] [PATCH 1/5] RFC: Efficient VM backup for qemu (v1)

 Did you look at the VMDK Stream-Optimized Compressed subformat?
 
 http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?
 src=vmdk
 
 It is a stream of compressed grains (data).  They are out-of-order and each
 grain comes with the virtual disk lba where the data should be visible to the
 guest.
 

What kind of license is applied to that specification?

[Qemu-devel] FYI: KVM Forum community team photo

2012-11-22 Thread Daniel P. Berrange

At the LinuxCon / KVM Forum last week, a bunch of QEMU/KVM community members
were rounded up to pose for a team photo. Thanks to Jeff Cody who was the
one with the camera. I've put the photo up here along with, what I hope
is a correct, list of names against faces...

  http://www.linux-kvm.org/static/kvm-forum-2012-barcelona-team-photo.html

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH 2/5] add basic backup support to block driver

 Is there a 1:1 relationship between BackupInfo and BackupBlockJob?  Then it
 would be nicer to move BlockupInfo fields into BackupBlockJob (which is
 empty right now

No, a backup create several block jobs - one for each blockdev you want to 
backup. Those
jobs run in parallel.

Re: [Qemu-devel] FYI: KVM Forum community team photo

2012-11-22 Thread Zhi Yong Wu

On Thu, Nov 22, 2012 at 6:40 PM, Daniel P. Berrange berra...@redhat.com wrote:
 At the LinuxCon / KVM Forum last week, a bunch of QEMU/KVM community members
 were rounded up to pose for a team photo. Thanks to Jeff Cody who was the
 one with the camera. I've put the photo up here along with, what I hope
 is a correct, list of names against faces...

   http://www.linux-kvm.org/static/kvm-forum-2012-barcelona-team-photo.html
It will be more appreciated if someone can upload slides.


 Regards,
 Daniel
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|




-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] [Qemu-ppc] [PATCH 12/12] pseries: Generate unique LIOBNs for PCI host bridges

On Thu, Nov 22, 2012 at 12:27:49PM +0100, Alexander Graf wrote:
 
 
 On 22.11.2012, at 08:23, Michael S. Tsirkin m...@redhat.com wrote:
 
  On Thu, Nov 22, 2012 at 01:27:18PM +1100, David Gibson wrote:
  On Wed, Nov 21, 2012 at 05:27:37PM +0200, Michael S. Tsirkin wrote:
  On Wed, Nov 21, 2012 at 02:27:08PM +0100, Alexander Graf wrote:
  On 11/21/2012 02:21 PM, David Gibson wrote:
  On Wed, Nov 21, 2012 at 03:13:39PM +0200, Michael S. Tsirkin wrote:
  On Wed, Nov 21, 2012 at 11:36:00PM +1100, David Gibson wrote:
  On Wed, Nov 21, 2012 at 01:34:48PM +0200, Michael S. Tsirkin wrote:
  On Wed, Nov 21, 2012 at 11:57:05AM +1100, David Gibson wrote:
  On Tue, Nov 20, 2012 at 02:26:09PM +0200, Michael S. Tsirkin wrote:
  On Tue, Nov 20, 2012 at 10:27:11AM +0100, Alexander Graf wrote:
  On 19.11.2012, at 23:51, David Gibson wrote:
  
  On Mon, Nov 19, 2012 at 05:34:12PM +0100, Alexander Graf wrote:
  On 13.11.2012, at 03:47, David Gibson wrote:
  
  From: Alexey Kardashevskiya...@ozlabs.ru
  
  In future (with VFIO) we will have multiple PCI host bridges on
  pseries.  Each one needs a unique LIOBN (IOMMU id).  At the 
  moment we
  derive these from the pci domain number, but the whole notion 
  of
  domain numbers on the qemu side is bogus and in any case 
  they're not
  actually uniquely allocated at this point.
  
  This patch, therefore uses a simple sequence counter to 
  generate
  unique LIOBNs for PCI host bridges.
  
  Signed-off-by: Alexey Kardashevskiya...@ozlabs.ru
  Signed-off-by: David Gibsonda...@gibson.dropbear.id.au
  I don't really like the idea of having a global variable just
  because our domain ID generation seems to not work as
  expected. Michael, any comments here?
  Well, the patch I sent which changed domain id generation was
  ignored.  In any case, as I said, the whole concept of domain 
  numbers
  Michael?
  This is user visible, right?
  So IMHO we should have the user specify LIOBN through a property,
  rather than assign what's essentially a random value.
  Well, I can implement an override through a property, which could be
  useful in some circumstances.  But we still need to have qemu 
  generate
  unique defaults, rather than forcing it to be specified in every 
  case.
  I don't see why.
  And if you want automatic defaults then they need to be generated in 
  a
  way that does not depend on implementation detail such as order of
  device initialization.
  Because requiring explicit unique liobns to be supplied whenever there
  is more than one PHB is horrible for usability.
  We should make simple things simple and complex things possible.
  More than one PHB seems like an advanced feature
  Not for pseries.  On real hardware of this type, dozens of PHBs is
  routine.  Plus, vfio passthrough is coming, we need at minimum one PHB
  for emulated devices and one for passthrough devices.
  
  Yeah, I second Davids opinion here. We need to make this easy for users.
  
  I think users don't normally create PHBs. They request a disk, a network
  device, a pass-through device ... In this case if this requires
  more PHBs create them internally but I imagine this doesn't
  require any allocation scheme - simply set some address
  for virtual devices, some other one for assigned devices ... no?
  
  No.  One PHB for passthrough and one for emulated is the minimum.
  Since we don't emulated p2p bridges,
  
  Actually qemu does emulate p2p bridges.
  
  each PHB can only support a small
  number of PCI devices, so if enough PCI devices are requested, we will
  still need to create - and assign numbers to - additional PHBs.
  
  Each PHB can support up to 32 slots right? This seems ample for
  a typical use. If you want many tens of devices you need to
  supply addresses manually, this looks reasonable to me.
  
  Allocating PHBs on the fly seems unencessarily tricky.
 
 IIRC it's required for fault isolation.
 
 Alex

IIUC separate physical PHBs might help fault isolation
of assigned devices.
OTOH I don't see why we need to emulate PHBs for this.

  
  If user wants to play with low level detail such as PHBs I don't see why
  it's not reasonable to require full specification.
  
  How do we assign PCI slot IDs today when all you do is a -device?
  This should probably follow the same scheme.
  
  
  Alex
  
  What we do is find a first free slow. But it's exactly why I worry:
  changing pci addresses between qemu releases has been a source of pain
  and compatibility hassles in the past.
  The problem would be more manageable if you simply reserve some fixed
  addresses for internal use, like we reserve slots for VGA and IDE,
  though even that becomes a problem as we switch to q35.
  
  
  -- 
  David Gibson| I'll have my music baroque, and my code
  david AT gibson.dropbear.id.au| minimalist, thank you.  NOT _the_ 
  _other_
 | _way_ _around_!
  http://www.ozlabs.org/~dgibson

Re: [Qemu-devel] [PATCH 1/5] RFC: Efficient VM backup for qemu (v1)

 Did you look at the VMDK Stream-Optimized Compressed subformat?
 
 http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?
 src=vmdk

Max file size 2TB?

Re: [Qemu-devel] [PATCH] ivshmem: use irqfd to interrupt among VMs

2012-11-22 Thread Jan Kiszka

On 2012-11-22 03:48, liu ping fan wrote:
 On Wed, Nov 21, 2012 at 8:43 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-11-21 07:02, Liu Ping Fan wrote:
 From: Liu Ping Fan pingf...@linux.vnet.ibm.com

 Using irqfd, so we can avoid switch between kernel and user when
 VMs interrupts each other.

 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  hw/ivshmem.c |   48 +++-
  1 files changed, 47 insertions(+), 1 deletions(-)

 diff --git a/hw/ivshmem.c b/hw/ivshmem.c
 index f6dbb21..81c7354 100644
 --- a/hw/ivshmem.c
 +++ b/hw/ivshmem.c
 @@ -19,6 +19,7 @@
  #include hw.h
  #include pc.h
  #include pci.h
 +#include msi.h
  #include msix.h
  #include kvm.h
  #include migration.h
 @@ -54,6 +55,11 @@ typedef struct EventfdEntry {
  int vector;
  } EventfdEntry;

 +typedef struct IrqfdEntry {
 +int virq;
 +bool used;

 used = (virq != -1), so it should be redundant, and you can reduce
 IrqfdEntry to a plain int holding the virq.

 Applied,
 
 +} IrqfdEntry;
 +
  typedef struct IVShmemState {
  PCIDevice dev;
  uint32_t intrmask;
 @@ -83,6 +89,8 @@ typedef struct IVShmemState {
  uint32_t vectors;
  uint32_t features;
  EventfdEntry *eventfd_table;
 +IrqfdEntry *vector_irqfd;
 +bool irqfd_enable;

  Error *migration_blocker;

 @@ -632,6 +640,38 @@ static void ivshmem_write_config(PCIDevice *pci_dev, 
 uint32_t address,
  msix_write_config(pci_dev, address, val, len);
  }

 +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector,
 + MSIMessage msg)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +int virq;
 +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
 +
 +virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +if (virq = 0  kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) = 
 0) {
 +s-vector_irqfd[vector].virq = virq;
 +s-vector_irqfd[vector].used = true;
 +qemu_chr_add_handlers(s-eventfd_chr[vector], NULL, NULL, NULL, 
 NULL);
 +} else if (virq = 0) {
 +kvm_irqchip_release_virq(kvm_state, virq);
 +}
 +return 0;

 You drop the errors here. Better refactor the code to a scheme like this:

 In msix_fire_vector_notifier(), there is assert(ret = 0);.  But I
 think ivshmem can work even if irqfd can not be established, and fall
 back to the origin scheme. So here just ignore err. Is it proper?

If you have an error here, something seriously went wrong. So better let
the user know than falling back to slow mode silently. That's what
other users do as well.

 
 err = service();
 if (err) {
 roll_back();
 return err;
 /* or: goto roll_back_... */
 }

 +}
 +
 +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector)
 +{
 +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 +EventNotifier *n = s-peers[s-vm_id].eventfds[vector];
 +int virq = s-vector_irqfd[vector].virq;
 +
 +if (s-vector_irqfd[vector].used) {
 +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq);
 +kvm_irqchip_release_virq(kvm_state, virq);
 +s-vector_irqfd[vector].virq = -1;
 +s-vector_irqfd[vector].used = false;
 +}
 +}
 +
  static int pci_ivshmem_init(PCIDevice *dev)
  {
  IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
 @@ -759,7 +799,13 @@ static int pci_ivshmem_init(PCIDevice *dev)
  }

  s-dev.config_write = ivshmem_write_config;
 -
 +if (kvm_gsi_routing_enabled()) {
 +s-irqfd_enable = msix_set_vector_notifiers(dev, 
 ivshmem_vector_use,
 +ivshmem_vector_release) = 0 ? true : false;
 +if (s-irqfd_enable) {
 +s-vector_irqfd = g_new0(IrqfdEntry, s-vectors);

 Conceptually, msix_set_vector_notifiers can already call
 ivshmem_vector_use, so this initialization would come too late. Doesn't
 happen here as MSI-X is still off during device init. However, just
 perform vector_irqfd allocation unconditionally before the registration.

 Applied,
 And where do you free it again...?

 Will fix it.
 
 And I think, this is the way to push interrupt subsystem out of big
 lock for KVM.  For TCG code, we can use something like
 kvm_irqchip_add_msi_route(). How do you think about it?

Cannot follow what your idea is.

Also not that MSI on x86 is special anyway as it is not routed in any
user-space device model (so far - emulated interrupt remapping will
change this) but goes directly to the target VCPU via the corresponding
IOCTL.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] FYI: KVM Forum community team photo

2012-11-22 Thread Daniel P. Berrange

On Thu, Nov 22, 2012 at 07:33:03PM +0800, Zhi Yong Wu wrote:
 On Thu, Nov 22, 2012 at 6:40 PM, Daniel P. Berrange berra...@redhat.com 
 wrote:
  At the LinuxCon / KVM Forum last week, a bunch of QEMU/KVM community members
  were rounded up to pose for a team photo. Thanks to Jeff Cody who was the
  one with the camera. I've put the photo up here along with, what I hope
  is a correct, list of names against faces...
 
http://www.linux-kvm.org/static/kvm-forum-2012-barcelona-team-photo.html
 It will be more appreciated if someone can upload slides.

I understand that is in the process of being arranged, though this is
thanksgiving week in the US so many people are not working.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH v3 09/12] iov: add iov_get_ptr() to reference vector data

On Thu, Nov 22, 2012 at 10:34:13AM +0100, Paolo Bonzini wrote:
 Il 21/11/2012 19:32, Stefan Hajnoczi ha scritto:
  The iov_get_ptr() data returns a pointer to contiguous data within a
  vector.  This allows the caller to manipulate data inside the vector
  without copying in/out using iov_from_buf()/iov_to_buf() when we know
  that data is contiguous within an iovec element.
 
 This works for you because you have a single byte to write.  It would
 not work for the SG_IO inhdr, which would need iov_to_buf().

Guilty as charged, your honor. :)

Let me give a few more details about the motivation for this function:

In virtio-blk-data-plane we have an iovec[] array.  In the read/write
code path we discard the inhdr/outhdr so just the data buffers are left
in the iovec[] array.  Then we can pass the iovec[] array straight to
the Linux AIO functions.

Because we're using the iovec[] array for data buffers and we're not
allowed to make assumptions about iovec layout, we cannot use
iov_to_buf()/iov_from_buf() at the end to fill in the status field - the
inhdr has already been discarded from the iovec[] array.

Since I knew the inhdr is only 1 byte I decided against doing something
like dynamically allocating/freeing a QEMUIOVector which could handle
spanning iovecs.

That said, I think this function is okay as-is because it works fine for
non-virtio cases where the caller *knows* the iovec[] layout.  As a
utility function it stands on its own.

 What about the following alternative API:
 
 void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
   ssize_t offset, size_t *bytes);
 
 which would place the number of valid bytes (i.e. the length of the
 remainder of the iovec entry) in *bytes?
 
 Also, I think that offset == iov_size(iov, iov_cnt) should be
 acceptable, and it would be the only case in which *bytes == 0.

Hmm...this may be more useful than the version I proposed since the
caller can also use it to find out how many bytes are contiguous.

Michael: Any concerns if I update the code to reflect Paolo's
suggestion?

Stefan

Re: [Qemu-devel] [PATCH 2/2] Documentation: Update image format information

On Thu, Nov 22, 2012 at 10:08:06AM +0100, Kevin Wolf wrote:
 Am 22.11.2012 08:37, schrieb Stefan Hajnoczi:
  On Wed, Nov 21, 2012 at 04:22:07PM +0100, Kevin Wolf wrote:
  Am 21.11.2012 16:14, schrieb Stefan Hajnoczi:
  On Wed, Nov 21, 2012 at 02:23:57PM +0100, Kevin Wolf wrote:
   @item qed
  -Image format with support for backing files and compact image files 
  (when your
  -filesystem or transport medium does not support holes).  Good 
  performance due
  -to less metadata than the more featureful qcow2 format, especially with
  -cache=writethrough or cache=directsync.  Consider using qcow2 which 
  will soon
  -have a similar optimization and is most actively developed.
  +Old QEMU image format. Left for compatibility.
  +
  +For new images, use qcow2 instead. You might want to consider using the
  +@code{lazy_refcounts=on} option to get a more QED-like behaviour.
 
  The first sentence should be kept, it describes the general feature set
  and scope of this image format.  I agree that the rest of the paragraph
  can be dropped.
 
  You could insert a statement saying that qcow2 is now preferred because
  it is actively developed and offers advanced features and performance as
  the very first sentence.
 
  It's the same terse description as for qcow1. If we decided to describe
  the format in more detail, we should do so consistently, and probably
  also for non-native formats. (But why? The only use case is
  compatibility with other/older hypervisors, which is mentioned.)
  
  Yes, we should have descriptions of other formats too.
  
  Users dealing with existing guests using these formats should still have
  access to general information about the formats.  For example, if I'm a
  new user and it's my job to work with an existing qcow1 disk image then
  it's not very helpful to just see a terse Use qcow2 instead message.
 
  It's not good to drop documentation on a feature because it is
  deprecated.
 
 I can see your point, but the whole section starts to become a bit
 lengthy then for a man page. Also, I don't think that your qcow1 user
 really needs a detailed description - he already has a format and
 doesn't have to choose one, so the characteristics aren't that
 important. If he considers converting the image, we're back to raw and
 qcow2.
 
 Maybe we should keep the current descriptions for raw and qcow2 (as they
 are what users really choose between for new images), and extend and
 move the documentation of other formats into qemu-doc.texi and mention
 them in the man page of qemu-img only as Also supported for
 compatibility with other hypervisors or older QEMU versions: vmdk, vdi,
 qed, ... See the QEMU Emulator User Documentation for more information
 on these.

Sounds good.

Stefan

Re: [Qemu-devel] [PATCH 1/5] RFC: Efficient VM backup for qemu (v1)

 It is a stream of compressed grains (data).  They are out-of-order and each
 grain comes with the virtual disk lba where the data should be visible to the
 guest.
 
 The stream also contains grain tables and grain directories.  This
 metadata makes random read access to the file possible once you have
 downloaded the entire file (i.e. it is seekable).  Although tools can choose 
 to
 consume the stream in sequential order too and ignore the metadata.
 
 In other words, the format is an out-of-order stream of data chunks plus
 random access lookup tables at the end.
 
 QEMU's block/vmdk.c already has some support for this format although I
 don't think we generate out-of-order yet.
 
 The benefit of reusing this code is that existing tools can consume these 
 files.

Compression format is hardcoded to RFC 1951 (defalte). I think this is a major 
disadvantage, 
because it is really slow (compared to lzop).

Re: [Qemu-devel] [PATCHv3] Support default block interfaces per QEMUMachine

Christian Borntraeger borntrae...@de.ibm.com writes:

 There are QEMUMachines that have neither IF_IDE nor IF_SCSI as a
 default/standard interface to their block devices / drives. Therefore,
 this patch introduces a new field default_block_type per QEMUMachine
 struct. The prior use_scsi field becomes thereby obsolete and is
 replaced through .default_block_type = IF_SCSI.

 This patch also changes the default for s390x to IF_VIRTIO and
 removes an early hack that converts IF_IDE drives.

Recommend to do this in a separate patch.

 Other parties have already claimed interest (e.g. IF_SD for exynos)

 To create a sane default, for machines that dont specify a
 default_block_type, this patch makes IF_IDE = 0 and IF_NONE = 1.
 I checked all users of IF_NONE (blockdev.c and ww/device-hotplug.c)

Typo: hw/

 as well as IF_IDE and it seems that it is ok to change the defines -
 in other words, I found no obvious (to me) assumption in the code
 regarding IF_NONE==0.

Checking for IF_NONE and IF_IDE is not enough; we have to check that
BlockInterfaceType values are never used in a way that's broken by this
change, such as C89 initializers and implicit comparisons to zero.

I just did that, and couldn't find anything.

Feel free to update the commit message accordingly.

   IF_NONE is only set if there is an
 explicit if=none. Without if=* the interface becomes IF_DEFAULT.

 I would suggest to have some additional care, e.g. by letting
 this patch sit some days in the block tree.

Procedural remarks like this one can go below the --- line, so they
won't get committed.

 Based on an initial patch from Einar Lueck elelu...@de.ibm.com

 Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
 CC: Igor Mitsyanko i.mitsya...@samsung.com
 CC: Markus Armbruster arm...@redhat.com
 CC: Kevin Wolf kw...@redhat.com
 ---
  blockdev.c  |4 ++--
  blockdev.h  |5 +++--
  hw/boards.h |3 ++-
  hw/device-hotplug.c |2 +-
  hw/highbank.c   |2 +-
  hw/leon3.c  |1 -
  hw/mips_jazz.c  |4 ++--
  hw/pc_sysfw.c   |2 +-
  hw/puv3.c   |1 -
  hw/realview.c   |7 ---
  hw/s390-virtio.c|   17 ++---
  hw/spapr.c  |2 +-
  hw/sun4m.c  |   24 
  hw/versatilepb.c|4 ++--
  hw/vexpress.c   |4 ++--
  hw/xilinx_zynq.c|2 +-
  vl.c|   21 -
  17 files changed, 48 insertions(+), 57 deletions(-)

 diff --git a/blockdev.c b/blockdev.c
 index e73fd6e..4a4d99f 100644
 --- a/blockdev.c
 +++ b/blockdev.c
 @@ -275,7 +275,7 @@ static bool do_check_io_limits(BlockIOLimit *io_limits)
  return true;
  }
  
 -DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 +DriveInfo *drive_init(QemuOpts *opts, BlockInterfaceType block_default_type)

Call it default_block_if_type?

  {
  const char *buf;
  const char *file = NULL;
 @@ -325,7 +325,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
  return NULL;
   }
  } else {
 -type = default_to_scsi ? IF_SCSI : IF_IDE;
 +type = block_default_type;
  }
  
  max_devs = if_max_devs[type];
 diff --git a/blockdev.h b/blockdev.h
 index 5f27b64..1c9ca6a 100644
 --- a/blockdev.h
 +++ b/blockdev.h
 @@ -19,8 +19,9 @@ void blockdev_auto_del(BlockDriverState *bs);
  
  typedef enum {
  IF_DEFAULT = -1,/* for use with drive_add() only */
 +IF_IDE, /* machines without block_default_type get 0 
 */

Comment could be clearer.  Perhaps:

/*
 * IF_IDE must be zero, because we want QEMUMachine member
 * block_default_type to default-initialize to IF_IDE
 */
 IF_IDE = 0,

  IF_NONE,
 -IF_IDE, IF_SCSI, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, IF_XEN,
 +IF_SCSI, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, IF_XEN,
  IF_COUNT
  } BlockInterfaceType;
  
 @@ -51,7 +52,7 @@ DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
  QemuOpts *drive_def(const char *optstr);
  QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
  const char *optstr);
 -DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi);
 +DriveInfo *drive_init(QemuOpts *arg, BlockInterfaceType block_default_type);
  
  /* device-hotplug */
  
 diff --git a/hw/boards.h b/hw/boards.h
 index 813d0e5..c66fa16 100644
 --- a/hw/boards.h
 +++ b/hw/boards.h
 @@ -3,6 +3,7 @@
  #ifndef HW_BOARDS_H
  #define HW_BOARDS_H
  
 +#include blockdev.h
  #include qdev.h
  
  typedef struct QEMUMachineInitArgs {
 @@ -24,7 +25,7 @@ typedef struct QEMUMachine {
  const char *desc;
  QEMUMachineInitFunc *init;
  QEMUMachineResetFunc *reset;
 -int use_scsi;
 +BlockInterfaceType block_default_type;

Call it default_block_if_type?

  int max_cpus;
  unsigned int no_serial:1,
  no_parallel:1,
 diff --git a/hw/device-hotplug.c

Re: [Qemu-devel] [PATCH 1/5] RFC: Efficient VM backup for qemu (v1)

 Did you look at the VMDK Stream-Optimized Compressed subformat?
 
 http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?
 src=vmdk

And is that covered by any patents?

Re: [Qemu-devel] [PATCH] coroutine: Fix win32 variant for older mingw32 compilers

2012-11-22 Thread Jan Kiszka

On 2012-11-21 20:11, Stefan Weil wrote:
 Am 21.11.2012 15:38, schrieb Paolo Bonzini:
 Il 21/11/2012 15:33, malc ha scritto:
 Leaking leader is a bit bad, but it looks ok for 1.3.
 Hmm. A TLS destructor is apparently not available. Is there some on
 thread termination callback mechanism on Windows? Didn't find one on
 first glance.

 Dlls receive something like THREAD_DETTACH in it's startup routine or
 something like that if my memory serves me.
 Only DLLs.

 But this sounds like deja-vu.  I'm pretty sure in the past we just
 decided that this compiler is not supported (of course it's bad that
 it's silent).  Stefan, do you remember the details?

 Paolo
 
 Debian cross works with -mthread.The issue was discussed here:

Yep, works here as well.

With this knowledge, why not convert the TLS usage in
qemu-thread-win32.c as well? That's what motivated me to go this path.

 
 https://bugs.launchpad.net/qemu/+bug/932487
 
 Jan, I don't think your patch should be applied.

I agree, but we need a different solution for this subtle failure.

 
 Current MinGW / MinGW-w64 compilers work, so those users
 which compile and use QEMU on Windows won't have a problem.
 With MinGW-w64, it is even possible to compile QEMU with nearly
 no warnings :-)
 
 Debian cross development is full of difficulties. Passing an extra
 compiler option like -mthread is only one of these difficulties.
 I updated http://wiki.qemu.org/Hosts/W32, so anybody who really
 wants to run cross compilations on Debian can get more information
 there.
 
 We could add a check to configure and add -mthread automatically.
 Up to now, there was no consensus whether this is wanted because
 -mthread adds a library to QEMU's dependencies.

As the alternative to this is a crashing QEMU, I doubt there is much to
discuss.

Do we know which mingw version is fine without -mthreads? Or do we have
to test this during configure?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 0/7] virtio: virtio-blk data plane

On Wed, Nov 21, 2012 at 03:00:25PM +0800, Asias He wrote:
On 11/21/2012 02:44 PM, Stefan Hajnoczi wrote:
On Wed, Nov 21, 2012 at 7:42 AM, Asias He as...@redhat.com wrote:
On 11/21/2012 01:39 PM, Asias He wrote:
On 11/20/2012 08:25 PM, Stefan Hajnoczi wrote:
On Tue, Nov 20, 2012 at 1:21 PM, Stefan Hajnoczi stefa...@gmail.com
wrote:
On Tue, Nov 20, 2012 at 10:02 AM, Asias He as...@redhat.com wrote:
Hello Stefan,

On 11/15/2012 11:18 PM, Stefan Hajnoczi wrote:
This series adds the -device virtio-blk-pci,x-data-plane=on property
that
enables a high performance I/O codepath. A dedicated thread is used
to process
virtio-blk requests outside the global mutex and without going
through the QEMU
block layer.

Khoa Huynh k...@us.ibm.com reported an increase from 140,000 IOPS
to 600,000
IOPS for a single VM using virtio-blk-data-plane in July:

http://comments.gmane.org/gmane.comp.emulators.kvm.devel/94580

The virtio-blk-data-plane approach was originally presented at Linux
Plumbers
Conference 2010. The following slides contain a brief overview:

http://linuxplumbersconf.org/2010/ocw/system/presentations/651/original/Optimizing_the_QEMU_Storage_Stack.pdf

The basic approach is:
1. Each virtio-blk device has a thread dedicated to handling ioeventfd
signalling when the guest kicks the virtqueue.
2. Requests are processed without going through the QEMU block layer
using
Linux AIO directly.
3. Completion interrupts are injected via irqfd from the dedicated
thread.

To try it out:

qemu -drive
if=none,id=drive0,cache=none,aio=native,format=raw,file=...
-device virtio-blk-pci,drive=drive0,scsi=off,x-data-plane=on

Is this the latest dataplane bits:
(git://github.com/stefanha/qemu.git virtio-blk-data-plane)

commit 7872075c24fa01c925d4f41faa9d04ce69bf5328
Author: Stefan Hajnoczi stefa...@redhat.com
Date: Wed Nov 14 15:45:38 2012 +0100

virtio-blk: add x-data-plane=on|off performance feature

With this commit on a ramdisk based box, I am seeing about 10K IOPS
with
x-data-plane on and 90K IOPS with x-data-plane off.

Any ideas?

Command line I used:

IMG=/dev/ram0
x86_64-softmmu/qemu-system-x86_64 \
-drive file=/root/img/sid.img,if=ide \
-drive file=${IMG},if=none,cache=none,aio=native,id=disk1 -device
virtio-blk-pci,x-data-plane=off,drive=disk1,scsi=off \
-kernel $KERNEL -append root=/dev/sdb1 console=tty0 \
-L /tmp/qemu-dataplane/share/qemu/ -nographic -vnc :0 -enable-kvm -m
2048 -smp 4 -cpu qemu64,+x2apic -M pc

Was just about to send out the latest patch series which addresses
review comments, so I have tested the latest code
(61b70fef489ce51ecd18d69afb9622c110b9315c).

Rebased onto qemu.git/master before sending out. The commit ID is now:
cf6ed6406543ecc43895012a9ac9665e3753d5e8

https://github.com/stefanha/qemu/commits/virtio-blk-data-plane

Stefan

Ok, thanks. /me trying

Hi Stefan,

If I enable the merge in guest the IOPS for seq read/write goes up to
~400K/300K. If I disable the merge in guest the IOPS drops to ~17K/24K
for seq read/write (which is similar to the result I posted yesterday,
with merge disalbed). Could you please also share the numbers for rand
read and write in your setup?

Thanks for running the test. Please send your rand read/write fio
jobfile so I can run the exact same test.

BTW I was running the default F18 (host) and RHEL 6.3 (guest) I/O
schedulers in my test yesterday.

Sure, this is the script I used to run the test in guest.

Thanks, will give this a try!

Stefan

Re: [Qemu-devel] [PATCH v3 09/12] iov: add iov_get_ptr() to reference vector data

Il 22/11/2012 12:58, Stefan Hajnoczi ha scritto:
 Then we can pass the iovec[] array straight to
 the Linux AIO functions.
 
 Since I knew the inhdr is only 1 byte I decided against doing something
 like dynamically allocating/freeing a QEMUIOVector which could handle
 spanning iovecs.

Long-term we want anyway a QEMUIOVector, to pass it to block functions.
 Can you introduce qemu_iovec_concat_iov that takes an iov/niov pair
instead of the source QEMUIOVector?  That should do quite nicely.

Paolo

Re: [Qemu-devel] [PATCH 0/7] virtio: virtio-blk data plane

On Wed, Nov 21, 2012 at 01:22:22PM +0800, Asias He wrote:
 On 11/20/2012 08:21 PM, Stefan Hajnoczi wrote:
  On Tue, Nov 20, 2012 at 10:02 AM, Asias He as...@redhat.com wrote:
  Hello Stefan,
 
  On 11/15/2012 11:18 PM, Stefan Hajnoczi wrote:
  This series adds the -device virtio-blk-pci,x-data-plane=on property that
  enables a high performance I/O codepath.  A dedicated thread is used to 
  process
  virtio-blk requests outside the global mutex and without going through 
  the QEMU
  block layer.
 
  Khoa Huynh k...@us.ibm.com reported an increase from 140,000 IOPS to 
  600,000
  IOPS for a single VM using virtio-blk-data-plane in July:
 
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/94580
 
  The virtio-blk-data-plane approach was originally presented at Linux 
  Plumbers
  Conference 2010.  The following slides contain a brief overview:
 

  http://linuxplumbersconf.org/2010/ocw/system/presentations/651/original/Optimizing_the_QEMU_Storage_Stack.pdf
 
  The basic approach is:
  1. Each virtio-blk device has a thread dedicated to handling ioeventfd
 signalling when the guest kicks the virtqueue.
  2. Requests are processed without going through the QEMU block layer using
 Linux AIO directly.
  3. Completion interrupts are injected via irqfd from the dedicated thread.
 
  To try it out:
 
qemu -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=...
 -device virtio-blk-pci,drive=drive0,scsi=off,x-data-plane=on
 
 
  Is this the latest dataplane bits:
  (git://github.com/stefanha/qemu.git virtio-blk-data-plane)
 
  commit 7872075c24fa01c925d4f41faa9d04ce69bf5328
  Author: Stefan Hajnoczi stefa...@redhat.com
  Date:   Wed Nov 14 15:45:38 2012 +0100
 
  virtio-blk: add x-data-plane=on|off performance feature
 
 
  With this commit on a ramdisk based box, I am seeing about 10K IOPS with
  x-data-plane on and 90K IOPS with x-data-plane off.
 
  Any ideas?
 
  Command line I used:
 
  IMG=/dev/ram0
  x86_64-softmmu/qemu-system-x86_64 \
  -drive file=/root/img/sid.img,if=ide \
  -drive file=${IMG},if=none,cache=none,aio=native,id=disk1 -device
  virtio-blk-pci,x-data-plane=off,drive=disk1,scsi=off \
  -kernel $KERNEL -append root=/dev/sdb1 console=tty0 \
  -L /tmp/qemu-dataplane/share/qemu/ -nographic -vnc :0 -enable-kvm -m
  2048 -smp 4 -cpu qemu64,+x2apic -M pc
  
  Was just about to send out the latest patch series which addresses
  review comments, so I have tested the latest code
  (61b70fef489ce51ecd18d69afb9622c110b9315c).
  
  I was unable to reproduce a ramdisk performance regression on Linux
  3.6.6-3.fc18.x86_64 with Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz with
  8 GB RAM.
 
 I am using the latest upstream kernel.
 
  The ramdisk is 4 GB and I used your QEMU command-line with a RHEL 6.3 guest.
  
  Summary results:
  x-data-plane-on: iops=132856 aggrb=1039.1MB/s
  x-data-plane-off: iops=126236 aggrb=988.40MB/s
  
  virtio-blk-data-plane is ~5% faster in this benchmark.
  
  fio jobfile:
  [global]
  filename=/dev/vda
  blocksize=8k
  ioengine=libaio
  direct=1
  iodepth=8
  runtime=120
  time_based=1
  
  [reads]
  readwrite=randread
  numjobs=4
  
  Perf top (data-plane-on):
3.71%  [kvm]   [k] kvm_arch_vcpu_ioctl_run
3.27%  [kernel][k] memset--- ramdisk
2.98%  [kernel][k] do_blockdev_direct_IO
2.82%  [kvm_intel] [k] vmx_vcpu_run
2.66%  [kernel][k] _raw_spin_lock_irqsave
2.06%  [kernel][k] put_compound_page
2.06%  [kernel][k] __get_page_tail
1.83%  [i915]  [k] __gen6_gt_force_wake_mt_get
1.75%  [kernel][k] _raw_spin_unlock_irqrestore
1.33%  qemu-system-x86_64  [.] vring_pop --- virtio-blk-data-plane
1.19%  [kernel][k] compound_unlock_irqrestore
1.13%  [kernel][k] gup_huge_pmd
1.11%  [kernel][k] __audit_syscall_exit
1.07%  [kernel][k] put_page_testzero
1.01%  [kernel][k] fget
1.01%  [kernel][k] do_io_submit
  
  Since the ramdisk (memset and page-related functions) is so prominent
  in perf top, I also tried a 1-job 8k dd sequential write test on a
  Samsung 830 Series SSD where virtio-blk-data-plane was 9% faster than
  virtio-blk.  Optimizing against ramdisk isn't a good idea IMO because
  it acts very differently from real hardware where the driver relies on
  mmio, DMA, and interrupts (vs synchronous memcpy/memset).
 
 For the memset in ramdisk, you can simply patch drivers/block/brd.c to
 do nop instead of memset for testing.
 
 Yes, if you have fast SSD device （sometimes you need multiple which I
 do not have）, it makes more sense to test on real hardware. However,
 ramdisk test is still useful. It gives rough performance numbers. If A
 and B are both tested against ramdisk. The difference between A and B
 are still useful.

Optimizing the difference between A and B on ramdisk is only guaranteed
to

Re: [Qemu-devel] [PATCH v3 09/12] iov: add iov_get_ptr() to reference vector data

On Thu, Nov 22, 2012 at 12:58:23PM +0100, Stefan Hajnoczi wrote:
 On Thu, Nov 22, 2012 at 10:34:13AM +0100, Paolo Bonzini wrote:
  Il 21/11/2012 19:32, Stefan Hajnoczi ha scritto:
   The iov_get_ptr() data returns a pointer to contiguous data within a
   vector.  This allows the caller to manipulate data inside the vector
   without copying in/out using iov_from_buf()/iov_to_buf() when we know
   that data is contiguous within an iovec element.
  
  This works for you because you have a single byte to write.  It would
  not work for the SG_IO inhdr, which would need iov_to_buf().
 
 Guilty as charged, your honor. :)
 
 Let me give a few more details about the motivation for this function:
 
 In virtio-blk-data-plane we have an iovec[] array.  In the read/write
 code path we discard the inhdr/outhdr so just the data buffers are left
 in the iovec[] array.  Then we can pass the iovec[] array straight to
 the Linux AIO functions.
 
 Because we're using the iovec[] array for data buffers and we're not
 allowed to make assumptions about iovec layout, we cannot use
 iov_to_buf()/iov_from_buf() at the end to fill in the status field - the
 inhdr has already been discarded from the iovec[] array.

How about using iov_copy?

We have exactly this problem in virtio net if we run
on host that does not support mergeable buffer header,
and we solve it by copying out the iovec.

 Since I knew the inhdr is only 1 byte I decided against doing something
 like dynamically allocating/freeing a QEMUIOVector which could handle
 spanning iovecs.
 
 That said, I think this function is okay as-is because it works fine for
 non-virtio cases where the caller *knows* the iovec[] layout.  As a
 utility function it stands on its own.
 

My concern is these APIs are unsafe to use: you get back a pointer and
you must verify length is not too big before access.  Since the iov can
be manipulated by guest this looks like a good place to put extra
safeguards.

  What about the following alternative API:
  
  void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
ssize_t offset, size_t *bytes);
  
  which would place the number of valid bytes (i.e. the length of the
  remainder of the iovec entry) in *bytes?
  
  Also, I think that offset == iov_size(iov, iov_cnt) should be
  acceptable, and it would be the only case in which *bytes == 0.
 
 Hmm...this may be more useful than the version I proposed since the
 caller can also use it to find out how many bytes are contiguous.
 
 Michael: Any concerns if I update the code to reflect Paolo's
 suggestion?
 
 Stefan

I'd prefer something that actually works for all cases
rather than making callers check and handle failure,
or reason why it can't fail.

-- 
MST

Re: [Qemu-devel] [PATCH] Legacy qemu-kvm options have no argument

On Wed, 21 Nov 2012 13:12:53 +0100
Jan Kiszka jan.kis...@siemens.com wrote:

 On 2012-11-20 16:14, Markus Armbruster wrote:
  Cc'ing Marcelo  Jan to speed up patch application.
  
  Bruce Rogers brog...@suse.com writes:
  
  The options no-kvm, no-kvm-pit, no-kvm-pit-reinjection, and no-kvm-irqchip
  should be marked as having no argument.
 
  Signed-off-by: Bruce Rogers brog...@suse.com
  ---
   qemu-options.hx |8 
   1 files changed, 4 insertions(+), 4 deletions(-)
 
  diff --git a/qemu-options.hx b/qemu-options.hx
  index 9bb29d3..fbcf079 100644
  --- a/qemu-options.hx
  +++ b/qemu-options.hx
  @@ -2918,17 +2918,17 @@ Enable FIPS 140-2 compliance mode.
   ETEXI
   
   HXCOMM Deprecated by -machine accel=tcg property
  -DEF(no-kvm, HAS_ARG, QEMU_OPTION_no_kvm, , QEMU_ARCH_I386)
  +DEF(no-kvm, 0, QEMU_OPTION_no_kvm, , QEMU_ARCH_I386)
   
   HXCOMM Deprecated by kvm-pit driver properties
  -DEF(no-kvm-pit-reinjection, HAS_ARG, QEMU_OPTION_no_kvm_pit_reinjection,
  +DEF(no-kvm-pit-reinjection, 0, QEMU_OPTION_no_kvm_pit_reinjection,
   , QEMU_ARCH_I386)
   
   HXCOMM Deprecated (ignored)
  -DEF(no-kvm-pit, HAS_ARG, QEMU_OPTION_no_kvm_pit, , QEMU_ARCH_I386)
  +DEF(no-kvm-pit, 0, QEMU_OPTION_no_kvm_pit, , QEMU_ARCH_I386)
   
   HXCOMM Deprecated by -machine kernel_irqchip=on|off property
  -DEF(no-kvm-irqchip, HAS_ARG, QEMU_OPTION_no_kvm_irqchip, , 
  QEMU_ARCH_I386)
  +DEF(no-kvm-irqchip, 0, QEMU_OPTION_no_kvm_irqchip, , QEMU_ARCH_I386)
   
   HXCOMM Deprecated (ignored)
   DEF(tdf, 0, QEMU_OPTION_tdf,, QEMU_ARCH_ALL)
 
 Reviewed-by: Jan Kiszka jan.kis...@siemens.com

Candidate for -trivial.

Re: [Qemu-devel] [PATCH 03/14] pixman: windup in configure makefiles

2012-11-22 Thread Stefano Stabellini

I have just noticed that we haven't actually written anywhere what's the
oldest version of pixman we support.

Do you know which one it is?


On Wed, 17 Oct 2012, Gerd Hoffmann wrote:
 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 ---
  Makefile  |9 +
  configure |   38 ++
  2 files changed, 47 insertions(+), 0 deletions(-)
 
 diff --git a/Makefile b/Makefile
 index a9c22bf..5699101 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -105,6 +105,15 @@ endif
  
  subdir-libcacard: $(oslib-obj-y) $(trace-obj-y) qemu-timer-common.o
  
 +subdir-pixman: pixman/Makefile
 + $(call quiet-command,$(MAKE) $(SUBDIR_MAKEFLAGS) -C pixman V=$(V) 
 all,)
 +
 +pixman/Makefile: $(SRC_PATH)/pixman/configure
 + (cd pixman; $(SRC_PATH)/pixman/configure --disable-shared 
 --enable-static)
 +
 +$(SRC_PATH)/pixman/configure:
 + (cd $(SRC_PATH)/pixman; autoreconf -v --install)
 +
  $(filter %-softmmu,$(SUBDIR_RULES)): $(universal-obj-y) $(trace-obj-y) 
 $(common-obj-y) $(extra-obj-y) subdir-libdis
  
  $(filter %-user,$(SUBDIR_RULES)): $(universal-obj-y) $(trace-obj-y) 
 subdir-libdis-user subdir-libuser
 diff --git a/configure b/configure
 index 353d788..4b916aa 100755
 --- a/configure
 +++ b/configure
 @@ -147,6 +147,7 @@ curses=
  docs=
  fdt=
  nptl=
 +pixman=
  sdl=
  virtfs=
  vnc=yes
 @@ -642,6 +643,10 @@ for opt do
  # configure to be used by RPM and similar macros that set
  # lots of directory switches by default.
;;
 +  --pixman-system) pixman=system
 +  ;;
 +  --pixman-internal) pixman=internal
 +  ;;
--disable-sdl) sdl=no
;;
--enable-sdl) sdl=yes
 @@ -2117,6 +2122,34 @@ else
  fi
  
  ##
 +# pixman support probe
 +
 +if test $pixman = ; then
 +  if $pkg_config pixman-1  /dev/null 21; then
 +pixman=system
 +  else
 +pixman=internal
 +  fi
 +fi
 +if test $pixman = system; then
 +  pixman_cflags=`$pkg_config --cflags pixman-1 2/dev/null`
 +  pixman_libs=`$pkg_config --libs pixman-1 2/dev/null`
 +else
 +  if test ! -d ${source_path}/pixman/pixman; then
 +echo ERROR: pixman not present. Your options:
 +echo   (1) Prefered: Install the pixman devel package (any recent
 +echo   distro should have packages as Xorg needs pixman too).
 +echo   (2) Fetch the pixman submodule, using:
 +echo   git submodule update --init pixman
 +exit 1
 +  fi
 +  pixman_cflags=-I${source_path}/pixman/pixman
 +  pixman_libs=-Lpixman/pixman/.libs -lpixman-1
 +fi
 +QEMU_CFLAGS=$QEMU_CFLAGS $pixman_cflags
 +libs_softmmu=$libs_softmmu $pixman_libs
 +
 +##
  # libcap probe
  
  if test $cap != no ; then
 @@ -3147,6 +3180,7 @@ echo -Werror enabled   $werror
  if test $darwin = yes ; then
  echo Cocoa support $cocoa
  fi
 +echo pixman$pixman
  echo SDL support   $sdl
  echo curses support$curses
  echo curl support  $curl
 @@ -3909,6 +3943,9 @@ if test $target_softmmu = yes ; then
if test $smartcard_nss = yes ; then
  echo subdir-$target: subdir-libcacard  $config_host_mak
fi
 +  if test $pixman = internal ; then
 +echo subdir-$target: subdir-pixman  $config_host_mak
 +  fi
case $target_arch2 in
  i386|x86_64)
echo CONFIG_HAVE_CORE_DUMP=y  $config_target_mak
 @@ -4112,6 +4149,7 @@ DIRS=$DIRS pc-bios/optionrom pc-bios/spapr-rtas
  DIRS=$DIRS roms/seabios roms/vgabios
  DIRS=$DIRS qapi-generated
  DIRS=$DIRS libcacard libcacard/libcacard libcacard/trace
 +DIRS=$DIRS pixman
  FILES=Makefile tests/tcg/Makefile qdict-test-data.txt
  FILES=$FILES tests/tcg/cris/Makefile tests/tcg/cris/.gdbinit
  FILES=$FILES tests/tcg/lm32/Makefile libcacard/Makefile
 -- 
 1.7.1

Re: [Qemu-devel] TCP based PCIE request forwarding

On Thu, Nov 22, 2012 at 11:21:58AM +0100, Paolo Bonzini wrote:
 Il 22/11/2012 09:19, Stefan Hajnoczi ha scritto:
   usage
   -
   PCIEFW devices are instanciated using the following QEMU options:
   -device \
pciefw,\
laddr=local_addr,\
lport=local_port,\
raddr=remote_addr,\
rport=remote_port
  Take a look at qemu_socket.h:socket_parse().  It should allow you to
  support TCP, UNIX domain sockets, and arbitrary file descriptors.
  
 
 Even better it could just be a chardev.  socket_parse() is only used by
 the (human) monitor interface.

The issue with chardev is that it's asynchronous.

In this case we cannot return from MemoryRegionOps-read() or
MemoryRegionOps-write() back to the event loop.

Stefan

[Qemu-devel] qmp problems with --enable-kvm

I use the following qmp commands (file cmds.txt) for testing:

- cmds.txt start
{execute:qmp_capabilities, arguments:{}}
{execute:migrate, arguments:{ uri:exec:cat /dev/null}}
{execute:query-migrate, arguments:{}}
- cmds.txt end

and the following command line to start qemu:

#  ./x86_64-softmmu/qemu-system-x86_64 -chardev 
socket,id=qmp,path=/tmp/test.qmp,server,nowait -mon chardev=qmp,mode=control  
--enable-kvm

I then execute the qmp commands with:

# cat cmds.txt | socat - /tmp/test.qmp 
{QMP: {version: {qemu: {micro: 90, minor: 2, major: 1}, package: 
}, capabilities: []}}
{return: {}}
{return: {}}
{timestamp: {seconds: 1353587785, microseconds: 134907}, event: STOP}
{return: {status: completed, downtime: 8, ram: {total: 143065088, 
remaining: 0, transferred: 772035, duplicate: 34755, dirty-pages-rate: 
0, normal-bytes: 737280, normal: 180}}}

so far so good.

But if I execute that again I get an error:

# cat cmds.txt | socat - /tmp/test.qmp 
{error: {class: CommandNotFound, desc: The command qmp_capabilities 
has not been found}}
{return: {}}
{return: {expected-downtime: 0, status: active, total-time: 33, 
ram: {total: 143065088, remaining: 8937472, transferred: 495481, 
duplicate: 67388, dirty-pages-rate: 0, normal-bytes: 1200128, normal: 
293}}}


The interesting thing is that it works perfectly without --enable-kvm. 

Can somebody reproduce that?

Re: [Qemu-devel] [PATCH 1/5] RFC: Efficient VM backup for qemu (v1)

On Thu, Nov 22, 2012 at 11:26:21AM +, Dietmar Maurer wrote:
  Did you look at the VMDK Stream-Optimized Compressed subformat?
  
  http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?
  src=vmdk
  
  It is a stream of compressed grains (data).  They are out-of-order and 
  each
  grain comes with the virtual disk lba where the data should be visible to 
  the
  guest.
  
 
 What kind of license is applied to that specification?

The document I linked came straight from Google Search and you don't
need to agree to anything to view it.  The document doesn't seem to
impose restrictions.  QEMU has supported the VMDK format and so have
other open source tools for a number of years.

For anything more specific you could search VMware's website and/or
check with a lawyer.

Stefan

Re: [Qemu-devel] [PATCH 1/5] RFC: Efficient VM backup for qemu (v1)

 -Original Message-
 From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
 Sent: Donnerstag, 22. November 2012 13:45
 To: Dietmar Maurer
 Cc: qemu-devel@nongnu.org; kw...@redhat.com
 Subject: Re: [Qemu-devel] [PATCH 1/5] RFC: Efficient VM backup for qemu
 (v1)

 On Thu, Nov 22, 2012 at 11:26:21AM +, Dietmar Maurer wrote:
   Did you look at the VMDK Stream-Optimized Compressed subformat?

 http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?
   src=vmdk

   It is a stream of compressed grains (data).  They are out-of-order
   and each grain comes with the virtual disk lba where the data should
   be visible to the guest.

  What kind of license is applied to that specification?

 The document I linked came straight from Google Search and you don't need
 to agree to anything to view it.  The document doesn't seem to impose
 restrictions.  QEMU has supported the VMDK format and so have other open
 source tools for a number of years.

 For anything more specific you could search VMware's website and/or check
 with a lawyer.

The documents says: VMware products are covered by one or more patents listed 
at http://www.vmware.com/go/patents

I simply do not have the time to check all those things, which make that format 
unusable for me.

Anyways, thanks for the link.

Re: [Qemu-devel] [PATCHv3] Support default block interfaces per QEMUMachine

On Tue, Nov 20, 2012 at 03:30:34PM +0100, Christian Borntraeger wrote:
 There are QEMUMachines that have neither IF_IDE nor IF_SCSI as a
 default/standard interface to their block devices / drives. Therefore,
 this patch introduces a new field default_block_type per QEMUMachine
 struct. The prior use_scsi field becomes thereby obsolete and is
 replaced through .default_block_type = IF_SCSI.
 
 This patch also changes the default for s390x to IF_VIRTIO and
 removes an early hack that converts IF_IDE drives.
 Other parties have already claimed interest (e.g. IF_SD for exynos)
 
 To create a sane default, for machines that dont specify a
 default_block_type, this patch makes IF_IDE = 0 and IF_NONE = 1.
 I checked all users of IF_NONE (blockdev.c and ww/device-hotplug.c)
 as well as IF_IDE and it seems that it is ok to change the defines -
 in other words, I found no obvious (to me) assumption in the code
 regarding IF_NONE==0. IF_NONE is only set if there is an
 explicit if=none. Without if=* the interface becomes IF_DEFAULT.
 
 I would suggest to have some additional care, e.g. by letting
 this patch sit some days in the block tree.
 
 Based on an initial patch from Einar Lueck elelu...@de.ibm.com
 
 Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
 CC: Igor Mitsyanko i.mitsya...@samsung.com
 CC: Markus Armbruster arm...@redhat.com
 CC: Kevin Wolf kw...@redhat.com
 ---
  blockdev.c  |4 ++--
  blockdev.h  |5 +++--
  hw/boards.h |3 ++-
  hw/device-hotplug.c |2 +-
  hw/highbank.c   |2 +-
  hw/leon3.c  |1 -
  hw/mips_jazz.c  |4 ++--
  hw/pc_sysfw.c   |2 +-
  hw/puv3.c   |1 -
  hw/realview.c   |7 ---
  hw/s390-virtio.c|   17 ++---
  hw/spapr.c  |2 +-
  hw/sun4m.c  |   24 
  hw/versatilepb.c|4 ++--
  hw/vexpress.c   |4 ++--
  hw/xilinx_zynq.c|2 +-
  vl.c|   21 -
  17 files changed, 48 insertions(+), 57 deletions(-)

Thanks, applied to my block-next tree:
https://github.com/stefanha/qemu/commits/block-next

This will be part of QEMU 1.4.

Stefan

Re: [Qemu-devel] [PATCH 03/14] pixman: windup in configure makefiles

2012-11-22 Thread Gerd Hoffmann

On 11/22/12 13:34, Stefano Stabellini wrote:
 I have just noticed that we haven't actually written anywhere what's the
 oldest version of pixman we support.
 
 Do you know which one it is?

0.18.4 is known good.

cheers,
  Gerd

Re: [Qemu-devel] [PATCHv3] Support default block interfaces per QEMUMachine

2012-11-22 Thread Christian Borntraeger

On 22/11/12 13:02, Markus Armbruster wrote:

Thanks for the review, Stefan already applied the patch, though.
Is there anything that you really want to have a followup patch
besides this one?

 There's just one caller that passes IF_DEFAULT.  We could change it to
 pass machine-block_default_type instead, and drop parameter
 block_default_type.  Follow-up patch.

I will have a look on that.



Christian

Re: [Qemu-devel] qmp problems with --enable-kvm

On Thu, 22 Nov 2012 12:41:31 +
Dietmar Maurer diet...@proxmox.com wrote:

 I use the following qmp commands (file cmds.txt) for testing:
 
 - cmds.txt start
 {execute:qmp_capabilities, arguments:{}}
 {execute:migrate, arguments:{ uri:exec:cat /dev/null}}
 {execute:query-migrate, arguments:{}}
 - cmds.txt end
 
 and the following command line to start qemu:
 
 #  ./x86_64-softmmu/qemu-system-x86_64 -chardev 
 socket,id=qmp,path=/tmp/test.qmp,server,nowait -mon chardev=qmp,mode=control  
 --enable-kvm
 
 I then execute the qmp commands with:
 
 # cat cmds.txt | socat - /tmp/test.qmp 
 {QMP: {version: {qemu: {micro: 90, minor: 2, major: 1}, 
 package: }, capabilities: []}}
 {return: {}}
 {return: {}}
 {timestamp: {seconds: 1353587785, microseconds: 134907}, event: 
 STOP}
 {return: {status: completed, downtime: 8, ram: {total: 143065088, 
 remaining: 0, transferred: 772035, duplicate: 34755, 
 dirty-pages-rate: 0, normal-bytes: 737280, normal: 180}}}
 
 so far so good.
 
 But if I execute that again I get an error:
 
 # cat cmds.txt | socat - /tmp/test.qmp 
 {error: {class: CommandNotFound, desc: The command qmp_capabilities 
 has not been found}}
 {return: {}}
 {return: {expected-downtime: 0, status: active, total-time: 33, 
 ram: {total: 143065088, remaining: 8937472, transferred: 495481, 
 duplicate: 67388, dirty-pages-rate: 0, normal-bytes: 1200128, normal: 
 293}}}
 
 
 The interesting thing is that it works perfectly without --enable-kvm. 

Can you please try the following patch?

 https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg00174.html

Re: [Qemu-devel] [PATCH v2 39/39] raw-win32: implement native asynchronous I/O

2012-11-22 Thread Jan Kiszka

On 2012-11-21 14:38, Jan Kiszka wrote:
 On 2012-11-21 14:33, Paolo Bonzini wrote:
 Il 21/11/2012 14:27, Jan Kiszka ha scritto:
 Are there any patches flying around that are supposed to resolve this?

 No.  Is this with aio=native or aio=threads?
 aio=default - whatever that is on win32.

 That's aio=threads, just like POSIX.  I'm curious (but not too
 optimistic) whether aio=native fixes it.
 
 Just tried, and it actually makes no difference.
 

 Anyhow, do you have a reproducer that I can download?
 
 Unfortunately, not at this point. Have to check, maybe I can trip the
 image to the bare Linux guest.

Maybe this helps:

=0 0xf77c242e __kernel_vsyscall+0xe() in [vdso].so (0x0519df88)
  1 0xf761f10b __libc_read+0x4a() in libpthread.so.0 (0x0519df88)
  2 0x7bc78c98 in ntdll (+0x68c97) (0x0519df88)
  3 0x7bc7b0a3 in ntdll (+0x6b0a2) (0x0519e1b8)
  4 0x7bc7b195 NtWaitForMultipleObjects+0x54() in ntdll (0x0519e1e8)
  5 0x7b86de9f WaitForMultipleObjectsEx+0xee() in kernel32 (0x0519e338)
  6 0x7b86df1a WaitForMultipleObjects+0x39() in kernel32 (0x0519e368)
  7 0x0040301b aio_poll+0x16a(ctx=0x157610, blocking=is not available) 
[/data/qemu/aio-win32.c:178] in qemu-system-arm (0x0001)
  8 0x004feec3 qemu_aio_wait+0x22() [/data/qemu/main-loop.c:442] in 
qemu-system-arm (0x0105bdec)
  9 0x00417166 bdrv_rw_co+0xa5(bs=0x159648, sector_num=internal error, 
buf=¹¸ÿ¦

  , nb_sectors=0x4, is_write=true) [/data/qemu/block.c:1997] in 
qemu-system-arm (0x0105bdec)
  10 0x00495544 in qemu-system-arm (+0x95543) (0x0105bdec)
  11 0x0049592a pflash_write+0x3b9(pfl=0xd6f16e0, offset=internal error, 
value=0x98f7fb1d, width=0x4, be=0) [/data/qemu/hw/pflash_cfi01.c:405] in 
qemu-system-arm (0x0105bdec)
  12 0x00618a7e io_mem_write+0xed(mr=0xd6f2a48, addr=0x105bdec, val=0x98f7fb1d, 
size=0x4) [/data/qemu/memory.c:911] in qemu-system-arm (0x0004)
  13 0x0064088c helper_stl_mmu+0x2fb(env=0x15e268, addr=0xca05bdec, 
val=0x98f7fb1d, mmu_idx=0) [/data/qemu/softmmu_template.h:231] in 
qemu-system-arm (0x371e)
  14 0x02136637 (0x0015e268)

It's the VCPU thread stuck in aio_poll, holding the BQL, blocking the
iothread this way as well. Is there some race that prevents we receive
the proper IO completion signal? Any states I should check?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCHv3] Support default block interfaces per QEMUMachine

Markus Armbruster arm...@redhat.com writes:

 Christian Borntraeger borntrae...@de.ibm.com writes:

 There are QEMUMachines that have neither IF_IDE nor IF_SCSI as a
 default/standard interface to their block devices / drives. Therefore,
 this patch introduces a new field default_block_type per QEMUMachine
 struct. The prior use_scsi field becomes thereby obsolete and is
 replaced through .default_block_type = IF_SCSI.

 This patch also changes the default for s390x to IF_VIRTIO and
 removes an early hack that converts IF_IDE drives.

 Recommend to do this in a separate patch.

 Other parties have already claimed interest (e.g. IF_SD for exynos)

 To create a sane default, for machines that dont specify a
 default_block_type, this patch makes IF_IDE = 0 and IF_NONE = 1.
 I checked all users of IF_NONE (blockdev.c and ww/device-hotplug.c)

 Typo: hw/

 as well as IF_IDE and it seems that it is ok to change the defines -
 in other words, I found no obvious (to me) assumption in the code
 regarding IF_NONE==0.

 Checking for IF_NONE and IF_IDE is not enough; we have to check that
 BlockInterfaceType values are never used in a way that's broken by this
 change, such as C89 initializers and implicit comparisons to zero.

 I just did that, and couldn't find anything.

Almost missed this one: it changes this error message of monitor command
drive_add:

(qemu) drive_add 0 if=ide
Can't hot-add drive to type 1

to

Can't hot-add drive to type 0

I don't think we care.  Yes, the error message sucks.

[...]

Re: [Qemu-devel] qmp problems with --enable-kvm

  The interesting thing is that it works perfectly without --enable-kvm.
 
 Can you please try the following patch?
 
  https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg00174.html

I am unable to apply that patch?

patching file vl.c
Hunk #1 FAILED at 3551.
Hunk #2 succeeded at 3729 with fuzz 2 (offset 138 lines).

[Qemu-devel] [PATCH for-1.3] usb: tag usb host adapters as not hotpluggable.

2012-11-22 Thread Gerd Hoffmann

Hotplugging them simply doesn't work, so tag them accordingly to
avoid users trying and then crashing qemu.

For xhci there is nothing fundamental which prevents hotplug from
working, we'll only need a exit() function which cleans up
everything properly.  That isn't for 1.3 though.

For ehci+uhci+ohci hotplug can't be supported until qemu gains the
capability to hotplug multifunction pci devices.

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/usb/hcd-ehci-pci.c |1 +
 hw/usb/hcd-ohci.c |1 +
 hw/usb/hcd-uhci.c |1 +
 hw/usb/hcd-xhci.c |1 +
 4 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/hw/usb/hcd-ehci-pci.c b/hw/usb/hcd-ehci-pci.c
index 5887eab..41dbb53 100644
--- a/hw/usb/hcd-ehci-pci.c
+++ b/hw/usb/hcd-ehci-pci.c
@@ -123,6 +123,7 @@ static void ehci_class_init(ObjectClass *klass, void *data)
 k-revision = i-revision;
 k-class_id = PCI_CLASS_SERIAL_USB;
 k-config_write = usb_ehci_pci_write_config;
+k-no_hotplug = 1;
 dc-vmsd = vmstate_ehci_pci;
 dc-props = ehci_pci_properties;
 }
diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index 64de906..e16a2ec 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -1882,6 +1882,7 @@ static void ohci_pci_class_init(ObjectClass *klass, void 
*data)
 k-vendor_id = PCI_VENDOR_ID_APPLE;
 k-device_id = PCI_DEVICE_ID_APPLE_IPID_USB;
 k-class_id = PCI_CLASS_SERIAL_USB;
+k-no_hotplug = 1;
 dc-desc = Apple USB Controller;
 dc-props = ohci_pci_properties;
 }
diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index 8e47803..d053791 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -1327,6 +1327,7 @@ static void uhci_class_init(ObjectClass *klass, void 
*data)
 k-device_id = info-device_id;
 k-revision  = info-revision;
 k-class_id  = PCI_CLASS_SERIAL_USB;
+k-no_hotplug = 1;
 dc-vmsd = vmstate_uhci;
 dc-props = uhci_properties;
 u-info = *info;
diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 8ef4b07..efb509e 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -3167,6 +3167,7 @@ static void xhci_class_init(ObjectClass *klass, void 
*data)
 k-class_id = PCI_CLASS_SERIAL_USB;
 k-revision = 0x03;
 k-is_express   = 1;
+k-no_hotplug   = 1;
 }
 
 static TypeInfo xhci_info = {
-- 
1.7.1

Re: [Qemu-devel] [PATCH/RFC] block: Ensure that block size constraints are considered

2012-11-22 Thread Christian Borntraeger

On 21/11/12 17:03, Paolo Bonzini wrote:
 Il 21/11/2012 10:15, Kevin Wolf ha scritto:
 +if ((bs-open_flags  BDRV_O_NOCACHE)) {
 +bs-file-buffer_alignment = align;
 +}
 Any reason to restrict this to BDRV_O_NOCACHE?

 There have been patches to change the BDRV_O_NOCACHE flag from the
 monitor, in which case bdrv_set_buffer_alignment() wouldn't be called
 anew and O_DIRECT requests start to fail again.

 
 bdrv_set_buffer_alignment() is completely broken.  It should set host
 alignment, but in fact it is passed the guest alignment.
 
 In practice, we only support logical_block_size matching the host's or
 bigger (which is unsafe due to torn writes, but works).


For other reasons (partition table format) we want to have host block
size == guest block size on s390 anyway - so it would not really matter for
us.
But I certainly agree that it makes more sense to use the host block size
for the alignment checks.



 So I suggest that we just look at writes outside the device models, and
 fix them to always read a multiple of 4k.

Wouldnt that cause performance regressions for block devices with 512 byte 
block size, because we read more than necessary. Wouldnt that also require
read/update/write combinations for valid 512 byte writes?

Christian

[Qemu-devel] drive_add 0 if=scsi crashes

Watch this:

(qemu) drive_add 0 if=scsi
Segmentation fault (core dumped)

Broken in commit 0d936928 qdev: Convert busses to QEMU Object Model.
Before:

(qemu) drive_add 0 if=scsi
Device is not a SCSI adapter

Re: [Qemu-devel] [PATCHv3] Support default block interfaces per QEMUMachine

Christian Borntraeger borntrae...@de.ibm.com writes:

 On 22/11/12 13:02, Markus Armbruster wrote:

 Thanks for the review, Stefan already applied the patch, though.
 Is there anything that you really want to have a followup patch
 besides this one?

Really want is too strong, but improving the comment on the definition
of IF_IDE would be nice.

 There's just one caller that passes IF_DEFAULT.  We could change it to
 pass machine-block_default_type instead, and drop parameter
 block_default_type.  Follow-up patch.

 I will have a look on that.

Thanks!

Re: [Qemu-devel] Ubuntu/Debian Installer + Virtio-SCSI - Bad ram pointer

2012-11-22 Thread Peter Lieven



On 19.11.2012 18:20, Stefan Hajnoczi wrote:

On Thu, Nov 8, 2012 at 4:26 PM, Peter Lieven p...@dlhnet.de wrote:

Has anyone any other idea what the cause could be or where to start?


Hi Peter,
I suggested posting the source tree you are building.  Since you have
applied patches yourself no one else is able to follow along with the
gdb output or reproduce the issue accurately.


Sorry for the late reply, I used qemu git at 
e24dc9feb0d68142d54dc3c097f57588836d1338
and libiscsi git at 3b3036b9dae55f0c3eef9d75db89c7b78f637a12.

The cmdline:
qemu-system-x86_64 -enable-kvm -m 1024 -drive 
if=virtio,file=iscsi://172.21.200.56/iqn.2001-05.com.equallogic:0-8a0906-62ff4e007-e4a3c8908af50839-test-3000g/0
 -cdrom ubuntu-12.04.1-server-amd64.iso -vnc :1

The vm crashes with:

Bad ram pointer 0x7fd220008000

after the user settings and timezone config when loading the module
libdmraid1.0.0.rc16-udeb

I hope this helps to reproduce.

Peter



Stefan

[Qemu-devel] [PATCH] Clean up pci_drive_hot_add()'s use of BlockInterfaceType

pci_drive_hot_add() parameter type has the wrong type: int instead of
BlockInterfaceType.  It's actually redundant, so we can just drop it.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 hw/device-hotplug.c | 11 ---
 hw/pci-hotplug.c|  7 +++
 sysemu.h|  3 +--
 3 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/hw/device-hotplug.c b/hw/device-hotplug.c
index eec0fe3..6d9c080 100644
--- a/hw/device-hotplug.c
+++ b/hw/device-hotplug.c
@@ -49,18 +49,16 @@ DriveInfo *add_init_drive(const char *optstr)
 }
 
 #if !defined(TARGET_I386)
-int pci_drive_hot_add(Monitor *mon, const QDict *qdict,
-  DriveInfo *dinfo, int type)
+int pci_drive_hot_add(Monitor *mon, const QDict *qdict, DriveInfo *dinfo)
 {
 /* On non-x86 we don't do PCI hotplug */
-monitor_printf(mon, Can't hot-add drive to type %d\n, type);
+monitor_printf(mon, Can't hot-add drive to type %d\n, dinfo-type);
 return -1;
 }
 #endif
 
 void drive_hot_add(Monitor *mon, const QDict *qdict)
 {
-int type;
 DriveInfo *dinfo = NULL;
 const char *opts = qdict_get_str(qdict, opts);
 
@@ -72,14 +70,13 @@ void drive_hot_add(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, Parameter addr not supported\n);
 goto err;
 }
-type = dinfo-type;
 
-switch (type) {
+switch (dinfo-type) {
 case IF_NONE:
 monitor_printf(mon, OK\n);
 break;
 default:
-if (pci_drive_hot_add(mon, qdict, dinfo, type)) {
+if (pci_drive_hot_add(mon, qdict, dinfo)) {
 goto err;
 }
 }
diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c
index e7fb780..d2121ee 100644
--- a/hw/pci-hotplug.c
+++ b/hw/pci-hotplug.c
@@ -105,15 +105,14 @@ static int scsi_hot_add(Monitor *mon, DeviceState 
*adapter,
 return 0;
 }
 
-int pci_drive_hot_add(Monitor *mon, const QDict *qdict,
-  DriveInfo *dinfo, int type)
+int pci_drive_hot_add(Monitor *mon, const QDict *qdict, DriveInfo *dinfo)
 {
 int dom, pci_bus;
 unsigned slot;
 PCIDevice *dev;
 const char *pci_addr = qdict_get_str(qdict, pci_addr);
 
-switch (type) {
+switch (dinfo-type) {
 case IF_SCSI:
 if (pci_read_devaddr(mon, pci_addr, dom, pci_bus, slot)) {
 goto err;
@@ -129,7 +128,7 @@ int pci_drive_hot_add(Monitor *mon, const QDict *qdict,
 }
 break;
 default:
-monitor_printf(mon, Can't hot-add drive to type %d\n, type);
+monitor_printf(mon, Can't hot-add drive to type %d\n, dinfo-type);
 goto err;
 }
 
diff --git a/sysemu.h b/sysemu.h
index f5ac664..c1b370b 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -145,8 +145,7 @@ extern unsigned int nb_prom_envs;
 
 /* pci-hotplug */
 void pci_device_hot_add(Monitor *mon, const QDict *qdict);
-int pci_drive_hot_add(Monitor *mon, const QDict *qdict,
-  DriveInfo *dinfo, int type);
+int pci_drive_hot_add(Monitor *mon, const QDict *qdict, DriveInfo *dinfo);
 void do_pci_device_hot_remove(Monitor *mon, const QDict *qdict);
 
 /* generic hotplug */
-- 
1.7.11.7

Re: [Qemu-devel] [PATCH] Clean up pci_drive_hot_add()'s use of BlockInterfaceType

2012-11-22 Thread Alexander Graf


On 22.11.2012, at 15:16, Markus Armbruster wrote:

 pci_drive_hot_add() parameter type has the wrong type: int instead of
 BlockInterfaceType.  It's actually redundant, so we can just drop it.
 
 Signed-off-by: Markus Armbruster arm...@redhat.com
 ---
 hw/device-hotplug.c | 11 ---
 hw/pci-hotplug.c|  7 +++
 sysemu.h|  3 +--
 3 files changed, 8 insertions(+), 13 deletions(-)
 
 diff --git a/hw/device-hotplug.c b/hw/device-hotplug.c
 index eec0fe3..6d9c080 100644
 --- a/hw/device-hotplug.c
 +++ b/hw/device-hotplug.c
 @@ -49,18 +49,16 @@ DriveInfo *add_init_drive(const char *optstr)
 }
 
 #if !defined(TARGET_I386)
 -int pci_drive_hot_add(Monitor *mon, const QDict *qdict,
 -  DriveInfo *dinfo, int type)
 +int pci_drive_hot_add(Monitor *mon, const QDict *qdict, DriveInfo *dinfo)
 {
 /* On non-x86 we don't do PCI hotplug */
 -monitor_printf(mon, Can't hot-add drive to type %d\n, type);
 +monitor_printf(mon, Can't hot-add drive to type %d\n, dinfo-type);

Can't we expose names instead? I don't really want internal enum numbering be 
part of our external CLI interface :)


Alex

 return -1;
 }
 #endif
 
 void drive_hot_add(Monitor *mon, const QDict *qdict)
 {
 -int type;
 DriveInfo *dinfo = NULL;
 const char *opts = qdict_get_str(qdict, opts);
 
 @@ -72,14 +70,13 @@ void drive_hot_add(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, Parameter addr not supported\n);
 goto err;
 }
 -type = dinfo-type;
 
 -switch (type) {
 +switch (dinfo-type) {
 case IF_NONE:
 monitor_printf(mon, OK\n);
 break;
 default:
 -if (pci_drive_hot_add(mon, qdict, dinfo, type)) {
 +if (pci_drive_hot_add(mon, qdict, dinfo)) {
 goto err;
 }
 }
 diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c
 index e7fb780..d2121ee 100644
 --- a/hw/pci-hotplug.c
 +++ b/hw/pci-hotplug.c
 @@ -105,15 +105,14 @@ static int scsi_hot_add(Monitor *mon, DeviceState 
 *adapter,
 return 0;
 }
 
 -int pci_drive_hot_add(Monitor *mon, const QDict *qdict,
 -  DriveInfo *dinfo, int type)
 +int pci_drive_hot_add(Monitor *mon, const QDict *qdict, DriveInfo *dinfo)
 {
 int dom, pci_bus;
 unsigned slot;
 PCIDevice *dev;
 const char *pci_addr = qdict_get_str(qdict, pci_addr);
 
 -switch (type) {
 +switch (dinfo-type) {
 case IF_SCSI:
 if (pci_read_devaddr(mon, pci_addr, dom, pci_bus, slot)) {
 goto err;
 @@ -129,7 +128,7 @@ int pci_drive_hot_add(Monitor *mon, const QDict *qdict,
 }
 break;
 default:
 -monitor_printf(mon, Can't hot-add drive to type %d\n, type);
 +monitor_printf(mon, Can't hot-add drive to type %d\n, dinfo-type);
 goto err;
 }
 
 diff --git a/sysemu.h b/sysemu.h
 index f5ac664..c1b370b 100644
 --- a/sysemu.h
 +++ b/sysemu.h
 @@ -145,8 +145,7 @@ extern unsigned int nb_prom_envs;
 
 /* pci-hotplug */
 void pci_device_hot_add(Monitor *mon, const QDict *qdict);
 -int pci_drive_hot_add(Monitor *mon, const QDict *qdict,
 -  DriveInfo *dinfo, int type);
 +int pci_drive_hot_add(Monitor *mon, const QDict *qdict, DriveInfo *dinfo);
 void do_pci_device_hot_remove(Monitor *mon, const QDict *qdict);
 
 /* generic hotplug */
 -- 
 1.7.11.7

[Qemu-devel] tap devices not receiving packets from a bridge

2012-11-22 Thread Peter Lieven


Hi,

is anyone aware of a problem with the linux network bridge that in very rare 
circumstances stops
a bridge from sending pakets to a tap device?

My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 
3.2.0-34.53
which is based on Linux 3.2.33.

I was not yet able to reproduce the issue, it happens in really rare cases. The 
symptom is that
the tap does not have any TX packets. RX is working fine. I see the packets 
coming in at
the physical interface on the host, but they are not forwarded to the tap 
interface.
The bridge itself has learnt the mac address of the vServer that is connected 
to the tap interface.
It does not help to toggle the bridge link status,  the tap interface status or 
the interface in the vServer.
It seems that problem occurs if a tap interface that has previously been used, 
but set to nonpersistent
is set persistent again and then is by chance assigned to the same vServer 
(=same mac address on same
bridge) again. Unfortunately it seems not to be reproducible.

Maybe this sounds familiar to someone?

Thank you,
Peter

[Qemu-devel] [RFC PATCH v2 0/3] Virtio-refactoring.

From: KONRAD Frederic fred.kon...@greensocs.com

Hi,

I made the changes you suggest in the last RFC. 

There are still two issues with the command line :

* When I use ./qemu* -device virtio-blk -device virtio-pci
  It is said that no virtio-bus are present.
* The virtio-blk is plugged in the last created virtio-bus if no bus=
  option is present. It's an issue as we can only plug one virtio-device.

The first problem is a more general issue as it is the case for the SCSI bus and
can be fixed later.

Changes v1 - v2:

* All the little fix you suggest ( License, Debug printf, naming convention,
  ...)
* Added get_virtio_device_id(), and remove the pci_id* from the VirtioBus
  structure.
* Added virtio_bus_reset().
* Added cast macros VIRTIO_BUS.
* Added virtio_bus_plug_device.
* Replaced the old-style bus-qbus by BUS() macro.

git available here :
git://git.greensocs.com/qemu_virtio.git virtio_refactoring_2
or with http :
http://git.greensocs.com/?p=qemu_virtio.git;a=shortlog; \
h=refs/heads/virtio_refactoring_2

Fred

KONRAD Frederic (3):
  virtio-bus : Introduce VirtioBus.
  virtio-pci : add a virtio-bus interface
  virtio-blk : add the virtio-blk device.

 hw/Makefile.objs |   1 +
 hw/virtio-blk.c  |  82 ++
 hw/virtio-blk.h  |  10 
 hw/virtio-bus.c  | 148 +
 hw/virtio-bus.h  |  58 +
 hw/virtio-pci.c  | 152 +++
 hw/virtio-pci.h  |   6 +++
 7 files changed, 457 insertions(+)
 create mode 100644 hw/virtio-bus.c
 create mode 100644 hw/virtio-bus.h

-- 
1.7.11.7

Re: [Qemu-devel] [PATCH] Clean up pci_drive_hot_add()'s use of BlockInterfaceType

Alexander Graf ag...@suse.de writes:

 On 22.11.2012, at 15:16, Markus Armbruster wrote:

 pci_drive_hot_add() parameter type has the wrong type: int instead of
 BlockInterfaceType.  It's actually redundant, so we can just drop it.
 
 Signed-off-by: Markus Armbruster arm...@redhat.com
 ---
 hw/device-hotplug.c | 11 ---
 hw/pci-hotplug.c|  7 +++
 sysemu.h|  3 +--
 3 files changed, 8 insertions(+), 13 deletions(-)
 
 diff --git a/hw/device-hotplug.c b/hw/device-hotplug.c
 index eec0fe3..6d9c080 100644
 --- a/hw/device-hotplug.c
 +++ b/hw/device-hotplug.c
 @@ -49,18 +49,16 @@ DriveInfo *add_init_drive(const char *optstr)
 }
 
 #if !defined(TARGET_I386)
 -int pci_drive_hot_add(Monitor *mon, const QDict *qdict,
 -  DriveInfo *dinfo, int type)
 +int pci_drive_hot_add(Monitor *mon, const QDict *qdict, DriveInfo *dinfo)
 {
 /* On non-x86 we don't do PCI hotplug */
 -monitor_printf(mon, Can't hot-add drive to type %d\n, type);
 +monitor_printf(mon, Can't hot-add drive to type %d\n, dinfo-type);

 Can't we expose names instead? I don't really want internal enum
 numbering be part of our external CLI interface :)

Fixing that would be nice, but it's outside this patch's scope.

[Qemu-devel] [RFC PATCH v2 3/3] virtio-blk : add the virtio-blk device.

From: KONRAD Frederic fred.kon...@greensocs.com

This patch just add the virtio-blk device which can connect on a Virtio-Bus. The
initialization fail if no free VirtioBus are present.

Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
---
 hw/virtio-blk.c | 82 +
 hw/virtio-blk.h | 10 +++
 2 files changed, 92 insertions(+)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index e25cc96..d8263c3 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -17,6 +17,8 @@
 #include hw/block-common.h
 #include blockdev.h
 #include virtio-blk.h
+#include virtio-bus.h
+#include hw/pci.h /* for PCI IDs */
 #include scsi-defs.h
 #ifdef __linux__
 # include scsi/sg.h
@@ -656,3 +658,83 @@ void virtio_blk_exit(VirtIODevice *vdev)
 blockdev_mark_auto_del(s-bs);
 virtio_cleanup(vdev);
 }
+
+static int virtio_blk_qdev_init(DeviceState *qdev)
+{
+VirtIOBLKState *s = VIRTIO_BLK(qdev);
+/*
+ * When the 'bus=' option is not set, the default bus is the last created.
+ * This is a problem because only one VirtIODevice can be plugged.
+ */
+VirtioBus *bus = VIRTIO_BUS(qdev_get_parent_bus(qdev));
+
+/* init the device. */
+VirtIODevice *vdev = virtio_blk_init(qdev, s-blk);
+if (!vdev) {
+return -1;
+}
+
+/* Plug the device */
+if (virtio_bus_plug_device(bus, vdev) != 0) {
+/* this can happen when the bus is not free */
+return -1;
+}
+
+return 0;
+}
+
+static int virtio_blk_qdev_exit(DeviceState *qdev)
+{
+VirtioBus *bus = DO_UPCAST(VirtioBus, qbus, qdev-parent_bus);
+virtio_bus_exit_cb(bus);
+virtio_blk_exit(bus-vdev);
+return 0;
+}
+
+static Property virtio_blk_properties[] = {
+DEFINE_BLOCK_PROPERTIES(VirtIOBLKState, blk.conf),
+DEFINE_BLOCK_CHS_PROPERTIES(VirtIOBLKState, blk.conf),
+DEFINE_PROP_STRING(serial, VirtIOBLKState, blk.serial),
+#ifdef __linux__
+DEFINE_PROP_BIT(scsi, VirtIOBLKState, blk.scsi, 0, true),
+#endif
+DEFINE_PROP_BIT(config-wce, VirtIOBLKState, blk.config_wce, 0, true),
+/*
+ * Theses one are PCI related.
+ * DEFINE_PROP_BIT(ioeventfd, VirtIOBLKState, flags,
+ * VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
+ * DEFINE_PROP_UINT32(vectors, VirtIOBLKState, nvectors, 2),
+ */
+DEFINE_VIRTIO_BLK_FEATURES(VirtIOBLKState, host_features),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_blk_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *k = DEVICE_CLASS(klass);
+k-bus_type = TYPE_VIRTIO_BUS;
+k-init = virtio_blk_qdev_init;
+k-exit = virtio_blk_qdev_exit;
+/*
+ * k-unplug = ?
+ */
+k-props = virtio_blk_properties;
+}
+
+static TypeInfo virtio_blk_info = {
+.name = virtio-blk,
+.parent = TYPE_DEVICE,
+.instance_size = sizeof(VirtIOBLKState),
+/*
+ * .abstract = true,
+ * .class_size = sizeof(?),
+ */
+.class_init = virtio_blk_class_init,
+};
+
+static void virtio_blk_register_types(void)
+{
+type_register_static(virtio_blk_info);
+}
+
+type_init(virtio_blk_register_types)
diff --git a/hw/virtio-blk.h b/hw/virtio-blk.h
index f0740d0..cb2ea8c 100644
--- a/hw/virtio-blk.h
+++ b/hw/virtio-blk.h
@@ -16,6 +16,7 @@
 
 #include virtio.h
 #include hw/block-common.h
+#include virtio-bus.h
 
 /* from Linux's linux/virtio_blk.h */
 
@@ -107,6 +108,15 @@ struct VirtIOBlkConf
 uint32_t config_wce;
 };
 
+typedef struct {
+DeviceState qdev;
+uint32_t host_features;
+VirtIOBlkConf blk;
+} VirtIOBLKState;
+
+#define TYPE_VIRTIO_BLK virtio-blk
+#define VIRTIO_BLK(obj) OBJECT_CHECK(VirtIOBLKState, (obj), TYPE_VIRTIO_BLK)
+
 #define DEFINE_VIRTIO_BLK_FEATURES(_state, _field) \
 DEFINE_VIRTIO_COMMON_FEATURES(_state, _field), \
 DEFINE_PROP_BIT(config-wce, _state, _field, VIRTIO_BLK_F_CONFIG_WCE, 
true)
-- 
1.7.11.7

Re: [Qemu-devel] [PATCH] Reorder alarm timer setup again

On Thu, 01 Nov 2012 17:25:24 +0100
Jan Kiszka jan.kis...@web.de wrote:

 From: Jan Kiszka jan.kis...@siemens.com
 
 ac4119c023 moved the alarm timer initialization to an earlier point but
 failed to consider that it depends on qemu_init_main_loop. Instead of
 moving the wrong things before os_daemonize, better push alarm timer
 right after qemu_init_main_loop and move the chardev initialization
 after this.

Jan, it seems that you have to rebase this patch.

Re: [Qemu-devel] qmp problems with --enable-kvm

On Thu, 22 Nov 2012 13:44:24 +
Dietmar Maurer diet...@proxmox.com wrote:

   The interesting thing is that it works perfectly without --enable-kvm.
  
  Can you please try the following patch?
  
   https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg00174.html
 
 I am unable to apply that patch?
 
 patching file vl.c
 Hunk #1 FAILED at 3551.
 Hunk #2 succeeded at 3729 with fuzz 2 (offset 138 lines).

Thanks for reporting, I've replied to the original thread and put
you in the CC.

Re: [Qemu-devel] [RFC PATCH v2 0/3] Virtio-refactoring.

On 22 November 2012 14:50,  fred.kon...@greensocs.com wrote:
 There are still two issues with the command line :

 * When I use ./qemu* -device virtio-blk -device virtio-pci
   It is said that no virtio-bus are present.

Does it work if you create the devices in the other order?

 * The virtio-blk is plugged in the last created virtio-bus if no bus=
   option is present. It's an issue as we can only plug one virtio-device.

Anthony said this might Just Work (ie that we should find the
first non-full bus rather than just the first one), and if it
doesn't work it should be fairly straightforward to make it work.

-- PMM

[Qemu-devel] [RFC PATCH v2 1/3] virtio-bus : Introduce VirtioBus.

From: KONRAD Frederic fred.kon...@greensocs.com

This patch create a new VirtioBus, which can be added to Virtio transports like
virtio-pci, virtio-mmio,...

One VirtIODevice can be connected to this device, like virtio-blk in the 3rd
patch.

The VirtioBus shares through a VirtioBusInfo structure :

* two callbacks with the transport : init_cb and exit_cb, which must be
  called by the VirtIODevice, after the initialization and before the
  destruction, to put the right PCI IDs and/or stop the event fd.

* a VirtIOBindings structure.

Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
---
 hw/Makefile.objs |   1 +
 hw/virtio-bus.c  | 148 +++
 hw/virtio-bus.h  |  58 ++
 3 files changed, 207 insertions(+)
 create mode 100644 hw/virtio-bus.c
 create mode 100644 hw/virtio-bus.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index ea46f81..bd14d1b 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -2,6 +2,7 @@ common-obj-y = usb/ ide/
 common-obj-y += loader.o
 common-obj-$(CONFIG_VIRTIO) += virtio-console.o
 common-obj-$(CONFIG_VIRTIO) += virtio-rng.o
+common-obj-$(CONFIG_VIRTIO) += virtio-bus.o
 common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 common-obj-y += fw_cfg.o
 common-obj-$(CONFIG_PCI) += pci.o pci_bridge.o pci_bridge_dev.o
diff --git a/hw/virtio-bus.c b/hw/virtio-bus.c
new file mode 100644
index 000..991b6f5
--- /dev/null
+++ b/hw/virtio-bus.c
@@ -0,0 +1,148 @@
+/*
+ * VirtioBus
+ *
+ *  Copyright (C) 2012 : GreenSocs Ltd
+ *  http://www.greensocs.com/ , email: i...@greensocs.com
+ *
+ *  Developed by :
+ *  Frederic Konrad   fred.kon...@greensocs.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see http://www.gnu.org/licenses/.
+ *
+ */
+
+#include hw.h
+#include qemu-error.h
+#include qdev.h
+#include virtio-bus.h
+#include virtio.h
+
+#define DEBUG_VIRTIO_BUS 1
+
+#define DPRINTF(fmt, ...) if (DEBUG_VIRTIO_BUS) {\
+printf(virtio_bus:  fmt , ## __VA_ARGS__); \
+  }
+
+static void virtio_bus_init_cb(VirtioBus *bus);
+static int virtio_bus_reset(BusState *qbus);
+
+static void virtio_bus_class_init(ObjectClass *klass, void *data)
+{
+BusClass *k = BUS_CLASS(klass);
+k-reset = virtio_bus_reset;
+}
+
+static TypeInfo virtio_bus_info = {
+.name = TYPE_VIRTIO_BUS,
+.parent = TYPE_BUS,
+.instance_size = sizeof(VirtioBus),
+.class_init = virtio_bus_class_init,
+};
+
+/* Reset the bus */
+static int virtio_bus_reset(BusState *qbus)
+{
+VirtioBus *bus = VIRTIO_BUS(qbus);
+if (bus-bus_in_use) {
+virtio_reset(bus-vdev);
+}
+return 1;
+}
+
+/* Plug the VirtIODevice */
+int virtio_bus_plug_device(VirtioBus *bus, VirtIODevice *vdev)
+{
+BusState *qbus = BUS(bus);
+/*
+ * This is a problem, when bus= option is not set, the last created
+ * virtio-bus is used. So it give the following error.
+ */
+DPRINTF(plug device into %s.\n, qbus-name);
+if (bus-bus_in_use) {
+error_report(%s in use.\n, qbus-name);
+return -1;
+}
+bus-bus_in_use = true;
+
+/* keep the VirtIODevice in the VirtioBus. */
+bus-vdev = vdev;
+
+/* call the transport callback. */
+virtio_bus_init_cb(bus);
+return 0;
+}
+
+/* Create a virtio bus.  */
+VirtioBus *virtio_bus_new(DeviceState *host, const VirtioBusInfo *info)
+{
+/*
+ * This is needed, as we want to have different names for each virtio-bus.
+ * If we don't do that, we can't add more than one VirtIODevice.
+ */
+static int next_virtio_bus;
+char *bus_name = g_strdup_printf(virtio-bus.%d, next_virtio_bus++);
+
+BusState *qbus = qbus_create(TYPE_VIRTIO_BUS, host, bus_name);
+VirtioBus *bus = VIRTIO_BUS(qbus);
+bus-info = info;
+qbus-allow_hotplug = 0;
+bus-bus_in_use = false;
+DPRINTF(%s bus created\n, bus_name);
+return bus;
+}
+
+/* Bind the VirtIODevice to the VirtioBus. */
+void virtio_bus_bind_device(VirtioBus *bus)
+{
+BusState *qbus = BUS(bus);
+assert(bus != NULL);
+assert(bus-vdev != NULL);
+virtio_bind_device(bus-vdev, (bus-info-virtio_bindings), qbus-parent);
+}
+
+/*
+ * Transport independent init.
+ * This must be called after VirtIODevice initialization.
+ */
+static void virtio_bus_init_cb(VirtioBus *bus)
+{
+BusState *qbus = BUS(bus);
+

Re: [Qemu-devel] [RFC PATCH v2 0/3] Virtio-refactoring.

2012-11-22 Thread KONRAD Frédéric


Did you get the full patchset ? The 01 and 02 seems to be lost on
the mailing list. :s.

On 22/11/2012 16:08, Peter Maydell wrote:

On 22 November 2012 14:50,  fred.kon...@greensocs.com wrote:

There are still two issues with the command line :

 * When I use ./qemu* -device virtio-blk -device virtio-pci
   It is said that no virtio-bus are present.

Does it work if you create the devices in the other order?

Yes the other order works.




 * The virtio-blk is plugged in the last created virtio-bus if no bus=
   option is present. It's an issue as we can only plug one virtio-device.

Anthony said this might Just Work (ie that we should find the
first non-full bus rather than just the first one), and if it
doesn't work it should be fairly straightforward to make it work.

Yes, maybe it is implemented in the QOM ? Because when I use
qdev_get_parent_bus(qdev) in virtio_blk_qdev_init, it gives me
the last created Virtiobus.

Fred.

Re: [Qemu-devel] [PATCH v2 39/39] raw-win32: implement native asynchronous I/O

Il 22/11/2012 14:34, Jan Kiszka ha scritto:
 
 =0 0xf77c242e __kernel_vsyscall+0xe() in [vdso].so (0x0519df88)
   1 0xf761f10b __libc_read+0x4a() in libpthread.so.0 (0x0519df88)
   2 0x7bc78c98 in ntdll (+0x68c97) (0x0519df88)
   3 0x7bc7b0a3 in ntdll (+0x6b0a2) (0x0519e1b8)
   4 0x7bc7b195 NtWaitForMultipleObjects+0x54() in ntdll (0x0519e1e8)
   5 0x7b86de9f WaitForMultipleObjectsEx+0xee() in kernel32 (0x0519e338)
   6 0x7b86df1a WaitForMultipleObjects+0x39() in kernel32 (0x0519e368)
   7 0x0040301b aio_poll+0x16a(ctx=0x157610, blocking=is not available) 
 [/data/qemu/aio-win32.c:178] in qemu-system-arm (0x0001)
   8 0x004feec3 qemu_aio_wait+0x22() [/data/qemu/main-loop.c:442] in 
 qemu-system-arm (0x0105bdec)
   9 0x00417166 bdrv_rw_co+0xa5(bs=0x159648, sector_num=internal error, 
 buf=¹¸ÿ¦
   
 , nb_sectors=0x4, is_write=true) [/data/qemu/block.c:1997] in 
 qemu-system-arm (0x0105bdec)
   10 0x00495544 in qemu-system-arm (+0x95543) (0x0105bdec)
   11 0x0049592a pflash_write+0x3b9(pfl=0xd6f16e0, offset=internal error, 
 value=0x98f7fb1d, width=0x4, be=0) [/data/qemu/hw/pflash_cfi01.c:405] in 
 qemu-system-arm (0x0105bdec)
   12 0x00618a7e io_mem_write+0xed(mr=0xd6f2a48, addr=0x105bdec, 
 val=0x98f7fb1d, size=0x4) [/data/qemu/memory.c:911] in qemu-system-arm 
 (0x0004)
   13 0x0064088c helper_stl_mmu+0x2fb(env=0x15e268, addr=0xca05bdec, 
 val=0x98f7fb1d, mmu_idx=0) [/data/qemu/softmmu_template.h:231] in 
 qemu-system-arm (0x371e)
   14 0x02136637 (0x0015e268)
 
 It's the VCPU thread stuck in aio_poll, holding the BQL, blocking the
 iothread this way as well. Is there some race that prevents we receive
 the proper IO completion signal? Any states I should check?

Ah, so it wasn't fixed by the TLS change?

Instead of tracing, can you try adding printfs in aio_worker
(block/raw-win32.c), and also in thread_pool_active and
event_notifier_ready (thread-pool.c)?

This patch can help gathering debug output, it will place it into a
console window.

diff --git a/vl.c b/vl.c
index c8e9c78..5398178 100644
--- a/vl.c
+++ b/vl.c
@@ -3531,6 +3531,10 @@ int main(int argc, char **argv, char **envp)
 }
 loc_set_none();

+AllocConsole();
+freopen(CONIN$, rb, stdin);
+freopen(CONOUT$, wb, stdout);
+freopen(CONOUT$, wb, stderr);
 if (qemu_init_main_loop()) {
 fprintf(stderr, qemu_init_main_loop failed\n);
 exit(1);

Paolo

[Qemu-devel] [RFC PATCH v2 2/3] virtio-pci : add a virtio-bus interface

From: KONRAD Frederic fred.kon...@greensocs.com

This patch add a VirtioBus in the VirtIOPCIProxy structure. It creates a new
device : virtio-pci which init the VirtioBus. Two callback are written :

* void virtio_pci_init_cb(DeviceState *dev) to initialize the PCI interface
  after the VirtIODevice init, it is approximately the same as
  virtio_init_pci.

* void virtio_pci_exit_cb(DeviceState *dev) to stop the ioevent before the
  VirtIODevice exit.

Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
---
 hw/virtio-pci.c | 152 
 hw/virtio-pci.h |   6 +++
 2 files changed, 158 insertions(+)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 71f4fb5..78aa5e8 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -22,6 +22,7 @@
 #include virtio-net.h
 #include virtio-serial.h
 #include virtio-scsi.h
+#include virtio-bus.h
 #include pci.h
 #include qemu-error.h
 #include msi.h
@@ -1117,15 +1118,166 @@ static TypeInfo virtio_scsi_info = {
 .instance_size = sizeof(VirtIOPCIProxy),
 .class_init= virtio_scsi_class_init,
 };
+/* The new virtio-pci device */
+
+void virtio_pci_init_cb(DeviceState *dev)
+{
+PCIDevice *d = DO_UPCAST(PCIDevice, qdev, dev);
+VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, d);
+uint8_t *config;
+uint32_t size;
+
+/* Put the PCI IDs */
+switch (get_virtio_device_id(proxy-bus)) {
+
+case VIRTIO_ID_BLOCK:
+pci_config_set_device_id(proxy-pci_dev.config,
+ PCI_DEVICE_ID_VIRTIO_BLOCK);
+pci_config_set_class(proxy-pci_dev.config, PCI_CLASS_STORAGE_SCSI);
+break;
+default:
+error_report(unknown device id\n);
+break;
+
+}
+
+/* virtio_init_pci code without bindings */
+
+/* This should disappear */
+proxy-vdev = proxy-bus-vdev;
+
+config = proxy-pci_dev.config;
+
+if (proxy-class_code) {
+pci_config_set_class(config, proxy-class_code);
+}
+pci_set_word(config + PCI_SUBSYSTEM_VENDOR_ID,
+ pci_get_word(config + PCI_VENDOR_ID));
+pci_set_word(config + PCI_SUBSYSTEM_ID, proxy-bus-vdev-device_id);
+config[PCI_INTERRUPT_PIN] = 1;
+
+if (proxy-bus-vdev-nvectors 
+msix_init_exclusive_bar(proxy-pci_dev, proxy-bus-vdev-nvectors,
+1)) {
+proxy-bus-vdev-nvectors = 0;
+}
+
+proxy-pci_dev.config_write = virtio_write_config;
+
+size = VIRTIO_PCI_REGION_SIZE(proxy-pci_dev)
+ + proxy-bus-vdev-config_len;
+if (size  (size-1)) {
+size = 1  qemu_fls(size);
+}
+
+memory_region_init_io(proxy-bar, virtio_pci_config_ops, proxy,
+  virtio-pci, size);
+pci_register_bar(proxy-pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
+ proxy-bar);
+
+if (!kvm_has_many_ioeventfds()) {
+proxy-flags = ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
+}
+
+/* Bind the VirtIODevice to the VirtioBus. */
+virtio_bus_bind_device(proxy-bus);
+
+proxy-host_features |= 0x1  VIRTIO_F_NOTIFY_ON_EMPTY;
+proxy-host_features |= 0x1  VIRTIO_F_BAD_FEATURE;
+/* Should be modified */
+proxy-host_features = proxy-bus-vdev-get_features(proxy-bus-vdev,
+ proxy-host_features);
+}
+
+void virtio_pci_exit_cb(DeviceState *dev)
+{
+PCIDevice *d = DO_UPCAST(PCIDevice, qdev, dev);
+VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, d);
+/* Put the PCI IDs */
+pci_config_set_device_id(proxy-pci_dev.config, 0x);
+pci_config_set_class(proxy-pci_dev.config, PCI_CLASS_OTHERS);
+
+virtio_pci_stop_ioeventfd(proxy);
+}
+
+static const struct VirtioBusInfo virtio_bus_info = {
+.init_cb = virtio_pci_init_cb,
+.exit_cb = virtio_pci_exit_cb,
+
+.virtio_bindings = {
+.notify = virtio_pci_notify,
+.save_config = virtio_pci_save_config,
+.load_config = virtio_pci_load_config,
+.save_queue = virtio_pci_save_queue,
+.load_queue = virtio_pci_load_queue,
+.get_features = virtio_pci_get_features,
+.query_guest_notifiers = virtio_pci_query_guest_notifiers,
+.set_host_notifier = virtio_pci_set_host_notifier,
+.set_guest_notifiers = virtio_pci_set_guest_notifiers,
+.vmstate_change = virtio_pci_vmstate_change,
+}
+};
+
+static int virtiopci_qdev_init(PCIDevice *dev)
+{
+VirtIOPCIProxy *s = DO_UPCAST(VirtIOPCIProxy, pci_dev, dev);
+DeviceState *qdev = DEVICE(dev);
+
+/* create a virtio-bus and attach virtio-pci to it */
+s-bus = virtio_bus_new(qdev, virtio_bus_info);
+
+return 0;
+}
+
+static Property virtiopci_properties[] = {
+/* TODO : Add the correct properties */
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtiopci_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
+
+

[Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane

This series adds the -device virtio-blk-pci,x-data-plane=on property that
enables a high performance I/O codepath.  A dedicated thread is used to process
virtio-blk requests outside the global mutex and without going through the QEMU
block layer.

Khoa Huynh k...@us.ibm.com reported an increase from 140,000 IOPS to 600,000
IOPS for a single VM using virtio-blk-data-plane in July:

  http://comments.gmane.org/gmane.comp.emulators.kvm.devel/94580

The virtio-blk-data-plane approach was originally presented at Linux Plumbers
Conference 2010.  The following slides contain a brief overview:

  
http://linuxplumbersconf.org/2010/ocw/system/presentations/651/original/Optimizing_the_QEMU_Storage_Stack.pdf

The basic approach is:
1. Each virtio-blk device has a thread dedicated to handling ioeventfd
   signalling when the guest kicks the virtqueue.
2. Requests are processed without going through the QEMU block layer using
   Linux AIO directly.
3. Completion interrupts are injected via irqfd from the dedicated thread.

To try it out:

  qemu -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=...
   -device virtio-blk-pci,drive=drive0,scsi=off,x-data-plane=on

Limitations:
 * Only format=raw is supported
 * Live migration is not supported
 * Block jobs, hot unplug, and other operations fail with -EBUSY
 * I/O throttling limits are ignored
 * Only Linux hosts are supported due to Linux AIO usage

The code has reached a stage where I feel it is ready to merge.  Users have
been playing with it for some time and want the significant performance boost.

We are refactoring QEMU to get rid of the global mutex.  I believe that
virtio-blk-data-plane can eventually become the default mode of operation.

Instead of waiting for global mutex removal efforts to finish, I want to use
virtio-blk-data-plane as an example device for AioContext and threaded hw
dispatch refactoring.  This means:

1. When the block layer can bind to an AioContext and execute I/O outside the
   global mutex, virtio-blk-data-plane can use this (and gain image format
   support).

2. When hw dispatch no longer needs the global mutex we can use hw/virtio.c
   again and perhaps run a pool of iothreads instead of dedicated data plane
   threads.

But in the meantime, I have cleaned up the virtio-blk-data-plane code so that
it can be merged as an experimental feature.

v4:
 * Add qemu_iovec_concat_iov() [Paolo]
 * Use QEMUIOVector to copy out virtio_blk_inhdr [Michael, Paolo]

v3:
 * Don't assume iovec layout [Michael]
 * Better naming for hostmem.c MemoryListener callbacks [Don]
 * More vring quarantining if commands are bogus instead of exiting [Blue]

v2:
 * Use MemoryListener for thread-safe memory mapping [Paolo, Anthony, and 
everyone else pointed this out ;-)]
 * Quarantine invalid vring instead of exiting [Blue]
 * Replace __u16 kernel types with uint16_t [Blue]

Changes from the RFC v9:
 * Add x-data-plane=on|off option and coexist with regular virtio-blk code
 * Create thread from BH so it inherits iothread cpusets
 * Drain requests on vm_stop() so stopped guest does not access image file
 * Add migration blocker
 * Add bdrv_in_use() to prevent block jobs and other operations that can 
interfere
 * Drop IOQueue request merging for simplicity
 * Drop ioctl interrupt injection and always use irqfd for simplicity
 * Major cleanup to split up source files
 * Rebase from qemu-kvm.git onto qemu.git
 * Address Michael Tsirkin's review comments

Stefan Hajnoczi (11):
  raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
  configure: add CONFIG_VIRTIO_BLK_DATA_PLANE
  dataplane: add host memory mapping code
  dataplane: add virtqueue vring code
  dataplane: add event loop
  dataplane: add Linux AIO request queue
  iov: add iov_discard() to remove data
  test-iov: add iov_discard() testcase
  iov: add qemu_iovec_concat_iov()
  dataplane: add virtio-blk data plane code
  virtio-blk: add x-data-plane=on|off performance feature

 block.h|   9 +
 block/raw-posix.c  |  34 
 configure  |  21 +++
 hw/Makefile.objs   |   2 +-
 hw/dataplane/Makefile.objs |   3 +
 hw/dataplane/event-poll.c  | 109 
 hw/dataplane/event-poll.h  |  40 +
 hw/dataplane/hostmem.c | 165 ++
 hw/dataplane/hostmem.h |  52 ++
 hw/dataplane/ioq.c | 118 +
 hw/dataplane/ioq.h |  57 ++
 hw/dataplane/virtio-blk.c  | 427 +
 hw/dataplane/virtio-blk.h  |  41 +
 hw/dataplane/vring.c   | 344 
 hw/dataplane/vring.h   |  62 +++
 hw/virtio-blk.c|  59 ++-
 hw/virtio-blk.h|   1 +
 hw/virtio-pci.c|   3 +
 iov.c  |  80 +++--
 iov.h  |  13 ++
 qemu-common.h  |   3 +
 tests/test-iov.c   | 129 ++
 trace-events   |   9 +
 23 files changed, 1767

[Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code

The data plane thread needs to map guest physical addresses to host
pointers.  Normally this is done with cpu_physical_memory_map() but the
function assumes the global mutex is held.  The data plane thread does
not touch the global mutex and therefore needs a thread-safe memory
mapping mechanism.

Hostmem registers a MemoryListener similar to how vhost collects and
pushes memory region information into the kernel.  There is a
fine-grained lock on the regions list which is held during lookup and
when installing a new regions list.

When the physical memory map changes the MemoryListener callbacks are
invoked.  They build up a new list of memory regions which is finally
installed when the list has been completed.

Note that this approach is not safe across memory hotplug because mapped
pointers may still be in used across memory unplug.  However, this is
currently a problem for QEMU in general and needs to be addressed in the
future.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/dataplane/Makefile.objs |   3 +
 hw/dataplane/hostmem.c | 165 +
 hw/dataplane/hostmem.h |  52 ++
 3 files changed, 220 insertions(+)
 create mode 100644 hw/dataplane/Makefile.objs
 create mode 100644 hw/dataplane/hostmem.c
 create mode 100644 hw/dataplane/hostmem.h

diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
new file mode 100644
index 000..8c8dea1
--- /dev/null
+++ b/hw/dataplane/Makefile.objs
@@ -0,0 +1,3 @@
+ifeq ($(CONFIG_VIRTIO), y)
+common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o
+endif
diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c
new file mode 100644
index 000..48aabf0
--- /dev/null
+++ b/hw/dataplane/hostmem.c
@@ -0,0 +1,165 @@
+/*
+ * Thread-safe guest to host memory mapping
+ *
+ * Copyright 2012 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi stefa...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include exec-memory.h
+#include hostmem.h
+
+static int hostmem_lookup_cmp(const void *phys_, const void *region_)
+{
+hwaddr phys = *(const hwaddr *)phys_;
+const HostmemRegion *region = region_;
+
+if (phys  region-guest_addr) {
+return -1;
+} else if (phys = region-guest_addr + region-size) {
+return 1;
+} else {
+return 0;
+}
+}
+
+/**
+ * Map guest physical address to host pointer
+ */
+void *hostmem_lookup(Hostmem *hostmem, hwaddr phys, hwaddr len, bool is_write)
+{
+HostmemRegion *region;
+void *host_addr = NULL;
+hwaddr offset_within_region;
+
+qemu_mutex_lock(hostmem-current_regions_lock);
+region = bsearch(phys, hostmem-current_regions,
+ hostmem-num_current_regions,
+ sizeof(hostmem-current_regions[0]),
+ hostmem_lookup_cmp);
+if (!region) {
+goto out;
+}
+if (is_write  region-readonly) {
+goto out;
+}
+offset_within_region = phys - region-guest_addr;
+if (offset_within_region + len = region-size) {
+host_addr = region-host_addr + offset_within_region;
+}
+out:
+qemu_mutex_unlock(hostmem-current_regions_lock);
+
+return host_addr;
+}
+
+/**
+ * Install new regions list
+ */
+static void hostmem_listener_commit(MemoryListener *listener)
+{
+Hostmem *hostmem = container_of(listener, Hostmem, listener);
+
+qemu_mutex_lock(hostmem-current_regions_lock);
+g_free(hostmem-current_regions);
+hostmem-current_regions = hostmem-new_regions;
+hostmem-num_current_regions = hostmem-num_new_regions;
+qemu_mutex_unlock(hostmem-current_regions_lock);
+
+/* Reset new regions list */
+hostmem-new_regions = NULL;
+hostmem-num_new_regions = 0;
+}
+
+/**
+ * Add a MemoryRegionSection to the new regions list
+ */
+static void hostmem_append_new_region(Hostmem *hostmem,
+  MemoryRegionSection *section)
+{
+void *ram_ptr = memory_region_get_ram_ptr(section-mr);
+size_t num = hostmem-num_new_regions;
+size_t new_size = (num + 1) * sizeof(hostmem-new_regions[0]);
+
+hostmem-new_regions = g_realloc(hostmem-new_regions, new_size);
+hostmem-new_regions[num] = (HostmemRegion){
+.host_addr = ram_ptr + section-offset_within_region,
+.guest_addr = section-offset_within_address_space,
+.size = section-size,
+.readonly = section-readonly,
+};
+hostmem-num_new_regions++;
+}
+
+static void hostmem_listener_append_region(MemoryListener *listener,
+   MemoryRegionSection *section)
+{
+Hostmem *hostmem = container_of(listener, Hostmem, listener);
+
+if (memory_region_is_ram(section-mr)) {
+hostmem_append_new_region(hostmem, section);
+}
+}
+
+/* We don't implement most MemoryListener callbacks, use these nop stubs */

[Qemu-devel] [PATCH v4 04/11] dataplane: add virtqueue vring code

The virtio-blk-data-plane cannot access memory using the usual QEMU
functions since it executes outside the global mutex and the memory APIs
are this time are not thread-safe.

This patch introduces a virtqueue module based on the kernel's vhost
vring code.  The trick is that we map guest memory ahead of time and
access it cheaply outside the global mutex.

Once the hardware emulation code can execute outside the global mutex it
will be possible to drop this code.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/Makefile.objs   |   2 +-
 hw/dataplane/Makefile.objs |   2 +-
 hw/dataplane/vring.c   | 344 +
 hw/dataplane/vring.h   |  62 
 trace-events   |   3 +
 5 files changed, 411 insertions(+), 2 deletions(-)
 create mode 100644 hw/dataplane/vring.c
 create mode 100644 hw/dataplane/vring.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index ea46f81..db87fbf 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y = usb/ ide/
+common-obj-y = usb/ ide/ dataplane/
 common-obj-y += loader.o
 common-obj-$(CONFIG_VIRTIO) += virtio-console.o
 common-obj-$(CONFIG_VIRTIO) += virtio-rng.o
diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
index 8c8dea1..34e6d57 100644
--- a/hw/dataplane/Makefile.objs
+++ b/hw/dataplane/Makefile.objs
@@ -1,3 +1,3 @@
 ifeq ($(CONFIG_VIRTIO), y)
-common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o
+common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o
 endif
diff --git a/hw/dataplane/vring.c b/hw/dataplane/vring.c
new file mode 100644
index 000..2632fbd
--- /dev/null
+++ b/hw/dataplane/vring.c
@@ -0,0 +1,344 @@
+/* Copyright 2012 Red Hat, Inc.
+ * Copyright IBM, Corp. 2012
+ *
+ * Based on Linux vhost code:
+ * Copyright (C) 2009 Red Hat, Inc.
+ * Copyright (C) 2006 Rusty Russell IBM Corporation
+ *
+ * Author: Michael S. Tsirkin m...@redhat.com
+ * Stefan Hajnoczi stefa...@redhat.com
+ *
+ * Inspiration, some code, and most witty comments come from
+ * Documentation/virtual/lguest/lguest.c, by Rusty Russell
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+
+#include trace.h
+#include hw/dataplane/vring.h
+
+/* Map the guest's vring to host memory */
+bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
+{
+hwaddr vring_addr = virtio_queue_get_ring_addr(vdev, n);
+hwaddr vring_size = virtio_queue_get_ring_size(vdev, n);
+void *vring_ptr;
+
+vring-broken = false;
+
+hostmem_init(vring-hostmem);
+vring_ptr = hostmem_lookup(vring-hostmem, vring_addr, vring_size, true);
+if (!vring_ptr) {
+error_report(Failed to map vring 
+ addr %# HWADDR_PRIx  size % HWADDR_PRIu,
+ vring_addr, vring_size);
+vring-broken = true;
+return false;
+}
+
+vring_init(vring-vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096);
+
+vring-last_avail_idx = 0;
+vring-last_used_idx = 0;
+vring-signalled_used = 0;
+vring-signalled_used_valid = false;
+
+trace_vring_setup(virtio_queue_get_ring_addr(vdev, n),
+  vring-vr.desc, vring-vr.avail, vring-vr.used);
+return true;
+}
+
+void vring_teardown(Vring *vring)
+{
+hostmem_finalize(vring-hostmem);
+}
+
+/* Toggle guest-host notifies */
+void vring_set_notification(VirtIODevice *vdev, Vring *vring, bool enable)
+{
+if (vdev-guest_features  (1  VIRTIO_RING_F_EVENT_IDX)) {
+if (enable) {
+vring_avail_event(vring-vr) = vring-vr.avail-idx;
+}
+} else if (enable) {
+vring-vr.used-flags = ~VRING_USED_F_NO_NOTIFY;
+} else {
+vring-vr.used-flags |= VRING_USED_F_NO_NOTIFY;
+}
+}
+
+/* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */
+bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
+{
+uint16_t old, new;
+bool v;
+/* Flush out used index updates. This is paired
+ * with the barrier that the Guest executes when enabling
+ * interrupts. */
+smp_mb();
+
+if ((vdev-guest_features  VIRTIO_F_NOTIFY_ON_EMPTY) 
+unlikely(vring-vr.avail-idx == vring-last_avail_idx)) {
+return true;
+}
+
+if (!(vdev-guest_features  VIRTIO_RING_F_EVENT_IDX)) {
+return !(vring-vr.avail-flags  VRING_AVAIL_F_NO_INTERRUPT);
+}
+old = vring-signalled_used;
+v = vring-signalled_used_valid;
+new = vring-signalled_used = vring-last_used_idx;
+vring-signalled_used_valid = true;
+
+if (unlikely(!v)) {
+return true;
+}
+
+return vring_need_event(vring_used_event(vring-vr), new, old);
+}
+
+/* This is stolen from linux-2.6/drivers/vhost/vhost.c. */
+static int get_indirect(Vring *vring,
+struct iovec iov[], struct iovec *iov_end,
+unsigned int *out_num, unsigned int *in_num,
+struct vring_desc *indirect)
+{
+struct

Re: [Qemu-devel] [PATCH v3 09/12] iov: add iov_get_ptr() to reference vector data

On Thu, Nov 22, 2012 at 1:35 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, Nov 22, 2012 at 12:58:23PM +0100, Stefan Hajnoczi wrote:
 On Thu, Nov 22, 2012 at 10:34:13AM +0100, Paolo Bonzini wrote:
  Il 21/11/2012 19:32, Stefan Hajnoczi ha scritto:
   The iov_get_ptr() data returns a pointer to contiguous data within a
   vector.  This allows the caller to manipulate data inside the vector
   without copying in/out using iov_from_buf()/iov_to_buf() when we know
   that data is contiguous within an iovec element.
 
  This works for you because you have a single byte to write.  It would
  not work for the SG_IO inhdr, which would need iov_to_buf().

 Guilty as charged, your honor. :)

 Let me give a few more details about the motivation for this function:

 In virtio-blk-data-plane we have an iovec[] array.  In the read/write
 code path we discard the inhdr/outhdr so just the data buffers are left
 in the iovec[] array.  Then we can pass the iovec[] array straight to
 the Linux AIO functions.

 Because we're using the iovec[] array for data buffers and we're not
 allowed to make assumptions about iovec layout, we cannot use
 iov_to_buf()/iov_from_buf() at the end to fill in the status field - the
 inhdr has already been discarded from the iovec[] array.

 How about using iov_copy?

 We have exactly this problem in virtio net if we run
 on host that does not support mergeable buffer header,
 and we solve it by copying out the iovec.

 Since I knew the inhdr is only 1 byte I decided against doing something
 like dynamically allocating/freeing a QEMUIOVector which could handle
 spanning iovecs.

 That said, I think this function is okay as-is because it works fine for
 non-virtio cases where the caller *knows* the iovec[] layout.  As a
 utility function it stands on its own.


 My concern is these APIs are unsafe to use: you get back a pointer and
 you must verify length is not too big before access.  Since the iov can
 be manipulated by guest this looks like a good place to put extra
 safeguards.

  What about the following alternative API:
 
  void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
ssize_t offset, size_t *bytes);
 
  which would place the number of valid bytes (i.e. the length of the
  remainder of the iovec entry) in *bytes?
 
  Also, I think that offset == iov_size(iov, iov_cnt) should be
  acceptable, and it would be the only case in which *bytes == 0.

 Hmm...this may be more useful than the version I proposed since the
 caller can also use it to find out how many bytes are contiguous.

 Michael: Any concerns if I update the code to reflect Paolo's
 suggestion?

 Stefan

 I'd prefer something that actually works for all cases
 rather than making callers check and handle failure,
 or reason why it can't fail.

I just sent out a new version of the patch which goes whole hog and
uses a QEMUIOVector to safely access virtio_blk_inhdr regardless of
its size or iovec spanning.

Stefan

Re: [Qemu-devel] [PATCHv3] Support default block interfaces per QEMUMachine

On Thu, Nov 22, 2012 at 2:22 PM, Christian Borntraeger
borntrae...@de.ibm.com wrote:
 On 22/11/12 13:02, Markus Armbruster wrote:

 Thanks for the review, Stefan already applied the patch, though.
 Is there anything that you really want to have a followup patch
 besides this one?

Follow-up patches are probably the best step to handle any additional comments.

However, if you prefer we can replace the patch in block-next since
that tree won't see any movement until the 1.4 window opens up.

Stefan

Re: [Qemu-devel] drive_add 0 if=scsi crashes

Il 22/11/2012 15:02, Markus Armbruster ha scritto:
 Watch this:
 
 (qemu) drive_add 0 if=scsi
 Segmentation fault (core dumped)
 
 Broken in commit 0d936928 qdev: Convert busses to QEMU Object Model.
 Before:
 
 (qemu) drive_add 0 if=scsi
 Device is not a SCSI adapter

Seems more of an improvement to me. :)

Can you open a bug for Fedora or launchpad?

Paolo

[Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code

virtio-blk-data-plane is a subset implementation of virtio-blk.  It only
handles read, write, and flush requests.  It does this using a dedicated
thread that executes an epoll(2)-based event loop and processes I/O
using Linux AIO.

This approach performs very well but can be used for raw image files
only.  The number of IOPS achieved has been reported to be several times
higher than the existing virtio-blk implementation.

Eventually it should be possible to unify virtio-blk-data-plane with the
main body of QEMU code once the block layer and hardware emulation is
able to run outside the global mutex.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/dataplane/Makefile.objs |   2 +-
 hw/dataplane/virtio-blk.c  | 427 +
 hw/dataplane/virtio-blk.h  |  41 +
 trace-events   |   6 +
 4 files changed, 475 insertions(+), 1 deletion(-)
 create mode 100644 hw/dataplane/virtio-blk.c
 create mode 100644 hw/dataplane/virtio-blk.h

diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
index abd408f..682aa9e 100644
--- a/hw/dataplane/Makefile.objs
+++ b/hw/dataplane/Makefile.objs
@@ -1,3 +1,3 @@
 ifeq ($(CONFIG_VIRTIO), y)
-common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o 
ioq.o
+common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o 
ioq.o virtio-blk.o
 endif
diff --git a/hw/dataplane/virtio-blk.c b/hw/dataplane/virtio-blk.c
new file mode 100644
index 000..9b29969
--- /dev/null
+++ b/hw/dataplane/virtio-blk.c
@@ -0,0 +1,427 @@
+/*
+ * Dedicated thread for virtio-blk I/O processing
+ *
+ * Copyright 2012 IBM, Corp.
+ * Copyright 2012 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi stefa...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include trace.h
+#include iov.h
+#include event-poll.h
+#include qemu-thread.h
+#include vring.h
+#include ioq.h
+#include hw/virtio-blk.h
+#include hw/dataplane/virtio-blk.h
+
+enum {
+SEG_MAX = 126,  /* maximum number of I/O segments */
+VRING_MAX = SEG_MAX + 2,/* maximum number of vring descriptors */
+REQ_MAX = VRING_MAX,/* maximum number of requests in the vring,
+ * is VRING_MAX / 2 with traditional and
+ * VRING_MAX with indirect descriptors */
+};
+
+typedef struct {
+struct iocb iocb;   /* Linux AIO control block */
+QEMUIOVector *inhdr;/* iovecs for virtio_blk_inhdr */
+unsigned int head;  /* vring descriptor index */
+} VirtIOBlockRequest;
+
+struct VirtIOBlockDataPlane {
+bool started;
+QEMUBH *start_bh;
+QemuThread thread;
+
+int fd; /* image file descriptor */
+
+VirtIODevice *vdev;
+Vring vring;/* virtqueue vring */
+EventNotifier *guest_notifier;  /* irq */
+
+EventPoll event_poll;   /* event poller */
+EventHandler io_handler;/* Linux AIO completion handler */
+EventHandler notify_handler;/* virtqueue notify handler */
+
+IOQueue ioqueue;/* Linux AIO queue (should really be per
+   dataplane thread) */
+VirtIOBlockRequest requests[REQ_MAX]; /* pool of requests, managed by the
+ queue */
+
+unsigned int num_reqs;
+QemuMutex num_reqs_lock;
+QemuCond no_reqs_cond;
+};
+
+/* Raise an interrupt to signal guest, if necessary */
+static void notify_guest(VirtIOBlockDataPlane *s)
+{
+if (!vring_should_notify(s-vdev, s-vring)) {
+return;
+}
+
+event_notifier_set(s-guest_notifier);
+}
+
+static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque)
+{
+VirtIOBlockDataPlane *s = opaque;
+VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb);
+struct virtio_blk_inhdr hdr;
+int len;
+
+if (likely(ret = 0)) {
+hdr.status = VIRTIO_BLK_S_OK;
+len = ret;
+} else {
+hdr.status = VIRTIO_BLK_S_IOERR;
+len = 0;
+}
+
+trace_virtio_blk_data_plane_complete_request(s, req-head, ret);
+
+qemu_iovec_from_buf(req-inhdr, 0, hdr, sizeof(hdr));
+qemu_iovec_destroy(req-inhdr);
+g_slice_free(QEMUIOVector, req-inhdr);
+
+/* According to the virtio specification len should be the number of bytes
+ * written to, but for virtio-blk it seems to be the number of bytes
+ * transferred plus the status bytes.
+ */
+vring_push(s-vring, req-head, len + sizeof(hdr));
+
+qemu_mutex_lock(s-num_reqs_lock);
+if (--s-num_reqs == 0) {
+qemu_cond_broadcast(s-no_reqs_cond);
+}
+qemu_mutex_unlock(s-num_reqs_lock);
+}
+
+static void fail_request_early(VirtIOBlockDataPlane *s, unsigned int head,
+

[Qemu-devel] [PATCH v4 05/11] dataplane: add event loop

Outside the safety of the global mutex we need to poll on file
descriptors.  I found epoll(2) is a convenient way to do that, although
other options could replace this module in the future (such as an
AioContext-based loop or glib's GMainLoop).

One important feature of this small event loop implementation is that
the loop can be terminated in a thread-safe way.  This allows QEMU to
stop the data plane thread cleanly.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/dataplane/Makefile.objs |   2 +-
 hw/dataplane/event-poll.c  | 109 +
 hw/dataplane/event-poll.h  |  40 +
 3 files changed, 150 insertions(+), 1 deletion(-)
 create mode 100644 hw/dataplane/event-poll.c
 create mode 100644 hw/dataplane/event-poll.h

diff --git a/hw/dataplane/Makefile.objs b/hw/dataplane/Makefile.objs
index 34e6d57..e26bd7d 100644
--- a/hw/dataplane/Makefile.objs
+++ b/hw/dataplane/Makefile.objs
@@ -1,3 +1,3 @@
 ifeq ($(CONFIG_VIRTIO), y)
-common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o
+common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += hostmem.o vring.o event-poll.o
 endif
diff --git a/hw/dataplane/event-poll.c b/hw/dataplane/event-poll.c
new file mode 100644
index 000..4a53d48
--- /dev/null
+++ b/hw/dataplane/event-poll.c
@@ -0,0 +1,109 @@
+/*
+ * Event loop with file descriptor polling
+ *
+ * Copyright 2012 IBM, Corp.
+ * Copyright 2012 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi stefa...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include sys/epoll.h
+#include hw/dataplane/event-poll.h
+
+/* Add an event notifier and its callback for polling */
+void event_poll_add(EventPoll *poll, EventHandler *handler,
+EventNotifier *notifier, EventCallback *callback)
+{
+struct epoll_event event = {
+.events = EPOLLIN,
+.data.ptr = handler,
+};
+handler-notifier = notifier;
+handler-callback = callback;
+if (epoll_ctl(poll-epoll_fd, EPOLL_CTL_ADD,
+  event_notifier_get_fd(notifier), event) != 0) {
+fprintf(stderr, failed to add event handler to epoll: %m\n);
+exit(1);
+}
+}
+
+/* Event callback for stopping the event_poll_run() loop */
+static bool handle_stop(EventHandler *handler)
+{
+return false; /* stop event loop */
+}
+
+void event_poll_init(EventPoll *poll)
+{
+/* Create epoll file descriptor */
+poll-epoll_fd = epoll_create1(EPOLL_CLOEXEC);
+if (poll-epoll_fd  0) {
+fprintf(stderr, epoll_create1 failed: %m\n);
+exit(1);
+}
+
+/* Set up stop notifier */
+if (event_notifier_init(poll-stop_notifier, 0)  0) {
+fprintf(stderr, failed to init stop notifier\n);
+exit(1);
+}
+event_poll_add(poll, poll-stop_handler,
+   poll-stop_notifier, handle_stop);
+}
+
+void event_poll_cleanup(EventPoll *poll)
+{
+event_notifier_cleanup(poll-stop_notifier);
+close(poll-epoll_fd);
+poll-epoll_fd = -1;
+}
+
+/* Block until the next event and invoke its callback
+ *
+ * Signals must be masked, EINTR should never happen.  This is true for QEMU
+ * threads.
+ */
+static bool event_poll(EventPoll *poll)
+{
+EventHandler *handler;
+struct epoll_event event;
+int nevents;
+
+/* Wait for the next event.  Only do one event per call to keep the
+ * function simple, this could be changed later. */
+nevents = epoll_wait(poll-epoll_fd, event, 1, -1);
+if (unlikely(nevents != 1)) {
+fprintf(stderr, epoll_wait failed: %m\n);
+exit(1); /* should never happen */
+}
+
+/* Find out which event handler has become active */
+handler = event.data.ptr;
+
+/* Clear the eventfd */
+event_notifier_test_and_clear(handler-notifier);
+
+/* Handle the event */
+return handler-callback(handler);
+}
+
+void event_poll_run(EventPoll *poll)
+{
+while (event_poll(poll)) {
+/* do nothing */
+}
+}
+
+/* Stop the event_poll_run() loop
+ *
+ * This function can be used from another thread.
+ */
+void event_poll_stop(EventPoll *poll)
+{
+event_notifier_set(poll-stop_notifier);
+}
diff --git a/hw/dataplane/event-poll.h b/hw/dataplane/event-poll.h
new file mode 100644
index 000..5e1771f
--- /dev/null
+++ b/hw/dataplane/event-poll.h
@@ -0,0 +1,40 @@
+/*
+ * Event loop with file descriptor polling
+ *
+ * Copyright 2012 IBM, Corp.
+ * Copyright 2012 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *   Stefan Hajnoczi stefa...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef EVENT_POLL_H
+#define EVENT_POLL_H
+
+#include event_notifier.h
+
+typedef struct EventHandler EventHandler;
+typedef bool EventCallback(EventHandler *handler);
+struct EventHandler {
+

[Qemu-devel] [PATCH v4 01/11] raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane

The raw_get_aio_fd() function allows virtio-blk-data-plane to get the
file descriptor of a raw image file with Linux AIO enabled.  This
interface is really a layering violation that can be resolved once the
block layer is able to run outside the global mutex - at that point
virtio-blk-data-plane will switch from custom Linux AIO code to using
the block layer.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block.h   |  9 +
 block/raw-posix.c | 34 ++
 2 files changed, 43 insertions(+)

diff --git a/block.h b/block.h
index 722c620..2dc6aaf 100644
--- a/block.h
+++ b/block.h
@@ -365,6 +365,15 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
 void bdrv_set_in_use(BlockDriverState *bs, int in_use);
 int bdrv_in_use(BlockDriverState *bs);
 
+#ifdef CONFIG_LINUX_AIO
+int raw_get_aio_fd(BlockDriverState *bs);
+#else
+static inline int raw_get_aio_fd(BlockDriverState *bs)
+{
+return -ENOTSUP;
+}
+#endif
+
 enum BlockAcctType {
 BDRV_ACCT_READ,
 BDRV_ACCT_WRITE,
diff --git a/block/raw-posix.c b/block/raw-posix.c
index f2f0404..fc04981 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1768,6 +1768,40 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
+#ifdef CONFIG_LINUX_AIO
+/**
+ * Return the file descriptor for Linux AIO
+ *
+ * This function is a layering violation and should be removed when it becomes
+ * possible to call the block layer outside the global mutex.  It allows the
+ * caller to hijack the file descriptor so I/O can be performed outside the
+ * block layer.
+ */
+int raw_get_aio_fd(BlockDriverState *bs)
+{
+BDRVRawState *s;
+
+if (!bs-drv) {
+return -ENOMEDIUM;
+}
+
+if (bs-drv == bdrv_find_format(raw)) {
+bs = bs-file;
+}
+
+/* raw-posix has several protocols so just check for raw_aio_readv */
+if (bs-drv-bdrv_aio_readv != raw_aio_readv) {
+return -ENOTSUP;
+}
+
+s = bs-opaque;
+if (!s-use_aio) {
+return -ENOTSUP;
+}
+return s-fd;
+}
+#endif /* CONFIG_LINUX_AIO */
+
 static void bdrv_file_init(void)
 {
 /*
-- 
1.8.0

[Qemu-devel] [PATCH v4 09/11] iov: add qemu_iovec_concat_iov()

The qemu_iovec_concat() function copies a subset of a QEMUIOVector.  The
new qemu_iovec_concat_iov() function does the same for a iov/cnt pair.

It is easy to define qemu_iovec_concat() in terms of
qemu_iovec_concat_iov().  The existing code is mostly unchanged, except
for the assertion src-size = soffset, which cannot be efficiently
checked upfront on a iov/cnt pair.  Instead we assert upon hitting the
end of src with an unsatisfied soffset.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 iov.c | 39 +++
 qemu-common.h |  3 +++
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/iov.c b/iov.c
index 6eed089..1d4c5fe 100644
--- a/iov.c
+++ b/iov.c
@@ -289,34 +289,49 @@ void qemu_iovec_add(QEMUIOVector *qiov, void *base, 
size_t len)
 }
 
 /*
- * Concatenates (partial) iovecs from src to the end of dst.
+ * Concatenates (partial) iovecs from src_iov to the end of dst.
  * It starts copying after skipping `soffset' bytes at the
  * beginning of src and adds individual vectors from src to
  * dst copies up to `sbytes' bytes total, or up to the end
- * of src if it comes first.  This way, it is okay to specify
+ * of src_iov if it comes first.  This way, it is okay to specify
  * very large value for `sbytes' to indicate up to the end
  * of src.
  * Only vector pointers are processed, not the actual data buffers.
  */
-void qemu_iovec_concat(QEMUIOVector *dst,
-   QEMUIOVector *src, size_t soffset, size_t sbytes)
+void qemu_iovec_concat_iov(QEMUIOVector *dst,
+   struct iovec *src_iov, unsigned int src_cnt,
+   size_t soffset, size_t sbytes)
 {
 int i;
 size_t done;
-struct iovec *siov = src-iov;
 assert(dst-nalloc != -1);
-assert(src-size = soffset);
-for (i = 0, done = 0; done  sbytes  i  src-niov; i++) {
-if (soffset  siov[i].iov_len) {
-size_t len = MIN(siov[i].iov_len - soffset, sbytes - done);
-qemu_iovec_add(dst, siov[i].iov_base + soffset, len);
+for (i = 0, done = 0; done  sbytes  i  src_cnt; i++) {
+if (soffset  src_iov[i].iov_len) {
+size_t len = MIN(src_iov[i].iov_len - soffset, sbytes - done);
+qemu_iovec_add(dst, src_iov[i].iov_base + soffset, len);
 done += len;
 soffset = 0;
 } else {
-soffset -= siov[i].iov_len;
+soffset -= src_iov[i].iov_len;
 }
 }
-/* return done; */
+assert(soffset == 0); /* offset beyond end of src */
+}
+
+/*
+ * Concatenates (partial) iovecs from src to the end of dst.
+ * It starts copying after skipping `soffset' bytes at the
+ * beginning of src and adds individual vectors from src to
+ * dst copies up to `sbytes' bytes total, or up to the end
+ * of src if it comes first.  This way, it is okay to specify
+ * very large value for `sbytes' to indicate up to the end
+ * of src.
+ * Only vector pointers are processed, not the actual data buffers.
+ */
+void qemu_iovec_concat(QEMUIOVector *dst,
+   QEMUIOVector *src, size_t soffset, size_t sbytes)
+{
+qemu_iovec_concat_iov(dst, src-iov, src-niov, soffset, sbytes);
 }
 
 void qemu_iovec_destroy(QEMUIOVector *qiov)
diff --git a/qemu-common.h b/qemu-common.h
index cef264c..4cc63e1 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -379,6 +379,9 @@ void qemu_iovec_init_external(QEMUIOVector *qiov, struct 
iovec *iov, int niov);
 void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len);
 void qemu_iovec_concat(QEMUIOVector *dst,
QEMUIOVector *src, size_t soffset, size_t sbytes);
+void qemu_iovec_concat_iov(QEMUIOVector *dst,
+   struct iovec *src_iov, unsigned int src_cnt,
+   size_t soffset, size_t sbytes);
 void qemu_iovec_destroy(QEMUIOVector *qiov);
 void qemu_iovec_reset(QEMUIOVector *qiov);
 size_t qemu_iovec_to_buf(QEMUIOVector *qiov, size_t offset,
-- 
1.8.0

[Qemu-devel] [PATCH v4 08/11] test-iov: add iov_discard() testcase