date:20150508

Re: [Qemu-devel] [PATCH] qemu-nbd: only send a limited number of errno codes on the wire

2015-05-08 Thread Markus Armbruster

Paolo Bonzini pbonz...@redhat.com writes:

 Right now, NBD includes potentially platform-specific error values in
 the wire protocol.

Design flaw.

 Luckily, most common error values are more or less universal: in
 particular, of all errno values = 34 (up to ERANGE), they are all
 the same on supported platforms except for 11 (which is EAGAIN on
 Windows and Linux, but EDEADLK on Darwin and the *BSDs).

Can EAGAIN or EDEADLK happen?  I don't know is an acceptable answer :)

 So, in order to guarantee some portability, only keep a dozen
 possible error codes and squash everything else to EINVAL.

Ugh.  I guess it'll do.

Cleaner solution: Fix the protocol to transmit EPERM, EIO, ... in
addition to 1, 5, ...

If backward compatibility is not an issue: s/in addition to/instead of/.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  nbd.c | 47 +++
  1 file changed, 47 insertions(+)

 diff --git a/nbd.c b/nbd.c
 index eea8c51..1ad5b66 100644
 --- a/nbd.c
 +++ b/nbd.c
 @@ -86,6 +86,37 @@
  #define NBD_OPT_ABORT   (2)
  #define NBD_OPT_LIST(3)
  
 +/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
 + * but only a limited set of errno values is specified in the protocol.
 + * Everything else is squashed to EINVAL.
 + */

Is the protocol defined anywhere?

 +static int system_errno_to_nbd_errno(int err)
 +{
 +switch (err) {
 +case EPERM:
 +return 1;
 +case EIO:
 +return 5;
 +case ENXIO:
 +return 6;
 +case E2BIG:
 +return 7;
 +case ENOMEM:
 +return 12;
 +case EACCES:
 +return 13;
 +case EFBIG:
 +return 27;
 +case ENOSPC:
 +return 28;
 +case EROFS:
 +return 30;
 +case EINVAL:
 +default:
 +return 22;
 +}
 +}
 +

This maps recognized OS errnos to NBD errnos.  The latter are literals.

  /* Definitions for opaque data types */
  
  typedef struct NBDRequest NBDRequest;
 @@ -856,6 +887,20 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply 
 *reply)
  reply-error  = be32_to_cpup((uint32_t*)(buf + 4));
  reply-handle = be64_to_cpup((uint64_t*)(buf + 8));
  
 +/* NBD errors should be universally equal to the corresponding
 + * errno values, check it here.
 + */
 +QEMU_BUILD_BUG_ON(EPERM != 1);
 +QEMU_BUILD_BUG_ON(EIO != 5);
 +QEMU_BUILD_BUG_ON(ENXIO != 6);
 +QEMU_BUILD_BUG_ON(E2BIG != 7);
 +QEMU_BUILD_BUG_ON(ENOMEM != 12);
 +QEMU_BUILD_BUG_ON(EACCES != 13);
 +QEMU_BUILD_BUG_ON(EINVAL != 22);
 +QEMU_BUILD_BUG_ON(EFBIG != 27);
 +QEMU_BUILD_BUG_ON(ENOSPC != 28);
 +QEMU_BUILD_BUG_ON(EROFS != 30);
 +

This checks that the mapping above is the identify function for all the
recognized NBD errnos.  Why is that necessary?

Same literals as above.  Violates DRY.  I don't mind all that much, but
wonder whether we could at least do the checking next to
system_errno_to_nbd_errno().

  TRACE(Got reply: 
{ magic = 0x%x, .error = %d, handle = % PRIu64 },
magic, reply-error, reply-handle);
 @@ -872,6 +917,8 @@ static ssize_t nbd_send_reply(int csock, struct nbd_reply 
 *reply)
  uint8_t buf[NBD_REPLY_SIZE];
  ssize_t ret;
  
 +reply-error = system_errno_to_nbd_errno(reply-error);
 +
  /* Reply
 [ 0 ..  3]magic   (NBD_REPLY_MAGIC)
 [ 4 ..  7]error   (0 == no error)

Re: [Qemu-devel] [PATCH v3 4/5] qtest: precompute hex nibs

2015-05-08 Thread Markus Armbruster

Eric Blake ebl...@redhat.com writes:

 On 05/06/2015 10:18 AM, John Snow wrote:

 To find out, add just buffering.  Something like this in your patch
 instead of byte2hex():

  for (i = 0; i  len; i++) {
 -qtest_sendf(chr, %02x, data[i]);
 +snprintf(enc[i * 2], 2, %02x, data[i]);
  }

 If the speedup is pretty much entirely due to buffering (which I
 suspect), then your commit message could use a bit of love :)

 
 When you're right, you're right. The difference may not be statistically
 meaningful, but with today's current planetary alignment, using
 sprintf() to batch the sends instead of my home-rolled nib computation
 function, I can eke out a few more tenths of a second.

 I'm a bit surprised - making a function call per byte generally executes
 more instructions than open-coding the conversion (albeit the branch
 prediction in the hardware probably does fairly well over long strings,
 since it is a tight and predictable loop).  Remember, sprintf() has to
 decode the format string on every call (unless the compiler is smart
 enough to open-code what sprintf would do).

John's measurements show that the speed difference between snprintf()
and a local copy of formatting code gets thoroughly drowned in noise.

The snprintf() version takes 18 lines less, according to diffstat.  Less
code, same measured performance, what's not to like?

However, if you feel strongly about avoiding snprintf() here, I won't
argue further.  Except for the commit message: it needs to be fixed not
to claim avoiding printf and friends makes a speed difference.

Re: [Qemu-devel] [PATCH 0/7] virtio: inline private qdev properties into virtio devices

2015-05-08 Thread Paolo Bonzini

No, he is on vacation this week. Sorry for the delay!

Of you are doing cleanups in virtio, perhaps you can look into using alias 
properties for virtio-balloon's QOM properties (for example the statistics). 
The code is currently using object_property_add and manually-written 
getters/setters.

Thanks,

Paolo


-Original Message-
From: Shannon Zhao [shannon.z...@linaro.org]
Received: venerdì, 08 mag 2015, 3:24
To: qemu-devel@nongnu.org
CC: peter.mayd...@linaro.org, christoffer.d...@linaro.org, m...@redhat.com, 
pbonz...@redhat.com, peter.huangp...@huawei.com, hangaoh...@huawei.com, 
zhaoshengl...@huawei.com
Subject: Re: [PATCH 0/7] virtio: inline private qdev properties into virtio 
devices


On 2015/4/29 23:24, Shannon Zhao wrote:
 The private qdev properties of virtio devices are only used by
 themselves. As Peter suggested and like what virtio-blk has done, we
 should move the private qdev properties into devices and don't expose
 them to avoid wrongly use.
 
 This patchset is based on following patchset which moves host features
 to backends.
 http://lists.gnu.org/archive/html/qemu-devel/2015-04/msg03785.html
 
 Shannon Zhao (7):
   virtio-net: move qdev properties into virtio-net.c
   virtio-net.h: Remove unsed DEFINE_VIRTIO_NET_PROPERTIES
   virtio-scsi: move qdev properties into virtio-scsi.c
   virtio-rng: move qdev properties into virtio-rng.c
   virtio-serial-bus: move qdev properties into virtio-serial-bus.c
   virtio-9p-device: move qdev properties into virtio-9p-device.c
   vhost-scsi: move qdev properties into vhost-scsi.c

Have any maintainer picked up these patches?

-- 
Shannon

Re: [Qemu-devel] [PATCH] configure: bump glib version to 2.16

2015-05-08 Thread Markus Armbruster

John Snow js...@redhat.com writes:

 On 05/07/2015 08:22 AM, Peter Maydell wrote:
 On 10 March 2015 at 17:45, Peter Maydell peter.mayd...@linaro.org wrote:
 On 10 March 2015 at 17:43, John Snow js...@redhat.com wrote:
 Wasn't aware we were actually going through with that; it had looked
 like we were going to refrain from fiddling with it because we found
 a workaround that sufficed for glib 2.12.

 Well, we're not doing anything before QEMU 2.3 is released.
 But if we bump the version requirement at all, then we might
 as well move it to 2.22, because once you drop the support RHEL5
 requirement then that's the newest version we can shift to without
 breaking compilation on some other distro we care about (SUSE, in
 this case).
 
 So do people still want to bump the minimum glib version?
 If we're going to do it for 2.4 I would prefer to see the
 configure patch reasonably early in the release cycle I think.
 
 -- PMM
 

 I think we determined that the minimum version could be as high as 2.22.

 https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg00488.html

I support upgrading to a more recent version of GLib as much as ever.
Let's go for 2.22.

Re: [Qemu-devel] [PULL 3/6] console-gl: add opengl rendering helper functions

2015-05-08 Thread Gerd Hoffmann

On Fr, 2015-05-08 at 00:28 +0200, Alexander Graf wrote:
 
 On 05.05.15 11:43, Gerd Hoffmann wrote:
  Signed-off-by: Gerd Hoffmann kra...@redhat.com
  Reviewed-by: Max Reitz mre...@redhat.com
  ---
 
 [...]
 
  +void surface_gl_create_texture(ConsoleGLState *gls,
  +   DisplaySurface *surface)
  +{
  +assert(gls);
  +assert(surface_stride(surface) % surface_bytes_per_pixel(surface) == 
  0);
  +
  +switch (surface-format) {
  +case PIXMAN_BE_b8g8r8x8:
  +case PIXMAN_BE_b8g8r8a8:
  +surface-glformat = GL_BGRA_EXT;
  +surface-gltype = GL_UNSIGNED_BYTE;
  +break;
  +case PIXMAN_r5g6b5:
  +surface-glformat = GL_RGB;
  +surface-gltype = GL_UNSIGNED_SHORT_5_6_5;
  +break;
  +default:
  +g_assert_not_reached();
  +}
  +
  +glGenTextures(1, surface-texture);
  +glEnable(GL_TEXTURE_2D);
  +glBindTexture(GL_TEXTURE_2D, surface-texture);
  +glPixelStorei(GL_UNPACK_ROW_LENGTH_EXT,
 
 This doesn't compile for me on SLES11:
 
 ui/console-gl.c: In function ‘surface_gl_create_texture’:
 ui/console-gl.c:97:19: error: ‘GL_UNPACK_ROW_LENGTH_EXT’ undeclared
 (first use in this function)
 ui/console-gl.c:97:19: note: each undeclared identifier is reported only
 once for each function it appears in
 ui/console-gl.c: In function ‘surface_gl_update_texture’:
 ui/console-gl.c:117:19: error: ‘GL_UNPACK_ROW_LENGTH_EXT’ undeclared
 (first use in this function)
 make: *** [ui/console-gl.o] Error 1
 make: *** Waiting for unfinished jobs

Which mesa version is this?

cheers,
  Gerd

Re: [Qemu-devel] [PATCH v2 2/3] virtio-mmio: introduce set_guest_notifiers

2015-05-08 Thread Pavel Fedin

 Hello!

 Hm, weren't there some patches for irqfd on arm?

 Yes, there were. However, they had a design problem by breaking backwards 
compatibility
with unmodified virtio. Their idea was to set up one more shared memory area 
between
virtio and vhost-net and use it to pass ISR value, which helps to distinguish, 
which event
took place (queue update or config change). So, this idea was rejected.
 http://lists.gnu.org/archive/html/qemu-devel/2014-10/msg03056.html

 I thought about it, and technically, i think, this can be done in better way. 
Actually,
as far as i understood, all we need is mechanism for distinguishing between 
these two
events. On PCI we do this by using multiple IRQs via MSI-X, and one IRQ signals 
exactly
one type of event. MSI-X code also has two IRQs mode as a failsafe, where one 
IRQ
signals config change and another IRQ signals queues update (and all queues are 
polled in
turn). I think a similar thing could be done for virtio-mmio. It could allocate 
two IRQs
instead of one and describe both of them in the device tree. Guest side, upon 
seeing that,
could make use of those two IRQs and acknowledge to the host side that yes, i 
am new
version and use new mode.
 But, sorry, i will unlikely implement this, because we already have PCI with 
MSI-X (i
hope this is going to be published soon), so my project can use PCI emulation. 
So
implementing irqfds for virtio-mmio is a bit out of my scope.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH] docs: update BLOCK_IMAGE_CORRUPTED documentation

2015-05-08 Thread Michael Tokarev

Applied to -trivial, thank you!

/mjt

[Qemu-devel] [Bug 1452904] [NEW] High CPU in idle Windows guest

2015-05-08 Thread tglanfield

Public bug reported:

Hi,

I'm running a freshly installed Windows 7 domU on an up-to-date Debian
jessie machine running Xen 4.4.1-9.  When the Windows machine is idle,
I'm seeing upwards of 10% CPU usage from the qemu-system-i386 instance.
Other Linux and FreeBSD machines register negligable CPU usage (0.5%).
The server is an HP Proliant DL360 G7.

Data from perf attacthed to the process might give the best clues.

Any information as to why this processes is comsuming so much CPU would
be much appreciated.

Regards,
Terry.

# uname -a
Linux xen 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24) x86_64 
GNU/Linux

# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:Debian GNU/Linux 8.0 (jessie)
Release:8.0
Codename:   jessie

# qemu-system-i386 --version
QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-11), Copyright (c) 2003-2008 
Fabrice Bellard

# cat win7.cfg
name= 'win7'
vcpus   = '1'
memory  = '4096'
builder = 'hvm'
disk= [ '/vms/xen/domains/desk_win7/disk1.img,qcow2,hda,rw',
'/vms/xen/domains/desk_win7/disk2.img,qcow2,hdb,rw' ]
vfb = [ 'vnc=1,vnclisten=0.0.0.0,sdl=0' ]
vif = [ 'mac=00:16:3E:FB:FC:39,bridge=xenbr0' ]
serial  = 'pty'
vnc = 1
vnclisten   = '0.0.0.0'
sdl = 0
usbdevice   = 'mouse'
audio   = 1
soundhw = 'sb16,es1370'
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash= 'restart'

# /usr/bin/qemu-system-i386 -nodefaults -name desk_win7 -boot order=cda
-usb -usbdevice mouse -device
rtl8139,id=nic0,netdev=net0,mac=00:16:3e:fb:fc:39 -netdev
type=tap,id=net0,ifname=vif12.0-emu,script=no,downscript=no -m 4088
-drive
file=/vms/xen/domains/desk_win7/disk1.img,if=ide,index=0,media=disk,format=qcow2,cache=writeback
-drive
file=/vms/xen/domains/desk_win7/disk2.img,if=ide,index=1,media=disk,format=qcow2,cache=writeback
-vnc 0.0.0.0:0,to=99 -display none -device cirrus-vga  -global
vga.vram_size_mb=8

# top
Tasks: 134 total,   1 running, 133 sleeping,   0 stopped,   0 zombie
%Cpu(s): 16.6 us,  5.5 sy,  0.0 ni,  0.0 id, 77.9 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:354420 total,   348884 used, 5536 free,  184 buffers
KiB Swap: 17575932 total,  2642868 used, 14933064 free. 4132 cached Mem

  PID USER  PR  NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 
12297 root  20   0 6683864 192944   1140 S 20.6 54.4   2:25.51 qemu-syste+ 

# xen dmesg
(XEN) Xen version 4.4.1 (Debian 4.4.1-9) (wa...@debian.org) (gcc (Debian 
4.9.2-10) 4.9.2) debug=n Mon Apr  6 18:24:28 UTC 2015
(XEN) Bootloader: GRUB 2.02~beta2-22
(XEN) Command line: placeholder dom0_mem=512M dom0_max_vcpus=1 dom0_vcpus_pin 
radeon.modeset=0 iommu=no-intremap
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: none; EDID transfer time: 2 seconds
(XEN)  EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)   - 0009f400 (usable)
(XEN)  0009f400 - 000a (reserved)
(XEN)  000f - 0010 (reserved)
(XEN)  0010 - cf62f000 (usable)
(XEN)  cf62f000 - cf63c000 (ACPI data)
(XEN)  cf63c000 - cf63d000 (usable)
(XEN)  cf63d000 - d400 (reserved)
(XEN)  fec0 - fee1 (reserved)
(XEN)  ff80 - 0001 (reserved)
(XEN)  0001 - 0004a000 (usable)
(XEN) ACPI: RSDP 000F4F00, 0024 (r2 HP)
(XEN) ACPI: XSDT CF630140, 00B4 (r1 HP ProLiant2   Ò 162E)
(XEN) ACPI: FACP CF630240, 00F4 (r3 HP ProLiant2   Ò 162E)
(XEN) ACPI: DSDT CF630340, 20BD (r1 HP DSDT1 INTL 20030228)
(XEN) ACPI: FACS CF62F100, 0040
(XEN) ACPI: SPCR CF62F140, 0050 (r1 HP SPCRRBSU1   Ò 162E)
(XEN) ACPI: MCFG CF62F1C0, 003C (r1 HP ProLiant1 0)
(XEN) ACPI: HPET CF62F200, 0038 (r1 HP ProLiant2   Ò 162E)
(XEN) ACPI:  CF62F240, 0064 (r2 HP ProLiant2   Ò 162E)
(XEN) ACPI: SPMI CF62F2C0, 0040 (r5 HP ProLiant1   Ò 162E)
(XEN) ACPI: ERST CF62F300, 01D0 (r1 HP ProLiant1   Ò 162E)
(XEN) ACPI: APIC CF62F500, 015E (r1 HP ProLiant2 0)
(XEN) ACPI: SRAT CF62F680, 0570 (r1 HP Proliant1   Ò 162E)
(XEN) ACPI:  CF62FC00, 0176 (r1 HP ProLiant1   Ò 162E)
(XEN) ACPI: BERT CF62FD80, 0030 (r1 HP ProLiant1   Ò 162E)
(XEN) ACPI: HEST CF62FDC0, 00BC (r1 HP ProLiant1   Ò 162E)
(XEN) ACPI: DMAR CF62FE80, 0150 (r1 HP ProLiant1   Ò 162E)
(XEN) ACPI: SSDT CF632400, 0125 (r3 HP  CRSPCI02   HP1)
(XEN) ACPI: SSDT CF632540, 01CF (r3 HP  riser1a2 INTL 20061109)
(XEN) ACPI: SSDT CF632740, 03BB

Re: [Qemu-devel] [PATCH v6 22/22] hw/arm/virt: Enable dynamic generation of ACPI v5.1 tables

2015-05-08 Thread Shannon Zhao

On 2015/5/8 0:09, Peter Maydell wrote:
 On 7 May 2015 at 10:29, Shannon Zhao zhaoshengl...@huawei.com wrote:
 From: Shannon Zhao shannon.z...@linaro.org

 Expose the needed device information to the table generation
 insfrastructure and register a machine_init_done notify to
 
 infrastructure.
 
 call virt_acpi_build().

 Add CONFIG_ACPI to arm-softmmu.mak.

 Signed-off-by: Shannon Zhao zhaoshengl...@huawei.com
 Signed-off-by: Shannon Zhao shannon.z...@linaro.org
 ---
  default-configs/arm-softmmu.mak  |  1 +
  default-configs/i386-softmmu.mak |  3 ++
  default-configs/mips-softmmu.mak |  3 ++
  default-configs/mips64-softmmu.mak   |  3 ++
  default-configs/mips64el-softmmu.mak |  3 ++
  default-configs/mipsel-softmmu.mak   |  3 ++
  default-configs/x86_64-softmmu.mak   |  3 ++
  hw/acpi/Makefile.objs|  5 ++-
  hw/arm/virt.c| 78 
 
  hw/i2c/Makefile.objs |  2 +-
  10 files changed, 94 insertions(+), 10 deletions(-)

 diff --git a/default-configs/arm-softmmu.mak 
 b/default-configs/arm-softmmu.mak
 index a767e4b..74f1db3 100644
 --- a/default-configs/arm-softmmu.mak
 +++ b/default-configs/arm-softmmu.mak
 @@ -101,3 +101,4 @@ CONFIG_ALLWINNER_A10=y
  CONFIG_XIO3130=y
  CONFIG_IOH3420=y
  CONFIG_I82801B11=y
 +CONFIG_ACPI=y
 diff --git a/default-configs/i386-softmmu.mak 
 b/default-configs/i386-softmmu.mak
 index 6a74e00..d2de500 100644
 --- a/default-configs/i386-softmmu.mak
 +++ b/default-configs/i386-softmmu.mak
 @@ -15,6 +15,9 @@ CONFIG_PCSPK=y
  CONFIG_PCKBD=y
  CONFIG_FDC=y
  CONFIG_ACPI=y
 +CONFIG_ACPI_CORE=y
 +CONFIG_ACPI_MEMORY_HOTPLUG=y
 +CONFIG_ACPI_CPU_HOTPLUG=y
 
 This is splitting the basic CONFIG_ACPI into several pieces, right?
 I think that deserves its own patch.
 

Ok, will split this patch.

 What's the difference now between CONFIG_ACPI and CONFIG_ACPI_CORE ?
 

I didn't find a proper name. Maybe we should name it CONFIG_ACPI_x86
for core.o piix4.o ich9.o pcihp.o as these files are for x86.

  CONFIG_APM=y
  CONFIG_I8257=y
  CONFIG_IDE_ISA=y
 diff --git a/default-configs/mips-softmmu.mak 
 b/default-configs/mips-softmmu.mak
 index cce2c81..c96d42d 100644
 --- a/default-configs/mips-softmmu.mak
 +++ b/default-configs/mips-softmmu.mak
 @@ -15,6 +15,9 @@ CONFIG_PCSPK=y
  CONFIG_PCKBD=y
  CONFIG_FDC=y
  CONFIG_ACPI=y
 +CONFIG_ACPI_CORE=y
 +CONFIG_ACPI_MEMORY_HOTPLUG=y
 +CONFIG_ACPI_CPU_HOTPLUG=y
  CONFIG_APM=y
  CONFIG_I8257=y
  CONFIG_PIIX4=y
 diff --git a/default-configs/mips64-softmmu.mak 
 b/default-configs/mips64-softmmu.mak
 index 7a88a08..d229f9e 100644
 --- a/default-configs/mips64-softmmu.mak
 +++ b/default-configs/mips64-softmmu.mak
 @@ -15,6 +15,9 @@ CONFIG_PCSPK=y
  CONFIG_PCKBD=y
  CONFIG_FDC=y
  CONFIG_ACPI=y
 +CONFIG_ACPI_CORE=y
 +CONFIG_ACPI_MEMORY_HOTPLUG=y
 +CONFIG_ACPI_CPU_HOTPLUG=y
  CONFIG_APM=y
  CONFIG_I8257=y
  CONFIG_PIIX4=y
 diff --git a/default-configs/mips64el-softmmu.mak 
 b/default-configs/mips64el-softmmu.mak
 index 095de43..ea31b8b 100644
 --- a/default-configs/mips64el-softmmu.mak
 +++ b/default-configs/mips64el-softmmu.mak
 @@ -15,6 +15,9 @@ CONFIG_PCSPK=y
  CONFIG_PCKBD=y
  CONFIG_FDC=y
  CONFIG_ACPI=y
 +CONFIG_ACPI_CORE=y
 +CONFIG_ACPI_MEMORY_HOTPLUG=y
 +CONFIG_ACPI_CPU_HOTPLUG=y
  CONFIG_APM=y
  CONFIG_I8257=y
  CONFIG_PIIX4=y
 diff --git a/default-configs/mipsel-softmmu.mak 
 b/default-configs/mipsel-softmmu.mak
 index 0e25108..9a4462e 100644
 --- a/default-configs/mipsel-softmmu.mak
 +++ b/default-configs/mipsel-softmmu.mak
 @@ -15,6 +15,9 @@ CONFIG_PCSPK=y
  CONFIG_PCKBD=y
  CONFIG_FDC=y
  CONFIG_ACPI=y
 +CONFIG_ACPI_CORE=y
 +CONFIG_ACPI_MEMORY_HOTPLUG=y
 +CONFIG_ACPI_CPU_HOTPLUG=y
  CONFIG_APM=y
  CONFIG_I8257=y
  CONFIG_PIIX4=y
 diff --git a/default-configs/x86_64-softmmu.mak 
 b/default-configs/x86_64-softmmu.mak
 index 46b87dd..11019b6 100644
 --- a/default-configs/x86_64-softmmu.mak
 +++ b/default-configs/x86_64-softmmu.mak
 @@ -15,6 +15,9 @@ CONFIG_PCSPK=y
  CONFIG_PCKBD=y
  CONFIG_FDC=y
  CONFIG_ACPI=y
 +CONFIG_ACPI_CORE=y
 +CONFIG_ACPI_MEMORY_HOTPLUG=y
 +CONFIG_ACPI_CPU_HOTPLUG=y
  CONFIG_APM=y
  CONFIG_I8257=y
  CONFIG_IDE_ISA=y
 diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
 index b9fefa7..511771a 100644
 --- a/hw/acpi/Makefile.objs
 +++ b/hw/acpi/Makefile.objs
 @@ -1,5 +1,6 @@
 -common-obj-$(CONFIG_ACPI) += core.o piix4.o ich9.o pcihp.o cpu_hotplug.o
 -common-obj-$(CONFIG_ACPI) += memory_hotplug.o
 +common-obj-$(CONFIG_ACPI_CORE) += core.o piix4.o ich9.o pcihp.o
 
 Why is x86 specific stuff like piix4 in the CONFIG_ACPI_CORE set?
 
 +common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 +common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
  common-obj-$(CONFIG_ACPI) += acpi_interface.o
  common-obj-$(CONFIG_ACPI) += bios-linker-loader.o
  common-obj-$(CONFIG_ACPI) += aml-build.o
 diff --git a/hw/arm/virt.c b/hw/arm/virt.c
 index 565f573..9291045 100644
 --- a/hw/arm/virt.c
 +++ b/hw/arm/virt.c
 @@ -43,6 +43,7 @@

Re: [Qemu-devel] [PATCH] qemu-nbd: only send a limited number of errno codes on the wire

2015-05-08 Thread Paolo Bonzini



On 08/05/2015 08:45, Markus Armbruster wrote:
 Luckily, most common error values are more or less universal: in
 particular, of all errno values = 34 (up to ERANGE), they are all
 the same on supported platforms except for 11 (which is EAGAIN on
 Windows and Linux, but EDEADLK on Darwin and the *BSDs).
 
 Can EAGAIN or EDEADLK happen?  I don't know is an acceptable answer :)

No.  Even stuff like EFAULT would be a bug.

 So, in order to guarantee some portability, only keep a dozen
 possible error codes and squash everything else to EINVAL.
 
 Ugh.  I guess it'll do.
 
 Cleaner solution: Fix the protocol to transmit EPERM, EIO, ... in
 addition to 1, 5, ...

Why?  It's a binary protocol after all.  But I agree that the right
fix without backwards-compatibility would be to make the errors
something like 0x8000 to 0x8004.

Paolo

 If backward compatibility is not an issue: s/in addition to/instead of/.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  nbd.c | 47 +++
  1 file changed, 47 insertions(+)

 diff --git a/nbd.c b/nbd.c
 index eea8c51..1ad5b66 100644
 --- a/nbd.c
 +++ b/nbd.c
 @@ -86,6 +86,37 @@
  #define NBD_OPT_ABORT   (2)
  #define NBD_OPT_LIST(3)
  
 +/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
 + * but only a limited set of errno values is specified in the protocol.
 + * Everything else is squashed to EINVAL.
 + */
 
 Is the protocol defined anywhere?
 
 +static int system_errno_to_nbd_errno(int err)
 +{
 +switch (err) {
 +case EPERM:
 +return 1;
 +case EIO:
 +return 5;
 +case ENXIO:
 +return 6;
 +case E2BIG:
 +return 7;
 +case ENOMEM:
 +return 12;
 +case EACCES:
 +return 13;
 +case EFBIG:
 +return 27;
 +case ENOSPC:
 +return 28;
 +case EROFS:
 +return 30;
 +case EINVAL:
 +default:
 +return 22;
 +}
 +}
 +
 
 This maps recognized OS errnos to NBD errnos.  The latter are literals.
 
  /* Definitions for opaque data types */
  
  typedef struct NBDRequest NBDRequest;
 @@ -856,6 +887,20 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply 
 *reply)
  reply-error  = be32_to_cpup((uint32_t*)(buf + 4));
  reply-handle = be64_to_cpup((uint64_t*)(buf + 8));
  
 +/* NBD errors should be universally equal to the corresponding
 + * errno values, check it here.
 + */
 +QEMU_BUILD_BUG_ON(EPERM != 1);
 +QEMU_BUILD_BUG_ON(EIO != 5);
 +QEMU_BUILD_BUG_ON(ENXIO != 6);
 +QEMU_BUILD_BUG_ON(E2BIG != 7);
 +QEMU_BUILD_BUG_ON(ENOMEM != 12);
 +QEMU_BUILD_BUG_ON(EACCES != 13);
 +QEMU_BUILD_BUG_ON(EINVAL != 22);
 +QEMU_BUILD_BUG_ON(EFBIG != 27);
 +QEMU_BUILD_BUG_ON(ENOSPC != 28);
 +QEMU_BUILD_BUG_ON(EROFS != 30);
 +
 
 This checks that the mapping above is the identify function for all the
 recognized NBD errnos.  Why is that necessary?
 
 Same literals as above.  Violates DRY.  I don't mind all that much, but
 wonder whether we could at least do the checking next to
 system_errno_to_nbd_errno().
 
  TRACE(Got reply: 
{ magic = 0x%x, .error = %d, handle = % PRIu64 },
magic, reply-error, reply-handle);
 @@ -872,6 +917,8 @@ static ssize_t nbd_send_reply(int csock, struct 
 nbd_reply *reply)
  uint8_t buf[NBD_REPLY_SIZE];
  ssize_t ret;
  
 +reply-error = system_errno_to_nbd_errno(reply-error);
 +
  /* Reply
 [ 0 ..  3]magic   (NBD_REPLY_MAGIC)
 [ 4 ..  7]error   (0 == no error)

Re: [Qemu-devel] [PATCH] Fix default CPU model for ARM64

2015-05-08 Thread Paolo Bonzini



On 08/05/2015 09:14, Pavel Fedin wrote:
  Hello!
 
 FWIW virt-manager 1.2.0 (just released) will do the following when creating a
 new VM:

 - aarch64 + kvm : -cpu host
 - aarch64 + tcg : -cpu cortex-a57
 - arm32 + kvm : -cpu host
 - arm32 + tcg : defer to qemu

 Though if you explicitly request 'hypervisor default' then we won't specify
 any -cpu and defer to qemu, which will hit the cortex-a15 default for aarch64
 
  virt-manager is not the only tool to create VMs...
  But, okay. Seems you just don't want to change this. Well... I still don't 
 agree and this
 default looks strange for me, but it's just me. I'm out of further arguments.

Actually I agree with your patch.  It's the same that x86 does.

Paolo

Re: [Qemu-devel] [Qemu-block] [PATCH] qcow2: do lazy allocation of the L2 cache

2015-05-08 Thread Alberto Garcia

On Fri 24 Apr 2015 03:04:06 PM CEST, Kevin Wolf kw...@redhat.com wrote:

  I think it would be nice to have a way to free unused cache
  entries after a while.
 
  Do you think mmap plus a periodic timer would work?
 
  I'm hesitant about changes like this because they make QEMU more
  complex, slow down the guest, and make the memory footprint
  volatile.  But if a simple solution addresses the problem, then I'd
  be happy.
 
 I understand. One possible approach would be to use the timer only if
 the cache size requested by the user is large, so
 
  1) we only try to save memory when the amount of memory to save is
 significant.
  2) we don't make the default scenario slower.
 
 I'll try to see if I can come up with a good solution (and with more
 numbers).

 I suspect that at this point, Anthony would have reminded us about
 separation of policy and mechanism. The very least we need is some
 option to control the policy (probably defaulting to no automatic
 cache size changes).

There's also the problem that with a single chunk of memory for all
cache tables it's not so easy to free individual entries.

One possible solution would be madvise() with MADV_DONTNEED, and that's
rather simple to use, but I'm unsure about its portability.

 The other option would be to let the management tool change the cache
 size options at runtime (and as it happens, my current bdrv_reopen()
 overhaul allows just that - except that a QMP interface isn't planned
 yet), but I'm not sure if it has enough information to make good
 decisions. If we know what the criteria are, they could possibly be
 exposed, though.

But what would you do there? Recreate the whole cache?

Berto

[Qemu-devel] [PULL 4/8] virtio-ccw: change realization sequence

2015-05-08 Thread Cornelia Huck

virtio-ccw has an odd sequence of realizing devices: first the
device-specific relization (net, block, ...), then the generic
realization. It feels less odd to have the generic realization
callback trigger the device-specific realization instead (and this
also matches what virtio-pci does).

One thing to note: We need to defer initializing the cu model in the
sense id data until after the device-specific realization has been
performed, as we need to refer to the virtio device's device_id.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
Message-Id: 1429627016-30656-2-git-send-email-cornelia.h...@de.ibm.com
---
 hw/s390x/virtio-ccw.c | 41 +
 1 file changed, 17 insertions(+), 24 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index c1d8288..1c2bd9d 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -642,8 +642,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 return ret;
 }
 
-static void virtio_ccw_device_realize(VirtioCcwDevice *dev,
-  VirtIODevice *vdev, Error **errp)
+static void virtio_ccw_device_realize(VirtioCcwDevice *dev, Error **errp)
 {
 unsigned int cssid = 0;
 unsigned int ssid = 0;
@@ -654,6 +653,9 @@ static void virtio_ccw_device_realize(VirtioCcwDevice *dev,
 SubchDev *sch;
 int num;
 DeviceState *parent = DEVICE(dev);
+Error *err = NULL;
+VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
+VirtIODevice *vdev;
 
 sch = g_malloc0(sizeof(SubchDev));
 
@@ -766,6 +768,18 @@ static void virtio_ccw_device_realize(VirtioCcwDevice *dev,
 memset(sch-id, 0, sizeof(SenseId));
 sch-id.reserved = 0xff;
 sch-id.cu_type = VIRTIO_CCW_CU_TYPE;
+
+if (k-realize) {
+k-realize(dev, err);
+}
+if (err) {
+error_propagate(errp, err);
+css_subch_assign(cssid, ssid, schid, devno, NULL);
+goto out_err;
+}
+
+/* device_id is only set after vdev has been realized */
+vdev = virtio_ccw_get_vdev(sch);
 sch-id.cu_model = vdev-device_id;
 
 /* Only the first 32 feature bits are used. */
@@ -813,10 +827,7 @@ static void virtio_ccw_net_realize(VirtioCcwDevice 
*ccw_dev, Error **errp)
 object_property_set_bool(OBJECT(vdev), true, realized, err);
 if (err) {
 error_propagate(errp, err);
-return;
 }
-
-virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp);
 }
 
 static void virtio_ccw_net_instance_init(Object *obj)
@@ -839,10 +850,7 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice 
*ccw_dev, Error **errp)
 object_property_set_bool(OBJECT(vdev), true, realized, err);
 if (err) {
 error_propagate(errp, err);
-return;
 }
-
-virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp);
 }
 
 static void virtio_ccw_blk_instance_init(Object *obj)
@@ -879,10 +887,7 @@ static void virtio_ccw_serial_realize(VirtioCcwDevice 
*ccw_dev, Error **errp)
 object_property_set_bool(OBJECT(vdev), true, realized, err);
 if (err) {
 error_propagate(errp, err);
-return;
 }
-
-virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp);
 }
 
 
@@ -904,10 +909,7 @@ static void virtio_ccw_balloon_realize(VirtioCcwDevice 
*ccw_dev, Error **errp)
 object_property_set_bool(OBJECT(vdev), true, realized, err);
 if (err) {
 error_propagate(errp, err);
-return;
 }
-
-virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp);
 }
 
 static void balloon_ccw_stats_get_all(Object *obj, struct Visitor *v,
@@ -972,10 +974,7 @@ static void virtio_ccw_scsi_realize(VirtioCcwDevice 
*ccw_dev, Error **errp)
 object_property_set_bool(OBJECT(vdev), true, realized, err);
 if (err) {
 error_propagate(errp, err);
-return;
 }
-
-virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp);
 }
 
 static void virtio_ccw_scsi_instance_init(Object *obj)
@@ -999,10 +998,7 @@ static void vhost_ccw_scsi_realize(VirtioCcwDevice 
*ccw_dev, Error **errp)
 object_property_set_bool(OBJECT(vdev), true, realized, err);
 if (err) {
 error_propagate(errp, err);
-return;
 }
-
-virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp);
 }
 
 static void vhost_ccw_scsi_instance_init(Object *obj)
@@ -1030,8 +1026,6 @@ static void virtio_ccw_rng_realize(VirtioCcwDevice 
*ccw_dev, Error **errp)
 object_property_set_link(OBJECT(dev),
  OBJECT(dev-vdev.conf.rng), rng,
  NULL);
-
-virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp);
 }
 
 /* DeviceState to VirtioCcwDevice. Note: used on datapath,
@@ -1640,10 +1634,9 @@ static const TypeInfo virtio_ccw_rng = {
 static void virtio_ccw_busdev_realize(DeviceState *dev, Error **errp)
 {
 VirtioCcwDevice *_dev = (VirtioCcwDevice *)dev;
-VirtIOCCWDeviceClass *_info = VIRTIO_CCW_DEVICE_GET_CLASS(dev);

[Qemu-devel] [PULL 8/8] s390x/kvm: migrate vcpu interrupt state

2015-05-08 Thread Cornelia Huck

From: Jens Freimann jf...@linux.vnet.ibm.com

This patch adds support to migrate vcpu interrupts.
We use ioctl KVM_S390_GET_IRQ_STATE and _SET_IRQ_STATE
to get/set the complete interrupt state for a vcpu.

Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Signed-off-by: Jens Freimann jf...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 target-s390x/cpu-qom.h |  3 +++
 target-s390x/cpu.c |  1 +
 target-s390x/cpu.h |  9 +
 target-s390x/kvm.c | 55 ++
 target-s390x/machine.c | 15 +-
 5 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/target-s390x/cpu-qom.h b/target-s390x/cpu-qom.h
index 8b376df..936ae21 100644
--- a/target-s390x/cpu-qom.h
+++ b/target-s390x/cpu-qom.h
@@ -66,6 +66,9 @@ typedef struct S390CPU {
 /* public */
 
 CPUS390XState env;
+/* needed for live migration */
+void *irqstate;
+uint32_t irqstate_saved_size;
 } S390CPU;
 
 static inline S390CPU *s390_env_get_cpu(CPUS390XState *env)
diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
index e0537fa..d2f9836 100644
--- a/target-s390x/cpu.c
+++ b/target-s390x/cpu.c
@@ -213,6 +213,7 @@ static void s390_cpu_finalize(Object *obj)
 S390CPU *cpu = S390_CPU(obj);
 
 qemu_unregister_reset(s390_cpu_machine_reset_cb, cpu);
+g_free(cpu-irqstate);
 #endif
 }
 
diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index ba7d250..c557211 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -1079,6 +1079,8 @@ void kvm_s390_clear_cmma_callback(void *opaque);
 int kvm_s390_set_cpu_state(S390CPU *cpu, uint8_t cpu_state);
 void kvm_s390_reset_vcpu(S390CPU *cpu);
 int kvm_s390_set_mem_limit(KVMState *s, uint64_t new_limit, uint64_t 
*hw_limit);
+void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu);
+int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu);
 #else
 static inline void kvm_s390_io_interrupt(uint16_t subchannel_id,
 uint16_t subchannel_nr,
@@ -1121,6 +1123,13 @@ static inline int kvm_s390_set_mem_limit(KVMState *s, 
uint64_t new_limit,
 {
 return 0;
 }
+static inline void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu)
+{
+}
+static inline int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu)
+{
+return 0;
+}
 #endif
 
 static inline int s390_set_memory_limit(uint64_t new_limit, uint64_t *hw_limit)
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 43ad009..aba1265 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -109,6 +109,14 @@
 #define ICPT_CPU_STOP   0x28
 #define ICPT_IO 0x40
 
+#define NR_LOCAL_IRQS 32
+/*
+ * Needs to be big enough to contain max_cpus emergency signals
+ * and in addition NR_LOCAL_IRQS interrupts
+ */
+#define VCPU_IRQ_BUF_SIZE (sizeof(struct kvm_s390_irq) * \
+   (max_cpus + NR_LOCAL_IRQS))
+
 static CPUWatchpoint hw_watchpoint;
 /*
  * We don't use a list because this structure is also used to transmit the
@@ -274,6 +282,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 {
 S390CPU *cpu = S390_CPU(cs);
 kvm_s390_set_cpu_state(cpu, cpu-env.cpu_state);
+cpu-irqstate = g_malloc0(VCPU_IRQ_BUF_SIZE);
 return 0;
 }
 
@@ -2059,6 +2068,52 @@ int kvm_s390_set_cpu_state(S390CPU *cpu, uint8_t 
cpu_state)
 return ret;
 }
 
+void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu)
+{
+struct kvm_s390_irq_state irq_state;
+CPUState *cs = CPU(cpu);
+int32_t bytes;
+
+if (!kvm_check_extension(kvm_state, KVM_CAP_S390_IRQ_STATE)) {
+return;
+}
+
+irq_state.buf = (uint64_t) cpu-irqstate;
+irq_state.len = VCPU_IRQ_BUF_SIZE;
+
+bytes = kvm_vcpu_ioctl(cs, KVM_S390_GET_IRQ_STATE, irq_state);
+if (bytes  0) {
+cpu-irqstate_saved_size = 0;
+error_report(Migration of interrupt state failed);
+return;
+}
+
+cpu-irqstate_saved_size = bytes;
+}
+
+int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu)
+{
+CPUState *cs = CPU(cpu);
+struct kvm_s390_irq_state irq_state;
+int r;
+
+if (!kvm_check_extension(kvm_state, KVM_CAP_S390_IRQ_STATE)) {
+return -ENOSYS;
+}
+
+if (cpu-irqstate_saved_size == 0) {
+return 0;
+}
+irq_state.buf = (uint64_t) cpu-irqstate;
+irq_state.len = cpu-irqstate_saved_size;
+
+r = kvm_vcpu_ioctl(cs, KVM_S390_SET_IRQ_STATE, irq_state);
+if (r) {
+error_report(Setting interrupt state failed %d, r);
+}
+return r;
+}
+
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
   uint64_t address, uint32_t data)
 {
diff --git a/target-s390x/machine.c b/target-s390x/machine.c
index a034423..7853e3c 100644
--- a/target-s390x/machine.c
+++ b/target-s390x/machine.c
@@ -28,10 +28,19 @@ static int cpu_post_load(void *opaque, int version_id)
  */
 if (kvm_enabled()) {
 kvm_s390_set_cpu_state(cpu, cpu-env.cpu_state);
+return

[Qemu-devel] [PULL 3/8] s390-virtio: clear {used, avail}_event_idx on reset as well

2015-05-08 Thread Cornelia Huck

From: Christian Borntraeger borntrae...@de.ibm.com

The old s390-virtio transport clears the vring used/avail indices in
the shared area on reset. When we enabled event_idx for virtio-blk, we
noticed that this is not enough: We also need to clear the published
used/avail event indices, or reboot will fail.

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/s390x/s390-virtio-bus.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
index 4a33c6a..4f69cbb 100644
--- a/hw/s390x/s390-virtio-bus.c
+++ b/hw/s390x/s390-virtio-bus.c
@@ -77,10 +77,18 @@ void s390_virtio_reset_idx(VirtIOS390Device *dev)
 VIRTIO_VRING_AVAIL_IDX_OFFS;
 address_space_stw(address_space_memory, idx_addr, 0,
   MEMTXATTRS_UNSPECIFIED, NULL);
+idx_addr = virtio_queue_get_avail_addr(dev-vdev, i) +
+virtio_queue_get_avail_size(dev-vdev, i);
+address_space_stw(address_space_memory, idx_addr, 0,
+  MEMTXATTRS_UNSPECIFIED, NULL);
 idx_addr = virtio_queue_get_used_addr(dev-vdev, i) +
 VIRTIO_VRING_USED_IDX_OFFS;
 address_space_stw(address_space_memory, idx_addr, 0,
   MEMTXATTRS_UNSPECIFIED, NULL);
+idx_addr = virtio_queue_get_used_addr(dev-vdev, i) +
+virtio_queue_get_used_size(dev-vdev, i);
+address_space_stw(address_space_memory, idx_addr, 0,
+  MEMTXATTRS_UNSPECIFIED, NULL);
 }
 }
 
-- 
2.4.0

[Qemu-devel] [PATCH COLO v4 11/15] Add new block driver interfaces to control block replication

2015-05-08 Thread Wen Congyang

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
Cc: Luiz Capitulino lcapitul...@redhat.com
Cc: Michael Roth mdr...@linux.vnet.ibm.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
---
 block.c   | 40 
 include/block/block.h |  5 +
 include/block/block_int.h | 14 ++
 qapi/block.json   | 16 
 4 files changed, 75 insertions(+)

diff --git a/block.c b/block.c
index 050f919..54d5b29 100644
--- a/block.c
+++ b/block.c
@@ -4213,3 +4213,43 @@ void bdrv_disconnect(BlockDriverState *bs)
 bdrv_disconnect(bs-file);
 }
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp)
+{
+BlockDriver *drv = bs-drv;
+
+if (drv  drv-bdrv_start_replication) {
+drv-bdrv_start_replication(bs, mode, errp);
+} else if (bs-file) {
+bdrv_start_replication(bs-file, mode, errp);
+} else {
+error_setg(errp, this feature or command is not currently supported);
+}
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs-drv;
+
+if (drv  drv-bdrv_do_checkpoint) {
+drv-bdrv_do_checkpoint(bs, errp);
+} else if (bs-file) {
+bdrv_do_checkpoint(bs-file, errp);
+} else {
+error_setg(errp, this feature or command is not currently supported);
+}
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+BlockDriver *drv = bs-drv;
+
+if (drv  drv-bdrv_stop_replication) {
+drv-bdrv_stop_replication(bs, failover, errp);
+} else if (bs-file) {
+bdrv_stop_replication(bs-file, failover, errp);
+} else {
+error_setg(errp, this feature or command is not currently supported);
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index ed5b8e5..c98adfb 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -596,4 +596,9 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 void bdrv_connect(BlockDriverState *bs, Error **errp);
 void bdrv_disconnect(BlockDriverState *bs);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index da29fa7..761e6ff 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -293,6 +293,20 @@ struct BlockDriver {
 void (*bdrv_connect)(BlockDriverState *bs, Error **errp);
 void (*bdrv_disconnect)(BlockDriverState *bs);
 
+void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+   Error **errp);
+/* Drop Disk buffer when doing checkpoint. */
+void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+/*
+ * After failover, we should flush Disk buffer into secondary disk
+ * and stop block replication.
+ *
+ * If the guest is shutdown, we should drop Disk buffer and stop
+ * block representation.
+ */
+void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+  Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block.json b/qapi/block.json
index e313465..56e469d 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -40,6 +40,22 @@
   'data': ['auto', 'none', 'lba', 'large', 'rechs']}
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @unprotected: Replication is not started or after failover.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.4
+##
+{ 'enum' : 'ReplicationMode',
+  'data' : ['unprotected', 'primary', 'secondary']}
+
+##
 # @BlockdevSnapshotInternal
 #
 # @device: the name of the device to generate the snapshot from
-- 
2.1.0

[Qemu-devel] [PATCH COLO v4 06/15] Don't allow a disk use backing reference target

2015-05-08 Thread Wen Congyang

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/block.c b/block.c
index 35e1a95..15d21da 100644
--- a/block.c
+++ b/block.c
@@ -1204,6 +1204,14 @@ static int 
bdrv_open_backing_reference_file(BlockDriverState *bs,
 }
 
 backing_hd = blk_bs(backing_blk);
+/* Don't allow a disk use backing reference target */
+ret = blk_attach_dev(backing_hd-blk, bs);
+if (ret  0) {
+error_setg(errp, backing_hd %s is used by the other device model,
+   backing_name);
+goto free_exit;
+}
+
 /* Backing reference itself? */
 if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
 error_setg(errp, Backing reference itself);
@@ -1887,6 +1895,7 @@ void bdrv_close(BlockDriverState *bs)
 if (backing_hd-backing_hd-job) {
 block_job_cancel(backing_hd-backing_hd-job);
 }
+blk_detach_dev(backing_hd-backing_hd-blk, bs);
 bdrv_set_backing_hd(backing_hd, NULL);
 bdrv_unref(backing_hd-backing_hd);
 }
-- 
2.1.0

[Qemu-devel] [PATCH COLO v4 02/15] allow writing to the backing file

2015-05-08 Thread Wen Congyang

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 49d472f..8fa4f05 100644
--- a/block.c
+++ b/block.c
@@ -1051,6 +1051,20 @@ out:
 bdrv_refresh_limits(bs, NULL);
 }
 
+#define ALLOW_WRITE_BACKING_FILEallow-write-backing-file
+static QemuOptsList backing_file_opts = {
+.name = backing_file,
+.head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
+.desc = {
+{
+.name = ALLOW_WRITE_BACKING_FILE,
+.type = QEMU_OPT_BOOL,
+.help = allow write to backing file,
+},
+{ /* end of list */ }
+},
+};
+
 /*
  * Opens the backing file for a BlockDriverState if not yet open
  *
@@ -1065,6 +1079,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*options, Error **errp)
 int ret = 0;
 BlockDriverState *backing_hd;
 Error *local_err = NULL;
+QemuOpts *opts = NULL;
+int flags;
 
 if (bs-backing_hd != NULL) {
 QDECREF(options);
@@ -1077,6 +1093,19 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*options, Error **errp)
 }
 
 bs-open_flags = ~BDRV_O_NO_BACKING;
+flags = bdrv_backing_flags(bs-open_flags);
+
+opts = qemu_opts_create(backing_file_opts, NULL, 0, error_abort);
+qemu_opts_absorb_qdict(opts, options, local_err);
+if (local_err) {
+ret = -EINVAL;
+goto free_exit;
+}
+if (qemu_opt_get_bool(opts, ALLOW_WRITE_BACKING_FILE, false)) {
+flags |= BDRV_O_RDWR;
+}
+qemu_opts_del(opts);
+
 if (qdict_haskey(options, file.filename)) {
 backing_filename[0] = '\0';
 } else if (bs-backing_file[0] == '\0'  qdict_size(options) == 0) {
@@ -1109,7 +1138,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*options, Error **errp)
 assert(bs-backing_hd == NULL);
 ret = bdrv_open(backing_hd,
 *backing_filename ? backing_filename : NULL, NULL, options,
-bdrv_backing_flags(bs-open_flags), NULL, local_err);
+flags, NULL, local_err);
 if (ret  0) {
 bdrv_unref(backing_hd);
 backing_hd = NULL;
-- 
2.1.0

[Qemu-devel] [PATCH COLO v4 14/15] quorum: allow ignoring child errors

2015-05-08 Thread Wen Congyang

If the child is not ready, read/write/getlength/flush will
return -errno. It is not critical error, and can be ignored:
1. read/write:
   Just not report the error event.
2. getlength:
   just ignore it. If all children's getlength return -errno,
   and be ignored, return -EIO.
3. flush:
   Just ignore it. If all children's getlength return -errno,
   and be ignored, return 0.

Usage: children.x.ignore-errors=true

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block/quorum.c | 64 +++---
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index e3c2e46..425459e 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -30,6 +30,7 @@
 #define QUORUM_OPT_BLKVERIFY  blkverify
 #define QUORUM_OPT_REWRITErewrite-corrupted
 #define QUORUM_OPT_READ_PATTERN   read-pattern
+#define QUORUM_CHILDREN_OPT_IGNORE_ERRORS   ignore-errors
 
 /* This union holds a vote hash value */
 typedef union QuorumVoteValue {
@@ -65,6 +66,7 @@ typedef struct QuorumVotes {
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
 BlockDriverState **bs; /* children BlockDriverStates */
+bool *ignore_errors;   /* ignore children's error? */
 int num_children;  /* children count */
 int threshold; /* if less than threshold children reads gave the
 * same result a quorum error occurs.
@@ -99,6 +101,7 @@ typedef struct QuorumChildRequest {
 uint8_t *buf;
 int ret;
 QuorumAIOCB *parent;
+int index;
 } QuorumChildRequest;
 
 /* Quorum will use the following structure to track progress of each read/write
@@ -211,6 +214,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
 acb-qcrs[i].buf = NULL;
 acb-qcrs[i].ret = 0;
 acb-qcrs[i].parent = acb;
+acb-qcrs[i].index = i;
 }
 
 return acb;
@@ -304,7 +308,7 @@ static void quorum_aio_cb(void *opaque, int ret)
 acb-count++;
 if (ret == 0) {
 acb-success_count++;
-} else {
+} else if (!s-ignore_errors[sacb-index]) {
 quorum_report_bad(acb, sacb-aiocb-bs-node_name, ret);
 }
 assert(acb-count = s-num_children);
@@ -719,19 +723,31 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
 static int64_t quorum_getlength(BlockDriverState *bs)
 {
 BDRVQuorumState *s = bs-opaque;
-int64_t result;
+int64_t result = -EIO;
 int i;
 
 /* check that all file have the same length */
-result = bdrv_getlength(s-bs[0]);
-if (result  0) {
-return result;
-}
-for (i = 1; i  s-num_children; i++) {
+for (i = 0; i  s-num_children; i++) {
 int64_t value = bdrv_getlength(s-bs[i]);
+
 if (value  0) {
 return value;
 }
+
+if (value == 0  s-ignore_errors[i]) {
+/*
+ * If the child is not ready, it cannot return -errno,
+ * otherwise refresh_total_sectors() will fail when
+ * we open the child.
+ */
+continue;
+}
+
+if (result == -EIO) {
+result = value;
+continue;
+}
+
 if (value != result) {
 return -EIO;
 }
@@ -769,6 +785,9 @@ static coroutine_fn int quorum_co_flush(BlockDriverState 
*bs)
 
 for (i = 0; i  s-num_children; i++) {
 result = bdrv_co_flush(s-bs[i]);
+if (result  0  s-ignore_errors[i]) {
+result = 0;
+}
 result_value.l = result;
 quorum_count_vote(error_votes, result_value, i);
 }
@@ -843,6 +862,19 @@ static QemuOptsList quorum_runtime_opts = {
 },
 };
 
+static QemuOptsList quorum_children_common_opts = {
+.name = quorum children,
+.head = QTAILQ_HEAD_INITIALIZER(quorum_children_common_opts.head),
+.desc = {
+{
+.name = QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
+.type = QEMU_OPT_BOOL,
+.help = ignore child I/O error,
+},
+{ /* end of list */ }
+},
+};
+
 static int parse_read_pattern(const char *opt)
 {
 int i;
@@ -938,11 +970,14 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 /* allocate the children BlockDriverState array */
 s-bs = g_new0(BlockDriverState *, s-num_children);
 opened = g_new0(bool, s-num_children);
+s-ignore_errors = g_new0(bool, s-num_children);
 
 for (i = 0, lentry = qlist_first(list); lentry;
  lentry = qlist_next(lentry), i++) {
 QDict *d;
 QString *string;
+QemuOpts *children_opts = NULL;
+bool value;
 
 switch (qobject_type(lentry-value))
 {
@@ -950,6 +985,20 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 case QTYPE_QDICT:
 d =

[Qemu-devel] [PATCH COLO v4 12/15] skip nbd_target when starting block replication

2015-05-08 Thread Wen Congyang

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block.c b/block.c
index 54d5b29..a442b5f 100644
--- a/block.c
+++ b/block.c
@@ -4219,6 +4219,10 @@ void bdrv_start_replication(BlockDriverState *bs, 
ReplicationMode mode,
 {
 BlockDriver *drv = bs-drv;
 
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+return;
+}
+
 if (drv  drv-bdrv_start_replication) {
 drv-bdrv_start_replication(bs, mode, errp);
 } else if (bs-file) {
@@ -4232,6 +4236,10 @@ void bdrv_do_checkpoint(BlockDriverState *bs, Error 
**errp)
 {
 BlockDriver *drv = bs-drv;
 
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+return;
+}
+
 if (drv  drv-bdrv_do_checkpoint) {
 drv-bdrv_do_checkpoint(bs, errp);
 } else if (bs-file) {
@@ -4245,6 +4253,10 @@ void bdrv_stop_replication(BlockDriverState *bs, bool 
failover, Error **errp)
 {
 BlockDriver *drv = bs-drv;
 
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+return;
+}
+
 if (drv  drv-bdrv_stop_replication) {
 drv-bdrv_stop_replication(bs, failover, errp);
 } else if (bs-file) {
-- 
2.1.0

Re: [Qemu-devel] [PATCH v1 1/1] hw/ptimer: Do not artificially limit timers when using icount

2015-05-08 Thread Edgar E. Iglesias

On Mon, May 04, 2015 at 11:27:36AM +0200, Paolo Bonzini wrote:
 
 
 On 04/05/2015 05:18, Edgar E. Iglesias wrote:
  From: Edgar E. Iglesias edgar.igles...@xilinx.com
  
  Signed-off-by: Edgar E. Iglesias edgar.igles...@xilinx.com
  ---
   hw/core/ptimer.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
  index 2abad1f..8437bd6 100644
  --- a/hw/core/ptimer.c
  +++ b/hw/core/ptimer.c
  @@ -189,7 +189,7 @@ void ptimer_set_limit(ptimer_state *s, uint64_t limit, 
  int reload)
* on the current generation of host machines.
*/
   
  -if (limit * s-period  1  s-period) {
  +if (!use_icount  limit * s-period  1  s-period) {
   limit = 1 / s-period;
   }
   
  
 
 Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Thanks, I've applied this one.

Cheers,
Edgar

Re: [Qemu-devel] [PULL 05/16] arch_init: Alloc and free data struct for compression

2015-05-08 Thread Juan Quintela

Paolo Bonzini pbonz...@redhat.com wrote:
 On 07/05/2015 18:46, Juan Quintela wrote:
  static CompressParam *comp_param;
  static QemuThread *compress_threads;
 +/* comp_done_cond is used to wake up the migration thread when
 + * one of the compression threads has finished the compression.
 + * comp_done_lock is used to co-work with comp_done_cond.
 + */
 +static QemuMutex *comp_done_lock;
 +static QemuCond *comp_done_cond;

 Can you please putr these (and the corresponding globals for
 decompression) in a struct?

Added to ToDo list if Liang don't de it first.

Later, Juan.

[Qemu-devel] [PATCH] Add virt-v3 machine that uses GIC-500

2015-05-08 Thread Pavel Fedin

 This is an alternative version of:
https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg00945.html. Code 
merged with
original virt implementation.
 I decided to stick to different machine name instead of adding an option 
because of
compatibility with libvirt. In order to make use of a new option we would have 
to teach
libvirt to pass it, which means extensive amount of patching. After all, there 
are
architectural differences between GIC versions, so GICv3-based machine deserves 
to have a
different name. I decided to name it virt-v3. Feel free to change it if you 
like, i'm
just out of fantasy.
 I have tested the implementation with kernel v4.1rc2 and 16 CPUs. Works 
perfectly fine,
no lockups were observed.

Signed-off-by: Pavel Fedin p.fe...@samsung.com
---
 hw/arm/virt.c | 145 --
 1 file changed, 120 insertions(+), 25 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 565f573..b1c0eca 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -66,6 +66,10 @@ enum {
 VIRT_CPUPERIPHS,
 VIRT_GIC_DIST,
 VIRT_GIC_CPU,
+VIRT_GIC_DIST_SPI = VIRT_GIC_CPU,
+VIRT_ITS_CONTROL,
+VIRT_ITS_TRANSLATION,
+VIRT_LPI,
 VIRT_UART,
 VIRT_MMIO,
 VIRT_RTC,
@@ -97,6 +101,7 @@ typedef struct {
 typedef struct {
 MachineState parent;
 bool secure;
+bool gicv3;
 } VirtMachineState;
 
 #define TYPE_VIRT_MACHINE   virt
@@ -107,6 +112,8 @@ typedef struct {
 #define VIRT_MACHINE_CLASS(klass) \
 OBJECT_CLASS_CHECK(VirtMachineClass, klass, TYPE_VIRT_MACHINE)
 
+#define TYPE_VIRTV3_MACHINE   virt-v3
+
 /* Addresses and sizes of our components.
  * 0..128MB is space for a flash device so we can run bootrom code such as 
UEFI.
  * 128MB..256MB is used for miscellaneous device I/O.
@@ -121,25 +128,29 @@ typedef struct {
  */
 static const MemMapEntry a15memmap[] = {
 /* Space up to 0x800 is reserved for a boot ROM */
-[VIRT_FLASH] =  {  0, 0x0800 },
-[VIRT_CPUPERIPHS] = { 0x0800, 0x0002 },
+[VIRT_FLASH] =   {  0, 0x0800 },
+[VIRT_CPUPERIPHS] =  { 0x0800, 0x0002 },
 /* GIC distributor and CPU interfaces sit inside the CPU peripheral space 
*/
-[VIRT_GIC_DIST] =   { 0x0800, 0x0001 },
-[VIRT_GIC_CPU] ={ 0x0801, 0x0001 },
-[VIRT_UART] =   { 0x0900, 0x1000 },
-[VIRT_RTC] ={ 0x0901, 0x1000 },
-[VIRT_FW_CFG] = { 0x0902, 0x000a },
-[VIRT_MMIO] =   { 0x0a00, 0x0200 },
+[VIRT_GIC_DIST] ={ 0x0800, 0x0001 },
+[VIRT_GIC_CPU] = { 0x0801, 0x0001 },
+/* On v3 VIRT_GIC_DIST_SPI takes place of VIRT_GIC_CPU */
+[VIRT_ITS_CONTROL] = { 0x0802, 0x1000 },
+[VIRT_ITS_TRANSLATION] = { 0x0803, 0x0001 },
+[VIRT_LPI] = { 0x0804, 0x0080 },
+[VIRT_UART] ={ 0x0900, 0x1000 },
+[VIRT_RTC] = { 0x0901, 0x1000 },
+[VIRT_FW_CFG] =  { 0x0902, 0x000a },
+[VIRT_MMIO] ={ 0x0a00, 0x0200 },
 /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
 /*
  * PCIE verbose map:
  *
- * MMIO window  { 0x1000, 0x2eff },
- * PIO window   { 0x3eff, 0x0001 },
- * ECAM { 0x3f00, 0x0100 },
+ * MMIO window   { 0x1000, 0x2eff },
+ * PIO window{ 0x3eff, 0x0001 },
+ * ECAM  { 0x3f00, 0x0100 },
  */
-[VIRT_PCIE] =   { 0x1000, 0x3000 },
-[VIRT_MEM] ={ 0x4000, 30ULL * 1024 * 1024 * 1024 },
+[VIRT_PCIE] ={ 0x1000, 0x3000 },
+[VIRT_MEM] = { 0x4000, 30ULL * 1024 * 1024 * 1024 },
 };
 
 static const int a15irqmap[] = {
@@ -299,11 +310,24 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
 {
 int cpu;
 
+/*
+ * From Documentation/devicetree/bindings/arm/cpus.txt
+ *  On ARM v8 64-bit systems value should be set to 2,
+ *  that corresponds to the MPIDR_EL1 register size.
+ *  If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
+ *  in the system, #address-cells can be set to 1, since
+ *  MPIDR_EL1[63:32] bits are not used for CPUs
+ *  identification.
+ *
+ *  Now GIC500 doesn't support affinities 2  3 so currently
+ *  #address-cells can stay 1 until future GIC
+ */
 qemu_fdt_add_subnode(vbi-fdt, /cpus);
 qemu_fdt_setprop_cell(vbi-fdt, /cpus, #address-cells, 0x1);
 qemu_fdt_setprop_cell(vbi-fdt, /cpus, #size-cells, 0x0);
 
 for (cpu = vbi-smp_cpus - 1; cpu = 0; cpu--) {
+int Aff1, Aff0;
 char *nodename = g_strdup_printf(/cpus/cpu@%d, cpu);
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
 
@@ -317,12 +341,20 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)

Re: [Qemu-devel] [PATCH] Fix default CPU model for ARM64

2015-05-08 Thread Pavel Fedin

 Hello!

 Ignoring legacy, the ideal naming scheme would be:
 
 qemu-system-arm
 aarch64-linux-user
 aarch32-linux-user

 You know, i rethought. Actually, the question concerns only virt machine, 
because for other real machines the CPU should be the same as on real HW, and 
this is no question. Shouldn't our default for virt be just host ? This 
would eliminate all ambiguity i believe...

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

[Qemu-devel] [PATCH COLO v4 09/15] Introduce a new -drive option to control whether to connect to remote target

2015-05-08 Thread Wen Congyang

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 blockdev.c| 8 
 include/block/block.h | 1 +
 qemu-options.hx   | 4 
 3 files changed, 13 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 5eaf77e..6839c8f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -429,6 +429,10 @@ static BlockBackend *blockdev_init(const char *file, QDict 
*bs_opts,
 qdict_put(bs_opts, driver, qstring_from_str(buf));
 }
 
+if (qemu_opt_get_bool(opts, no-connect, false)) {
+bdrv_flags |= BDRV_O_NO_CONNECT;
+}
+
 /* disk I/O throttling */
 memset(cfg, 0, sizeof(cfg));
 cfg.buckets[THROTTLE_BPS_TOTAL].avg =
@@ -3197,6 +3201,10 @@ QemuOptsList qemu_common_drive_opts = {
 .name = detect-zeroes,
 .type = QEMU_OPT_STRING,
 .help = try to optimize zero writes (off, on, unmap),
+},{
+.name = no-connect,
+.type = QEMU_OPT_BOOL,
+.help = enable whether to connect remote target
 },
 { /* end of list */ }
 },
diff --git a/include/block/block.h b/include/block/block.h
index e479a34..ed5b8e5 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -87,6 +87,7 @@ typedef struct HDGeometry {
 #define BDRV_O_PROTOCOL0x8000  /* if no block driver is explicitly given:
   select an appropriate protocol driver,
   ignoring the format layer */
+#define BDRV_O_NO_CONNECT  0x1 /* do not connect to remote target */
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
 
diff --git a/qemu-options.hx b/qemu-options.hx
index ec356f6..c41e3e3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -464,6 +464,7 @@ DEF(drive, HAS_ARG, QEMU_OPTION_drive,
[[,bps_max=bm]|[[,bps_rd_max=rm][,bps_wr_max=wm]]]\n
[[,iops_max=im]|[[,iops_rd_max=irm][,iops_wr_max=iwm]]]\n
[[,iops_size=is]]\n
+   [,no-connect=on|off]\n
 use 'file' as a drive image\n, QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
@@ -525,6 +526,9 @@ file sectors into the image file.
 conversion of plain zero writes by the OS to driver specific optimized
 zero write commands. You may even choose unmap if @var{discard} is set
 to unmap to allow a zero write to be converted to an UNMAP operation.
+@item no-connect=@var{no-connect}
+@var{no-connect} is on or off, and enables whether to connect to remote
+target when open the drive. The default value is off.
 @end table
 
 By default, the @option{cache=writeback} mode is used. It will report data
-- 
2.1.0

[Qemu-devel] [PATCH COLO v4 03/15] Allow creating backup jobs when opening BDS

2015-05-08 Thread Wen Congyang

When opening BDS, we need to create backup jobs for
image-fleecing.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
Cc: Jeff Cody jc...@redhat.com
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 0d8c2a4..8dd9b9c 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -21,10 +21,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs   := $(LIBISCSI_LIBS)
-- 
2.1.0

[Qemu-devel] [PATCH COLO v4 10/15] NBD client: connect to nbd server later

2015-05-08 Thread Wen Congyang

The secondary qemu starts later than the primary qemu, so we
cannot connect to nbd server in bdrv_open(). Introduce a new
open flags to control it.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block/nbd.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 0624232..e1a04bb 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -298,10 +298,12 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 return -EINVAL;
 }
 
-nbd_connect_server(bs, local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return -EINVAL;
+if (!(flags  BDRV_O_NO_CONNECT)) {
+nbd_connect_server(bs, local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
 }
 
 return 0;
-- 
2.1.0

[Qemu-devel] [PATCH COLO v4 13/15] quorum: implement block driver interfaces for block replication

2015-05-08 Thread Wen Congyang

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block/quorum.c | 78 ++
 1 file changed, 78 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index f91ef75..e3c2e46 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -82,6 +82,8 @@ typedef struct BDRVQuorumState {
 */
 
 QuorumReadPattern read_pattern;
+
+int replication_index; /* store which child supports block replication */
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -972,6 +974,7 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 g_free(opened);
+s-replication_index = -1;
 goto exit;
 
 close_exit:
@@ -1061,6 +1064,77 @@ static void quorum_refresh_filename(BlockDriverState *bs)
 bs-full_open_options = opts;
 }
 
+static void quorum_start_replication(BlockDriverState *bs, ReplicationMode 
mode,
+ Error **errp)
+{
+BDRVQuorumState *s = bs-opaque;
+int count = 0, i, index;
+Error *local_err = NULL;
+
+/*
+ * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary
+ * QEMU becoming primary QEMU.
+ */
+if (mode != REPLICATION_MODE_PRIMARY) {
+error_setg(errp, Invalid parameter '%s', mode);
+return;
+}
+
+if (s-read_pattern != QUORUM_READ_PATTERN_FIFO) {
+error_setg(errp, Invalid parameter '%s', read pattern);
+return;
+}
+
+for (i = 0; i  s-num_children; i++) {
+bdrv_start_replication(s-bs[i], mode, local_err);
+if (local_err) {
+error_free(local_err);
+local_err = NULL;
+} else {
+count++;
+index = i;
+}
+}
+
+if (count == 0) {
+/* No child supports block replication */
+error_setg(errp, this feature or command is not currently supported);
+} else if (count  1) {
+for (i = 0; i  s-num_children; i++) {
+bdrv_stop_replication(s-bs[i], false, NULL);
+}
+error_setg(errp, too many children support block replication);
+} else {
+s-replication_index = index;
+}
+}
+
+static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BDRVQuorumState *s = bs-opaque;
+
+if (s-replication_index  0) {
+error_setg(errp, Block replication is not started);
+return;
+}
+
+bdrv_do_checkpoint(s-bs[s-replication_index], errp);
+}
+
+static void quorum_stop_replication(BlockDriverState *bs, bool failover,
+Error **errp)
+{
+BDRVQuorumState *s = bs-opaque;
+
+if (s-replication_index  0) {
+error_setg(errp, Block replication is not started);
+return;
+}
+
+bdrv_stop_replication(s-bs[s-replication_index], failover, errp);
+s-replication_index = -1;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= quorum,
 .protocol_name  = quorum,
@@ -1084,6 +1158,10 @@ static BlockDriver bdrv_quorum = {
 
 .is_filter  = true,
 .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+.bdrv_start_replication = quorum_start_replication,
+.bdrv_do_checkpoint = quorum_do_checkpoint,
+.bdrv_stop_replication  = quorum_stop_replication,
 };
 
 static void bdrv_quorum_init(void)
-- 
2.1.0

Re: [Qemu-devel] [PATCH] Fix default CPU model for ARM64

2015-05-08 Thread Pavel Fedin

 Hello!

 FWIW virt-manager 1.2.0 (just released) will do the following when creating a
 new VM:
 
 - aarch64 + kvm : -cpu host
 - aarch64 + tcg : -cpu cortex-a57
 - arm32 + kvm : -cpu host
 - arm32 + tcg : defer to qemu
 
 Though if you explicitly request 'hypervisor default' then we won't specify
 any -cpu and defer to qemu, which will hit the cortex-a15 default for aarch64

 virt-manager is not the only tool to create VMs...
 But, okay. Seems you just don't want to change this. Well... I still don't 
agree and this
default looks strange for me, but it's just me. I'm out of further arguments.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/7] Fix transactional snapshot with virtio-blk dataplane

2015-05-08 Thread Stefan Hajnoczi

On Thu, May 07, 2015 at 02:43:43PM +0100, Stefan Hajnoczi wrote:
 On Wed, May 06, 2015 at 07:23:32PM +0800, Fam Zheng wrote:
  Reported by Paolo.
  
  Unlike the iohandler in main loop, iothreads currently process the event
  notifier used as virtio-blk ioeventfd in all nested aio_poll. This is 
  dangerous
  without proper protection, because guest requests could sneak to block layer
  where they mustn't.
  
  For example, a QMP transaction may involve multiple bdrv_drain_all() in
  handling the list of AioContext it works on. If an aio_poll in one of the
  bdrv_drain_all() happens to process a guest VQ kick by dispatching the
  ioeventfd event, a new guest write is then submitted, and voila, the
  transaction semantics is violated.
  
  This series avoids this problem by disabling virtio-blk handlers during
  bdrv_drain_all() and transactions.
  
  Notes:
  
  If the general approach is right, other transaction types could get the
  blockers similarly, in next revision. And some related bdrv_drain_all() 
  could
  also be changed to bdrv_drain().
  
  virtio-scsi-dataplane will be a bit more complicated, but still doable.  It
  would probably need one more interface abstraction between scsi-disk, 
  scsi-bus
  and virtio-scsi.
  
  Although other devices don't have a pause/resume callback yet, the
  blk_check_request, which returns -EBUSY if device io op blocker is set, 
  could
  hopefully cover most cases already.
  
  Timers and block jobs also generate IO, but it should be fine as long as 
  they
  don't change guest visible data, which is true AFAICT.
  
  
  Fam Zheng (7):
block: Add op blocker type device IO
block: Block device IO during bdrv_drain and bdrv_drain_all
block: Add op blocker notifier list
block-backend: Add blk_op_blocker_add_notifier
virtio-blk: Move complete_request to 'ops' structure
virtio-blk: Don't handle output when there is device IO op blocker
blockdev: Add device IO op blocker during snapshot transaction
  
   block.c | 20 
   block/block-backend.c   | 10 ++
   block/io.c  | 12 +++
   blockdev.c  |  7 +
   hw/block/dataplane/virtio-blk.c | 36 ++---
   hw/block/virtio-blk.c   | 69 
  +++--
   include/block/block.h   |  9 ++
   include/block/block_int.h   |  3 ++
   include/hw/virtio/virtio-blk.h  | 17 --
   include/sysemu/block-backend.h  |  2 ++
   10 files changed, 174 insertions(+), 11 deletions(-)
 
 Please also add a listener to nbd.c so NBD exports cannot submit I/O
 during bdrv_drain_all().

qmp_transaction probably also needs this.

Stefan


pgpAQPw3DSGnC.pgp
Description: PGP signature

[Qemu-devel] [PULL 5/8] virtio-ccw: implement -device_plugged

2015-05-08 Thread Cornelia Huck

Let's move operations that are only valid after the backend has been
realized to a -device_plugged callback, just as virtio-pci does.
Also reorder setting up the host feature bits to the sequence used
by virtio-pci.

While we're at it, also add a -device_unplugged callback to stop
ioeventfd, just to be on the safe side.

Reviewed-by: Shannon Zhao shannon.z...@linaro.org
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
Message-Id: 1429627016-30656-3-git-send-email-cornelia.h...@de.ibm.com
---
 hw/s390x/virtio-ccw.c | 41 ++---
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 1c2bd9d..430cc6f 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -652,10 +652,8 @@ static void virtio_ccw_device_realize(VirtioCcwDevice 
*dev, Error **errp)
 bool found = false;
 SubchDev *sch;
 int num;
-DeviceState *parent = DEVICE(dev);
 Error *err = NULL;
 VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
-VirtIODevice *vdev;
 
 sch = g_malloc0(sizeof(SubchDev));
 
@@ -778,19 +776,6 @@ static void virtio_ccw_device_realize(VirtioCcwDevice 
*dev, Error **errp)
 goto out_err;
 }
 
-/* device_id is only set after vdev has been realized */
-vdev = virtio_ccw_get_vdev(sch);
-sch-id.cu_model = vdev-device_id;
-
-/* Only the first 32 feature bits are used. */
-dev-host_features[0] = virtio_bus_get_vdev_features(dev-bus,
- 
dev-host_features[0]);
-
-virtio_add_feature(dev-host_features[0], VIRTIO_F_NOTIFY_ON_EMPTY);
-virtio_add_feature(dev-host_features[0], VIRTIO_F_BAD_FEATURE);
-
-css_generate_sch_crws(sch-cssid, sch-ssid, sch-schid,
-  parent-hotplugged, 1);
 return;
 
 out_err:
@@ -1428,6 +1413,30 @@ static int virtio_ccw_load_config(DeviceState *d, 
QEMUFile *f)
 return 0;
 }
 
+/* This is called by virtio-bus just after the device is plugged. */
+static void virtio_ccw_device_plugged(DeviceState *d)
+{
+VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
+SubchDev *sch = dev-sch;
+
+sch-id.cu_model = virtio_bus_get_vdev_id(dev-bus);
+
+/* Only the first 32 feature bits are used. */
+virtio_add_feature(dev-host_features[0], VIRTIO_F_NOTIFY_ON_EMPTY);
+virtio_add_feature(dev-host_features[0], VIRTIO_F_BAD_FEATURE);
+dev-host_features[0] = virtio_bus_get_vdev_features(dev-bus,
+ 
dev-host_features[0]);
+
+css_generate_sch_crws(sch-cssid, sch-ssid, sch-schid,
+  d-hotplugged, 1);
+}
+
+static void virtio_ccw_device_unplugged(DeviceState *d)
+{
+VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
+
+virtio_ccw_stop_ioeventfd(dev);
+}
 / Virtio-ccw Bus Device Descriptions ***/
 
 static Property virtio_ccw_net_properties[] = {
@@ -1752,6 +1761,8 @@ static void virtio_ccw_bus_class_init(ObjectClass *klass, 
void *data)
 k-load_queue = virtio_ccw_load_queue;
 k-save_config = virtio_ccw_save_config;
 k-load_config = virtio_ccw_load_config;
+k-device_plugged = virtio_ccw_device_plugged;
+k-device_unplugged = virtio_ccw_device_unplugged;
 }
 
 static const TypeInfo virtio_ccw_bus_info = {
-- 
2.4.0

[Qemu-devel] [PATCH COLO v4 00/15] Block replication for continuous checkpoints

2015-05-08 Thread Wen Congyang

Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

Usage:
Please refer to docs/block-replication.txt

You can get the patch here:
https://github.com/wencongyang/qemu-colo/commits/block-replication-v4

You can get the patch with the other COLO patches here:
https://github.com/wencongyang/qemu-colo/tree/colo_huawei_v4.7

TODO:
1. Test failover when the guest does too many I/O operations. If it
   takes too much time, we need to optimize it.
2. Continuous block replication. It will be started after basic functions
   are accepted.

Changs Log:
V4:
1. Introduce a new driver replication to avoid touch nbd and qcow2.
V3:
1: use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (15):
  docs: block replication's description
  allow writing to the backing file
  Allow creating backup jobs when opening BDS
  block: Parse backing_reference option to reference existing BDS
  Backup: clear all bitmap when doing block checkpoint
  Don't allow a disk use backing reference target
  Add new block driver interface to connect/disconnect the remote target
  NBD client: implement block driver interfaces to connect/disconnect
NBD server
  Introduce a new -drive option to control whether to connect to remote
target
  NBD client: connect to nbd server later
  Add new block driver interfaces to control block replication
  skip nbd_target when starting block replication
  quorum: implement block driver interfaces for block replication
  quorum: allow ignoring child errors
  Implement new driver for block replication

 block.c| 270 +++-
 block/Makefile.objs|   3 +-
 block/backup.c |  13 ++
 block/nbd.c|  67 --
 block/quorum.c | 142 -
 block/replication.c| 512 +
 blockdev.c |   8 +
 blockjob.c |  10 +
 docs/block-replication.txt | 179 
 include/block/block.h  |  10 +
 include/block/block_int.h  |  18 ++
 include/block/blockjob.h   |  12 ++
 qapi/block.json|  16 ++
 qemu-options.hx|   4 +
 tests/qemu-iotests/051 |  13 ++
 tests/qemu-iotests/051.out |  13 ++
 16 files changed, 1260 insertions(+), 30 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt

-- 
2.1.0

[Qemu-devel] [PULL 7/8] s390x: move fpu regs into a subsection of the vmstate

2015-05-08 Thread Cornelia Huck

From: David Hildenbrand d...@linux.vnet.ibm.com

Let's move the floating point registers into a seperate subsection and
bump up the version id. This cleans up the current vmstate and will
allow for a future extension with vector registers in a compatible way.

This patch is based on a patch from Eric Farman.

Reviewed-by: Eric Farman far...@linux.vnet.ibm.com
Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 target-s390x/machine.c | 36 +---
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/target-s390x/machine.c b/target-s390x/machine.c
index bd4cea7..a034423 100644
--- a/target-s390x/machine.c
+++ b/target-s390x/machine.c
@@ -33,12 +33,11 @@ static int cpu_post_load(void *opaque, int version_id)
 return 0;
 }
 
-const VMStateDescription vmstate_s390_cpu = {
-.name = cpu,
-.post_load = cpu_post_load,
-.version_id = 2,
-.minimum_version_id = 2,
-.fields  = (VMStateField[]) {
+const VMStateDescription vmstate_fpu = {
+.name = cpu/fpu,
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
 VMSTATE_UINT64(env.fregs[0].ll, S390CPU),
 VMSTATE_UINT64(env.fregs[1].ll, S390CPU),
 VMSTATE_UINT64(env.fregs[2].ll, S390CPU),
@@ -55,11 +54,26 @@ const VMStateDescription vmstate_s390_cpu = {
 VMSTATE_UINT64(env.fregs[13].ll, S390CPU),
 VMSTATE_UINT64(env.fregs[14].ll, S390CPU),
 VMSTATE_UINT64(env.fregs[15].ll, S390CPU),
+VMSTATE_UINT32(env.fpc, S390CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static inline bool fpu_needed(void *opaque)
+{
+return true;
+}
+
+const VMStateDescription vmstate_s390_cpu = {
+.name = cpu,
+.post_load = cpu_post_load,
+.version_id = 3,
+.minimum_version_id = 3,
+.fields  = (VMStateField[]) {
 VMSTATE_UINT64_ARRAY(env.regs, S390CPU, 16),
 VMSTATE_UINT64(env.psw.mask, S390CPU),
 VMSTATE_UINT64(env.psw.addr, S390CPU),
 VMSTATE_UINT64(env.psa, S390CPU),
-VMSTATE_UINT32(env.fpc, S390CPU),
 VMSTATE_UINT32(env.todpr, S390CPU),
 VMSTATE_UINT64(env.pfault_token, S390CPU),
 VMSTATE_UINT64(env.pfault_compare, S390CPU),
@@ -74,4 +88,12 @@ const VMStateDescription vmstate_s390_cpu = {
 VMSTATE_UINT8(env.sigp_order, S390CPU),
 VMSTATE_END_OF_LIST()
  },
+.subsections = (VMStateSubsection[]) {
+{
+.vmsd = vmstate_fpu,
+.needed = fpu_needed,
+} , {
+/* empty */
+}
+},
 };
-- 
2.4.0

[Qemu-devel] [PATCH COLO v4 04/15] block: Parse backing_reference option to reference existing BDS

2015-05-08 Thread Wen Congyang

Usage:
-drive file=xxx,id=Y, \
-drive 
file=,id=X,backing_reference.drive_id=Y,backing_reference.hidden-disk.*

It will create such backing chain:
   {virtio-blk dev 'Y'}  
{virtio-blk dev 'X'}
 |  
|
 |  
|
 v  
v

[base] - [mid] - ( Y )  - (hidden target) 
--- ( X )

 v  ^
 v  ^
 v  ^
 v  ^
  drive-backup sync=none 

X's backing file is hidden-disk, and hidden-disk's backing file is Y.
Disk Y may be opened or reopened in read-write mode, so A block backup
job is automatically created: source is Y and target is hidden disk.
Active disk X, hidden disk, and Y are all on the same AioContext.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block.c| 154 -
 include/block/block.h  |   1 +
 include/block/block_int.h  |   1 +
 tests/qemu-iotests/051 |  13 
 tests/qemu-iotests/051.out |  13 
 5 files changed, 179 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 8fa4f05..35e1a95 100644
--- a/block.c
+++ b/block.c
@@ -1155,6 +1155,119 @@ free_exit:
 return ret;
 }
 
+static void backing_reference_completed(void *opaque, int ret)
+{
+BlockDriverState *hidden_disk = opaque;
+
+assert(!hidden_disk-backing_reference);
+}
+
+static int bdrv_open_backing_reference_file(BlockDriverState *bs,
+QDict *options, Error **errp)
+{
+const char *backing_name;
+QDict *hidden_disk_options = NULL;
+BlockDriverState *backing_hd, *hidden_disk;
+BlockBackend *backing_blk;
+AioContext *aio_context;
+Error *local_err = NULL;
+int ret = 0;
+
+backing_name = qdict_get_try_str(options, drive_id);
+if (!backing_name) {
+error_setg(errp, Backing reference needs option drive_id);
+ret = -EINVAL;
+goto free_exit;
+}
+qdict_del(options, drive_id);
+
+qdict_extract_subqdict(options, hidden_disk_options, hidden-disk.);
+if (!qdict_size(hidden_disk_options)) {
+error_setg(errp, Backing reference needs option hidden-disk.*);
+ret = -EINVAL;
+goto free_exit;
+}
+
+if (qdict_size(options)) {
+const QDictEntry *entry = qdict_first(options);
+error_setg(errp, Backing reference used by '%s' doesn't support 
+   the option '%s', bdrv_get_device_name(bs), entry-key);
+ret = -EINVAL;
+goto free_exit;
+}
+
+backing_blk = blk_by_name(backing_name);
+if (!backing_blk) {
+error_setg(errp, Device '%s' not found, backing_name);
+ret = -ENOENT;
+goto free_exit;
+}
+
+backing_hd = blk_bs(backing_blk);
+/* Backing reference itself? */
+if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
+error_setg(errp, Backing reference itself);
+ret = -EINVAL;
+goto free_exit;
+}
+
+if (bdrv_op_is_blocked(backing_hd, BLOCK_OP_TYPE_BACKING_REFERENCE,
+   errp)) {
+ret = -EBUSY;
+goto free_exit;
+}
+
+/* hidden-disk is bs's backing file */
+ret = bdrv_open_backing_file(bs, hidden_disk_options, errp);
+hidden_disk_options = NULL;
+if (ret  0) {
+goto free_exit;
+}
+
+hidden_disk = bs-backing_hd;
+if (!hidden_disk-drv || !hidden_disk-drv-supports_backing) {
+ret = -EINVAL;
+error_setg(errp, Hidden disk's driver doesn't support backing files);
+goto free_exit;
+}
+
+bdrv_set_backing_hd(hidden_disk, backing_hd);
+bdrv_ref(backing_hd);
+
+/*
+ * backing hd may be opened or reopened in read-write mode, so we
+ * should backup backing hd to hidden disk
+ */
+bdrv_op_unblock(hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
+bs-backing_blocker);
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+hidden_disk-backing_blocker);
+
+bdrv_ref(hidden_disk);
+
+aio_context = bdrv_get_aio_context(bs);
+aio_context_acquire(aio_context);
+bdrv_set_aio_context(backing_hd, aio_context);
+backup_start(backing_hd, hidden_disk, 0, MIRROR_SYNC_MODE_NONE, NULL,
+ BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
+ backing_reference_completed, hidden_disk, local_err);
+aio_context_release(aio_context);
+if (local_err)

[Qemu-devel] [PATCH COLO v4 05/15] Backup: clear all bitmap when doing block checkpoint

2015-05-08 Thread Wen Congyang

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
Cc: Jeff Cody jc...@redhat.com
---
 block/backup.c   | 13 +
 blockjob.c   | 10 ++
 include/block/blockjob.h | 12 
 3 files changed, 35 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index d3f648d..d3d8ba7 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -210,11 +210,24 @@ static void backup_iostatus_reset(BlockJob *job)
 bdrv_iostatus_reset(s-target);
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+if (backup_job-sync_mode != MIRROR_SYNC_MODE_NONE) {
+error_setg(errp, this feature or command is not currently supported);
+return;
+}
+
+hbitmap_reset_all(backup_job-bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
 .set_speed  = backup_set_speed,
 .iostatus_reset = backup_iostatus_reset,
+.do_checkpoint  = backup_do_checkpoint,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
diff --git a/blockjob.c b/blockjob.c
index 2755465..9d2128a 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -399,3 +399,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
 
 qemu_bh_schedule(data-bh);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+if (!job-driver-do_checkpoint) {
+error_setg(errp, this feature or command is not currently supported);
+return;
+}
+
+job-driver-do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 57d8ef1..b832dc3 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -50,6 +50,9 @@ typedef struct BlockJobDriver {
  * manually.
  */
 void (*complete)(BlockJob *job, Error **errp);
+
+/** Optional callback for job types that support checkpoint. */
+void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -348,4 +351,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
   BlockJobDeferToMainLoopFn *fn,
   void *opaque);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
-- 
2.1.0

[Qemu-devel] [PATCH COLO v4 08/15] NBD client: implement block driver interfaces to connect/disconnect NBD server

2015-05-08 Thread Wen Congyang

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block/nbd.c | 65 -
 1 file changed, 47 insertions(+), 18 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2176186..0624232 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -44,6 +44,8 @@
 typedef struct BDRVNBDState {
 NbdClientSession client;
 QemuOpts *socket_opts;
+char *export;
+bool connected;
 } BDRVNBDState;
 
 static int nbd_parse_uri(const char *filename, QDict *options)
@@ -254,34 +256,55 @@ static int nbd_establish_connection(BlockDriverState *bs, 
Error **errp)
 return sock;
 }
 
-static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
-Error **errp)
+static void nbd_connect_server(BlockDriverState *bs, Error **errp)
 {
 BDRVNBDState *s = bs-opaque;
-char *export = NULL;
-int result, sock;
-Error *local_err = NULL;
-
-/* Pop the config into our state object. Exit if invalid. */
-nbd_config(s, options, export, local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return -EINVAL;
-}
+int sock;
 
 /* establish TCP connection, return error if it fails
  * TODO: Configurable retry-until-timeout behaviour.
  */
 sock = nbd_establish_connection(bs, errp);
 if (sock  0) {
-g_free(export);
-return sock;
+return;
 }
 
 /* NBD handshake */
-result = nbd_client_init(bs, sock, export, errp);
-g_free(export);
-return result;
+nbd_client_init(bs, sock, s-export, errp);
+
+s-connected = true;
+}
+
+static void nbd_disconnect_server(BlockDriverState *bs)
+{
+BDRVNBDState *s = bs-opaque;
+
+if (s-connected) {
+nbd_client_close(bs);
+s-connected = false;
+}
+}
+
+static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
+Error **errp)
+{
+BDRVNBDState *s = bs-opaque;
+Error *local_err = NULL;
+
+/* Pop the config into our state object. Exit if invalid. */
+nbd_config(s, options, s-export, local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
+
+nbd_connect_server(bs, local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
+
+return 0;
 }
 
 static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
@@ -318,7 +341,7 @@ static void nbd_close(BlockDriverState *bs)
 BDRVNBDState *s = bs-opaque;
 
 qemu_opts_del(s-socket_opts);
-nbd_client_close(bs);
+nbd_disconnect_server(bs);
 }
 
 static int64_t nbd_getlength(BlockDriverState *bs)
@@ -400,6 +423,8 @@ static BlockDriver bdrv_nbd = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_connect   = nbd_connect_server,
+.bdrv_disconnect= nbd_disconnect_server,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -418,6 +443,8 @@ static BlockDriver bdrv_nbd_tcp = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_connect   = nbd_connect_server,
+.bdrv_disconnect= nbd_disconnect_server,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -436,6 +463,8 @@ static BlockDriver bdrv_nbd_unix = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_connect   = nbd_connect_server,
+.bdrv_disconnect= nbd_disconnect_server,
 };
 
 static void bdrv_nbd_init(void)
-- 
2.1.0

Re: [Qemu-devel] [PATCH v4 4/4] spapr_pci: populate ibm,loc-code

2015-05-08 Thread Alexey Kardashevskiy


On 05/07/2015 05:21 PM, Nikunj A Dadhania wrote:

Each hardware instance has a platform unique location code.  The OF
device tree that describes a part of a hardware entity must include
the “ibm,loc-code” property with a value that represents the location
code for that hardware entity.

Populate ibm,loc-code.

1) PCI passthru devices need to identify with its own ibm,loc-code
available on the host. In failure cases use:
vfio_name:phb-index:slot.fn

2) Emulated devices encode as following:
qemu_name:phb-index:slot.fn

Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
---
  hw/ppc/spapr_pci.c | 98 +++---
  1 file changed, 86 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 12f1b9c..d901007 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -769,6 +769,81 @@ static uint32_t spapr_phb_get_pci_drc_index(sPAPRPHBState 
*phb,
  return drck-get_index(drc);
  }

+static bool spapr_phb_vfio_get_devspec_value(PCIDevice *pdev, char **value)


Does it have to be a separate function?



+{
+char *host;
+char path[PATH_MAX];
+
+host = object_property_get_str(OBJECT(pdev), host, NULL);
+if (!host) {
+return false;
+}
+
+snprintf(path, sizeof(path), /sys/bus/pci/devices/%s/devspec, host);
+g_free(host);
+
+return g_file_get_contents(path, value, NULL, NULL);
+}
+
+static char *spapr_phb_vfio_get_loc_code(sPAPRPHBState *sphb,  PCIDevice *pdev)
+{
+char path[PATH_MAX], *buf = NULL;
+
+/* We have a vfio host bridge lets get the path. */
+if (!spapr_phb_vfio_get_devspec_value(pdev, buf)) {
+return NULL;
+}
+
+snprintf(path, sizeof(path), /proc/device-tree%s/ibm,loc-code, buf);
+g_free(buf);
+
+g_file_get_contents(path, buf, NULL, NULL);
+return buf;
+}
+
+static char *spapr_phb_get_loc_code(sPAPRPHBState *sphb,  PCIDevice *pdev)
+{
+char *path = g_malloc(PATH_MAX);
+
+if (!path) {
+return NULL;
+}
+
+/*
+ * For non-vfio devices make up the location code out
+ * of the name, slot and function.
+ *
+ *   qemu_name:phb-index:slot.fn
+ */
+snprintf(path, PATH_MAX, qemu_%s:%02d:%02d.%1d, pdev-name,
+ sphb-index, PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn));



g_strdup_printf?




+return path;
+}
+
+
+static char *spapr_ibm_get_loc_code(sPAPRPHBState *sphb, PCIDevice *pdev)


s/spapr_ibm_get_loc_code/spapr_phb_get_loc_code/

Strange to see ibm in a function name, so far we have only used ibm in 
RTAS handler names :)




+{
+if (object_dynamic_cast(OBJECT(pdev), vfio-pci) != NULL) {


QEMU does not compare object_dynamic_cast with NULL anywhere, so:

if (object_dynamic_cast(OBJECT(pdev), vfio-pci)) {






+char *buf = spapr_phb_vfio_get_loc_code(sphb, pdev);
+
+/*
+ * In case of failures reading the loc-code, make it up
+ * indicating a vfio device
+ */
+if (!buf) {
+buf = g_malloc(PATH_MAX);
+if (!buf) {
+return NULL;
+}
+snprintf(buf, PATH_MAX, vfio_%s:%02d:%02d.%1d, pdev-name,
+ sphb-index, PCI_SLOT(pdev-devfn), 
PCI_FUNC(pdev-devfn));



g_strdup_printf?

Also, vfio_%s:%02d:%02d.%1d looks very similar to 
qemu_%s:%02d:%02d.%1d, you could extend spapr_phb_get_loc_code() to take 
vfio/qemu as a parameter.




+}
+return buf;
+} else {
+return spapr_phb_get_loc_code(sphb, pdev);
+}
+}
+
  /* Macros to operate with address in OF binding to PCI */
  #define b_x(x, p, l)(((x)  ((1(l))-1))  (p))
  #define b_n(x)  b_x((x), 31, 1) /* 0 if relocatable */
@@ -906,12 +981,12 @@ static void populate_resource_props(PCIDevice *d, 
ResourceProps *rp)
  }

  static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
-   sPAPRPHBState *sphb,
-   const char *drc_name)
+   sPAPRPHBState *sphb)
  {
  ResourceProps rp;
  bool is_bridge = false;
  int pci_status;
+char *buf = NULL;
  uint32_t drc_index = spapr_phb_get_pci_drc_index(sphb, dev);

  if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
@@ -973,9 +1048,10 @@ static int spapr_populate_pci_child_dt(PCIDevice *dev, 
void *fdt, int offset,
   * processed by OF beforehand
   */
  _FDT(fdt_setprop_string(fdt, offset, name, pci));
-if (drc_name) {
-_FDT(fdt_setprop(fdt, offset, ibm,loc-code, drc_name,
- strlen(drc_name)));
+buf = spapr_ibm_get_loc_code(sphb, dev);
+if (buf) {


It is rather:

if (!buf) {
error_report(Bad thing just happened);
return -1;
}

OR

g_assert(!buf);

Your code can only return NULL if g_malloc() failed (otherwise it will be 
at least (qemuvfio)_%s:%02d:%02d.%1d) and if this happened, something 
went

[Qemu-devel] [PULL 0/8] s390x: various patches

2015-05-08 Thread Cornelia Huck

The following changes since commit f8340b360b9bc29d48716ba8aca79df2b9544979:

  hw/ptimer: Do not artificially limit timers when using icount (2015-05-08 
17:15:23 +1000)

are available in the git repository at:

  git://github.com/cohuck/qemu tags/s390x-20150508

for you to fetch changes up to 3cda44f7bae5c9feddc11630ba6eecb2e3bed425:

  s390x/kvm: migrate vcpu interrupt state (2015-05-08 10:36:19 +0200)


Assorted s390x patches:
- updates for virtio-ccw and s390-virtio, making them more similar
  to virtio-pci
- improvements regarding per-vcpu interrupts and migration



Christian Borntraeger (2):
  s390-virtio: Accommodate guests using virtqueues too early
  s390-virtio: clear {used,avail}_event_idx on reset as well

Cornelia Huck (3):
  s390-virtio: use common features
  virtio-ccw: change realization sequence
  virtio-ccw: implement -device_plugged

David Hildenbrand (1):
  s390x: move fpu regs into a subsection of the vmstate

Jens Freimann (2):
  s390x/kvm: use ioctl KVM_S390_IRQ for vcpu interrupts
  s390x/kvm: migrate vcpu interrupt state

 hw/s390x/s390-virtio-bus.c | 28 -
 hw/s390x/s390-virtio.c | 10 ++
 hw/s390x/virtio-ccw.c  | 72 +++
 target-s390x/cpu-qom.h |  3 ++
 target-s390x/cpu.c |  1 +
 target-s390x/cpu.h |  9 ++
 target-s390x/kvm.c | 77 --
 target-s390x/machine.c | 49 -
 8 files changed, 192 insertions(+), 57 deletions(-)

-- 
2.4.0

[Qemu-devel] [PULL 2/8] s390-virtio: use common features

2015-05-08 Thread Cornelia Huck

We used to avoid enabling event_idx for virtio-blk devices via
s390-virtio, but we now have a workaround in place for guests trying
to use the device before setting DRIVER_OK. Therefore, let's add
DEFINE_VIRTIO_COMMON_FEATURES to the base device so all devices get
those common features - and make s390-virtio use the same mechanism
as the other transports do.

Acked-by: Christian Borntraeger borntrae...@de.ibm.com
Reviewed-by: Shannon Zhao shannon.z...@linaro.org
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/s390x/s390-virtio-bus.c | 20 ++--
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
index c27f8a5..4a33c6a 100644
--- a/hw/s390x/s390-virtio-bus.c
+++ b/hw/s390x/s390-virtio-bus.c
@@ -530,7 +530,6 @@ static unsigned virtio_s390_get_features(DeviceState *d)
 / S390 Virtio Bus Device Descriptions ***/
 
 static Property s390_virtio_net_properties[] = {
-DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
 DEFINE_VIRTIO_NET_FEATURES(VirtIOS390Device, host_features),
 DEFINE_PROP_END_OF_LIST(),
 };
@@ -592,18 +591,12 @@ static const TypeInfo s390_virtio_serial = {
 .class_init= s390_virtio_serial_class_init,
 };
 
-static Property s390_virtio_rng_properties[] = {
-DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
-DEFINE_PROP_END_OF_LIST(),
-};
-
 static void s390_virtio_rng_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 VirtIOS390DeviceClass *k = VIRTIO_S390_DEVICE_CLASS(klass);
 
 k-realize = s390_virtio_rng_realize;
-dc-props = s390_virtio_rng_properties;
 set_bit(DEVICE_CATEGORY_MISC, dc-categories);
 }
 
@@ -632,10 +625,16 @@ static void s390_virtio_busdev_reset(DeviceState *dev)
 virtio_reset(_dev-vdev);
 }
 
+static Property virtio_s390_properties[] = {
+DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void virtio_s390_device_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 
+dc-props = virtio_s390_properties;
 dc-realize = s390_virtio_busdev_realize;
 dc-bus_type = TYPE_S390_VIRTIO_BUS;
 dc-reset = s390_virtio_busdev_reset;
@@ -651,7 +650,6 @@ static const TypeInfo virtio_s390_device_info = {
 };
 
 static Property s390_virtio_scsi_properties[] = {
-DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
 DEFINE_VIRTIO_SCSI_FEATURES(VirtIOS390Device, host_features),
 DEFINE_PROP_END_OF_LIST(),
 };
@@ -675,18 +673,12 @@ static const TypeInfo s390_virtio_scsi = {
 };
 
 #ifdef CONFIG_VHOST_SCSI
-static Property s390_vhost_scsi_properties[] = {
-DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features),
-DEFINE_PROP_END_OF_LIST(),
-};
-
 static void s390_vhost_scsi_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 VirtIOS390DeviceClass *k = VIRTIO_S390_DEVICE_CLASS(klass);
 
 k-realize = s390_vhost_scsi_realize;
-dc-props = s390_vhost_scsi_properties;
 set_bit(DEVICE_CATEGORY_STORAGE, dc-categories);
 }
 
-- 
2.4.0

[Qemu-devel] [PATCH COLO v4 01/15] docs: block replication's description

2015-05-08 Thread Wen Congyang

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 docs/block-replication.txt | 179 +
 1 file changed, 179 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..a29f51a
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,179 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COurse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
++--+++
+|Primary Write Requests||Secondary Write Requests|
++--+++
+  |   |
+  |  (4)
+  |   V
+  |  /-\
+  |  Copy and Forward| |
+  |-(1)--+   | Disk Buffer |
+  |  |   | |
+  | (3)  \-/
+  | speculative  ^
+  |write through(2)
+  |  |   |
+  V  V   |
+   +--+   ++
+   | Primary Disk |   | Secondary Disk |
+   +--+   ++
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content(it could be from either Secondary Write Requests or
+   previous COW of Primary Write Requests) in the Disk buffer.
+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+ virtio-blk   ||
+ ^||.--
+ |||| Secondary
+1 Quorum  ||'--
+ /  \ ||
+/\||
+   Primary2 filter
+ disk ^
 virtio-blk
+  |
  ^
+3 NBD  ---  3 NBD 
  |
+client|| server
  2 filter
+  ||^  
  ^
+. |||  
  |
+Primary | ||  Secondary disk - hidden-disk 5 
- active-disk 4
+' |||  backing^   backing
+  ||| |
+  ||| |
+  ||'-'

[Qemu-devel] [PATCH COLO v4 07/15] Add new block driver interface to connect/disconnect the remote target

2015-05-08 Thread Wen Congyang

In some cases, we want to connect/disconnect the remote target when
we need, not in bdrv_open()/bdrv_close().

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block.c   | 24 
 include/block/block.h |  3 +++
 include/block/block_int.h |  3 +++
 3 files changed, 30 insertions(+)

diff --git a/block.c b/block.c
index 15d21da..050f919 100644
--- a/block.c
+++ b/block.c
@@ -4189,3 +4189,27 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
 {
 return bs-stats;
 }
+
+void bdrv_connect(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs-drv;
+
+if (drv  drv-bdrv_connect) {
+drv-bdrv_connect(bs, errp);
+} else if (bs-file) {
+bdrv_connect(bs-file, errp);
+} else {
+error_setg(errp, this feature or command is not currently supported);
+}
+}
+
+void bdrv_disconnect(BlockDriverState *bs)
+{
+BlockDriver *drv = bs-drv;
+
+if (drv  drv-bdrv_disconnect) {
+drv-bdrv_disconnect(bs);
+} else if (bs-file) {
+bdrv_disconnect(bs-file);
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 006c941..e479a34 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -592,4 +592,7 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
 
 BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 
+void bdrv_connect(BlockDriverState *bs, Error **errp);
+void bdrv_disconnect(BlockDriverState *bs);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 49ca826..da29fa7 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -290,6 +290,9 @@ struct BlockDriver {
  */
 int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
+void (*bdrv_connect)(BlockDriverState *bs, Error **errp);
+void (*bdrv_disconnect)(BlockDriverState *bs);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.1.0

Re: [Qemu-devel] [PATCH] Fix default CPU model for ARM64

2015-05-08 Thread Peter Crosthwaite

On Fri, May 8, 2015 at 12:14 AM, Pavel Fedin p.fe...@samsung.com wrote:
  Hello!

 FWIW virt-manager 1.2.0 (just released) will do the following when creating a
 new VM:

 - aarch64 + kvm : -cpu host
 - aarch64 + tcg : -cpu cortex-a57
 - arm32 + kvm : -cpu host
 - arm32 + tcg : defer to qemu

 Though if you explicitly request 'hypervisor default' then we won't specify
 any -cpu and defer to qemu, which will hit the cortex-a15 default for aarch64

  virt-manager is not the only tool to create VMs...
  But, okay. Seems you just don't want to change this. Well... I still don't 
 agree and this
 default looks strange for me, but it's just me. I'm out of further arguments.


I think the real bug is the naming scheme for QEMU targets. QEMU
targets such as Aarch64 and x86_64 are following the long established
convention of using an ISA for a name, but are ultimately badly named
as they support multiple ISAs beyond what their singular name
suggests.

qemu-system-arm would be the be the best name for the
works-for-all-ARM QEMU but that name is taken. But if we are true to
qemu-system-aarch64 being a functional superset of arm, can we just
unify as the one arch (arm) and phase out qemu-system-aarch64 (would
become an alias of qemu-system-arm)?. Linux user builds are the main
complication, but that is the case where it very much does make sense
to tie to an ISA. Ignoring legacy, the ideal naming scheme would be:

qemu-system-arm
aarch64-linux-user
aarch32-linux-user

Regards,
Peter

Regards,
Peter

 Kind regards,
 Pavel Fedin
 Expert Engineer
 Samsung Electronics Research center Russia

Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description

2015-05-08 Thread Stefan Hajnoczi

On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote:
 * Stefan Hajnoczi (stefa...@redhat.com) wrote:
  On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote:
   
   
   On 24/04/2015 11:38, Wen Congyang wrote:
 
 That can be done with drive-mirror.  But I think it's too early for 
 that.
Do you mean use drive-mirror instead of quorum?
   
   Only before starting up a new secondary.  Basically you do a migration
   with non-shared storage, and then start the secondary in colo mode.
   
   But it's only for the failover case.  Quorum (or a new block/colo.c
   driver or filter) is fine for normal colo operation.
  
  Perhaps this patch series should mirror the Secondary's disk to a Backup
  Secondary so that the system can be protected very quickly after
  failover.
  
  I think anyone serious about fault tolerance would deploy a Backup
  Secondary, otherwise the system cannot survive two failures unless a
  human administrator is lucky/fast enough to set up a new Secondary.
 
 I'd assumed that a higher level management layer would do the allocation
 of a new secondary after the first failover, so no human need be involved.

That doesn't help, after the first failover is too late even if it's
done by a program.  There should be no window during which the VM is
unprotected.

People who want fault tolerance care about 9s of availability.  The VM
must be protected on the new Primary as soon as the failover occurs,
otherwise this isn't a serious fault tolerance solution.

Stefan


pgpVH1OR4tWnF.pgp
Description: PGP signature

[Qemu-devel] [PULL 1/8] s390-virtio: Accommodate guests using virtqueues too early

2015-05-08 Thread Cornelia Huck

From: Christian Borntraeger borntrae...@de.ibm.com

Feature updates are not a synchronuous operation for the legacy
s390-virtio transport. This transport syncs the guest feature bits
(those from finalize) on the set_status hypercall. Before that qemu
thinks that features are zero, which means QEMU will misbehave, e.g.
it will not write the event index, even if the guest asks for it.

Let's detect the case where a kick happens before the driver is ready
and force sync the features.
With this workaround, it is now safe to switch to the common feature
bit handling code as used by all other transports.

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/s390x/s390-virtio.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/s390x/s390-virtio.c b/hw/s390x/s390-virtio.c
index 3a1b9ee..59750db 100644
--- a/hw/s390x/s390-virtio.c
+++ b/hw/s390x/s390-virtio.c
@@ -77,6 +77,16 @@ static int s390_virtio_hcall_notify(const uint64_t *args)
 if (mem  ram_size) {
 VirtIOS390Device *dev = s390_virtio_bus_find_vring(s390_bus, mem, i);
 if (dev) {
+/*
+ * Older kernels will use the virtqueue before setting DRIVER_OK.
+ * In this case the feature bits are not yet up to date, meaning
+ * that several funny things can happen, e.g. the guest thinks
+ * EVENT_IDX is on and QEMU thinks it is off. Let's force a feature
+ * and status sync.
+ */
+if (!(dev-vdev-status  VIRTIO_CONFIG_S_DRIVER_OK)) {
+s390_virtio_device_update_status(dev);
+}
 virtio_queue_notify(dev-vdev, i);
 } else {
 r = -EINVAL;
-- 
2.4.0

[Qemu-devel] [PULL 6/8] s390x/kvm: use ioctl KVM_S390_IRQ for vcpu interrupts

2015-05-08 Thread Cornelia Huck

From: Jens Freimann jf...@linux.vnet.ibm.com

KVM_S390_INT uses only two parameter fields. This is not
enough to pass all required information for certain interrupts.

A new ioctl KVM_S390_IRQ is available which allows us to
inject all local interrupts as defined in the Principles of
Operation. It takes a struct kvm_s390_irq as a parameter
which can store interrupt payload data for all interrupts.

Let's use the new ioctl for injecting vcpu interrupts.

Tested-by: Thomas Huth th...@linux.vnet.ibm.com
Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Signed-off-by: Jens Freimann jf...@linux.vnet.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 target-s390x/kvm.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 8e65e43..43ad009 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -124,6 +124,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 static int cap_sync_regs;
 static int cap_async_pf;
 static int cap_mem_op;
+static int cap_s390_irq;
 
 static void *legacy_s390_alloc(size_t size, uint64_t *align);
 
@@ -249,6 +250,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 cap_sync_regs = kvm_check_extension(s, KVM_CAP_SYNC_REGS);
 cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF);
 cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
+cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ);
 
 kvm_s390_enable_cmma(s);
 
@@ -827,10 +829,9 @@ static int s390_kvm_irq_to_interrupt(struct kvm_s390_irq 
*irq,
 return r;
 }
 
-void kvm_s390_vcpu_interrupt(S390CPU *cpu, struct kvm_s390_irq *irq)
+static void inject_vcpu_irq_legacy(CPUState *cs, struct kvm_s390_irq *irq)
 {
 struct kvm_s390_interrupt kvmint = {};
-CPUState *cs = CPU(cpu);
 int r;
 
 r = s390_kvm_irq_to_interrupt(irq, kvmint);
@@ -846,6 +847,23 @@ void kvm_s390_vcpu_interrupt(S390CPU *cpu, struct 
kvm_s390_irq *irq)
 }
 }
 
+void kvm_s390_vcpu_interrupt(S390CPU *cpu, struct kvm_s390_irq *irq)
+{
+CPUState *cs = CPU(cpu);
+int r;
+
+if (cap_s390_irq) {
+r = kvm_vcpu_ioctl(cs, KVM_S390_IRQ, irq);
+if (!r) {
+return;
+}
+error_report(KVM failed to inject interrupt %llx, irq-type);
+exit(1);
+}
+
+inject_vcpu_irq_legacy(cs, irq);
+}
+
 static void __kvm_s390_floating_interrupt(struct kvm_s390_irq *irq)
 {
 struct kvm_s390_interrupt kvmint = {};
-- 
2.4.0

[Qemu-devel] [PATCH COLO v4 15/15] Implement new driver for block replication

2015-05-08 Thread Wen Congyang

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Signed-off-by: Gonglei arei.gong...@huawei.com
---
 block/Makefile.objs |   1 +
 block/replication.c | 512 
 2 files changed, 513 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 8dd9b9c..8b1c587 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,6 +22,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 000..f2a9b2b
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,512 @@
+#include qemu-common.h
+#include block/block_int.h
+#include block/blockjob.h
+#include block/nbd.h
+
+typedef struct BDRVReplicationState {
+ReplicationMode mode;
+int replication_state;
+char *export_name;
+NBDExport *exp;
+BlockDriverState *active_disk;
+BlockDriverState *hidden_disk;
+int error;
+
+/*
+ * After failover, secondary qemu should read/write this
+ * bs directly.
+ */
+BlockDriverState *bs;
+} BDRVReplicationState;
+
+enum {
+BLOCK_REPLICATION_NONE, /* block replication is not started */
+BLOCK_REPLICATION_RUNNING,  /* block replication is running */
+BLOCK_REPLICATION_DONE, /* block replication is done(failover) */
+};
+
+#define COMMIT_CLUSTER_BITS 16
+#define COMMIT_CLUSTER_SIZE (1  COMMIT_CLUSTER_BITS)
+#define COMMIT_SECTORS_PER_CLUSTER (COMMIT_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error 
**errp);
+
+#define NBD_OPT_EXPORT  export
+#define REPLICATION_MODEmode
+static QemuOptsList replication_runtime_opts = {
+.name = replication,
+.head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+.desc = {
+{
+.name = REPLICATION_MODE,
+.type = QEMU_OPT_STRING,
+},
+{
+.name = NBD_OPT_EXPORT,
+.type = QEMU_OPT_STRING,
+.help = The NBD server name,
+},
+{ /* end of list */ }
+},
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+int flags, Error **errp)
+{
+int ret;
+BDRVReplicationState *s = bs-opaque;;
+Error *local_err = NULL;
+QemuOpts *opts = NULL;
+const char *mode;
+
+ret = -EINVAL;
+opts = qemu_opts_create(replication_runtime_opts, NULL, 0, error_abort);
+qemu_opts_absorb_qdict(opts, options, local_err);
+if (local_err) {
+goto fail;
+}
+
+mode = qemu_opt_get(opts, REPLICATION_MODE);
+if (!mode) {
+error_setg(local_err, Missing the option mode);
+goto fail;
+}
+
+if (!strcmp(mode, primary)) {
+s-mode = REPLICATION_MODE_PRIMARY;
+} else if (!strcmp(mode, secondary)) {
+s-mode = REPLICATION_MODE_SECONDARY;
+} else {
+error_setg(local_err,
+   The option mode's value should be primary or secondary);
+goto fail;
+}
+
+if (s-mode == REPLICATION_MODE_SECONDARY) {
+s-export_name = g_strdup(qemu_opt_get(opts, NBD_OPT_EXPORT));
+if (!s-export_name) {
+error_setg(local_err, Missing the option export);
+goto fail;
+}
+}
+
+return 0;
+
+fail:
+qemu_opts_del(opts);
+/* propagate error */
+if (local_err) {
+error_propagate(errp, local_err);
+}
+return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+BDRVReplicationState *s = bs-opaque;
+
+if (s-replication_state == BLOCK_REPLICATION_RUNNING) {
+replication_stop(bs, false, NULL);
+}
+
+g_free(s-export_name);
+}
+
+static void replication_refresh_filename(BlockDriverState *bs)
+{
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+return bdrv_getlength(bs-file);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+switch (s-replication_state) {
+case BLOCK_REPLICATION_NONE:
+return -EIO;
+case BLOCK_REPLICATION_RUNNING:
+return 0;
+case BLOCK_REPLICATION_DONE:
+return s-mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+default:
+abort();
+}
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+if (s-mode == REPLICATION_MODE_SECONDARY) {
+return ret;
+}
+
+if (ret  0) {
+s-error = ret;
+ret = 0;
+}
+
+return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ int remaining_sectors,
+

Re: [Qemu-devel] [PATCH 2/6] watchdog: Add watchdog device diag288 to the sysbus

2015-05-08 Thread Cornelia Huck

On Fri, 17 Apr 2015 09:52:37 +0200
Cornelia Huck cornelia.h...@de.ibm.com wrote:

 From: Xu Wang gesa...@linux.vnet.ibm.com
 
 A new sysbus device for diag288 watchdog, it monitors the kvm guest
 status, and take corresponding actions when it detects that the guest
 is not responding.
 
 Signed-off-by: Xu Wang gesa...@linux.vnet.ibm.com
 Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 ---
  default-configs/s390x-softmmu.mak |   1 +
  hw/watchdog/Makefile.objs |   1 +
  hw/watchdog/wdt_diag288.c | 110 
 ++
  include/hw/watchdog/wdt_diag288.h |  36 +
  qemu-options.hx   |   6 ++-
  5 files changed, 152 insertions(+), 2 deletions(-)
  create mode 100644 hw/watchdog/wdt_diag288.c
  create mode 100644 include/hw/watchdog/wdt_diag288.h

OK, after some deliberation, I think it is best to model the diag288
watchdog as a bus-less device: The other watchdogs are modelled as
devices as well, and it is probably a good idea to stay with this model
as other parties may have made assumptions based on this.

Xu, can you please switch the diag288 device to TYPE_DEVICE and address
Markus' comments regarding the help text as well?

Re: [Qemu-devel] [PULL 00/16] Migration pull request (v2)

2015-05-08 Thread Stefan Hajnoczi

On Thu, May 07, 2015 at 11:40:50PM +0530, Amit Shah wrote:
 On (Thu) 07 May 2015 [13:45:26], Peter Maydell wrote:
  On 7 May 2015 at 12:50, Juan Quintela quint...@redhat.com wrote:
  
  
   Hi again
  
   For v2
  
   - fix 32bit compilation (as said, compiling for 64bit linux, 64bit
 windows and 32bit windows was not enough)
  
   - Now, we have versions 2.4 everywhere (thanks Eric)
  
   - Liang Li sent a new patch to the list to fix the update of a migration 
   parameter, included.
  
   Please apply, and sorry for the inconvenience.
  
  Fails to build on win32:
 
 Does the buildbot try all these combinations?  I want to have a
 'stage' branch where I just push unapplied patches and receive
 complaints before sending a pull req.

The buildbot is dead and requires maintenance:
http://buildbot.b1-systems.de/qemu/one_line_per_build

There are two alternatives:

1. Travis (see .travis.yml) but it's missing mingw32.  It will never be
   able to do builds for host operating systems that do not support
   cross-compilation from Linux.

2. patchew (http://qemu.patchew.org/) but it only has Fedora 20 x86_64
   builds at the moment.  Adding mingw32 cross-compilation should be
   possible but it scans the mailing list rather than git repos.

The source code to patchew is available here:
https://github.com/famz/patchew

The trouble with continuous integration and build farms is that they
require maintenance.  I think patchew is the best bet right now since
Fam is developing it.

Stefan


pgpvELUKvDUM_.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH] qemu-nbd: only send a limited number of errno codes on the wire

2015-05-08 Thread Markus Armbruster

Paolo Bonzini pbonz...@redhat.com writes:

 On 08/05/2015 08:45, Markus Armbruster wrote:
 Luckily, most common error values are more or less universal: in
 particular, of all errno values = 34 (up to ERANGE), they are all
 the same on supported platforms except for 11 (which is EAGAIN on
 Windows and Linux, but EDEADLK on Darwin and the *BSDs).
 
 Can EAGAIN or EDEADLK happen?  I don't know is an acceptable answer :)

 No.  Even stuff like EFAULT would be a bug.

Good.

 So, in order to guarantee some portability, only keep a dozen
 possible error codes and squash everything else to EINVAL.
 
 Ugh.  I guess it'll do.
 
 Cleaner solution: Fix the protocol to transmit EPERM, EIO, ... in
 addition to 1, 5, ...

 Why?  It's a binary protocol after all.

Okay, I just succeeded in demonstrating my ignorance %-)

  But I agree that the right
 fix without backwards-compatibility would be to make the errors
 something like 0x8000 to 0x8004.

 Paolo

Two more remarks below.

 If backward compatibility is not an issue: s/in addition to/instead of/.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  nbd.c | 47 +++
  1 file changed, 47 insertions(+)

 diff --git a/nbd.c b/nbd.c
 index eea8c51..1ad5b66 100644
 --- a/nbd.c
 +++ b/nbd.c
 @@ -86,6 +86,37 @@
  #define NBD_OPT_ABORT   (2)
  #define NBD_OPT_LIST(3)
  
 +/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
 + * but only a limited set of errno values is specified in the protocol.
 + * Everything else is squashed to EINVAL.
 + */

 Is the protocol defined anywhere?

https://github.com/yoe/nbd/blob/master/doc/proto.txt

All it has on the error code is this paragraph:

The reply contains three fields: a 32 bit magic number ('magic'), a
32 bit error code ('error'; 0, unless an error occurred in which
case it is the errno of the error on the server side), and the same
64 bit handle that the corresponding request had in its 'handle'
field.  In case the reply is sent in response to a read request and
the error field is 0 (zero), the reply header is immediately
followed by request.len bytes of data.

Could you update it to document the errno compatibility issues, and
recommended practice (i.e. this patch's)?

 +static int system_errno_to_nbd_errno(int err)
 +{
 +switch (err) {
 +case EPERM:
 +return 1;
 +case EIO:
 +return 5;
 +case ENXIO:
 +return 6;
 +case E2BIG:
 +return 7;
 +case ENOMEM:
 +return 12;
 +case EACCES:
 +return 13;
 +case EFBIG:
 +return 27;
 +case ENOSPC:
 +return 28;
 +case EROFS:
 +return 30;
 +case EINVAL:
 +default:
 +return 22;
 +}
 +}
 +
 
 This maps recognized OS errnos to NBD errnos.  The latter are literals.
 
  /* Definitions for opaque data types */
  
  typedef struct NBDRequest NBDRequest;
 @@ -856,6 +887,20 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply 
 *reply)
  reply-error  = be32_to_cpup((uint32_t*)(buf + 4));
  reply-handle = be64_to_cpup((uint64_t*)(buf + 8));
  
 +/* NBD errors should be universally equal to the corresponding
 + * errno values, check it here.
 + */
 +QEMU_BUILD_BUG_ON(EPERM != 1);
 +QEMU_BUILD_BUG_ON(EIO != 5);
 +QEMU_BUILD_BUG_ON(ENXIO != 6);
 +QEMU_BUILD_BUG_ON(E2BIG != 7);
 +QEMU_BUILD_BUG_ON(ENOMEM != 12);
 +QEMU_BUILD_BUG_ON(EACCES != 13);
 +QEMU_BUILD_BUG_ON(EINVAL != 22);
 +QEMU_BUILD_BUG_ON(EFBIG != 27);
 +QEMU_BUILD_BUG_ON(ENOSPC != 28);
 +QEMU_BUILD_BUG_ON(EROFS != 30);
 +
 
 This checks that the mapping above is the identify function for all the
 recognized NBD errnos.  Why is that necessary?

Still curious.  Explain, and earn my R-by :)

 Same literals as above.  Violates DRY.  I don't mind all that much, but
 wonder whether we could at least do the checking next to
 system_errno_to_nbd_errno().

Could we?

  TRACE(Got reply: 
{ magic = 0x%x, .error = %d, handle = % PRIu64 },
magic, reply-error, reply-handle);
 @@ -872,6 +917,8 @@ static ssize_t nbd_send_reply(int csock, struct 
 nbd_reply *reply)
  uint8_t buf[NBD_REPLY_SIZE];
  ssize_t ret;
  
 +reply-error = system_errno_to_nbd_errno(reply-error);
 +
  /* Reply
 [ 0 ..  3]magic   (NBD_REPLY_MAGIC)
 [ 4 ..  7]error   (0 == no error)

Re: [Qemu-devel] [PATCH v6 3/6] Qemu-Xen-vTPM: Xen frontend driver infrastructure

2015-05-08 Thread Stefano Stabellini

On Mon, 4 May 2015, Quan Xu wrote:
 This patch adds infrastructure for xen front drivers living in qemu,
 so drivers don't need to implement common stuff on their own.  It's
 mostly xenbus management stuff: some functions to access XenStore,
 setting up XenStore watches, callbacks on device discovery and state
 changes, and handle event channel between the virtual machines.
 
 Call xen_fe_register() function to register XenDevOps, and make sure,
  [...]
 3 = 
  [...]
  device =  (frontend device, the backend is running in QEMU/.etc)
   vkbd = 
[...]
   vif = 
[...]
 
  ..
 
 (QEMU) xen_vtpmdev_ops is initialized with the following process:
   xen_hvm_init()
 [...]
 --xen_fe_register(vtpm, ...)
   --xenstore_fe_scan()
 --xen_fe_try_init(ops)
   -- XenDevOps.init()
 --xen_fe_get_xendev()
   -- XenDevOps.alloc()
 --xen_fe_check()
   --xen_fe_try_initialise()
 -- XenDevOps.initialise()
   --xen_fe_try_connected()
 -- XenDevOps.connected()
 --xs_watch()
 [...]
 
 Signed-off-by: Quan Xu quan...@intel.com
 
 --Changes in v6:
  -Replace buf_size with PAGE_SIZE and use length rather than
   shr-length.
 ---
  hw/tpm/Makefile.objs |   1 +
  hw/tpm/xen_vtpm_frontend.c   | 315 
 +++
  hw/xen/xen_frontend.c|  20 +++
  include/hw/xen/xen_backend.h |   5 +
  include/hw/xen/xen_common.h  |   6 +
  xen-hvm.c|   5 +
  6 files changed, 352 insertions(+)
  create mode 100644 hw/tpm/xen_vtpm_frontend.c
 
 diff --git a/hw/tpm/Makefile.objs b/hw/tpm/Makefile.objs
 index 99f5983..57919fa 100644
 --- a/hw/tpm/Makefile.objs
 +++ b/hw/tpm/Makefile.objs
 @@ -1,2 +1,3 @@
  common-obj-$(CONFIG_TPM_TIS) += tpm_tis.o
  common-obj-$(CONFIG_TPM_PASSTHROUGH) += tpm_passthrough.o
 +common-obj-$(CONFIG_TPM_XENSTUBDOMS) += xen_vtpm_frontend.o
 diff --git a/hw/tpm/xen_vtpm_frontend.c b/hw/tpm/xen_vtpm_frontend.c
 new file mode 100644
 index 000..d6e7cc6
 --- /dev/null
 +++ b/hw/tpm/xen_vtpm_frontend.c
 @@ -0,0 +1,315 @@
 +/*
 + * Connect to Xen vTPM stubdom domain
 + *
 + *  Copyright (c) 2015 Intel Corporation
 + *  Authors:
 + *Quan Xu quan...@intel.com
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see 
 http://www.gnu.org/licenses/
 + */
 +
 +#include stdio.h
 +#include stdlib.h
 +#include stdarg.h
 +#include string.h
 +#include unistd.h
 +#include signal.h
 +#include inttypes.h
 +#include time.h
 +#include fcntl.h
 +#include errno.h
 +#include sys/ioctl.h
 +#include sys/types.h
 +#include sys/stat.h
 +#include sys/mman.h
 +#include sys/uio.h
 +
 +#include hw/hw.h
 +#include block/aio.h
 +#include hw/xen/xen_backend.h
 +
 +#ifndef XS_STUBDOM_VTPM_ENABLE
 +#define XS_STUBDOM_VTPM_ENABLE1
 +#endif
 +
 +#ifndef PAGE_SIZE
 +#define PAGE_SIZE  4096
 +#endif
 +
 +enum tpmif_state {
 +/* No contents, vTPM idle, cancel complete */
 +TPMIF_STATE_IDLE,
 +/* Request ready or vTPM working */
 +TPMIF_STATE_SUBMIT,
 +/* Response ready or vTPM idle */
 +TPMIF_STATE_FINISH,
 +/* Cancel requested or vTPM working */
 +TPMIF_STATE_CANCEL,
 +};
 +
 +static AioContext *vtpm_aio_ctx;
 +
 +enum status_bits {
 +VTPM_STATUS_RUNNING  = 0x1,
 +VTPM_STATUS_IDLE = 0x2,
 +VTPM_STATUS_RESULT   = 0x4,
 +VTPM_STATUS_CANCELED = 0x8,
 +};
 +
 +struct tpmif_shared_page {
 +/* Request and response length in bytes */
 +uint32_t length;
 +/* Enum tpmif_state */
 +uint8_t  state;
 +/* For the current request */
 +uint8_t  locality;
 +/* Should be zero */
 +uint8_t  pad;
 +/* Extra pages for long packets; may be zero */
 +uint8_t  nr_extra_pages;
 +/*
 + * Grant IDs, the length is actually nr_extra_pages.
 + * beyond the extra_pages entries is the actual request
 + * and response.
 + */
 +uint32_t extra_pages[0];
 +};
 +
 +struct xen_vtpm_dev {
 +struct XenDevice xendev;  /* must be first */
 +struct   tpmif_shared_page *shr;
 +xc_gntshr*xen_xcs;
 +int  ring_ref;
 +int  bedomid;
 +QEMUBH   *sr_bh;
 +};
 +
 +static uint8_t vtpm_status(struct xen_vtpm_dev *vtpmdev)
 +{
 +switch (vtpmdev-shr-state) {
 +case TPMIF_STATE_IDLE:
 +case TPMIF_STATE_FINISH:

Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols

2015-05-08 Thread Kevin Wolf

Am 07.05.2015 um 16:50 hat Paolo Bonzini geschrieben:
 On 07/05/2015 16:34, Kevin Wolf wrote:
  Am 07.05.2015 um 16:16 hat Paolo Bonzini geschrieben:
 
 
  On 07/05/2015 16:07, Kevin Wolf wrote:
  This is not right for two reasons: The first is that this is
  BlockBackend code
 
  I think it would take effect for the qemu-nbd case though.
  
  Oh, you want to change the server code rather than the client?
 
 Yes.

Actually, considering all the information in this thread, I'm inclined
that we should change both sides. qemu-nbd because ENOSPC might be what
clients expect by analogy with Linux block devices, even if the
behaviour for accesses beyond the device size isn't specified in the NBD
protocol and the server might just do anything. As long as the behaviour
is undefined, it's nice to do what most people may expect.

And as the real fix change the nbd client, because even if new qemu-nbd
versions will be nice, we shouldn't rely on undefined behaviour. We know
that old qemu-nbd servers won't produce ENOSPC and I'm not sure what
other NBD servers do.

  Wait... Are you saying that NBD sends a (platform specific) errno value
  over the network? :-/
 
 Yes. :/  That said, at least the error codes that Linux places in
 /usr/include/asm/errno-base.h seem to be pretty much standard---at least
 Windows and most Unices share them---with the exception of EAGAIN.
 
 I'll send a patch to NBD to standardize the set of error codes that it
 sends.

Thanks, that will be helpful in the future.

Is this the right place to look up the spec?
http://sourceforge.net/p/nbd/code/ci/master/tree/doc/proto.txt

If so, the commands seem to be hopelessly underspecified, especially
with respect to error conditions. And where it says something about
errors, it doesn't make sense: The server is forbidden to reply to a
NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)

  In theory, what error code the NBD server needs to send should be
  specified by the NBD protocol. Am I right to assume that it doesn't do
  that?
 
 Nope.
 
  In any case, I'm not sure whether qemu's internal error code
  should change just for NBD. Producing the right error code for the
  protocol is the job of nbd_co_receive_request().
 
 Ok, so it shouldn't reach blk_check_request at all.  But then, we should
 aim at making blk_check_request's checks assertions.

Sounds fair as a goal, but I don't think all devices have such checks
yet. We've fixed the most common devices (IDE, scsi-disk and virtio-blk)
just a while ago.

  and it wouldn't even take effect for the qcow2 case
  where we're writing past EOF only on the protocol layer. The second is
  that -ENOSPC is only for writes and not for reads.
 
  This is right.
 
  Reads in the kernel return 0, but in QEMU we do not want that.  The code
  currently returns -EIO, but perhaps -EINVAL is a better match.  It also
  happens to be what Linux returns for discards.
  
  Perhaps it is, yes. It shouldn't make a difference for guests anyway.
  (Unlike -ENOSPC for writes, which would trigger werror=enospc! That's
  most likely not what we want.)
 
 Yes, we want the check duplicated in all BlockBackend users.  Most of
 them already do it, see the work that Markus did last year I think.

I wouldn't call it duplicated because the action to take is different
for each device, but yes, the check belongs there.

Kevin

Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols

2015-05-08 Thread Paolo Bonzini



On 08/05/2015 12:08, Kevin Wolf wrote:
 Actually, considering all the information in this thread, I'm inclined
 that we should change both sides. qemu-nbd because ENOSPC might be what
 clients expect by analogy with Linux block devices, even if the
 behaviour for accesses beyond the device size isn't specified in the NBD
 protocol and the server might just do anything. As long as the behaviour
 is undefined, it's nice to do what most people may expect.
 
 And as the real fix change the nbd client, because even if new qemu-nbd
 versions will be nice, we shouldn't rely on undefined behaviour. We know
 that old qemu-nbd servers won't produce ENOSPC and I'm not sure what
 other NBD servers do.

Sounds like a plan.

 Thanks, that will be helpful in the future.
 
 Is this the right place to look up the spec?
 http://sourceforge.net/p/nbd/code/ci/master/tree/doc/proto.txt

Yes.

 If so, the commands seem to be hopelessly underspecified, especially
 with respect to error conditions. And where it says something about
 errors, it doesn't make sense: The server is forbidden to reply to a
 NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)

So does nbd-server. O:-)  Looks like you're reading the spec too
literally (which is never a bad thing).

 Ok, so it shouldn't reach blk_check_request at all.  But then, we should
 aim at making blk_check_request's checks assertions.
 
 Sounds fair as a goal, but I don't think all devices have such checks
 yet. We've fixed the most common devices (IDE, scsi-disk and virtio-blk)
 just a while ago.

Indeed (aim at).

Paolo

Re: [Qemu-devel] [PATCH] block: return EPERM on writes or discards to read-only devices

2015-05-08 Thread Stefan Hajnoczi

On Thu, May 07, 2015 at 05:45:48PM +0200, Paolo Bonzini wrote:
 This is the behavior in the operating system, for example Linux's
 blkdev_write_iter has the following:
 
 if (bdev_read_only(I_BDEV(bd_inode)))
 return -EPERM;
 
 This does not apply to opening a device for read/write, when the
 device only supports read-only operation.  In this case any of
 EACCES, EPERM or EROFS is acceptable depending on why writing is
 not possible.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  block/io.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


pgpHaZFSXh31k.pgp
Description: PGP signature

Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols

2015-05-08 Thread Kevin Wolf

Am 08.05.2015 um 12:16 hat Paolo Bonzini geschrieben:
 On 08/05/2015 12:08, Kevin Wolf wrote:
  If so, the commands seem to be hopelessly underspecified, especially
  with respect to error conditions. And where it says something about
  errors, it doesn't make sense: The server is forbidden to reply to a
  NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)
 
 So does nbd-server. O:-)  Looks like you're reading the spec too
 literally (which is never a bad thing).

I don't think there is something like reading a spec too literally.
Specs are meant to be read literally. If a specification is open to
interpretation, you don't need it. So I'd rather say I've found a bug
in the spec. ;-)

As you already seem to be working on the NBD mailing list, do you want
to fix this, or should I subscribe and send a patch myself?

Kevin

[Qemu-devel] eventfd_signal()

2015-05-08 Thread Catalin Vasile

Is eventfd blocking? By that I mean: If I signal the guest for
something, will that thread remain blocked till the guest clears the
interrupt?

[Qemu-devel] [Bug 1349972] Re: qcow2-refcount: qemu-io crashes on 'discard' command

2015-05-08 Thread Kevin Wolf

I sent a patch that fixes the original problem that Maria reported. It's
hard to say whether this is the same problem as you saw, Sam, but it's
quite possible.

** Changed in: qemu
   Status: New = In Progress

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1349972

Title:
   qcow2-refcount: qemu-io crashes on 'discard' command

Status in QEMU:
  In Progress

Bug description:
  qemu-io is killed by SIGIOT at the 'discard' command on the image
  having no refcount information.

  Sequence:
  1. Unpack test.img and backing_img.qed in the same directory (see the 
attached archives for images)
  2. Make a copy of test.img to copy.img (qemu-io modifies the image before 
being kill, therefore the image backup is necessary)
  3. Run the command

  qemu-io copy.img -c 'discard 2210816 2856448'

  Result: qemu-io is killed by SIGIOT with the reason:

  qemu-io: block/qcow2-refcount.c:468: update_refcount_discard:
  Assertion `d-bytes + length == new_end - new_start' failed.

  
  The image was generated by the image fuzzer.

  qemu.git HEAD: 1d80eb7a680d

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1349972/+subscriptions

[Qemu-devel] [PATCH] net: Change help text to list -netdev instead of -net by default

2015-05-08 Thread Thomas Huth

Looking at the output of qemu-system-xxx -help, you easily get
the impression that -net is the preferred way instead of -netdev
to specify host network interface, since the -net option is
omnipresent but the -netdev option is only listed as a one-liner
at the end. This is ugly since -net is considered as legacy and
even might be removed one day. Thus, this patch switches the output
to explain the host network interfaces with the -netdev option
instead, moving the legacy -net option into some few lines at
the end.

Signed-off-by: Thomas Huth th...@redhat.com
---
 qemu-options.hx | 66 ++---
 1 file changed, 39 insertions(+), 27 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index ec356f6..2cab5f3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1449,25 +1449,25 @@ DEF(smb, HAS_ARG, QEMU_OPTION_smb, , QEMU_ARCH_ALL)
 #endif
 #endif
 
-DEF(net, HAS_ARG, QEMU_OPTION_net,
--net 
nic[,vlan=n][,macaddr=mac][,model=type][,name=str][,addr=str][,vectors=v]\n
-create a new Network Interface Card and connect it to 
VLAN 'n'\n
+DEF(netdev, HAS_ARG, QEMU_OPTION_netdev,
 #ifdef CONFIG_SLIRP
--net 
user[,vlan=n][,name=str][,net=addr[/mask]][,host=addr][,restrict=on|off]\n
+-netdev user,id=str[,net=addr[/mask]][,host=addr][,restrict=on|off]\n
  
[,hostname=host][,dhcpstart=addr][,dns=addr][,dnssearch=domain][,tftp=dir]\n
  [,bootfile=f][,hostfwd=rule][,guestfwd=rule]
 #ifndef _WIN32
  [,smb=dir[,smbserver=addr]]\n
 #endif
-connect the user mode network stack to VLAN 'n', 
configure its\n
-DHCP server and enabled optional services\n
+enable a user mode network interface, identified by 
'str',\n
+configure its DHCP server and enabled optional services\n
 #endif
 #ifdef _WIN32
--net tap[,vlan=n][,name=str],ifname=name\n
-connect the host TAP network interface to VLAN 'n'\n
+-netdev tap,id=str,ifname=name\n
+enable a host TAP network interface with ID 'str'\n
 #else
--net 
tap[,vlan=n][,name=str][,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]\n
-connect the host TAP network interface to VLAN 'n'\n
+-netdev 
tap,id=str[,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile]\n
+ 
[,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off]\n
+ 
[,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]\n
+enable a host TAP network interface, identified by 
'str'\n
 use network scripts 'file' (default= 
DEFAULT_NETWORK_SCRIPT )\n
 to configure it and 'dfile' (default= 
DEFAULT_NETWORK_DOWN_SCRIPT )\n
 to deconfigure it\n
@@ -1486,13 +1486,16 @@ DEF(net, HAS_ARG, QEMU_OPTION_net,
 use 'vhostfd=h' to connect to an already opened vhost net 
device\n
 use 'vhostfds=x:y:...:z to connect to multiple already 
opened vhost net devices\n
 use 'queues=n' to specify the number of queues to be 
created for multiqueue TAP\n
--net bridge[,vlan=n][,name=str][,br=bridge][,helper=helper]\n
+-netdev bridge,id=str[,br=bridge][,helper=helper]\n
 connects a host TAP network interface to a host bridge 
device 'br'\n
 (default= DEFAULT_BRIDGE_INTERFACE ) using the program 
'helper'\n
 (default= DEFAULT_BRIDGE_HELPER )\n
 #endif
 #ifdef __linux__
--net 
l2tpv3[,vlan=n][,name=str],src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport],txsession=txsession[,rxsession=rxsession][,ipv6=on/off][,udp=on/off][,cookie64=on/off][,counter][,pincounter][,txcookie=txcookie][,rxcookie=rxcookie][,offset=offset]\n
+-netdev 
l2tpv3,id=str,src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport]\n
+ 
[,rxsession=rxsession],txsession=txsession[,ipv6=on/off][,udp=on/off]\n
+ [,cookie64=on/off][,counter][,pincounter][,txcookie=txcookie]\n
+ [,rxcookie=rxcookie][,offset=offset]\n
 connect the VLAN to an Ethernet over L2TPv3 pseudowire\n
 Linux kernel 3.3+ as well as most routers can talk\n
 L2TPv3. This transport allows connecting a VM to a VM,\n
@@ -1514,32 +1517,39 @@ DEF(net, HAS_ARG, QEMU_OPTION_net,
 use 'pincounter=on' to work around broken counter 
handling in peer\n
 use 'offset=X' to add an extra offset between header and 
data\n
 #endif
--net 
socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n
-connect the vlan 'n' to another VLAN using a

[Qemu-devel] [PATCH v2] qemu-nbd: only send a limited number of errno codes on the wire

2015-05-08 Thread Paolo Bonzini

Right now, NBD includes potentially platform-specific error values in
the wire protocol.

Luckily, most common error values are more or less universal: in
particular, of all errno values = 34 (up to ERANGE), they are all the
same on supported platforms except for 11 (which is EAGAIN on Windows and
Linux, but EDEADLK on Darwin and the *BSDs).  So, in order to guarantee
some portability, only keep a handful of possible error codes and squash
everything else to EINVAL.

This patch defines a limited set of errno values that are valid for the
NBD protocol, and specifies recommendations for what error to return
in specific corner cases.  The set of errno values is roughly based on
the errors listed in the read(2) and write(2) man pages, with some
exceptions:

- ENOMEM is added for servers that implement copy-on-write or other
  formats that require dynamic allocation.

- EDQUOT is not part of the universal set of errors; it can be changed
  to ENOSPC on the wire format.

- EFBIG is part of the universal set of errors, but it is also changed
  to ENOSPC because it is pretty similar to ENOSPC or EDQUOT.

Incoming values will in general match system errno values, but not
on the Hurd which has different errno values (they have a subsystem
code equal to 0x10 in bits 24-31).  The Hurd is probably not something
to which QEMU has been ported, but still do the right thing and
reverse-map the NBD errno values to the system errno values.

The corresponding patch to the NBD protocol description can be found at
http://article.gmane.org/gmane.linux.drivers.nbd.general/3154.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
v1-v2: use #defines and do not expect errnos to match
NBD errnos for the sake of Hurd.  Reduced the list
of errno values even more.  Better commit message
explaining how the list was obtained.

 nbd.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/nbd.c b/nbd.c
index cb1b9bb..57d71b2 100644
--- a/nbd.c
+++ b/nbd.c
@@ -86,6 +86,54 @@
 #define NBD_OPT_ABORT   (2)
 #define NBD_OPT_LIST(3)
 
+/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
+ * but only a limited set of errno values is specified in the protocol.
+ * Everything else is squashed to EINVAL.
+ */
+#define NBD_EPERM  1
+#define NBD_EIO5
+#define NBD_ENOMEM 12
+#define NBD_EINVAL 22
+#define NBD_ENOSPC 28
+
+static int system_errno_to_nbd_errno(int err)
+{
+switch (err) {
+case EPERM:
+return NBD_EPERM;
+case EIO:
+return NBD_EIO;
+case ENOMEM:
+return NBD_ENOMEM;
+#ifdef EDQUOT
+case EDQUOT:
+#endif
+case EFBIG:
+case ENOSPC:
+return NBD_ENOSPC;
+case EINVAL:
+default:
+return NBD_EINVAL;
+}
+}
+
+static int nbd_errno_to_system_errno(int err)
+{
+switch (err) {
+case NBD_EPERM:
+return EPERM;
+case NBD_EIO:
+return EIO;
+case NBD_ENOMEM:
+return ENOMEM;
+case NBD_ENOSPC:
+return ENOSPC;
+case NBD_EINVAL:
+default:
+return EINVAL;
+}
+}
+
 /* Definitions for opaque data types */
 
 typedef struct NBDRequest NBDRequest;
@@ -856,6 +904,8 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply 
*reply)
 reply-error  = be32_to_cpup((uint32_t*)(buf + 4));
 reply-handle = be64_to_cpup((uint64_t*)(buf + 8));
 
+reply-error = nbd_errno_to_system_errno(reply-error);
+
 TRACE(Got reply: 
   { magic = 0x%x, .error = %d, handle = % PRIu64 },
   magic, reply-error, reply-handle);
@@ -872,6 +922,8 @@ static ssize_t nbd_send_reply(int csock, struct nbd_reply 
*reply)
 uint8_t buf[NBD_REPLY_SIZE];
 ssize_t ret;
 
+reply-error = system_errno_to_nbd_errno(reply-error);
+
 /* Reply
[ 0 ..  3]magic   (NBD_REPLY_MAGIC)
[ 4 ..  7]error   (0 == no error)
-- 
2.3.5

Re: [Qemu-devel] [PATCH v6 2/6] Qemu-Xen-vTPM: Xen frontend driver infrastructure

2015-05-08 Thread Stefano Stabellini

On Mon, 4 May 2015, Quan Xu wrote:
 @@ -97,6 +116,7 @@ extern struct XenDevOps xen_kbdmouse_ops; /* 
 xen_framebuffer.c */
  extern struct XenDevOps xen_framebuffer_ops;  /* xen_framebuffer.c */
  extern struct XenDevOps xen_blkdev_ops;   /* xen_disk.c*/
  extern struct XenDevOps xen_netdev_ops;   /* xen_nic.c */
 +extern struct XenDevOps xen_vtpmdev_ops;  /* xen_vtpm_frontend.c*/

This should be in the next patch.

Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description

2015-05-08 Thread Dr. David Alan Gilbert

* Kevin Wolf (kw...@redhat.com) wrote:
 Am 08.05.2015 um 10:42 hat Stefan Hajnoczi geschrieben:
  On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote:
   * Stefan Hajnoczi (stefa...@redhat.com) wrote:
On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote:
 
 
 On 24/04/2015 11:38, Wen Congyang wrote:
   
   That can be done with drive-mirror.  But I think it's too early 
   for that.
  Do you mean use drive-mirror instead of quorum?
 
 Only before starting up a new secondary.  Basically you do a migration
 with non-shared storage, and then start the secondary in colo mode.
 
 But it's only for the failover case.  Quorum (or a new block/colo.c
 driver or filter) is fine for normal colo operation.

Perhaps this patch series should mirror the Secondary's disk to a Backup
Secondary so that the system can be protected very quickly after
failover.

I think anyone serious about fault tolerance would deploy a Backup
Secondary, otherwise the system cannot survive two failures unless a
human administrator is lucky/fast enough to set up a new Secondary.
   
   I'd assumed that a higher level management layer would do the allocation
   of a new secondary after the first failover, so no human need be involved.
  
  That doesn't help, after the first failover is too late even if it's
  done by a program.  There should be no window during which the VM is
  unprotected.
  
  People who want fault tolerance care about 9s of availability.  The VM
  must be protected on the new Primary as soon as the failover occurs,
  otherwise this isn't a serious fault tolerance solution.
 
 If you're worried about two failures in a row, why wouldn't you be
 worried about three in a row? I think if you really want more than one
 backup to be ready, you shouldn't go to two, but to n.

Agreed, if you did multiple secondaries you'd do 'n'.

But 1+2 does satisfy all but the most paranoid; and in particular it does
mean that if you want to take a host down for some maintenance you can
do it without worrying.

But, as I said in my reply to Stefan, doing more than 1+1 gets really hairy;
the combinations of failovers are much more complicated.

Dave
  1) It means that 
  1) As Stefan mentions you get worried about the lack of protection after
the first failover; 
 Kevin


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] block: return EPERM on writes or discards to read-only devices

2015-05-08 Thread Stefan Hajnoczi

On Thu, May 07, 2015 at 05:45:48PM +0200, Paolo Bonzini wrote:
 This is the behavior in the operating system, for example Linux's
 blkdev_write_iter has the following:
 
 if (bdev_read_only(I_BDEV(bd_inode)))
 return -EPERM;
 
 This does not apply to opening a device for read/write, when the
 device only supports read-only operation.  In this case any of
 EACCES, EPERM or EROFS is acceptable depending on why writing is
 not possible.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  block/io.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


pgpYpxPw21sfz.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v2] qemu-nbd: only send a limited number of errno codes on the wire

2015-05-08 Thread Paolo Bonzini



On 08/05/2015 12:12, Markus Armbruster wrote:
 The corresponding patch to the NBD protocol description can be found at
 http://article.gmane.org/gmane.linux.drivers.nbd.general/3154.

[...]

 - EFBIG is part of the universal set of errors, but it is also changed
   to ENOSPC because it is pretty similar to ENOSPC or EDQUOT.
 
 Perhaps debatable, but I defer to your judgement.

EFBIG is weird anyway, and requires you (or your parents) to ignore
SIGXFSZ.  A simpler protocol puts fewer requirements on the client.

(In fact, we probably should ignore SIGXFSZ in QEMU and treat EFBIG like
ENOSPC everywhere.  Should doesn't mean that it will get on anyone's
todo list or priority list...).

 +static int nbd_errno_to_system_errno(int err)
 +{
 +switch (err) {
 +case NBD_EPERM:
 +return EPERM;
 +case NBD_EIO:
 +return EIO;
 +case NBD_ENOMEM:
 +return ENOMEM;
 +case NBD_ENOSPC:
 +return ENOSPC;
 +case NBD_EINVAL:
 +default:
 +return EINVAL;
 +}
 +}
 +
 
 If we reach default, something's amiss, isn't it?  Worth logging
 something then?

Not necessarily.  You could have an older NBD server on the other side,
and then the errno code might only be valid on another platform!

In fact, if the NBD server is qemu-nbd, the producer of the error could
be any driver in block/ for any old version of QEMU.  I would not be
surprised if an EPIPE or ENOTCONN sneaked in somehow.

And there are probably other NBD servers around.  CCing Rich Jones so
that he can fix nbdkit.

Paolo

Re: [Qemu-devel] [PATCH v2 2/3] virtio-mmio: introduce set_guest_notifiers

2015-05-08 Thread Cornelia Huck

On Fri, 08 May 2015 13:30:18 +0300
Pavel Fedin p.fe...@samsung.com wrote:

  Hello!
 
  Yes, I think it makes sense to just pick the low-hanging fruit for
  virtio-mmio and wait for pci.
 
  Does this mean that my series can be accepted as it is? Since PCI is 
 potentially better
 solution, MMIO is a low priority in my project, and i have lots of other 
 tasks. This means
 i unfortunately don't have time for further refactor. If you ACK, i will 
 resend the series
 once again as v3, i set up git send-email and it should be working now.
  I just wanted to share this piece because it's already done, and i would not 
 like it to
 go to oblivion again.

I think (ommitting the refactoring) you also need to care about reset.
(And maybe fold patches 1 and 3.) Should not be too much effort, I hope.

And I'd recommend to cc: MST, so that there is a chance of it not
getting lost in his after-vacation mail avalanche :)

Re: [Qemu-devel] [PATCH] qcow2: Flush pending discards before allocating cluster

2015-05-08 Thread Kevin Wolf

Am 06.05.2015 um 13:29 hat Kevin Wolf geschrieben:
 Before a freed cluster can be reused, pending discards for this cluster
 must be processed.
 
 The original assumption was that this was not a problem because discards
 are only cached during discard/write zeroes operations, which are
 synchronous so that no concurrent write requests can cause cluster
 allocations.
 
 However, the discard/write zeroes operation itself can allocate a new L2
 table (and it has to in order to put zero flags there), so make sure we
 can cope with the situation.
 
 This fixes https://bugs.launchpad.net/bugs/1349972.
 
 Signed-off-by: Kevin Wolf kw...@redhat.com

Cc: qemu-sta...@nongnu.org

Applied to my block branch.

Kevin

Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols

2015-05-08 Thread Paolo Bonzini



On 08/05/2015 12:34, Kevin Wolf wrote:
 Am 08.05.2015 um 12:16 hat Paolo Bonzini geschrieben:
 On 08/05/2015 12:08, Kevin Wolf wrote:
 If so, the commands seem to be hopelessly underspecified, especially
 with respect to error conditions. And where it says something about
 errors, it doesn't make sense: The server is forbidden to reply to a
 NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)

 So does nbd-server. O:-)  Looks like you're reading the spec too
 literally (which is never a bad thing).
 
 I don't think there is something like reading a spec too literally.
 Specs are meant to be read literally. If a specification is open to
 interpretation, you don't need it. So I'd rather say I've found a bug
 in the spec. ;-)

You have.  The bug is a single missing word (successful) reply, but it
is still a bug.

There is another bug, in that it talks about outstanding writes rather
than completed writes.

 As you already seem to be working on the NBD mailing list, do you want
 to fix this, or should I subscribe and send a patch myself?

You've been CCed on the fix.

Paolo

Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description

2015-05-08 Thread Dr. David Alan Gilbert

* Stefan Hajnoczi (stefa...@redhat.com) wrote:
 On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote:
  * Stefan Hajnoczi (stefa...@redhat.com) wrote:
   On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote:


On 24/04/2015 11:38, Wen Congyang wrote:
  
  That can be done with drive-mirror.  But I think it's too early 
  for that.
 Do you mean use drive-mirror instead of quorum?

Only before starting up a new secondary.  Basically you do a migration
with non-shared storage, and then start the secondary in colo mode.

But it's only for the failover case.  Quorum (or a new block/colo.c
driver or filter) is fine for normal colo operation.
   
   Perhaps this patch series should mirror the Secondary's disk to a Backup
   Secondary so that the system can be protected very quickly after
   failover.
   
   I think anyone serious about fault tolerance would deploy a Backup
   Secondary, otherwise the system cannot survive two failures unless a
   human administrator is lucky/fast enough to set up a new Secondary.
  
  I'd assumed that a higher level management layer would do the allocation
  of a new secondary after the first failover, so no human need be involved.
 
 That doesn't help, after the first failover is too late even if it's
 done by a program.  There should be no window during which the VM is
 unprotected.

 People who want fault tolerance care about 9s of availability.  The VM
 must be protected on the new Primary as soon as the failover occurs,
 otherwise this isn't a serious fault tolerance solution.

I'm not aware of any other system that manages that, so I don't
think that's fair.

You gain a lot more availability going from a single
system to the 1+1 system that COLO (or any of the checkpointing systems)
propose, I can't say how many 9s it gets you.  It's true having multiple
secondaries would get you a bit more on top of that, but you're still
a lot better off just having the one secondary.

I had thought that having 1 secondary would be a nice addition, but it's
a big change everywhere else (e.g. having to maintain multiple migration
streams, dealing with miscompares from multiple hosts).

Dave


 
 Stefan


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description

2015-05-08 Thread Kevin Wolf

Am 08.05.2015 um 10:42 hat Stefan Hajnoczi geschrieben:
 On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote:
  * Stefan Hajnoczi (stefa...@redhat.com) wrote:
   On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote:


On 24/04/2015 11:38, Wen Congyang wrote:
  
  That can be done with drive-mirror.  But I think it's too early 
  for that.
 Do you mean use drive-mirror instead of quorum?

Only before starting up a new secondary.  Basically you do a migration
with non-shared storage, and then start the secondary in colo mode.

But it's only for the failover case.  Quorum (or a new block/colo.c
driver or filter) is fine for normal colo operation.
   
   Perhaps this patch series should mirror the Secondary's disk to a Backup
   Secondary so that the system can be protected very quickly after
   failover.
   
   I think anyone serious about fault tolerance would deploy a Backup
   Secondary, otherwise the system cannot survive two failures unless a
   human administrator is lucky/fast enough to set up a new Secondary.
  
  I'd assumed that a higher level management layer would do the allocation
  of a new secondary after the first failover, so no human need be involved.
 
 That doesn't help, after the first failover is too late even if it's
 done by a program.  There should be no window during which the VM is
 unprotected.
 
 People who want fault tolerance care about 9s of availability.  The VM
 must be protected on the new Primary as soon as the failover occurs,
 otherwise this isn't a serious fault tolerance solution.

If you're worried about two failures in a row, why wouldn't you be
worried about three in a row? I think if you really want more than one
backup to be ready, you shouldn't go to two, but to n.

Kevin


pgp6wi8xJKZQC.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH] qemu-nbd: only send a limited number of errno codes on the wire

2015-05-08 Thread Paolo Bonzini



On 08/05/2015 11:32, Markus Armbruster wrote:
 +/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
 + * but only a limited set of errno values is specified in the protocol.
 + * Everything else is squashed to EINVAL.
 + */

 Is the protocol defined anywhere?
 
 https://github.com/yoe/nbd/blob/master/doc/proto.txt
 
 All it has on the error code is this paragraph:
 
 The reply contains three fields: a 32 bit magic number ('magic'), a
 32 bit error code ('error'; 0, unless an error occurred in which
 case it is the errno of the error on the server side), and the same
 64 bit handle that the corresponding request had in its 'handle'
 field.  In case the reply is sent in response to a read request and
 the error field is 0 (zero), the reply header is immediately
 followed by request.len bytes of data.
 
 Could you update it to document the errno compatibility issues, and
 recommended practice (i.e. this patch's)?

Yes, I've sent a patch yesterday.

 +static int system_errno_to_nbd_errno(int err)
 +{
 +switch (err) {
 +case EPERM:
 +return 1;
 +case EIO:
 +return 5;
 +case ENXIO:
 +return 6;
 +case E2BIG:
 +return 7;
 +case ENOMEM:
 +return 12;
 +case EACCES:
 +return 13;
 +case EFBIG:
 +return 27;
 +case ENOSPC:
 +return 28;
 +case EROFS:
 +return 30;
 +case EINVAL:
 +default:
 +return 22;
 +}
 +}
 +

 This maps recognized OS errnos to NBD errnos.  The latter are literals.

  /* Definitions for opaque data types */
  
  typedef struct NBDRequest NBDRequest;
 @@ -856,6 +887,20 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply 
 *reply)
  reply-error  = be32_to_cpup((uint32_t*)(buf + 4));
  reply-handle = be64_to_cpup((uint64_t*)(buf + 8));
  
 +/* NBD errors should be universally equal to the corresponding
 + * errno values, check it here.
 + */
 +QEMU_BUILD_BUG_ON(EPERM != 1);
 +QEMU_BUILD_BUG_ON(EIO != 5);
 +QEMU_BUILD_BUG_ON(ENXIO != 6);
 +QEMU_BUILD_BUG_ON(E2BIG != 7);
 +QEMU_BUILD_BUG_ON(ENOMEM != 12);
 +QEMU_BUILD_BUG_ON(EACCES != 13);
 +QEMU_BUILD_BUG_ON(EINVAL != 22);
 +QEMU_BUILD_BUG_ON(EFBIG != 27);
 +QEMU_BUILD_BUG_ON(ENOSPC != 28);
 +QEMU_BUILD_BUG_ON(EROFS != 30);
 +

 This checks that the mapping above is the identify function for all the
 recognized NBD errnos.  Why is that necessary?
 
 Still curious.  Explain, and earn my R-by :)

Because block/nbd.c expects host errnos, and I was too lazy to add a
switch and a nbd_errno_to_system_errno function.  But Eric pointed out
that Hurd has weird errnos (the low bits match, but there's a 0x10
subsystem code in bits 24-31) so I'll add it.

 Same literals as above.  Violates DRY.  I don't mind all that much, but
 wonder whether we could at least do the checking next to
 system_errno_to_nbd_errno().
 
 Could we?

Yes, s/could/should/.  Also, should have added RFC to the patch.

Paolo

  TRACE(Got reply: 
{ magic = 0x%x, .error = %d, handle = % PRIu64 },
magic, reply-error, reply-handle);
 @@ -872,6 +917,8 @@ static ssize_t nbd_send_reply(int csock, struct 
 nbd_reply *reply)
  uint8_t buf[NBD_REPLY_SIZE];
  ssize_t ret;
  
 +reply-error = system_errno_to_nbd_errno(reply-error);
 +
  /* Reply
 [ 0 ..  3]magic   (NBD_REPLY_MAGIC)
 [ 4 ..  7]error   (0 == no error)

[Qemu-devel] [PATCH] qemu-nbd: return ENOSPC for out-of-range writes

2015-05-08 Thread Paolo Bonzini

This ensures that werror=enospc works fine for NBD-backed devices.
Recovery can be done through live snapshots even if the NBD server
does not support online resizing.

Suggested-by: Kevin Wolf kw...@redhat.com
Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 nbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/nbd.c b/nbd.c
index 57d71b2..a04ba80 100644
--- a/nbd.c
+++ b/nbd.c
@@ -1297,7 +1297,8 @@ static void nbd_trip(void *opaque)
 request.from, request.len,
 (uint64_t)exp-size, (uint64_t)exp-dev_offset);
 LOG(requested operation past EOF--bad client?);
-goto invalid_request;
+reply.error = (command == NBD_CMD_WRITE) ? ENOSPC : EINVAL;
+goto error_reply;
 }
 
 switch (command) {
@@ -1390,7 +1391,6 @@ static void nbd_trip(void *opaque)
 break;
 default:
 LOG(invalid request type (%u) received, request.type);
-invalid_request:
 reply.error = EINVAL;
 error_reply:
 if (nbd_co_send_reply(req, reply, 0)  0) {
-- 
2.3.5

Re: [Qemu-devel] [PATCH v2 2/3] virtio-mmio: introduce set_guest_notifiers

2015-05-08 Thread Cornelia Huck

On Fri, 08 May 2015 09:45:00 +0300
Pavel Fedin p.fe...@samsung.com wrote:

  Hello!
 
  Hm, weren't there some patches for irqfd on arm?
 
  Yes, there were. However, they had a design problem by breaking backwards 
 compatibility
 with unmodified virtio. Their idea was to set up one more shared memory area 
 between
 virtio and vhost-net and use it to pass ISR value, which helps to 
 distinguish, which event
 took place (queue update or config change). So, this idea was rejected.
  http://lists.gnu.org/archive/html/qemu-devel/2014-10/msg03056.html
 
  I thought about it, and technically, i think, this can be done in better 
 way. Actually,
 as far as i understood, all we need is mechanism for distinguishing between 
 these two
 events. On PCI we do this by using multiple IRQs via MSI-X, and one IRQ 
 signals exactly
 one type of event. MSI-X code also has two IRQs mode as a failsafe, where 
 one IRQ
 signals config change and another IRQ signals queues update (and all queues 
 are polled in
 turn). I think a similar thing could be done for virtio-mmio. It could 
 allocate two IRQs
 instead of one and describe both of them in the device tree. Guest side, upon 
 seeing that,
 could make use of those two IRQs and acknowledge to the host side that yes, 
 i am new
 version and use new mode.
  But, sorry, i will unlikely implement this, because we already have PCI with 
 MSI-X (i
 hope this is going to be published soon), so my project can use PCI 
 emulation. So
 implementing irqfds for virtio-mmio is a bit out of my scope.

Thanks for the explanation.

Yes, I think it makes sense to just pick the low-hanging fruit for
virtio-mmio and wait for pci.

[Qemu-devel] [virtio] use virtqueue async

2015-05-08 Thread Catalin Vasile

Can I use a virtqueue asynchronously between guest and backend?
By that I mean that any end of the communication channel (virtqueue)
can push/pop how much data he wants anytime, without worrying that it
will pop a thing that it pushed on the queue, and in fact the other
end should have popped it? Or do I have to use 2 virtqueues for this
to happen?
What I'm trying to do is to process jobs in the backend continuously
and then send a little more detailed result of the job. The job itself
will run async on some piece of hardware while I continue popping jobs
from the queue (the backend itself will just schedule them, will not
busy wait till they get done). When a job is done, a callback will be
called which will push the result one the virtqueue.
What I am worrying about is what if I use the same queue for job
request and job result? Is there a possibility for a job result to be
processed by the same entity that put it there, instead of the other
end?

Re: [Qemu-devel] [Qemu-block] [PATCH v3] block/vdi: Use bdrv_flush after metadata updates

2015-05-08 Thread Kevin Wolf

Am 07.05.2015 um 06:04 hat Zhe Qiu geschrieben:
 From: phoeagon phoea...@gmail.com
 
 In reference to 
 b0ad5a455d7e5352d4c86ba945112011dbeadfb8~078a458e077d6b0db262c4b05fee51d01de2d1d2,
  metadata writes to qcow2/cow/qcow/vpc/vmdk are all synced prior to 
 succeeding writes.

The justification for these patches (in 2010!) was more or less that we
didn't know whether the implementations were safe without the flushes.
Many of the flushes added back then have been removed again until today
because they have turned out to be unnecessary.

 Only when write is successful that bdrv_flush is called.
 
 Signed-off-by: Zhe Qiu phoea...@gmail.com

Please describe why VDI needs this flush to be safe. This is best done
by describing a case where not doing the flush would lead to image
corruption in case of a crash.

To my knowledge, the VDI driver is completely safe without it.

Kevin

Re: [Qemu-devel] [Qemu-block] [PATCH] vmdk: Fix overflow if l1_size is 0x20000000

2015-05-08 Thread Kevin Wolf

Am 06.05.2015 um 19:50 hat Max Reitz geschrieben:
 On 05.05.2015 11:28, Fam Zheng wrote:
 Richard Jones caught this bug with afl fuzzer.
 
 In fact, that's the only possible value to overflow (extent-l1_size =
 0x2000) l1_size:
 
 l1_size = extent-l1_size * sizeof(long) = 0x8000;
 
 g_try_malloc returns NULL because l1_size is interpreted as negative
 during type casting from 'int' to 'gsize', which yields a enormous
 value. Hence, by coincidence, we get a not too bad behavior:
 
 qemu-img: Could not open '/tmp/afl6.img': Could not open
 '/tmp/afl6.img': Cannot allocate memory
 
 Values larger than 0x2000 will be refused by the validation in
 vmdk_add_extent.
 
 Values smaller than 0x2000 will not overflow l1_size.
 
 Reported-by: Richard W.M. Jones rjo...@redhat.com
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
   block/vmdk.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)
 
 Okay, so because size_t is unsigned, this will not overflow on
 systems with sizeof(size_t) == 4 either.
 
 Reviewed-by: Max Reitz mre...@redhat.com

Thanks, applied to the block branch. I think this one is for stable,
too.

Kevin

Re: [Qemu-devel] [Qemu-block] [PATCH] qcow2: do lazy allocation of the L2 cache

2015-05-08 Thread Kevin Wolf

Am 08.05.2015 um 11:00 hat Alberto Garcia geschrieben:
 On Fri 24 Apr 2015 03:04:06 PM CEST, Kevin Wolf kw...@redhat.com wrote:
 
   I think it would be nice to have a way to free unused cache
   entries after a while.
  
   Do you think mmap plus a periodic timer would work?
  
   I'm hesitant about changes like this because they make QEMU more
   complex, slow down the guest, and make the memory footprint
   volatile.  But if a simple solution addresses the problem, then I'd
   be happy.
  
  I understand. One possible approach would be to use the timer only if
  the cache size requested by the user is large, so
  
   1) we only try to save memory when the amount of memory to save is
  significant.
   2) we don't make the default scenario slower.
  
  I'll try to see if I can come up with a good solution (and with more
  numbers).
 
  I suspect that at this point, Anthony would have reminded us about
  separation of policy and mechanism. The very least we need is some
  option to control the policy (probably defaulting to no automatic
  cache size changes).
 
 There's also the problem that with a single chunk of memory for all
 cache tables it's not so easy to free individual entries.

That's one of the reasons why I suggested to fix the problem by using
mmap() for the individual entries rather than allocating one big chunk.

 One possible solution would be madvise() with MADV_DONTNEED, and that's
 rather simple to use, but I'm unsure about its portability.

Probably not portable at all, but the worst that can happen is that a
cache doesn't shrink, i.e. behaves like today. Seems good enough.

  The other option would be to let the management tool change the cache
  size options at runtime (and as it happens, my current bdrv_reopen()
  overhaul allows just that - except that a QMP interface isn't planned
  yet), but I'm not sure if it has enough information to make good
  decisions. If we know what the criteria are, they could possibly be
  exposed, though.
 
 But what would you do there? Recreate the whole cache?

Recreating the cache is the easiest solution (and it is what my reopen
patches do) and the overhead is probably neglegible as long as you don't
resize the cache all the time.

If we really need it, resizing the cache while maintaining its content
would be possible, too. Extending shouldn't be a problem at all (with
your patches you'd have to rehash the entries, but that's it), and for
shrinking you would have to evict some entries (based on the algorithm
that is used in other cases as well). Neither sounds particularly
complicated, it's just some work that needs to be done if we ever get a
requirement like this.

Kevin

Re: [Qemu-devel] [PATCH v6 00/22] Generate ACPI v5.1 tables and expose them to guest over fw_cfg on ARM

2015-05-08 Thread Shannon Zhao

Hi Peter,

On 2015/5/8 0:34, Peter Maydell wrote:
 Hi. I've had a look through these patches now, and sent some mails
 for a couple of specific issues. For the rest, this is what I'd
 like to see for me to pull this in to target-arm:
 

Thanks for your review comments.

 (1) acks or reviewed-by from one of the QEMU ACPI folk for any
 patch that's touching generic ACPI code.

Yeah, Igor said he will give reviewed-by for these patches.

 (2) series should pass checkpatch.pl (you have a lot of overlong lines
 warnings at the moment).

Sorry, I will fix patch 07/22， but to patch 01/22， as I just move the
file, maybe I don't need to deal the formats error of acpi-defs.h. Right?

 (3) we shouldn't generate any ACPI tables or information where the
 corresponding kernel code hasn't been accepted into mainline or
 the arm64 tree; if there's anything here which isn't covered by
 the accepted ACPI patches we should postpone that til the kernel
 parts are in.
 

There are uart, virtio-mmio, PCIe whose kernel patches are not accepted
into kernel mainline.
For PCIe I think this is harmless. The PCIe ACPI table's format don't
need to change even though the kernel code changes because it's defined
by PCI firmware SPEC.
For virtio-mmio the kernel code just add acpi_match_table for kernel
driver to probe the device by ACPI. To be honestly the only thing will
be changed is the _HID and this change is very small. It's same with uart.

In a word, the only thing will be changed is the _HID of virtio-mmio and
uart. But if we get these tables in, it's useful for people to test the
corresponding kernel code.

 If item (3) seems too awkward or harsh as a criterion I'm
 open to discussion about it, but my feeling is that I'd
 rather not start emitting tables until they're definitely
 nailed down in format by being in the kernel.
 
 Finally, if you have some prebuilt images and command lines that
 would be helpful, since I can put them into my local set of
 things I run to check for breakage before making ARM pull requests.

Ok.

-- 
Shannon

[Qemu-devel] [RFC v1 PATCH 0/3] cpus: Convert cpu_index into a bitmap

2015-05-08 Thread Bharata B Rao

This patch changes the way cpu_index is handed out to newly created
CPUs by tracking the allocted CPUs in a bitmap. More information and
the need for this patch is described in patch 2/3 of this series. These
generic changes are needed to support CPU hot plug/unplug on PowerPC.

cpu_index is allocated in cpu_exec_init() and freed (during CPU unplug) in
the newly added API cpu_exec_exit(). Currently all architectures call
cpu_exec_init() from instance_init and hence cpu_exec_exit() is called
from instance_finalize for all archs in this series.

Instead of adding instance_finalize to all archs, we could call
cpu_exec_exit() from the parent class (CPUClass.instance_finalize).
However archs like x86 and PowerPC want cpu_exec_init() to be moved to
realizefn from instance_init. Such movement will become easy with the
current approach. Eudardo's suggested alternative of making cpu_exec_exit()
idempotent

(http://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg01241.html)

will also simplify this movement, but I think that would compilicate the
deallocation logic.

Another open question is about handling of holes correctly in the allocated
bitmap to support VM migration after CPU unplug. This was briefly discussed
here:

https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg00560.html

Should cpu_exec_init() API support specifying of a particular cpu_index
in addition to returning the next available cpu_index by default ? I know
that QEMU cmdline semantics for CPU device add/delele haven't been defined
yet, but should we now make provision in cpu_exec_init() to allocate the
required cpu_index ?

Changes in v1
-
- Added Error argument to cpu_exec_init() so that it callers calling
  it from realizefn can collect errors.

v0: https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg02950.html

Bharata B Rao (3):
  cpus: Add Error argument to cpu_exec_init()
  cpus: Convert cpu_index into a bitmap
  ppc: Move cpu_exec_init() call to realize function

 exec.c  | 39 +++
 include/exec/exec-all.h |  2 +-
 include/qom/cpu.h   |  8 
 target-alpha/cpu.c  |  8 +++-
 target-arm/cpu.c|  3 ++-
 target-cris/cpu.c   |  8 +++-
 target-i386/cpu.c   |  8 +++-
 target-lm32/cpu.c   |  8 +++-
 target-m68k/cpu.c   |  8 +++-
 target-microblaze/cpu.c |  8 +++-
 target-mips/cpu.c   |  8 +++-
 target-moxie/cpu.c  |  8 +++-
 target-openrisc/cpu.c   |  8 +++-
 target-ppc/translate_init.c | 15 +--
 target-s390x/cpu.c  |  3 ++-
 target-sh4/cpu.c|  8 +++-
 target-sparc/cpu.c  |  3 ++-
 target-tricore/cpu.c|  7 ++-
 target-unicore32/cpu.c  |  8 +++-
 target-xtensa/cpu.c |  8 +++-
 20 files changed, 153 insertions(+), 23 deletions(-)

-- 
2.1.0

[Qemu-devel] [RFC v1 PATCH 1/3] cpus: Add Error argument to cpu_exec_init()

2015-05-08 Thread Bharata B Rao

Add an Error argument to cpu_exec_init() to let users collect the
error. Change all callers to currently pass NULL error argument. This change
is needed for the following reasons:

- A subsequent commit changes the CPU enumeration logic in cpu_exec_init()
  resulting in cpu_exec_init() to fail if cpu_index values corresponding
  to max_cpus have already been handed out.
- Archs like PowerPC and x86 want cpu_exec_init() to be called from realize
  rather than instance_init. With this change, those architectures
  that can move this call into realize function can do so in a phased
  manner.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
Reviewed-by: David Gibson da...@gibson.dropbear.id.au
---
 exec.c  | 2 +-
 include/exec/exec-all.h | 2 +-
 target-alpha/cpu.c  | 2 +-
 target-arm/cpu.c| 2 +-
 target-cris/cpu.c   | 2 +-
 target-i386/cpu.c   | 2 +-
 target-lm32/cpu.c   | 2 +-
 target-m68k/cpu.c   | 2 +-
 target-microblaze/cpu.c | 2 +-
 target-mips/cpu.c   | 2 +-
 target-moxie/cpu.c  | 2 +-
 target-openrisc/cpu.c   | 2 +-
 target-ppc/translate_init.c | 2 +-
 target-s390x/cpu.c  | 2 +-
 target-sh4/cpu.c| 2 +-
 target-sparc/cpu.c  | 2 +-
 target-tricore/cpu.c| 2 +-
 target-unicore32/cpu.c  | 2 +-
 target-xtensa/cpu.c | 2 +-
 19 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/exec.c b/exec.c
index ae37b98..68b632b 100644
--- a/exec.c
+++ b/exec.c
@@ -519,7 +519,7 @@ void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace 
*as)
 }
 #endif
 
-void cpu_exec_init(CPUArchState *env)
+void cpu_exec_init(CPUArchState *env, Error **errp)
 {
 CPUState *cpu = ENV_GET_CPU(env);
 CPUClass *cc = CPU_GET_CLASS(cpu);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b58cd47..0a3a504 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -88,7 +88,7 @@ void QEMU_NORETURN cpu_io_recompile(CPUState *cpu, uintptr_t 
retaddr);
 TranslationBlock *tb_gen_code(CPUState *cpu,
   target_ulong pc, target_ulong cs_base, int flags,
   int cflags);
-void cpu_exec_init(CPUArchState *env);
+void cpu_exec_init(CPUArchState *env, Error **errp);
 void QEMU_NORETURN cpu_loop_exit(CPUState *cpu);
 int page_unprotect(target_ulong address, uintptr_t pc, void *puc);
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
diff --git a/target-alpha/cpu.c b/target-alpha/cpu.c
index a98b7d8..0a0c21e 100644
--- a/target-alpha/cpu.c
+++ b/target-alpha/cpu.c
@@ -257,7 +257,7 @@ static void alpha_cpu_initfn(Object *obj)
 CPUAlphaState *env = cpu-env;
 
 cs-env_ptr = env;
-cpu_exec_init(env);
+cpu_exec_init(env, NULL);
 tlb_flush(cs, 1);
 
 alpha_translate_init();
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 3ca3fa8..0b27c19 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -369,7 +369,7 @@ static void arm_cpu_initfn(Object *obj)
 static bool inited;
 
 cs-env_ptr = cpu-env;
-cpu_exec_init(cpu-env);
+cpu_exec_init(cpu-env, NULL);
 cpu-cp_regs = g_hash_table_new_full(g_int_hash, g_int_equal,
  g_free, g_free);
 
diff --git a/target-cris/cpu.c b/target-cris/cpu.c
index 16cfba9..8b589ec 100644
--- a/target-cris/cpu.c
+++ b/target-cris/cpu.c
@@ -170,7 +170,7 @@ static void cris_cpu_initfn(Object *obj)
 static bool tcg_initialized;
 
 cs-env_ptr = env;
-cpu_exec_init(env);
+cpu_exec_init(env, NULL);
 
 env-pregs[PR_VR] = ccc-vr;
 
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 3305e09..72e8fa9 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2850,7 +2850,7 @@ static void x86_cpu_initfn(Object *obj)
 static int inited;
 
 cs-env_ptr = env;
-cpu_exec_init(env);
+cpu_exec_init(env, NULL);
 
 object_property_add(obj, family, int,
 x86_cpuid_version_get_family,
diff --git a/target-lm32/cpu.c b/target-lm32/cpu.c
index f8081f5..89b6631 100644
--- a/target-lm32/cpu.c
+++ b/target-lm32/cpu.c
@@ -151,7 +151,7 @@ static void lm32_cpu_initfn(Object *obj)
 static bool tcg_initialized;
 
 cs-env_ptr = env;
-cpu_exec_init(env);
+cpu_exec_init(env, NULL);
 
 env-flags = 0;
 
diff --git a/target-m68k/cpu.c b/target-m68k/cpu.c
index 4cfb725..6a41551 100644
--- a/target-m68k/cpu.c
+++ b/target-m68k/cpu.c
@@ -168,7 +168,7 @@ static void m68k_cpu_initfn(Object *obj)
 static bool inited;
 
 cs-env_ptr = env;
-cpu_exec_init(env);
+cpu_exec_init(env, NULL);
 
 if (tcg_enabled()  !inited) {
 inited = true;
diff --git a/target-microblaze/cpu.c b/target-microblaze/cpu.c
index 67e3182..6b3732d 100644
--- a/target-microblaze/cpu.c
+++ b/target-microblaze/cpu.c
@@ -130,7 +130,7 @@ static void mb_cpu_initfn(Object *obj)
 static bool tcg_initialized;
 
 cs-env_ptr =

[Qemu-devel] [RFC v1 PATCH 2/3] cpus: Convert cpu_index into a bitmap

2015-05-08 Thread Bharata B Rao

Currently CPUState.cpu_index is monotonically increasing and a newly
created CPU always gets the next higher index. The next available
index is calculated by counting the existing number of CPUs. This is
fine as long as we only add CPUs, but there are architectures which
are starting to support CPU removal too. For an architecture like PowerPC
which derives its CPU identifier (device tree ID) from cpu_index, the
existing logic of generating cpu_index values causes problems.

With the currently proposed method of handling vCPU removal by parking
the vCPU fd in QEMU
(Ref: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg02604.html),
generating cpu_index this way will not work for PowerPC.

This patch changes the way cpu_index is handed out by maintaining
a bit map of the CPUs that tracks both addition and removal of CPUs.

The CPU bitmap allocation logic is part of cpu_exec_init() which is
called by instance_init routines of various CPU targets. This patch
also adds corresponding instance_finalize routine if needed for these
CPU targets so that CPU can be marked free when it is removed.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---
 exec.c  | 37 ++---
 include/qom/cpu.h   |  8 
 target-alpha/cpu.c  |  6 ++
 target-arm/cpu.c|  1 +
 target-cris/cpu.c   |  6 ++
 target-i386/cpu.c   |  6 ++
 target-lm32/cpu.c   |  6 ++
 target-m68k/cpu.c   |  6 ++
 target-microblaze/cpu.c |  6 ++
 target-mips/cpu.c   |  6 ++
 target-moxie/cpu.c  |  6 ++
 target-openrisc/cpu.c   |  6 ++
 target-ppc/translate_init.c |  6 ++
 target-s390x/cpu.c  |  1 +
 target-sh4/cpu.c|  6 ++
 target-sparc/cpu.c  |  1 +
 target-tricore/cpu.c|  5 +
 target-unicore32/cpu.c  |  6 ++
 target-xtensa/cpu.c |  6 ++
 19 files changed, 128 insertions(+), 3 deletions(-)

diff --git a/exec.c b/exec.c
index 68b632b..2422948 100644
--- a/exec.c
+++ b/exec.c
@@ -519,21 +519,52 @@ void tcg_cpu_address_space_init(CPUState *cpu, 
AddressSpace *as)
 }
 #endif
 
+#ifndef CONFIG_USER_ONLY
+static DECLARE_BITMAP(cpu_index_map, MAX_CPUMASK_BITS);
+
+static int cpu_get_free_index(Error **errp)
+{
+int cpu = find_first_zero_bit(cpu_index_map, max_cpus);
+
+if (cpu == max_cpus) {
+error_setg(errp, Trying to use more CPUs than allowed max of %d\n,
+max_cpus);
+return max_cpus;
+} else {
+bitmap_set(cpu_index_map, cpu, 1);
+return cpu;
+}
+}
+
+void cpu_exec_exit(CPUState *cpu)
+{
+bitmap_clear(cpu_index_map, cpu-cpu_index, 1);
+}
+#endif
+
 void cpu_exec_init(CPUArchState *env, Error **errp)
 {
 CPUState *cpu = ENV_GET_CPU(env);
 CPUClass *cc = CPU_GET_CLASS(cpu);
-CPUState *some_cpu;
 int cpu_index;
-
 #if defined(CONFIG_USER_ONLY)
+CPUState *some_cpu;
+
 cpu_list_lock();
-#endif
 cpu_index = 0;
 CPU_FOREACH(some_cpu) {
 cpu_index++;
 }
 cpu-cpu_index = cpu_index;
+#else
+Error *local_err = NULL;
+
+cpu_index = cpu-cpu_index = cpu_get_free_index(local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+#endif
 cpu-numa_node = 0;
 QTAILQ_INIT(cpu-breakpoints);
 QTAILQ_INIT(cpu-watchpoints);
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 39f0f19..3611ce5 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -673,6 +673,14 @@ void cpu_watchpoint_remove_all(CPUState *cpu, int mask);
 void QEMU_NORETURN cpu_abort(CPUState *cpu, const char *fmt, ...)
 GCC_FMT_ATTR(2, 3);
 
+#ifndef CONFIG_USER_ONLY
+void cpu_exec_exit(CPUState *cpu);
+#else
+static inline void cpu_exec_exit(CPUState *cpu)
+{
+}
+#endif
+
 #ifdef CONFIG_SOFTMMU
 extern const struct VMStateDescription vmstate_cpu_common;
 #else
diff --git a/target-alpha/cpu.c b/target-alpha/cpu.c
index 0a0c21e..259a04c 100644
--- a/target-alpha/cpu.c
+++ b/target-alpha/cpu.c
@@ -250,6 +250,11 @@ static const TypeInfo ev68_cpu_type_info = {
 .parent = TYPE(ev67),
 };
 
+static void alpha_cpu_finalize(Object *obj)
+{
+cpu_exec_exit(CPU(obj));
+}
+
 static void alpha_cpu_initfn(Object *obj)
 {
 CPUState *cs = CPU(obj);
@@ -305,6 +310,7 @@ static const TypeInfo alpha_cpu_type_info = {
 .parent = TYPE_CPU,
 .instance_size = sizeof(AlphaCPU),
 .instance_init = alpha_cpu_initfn,
+.instance_finalize = alpha_cpu_finalize,
 .abstract = true,
 .class_size = sizeof(AlphaCPUClass),
 .class_init = alpha_cpu_class_init,
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 0b27c19..ef04ae1 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -454,6 +454,7 @@ static void arm_cpu_finalizefn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
 g_hash_table_destroy(cpu-cp_regs);
+cpu_exec_exit(CPU(obj));
 }
 
 static void

[Qemu-devel] [RFC v1 PATCH 3/3] ppc: Move cpu_exec_init() call to realize function

2015-05-08 Thread Bharata B Rao

Move cpu_exec_init() call from instance_init to realize. This allows
any failures from cpu_exec_init() to be handled appropriately.
Also add corresponding cpu_exec_exit() call from unrealize.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
Reviewed-by: David Gibson da...@gibson.dropbear.id.au
---
 target-ppc/translate_init.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index b915697..47fe3d6 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8928,6 +8928,11 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error 
**errp)
 return;
 }
 
+cpu_exec_init(cpu-env, local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
 cpu-cpu_dt_id = (cs-cpu_index / smp_threads) * max_smt
 + (cs-cpu_index % smp_threads);
 #endif
@@ -9141,6 +9146,8 @@ static void ppc_cpu_unrealizefn(DeviceState *dev, Error 
**errp)
 opc_handler_t **table;
 int i, j;
 
+cpu_exec_exit(CPU(dev));
+
 for (i = 0; i  PPC_CPU_OPCODES_LEN; i++) {
 if (env-opcodes[i] == invalid_handler) {
 continue;
@@ -9638,8 +9645,6 @@ static void ppc_cpu_initfn(Object *obj)
 CPUPPCState *env = cpu-env;
 
 cs-env_ptr = env;
-cpu_exec_init(env, NULL);
-cpu-cpu_dt_id = cs-cpu_index;
 
 env-msr_mask = pcc-msr_mask;
 env-mmu_model = pcc-mmu_model;
-- 
2.1.0

Re: [Qemu-devel] [PATCH] qemu-nbd: only send a limited number of errno codes on the wire

2015-05-08 Thread Markus Armbruster

Paolo Bonzini pbonz...@redhat.com writes:

 On 08/05/2015 11:32, Markus Armbruster wrote:
 +/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
 + * but only a limited set of errno values is specified in the protocol.
 + * Everything else is squashed to EINVAL.
 + */

 Is the protocol defined anywhere?
 
 https://github.com/yoe/nbd/blob/master/doc/proto.txt
 
 All it has on the error code is this paragraph:
 
 The reply contains three fields: a 32 bit magic number ('magic'), a
 32 bit error code ('error'; 0, unless an error occurred in which
 case it is the errno of the error on the server side), and the same
 64 bit handle that the corresponding request had in its 'handle'
 field.  In case the reply is sent in response to a read request and
 the error field is 0 (zero), the reply header is immediately
 followed by request.len bytes of data.
 
 Could you update it to document the errno compatibility issues, and
 recommended practice (i.e. this patch's)?

 Yes, I've sent a patch yesterday.

Excellent!

 +static int system_errno_to_nbd_errno(int err)
 +{
 +switch (err) {
 +case EPERM:
 +return 1;
 +case EIO:
 +return 5;
 +case ENXIO:
 +return 6;
 +case E2BIG:
 +return 7;
 +case ENOMEM:
 +return 12;
 +case EACCES:
 +return 13;
 +case EFBIG:
 +return 27;
 +case ENOSPC:
 +return 28;
 +case EROFS:
 +return 30;
 +case EINVAL:
 +default:
 +return 22;
 +}
 +}
 +

 This maps recognized OS errnos to NBD errnos.  The latter are literals.

  /* Definitions for opaque data types */
  
  typedef struct NBDRequest NBDRequest;
 @@ -856,6 +887,20 @@ ssize_t nbd_receive_reply(int csock, struct 
 nbd_reply *reply)
  reply-error  = be32_to_cpup((uint32_t*)(buf + 4));
  reply-handle = be64_to_cpup((uint64_t*)(buf + 8));
  
 +/* NBD errors should be universally equal to the corresponding
 + * errno values, check it here.
 + */
 +QEMU_BUILD_BUG_ON(EPERM != 1);
 +QEMU_BUILD_BUG_ON(EIO != 5);
 +QEMU_BUILD_BUG_ON(ENXIO != 6);
 +QEMU_BUILD_BUG_ON(E2BIG != 7);
 +QEMU_BUILD_BUG_ON(ENOMEM != 12);
 +QEMU_BUILD_BUG_ON(EACCES != 13);
 +QEMU_BUILD_BUG_ON(EINVAL != 22);
 +QEMU_BUILD_BUG_ON(EFBIG != 27);
 +QEMU_BUILD_BUG_ON(ENOSPC != 28);
 +QEMU_BUILD_BUG_ON(EROFS != 30);
 +

 This checks that the mapping above is the identify function for all the
 recognized NBD errnos.  Why is that necessary?
 
 Still curious.  Explain, and earn my R-by :)

 Because block/nbd.c expects host errnos, and I was too lazy to add a
 switch and a nbd_errno_to_system_errno function.  But Eric pointed out
 that Hurd has weird errnos (the low bits match, but there's a 0x10
 subsystem code in bits 24-31) so I'll add it.

Oww!  That's going to hurd...

 Same literals as above.  Violates DRY.  I don't mind all that much, but
 wonder whether we could at least do the checking next to
 system_errno_to_nbd_errno().
 
 Could we?

 Yes, s/could/should/.  Also, should have added RFC to the patch.

Thanks!

Re: [Qemu-devel] [PATCH] block: return EPERM on writes or discards to read-only devices

2015-05-08 Thread Stefan Hajnoczi

On Thu, May 07, 2015 at 05:45:48PM +0200, Paolo Bonzini wrote:
 This is the behavior in the operating system, for example Linux's
 blkdev_write_iter has the following:
 
 if (bdev_read_only(I_BDEV(bd_inode)))
 return -EPERM;
 
 This does not apply to opening a device for read/write, when the
 device only supports read-only operation.  In this case any of
 EACCES, EPERM or EROFS is acceptable depending on why writing is
 not possible.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  block/io.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


pgpZi66XmRLg6.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v2] qemu-nbd: only send a limited number of errno codes on the wire

2015-05-08 Thread Markus Armbruster

Paolo Bonzini pbonz...@redhat.com writes:

 Right now, NBD includes potentially platform-specific error values in
 the wire protocol.

 Luckily, most common error values are more or less universal: in
 particular, of all errno values = 34 (up to ERANGE), they are all the
 same on supported platforms except for 11 (which is EAGAIN on Windows and
 Linux, but EDEADLK on Darwin and the *BSDs).  So, in order to guarantee
 some portability, only keep a handful of possible error codes and squash
 everything else to EINVAL.

 This patch defines a limited set of errno values that are valid for the
 NBD protocol, and specifies recommendations for what error to return
 in specific corner cases.  The set of errno values is roughly based on
 the errors listed in the read(2) and write(2) man pages, with some
 exceptions:

 - ENOMEM is added for servers that implement copy-on-write or other
   formats that require dynamic allocation.

 - EDQUOT is not part of the universal set of errors; it can be changed
   to ENOSPC on the wire format.

Makes sense.

 - EFBIG is part of the universal set of errors, but it is also changed
   to ENOSPC because it is pretty similar to ENOSPC or EDQUOT.

Perhaps debatable, but I defer to your judgement.

 Incoming values will in general match system errno values, but not
 on the Hurd which has different errno values (they have a subsystem
 code equal to 0x10 in bits 24-31).  The Hurd is probably not something
 to which QEMU has been ported, but still do the right thing and
 reverse-map the NBD errno values to the system errno values.

 The corresponding patch to the NBD protocol description can be found at
 http://article.gmane.org/gmane.linux.drivers.nbd.general/3154.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
 v1-v2: use #defines and do not expect errnos to match
 NBD errnos for the sake of Hurd.  Reduced the list
 of errno values even more.  Better commit message
 explaining how the list was obtained.

  nbd.c | 52 
  1 file changed, 52 insertions(+)

 diff --git a/nbd.c b/nbd.c
 index cb1b9bb..57d71b2 100644
 --- a/nbd.c
 +++ b/nbd.c
 @@ -86,6 +86,54 @@
  #define NBD_OPT_ABORT   (2)
  #define NBD_OPT_LIST(3)
  
 +/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
 + * but only a limited set of errno values is specified in the protocol.
 + * Everything else is squashed to EINVAL.
 + */
 +#define NBD_EPERM  1
 +#define NBD_EIO5
 +#define NBD_ENOMEM 12
 +#define NBD_EINVAL 22
 +#define NBD_ENOSPC 28
 +
 +static int system_errno_to_nbd_errno(int err)
 +{
 +switch (err) {
 +case EPERM:
 +return NBD_EPERM;
 +case EIO:
 +return NBD_EIO;
 +case ENOMEM:
 +return NBD_ENOMEM;
 +#ifdef EDQUOT
 +case EDQUOT:
 +#endif
 +case EFBIG:
 +case ENOSPC:
 +return NBD_ENOSPC;
 +case EINVAL:
 +default:
 +return NBD_EINVAL;
 +}
 +}
 +
 +static int nbd_errno_to_system_errno(int err)
 +{
 +switch (err) {
 +case NBD_EPERM:
 +return EPERM;
 +case NBD_EIO:
 +return EIO;
 +case NBD_ENOMEM:
 +return ENOMEM;
 +case NBD_ENOSPC:
 +return ENOSPC;
 +case NBD_EINVAL:
 +default:
 +return EINVAL;
 +}
 +}
 +

If we reach default, something's amiss, isn't it?  Worth logging
something then?

  /* Definitions for opaque data types */
  
  typedef struct NBDRequest NBDRequest;
 @@ -856,6 +904,8 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply 
 *reply)
  reply-error  = be32_to_cpup((uint32_t*)(buf + 4));
  reply-handle = be64_to_cpup((uint64_t*)(buf + 8));
  
 +reply-error = nbd_errno_to_system_errno(reply-error);
 +
  TRACE(Got reply: 
{ magic = 0x%x, .error = %d, handle = % PRIu64 },
magic, reply-error, reply-handle);
 @@ -872,6 +922,8 @@ static ssize_t nbd_send_reply(int csock, struct nbd_reply 
 *reply)
  uint8_t buf[NBD_REPLY_SIZE];
  ssize_t ret;
  
 +reply-error = system_errno_to_nbd_errno(reply-error);
 +
  /* Reply
 [ 0 ..  3]magic   (NBD_REPLY_MAGIC)
 [ 4 ..  7]error   (0 == no error)

Reviewed-by: Markus Armbruster arm...@redhat.com

Re: [Qemu-devel] [PATCH v2 2/3] virtio-mmio: introduce set_guest_notifiers

2015-05-08 Thread Pavel Fedin

 Hello!

 Yes, I think it makes sense to just pick the low-hanging fruit for
 virtio-mmio and wait for pci.

 Does this mean that my series can be accepted as it is? Since PCI is 
potentially better
solution, MMIO is a low priority in my project, and i have lots of other tasks. 
This means
i unfortunately don't have time for further refactor. If you ACK, i will resend 
the series
once again as v3, i set up git send-email and it should be working now.
 I just wanted to share this piece because it's already done, and i would not 
like it to
go to oblivion again.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


 -Original Message-
 From: qemu-devel-bounces+p.fedin=samsung@nongnu.org [mailto:qemu-devel-
 bounces+p.fedin=samsung@nongnu.org] On Behalf Of Cornelia Huck
 Sent: Friday, May 08, 2015 1:20 PM
 To: Pavel Fedin
 Cc: qemu-devel@nongnu.org
 Subject: Re: [Qemu-devel] [PATCH v2 2/3] virtio-mmio: introduce 
 set_guest_notifiers
 
 On Fri, 08 May 2015 09:45:00 +0300
 Pavel Fedin p.fe...@samsung.com wrote:
 
   Hello!
 
   Hm, weren't there some patches for irqfd on arm?
 
   Yes, there were. However, they had a design problem by breaking backwards
compatibility
  with unmodified virtio. Their idea was to set up one more shared memory 
  area between
  virtio and vhost-net and use it to pass ISR value, which helps to 
  distinguish, which
event
  took place (queue update or config change). So, this idea was rejected.
   http://lists.gnu.org/archive/html/qemu-devel/2014-10/msg03056.html
 
   I thought about it, and technically, i think, this can be done in better 
  way.
Actually,
  as far as i understood, all we need is mechanism for distinguishing between 
  these two
  events. On PCI we do this by using multiple IRQs via MSI-X, and one IRQ 
  signals
exactly
  one type of event. MSI-X code also has two IRQs mode as a failsafe, where 
  one IRQ
  signals config change and another IRQ signals queues update (and all queues 
  are polled
in
  turn). I think a similar thing could be done for virtio-mmio. It could 
  allocate two
IRQs
  instead of one and describe both of them in the device tree. Guest side, 
  upon seeing
that,
  could make use of those two IRQs and acknowledge to the host side that 
  yes, i am new
  version and use new mode.
   But, sorry, i will unlikely implement this, because we already have PCI 
  with MSI-X (i
  hope this is going to be published soon), so my project can use PCI 
  emulation. So
  implementing irqfds for virtio-mmio is a bit out of my scope.
 
 Thanks for the explanation.

Re: [Qemu-devel] [PATCH v2] qemu-nbd: only send a limited number of errno codes on the wire

2015-05-08 Thread Richard W.M. Jones

On Fri, May 08, 2015 at 12:27:59PM +0200, Paolo Bonzini wrote:
 And there are probably other NBD servers around.  CCing Rich Jones so
 that he can fix nbdkit.

Thanks.  FWIW currently nbdkit will send any arbitrary errno returned
by a system call, and it doesn't do any platform-specific encoding.

I've added this to the to-do list.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Re: [Qemu-devel] [Qemu-block] [PATCH] vmdk: Fix next_cluster_sector for compressed write

2015-05-08 Thread Kevin Wolf

Am 06.05.2015 um 20:01 hat Max Reitz geschrieben:
 On 06.05.2015 14:23, Fam Zheng wrote:
 This fixes the bug introduced by commit c6ac36e (vmdk: Optimize cluster
 allocation).
 
 Sometimes, write_len could be larger than cluster size, because it
 contains both data and marker.  We must advance next_cluster_sector in
 this case, otherwise the image gets corrupted.
 
 Reported-by: Antoni Villalonga qemu-l...@friki.cat
 Signed-off-by: Fam Zheng f...@redhat.com
 
 Reviewed-by: Max Reitz mre...@redhat.com

Thanks, added a Cc: qemu-sta...@nongnu.org line to the commit message
and applied to the block branch.

Kevin

[Qemu-devel] [Bug 1453025] [NEW] remote usbredirection failed, when guest os has more than one vcpus

2015-05-08 Thread lxj

Public bug reported:

I was testing usb3.0 redirection, it worked fine with one vcpu.
But when I set vcpu to a number greater than one, the redirection failed.

Host and guest are Fedora 20 and Windows 7.
Client is remote-viwer.

version: qemu 2.3.0  libvirt 1.2.15 spice 0.12.5

libvirt xml:

vcpu placement='static'2/vcpu
controller type='usb' index='0' model='nec-xhci'
/controller

** Affects: qemu
 Importance: Undecided
 Status: New


** Tags: usbredir

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1453025

Title:
  remote usbredirection failed, when guest os has more than one vcpus

Status in QEMU:
  New

Bug description:
  I was testing usb3.0 redirection, it worked fine with one vcpu.
  But when I set vcpu to a number greater than one, the redirection failed.

  Host and guest are Fedora 20 and Windows 7.
  Client is remote-viwer.

  version: qemu 2.3.0  libvirt 1.2.15 spice 0.12.5

  libvirt xml:

  vcpu placement='static'2/vcpu
  controller type='usb' index='0' model='nec-xhci'
  /controller

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1453025/+subscriptions

[Qemu-devel] [PULL 10/10] scripts: qmp-shell: Add verbose flag

2015-05-08 Thread Luiz Capitulino

From: John Snow js...@redhat.com

Add a verbose flag that shows the QMP command that was
constructed, to allow for later copy/pasting, reference,
debugging, etc.

The QMP is converted from a Python literal to JSON first,
to ensure that it is viable input to the actual QMP parser.

As a side-effect, this JSON output will helpfully show all
the necessary conversions that were performed on the input,
illustrating that True was transformed back into true,
literal values are now escaped with  instead of '', and so on.

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Tested-by: Kashyap Chamarthy kcham...@redhat.com
Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 scripts/qmp/qmp-shell | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 1df2ca7..65280d2 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -195,6 +195,13 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 self.__cli_expr(cmdargs[1:], qmpcmd['arguments'])
 return qmpcmd
 
+def _print(self, qmp):
+jsobj = json.dumps(qmp)
+if self._pp is not None:
+self._pp.pprint(jsobj)
+else:
+print str(jsobj)
+
 def _execute_cmd(self, cmdline):
 try:
 qmpcmd = self.__build_cmd(cmdline)
@@ -206,15 +213,13 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 # For transaction mode, we may have just cached the action:
 if qmpcmd is None:
 return True
+if self._verbose:
+self._print(qmpcmd)
 resp = self.cmd_obj(qmpcmd)
 if resp is None:
 print 'Disconnected'
 return False
-
-if self._pp is not None:
-self._pp.pprint(resp)
-else:
-print resp
+self._print(resp)
 return True
 
 def connect(self):
@@ -250,6 +255,9 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 else:
 return self._execute_cmd(cmdline)
 
+def set_verbosity(self, verbose):
+self._verbose = verbose
+
 class HMPShell(QMPShell):
 def __init__(self, address):
 QMPShell.__init__(self, address)
@@ -327,7 +335,7 @@ def die(msg):
 def fail_cmdline(option=None):
 if option:
 sys.stderr.write('ERROR: bad command-line option \'%s\'\n' % option)
-sys.stderr.write('qemu-shell [ -p ] [ -H ]  UNIX socket path |  TCP 
address:port \n')
+sys.stderr.write('qemu-shell [ -v ] [ -p ] [ -H ]  UNIX socket path |  
TCP address:port \n')
 sys.exit(1)
 
 def main():
@@ -335,6 +343,7 @@ def main():
 qemu = None
 hmp = False
 pp = None
+verbose = False
 
 try:
 for arg in sys.argv[1:]:
@@ -346,6 +355,8 @@ def main():
 if pp is not None:
 fail_cmdline(arg)
 pp = pprint.PrettyPrinter(indent=4)
+elif arg == -v:
+verbose = True
 else:
 if qemu is not None:
 fail_cmdline(arg)
@@ -370,6 +381,7 @@ def main():
 die('Could not connect to %s' % addr)
 
 qemu.show_banner()
+qemu.set_verbosity(verbose)
 while qemu.read_exec_command(qemu.get_prompt()):
 pass
 qemu.close()
-- 
1.9.3

Re: [Qemu-devel] [PATCH RFC v8 5/7] qemu-iotests: s390x: fix test 049

2015-05-08 Thread Max Reitz


On 07.05.2015 09:52, Bo Tu wrote:

when creating an image qemu-img enable us specifying the size of the
image using -o size=xx options. But when we specify an invalid size
such as a negtive size then different platform gives different result.

parse_option_size() function in util/qemu-option.c will be called to
parse the size, a cast was called in the function to cast the input
(saved as a double in the function) size to an unsigned int64 value,
when the input is a negtive value or exceeds the maximum of uint64, then
the result is undefined.

Language spec 6.3.1.4 Real floating and integers:
the result of this assignment/cast is undefined if the float is not
in the open interval (-1, Utype_MAX+1).

Signed-off-by: Bo Tu t...@linux.vnet.ibm.com
---
  tests/qemu-iotests/049.out | 10 --
  util/qemu-option.c |  4 
  2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index 9f93666..b8790f9 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -95,17 +95,15 @@ qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1024
  qemu-img: Image size must be less than 8 EiB!
  
  qemu-img create -f qcow2 -o size=-1024 TEST_DIR/t.qcow2

-qemu-img: qcow2 doesn't support shrinking images yet
-qemu-img: TEST_DIR/t.qcow2: Could not resize image: Operation not supported
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_bits=16
+qemu-img: Parameter 'size' expects a non-negative number below 2^64
+qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
  
  qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1k

  qemu-img: Image size must be less than 8 EiB!
  
  qemu-img create -f qcow2 -o size=-1k TEST_DIR/t.qcow2

-qemu-img: qcow2 doesn't support shrinking images yet
-qemu-img: TEST_DIR/t.qcow2: Could not resize image: Operation not supported
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_bits=16
+qemu-img: Parameter 'size' expects a non-negative number below 2^64
+qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
  
  qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1kilobyte

  qemu-img: Invalid image size specified! You may use k, M, G, T, P or E 
suffixes for
diff --git a/util/qemu-option.c b/util/qemu-option.c
index fda4e5f..6e75698 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -179,6 +179,10 @@ void parse_option_size(const char *name, const char *value,
  
  if (value != NULL) {

  sizef = strtod(value, postfix);
+if (sizef  0 || sizef  UINT64_MAX) {
+error_set(errp, QERR_INVALID_PARAMETER_VALUE, name, a non-negative 
number below 2^64);


scripts/checkpatch.pl:

Konsole output

Konsole output
WARNING: line over 80 characters
#106: FILE: util/qemu-option.c:183:
+error_set(errp, QERR_INVALID_PARAMETER_VALUE, name, a 
non-negative number below 2^64);


Other than that, looks good.

Max


+return;
+}
  switch (*postfix) {
  case 'T':
  sizef *= 1024;

Re: [Qemu-devel] [PATCH RFC v8 7/7] qemu-iotests: s390x: fix test 130

2015-05-08 Thread Max Reitz


On 07.05.2015 09:52, Bo Tu wrote:

The default device id of hard disk on the s390 platform is virtio0
which differs to the ide0-hd0 for the x86 platform. Setting id in
the drive definition, ie:qemu -drive id=testdisk, will be the same
on all platforms.

Signed-off-by: Bo Tu t...@linux.vnet.ibm.com
---
  tests/qemu-iotests/130 | 8 
  tests/qemu-iotests/130.out | 4 ++--
  2 files changed, 6 insertions(+), 6 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com

Re: [Qemu-devel] [PATCH 2/4] migration: move savevm.c inside migration/

2015-05-08 Thread Eric Blake

On 05/08/2015 06:43 AM, Juan Quintela wrote:
 Now, everything is in place.
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  MAINTAINERS| 1 -
  Makefile.target| 4 ++--
  savevm.c = migration/savevm.c | 0
  trace-events   | 2 +-
  4 files changed, 3 insertions(+), 4 deletions(-)
  rename savevm.c = migration/savevm.c (100%)
 

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-block] [PATCH 1/7] qcow2: use one single memory block for the L2/refcount cache tables

2015-05-08 Thread Max Reitz


On 06.05.2015 15:39, Alberto Garcia wrote:

The qcow2 L2/refcount cache contains one separate table for each cache
entry. Doing one allocation per table adds unnecessary overhead and it
also requires us to store the address of each table separately.

Since the size of the cache is constant during its lifetime, it's
better to have an array that contains all the tables using one single
allocation.

In my tests measuring freshly created caches with sizes 128MB (L2) and
32MB (refcount) this uses around 10MB of RAM less.

Signed-off-by: Alberto Garcia be...@igalia.com
---
  block/qcow2-cache.c| 55 --
  block/qcow2-cluster.c  | 12 +--
  block/qcow2-refcount.c |  8 +---
  block/qcow2.h  |  3 ++-
  4 files changed, 39 insertions(+), 39 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com

[Qemu-devel] [PATCH 4/4] block: Include qemu-log.o in block-obj-y

2015-05-08 Thread Dimitris Aragiorgis

Building the QEMU tools fails if we #define DEBUG_BLOCK inside
block/raw-posix.c. This happens because qemu-log.o is missing from
block-obj-y, which causes the link to fail. Fix this.

Signed-off-by: Dimitris Aragiorgis dim...@arrikto.com
---
 Makefile.objs |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile.objs b/Makefile.objs
index 28999d3..98f6e02 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -6,7 +6,7 @@ util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o 
qapi-event.o
 ###
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
-block-obj-y = async.o thread-pool.o
+block-obj-y = async.o thread-pool.o qemu-log.o
 block-obj-y += nbd.o block.o blockjob.o
 block-obj-y += main-loop.o iohandler.o qemu-timer.o
 block-obj-$(CONFIG_POSIX) += aio-posix.o
-- 
1.7.10.4

[Qemu-devel] [PULL 09/10] scripts: qmp-shell: add transaction subshell

2015-05-08 Thread Luiz Capitulino

From: John Snow js...@redhat.com

Add a special processing mode to craft transactions.

By entering transaction( the shell will enter a special
mode where each subsequent command will be saved as a transaction
instead of executed as an individual command.

The transaction can be submitted by entering ) on a line by itself.

Examples:

Separate lines:

(QEMU) transaction(
TRANS block-dirty-bitmap-add node=drive0 name=bitmap1
TRANS block-dirty-bitmap-clear node=drive0 name=bitmap0
TRANS )

With a transaction action included on the first line:

(QEMU) transaction( block-dirty-bitmap-add node=drive0 name=bitmap2
TRANS block-dirty-bitmap-add node=drive0 name=bitmap3
TRANS )

As a one-liner, with just one transaction action:

(QEMU) transaction( block-dirty-bitmap-add node=drive0 name=bitmap0 )

As a side-effect of this patch, blank lines are now parsed as no-ops,
regardless of which shell mode you are in.

Signed-off-by: John Snow js...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Tested-by: Kashyap Chamarthy kcham...@redhat.com
Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 scripts/qmp/qmp-shell | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 7f2c554..1df2ca7 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -73,6 +73,8 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 self._greeting = None
 self._completer = None
 self._pp = pp
+self._transmode = False
+self._actions = list()
 
 def __get_address(self, arg):
 
@@ -159,6 +161,36 @@ class QMPShell(qmp.QEMUMonitorProtocol):
  command-name  [ arg-name1=arg1 ] ... [ arg-nameN=argN ]
 
 cmdargs = cmdline.split()
+
+# Transactional CLI entry/exit:
+if cmdargs[0] == 'transaction(':
+self._transmode = True
+cmdargs.pop(0)
+elif cmdargs[0] == ')' and self._transmode:
+self._transmode = False
+if len(cmdargs)  1:
+raise QMPShellError(Unexpected input after close of 
Transaction sub-shell)
+qmpcmd = { 'execute': 'transaction',
+   'arguments': { 'actions': self._actions } }
+self._actions = list()
+return qmpcmd
+
+# Nothing to process?
+if not cmdargs:
+return None
+
+# Parse and then cache this Transactional Action
+if self._transmode:
+finalize = False
+action = { 'type': cmdargs[0], 'data': {} }
+if cmdargs[-1] == ')':
+cmdargs.pop(-1)
+finalize = True
+self.__cli_expr(cmdargs[1:], action['data'])
+self._actions.append(action)
+return self.__build_cmd(')') if finalize else None
+
+# Standard command: parse and return it to be executed.
 qmpcmd = { 'execute': cmdargs[0], 'arguments': {} }
 self.__cli_expr(cmdargs[1:], qmpcmd['arguments'])
 return qmpcmd
@@ -171,6 +203,9 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 print 'command format: command-name ',
 print '[arg-name1=arg1] ... [arg-nameN=argN]'
 return True
+# For transaction mode, we may have just cached the action:
+if qmpcmd is None:
+return True
 resp = self.cmd_obj(qmpcmd)
 if resp is None:
 print 'Disconnected'
@@ -191,6 +226,11 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 version = self._greeting['QMP']['version']['qemu']
 print 'Connected to QEMU %d.%d.%d\n' % 
(version['major'],version['minor'],version['micro'])
 
+def get_prompt(self):
+if self._transmode:
+return TRANS 
+return (QEMU) 
+
 def read_exec_command(self, prompt):
 
 Read and execute a command.
@@ -330,7 +370,7 @@ def main():
 die('Could not connect to %s' % addr)
 
 qemu.show_banner()
-while qemu.read_exec_command('(QEMU) '):
+while qemu.read_exec_command(qemu.get_prompt()):
 pass
 qemu.close()
 
-- 
1.9.3

[Qemu-devel] [PULL 04/10] qobject: Add a special null QObject

2015-05-08 Thread Luiz Capitulino

From: Markus Armbruster arm...@redhat.com

I'm going to fix the JSON parser to recognize null.  The obvious
representation of JSON null as (QObject *)NULL doesn't work, because
the parser already uses it as an error value.  Perhaps we should
change it to free NULL for null, but that's more than I can do right
now.  Create a special null QObject instead.

The existing QDict, QList, and QString all represent something that
is a pointer in C and could therefore be associated with NULL.  But
right now, all three of these sub-types are always non-null once
created, so the new null sentinel object is intentionally unrelated
to them.

Signed-off-by: Markus Armbruster arm...@redhat.com
Signed-off-by: Eric Blake ebl...@redhat.com
Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 include/qapi/qmp/qobject.h | 11 ++-
 qobject/Makefile.objs  |  2 +-
 qobject/qjson.c|  3 +++
 qobject/qnull.c| 29 +
 4 files changed, 43 insertions(+), 2 deletions(-)
 create mode 100644 qobject/qnull.c

diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
index 0991296..84b2d9f 100644
--- a/include/qapi/qmp/qobject.h
+++ b/include/qapi/qmp/qobject.h
@@ -3,7 +3,7 @@
  *
  * Based on ideas by Avi Kivity a...@redhat.com
  *
- * Copyright (C) 2009 Red Hat Inc.
+ * Copyright (C) 2009, 2015 Red Hat Inc.
  *
  * Authors:
  *  Luiz Capitulino lcapitul...@redhat.com
@@ -37,6 +37,7 @@
 
 typedef enum {
 QTYPE_NONE,/* sentinel value, no QObject has this type code */
+QTYPE_QNULL,
 QTYPE_QINT,
 QTYPE_QSTRING,
 QTYPE_QDICT,
@@ -110,4 +111,12 @@ static inline qtype_code qobject_type(const QObject *obj)
 return obj-type-code;
 }
 
+extern QObject qnull_;
+
+static inline QObject *qnull(void)
+{
+qobject_incref(qnull_);
+return qnull_;
+}
+
 #endif /* QOBJECT_H */
diff --git a/qobject/Makefile.objs b/qobject/Makefile.objs
index c9ff59c..f7595f5 100644
--- a/qobject/Makefile.objs
+++ b/qobject/Makefile.objs
@@ -1,3 +1,3 @@
-util-obj-y = qint.o qstring.o qdict.o qlist.o qfloat.o qbool.o
+util-obj-y = qnull.o qint.o qstring.o qdict.o qlist.o qfloat.o qbool.o
 util-obj-y += qjson.o json-lexer.o json-streamer.o json-parser.o
 util-obj-y += qerror.o
diff --git a/qobject/qjson.c b/qobject/qjson.c
index f2857c1..846733d 100644
--- a/qobject/qjson.c
+++ b/qobject/qjson.c
@@ -127,6 +127,9 @@ static void to_json_list_iter(QObject *obj, void *opaque)
 static void to_json(const QObject *obj, QString *str, int pretty, int indent)
 {
 switch (qobject_type(obj)) {
+case QTYPE_QNULL:
+qstring_append(str, null);
+break;
 case QTYPE_QINT: {
 QInt *val = qobject_to_qint(obj);
 char buffer[1024];
diff --git a/qobject/qnull.c b/qobject/qnull.c
new file mode 100644
index 000..9873e26
--- /dev/null
+++ b/qobject/qnull.c
@@ -0,0 +1,29 @@
+/*
+ * QNull
+ *
+ * Copyright (C) 2015 Red Hat, Inc.
+ *
+ * Authors:
+ *  Markus Armbruster arm...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1
+ * or later.  See the COPYING.LIB file in the top-level directory.
+ */
+
+#include qemu-common.h
+#include qapi/qmp/qobject.h
+
+static void qnull_destroy_obj(QObject *obj)
+{
+assert(0);
+}
+
+static const QType qnull_type = {
+.code = QTYPE_QNULL,
+.destroy = qnull_destroy_obj,
+};
+
+QObject qnull_ = {
+.type = qnull_type,
+.refcnt = 1,
+};
-- 
1.9.3

[Qemu-devel] [PULL 05/10] json-parser: Accept 'null' in QMP

2015-05-08 Thread Luiz Capitulino

From: Eric Blake ebl...@redhat.com

We document that in QMP, the client may send any json-value
for the optional id key, and then return that same value
on reply (both success and failures, insofar as the failure
happened after parsing the id).  [Note that the output may
not be identical to the input, as whitespace may change and
since we may reorder keys within a json-object, but that this
still constitutes the same json-value].  However, we were not
handling the JSON literal null, which counts as a json-value
per RFC 7159.

Also, down the road, given the QAPI schema of {'*foo':'str'} or
{'*foo':'ComplexType'}, we could decide to allow the QMP client
to pass { foo:null } instead of the current representation of
{ } where omitting the key is the only way to get at the default
NULL value.  Such a change might be useful for argument
introspection (if a type in older qemu lacks 'foo' altogether,
then an explicit foo:null probe will force an easily
distinguished error message for whether the optional foo key
is even understood in newer qemu).  And if we add default values
to optional arguments, allowing an explicit null would be
required for getting a NULL value associated with an optional
string that has a non-null default.  But all that can come at a
later day.

The 'check-unit' testsuite is enhanced to test that parsing
produces the same object as explicitly requesting a reference
to the special qnull object.  In addition, I tested with:

$ ./x86_64-softmmu/qemu-system-x86_64 -qmp stdio -nodefaults
{QMP: {version: {qemu: {micro: 91, minor: 2, major: 2}, package: 
}, capabilities: []}}
{execute:qmp_capabilities,id:null}
{return: {}, id: null}
{id:{a:null,b:[1,null]},execute:quit}
{return: {}, id: {a: null, b: [1, null]}}
{timestamp: {seconds: 1427742379, microseconds: 423128}, event: 
SHUTDOWN}

Signed-off-by: Eric Blake ebl...@redhat.com
Signed-off-by: Markus Armbruster arm...@redhat.com
Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 qobject/json-parser.c |  2 ++
 tests/check-qjson.c   | 15 +--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 4288267..717cb8f 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -561,6 +561,8 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
 ret = QOBJECT(qbool_from_int(true));
 } else if (token_is_keyword(token, false)) {
 ret = QOBJECT(qbool_from_int(false));
+} else if (token_is_keyword(token, null)) {
+ret = qnull();
 } else {
 parse_error(ctxt, token, invalid keyword `%s', 
token_get_value(token));
 goto out;
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 95497a0..60e5b22 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -1,6 +1,6 @@
 /*
  * Copyright IBM, Corp. 2009
- * Copyright (c) 2013 Red Hat Inc.
+ * Copyright (c) 2013, 2015 Red Hat Inc.
  *
  * Authors:
  *  Anthony Liguori   aligu...@us.ibm.com
@@ -1005,6 +1005,7 @@ static void keyword_literal(void)
 {
 QObject *obj;
 QBool *qbool;
+QObject *null;
 QString *str;
 
 obj = qobject_from_json(true);
@@ -1041,7 +1042,7 @@ static void keyword_literal(void)
 g_assert(qbool_get_int(qbool) == 0);
 
 QDECREF(qbool);
-
+
 obj = qobject_from_jsonf(%i, true);
 g_assert(obj != NULL);
 g_assert(qobject_type(obj) == QTYPE_QBOOL);
@@ -1050,6 +1051,16 @@ static void keyword_literal(void)
 g_assert(qbool_get_int(qbool) != 0);
 
 QDECREF(qbool);
+
+obj = qobject_from_json(null);
+g_assert(obj != NULL);
+g_assert(qobject_type(obj) == QTYPE_QNULL);
+
+null = qnull();
+g_assert(null == obj);
+
+qobject_decref(obj);
+qobject_decref(null);
 }
 
 typedef struct LiteralQDictEntry LiteralQDictEntry;
-- 
1.9.3

Re: [Qemu-devel] [PATCH 1/4] migration: move ram stuff to migration/ram

2015-05-08 Thread Eric Blake

On 05/08/2015 06:43 AM, Juan Quintela wrote:
 For historic reasons, ram migration have been on arch_init.c.  Just

s/have been on/has been in/

 split it into migration/ram.c, the same that happened with block.c.
 
 There is only code movement, no changes altogether.
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  MAINTAINERS   |1 -
  Makefile.target   |1 +
  arch_init.c   | 1754 
 ++---
  include/migration/migration.h |2 +
  include/sysemu/arch_init.h|1 -
  migration/ram.c   | 1681 +++
  trace-events  |2 +-
  7 files changed, 1747 insertions(+), 1695 deletions(-)
  create mode 100644 migration/ram.c

This diff is harder than necessary to read.  It can be made smaller by
using the --patience argument when creating the diff:

$ git diff HEAD^ --patience --stat
 MAINTAINERS   |1 -
 Makefile.target   |1 +
 arch_init.c   | 1630
---
 include/migration/migration.h |2 +
 include/sysemu/arch_init.h|1 -
 migration/ram.c   | 1681
+
 trace-events  |2 +-
 7 files changed, 1685 insertions(+), 1633 deletions(-)


To make it permanent:
$ git config diff.algorithm patience

At any rate, here's how I reviewed:

$ diff -u (git diff HEAD^ | sed -n 's/^-//p') (git diff HEAD^ | sed -n
's/^\+//p')

You introduced a newline before mig_sleep_cpu(), and changed the text in
the DPRINTF definition, but I can live with that.

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] net: Change help text to list -netdev instead of -net by default

2015-05-08 Thread Paolo Bonzini



On 08/05/2015 11:36, Thomas Huth wrote:
 Looking at the output of qemu-system-xxx -help, you easily get
 the impression that -net is the preferred way instead of -netdev
 to specify host network interface, since the -net option is
 omnipresent but the -netdev option is only listed as a one-liner
 at the end. This is ugly since -net is considered as legacy and
 even might be removed one day. Thus, this patch switches the output
 to explain the host network interfaces with the -netdev option
 instead, moving the legacy -net option into some few lines at
 the end.
 
 Signed-off-by: Thomas Huth th...@redhat.com
 ---
  qemu-options.hx | 66 
 ++---
  1 file changed, 39 insertions(+), 27 deletions(-)

That would be great if it worked for all machine types, but it doesn't.
 For example, there's no equivalent of:

$ qemu-system-arm -net user -net nic -machine versatilepb

You cannot just use -netdev:

$ qemu-system-arm -netdev user,id=nd0 -machine versatilepb
Warning: netdev nd0 has no peer

You cannot use -netdev and -net together:

$ qemu-system-arm -netdev user,id=nd0 -net nic -machine versatilepb
Warning: vlan 0 is not connected to host network
Warning: netdev nd0 has no peer

This particular board has a PCI controller, so you can just add a PCI
NIC using -netdev/-device, but still that wouldn't be a match for -net
user -net nic (which puts a smc91c111 NIC on sysbus).  In most embedded
boards there's not even a PCI controller.

So the patch is great, but I wouldn't say it's deprecated, because in
practice it isn't.

Paolo

Re: [Qemu-devel] [PATCH v3 3/7] qom: create objects in two phases

2015-05-08 Thread Paolo Bonzini



On 08/05/2015 16:37, Andreas Färber wrote:
 Hi,
 
 Can we *please* find a better subject for this? To me, creating QOM
 objects in two phases is about instance_init vs. realize, and thus I was
 pretty upset that Paolo dared to apply this without asking me first.

Oops, sorry.  I very much understand where you came from, now.

Thanks,

Paolo

[Qemu-devel] [PULL 03/10] qobject: Clean up around qtype_code

2015-05-08 Thread Luiz Capitulino

From: Markus Armbruster arm...@redhat.com

QTYPE_NONE is a sentinel value.  No QObject has this type code.
Document it properly.

Fix dump_qobject() to abort() on QTYPE_NONE, just like for any other
invalid type code.

Fix to_json() to abort() on all invalid type codes, not just
QTYPE_MAX.

Clean up Property member qtype's type: it's a qtype_code.

Signed-off-by: Markus Armbruster arm...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 block/qapi.c   | 3 ---
 include/hw/qdev-core.h | 2 +-
 include/qapi/qmp/qobject.h | 2 +-
 qobject/qjson.c| 3 +--
 4 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 063dd1b..18d2b95 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -523,9 +523,6 @@ static void dump_qobject(fprintf_function func_fprintf, 
void *f,
 QDECREF(value);
 break;
 }
-case QTYPE_NONE:
-break;
-case QTYPE_MAX:
 default:
 abort();
 }
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 4e673f9..9a0ee30 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -226,7 +226,7 @@ struct Property {
 PropertyInfo *info;
 int  offset;
 uint8_t  bitnr;
-uint8_t  qtype;
+qtype_code   qtype;
 int64_t  defval;
 int  arrayoffset;
 PropertyInfo *arrayinfo;
diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
index d0bbc7c..0991296 100644
--- a/include/qapi/qmp/qobject.h
+++ b/include/qapi/qmp/qobject.h
@@ -36,7 +36,7 @@
 #include assert.h
 
 typedef enum {
-QTYPE_NONE,
+QTYPE_NONE,/* sentinel value, no QObject has this type code */
 QTYPE_QINT,
 QTYPE_QSTRING,
 QTYPE_QDICT,
diff --git a/qobject/qjson.c b/qobject/qjson.c
index 12c576d..f2857c1 100644
--- a/qobject/qjson.c
+++ b/qobject/qjson.c
@@ -260,9 +260,8 @@ static void to_json(const QObject *obj, QString *str, int 
pretty, int indent)
 }
 case QTYPE_QERROR:
 /* XXX: should QError be emitted? */
-case QTYPE_NONE:
 break;
-case QTYPE_MAX:
+default:
 abort();
 }
 }
-- 
1.9.3

[Qemu-devel] [PULL 02/10] QJSON: Use OBJECT_CHECK

2015-05-08 Thread Luiz Capitulino

From: Eduardo Habkost ehabk...@redhat.com

The QJSON code used casts to (QJSON*) directly, instead of OBJECT_CHECK.
There were even some functions using object_dynamic_cast() calls
followed by assert(), which is exactly what OBJECT_CHECK does (by
calling object_dynamic_cast_assert()).

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 qjson.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/qjson.c b/qjson.c
index 0cda269..e478802 100644
--- a/qjson.c
+++ b/qjson.c
@@ -24,6 +24,8 @@ struct QJSON {
 bool omit_comma;
 };
 
+#define QJSON(obj) OBJECT_CHECK(QJSON, (obj), TYPE_QJSON)
+
 static void json_emit_element(QJSON *json, const char *name)
 {
 /* Check whether we need to print a , before an element */
@@ -87,7 +89,7 @@ const char *qjson_get_str(QJSON *json)
 
 QJSON *qjson_new(void)
 {
-QJSON *json = (QJSON *)object_new(TYPE_QJSON);
+QJSON *json = QJSON(object_new(TYPE_QJSON));
 return json;
 }
 
@@ -98,8 +100,7 @@ void qjson_finish(QJSON *json)
 
 static void qjson_initfn(Object *obj)
 {
-QJSON *json = (QJSON *)object_dynamic_cast(obj, TYPE_QJSON);
-assert(json);
+QJSON *json = QJSON(obj);
 
 json-str = qstring_from_str({ );
 json-omit_comma = true;
@@ -107,9 +108,8 @@ static void qjson_initfn(Object *obj)
 
 static void qjson_finalizefn(Object *obj)
 {
-QJSON *json = (QJSON *)object_dynamic_cast(obj, TYPE_QJSON);
+QJSON *json = QJSON(obj);
 
-assert(json);
 qobject_decref(QOBJECT(json-str));
 }
 
-- 
1.9.3

1 2 3 4 >

1 - 100 of 343 matches

Mail list logo