Re: [Qemu-devel] [PATCH] qemu-nbd: only send a limited number of errno codes on the wire
Paolo Bonzini pbonz...@redhat.com writes: Right now, NBD includes potentially platform-specific error values in the wire protocol. Design flaw. Luckily, most common error values are more or less universal: in particular, of all errno values = 34 (up to ERANGE), they are all the same on supported platforms except for 11 (which is EAGAIN on Windows and Linux, but EDEADLK on Darwin and the *BSDs). Can EAGAIN or EDEADLK happen? I don't know is an acceptable answer :) So, in order to guarantee some portability, only keep a dozen possible error codes and squash everything else to EINVAL. Ugh. I guess it'll do. Cleaner solution: Fix the protocol to transmit EPERM, EIO, ... in addition to 1, 5, ... If backward compatibility is not an issue: s/in addition to/instead of/. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- nbd.c | 47 +++ 1 file changed, 47 insertions(+) diff --git a/nbd.c b/nbd.c index eea8c51..1ad5b66 100644 --- a/nbd.c +++ b/nbd.c @@ -86,6 +86,37 @@ #define NBD_OPT_ABORT (2) #define NBD_OPT_LIST(3) +/* NBD errors are based on errno numbers, so there is a 1:1 mapping, + * but only a limited set of errno values is specified in the protocol. + * Everything else is squashed to EINVAL. + */ Is the protocol defined anywhere? +static int system_errno_to_nbd_errno(int err) +{ +switch (err) { +case EPERM: +return 1; +case EIO: +return 5; +case ENXIO: +return 6; +case E2BIG: +return 7; +case ENOMEM: +return 12; +case EACCES: +return 13; +case EFBIG: +return 27; +case ENOSPC: +return 28; +case EROFS: +return 30; +case EINVAL: +default: +return 22; +} +} + This maps recognized OS errnos to NBD errnos. The latter are literals. /* Definitions for opaque data types */ typedef struct NBDRequest NBDRequest; @@ -856,6 +887,20 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply *reply) reply-error = be32_to_cpup((uint32_t*)(buf + 4)); reply-handle = be64_to_cpup((uint64_t*)(buf + 8)); +/* NBD errors should be universally equal to the corresponding + * errno values, check it here. + */ +QEMU_BUILD_BUG_ON(EPERM != 1); +QEMU_BUILD_BUG_ON(EIO != 5); +QEMU_BUILD_BUG_ON(ENXIO != 6); +QEMU_BUILD_BUG_ON(E2BIG != 7); +QEMU_BUILD_BUG_ON(ENOMEM != 12); +QEMU_BUILD_BUG_ON(EACCES != 13); +QEMU_BUILD_BUG_ON(EINVAL != 22); +QEMU_BUILD_BUG_ON(EFBIG != 27); +QEMU_BUILD_BUG_ON(ENOSPC != 28); +QEMU_BUILD_BUG_ON(EROFS != 30); + This checks that the mapping above is the identify function for all the recognized NBD errnos. Why is that necessary? Same literals as above. Violates DRY. I don't mind all that much, but wonder whether we could at least do the checking next to system_errno_to_nbd_errno(). TRACE(Got reply: { magic = 0x%x, .error = %d, handle = % PRIu64 }, magic, reply-error, reply-handle); @@ -872,6 +917,8 @@ static ssize_t nbd_send_reply(int csock, struct nbd_reply *reply) uint8_t buf[NBD_REPLY_SIZE]; ssize_t ret; +reply-error = system_errno_to_nbd_errno(reply-error); + /* Reply [ 0 .. 3]magic (NBD_REPLY_MAGIC) [ 4 .. 7]error (0 == no error)
Re: [Qemu-devel] [PATCH v3 4/5] qtest: precompute hex nibs
Eric Blake ebl...@redhat.com writes: On 05/06/2015 10:18 AM, John Snow wrote: To find out, add just buffering. Something like this in your patch instead of byte2hex(): for (i = 0; i len; i++) { -qtest_sendf(chr, %02x, data[i]); +snprintf(enc[i * 2], 2, %02x, data[i]); } If the speedup is pretty much entirely due to buffering (which I suspect), then your commit message could use a bit of love :) When you're right, you're right. The difference may not be statistically meaningful, but with today's current planetary alignment, using sprintf() to batch the sends instead of my home-rolled nib computation function, I can eke out a few more tenths of a second. I'm a bit surprised - making a function call per byte generally executes more instructions than open-coding the conversion (albeit the branch prediction in the hardware probably does fairly well over long strings, since it is a tight and predictable loop). Remember, sprintf() has to decode the format string on every call (unless the compiler is smart enough to open-code what sprintf would do). John's measurements show that the speed difference between snprintf() and a local copy of formatting code gets thoroughly drowned in noise. The snprintf() version takes 18 lines less, according to diffstat. Less code, same measured performance, what's not to like? However, if you feel strongly about avoiding snprintf() here, I won't argue further. Except for the commit message: it needs to be fixed not to claim avoiding printf and friends makes a speed difference.
Re: [Qemu-devel] [PATCH 0/7] virtio: inline private qdev properties into virtio devices
No, he is on vacation this week. Sorry for the delay! Of you are doing cleanups in virtio, perhaps you can look into using alias properties for virtio-balloon's QOM properties (for example the statistics). The code is currently using object_property_add and manually-written getters/setters. Thanks, Paolo -Original Message- From: Shannon Zhao [shannon.z...@linaro.org] Received: venerdì, 08 mag 2015, 3:24 To: qemu-devel@nongnu.org CC: peter.mayd...@linaro.org, christoffer.d...@linaro.org, m...@redhat.com, pbonz...@redhat.com, peter.huangp...@huawei.com, hangaoh...@huawei.com, zhaoshengl...@huawei.com Subject: Re: [PATCH 0/7] virtio: inline private qdev properties into virtio devices On 2015/4/29 23:24, Shannon Zhao wrote: The private qdev properties of virtio devices are only used by themselves. As Peter suggested and like what virtio-blk has done, we should move the private qdev properties into devices and don't expose them to avoid wrongly use. This patchset is based on following patchset which moves host features to backends. http://lists.gnu.org/archive/html/qemu-devel/2015-04/msg03785.html Shannon Zhao (7): virtio-net: move qdev properties into virtio-net.c virtio-net.h: Remove unsed DEFINE_VIRTIO_NET_PROPERTIES virtio-scsi: move qdev properties into virtio-scsi.c virtio-rng: move qdev properties into virtio-rng.c virtio-serial-bus: move qdev properties into virtio-serial-bus.c virtio-9p-device: move qdev properties into virtio-9p-device.c vhost-scsi: move qdev properties into vhost-scsi.c Have any maintainer picked up these patches? -- Shannon
Re: [Qemu-devel] [PATCH] configure: bump glib version to 2.16
John Snow js...@redhat.com writes: On 05/07/2015 08:22 AM, Peter Maydell wrote: On 10 March 2015 at 17:45, Peter Maydell peter.mayd...@linaro.org wrote: On 10 March 2015 at 17:43, John Snow js...@redhat.com wrote: Wasn't aware we were actually going through with that; it had looked like we were going to refrain from fiddling with it because we found a workaround that sufficed for glib 2.12. Well, we're not doing anything before QEMU 2.3 is released. But if we bump the version requirement at all, then we might as well move it to 2.22, because once you drop the support RHEL5 requirement then that's the newest version we can shift to without breaking compilation on some other distro we care about (SUSE, in this case). So do people still want to bump the minimum glib version? If we're going to do it for 2.4 I would prefer to see the configure patch reasonably early in the release cycle I think. -- PMM I think we determined that the minimum version could be as high as 2.22. https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg00488.html I support upgrading to a more recent version of GLib as much as ever. Let's go for 2.22.
Re: [Qemu-devel] [PULL 3/6] console-gl: add opengl rendering helper functions
On Fr, 2015-05-08 at 00:28 +0200, Alexander Graf wrote: On 05.05.15 11:43, Gerd Hoffmann wrote: Signed-off-by: Gerd Hoffmann kra...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com --- [...] +void surface_gl_create_texture(ConsoleGLState *gls, + DisplaySurface *surface) +{ +assert(gls); +assert(surface_stride(surface) % surface_bytes_per_pixel(surface) == 0); + +switch (surface-format) { +case PIXMAN_BE_b8g8r8x8: +case PIXMAN_BE_b8g8r8a8: +surface-glformat = GL_BGRA_EXT; +surface-gltype = GL_UNSIGNED_BYTE; +break; +case PIXMAN_r5g6b5: +surface-glformat = GL_RGB; +surface-gltype = GL_UNSIGNED_SHORT_5_6_5; +break; +default: +g_assert_not_reached(); +} + +glGenTextures(1, surface-texture); +glEnable(GL_TEXTURE_2D); +glBindTexture(GL_TEXTURE_2D, surface-texture); +glPixelStorei(GL_UNPACK_ROW_LENGTH_EXT, This doesn't compile for me on SLES11: ui/console-gl.c: In function ‘surface_gl_create_texture’: ui/console-gl.c:97:19: error: ‘GL_UNPACK_ROW_LENGTH_EXT’ undeclared (first use in this function) ui/console-gl.c:97:19: note: each undeclared identifier is reported only once for each function it appears in ui/console-gl.c: In function ‘surface_gl_update_texture’: ui/console-gl.c:117:19: error: ‘GL_UNPACK_ROW_LENGTH_EXT’ undeclared (first use in this function) make: *** [ui/console-gl.o] Error 1 make: *** Waiting for unfinished jobs Which mesa version is this? cheers, Gerd
Re: [Qemu-devel] [PATCH v2 2/3] virtio-mmio: introduce set_guest_notifiers
Hello! Hm, weren't there some patches for irqfd on arm? Yes, there were. However, they had a design problem by breaking backwards compatibility with unmodified virtio. Their idea was to set up one more shared memory area between virtio and vhost-net and use it to pass ISR value, which helps to distinguish, which event took place (queue update or config change). So, this idea was rejected. http://lists.gnu.org/archive/html/qemu-devel/2014-10/msg03056.html I thought about it, and technically, i think, this can be done in better way. Actually, as far as i understood, all we need is mechanism for distinguishing between these two events. On PCI we do this by using multiple IRQs via MSI-X, and one IRQ signals exactly one type of event. MSI-X code also has two IRQs mode as a failsafe, where one IRQ signals config change and another IRQ signals queues update (and all queues are polled in turn). I think a similar thing could be done for virtio-mmio. It could allocate two IRQs instead of one and describe both of them in the device tree. Guest side, upon seeing that, could make use of those two IRQs and acknowledge to the host side that yes, i am new version and use new mode. But, sorry, i will unlikely implement this, because we already have PCI with MSI-X (i hope this is going to be published soon), so my project can use PCI emulation. So implementing irqfds for virtio-mmio is a bit out of my scope. Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia
Re: [Qemu-devel] [PATCH] docs: update BLOCK_IMAGE_CORRUPTED documentation
Applied to -trivial, thank you! /mjt
[Qemu-devel] [Bug 1452904] [NEW] High CPU in idle Windows guest
Public bug reported: Hi, I'm running a freshly installed Windows 7 domU on an up-to-date Debian jessie machine running Xen 4.4.1-9. When the Windows machine is idle, I'm seeing upwards of 10% CPU usage from the qemu-system-i386 instance. Other Linux and FreeBSD machines register negligable CPU usage (0.5%). The server is an HP Proliant DL360 G7. Data from perf attacthed to the process might give the best clues. Any information as to why this processes is comsuming so much CPU would be much appreciated. Regards, Terry. # uname -a Linux xen 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24) x86_64 GNU/Linux # lsb_release -a No LSB modules are available. Distributor ID: Debian Description:Debian GNU/Linux 8.0 (jessie) Release:8.0 Codename: jessie # qemu-system-i386 --version QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-11), Copyright (c) 2003-2008 Fabrice Bellard # cat win7.cfg name= 'win7' vcpus = '1' memory = '4096' builder = 'hvm' disk= [ '/vms/xen/domains/desk_win7/disk1.img,qcow2,hda,rw', '/vms/xen/domains/desk_win7/disk2.img,qcow2,hdb,rw' ] vfb = [ 'vnc=1,vnclisten=0.0.0.0,sdl=0' ] vif = [ 'mac=00:16:3E:FB:FC:39,bridge=xenbr0' ] serial = 'pty' vnc = 1 vnclisten = '0.0.0.0' sdl = 0 usbdevice = 'mouse' audio = 1 soundhw = 'sb16,es1370' on_poweroff = 'destroy' on_reboot = 'restart' on_crash= 'restart' # /usr/bin/qemu-system-i386 -nodefaults -name desk_win7 -boot order=cda -usb -usbdevice mouse -device rtl8139,id=nic0,netdev=net0,mac=00:16:3e:fb:fc:39 -netdev type=tap,id=net0,ifname=vif12.0-emu,script=no,downscript=no -m 4088 -drive file=/vms/xen/domains/desk_win7/disk1.img,if=ide,index=0,media=disk,format=qcow2,cache=writeback -drive file=/vms/xen/domains/desk_win7/disk2.img,if=ide,index=1,media=disk,format=qcow2,cache=writeback -vnc 0.0.0.0:0,to=99 -display none -device cirrus-vga -global vga.vram_size_mb=8 # top Tasks: 134 total, 1 running, 133 sleeping, 0 stopped, 0 zombie %Cpu(s): 16.6 us, 5.5 sy, 0.0 ni, 0.0 id, 77.9 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem:354420 total, 348884 used, 5536 free, 184 buffers KiB Swap: 17575932 total, 2642868 used, 14933064 free. 4132 cached Mem PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 12297 root 20 0 6683864 192944 1140 S 20.6 54.4 2:25.51 qemu-syste+ # xen dmesg (XEN) Xen version 4.4.1 (Debian 4.4.1-9) (wa...@debian.org) (gcc (Debian 4.9.2-10) 4.9.2) debug=n Mon Apr 6 18:24:28 UTC 2015 (XEN) Bootloader: GRUB 2.02~beta2-22 (XEN) Command line: placeholder dom0_mem=512M dom0_max_vcpus=1 dom0_vcpus_pin radeon.modeset=0 iommu=no-intremap (XEN) Video information: (XEN) VGA is text mode 80x25, font 8x16 (XEN) VBE/DDC methods: none; EDID transfer time: 2 seconds (XEN) EDID info not retrieved because no DDC retrieval method detected (XEN) Disc information: (XEN) Found 1 MBR signatures (XEN) Found 1 EDD information structures (XEN) Xen-e820 RAM map: (XEN) - 0009f400 (usable) (XEN) 0009f400 - 000a (reserved) (XEN) 000f - 0010 (reserved) (XEN) 0010 - cf62f000 (usable) (XEN) cf62f000 - cf63c000 (ACPI data) (XEN) cf63c000 - cf63d000 (usable) (XEN) cf63d000 - d400 (reserved) (XEN) fec0 - fee1 (reserved) (XEN) ff80 - 0001 (reserved) (XEN) 0001 - 0004a000 (usable) (XEN) ACPI: RSDP 000F4F00, 0024 (r2 HP) (XEN) ACPI: XSDT CF630140, 00B4 (r1 HP ProLiant2 Ò 162E) (XEN) ACPI: FACP CF630240, 00F4 (r3 HP ProLiant2 Ò 162E) (XEN) ACPI: DSDT CF630340, 20BD (r1 HP DSDT1 INTL 20030228) (XEN) ACPI: FACS CF62F100, 0040 (XEN) ACPI: SPCR CF62F140, 0050 (r1 HP SPCRRBSU1 Ò 162E) (XEN) ACPI: MCFG CF62F1C0, 003C (r1 HP ProLiant1 0) (XEN) ACPI: HPET CF62F200, 0038 (r1 HP ProLiant2 Ò 162E) (XEN) ACPI: CF62F240, 0064 (r2 HP ProLiant2 Ò 162E) (XEN) ACPI: SPMI CF62F2C0, 0040 (r5 HP ProLiant1 Ò 162E) (XEN) ACPI: ERST CF62F300, 01D0 (r1 HP ProLiant1 Ò 162E) (XEN) ACPI: APIC CF62F500, 015E (r1 HP ProLiant2 0) (XEN) ACPI: SRAT CF62F680, 0570 (r1 HP Proliant1 Ò 162E) (XEN) ACPI: CF62FC00, 0176 (r1 HP ProLiant1 Ò 162E) (XEN) ACPI: BERT CF62FD80, 0030 (r1 HP ProLiant1 Ò 162E) (XEN) ACPI: HEST CF62FDC0, 00BC (r1 HP ProLiant1 Ò 162E) (XEN) ACPI: DMAR CF62FE80, 0150 (r1 HP ProLiant1 Ò 162E) (XEN) ACPI: SSDT CF632400, 0125 (r3 HP CRSPCI02 HP1) (XEN) ACPI: SSDT CF632540, 01CF (r3 HP riser1a2 INTL 20061109) (XEN) ACPI: SSDT CF632740, 03BB
Re: [Qemu-devel] [PATCH v6 22/22] hw/arm/virt: Enable dynamic generation of ACPI v5.1 tables
On 2015/5/8 0:09, Peter Maydell wrote: On 7 May 2015 at 10:29, Shannon Zhao zhaoshengl...@huawei.com wrote: From: Shannon Zhao shannon.z...@linaro.org Expose the needed device information to the table generation insfrastructure and register a machine_init_done notify to infrastructure. call virt_acpi_build(). Add CONFIG_ACPI to arm-softmmu.mak. Signed-off-by: Shannon Zhao zhaoshengl...@huawei.com Signed-off-by: Shannon Zhao shannon.z...@linaro.org --- default-configs/arm-softmmu.mak | 1 + default-configs/i386-softmmu.mak | 3 ++ default-configs/mips-softmmu.mak | 3 ++ default-configs/mips64-softmmu.mak | 3 ++ default-configs/mips64el-softmmu.mak | 3 ++ default-configs/mipsel-softmmu.mak | 3 ++ default-configs/x86_64-softmmu.mak | 3 ++ hw/acpi/Makefile.objs| 5 ++- hw/arm/virt.c| 78 hw/i2c/Makefile.objs | 2 +- 10 files changed, 94 insertions(+), 10 deletions(-) diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index a767e4b..74f1db3 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -101,3 +101,4 @@ CONFIG_ALLWINNER_A10=y CONFIG_XIO3130=y CONFIG_IOH3420=y CONFIG_I82801B11=y +CONFIG_ACPI=y diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak index 6a74e00..d2de500 100644 --- a/default-configs/i386-softmmu.mak +++ b/default-configs/i386-softmmu.mak @@ -15,6 +15,9 @@ CONFIG_PCSPK=y CONFIG_PCKBD=y CONFIG_FDC=y CONFIG_ACPI=y +CONFIG_ACPI_CORE=y +CONFIG_ACPI_MEMORY_HOTPLUG=y +CONFIG_ACPI_CPU_HOTPLUG=y This is splitting the basic CONFIG_ACPI into several pieces, right? I think that deserves its own patch. Ok, will split this patch. What's the difference now between CONFIG_ACPI and CONFIG_ACPI_CORE ? I didn't find a proper name. Maybe we should name it CONFIG_ACPI_x86 for core.o piix4.o ich9.o pcihp.o as these files are for x86. CONFIG_APM=y CONFIG_I8257=y CONFIG_IDE_ISA=y diff --git a/default-configs/mips-softmmu.mak b/default-configs/mips-softmmu.mak index cce2c81..c96d42d 100644 --- a/default-configs/mips-softmmu.mak +++ b/default-configs/mips-softmmu.mak @@ -15,6 +15,9 @@ CONFIG_PCSPK=y CONFIG_PCKBD=y CONFIG_FDC=y CONFIG_ACPI=y +CONFIG_ACPI_CORE=y +CONFIG_ACPI_MEMORY_HOTPLUG=y +CONFIG_ACPI_CPU_HOTPLUG=y CONFIG_APM=y CONFIG_I8257=y CONFIG_PIIX4=y diff --git a/default-configs/mips64-softmmu.mak b/default-configs/mips64-softmmu.mak index 7a88a08..d229f9e 100644 --- a/default-configs/mips64-softmmu.mak +++ b/default-configs/mips64-softmmu.mak @@ -15,6 +15,9 @@ CONFIG_PCSPK=y CONFIG_PCKBD=y CONFIG_FDC=y CONFIG_ACPI=y +CONFIG_ACPI_CORE=y +CONFIG_ACPI_MEMORY_HOTPLUG=y +CONFIG_ACPI_CPU_HOTPLUG=y CONFIG_APM=y CONFIG_I8257=y CONFIG_PIIX4=y diff --git a/default-configs/mips64el-softmmu.mak b/default-configs/mips64el-softmmu.mak index 095de43..ea31b8b 100644 --- a/default-configs/mips64el-softmmu.mak +++ b/default-configs/mips64el-softmmu.mak @@ -15,6 +15,9 @@ CONFIG_PCSPK=y CONFIG_PCKBD=y CONFIG_FDC=y CONFIG_ACPI=y +CONFIG_ACPI_CORE=y +CONFIG_ACPI_MEMORY_HOTPLUG=y +CONFIG_ACPI_CPU_HOTPLUG=y CONFIG_APM=y CONFIG_I8257=y CONFIG_PIIX4=y diff --git a/default-configs/mipsel-softmmu.mak b/default-configs/mipsel-softmmu.mak index 0e25108..9a4462e 100644 --- a/default-configs/mipsel-softmmu.mak +++ b/default-configs/mipsel-softmmu.mak @@ -15,6 +15,9 @@ CONFIG_PCSPK=y CONFIG_PCKBD=y CONFIG_FDC=y CONFIG_ACPI=y +CONFIG_ACPI_CORE=y +CONFIG_ACPI_MEMORY_HOTPLUG=y +CONFIG_ACPI_CPU_HOTPLUG=y CONFIG_APM=y CONFIG_I8257=y CONFIG_PIIX4=y diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak index 46b87dd..11019b6 100644 --- a/default-configs/x86_64-softmmu.mak +++ b/default-configs/x86_64-softmmu.mak @@ -15,6 +15,9 @@ CONFIG_PCSPK=y CONFIG_PCKBD=y CONFIG_FDC=y CONFIG_ACPI=y +CONFIG_ACPI_CORE=y +CONFIG_ACPI_MEMORY_HOTPLUG=y +CONFIG_ACPI_CPU_HOTPLUG=y CONFIG_APM=y CONFIG_I8257=y CONFIG_IDE_ISA=y diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs index b9fefa7..511771a 100644 --- a/hw/acpi/Makefile.objs +++ b/hw/acpi/Makefile.objs @@ -1,5 +1,6 @@ -common-obj-$(CONFIG_ACPI) += core.o piix4.o ich9.o pcihp.o cpu_hotplug.o -common-obj-$(CONFIG_ACPI) += memory_hotplug.o +common-obj-$(CONFIG_ACPI_CORE) += core.o piix4.o ich9.o pcihp.o Why is x86 specific stuff like piix4 in the CONFIG_ACPI_CORE set? +common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o +common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o common-obj-$(CONFIG_ACPI) += acpi_interface.o common-obj-$(CONFIG_ACPI) += bios-linker-loader.o common-obj-$(CONFIG_ACPI) += aml-build.o diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 565f573..9291045 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -43,6 +43,7 @@
Re: [Qemu-devel] [PATCH] qemu-nbd: only send a limited number of errno codes on the wire
On 08/05/2015 08:45, Markus Armbruster wrote: Luckily, most common error values are more or less universal: in particular, of all errno values = 34 (up to ERANGE), they are all the same on supported platforms except for 11 (which is EAGAIN on Windows and Linux, but EDEADLK on Darwin and the *BSDs). Can EAGAIN or EDEADLK happen? I don't know is an acceptable answer :) No. Even stuff like EFAULT would be a bug. So, in order to guarantee some portability, only keep a dozen possible error codes and squash everything else to EINVAL. Ugh. I guess it'll do. Cleaner solution: Fix the protocol to transmit EPERM, EIO, ... in addition to 1, 5, ... Why? It's a binary protocol after all. But I agree that the right fix without backwards-compatibility would be to make the errors something like 0x8000 to 0x8004. Paolo If backward compatibility is not an issue: s/in addition to/instead of/. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- nbd.c | 47 +++ 1 file changed, 47 insertions(+) diff --git a/nbd.c b/nbd.c index eea8c51..1ad5b66 100644 --- a/nbd.c +++ b/nbd.c @@ -86,6 +86,37 @@ #define NBD_OPT_ABORT (2) #define NBD_OPT_LIST(3) +/* NBD errors are based on errno numbers, so there is a 1:1 mapping, + * but only a limited set of errno values is specified in the protocol. + * Everything else is squashed to EINVAL. + */ Is the protocol defined anywhere? +static int system_errno_to_nbd_errno(int err) +{ +switch (err) { +case EPERM: +return 1; +case EIO: +return 5; +case ENXIO: +return 6; +case E2BIG: +return 7; +case ENOMEM: +return 12; +case EACCES: +return 13; +case EFBIG: +return 27; +case ENOSPC: +return 28; +case EROFS: +return 30; +case EINVAL: +default: +return 22; +} +} + This maps recognized OS errnos to NBD errnos. The latter are literals. /* Definitions for opaque data types */ typedef struct NBDRequest NBDRequest; @@ -856,6 +887,20 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply *reply) reply-error = be32_to_cpup((uint32_t*)(buf + 4)); reply-handle = be64_to_cpup((uint64_t*)(buf + 8)); +/* NBD errors should be universally equal to the corresponding + * errno values, check it here. + */ +QEMU_BUILD_BUG_ON(EPERM != 1); +QEMU_BUILD_BUG_ON(EIO != 5); +QEMU_BUILD_BUG_ON(ENXIO != 6); +QEMU_BUILD_BUG_ON(E2BIG != 7); +QEMU_BUILD_BUG_ON(ENOMEM != 12); +QEMU_BUILD_BUG_ON(EACCES != 13); +QEMU_BUILD_BUG_ON(EINVAL != 22); +QEMU_BUILD_BUG_ON(EFBIG != 27); +QEMU_BUILD_BUG_ON(ENOSPC != 28); +QEMU_BUILD_BUG_ON(EROFS != 30); + This checks that the mapping above is the identify function for all the recognized NBD errnos. Why is that necessary? Same literals as above. Violates DRY. I don't mind all that much, but wonder whether we could at least do the checking next to system_errno_to_nbd_errno(). TRACE(Got reply: { magic = 0x%x, .error = %d, handle = % PRIu64 }, magic, reply-error, reply-handle); @@ -872,6 +917,8 @@ static ssize_t nbd_send_reply(int csock, struct nbd_reply *reply) uint8_t buf[NBD_REPLY_SIZE]; ssize_t ret; +reply-error = system_errno_to_nbd_errno(reply-error); + /* Reply [ 0 .. 3]magic (NBD_REPLY_MAGIC) [ 4 .. 7]error (0 == no error)
Re: [Qemu-devel] [PATCH] Fix default CPU model for ARM64
On 08/05/2015 09:14, Pavel Fedin wrote: Hello! FWIW virt-manager 1.2.0 (just released) will do the following when creating a new VM: - aarch64 + kvm : -cpu host - aarch64 + tcg : -cpu cortex-a57 - arm32 + kvm : -cpu host - arm32 + tcg : defer to qemu Though if you explicitly request 'hypervisor default' then we won't specify any -cpu and defer to qemu, which will hit the cortex-a15 default for aarch64 virt-manager is not the only tool to create VMs... But, okay. Seems you just don't want to change this. Well... I still don't agree and this default looks strange for me, but it's just me. I'm out of further arguments. Actually I agree with your patch. It's the same that x86 does. Paolo
Re: [Qemu-devel] [Qemu-block] [PATCH] qcow2: do lazy allocation of the L2 cache
On Fri 24 Apr 2015 03:04:06 PM CEST, Kevin Wolf kw...@redhat.com wrote: I think it would be nice to have a way to free unused cache entries after a while. Do you think mmap plus a periodic timer would work? I'm hesitant about changes like this because they make QEMU more complex, slow down the guest, and make the memory footprint volatile. But if a simple solution addresses the problem, then I'd be happy. I understand. One possible approach would be to use the timer only if the cache size requested by the user is large, so 1) we only try to save memory when the amount of memory to save is significant. 2) we don't make the default scenario slower. I'll try to see if I can come up with a good solution (and with more numbers). I suspect that at this point, Anthony would have reminded us about separation of policy and mechanism. The very least we need is some option to control the policy (probably defaulting to no automatic cache size changes). There's also the problem that with a single chunk of memory for all cache tables it's not so easy to free individual entries. One possible solution would be madvise() with MADV_DONTNEED, and that's rather simple to use, but I'm unsure about its portability. The other option would be to let the management tool change the cache size options at runtime (and as it happens, my current bdrv_reopen() overhaul allows just that - except that a QMP interface isn't planned yet), but I'm not sure if it has enough information to make good decisions. If we know what the criteria are, they could possibly be exposed, though. But what would you do there? Recreate the whole cache? Berto
[Qemu-devel] [PULL 4/8] virtio-ccw: change realization sequence
virtio-ccw has an odd sequence of realizing devices: first the device-specific relization (net, block, ...), then the generic realization. It feels less odd to have the generic realization callback trigger the device-specific realization instead (and this also matches what virtio-pci does). One thing to note: We need to defer initializing the cu model in the sense id data until after the device-specific realization has been performed, as we need to refer to the virtio device's device_id. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com Message-Id: 1429627016-30656-2-git-send-email-cornelia.h...@de.ibm.com --- hw/s390x/virtio-ccw.c | 41 + 1 file changed, 17 insertions(+), 24 deletions(-) diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index c1d8288..1c2bd9d 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -642,8 +642,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw) return ret; } -static void virtio_ccw_device_realize(VirtioCcwDevice *dev, - VirtIODevice *vdev, Error **errp) +static void virtio_ccw_device_realize(VirtioCcwDevice *dev, Error **errp) { unsigned int cssid = 0; unsigned int ssid = 0; @@ -654,6 +653,9 @@ static void virtio_ccw_device_realize(VirtioCcwDevice *dev, SubchDev *sch; int num; DeviceState *parent = DEVICE(dev); +Error *err = NULL; +VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_GET_CLASS(dev); +VirtIODevice *vdev; sch = g_malloc0(sizeof(SubchDev)); @@ -766,6 +768,18 @@ static void virtio_ccw_device_realize(VirtioCcwDevice *dev, memset(sch-id, 0, sizeof(SenseId)); sch-id.reserved = 0xff; sch-id.cu_type = VIRTIO_CCW_CU_TYPE; + +if (k-realize) { +k-realize(dev, err); +} +if (err) { +error_propagate(errp, err); +css_subch_assign(cssid, ssid, schid, devno, NULL); +goto out_err; +} + +/* device_id is only set after vdev has been realized */ +vdev = virtio_ccw_get_vdev(sch); sch-id.cu_model = vdev-device_id; /* Only the first 32 feature bits are used. */ @@ -813,10 +827,7 @@ static void virtio_ccw_net_realize(VirtioCcwDevice *ccw_dev, Error **errp) object_property_set_bool(OBJECT(vdev), true, realized, err); if (err) { error_propagate(errp, err); -return; } - -virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp); } static void virtio_ccw_net_instance_init(Object *obj) @@ -839,10 +850,7 @@ static void virtio_ccw_blk_realize(VirtioCcwDevice *ccw_dev, Error **errp) object_property_set_bool(OBJECT(vdev), true, realized, err); if (err) { error_propagate(errp, err); -return; } - -virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp); } static void virtio_ccw_blk_instance_init(Object *obj) @@ -879,10 +887,7 @@ static void virtio_ccw_serial_realize(VirtioCcwDevice *ccw_dev, Error **errp) object_property_set_bool(OBJECT(vdev), true, realized, err); if (err) { error_propagate(errp, err); -return; } - -virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp); } @@ -904,10 +909,7 @@ static void virtio_ccw_balloon_realize(VirtioCcwDevice *ccw_dev, Error **errp) object_property_set_bool(OBJECT(vdev), true, realized, err); if (err) { error_propagate(errp, err); -return; } - -virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp); } static void balloon_ccw_stats_get_all(Object *obj, struct Visitor *v, @@ -972,10 +974,7 @@ static void virtio_ccw_scsi_realize(VirtioCcwDevice *ccw_dev, Error **errp) object_property_set_bool(OBJECT(vdev), true, realized, err); if (err) { error_propagate(errp, err); -return; } - -virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp); } static void virtio_ccw_scsi_instance_init(Object *obj) @@ -999,10 +998,7 @@ static void vhost_ccw_scsi_realize(VirtioCcwDevice *ccw_dev, Error **errp) object_property_set_bool(OBJECT(vdev), true, realized, err); if (err) { error_propagate(errp, err); -return; } - -virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp); } static void vhost_ccw_scsi_instance_init(Object *obj) @@ -1030,8 +1026,6 @@ static void virtio_ccw_rng_realize(VirtioCcwDevice *ccw_dev, Error **errp) object_property_set_link(OBJECT(dev), OBJECT(dev-vdev.conf.rng), rng, NULL); - -virtio_ccw_device_realize(ccw_dev, VIRTIO_DEVICE(vdev), errp); } /* DeviceState to VirtioCcwDevice. Note: used on datapath, @@ -1640,10 +1634,9 @@ static const TypeInfo virtio_ccw_rng = { static void virtio_ccw_busdev_realize(DeviceState *dev, Error **errp) { VirtioCcwDevice *_dev = (VirtioCcwDevice *)dev; -VirtIOCCWDeviceClass *_info = VIRTIO_CCW_DEVICE_GET_CLASS(dev);
[Qemu-devel] [PULL 8/8] s390x/kvm: migrate vcpu interrupt state
From: Jens Freimann jf...@linux.vnet.ibm.com This patch adds support to migrate vcpu interrupts. We use ioctl KVM_S390_GET_IRQ_STATE and _SET_IRQ_STATE to get/set the complete interrupt state for a vcpu. Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com Signed-off-by: Jens Freimann jf...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- target-s390x/cpu-qom.h | 3 +++ target-s390x/cpu.c | 1 + target-s390x/cpu.h | 9 + target-s390x/kvm.c | 55 ++ target-s390x/machine.c | 15 +- 5 files changed, 82 insertions(+), 1 deletion(-) diff --git a/target-s390x/cpu-qom.h b/target-s390x/cpu-qom.h index 8b376df..936ae21 100644 --- a/target-s390x/cpu-qom.h +++ b/target-s390x/cpu-qom.h @@ -66,6 +66,9 @@ typedef struct S390CPU { /* public */ CPUS390XState env; +/* needed for live migration */ +void *irqstate; +uint32_t irqstate_saved_size; } S390CPU; static inline S390CPU *s390_env_get_cpu(CPUS390XState *env) diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c index e0537fa..d2f9836 100644 --- a/target-s390x/cpu.c +++ b/target-s390x/cpu.c @@ -213,6 +213,7 @@ static void s390_cpu_finalize(Object *obj) S390CPU *cpu = S390_CPU(obj); qemu_unregister_reset(s390_cpu_machine_reset_cb, cpu); +g_free(cpu-irqstate); #endif } diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h index ba7d250..c557211 100644 --- a/target-s390x/cpu.h +++ b/target-s390x/cpu.h @@ -1079,6 +1079,8 @@ void kvm_s390_clear_cmma_callback(void *opaque); int kvm_s390_set_cpu_state(S390CPU *cpu, uint8_t cpu_state); void kvm_s390_reset_vcpu(S390CPU *cpu); int kvm_s390_set_mem_limit(KVMState *s, uint64_t new_limit, uint64_t *hw_limit); +void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu); +int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu); #else static inline void kvm_s390_io_interrupt(uint16_t subchannel_id, uint16_t subchannel_nr, @@ -1121,6 +1123,13 @@ static inline int kvm_s390_set_mem_limit(KVMState *s, uint64_t new_limit, { return 0; } +static inline void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu) +{ +} +static inline int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu) +{ +return 0; +} #endif static inline int s390_set_memory_limit(uint64_t new_limit, uint64_t *hw_limit) diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c index 43ad009..aba1265 100644 --- a/target-s390x/kvm.c +++ b/target-s390x/kvm.c @@ -109,6 +109,14 @@ #define ICPT_CPU_STOP 0x28 #define ICPT_IO 0x40 +#define NR_LOCAL_IRQS 32 +/* + * Needs to be big enough to contain max_cpus emergency signals + * and in addition NR_LOCAL_IRQS interrupts + */ +#define VCPU_IRQ_BUF_SIZE (sizeof(struct kvm_s390_irq) * \ + (max_cpus + NR_LOCAL_IRQS)) + static CPUWatchpoint hw_watchpoint; /* * We don't use a list because this structure is also used to transmit the @@ -274,6 +282,7 @@ int kvm_arch_init_vcpu(CPUState *cs) { S390CPU *cpu = S390_CPU(cs); kvm_s390_set_cpu_state(cpu, cpu-env.cpu_state); +cpu-irqstate = g_malloc0(VCPU_IRQ_BUF_SIZE); return 0; } @@ -2059,6 +2068,52 @@ int kvm_s390_set_cpu_state(S390CPU *cpu, uint8_t cpu_state) return ret; } +void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu) +{ +struct kvm_s390_irq_state irq_state; +CPUState *cs = CPU(cpu); +int32_t bytes; + +if (!kvm_check_extension(kvm_state, KVM_CAP_S390_IRQ_STATE)) { +return; +} + +irq_state.buf = (uint64_t) cpu-irqstate; +irq_state.len = VCPU_IRQ_BUF_SIZE; + +bytes = kvm_vcpu_ioctl(cs, KVM_S390_GET_IRQ_STATE, irq_state); +if (bytes 0) { +cpu-irqstate_saved_size = 0; +error_report(Migration of interrupt state failed); +return; +} + +cpu-irqstate_saved_size = bytes; +} + +int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu) +{ +CPUState *cs = CPU(cpu); +struct kvm_s390_irq_state irq_state; +int r; + +if (!kvm_check_extension(kvm_state, KVM_CAP_S390_IRQ_STATE)) { +return -ENOSYS; +} + +if (cpu-irqstate_saved_size == 0) { +return 0; +} +irq_state.buf = (uint64_t) cpu-irqstate; +irq_state.len = cpu-irqstate_saved_size; + +r = kvm_vcpu_ioctl(cs, KVM_S390_SET_IRQ_STATE, irq_state); +if (r) { +error_report(Setting interrupt state failed %d, r); +} +return r; +} + int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route, uint64_t address, uint32_t data) { diff --git a/target-s390x/machine.c b/target-s390x/machine.c index a034423..7853e3c 100644 --- a/target-s390x/machine.c +++ b/target-s390x/machine.c @@ -28,10 +28,19 @@ static int cpu_post_load(void *opaque, int version_id) */ if (kvm_enabled()) { kvm_s390_set_cpu_state(cpu, cpu-env.cpu_state); +return
[Qemu-devel] [PULL 3/8] s390-virtio: clear {used, avail}_event_idx on reset as well
From: Christian Borntraeger borntrae...@de.ibm.com The old s390-virtio transport clears the vring used/avail indices in the shared area on reset. When we enabled event_idx for virtio-blk, we noticed that this is not enough: We also need to clear the published used/avail event indices, or reboot will fail. Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/s390x/s390-virtio-bus.c | 8 1 file changed, 8 insertions(+) diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c index 4a33c6a..4f69cbb 100644 --- a/hw/s390x/s390-virtio-bus.c +++ b/hw/s390x/s390-virtio-bus.c @@ -77,10 +77,18 @@ void s390_virtio_reset_idx(VirtIOS390Device *dev) VIRTIO_VRING_AVAIL_IDX_OFFS; address_space_stw(address_space_memory, idx_addr, 0, MEMTXATTRS_UNSPECIFIED, NULL); +idx_addr = virtio_queue_get_avail_addr(dev-vdev, i) + +virtio_queue_get_avail_size(dev-vdev, i); +address_space_stw(address_space_memory, idx_addr, 0, + MEMTXATTRS_UNSPECIFIED, NULL); idx_addr = virtio_queue_get_used_addr(dev-vdev, i) + VIRTIO_VRING_USED_IDX_OFFS; address_space_stw(address_space_memory, idx_addr, 0, MEMTXATTRS_UNSPECIFIED, NULL); +idx_addr = virtio_queue_get_used_addr(dev-vdev, i) + +virtio_queue_get_used_size(dev-vdev, i); +address_space_stw(address_space_memory, idx_addr, 0, + MEMTXATTRS_UNSPECIFIED, NULL); } } -- 2.4.0
[Qemu-devel] [PATCH COLO v4 11/15] Add new block driver interfaces to control block replication
Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com Cc: Luiz Capitulino lcapitul...@redhat.com Cc: Michael Roth mdr...@linux.vnet.ibm.com Reviewed-by: Paolo Bonzini pbonz...@redhat.com --- block.c | 40 include/block/block.h | 5 + include/block/block_int.h | 14 ++ qapi/block.json | 16 4 files changed, 75 insertions(+) diff --git a/block.c b/block.c index 050f919..54d5b29 100644 --- a/block.c +++ b/block.c @@ -4213,3 +4213,43 @@ void bdrv_disconnect(BlockDriverState *bs) bdrv_disconnect(bs-file); } } + +void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode, +Error **errp) +{ +BlockDriver *drv = bs-drv; + +if (drv drv-bdrv_start_replication) { +drv-bdrv_start_replication(bs, mode, errp); +} else if (bs-file) { +bdrv_start_replication(bs-file, mode, errp); +} else { +error_setg(errp, this feature or command is not currently supported); +} +} + +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp) +{ +BlockDriver *drv = bs-drv; + +if (drv drv-bdrv_do_checkpoint) { +drv-bdrv_do_checkpoint(bs, errp); +} else if (bs-file) { +bdrv_do_checkpoint(bs-file, errp); +} else { +error_setg(errp, this feature or command is not currently supported); +} +} + +void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp) +{ +BlockDriver *drv = bs-drv; + +if (drv drv-bdrv_stop_replication) { +drv-bdrv_stop_replication(bs, failover, errp); +} else if (bs-file) { +bdrv_stop_replication(bs-file, failover, errp); +} else { +error_setg(errp, this feature or command is not currently supported); +} +} diff --git a/include/block/block.h b/include/block/block.h index ed5b8e5..c98adfb 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -596,4 +596,9 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs); void bdrv_connect(BlockDriverState *bs, Error **errp); void bdrv_disconnect(BlockDriverState *bs); +void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode, +Error **errp); +void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp); +void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp); + #endif diff --git a/include/block/block_int.h b/include/block/block_int.h index da29fa7..761e6ff 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -293,6 +293,20 @@ struct BlockDriver { void (*bdrv_connect)(BlockDriverState *bs, Error **errp); void (*bdrv_disconnect)(BlockDriverState *bs); +void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode, + Error **errp); +/* Drop Disk buffer when doing checkpoint. */ +void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp); +/* + * After failover, we should flush Disk buffer into secondary disk + * and stop block replication. + * + * If the guest is shutdown, we should drop Disk buffer and stop + * block representation. + */ +void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover, + Error **errp); + QLIST_ENTRY(BlockDriver) list; }; diff --git a/qapi/block.json b/qapi/block.json index e313465..56e469d 100644 --- a/qapi/block.json +++ b/qapi/block.json @@ -40,6 +40,22 @@ 'data': ['auto', 'none', 'lba', 'large', 'rechs']} ## +# @ReplicationMode +# +# An enumeration of replication modes. +# +# @unprotected: Replication is not started or after failover. +# +# @primary: Primary mode, the vm's state will be sent to secondary QEMU. +# +# @secondary: Secondary mode, receive the vm's state from primary QEMU. +# +# Since: 2.4 +## +{ 'enum' : 'ReplicationMode', + 'data' : ['unprotected', 'primary', 'secondary']} + +## # @BlockdevSnapshotInternal # # @device: the name of the device to generate the snapshot from -- 2.1.0
[Qemu-devel] [PATCH COLO v4 06/15] Don't allow a disk use backing reference target
Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- block.c | 9 + 1 file changed, 9 insertions(+) diff --git a/block.c b/block.c index 35e1a95..15d21da 100644 --- a/block.c +++ b/block.c @@ -1204,6 +1204,14 @@ static int bdrv_open_backing_reference_file(BlockDriverState *bs, } backing_hd = blk_bs(backing_blk); +/* Don't allow a disk use backing reference target */ +ret = blk_attach_dev(backing_hd-blk, bs); +if (ret 0) { +error_setg(errp, backing_hd %s is used by the other device model, + backing_name); +goto free_exit; +} + /* Backing reference itself? */ if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) { error_setg(errp, Backing reference itself); @@ -1887,6 +1895,7 @@ void bdrv_close(BlockDriverState *bs) if (backing_hd-backing_hd-job) { block_job_cancel(backing_hd-backing_hd-job); } +blk_detach_dev(backing_hd-backing_hd-blk, bs); bdrv_set_backing_hd(backing_hd, NULL); bdrv_unref(backing_hd-backing_hd); } -- 2.1.0
[Qemu-devel] [PATCH COLO v4 02/15] allow writing to the backing file
Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- block.c | 31 ++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index 49d472f..8fa4f05 100644 --- a/block.c +++ b/block.c @@ -1051,6 +1051,20 @@ out: bdrv_refresh_limits(bs, NULL); } +#define ALLOW_WRITE_BACKING_FILEallow-write-backing-file +static QemuOptsList backing_file_opts = { +.name = backing_file, +.head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head), +.desc = { +{ +.name = ALLOW_WRITE_BACKING_FILE, +.type = QEMU_OPT_BOOL, +.help = allow write to backing file, +}, +{ /* end of list */ } +}, +}; + /* * Opens the backing file for a BlockDriverState if not yet open * @@ -1065,6 +1079,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp) int ret = 0; BlockDriverState *backing_hd; Error *local_err = NULL; +QemuOpts *opts = NULL; +int flags; if (bs-backing_hd != NULL) { QDECREF(options); @@ -1077,6 +1093,19 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp) } bs-open_flags = ~BDRV_O_NO_BACKING; +flags = bdrv_backing_flags(bs-open_flags); + +opts = qemu_opts_create(backing_file_opts, NULL, 0, error_abort); +qemu_opts_absorb_qdict(opts, options, local_err); +if (local_err) { +ret = -EINVAL; +goto free_exit; +} +if (qemu_opt_get_bool(opts, ALLOW_WRITE_BACKING_FILE, false)) { +flags |= BDRV_O_RDWR; +} +qemu_opts_del(opts); + if (qdict_haskey(options, file.filename)) { backing_filename[0] = '\0'; } else if (bs-backing_file[0] == '\0' qdict_size(options) == 0) { @@ -1109,7 +1138,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp) assert(bs-backing_hd == NULL); ret = bdrv_open(backing_hd, *backing_filename ? backing_filename : NULL, NULL, options, -bdrv_backing_flags(bs-open_flags), NULL, local_err); +flags, NULL, local_err); if (ret 0) { bdrv_unref(backing_hd); backing_hd = NULL; -- 2.1.0
[Qemu-devel] [PATCH COLO v4 14/15] quorum: allow ignoring child errors
If the child is not ready, read/write/getlength/flush will return -errno. It is not critical error, and can be ignored: 1. read/write: Just not report the error event. 2. getlength: just ignore it. If all children's getlength return -errno, and be ignored, return -EIO. 3. flush: Just ignore it. If all children's getlength return -errno, and be ignored, return 0. Usage: children.x.ignore-errors=true Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- block/quorum.c | 64 +++--- 1 file changed, 57 insertions(+), 7 deletions(-) diff --git a/block/quorum.c b/block/quorum.c index e3c2e46..425459e 100644 --- a/block/quorum.c +++ b/block/quorum.c @@ -30,6 +30,7 @@ #define QUORUM_OPT_BLKVERIFY blkverify #define QUORUM_OPT_REWRITErewrite-corrupted #define QUORUM_OPT_READ_PATTERN read-pattern +#define QUORUM_CHILDREN_OPT_IGNORE_ERRORS ignore-errors /* This union holds a vote hash value */ typedef union QuorumVoteValue { @@ -65,6 +66,7 @@ typedef struct QuorumVotes { /* the following structure holds the state of one quorum instance */ typedef struct BDRVQuorumState { BlockDriverState **bs; /* children BlockDriverStates */ +bool *ignore_errors; /* ignore children's error? */ int num_children; /* children count */ int threshold; /* if less than threshold children reads gave the * same result a quorum error occurs. @@ -99,6 +101,7 @@ typedef struct QuorumChildRequest { uint8_t *buf; int ret; QuorumAIOCB *parent; +int index; } QuorumChildRequest; /* Quorum will use the following structure to track progress of each read/write @@ -211,6 +214,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s, acb-qcrs[i].buf = NULL; acb-qcrs[i].ret = 0; acb-qcrs[i].parent = acb; +acb-qcrs[i].index = i; } return acb; @@ -304,7 +308,7 @@ static void quorum_aio_cb(void *opaque, int ret) acb-count++; if (ret == 0) { acb-success_count++; -} else { +} else if (!s-ignore_errors[sacb-index]) { quorum_report_bad(acb, sacb-aiocb-bs-node_name, ret); } assert(acb-count = s-num_children); @@ -719,19 +723,31 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs, static int64_t quorum_getlength(BlockDriverState *bs) { BDRVQuorumState *s = bs-opaque; -int64_t result; +int64_t result = -EIO; int i; /* check that all file have the same length */ -result = bdrv_getlength(s-bs[0]); -if (result 0) { -return result; -} -for (i = 1; i s-num_children; i++) { +for (i = 0; i s-num_children; i++) { int64_t value = bdrv_getlength(s-bs[i]); + if (value 0) { return value; } + +if (value == 0 s-ignore_errors[i]) { +/* + * If the child is not ready, it cannot return -errno, + * otherwise refresh_total_sectors() will fail when + * we open the child. + */ +continue; +} + +if (result == -EIO) { +result = value; +continue; +} + if (value != result) { return -EIO; } @@ -769,6 +785,9 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs) for (i = 0; i s-num_children; i++) { result = bdrv_co_flush(s-bs[i]); +if (result 0 s-ignore_errors[i]) { +result = 0; +} result_value.l = result; quorum_count_vote(error_votes, result_value, i); } @@ -843,6 +862,19 @@ static QemuOptsList quorum_runtime_opts = { }, }; +static QemuOptsList quorum_children_common_opts = { +.name = quorum children, +.head = QTAILQ_HEAD_INITIALIZER(quorum_children_common_opts.head), +.desc = { +{ +.name = QUORUM_CHILDREN_OPT_IGNORE_ERRORS, +.type = QEMU_OPT_BOOL, +.help = ignore child I/O error, +}, +{ /* end of list */ } +}, +}; + static int parse_read_pattern(const char *opt) { int i; @@ -938,11 +970,14 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags, /* allocate the children BlockDriverState array */ s-bs = g_new0(BlockDriverState *, s-num_children); opened = g_new0(bool, s-num_children); +s-ignore_errors = g_new0(bool, s-num_children); for (i = 0, lentry = qlist_first(list); lentry; lentry = qlist_next(lentry), i++) { QDict *d; QString *string; +QemuOpts *children_opts = NULL; +bool value; switch (qobject_type(lentry-value)) { @@ -950,6 +985,20 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags, case QTYPE_QDICT: d =
[Qemu-devel] [PATCH COLO v4 12/15] skip nbd_target when starting block replication
Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- block.c | 12 1 file changed, 12 insertions(+) diff --git a/block.c b/block.c index 54d5b29..a442b5f 100644 --- a/block.c +++ b/block.c @@ -4219,6 +4219,10 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode, { BlockDriver *drv = bs-drv; +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) { +return; +} + if (drv drv-bdrv_start_replication) { drv-bdrv_start_replication(bs, mode, errp); } else if (bs-file) { @@ -4232,6 +4236,10 @@ void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp) { BlockDriver *drv = bs-drv; +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) { +return; +} + if (drv drv-bdrv_do_checkpoint) { drv-bdrv_do_checkpoint(bs, errp); } else if (bs-file) { @@ -4245,6 +4253,10 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp) { BlockDriver *drv = bs-drv; +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) { +return; +} + if (drv drv-bdrv_stop_replication) { drv-bdrv_stop_replication(bs, failover, errp); } else if (bs-file) { -- 2.1.0
Re: [Qemu-devel] [PATCH v1 1/1] hw/ptimer: Do not artificially limit timers when using icount
On Mon, May 04, 2015 at 11:27:36AM +0200, Paolo Bonzini wrote: On 04/05/2015 05:18, Edgar E. Iglesias wrote: From: Edgar E. Iglesias edgar.igles...@xilinx.com Signed-off-by: Edgar E. Iglesias edgar.igles...@xilinx.com --- hw/core/ptimer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c index 2abad1f..8437bd6 100644 --- a/hw/core/ptimer.c +++ b/hw/core/ptimer.c @@ -189,7 +189,7 @@ void ptimer_set_limit(ptimer_state *s, uint64_t limit, int reload) * on the current generation of host machines. */ -if (limit * s-period 1 s-period) { +if (!use_icount limit * s-period 1 s-period) { limit = 1 / s-period; } Reviewed-by: Paolo Bonzini pbonz...@redhat.com Thanks, I've applied this one. Cheers, Edgar
Re: [Qemu-devel] [PULL 05/16] arch_init: Alloc and free data struct for compression
Paolo Bonzini pbonz...@redhat.com wrote: On 07/05/2015 18:46, Juan Quintela wrote: static CompressParam *comp_param; static QemuThread *compress_threads; +/* comp_done_cond is used to wake up the migration thread when + * one of the compression threads has finished the compression. + * comp_done_lock is used to co-work with comp_done_cond. + */ +static QemuMutex *comp_done_lock; +static QemuCond *comp_done_cond; Can you please putr these (and the corresponding globals for decompression) in a struct? Added to ToDo list if Liang don't de it first. Later, Juan.
[Qemu-devel] [PATCH] Add virt-v3 machine that uses GIC-500
This is an alternative version of: https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg00945.html. Code merged with original virt implementation. I decided to stick to different machine name instead of adding an option because of compatibility with libvirt. In order to make use of a new option we would have to teach libvirt to pass it, which means extensive amount of patching. After all, there are architectural differences between GIC versions, so GICv3-based machine deserves to have a different name. I decided to name it virt-v3. Feel free to change it if you like, i'm just out of fantasy. I have tested the implementation with kernel v4.1rc2 and 16 CPUs. Works perfectly fine, no lockups were observed. Signed-off-by: Pavel Fedin p.fe...@samsung.com --- hw/arm/virt.c | 145 -- 1 file changed, 120 insertions(+), 25 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 565f573..b1c0eca 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -66,6 +66,10 @@ enum { VIRT_CPUPERIPHS, VIRT_GIC_DIST, VIRT_GIC_CPU, +VIRT_GIC_DIST_SPI = VIRT_GIC_CPU, +VIRT_ITS_CONTROL, +VIRT_ITS_TRANSLATION, +VIRT_LPI, VIRT_UART, VIRT_MMIO, VIRT_RTC, @@ -97,6 +101,7 @@ typedef struct { typedef struct { MachineState parent; bool secure; +bool gicv3; } VirtMachineState; #define TYPE_VIRT_MACHINE virt @@ -107,6 +112,8 @@ typedef struct { #define VIRT_MACHINE_CLASS(klass) \ OBJECT_CLASS_CHECK(VirtMachineClass, klass, TYPE_VIRT_MACHINE) +#define TYPE_VIRTV3_MACHINE virt-v3 + /* Addresses and sizes of our components. * 0..128MB is space for a flash device so we can run bootrom code such as UEFI. * 128MB..256MB is used for miscellaneous device I/O. @@ -121,25 +128,29 @@ typedef struct { */ static const MemMapEntry a15memmap[] = { /* Space up to 0x800 is reserved for a boot ROM */ -[VIRT_FLASH] = { 0, 0x0800 }, -[VIRT_CPUPERIPHS] = { 0x0800, 0x0002 }, +[VIRT_FLASH] = { 0, 0x0800 }, +[VIRT_CPUPERIPHS] = { 0x0800, 0x0002 }, /* GIC distributor and CPU interfaces sit inside the CPU peripheral space */ -[VIRT_GIC_DIST] = { 0x0800, 0x0001 }, -[VIRT_GIC_CPU] ={ 0x0801, 0x0001 }, -[VIRT_UART] = { 0x0900, 0x1000 }, -[VIRT_RTC] ={ 0x0901, 0x1000 }, -[VIRT_FW_CFG] = { 0x0902, 0x000a }, -[VIRT_MMIO] = { 0x0a00, 0x0200 }, +[VIRT_GIC_DIST] ={ 0x0800, 0x0001 }, +[VIRT_GIC_CPU] = { 0x0801, 0x0001 }, +/* On v3 VIRT_GIC_DIST_SPI takes place of VIRT_GIC_CPU */ +[VIRT_ITS_CONTROL] = { 0x0802, 0x1000 }, +[VIRT_ITS_TRANSLATION] = { 0x0803, 0x0001 }, +[VIRT_LPI] = { 0x0804, 0x0080 }, +[VIRT_UART] ={ 0x0900, 0x1000 }, +[VIRT_RTC] = { 0x0901, 0x1000 }, +[VIRT_FW_CFG] = { 0x0902, 0x000a }, +[VIRT_MMIO] ={ 0x0a00, 0x0200 }, /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */ /* * PCIE verbose map: * - * MMIO window { 0x1000, 0x2eff }, - * PIO window { 0x3eff, 0x0001 }, - * ECAM { 0x3f00, 0x0100 }, + * MMIO window { 0x1000, 0x2eff }, + * PIO window{ 0x3eff, 0x0001 }, + * ECAM { 0x3f00, 0x0100 }, */ -[VIRT_PCIE] = { 0x1000, 0x3000 }, -[VIRT_MEM] ={ 0x4000, 30ULL * 1024 * 1024 * 1024 }, +[VIRT_PCIE] ={ 0x1000, 0x3000 }, +[VIRT_MEM] = { 0x4000, 30ULL * 1024 * 1024 * 1024 }, }; static const int a15irqmap[] = { @@ -299,11 +310,24 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi) { int cpu; +/* + * From Documentation/devicetree/bindings/arm/cpus.txt + * On ARM v8 64-bit systems value should be set to 2, + * that corresponds to the MPIDR_EL1 register size. + * If MPIDR_EL1[63:32] value is equal to 0 on all CPUs + * in the system, #address-cells can be set to 1, since + * MPIDR_EL1[63:32] bits are not used for CPUs + * identification. + * + * Now GIC500 doesn't support affinities 2 3 so currently + * #address-cells can stay 1 until future GIC + */ qemu_fdt_add_subnode(vbi-fdt, /cpus); qemu_fdt_setprop_cell(vbi-fdt, /cpus, #address-cells, 0x1); qemu_fdt_setprop_cell(vbi-fdt, /cpus, #size-cells, 0x0); for (cpu = vbi-smp_cpus - 1; cpu = 0; cpu--) { +int Aff1, Aff0; char *nodename = g_strdup_printf(/cpus/cpu@%d, cpu); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu)); @@ -317,12 +341,20 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
Re: [Qemu-devel] [PATCH] Fix default CPU model for ARM64
Hello! Ignoring legacy, the ideal naming scheme would be: qemu-system-arm aarch64-linux-user aarch32-linux-user You know, i rethought. Actually, the question concerns only virt machine, because for other real machines the CPU should be the same as on real HW, and this is no question. Shouldn't our default for virt be just host ? This would eliminate all ambiguity i believe... Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia
[Qemu-devel] [PATCH COLO v4 09/15] Introduce a new -drive option to control whether to connect to remote target
Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- blockdev.c| 8 include/block/block.h | 1 + qemu-options.hx | 4 3 files changed, 13 insertions(+) diff --git a/blockdev.c b/blockdev.c index 5eaf77e..6839c8f 100644 --- a/blockdev.c +++ b/blockdev.c @@ -429,6 +429,10 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, qdict_put(bs_opts, driver, qstring_from_str(buf)); } +if (qemu_opt_get_bool(opts, no-connect, false)) { +bdrv_flags |= BDRV_O_NO_CONNECT; +} + /* disk I/O throttling */ memset(cfg, 0, sizeof(cfg)); cfg.buckets[THROTTLE_BPS_TOTAL].avg = @@ -3197,6 +3201,10 @@ QemuOptsList qemu_common_drive_opts = { .name = detect-zeroes, .type = QEMU_OPT_STRING, .help = try to optimize zero writes (off, on, unmap), +},{ +.name = no-connect, +.type = QEMU_OPT_BOOL, +.help = enable whether to connect remote target }, { /* end of list */ } }, diff --git a/include/block/block.h b/include/block/block.h index e479a34..ed5b8e5 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -87,6 +87,7 @@ typedef struct HDGeometry { #define BDRV_O_PROTOCOL0x8000 /* if no block driver is explicitly given: select an appropriate protocol driver, ignoring the format layer */ +#define BDRV_O_NO_CONNECT 0x1 /* do not connect to remote target */ #define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH) diff --git a/qemu-options.hx b/qemu-options.hx index ec356f6..c41e3e3 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -464,6 +464,7 @@ DEF(drive, HAS_ARG, QEMU_OPTION_drive, [[,bps_max=bm]|[[,bps_rd_max=rm][,bps_wr_max=wm]]]\n [[,iops_max=im]|[[,iops_rd_max=irm][,iops_wr_max=iwm]]]\n [[,iops_size=is]]\n + [,no-connect=on|off]\n use 'file' as a drive image\n, QEMU_ARCH_ALL) STEXI @item -drive @var{option}[,@var{option}[,@var{option}[,...]]] @@ -525,6 +526,9 @@ file sectors into the image file. conversion of plain zero writes by the OS to driver specific optimized zero write commands. You may even choose unmap if @var{discard} is set to unmap to allow a zero write to be converted to an UNMAP operation. +@item no-connect=@var{no-connect} +@var{no-connect} is on or off, and enables whether to connect to remote +target when open the drive. The default value is off. @end table By default, the @option{cache=writeback} mode is used. It will report data -- 2.1.0
[Qemu-devel] [PATCH COLO v4 03/15] Allow creating backup jobs when opening BDS
When opening BDS, we need to create backup jobs for image-fleecing. Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com Cc: Jeff Cody jc...@redhat.com --- block/Makefile.objs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/Makefile.objs b/block/Makefile.objs index 0d8c2a4..8dd9b9c 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -21,10 +21,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o block-obj-$(CONFIG_LIBSSH2) += ssh.o block-obj-y += accounting.o block-obj-y += write-threshold.o +block-obj-y += backup.o common-obj-y += stream.o common-obj-y += commit.o -common-obj-y += backup.o iscsi.o-cflags := $(LIBISCSI_CFLAGS) iscsi.o-libs := $(LIBISCSI_LIBS) -- 2.1.0
[Qemu-devel] [PATCH COLO v4 10/15] NBD client: connect to nbd server later
The secondary qemu starts later than the primary qemu, so we cannot connect to nbd server in bdrv_open(). Introduce a new open flags to control it. Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- block/nbd.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 0624232..e1a04bb 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -298,10 +298,12 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags, return -EINVAL; } -nbd_connect_server(bs, local_err); -if (local_err) { -error_propagate(errp, local_err); -return -EINVAL; +if (!(flags BDRV_O_NO_CONNECT)) { +nbd_connect_server(bs, local_err); +if (local_err) { +error_propagate(errp, local_err); +return -EINVAL; +} } return 0; -- 2.1.0
[Qemu-devel] [PATCH COLO v4 13/15] quorum: implement block driver interfaces for block replication
Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- block/quorum.c | 78 ++ 1 file changed, 78 insertions(+) diff --git a/block/quorum.c b/block/quorum.c index f91ef75..e3c2e46 100644 --- a/block/quorum.c +++ b/block/quorum.c @@ -82,6 +82,8 @@ typedef struct BDRVQuorumState { */ QuorumReadPattern read_pattern; + +int replication_index; /* store which child supports block replication */ } BDRVQuorumState; typedef struct QuorumAIOCB QuorumAIOCB; @@ -972,6 +974,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags, } g_free(opened); +s-replication_index = -1; goto exit; close_exit: @@ -1061,6 +1064,77 @@ static void quorum_refresh_filename(BlockDriverState *bs) bs-full_open_options = opts; } +static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode, + Error **errp) +{ +BDRVQuorumState *s = bs-opaque; +int count = 0, i, index; +Error *local_err = NULL; + +/* + * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary + * QEMU becoming primary QEMU. + */ +if (mode != REPLICATION_MODE_PRIMARY) { +error_setg(errp, Invalid parameter '%s', mode); +return; +} + +if (s-read_pattern != QUORUM_READ_PATTERN_FIFO) { +error_setg(errp, Invalid parameter '%s', read pattern); +return; +} + +for (i = 0; i s-num_children; i++) { +bdrv_start_replication(s-bs[i], mode, local_err); +if (local_err) { +error_free(local_err); +local_err = NULL; +} else { +count++; +index = i; +} +} + +if (count == 0) { +/* No child supports block replication */ +error_setg(errp, this feature or command is not currently supported); +} else if (count 1) { +for (i = 0; i s-num_children; i++) { +bdrv_stop_replication(s-bs[i], false, NULL); +} +error_setg(errp, too many children support block replication); +} else { +s-replication_index = index; +} +} + +static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp) +{ +BDRVQuorumState *s = bs-opaque; + +if (s-replication_index 0) { +error_setg(errp, Block replication is not started); +return; +} + +bdrv_do_checkpoint(s-bs[s-replication_index], errp); +} + +static void quorum_stop_replication(BlockDriverState *bs, bool failover, +Error **errp) +{ +BDRVQuorumState *s = bs-opaque; + +if (s-replication_index 0) { +error_setg(errp, Block replication is not started); +return; +} + +bdrv_stop_replication(s-bs[s-replication_index], failover, errp); +s-replication_index = -1; +} + static BlockDriver bdrv_quorum = { .format_name= quorum, .protocol_name = quorum, @@ -1084,6 +1158,10 @@ static BlockDriver bdrv_quorum = { .is_filter = true, .bdrv_recurse_is_first_non_filter = quorum_recurse_is_first_non_filter, + +.bdrv_start_replication = quorum_start_replication, +.bdrv_do_checkpoint = quorum_do_checkpoint, +.bdrv_stop_replication = quorum_stop_replication, }; static void bdrv_quorum_init(void) -- 2.1.0
Re: [Qemu-devel] [PATCH] Fix default CPU model for ARM64
Hello! FWIW virt-manager 1.2.0 (just released) will do the following when creating a new VM: - aarch64 + kvm : -cpu host - aarch64 + tcg : -cpu cortex-a57 - arm32 + kvm : -cpu host - arm32 + tcg : defer to qemu Though if you explicitly request 'hypervisor default' then we won't specify any -cpu and defer to qemu, which will hit the cortex-a15 default for aarch64 virt-manager is not the only tool to create VMs... But, okay. Seems you just don't want to change this. Well... I still don't agree and this default looks strange for me, but it's just me. I'm out of further arguments. Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia
Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/7] Fix transactional snapshot with virtio-blk dataplane
On Thu, May 07, 2015 at 02:43:43PM +0100, Stefan Hajnoczi wrote: On Wed, May 06, 2015 at 07:23:32PM +0800, Fam Zheng wrote: Reported by Paolo. Unlike the iohandler in main loop, iothreads currently process the event notifier used as virtio-blk ioeventfd in all nested aio_poll. This is dangerous without proper protection, because guest requests could sneak to block layer where they mustn't. For example, a QMP transaction may involve multiple bdrv_drain_all() in handling the list of AioContext it works on. If an aio_poll in one of the bdrv_drain_all() happens to process a guest VQ kick by dispatching the ioeventfd event, a new guest write is then submitted, and voila, the transaction semantics is violated. This series avoids this problem by disabling virtio-blk handlers during bdrv_drain_all() and transactions. Notes: If the general approach is right, other transaction types could get the blockers similarly, in next revision. And some related bdrv_drain_all() could also be changed to bdrv_drain(). virtio-scsi-dataplane will be a bit more complicated, but still doable. It would probably need one more interface abstraction between scsi-disk, scsi-bus and virtio-scsi. Although other devices don't have a pause/resume callback yet, the blk_check_request, which returns -EBUSY if device io op blocker is set, could hopefully cover most cases already. Timers and block jobs also generate IO, but it should be fine as long as they don't change guest visible data, which is true AFAICT. Fam Zheng (7): block: Add op blocker type device IO block: Block device IO during bdrv_drain and bdrv_drain_all block: Add op blocker notifier list block-backend: Add blk_op_blocker_add_notifier virtio-blk: Move complete_request to 'ops' structure virtio-blk: Don't handle output when there is device IO op blocker blockdev: Add device IO op blocker during snapshot transaction block.c | 20 block/block-backend.c | 10 ++ block/io.c | 12 +++ blockdev.c | 7 + hw/block/dataplane/virtio-blk.c | 36 ++--- hw/block/virtio-blk.c | 69 +++-- include/block/block.h | 9 ++ include/block/block_int.h | 3 ++ include/hw/virtio/virtio-blk.h | 17 -- include/sysemu/block-backend.h | 2 ++ 10 files changed, 174 insertions(+), 11 deletions(-) Please also add a listener to nbd.c so NBD exports cannot submit I/O during bdrv_drain_all(). qmp_transaction probably also needs this. Stefan pgpAQPw3DSGnC.pgp Description: PGP signature
[Qemu-devel] [PULL 5/8] virtio-ccw: implement -device_plugged
Let's move operations that are only valid after the backend has been realized to a -device_plugged callback, just as virtio-pci does. Also reorder setting up the host feature bits to the sequence used by virtio-pci. While we're at it, also add a -device_unplugged callback to stop ioeventfd, just to be on the safe side. Reviewed-by: Shannon Zhao shannon.z...@linaro.org Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com Message-Id: 1429627016-30656-3-git-send-email-cornelia.h...@de.ibm.com --- hw/s390x/virtio-ccw.c | 41 ++--- 1 file changed, 26 insertions(+), 15 deletions(-) diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index 1c2bd9d..430cc6f 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -652,10 +652,8 @@ static void virtio_ccw_device_realize(VirtioCcwDevice *dev, Error **errp) bool found = false; SubchDev *sch; int num; -DeviceState *parent = DEVICE(dev); Error *err = NULL; VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_GET_CLASS(dev); -VirtIODevice *vdev; sch = g_malloc0(sizeof(SubchDev)); @@ -778,19 +776,6 @@ static void virtio_ccw_device_realize(VirtioCcwDevice *dev, Error **errp) goto out_err; } -/* device_id is only set after vdev has been realized */ -vdev = virtio_ccw_get_vdev(sch); -sch-id.cu_model = vdev-device_id; - -/* Only the first 32 feature bits are used. */ -dev-host_features[0] = virtio_bus_get_vdev_features(dev-bus, - dev-host_features[0]); - -virtio_add_feature(dev-host_features[0], VIRTIO_F_NOTIFY_ON_EMPTY); -virtio_add_feature(dev-host_features[0], VIRTIO_F_BAD_FEATURE); - -css_generate_sch_crws(sch-cssid, sch-ssid, sch-schid, - parent-hotplugged, 1); return; out_err: @@ -1428,6 +1413,30 @@ static int virtio_ccw_load_config(DeviceState *d, QEMUFile *f) return 0; } +/* This is called by virtio-bus just after the device is plugged. */ +static void virtio_ccw_device_plugged(DeviceState *d) +{ +VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d); +SubchDev *sch = dev-sch; + +sch-id.cu_model = virtio_bus_get_vdev_id(dev-bus); + +/* Only the first 32 feature bits are used. */ +virtio_add_feature(dev-host_features[0], VIRTIO_F_NOTIFY_ON_EMPTY); +virtio_add_feature(dev-host_features[0], VIRTIO_F_BAD_FEATURE); +dev-host_features[0] = virtio_bus_get_vdev_features(dev-bus, + dev-host_features[0]); + +css_generate_sch_crws(sch-cssid, sch-ssid, sch-schid, + d-hotplugged, 1); +} + +static void virtio_ccw_device_unplugged(DeviceState *d) +{ +VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d); + +virtio_ccw_stop_ioeventfd(dev); +} / Virtio-ccw Bus Device Descriptions ***/ static Property virtio_ccw_net_properties[] = { @@ -1752,6 +1761,8 @@ static void virtio_ccw_bus_class_init(ObjectClass *klass, void *data) k-load_queue = virtio_ccw_load_queue; k-save_config = virtio_ccw_save_config; k-load_config = virtio_ccw_load_config; +k-device_plugged = virtio_ccw_device_plugged; +k-device_unplugged = virtio_ccw_device_unplugged; } static const TypeInfo virtio_ccw_bus_info = { -- 2.4.0
[Qemu-devel] [PATCH COLO v4 00/15] Block replication for continuous checkpoints
Block replication is a very important feature which is used for continuous checkpoints(for example: COLO). Usage: Please refer to docs/block-replication.txt You can get the patch here: https://github.com/wencongyang/qemu-colo/commits/block-replication-v4 You can get the patch with the other COLO patches here: https://github.com/wencongyang/qemu-colo/tree/colo_huawei_v4.7 TODO: 1. Test failover when the guest does too many I/O operations. If it takes too much time, we need to optimize it. 2. Continuous block replication. It will be started after basic functions are accepted. Changs Log: V4: 1. Introduce a new driver replication to avoid touch nbd and qcow2. V3: 1: use error_setg() instead of error_set() 2. Add a new block job API 3. Active disk, hidden disk and nbd target uses the same AioContext 4. Add a testcase to test new hbitmap API V2: 1. Redesign the secondary qemu(use image-fleecing) 2. Use Error objects to return error message 3. Address the comments from Max Reitz and Eric Blake Wen Congyang (15): docs: block replication's description allow writing to the backing file Allow creating backup jobs when opening BDS block: Parse backing_reference option to reference existing BDS Backup: clear all bitmap when doing block checkpoint Don't allow a disk use backing reference target Add new block driver interface to connect/disconnect the remote target NBD client: implement block driver interfaces to connect/disconnect NBD server Introduce a new -drive option to control whether to connect to remote target NBD client: connect to nbd server later Add new block driver interfaces to control block replication skip nbd_target when starting block replication quorum: implement block driver interfaces for block replication quorum: allow ignoring child errors Implement new driver for block replication block.c| 270 +++- block/Makefile.objs| 3 +- block/backup.c | 13 ++ block/nbd.c| 67 -- block/quorum.c | 142 - block/replication.c| 512 + blockdev.c | 8 + blockjob.c | 10 + docs/block-replication.txt | 179 include/block/block.h | 10 + include/block/block_int.h | 18 ++ include/block/blockjob.h | 12 ++ qapi/block.json| 16 ++ qemu-options.hx| 4 + tests/qemu-iotests/051 | 13 ++ tests/qemu-iotests/051.out | 13 ++ 16 files changed, 1260 insertions(+), 30 deletions(-) create mode 100644 block/replication.c create mode 100644 docs/block-replication.txt -- 2.1.0
[Qemu-devel] [PULL 7/8] s390x: move fpu regs into a subsection of the vmstate
From: David Hildenbrand d...@linux.vnet.ibm.com Let's move the floating point registers into a seperate subsection and bump up the version id. This cleans up the current vmstate and will allow for a future extension with vector registers in a compatible way. This patch is based on a patch from Eric Farman. Reviewed-by: Eric Farman far...@linux.vnet.ibm.com Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- target-s390x/machine.c | 36 +--- 1 file changed, 29 insertions(+), 7 deletions(-) diff --git a/target-s390x/machine.c b/target-s390x/machine.c index bd4cea7..a034423 100644 --- a/target-s390x/machine.c +++ b/target-s390x/machine.c @@ -33,12 +33,11 @@ static int cpu_post_load(void *opaque, int version_id) return 0; } -const VMStateDescription vmstate_s390_cpu = { -.name = cpu, -.post_load = cpu_post_load, -.version_id = 2, -.minimum_version_id = 2, -.fields = (VMStateField[]) { +const VMStateDescription vmstate_fpu = { +.name = cpu/fpu, +.version_id = 1, +.minimum_version_id = 1, +.fields = (VMStateField[]) { VMSTATE_UINT64(env.fregs[0].ll, S390CPU), VMSTATE_UINT64(env.fregs[1].ll, S390CPU), VMSTATE_UINT64(env.fregs[2].ll, S390CPU), @@ -55,11 +54,26 @@ const VMStateDescription vmstate_s390_cpu = { VMSTATE_UINT64(env.fregs[13].ll, S390CPU), VMSTATE_UINT64(env.fregs[14].ll, S390CPU), VMSTATE_UINT64(env.fregs[15].ll, S390CPU), +VMSTATE_UINT32(env.fpc, S390CPU), +VMSTATE_END_OF_LIST() +} +}; + +static inline bool fpu_needed(void *opaque) +{ +return true; +} + +const VMStateDescription vmstate_s390_cpu = { +.name = cpu, +.post_load = cpu_post_load, +.version_id = 3, +.minimum_version_id = 3, +.fields = (VMStateField[]) { VMSTATE_UINT64_ARRAY(env.regs, S390CPU, 16), VMSTATE_UINT64(env.psw.mask, S390CPU), VMSTATE_UINT64(env.psw.addr, S390CPU), VMSTATE_UINT64(env.psa, S390CPU), -VMSTATE_UINT32(env.fpc, S390CPU), VMSTATE_UINT32(env.todpr, S390CPU), VMSTATE_UINT64(env.pfault_token, S390CPU), VMSTATE_UINT64(env.pfault_compare, S390CPU), @@ -74,4 +88,12 @@ const VMStateDescription vmstate_s390_cpu = { VMSTATE_UINT8(env.sigp_order, S390CPU), VMSTATE_END_OF_LIST() }, +.subsections = (VMStateSubsection[]) { +{ +.vmsd = vmstate_fpu, +.needed = fpu_needed, +} , { +/* empty */ +} +}, }; -- 2.4.0
[Qemu-devel] [PATCH COLO v4 04/15] block: Parse backing_reference option to reference existing BDS
Usage: -drive file=xxx,id=Y, \ -drive file=,id=X,backing_reference.drive_id=Y,backing_reference.hidden-disk.* It will create such backing chain: {virtio-blk dev 'Y'} {virtio-blk dev 'X'} | | | | v v [base] - [mid] - ( Y ) - (hidden target) --- ( X ) v ^ v ^ v ^ v ^ drive-backup sync=none X's backing file is hidden-disk, and hidden-disk's backing file is Y. Disk Y may be opened or reopened in read-write mode, so A block backup job is automatically created: source is Y and target is hidden disk. Active disk X, hidden disk, and Y are all on the same AioContext. Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- block.c| 154 - include/block/block.h | 1 + include/block/block_int.h | 1 + tests/qemu-iotests/051 | 13 tests/qemu-iotests/051.out | 13 5 files changed, 179 insertions(+), 3 deletions(-) diff --git a/block.c b/block.c index 8fa4f05..35e1a95 100644 --- a/block.c +++ b/block.c @@ -1155,6 +1155,119 @@ free_exit: return ret; } +static void backing_reference_completed(void *opaque, int ret) +{ +BlockDriverState *hidden_disk = opaque; + +assert(!hidden_disk-backing_reference); +} + +static int bdrv_open_backing_reference_file(BlockDriverState *bs, +QDict *options, Error **errp) +{ +const char *backing_name; +QDict *hidden_disk_options = NULL; +BlockDriverState *backing_hd, *hidden_disk; +BlockBackend *backing_blk; +AioContext *aio_context; +Error *local_err = NULL; +int ret = 0; + +backing_name = qdict_get_try_str(options, drive_id); +if (!backing_name) { +error_setg(errp, Backing reference needs option drive_id); +ret = -EINVAL; +goto free_exit; +} +qdict_del(options, drive_id); + +qdict_extract_subqdict(options, hidden_disk_options, hidden-disk.); +if (!qdict_size(hidden_disk_options)) { +error_setg(errp, Backing reference needs option hidden-disk.*); +ret = -EINVAL; +goto free_exit; +} + +if (qdict_size(options)) { +const QDictEntry *entry = qdict_first(options); +error_setg(errp, Backing reference used by '%s' doesn't support + the option '%s', bdrv_get_device_name(bs), entry-key); +ret = -EINVAL; +goto free_exit; +} + +backing_blk = blk_by_name(backing_name); +if (!backing_blk) { +error_setg(errp, Device '%s' not found, backing_name); +ret = -ENOENT; +goto free_exit; +} + +backing_hd = blk_bs(backing_blk); +/* Backing reference itself? */ +if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) { +error_setg(errp, Backing reference itself); +ret = -EINVAL; +goto free_exit; +} + +if (bdrv_op_is_blocked(backing_hd, BLOCK_OP_TYPE_BACKING_REFERENCE, + errp)) { +ret = -EBUSY; +goto free_exit; +} + +/* hidden-disk is bs's backing file */ +ret = bdrv_open_backing_file(bs, hidden_disk_options, errp); +hidden_disk_options = NULL; +if (ret 0) { +goto free_exit; +} + +hidden_disk = bs-backing_hd; +if (!hidden_disk-drv || !hidden_disk-drv-supports_backing) { +ret = -EINVAL; +error_setg(errp, Hidden disk's driver doesn't support backing files); +goto free_exit; +} + +bdrv_set_backing_hd(hidden_disk, backing_hd); +bdrv_ref(backing_hd); + +/* + * backing hd may be opened or reopened in read-write mode, so we + * should backup backing hd to hidden disk + */ +bdrv_op_unblock(hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET, +bs-backing_blocker); +bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE, +hidden_disk-backing_blocker); + +bdrv_ref(hidden_disk); + +aio_context = bdrv_get_aio_context(bs); +aio_context_acquire(aio_context); +bdrv_set_aio_context(backing_hd, aio_context); +backup_start(backing_hd, hidden_disk, 0, MIRROR_SYNC_MODE_NONE, NULL, + BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT, + backing_reference_completed, hidden_disk, local_err); +aio_context_release(aio_context); +if (local_err)
[Qemu-devel] [PATCH COLO v4 05/15] Backup: clear all bitmap when doing block checkpoint
Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com Cc: Jeff Cody jc...@redhat.com --- block/backup.c | 13 + blockjob.c | 10 ++ include/block/blockjob.h | 12 3 files changed, 35 insertions(+) diff --git a/block/backup.c b/block/backup.c index d3f648d..d3d8ba7 100644 --- a/block/backup.c +++ b/block/backup.c @@ -210,11 +210,24 @@ static void backup_iostatus_reset(BlockJob *job) bdrv_iostatus_reset(s-target); } +static void backup_do_checkpoint(BlockJob *job, Error **errp) +{ +BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common); + +if (backup_job-sync_mode != MIRROR_SYNC_MODE_NONE) { +error_setg(errp, this feature or command is not currently supported); +return; +} + +hbitmap_reset_all(backup_job-bitmap); +} + static const BlockJobDriver backup_job_driver = { .instance_size = sizeof(BackupBlockJob), .job_type = BLOCK_JOB_TYPE_BACKUP, .set_speed = backup_set_speed, .iostatus_reset = backup_iostatus_reset, +.do_checkpoint = backup_do_checkpoint, }; static BlockErrorAction backup_error_action(BackupBlockJob *job, diff --git a/blockjob.c b/blockjob.c index 2755465..9d2128a 100644 --- a/blockjob.c +++ b/blockjob.c @@ -399,3 +399,13 @@ void block_job_defer_to_main_loop(BlockJob *job, qemu_bh_schedule(data-bh); } + +void block_job_do_checkpoint(BlockJob *job, Error **errp) +{ +if (!job-driver-do_checkpoint) { +error_setg(errp, this feature or command is not currently supported); +return; +} + +job-driver-do_checkpoint(job, errp); +} diff --git a/include/block/blockjob.h b/include/block/blockjob.h index 57d8ef1..b832dc3 100644 --- a/include/block/blockjob.h +++ b/include/block/blockjob.h @@ -50,6 +50,9 @@ typedef struct BlockJobDriver { * manually. */ void (*complete)(BlockJob *job, Error **errp); + +/** Optional callback for job types that support checkpoint. */ +void (*do_checkpoint)(BlockJob *job, Error **errp); } BlockJobDriver; /** @@ -348,4 +351,13 @@ void block_job_defer_to_main_loop(BlockJob *job, BlockJobDeferToMainLoopFn *fn, void *opaque); +/** + * block_job_do_checkpoint: + * @job: The job. + * @errp: Error object. + * + * Do block checkpoint on the specified job. + */ +void block_job_do_checkpoint(BlockJob *job, Error **errp); + #endif -- 2.1.0
[Qemu-devel] [PATCH COLO v4 08/15] NBD client: implement block driver interfaces to connect/disconnect NBD server
Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- block/nbd.c | 65 - 1 file changed, 47 insertions(+), 18 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 2176186..0624232 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -44,6 +44,8 @@ typedef struct BDRVNBDState { NbdClientSession client; QemuOpts *socket_opts; +char *export; +bool connected; } BDRVNBDState; static int nbd_parse_uri(const char *filename, QDict *options) @@ -254,34 +256,55 @@ static int nbd_establish_connection(BlockDriverState *bs, Error **errp) return sock; } -static int nbd_open(BlockDriverState *bs, QDict *options, int flags, -Error **errp) +static void nbd_connect_server(BlockDriverState *bs, Error **errp) { BDRVNBDState *s = bs-opaque; -char *export = NULL; -int result, sock; -Error *local_err = NULL; - -/* Pop the config into our state object. Exit if invalid. */ -nbd_config(s, options, export, local_err); -if (local_err) { -error_propagate(errp, local_err); -return -EINVAL; -} +int sock; /* establish TCP connection, return error if it fails * TODO: Configurable retry-until-timeout behaviour. */ sock = nbd_establish_connection(bs, errp); if (sock 0) { -g_free(export); -return sock; +return; } /* NBD handshake */ -result = nbd_client_init(bs, sock, export, errp); -g_free(export); -return result; +nbd_client_init(bs, sock, s-export, errp); + +s-connected = true; +} + +static void nbd_disconnect_server(BlockDriverState *bs) +{ +BDRVNBDState *s = bs-opaque; + +if (s-connected) { +nbd_client_close(bs); +s-connected = false; +} +} + +static int nbd_open(BlockDriverState *bs, QDict *options, int flags, +Error **errp) +{ +BDRVNBDState *s = bs-opaque; +Error *local_err = NULL; + +/* Pop the config into our state object. Exit if invalid. */ +nbd_config(s, options, s-export, local_err); +if (local_err) { +error_propagate(errp, local_err); +return -EINVAL; +} + +nbd_connect_server(bs, local_err); +if (local_err) { +error_propagate(errp, local_err); +return -EINVAL; +} + +return 0; } static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num, @@ -318,7 +341,7 @@ static void nbd_close(BlockDriverState *bs) BDRVNBDState *s = bs-opaque; qemu_opts_del(s-socket_opts); -nbd_client_close(bs); +nbd_disconnect_server(bs); } static int64_t nbd_getlength(BlockDriverState *bs) @@ -400,6 +423,8 @@ static BlockDriver bdrv_nbd = { .bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, .bdrv_refresh_filename = nbd_refresh_filename, +.bdrv_connect = nbd_connect_server, +.bdrv_disconnect= nbd_disconnect_server, }; static BlockDriver bdrv_nbd_tcp = { @@ -418,6 +443,8 @@ static BlockDriver bdrv_nbd_tcp = { .bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, .bdrv_refresh_filename = nbd_refresh_filename, +.bdrv_connect = nbd_connect_server, +.bdrv_disconnect= nbd_disconnect_server, }; static BlockDriver bdrv_nbd_unix = { @@ -436,6 +463,8 @@ static BlockDriver bdrv_nbd_unix = { .bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, .bdrv_refresh_filename = nbd_refresh_filename, +.bdrv_connect = nbd_connect_server, +.bdrv_disconnect= nbd_disconnect_server, }; static void bdrv_nbd_init(void) -- 2.1.0
Re: [Qemu-devel] [PATCH v4 4/4] spapr_pci: populate ibm,loc-code
On 05/07/2015 05:21 PM, Nikunj A Dadhania wrote: Each hardware instance has a platform unique location code. The OF device tree that describes a part of a hardware entity must include the “ibm,loc-code” property with a value that represents the location code for that hardware entity. Populate ibm,loc-code. 1) PCI passthru devices need to identify with its own ibm,loc-code available on the host. In failure cases use: vfio_name:phb-index:slot.fn 2) Emulated devices encode as following: qemu_name:phb-index:slot.fn Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com --- hw/ppc/spapr_pci.c | 98 +++--- 1 file changed, 86 insertions(+), 12 deletions(-) diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index 12f1b9c..d901007 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -769,6 +769,81 @@ static uint32_t spapr_phb_get_pci_drc_index(sPAPRPHBState *phb, return drck-get_index(drc); } +static bool spapr_phb_vfio_get_devspec_value(PCIDevice *pdev, char **value) Does it have to be a separate function? +{ +char *host; +char path[PATH_MAX]; + +host = object_property_get_str(OBJECT(pdev), host, NULL); +if (!host) { +return false; +} + +snprintf(path, sizeof(path), /sys/bus/pci/devices/%s/devspec, host); +g_free(host); + +return g_file_get_contents(path, value, NULL, NULL); +} + +static char *spapr_phb_vfio_get_loc_code(sPAPRPHBState *sphb, PCIDevice *pdev) +{ +char path[PATH_MAX], *buf = NULL; + +/* We have a vfio host bridge lets get the path. */ +if (!spapr_phb_vfio_get_devspec_value(pdev, buf)) { +return NULL; +} + +snprintf(path, sizeof(path), /proc/device-tree%s/ibm,loc-code, buf); +g_free(buf); + +g_file_get_contents(path, buf, NULL, NULL); +return buf; +} + +static char *spapr_phb_get_loc_code(sPAPRPHBState *sphb, PCIDevice *pdev) +{ +char *path = g_malloc(PATH_MAX); + +if (!path) { +return NULL; +} + +/* + * For non-vfio devices make up the location code out + * of the name, slot and function. + * + * qemu_name:phb-index:slot.fn + */ +snprintf(path, PATH_MAX, qemu_%s:%02d:%02d.%1d, pdev-name, + sphb-index, PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn)); g_strdup_printf? +return path; +} + + +static char *spapr_ibm_get_loc_code(sPAPRPHBState *sphb, PCIDevice *pdev) s/spapr_ibm_get_loc_code/spapr_phb_get_loc_code/ Strange to see ibm in a function name, so far we have only used ibm in RTAS handler names :) +{ +if (object_dynamic_cast(OBJECT(pdev), vfio-pci) != NULL) { QEMU does not compare object_dynamic_cast with NULL anywhere, so: if (object_dynamic_cast(OBJECT(pdev), vfio-pci)) { +char *buf = spapr_phb_vfio_get_loc_code(sphb, pdev); + +/* + * In case of failures reading the loc-code, make it up + * indicating a vfio device + */ +if (!buf) { +buf = g_malloc(PATH_MAX); +if (!buf) { +return NULL; +} +snprintf(buf, PATH_MAX, vfio_%s:%02d:%02d.%1d, pdev-name, + sphb-index, PCI_SLOT(pdev-devfn), PCI_FUNC(pdev-devfn)); g_strdup_printf? Also, vfio_%s:%02d:%02d.%1d looks very similar to qemu_%s:%02d:%02d.%1d, you could extend spapr_phb_get_loc_code() to take vfio/qemu as a parameter. +} +return buf; +} else { +return spapr_phb_get_loc_code(sphb, pdev); +} +} + /* Macros to operate with address in OF binding to PCI */ #define b_x(x, p, l)(((x) ((1(l))-1)) (p)) #define b_n(x) b_x((x), 31, 1) /* 0 if relocatable */ @@ -906,12 +981,12 @@ static void populate_resource_props(PCIDevice *d, ResourceProps *rp) } static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset, - sPAPRPHBState *sphb, - const char *drc_name) + sPAPRPHBState *sphb) { ResourceProps rp; bool is_bridge = false; int pci_status; +char *buf = NULL; uint32_t drc_index = spapr_phb_get_pci_drc_index(sphb, dev); if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) == @@ -973,9 +1048,10 @@ static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset, * processed by OF beforehand */ _FDT(fdt_setprop_string(fdt, offset, name, pci)); -if (drc_name) { -_FDT(fdt_setprop(fdt, offset, ibm,loc-code, drc_name, - strlen(drc_name))); +buf = spapr_ibm_get_loc_code(sphb, dev); +if (buf) { It is rather: if (!buf) { error_report(Bad thing just happened); return -1; } OR g_assert(!buf); Your code can only return NULL if g_malloc() failed (otherwise it will be at least (qemuvfio)_%s:%02d:%02d.%1d) and if this happened, something went
[Qemu-devel] [PULL 0/8] s390x: various patches
The following changes since commit f8340b360b9bc29d48716ba8aca79df2b9544979: hw/ptimer: Do not artificially limit timers when using icount (2015-05-08 17:15:23 +1000) are available in the git repository at: git://github.com/cohuck/qemu tags/s390x-20150508 for you to fetch changes up to 3cda44f7bae5c9feddc11630ba6eecb2e3bed425: s390x/kvm: migrate vcpu interrupt state (2015-05-08 10:36:19 +0200) Assorted s390x patches: - updates for virtio-ccw and s390-virtio, making them more similar to virtio-pci - improvements regarding per-vcpu interrupts and migration Christian Borntraeger (2): s390-virtio: Accommodate guests using virtqueues too early s390-virtio: clear {used,avail}_event_idx on reset as well Cornelia Huck (3): s390-virtio: use common features virtio-ccw: change realization sequence virtio-ccw: implement -device_plugged David Hildenbrand (1): s390x: move fpu regs into a subsection of the vmstate Jens Freimann (2): s390x/kvm: use ioctl KVM_S390_IRQ for vcpu interrupts s390x/kvm: migrate vcpu interrupt state hw/s390x/s390-virtio-bus.c | 28 - hw/s390x/s390-virtio.c | 10 ++ hw/s390x/virtio-ccw.c | 72 +++ target-s390x/cpu-qom.h | 3 ++ target-s390x/cpu.c | 1 + target-s390x/cpu.h | 9 ++ target-s390x/kvm.c | 77 -- target-s390x/machine.c | 49 - 8 files changed, 192 insertions(+), 57 deletions(-) -- 2.4.0
[Qemu-devel] [PULL 2/8] s390-virtio: use common features
We used to avoid enabling event_idx for virtio-blk devices via s390-virtio, but we now have a workaround in place for guests trying to use the device before setting DRIVER_OK. Therefore, let's add DEFINE_VIRTIO_COMMON_FEATURES to the base device so all devices get those common features - and make s390-virtio use the same mechanism as the other transports do. Acked-by: Christian Borntraeger borntrae...@de.ibm.com Reviewed-by: Shannon Zhao shannon.z...@linaro.org Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/s390x/s390-virtio-bus.c | 20 ++-- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c index c27f8a5..4a33c6a 100644 --- a/hw/s390x/s390-virtio-bus.c +++ b/hw/s390x/s390-virtio-bus.c @@ -530,7 +530,6 @@ static unsigned virtio_s390_get_features(DeviceState *d) / S390 Virtio Bus Device Descriptions ***/ static Property s390_virtio_net_properties[] = { -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features), DEFINE_VIRTIO_NET_FEATURES(VirtIOS390Device, host_features), DEFINE_PROP_END_OF_LIST(), }; @@ -592,18 +591,12 @@ static const TypeInfo s390_virtio_serial = { .class_init= s390_virtio_serial_class_init, }; -static Property s390_virtio_rng_properties[] = { -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features), -DEFINE_PROP_END_OF_LIST(), -}; - static void s390_virtio_rng_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); VirtIOS390DeviceClass *k = VIRTIO_S390_DEVICE_CLASS(klass); k-realize = s390_virtio_rng_realize; -dc-props = s390_virtio_rng_properties; set_bit(DEVICE_CATEGORY_MISC, dc-categories); } @@ -632,10 +625,16 @@ static void s390_virtio_busdev_reset(DeviceState *dev) virtio_reset(_dev-vdev); } +static Property virtio_s390_properties[] = { +DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features), +DEFINE_PROP_END_OF_LIST(), +}; + static void virtio_s390_device_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); +dc-props = virtio_s390_properties; dc-realize = s390_virtio_busdev_realize; dc-bus_type = TYPE_S390_VIRTIO_BUS; dc-reset = s390_virtio_busdev_reset; @@ -651,7 +650,6 @@ static const TypeInfo virtio_s390_device_info = { }; static Property s390_virtio_scsi_properties[] = { -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features), DEFINE_VIRTIO_SCSI_FEATURES(VirtIOS390Device, host_features), DEFINE_PROP_END_OF_LIST(), }; @@ -675,18 +673,12 @@ static const TypeInfo s390_virtio_scsi = { }; #ifdef CONFIG_VHOST_SCSI -static Property s390_vhost_scsi_properties[] = { -DEFINE_VIRTIO_COMMON_FEATURES(VirtIOS390Device, host_features), -DEFINE_PROP_END_OF_LIST(), -}; - static void s390_vhost_scsi_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); VirtIOS390DeviceClass *k = VIRTIO_S390_DEVICE_CLASS(klass); k-realize = s390_vhost_scsi_realize; -dc-props = s390_vhost_scsi_properties; set_bit(DEVICE_CATEGORY_STORAGE, dc-categories); } -- 2.4.0
[Qemu-devel] [PATCH COLO v4 01/15] docs: block replication's description
Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- docs/block-replication.txt | 179 + 1 file changed, 179 insertions(+) create mode 100644 docs/block-replication.txt diff --git a/docs/block-replication.txt b/docs/block-replication.txt new file mode 100644 index 000..a29f51a --- /dev/null +++ b/docs/block-replication.txt @@ -0,0 +1,179 @@ +Block replication + +Copyright Fujitsu, Corp. 2015 +Copyright (c) 2015 Intel Corporation +Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD. + +This work is licensed under the terms of the GNU GPL, version 2 or later. +See the COPYING file in the top-level directory. + +Block replication is used for continuous checkpoints. It is designed +for COLO (COurse-grain LOck-stepping) where the Secondary VM is running. +It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario, +where the Secondary VM is not running. + +This document gives an overview of block replication's design. + +== Background == +High availability solutions such as micro checkpoint and COLO will do +consecutive checkpoints. The VM state of Primary VM and Secondary VM is +identical right after a VM checkpoint, but becomes different as the VM +executes till the next checkpoint. To support disk contents checkpoint, +the modified disk contents in the Secondary VM must be buffered, and are +only dropped at next checkpoint time. To reduce the network transportation +effort at the time of checkpoint, the disk modification operations of +Primary disk are asynchronously forwarded to the Secondary node. + +== Workflow == +The following is the image of block replication workflow: + ++--+++ +|Primary Write Requests||Secondary Write Requests| ++--+++ + | | + | (4) + | V + | /-\ + | Copy and Forward| | + |-(1)--+ | Disk Buffer | + | | | | + | (3) \-/ + | speculative ^ + |write through(2) + | | | + V V | + +--+ ++ + | Primary Disk | | Secondary Disk | + +--+ ++ + +1) Primary write requests will be copied and forwarded to Secondary + QEMU. +2) Before Primary write requests are written to Secondary disk, the + original sector content will be read from Secondary disk and + buffered in the Disk buffer, but it will not overwrite the existing + sector content(it could be from either Secondary Write Requests or + previous COW of Primary Write Requests) in the Disk buffer. +3) Primary write requests will be written to Secondary disk. +4) Secondary write requests will be buffered in the Disk buffer and it + will overwrite the existing sector content in the buffer. + +== Architecture == +We are going to implement block replication from many basic +blocks that are already in QEMU. + + virtio-blk || + ^||.-- + |||| Secondary +1 Quorum ||'-- + / \ || +/\|| + Primary2 filter + disk ^ virtio-blk + | ^ +3 NBD --- 3 NBD | +client|| server 2 filter + ||^ ^ +. ||| | +Primary | || Secondary disk - hidden-disk 5 - active-disk 4 +' ||| backing^ backing + ||| | + ||| | + ||'-'
[Qemu-devel] [PATCH COLO v4 07/15] Add new block driver interface to connect/disconnect the remote target
In some cases, we want to connect/disconnect the remote target when we need, not in bdrv_open()/bdrv_close(). Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- block.c | 24 include/block/block.h | 3 +++ include/block/block_int.h | 3 +++ 3 files changed, 30 insertions(+) diff --git a/block.c b/block.c index 15d21da..050f919 100644 --- a/block.c +++ b/block.c @@ -4189,3 +4189,27 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs) { return bs-stats; } + +void bdrv_connect(BlockDriverState *bs, Error **errp) +{ +BlockDriver *drv = bs-drv; + +if (drv drv-bdrv_connect) { +drv-bdrv_connect(bs, errp); +} else if (bs-file) { +bdrv_connect(bs-file, errp); +} else { +error_setg(errp, this feature or command is not currently supported); +} +} + +void bdrv_disconnect(BlockDriverState *bs) +{ +BlockDriver *drv = bs-drv; + +if (drv drv-bdrv_disconnect) { +drv-bdrv_disconnect(bs); +} else if (bs-file) { +bdrv_disconnect(bs-file); +} +} diff --git a/include/block/block.h b/include/block/block.h index 006c941..e479a34 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -592,4 +592,7 @@ void bdrv_flush_io_queue(BlockDriverState *bs); BlockAcctStats *bdrv_get_stats(BlockDriverState *bs); +void bdrv_connect(BlockDriverState *bs, Error **errp); +void bdrv_disconnect(BlockDriverState *bs); + #endif diff --git a/include/block/block_int.h b/include/block/block_int.h index 49ca826..da29fa7 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -290,6 +290,9 @@ struct BlockDriver { */ int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo); +void (*bdrv_connect)(BlockDriverState *bs, Error **errp); +void (*bdrv_disconnect)(BlockDriverState *bs); + QLIST_ENTRY(BlockDriver) list; }; -- 2.1.0
Re: [Qemu-devel] [PATCH] Fix default CPU model for ARM64
On Fri, May 8, 2015 at 12:14 AM, Pavel Fedin p.fe...@samsung.com wrote: Hello! FWIW virt-manager 1.2.0 (just released) will do the following when creating a new VM: - aarch64 + kvm : -cpu host - aarch64 + tcg : -cpu cortex-a57 - arm32 + kvm : -cpu host - arm32 + tcg : defer to qemu Though if you explicitly request 'hypervisor default' then we won't specify any -cpu and defer to qemu, which will hit the cortex-a15 default for aarch64 virt-manager is not the only tool to create VMs... But, okay. Seems you just don't want to change this. Well... I still don't agree and this default looks strange for me, but it's just me. I'm out of further arguments. I think the real bug is the naming scheme for QEMU targets. QEMU targets such as Aarch64 and x86_64 are following the long established convention of using an ISA for a name, but are ultimately badly named as they support multiple ISAs beyond what their singular name suggests. qemu-system-arm would be the be the best name for the works-for-all-ARM QEMU but that name is taken. But if we are true to qemu-system-aarch64 being a functional superset of arm, can we just unify as the one arch (arm) and phase out qemu-system-aarch64 (would become an alias of qemu-system-arm)?. Linux user builds are the main complication, but that is the case where it very much does make sense to tie to an ISA. Ignoring legacy, the ideal naming scheme would be: qemu-system-arm aarch64-linux-user aarch32-linux-user Regards, Peter Regards, Peter Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia
Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description
On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote: * Stefan Hajnoczi (stefa...@redhat.com) wrote: On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote: On 24/04/2015 11:38, Wen Congyang wrote: That can be done with drive-mirror. But I think it's too early for that. Do you mean use drive-mirror instead of quorum? Only before starting up a new secondary. Basically you do a migration with non-shared storage, and then start the secondary in colo mode. But it's only for the failover case. Quorum (or a new block/colo.c driver or filter) is fine for normal colo operation. Perhaps this patch series should mirror the Secondary's disk to a Backup Secondary so that the system can be protected very quickly after failover. I think anyone serious about fault tolerance would deploy a Backup Secondary, otherwise the system cannot survive two failures unless a human administrator is lucky/fast enough to set up a new Secondary. I'd assumed that a higher level management layer would do the allocation of a new secondary after the first failover, so no human need be involved. That doesn't help, after the first failover is too late even if it's done by a program. There should be no window during which the VM is unprotected. People who want fault tolerance care about 9s of availability. The VM must be protected on the new Primary as soon as the failover occurs, otherwise this isn't a serious fault tolerance solution. Stefan pgpVH1OR4tWnF.pgp Description: PGP signature
[Qemu-devel] [PULL 1/8] s390-virtio: Accommodate guests using virtqueues too early
From: Christian Borntraeger borntrae...@de.ibm.com Feature updates are not a synchronuous operation for the legacy s390-virtio transport. This transport syncs the guest feature bits (those from finalize) on the set_status hypercall. Before that qemu thinks that features are zero, which means QEMU will misbehave, e.g. it will not write the event index, even if the guest asks for it. Let's detect the case where a kick happens before the driver is ready and force sync the features. With this workaround, it is now safe to switch to the common feature bit handling code as used by all other transports. Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- hw/s390x/s390-virtio.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/hw/s390x/s390-virtio.c b/hw/s390x/s390-virtio.c index 3a1b9ee..59750db 100644 --- a/hw/s390x/s390-virtio.c +++ b/hw/s390x/s390-virtio.c @@ -77,6 +77,16 @@ static int s390_virtio_hcall_notify(const uint64_t *args) if (mem ram_size) { VirtIOS390Device *dev = s390_virtio_bus_find_vring(s390_bus, mem, i); if (dev) { +/* + * Older kernels will use the virtqueue before setting DRIVER_OK. + * In this case the feature bits are not yet up to date, meaning + * that several funny things can happen, e.g. the guest thinks + * EVENT_IDX is on and QEMU thinks it is off. Let's force a feature + * and status sync. + */ +if (!(dev-vdev-status VIRTIO_CONFIG_S_DRIVER_OK)) { +s390_virtio_device_update_status(dev); +} virtio_queue_notify(dev-vdev, i); } else { r = -EINVAL; -- 2.4.0
[Qemu-devel] [PULL 6/8] s390x/kvm: use ioctl KVM_S390_IRQ for vcpu interrupts
From: Jens Freimann jf...@linux.vnet.ibm.com KVM_S390_INT uses only two parameter fields. This is not enough to pass all required information for certain interrupts. A new ioctl KVM_S390_IRQ is available which allows us to inject all local interrupts as defined in the Principles of Operation. It takes a struct kvm_s390_irq as a parameter which can store interrupt payload data for all interrupts. Let's use the new ioctl for injecting vcpu interrupts. Tested-by: Thomas Huth th...@linux.vnet.ibm.com Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com Signed-off-by: Jens Freimann jf...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- target-s390x/kvm.c | 22 -- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c index 8e65e43..43ad009 100644 --- a/target-s390x/kvm.c +++ b/target-s390x/kvm.c @@ -124,6 +124,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = { static int cap_sync_regs; static int cap_async_pf; static int cap_mem_op; +static int cap_s390_irq; static void *legacy_s390_alloc(size_t size, uint64_t *align); @@ -249,6 +250,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) cap_sync_regs = kvm_check_extension(s, KVM_CAP_SYNC_REGS); cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF); cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP); +cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ); kvm_s390_enable_cmma(s); @@ -827,10 +829,9 @@ static int s390_kvm_irq_to_interrupt(struct kvm_s390_irq *irq, return r; } -void kvm_s390_vcpu_interrupt(S390CPU *cpu, struct kvm_s390_irq *irq) +static void inject_vcpu_irq_legacy(CPUState *cs, struct kvm_s390_irq *irq) { struct kvm_s390_interrupt kvmint = {}; -CPUState *cs = CPU(cpu); int r; r = s390_kvm_irq_to_interrupt(irq, kvmint); @@ -846,6 +847,23 @@ void kvm_s390_vcpu_interrupt(S390CPU *cpu, struct kvm_s390_irq *irq) } } +void kvm_s390_vcpu_interrupt(S390CPU *cpu, struct kvm_s390_irq *irq) +{ +CPUState *cs = CPU(cpu); +int r; + +if (cap_s390_irq) { +r = kvm_vcpu_ioctl(cs, KVM_S390_IRQ, irq); +if (!r) { +return; +} +error_report(KVM failed to inject interrupt %llx, irq-type); +exit(1); +} + +inject_vcpu_irq_legacy(cs, irq); +} + static void __kvm_s390_floating_interrupt(struct kvm_s390_irq *irq) { struct kvm_s390_interrupt kvmint = {}; -- 2.4.0
[Qemu-devel] [PATCH COLO v4 15/15] Implement new driver for block replication
Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- block/Makefile.objs | 1 + block/replication.c | 512 2 files changed, 513 insertions(+) create mode 100644 block/replication.c diff --git a/block/Makefile.objs b/block/Makefile.objs index 8dd9b9c..8b1c587 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -22,6 +22,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o block-obj-y += accounting.o block-obj-y += write-threshold.o block-obj-y += backup.o +block-obj-y += replication.o common-obj-y += stream.o common-obj-y += commit.o diff --git a/block/replication.c b/block/replication.c new file mode 100644 index 000..f2a9b2b --- /dev/null +++ b/block/replication.c @@ -0,0 +1,512 @@ +#include qemu-common.h +#include block/block_int.h +#include block/blockjob.h +#include block/nbd.h + +typedef struct BDRVReplicationState { +ReplicationMode mode; +int replication_state; +char *export_name; +NBDExport *exp; +BlockDriverState *active_disk; +BlockDriverState *hidden_disk; +int error; + +/* + * After failover, secondary qemu should read/write this + * bs directly. + */ +BlockDriverState *bs; +} BDRVReplicationState; + +enum { +BLOCK_REPLICATION_NONE, /* block replication is not started */ +BLOCK_REPLICATION_RUNNING, /* block replication is running */ +BLOCK_REPLICATION_DONE, /* block replication is done(failover) */ +}; + +#define COMMIT_CLUSTER_BITS 16 +#define COMMIT_CLUSTER_SIZE (1 COMMIT_CLUSTER_BITS) +#define COMMIT_SECTORS_PER_CLUSTER (COMMIT_CLUSTER_SIZE / BDRV_SECTOR_SIZE) + +static void replication_stop(BlockDriverState *bs, bool failover, Error **errp); + +#define NBD_OPT_EXPORT export +#define REPLICATION_MODEmode +static QemuOptsList replication_runtime_opts = { +.name = replication, +.head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head), +.desc = { +{ +.name = REPLICATION_MODE, +.type = QEMU_OPT_STRING, +}, +{ +.name = NBD_OPT_EXPORT, +.type = QEMU_OPT_STRING, +.help = The NBD server name, +}, +{ /* end of list */ } +}, +}; + +static int replication_open(BlockDriverState *bs, QDict *options, +int flags, Error **errp) +{ +int ret; +BDRVReplicationState *s = bs-opaque;; +Error *local_err = NULL; +QemuOpts *opts = NULL; +const char *mode; + +ret = -EINVAL; +opts = qemu_opts_create(replication_runtime_opts, NULL, 0, error_abort); +qemu_opts_absorb_qdict(opts, options, local_err); +if (local_err) { +goto fail; +} + +mode = qemu_opt_get(opts, REPLICATION_MODE); +if (!mode) { +error_setg(local_err, Missing the option mode); +goto fail; +} + +if (!strcmp(mode, primary)) { +s-mode = REPLICATION_MODE_PRIMARY; +} else if (!strcmp(mode, secondary)) { +s-mode = REPLICATION_MODE_SECONDARY; +} else { +error_setg(local_err, + The option mode's value should be primary or secondary); +goto fail; +} + +if (s-mode == REPLICATION_MODE_SECONDARY) { +s-export_name = g_strdup(qemu_opt_get(opts, NBD_OPT_EXPORT)); +if (!s-export_name) { +error_setg(local_err, Missing the option export); +goto fail; +} +} + +return 0; + +fail: +qemu_opts_del(opts); +/* propagate error */ +if (local_err) { +error_propagate(errp, local_err); +} +return ret; +} + +static void replication_close(BlockDriverState *bs) +{ +BDRVReplicationState *s = bs-opaque; + +if (s-replication_state == BLOCK_REPLICATION_RUNNING) { +replication_stop(bs, false, NULL); +} + +g_free(s-export_name); +} + +static void replication_refresh_filename(BlockDriverState *bs) +{ +} + +static int64_t replication_getlength(BlockDriverState *bs) +{ +return bdrv_getlength(bs-file); +} + +static int replication_get_io_status(BDRVReplicationState *s) +{ +switch (s-replication_state) { +case BLOCK_REPLICATION_NONE: +return -EIO; +case BLOCK_REPLICATION_RUNNING: +return 0; +case BLOCK_REPLICATION_DONE: +return s-mode == REPLICATION_MODE_PRIMARY ? -EIO : 1; +default: +abort(); +} +} + +static int replication_return_value(BDRVReplicationState *s, int ret) +{ +if (s-mode == REPLICATION_MODE_SECONDARY) { +return ret; +} + +if (ret 0) { +s-error = ret; +ret = 0; +} + +return ret; +} + +static coroutine_fn int replication_co_readv(BlockDriverState *bs, + int64_t sector_num, + int remaining_sectors, +
Re: [Qemu-devel] [PATCH 2/6] watchdog: Add watchdog device diag288 to the sysbus
On Fri, 17 Apr 2015 09:52:37 +0200 Cornelia Huck cornelia.h...@de.ibm.com wrote: From: Xu Wang gesa...@linux.vnet.ibm.com A new sysbus device for diag288 watchdog, it monitors the kvm guest status, and take corresponding actions when it detects that the guest is not responding. Signed-off-by: Xu Wang gesa...@linux.vnet.ibm.com Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- default-configs/s390x-softmmu.mak | 1 + hw/watchdog/Makefile.objs | 1 + hw/watchdog/wdt_diag288.c | 110 ++ include/hw/watchdog/wdt_diag288.h | 36 + qemu-options.hx | 6 ++- 5 files changed, 152 insertions(+), 2 deletions(-) create mode 100644 hw/watchdog/wdt_diag288.c create mode 100644 include/hw/watchdog/wdt_diag288.h OK, after some deliberation, I think it is best to model the diag288 watchdog as a bus-less device: The other watchdogs are modelled as devices as well, and it is probably a good idea to stay with this model as other parties may have made assumptions based on this. Xu, can you please switch the diag288 device to TYPE_DEVICE and address Markus' comments regarding the help text as well?
Re: [Qemu-devel] [PULL 00/16] Migration pull request (v2)
On Thu, May 07, 2015 at 11:40:50PM +0530, Amit Shah wrote: On (Thu) 07 May 2015 [13:45:26], Peter Maydell wrote: On 7 May 2015 at 12:50, Juan Quintela quint...@redhat.com wrote: Hi again For v2 - fix 32bit compilation (as said, compiling for 64bit linux, 64bit windows and 32bit windows was not enough) - Now, we have versions 2.4 everywhere (thanks Eric) - Liang Li sent a new patch to the list to fix the update of a migration parameter, included. Please apply, and sorry for the inconvenience. Fails to build on win32: Does the buildbot try all these combinations? I want to have a 'stage' branch where I just push unapplied patches and receive complaints before sending a pull req. The buildbot is dead and requires maintenance: http://buildbot.b1-systems.de/qemu/one_line_per_build There are two alternatives: 1. Travis (see .travis.yml) but it's missing mingw32. It will never be able to do builds for host operating systems that do not support cross-compilation from Linux. 2. patchew (http://qemu.patchew.org/) but it only has Fedora 20 x86_64 builds at the moment. Adding mingw32 cross-compilation should be possible but it scans the mailing list rather than git repos. The source code to patchew is available here: https://github.com/famz/patchew The trouble with continuous integration and build farms is that they require maintenance. I think patchew is the best bet right now since Fam is developing it. Stefan pgpvELUKvDUM_.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH] qemu-nbd: only send a limited number of errno codes on the wire
Paolo Bonzini pbonz...@redhat.com writes: On 08/05/2015 08:45, Markus Armbruster wrote: Luckily, most common error values are more or less universal: in particular, of all errno values = 34 (up to ERANGE), they are all the same on supported platforms except for 11 (which is EAGAIN on Windows and Linux, but EDEADLK on Darwin and the *BSDs). Can EAGAIN or EDEADLK happen? I don't know is an acceptable answer :) No. Even stuff like EFAULT would be a bug. Good. So, in order to guarantee some portability, only keep a dozen possible error codes and squash everything else to EINVAL. Ugh. I guess it'll do. Cleaner solution: Fix the protocol to transmit EPERM, EIO, ... in addition to 1, 5, ... Why? It's a binary protocol after all. Okay, I just succeeded in demonstrating my ignorance %-) But I agree that the right fix without backwards-compatibility would be to make the errors something like 0x8000 to 0x8004. Paolo Two more remarks below. If backward compatibility is not an issue: s/in addition to/instead of/. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- nbd.c | 47 +++ 1 file changed, 47 insertions(+) diff --git a/nbd.c b/nbd.c index eea8c51..1ad5b66 100644 --- a/nbd.c +++ b/nbd.c @@ -86,6 +86,37 @@ #define NBD_OPT_ABORT (2) #define NBD_OPT_LIST(3) +/* NBD errors are based on errno numbers, so there is a 1:1 mapping, + * but only a limited set of errno values is specified in the protocol. + * Everything else is squashed to EINVAL. + */ Is the protocol defined anywhere? https://github.com/yoe/nbd/blob/master/doc/proto.txt All it has on the error code is this paragraph: The reply contains three fields: a 32 bit magic number ('magic'), a 32 bit error code ('error'; 0, unless an error occurred in which case it is the errno of the error on the server side), and the same 64 bit handle that the corresponding request had in its 'handle' field. In case the reply is sent in response to a read request and the error field is 0 (zero), the reply header is immediately followed by request.len bytes of data. Could you update it to document the errno compatibility issues, and recommended practice (i.e. this patch's)? +static int system_errno_to_nbd_errno(int err) +{ +switch (err) { +case EPERM: +return 1; +case EIO: +return 5; +case ENXIO: +return 6; +case E2BIG: +return 7; +case ENOMEM: +return 12; +case EACCES: +return 13; +case EFBIG: +return 27; +case ENOSPC: +return 28; +case EROFS: +return 30; +case EINVAL: +default: +return 22; +} +} + This maps recognized OS errnos to NBD errnos. The latter are literals. /* Definitions for opaque data types */ typedef struct NBDRequest NBDRequest; @@ -856,6 +887,20 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply *reply) reply-error = be32_to_cpup((uint32_t*)(buf + 4)); reply-handle = be64_to_cpup((uint64_t*)(buf + 8)); +/* NBD errors should be universally equal to the corresponding + * errno values, check it here. + */ +QEMU_BUILD_BUG_ON(EPERM != 1); +QEMU_BUILD_BUG_ON(EIO != 5); +QEMU_BUILD_BUG_ON(ENXIO != 6); +QEMU_BUILD_BUG_ON(E2BIG != 7); +QEMU_BUILD_BUG_ON(ENOMEM != 12); +QEMU_BUILD_BUG_ON(EACCES != 13); +QEMU_BUILD_BUG_ON(EINVAL != 22); +QEMU_BUILD_BUG_ON(EFBIG != 27); +QEMU_BUILD_BUG_ON(ENOSPC != 28); +QEMU_BUILD_BUG_ON(EROFS != 30); + This checks that the mapping above is the identify function for all the recognized NBD errnos. Why is that necessary? Still curious. Explain, and earn my R-by :) Same literals as above. Violates DRY. I don't mind all that much, but wonder whether we could at least do the checking next to system_errno_to_nbd_errno(). Could we? TRACE(Got reply: { magic = 0x%x, .error = %d, handle = % PRIu64 }, magic, reply-error, reply-handle); @@ -872,6 +917,8 @@ static ssize_t nbd_send_reply(int csock, struct nbd_reply *reply) uint8_t buf[NBD_REPLY_SIZE]; ssize_t ret; +reply-error = system_errno_to_nbd_errno(reply-error); + /* Reply [ 0 .. 3]magic (NBD_REPLY_MAGIC) [ 4 .. 7]error (0 == no error)
Re: [Qemu-devel] [PATCH v6 3/6] Qemu-Xen-vTPM: Xen frontend driver infrastructure
On Mon, 4 May 2015, Quan Xu wrote: This patch adds infrastructure for xen front drivers living in qemu, so drivers don't need to implement common stuff on their own. It's mostly xenbus management stuff: some functions to access XenStore, setting up XenStore watches, callbacks on device discovery and state changes, and handle event channel between the virtual machines. Call xen_fe_register() function to register XenDevOps, and make sure, [...] 3 = [...] device = (frontend device, the backend is running in QEMU/.etc) vkbd = [...] vif = [...] .. (QEMU) xen_vtpmdev_ops is initialized with the following process: xen_hvm_init() [...] --xen_fe_register(vtpm, ...) --xenstore_fe_scan() --xen_fe_try_init(ops) -- XenDevOps.init() --xen_fe_get_xendev() -- XenDevOps.alloc() --xen_fe_check() --xen_fe_try_initialise() -- XenDevOps.initialise() --xen_fe_try_connected() -- XenDevOps.connected() --xs_watch() [...] Signed-off-by: Quan Xu quan...@intel.com --Changes in v6: -Replace buf_size with PAGE_SIZE and use length rather than shr-length. --- hw/tpm/Makefile.objs | 1 + hw/tpm/xen_vtpm_frontend.c | 315 +++ hw/xen/xen_frontend.c| 20 +++ include/hw/xen/xen_backend.h | 5 + include/hw/xen/xen_common.h | 6 + xen-hvm.c| 5 + 6 files changed, 352 insertions(+) create mode 100644 hw/tpm/xen_vtpm_frontend.c diff --git a/hw/tpm/Makefile.objs b/hw/tpm/Makefile.objs index 99f5983..57919fa 100644 --- a/hw/tpm/Makefile.objs +++ b/hw/tpm/Makefile.objs @@ -1,2 +1,3 @@ common-obj-$(CONFIG_TPM_TIS) += tpm_tis.o common-obj-$(CONFIG_TPM_PASSTHROUGH) += tpm_passthrough.o +common-obj-$(CONFIG_TPM_XENSTUBDOMS) += xen_vtpm_frontend.o diff --git a/hw/tpm/xen_vtpm_frontend.c b/hw/tpm/xen_vtpm_frontend.c new file mode 100644 index 000..d6e7cc6 --- /dev/null +++ b/hw/tpm/xen_vtpm_frontend.c @@ -0,0 +1,315 @@ +/* + * Connect to Xen vTPM stubdom domain + * + * Copyright (c) 2015 Intel Corporation + * Authors: + *Quan Xu quan...@intel.com + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see http://www.gnu.org/licenses/ + */ + +#include stdio.h +#include stdlib.h +#include stdarg.h +#include string.h +#include unistd.h +#include signal.h +#include inttypes.h +#include time.h +#include fcntl.h +#include errno.h +#include sys/ioctl.h +#include sys/types.h +#include sys/stat.h +#include sys/mman.h +#include sys/uio.h + +#include hw/hw.h +#include block/aio.h +#include hw/xen/xen_backend.h + +#ifndef XS_STUBDOM_VTPM_ENABLE +#define XS_STUBDOM_VTPM_ENABLE1 +#endif + +#ifndef PAGE_SIZE +#define PAGE_SIZE 4096 +#endif + +enum tpmif_state { +/* No contents, vTPM idle, cancel complete */ +TPMIF_STATE_IDLE, +/* Request ready or vTPM working */ +TPMIF_STATE_SUBMIT, +/* Response ready or vTPM idle */ +TPMIF_STATE_FINISH, +/* Cancel requested or vTPM working */ +TPMIF_STATE_CANCEL, +}; + +static AioContext *vtpm_aio_ctx; + +enum status_bits { +VTPM_STATUS_RUNNING = 0x1, +VTPM_STATUS_IDLE = 0x2, +VTPM_STATUS_RESULT = 0x4, +VTPM_STATUS_CANCELED = 0x8, +}; + +struct tpmif_shared_page { +/* Request and response length in bytes */ +uint32_t length; +/* Enum tpmif_state */ +uint8_t state; +/* For the current request */ +uint8_t locality; +/* Should be zero */ +uint8_t pad; +/* Extra pages for long packets; may be zero */ +uint8_t nr_extra_pages; +/* + * Grant IDs, the length is actually nr_extra_pages. + * beyond the extra_pages entries is the actual request + * and response. + */ +uint32_t extra_pages[0]; +}; + +struct xen_vtpm_dev { +struct XenDevice xendev; /* must be first */ +struct tpmif_shared_page *shr; +xc_gntshr*xen_xcs; +int ring_ref; +int bedomid; +QEMUBH *sr_bh; +}; + +static uint8_t vtpm_status(struct xen_vtpm_dev *vtpmdev) +{ +switch (vtpmdev-shr-state) { +case TPMIF_STATE_IDLE: +case TPMIF_STATE_FINISH:
Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols
Am 07.05.2015 um 16:50 hat Paolo Bonzini geschrieben: On 07/05/2015 16:34, Kevin Wolf wrote: Am 07.05.2015 um 16:16 hat Paolo Bonzini geschrieben: On 07/05/2015 16:07, Kevin Wolf wrote: This is not right for two reasons: The first is that this is BlockBackend code I think it would take effect for the qemu-nbd case though. Oh, you want to change the server code rather than the client? Yes. Actually, considering all the information in this thread, I'm inclined that we should change both sides. qemu-nbd because ENOSPC might be what clients expect by analogy with Linux block devices, even if the behaviour for accesses beyond the device size isn't specified in the NBD protocol and the server might just do anything. As long as the behaviour is undefined, it's nice to do what most people may expect. And as the real fix change the nbd client, because even if new qemu-nbd versions will be nice, we shouldn't rely on undefined behaviour. We know that old qemu-nbd servers won't produce ENOSPC and I'm not sure what other NBD servers do. Wait... Are you saying that NBD sends a (platform specific) errno value over the network? :-/ Yes. :/ That said, at least the error codes that Linux places in /usr/include/asm/errno-base.h seem to be pretty much standard---at least Windows and most Unices share them---with the exception of EAGAIN. I'll send a patch to NBD to standardize the set of error codes that it sends. Thanks, that will be helpful in the future. Is this the right place to look up the spec? http://sourceforge.net/p/nbd/code/ci/master/tree/doc/proto.txt If so, the commands seem to be hopelessly underspecified, especially with respect to error conditions. And where it says something about errors, it doesn't make sense: The server is forbidden to reply to a NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously) In theory, what error code the NBD server needs to send should be specified by the NBD protocol. Am I right to assume that it doesn't do that? Nope. In any case, I'm not sure whether qemu's internal error code should change just for NBD. Producing the right error code for the protocol is the job of nbd_co_receive_request(). Ok, so it shouldn't reach blk_check_request at all. But then, we should aim at making blk_check_request's checks assertions. Sounds fair as a goal, but I don't think all devices have such checks yet. We've fixed the most common devices (IDE, scsi-disk and virtio-blk) just a while ago. and it wouldn't even take effect for the qcow2 case where we're writing past EOF only on the protocol layer. The second is that -ENOSPC is only for writes and not for reads. This is right. Reads in the kernel return 0, but in QEMU we do not want that. The code currently returns -EIO, but perhaps -EINVAL is a better match. It also happens to be what Linux returns for discards. Perhaps it is, yes. It shouldn't make a difference for guests anyway. (Unlike -ENOSPC for writes, which would trigger werror=enospc! That's most likely not what we want.) Yes, we want the check duplicated in all BlockBackend users. Most of them already do it, see the work that Markus did last year I think. I wouldn't call it duplicated because the action to take is different for each device, but yes, the check belongs there. Kevin
Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols
On 08/05/2015 12:08, Kevin Wolf wrote: Actually, considering all the information in this thread, I'm inclined that we should change both sides. qemu-nbd because ENOSPC might be what clients expect by analogy with Linux block devices, even if the behaviour for accesses beyond the device size isn't specified in the NBD protocol and the server might just do anything. As long as the behaviour is undefined, it's nice to do what most people may expect. And as the real fix change the nbd client, because even if new qemu-nbd versions will be nice, we shouldn't rely on undefined behaviour. We know that old qemu-nbd servers won't produce ENOSPC and I'm not sure what other NBD servers do. Sounds like a plan. Thanks, that will be helpful in the future. Is this the right place to look up the spec? http://sourceforge.net/p/nbd/code/ci/master/tree/doc/proto.txt Yes. If so, the commands seem to be hopelessly underspecified, especially with respect to error conditions. And where it says something about errors, it doesn't make sense: The server is forbidden to reply to a NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously) So does nbd-server. O:-) Looks like you're reading the spec too literally (which is never a bad thing). Ok, so it shouldn't reach blk_check_request at all. But then, we should aim at making blk_check_request's checks assertions. Sounds fair as a goal, but I don't think all devices have such checks yet. We've fixed the most common devices (IDE, scsi-disk and virtio-blk) just a while ago. Indeed (aim at). Paolo
Re: [Qemu-devel] [PATCH] block: return EPERM on writes or discards to read-only devices
On Thu, May 07, 2015 at 05:45:48PM +0200, Paolo Bonzini wrote: This is the behavior in the operating system, for example Linux's blkdev_write_iter has the following: if (bdev_read_only(I_BDEV(bd_inode))) return -EPERM; This does not apply to opening a device for read/write, when the device only supports read-only operation. In this case any of EACCES, EPERM or EROFS is acceptable depending on why writing is not possible. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- block/io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan pgpHaZFSXh31k.pgp Description: PGP signature
Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols
Am 08.05.2015 um 12:16 hat Paolo Bonzini geschrieben: On 08/05/2015 12:08, Kevin Wolf wrote: If so, the commands seem to be hopelessly underspecified, especially with respect to error conditions. And where it says something about errors, it doesn't make sense: The server is forbidden to reply to a NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously) So does nbd-server. O:-) Looks like you're reading the spec too literally (which is never a bad thing). I don't think there is something like reading a spec too literally. Specs are meant to be read literally. If a specification is open to interpretation, you don't need it. So I'd rather say I've found a bug in the spec. ;-) As you already seem to be working on the NBD mailing list, do you want to fix this, or should I subscribe and send a patch myself? Kevin
[Qemu-devel] eventfd_signal()
Is eventfd blocking? By that I mean: If I signal the guest for something, will that thread remain blocked till the guest clears the interrupt?
[Qemu-devel] [Bug 1349972] Re: qcow2-refcount: qemu-io crashes on 'discard' command
I sent a patch that fixes the original problem that Maria reported. It's hard to say whether this is the same problem as you saw, Sam, but it's quite possible. ** Changed in: qemu Status: New = In Progress -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1349972 Title: qcow2-refcount: qemu-io crashes on 'discard' command Status in QEMU: In Progress Bug description: qemu-io is killed by SIGIOT at the 'discard' command on the image having no refcount information. Sequence: 1. Unpack test.img and backing_img.qed in the same directory (see the attached archives for images) 2. Make a copy of test.img to copy.img (qemu-io modifies the image before being kill, therefore the image backup is necessary) 3. Run the command qemu-io copy.img -c 'discard 2210816 2856448' Result: qemu-io is killed by SIGIOT with the reason: qemu-io: block/qcow2-refcount.c:468: update_refcount_discard: Assertion `d-bytes + length == new_end - new_start' failed. The image was generated by the image fuzzer. qemu.git HEAD: 1d80eb7a680d To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1349972/+subscriptions
[Qemu-devel] [PATCH] net: Change help text to list -netdev instead of -net by default
Looking at the output of qemu-system-xxx -help, you easily get the impression that -net is the preferred way instead of -netdev to specify host network interface, since the -net option is omnipresent but the -netdev option is only listed as a one-liner at the end. This is ugly since -net is considered as legacy and even might be removed one day. Thus, this patch switches the output to explain the host network interfaces with the -netdev option instead, moving the legacy -net option into some few lines at the end. Signed-off-by: Thomas Huth th...@redhat.com --- qemu-options.hx | 66 ++--- 1 file changed, 39 insertions(+), 27 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index ec356f6..2cab5f3 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -1449,25 +1449,25 @@ DEF(smb, HAS_ARG, QEMU_OPTION_smb, , QEMU_ARCH_ALL) #endif #endif -DEF(net, HAS_ARG, QEMU_OPTION_net, --net nic[,vlan=n][,macaddr=mac][,model=type][,name=str][,addr=str][,vectors=v]\n -create a new Network Interface Card and connect it to VLAN 'n'\n +DEF(netdev, HAS_ARG, QEMU_OPTION_netdev, #ifdef CONFIG_SLIRP --net user[,vlan=n][,name=str][,net=addr[/mask]][,host=addr][,restrict=on|off]\n +-netdev user,id=str[,net=addr[/mask]][,host=addr][,restrict=on|off]\n [,hostname=host][,dhcpstart=addr][,dns=addr][,dnssearch=domain][,tftp=dir]\n [,bootfile=f][,hostfwd=rule][,guestfwd=rule] #ifndef _WIN32 [,smb=dir[,smbserver=addr]]\n #endif -connect the user mode network stack to VLAN 'n', configure its\n -DHCP server and enabled optional services\n +enable a user mode network interface, identified by 'str',\n +configure its DHCP server and enabled optional services\n #endif #ifdef _WIN32 --net tap[,vlan=n][,name=str],ifname=name\n -connect the host TAP network interface to VLAN 'n'\n +-netdev tap,id=str,ifname=name\n +enable a host TAP network interface with ID 'str'\n #else --net tap[,vlan=n][,name=str][,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]\n -connect the host TAP network interface to VLAN 'n'\n +-netdev tap,id=str[,fd=h][,fds=x:y:...:z][,ifname=name][,script=file][,downscript=dfile]\n + [,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off]\n + [,vhostfd=h][,vhostfds=x:y:...:z][,vhostforce=on|off][,queues=n]\n +enable a host TAP network interface, identified by 'str'\n use network scripts 'file' (default= DEFAULT_NETWORK_SCRIPT )\n to configure it and 'dfile' (default= DEFAULT_NETWORK_DOWN_SCRIPT )\n to deconfigure it\n @@ -1486,13 +1486,16 @@ DEF(net, HAS_ARG, QEMU_OPTION_net, use 'vhostfd=h' to connect to an already opened vhost net device\n use 'vhostfds=x:y:...:z to connect to multiple already opened vhost net devices\n use 'queues=n' to specify the number of queues to be created for multiqueue TAP\n --net bridge[,vlan=n][,name=str][,br=bridge][,helper=helper]\n +-netdev bridge,id=str[,br=bridge][,helper=helper]\n connects a host TAP network interface to a host bridge device 'br'\n (default= DEFAULT_BRIDGE_INTERFACE ) using the program 'helper'\n (default= DEFAULT_BRIDGE_HELPER )\n #endif #ifdef __linux__ --net l2tpv3[,vlan=n][,name=str],src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport],txsession=txsession[,rxsession=rxsession][,ipv6=on/off][,udp=on/off][,cookie64=on/off][,counter][,pincounter][,txcookie=txcookie][,rxcookie=rxcookie][,offset=offset]\n +-netdev l2tpv3,id=str,src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport]\n + [,rxsession=rxsession],txsession=txsession[,ipv6=on/off][,udp=on/off]\n + [,cookie64=on/off][,counter][,pincounter][,txcookie=txcookie]\n + [,rxcookie=rxcookie][,offset=offset]\n connect the VLAN to an Ethernet over L2TPv3 pseudowire\n Linux kernel 3.3+ as well as most routers can talk\n L2TPv3. This transport allows connecting a VM to a VM,\n @@ -1514,32 +1517,39 @@ DEF(net, HAS_ARG, QEMU_OPTION_net, use 'pincounter=on' to work around broken counter handling in peer\n use 'offset=X' to add an extra offset between header and data\n #endif --net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n -connect the vlan 'n' to another VLAN using a
[Qemu-devel] [PATCH v2] qemu-nbd: only send a limited number of errno codes on the wire
Right now, NBD includes potentially platform-specific error values in the wire protocol. Luckily, most common error values are more or less universal: in particular, of all errno values = 34 (up to ERANGE), they are all the same on supported platforms except for 11 (which is EAGAIN on Windows and Linux, but EDEADLK on Darwin and the *BSDs). So, in order to guarantee some portability, only keep a handful of possible error codes and squash everything else to EINVAL. This patch defines a limited set of errno values that are valid for the NBD protocol, and specifies recommendations for what error to return in specific corner cases. The set of errno values is roughly based on the errors listed in the read(2) and write(2) man pages, with some exceptions: - ENOMEM is added for servers that implement copy-on-write or other formats that require dynamic allocation. - EDQUOT is not part of the universal set of errors; it can be changed to ENOSPC on the wire format. - EFBIG is part of the universal set of errors, but it is also changed to ENOSPC because it is pretty similar to ENOSPC or EDQUOT. Incoming values will in general match system errno values, but not on the Hurd which has different errno values (they have a subsystem code equal to 0x10 in bits 24-31). The Hurd is probably not something to which QEMU has been ported, but still do the right thing and reverse-map the NBD errno values to the system errno values. The corresponding patch to the NBD protocol description can be found at http://article.gmane.org/gmane.linux.drivers.nbd.general/3154. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- v1-v2: use #defines and do not expect errnos to match NBD errnos for the sake of Hurd. Reduced the list of errno values even more. Better commit message explaining how the list was obtained. nbd.c | 52 1 file changed, 52 insertions(+) diff --git a/nbd.c b/nbd.c index cb1b9bb..57d71b2 100644 --- a/nbd.c +++ b/nbd.c @@ -86,6 +86,54 @@ #define NBD_OPT_ABORT (2) #define NBD_OPT_LIST(3) +/* NBD errors are based on errno numbers, so there is a 1:1 mapping, + * but only a limited set of errno values is specified in the protocol. + * Everything else is squashed to EINVAL. + */ +#define NBD_EPERM 1 +#define NBD_EIO5 +#define NBD_ENOMEM 12 +#define NBD_EINVAL 22 +#define NBD_ENOSPC 28 + +static int system_errno_to_nbd_errno(int err) +{ +switch (err) { +case EPERM: +return NBD_EPERM; +case EIO: +return NBD_EIO; +case ENOMEM: +return NBD_ENOMEM; +#ifdef EDQUOT +case EDQUOT: +#endif +case EFBIG: +case ENOSPC: +return NBD_ENOSPC; +case EINVAL: +default: +return NBD_EINVAL; +} +} + +static int nbd_errno_to_system_errno(int err) +{ +switch (err) { +case NBD_EPERM: +return EPERM; +case NBD_EIO: +return EIO; +case NBD_ENOMEM: +return ENOMEM; +case NBD_ENOSPC: +return ENOSPC; +case NBD_EINVAL: +default: +return EINVAL; +} +} + /* Definitions for opaque data types */ typedef struct NBDRequest NBDRequest; @@ -856,6 +904,8 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply *reply) reply-error = be32_to_cpup((uint32_t*)(buf + 4)); reply-handle = be64_to_cpup((uint64_t*)(buf + 8)); +reply-error = nbd_errno_to_system_errno(reply-error); + TRACE(Got reply: { magic = 0x%x, .error = %d, handle = % PRIu64 }, magic, reply-error, reply-handle); @@ -872,6 +922,8 @@ static ssize_t nbd_send_reply(int csock, struct nbd_reply *reply) uint8_t buf[NBD_REPLY_SIZE]; ssize_t ret; +reply-error = system_errno_to_nbd_errno(reply-error); + /* Reply [ 0 .. 3]magic (NBD_REPLY_MAGIC) [ 4 .. 7]error (0 == no error) -- 2.3.5
Re: [Qemu-devel] [PATCH v6 2/6] Qemu-Xen-vTPM: Xen frontend driver infrastructure
On Mon, 4 May 2015, Quan Xu wrote: @@ -97,6 +116,7 @@ extern struct XenDevOps xen_kbdmouse_ops; /* xen_framebuffer.c */ extern struct XenDevOps xen_framebuffer_ops; /* xen_framebuffer.c */ extern struct XenDevOps xen_blkdev_ops; /* xen_disk.c*/ extern struct XenDevOps xen_netdev_ops; /* xen_nic.c */ +extern struct XenDevOps xen_vtpmdev_ops; /* xen_vtpm_frontend.c*/ This should be in the next patch.
Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description
* Kevin Wolf (kw...@redhat.com) wrote: Am 08.05.2015 um 10:42 hat Stefan Hajnoczi geschrieben: On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote: * Stefan Hajnoczi (stefa...@redhat.com) wrote: On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote: On 24/04/2015 11:38, Wen Congyang wrote: That can be done with drive-mirror. But I think it's too early for that. Do you mean use drive-mirror instead of quorum? Only before starting up a new secondary. Basically you do a migration with non-shared storage, and then start the secondary in colo mode. But it's only for the failover case. Quorum (or a new block/colo.c driver or filter) is fine for normal colo operation. Perhaps this patch series should mirror the Secondary's disk to a Backup Secondary so that the system can be protected very quickly after failover. I think anyone serious about fault tolerance would deploy a Backup Secondary, otherwise the system cannot survive two failures unless a human administrator is lucky/fast enough to set up a new Secondary. I'd assumed that a higher level management layer would do the allocation of a new secondary after the first failover, so no human need be involved. That doesn't help, after the first failover is too late even if it's done by a program. There should be no window during which the VM is unprotected. People who want fault tolerance care about 9s of availability. The VM must be protected on the new Primary as soon as the failover occurs, otherwise this isn't a serious fault tolerance solution. If you're worried about two failures in a row, why wouldn't you be worried about three in a row? I think if you really want more than one backup to be ready, you shouldn't go to two, but to n. Agreed, if you did multiple secondaries you'd do 'n'. But 1+2 does satisfy all but the most paranoid; and in particular it does mean that if you want to take a host down for some maintenance you can do it without worrying. But, as I said in my reply to Stefan, doing more than 1+1 gets really hairy; the combinations of failovers are much more complicated. Dave 1) It means that 1) As Stefan mentions you get worried about the lack of protection after the first failover; Kevin -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] block: return EPERM on writes or discards to read-only devices
On Thu, May 07, 2015 at 05:45:48PM +0200, Paolo Bonzini wrote: This is the behavior in the operating system, for example Linux's blkdev_write_iter has the following: if (bdev_read_only(I_BDEV(bd_inode))) return -EPERM; This does not apply to opening a device for read/write, when the device only supports read-only operation. In this case any of EACCES, EPERM or EROFS is acceptable depending on why writing is not possible. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- block/io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan pgpYpxPw21sfz.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH v2] qemu-nbd: only send a limited number of errno codes on the wire
On 08/05/2015 12:12, Markus Armbruster wrote: The corresponding patch to the NBD protocol description can be found at http://article.gmane.org/gmane.linux.drivers.nbd.general/3154. [...] - EFBIG is part of the universal set of errors, but it is also changed to ENOSPC because it is pretty similar to ENOSPC or EDQUOT. Perhaps debatable, but I defer to your judgement. EFBIG is weird anyway, and requires you (or your parents) to ignore SIGXFSZ. A simpler protocol puts fewer requirements on the client. (In fact, we probably should ignore SIGXFSZ in QEMU and treat EFBIG like ENOSPC everywhere. Should doesn't mean that it will get on anyone's todo list or priority list...). +static int nbd_errno_to_system_errno(int err) +{ +switch (err) { +case NBD_EPERM: +return EPERM; +case NBD_EIO: +return EIO; +case NBD_ENOMEM: +return ENOMEM; +case NBD_ENOSPC: +return ENOSPC; +case NBD_EINVAL: +default: +return EINVAL; +} +} + If we reach default, something's amiss, isn't it? Worth logging something then? Not necessarily. You could have an older NBD server on the other side, and then the errno code might only be valid on another platform! In fact, if the NBD server is qemu-nbd, the producer of the error could be any driver in block/ for any old version of QEMU. I would not be surprised if an EPIPE or ENOTCONN sneaked in somehow. And there are probably other NBD servers around. CCing Rich Jones so that he can fix nbdkit. Paolo
Re: [Qemu-devel] [PATCH v2 2/3] virtio-mmio: introduce set_guest_notifiers
On Fri, 08 May 2015 13:30:18 +0300 Pavel Fedin p.fe...@samsung.com wrote: Hello! Yes, I think it makes sense to just pick the low-hanging fruit for virtio-mmio and wait for pci. Does this mean that my series can be accepted as it is? Since PCI is potentially better solution, MMIO is a low priority in my project, and i have lots of other tasks. This means i unfortunately don't have time for further refactor. If you ACK, i will resend the series once again as v3, i set up git send-email and it should be working now. I just wanted to share this piece because it's already done, and i would not like it to go to oblivion again. I think (ommitting the refactoring) you also need to care about reset. (And maybe fold patches 1 and 3.) Should not be too much effort, I hope. And I'd recommend to cc: MST, so that there is a chance of it not getting lost in his after-vacation mail avalanche :)
Re: [Qemu-devel] [PATCH] qcow2: Flush pending discards before allocating cluster
Am 06.05.2015 um 13:29 hat Kevin Wolf geschrieben: Before a freed cluster can be reused, pending discards for this cluster must be processed. The original assumption was that this was not a problem because discards are only cached during discard/write zeroes operations, which are synchronous so that no concurrent write requests can cause cluster allocations. However, the discard/write zeroes operation itself can allocate a new L2 table (and it has to in order to put zero flags there), so make sure we can cope with the situation. This fixes https://bugs.launchpad.net/bugs/1349972. Signed-off-by: Kevin Wolf kw...@redhat.com Cc: qemu-sta...@nongnu.org Applied to my block branch. Kevin
Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/3] block: Warn about usage of growing formats over non-growable protocols
On 08/05/2015 12:34, Kevin Wolf wrote: Am 08.05.2015 um 12:16 hat Paolo Bonzini geschrieben: On 08/05/2015 12:08, Kevin Wolf wrote: If so, the commands seem to be hopelessly underspecified, especially with respect to error conditions. And where it says something about errors, it doesn't make sense: The server is forbidden to reply to a NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously) So does nbd-server. O:-) Looks like you're reading the spec too literally (which is never a bad thing). I don't think there is something like reading a spec too literally. Specs are meant to be read literally. If a specification is open to interpretation, you don't need it. So I'd rather say I've found a bug in the spec. ;-) You have. The bug is a single missing word (successful) reply, but it is still a bug. There is another bug, in that it talks about outstanding writes rather than completed writes. As you already seem to be working on the NBD mailing list, do you want to fix this, or should I subscribe and send a patch myself? You've been CCed on the fix. Paolo
Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description
* Stefan Hajnoczi (stefa...@redhat.com) wrote: On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote: * Stefan Hajnoczi (stefa...@redhat.com) wrote: On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote: On 24/04/2015 11:38, Wen Congyang wrote: That can be done with drive-mirror. But I think it's too early for that. Do you mean use drive-mirror instead of quorum? Only before starting up a new secondary. Basically you do a migration with non-shared storage, and then start the secondary in colo mode. But it's only for the failover case. Quorum (or a new block/colo.c driver or filter) is fine for normal colo operation. Perhaps this patch series should mirror the Secondary's disk to a Backup Secondary so that the system can be protected very quickly after failover. I think anyone serious about fault tolerance would deploy a Backup Secondary, otherwise the system cannot survive two failures unless a human administrator is lucky/fast enough to set up a new Secondary. I'd assumed that a higher level management layer would do the allocation of a new secondary after the first failover, so no human need be involved. That doesn't help, after the first failover is too late even if it's done by a program. There should be no window during which the VM is unprotected. People who want fault tolerance care about 9s of availability. The VM must be protected on the new Primary as soon as the failover occurs, otherwise this isn't a serious fault tolerance solution. I'm not aware of any other system that manages that, so I don't think that's fair. You gain a lot more availability going from a single system to the 1+1 system that COLO (or any of the checkpointing systems) propose, I can't say how many 9s it gets you. It's true having multiple secondaries would get you a bit more on top of that, but you're still a lot better off just having the one secondary. I had thought that having 1 secondary would be a nice addition, but it's a big change everywhere else (e.g. having to maintain multiple migration streams, dealing with miscompares from multiple hosts). Dave Stefan -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH COLO v3 01/14] docs: block replication's description
Am 08.05.2015 um 10:42 hat Stefan Hajnoczi geschrieben: On Tue, May 05, 2015 at 04:23:56PM +0100, Dr. David Alan Gilbert wrote: * Stefan Hajnoczi (stefa...@redhat.com) wrote: On Fri, Apr 24, 2015 at 11:36:35AM +0200, Paolo Bonzini wrote: On 24/04/2015 11:38, Wen Congyang wrote: That can be done with drive-mirror. But I think it's too early for that. Do you mean use drive-mirror instead of quorum? Only before starting up a new secondary. Basically you do a migration with non-shared storage, and then start the secondary in colo mode. But it's only for the failover case. Quorum (or a new block/colo.c driver or filter) is fine for normal colo operation. Perhaps this patch series should mirror the Secondary's disk to a Backup Secondary so that the system can be protected very quickly after failover. I think anyone serious about fault tolerance would deploy a Backup Secondary, otherwise the system cannot survive two failures unless a human administrator is lucky/fast enough to set up a new Secondary. I'd assumed that a higher level management layer would do the allocation of a new secondary after the first failover, so no human need be involved. That doesn't help, after the first failover is too late even if it's done by a program. There should be no window during which the VM is unprotected. People who want fault tolerance care about 9s of availability. The VM must be protected on the new Primary as soon as the failover occurs, otherwise this isn't a serious fault tolerance solution. If you're worried about two failures in a row, why wouldn't you be worried about three in a row? I think if you really want more than one backup to be ready, you shouldn't go to two, but to n. Kevin pgp6wi8xJKZQC.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH] qemu-nbd: only send a limited number of errno codes on the wire
On 08/05/2015 11:32, Markus Armbruster wrote: +/* NBD errors are based on errno numbers, so there is a 1:1 mapping, + * but only a limited set of errno values is specified in the protocol. + * Everything else is squashed to EINVAL. + */ Is the protocol defined anywhere? https://github.com/yoe/nbd/blob/master/doc/proto.txt All it has on the error code is this paragraph: The reply contains three fields: a 32 bit magic number ('magic'), a 32 bit error code ('error'; 0, unless an error occurred in which case it is the errno of the error on the server side), and the same 64 bit handle that the corresponding request had in its 'handle' field. In case the reply is sent in response to a read request and the error field is 0 (zero), the reply header is immediately followed by request.len bytes of data. Could you update it to document the errno compatibility issues, and recommended practice (i.e. this patch's)? Yes, I've sent a patch yesterday. +static int system_errno_to_nbd_errno(int err) +{ +switch (err) { +case EPERM: +return 1; +case EIO: +return 5; +case ENXIO: +return 6; +case E2BIG: +return 7; +case ENOMEM: +return 12; +case EACCES: +return 13; +case EFBIG: +return 27; +case ENOSPC: +return 28; +case EROFS: +return 30; +case EINVAL: +default: +return 22; +} +} + This maps recognized OS errnos to NBD errnos. The latter are literals. /* Definitions for opaque data types */ typedef struct NBDRequest NBDRequest; @@ -856,6 +887,20 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply *reply) reply-error = be32_to_cpup((uint32_t*)(buf + 4)); reply-handle = be64_to_cpup((uint64_t*)(buf + 8)); +/* NBD errors should be universally equal to the corresponding + * errno values, check it here. + */ +QEMU_BUILD_BUG_ON(EPERM != 1); +QEMU_BUILD_BUG_ON(EIO != 5); +QEMU_BUILD_BUG_ON(ENXIO != 6); +QEMU_BUILD_BUG_ON(E2BIG != 7); +QEMU_BUILD_BUG_ON(ENOMEM != 12); +QEMU_BUILD_BUG_ON(EACCES != 13); +QEMU_BUILD_BUG_ON(EINVAL != 22); +QEMU_BUILD_BUG_ON(EFBIG != 27); +QEMU_BUILD_BUG_ON(ENOSPC != 28); +QEMU_BUILD_BUG_ON(EROFS != 30); + This checks that the mapping above is the identify function for all the recognized NBD errnos. Why is that necessary? Still curious. Explain, and earn my R-by :) Because block/nbd.c expects host errnos, and I was too lazy to add a switch and a nbd_errno_to_system_errno function. But Eric pointed out that Hurd has weird errnos (the low bits match, but there's a 0x10 subsystem code in bits 24-31) so I'll add it. Same literals as above. Violates DRY. I don't mind all that much, but wonder whether we could at least do the checking next to system_errno_to_nbd_errno(). Could we? Yes, s/could/should/. Also, should have added RFC to the patch. Paolo TRACE(Got reply: { magic = 0x%x, .error = %d, handle = % PRIu64 }, magic, reply-error, reply-handle); @@ -872,6 +917,8 @@ static ssize_t nbd_send_reply(int csock, struct nbd_reply *reply) uint8_t buf[NBD_REPLY_SIZE]; ssize_t ret; +reply-error = system_errno_to_nbd_errno(reply-error); + /* Reply [ 0 .. 3]magic (NBD_REPLY_MAGIC) [ 4 .. 7]error (0 == no error)
[Qemu-devel] [PATCH] qemu-nbd: return ENOSPC for out-of-range writes
This ensures that werror=enospc works fine for NBD-backed devices. Recovery can be done through live snapshots even if the NBD server does not support online resizing. Suggested-by: Kevin Wolf kw...@redhat.com Cc: qemu-sta...@nongnu.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- nbd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/nbd.c b/nbd.c index 57d71b2..a04ba80 100644 --- a/nbd.c +++ b/nbd.c @@ -1297,7 +1297,8 @@ static void nbd_trip(void *opaque) request.from, request.len, (uint64_t)exp-size, (uint64_t)exp-dev_offset); LOG(requested operation past EOF--bad client?); -goto invalid_request; +reply.error = (command == NBD_CMD_WRITE) ? ENOSPC : EINVAL; +goto error_reply; } switch (command) { @@ -1390,7 +1391,6 @@ static void nbd_trip(void *opaque) break; default: LOG(invalid request type (%u) received, request.type); -invalid_request: reply.error = EINVAL; error_reply: if (nbd_co_send_reply(req, reply, 0) 0) { -- 2.3.5
Re: [Qemu-devel] [PATCH v2 2/3] virtio-mmio: introduce set_guest_notifiers
On Fri, 08 May 2015 09:45:00 +0300 Pavel Fedin p.fe...@samsung.com wrote: Hello! Hm, weren't there some patches for irqfd on arm? Yes, there were. However, they had a design problem by breaking backwards compatibility with unmodified virtio. Their idea was to set up one more shared memory area between virtio and vhost-net and use it to pass ISR value, which helps to distinguish, which event took place (queue update or config change). So, this idea was rejected. http://lists.gnu.org/archive/html/qemu-devel/2014-10/msg03056.html I thought about it, and technically, i think, this can be done in better way. Actually, as far as i understood, all we need is mechanism for distinguishing between these two events. On PCI we do this by using multiple IRQs via MSI-X, and one IRQ signals exactly one type of event. MSI-X code also has two IRQs mode as a failsafe, where one IRQ signals config change and another IRQ signals queues update (and all queues are polled in turn). I think a similar thing could be done for virtio-mmio. It could allocate two IRQs instead of one and describe both of them in the device tree. Guest side, upon seeing that, could make use of those two IRQs and acknowledge to the host side that yes, i am new version and use new mode. But, sorry, i will unlikely implement this, because we already have PCI with MSI-X (i hope this is going to be published soon), so my project can use PCI emulation. So implementing irqfds for virtio-mmio is a bit out of my scope. Thanks for the explanation. Yes, I think it makes sense to just pick the low-hanging fruit for virtio-mmio and wait for pci.
[Qemu-devel] [virtio] use virtqueue async
Can I use a virtqueue asynchronously between guest and backend? By that I mean that any end of the communication channel (virtqueue) can push/pop how much data he wants anytime, without worrying that it will pop a thing that it pushed on the queue, and in fact the other end should have popped it? Or do I have to use 2 virtqueues for this to happen? What I'm trying to do is to process jobs in the backend continuously and then send a little more detailed result of the job. The job itself will run async on some piece of hardware while I continue popping jobs from the queue (the backend itself will just schedule them, will not busy wait till they get done). When a job is done, a callback will be called which will push the result one the virtqueue. What I am worrying about is what if I use the same queue for job request and job result? Is there a possibility for a job result to be processed by the same entity that put it there, instead of the other end?
Re: [Qemu-devel] [Qemu-block] [PATCH v3] block/vdi: Use bdrv_flush after metadata updates
Am 07.05.2015 um 06:04 hat Zhe Qiu geschrieben: From: phoeagon phoea...@gmail.com In reference to b0ad5a455d7e5352d4c86ba945112011dbeadfb8~078a458e077d6b0db262c4b05fee51d01de2d1d2, metadata writes to qcow2/cow/qcow/vpc/vmdk are all synced prior to succeeding writes. The justification for these patches (in 2010!) was more or less that we didn't know whether the implementations were safe without the flushes. Many of the flushes added back then have been removed again until today because they have turned out to be unnecessary. Only when write is successful that bdrv_flush is called. Signed-off-by: Zhe Qiu phoea...@gmail.com Please describe why VDI needs this flush to be safe. This is best done by describing a case where not doing the flush would lead to image corruption in case of a crash. To my knowledge, the VDI driver is completely safe without it. Kevin
Re: [Qemu-devel] [Qemu-block] [PATCH] vmdk: Fix overflow if l1_size is 0x20000000
Am 06.05.2015 um 19:50 hat Max Reitz geschrieben: On 05.05.2015 11:28, Fam Zheng wrote: Richard Jones caught this bug with afl fuzzer. In fact, that's the only possible value to overflow (extent-l1_size = 0x2000) l1_size: l1_size = extent-l1_size * sizeof(long) = 0x8000; g_try_malloc returns NULL because l1_size is interpreted as negative during type casting from 'int' to 'gsize', which yields a enormous value. Hence, by coincidence, we get a not too bad behavior: qemu-img: Could not open '/tmp/afl6.img': Could not open '/tmp/afl6.img': Cannot allocate memory Values larger than 0x2000 will be refused by the validation in vmdk_add_extent. Values smaller than 0x2000 will not overflow l1_size. Reported-by: Richard W.M. Jones rjo...@redhat.com Signed-off-by: Fam Zheng f...@redhat.com --- block/vmdk.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Okay, so because size_t is unsigned, this will not overflow on systems with sizeof(size_t) == 4 either. Reviewed-by: Max Reitz mre...@redhat.com Thanks, applied to the block branch. I think this one is for stable, too. Kevin
Re: [Qemu-devel] [Qemu-block] [PATCH] qcow2: do lazy allocation of the L2 cache
Am 08.05.2015 um 11:00 hat Alberto Garcia geschrieben: On Fri 24 Apr 2015 03:04:06 PM CEST, Kevin Wolf kw...@redhat.com wrote: I think it would be nice to have a way to free unused cache entries after a while. Do you think mmap plus a periodic timer would work? I'm hesitant about changes like this because they make QEMU more complex, slow down the guest, and make the memory footprint volatile. But if a simple solution addresses the problem, then I'd be happy. I understand. One possible approach would be to use the timer only if the cache size requested by the user is large, so 1) we only try to save memory when the amount of memory to save is significant. 2) we don't make the default scenario slower. I'll try to see if I can come up with a good solution (and with more numbers). I suspect that at this point, Anthony would have reminded us about separation of policy and mechanism. The very least we need is some option to control the policy (probably defaulting to no automatic cache size changes). There's also the problem that with a single chunk of memory for all cache tables it's not so easy to free individual entries. That's one of the reasons why I suggested to fix the problem by using mmap() for the individual entries rather than allocating one big chunk. One possible solution would be madvise() with MADV_DONTNEED, and that's rather simple to use, but I'm unsure about its portability. Probably not portable at all, but the worst that can happen is that a cache doesn't shrink, i.e. behaves like today. Seems good enough. The other option would be to let the management tool change the cache size options at runtime (and as it happens, my current bdrv_reopen() overhaul allows just that - except that a QMP interface isn't planned yet), but I'm not sure if it has enough information to make good decisions. If we know what the criteria are, they could possibly be exposed, though. But what would you do there? Recreate the whole cache? Recreating the cache is the easiest solution (and it is what my reopen patches do) and the overhead is probably neglegible as long as you don't resize the cache all the time. If we really need it, resizing the cache while maintaining its content would be possible, too. Extending shouldn't be a problem at all (with your patches you'd have to rehash the entries, but that's it), and for shrinking you would have to evict some entries (based on the algorithm that is used in other cases as well). Neither sounds particularly complicated, it's just some work that needs to be done if we ever get a requirement like this. Kevin
Re: [Qemu-devel] [PATCH v6 00/22] Generate ACPI v5.1 tables and expose them to guest over fw_cfg on ARM
Hi Peter, On 2015/5/8 0:34, Peter Maydell wrote: Hi. I've had a look through these patches now, and sent some mails for a couple of specific issues. For the rest, this is what I'd like to see for me to pull this in to target-arm: Thanks for your review comments. (1) acks or reviewed-by from one of the QEMU ACPI folk for any patch that's touching generic ACPI code. Yeah, Igor said he will give reviewed-by for these patches. (2) series should pass checkpatch.pl (you have a lot of overlong lines warnings at the moment). Sorry, I will fix patch 07/22, but to patch 01/22, as I just move the file, maybe I don't need to deal the formats error of acpi-defs.h. Right? (3) we shouldn't generate any ACPI tables or information where the corresponding kernel code hasn't been accepted into mainline or the arm64 tree; if there's anything here which isn't covered by the accepted ACPI patches we should postpone that til the kernel parts are in. There are uart, virtio-mmio, PCIe whose kernel patches are not accepted into kernel mainline. For PCIe I think this is harmless. The PCIe ACPI table's format don't need to change even though the kernel code changes because it's defined by PCI firmware SPEC. For virtio-mmio the kernel code just add acpi_match_table for kernel driver to probe the device by ACPI. To be honestly the only thing will be changed is the _HID and this change is very small. It's same with uart. In a word, the only thing will be changed is the _HID of virtio-mmio and uart. But if we get these tables in, it's useful for people to test the corresponding kernel code. If item (3) seems too awkward or harsh as a criterion I'm open to discussion about it, but my feeling is that I'd rather not start emitting tables until they're definitely nailed down in format by being in the kernel. Finally, if you have some prebuilt images and command lines that would be helpful, since I can put them into my local set of things I run to check for breakage before making ARM pull requests. Ok. -- Shannon
[Qemu-devel] [RFC v1 PATCH 0/3] cpus: Convert cpu_index into a bitmap
This patch changes the way cpu_index is handed out to newly created CPUs by tracking the allocted CPUs in a bitmap. More information and the need for this patch is described in patch 2/3 of this series. These generic changes are needed to support CPU hot plug/unplug on PowerPC. cpu_index is allocated in cpu_exec_init() and freed (during CPU unplug) in the newly added API cpu_exec_exit(). Currently all architectures call cpu_exec_init() from instance_init and hence cpu_exec_exit() is called from instance_finalize for all archs in this series. Instead of adding instance_finalize to all archs, we could call cpu_exec_exit() from the parent class (CPUClass.instance_finalize). However archs like x86 and PowerPC want cpu_exec_init() to be moved to realizefn from instance_init. Such movement will become easy with the current approach. Eudardo's suggested alternative of making cpu_exec_exit() idempotent (http://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg01241.html) will also simplify this movement, but I think that would compilicate the deallocation logic. Another open question is about handling of holes correctly in the allocated bitmap to support VM migration after CPU unplug. This was briefly discussed here: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg00560.html Should cpu_exec_init() API support specifying of a particular cpu_index in addition to returning the next available cpu_index by default ? I know that QEMU cmdline semantics for CPU device add/delele haven't been defined yet, but should we now make provision in cpu_exec_init() to allocate the required cpu_index ? Changes in v1 - - Added Error argument to cpu_exec_init() so that it callers calling it from realizefn can collect errors. v0: https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg02950.html Bharata B Rao (3): cpus: Add Error argument to cpu_exec_init() cpus: Convert cpu_index into a bitmap ppc: Move cpu_exec_init() call to realize function exec.c | 39 +++ include/exec/exec-all.h | 2 +- include/qom/cpu.h | 8 target-alpha/cpu.c | 8 +++- target-arm/cpu.c| 3 ++- target-cris/cpu.c | 8 +++- target-i386/cpu.c | 8 +++- target-lm32/cpu.c | 8 +++- target-m68k/cpu.c | 8 +++- target-microblaze/cpu.c | 8 +++- target-mips/cpu.c | 8 +++- target-moxie/cpu.c | 8 +++- target-openrisc/cpu.c | 8 +++- target-ppc/translate_init.c | 15 +-- target-s390x/cpu.c | 3 ++- target-sh4/cpu.c| 8 +++- target-sparc/cpu.c | 3 ++- target-tricore/cpu.c| 7 ++- target-unicore32/cpu.c | 8 +++- target-xtensa/cpu.c | 8 +++- 20 files changed, 153 insertions(+), 23 deletions(-) -- 2.1.0
[Qemu-devel] [RFC v1 PATCH 1/3] cpus: Add Error argument to cpu_exec_init()
Add an Error argument to cpu_exec_init() to let users collect the error. Change all callers to currently pass NULL error argument. This change is needed for the following reasons: - A subsequent commit changes the CPU enumeration logic in cpu_exec_init() resulting in cpu_exec_init() to fail if cpu_index values corresponding to max_cpus have already been handed out. - Archs like PowerPC and x86 want cpu_exec_init() to be called from realize rather than instance_init. With this change, those architectures that can move this call into realize function can do so in a phased manner. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com Reviewed-by: David Gibson da...@gibson.dropbear.id.au --- exec.c | 2 +- include/exec/exec-all.h | 2 +- target-alpha/cpu.c | 2 +- target-arm/cpu.c| 2 +- target-cris/cpu.c | 2 +- target-i386/cpu.c | 2 +- target-lm32/cpu.c | 2 +- target-m68k/cpu.c | 2 +- target-microblaze/cpu.c | 2 +- target-mips/cpu.c | 2 +- target-moxie/cpu.c | 2 +- target-openrisc/cpu.c | 2 +- target-ppc/translate_init.c | 2 +- target-s390x/cpu.c | 2 +- target-sh4/cpu.c| 2 +- target-sparc/cpu.c | 2 +- target-tricore/cpu.c| 2 +- target-unicore32/cpu.c | 2 +- target-xtensa/cpu.c | 2 +- 19 files changed, 19 insertions(+), 19 deletions(-) diff --git a/exec.c b/exec.c index ae37b98..68b632b 100644 --- a/exec.c +++ b/exec.c @@ -519,7 +519,7 @@ void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as) } #endif -void cpu_exec_init(CPUArchState *env) +void cpu_exec_init(CPUArchState *env, Error **errp) { CPUState *cpu = ENV_GET_CPU(env); CPUClass *cc = CPU_GET_CLASS(cpu); diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index b58cd47..0a3a504 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -88,7 +88,7 @@ void QEMU_NORETURN cpu_io_recompile(CPUState *cpu, uintptr_t retaddr); TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc, target_ulong cs_base, int flags, int cflags); -void cpu_exec_init(CPUArchState *env); +void cpu_exec_init(CPUArchState *env, Error **errp); void QEMU_NORETURN cpu_loop_exit(CPUState *cpu); int page_unprotect(target_ulong address, uintptr_t pc, void *puc); void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end, diff --git a/target-alpha/cpu.c b/target-alpha/cpu.c index a98b7d8..0a0c21e 100644 --- a/target-alpha/cpu.c +++ b/target-alpha/cpu.c @@ -257,7 +257,7 @@ static void alpha_cpu_initfn(Object *obj) CPUAlphaState *env = cpu-env; cs-env_ptr = env; -cpu_exec_init(env); +cpu_exec_init(env, NULL); tlb_flush(cs, 1); alpha_translate_init(); diff --git a/target-arm/cpu.c b/target-arm/cpu.c index 3ca3fa8..0b27c19 100644 --- a/target-arm/cpu.c +++ b/target-arm/cpu.c @@ -369,7 +369,7 @@ static void arm_cpu_initfn(Object *obj) static bool inited; cs-env_ptr = cpu-env; -cpu_exec_init(cpu-env); +cpu_exec_init(cpu-env, NULL); cpu-cp_regs = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, g_free); diff --git a/target-cris/cpu.c b/target-cris/cpu.c index 16cfba9..8b589ec 100644 --- a/target-cris/cpu.c +++ b/target-cris/cpu.c @@ -170,7 +170,7 @@ static void cris_cpu_initfn(Object *obj) static bool tcg_initialized; cs-env_ptr = env; -cpu_exec_init(env); +cpu_exec_init(env, NULL); env-pregs[PR_VR] = ccc-vr; diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 3305e09..72e8fa9 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2850,7 +2850,7 @@ static void x86_cpu_initfn(Object *obj) static int inited; cs-env_ptr = env; -cpu_exec_init(env); +cpu_exec_init(env, NULL); object_property_add(obj, family, int, x86_cpuid_version_get_family, diff --git a/target-lm32/cpu.c b/target-lm32/cpu.c index f8081f5..89b6631 100644 --- a/target-lm32/cpu.c +++ b/target-lm32/cpu.c @@ -151,7 +151,7 @@ static void lm32_cpu_initfn(Object *obj) static bool tcg_initialized; cs-env_ptr = env; -cpu_exec_init(env); +cpu_exec_init(env, NULL); env-flags = 0; diff --git a/target-m68k/cpu.c b/target-m68k/cpu.c index 4cfb725..6a41551 100644 --- a/target-m68k/cpu.c +++ b/target-m68k/cpu.c @@ -168,7 +168,7 @@ static void m68k_cpu_initfn(Object *obj) static bool inited; cs-env_ptr = env; -cpu_exec_init(env); +cpu_exec_init(env, NULL); if (tcg_enabled() !inited) { inited = true; diff --git a/target-microblaze/cpu.c b/target-microblaze/cpu.c index 67e3182..6b3732d 100644 --- a/target-microblaze/cpu.c +++ b/target-microblaze/cpu.c @@ -130,7 +130,7 @@ static void mb_cpu_initfn(Object *obj) static bool tcg_initialized; cs-env_ptr =
[Qemu-devel] [RFC v1 PATCH 2/3] cpus: Convert cpu_index into a bitmap
Currently CPUState.cpu_index is monotonically increasing and a newly created CPU always gets the next higher index. The next available index is calculated by counting the existing number of CPUs. This is fine as long as we only add CPUs, but there are architectures which are starting to support CPU removal too. For an architecture like PowerPC which derives its CPU identifier (device tree ID) from cpu_index, the existing logic of generating cpu_index values causes problems. With the currently proposed method of handling vCPU removal by parking the vCPU fd in QEMU (Ref: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg02604.html), generating cpu_index this way will not work for PowerPC. This patch changes the way cpu_index is handed out by maintaining a bit map of the CPUs that tracks both addition and removal of CPUs. The CPU bitmap allocation logic is part of cpu_exec_init() which is called by instance_init routines of various CPU targets. This patch also adds corresponding instance_finalize routine if needed for these CPU targets so that CPU can be marked free when it is removed. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- exec.c | 37 ++--- include/qom/cpu.h | 8 target-alpha/cpu.c | 6 ++ target-arm/cpu.c| 1 + target-cris/cpu.c | 6 ++ target-i386/cpu.c | 6 ++ target-lm32/cpu.c | 6 ++ target-m68k/cpu.c | 6 ++ target-microblaze/cpu.c | 6 ++ target-mips/cpu.c | 6 ++ target-moxie/cpu.c | 6 ++ target-openrisc/cpu.c | 6 ++ target-ppc/translate_init.c | 6 ++ target-s390x/cpu.c | 1 + target-sh4/cpu.c| 6 ++ target-sparc/cpu.c | 1 + target-tricore/cpu.c| 5 + target-unicore32/cpu.c | 6 ++ target-xtensa/cpu.c | 6 ++ 19 files changed, 128 insertions(+), 3 deletions(-) diff --git a/exec.c b/exec.c index 68b632b..2422948 100644 --- a/exec.c +++ b/exec.c @@ -519,21 +519,52 @@ void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as) } #endif +#ifndef CONFIG_USER_ONLY +static DECLARE_BITMAP(cpu_index_map, MAX_CPUMASK_BITS); + +static int cpu_get_free_index(Error **errp) +{ +int cpu = find_first_zero_bit(cpu_index_map, max_cpus); + +if (cpu == max_cpus) { +error_setg(errp, Trying to use more CPUs than allowed max of %d\n, +max_cpus); +return max_cpus; +} else { +bitmap_set(cpu_index_map, cpu, 1); +return cpu; +} +} + +void cpu_exec_exit(CPUState *cpu) +{ +bitmap_clear(cpu_index_map, cpu-cpu_index, 1); +} +#endif + void cpu_exec_init(CPUArchState *env, Error **errp) { CPUState *cpu = ENV_GET_CPU(env); CPUClass *cc = CPU_GET_CLASS(cpu); -CPUState *some_cpu; int cpu_index; - #if defined(CONFIG_USER_ONLY) +CPUState *some_cpu; + cpu_list_lock(); -#endif cpu_index = 0; CPU_FOREACH(some_cpu) { cpu_index++; } cpu-cpu_index = cpu_index; +#else +Error *local_err = NULL; + +cpu_index = cpu-cpu_index = cpu_get_free_index(local_err); +if (local_err) { +error_propagate(errp, local_err); +return; +} +#endif cpu-numa_node = 0; QTAILQ_INIT(cpu-breakpoints); QTAILQ_INIT(cpu-watchpoints); diff --git a/include/qom/cpu.h b/include/qom/cpu.h index 39f0f19..3611ce5 100644 --- a/include/qom/cpu.h +++ b/include/qom/cpu.h @@ -673,6 +673,14 @@ void cpu_watchpoint_remove_all(CPUState *cpu, int mask); void QEMU_NORETURN cpu_abort(CPUState *cpu, const char *fmt, ...) GCC_FMT_ATTR(2, 3); +#ifndef CONFIG_USER_ONLY +void cpu_exec_exit(CPUState *cpu); +#else +static inline void cpu_exec_exit(CPUState *cpu) +{ +} +#endif + #ifdef CONFIG_SOFTMMU extern const struct VMStateDescription vmstate_cpu_common; #else diff --git a/target-alpha/cpu.c b/target-alpha/cpu.c index 0a0c21e..259a04c 100644 --- a/target-alpha/cpu.c +++ b/target-alpha/cpu.c @@ -250,6 +250,11 @@ static const TypeInfo ev68_cpu_type_info = { .parent = TYPE(ev67), }; +static void alpha_cpu_finalize(Object *obj) +{ +cpu_exec_exit(CPU(obj)); +} + static void alpha_cpu_initfn(Object *obj) { CPUState *cs = CPU(obj); @@ -305,6 +310,7 @@ static const TypeInfo alpha_cpu_type_info = { .parent = TYPE_CPU, .instance_size = sizeof(AlphaCPU), .instance_init = alpha_cpu_initfn, +.instance_finalize = alpha_cpu_finalize, .abstract = true, .class_size = sizeof(AlphaCPUClass), .class_init = alpha_cpu_class_init, diff --git a/target-arm/cpu.c b/target-arm/cpu.c index 0b27c19..ef04ae1 100644 --- a/target-arm/cpu.c +++ b/target-arm/cpu.c @@ -454,6 +454,7 @@ static void arm_cpu_finalizefn(Object *obj) { ARMCPU *cpu = ARM_CPU(obj); g_hash_table_destroy(cpu-cp_regs); +cpu_exec_exit(CPU(obj)); } static void
[Qemu-devel] [RFC v1 PATCH 3/3] ppc: Move cpu_exec_init() call to realize function
Move cpu_exec_init() call from instance_init to realize. This allows any failures from cpu_exec_init() to be handled appropriately. Also add corresponding cpu_exec_exit() call from unrealize. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com Reviewed-by: David Gibson da...@gibson.dropbear.id.au --- target-ppc/translate_init.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c index b915697..47fe3d6 100644 --- a/target-ppc/translate_init.c +++ b/target-ppc/translate_init.c @@ -8928,6 +8928,11 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error **errp) return; } +cpu_exec_init(cpu-env, local_err); +if (local_err != NULL) { +error_propagate(errp, local_err); +return; +} cpu-cpu_dt_id = (cs-cpu_index / smp_threads) * max_smt + (cs-cpu_index % smp_threads); #endif @@ -9141,6 +9146,8 @@ static void ppc_cpu_unrealizefn(DeviceState *dev, Error **errp) opc_handler_t **table; int i, j; +cpu_exec_exit(CPU(dev)); + for (i = 0; i PPC_CPU_OPCODES_LEN; i++) { if (env-opcodes[i] == invalid_handler) { continue; @@ -9638,8 +9645,6 @@ static void ppc_cpu_initfn(Object *obj) CPUPPCState *env = cpu-env; cs-env_ptr = env; -cpu_exec_init(env, NULL); -cpu-cpu_dt_id = cs-cpu_index; env-msr_mask = pcc-msr_mask; env-mmu_model = pcc-mmu_model; -- 2.1.0
Re: [Qemu-devel] [PATCH] qemu-nbd: only send a limited number of errno codes on the wire
Paolo Bonzini pbonz...@redhat.com writes: On 08/05/2015 11:32, Markus Armbruster wrote: +/* NBD errors are based on errno numbers, so there is a 1:1 mapping, + * but only a limited set of errno values is specified in the protocol. + * Everything else is squashed to EINVAL. + */ Is the protocol defined anywhere? https://github.com/yoe/nbd/blob/master/doc/proto.txt All it has on the error code is this paragraph: The reply contains three fields: a 32 bit magic number ('magic'), a 32 bit error code ('error'; 0, unless an error occurred in which case it is the errno of the error on the server side), and the same 64 bit handle that the corresponding request had in its 'handle' field. In case the reply is sent in response to a read request and the error field is 0 (zero), the reply header is immediately followed by request.len bytes of data. Could you update it to document the errno compatibility issues, and recommended practice (i.e. this patch's)? Yes, I've sent a patch yesterday. Excellent! +static int system_errno_to_nbd_errno(int err) +{ +switch (err) { +case EPERM: +return 1; +case EIO: +return 5; +case ENXIO: +return 6; +case E2BIG: +return 7; +case ENOMEM: +return 12; +case EACCES: +return 13; +case EFBIG: +return 27; +case ENOSPC: +return 28; +case EROFS: +return 30; +case EINVAL: +default: +return 22; +} +} + This maps recognized OS errnos to NBD errnos. The latter are literals. /* Definitions for opaque data types */ typedef struct NBDRequest NBDRequest; @@ -856,6 +887,20 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply *reply) reply-error = be32_to_cpup((uint32_t*)(buf + 4)); reply-handle = be64_to_cpup((uint64_t*)(buf + 8)); +/* NBD errors should be universally equal to the corresponding + * errno values, check it here. + */ +QEMU_BUILD_BUG_ON(EPERM != 1); +QEMU_BUILD_BUG_ON(EIO != 5); +QEMU_BUILD_BUG_ON(ENXIO != 6); +QEMU_BUILD_BUG_ON(E2BIG != 7); +QEMU_BUILD_BUG_ON(ENOMEM != 12); +QEMU_BUILD_BUG_ON(EACCES != 13); +QEMU_BUILD_BUG_ON(EINVAL != 22); +QEMU_BUILD_BUG_ON(EFBIG != 27); +QEMU_BUILD_BUG_ON(ENOSPC != 28); +QEMU_BUILD_BUG_ON(EROFS != 30); + This checks that the mapping above is the identify function for all the recognized NBD errnos. Why is that necessary? Still curious. Explain, and earn my R-by :) Because block/nbd.c expects host errnos, and I was too lazy to add a switch and a nbd_errno_to_system_errno function. But Eric pointed out that Hurd has weird errnos (the low bits match, but there's a 0x10 subsystem code in bits 24-31) so I'll add it. Oww! That's going to hurd... Same literals as above. Violates DRY. I don't mind all that much, but wonder whether we could at least do the checking next to system_errno_to_nbd_errno(). Could we? Yes, s/could/should/. Also, should have added RFC to the patch. Thanks!
Re: [Qemu-devel] [PATCH] block: return EPERM on writes or discards to read-only devices
On Thu, May 07, 2015 at 05:45:48PM +0200, Paolo Bonzini wrote: This is the behavior in the operating system, for example Linux's blkdev_write_iter has the following: if (bdev_read_only(I_BDEV(bd_inode))) return -EPERM; This does not apply to opening a device for read/write, when the device only supports read-only operation. In this case any of EACCES, EPERM or EROFS is acceptable depending on why writing is not possible. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- block/io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@redhat.com pgpZi66XmRLg6.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH v2] qemu-nbd: only send a limited number of errno codes on the wire
Paolo Bonzini pbonz...@redhat.com writes: Right now, NBD includes potentially platform-specific error values in the wire protocol. Luckily, most common error values are more or less universal: in particular, of all errno values = 34 (up to ERANGE), they are all the same on supported platforms except for 11 (which is EAGAIN on Windows and Linux, but EDEADLK on Darwin and the *BSDs). So, in order to guarantee some portability, only keep a handful of possible error codes and squash everything else to EINVAL. This patch defines a limited set of errno values that are valid for the NBD protocol, and specifies recommendations for what error to return in specific corner cases. The set of errno values is roughly based on the errors listed in the read(2) and write(2) man pages, with some exceptions: - ENOMEM is added for servers that implement copy-on-write or other formats that require dynamic allocation. - EDQUOT is not part of the universal set of errors; it can be changed to ENOSPC on the wire format. Makes sense. - EFBIG is part of the universal set of errors, but it is also changed to ENOSPC because it is pretty similar to ENOSPC or EDQUOT. Perhaps debatable, but I defer to your judgement. Incoming values will in general match system errno values, but not on the Hurd which has different errno values (they have a subsystem code equal to 0x10 in bits 24-31). The Hurd is probably not something to which QEMU has been ported, but still do the right thing and reverse-map the NBD errno values to the system errno values. The corresponding patch to the NBD protocol description can be found at http://article.gmane.org/gmane.linux.drivers.nbd.general/3154. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- v1-v2: use #defines and do not expect errnos to match NBD errnos for the sake of Hurd. Reduced the list of errno values even more. Better commit message explaining how the list was obtained. nbd.c | 52 1 file changed, 52 insertions(+) diff --git a/nbd.c b/nbd.c index cb1b9bb..57d71b2 100644 --- a/nbd.c +++ b/nbd.c @@ -86,6 +86,54 @@ #define NBD_OPT_ABORT (2) #define NBD_OPT_LIST(3) +/* NBD errors are based on errno numbers, so there is a 1:1 mapping, + * but only a limited set of errno values is specified in the protocol. + * Everything else is squashed to EINVAL. + */ +#define NBD_EPERM 1 +#define NBD_EIO5 +#define NBD_ENOMEM 12 +#define NBD_EINVAL 22 +#define NBD_ENOSPC 28 + +static int system_errno_to_nbd_errno(int err) +{ +switch (err) { +case EPERM: +return NBD_EPERM; +case EIO: +return NBD_EIO; +case ENOMEM: +return NBD_ENOMEM; +#ifdef EDQUOT +case EDQUOT: +#endif +case EFBIG: +case ENOSPC: +return NBD_ENOSPC; +case EINVAL: +default: +return NBD_EINVAL; +} +} + +static int nbd_errno_to_system_errno(int err) +{ +switch (err) { +case NBD_EPERM: +return EPERM; +case NBD_EIO: +return EIO; +case NBD_ENOMEM: +return ENOMEM; +case NBD_ENOSPC: +return ENOSPC; +case NBD_EINVAL: +default: +return EINVAL; +} +} + If we reach default, something's amiss, isn't it? Worth logging something then? /* Definitions for opaque data types */ typedef struct NBDRequest NBDRequest; @@ -856,6 +904,8 @@ ssize_t nbd_receive_reply(int csock, struct nbd_reply *reply) reply-error = be32_to_cpup((uint32_t*)(buf + 4)); reply-handle = be64_to_cpup((uint64_t*)(buf + 8)); +reply-error = nbd_errno_to_system_errno(reply-error); + TRACE(Got reply: { magic = 0x%x, .error = %d, handle = % PRIu64 }, magic, reply-error, reply-handle); @@ -872,6 +922,8 @@ static ssize_t nbd_send_reply(int csock, struct nbd_reply *reply) uint8_t buf[NBD_REPLY_SIZE]; ssize_t ret; +reply-error = system_errno_to_nbd_errno(reply-error); + /* Reply [ 0 .. 3]magic (NBD_REPLY_MAGIC) [ 4 .. 7]error (0 == no error) Reviewed-by: Markus Armbruster arm...@redhat.com
Re: [Qemu-devel] [PATCH v2 2/3] virtio-mmio: introduce set_guest_notifiers
Hello! Yes, I think it makes sense to just pick the low-hanging fruit for virtio-mmio and wait for pci. Does this mean that my series can be accepted as it is? Since PCI is potentially better solution, MMIO is a low priority in my project, and i have lots of other tasks. This means i unfortunately don't have time for further refactor. If you ACK, i will resend the series once again as v3, i set up git send-email and it should be working now. I just wanted to share this piece because it's already done, and i would not like it to go to oblivion again. Kind regards, Pavel Fedin Expert Engineer Samsung Electronics Research center Russia -Original Message- From: qemu-devel-bounces+p.fedin=samsung@nongnu.org [mailto:qemu-devel- bounces+p.fedin=samsung@nongnu.org] On Behalf Of Cornelia Huck Sent: Friday, May 08, 2015 1:20 PM To: Pavel Fedin Cc: qemu-devel@nongnu.org Subject: Re: [Qemu-devel] [PATCH v2 2/3] virtio-mmio: introduce set_guest_notifiers On Fri, 08 May 2015 09:45:00 +0300 Pavel Fedin p.fe...@samsung.com wrote: Hello! Hm, weren't there some patches for irqfd on arm? Yes, there were. However, they had a design problem by breaking backwards compatibility with unmodified virtio. Their idea was to set up one more shared memory area between virtio and vhost-net and use it to pass ISR value, which helps to distinguish, which event took place (queue update or config change). So, this idea was rejected. http://lists.gnu.org/archive/html/qemu-devel/2014-10/msg03056.html I thought about it, and technically, i think, this can be done in better way. Actually, as far as i understood, all we need is mechanism for distinguishing between these two events. On PCI we do this by using multiple IRQs via MSI-X, and one IRQ signals exactly one type of event. MSI-X code also has two IRQs mode as a failsafe, where one IRQ signals config change and another IRQ signals queues update (and all queues are polled in turn). I think a similar thing could be done for virtio-mmio. It could allocate two IRQs instead of one and describe both of them in the device tree. Guest side, upon seeing that, could make use of those two IRQs and acknowledge to the host side that yes, i am new version and use new mode. But, sorry, i will unlikely implement this, because we already have PCI with MSI-X (i hope this is going to be published soon), so my project can use PCI emulation. So implementing irqfds for virtio-mmio is a bit out of my scope. Thanks for the explanation.
Re: [Qemu-devel] [PATCH v2] qemu-nbd: only send a limited number of errno codes on the wire
On Fri, May 08, 2015 at 12:27:59PM +0200, Paolo Bonzini wrote: And there are probably other NBD servers around. CCing Rich Jones so that he can fix nbdkit. Thanks. FWIW currently nbdkit will send any arbitrary errno returned by a system call, and it doesn't do any platform-specific encoding. I've added this to the to-do list. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Re: [Qemu-devel] [Qemu-block] [PATCH] vmdk: Fix next_cluster_sector for compressed write
Am 06.05.2015 um 20:01 hat Max Reitz geschrieben: On 06.05.2015 14:23, Fam Zheng wrote: This fixes the bug introduced by commit c6ac36e (vmdk: Optimize cluster allocation). Sometimes, write_len could be larger than cluster size, because it contains both data and marker. We must advance next_cluster_sector in this case, otherwise the image gets corrupted. Reported-by: Antoni Villalonga qemu-l...@friki.cat Signed-off-by: Fam Zheng f...@redhat.com Reviewed-by: Max Reitz mre...@redhat.com Thanks, added a Cc: qemu-sta...@nongnu.org line to the commit message and applied to the block branch. Kevin
[Qemu-devel] [Bug 1453025] [NEW] remote usbredirection failed, when guest os has more than one vcpus
Public bug reported: I was testing usb3.0 redirection, it worked fine with one vcpu. But when I set vcpu to a number greater than one, the redirection failed. Host and guest are Fedora 20 and Windows 7. Client is remote-viwer. version: qemu 2.3.0 libvirt 1.2.15 spice 0.12.5 libvirt xml: vcpu placement='static'2/vcpu controller type='usb' index='0' model='nec-xhci' /controller ** Affects: qemu Importance: Undecided Status: New ** Tags: usbredir -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1453025 Title: remote usbredirection failed, when guest os has more than one vcpus Status in QEMU: New Bug description: I was testing usb3.0 redirection, it worked fine with one vcpu. But when I set vcpu to a number greater than one, the redirection failed. Host and guest are Fedora 20 and Windows 7. Client is remote-viwer. version: qemu 2.3.0 libvirt 1.2.15 spice 0.12.5 libvirt xml: vcpu placement='static'2/vcpu controller type='usb' index='0' model='nec-xhci' /controller To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1453025/+subscriptions
[Qemu-devel] [PULL 10/10] scripts: qmp-shell: Add verbose flag
From: John Snow js...@redhat.com Add a verbose flag that shows the QMP command that was constructed, to allow for later copy/pasting, reference, debugging, etc. The QMP is converted from a Python literal to JSON first, to ensure that it is viable input to the actual QMP parser. As a side-effect, this JSON output will helpfully show all the necessary conversions that were performed on the input, illustrating that True was transformed back into true, literal values are now escaped with instead of '', and so on. Signed-off-by: John Snow js...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com Tested-by: Kashyap Chamarthy kcham...@redhat.com Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- scripts/qmp/qmp-shell | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell index 1df2ca7..65280d2 100755 --- a/scripts/qmp/qmp-shell +++ b/scripts/qmp/qmp-shell @@ -195,6 +195,13 @@ class QMPShell(qmp.QEMUMonitorProtocol): self.__cli_expr(cmdargs[1:], qmpcmd['arguments']) return qmpcmd +def _print(self, qmp): +jsobj = json.dumps(qmp) +if self._pp is not None: +self._pp.pprint(jsobj) +else: +print str(jsobj) + def _execute_cmd(self, cmdline): try: qmpcmd = self.__build_cmd(cmdline) @@ -206,15 +213,13 @@ class QMPShell(qmp.QEMUMonitorProtocol): # For transaction mode, we may have just cached the action: if qmpcmd is None: return True +if self._verbose: +self._print(qmpcmd) resp = self.cmd_obj(qmpcmd) if resp is None: print 'Disconnected' return False - -if self._pp is not None: -self._pp.pprint(resp) -else: -print resp +self._print(resp) return True def connect(self): @@ -250,6 +255,9 @@ class QMPShell(qmp.QEMUMonitorProtocol): else: return self._execute_cmd(cmdline) +def set_verbosity(self, verbose): +self._verbose = verbose + class HMPShell(QMPShell): def __init__(self, address): QMPShell.__init__(self, address) @@ -327,7 +335,7 @@ def die(msg): def fail_cmdline(option=None): if option: sys.stderr.write('ERROR: bad command-line option \'%s\'\n' % option) -sys.stderr.write('qemu-shell [ -p ] [ -H ] UNIX socket path | TCP address:port \n') +sys.stderr.write('qemu-shell [ -v ] [ -p ] [ -H ] UNIX socket path | TCP address:port \n') sys.exit(1) def main(): @@ -335,6 +343,7 @@ def main(): qemu = None hmp = False pp = None +verbose = False try: for arg in sys.argv[1:]: @@ -346,6 +355,8 @@ def main(): if pp is not None: fail_cmdline(arg) pp = pprint.PrettyPrinter(indent=4) +elif arg == -v: +verbose = True else: if qemu is not None: fail_cmdline(arg) @@ -370,6 +381,7 @@ def main(): die('Could not connect to %s' % addr) qemu.show_banner() +qemu.set_verbosity(verbose) while qemu.read_exec_command(qemu.get_prompt()): pass qemu.close() -- 1.9.3
Re: [Qemu-devel] [PATCH RFC v8 5/7] qemu-iotests: s390x: fix test 049
On 07.05.2015 09:52, Bo Tu wrote: when creating an image qemu-img enable us specifying the size of the image using -o size=xx options. But when we specify an invalid size such as a negtive size then different platform gives different result. parse_option_size() function in util/qemu-option.c will be called to parse the size, a cast was called in the function to cast the input (saved as a double in the function) size to an unsigned int64 value, when the input is a negtive value or exceeds the maximum of uint64, then the result is undefined. Language spec 6.3.1.4 Real floating and integers: the result of this assignment/cast is undefined if the float is not in the open interval (-1, Utype_MAX+1). Signed-off-by: Bo Tu t...@linux.vnet.ibm.com --- tests/qemu-iotests/049.out | 10 -- util/qemu-option.c | 4 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out index 9f93666..b8790f9 100644 --- a/tests/qemu-iotests/049.out +++ b/tests/qemu-iotests/049.out @@ -95,17 +95,15 @@ qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1024 qemu-img: Image size must be less than 8 EiB! qemu-img create -f qcow2 -o size=-1024 TEST_DIR/t.qcow2 -qemu-img: qcow2 doesn't support shrinking images yet -qemu-img: TEST_DIR/t.qcow2: Could not resize image: Operation not supported -Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 +qemu-img: Parameter 'size' expects a non-negative number below 2^64 +qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2' qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1k qemu-img: Image size must be less than 8 EiB! qemu-img create -f qcow2 -o size=-1k TEST_DIR/t.qcow2 -qemu-img: qcow2 doesn't support shrinking images yet -qemu-img: TEST_DIR/t.qcow2: Could not resize image: Operation not supported -Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 +qemu-img: Parameter 'size' expects a non-negative number below 2^64 +qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2' qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1kilobyte qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for diff --git a/util/qemu-option.c b/util/qemu-option.c index fda4e5f..6e75698 100644 --- a/util/qemu-option.c +++ b/util/qemu-option.c @@ -179,6 +179,10 @@ void parse_option_size(const char *name, const char *value, if (value != NULL) { sizef = strtod(value, postfix); +if (sizef 0 || sizef UINT64_MAX) { +error_set(errp, QERR_INVALID_PARAMETER_VALUE, name, a non-negative number below 2^64); scripts/checkpatch.pl: Konsole output Konsole output WARNING: line over 80 characters #106: FILE: util/qemu-option.c:183: +error_set(errp, QERR_INVALID_PARAMETER_VALUE, name, a non-negative number below 2^64); Other than that, looks good. Max +return; +} switch (*postfix) { case 'T': sizef *= 1024;
Re: [Qemu-devel] [PATCH RFC v8 7/7] qemu-iotests: s390x: fix test 130
On 07.05.2015 09:52, Bo Tu wrote: The default device id of hard disk on the s390 platform is virtio0 which differs to the ide0-hd0 for the x86 platform. Setting id in the drive definition, ie:qemu -drive id=testdisk, will be the same on all platforms. Signed-off-by: Bo Tu t...@linux.vnet.ibm.com --- tests/qemu-iotests/130 | 8 tests/qemu-iotests/130.out | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) Reviewed-by: Max Reitz mre...@redhat.com
Re: [Qemu-devel] [PATCH 2/4] migration: move savevm.c inside migration/
On 05/08/2015 06:43 AM, Juan Quintela wrote: Now, everything is in place. Signed-off-by: Juan Quintela quint...@redhat.com --- MAINTAINERS| 1 - Makefile.target| 4 ++-- savevm.c = migration/savevm.c | 0 trace-events | 2 +- 4 files changed, 3 insertions(+), 4 deletions(-) rename savevm.c = migration/savevm.c (100%) Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Qemu-block] [PATCH 1/7] qcow2: use one single memory block for the L2/refcount cache tables
On 06.05.2015 15:39, Alberto Garcia wrote: The qcow2 L2/refcount cache contains one separate table for each cache entry. Doing one allocation per table adds unnecessary overhead and it also requires us to store the address of each table separately. Since the size of the cache is constant during its lifetime, it's better to have an array that contains all the tables using one single allocation. In my tests measuring freshly created caches with sizes 128MB (L2) and 32MB (refcount) this uses around 10MB of RAM less. Signed-off-by: Alberto Garcia be...@igalia.com --- block/qcow2-cache.c| 55 -- block/qcow2-cluster.c | 12 +-- block/qcow2-refcount.c | 8 +--- block/qcow2.h | 3 ++- 4 files changed, 39 insertions(+), 39 deletions(-) Reviewed-by: Max Reitz mre...@redhat.com
[Qemu-devel] [PATCH 4/4] block: Include qemu-log.o in block-obj-y
Building the QEMU tools fails if we #define DEBUG_BLOCK inside block/raw-posix.c. This happens because qemu-log.o is missing from block-obj-y, which causes the link to fail. Fix this. Signed-off-by: Dimitris Aragiorgis dim...@arrikto.com --- Makefile.objs |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile.objs b/Makefile.objs index 28999d3..98f6e02 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -6,7 +6,7 @@ util-obj-y = util/ qobject/ qapi/ qapi-types.o qapi-visit.o qapi-event.o ### # block-obj-y is code used by both qemu system emulation and qemu-img -block-obj-y = async.o thread-pool.o +block-obj-y = async.o thread-pool.o qemu-log.o block-obj-y += nbd.o block.o blockjob.o block-obj-y += main-loop.o iohandler.o qemu-timer.o block-obj-$(CONFIG_POSIX) += aio-posix.o -- 1.7.10.4
[Qemu-devel] [PULL 09/10] scripts: qmp-shell: add transaction subshell
From: John Snow js...@redhat.com Add a special processing mode to craft transactions. By entering transaction( the shell will enter a special mode where each subsequent command will be saved as a transaction instead of executed as an individual command. The transaction can be submitted by entering ) on a line by itself. Examples: Separate lines: (QEMU) transaction( TRANS block-dirty-bitmap-add node=drive0 name=bitmap1 TRANS block-dirty-bitmap-clear node=drive0 name=bitmap0 TRANS ) With a transaction action included on the first line: (QEMU) transaction( block-dirty-bitmap-add node=drive0 name=bitmap2 TRANS block-dirty-bitmap-add node=drive0 name=bitmap3 TRANS ) As a one-liner, with just one transaction action: (QEMU) transaction( block-dirty-bitmap-add node=drive0 name=bitmap0 ) As a side-effect of this patch, blank lines are now parsed as no-ops, regardless of which shell mode you are in. Signed-off-by: John Snow js...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com Tested-by: Kashyap Chamarthy kcham...@redhat.com Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- scripts/qmp/qmp-shell | 42 +- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell index 7f2c554..1df2ca7 100755 --- a/scripts/qmp/qmp-shell +++ b/scripts/qmp/qmp-shell @@ -73,6 +73,8 @@ class QMPShell(qmp.QEMUMonitorProtocol): self._greeting = None self._completer = None self._pp = pp +self._transmode = False +self._actions = list() def __get_address(self, arg): @@ -159,6 +161,36 @@ class QMPShell(qmp.QEMUMonitorProtocol): command-name [ arg-name1=arg1 ] ... [ arg-nameN=argN ] cmdargs = cmdline.split() + +# Transactional CLI entry/exit: +if cmdargs[0] == 'transaction(': +self._transmode = True +cmdargs.pop(0) +elif cmdargs[0] == ')' and self._transmode: +self._transmode = False +if len(cmdargs) 1: +raise QMPShellError(Unexpected input after close of Transaction sub-shell) +qmpcmd = { 'execute': 'transaction', + 'arguments': { 'actions': self._actions } } +self._actions = list() +return qmpcmd + +# Nothing to process? +if not cmdargs: +return None + +# Parse and then cache this Transactional Action +if self._transmode: +finalize = False +action = { 'type': cmdargs[0], 'data': {} } +if cmdargs[-1] == ')': +cmdargs.pop(-1) +finalize = True +self.__cli_expr(cmdargs[1:], action['data']) +self._actions.append(action) +return self.__build_cmd(')') if finalize else None + +# Standard command: parse and return it to be executed. qmpcmd = { 'execute': cmdargs[0], 'arguments': {} } self.__cli_expr(cmdargs[1:], qmpcmd['arguments']) return qmpcmd @@ -171,6 +203,9 @@ class QMPShell(qmp.QEMUMonitorProtocol): print 'command format: command-name ', print '[arg-name1=arg1] ... [arg-nameN=argN]' return True +# For transaction mode, we may have just cached the action: +if qmpcmd is None: +return True resp = self.cmd_obj(qmpcmd) if resp is None: print 'Disconnected' @@ -191,6 +226,11 @@ class QMPShell(qmp.QEMUMonitorProtocol): version = self._greeting['QMP']['version']['qemu'] print 'Connected to QEMU %d.%d.%d\n' % (version['major'],version['minor'],version['micro']) +def get_prompt(self): +if self._transmode: +return TRANS +return (QEMU) + def read_exec_command(self, prompt): Read and execute a command. @@ -330,7 +370,7 @@ def main(): die('Could not connect to %s' % addr) qemu.show_banner() -while qemu.read_exec_command('(QEMU) '): +while qemu.read_exec_command(qemu.get_prompt()): pass qemu.close() -- 1.9.3
[Qemu-devel] [PULL 04/10] qobject: Add a special null QObject
From: Markus Armbruster arm...@redhat.com I'm going to fix the JSON parser to recognize null. The obvious representation of JSON null as (QObject *)NULL doesn't work, because the parser already uses it as an error value. Perhaps we should change it to free NULL for null, but that's more than I can do right now. Create a special null QObject instead. The existing QDict, QList, and QString all represent something that is a pointer in C and could therefore be associated with NULL. But right now, all three of these sub-types are always non-null once created, so the new null sentinel object is intentionally unrelated to them. Signed-off-by: Markus Armbruster arm...@redhat.com Signed-off-by: Eric Blake ebl...@redhat.com Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- include/qapi/qmp/qobject.h | 11 ++- qobject/Makefile.objs | 2 +- qobject/qjson.c| 3 +++ qobject/qnull.c| 29 + 4 files changed, 43 insertions(+), 2 deletions(-) create mode 100644 qobject/qnull.c diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h index 0991296..84b2d9f 100644 --- a/include/qapi/qmp/qobject.h +++ b/include/qapi/qmp/qobject.h @@ -3,7 +3,7 @@ * * Based on ideas by Avi Kivity a...@redhat.com * - * Copyright (C) 2009 Red Hat Inc. + * Copyright (C) 2009, 2015 Red Hat Inc. * * Authors: * Luiz Capitulino lcapitul...@redhat.com @@ -37,6 +37,7 @@ typedef enum { QTYPE_NONE,/* sentinel value, no QObject has this type code */ +QTYPE_QNULL, QTYPE_QINT, QTYPE_QSTRING, QTYPE_QDICT, @@ -110,4 +111,12 @@ static inline qtype_code qobject_type(const QObject *obj) return obj-type-code; } +extern QObject qnull_; + +static inline QObject *qnull(void) +{ +qobject_incref(qnull_); +return qnull_; +} + #endif /* QOBJECT_H */ diff --git a/qobject/Makefile.objs b/qobject/Makefile.objs index c9ff59c..f7595f5 100644 --- a/qobject/Makefile.objs +++ b/qobject/Makefile.objs @@ -1,3 +1,3 @@ -util-obj-y = qint.o qstring.o qdict.o qlist.o qfloat.o qbool.o +util-obj-y = qnull.o qint.o qstring.o qdict.o qlist.o qfloat.o qbool.o util-obj-y += qjson.o json-lexer.o json-streamer.o json-parser.o util-obj-y += qerror.o diff --git a/qobject/qjson.c b/qobject/qjson.c index f2857c1..846733d 100644 --- a/qobject/qjson.c +++ b/qobject/qjson.c @@ -127,6 +127,9 @@ static void to_json_list_iter(QObject *obj, void *opaque) static void to_json(const QObject *obj, QString *str, int pretty, int indent) { switch (qobject_type(obj)) { +case QTYPE_QNULL: +qstring_append(str, null); +break; case QTYPE_QINT: { QInt *val = qobject_to_qint(obj); char buffer[1024]; diff --git a/qobject/qnull.c b/qobject/qnull.c new file mode 100644 index 000..9873e26 --- /dev/null +++ b/qobject/qnull.c @@ -0,0 +1,29 @@ +/* + * QNull + * + * Copyright (C) 2015 Red Hat, Inc. + * + * Authors: + * Markus Armbruster arm...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2.1 + * or later. See the COPYING.LIB file in the top-level directory. + */ + +#include qemu-common.h +#include qapi/qmp/qobject.h + +static void qnull_destroy_obj(QObject *obj) +{ +assert(0); +} + +static const QType qnull_type = { +.code = QTYPE_QNULL, +.destroy = qnull_destroy_obj, +}; + +QObject qnull_ = { +.type = qnull_type, +.refcnt = 1, +}; -- 1.9.3
[Qemu-devel] [PULL 05/10] json-parser: Accept 'null' in QMP
From: Eric Blake ebl...@redhat.com We document that in QMP, the client may send any json-value for the optional id key, and then return that same value on reply (both success and failures, insofar as the failure happened after parsing the id). [Note that the output may not be identical to the input, as whitespace may change and since we may reorder keys within a json-object, but that this still constitutes the same json-value]. However, we were not handling the JSON literal null, which counts as a json-value per RFC 7159. Also, down the road, given the QAPI schema of {'*foo':'str'} or {'*foo':'ComplexType'}, we could decide to allow the QMP client to pass { foo:null } instead of the current representation of { } where omitting the key is the only way to get at the default NULL value. Such a change might be useful for argument introspection (if a type in older qemu lacks 'foo' altogether, then an explicit foo:null probe will force an easily distinguished error message for whether the optional foo key is even understood in newer qemu). And if we add default values to optional arguments, allowing an explicit null would be required for getting a NULL value associated with an optional string that has a non-null default. But all that can come at a later day. The 'check-unit' testsuite is enhanced to test that parsing produces the same object as explicitly requesting a reference to the special qnull object. In addition, I tested with: $ ./x86_64-softmmu/qemu-system-x86_64 -qmp stdio -nodefaults {QMP: {version: {qemu: {micro: 91, minor: 2, major: 2}, package: }, capabilities: []}} {execute:qmp_capabilities,id:null} {return: {}, id: null} {id:{a:null,b:[1,null]},execute:quit} {return: {}, id: {a: null, b: [1, null]}} {timestamp: {seconds: 1427742379, microseconds: 423128}, event: SHUTDOWN} Signed-off-by: Eric Blake ebl...@redhat.com Signed-off-by: Markus Armbruster arm...@redhat.com Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- qobject/json-parser.c | 2 ++ tests/check-qjson.c | 15 +-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/qobject/json-parser.c b/qobject/json-parser.c index 4288267..717cb8f 100644 --- a/qobject/json-parser.c +++ b/qobject/json-parser.c @@ -561,6 +561,8 @@ static QObject *parse_keyword(JSONParserContext *ctxt) ret = QOBJECT(qbool_from_int(true)); } else if (token_is_keyword(token, false)) { ret = QOBJECT(qbool_from_int(false)); +} else if (token_is_keyword(token, null)) { +ret = qnull(); } else { parse_error(ctxt, token, invalid keyword `%s', token_get_value(token)); goto out; diff --git a/tests/check-qjson.c b/tests/check-qjson.c index 95497a0..60e5b22 100644 --- a/tests/check-qjson.c +++ b/tests/check-qjson.c @@ -1,6 +1,6 @@ /* * Copyright IBM, Corp. 2009 - * Copyright (c) 2013 Red Hat Inc. + * Copyright (c) 2013, 2015 Red Hat Inc. * * Authors: * Anthony Liguori aligu...@us.ibm.com @@ -1005,6 +1005,7 @@ static void keyword_literal(void) { QObject *obj; QBool *qbool; +QObject *null; QString *str; obj = qobject_from_json(true); @@ -1041,7 +1042,7 @@ static void keyword_literal(void) g_assert(qbool_get_int(qbool) == 0); QDECREF(qbool); - + obj = qobject_from_jsonf(%i, true); g_assert(obj != NULL); g_assert(qobject_type(obj) == QTYPE_QBOOL); @@ -1050,6 +1051,16 @@ static void keyword_literal(void) g_assert(qbool_get_int(qbool) != 0); QDECREF(qbool); + +obj = qobject_from_json(null); +g_assert(obj != NULL); +g_assert(qobject_type(obj) == QTYPE_QNULL); + +null = qnull(); +g_assert(null == obj); + +qobject_decref(obj); +qobject_decref(null); } typedef struct LiteralQDictEntry LiteralQDictEntry; -- 1.9.3
Re: [Qemu-devel] [PATCH 1/4] migration: move ram stuff to migration/ram
On 05/08/2015 06:43 AM, Juan Quintela wrote: For historic reasons, ram migration have been on arch_init.c. Just s/have been on/has been in/ split it into migration/ram.c, the same that happened with block.c. There is only code movement, no changes altogether. Signed-off-by: Juan Quintela quint...@redhat.com --- MAINTAINERS |1 - Makefile.target |1 + arch_init.c | 1754 ++--- include/migration/migration.h |2 + include/sysemu/arch_init.h|1 - migration/ram.c | 1681 +++ trace-events |2 +- 7 files changed, 1747 insertions(+), 1695 deletions(-) create mode 100644 migration/ram.c This diff is harder than necessary to read. It can be made smaller by using the --patience argument when creating the diff: $ git diff HEAD^ --patience --stat MAINTAINERS |1 - Makefile.target |1 + arch_init.c | 1630 --- include/migration/migration.h |2 + include/sysemu/arch_init.h|1 - migration/ram.c | 1681 + trace-events |2 +- 7 files changed, 1685 insertions(+), 1633 deletions(-) To make it permanent: $ git config diff.algorithm patience At any rate, here's how I reviewed: $ diff -u (git diff HEAD^ | sed -n 's/^-//p') (git diff HEAD^ | sed -n 's/^\+//p') You introduced a newline before mig_sleep_cpu(), and changed the text in the DPRINTF definition, but I can live with that. Reviewed-by: Eric Blake ebl...@redhat.com -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH] net: Change help text to list -netdev instead of -net by default
On 08/05/2015 11:36, Thomas Huth wrote: Looking at the output of qemu-system-xxx -help, you easily get the impression that -net is the preferred way instead of -netdev to specify host network interface, since the -net option is omnipresent but the -netdev option is only listed as a one-liner at the end. This is ugly since -net is considered as legacy and even might be removed one day. Thus, this patch switches the output to explain the host network interfaces with the -netdev option instead, moving the legacy -net option into some few lines at the end. Signed-off-by: Thomas Huth th...@redhat.com --- qemu-options.hx | 66 ++--- 1 file changed, 39 insertions(+), 27 deletions(-) That would be great if it worked for all machine types, but it doesn't. For example, there's no equivalent of: $ qemu-system-arm -net user -net nic -machine versatilepb You cannot just use -netdev: $ qemu-system-arm -netdev user,id=nd0 -machine versatilepb Warning: netdev nd0 has no peer You cannot use -netdev and -net together: $ qemu-system-arm -netdev user,id=nd0 -net nic -machine versatilepb Warning: vlan 0 is not connected to host network Warning: netdev nd0 has no peer This particular board has a PCI controller, so you can just add a PCI NIC using -netdev/-device, but still that wouldn't be a match for -net user -net nic (which puts a smc91c111 NIC on sysbus). In most embedded boards there's not even a PCI controller. So the patch is great, but I wouldn't say it's deprecated, because in practice it isn't. Paolo
Re: [Qemu-devel] [PATCH v3 3/7] qom: create objects in two phases
On 08/05/2015 16:37, Andreas Färber wrote: Hi, Can we *please* find a better subject for this? To me, creating QOM objects in two phases is about instance_init vs. realize, and thus I was pretty upset that Paolo dared to apply this without asking me first. Oops, sorry. I very much understand where you came from, now. Thanks, Paolo
[Qemu-devel] [PULL 03/10] qobject: Clean up around qtype_code
From: Markus Armbruster arm...@redhat.com QTYPE_NONE is a sentinel value. No QObject has this type code. Document it properly. Fix dump_qobject() to abort() on QTYPE_NONE, just like for any other invalid type code. Fix to_json() to abort() on all invalid type codes, not just QTYPE_MAX. Clean up Property member qtype's type: it's a qtype_code. Signed-off-by: Markus Armbruster arm...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- block/qapi.c | 3 --- include/hw/qdev-core.h | 2 +- include/qapi/qmp/qobject.h | 2 +- qobject/qjson.c| 3 +-- 4 files changed, 3 insertions(+), 7 deletions(-) diff --git a/block/qapi.c b/block/qapi.c index 063dd1b..18d2b95 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -523,9 +523,6 @@ static void dump_qobject(fprintf_function func_fprintf, void *f, QDECREF(value); break; } -case QTYPE_NONE: -break; -case QTYPE_MAX: default: abort(); } diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 4e673f9..9a0ee30 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -226,7 +226,7 @@ struct Property { PropertyInfo *info; int offset; uint8_t bitnr; -uint8_t qtype; +qtype_code qtype; int64_t defval; int arrayoffset; PropertyInfo *arrayinfo; diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h index d0bbc7c..0991296 100644 --- a/include/qapi/qmp/qobject.h +++ b/include/qapi/qmp/qobject.h @@ -36,7 +36,7 @@ #include assert.h typedef enum { -QTYPE_NONE, +QTYPE_NONE,/* sentinel value, no QObject has this type code */ QTYPE_QINT, QTYPE_QSTRING, QTYPE_QDICT, diff --git a/qobject/qjson.c b/qobject/qjson.c index 12c576d..f2857c1 100644 --- a/qobject/qjson.c +++ b/qobject/qjson.c @@ -260,9 +260,8 @@ static void to_json(const QObject *obj, QString *str, int pretty, int indent) } case QTYPE_QERROR: /* XXX: should QError be emitted? */ -case QTYPE_NONE: break; -case QTYPE_MAX: +default: abort(); } } -- 1.9.3
[Qemu-devel] [PULL 02/10] QJSON: Use OBJECT_CHECK
From: Eduardo Habkost ehabk...@redhat.com The QJSON code used casts to (QJSON*) directly, instead of OBJECT_CHECK. There were even some functions using object_dynamic_cast() calls followed by assert(), which is exactly what OBJECT_CHECK does (by calling object_dynamic_cast_assert()). Signed-off-by: Eduardo Habkost ehabk...@redhat.com Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- qjson.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/qjson.c b/qjson.c index 0cda269..e478802 100644 --- a/qjson.c +++ b/qjson.c @@ -24,6 +24,8 @@ struct QJSON { bool omit_comma; }; +#define QJSON(obj) OBJECT_CHECK(QJSON, (obj), TYPE_QJSON) + static void json_emit_element(QJSON *json, const char *name) { /* Check whether we need to print a , before an element */ @@ -87,7 +89,7 @@ const char *qjson_get_str(QJSON *json) QJSON *qjson_new(void) { -QJSON *json = (QJSON *)object_new(TYPE_QJSON); +QJSON *json = QJSON(object_new(TYPE_QJSON)); return json; } @@ -98,8 +100,7 @@ void qjson_finish(QJSON *json) static void qjson_initfn(Object *obj) { -QJSON *json = (QJSON *)object_dynamic_cast(obj, TYPE_QJSON); -assert(json); +QJSON *json = QJSON(obj); json-str = qstring_from_str({ ); json-omit_comma = true; @@ -107,9 +108,8 @@ static void qjson_initfn(Object *obj) static void qjson_finalizefn(Object *obj) { -QJSON *json = (QJSON *)object_dynamic_cast(obj, TYPE_QJSON); +QJSON *json = QJSON(obj); -assert(json); qobject_decref(QOBJECT(json-str)); } -- 1.9.3