Re: [Qemu-devel] [qom-cpu PATCH 2/2] i386: disable PMU passthrough mode by default
On Mon, 22 Jul 2013 16:25:35 -0300 Eduardo Habkost ehabk...@redhat.com wrote: Bug description: QEMU currently gets all bits from GET_SUPPORTED_CPUID for CPUID leaf 0xA and passes them directly to the guest. This makes the guest ABI depend on host kernel and host CPU capabilities, and breaks live migration if we migrate between host with different capabilities (e.g. different number of PMU counters). This patch adds a pmu-passthrough property to X86CPU, and set it to true only on -cpu host, or on pc-*-1.5 and older machine-types. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- include/hw/i386/pc.h | 4 target-i386/cpu-qom.h | 7 +++ target-i386/cpu.c | 11 ++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 7fb97b0..3cea83f 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -235,6 +235,10 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t); .driver = virtio-net-pci,\ .property = any_layout,\ .value= off,\ +},{\ +.driver = TYPE_X86_CPU,\ +.property = pmu-passthrough,\ +.value = on,\ } #define PC_COMPAT_1_4 \ diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h index 7e55e5f..b505a45 100644 --- a/target-i386/cpu-qom.h +++ b/target-i386/cpu-qom.h @@ -68,6 +68,13 @@ typedef struct X86CPU { /* Features that were filtered out because of missing host capabilities */ uint32_t filtered_features[FEATURE_WORDS]; + +/* Pass all PMU CPUID bits to the guest directly from GET_SUPPORTED_CPUID. + * This can't be enabled by default because it breaks live-migration, + * as it makes the guest ABI change depending on host CPU/kernel + * capabilities. + */ +bool pmu_passthrough; } X86CPU; static inline X86CPU *x86_env_get_cpu(CPUX86State *env) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 41c81af..e192f63 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -1475,17 +1475,25 @@ static void x86_cpu_get_feature_words(Object *obj, Visitor *v, void *opaque, error_propagate(errp, err); } +static Property cpu_x86_properties[] = { +DEFINE_PROP_BOOL(pmu-passthrough, X86CPU, pmu_passthrough, false), +DEFINE_PROP_END_OF_LIST(), +}; + static int cpu_x86_find_by_name(X86CPU *cpu, x86_def_t *x86_cpu_def, const char *name) { x86_def_t *def; int i; +Error *err = NULL; if (name == NULL) { return -1; } if (kvm_enabled() strcmp(name, host) == 0) { kvm_cpu_fill_host(x86_cpu_def); +object_property_set_bool(OBJECT(cpu), true, pmu-passthrough, err); +assert_no_error(err); Could this hunk be implemented using compat props? That would spare us dealing with it later. return 0; } @@ -2017,7 +2025,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, break; case 0xA: /* Architectural Performance Monitoring Leaf */ -if (kvm_enabled()) { +if (kvm_enabled() cpu-pmu_passthrough) { KVMState *s = cs-kvm_state; *eax = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EAX); @@ -2516,6 +2524,7 @@ static void x86_cpu_common_class_init(ObjectClass *oc, void *data) xcc-parent_realize = dc-realize; dc-realize = x86_cpu_realizefn; dc-bus_type = TYPE_ICC_BUS; +dc-props = cpu_x86_properties; xcc-parent_reset = cc-reset; cc-reset = x86_cpu_reset;
Re: [Qemu-devel] Emulating mips
On Tue, Jul 23, 2013 at 12:41 AM, Rob Landley r...@landley.net wrote: On 07/23/2013 12:16:53 AM, Renich Bon Ciric wrote: Hello, I am new to this... I'm trying to run some rom file I got from a client. It's a sc2005 processor; supposedly compatible with 4k. Anyway, I do this: qemu-system-mips -M mips -pflash 301-3100\ -\ user\ specified\ -\ Full.bin -serial stdio The processor goes to 100% but I see nothing, not in the serial console nor in the window (monitor, maybe?) I'd appreciate some tips I have working mips images at http://landley.net/aboriginal/bin/system-image-mips.tar.bz2 Grab that extract it, and ./run-emulator.sh. That should let you know what working looks like, and if you can dig a chroot or loopback mount out of your rom image, you can probably mount it under there and try running the binaries. Rob Ah, I've noticed on the kernel output that it's configured for a 24kc and my rom is desinged for a 4k compatible cpu. I think I should recreate the vmlinux right? Anyway, it totally stops when it seems not able to find the right partition for the rom: Linux version 3.10.0 (landley@driftwood) (collect2: ld returned 1 exit status) #1 Wed Jul 3 00:54:09 CDT 2013 bootconsole [early0] enabled CPU revision is: 00019300 (MIPS 24Kc) FPU revision is: 00739300 Software DMA cache coherency enabled Determined physical RAM map: memory: 1000 @ (reserved) memory: 000ef000 @ 1000 (ROM data) memory: 0038f000 @ 000f (reserved) memory: 07b8 @ 0047f000 (usable) Wasting 36832 bytes for tracking 1151 unused pages Zone ranges: DMA [mem 0x-0x00ff] Normal [mem 0x0100-0x07ffefff] Movable zone start for each node Early memory node ranges node 0: [mem 0x-0x07ffefff] Primary instruction cache 2kB, VIPT, 2-way, linesize 16 bytes. Primary data cache 2kB, 2-way, VIPT, no aliases, linesize 16 bytes Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32511 Kernel command line: root=/dev/hda rw console=ttyS0 HOST=mips PID hash table entries: 512 (order: -1, 2048 bytes) Dentry cache hash table entries: 16384 (order: 4, 65536 bytes) Inode-cache hash table entries: 8192 (order: 3, 32768 bytes) Writing ErrCtl register= Readback ErrCtl register= Memory: 125296k/126464k available (2596k kernel code, 1168k reserved, 606k data, 188k init, 0k highmem) SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 NR_IRQS:256 CPU frequency 200.00 MHz Console: colour dummy device 80x25 Calibrating delay loop... 970.75 BogoMIPS (lpj=1941504) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 512 devtmpfs: initialized NET: Registered protocol family 16 bio: create slab bio-0 at 0 vgaarb: loaded SCSI subsystem initialized PCI host bridge to bus :00 pci_bus :00: root bus resource [mem 0x1000-0x17ff] pci_bus :00: root bus resource [io 0x2000-0x1f] pci_bus :00: No busn resource found for root bus, will use [bus 00-ff] pci :00:0a.3: no compatible bridge window for [io 0x1100-0x110f] vgaarb: device added: PCI::00:12.0,decodes=io+mem,owns=none,locks=none pci :00:0a.3: BAR 8: [io 0x1100-0x110f] has bogus alignment pci :00:12.0: BAR 0: assigned [mem 0x1000-0x11ff pref] pci :00:0b.0: BAR 6: assigned [mem 0x1200-0x1201 pref] pci :00:12.0: BAR 6: assigned [mem 0x1202-0x1202 pref] pci :00:12.0: BAR 1: assigned [mem 0x1203-0x12030fff] pci :00:0a.2: BAR 4: assigned [io 0x2000-0x201f] pci :00:0b.0: BAR 0: assigned [io 0x2020-0x203f] pci :00:0b.0: BAR 1: assigned [mem 0x12031000-0x1203101f] pci :00:0a.1: BAR 4: assigned [io 0x2040-0x204f] Switching to clocksource pit NET: Registered protocol family 2 TCP established hash table entries: 1024 (order: 1, 8192 bytes) TCP bind hash table entries: 1024 (order: 0, 4096 bytes) TCP: Hash tables configured (established 1024 bind 1024) TCP: reno registered UDP hash table entries: 256 (order: 0, 4096 bytes) UDP-Lite hash table entries: 256 (order: 0, 4096 bytes) NET: Registered protocol family 1 PCI: Enabling device :00:0a.2 ( - 0001) squashfs: version 4.0 (2009/01/31) Phillip Lougher 9p: Installing v9fs 9p2000 file system support msgmni has been set to 244 io scheduler noop registered (default) Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled serial8250.0: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A console [ttyS0] enabled, bootconsole disabled console [ttyS0] enabled, bootconsole disabled serial8250.0: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A loop: module loaded Uniform Multi-Platform E-IDE driver piix :00:0a.1: IDE controller (0x8086:0x7111 rev 0x00) PCI: Enabling device :00:0a.1 ( - 0001) piix :00:0a.1: not 100% native mode: will probe irqs later ide0: BM-DMA at 0x2040-0x2047 ide1: BM-DMA at 0x2048-0x204f hda: QEMU HARDDISK, ATA DISK drive hda: UDMA/33 mode selected hdc: QEMU DVD-ROM, ATAPI CD/DVD-ROM drive hdc: UDMA/33 mode
Re: [Qemu-devel] [PATCH qom-cpu] HACKING: Document vaddr type usage
Il 22/07/2013 19:27, Peter Maydell ha scritto: On 22 July 2013 17:36, Andreas Färber afaer...@suse.de wrote: Signed-off-by: Andreas Färber afaer...@suse.de --- HACKING | 1 + 1 file changed, 1 insertion(+) diff --git a/HACKING b/HACKING index e73ac79..d9dbb46 100644 --- a/HACKING +++ b/HACKING @@ -42,6 +42,7 @@ ram_addr_t. Use target_ulong (or abi_ulong) for CPU virtual addresses, however devices should not need to use target_ulong. +Use vaddr for CPU virtual addresses in target-independent code. Here's my suggestion for this paragraph (ie to replace both the Use target_ulong... and Use vaddr sentences above): ===begin=== For CPU virtual addresses there are several possible types. vaddr is the best type to use to hold a CPU virtual address in target-independent code, including most devices. It is guaranteed to be large enough to hold a virtual address for any target, and it does not change size from target to target. It is always unsigned. target_ulong is a type the size of a virtual address on the CPU; this means it may be 32 or 64 bits depending on which target is being built. It should therefore be used only in target specific code, and in some performance-critical built-per-target core code such as the TLB code. There is also a signed version, target_long. abi_ulong is for the *-user targets, and represents a type the size of 'void *' in that target's ABI. (This may not be the same as the size of a full CPU virtual address in the case of target ABIs which use 32 bit pointers on 64 bit CPUs, like sparc32plus.) Definitions of structures that must match the target's ABI must use this type for anything that on the target is defined to be an 'unsigned long' or a pointer type. There is also a signed version, abi_long. ===endit=== (cc'ing Paolo to check I didn't mangle the abi_ulong/target_ulong distinction.) Yes, it's fine. You might add that abi_short, abi_int, etc. also exist, and that they have the same alignment as short/int/etc in the target ABI. There is one nit: tn fact abi_ulong has the size of 'long'---which is the same as 'void *', but only because our *-user targets are all ILP32 or LP64. A hypotectical windows-user target would make abi_ullong the size of 'void *'. Paolo
[Qemu-devel] KVM call agenda for 2013-07-30
Hi Please, send any topic that you are interested in covering. Thanks, Juan. PD. If you want to attend and you don't have the call details, contact me.
Re: [Qemu-devel] trim in windows guest witch virtio
Il 23/07/2013 03:05, Libaiqing ha scritto: Hi paolo, Recently I test trim function,and it works well in linux guest with ext4 fs. How to test it in windows guest? I got some info like this: 1 windows7 can send discard command when the storage device is ssd; 2 find a tool like 'fstrim', 'TRIM' the volume manually. I think it only works with IDE and AHCI on Windows. You need a filter driver to send it on SCSI disks. Paolo
Re: [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD
Il 23/07/2013 03:52, Wenchao Xia ha scritto: Dirty change tracking is what miss for a full solution, which naturally exist in backing chain snapshot. Any one is doing dirty tracking feature? If no, I'd like to draft a version based on add-cow. I had started it, I can dig it up. Remind me if I don't write about it in a couple of days. Paolo
Re: [Qemu-devel] [PATCH] Convert stderr message calling error_get_pretty() to error_report() to prepend timestamp
Andreas Färber afaer...@suse.de writes: Am 22.07.2013 23:03, schrieb Seiji Aguchi: Convert stderr messages calling error_get_pretty() to error_report(). Timestamp is prepended by -msg timstamp option with it. This is suggested by Luiz Capitulino. http://marc.info/?l=qemu-develm=137331442628866w=2 Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- arch_init.c |4 ++-- hw/char/serial.c|5 +++-- hw/i386/pc.c|6 +++--- qemu-char.c |2 +- target-i386/cpu.c |2 +- target-ppc/translate_init.c |3 ++- vl.c|9 + 7 files changed, 17 insertions(+), 14 deletions(-) How is this related to error_get_pretty()? And does error_report() expect a trailing \n or not? I would assume no, but spotted some in this patch. You're correct, and the patch needs fixing. See commits 312fd5f error: Strip trailing '\n' from error string arguments (again) be62a2e Strip trailing '\n' from error_report()'s first argument (again) 6daf194 Strip trailing '\n' from error_report()'s first argument [...]
Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types
On Mon, Jul 22, 2013 at 03:29:46PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Sun, Jul 21, 2013 at 04:09:04PM +0200, Andreas Färber wrote: Signed-off-by: Andreas Färber afaer...@suse.de --- hw/pci-bridge/ioh3420.c| 23 ++- hw/pci-bridge/xio3130_downstream.c | 23 ++- hw/pci-bridge/xio3130_upstream.c | 15 +++ hw/pci/pcie_port.c | 22 ++ include/hw/pci/pcie_port.h | 14 -- 5 files changed, 61 insertions(+), 36 deletions(-) diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c index 728f658..83db054 100644 --- a/hw/pci-bridge/ioh3420.c +++ b/hw/pci-bridge/ioh3420.c @@ -92,9 +92,8 @@ static void ioh3420_reset(DeviceState *qdev) static int ioh3420_initfn(PCIDevice *d) { -PCIBridge *br = PCI_BRIDGE(d); -PCIEPort *p = DO_UPCAST(PCIEPort, br, br); -PCIESlot *s = DO_UPCAST(PCIESlot, port, p); +PCIEPort *p = PCIE_PORT(d); +PCIESlot *s = PCIE_SLOT(d); int rc; rc = pci_bridge_initfn(d, TYPE_PCIE_BUS); @@ -148,9 +147,7 @@ err_bridge: static void ioh3420_exitfn(PCIDevice *d) { -PCIBridge *br = PCI_BRIDGE(d); -PCIEPort *p = DO_UPCAST(PCIEPort, br, br); -PCIESlot *s = DO_UPCAST(PCIESlot, port, p); +PCIESlot *s = PCIE_SLOT(d); pcie_aer_exit(d); pcie_chassis_del_slot(s); @@ -180,7 +177,7 @@ PCIESlot *ioh3420_init(PCIBus *bus, int devfn, bool multifunction, qdev_prop_set_uint16(qdev, slot, slot); qdev_init_nofail(qdev); -return DO_UPCAST(PCIESlot, port, DO_UPCAST(PCIEPort, br, br)); +return PCIE_SLOT(d); } static const VMStateDescription vmstate_ioh3420 = { @@ -190,19 +187,19 @@ static const VMStateDescription vmstate_ioh3420 = { .minimum_version_id_old = 1, .post_load = pcie_cap_slot_post_load, .fields = (VMStateField[]) { -VMSTATE_PCIE_DEVICE(port.br.parent_obj, PCIESlot), -VMSTATE_STRUCT(port.br.parent_obj.exp.aer_log, PCIESlot, 0, +VMSTATE_PCIE_DEVICE(parent_obj, PCIBridge), +VMSTATE_STRUCT(exp.aer_log, PCIDevice, 0, vmstate_pcie_aer_log, PCIEAERLog), VMSTATE_END_OF_LIST() } }; static Property ioh3420_properties[] = { -DEFINE_PROP_UINT8(port, PCIESlot, port.port, 0), +DEFINE_PROP_UINT8(port, PCIEPort, port, 0), DEFINE_PROP_UINT8(chassis, PCIESlot, chassis, 0), DEFINE_PROP_UINT16(slot, PCIESlot, slot, 0), -DEFINE_PROP_UINT16(aer_log_max, PCIESlot, - port.br.parent_obj.exp.aer_log.log_max, +DEFINE_PROP_UINT16(aer_log_max, PCIDevice, + exp.aer_log.log_max, PCIE_AER_LOG_MAX_DEFAULT), DEFINE_PROP_END_OF_LIST(), }; This looks scary. This does a cast to different types without any checks at all. What cast? VMstate takes a void *. One an object is cast to a void *, it's just as much as PCIESlot as it is a PCIEPort. That said, for consistency, I think having everything be relatively to *one* type for a Property list is pretty helpful. Expecting someone to know the type hierarchy by heart such that this doesn't look like a bug is too much IMHO. Regards, Anthony Liguori Exactly.
Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types
On Mon, Jul 22, 2013 at 11:04:49PM +0200, Andreas Färber wrote: Am 22.07.2013 22:29, schrieb Anthony Liguori: Michael S. Tsirkin m...@redhat.com writes: On Sun, Jul 21, 2013 at 04:09:04PM +0200, Andreas Färber wrote: Signed-off-by: Andreas Färber afaer...@suse.de --- hw/pci-bridge/ioh3420.c| 23 ++- hw/pci-bridge/xio3130_downstream.c | 23 ++- hw/pci-bridge/xio3130_upstream.c | 15 +++ hw/pci/pcie_port.c | 22 ++ include/hw/pci/pcie_port.h | 14 -- 5 files changed, 61 insertions(+), 36 deletions(-) diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c index 728f658..83db054 100644 --- a/hw/pci-bridge/ioh3420.c +++ b/hw/pci-bridge/ioh3420.c @@ -92,9 +92,8 @@ static void ioh3420_reset(DeviceState *qdev) static int ioh3420_initfn(PCIDevice *d) { -PCIBridge *br = PCI_BRIDGE(d); -PCIEPort *p = DO_UPCAST(PCIEPort, br, br); -PCIESlot *s = DO_UPCAST(PCIESlot, port, p); +PCIEPort *p = PCIE_PORT(d); +PCIESlot *s = PCIE_SLOT(d); int rc; rc = pci_bridge_initfn(d, TYPE_PCIE_BUS); @@ -148,9 +147,7 @@ err_bridge: static void ioh3420_exitfn(PCIDevice *d) { -PCIBridge *br = PCI_BRIDGE(d); -PCIEPort *p = DO_UPCAST(PCIEPort, br, br); -PCIESlot *s = DO_UPCAST(PCIESlot, port, p); +PCIESlot *s = PCIE_SLOT(d); pcie_aer_exit(d); pcie_chassis_del_slot(s); @@ -180,7 +177,7 @@ PCIESlot *ioh3420_init(PCIBus *bus, int devfn, bool multifunction, qdev_prop_set_uint16(qdev, slot, slot); qdev_init_nofail(qdev); -return DO_UPCAST(PCIESlot, port, DO_UPCAST(PCIEPort, br, br)); +return PCIE_SLOT(d); } static const VMStateDescription vmstate_ioh3420 = { @@ -190,19 +187,19 @@ static const VMStateDescription vmstate_ioh3420 = { .minimum_version_id_old = 1, .post_load = pcie_cap_slot_post_load, .fields = (VMStateField[]) { -VMSTATE_PCIE_DEVICE(port.br.parent_obj, PCIESlot), -VMSTATE_STRUCT(port.br.parent_obj.exp.aer_log, PCIESlot, 0, +VMSTATE_PCIE_DEVICE(parent_obj, PCIBridge), +VMSTATE_STRUCT(exp.aer_log, PCIDevice, 0, vmstate_pcie_aer_log, PCIEAERLog), VMSTATE_END_OF_LIST() } }; static Property ioh3420_properties[] = { -DEFINE_PROP_UINT8(port, PCIESlot, port.port, 0), +DEFINE_PROP_UINT8(port, PCIEPort, port, 0), DEFINE_PROP_UINT8(chassis, PCIESlot, chassis, 0), DEFINE_PROP_UINT16(slot, PCIESlot, slot, 0), -DEFINE_PROP_UINT16(aer_log_max, PCIESlot, - port.br.parent_obj.exp.aer_log.log_max, +DEFINE_PROP_UINT16(aer_log_max, PCIDevice, + exp.aer_log.log_max, PCIE_AER_LOG_MAX_DEFAULT), DEFINE_PROP_END_OF_LIST(), }; This looks scary. This does a cast to different types without any checks at all. What cast? VMstate takes a void *. One an object is cast to a void *, it's just as much as PCIESlot as it is a PCIEPort. That said, for consistency, I think having everything be relatively to *one* type for a Property list is pretty helpful. Expecting someone to know the type hierarchy by heart such that this doesn't look like a bug is too much IMHO. I'm updating the patch to that effect for VMState. But I notice the real fix for qdev properties would be to move the PCIEPort property to the new PCIEPort type, so that all derived types inherit it. :) For VMState I believe the real follow-up fix would be mst defining a central macro VMSTATE_PCI_DEVICE_AER_LOG() operating on PCIDevice. Why is that separate from VMSTATE_PCI_DEVICE() or VMSTATE_PCIE_DEVICE() in the first place? Regards, Andreas The real fix is savevm/loadvm taking into account the class hierarchy. -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH 4/9] linux-user: Clean up sendrecvmsg message parsing
On 6 July 2013 15:17, Alexander Graf ag...@suse.de wrote: The cmsg handling in the linux-user code is very hard to read as it tries to follow glibc's unreadable code closely. Let's clean it up, converting all helpers into static inline functions, so that QEMU developers have a chance to understand what's going on. While at it, this also allows us to make the next target header lookup more obvious and GUEST_BASE safe. We only compare host mapped pointers between each other now. During the rewrite I also saw that we never advance our target cmsg structure to the next one. This behavior is identical to the previous code, but looks very bogus to me. recvmsg01 and sendmsg01 both segfault (arm target and x86_64 host) with both current linux-user code and after this patch. So there is more to fix here. On the bright side this patch is not an regression :) Signed-off-by: Alexander Graf ag...@suse.de --- linux-user/syscall.c | 25 --- linux-user/syscall_defs.h | 58 ++-- 2 files changed, 55 insertions(+), 28 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 89b7698..8eb5c31 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -1120,6 +1120,7 @@ static inline abi_long target_to_host_cmsg(struct msghdr *msgh, abi_long msg_controllen; abi_ulong target_cmsg_addr; struct target_cmsghdr *target_cmsg; +struct target_cmsghdr *first_target_cmsg; socklen_t space = 0; msg_controllen = tswapal(target_msgh-msg_controllen); @@ -1129,13 +1130,14 @@ static inline abi_long target_to_host_cmsg(struct msghdr *msgh, target_cmsg = lock_user(VERIFY_READ, target_cmsg_addr, msg_controllen, 1); if (!target_cmsg) return -TARGET_EFAULT; +first_target_cmsg = target_cmsg; while (cmsg target_cmsg) { void *data = CMSG_DATA(cmsg); -void *target_data = TARGET_CMSG_DATA(target_cmsg); +void *target_data = target_cmsg_data(target_cmsg); int len = tswapal(target_cmsg-cmsg_len) - - TARGET_CMSG_ALIGN(sizeof (struct target_cmsghdr)); + - target_cmsg_align(sizeof (struct target_cmsghdr)); space += CMSG_SPACE(len); if (space msgh-msg_controllen) { @@ -1161,7 +1163,8 @@ static inline abi_long target_to_host_cmsg(struct msghdr *msgh, } cmsg = CMSG_NXTHDR(msgh, cmsg); -target_cmsg = TARGET_CMSG_NXTHDR(target_msgh, target_cmsg); +target_cmsg = target_cmsg_nxthdr(target_msgh, target_cmsg, + first_target_cmsg); } unlock_user(target_cmsg, target_cmsg_addr, 0); the_end: @@ -1176,6 +1179,7 @@ static inline abi_long host_to_target_cmsg(struct target_msghdr *target_msgh, abi_long msg_controllen; abi_ulong target_cmsg_addr; struct target_cmsghdr *target_cmsg; +struct target_cmsghdr *first_target_cmsg; socklen_t space = 0; msg_controllen = tswapal(target_msgh-msg_controllen); @@ -1183,25 +1187,27 @@ static inline abi_long host_to_target_cmsg(struct target_msghdr *target_msgh, goto the_end; target_cmsg_addr = tswapal(target_msgh-msg_control); target_cmsg = lock_user(VERIFY_WRITE, target_cmsg_addr, msg_controllen, 0); -if (!target_cmsg) +if (!target_cmsg) { return -TARGET_EFAULT; +} +first_target_cmsg = target_cmsg; while (cmsg target_cmsg) { void *data = CMSG_DATA(cmsg); -void *target_data = TARGET_CMSG_DATA(target_cmsg); +void *target_data = target_cmsg_data(target_cmsg); int len = cmsg-cmsg_len - CMSG_ALIGN(sizeof (struct cmsghdr)); -space += TARGET_CMSG_SPACE(len); +space += target_cmsg_space(len); if (space msg_controllen) { -space -= TARGET_CMSG_SPACE(len); +space -= target_cmsg_space(len); gemu_log(Target cmsg overflow\n); break; } target_cmsg-cmsg_level = tswap32(cmsg-cmsg_level); target_cmsg-cmsg_type = tswap32(cmsg-cmsg_type); -target_cmsg-cmsg_len = tswapal(TARGET_CMSG_LEN(len)); +target_cmsg-cmsg_len = tswapal(target_cmsg_len(len)); if ((cmsg-cmsg_level == TARGET_SOL_SOCKET) (cmsg-cmsg_type == SCM_RIGHTS)) { @@ -1228,7 +1234,8 @@ static inline abi_long host_to_target_cmsg(struct target_msghdr *target_msgh, } cmsg = CMSG_NXTHDR(msgh, cmsg); -target_cmsg = TARGET_CMSG_NXTHDR(target_msgh, target_cmsg); +target_cmsg = target_cmsg_nxthdr(target_msgh, target_cmsg, + first_target_cmsg); } unlock_user(target_cmsg, target_cmsg_addr, space); the_end: diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index 92c01a9..84da7f7 100644 ---
Re: [Qemu-devel] [PATCHv3 0/2] seccomp: remove unused syscalls - for 1.6
Il 22/07/2013 20:33, Eduardo Otubo ha scritto: In this small patch series I basically: v3 update: - reincluded getrlimit(), it's used by Xen. v2 update: - set libseccomp 2.1.0 as requirement on configure script. - reincluded setrlimit() (used by Xen) and removed sendfile64() from the whitelist. 1) Remove the ifdef's for the (not so) new libseccomp version that does a best effort and translates x86_32 syscalls into x86_64 when possible. 2) Remove unused syscalls on the seccomp whitelist. For that removal, I've been running several instances of Qemu using a script written on top of virt-test[0]. After some weeks testing I could come up with this small list, and safely remove them without breaking anything. [0] - https://github.com/autotest/virt-test/wiki Eduardo Otubo (2): seccomp: no need to check arch in syscall whitelist seccomp: removing unused syscalls gtom whitelist configure | 2 +- qemu-seccomp.c | 17 - 2 files changed, 1 insertion(+), 18 deletions(-) Looks good. Reviewed-by: Paolo Bonzini pbonz...@redhat.com Paolo
Re: [Qemu-devel] KVM call agenda for 2013-07-30
On 23/07/13 08:33, Juan Quintela wrote: Hi Please, send any topic that you are interested in covering. - soft reset and other reset variants. What is the right way to go? (e.g. on s390 there are several reset variants that reset a defined subset of the system. This can be triggered by operating systems, e.g. kdump uses a diagnose instruction that resets the device subsystem and parts of the cpu registers (some are unchanged, this allows to take a proper crash dump with a well defined system state)
[Qemu-devel] [PATCH 09/11] sheepdog: try to reconnect to sheepdog after network error
This introduces a failed request queue and links all the inflight requests to the list after network error happens. After QEMU reconnects to the sheepdog server successfully, the sheepdog block driver will retry all the requests in the failed queue. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- block/sheepdog.c | 72 1 file changed, 57 insertions(+), 15 deletions(-) diff --git a/block/sheepdog.c b/block/sheepdog.c index 1173605..cd72927 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -318,8 +318,11 @@ typedef struct BDRVSheepdogState { Coroutine *co_recv; uint32_t aioreq_seq_num; + +/* Every aio request must be linked to either of these queues. */ QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head; QLIST_HEAD(pending_aio_head, AIOReq) pending_aio_head; +QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head; } BDRVSheepdogState; static const char * sd_strerror(int err) @@ -669,6 +672,8 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req, enum AIOCBState aiocb_type); static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req); static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag); +static int get_sheep_fd(BDRVSheepdogState *s); +static void co_write_request(void *opaque); static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid) { @@ -710,6 +715,44 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid) } } +static coroutine_fn void reconnect_to_sdog(void *opaque) +{ +BDRVSheepdogState *s = opaque; +AIOReq *aio_req, *next; + +qemu_aio_set_fd_handler(s-fd, NULL, NULL, NULL, NULL); +close(s-fd); +s-fd = -1; + +/* Wait for outstanding write requests to be completed. */ +while (s-co_send != NULL) { +co_write_request(opaque); +} + +/* Move all the inflight requests to the failed queue. */ +QLIST_FOREACH_SAFE(aio_req, s-inflight_aio_head, aio_siblings, next) { +QLIST_REMOVE(aio_req, aio_siblings); +QLIST_INSERT_HEAD(s-failed_aio_head, aio_req, aio_siblings); +} + +/* Try to reconnect the sheepdog server every one second. */ +while (s-fd 0) { +s-fd = get_sheep_fd(s); +if (s-fd 0) { +dprintf(Wait for connection to be established\n); +co_aio_sleep_ns(10ULL); +} +}; + +/* Resend all the failed aio requests. */ +while (!QLIST_EMPTY(s-failed_aio_head)) { +aio_req = QLIST_FIRST(s-failed_aio_head); +QLIST_REMOVE(aio_req, aio_siblings); +QLIST_INSERT_HEAD(s-inflight_aio_head, aio_req, aio_siblings); +resend_aioreq(s, aio_req); +} +} + /* * Receive responses of the I/O requests. * @@ -726,15 +769,11 @@ static void coroutine_fn aio_read_response(void *opaque) SheepdogAIOCB *acb; uint64_t idx; -if (QLIST_EMPTY(s-inflight_aio_head)) { -goto out; -} - /* read a header */ ret = qemu_co_recv(fd, rsp, sizeof(rsp)); if (ret sizeof(rsp)) { error_report(failed to get the header, %s, strerror(errno)); -goto out; +goto err; } /* find the right aio_req from the inflight aio list */ @@ -745,7 +784,7 @@ static void coroutine_fn aio_read_response(void *opaque) } if (!aio_req) { error_report(cannot find aio_req %x, rsp.id); -goto out; +goto err; } acb = aio_req-aiocb; @@ -785,7 +824,7 @@ static void coroutine_fn aio_read_response(void *opaque) aio_req-iov_offset, rsp.data_length); if (ret rsp.data_length) { error_report(failed to get the data, %s, strerror(errno)); -goto out; +goto err; } break; case AIOCB_FLUSH_CACHE: @@ -819,10 +858,9 @@ static void coroutine_fn aio_read_response(void *opaque) if (s-inode.vdi_id == oid_to_vid(aio_req-oid)) { ret = reload_inode(s, 0, ); if (ret 0) { -goto out; +goto err; } } - if (is_data_obj(aio_req-oid)) { aio_req-oid = vid_to_data_oid(s-inode.vdi_id, data_oid_to_idx(aio_req-oid)); @@ -850,6 +888,10 @@ static void coroutine_fn aio_read_response(void *opaque) } out: s-co_recv = NULL; +return; +err: +s-co_recv = NULL; +reconnect_to_sdog(opaque); } static void co_read_response(void *opaque) @@ -875,7 +917,8 @@ static int aio_flush_request(void *opaque) BDRVSheepdogState *s = opaque; return !QLIST_EMPTY(s-inflight_aio_head) || -!QLIST_EMPTY(s-pending_aio_head); +!QLIST_EMPTY(s-pending_aio_head) || +!QLIST_EMPTY(s-failed_aio_head); } /* @@ -1150,23 +1193,21 @@ static int coroutine_fn
[Qemu-devel] [PATCH 03/11] qemu-sockets: make wait_for_connect be invoked in qemu_aio_wait
This allows us to use inet_nonblocking_connect() and unix_nonblocking_connect() in block drivers. qemu-ga needs to link block-obj to resolve dependencies of qemu_aio_set_fd_handler(). Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- Makefile| 4 ++-- util/qemu-sockets.c | 15 ++- 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/Makefile b/Makefile index c06bfab..5fe2e0f 100644 --- a/Makefile +++ b/Makefile @@ -197,7 +197,7 @@ fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h $ $@, GEN $@) -qemu-ga$(EXESUF): LIBS = $(LIBS_QGA) +qemu-ga$(EXESUF): LIBS = $(LIBS_QGA) $(LIBS_TOOLS) qemu-ga$(EXESUF): QEMU_CFLAGS += -I qga/qapi-generated gen-out-type = $(subst .,-,$(suffix $@)) @@ -227,7 +227,7 @@ $(SRC_PATH)/qapi-schema.json $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py) QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qmp-commands.h) $(qga-obj-y) qemu-ga.o: $(QGALIB_GEN) -qemu-ga$(EXESUF): $(qga-obj-y) libqemuutil.a libqemustub.a +qemu-ga$(EXESUF): $(qga-obj-y) $(block-obj-y) libqemuutil.a libqemustub.a $(call LINK, $^) clean: diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c index 095716e..8b21fd1 100644 --- a/util/qemu-sockets.c +++ b/util/qemu-sockets.c @@ -218,6 +218,11 @@ typedef struct ConnectState { static int inet_connect_addr(struct addrinfo *addr, bool *in_progress, ConnectState *connect_state, Error **errp); +static int return_true(void *opaque) +{ +return 1; +} + static void wait_for_connect(void *opaque) { ConnectState *s = opaque; @@ -225,7 +230,7 @@ static void wait_for_connect(void *opaque) socklen_t valsize = sizeof(val); bool in_progress; -qemu_set_fd_handler2(s-fd, NULL, NULL, NULL, NULL); +qemu_aio_set_fd_handler(s-fd, NULL, NULL, NULL, NULL); do { rc = qemu_getsockopt(s-fd, SOL_SOCKET, SO_ERROR, val, valsize); @@ -288,8 +293,8 @@ static int inet_connect_addr(struct addrinfo *addr, bool *in_progress, if (connect_state != NULL QEMU_SOCKET_RC_INPROGRESS(rc)) { connect_state-fd = sock; -qemu_set_fd_handler2(sock, NULL, NULL, wait_for_connect, - connect_state); +qemu_aio_set_fd_handler(sock, NULL, wait_for_connect, return_true, +connect_state); *in_progress = true; } else if (rc 0) { error_set_errno(errp, errno, QERR_SOCKET_CONNECT_FAILED); @@ -749,8 +754,8 @@ int unix_connect_opts(QemuOpts *opts, Error **errp, if (connect_state != NULL QEMU_SOCKET_RC_INPROGRESS(rc)) { connect_state-fd = sock; -qemu_set_fd_handler2(sock, NULL, NULL, wait_for_connect, - connect_state); +qemu_aio_set_fd_handler(sock, NULL, wait_for_connect, return_true, +connect_state); return sock; } else if (rc = 0) { /* non blocking socket immediate success, call callback */ -- 1.8.1.3.566.gaa39828
[Qemu-devel] [PATCH 06/11] sheepdog: handle vdi objects in resend_aio_req
The current resend_aio_req() doesn't work when the request is against vdi objects. This fixes the problem. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- block/sheepdog.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/block/sheepdog.c b/block/sheepdog.c index 567f52e..018eab2 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -1265,11 +1265,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req) return ret; } -aio_req-oid = vid_to_data_oid(s-inode.vdi_id, - data_oid_to_idx(aio_req-oid)); +if (is_data_obj(aio_req-oid)) { +aio_req-oid = vid_to_data_oid(s-inode.vdi_id, + data_oid_to_idx(aio_req-oid)); +} else { +aio_req-oid = vid_to_vdi_oid(s-inode.vdi_id); +} /* check whether this request becomes a CoW one */ -if (acb-aiocb_type == AIOCB_WRITE_UDATA) { +if (acb-aiocb_type == AIOCB_WRITE_UDATA is_data_obj(aio_req-oid)) { int idx = data_oid_to_idx(aio_req-oid); AIOReq *areq; @@ -1297,8 +1301,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req) create = true; } out: -return add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov, - create, acb-aiocb_type); +if (is_data_obj(aio_req-oid)) { +return add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov, + create, acb-aiocb_type); +} else { +struct iovec iov; +iov.iov_base = s-inode; +iov.iov_len = sizeof(s-inode); +return add_aio_request(s, aio_req, iov, 1, false, AIOCB_WRITE_UDATA); +} } /* TODO Convert to fine grained options */ -- 1.8.1.3.566.gaa39828
[Qemu-devel] [PATCH 11/11] sheepdog: cancel aio requests if possible
This patch tries to cancel aio requests in pending queue and failed queue. When the sheepdog driver cannot cancel the requests, it waits for them to be completed. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- block/sheepdog.c | 70 +++- 1 file changed, 59 insertions(+), 11 deletions(-) diff --git a/block/sheepdog.c b/block/sheepdog.c index 8a6c432..43be479 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -294,7 +294,8 @@ struct SheepdogAIOCB { Coroutine *coroutine; void (*aio_done_func)(SheepdogAIOCB *); -bool canceled; +bool cancelable; +bool *finished; int nr_pending; }; @@ -411,6 +412,7 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req) { SheepdogAIOCB *acb = aio_req-aiocb; +acb-cancelable = false; QLIST_REMOVE(aio_req, aio_siblings); g_free(aio_req); @@ -419,23 +421,68 @@ static inline void free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req) static void coroutine_fn sd_finish_aiocb(SheepdogAIOCB *acb) { -if (!acb-canceled) { -qemu_coroutine_enter(acb-coroutine, NULL); +qemu_coroutine_enter(acb-coroutine, NULL); +if (acb-finished) { +*acb-finished = true; } qemu_aio_release(acb); } +/* + * Check whether the specified acb can be canceled + * + * We can cancel aio when any request belonging to the acb is: + * - Not processed by the sheepdog server. + * - Not linked to the inflight queue. + */ +static bool sd_acb_cancelable(const SheepdogAIOCB *acb) +{ +BDRVSheepdogState *s = acb-common.bs-opaque; +AIOReq *aioreq; + +if (!acb-cancelable) { +return false; +} + +QLIST_FOREACH(aioreq, s-inflight_aio_head, aio_siblings) { +if (aioreq-aiocb == acb) { +return false; +} +} + +return false; +} + static void sd_aio_cancel(BlockDriverAIOCB *blockacb) { SheepdogAIOCB *acb = (SheepdogAIOCB *)blockacb; +BDRVSheepdogState *s = acb-common.bs-opaque; +AIOReq *aioreq, *next; +bool finished = false; + +acb-finished = finished; +while (!finished) { +if (sd_acb_cancelable(acb)) { +/* Remove outstanding requests from pending and failed queues. */ +QLIST_FOREACH_SAFE(aioreq, s-pending_aio_head, aio_siblings, + next) { +if (aioreq-aiocb == acb) { +free_aio_req(s, aioreq); +} +} +QLIST_FOREACH_SAFE(aioreq, s-failed_aio_head, aio_siblings, + next) { +if (aioreq-aiocb == acb) { +free_aio_req(s, aioreq); +} +} -/* - * Sheepdog cannot cancel the requests which are already sent to - * the servers, so we just complete the request with -EIO here. - */ -acb-ret = -EIO; -qemu_coroutine_enter(acb-coroutine, NULL); -acb-canceled = true; +assert(acb-nr_pending == 0); +sd_finish_aiocb(acb); +return; +} +qemu_aio_wait(); +} } static const AIOCBInfo sd_aiocb_info = { @@ -456,7 +503,8 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov, acb-nb_sectors = nb_sectors; acb-aio_done_func = NULL; -acb-canceled = false; +acb-cancelable = true; +acb-finished = NULL; acb-coroutine = qemu_coroutine_self(); acb-ret = 0; acb-nr_pending = 0; -- 1.8.1.3.566.gaa39828
[Qemu-devel] [PATCH 08/11] coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
This helper function behaves similarly to co_sleep_ns(), but the sleeping coroutine will be resumed when using qemu_aio_wait(). Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- include/block/coroutine.h | 8 qemu-coroutine-sleep.c| 47 +++ 2 files changed, 55 insertions(+) diff --git a/include/block/coroutine.h b/include/block/coroutine.h index 377805a..23ea6e9 100644 --- a/include/block/coroutine.h +++ b/include/block/coroutine.h @@ -210,6 +210,14 @@ void qemu_co_rwlock_unlock(CoRwlock *lock); void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns); /** + * Yield the coroutine for a given duration + * + * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be + * resumed when using qemu_aio_wait(). + */ +void coroutine_fn co_aio_sleep_ns(int64_t ns); + +/** * Yield until a file descriptor becomes readable * * Note that this function clobbers the handlers for the file descriptor. diff --git a/qemu-coroutine-sleep.c b/qemu-coroutine-sleep.c index 169ce5c..3955347 100644 --- a/qemu-coroutine-sleep.c +++ b/qemu-coroutine-sleep.c @@ -13,6 +13,7 @@ #include block/coroutine.h #include qemu/timer.h +#include qemu/thread.h typedef struct CoSleepCB { QEMUTimer *ts; @@ -37,3 +38,49 @@ void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns) qemu_del_timer(sleep_cb.ts); qemu_free_timer(sleep_cb.ts); } + +typedef struct CoAioSleepCB { +QEMUBH *bh; +int64_t ns; +Coroutine *co; +} CoAioSleepCB; + +static void co_aio_sleep_cb(void *opaque) +{ +CoAioSleepCB *aio_sleep_cb = opaque; + +qemu_coroutine_enter(aio_sleep_cb-co, NULL); +} + +static void *sleep_thread(void *opaque) +{ +CoAioSleepCB *aio_sleep_cb = opaque; +struct timespec req = { +.tv_sec = aio_sleep_cb-ns / 10, +.tv_nsec = aio_sleep_cb-ns % 10, +}; +struct timespec rem; + +while (nanosleep(req, rem) 0 errno == EINTR) { +req = rem; +} + +qemu_bh_schedule(aio_sleep_cb-bh); + +return NULL; +} + +void coroutine_fn co_aio_sleep_ns(int64_t ns) +{ +CoAioSleepCB aio_sleep_cb = { +.ns = ns, +.co = qemu_coroutine_self(), +}; +QemuThread thread; + +aio_sleep_cb.bh = qemu_bh_new(co_aio_sleep_cb, aio_sleep_cb); +qemu_thread_create(thread, sleep_thread, aio_sleep_cb, + QEMU_THREAD_DETACHED); +qemu_coroutine_yield(); +qemu_bh_delete(aio_sleep_cb.bh); +} -- 1.8.1.3.566.gaa39828
[Qemu-devel] [PATCH 07/11] sheepdog: reload inode outside of resend_aioreq
This prepares for using resend_aioreq() after reconnecting to the sheepdog server. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- block/sheepdog.c | 33 +++-- 1 file changed, 19 insertions(+), 14 deletions(-) diff --git a/block/sheepdog.c b/block/sheepdog.c index 018eab2..1173605 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -222,6 +222,11 @@ static inline uint64_t data_oid_to_idx(uint64_t oid) return oid (MAX_DATA_OBJS - 1); } +static inline uint32_t oid_to_vid(uint64_t oid) +{ +return (oid ~VDI_BIT) VDI_SPACE_SHIFT; +} + static inline uint64_t vid_to_vdi_oid(uint32_t vid) { return VDI_BIT | ((uint64_t)vid VDI_SPACE_SHIFT); @@ -663,7 +668,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req, struct iovec *iov, int niov, bool create, enum AIOCBState aiocb_type); static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req); - +static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag); static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid) { @@ -811,6 +816,19 @@ static void coroutine_fn aio_read_response(void *opaque) case SD_RES_SUCCESS: break; case SD_RES_READONLY: +if (s-inode.vdi_id == oid_to_vid(aio_req-oid)) { +ret = reload_inode(s, 0, ); +if (ret 0) { +goto out; +} +} + +if (is_data_obj(aio_req-oid)) { +aio_req-oid = vid_to_data_oid(s-inode.vdi_id, + data_oid_to_idx(aio_req-oid)); +} else { +aio_req-oid = vid_to_vdi_oid(s-inode.vdi_id); +} ret = resend_aioreq(s, aio_req); if (ret == SD_RES_SUCCESS) { goto out; @@ -1258,19 +1276,6 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req) { SheepdogAIOCB *acb = aio_req-aiocb; bool create = false; -int ret; - -ret = reload_inode(s, 0, ); -if (ret 0) { -return ret; -} - -if (is_data_obj(aio_req-oid)) { -aio_req-oid = vid_to_data_oid(s-inode.vdi_id, - data_oid_to_idx(aio_req-oid)); -} else { -aio_req-oid = vid_to_vdi_oid(s-inode.vdi_id); -} /* check whether this request becomes a CoW one */ if (acb-aiocb_type == AIOCB_WRITE_UDATA is_data_obj(aio_req-oid)) { -- 1.8.1.3.566.gaa39828
[Qemu-devel] [PATCH 04/11] sheepdog: make connect nonblocking
This uses nonblocking connect functions to connect to the sheepdog server. The connect operation is done in a coroutine function and it will be yielded until the created socked is ready for IO. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- block/sheepdog.c | 70 ++-- 1 file changed, 63 insertions(+), 7 deletions(-) diff --git a/block/sheepdog.c b/block/sheepdog.c index 6a41ad9..6f5ede4 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -455,18 +455,51 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov, return acb; } -static int connect_to_sdog(BDRVSheepdogState *s) -{ +typedef struct SheepdogConnectCo { +BDRVSheepdogState *bs; +Coroutine *co; int fd; +bool finished; +} SheepdogConnectCo; + +static void sd_connect_completed(int fd, void *opaque) +{ +SheepdogConnectCo *scco = opaque; + +if (fd 0) { +int val, rc; +socklen_t valsize = sizeof(val); + +do { +rc = qemu_getsockopt(scco-fd, SOL_SOCKET, SO_ERROR, val, + valsize); +} while (rc == -1 socket_error() == EINTR); + +scco-fd = rc 0 ? -errno : -val; +} + +scco-finished = true; + +if (scco-co != NULL) { +qemu_coroutine_enter(scco-co, NULL); +} +} + +static coroutine_fn void co_connect_to_sdog(void *opaque) +{ +SheepdogConnectCo *scco = opaque; +BDRVSheepdogState *s = scco-bs; Error *err = NULL; if (s-is_unix) { -fd = unix_connect(s-host_spec, err); +scco-fd = unix_nonblocking_connect(s-host_spec, sd_connect_completed, +opaque, err); } else { -fd = inet_connect(s-host_spec, err); +scco-fd = inet_nonblocking_connect(s-host_spec, sd_connect_completed, +opaque, err); if (err == NULL) { -int ret = socket_set_nodelay(fd); +int ret = socket_set_nodelay(scco-fd); if (ret 0) { error_report(%s, strerror(errno)); } @@ -476,11 +509,34 @@ static int connect_to_sdog(BDRVSheepdogState *s) if (err != NULL) { qerror_report_err(err); error_free(err); +} + +if (!scco-finished) { +/* wait for connect to finish */ +scco-co = qemu_coroutine_self(); +qemu_coroutine_yield(); +} +} + +static int connect_to_sdog(BDRVSheepdogState *s) +{ +Coroutine *co; +SheepdogConnectCo scco = { +.bs = s, +.finished = false, +}; + +if (qemu_in_coroutine()) { +co_connect_to_sdog(scco); } else { -qemu_set_nonblock(fd); +co = qemu_coroutine_create(co_connect_to_sdog); +qemu_coroutine_enter(co, scco); +while (!scco.finished) { +qemu_aio_wait(); +} } -return fd; +return scco.fd; } static coroutine_fn int send_co_req(int sockfd, SheepdogReq *hdr, void *data, -- 1.8.1.3.566.gaa39828
[Qemu-devel] [PATCH 10/11] sheepdog: make add_aio_request and send_aioreq void functions
These functions no longer return errors. We can make them void functions and simplify the codes. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- block/sheepdog.c | 66 +++- 1 file changed, 17 insertions(+), 49 deletions(-) diff --git a/block/sheepdog.c b/block/sheepdog.c index cd72927..8a6c432 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -667,10 +667,10 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data, return srco.ret; } -static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req, +static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req, struct iovec *iov, int niov, bool create, enum AIOCBState aiocb_type); -static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req); +static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req); static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char *tag); static int get_sheep_fd(BDRVSheepdogState *s); static void co_write_request(void *opaque); @@ -696,22 +696,14 @@ static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid) { AIOReq *aio_req; SheepdogAIOCB *acb; -int ret; while ((aio_req = find_pending_req(s, oid)) != NULL) { acb = aio_req-aiocb; /* move aio_req from pending list to inflight one */ QLIST_REMOVE(aio_req, aio_siblings); QLIST_INSERT_HEAD(s-inflight_aio_head, aio_req, aio_siblings); -ret = add_aio_request(s, aio_req, acb-qiov-iov, - acb-qiov-niov, false, acb-aiocb_type); -if (ret 0) { -error_report(add_aio_request is failed); -free_aio_req(s, aio_req); -if (!acb-nr_pending) { -sd_finish_aiocb(acb); -} -} +add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov, false, +acb-aiocb_type); } } @@ -867,11 +859,8 @@ static void coroutine_fn aio_read_response(void *opaque) } else { aio_req-oid = vid_to_vdi_oid(s-inode.vdi_id); } -ret = resend_aioreq(s, aio_req); -if (ret == SD_RES_SUCCESS) { -goto out; -} -/* fall through */ +resend_aioreq(s, aio_req); +goto out; default: acb-ret = -EIO; error_report(%s, sd_strerror(rsp.result)); @@ -1129,7 +1118,7 @@ out: return ret; } -static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req, +static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req, struct iovec *iov, int niov, bool create, enum AIOCBState aiocb_type) { @@ -1209,8 +1198,6 @@ out: aio_flush_request, s); s-co_send = NULL; qemu_co_mutex_unlock(s-lock); - -return 0; } static int read_write_object(int fd, char *buf, uint64_t oid, int copies, @@ -1313,7 +1300,7 @@ out: return ret; } -static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req) +static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req) { SheepdogAIOCB *acb = aio_req-aiocb; bool create = false; @@ -1338,7 +1325,7 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req) dprintf(simultaneous CoW to % PRIx64 \n, aio_req-oid); QLIST_REMOVE(aio_req, aio_siblings); QLIST_INSERT_HEAD(s-pending_aio_head, aio_req, aio_siblings); -return SD_RES_SUCCESS; +return; } } @@ -1348,13 +1335,13 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req) } out: if (is_data_obj(aio_req-oid)) { -return add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov, - create, acb-aiocb_type); +add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov, create, +acb-aiocb_type); } else { struct iovec iov; iov.iov_base = s-inode; iov.iov_len = sizeof(s-inode); -return add_aio_request(s, aio_req, iov, 1, false, AIOCB_WRITE_UDATA); +add_aio_request(s, aio_req, iov, 1, false, AIOCB_WRITE_UDATA); } } @@ -1744,7 +1731,6 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset) */ static void coroutine_fn sd_write_done(SheepdogAIOCB *acb) { -int ret; BDRVSheepdogState *s = acb-common.bs-opaque; struct iovec iov; AIOReq *aio_req; @@ -1766,18 +1752,13 @@ static void coroutine_fn sd_write_done(SheepdogAIOCB *acb) aio_req = alloc_aio_req(s, acb, vid_to_vdi_oid(s-inode.vdi_id), data_len, offset, 0, 0, offset); QLIST_INSERT_HEAD(s-inflight_aio_head, aio_req, aio_siblings); -
[Qemu-devel] [PATCH 02/11] iov: handle EOF in iov_send_recv
Without this patch, iov_send_recv() never returns when do_send_recv() returns zero. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- util/iov.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/util/iov.c b/util/iov.c index cc6e837..f705586 100644 --- a/util/iov.c +++ b/util/iov.c @@ -202,6 +202,12 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, unsigned iov_cnt, return -1; } +if (ret == 0 !do_send) { +/* recv returns 0 when the peer has performed an orderly + * shutdown. */ +break; +} + /* Prepare for the next iteration */ offset += ret; total += ret; -- 1.8.1.3.566.gaa39828
[Qemu-devel] [PATCH 05/11] sheepdog: check return values of qemu_co_recv/send correctly
qemu_co_recv/send return shorter length on error. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- block/sheepdog.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/block/sheepdog.c b/block/sheepdog.c index 6f5ede4..567f52e 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -727,7 +727,7 @@ static void coroutine_fn aio_read_response(void *opaque) /* read a header */ ret = qemu_co_recv(fd, rsp, sizeof(rsp)); -if (ret 0) { +if (ret sizeof(rsp)) { error_report(failed to get the header, %s, strerror(errno)); goto out; } @@ -778,7 +778,7 @@ static void coroutine_fn aio_read_response(void *opaque) case AIOCB_READ_UDATA: ret = qemu_co_recvv(fd, acb-qiov-iov, acb-qiov-niov, aio_req-iov_offset, rsp.data_length); -if (ret 0) { +if (ret rsp.data_length) { error_report(failed to get the data, %s, strerror(errno)); goto out; } @@ -1131,7 +1131,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req, /* send a header */ ret = qemu_co_send(s-fd, hdr, sizeof(hdr)); -if (ret 0) { +if (ret sizeof(hdr)) { qemu_co_mutex_unlock(s-lock); error_report(failed to send a req, %s, strerror(errno)); return -errno; @@ -1139,7 +1139,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req, if (wlen) { ret = qemu_co_sendv(s-fd, iov, niov, aio_req-iov_offset, wlen); -if (ret 0) { +if (ret wlen) { qemu_co_mutex_unlock(s-lock); error_report(failed to send a data, %s, strerror(errno)); return -errno; -- 1.8.1.3.566.gaa39828
[Qemu-devel] [PATCH 01/11] ignore SIGPIPE in qemu-img and qemu-io
This prevents the tools from being stopped when they write data to a closed connection in the other side. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- qemu-img.c | 4 qemu-io.c | 4 2 files changed, 8 insertions(+) diff --git a/qemu-img.c b/qemu-img.c index c55ca5c..919d464 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -2319,6 +2319,10 @@ int main(int argc, char **argv) const img_cmd_t *cmd; const char *cmdname; +#ifdef CONFIG_POSIX +signal(SIGPIPE, SIG_IGN); +#endif + error_set_progname(argv[0]); qemu_init_main_loop(); diff --git a/qemu-io.c b/qemu-io.c index cb9def5..d54dc86 100644 --- a/qemu-io.c +++ b/qemu-io.c @@ -335,6 +335,10 @@ int main(int argc, char **argv) int opt_index = 0; int flags = BDRV_O_UNMAP; +#ifdef CONFIG_POSIX +signal(SIGPIPE, SIG_IGN); +#endif + progname = basename(argv[0]); while ((c = getopt_long(argc, argv, sopt, lopt, opt_index)) != -1) { -- 1.8.1.3.566.gaa39828
[Qemu-devel] [PATCH 00/11] sheepdog: reconnect server after connection failure
Currently, if a sheepdog server exits, all the connecting VMs need to be restarted. This series implements a feature to reconnect the server, and enables us to do online sheepdog upgrade and avoid restarting VMs when sheepdog servers crash unexpectedly. MORITA Kazutaka (11): ignore SIGPIPE in qemu-img and qemu-io iov: handle eof in iov_send_recv qemu-sockets: make wait_for_connect be invoked in qemu_aio_wait sheepdog: make connect nonblocking sheepdog: check return values of qemu_co_recv/send correctly sheepdog: handle vdi objects in resend_aio_req sheepdog: reload inode outside of resend_aioreq coroutine: add co_aio_sleep_ns() to allow sleep in block drivers sheepdog: try to reconnect to sheepdog after network error sheepdog: make add_aio_request and send_aioreq void functions sheepdog: cancel aio requests if possible Makefile | 4 +- block/sheepdog.c | 314 -- include/block/coroutine.h | 8 ++ qemu-coroutine-sleep.c| 47 +++ qemu-img.c| 4 + qemu-io.c | 4 + util/iov.c| 6 + util/qemu-sockets.c | 15 ++- 8 files changed, 303 insertions(+), 99 deletions(-) -- 1.8.1.3.566.gaa39828
Re: [Qemu-devel] [PATCH 0/3] dataplane: virtio-blk live migration with x-data-plane=on
On Fri, Jul 19, 2013 at 01:33:12PM +0800, yinyin wrote: I use systemtap to test this patch,the migration will success. But I found the dataplane will start again after migration start. the virtio_blk_handle_output will start dataplane. Hi Yin Yin, Thank you for testing the patch. It is not clear to me whether you encountered a problem or not. It is expected that the destination will start the dataplane thread. Was there a crash or some reason why you posted these traces? Stefan
Re: [Qemu-devel] Emulating mips
Hello, Am 23.07.2013 07:16, schrieb Renich Bon Ciric: I'm trying to run some rom file I got from a client. It's a sc2005 processor; supposedly compatible with 4k. Anyway, I do this: qemu-system-mips -M mips -pflash 301-3100\ -\ user\ specified\ -\ Full.bin -serial stdio The processor goes to 100% but I see nothing, not in the serial console nor in the window (monitor, maybe?) You didn't mention which version you're using, so try latest stable 1.5 or qemu.git. You need to know what board the ROM file was for, you can view the list with -M '?' - if it's none of those, chances are you need to implement the machine first. Note that there's qemu-system-mips and qemu-system-mipsel depending on endianness, and you can usually override the CPU via -cpu, again see -cpu '?' for a list. Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH] watchdog: Remove break after exit
Am 23.07.2013 06:46, schrieb Stefan Weil: This was dead code. Signed-off-by: Stefan Weil s...@weilnetz.de Reviewed-by: Andreas Färber afaer...@suse.de Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types
Am 23.07.2013 09:07, schrieb Michael S. Tsirkin: On Mon, Jul 22, 2013 at 11:04:49PM +0200, Andreas Färber wrote: For VMState I believe the real follow-up fix would be mst defining a central macro VMSTATE_PCI_DEVICE_AER_LOG() operating on PCIDevice. Why is that separate from VMSTATE_PCI_DEVICE() or VMSTATE_PCIE_DEVICE() in the first place? The real fix is savevm/loadvm taking into account the class hierarchy. That's not helping, unless you write a patch to show what you mean and how that is going to be migration-compatible. Does your not answering the question mean you don't know? Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH qom-cpu] HACKING: Document vaddr type usage
Am 23.07.2013 08:33, schrieb Paolo Bonzini: Il 22/07/2013 19:27, Peter Maydell ha scritto: On 22 July 2013 17:36, Andreas Färber afaer...@suse.de wrote: Signed-off-by: Andreas Färber afaer...@suse.de --- HACKING | 1 + 1 file changed, 1 insertion(+) diff --git a/HACKING b/HACKING index e73ac79..d9dbb46 100644 --- a/HACKING +++ b/HACKING @@ -42,6 +42,7 @@ ram_addr_t. Use target_ulong (or abi_ulong) for CPU virtual addresses, however devices should not need to use target_ulong. +Use vaddr for CPU virtual addresses in target-independent code. Here's my suggestion for this paragraph (ie to replace both the Use target_ulong... and Use vaddr sentences above): ===begin=== For CPU virtual addresses there are several possible types. vaddr is the best type to use to hold a CPU virtual address in target-independent code, including most devices. It is guaranteed to be large enough to hold a virtual address for any target, and it does not change size from target to target. It is always unsigned. target_ulong is a type the size of a virtual address on the CPU; this means it may be 32 or 64 bits depending on which target is being built. It should therefore be used only in target specific code, and in some performance-critical built-per-target core code such as the TLB code. There is also a signed version, target_long. abi_ulong is for the *-user targets, and represents a type the size of 'void *' in that target's ABI. (This may not be the same as the size of a full CPU virtual address in the case of target ABIs which use 32 bit pointers on 64 bit CPUs, like sparc32plus.) Definitions of structures that must match the target's ABI must use this type for anything that on the target is defined to be an 'unsigned long' or a pointer type. There is also a signed version, abi_long. ===endit=== (cc'ing Paolo to check I didn't mangle the abi_ulong/target_ulong distinction.) Yes, it's fine. You might add that abi_short, abi_int, etc. also exist, and that they have the same alignment as short/int/etc in the target ABI. There is one nit: tn fact abi_ulong has the size of 'long'---which is the same as 'void *', but only because our *-user targets are all ILP32 or LP64. A hypotectical windows-user target would make abi_ullong the size of 'void *'. Given the number of people that will read HACKING to that detail level, I hope you can send a follow-up patch to clarify that once merged. :) Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [qom-cpu PATCH 2/2] i386: disable PMU passthrough mode by default
Il 22/07/2013 21:25, Eduardo Habkost ha scritto: Bug description: QEMU currently gets all bits from GET_SUPPORTED_CPUID for CPUID leaf 0xA and passes them directly to the guest. This makes the guest ABI depend on host kernel and host CPU capabilities, and breaks live migration if we migrate between host with different capabilities (e.g. different number of PMU counters). This patch adds a pmu-passthrough property to X86CPU, and set it to true only on -cpu host, or on pc-*-1.5 and older machine-types. Can we just call the property pmu? It doesn't have to be passthough. Later we can support setting the right values for leaf 0xA. This way migration will work even for -cpu other than -cpu host, and the same option will work. Paolo Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- include/hw/i386/pc.h | 4 target-i386/cpu-qom.h | 7 +++ target-i386/cpu.c | 11 ++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 7fb97b0..3cea83f 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -235,6 +235,10 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t); .driver = virtio-net-pci,\ .property = any_layout,\ .value= off,\ +},{\ +.driver = TYPE_X86_CPU,\ +.property = pmu-passthrough,\ +.value = on,\ } #define PC_COMPAT_1_4 \ diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h index 7e55e5f..b505a45 100644 --- a/target-i386/cpu-qom.h +++ b/target-i386/cpu-qom.h @@ -68,6 +68,13 @@ typedef struct X86CPU { /* Features that were filtered out because of missing host capabilities */ uint32_t filtered_features[FEATURE_WORDS]; + +/* Pass all PMU CPUID bits to the guest directly from GET_SUPPORTED_CPUID. + * This can't be enabled by default because it breaks live-migration, + * as it makes the guest ABI change depending on host CPU/kernel + * capabilities. + */ +bool pmu_passthrough; } X86CPU; static inline X86CPU *x86_env_get_cpu(CPUX86State *env) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 41c81af..e192f63 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -1475,17 +1475,25 @@ static void x86_cpu_get_feature_words(Object *obj, Visitor *v, void *opaque, error_propagate(errp, err); } +static Property cpu_x86_properties[] = { +DEFINE_PROP_BOOL(pmu-passthrough, X86CPU, pmu_passthrough, false), +DEFINE_PROP_END_OF_LIST(), +}; + static int cpu_x86_find_by_name(X86CPU *cpu, x86_def_t *x86_cpu_def, const char *name) { x86_def_t *def; int i; +Error *err = NULL; if (name == NULL) { return -1; } if (kvm_enabled() strcmp(name, host) == 0) { kvm_cpu_fill_host(x86_cpu_def); +object_property_set_bool(OBJECT(cpu), true, pmu-passthrough, err); +assert_no_error(err); return 0; } @@ -2017,7 +2025,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, break; case 0xA: /* Architectural Performance Monitoring Leaf */ -if (kvm_enabled()) { +if (kvm_enabled() cpu-pmu_passthrough) { KVMState *s = cs-kvm_state; *eax = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EAX); @@ -2516,6 +2524,7 @@ static void x86_cpu_common_class_init(ObjectClass *oc, void *data) xcc-parent_realize = dc-realize; dc-realize = x86_cpu_realizefn; dc-bus_type = TYPE_ICC_BUS; +dc-props = cpu_x86_properties; xcc-parent_reset = cc-reset; cc-reset = x86_cpu_reset;
Re: [Qemu-devel] [PATCH for-1.6 01/11] ignore SIGPIPE in qemu-img and qemu-io
Il 23/07/2013 10:30, MORITA Kazutaka ha scritto: This prevents the tools from being stopped when they write data to a closed connection in the other side. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- qemu-img.c | 4 qemu-io.c | 4 2 files changed, 8 insertions(+) diff --git a/qemu-img.c b/qemu-img.c index c55ca5c..919d464 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -2319,6 +2319,10 @@ int main(int argc, char **argv) const img_cmd_t *cmd; const char *cmdname; +#ifdef CONFIG_POSIX +signal(SIGPIPE, SIG_IGN); +#endif + error_set_progname(argv[0]); qemu_init_main_loop(); diff --git a/qemu-io.c b/qemu-io.c index cb9def5..d54dc86 100644 --- a/qemu-io.c +++ b/qemu-io.c @@ -335,6 +335,10 @@ int main(int argc, char **argv) int opt_index = 0; int flags = BDRV_O_UNMAP; +#ifdef CONFIG_POSIX +signal(SIGPIPE, SIG_IGN); +#endif + progname = basename(argv[0]); while ((c = getopt_long(argc, argv, sopt, lopt, opt_index)) != -1) { Reviewed-by: Paolo Bonzini pbonz...@redhat.com and adding qemu-stable for this one.
Re: [Qemu-devel] [PATCH 0/3] dataplane: virtio-blk live migration with x-data-plane=on
Hi, Stefan: during the migration, the source, not the destination, will start dataplane again I think the process of migration with dataplane as follows: 1. migration begin to start 2. the migration source stop the dataplane 3. do migration ... 4. migration completed, the destination start the dataplane. when migration start, the source dataplane should already stopped, and not start again, if there is no cancel or abort. But the trace show that, in step 3 above, the source dataplane will be start by virtio_blk_handle_output. I'm afraid of some inconsistent will happen there.Is it right? There is no crash found, I just use this trace to understand the flow of dataplane migration. Yin Yin yin@cs2c.com.cn 在 2013-7-23,下午4:51,Stefan Hajnoczi stefa...@redhat.com 写道: On Fri, Jul 19, 2013 at 01:33:12PM +0800, yinyin wrote: I use systemtap to test this patch,the migration will success. But I found the dataplane will start again after migration start. the virtio_blk_handle_output will start dataplane. Hi Yin Yin, Thank you for testing the patch. It is not clear to me whether you encountered a problem or not. It is expected that the destination will start the dataplane thread. Was there a crash or some reason why you posted these traces? Stefan
Re: [Qemu-devel] [RFC PATCH] vga: Start supporting resolution not multiple of 16 correctly.
Il 18/06/2013 12:17, Frediano Ziglio ha scritto: Modern notebook support 1366x768 resolution. The resolution width is not multiple of 16 causing some problems. QEMU VGA emulation requires width resolution to be multiple of 8. VNC implementation requires width resolution to be multiple of 16. Signed-off-by: Frediano Ziglio frediano.zig...@citrix.com mailto:frediano.zig...@citrix.com Tested-by: Fabio Fantoni fabio.fant...@m2r.biz I tested it for a long time with spice on xen (because qxl will be fully working only after adding SSE support on hvm domUs). It works, I think it is good to add this and the respective vgabios patch on upstream. --- hw/display/vga.c |2 +- ui/vnc.c | 34 +++--- 2 files changed, 20 insertions(+), 16 deletions(-) Updates: - rebased - fixed style problems - fixed typos in comment Attached patch for last vgabios in order to get some new resolutions. Still had no time to test deeply. diff --git a/hw/display/vga.c b/hw/display/vga.c index 21a108d..0053b0f 100644 --- a/hw/display/vga.c +++ b/hw/display/vga.c @@ -648,7 +648,7 @@ void vbe_ioport_write_data(void *opaque, uint32_t addr, uint32_t val) } break; case VBE_DISPI_INDEX_XRES: -if ((val = VBE_DISPI_MAX_XRES) ((val 7) == 0)) { +if ((val = VBE_DISPI_MAX_XRES) ((val 1) == 0)) { s-vbe_regs[s-vbe_index] = val; } break; diff --git a/ui/vnc.c b/ui/vnc.c index dfc7459..2a2bb90 100644 --- a/ui/vnc.c +++ b/ui/vnc.c @@ -911,26 +911,30 @@ static int vnc_update_client(VncState *vs, int has_dirty) for (y = 0; y height; y++) { int x; int last_x = -1; -for (x = 0; x width / 16; x++) { -if (test_and_clear_bit(x, vs-dirty[y])) { +for (x = 0; x width; x += 16) { +if (test_and_clear_bit(x/16, vs-dirty[y])) { if (last_x == -1) { last_x = x; } } else { if (last_x != -1) { -int h = find_and_clear_dirty_height(vs, y, last_x, x, - height); +int h = find_and_clear_dirty_height(vs, y, last_x/16, + x/16, height); -n += vnc_job_add_rect(job, last_x * 16, y, - (x - last_x) * 16, h); +n += vnc_job_add_rect(job, last_x, y, + (x - last_x), h); } last_x = -1; } } if (last_x != -1) { -int h = find_and_clear_dirty_height(vs, y, last_x, x, height); -n += vnc_job_add_rect(job, last_x * 16, y, - (x - last_x) * 16, h); +int h = find_and_clear_dirty_height(vs, y, last_x/16, x/16, + height); +if (x width) { +x = width; +} +n += vnc_job_add_rect(job, last_x, y, + (x - last_x), h); } } @@ -1861,7 +1865,7 @@ static void framebuffer_update_request(VncState *vs, int incremental, int w, int h) { int i; -const size_t width = surface_width(vs-vd-ds) / 16; +const size_t width = (surface_width(vs-vd-ds)+15) / 16; const size_t height = surface_height(vs-vd-ds); if (y_position height) { @@ -2686,10 +2690,6 @@ static int vnc_refresh_server_surface(VncDisplay *vd) * Check and copy modified bits from guest to server surface. * Update server dirty map. */ -cmp_bytes = 64; -if (cmp_bytes vnc_server_fb_stride(vd)) { -cmp_bytes = vnc_server_fb_stride(vd); -} if (vd-guest.format != VNC_SERVER_FB_FORMAT) { int width = pixman_image_get_width(vd-server); tmpbuf = qemu_pixman_linebuf_create(VNC_SERVER_FB_FORMAT, width); @@ -2710,8 +2710,12 @@ static int vnc_refresh_server_surface(VncDisplay *vd) } server_ptr = server_row; -for (x = 0; x + 15 width; +cmp_bytes = 64; +for (x = 0; x width; x += 16, guest_ptr += cmp_bytes, server_ptr += cmp_bytes) { +if (width - x 16) { +cmp_bytes = 4 * (width - x); +} if (!test_and_clear_bit((x / 16), vd-guest.dirty[y])) continue; if (memcmp(server_ptr, guest_ptr, cmp_bytes) == 0) -- 1.7.10.4 2013/5/10 Andreas Färber afaer...@suse.de mailto:afaer...@suse.de Am 15.03.2013 19:14, schrieb Frediano Ziglio: Modern notebook support 136x768 resolution. The resolution width is 1366? not multiple of 16 causing some problems. a multiple? (me not a native English speaker)
Re: [Qemu-devel] [PATCH 00/28] Memory API for 1.6: fix I/O port endianness mess
Il 22/07/2013 22:16, Hervé Poussineau ha scritto: PReP is an exception, but I think it could be rewritten to use an IOMMU memory region. PReP PCI I/O area is located at 0x8000, up to 0xbf7f (in main memory space region), while ISA I/O area is at 0x8000, up to 0x8000 (size=64KB) or up to 0x807ff for non-contiguous mode. The IOMMU memory region would let you implement non-contiguous mode without calling cpu_inb/cpu_outb. However, as they are overlapped, some strange things can happen. For example, IBM 40p firmware configures the PCI SCSI bar at 0x2000 (ie 0xa000 in main memory), while Linux sets bar to 0x1000 (ie 0x80001000 in main memory), ie also in ISA I/O space. If BARs have a lower priority than assigned ISA I/O space, this should just work if you use an alias memory region for ISA I/O space. Gaps in the ISA I/O space will let you see through the ISA I/O space and access BARs. If BARs have a higher priority, you need to set the priority accordingly for the ISA I/O space alias (using memory_region_add_subregion_overlap), but that's it. By the way, i82378.c can also use memory_region_init_alias to initialize the memory regions that are passed to pci_register_bar. I didn't do this in this series. I don't know exactly what you mean by an IOMMU memory region, but how would you modelize it, so that 0x80001000 and 0xa000 accesses are redirected to PCI SCSI card, while 0x83f8 redirects (for example) to an ISA serial port? See above. Paolo If you create a new memory region for ISA I/O space, and you redirect all accesses from 0x8000-0x8000 to this new address space, 0x80001000 won't work to access the SCSI I/O bar (located in the PCI I/O address space).
Re: [Qemu-devel] [PATCH 0/3] dataplane: virtio-blk live migration with x-data-plane=on
Hi, Am 23.07.2013 11:19, schrieb yinyin: Hi, Stefan: during the migration, the source, not the destination, will start dataplane again I think the process of migration with dataplane as follows: 1. migration begin to start 2. the migration source stop the dataplane 3. do migration ... 4. migration completed, the destination start the dataplane. I can't speak for the dataplane, but in general the source guest is expected to continue working during live migration (that's the live part) until it has been fully transferred to the destination. HTH, Andreas when migration start, the source dataplane should already stopped, and not start again, if there is no cancel or abort. But the trace show that, in step 3 above, the source dataplane will be start by virtio_blk_handle_output. I'm afraid of some inconsistent will happen there.Is it right? There is no crash found, I just use this trace to understand the flow of dataplane migration. Yin Yin yin@cs2c.com.cn 在 2013-7-23,下午4:51,Stefan Hajnoczi stefa...@redhat.com 写道: On Fri, Jul 19, 2013 at 01:33:12PM +0800, yinyin wrote: I use systemtap to test this patch,the migration will success. But I found the dataplane will start again after migration start. the virtio_blk_handle_output will start dataplane. Hi Yin Yin, Thank you for testing the patch. It is not clear to me whether you encountered a problem or not. It is expected that the destination will start the dataplane thread. Was there a crash or some reason why you posted these traces? Stefan -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
[Qemu-devel] [PATCH v2 1/7] AARCH64: Add A57 CPU to default AArch64 configuration and enable KVM
From: Mian M. Hamayun m.hama...@virtualopensystems.com Introduce the A57 cpu to the default AArch64 configuration and enable KVM for 64-bit guests only. Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com --- configure |2 +- default-configs/aarch64-softmmu.mak |3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/configure b/configure index 2f83771..021c1dd 100755 --- a/configure +++ b/configure @@ -4293,7 +4293,7 @@ case $target_name in *) esac case $target_name in - arm|i386|x86_64|ppcemb|ppc|ppc64|s390x) + arm|aarch64|i386|x86_64|ppcemb|ppc|ppc64|s390x) # Make sure the target and host cpus are compatible if test $kvm = yes -a $target_softmmu = yes -a \ \( $target_name = $cpu -o \ diff --git a/default-configs/aarch64-softmmu.mak b/default-configs/aarch64-softmmu.mak index 27cbe3d..00c8bdd 100644 --- a/default-configs/aarch64-softmmu.mak +++ b/default-configs/aarch64-softmmu.mak @@ -1,4 +1,4 @@ -# Default configuration for arm-softmmu +# Default configuration for aarch64-softmmu include pci.mak include usb.mak @@ -37,6 +37,7 @@ CONFIG_USB_MUSB=y CONFIG_ARM9MPCORE=y CONFIG_ARM11MPCORE=y CONFIG_ARM15MPCORE=y +CONFIG_ARM57MPCORE=y CONFIG_ARM_GIC=y CONFIG_ARM_GIC_KVM=$(CONFIG_KVM) -- 1.7.9.5
[Qemu-devel] [PATCH v2 2/7] Add the additional parent parameter to memory region init calls
From: Mian M. Hamayun m.hama...@virtualopensystems.com The memory region init calls require an additional parent parameter, so introduce a null parent parameter to make it happy. Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com --- hw/arm/virt.c |2 +- hw/cpu/a57mpcore.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 97712d7..8a2bdc7 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -407,7 +407,7 @@ static void machvirt_init(QEMUMachineInitArgs *args) } fdt_add_cpu_nodes(virt_fdt, mi, smp_cpus); -memory_region_init_ram(ram, mach-virt.ram, ram_size); +memory_region_init_ram(ram, NULL, mach-virt.ram, ram_size); vmstate_register_ram_global(ram); memory_region_add_subregion(sysmem, mi-mem_base, ram); diff --git a/hw/cpu/a57mpcore.c b/hw/cpu/a57mpcore.c index 2923a2a..1ab6dc0 100644 --- a/hw/cpu/a57mpcore.c +++ b/hw/cpu/a57mpcore.c @@ -70,7 +70,7 @@ static int a57mp_priv_init(SysBusDevice *dev) * 0x5000-0x5fff -- GIC virtual interface control (not modelled) * 0x6000-0x7fff -- GIC virtual CPU interface (not modelled) */ -memory_region_init(s-container, a57mp-priv-container, 0x8000); +memory_region_init(s-container, NULL, a57mp-priv-container, 0x8000); memory_region_add_subregion(s-container, 0x1000, sysbus_mmio_get_region(busdev, 0)); memory_region_add_subregion(s-container, 0x2000, -- 1.7.9.5
[Qemu-devel] [PATCH v2 0/7] AARCH64 support on machvirt machine model using KVM
From: Mian M. Hamayun m.hama...@virtualopensystems.com This is the v2 of patch series that implements KVM support in QEMU for the ARMv8 Cortex A57 CPU. It depends on the previously submitted AArch64 preparation patch series v5 and machvirt patches, and uses the already available KVM in-kernel GIC support. Current implementation for 64-bit guests only, and 32-bit guest support will be introduced in near future. As a reference, KVM Tool and the AArch64 bootwrapper were used, as well as public documentation from ARM. The following work has been tested with SMP capabilities, under ARMv8 Fast and Foundation Models (Open Embedded userspace with an emulated MMC). The v1 of this patch series related to AArch64 CPU model for Versatile Express was sponsored by Huawei, and developed in collaboration between Huawei Technologies Duesseldorf GmbH - European Research Center Munich (ERC) and Virtual Open Systems. A working tree of this implementation is available on the kvm-aarch64-v2 branch of the following github repository. https://github.com/virtualopensystems/qemu/tree/kvm-aarch64-v2 Summary of Changes: Changes v1 - v2 * Based on AArch64 Preparation Patchset V5 and machvirt patches. * Implemented for Machvirt Machine Model. * Architecture-specific CPU initialization code improved. Removed hardcoding from register set/get loops and introduced CPU target type array to find appropriate ARMv8 CPU type supported by KVM. * Disable the PSCI method in case of AArch64 and use the spin-table method instead for booting secondary CPUs. * 32-bit guest support still missing v1 * Based on AArch64 Preparation Patchset V4 * Implemented for Versatile Express Machine Model * Support for SMP using bootcode injection * No 32-bit guest support Alexander Spyridakis (1): AARCH64: Add SMP support for aarch64 processors Mian M. Hamayun (6): AARCH64: Add A57 CPU to default AArch64 configuration and enable KVM Add the additional parent parameter to memory region init calls AARCH64: Add aarch64 CPU initialization, get and put registers support AARCH64: Add boot support for aarch64 processor AARCH64: Disable the non-aarch64 specific reset code AARCH64: Use the spin-table method for booting secondary processors in machvirt configure |2 +- default-configs/aarch64-softmmu.mak |3 +- hw/arm/boot.c | 62 +++-- hw/arm/virt.c | 16 - hw/cpu/a57mpcore.c |2 +- linux-headers/linux/kvm.h |1 + target-arm/kvm.c| 127 +++ 7 files changed, 201 insertions(+), 12 deletions(-) -- 1.7.9.5
[Qemu-devel] [PATCH v2 6/7] AARCH64: Add SMP support for aarch64 processors
From: Alexander Spyridakis a.spyrida...@virtualopensystems.com AArch64 uses a cpu-release-addr memory location (defined in the dts) as a way to inform secondary CPUs where to jump to and enter their holding pen. Inject a very simple bootloader that polls this memory location, until the primary CPU sets it to the right address. Signed-off-by: Alexander Spyridakis a.spyrida...@virtualopensystems.com --- hw/arm/boot.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/hw/arm/boot.c b/hw/arm/boot.c index b9b0beb..efbd984 100644 --- a/hw/arm/boot.c +++ b/hw/arm/boot.c @@ -17,6 +17,8 @@ #include sysemu/device_tree.h #include qemu/config-file.h +#define DSB_INSN 0xf57ff04f +#define CP15_DSB_INSN 0xee070f9a /* mcr cp15, 0, r0, c7, c10, 4 */ #define KERNEL_ARGS_ADDR 0x100 #ifdef TARGET_AARCH64 @@ -40,6 +42,16 @@ static uint32_t bootloader[] = { 0x /* .word @Board ID Higher 32-bits -- Placeholder */ }; +static uint32_t smpboot[] = { +0x18c5, /* ldr w5, =mbox_value - mbox value for secondary CPUs */ +0xf94000a4, /* 1: ldr x4, [x5] - Read address to jump to */ +0xb4e4, /* cbz x4, 1b - Check if mbox value is zero, if yes retry */ +0xd61f0080, /* br x4 - Branch to given address */ +0x0,/* padding word */ +0x0,/* gic_cpu_if_addr */ +0x8000fff8 /* mbox_value: default mbox value (aka cpu_release_addr) */ +}; + #else #define KERNEL_LOAD_ADDR0x0001 #define KERNEL_BOARDID_INDEX4 @@ -56,7 +68,6 @@ static uint32_t bootloader[] = { 0, /* Address of kernel args. Set by integratorcp_init. */ 0 /* Kernel entry point. Set by integratorcp_init. */ }; -#endif /* Handling for secondary CPU boot in a multicore system. * Unlike the uniprocessor/primary CPU boot, this is platform @@ -72,8 +83,6 @@ static uint32_t bootloader[] = { * for an interprocessor interrupt and polling a configurable * location for the kernel secondary CPU entry point. */ -#define DSB_INSN 0xf57ff04f -#define CP15_DSB_INSN 0xee070f9a /* mcr cp15, 0, r0, c7, c10, 4 */ static uint32_t smpboot[] = { 0xe59f2028, /* ldr r2, gic_cpu_if */ @@ -91,6 +100,7 @@ static uint32_t smpboot[] = { 0, /* gic_cpu_if: base address of GIC CPU interface */ 0 /* bootreg: Boot register address is held here */ }; +#endif static void default_write_secondary(ARMCPU *cpu, const struct arm_boot_info *info) @@ -115,8 +125,12 @@ static void default_reset_secondary(ARMCPU *cpu, { CPUARMState *env = cpu-env; +#ifdef TARGET_AARCH64 +env-pc = info-smp_loader_start; +#else stl_phys_notdirty(info-smp_bootreg_addr, 0); env-regs[15] = info-smp_loader_start; +#endif } #define WRITE_WORD(p, value) do { \ -- 1.7.9.5
[Qemu-devel] [PATCH v2 4/7] AARCH64: Add boot support for aarch64 processor
From: Mian M. Hamayun m.hama...@virtualopensystems.com This version supports booting of a single Aarch64 CPU by setting appropriate registers. The bootloader includes placehoders for Board-ID that are used to implementing uniform indexing across different bootloaders. The same macro names are used with different values when compiling for different processors. Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com Conflicts: hw/arm/boot.c --- hw/arm/boot.c | 44 +++- 1 file changed, 39 insertions(+), 5 deletions(-) diff --git a/hw/arm/boot.c b/hw/arm/boot.c index 7cca2b3..b9b0beb 100644 --- a/hw/arm/boot.c +++ b/hw/arm/boot.c @@ -18,7 +18,33 @@ #include qemu/config-file.h #define KERNEL_ARGS_ADDR 0x100 -#define KERNEL_LOAD_ADDR 0x0001 + +#ifdef TARGET_AARCH64 +#define KERNEL_LOAD_ADDR0x0008 +#define KERNEL_ARGS_INDEX 6 +#define KERNEL_ENTRY_INDEX 8 +#define KERNEL_BOARDID_INDEX10 + +static uint32_t bootloader[] = { +0x58c0,/* ldr x0, 18 ; Load the lower 32-bits of DTB */ +0xaa1f03e1,/* mov x1, xzr */ +0xaa1f03e2,/* mov x2, xzr */ +0xaa1f03e3,/* mov x3, xzr */ +0x5884,/* ldr x4, 20 ; Load the lower 32-bits of kernel entry */ +0xd61f0080,/* br x4 ; Jump to the kernel entry point */ +0x,/* .word @DTB Lower 32-bits */ +0x,/* .word @DTB Higher 32-bits */ +0x,/* .word @Kernel Entry Lower 32-bits */ +0x,/* .word @Kernel Entry Higher 32-bits */ +0x,/* .word @Board ID Lower 32-bits -- Placeholder */ +0x /* .word @Board ID Higher 32-bits -- Placeholder */ +}; + +#else +#define KERNEL_LOAD_ADDR0x0001 +#define KERNEL_BOARDID_INDEX4 +#define KERNEL_ARGS_INDEX 5 +#define KERNEL_ENTRY_INDEX 6 /* The worlds second smallest bootloader. Set r0-r2, then jump to kernel. */ static uint32_t bootloader[] = { @@ -30,6 +56,7 @@ static uint32_t bootloader[] = { 0, /* Address of kernel args. Set by integratorcp_init. */ 0 /* Kernel entry point. Set by integratorcp_init. */ }; +#endif /* Handling for secondary CPU boot in a multicore system. * Unlike the uniprocessor/primary CPU boot, this is platform @@ -341,8 +368,15 @@ static void do_cpu_reset(void *opaque) env-regs[15] = info-entry 0xfffe; env-thumb = info-entry 1; } else { +#ifdef TARGET_AARCH64 +env-pstate = PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | PSR_MODE_EL1h; +#endif if (env == first_cpu) { +#ifdef TARGET_AARCH64 +env-pc = info-loader_start; +#else env-regs[15] = info-loader_start; +#endif if (!info-dtb_filename) { if (old_param) { set_kernel_args_old(info); @@ -447,7 +481,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info) } info-initrd_size = initrd_size; -bootloader[4] = info-board_id; +bootloader[KERNEL_BOARDID_INDEX] = info-board_id; /* for device tree boot, we pass the DTB directly in r2. Otherwise * we point to the kernel args. @@ -462,9 +496,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info) if (load_dtb(dtb_start, info)) { exit(1); } -bootloader[5] = dtb_start; +bootloader[KERNEL_ARGS_INDEX] = dtb_start; } else { -bootloader[5] = info-loader_start + KERNEL_ARGS_ADDR; +bootloader[KERNEL_ARGS_INDEX] = info-loader_start + KERNEL_ARGS_ADDR; if (info-ram_size = (1ULL 32)) { fprintf(stderr, qemu: RAM size must be less than 4GB to boot Linux kernel using ATAGS (try passing a device tree @@ -472,7 +506,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info) exit(1); } } -bootloader[6] = entry; +bootloader[KERNEL_ENTRY_INDEX] = entry; for (n = 0; n sizeof(bootloader) / 4; n++) { bootloader[n] = tswap32(bootloader[n]); } -- 1.7.9.5
Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
On Wed, Jul 17, 2013 at 05:42:06PM +0800, Fam Zheng wrote: Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference) to BlockDriverState, since in_use mechanism cannot provide proper management of lifecycle when a BDS is referenced in multiple places (e.g. pointed to by another bs's backing_hd while also used as a block job device, in the use case of image fleecing). The original in_use case is considered a hard reference in this patch, where the bs is busy and should not be used in other tasks that require a hard reference. (However the interface doesn't force this, caller still need to call bdrv_in_use() to check by itself.). A soft reference is implemented but not used yet. It will be used in following patches to manage the lifecycle together with hard reference. If bdrv_ref() is called on a BDS, it must be released by exactly the same numbers of bdrv_unref() with the same soft/hard type, and never call bdrv_delete() directly. If the BDS is only used locally (unnamed), bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete(). It is risky to keep bdrv_delete() public. I suggest replacing bdrv_delete() callers with bdrv_unref() and then making bdrv_delete() static in block.c. This way it is impossible to make the mistake of calling bdrv_delete() on a BDS which has refcnt 1. I don't really understand this patch. There are now two separate refcounts. They must both reach 0 for deletion to occur. I think you plan to treat the hard refcount like the in_use counter (there should only be 0 or 1 refs) but you don't enforce it. It seems cleaner to keep in_use separate: let in_use callers take a refcount and also set in_use.
[Qemu-devel] [PATCH v2 5/7] AARCH64: Disable the non-aarch64 specific reset code
From: Mian M. Hamayun m.hama...@virtualopensystems.com This commit disables the co-processor registers reset code for KVM, when compiling for AArch64 cpus. Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com --- target-arm/kvm.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/target-arm/kvm.c b/target-arm/kvm.c index c96b871..5909d75 100644 --- a/target-arm/kvm.c +++ b/target-arm/kvm.c @@ -700,6 +700,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run) void kvm_arch_reset_vcpu(CPUState *cs) { +#ifndef TARGET_AARCH64 /* Feed the kernel back its initial register state */ ARMCPU *cpu = ARM_CPU(cs); @@ -709,6 +710,7 @@ void kvm_arch_reset_vcpu(CPUState *cs) if (!write_list_to_kvmstate(cpu)) { abort(); } +#endif } bool kvm_arch_stop_on_emulation_error(CPUState *cs) -- 1.7.9.5
[Qemu-devel] [PATCH v2 3/7] AARCH64: Add aarch64 CPU initialization, get and put registers support
From: Mian M. Hamayun m.hama...@virtualopensystems.com The cpu init function tries to initialize with all possible cpu types, as KVM does not provide a means to detect the real cpu type and simply refuses to initialize on cpu type mis-match. By using the loop based init function, we avoid the need to modify code if the underlying platform is different, such as Fast Models instead of Foundation Models. Get and Put Registers deal with the basic state of Aarch64 CPUs for the moment. Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com Conflicts: target-arm/kvm.c Conflicts: target-arm/cpu.c --- linux-headers/linux/kvm.h |1 + target-arm/kvm.c | 125 + 2 files changed, 126 insertions(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index c614070..4df5292 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -783,6 +783,7 @@ struct kvm_dirty_tlb { #define KVM_REG_IA64 0x3000ULL #define KVM_REG_ARM0x4000ULL #define KVM_REG_S390 0x5000ULL +#define KVM_REG_ARM64 0x6000ULL #define KVM_REG_SIZE_SHIFT 52 #define KVM_REG_SIZE_MASK 0x00f0ULL diff --git a/target-arm/kvm.c b/target-arm/kvm.c index b92e00d..c96b871 100644 --- a/target-arm/kvm.c +++ b/target-arm/kvm.c @@ -32,6 +32,11 @@ #error mismatch between cpu.h and KVM header definitions #endif +#ifdef TARGET_AARCH64 +#define AARCH64_CORE_REG(x) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \ + KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x)) +#endif + const KVMCapabilityInfo kvm_arch_required_capabilities[] = { KVM_CAP_LAST_INFO }; @@ -50,6 +55,33 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu) return cpu-cpu_index; } +#ifdef TARGET_AARCH64 +static uint32_t kvm_arm_targets[KVM_ARM_NUM_TARGETS] = { +KVM_ARM_TARGET_AEM_V8, +KVM_ARM_TARGET_FOUNDATION_V8, +KVM_ARM_TARGET_CORTEX_A57 +}; + +int kvm_arch_init_vcpu(CPUState *cs) +{ +struct kvm_vcpu_init init; +int ret, i; + +memset(init.features, 0, sizeof(init.features)); +/* Find an appropriate target CPU type. + * KVM does not provide means to detect the host CPU type on aarch64, + * and simply refuses to initialize, if the CPU type mis-matches; + * so we try each possible CPU type on aarch64 before giving up! */ +for (i = 0; i KVM_ARM_NUM_TARGETS; ++i) { +init.target = kvm_arm_targets[i]; +ret = kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_INIT, init); +if (!ret) +break; +} + +return ret; +} +#else static bool reg_syncs_via_tuple_list(uint64_t regidx) { /* Return true if the regidx is a register we should synchronize @@ -173,6 +205,7 @@ out: g_free(rlp); return ret; } +#endif /* We track all the KVM devices which need their memory addresses * passing to the kernel in a list of these structures. @@ -339,6 +372,7 @@ typedef struct Reg { int offset; } Reg; +#ifndef TARGET_AARCH64 #define COREREG(KERNELNAME, QEMUFIELD) \ {\ KVM_REG_ARM | KVM_REG_SIZE_U32 | \ @@ -402,7 +436,52 @@ static const Reg regs[] = { VFPSYSREG(FPINST), VFPSYSREG(FPINST2), }; +#endif +#ifdef TARGET_AARCH64 +int kvm_arch_put_registers(CPUState *cs, int level) +{ +struct kvm_one_reg reg; +int i; +int ret; + +ARMCPU *cpu = ARM_CPU(cs); +CPUARMState *env = cpu-env; + +for (i = 0; i ARRAY_SIZE(env-xregs); i++) { +reg.id = AARCH64_CORE_REG(regs.regs[i]); +reg.addr = (uintptr_t) env-xregs[i]; +ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, reg); +if (ret) { +return ret; +} +} + +reg.id = AARCH64_CORE_REG(regs.sp); +reg.addr = (uintptr_t) env-xregs[31]; +ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, reg); +if (ret) { +return ret; +} + +reg.id = AARCH64_CORE_REG(regs.pstate); +reg.addr = (uintptr_t) env-pstate; +ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, reg); +if (ret) { +return ret; +} + +reg.id = AARCH64_CORE_REG(regs.pc); +reg.addr = (uintptr_t) env-pc; +ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, reg); +if (ret) { +return ret; +} + +/* TODO: Set Rest of Registers */ +return ret; +} +#else int kvm_arch_put_registers(CPUState *cs, int level) { ARMCPU *cpu = ARM_CPU(cs); @@ -488,7 +567,52 @@ int kvm_arch_put_registers(CPUState *cs, int level) return ret; } +#endif +#ifdef TARGET_AARCH64 +int kvm_arch_get_registers(CPUState *cs) +{ +struct kvm_one_reg reg; +int i; +int ret; + +ARMCPU *cpu = ARM_CPU(cs); +CPUARMState *env = cpu-env; + +for (i = 0; i ARRAY_SIZE(env-xregs); i++) { +reg.id = AARCH64_CORE_REG(regs.regs[i]); +reg.addr = (uintptr_t)
[Qemu-devel] [PATCH v2 7/7] AARCH64: Use the spin-table method for booting secondary processors in machvirt
From: Mian M. Hamayun m.hama...@virtualopensystems.com As the SMP bootloader uses a spin-table to wait for the cpu_release_addr, we disable the PSCI method for AArch64 in machvirt and use spin-table instead. The CPU_RELEASE_OFFSET is introduced in machvirt and is to calculate the cpu_release_addr by addition of this value to the memory base address, and this value is updated in the smp bootloader using the smp_bootreg_addr board info parameter. Tested-by: Alexander Spyridakis a.spyrida...@virtualopensystems.com Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com --- hw/arm/virt.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 8a2bdc7..44ab59d 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -52,6 +52,7 @@ #define IO_LEN 0x000f #if defined(TARGET_AARCH64) +#define CPU_RELEASE_OFFSET 0xfff8 #define DEFAULT_CPU_MODEL cortex-a57 #elif defined(TARGET_ARM) #define DEFAULT_CPU_MODEL cortex-a15 @@ -170,7 +171,7 @@ static void *initial_fdt(struct machine_info *mi) qemu_devtree_setprop_cell(fdt, /soc, #interrupt-cells, 0x1); /* No PSCI for TCG yet */ -#ifdef CONFIG_KVM +#if defined (CONFIG_KVM) !defined(TARGET_AARCH64) if (kvm_enabled()) { qemu_devtree_add_subnode(fdt, /psci); qemu_devtree_setprop_string(fdt, /psci, compatible, arm,psci); @@ -234,7 +235,14 @@ static void fdt_add_cpu_nodes(void *fdt, struct machine_info *mi, int smp_cpus) mi-cpu_compatible); if (smp_cpus 1) { +#if defined(TARGET_AARCH64) +qemu_devtree_setprop_string(fdt, cpu_name, enable-method, +spin-table); +qemu_devtree_setprop_u64(fdt, cpu_name, cpu-release-addr, +(mi-mem_base + CPU_RELEASE_OFFSET)); +#else qemu_devtree_setprop_string(fdt, cpu_name, enable-method, psci); +#endif } qemu_devtree_setprop_cell(fdt, cpu_name, reg, cpu); @@ -437,6 +445,10 @@ static void machvirt_init(QEMUMachineInitArgs *args) machvirt_binfo.nb_cpus = smp_cpus; machvirt_binfo.board_id = -1; machvirt_binfo.loader_start = mi-mem_base; +machvirt_binfo.smp_loader_start = mi-mem_base + 0x1000; +#if defined(TARGET_AARCH64) +machvirt_binfo.smp_bootreg_addr = mi-mem_base + CPU_RELEASE_OFFSET; +#endif machvirt_binfo.get_dtb = machvirt_dtb; arm_load_kernel(arm_env_get_cpu(first_cpu), machvirt_binfo); } -- 1.7.9.5
Re: [Qemu-devel] [PATCH v2 2/7] Add the additional parent parameter to memory region init calls
Am 23.07.2013 11:33, schrieb Mian M. Hamayun: From: Mian M. Hamayun m.hama...@virtualopensystems.com The memory region init calls require an additional parent parameter, so introduce a null parent parameter to make it happy. Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com This is not OK for something labelled PATCH. Patch series need to be bisectable, not fixing up earlier patch series that have not been applied yet. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH 0/3] dataplane: virtio-blk live migration with x-data-plane=on
Hi, 在 2013-7-23,下午5:30,Andreas Färber afaer...@suse.de 写道: Hi, Am 23.07.2013 11:19, schrieb yinyin: Hi, Stefan: during the migration, the source, not the destination, will start dataplane again I think the process of migration with dataplane as follows: 1. migration begin to start 2. the migration source stop the dataplane 3. do migration ... 4. migration completed, the destination start the dataplane. I can't speak for the dataplane, but in general the source guest is expected to continue working during live migration (that's the live part) until it has been fully transferred to the destination. when dataplane stop, the source guest can continue work, but not use dataplane thread. HTH, Andreas when migration start, the source dataplane should already stopped, and not start again, if there is no cancel or abort. But the trace show that, in step 3 above, the source dataplane will be start by virtio_blk_handle_output. I'm afraid of some inconsistent will happen there.Is it right? There is no crash found, I just use this trace to understand the flow of dataplane migration. Yin Yin yin@cs2c.com.cn 在 2013-7-23,下午4:51,Stefan Hajnoczi stefa...@redhat.com 写道: On Fri, Jul 19, 2013 at 01:33:12PM +0800, yinyin wrote: I use systemtap to test this patch,the migration will success. But I found the dataplane will start again after migration start. the virtio_blk_handle_output will start dataplane. Hi Yin Yin, Thank you for testing the patch. It is not clear to me whether you encountered a problem or not. It is expected that the destination will start the dataplane thread. Was there a crash or some reason why you posted these traces? Stefan -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH v2 04/11] block: use refcnt for device attach/detach
On Wed, Jul 17, 2013 at 05:42:09PM +0800, Fam Zheng wrote: Signed-off-by: Fam Zheng f...@redhat.com --- block.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block.c b/block.c index 7b46669..57a3876 100644 --- a/block.c +++ b/block.c @@ -1622,6 +1622,7 @@ int bdrv_attach_dev(BlockDriverState *bs, void *dev) return -EBUSY; } bs-dev = dev; +bdrv_ref(bs, false); bdrv_iostatus_reset(bs); return 0; } @@ -1639,6 +1640,7 @@ void bdrv_detach_dev(BlockDriverState *bs, void *dev) { assert(bs-dev == dev); bs-dev = NULL; +bdrv_unref(bs, false); bs-dev_ops = NULL; bs-dev_opaque = NULL; bs-buffer_alignment = 512; I'm not 100% sure about this. sd_init() and ide_init2_with_non_qdev_drives() call bdrv_attach_dev_nofail() but I don't see a bdrv_detach_dev() call. It may be necessary to audit the lifecycle of emulated storage controllers more closely before the refcounts work correctly.
Re: [Qemu-devel] [PATCH v2 05/11] migration: omit drive ref as we have bdrv_ref now
On Wed, Jul 17, 2013 at 05:42:10PM +0800, Fam Zheng wrote: Signed-off-by: Fam Zheng f...@redhat.com --- block-migration.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/block-migration.c b/block-migration.c index d558410..d14f4eb 100644 --- a/block-migration.c +++ b/block-migration.c @@ -320,7 +320,6 @@ static void init_blk_migration_it(void *opaque, BlockDriverState *bs) bmds-completed_sectors = 0; bmds-shared_base = block_mig_state.shared_base; alloc_aio_bitmap(bmds); -drive_get_ref(drive_get_by_blockdev(bs)); bdrv_ref(bs, true); block_mig_state.total_sector_sum += sectors; @@ -558,7 +557,6 @@ static void blk_mig_cleanup(void) while ((bmds = QSIMPLEQ_FIRST(block_mig_state.bmds_list)) != NULL) { QSIMPLEQ_REMOVE_HEAD(block_mig_state.bmds_list, entry); bdrv_unref(bmds-bs, true); -drive_put_ref(drive_get_by_blockdev(bmds-bs)); g_free(bmds-aio_bitmap); g_free(bmds); } -- 1.8.3.2 The key information here is that block-migration.c does not actually use DriveInfo anywhere. Hence it's safe to drop this code since we really only cared about referencing BDS. I suggest including an explanation like this in the commit description. I had to audit the code to check whether the DriveInfo was used anywhere else.
Re: [Qemu-devel] [PATCH v2 06/11] xen_disk: simplify blk_disconnect with refcnt
On Wed, Jul 17, 2013 at 05:42:11PM +0800, Fam Zheng wrote: We call bdrv_attach_dev when initializing whether or not bs is created locally, so call bdrv_detach_dev and let the refcnt handle the lifecycle. Signed-off-by: Fam Zheng f...@redhat.com --- hw/block/xen_disk.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c index 247f32f..ae17acc 100644 --- a/hw/block/xen_disk.c +++ b/hw/block/xen_disk.c @@ -910,12 +910,7 @@ static void blk_disconnect(struct XenDevice *xendev) struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev); if (blkdev-bs) { -if (!blkdev-dinfo) { -/* close/delete only if we created it ourself */ -bdrv_close(blkdev-bs); -bdrv_detach_dev(blkdev-bs, blkdev); -bdrv_delete(blkdev-bs); -} +bdrv_detach_dev(blkdev-bs, blkdev); blkdev-bs = NULL; } xen_be_unbind_evtchn(blkdev-xendev); This reminds me that bdrv_detach_dev() needs a comment documenting that it decrements the refcount; bs may be deleted when the function returns and must not be accessed anymore.
Re: [Qemu-devel] [PATCH v2 07/11] block: hold hard reference for backup/mirror target
On Wed, Jul 17, 2013 at 05:42:12PM +0800, Fam Zheng wrote: Signed-off-by: Fam Zheng f...@redhat.com --- block/backup.c | 3 ++- block/mirror.c | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-) Should we update the blockjob.c in_use code instead of adding refcounting to specific block jobs? This ought to be handled generically for all block jobs.
Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types
On Tue, Jul 23, 2013 at 11:10:45AM +0200, Andreas Färber wrote: Am 23.07.2013 09:07, schrieb Michael S. Tsirkin: On Mon, Jul 22, 2013 at 11:04:49PM +0200, Andreas Färber wrote: For VMState I believe the real follow-up fix would be mst defining a central macro VMSTATE_PCI_DEVICE_AER_LOG() operating on PCIDevice. Why is that separate from VMSTATE_PCI_DEVICE() or VMSTATE_PCIE_DEVICE() in the first place? The real fix is savevm/loadvm taking into account the class hierarchy. That's not helping, unless you write a patch to show what you mean and I merely mean that if I inherit a class I should inherit it's vmstate. So explicitly adding VMSTATE_PCI_DEVICE should not be necessary. how that is going to be migration-compatible. Most devices put VMSTATE_PCI_DEVICE at the beninning, so just calling that before vmstate for the device should be compatible. Does your not answering the question mean you don't know? Andreas I think the answer is that most pcie devices don't implement AER. AFAIK PCI devices can't support AER at all. -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH 0/3] dataplane: virtio-blk live migration with x-data-plane=on
Am 23.07.2013 11:43, schrieb yinyin: 在 2013-7-23,下午5:30,Andreas Färber afaer...@suse.de 写道: Am 23.07.2013 11:19, schrieb yinyin: during the migration, the source, not the destination, will start dataplane again I think the process of migration with dataplane as follows: 1. migration begin to start 2. the migration source stop the dataplane 3. do migration ... 4. migration completed, the destination start the dataplane. I can't speak for the dataplane, but in general the source guest is expected to continue working during live migration (that's the live part) until it has been fully transferred to the destination. when dataplane stop, the source guest can continue work, but not use dataplane thread. So how would it do I/O then? Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH v2 2/7] Add the additional parent parameter to memory region init calls
On 23 July 2013 10:43, Andreas Färber afaer...@suse.de wrote: Am 23.07.2013 11:33, schrieb Mian M. Hamayun: From: Mian M. Hamayun m.hama...@virtualopensystems.com The memory region init calls require an additional parent parameter, so introduce a null parent parameter to make it happy. Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com This is not OK for something labelled PATCH. Patch series need to be bisectable, not fixing up earlier patch series that have not been applied yet. I have a rebased version of John's mach-virt patch which includes these fixes; I haven't sent it out yet because I've still been pondering whether the create device tree nodes for everything code can be made less ugly... -- PMM
Re: [Qemu-devel] [PATCH v2 2/7] Add the additional parent parameter to memory region init calls
Am 23.07.2013 12:00, schrieb Peter Maydell: On 23 July 2013 10:43, Andreas Färber afaer...@suse.de wrote: Am 23.07.2013 11:33, schrieb Mian M. Hamayun: From: Mian M. Hamayun m.hama...@virtualopensystems.com The memory region init calls require an additional parent parameter, so introduce a null parent parameter to make it happy. Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com This is not OK for something labelled PATCH. Patch series need to be bisectable, not fixing up earlier patch series that have not been applied yet. I have a rebased version of John's mach-virt patch which includes these fixes; I haven't sent it out yet because I've still been pondering whether the create device tree nodes for everything code can be made less ugly... I'd also appreciate if you would update cpu/a57core.c wrt the container MemoryRegion and QOM realize (still a SysBus initfn here) - that was the intent of my a15mpcore patches I cc'ed all aarch64 people on. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH v2 11/11] qmp: add command 'blockdev-backup'
On Fri, Jul 19, 2013 at 06:16:55PM +0800, Wenchao Xia wrote: 于 2013-7-18 12:41, Fam Zheng 写道: On Wed, 07/17 06:44, Eric Blake wrote: On 07/17/2013 03:42 AM, Fam Zheng wrote: Similar to drive-backup, but this command uses a device id as target instead of creating/opening an image file. Signed-off-by: Fam Zheng f...@redhat.com --- blockdev.c | 71 qapi-schema.json | 49 ++ qmp-commands.hx | 22 ++ 3 files changed, 142 insertions(+) +++ b/qapi-schema.json @@ -1665,6 +1665,40 @@ '*on-target-error': 'BlockdevOnError' } } ## +# @BlockdevBackup +# +{ 'type': 'BlockdevBackup', + 'data': { 'device': 'str', 'target': 'str', +'sync': 'MirrorSyncMode', +'*speed': 'int', +'*on-source-error': 'BlockdevOnError', +'*on-target-error': 'BlockdevOnError' } } Seems okay. But what is missing is the addition of this type into the union used for 'transaction' - shouldn't it be possible to mix this with other transaction capabilities? Should be possible, as users may want point-in-time snapshot of multiple disks. If this API looks OK, I will include it into transaction in the next version. Instead of add this new API, how about extend Driver-backup API? This patch is basically doing similar things with driver-backup, extension of old API will save trouble to do same things driver-backup already does, such as support qmp_transaction. The rationale was that drive-* commands should be high-level and can create image files. blockdev-* commands should be low-level and work on an existing -drive. The reason for keeping two separate commands is to avoid adding in parameters that work at different levels (filename vs drive name). In terms of API design I think drive- + blockdev- really is cleaner. Kevin can explain in more detail if I got something wrong. Stefan
Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types
Am 23.07.2013 11:59, schrieb Michael S. Tsirkin: On Tue, Jul 23, 2013 at 11:10:45AM +0200, Andreas Färber wrote: Am 23.07.2013 09:07, schrieb Michael S. Tsirkin: On Mon, Jul 22, 2013 at 11:04:49PM +0200, Andreas Färber wrote: For VMState I believe the real follow-up fix would be mst defining a central macro VMSTATE_PCI_DEVICE_AER_LOG() operating on PCIDevice. Why is that separate from VMSTATE_PCI_DEVICE() or VMSTATE_PCIE_DEVICE() in the first place? I think the answer is that most pcie devices don't implement AER. AFAIK PCI devices can't support AER at all. Okay, so if it's just PCIe, then XHCI is the oddball preventing moving it into VMSTATE_PCIE_DEVICE(). XHCI has VMSTATE_MSIX() in its place, also operating on PCIDevice. Is there a way to detect use of AER or MSIX to place those into subsections of VMSTATE_PCIE_DEVICE()? Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
Il 23/07/2013 04:53, liu ping fan ha scritto: The scenior I can figure out is if adopting timeout of poll, then when changing the deadline, we need to invoke poll, and set the new timeout, right? Yes, you need to call aio_notify so that poll is reinvoked. Paolo
Re: [Qemu-devel] [PATCH qom-cpu v3 00/41] QOM CPUState, part 11: GDB stub
On Wed, Jul 10, 2013 at 2:23 AM, Andreas Färber afaer...@suse.de wrote: Hello, This series cleans up gdbstub by changing all its internal CPU state to CPUState and by moving most target-specific code into the target directories. Support for m68k, moxie and unicore32 to set the PC via gdbstub is added. Lightweight subclasses for XtensaCPU are introduced, keeping the XtensaConfig mechanisms, to stop xtensa from deviating at gdbstub level wrt register count. I still wonder whether there would be interest in adding a program-counter dynamic property to the CPU, given that a setter has been factored out here? v3 avoids find_cpu() related breakages by deferring GDBState::c_cpu conversion until GDBState::g_cpu and find_cpu() can easily be converted, too. Available for testing at: git://github.com/afaerber/qemu-cpu.git qom-cpu-11.v3 https://github.com/afaerber/qemu-cpu/commits/qom-cpu-11.v3 xtensa parts: Acked-by: Max Filippov jcmvb...@gmail.com -- Thanks. -- Max
[Qemu-devel] [PULL 0/3] OpenRISC patch queue
Hi Anthony, This is my OpenRISC patch queue, and myfirst time to send a pull requests, please pull. It fix OpenRISC CPU and sim broad: * Free typename in openrisc_cpu_class_by_name * Use stderr output in openrisc_sim.c * fixed a indent typo. The following changes since commit 3464700f6aecb3e2aa9098839d90672d6b3fa974: tests: Add test-bitops.c with some sextract tests (2013-07-22 15:41:49 -0500) are available in the git repository at: git://github.com/J-Liu/qemu.git or32 for you to fetch changes up to 9b146e9a28bbd9567f5ac6a8e2bcb543aa3b9392: target-openrisc: Free typename in openrisc_cpu_class_by_name (2013-07-23 18:32:30 +0800) Jia Liu (3): hw/openrisc: Indent typo hw/openrisc: Use stderr output instead of qemu_log target-openrisc: Free typename in openrisc_cpu_class_by_name hw/openrisc/openrisc_sim.c | 6 +++--- target-openrisc/cpu.c | 1 + 2 files changed, 4 insertions(+), 3 deletions(-)
[Qemu-devel] [PULL 1/3] hw/openrisc: Indent typo
Indent typo. Signed-off-by: Jia Liu pro...@gmail.com Reviewed-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Andreas Färber afaer...@suse.de --- hw/openrisc/openrisc_sim.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c index 924438b..250f5b5 100644 --- a/hw/openrisc/openrisc_sim.c +++ b/hw/openrisc/openrisc_sim.c @@ -96,7 +96,7 @@ static void openrisc_sim_init(QEMUMachineInitArgs *args) ram_addr_t ram_size = args-ram_size; const char *cpu_model = args-cpu_model; const char *kernel_filename = args-kernel_filename; - OpenRISCCPU *cpu = NULL; +OpenRISCCPU *cpu = NULL; MemoryRegion *ram; int n; -- 1.7.12.4 (Apple Git-37)
Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard
On Tue, 07/23 11:36, Stefan Hajnoczi wrote: On Wed, Jul 17, 2013 at 05:42:06PM +0800, Fam Zheng wrote: Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference) to BlockDriverState, since in_use mechanism cannot provide proper management of lifecycle when a BDS is referenced in multiple places (e.g. pointed to by another bs's backing_hd while also used as a block job device, in the use case of image fleecing). The original in_use case is considered a hard reference in this patch, where the bs is busy and should not be used in other tasks that require a hard reference. (However the interface doesn't force this, caller still need to call bdrv_in_use() to check by itself.). A soft reference is implemented but not used yet. It will be used in following patches to manage the lifecycle together with hard reference. If bdrv_ref() is called on a BDS, it must be released by exactly the same numbers of bdrv_unref() with the same soft/hard type, and never call bdrv_delete() directly. If the BDS is only used locally (unnamed), bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete(). It is risky to keep bdrv_delete() public. I suggest replacing bdrv_delete() callers with bdrv_unref() and then making bdrv_delete() static in block.c. This way it is impossible to make the mistake of calling bdrv_delete() on a BDS which has refcnt 1. I don't really understand this patch. There are now two separate refcounts. They must both reach 0 for deletion to occur. I think you plan to treat the hard refcount like the in_use counter (there should only be 0 or 1 refs) but you don't enforce it. It seems cleaner to keep in_use separate: let in_use callers take a refcount and also set in_use. OK, I like your ideas, make bdrv_delete private is much cleaner. Will fix in next revision. I plan to make it like this: /* soft ref */ void bdrv_{ref,unref}(bs) /* hard ref */ bool bdrv_hard_{ref,unref}(bs) usage: bs = bdrv_new() implicit bdrv_ref(bs) called ... bdrv_unref(bs) automatically deleted here or with hard ref: bs = bdrv_new() implicit bdrv_ref() called bdrv_hard_ref(bs) ... bdrv_hard_unref(bs) bdrv_unref(bs) automatically deleted here The second bdrv_hard_ref call to a bs returns false, caller check the return value. -- Fam
[Qemu-devel] [PULL 3/3] target-openrisc: Free typename in openrisc_cpu_class_by_name
We should free typename here. Signed-off-by: Jia Liu pro...@gmail.com Reviewed-by: Andreas Färber afaer...@suse.de --- target-openrisc/cpu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/target-openrisc/cpu.c b/target-openrisc/cpu.c index 6d40f1b..e348df0 100644 --- a/target-openrisc/cpu.c +++ b/target-openrisc/cpu.c @@ -99,6 +99,7 @@ static ObjectClass *openrisc_cpu_class_by_name(const char *cpu_model) typename = g_strdup_printf(%s- TYPE_OPENRISC_CPU, cpu_model); oc = object_class_by_name(typename); +g_free(typename); if (oc != NULL (!object_class_dynamic_cast(oc, TYPE_OPENRISC_CPU) || object_class_is_abstract(oc))) { return NULL; -- 1.7.12.4 (Apple Git-37)
[Qemu-devel] [PULL 2/3] hw/openrisc: Use stderr output instead of qemu_log
We should use stderr output instead of qemu_log in order to output ErrMsg onto the screen. Signed-off-by: Jia Liu pro...@gmail.com Reviewed-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Andreas Färber afaer...@suse.de --- hw/openrisc/openrisc_sim.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c index 250f5b5..a08f27c 100644 --- a/hw/openrisc/openrisc_sim.c +++ b/hw/openrisc/openrisc_sim.c @@ -82,7 +82,7 @@ static void cpu_openrisc_load_kernel(ram_addr_t ram_size, } if (kernel_size 0) { -qemu_log(QEMU: couldn't load the kernel '%s'\n, +fprintf(stderr, QEMU: couldn't load the kernel '%s'\n, kernel_filename); exit(1); } @@ -107,7 +107,7 @@ static void openrisc_sim_init(QEMUMachineInitArgs *args) for (n = 0; n smp_cpus; n++) { cpu = cpu_openrisc_init(cpu_model); if (cpu == NULL) { -qemu_log(Unable to find CPU definition!\n); +fprintf(stderr, Unable to find CPU definition!\n); exit(1); } qemu_register_reset(main_cpu_reset, cpu); -- 1.7.12.4 (Apple Git-37)
Re: [Qemu-devel] [PATCH 1/2] block: allow live commit of active image
Il 23/07/2013 04:03, Fam Zheng ha scritto: stop block-job-complete ide0-hd0 cont I see, this way the job needs to stop vm in the point of all copying drained, then report 'ready' and wait for manual complete before swapping active, sounds not so good. Ideally we should mirror writes _synchronously_ to 'top' and 'base' after 'ready' state and wait for manual complete to switch image. Do you think this is easy to do? Synchronous mirroring makes it harder to handle errors in writing the target, especially if you're not running with rerror=stop/werror=stop (they would be reported to the guest). Paolo
Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types
Hi, Okay, so if it's just PCIe, then XHCI is the oddball preventing moving it into VMSTATE_PCIE_DEVICE(). XHCI has VMSTATE_MSIX() in its place, also operating on PCIDevice. Given that live migration support for xhci was added post-1.5 so we don't have a release with it yet this shouldn't be a blocker in case we get this sorted in time for 1.6. Is there a way to detect use of AER or MSIX to place those into subsections of VMSTATE_PCIE_DEVICE()? There is msix_enabled() ... Dunno about AER. cheers, Gerd
Re: [Qemu-devel] [RFC PATCH] vga: Start supporting resolution not multiple of 16 correctly.
Hi, Tested-by: Fabio Fantoni fabio.fant...@m2r.biz I tested it for a long time with spice on xen (because qxl will be fully working only after adding SSE support on hvm domUs). It works, I think it is good to add this and the respective vgabios patch on upstream. case VBE_DISPI_INDEX_XRES: -if ((val = VBE_DISPI_MAX_XRES) ((val 7) == 0)) { +if ((val = VBE_DISPI_MAX_XRES) ((val 1) == 0)) { s-vbe_regs[s-vbe_index] = val; } break; It's not that simple. With 32bit depths common today it will work fine, but for lower depths (especially those lower than 8bit) this will give you broken scanline alignment. cheers, Gerd
Re: [Qemu-devel] [PATCH 02/11] iov: handle EOF in iov_send_recv
Il 23/07/2013 10:30, MORITA Kazutaka ha scritto: Without this patch, iov_send_recv() never returns when do_send_recv() returns zero. Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp --- util/iov.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/util/iov.c b/util/iov.c index cc6e837..f705586 100644 --- a/util/iov.c +++ b/util/iov.c @@ -202,6 +202,12 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, unsigned iov_cnt, return -1; } +if (ret == 0 !do_send) { +/* recv returns 0 when the peer has performed an orderly + * shutdown. */ +break; +} + /* Prepare for the next iteration */ offset += ret; total += ret; Reviewed-by: Paolo Bonzini pbonz...@redhat.com ... and should also be in 1.5.2. Paolo
Re: [Qemu-devel] [PATCH 03/11] qemu-sockets: make wait_for_connect be invoked in qemu_aio_wait
Il 23/07/2013 10:30, MORITA Kazutaka ha scritto: This allows us to use inet_nonblocking_connect() and unix_nonblocking_connect() in block drivers. qemu-ga needs to link block-obj to resolve dependencies of qemu_aio_set_fd_handler(). Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp I'm not sure this is safe. You could have e.g. migration start during qemu_aio_wait(). Paolo --- Makefile| 4 ++-- util/qemu-sockets.c | 15 ++- 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/Makefile b/Makefile index c06bfab..5fe2e0f 100644 --- a/Makefile +++ b/Makefile @@ -197,7 +197,7 @@ fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h $ $@, GEN $@) -qemu-ga$(EXESUF): LIBS = $(LIBS_QGA) +qemu-ga$(EXESUF): LIBS = $(LIBS_QGA) $(LIBS_TOOLS) qemu-ga$(EXESUF): QEMU_CFLAGS += -I qga/qapi-generated gen-out-type = $(subst .,-,$(suffix $@)) @@ -227,7 +227,7 @@ $(SRC_PATH)/qapi-schema.json $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py) QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qmp-commands.h) $(qga-obj-y) qemu-ga.o: $(QGALIB_GEN) -qemu-ga$(EXESUF): $(qga-obj-y) libqemuutil.a libqemustub.a +qemu-ga$(EXESUF): $(qga-obj-y) $(block-obj-y) libqemuutil.a libqemustub.a $(call LINK, $^) clean: diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c index 095716e..8b21fd1 100644 --- a/util/qemu-sockets.c +++ b/util/qemu-sockets.c @@ -218,6 +218,11 @@ typedef struct ConnectState { static int inet_connect_addr(struct addrinfo *addr, bool *in_progress, ConnectState *connect_state, Error **errp); +static int return_true(void *opaque) +{ +return 1; +} + static void wait_for_connect(void *opaque) { ConnectState *s = opaque; @@ -225,7 +230,7 @@ static void wait_for_connect(void *opaque) socklen_t valsize = sizeof(val); bool in_progress; -qemu_set_fd_handler2(s-fd, NULL, NULL, NULL, NULL); +qemu_aio_set_fd_handler(s-fd, NULL, NULL, NULL, NULL); do { rc = qemu_getsockopt(s-fd, SOL_SOCKET, SO_ERROR, val, valsize); @@ -288,8 +293,8 @@ static int inet_connect_addr(struct addrinfo *addr, bool *in_progress, if (connect_state != NULL QEMU_SOCKET_RC_INPROGRESS(rc)) { connect_state-fd = sock; -qemu_set_fd_handler2(sock, NULL, NULL, wait_for_connect, - connect_state); +qemu_aio_set_fd_handler(sock, NULL, wait_for_connect, return_true, +connect_state); *in_progress = true; } else if (rc 0) { error_set_errno(errp, errno, QERR_SOCKET_CONNECT_FAILED); @@ -749,8 +754,8 @@ int unix_connect_opts(QemuOpts *opts, Error **errp, if (connect_state != NULL QEMU_SOCKET_RC_INPROGRESS(rc)) { connect_state-fd = sock; -qemu_set_fd_handler2(sock, NULL, NULL, wait_for_connect, - connect_state); +qemu_aio_set_fd_handler(sock, NULL, wait_for_connect, return_true, +connect_state); return sock; } else if (rc = 0) { /* non blocking socket immediate success, call callback */
Re: [Qemu-devel] [Question] why x2apic's set by default without host support(on Nehalem CPU).
On Tue, Jul 23, 2013 at 10:44:48 +0800, Peter Huang(Peng) wrote: libvirt's host-passthrough uses -cpu host', and it -cpu host enables every feature that can be enabled on the host. From my test results, I found that even when use host-passthrough mode, VM's cpu features are very different from host, this doesn't match what host-passthrough mode's explanation. libvirt's option exlanation: With this mode, the CPU visible to the guest should be exactly the same as the host CPU even in the aspects that libvirt does not understand. The libvirt documentation is what needs to be updated. While host-passthrough is asking for a CPU which is as close as possible to the real host CPU, there are features that need special handling before they can be provided to a guest. And if the hypervisor does not provide that handling, it may just filter such feature out. Also if some features can be efficiently provided to a guest even though the host CPU does not provide them (x2apic is an example of such feature), they may be provided to a guest. Jirka
Re: [Qemu-devel] [PATCH 0/7] s390 fixes
On 15/07/13 21:57, Christian Borntraeger wrote: Alex, here is a bunch of fixes/cleanups for s390. Heinz Graalfs (1): s390/sclpconsole: handle char layer busy conditions Thomas Huth (6): s390x/ioinst: Add missing alignment checks for IO instructions s390x/ioinst: Throw addressing exception when memory_map failed s390x/ioinst: Fixed alignment check in SCHM instruction s390x/ioinst: Fixed priority of operand exceptions s390x/kvm: Reworked/fixed handling of cc3 in kvm_handle_css_inst() s390x/kvm: Remove redundant return code hw/char/sclpconsole.c | 18 +- target-s390x/ioinst.c | 65 ++- target-s390x/kvm.c| 64 +++--- 3 files changed, 57 insertions(+), 90 deletions(-) Ping?
Re: [Qemu-devel] [PATCH V6 0/3] Implement sync modes for drive-backup.
On Mon, Jul 22, 2013 at 03:09:17PM -0700, Ian Main wrote: This patch adds sync modes on top of the work that Stefan Hajnoczi has done. These patches apply on kevin/block. Hopefully all is in order as this is my first QEMU patch. Many thanks to Stephan and Fam Zheng for their help. V2: - No longer poll, instead use qemu_coroutine_yield(). - Use bdrv_co_is_allocated(). - Much better SYNC_MODE_NONE test. V3: - A few style fixes. - Better commit message explaining how TOP and NONE operate. - Verified using checkpatch.pl. V4: - Add patch to use the source as a backing hd during backup. - Add patch to default sync mode none to qcow2. V5: - Fix qcow2 patch. Forgot to git add final version. V6: - Default to requiring 'format' when mode is absolute-paths. - Removed one bad hunk that was misapplying. - Fixed docs, examples and tests to match changes. - Added tests for format bad/missing. - Added bdrv_set_in_use() to target. - Default to qcow2 patch not required. Ian Main (3): Implement sync modes for drive-backup. Add tests for sync modes 'TOP' and 'NONE' Add backing drive while performing backup. block/backup.c| 107 + blockdev.c| 36 +- include/block/block_int.h | 4 +- qapi-schema.json | 4 +- qmp-commands.hx | 2 + tests/qemu-iotests/055| 108 +- tests/qemu-iotests/055.out| 4 +- tests/qemu-iotests/group | 2 +- tests/qemu-iotests/iotests.py | 5 ++ 9 files changed, 211 insertions(+), 61 deletions(-) This patch mostly takes care of image fleecing except it does not give the target a device name which can be used by nbd-server-add. Fam's series tackles the target device name and some of the overlapping problems with your series. The core feature in your series is sync=top|none and that needs to be merged. Now we need to figure out which patches to take and what must be changed. Please see the sub-threads on Fam's series. Perhaps we can reach a consensus there. Stefan
[Qemu-devel] [PATCH v2] spice: fix display initialization
Spice has two display interface implementations: One integrated into the qxl graphics card, and one generic which can operate with every qemu-emulated graphics card. The generic one is activated in case spice is used without qxl. The logic for that only caught the -vga qxl case, -device qxl-vga goes unnoticed. Fix that by adding a check in the spice interface registration so we'll notice the qxl card no matter how it is created. https://bugzilla.redhat.com/show_bug.cgi?id=981094 Signed-off-by: Gerd Hoffmann kra...@redhat.com --- include/sysemu/sysemu.h |1 - include/ui/qemu-spice.h |2 ++ ui/spice-core.c |5 + vl.c|2 +- 4 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 3caeb66..d7a77b6 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -103,7 +103,6 @@ typedef enum { extern int vga_interface_type; #define xenfb_enabled (vga_interface_type == VGA_XENFB) -#define qxl_enabled (vga_interface_type == VGA_QXL) extern int graphic_width; extern int graphic_height; diff --git a/include/ui/qemu-spice.h b/include/ui/qemu-spice.h index eba6d77..c6c756b 100644 --- a/include/ui/qemu-spice.h +++ b/include/ui/qemu-spice.h @@ -27,6 +27,7 @@ #include monitor/monitor.h extern int using_spice; +extern int spice_displays; void qemu_spice_init(void); void qemu_spice_input_init(void); @@ -57,6 +58,7 @@ static inline CharDriverState *qemu_chr_open_spice_port(const char *name) #include monitor/monitor.h #define using_spice 0 +#define spice_displays 0 static inline int qemu_spice_set_passwd(const char *passwd, bool fail_if_connected, bool disconnect_if_connected) diff --git a/ui/spice-core.c b/ui/spice-core.c index f308fd9..f0da673 100644 --- a/ui/spice-core.c +++ b/ui/spice-core.c @@ -48,6 +48,7 @@ static char *auth_passwd; static time_t auth_expires = TIME_MAX; static int spice_migration_completed; int using_spice = 0; +int spice_displays; static QemuThread me; @@ -836,6 +837,10 @@ int qemu_spice_add_interface(SpiceBaseInstance *sin) qemu_add_vm_change_state_handler(vm_change_state_handler, NULL); } +if (strcmp(sin-sif-type, SPICE_INTERFACE_QXL) == 0) { +spice_displays++; +} + return spice_server_add_interface(spice_server, sin); } diff --git a/vl.c b/vl.c index 25b8f2f..f422a1c 100644 --- a/vl.c +++ b/vl.c @@ -4387,7 +4387,7 @@ int main(int argc, char **argv, char **envp) } #endif #ifdef CONFIG_SPICE -if (using_spice !qxl_enabled) { +if (using_spice !spice_displays) { qemu_spice_display_init(ds); } #endif -- 1.7.9.7
Re: [Qemu-devel] [RFC PATCH v1 1/2] gluster: Use pkg-config to configure GlusterFS block driver
On Fri, Jul 12, 2013 at 12:28:54PM +0530, Bharata B Rao wrote: gluster: Use pkg-config to configure GlusterFS block driver Use pkg-config to determine the version and library dependency for GlusterFS block driver. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- configure | 20 +++- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/configure b/configure index cb0f870..76adcb1 100755 --- a/configure +++ b/configure @@ -2566,23 +2566,17 @@ fi ## # glusterfs probe if test $glusterfs != no ; then - cat $TMPC EOF -#include glusterfs/api/glfs.h -int main(void) { -(void) glfs_new(volume); -return 0; -} -EOF - glusterfs_libs=-lgfapi -lgfrpc -lgfxdr - if compile_prog $glusterfs_libs ; then -glusterfs=yes -libs_tools=$glusterfs_libs $libs_tools -libs_softmmu=$glusterfs_libs $libs_softmmu + if $pkg_config --atleast-version=3 glusterfs-api /dev/null 21; then +glusterfs=yes +glusterfs_cflags=`$pkg_config --cflags glusterfs-api 2/dev/null` +glusterfs_libs=`$pkg_config --libs glusterfs-api 2/dev/null` +CFLAGS=$CFLAGS $glusterfs_cflags +LIBS=$LIBS $glusterfs_libs The glusterfs v 3.4 RPMs in Fedora do not include any pkg-config files. So with this change now in GIT, QEMU no longer detects support for glusterfs even though it is present. Has the min required glusterfs been increased to a new 3.5 version which does include pkg-config support ? If not, then I think this patch needs to be reverted, so that it does a non-pkg-config based check for glusterfs. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [RFC PATCH v1 1/2] gluster: Use pkg-config to configure GlusterFS block driver
On 7/23/13 4:57 AM, Daniel P. Berrange wrote: On Fri, Jul 12, 2013 at 12:28:54PM +0530, Bharata B Rao wrote: gluster: Use pkg-config to configure GlusterFS block driver Use pkg-config to determine the version and library dependency for GlusterFS block driver. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- configure | 20 +++- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/configure b/configure index cb0f870..76adcb1 100755 --- a/configure +++ b/configure @@ -2566,23 +2566,17 @@ fi ## # glusterfs probe if test $glusterfs != no ; then - cat $TMPC EOF -#include glusterfs/api/glfs.h -int main(void) { -(void) glfs_new(volume); -return 0; -} -EOF - glusterfs_libs=-lgfapi -lgfrpc -lgfxdr - if compile_prog $glusterfs_libs ; then -glusterfs=yes -libs_tools=$glusterfs_libs $libs_tools -libs_softmmu=$glusterfs_libs $libs_softmmu + if $pkg_config --atleast-version=3 glusterfs-api /dev/null 21; then +glusterfs=yes +glusterfs_cflags=`$pkg_config --cflags glusterfs-api 2/dev/null` +glusterfs_libs=`$pkg_config --libs glusterfs-api 2/dev/null` +CFLAGS=$CFLAGS $glusterfs_cflags +LIBS=$LIBS $glusterfs_libs The glusterfs v 3.4 RPMs in Fedora do not include any pkg-config files. So with this change now in GIT, QEMU no longer detects support for glusterfs even though it is present. Has the min required glusterfs been increased to a new 3.5 version which does include pkg-config support ? If not, then I think this patch needs to be reverted, so that it does a non-pkg-config based check for glusterfs. Regards, Daniel Copying Kaleb. We should just include the pkg-config file in the Fedora RPM for glusterfs if it already isn't. Avati
Re: [Qemu-devel] [RFC PATCH v1 1/2] gluster: Use pkg-config to configure GlusterFS block driver
On Tue, Jul 23, 2013 at 05:02:20AM -0700, Anand Avati wrote: On 7/23/13 4:57 AM, Daniel P. Berrange wrote: On Fri, Jul 12, 2013 at 12:28:54PM +0530, Bharata B Rao wrote: gluster: Use pkg-config to configure GlusterFS block driver Use pkg-config to determine the version and library dependency for GlusterFS block driver. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- configure | 20 +++- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/configure b/configure index cb0f870..76adcb1 100755 --- a/configure +++ b/configure @@ -2566,23 +2566,17 @@ fi ## # glusterfs probe if test $glusterfs != no ; then - cat $TMPC EOF -#include glusterfs/api/glfs.h -int main(void) { -(void) glfs_new(volume); -return 0; -} -EOF - glusterfs_libs=-lgfapi -lgfrpc -lgfxdr - if compile_prog $glusterfs_libs ; then -glusterfs=yes -libs_tools=$glusterfs_libs $libs_tools -libs_softmmu=$glusterfs_libs $libs_softmmu + if $pkg_config --atleast-version=3 glusterfs-api /dev/null 21; then +glusterfs=yes +glusterfs_cflags=`$pkg_config --cflags glusterfs-api 2/dev/null` +glusterfs_libs=`$pkg_config --libs glusterfs-api 2/dev/null` +CFLAGS=$CFLAGS $glusterfs_cflags +LIBS=$LIBS $glusterfs_libs The glusterfs v 3.4 RPMs in Fedora do not include any pkg-config files. So with this change now in GIT, QEMU no longer detects support for glusterfs even though it is present. Has the min required glusterfs been increased to a new 3.5 version which does include pkg-config support ? If not, then I think this patch needs to be reverted, so that it does a non-pkg-config based check for glusterfs. Regards, Daniel Copying Kaleb. We should just include the pkg-config file in the Fedora RPM for glusterfs if it already isn't. That doesn't help anyone trying to build QEMU with gluster support on all the existing released distros which lack the pkg-config files. If you really want a pkg-config file check for glusterfs in QEMU, then it must at least fallback to probing the non-pkg-config way to support existing deployed distros. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] RFC [PATCH] Make bdrv_flush synchronous only and update callers
On Fri, Jul 19, 2013 at 10:37:11AM +0200, Kevin Wolf wrote: Am 19.07.2013 um 07:27 hat Stefan Hajnoczi geschrieben: On Thu, Jul 18, 2013 at 11:21:42PM +0200, Charlie Shepherd wrote: This patch makes bdrv_flush a synchronous function and updates any callers from a coroutine context to use bdrv_co_flush instead. The motivation for this patch comes from the GSoC Continuation-Passing C project. When coroutines were introduced, synchronous functions in the block layer were converted to use asynchronous methods by dynamically detecting if they were being run from a coroutine context by calling qemu_in_coroutine(), and yielding if so. If they were not, they would spawn a new coroutine and poll until the asynchronous counterpart finished. However this approach does not work with CPC as the CPC translator converts all functions annotated coroutine_fn to a different (continuation based) calling convention. This means that coroutine_fn annotated functions cannot be called from a non-coroutine context. This patch is a Request For Comments on the approach of splitting these dynamic functions into synchronous and asynchronous versions. This is easy for bdrv_flush as it already has an asynchronous counterpart - bdrv_co_flush. The only caller of bdrv_flush from a coroutine context is mirror_drain in block/mirror.c - this should be annotated as a coroutine_fn as it calls qemu_coroutine_yield(). If this approach meets with approval I will develop a patchset splitting the other dynamic functions in the block layer. This will allow all coroutine functions to have a coroutine_fn annotation that can be statically checked (CPC can be used to verify annotations). I have audited the other callers of bdrv_flush, they are included below: block.c: bdrv_reopen_prepare, bdrv_close, bdrv_commit, bdrv_pwrite_sync bdrv_pwrite_sync() is a dynamic function. If called from coroutine context it will yield (indirectly from bdrv_pwrite() or bdrv_flush()). block/qcow2-cache.c: qcow2_cache_entry_flush, qcow2_cache_flush block/qcow2-refcount.c: qcow2_update_snapshot_refcount block/qcow2-snapshot.c: qcow2_write_snapshots block/qcow2.c: qcow2_mark_dirty, qcow2_mark_clean qcow2 runs in coroutine context, the coroutine_fn annotations are just missing. See block/qcow2.c:bdrv_qcow2 for the entry points like qcow2_co_readv. Yes, you can't rely on coroutine_fn, it's missing in many places where it should be there. But that was still the optimistic view. The truth is that the greatest part of the qcow2 functions can be called from eiher coroutine or non-coroutine context. You get coroutine context for read/write/discard/flush, but anything else like doing snapshots, resizing, preallocating the image, writing compressed data also accesses the same metadata management functions outside coroutines. It's only getting worse for function like bdrv_pwrite(). A built-time check for coroutine_fn would be valuable if we ever hope to get disciplined with this annotation. The check can detect when a coroutine_fn is invoked outside coroutine context. I wonder if Coccinelle can detect this, although I never figured out how to use it as a grep-like tool instead of just a patch-like tool. Stefan
Re: [Qemu-devel] RFC [PATCH] Make bdrv_flush synchronous only and update callers
On Tue, Jul 23, 2013 at 02:05:15PM +0200, Stefan Hajnoczi wrote: A built-time check for coroutine_fn would be valuable if we ever hope to get disciplined with this annotation. The check can detect when a coroutine_fn is invoked outside coroutine context. I wonder if Coccinelle can detect this, although I never figured out how to use it as a grep-like tool instead of just a patch-like tool. The recent cps-inference branch of CPC enables precisely that kind of check. Charlie is using it to drive his modifications to QEMU and has already suggested several improvements that I have implemented. Hopefully we should reach something fully covering QEMU by the end his GSoC. If there is interest, I can post a script showing how to build it and use it to check QEMU annotations (it does not require any modifications to QEMU, only a couple of configure switches). Best regards, -- Gabriel
Re: [Qemu-devel] [RFC PATCH v1 1/2] gluster: Use pkg-config to configure GlusterFS block driver
On Tue, Jul 23, 2013 at 05:37:54PM +0530, Kaleb KEITHLEY wrote: On 07/23/2013 05:32 PM, Anand Avati wrote: On 7/23/13 4:57 AM, Daniel P. Berrange wrote: On Fri, Jul 12, 2013 at 12:28:54PM +0530, Bharata B Rao wrote: gluster: Use pkg-config to configure GlusterFS block driver Use pkg-config to determine the version and library dependency for GlusterFS block driver. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- configure | 20 +++- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/configure b/configure index cb0f870..76adcb1 100755 --- a/configure +++ b/configure @@ -2566,23 +2566,17 @@ fi ## # glusterfs probe if test $glusterfs != no ; then - cat $TMPC EOF -#include glusterfs/api/glfs.h -int main(void) { -(void) glfs_new(volume); -return 0; -} -EOF - glusterfs_libs=-lgfapi -lgfrpc -lgfxdr - if compile_prog $glusterfs_libs ; then -glusterfs=yes -libs_tools=$glusterfs_libs $libs_tools -libs_softmmu=$glusterfs_libs $libs_softmmu + if $pkg_config --atleast-version=3 glusterfs-api /dev/null 21; then +glusterfs=yes +glusterfs_cflags=`$pkg_config --cflags glusterfs-api 2/dev/null` +glusterfs_libs=`$pkg_config --libs glusterfs-api 2/dev/null` +CFLAGS=$CFLAGS $glusterfs_cflags +LIBS=$LIBS $glusterfs_libs The glusterfs v 3.4 RPMs in Fedora do not include any pkg-config files. So with this change now in GIT, QEMU no longer detects support for glusterfs even though it is present. Has the min required glusterfs been increased to a new 3.5 version which does include pkg-config support ? If not, then I think this patch needs to be reverted, so that it does a non-pkg-config based check for glusterfs. Regards, Daniel Copying Kaleb. We should just include the pkg-config file in the Fedora RPM for glusterfs if it already isn't. It's in the glusterfs-api-devel rpm: % rpm -ql glusterfs-api-devel /usr/include/glusterfs/api/glfs.h /usr/lib64/libgfapi.so /usr/lib64/pkgconfig/glusterfs-api.pc Oooh, not the main glusterfs-devel RPM. Ok, ignore my earlier message Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
[Qemu-devel] [PATCH V2 for-1.6 0/5] Leaky bucket throttling and features
The first patch fixes the throttling which was broken by a previous commit. The next patch replace the existing throttling algorithm by the well described leaky bucket algorithm. Third patch implement bursting by adding *_threshold parameters to qmp_block_set_io_throttle. The last one allow to define the max size of an io when throttling by iops via iops_sector_count to avoid vm users cheating on the iops limit. The last patch adds a metric reflecting how much the I/O are throttled. since v1: Add throttling percentage metric [Benoît] Benoît Canet (5): block: Repair the throttling code. block: Modify the throttling code to implement the leaky bucket algorithm. block: Add support for throttling burst threshold in QMP and the command line. block: Add iops_sector_count to do the iops accounting for a given io size. block: Add throttling percentage metrics. block.c | 439 ++--- block/qapi.c | 32 blockdev.c| 174 -- hmp.c | 38 +++- include/block/block_int.h | 18 +- include/block/coroutine.h |5 + qapi-schema.json | 42 - qemu-coroutine-lock.c | 14 ++ qemu-options.hx |2 +- qmp-commands.hx | 34 +++- 10 files changed, 586 insertions(+), 212 deletions(-) -- 1.7.10.4
[Qemu-devel] [PATCH V2 for-1.6 1/5] block: Repair the throttling code.
The throttling code was segfaulting since commit 02ffb504485f0920cfc75a0982a602f824a9a4f4 because some qemu_co_queue_next caller does not run in a coroutine. qemu_co_queue_do_restart assume that the caller is a coroutinne. As sugested by Stefan fix this by entering the coroutine directly. Signed-off-by: Benoit Canet ben...@irqsave.net --- block.c |8 include/block/coroutine.h |5 + qemu-coroutine-lock.c | 14 ++ 3 files changed, 23 insertions(+), 4 deletions(-) diff --git a/block.c b/block.c index b560241..dc72643 100644 --- a/block.c +++ b/block.c @@ -127,7 +127,8 @@ void bdrv_io_limits_disable(BlockDriverState *bs) { bs-io_limits_enabled = false; -while (qemu_co_queue_next(bs-throttled_reqs)); +while (qemu_co_enter_next(bs-throttled_reqs)) { +} if (bs-block_timer) { qemu_del_timer(bs-block_timer); @@ -143,7 +144,7 @@ static void bdrv_block_timer(void *opaque) { BlockDriverState *bs = opaque; -qemu_co_queue_next(bs-throttled_reqs); +qemu_co_enter_next(bs-throttled_reqs); } void bdrv_io_limits_enable(BlockDriverState *bs) @@ -1445,8 +1446,7 @@ void bdrv_drain_all(void) * a busy wait. */ QTAILQ_FOREACH(bs, bdrv_states, list) { -if (!qemu_co_queue_empty(bs-throttled_reqs)) { -qemu_co_queue_restart_all(bs-throttled_reqs); +while (qemu_co_enter_next(bs-throttled_reqs)) { busy = true; } } diff --git a/include/block/coroutine.h b/include/block/coroutine.h index 377805a..66e331c 100644 --- a/include/block/coroutine.h +++ b/include/block/coroutine.h @@ -138,6 +138,11 @@ bool qemu_co_queue_next(CoQueue *queue); void qemu_co_queue_restart_all(CoQueue *queue); /** + * Enter the next coroutine in the queue + */ +bool qemu_co_enter_next(CoQueue *queue); + +/** * Checks if the CoQueue is empty. */ bool qemu_co_queue_empty(CoQueue *queue); diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c index d9fea49..c61d300 100644 --- a/qemu-coroutine-lock.c +++ b/qemu-coroutine-lock.c @@ -98,6 +98,20 @@ void qemu_co_queue_restart_all(CoQueue *queue) qemu_co_queue_do_restart(queue, false); } +bool qemu_co_enter_next(CoQueue *queue) +{ +Coroutine *next; + +next = QTAILQ_FIRST(queue-entries); +if (!next) { +return false; +} + +QTAILQ_REMOVE(queue-entries, next, co_queue_next); +qemu_coroutine_enter(next, NULL); +return true; +} + bool qemu_co_queue_empty(CoQueue *queue) { return (QTAILQ_FIRST(queue-entries) == NULL); -- 1.7.10.4
Re: [Qemu-devel] [PATCH 0/9] Add platform bus
Il 22/07/2013 20:21, Peter Maydell ha scritto: Platforms without ISA and/or PCI have had a seriously hard time in the dynamic device creation world of QEMU. Devices on these were modeled as SysBus devices which can only be instantiated in machine files, not through -device. Why is that so? Because you can't as a user of this sort of hardware plug in an extra serial port to a SoC, because there's just nowhere to plug it in. So why should it be possible to plug an extra serial port into the QEMU model of the SoC? And why exactly should QEMU be limited to modeling an existing SoC? Perhaps the user is not working with an existing SoC. They are working with with IP building blocks that they can combine the way they prefer, and they haven't yet made up their mind on the exact set of devices they'll have. (because not all the world is a PC, but then not all the non-PC world is ARM either). Perhaps the user is working on kernel support for device tree / ACPI, wants to test many device combinations, and does not want to touch C code in order to do that. Perhaps the user can plug daughterboards that connect to the SoC and add an extra serial port, visible as yet another MMIO device. And why should the user of QEMU have to know very low hardware level detail like what a particular devboard's IRQ and memory map are? Of course in some scenarios the user of QEMU doesn't need them, but there are many kinds of users... Paolo
[Qemu-devel] [PATCH V2 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line.
The thresholds of the leaky bucket algorithm can be used to allow some burstiness. Signed-off-by: Benoit Canet ben...@irqsave.net --- block/qapi.c | 24 + blockdev.c | 105 +++--- hmp.c| 32 +++-- qapi-schema.json | 34 -- qemu-options.hx |2 +- qmp-commands.hx | 30 ++-- 6 files changed, 205 insertions(+), 22 deletions(-) diff --git a/block/qapi.c b/block/qapi.c index a4bc411..03f1604 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -235,6 +235,30 @@ void bdrv_query_info(BlockDriverState *bs, bs-io_limits.iops[BLOCK_IO_LIMIT_READ]; info-inserted-iops_wr = bs-io_limits.iops[BLOCK_IO_LIMIT_WRITE]; +info-inserted-has_bps_threshold = + bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL]; +info-inserted-bps_threshold = + bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL]; +info-inserted-has_bps_rd_threshold = + bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_READ]; +info-inserted-bps_rd_threshold = + bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_READ]; +info-inserted-has_bps_wr_threshold = + bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE]; +info-inserted-bps_wr_threshold = + bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE]; +info-inserted-has_iops_threshold = + bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL]; +info-inserted-iops_threshold = + bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL]; +info-inserted-iops_rd_threshold = + bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_READ]; +info-inserted-has_iops_rd_threshold = + bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_READ]; +info-inserted-has_iops_wr_threshold = + bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE]; +info-inserted-iops_wr_threshold = + bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE]; } bs0 = bs; diff --git a/blockdev.c b/blockdev.c index a78fba4..9bda359 100644 --- a/blockdev.c +++ b/blockdev.c @@ -341,6 +341,17 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp) return false; } +if (io_limits-bps_threshold[BLOCK_IO_LIMIT_TOTAL] 0 || +io_limits-bps_threshold[BLOCK_IO_LIMIT_WRITE] 0 || +io_limits-bps_threshold[BLOCK_IO_LIMIT_READ] 0 || +io_limits-iops_threshold[BLOCK_IO_LIMIT_TOTAL] 0 || +io_limits-iops_threshold[BLOCK_IO_LIMIT_WRITE] 0 || +io_limits-iops_threshold[BLOCK_IO_LIMIT_READ] 0) { +error_setg(errp, + threshold values must be 0 or greater); +return false; +} + return true; } @@ -523,24 +534,34 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) qemu_opt_get_number(opts, bps_rd, 0); io_limits.bps[BLOCK_IO_LIMIT_WRITE] = qemu_opt_get_number(opts, bps_wr, 0); +/* bps thresholds */ +io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] = +qemu_opt_get_number(opts, bps_threshold, +io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ); +io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] = +qemu_opt_get_number(opts, bps_rd_threshold, +io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ); +io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] = +qemu_opt_get_number(opts, bps_wr_threshold, +io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ); + io_limits.iops[BLOCK_IO_LIMIT_TOTAL] = qemu_opt_get_number(opts, iops, 0); io_limits.iops[BLOCK_IO_LIMIT_READ] = qemu_opt_get_number(opts, iops_rd, 0); io_limits.iops[BLOCK_IO_LIMIT_WRITE] = qemu_opt_get_number(opts, iops_wr, 0); -io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] = - io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ; -io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] = - io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ; -io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] = - io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ; -io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = - io_limits.iops[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ; + +/* iops thresholds */ +io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = +qemu_opt_get_number(opts, iops_threshold, +
[Qemu-devel] [PATCH V2 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm.
This patch replace the previous algorithm by the well described leaky bucket algorithm: A bucket is filled by the incoming IOs and a periodic timer decrement the counter to make the bucket leak. When a given threshold is reached the bucket is full and the IOs are hold. In this patch the threshold is set to a default value to make the code behave like the previous implementation. In the next patch the threshold will be exposed in QMP to let the user control the burstiness of the throttling. Signed-off-by: Benoit Canet ben...@irqsave.net --- block.c | 410 + blockdev.c| 71 ++-- include/block/block_int.h | 15 +- 3 files changed, 299 insertions(+), 197 deletions(-) diff --git a/block.c b/block.c index dc72643..2d6e9b4 100644 --- a/block.c +++ b/block.c @@ -86,13 +86,6 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque); static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, int64_t sector_num, int nb_sectors); -static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors, -bool is_write, double elapsed_time, uint64_t *wait); -static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write, -double elapsed_time, uint64_t *wait); -static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors, -bool is_write, int64_t *wait); - static QTAILQ_HEAD(, BlockDriverState) bdrv_states = QTAILQ_HEAD_INITIALIZER(bdrv_states); @@ -101,6 +94,8 @@ static QLIST_HEAD(, BlockDriver) bdrv_drivers = /* If non-zero, use only whitelisted block drivers */ static int use_bdrv_whitelist; +/* boolean used to inform the throttling code that a bdrv_drain_all is issued */ +static bool draining; #ifdef _WIN32 static int is_windows_drive_prefix(const char *filename) @@ -135,15 +130,122 @@ void bdrv_io_limits_disable(BlockDriverState *bs) qemu_free_timer(bs-block_timer); bs-block_timer = NULL; } +} -bs-slice_start = 0; -bs-slice_end = 0; +static void bdrv_make_bps_buckets_leak(BlockDriverState *bs, int64_t delta) +{ +int64_t *bytes = bs-leaky_buckets.bytes; +int64_t read_leak, write_leak; + +/* the limit apply to both reads and writes */ +if (bs-io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) { +/* compute half the total leak */ +int64_t leak = ((bs-io_limits.bps[BLOCK_IO_LIMIT_TOTAL] * delta) / + NANOSECONDS_PER_SECOND); +int remain = leak % 2; +leak /= 2; + +/* the read bucket is smaller than half the quantity to leak so take + * care adding the leak difference to write leak + */ +if (bytes[BLOCK_IO_LIMIT_READ] = leak) { +read_leak = bytes[BLOCK_IO_LIMIT_READ]; +write_leak = 2 * leak + remain - bytes[BLOCK_IO_LIMIT_READ]; +/* symetric case */ +} else if (bytes[BLOCK_IO_LIMIT_WRITE] = leak) { +write_leak = bytes[BLOCK_IO_LIMIT_WRITE]; +read_leak = 2 * leak + remain - bytes[BLOCK_IO_LIMIT_WRITE]; +/* both bucket above leak count use half the total leak for both */ +} else { +write_leak = leak; +read_leak = leak + remain; +} +/* else we consider that limits are separated */ +} else { +read_leak = (bs-io_limits.bps[BLOCK_IO_LIMIT_READ] * delta) / +NANOSECONDS_PER_SECOND; +write_leak = (bs-io_limits.bps[BLOCK_IO_LIMIT_WRITE] * delta) / + NANOSECONDS_PER_SECOND; +} + +/* make the buckets leak */ +bytes[BLOCK_IO_LIMIT_READ] = MAX(bytes[BLOCK_IO_LIMIT_READ] - read_leak, + 0); +bytes[BLOCK_IO_LIMIT_WRITE] = MAX(bytes[BLOCK_IO_LIMIT_WRITE] - write_leak, + 0); } +static void bdrv_make_iops_buckets_leak(BlockDriverState *bs, int64_t delta) +{ +double *ios = bs-leaky_buckets.ios; +int64_t read_leak, write_leak; + +/* the limit apply to both reads and writes */ +if (bs-io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) { +/* compute half the total leak */ +int64_t leak = ((bs-io_limits.iops[BLOCK_IO_LIMIT_TOTAL] * delta) / + NANOSECONDS_PER_SECOND); +int remain = leak % 2; +leak /= 2; + +/* the read bucket is smaller than half the quantity to leak so take + * care adding the leak difference to write leak + */ +if (ios[BLOCK_IO_LIMIT_READ] = leak) { +read_leak = ios[BLOCK_IO_LIMIT_READ]; +write_leak = 2 * leak + remain - ios[BLOCK_IO_LIMIT_READ]; +/* symetric case */ +} else if (ios[BLOCK_IO_LIMIT_WRITE] = leak) { +write_leak = ios[BLOCK_IO_LIMIT_WRITE]; +read_leak = 2 * leak + remain - ios[BLOCK_IO_LIMIT_WRITE]; +/* both bucket above leak count use half the total leak for both */ +} else { +write_leak
[Qemu-devel] [PATCH V2 for-1.6 4/5] block: Add iops_sector_count to do the iops accounting for a given io size.
This feature can be used in case where users are avoiding the iops limit by doing jumbo I/Os hammering the storage backend. Signed-off-by: Benoit Canet ben...@irqsave.net --- block.c |8 +++- block/qapi.c |4 blockdev.c| 22 +- hmp.c |8 ++-- include/block/block_int.h |1 + qapi-schema.json | 10 -- qemu-options.hx |2 +- qmp-commands.hx |8 ++-- 8 files changed, 54 insertions(+), 9 deletions(-) diff --git a/block.c b/block.c index 2d6e9b4..cfa890d 100644 --- a/block.c +++ b/block.c @@ -337,6 +337,12 @@ static bool bdrv_is_any_threshold_exceeded(BlockDriverState *bs, int nb_sectors, bool is_write) { bool bps_ret, iops_ret; +double ios = 1.0; + +if (bs-io_limits.iops_sector_count) { +ios = ((double) nb_sectors) / bs-io_limits.iops_sector_count; +ios = MAX(ios, 1.0); +} /* check if any bandwith or per IO threshold has been exceeded */ bps_ret = bdrv_is_bps_threshold_exceeded(bs, is_write); @@ -360,7 +366,7 @@ static bool bdrv_is_any_threshold_exceeded(BlockDriverState *bs, int nb_sectors, /* the IO is authorized so do the accounting and return false */ bs-leaky_buckets.bytes[is_write] += (int64_t)nb_sectors * BDRV_SECTOR_SIZE; -bs-leaky_buckets.ios[is_write]++; +bs-leaky_buckets.ios[is_write] += ios; return false; } diff --git a/block/qapi.c b/block/qapi.c index 03f1604..f81081c 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -259,6 +259,10 @@ void bdrv_query_info(BlockDriverState *bs, bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE]; info-inserted-iops_wr_threshold = bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE]; +info-inserted-has_iops_sector_count = + bs-io_limits.iops_sector_count; +info-inserted-iops_sector_count = + bs-io_limits.iops_sector_count; } bs0 = bs; diff --git a/blockdev.c b/blockdev.c index 9bda359..8ddd710 100644 --- a/blockdev.c +++ b/blockdev.c @@ -352,6 +352,11 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp) return false; } +if (io_limits-iops_sector_count 0) { +error_setg(errp, iops_sector_count must be 0 or greater); +return false; +} + return true; } @@ -563,6 +568,10 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type) qemu_opt_get_number(opts, iops_wr_threshold, io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ); +io_limits.iops_sector_count = + qemu_opt_get_number(opts, iops_sector_count, 0); + + if (!do_check_io_limits(io_limits, error)) { error_report(%s, error_get_pretty(error)); error_free(error); @@ -1260,7 +1269,9 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd, bool has_iops_rd_threshold, int64_t iops_rd_threshold, bool has_iops_wr_threshold, - int64_t iops_wr_threshold, Error **errp) + int64_t iops_wr_threshold, + bool has_iops_sector_count, + int64_t iops_sector_count, Error **errp) { BlockIOLimit io_limits; BlockDriverState *bs; @@ -1283,6 +1294,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd, io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = iops / THROTTLE_HZ; io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] = iops_rd / THROTTLE_HZ; io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr / THROTTLE_HZ; +io_limits.iops_sector_count = 0; /* override them with givens values if present */ if (has_bps_threshold) { @@ -1304,6 +1316,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd, io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr_threshold; } +if (has_iops_sector_count) { +io_limits.iops_sector_count = iops_sector_count; +} + if (!do_check_io_limits(io_limits, errp)) { return; } @@ -2007,6 +2023,10 @@ QemuOptsList qemu_common_drive_opts = { .type = QEMU_OPT_NUMBER, .help = write bytes threshold, },{ +.name = iops_sector_count, +.type = QEMU_OPT_NUMBER, +.help = when limiting by iops max size of an I/O in sector, +},{ .name = copy-on-read, .type = QEMU_OPT_BOOL, .help = copy read data from backing file into image file, diff --git a/hmp.c b/hmp.c index
[Qemu-devel] [PATCH V2 for-1.6 5/5] block: Add throttling percentage metrics.
This metrics show how many percent of the time I/Os are put on hold by the throttling algorithm. This metric could be used by system administrators or cloud stack developpers to decide when the throttling settings must be changed. Signed-off-by: Benoit Canet ben...@irqsave.net --- block.c | 17 - block/qapi.c |4 hmp.c |6 -- include/block/block_int.h |2 ++ qapi-schema.json |4 +++- 5 files changed, 29 insertions(+), 4 deletions(-) diff --git a/block.c b/block.c index cfa890d..749f276 100644 --- a/block.c +++ b/block.c @@ -130,6 +130,8 @@ void bdrv_io_limits_disable(BlockDriverState *bs) qemu_free_timer(bs-block_timer); bs-block_timer = NULL; } +bs-throttling_percentage = 0; +bs-full_since = 0; } static void bdrv_make_bps_buckets_leak(BlockDriverState *bs, int64_t delta) @@ -219,7 +221,8 @@ static void bdrv_make_iops_buckets_leak(BlockDriverState *bs, int64_t delta) static void bdrv_leak_if_needed(BlockDriverState *bs) { int64_t now; -int64_t delta; +int64_t delta; /* the delta that would be ideally the timer period */ +int64_t delta_full; /* the delta where the bucket is full */ if (!bs-must_leak) { return; @@ -229,6 +232,11 @@ static void bdrv_leak_if_needed(BlockDriverState *bs) now = qemu_get_clock_ns(rt_clock); delta = now - bs-previous_leak; +/* compute throttle percentage reflecting how long IO are hold on average */ +if (bs-full_since) { +delta_full = now - bs-full_since; +bs-throttling_percentage = (delta_full * 100) / delta; +} bs-previous_leak = now; bdrv_make_bps_buckets_leak(bs, delta); @@ -255,6 +263,8 @@ void bdrv_io_limits_enable(BlockDriverState *bs) bs-block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs); bs-io_limits_enabled = true; bs-previous_leak = qemu_get_clock_ns(rt_clock); +bs-throttling_percentage = 0; +bs-full_since = 0; qemu_mod_timer(bs-block_timer, qemu_get_clock_ns(vm_clock) + BLOCK_IO_THROTTLE_PERIOD); @@ -396,6 +406,11 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs, * not full */ while (bdrv_is_any_threshold_exceeded(bs, nb_sectors, is_write)) { +/* remember since when the code decided to block the first I/O */ +if (qemu_co_queue_empty(bs-throttled_reqs)) { +bs-full_since = qemu_get_clock_ns(rt_clock); +} + bdrv_leak_if_needed(bs); qemu_co_queue_wait_insert_head(bs-throttled_reqs); bdrv_leak_if_needed(bs); diff --git a/block/qapi.c b/block/qapi.c index f81081c..bd1c6af 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -263,6 +263,10 @@ void bdrv_query_info(BlockDriverState *bs, bs-io_limits.iops_sector_count; info-inserted-iops_sector_count = bs-io_limits.iops_sector_count; +info-inserted-has_throttling_percentage = + bs-throttling_percentage; +info-inserted-throttling_percentage = + bs-throttling_percentage; } bs0 = bs; diff --git a/hmp.c b/hmp.c index 3912305..9dc4862 100644 --- a/hmp.c +++ b/hmp.c @@ -348,7 +348,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict) iops_threshold=% PRId64 iops_rd_threshold=% PRId64 iops_wr_threshold=% PRId64 - iops_sector_count=% PRId64 \n, + iops_sector_count=% PRId64 + throttling_percentage=% PRId64 \n, info-value-inserted-bps, info-value-inserted-bps_rd, info-value-inserted-bps_wr, @@ -361,7 +362,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict) info-value-inserted-iops_threshold, info-value-inserted-iops_rd_threshold, info-value-inserted-iops_wr_threshold, -info-value-inserted-iops_sector_count); +info-value-inserted-iops_sector_count, +info-value-inserted-throttling_percentage); } else { monitor_printf(mon, [not inserted]); } diff --git a/include/block/block_int.h b/include/block/block_int.h index 74d7503..4487cd9 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -271,6 +271,8 @@ struct BlockDriverState { BlockIOLimit io_limits; BlockIOBaseValue leaky_buckets; int64_t previous_leak; +int64_t full_since; +int throttling_percentage; bool must_leak; CoQueue throttled_reqs; QEMUTimer*block_timer; diff --git
Re: [Qemu-devel] [PATCH 0/9] Add platform bus
On 23 July 2013 13:19, Paolo Bonzini pbonz...@redhat.com wrote: Il 22/07/2013 20:21, Peter Maydell ha scritto: Platforms without ISA and/or PCI have had a seriously hard time in the dynamic device creation world of QEMU. Devices on these were modeled as SysBus devices which can only be instantiated in machine files, not through -device. Why is that so? Because you can't as a user of this sort of hardware plug in an extra serial port to a SoC, because there's just nowhere to plug it in. So why should it be possible to plug an extra serial port into the QEMU model of the SoC? And why exactly should QEMU be limited to modeling an existing SoC? Perhaps the user is not working with an existing SoC. They are working with with IP building blocks that they can combine the way they prefer, and they haven't yet made up their mind on the exact set of devices they'll have. (because not all the world is a PC, but then not all the non-PC world is ARM either). This sounds like (a) a good thing (b) something that will turn into an incredible incomprehensible mess if we try to specify it on the command line. Why would we want to do that? That is, you're arguing for a scripting/config language for putting together board models so you don't have to write them in C. That's a good thing, but not what this patch series is doing. Perhaps the user is working on kernel support for device tree / ACPI, wants to test many device combinations, and does not want to touch C code in order to do that. Perhaps the user can plug daughterboards that connect to the SoC and add an extra serial port, visible as yet another MMIO device. Pluggable daughterboards should be implemented by actually defining the bus/socket that exists between the mainboard and the daughterboard, so you could say -device my-daughterboard and have it plug in to the mainboard. -- PMM
Re: [Qemu-devel] Question on aio_poll
On Sat, Jul 20, 2013 at 02:14:43PM +0100, Alex Bligh wrote: As part of adding timer operations to aio_poll and a clock to AioContext, I am trying to figure out a couple of points on how aio_poll works. This primarily revolves around the flag busy. Firstly, at the end of aio_poll, it has assert(progress || busy); This assert in fact checks nothing, because before the poll (20 lines or so higher), it has if (!busy) { return progress; } and as 'busy' is not altered, busy must be true at the assert. Is this in fact meant to be checking that we have made progress if aio_poll is called blocking? IE assert(progress || !blocking) ? Secondly, the tests I am writing for the timer fail because g_poll is not called, because busy is false. This is because I haven't got an fd set up. But in the instance where aio_poll is called with blocking=true and there are no fd's to wait on, surely aio_poll should either hang indefinitely or (perhaps better) assert, rather than return immediately which is a recipe for an unexpected busywait? If I have timers, it should be running those timers rather than returning. FWIW this goes away in my series that gets rid of .io_flush(): http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg00092.html Unfortunately there is an issue with the series which I haven't had time to look into yet. I don't remember the details but I think make check is failing. The current qemu.git/master code is doing the correct thing though. Callers of aio_poll() are using it to complete any pending I/O requests and process BHs. If there is no work left, we do not want to block indefinitely. Instead we want to return. Thirdly, I don't quite understand how/why busy is being set. It seems to be set if the flush callback returns non-zero. That would imply (I think) the fd handler has something to write. But what if it is just interested in any data to read that is available (and never writes)? If this is the only fd aio_poll has, it would appear it never polls. The point of .io_flush() is to select file descriptors that are awaiting I/O (either direction). For example, consider an iSCSI TCP socket with no I/O requests pending. In that case .io_flush() returns 0 and we will not block in aio_poll(). But if there is an iSCSI request pending, then .io_flush() will return 1 and we'll wait for the iSCSI response to be received. The effect of .io_flush() is that aio_poll() will return false if there is no I/O pending. It turned out that this behavior could be implemented at the block layer instead of using the .io_flush() interface at the AioContext layer. The patch series I linked to above modifies the code so AioContext can eliminate the .io_flush() concept. When adapting this for timers, I think it should be returning true only if a an AIO dispatch did something, or a BH was executed, or a timer ran. Specifically if the poll simply times out, it should not be returning true unless a timer ran. Correct? Yes, the return value is about making progress. If there is still work pending then it must return true. If there is no work pending it must return false. Stefan
Re: [Qemu-devel] [PATCH] gluster: Add image resize support
On Fri, Jul 19, 2013 at 07:51:33PM +0530, Bharata B Rao wrote: From: Paolo Bonzini pbonz...@redhat.com Implement .bdrv_truncate in GlusterFS block driver so that GlusterFS backend can support image resizing. Signed-off-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com Tested-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- block/gluster.c | 17 + 1 file changed, 17 insertions(+) Acked-by: Stefan Hajnoczi stefa...@redhat.com
Re: [Qemu-devel] [PATCH 0/9] Add platform bus
On 23/07/2013 14:19, Paolo Bonzini wrote: Il 22/07/2013 20:21, Peter Maydell ha scritto: Platforms without ISA and/or PCI have had a seriously hard time in the dynamic device creation world of QEMU. Devices on these were modeled as SysBus devices which can only be instantiated in machine files, not through -device. Why is that so? Because you can't as a user of this sort of hardware plug in an extra serial port to a SoC, because there's just nowhere to plug it in. So why should it be possible to plug an extra serial port into the QEMU model of the SoC? And why exactly should QEMU be limited to modeling an existing SoC? Perhaps the user is not working with an existing SoC. They are working with with IP building blocks that they can combine the way they prefer, and they haven't yet made up their mind on the exact set of devices they'll have. (because not all the world is a PC, but then not all the non-PC world is ARM either). Perhaps the user is working on kernel support for device tree / ACPI, wants to test many device combinations, and does not want to touch C code in order to do that. For what it's worth, ont sure it matters here, but the Sam460ex (PPC) (I've been writing code to support it recently) has an FDT, and a PCI controller... And there is a firmware setting to switch between the 2nd SATA port and the PCI-e slot as they are mutually exclusive... François.
Re: [Qemu-devel] APM regression since v1.3.0-408-g9ee59f3
On 07/03/13 22:25, Sebastian Herbszt wrote: Commit 9ee59f3 (pc: remove bochs bios debug ports) broke the APM interface between QEMU and Bochs BIOS/SeaBIOS. Without APM support older guests are no longer able to power off the VM. This regression also affects older machine types like pc-1.2. --verbose please. Which guest? Which firmware? ACPI poweroff with seabios works just fine. If APM support in seabios uses something else it should be switched to use the same hardware ports ACPI uses for poweroff (some piix4 pm device register I think) instead of the debug ports which where never meant to be used that way. Does bochs bios run on recent qemu versions in the first place? cheers, Gerd
Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types
On Tue, Jul 23, 2013 at 01:21:13PM +0200, Gerd Hoffmann wrote: Hi, Okay, so if it's just PCIe, then XHCI is the oddball preventing moving it into VMSTATE_PCIE_DEVICE(). XHCI has VMSTATE_MSIX() in its place, also operating on PCIDevice. Given that live migration support for xhci was added post-1.5 so we don't have a release with it yet this shouldn't be a blocker in case we get this sorted in time for 1.6. Is there a way to detect use of AER or MSIX to place those into subsections of VMSTATE_PCIE_DEVICE()? There is msix_enabled() ... msix_present() Dunno about AER. cheers, Gerd Not at the moment but it's not hard to add. -- MST
Re: [Qemu-devel] [PATCH 0/9] Add platform bus
Il 23/07/2013 14:22, Peter Maydell ha scritto: On 23 July 2013 13:19, Paolo Bonzini pbonz...@redhat.com wrote: Il 22/07/2013 20:21, Peter Maydell ha scritto you can't as a user of this sort of hardware plug in an extra serial port to a SoC, because there's just nowhere to plug it in. So why should it be possible to plug an extra serial port into the QEMU model of the SoC? And why exactly should QEMU be limited to modeling an existing SoC? Perhaps the user is not working with an existing SoC. They are working with with IP building blocks that they can combine the way they prefer, and they haven't yet made up their mind on the exact set of devices they'll have. (because not all the world is a PC, but then not all the non-PC world is ARM either). This sounds like (a) a good thing (b) something that will turn into an incredible incomprehensible mess if we try to specify it on the command line. Why would we want to do that? It is an incomprehensible mess on the command line, but it is actually quite fine if you use -readconfig instead. It will not support for -serial and similar shortcut options, but they aren't necessary. PCI-enabled or paravirtual boards have been able to run for years now with -nodefaults and only -device options. That is, you're arguing for a scripting/config language for putting together board models so you don't have to write them in C. That's a good thing, but not what this patch series is doing. It is, modulo lack of support for -serial and similar shortcut options. Gluing those still requires C code (for PC as well). Perhaps the user is working on kernel support for device tree / ACPI, wants to test many device combinations, and does not want to touch C code in order to do that. Perhaps the user can plug daughterboards that connect to the SoC and add an extra serial port, visible as yet another MMIO device. Pluggable daughterboards should be implemented by actually defining the bus/socket that exists between the mainboard and the daughterboard, so you could say -device my-daughterboard and have it plug in to the mainboard. The bus might just be the processor's data bus + the interrupt controller's pins, basically the same as sysbus. In fact, the main thing I dislike about Alex's patch is adding a new bus instead of making sysbus devices just work as pluggable devices. Paolo
Re: [Qemu-devel] [PATCH 0/9] Add platform bus
On 23 July 2013 13:34, Paolo Bonzini pbonz...@redhat.com wrote: Il 23/07/2013 14:22, Peter Maydell ha scritto: On 23 July 2013 13:19, Paolo Bonzini pbonz...@redhat.com wrote: Il 22/07/2013 20:21, Peter Maydell ha scritto you can't as a user of this sort of hardware plug in an extra serial port to a SoC, because there's just nowhere to plug it in. So why should it be possible to plug an extra serial port into the QEMU model of the SoC? And why exactly should QEMU be limited to modeling an existing SoC? Perhaps the user is not working with an existing SoC. They are working with with IP building blocks that they can combine the way they prefer, and they haven't yet made up their mind on the exact set of devices they'll have. (because not all the world is a PC, but then not all the non-PC world is ARM either). This sounds like (a) a good thing (b) something that will turn into an incredible incomprehensible mess if we try to specify it on the command line. Why would we want to do that? It is an incomprehensible mess on the command line, but it is actually quite fine if you use -readconfig instead. Well, the justification for this whole new bus appears to be so you can easily just add a new device on the command line. Perhaps the user can plug daughterboards that connect to the SoC and add an extra serial port, visible as yet another MMIO device. Pluggable daughterboards should be implemented by actually defining the bus/socket that exists between the mainboard and the daughterboard, so you could say -device my-daughterboard and have it plug in to the mainboard. The bus might just be the processor's data bus + the interrupt controller's pins, basically the same as sysbus. Yes, we should have easy support for defining a pluggable bus as a collection of pins. In fact, the main thing I dislike about Alex's patch is adding a new bus instead of making sysbus devices just work as pluggable devices. Agreed, more or less. Actually I'd rather sysbus devices went away -- the requirement for interrupt and GPIO and memory regions to all be defined as single arrays (so you have to know what interrupt line 3 happens to be, and that memory region 1 is the registers, and so on) is pretty unfriendly. We should be able to define all these as named connections. -- PMM
Re: [Qemu-devel] [PATCH v2 1/2] ps2: add support of auto-repeat
On 07/02/13 08:49, Amos Kong wrote: Actually it's not a urgent/necessary request. Host provided auto-repeat works, and we didn't have real application of holding key by sendkey command now. So it's a solution for a non-existing problem ... Which guests do care about the hardware repeat rate in the first place? I know linux emulates it in software for a loong time (IIRC basically since usb keyboards exist). I wouldn't be surprised if other guests do the same. So I think we should simply not worry about it ;) cheers, Gerd
[Qemu-devel] QCOW2 cryptography and secure key handling
Hi, I have some budget to improve QCOW2's cryptography. My main concern is that the QCOW2 image crypto key is passed in clear text. Do you (the block maintainers) have an idea on how the code could be improved to securely pass the crypto key to the QCOW2 code ? Best regards Benoît
Re: [Qemu-devel] KVM call agenda for 2013-07-30
On 23 July 2013 08:55, Christian Borntraeger borntrae...@de.ibm.com wrote: - soft reset and other reset variants. What is the right way to go? (e.g. on s390 there are several reset variants that reset a defined subset of the system. This can be triggered by operating systems, e.g. kdump uses a diagnose instruction that resets the device subsystem and parts of the cpu registers (some are unchanged, this allows to take a proper crash dump with a well defined system state) FWIW my suggestion is that we should actually model reset lines, any associated reset/power controller, etc: basically do what the hardware does. The qdev reset method should be restricted to we powercycled the system. -- PMM
Re: [Qemu-devel] QCOW2 cryptography and secure key handling
On Tue, Jul 23, 2013 at 02:47:06PM +0200, Benoît Canet wrote: Hi, I have some budget to improve QCOW2's cryptography. My main concern is that the QCOW2 image crypto key is passed in clear text. That is only a problem if someone can sniff the communications channel used by the monitor socket between QEMU the management application. IOW, this is only a problem if someone has configured QEMU to listen on a TCP / UDP socket for monitor traffic. If they had done this, it would be considered an insecure configuration regardless of whether qcow2 encryption is used or not. So I don't think there's any problem which needs solving from the POV of clear text keys over the monitor, besides to document that you should configure QEMU such that its monitor is only accessible to the app managing it. eg use a UNIX domain socket for configuration. Do you (the block maintainers) have an idea on how the code could be improved to securely pass the crypto key to the QCOW2 code ? More generally, QCow2's current encryption support is woefully inadequate from a design POV. If we wanted better encryption built-in to QEMU it is best to just deprecate the current encryption support and define a new qcow2 extension based around something like the LUKS data format. Using the LUKS data format precisely would be good from a data portability POV, since then you can easily switch your images between LUKS encrypted block device qcow2-with-luks image file, without needing to re-encrypt the data. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
[Qemu-devel] [PATCH 01/18] qapi-types.py: Split off generate_struct_fields()
Signed-off-by: Kevin Wolf kw...@redhat.com --- scripts/qapi-types.py | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/scripts/qapi-types.py b/scripts/qapi-types.py index ddcfed9..e1239e1 100644 --- a/scripts/qapi-types.py +++ b/scripts/qapi-types.py @@ -57,12 +57,8 @@ typedef struct %(name)sList ''', name=name) -def generate_struct(structname, fieldname, members): -ret = mcgen(''' -struct %(name)s -{ -''', - name=structname) +def generate_struct_fields(members): +ret = '' for argname, argentry, optional, structured in parse_args(members): if optional: @@ -80,6 +76,17 @@ struct %(name)s ''', c_type=c_type(argentry), c_name=c_var(argname)) +return ret + +def generate_struct(structname, fieldname, members): +ret = mcgen(''' +struct %(name)s +{ +''', + name=structname) + +ret += generate_struct_fields(members) + if len(fieldname): fieldname = + fieldname ret += mcgen(''' -- 1.8.1.4
[Qemu-devel] [PATCH 09/18] qapi.py: Maintain a list of union types
Signed-off-by: Kevin Wolf kw...@redhat.com --- scripts/qapi.py | 13 + 1 file changed, 13 insertions(+) diff --git a/scripts/qapi.py b/scripts/qapi.py index baf1321..3a54c7f 100644 --- a/scripts/qapi.py +++ b/scripts/qapi.py @@ -105,6 +105,7 @@ def parse_schema(fp): if expr_eval.has_key('enum'): add_enum(expr_eval['enum']) elif expr_eval.has_key('union'): +add_union(expr_eval) add_enum('%sKind' % expr_eval['union']) elif expr_eval.has_key('type'): add_struct(expr_eval) @@ -188,6 +189,7 @@ def type_name(name): enum_types = [] struct_types = [] +union_types = [] def add_struct(definition): global struct_types @@ -200,6 +202,17 @@ def find_struct(name): return struct return None +def add_union(definition): +global union_types +union_types.append(definition) + +def find_union(name): +global union_types +for union in union_types: +if union['union'] == name: +return union +return None + def add_enum(name): global enum_types enum_types.append(name) -- 1.8.1.4
[Qemu-devel] [PATCH 00/18] 'blockdev-add' QMP command
Kevin Wolf (18): qapi-types.py: Split off generate_struct_fields() qapi-types.py: Implement 'base' for unions qapi-visit.py: Split off generate_visit_struct_fields() qapi-visit.py: Implement 'base' for unions docs: Document QAPI union types qapi: Add visitor for implicit structs qapi: Flat unions with arbitrary discriminator qapi: Add consume argument to qmp_input_get_object() qapi.py: Maintain a list of union types qapi: Anonymous unions block: Allow driver option on the top level QemuOpts: Add qemu_opt_unset() blockdev: Rename I/O throttling options for QMP qcow2: Use dashes instead of underscores in options blockdev: Rename 'readonly' option to 'read-only' blockdev: Split up 'cache' option Implement qdict_flatten() blockdev: 'blockdev-add' QMP command block.c | 7 ++ block/qcow2.h | 8 +- blockdev.c | 184 ++-- docs/qapi-code-gen.txt | 105 +++- include/qapi/qmp/qdict.h| 1 + include/qapi/qmp/qobject.h | 1 + include/qapi/visitor-impl.h | 6 + include/qapi/visitor.h | 6 + include/qemu/option.h | 1 + qapi-schema.json| 293 qapi/qapi-visit-core.c | 25 qapi/qmp-input-visitor.c| 47 +-- qmp-commands.hx | 26 qobject/qdict.c | 50 qobject/qjson.c | 2 + scripts/qapi-types.py | 83 +++-- scripts/qapi-visit.py | 179 ++- scripts/qapi.py | 28 + util/qemu-option.c | 14 +++ 19 files changed, 970 insertions(+), 96 deletions(-) -- 1.8.1.4