[Qemu-devel] Re: Which part of qemu responds to ACPI control method?
Thank you, Laszlo. However, after I got DSDT.dsl, I found that it contains no "_PTS" method, nor "_TTS" or "_GTS". I then went through all the ACPI tables and still found no _PTS/_TTS methods:

    acpidump > acpidump.out
    acpixtract -a acpidump.out
    for file in `ls | grep dat`; do iasl -a $file; done

The guest OS is Red Hat 6.1 (HVM).

What does that mean? Does it mean this OS does not support sleep/wakeup (suspend/resume) through ACPI? What caused this problem? Does it have anything to do with qemu?

(I added logging to hwsleep.c:acpi_enter_sleep_mode in the guest kernel code and found that the OS never gets there.)

Thank you!

-------- Original message --------
From: Laszlo Ersek ler...@redhat.com
Date: 2013-07-03 16:01 (GMT+08:00)
To: bobooscar boboos...@gmail.com
Cc: qemu-devel@nongnu.org
Subject: Re: Re: Re: [Qemu-devel] Which part of qemu responds to ACPI control method?

On 07/03/13 04:14, bobooscar wrote:
> Take the method "_PTS" for example: how could I know how it accesses a certain hardware, and what hardware it accesses? I am a newbie in this field, thanks in advance ;)

In POSIX-like guests, you can dump the ACPI tables with the acpidump utility (pmtools package), e.g.

    acpidump --table DSDT --output DSDT.aml --binary

then decompile it with iasl:

    iasl -d DSDT.aml

This creates DSDT.dsl, a decompiled ACPI Source Language file. You can interpret it by consulting the ACPI specification http://www.acpi.info/spec50.htm.

Laszlo
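The search for a control method in a decompiled table can also be scripted. The following is a minimal sketch, not part of iasl or QEMU: dsl_has_method is a hypothetical helper, and it assumes the decompiler prints definitions in the form "Method (_PTS, 1, NotSerialized)", so it only looks for that textual prefix in the contents of DSDT.dsl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the decompiled ASL text defines a
 * control method with the given name (e.g. "_PTS"), otherwise 0.
 * Assumption: definitions are printed as "Method (NAME, ...".
 */
static int dsl_has_method(const char *dsl_text, const char *name)
{
    char needle[32];
    int n;

    assert(dsl_text != NULL && name != NULL);

    /* Include the trailing comma so "_PT" does not match "_PTS". */
    n = snprintf(needle, sizeof(needle), "Method (%s,", name);
    if (n < 0 || (size_t)n >= sizeof(needle)) {
        return 0; /* name is too long to be an ACPI method name */
    }
    return strstr(dsl_text, needle) != NULL;
}
```

A caller would read each decompiled .dsl file into memory and test it for "_PTS", "_TTS" and "_GTS" in turn; a result of 0 for every table matches the situation described in the mail.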
Re: [Qemu-devel] [PATCH v3 2/2] net: introduce command to query rx-filter information
Amos Kong ak...@redhat.com writes:

> On Tue, Jul 02, 2013 at 03:27:12PM +0200, Markus Armbruster wrote:
>> Amos Kong ak...@redhat.com writes:
>>
>>> On Tue, Jul 02, 2013 at 11:05:56AM +0200, Markus Armbruster wrote:
>>>> Amos Kong ak...@redhat.com writes:
[...]
>>>> This interface is abstract in the sense that it applies to all NICs. At this time, only virtio-net implements it. I'm habitually wary of abstractions based on just one concrete instance, which makes me ask:
>>>>
>>>> 1. Ignorant question first: could the feature make sense for other NICs, too, or is it specific to virtio-net?
>>>
>>> We will not. It's ugly to check in net/net.c whether a NIC is a virtio-net NIC, so I register the query function in NetClientInfo. net/net.c traverses the net client list and executes the query of each virtio-net instance in virtio-net.c.
>>
>> Implementing the feature as an optional callback is fine.
>>
>> Let me rephrase my question: could this feature be implemented for other NICs? I'm *not* asking you to do that, just whether it would be possible. I'm asking because my review of the QAPI schema depends on the answer.
>>
>>>> 2. If the former, are you reasonably sure this object will do for other NICs?
>>>
>>> No.
>>
>> I'm not sure I understand you. Do you mean to say that the feature could be implemented for other NICs, but RxFilterInfo would probably not fit for them?
>
> We will not implement the feature for other NICs; there is no request. We notify management of virtio-net rx-filter changes because we want to sync the rx-filter change to the macvtap device.
>
>> I understand there are no plans to implement this feature for other NICs. But I'm not asking whether we *want* to implement it for other NICs, I'm asking whether we *could*.
>
> In theory, we can.
>
>> Or rephrased yet another way: what exactly makes this feature applicable to virtio-net only?
>
> Macvtap can only be used by virtio-net, not by other emulated NICs. It's meaningless for management to know the rx-filter change of non-virtio-net NICs.
I'm having trouble squaring "in theory, we can" with "meaningless". So I'm rephrasing my question yet again.

Do NICs other than virtio-net have rx-filters?

If yes, what have these NIC rx-filters in common, and how do they differ?

Why would anybody want to query rx-filters? Use cases, please.

Why is querying rx-filters meaningless for anything but virtio-net? The dictionary explains "meaningless" as "having no meaning; of no value". Thus, for the query to be meaningless, the answer must carry no information, or at least none of value. Is querying rx-filters really meaningless? Or is it just something we don't need right now, and can't see being needed in the future?

If the answer is "nothing", then we *could* implement it for other NICs. Else, implementing it for other NICs would be impossible.

Once again, I'm not asking because I want it implemented for other NICs. I'm asking because the answer affects my review of the schema.
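The "optional callback" arrangement discussed in this thread can be sketched roughly as follows. All names here (nic_info, nic_client, query_rx_filter, demo_virtio_query) are hypothetical stand-ins rather than QEMU's actual NetClientInfo API, and the value returned by the demo callback is made up; the point is only that generic code can walk every client and skip models that leave the hook unset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-model description, loosely modelled on the thread. */
struct nic_info {
    const char *type;
    /* Optional hook: NULL when the NIC model does not implement the query. */
    int (*query_rx_filter)(const char *name);
};

struct nic_client {
    const char *name;
    const struct nic_info *info;
};

/* Demo implementation: pretends each filter holds two entries (made-up value). */
static int demo_virtio_query(const char *name)
{
    (void)name;
    return 2;
}

/*
 * Walks all clients and queries only those whose model provides the hook.
 * Returns how many clients answered; *total_out receives the sum of answers.
 */
static size_t query_all_rx_filters(const struct nic_client *clients, size_t n,
                                   int *total_out)
{
    size_t answered = 0;

    assert(clients != NULL && total_out != NULL);
    *total_out = 0;
    for (size_t i = 0; i < n; i++) {
        if (clients[i].info->query_rx_filter == NULL) {
            continue; /* model without rx-filter support: skipped, not an error */
        }
        *total_out += clients[i].info->query_rx_filter(clients[i].name);
        answered++;
    }
    return answered;
}
```

With this shape the generic layer never needs to know which concrete NIC it is talking to, which is why whether other models *could* provide the hook matters for the schema review.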
Re: [Qemu-devel] [PATCH 3/4] qemu-char: Register ring buffer driver with correct name ringbuf
Luiz Capitulino lcapitul...@redhat.com writes:

> On Thu, 27 Jun 2013 16:22:09 +0200
> Markus Armbruster arm...@redhat.com wrote:
>
>> The driver is new in 1.4, with the documented name "ringbuf". However, its actual name is the completely undocumented "memory". Screwed up in commit 3949e59.
>>
>> Fix code to match documentation. Keep the undocumented name working as an alias for compatibility.
>>
>> Cc: qemu-sta...@nongnu.org
>> Signed-off-by: Markus Armbruster arm...@redhat.com
>
> This patch doesn't apply anymore, can you respin please?

Certainly.
Re: [Qemu-devel] [Qemu-ppc] [PATCH 16/17] ppc64: Enable QEMU to run on POWER 8 DD1 chip.
On Thu, 2013-07-04 at 07:54 +0200, Andreas Färber wrote:
> On 27.06.2013 08:45, Alexey Kardashevskiy wrote:
>> From: Prerna Saxena pre...@linux.vnet.ibm.com
>>
>> This patch enables QEMU to launch VM guests on POWER8 chip. I have tested this to work with BML kernel on P8 dd1 chip.
>>
>> Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com
>> Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
>> Reviewed-by: Paul Mackerras pau...@samba.org
>
> The subject slightly hides what the patch is actually doing: Suggest "target-ppc: Add POWER8 v0.1 CPU model"?

It's 1.0 anyway :-)

> What's DD1, should that be added to the textual description?

DD is how we call our chip revisions internally. DD1 is 1.0, DD1.1 is 1.1, etc.

Cheers,
Ben.

>> ---
>>  target-ppc/cpu-models.c     |    3 +++
>>  target-ppc/cpu-models.h     |    1 +
>>  target-ppc/translate_init.c |   34 ++
>>  3 files changed, 38 insertions(+)
>>
>> diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
>> index 9bb68c8..f8c64dd 100644
>> --- a/target-ppc/cpu-models.c
>> +++ b/target-ppc/cpu-models.c
>> @@ -1145,6 +1145,8 @@
>>                  "POWER7 v2.1")
>>      POWERPC_DEF("POWER7_v2.3",   CPU_POWERPC_POWER7_v23,             POWER7,
>>                  "POWER7 v2.3")
>> +    POWERPC_DEF("POWER8_v0.1",   CPU_POWERPC_POWER8_v01,             POWER8,
>> +                "POWER8 v0.1")
>>      POWERPC_DEF("970",           CPU_POWERPC_970,                    970,
>>                  "PowerPC 970")
>>      POWERPC_DEF("970fx_v1.0",    CPU_POWERPC_970FX_v10,              970FX,
>> @@ -1390,6 +1392,7 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
>>      { "Dino",  "POWER3" },
>>      { "POWER3+", "631" },
>>      { "POWER7", "POWER7_v2.3" },
>> +    { "POWER8", "POWER8_v0.1" },
>>      { "970fx", "970fx_v3.1" },
>>      { "970mp", "970mp_v1.1" },
>>      { "Apache", "RS64" },
>> diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
>> index 262ca47..b349ad2 100644
>> --- a/target-ppc/cpu-models.h
>> +++ b/target-ppc/cpu-models.h
>> @@ -556,6 +556,7 @@ enum {
>>      CPU_POWERPC_POWER7_v20         = 0x003F0200,
>>      CPU_POWERPC_POWER7_v21         = 0x003F0201,
>>      CPU_POWERPC_POWER7_v23         = 0x003F0203,
>> +    CPU_POWERPC_POWER8_v01         = 0x004B0100,
>
> Are you sure this PVR is v0.1 and not v1.0?
>
> Rest looks okay, although I wouldn't know how to check all flags.
>
> Andreas
>
>>      CPU_POWERPC_970                = 0x00390202,
>>      CPU_POWERPC_970FX_v10          = 0x00391100,
>>      CPU_POWERPC_970FX_v20          = 0x003C0200,
>> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
>> index 95aebf7..2502758 100644
>> --- a/target-ppc/translate_init.c
>> +++ b/target-ppc/translate_init.c
>> @@ -7011,6 +7011,40 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
>>      pcc->l1_dcache_size = 0x8000;
>>      pcc->l1_icache_size = 0x8000;
>>  }
>> +
>> +POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(oc);
>> +    PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
>> +
>> +    dc->desc = "POWER8";
>> +    pcc->init_proc = init_proc_POWER7;
>> +    pcc->check_pow = check_pow_nocheck;
>> +    pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |
>> +                       PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |
>> +                       PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |
>> +                       PPC_FLOAT_STFIWX |
>> +                       PPC_CACHE | PPC_CACHE_ICBI | PPC_CACHE_DCBZ |
>> +                       PPC_MEM_SYNC | PPC_MEM_EIEIO |
>> +                       PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |
>> +                       PPC_64B | PPC_ALTIVEC |
>> +                       PPC_SEGMENT_64B | PPC_SLBI |
>> +                       PPC_POPCNTB | PPC_POPCNTWD;
>> +    pcc->insns_flags2 = PPC2_VSX | PPC2_DFP | PPC2_DBRX;
>> +    pcc->msr_mask = 0x8204FF36ULL;
>> +    pcc->mmu_model = POWERPC_MMU_2_06;
>> +#if defined(CONFIG_SOFTMMU)
>> +    pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
>> +#endif
>> +    pcc->excp_model = POWERPC_EXCP_POWER7;
>> +    pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
>> +    pcc->bfd_mach = bfd_mach_ppc64;
>> +    pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
>> +                 POWERPC_FLAG_BE | POWERPC_FLAG_PMM |
>> +                 POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR;
>> +    pcc->l1_dcache_size = 0x8000;
>> +    pcc->l1_icache_size = 0x8000;
>> +}
>>  #endif /* defined (TARGET_PPC64) */
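The v0.1 versus v1.0 question can be checked by splitting the PVR value into fields. The sketch below is an illustration only: pvr_fields and pvr_decode are hypothetical names, and the major/minor byte layout is an assumption read off the POWER7 entries in the same enum (0x003F0203 is named v2.3). Other families in that enum, such as 970FX_v10 = 0x00391100, evidently do not follow it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type; not part of QEMU. */
struct pvr_fields {
    uint16_t version; /* upper 16 bits: processor family */
    uint8_t major;    /* assumed: bits 15..8 of the revision */
    uint8_t minor;    /* assumed: bits 7..0 of the revision */
};

/* Splits a 32-bit PVR following the POWER7 naming pattern in the enum. */
static struct pvr_fields pvr_decode(uint32_t pvr)
{
    struct pvr_fields f;

    f.version = (uint16_t)(pvr >> 16);
    f.major = (uint8_t)((pvr >> 8) & 0xFF);
    f.minor = (uint8_t)(pvr & 0xFF);
    assert(((uint32_t)f.version << 16 | (uint32_t)f.major << 8 | f.minor) == pvr);
    return f;
}
```

Under that assumption 0x004B0100 decodes to major 1, minor 0, which agrees with the "It's 1.0 anyway" remark rather than with the _v01 suffix used in the patch.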
Re: [Qemu-devel] meaningless to compare irqfd's msi message with new msi message in virtio_pci_vq_vector_unmask
I searched for vector_irqfd globally and found no place that sets or changes the irqfd's MSI message; only the irqfd's virq or users member may be changed, in kvm_virtio_pci_vq_vector_use, kvm_virtio_pci_vq_vector_release, etc. So I think the check below in virtio_pci_vq_vector_unmask is meaningless:

    if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address)

I also think the comparison between the old MSI message and the new MSI message should be performed in kvm_update_routing_entry. The raw patch is shown below.

Signed-off-by: Zhang Haoyu haoyu.zh...@huawei.com
Signed-off-by: Zhang Huanzhong zhanghuanzh...@huawei.com
---
 hw/virtio/virtio-pci.c |    8 +++-----
 kvm-all.c              |    5 +++++
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index b070b64..e4829a3 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -613,11 +613,9 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
 
     if (proxy->vector_irqfd) {
         irqfd = &proxy->vector_irqfd[vector];
-        if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
-            ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
-            if (ret < 0) {
-                return ret;
-            }
+        ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
+        if (ret < 0) {
+            return ret;
         }
     }
 
diff --git a/kvm-all.c b/kvm-all.c
index e6b262f..63a33b4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1034,6 +1034,11 @@ static int kvm_update_routing_entry(KVMState *s,
             continue;
         }
 
+        if (entry->type == new_entry->type &&
+            entry->flags == new_entry->flags &&
+            !memcmp(&entry->u, &new_entry->u, sizeof(entry->u))) {
+            return 0;
+        }
         entry->type = new_entry->type;
         entry->flags = new_entry->flags;
         entry->u = new_entry->u;
-- 
1.7.3.1.msysgit.0

This patch works for both virtio-pci devices and pci-passthrough devices.

MST and I discussed this patch before. It avoids needlessly updating the routing entry in the KVM hypervisor when the new MSI message is identical to the old one. The gain is large in some cases, for example when an old Linux guest (e.g. rhel-5.5) frequently masks/unmasks the per-vector masking control bit in its ISR. At MST's request, the numbers are provided below.

I started a VM (rhel-5.5) with a direct-assigned Intel 82599 VF, then ran the iperf client on the VM and the iperf server on the host where the VM resides, so communication between VM and host was switched in the 82599 NIC. The throughput with and without the above patch:

before this patch applied:
[ ID] Interval       Transfer     Bandwidth
[SUM]  0.0-10.1 sec  96.5 MBytes  80.1 Mbits/sec

after this patch applied:
[ ID] Interval       Transfer     Bandwidth
[SUM]  0.0-10.0 sec  10.9 GBytes  9.37 Gbits/sec

Then I ran the netperf client on the VM and the netperf server on the host where the VM resides, with the commands below:

netperf-client: netperf -H [host ip] -l 120 -t TCP_RR -- -m 1024 -r 32,1024
netperf-server: netserver

The transaction rate with and without the above patch:

before this patch applied:
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
Bytes  Bytes  bytes    bytes   secs.    Per sec
16384  87380  32       1024    120.01   36.61
65536  87380

after this patch applied:
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
Bytes  Bytes  bytes    bytes   secs.    Per sec
16384  87380  32       1024    120.01   7464.89
65536  87380

Thanks,
Zhang Haoyu
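The idea of skipping the update when nothing changed can be sketched in isolation. This is a simplified model, not the kvm-all.c code: msi_route and update_route are hypothetical names, the route is a flat struct rather than the union used by the kernel's routing entry, and fields are compared one by one instead of with memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for one MSI routing entry. */
struct msi_route {
    uint32_t gsi;
    uint32_t type;
    uint32_t flags;
    uint64_t address;
    uint32_t data;
};

/*
 * Returns 1 if the matching entry was rewritten (a real caller would now
 * push the table to the hypervisor), 0 if the new entry was identical and
 * the update can be skipped, -1 if no entry with that gsi exists.
 */
static int update_route(struct msi_route *table, size_t n,
                        const struct msi_route *new_entry)
{
    assert(table != NULL && new_entry != NULL);
    for (size_t i = 0; i < n; i++) {
        struct msi_route *e = &table[i];

        if (e->gsi != new_entry->gsi) {
            continue;
        }
        if (e->type == new_entry->type && e->flags == new_entry->flags &&
            e->address == new_entry->address && e->data == new_entry->data) {
            return 0; /* unchanged: avoid the expensive update */
        }
        *e = *new_entry;
        return 1;
    }
    return -1;
}
```

A guest that masks and unmasks the same vector in every interrupt would hit the "return 0" path almost every time, which is the case the measurements above are about.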
Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
On Wed, Jul 03, 2013 at 02:53:27PM +0200, Benoît Canet wrote:
>> By the way, I don't know much about journalling techniques. So I'm asking you these questions so that either you can answer them straight away or because they might warrant a look at existing journal implementations like:
>
> I tried to do something simple and performant for the deduplication usage. That explains why there is no concept of transaction and why the journal's blocks are flushed asynchronously, in order to have a high insertion rate.
>
> I agree with your previous comment that it is more a log than a journal.

Simple is good. Even for deduplication alone, I think data integrity is critical - otherwise we risk stale dedup metadata pointing to clusters that are unallocated or do not contain the right data. So the journal will probably need to follow techniques for commits/checksums.

Stefan
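As an illustration of the checksum technique mentioned above, here is a minimal sketch of a checksummed log entry. It is not the format proposed in the RFC: log_entry, log_entry_seal and log_entry_valid are hypothetical names, the 56-byte payload size is arbitrary, and CRC-32 is only one possible choice of checksum. On replay, an entry whose stored CRC does not match its payload would be treated as torn or stale and ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), no lookup table. */
static uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Conditionally XOR the polynomial when the low bit is set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical fixed-size log entry: length and checksum, then payload. */
struct log_entry {
    uint32_t len;
    uint32_t crc;
    uint8_t payload[56];
};

/* Fills in an entry and stamps it with the payload checksum. */
static int log_entry_seal(struct log_entry *e, const uint8_t *data, uint32_t len)
{
    assert(e != NULL && data != NULL);
    if (len > sizeof(e->payload)) {
        return -1;
    }
    memset(e, 0, sizeof(*e));
    memcpy(e->payload, data, len);
    e->len = len;
    e->crc = crc32_ieee(e->payload, len);
    return 0;
}

/* Replay-side check: reject entries with a bad length or checksum. */
static int log_entry_valid(const struct log_entry *e)
{
    return e->len <= sizeof(e->payload) &&
           crc32_ieee(e->payload, e->len) == e->crc;
}
```

Because entries are flushed asynchronously in the design described above, a check of this kind is what lets replay stop at the first entry that never fully reached the disk.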
Re: [Qemu-devel] [PATCH] Xen PV Device
On 03.07.2013 18:37, Stefano Stabellini wrote:
> On Wed, 3 Jul 2013, Paul Durrant wrote:
>> This patch introduces a new Xen PV PCI device which will act as a new binding point for PV drivers for Xen. The device has parameterized vendor-id, device-id and revision to allow it to be configured as a binding point for any vendor's PV drivers.
>>
>> Signed-off-by: Paul Durrant paul.durr...@citrix.com
>> Cc: Stefano Stabellini stefano.stabell...@citrix.com
>> ---
>>  hw/xen/Makefile.objs     |    1 +
>>  hw/xen/xen_pvdevice.c    |  131 ++
>>  include/hw/pci/pci_ids.h |    5 +-
>>  trace-events             |    4 ++
>>  4 files changed, 139 insertions(+), 2 deletions(-)
>>  create mode 100644 hw/xen/xen_pvdevice.c
>>
>> diff --git a/hw/xen/Makefile.objs b/hw/xen/Makefile.objs
>> index 2017560..fd88003 100644
>> --- a/hw/xen/Makefile.objs
>> +++ b/hw/xen/Makefile.objs
>> @@ -4,3 +4,4 @@ common-obj-$(CONFIG_XEN_BACKEND) += xen_backend.o xen_devconfig.o
>>  obj-$(CONFIG_XEN_I386) += xen_platform.o xen_apic.o
>>  obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
>>  obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o xen_pt_msi.o
>> +obj-$(CONFIG_XEN) += xen_pvdevice.o
>> diff --git a/hw/xen/xen_pvdevice.c b/hw/xen/xen_pvdevice.c
>> new file mode 100644
>> index 000..dbc4bf5
>> --- /dev/null
>> +++ b/hw/xen/xen_pvdevice.c
>> @@ -0,0 +1,131 @@
>> +/* Copyright (c) Citrix Systems Inc.
>> + * All rights reserved.
>
> Like Anthony wrote before, "All rights reserved" contradicts what's written below.
>
> Aside from this, it looks OK to me. I would like to see the libxl side patch. Also it would be nice to have an ack from Andreas or another QOM expert.

From a QOM view it looks fine now. :) Thanks for inquiring.

Some other comments though:

* Now that it no longer depends on TARGET_PAGE_SIZE, is it possible to use common-obj-$(CONFIG_XEN)? Then it would build only once rather than separately for i386 and x86_64 and any future Xen platforms (e.g., arm).

* It looks as if the MMIO functions were renamed - the arguments no longer align. That could be edited before you apply the patch to your queue if there's nothing else - then feel free to add my Reviewed-by independent of the other issue.

* Paolo had asked for new MemoryRegions not to include the device name - can be renamed once they get the owner field though (not merged yet). Don't have a better suggestion handy.

Also Paul, by my count this is [PATCH v4] - please use --subject-prefix="PATCH v5" if you respin and include the change log either below --- or in a cover letter. We prefer to see it for patch review but not in Git commit history. Similarly, "Introduce a new Xen PV device..." would elegantly avoid reading "This patch..." after it's been committed. ;)

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
[Qemu-devel] [Bug 1197663] [NEW] qcow2 [virtio-scsi] devices when mapped to the guest shows as 0MB irrespective of the volume size
Public bug reported:

qcow2 [virtio-scsi] devices when mapped to the guest show as 0MB irrespective of the volume size.

Kernel Version: 3.10.0-rc5+
Libvirt Version: 1.0.6
Qemu Version: 1.5.50

Steps to reproduce the issue:

1. Create a qcow2 volume using the command
   qemu-img create -f qcow2 virtio-scsi11.img 10G

2. Add the virtio-scsi controller
   <controller type='scsi' index='0' model='virtio-scsi'>
     <address type='pci' domain='0x' bus='0x00' slot='0x04' function='0x0'/>
   </controller>

3. Attach the qcow2 device to the guest
   virsh attach-disk rhel64-64 /home/images/virtio-scsi11.img --persistent sdr --cache writethrough

4. If the attached volume is not recognized, run the scan command
   echo ' - - - ' > /sys/class/scsi_host/host#/scan

5. Check dmesg for the added volume.

6. Run the fdisk -l command

   Disk /dev/sdl: 0 MB, 197120 bytes
   1 heads, 1 sectors/track, 385 cylinders, total 385 sectors
   Units = cylinders of 1 * 512 = 512 bytes
   Sector size (logical/physical): 512 bytes / 512 bytes
   I/O size (minimum/optimal): 512 bytes / 512 bytes
   Disk identifier: 0x

   and observe that the 10G qcow2 volume shows as 0MB.

This is not seen with the raw image.

   Disk /dev/sdm: 10.7 GB, 10737418240 bytes
   64 heads, 32 sectors/track, 10240 cylinders
   Units = cylinders of 2048 * 512 = 1048576 bytes
   Sector size (logical/physical): 512 bytes / 512 bytes
   I/O size (minimum/optimal): 512 bytes / 512 bytes
   Disk identifier: 0x

Expected Result:
The volume size for qcow2 volumes should be shown correctly inside the guest to avoid confusion.
Guest XML:

virsh dumpxml rhel64-64
<domain type='kvm' id='4'>
  <name>rhel64-64</name>
  <uuid>48deb0e1-0c23-9be9-da12-2ead34864de2</uuid>
  <memory unit='KiB'>4096000</memory>
  <currentMemory unit='KiB'>4096000</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/home/images/rhel64-64.qcow2'/>
      <target dev='hda' bus='ide'/>
      <alias name='ide0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/upstream/autotest/virt-test/shared/data/isos/RHEL-6.4-x86_64-DVD.iso'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <alias name='ide0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source file='/home/images/virtio-scsi11.img'/>
      <target dev='sda' bus='scsi'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source file='/home/images/virtio-scsi1.img'/>
      <target dev='sdf' bus='scsi'/>
      <alias name='scsi0-0-0-5'/>
      <address type='drive' controller='0' bus='0' target='0' unit='5'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source file='/home/images/virtio-scsi9.img'/>
      <target dev='sdg' bus='scsi'/>
      <alias name='scsi0-0-0-6'/>
      <address type='drive' controller='0' bus='0' target='0' unit='6'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source file='/home/images/virtio-scsi8.img'/>
      <target dev='sdh' bus='scsi'/>
      <alias name='scsi1-0-0'/>
      <address type='drive' controller='1' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source file='/home/images/virtio-scsi10.img'/>
      <target dev='sdi' bus='scsi'/>
      <alias name='scsi1-0-1'/>
      <address type='drive' controller='1' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source file='/home/images/virtio-scsi7.img'/>
      <target dev='sdk' bus='scsi'/>
      <alias name='scsi1-0-3'/>
      <address type='drive' controller='1' bus='0' target='0' unit='3'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source file='/home/images/virtio-scsi6.img'/>
      <target dev='sdl' bus='scsi'/>
      <alias name='scsi1-0-4'/>
      <address type='drive' controller='1' bus='0' target='0' unit='4'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source file='/home/images/virtio-scsi5.img'/>
      <target
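One detail in the guest XML may be worth checking: the entry for virtio-scsi11.img is listed with driver type='raw', although the image was created as qcow2. If the image is handed to the guest as raw, the guest would see the container file itself rather than the virtual disk, which could explain a size close to the file's on-disk size. This is a reading of the report, not a confirmed diagnosis. The sketch below shows how the virtual size can be read from a qcow2 header for comparison; qcow2_virtual_size is a hypothetical helper, and it relies only on the published header layout (big-endian magic "QFI\xfb" at offset 0, 64-bit virtual size at offset 24).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QCOW2_MAGIC 0x514649FBu /* "QFI\xfb" */

/* Reads a big-endian 32-bit value. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Reads a big-endian 64-bit value. */
static uint64_t be64(const uint8_t *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/*
 * Hypothetical helper: stores the virtual disk size from a qcow2 header.
 * Returns 0 on success, -1 if the buffer is too short or not qcow2.
 */
static int qcow2_virtual_size(const uint8_t *hdr, size_t len, uint64_t *size_out)
{
    assert(hdr != NULL && size_out != NULL);
    if (len < 32 || be32(hdr) != QCOW2_MAGIC) {
        return -1;
    }
    *size_out = be64(hdr + 24); /* "size" field: virtual disk size in bytes */
    return 0;
}
```

For a healthy 10G image the value read this way is 10737418240, while the file itself stays small until data is written, so comparing the two numbers with what fdisk reports shows which one the guest is seeing.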
[Qemu-devel] [Bug 1192499] Re: virsh migration copy-storage-all fails with Unable to read from monitor: Connection reset by peer
Moving to the qemu component, as qemu is crashing, based on the inputs from Michal Privoznik.

Bugzilla: Bug 979411 - virsh migration copy-storage-all fails with "Unable to read from monitor: Connection reset by peer"

** Project changed: libvirt => qemu

** Bug watch added: Red Hat Bugzilla #979411
   https://bugzilla.redhat.com/show_bug.cgi?id=979411

** Also affects: libvirt (Fedora) via
   https://bugzilla.redhat.com/show_bug.cgi?id=979411
   Importance: Unknown
       Status: Unknown

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1192499

Title:
  virsh migration copy-storage-all fails with "Unable to read from monitor: Connection reset by peer"

Status in QEMU: New
Status in “libvirt” package in Ubuntu: Invalid
Status in “libvirt” package in Fedora: Unknown

Bug description:
virsh migration with copy-storage-all fails with "Unable to read from monitor: Connection reset by peer" and shuts down the guest on the source host.

Kernel Version: 3.10.0-rc5+
Libvirt Version: 1.0.6
Qemu Version: 1.5.50

Steps to reproduce the issue:

1. Created the image with qemu-img create -f qcow2 vm.qcow2 11G on the destination host, the same as on the source.
2. Started the guest on the source.
3. Started the vncdisplay to monitor the guest.
4. Initiated the migration:
   virsh migrate --live VM1 qemu+ssh://host-ip/system tcp://host-ip --verbose --copy-storage-all
5. It started copying the storage from source to destination (continuously monitored; it was growing).
6. The guest on the destination was paused and the guest on the source was running.
7. At some point the VM on the source got shut down and the migration failed with "Unable to read from monitor: Connection reset by peer".

Attached the libvirt debug logs.
The debug logs show:

2013-06-19 08:43:12.253+: 4026: debug : virEventPollInterruptLocked:716 : Interrupting
2013-06-19 08:43:12.253+: 4026: debug : virEventPollAddTimeout:248 : EVENT_POLL_ADD_TIMEOUT: timer=1 frequency=0 cb=0x7fe930baa960 opaque=(nil) ff=(nil)

Note: virsh live migration works fine with NFS storage from source to destination and vice versa. With libvirt 1.0.5 and qemu 1.5 we were facing the same issue, but there even live migration with NFS was not working.

Guest XML:

<domain type='kvm'>
  <name>VM1</name>
  <uuid>47feb0e1-0c23-9be9-da12-2ead34864de2</uuid>
  <memory unit='KiB'>4096000</memory>
  <currentMemory unit='KiB'>2048000</currentMemory>
  <vcpu placement='auto'>1</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/home/images/VM1.qcow2'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <interface type='network'>
      <mac address='52:54:00:9d:cf:bb'/>
      <source network='default'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none' model='selinux'/>
</domain>

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1192499/+subscriptions
Re: [Qemu-devel] [PATCH] qom: Use atomics for object refcounting
On Thu, Jul 4, 2013 at 1:43 PM, Andreas Färber afaer...@suse.de wrote:
> On 04.07.2013 06:46, liu ping fan wrote:
>> On Thu, Jul 4, 2013 at 12:36 AM, Andreas Färber afaer...@suse.de wrote:
>>> On 03.07.2013 03:23, liu ping fan wrote:
[...]
>>> It would be nice to get CC'ed on such proposals. :)
>>
>> I will CC you on QOM-related topics. :) And according to MAINTAINERS, I had better CC the maintainers of Device Tree as well. Thanks.
>
> I was asking because I implemented "realized" and am working towards adopting it in the tree. Device Tree is something different (libfdt/dtc). We do not have

Oh, sorry to disturb, Alexander Graf and Peter Crosthwaite :)

> dedicated Device (formerly qdev) maintainers, Paolo and me have been hacking on it as needed.
>
>>>> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
>>>> index 6985ad8..1f4e5d8 100644
>>>> --- a/hw/core/qdev.c
>>>> +++ b/hw/core/qdev.c
>>>> @@ -794,9 +794,7 @@ static void device_unparent(Object *obj)
>>>>          bus = QLIST_FIRST(&dev->child_bus);
>>>>          qbus_free(bus);
>>>>      }
>>>> -    if (dev->realized) {
>>>> -        object_property_set_bool(obj, false, "realized", NULL);
>>>> -    }
>>>> +
>>>>      if (dev->parent_bus) {
>>>>          bus_remove_child(dev->parent_bus, dev);
>>>>          object_unref(OBJECT(dev->parent_bus));
>>>> diff --git a/qom/object.c b/qom/object.c
>>>> index 803b94b..2c945f0 100644
>>>> --- a/qom/object.c
>>>> +++ b/qom/object.c
>>>> @@ -393,6 +393,7 @@ static void object_finalize(void *data)
>>>>      Object *obj = data;
>>>>      TypeImpl *ti = obj->class->type;
>>>>
>>>> +    object_property_set_bool(obj, false, "realized", NULL);
>>>
>>> This is incorrect since we specifically only have "realized" for devices, not for all QOM objects. If we want to move it to the finalizer you'll need to use .instance_finalize on the device type in hw/core/qdev.c. However the derived type's finalizer is run before its parent's, which
>>
>> Do you mean the sequence in object_deinit()?
>
> Yes.
>
>>> may lead to realized = false accessing freed memory.
>>
>> If my understanding as above is correct, we just need to guarantee that realized=false (e.g. pci_e1000_uninit) for the derived type will only free the resources at its own layer and not touch its parent's; then it cannot access freed memory, right?
>
> For .instance_finalize you are right. For "realized", it is up to the derived type to choose when to call the parent's realized implementation, e.g. a PCI device's unrealize implementation will need to call PCIDevice's unrealize after its own cleanups if it needs to access the config space or other resources allocated/freed at the PCIDevice layer. I doubt we can make it a rule not to touch the parent's resources at all.

I think we can make the rules simpler. When device_finalize() is called, we set realized=false, and this will reclaim e1000's extra resources and then the PCI layer's extra resources. And there is no issue about touching freed memory.

>>> But at least today, TYPE_OBJECT does not have an instance_finalize
>>
>> I think it will not happen. Since instance_finalize is a hook for derived objects, as for Object itself, object_finalize is the one, right?
>>
>>> implementation, so moving realized=false to hw/core/qdev.c:device_finalize() instead may be an option - hoping Paolo can comment more on device_unparent() vs. device_finalize() usage.

I guess device_unparent = isolate and device_finalize = reclaim resources, based on my understanding of Paolo's patches "Delay destruction of memory regions to instance_finalize".

Regards,
Pingfan

>>> Regards,
>>> Andreas
>>>
>>>>      object_deinit(obj, ti);
>>>>      object_property_del_all(obj);
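The ordering discussed above (the derived type's finalizer runs before its parent's) can be modelled with a small stand-alone sketch. This is not QOM code: demo_type, demo_obj and demo_deinit are hypothetical, and the loop only imitates walking from the most derived type up to the root, skipping types that have no finalizer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Records the order in which finalizers ran. */
struct demo_obj {
    const char *order[8];
    size_t n;
};

/* Hypothetical type descriptor with an optional finalizer hook. */
struct demo_type {
    const char *name;
    const struct demo_type *parent;
    void (*instance_finalize)(struct demo_obj *obj); /* may be NULL */
};

static void record(struct demo_obj *obj, const char *who)
{
    assert(obj->n < sizeof(obj->order) / sizeof(obj->order[0]));
    obj->order[obj->n++] = who;
}

static void device_fin(struct demo_obj *obj) { record(obj, "device"); }
static void pci_fin(struct demo_obj *obj)    { record(obj, "pci"); }
static void e1000_fin(struct demo_obj *obj)  { record(obj, "e1000"); }

/* Type chain: e1000 -> pci -> device -> object (root has no finalizer). */
static const struct demo_type demo_object = { "object", NULL, NULL };
static const struct demo_type demo_device = { "device", &demo_object, device_fin };
static const struct demo_type demo_pci    = { "pci", &demo_device, pci_fin };
static const struct demo_type demo_e1000  = { "e1000", &demo_pci, e1000_fin };

/* Runs finalizers from the most derived type towards the root. */
static void demo_deinit(struct demo_obj *obj, const struct demo_type *ti)
{
    for (; ti != NULL; ti = ti->parent) {
        if (ti->instance_finalize != NULL) {
            ti->instance_finalize(obj);
        }
    }
}
```

In this model a hook placed at the "device" level runs only after the e1000 and pci levels have already released their own state, which is the hazard raised in the thread for anything at that level that still touches resources owned by the derived types.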
[Qemu-devel] [Bug 1197663] Re: qcow2 [virtio-scsi] devices when mapped to the guest shows as 0MB irrespective of the volume size
** Also affects: fedora
   Importance: Undecided
       Status: New

https://bugs.launchpad.net/bugs/1197663

Title:
  qcow2 [virtio-scsi] devices when mapped to the guest shows as 0MB irrespective of the volume size

Status in QEMU: New
Status in Fedora: New
Re: [Qemu-devel] [PATCH] full introspection support for QMP
Am 03.07.2013 um 17:59 hat Anthony Liguori geschrieben: Kevin Wolf kw...@redhat.com writes: Am 02.07.2013 um 19:06 hat Anthony Liguori geschrieben: Eric Blake ebl...@redhat.com writes: On 07/02/2013 08:51 AM, Anthony Liguori wrote: Amos Kong ak...@redhat.com writes: Introduces new monitor command to query QMP schema information, the return data is a nested dict/list, it contains the useful metadata. we can add events definations to qapi-schema.json, then it can also be queried. Signed-off-by: Amos Kong ak...@redhat.com Maybe I'm being too meta here, but why not just return qapi-schema.json as a string and call it as day? I know you don't agree with this, but as I mentioned several times before, I think the schema as returned by the introspection functions shouldn't contain what a qemu of this version _could_ in theory provide, but what this specific build actually _does_ provide. It shouldn't include things that are compiled out. I really don't disagree with you here. I just don't like having two formats for the schema. So you agree that we have to postprocess at least in the sense that we leave out things that aren't available? In this case, I think you already have most of the postprocessing code (and this diffstat of this patch seems to show that it's not that much code anyway), so code size isn't a valid point any more. Then we can concentrate on getting the optimal wire format and do whatever is needed to implement it. I've also been the one arguing that the additional complexity (an array of {name:str,type:str,optional:bool}) is better for libvirt in that the JSON is then well-suited for scanning (it is easier to scan through an array where the key is a constant name, and looking for the value that we are interested in, than it is to scan through a dictionary where the keys of the dictionary are the names we are interested in). 
That is, the JSON in qapi-schema.json is a nice compact representation that works for humans, but may be a bit TOO compact for handling via machines. But adding a bunch of code to do JSON translation just adds a bunch of additional complexity. One reasonable compromise would be: { command: foo, arguments: { name: str, id: int }, optional: { bar: bool } } This assumes that optional vs. mandatory is the only property we ever want to describe for fields. Eric's approach is much more future-proof. Let's keep the format of qapi-schema.json an implementation detail that we can change and extend when necessary. It's always possible to add another argument that describes additional information. For instance: { command: foo, arguments: { name: str, id: int }, optional: { bar: bool }, defaults: { bar: false } } That doesn't mean I think exposing defaults is good, but rather that it's still possible to do this in a compact form. Yeah, it's possible, but it feels kind of backwards to have the properties on the top level and repeat the field names for each property that they have. How does it work for nested structs? There you don't have the arguments substructure, so you'd have to have optional as a child of all the other fields or something like that. It becomes ugly quite quickly. Kevin
Re: [Qemu-devel] [PATCH V1 1/2] Implement sync modes for drive-backup.
On 03/07/2013 20:14, Ian Main wrote: Should the source be bs for MIRROR_SYNC_MODE_NONE? Also in this case you may want to default the format to qcow2 instead of bs's format. I'm not sure that it matters what the source is for NONE. Since we are copying all new writes, whether they would go to a top-most layer or not shouldn't matter? It would matter for reads of still-uncopied data, though. You have to read from the topmost layer, not the one below. As for qcow2 format, there is a 'format' option to the drive-backup API which specifies the format. I guess we could set the default to qcow2 instead of the source format? Anyone have any opinions on that? That would be another possibility. Perhaps use qcow2 for top or none, and the source format for full. I have made the other changes above. If I don't hear back on this issue soon I'll post another revision. You can go ahead and post anyway (just remember to fix the backing file issue), it is a simple patch on top of what you have. Paolo
Re: [Qemu-devel] [PATCH] full introspection support for QMP
Il 03/07/2013 18:06, Anthony Liguori ha scritto: Paolo Bonzini pbonz...@redhat.com writes: Il 03/07/2013 14:54, Anthony Liguori ha scritto: So, qapi-schema.json has to be readable/writable _mostly_ by humans. That it is valid JSON is little more than a curious accident, because I can assure you that it wasn't an accident. Sure, it is not. But when designing the right API for a QMP client, it doesn't matter if it is or not. If QMP used ASN.1 or something like that as the wire protocol, we would not use JSON just for the schema, would we? JSON is a pretty good representation of Python data structures and the intention was for qapi-schema.json to be generated by another tool. But I understand the point you're trying to make. The thing is, QMP is JSON now so it's somewhat academic. If we generated a Python or C API based on the schema, should the client care (or know) that QMP is JSON? Does 'type' have argument 'foo': bool('foo' in type_dict['data']) or bool('*foo' in type_dict['data']) (as a QMP client I want to send the argument, I don't care if it is optional or not) and here the abstraction is already falling, IMHO. It should be one of these: Whether 'type' is in 'foo' is a static property. We would never add non-optional arguments to a function so the first part of the clause is a constant expression. What about returned types? I'm not sure we've never added non-optional arguments, even though in principle it was not the right thing to do. C) Does 'enum' have 'value' - bool('value' in enum_dict['data']) D) Does 'command' have 'parameter' - bool('parameter' in command_dict['data']) What is the type of 'parameter' in command: command_dict['data']['parameter'] or command_dict['data']['*parameter'] That's a fair point. But again, this is a constant expression. Type values never change. Not necessarily, a type that is currently used in two places can be split in two different types, with different optional fields. 
I understand though that command_dict['data']['parameter'] is either always true or always false, because new parameters are always added as optional. Still, for something that targets a new-enough QEMU only, there is no need to know if the parameter has always been there, or was added as optional. What are we really optimizing here for? I think we should optimize for the clients, not for ourselves. Paolo Regards, Anthony Liguori It should be something like these: command_dict['data'].arguments['parameter'].type command_dict['data']['arguments']['parameter']['type'] The example that Eric sent is not something that I would find easy to read/write. qapi-schema.json instead is more than acceptable. I don't think the example Eric sent is any easier to parse programmatically. It is, see the above examples. That's the problem I have here. I don't see why we can't have both a human readable and machine readable syntax. It is machine readable, but that doesn't mean it constitutes a nice API. Paolo Furthermore, qapi.py is an existence proof that we do :-) Regards, Anthony Liguori Paolo
Re: [Qemu-devel] [PATCH v2 3/4] ide: Set BSY bit during FLUSH
Am 03.07.2013 um 22:02 hat Alex Williamson geschrieben: On Wed, 2013-06-05 at 15:17 +0200, Kevin Wolf wrote: From: Andreas Färber afaer...@suse.de The implementation of the ATA FLUSH command invokes a flush at the block layer, which may on raw files on POSIX entail a synchronous fdatasync(). This may in some cases take so long that the SLES 11 SP1 guest driver reports I/O errors and filesystems get corrupted or remounted read-only. Avoid this by setting BUSY_STAT, so that the guest is made aware we are in the middle of an operation and no ATA commands are attempted to be processed concurrently. Addresses BNC#637297. Suggested-by: Gonglei (Arei) arei.gong...@huawei.com Signed-off-by: Andreas Färber afaer...@suse.de Signed-off-by: Kevin Wolf kw...@redhat.com --- hw/ide/core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/ide/core.c b/hw/ide/core.c index c7a8041..9926d92 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -814,6 +814,7 @@ void ide_flush_cache(IDEState *s) return; } +s-status |= BUSY_STAT; bdrv_acct_start(s-bs, s-acct, 0, BDRV_ACCT_FLUSH); bdrv_aio_flush(s-bs, ide_flush_cb, s); } I can no longer boot win7 x64 on q35 with IDE using a qcow2 image. git bisect determined this patch is the culprit. -M q35 -nodefconfig -readconfig docs/q35-chipset.cfg -drive file=image.qcow2,if=none,id=mydisk -device ide-drive,drive=mydisk,bus=ide.0 This means you're using AHCI, right? handle_cmd() in ahci.c checks the flags and does indeed behave differently now: if (s-dev[port].port.ifs[0].status (BUSY_STAT|DRQ_STAT)) { /* async command, complete later */ s-dev[port].busy_slot = slot; return -1; } /* done handling the command */ return 0; The caller of this code updates pr-cmd_issue to clear the bit for the respective command slot. This is missed now, and the later completion mentioned in the comment doesn't happen for flushes, the IDE core never calls back into the AHCI core for the completion. 
The correct fix might be to call ide_set_inactive() in the flush callback, though I haven't checked in detail yet whether there's anything specific to DMA read/write in ide_set_inactive(). Kevin
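The failure mode described above can be modelled in a few lines. This is only a toy sketch, not QEMU code: `ToyPort`, `toy_issue` and `toy_complete` are hypothetical names, and the status bit values are the conventional ATA ones. It shows why a command that leaves BSY set keeps its slot bit in `cmd_issue` until some completion path clears it.

```c
#include <stdint.h>

#define BUSY_STAT 0x80   /* ATA status: device busy */
#define DRQ_STAT  0x08   /* ATA status: data request */

/* Hypothetical, heavily simplified stand-in for one AHCI port. */
typedef struct ToyPort {
    uint8_t  status;     /* IDE status register */
    uint32_t cmd_issue;  /* one bit per outstanding command slot */
    int      busy_slot;  /* slot waiting for async completion, or -1 */
} ToyPort;

/* Mirrors the quoted check: -1 means "async command, complete later". */
static int toy_handle_cmd(ToyPort *p, int slot)
{
    if (p->status & (BUSY_STAT | DRQ_STAT)) {
        p->busy_slot = slot;
        return -1;
    }
    return 0;
}

/* The caller clears the slot bit only for commands that finished at once. */
static void toy_issue(ToyPort *p, int slot, int is_flush)
{
    p->cmd_issue |= 1u << slot;
    if (is_flush) {
        p->status |= BUSY_STAT;  /* what the patched flush path now does */
    }
    if (toy_handle_cmd(p, slot) == 0) {
        p->cmd_issue &= ~(1u << slot);
    }
}

/* Hypothetical completion hook: the step the flush callback never triggers. */
static void toy_complete(ToyPort *p)
{
    p->status &= (uint8_t)~BUSY_STAT;
    if (p->busy_slot >= 0) {
        p->cmd_issue &= ~(1u << p->busy_slot);
        p->busy_slot = -1;
    }
}
```

In this model a flush issued on slot 5 leaves bit 5 set in `cmd_issue` until `toy_complete()` runs, which is the step the thread says is missing for flushes.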
Re: [Qemu-devel] [PATCH] full introspection support for QMP
On 03/07/2013 17:59, Anthony Liguori wrote: For instance: { "command": "foo", "arguments": { "name": "str", "id": "int" }, "optional": { "bar": "bool" }, "defaults": { "bar": false } } This is still not a dictionary that QAPI is able to describe. Paolo
Re: [Qemu-devel] [PATCH 12/23] ide: Convert FLUSH CACHE to ide_cmd_table handler
Am 03.07.2013 um 23:51 hat Alex Williamson geschrieben: On Wed, 2013-07-03 at 15:41 -0600, Alex Williamson wrote: On Mon, 2013-06-24 at 11:10 +0200, Stefan Hajnoczi wrote: From: Kevin Wolf kw...@redhat.com Signed-off-by: Kevin Wolf kw...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- hw/ide/core.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 8789758..83e86aa 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -1184,6 +1184,12 @@ static bool cmd_write_dma(IDEState *s, uint8_t cmd) return false; } +static bool cmd_flush_cache(IDEState *s, uint8_t cmd) +{ +ide_flush_cache(s); +return false; +} + static bool cmd_read_native_max(IDEState *s, uint8_t cmd) { bool lba48 = (cmd == WIN_READ_NATIVE_MAX_EXT); @@ -1345,8 +1351,8 @@ static const struct { [WIN_SETIDLE1]= { cmd_nop, ALL_OK }, [WIN_CHECKPOWERMODE1] = { cmd_check_power_mode, ALL_OK | SET_DSC }, [WIN_SLEEPNOW1] = { cmd_nop, ALL_OK }, -[WIN_FLUSH_CACHE] = { NULL, ALL_OK }, -[WIN_FLUSH_CACHE_EXT] = { NULL, HD_CFA_OK }, +[WIN_FLUSH_CACHE] = { cmd_flush_cache, ALL_OK }, +[WIN_FLUSH_CACHE_EXT] = { cmd_flush_cache, HD_CFA_OK }, [WIN_IDENTIFY]= { cmd_identify, ALL_OK }, [WIN_SETFEATURES] = { cmd_set_features, ALL_OK | SET_DSC }, [IBM_SENSE_CONDITION] = { NULL, CFA_OK }, @@ -1403,10 +1409,6 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val) } switch(val) { -case WIN_FLUSH_CACHE: -case WIN_FLUSH_CACHE_EXT: -ide_flush_cache(s); -break; case WIN_SEEK: /* XXX: Check that seek is within bounds */ s-status = READY_STAT | SEEK_STAT; This also breaks win7 x64 q35 IDE. Note that while this change looks like a no-op, filling in a handler now means that we do: s-status = READY_STAT | BUSY_STAT; before calling the handler and don't clear it on the way out since the function statically returns false. This then introduces the same bug as f68ec837. 
Thanks, This seems to work around the bug, but I'll leave it to those of you who actually know how IDE works for a proper fix: diff --git a/hw/ide/core.c b/hw/ide/core.c index 96b468c..8893849 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -1186,6 +1186,7 @@ static bool cmd_write_dma(IDEState *s, uint8_t cmd) static bool cmd_flush_cache(IDEState *s, uint8_t cmd) { ide_flush_cache(s); +s->status &= ~BUSY_STAT; return false; } This is wrong: the BSY bit must remain set while the FLUSH command is running. As I said in the other thread, the real problem is that AHCI isn't notified about the command completion for flushes. Kevin
Re: [Qemu-devel] [PATCH 01/17] cow: make reads go at a decent speed
Il 04/07/2013 04:20, Fam Zheng ha scritto: On Wed, 07/03 16:34, Paolo Bonzini wrote: Do not do two reads for each sector; load each sector of the bitmap and use bitmap operations to process it. Writes are still dog slow! Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- block/cow.c | 54 -- 1 file changed, 32 insertions(+), 22 deletions(-) diff --git a/block/cow.c b/block/cow.c index 1cc2e89..204451e 100644 --- a/block/cow.c +++ b/block/cow.c @@ -126,18 +126,31 @@ static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum) return 0; } -static inline int is_bit_set(BlockDriverState *bs, int64_t bitnum) +#define BITS_PER_BITMAP_SECTOR (512 * 8) + +/* Cannot use bitmap.c on big-endian machines. */ +static int cow_test_bit(int64_t bitnum, const uint8_t *bitmap) { -uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8; -uint8_t bitmap; -int ret; +return (bitmap[bitnum / 8] (1 (bitnum 7))) != 0; +} -ret = bdrv_pread(bs-file, offset, bitmap, sizeof(bitmap)); -if (ret 0) { - return ret; +static int cow_find_streak(const uint8_t *bitmap, int value, int start, int nb_sectors) I think type bool is better for 'value' as you don't booleanize it. And also int64_t for start? start is always between 0 and BITS_PER_BITMAP_SECTOR. value here is a bit value, so 0 or 1 rather than true or false. I prefer to keep it as int, but it can be changed. Paolo +{ +int streak_value = value ? 0xFF : 0; +int last = MIN(start + nb_sectors, BITS_PER_BITMAP_SECTOR); +int bitnum = start; +while (bitnum last) { +if ((bitnum 7) == 0 bitmap[bitnum / 8] == streak_value) { +bitnum += 8; +continue; +} +if (cow_test_bit(bitnum, bitmap) == value) { +bitnum++; +continue; +} +break; } - -return !!(bitmap (1 (bitnum % 8))); +return MIN(bitnum, last) - start; } /* Return true if first block has been changed (ie. 
current version is @@ -146,23 +159,20 @@ static inline int is_bit_set(BlockDriverState *bs, int64_t bitnum) static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *num_same) { +int64_t bitnum = sector_num + sizeof(struct cow_header_v2) * 8; +uint64_t offset = (bitnum / 8) -BDRV_SECTOR_SIZE; +uint8_t bitmap[512]; +int ret; int changed; -if (nb_sectors == 0) { -*num_same = nb_sectors; -return 0; -} - -changed = is_bit_set(bs, sector_num); -if (changed 0) { -return 0; /* XXX: how to return I/O errors? */ -} - -for (*num_same = 1; *num_same nb_sectors; (*num_same)++) { -if (is_bit_set(bs, sector_num + *num_same) != changed) -break; +ret = bdrv_pread(bs-file, offset, bitmap, sizeof(bitmap)); +if (ret 0) { +return ret; } +bitnum = BITS_PER_BITMAP_SECTOR - 1; +changed = cow_test_bit(bitnum, bitmap); +*num_same = cow_find_streak(bitmap, changed, bitnum, nb_sectors); return changed; } -- 1.8.2.1
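The archive stripped the comparison and bitwise operators from the patch quoted above. A minimal reconstruction of the two helpers follows, assuming the missing operators are `&`, `<<`, `<` and `&&`. This is inferred from context and is not a copy of the committed code; the `MIN` macro is defined locally only to keep the sketch self-contained.

```c
#include <stdint.h>

#define BITS_PER_BITMAP_SECTOR (512 * 8)
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Test a single bit; bit 0 of each byte is the lowest-numbered sector. */
static int cow_test_bit(int64_t bitnum, const uint8_t *bitmap)
{
    return (bitmap[bitnum / 8] & (1 << (bitnum & 7))) != 0;
}

/* Count consecutive bits equal to 'value' (0 or 1) starting at 'start',
 * looking at no more than nb_sectors bits and never past the end of the
 * 512-byte bitmap sector. */
static int cow_find_streak(const uint8_t *bitmap, int value, int start,
                           int nb_sectors)
{
    int streak_value = value ? 0xFF : 0;
    int last = MIN(start + nb_sectors, BITS_PER_BITMAP_SECTOR);
    int bitnum = start;

    while (bitnum < last) {
        /* Fast path: a byte-aligned position whose whole byte matches. */
        if ((bitnum & 7) == 0 && bitmap[bitnum / 8] == streak_value) {
            bitnum += 8;
            continue;
        }
        /* Slow path: advance one bit at a time. */
        if (cow_test_bit(bitnum, bitmap) == value) {
            bitnum++;
            continue;
        }
        break;
    }
    /* The fast path may overshoot 'last', hence the clamp. */
    return MIN(bitnum, last) - start;
}
```

The final `MIN` matters because the byte-wise fast path can step past `last` when the requested range ends in the middle of a byte.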
[Qemu-devel] [PATCH] Makefile: disable parallel build with dtc
Sometimes I get this error when building with -j 4: ar: two different operation options specified make[1]: *** [libfdt/libfdt.a] Error 1 make: *** [subdir-dtc] Error 2 dtc make does not seem to support parallel make. Force non-parallel build to fix this. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index c06bfab..e86c15a 100644 --- a/Makefile +++ b/Makefile @@ -145,7 +145,7 @@ pixman/Makefile: $(SRC_PATH)/pixman/configure $(SRC_PATH)/pixman/configure: (cd $(SRC_PATH)/pixman; autoreconf -v --install) -DTC_MAKE_ARGS=-I$(SRC_PATH)/dtc VPATH=$(SRC_PATH)/dtc -C dtc V=$(V) LIBFDT_srcdir=$(SRC_PATH)/dtc/libfdt +DTC_MAKE_ARGS=-I$(SRC_PATH)/dtc VPATH=$(SRC_PATH)/dtc -C dtc V=$(V) LIBFDT_srcdir=$(SRC_PATH)/dtc/libfdt --jobs=1 DTC_CFLAGS=$(CFLAGS) $(QEMU_CFLAGS) DTC_CPPFLAGS=-I$(BUILD_DIR)/dtc -I$(SRC_PATH)/dtc -I$(SRC_PATH)/dtc/libfdt -- MST
Re: [Qemu-devel] [PATCH] qom: Use atomics for object refcounting
On 02/07/2013 18:36, Anthony Liguori wrote: Paolo Bonzini pbonz...@redhat.com writes: On 02/07/2013 16:47, Anthony Liguori wrote: Jan Kiszka jan.kis...@siemens.com writes: Objects can soon be referenced/dereferenced outside the BQL. So we need to use atomics in object_ref/unref. Based on patch by Liu Ping Fan. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- qom/object.c |5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/qom/object.c b/qom/object.c index 803b94b..a76a30b 100644 --- a/qom/object.c +++ b/qom/object.c @@ -683,16 +683,15 @@ GSList *object_class_get_list(const char *implements_type, void object_ref(Object *obj) { -obj->ref++; + __sync_fetch_and_add(&obj->ref, 1); } void object_unref(Object *obj) { g_assert(obj->ref > 0); -obj->ref--; /* parent always holds a reference to its children */ -if (obj->ref == 0) { +if (__sync_sub_and_fetch(&obj->ref, 1) == 0) { object_finalize(obj); } } Should we introduce something akin to kref now that reference counting has gotten fancy? I'm not a big fan of kref (it seems _too_ thin a wrapper to me, i.e. it doesn't really wrap enough to be useful), but I wouldn't oppose it if someone else does it. I had honestly hoped Object was light enough to be used for this purpose. What do you think? We should make it more robust against objects that are not in the QOM composition tree (adding/removing the child property is relatively slow). As things stand, QOM is definitely too slow for something like SCSIRequest. In the long term, it is definitely nice to use Object more. But if we really had to abstract things, for now I'd just do #define atomic_ref(x) atomic_inc(x) #define atomic_unref_test_zero(x) (atomic_fetch_dec(x) == 1) or something like that. Paolo
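As a rough illustration of the pattern under discussion, here is a self-contained sketch built on the same GCC `__sync` builtins the patch uses. `Obj`, `obj_ref` and `obj_unref` are hypothetical stand-ins, not QEMU's `Object` API, and only the counter is modelled.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object. */
typedef struct Obj {
    unsigned int ref;
    void (*finalize)(struct Obj *obj);  /* optional destructor hook */
} Obj;

static void obj_ref(Obj *obj)
{
    /* Atomic increment; the previous value is not needed. */
    __sync_fetch_and_add(&obj->ref, 1);
}

/* Returns 1 if this call dropped the last reference, 0 otherwise. */
static int obj_unref(Obj *obj)
{
    /* Decrement and test in one atomic step, so two threads can never
     * both observe the count reaching zero. */
    if (__sync_sub_and_fetch(&obj->ref, 1) == 0) {
        if (obj->finalize) {
            obj->finalize(obj);
        }
        return 1;
    }
    return 0;
}
```

The point of fusing the decrement with the zero test is that a separate `ref--` followed by `if (ref == 0)` could let two concurrent callers both see zero, or neither.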
Re: [Qemu-devel] [PATCH 10/17] block: define get_block_status return value
On 03/07/2013 23:04, Peter Lieven wrote: Define the return value of get_block_status. Bits 0, 1, 2 and 8-62 are valid; bit 63 (the sign bit) is reserved for errors. Bits 3-7 are left for future extensions. Is bit 8 not also reserved for future use? BDRV_SECTOR_BITS is 9. Right. Can you explain exactly which information is returned in bits 9-62? Bits 9-62 are the offset at which the data is stored in bs->file; they are valid if bit 2 (BDRV_BLOCK_OFFSET_VALID) is 1. Paolo
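The bit layout described in this exchange could be decoded roughly as follows. This is a sketch under stated assumptions: the thread only says that bit 2 is BDRV_BLOCK_OFFSET_VALID, so the positions of the DATA and ZERO flags on bits 0 and 1 are assumed here, and `BLOCK_OFFSET_MASK` and `block_status_offset` are hypothetical names.

```c
#include <stdint.h>

#define BDRV_SECTOR_BITS        9
#define BDRV_BLOCK_DATA         (1 << 0)  /* assumed bit position */
#define BDRV_BLOCK_ZERO         (1 << 1)  /* assumed bit position */
#define BDRV_BLOCK_OFFSET_VALID (1 << 2)  /* bit 2, as stated in the thread */

/* Hypothetical name: selects bits 9-62, i.e. a sector-aligned byte offset. */
#define BLOCK_OFFSET_MASK \
    (INT64_MAX & ~((INT64_C(1) << BDRV_SECTOR_BITS) - 1))

/* Returns the byte offset into the underlying file, or -1 if the status
 * is an error (sign bit set) or carries no valid offset. */
static int64_t block_status_offset(int64_t status)
{
    if (status < 0 || !(status & BDRV_BLOCK_OFFSET_VALID)) {
        return -1;
    }
    return status & BLOCK_OFFSET_MASK;
}
```

Because offsets are sector-aligned, their low 9 bits are always zero, which is what frees those bits for flags.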
Re: [Qemu-devel] [PATCH 02/17] cow: make writes go at a less indecent speed
Il 04/07/2013 04:40, Fam Zheng ha scritto: On Wed, 07/03 16:34, Paolo Bonzini wrote: Only sync once per write, rather than once per sector. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- block/cow.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/block/cow.c b/block/cow.c index 204451e..133e596 100644 --- a/block/cow.c +++ b/block/cow.c @@ -106,7 +106,7 @@ static int cow_open(BlockDriverState *bs, QDict *options, int flags) * XXX(hch): right now these functions are extremely inefficient. * We should just read the whole bitmap we'll need in one go instead. */ -static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum) +static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum, bool *first) Why flush _before first_ write, rather than (more intuitively) flush _after last_ write? Because you have to flush the data before you start writing the metadata. Flushing the metadata can be done when the guest issues a flush. This ensures that, in case of a power loss, the metadata will never refer to data that hasn't been written. Paolo And personally I think bool sync makes a better signature than bool *first, although it's not that critical as cow_update_bitmap is the only caller. 
{ uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8; uint8_t bitmap; @@ -117,9 +117,21 @@ static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum) return ret; } +if (bitmap (1 (bitnum % 8))) { +return 0; +} + +if (*first) { +ret = bdrv_flush(bs-file); +if (ret 0) { + return ret; +} +*first = false; +} + bitmap |= (1 (bitnum % 8)); -ret = bdrv_pwrite_sync(bs-file, offset, bitmap, sizeof(bitmap)); +ret = bdrv_pwrite(bs-file, offset, bitmap, sizeof(bitmap)); if (ret 0) { return ret; } @@ -181,9 +193,10 @@ static int cow_update_bitmap(BlockDriverState *bs, int64_t sector_num, { int error = 0; int i; +bool first = true; for (i = 0; i nb_sectors; i++) { -error = cow_set_bit(bs, sector_num + i); +error = cow_set_bit(bs, sector_num + i, first); if (error) { break; } -- 1.8.2.1
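The ordering rule Paolo describes (flush the data once, before the first metadata bit is written) can be illustrated with an in-memory toy. This is only a sketch: `ToyCow` and its helpers are hypothetical, and the flush is modelled as a counter rather than real I/O.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the image file. */
typedef struct ToyCow {
    uint8_t bitmap[64];  /* allocation bitmap (the metadata) */
    int flushes;         /* number of simulated data flushes */
} ToyCow;

static void toy_set_bit(ToyCow *c, int64_t bitnum, bool *first)
{
    uint8_t mask = (uint8_t)(1 << (bitnum % 8));

    if (c->bitmap[bitnum / 8] & mask) {
        return;  /* already allocated: no metadata change, no flush */
    }
    if (*first) {
        /* Data must be stable before any metadata starts to refer to it. */
        c->flushes++;
        *first = false;
    }
    c->bitmap[bitnum / 8] |= mask;
}

static void toy_update_bitmap(ToyCow *c, int64_t sector_num, int nb_sectors)
{
    bool first = true;  /* reset per write request, as in the patch */

    for (int i = 0; i < nb_sectors; i++) {
        toy_set_bit(c, sector_num + i, &first);
    }
}
```

A write that only touches sectors already marked allocated costs no flush in this model, and a write that allocates any number of new sectors costs one.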
Re: [Qemu-devel] [PATCH 12/17] qemu-img: add a map subcommand
On 04/07/2013 07:34, Fam Zheng wrote: +if ((e->flags & (BDRV_BLOCK_DATA|BDRV_BLOCK_ZERO)) == BDRV_BLOCK_DATA) { +printf("%lld %lld %d %lld\n", + (long long) e->start, (long long) e->length, + e->depth, (long long) e->offset); +} Why %lld and an explicit cast, rather than PRId64? Will fix. Are BDRV_BLOCK_DATA and BDRV_BLOCK_ZERO distinguishable here for the user? By offset? I'm not sure I understand the question. Zero blocks are always omitted in the human format. Only non-zero blocks are listed. Paolo
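For reference, the PRId64 variant suggested in the review might look like the sketch below. `format_map_entry` is a hypothetical helper, and the field order (start, length, depth, offset) follows the quoted `printf`.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: formats one map entry into buf without casts.
 * PRId64 expands to the correct printf length modifier for int64_t. */
static int format_map_entry(char *buf, size_t len, int64_t start,
                            int64_t length, int depth, int64_t offset)
{
    return snprintf(buf, len, "%" PRId64 " %" PRId64 " %d %" PRId64 "\n",
                    start, length, depth, offset);
}
```

This avoids the casts to `long long` and stays correct on platforms where `int64_t` is `long` rather than `long long`.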
Re: [Qemu-devel] [PATCH] Xen PV Device
-Original Message- From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com] Sent: 03 July 2013 17:38 To: Paul Durrant Cc: qemu-devel@nongnu.org; xen-de...@lists.xen.org; Stefano Stabellini; afaer...@suse.de Subject: Re: [PATCH] Xen PV Device On Wed, 3 Jul 2013, Paul Durrant wrote: This patch introduces a new Xen PV PCI device which will act as a new binding point for PV drivers for Xen. The device has parameterized vendor-id, device-id and revision to allow to be configured as a binding point for any vendor's PV drivers. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Stefano Stabellini stefano.stabell...@citrix.com --- hw/xen/Makefile.objs |1 + hw/xen/xen_pvdevice.c| 131 ++ include/hw/pci/pci_ids.h |5 +- trace-events |4 ++ 4 files changed, 139 insertions(+), 2 deletions(-) create mode 100644 hw/xen/xen_pvdevice.c diff --git a/hw/xen/Makefile.objs b/hw/xen/Makefile.objs index 2017560..fd88003 100644 --- a/hw/xen/Makefile.objs +++ b/hw/xen/Makefile.objs @@ -4,3 +4,4 @@ common-obj-$(CONFIG_XEN_BACKEND) += xen_backend.o xen_devconfig.o obj-$(CONFIG_XEN_I386) += xen_platform.o xen_apic.o obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o xen_pt_msi.o +obj-$(CONFIG_XEN) += xen_pvdevice.o diff --git a/hw/xen/xen_pvdevice.c b/hw/xen/xen_pvdevice.c new file mode 100644 index 000..dbc4bf5 --- /dev/null +++ b/hw/xen/xen_pvdevice.c @@ -0,0 +1,131 @@ +/* Copyright (c) Citrix Systems Inc. + * All rights reserved. Like Anthony wrote before, All rights reserved contradicts what's written below. Aside from this, it looks OK to me. I would like to see the libxl side patch. Working on it, but it's not required to use the new device so I don't think the QEMU patch need be predicated on it. Paul
Re: [Qemu-devel] [PATCH 11/17] block: return get_block_status data and flags for formats
On 04/07/2013 05:22, Fam Zheng wrote: +case VMDK_OK: + /* TODO: might return offset if the extents are in bs->file. */ + ret = BDRV_BLOCK_DATA; if (extent->file == bs->file) { ret |= BDRV_BLOCK_OFFSET_VALID | offset; } Thanks. :) Paolo
Re: [Qemu-devel] [PATCH] Xen PV Device
-Original Message- Like Anthony wrote before, All rights reserved contradicts what's written below. Like I said, it's part of all BSD licenses that I can find. It's certainly in the template on the OSI website and the FreeBSD license for instance. Aside from this, it looks OK to me. I would like to see the libxl side patch. Also it would be nice to have an ack from Andreas or another QOM expert. From a QOM view it looks fine now. :) Thanks for inquiring. Some other comments though: * Now that it no longer depends on TARGET_PAGE_SIZE, is it possible to use common-obj-$(CONFIG_XEN)? Then it would build only once rather than separately for i386 and x86_64 and any future Xen platforms (e.g., arm). Sure, that sounds sensible. * It looks as if the MMIO functions were renamed - the arguments no longer align. That could be edited before you apply the patch to your queue if there's nothing else - then feel free to add my Reviewed-by independent of the other issue. Thanks. * Paolo had asked for new MemoryRegions not to include the device name - can be renamed once they get the owner field though (not merged yet). Don't have a better suggestion handy. I guess this can be fixed up later. Also Paul, by my count this is [PATCH v4] - please use --subject-prefix=PATCH v5 if you respin and include the change log either below --- or in a cover letter. We prefer to see it for patch review but not in Git commit history. Ok. I was unsure what to do since this device was under a different name so I opted to reset the version back to 1. I'll call the next one v5 as you suggest. I'm still finding my way with git so thanks for the tips. Similarly, Introduce a new Xen PV device... would elegantly avoid reading This patch... after it's been committed. ;) Sure. Good point. Paul
Re: [Qemu-devel] [PATCH v4 0/9] Make 'dump-guest-memory' dump in kdump-compressed format
On Wed, Jul 03, 2013 at 03:39:51PM +0800, Qiao Nuohan wrote: On 07/01/2013 07:45 PM, Stefan Hajnoczi wrote: In the flattened format, data is written to the dump file block by block, and the following structure indicates the offset and size of each data block. struct makedumpfile_data_header { int64_t offset; int64_t buf_size; }; For more information, please refer to makedumpfile http://sourceforge.net/projects/makedumpfile/ I see. From the QEMU code perspective this will be simpler. Stefan
Re: [Qemu-devel] [PATCH 3/3] PPC PReP: can run without bios image
No conclusion was ever reached on the proposed new option for loading ROM files. It really would be handy. -- Julio Guerra
Re: [Qemu-devel] [PATCH 12/17] qemu-img: add a map subcommand
On Thu, 07/04 10:16, Paolo Bonzini wrote: On 04/07/2013 07:34, Fam Zheng wrote: +if ((e->flags & (BDRV_BLOCK_DATA|BDRV_BLOCK_ZERO)) == BDRV_BLOCK_DATA) { +printf("%lld %lld %d %lld\n", + (long long) e->start, (long long) e->length, + e->depth, (long long) e->offset); +} Why %lld and an explicit cast, rather than PRId64? Will fix. Are BDRV_BLOCK_DATA and BDRV_BLOCK_ZERO distinguishable here for the user? By offset? I'm not sure I understand the question. Zero blocks are always omitted in the human format. Only non-zero blocks are listed. I missed this. -- Fam
Re: [Qemu-devel] [PATCH] Citrix PV Bus device
On Tue, Jul 02, 2013 at 12:10:17PM +0100, Peter Maydell wrote: On 2 July 2013 11:57, Paul Durrant paul.durr...@citrix.com wrote: -Original Message- From: Paolo Bonzini [mailto:pbonz...@redhat.com] So the reason to place the device here is TARGET_PAGE_SIZE... We really need a way to access that value from common code, somewhere down my TODO list. :/ We probably don't, because it generally doesn't mean what you think it does. It's the smallest possible page size the guest CPU supports, which may not be the same as the actual page size the guest OS is using. Why does it need to be in pages rather than bytes? It doesn't necessarily need to be in pages; it's just a more convenient quantity than bytes. It isn't really more convenient, because the guest would have to tell QEMU what the page size was. (I'm told that virtio is planning to move to a simple "just use a byte count" approach.) thanks -- PMM Yes, sometime in a distant future ...
Re: [Qemu-devel] [PATCH v5] Add timestamp to error_report()
On Thu, Jul 04, 2013 at 02:57:13AM +, Seiji Aguchi wrote: -Original Message- From: Stefan Hajnoczi [mailto:stefa...@gmail.com] Sent: Wednesday, July 03, 2013 5:14 AM To: Seiji Aguchi Cc: qemu-devel@nongnu.org; aligu...@us.ibm.com; berra...@redhat.com; kw...@redhat.com; mtosa...@redhat.com; arm...@redhat.com; Tomoki Sekiyama; pbonz...@redhat.com; lcapitul...@redhat.com; ler...@redhat.com; ebl...@redhat.com; dle-deve...@lists.sourceforge.net Subject: Re: [PATCH v5] Add timestamp to error_report() On Tue, Jul 02, 2013 at 02:09:24PM +, Seiji Aguchi wrote: +DEF(msg, HAS_ARG, QEMU_OPTION_msg, +-msg [timestamp=on|off]\n + change the format of messages\n + timestamp=on|off enables leading timestamps (default:on)\n, +QEMU_ARCH_ALL) +STEXI +@item -msg timestamp=on|off +@findex -msg +prepend a timestamp to each log message. +(disabled by default) +ETEXI I am confused. If the user specifies -msg then enable_timestamp_msg is on by default. If the user does not specify -msg then enable_timestmap_msg is off. Did I get that right? Yes. This means that the default behavior of QEMU does not change but you can add -msg to enable timestamps. I'm happy with this but find the documentation confusing. I can remove (disabled by default) if needed. Perhaps the simplest solution is timestamp=off by default. Then there can be no confusion and users must do -msg timestamp=on to enable timestamps. If you really want to keep -msg as a shortcut for -msg timestamp=on, then please document explicitly that: 1. Without -msg timestamps are off. 1. With -msg timestamps are on. 2. -msg timestamp=off can be used to turn timestamps off again. My apologies for the confusion. The syntax, -msg [timestamp=on|off], was wrong. -msg timestamp[=on|off] is correct. And there is no way to make timestamp optional, as far as I looked into a source code. Therefore, the explanation should be as below. (I think it is reasonable to keep -msg timestamp as a shortcut for -msg timestamp=on.) Yes, I think you are correct. 
I thought previously that -msg works but it seems an option is always required. snip +DEF("msg", HAS_ARG, QEMU_OPTION_msg, +"-msg timestamp[=on|off]\n" +"change the format of messages\n" +"on|off controls leading timestamps (default:on)\n", +QEMU_ARCH_ALL) +STEXI +@item -msg timestamp[=on|off] +@findex -msg +prepend a timestamp to each log message. (default:on) +ETEXI snip To be simpler, we may be able to introduce just a single -msg-timestamp. But I think current -msg timestamp[=on|off] is reasonable because other options may be introduced to msg, like log_level or debug. Yep. Stefan
Re: [Qemu-devel] PVFS2 Block Driver Support
On Wed, Jul 03, 2013 at 11:41:08AM -0400, Timothy Scott wrote: In testing my block driver implementation, I am receiving the following error when trying to run an orangefs protocol with a qcow2 image format: +Header extension too large +qemu-io: can't open device pvfs2:/... +no file open, try 'help open' Is './check -pvfs2 -qcow2' a valid usecase in the iotests suite for specifying a qcow2 format file over the orangefs protocol? I haven't run IMGPROTO + IMGFMT tests but looking at the code it should work. The Header extension too large error message comes from block/qcow2.c so it seems the header data is corrupt. I suggest running ./check -pvfs2 first to make sure it passes the raw image tests. Once that seems okay it's worth looking into issues from ./check -pvfs2 -qcow2 and installing/running guests. Stefan
Re: [Qemu-devel] [PATCH] hw/9pfs: Fix potential memory leak and avoid reuse of freed memory
Stefan Weil s...@weilnetz.de writes:

The leak was reported by cppcheck. Function proxy_init also calls g_free for ctx->fs_root. Avoid reuse of this memory by setting ctx->fs_root to NULL.

Signed-off-by: Stefan Weil s...@weilnetz.de

Reviewed-by: M. Mohan Kumar mo...@in.ibm.com

---
Hi, I'm not sure whether ctx->fs_root should also be freed in the error case. Please feel free to modify my patch if needed.

Regards
Stefan Weil

 hw/9pfs/virtio-9p-proxy.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/9pfs/virtio-9p-proxy.c b/hw/9pfs/virtio-9p-proxy.c
index 8ba2959..5f44bb7 100644
--- a/hw/9pfs/virtio-9p-proxy.c
+++ b/hw/9pfs/virtio-9p-proxy.c
@@ -1153,10 +1153,12 @@ static int proxy_init(FsContext *ctx)
         sock_id = atoi(ctx->fs_root);
         if (sock_id < 0) {
             fprintf(stderr, "socket descriptor not initialized\n");
+            g_free(proxy);
             return -1;
         }
     }
     g_free(ctx->fs_root);
+    ctx->fs_root = NULL;
 
     proxy->in_iovec.iov_base = g_malloc(PROXY_MAX_IO_SZ + PROXY_HDR_SZ);
     proxy->in_iovec.iov_len = PROXY_MAX_IO_SZ + PROXY_HDR_SZ;
-- 
1.7.10.4
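The two fixes in this patch (free the temporary on the error path, and clear a pointer after freeing it) follow a common C pattern. Below is a toy sketch using plain `malloc`/`free` in place of GLib; `ToyCtx` and the `toy_*` functions are hypothetical and only mimic the shape of `proxy_init`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the filesystem context. */
typedef struct ToyCtx {
    char *fs_root;
} ToyCtx;

/* Returns 0 on success, -1 if the allocation fails. */
static int toy_ctx_set_root(ToyCtx *ctx, const char *root)
{
    ctx->fs_root = malloc(strlen(root) + 1);
    if (!ctx->fs_root) {
        return -1;
    }
    strcpy(ctx->fs_root, root);
    return 0;
}

/* Returns the parsed descriptor, or -1 on error. */
static int toy_init(ToyCtx *ctx)
{
    int *proxy = malloc(sizeof(*proxy));
    int sock_id;

    if (!proxy) {
        return -1;
    }
    sock_id = atoi(ctx->fs_root);
    if (sock_id < 0) {
        free(proxy);       /* first fix: do not leak on the error path */
        return -1;
    }
    free(ctx->fs_root);
    ctx->fs_root = NULL;   /* second fix: no dangling pointer left behind */

    *proxy = sock_id;
    free(proxy);           /* toy only: real code would keep the state */
    return sock_id;
}

/* A later cleanup is now harmless, because free(NULL) is a no-op. */
static void toy_cleanup(ToyCtx *ctx)
{
    free(ctx->fs_root);
    ctx->fs_root = NULL;
}
```

As in the patch, the error path here leaves `fs_root` allocated for the caller to release; whether it should be freed there is the open question raised in the message.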
Re: [Qemu-devel] [Bug 1187529] [PATCH] Update mappings after PCI bridge live migration or save-restore.
On Wed, Jul 03, 2013 at 11:04:16AM -0400, Don Koch wrote:
> From: Don Koch dk...@verizon.com
>
> Update mappings for PCI bridge after live migration.
>
> Signed-off-by: Don Koch dk...@verizon.com
> ---
> This fixes bug 1187529: devices on a PCI bridge stop working after migration.

Thanks, this looks good, but any bridge device would need this fix, wouldn't it? Could we call this from get_pci_config_device instead? This way all bridge devices would be fixed.

>  hw/pci-bridge/pci_bridge_dev.c | 9 +++++++++
>  hw/pci/pci_bridge.c            | 2 +-
>  include/hw/pci/pci_bridge.h    | 1 +
>  3 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
> index 971b432..9e5062e 100644
> --- a/hw/pci-bridge/pci_bridge_dev.c
> +++ b/hw/pci-bridge/pci_bridge_dev.c
> @@ -110,6 +110,14 @@ static void qdev_pci_bridge_dev_reset(DeviceState *qdev)
>      shpc_reset(dev);
>  }
>
> +static int pci_bridge_dev_post_load(void *opaque, int ver) {
> +    PCIDevice *d = opaque;
> +    PCIBridge *s = container_of(d, PCIBridge, dev);
> +
> +    pci_bridge_update_mappings(s);
> +    return 0;
> +}
> +
>  static Property pci_bridge_dev_properties[] = {
>                      /* Note: 0 is not a legal chassis number. */
>      DEFINE_PROP_UINT8("chassis_nr", PCIBridgeDev, chassis_nr, 0),
> @@ -119,6 +127,7 @@ static Property pci_bridge_dev_properties[] = {
>
>  static const VMStateDescription pci_bridge_dev_vmstate = {
>      .name = "pci_bridge",
> +    .post_load = pci_bridge_dev_post_load,
>      .fields = (VMStateField[]) {
>          VMSTATE_PCI_DEVICE(bridge.dev, PCIBridgeDev),
>          SHPC_VMSTATE(bridge.dev.shpc, PCIBridgeDev),
> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> index 24be6c5..3897bd8 100644
> --- a/hw/pci/pci_bridge.c
> +++ b/hw/pci/pci_bridge.c
> @@ -224,7 +224,7 @@ static void pci_bridge_region_cleanup(PCIBridge *br, PCIBridgeWindows *w)
>      g_free(w);
>  }
>
> -static void pci_bridge_update_mappings(PCIBridge *br)
> +void pci_bridge_update_mappings(PCIBridge *br)
>  {
>      PCIBridgeWindows *w = br->windows;
>
> diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
> index 1868f7a..1d8f997 100644
> --- a/include/hw/pci/pci_bridge.h
> +++ b/include/hw/pci/pci_bridge.h
> @@ -37,6 +37,7 @@ PCIBus *pci_bridge_get_sec_bus(PCIBridge *br);
>  pcibus_t pci_bridge_get_base(const PCIDevice *bridge, uint8_t type);
>  pcibus_t pci_bridge_get_limit(const PCIDevice *bridge, uint8_t type);
>
> +void pci_bridge_update_mappings(PCIBridge *br);
>  void pci_bridge_write_config(PCIDevice *d,
>                               uint32_t address, uint32_t val, int len);
>  void pci_bridge_disable_base_limit(PCIDevice *dev);
> --
> 1.7.11.7
Re: [Qemu-devel] [libvirt] best way to provide disk storage for vm without shared storage system
On Wed, Jul 03, 2013 at 05:31:44PM +0400, Vasiliy Tolstov wrote:
> I currently store qcow2 images on an ext4 filesystem (RAID1 with two SATA disks). I don't need live migration now (but may need it in the future).
> What is the best way to provide disks to VMs in terms of performance and the ability to create backups (I don't want LVM snapshots)?
> From what I found on Google, physical storage with LVM gives the most speed. But maybe QED or FVD can give me near-LVM performance?

Best really depends. If you don't want to use LVM you could use raw image files (fast) and perform backups inside the guest just like on a physical machine.

qcow2 has pretty good performance nowadays. If you care about performance then benchmark your workload to decide which configuration is best. There is no single answer because it depends on your workload and additional constraints (like no LVM).

Stefan
Re: [Qemu-devel] [PULL v2 00/21] pci,kvm,misc enhancements
On Fri, Jun 28, 2013 at 12:44:17PM -0500, Anthony Liguori wrote:
> Markus Armbruster arm...@redhat.com writes:
>
> > Michael S. Tsirkin m...@redhat.com writes:
> >
> > > pvpanic: fix fwcfg for big endian hosts
> >
> > Umm, 1+10+9 is 20, but the pull is for 21 patches; what's going on here?
>
> Funny :-)
>
> Note that the series is missing 2/21. The branch has 20 commits, I suspect Michael did git-format-patch to a directory, deleted one of the files, and never bothered changing the N of M.
>
> Regards,
>
> Anthony Liguori

Actually no, apparently something went wrong when sending mail.

-- 
MST
Re: [Qemu-devel] [PATCH v6] add timestamp to error_report()
On Wed, Jul 03, 2013 at 11:02:46PM -0400, Seiji Aguchi wrote:
> [Issue]
> When we offer a customer support service and a problem happens in a customer's system, we try to understand the problem by comparing what the customer reports with message logs of the customer's system. In this case, we often need to know when the problem happens.
>
> But, currently, there is no timestamp in qemu's error messages. Therefore, we may not be able to understand the problem based on error messages.
>
> [Solution]
> Add a timestamp to qemu's error message logged by error_report() with g_time_val_to_iso8601().
>
> Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
> ---
> Changelog
> v5 -> v6
> - Remove include/qemu/time.h and utils/qemu-time.c.
> - Fix a syntax and indent of messages in msg option's DEF().
> - Change explanation of the msg option.
>
> v4 -> v5
> - Fix descriptions of msg option.
> - Rename TIME_H to QEMU_TIME_H. (avoiding double inclusion of qemu/time.h)
> - Change argument of qemu_get_timestamp_str to char * and size_t.
> - Confirmed msg option is displayed by query-command-line-options.
>
> v3 -> v4
> - Correct email address of Signed-off-by.
>
> v2 -> v3
> - Use g_time_val_to_iso8601() to get timestamp instead of copying libvirt's time-handling functions.
>   According to discussion below, qemu doesn't need to take care if timestamp functions are async-signal safe or not.
>   http://marc.info/?l=qemu-devel&m=136741841921265&w=2
>   Also, in the review of the v2 patch, strftime() was recommended to format the string. But it is not a suitable function to handle msec. Then, simply call g_time_val_to_iso8601().
> - Introduce a common time-handling function to util/qemu-time.c. (Suggested by Daniel P. Berrange)
> - Add testing for g_time_val_to_iso8601() to tests/test-time.c. The test cases are copied from libvirt's virtimetest. (Suggested by Daniel P. Berrange)
>
> v1 -> v2
> - add an option, -msg timestamp={on|off}, to enable output message with timestamp
> ---
>  include/qemu/error-report.h |  2 ++
>  qemu-options.hx             | 11 +++++++++++
>  util/qemu-error.c           | 10 ++++++++++
>  vl.c                        | 26 ++++++++++++++++++++++++++
>  4 files changed, 49 insertions(+), 0 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
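To illustrate the idea discussed in this thread, here is a rough standalone sketch of prefixing a message with an ISO 8601 timestamp that includes sub-second digits. The patch itself calls GLib's g_time_val_to_iso8601(); this sketch deliberately swaps in standard C (gmtime, strftime, snprintf) so it builds without GLib. All demo_* names and the exact output layout are assumptions for illustration, not QEMU's code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Models the "-msg timestamp=on|off" switch. */
static bool demo_timestamp_enabled = true;

/* Format seconds + microseconds as UTC, e.g. 1970-01-01T00:00:00.123456Z.
 * strftime() has no sub-second conversion, so the fraction is appended
 * by hand. Returns 0 on success, -1 on failure. */
static int demo_format_timestamp(time_t sec, long usec, char *buf, size_t len)
{
    char date[32];
    struct tm *tm = gmtime(&sec); /* not thread-safe; fine for a sketch */

    if (tm == NULL ||
        strftime(date, sizeof(date), "%Y-%m-%dT%H:%M:%S", tm) == 0) {
        return -1;
    }
    int n = snprintf(buf, len, "%s.%06ldZ", date, usec);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build the final log line, with or without the leading timestamp. */
static int demo_format_report(time_t sec, long usec, const char *msg,
                              char *out, size_t len)
{
    char ts[40];
    int n;

    if (demo_timestamp_enabled &&
        demo_format_timestamp(sec, usec, ts, sizeof(ts)) == 0) {
        n = snprintf(out, len, "%s %s", ts, msg);
    } else {
        n = snprintf(out, len, "%s", msg);
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Keeping the switch in one flag mirrors the on|off option; a real caller would take the current time from something like gettimeofday() instead of passing it in.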
[Qemu-devel] [PATCH v3 01/18] range: add Range structure
Sometimes we need to pass ranges around, add a handy structure for this purpose.

Note: memory.c defines its own concept of AddrRange structure for working with 128-bit addresses. It's necessary there for doing range math. This is not needed for most users: struct Range is much simpler, and is only used for passing the range around.

Cc: Peter Maydell peter.mayd...@linaro.org
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/qemu/range.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/qemu/range.h b/include/qemu/range.h
index 3502372..b76cc0d 100644
--- a/include/qemu/range.h
+++ b/include/qemu/range.h
@@ -1,6 +1,22 @@
 #ifndef QEMU_RANGE_H
 #define QEMU_RANGE_H

+#include <inttypes.h>
+
+/*
+ * Operations on 64 bit address ranges.
+ * Notes:
+ *   - ranges must not wrap around 0, but can include the last byte ~0x0LL.
+ *   - this can not represent a full 0 to ~0x0LL range.
+ */
+
+/* A structure representing a range of addresses. */
+struct Range {
+    uint64_t begin; /* First byte of the range, or 0 if empty. */
+    uint64_t end;   /* 1 + the last byte. 0 if range empty or ends at ~0x0LL. */
+};
+typedef struct Range Range;
+
 /* Get last byte of a range from offset + length.
  * Undefined for ranges that wrap around 0. */
 static inline uint64_t range_get_last(uint64_t offset, uint64_t len)
-- 
MST
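The begin/end convention described in this patch can be tried out in isolation. The sketch below copies the two-field layout from the patch and adds three helpers (demo_range_make, demo_range_last, demo_range_is_empty) that are hypothetical and not part of the patch; they only show how "end is 1 + the last byte, and 0 if the range ends at ~0" plays out with unsigned wraparound.

```c
#include <stdint.h>

/* Same layout as the struct in the patch. */
typedef struct Range {
    uint64_t begin; /* first byte of the range, or 0 if empty */
    uint64_t end;   /* 1 + the last byte; 0 if empty or ends at ~0 */
} Range;

/* Hypothetical helper: build a Range from offset + length. */
static Range demo_range_make(uint64_t offset, uint64_t len)
{
    Range r = { 0, 0 };

    if (len != 0) {
        r.begin = offset;
        r.end = offset + len; /* wraps to 0 when the last byte is ~0 */
    }
    return r;
}

/* Hypothetical helper: last byte of a non-empty range.
 * end == 0 yields UINT64_MAX through unsigned wraparound. */
static uint64_t demo_range_last(Range r)
{
    return r.end - 1;
}

/* Hypothetical helper: empty ranges are stored as begin == end == 0,
 * which is why a full 0..~0 range cannot be represented. */
static int demo_range_is_empty(Range r)
{
    return r.begin == 0 && r.end == 0;
}
```

A range covering the top 256 bytes of the address space ends up with end == 0 but a non-zero begin, so it stays distinguishable from the empty range.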
[Qemu-devel] [PATCH v3 02/18] pci: store PCI hole ranges in guestinfo structure
Will be used to pass hole ranges to guests. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/pc.c | 46 +- hw/i386/pc_piix.c | 14 +- hw/i386/pc_q35.c | 6 +- hw/pci-host/q35.c | 8 include/hw/i386/pc.h | 19 ++- include/hw/pci-host/q35.h | 2 ++ include/qemu/typedefs.h | 1 + 7 files changed, 92 insertions(+), 4 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 78f92e2..8af1e4e 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -989,6 +989,48 @@ void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge) } } +typedef struct PcGuestInfoState { +PcGuestInfo info; +Notifier machine_done; +} PcGuestInfoState; + +static +void pc_guest_info_machine_done(Notifier *notifier, void *data) +{ +PcGuestInfoState *guest_info_state = container_of(notifier, + PcGuestInfoState, + machine_done); +} + +PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, +ram_addr_t above_4g_mem_size) +{ +PcGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state); +PcGuestInfo *guest_info = guest_info_state-info; + +guest_info-pci_info.w32.end = IO_APIC_DEFAULT_ADDRESS; +if (sizeof(hwaddr) == 4) { +guest_info-pci_info.w64.begin = 0; +guest_info-pci_info.w64.end = 0; +} else { +/* + * BIOS does not set MTRR entries for the 64 bit window, so no need to + * align address to power of two. Align address at 1G, this makes sure + * it can be exactly covered with a PAT entry even when using huge + * pages. 
+ */ +guest_info-pci_info.w64.begin = +ROUND_UP((0x1ULL 32) + above_4g_mem_size, 0x1ULL 30); +guest_info-pci_info.w64.end = guest_info-pci_info.w64.begin + +(0x1ULL 62); +assert(guest_info-pci_info.w64.begin = guest_info-pci_info.w64.end); +} + +guest_info_state-machine_done.notify = pc_guest_info_machine_done; +qemu_add_machine_init_done_notifier(guest_info_state-machine_done); +return guest_info; +} + void pc_acpi_init(const char *default_dsdt) { char *filename; @@ -1030,7 +1072,8 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory, ram_addr_t below_4g_mem_size, ram_addr_t above_4g_mem_size, MemoryRegion *rom_memory, - MemoryRegion **ram_memory) + MemoryRegion **ram_memory, + PcGuestInfo *guest_info) { int linux_boot, i; MemoryRegion *ram, *option_rom_mr; @@ -1082,6 +1125,7 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory, for (i = 0; i nb_option_roms; i++) { rom_add_option(option_rom[i].name, option_rom[i].bootindex); } +guest_info-fw_cfg = fw_cfg; return fw_cfg; } diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index fa59a0c..4637bde 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -90,6 +90,7 @@ static void pc_init1(MemoryRegion *system_memory, MemoryRegion *rom_memory; DeviceState *icc_bridge; FWCfgState *fw_cfg = NULL; +PcGuestInfo *guest_info; if (xen_enabled() xen_hvm_init() != 0) { fprintf(stderr, xen hardware virtual machine initialisation failed\n); @@ -124,12 +125,23 @@ static void pc_init1(MemoryRegion *system_memory, rom_memory = system_memory; } +guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size); + +/* Set PCI window size the way seabios has always done it. 
*/ +/* Power of 2 so bios can cover it with a single MTRR */ +if (ram_size = 0x8000) +guest_info-pci_info.w32.begin = 0x8000; +else if (ram_size = 0xc000) +guest_info-pci_info.w32.begin = 0xc000; +else +guest_info-pci_info.w32.begin = 0xe000; + /* allocate ram and load rom/bios */ if (!xen_enabled()) { fw_cfg = pc_memory_init(system_memory, kernel_filename, kernel_cmdline, initrd_filename, below_4g_mem_size, above_4g_mem_size, - rom_memory, ram_memory); + rom_memory, ram_memory, guest_info); } gsi_state = g_malloc0(sizeof(*gsi_state)); diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index bb0ce6a..a13acf2 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -77,6 +77,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args) ICH9LPCState *ich9_lpc; PCIDevice *ahci; DeviceState *icc_bridge; +PcGuestInfo *guest_info; icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE); object_property_add_child(qdev_get_machine(), icc-bridge, @@ -105,11 +106,13 @@ static void
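The window arithmetic in this patch is easy to check outside QEMU. The following is only a sketch: the archived diff has lost its comparison and shift operators and appears to have truncated the hex constants, so the thresholds used here (2 GiB, 3 GiB, otherwise 3.5 GiB) and the 1 GiB alignment are assumptions inferred from the patch comments ("Power of 2 so bios can cover it with a single MTRR", "Align address at 1G"). The demo_* names are hypothetical.

```c
#include <stdint.h>

/* Round n up to the next multiple of d (d must be non-zero). */
#define DEMO_ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* Start of the 32-bit PCI window, chosen from the RAM size.
 * Assumed thresholds: up to 2 GiB, up to 3 GiB, otherwise 3.5 GiB. */
static uint64_t demo_w32_begin(uint64_t ram_size)
{
    if (ram_size <= 0x80000000ULL) {
        return 0x80000000ULL;
    } else if (ram_size <= 0xc0000000ULL) {
        return 0xc0000000ULL;
    }
    return 0xe0000000ULL;
}

/* Start of the 64-bit PCI window: first 1 GiB boundary at or above
 * the end of the RAM mapped above 4 GiB. */
static uint64_t demo_w64_begin(uint64_t above_4g_mem_size)
{
    return DEMO_ROUND_UP((1ULL << 32) + above_4g_mem_size, 1ULL << 30);
}
```

With no RAM above 4 GiB the 64-bit window would start exactly at 4 GiB; a single extra byte pushes it to 5 GiB.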
[Qemu-devel] [PATCH v3 04/18] pc_piix: cleanup init compat handling
Make sure 1.4 calls 1.5, 1.3 calls 1.4 etc. This way it's enough to add a new compat hook in a single place in piix.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/pc_piix.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 8a18dbe..e393022 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -270,38 +270,28 @@ static void pc_init_pci_1_5(QEMUMachineInitArgs *args)

 static void pc_init_pci_1_4(QEMUMachineInitArgs *args)
 {
-    has_pci_info = false;
     has_pvpanic = false;
     x86_cpu_compat_set_features("n270", FEAT_1_ECX, 0, CPUID_EXT_MOVBE);
-    pc_init_pci(args);
+    pc_init_pci_1_5(args);
 }

 static void pc_init_pci_1_3(QEMUMachineInitArgs *args)
 {
-    has_pci_info = false;
     enable_compat_apic_id_mode();
-    has_pvpanic = false;
-    pc_init_pci(args);
+    pc_init_pci_1_4(args);
 }

 /* PC machine init function for pc-1.1 to pc-1.2 */
 static void pc_init_pci_1_2(QEMUMachineInitArgs *args)
 {
-    has_pci_info = false;
     disable_kvm_pv_eoi();
-    enable_compat_apic_id_mode();
-    has_pvpanic = false;
-    pc_init_pci(args);
+    pc_init_pci_1_3(args);
 }

 /* PC machine init function for pc-0.14 to pc-1.0 */
 static void pc_init_pci_1_0(QEMUMachineInitArgs *args)
 {
-    has_pci_info = false;
-    disable_kvm_pv_eoi();
-    enable_compat_apic_id_mode();
-    has_pvpanic = false;
-    pc_init_pci(args);
+    pc_init_pci_1_2(args);
 }

 /* PC init function for pc-0.10 to pc-0.13, and reused by xenfv */
-- 
MST
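The chaining scheme in this patch can be modelled in a few lines. This is a toy sketch, not QEMU code: the flags are plain globals, demo_init_pci() just records them instead of building a machine, and every demo_* name is hypothetical. The point it shows is that each older machine type applies only its own compat tweak and then defers to the next newer one, so a tweak added for one version is inherited by all older versions.

```c
#include <stdbool.h>

/* Feature flags with their "current machine type" defaults. */
static bool demo_has_pci_info = true;
static bool demo_has_pvpanic = true;
static bool demo_compat_apic_id = false;
static bool demo_kvm_pv_eoi = true;

/* What the machine was finally built with. */
typedef struct DemoSnapshot {
    bool has_pci_info, has_pvpanic, compat_apic_id, kvm_pv_eoi;
} DemoSnapshot;

static DemoSnapshot demo_booted;

/* Stand-in for the real init: record the flags in effect. */
static void demo_init_pci(void)
{
    demo_booted.has_pci_info = demo_has_pci_info;
    demo_booted.has_pvpanic = demo_has_pvpanic;
    demo_booted.compat_apic_id = demo_compat_apic_id;
    demo_booted.kvm_pv_eoi = demo_kvm_pv_eoi;
}

/* Each version applies only its own tweak, then calls the next newer one. */
static void demo_init_pci_1_5(void)
{
    demo_has_pci_info = false;
    demo_init_pci();
}

static void demo_init_pci_1_4(void)
{
    demo_has_pvpanic = false;
    demo_init_pci_1_5();
}

static void demo_init_pci_1_3(void)
{
    demo_compat_apic_id = true;
    demo_init_pci_1_4();
}

static void demo_init_pci_1_2(void)
{
    demo_kvm_pv_eoi = false;
    demo_init_pci_1_3();
}
```

Calling demo_init_pci_1_2() ends up with all four compat settings applied even though its own body touches only one flag.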
[Qemu-devel] [PULL v3 00/18] pci,misc enhancements
Changes from v2: - rebased to origin/master - fixed up botched posting The following changes since commit ab8bf29078e0ab8347e2ff8b4e5542f7a0c751cf: Merge remote-tracking branch 'qemu-kvm/uq/master' into staging (2013-07-03 08:37:00 -0500) are available in the git repository at: git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_anthony for you to fetch changes up to e34cc4adf3106ff5bed9723b8f9b4730f1662f7d: pci: Fold host_buses list into PCIHostState functionality (2013-07-04 10:45:32 +0300) pci,misc enhancements This includes some pci enhancements: Better support for systems with multiple PCI root buses FW cfg interface for more robust pci programming in BIOS Minor fixes/cleanups for fw cfg and cross-version migration - because of dependencies with other patches Signed-off-by: Michael S. Tsirkin m...@redhat.com Andrew Jones (1): e1000: cleanup process_tx_desc David Gibson (10): pci: Cleanup configuration for pci-hotplug.c pci: Move pci_read_devaddr to pci-hotplug-old.c pci: Abolish pci_find_root_bus() pci: Use helper to find device's root bus in pci_find_domain() pci: Replace pci_find_domain() with more general pci_root_bus_path() pci: Add root bus argument to pci_get_bus_devfn() pci: Add root bus parameter to pci_nic_init() pci: Simpler implementation of primary PCI bus pci: Remove domain from PCIHostBus pci: Fold host_buses list into PCIHostState functionality Michael S. 
Tsirkin (7): range: add Range structure pci: store PCI hole ranges in guestinfo structure pc: pass PCI hole ranges to Guests pc_piix: cleanup init compat handling MAINTAINERS: s/Marcelo/Paolo/ pvpanic: initialization cleanup pvpanic: fix fwcfg for big endian hosts MAINTAINERS | 2 +- default-configs/i386-softmmu.mak| 3 +- default-configs/ppc64-softmmu.mak | 2 - default-configs/x86_64-softmmu.mak | 3 +- hmp-commands.hx | 4 +- hw/alpha/dp264.c| 2 +- hw/arm/realview.c | 6 +- hw/arm/versatilepb.c| 2 +- hw/i386/pc.c| 74 ++- hw/i386/pc_piix.c | 40 +--- hw/i386/pc_q35.c| 18 +++- hw/mips/mips_fulong2e.c | 6 +- hw/mips/mips_malta.c| 6 +- hw/misc/pvpanic.c | 31 --- hw/net/e1000.c | 18 ++-- hw/pci-host/piix.c | 9 ++ hw/pci-host/q35.c | 17 hw/pci/Makefile.objs| 2 +- hw/pci/{pci-hotplug.c = pci-hotplug-old.c} | 75 --- hw/pci/pci.c| 137 ++-- hw/pci/pci_host.c | 1 + hw/pci/pcie_aer.c | 9 +- hw/ppc/e500.c | 2 +- hw/ppc/mac_newworld.c | 2 +- hw/ppc/mac_oldworld.c | 2 +- hw/ppc/ppc440_bamboo.c | 2 +- hw/ppc/prep.c | 2 +- hw/ppc/spapr.c | 2 +- hw/ppc/spapr_pci.c | 10 ++ hw/sh4/r2d.c| 5 +- hw/sparc64/sun4u.c | 2 +- include/hw/i386/pc.h| 22 - include/hw/pci-host/q35.h | 2 + include/hw/pci/pci.h| 17 ++-- include/hw/pci/pci_host.h | 12 +++ include/qemu/range.h| 16 include/qemu/typedefs.h | 1 + 37 files changed, 404 insertions(+), 162 deletions(-) rename hw/pci/{pci-hotplug.c = pci-hotplug-old.c} (78%)
[Qemu-devel] [PATCH v3 07/18] pvpanic: initialization cleanup
Avoid use of static variables: PC systems initialize pvpanic device through pvpanic_init, so we can simply create the fw_cfg file at that point. This also makes it possible to skip device creation completely if fw_cfg is not there, e.g. for xen - so the ports it reserves are not discoverable by guests.

Also, make pvpanic_init void since callers ignore return status anyway.

Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
Cc: Laszlo Ersek ler...@redhat.com
Cc: Paul Durrant paul.durr...@citrix.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/misc/pvpanic.c    | 30 ++++++++++++++++--------------
 include/hw/i386/pc.h |  2 +-
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
index 060099b..83ed226 100644
--- a/hw/misc/pvpanic.c
+++ b/hw/misc/pvpanic.c
@@ -97,26 +97,28 @@ static void pvpanic_isa_realizefn(DeviceState *dev, Error **errp)
 {
     ISADevice *d = ISA_DEVICE(dev);
     PVPanicState *s = ISA_PVPANIC_DEVICE(dev);
-    static bool port_configured;
-    FWCfgState *fw_cfg;

     isa_register_ioport(d, &s->io, s->ioport);
+}

-    if (!port_configured) {
-        fw_cfg = fw_cfg_find();
-        if (fw_cfg) {
-            fw_cfg_add_file(fw_cfg, "etc/pvpanic-port",
-                            g_memdup(&s->ioport, sizeof(s->ioport)),
-                            sizeof(s->ioport));
-            port_configured = true;
-        }
-    }
+static void pvpanic_fw_cfg(ISADevice *dev, FWCfgState *fw_cfg)
+{
+    PVPanicState *s = ISA_PVPANIC_DEVICE(dev);
+
+    fw_cfg_add_file(fw_cfg, "etc/pvpanic-port",
+                    g_memdup(&s->ioport, sizeof(s->ioport)),
+                    sizeof(s->ioport));
 }

-int pvpanic_init(ISABus *bus)
+void pvpanic_init(ISABus *bus)
 {
-    isa_create_simple(bus, TYPE_ISA_PVPANIC_DEVICE);
-    return 0;
+    ISADevice *dev;
+    FWCfgState *fw_cfg = fw_cfg_find();
+    if (!fw_cfg) {
+        return;
+    }
+    dev = isa_create_simple (bus, TYPE_ISA_PVPANIC_DEVICE);
+    pvpanic_fw_cfg(dev, fw_cfg);
 }

 static Property pvpanic_isa_properties[] = {
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index dbdd523..5949e7e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -193,7 +193,7 @@ static inline bool isa_ne2000_init(ISABus *bus, int base, int irq, NICInfo *nd)
 void pc_system_firmware_init(MemoryRegion *rom_memory);

 /* pvpanic.c */
-int pvpanic_init(ISABus *bus);
+void pvpanic_init(ISABus *bus);

 /* e820 types */
 #define E820_RAM        1
-- 
MST
[Qemu-devel] [PATCH v3 03/18] pc: pass PCI hole ranges to Guests
Guest currently has to jump through lots of hoops to guess the PCI hole ranges. It's fragile, and makes us change BIOS each time we add a new chipset. Let's report the window in a ROM file, to make BIOS do exactly what QEMU intends. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/pc.c | 26 ++ hw/i386/pc_piix.c| 16 +++- hw/i386/pc_q35.c | 12 ++-- include/hw/i386/pc.h | 1 + 4 files changed, 52 insertions(+), 3 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 8af1e4e..7c4794c 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -989,6 +989,31 @@ void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge) } } +/* pci-info ROM file. Little endian format */ +typedef struct PcRomPciInfo { +uint64_t w32_min; +uint64_t w32_max; +uint64_t w64_min; +uint64_t w64_max; +} PcRomPciInfo; + +static void pc_fw_cfg_guest_info(PcGuestInfo *guest_info) +{ +PcRomPciInfo *info; +if (!guest_info-has_pci_info) { +return; +} + +info = g_malloc(sizeof *info); +info-w32_min = cpu_to_le64(guest_info-pci_info.w32.begin); +info-w32_max = cpu_to_le64(guest_info-pci_info.w32.end); +info-w64_min = cpu_to_le64(guest_info-pci_info.w64.begin); +info-w64_max = cpu_to_le64(guest_info-pci_info.w64.end); +/* Pass PCI hole info to guest via a side channel. + * Required so guest PCI enumeration does the right thing. 
*/ +fw_cfg_add_file(guest_info-fw_cfg, etc/pci-info, info, sizeof *info); +} + typedef struct PcGuestInfoState { PcGuestInfo info; Notifier machine_done; @@ -1000,6 +1025,7 @@ void pc_guest_info_machine_done(Notifier *notifier, void *data) PcGuestInfoState *guest_info_state = container_of(notifier, PcGuestInfoState, machine_done); +pc_fw_cfg_guest_info(guest_info_state-info); } PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 4637bde..8a18dbe 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -57,6 +57,7 @@ static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 }; static const int ide_irq[MAX_IDE_BUS] = { 14, 15 }; static bool has_pvpanic = true; +static bool has_pci_info = true; /* PC hardware initialisation */ static void pc_init1(MemoryRegion *system_memory, @@ -126,6 +127,7 @@ static void pc_init1(MemoryRegion *system_memory, } guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size); +guest_info-has_pci_info = has_pci_info; /* Set PCI window size the way seabios has always done it. 
*/ /* Power of 2 so bios can cover it with a single MTRR */ @@ -260,8 +262,15 @@ static void pc_init_pci(QEMUMachineInitArgs *args) initrd_filename, cpu_model, 1, 1); } +static void pc_init_pci_1_5(QEMUMachineInitArgs *args) +{ +has_pci_info = false; +pc_init_pci(args); +} + static void pc_init_pci_1_4(QEMUMachineInitArgs *args) { +has_pci_info = false; has_pvpanic = false; x86_cpu_compat_set_features(n270, FEAT_1_ECX, 0, CPUID_EXT_MOVBE); pc_init_pci(args); @@ -269,6 +278,7 @@ static void pc_init_pci_1_4(QEMUMachineInitArgs *args) static void pc_init_pci_1_3(QEMUMachineInitArgs *args) { +has_pci_info = false; enable_compat_apic_id_mode(); has_pvpanic = false; pc_init_pci(args); @@ -277,6 +287,7 @@ static void pc_init_pci_1_3(QEMUMachineInitArgs *args) /* PC machine init function for pc-1.1 to pc-1.2 */ static void pc_init_pci_1_2(QEMUMachineInitArgs *args) { +has_pci_info = false; disable_kvm_pv_eoi(); enable_compat_apic_id_mode(); has_pvpanic = false; @@ -286,6 +297,7 @@ static void pc_init_pci_1_2(QEMUMachineInitArgs *args) /* PC machine init function for pc-0.14 to pc-1.0 */ static void pc_init_pci_1_0(QEMUMachineInitArgs *args) { +has_pci_info = false; disable_kvm_pv_eoi(); enable_compat_apic_id_mode(); has_pvpanic = false; @@ -302,6 +314,7 @@ static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args) const char *initrd_filename = args-initrd_filename; const char *boot_device = args-boot_device; has_pvpanic = false; +has_pci_info = false; disable_kvm_pv_eoi(); enable_compat_apic_id_mode(); pc_init1(get_system_memory(), @@ -320,6 +333,7 @@ static void pc_init_isa(QEMUMachineInitArgs *args) const char *initrd_filename = args-initrd_filename; const char *boot_device = args-boot_device; has_pvpanic = false; +has_pci_info = false; if (cpu_model == NULL) cpu_model = 486; disable_kvm_pv_eoi(); @@ -359,7 +373,7 @@ static QEMUMachine pc_i440fx_machine_v1_6 = { static QEMUMachine pc_i440fx_machine_v1_5 = { .name = pc-i440fx-1.5, .desc = Standard PC (i440FX + PIIX, 
1996), -.init = pc_init_pci, +.init = pc_init_pci_1_5,
[Qemu-devel] [PATCH v3 06/18] MAINTAINERS: s/Marcelo/Paolo/
Marcelo doesn't maintain kvm anymore, Paolo is taking over the job. Update MAINTAINERS to stop flooding Marcelo with mail.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ad9c860..11dffee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -155,7 +155,7 @@ Guest CPU Cores (KVM):

 Overall
 M: Gleb Natapov g...@redhat.com
-M: Marcelo Tosatti mtosa...@redhat.com
+M: Paolo Bonzini pbonz...@redhat.com
 L: k...@vger.kernel.org
 S: Supported
 F: kvm-*
-- 
MST
[Qemu-devel] [PATCH v3 08/18] pvpanic: fix fwcfg for big endian hosts
Convert port number to little endian when exposing it in fw cfg.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/misc/pvpanic.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
index 83ed226..792d8e4 100644
--- a/hw/misc/pvpanic.c
+++ b/hw/misc/pvpanic.c
@@ -104,10 +104,11 @@ static void pvpanic_isa_realizefn(DeviceState *dev, Error **errp)
 static void pvpanic_fw_cfg(ISADevice *dev, FWCfgState *fw_cfg)
 {
     PVPanicState *s = ISA_PVPANIC_DEVICE(dev);
+    uint16_t *pvpanic_port = g_malloc(sizeof(*pvpanic_port));
+    *pvpanic_port = cpu_to_le16(s->ioport);

-    fw_cfg_add_file(fw_cfg, "etc/pvpanic-port",
-                    g_memdup(&s->ioport, sizeof(s->ioport)),
-                    sizeof(s->ioport));
+    fw_cfg_add_file(fw_cfg, "etc/pvpanic-port", pvpanic_port,
+                    sizeof(*pvpanic_port));
 }

 void pvpanic_init(ISABus *bus)
-- 
MST
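Why the conversion matters: the firmware reads the fw_cfg blob byte by byte and expects the low byte first, so copying a host-order uint16_t (as the old g_memdup() call did) produces swapped bytes on a big endian host. The sketch below is a portable stand-in for cpu_to_le16(), written without any endianness macros; demo_cpu_to_le16 and demo_le16_read are hypothetical names, and the port value used in any test is arbitrary.

```c
#include <stdint.h>
#include <string.h>

/* Return a uint16_t whose in-memory bytes are the little endian
 * encoding of v, regardless of host byte order. On a little endian
 * host this is the identity; on a big endian host it swaps. */
static uint16_t demo_cpu_to_le16(uint16_t v)
{
    uint8_t bytes[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
    uint16_t out;

    memcpy(&out, bytes, sizeof(out));
    return out;
}

/* What a guest-side reader would do with the two bytes of the blob. */
static uint16_t demo_le16_read(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Storing the result of demo_cpu_to_le16() and handing out its raw bytes gives the same blob on every host, which is the property the patch restores.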
[Qemu-devel] [PATCH v3 12/18] pci: Use helper to find device's root bus in pci_find_domain()
From: David Gibson da...@gibson.dropbear.id.au

Currently pci_find_domain() performs two functions - it locates the PCI root bus above the given bus, then looks up that root bus's domain number. This patch adds a helper function to perform the first task, finding the root bus for a given PCI device. This is then used in pci_find_domain(). This changes pci_find_domain()'s signature slightly, taking a PCIDevice instead of a PCIBus - since all callers passed something of the form dev->bus, this simplifies things slightly.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/pci/pci-hotplug-old.c |  2 +-
 hw/pci/pci.c             | 20 +++++++++++++-------
 hw/pci/pcie_aer.c        |  3 +--
 include/hw/pci/pci.h     |  3 ++-
 4 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/hw/pci/pci-hotplug-old.c b/hw/pci/pci-hotplug-old.c
index 7a47d6b..37e0720 100644
--- a/hw/pci/pci-hotplug-old.c
+++ b/hw/pci/pci-hotplug-old.c
@@ -276,7 +276,7 @@ void pci_device_hot_add(Monitor *mon, const QDict *qdict)

     if (dev) {
         monitor_printf(mon, "OK domain %d, bus %d, slot %d, function %d\n",
-                       pci_find_domain(dev->bus),
+                       pci_find_domain(dev),
                        pci_bus_num(dev->bus), PCI_SLOT(dev->devfn),
                        PCI_FUNC(dev->devfn));
     } else
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index fc99e3b..69a6995 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -259,18 +259,24 @@ PCIBus *pci_find_primary_bus(void)
     return NULL;
 }

-int pci_find_domain(const PCIBus *bus)
+PCIBus *pci_device_root_bus(const PCIDevice *d)
 {
-    PCIDevice *d;
-    struct PCIHostBus *host;
+    PCIBus *bus = d->bus;

-    /* obtain root bus */
     while ((d = bus->parent_dev) != NULL) {
         bus = d->bus;
     }

+    return bus;
+}
+
+int pci_find_domain(const PCIDevice *dev)
+{
+    const PCIBus *rootbus = pci_device_root_bus(dev);
+    struct PCIHostBus *host;
+
     QLIST_FOREACH(host, &host_buses, next) {
-        if (host->bus == bus) {
+        if (host->bus == rootbus) {
             return host->domain;
         }
     }
@@ -1997,7 +2003,7 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
             fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
                     "Attempt to add PCI capability %x at offset "
                     "%x overlaps existing capability %x at offset %x\n",
-                    pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
+                    pci_find_domain(pdev), pci_bus_num(pdev->bus),
                     PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
                     cap_id, offset, overlapping_cap, i);
             return -EINVAL;
@@ -2152,7 +2158,7 @@ static char *pcibus_get_dev_path(DeviceState *dev)
     path[path_len] = '\0';

     /* First field is the domain. */
-    s = snprintf(domain, sizeof domain, "%04x:00", pci_find_domain(d->bus));
+    s = snprintf(domain, sizeof domain, "%04x:00", pci_find_domain(d));
     assert(s == domain_len);
     memcpy(path, domain, domain_len);

diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 1ce72ce..06f77ac 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -1022,8 +1022,7 @@ int do_pcie_aer_inject_error(Monitor *mon,
     *ret_data = qobject_from_jsonf("{'id': %s, "
                                    "'domain': %d, 'bus': %d, 'devfn': %d, "
                                    "'ret': %d}",
-                                   id,
-                                   pci_find_domain(dev->bus),
+                                   id, pci_find_domain(dev),
                                    pci_bus_num(dev->bus), dev->devfn,
                                    ret);
     assert(*ret_data);
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 7b89d88..f2bf1ed 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -390,7 +390,8 @@ void pci_for_each_device(PCIBus *bus, int bus_num,
                          void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
                          void *opaque);
 PCIBus *pci_find_primary_bus(void);
-int pci_find_domain(const PCIBus *bus);
+PCIBus *pci_device_root_bus(const PCIDevice *d);
+int pci_find_domain(const PCIDevice *dev);
 PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn);
 int pci_qdev_find_device(const char *id, PCIDevice **pdev);
 PCIBus *pci_get_bus_devfn(int *devfnp, const char *devaddr);
-- 
MST
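The root-bus walk that this patch factors out can be shown with two toy structs. This is a simplified sketch: DemoBus and DemoDevice are hypothetical stand-ins for PCIBus and PCIDevice, and the domain number is stored directly on the bus here, whereas the patch looks it up in the host_buses list.

```c
#include <stddef.h>

typedef struct DemoBus DemoBus;

/* A device sits on exactly one bus. */
typedef struct DemoDevice {
    DemoBus *bus;
} DemoDevice;

/* A bus hangs below a bridge device, or has no parent if it is a root. */
struct DemoBus {
    DemoDevice *parent_dev;
    int domain; /* simplification: only meaningful on a root bus */
};

/* Climb from the device's bus through parent bridges to the root bus. */
static DemoBus *demo_device_root_bus(const DemoDevice *d)
{
    DemoBus *bus = d->bus;

    while ((d = bus->parent_dev) != NULL) {
        bus = d->bus;
    }
    return bus;
}

/* Domain lookup expressed through the helper, as in the patch. */
static int demo_find_domain(const DemoDevice *dev)
{
    return demo_device_root_bus(dev)->domain;
}
```

Taking the device rather than the bus as the argument matches the call sites in the diff, which all passed dev->bus before the change.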
[Qemu-devel] [PATCH v3 10/18] pci: Move pci_read_devaddr to pci-hotplug-old.c
From: David Gibson da...@gibson.dropbear.id.au

pci_read_devaddr() is only used by the legacy functions for the old PCI hotplug interface in pci-hotplug-old.c. So we move the function there, and make it static.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/pci/pci-hotplug-old.c | 14 ++++++++++++++
 hw/pci/pci.c             | 16 +---------------
 include/hw/pci/pci.h     |  4 ++--
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/hw/pci/pci-hotplug-old.c b/hw/pci/pci-hotplug-old.c
index b3c233c..a0b5558 100644
--- a/hw/pci/pci-hotplug-old.c
+++ b/hw/pci/pci-hotplug-old.c
@@ -36,6 +36,20 @@
 #include "sysemu/blockdev.h"
 #include "qapi/error.h"

+static int pci_read_devaddr(Monitor *mon, const char *addr, int *domp,
+                            int *busp, unsigned *slotp)
+{
+    /* strip legacy tag */
+    if (!strncmp(addr, "pci_addr=", 9)) {
+        addr += 9;
+    }
+    if (pci_parse_devaddr(addr, domp, busp, slotp, NULL)) {
+        monitor_printf(mon, "Invalid pci address\n");
+        return -1;
+    }
+    return 0;
+}
+
 static PCIDevice *qemu_pci_hot_add_nic(Monitor *mon,
                                        const char *devaddr,
                                        const char *opts_str)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 61b681a..adf4da5 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -522,7 +522,7 @@ static void pci_set_default_subsystem_id(PCIDevice *pci_dev)
  * Parse [[domain:]bus:]slot, return -1 on error if funcp == NULL
  *       [[domain:]bus:]slot.func, return -1 on error
  */
-static int pci_parse_devaddr(const char *addr, int *domp, int *busp,
+int pci_parse_devaddr(const char *addr, int *domp, int *busp,
                       unsigned int *slotp, unsigned int *funcp)
 {
     const char *p;
@@ -581,20 +581,6 @@ static int pci_parse_devaddr(const char *addr, int *domp, int *busp,
     return 0;
 }

-int pci_read_devaddr(Monitor *mon, const char *addr, int *domp, int *busp,
-                     unsigned *slotp)
-{
-    /* strip legacy tag */
-    if (!strncmp(addr, "pci_addr=", 9)) {
-        addr += 9;
-    }
-    if (pci_parse_devaddr(addr, domp, busp, slotp, NULL)) {
-        monitor_printf(mon, "Invalid pci address\n");
-        return -1;
-    }
-    return 0;
-}
-
 PCIBus *pci_get_bus_devfn(int *devfnp, const char *devaddr)
 {
     int dom, bus;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6ef1f97..b5edef8 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -395,8 +395,8 @@ PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn);
 int pci_qdev_find_device(const char *id, PCIDevice **pdev);
 PCIBus *pci_get_bus_devfn(int *devfnp, const char *devaddr);

-int pci_read_devaddr(Monitor *mon, const char *addr, int *domp, int *busp,
-                     unsigned *slotp);
+int pci_parse_devaddr(const char *addr, int *domp, int *busp,
+                      unsigned int *slotp, unsigned int *funcp);

 void pci_device_deassert_intx(PCIDevice *dev);

-- 
MST
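The "strip legacy tag" step in pci_read_devaddr() is a small prefix check that can be exercised on its own. The sketch below isolates just that step; demo_strip_legacy_tag is a hypothetical helper, and the actual address parsing done by pci_parse_devaddr() is left out.

```c
#include <string.h>

/* Skip an optional "pci_addr=" prefix and return the rest of the string.
 * The returned pointer aliases the input; nothing is copied. */
static const char *demo_strip_legacy_tag(const char *addr)
{
    static const char tag[] = "pci_addr=";
    size_t n = sizeof(tag) - 1; /* 9 characters, without the NUL */

    if (strncmp(addr, tag, n) == 0) {
        addr += n;
    }
    return addr;
}
```

Deriving the length from the literal avoids keeping a magic 9 in sync with the tag by hand, which is a design choice of this sketch rather than something the patch does.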
[Qemu-devel] [PATCH v3 05/18] e1000: cleanup process_tx_desc
From: Andrew Jones drjo...@redhat.com Coverity complains about two overruns in process_tx_desc(). The complaints are false positives, but we might as well eliminate them. The problem is that hdr is defined as an unsigned int, but then used to offset an array of size 65536, and another of size 256 bytes. hdr will actually never be greater than 255 though, as it's assigned only once and to the value of tp-hdr_len, which is an uint8_t. This patch simply gets rid of hdr, replacing it with tp-hdr_len, which makes it consistent with all other tp member use in the function. v2: - also cleanup coding style issues in the touched lines Signed-off-by: Andrew Jones drjo...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/net/e1000.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index e6f46f0..620f947 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -556,7 +556,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp) uint32_t txd_lower = le32_to_cpu(dp-lower.data); uint32_t dtype = txd_lower (E1000_TXD_CMD_DEXT | E1000_TXD_DTYP_D); unsigned int split_size = txd_lower 0x, bytes, sz, op; -unsigned int msh = 0xf, hdr = 0; +unsigned int msh = 0xf; uint64_t addr; struct e1000_context_desc *xp = (struct e1000_context_desc *)dp; struct e1000_tx *tp = s-tx; @@ -603,8 +603,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp) addr = le64_to_cpu(dp-buffer_addr); if (tp-tse tp-cptse) { -hdr = tp-hdr_len; -msh = hdr + tp-mss; +msh = tp-hdr_len + tp-mss; do { bytes = split_size; if (tp-size + bytes msh) @@ -612,14 +611,16 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp) bytes = MIN(sizeof(tp-data) - tp-size, bytes); pci_dma_read(s-dev, addr, tp-data + tp-size, bytes); -if ((sz = tp-size + bytes) = hdr tp-size hdr) -memmove(tp-header, tp-data, hdr); +sz = tp-size + bytes; +if (sz = tp-hdr_len tp-size tp-hdr_len) { +memmove(tp-header, tp-data, tp-hdr_len); +} tp-size = sz; addr += bytes; if 
(sz == msh) { xmit_seg(s); -memmove(tp-data, tp-header, hdr); -tp-size = hdr; +memmove(tp-data, tp-header, tp-hdr_len); +tp-size = tp-hdr_len; } } while (split_size -= bytes); } else if (!tp-tse tp-cptse) { @@ -633,8 +634,9 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp) if (!(txd_lower E1000_TXD_CMD_EOP)) return; -if (!(tp-tse tp-cptse tp-size hdr)) +if (!(tp-tse tp-cptse tp-size tp-hdr_len)) { xmit_seg(s); +} tp-tso_frames = 0; tp-sum_needed = 0; tp-vlan_needed = 0; -- MST
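The reasoning in the commit message (a uint8_t length can never index past a 256-byte buffer) can be sketched in isolation. This is only an illustration, not the e1000 code: struct tx_state, save_header() and restore_header() are hypothetical names, and the buffer sizes are the ones quoted in the message.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the transmit state in the patch. */
struct tx_state {
    uint8_t hdr_len;              /* a uint8_t can never exceed 255 */
    unsigned int size;            /* bytes currently buffered in data[] */
    unsigned char header[256];    /* sizes as quoted in the commit message */
    unsigned char data[0x10000];
};

/* Copy the first hdr_len bytes of data[] aside. Using the uint8_t member
 * directly (instead of an unsigned int copy) keeps the bound visible to
 * both readers and static analysers: the length is at most 255 < 256. */
static void save_header(struct tx_state *tp)
{
    memmove(tp->header, tp->data, tp->hdr_len);
}

/* Put the saved header back at the start of data[] for the next segment. */
static void restore_header(struct tx_state *tp)
{
    memmove(tp->data, tp->header, tp->hdr_len);
    tp->size = tp->hdr_len;
}
```

With the member used directly, no wider temporary exists whose range a checker would have to prove, which appears to be the point of dropping hdr.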
[Qemu-devel] [PATCH v3 09/18] pci: Cleanup configuration for pci-hotplug.c
From: David Gibson da...@gibson.dropbear.id.au

pci-hotplug.c and the CONFIG_PCI_HOTPLUG variable which controls its compilation are misnamed. They're not about PCI hotplug in general, but rather about the pci_add/pci_del interface, which is now deprecated in favour of the more general device_add/device_del interface. This patch therefore renames them to pci-hotplug-old.c and CONFIG_PCI_HOTPLUG_OLD.

CONFIG_PCI_HOTPLUG=y was listed twice in {i386,x86_64}-softmmu.mak for no particular reason, so we clean that up too. In addition it was included in ppc64-softmmu.mak, for which the old hotplug interface was never used and is unsuitable, so we remove that too.

Most of pci-hotplug.c was additionally protected by #ifdef TARGET_I386. The small piece which wasn't is only called from the pci_add and pci_del hooks in hmp-commands.hx, which themselves were protected by #ifdef TARGET_I386. This patch therefore also removes the #ifdef from pci-hotplug-old.c, and changes the ifdefs in hmp-commands.hx to use CONFIG_PCI_HOTPLUG_OLD.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Michael S.
Tsirkin m...@redhat.com --- default-configs/i386-softmmu.mak| 3 +-- default-configs/ppc64-softmmu.mak | 2 -- default-configs/x86_64-softmmu.mak | 3 +-- hmp-commands.hx | 4 ++-- hw/pci/Makefile.objs| 2 +- hw/pci/{pci-hotplug.c = pci-hotplug-old.c} | 6 +++--- 6 files changed, 8 insertions(+), 12 deletions(-) rename hw/pci/{pci-hotplug.c = pci-hotplug-old.c} (98%) diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak index 03deca2..4a0fc9c 100644 --- a/default-configs/i386-softmmu.mak +++ b/default-configs/i386-softmmu.mak @@ -28,11 +28,10 @@ CONFIG_APPLESMC=y CONFIG_I8259=y CONFIG_PFLASH_CFI01=y CONFIG_TPM_TIS=$(CONFIG_TPM) -CONFIG_PCI_HOTPLUG=y +CONFIG_PCI_HOTPLUG_OLD=y CONFIG_MC146818RTC=y CONFIG_PAM=y CONFIG_PCI_PIIX=y -CONFIG_PCI_HOTPLUG=y CONFIG_WDT_IB700=y CONFIG_PC_SYSFW=y CONFIG_XEN_I386=$(CONFIG_XEN) diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak index cb279cb..5a72b5f 100644 --- a/default-configs/ppc64-softmmu.mak +++ b/default-configs/ppc64-softmmu.mak @@ -45,7 +45,5 @@ CONFIG_OPENPIC=y CONFIG_PSERIES=y CONFIG_E500=y CONFIG_OPENPIC_KVM=$(and $(CONFIG_E500),$(CONFIG_KVM)) -# For pSeries -CONFIG_PCI_HOTPLUG=y # For PReP CONFIG_MC146818RTC=y diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak index 599b630..10bb0c6 100644 --- a/default-configs/x86_64-softmmu.mak +++ b/default-configs/x86_64-softmmu.mak @@ -28,11 +28,10 @@ CONFIG_APPLESMC=y CONFIG_I8259=y CONFIG_PFLASH_CFI01=y CONFIG_TPM_TIS=$(CONFIG_TPM) -CONFIG_PCI_HOTPLUG=y +CONFIG_PCI_HOTPLUG_OLD=y CONFIG_MC146818RTC=y CONFIG_PAM=y CONFIG_PCI_PIIX=y -CONFIG_PCI_HOTPLUG=y CONFIG_WDT_IB700=y CONFIG_PC_SYSFW=y CONFIG_XEN_I386=$(CONFIG_XEN) diff --git a/hmp-commands.hx b/hmp-commands.hx index 915b0d1..d1cdcfb 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1077,7 +1077,7 @@ STEXI Add drive to PCI storage controller. 
ETEXI -#if defined(TARGET_I386) +#if defined(CONFIG_PCI_HOTPLUG_OLD) { .name = pci_add, .args_type = pci_addr:s,type:s,opts:s?, @@ -1093,7 +1093,7 @@ STEXI Hot-add PCI device. ETEXI -#if defined(TARGET_I386) +#if defined(CONFIG_PCI_HOTPLUG_OLD) { .name = pci_del, .args_type = pci_addr:s, diff --git a/hw/pci/Makefile.objs b/hw/pci/Makefile.objs index a7fb9d0..720f438 100644 --- a/hw/pci/Makefile.objs +++ b/hw/pci/Makefile.objs @@ -8,4 +8,4 @@ common-obj-$(CONFIG_PCI) += pcie.o pcie_aer.o pcie_port.o common-obj-$(CONFIG_NO_PCI) += pci-stub.o common-obj-$(CONFIG_ALL) += pci-stub.o -obj-$(CONFIG_PCI_HOTPLUG) += pci-hotplug.o +common-obj-$(CONFIG_PCI_HOTPLUG_OLD) += pci-hotplug-old.o diff --git a/hw/pci/pci-hotplug.c b/hw/pci/pci-hotplug-old.c similarity index 98% rename from hw/pci/pci-hotplug.c rename to hw/pci/pci-hotplug-old.c index 12287d1..b3c233c 100644 --- a/hw/pci/pci-hotplug.c +++ b/hw/pci/pci-hotplug-old.c @@ -1,5 +1,7 @@ /* - * QEMU PCI hotplug support + * Deprecated PCI hotplug interface support + * This covers the old pci_add / pci_del command, whereas the more general + * device_add / device_del commands are now preferred. * * Copyright (c) 2004 Fabrice Bellard * @@ -34,7 +36,6 @@ #include sysemu/blockdev.h #include qapi/error.h -#if defined(TARGET_I386) static PCIDevice *qemu_pci_hot_add_nic(Monitor *mon, const char *devaddr, const char *opts_str) @@ -257,7 +258,6 @@ void pci_device_hot_add(Monitor *mon, const QDict *qdict) } else monitor_printf(mon, failed to add %s\n, opts); } -#endif static int pci_device_hot_remove(Monitor *mon, const char *pci_addr) { -- MST
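The switch from a target test to a feature test in hmp-commands.hx can be sketched as a small command table. This is a rough illustration under assumptions: struct cmd, cmds[] and have_cmd() are hypothetical, and the macro is defined by hand here, whereas in the real tree it would come from the build configuration.

```c
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: defined locally so the block builds alone. */
#define CONFIG_PCI_HOTPLUG_OLD 1

struct cmd {
    const char *name;
};

/* The legacy commands are compiled in only when the feature is configured,
 * rather than whenever a particular target architecture is being built. */
static const struct cmd cmds[] = {
    { "device_add" },
    { "device_del" },
#if defined(CONFIG_PCI_HOTPLUG_OLD)
    { "pci_add" },
    { "pci_del" },
#endif
};

/* Return 1 if the named command is present in this build, 0 otherwise. */
static int have_cmd(const char *name)
{
    for (size_t i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++) {
        if (strcmp(cmds[i].name, name) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Keying the table on the feature macro means a target that never sets it simply does not see the old commands.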
[Qemu-devel] [PATCH v3 13/18] pci: Replace pci_find_domain() with more general pci_root_bus_path()
From: David Gibson da...@gibson.dropbear.id.au

pci_find_domain() is used in a number of places where we want an id for a whole PCI domain (i.e. the subtree under a PCI root bus). The trouble is that many platforms may support multiple independent host bridges with no hardware supplied notion of domain number.

This patch, therefore, replaces calls to pci_find_domain() with calls to a new pci_root_bus_path() returning a string. The new call is implemented in terms of a new callback in the host bridge class, so it can be defined in some way that's well defined for the platform. When no callback is available we fall back on the qbus name.

Most current uses of pci_find_domain() are for error or informational messages, so the change in identifiers should be harmless. The exception is pci_get_dev_path(), whose results form part of migration streams. To maintain compatibility with old migration streams, the PIIX PCI host is altered to always supply a fixed string for this path, which matches the old domain number (since the code didn't actually support domains other than 0).

For the pseries (spapr) PCI bridge we use a different platform-unique identifier (pseries machines can routinely have dozens of PCI host bridges). Theoretically that breaks migration streams, but given that we don't yet have migration support for pseries, it doesn't matter.

Any other machines that have working migration support including PCI devices will need to be updated to maintain migration stream compatibility.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Michael S.
Tsirkin m...@redhat.com --- hw/pci-host/piix.c| 9 + hw/pci-host/q35.c | 9 + hw/pci/pci-hotplug-old.c | 4 ++-- hw/pci/pci.c | 38 -- hw/pci/pci_host.c | 1 + hw/pci/pcie_aer.c | 8 hw/ppc/spapr_pci.c| 10 ++ include/hw/pci/pci.h | 2 +- include/hw/pci/pci_host.h | 10 ++ 9 files changed, 66 insertions(+), 25 deletions(-) diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index f9e68c3..c36e725 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -629,11 +629,20 @@ static const TypeInfo i440fx_info = { .class_init= i440fx_class_init, }; +static const char *i440fx_pcihost_root_bus_path(PCIHostState *host_bridge, +PCIBus *rootbus) +{ +/* For backwards compat with old device paths */ +return ; +} + static void i440fx_pcihost_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); +PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass); +hc-root_bus_path = i440fx_pcihost_root_bus_path; k-init = i440fx_pcihost_initfn; dc-fw_name = pci; dc-no_user = 1; diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c index 3a5cff9..13148ed 100644 --- a/hw/pci-host/q35.c +++ b/hw/pci-host/q35.c @@ -63,6 +63,13 @@ static int q35_host_init(SysBusDevice *dev) return 0; } +static const char *q35_host_root_bus_path(PCIHostState *host_bridge, + PCIBus *rootbus) +{ +/* For backwards compat with old device paths */ +return ; +} + static Property mch_props[] = { DEFINE_PROP_UINT64(MCFG, Q35PCIHost, host.base_addr, MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT), @@ -73,7 +80,9 @@ static void q35_host_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); +PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass); +hc-root_bus_path = q35_host_root_bus_path; k-init = q35_host_init; dc-props = mch_props; dc-fw_name = pci; diff --git a/hw/pci/pci-hotplug-old.c b/hw/pci/pci-hotplug-old.c index 37e0720..e251810 100644 --- a/hw/pci/pci-hotplug-old.c +++ 
b/hw/pci/pci-hotplug-old.c @@ -275,8 +275,8 @@ void pci_device_hot_add(Monitor *mon, const QDict *qdict) } if (dev) { -monitor_printf(mon, OK domain %d, bus %d, slot %d, function %d\n, - pci_find_domain(dev), +monitor_printf(mon, OK root bus %s, bus %d, slot %d, function %d\n, + pci_root_bus_path(dev), pci_bus_num(dev-bus), PCI_SLOT(dev-devfn), PCI_FUNC(dev-devfn)); } else diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 69a6995..350b872 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -25,6 +25,7 @@ #include hw/pci/pci.h #include hw/pci/pci_bridge.h #include hw/pci/pci_bus.h +#include hw/pci/pci_host.h #include monitor/monitor.h #include net/net.h #include sysemu/sysemu.h @@ -270,19 +271,20 @@ PCIBus *pci_device_root_bus(const PCIDevice *d) return bus; } -int pci_find_domain(const PCIDevice *dev) +const char *pci_root_bus_path(PCIDevice *dev) { -const PCIBus *rootbus =
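The callback-with-fallback arrangement described in the commit message (a per-class hook, falling back to the bus name when no hook is set) can be sketched as follows. All names here are hypothetical stand-ins rather than the QEMU types, and the string returned by the legacy hook is a placeholder, not the value used by the patch.

```c
#include <stddef.h>

struct host_bridge;

/* Hypothetical per-class hook: a platform may supply its own identifier. */
struct host_bridge_class {
    const char *(*root_bus_path)(const struct host_bridge *hb);
};

struct host_bridge {
    const struct host_bridge_class *klass;
    const char *bus_name;          /* stand-in for the qbus name */
};

/* Use the class callback when present, otherwise fall back to the name. */
static const char *root_bus_path(const struct host_bridge *hb)
{
    if (hb->klass && hb->klass->root_bus_path) {
        return hb->klass->root_bus_path(hb);
    }
    return hb->bus_name;
}

/* A platform that must keep old identifiers stable returns a constant.
 * "fixed-id" is a placeholder chosen for this sketch only. */
static const char *legacy_root_bus_path(const struct host_bridge *hb)
{
    (void)hb;
    return "fixed-id";
}
```

A bridge whose class leaves the hook unset gets its bus name back, which is the fallback the message describes.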
[Qemu-devel] [PATCH v3 15/18] pci: Add root bus parameter to pci_nic_init()
From: David Gibson da...@gibson.dropbear.id.au At present, pci_nic_init() and pci_nic_init_nofail() assume that they will only create a NIC under the primary PCI root. As we add support for multiple PCI roots, that may no longer be the case. This patch adds a root bus parameter to pci_nic_init() (and updates callers accordingly) to allow the machine init code using it to specify the right PCI root for NICs created by old-style -net nic parameters. NICs created new-style, with -device can of course be put anywhere. Signed-off-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/alpha/dp264.c | 2 +- hw/arm/realview.c| 6 -- hw/arm/versatilepb.c | 2 +- hw/i386/pc.c | 2 +- hw/mips/mips_fulong2e.c | 6 +++--- hw/mips/mips_malta.c | 6 +++--- hw/pci/pci-hotplug-old.c | 3 ++- hw/pci/pci.c | 10 ++ hw/ppc/e500.c| 2 +- hw/ppc/mac_newworld.c| 2 +- hw/ppc/mac_oldworld.c| 2 +- hw/ppc/ppc440_bamboo.c | 2 +- hw/ppc/prep.c| 2 +- hw/ppc/spapr.c | 2 +- hw/sh4/r2d.c | 5 - hw/sparc64/sun4u.c | 2 +- include/hw/pci/pci.h | 6 -- 17 files changed, 36 insertions(+), 26 deletions(-) diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c index 8695efb..8dad08f 100644 --- a/hw/alpha/dp264.c +++ b/hw/alpha/dp264.c @@ -89,7 +89,7 @@ static void clipper_init(QEMUMachineInitArgs *args) /* Network setup. e1000 is good enough, failing Tulip support. */ for (i = 0; i nb_nics; i++) { -pci_nic_init_nofail(nd_table[i], e1000, NULL); +pci_nic_init_nofail(nd_table[i], pci_bus, e1000, NULL); } /* IDE disk setup. 
*/ diff --git a/hw/arm/realview.c b/hw/arm/realview.c index d6f47bf..036a188 100644 --- a/hw/arm/realview.c +++ b/hw/arm/realview.c @@ -59,7 +59,7 @@ static void realview_init(QEMUMachineInitArgs *args, qemu_irq *irqp; qemu_irq pic[64]; qemu_irq mmc_irq[2]; -PCIBus *pci_bus; +PCIBus *pci_bus = NULL; NICInfo *nd; i2c_bus *i2c; int n; @@ -250,7 +250,9 @@ static void realview_init(QEMUMachineInitArgs *args, } done_nic = 1; } else { -pci_nic_init_nofail(nd, rtl8139, NULL); +if (pci_bus) { +pci_nic_init_nofail(nd, pci_bus, rtl8139, NULL); +} } } diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c index 753757e..15eb086 100644 --- a/hw/arm/versatilepb.c +++ b/hw/arm/versatilepb.c @@ -244,7 +244,7 @@ static void versatile_init(QEMUMachineInitArgs *args, int board_id) smc91c111_init(nd, 0x1001, sic[25]); done_smc = 1; } else { -pci_nic_init_nofail(nd, rtl8139, NULL); +pci_nic_init_nofail(nd, pci_bus, rtl8139, NULL); } } if (usb_enabled(false)) { diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 7c4794c..80c27d6 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1310,7 +1310,7 @@ void pc_nic_init(ISABus *isa_bus, PCIBus *pci_bus) if (!pci_bus || (nd-model strcmp(nd-model, ne2k_isa) == 0)) { pc_init_ne2k_isa(isa_bus, nd); } else { -pci_nic_init_nofail(nd, e1000, NULL); +pci_nic_init_nofail(nd, pci_bus, e1000, NULL); } } } diff --git a/hw/mips/mips_fulong2e.c b/hw/mips/mips_fulong2e.c index 00c9071..db67966 100644 --- a/hw/mips/mips_fulong2e.c +++ b/hw/mips/mips_fulong2e.c @@ -231,7 +231,7 @@ static void audio_init (PCIBus *pci_bus) } /* Network support */ -static void network_init (void) +static void network_init (PCIBus *pci_bus) { int i; @@ -244,7 +244,7 @@ static void network_init (void) default_devaddr = 07; } -pci_nic_init_nofail(nd, rtl8139, default_devaddr); +pci_nic_init_nofail(nd, pci_bus, rtl8139, default_devaddr); } } @@ -393,7 +393,7 @@ static void mips_fulong2e_init(QEMUMachineInitArgs *args) /* Sound card */ audio_init(pci_bus); /* Network card */ 
-network_init(); +network_init(pci_bus); } static QEMUMachine mips_fulong2e_machine = { diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c index 8a4459d..5843fad 100644 --- a/hw/mips/mips_malta.c +++ b/hw/mips/mips_malta.c @@ -468,7 +468,7 @@ static MaltaFPGAState *malta_fpga_init(MemoryRegion *address_space, } /* Network support */ -static void network_init(void) +static void network_init(PCIBus *pci_bus) { int i; @@ -480,7 +480,7 @@ static void network_init(void) /* The malta board has a PCNet card using PCI SLOT 11 */ default_devaddr = 0b; -pci_nic_init_nofail(nd, pcnet, default_devaddr); +pci_nic_init_nofail(nd, pci_bus, pcnet, default_devaddr); } } @@ -985,7 +985,7 @@ void mips_malta_init(QEMUMachineInitArgs *args) fdctrl_init_isa(isa_bus, fd); /* Network card */ -
[Qemu-devel] [PATCH v3 11/18] pci: Abolish pci_find_root_bus()
From: David Gibson da...@gibson.dropbear.id.au pci_find_root_bus() takes a domain parameter. Currently PCI root buses with domain other than 0 can't be created, so this is more or less a long winded way of retrieving the main PCI root bus. Numbered domains don't actually properly cover the (non x86) possibilities for multiple PCI root buses, so this patch for now enforces the domain == 0 restriction in other places to replace pci_find_root_bus() with an explicit pci_find_primary_bus(). Signed-off-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci/pci-hotplug-old.c | 34 +- hw/pci/pci.c | 19 +++ include/hw/pci/pci.h | 2 +- 3 files changed, 41 insertions(+), 14 deletions(-) diff --git a/hw/pci/pci-hotplug-old.c b/hw/pci/pci-hotplug-old.c index a0b5558..7a47d6b 100644 --- a/hw/pci/pci-hotplug-old.c +++ b/hw/pci/pci-hotplug-old.c @@ -36,17 +36,23 @@ #include sysemu/blockdev.h #include qapi/error.h -static int pci_read_devaddr(Monitor *mon, const char *addr, int *domp, +static int pci_read_devaddr(Monitor *mon, const char *addr, int *busp, unsigned *slotp) { +int dom; + /* strip legacy tag */ if (!strncmp(addr, pci_addr=, 9)) { addr += 9; } -if (pci_parse_devaddr(addr, domp, busp, slotp, NULL)) { +if (pci_parse_devaddr(addr, dom, busp, slotp, NULL)) { monitor_printf(mon, Invalid pci address\n); return -1; } +if (dom != 0) { +monitor_printf(mon, Multiple PCI domains not supported, use device_add\n); +return -1; +} return 0; } @@ -128,18 +134,22 @@ static int scsi_hot_add(Monitor *mon, DeviceState *adapter, int pci_drive_hot_add(Monitor *mon, const QDict *qdict, DriveInfo *dinfo) { -int dom, pci_bus; +int pci_bus; unsigned slot; +PCIBus *root = pci_find_primary_bus(); PCIDevice *dev; const char *pci_addr = qdict_get_str(qdict, pci_addr); switch (dinfo-type) { case IF_SCSI: -if (pci_read_devaddr(mon, pci_addr, dom, pci_bus, slot)) { +if (!root) { +monitor_printf(mon, no primary PCI bus\n); +goto err; +} +if 
(pci_read_devaddr(mon, pci_addr, pci_bus, slot)) { goto err; } -dev = pci_find_device(pci_find_root_bus(dom), pci_bus, - PCI_DEVFN(slot, 0)); +dev = pci_find_device(root, pci_bus, PCI_DEVFN(slot, 0)); if (!dev) { monitor_printf(mon, no pci device with address %s\n, pci_addr); goto err; @@ -275,16 +285,22 @@ void pci_device_hot_add(Monitor *mon, const QDict *qdict) static int pci_device_hot_remove(Monitor *mon, const char *pci_addr) { +PCIBus *root = pci_find_primary_bus(); PCIDevice *d; -int dom, bus; +int bus; unsigned slot; Error *local_err = NULL; -if (pci_read_devaddr(mon, pci_addr, dom, bus, slot)) { +if (!root) { +monitor_printf(mon, no primary PCI bus\n); +return -1; +} + +if (pci_read_devaddr(mon, pci_addr, bus, slot)) { return -1; } -d = pci_find_device(pci_find_root_bus(dom), bus, PCI_DEVFN(slot, 0)); +d = pci_find_device(root, bus, PCI_DEVFN(slot, 0)); if (!d) { monitor_printf(mon, slot %d empty\n, slot); return -1; diff --git a/hw/pci/pci.c b/hw/pci/pci.c index adf4da5..fc99e3b 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -246,12 +246,12 @@ static void pci_host_bus_register(int domain, PCIBus *bus) QLIST_INSERT_HEAD(host_buses, host, next); } -PCIBus *pci_find_root_bus(int domain) +PCIBus *pci_find_primary_bus(void) { struct PCIHostBus *host; QLIST_FOREACH(host, host_buses, next) { -if (host-domain == domain) { +if (host-domain == 0) { return host-bus; } } @@ -583,20 +583,31 @@ int pci_parse_devaddr(const char *addr, int *domp, int *busp, PCIBus *pci_get_bus_devfn(int *devfnp, const char *devaddr) { +PCIBus *root = pci_find_primary_bus(); int dom, bus; unsigned slot; +if (!root) { +fprintf(stderr, No primary PCI bus\n); +return NULL; +} + if (!devaddr) { *devfnp = -1; -return pci_find_bus_nr(pci_find_root_bus(0), 0); +return pci_find_bus_nr(root, 0); } if (pci_parse_devaddr(devaddr, dom, bus, slot, NULL) 0) { return NULL; } +if (dom != 0) { +fprintf(stderr, No support for non-zero PCI domains\n); +return NULL; +} + *devfnp = PCI_DEVFN(slot, 0); 
-return pci_find_bus_nr(pci_find_root_bus(dom), bus); +return pci_find_bus_nr(root, bus); } static void pci_init_cmask(PCIDevice *dev) diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index b5edef8..7b89d88 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -389,7 +389,7
[Qemu-devel] [PATCH v3 16/18] pci: Simpler implementation of primary PCI bus
From: David Gibson da...@gibson.dropbear.id.au Currently pci_find_primary_bus() searches the list of root buses for one with domain 0. But since host buses are always registered with domain 0, this just amounts to finding the only PCI host bus. The only remaining users of pci_find_primary_bus() are in pci-hotplug-old.c, which implements the old style pci_add/pci_del commands. Therefore, this patch redefines pci_find_primary_bus() to find the only PCI root bus, returning an error if there are multiple roots. The callers in pci-hotplug-old.c are updated correspondingly, to produce sensible error messages. Signed-off-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci/pci-hotplug-old.c | 26 -- hw/pci/pci.c | 9 ++--- 2 files changed, 26 insertions(+), 9 deletions(-) diff --git a/hw/pci/pci-hotplug-old.c b/hw/pci/pci-hotplug-old.c index 807260c..8077289 100644 --- a/hw/pci/pci-hotplug-old.c +++ b/hw/pci/pci-hotplug-old.c @@ -62,10 +62,17 @@ static PCIDevice *qemu_pci_hot_add_nic(Monitor *mon, { Error *local_err = NULL; QemuOpts *opts; +PCIBus *root = pci_find_primary_bus(); PCIBus *bus; int ret, devfn; -bus = pci_get_bus_devfn(devfn, pci_find_primary_bus(), devaddr); +if (!root) { +monitor_printf(mon, no primary PCI bus (if there are multiple +PCI roots, you must use device_add instead)); +return NULL; +} + +bus = pci_get_bus_devfn(devfn, root, devaddr); if (!bus) { monitor_printf(mon, Invalid PCI device address %s\n, devaddr); return NULL; @@ -92,8 +99,7 @@ static PCIDevice *qemu_pci_hot_add_nic(Monitor *mon, monitor_printf(mon, Parameter addr not supported\n); return NULL; } -return pci_nic_init(nd_table[ret], pci_find_primary_bus(), -rtl8139, devaddr); +return pci_nic_init(nd_table[ret], root, rtl8139, devaddr); } static int scsi_hot_add(Monitor *mon, DeviceState *adapter, @@ -144,7 +150,8 @@ int pci_drive_hot_add(Monitor *mon, const QDict *qdict, DriveInfo *dinfo) switch (dinfo-type) { case IF_SCSI: if (!root) { 
-monitor_printf(mon, no primary PCI bus\n); +monitor_printf(mon, no primary PCI bus (if there are multiple +PCI roots, you must use device_add instead)); goto err; } if (pci_read_devaddr(mon, pci_addr, pci_bus, slot)) { @@ -177,6 +184,7 @@ static PCIDevice *qemu_pci_hot_add_storage(Monitor *mon, DriveInfo *dinfo = NULL; int type = -1; char buf[128]; +PCIBus *root = pci_find_primary_bus(); PCIBus *bus; int devfn; @@ -206,7 +214,12 @@ static PCIDevice *qemu_pci_hot_add_storage(Monitor *mon, dinfo = NULL; } -bus = pci_get_bus_devfn(devfn, pci_find_primary_bus(), devaddr); +if (!root) { +monitor_printf(mon, no primary PCI bus (if there are multiple +PCI roots, you must use device_add instead)); +return NULL; +} +bus = pci_get_bus_devfn(devfn, root, devaddr); if (!bus) { monitor_printf(mon, Invalid PCI device address %s\n, devaddr); return NULL; @@ -293,7 +306,8 @@ static int pci_device_hot_remove(Monitor *mon, const char *pci_addr) Error *local_err = NULL; if (!root) { -monitor_printf(mon, no primary PCI bus\n); +monitor_printf(mon, no primary PCI bus (if there are multiple +PCI roots, you must use device_del instead)); return -1; } diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 2f2db0f..e0995aa 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -249,15 +249,18 @@ static void pci_host_bus_register(int domain, PCIBus *bus) PCIBus *pci_find_primary_bus(void) { +PCIBus *primary_bus = NULL; struct PCIHostBus *host; QLIST_FOREACH(host, host_buses, next) { -if (host-domain == 0) { -return host-bus; +if (primary_bus) { +/* We have multiple root buses, refuse to select a primary */ +return NULL; } +primary_bus = host-bus; } -return NULL; +return primary_bus; } PCIBus *pci_device_root_bus(const PCIDevice *d) -- MST
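The "return the only root, or nothing if there is more than one" rule from the pci.c hunk can be tried out on a plain singly linked list. This is a minimal sketch with hypothetical names (struct root, find_only_root), not the QLIST-based code in the patch.

```c
#include <stddef.h>

/* Hypothetical list node standing in for a registered PCI root. */
struct root {
    struct root *next;
    int id;
};

/* Return the single element of the list. An empty list and a list with
 * two or more elements both yield NULL, so callers cannot silently pick
 * an arbitrary root when several exist. */
static struct root *find_only_root(struct root *head)
{
    struct root *only = NULL;

    for (struct root *r = head; r != NULL; r = r->next) {
        if (only) {
            return NULL;   /* second element seen: refuse to choose */
        }
        only = r;
    }
    return only;
}
```

Because NULL covers both "none" and "several", the callers in the patch print one message that mentions both possibilities.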
Re: [Qemu-devel] [PATCH] Makefile: disable parallel build with dtc
On 4 July 2013 09:06, Michael S. Tsirkin m...@redhat.com wrote:
> Sometimes I get this error when building with -j 4:
>
> ar: two different operation options specified
> make[1]: *** [libfdt/libfdt.a] Error 1
> make: *** [subdir-dtc] Error 2
>
> dtc make does not seem to support parallel make.
> Force non-parallel build to fix this.

So, this is the second time somebody's reported this, and I think it would be better to try to figure out what's going on. Can you report what the actual ar command is when run with V=1? Also, can you confirm that you haven't got an environment that sets ARFLAGS to something weird (including )?

thanks
-- PMM
[Qemu-devel] [PATCH v3 14/18] pci: Add root bus argument to pci_get_bus_devfn()
From: David Gibson da...@gibson.dropbear.id.au pci_get_bus_devfn() interprets a full PCI address string to give a PCIBus * and device/function number within that bus. Currently it assumes it is working on an address under the primary PCI root bus. This patch extends it to allow the caller to specify a root bus. This might seem a little odd since the supplied address can (theoretically) include a PCI domain number. However, attempting to use a non-zero domain number there is currently an error, so that shouldn't really cause problems. Signed-off-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci/pci-hotplug-old.c | 4 ++-- hw/pci/pci.c | 7 --- include/hw/pci/pci.h | 2 +- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/hw/pci/pci-hotplug-old.c b/hw/pci/pci-hotplug-old.c index e251810..e92d646 100644 --- a/hw/pci/pci-hotplug-old.c +++ b/hw/pci/pci-hotplug-old.c @@ -65,7 +65,7 @@ static PCIDevice *qemu_pci_hot_add_nic(Monitor *mon, PCIBus *bus; int ret, devfn; -bus = pci_get_bus_devfn(devfn, devaddr); +bus = pci_get_bus_devfn(devfn, pci_find_primary_bus(), devaddr); if (!bus) { monitor_printf(mon, Invalid PCI device address %s\n, devaddr); return NULL; @@ -205,7 +205,7 @@ static PCIDevice *qemu_pci_hot_add_storage(Monitor *mon, dinfo = NULL; } -bus = pci_get_bus_devfn(devfn, devaddr); +bus = pci_get_bus_devfn(devfn, pci_find_primary_bus(), devaddr); if (!bus) { monitor_printf(mon, Invalid PCI device address %s\n, devaddr); return NULL; diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 350b872..c4f63ad 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -589,12 +589,13 @@ int pci_parse_devaddr(const char *addr, int *domp, int *busp, return 0; } -PCIBus *pci_get_bus_devfn(int *devfnp, const char *devaddr) +PCIBus *pci_get_bus_devfn(int *devfnp, PCIBus *root, const char *devaddr) { -PCIBus *root = pci_find_primary_bus(); int dom, bus; unsigned slot; +assert(!root-parent_dev); + if (!root) { fprintf(stderr, No 
primary PCI bus\n); return NULL; @@ -1588,7 +1589,7 @@ PCIDevice *pci_nic_init(NICInfo *nd, const char *default_model, if (i 0) return NULL; -bus = pci_get_bus_devfn(devfn, devaddr); +bus = pci_get_bus_devfn(devfn, pci_find_primary_bus(), devaddr); if (!bus) { error_report(Invalid PCI device address %s for device %s, devaddr, pci_nic_names[i]); diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index e0597b7..3a43fba 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -394,7 +394,7 @@ PCIBus *pci_device_root_bus(const PCIDevice *d); const char *pci_root_bus_path(PCIDevice *dev); PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn); int pci_qdev_find_device(const char *id, PCIDevice **pdev); -PCIBus *pci_get_bus_devfn(int *devfnp, const char *devaddr); +PCIBus *pci_get_bus_devfn(int *devfnp, PCIBus *root, const char *devaddr); int pci_parse_devaddr(const char *addr, int *domp, int *busp, unsigned int *slotp, unsigned int *funcp); -- MST
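In the pci.c hunk above, the new assertion on the root's parent appears before the existing NULL test on root. A caller-supplied root is usually validated in the opposite order; the sketch below shows that ordering with hypothetical names (struct bus, checked_root) and is not a proposed change to the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bus type: parent is NULL only for a root bus. */
struct bus {
    const struct bus *parent;
    int nr;
};

/* Validate a caller-supplied root bus. The NULL test comes first, so the
 * assertion never dereferences a missing pointer. */
static const struct bus *checked_root(const struct bus *root)
{
    if (!root) {
        return NULL;             /* no root given: report failure */
    }
    assert(!root->parent);       /* root is non-NULL here; must be a root */
    return root;
}
```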
[Qemu-devel] [PATCH v3 18/18] pci: Fold host_buses list into PCIHostState functionality
From: David Gibson da...@gibson.dropbear.id.au The host_buses list is an odd structure - a list of pointers to PCI root buses existing in parallel to the normal qdev tree structure. This patch removes it, instead putting the link pointers into the PCIHostState structure, which have a 1:1 relationship to PCIHostBus structures anyway. Signed-off-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci/pci.c | 33 ++--- include/hw/pci/pci_host.h | 2 ++ 2 files changed, 16 insertions(+), 19 deletions(-) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index d861b40..8680063 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -90,11 +90,7 @@ static void pci_del_option_rom(PCIDevice *pdev); static uint16_t pci_default_sub_vendor_id = PCI_SUBVENDOR_ID_REDHAT_QUMRANET; static uint16_t pci_default_sub_device_id = PCI_SUBDEVICE_ID_QEMU; -struct PCIHostBus { -struct PCIBus *bus; -QLIST_ENTRY(PCIHostBus) next; -}; -static QLIST_HEAD(, PCIHostBus) host_buses; +static QLIST_HEAD(, PCIHostState) pci_host_bridges; static const VMStateDescription vmstate_pcibus = { .name = PCIBUS, @@ -237,20 +233,19 @@ static int pcibus_reset(BusState *qbus) return 1; } -static void pci_host_bus_register(PCIBus *bus) +static void pci_host_bus_register(PCIBus *bus, DeviceState *parent) { -struct PCIHostBus *host; -host = g_malloc0(sizeof(*host)); -host-bus = bus; -QLIST_INSERT_HEAD(host_buses, host, next); +PCIHostState *host_bridge = PCI_HOST_BRIDGE(parent); + +QLIST_INSERT_HEAD(pci_host_bridges, host_bridge, next); } PCIBus *pci_find_primary_bus(void) { PCIBus *primary_bus = NULL; -struct PCIHostBus *host; +PCIHostState *host; -QLIST_FOREACH(host, host_buses, next) { +QLIST_FOREACH(host, pci_host_bridges, next) { if (primary_bus) { /* We have multiple root buses, refuse to select a primary */ return NULL; @@ -302,7 +297,7 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent, /* host bridge */ QLIST_INIT(bus-child); -pci_host_bus_register(bus); 
+pci_host_bus_register(bus, parent); vmstate_register(NULL, -1, vmstate_pcibus, bus); } @@ -1533,11 +1528,11 @@ static PciInfo *qmp_query_pci_bus(PCIBus *bus, int bus_num) PciInfoList *qmp_query_pci(Error **errp) { PciInfoList *info, *head = NULL, *cur_item = NULL; -struct PCIHostBus *host; +PCIHostState *host_bridge; -QLIST_FOREACH(host, host_buses, next) { +QLIST_FOREACH(host_bridge, pci_host_bridges, next) { info = g_malloc0(sizeof(*info)); -info-value = qmp_query_pci_bus(host-bus, 0); +info-value = qmp_query_pci_bus(host_bridge-bus, 0); /* XXX: waiting for the qapi to support GSList */ if (!cur_item) { @@ -2201,11 +2196,11 @@ static int pci_qdev_find_recursive(PCIBus *bus, int pci_qdev_find_device(const char *id, PCIDevice **pdev) { -struct PCIHostBus *host; +PCIHostState *host_bridge; int rc = -ENODEV; -QLIST_FOREACH(host, host_buses, next) { -int tmp = pci_qdev_find_recursive(host-bus, id, pdev); +QLIST_FOREACH(host_bridge, pci_host_bridges, next) { +int tmp = pci_qdev_find_recursive(host_bridge-bus, id, pdev); if (!tmp) { rc = 0; break; diff --git a/include/hw/pci/pci_host.h b/include/hw/pci/pci_host.h index 44052f2..ba31595 100644 --- a/include/hw/pci/pci_host.h +++ b/include/hw/pci/pci_host.h @@ -46,6 +46,8 @@ struct PCIHostState { MemoryRegion mmcfg; uint32_t config_reg; PCIBus *bus; + +QLIST_ENTRY(PCIHostState) next; }; typedef struct PCIHostBridgeClass { -- MST
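The idea of folding the list link into the host state itself, instead of allocating a separate wrapper node per bus, can be sketched with a hand-rolled intrusive list. The names below (struct host_state, host_register, count_hosts) are hypothetical, and the patch itself uses QEMU's QLIST macros rather than a raw next pointer.

```c
#include <stddef.h>

struct bus {
    int nr;
};

/* The link lives inside the object, so registering a host bridge needs no
 * extra allocation and there is no second structure to keep in sync. */
struct host_state {
    struct bus *bus;
    struct host_state *next;
};

static struct host_state *host_bridges;   /* list head */

/* Push a host bridge onto the front of the list. */
static void host_register(struct host_state *h)
{
    h->next = host_bridges;
    host_bridges = h;
}

/* Walk the list and count the registered bridges. */
static int count_hosts(void)
{
    int n = 0;

    for (struct host_state *h = host_bridges; h != NULL; h = h->next) {
        n++;
    }
    return n;
}
```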
[Qemu-devel] [PATCH 1/1] hw/9pfs: Fix memory leak in error path
From: M. Mohan Kumar mo...@in.ibm.com Fix few more memory leaks in virtio-9p-device.c detected using valgrind. Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- hw/9pfs/virtio-9p-device.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index dc6f4e4..35e2af4 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -68,14 +68,14 @@ static int virtio_9p_device_init(VirtIODevice *vdev) fprintf(stderr, Virtio-9p device couldn't find fsdev with the id = %s\n, s-fsconf.fsdev_id ? s-fsconf.fsdev_id : NULL); -return -1; +goto out; } if (!s-fsconf.tag) { /* we haven't specified a mount_tag */ fprintf(stderr, fsdev with id %s needs mount_tag arguments\n, s-fsconf.fsdev_id); -return -1; +goto out; } s-ctx.export_flags = fse-export_flags; @@ -85,10 +85,10 @@ static int virtio_9p_device_init(VirtIODevice *vdev) if (len MAX_TAG_LEN - 1) { fprintf(stderr, mount tag '%s' (%d bytes) is longer than maximum (%d bytes), s-fsconf.tag, len, MAX_TAG_LEN - 1); -return -1; +goto out; } -s-tag = strdup(s-fsconf.tag); +s-tag = g_strdup(s-fsconf.tag); s-ctx.uid = -1; s-ops = fse-ops; @@ -99,11 +99,11 @@ static int virtio_9p_device_init(VirtIODevice *vdev) if (s-ops-init(s-ctx) 0) { fprintf(stderr, Virtio-9p Failed to initialize fs-driver with id:%s and export path:%s\n, s-fsconf.fsdev_id, s-ctx.fs_root); -return -1; +goto out; } if (v9fs_init_worker_threads() 0) { fprintf(stderr, worker thread initialization failed\n); -return -1; +goto out; } /* @@ -115,18 +115,26 @@ static int virtio_9p_device_init(VirtIODevice *vdev) if (s-ops-name_to_path(s-ctx, NULL, /, path) 0) { fprintf(stderr, error in converting name to path %s, strerror(errno)); -return -1; +goto out; } if (s-ops-lstat(s-ctx, path, stat)) { fprintf(stderr, share path %s does not exist\n, fse-path); -return -1; +goto out; } else if (!S_ISDIR(stat.st_mode)) { fprintf(stderr, share path %s is not a directory\n, fse-path); -return -1; +goto 
out; } v9fs_path_free(path); return 0; +out: +g_free(s-ctx.fs_root); +g_free(s-tag); +virtio_cleanup(vdev); +v9fs_path_free(path); + +return -1; + } /* virtio-9p device */ -- 1.7.11.7
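The single-exit cleanup pattern this patch moves to (every failure jumps to one label that releases whatever was allocated so far) can be sketched on its own. struct dev, dup_str() and dev_init() are hypothetical names for illustration, and plain malloc/free stand in for the glib calls used in the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical device state with two heap-allocated strings. */
struct dev {
    char *root;
    char *tag;
};

/* Portable helper: duplicate a string with malloc. */
static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p) {
        memcpy(p, s, n);
    }
    return p;
}

/* Returns 0 on success, -1 on failure. On failure nothing stays allocated,
 * because every error path funnels through the same cleanup label. */
static int dev_init(struct dev *d, const char *root, const char *tag,
                    size_t max_tag_len)
{
    d->root = NULL;
    d->tag = NULL;

    d->root = dup_str(root);
    if (!d->root) {
        goto out;
    }
    if (strlen(tag) > max_tag_len) {
        goto out;                /* validation failure after an allocation */
    }
    d->tag = dup_str(tag);
    if (!d->tag) {
        goto out;
    }
    return 0;

out:
    free(d->tag);                /* free(NULL) is a no-op */
    free(d->root);
    d->tag = NULL;
    d->root = NULL;
    return -1;
}
```

Initialising the pointers to NULL up front is what lets the cleanup label run safely from any point in the function.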
Re: [Qemu-devel] [PATCH v3 00/14] tcg: remainder and tcg-arm updates
On 03.07.2013 23:29, Richard Henderson wrote:
> Changes v2-v3:
> * Add myself to tcg maintainers, as per afaerber's suggestion.
> * Fix rebase error wrt aarch64, as per claudio.
> * Include tcg-arm unwind patch set; no point in half measures.
>
> r~
>
> The following changes since commit ab8bf29078e0ab8347e2ff8b4e5542f7a0c751cf:
>
> Merge remote-tracking branch 'qemu-kvm/uq/master' into staging (2013-07-03 08:37:00 -0500)
>
> are available in the git repository at:
>
> git://github.com/rth7680/qemu.git tcg-next
>
> for you to fetch changes up to 6688d5d7eefa67b5f50b6f03a2456e4635781b3b:
>
> tcg-arm: Implement tcg_register_jit (2013-07-03 11:17:57 -0700)
>
> Richard Henderson (14):
> tcg: Add myself to general TCG maintainership
> tcg: Split rem requirement from div requirement
> tcg-arm: Don't implement rem
> tcg-ppc: Don't implement rem
> tcg-ppc64: Don't implement rem
> tcg: Allow non-constant control macros
> tcg: Simplify logic using TCG_OPF_NOT_PRESENT
> tcg-arm: Make use of conditional availability of opcodes for divide
> tcg-arm: Simplify logic in detecting the ARM ISA in use
> tcg-arm: Use AT_PLATFORM to detect the host ISA
> tcg: Fix high_pc fields in .debug_info
> tcg: Move the CIE and FDE header definitions to common code
> tcg-i386: Use QEMU_BUILD_BUG_ON instead of assert for frame size
> tcg-arm: Implement tcg_register_jit
>
> MAINTAINERS | 1 +
> tcg/aarch64/tcg-target.h | 2 +
> tcg/arm/tcg-target.c | 172 ++-
> tcg/arm/tcg-target.h | 15 +++--
> tcg/hppa/tcg-target.c | 35 +++---
> tcg/hppa/tcg-target.h | 1 +
> tcg/i386/tcg-target.c | 45 +
> tcg/ia64/tcg-target.h | 2 +
> tcg/mips/tcg-target.h | 1 +
> tcg/ppc/tcg-target.c | 14
> tcg/ppc/tcg-target.h | 1 +
> tcg/ppc64/tcg-target.c | 26 ---
> tcg/ppc64/tcg-target.h | 2 +
> tcg/sparc/tcg-target.c | 35 +++---
> tcg/sparc/tcg-target.h | 2 +
> tcg/tcg-op.h | 32 +++--
> tcg/tcg-opc.h | 36 +-
> tcg/tcg.c | 26 +--
> tcg/tcg.h | 6 +-
> tcg/tci/tcg-target.h | 2 +
> 20 files changed, 242 insertions(+), 214 deletions(-)

Tested tcg/aarch64 on Aarch64 Foundationv8 (sparc-softmmu, arm-softmmu, x86_64-softmmu).
Tested-by: Claudio Fontana claudio.font...@huawei.com Reviewed-by: Claudio Fontana claudio.font...@huawei.com
[Qemu-devel] [PATCH v5] Xen PV Device
Introduces a new Xen PV PCI device which will act as a binding point for PV drivers for Xen. The device has parameterized vendor-id, device-id and revision to allow to be configured as a binding point for any vendor's PV drivers. Signed-off-by: Paul Durrant paul.durr...@citrix.com Cc: Stefano Stabellini stefano.stabell...@citrix.com Reviewed-by: Andreas Färber afaer...@suse.de --- V5: - Addresses comments from Andreas Färber V4: - Renamed from 'Citrix PV Bus' to 'Xen PV Device' - Paramaterized vendor-id and device-id as requested by Stefano Stabellini V3: - Addresses comments from Anthony Liguori and Peter Maydell V2: - Addresses comments from Andreas Farber and Paolo Bonzini hw/xen/Makefile.objs |1 + hw/xen/xen_pvdevice.c| 131 ++ include/hw/pci/pci_ids.h |5 +- trace-events |4 ++ 4 files changed, 139 insertions(+), 2 deletions(-) create mode 100644 hw/xen/xen_pvdevice.c diff --git a/hw/xen/Makefile.objs b/hw/xen/Makefile.objs index 2017560..cd2df6a 100644 --- a/hw/xen/Makefile.objs +++ b/hw/xen/Makefile.objs @@ -1,5 +1,6 @@ # xen backend driver support common-obj-$(CONFIG_XEN_BACKEND) += xen_backend.o xen_devconfig.o +common-obj-y += xen_pvdevice.o obj-$(CONFIG_XEN_I386) += xen_platform.o xen_apic.o obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o diff --git a/hw/xen/xen_pvdevice.c b/hw/xen/xen_pvdevice.c new file mode 100644 index 000..93dfab2 --- /dev/null +++ b/hw/xen/xen_pvdevice.c @@ -0,0 +1,131 @@ +/* Copyright (c) Citrix Systems Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, + * with or without modification, are permitted provided + * that the following conditions are met: + * + * * Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. 
+ * * Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the + * following disclaimer in the documentation and/or other + * materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND + * CONTRIBUTORS AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, + * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, + * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, + * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +#include hw/hw.h +#include hw/pci/pci.h +#include trace.h + +#define TYPE_XEN_PV_DEVICE xen-pvdevice + +#define XEN_PV_DEVICE(obj) \ + OBJECT_CHECK(XenPVDevice, (obj), TYPE_XEN_PV_DEVICE) + +typedef struct XenPVDevice { +/* private */ +PCIDevice parent_obj; +/* public */ +uint16_tvendor_id; +uint16_tdevice_id; +uint8_t revision; +uint32_tsize; +MemoryRegionmmio; +} XenPVDevice; + +static uint64_t xen_pv_mmio_read(void *opaque, hwaddr addr, + unsigned size) +{ +trace_xen_pv_mmio_read(addr); + +return ~(uint64_t)0; +} + +static void xen_pv_mmio_write(void *opaque, hwaddr addr, + uint64_t val, unsigned size) +{ +trace_xen_pv_mmio_write(addr); +} + +static const MemoryRegionOps xen_pv_mmio_ops = { +.read = xen_pv_mmio_read, +.write = xen_pv_mmio_write, +.endianness = DEVICE_LITTLE_ENDIAN, +}; + +static int xen_pv_init(PCIDevice *pci_dev) +{ +XenPVDevice *d = XEN_PV_DEVICE(pci_dev); +uint8_t *pci_conf; + +pci_conf = pci_dev-config; + +pci_set_word(pci_conf + PCI_VENDOR_ID, d-vendor_id); +pci_set_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID, d-vendor_id); +pci_set_word(pci_conf + PCI_DEVICE_ID, d-device_id); +pci_set_word(pci_conf + PCI_SUBSYSTEM_ID, d-device_id); +pci_set_byte(pci_conf + PCI_REVISION_ID, d-revision); + +pci_set_word(pci_conf + PCI_COMMAND, PCI_COMMAND_MEMORY); + +pci_config_set_prog_interface(pci_conf, 0); + +pci_conf[PCI_INTERRUPT_PIN] = 1; + +memory_region_init_io(d-mmio, xen_pv_mmio_ops, d, + mmio, d-size); + +pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_MEM_PREFETCH, + d-mmio); + +return 0; +} + +static Property xen_pv_props[] = { +DEFINE_PROP_UINT16(vendor-id, XenPVDevice, vendor_id, PCI_VENDOR_ID_XEN), +DEFINE_PROP_UINT16(device-id, XenPVDevice, device_id, PCI_DEVICE_ID_XEN_PVDEVICE), +
[Qemu-devel] [PATCH v3 17/18] pci: Remove domain from PCIHostBus
From: David Gibson da...@gibson.dropbear.id.au There are now no users of the domain field of PCIHostBus, so remove it from the structure, and as a parameter from the pci_host_bus_register() function which sets it. Signed-off-by: David Gibson da...@gibson.dropbear.id.au Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/pci/pci.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index e0995aa..d861b40 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -91,7 +91,6 @@ static uint16_t pci_default_sub_vendor_id = PCI_SUBVENDOR_ID_REDHAT_QUMRANET; static uint16_t pci_default_sub_device_id = PCI_SUBDEVICE_ID_QEMU; struct PCIHostBus { -int domain; struct PCIBus *bus; QLIST_ENTRY(PCIHostBus) next; }; @@ -238,11 +237,10 @@ static int pcibus_reset(BusState *qbus) return 1; } -static void pci_host_bus_register(int domain, PCIBus *bus) +static void pci_host_bus_register(PCIBus *bus) { struct PCIHostBus *host; host = g_malloc0(sizeof(*host)); -host-domain = domain; host-bus = bus; QLIST_INSERT_HEAD(host_buses, host, next); } @@ -303,7 +301,8 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent, /* host bridge */ QLIST_INIT(bus-child); -pci_host_bus_register(0, bus); /* for now only pci domain 0 is supported */ + +pci_host_bus_register(bus); vmstate_register(NULL, -1, vmstate_pcibus, bus); } -- MST
Re: [Qemu-devel] [PATCH] Makefile: disable parallel build with dtc
On 04.07.2013 11:17, Peter Maydell wrote: On 4 July 2013 09:06, Michael S. Tsirkin m...@redhat.com wrote: Sometimes I get this error when building with -j 4: ar: two different operation options specified make[1]: *** [libfdt/libfdt.a] Error 1 make: *** [subdir-dtc] Error 2 dtc make does not seem to support parallel make. Force non-parallel build to fix this. So, this is the second time somebody's reported this, and I think it would be better to try to figure out what's going on. Can you report what the actual ar command is when run with V=1 ? Also, can you confirm that you haven't got an environment that sets ARFLAGS to something weird (including "") ? I did confirm that my environment does not have ARFLAGS set; I believe the issue is that ARFLAGS=$(ARFLAGS) is being passed in the Makefile, effectively setting it to "". Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH] Makefile: disable parallel build with dtc
On Thu, Jul 04, 2013 at 10:17:46AM +0100, Peter Maydell wrote: On 4 July 2013 09:06, Michael S. Tsirkin m...@redhat.com wrote: Sometimes I get this error when building with -j 4: ar: two different operation options specified make[1]: *** [libfdt/libfdt.a] Error 1 make: *** [subdir-dtc] Error 2 dtc make does not seem to support parallel make. Force non-parallel build to fix this. So, this is the second time somebody's reported this, and I think it would be better to try to figure out what's going on. Can you report what the actual ar command is when run with V=1 ? it stopped reproducing now :( Also, can you confirm that you haven't got an environment that sets ARFLAGS to something weird (including "") ? thanks -- PMM I can confirm that.
Re: [Qemu-devel] [PATCH] Makefile: disable parallel build with dtc
On 4 July 2013 10:45, Andreas Färber afaer...@suse.de wrote: On 04.07.2013 11:17, Peter Maydell wrote: Also, can you confirm that you haven't got an environment that sets ARFLAGS to something weird (including "") ? I did confirm that my environment does not have ARFLAGS set; I believe the issue is that ARFLAGS=$(ARFLAGS) is being passed in the Makefile, effectively setting it to "". That should set it to rv, because the top level make will default ARFLAGS to that if you haven't set it explicitly. You can test this by seeing whether a V=1 build runs the libfdt make with ARFLAGS=rv or something else: cam-vm-266:precise:qemu$ make -C build/x86 -j4 V=1 make: Entering directory `/home/petmay01/linaro/qemu-from-laptop/qemu/build/x86' make -I/home/petmay01/linaro/qemu-from-laptop/qemu/dtc VPATH=/home/petmay01/linaro/qemu-from-laptop/qemu/dtc -C dtc V=1 LIBFDT_srcdir=/home/petmay01/linaro/qemu-from-laptop/qemu/dtc/libfdt CPPFLAGS=-I/home/petmay01/linaro/qemu-from-laptop/qemu/build/x86/dtc -I/home/petmay01/linaro/qemu-from-laptop/qemu/dtc -I/home/petmay01/linaro/qemu-from-laptop/qemu/dtc/libfdt CFLAGS=-O2 -D_FORTIFY_SOURCE=2 -g -Werror -m64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all -I/usr/include/libpng12 -I/usr/include/pixman-1 -I/home/petmay01/linaro/qemu-from-laptop/qemu/dtc/libfdt -pthread -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/home/petmay01/linaro/qemu-from-laptop/qemu/tests LDFLAGS=-Wl,--warn-common -m64 -static -g ARFLAGS=rv CC=ccache gcc AR=ar LD=ld BUILD_DIR=/home/petmay01/linaro/qemu-from-laptop/qemu/build/x86 libfdt/libfdt.a make[1]: Entering directory 
`/home/petmay01/linaro/qemu-from-laptop/qemu/build/x86/dtc' (this is GNU Make 3.81 from ubuntu package 3.81-8.1ubuntu1.1) -- PMM
[Qemu-devel] [PATCH V4 06/10] NUMA: split out the common range parser
Since cpus parser and hostnode parser have the common range parser part, split it out to the common range parser to avoid the duplicate code. Reviewed-by: Bandan Das b...@redhat.com Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com --- vl.c | 89 1 file changed, 37 insertions(+), 52 deletions(-) diff --git a/vl.c b/vl.c index 38e0d3d..6e86dcf 100644 --- a/vl.c +++ b/vl.c @@ -1338,47 +1338,55 @@ char *get_boot_devices_list(size_t *size) return list; } -static void numa_node_parse_cpus(int nodenr, const char *cpus, Error **errp) +static int numa_node_parse_common(const char *str, + unsigned long long *value, + unsigned long long *endvalue) { char *endptr; -unsigned long long value, endvalue; - -/* Empty CPU range strings will be considered valid, they will simply - * not set any bit in the CPU bitmap. - */ -if (!*cpus) { -return; +if (parse_uint(str, value, endptr, 10) 0) { +return -1; } -if (parse_uint(cpus, value, endptr, 10) 0) { -goto error; -} if (*endptr == '-') { -if (parse_uint_full(endptr + 1, endvalue, 10) 0) { -goto error; +if (parse_uint_full(endptr + 1, endvalue, 10) 0) { + return -1; } } else if (*endptr == '\0') { -endvalue = value; +*endvalue = *value; } else { -goto error; +return -1; } -if (endvalue = MAX_CPUMASK_BITS) { -endvalue = MAX_CPUMASK_BITS - 1; -fprintf(stderr, -qemu: NUMA: A max of %d VCPUs are supported\n, - MAX_CPUMASK_BITS); +if (*endvalue = MAX_CPUMASK_BITS) { +*endvalue = MAX_CPUMASK_BITS - 1; +fprintf(stderr, qemu: NUMA: A max number %d is supported\n, +MAX_CPUMASK_BITS); } -if (endvalue value) { -goto error; +if (*endvalue *value) { +return -1; } -bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1); -return; +return 0; +} -error: -error_setg(errp, Invalid NUMA CPU range: %s\n, cpus); +static void numa_node_parse_cpus(int nodenr, const char *cpus, Error **errp) +{ +unsigned long long value, endvalue; + +/* Empty CPU range strings will be considered valid, they will simply + * not set any bit in the CPU bitmap. 
+ */ +if (!*cpus) { +return; +} + +if (numa_node_parse_common(cpus, value, endvalue) 0) { +error_setg(errp, Invalid NUMA CPU range: %s, cpus); +return; +} + +bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1); return; } @@ -1403,7 +1411,6 @@ void numa_node_parse_mpol(int nodenr, const char *mpol, Error **errp) void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp) { unsigned long long value, endvalue; -char *endptr; bool clear = false; unsigned long *bm = numa_info[nodenr].host_mem; @@ -1422,27 +1429,9 @@ void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp) return; } -if (parse_uint(hostnode, value, endptr, 10) 0) -goto error; -if (*endptr == '-') { -if (parse_uint_full(endptr + 1, endvalue, 10) 0) { -goto error; -} -} else if (*endptr == '\0') { -endvalue = value; -} else { -goto error; -} - -if (endvalue = MAX_CPUMASK_BITS) { -endvalue = MAX_CPUMASK_BITS - 1; -fprintf(stderr, -qemu: NUMA: A max of %d host nodes are supported\n, - MAX_CPUMASK_BITS); -} - -if (endvalue value) { -goto error; +if (numa_node_parse_common(hostnode, value, endvalue) 0) { +error_setg(errp, Invalid host NUMA ndoes range: %s, hostnode); +return; } if (clear) @@ -1451,10 +1440,6 @@ void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp) bitmap_set(bm, value, endvalue - value + 1); return; - -error: -error_setg(errp, Invalid host NUMA nodes range: %s, hostnode); -return; } static int numa_add_cpus(const char *name, const char *value, void *opaque) -- 1.8.3.2.634.g7a3187e
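The shared range parsing that this patch factors out can be sketched in standalone C roughly as follows. This is only an illustration under stated assumptions: it uses strtoull() in place of QEMU's parse_uint()/parse_uint_full(), the names sketch_parse_range and SKETCH_MAX_BITS are hypothetical, and the comparison operators that were lost from the archived diff are assumed to be "< 0", ">=" and "<".

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical limit, mirroring MAX_CPUMASK_BITS (255) from the patch. */
#define SKETCH_MAX_BITS 255

/*
 * Parse "N" or "N-M" into *first and *last (both inclusive).
 * Returns 0 on success, -1 on malformed input or a reversed range.
 * An end value at or above SKETCH_MAX_BITS is clamped to
 * SKETCH_MAX_BITS - 1, as the patch does for MAX_CPUMASK_BITS.
 */
static int sketch_parse_range(const char *str,
                              unsigned long long *first,
                              unsigned long long *last)
{
    char *end;

    assert(str && first && last);

    /* strtoull() accepts signs and leading blanks; reject them here. */
    if (*str < '0' || *str > '9') {
        return -1;
    }
    errno = 0;
    *first = strtoull(str, &end, 10);
    if (errno) {
        return -1;
    }

    if (*end == '-') {
        const char *p = end + 1;

        if (*p < '0' || *p > '9') {
            return -1;
        }
        errno = 0;
        *last = strtoull(p, &end, 10);
        if (errno || *end != '\0') {
            return -1;
        }
    } else if (*end == '\0') {
        /* A single number is a range of length one. */
        *last = *first;
    } else {
        return -1;
    }

    if (*last >= SKETCH_MAX_BITS) {
        *last = SKETCH_MAX_BITS - 1;
    }
    if (*last < *first) {
        return -1;
    }
    return 0;
}
```

A caller in the style of the cpus and hostnode parsers would then set bits first..last in its own bitmap and report its own error message when the helper returns -1.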
[Qemu-devel] [PATCH V4 01/10] NUMA: Support multiple CPU ranges on -numa option
From: Bandan Das b...@redhat.com This allows us to use the cpus property multiple times to specify multiple cpu (ranges) to the -numa option : -numa node,cpus=1,cpus=2,cpus=4 or -numa node,cpus=1-3,cpus=5 Note that after this patch, the defalut suffix of -numa node,mem=N will no longer be M. So we must add the suffix M like -numa node,mem=NM when assigning N MB of node memory size. Signed-off-by: Bandan Das b...@redhat.com Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com --- qemu-options.hx | 3 +- vl.c| 108 ++-- 2 files changed, 67 insertions(+), 44 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index 137a39b..449cf36 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -100,7 +100,8 @@ STEXI @item -numa @var{opts} @findex -numa Simulate a multi node NUMA system. If mem and cpus are omitted, resources -are split equally. +are split equally. The -cpus property may be specified multiple times +to denote multiple cpus or cpu ranges. ETEXI DEF(add-fd, HAS_ARG, QEMU_OPTION_add_fd, diff --git a/vl.c b/vl.c index 6d9fd7d..6f2e17a 100644 --- a/vl.c +++ b/vl.c @@ -516,6 +516,32 @@ static QemuOptsList qemu_realtime_opts = { }, }; +static QemuOptsList qemu_numa_opts = { +.name = numa, +.implied_opt_name = type, +.head = QTAILQ_HEAD_INITIALIZER(qemu_numa_opts.head), +.desc = { +{ +.name = type, +.type = QEMU_OPT_STRING, +.help = node type +},{ +.name = nodeid, +.type = QEMU_OPT_NUMBER, +.help = node ID +},{ +.name = mem, +.type = QEMU_OPT_SIZE, +.help = memory size +},{ +.name = cpus, +.type = QEMU_OPT_STRING, +.help = cpu number or range +}, +{ /* end of list */ } +}, +}; + const char *qemu_get_vm_name(void) { return qemu_name; @@ -1349,56 +1375,37 @@ error: exit(1); } -static void numa_add(const char *optarg) + +static int numa_add_cpus(const char *name, const char *value, void *opaque) { -char option[128]; -char *endptr; -unsigned long long nodenr; +int *nodenr = opaque; -optarg = get_opt_name(option, 128, optarg, ','); -if (*optarg == ',') { -optarg++; 
+if (!strcmp(name, cpu)) { +numa_node_parse_cpus(*nodenr, value); } -if (!strcmp(option, node)) { - -if (nb_numa_nodes = MAX_NODES) { -fprintf(stderr, qemu: too many NUMA nodes\n); -exit(1); -} +return 0; +} -if (get_param_value(option, 128, nodeid, optarg) == 0) { -nodenr = nb_numa_nodes; -} else { -if (parse_uint_full(option, nodenr, 10) 0) { -fprintf(stderr, qemu: Invalid NUMA nodeid: %s\n, option); -exit(1); -} -} +static int numa_init_func(QemuOpts *opts, void *opaque) +{ +uint64_t nodenr, mem_size; -if (nodenr = MAX_NODES) { -fprintf(stderr, qemu: invalid NUMA nodeid: %llu\n, nodenr); -exit(1); -} +nodenr = qemu_opt_get_number(opts, nodeid, nb_numa_nodes++); -if (get_param_value(option, 128, mem, optarg) == 0) { -node_mem[nodenr] = 0; -} else { -int64_t sval; -sval = strtosz(option, endptr); -if (sval 0 || *endptr) { -fprintf(stderr, qemu: invalid numa mem size: %s\n, optarg); -exit(1); -} -node_mem[nodenr] = sval; -} -if (get_param_value(option, 128, cpus, optarg) != 0) { -numa_node_parse_cpus(nodenr, option); -} -nb_numa_nodes++; -} else { -fprintf(stderr, Invalid -numa option: %s\n, option); +if (nodenr = MAX_NODES) { +fprintf(stderr, qemu: Max number of NUMA nodes reached : %d\n, +(int)nodenr); exit(1); } + +mem_size = qemu_opt_get_size(opts, mem, 0); +node_mem[nodenr] = mem_size; + +if (qemu_opt_foreach(opts, numa_add_cpus, nodenr, 1) 0) { +return -1; +} + +return 0; } static QemuOptsList qemu_smp_opts = { @@ -2933,6 +2940,7 @@ int main(int argc, char **argv, char **envp) qemu_add_opts(qemu_object_opts); qemu_add_opts(qemu_tpmdev_opts); qemu_add_opts(qemu_realtime_opts); +qemu_add_opts(qemu_numa_opts); runstate_init(); @@ -3119,7 +3127,16 @@ int main(int argc, char **argv, char **envp) } break; case QEMU_OPTION_numa: -numa_add(optarg); +olist = qemu_find_opts(numa); +opts = qemu_opts_parse(olist, optarg, 1); +if (!opts) { +exit(1); +} +optarg = qemu_opt_get(opts, type); +if (!optarg || strcmp(optarg, node)) { +
[Qemu-devel] [PATCH V4 00/10] Add support for binding guest numa nodes to host numa nodes
As you know, QEMU currently cannot direct its memory allocation, which may cause a performance regression from cross-node memory access in the guest. Worse, if PCI passthrough is used, a direct-attached device uses DMA transfers between the device and the qemu process, and all pages of the guest will be pinned by get_user_pages(). KVM_ASSIGN_PCI_DEVICE ioctl kvm_vm_ioctl_assign_device() => kvm_assign_device() => kvm_iommu_map_memslots() => kvm_iommu_map_pages() => kvm_pin_pages() So, with a direct-attached device, the page count of every guest page is incremented by one and page migration will not work. AutoNUMA will not work either. We should therefore set the memory allocation policy of the guest nodes before the pages are actually mapped. With this patch set, the memory policy of guest nodes can be set as follows: -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1 -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=1 The accepted format is mem-policy={membind|interleave|preferred},mem-hostnode=[+|!]{all|N-N}. 
And patch 8/10 adds a QMP command set-mpol to set the memory policy for every guest nodes: set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1 And patch 9/10 adds a monitor command set-mpol whose format like: set-mpol 0 mem-policy=membind,mem-hostnode=0-1 And with patch 10/10, we can get the current memory policy of each guest node using monitor command info numa, for example: (qemu) info numa 2 nodes node 0 cpus: 0 node 0 size: 1024 MB node 0 mempolicy: membind=0,1 node 1 cpus: 1 node 1 size: 1024 MB node 1 mempolicy: interleave=1 V1-V2: change to use QemuOpts in numa options (Paolo) handle Error in mpol parser (Paolo) change qmp command format to mem-policy=membind,mem-hostnode=0-1 like (Paolo) V2-V3: also handle Error in cpus parser (5/10) split out common parser from cpus and hostnode parser (Bandan 6/10) V3-V4: rebase to request for comments Bandan Das (1): NUMA: Support multiple CPU ranges on -numa option Wanlong Gao (9): NUMA: Add numa_info structure to contain numa nodes info NUMA: Add Linux libnuma detection NUMA: parse guest numa nodes memory policy NUMA: handle Error in cpus, mpol and hostnode parser NUMA: split out the common range parser NUMA: set guest numa nodes memory policy NUMA: add qmp command set-mpol to set memory policy for NUMA node NUMA: add hmp command set-mpol NUMA: show host memory policy info in info numa command configure | 32 ++ cpus.c | 143 +++- hmp-commands.hx | 16 +++ hmp.c | 35 ++ hmp.h | 1 + hw/i386/pc.c| 4 +- hw/net/eepro100.c | 1 - include/sysemu/sysemu.h | 20 +++- monitor.c | 44 +++- qapi-schema.json| 15 +++ qemu-options.hx | 3 +- qmp-commands.hx | 35 ++ vl.c| 285 +++- 13 files changed, 553 insertions(+), 81 deletions(-) -- 1.8.3.1.448.gfb7dfaa
[Qemu-devel] [PATCH V4 10/10] NUMA: show host memory policy info in info numa command
Show host memory policy of nodes in the info numa monitor command. After this patch, the monitor command info numa will show the information like following if the host numa support is enabled: (qemu) info numa 2 nodes node 0 cpus: 0 node 0 size: 1024 MB node 0 mempolicy: membind=0,1 node 1 cpus: 1 node 1 size: 1024 MB node 1 mempolicy: interleave=1 Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com --- monitor.c | 42 ++ 1 file changed, 42 insertions(+) diff --git a/monitor.c b/monitor.c index 93ac045..a40415d 100644 --- a/monitor.c +++ b/monitor.c @@ -74,6 +74,11 @@ #endif #include hw/lm32/lm32_pic.h +#ifdef CONFIG_NUMA +#include numa.h +#include numaif.h +#endif + //#define DEBUG //#define DEBUG_COMPLETION @@ -1808,6 +1813,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict) int i; CPUArchState *env; CPUState *cpu; +unsigned long first, next; monitor_printf(mon, %d nodes\n, nb_numa_nodes); for (i = 0; i nb_numa_nodes; i++) { @@ -1821,6 +1827,42 @@ static void do_info_numa(Monitor *mon, const QDict *qdict) monitor_printf(mon, \n); monitor_printf(mon, node %d size: % PRId64 MB\n, i, numa_info[i].node_mem 20); + +#ifdef CONFIG_NUMA +monitor_printf(mon, node %d mempolicy: , i); +switch (numa_info[i].flags NODE_HOST_POLICY_MASK) { +case NODE_HOST_BIND: +monitor_printf(mon, membind=); +break; +case NODE_HOST_INTERLEAVE: +monitor_printf(mon, interleave=); +break; +case NODE_HOST_PREFERRED: +monitor_printf(mon, preferred=); +break; +default: +monitor_printf(mon, default\n); +continue; +} + +if (numa_info[i].flags NODE_HOST_RELATIVE) +monitor_printf(mon, +); + +next = first = find_first_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS); +monitor_printf(mon, %lu, first); +do { +if (next == numa_max_node()) +break; +next = find_next_bit(numa_info[i].host_mem, MAX_CPUMASK_BITS, + next + 1); +if (next numa_max_node() || next == MAX_CPUMASK_BITS) +break; + +monitor_printf(mon, ,%lu, next); +} while (true); + +monitor_printf(mon, \n); +#endif } } -- 1.8.3.2.634.g7a3187e
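The node list printed after "membind=" or "interleave=" in the sample output above is a comma-separated walk over the set bits of the host node bitmap. A minimal standalone sketch of that formatting step follows. It is not the monitor code: it assumes a single 64-bit mask instead of QEMU's bitmap helpers (find_first_bit/find_next_bit), writes into a caller-supplied buffer instead of calling monitor_printf(), and the name sketch_format_nodes is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the indices of the bits set in mask as a comma-separated
 * list (for example "0,1") into buf, which holds len bytes.
 * Returns the number of indices written, or -1 if buf is too small.
 * An empty mask yields an empty string and a return value of 0.
 */
static int sketch_format_nodes(unsigned long long mask, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    assert(buf && len > 0);
    buf[0] = '\0';

    for (unsigned int bit = 0; bit < 64; bit++) {
        if (!(mask & (1ULL << bit))) {
            continue;
        }
        /* Every index after the first is preceded by a comma. */
        int n = snprintf(buf + used, len - used, "%s%u",
                         count ? "," : "", bit);
        if (n < 0 || (size_t)n >= len - used) {
            return -1;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Under these assumptions, a mask with bits 0 and 1 set produces the "0,1" seen in the "node 0 mempolicy: membind=0,1" line, and the caller decides what to print for an empty mask.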
[Qemu-devel] [PATCH V4 03/10] NUMA: Add Linux libnuma detection
Add detection of libnuma (mostly contained in the numactl package) to the configure script. Can be enabled or disabled on the command line, default is use if available. Signed-off-by: Andre Przywara andre.przyw...@amd.com Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com --- configure | 32 1 file changed, 32 insertions(+) diff --git a/configure b/configure index 0e0adde..9d3b4ce 100755 --- a/configure +++ b/configure @@ -242,6 +242,7 @@ gtk= gtkabi=2.0 tpm=no libssh2= +numa= # parse CC options first for opt do @@ -944,6 +945,10 @@ for opt do ;; --enable-libssh2) libssh2=yes ;; + --disable-numa) numa=no + ;; + --enable-numa) numa=yes + ;; *) echo ERROR: unknown option $opt; show_help=yes ;; esac @@ -1158,6 +1163,8 @@ echo --gcov=GCOV use specified gcov [$gcov_tool] echo --enable-tpm enable TPM support echo --disable-libssh2disable ssh block device support echo --enable-libssh2 enable ssh block device support +echo --disable-numa disable libnuma support +echo --enable-numaenable libnuma support echo echo NOTE: The object files are built at the place where configure is launched exit 1 @@ -2389,6 +2396,27 @@ EOF fi ## +# libnuma probe + +if test $numa != no ; then + numa=no + cat $TMPC EOF +#include numa.h +int main(void) { return numa_available(); } +EOF + + if compile_prog -lnuma ; then +numa=yes +libs_softmmu=-lnuma $libs_softmmu + else +if test $numa = yes ; then + feature_not_found linux NUMA (install numactl?) 
+fi +numa=no + fi +fi + +## # linux-aio probe if test $linux_aio != no ; then @@ -3557,6 +3585,7 @@ echo TPM support $tpm echo libssh2 support $libssh2 echo TPM passthrough $tpm_passthrough echo QOM debugging $qom_cast_debug +echo NUMA host support $numa if test $sdl_too_old = yes; then echo - Your SDL version is too old - please upgrade to have SDL support @@ -3590,6 +3619,9 @@ echo extra_cflags=$EXTRA_CFLAGS $config_host_mak echo extra_ldflags=$EXTRA_LDFLAGS $config_host_mak echo qemu_localedir=$qemu_localedir $config_host_mak echo libs_softmmu=$libs_softmmu $config_host_mak +if test $numa = yes; then + echo CONFIG_NUMA=y $config_host_mak +fi echo ARCH=$ARCH $config_host_mak -- 1.8.3.2.634.g7a3187e
[Qemu-devel] [PATCH V4 09/10] NUMA: add hmp command set-mpol
Add hmp command set-mpol to set host memory policy for a guest NUMA node. Then we can also set node's memory policy using the monitor command like: (qemu) set-mpol 0 mem-policy=membind,mem-hostnode=0-1 Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com --- hmp-commands.hx | 16 hmp.c | 35 +++ hmp.h | 1 + 3 files changed, 52 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index 915b0d1..417b69f 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1567,6 +1567,22 @@ Executes a qemu-io command on the given block device. ETEXI { +.name = set-mpol, +.args_type = nodeid:i,args:s?, +.params = nodeid [args], +.help = set host memory policy for a guest NUMA node, +.mhandler.cmd = hmp_set_mpol, +}, + +STEXI +@item set-mpol @var{nodeid} @var{args} +@findex set-mpol + +Set host memory policy for a guest NUMA node + +ETEXI + +{ .name = info, .args_type = item:s?, .params = [subcommand], diff --git a/hmp.c b/hmp.c index 2daed43..57a5730 100644 --- a/hmp.c +++ b/hmp.c @@ -1482,3 +1482,38 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict) hmp_handle_error(mon, err); } + +void hmp_set_mpol(Monitor *mon, const QDict *qdict) +{ +Error *local_err = NULL; +bool has_mpol = true; +bool has_hostnode = true; +const char *mpol = NULL; +const char *hostnode = NULL; +QemuOpts *opts; + +uint64_t nodeid = qdict_get_int(qdict, nodeid); +const char *args = qdict_get_try_str(qdict, args); + +if (args == NULL) { +has_mpol = false; +has_hostnode = false; +} else { +opts = qemu_opts_parse(qemu_find_opts(numa), args, 1); +if (opts == NULL) { +error_setg(local_err, Parsing memory policy args failed); +} else { +mpol = qemu_opt_get(opts, mem-policy); +if (mpol == NULL) { +has_mpol = false; +} +hostnode = qemu_opt_get(opts, mem-hostnode); +if (hostnode == NULL) { +has_hostnode = false; +} +} +} + +qmp_set_mpol(nodeid, has_mpol, mpol, has_hostnode, hostnode, local_err); +hmp_handle_error(mon, local_err); +} diff --git a/hmp.h b/hmp.h index 56d2e92..81f631b 100644 --- a/hmp.h +++ b/hmp.h 
@@ -86,5 +86,6 @@ void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict); void hmp_chardev_add(Monitor *mon, const QDict *qdict); void hmp_chardev_remove(Monitor *mon, const QDict *qdict); void hmp_qemu_io(Monitor *mon, const QDict *qdict); +void hmp_set_mpol(Monitor *mon, const QDict *qdict); #endif -- 1.8.3.2.634.g7a3187e
[Qemu-devel] [PATCH V4 04/10] NUMA: parse guest numa nodes memory policy
The memory policy setting format is like: mem-policy={membind|interleave|preferred},mem-hostnode=[+|!]{all|N-N} And we are adding this setting as a suboption of -numa, the memory policy then can be set like following: -numa node,nodeid=0,mem=1024,cpus=0,mem-policy=membind,mem-hostnode=0-1 -numa node,nodeid=1,mem=1024,cpus=1,mem-policy=interleave,mem-hostnode=!1 Reviewed-by: Bandan Das b...@redhat.com Signed-off-by: Andre Przywara andre.przyw...@amd.com Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com --- include/sysemu/sysemu.h | 8 vl.c| 110 2 files changed, 118 insertions(+) diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 70fd2ed..993b8e0 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -130,10 +130,18 @@ extern QEMUClock *rtc_clock; #define MAX_NODES 64 #define MAX_CPUMASK_BITS 255 +#define NODE_HOST_NONE0x00 +#define NODE_HOST_BIND0x01 +#define NODE_HOST_INTERLEAVE 0x02 +#define NODE_HOST_PREFERRED 0x03 +#define NODE_HOST_POLICY_MASK 0x03 +#define NODE_HOST_RELATIVE0x04 extern int nb_numa_nodes; struct node_info { uint64_t node_mem; DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS); +DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS); +unsigned int flags; }; extern struct node_info numa_info[MAX_NODES]; diff --git a/vl.c b/vl.c index 5207b8e..495b3a8 100644 --- a/vl.c +++ b/vl.c @@ -536,6 +536,14 @@ static QemuOptsList qemu_numa_opts = { .name = cpus, .type = QEMU_OPT_STRING, .help = cpu number or range +},{ +.name = mem-policy, +.type = QEMU_OPT_STRING, +.help = memory policy +},{ +.name = mem-hostnode, +.type = QEMU_OPT_STRING, +.help = host node number or range for memory policy }, { /* end of list */ } }, @@ -1374,6 +1382,79 @@ error: exit(1); } +static void numa_node_parse_mpol(int nodenr, const char *mpol) +{ +if (!mpol) { +return; +} + +if (!strcmp(mpol, interleave)) { +numa_info[nodenr].flags |= NODE_HOST_INTERLEAVE; +} else if (!strcmp(mpol, preferred)) { +numa_info[nodenr].flags |= NODE_HOST_PREFERRED; +} else if 
(!strcmp(mpol, membind)) { +numa_info[nodenr].flags |= NODE_HOST_BIND; +} else { +fprintf(stderr, qemu: Invalid memory policy: %s\n, mpol); +} +} + +static void numa_node_parse_hostnode(int nodenr, const char *hostnode) +{ +unsigned long long value, endvalue; +char *endptr; +bool clear = false; +unsigned long *bm = numa_info[nodenr].host_mem; + +if (hostnode[0] == '!') { +clear = true; +bitmap_fill(bm, MAX_CPUMASK_BITS); +hostnode++; +} +if (hostnode[0] == '+') { +numa_info[nodenr].flags |= NODE_HOST_RELATIVE; +hostnode++; +} + +if (!strcmp(hostnode, all)) { +bitmap_fill(bm, MAX_CPUMASK_BITS); +return; +} + +if (parse_uint(hostnode, value, endptr, 10) 0) +goto error; +if (*endptr == '-') { +if (parse_uint_full(endptr + 1, endvalue, 10) 0) { +goto error; +} +} else if (*endptr == '\0') { +endvalue = value; +} else { +goto error; +} + +if (endvalue = MAX_CPUMASK_BITS) { +endvalue = MAX_CPUMASK_BITS - 1; +fprintf(stderr, +qemu: NUMA: A max of %d host nodes are supported\n, + MAX_CPUMASK_BITS); +} + +if (endvalue value) { +goto error; +} + +if (clear) +bitmap_clear(bm, value, endvalue - value + 1); +else +bitmap_set(bm, value, endvalue - value + 1); + +return; + +error: +fprintf(stderr, qemu: Invalid host NUMA nodes range: %s\n, hostnode); +return; +} static int numa_add_cpus(const char *name, const char *value, void *opaque) { @@ -1385,6 +1466,25 @@ static int numa_add_cpus(const char *name, const char *value, void *opaque) return 0; } +static int numa_add_mpol(const char *name, const char *value, void *opaque) +{ +int *nodenr = opaque; + +if (!strcmp(name, mem-policy)) { +numa_node_parse_mpol(*nodenr, value); +} +return 0; +} + +static int numa_add_hostnode(const char *name, const char *value, void *opaque) +{ +int *nodenr = opaque; +if (!strcmp(name, mem-hostnode)) { +numa_node_parse_hostnode(*nodenr, value); +} +return 0; +} + static int numa_init_func(QemuOpts *opts, void *opaque) { uint64_t nodenr, mem_size; @@ -1404,6 +1504,14 @@ static int 
numa_init_func(QemuOpts *opts, void *opaque) return -1; } +if (qemu_opt_foreach(opts, numa_add_mpol, nodenr, 1) 0) { +return -1; +} + +if (qemu_opt_foreach(opts, numa_add_hostnode, nodenr, 1) 0) { +return -1; +} + return 0; } @@ -2962,6 +3070,8 @@ int main(int argc, char
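The flag handling behind mem-policy and the [+|!] prefixes of mem-hostnode can be sketched in standalone C as below. The NODE_HOST_* values are copied from the sysemu.h hunk of this patch; everything else is an assumption for illustration. In particular, sketch_set_policy and sketch_strip_prefix are hypothetical names, and the sketch clears the two policy bits before storing a new policy, whereas the patch ORs the value into flags. Because the policy is a 2-bit field (bind 0x01, interleave 0x02, preferred 0x03), OR-ing a second policy on top of an earlier one would combine the two values.

```c
#include <assert.h>
#include <string.h>

/* Values copied from the include/sysemu/sysemu.h hunk of the patch. */
#define NODE_HOST_NONE        0x00
#define NODE_HOST_BIND        0x01
#define NODE_HOST_INTERLEAVE  0x02
#define NODE_HOST_PREFERRED   0x03
#define NODE_HOST_POLICY_MASK 0x03
#define NODE_HOST_RELATIVE    0x04

/*
 * Store the policy named by name in the 2-bit policy field of *flags,
 * leaving the other bits (such as NODE_HOST_RELATIVE) untouched.
 * Returns 0 on success, -1 for an unknown policy name.
 */
static int sketch_set_policy(unsigned int *flags, const char *name)
{
    unsigned int policy;

    assert(flags && name);

    if (!strcmp(name, "membind")) {
        policy = NODE_HOST_BIND;
    } else if (!strcmp(name, "interleave")) {
        policy = NODE_HOST_INTERLEAVE;
    } else if (!strcmp(name, "preferred")) {
        policy = NODE_HOST_PREFERRED;
    } else {
        return -1;
    }

    /* Replace the old policy instead of OR-ing on top of it. */
    *flags = (*flags & ~(unsigned int)NODE_HOST_POLICY_MASK) | policy;
    return 0;
}

/*
 * Consume the optional '!' and '+' prefixes of a mem-hostnode value,
 * in the same order as the patch checks them. '!' sets *negate,
 * '+' sets NODE_HOST_RELATIVE in *flags. Returns the rest of the
 * string ("all", "N" or "N-M").
 */
static const char *sketch_strip_prefix(const char *spec,
                                       unsigned int *flags, int *negate)
{
    assert(spec && flags && negate);

    *negate = 0;
    if (spec[0] == '!') {
        *negate = 1;
        spec++;
    }
    if (spec[0] == '+') {
        *flags |= NODE_HOST_RELATIVE;
        spec++;
    }
    return spec;
}
```

With this split, "mem-hostnode=!1" yields negate set and the remainder "1", which the caller would clear from a fully set bitmap; "mem-hostnode=+0-1" yields the relative flag and the remainder "0-1".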
[Qemu-devel] [PATCH V4 02/10] NUMA: Add numa_info structure to contain numa nodes info
Add the numa_info structure to contain the numa nodes memory,
VCPUs information and the future added numa nodes host memory
policies.

Signed-off-by: Andre Przywara andre.przyw...@amd.com
Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com
---
 cpus.c                  |  2 +-
 hw/i386/pc.c            |  4 ++--
 hw/net/eepro100.c       |  1 -
 include/sysemu/sysemu.h |  8 ++--
 monitor.c               |  2 +-
 vl.c                    | 24
 6 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/cpus.c b/cpus.c
index 20958e5..496d5ce 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1180,7 +1180,7 @@ void set_numa_modes(void)
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
         cpu = ENV_GET_CPU(env);
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (test_bit(cpu->cpu_index, node_cpumask[i])) {
+            if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
                 cpu->numa_node = i;
             }
         }
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 78f92e2..78b5a72 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -650,14 +650,14 @@ static FWCfgState *bochs_bios_init(void)
         unsigned int apic_id = x86_cpu_apic_id_from_index(i);
         assert(apic_id < apic_id_limit);
         for (j = 0; j < nb_numa_nodes; j++) {
-            if (test_bit(i, node_cpumask[j])) {
+            if (test_bit(i, numa_info[j].node_cpu)) {
                 numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
                 break;
             }
         }
     }
     for (i = 0; i < nb_numa_nodes; i++) {
-        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]);
+        numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(numa_info[i].node_mem);
     }
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
                      (1 + apic_id_limit + nb_numa_nodes) *
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index dc99ea6..478c688 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -105,7 +105,6 @@
 #define PCI_IO_SIZE 64
 #define PCI_FLASH_SIZE (128 * KiB)

-#define BIT(n) (1 << (n))
 #define BITS(n, m) (((0xU << (31 - n)) >> (31 - n + m)) << m)

 /* The SCB accepts the following controls for the Tx and Rx units: */
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 2fb71af..70fd2ed 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -9,6 +9,7 @@
 #include "qapi-types.h"
 #include "qemu/notify.h"
 #include "qemu/main-loop.h"
+#include "qemu/bitmap.h"

 /* vl.c */

@@ -130,8 +131,11 @@ extern QEMUClock *rtc_clock;
 #define MAX_NODES 64
 #define MAX_CPUMASK_BITS 255
 extern int nb_numa_nodes;
-extern uint64_t node_mem[MAX_NODES];
-extern unsigned long *node_cpumask[MAX_NODES];
+struct node_info {
+    uint64_t node_mem;
+    DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+};
+extern struct node_info numa_info[MAX_NODES];

 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/monitor.c b/monitor.c
index 9be515c..93ac045 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1820,7 +1820,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict)
         }
         monitor_printf(mon, "\n");
         monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
-            node_mem[i] >> 20);
+            numa_info[i].node_mem >> 20);
     }
 }

diff --git a/vl.c b/vl.c
index 6f2e17a..5207b8e 100644
--- a/vl.c
+++ b/vl.c
@@ -250,8 +250,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
     QTAILQ_HEAD_INITIALIZER(fw_boot_order);

 int nb_numa_nodes;
-uint64_t node_mem[MAX_NODES];
-unsigned long *node_cpumask[MAX_NODES];
+struct node_info numa_info[MAX_NODES];

 uint8_t qemu_uuid[16];

@@ -1367,7 +1366,7 @@ static void numa_node_parse_cpus(int nodenr, const char *cpus)
         goto error;
     }

-    bitmap_set(node_cpumask[nodenr], value, endvalue-value+1);
+    bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1);
     return;

 error:
@@ -1399,7 +1398,7 @@ static int numa_init_func(QemuOpts *opts, void *opaque)
     }

     mem_size = qemu_opt_get_size(opts, "mem", 0);
-    node_mem[nodenr] = mem_size;
+    numa_info[nodenr].node_mem = mem_size;

     if (qemu_opt_foreach(opts, numa_add_cpus, &nodenr, 1) < 0) {
         return -1;
@@ -2961,8 +2960,8 @@ int main(int argc, char **argv, char **envp)
     translation = BIOS_ATA_TRANSLATION_AUTO;

     for (i = 0; i < MAX_NODES; i++) {
-        node_mem[i] = 0;
-        node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
+        numa_info[i].node_mem = 0;
+        bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS);
     }

     nb_numa_nodes = 0;
@@ -4228,7 +4227,7 @@ int main(int argc, char **argv, char **envp)
          * and distribute the available memory equally across all nodes
          */
         for (i = 0; i < nb_numa_nodes; i++) {
-            if (node_mem[i] != 0)
+            if (numa_info[i].node_mem != 0)
                 break;
         }
         if (i == nb_numa_nodes) {
@@ -4238,14 +4237,15 @@ int
[Qemu-devel] [PATCH V4 07/10] NUMA: set guest numa nodes memory policy
Set the guest numa nodes memory policies using the mbind(2)
system call node by node.
After this patch, we are able to set guest nodes memory policies
through the QEMU options, which aims to solve the guest cross
nodes memory access performance issue.
And as you all know, if PCI-passthrough is used,
direct-attached-device uses DMA transfer between device and qemu process.
All pages of the guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    => kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
          => kvm_pin_pages()

So, with direct-attached-device, the page count of every guest page will
be +1 and any page migration will not work. AutoNUMA won't work either.

So, we should set the guest nodes memory allocation policies before
the pages are really mapped.

Signed-off-by: Andre Przywara andre.przyw...@amd.com
Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com
---
 cpus.c | 87 ++
 1 file changed, 87 insertions(+)

diff --git a/cpus.c b/cpus.c
index 496d5ce..7240de7 100644
--- a/cpus.c
+++ b/cpus.c
@@ -60,6 +60,15 @@

 #endif /* CONFIG_LINUX */

+#ifdef CONFIG_NUMA
+#include <numa.h>
+#include <numaif.h>
+#ifndef MPOL_F_RELATIVE_NODES
+#define MPOL_F_RELATIVE_NODES (1 << 14)
+#define MPOL_F_STATIC_NODES (1 << 15)
+#endif
+#endif
+
 static CPUArchState *next_cpu;

 static bool cpu_thread_is_idle(CPUState *cpu)
@@ -1171,6 +1180,75 @@ static void tcg_exec_all(void)
     exit_request = 0;
 }

+#ifdef CONFIG_NUMA
+static int node_parse_bind_mode(unsigned int nodeid)
+{
+    int bind_mode;
+
+    switch (numa_info[nodeid].flags & NODE_HOST_POLICY_MASK) {
+    case NODE_HOST_BIND:
+        bind_mode = MPOL_BIND;
+        break;
+    case NODE_HOST_INTERLEAVE:
+        bind_mode = MPOL_INTERLEAVE;
+        break;
+    case NODE_HOST_PREFERRED:
+        bind_mode = MPOL_PREFERRED;
+        break;
+    default:
+        bind_mode = MPOL_DEFAULT;
+        return bind_mode;
+    }
+
+    bind_mode |= (numa_info[nodeid].flags & NODE_HOST_RELATIVE) ?
+        MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
+
+    return bind_mode;
+}
+#endif
+
+static int set_node_mpol(unsigned int nodeid)
+{
+#ifdef CONFIG_NUMA
+    void *ram_ptr;
+    RAMBlock *block;
+    ram_addr_t len, ram_offset = 0;
+    int bind_mode;
+    int i;
+
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        if (!strcmp(block->mr->name, "pc.ram")) {
+            break;
+        }
+    }
+
+    if (block->host == NULL)
+        return -1;
+
+    ram_ptr = block->host;
+    for (i = 0; i < nodeid; i++) {
+        len = numa_info[i].node_mem;
+        ram_offset += len;
+    }
+
+    len = numa_info[i].node_mem;
+    bind_mode = node_parse_bind_mode(i);
+
+    /* This is a workaround for a long standing bug in Linux'
+     * mbind implementation, which cuts off the last specified
+     * node. To stay compatible should this bug be fixed, we
+     * specify one more node and zero this one out.
+     */
+    clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
+    if (mbind(ram_ptr + ram_offset, len, bind_mode,
+        numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0)) {
+            perror("mbind");
+            return -1;
+    }
+#endif
+    return 0;
+}
+
 void set_numa_modes(void)
 {
     CPUArchState *env;
@@ -1185,6 +1263,15 @@ void set_numa_modes(void)
             }
         }
     }
+
+#ifdef CONFIG_NUMA
+    for (i = 0; i < nb_numa_nodes; i++) {
+        if (set_node_mpol(i) == -1) {
+            fprintf(stderr,
+                    "qemu: can't set host memory policy for node%d\n", i);
+        }
+    }
+#endif
 }

 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
--
1.8.3.2.634.g7a3187e
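For readers following the offset arithmetic in set_node_mpol(): the region handed to mbind() for a node starts at the sum of the sizes of all lower-numbered nodes. The standalone sketch below only illustrates that computation. It is not part of the patch, and struct node_span and node_ram_span() are hypothetical names invented for the illustration; it also adds a bounds check that the patch itself does not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a QEMU structure. */
struct node_span {
    uint64_t offset; /* byte offset of the node's slice inside guest RAM */
    uint64_t len;    /* size of the slice in bytes */
};

/*
 * Return the slice of guest RAM that belongs to node `nodeid`, given the
 * per-node memory sizes in node_mem[0..nb_nodes-1]. The offset is the sum
 * of the sizes of all preceding nodes. For an out-of-range nodeid both
 * fields are zero.
 */
static struct node_span node_ram_span(const uint64_t *node_mem,
                                      size_t nb_nodes, size_t nodeid)
{
    struct node_span span = { 0, 0 };
    size_t i;

    if (nodeid >= nb_nodes) {
        return span;
    }
    for (i = 0; i < nodeid; i++) {
        span.offset += node_mem[i];
    }
    span.len = node_mem[nodeid];
    return span;
}
```

With node sizes of 1024, 2048 and 512 bytes, node 2 would start at offset 3072 and cover 512 bytes.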
[Qemu-devel] [PATCH V4 05/10] NUMA: handle Error in cpus, mpol and hostnode parser
As Paolo pointed out, handling Error in the mpol and hostnode parsers
will make them easier to use, for example in mem-hotplug in the future.
This will also be used later in the set-mpol QMP command.
Also handle Error in the cpus parser to be consistent with the others.

Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com
---
 include/sysemu/sysemu.h |  4
 vl.c                    | 42 --
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 993b8e0..0f135fe 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -144,6 +144,10 @@ struct node_info {
     unsigned int flags;
 };
 extern struct node_info numa_info[MAX_NODES];
+extern void numa_node_parse_mpol(int nodenr, const char *hostnode,
+                                 Error **errp);
+extern void numa_node_parse_hostnode(int nodenr, const char *hostnode,
+                                     Error **errp);

 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/vl.c b/vl.c
index 495b3a8..38e0d3d 100644
--- a/vl.c
+++ b/vl.c
@@ -1338,7 +1338,7 @@ char *get_boot_devices_list(size_t *size)
     return list;
 }

-static void numa_node_parse_cpus(int nodenr, const char *cpus)
+static void numa_node_parse_cpus(int nodenr, const char *cpus, Error **errp)
 {
     char *endptr;
     unsigned long long value, endvalue;
@@ -1378,13 +1378,14 @@ static void numa_node_parse_cpus(int nodenr, const char *cpus)
     return;

 error:
-    fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus);
-    exit(1);
+    error_setg(errp, "Invalid NUMA CPU range: %s\n", cpus);
+    return;
 }

-static void numa_node_parse_mpol(int nodenr, const char *mpol)
+void numa_node_parse_mpol(int nodenr, const char *mpol, Error **errp)
 {
     if (!mpol) {
+        error_setg(errp, "Should specify memory policy");
         return;
     }

@@ -1395,11 +1396,11 @@ static void numa_node_parse_mpol(int nodenr, const char *mpol)
     } else if (!strcmp(mpol, "membind")) {
         numa_info[nodenr].flags |= NODE_HOST_BIND;
     } else {
-        fprintf(stderr, "qemu: Invalid memory policy: %s\n", mpol);
+        error_setg(errp, "Invalid memory policy: %s", mpol);
     }
 }

-static void numa_node_parse_hostnode(int nodenr, const char *hostnode)
+void numa_node_parse_hostnode(int nodenr, const char *hostnode, Error **errp)
 {
     unsigned long long value, endvalue;
     char *endptr;
@@ -1452,16 +1453,22 @@ static void numa_node_parse_hostnode(int nodenr, const char *hostnode)
     return;

 error:
-    fprintf(stderr, "qemu: Invalid host NUMA nodes range: %s\n", hostnode);
+    error_setg(errp, "Invalid host NUMA nodes range: %s", hostnode);
     return;
 }

 static int numa_add_cpus(const char *name, const char *value, void *opaque)
 {
     int *nodenr = opaque;
+    Error *err = NULL;

     if (!strcmp(name, "cpu")) {
-        numa_node_parse_cpus(*nodenr, value);
+        numa_node_parse_cpus(*nodenr, value, &err);
+    }
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        return -1;
     }
     return 0;
 }
@@ -1469,19 +1476,34 @@ static int numa_add_cpus(const char *name, const char *value, void *opaque)
 static int numa_add_mpol(const char *name, const char *value, void *opaque)
 {
     int *nodenr = opaque;
+    Error *err = NULL;

     if (!strcmp(name, "mem-policy")) {
-        numa_node_parse_mpol(*nodenr, value);
+        numa_node_parse_mpol(*nodenr, value, &err);
+    }
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        return -1;
     }
+
     return 0;
 }

 static int numa_add_hostnode(const char *name, const char *value, void *opaque)
 {
     int *nodenr = opaque;
+    Error *err = NULL;
+
     if (!strcmp(name, "mem-hostnode")) {
-        numa_node_parse_hostnode(*nodenr, value);
+        numa_node_parse_hostnode(*nodenr, value, &err);
     }
+    if (error_is_set(&err)) {
+        fprintf(stderr, "qemu: %s\n", error_get_pretty(err));
+        error_free(err);
+        return -1;
+    }
+
     return 0;
 }
--
1.8.3.2.634.g7a3187e
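The pattern this patch moves to (the parser reports failure through an error out-parameter, and the caller decides whether to print, exit or propagate) can be sketched in plain C roughly as follows. This is only an illustration under stated assumptions: SketchError, sketch_error_set(), parse_policy() and try_policy() are hypothetical names, not QEMU's Error API, and the policy strings other than "membind" are assumed rather than taken from the patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, minimal stand-in for an error object; not QEMU's Error. */
typedef struct SketchError {
    char msg[128]; /* empty string means "no error" */
} SketchError;

enum { POLICY_NONE = 0, POLICY_INTERLEAVE, POLICY_PREFERRED, POLICY_BIND };

/* Record a message if the caller asked for one (err may be NULL). */
static void sketch_error_set(SketchError *err, const char *what,
                             const char *detail)
{
    if (err == NULL) {
        return;
    }
    if (detail != NULL) {
        snprintf(err->msg, sizeof(err->msg), "%s: %s", what, detail);
    } else {
        snprintf(err->msg, sizeof(err->msg), "%s", what);
    }
}

/* Parse a policy name; on failure fill *err and return POLICY_NONE. */
static int parse_policy(const char *name, SketchError *err)
{
    if (name == NULL) {
        sketch_error_set(err, "Should specify memory policy", NULL);
        return POLICY_NONE;
    }
    if (strcmp(name, "interleave") == 0) {   /* assumed spelling */
        return POLICY_INTERLEAVE;
    }
    if (strcmp(name, "preferred") == 0) {    /* assumed spelling */
        return POLICY_PREFERRED;
    }
    if (strcmp(name, "membind") == 0) {
        return POLICY_BIND;
    }
    sketch_error_set(err, "Invalid memory policy", name);
    return POLICY_NONE;
}

/* Command-line style caller: print the error and return -1 on failure. */
static int try_policy(const char *name)
{
    SketchError err = { "" };

    (void)parse_policy(name, &err);
    if (err.msg[0] != '\0') {
        fprintf(stderr, "qemu: %s\n", err.msg);
        return -1;
    }
    return 0;
}
```

The point of the design is that the parser no longer prints or exits on its own, so a second caller such as a monitor command can reuse it and hand the message back to the client instead.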
[Qemu-devel] [PATCH V4 08/10] NUMA: add qmp command set-mpol to set memory policy for NUMA node
This QMP command makes it possible to set a node's memory policy
through the QMP protocol. The qmp-shell command looks like:
    set-mpol nodeid=0 mem-policy=membind mem-hostnode=0-1

Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com
---
 cpus.c           | 54 ++
 qapi-schema.json | 15 +++
 qmp-commands.hx  | 35 +++
 3 files changed, 104 insertions(+)

diff --git a/cpus.c b/cpus.c
index 7240de7..ff42b9d 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1417,3 +1417,57 @@ void qmp_inject_nmi(Error **errp)
     error_set(errp, QERR_UNSUPPORTED);
 #endif
 }
+
+void qmp_set_mpol(int64_t nodeid, bool has_mpol, const char *mpol,
+                  bool has_hostnode, const char *hostnode, Error **errp)
+{
+    unsigned int flags;
+    DECLARE_BITMAP(host_mem, MAX_CPUMASK_BITS);
+
+    if (nodeid >= nb_numa_nodes) {
+        error_setg(errp, "Only has '%d' NUMA nodes", nb_numa_nodes);
+        return;
+    }
+
+    bitmap_copy(host_mem, numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+    flags = numa_info[nodeid].flags;
+
+    numa_info[nodeid].flags = NODE_HOST_NONE;
+    bitmap_zero(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+
+    if (!has_mpol) {
+        if (set_node_mpol(nodeid) == -1) {
+            error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
+            goto error;
+        }
+        return;
+    }
+
+    numa_node_parse_mpol(nodeid, mpol, errp);
+    if (error_is_set(errp)) {
+        goto error;
+    }
+
+    if (!has_hostnode) {
+        bitmap_fill(numa_info[nodeid].host_mem, MAX_CPUMASK_BITS);
+    }
+
+    if (hostnode) {
+        numa_node_parse_hostnode(nodeid, hostnode, errp);
+        if (error_is_set(errp)) {
+            goto error;
+        }
+    }
+
+    if (set_node_mpol(nodeid) == -1) {
+        error_setg(errp, "Failed to set memory policy for node%lu", nodeid);
+        goto error;
+    }
+
+    return;
+
+error:
+    bitmap_copy(numa_info[nodeid].host_mem, host_mem, MAX_CPUMASK_BITS);
+    numa_info[nodeid].flags = flags;
+    return;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index 5c32528..0870da2 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3712,3 +3712,18 @@
                 '*cpuid-input-ecx': 'int',
                 'cpuid-register': 'X86CPURegister32',
                 'features': 'int' } }
+
+# @set-mpol:
+#
+# Set the host memory binding policy for guest NUMA node.
+#
+# @nodeid: The node ID of guest NUMA node to set memory policy to.
+#
+# @mem-policy: The memory policy string to set.
+#
+# @mem-hostnode: The host node or node range for memory policy.
+#
+# Since: 1.6.0
+##
+{ 'command': 'set-mpol', 'data': {'nodeid': 'int', '*mem-policy': 'str',
+                                  '*mem-hostnode': 'str'} }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 362f0e1..ccab51b 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3043,3 +3043,38 @@ Example:
 <- { "return": {} }

 EQMP
+
+    {
+        .name = "set-mpol",
+        .args_type = "nodeid:i,mem-policy:s?,mem-hostnode:s?",
+        .help = "Set the host memory binding policy for guest NUMA node",
+        .mhandler.cmd_new = qmp_marshal_input_set_mpol,
+    },
+
+SQMP
+set-mpol
+--
+
+Set the host memory binding policy for guest NUMA node
+
+Arguments:
+
+- "nodeid": The nodeid of guest NUMA node to set memory policy to.
+            (json-int)
+- "mem-policy": The memory policy string to set.
+                (json-string, optional)
+- "mem-hostnode": The host nodes contained to mpol.
+                  (json-string, optional)
+
+Example:
+
+-> { "execute": "set-mpol", "arguments": { "nodeid": 0, "mem-policy": "membind",
+                                           "mem-hostnode": "0-1" }}
+<- { "return": {} }
+
+Notes:
+    1. If mem-policy is not set, the memory policy of this nodeid will be set
+       to default.
+    2. If mem-hostnode is not set, the node mask of this mpol will be set
+       to all.
+EQMP
--
1.8.3.2.634.g7a3187e
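The mem-hostnode argument in the example above is a node or node range such as "0-1". As a rough, standalone illustration of turning such a string into a node mask, here is a minimal sketch. It is not the parser from this series: parse_node_range() and node_range_mask_or_zero() are hypothetical names, and the sketch assumes at most 64 host nodes so that a single uint64_t can stand in for the bitmap.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "N" or "N-M" (decimal, N <= M < 64) into a mask with bits N..M set.
 * Returns 0 on success and -1 on malformed input; *mask is written only
 * on success.
 */
static int parse_node_range(const char *s, uint64_t *mask)
{
    char *end;
    unsigned long long first, last, i;
    uint64_t m = 0;

    if (s == NULL || *s < '0' || *s > '9') {
        return -1;
    }
    errno = 0;
    first = strtoull(s, &end, 10);
    if (errno != 0) {
        return -1;
    }
    last = first;
    if (*end == '-') {
        const char *p = end + 1;

        if (*p < '0' || *p > '9') {
            return -1;
        }
        last = strtoull(p, &end, 10);
        if (errno != 0) {
            return -1;
        }
    }
    if (*end != '\0' || first > last || last >= 64) {
        return -1;
    }
    for (i = first; i <= last; i++) {
        m |= (uint64_t)1 << i;
    }
    *mask = m;
    return 0;
}

/* Convenience wrapper for quick checks: 0 means "could not parse". */
static uint64_t node_range_mask_or_zero(const char *s)
{
    uint64_t mask = 0;

    return parse_node_range(s, &mask) == 0 ? mask : 0;
}
```

Under these assumptions "0-1" yields the mask 0x3 and "2" yields 0x4, while a reversed range such as "1-0" is rejected.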
Re: [Qemu-devel] [RFC V8 01/24] qcow2: Add journal specification.
> Simple is good. Even for deduplication alone, I think data integrity is
> critical - otherwise we risk stale dedup metadata pointing to clusters
> that are unallocated or do not contain the right data. So the journal
> will probably need to follow techniques for commits/checksums.

I agree that checksums are missing for the dedup. Maybe we could even use some kind of error correcting code instead of a checksum.

Concerning data integrity, the events that the deduplication code cannot lose are hash deletions, because they mark a previously inserted hash as obsolete.

The problem with a commit/flush mechanism on hash deletion is that it will slow down the store insertion speed and also cause some extra SSD wear.

To solve this I considered the fact that the dedup metadata as a whole is disposable. So I implemented a dedup dirty bit.

When QEMU stops, the journal is flushed and the dirty bit is cleared. When QEMU starts and the dirty bit is set, a crash is detected and _all_ the deduplication metadata is dropped. The QCOW2 data integrity won't suffer; only the dedup ratio will be lower.

As you said once on irc, crashes don't happen often.

Benoît
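The dirty-bit scheme described above can be modelled in a few lines of C. This is a sketch of the idea only, not the implementation from the RFC series: struct dedup_state and the dedup_* functions are hypothetical names, and the metadata is reduced to a hash counter.

```c
#include <stdbool.h>

/* Hypothetical in-memory model; field names are illustrative only. */
struct dedup_state {
    bool dirty;         /* set while the journal may hold unflushed entries */
    unsigned nb_hashes; /* stand-in for all dedup metadata */
};

/* Start-up: a set dirty bit means the previous run did not stop cleanly. */
static void dedup_open(struct dedup_state *s)
{
    if (s->dirty) {
        /* Crash detected: drop all dedup metadata, keep the image data. */
        s->nb_hashes = 0;
    }
    s->dirty = true;
}

/* Clean stop: the journal would be flushed here, then the bit is cleared. */
static void dedup_close(struct dedup_state *s)
{
    s->dirty = false;
}

/* Helper for experiments: how many hashes survive a restart? */
static unsigned dedup_hashes_after_restart(unsigned nb_hashes,
                                           bool clean_shutdown)
{
    struct dedup_state s = { false, 0 };

    dedup_open(&s);
    s.nb_hashes = nb_hashes;
    if (clean_shutdown) {
        dedup_close(&s);
    }
    dedup_open(&s);
    return s.nb_hashes;
}
```

In this model a clean stop keeps every hash, while a crash costs only the dedup metadata, which matches the trade-off described in the mail.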
Re: [Qemu-devel] Reply: Re: Reply: Re: Which part of qemu responds to ACPI control method?
On 07/04/13 08:05, bobooscar wrote:
> Thank you Laszlo. However, after I got DSDT.dsl, I found that there is no
> “_PTS” method, nor even “_TTS” or “_GTS”. Then I went through all the ACPI
> tables and still found no PTS/TTS methods:
>
> acpidump > acpidump.out
> acpixtract -a acpidump.out
> for file in `ls |grep dat`; do iasl -a $file; done
>
> The guest OS is redhat 6.1 hvm.
> What does that mean? Does that mean this OS does not support
> sleep/wakeup (suspend/resume) with ACPI? What caused this problem? Does
> that have anything to do with qemu?
> (I tried to add logs in hwsleep.c:acpi_enter_sleep_mode in the guest
> kernel code, and found that the OS does not get there.)

Some tables can have several instances, like SSDT; see the --skip option.

But, it seems reasonable that you have found no _PTS method, as SeaBIOS doesn't seem to define such. Since you started your email with _PTS, I treated the method as something given in your case. Now I'm supposing you use SeaBIOS and _PTS not being there is consistent with that.

So, back to square 1, what is your *actual* problem? In any of the dumped / decompiled SSDT tables, do you see _S3, _S4, _S5 objects?

Maybe try passing the following options to qemu:

-global PIIX4_PM.disable_s3=0
-global PIIX4_PM.disable_s4=0

Laszlo
Re: [Qemu-devel] [PATCH v3 01/14] tcg: Add myself to general TCG maintainership
On 3 July 2013 22:29, Richard Henderson r...@twiddle.net wrote:
> Signed-off-by: Richard Henderson r...@twiddle.net

I think this is definitely a good idea; thanks!

Acked-by: Peter Maydell peter.mayd...@linaro.org

-- PMM
Re: [Qemu-devel] [PATCH v3 02/14] tcg: Split rem requirement from div requirement
On 3 July 2013 22:29, Richard Henderson r...@twiddle.net wrote:
> There are several hosts with only a div insn. Remainder is computed
> manually from the quotient and inputs. We can do this generically.
>
> Signed-off-by: Richard Henderson r...@twiddle.net

> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -61,6 +61,7 @@
>  #define TCG_TARGET_HAS_bswap32_i32 1
>  /* Not more than one of the next two defines must be 1. */
>  #define TCG_TARGET_HAS_div_i32 1
> +#define TCG_TARGET_HAS_rem_i32 1
>  #define TCG_TARGET_HAS_div2_i32 0
>  #define TCG_TARGET_HAS_ext8s_i32 1
>  #define TCG_TARGET_HAS_ext16s_i32 1
> @@ -85,6 +86,7 @@
>  #define TCG_TARGET_HAS_deposit_i64 1
>  /* Not more than one of the next two defines must be 1. */
>  #define TCG_TARGET_HAS_div_i64 0
> +#define TCG_TARGET_HAS_rem_i64 0
>  #define TCG_TARGET_HAS_div2_i64 0
>  #define TCG_TARGET_HAS_ext8s_i64 1
>  #define TCG_TARGET_HAS_ext16s_i64 1

The added line in these two hunks makes the comments wrong, doesn't it?

Other than that,
Reviewed-by: Peter Maydell peter.mayd...@linaro.org

-- PMM
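For reference, computing the remainder from the quotient and the inputs, as the commit message describes, comes down to rem = a - (a / b) * b. The sketch below shows that identity in plain C. It illustrates the arithmetic only and is not the TCG code from the patch; the function names are made up for the example.

```c
#include <stdint.h>

/*
 * Signed remainder from a divide-only primitive.
 * The caller must rule out b == 0 and the INT32_MIN / -1 case, exactly as
 * it would have to for a plain division.
 */
static int32_t rem_from_div_i32(int32_t a, int32_t b)
{
    int32_t q = a / b; /* the only "hardware" operation assumed */

    return a - q * b;  /* multiply back and subtract */
}

/* Unsigned variant of the same identity; the caller must rule out b == 0. */
static uint32_t remu_from_div_i32(uint32_t a, uint32_t b)
{
    uint32_t q = a / b;

    return a - q * b;
}
```

Because C division truncates toward zero, the result has the sign of the dividend, the same as the % operator.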
Re: [Qemu-devel] [PATCH v3 06/14] tcg: Allow non-constant control macros
On 3 July 2013 22:29, Richard Henderson r...@twiddle.net wrote:
> This allows TCG_TARGET_HAS_* to be a variable rather than a constant,
> which allows easier support for differing ISA levels for the host.

The effect of this is that TCG_OPF_NOT_PRESENT means "if set, op is definitely not present; if not set, op might or might not be present", right? Which is OK because it's just a debug guard/sanity check. (That might be worth noting in a comment I guess.)

Reviewed-by: Peter Maydell peter.mayd...@linaro.org

-- PMM
Re: [Qemu-devel] [PATCH v3 07/14] tcg: Simplify logic using TCG_OPF_NOT_PRESENT
On 3 July 2013 22:29, Richard Henderson r...@twiddle.net wrote:
> Expand the definition of "not present" to include "should not be present".
> This means we can simplify the logic surrounding the generic tcg opcodes
> for which the host backend ought not be providing definitions.
>
> Signed-off-by: Richard Henderson r...@twiddle.net

Reviewed-by: Peter Maydell peter.mayd...@linaro.org

-- PMM
Re: [Qemu-devel] [PATCH v3 03/14] tcg-arm: Don't implement rem
On 3 July 2013 22:29, Richard Henderson r...@twiddle.net wrote:
> Signed-off-by: Richard Henderson r...@twiddle.net

Reviewed-by: Peter Maydell peter.mayd...@linaro.org

-- PMM
Re: [Qemu-devel] [PATCH v3 08/14] tcg-arm: Make use of conditional availability of opcodes for divide
On 3 July 2013 22:29, Richard Henderson r...@twiddle.net wrote:
> We can now detect and use divide instructions at runtime, rather than
> having to restrict their availability to compile-time.
>
> Signed-off-by: Richard Henderson r...@twiddle.net
> ---
>  tcg/arm/tcg-target.c | 16 ++--
>  tcg/arm/tcg-target.h | 14 --
>  2 files changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 8321f80..2c46ceb 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -67,6 +67,13 @@ static const int use_armv7_instructions = 0;
>  #endif
>  #undef USE_ARMV7_INSTRUCTIONS
>
> +#ifndef use_idiv_instructions
> +bool use_idiv_instructions;
> +#endif
> +#ifdef CONFIG_GETAUXVAL
> +# include <sys/auxv.h>
> +#endif

My ARM system doesn't have a sys/auxv.h, which renders most of this patch a bit moot (and certainly untestable :-)). Do newer glibc have this?

> +
>  #ifndef NDEBUG
>  static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>      "%r0",
> @@ -2029,16 +2036,21 @@ static const TCGTargetOpDef arm_op_defs[] = {
>
>      { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
>
> -#if TCG_TARGET_HAS_div_i32
>      { INDEX_op_div_i32, { "r", "r", "r" } },
>      { INDEX_op_divu_i32, { "r", "r", "r" } },
> -#endif
>
>      { -1 },
>  };
>
>  static void tcg_target_init(TCGContext *s)
>  {
> +#if defined(CONFIG_GETAUXVAL) && !defined(use_idiv_instructions)
> +    {
> +        unsigned long hwcap = getauxval(AT_HWCAP);
> +        use_idiv_instructions = hwcap & (HWCAP_ARM_IDIVA | HWCAP_ARM_IDIVT);

Doesn't this mean we'll try to use the ARM division insns even if the CPU only supports the Thumb encodings? I think you should only be testing for whether HWCAP_ARM_IDIVA is set.

> +    }
> +#endif

thanks
-- PMM
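To make the review point concrete: a backend that emits ARM-encoded (not Thumb) divide instructions should gate on the IDIVA capability bit alone. The sketch below shows the difference between the two tests. The SKETCH_HWCAP_* constants are local placeholders defined for this example; their bit positions are an assumption and should be checked against the system's hwcap header, and can_use_arm_idiv() is a hypothetical helper, not a function from the patch.

```c
#include <stdbool.h>

/* Placeholder bit values for this sketch; verify against <asm/hwcap.h>. */
#define SKETCH_HWCAP_IDIVA (1UL << 17) /* divide available in ARM state */
#define SKETCH_HWCAP_IDIVT (1UL << 18) /* divide available in Thumb state */

/* Test suggested in the review: only the ARM-state bit matters. */
static bool can_use_arm_idiv(unsigned long hwcap)
{
    return (hwcap & SKETCH_HWCAP_IDIVA) != 0;
}

/* Test as written in the patch: also true for Thumb-only support. */
static bool can_use_any_idiv(unsigned long hwcap)
{
    return (hwcap & (SKETCH_HWCAP_IDIVA | SKETCH_HWCAP_IDIVT)) != 0;
}
```

The two predicates disagree exactly for a CPU that reports the Thumb bit without the ARM bit, which is the case the review is worried about.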
Re: [Qemu-devel] [PATCH v3 09/14] tcg-arm: Simplify logic in detecting the ARM ISA in use
On 3 July 2013 22:29, Richard Henderson r...@twiddle.net wrote:
> -#if defined(__ARM_ARCH_7__) || \
> -    defined(__ARM_ARCH_7A__) || \
> -    defined(__ARM_ARCH_7EM__) || \
> -    defined(__ARM_ARCH_7M__) || \
> -    defined(__ARM_ARCH_7R__)
> -#define USE_ARMV7_INSTRUCTIONS
> +/* The __ARM_ARCH define is provided by gcc 4.8. Construct it otherwise. */
> +#ifndef __ARM_ARCH
> +# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
> +     || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
> +     || defined(__ARM_ARCH_7EM__)
> +#  define __ARM_ARCH 7
> +# elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
> +       || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
> +       || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
> +#  define __ARM_ARCH 6
> +# elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__) \
> +       || defined(__ARM_ARCH_5T__) || defined(__ARM_ARCH_5TE__) \
> +       || defined(__ARM_ARCH_5TEJ__)
> +#  define __ARM_ARCH 5
> +# else
> +#  define __ARM_ARCH 4
> +# endif
>  #endif
>
> -#if defined(USE_ARMV7_INSTRUCTIONS) || \
> -    defined(__ARM_ARCH_6J__) || \
> -    defined(__ARM_ARCH_6K__) || \
> -    defined(__ARM_ARCH_6T2__) || \
> -    defined(__ARM_ARCH_6Z__) || \
> -    defined(__ARM_ARCH_6ZK__)
> -#define USE_ARMV6_INSTRUCTIONS
> -#endif
> -
> -#if defined(USE_ARMV6_INSTRUCTIONS) || \
> -    defined(__ARM_ARCH_5T__) || \
> -    defined(__ARM_ARCH_5TE__) || \
> -    defined(__ARM_ARCH_5TEJ__)
> -#define USE_ARMV5_INSTRUCTIONS
> -#endif

This change means we now set use_armv5_instructions for __ARM_ARCH_5__ and __ARM_ARCH_5E__, which we didn't before. However one of the things that bool is gating is whether we use the 'blx' insn, which is ARMv5T and above only. So this will break v5-but-not-v5T CPUs.

(use_armv6_instructions is similarly now set for __ARM_ARCH_6__ where it was not before, but none of the things we guard with that test are insns that aren't in base v6.)

thanks
-- PMM
Re: [Qemu-devel] [PATCH 6/9] vhost-scsi: new device supporting the tcm_vhost Linux kernel module
On 03/07/2013 14:33, Libaiqing wrote:
> Hi asias,
> I got the root cause: the guest was installed on a raw img with an LVM
> partition, which vhost does not support.

You mean LVM in the host or the guest? I guess in the host, but I'd rather make sure because otherwise you've found a bug.

Paolo

> Now vhost-scsi can be used as a bootable device.
Re: [Qemu-devel] [PATCH v3 12/14] tcg: Move the CIE and FDE header definitions to common code
On 3 July 2013 22:29, Richard Henderson r...@twiddle.net wrote:
> These will necessarily be the same layout for all hosts. This limits
> the amount of boilerplate required to implement jit debug for a host.
>
> Signed-off-by: Richard Henderson r...@twiddle.net

Reviewed-by: Peter Maydell peter.mayd...@linaro.org

-- PMM
Re: [Qemu-devel] [PATCH v3 14/14] tcg-arm: Implement tcg_register_jit
On 3 July 2013 22:29, Richard Henderson r...@twiddle.net wrote:
> Allows unwinding past the code_gen_buffer.
>
> Signed-off-by: Richard Henderson r...@twiddle.net

Reviewed-by: Peter Maydell peter.mayd...@linaro.org

-- PMM
Re: [Qemu-devel] [PATCH v3 11/14] tcg: Fix high_pc fields in .debug_info
On 3 July 2013 22:29, Richard Henderson r...@twiddle.net wrote:
> I don't think the debugger actually looks at this for anything,
> using the correct .debug_frame contents, but might as well get
> it all correct.
>
> Signed-off-by: Richard Henderson r...@twiddle.net

Reviewed-by: Peter Maydell peter.mayd...@linaro.org

-- PMM
Re: [Qemu-devel] [PATCH] highbank: add initial Calxeda Midway A15 support
On 28 June 2013 12:59, Andre Przywara andre.przyw...@calxeda.com wrote:
> From: Rob Herring rob.herr...@calxeda.com
>
> While the Calxeda Midway part is actually a bit more than a Highbank
> with A15s, for QEMU's purposes this view is sufficient.
> So to allow both emulation with that chip as well as KVM guests
> using that model add an A15 CPU and its peripherals as an option.
> The use of: -M highbank -cpu cortex-a15 simply gives the new chip
> without the need for a new model.

I don't think we have any other board models which do "I'm going to guess which board you actually wanted based on which CPU you specified", do we? I think it would be nicer just to have a '-M midway' which gave you the right CPU and peripherals.

thanks
-- PMM
[Qemu-devel] [Bug 1197663] Re: qcow2 [virtio-scsi] devices when mapped to the guest shows as 0MB irrespective of the volume size
I tried booting with qemu directly and observed the same thing.

qemu-system-x86_64 /home/images/rhel64-64.qcow2 -drive if=none,id=hd,file=/home/images/virtio-scsi11.img -device virtio-scsi-pci,id=scsi --enable-kvm -device scsi-hd,drive=hd -m 2000

After creating the filesystem, I tried running iozone and noticed a disk out of space issue.

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1197663

Title:
  qcow2 [virtio-scsi] devices when mapped to the guest shows as 0MB irrespective of the volume size

Status in QEMU: New
Status in Fedora: New

Bug description:
  qcow2 [virtio-scsi] devices when mapped to the guest shows as 0MB irrespective of the volume size.

  Kernel Version: 3.10.0-rc5+
  Libvirt Version: 1.0.6
  Qemu Version: 1.5.50

  Steps to reproduce the issue:

  1. Create a qcow2 volume using the command
     qemu-img create -f qcow2 virtio-scsi11.img 10G

  2. Add the virtio-scsi controller
     <controller type='scsi' index='0' model='virtio-scsi'>
       <address type='pci' domain='0x' bus='0x00' slot='0x04' function='0x0'/>
     </controller>

  3. Attach the qcow2 device to the guest,
     virsh attach-disk rhel64-64 /home/images/virtio-scsi11.img --persistent sdr --cache writethrough

  4. Run the scan command echo ' - - - ' > /sys/class/scsi_host/host#/scan, if the attached volume doesn't get recognized.

  5. Check the dmesg for the added volume.

  6. Run fdisk -l command

     Disk /dev/sdl: 0 MB, 197120 bytes
     1 heads, 1 sectors/track, 385 cylinders, total 385 sectors
     Units = cylinders of 1 * 512 = 512 bytes
     Sector size (logical/physical): 512 bytes / 512 bytes
     I/O size (minimum/optimal): 512 bytes / 512 bytes
     Disk identifier: 0x

  And observe that the 10G qcow2 volume shows as 0MB. This is not seen with the raw image.

     Disk /dev/sdm: 10.7 GB, 10737418240 bytes
     64 heads, 32 sectors/track, 10240 cylinders
     Units = cylinders of 2048 * 512 = 1048576 bytes
     Sector size (logical/physical): 512 bytes / 512 bytes
     I/O size (minimum/optimal): 512 bytes / 512 bytes
     Disk identifier: 0x

  Expected Result:
  The volume size for the qcow2 volumes should be shown correctly inside the guest to avoid confusion.

  Guest XML:
  virsh dumpxml rhel64-64
  <domain type='kvm' id='4'>
    <name>rhel64-64</name>
    <uuid>48deb0e1-0c23-9be9-da12-2ead34864de2</uuid>
    <memory unit='KiB'>4096000</memory>
    <currentMemory unit='KiB'>4096000</currentMemory>
    <vcpu placement='static'>1</vcpu>
    <resource>
      <partition>/machine</partition>
    </resource>
    <os>
      <type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type>
      <boot dev='hd'/>
    </os>
    <features>
      <acpi/>
      <apic/>
      <pae/>
    </features>
    <clock offset='utc'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>restart</on_crash>
    <devices>
      <emulator>/usr/local/bin/qemu-system-x86_64</emulator>
      <disk type='file' device='disk'>
        <driver name='qemu' type='qcow2' cache='none'/>
        <source file='/home/images/rhel64-64.qcow2'/>
        <target dev='hda' bus='ide'/>
        <alias name='ide0-0-0'/>
        <address type='drive' controller='0' bus='0' target='0' unit='0'/>
      </disk>
      <disk type='file' device='cdrom'>
        <driver name='qemu' type='raw'/>
        <source file='/home/upstream/autotest/virt-test/shared/data/isos/RHEL-6.4-x86_64-DVD.iso'/>
        <target dev='hdb' bus='ide'/>
        <readonly/>
        <alias name='ide0-0-1'/>
        <address type='drive' controller='0' bus='0' target='0' unit='1'/>
      </disk>
      <disk type='file' device='disk'>
        <driver name='qemu' type='raw' cache='writethrough'/>
        <source file='/home/images/virtio-scsi11.img'/>
        <target dev='sda' bus='scsi'/>
        <alias name='scsi0-0-0-0'/>
        <address type='drive' controller='0' bus='0' target='0' unit='0'/>
      </disk>
      <disk type='file' device='disk'>
        <driver name='qemu' type='raw' cache='writethrough'/>
        <source file='/home/images/virtio-scsi1.img'/>
        <target dev='sdf' bus='scsi'/>
        <alias name='scsi0-0-0-5'/>
        <address type='drive' controller='0' bus='0' target='0' unit='5'/>
      </disk>
      <disk type='file' device='disk'>
        <driver name='qemu' type='raw' cache='writethrough'/>
        <source file='/home/images/virtio-scsi9.img'/>
        <target dev='sdg' bus='scsi'/>
        <alias name='scsi0-0-0-6'/>
        <address type='drive' controller='0' bus='0' target='0' unit='6'/>
      </disk>
      <disk type='file' device='disk'>
        <driver name='qemu' type='raw' cache='writethrough'/>
        <source file='/home/images/virtio-scsi8.img'/>
        <target dev='sdh' bus='scsi'/>
        <alias name='scsi1-0-0'/>
        <address type='drive' controller='1' bus='0' target='0' unit='0'/>
      </disk>
      <disk type='file' device='disk'>
        <driver name='qemu' type='raw' cache='writethrough'/>
        <source file='/home/images/virtio-scsi10.img'/>
        <target
[Qemu-devel] [Bug 1197663] Re: qcow2 [virtio-scsi] devices when mapped to the guest shows as 0MB irrespective of the volume size
** Attachment added: "Screen shot"
   https://bugs.launchpad.net/fedora/+bug/1197663/+attachment/3724283/+files/Screenshot2.png

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1197663

Title:
  qcow2 [virtio-scsi] devices when mapped to the guest shows as 0MB irrespective of the volume size

Status in QEMU: New
Status in Fedora: New
Re: [Qemu-devel] [PATCH V3 4/9] qmp: add internal snapshot support in qmp_transaction
On Thu, Jun 27, 2013 at 10:41:43AM +0800, Wenchao Xia wrote:
> +    /* check whether a snapshot with name exist, no need to check id, since
> +       name will be checked later to make sure it does not mess up with id. */
> +    ret = bdrv_snapshot_find_by_id_and_name(bs, NULL, name, &sn, errp);
> +    if (error_is_set(errp)) {
> +        return;
> +    }
> +    if (ret) {
> +        error_setg(errp,
> +                   "Snapshot with name '%s' already exist on device '%s'",

s/exist/exists/

> +                   name, device);
> +        return;
> +    }
> +
> +    /* Forbid having a name similar to id, empty name is also forbidden. */
> +    if (!snapshot_name_wellformed(name)) {
> +        error_setg(errp, "Name '%s' on device '%s' is not a valid one",
> +                   name, device);
> +        return;
> +    }
> +
> +    /* 3. take the snapshot */
> +    sn1 = &state->sn;
> +    pstrcpy(sn1->name, sizeof(sn1->name), name);
> +    qemu_gettimeofday(&tv);
> +    sn1->date_sec = tv.tv_sec;
> +    sn1->date_nsec = tv.tv_usec * 1000;
> +    sn1->vm_clock_nsec = qemu_get_clock_ns(vm_clock);
> +
> +    if (bdrv_snapshot_create(bs, sn1) < 0) {
> +        error_setg(errp, "Failed to create snapshot '%s' on device '%s'",
> +                   name, device);

Please use error_setg_errno() to include the bdrv_snapshot_create() error
message.

> @@ -1009,6 +1010,18 @@
>  the new image file has the same contents as the current one; QEMU cannot
>  perform any meaningful check. Typically this is achieved by using the
>  current image file as the backing file for the new image.
>
> +On failure, the original disks pre-snapshot attempt will be used.
> +
> +For internal snapshots, the dictionary contains the device and the snapshot's
> +name. If name is a numeric string which will mess up with ID, the request will

This is about namespace collision. "Collide" or "conflict" are usually used
to describe identical naming problems. Instead of "mess up" I would say
something like:

  The name must not be a numeric string since this collides with snapshot
  IDs and an error will be returned.

> +be rejected. For example, name 99 is not a valid name. If an internal
> +snapshot matching name already exists, the request will be also rejected. Only
> +some image formats support it, for example, qcow2, rbd, and sheepdog.
> +
> +On failure, qemu will try delete new created internal snapshot in the

s/new created/the newly created/

> +transaction. When I/O error causes deletion failure, the user needs to fix it

When an I/O error occurs during deletion, ...
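
The naming rule discussed in this review (reject empty names and purely numeric names such as 99, because they collide with snapshot IDs) can be sketched as follows. This is only an illustration: snapshot_name_ok_sketch() is a hypothetical stand-in, and the patch's own snapshot_name_wellformed() is not shown in the thread and may apply different rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * ASSUMPTION: hypothetical stand-in for the patch's
 * snapshot_name_wellformed(); the real helper may apply other rules.
 * A name is accepted only if it is non-empty and contains at least one
 * character that is not a decimal digit.
 */
bool snapshot_name_ok_sketch(const char *name)
{
    const unsigned char *p;

    assert(name != NULL);       /* callers must pass a string */
    if (*name == '\0') {
        return false;           /* empty names are forbidden */
    }
    for (p = (const unsigned char *)name; *p != '\0'; p++) {
        if (!isdigit(*p)) {
            return true;        /* cannot be mistaken for a numeric ID */
        }
    }
    return false;               /* digits only, e.g. "99" */
}
```

Under this rule a name like 99a is accepted while 99 is rejected; whether the real helper draws the line in the same place is an open question.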
Re: [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
On 03/07/2013 23:23, Peter Lieven wrote:
> BDC is not used. I had an implementation that sent multiple descriptors
> out, but at least for my storage the maximum unmap counts not for each
> descriptors, but for all together. So in this case we do not need the
> field at all. I forgot to remove it. discard and write_zeroes will both
> only send one request up to max_unmap in size.
>
> apropos write_zeroes: do you know if UNMAP is guaranteed to unmap data
> if lbprz == 1?

Yes. On the other hand note that WRITE_SAME should be guaranteed _not_ to
unmap if lbprz == 0 and you do WRITE_SAME with UNMAP and a zero payload,
but I suspect there may be buggy targets here.

> I have read in the specs something that the target might unmap the
> blocks or not touch them at all. Maybe you have more information.

That's even true of UNMAP itself, actually. :)

The storage can always upgrade a block from unmapped to anchored and from
anchored to allocated, so UNMAP can be a no-op and still comply with the
standard.

Paolo
Re: [Qemu-devel] [PATCH V3 5/9] qmp: add interface blockdev-snapshot-internal-sync
On Thu, Jun 27, 2013 at 10:41:44AM +0800, Wenchao Xia wrote:
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 2547a7d..fba9b15 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1738,6 +1738,28 @@
>              '*mode': 'NewImageMode'} }
>
>  ##
> +# @blockdev-snapshot-internal-sync
> +#
> +# Synchronously take an internal snapshot of a block device, when the format
> +# of the image used supports it.
> +#
> +# @device: the name of the device to generate the snapshot from
> +#
> +# @name: the name of new snapshot name
> +#
> +# Returns: nothing on success
> +#          If @device is not a valid block device, DeviceNotFound
> +#          If any snapshot matching @name exists, or the name string is invalid
> +#          which may mess up with snapshot ID, or name is empty, GenericError

s/mess up with snapshot ID/collide with snapshot IDs/

> +image used supports it. If the name is a numeric string which may mess up with

Same here.
Re: [Qemu-devel] [PATCH V3 6/9] qmp: add interface blockdev-snapshot-delete-internal-sync
On Thu, Jun 27, 2013 at 10:41:45AM +0800, Wenchao Xia wrote:
> diff --git a/blockdev.c b/blockdev.c
> index ce89f83..35fffd6 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -790,6 +790,67 @@ void qmp_blockdev_snapshot_internal_sync(const char *device,
>                         snapshot, errp);
>  }
>
> +SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
> +                                                         bool has_id,
> +                                                         const char *id,
> +                                                         bool has_name,
> +                                                         const char *name,
> +                                                         Error **errp)
> +{
> +    BlockDriverState *bs = bdrv_find(device);
> +    QEMUSnapshotInfo sn;
> +    Error *local_err = NULL;
> +    SnapshotInfo *info = NULL;
> +    int ret;
> +
> +    if (!bs) {
> +        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
> +        return NULL;
> +    };

Spurious ';'

> +
> +    if (!has_id) {
> +        id = NULL;
> +    }
> +
> +    if (!has_name) {
> +        name = NULL;
> +    }
> +
> +    if (!id && !name) {
> +        error_setg(errp, "Name or id must be provided");
> +        return NULL;
> +    }
> +
> +    ret = bdrv_snapshot_find_by_id_and_name(bs, id, name, &sn, &local_err);
> +    if (error_is_set(&local_err)) {
> +        error_propagate(errp, local_err);
> +        return NULL;
> +    }
> +    if (!ret) {
> +        error_setg(errp,
> +                   "Snapshot with id '%s' and name '%s' do not exist on

s/do not exist/does not exist/

> diff --git a/qapi-schema.json b/qapi-schema.json
> index fba9b15..ffcdca7 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1760,6 +1760,33 @@
>    'data': { 'device': 'str', 'name': 'str'} }
>
>  ##
> +# @blockdev-snapshot-delete-internal-sync
> +#
> +# Synchronously delete an internal snapshot of a block device, when the format
> +# of the image used support it. The snapshot is identified by name or id or
> +# both. One of the name or id is required. It will returns SnapshotInfo of
> +# successfully deleted snapshot.

Return SnapshotInfo for the successfully deleted snapshot.

> +SQMP
> +blockdev-snapshot-delete-internal-sync
> +--
> +
> +Synchronously delete an internal snapshot of a block device when the format of
> +image used support it. The snapshot is identified by name or id or both. One

s/support/supports/

> +of the name or id is required. If the snapshot is not found, operation will
> +fail.

s/One of the name or id/One of name or id/
s/operation will fail/the operation will fail/
Re: [Qemu-devel] [PATCH V3 0/9] add internal snapshot support at block device level
On Wed, Jul 03, 2013 at 09:52:10AM +0800, Wenchao Xia wrote:
> Any comments for this version?

I'm happy with the code and left comments on error messages and
documentation.
Re: [Qemu-devel] [Xen-devel] [PATCH] libxl: Spice usbredirection support for upstream qemu
Please don't use HTML in emails

On Thu, 4 Jul 2013, Fabio Fantoni wrote:
> On 04/07/2013 12:32, Wei Liu wrote:
> > On Thu, Jul 04, 2013 at 12:16:43PM +0200, Fabio Fantoni wrote:
> > > On 04/07/2013 12:12, Wei Liu wrote:
> > > > On Thu, Jul 04, 2013 at 12:05:59PM +0200, Fabio Fantoni wrote:
> > > > > Usage: spiceusbredirection=1|0 (default=0)
> > > > >
> > > > > Enables spice usbredirection. The Spice usbredirection creates usb2
> > > > > controller and 4 usbredirection channels for redirection of up to 4
> > > > > usb devices from spice client to domU's qemu.
> > > > >
> > > > > Signed-off-by: Fabio Fantoni fabio.fant...@m2r.biz
> > > > > ---
> > > > >  docs/man/xl.cfg.pod.5       |    8
> > > > >  tools/libxl/libxl_create.c  |    1 +
> > > > >  tools/libxl/libxl_dm.c      |   18 ++
> > > > >  tools/libxl/libxl_types.idl |    1 +
> > > > >  tools/libxl/xl_cmdimpl.c    |    2 ++
> > > > >  5 files changed, 30 insertions(+)
> > > > >
> > > > > diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> > > > > index 766862d..a450800 100644
> > > > > --- a/docs/man/xl.cfg.pod.5
> > > > > +++ b/docs/man/xl.cfg.pod.5
> > > > > @@ -1134,6 +1134,14 @@ requires vdagent service installed on domU o.s. to work. The default is 0.
> > > > >
> > > > >  =back
> > > > >
> > > > > +=item B<spiceusbredirection=BOOLEAN>
> > > > > +
> > > > > +Enables spice usbredirection. The Spice usbredirection creates usb2
> > > > > +controller and 4 usbredirection channels for redirection of up to 4 usb
> > > > > +devices from spice client to domU's qemu. The default is 0.
> > > > > +
> > > > > +=back
> > > > > +
> > > > >  =head3 Miscellaneous Emulated Hardware
> > > > >
> > > > >  =over 4
> > > > > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> > > > > index 8db5460..58df106 100644
> > > > > --- a/tools/libxl/libxl_create.c
> > > > > +++ b/tools/libxl/libxl_create.c
> > > > > @@ -289,6 +289,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
> > > > >                                       false);
> > > > >          libxl_defbool_setdefault(&b_info->u.hvm.spice.agent_mouse, true);
> > > > >          libxl_defbool_setdefault(&b_info->u.hvm.spice.vdagent, false);
> > > > > +        libxl_defbool_setdefault(&b_info->u.hvm.spice.usbredirection, false);
> > > > >      }
> > > > >
> > > > >      libxl_defbool_setdefault(&b_info->u.hvm.nographic, false);
> > > > > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> > > > > index bc605e4..4f625e0 100644
> > > > > --- a/tools/libxl/libxl_dm.c
> > > > > +++ b/tools/libxl/libxl_dm.c
> > > > > @@ -471,6 +471,24 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
> > > > >                  "virtserialport,chardev=vdagent,name=com.redhat.spice.0",
> > > > >                  NULL);
> > > > >          }
> > > > > +
> > > > > +        if (libxl_defbool_val(b_info->u.hvm.spice.usbredirection)) {
> > > > > +            flexarray_vappend(dm_args, "-device", "ich9-usb-ehci1,id=usb,"
> > > > > +            "bus=pci.0,addr=0x1d.0x7", "-device", "ich9-usb-uhci1,"
> > > > > +            "masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,"
> > > > > +            "addr=0x1d.0x0", "-device", "ich9-usb-uhci2,masterbus=usb.0,"
> > > > > +            "firstport=2,bus=pci.0,addr=0x1d.0x1", "-device",
> > > > > +            "ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,"
> > > > > +            "addr=0x1d.0x2", "-chardev", "spicevmc,name=usbredir,"
> > > > > +            "id=usbrc1", "-device", "usb-redir,chardev=usbrc1,id=usbrc1,"
> > > > > +            "bus=usb.0", "-chardev", "spicevmc,name=usbredir,id=usbrc2",
> > > > > +            "-device", "usb-redir,chardev=usbrc2,id=usbrc2,bus=usb.0",
> > > > > +            "-chardev", "spicevmc,name=usbredir,id=usbrc3", "-device",
> > > > > +            "usb-redir,chardev=usbrc3,id=usbrc3,bus=usb.0", "-chardev",
> > > > > +            "spicevmc,name=usbredir,id=usbrc4", "-device", "usb-redir,"
> > > > > +            "chardev=usbrc4,id=usbrc4,bus=usb.0", NULL);
> > > >
> > > > Any reason for so many hardcoded options?
> > >
> > > I searched and requested for one year on spice-devel and qemu-devel
> > > about alternative methods but nothing found for now.
> > > Already tried usb=1 which creates usb1 controller that is not working
> > > with usb redirection.
> >
> > What if QEMU upstream changes and these options don't work any more? In
> > that case this functionality is broken and users have no way to
> > workaround it.
> >
> > IMHO unless they are clearly documented we should not consider adding
> > in these hardcoded options in libxl.
>
> Added to cc spice-devel and qemu-devel for ask again if there is a
> better solution to do this.
>
> @spice-devel and qemu-devel:
> Can someone help to improve qemu options above for enable usb
> redirection please?
> Thanks for any reply.

It's not about improving qemu options for usb redirection (even though
they could use a simplification), it's whether they are guaranteed to be
stable. Are you sure that a future QEMU release is going to work with
these options? For example, is libvirt using something similar to this?

Usually QEMU cmdline options are considered a stable interface, so I
wouldn't worry too much about it.
Re: [Qemu-devel] [PATCH V16 0/7] replace QEMUOptionParameter with QemuOpts parser
On Tue, Jun 18, 2013 at 05:31:52PM +0800, Dong Xu Wang wrote:
> These patches will replace QEMUOptionParameter with QemuOpts. Change logs
> please go to each patch's commit message.
>
> Dong Xu Wang (7):
>   add def_value_str in QemuOptDesc struct and rewrite qemu_opts_print
>   avoid duplication of default value in QemuOpts
>   Create four QemuOptsList related functions
>   Create some QemuOpts functons
>   Use QemuOpts support in block layer
>   query-command-line-options outputs def_value_str
>   remove QEMUOptionParameter related functions and struct
>
>  block.c                   | 100 -
>  block/cow.c               |  52 ++---
>  block/gluster.c           |  37 ++-
>  block/iscsi.c             |  31 ++-
>  block/qcow.c              |  67 +++---
>  block/qcow2.c             | 199
>  block/qed.c               | 108 +
>  block/qed.h               |   2 +-
>  block/raw-posix.c         |  59 +++--
>  block/raw-win32.c         |  31 +--
>  block/raw.c               |  30 +--
>  block/rbd.c               |  62 +++--
>  block/sheepdog.c          |  81 ---
>  block/ssh.c               |  29 ++-
>  block/vdi.c               |  70 +++---
>  block/vmdk.c              | 129 ++-
>  block/vpc.c               |  65 +++---
>  block/vvfat.c             |  11 +-
>  include/block/block.h     |   5 +-
>  include/block/block_int.h |   6 +-
>  include/qemu/option.h     |  56 ++---
>  qapi-schema.json          |   5 +-
>  qemu-img.c                |  65 +++---
>  qmp-commands.hx           |   2 +
>  util/qemu-config.c        |   4 +
>  util/qemu-option.c        | 562 +-
>  26 files changed, 906 insertions(+), 962 deletions(-)
>
> --
> V15->V16:
> 1) discard double-initialization.
> 2) use pointer directly, not g_strdup.
> 3) modify query-command-line-options related code.
>
> V14->V15:
> 1) Only delete enum QEMUOptionParType.

eblake: You commented on the last revision. Are you happy with v16?

Stefan
Re: [Qemu-devel] [PATCH v5 10/11] qemu-ga: Install Windows VSS provider on `qemu-ga -s install'
On 03/07/2013 18:19, Tomoki Sekiyama wrote:
> On 7/3/13 11:58 , Paolo Bonzini pbonz...@redhat.com wrote:
> > On 03/07/2013 17:49, Tomoki Sekiyama wrote:
> > > -        return ga_install_service(path, log_filepath, fixed_state_dir);
> > > +        if (ga_install_vss_provider()) {
> > > +            return EXIT_FAILURE;
> > > +        }
> > > +        if (ga_install_service(path, log_filepath, fixed_state_dir)) {
> > > +            ga_uninstall_vss_provider();
> > > +            return EXIT_FAILURE;
> > > +        }
> > > +        return 0;
> > >      } else if (strcmp(service, "uninstall") == 0) {
> > > +        ga_uninstall_vss_provider();
> > >          return ga_uninstall_service();
> >
> > I think this shouldn't be a hard failure. Only the freeze/thaw commands
> > should fail.
> >
> > Paolo
>
> Do you mean that qemu-ga should work without qga-provider.dll etc.
> even if it is configured --with-vss-sdk ?

Yes, and I'm even wondering if we should move all VSS code to a DLL
(provider and requestor; they are very tied to each other anyway because
of hEventFrozen/hEventThaw), and have qemu-ga simply look for
qga-provider.dll dropped into the executable directory. Then qemu-ga can
look for it even if it is not configured --with-vss-sdk.

This is because the license of the SDK may be problematic for
distributions that compile qemu-ga from source. These distributions
cannot distribute the SDK, and thus they will not be able to compile and
distribute the provider DLL. Still, we should make it as easy as possible
to combine a DLL and executable from separate sources into, for example,
a single MSI.

Paolo
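
The "not a hard failure" suggestion above could look roughly like the sketch below: a failed provider install is recorded and reported, but the service install still proceeds, so only freeze/thaw would need to refuse to run later. Every name here (install_soft, install_outcome, step_ok, step_fail) is hypothetical and stands in for the real qemu-ga functions, which are not reproduced; rollback of the provider when the service install fails is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* ASSUMPTION: all names below are hypothetical, not qemu-ga code. */
typedef int (*install_step)(void);      /* returns 0 on success */

struct install_outcome {
    int exit_code;          /* EXIT_SUCCESS or EXIT_FAILURE for the service */
    bool vss_available;     /* false: freeze/thaw should report an error */
};

/* Stub steps standing in for the provider and service installers. */
int step_ok(void)   { return 0; }
int step_fail(void) { return -1; }

struct install_outcome install_soft(install_step vss_provider,
                                    install_step service)
{
    struct install_outcome out = { EXIT_SUCCESS, true };

    assert(vss_provider != NULL && service != NULL);
    if (vss_provider() != 0) {
        /* Soft failure: remember it, keep installing the service. */
        fprintf(stderr, "warning: VSS provider unavailable, "
                        "freeze/thaw disabled\n");
        out.vss_available = false;
    }
    if (service() != 0) {
        out.exit_code = EXIT_FAILURE;   /* this one is still fatal */
    }
    return out;
}
```

A caller would keep vss_available around and have the freeze/thaw commands return an error when it is false, instead of failing the whole install.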
Re: [Qemu-devel] [PATCH 1/2] Refine and export infinite loop checking in collect_image_info_list()
On Fri, Jun 28, 2013 at 02:37:52PM -0600, Eric Blake wrote:
> On 06/27/2013 01:38 AM, Xu Wang wrote:
> > +    filenames = g_hash_table_new_full(g_str_hash, str_equal_func, NULL, NULL);
> > +
> > +    /* If backing file exists, filename will insert into hash table and seek
> > +     * the whole backing file chain from @backing_file.
> > +     */
> > +    if (backing_file) {
> > +        g_hash_table_insert(filenames, (gpointer)filename, NULL);
>
> Does this have any false positives (perhaps mishandling due to relative
> names) or false negatives (perhaps hard links allow different spellings
> of the same file to create a loop, although the difference in names
> won't indicate the problem)?
>
> I'd really like to see you add a testcase before this patch gets
> committed, although I agree that a patch along these lines is
> worthwhile.
>
> For example, make sure the following chain is not rejected:
>
> /dir1/base.img <- /dir1/wrap.img (relative backing 'base.img') <-
> /dir2/base.img (absolute backing '/dir1/base.img') <- /dir2/wrap.img
> (relative backing 'base.img')
>
> whether opened in /dir2/ via relative name 'wrap.img' or absolute name
> '/dir2/wrap.img'.
>
> Likewise, make sure you can detect this loop:
>
> create directory 'dir'
> create './dir/b.img'
> create './b.img' with relative backing 'dir/b.img'
> remove ./dir/b.img and dir
> ln -s . dir
>
> now 'b.img' refers to itself as backing file, even though the names
> ./b.img and ./dir/b.img are not equal by strcmp.

Yes, a test case should be added in tests/qemu-iotests/. Please see this
wiki page for documentation:
http://qemu-project.org/Documentation/QemuIoTests

Stefan
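
One possible way to address the aliasing concern raised above (symlinks or hard links giving the same file two different names) is to compare file identity instead of strings. The sketch below is not what the patch does: it assumes a POSIX host, assumes every image in the chain is a local file that stat() can reach, and the names same_file and chain_has_loop are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * ASSUMPTION: POSIX host, local files only. Two paths name the same file
 * when device and inode numbers match, regardless of how they are spelled.
 * Paths that cannot be stat()ed are treated as "not the same".
 */
bool same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0) {
        return false;
    }
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Quadratic scan over a backing chain; fine for short chains. */
bool chain_has_loop(const char **paths, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            if (same_file(paths[i], paths[j])) {
                return true;
            }
        }
    }
    return false;
}
```

With this comparison, './b.img' and './dir/b.img' from the example above would be recognised as one file once 'dir' is a symlink to '.', while '/dir1/base.img' and '/dir2/base.img' would stay distinct. Network-backed images would need a different identity check.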
[Qemu-devel] [RFC PATCH] elfload: load PIE executables to right address
PIE images are ET_DYN images. Check for pinterp_name first to make sure
the main executable is always loaded to the correct place.

See below for the current behaviour of PIE executables:

Reserved 0x7f00 bytes of guest address space
host mmap_min_addr=0x1000
guest_base 0x7f7cb41d5000
start    end      size     prot
0037f400-003fe400 0007f000 r-x
003fe400-003ff400 1000 ---
003ff400-003fe400 f000 rw-
003fe400-003ff400 1000 ---
003ff400-003ffc00 0800 rw-
003ffc00-003fec00 f000 r-x
003fec00-003ffc00 1000 ---
003ffc00-0007f000 ffc7f400 rw-
start_brk   0x
end_code    0x7eff7ac0
start_code  0x7eff7000
start_data  0x7efffac0
end_data    0x7efffc18
start_stack 0x7eff6dc8
brk         0x7efffc34
entry       0x7e799b30
-5000 ---p 00:00 0
5000-00015000 rw-p 00:00 0
00015000-7e77d000 ---p 00:00 0
7e77d000-7e7ec000 r-xp 68:03 14326298 /lib/libc.so
7e7ec000-7e7f3000 ---p 00:00 0
7e7f3000-7e7f4000 rw-p 0006e000 68:03 14326298 /lib/libc.so
7e7f4000-7e7f6000 rw-p 00:00 0
7e7f6000-7e7f7000 ---p 00:00 0
7e7f7000-7eff7000 rw-p 00:00 0
7eff7000-7eff8000 r-xp 68:03 9731305 /usr/bin/brk
7eff8000-7efff000 ---p 00:00 0
7e7f7000-7eff7000 rw-p 00:00 0 [stack]

This shows how the main binary got loaded to the wrong place.

Signed-off-by: Timo Teräs timo.te...@iki.fi
---
I assume pinterp_name is only ever set for the main executable. A quick
grep indicates that this is indeed the case.

 linux-user/elfload.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index ddef23e..d6e00cd 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1660,7 +1660,12 @@ static void load_elf_image(const char *image_name, int image_fd,
     }

     load_addr = loaddr;
-    if (ehdr->e_type == ET_DYN) {
+    if (pinterp_name != NULL) {
+        /* This is the main executable. Make sure that the low
+           address does not conflict with MMAP_MIN_ADDR or the
+           QEMU application itself. */
+        probe_guest_base(image_name, loaddr, hiaddr);
+    } else if (ehdr->e_type == ET_DYN) {
         /* The image indicates that it can be loaded anywhere. Find a
            location that can hold the memory space required. If the
            image is pre-linked, LOADDR will be non-zero. Since we do
@@ -1672,11 +1677,6 @@ static void load_elf_image(const char *image_name, int image_fd,
         if (load_addr == -1) {
             goto exit_perror;
         }
-    } else if (pinterp_name != NULL) {
-        /* This is the main executable. Make sure that the low
-           address does not conflict with MMAP_MIN_ADDR or the
-           QEMU application itself. */
-        probe_guest_base(image_name, loaddr, hiaddr);
     }
     load_bias = load_addr - loaddr;
--
1.8.3.2