Re: [Qemu-devel] [PATCH v3 0/3] Make mixer emulation configurable at runtime
On Sa, 2013-09-07 at 00:53 -0400, Bandan Das wrote: Currently, hda-codec mixer emulation can only be enabled by using the --enable-mixemu option to configure at compile time with the default value being off. These patches enable making mixer emulation selectable at runtime which is more convenient. Consequently, the last patch in this series removes the mixemu configuration option. Reviewed-by: Gerd Hoffmann kra...@redhat.com
Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration
On 09/09/2013 03:50 PM, Alexander Graf wrote: Am 09.09.2013 um 04:40 schrieb Alexey Kardashevskiy a...@ozlabs.ru: On 09/06/2013 01:11 AM, Alexander Graf wrote: On 05.09.2013, at 16:26, Benjamin Herrenschmidt wrote: On Thu, 2013-09-05 at 16:14 +0200, Andreas Färber wrote: Are you thinking of POWER8 having a different frequency than POWER8 in compat mode? Because migration from one -cpu to another is not supported elsewhere. Even if we want to migrate from one POWER7 revision to another, we should let the destination use the revision of the source (guest ABI!), via property if need be. Anything else will lead to confusion as to what is supported and what is not. That -cpu host is the default for convenience shouldn't relieve admins/libvirt to think about sensible setups like they have to on x86. Besides POWER8 uses 512Mhz too :-) It's been architected so it's unlikely to change from now on. Ok, ditch the tb frequency then. We can always default to 512Mhz when it's absent. QEMU uses what kvmppc_get_tbfreq() returns. And ppc_tb_t carries this value. It does not use 512MHz automatically but migration should always assume 512MHz :-/ If we remove it, I would like to add assert(tb_freq!=512MHz) into timebase_pre_save() and timebase_post_load(), is that ok? Good point. This would break TCG migration, right? In TCG we do not need any timebase offset at all, the time should migrate as is, no? :) It could break something like power5 migration under PR KVM, do not know... -- Alexey
Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration
Am 09.09.2013 um 07:58 schrieb Alexey Kardashevskiy a...@ozlabs.ru: On 09/09/2013 03:50 PM, Alexander Graf wrote: Am 09.09.2013 um 04:40 schrieb Alexey Kardashevskiy a...@ozlabs.ru: On 09/06/2013 01:11 AM, Alexander Graf wrote: On 05.09.2013, at 16:26, Benjamin Herrenschmidt wrote: On Thu, 2013-09-05 at 16:14 +0200, Andreas Färber wrote: Are you thinking of POWER8 having a different frequency than POWER8 in compat mode? Because migration from one -cpu to another is not supported elsewhere. Even if we want to migrate from one POWER7 revision to another, we should let the destination use the revision of the source (guest ABI!), via property if need be. Anything else will lead to confusion as to what is supported and what is not. That -cpu host is the default for convenience shouldn't relieve admins/libvirt to think about sensible setups like they have to on x86. Besides POWER8 uses 512Mhz too :-) It's been architected so it's unlikely to change from now on. Ok, ditch the tb frequency then. We can always default to 512Mhz when it's absent. QEMU uses what kvmppc_get_tbfreq() returns. And ppc_tb_t carries this value. It does not use 512MHz automatically but migration should always assume 512MHz :-/ If we remove it, I would like to add assert(tb_freq!=512MHz) into timebase_pre_save() and timebase_post_load(), is that ok? Good point. This would break TCG migration, right? In TCG we do not need any timebase offset at all, the time should migrate as is, no? :) I hope so, but we need to make sure we don't assert() in that case :). It could break something like power5 migration under PR KVM, do not know... Well, today we don't support full migration with PR KVM yet - IIRC we don't have access to all state. But that may change in the future. I think it's ok to restrict live migration to machines with the same tb frequency when kvm is enabled. Whether you implement it through a hardcoded 512Mhz or through a timebase value that gets live migrated and then compared is up to you - both ways work for me :). Alex -- Alexey
Re: [Qemu-devel] [PATCH V12 00/13] Add support for binding guest numa nodes to host numa nodes
Hi folks, Any comments? ;-P Wanlong Gao As you know, QEMU can't direct it's memory allocation now, this may cause guest cross node access performance regression. And, the worse thing is that if PCI-passthrough is used, direct-attached-device uses DMA transfer between device and qemu process. All pages of the guest will be pinned by get_user_pages(). KVM_ASSIGN_PCI_DEVICE ioctl kvm_vm_ioctl_assign_device() =kvm_assign_device() = kvm_iommu_map_memslots() = kvm_iommu_map_pages() = kvm_pin_pages() So, with direct-attached-device, all guest page's page count will be +1 and any page migration will not work. AutoNUMA won't too. So, we should set the guest nodes memory allocation policy before the pages are really mapped. According to this patch set, we are able to set guest nodes memory policy like following: -numa node,nodeid=0,cpus=0, \ -numa mem,size=1024M,policy=membind,host-nodes=0-1 \ -numa node,nodeid=1,cpus=1 \ -numa mem,size=1024M,policy=interleave,host-nodes=1 This supports policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N like format. Also add set-mem-policy QMP and hmp command to set memory policy. And add a QMP command query-numa to show numa info through this API. And convert the info numa monitor command to use this QMP command query-numa. V1-V2: change to use QemuOpts in numa options (Paolo) handle Error in mpol parser (Paolo) change qmp command format to mem-policy=membind,mem-hostnode=0-1 like (Paolo) V2-V3: also handle Error in cpus parser (5/10) split out common parser from cpus and hostnode parser (Bandan 6/10) V3-V4: rebase to request for comments V4-V5: use OptVisitor and split -numa option (Paolo) - s/set-mpol/set-mem-policy (Andreas) - s/mem-policy/policy - s/mem-hostnode/host-nodes fix hmp command process after error (Luiz) add qmp command query-numa and convert info numa to it (Luiz) V5-V6: remove tabs in json file (Laszlo, Paolo) add back -numa node,mem=xxx as legacy (Paolo) change cpus and host-nodes to array (Laszlo, Eric) change nodeid to uint16 add NumaMemPolicy enum type (Eric) rebased on Laszlo's OptsVisitor: support / flatten integer ranges for repeating options patch set, thanks for Laszlo's help V6-V7: change UInt16 to uint16 (Laszlo) fix a typo in adding qmp command set-mem-policy V7-V8: rebase to current master with Laszlo's V2 of OptsVisitor patch set fix an adding white space line error V8-V9: rebase to current master check if total numa memory size is equal to ram_size (Paolo) add comments to the OptsVisitor stuff in qapi-schema.json (Eric, Laszlo) replace the use of numa_num_configured_nodes() (Andrew) avoid abusing the fact i==nodeid (Andrew) V9-V10: rebase to current master remove libnuma (Andrew) MAX_NODES=64 - MAX_NODES=128 since libnuma selected 128 (Andrew) use MAX_NODES instead of MAX_CPUMASK_BITS for host_mem bitmap (Andrew) remove a useless clear_bit() operation (Andrew) V10-V11: rebase to current master fix maxnode argument of mbind(2) V11-V12: rebase to current master split patch 02/11 of V11 (Eduardo) add some max value check (Eduardo) split MAX_NODES change patch (Eduardo) *I hope this can catch up the train of 1.7.* Thanks, Wanlong Gao Wanlong Gao (13): NUMA: move numa related code to new file numa.c NUMA: check if the total numa memory size is equal to ram_size NUMA: Add numa_info structure to contain numa nodes info NUMA: convert -numa option to use OptsVisitor NUMA: introduce NumaMemOptions NUMA: add -numa mem, options NUMA: expand MAX_NODES from 64 to 128 NUMA: parse guest numa nodes memory policy NUMA: set guest numa nodes memory policy NUMA: add qmp command set-mem-policy to set memory policy for NUMA node NUMA: add hmp command set-mem-policy NUMA: add qmp command query-numa NUMA: convert hmp command info_numa to use qmp command query_numa Makefile.target | 2 +- cpus.c | 14 -- hmp-commands.hx | 16 ++ hmp.c | 119 + hmp.h | 2 + hw/i386/pc.c| 4 +- include/sysemu/cpus.h | 1 - include/sysemu/sysemu.h | 18 +- monitor.c | 21 +-- numa.c | 460 qapi-schema.json| 133 ++ qemu-options.hx | 6 +- qmp-commands.hx | 90 ++ vl.c| 160 ++--- 14 files changed, 861 insertions(+), 185 deletions(-) create mode 100644 numa.c
Re: [Qemu-devel] [PATCH 1/2] linux-headers: update for s390 floating interrupt controller
On Fri, Sep 06, 2013 at 01:23:32PM +0100, Peter Maydell wrote: On 6 September 2013 13:19, Jens Freimann jf...@linux.vnet.ibm.com wrote: Add symbols required for the s390 floating interrupt controller (flic) Updates to linux-headers should be the result of a sync against a specified mainline kernel revision, please (otherwise this should be an RFC patchset). ok, I understand. I was not sure about that and only added a remark in the cover letter regards Jens thanks -- PMM
Re: [Qemu-devel] [PATCH 1/2] linux-headers: update for s390 floating interrupt controller
On Fri, Sep 06, 2013 at 01:32:52PM +0100, Peter Maydell wrote: On 6 September 2013 13:19, Jens Freimann jf...@linux.vnet.ibm.com wrote: @@ -839,6 +903,7 @@ struct kvm_device_attr { #define KVM_DEV_TYPE_FSL_MPIC_20 1 #define KVM_DEV_TYPE_FSL_MPIC_42 2 #define KVM_DEV_TYPE_XICS 3 +#define KVM_DEV_TYPE_FLIC 4 Christoffer's patchset switching the ARM VGIC to this list also uses 4 as its enumeration value: https://lists.cs.columbia.edu/pipermail/kvmarm/2013-August/006822.html That patchset isn't in yet, but maybe you should use 5 to avoid conflicts? sure, will do regards Jens thanks -- PMM
Re: [Qemu-devel] [PATCH V3 1/2] qemu: Adjust qemu wakeup
Il 09/09/2013 05:26, Liu, Jinsong ha scritto: From 6f40a66521e012170493964a2135fb3b4ae7c9b2 Mon Sep 17 00:00:00 2001 From: Liu Jinsong jinsong@intel.com Date: Sun, 8 Sep 2013 00:33:19 +0800 Subject: [PATCH V3 1/2] qemu: Adjust qemu wakeup Currently Xen hvm s3 has a bug coming from the difference between qemu-traditioanl and qemu-xen. For qemu-traditional, the way to resume from hvm s3 is via 'xl trigger' command. However, for qemu-xen, the way to resume from hvm s3 inherited from standard qemu, i.e. via QMP, and it doesn't work under Xen. The root cause is, for qemu-xen, 'xl trigger' command didn't reset devices, while QMP didn't unpause hvm domain though they did qemu system reset. We have two qemu patches and one xl patch to fix Xen hvm s3 bug. This patch is the qemu patch 1. It adjusts qemu wakeup so that Xen s3 resume logic (which will be implemented at qemu patch 2) will be notified after qemu system reset. Signed-off-by: Liu Jinsong jinsong@intel.com --- hw/acpi/core.c |3 ++- include/sysemu/sysemu.h |4 +++- vl.c| 15 +++ 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/hw/acpi/core.c b/hw/acpi/core.c index 7467b88..7138139 100644 --- a/hw/acpi/core.c +++ b/hw/acpi/core.c @@ -324,12 +324,13 @@ static void acpi_notify_wakeup(Notifier *notifier, void *data) (ACPI_BITMASK_WAKE_STATUS | ACPI_BITMASK_TIMER_STATUS); break; case QEMU_WAKEUP_REASON_OTHER: -default: /* ACPI_BITMASK_WAKE_STATUS should be set on resume. Pretend that resume was caused by power button */ ar-pm1.evt.sts |= (ACPI_BITMASK_WAKE_STATUS | ACPI_BITMASK_POWER_BUTTON_STATUS); break; +default: +break; } } diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index b1aa059..50bc44d 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -39,9 +39,11 @@ int vm_stop(RunState state); int vm_stop_force_state(RunState state); typedef enum WakeupReason { -QEMU_WAKEUP_REASON_OTHER = 0, +/* Always keep QEMU_WAKEUP_REASON_NONE = 0 */ +QEMU_WAKEUP_REASON_NONE = 0, QEMU_WAKEUP_REASON_RTC, QEMU_WAKEUP_REASON_PMTIMER, +QEMU_WAKEUP_REASON_OTHER, } WakeupReason; void qemu_system_reset_request(void); diff --git a/vl.c b/vl.c index b4b119a..8c5f113 100644 --- a/vl.c +++ b/vl.c @@ -1792,14 +1792,14 @@ static pid_t shutdown_pid; static int powerdown_requested; static int debug_requested; static int suspend_requested; -static int wakeup_requested; +static WakeupReason wakeup_reason; static NotifierList powerdown_notifiers = NOTIFIER_LIST_INITIALIZER(powerdown_notifiers); static NotifierList suspend_notifiers = NOTIFIER_LIST_INITIALIZER(suspend_notifiers); static NotifierList wakeup_notifiers = NOTIFIER_LIST_INITIALIZER(wakeup_notifiers); -static uint32_t wakeup_reason_mask = ~0; +static uint32_t wakeup_reason_mask = ~(1 QEMU_WAKEUP_REASON_NONE); static RunState vmstop_requested = RUN_STATE_MAX; int qemu_shutdown_requested_get(void) @@ -1849,11 +1849,9 @@ static int qemu_suspend_requested(void) return r; } -static int qemu_wakeup_requested(void) +static WakeupReason qemu_wakeup_requested(void) { -int r = wakeup_requested; -wakeup_requested = 0; -return r; +return wakeup_reason; } static int qemu_powerdown_requested(void) @@ -1970,8 +1968,7 @@ void qemu_system_wakeup_request(WakeupReason reason) return; } runstate_set(RUN_STATE_RUNNING); -notifier_list_notify(wakeup_notifiers, reason); -wakeup_requested = 1; +wakeup_reason = reason; qemu_notify_event(); } @@ -2063,6 +2060,8 @@ static bool main_loop_should_exit(void) pause_all_vcpus(); cpu_synchronize_all_states(); qemu_system_reset(VMRESET_SILENT); +notifier_list_notify(wakeup_notifiers, wakeup_reason); +wakeup_reason = QEMU_WAKEUP_REASON_NONE; resume_all_vcpus(); monitor_protocol_event(QEVENT_WAKEUP, NULL); } Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
Hello Richard, On 02.09.2013 19:54, Richard Henderson wrote: I'm not sure if I posted v2 or not, but my branch is named -3, therefore this is v3. ;-) The jumbo fixme patch from v1 has been split up. This has been updated for the changes in the tlb helpers over the past few weeks. For the benefit of trivial conflict resolution, it's relative to a tree that contains basically all of my patches. See git://github.com/rth7680/qemu.git tcg-aarch-3 for the tree, if you find yourself missing any of the dependencies. r~ Richard Henderson (29): tcg-aarch64: Set ext based on TCG_OPF_64BIT tcg-aarch64: Change all ext variables to bool tcg-aarch64: Don't handle mov/movi in tcg_out_op tcg-aarch64: Hoist common argument loads in tcg_out_op tcg-aarch64: Change enum aarch64_arith_opc to AArch64Insn tcg-aarch64: Merge enum aarch64_srr_opc with AArch64Insn tcg-aarch64: Introduce tcg_fmt_* functions tcg-aarch64: Introduce tcg_fmt_Rdn_aimm tcg-aarch64: Implement mov with tcg_fmt_* functions tcg-aarch64: Handle constant operands to add, sub, and compare tcg-aarch64: Handle constant operands to and, or, xor tcg-aarch64: Support andc, orc, eqv, not tcg-aarch64: Handle zero as first argument to sub tcg-aarch64: Support movcond tcg-aarch64: Support deposit tcg-aarch64: Support add2, sub2 tcg-aarch64: Support muluh, mulsh tcg-aarch64: Support div, rem tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s tcg-aarch64: Improve tcg_out_movi tcg-aarch64: Avoid add with zero in tlb load tcg-aarch64: Use adrp in tcg_out_movi tcg-aarch64: Pass return address to load/store helpers directly. tcg-aarch64: Use tcg_out_call for qemu_ld/st tcg-aarch64: Use symbolic names for branches tcg-aarch64: Implement tcg_register_jit tcg-aarch64: Reuse FP and LR in translated code tcg-aarch64: Introduce tcg_out_ldst_pair tcg-aarch64: Remove redundant CPU_TLB_ENTRY_BITS check include/exec/exec-all.h | 18 - tcg/aarch64/tcg-target.c | 1276 ++ tcg/aarch64/tcg-target.h | 76 +-- 3 files changed, 867 insertions(+), 503 deletions(-) after carefully reading and testing your patches, this is how I suggest to proceed: first do the implementation of the new functionality (tcg opcodes, jit) in a way that is consistent with the existing code. No type changes, no refactoring, no beautification. Once we agree on those, introduce the meaningful restructuring you want to do, like the new INSN type, the don't handle mov/movi in tcg_out_op, the TCG_OPF_64BIT thing, etc. Last do the cosmetic stuff if you really want to do it, like the change all ext to bool (note that there is no point if the callers still use 1 and 0: adapt them as well) etc. Right now the patchset is difficult to digest, given the regressions it introduces coupled with a mixing of functional changes, restructuring and cosmetics. I think this will allow us to proceed quicker towards agreement. Thanks, Claudio
[Qemu-devel] [PATCH] ehci: Fix crash with isoc usb packets
The isoc packet path in the ehci code has a bad qobject cast, causing an abort, this patch fixes this. Note this problem is backported in 1.6.0 too, and this patch should be backported to the 1.6.0 stable tree. Signed-off-by: Hans de Goede hdego...@redhat.com --- hw/usb/hcd-ehci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 010a0d0..77c4872 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -1486,7 +1486,8 @@ static int ehci_process_itd(EHCIState *ehci, return -1; } -qemu_sglist_init(ehci-isgl, DEVICE(ehci), 2, ehci-as); +qemu_sglist_init(ehci-isgl, BUS(ehci-bus)-parent, + 2, ehci-as); if (off + len 4096) { /* transfer crosses page border */ uint32_t len2 = off + len - 4096; -- 1.8.3.1
[Qemu-devel] [PATCH] ehci: save device pointer in EHCIState
We'll need a pointer to the actual pci/sysbus device, stick a pointer to it into the EHCIState struct. https://bugzilla.redhat.com/show_bug.cgi?id=1005495 Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/usb/hcd-ehci.c | 3 ++- hw/usb/hcd-ehci.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 137e200..162680c 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -1486,7 +1486,7 @@ static int ehci_process_itd(EHCIState *ehci, return -1; } -qemu_sglist_init(ehci-isgl, DEVICE(ehci), 2, ehci-as); +qemu_sglist_init(ehci-isgl, ehci-device, 2, ehci-as); if (off + len 4096) { /* transfer crosses page border */ uint32_t len2 = off + len - 4096; @@ -2529,6 +2529,7 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, Error **errp) s-frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ehci_frame_timer, s); s-async_bh = qemu_bh_new(ehci_frame_timer, s); +s-device = dev; qemu_register_reset(ehci_reset, s); qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s); diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h index 15a28e8..065c9fa 100644 --- a/hw/usb/hcd-ehci.h +++ b/hw/usb/hcd-ehci.h @@ -255,6 +255,7 @@ typedef QTAILQ_HEAD(EHCIQueueHead, EHCIQueue) EHCIQueueHead; struct EHCIState { USBBus bus; +DeviceState *device; qemu_irq irq; MemoryRegion mem; AddressSpace *as; -- 1.8.3.1
Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12
Il 08/09/2013 13:40, Gleb Natapov ha scritto: On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote: On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv, and not restore anything. XRSTOR restores FP/SSE state to reset state if no bits are set in xstate_bv. This is what should happen on reset, no? Yes. The problem happens on the migration destination when XSAVE data is not transmitted. FP/SSE data is transmitted and must be restored, but xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset state. The vcpu then loses the values that were set in the migration data. Since FP and SSE data are always valid, set them in xstate_bv at reset time. In fact, that value is the same that KVM_GET_XSAVE returns on pre-XSAVE hosts. It is needed for migration between non xsave host to xsave host. Yes, and this patch does the same for migration between non-XSAVE QEMU and XSAVE QEMU. In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores xstate_bv when XSAVE is not available. Instead, it should reset the FXSAVE data to processor-reset values (except for MXCSR which always comes from XRSTOR data), i.e. to all-zeros except for the x87 control and tag words. It should also check reserved bits of MXCSR. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- target-i386/cpu.c | 1 + target-i386/cpu.h | 5 + 2 files changed, 6 insertions(+) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index c36345e..ac83106 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2386,6 +2386,7 @@ static void x86_cpu_reset(CPUState *s) env-fpuc = 0x37f; env-mxcsr = 0x1f80; +env-xstate_bv = XSTATE_FP | XSTATE_SSE; env-pat = 0x0007040600070406ULL; env-msr_ia32_misc_enable = MSR_IA32_MISC_ENABLE_DEFAULT; diff --git a/target-i386/cpu.h b/target-i386/cpu.h index 5723eff..a153078 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -380,6 +380,11 @@ #define MSR_VM_HSAVE_PA 0xc0010117 +#define XSTATE_SUPPORTED(XSTATE_FP|XSTATE_SSE|XSTATE_YMM) Supported by whom? By QEMU? We should filer unsupported bits from CPUID.0D then too. Yes. QEMU unmarshals information from the XSAVE region and back, so it cannot support MPX or AVX-512 yet (even if KVM were). Separate bug, though. Paolo +#define XSTATE_FP 1 +#define XSTATE_SSE 2 +#define XSTATE_YMM 4 + /* CPUID feature words */ typedef enum FeatureWord { FEAT_1_EDX, /* CPUID[1].EDX */ -- 1.8.3.1 -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] savevm too slow
Am 09.09.2013 um 03:57 hat xuanmao_001 geschrieben: the other question: when I change the buffer size #define IO_BUF_SIZE 32768 to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly. Is this for cache=unsafe as well? Juan, any specific reason for using 32k? I think it would be better to have a multiple of the qcow2 cluster size, otherwise we get COW for the empty part of newly allocated clusters. If we can't make it dynamic, using at least fixed 64k to match the qcow2 default would probably improve things a bit. with cache=writeback. Is there any risk for setting cache=writeback with IO_BUF_SIZE 1M ? No. Using a larger buffer size should be safe. Kevin ━━━ xuanmao_001 From: Kevin Wolf Date: 2013-09-06 18:38 To: xuanmao_001 CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz Subject: Re: savevm too slow Am 06.09.2013 um 03:31 hat xuanmao_001 geschrieben: Hi, qemuers: I found that the guest disk file cache mode will affect to the time of savevm. the cache 'writeback' too slow. but the cache 'unsafe' is as fast as it can, less than 10 seconds. here is the example I use virsh: @cache with writeback: #the first snapshot real0m21.904s user0m0.006s sys 0m0.008s #the secondary snapshot real2m11.624s user0m0.013s sys 0m0.008s @cache with unsafe: #the first snapshot real0m0.730s user0m0.006s sys 0m0.005s #the secondary snapshot real0m1.296s user0m0.002s sys 0m0.008s I sent patches that should eliminate the difference between the first and second snapshot at least. so, what the difference between them when using different cache. cache=unsafe ignores any flush requests. It's possible that there is potential for optimisation with cache=writeback, i.e. it sends flush requests that aren't necessary in fact. This is something that I haven't checked yet. the other question: when I change the buffer size #define IO_BUF_SIZE 32768 to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly. Is this for cache=unsafe as well? Juan, any specific reason for using 32k? I think it would be better to have a multiple of the qcow2 cluster size, otherwise we get COW for the empty part of newly allocated clusters. If we can't make it dynamic, using at least fixed 64k to match the qcow2 default would probably improve things a bit. Kevin
Re: [Qemu-devel] savevm too slow
I sent patches that should eliminate the difference between the first and second snapshot at least. where I can find the patches that can eliminate the difference between the first and second snapshot ? Does they fit qemu-kvm-1.0,1 ? xuanmao_001 From: Kevin Wolf Date: 2013-09-09 16:35 To: xuanmao_001 CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz Subject: Re: Re: savevm too slow Am 09.09.2013 um 03:57 hat xuanmao_001 geschrieben: the other question: when I change the buffer size #define IO_BUF_SIZE 32768 to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly. Is this for cache=unsafe as well? Juan, any specific reason for using 32k? I think it would be better to have a multiple of the qcow2 cluster size, otherwise we get COW for the empty part of newly allocated clusters. If we can't make it dynamic, using at least fixed 64k to match the qcow2 default would probably improve things a bit. with cache=writeback. Is there any risk for setting cache=writeback with IO_BUF_SIZE 1M ? No. Using a larger buffer size should be safe. Kevin ━━━ xuanmao_001 From: Kevin Wolf Date: 2013-09-06 18:38 To: xuanmao_001 CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz Subject: Re: savevm too slow Am 06.09.2013 um 03:31 hat xuanmao_001 geschrieben: Hi, qemuers: I found that the guest disk file cache mode will affect to the time of savevm. the cache 'writeback' too slow. but the cache 'unsafe' is as fast as it can, less than 10 seconds. here is the example I use virsh: @cache with writeback: #the first snapshot real0m21.904s user0m0.006s sys 0m0.008s #the secondary snapshot real2m11.624s user0m0.013s sys 0m0.008s @cache with unsafe: #the first snapshot real0m0.730s user0m0.006s sys 0m0.005s #the secondary snapshot real0m1.296s user0m0.002s sys 0m0.008s I sent patches that should eliminate the difference between the first and second snapshot at least. so, what the difference between them when using different cache. cache=unsafe ignores any flush requests. It's possible that there is potential for optimisation with cache=writeback, i.e. it sends flush requests that aren't necessary in fact. This is something that I haven't checked yet. the other question: when I change the buffer size #define IO_BUF_SIZE 32768 to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly. Is this for cache=unsafe as well? Juan, any specific reason for using 32k? I think it would be better to have a multiple of the qcow2 cluster size, otherwise we get COW for the empty part of newly allocated clusters. If we can't make it dynamic, using at least fixed 64k to match the qcow2 default would probably improve things a bit. Kevin
Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust
Il 08/09/2013 13:52, Gleb Natapov ha scritto: On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote: QEMU moves state from CPUArchState to struct kvm_xsave and back when it invokes the KVM_*_XSAVE ioctls. Because it doesn't treat the XSAVE region as an opaque blob, it might be impossible to set some state on the destination if migrating to an older version. This patch blocks migration if it finds that unsupported bits are set in the XSTATE_BV header field. To make this work robustly, QEMU should only report in env-xstate_bv those fields that will actually end up in the migration stream. We usually handle host cpu differences in cpuid layer, not by trying to validate migration data. Actually we do both. QEMU for example detects invalid subsections and blocks migration, and CPU differences also result in subsections that the destination does not know. But as far as QEMU is concerned, setting an unknown bit in XSTATE_BV is not a CPU difference, it is simply invalid migration data. i.e CPUID.0D should be configurable and management should be able to query QEMU what is supported and prevent migration attempt accordingly. Management is already able to query QEMU of what is supported, because new XSAVE state is always attached to new CPUID bits in leaves other than 0Dh (e.g. EAX=07h, ECX=0h returns AVX512 and MPX support in EBX). QEMU should compute 0Dh data based on those bits indeed. However, KVM_GET/SET_XSAVE should still return all values supported by the hypervisor, independent of the supported CPUID bits. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- target-i386/kvm.c | 3 ++- target-i386/machine.c | 4 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 749aa09..df08a4b 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -1291,7 +1291,8 @@ static int kvm_get_xsave(X86CPU *cpu) sizeof env-fpregs); memcpy(env-xmm_regs, xsave-region[XSAVE_XMM_SPACE], sizeof env-xmm_regs); -env-xstate_bv = *(uint64_t *)xsave-region[XSAVE_XSTATE_BV]; +env-xstate_bv = *(uint64_t *)xsave-region[XSAVE_XSTATE_BV] +XSTATE_SUPPORTED; Don't we just drop state here that will not be restored on the destination and destination will not be able to tell since we masked unsupported bits? A well-behaved guest should not have modified that state anyway, since: * the source and destination machines should have the same CPU * since the destination QEMU does not support the feature, the source should have masked it as well * the guest should always probe CPUID before using a feature There will be only one change for well-behaved guests with this patch (and the change will be invisible to them since they behave well). After the patch, KVM_SET_XSAVE will set the extended states to the processor-reset state instead of all-zeros. However, all currently-defined states have a processor-reset state that is equal to all-zeroes, so this change is theoretical. In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here, and we should hide all features that are not visible in CPUID. It is okay, however, to test it in cpu_post_load. Paolo memcpy(env-ymmh_regs, xsave-region[XSAVE_YMMH_SPACE], sizeof env-ymmh_regs); return 0; diff --git a/target-i386/machine.c b/target-i386/machine.c index dc81cde..9e2cfcf 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -278,6 +278,10 @@ static int cpu_post_load(void *opaque, int version_id) CPUX86State *env = cpu-env; int i; +if (env-xstate_bv ~XSTATE_SUPPORTED) { +return -EINVAL; +} + /* * Real mode guest segments register DPL should be zero. * Older KVM version were setting it wrongly. -- 1.8.3.1 -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12
On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote: Il 08/09/2013 13:40, Gleb Natapov ha scritto: On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote: On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv, and not restore anything. XRSTOR restores FP/SSE state to reset state if no bits are set in xstate_bv. This is what should happen on reset, no? Yes. The problem happens on the migration destination when XSAVE data is not transmitted. FP/SSE data is transmitted and must be restored, but xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset state. The vcpu then loses the values that were set in the migration data. Since FP and SSE data are always valid, set them in xstate_bv at reset time. In fact, that value is the same that KVM_GET_XSAVE returns on pre-XSAVE hosts. It is needed for migration between non xsave host to xsave host. Yes, and this patch does the same for migration between non-XSAVE QEMU and XSAVE QEMU. Can such migration happen? The commit that added xsave support (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id. In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores xstate_bv when XSAVE is not available. Instead, it should reset the FXSAVE data to processor-reset values (except for MXCSR which always comes from XRSTOR data), i.e. to all-zeros except for the x87 control and tag words. It should also check reserved bits of MXCSR. I do not see why. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- target-i386/cpu.c | 1 + target-i386/cpu.h | 5 + 2 files changed, 6 insertions(+) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index c36345e..ac83106 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2386,6 +2386,7 @@ static void x86_cpu_reset(CPUState *s) env-fpuc = 0x37f; env-mxcsr = 0x1f80; +env-xstate_bv = XSTATE_FP | XSTATE_SSE; env-pat = 0x0007040600070406ULL; env-msr_ia32_misc_enable = MSR_IA32_MISC_ENABLE_DEFAULT; diff --git a/target-i386/cpu.h b/target-i386/cpu.h index 5723eff..a153078 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -380,6 +380,11 @@ #define MSR_VM_HSAVE_PA 0xc0010117 +#define XSTATE_SUPPORTED (XSTATE_FP|XSTATE_SSE|XSTATE_YMM) Supported by whom? By QEMU? We should filer unsupported bits from CPUID.0D then too. Yes. QEMU unmarshals information from the XSAVE region and back, so it cannot support MPX or AVX-512 yet (even if KVM were). Separate bug, though. IMO this is the main issue here, not separate bug. If we gonna let guest use CPU state QEMU does not support we gonna have a bad time. -- Gleb.
Re: [Qemu-devel] [PATCH] ehci: save device pointer in EHCIState
Hi, On 09/09/2013 10:20 AM, Gerd Hoffmann wrote: We'll need a pointer to the actual pci/sysbus device, stick a pointer to it into the EHCIState struct. https://bugzilla.redhat.com/show_bug.cgi?id=1005495 Looks like we've been working on exactly the same bug at the same time, but we've come up with slightly different solutions. If we're going to go this way (which seems the best), you should also modify the qemu_sglist_init call in ehci_init_transfer for consistency, and drop the bus and qbus variable declarations at the top of the functions as those will be unused then. Regards, Hans
Re: [Qemu-devel] savevm too slow
Am 09.09.2013 um 10:47 hat xuanmao_001 geschrieben: I sent patches that should eliminate the difference between the first and second snapshot at least. where I can find the patches that can eliminate the difference between the first and second snapshot ? Does they fit qemu-kvm-1.0,1 ? I sent them to you on Friday, the first email has the following subject line: [PATCH 0/2] qcow2: Discard VM state in active L1 after creating snapshot This patch series is for current git master, and chances are that it would work for qemu 1.6 as well. It will most likely not apply for qemu 1.0, which is almost two years old. Kevin ━━━ xuanmao_001 From: Kevin Wolf Date: 2013-09-09 16:35 To: xuanmao_001 CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz Subject: Re: Re: savevm too slow Am 09.09.2013 um 03:57 hat xuanmao_001 geschrieben: the other question: when I change the buffer size # define IO_BUF_SIZE 32768 to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly. Is this for cache=unsafe as well? Juan, any specific reason for using 32k? I think it would be better to have a multiple of the qcow2 cluster size, otherwise we get COW for the empty part of newly allocated clusters. If we can't make it dynamic, using at least fixed 64k to match the qcow2 default would probably improve things a bit. with cache=writeback. Is there any risk for setting cache=writeback with IO_BUF_SIZE 1M ? No. Using a larger buffer size should be safe. Kevin ━━━ xuanmao_001 From: Kevin Wolf Date: 2013-09-06 18:38 To: xuanmao_001 CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz Subject: Re: savevm too slow Am 06.09.2013 um 03:31 hat xuanmao_001 geschrieben: Hi, qemuers: I found that the guest disk file cache mode will affect to the time of savevm. the cache 'writeback' too slow. but the cache 'unsafe' is as fast as it can, less than 10 seconds. here is the example I use virsh: @cache with writeback: #the first snapshot real0m21.904s user0m0.006s sys 0m0.008s #the secondary snapshot real2m11.624s user0m0.013s sys 0m0.008s @cache with unsafe: #the first snapshot real0m0.730s user0m0.006s sys 0m0.005s #the secondary snapshot real0m1.296s user0m0.002s sys 0m0.008s I sent patches that should eliminate the difference between the first and second snapshot at least. so, what the difference between them when using different cache. cache=unsafe ignores any flush requests. It's possible that there is potential for optimisation with cache=writeback, i.e. it sends flush requests that aren't necessary in fact. This is something that I haven't checked yet. the other question: when I change the buffer size #define IO_BUF_SIZE 32768 to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly. Is this for cache=unsafe as well? Juan, any specific reason for using 32k? I think it would be better to have a multiple of the qcow2 cluster size, otherwise we get COW for the empty part of newly allocated clusters. If we can't make it dynamic, using at least fixed 64k to match the qcow2 default would probably improve things a bit. Kevin
Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust
On Mon, Sep 09, 2013 at 10:51:58AM +0200, Paolo Bonzini wrote: Il 08/09/2013 13:52, Gleb Natapov ha scritto: On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote: QEMU moves state from CPUArchState to struct kvm_xsave and back when it invokes the KVM_*_XSAVE ioctls. Because it doesn't treat the XSAVE region as an opaque blob, it might be impossible to set some state on the destination if migrating to an older version. This patch blocks migration if it finds that unsupported bits are set in the XSTATE_BV header field. To make this work robustly, QEMU should only report in env-xstate_bv those fields that will actually end up in the migration stream. We usually handle host cpu differences in cpuid layer, not by trying to validate migration data. Actually we do both. QEMU for example detects invalid subsections and blocks migration, and CPU differences also result in subsections that the destination does not know. That's different from what you do here though. If xstate_bv was in its separate subsection things would be easier, but it is not. But as far as QEMU is concerned, setting an unknown bit in XSTATE_BV is not a CPU difference, it is simply invalid migration data. i.e CPUID.0D should be configurable and management should be able to query QEMU what is supported and prevent migration attempt accordingly. Management is already able to query QEMU of what is supported, because new XSAVE state is always attached to new CPUID bits in leaves other than 0Dh (e.g. EAX=07h, ECX=0h returns AVX512 and MPX support in EBX). QEMU should compute 0Dh data based on those bits indeed. If it is computable from other data even better, easier for us. However, KVM_GET/SET_XSAVE should still return all values supported by the hypervisor, independent of the supported CPUID bits. Why? Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- target-i386/kvm.c | 3 ++- target-i386/machine.c | 4 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 749aa09..df08a4b 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -1291,7 +1291,8 @@ static int kvm_get_xsave(X86CPU *cpu) sizeof env-fpregs); memcpy(env-xmm_regs, xsave-region[XSAVE_XMM_SPACE], sizeof env-xmm_regs); -env-xstate_bv = *(uint64_t *)xsave-region[XSAVE_XSTATE_BV]; +env-xstate_bv = *(uint64_t *)xsave-region[XSAVE_XSTATE_BV] +XSTATE_SUPPORTED; Don't we just drop state here that will not be restored on the destination and destination will not be able to tell since we masked unsupported bits? A well-behaved guest should not have modified that state anyway, since: * the source and destination machines should have the same CPU * since the destination QEMU does not support the feature, the source should have masked it as well * the guest should always probe CPUID before using a feature The I fail to see what is the purpose of the patch. I see two cases: 1. Each extended state has separate CPUID bit (is this guarantied?) - In this case, as you say, by matching CPUID on src and dst we guaranty that migration data is good. 2. There is a state that is advertised in CPUID.0D, but does not have any regular feature CPUID associated with it. - In this case this patch will drop valid state that needs to be restored. There will be only one change for well-behaved guests with this patch (and the change will be invisible to them since they behave well). After the patch, KVM_SET_XSAVE will set the extended states to the processor-reset state instead of all-zeros. However, all currently-defined states have a processor-reset state that is equal to all-zeroes, so this change is theoretical. In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here, and we should hide all features that are not visible in CPUID. It is okay, however, to test it in cpu_post_load. The kernel should not even return state that is not visible in CPUID. Paolo memcpy(env-ymmh_regs, xsave-region[XSAVE_YMMH_SPACE], sizeof env-ymmh_regs); return 0; diff --git a/target-i386/machine.c b/target-i386/machine.c index dc81cde..9e2cfcf 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -278,6 +278,10 @@ static int cpu_post_load(void *opaque, int version_id) CPUX86State *env = cpu-env; int i; +if (env-xstate_bv ~XSTATE_SUPPORTED) { +return -EINVAL; +} + /* * Real mode guest segments register DPL should be zero. * Older KVM version were setting it wrongly. -- 1.8.3.1 -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --
Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration
On 09.09.2013, at 11:29, Benjamin Herrenschmidt wrote: On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote: I think it's ok to restrict live migration to machines with the same tb frequency when kvm is enabled. Whether you implement it through a hardcoded 512Mhz or through a timebase value that gets live migrated and then compared is up to you - both ways work for me :). The latter might be handy if we want to support migration on 970, though we don't have the TBU40 stuff there so adjusting the TB in the host kernel would be ... problematic. Well, we could save/restore TB when we enter/exit the guest, no? But I really only want to have something that doesn't block the door for someone who wants to enable things. Alex
Re: [Qemu-devel] [PATCH] raw-win32.c: Fix incorrect handling behaviour of small block files
Am 09.09.2013 um 11:14 hat Tal Kain geschrieben: It is a valid case that the read data's size is smaller than the requested size since there could be files that are smaller than the minimum block size (For ex. when a VMDK disk descriptor file) Signed-off-by: Tal Kain tal.k...@ravellosystems.com Thanks, applied to the block branch. Kevin
Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration
On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote: I think it's ok to restrict live migration to machines with the same tb frequency when kvm is enabled. Whether you implement it through a hardcoded 512Mhz or through a timebase value that gets live migrated and then compared is up to you - both ways work for me :). The latter might be handy if we want to support migration on 970, though we don't have the TBU40 stuff there so adjusting the TB in the host kernel would be ... problematic. Cheers, Ben.
Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration
On Mon, 2013-09-09 at 11:32 +0200, Alexander Graf wrote: On 09.09.2013, at 11:29, Benjamin Herrenschmidt wrote: On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote: I think it's ok to restrict live migration to machines with the same tb frequency when kvm is enabled. Whether you implement it through a hardcoded 512Mhz or through a timebase value that gets live migrated and then compared is up to you - both ways work for me :). The latter might be handy if we want to support migration on 970, though we don't have the TBU40 stuff there so adjusting the TB in the host kernel would be ... problematic. Well, we could save/restore TB when we enter/exit the guest, no? Hard to do without introducing drift... But I really only want to have something that doesn't block the door for someone who wants to enable things. Alex
Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration
On 09.09.2013, at 11:38, Benjamin Herrenschmidt wrote: On Mon, 2013-09-09 at 11:32 +0200, Alexander Graf wrote: On 09.09.2013, at 11:29, Benjamin Herrenschmidt wrote: On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote: I think it's ok to restrict live migration to machines with the same tb frequency when kvm is enabled. Whether you implement it through a hardcoded 512Mhz or through a timebase value that gets live migrated and then compared is up to you - both ways work for me :). The latter might be handy if we want to support migration on 970, though we don't have the TBU40 stuff there so adjusting the TB in the host kernel would be ... problematic. Well, we could save/restore TB when we enter/exit the guest, no? Hard to do without introducing drift... We could probably quantify the drift through DEC. But I'm personally not too eager to worry about live migration on 970 :). Alex
Re: [Qemu-devel] vfio for platform devices - 9/5/2012 - minutes
Hi Will, Just trying to understand the scope of platform device assignment to guest on ARM. So, are the AMBA devices also represented in the device tree? Regards Varun -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc- ow...@vger.kernel.org] On Behalf Of Will Deacon Sent: Monday, September 09, 2013 2:23 PM To: Yoder Stuart-B08248 Cc: Sethi Varun-B16395; Wood Scott-B07421; Bhushan Bharat-R65777; 'Peter Maydell'; 'Santosh Shukla'; 'Alex Williamson'; 'Alexander Graf'; 'Antonios Motakis'; 'Christoffer Dall'; 'kim.phill...@linaro.org'; kvm...@lists.cs.columbia.edu; kvm-...@vger.kernel.org; qemu- de...@nongnu.org Subject: Re: vfio for platform devices - 9/5/2012 - minutes On Fri, Sep 06, 2013 at 07:20:19PM +0100, Yoder Stuart-B08248 wrote: Adding Will... [...] I have a query about the ARM SMMU driver. In the ARM smmu driver I see, that bus notifiers are registered for both amba and platform bus. Amba is the I/O interconnect, right? Why is bus notifier required for the amba bus? Not sure I follow the question, really. If you have a DMA master registered as an AMBA device (e.g. PL330) and it wants to use an SMMU, then you need to be registered on that bus. What would you prefer instead? Will -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust
Il 09/09/2013 11:18, Gleb Natapov ha scritto: On Mon, Sep 09, 2013 at 10:51:58AM +0200, Paolo Bonzini wrote: Il 08/09/2013 13:52, Gleb Natapov ha scritto: On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote: QEMU moves state from CPUArchState to struct kvm_xsave and back when it invokes the KVM_*_XSAVE ioctls. Because it doesn't treat the XSAVE region as an opaque blob, it might be impossible to set some state on the destination if migrating to an older version. This patch blocks migration if it finds that unsupported bits are set in the XSTATE_BV header field. To make this work robustly, QEMU should only report in env-xstate_bv those fields that will actually end up in the migration stream. We usually handle host cpu differences in cpuid layer, not by trying to validate migration data. Actually we do both. QEMU for example detects invalid subsections and blocks migration, and CPU differences also result in subsections that the destination does not know. That's different from what you do here though. If xstate_bv was in its separate subsection things would be easier, but it is not. I agree. And also if YMM was in its separate subsections; future XSAVE states will likely use subsections (whose presence is keyed off bits in env-xstate_bv). However, KVM_GET/SET_XSAVE should still return all values supported by the hypervisor, independent of the supported CPUID bits. Why? Because this is not talking to the guest, it is talking to userspace. The VCPU state is more than what is visible to the guest, and returning all of it seems more consistent with the rest of the KVM API. For example, KVM_GET_FPU always returns SSE state even if the CPUID lacks SSE and/or FXSR. A well-behaved guest should not have modified that state anyway, since: * the source and destination machines should have the same CPU * since the destination QEMU does not support the feature, the source should have masked it as well * the guest should always probe CPUID before using a feature The I fail to see what is the purpose of the patch. I see two cases: 1. Each extended state has separate CPUID bit (is this guarantied?) Not guaranteed, but it has always happened so far (AVX, AVX-512, MPX). - In this case, as you say, by matching CPUID on src and dst we guaranty that migration data is good. But we don't match CPUID on src and destination. This is something that the user should do, but it's better if we can test it too. Subsections do that for us; I am, in some sense, emulating subsections for the XSAVE states that are not stored in subsections. In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here, and we should hide all features that are not visible in CPUID. It is okay, however, to test it in cpu_post_load. The kernel should not even return state that is not visible in CPUID. That's an interesting point of view that I hadn't considered. But just like you asked me why it should return state that is not visible in CPUID, I'm asking you why it should not... Paolo Paolo memcpy(env-ymmh_regs, xsave-region[XSAVE_YMMH_SPACE], sizeof env-ymmh_regs); return 0; diff --git a/target-i386/machine.c b/target-i386/machine.c index dc81cde..9e2cfcf 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -278,6 +278,10 @@ static int cpu_post_load(void *opaque, int version_id) CPUX86State *env = cpu-env; int i; +if (env-xstate_bv ~XSTATE_SUPPORTED) { +return -EINVAL; +} + /* * Real mode guest segments register DPL should be zero. * Older KVM version were setting it wrongly. -- 1.8.3.1 -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12
Il 09/09/2013 11:03, Gleb Natapov ha scritto: On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote: Il 08/09/2013 13:40, Gleb Natapov ha scritto: On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote: On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv, and not restore anything. XRSTOR restores FP/SSE state to reset state if no bits are set in xstate_bv. This is what should happen on reset, no? Yes. The problem happens on the migration destination when XSAVE data is not transmitted. FP/SSE data is transmitted and must be restored, but xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset state. The vcpu then loses the values that were set in the migration data. Since FP and SSE data are always valid, set them in xstate_bv at reset time. In fact, that value is the same that KVM_GET_XSAVE returns on pre-XSAVE hosts. It is needed for migration between non xsave host to xsave host. Yes, and this patch does the same for migration between non-XSAVE QEMU and XSAVE QEMU. Can such migration happen? The commit that added xsave support (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id. Yes, old-new migration can happen. New-old of course cannot. In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores xstate_bv when XSAVE is not available. Instead, it should reset the FXSAVE data to processor-reset values (except for MXCSR which always comes from XRSTOR data), i.e. to all-zeros except for the x87 control and tag words. It should also check reserved bits of MXCSR. I do not see why. Because otherwise it behaves in a subtly different manner for XSAVE and non-XSAVE hosts. Yes. QEMU unmarshals information from the XSAVE region and back, so it cannot support MPX or AVX-512 yet (even if KVM were). Separate bug, though. IMO this is the main issue here, not separate bug. If we gonna let guest use CPU state QEMU does not support we gonna have a bad time. We cannot force the guest not to use a feature; all we can do is hide the CPUID bits so that a well-behaved guest will not use it. QEMU does hide CPUID bits for non-supported XSAVE states, except for -cpu host. So this will not be a problem except with -cpu host. Paolo
[Qemu-devel] [PATCH v2] ehci: save device pointer in EHCIState
We'll need a pointer to the actual pci/sysbus device, stick a pointer to it into the EHCIState struct. https://bugzilla.redhat.com/show_bug.cgi?id=1005495 Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/usb/hcd-ehci.c | 7 +++ hw/usb/hcd-ehci.h | 1 + 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 137e200..22bdbf4 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -1241,13 +1241,11 @@ static int ehci_init_transfer(EHCIPacket *p) { uint32_t cpage, offset, bytes, plen; dma_addr_t page; -USBBus *bus = p-queue-ehci-bus; -BusState *qbus = BUS(bus); cpage = get_field(p-qtd.token, QTD_TOKEN_CPAGE); bytes = get_field(p-qtd.token, QTD_TOKEN_TBYTES); offset = p-qtd.bufptr[0] ~QTD_BUFPTR_MASK; -qemu_sglist_init(p-sgl, qbus-parent, 5, p-queue-ehci-as); +qemu_sglist_init(p-sgl, p-queue-ehci-device, 5, p-queue-ehci-as); while (bytes 0) { if (cpage 4) { @@ -1486,7 +1484,7 @@ static int ehci_process_itd(EHCIState *ehci, return -1; } -qemu_sglist_init(ehci-isgl, DEVICE(ehci), 2, ehci-as); +qemu_sglist_init(ehci-isgl, ehci-device, 2, ehci-as); if (off + len 4096) { /* transfer crosses page border */ uint32_t len2 = off + len - 4096; @@ -2529,6 +2527,7 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, Error **errp) s-frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ehci_frame_timer, s); s-async_bh = qemu_bh_new(ehci_frame_timer, s); +s-device = dev; qemu_register_reset(ehci_reset, s); qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s); diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h index 15a28e8..065c9fa 100644 --- a/hw/usb/hcd-ehci.h +++ b/hw/usb/hcd-ehci.h @@ -255,6 +255,7 @@ typedef QTAILQ_HEAD(EHCIQueueHead, EHCIQueue) EHCIQueueHead; struct EHCIState { USBBus bus; +DeviceState *device; qemu_irq irq; MemoryRegion mem; AddressSpace *as; -- 1.8.3.1
Re: [Qemu-devel] [PATCH] ehci: save device pointer in EHCIState
On Mo, 2013-09-09 at 11:04 +0200, Hans de Goede wrote: Hi, On 09/09/2013 10:20 AM, Gerd Hoffmann wrote: We'll need a pointer to the actual pci/sysbus device, stick a pointer to it into the EHCIState struct. https://bugzilla.redhat.com/show_bug.cgi?id=1005495 Looks like we've been working on exactly the same bug at the same time, but we've come up with slightly different solutions. Yep, and then git send-email exactly the same minute ;) If we're going to go this way (which seems the best), you should also modify the qemu_sglist_init call in ehci_init_transfer for consistency, and drop the bus and qbus variable declarations at the top of the functions as those will be unused then. Makes sense indeed. Done, v2 sent. cheers, Gerd
Re: [Qemu-devel] [PATCH] ehci: Fix crash with isoc usb packets
Il 09/09/2013 10:20, Hans de Goede ha scritto: The isoc packet path in the ehci code has a bad qobject cast, causing an abort, this patch fixes this. Note this problem is backported in 1.6.0 too, and this patch should be backported to the 1.6.0 stable tree. Signed-off-by: Hans de Goede hdego...@redhat.com --- hw/usb/hcd-ehci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 010a0d0..77c4872 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -1486,7 +1486,8 @@ static int ehci_process_itd(EHCIState *ehci, return -1; } -qemu_sglist_init(ehci-isgl, DEVICE(ehci), 2, ehci-as); +qemu_sglist_init(ehci-isgl, BUS(ehci-bus)-parent, + 2, ehci-as); if (off + len 4096) { /* transfer crosses page border */ uint32_t len2 = off + len - 4096; ... then qemu-stable should be CCed. Paolo
[Qemu-devel] [PATCH] block/qcow2: Use bdrv_truncate for size amend
When amending the size option for a qcow2 image, use bdrv_truncate instead of qcow2_truncate directly, since the latter will not adjust the total_sectors count in the BDS structure (whereas the former will). Signed-off-by: Max Reitz mre...@redhat.com --- Depends on (follow-up to): - block/qcow2: Image file option amendment (series, v5) --- block/qcow2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/qcow2.c b/block/qcow2.c index d29547b..259edfd 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1997,7 +1997,7 @@ static int qcow2_amend_options(BlockDriverState *bs, } if (new_size) { -ret = qcow2_truncate(bs, new_size); +ret = bdrv_truncate(bs, new_size); if (ret 0) { return ret; } -- 1.8.3.1
Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist
Il 06/09/2013 20:41, Eduardo Otubo ha scritto: Hello, Any chance to get this patch applied? Thanks! Paul, perhaps you can add yourself to MAINTAINERS and send a pull request? Paolo On 09/04/2013 11:11 AM, Paul Moore wrote: On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote: This was causing Qemu process to hang when using -sandbox on. Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175 Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com Works for me. Tested-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 37d38f8..69cee44 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(getuid), 245 }, { SCMP_SYS(geteuid), 245 }, { SCMP_SYS(timer_create), 245 }, +{ SCMP_SYS(times), 245 }, { SCMP_SYS(exit), 245 }, { SCMP_SYS(clock_gettime), 245 }, { SCMP_SYS(time), 245 },
Re: [Qemu-devel] [PATCH 0/9] Remove legacy unaligned bswap functions
Ping! thanks -- PMM On 25 August 2013 15:59, Peter Maydell peter.mayd...@linaro.org wrote: The bswap.h header includes a set of legacy unaligned functions that (since commit c732a52d3 at the beginning of this year) are just wrappers for underlying {ld,st}type functions. The legacy functions aren't used in many places, so just replace all their uses with uses of the new-style {ld,st} functions; this lets us remove the legacy wrappers altogether. Since we know the {ld,st}* routines are definitely functions, we can in the process remove some casts which were left over from when the legacy unaligned functions were previously macros. The patchset is divided up by function being removed, rather than by which device/subsystem is being fixed; I think this way round is easier to review since you only have to keep one substitution in your head when reading a patch. Peter Maydell (9): bswap.h: Remove cpu_to_le16wu() bswap.h: Remove cpu_to_le32wu() bswap.h: Remove le16_to_cpupu() bswap.h: Remove le32_to_cpupu() bswap.h: Remove be32_to_cpupu() bswap.h: Remove cpu_to_be16wu() bswap.h: Remove cpu_to_be32wu() bswap.h: Remove cpu_to_be64wu() bswap.h: Remove cpu_to_32wu() block/qcow2-cluster.c |2 +- hw/acpi/core.c|3 +-- hw/block/cdrom.c | 10 +- hw/display/vga_template.h | 14 -- hw/ide/atapi.c| 16 +++ hw/net/e1000.c| 22 + hw/net/ne2000.c |4 ++-- hw/pci/pcie_aer.c |4 ++-- include/hw/pci/pci.h |8 include/qemu/bswap.h | 47 - 10 files changed, 40 insertions(+), 90 deletions(-) -- 1.7.9.5
Re: [Qemu-devel] [RFC PATCH v2 1/6] make.rule: fix $(obj) to a real relative path
Il 09/09/2013 03:34, Fam Zheng ha scritto: On Fri, 09/06 20:19, Lluís Vilanova wrote: Fam Zheng writes: [...] Because $(obj) here is './block', instead of '../block'. This doesn't hurt compiling because we basically build all .o from top Makefile, before entering Makefile.target, but it will affact arriving per-object libs support. I'm curious. What's the reason to not use recursive make in QEMU? I don't know the answer, Paolo? It predates my involvement by a long time, so I don't know. But my guess is that whenever directories are not present in the build tree (e.g. i386-softmmu/hw) we have to create the Makefile in the configure script. Thus a heavily declarative Makefile style works better. Paolo
Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust
On Mon, Sep 09, 2013 at 11:50:03AM +0200, Paolo Bonzini wrote: Il 09/09/2013 11:18, Gleb Natapov ha scritto: On Mon, Sep 09, 2013 at 10:51:58AM +0200, Paolo Bonzini wrote: Il 08/09/2013 13:52, Gleb Natapov ha scritto: On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote: QEMU moves state from CPUArchState to struct kvm_xsave and back when it invokes the KVM_*_XSAVE ioctls. Because it doesn't treat the XSAVE region as an opaque blob, it might be impossible to set some state on the destination if migrating to an older version. This patch blocks migration if it finds that unsupported bits are set in the XSTATE_BV header field. To make this work robustly, QEMU should only report in env-xstate_bv those fields that will actually end up in the migration stream. We usually handle host cpu differences in cpuid layer, not by trying to validate migration data. Actually we do both. QEMU for example detects invalid subsections and blocks migration, and CPU differences also result in subsections that the destination does not know. That's different from what you do here though. If xstate_bv was in its separate subsection things would be easier, but it is not. I agree. And also if YMM was in its separate subsections; future XSAVE states will likely use subsections (whose presence is keyed off bits in env-xstate_bv). However, KVM_GET/SET_XSAVE should still return all values supported by the hypervisor, independent of the supported CPUID bits. Why? Because this is not talking to the guest, it is talking to userspace. The VCPU state is more than what is visible to the guest, and returning If a state does not affect guest in any way there is not reason to migrate it. all of it seems more consistent with the rest of the KVM API. For example, KVM_GET_FPU always returns SSE state even if the CPUID lacks SSE and/or FXSR. There are counter examples too :) If APIC is not created we do not return fake information on GET_IRQCHIP. I think nobody expected FPU state to grow indefinitely, so fixed, inflexible API was introduced, but now, when CPU state has flexible extended state management it make sense to model it in the API too. A well-behaved guest should not have modified that state anyway, since: * the source and destination machines should have the same CPU * since the destination QEMU does not support the feature, the source should have masked it as well * the guest should always probe CPUID before using a feature The I fail to see what is the purpose of the patch. I see two cases: 1. Each extended state has separate CPUID bit (is this guarantied?) Not guaranteed, but it has always happened so far (AVX, AVX-512, MPX). OK, So for now no need to make 0D configurable, but we need to provide correct one according to those flags, not to passthrough host values. - In this case, as you say, by matching CPUID on src and dst we guaranty that migration data is good. But we don't match CPUID on src and destination. This is something that Yes, I was saying that management infrastructure already knows how to handle it. the user should do, but it's better if we can test it too. Subsections do that for us; I am, in some sense, emulating subsections for the XSAVE states that are not stored in subsections. We do not do it for other bits. It is possible currently to migrate to a slightly different cpu without failure and it may cause guest to crash, but we are not trying actively to catch those situations. Why XSAVE is different? In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here, and we should hide all features that are not visible in CPUID. It is okay, however, to test it in cpu_post_load. The kernel should not even return state that is not visible in CPUID. That's an interesting point of view that I hadn't considered. But just like you asked me why it should return state that is not visible in CPUID, I'm asking you why it should not... For number of reasons. First because since a sate is not used there is no point in migrating it. Second to make interface more deterministic for QEMU. i.e QEMU configures only features it supports and gets exactly same state from the kernel no matter what host cpu is and what kernel version is. This patch will not be needed since kernel will do the job. -- Gleb.
Re: [Qemu-devel] [RFC PATCH v2 3/6] Makefile: introduce common-obj-m and block-obj-m for DSO
Il 09/09/2013 04:26, Fam Zheng ha scritto: Actually I would like a try to keep $i-{obj,cflags,libs}: foo-obj := bar.o biz.o foo-libs := $(FOO_LIBS) foo-cflags := $(FOO_CFLAGS) getting rid of all the $(obj)/'s and expand them in the toplevel unnest magic (because we know foo.mo is contained in local $(block-obj-y) so it's all trackable). If this idea can't work out, I'll use your suggested way. Yes, either way works (i.e. as long as it's consistent). Paolo
Re: [Qemu-devel] [RFC PATCH v2 1/6] make.rule: fix $(obj) to a real relative path
On 6 September 2013 18:19, Lluís Vilanova vilan...@ac.upc.edu wrote: Fam Zheng writes: [...] Because $(obj) here is './block', instead of '../block'. This doesn't hurt compiling because we basically build all .o from top Makefile, before entering Makefile.target, but it will affact arriving per-object libs support. I'm curious. What's the reason to not use recursive make in QEMU? The classic paper is Recursive make considered harmful: http://aegis.sourceforge.net/auug97.pdf -- PMM
Re: [Qemu-devel] [PATCH v2] ehci: save device pointer in EHCIState
Hi, On 09/09/2013 11:57 AM, Gerd Hoffmann wrote: We'll need a pointer to the actual pci/sysbus device, stick a pointer to it into the EHCIState struct. https://bugzilla.redhat.com/show_bug.cgi?id=1005495 Looks good, note you've forgotten to add qemu-stable, I've done so now. Acked-by: Hans de Goede hdego...@redhat.com Regards, Hans Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/usb/hcd-ehci.c | 7 +++ hw/usb/hcd-ehci.h | 1 + 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 137e200..22bdbf4 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -1241,13 +1241,11 @@ static int ehci_init_transfer(EHCIPacket *p) { uint32_t cpage, offset, bytes, plen; dma_addr_t page; -USBBus *bus = p-queue-ehci-bus; -BusState *qbus = BUS(bus); cpage = get_field(p-qtd.token, QTD_TOKEN_CPAGE); bytes = get_field(p-qtd.token, QTD_TOKEN_TBYTES); offset = p-qtd.bufptr[0] ~QTD_BUFPTR_MASK; -qemu_sglist_init(p-sgl, qbus-parent, 5, p-queue-ehci-as); +qemu_sglist_init(p-sgl, p-queue-ehci-device, 5, p-queue-ehci-as); while (bytes 0) { if (cpage 4) { @@ -1486,7 +1484,7 @@ static int ehci_process_itd(EHCIState *ehci, return -1; } -qemu_sglist_init(ehci-isgl, DEVICE(ehci), 2, ehci-as); +qemu_sglist_init(ehci-isgl, ehci-device, 2, ehci-as); if (off + len 4096) { /* transfer crosses page border */ uint32_t len2 = off + len - 4096; @@ -2529,6 +2527,7 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, Error **errp) s-frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ehci_frame_timer, s); s-async_bh = qemu_bh_new(ehci_frame_timer, s); +s-device = dev; qemu_register_reset(ehci_reset, s); qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s); diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h index 15a28e8..065c9fa 100644 --- a/hw/usb/hcd-ehci.h +++ b/hw/usb/hcd-ehci.h @@ -255,6 +255,7 @@ typedef QTAILQ_HEAD(EHCIQueueHead, EHCIQueue) EHCIQueueHead; struct EHCIState { USBBus bus; +DeviceState *device; qemu_irq irq; MemoryRegion mem; AddressSpace *as;
[Qemu-devel] [PATCH] e1000: NetClientInfo.receive_iov implemented
This patch implements the NetClientInfo.receive_iov method for the e1000 device emulation. In this way a network backend that uses qemu_sendv_packet() can deliver the fragmented packet without requiring an additional copy in the frontend/backend network code (nc_sendv_compat() function). The existing method NetClientInfo.receive has been reimplemented using the new method. Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com --- hw/net/e1000.c | 70 -- 1 file changed, 58 insertions(+), 12 deletions(-) I propose this patch also because our research group (University of Pisa, Department of Computer Engineering) is working on the e1000 device (optimizations and paravirtual extensions) and we have patches to support the VALE switch as a network backend (see http://info.iet.unipi.it/~luigi/vale/). The VALE backend uses qemu_sendv_packet() to send fragmented packets: For this reason we think it could be interesting to better support these packets with e1000. diff --git a/hw/net/e1000.c b/hw/net/e1000.c index f5ebed4..70ab17f 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -32,6 +32,7 @@ #include hw/loader.h #include sysemu/sysemu.h #include sysemu/dma.h +#include qemu/iov.h #include e1000_regs.h @@ -64,6 +65,8 @@ static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL); /* this is the size past which hardware will drop packets when setting LPE=1 */ #define MAXIMUM_ETHERNET_LPE_SIZE 16384 +#define MAXIMUM_ETHERNET_HDR_LEN (14+4) + /* * HW models: * E1000_DEV_ID_82540EM works with Windows and Linux @@ -825,7 +828,7 @@ static uint64_t rx_desc_base(E1000State *s) } static ssize_t -e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size) +e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt) { E1000State *s = qemu_get_nic_opaque(nc); PCIDevice *d = PCI_DEVICE(s); @@ -834,11 +837,14 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size) unsigned int n, rdt; uint32_t rdh_start; uint16_t vlan_special = 0; -uint8_t vlan_status = 0, vlan_offset = 0; +uint8_t vlan_status = 0; uint8_t min_buf[MIN_BUF_SIZE]; size_t desc_offset; size_t desc_size; size_t total_size; +size_t size = iov_size(iov, iovcnt), iov_ofs = 0; +struct iovec iv; +uint8_t *filter_buf = iov-iov_base; if (!(s-mac_reg[STATUS] E1000_STATUS_LU)) { return -1; @@ -850,10 +856,16 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size) /* Pad to minimum Ethernet frame length */ if (size sizeof(min_buf)) { -memcpy(min_buf, buf, size); +iov_to_buf(iov, iovcnt, 0, min_buf, size); memset(min_buf[size], 0, sizeof(min_buf) - size); -buf = min_buf; -size = sizeof(min_buf); +iv.iov_base = filter_buf = min_buf; +iv.iov_len = size = sizeof(min_buf); +iovcnt = 1; +iov = iv; +} else if (iov-iov_len MAXIMUM_ETHERNET_HDR_LEN) { +/* This is very unlikely, but may happen. */ +iov_to_buf(iov, iovcnt, 0, min_buf, MAXIMUM_ETHERNET_HDR_LEN); +filter_buf = min_buf; } /* Discard oversized packets if !LPE and !SBP. */ @@ -864,14 +876,24 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size) return size; } -if (!receive_filter(s, buf, size)) +if (!receive_filter(s, filter_buf, size)) { return size; +} -if (vlan_enabled(s) is_vlan_packet(s, buf)) { -vlan_special = cpu_to_le16(be16_to_cpup((uint16_t *)(buf + 14))); -memmove((uint8_t *)buf + 4, buf, 12); +if (vlan_enabled(s) is_vlan_packet(s, filter_buf)) { +vlan_special = cpu_to_le16(be16_to_cpup((uint16_t *)(filter_buf ++ 14))); +iov_ofs = 4; +if (filter_buf == iov-iov_base) { +memmove(filter_buf + 4, filter_buf, 12); +} else { +iov_from_buf(iov, iovcnt, 4, filter_buf, 12); +while (iov-iov_len = iov_ofs) { +iov_ofs -= iov-iov_len; +iov++; +} +} vlan_status = E1000_RXD_STAT_VP; -vlan_offset = 4; size -= 4; } @@ -893,12 +915,24 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size) desc.status |= (vlan_status | E1000_RXD_STAT_DD); if (desc.buffer_addr) { if (desc_offset size) { +size_t iov_copy, copied = 0; +hwaddr ba = le64_to_cpu(desc.buffer_addr); size_t copy_size = size - desc_offset; if (copy_size s-rxbuf_size) { copy_size = s-rxbuf_size; } -pci_dma_write(d, le64_to_cpu(desc.buffer_addr), - buf + desc_offset + vlan_offset, copy_size); +do { +iov_copy =
Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12
On Mon, Sep 09, 2013 at 11:53:45AM +0200, Paolo Bonzini wrote: Il 09/09/2013 11:03, Gleb Natapov ha scritto: On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote: Il 08/09/2013 13:40, Gleb Natapov ha scritto: On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote: On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv, and not restore anything. XRSTOR restores FP/SSE state to reset state if no bits are set in xstate_bv. This is what should happen on reset, no? Yes. The problem happens on the migration destination when XSAVE data is not transmitted. FP/SSE data is transmitted and must be restored, but xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset state. The vcpu then loses the values that were set in the migration data. Since FP and SSE data are always valid, set them in xstate_bv at reset time. In fact, that value is the same that KVM_GET_XSAVE returns on pre-XSAVE hosts. It is needed for migration between non xsave host to xsave host. Yes, and this patch does the same for migration between non-XSAVE QEMU and XSAVE QEMU. Can such migration happen? The commit that added xsave support (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id. Yes, old-new migration can happen. New-old of course cannot. I see. I am fine with the patch, but please drop defines that are not used in the patch itself. In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores xstate_bv when XSAVE is not available. Instead, it should reset the FXSAVE data to processor-reset values (except for MXCSR which always comes from XRSTOR data), i.e. to all-zeros except for the x87 control and tag words. It should also check reserved bits of MXCSR. I do not see why. Because otherwise it behaves in a subtly different manner for XSAVE and non-XSAVE hosts. I do not see how. Can you elaborate? Yes. QEMU unmarshals information from the XSAVE region and back, so it cannot support MPX or AVX-512 yet (even if KVM were). Separate bug, though. IMO this is the main issue here, not separate bug. If we gonna let guest use CPU state QEMU does not support we gonna have a bad time. We cannot force the guest not to use a feature; all we can do is hide Of course we can't, this is correct for other features too, but this is guest's problem. the CPUID bits so that a well-behaved guest will not use it. QEMU does hide CPUID bits for non-supported XSAVE states, except for -cpu host. So this will not be a problem except with -cpu host. -- Gleb.
Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12
On Mon, Sep 09, 2013 at 01:54:50PM +0300, Gleb Natapov wrote: On Mon, Sep 09, 2013 at 11:53:45AM +0200, Paolo Bonzini wrote: Il 09/09/2013 11:03, Gleb Natapov ha scritto: On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote: Il 08/09/2013 13:40, Gleb Natapov ha scritto: On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote: On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv, and not restore anything. XRSTOR restores FP/SSE state to reset state if no bits are set in xstate_bv. This is what should happen on reset, no? Yes. The problem happens on the migration destination when XSAVE data is not transmitted. FP/SSE data is transmitted and must be restored, but xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset state. The vcpu then loses the values that were set in the migration data. Since FP and SSE data are always valid, set them in xstate_bv at reset time. In fact, that value is the same that KVM_GET_XSAVE returns on pre-XSAVE hosts. It is needed for migration between non xsave host to xsave host. Yes, and this patch does the same for migration between non-XSAVE QEMU and XSAVE QEMU. Can such migration happen? The commit that added xsave support (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id. Yes, old-new migration can happen. New-old of course cannot. I see. I am fine with the patch, but please drop defines that are not used in the patch itself. BTW migration question, will xstate_bv no be zeroed by migration code in old-new case? -- Gleb.
Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12
Il 09/09/2013 12:54, Gleb Natapov ha scritto: On Mon, Sep 09, 2013 at 11:53:45AM +0200, Paolo Bonzini wrote: Il 09/09/2013 11:03, Gleb Natapov ha scritto: On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote: Il 08/09/2013 13:40, Gleb Natapov ha scritto: On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote: On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv, and not restore anything. XRSTOR restores FP/SSE state to reset state if no bits are set in xstate_bv. This is what should happen on reset, no? Yes. The problem happens on the migration destination when XSAVE data is not transmitted. FP/SSE data is transmitted and must be restored, but xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset state. The vcpu then loses the values that were set in the migration data. Since FP and SSE data are always valid, set them in xstate_bv at reset time. In fact, that value is the same that KVM_GET_XSAVE returns on pre-XSAVE hosts. It is needed for migration between non xsave host to xsave host. Yes, and this patch does the same for migration between non-XSAVE QEMU and XSAVE QEMU. Can such migration happen? The commit that added xsave support (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id. Yes, old-new migration can happen. New-old of course cannot. I see. I am fine with the patch, but please drop defines that are not used in the patch itself. Ok. (For the BTW question, xstate_bv will not be zeroed, it will remain to the default value). In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores xstate_bv when XSAVE is not available. Instead, it should reset the FXSAVE data to processor-reset values (except for MXCSR which always comes from XRSTOR data), i.e. to all-zeros except for the x87 control and tag words. It should also check reserved bits of MXCSR. I do not see why. Because otherwise it behaves in a subtly different manner for XSAVE and non-XSAVE hosts. I do not see how. Can you elaborate? Suppose userspace calls KVM_SET_XSAVE with XSTATE_BV=0. On an XSAVE host, when the guest FPU state is loaded KVM will do an XRSTOR. The XRSTOR will restore the FPU state to default values. On a non-XSAVE host, when the guest FPU state is loaded KVM will do an FXRSTR. The FXRSTR will load the FPU state from the first 512 bytes of the block that was passed to KVM_SET_XSAVE. This is not a problem because userspace will usually pass to KVM_SET_XSAVE only something that it got from KVM_GET_XSAVE, and KVM_GET_XSAVE will never set XSTATE_BV=0. However, KVM_SET_XSAVE is supposed to emulate XSAVE/XRSTOR if it is not available, and it is failing to emulate this detail. Yes. QEMU unmarshals information from the XSAVE region and back, so it cannot support MPX or AVX-512 yet (even if KVM were). Separate bug, though. IMO this is the main issue here, not separate bug. If we gonna let guest use CPU state QEMU does not support we gonna have a bad time. We cannot force the guest not to use a feature; all we can do is hide Of course we can't, this is correct for other features too, but this is guest's problem. Ok, then we agree that QEMU doesn't have a problem? The XSAVE data will always be fresh as long as the guest obeys CPUID bits it receives, and the CPUID bits that QEMU passes will never enable XSAVE data that QEMU does not support. Paolo
Re: [Qemu-devel] Questions about hvm domU default devices with upstream qemu
On Thu, 5 Sep 2013, Fabio Fantoni wrote: Il 05/09/2013 16:04, Stefano Stabellini ha scritto: On Thu, 5 Sep 2013, Fabio Fantoni wrote: About qemu default empty and unusable floppy and cdrom I have already reply at start of year that qemu traditional does not have them. The tests were with xl only if I remember good, now I checked also on old production server with xend and qemu traditional. I confirm that hvm domUs is without empty floppy and cdrom also in that case. cd-insert works (tested with xl and upstream qemu). cd-insert does not works only with qemu default empty cdrom because is undefined on xen side. So HVM domUs do NOT have a cdrom drive by default with qemu-traditional. Exact. Do they have an empty cdrom drive by default with qemu-xen? Yes Is the lack of a cdrom drive the reason why cd-insert doesn't work by default (unless you specify an empty cdrom drive in the VM config file)? Yes. xl cd-insert command works with cdrom devices already present on domU defined on xl file configuration. The cdrom can be empty or not, the only exception is the empty cdrom added default by qemu that is not usable with cd-insert because not defined on xl file configuration. I see. That should be fixed, by either removing the empty cdrom drive or making it work properly with xl cd-insert. I don't have a strong opinion on this. Given that qemu-traditional doesn't have a cdrom drive by default, it might make sense to do the same on qemu-xen. I think the best choice is to remove the qemu default empty cdrom, if possible without another workaround as for floppy but considering to change the method hvm domUs starts qemu. I think that is probably better to add on qemu start command only components already defined xen side in order to avoid creation of unusable components added by qemu itself or mismatches on both sides. Another actual problem is that is difficult to know if the qemu binary that should be used on xl create has the features/components requested and if they are compiled. Actually on xl create we can have only qemu log file after domU create failing. Why not expose all features/components of a given qemu binary and let xl or other toolstack (not only xen) query them before start qemu itself? You are right, it would be nice to be able to know exactly what the qemu binary supports. However it's difficult to do because Linux distros are free to choose any QEMU version they like and any set of compile time options they want for qemu-xen. As a consequence we would actually need to start QEMU and issue QMP commands to figure out what is supported. It would slow down VM creation too much, so we would probably need to do this at host boot time and store the settings somewhere on disk or on xenstore.
[Qemu-devel] [PATCH RFC v2 0/3] pci: complete master abort protocol
Note: The series is incomplete, for review only PCI spec requires that a transaction that has not been claimed by any PCI bus devices will be terminated by the initiator with master abort. For read transactions -1() is returned and writes are silently dropped. The series deals also with the other aspect of the master abort scenario: Upon completion the master has to raise RECEIVED MASTER ABORT BIT in initiator's STATUS register. Implementation: - Allowed the MemoryRegion priority to be negative so a subregion will be visible on all the addresses not covered by the parent MemoryRegion or other subregions. - Added a memory region with negative priority that extends over all the pci address space. This region catches all the accesses to the unassigned pci addresses. - The MemoryRegion's ops emulates the master abort scenario. Note: For the moment the code assumes that all the reads/writes to pci address space are done by the cpu. Changes from v1: - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on various Host Bridges - pci-unassgined-mem does not have a .valid.accept field and implements read write methods Marcel Apfelbaum (3): memory: allow MemoryRegion's priority field to accept negative values hw/pci: add MemoryRegion ops for unassigned pci addresses hw/pci-host: catch acesses to unassigned pci addresses hw/pci-host/piix.c| 8 hw/pci-host/q35.c | 19 --- hw/pci/pci.c | 18 ++ include/exec/memory.h | 6 +++--- include/hw/pci-host/q35.h | 1 + include/hw/pci/pci.h | 3 +++ memory.c | 2 +- 7 files changed, 50 insertions(+), 7 deletions(-) -- 1.8.3.1
[Qemu-devel] [PATCH RFC v2 1/2] memory: allow MemoryRegion's priority field to accept negative values
Priority is used to make visible some subregions by obscuring the parent MemoryRegion addresses overlapping with the subregion. By allowing the priority to be negative the opposite can be done: Allow a subregion to be visible on all the addresses not covered by the parent MemoryRegion or other subregions. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- include/exec/memory.h | 6 +++--- memory.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index ebe0d24..6995087 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -153,7 +153,7 @@ struct MemoryRegion { bool flush_coalesced_mmio; MemoryRegion *alias; hwaddr alias_offset; -unsigned priority; +int priority; bool may_overlap; QTAILQ_HEAD(subregions, MemoryRegion) subregions; QTAILQ_ENTRY(MemoryRegion) subregions_link; @@ -193,7 +193,7 @@ struct MemoryListener { void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection *section, hwaddr addr, hwaddr len); /* Lower = earlier (during add), later (during del) */ -unsigned priority; +int priority; AddressSpace *address_space_filter; QTAILQ_ENTRY(MemoryListener) link; }; @@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr, void memory_region_add_subregion_overlap(MemoryRegion *mr, hwaddr offset, MemoryRegion *subregion, - unsigned priority); + int priority); /** * memory_region_get_ram_addr: Get the ram address associated with a memory diff --git a/memory.c b/memory.c index 5a10fd0..984a3dc 100644 --- a/memory.c +++ b/memory.c @@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr, void memory_region_add_subregion_overlap(MemoryRegion *mr, hwaddr offset, MemoryRegion *subregion, - unsigned priority) + int priority) { subregion-may_overlap = true; subregion-priority = priority; -- 1.8.3.1
[Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
Created a MemoryRegion with negative priority that spans over all the pci address space. It intercepts the accesses to unassigned pci address space and will follow the pci spec: 1. returns -1 on read 2. does nothing on write 3. sets the RECEIVED MASTER ABORT bit in the STATUS register of the device that initiated the transaction Note: This implementation assumes that all the reads/writes to the pci address space are done by the cpu. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- Changes from v1: - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on various Host Bridges - pci-unassgined-mem does not have a .valid.accept field and implements read write methods hw/pci/pci.c | 46 ++ include/hw/pci/pci_bus.h | 1 + 2 files changed, 47 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index d00682e..b6a8026 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev) return rootbus-qbus.name; } +static void unassigned_mem_access(PCIBus *bus) +{ +/* FIXME assumption: memory access to the pci address + * space is always initiated by the host bridge + * (device 0 on the bus) */ +PCIDevice *d = bus-devices[0]; +if (!d) { +return; +} + +pci_word_test_and_set_mask(d-config + PCI_STATUS, + PCI_STATUS_REC_MASTER_ABORT); +} + +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); + +return -1ULL; +} + +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val, +unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); +} + +static const MemoryRegionOps unassigned_mem_ops = { +.read = unassigned_mem_read, +.write = unassigned_mem_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +#define UNASSIGNED_MEM_PRIORITY -1 + static void pci_bus_init(PCIBus *bus, DeviceState *parent, const char *name, MemoryRegion *address_space_mem, @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent, bus-address_space_mem = address_space_mem; bus-address_space_io = address_space_io; + +memory_region_init_io(bus-unassigned_mem, OBJECT(bus), + unassigned_mem_ops, bus, pci-unassigned, + memory_region_size(bus-address_space_mem)); +memory_region_add_subregion_overlap(bus-address_space_mem, +bus-address_space_mem-addr, +bus-unassigned_mem, +UNASSIGNED_MEM_PRIORITY); + /* host bridge */ QLIST_INIT(bus-child); diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h index 9df1788..4cc25a3 100644 --- a/include/hw/pci/pci_bus.h +++ b/include/hw/pci/pci_bus.h @@ -23,6 +23,7 @@ struct PCIBus { PCIDevice *parent_dev; MemoryRegion *address_space_mem; MemoryRegion *address_space_io; +MemoryRegion unassigned_mem; QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */ QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */ -- 1.8.3.1
[Qemu-devel] [PATCH RFC v3 0/2] pci: complete master abort protocol
Note: The series is incomplete, for review only PCI spec requires that a transaction that has not been claimed by any PCI bus devices will be terminated by the initiator with master abort. For read transactions -1() is returned and writes are silently dropped. The series deals also with the other aspect of the master abort scenario: Upon completion the master has to raise RECEIVED MASTER ABORT BIT in initiator's STATUS register. Implementation: - Allowed the MemoryRegion priority to be negative so a subregion will be visible on all the addresses not covered by the parent MemoryRegion or other subregions. - Added a memory region with negative priority that extends over all the pci address space. This region catches all the accesses to the unassigned pci addresses. - The MemoryRegion's ops emulates the master abort scenario. Note: For the moment the code assumes that all the reads/writes to pci address space are done by the cpu. Changes from v2: - minor: changed nr of patches int the title - minor: modified series list Changes from v1: - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on various Host Bridges - pci-unassgined-mem does not have a .valid.accept field and implements read write methods Marcel Apfelbaum (2): memory: allow MemoryRegion's priority field to accept negative values hw/pci: handle unassigned pci addresses hw/pci/pci.c | 46 ++ include/exec/memory.h| 6 +++--- include/hw/pci/pci_bus.h | 1 + memory.c | 2 +- 4 files changed, 51 insertions(+), 4 deletions(-) -- 1.8.3.1
[Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values
Priority is used to make visible some subregions by obscuring the parent MemoryRegion addresses overlapping with the subregion. By allowing the priority to be negative the opposite can be done: Allow a subregion to be visible on all the addresses not covered by the parent MemoryRegion or other subregions. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- include/exec/memory.h | 6 +++--- memory.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index ebe0d24..6995087 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -153,7 +153,7 @@ struct MemoryRegion { bool flush_coalesced_mmio; MemoryRegion *alias; hwaddr alias_offset; -unsigned priority; +int priority; bool may_overlap; QTAILQ_HEAD(subregions, MemoryRegion) subregions; QTAILQ_ENTRY(MemoryRegion) subregions_link; @@ -193,7 +193,7 @@ struct MemoryListener { void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection *section, hwaddr addr, hwaddr len); /* Lower = earlier (during add), later (during del) */ -unsigned priority; +int priority; AddressSpace *address_space_filter; QTAILQ_ENTRY(MemoryListener) link; }; @@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr, void memory_region_add_subregion_overlap(MemoryRegion *mr, hwaddr offset, MemoryRegion *subregion, - unsigned priority); + int priority); /** * memory_region_get_ram_addr: Get the ram address associated with a memory diff --git a/memory.c b/memory.c index 5a10fd0..984a3dc 100644 --- a/memory.c +++ b/memory.c @@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr, void memory_region_add_subregion_overlap(MemoryRegion *mr, hwaddr offset, MemoryRegion *subregion, - unsigned priority) + int priority) { subregion-may_overlap = true; subregion-priority = priority; -- 1.8.3.1
[Qemu-devel] [PATCH RFC v3 2/2] hw/pci: handle unassigned pci addresses
Created a MemoryRegion with negative priority that spans over all the pci address space. It intercepts the accesses to unassigned pci address space and will follow the pci spec: 1. returns -1 on read 2. does nothing on write 3. sets the RECEIVED MASTER ABORT bit in the STATUS register of the device that initiated the transaction Note: This implementation assumes that all the reads/writes to the pci address space are done by the cpu. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- Changes from v1: - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on various Host Bridges - pci-unassgined-mem does not have a .valid.accept field and implements read write methods hw/pci/pci.c | 46 ++ include/hw/pci/pci_bus.h | 1 + 2 files changed, 47 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index d00682e..b6a8026 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev) return rootbus-qbus.name; } +static void unassigned_mem_access(PCIBus *bus) +{ +/* FIXME assumption: memory access to the pci address + * space is always initiated by the host bridge + * (device 0 on the bus) */ +PCIDevice *d = bus-devices[0]; +if (!d) { +return; +} + +pci_word_test_and_set_mask(d-config + PCI_STATUS, + PCI_STATUS_REC_MASTER_ABORT); +} + +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); + +return -1ULL; +} + +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val, +unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); +} + +static const MemoryRegionOps unassigned_mem_ops = { +.read = unassigned_mem_read, +.write = unassigned_mem_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +#define UNASSIGNED_MEM_PRIORITY -1 + static void pci_bus_init(PCIBus *bus, DeviceState *parent, const char *name, MemoryRegion *address_space_mem, @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent, bus-address_space_mem = address_space_mem; bus-address_space_io = address_space_io; + +memory_region_init_io(bus-unassigned_mem, OBJECT(bus), + unassigned_mem_ops, bus, pci-unassigned, + memory_region_size(bus-address_space_mem)); +memory_region_add_subregion_overlap(bus-address_space_mem, +bus-address_space_mem-addr, +bus-unassigned_mem, +UNASSIGNED_MEM_PRIORITY); + /* host bridge */ QLIST_INIT(bus-child); diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h index 9df1788..4cc25a3 100644 --- a/include/hw/pci/pci_bus.h +++ b/include/hw/pci/pci_bus.h @@ -23,6 +23,7 @@ struct PCIBus { PCIDevice *parent_dev; MemoryRegion *address_space_mem; MemoryRegion *address_space_io; +MemoryRegion unassigned_mem; QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */ QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */ -- 1.8.3.1
Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12
On Mon, Sep 09, 2013 at 01:07:37PM +0200, Paolo Bonzini wrote: In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores xstate_bv when XSAVE is not available. Instead, it should reset the FXSAVE data to processor-reset values (except for MXCSR which always comes from XRSTOR data), i.e. to all-zeros except for the x87 control and tag words. It should also check reserved bits of MXCSR. I do not see why. Because otherwise it behaves in a subtly different manner for XSAVE and non-XSAVE hosts. I do not see how. Can you elaborate? Suppose userspace calls KVM_SET_XSAVE with XSTATE_BV=0. On an XSAVE host, when the guest FPU state is loaded KVM will do an XRSTOR. The XRSTOR will restore the FPU state to default values. On a non-XSAVE host, when the guest FPU state is loaded KVM will do an FXRSTR. The FXRSTR will load the FPU state from the first 512 bytes of the block that was passed to KVM_SET_XSAVE. This is not a problem because userspace will usually pass to KVM_SET_XSAVE only something that it got from KVM_GET_XSAVE, and KVM_GET_XSAVE will never set XSTATE_BV=0. However, KVM_SET_XSAVE is supposed to emulate XSAVE/XRSTOR if it is not available, and it is failing to emulate this detail. You are trying to be bug to bug compatible :) XSTATE_BV can be zero only if FPU state is reset one, otherwise the guest will not survive. KVM_SET_XSAVE is not suppose to emulate XSAVE/XRSTOR, it is not emulator function. It is better to outlaw zero value for XSTATE_BV at all, but we cannot do it because current QEMU uses it. Yes. QEMU unmarshals information from the XSAVE region and back, so it cannot support MPX or AVX-512 yet (even if KVM were). Separate bug, though. IMO this is the main issue here, not separate bug. If we gonna let guest use CPU state QEMU does not support we gonna have a bad time. We cannot force the guest not to use a feature; all we can do is hide Of course we can't, this is correct for other features too, but this is guest's problem. Ok, then we agree that QEMU doesn't have a problem? The XSAVE data will Which problem exactly. The problems I see is that 1. We do not support MPX and AVX-512 (but this is probably not the problem you meant :)) 2. 0D data is not consistent with features. Guest may not expect it and do stupid things. always be fresh as long as the guest obeys CPUID bits it receives, and the CPUID bits that QEMU passes will never enable XSAVE data that QEMU does not support. -- Gleb.
Re: [Qemu-devel] [PATCH RFC v2 1/2] memory: allow MemoryRegion's priority field to accept negative values
On 9 September 2013 12:11, Marcel Apfelbaum marce...@redhat.com wrote: Priority is used to make visible some subregions by obscuring the parent MemoryRegion addresses overlapping with the subregion. By allowing the priority to be negative the opposite can be done: Allow a subregion to be visible on all the addresses not covered by the parent MemoryRegion or other subregions. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- include/exec/memory.h | 6 +++--- memory.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index ebe0d24..6995087 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -153,7 +153,7 @@ struct MemoryRegion { bool flush_coalesced_mmio; MemoryRegion *alias; hwaddr alias_offset; -unsigned priority; +int priority; bool may_overlap; QTAILQ_HEAD(subregions, MemoryRegion) subregions; QTAILQ_ENTRY(MemoryRegion) subregions_link; @@ -193,7 +193,7 @@ struct MemoryListener { void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection *section, hwaddr addr, hwaddr len); /* Lower = earlier (during add), later (during del) */ -unsigned priority; +int priority; You haven't addressed any of the points I made reviewing the first version of this. Please don't just ignore code review, or people will stop reviewing your code. I think it would also be nice to update docs/memory.txt to say explicitly that priority is signed and why this is useful, something like this: begin Priority values are signed, and the default value is zero. This means that you can use memory_region_add_subregion_overlap() both to specify a region that must sit 'above' any others (with a positive priority) and also a background region that sits 'below' others (with a negative priority). endit (in the 'Overlapping regions and priority' section of that document). thanks -- PMM
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote: Created a MemoryRegion with negative priority that spans over all the pci address space. It intercepts the accesses to unassigned pci address space and will follow the pci spec: 1. returns -1 on read 2. does nothing on write 3. sets the RECEIVED MASTER ABORT bit in the STATUS register of the device that initiated the transaction Note: This implementation assumes that all the reads/writes to the pci address space are done by the cpu. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- Changes from v1: - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on various Host Bridges - pci-unassgined-mem does not have a .valid.accept field and implements read write methods hw/pci/pci.c | 46 ++ include/hw/pci/pci_bus.h | 1 + 2 files changed, 47 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index d00682e..b6a8026 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev) return rootbus-qbus.name; } +static void unassigned_mem_access(PCIBus *bus) +{ +/* FIXME assumption: memory access to the pci address + * space is always initiated by the host bridge /* Always * like * this */ /* Not * like this */ + * (device 0 on the bus) */ Can't we pass the master device in? We are still left with the assumption that there's a single master, but at least we get rid of the assumption that it's always device 0 function 0. +PCIDevice *d = bus-devices[0]; +if (!d) { +return; +} + +pci_word_test_and_set_mask(d-config + PCI_STATUS, + PCI_STATUS_REC_MASTER_ABORT); Probably should check and set secondary status for a bridge device. +} + +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); + +return -1ULL; +} + +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val, +unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); +} + +static const MemoryRegionOps unassigned_mem_ops = { +.read = unassigned_mem_read, +.write = unassigned_mem_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +#define UNASSIGNED_MEM_PRIORITY -1 + static void pci_bus_init(PCIBus *bus, DeviceState *parent, const char *name, MemoryRegion *address_space_mem, I would rename unassigned to pci_master_Abort_ everywhere. Call things by what they do not how they are used. @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent, bus-address_space_mem = address_space_mem; bus-address_space_io = address_space_io; + +memory_region_init_io(bus-unassigned_mem, OBJECT(bus), + unassigned_mem_ops, bus, pci-unassigned, + memory_region_size(bus-address_space_mem)); +memory_region_add_subregion_overlap(bus-address_space_mem, +bus-address_space_mem-addr, +bus-unassigned_mem, +UNASSIGNED_MEM_PRIORITY); + /* host bridge */ QLIST_INIT(bus-child); It seems safer to add an API for enabling this functionality. Something like pci_set_master_for_master_abort(PCIBus *, PCIDevice *); diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h index 9df1788..4cc25a3 100644 --- a/include/hw/pci/pci_bus.h +++ b/include/hw/pci/pci_bus.h @@ -23,6 +23,7 @@ struct PCIBus { PCIDevice *parent_dev; MemoryRegion *address_space_mem; MemoryRegion *address_space_io; +MemoryRegion unassigned_mem; QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */ QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */ -- 1.8.3.1
Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values
On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote: Priority is used to make visible some subregions by obscuring the parent MemoryRegion addresses overlapping with the subregion. By allowing the priority to be negative the opposite can be done: Allow a subregion to be visible on all the addresses not covered by the parent MemoryRegion or other subregions. Signed-off-by: Marcel Apfelbaum marce...@redhat.com Seems harmless enough. Reviewed-by: Michael S. Tsirkin m...@redhat.com --- include/exec/memory.h | 6 +++--- memory.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index ebe0d24..6995087 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -153,7 +153,7 @@ struct MemoryRegion { bool flush_coalesced_mmio; MemoryRegion *alias; hwaddr alias_offset; -unsigned priority; +int priority; bool may_overlap; QTAILQ_HEAD(subregions, MemoryRegion) subregions; QTAILQ_ENTRY(MemoryRegion) subregions_link; @@ -193,7 +193,7 @@ struct MemoryListener { void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection *section, hwaddr addr, hwaddr len); /* Lower = earlier (during add), later (during del) */ -unsigned priority; +int priority; AddressSpace *address_space_filter; QTAILQ_ENTRY(MemoryListener) link; }; @@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr, void memory_region_add_subregion_overlap(MemoryRegion *mr, hwaddr offset, MemoryRegion *subregion, - unsigned priority); + int priority); /** * memory_region_get_ram_addr: Get the ram address associated with a memory diff --git a/memory.c b/memory.c index 5a10fd0..984a3dc 100644 --- a/memory.c +++ b/memory.c @@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr, void memory_region_add_subregion_overlap(MemoryRegion *mr, hwaddr offset, MemoryRegion *subregion, - unsigned priority) + int priority) { subregion-may_overlap = true; subregion-priority = priority; -- 1.8.3.1
Re: [Qemu-devel] virtio with Windows 8.
On Mon, Sep 09, 2013 at 12:02:57AM -0500, Yaodong Yang wrote: 1. I create a raw image named as win8.img, using the following command: /usr/local/kvm/bin/qemu-img create -f raw win8.img 20G 2. I try to install win8 with the following command, but I failed several times. sudo /usr/local/kvm/bin/qemu-system-x86_64 -enable-kvm -drive file=./win8.img,if=virtio,cache=none -cdrom ./win8.iso -boot d -m 2048. 3. I try the same command but with if=ide, it works. Could someone tell me the reason for this failure? The Windows installer will not detect virtio-blk disks since Windows does not come with virtio drivers by default. You must provide the virtio-win drivers ISO during the installation: https://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/ Watch for the screen where Windows prompts you for a driver disk and put in the virtio-win ISO. Stefan
Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values
On 9 September 2013 12:41, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote: Priority is used to make visible some subregions by obscuring the parent MemoryRegion addresses overlapping with the subregion. By allowing the priority to be negative the opposite can be done: Allow a subregion to be visible on all the addresses not covered by the parent MemoryRegion or other subregions. Signed-off-by: Marcel Apfelbaum marce...@redhat.com Seems harmless enough. Reviewed-by: Michael S. Tsirkin m...@redhat.com No, the idea is good but this version is just broken. See the comments I made on the previous version which Marcel ignored :-( -- PMM
Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values
On Mon, Sep 09, 2013 at 12:45:12PM +0100, Peter Maydell wrote: On 9 September 2013 12:41, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote: Priority is used to make visible some subregions by obscuring the parent MemoryRegion addresses overlapping with the subregion. By allowing the priority to be negative the opposite can be done: Allow a subregion to be visible on all the addresses not covered by the parent MemoryRegion or other subregions. Signed-off-by: Marcel Apfelbaum marce...@redhat.com Seems harmless enough. Reviewed-by: Michael S. Tsirkin m...@redhat.com No, the idea is good but this version is just broken. See the comments I made on the previous version which Marcel ignored :-( -- PMM You are right, I missed the bugs. Good catch, thanks.
Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12
Il 09/09/2013 13:28, Gleb Natapov ha scritto: On an XSAVE host, when the guest FPU state is loaded KVM will do an XRSTOR. The XRSTOR will restore the FPU state to default values. On a non-XSAVE host, when the guest FPU state is loaded KVM will do an FXRSTR. The FXRSTR will load the FPU state from the first 512 bytes of the block that was passed to KVM_SET_XSAVE. This is not a problem because userspace will usually pass to KVM_SET_XSAVE only something that it got from KVM_GET_XSAVE, and KVM_GET_XSAVE will never set XSTATE_BV=0. However, KVM_SET_XSAVE is supposed to emulate XSAVE/XRSTOR if it is not available, and it is failing to emulate this detail. You are trying to be bug to bug compatible :) XSTATE_BV can be zero only if FPU state is reset one, otherwise the guest will not survive. Yes. KVM_SET_XSAVE is not suppose to emulate XSAVE/XRSTOR, it is not emulator function. It is better to outlaw zero value for XSTATE_BV at all, but we cannot do it because current QEMU uses it. I agree it'd be better to forbid it. If the mismatch in semantics does not bother you, I won't fix it. It slightly bothers me. :) Yes. QEMU unmarshals information from the XSAVE region and back, so it cannot support MPX or AVX-512 yet (even if KVM were). Separate bug, though. IMO this is the main issue here, not separate bug. If we gonna let guest use CPU state QEMU does not support we gonna have a bad time. We cannot force the guest not to use a feature; all we can do is hide Of course we can't, this is correct for other features too, but this is guest's problem. Ok, then we agree that QEMU doesn't have a problem? The XSAVE data will Which problem exactly. The problems I see is that 1. We do not support MPX and AVX-512 (but this is probably not the problem you meant :)) 2. 0D data is not consistent with features. Guest may not expect it and do stupid things. It is not a problem to unmarshal information out of KVM_GET_XSAVE data (and back). If the guest does stupid things, it's a bug in an ill-behaving guest. On the other hand, I agree that passthrough of host 0xD data is bad and will fix it. Paolo always be fresh as long as the guest obeys CPUID bits it receives, and the CPUID bits that QEMU passes will never enable XSAVE data that QEMU does not support. -- Gleb.
Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values
On Mon, Sep 09, 2013 at 02:48:55PM +0300, Michael S. Tsirkin wrote: On Mon, Sep 09, 2013 at 12:45:12PM +0100, Peter Maydell wrote: On 9 September 2013 12:41, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote: Priority is used to make visible some subregions by obscuring the parent MemoryRegion addresses overlapping with the subregion. By allowing the priority to be negative the opposite can be done: Allow a subregion to be visible on all the addresses not covered by the parent MemoryRegion or other subregions. Signed-off-by: Marcel Apfelbaum marce...@redhat.com Seems harmless enough. Reviewed-by: Michael S. Tsirkin m...@redhat.com No, the idea is good but this version is just broken. See the comments I made on the previous version which Marcel ignored :-( -- PMM You are right, I missed the bugs. Good catch, thanks. So I guess it's only an Acked-by: Michael S. Tsirkin m...@redhat.com at this point :)
Re: [Qemu-devel] [PATCH 0/9] Remove legacy unaligned bswap functions
On Sun, Aug 25, 2013 at 03:59:28PM +0100, Peter Maydell wrote: The bswap.h header includes a set of legacy unaligned functions that (since commit c732a52d3 at the beginning of this year) are just wrappers for underlying {ld,st}type functions. The legacy functions aren't used in many places, so just replace all their uses with uses of the new-style {ld,st} functions; this lets us remove the legacy wrappers altogether. I have a question on this: what happens if an unaligned address is supplied? In particular, if the address is supplied by the guest? Esp the pci wrappers have many callers - were they all audited? I tried checking but there are many callers, will take a while. Since we know the {ld,st}* routines are definitely functions, we can in the process remove some casts which were left over from when the legacy unaligned functions were previously macros. The patchset is divided up by function being removed, rather than by which device/subsystem is being fixed; I think this way round is easier to review since you only have to keep one substitution in your head when reading a patch. Peter Maydell (9): bswap.h: Remove cpu_to_le16wu() bswap.h: Remove cpu_to_le32wu() bswap.h: Remove le16_to_cpupu() bswap.h: Remove le32_to_cpupu() bswap.h: Remove be32_to_cpupu() bswap.h: Remove cpu_to_be16wu() bswap.h: Remove cpu_to_be32wu() bswap.h: Remove cpu_to_be64wu() bswap.h: Remove cpu_to_32wu() block/qcow2-cluster.c |2 +- hw/acpi/core.c|3 +-- hw/block/cdrom.c | 10 +- hw/display/vga_template.h | 14 -- hw/ide/atapi.c| 16 +++ hw/net/e1000.c| 22 + hw/net/ne2000.c |4 ++-- hw/pci/pcie_aer.c |4 ++-- include/hw/pci/pci.h |8 include/qemu/bswap.h | 47 - 10 files changed, 40 insertions(+), 90 deletions(-) -- 1.7.9.5
Re: [Qemu-devel] [PATCH V4 0/3] qemu-iotests: add test for fd passing via SCM rights
On Fri, Sep 06, 2013 at 11:24:31AM +0800, Wenchao Xia wrote: This series add test case for fd passing with unix socket at runtime. Since getfd and closefd interface will interact with monitor's data, so it will help to do regression test for monitor patches. Since python2 do not support sendmsg(), so a C helper program is added to do the job. v2: 1: add missing $ in the makefile rule. v3: Address Eric's comments: 1: typo fix, remove . in the end of error message, strick check argc as !=, use EXIT_SUCCESS and EXIT_FAILURE as exit values, strict error check for strtol() call. Address Luiz's comments: 1: change the helper program parameter as bin socket-fd file-path , the program open the file itself now, data parameter is removed and blank is always used as iov data, better usage tip message, folder the string parsing code into a function. 2: related change for helper program parameter change. 3: related change for helper program parameter change. Other: 1: remove LINK rule in makefile, remove fd checking code inside send_fd() since it is already checked before calling, add '' around %s for path and number string in error message. 2: renamed fd_bin to bin in send_fd_scm() to tip better, add '' around %s for path in error message. v4: Address Stefan's comments: 2: add space after # for comments, refined the comment's grammar. 3: add space after # for comments, refined the comment's grammar, add two test cases for error path. Wenchao Xia (3): 1 qemu-iotests: add unix socket help program 2 qemu-iotests: add infrastructure of fd passing via SCM 3 qemu-iotests: add tests for runtime fd passing via SCM rights QMP/qmp.py |6 ++ configure |2 +- tests/Makefile |3 +- tests/qemu-iotests/045 | 51 - tests/qemu-iotests/045.out |4 +- tests/qemu-iotests/check |1 + tests/qemu-iotests/iotests.py | 23 ++ tests/qemu-iotests/socket_scm_helper.c | 135 8 files changed, 220 insertions(+), 5 deletions(-) create mode 100644 tests/qemu-iotests/socket_scm_helper.c Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust
Il 09/09/2013 12:41, Gleb Natapov ha scritto: In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here, and we should hide all features that are not visible in CPUID. It is okay, however, to test it in cpu_post_load. The kernel should not even return state that is not visible in CPUID. That's an interesting point of view that I hadn't considered. But just like you asked me why it should return state that is not visible in CPUID, I'm asking you why it should not... For number of reasons. First because since a sate is not used there is no point in migrating it. Second to make interface more deterministic for QEMU. i.e QEMU configures only features it supports and gets exactly same state from the kernel no matter what host cpu is and what kernel version is. This patch will not be needed since kernel will do the job. Good reasons, thanks. Let's do it in the kernel then and avoid this patch altogether. Paolo
Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12
On Mon, Sep 09, 2013 at 01:46:49PM +0200, Paolo Bonzini wrote: Yes. QEMU unmarshals information from the XSAVE region and back, so it cannot support MPX or AVX-512 yet (even if KVM were). Separate bug, though. IMO this is the main issue here, not separate bug. If we gonna let guest use CPU state QEMU does not support we gonna have a bad time. We cannot force the guest not to use a feature; all we can do is hide Of course we can't, this is correct for other features too, but this is guest's problem. Ok, then we agree that QEMU doesn't have a problem? The XSAVE data will Which problem exactly. The problems I see is that 1. We do not support MPX and AVX-512 (but this is probably not the problem you meant :)) 2. 0D data is not consistent with features. Guest may not expect it and do stupid things. It is not a problem to unmarshal information out of KVM_GET_XSAVE data (and back). If the guest does stupid things, it's a bug in an ill-behaving guest. You know I am first in line to blame guest for everything :) (who needs guests anyway) but in this case I didn't mean that guest does something illegal. If we advertise support for some XSAVE state in 0D leaf guest is in his right to make conclusions we may not expect from that. It may check corespondent feature bit and crash if it is not present for instance. On the other hand, I agree that passthrough of host 0xD data is bad and will fix it. Thanks! -- Gleb.
Re: [Qemu-devel] [PATCH v2 1/5] util: add socket_set_fast_reuse function which will replace setting SO_REUSEADDR
On Wed, Sep 04, 2013 at 07:08:51PM +0200, Sebastian Ottlik wrote: diff --git a/util/oslib-posix.c b/util/oslib-posix.c index 3dc8b1b..f071793 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -159,6 +159,20 @@ void qemu_set_nonblock(int fd) fcntl(fd, F_SETFL, f | O_NONBLOCK); } +int socket_set_fast_reuse(int fd) +{ +int val=1, ret; + +ret = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, + (const char *)val, sizeof(val)); + +if (ret 0) { + perror(setsockopt(SOL_SOCKET, SO_REUSEADDR)); +} + +return ret; +} Please run scripts/checkpatch.pl before submitting patches to check coding style (whitespace, indentation, etc).
Re: [Qemu-devel] [PATCH v5 0/6] block/qcow2: Image file option amendment
Am 03.09.2013 um 10:09 hat Max Reitz geschrieben: This series adds support to qemu-img, block and qcow2 for amending image options on existing image files. Depends on: - option: Add assigned flag to QEMUOptionParameter - qcow2-refcount: Snapshot update for zero clusters (series, v3) - Add metadata overlap checks (series, v5) Thanks, applied to the block branch with the following changes: - Updated patch 5 with [PATCH] block/qcow2: Use bdrv_truncate for size amend - Resolved merge conflict in patch 1 (bdrv_delete - bdrv_unref) Kevin
Re: [Qemu-devel] [PATCH RFC v2 1/2] memory: allow MemoryRegion's priority field to accept negative values
On Mon, 2013-09-09 at 12:28 +0100, Peter Maydell wrote: On 9 September 2013 12:11, Marcel Apfelbaum marce...@redhat.com wrote: Priority is used to make visible some subregions by obscuring the parent MemoryRegion addresses overlapping with the subregion. By allowing the priority to be negative the opposite can be done: Allow a subregion to be visible on all the addresses not covered by the parent MemoryRegion or other subregions. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- include/exec/memory.h | 6 +++--- memory.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index ebe0d24..6995087 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -153,7 +153,7 @@ struct MemoryRegion { bool flush_coalesced_mmio; MemoryRegion *alias; hwaddr alias_offset; -unsigned priority; +int priority; bool may_overlap; QTAILQ_HEAD(subregions, MemoryRegion) subregions; QTAILQ_ENTRY(MemoryRegion) subregions_link; @@ -193,7 +193,7 @@ struct MemoryListener { void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection *section, hwaddr addr, hwaddr len); /* Lower = earlier (during add), later (during del) */ -unsigned priority; +int priority; You haven't addressed any of the points I made reviewing the first version of this. Please don't just ignore code review, or people will stop reviewing your code. Hi Peter, I really value your comments and I did acted upon them. Basically all the changes of this version are based on your comments, thanks! I did answer to your comment and I was going to remove it, but I missed it again :(. Sorry for that. I will remove it of course in the following version. I think it would also be nice to update docs/memory.txt to say explicitly that priority is signed and why this is useful, something like this: Of course I will, thanks! begin Priority values are signed, and the default value is zero. This means that you can use memory_region_add_subregion_overlap() both to specify a region that must sit 'above' any others (with a positive priority) and also a background region that sits 'below' others (with a negative priority). endit (in the 'Overlapping regions and priority' section of that document). thanks -- PMM
Re: [Qemu-devel] [PATCH v2 0/5] Do not set SO_REUSEADDR on Windows
On Thu, Sep 05, 2013 at 03:48:16PM +0200, Sebastian Ottlik wrote: On 04.09.2013 19:08, Sebastian Ottlik wrote: This patchset disabels all use of SO_REUSEADDR on Windows. On Windows systems the default behaviour is equivalent to SO_REUSEADDR on other operating systems. SO_REUSEADDR can still be set but results in undesired behaviour instead. It may even lead to situations were system behaviour is unspecified. More information on this can be found at: http://msdn.microsoft.com/en-us/library/windows/desktop/ms740621.aspx I originally encountered this issue when accidentally launching two QEMU instances with identical GDB ports at the same time. In which case QEMU won't fail as one might expect. v2 Changes: - Introduce a function with os specific implementation instead of using #ifdef I named it socket_set_fast_reuse instead of the suggested qemu_set_reuseaddr so the name better reflects what the function actually does. gdbstub.c |6 ++ include/qemu/sockets.h |1 + net/socket.c | 19 +++ slirp/misc.c |3 +-- slirp/socket.c |4 +--- slirp/tcp_subr.c |6 ++ slirp/udp.c|4 ++-- util/oslib-posix.c | 14 ++ util/oslib-win32.c | 10 ++ util/qemu-sockets.c|6 +++--- 10 files changed, 43 insertions(+), 30 deletions(-) util: add socket_set_fast_reuse function gdbstub: call socket_set_fast_reuse instead of setting SO_REUSEADDR net: call socket_set_fast_reuse instead of setting SO_REUSEADDR slirp: call socket_set_fast_reuse instead of setting SO_REUSEADDR util: call socket_set_fast_reuse instead of setting SO_REUSEADDR Pinging this patch, as I think it is still an appropriate approach to the issue: I did some research and apparently there is a valid use case for SO_REUSEADDR on windows when multiple clients need to listen to the same port for the same multicast group. IMHO making qemu_setsockopt ignore SO_REUSEADDR on windows might be confusing for some use cases. Actually net_socket_mcast_create in net/socket.c should probably set SO_REUSEADDR on windows. This is also an issue with patch 3 I supplied that I will address in a new version of this patch set if there is an agreement on a general approach. Sounds like a good idea. The patch series overall looks good. Stefan
Re: [Qemu-devel] [PATCH v2] kvm: fix traces to use %x instead of %d
On Wed, Sep 04, 2013 at 08:26:25PM +1000, Alexey Kardashevskiy wrote: KVM request types are normally defined using hex constants but QEMU traces print decimal values instead, which is not very convenient. This changes the request type format from %d to %x. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru Reviewed-by: Andreas Färber afaer...@suse.de Reviewed-by: Paolo Bonzini pbonz...@redhat.com --- Changes: v2: * added 0x to format strings so the change won't come unnoticed * fixed the commit message --- trace-events | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) Thanks, applied to my tracing tree: https://github.com/stefanha/qemu/commits/tracing Stefan
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote: On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote: Created a MemoryRegion with negative priority that spans over all the pci address space. It intercepts the accesses to unassigned pci address space and will follow the pci spec: 1. returns -1 on read 2. does nothing on write 3. sets the RECEIVED MASTER ABORT bit in the STATUS register of the device that initiated the transaction Note: This implementation assumes that all the reads/writes to the pci address space are done by the cpu. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- Changes from v1: - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on various Host Bridges - pci-unassgined-mem does not have a .valid.accept field and implements read write methods hw/pci/pci.c | 46 ++ include/hw/pci/pci_bus.h | 1 + 2 files changed, 47 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index d00682e..b6a8026 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev) return rootbus-qbus.name; } +static void unassigned_mem_access(PCIBus *bus) +{ +/* FIXME assumption: memory access to the pci address + * space is always initiated by the host bridge /* Always * like * this */ /* Not * like this */ OK + * (device 0 on the bus) */ Can't we pass the master device in? We are still left with the assumption that there's a single master, but at least we get rid of the assumption that it's always device 0 function 0. +PCIDevice *d = bus-devices[0]; +if (!d) { +return; +} + +pci_word_test_and_set_mask(d-config + PCI_STATUS, + PCI_STATUS_REC_MASTER_ABORT); Probably should check and set secondary status for a bridge device. I wanted to add the support for bridges later, I am struggling with some issues with PCI bridges. +} + +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); + +return -1ULL; +} + +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val, +unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); +} + +static const MemoryRegionOps unassigned_mem_ops = { +.read = unassigned_mem_read, +.write = unassigned_mem_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +#define UNASSIGNED_MEM_PRIORITY -1 + static void pci_bus_init(PCIBus *bus, DeviceState *parent, const char *name, MemoryRegion *address_space_mem, I would rename unassigned to pci_master_Abort_ everywhere. Are you sure about the capital A? For the memory ops I suppose is OK, but the MemoryRegion name I think it should remain unassigned_mem; it will be strange to name it pci_master_Abort_mem. Call things by what they do not how they are used. @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent, bus-address_space_mem = address_space_mem; bus-address_space_io = address_space_io; + +memory_region_init_io(bus-unassigned_mem, OBJECT(bus), + unassigned_mem_ops, bus, pci-unassigned, + memory_region_size(bus-address_space_mem)); +memory_region_add_subregion_overlap(bus-address_space_mem, +bus-address_space_mem-addr, +bus-unassigned_mem, +UNASSIGNED_MEM_PRIORITY); + /* host bridge */ QLIST_INIT(bus-child); It seems safer to add an API for enabling this functionality. Something like pci_set_master_for_master_abort(PCIBus *, PCIDevice *); I started to implement that way, but it would be strange (I think) to see in the code the above method because almost all the pci devices can be master devices. I selected an implicit implementation so for this reason exactly. We do not really select a master, but a master for writes/reads on pci address space. Selecting device 0 on the bus automatically for this makes more sense to me. What do you think? Marcel diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h index 9df1788..4cc25a3 100644 --- a/include/hw/pci/pci_bus.h +++ b/include/hw/pci/pci_bus.h @@ -23,6 +23,7 @@ struct PCIBus { PCIDevice *parent_dev; MemoryRegion *address_space_mem; MemoryRegion *address_space_io; +MemoryRegion unassigned_mem; QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */ QLIST_ENTRY(PCIBus) sibling;/* this will be replaced
Re: [Qemu-devel] [PATCH v2 0/5] Do not set SO_REUSEADDR on Windows
On 09.09.2013 14:05, Stefan Hajnoczi wrote: On Thu, Sep 05, 2013 at 03:48:16PM +0200, Sebastian Ottlik wrote: On 04.09.2013 19:08, Sebastian Ottlik wrote: This patchset disabels all use of SO_REUSEADDR on Windows. On Windows systems the default behaviour is equivalent to SO_REUSEADDR on other operating systems. SO_REUSEADDR can still be set but results in undesired behaviour instead. It may even lead to situations were system behaviour is unspecified. More information on this can be found at: http://msdn.microsoft.com/en-us/library/windows/desktop/ms740621.aspx I originally encountered this issue when accidentally launching two QEMU instances with identical GDB ports at the same time. In which case QEMU won't fail as one might expect. v2 Changes: - Introduce a function with os specific implementation instead of using #ifdef I named it socket_set_fast_reuse instead of the suggested qemu_set_reuseaddr so the name better reflects what the function actually does. gdbstub.c |6 ++ include/qemu/sockets.h |1 + net/socket.c | 19 +++ slirp/misc.c |3 +-- slirp/socket.c |4 +--- slirp/tcp_subr.c |6 ++ slirp/udp.c|4 ++-- util/oslib-posix.c | 14 ++ util/oslib-win32.c | 10 ++ util/qemu-sockets.c|6 +++--- 10 files changed, 43 insertions(+), 30 deletions(-) util: add socket_set_fast_reuse function gdbstub: call socket_set_fast_reuse instead of setting SO_REUSEADDR net: call socket_set_fast_reuse instead of setting SO_REUSEADDR slirp: call socket_set_fast_reuse instead of setting SO_REUSEADDR util: call socket_set_fast_reuse instead of setting SO_REUSEADDR Pinging this patch, as I think it is still an appropriate approach to the issue: I did some research and apparently there is a valid use case for SO_REUSEADDR on windows when multiple clients need to listen to the same port for the same multicast group. IMHO making qemu_setsockopt ignore SO_REUSEADDR on windows might be confusing for some use cases. Actually net_socket_mcast_create in net/socket.c should probably set SO_REUSEADDR on windows. This is also an issue with patch 3 I supplied that I will address in a new version of this patch set if there is an agreement on a general approach. Sounds like a good idea. The patch series overall looks good. Stefan Thanks for the feedback. I will resubmit the patch series including the change for net_socket_mcast_create and fixes for the style issues you pointed out soon. When I submitted this new version of the patch set I think I was a little early as there was still some discussion in the thread of the original version. In general, what is a good period to wait before submitting a new version? Best Regards, Sebastian
Re: [Qemu-devel] [PATCH RFC 1/3] memory: allow MemoryRegion's priority field to accept negative values
On Mon, 2013-09-02 at 15:38 +0100, Peter Maydell wrote: On 2 September 2013 15:13, Marcel Apfelbaum marce...@redhat.com wrote: Priority is used to make visible some subregions by obscuring the parent MemoryRegion addresses overlapping with the subregion. By allowing the priority to be negative the opposite can be done: Allow a subregion to be visible on all the addresses not covered by the parent MemoryRegion or other subregions. This comment is not exactly accurate. Allowing priority to be signed is just a convenience: you can achieve exactly the same effect by specifying some positive priority for everything you map into the region and having the background region be priority zero. (If you care at all about priorities then everything being mapped into the region should be happening under the control of your code anyway.) Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- include/exec/memory.h | 6 +++--- memory.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index ebe0d24..6995087 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -153,7 +153,7 @@ struct MemoryRegion { bool flush_coalesced_mmio; MemoryRegion *alias; hwaddr alias_offset; -unsigned priority; +int priority; bool may_overlap; QTAILQ_HEAD(subregions, MemoryRegion) subregions; QTAILQ_ENTRY(MemoryRegion) subregions_link; @@ -193,7 +193,7 @@ struct MemoryListener { void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection *section, hwaddr addr, hwaddr len); /* Lower = earlier (during add), later (during del) */ -unsigned priority; +int priority; This is unrelated to MemoryRegion priorities -- it controls the order in which listener callbacks are called. Don't try to change both at once (and you only need the MR priorities anyway.) AddressSpace *address_space_filter; QTAILQ_ENTRY(MemoryListener) link; }; @@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr, void memory_region_add_subregion_overlap(MemoryRegion *mr, hwaddr offset, MemoryRegion *subregion, - unsigned priority); + int priority); /** * memory_region_get_ram_addr: Get the ram address associated with a memory diff --git a/memory.c b/memory.c index 886f838..dfb3ae6 100644 --- a/memory.c +++ b/memory.c @@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr, void memory_region_add_subregion_overlap(MemoryRegion *mr, hwaddr offset, MemoryRegion *subregion, - unsigned priority) + int priority) { subregion-may_overlap = true; subregion-priority = priority; This isn't a complete set of changes. For instance memory_region_set_address() has a local variable 'priority' which should be signed now. sysbus_mmio_map_common() and sysbus_mmio_map_overlap() have priority arguments which need to change to match. Missed that :( Making necessary changes and send a new version Thanks, Marcel thanks -- PMM
Re: [Qemu-devel] [PATCH 0/9] Remove legacy unaligned bswap functions
On 9 September 2013 12:54, Michael S. Tsirkin m...@redhat.com wrote: On Sun, Aug 25, 2013 at 03:59:28PM +0100, Peter Maydell wrote: The bswap.h header includes a set of legacy unaligned functions that (since commit c732a52d3 at the beginning of this year) are just wrappers for underlying {ld,st}type functions. The legacy functions aren't used in many places, so just replace all their uses with uses of the new-style {ld,st} functions; this lets us remove the legacy wrappers altogether. I have a question on this: what happens if an unaligned address is supplied? In particular, if the address is supplied by the guest? Esp the pci wrappers have many callers - were they all audited? I tried checking but there are many callers, will take a while. This patchset has zero behavioural changes, because literally all it is doing is taking trivial wrapper functions like static inline void cpu_to_le16wu(uint16_t *p, uint16_t v) { stw_le_p(p, v); } and replacing the calls to the wrapper with direct calls to the underlying function. There's no need to audit calls to the callsites because there is no semantic change and not even any implementation change here. There are no problems with unaligned accesses, because the stw_le_p c functions are designed to work for unaligned accesses: they boil down to calls to ldl_p/stw_p and friends, which use memcpy() to ensure they work OK with unaligned accesses. You can check all this by looking at bswap.h. -- PMM
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote: On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote: On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote: Created a MemoryRegion with negative priority that spans over all the pci address space. It intercepts the accesses to unassigned pci address space and will follow the pci spec: 1. returns -1 on read 2. does nothing on write 3. sets the RECEIVED MASTER ABORT bit in the STATUS register of the device that initiated the transaction Note: This implementation assumes that all the reads/writes to the pci address space are done by the cpu. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- Changes from v1: - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on various Host Bridges - pci-unassgined-mem does not have a .valid.accept field and implements read write methods hw/pci/pci.c | 46 ++ include/hw/pci/pci_bus.h | 1 + 2 files changed, 47 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index d00682e..b6a8026 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev) return rootbus-qbus.name; } +static void unassigned_mem_access(PCIBus *bus) +{ +/* FIXME assumption: memory access to the pci address + * space is always initiated by the host bridge /* Always * like * this */ /* Not * like this */ OK + * (device 0 on the bus) */ Can't we pass the master device in? We are still left with the assumption that there's a single master, but at least we get rid of the assumption that it's always device 0 function 0. +PCIDevice *d = bus-devices[0]; +if (!d) { +return; +} + +pci_word_test_and_set_mask(d-config + PCI_STATUS, + PCI_STATUS_REC_MASTER_ABORT); Probably should check and set secondary status for a bridge device. I wanted to add the support for bridges later, I am struggling with some issues with PCI bridges. Shouldn't be very different. +} + +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); + +return -1ULL; +} + +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val, +unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); +} + +static const MemoryRegionOps unassigned_mem_ops = { +.read = unassigned_mem_read, +.write = unassigned_mem_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +#define UNASSIGNED_MEM_PRIORITY -1 + static void pci_bus_init(PCIBus *bus, DeviceState *parent, const char *name, MemoryRegion *address_space_mem, I would rename unassigned to pci_master_Abort_ everywhere. Are you sure about the capital A? Ugh no, that's a typo. For the memory ops I suppose is OK, but the MemoryRegion name I think it should remain unassigned_mem; it will be strange to name it pci_master_Abort_mem. Why would it be strange? Call things by what they do not how they are used. @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent, bus-address_space_mem = address_space_mem; bus-address_space_io = address_space_io; + +memory_region_init_io(bus-unassigned_mem, OBJECT(bus), + unassigned_mem_ops, bus, pci-unassigned, + memory_region_size(bus-address_space_mem)); +memory_region_add_subregion_overlap(bus-address_space_mem, +bus-address_space_mem-addr, +bus-unassigned_mem, +UNASSIGNED_MEM_PRIORITY); + /* host bridge */ QLIST_INIT(bus-child); It seems safer to add an API for enabling this functionality. Something like pci_set_master_for_master_abort(PCIBus *, PCIDevice *); I started to implement that way, but it would be strange (I think) to see in the code the above method because almost all the pci devices can be master devices. In theory yes in practice no one programs devices with addresses that trigger master abort. Addressing this theorectical issue would require memory ops to get the bus master pointer. When this happens the new API can go away, but we can do this gradually. I selected an implicit implementation so for this reason exactly. We do not really select a master, but a master for writes/reads on pci address space. Selecting device 0 on the bus automatically for this
Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist
On Monday, September 09, 2013 12:38:12 PM Paolo Bonzini wrote: Il 06/09/2013 20:41, Eduardo Otubo ha scritto: Hello, Any chance to get this patch applied? Thanks! Paul, perhaps you can add yourself to MAINTAINERS and send a pull request? Paolo Out of respect for the work that Eduardo has done, and is continuing to do, with the QEMU seccomp filtering, I think Eduardo should be the one to take on this role. If Eduardo declines I'll do ahead and submit a patch adding myself to the MAINTAINERS file. On 09/04/2013 11:11 AM, Paul Moore wrote: On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote: This was causing Qemu process to hang when using -sandbox on. Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175 Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com Works for me. Tested-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 37d38f8..69cee44 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(getuid), 245 }, { SCMP_SYS(geteuid), 245 }, { SCMP_SYS(timer_create), 245 }, +{ SCMP_SYS(times), 245 }, { SCMP_SYS(exit), 245 }, { SCMP_SYS(clock_gettime), 245 }, { SCMP_SYS(time), 245 }, -- paul moore security and virtualization @ redhat
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, 2013-09-09 at 15:23 +0300, Michael S. Tsirkin wrote: On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote: On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote: On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote: Created a MemoryRegion with negative priority that spans over all the pci address space. It intercepts the accesses to unassigned pci address space and will follow the pci spec: 1. returns -1 on read 2. does nothing on write 3. sets the RECEIVED MASTER ABORT bit in the STATUS register of the device that initiated the transaction Note: This implementation assumes that all the reads/writes to the pci address space are done by the cpu. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- Changes from v1: - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on various Host Bridges - pci-unassgined-mem does not have a .valid.accept field and implements read write methods hw/pci/pci.c | 46 ++ include/hw/pci/pci_bus.h | 1 + 2 files changed, 47 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index d00682e..b6a8026 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev) return rootbus-qbus.name; } +static void unassigned_mem_access(PCIBus *bus) +{ +/* FIXME assumption: memory access to the pci address + * space is always initiated by the host bridge /* Always * like * this */ /* Not * like this */ OK + * (device 0 on the bus) */ Can't we pass the master device in? We are still left with the assumption that there's a single master, but at least we get rid of the assumption that it's always device 0 function 0. +PCIDevice *d = bus-devices[0]; +if (!d) { +return; +} + +pci_word_test_and_set_mask(d-config + PCI_STATUS, + PCI_STATUS_REC_MASTER_ABORT); Probably should check and set secondary status for a bridge device. I wanted to add the support for bridges later, I am struggling with some issues with PCI bridges. Shouldn't be very different. The current implementation uses only one MemoryRegion that spans over all pci address space. It catches all the accesses both to primary bus addresses (bus 0), or to the regions covered by the bridges. The callback ALWAYS receive a pointer to the primary bus. What would be a better approach? 1. Remain with a single MemoryRegion and check if the address belongs to a bridge address range and recursively travel the bridges tree and update the registers 2. Model a MemoryRegion for each bridge that represents the address range behind the bridge (not existing today). Add a MemoryRegion child to catch accesses to unassigned addresses behind the bridge, then recursively travel the bridges to top and update the registers. +} + +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); + +return -1ULL; +} + +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val, +unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); +} + +static const MemoryRegionOps unassigned_mem_ops = { +.read = unassigned_mem_read, +.write = unassigned_mem_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +#define UNASSIGNED_MEM_PRIORITY -1 + static void pci_bus_init(PCIBus *bus, DeviceState *parent, const char *name, MemoryRegion *address_space_mem, I would rename unassigned to pci_master_Abort_ everywhere. Are you sure about the capital A? Ugh no, that's a typo. For the memory ops I suppose is OK, but the MemoryRegion name I think it should remain unassigned_mem; it will be strange to name it pci_master_Abort_mem. Why would it be strange? 1. Because it states what happens when there is a an access to unassigned memory and not that IS an unassigned memory. 2. This name is already used for unassigned IO ports and maybe is better to follow this convension Call things by what they do not how they are used. @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent, bus-address_space_mem = address_space_mem; bus-address_space_io = address_space_io; + +memory_region_init_io(bus-unassigned_mem, OBJECT(bus), + unassigned_mem_ops, bus, pci-unassigned, +
[Qemu-devel] [PATCH] hw/9pfs/virtio_9p_device: use virtio wrappers to access headers.
Follow-up to Rusty's virtio endianness serie: enough to get a working virtfs mount. Note that st*_raw and ld*_raw are effectively replaced by st*_p and ld*_p. Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com --- hw/9pfs/virtio-9p-device.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c index f0ffbe8..bc2d0da 100644 --- a/hw/9pfs/virtio-9p-device.c +++ b/hw/9pfs/virtio-9p-device.c @@ -19,6 +19,7 @@ #include fsdev/qemu-fsdev.h #include virtio-9p-xattr.h #include virtio-9p-coth.h +#include hw/virtio/virtio-access.h static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features) { @@ -34,7 +35,7 @@ static void virtio_9p_get_config(VirtIODevice *vdev, uint8_t *config) len = strlen(s-tag); cfg = g_malloc0(sizeof(struct virtio_9p_config) + len); -stw_raw(cfg-tag_len, len); +virtio_stl_p(cfg-tag_len, len); /* We don't copy the terminating null to config space */ memcpy(cfg-tag, s-tag, len); memcpy(config, cfg, s-config_size);
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On 9 September 2013 13:43, Marcel Apfelbaum marce...@redhat.com wrote: The scenario is covered only for the primary bus and not for buses behind the PCI bridge (the later being handled differently.) In this case, isn't the Host Bridge always device 00.0? No. For instance the host controller may pass a nonzero value of devfn_min to pci_bus_new/pci_register_bus (in which case the host bridge will end up there; example hw/pci-host/versatile.c) or it might just pass a nonzero devfn to pci_create_simple when it creates the host controller PCI device on the bus (I don't think we have anything that does it that way, though). If not, can we find a way to scan all bus devices and find the host bridge so we will not have to manually add it to each host bridge? It would be conceptually nicer not to treat host bridges as a special case but instead to just report the abort back to whatever the PCI master was (which might be a device doing DMA). That might be a lot of effort though. -- PMM
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote: On 9 September 2013 13:43, Marcel Apfelbaum marce...@redhat.com wrote: The scenario is covered only for the primary bus and not for buses behind the PCI bridge (the later being handled differently.) In this case, isn't the Host Bridge always device 00.0? No. For instance the host controller may pass a nonzero value of devfn_min to pci_bus_new/pci_register_bus (in which case the host bridge will end up there; example hw/pci-host/versatile.c) or it might just pass a nonzero devfn to pci_create_simple when it creates the host controller PCI device on the bus (I don't think we have anything that does it that way, though). If not, can we find a way to scan all bus devices and find the host bridge so we will not have to manually add it to each host bridge? It would be conceptually nicer not to treat host bridges as a special case but instead to just report the abort back to whatever the PCI master was (which might be a device doing DMA). That might be a lot of effort though. -- PMM Yes. As a shortcut, what I suggest is registering the device that wants to be notified of master aborts with the bus. -- MST
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On 9 September 2013 13:59, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote: It would be conceptually nicer not to treat host bridges as a special case but instead to just report the abort back to whatever the PCI master was (which might be a device doing DMA). That might be a lot of effort though. Yes. As a shortcut, what I suggest is registering the device that wants to be notified of master aborts with the bus. Can you just pick the device which is (a subclass of) TYPE_PCI_HOST_BRIDGE, or do we have host bridges which aren't using that class? -- PMM
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, 2013-09-09 at 13:52 +0100, Peter Maydell wrote: On 9 September 2013 13:43, Marcel Apfelbaum marce...@redhat.com wrote: The scenario is covered only for the primary bus and not for buses behind the PCI bridge (the later being handled differently.) In this case, isn't the Host Bridge always device 00.0? No. For instance the host controller may pass a nonzero value of devfn_min to pci_bus_new/pci_register_bus (in which case the host bridge will end up there; example hw/pci-host/versatile.c) or it might just pass a nonzero devfn to pci_create_simple when it creates the host controller PCI device on the bus (I don't think we have anything that does it that way, though). If my assumption is not generally true, I will not use it Thanks! If not, can we find a way to scan all bus devices and find the host bridge so we will not have to manually add it to each host bridge? It would be conceptually nicer not to treat host bridges as a special case but instead to just report the abort back to whatever the PCI master was (which might be a device doing DMA). That might be a lot of effort though. This is exactly my point. ALL device on the bus can be masters of a DMA transaction. So adding an interface as suggested by Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *) for the general case (a device doing DMA) it is too far from reality. However, if we don't want the master device far all accesses (including DMA) but only for accesses to pci address space, in this specific case we can appoint the HostBridge (explicitly/implicitly) as the master device because most devices do not access pci address space of other devices. As a conclusion, should we call the API pci_set_master_for_master_abort_on_pci_space ? Marcel -- PMM
Re: [Qemu-devel] [PATCH] qxl: fix local renderer
On 09/06/2013 01:20 AM, Gerd Hoffmann wrote: The local spice renderer assumes the primary surface is located at the start of the ram bar. This used to be a requirement in qxl hardware revision 1. In revision 2+ this is relaxed. Nevertheless guest drivers continued to use the traditional location, for historical and backward compatibility reasons. The qxl kms driver doesn't though as it depends on qxl revision 4+ anyway. Result is that local rendering is hosed for recent linux guests, you'll get pixel garbage with non-spice ui (gtk, sdl, vnc) and when doing screendumps. Fix that by doing a proper mapping of the guest-specified memory location. https://bugzilla.redhat.com/show_bug.cgi?id=948717 Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/display/qxl-render.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/hw/display/qxl-render.c b/hw/display/qxl-render.c index 269b1a7..d34b0c4 100644 --- a/hw/display/qxl-render.c +++ b/hw/display/qxl-render.c @@ -31,10 +31,6 @@ static void qxl_blit(PCIQXLDevice *qxl, QXLRect *rect) if (is_buffer_shared(surface)) { return; } -if (!qxl-guest_primary.data) { -trace_qxl_render_blit_guest_primary_initialized(); -qxl-guest_primary.data = memory_region_get_ram_ptr(qxl-vga.vram); -} trace_qxl_render_blit(qxl-guest_primary.qxl_stride, rect-left, rect-right, rect-top, rect-bottom); src = qxl-guest_primary.data; @@ -104,7 +100,12 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice *qxl) if (qxl-guest_primary.resized) { qxl-guest_primary.resized = 0; -qxl-guest_primary.data = memory_region_get_ram_ptr(qxl-vga.vram); +qxl-guest_primary.data = qxl_phys2virt(qxl, + qxl-guest_primary.surface.mem, +MEMSLOT_GROUP_GUEST); +if (!qxl-guest_primary.data) { +return; +} qxl_set_rect_to_surface(qxl, qxl-dirty[0]); qxl-num_dirty_rects = 1; trace_qxl_render_guest_primary_resized( @@ -128,6 +129,10 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice *qxl) } dpy_gfx_replace_surface(vga-con, surface); } + +if (!qxl-guest_primary.data) { +return; +} for (i = 0; i qxl-num_dirty_rects; i++) { if (qemu_spice_rect_is_empty(qxl-dirty+i)) { break; Tested-by: Cole Robinson crobi...@redhat.com And cc-ing qemu-stable - Cole
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote: On 9 September 2013 13:59, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote: It would be conceptually nicer not to treat host bridges as a special case but instead to just report the abort back to whatever the PCI master was (which might be a device doing DMA). That might be a lot of effort though. Yes. As a shortcut, what I suggest is registering the device that wants to be notified of master aborts with the bus. Can you just pick the device which is (a subclass of) TYPE_PCI_HOST_BRIDGE, or do we have host bridges which aren't using that class? This is what I would really want to do, but some HOST Bridge devices inherit directly from PCI_DEVICE. TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which is a not a PCI device and does not help us here (not a PCI_DEVICE on the bus) Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE and it hold as composition a PCIDevice that will be part of the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly inherits from PCI_DEVICE. It seems to me as a dead end, or we need to re-factor the current hierarchy. Marcel -- PMM
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On 9 September 2013 14:07, Marcel Apfelbaum marce...@redhat.com wrote: This is exactly my point. ALL device on the bus can be masters of a DMA transaction. So adding an interface as suggested by Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *) for the general case (a device doing DMA) it is too far from reality. Actually I don't think it would be too painful. At the moment in do_pci_register_device() we do this to create the memory region used by a device for its bus master transactions: memory_region_init_alias(pci_dev-bus_master_enable_region, OBJECT(pci_dev), bus master, dma_as-root, 0, memory_region_size(dma_as-root)); If instead of using this alias directly as the bus_master_enable region you instead: * create a container region * create a 'background' region at negative priority (ie one per device, and you can make the 'opaque' pointer point to the device, not the bus) * put the alias and the background region into the container * use the container as the bus_master_enable region then you will get in your callback a pointer to the device which caused the abort. You can then have your callback call a method defined on PCIDevice which we implement: * as do-nothing in the PCI device base class * as set-the-master-abort bit in the PCI host bridge class (and anybody who wants to get fancy about handling aborts can override it in their own device implementation) That seems achievable without really requiring new infrastructure. Have I missed something that wouldn't work if we did this? thanks -- PMM
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On 9 September 2013 14:15, Marcel Apfelbaum marce...@redhat.com wrote: On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote: Can you just pick the device which is (a subclass of) TYPE_PCI_HOST_BRIDGE, or do we have host bridges which aren't using that class? This is what I would really want to do, but some HOST Bridge devices inherit directly from PCI_DEVICE. TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which is a not a PCI device and does not help us here (not a PCI_DEVICE on the bus) Oops, yes, I get those two the wrong way round a lot. Anyway, if we need to make all host bridges have a common subclass we could certainly refactor them accordingly. Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE and it hold as composition a PCIDevice that will be part of the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly inherits from PCI_DEVICE. This may just be wrong choice of name rather than actually wrong hierarchy. -- PMM
Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist
On 09/09/2013 09:36 AM, Paul Moore wrote: On Monday, September 09, 2013 12:38:12 PM Paolo Bonzini wrote: Il 06/09/2013 20:41, Eduardo Otubo ha scritto: Hello, Any chance to get this patch applied? Thanks! Paul, perhaps you can add yourself to MAINTAINERS and send a pull request? Paolo Out of respect for the work that Eduardo has done, and is continuing to do, with the QEMU seccomp filtering, I think Eduardo should be the one to take on this role. If Eduardo declines I'll do ahead and submit a patch adding myself to the MAINTAINERS file. If this is ok for everyone, I would be really glad to take this role to myself. Paul, thanks for this vote of confidence. Paolo, should I send a patch for MAINTAINERS right away? Regards, On 09/04/2013 11:11 AM, Paul Moore wrote: On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote: This was causing Qemu process to hang when using -sandbox on. Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175 Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com Works for me. Tested-by: Paul Moore pmo...@redhat.com --- qemu-seccomp.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 37d38f8..69cee44 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(getuid), 245 }, { SCMP_SYS(geteuid), 245 }, { SCMP_SYS(timer_create), 245 }, +{ SCMP_SYS(times), 245 }, { SCMP_SYS(exit), 245 }, { SCMP_SYS(clock_gettime), 245 }, { SCMP_SYS(time), 245 }, -- Eduardo Otubo IBM Linux Technology Center
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, 2013-09-09 at 14:19 +0100, Peter Maydell wrote: On 9 September 2013 14:15, Marcel Apfelbaum marce...@redhat.com wrote: On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote: Can you just pick the device which is (a subclass of) TYPE_PCI_HOST_BRIDGE, or do we have host bridges which aren't using that class? This is what I would really want to do, but some HOST Bridge devices inherit directly from PCI_DEVICE. TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which is a not a PCI device and does not help us here (not a PCI_DEVICE on the bus) Oops, yes, I get those two the wrong way round a lot. Anyway, if we need to make all host bridges have a common subclass we could certainly refactor them accordingly. Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE and it hold as composition a PCIDevice that will be part of the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly inherits from PCI_DEVICE. This may just be wrong choice of name rather than actually wrong hierarchy. I try not to judge the naming convention, so let's leave it aside for now. My issue is that we have at least 2 ways to model the bridges: 1. TYPE_PCI_HOST_BRIDGE * derives from TYPE_SYS_BUS_DEVICE * has a bus * one of the bus devices is a TYPE_I440FX_PCI_DEVICE which derives from TYPE_PCI_DEVICE 2. TYPE_PCIE_HOST_BRIDGE * derives from TYPE_PCI_HOST_BRIDGE which derives from TYPE_SYS_BUS_DEVICE * has a PciDevice and register it to the bus in order to work as (1) I would like to implement an hierarchy that will allow all the host bridge devices to have a common ancestor In this was, we can scan the PCI bus to look for master... -- PMM
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On 9 September 2013 14:29, Marcel Apfelbaum marce...@redhat.com wrote: My issue is that we have at least 2 ways to model the bridges: 1. TYPE_PCI_HOST_BRIDGE * derives from TYPE_SYS_BUS_DEVICE * has a bus * one of the bus devices is a TYPE_I440FX_PCI_DEVICE which derives from TYPE_PCI_DEVICE 2. TYPE_PCIE_HOST_BRIDGE * derives from TYPE_PCI_HOST_BRIDGE which derives from TYPE_SYS_BUS_DEVICE * has a PciDevice and register it to the bus in order to work as (1) So I think there are two different (and slightly confusing) things here: (1) the model of the host's PCI controller; this is typically derived from TYPE_SYS_BUS_DEVICE somehow but I guess in theory need not be; often it's a TYPE_PCI_HOST_BRIDGE or TYPE_PCIE_HOST_BRIDGE. (2) the PCI device which sits on the PCI bus and is visible to the guest, usually with a PCI class ID of PCI_CLASS_BRIDGE_HOST (but not necessarily; versatile is different, for instance). Currently we generally make these direct subclasses of TYPE_PCI_DEVICE. [The naming convention confusion arises because different controllers are inconsistent about how they name these classes and which type name ends up with 'host' in it.] What you're going to get in the callback is (2)... I would like to implement an hierarchy that will allow all the host bridge devices to have a common ancestor In this was, we can scan the PCI bus to look for master... ...and yes, we could insert an extra class and make all those bridge hosts derive from it rather than directly from TYPE_PCI_DEVICE if we needed to. -- PMM
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, 2013-09-09 at 14:16 +0100, Peter Maydell wrote: On 9 September 2013 14:07, Marcel Apfelbaum marce...@redhat.com wrote: This is exactly my point. ALL device on the bus can be masters of a DMA transaction. So adding an interface as suggested by Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *) for the general case (a device doing DMA) it is too far from reality. Actually I don't think it would be too painful. At the moment in do_pci_register_device() we do this to create the memory region used by a device for its bus master transactions: memory_region_init_alias(pci_dev-bus_master_enable_region, OBJECT(pci_dev), bus master, dma_as-root, 0, memory_region_size(dma_as-root)); If instead of using this alias directly as the bus_master_enable region you instead: * create a container region * create a 'background' region at negative priority (ie one per device, and you can make the 'opaque' pointer point to the device, not the bus) * put the alias and the background region into the container * use the container as the bus_master_enable region then you will get in your callback a pointer to the device which caused the abort. You can then have your callback call a method defined on PCIDevice which we implement: * as do-nothing in the PCI device base class * as set-the-master-abort bit in the PCI host bridge class The Received Master Abort bit must be set for the initiator. In the case described here this bit mast be set in the device register rather that in host bridge. (and anybody who wants to get fancy about handling aborts can override it in their own device implementation) That seems achievable without really requiring new infrastructure. Have I missed something that wouldn't work if we did this? The idea seems correct (and cool!) to me (I'll look deeper), but it covers only one way: upstream. We need to unify this with the downstream: The cpu accesses to unassigned memory should result in the callback to the host bridge. All we need is that all the host bridges to have a common class and following the same idea we get the downstream (The host bridge inititates all transactions on the bus on behalf of the cpu) If this works, we don't need no work around! Marcel thanks -- PMM
[Qemu-devel] seccomp submaintainer? (was Re: [PATCH] seccomp: adding times() to the whitelist)
Il 09/09/2013 15:20, Eduardo Otubo ha scritto: Out of respect for the work that Eduardo has done, and is continuing to do, with the QEMU seccomp filtering, I think Eduardo should be the one to take on this role. If Eduardo declines I'll do ahead and submit a patch adding myself to the MAINTAINERS file. If this is ok for everyone, I would be really glad to take this role to myself. Paul, thanks for this vote of confidence. Paolo, should I send a patch for MAINTAINERS right away? Ok, I was suggesting Paul because he was the one doing reviews. Eduardo, that is also okay for me. However, even as a maintainer please do wait for Paul's reviews. Many areas of QEMU have maintainers that do not send their own patches without a review, so this wouldn't be a new rule. :) Please wait for Anthony's ack. I changed the subject and CCed him to grab his attention. Paolo
[Qemu-devel] [RFC] qapi/error: Optional error propagation backtrace
Add a configure switch which enables an error propagation backtrace. This results in the error_set function prepending every message by the source file name, function and line in which it was called, as well as error_propagate appending this information to the propagated message, resulting in a backtrace. Signed-off-by: Max Reitz mre...@redhat.com --- Since this obviously breaks existing tests (and cannot really be used for new tests since it includes a line number, which is a rather volatile output) and is generally not very useful to normal users, the switch is disabled by default. This functionality is intended for developers tracking down error paths. Since I do not know whether I am the only one considering it useful in the first place, this is just an RFC for now. Furthermore, I am not sure whether a configure switch is really the right way to implement this. --- configure| 12 +++ include/qapi/error.h | 40 +++- util/error.c | 57 +++- 3 files changed, 90 insertions(+), 19 deletions(-) diff --git a/configure b/configure index e989609..4e43f7b 100755 --- a/configure +++ b/configure @@ -243,6 +243,7 @@ gtk= gtkabi=2.0 tpm=no libssh2= +error_backtrace=no # parse CC options first for opt do @@ -949,6 +950,10 @@ for opt do ;; --enable-libssh2) libssh2=yes ;; + --enable-error-bt) error_backtrace=yes + ;; + --disable-error-bt) error_backtrace=no + ;; *) echo ERROR: unknown option $opt; show_help=yes ;; esac @@ -1168,6 +1173,8 @@ echo --gcov=GCOV use specified gcov [$gcov_tool] echo --enable-tpm enable TPM support echo --disable-libssh2disable ssh block device support echo --enable-libssh2 enable ssh block device support +echo --disable-error-bt disable backtraces on internal errors +echo --enable-error-btenable backtraces on internal errors echo echo NOTE: The object files are built at the place where configure is launched exit 1 @@ -3650,6 +3657,7 @@ echo gcov $gcov_tool echo gcov enabled $gcov echo TPM support $tpm echo libssh2 support $libssh2 +echo error backtraces $error_backtrace echo TPM passthrough $tpm_passthrough echo QOM debugging $qom_cast_debug @@ -4044,6 +4052,10 @@ if test $libssh2 = yes ; then echo CONFIG_LIBSSH2=y $config_host_mak fi +if test $error_backtrace = yes ; then + echo CONFIG_ERROR_BACKTRACE=y $config_host_mak +fi + if test $virtio_blk_data_plane = yes ; then echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' $config_host_mak fi diff --git a/include/qapi/error.h b/include/qapi/error.h index ffd1cea..d3cfe35 100644 --- a/include/qapi/error.h +++ b/include/qapi/error.h @@ -25,16 +25,28 @@ typedef struct Error Error; /** * Set an indirect pointer to an error given a ErrorClass value and a * printf-style human message. This function is not meant to be used outside - * of QEMU. + * of QEMU. The message will be prepended by the file/function/line information + * if CONFIG_ERROR_BACKTRACE is defined. */ -void error_set(Error **err, ErrorClass err_class, const char *fmt, ...) GCC_FMT_ATTR(3, 4); +void error_set_bt(const char *file, const char *func, int line, + Error **err, ErrorClass err_class, const char *fmt, ...) +GCC_FMT_ATTR(6, 7); + +#define error_set(...) \ +error_set_bt(__FILE__, __func__, __LINE__, __VA_ARGS__) /** * Set an indirect pointer to an error given a ErrorClass value and a * printf-style human message, followed by a strerror() string if - * @os_error is not zero. + * @os_error is not zero. The message will be prepended by the + * file/function/line information if CONFIG_ERROR_BACKTRACE is defined. */ -void error_set_errno(Error **err, int os_error, ErrorClass err_class, const char *fmt, ...) GCC_FMT_ATTR(4, 5); +void error_set_errno_bt(const char *file, const char *func, int line, +Error **err, int os_error, ErrorClass err_class, +const char *fmt, ...) GCC_FMT_ATTR(7, 8); + +#define error_set_errno(...) \ +error_set_errno_bt(__FILE__, __func__, __LINE__, __VA_ARGS__) /** * Same as error_set(), but sets a generic error @@ -47,7 +59,11 @@ void error_set_errno(Error **err, int os_error, ErrorClass err_class, const char /** * Helper for open() errors */ -void error_setg_file_open(Error **errp, int os_errno, const char *filename); +void error_setg_file_open_bt(const char *file, const char *func, int line, + Error **errp, int os_errno, const char *filename); + +#define error_setg_file_open(...) \ +error_setg_file_open_bt(__FILE__, __func__, __LINE__, __VA_ARGS__) /** * Returns true if an indirect pointer to an error is pointing to a valid @@ -71,11 +87,17 @@ Error *error_copy(const Error *err); const char *error_get_pretty(Error *err); /** - *
Re: [Qemu-devel] [PATCH V3] xl: HVM domain S3 bugfix
On Mon, 2013-09-09 at 03:29 +, Liu, Jinsong wrote: From 18344216b432648605726b137b348f28ef64a4ef Mon Sep 17 00:00:00 2001 From: Liu Jinsong jinsong@intel.com Date: Fri, 23 Aug 2013 23:30:23 +0800 Subject: [PATCH V3] xl: HVM domain S3 bugfix Currently Xen hvm s3 has a bug coming from the difference between qemu-traditioanl and qemu-xen. For qemu-traditional, the way to traditional resume from hvm s3 is via 'xl trigger' command. However, for qemu-xen, the way to resume from hvm s3 inherited from standard qemu, i.e. via QMP, and it doesn't work under Xen. The root cause is, for qemu-xen, 'xl trigger' command didn't reset devices, while QMP didn't unpause hvm domain though they did qemu system reset. We have two qemu patches one xl patch to fix the HVM S3 bug: This patch is the xl patch. It invokes QMP system_wakeup so that qemu logic for hvm s3 could be triggerred. triggered Signed-off-by: Liu Jinsong jinsong@intel.com Acked-by: Ian Campbell ian.campb...@citrix.com I'll hold off on applying until someone givers me a heads up that the qemu side is in place. I'll try and remember to fix the commit message typos as I commit, so no need to resend for those. --- tools/libxl/libxl.c | 31 +-- tools/libxl/libxl_internal.h |2 ++ tools/libxl/libxl_qmp.c |5 + 3 files changed, 36 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 81785df..267bc25 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -4641,10 +4641,37 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, uint32_t domid, return ret; } +static int libxl__domain_s3_resume(libxl__gc *gc, int domid) +{ +int rc = 0; + +switch (libxl__domain_type(gc, domid)) { +case LIBXL_DOMAIN_TYPE_HVM: +switch (libxl__device_model_version_running(gc, domid)) { +case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: +rc = xc_set_hvm_param(CTX-xch, domid, HVM_PARAM_ACPI_S_STATE, 0); +break; +case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: +rc = libxl__qmp_system_wakeup(gc, domid); +break; +default: +rc = ERROR_INVAL; +break; +} +break; +default: +rc = ERROR_INVAL; +break; +} + +return rc; +} + int libxl_send_trigger(libxl_ctx *ctx, uint32_t domid, libxl_trigger trigger, uint32_t vcpuid) { int rc; +GC_INIT(ctx); switch (trigger) { case LIBXL_TRIGGER_POWER: @@ -4668,8 +4695,7 @@ int libxl_send_trigger(libxl_ctx *ctx, uint32_t domid, XEN_DOMCTL_SENDTRIGGER_RESET, vcpuid); break; case LIBXL_TRIGGER_S3RESUME: -xc_set_hvm_param(ctx-xch, domid, HVM_PARAM_ACPI_S_STATE, 0); -rc = 0; +rc = libxl__domain_s3_resume(gc, domid); break; default: rc = -1; @@ -4684,6 +4710,7 @@ int libxl_send_trigger(libxl_ctx *ctx, uint32_t domid, rc = ERROR_FAIL; } +GC_FREE; return rc; } diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index f051d91..0ef0a3a 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -1408,6 +1408,8 @@ _hidden int libxl__qmp_query_serial(libxl__qmp_handler *qmp); _hidden int libxl__qmp_pci_add(libxl__gc *gc, int d, libxl_device_pci *pcidev); _hidden int libxl__qmp_pci_del(libxl__gc *gc, int domid, libxl_device_pci *pcidev); +/* Resume hvm domain */ +_hidden int libxl__qmp_system_wakeup(libxl__gc *gc, int domid); /* Suspend QEMU. */ _hidden int libxl__qmp_stop(libxl__gc *gc, int domid); /* Resume QEMU. */ diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index f681f3a..3c3b301 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -878,6 +878,11 @@ int libxl__qmp_pci_del(libxl__gc *gc, int domid, libxl_device_pci *pcidev) return qmp_device_del(gc, domid, id); } +int libxl__qmp_system_wakeup(libxl__gc *gc, int domid) +{ +return qmp_run_command(gc, domid, system_wakeup, NULL, NULL, NULL); +} + int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename) { libxl__json_object *args = NULL;
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, Sep 09, 2013 at 03:43:49PM +0300, Marcel Apfelbaum wrote: On Mon, 2013-09-09 at 15:23 +0300, Michael S. Tsirkin wrote: On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote: On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote: On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote: Created a MemoryRegion with negative priority that spans over all the pci address space. It intercepts the accesses to unassigned pci address space and will follow the pci spec: 1. returns -1 on read 2. does nothing on write 3. sets the RECEIVED MASTER ABORT bit in the STATUS register of the device that initiated the transaction Note: This implementation assumes that all the reads/writes to the pci address space are done by the cpu. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- Changes from v1: - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on various Host Bridges - pci-unassgined-mem does not have a .valid.accept field and implements read write methods hw/pci/pci.c | 46 ++ include/hw/pci/pci_bus.h | 1 + 2 files changed, 47 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index d00682e..b6a8026 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev) return rootbus-qbus.name; } +static void unassigned_mem_access(PCIBus *bus) +{ +/* FIXME assumption: memory access to the pci address + * space is always initiated by the host bridge /* Always * like * this */ /* Not * like this */ OK + * (device 0 on the bus) */ Can't we pass the master device in? We are still left with the assumption that there's a single master, but at least we get rid of the assumption that it's always device 0 function 0. +PCIDevice *d = bus-devices[0]; +if (!d) { +return; +} + +pci_word_test_and_set_mask(d-config + PCI_STATUS, + PCI_STATUS_REC_MASTER_ABORT); Probably should check and set secondary status for a bridge device. I wanted to add the support for bridges later, I am struggling with some issues with PCI bridges. Shouldn't be very different. The current implementation uses only one MemoryRegion that spans over all pci address space. It catches all the accesses both to primary bus addresses (bus 0), or to the regions covered by the bridges. That's not what I see in code. pci_bridge_initfn creates address spaces for IO and memory. Bridge windows are aliases created by pci_bridge_init_alias. All they do is forward transactions to the secondary bus, so these address spaces represent secondary bus addressing. What did I miss? The callback ALWAYS receive a pointer to the primary bus. What would be a better approach? 1. Remain with a single MemoryRegion and check if the address belongs to a bridge address range and recursively travel the bridges tree and update the registers 2. Model a MemoryRegion for each bridge that represents the address range behind the bridge (not existing today). I think this is exactly what address_space_XXX represent. Add a MemoryRegion child to catch accesses to unassigned addresses behind the bridge, then recursively travel the bridges to top and update the registers. This bit sounds right. +} + +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); + +return -1ULL; +} + +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val, +unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); +} + +static const MemoryRegionOps unassigned_mem_ops = { +.read = unassigned_mem_read, +.write = unassigned_mem_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +#define UNASSIGNED_MEM_PRIORITY -1 + static void pci_bus_init(PCIBus *bus, DeviceState *parent, const char *name, MemoryRegion *address_space_mem, I would rename unassigned to pci_master_Abort_ everywhere. Are you sure about the capital A? Ugh no, that's a typo. For the memory ops I suppose is OK, but the MemoryRegion name I think it should remain unassigned_mem; it will be strange to name it pci_master_Abort_mem. Why would it be strange? 1. Because it states what happens when there is a an access to
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, Sep 09, 2013 at 02:02:47PM +0100, Peter Maydell wrote: On 9 September 2013 13:59, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote: It would be conceptually nicer not to treat host bridges as a special case but instead to just report the abort back to whatever the PCI master was (which might be a device doing DMA). That might be a lot of effort though. Yes. As a shortcut, what I suggest is registering the device that wants to be notified of master aborts with the bus. Can you just pick the device which is (a subclass of) TYPE_PCI_HOST_BRIDGE, or do we have host bridges which aren't using that class? -- PMM Not anymore I think. So yes, I think this will work.
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, Sep 09, 2013 at 04:07:53PM +0300, Marcel Apfelbaum wrote: On Mon, 2013-09-09 at 13:52 +0100, Peter Maydell wrote: On 9 September 2013 13:43, Marcel Apfelbaum marce...@redhat.com wrote: The scenario is covered only for the primary bus and not for buses behind the PCI bridge (the later being handled differently.) In this case, isn't the Host Bridge always device 00.0? No. For instance the host controller may pass a nonzero value of devfn_min to pci_bus_new/pci_register_bus (in which case the host bridge will end up there; example hw/pci-host/versatile.c) or it might just pass a nonzero devfn to pci_create_simple when it creates the host controller PCI device on the bus (I don't think we have anything that does it that way, though). If my assumption is not generally true, I will not use it Thanks! If not, can we find a way to scan all bus devices and find the host bridge so we will not have to manually add it to each host bridge? It would be conceptually nicer not to treat host bridges as a special case but instead to just report the abort back to whatever the PCI master was (which might be a device doing DMA). That might be a lot of effort though. This is exactly my point. ALL device on the bus can be masters of a DMA transaction. So adding an interface as suggested by Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *) for the general case (a device doing DMA) it is too far from reality. However, if we don't want the master device far all accesses (including DMA) but only for accesses to pci address space, in this specific case we can appoint the HostBridge (explicitly/implicitly) as the master device because most devices do not access pci address space of other devices. As a conclusion, should we call the API pci_set_master_for_master_abort_on_pci_space ? Marcel I like Peter's idea of detecting a host bridge implictly using device type. With a big FIXME explaining that we shouldn't need to do this. -- PMM
Re: [Qemu-devel] [PATCH 1/2] alpha-linux-user: Fix umount syscall numbers
Hi, On Mon, Aug 26, 2013 at 01:26:11PM -0700, Richard Henderson wrote: Ping. Sorry for the delay, adding it to the next pull request. Riku On 08/16/2013 11:24 PM, Richard Henderson wrote: Ping. r~ On 07/24/2013 12:50 PM, Richard Henderson wrote: It has been pointed out on LKML that the alpha umount syscall numbers are named wrong, and a patch to rectify that has been posted for 3.11. Glibc works around this by treating NR_umount as NR_umount2 if NR_oldumount exists. That's more complicated than we need in QEMU, given that we control linux-user/*/syscall_nr.h. This is the last instance of TARGET_NR_oldumount, so delete that from the strace.list. Signed-off-by: Richard Henderson r...@twiddle.net --- linux-user/alpha/syscall_nr.h | 4 ++-- linux-user/strace.list| 3 --- linux-user/syscall.c | 2 +- 3 files changed, 3 insertions(+), 6 deletions(-) diff --git a/linux-user/alpha/syscall_nr.h b/linux-user/alpha/syscall_nr.h index ac2b6e2..d52d76e 100644 --- a/linux-user/alpha/syscall_nr.h +++ b/linux-user/alpha/syscall_nr.h @@ -20,7 +20,7 @@ #define TARGET_NR_lseek19 #define TARGET_NR_getxpid 20 #define TARGET_NR_osf_mount21 -#define TARGET_NR_umount 22 +#define TARGET_NR_umount2 22 #define TARGET_NR_setuid 23 #define TARGET_NR_getxuid 24 #define TARGET_NR_exec_with_loader 25 /* not implemented */ @@ -255,7 +255,7 @@ #define TARGET_NR_sysinfo 318 #define TARGET_NR__sysctl 319 /* 320 was sys_idle. */ -#define TARGET_NR_oldumount 321 +#define TARGET_NR_umount 321 #define TARGET_NR_swapon 322 #define TARGET_NR_times 323 #define TARGET_NR_personality 324 diff --git a/linux-user/strace.list b/linux-user/strace.list index 08f115d..4f9c364 100644 --- a/linux-user/strace.list +++ b/linux-user/strace.list @@ -612,9 +612,6 @@ #ifdef TARGET_NR_oldstat { TARGET_NR_oldstat, oldstat , NULL, NULL, NULL }, #endif -#ifdef TARGET_NR_oldumount -{ TARGET_NR_oldumount, oldumount , NULL, NULL, NULL }, -#endif #ifdef TARGET_NR_olduname { TARGET_NR_olduname, olduname , NULL, NULL, NULL }, #endif diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 00a0390..e42c20e 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -5719,7 +5719,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1, unlock_user(p, arg1, 0); } break; -#ifdef TARGET_NR_umount2 /* not on alpha */ +#ifdef TARGET_NR_umount2 case TARGET_NR_umount2: if (!(p = lock_user_string(arg1))) goto efault;
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, 2013-09-09 at 14:39 +0100, Peter Maydell wrote: On 9 September 2013 14:29, Marcel Apfelbaum marce...@redhat.com wrote: My issue is that we have at least 2 ways to model the bridges: 1. TYPE_PCI_HOST_BRIDGE * derives from TYPE_SYS_BUS_DEVICE * has a bus * one of the bus devices is a TYPE_I440FX_PCI_DEVICE which derives from TYPE_PCI_DEVICE 2. TYPE_PCIE_HOST_BRIDGE * derives from TYPE_PCI_HOST_BRIDGE which derives from TYPE_SYS_BUS_DEVICE * has a PciDevice and register it to the bus in order to work as (1) So I think there are two different (and slightly confusing) things here: (1) the model of the host's PCI controller; this is typically derived from TYPE_SYS_BUS_DEVICE somehow but I guess in theory need not be; often it's a TYPE_PCI_HOST_BRIDGE or TYPE_PCIE_HOST_BRIDGE. (2) the PCI device which sits on the PCI bus and is visible to the guest, usually with a PCI class ID of PCI_CLASS_BRIDGE_HOST (but not necessarily; versatile is different, for instance). Currently we generally make these direct subclasses of TYPE_PCI_DEVICE. [The naming convention confusion arises because different controllers are inconsistent about how they name these classes and which type name ends up with 'host' in it.] What you're going to get in the callback is (2)... This I come to understand, thanks! I would like to implement an hierarchy that will allow all the host bridge devices to have a common ancestor In this was, we can scan the PCI bus to look for master... ...and yes, we could insert an extra class and make all those bridge hosts derive from it rather than directly from TYPE_PCI_DEVICE if we needed to. And we solve the main issue - no need for an API for master abort - for upstream: use your suggestion (let's hope it works) - for downstream: look on the bus for a device deriving from this class and *this device* will handle the master abort I'll find a way how to handle the PCI bridges. By the way, I am not sure that the upstream transactions (DMA) can actually end with a master abort. Master abort would happen if a transaction will not be claimed by any device on the bus. But in DMA transactions, maybe the host bridge will always claim it and return with error. Does anybody know something about this? Thanks for all the help! Marcel -- PMM
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, Sep 09, 2013 at 04:29:04PM +0300, Marcel Apfelbaum wrote: On Mon, 2013-09-09 at 14:19 +0100, Peter Maydell wrote: On 9 September 2013 14:15, Marcel Apfelbaum marce...@redhat.com wrote: On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote: Can you just pick the device which is (a subclass of) TYPE_PCI_HOST_BRIDGE, or do we have host bridges which aren't using that class? This is what I would really want to do, but some HOST Bridge devices inherit directly from PCI_DEVICE. TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which is a not a PCI device and does not help us here (not a PCI_DEVICE on the bus) Oops, yes, I get those two the wrong way round a lot. Anyway, if we need to make all host bridges have a common subclass we could certainly refactor them accordingly. Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE and it hold as composition a PCIDevice that will be part of the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly inherits from PCI_DEVICE. This may just be wrong choice of name rather than actually wrong hierarchy. I try not to judge the naming convention, so let's leave it aside for now. My issue is that we have at least 2 ways to model the bridges: 1. TYPE_PCI_HOST_BRIDGE * derives from TYPE_SYS_BUS_DEVICE * has a bus * one of the bus devices is a TYPE_I440FX_PCI_DEVICE which derives from TYPE_PCI_DEVICE 2. TYPE_PCIE_HOST_BRIDGE * derives from TYPE_PCI_HOST_BRIDGE which derives from TYPE_SYS_BUS_DEVICE * has a PciDevice and register it to the bus in order to work as (1) I would like to implement an hierarchy that will allow all the host bridge devices to have a common ancestor In this was, we can scan the PCI bus to look for master... I wouldn't object to is_host is stuct PCIDeviceClass. That's probably easier that trying to create a common class for devices that share no common code. -- PMM
Re: [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization
On Fri, 02 Aug 2013 22:33:24 +0200 Andreas Färber afaer...@suse.de wrote: Am 23.07.2013 18:22, schrieb Igor Mammedov: Signed-off-by: Igor Mammedov imamm...@redhat.com --- vl.c |7 +-- 1 files changed, 1 insertions(+), 6 deletions(-) diff --git a/vl.c b/vl.c index 8190504..bf0c658 100644 --- a/vl.c +++ b/vl.c @@ -2947,7 +2947,7 @@ int main(int argc, char **argv, char **envp) module_call_init(MODULE_INIT_MACHINE); machine = find_default_machine(); cpu_model = NULL; -ram_size = 0; +ram_size = DEFAULT_RAM_SIZE * 1024 * 1024; snapshot = 0; cyls = heads = secs = 0; translation = BIOS_ATA_TRANSLATION_AUTO; @@ -4064,11 +4064,6 @@ int main(int argc, char **argv, char **envp) exit(1); } -/* init the memory */ -if (ram_size == 0) { -ram_size = DEFAULT_RAM_SIZE * 1024 * 1024; -} - if (qemu_opts_foreach(qemu_find_opts(device), device_help_func, NULL, 0) != 0) { exit(0); Commit message doesn't give any explanation why? it was intended as cleanup What happens with -m 0? My guess is the old code translates that to the default size, where by intializing the default earlier it would stay. patch is broken in this aspect. It aborts on start up with -m 0 The question is if -m 0 is correct value, perhaps QEMU should exit with error message in this case, instead of silent fallback to default? Andreas
Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
On 09/09/2013 01:13 AM, Claudio Fontana wrote: after carefully reading and testing your patches, this is how I suggest to proceed: first do the implementation of the new functionality (tcg opcodes, jit) in a way that is consistent with the existing code. No type changes, no refactoring, no beautification. Once we agree on those, introduce the meaningful restructuring you want to do, like the new INSN type, the don't handle mov/movi in tcg_out_op, the TCG_OPF_64BIT thing, etc. Last do the cosmetic stuff if you really want to do it, like the change all ext to bool (note that there is no point if the callers still use 1 and 0: adapt them as well) etc. No, I don't agree. Especially with respect to the insn type. I'd much rather do all the cosmetic stuff, as you put it, first. It makes all of the real changes much easier to understand. r~
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, 2013-09-09 at 16:58 +0300, Michael S. Tsirkin wrote: On Mon, Sep 09, 2013 at 03:43:49PM +0300, Marcel Apfelbaum wrote: On Mon, 2013-09-09 at 15:23 +0300, Michael S. Tsirkin wrote: On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote: On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote: On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote: Created a MemoryRegion with negative priority that spans over all the pci address space. It intercepts the accesses to unassigned pci address space and will follow the pci spec: 1. returns -1 on read 2. does nothing on write 3. sets the RECEIVED MASTER ABORT bit in the STATUS register of the device that initiated the transaction Note: This implementation assumes that all the reads/writes to the pci address space are done by the cpu. Signed-off-by: Marcel Apfelbaum marce...@redhat.com --- Changes from v1: - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on various Host Bridges - pci-unassgined-mem does not have a .valid.accept field and implements read write methods hw/pci/pci.c | 46 ++ include/hw/pci/pci_bus.h | 1 + 2 files changed, 47 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index d00682e..b6a8026 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev) return rootbus-qbus.name; } +static void unassigned_mem_access(PCIBus *bus) +{ +/* FIXME assumption: memory access to the pci address + * space is always initiated by the host bridge /* Always * like * this */ /* Not * like this */ OK + * (device 0 on the bus) */ Can't we pass the master device in? We are still left with the assumption that there's a single master, but at least we get rid of the assumption that it's always device 0 function 0. +PCIDevice *d = bus-devices[0]; +if (!d) { +return; +} + +pci_word_test_and_set_mask(d-config + PCI_STATUS, + PCI_STATUS_REC_MASTER_ABORT); Probably should check and set secondary status for a bridge device. I wanted to add the support for bridges later, I am struggling with some issues with PCI bridges. Shouldn't be very different. The current implementation uses only one MemoryRegion that spans over all pci address space. It catches all the accesses both to primary bus addresses (bus 0), or to the regions covered by the bridges. That's not what I see in code. pci_bridge_initfn creates address spaces for IO and memory. Bridge windows are aliases created by pci_bridge_init_alias. All they do is forward transactions to the secondary bus, so these address spaces represent secondary bus addressing. What did I miss? I was referring to *my* implementation of unassigned mem region, not to pci bridge's regions. The callback ALWAYS receive a pointer to the primary bus. What would be a better approach? 1. Remain with a single MemoryRegion and check if the address belongs to a bridge address range and recursively travel the bridges tree and update the registers 2. Model a MemoryRegion for each bridge that represents the address range behind the bridge (not existing today). I think this is exactly what address_space_XXX represent. Today I see (using info mtree) under the bridge: 1. A container memory region for all the space 0 - 2^64 2. Regions for the devices under the bridge I do not see a region corresponding with the bridge memory region that would be forwarded to the secondary bus. Thanks, Marcel Add a MemoryRegion child to catch accesses to unassigned addresses behind the bridge, then recursively travel the bridges to top and update the registers. This bit sounds right. +} + +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); + +return -1ULL; +} + +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val, +unsigned size) +{ +PCIBus *bus = opaque; +unassigned_mem_access(bus); +} + +static const MemoryRegionOps unassigned_mem_ops = { +.read = unassigned_mem_read, +.write = unassigned_mem_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +#define UNASSIGNED_MEM_PRIORITY -1 +
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On Mon, 2013-09-09 at 17:04 +0300, Michael S. Tsirkin wrote: On Mon, Sep 09, 2013 at 04:29:04PM +0300, Marcel Apfelbaum wrote: On Mon, 2013-09-09 at 14:19 +0100, Peter Maydell wrote: On 9 September 2013 14:15, Marcel Apfelbaum marce...@redhat.com wrote: On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote: Can you just pick the device which is (a subclass of) TYPE_PCI_HOST_BRIDGE, or do we have host bridges which aren't using that class? This is what I would really want to do, but some HOST Bridge devices inherit directly from PCI_DEVICE. TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which is a not a PCI device and does not help us here (not a PCI_DEVICE on the bus) Oops, yes, I get those two the wrong way round a lot. Anyway, if we need to make all host bridges have a common subclass we could certainly refactor them accordingly. Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE and it hold as composition a PCIDevice that will be part of the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly inherits from PCI_DEVICE. This may just be wrong choice of name rather than actually wrong hierarchy. I try not to judge the naming convention, so let's leave it aside for now. My issue is that we have at least 2 ways to model the bridges: 1. TYPE_PCI_HOST_BRIDGE * derives from TYPE_SYS_BUS_DEVICE * has a bus * one of the bus devices is a TYPE_I440FX_PCI_DEVICE which derives from TYPE_PCI_DEVICE 2. TYPE_PCIE_HOST_BRIDGE * derives from TYPE_PCI_HOST_BRIDGE which derives from TYPE_SYS_BUS_DEVICE * has a PciDevice and register it to the bus in order to work as (1) I would like to implement an hierarchy that will allow all the host bridge devices to have a common ancestor In this was, we can scan the PCI bus to look for master... I wouldn't object to is_host is stuct PCIDeviceClass. That's probably easier that trying to create a common class for devices that share no common code. This will work too, and less changes to the code. However Peter suggested that we can unify both upstream and downstream handling of the master abort by putting it all in this class. But if we have is_host flag, we can differentiate regular devices from host devices and handle them different. If there are no objections, I will chose this path. Thanks! Marcel -- PMM
Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses
On 9 September 2013 15:04, Marcel Apfelbaum marce...@redhat.com wrote: By the way, I am not sure that the upstream transactions (DMA) can actually end with a master abort. Master abort would happen if a transaction will not be claimed by any device on the bus. But in DMA transactions, maybe the host bridge will always claim it and return with error. Does anybody know something about this? No, it's perfectly possible for a bus master transaction to abort. The PC's host controller happens to be set up so that bus master DMA covers the whole of the PCI memory space and so it's probably not possible to get an abort on that platform, but this isn't necessarily the case. For instance the versatilePB's PCI controller only responds to accesses within its programmed MMIO BAR ranges, so if the device or the controller have been misconfigured you can get an abort when the device tries to do DMA. (This usually causes the device to decide something has gone seriously wrong. For instance an EHCI USB controller device will stop and set the STS_FATAL bit in its status register: http://lxr.free-electrons.com/source/include/linux/usb/ehci_def.h#L96 If we wanted to implement this correctly we would need to be able to return an access succeeded/failed indicator from dma_memory_write(): check hw/usb/hcd-ehci.c:get_dwords() (which already has the logic for whoops, DMA failed for the extremely simple case of no DMA address space). -- PMM