date:20130909

Re: [Qemu-devel] [PATCH v3 0/3] Make mixer emulation configurable at runtime

2013-09-09 Thread Gerd Hoffmann

On Sa, 2013-09-07 at 00:53 -0400, Bandan Das wrote:
 Currently, hda-codec mixer emulation can only be enabled by using the
 --enable-mixemu option to configure at compile time with the default
 value being off. These patches enable making mixer emulation
 selectable
 at runtime which is more convenient. Consequently, the last patch in
 this 
 series removes the mixemu configuration option.

Reviewed-by: Gerd Hoffmann kra...@redhat.com

Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration

2013-09-09 Thread Alexey Kardashevskiy

On 09/09/2013 03:50 PM, Alexander Graf wrote:
 
 
 Am 09.09.2013 um 04:40 schrieb Alexey Kardashevskiy a...@ozlabs.ru:
 
 On 09/06/2013 01:11 AM, Alexander Graf wrote:

 On 05.09.2013, at 16:26, Benjamin Herrenschmidt wrote:

 On Thu, 2013-09-05 at 16:14 +0200, Andreas Färber wrote:

 Are you thinking of POWER8 having a different frequency than POWER8 in
 compat mode? Because migration from one -cpu to another is not supported
 elsewhere.

 Even if we want to migrate from one POWER7 revision to another, we
 should let the destination use the revision of the source (guest ABI!),
 via property if need be. Anything else will lead to confusion as to what
 is supported and what is not. That -cpu host is the default for
 convenience shouldn't relieve admins/libvirt to think about sensible
 setups like they have to on x86.

 Besides POWER8 uses 512Mhz too :-) It's been architected so it's
 unlikely to change from now on.

 Ok, ditch the tb frequency then. We can always default to 512Mhz when it's 
 absent.


 QEMU uses what kvmppc_get_tbfreq() returns. And ppc_tb_t carries this
 value. It does not use 512MHz automatically but migration should always
 assume 512MHz :-/

 If we remove it, I would like to add assert(tb_freq!=512MHz) into
 timebase_pre_save() and timebase_post_load(), is that ok?
 
 Good point. This would break TCG migration, right?


In TCG we do not need any timebase offset at all, the time should migrate
as is, no? :)

It could break something like power5 migration under PR KVM, do not know...


-- 
Alexey

Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration

2013-09-09 Thread Alexander Graf



Am 09.09.2013 um 07:58 schrieb Alexey Kardashevskiy a...@ozlabs.ru:

 On 09/09/2013 03:50 PM, Alexander Graf wrote:
 
 
 Am 09.09.2013 um 04:40 schrieb Alexey Kardashevskiy a...@ozlabs.ru:
 
 On 09/06/2013 01:11 AM, Alexander Graf wrote:
 
 On 05.09.2013, at 16:26, Benjamin Herrenschmidt wrote:
 
 On Thu, 2013-09-05 at 16:14 +0200, Andreas Färber wrote:
 
 Are you thinking of POWER8 having a different frequency than POWER8 in
 compat mode? Because migration from one -cpu to another is not supported
 elsewhere.
 
 Even if we want to migrate from one POWER7 revision to another, we
 should let the destination use the revision of the source (guest ABI!),
 via property if need be. Anything else will lead to confusion as to what
 is supported and what is not. That -cpu host is the default for
 convenience shouldn't relieve admins/libvirt to think about sensible
 setups like they have to on x86.
 
 Besides POWER8 uses 512Mhz too :-) It's been architected so it's
 unlikely to change from now on.
 
 Ok, ditch the tb frequency then. We can always default to 512Mhz when it's 
 absent.
 
 
 QEMU uses what kvmppc_get_tbfreq() returns. And ppc_tb_t carries this
 value. It does not use 512MHz automatically but migration should always
 assume 512MHz :-/
 
 If we remove it, I would like to add assert(tb_freq!=512MHz) into
 timebase_pre_save() and timebase_post_load(), is that ok?
 
 Good point. This would break TCG migration, right?
 
 
 In TCG we do not need any timebase offset at all, the time should migrate
 as is, no? :)

I hope so, but we need to make sure we don't assert() in that case :).

 
 It could break something like power5 migration under PR KVM, do not know...

Well, today we don't support full migration with PR KVM yet - IIRC we don't 
have access to all state. But that may change in the future.

I think it's ok to restrict live migration to machines with the same tb 
frequency when kvm is enabled. Whether you implement it through a hardcoded 
512Mhz or through a timebase value that gets live migrated and then compared is 
up to you - both ways work for me :).

Alex

 
 
 -- 
 Alexey

Re: [Qemu-devel] [PATCH V12 00/13] Add support for binding guest numa nodes to host numa nodes

2013-09-09 Thread Wanlong Gao

Hi folks,

Any comments? ;-P

Wanlong Gao

 As you know, QEMU can't direct it's memory allocation now, this may cause
 guest cross node access performance regression.
 And, the worse thing is that if PCI-passthrough is used,
 direct-attached-device uses DMA transfer between device and qemu process.
 All pages of the guest will be pinned by get_user_pages().
 
 KVM_ASSIGN_PCI_DEVICE ioctl
   kvm_vm_ioctl_assign_device()
 =kvm_assign_device()
   = kvm_iommu_map_memslots()
 = kvm_iommu_map_pages()
= kvm_pin_pages()
 
 So, with direct-attached-device, all guest page's page count will be +1 and
 any page migration will not work. AutoNUMA won't too.
 
 So, we should set the guest nodes memory allocation policy before
 the pages are really mapped.
 
 According to this patch set, we are able to set guest nodes memory policy
 like following:
 
  -numa node,nodeid=0,cpus=0, \
  -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
  -numa node,nodeid=1,cpus=1 \
  -numa mem,size=1024M,policy=interleave,host-nodes=1
 
 This supports 
 policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N 
 like format.
 
 
 Also add set-mem-policy QMP and hmp command to set memory policy.
 
 And add a QMP command query-numa to show numa info through
 this API.
 
 And convert the info numa monitor command to use this
 QMP command query-numa.
 
 
 V1-V2:
 change to use QemuOpts in numa options (Paolo)
 handle Error in mpol parser (Paolo)
 change qmp command format to mem-policy=membind,mem-hostnode=0-1 like 
 (Paolo)
 V2-V3:
 also handle Error in cpus parser (5/10)
 split out common parser from cpus and hostnode parser (Bandan 6/10)
 V3-V4:
 rebase to request for comments
 V4-V5:
 use OptVisitor and split -numa option (Paolo)
  - s/set-mpol/set-mem-policy (Andreas)
  - s/mem-policy/policy
  - s/mem-hostnode/host-nodes
 fix hmp command process after error (Luiz)
 add qmp command query-numa and convert info numa to it (Luiz)
 V5-V6:
 remove tabs in json file (Laszlo, Paolo)
 add back -numa node,mem=xxx as legacy (Paolo)
 change cpus and host-nodes to array (Laszlo, Eric)
 change nodeid to uint16
 add NumaMemPolicy enum type (Eric)
 rebased on Laszlo's OptsVisitor: support / flatten integer ranges for 
 repeating options patch set, thanks for Laszlo's help
 V6-V7:
 change UInt16 to uint16 (Laszlo)
 fix a typo in adding qmp command set-mem-policy
 V7-V8:
 rebase to current master with Laszlo's V2 of OptsVisitor patch set
 fix an adding white space line error
 V8-V9:
 rebase to current master
 check if total numa memory size is equal to ram_size (Paolo)
 add comments to the OptsVisitor stuff in qapi-schema.json (Eric, Laszlo)
 replace the use of numa_num_configured_nodes() (Andrew)
 avoid abusing the fact i==nodeid (Andrew)
 V9-V10:
 rebase to current master
 remove libnuma (Andrew)
 MAX_NODES=64 - MAX_NODES=128 since libnuma selected 128 (Andrew)
 use MAX_NODES instead of MAX_CPUMASK_BITS for host_mem bitmap (Andrew)
 remove a useless clear_bit() operation (Andrew)
 V10-V11:
 rebase to current master
 fix maxnode argument of mbind(2)
 V11-V12:
 rebase to current master
 split patch 02/11 of V11 (Eduardo)
 add some max value check (Eduardo)
 split MAX_NODES change patch (Eduardo)
 
 
 *I hope this can catch up the train of 1.7.*
 
 Thanks,
 Wanlong Gao
 
 Wanlong Gao (13):
   NUMA: move numa related code to new file numa.c
   NUMA: check if the total numa memory size is equal to ram_size
   NUMA: Add numa_info structure to contain numa nodes info
   NUMA: convert -numa option to use OptsVisitor
   NUMA: introduce NumaMemOptions
   NUMA: add -numa mem, options
   NUMA: expand MAX_NODES from 64 to 128
   NUMA: parse guest numa nodes memory policy
   NUMA: set guest numa nodes memory policy
   NUMA: add qmp command set-mem-policy to set memory policy for NUMA
 node
   NUMA: add hmp command set-mem-policy
   NUMA: add qmp command query-numa
   NUMA: convert hmp command info_numa to use qmp command query_numa
 
  Makefile.target |   2 +-
  cpus.c  |  14 --
  hmp-commands.hx |  16 ++
  hmp.c   | 119 +
  hmp.h   |   2 +
  hw/i386/pc.c|   4 +-
  include/sysemu/cpus.h   |   1 -
  include/sysemu/sysemu.h |  18 +-
  monitor.c   |  21 +--
  numa.c  | 460 
 
  qapi-schema.json| 133 ++
  qemu-options.hx |   6 +-
  qmp-commands.hx |  90 ++
  vl.c| 160 ++---
  14 files changed, 861 insertions(+), 185 deletions(-)
  create mode 100644 numa.c

Re: [Qemu-devel] [PATCH 1/2] linux-headers: update for s390 floating interrupt controller

2013-09-09 Thread Jens Freimann

On Fri, Sep 06, 2013 at 01:23:32PM +0100, Peter Maydell wrote:
 On 6 September 2013 13:19, Jens Freimann jf...@linux.vnet.ibm.com wrote:
  Add symbols required for the s390 floating interrupt controller (flic)
 
 Updates to linux-headers should be the result of a sync
 against a specified mainline kernel revision, please
 (otherwise this should be an RFC patchset).

ok, I understand.  I was not sure about that and only added a remark in 
the cover letter

regards
Jens
 
 thanks
 -- PMM

Re: [Qemu-devel] [PATCH 1/2] linux-headers: update for s390 floating interrupt controller

2013-09-09 Thread Jens Freimann

On Fri, Sep 06, 2013 at 01:32:52PM +0100, Peter Maydell wrote:
 On 6 September 2013 13:19, Jens Freimann jf...@linux.vnet.ibm.com wrote:
  @@ -839,6 +903,7 @@ struct kvm_device_attr {
   #define KVM_DEV_TYPE_FSL_MPIC_20   1
   #define KVM_DEV_TYPE_FSL_MPIC_42   2
   #define KVM_DEV_TYPE_XICS  3
  +#define KVM_DEV_TYPE_FLIC  4
 
 Christoffer's patchset switching the ARM VGIC to this
 list also uses 4 as its enumeration value:
 
 https://lists.cs.columbia.edu/pipermail/kvmarm/2013-August/006822.html
 
 That patchset isn't in yet, but maybe you should use
 5 to avoid conflicts?

sure, will do

regards
Jens

 
 thanks
 -- PMM

Re: [Qemu-devel] [PATCH V3 1/2] qemu: Adjust qemu wakeup

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 05:26, Liu, Jinsong ha scritto:
 From 6f40a66521e012170493964a2135fb3b4ae7c9b2 Mon Sep 17 00:00:00 2001
 From: Liu Jinsong jinsong@intel.com
 Date: Sun, 8 Sep 2013 00:33:19 +0800
 Subject: [PATCH V3 1/2] qemu: Adjust qemu wakeup
 
 Currently Xen hvm s3 has a bug coming from the difference between
 qemu-traditioanl and qemu-xen. For qemu-traditional, the way to
 resume from hvm s3 is via 'xl trigger' command. However, for
 qemu-xen, the way to resume from hvm s3 inherited from standard
 qemu, i.e. via QMP, and it doesn't work under Xen.
 
 The root cause is, for qemu-xen, 'xl trigger' command didn't reset
 devices, while QMP didn't unpause hvm domain though they did qemu
 system reset.
 
 We have two qemu patches and one xl patch to fix Xen hvm s3 bug.
 This patch is the qemu patch 1. It adjusts qemu wakeup so that
 Xen s3 resume logic (which will be implemented at qemu patch 2)
 will be notified after qemu system reset.
 
 Signed-off-by: Liu Jinsong jinsong@intel.com
 ---
  hw/acpi/core.c  |3 ++-
  include/sysemu/sysemu.h |4 +++-
  vl.c|   15 +++
  3 files changed, 12 insertions(+), 10 deletions(-)
 
 diff --git a/hw/acpi/core.c b/hw/acpi/core.c
 index 7467b88..7138139 100644
 --- a/hw/acpi/core.c
 +++ b/hw/acpi/core.c
 @@ -324,12 +324,13 @@ static void acpi_notify_wakeup(Notifier *notifier, void 
 *data)
  (ACPI_BITMASK_WAKE_STATUS | ACPI_BITMASK_TIMER_STATUS);
  break;
  case QEMU_WAKEUP_REASON_OTHER:
 -default:
  /* ACPI_BITMASK_WAKE_STATUS should be set on resume.
 Pretend that resume was caused by power button */
  ar-pm1.evt.sts |=
  (ACPI_BITMASK_WAKE_STATUS | ACPI_BITMASK_POWER_BUTTON_STATUS);
  break;
 +default:
 +break;
  }
  }
  
 diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
 index b1aa059..50bc44d 100644
 --- a/include/sysemu/sysemu.h
 +++ b/include/sysemu/sysemu.h
 @@ -39,9 +39,11 @@ int vm_stop(RunState state);
  int vm_stop_force_state(RunState state);
  
  typedef enum WakeupReason {
 -QEMU_WAKEUP_REASON_OTHER = 0,
 +/* Always keep QEMU_WAKEUP_REASON_NONE = 0 */
 +QEMU_WAKEUP_REASON_NONE = 0,
  QEMU_WAKEUP_REASON_RTC,
  QEMU_WAKEUP_REASON_PMTIMER,
 +QEMU_WAKEUP_REASON_OTHER,
  } WakeupReason;
  
  void qemu_system_reset_request(void);
 diff --git a/vl.c b/vl.c
 index b4b119a..8c5f113 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -1792,14 +1792,14 @@ static pid_t shutdown_pid;
  static int powerdown_requested;
  static int debug_requested;
  static int suspend_requested;
 -static int wakeup_requested;
 +static WakeupReason wakeup_reason;
  static NotifierList powerdown_notifiers =
  NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
  static NotifierList suspend_notifiers =
  NOTIFIER_LIST_INITIALIZER(suspend_notifiers);
  static NotifierList wakeup_notifiers =
  NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
 -static uint32_t wakeup_reason_mask = ~0;
 +static uint32_t wakeup_reason_mask = ~(1  QEMU_WAKEUP_REASON_NONE);
  static RunState vmstop_requested = RUN_STATE_MAX;
  
  int qemu_shutdown_requested_get(void)
 @@ -1849,11 +1849,9 @@ static int qemu_suspend_requested(void)
  return r;
  }
  
 -static int qemu_wakeup_requested(void)
 +static WakeupReason qemu_wakeup_requested(void)
  {
 -int r = wakeup_requested;
 -wakeup_requested = 0;
 -return r;
 +return wakeup_reason;
  }
  
  static int qemu_powerdown_requested(void)
 @@ -1970,8 +1968,7 @@ void qemu_system_wakeup_request(WakeupReason reason)
  return;
  }
  runstate_set(RUN_STATE_RUNNING);
 -notifier_list_notify(wakeup_notifiers, reason);
 -wakeup_requested = 1;
 +wakeup_reason = reason;
  qemu_notify_event();
  }
  
 @@ -2063,6 +2060,8 @@ static bool main_loop_should_exit(void)
  pause_all_vcpus();
  cpu_synchronize_all_states();
  qemu_system_reset(VMRESET_SILENT);
 +notifier_list_notify(wakeup_notifiers, wakeup_reason);
 +wakeup_reason = QEMU_WAKEUP_REASON_NONE;
  resume_all_vcpus();
  monitor_protocol_event(QEVENT_WAKEUP, NULL);
  }
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements

2013-09-09 Thread Claudio Fontana

Hello Richard,

On 02.09.2013 19:54, Richard Henderson wrote:
 I'm not sure if I posted v2 or not, but my branch is named -3,
 therefore this is v3.  ;-)
 
 The jumbo fixme patch from v1 has been split up.  This has been
 updated for the changes in the tlb helpers over the past few weeks.
 For the benefit of trivial conflict resolution, it's relative to a
 tree that contains basically all of my patches.
 
 See git://github.com/rth7680/qemu.git tcg-aarch-3 for the tree, if
 you find yourself missing any of the dependencies.
 
 
 r~
 Richard Henderson (29):
   tcg-aarch64: Set ext based on TCG_OPF_64BIT
   tcg-aarch64: Change all ext variables to bool
   tcg-aarch64: Don't handle mov/movi in tcg_out_op
   tcg-aarch64: Hoist common argument loads in tcg_out_op
   tcg-aarch64: Change enum aarch64_arith_opc to AArch64Insn
   tcg-aarch64: Merge enum aarch64_srr_opc with AArch64Insn
   tcg-aarch64: Introduce tcg_fmt_* functions
   tcg-aarch64: Introduce tcg_fmt_Rdn_aimm
   tcg-aarch64: Implement mov with tcg_fmt_* functions
   tcg-aarch64: Handle constant operands to add, sub, and compare
   tcg-aarch64: Handle constant operands to and, or, xor
   tcg-aarch64: Support andc, orc, eqv, not
   tcg-aarch64: Handle zero as first argument to sub
   tcg-aarch64: Support movcond
   tcg-aarch64: Support deposit
   tcg-aarch64: Support add2, sub2
   tcg-aarch64: Support muluh, mulsh
   tcg-aarch64: Support div, rem
   tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s
   tcg-aarch64: Improve tcg_out_movi
   tcg-aarch64: Avoid add with zero in tlb load
   tcg-aarch64: Use adrp in tcg_out_movi
   tcg-aarch64: Pass return address to load/store helpers directly.
   tcg-aarch64: Use tcg_out_call for qemu_ld/st
   tcg-aarch64: Use symbolic names for branches
   tcg-aarch64: Implement tcg_register_jit
   tcg-aarch64: Reuse FP and LR in translated code
   tcg-aarch64: Introduce tcg_out_ldst_pair
   tcg-aarch64: Remove redundant CPU_TLB_ENTRY_BITS check
 
  include/exec/exec-all.h  |   18 -
  tcg/aarch64/tcg-target.c | 1276 
 ++
  tcg/aarch64/tcg-target.h |   76 +--
  3 files changed, 867 insertions(+), 503 deletions(-)
 

after carefully reading and testing your patches, this is how I suggest to 
proceed: 

first do the implementation of the new functionality (tcg opcodes, jit) in a 
way that is consistent with the existing code.
No type changes, no refactoring, no beautification.

Once we agree on those, introduce the meaningful restructuring you want to do,
like the new INSN type, the don't handle mov/movi in tcg_out_op, the 
TCG_OPF_64BIT thing, etc.

Last do the cosmetic stuff if you really want to do it, like the change all ext 
to bool (note that there is no point if the callers still use 1 and 0: 
adapt them as well) etc.

Right now the patchset is difficult to digest, given the regressions it 
introduces coupled with a mixing of functional changes, restructuring and 
cosmetics.

I think this will allow us to proceed quicker towards agreement.

Thanks,

Claudio

[Qemu-devel] [PATCH] ehci: Fix crash with isoc usb packets

2013-09-09 Thread Hans de Goede

The isoc packet path in the ehci code has a bad qobject cast, causing an
abort, this patch fixes this.

Note this problem is backported in 1.6.0 too, and this patch should be
backported to the 1.6.0 stable tree.

Signed-off-by: Hans de Goede hdego...@redhat.com
---
 hw/usb/hcd-ehci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 010a0d0..77c4872 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1486,7 +1486,8 @@ static int ehci_process_itd(EHCIState *ehci,
 return -1;
 }
 
-qemu_sglist_init(ehci-isgl, DEVICE(ehci), 2, ehci-as);
+qemu_sglist_init(ehci-isgl, BUS(ehci-bus)-parent,
+ 2, ehci-as);
 if (off + len  4096) {
 /* transfer crosses page border */
 uint32_t len2 = off + len - 4096;
-- 
1.8.3.1

[Qemu-devel] [PATCH] ehci: save device pointer in EHCIState

2013-09-09 Thread Gerd Hoffmann

We'll need a pointer to the actual pci/sysbus device,
stick a pointer to it into the EHCIState struct.

https://bugzilla.redhat.com/show_bug.cgi?id=1005495

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/usb/hcd-ehci.c | 3 ++-
 hw/usb/hcd-ehci.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 137e200..162680c 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1486,7 +1486,7 @@ static int ehci_process_itd(EHCIState *ehci,
 return -1;
 }
 
-qemu_sglist_init(ehci-isgl, DEVICE(ehci), 2, ehci-as);
+qemu_sglist_init(ehci-isgl, ehci-device, 2, ehci-as);
 if (off + len  4096) {
 /* transfer crosses page border */
 uint32_t len2 = off + len - 4096;
@@ -2529,6 +2529,7 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, 
Error **errp)
 
 s-frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ehci_frame_timer, s);
 s-async_bh = qemu_bh_new(ehci_frame_timer, s);
+s-device = dev;
 
 qemu_register_reset(ehci_reset, s);
 qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s);
diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h
index 15a28e8..065c9fa 100644
--- a/hw/usb/hcd-ehci.h
+++ b/hw/usb/hcd-ehci.h
@@ -255,6 +255,7 @@ typedef QTAILQ_HEAD(EHCIQueueHead, EHCIQueue) EHCIQueueHead;
 
 struct EHCIState {
 USBBus bus;
+DeviceState *device;
 qemu_irq irq;
 MemoryRegion mem;
 AddressSpace *as;
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Paolo Bonzini

Il 08/09/2013 13:40, Gleb Natapov ha scritto:
 On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
 On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
 and not restore anything.

 XRSTOR restores FP/SSE state to reset state if no bits are set in
 xstate_bv. This is what should happen on reset, no?

Yes. The problem happens on the migration destination when XSAVE data is
not transmitted.  FP/SSE data is transmitted and must be restored, but
xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
state.  The vcpu then loses the values that were set in the migration data.

 Since FP and SSE data are always valid, set them in xstate_bv at reset
 time.  In fact, that value is the same that KVM_GET_XSAVE returns on
 pre-XSAVE hosts.
 It is needed for migration between non xsave host to xsave host.

Yes, and this patch does the same for migration between non-XSAVE QEMU
and XSAVE QEMU.

In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
xstate_bv when XSAVE is not available.  Instead, it should reset the
FXSAVE data to processor-reset values (except for MXCSR which always
comes from XRSTOR data), i.e. to all-zeros except for the x87 control
and tag words.  It should also check reserved bits of MXCSR.


 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  target-i386/cpu.c | 1 +
  target-i386/cpu.h | 5 +
  2 files changed, 6 insertions(+)

 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index c36345e..ac83106 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -2386,6 +2386,7 @@ static void x86_cpu_reset(CPUState *s)
  env-fpuc = 0x37f;
  
  env-mxcsr = 0x1f80;
 +env-xstate_bv = XSTATE_FP | XSTATE_SSE;
  
  env-pat = 0x0007040600070406ULL;
  env-msr_ia32_misc_enable = MSR_IA32_MISC_ENABLE_DEFAULT;
 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index 5723eff..a153078 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -380,6 +380,11 @@
  
  #define MSR_VM_HSAVE_PA 0xc0010117
  
 +#define XSTATE_SUPPORTED(XSTATE_FP|XSTATE_SSE|XSTATE_YMM)
 Supported by whom? By QEMU? We should filer unsupported bits from CPUID.0D 
 then too.

Yes.  QEMU unmarshals information from the XSAVE region and back, so it
cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, though.

Paolo

 
 +#define XSTATE_FP   1
 +#define XSTATE_SSE  2
 +#define XSTATE_YMM  4
 +
  /* CPUID feature words */
  typedef enum FeatureWord {
  FEAT_1_EDX, /* CPUID[1].EDX */
 -- 
 1.8.3.1

 
 --
   Gleb.
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] savevm too slow

2013-09-09 Thread Kevin Wolf

Am 09.09.2013 um 03:57 hat xuanmao_001 geschrieben:
  the other question: when I change the buffer size #define IO_BUF_SIZE 32768
  to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
  
  Is this for cache=unsafe as well?
  
  Juan, any specific reason for using 32k? I think it would be better to
  have a multiple of the qcow2 cluster size, otherwise we get COW for the
  empty part of newly allocated clusters. If we can't make it dynamic,
  using at least fixed 64k to match the qcow2 default would probably
  improve things a bit.
  
 with cache=writeback.  Is there any risk for setting cache=writeback with
 IO_BUF_SIZE 1M ?

No. Using a larger buffer size should be safe.

Kevin

 ━━━
 xuanmao_001
  
 From: Kevin Wolf
 Date: 2013-09-06 18:38
 To: xuanmao_001
 CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz
 Subject: Re: savevm too slow
 Am 06.09.2013 um 03:31 hat xuanmao_001 geschrieben:
  Hi, qemuers:
   
 
  I found that the guest disk file cache mode will affect to the time of 
 savevm.
   
  the cache 'writeback' too slow. but the cache 'unsafe' is as fast as it can,
  less than 10 seconds.
   
  here is the example I use virsh:
  @cache with writeback:
  #the first snapshot
  real0m21.904s
  user0m0.006s
  sys 0m0.008s
   
  #the secondary snapshot
  real2m11.624s
  user0m0.013s
  sys 0m0.008s
   
  @cache with unsafe:
  #the first snapshot
  real0m0.730s
  user0m0.006s
  sys 0m0.005s
   
  #the secondary snapshot
  real0m1.296s
  user0m0.002s
  sys 0m0.008s
  
 I sent patches that should eliminate the difference between the first
 and second snapshot at least.
  
  so, what the difference between them when using different cache.
  
 cache=unsafe ignores any flush requests. It's possible that there is
 potential for optimisation with cache=writeback, i.e. it sends flush
 requests that aren't necessary in fact. This is something that I haven't
 checked yet.
  
  the other question: when I change the buffer size #define IO_BUF_SIZE 32768
  to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
  
 Is this for cache=unsafe as well?
  
 Juan, any specific reason for using 32k? I think it would be better to
 have a multiple of the qcow2 cluster size, otherwise we get COW for the
 empty part of newly allocated clusters. If we can't make it dynamic,
 using at least fixed 64k to match the qcow2 default would probably
 improve things a bit.
  
 Kevin

Re: [Qemu-devel] savevm too slow

2013-09-09 Thread xuanmao_001

 I sent patches that should eliminate the difference between the first
 and second snapshot at least.

where I can find the patches that can eliminate the difference between the first
and second snapshot ? Does they fit qemu-kvm-1.0,1 ?




xuanmao_001

From: Kevin Wolf
Date: 2013-09-09 16:35
To: xuanmao_001
CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz
Subject: Re: Re: savevm too slow
Am 09.09.2013 um 03:57 hat xuanmao_001 geschrieben:
  the other question: when I change the buffer size #define IO_BUF_SIZE 32768
  to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
  
  Is this for cache=unsafe as well?
  
  Juan, any specific reason for using 32k? I think it would be better to
  have a multiple of the qcow2 cluster size, otherwise we get COW for the
  empty part of newly allocated clusters. If we can't make it dynamic,
  using at least fixed 64k to match the qcow2 default would probably
  improve things a bit.
  
 with cache=writeback.  Is there any risk for setting cache=writeback with
 IO_BUF_SIZE 1M ?

No. Using a larger buffer size should be safe.

Kevin

 ━━━
 xuanmao_001
  
 From: Kevin Wolf
 Date: 2013-09-06 18:38
 To: xuanmao_001
 CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz
 Subject: Re: savevm too slow
 Am 06.09.2013 um 03:31 hat xuanmao_001 geschrieben:
  Hi, qemuers:
   
 
  I found that the guest disk file cache mode will affect to the time of 
 savevm.
   
  the cache 'writeback' too slow. but the cache 'unsafe' is as fast as it can,
  less than 10 seconds.
   
  here is the example I use virsh:
  @cache with writeback:
  #the first snapshot
  real0m21.904s
  user0m0.006s
  sys 0m0.008s
   
  #the secondary snapshot
  real2m11.624s
  user0m0.013s
  sys 0m0.008s
   
  @cache with unsafe:
  #the first snapshot
  real0m0.730s
  user0m0.006s
  sys 0m0.005s
   
  #the secondary snapshot
  real0m1.296s
  user0m0.002s
  sys 0m0.008s
  
 I sent patches that should eliminate the difference between the first
 and second snapshot at least.
  
  so, what the difference between them when using different cache.
  
 cache=unsafe ignores any flush requests. It's possible that there is
 potential for optimisation with cache=writeback, i.e. it sends flush
 requests that aren't necessary in fact. This is something that I haven't
 checked yet.
  
  the other question: when I change the buffer size #define IO_BUF_SIZE 32768
  to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
  
 Is this for cache=unsafe as well?
  
 Juan, any specific reason for using 32k? I think it would be better to
 have a multiple of the qcow2 cluster size, otherwise we get COW for the
 empty part of newly allocated clusters. If we can't make it dynamic,
 using at least fixed 64k to match the qcow2 default would probably
 improve things a bit.
  
 Kevin

Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust

2013-09-09 Thread Paolo Bonzini

Il 08/09/2013 13:52, Gleb Natapov ha scritto:
 On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote:
 QEMU moves state from CPUArchState to struct kvm_xsave and back when it
 invokes the KVM_*_XSAVE ioctls.  Because it doesn't treat the XSAVE
 region as an opaque blob, it might be impossible to set some state on
 the destination if migrating to an older version.

 This patch blocks migration if it finds that unsupported bits are set
 in the XSTATE_BV header field.  To make this work robustly, QEMU should
 only report in env-xstate_bv those fields that will actually end up
 in the migration stream.
 
 We usually handle host cpu differences in cpuid layer, not by trying to
 validate migration data.

Actually we do both.  QEMU for example detects invalid subsections and
blocks migration, and CPU differences also result in subsections that
the destination does not know.

But as far as QEMU is concerned, setting an unknown bit in XSTATE_BV is
not a CPU difference, it is simply invalid migration data.

 i.e CPUID.0D should be configurable and
 management should be able to query QEMU what is supported and prevent
 migration attempt accordingly.

Management is already able to query QEMU of what is supported, because
new XSAVE state is always attached to new CPUID bits in leaves other
than 0Dh (e.g. EAX=07h, ECX=0h returns AVX512 and MPX support in EBX).
QEMU should compute 0Dh data based on those bits indeed.

However, KVM_GET/SET_XSAVE should still return all values supported by
the hypervisor, independent of the supported CPUID bits.


 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  target-i386/kvm.c | 3 ++-
  target-i386/machine.c | 4 
  2 files changed, 6 insertions(+), 1 deletion(-)

 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 749aa09..df08a4b 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -1291,7 +1291,8 @@ static int kvm_get_xsave(X86CPU *cpu)
  sizeof env-fpregs);
  memcpy(env-xmm_regs, xsave-region[XSAVE_XMM_SPACE],
  sizeof env-xmm_regs);
 -env-xstate_bv = *(uint64_t *)xsave-region[XSAVE_XSTATE_BV];
 +env-xstate_bv = *(uint64_t *)xsave-region[XSAVE_XSTATE_BV] 
 +XSTATE_SUPPORTED;
 Don't we just drop state here that will not be restored on the
 destination and destination will not be able to tell since we masked
 unsupported bits?

A well-behaved guest should not have modified that state anyway, since:

* the source and destination machines should have the same CPU

* since the destination QEMU does not support the feature, the source
should have masked it as well

* the guest should always probe CPUID before using a feature

There will be only one change for well-behaved guests with this patch
(and the change will be invisible to them since they behave well).
After the patch, KVM_SET_XSAVE will set the extended states to the
processor-reset state instead of all-zeros.  However, all
currently-defined states have a processor-reset state that is equal to
all-zeroes, so this change is theoretical.

In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here,
and we should hide all features that are not visible in CPUID.  It is
okay, however, to test it in cpu_post_load.

Paolo

  memcpy(env-ymmh_regs, xsave-region[XSAVE_YMMH_SPACE],
  sizeof env-ymmh_regs);
  return 0;
 diff --git a/target-i386/machine.c b/target-i386/machine.c
 index dc81cde..9e2cfcf 100644
 --- a/target-i386/machine.c
 +++ b/target-i386/machine.c
 @@ -278,6 +278,10 @@ static int cpu_post_load(void *opaque, int version_id)
  CPUX86State *env = cpu-env;
  int i;
  
 +if (env-xstate_bv  ~XSTATE_SUPPORTED) {
 +return -EINVAL;
 +}
 + 
  /*
   * Real mode guest segments register DPL should be zero.
   * Older KVM version were setting it wrongly.
 -- 
 1.8.3.1
 
 --
   Gleb.
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote:
 Il 08/09/2013 13:40, Gleb Natapov ha scritto:
  On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
  On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
  and not restore anything.
 
  XRSTOR restores FP/SSE state to reset state if no bits are set in
  xstate_bv. This is what should happen on reset, no?
 
 Yes. The problem happens on the migration destination when XSAVE data is
 not transmitted.  FP/SSE data is transmitted and must be restored, but
 xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
 state.  The vcpu then loses the values that were set in the migration data.
 
  Since FP and SSE data are always valid, set them in xstate_bv at reset
  time.  In fact, that value is the same that KVM_GET_XSAVE returns on
  pre-XSAVE hosts.
  It is needed for migration between non xsave host to xsave host.
 
 Yes, and this patch does the same for migration between non-XSAVE QEMU
 and XSAVE QEMU.
 
Can such migration happen? The commit that added xsave support
(f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id.

 In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
 xstate_bv when XSAVE is not available.  Instead, it should reset the
 FXSAVE data to processor-reset values (except for MXCSR which always
 comes from XRSTOR data), i.e. to all-zeros except for the x87 control
 and tag words.  It should also check reserved bits of MXCSR.
 
I do not see why.

 
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
  ---
   target-i386/cpu.c | 1 +
   target-i386/cpu.h | 5 +
   2 files changed, 6 insertions(+)
 
  diff --git a/target-i386/cpu.c b/target-i386/cpu.c
  index c36345e..ac83106 100644
  --- a/target-i386/cpu.c
  +++ b/target-i386/cpu.c
  @@ -2386,6 +2386,7 @@ static void x86_cpu_reset(CPUState *s)
   env-fpuc = 0x37f;
   
   env-mxcsr = 0x1f80;
  +env-xstate_bv = XSTATE_FP | XSTATE_SSE;
   
   env-pat = 0x0007040600070406ULL;
   env-msr_ia32_misc_enable = MSR_IA32_MISC_ENABLE_DEFAULT;
  diff --git a/target-i386/cpu.h b/target-i386/cpu.h
  index 5723eff..a153078 100644
  --- a/target-i386/cpu.h
  +++ b/target-i386/cpu.h
  @@ -380,6 +380,11 @@
   
   #define MSR_VM_HSAVE_PA 0xc0010117
   
  +#define XSTATE_SUPPORTED  (XSTATE_FP|XSTATE_SSE|XSTATE_YMM)
  Supported by whom? By QEMU? We should filer unsupported bits from CPUID.0D 
  then too.
 
 Yes.  QEMU unmarshals information from the XSAVE region and back, so it
 cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, though.
 
IMO this is the main issue here, not separate bug. If we gonna let guest
use CPU state QEMU does not support we gonna have a bad time.

--
Gleb.

Re: [Qemu-devel] [PATCH] ehci: save device pointer in EHCIState

2013-09-09 Thread Hans de Goede


Hi,

On 09/09/2013 10:20 AM, Gerd Hoffmann wrote:

We'll need a pointer to the actual pci/sysbus device,
stick a pointer to it into the EHCIState struct.

https://bugzilla.redhat.com/show_bug.cgi?id=1005495


Looks like we've been working on exactly the same bug at the
same time, but we've come up with slightly different solutions.

If we're going to go this way (which seems the best), you
should also modify the qemu_sglist_init call in
ehci_init_transfer for consistency, and drop the bus and qbus
variable declarations at the top of the functions as those will
be unused then.

Regards,

Hans

Re: [Qemu-devel] savevm too slow

2013-09-09 Thread Kevin Wolf

Am 09.09.2013 um 10:47 hat xuanmao_001 geschrieben:
  I sent patches that should eliminate the difference between the first
  and second snapshot at least.
  
 where I can find the patches that can
 eliminate the difference between the first
 and second snapshot ? Does they fit qemu-kvm-1.0,1 ?

I sent them to you on Friday, the first email has the following subject
line:

[PATCH 0/2] qcow2: Discard VM state in active L1 after creating snapshot

This patch series is for current git master, and chances are that it
would work for qemu 1.6 as well. It will most likely not apply for qemu
1.0, which is almost two years old.

Kevin

 ━━━
 xuanmao_001
  
 From: Kevin Wolf
 Date: 2013-09-09 16:35
 To: xuanmao_001
 CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz
 Subject: Re: Re: savevm too slow
 Am 09.09.2013 um 03:57 hat xuanmao_001 geschrieben:
   the other question: when I change the buffer size #
 define IO_BUF_SIZE 32768
   to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
   
   Is this for cache=unsafe as well?
   
   Juan, any specific reason for using 32k? I think it would be better to
   have a multiple of the qcow2 cluster size, otherwise we get COW for the
   empty part of newly allocated clusters. If we can't make it dynamic,
   using at least fixed 64k to match the qcow2 default would probably
   improve things a bit.
   
  with cache=writeback.  Is there any risk for setting cache=writeback with
  IO_BUF_SIZE 1M ?
  
 No. Using a larger buffer size should be safe.
  
 Kevin
  
 
  
 ━━━
  xuanmao_001
   
  From: Kevin Wolf
  Date: 2013-09-06 18:38
  To: xuanmao_001
  CC: qemu-discuss; qemu-devel; quintela; stefanha; mreitz
  Subject: Re: savevm too slow
  Am 06.09.2013 um 03:31 hat xuanmao_001 geschrieben:
   Hi, qemuers:

  
 
   I found that the guest disk file cache mode will affect to the time of 
 savevm.

  
  the cache 'writeback' too slow. but the cache 'unsafe' is as fast as it can,
   less than 10 seconds.

   here is the example I use virsh:
   @cache with writeback:
   #the first snapshot
   real0m21.904s
   user0m0.006s
   sys 0m0.008s

   #the secondary snapshot
   real2m11.624s
   user0m0.013s
   sys 0m0.008s

   @cache with unsafe:
   #the first snapshot
   real0m0.730s
   user0m0.006s
   sys 0m0.005s

   #the secondary snapshot
   real0m1.296s
   user0m0.002s
   sys 0m0.008s
   
  I sent patches that should eliminate the difference between the first
  and second snapshot at least.
   
   so, what the difference between them when using different cache.
   
  cache=unsafe ignores any flush requests. It's possible that there is
  potential for optimisation with cache=writeback, i.e. it sends flush
  requests that aren't necessary in fact. This is something that I haven't
  checked yet.
   
   the other question: when I change the buffer size #define IO_BUF_SIZE 
   32768
   to #define IO_BUF_SIZE (1 * 1024 * 1024), the savevm is more quickly.
   
  Is this for cache=unsafe as well?
   
  Juan, any specific reason for using 32k? I think it would be better to
  have a multiple of the qcow2 cluster size, otherwise we get COW for the
  empty part of newly allocated clusters. If we can't make it dynamic,
  using at least fixed 64k to match the qcow2 default would probably
  improve things a bit.
   
  Kevin

Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 10:51:58AM +0200, Paolo Bonzini wrote:
 Il 08/09/2013 13:52, Gleb Natapov ha scritto:
  On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote:
  QEMU moves state from CPUArchState to struct kvm_xsave and back when it
  invokes the KVM_*_XSAVE ioctls.  Because it doesn't treat the XSAVE
  region as an opaque blob, it might be impossible to set some state on
  the destination if migrating to an older version.
 
  This patch blocks migration if it finds that unsupported bits are set
  in the XSTATE_BV header field.  To make this work robustly, QEMU should
  only report in env-xstate_bv those fields that will actually end up
  in the migration stream.
  
  We usually handle host cpu differences in cpuid layer, not by trying to
  validate migration data.
 
 Actually we do both.  QEMU for example detects invalid subsections and
 blocks migration, and CPU differences also result in subsections that
 the destination does not know.
 
That's different from what you do here though. If xstate_bv was in its
separate subsection things would be easier, but it is not.

 But as far as QEMU is concerned, setting an unknown bit in XSTATE_BV is
 not a CPU difference, it is simply invalid migration data.
 
  i.e CPUID.0D should be configurable and
  management should be able to query QEMU what is supported and prevent
  migration attempt accordingly.
 
 Management is already able to query QEMU of what is supported, because
 new XSAVE state is always attached to new CPUID bits in leaves other
 than 0Dh (e.g. EAX=07h, ECX=0h returns AVX512 and MPX support in EBX).
 QEMU should compute 0Dh data based on those bits indeed.
If it is computable from other data even better, easier for us.

 
 However, KVM_GET/SET_XSAVE should still return all values supported by
 the hypervisor, independent of the supported CPUID bits.
 
Why?

 
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
  ---
   target-i386/kvm.c | 3 ++-
   target-i386/machine.c | 4 
   2 files changed, 6 insertions(+), 1 deletion(-)
 
  diff --git a/target-i386/kvm.c b/target-i386/kvm.c
  index 749aa09..df08a4b 100644
  --- a/target-i386/kvm.c
  +++ b/target-i386/kvm.c
  @@ -1291,7 +1291,8 @@ static int kvm_get_xsave(X86CPU *cpu)
   sizeof env-fpregs);
   memcpy(env-xmm_regs, xsave-region[XSAVE_XMM_SPACE],
   sizeof env-xmm_regs);
  -env-xstate_bv = *(uint64_t *)xsave-region[XSAVE_XSTATE_BV];
  +env-xstate_bv = *(uint64_t *)xsave-region[XSAVE_XSTATE_BV] 
  +XSTATE_SUPPORTED;
  Don't we just drop state here that will not be restored on the
  destination and destination will not be able to tell since we masked
  unsupported bits?
 
 A well-behaved guest should not have modified that state anyway, since:
 
 * the source and destination machines should have the same CPU
 
 * since the destination QEMU does not support the feature, the source
 should have masked it as well
 
 * the guest should always probe CPUID before using a feature
 
The I fail to see what is the purpose of the patch. I see two cases:
1. Each extended state has separate CPUID bit (is this guarantied?)
  - In this case, as you say, by matching CPUID on src and dst we guaranty
that migration data is good.
2. There is a state that is advertised in CPUID.0D, but does not have
   any regular feature CPUID associated with it.
 - In this case this patch will drop valid state that needs to be
   restored.

 There will be only one change for well-behaved guests with this patch
 (and the change will be invisible to them since they behave well).
 After the patch, KVM_SET_XSAVE will set the extended states to the
 processor-reset state instead of all-zeros.  However, all
 currently-defined states have a processor-reset state that is equal to
 all-zeroes, so this change is theoretical.
 
 In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here,
 and we should hide all features that are not visible in CPUID.  It is
 okay, however, to test it in cpu_post_load.
The kernel should not even return state that is not visible in CPUID.

 
 Paolo
 
   memcpy(env-ymmh_regs, xsave-region[XSAVE_YMMH_SPACE],
   sizeof env-ymmh_regs);
   return 0;
  diff --git a/target-i386/machine.c b/target-i386/machine.c
  index dc81cde..9e2cfcf 100644
  --- a/target-i386/machine.c
  +++ b/target-i386/machine.c
  @@ -278,6 +278,10 @@ static int cpu_post_load(void *opaque, int version_id)
   CPUX86State *env = cpu-env;
   int i;
   
  +if (env-xstate_bv  ~XSTATE_SUPPORTED) {
  +return -EINVAL;
  +}
  + 
   /*
* Real mode guest segments register DPL should be zero.
* Older KVM version were setting it wrongly.
  -- 
  1.8.3.1
  
  --
  Gleb.
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  

--

Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration

2013-09-09 Thread Alexander Graf


On 09.09.2013, at 11:29, Benjamin Herrenschmidt wrote:

 On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote:
 I think it's ok to restrict live migration to machines with the same
 tb frequency when kvm is enabled. Whether you implement it through a
 hardcoded 512Mhz or through a timebase value that gets live migrated
 and then compared is up to you - both ways work for me :).
 
 The latter might be handy if we want to support migration on 970, though
 we don't have the TBU40 stuff there so adjusting the TB in the host
 kernel would be ... problematic.

Well, we could save/restore TB when we enter/exit the guest, no? But I really 
only want to have something that doesn't block the door for someone who wants 
to enable things.


Alex

Re: [Qemu-devel] [PATCH] raw-win32.c: Fix incorrect handling behaviour of small block files

2013-09-09 Thread Kevin Wolf

Am 09.09.2013 um 11:14 hat Tal Kain geschrieben:
 It is a valid case that the read data's size is smaller than the
 requested size since there could be files that are smaller than
 the minimum block size (For ex. when a VMDK disk descriptor file)
 
 Signed-off-by: Tal Kain tal.k...@ravellosystems.com

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration

2013-09-09 Thread Benjamin Herrenschmidt

On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote:
 I think it's ok to restrict live migration to machines with the same
 tb frequency when kvm is enabled. Whether you implement it through a
 hardcoded 512Mhz or through a timebase value that gets live migrated
 and then compared is up to you - both ways work for me :).

The latter might be handy if we want to support migration on 970, though
we don't have the TBU40 stuff there so adjusting the TB in the host
kernel would be ... problematic.

Cheers,
Ben.

Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration

2013-09-09 Thread Benjamin Herrenschmidt

On Mon, 2013-09-09 at 11:32 +0200, Alexander Graf wrote:
 On 09.09.2013, at 11:29, Benjamin Herrenschmidt wrote:
 
  On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote:
  I think it's ok to restrict live migration to machines with the same
  tb frequency when kvm is enabled. Whether you implement it through a
  hardcoded 512Mhz or through a timebase value that gets live migrated
  and then compared is up to you - both ways work for me :).
  
  The latter might be handy if we want to support migration on 970, though
  we don't have the TBU40 stuff there so adjusting the TB in the host
  kernel would be ... problematic.
 
 Well, we could save/restore TB when we enter/exit the guest, no? 

Hard to do without introducing drift...

 But I really only want to have something that doesn't block the door for 
 someone who wants to enable things.
 
 
 Alex

Re: [Qemu-devel] [RFC PATCH] spapr: support time base offset migration

2013-09-09 Thread Alexander Graf


On 09.09.2013, at 11:38, Benjamin Herrenschmidt wrote:

 On Mon, 2013-09-09 at 11:32 +0200, Alexander Graf wrote:
 On 09.09.2013, at 11:29, Benjamin Herrenschmidt wrote:
 
 On Mon, 2013-09-09 at 08:06 +0200, Alexander Graf wrote:
 I think it's ok to restrict live migration to machines with the same
 tb frequency when kvm is enabled. Whether you implement it through a
 hardcoded 512Mhz or through a timebase value that gets live migrated
 and then compared is up to you - both ways work for me :).
 
 The latter might be handy if we want to support migration on 970, though
 we don't have the TBU40 stuff there so adjusting the TB in the host
 kernel would be ... problematic.
 
 Well, we could save/restore TB when we enter/exit the guest, no? 
 
 Hard to do without introducing drift...

We could probably quantify the drift through DEC. But I'm personally not too 
eager to worry about live migration on 970 :).


Alex

Re: [Qemu-devel] vfio for platform devices - 9/5/2012 - minutes

2013-09-09 Thread Sethi Varun-B16395

Hi Will,
Just trying to understand the scope of platform device assignment to guest on 
ARM. So, are the AMBA devices also represented in the device tree?

Regards
Varun

 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-
 ow...@vger.kernel.org] On Behalf Of Will Deacon
 Sent: Monday, September 09, 2013 2:23 PM
 To: Yoder Stuart-B08248
 Cc: Sethi Varun-B16395; Wood Scott-B07421; Bhushan Bharat-R65777; 'Peter
 Maydell'; 'Santosh Shukla'; 'Alex Williamson'; 'Alexander Graf';
 'Antonios Motakis'; 'Christoffer Dall'; 'kim.phill...@linaro.org';
 kvm...@lists.cs.columbia.edu; kvm-...@vger.kernel.org; qemu-
 de...@nongnu.org
 Subject: Re: vfio for platform devices - 9/5/2012 - minutes
 
 On Fri, Sep 06, 2013 at 07:20:19PM +0100, Yoder Stuart-B08248 wrote:
  Adding Will...
 
 [...]
 
   I have a query about the ARM SMMU driver. In the ARM smmu driver I
   see, that bus notifiers are registered for both amba and platform
   bus. Amba is the I/O interconnect, right? Why is bus notifier
   required for the amba bus?
 
 Not sure I follow the question, really. If you have a DMA master
 registered as an AMBA device (e.g. PL330) and it wants to use an SMMU,
 then you need to be registered on that bus.
 
 What would you prefer instead?
 
 Will
 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in the
 body of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 11:18, Gleb Natapov ha scritto:
 On Mon, Sep 09, 2013 at 10:51:58AM +0200, Paolo Bonzini wrote:
 Il 08/09/2013 13:52, Gleb Natapov ha scritto:
 On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote:
 QEMU moves state from CPUArchState to struct kvm_xsave and back when it
 invokes the KVM_*_XSAVE ioctls.  Because it doesn't treat the XSAVE
 region as an opaque blob, it might be impossible to set some state on
 the destination if migrating to an older version.

 This patch blocks migration if it finds that unsupported bits are set
 in the XSTATE_BV header field.  To make this work robustly, QEMU should
 only report in env-xstate_bv those fields that will actually end up
 in the migration stream.

 We usually handle host cpu differences in cpuid layer, not by trying to
 validate migration data.

 Actually we do both.  QEMU for example detects invalid subsections and
 blocks migration, and CPU differences also result in subsections that
 the destination does not know.

 That's different from what you do here though. If xstate_bv was in its
 separate subsection things would be easier, but it is not.

I agree.  And also if YMM was in its separate subsections; future XSAVE
states will likely use subsections (whose presence is keyed off bits in
env-xstate_bv).

 However, KVM_GET/SET_XSAVE should still return all values supported by
 the hypervisor, independent of the supported CPUID bits.

 Why?

Because this is not talking to the guest, it is talking to userspace.

The VCPU state is more than what is visible to the guest, and returning
all of it seems more consistent with the rest of the KVM API.  For
example, KVM_GET_FPU always returns SSE state even if the CPUID lacks
SSE and/or FXSR.

 A well-behaved guest should not have modified that state anyway, since:

 * the source and destination machines should have the same CPU

 * since the destination QEMU does not support the feature, the source
 should have masked it as well

 * the guest should always probe CPUID before using a feature

 The I fail to see what is the purpose of the patch. I see two cases:
 1. Each extended state has separate CPUID bit (is this guarantied?)

Not guaranteed, but it has always happened so far (AVX, AVX-512, MPX).

   - In this case, as you say, by matching CPUID on src and dst we guaranty
 that migration data is good.

But we don't match CPUID on src and destination.  This is something that
the user should do, but it's better if we can test it too.  Subsections
do that for us; I am, in some sense, emulating subsections for the XSAVE
states that are not stored in subsections.

 In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here,
 and we should hide all features that are not visible in CPUID.  It is
 okay, however, to test it in cpu_post_load.
 
 The kernel should not even return state that is not visible in CPUID.

That's an interesting point of view that I hadn't considered.  But just
like you asked me why it should return state that is not visible in
CPUID, I'm asking you why it should not...

Paolo


 Paolo

  memcpy(env-ymmh_regs, xsave-region[XSAVE_YMMH_SPACE],
  sizeof env-ymmh_regs);
  return 0;
 diff --git a/target-i386/machine.c b/target-i386/machine.c
 index dc81cde..9e2cfcf 100644
 --- a/target-i386/machine.c
 +++ b/target-i386/machine.c
 @@ -278,6 +278,10 @@ static int cpu_post_load(void *opaque, int version_id)
  CPUX86State *env = cpu-env;
  int i;
  
 +if (env-xstate_bv  ~XSTATE_SUPPORTED) {
 +return -EINVAL;
 +}
 + 
  /*
   * Real mode guest segments register DPL should be zero.
   * Older KVM version were setting it wrongly.
 -- 
 1.8.3.1

 --
 Gleb.
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

 
 --
   Gleb.
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 11:03, Gleb Natapov ha scritto:
 On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote:
 Il 08/09/2013 13:40, Gleb Natapov ha scritto:
 On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
 On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
 and not restore anything.

 XRSTOR restores FP/SSE state to reset state if no bits are set in
 xstate_bv. This is what should happen on reset, no?

 Yes. The problem happens on the migration destination when XSAVE data is
 not transmitted.  FP/SSE data is transmitted and must be restored, but
 xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
 state.  The vcpu then loses the values that were set in the migration data.

 Since FP and SSE data are always valid, set them in xstate_bv at reset
 time.  In fact, that value is the same that KVM_GET_XSAVE returns on
 pre-XSAVE hosts.
 It is needed for migration between non xsave host to xsave host.

 Yes, and this patch does the same for migration between non-XSAVE QEMU
 and XSAVE QEMU.

 Can such migration happen? The commit that added xsave support
 (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id.

Yes, old-new migration can happen.  New-old of course cannot.

 In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
 xstate_bv when XSAVE is not available.  Instead, it should reset the
 FXSAVE data to processor-reset values (except for MXCSR which always
 comes from XRSTOR data), i.e. to all-zeros except for the x87 control
 and tag words.  It should also check reserved bits of MXCSR.

 I do not see why.

Because otherwise it behaves in a subtly different manner for XSAVE and
non-XSAVE hosts.

 Yes.  QEMU unmarshals information from the XSAVE region and back, so it
 cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, though.

 IMO this is the main issue here, not separate bug. If we gonna let guest
 use CPU state QEMU does not support we gonna have a bad time.

We cannot force the guest not to use a feature; all we can do is hide
the CPUID bits so that a well-behaved guest will not use it.  QEMU does
hide CPUID bits for non-supported XSAVE states, except for -cpu host.
 So this will not be a problem except with -cpu host.

Paolo

[Qemu-devel] [PATCH v2] ehci: save device pointer in EHCIState

2013-09-09 Thread Gerd Hoffmann

We'll need a pointer to the actual pci/sysbus device,
stick a pointer to it into the EHCIState struct.

https://bugzilla.redhat.com/show_bug.cgi?id=1005495

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/usb/hcd-ehci.c | 7 +++
 hw/usb/hcd-ehci.h | 1 +
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 137e200..22bdbf4 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1241,13 +1241,11 @@ static int ehci_init_transfer(EHCIPacket *p)
 {
 uint32_t cpage, offset, bytes, plen;
 dma_addr_t page;
-USBBus *bus = p-queue-ehci-bus;
-BusState *qbus = BUS(bus);
 
 cpage  = get_field(p-qtd.token, QTD_TOKEN_CPAGE);
 bytes  = get_field(p-qtd.token, QTD_TOKEN_TBYTES);
 offset = p-qtd.bufptr[0]  ~QTD_BUFPTR_MASK;
-qemu_sglist_init(p-sgl, qbus-parent, 5, p-queue-ehci-as);
+qemu_sglist_init(p-sgl, p-queue-ehci-device, 5, p-queue-ehci-as);
 
 while (bytes  0) {
 if (cpage  4) {
@@ -1486,7 +1484,7 @@ static int ehci_process_itd(EHCIState *ehci,
 return -1;
 }
 
-qemu_sglist_init(ehci-isgl, DEVICE(ehci), 2, ehci-as);
+qemu_sglist_init(ehci-isgl, ehci-device, 2, ehci-as);
 if (off + len  4096) {
 /* transfer crosses page border */
 uint32_t len2 = off + len - 4096;
@@ -2529,6 +2527,7 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, 
Error **errp)
 
 s-frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ehci_frame_timer, s);
 s-async_bh = qemu_bh_new(ehci_frame_timer, s);
+s-device = dev;
 
 qemu_register_reset(ehci_reset, s);
 qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s);
diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h
index 15a28e8..065c9fa 100644
--- a/hw/usb/hcd-ehci.h
+++ b/hw/usb/hcd-ehci.h
@@ -255,6 +255,7 @@ typedef QTAILQ_HEAD(EHCIQueueHead, EHCIQueue) EHCIQueueHead;
 
 struct EHCIState {
 USBBus bus;
+DeviceState *device;
 qemu_irq irq;
 MemoryRegion mem;
 AddressSpace *as;
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] ehci: save device pointer in EHCIState

2013-09-09 Thread Gerd Hoffmann

On Mo, 2013-09-09 at 11:04 +0200, Hans de Goede wrote:
 Hi,
 
 On 09/09/2013 10:20 AM, Gerd Hoffmann wrote:
  We'll need a pointer to the actual pci/sysbus device,
  stick a pointer to it into the EHCIState struct.
 
  https://bugzilla.redhat.com/show_bug.cgi?id=1005495
 
 Looks like we've been working on exactly the same bug at the
 same time, but we've come up with slightly different solutions.

Yep, and then git send-email exactly the same minute ;)

 If we're going to go this way (which seems the best), you
 should also modify the qemu_sglist_init call in
 ehci_init_transfer for consistency, and drop the bus and qbus
 variable declarations at the top of the functions as those will
 be unused then.

Makes sense indeed.  Done, v2 sent.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH] ehci: Fix crash with isoc usb packets

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 10:20, Hans de Goede ha scritto:
 The isoc packet path in the ehci code has a bad qobject cast, causing an
 abort, this patch fixes this.
 
 Note this problem is backported in 1.6.0 too, and this patch should be
 backported to the 1.6.0 stable tree.
 
 Signed-off-by: Hans de Goede hdego...@redhat.com
 ---
  hw/usb/hcd-ehci.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
 index 010a0d0..77c4872 100644
 --- a/hw/usb/hcd-ehci.c
 +++ b/hw/usb/hcd-ehci.c
 @@ -1486,7 +1486,8 @@ static int ehci_process_itd(EHCIState *ehci,
  return -1;
  }
  
 -qemu_sglist_init(ehci-isgl, DEVICE(ehci), 2, ehci-as);
 +qemu_sglist_init(ehci-isgl, BUS(ehci-bus)-parent,
 + 2, ehci-as);
  if (off + len  4096) {
  /* transfer crosses page border */
  uint32_t len2 = off + len - 4096;
 

... then qemu-stable should be CCed.

Paolo

[Qemu-devel] [PATCH] block/qcow2: Use bdrv_truncate for size amend

2013-09-09 Thread Max Reitz

When amending the size option for a qcow2 image, use bdrv_truncate
instead of qcow2_truncate directly, since the latter will not adjust the
total_sectors count in the BDS structure (whereas the former will).

Signed-off-by: Max Reitz mre...@redhat.com
---
Depends on (follow-up to):
 - block/qcow2: Image file option amendment (series, v5)
---
 block/qcow2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d29547b..259edfd 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1997,7 +1997,7 @@ static int qcow2_amend_options(BlockDriverState *bs,
 }
 
 if (new_size) {
-ret = qcow2_truncate(bs, new_size);
+ret = bdrv_truncate(bs, new_size);
 if (ret  0) {
 return ret;
 }
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist

2013-09-09 Thread Paolo Bonzini

Il 06/09/2013 20:41, Eduardo Otubo ha scritto:
 Hello,
 
 Any chance to get this patch applied?
 
 Thanks!

Paul, perhaps you can add yourself to MAINTAINERS and send a pull request?

Paolo

 On 09/04/2013 11:11 AM, Paul Moore wrote:
 On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote:
 This was causing Qemu process to hang when using -sandbox on.

 Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175

 Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com

 Works for me.

 Tested-by: Paul Moore pmo...@redhat.com

 ---
   qemu-seccomp.c |1 +
   1 files changed, 1 insertions(+), 0 deletions(-)

 diff --git a/qemu-seccomp.c b/qemu-seccomp.c
 index 37d38f8..69cee44 100644
 --- a/qemu-seccomp.c
 +++ b/qemu-seccomp.c
 @@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall
 seccomp_whitelist[]
 = { { SCMP_SYS(getuid), 245 },
   { SCMP_SYS(geteuid), 245 },
   { SCMP_SYS(timer_create), 245 },
 +{ SCMP_SYS(times), 245 },
   { SCMP_SYS(exit), 245 },
   { SCMP_SYS(clock_gettime), 245 },
   { SCMP_SYS(time), 245 },

Re: [Qemu-devel] [PATCH 0/9] Remove legacy unaligned bswap functions

2013-09-09 Thread Peter Maydell

Ping!

thanks
-- PMM


On 25 August 2013 15:59, Peter Maydell peter.mayd...@linaro.org wrote:
 The bswap.h header includes a set of legacy unaligned functions
 that (since commit c732a52d3 at the beginning of this year) are
 just wrappers for underlying {ld,st}type functions. The legacy
 functions aren't used in many places, so just replace all their
 uses with uses of the new-style {ld,st} functions; this lets us
 remove the legacy wrappers altogether.

 Since we know the {ld,st}* routines are definitely functions,
 we can in the process remove some casts which were left over
 from when the legacy unaligned functions were previously macros.

 The patchset is divided up by function being removed, rather
 than by which device/subsystem is being fixed; I think this way
 round is easier to review since you only have to keep one
 substitution in your head when reading a patch.

 Peter Maydell (9):
   bswap.h: Remove cpu_to_le16wu()
   bswap.h: Remove cpu_to_le32wu()
   bswap.h: Remove le16_to_cpupu()
   bswap.h: Remove le32_to_cpupu()
   bswap.h: Remove be32_to_cpupu()
   bswap.h: Remove cpu_to_be16wu()
   bswap.h: Remove cpu_to_be32wu()
   bswap.h: Remove cpu_to_be64wu()
   bswap.h: Remove cpu_to_32wu()

  block/qcow2-cluster.c |2 +-
  hw/acpi/core.c|3 +--
  hw/block/cdrom.c  |   10 +-
  hw/display/vga_template.h |   14 --
  hw/ide/atapi.c|   16 +++
  hw/net/e1000.c|   22 +
  hw/net/ne2000.c   |4 ++--
  hw/pci/pcie_aer.c |4 ++--
  include/hw/pci/pci.h  |8 
  include/qemu/bswap.h  |   47 
 -
  10 files changed, 40 insertions(+), 90 deletions(-)

 --
 1.7.9.5

Re: [Qemu-devel] [RFC PATCH v2 1/6] make.rule: fix $(obj) to a real relative path

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 03:34, Fam Zheng ha scritto:
 On Fri, 09/06 20:19, Lluís Vilanova wrote:
 Fam Zheng writes:
 [...]
 Because $(obj) here is './block', instead of '../block'. This doesn't
 hurt compiling because we basically build all .o from top Makefile,
 before entering Makefile.target, but it will affact arriving per-object
 libs support.

 I'm curious. What's the reason to not use recursive make in QEMU?

 I don't know the answer, Paolo?

It predates my involvement by a long time, so I don't know.

But my guess is that whenever directories are not present in the build
tree (e.g. i386-softmmu/hw) we have to create the Makefile in the
configure script.  Thus a heavily declarative Makefile style works better.

Paolo

Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 11:50:03AM +0200, Paolo Bonzini wrote:
 Il 09/09/2013 11:18, Gleb Natapov ha scritto:
  On Mon, Sep 09, 2013 at 10:51:58AM +0200, Paolo Bonzini wrote:
  Il 08/09/2013 13:52, Gleb Natapov ha scritto:
  On Thu, Sep 05, 2013 at 03:06:22PM +0200, Paolo Bonzini wrote:
  QEMU moves state from CPUArchState to struct kvm_xsave and back when it
  invokes the KVM_*_XSAVE ioctls.  Because it doesn't treat the XSAVE
  region as an opaque blob, it might be impossible to set some state on
  the destination if migrating to an older version.
 
  This patch blocks migration if it finds that unsupported bits are set
  in the XSTATE_BV header field.  To make this work robustly, QEMU should
  only report in env-xstate_bv those fields that will actually end up
  in the migration stream.
 
  We usually handle host cpu differences in cpuid layer, not by trying to
  validate migration data.
 
  Actually we do both.  QEMU for example detects invalid subsections and
  blocks migration, and CPU differences also result in subsections that
  the destination does not know.
 
  That's different from what you do here though. If xstate_bv was in its
  separate subsection things would be easier, but it is not.
 
 I agree.  And also if YMM was in its separate subsections; future XSAVE
 states will likely use subsections (whose presence is keyed off bits in
 env-xstate_bv).
 
  However, KVM_GET/SET_XSAVE should still return all values supported by
  the hypervisor, independent of the supported CPUID bits.
 
  Why?
 
 Because this is not talking to the guest, it is talking to userspace.
 
 The VCPU state is more than what is visible to the guest, and returning
If a state does not affect guest in any way there is not reason to
migrate it.

 all of it seems more consistent with the rest of the KVM API.  For
 example, KVM_GET_FPU always returns SSE state even if the CPUID lacks
 SSE and/or FXSR.
There are counter examples too :) If APIC is not created we do not return
fake information on GET_IRQCHIP.  I think nobody expected FPU state to
grow indefinitely, so fixed, inflexible API was introduced, but now,
when CPU state has flexible extended state management it make sense to
model it in the API too.

 
  A well-behaved guest should not have modified that state anyway, since:
 
  * the source and destination machines should have the same CPU
 
  * since the destination QEMU does not support the feature, the source
  should have masked it as well
 
  * the guest should always probe CPUID before using a feature
 
  The I fail to see what is the purpose of the patch. I see two cases:
  1. Each extended state has separate CPUID bit (is this guarantied?)
 
 Not guaranteed, but it has always happened so far (AVX, AVX-512, MPX).
 
OK, So for now no need to make 0D configurable, but we need to provide
correct one according to those flags, not to passthrough host values.
 
- In this case, as you say, by matching CPUID on src and dst we guaranty
  that migration data is good.
 
 But we don't match CPUID on src and destination.  This is something that
Yes, I was saying that management infrastructure already knows how to handle
it.

 the user should do, but it's better if we can test it too.  Subsections
 do that for us; I am, in some sense, emulating subsections for the XSAVE
 states that are not stored in subsections.
We do not do it for other bits. It is possible currently to migrate to a
slightly different cpu without failure and it may cause guest to crash,
but we are not trying actively to catch those situations. Why XSAVE is
different?

 
  In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here,
  and we should hide all features that are not visible in CPUID.  It is
  okay, however, to test it in cpu_post_load.
  
  The kernel should not even return state that is not visible in CPUID.
 
 That's an interesting point of view that I hadn't considered.  But just
 like you asked me why it should return state that is not visible in
 CPUID, I'm asking you why it should not...
 
For number of reasons. First because since a sate is not used there is no
point in migrating it. Second to make interface more deterministic for
QEMU. i.e QEMU configures only features it supports and gets
exactly same state from the kernel no matter what host cpu is and what
kernel version is. This patch will not be needed since kernel will do
the job.

--
Gleb.

Re: [Qemu-devel] [RFC PATCH v2 3/6] Makefile: introduce common-obj-m and block-obj-m for DSO

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 04:26, Fam Zheng ha scritto:
 Actually I would like a try to keep $i-{obj,cflags,libs}:
 
 foo-obj := bar.o biz.o
 foo-libs := $(FOO_LIBS)
 foo-cflags := $(FOO_CFLAGS)
 
 getting rid of all the $(obj)/'s and expand them in the toplevel unnest magic
 (because we know foo.mo is contained in local $(block-obj-y) so it's all
 trackable).
 
 If this idea can't work out, I'll use your suggested way.

Yes, either way works (i.e. as long as it's consistent).

Paolo

Re: [Qemu-devel] [RFC PATCH v2 1/6] make.rule: fix $(obj) to a real relative path

2013-09-09 Thread Peter Maydell

On 6 September 2013 18:19, Lluís Vilanova vilan...@ac.upc.edu wrote:
 Fam Zheng writes:
 [...]
 Because $(obj) here is './block', instead of '../block'. This doesn't
 hurt compiling because we basically build all .o from top Makefile,
 before entering Makefile.target, but it will affact arriving per-object
 libs support.

 I'm curious. What's the reason to not use recursive make in QEMU?

The classic paper is Recursive make considered harmful:
http://aegis.sourceforge.net/auug97.pdf

-- PMM

Re: [Qemu-devel] [PATCH v2] ehci: save device pointer in EHCIState

2013-09-09 Thread Hans de Goede


Hi,

On 09/09/2013 11:57 AM, Gerd Hoffmann wrote:

We'll need a pointer to the actual pci/sysbus device,
stick a pointer to it into the EHCIState struct.

https://bugzilla.redhat.com/show_bug.cgi?id=1005495


Looks good, note you've forgotten to add qemu-stable,
I've done so now.

Acked-by: Hans de Goede hdego...@redhat.com

Regards,

Hans



Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
  hw/usb/hcd-ehci.c | 7 +++
  hw/usb/hcd-ehci.h | 1 +
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 137e200..22bdbf4 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1241,13 +1241,11 @@ static int ehci_init_transfer(EHCIPacket *p)
  {
  uint32_t cpage, offset, bytes, plen;
  dma_addr_t page;
-USBBus *bus = p-queue-ehci-bus;
-BusState *qbus = BUS(bus);

  cpage  = get_field(p-qtd.token, QTD_TOKEN_CPAGE);
  bytes  = get_field(p-qtd.token, QTD_TOKEN_TBYTES);
  offset = p-qtd.bufptr[0]  ~QTD_BUFPTR_MASK;
-qemu_sglist_init(p-sgl, qbus-parent, 5, p-queue-ehci-as);
+qemu_sglist_init(p-sgl, p-queue-ehci-device, 5, p-queue-ehci-as);

  while (bytes  0) {
  if (cpage  4) {
@@ -1486,7 +1484,7 @@ static int ehci_process_itd(EHCIState *ehci,
  return -1;
  }

-qemu_sglist_init(ehci-isgl, DEVICE(ehci), 2, ehci-as);
+qemu_sglist_init(ehci-isgl, ehci-device, 2, ehci-as);
  if (off + len  4096) {
  /* transfer crosses page border */
  uint32_t len2 = off + len - 4096;
@@ -2529,6 +2527,7 @@ void usb_ehci_realize(EHCIState *s, DeviceState *dev, 
Error **errp)

  s-frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ehci_frame_timer, s);
  s-async_bh = qemu_bh_new(ehci_frame_timer, s);
+s-device = dev;

  qemu_register_reset(ehci_reset, s);
  qemu_add_vm_change_state_handler(usb_ehci_vm_state_change, s);
diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h
index 15a28e8..065c9fa 100644
--- a/hw/usb/hcd-ehci.h
+++ b/hw/usb/hcd-ehci.h
@@ -255,6 +255,7 @@ typedef QTAILQ_HEAD(EHCIQueueHead, EHCIQueue) EHCIQueueHead;

  struct EHCIState {
  USBBus bus;
+DeviceState *device;
  qemu_irq irq;
  MemoryRegion mem;
  AddressSpace *as;

[Qemu-devel] [PATCH] e1000: NetClientInfo.receive_iov implemented

2013-09-09 Thread Vincenzo Maffione

This patch implements the NetClientInfo.receive_iov method for the
e1000 device emulation. In this way a network backend that uses
qemu_sendv_packet() can deliver the fragmented packet without
requiring an additional copy in the frontend/backend network code
(nc_sendv_compat() function).

The existing method NetClientInfo.receive has been reimplemented
using the new method.

Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com
---
 hw/net/e1000.c | 70 --
 1 file changed, 58 insertions(+), 12 deletions(-)

I propose this patch also because our research group (University of Pisa,
Department of Computer Engineering) is working on the e1000 device
(optimizations and paravirtual extensions) and we have patches to
support the VALE switch as a network backend (see
http://info.iet.unipi.it/~luigi/vale/).
The VALE backend uses qemu_sendv_packet() to send fragmented packets: For
this reason we think it could be interesting to better support these packets
with e1000.

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index f5ebed4..70ab17f 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -32,6 +32,7 @@
 #include hw/loader.h
 #include sysemu/sysemu.h
 #include sysemu/dma.h
+#include qemu/iov.h
 
 #include e1000_regs.h
 
@@ -64,6 +65,8 @@ static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL);
 /* this is the size past which hardware will drop packets when setting LPE=1 */
 #define MAXIMUM_ETHERNET_LPE_SIZE 16384
 
+#define MAXIMUM_ETHERNET_HDR_LEN (14+4)
+
 /*
  * HW models:
  *  E1000_DEV_ID_82540EM works with Windows and Linux
@@ -825,7 +828,7 @@ static uint64_t rx_desc_base(E1000State *s)
 }
 
 static ssize_t
-e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
 {
 E1000State *s = qemu_get_nic_opaque(nc);
 PCIDevice *d = PCI_DEVICE(s);
@@ -834,11 +837,14 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
 unsigned int n, rdt;
 uint32_t rdh_start;
 uint16_t vlan_special = 0;
-uint8_t vlan_status = 0, vlan_offset = 0;
+uint8_t vlan_status = 0;
 uint8_t min_buf[MIN_BUF_SIZE];
 size_t desc_offset;
 size_t desc_size;
 size_t total_size;
+size_t size = iov_size(iov, iovcnt), iov_ofs = 0;
+struct iovec iv;
+uint8_t *filter_buf = iov-iov_base;
 
 if (!(s-mac_reg[STATUS]  E1000_STATUS_LU)) {
 return -1;
@@ -850,10 +856,16 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
 
 /* Pad to minimum Ethernet frame length */
 if (size  sizeof(min_buf)) {
-memcpy(min_buf, buf, size);
+iov_to_buf(iov, iovcnt, 0, min_buf, size);
 memset(min_buf[size], 0, sizeof(min_buf) - size);
-buf = min_buf;
-size = sizeof(min_buf);
+iv.iov_base = filter_buf = min_buf;
+iv.iov_len = size = sizeof(min_buf);
+iovcnt = 1;
+iov = iv;
+} else if (iov-iov_len  MAXIMUM_ETHERNET_HDR_LEN) {
+/* This is very unlikely, but may happen. */
+iov_to_buf(iov, iovcnt, 0, min_buf, MAXIMUM_ETHERNET_HDR_LEN);
+filter_buf = min_buf;
 }
 
 /* Discard oversized packets if !LPE and !SBP. */
@@ -864,14 +876,24 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
 return size;
 }
 
-if (!receive_filter(s, buf, size))
+if (!receive_filter(s, filter_buf, size)) {
 return size;
+}
 
-if (vlan_enabled(s)  is_vlan_packet(s, buf)) {
-vlan_special = cpu_to_le16(be16_to_cpup((uint16_t *)(buf + 14)));
-memmove((uint8_t *)buf + 4, buf, 12);
+if (vlan_enabled(s)  is_vlan_packet(s, filter_buf)) {
+vlan_special = cpu_to_le16(be16_to_cpup((uint16_t *)(filter_buf
++ 14)));
+iov_ofs = 4;
+if (filter_buf == iov-iov_base) {
+memmove(filter_buf + 4, filter_buf, 12);
+} else {
+iov_from_buf(iov, iovcnt, 4, filter_buf, 12);
+while (iov-iov_len = iov_ofs) {
+iov_ofs -= iov-iov_len;
+iov++;
+}
+}
 vlan_status = E1000_RXD_STAT_VP;
-vlan_offset = 4;
 size -= 4;
 }
 
@@ -893,12 +915,24 @@ e1000_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
 desc.status |= (vlan_status | E1000_RXD_STAT_DD);
 if (desc.buffer_addr) {
 if (desc_offset  size) {
+size_t iov_copy, copied = 0;
+hwaddr ba = le64_to_cpu(desc.buffer_addr);
 size_t copy_size = size - desc_offset;
 if (copy_size  s-rxbuf_size) {
 copy_size = s-rxbuf_size;
 }
-pci_dma_write(d, le64_to_cpu(desc.buffer_addr),
-  buf + desc_offset + vlan_offset, copy_size);
+do {
+iov_copy =

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 11:53:45AM +0200, Paolo Bonzini wrote:
 Il 09/09/2013 11:03, Gleb Natapov ha scritto:
  On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote:
  Il 08/09/2013 13:40, Gleb Natapov ha scritto:
  On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
  On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
  and not restore anything.
 
  XRSTOR restores FP/SSE state to reset state if no bits are set in
  xstate_bv. This is what should happen on reset, no?
 
  Yes. The problem happens on the migration destination when XSAVE data is
  not transmitted.  FP/SSE data is transmitted and must be restored, but
  xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
  state.  The vcpu then loses the values that were set in the migration data.
 
  Since FP and SSE data are always valid, set them in xstate_bv at reset
  time.  In fact, that value is the same that KVM_GET_XSAVE returns on
  pre-XSAVE hosts.
  It is needed for migration between non xsave host to xsave host.
 
  Yes, and this patch does the same for migration between non-XSAVE QEMU
  and XSAVE QEMU.
 
  Can such migration happen? The commit that added xsave support
  (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id.
 
 Yes, old-new migration can happen.  New-old of course cannot.
 
I see. I am fine with the patch, but please drop defines that are not
used in the patch itself.

  In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
  xstate_bv when XSAVE is not available.  Instead, it should reset the
  FXSAVE data to processor-reset values (except for MXCSR which always
  comes from XRSTOR data), i.e. to all-zeros except for the x87 control
  and tag words.  It should also check reserved bits of MXCSR.
 
  I do not see why.
 
 Because otherwise it behaves in a subtly different manner for XSAVE and
 non-XSAVE hosts.
I do not see how. Can you elaborate?

 
  Yes.  QEMU unmarshals information from the XSAVE region and back, so it
  cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, 
  though.
 
  IMO this is the main issue here, not separate bug. If we gonna let guest
  use CPU state QEMU does not support we gonna have a bad time.
 
 We cannot force the guest not to use a feature; all we can do is hide
Of course we can't, this is correct for other features too, but this is
guest's problem.

 the CPUID bits so that a well-behaved guest will not use it.  QEMU does
 hide CPUID bits for non-supported XSAVE states, except for -cpu host.
  So this will not be a problem except with -cpu host.
 

--
Gleb.

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 01:54:50PM +0300, Gleb Natapov wrote:
 On Mon, Sep 09, 2013 at 11:53:45AM +0200, Paolo Bonzini wrote:
  Il 09/09/2013 11:03, Gleb Natapov ha scritto:
   On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote:
   Il 08/09/2013 13:40, Gleb Natapov ha scritto:
   On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
   On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
   and not restore anything.
  
   XRSTOR restores FP/SSE state to reset state if no bits are set in
   xstate_bv. This is what should happen on reset, no?
  
   Yes. The problem happens on the migration destination when XSAVE data is
   not transmitted.  FP/SSE data is transmitted and must be restored, but
   xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
   state.  The vcpu then loses the values that were set in the migration 
   data.
  
   Since FP and SSE data are always valid, set them in xstate_bv at reset
   time.  In fact, that value is the same that KVM_GET_XSAVE returns on
   pre-XSAVE hosts.
   It is needed for migration between non xsave host to xsave host.
  
   Yes, and this patch does the same for migration between non-XSAVE QEMU
   and XSAVE QEMU.
  
   Can such migration happen? The commit that added xsave support
   (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id.
  
  Yes, old-new migration can happen.  New-old of course cannot.
  
 I see. I am fine with the patch, but please drop defines that are not
 used in the patch itself.
 
BTW migration question, will xstate_bv no be zeroed by migration code in
old-new case?
 
--
Gleb.

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 12:54, Gleb Natapov ha scritto:
 On Mon, Sep 09, 2013 at 11:53:45AM +0200, Paolo Bonzini wrote:
 Il 09/09/2013 11:03, Gleb Natapov ha scritto:
 On Mon, Sep 09, 2013 at 10:31:15AM +0200, Paolo Bonzini wrote:
 Il 08/09/2013 13:40, Gleb Natapov ha scritto:
 On Thu, Sep 05, 2013 at 03:06:21PM +0200, Paolo Bonzini wrote:
 On KVM, the KVM_SET_XSAVE would be executed with a 0 xstate_bv,
 and not restore anything.

 XRSTOR restores FP/SSE state to reset state if no bits are set in
 xstate_bv. This is what should happen on reset, no?

 Yes. The problem happens on the migration destination when XSAVE data is
 not transmitted.  FP/SSE data is transmitted and must be restored, but
 xstate_bv is zero and KVM_SET_XSAVE restores FP/SSE state to reset
 state.  The vcpu then loses the values that were set in the migration data.

 Since FP and SSE data are always valid, set them in xstate_bv at reset
 time.  In fact, that value is the same that KVM_GET_XSAVE returns on
 pre-XSAVE hosts.
 It is needed for migration between non xsave host to xsave host.

 Yes, and this patch does the same for migration between non-XSAVE QEMU
 and XSAVE QEMU.

 Can such migration happen? The commit that added xsave support
 (f1665b21f16c5dc0ac37de60233a4975aff31193) changed vmstate version id.

 Yes, old-new migration can happen.  New-old of course cannot.

 I see. I am fine with the patch, but please drop defines that are not
 used in the patch itself.

Ok.

(For the BTW question, xstate_bv will not be zeroed, it will remain to
the default value).

 In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
 xstate_bv when XSAVE is not available.  Instead, it should reset the
 FXSAVE data to processor-reset values (except for MXCSR which always
 comes from XRSTOR data), i.e. to all-zeros except for the x87 control
 and tag words.  It should also check reserved bits of MXCSR.

 I do not see why.

 Because otherwise it behaves in a subtly different manner for XSAVE and
 non-XSAVE hosts.
 
 I do not see how. Can you elaborate?

Suppose userspace calls KVM_SET_XSAVE with XSTATE_BV=0.

On an XSAVE host, when the guest FPU state is loaded KVM will do an
XRSTOR.  The XRSTOR will restore the FPU state to default values.

On a non-XSAVE host, when the guest FPU state is loaded KVM will do an
FXRSTR.  The FXRSTR will load the FPU state from the first 512 bytes of
the block that was passed to KVM_SET_XSAVE.

This is not a problem because userspace will usually pass to
KVM_SET_XSAVE only something that it got from KVM_GET_XSAVE, and
KVM_GET_XSAVE will never set XSTATE_BV=0.  However, KVM_SET_XSAVE is
supposed to emulate XSAVE/XRSTOR if it is not available, and it is
failing to emulate this detail.

 Yes.  QEMU unmarshals information from the XSAVE region and back, so it
 cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, 
 though.

 IMO this is the main issue here, not separate bug. If we gonna let guest
 use CPU state QEMU does not support we gonna have a bad time.

 We cannot force the guest not to use a feature; all we can do is hide
 
 Of course we can't, this is correct for other features too, but this is
 guest's problem.

Ok, then we agree that QEMU doesn't have a problem?  The XSAVE data will
always be fresh as long as the guest obeys CPUID bits it receives, and
the CPUID bits that QEMU passes will never enable XSAVE data that QEMU
does not support.

Paolo

Re: [Qemu-devel] Questions about hvm domU default devices with upstream qemu

2013-09-09 Thread Stefano Stabellini

On Thu, 5 Sep 2013, Fabio Fantoni wrote:
 Il 05/09/2013 16:04, Stefano Stabellini ha scritto:
  On Thu, 5 Sep 2013, Fabio Fantoni wrote:
 About qemu default empty and unusable floppy and cdrom I have already
 reply at
 start of year that qemu traditional does not have them.
 The tests were with xl only if I remember good, now I checked also on
 old
 production server with xend and qemu traditional. I confirm that hvm
 domUs
 is
 without empty floppy and cdrom also in that case.
 
 cd-insert works (tested with xl and upstream qemu). cd-insert does not
 works
 only with qemu default empty cdrom because is undefined on xen side.
So HVM domUs do NOT have a cdrom drive by default with qemu-traditional.
   Exact.
   
Do they have an empty cdrom drive by default with qemu-xen?
   Yes
   
Is the lack of a cdrom drive the reason why cd-insert doesn't work by
default (unless you specify an empty cdrom drive in the VM config file)?
   Yes.
   xl cd-insert command works with cdrom devices already present on domU
   defined
   on xl file configuration.
   The cdrom can be empty or not, the only exception is the empty cdrom added
   default by qemu that is not usable with cd-insert because not defined on
   xl
   file configuration.
  I see. That should be fixed, by either removing the empty cdrom drive or
  making it work properly with xl cd-insert.
  I don't have a strong opinion on this. Given that qemu-traditional
  doesn't have a cdrom drive by default, it might make sense to do the
  same on qemu-xen.
 
 I think the best choice is to remove the qemu default empty cdrom, if possible
 without another workaround as for floppy but considering to change the method
 hvm domUs starts qemu.
 I think that is probably better to add on qemu start command only components
 already defined xen side in order to avoid creation of unusable components
 added by qemu itself or mismatches on both sides.
 
 Another actual problem is that is difficult to know if the qemu binary that
 should be used on xl create has the features/components requested and if they
 are compiled. Actually on xl create we can have only qemu log file after domU
 create failing.
 Why not expose all features/components of a given qemu binary and let xl or
 other toolstack (not only xen) query them before start qemu itself?

You are right, it would be nice to be able to know exactly what the qemu
binary supports. However it's difficult to do because Linux distros are
free to choose any QEMU version they like and any set of compile time
options they want for qemu-xen.
As a consequence we would actually need to start QEMU and issue QMP
commands to figure out what is supported.  It would slow down VM
creation too much, so we would probably need to do this at host boot
time and store the settings somewhere on disk or on xenstore.

[Qemu-devel] [PATCH RFC v2 0/3] pci: complete master abort protocol

2013-09-09 Thread Marcel Apfelbaum

Note: The series is incomplete, for review only

PCI spec requires that a transaction that has not been claimed
by any PCI bus devices will be terminated by the initiator
with master abort. For read transactions -1() is returned and 
writes are silently dropped.

The series deals also with the other aspect of the master abort scenario:
Upon completion the master has to raise RECEIVED MASTER ABORT BIT in
initiator's STATUS register.

Implementation:
 - Allowed the MemoryRegion priority to be negative so a subregion will be
   visible on all the addresses not covered by the parent MemoryRegion
   or other subregions.
 - Added a memory region with negative priority that extends over all the
   pci address space. This region catches all the accesses
   to the unassigned pci addresses.
 - The MemoryRegion's ops emulates the master abort scenario.

Note:
For the moment the code assumes that all the reads/writes to
pci address space are done by the cpu.

Changes from v1:
 - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on
various Host Bridges
 - pci-unassgined-mem does not have a .valid.accept field and
implements read write methods

Marcel Apfelbaum (3):
  memory: allow MemoryRegion's priority field to accept negative values
  hw/pci: add MemoryRegion ops for unassigned pci addresses
  hw/pci-host: catch acesses to unassigned pci addresses

 hw/pci-host/piix.c|  8 
 hw/pci-host/q35.c | 19 ---
 hw/pci/pci.c  | 18 ++
 include/exec/memory.h |  6 +++---
 include/hw/pci-host/q35.h |  1 +
 include/hw/pci/pci.h  |  3 +++
 memory.c  |  2 +-
 7 files changed, 50 insertions(+), 7 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH RFC v2 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Marcel Apfelbaum

Priority is used to make visible some subregions by obscuring
the parent MemoryRegion addresses overlapping with the subregion.

By allowing the priority to be negative the opposite can be done:
Allow a subregion to be visible on all the addresses not covered
by the parent MemoryRegion or other subregions.

Signed-off-by: Marcel Apfelbaum marce...@redhat.com
---
 include/exec/memory.h | 6 +++---
 memory.c  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index ebe0d24..6995087 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -153,7 +153,7 @@ struct MemoryRegion {
 bool flush_coalesced_mmio;
 MemoryRegion *alias;
 hwaddr alias_offset;
-unsigned priority;
+int priority;
 bool may_overlap;
 QTAILQ_HEAD(subregions, MemoryRegion) subregions;
 QTAILQ_ENTRY(MemoryRegion) subregions_link;
@@ -193,7 +193,7 @@ struct MemoryListener {
 void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection 
*section,
hwaddr addr, hwaddr len);
 /* Lower = earlier (during add), later (during del) */
-unsigned priority;
+int priority;
 AddressSpace *address_space_filter;
 QTAILQ_ENTRY(MemoryListener) link;
 };
@@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
 void memory_region_add_subregion_overlap(MemoryRegion *mr,
  hwaddr offset,
  MemoryRegion *subregion,
- unsigned priority);
+ int priority);
 
 /**
  * memory_region_get_ram_addr: Get the ram address associated with a memory
diff --git a/memory.c b/memory.c
index 5a10fd0..984a3dc 100644
--- a/memory.c
+++ b/memory.c
@@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
 void memory_region_add_subregion_overlap(MemoryRegion *mr,
  hwaddr offset,
  MemoryRegion *subregion,
- unsigned priority)
+ int priority)
 {
 subregion-may_overlap = true;
 subregion-priority = priority;
-- 
1.8.3.1

[Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

Created a MemoryRegion with negative priority that
spans over all the pci address space.
It intercepts the accesses to unassigned pci
address space and will follow the pci spec:
 1. returns -1 on read
 2. does nothing on write
 3. sets the RECEIVED MASTER ABORT bit in the STATUS register
of the device that initiated the transaction

Note: This implementation assumes that all the reads/writes to
the pci address space are done by the cpu.

Signed-off-by: Marcel Apfelbaum marce...@redhat.com
---
Changes from v1:
 - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on
various Host Bridges
 - pci-unassgined-mem does not have a .valid.accept field and
implements read write methods

 hw/pci/pci.c | 46 ++
 include/hw/pci/pci_bus.h |  1 +
 2 files changed, 47 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index d00682e..b6a8026 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
 return rootbus-qbus.name;
 }
 
+static void unassigned_mem_access(PCIBus *bus)
+{
+/* FIXME assumption: memory access to the pci address
+ * space is always initiated by the host bridge
+ * (device 0 on the bus) */
+PCIDevice *d = bus-devices[0];
+if (!d) {
+return;
+}
+
+pci_word_test_and_set_mask(d-config + PCI_STATUS,
+   PCI_STATUS_REC_MASTER_ABORT);
+}
+
+static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+PCIBus *bus = opaque;
+unassigned_mem_access(bus);
+
+return -1ULL;
+}
+
+static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val,
+unsigned size)
+{
+PCIBus *bus = opaque;
+unassigned_mem_access(bus);
+}
+
+static const MemoryRegionOps unassigned_mem_ops = {
+.read = unassigned_mem_read,
+.write = unassigned_mem_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+#define UNASSIGNED_MEM_PRIORITY -1
+
 static void pci_bus_init(PCIBus *bus, DeviceState *parent,
  const char *name,
  MemoryRegion *address_space_mem,
@@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent,
 bus-address_space_mem = address_space_mem;
 bus-address_space_io = address_space_io;
 
+
+memory_region_init_io(bus-unassigned_mem, OBJECT(bus),
+  unassigned_mem_ops, bus, pci-unassigned,
+  memory_region_size(bus-address_space_mem));
+memory_region_add_subregion_overlap(bus-address_space_mem,
+bus-address_space_mem-addr,
+bus-unassigned_mem,
+UNASSIGNED_MEM_PRIORITY);
+
 /* host bridge */
 QLIST_INIT(bus-child);
 
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 9df1788..4cc25a3 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -23,6 +23,7 @@ struct PCIBus {
 PCIDevice *parent_dev;
 MemoryRegion *address_space_mem;
 MemoryRegion *address_space_io;
+MemoryRegion unassigned_mem;
 
 QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
 QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
-- 
1.8.3.1

[Qemu-devel] [PATCH RFC v3 0/2] pci: complete master abort protocol

2013-09-09 Thread Marcel Apfelbaum

Note: The series is incomplete, for review only

PCI spec requires that a transaction that has not been claimed
by any PCI bus devices will be terminated by the initiator
with master abort. For read transactions -1() is returned and 
writes are silently dropped.

The series deals also with the other aspect of the master abort scenario:
Upon completion the master has to raise RECEIVED MASTER ABORT BIT in
initiator's STATUS register.

Implementation:
 - Allowed the MemoryRegion priority to be negative so a subregion will be
   visible on all the addresses not covered by the parent MemoryRegion
   or other subregions.
 - Added a memory region with negative priority that extends over all the
   pci address space. This region catches all the accesses
   to the unassigned pci addresses.
 - The MemoryRegion's ops emulates the master abort scenario.

Note:
For the moment the code assumes that all the reads/writes to
pci address space are done by the cpu.

Changes from v2:
 - minor: changed nr of patches int the title
 - minor: modified series list

Changes from v1:
 - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on
various Host Bridges
 - pci-unassgined-mem does not have a .valid.accept field and
implements read write methods


Marcel Apfelbaum (2):
  memory: allow MemoryRegion's priority field to accept negative values
  hw/pci: handle unassigned pci addresses

 hw/pci/pci.c | 46 ++
 include/exec/memory.h|  6 +++---
 include/hw/pci/pci_bus.h |  1 +
 memory.c |  2 +-
 4 files changed, 51 insertions(+), 4 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Marcel Apfelbaum

Priority is used to make visible some subregions by obscuring
the parent MemoryRegion addresses overlapping with the subregion.

By allowing the priority to be negative the opposite can be done:
Allow a subregion to be visible on all the addresses not covered
by the parent MemoryRegion or other subregions.

Signed-off-by: Marcel Apfelbaum marce...@redhat.com
---
 include/exec/memory.h | 6 +++---
 memory.c  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index ebe0d24..6995087 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -153,7 +153,7 @@ struct MemoryRegion {
 bool flush_coalesced_mmio;
 MemoryRegion *alias;
 hwaddr alias_offset;
-unsigned priority;
+int priority;
 bool may_overlap;
 QTAILQ_HEAD(subregions, MemoryRegion) subregions;
 QTAILQ_ENTRY(MemoryRegion) subregions_link;
@@ -193,7 +193,7 @@ struct MemoryListener {
 void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection 
*section,
hwaddr addr, hwaddr len);
 /* Lower = earlier (during add), later (during del) */
-unsigned priority;
+int priority;
 AddressSpace *address_space_filter;
 QTAILQ_ENTRY(MemoryListener) link;
 };
@@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
 void memory_region_add_subregion_overlap(MemoryRegion *mr,
  hwaddr offset,
  MemoryRegion *subregion,
- unsigned priority);
+ int priority);
 
 /**
  * memory_region_get_ram_addr: Get the ram address associated with a memory
diff --git a/memory.c b/memory.c
index 5a10fd0..984a3dc 100644
--- a/memory.c
+++ b/memory.c
@@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
 void memory_region_add_subregion_overlap(MemoryRegion *mr,
  hwaddr offset,
  MemoryRegion *subregion,
- unsigned priority)
+ int priority)
 {
 subregion-may_overlap = true;
 subregion-priority = priority;
-- 
1.8.3.1

[Qemu-devel] [PATCH RFC v3 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

Created a MemoryRegion with negative priority that
spans over all the pci address space.
It intercepts the accesses to unassigned pci
address space and will follow the pci spec:
 1. returns -1 on read
 2. does nothing on write
 3. sets the RECEIVED MASTER ABORT bit in the STATUS register
of the device that initiated the transaction

Note: This implementation assumes that all the reads/writes to
the pci address space are done by the cpu.

Signed-off-by: Marcel Apfelbaum marce...@redhat.com
---
Changes from v1:
 - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on
various Host Bridges
 - pci-unassgined-mem does not have a .valid.accept field and
implements read write methods

 hw/pci/pci.c | 46 ++
 include/hw/pci/pci_bus.h |  1 +
 2 files changed, 47 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index d00682e..b6a8026 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
 return rootbus-qbus.name;
 }
 
+static void unassigned_mem_access(PCIBus *bus)
+{
+/* FIXME assumption: memory access to the pci address
+ * space is always initiated by the host bridge
+ * (device 0 on the bus) */
+PCIDevice *d = bus-devices[0];
+if (!d) {
+return;
+}
+
+pci_word_test_and_set_mask(d-config + PCI_STATUS,
+   PCI_STATUS_REC_MASTER_ABORT);
+}
+
+static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+PCIBus *bus = opaque;
+unassigned_mem_access(bus);
+
+return -1ULL;
+}
+
+static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val,
+unsigned size)
+{
+PCIBus *bus = opaque;
+unassigned_mem_access(bus);
+}
+
+static const MemoryRegionOps unassigned_mem_ops = {
+.read = unassigned_mem_read,
+.write = unassigned_mem_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+#define UNASSIGNED_MEM_PRIORITY -1
+
 static void pci_bus_init(PCIBus *bus, DeviceState *parent,
  const char *name,
  MemoryRegion *address_space_mem,
@@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent,
 bus-address_space_mem = address_space_mem;
 bus-address_space_io = address_space_io;
 
+
+memory_region_init_io(bus-unassigned_mem, OBJECT(bus),
+  unassigned_mem_ops, bus, pci-unassigned,
+  memory_region_size(bus-address_space_mem));
+memory_region_add_subregion_overlap(bus-address_space_mem,
+bus-address_space_mem-addr,
+bus-unassigned_mem,
+UNASSIGNED_MEM_PRIORITY);
+
 /* host bridge */
 QLIST_INIT(bus-child);
 
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 9df1788..4cc25a3 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -23,6 +23,7 @@ struct PCIBus {
 PCIDevice *parent_dev;
 MemoryRegion *address_space_mem;
 MemoryRegion *address_space_io;
+MemoryRegion unassigned_mem;
 
 QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
 QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 01:07:37PM +0200, Paolo Bonzini wrote:
  In fact, another bug is that kvm_vcpu_ioctl_x86_set_xsave ignores
  xstate_bv when XSAVE is not available.  Instead, it should reset the
  FXSAVE data to processor-reset values (except for MXCSR which always
  comes from XRSTOR data), i.e. to all-zeros except for the x87 control
  and tag words.  It should also check reserved bits of MXCSR.
 
  I do not see why.
 
  Because otherwise it behaves in a subtly different manner for XSAVE and
  non-XSAVE hosts.
  
  I do not see how. Can you elaborate?
 
 Suppose userspace calls KVM_SET_XSAVE with XSTATE_BV=0.
 
 On an XSAVE host, when the guest FPU state is loaded KVM will do an
 XRSTOR.  The XRSTOR will restore the FPU state to default values.
 
 On a non-XSAVE host, when the guest FPU state is loaded KVM will do an
 FXRSTR.  The FXRSTR will load the FPU state from the first 512 bytes of
 the block that was passed to KVM_SET_XSAVE.
 
 This is not a problem because userspace will usually pass to
 KVM_SET_XSAVE only something that it got from KVM_GET_XSAVE, and
 KVM_GET_XSAVE will never set XSTATE_BV=0.  However, KVM_SET_XSAVE is
 supposed to emulate XSAVE/XRSTOR if it is not available, and it is
 failing to emulate this detail.
 
You are trying to be bug to bug compatible :) XSTATE_BV can be zero only
if FPU state is reset one, otherwise the guest will not survive. KVM_SET_XSAVE
is not suppose to emulate XSAVE/XRSTOR, it is not emulator function. It
is better to outlaw zero value for XSTATE_BV at all, but we cannot do it
because current QEMU uses it.

  Yes.  QEMU unmarshals information from the XSAVE region and back, so it
  cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, 
  though.
 
  IMO this is the main issue here, not separate bug. If we gonna let guest
  use CPU state QEMU does not support we gonna have a bad time.
 
  We cannot force the guest not to use a feature; all we can do is hide
  
  Of course we can't, this is correct for other features too, but this is
  guest's problem.
 
 Ok, then we agree that QEMU doesn't have a problem?  The XSAVE data will
Which problem exactly. The problems I see is that 1. We do not support
MPX and AVX-512 (but this is probably not the problem you meant :)) 2. 0D
data is not consistent with features. Guest may not expect it and do stupid
things.

 always be fresh as long as the guest obeys CPUID bits it receives, and
 the CPUID bits that QEMU passes will never enable XSAVE data that QEMU
 does not support.
 

--
Gleb.

Re: [Qemu-devel] [PATCH RFC v2 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Peter Maydell

On 9 September 2013 12:11, Marcel Apfelbaum marce...@redhat.com wrote:
 Priority is used to make visible some subregions by obscuring
 the parent MemoryRegion addresses overlapping with the subregion.

 By allowing the priority to be negative the opposite can be done:
 Allow a subregion to be visible on all the addresses not covered
 by the parent MemoryRegion or other subregions.

 Signed-off-by: Marcel Apfelbaum marce...@redhat.com
 ---
  include/exec/memory.h | 6 +++---
  memory.c  | 2 +-
  2 files changed, 4 insertions(+), 4 deletions(-)

 diff --git a/include/exec/memory.h b/include/exec/memory.h
 index ebe0d24..6995087 100644
 --- a/include/exec/memory.h
 +++ b/include/exec/memory.h
 @@ -153,7 +153,7 @@ struct MemoryRegion {
  bool flush_coalesced_mmio;
  MemoryRegion *alias;
  hwaddr alias_offset;
 -unsigned priority;
 +int priority;
  bool may_overlap;
  QTAILQ_HEAD(subregions, MemoryRegion) subregions;
  QTAILQ_ENTRY(MemoryRegion) subregions_link;
 @@ -193,7 +193,7 @@ struct MemoryListener {
  void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection 
 *section,
 hwaddr addr, hwaddr len);
  /* Lower = earlier (during add), later (during del) */
 -unsigned priority;
 +int priority;

You haven't addressed any of the points I made reviewing
the first version of this. Please don't just ignore
code review, or people will stop reviewing your code.

I think it would also be nice to update docs/memory.txt
to say explicitly that priority is signed and why this
is useful, something like this:

begin
Priority values are signed, and the default value is
zero. This means that you can use
memory_region_add_subregion_overlap() both to specify
a region that must sit 'above' any others (with a
positive priority) and also a background region that
sits 'below' others (with a negative priority).
endit

(in the 'Overlapping regions and priority' section
of that document).

thanks
-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
 Created a MemoryRegion with negative priority that
 spans over all the pci address space.
 It intercepts the accesses to unassigned pci
 address space and will follow the pci spec:
  1. returns -1 on read
  2. does nothing on write
  3. sets the RECEIVED MASTER ABORT bit in the STATUS register
 of the device that initiated the transaction
 
 Note: This implementation assumes that all the reads/writes to
 the pci address space are done by the cpu.
 
 Signed-off-by: Marcel Apfelbaum marce...@redhat.com
 ---
 Changes from v1:
  - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on
 various Host Bridges
  - pci-unassgined-mem does not have a .valid.accept field and
 implements read write methods
 
  hw/pci/pci.c | 46 ++
  include/hw/pci/pci_bus.h |  1 +
  2 files changed, 47 insertions(+)
 
 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index d00682e..b6a8026 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
  return rootbus-qbus.name;
  }
  
 +static void unassigned_mem_access(PCIBus *bus)
 +{
 +/* FIXME assumption: memory access to the pci address
 + * space is always initiated by the host bridge

/* Always
 * like
 * this
 */

/* Not
 * like this */

 + * (device 0 on the bus) */

Can't we pass the master device in?
We are still left with the assumption that
there's a single master, but at least
we get rid of the assumption that it's
always device 0 function 0.


 +PCIDevice *d = bus-devices[0];
 +if (!d) {
 +return;
 +}
 +
 +pci_word_test_and_set_mask(d-config + PCI_STATUS,
 +   PCI_STATUS_REC_MASTER_ABORT);

Probably should check and set secondary status for
a bridge device.

 +}
 +
 +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned size)
 +{
 +PCIBus *bus = opaque;
 +unassigned_mem_access(bus);
 +
 +return -1ULL;
 +}
 +
 +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val,
 +unsigned size)
 +{
 +PCIBus *bus = opaque;
 +unassigned_mem_access(bus);
 +}
 +
 +static const MemoryRegionOps unassigned_mem_ops = {
 +.read = unassigned_mem_read,
 +.write = unassigned_mem_write,
 +.endianness = DEVICE_NATIVE_ENDIAN,
 +};
 +
 +#define UNASSIGNED_MEM_PRIORITY -1
 +
  static void pci_bus_init(PCIBus *bus, DeviceState *parent,
   const char *name,
   MemoryRegion *address_space_mem,


I would rename unassigned to pci_master_Abort_ everywhere.

Call things by what they do not how they are used.

 @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState 
 *parent,
  bus-address_space_mem = address_space_mem;
  bus-address_space_io = address_space_io;
  
 +
 +memory_region_init_io(bus-unassigned_mem, OBJECT(bus),
 +  unassigned_mem_ops, bus, pci-unassigned,
 +  memory_region_size(bus-address_space_mem));
 +memory_region_add_subregion_overlap(bus-address_space_mem,
 +bus-address_space_mem-addr,
 +bus-unassigned_mem,
 +UNASSIGNED_MEM_PRIORITY);
 +
  /* host bridge */
  QLIST_INIT(bus-child);


It seems safer to add an API for enabling this functionality.
Something like

pci_set_master_for_master_abort(PCIBus *, PCIDevice *);

  
 diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
 index 9df1788..4cc25a3 100644
 --- a/include/hw/pci/pci_bus.h
 +++ b/include/hw/pci/pci_bus.h
 @@ -23,6 +23,7 @@ struct PCIBus {
  PCIDevice *parent_dev;
  MemoryRegion *address_space_mem;
  MemoryRegion *address_space_io;
 +MemoryRegion unassigned_mem;
  
  QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
  QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
 -- 
 1.8.3.1

Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote:
 Priority is used to make visible some subregions by obscuring
 the parent MemoryRegion addresses overlapping with the subregion.
 
 By allowing the priority to be negative the opposite can be done:
 Allow a subregion to be visible on all the addresses not covered
 by the parent MemoryRegion or other subregions.
 
 Signed-off-by: Marcel Apfelbaum marce...@redhat.com

Seems harmless enough.

Reviewed-by: Michael S. Tsirkin m...@redhat.com


 ---
  include/exec/memory.h | 6 +++---
  memory.c  | 2 +-
  2 files changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/include/exec/memory.h b/include/exec/memory.h
 index ebe0d24..6995087 100644
 --- a/include/exec/memory.h
 +++ b/include/exec/memory.h
 @@ -153,7 +153,7 @@ struct MemoryRegion {
  bool flush_coalesced_mmio;
  MemoryRegion *alias;
  hwaddr alias_offset;
 -unsigned priority;
 +int priority;
  bool may_overlap;
  QTAILQ_HEAD(subregions, MemoryRegion) subregions;
  QTAILQ_ENTRY(MemoryRegion) subregions_link;
 @@ -193,7 +193,7 @@ struct MemoryListener {
  void (*coalesced_mmio_del)(MemoryListener *listener, MemoryRegionSection 
 *section,
 hwaddr addr, hwaddr len);
  /* Lower = earlier (during add), later (during del) */
 -unsigned priority;
 +int priority;
  AddressSpace *address_space_filter;
  QTAILQ_ENTRY(MemoryListener) link;
  };
 @@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
  void memory_region_add_subregion_overlap(MemoryRegion *mr,
   hwaddr offset,
   MemoryRegion *subregion,
 - unsigned priority);
 + int priority);
  
  /**
   * memory_region_get_ram_addr: Get the ram address associated with a memory
 diff --git a/memory.c b/memory.c
 index 5a10fd0..984a3dc 100644
 --- a/memory.c
 +++ b/memory.c
 @@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
  void memory_region_add_subregion_overlap(MemoryRegion *mr,
   hwaddr offset,
   MemoryRegion *subregion,
 - unsigned priority)
 + int priority)
  {
  subregion-may_overlap = true;
  subregion-priority = priority;
 -- 
 1.8.3.1

Re: [Qemu-devel] virtio with Windows 8.

2013-09-09 Thread Stefan Hajnoczi

On Mon, Sep 09, 2013 at 12:02:57AM -0500, Yaodong Yang wrote:
 1. I create a raw image named as win8.img, using the following command:
 /usr/local/kvm/bin/qemu-img create -f raw win8.img 20G
 
 2. I try to install win8 with the following command, but I failed several 
 times.
 
 sudo /usr/local/kvm/bin/qemu-system-x86_64 -enable-kvm -drive 
 file=./win8.img,if=virtio,cache=none -cdrom ./win8.iso -boot d -m 2048.
 
 3. I try the same command but with if=ide, it works. 
 
 Could someone tell me the reason for this failure? 

The Windows installer will not detect virtio-blk disks since Windows
does not come with virtio drivers by default.

You must provide the virtio-win drivers ISO during the installation:

https://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/

Watch for the screen where Windows prompts you for a driver disk and put
in the virtio-win ISO.

Stefan

Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Peter Maydell

On 9 September 2013 12:41, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote:
 Priority is used to make visible some subregions by obscuring
 the parent MemoryRegion addresses overlapping with the subregion.

 By allowing the priority to be negative the opposite can be done:
 Allow a subregion to be visible on all the addresses not covered
 by the parent MemoryRegion or other subregions.

 Signed-off-by: Marcel Apfelbaum marce...@redhat.com

 Seems harmless enough.

 Reviewed-by: Michael S. Tsirkin m...@redhat.com

No, the idea is good but this version is just broken.
See the comments I made on the previous version which
Marcel ignored :-(

-- PMM

Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 12:45:12PM +0100, Peter Maydell wrote:
 On 9 September 2013 12:41, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote:
  Priority is used to make visible some subregions by obscuring
  the parent MemoryRegion addresses overlapping with the subregion.
 
  By allowing the priority to be negative the opposite can be done:
  Allow a subregion to be visible on all the addresses not covered
  by the parent MemoryRegion or other subregions.
 
  Signed-off-by: Marcel Apfelbaum marce...@redhat.com
 
  Seems harmless enough.
 
  Reviewed-by: Michael S. Tsirkin m...@redhat.com
 
 No, the idea is good but this version is just broken.
 See the comments I made on the previous version which
 Marcel ignored :-(
 
 -- PMM

You are right, I missed the bugs.
Good catch, thanks.

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 13:28, Gleb Natapov ha scritto:
 On an XSAVE host, when the guest FPU state is loaded KVM will do an
 XRSTOR.  The XRSTOR will restore the FPU state to default values.

 On a non-XSAVE host, when the guest FPU state is loaded KVM will do an
 FXRSTR.  The FXRSTR will load the FPU state from the first 512 bytes of
 the block that was passed to KVM_SET_XSAVE.

 This is not a problem because userspace will usually pass to
 KVM_SET_XSAVE only something that it got from KVM_GET_XSAVE, and
 KVM_GET_XSAVE will never set XSTATE_BV=0.  However, KVM_SET_XSAVE is
 supposed to emulate XSAVE/XRSTOR if it is not available, and it is
 failing to emulate this detail.

 You are trying to be bug to bug compatible :) XSTATE_BV can be zero only
 if FPU state is reset one, otherwise the guest will not survive.

Yes.

 KVM_SET_XSAVE
 is not suppose to emulate XSAVE/XRSTOR, it is not emulator function. It
 is better to outlaw zero value for XSTATE_BV at all, but we cannot do it
 because current QEMU uses it.

I agree it'd be better to forbid it.  If the mismatch in semantics does
not bother you, I won't fix it.  It slightly bothers me. :)

 Yes.  QEMU unmarshals information from the XSAVE region and back, so it
 cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, 
 though.

 IMO this is the main issue here, not separate bug. If we gonna let guest
 use CPU state QEMU does not support we gonna have a bad time.

 We cannot force the guest not to use a feature; all we can do is hide

 Of course we can't, this is correct for other features too, but this is
 guest's problem.

 Ok, then we agree that QEMU doesn't have a problem?  The XSAVE data will
 
 Which problem exactly. The problems I see is that 1. We do not support
 MPX and AVX-512 (but this is probably not the problem you meant :)) 2. 0D
 data is not consistent with features. Guest may not expect it and do stupid
 things.

It is not a problem to unmarshal information out of KVM_GET_XSAVE data
(and back).  If the guest does stupid things, it's a bug in an
ill-behaving guest.

On the other hand, I agree that passthrough of host 0xD data is bad and
will fix it.

Paolo

 always be fresh as long as the guest obeys CPUID bits it receives, and
 the CPUID bits that QEMU passes will never enable XSAVE data that QEMU
 does not support.

 
 --
   Gleb.

Re: [Qemu-devel] [PATCH RFC v3 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 02:48:55PM +0300, Michael S. Tsirkin wrote:
 On Mon, Sep 09, 2013 at 12:45:12PM +0100, Peter Maydell wrote:
  On 9 September 2013 12:41, Michael S. Tsirkin m...@redhat.com wrote:
   On Mon, Sep 09, 2013 at 02:21:35PM +0300, Marcel Apfelbaum wrote:
   Priority is used to make visible some subregions by obscuring
   the parent MemoryRegion addresses overlapping with the subregion.
  
   By allowing the priority to be negative the opposite can be done:
   Allow a subregion to be visible on all the addresses not covered
   by the parent MemoryRegion or other subregions.
  
   Signed-off-by: Marcel Apfelbaum marce...@redhat.com
  
   Seems harmless enough.
  
   Reviewed-by: Michael S. Tsirkin m...@redhat.com
  
  No, the idea is good but this version is just broken.
  See the comments I made on the previous version which
  Marcel ignored :-(
  
  -- PMM
 
 You are right, I missed the bugs.
 Good catch, thanks.
 

So I guess it's only an Acked-by: Michael S. Tsirkin m...@redhat.com
at this point :)

Re: [Qemu-devel] [PATCH 0/9] Remove legacy unaligned bswap functions

2013-09-09 Thread Michael S. Tsirkin

On Sun, Aug 25, 2013 at 03:59:28PM +0100, Peter Maydell wrote:
 The bswap.h header includes a set of legacy unaligned functions
 that (since commit c732a52d3 at the beginning of this year) are
 just wrappers for underlying {ld,st}type functions. The legacy
 functions aren't used in many places, so just replace all their
 uses with uses of the new-style {ld,st} functions; this lets us
 remove the legacy wrappers altogether.

I have a question on this: what happens if an unaligned
address is supplied?
In particular, if the address is supplied by the guest?

Esp the pci wrappers have many callers - were they all
audited?
I tried checking but there are many callers, will take
a while.

 Since we know the {ld,st}* routines are definitely functions,
 we can in the process remove some casts which were left over
 from when the legacy unaligned functions were previously macros.
 
 The patchset is divided up by function being removed, rather
 than by which device/subsystem is being fixed; I think this way
 round is easier to review since you only have to keep one
 substitution in your head when reading a patch.
 
 Peter Maydell (9):
   bswap.h: Remove cpu_to_le16wu()
   bswap.h: Remove cpu_to_le32wu()
   bswap.h: Remove le16_to_cpupu()
   bswap.h: Remove le32_to_cpupu()
   bswap.h: Remove be32_to_cpupu()
   bswap.h: Remove cpu_to_be16wu()
   bswap.h: Remove cpu_to_be32wu()
   bswap.h: Remove cpu_to_be64wu()
   bswap.h: Remove cpu_to_32wu()
 
  block/qcow2-cluster.c |2 +-
  hw/acpi/core.c|3 +--
  hw/block/cdrom.c  |   10 +-
  hw/display/vga_template.h |   14 --
  hw/ide/atapi.c|   16 +++
  hw/net/e1000.c|   22 +
  hw/net/ne2000.c   |4 ++--
  hw/pci/pcie_aer.c |4 ++--
  include/hw/pci/pci.h  |8 
  include/qemu/bswap.h  |   47 
 -
  10 files changed, 40 insertions(+), 90 deletions(-)
 
 -- 
 1.7.9.5

Re: [Qemu-devel] [PATCH V4 0/3] qemu-iotests: add test for fd passing via SCM rights

2013-09-09 Thread Stefan Hajnoczi

On Fri, Sep 06, 2013 at 11:24:31AM +0800, Wenchao Xia wrote:
 This series add test case for fd passing with unix socket at runtime. Since
 getfd and closefd interface will interact with monitor's data, so it will
 help to do regression test for monitor patches. Since python2 do not support
 sendmsg(), so a C helper program is added to do the job.
 
 v2:
   1: add missing $ in the makefile rule.
 
 v3:
   Address Eric's comments:
   1: typo fix, remove . in the end of error message, strick
 check argc as !=, use EXIT_SUCCESS and EXIT_FAILURE as exit
 values, strict error check for strtol() call.
   Address Luiz's comments:
   1: change the helper program parameter as bin  socket-fd   file-path ,
 the program open the file itself now, data parameter is removed and blank
 is always used as iov data, better usage tip message, folder the string 
 parsing
 code into a function.
   2: related change for helper program parameter change.
   3: related change for helper program parameter change.
   Other:
   1: remove LINK rule in makefile, remove fd checking code inside send_fd()
 since it is already checked before calling, add '' around %s for path and
 number string in error message.
   2: renamed fd_bin to bin in send_fd_scm() to tip better, add '' around %s
 for path in error message.
 v4:
   Address Stefan's comments:
   2: add space after # for comments, refined the comment's grammar.
   3: add space after # for comments, refined the comment's grammar, add two
 test cases for error path.
 
 Wenchao Xia (3):
   1 qemu-iotests: add unix socket help program
   2 qemu-iotests: add infrastructure of fd passing via SCM
   3 qemu-iotests: add tests for runtime fd passing via SCM rights
 
  QMP/qmp.py |6 ++
  configure  |2 +-
  tests/Makefile |3 +-
  tests/qemu-iotests/045 |   51 -
  tests/qemu-iotests/045.out |4 +-
  tests/qemu-iotests/check   |1 +
  tests/qemu-iotests/iotests.py  |   23 ++
  tests/qemu-iotests/socket_scm_helper.c |  135 
 
  8 files changed, 220 insertions(+), 5 deletions(-)
  create mode 100644 tests/qemu-iotests/socket_scm_helper.c

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com

Re: [Qemu-devel] [PATCH uq/master 2/2] KVM: make XSAVE support more robust

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 12:41, Gleb Natapov ha scritto:
 In fact, perhaps even XSTATE_SUPPORTED is not restrictive enough here,
 and we should hide all features that are not visible in CPUID.  It is
 okay, however, to test it in cpu_post_load.

 The kernel should not even return state that is not visible in CPUID.

 That's an interesting point of view that I hadn't considered.  But just
 like you asked me why it should return state that is not visible in
 CPUID, I'm asking you why it should not...

 For number of reasons. First because since a sate is not used there is no
 point in migrating it. Second to make interface more deterministic for
 QEMU. i.e QEMU configures only features it supports and gets
 exactly same state from the kernel no matter what host cpu is and what
 kernel version is. This patch will not be needed since kernel will do
 the job.

Good reasons, thanks.  Let's do it in the kernel then and avoid this
patch altogether.

Paolo

Re: [Qemu-devel] [PATCH uq/master 1/2] x86: fix migration from pre-version 12

2013-09-09 Thread Gleb Natapov

On Mon, Sep 09, 2013 at 01:46:49PM +0200, Paolo Bonzini wrote:
  Yes.  QEMU unmarshals information from the XSAVE region and back, so it
  cannot support MPX or AVX-512 yet (even if KVM were).  Separate bug, 
  though.
 
  IMO this is the main issue here, not separate bug. If we gonna let guest
  use CPU state QEMU does not support we gonna have a bad time.
 
  We cannot force the guest not to use a feature; all we can do is hide
 
  Of course we can't, this is correct for other features too, but this is
  guest's problem.
 
  Ok, then we agree that QEMU doesn't have a problem?  The XSAVE data will
  
  Which problem exactly. The problems I see is that 1. We do not support
  MPX and AVX-512 (but this is probably not the problem you meant :)) 2. 0D
  data is not consistent with features. Guest may not expect it and do stupid
  things.
 
 It is not a problem to unmarshal information out of KVM_GET_XSAVE data
 (and back).  If the guest does stupid things, it's a bug in an
 ill-behaving guest.
 
You know I am first in line to blame guest for everything :) (who needs
guests anyway) but in this case I didn't mean that guest does something
illegal. If we advertise support for some XSAVE state in 0D leaf guest
is in his right to make conclusions we may not expect from that. It may
check corespondent feature bit and crash if it is not present for
instance.

 On the other hand, I agree that passthrough of host 0xD data is bad and
 will fix it.
 
Thanks!

--
Gleb.

Re: [Qemu-devel] [PATCH v2 1/5] util: add socket_set_fast_reuse function which will replace setting SO_REUSEADDR

2013-09-09 Thread Stefan Hajnoczi

On Wed, Sep 04, 2013 at 07:08:51PM +0200, Sebastian Ottlik wrote:
 diff --git a/util/oslib-posix.c b/util/oslib-posix.c
 index 3dc8b1b..f071793 100644
 --- a/util/oslib-posix.c
 +++ b/util/oslib-posix.c
 @@ -159,6 +159,20 @@ void qemu_set_nonblock(int fd)
  fcntl(fd, F_SETFL, f | O_NONBLOCK);
  }
  
 +int socket_set_fast_reuse(int fd)
 +{
 +int val=1, ret;
 +
 +ret = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
 + (const char *)val, sizeof(val));
 +
 +if (ret  0) {
 +  perror(setsockopt(SOL_SOCKET, SO_REUSEADDR));
 +}
 +
 +return ret;
 +}

Please run scripts/checkpatch.pl before submitting patches to check
coding style (whitespace, indentation, etc).

Re: [Qemu-devel] [PATCH v5 0/6] block/qcow2: Image file option amendment

2013-09-09 Thread Kevin Wolf

Am 03.09.2013 um 10:09 hat Max Reitz geschrieben:
 
 This series adds support to qemu-img, block and qcow2 for amending image
 options on existing image files.
 
 Depends on:
  - option: Add assigned flag to QEMUOptionParameter
  - qcow2-refcount: Snapshot update for zero clusters (series, v3)
  - Add metadata overlap checks (series, v5)

Thanks, applied to the block branch with the following changes:

- Updated patch 5 with
  [PATCH] block/qcow2: Use bdrv_truncate for size amend

- Resolved merge conflict in patch 1 (bdrv_delete - bdrv_unref)

Kevin

Re: [Qemu-devel] [PATCH RFC v2 1/2] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 12:28 +0100, Peter Maydell wrote:
 On 9 September 2013 12:11, Marcel Apfelbaum marce...@redhat.com wrote:
  Priority is used to make visible some subregions by obscuring
  the parent MemoryRegion addresses overlapping with the subregion.
 
  By allowing the priority to be negative the opposite can be done:
  Allow a subregion to be visible on all the addresses not covered
  by the parent MemoryRegion or other subregions.
 
  Signed-off-by: Marcel Apfelbaum marce...@redhat.com
  ---
   include/exec/memory.h | 6 +++---
   memory.c  | 2 +-
   2 files changed, 4 insertions(+), 4 deletions(-)
 
  diff --git a/include/exec/memory.h b/include/exec/memory.h
  index ebe0d24..6995087 100644
  --- a/include/exec/memory.h
  +++ b/include/exec/memory.h
  @@ -153,7 +153,7 @@ struct MemoryRegion {
   bool flush_coalesced_mmio;
   MemoryRegion *alias;
   hwaddr alias_offset;
  -unsigned priority;
  +int priority;
   bool may_overlap;
   QTAILQ_HEAD(subregions, MemoryRegion) subregions;
   QTAILQ_ENTRY(MemoryRegion) subregions_link;
  @@ -193,7 +193,7 @@ struct MemoryListener {
   void (*coalesced_mmio_del)(MemoryListener *listener, 
  MemoryRegionSection *section,
  hwaddr addr, hwaddr len);
   /* Lower = earlier (during add), later (during del) */
  -unsigned priority;
  +int priority;
 
 You haven't addressed any of the points I made reviewing
 the first version of this. Please don't just ignore
 code review, or people will stop reviewing your code.
Hi Peter,
I really value your comments and I did acted upon them.
Basically all the changes of this version are based on your comments, thanks!

I did answer to your comment and I was going to remove it,
but I missed it again :(. Sorry for that.
I will remove it of course in the following version.

 
 I think it would also be nice to update docs/memory.txt
 to say explicitly that priority is signed and why this
 is useful, something like this:
Of course I will, thanks!

 
 begin
 Priority values are signed, and the default value is
 zero. This means that you can use
 memory_region_add_subregion_overlap() both to specify
 a region that must sit 'above' any others (with a
 positive priority) and also a background region that
 sits 'below' others (with a negative priority).
 endit
 
 (in the 'Overlapping regions and priority' section
 of that document).
 
 thanks
 -- PMM

Re: [Qemu-devel] [PATCH v2 0/5] Do not set SO_REUSEADDR on Windows

2013-09-09 Thread Stefan Hajnoczi

On Thu, Sep 05, 2013 at 03:48:16PM +0200, Sebastian Ottlik wrote:
 On 04.09.2013 19:08, Sebastian Ottlik wrote:
 This patchset disabels all use of SO_REUSEADDR on Windows. On Windows systems
 the default behaviour is equivalent to SO_REUSEADDR on other operating
 systems. SO_REUSEADDR can still be set but results in undesired behaviour
 instead. It may even lead to situations were system behaviour is
 unspecified. More information on this can be found at:
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms740621.aspx
 
 I originally encountered this issue when accidentally launching two QEMU
 instances with identical GDB ports at the same time. In which case QEMU won't
 fail as one might expect.
 
 v2 Changes:
 
 - Introduce a function with os specific implementation instead of using 
 #ifdef
I named it socket_set_fast_reuse instead of the suggested 
  qemu_set_reuseaddr
so the name better reflects what the function actually does.
 
   gdbstub.c  |6 ++
   include/qemu/sockets.h |1 +
   net/socket.c   |   19 +++
   slirp/misc.c   |3 +--
   slirp/socket.c |4 +---
   slirp/tcp_subr.c   |6 ++
   slirp/udp.c|4 ++--
   util/oslib-posix.c |   14 ++
   util/oslib-win32.c |   10 ++
   util/qemu-sockets.c|6 +++---
   10 files changed, 43 insertions(+), 30 deletions(-)
 
 
 util: add socket_set_fast_reuse function
 gdbstub: call socket_set_fast_reuse instead of setting SO_REUSEADDR
 net: call socket_set_fast_reuse instead of setting SO_REUSEADDR
 slirp: call socket_set_fast_reuse instead of setting SO_REUSEADDR
 util: call socket_set_fast_reuse instead of setting SO_REUSEADDR
 
 
 Pinging this patch, as I think it is still an appropriate approach
 to the issue:
 
 I did some research and apparently there is a valid use case for
 SO_REUSEADDR
 on windows when multiple clients need to listen to the same port for
 the same
 multicast group. IMHO making qemu_setsockopt ignore SO_REUSEADDR on windows
 might be confusing for some use cases. Actually net_socket_mcast_create in
 net/socket.c should probably set SO_REUSEADDR on windows. This is
 also an issue
 with patch 3 I supplied that I will address in a new version of this
 patch set if there is
 an agreement on a general approach.

Sounds like a good idea.  The patch series overall looks good.

Stefan

Re: [Qemu-devel] [PATCH v2] kvm: fix traces to use %x instead of %d

2013-09-09 Thread Stefan Hajnoczi

On Wed, Sep 04, 2013 at 08:26:25PM +1000, Alexey Kardashevskiy wrote:
 KVM request types are normally defined using hex constants but QEMU traces
 print decimal values instead, which is not very convenient.
 
 This changes the request type format from %d to %x.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Reviewed-by: Andreas Färber afaer...@suse.de
 Reviewed-by: Paolo Bonzini pbonz...@redhat.com
 ---
 Changes:
 v2:
 * added 0x to format strings so the change won't come unnoticed
 * fixed the commit message
 ---
  trace-events | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

Thanks, applied to my tracing tree:
https://github.com/stefanha/qemu/commits/tracing

Stefan

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote:
 On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
  Created a MemoryRegion with negative priority that
  spans over all the pci address space.
  It intercepts the accesses to unassigned pci
  address space and will follow the pci spec:
   1. returns -1 on read
   2. does nothing on write
   3. sets the RECEIVED MASTER ABORT bit in the STATUS register
  of the device that initiated the transaction
  
  Note: This implementation assumes that all the reads/writes to
  the pci address space are done by the cpu.
  
  Signed-off-by: Marcel Apfelbaum marce...@redhat.com
  ---
  Changes from v1:
   - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on
  various Host Bridges
   - pci-unassgined-mem does not have a .valid.accept field and
  implements read write methods
  
   hw/pci/pci.c | 46 
  ++
   include/hw/pci/pci_bus.h |  1 +
   2 files changed, 47 insertions(+)
  
  diff --git a/hw/pci/pci.c b/hw/pci/pci.c
  index d00682e..b6a8026 100644
  --- a/hw/pci/pci.c
  +++ b/hw/pci/pci.c
  @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
   return rootbus-qbus.name;
   }
   
  +static void unassigned_mem_access(PCIBus *bus)
  +{
  +/* FIXME assumption: memory access to the pci address
  + * space is always initiated by the host bridge
 
 /* Always
  * like
  * this
  */
 
 /* Not
  * like this */
OK

 
  + * (device 0 on the bus) */
 
 Can't we pass the master device in?
 We are still left with the assumption that
 there's a single master, but at least
 we get rid of the assumption that it's
 always device 0 function 0.
 
 
  +PCIDevice *d = bus-devices[0];
  +if (!d) {
  +return;
  +}
  +
  +pci_word_test_and_set_mask(d-config + PCI_STATUS,
  +   PCI_STATUS_REC_MASTER_ABORT);
 
 Probably should check and set secondary status for
 a bridge device.
I wanted to add the support for bridges later,
I am struggling with some issues with PCI bridges.
 
 
  +}
  +
  +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned 
  size)
  +{
  +PCIBus *bus = opaque;
  +unassigned_mem_access(bus);
  +
  +return -1ULL;
  +}
  +
  +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val,
  +unsigned size)
  +{
  +PCIBus *bus = opaque;
  +unassigned_mem_access(bus);
  +}
  +
  +static const MemoryRegionOps unassigned_mem_ops = {
  +.read = unassigned_mem_read,
  +.write = unassigned_mem_write,
  +.endianness = DEVICE_NATIVE_ENDIAN,
  +};
  +
  +#define UNASSIGNED_MEM_PRIORITY -1
  +
   static void pci_bus_init(PCIBus *bus, DeviceState *parent,
const char *name,
MemoryRegion *address_space_mem,
 
 
 I would rename unassigned to pci_master_Abort_ everywhere.
Are you sure about the capital A?

For the memory ops I suppose is OK, but the MemoryRegion name
I think it should remain unassigned_mem; it will be strange
to name it pci_master_Abort_mem.
 
 Call things by what they do not how they are used.
 
  @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState 
  *parent,
   bus-address_space_mem = address_space_mem;
   bus-address_space_io = address_space_io;
   
  +
  +memory_region_init_io(bus-unassigned_mem, OBJECT(bus),
  +  unassigned_mem_ops, bus, pci-unassigned,
  +  memory_region_size(bus-address_space_mem));
  +memory_region_add_subregion_overlap(bus-address_space_mem,
  +bus-address_space_mem-addr,
  +bus-unassigned_mem,
  +UNASSIGNED_MEM_PRIORITY);
  +
   /* host bridge */
   QLIST_INIT(bus-child);
 
 
 It seems safer to add an API for enabling this functionality.
 Something like
 
 pci_set_master_for_master_abort(PCIBus *, PCIDevice *);
I started to implement that way, but it would be strange (I think) 
to see in the code the above method because almost all the pci devices
can be master devices.

I selected an implicit implementation so for this reason exactly.
We do not really select a master, but a master for writes/reads on
pci address space. Selecting device 0 on the bus automatically for
this makes more sense to me.
What do you think?
Marcel 

 
   
  diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
  index 9df1788..4cc25a3 100644
  --- a/include/hw/pci/pci_bus.h
  +++ b/include/hw/pci/pci_bus.h
  @@ -23,6 +23,7 @@ struct PCIBus {
   PCIDevice *parent_dev;
   MemoryRegion *address_space_mem;
   MemoryRegion *address_space_io;
  +MemoryRegion unassigned_mem;
   
   QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
   QLIST_ENTRY(PCIBus) sibling;/* this will be replaced

Re: [Qemu-devel] [PATCH v2 0/5] Do not set SO_REUSEADDR on Windows

2013-09-09 Thread Sebastian Ottlik


On 09.09.2013 14:05, Stefan Hajnoczi wrote:

On Thu, Sep 05, 2013 at 03:48:16PM +0200, Sebastian Ottlik wrote:

On 04.09.2013 19:08, Sebastian Ottlik wrote:

This patchset disabels all use of SO_REUSEADDR on Windows. On Windows systems
the default behaviour is equivalent to SO_REUSEADDR on other operating
systems. SO_REUSEADDR can still be set but results in undesired behaviour
instead. It may even lead to situations were system behaviour is
unspecified. More information on this can be found at:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms740621.aspx

I originally encountered this issue when accidentally launching two QEMU
instances with identical GDB ports at the same time. In which case QEMU won't
fail as one might expect.

v2 Changes:

- Introduce a function with os specific implementation instead of using #ifdef
   I named it socket_set_fast_reuse instead of the suggested qemu_set_reuseaddr
   so the name better reflects what the function actually does.

  gdbstub.c  |6 ++
  include/qemu/sockets.h |1 +
  net/socket.c   |   19 +++
  slirp/misc.c   |3 +--
  slirp/socket.c |4 +---
  slirp/tcp_subr.c   |6 ++
  slirp/udp.c|4 ++--
  util/oslib-posix.c |   14 ++
  util/oslib-win32.c |   10 ++
  util/qemu-sockets.c|6 +++---
  10 files changed, 43 insertions(+), 30 deletions(-)


util: add socket_set_fast_reuse function
gdbstub: call socket_set_fast_reuse instead of setting SO_REUSEADDR
net: call socket_set_fast_reuse instead of setting SO_REUSEADDR
slirp: call socket_set_fast_reuse instead of setting SO_REUSEADDR
util: call socket_set_fast_reuse instead of setting SO_REUSEADDR


Pinging this patch, as I think it is still an appropriate approach
to the issue:

I did some research and apparently there is a valid use case for
SO_REUSEADDR
on windows when multiple clients need to listen to the same port for
the same
multicast group. IMHO making qemu_setsockopt ignore SO_REUSEADDR on windows
might be confusing for some use cases. Actually net_socket_mcast_create in
net/socket.c should probably set SO_REUSEADDR on windows. This is
also an issue
with patch 3 I supplied that I will address in a new version of this
patch set if there is
an agreement on a general approach.

Sounds like a good idea.  The patch series overall looks good.

Stefan
Thanks for the feedback. I will resubmit the patch series including the 
change for net_socket_mcast_create and fixes for the style issues you 
pointed out soon.


When I submitted this new version of the patch set I think I was a 
little early as there was still some discussion in the thread of the 
original version. In general, what is a good period to wait before 
submitting a new version?


Best Regards,
  Sebastian

Re: [Qemu-devel] [PATCH RFC 1/3] memory: allow MemoryRegion's priority field to accept negative values

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-02 at 15:38 +0100, Peter Maydell wrote:
 On 2 September 2013 15:13, Marcel Apfelbaum marce...@redhat.com wrote:
  Priority is used to make visible some subregions by obscuring
  the parent MemoryRegion addresses overlapping with the subregion.
 
  By allowing the priority to be negative the opposite can be done:
  Allow a subregion to be visible on all the addresses not covered
  by the parent MemoryRegion or other subregions.
 
 This comment is not exactly accurate. Allowing priority to
 be signed is just a convenience: you can achieve exactly
 the same effect by specifying some positive priority for
 everything you map into the region and having the background
 region be priority zero. (If you care at all about priorities
 then everything being mapped into the region should be
 happening under the control of your code anyway.)
 
 
  Signed-off-by: Marcel Apfelbaum marce...@redhat.com
  ---
   include/exec/memory.h | 6 +++---
   memory.c  | 2 +-
   2 files changed, 4 insertions(+), 4 deletions(-)
 
  diff --git a/include/exec/memory.h b/include/exec/memory.h
  index ebe0d24..6995087 100644
  --- a/include/exec/memory.h
  +++ b/include/exec/memory.h
  @@ -153,7 +153,7 @@ struct MemoryRegion {
   bool flush_coalesced_mmio;
   MemoryRegion *alias;
   hwaddr alias_offset;
  -unsigned priority;
  +int priority;
   bool may_overlap;
   QTAILQ_HEAD(subregions, MemoryRegion) subregions;
   QTAILQ_ENTRY(MemoryRegion) subregions_link;
  @@ -193,7 +193,7 @@ struct MemoryListener {
   void (*coalesced_mmio_del)(MemoryListener *listener, 
  MemoryRegionSection *section,
  hwaddr addr, hwaddr len);
   /* Lower = earlier (during add), later (during del) */
  -unsigned priority;
  +int priority;
 
 This is unrelated to MemoryRegion priorities -- it controls the
 order in which listener callbacks are called. Don't try to change
 both at once (and you only need the MR priorities anyway.)
 
   AddressSpace *address_space_filter;
   QTAILQ_ENTRY(MemoryListener) link;
   };
  @@ -779,7 +779,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
   void memory_region_add_subregion_overlap(MemoryRegion *mr,
hwaddr offset,
MemoryRegion *subregion,
  - unsigned priority);
  + int priority);
 
   /**
* memory_region_get_ram_addr: Get the ram address associated with a memory
  diff --git a/memory.c b/memory.c
  index 886f838..dfb3ae6 100644
  --- a/memory.c
  +++ b/memory.c
  @@ -1473,7 +1473,7 @@ void memory_region_add_subregion(MemoryRegion *mr,
   void memory_region_add_subregion_overlap(MemoryRegion *mr,
hwaddr offset,
MemoryRegion *subregion,
  - unsigned priority)
  + int priority)
   {
   subregion-may_overlap = true;
   subregion-priority = priority;
 
 This isn't a complete set of changes. For instance
 memory_region_set_address() has a local variable 'priority'
 which should be signed now.
 
 sysbus_mmio_map_common() and sysbus_mmio_map_overlap()
 have priority arguments which need to change to match.
Missed that :(
Making necessary changes and send a new version
Thanks,
Marcel

 
 thanks
 -- PMM

Re: [Qemu-devel] [PATCH 0/9] Remove legacy unaligned bswap functions

2013-09-09 Thread Peter Maydell

On 9 September 2013 12:54, Michael S. Tsirkin m...@redhat.com wrote:
 On Sun, Aug 25, 2013 at 03:59:28PM +0100, Peter Maydell wrote:
 The bswap.h header includes a set of legacy unaligned functions
 that (since commit c732a52d3 at the beginning of this year) are
 just wrappers for underlying {ld,st}type functions. The legacy
 functions aren't used in many places, so just replace all their
 uses with uses of the new-style {ld,st} functions; this lets us
 remove the legacy wrappers altogether.

 I have a question on this: what happens if an unaligned
 address is supplied?
 In particular, if the address is supplied by the guest?

 Esp the pci wrappers have many callers - were they all
 audited?
 I tried checking but there are many callers, will take
 a while.

This patchset has zero behavioural changes, because
literally all it is doing is taking trivial wrapper
functions like

static inline void cpu_to_le16wu(uint16_t *p, uint16_t v)
{
stw_le_p(p, v);
}

and replacing the calls to the wrapper with direct
calls to the underlying function. There's no need
to audit calls to the callsites because there is
no semantic change and not even any implementation
change here.

There are no problems with unaligned accesses, because
the stw_le_p c functions are designed to work for
unaligned accesses: they boil down to calls to
ldl_p/stw_p and friends, which use memcpy() to
ensure they work OK with unaligned accesses.

You can check all this by looking at bswap.h.

-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote:
 On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote:
  On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
   Created a MemoryRegion with negative priority that
   spans over all the pci address space.
   It intercepts the accesses to unassigned pci
   address space and will follow the pci spec:
1. returns -1 on read
2. does nothing on write
3. sets the RECEIVED MASTER ABORT bit in the STATUS register
   of the device that initiated the transaction
   
   Note: This implementation assumes that all the reads/writes to
   the pci address space are done by the cpu.
   
   Signed-off-by: Marcel Apfelbaum marce...@redhat.com
   ---
   Changes from v1:
- pci-unassigned-mem MemoryRegion resides now in PCIBus and not on
   various Host Bridges
- pci-unassgined-mem does not have a .valid.accept field and
   implements read write methods
   
hw/pci/pci.c | 46 
   ++
include/hw/pci/pci_bus.h |  1 +
2 files changed, 47 insertions(+)
   
   diff --git a/hw/pci/pci.c b/hw/pci/pci.c
   index d00682e..b6a8026 100644
   --- a/hw/pci/pci.c
   +++ b/hw/pci/pci.c
   @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
return rootbus-qbus.name;
}

   +static void unassigned_mem_access(PCIBus *bus)
   +{
   +/* FIXME assumption: memory access to the pci address
   + * space is always initiated by the host bridge
  
  /* Always
   * like
   * this
   */
  
  /* Not
   * like this */
 OK
 
  
   + * (device 0 on the bus) */
  
  Can't we pass the master device in?
  We are still left with the assumption that
  there's a single master, but at least
  we get rid of the assumption that it's
  always device 0 function 0.
  
  
   +PCIDevice *d = bus-devices[0];
   +if (!d) {
   +return;
   +}
   +
   +pci_word_test_and_set_mask(d-config + PCI_STATUS,
   +   PCI_STATUS_REC_MASTER_ABORT);
  
  Probably should check and set secondary status for
  a bridge device.
 I wanted to add the support for bridges later,
 I am struggling with some issues with PCI bridges.

Shouldn't be very different.

  
   +}
   +
   +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, unsigned 
   size)
   +{
   +PCIBus *bus = opaque;
   +unassigned_mem_access(bus);
   +
   +return -1ULL;
   +}
   +
   +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t val,
   +unsigned size)
   +{
   +PCIBus *bus = opaque;
   +unassigned_mem_access(bus);
   +}
   +
   +static const MemoryRegionOps unassigned_mem_ops = {
   +.read = unassigned_mem_read,
   +.write = unassigned_mem_write,
   +.endianness = DEVICE_NATIVE_ENDIAN,
   +};
   +
   +#define UNASSIGNED_MEM_PRIORITY -1
   +
static void pci_bus_init(PCIBus *bus, DeviceState *parent,
 const char *name,
 MemoryRegion *address_space_mem,
  
  
  I would rename unassigned to pci_master_Abort_ everywhere.
 Are you sure about the capital A?

Ugh no, that's a typo.

 
 For the memory ops I suppose is OK, but the MemoryRegion name
 I think it should remain unassigned_mem; it will be strange
 to name it pci_master_Abort_mem.

Why would it be strange?

  
  Call things by what they do not how they are used.
  
   @@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState 
   *parent,
bus-address_space_mem = address_space_mem;
bus-address_space_io = address_space_io;

   +
   +memory_region_init_io(bus-unassigned_mem, OBJECT(bus),
   +  unassigned_mem_ops, bus, pci-unassigned,
   +  memory_region_size(bus-address_space_mem));
   +memory_region_add_subregion_overlap(bus-address_space_mem,
   +bus-address_space_mem-addr,
   +bus-unassigned_mem,
   +UNASSIGNED_MEM_PRIORITY);
   +
/* host bridge */
QLIST_INIT(bus-child);
  
  
  It seems safer to add an API for enabling this functionality.
  Something like
  
  pci_set_master_for_master_abort(PCIBus *, PCIDevice *);
 I started to implement that way, but it would be strange (I think) 
 to see in the code the above method because almost all the pci devices
 can be master devices.

In theory yes in practice no one programs devices
with addresses that trigger master abort.

Addressing this theorectical issue would require
memory ops to get the bus master pointer.
When this happens the new API can go away,
but we can do this gradually.

 I selected an implicit implementation so for this reason exactly.
 We do not really select a master, but a master for writes/reads on
 pci address space. Selecting device 0 on the bus automatically for
 this

Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist

2013-09-09 Thread Paul Moore

On Monday, September 09, 2013 12:38:12 PM Paolo Bonzini wrote:
 Il 06/09/2013 20:41, Eduardo Otubo ha scritto:
  Hello,
  
  Any chance to get this patch applied?
  
  Thanks!
 
 Paul, perhaps you can add yourself to MAINTAINERS and send a pull request?
 
 Paolo

Out of respect for the work that Eduardo has done, and is continuing to do, 
with the QEMU seccomp filtering, I think Eduardo should be the one to take on 
this role.  If Eduardo declines I'll do ahead and submit a patch adding myself 
to the MAINTAINERS file.

  On 09/04/2013 11:11 AM, Paul Moore wrote:
  On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote:
  This was causing Qemu process to hang when using -sandbox on.
  
  Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175
  
  Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com
  
  Works for me.
  
  Tested-by: Paul Moore pmo...@redhat.com
  
  ---
  
qemu-seccomp.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
  
  diff --git a/qemu-seccomp.c b/qemu-seccomp.c
  index 37d38f8..69cee44 100644
  --- a/qemu-seccomp.c
  +++ b/qemu-seccomp.c
  @@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall
  seccomp_whitelist[]
  = { { SCMP_SYS(getuid), 245 },
  
{ SCMP_SYS(geteuid), 245 },
{ SCMP_SYS(timer_create), 245 },
  
  +{ SCMP_SYS(times), 245 },
  
{ SCMP_SYS(exit), 245 },
{ SCMP_SYS(clock_gettime), 245 },
{ SCMP_SYS(time), 245 },

-- 
paul moore
security and virtualization @ redhat

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 15:23 +0300, Michael S. Tsirkin wrote:
 On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote:
  On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote:
   On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
Created a MemoryRegion with negative priority that
spans over all the pci address space.
It intercepts the accesses to unassigned pci
address space and will follow the pci spec:
 1. returns -1 on read
 2. does nothing on write
 3. sets the RECEIVED MASTER ABORT bit in the STATUS register
of the device that initiated the transaction

Note: This implementation assumes that all the reads/writes to
the pci address space are done by the cpu.

Signed-off-by: Marcel Apfelbaum marce...@redhat.com
---
Changes from v1:
 - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on
various Host Bridges
 - pci-unassgined-mem does not have a .valid.accept field and
implements read write methods

 hw/pci/pci.c | 46 
++
 include/hw/pci/pci_bus.h |  1 +
 2 files changed, 47 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index d00682e..b6a8026 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
 return rootbus-qbus.name;
 }
 
+static void unassigned_mem_access(PCIBus *bus)
+{
+/* FIXME assumption: memory access to the pci address
+ * space is always initiated by the host bridge
   
   /* Always
* like
* this
*/
   
   /* Not
* like this */
  OK
  
   
+ * (device 0 on the bus) */
   
   Can't we pass the master device in?
   We are still left with the assumption that
   there's a single master, but at least
   we get rid of the assumption that it's
   always device 0 function 0.
   
   
+PCIDevice *d = bus-devices[0];
+if (!d) {
+return;
+}
+
+pci_word_test_and_set_mask(d-config + PCI_STATUS,
+   PCI_STATUS_REC_MASTER_ABORT);
   
   Probably should check and set secondary status for
   a bridge device.
  I wanted to add the support for bridges later,
  I am struggling with some issues with PCI bridges.
 
 Shouldn't be very different.
The current implementation uses only one MemoryRegion that spans
over all pci address space. It catches all the accesses both to
primary bus addresses (bus 0), or to the regions covered by the bridges.
The callback ALWAYS receive a pointer to the primary bus.

What would be a better approach?
1. Remain with a single MemoryRegion and check if the address
   belongs to a bridge address range and recursively travel
   the bridges tree and update the registers
2. Model a MemoryRegion for each bridge that represents
   the address range behind the bridge (not existing today).
   Add a MemoryRegion child to catch accesses to unassigned
   addresses behind the bridge, then recursively travel the
   bridges to top and update the registers. 

 
   
+}
+
+static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, 
unsigned size)
+{
+PCIBus *bus = opaque;
+unassigned_mem_access(bus);
+
+return -1ULL;
+}
+
+static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t 
val,
+unsigned size)
+{
+PCIBus *bus = opaque;
+unassigned_mem_access(bus);
+}
+
+static const MemoryRegionOps unassigned_mem_ops = {
+.read = unassigned_mem_read,
+.write = unassigned_mem_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+#define UNASSIGNED_MEM_PRIORITY -1
+
 static void pci_bus_init(PCIBus *bus, DeviceState *parent,
  const char *name,
  MemoryRegion *address_space_mem,
   
   
   I would rename unassigned to pci_master_Abort_ everywhere.
  Are you sure about the capital A?
 
 Ugh no, that's a typo.
 
  
  For the memory ops I suppose is OK, but the MemoryRegion name
  I think it should remain unassigned_mem; it will be strange
  to name it pci_master_Abort_mem.
 
 Why would it be strange?
1. Because it states what happens when there is a an access to
   unassigned memory and not that IS an unassigned memory.
2. This name is already used for unassigned IO ports and
   maybe is better to follow this convension
 
   
   Call things by what they do not how they are used.
   
@@ -294,6 +331,15 @@ static void pci_bus_init(PCIBus *bus, DeviceState 
*parent,
 bus-address_space_mem = address_space_mem;
 bus-address_space_io = address_space_io;
 
+
+memory_region_init_io(bus-unassigned_mem, OBJECT(bus),
+  unassigned_mem_ops, bus, pci-unassigned,
+

[Qemu-devel] [PATCH] hw/9pfs/virtio_9p_device: use virtio wrappers to access headers.

2013-09-09 Thread Greg Kurz

Follow-up to Rusty's virtio endianness serie: enough to get a working
virtfs mount.

Note that st*_raw and ld*_raw are effectively replaced by st*_p and ld*_p.

Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
---
 hw/9pfs/virtio-9p-device.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index f0ffbe8..bc2d0da 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -19,6 +19,7 @@
 #include fsdev/qemu-fsdev.h
 #include virtio-9p-xattr.h
 #include virtio-9p-coth.h
+#include hw/virtio/virtio-access.h
 
 static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features)
 {
@@ -34,7 +35,7 @@ static void virtio_9p_get_config(VirtIODevice *vdev, uint8_t 
*config)
 
 len = strlen(s-tag);
 cfg = g_malloc0(sizeof(struct virtio_9p_config) + len);
-stw_raw(cfg-tag_len, len);
+virtio_stl_p(cfg-tag_len, len);
 /* We don't copy the terminating null to config space */
 memcpy(cfg-tag, s-tag, len);
 memcpy(config, cfg, s-config_size);

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 13:43, Marcel Apfelbaum marce...@redhat.com wrote:
 The scenario is covered only for the primary bus and not for buses
 behind the PCI bridge (the later being handled differently.)
 In this case, isn't the Host Bridge always device 00.0?

No. For instance the host controller may pass a nonzero
value of devfn_min to pci_bus_new/pci_register_bus (in
which case the host bridge will end up there; example
hw/pci-host/versatile.c) or it might just pass a nonzero
devfn to pci_create_simple when it creates the host controller
PCI device on the bus (I don't think we have anything that
does it that way, though).

 If not, can we find a way to scan all bus devices and find
 the host bridge so we will not have to manually add it to each
 host bridge?

It would be conceptually nicer not to treat host bridges as
a special case but instead to just report the abort back
to whatever the PCI master was (which might be a device
doing DMA). That might be a lot of effort though.

-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote:
 On 9 September 2013 13:43, Marcel Apfelbaum marce...@redhat.com wrote:
  The scenario is covered only for the primary bus and not for buses
  behind the PCI bridge (the later being handled differently.)
  In this case, isn't the Host Bridge always device 00.0?
 
 No. For instance the host controller may pass a nonzero
 value of devfn_min to pci_bus_new/pci_register_bus (in
 which case the host bridge will end up there; example
 hw/pci-host/versatile.c) or it might just pass a nonzero
 devfn to pci_create_simple when it creates the host controller
 PCI device on the bus (I don't think we have anything that
 does it that way, though).
 
  If not, can we find a way to scan all bus devices and find
  the host bridge so we will not have to manually add it to each
  host bridge?
 
 It would be conceptually nicer not to treat host bridges as
 a special case but instead to just report the abort back
 to whatever the PCI master was (which might be a device
 doing DMA). That might be a lot of effort though.
 
 -- PMM

Yes. As a shortcut, what I suggest is registering the
device that wants to be notified of master aborts with
the bus.

-- 
MST

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 13:59, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote:
 It would be conceptually nicer not to treat host bridges as
 a special case but instead to just report the abort back
 to whatever the PCI master was (which might be a device
 doing DMA). That might be a lot of effort though.

 Yes. As a shortcut, what I suggest is registering the
 device that wants to be notified of master aborts with
 the bus.

Can you just pick the device which is (a subclass of)
TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
aren't using that class?

-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 13:52 +0100, Peter Maydell wrote:
 On 9 September 2013 13:43, Marcel Apfelbaum marce...@redhat.com wrote:
  The scenario is covered only for the primary bus and not for buses
  behind the PCI bridge (the later being handled differently.)
  In this case, isn't the Host Bridge always device 00.0?
 
 No. For instance the host controller may pass a nonzero
 value of devfn_min to pci_bus_new/pci_register_bus (in
 which case the host bridge will end up there; example
 hw/pci-host/versatile.c) or it might just pass a nonzero
 devfn to pci_create_simple when it creates the host controller
 PCI device on the bus (I don't think we have anything that
 does it that way, though).
If my assumption is not generally true, I will not use it
Thanks!

 
  If not, can we find a way to scan all bus devices and find
  the host bridge so we will not have to manually add it to each
  host bridge?
 
 It would be conceptually nicer not to treat host bridges as
 a special case but instead to just report the abort back
 to whatever the PCI master was (which might be a device
 doing DMA). That might be a lot of effort though.
This is exactly my point. ALL device on the bus can be masters
of a DMA transaction. So adding an interface as suggested by
Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *)
for the general case (a device doing DMA) it is too far from reality.

However, if we don't want the master device far all accesses
(including DMA) but only for accesses to pci address space,
in this specific case we can appoint the HostBridge
(explicitly/implicitly) as the master device because most
devices do not access pci address space of other devices.

As a conclusion, should we call the API
pci_set_master_for_master_abort_on_pci_space ?
Marcel




 
 -- PMM

Re: [Qemu-devel] [PATCH] qxl: fix local renderer

2013-09-09 Thread Cole Robinson

On 09/06/2013 01:20 AM, Gerd Hoffmann wrote:
 The local spice renderer assumes the primary surface is located at the
 start of the ram bar.  This used to be a requirement in qxl hardware
 revision 1.  In revision 2+ this is relaxed.  Nevertheless guest drivers
 continued to use the traditional location, for historical and backward
 compatibility reasons.  The qxl kms driver doesn't though as it depends
 on qxl revision 4+ anyway.
 
 Result is that local rendering is hosed for recent linux guests, you'll
 get pixel garbage with non-spice ui (gtk, sdl, vnc) and when doing
 screendumps.  Fix that by doing a proper mapping of the guest-specified
 memory location.
 
 https://bugzilla.redhat.com/show_bug.cgi?id=948717
 
 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 ---
  hw/display/qxl-render.c | 15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)
 
 diff --git a/hw/display/qxl-render.c b/hw/display/qxl-render.c
 index 269b1a7..d34b0c4 100644
 --- a/hw/display/qxl-render.c
 +++ b/hw/display/qxl-render.c
 @@ -31,10 +31,6 @@ static void qxl_blit(PCIQXLDevice *qxl, QXLRect *rect)
  if (is_buffer_shared(surface)) {
  return;
  }
 -if (!qxl-guest_primary.data) {
 -trace_qxl_render_blit_guest_primary_initialized();
 -qxl-guest_primary.data = memory_region_get_ram_ptr(qxl-vga.vram);
 -}
  trace_qxl_render_blit(qxl-guest_primary.qxl_stride,
  rect-left, rect-right, rect-top, rect-bottom);
  src = qxl-guest_primary.data;
 @@ -104,7 +100,12 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice 
 *qxl)
  
  if (qxl-guest_primary.resized) {
  qxl-guest_primary.resized = 0;
 -qxl-guest_primary.data = memory_region_get_ram_ptr(qxl-vga.vram);
 +qxl-guest_primary.data = qxl_phys2virt(qxl,
 +
 qxl-guest_primary.surface.mem,
 +MEMSLOT_GROUP_GUEST);
 +if (!qxl-guest_primary.data) {
 +return;
 +}
  qxl_set_rect_to_surface(qxl, qxl-dirty[0]);
  qxl-num_dirty_rects = 1;
  trace_qxl_render_guest_primary_resized(
 @@ -128,6 +129,10 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice 
 *qxl)
  }
  dpy_gfx_replace_surface(vga-con, surface);
  }
 +
 +if (!qxl-guest_primary.data) {
 +return;
 +}
  for (i = 0; i  qxl-num_dirty_rects; i++) {
  if (qemu_spice_rect_is_empty(qxl-dirty+i)) {
  break;
 

Tested-by: Cole Robinson crobi...@redhat.com

And cc-ing qemu-stable

- Cole

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote:
 On 9 September 2013 13:59, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote:
  It would be conceptually nicer not to treat host bridges as
  a special case but instead to just report the abort back
  to whatever the PCI master was (which might be a device
  doing DMA). That might be a lot of effort though.
 
  Yes. As a shortcut, what I suggest is registering the
  device that wants to be notified of master aborts with
  the bus.
 
 Can you just pick the device which is (a subclass of)
 TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
 aren't using that class?
This is what I would really want to do, but some HOST Bridge devices
inherit directly from PCI_DEVICE.

TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which
is a not a PCI device and does not help us here (not a PCI_DEVICE
on the bus)

Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE
and it hold as composition a PCIDevice that will be part of
the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly
inherits from PCI_DEVICE.

It seems to me as a dead end, or we need to re-factor the current
hierarchy.

Marcel
 
 -- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 14:07, Marcel Apfelbaum marce...@redhat.com wrote:
 This is exactly my point. ALL device on the bus can be masters
 of a DMA transaction. So adding an interface as suggested by
 Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *)
 for the general case (a device doing DMA) it is too far from reality.

Actually I don't think it would be too painful.
At the moment in do_pci_register_device() we do this to
create the memory region used by a device for its bus
master transactions:

memory_region_init_alias(pci_dev-bus_master_enable_region,
 OBJECT(pci_dev), bus master,
 dma_as-root, 0,
 memory_region_size(dma_as-root));

If instead of using this alias directly as the
bus_master_enable region you instead:
 * create a container region
 * create a 'background' region at negative priority
   (ie one per device, and you can make the 'opaque' pointer
   point to the device, not the bus)
 * put the alias and the background region into the container
 * use the container as the bus_master_enable region

then you will get in your callback a pointer to the
device which caused the abort. You can then have your
callback call a method defined on PCIDevice which we
implement:
 * as do-nothing in the PCI device base class
 * as set-the-master-abort bit in the PCI host bridge
   class
(and anybody who wants to get fancy about handling aborts
can override it in their own device implementation)

That seems achievable without really requiring new
infrastructure. Have I missed something that wouldn't
work if we did this?

thanks
-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 14:15, Marcel Apfelbaum marce...@redhat.com wrote:
 On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote:
 Can you just pick the device which is (a subclass of)
 TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
 aren't using that class?
 This is what I would really want to do, but some HOST Bridge devices
 inherit directly from PCI_DEVICE.

 TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which
 is a not a PCI device and does not help us here (not a PCI_DEVICE
 on the bus)

Oops, yes, I get those two the wrong way round a lot. Anyway,
if we need to make all host bridges have a common subclass
we could certainly refactor them accordingly.

 Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE
 and it hold as composition a PCIDevice that will be part of
 the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly
 inherits from PCI_DEVICE.

This may just be wrong choice of name rather than actually
wrong hierarchy.

-- PMM

Re: [Qemu-devel] [PATCH] seccomp: adding times() to the whitelist

2013-09-09 Thread Eduardo Otubo




On 09/09/2013 09:36 AM, Paul Moore wrote:

On Monday, September 09, 2013 12:38:12 PM Paolo Bonzini wrote:

Il 06/09/2013 20:41, Eduardo Otubo ha scritto:

Hello,

 Any chance to get this patch applied?

Thanks!


Paul, perhaps you can add yourself to MAINTAINERS and send a pull request?

Paolo


Out of respect for the work that Eduardo has done, and is continuing to do,
with the QEMU seccomp filtering, I think Eduardo should be the one to take on
this role.  If Eduardo declines I'll do ahead and submit a patch adding myself
to the MAINTAINERS file.


If this is ok for everyone, I would be really glad to take this role to 
myself. Paul, thanks for this vote of confidence. Paolo, should I send a 
patch for MAINTAINERS right away?


Regards,




On 09/04/2013 11:11 AM, Paul Moore wrote:

On Wednesday, September 04, 2013 09:25:08 AM Eduardo Otubo wrote:

This was causing Qemu process to hang when using -sandbox on.

Related RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175

Signed-off-by: Eduardo Otubo ot...@linux.vnet.ibm.com


Works for me.

Tested-by: Paul Moore pmo...@redhat.com


---

   qemu-seccomp.c |1 +
   1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index 37d38f8..69cee44 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -90,6 +90,7 @@ static const struct QemuSeccompSyscall
seccomp_whitelist[]
= { { SCMP_SYS(getuid), 245 },

   { SCMP_SYS(geteuid), 245 },
   { SCMP_SYS(timer_create), 245 },

+{ SCMP_SYS(times), 245 },

   { SCMP_SYS(exit), 245 },
   { SCMP_SYS(clock_gettime), 245 },
   { SCMP_SYS(time), 245 },




--
Eduardo Otubo
IBM Linux Technology Center

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 14:19 +0100, Peter Maydell wrote:
 On 9 September 2013 14:15, Marcel Apfelbaum marce...@redhat.com wrote:
  On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote:
  Can you just pick the device which is (a subclass of)
  TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
  aren't using that class?
  This is what I would really want to do, but some HOST Bridge devices
  inherit directly from PCI_DEVICE.
 
  TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which
  is a not a PCI device and does not help us here (not a PCI_DEVICE
  on the bus)
 
 Oops, yes, I get those two the wrong way round a lot. Anyway,
 if we need to make all host bridges have a common subclass
 we could certainly refactor them accordingly.
 
  Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE
  and it hold as composition a PCIDevice that will be part of
  the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly
  inherits from PCI_DEVICE.
 
 This may just be wrong choice of name rather than actually
 wrong hierarchy.
I try not to judge the naming convention, so let's leave it
aside for now.
My issue is that we have at least 2 ways to model the bridges:
1. TYPE_PCI_HOST_BRIDGE
   * derives from TYPE_SYS_BUS_DEVICE
   * has a bus
   * one of the bus devices is a TYPE_I440FX_PCI_DEVICE which
derives from TYPE_PCI_DEVICE
2. TYPE_PCIE_HOST_BRIDGE
   * derives from TYPE_PCI_HOST_BRIDGE which derives
 from TYPE_SYS_BUS_DEVICE
   * has a PciDevice and register it to the bus in order
 to work as (1)

I would like to implement an hierarchy that will allow
all the host bridge devices to have a common ancestor
In this was, we can scan the PCI bus to look for
master...
 

 
 -- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 14:29, Marcel Apfelbaum marce...@redhat.com wrote:
 My issue is that we have at least 2 ways to model the bridges:
 1. TYPE_PCI_HOST_BRIDGE
* derives from TYPE_SYS_BUS_DEVICE
* has a bus
* one of the bus devices is a TYPE_I440FX_PCI_DEVICE which
 derives from TYPE_PCI_DEVICE
 2. TYPE_PCIE_HOST_BRIDGE
* derives from TYPE_PCI_HOST_BRIDGE which derives
  from TYPE_SYS_BUS_DEVICE
* has a PciDevice and register it to the bus in order
  to work as (1)

So I think there are two different (and slightly confusing)
things here:
 (1) the model of the host's PCI controller; this is
 typically derived from TYPE_SYS_BUS_DEVICE somehow
 but I guess in theory need not be; often it's a
 TYPE_PCI_HOST_BRIDGE or TYPE_PCIE_HOST_BRIDGE.
 (2) the PCI device which sits on the PCI bus and is
 visible to the guest, usually with a PCI class ID
 of PCI_CLASS_BRIDGE_HOST (but not necessarily; versatile
 is different, for instance). Currently we generally
 make these direct subclasses of TYPE_PCI_DEVICE.

[The naming convention confusion arises because
different controllers are inconsistent about how
they name these classes and which type name ends up
with 'host' in it.]

What you're going to get in the callback is (2)...

 I would like to implement an hierarchy that will allow
 all the host bridge devices to have a common ancestor
 In this was, we can scan the PCI bus to look for
 master...

...and yes, we could insert an extra class and make
all those bridge hosts derive from it rather than
directly from TYPE_PCI_DEVICE if we needed to.

-- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 14:16 +0100, Peter Maydell wrote:
 On 9 September 2013 14:07, Marcel Apfelbaum marce...@redhat.com wrote:
  This is exactly my point. ALL device on the bus can be masters
  of a DMA transaction. So adding an interface as suggested by
  Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *)
  for the general case (a device doing DMA) it is too far from reality.
 
 Actually I don't think it would be too painful.
 At the moment in do_pci_register_device() we do this to
 create the memory region used by a device for its bus
 master transactions:
 
 memory_region_init_alias(pci_dev-bus_master_enable_region,
  OBJECT(pci_dev), bus master,
  dma_as-root, 0,
  memory_region_size(dma_as-root));
 
 If instead of using this alias directly as the
 bus_master_enable region you instead:
  * create a container region
  * create a 'background' region at negative priority
(ie one per device, and you can make the 'opaque' pointer
point to the device, not the bus)
  * put the alias and the background region into the container
  * use the container as the bus_master_enable region
 
 then you will get in your callback a pointer to the
 device which caused the abort. You can then have your
 callback call a method defined on PCIDevice which we
 implement:
  * as do-nothing in the PCI device base class
  * as set-the-master-abort bit in the PCI host bridge
class
The Received Master Abort bit must be set for the initiator.
In the case described here this bit mast be set in the
device register rather that in host bridge.

 (and anybody who wants to get fancy about handling aborts
 can override it in their own device implementation)
 
 That seems achievable without really requiring new
 infrastructure. Have I missed something that wouldn't
 work if we did this?
The idea seems correct (and cool!) to me (I'll look deeper),
but it covers only one way: upstream.
We need to unify this with the downstream: The cpu accesses to
unassigned memory should result in the callback to the host bridge.

All we need is that all the host bridges to have a common class and
following the same idea we get the downstream (The host bridge inititates
all transactions on the bus on behalf of the cpu)

If this works, we don't need no work around!
Marcel


 
 thanks
 -- PMM

[Qemu-devel] seccomp submaintainer? (was Re: [PATCH] seccomp: adding times() to the whitelist)

2013-09-09 Thread Paolo Bonzini

Il 09/09/2013 15:20, Eduardo Otubo ha scritto:
 Out of respect for the work that Eduardo has done, and is
 continuing to do, with the QEMU seccomp filtering, I think Eduardo
 should be the one to take on this role. If Eduardo declines I'll do
 ahead and submit a patch adding myself to the MAINTAINERS file.

 If this is ok for everyone, I would be really glad to take this role to
 myself. Paul, thanks for this vote of confidence. Paolo, should I send a
 patch for MAINTAINERS right away?

Ok, I was suggesting Paul because he was the one doing reviews.

Eduardo, that is also okay for me.  However, even as a maintainer please
do wait for Paul's reviews.  Many areas of QEMU have maintainers that do
not send their own patches without a review, so this wouldn't be a new
rule. :)

Please wait for Anthony's ack.  I changed the subject and CCed him to
grab his attention.

Paolo

[Qemu-devel] [RFC] qapi/error: Optional error propagation backtrace

2013-09-09 Thread Max Reitz

Add a configure switch which enables an error propagation backtrace.
This results in the error_set function prepending every message by the
source file name, function and line in which it was called, as well as
error_propagate appending this information to the propagated message,
resulting in a backtrace.

Signed-off-by: Max Reitz mre...@redhat.com
---
Since this obviously breaks existing tests (and cannot really be used for
new tests since it includes a line number, which is a rather volatile
output) and is generally not very useful to normal users, the switch is
disabled by default. This functionality is intended for developers
tracking down error paths.

Since I do not know whether I am the only one considering it useful
in the first place, this is just an RFC for now.

Furthermore, I am not sure whether a configure switch is really the right
way to implement this.
---
 configure| 12 +++
 include/qapi/error.h | 40 +++-
 util/error.c | 57 +++-
 3 files changed, 90 insertions(+), 19 deletions(-)

diff --git a/configure b/configure
index e989609..4e43f7b 100755
--- a/configure
+++ b/configure
@@ -243,6 +243,7 @@ gtk=
 gtkabi=2.0
 tpm=no
 libssh2=
+error_backtrace=no
 
 # parse CC options first
 for opt do
@@ -949,6 +950,10 @@ for opt do
   ;;
   --enable-libssh2) libssh2=yes
   ;;
+  --enable-error-bt) error_backtrace=yes
+  ;;
+  --disable-error-bt) error_backtrace=no
+  ;;
   *) echo ERROR: unknown option $opt; show_help=yes
   ;;
   esac
@@ -1168,6 +1173,8 @@ echo   --gcov=GCOV  use specified gcov 
[$gcov_tool]
 echo   --enable-tpm enable TPM support
 echo   --disable-libssh2disable ssh block device support
 echo   --enable-libssh2 enable ssh block device support
+echo   --disable-error-bt   disable backtraces on internal errors
+echo   --enable-error-btenable backtraces on internal errors
 echo 
 echo NOTE: The object files are built at the place where configure is 
launched
 exit 1
@@ -3650,6 +3657,7 @@ echo gcov  $gcov_tool
 echo gcov enabled  $gcov
 echo TPM support   $tpm
 echo libssh2 support   $libssh2
+echo error backtraces  $error_backtrace
 echo TPM passthrough   $tpm_passthrough
 echo QOM debugging $qom_cast_debug
 
@@ -4044,6 +4052,10 @@ if test $libssh2 = yes ; then
   echo CONFIG_LIBSSH2=y  $config_host_mak
 fi
 
+if test $error_backtrace = yes ; then
+  echo CONFIG_ERROR_BACKTRACE=y  $config_host_mak
+fi
+
 if test $virtio_blk_data_plane = yes ; then
   echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)'  $config_host_mak
 fi
diff --git a/include/qapi/error.h b/include/qapi/error.h
index ffd1cea..d3cfe35 100644
--- a/include/qapi/error.h
+++ b/include/qapi/error.h
@@ -25,16 +25,28 @@ typedef struct Error Error;
 /**
  * Set an indirect pointer to an error given a ErrorClass value and a
  * printf-style human message.  This function is not meant to be used outside
- * of QEMU.
+ * of QEMU.  The message will be prepended by the file/function/line 
information
+ * if CONFIG_ERROR_BACKTRACE is defined.
  */
-void error_set(Error **err, ErrorClass err_class, const char *fmt, ...) 
GCC_FMT_ATTR(3, 4);
+void error_set_bt(const char *file, const char *func, int line,
+  Error **err, ErrorClass err_class, const char *fmt, ...)
+GCC_FMT_ATTR(6, 7);
+
+#define error_set(...) \
+error_set_bt(__FILE__, __func__, __LINE__, __VA_ARGS__)
 
 /**
  * Set an indirect pointer to an error given a ErrorClass value and a
  * printf-style human message, followed by a strerror() string if
- * @os_error is not zero.
+ * @os_error is not zero.  The message will be prepended by the
+ * file/function/line information if CONFIG_ERROR_BACKTRACE is defined.
  */
-void error_set_errno(Error **err, int os_error, ErrorClass err_class, const 
char *fmt, ...) GCC_FMT_ATTR(4, 5);
+void error_set_errno_bt(const char *file, const char *func, int line,
+Error **err, int os_error, ErrorClass err_class,
+const char *fmt, ...) GCC_FMT_ATTR(7, 8);
+
+#define error_set_errno(...) \
+error_set_errno_bt(__FILE__, __func__, __LINE__, __VA_ARGS__)
 
 /**
  * Same as error_set(), but sets a generic error
@@ -47,7 +59,11 @@ void error_set_errno(Error **err, int os_error, ErrorClass 
err_class, const char
 /**
  * Helper for open() errors
  */
-void error_setg_file_open(Error **errp, int os_errno, const char *filename);
+void error_setg_file_open_bt(const char *file, const char *func, int line,
+ Error **errp, int os_errno, const char *filename);
+
+#define error_setg_file_open(...) \
+error_setg_file_open_bt(__FILE__, __func__, __LINE__, __VA_ARGS__)
 
 /**
  * Returns true if an indirect pointer to an error is pointing to a valid
@@ -71,11 +87,17 @@ Error *error_copy(const Error *err);
 const char *error_get_pretty(Error *err);
 
 /**
- *

Re: [Qemu-devel] [PATCH V3] xl: HVM domain S3 bugfix

2013-09-09 Thread Ian Campbell

On Mon, 2013-09-09 at 03:29 +, Liu, Jinsong wrote:
 From 18344216b432648605726b137b348f28ef64a4ef Mon Sep 17 00:00:00 2001
 From: Liu Jinsong jinsong@intel.com
 Date: Fri, 23 Aug 2013 23:30:23 +0800
 Subject: [PATCH V3] xl: HVM domain S3 bugfix
 
 Currently Xen hvm s3 has a bug coming from the difference between
 qemu-traditioanl and qemu-xen. For qemu-traditional, the way to

traditional

 resume from hvm s3 is via 'xl trigger' command. However, for
 qemu-xen, the way to resume from hvm s3 inherited from standard
 qemu, i.e. via QMP, and it doesn't work under Xen.
 
 The root cause is, for qemu-xen, 'xl trigger' command didn't reset
 devices, while QMP didn't unpause hvm domain though they did qemu
 system reset.
 
 We have two qemu patches one xl patch to fix the HVM S3 bug:
 This patch is the xl patch. It invokes QMP system_wakeup so that
 qemu logic for hvm s3 could be triggerred.

triggered

 
 Signed-off-by: Liu Jinsong jinsong@intel.com

Acked-by: Ian Campbell ian.campb...@citrix.com

I'll hold off on applying until someone givers me a heads up that the
qemu side is in place. I'll try and remember to fix the commit message
typos as I commit, so no need to resend for those.

 ---
  tools/libxl/libxl.c  |   31 +--
  tools/libxl/libxl_internal.h |2 ++
  tools/libxl/libxl_qmp.c  |5 +
  3 files changed, 36 insertions(+), 2 deletions(-)
 
 diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
 index 81785df..267bc25 100644
 --- a/tools/libxl/libxl.c
 +++ b/tools/libxl/libxl.c
 @@ -4641,10 +4641,37 @@ int libxl_domain_sched_params_get(libxl_ctx *ctx, 
 uint32_t domid,
  return ret;
  }
  
 +static int libxl__domain_s3_resume(libxl__gc *gc, int domid)
 +{
 +int rc = 0;
 +
 +switch (libxl__domain_type(gc, domid)) {
 +case LIBXL_DOMAIN_TYPE_HVM:
 +switch (libxl__device_model_version_running(gc, domid)) {
 +case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
 +rc = xc_set_hvm_param(CTX-xch, domid, HVM_PARAM_ACPI_S_STATE, 
 0);
 +break;
 +case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
 +rc = libxl__qmp_system_wakeup(gc, domid);
 +break;
 +default:
 +rc = ERROR_INVAL;
 +break;
 +}
 +break;
 +default:
 +rc = ERROR_INVAL;
 +break;
 +}
 +
 +return rc;
 +}
 +
  int libxl_send_trigger(libxl_ctx *ctx, uint32_t domid,
 libxl_trigger trigger, uint32_t vcpuid)
  {
  int rc;
 +GC_INIT(ctx);
  
  switch (trigger) {
  case LIBXL_TRIGGER_POWER:
 @@ -4668,8 +4695,7 @@ int libxl_send_trigger(libxl_ctx *ctx, uint32_t domid,
  XEN_DOMCTL_SENDTRIGGER_RESET, vcpuid);
  break;
  case LIBXL_TRIGGER_S3RESUME:
 -xc_set_hvm_param(ctx-xch, domid, HVM_PARAM_ACPI_S_STATE, 0);
 -rc = 0;
 +rc = libxl__domain_s3_resume(gc, domid);
  break;
  default:
  rc = -1;
 @@ -4684,6 +4710,7 @@ int libxl_send_trigger(libxl_ctx *ctx, uint32_t domid,
  rc = ERROR_FAIL;
  }
  
 +GC_FREE;
  return rc;
  }
  
 diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
 index f051d91..0ef0a3a 100644
 --- a/tools/libxl/libxl_internal.h
 +++ b/tools/libxl/libxl_internal.h
 @@ -1408,6 +1408,8 @@ _hidden int libxl__qmp_query_serial(libxl__qmp_handler 
 *qmp);
  _hidden int libxl__qmp_pci_add(libxl__gc *gc, int d, libxl_device_pci 
 *pcidev);
  _hidden int libxl__qmp_pci_del(libxl__gc *gc, int domid,
 libxl_device_pci *pcidev);
 +/* Resume hvm domain */
 +_hidden int libxl__qmp_system_wakeup(libxl__gc *gc, int domid);
  /* Suspend QEMU. */
  _hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
  /* Resume QEMU. */
 diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c
 index f681f3a..3c3b301 100644
 --- a/tools/libxl/libxl_qmp.c
 +++ b/tools/libxl/libxl_qmp.c
 @@ -878,6 +878,11 @@ int libxl__qmp_pci_del(libxl__gc *gc, int domid, 
 libxl_device_pci *pcidev)
  return qmp_device_del(gc, domid, id);
  }
  
 +int libxl__qmp_system_wakeup(libxl__gc *gc, int domid)
 +{
 +return qmp_run_command(gc, domid, system_wakeup, NULL, NULL, NULL);
 +}
 +
  int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename)
  {
  libxl__json_object *args = NULL;

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 03:43:49PM +0300, Marcel Apfelbaum wrote:
 On Mon, 2013-09-09 at 15:23 +0300, Michael S. Tsirkin wrote:
  On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote:
   On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote:
On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
 Created a MemoryRegion with negative priority that
 spans over all the pci address space.
 It intercepts the accesses to unassigned pci
 address space and will follow the pci spec:
  1. returns -1 on read
  2. does nothing on write
  3. sets the RECEIVED MASTER ABORT bit in the STATUS register
 of the device that initiated the transaction
 
 Note: This implementation assumes that all the reads/writes to
 the pci address space are done by the cpu.
 
 Signed-off-by: Marcel Apfelbaum marce...@redhat.com
 ---
 Changes from v1:
  - pci-unassigned-mem MemoryRegion resides now in PCIBus and not on
 various Host Bridges
  - pci-unassgined-mem does not have a .valid.accept field and
 implements read write methods
 
  hw/pci/pci.c | 46 
 ++
  include/hw/pci/pci_bus.h |  1 +
  2 files changed, 47 insertions(+)
 
 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index d00682e..b6a8026 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
  return rootbus-qbus.name;
  }
  
 +static void unassigned_mem_access(PCIBus *bus)
 +{
 +/* FIXME assumption: memory access to the pci address
 + * space is always initiated by the host bridge

/* Always
 * like
 * this
 */

/* Not
 * like this */
   OK
   

 + * (device 0 on the bus) */

Can't we pass the master device in?
We are still left with the assumption that
there's a single master, but at least
we get rid of the assumption that it's
always device 0 function 0.


 +PCIDevice *d = bus-devices[0];
 +if (!d) {
 +return;
 +}
 +
 +pci_word_test_and_set_mask(d-config + PCI_STATUS,
 +   PCI_STATUS_REC_MASTER_ABORT);

Probably should check and set secondary status for
a bridge device.
   I wanted to add the support for bridges later,
   I am struggling with some issues with PCI bridges.
  
  Shouldn't be very different.
 The current implementation uses only one MemoryRegion that spans
 over all pci address space. It catches all the accesses both to
 primary bus addresses (bus 0), or to the regions covered by the bridges.

That's not what I see in code.

pci_bridge_initfn creates address spaces for IO and memory.
Bridge windows are aliases created by pci_bridge_init_alias.
All they do is forward transactions to the secondary bus,
so these address spaces represent secondary bus addressing.

What did I miss?

 The callback ALWAYS receive a pointer to the primary bus.
 
 What would be a better approach?
 1. Remain with a single MemoryRegion and check if the address
belongs to a bridge address range and recursively travel
the bridges tree and update the registers
 2. Model a MemoryRegion for each bridge that represents
the address range behind the bridge (not existing today).

I think this is exactly what address_space_XXX represent.

Add a MemoryRegion child to catch accesses to unassigned
addresses behind the bridge, then recursively travel the
bridges to top and update the registers. 

This bit sounds right.

 
  

 +}
 +
 +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, 
 unsigned size)
 +{
 +PCIBus *bus = opaque;
 +unassigned_mem_access(bus);
 +
 +return -1ULL;
 +}
 +
 +static void unassigned_mem_write(void *opaque, hwaddr addr, uint64_t 
 val,
 +unsigned size)
 +{
 +PCIBus *bus = opaque;
 +unassigned_mem_access(bus);
 +}
 +
 +static const MemoryRegionOps unassigned_mem_ops = {
 +.read = unassigned_mem_read,
 +.write = unassigned_mem_write,
 +.endianness = DEVICE_NATIVE_ENDIAN,
 +};
 +
 +#define UNASSIGNED_MEM_PRIORITY -1
 +
  static void pci_bus_init(PCIBus *bus, DeviceState *parent,
   const char *name,
   MemoryRegion *address_space_mem,


I would rename unassigned to pci_master_Abort_ everywhere.
   Are you sure about the capital A?
  
  Ugh no, that's a typo.
  
   
   For the memory ops I suppose is OK, but the MemoryRegion name
   I think it should remain unassigned_mem; it will be strange
   to name it pci_master_Abort_mem.
  
  Why would it be strange?
 1. Because it states what happens when there is a an access to

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 02:02:47PM +0100, Peter Maydell wrote:
 On 9 September 2013 13:59, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Sep 09, 2013 at 01:52:11PM +0100, Peter Maydell wrote:
  It would be conceptually nicer not to treat host bridges as
  a special case but instead to just report the abort back
  to whatever the PCI master was (which might be a device
  doing DMA). That might be a lot of effort though.
 
  Yes. As a shortcut, what I suggest is registering the
  device that wants to be notified of master aborts with
  the bus.
 
 Can you just pick the device which is (a subclass of)
 TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
 aren't using that class?
 
 -- PMM

Not anymore I think. So yes, I think this will work.

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 04:07:53PM +0300, Marcel Apfelbaum wrote:
 On Mon, 2013-09-09 at 13:52 +0100, Peter Maydell wrote:
  On 9 September 2013 13:43, Marcel Apfelbaum marce...@redhat.com wrote:
   The scenario is covered only for the primary bus and not for buses
   behind the PCI bridge (the later being handled differently.)
   In this case, isn't the Host Bridge always device 00.0?
  
  No. For instance the host controller may pass a nonzero
  value of devfn_min to pci_bus_new/pci_register_bus (in
  which case the host bridge will end up there; example
  hw/pci-host/versatile.c) or it might just pass a nonzero
  devfn to pci_create_simple when it creates the host controller
  PCI device on the bus (I don't think we have anything that
  does it that way, though).
 If my assumption is not generally true, I will not use it
 Thanks!
 
  
   If not, can we find a way to scan all bus devices and find
   the host bridge so we will not have to manually add it to each
   host bridge?
  
  It would be conceptually nicer not to treat host bridges as
  a special case but instead to just report the abort back
  to whatever the PCI master was (which might be a device
  doing DMA). That might be a lot of effort though.
 This is exactly my point. ALL device on the bus can be masters
 of a DMA transaction. So adding an interface as suggested by
 Michael: pci_set_master_for_master_abort(PCIBus *, PCIDevice *)
 for the general case (a device doing DMA) it is too far from reality.
 
 However, if we don't want the master device far all accesses
 (including DMA) but only for accesses to pci address space,
 in this specific case we can appoint the HostBridge
 (explicitly/implicitly) as the master device because most
 devices do not access pci address space of other devices.
 
 As a conclusion, should we call the API
 pci_set_master_for_master_abort_on_pci_space ?
 Marcel

I like Peter's idea of detecting a host bridge
implictly using device type.
With a big FIXME explaining that we shouldn't
need to do this.


 
 
 
  
  -- PMM

Re: [Qemu-devel] [PATCH 1/2] alpha-linux-user: Fix umount syscall numbers

2013-09-09 Thread Riku Voipio

Hi,

On Mon, Aug 26, 2013 at 01:26:11PM -0700, Richard Henderson wrote:
 Ping.

Sorry for the delay, adding it to the next pull request.

Riku

 On 08/16/2013 11:24 PM, Richard Henderson wrote:
  Ping.
  
  r~
  
  
  On 07/24/2013 12:50 PM, Richard Henderson wrote:
  It has been pointed out on LKML that the alpha umount syscall numbers
  are named wrong, and a patch to rectify that has been posted for 3.11.
 
  Glibc works around this by treating NR_umount as NR_umount2 if
  NR_oldumount exists.  That's more complicated than we need in QEMU,
  given that we control linux-user/*/syscall_nr.h.
 
  This is the last instance of TARGET_NR_oldumount, so delete that from
  the strace.list.
 
  Signed-off-by: Richard Henderson r...@twiddle.net
  ---
   linux-user/alpha/syscall_nr.h | 4 ++--
   linux-user/strace.list| 3 ---
   linux-user/syscall.c  | 2 +-
   3 files changed, 3 insertions(+), 6 deletions(-)
 
  diff --git a/linux-user/alpha/syscall_nr.h b/linux-user/alpha/syscall_nr.h
  index ac2b6e2..d52d76e 100644
  --- a/linux-user/alpha/syscall_nr.h
  +++ b/linux-user/alpha/syscall_nr.h
  @@ -20,7 +20,7 @@
   #define TARGET_NR_lseek19
   #define TARGET_NR_getxpid  20
   #define TARGET_NR_osf_mount21
  -#define TARGET_NR_umount   22
  +#define TARGET_NR_umount2  22
   #define TARGET_NR_setuid   23
   #define TARGET_NR_getxuid  24
   #define TARGET_NR_exec_with_loader 25 /* not implemented */
  @@ -255,7 +255,7 @@
   #define TARGET_NR_sysinfo 318
   #define TARGET_NR__sysctl 319
   /* 320 was sys_idle.  */
  -#define TARGET_NR_oldumount   321
  +#define TARGET_NR_umount  321
   #define TARGET_NR_swapon  322
   #define TARGET_NR_times   323
   #define TARGET_NR_personality 324
  diff --git a/linux-user/strace.list b/linux-user/strace.list
  index 08f115d..4f9c364 100644
  --- a/linux-user/strace.list
  +++ b/linux-user/strace.list
  @@ -612,9 +612,6 @@
   #ifdef TARGET_NR_oldstat
   { TARGET_NR_oldstat, oldstat , NULL, NULL, NULL },
   #endif
  -#ifdef TARGET_NR_oldumount
  -{ TARGET_NR_oldumount, oldumount , NULL, NULL, NULL },
  -#endif
   #ifdef TARGET_NR_olduname
   { TARGET_NR_olduname, olduname , NULL, NULL, NULL },
   #endif
  diff --git a/linux-user/syscall.c b/linux-user/syscall.c
  index 00a0390..e42c20e 100644
  --- a/linux-user/syscall.c
  +++ b/linux-user/syscall.c
  @@ -5719,7 +5719,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
  arg1,
   unlock_user(p, arg1, 0);
   }
   break;
  -#ifdef TARGET_NR_umount2 /* not on alpha */
  +#ifdef TARGET_NR_umount2
   case TARGET_NR_umount2:
   if (!(p = lock_user_string(arg1)))
   goto efault;

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 14:39 +0100, Peter Maydell wrote:
 On 9 September 2013 14:29, Marcel Apfelbaum marce...@redhat.com wrote:
  My issue is that we have at least 2 ways to model the bridges:
  1. TYPE_PCI_HOST_BRIDGE
 * derives from TYPE_SYS_BUS_DEVICE
 * has a bus
 * one of the bus devices is a TYPE_I440FX_PCI_DEVICE which
  derives from TYPE_PCI_DEVICE
  2. TYPE_PCIE_HOST_BRIDGE
 * derives from TYPE_PCI_HOST_BRIDGE which derives
   from TYPE_SYS_BUS_DEVICE
 * has a PciDevice and register it to the bus in order
   to work as (1)
 
 So I think there are two different (and slightly confusing)
 things here:
  (1) the model of the host's PCI controller; this is
  typically derived from TYPE_SYS_BUS_DEVICE somehow
  but I guess in theory need not be; often it's a
  TYPE_PCI_HOST_BRIDGE or TYPE_PCIE_HOST_BRIDGE.
  (2) the PCI device which sits on the PCI bus and is
  visible to the guest, usually with a PCI class ID
  of PCI_CLASS_BRIDGE_HOST (but not necessarily; versatile
  is different, for instance). Currently we generally
  make these direct subclasses of TYPE_PCI_DEVICE.
 
 [The naming convention confusion arises because
 different controllers are inconsistent about how
 they name these classes and which type name ends up
 with 'host' in it.]
 
 What you're going to get in the callback is (2)...
This I come to understand, thanks!
 
  I would like to implement an hierarchy that will allow
  all the host bridge devices to have a common ancestor
  In this was, we can scan the PCI bus to look for
  master...
 
 ...and yes, we could insert an extra class and make
 all those bridge hosts derive from it rather than
 directly from TYPE_PCI_DEVICE if we needed to.
And we solve the main issue - no need for an API for master abort
- for upstream: use your suggestion (let's hope it works)
- for downstream: look on the bus for a device deriving
  from this class and *this device* will handle the
  master abort
I'll find a way how to handle the PCI bridges.

By the way, I am not sure that the upstream transactions (DMA)
can actually end with a master abort. Master abort would happen
if a transaction will not be claimed by any device on the bus.
But in DMA transactions, maybe the host bridge will always claim it
and return with error. Does anybody know something about this?

Thanks for all the help!
Marcel




 
 -- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Michael S. Tsirkin

On Mon, Sep 09, 2013 at 04:29:04PM +0300, Marcel Apfelbaum wrote:
 On Mon, 2013-09-09 at 14:19 +0100, Peter Maydell wrote:
  On 9 September 2013 14:15, Marcel Apfelbaum marce...@redhat.com wrote:
   On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote:
   Can you just pick the device which is (a subclass of)
   TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
   aren't using that class?
   This is what I would really want to do, but some HOST Bridge devices
   inherit directly from PCI_DEVICE.
  
   TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which
   is a not a PCI device and does not help us here (not a PCI_DEVICE
   on the bus)
  
  Oops, yes, I get those two the wrong way round a lot. Anyway,
  if we need to make all host bridges have a common subclass
  we could certainly refactor them accordingly.
  
   Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE
   and it hold as composition a PCIDevice that will be part of
   the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly
   inherits from PCI_DEVICE.
  
  This may just be wrong choice of name rather than actually
  wrong hierarchy.
 I try not to judge the naming convention, so let's leave it
 aside for now.
 My issue is that we have at least 2 ways to model the bridges:
 1. TYPE_PCI_HOST_BRIDGE
* derives from TYPE_SYS_BUS_DEVICE
* has a bus
* one of the bus devices is a TYPE_I440FX_PCI_DEVICE which
 derives from TYPE_PCI_DEVICE
 2. TYPE_PCIE_HOST_BRIDGE
* derives from TYPE_PCI_HOST_BRIDGE which derives
  from TYPE_SYS_BUS_DEVICE
* has a PciDevice and register it to the bus in order
  to work as (1)
 
 I would like to implement an hierarchy that will allow
 all the host bridge devices to have a common ancestor
 In this was, we can scan the PCI bus to look for
 master...

I wouldn't object to is_host is stuct PCIDeviceClass.
That's probably easier that trying to create
a common class for devices that share no common code.


 
  
  -- PMM

Re: [Qemu-devel] [PATCH 02/16] vl: set default ram_size during variable initialization

2013-09-09 Thread Igor Mammedov

On Fri, 02 Aug 2013 22:33:24 +0200
Andreas Färber afaer...@suse.de wrote:

 Am 23.07.2013 18:22, schrieb Igor Mammedov:
  Signed-off-by: Igor Mammedov imamm...@redhat.com
  ---
   vl.c |7 +--
   1 files changed, 1 insertions(+), 6 deletions(-)
  
  diff --git a/vl.c b/vl.c
  index 8190504..bf0c658 100644
  --- a/vl.c
  +++ b/vl.c
  @@ -2947,7 +2947,7 @@ int main(int argc, char **argv, char **envp)
   module_call_init(MODULE_INIT_MACHINE);
   machine = find_default_machine();
   cpu_model = NULL;
  -ram_size = 0;
  +ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
   snapshot = 0;
   cyls = heads = secs = 0;
   translation = BIOS_ATA_TRANSLATION_AUTO;
  @@ -4064,11 +4064,6 @@ int main(int argc, char **argv, char **envp)
   exit(1);
   }
   
  -/* init the memory */
  -if (ram_size == 0) {
  -ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
  -}
  -
   if (qemu_opts_foreach(qemu_find_opts(device), device_help_func, 
  NULL, 0)
   != 0) {
   exit(0);
 
 Commit message doesn't give any explanation why?
it was intended as cleanup

 
 What happens with -m 0? My guess is the old code translates that to the
 default size, where by intializing the default earlier it would stay.
patch is broken in this aspect. It aborts on start up with -m 0

The question is if -m 0 is correct value, perhaps QEMU should exit with
error message in this case, instead of silent fallback to default?

 
 Andreas

Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements

2013-09-09 Thread Richard Henderson

On 09/09/2013 01:13 AM, Claudio Fontana wrote:
 after carefully reading and testing your patches, this is how I suggest to 
 proceed: 
 
 first do the implementation of the new functionality (tcg opcodes, jit) in a 
 way that is consistent with the existing code.
 No type changes, no refactoring, no beautification.
 
 Once we agree on those, introduce the meaningful restructuring you want to do,
 like the new INSN type, the don't handle mov/movi in tcg_out_op, the 
 TCG_OPF_64BIT thing, etc.
 
 Last do the cosmetic stuff if you really want to do it, like the change all 
 ext to bool (note that there is no point if the callers still use 1 and 
 0: adapt them as well) etc.

No, I don't agree.  Especially with respect to the insn type.

I'd much rather do all the cosmetic stuff, as you put it, first.  It makes
all of the real changes much easier to understand.


r~

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 16:58 +0300, Michael S. Tsirkin wrote:
 On Mon, Sep 09, 2013 at 03:43:49PM +0300, Marcel Apfelbaum wrote:
  On Mon, 2013-09-09 at 15:23 +0300, Michael S. Tsirkin wrote:
   On Mon, Sep 09, 2013 at 03:11:55PM +0300, Marcel Apfelbaum wrote:
On Mon, 2013-09-09 at 14:40 +0300, Michael S. Tsirkin wrote:
 On Mon, Sep 09, 2013 at 02:11:54PM +0300, Marcel Apfelbaum wrote:
  Created a MemoryRegion with negative priority that
  spans over all the pci address space.
  It intercepts the accesses to unassigned pci
  address space and will follow the pci spec:
   1. returns -1 on read
   2. does nothing on write
   3. sets the RECEIVED MASTER ABORT bit in the STATUS register
  of the device that initiated the transaction
  
  Note: This implementation assumes that all the reads/writes to
  the pci address space are done by the cpu.
  
  Signed-off-by: Marcel Apfelbaum marce...@redhat.com
  ---
  Changes from v1:
   - pci-unassigned-mem MemoryRegion resides now in PCIBus and not 
  on
  various Host Bridges
   - pci-unassgined-mem does not have a .valid.accept field and
  implements read write methods
  
   hw/pci/pci.c | 46 
  ++
   include/hw/pci/pci_bus.h |  1 +
   2 files changed, 47 insertions(+)
  
  diff --git a/hw/pci/pci.c b/hw/pci/pci.c
  index d00682e..b6a8026 100644
  --- a/hw/pci/pci.c
  +++ b/hw/pci/pci.c
  @@ -283,6 +283,43 @@ const char *pci_root_bus_path(PCIDevice *dev)
   return rootbus-qbus.name;
   }
   
  +static void unassigned_mem_access(PCIBus *bus)
  +{
  +/* FIXME assumption: memory access to the pci address
  + * space is always initiated by the host bridge
 
 /* Always
  * like
  * this
  */
 
 /* Not
  * like this */
OK

 
  + * (device 0 on the bus) */
 
 Can't we pass the master device in?
 We are still left with the assumption that
 there's a single master, but at least
 we get rid of the assumption that it's
 always device 0 function 0.
 
 
  +PCIDevice *d = bus-devices[0];
  +if (!d) {
  +return;
  +}
  +
  +pci_word_test_and_set_mask(d-config + PCI_STATUS,
  +   PCI_STATUS_REC_MASTER_ABORT);
 
 Probably should check and set secondary status for
 a bridge device.
I wanted to add the support for bridges later,
I am struggling with some issues with PCI bridges.
   
   Shouldn't be very different.
  The current implementation uses only one MemoryRegion that spans
  over all pci address space. It catches all the accesses both to
  primary bus addresses (bus 0), or to the regions covered by the bridges.
 
 That's not what I see in code.
 
 pci_bridge_initfn creates address spaces for IO and memory.
 Bridge windows are aliases created by pci_bridge_init_alias.
 All they do is forward transactions to the secondary bus,
 so these address spaces represent secondary bus addressing.
 
 What did I miss?
I was referring to *my* implementation of unassigned mem region,
not to pci bridge's regions.
 
  The callback ALWAYS receive a pointer to the primary bus.
  
  What would be a better approach?
  1. Remain with a single MemoryRegion and check if the address
 belongs to a bridge address range and recursively travel
 the bridges tree and update the registers
  2. Model a MemoryRegion for each bridge that represents
 the address range behind the bridge (not existing today).
 
 I think this is exactly what address_space_XXX represent.
Today I see (using info mtree) under the bridge:
 1. A  container memory region for all the space 0 - 2^64
 2. Regions for the devices under the bridge
I do not see a region corresponding with the bridge memory region
that would be forwarded to the secondary bus.

Thanks,
Marcel

  
 Add a MemoryRegion child to catch accesses to unassigned
 addresses behind the bridge, then recursively travel the
 bridges to top and update the registers. 
 
 This bit sounds right.
 
  
   
 
  +}
  +
  +static uint64_t unassigned_mem_read(void *opaque, hwaddr addr, 
  unsigned size)
  +{
  +PCIBus *bus = opaque;
  +unassigned_mem_access(bus);
  +
  +return -1ULL;
  +}
  +
  +static void unassigned_mem_write(void *opaque, hwaddr addr, 
  uint64_t val,
  +unsigned size)
  +{
  +PCIBus *bus = opaque;
  +unassigned_mem_access(bus);
  +}
  +
  +static const MemoryRegionOps unassigned_mem_ops = {
  +.read = unassigned_mem_read,
  +.write = unassigned_mem_write,
  +.endianness = DEVICE_NATIVE_ENDIAN,
  +};
  +
  +#define UNASSIGNED_MEM_PRIORITY -1
  +

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Marcel Apfelbaum

On Mon, 2013-09-09 at 17:04 +0300, Michael S. Tsirkin wrote:
 On Mon, Sep 09, 2013 at 04:29:04PM +0300, Marcel Apfelbaum wrote:
  On Mon, 2013-09-09 at 14:19 +0100, Peter Maydell wrote:
   On 9 September 2013 14:15, Marcel Apfelbaum marce...@redhat.com wrote:
On Mon, 2013-09-09 at 14:02 +0100, Peter Maydell wrote:
Can you just pick the device which is (a subclass of)
TYPE_PCI_HOST_BRIDGE, or do we have host bridges which
aren't using that class?
This is what I would really want to do, but some HOST Bridge devices
inherit directly from PCI_DEVICE.
   
TYPE_PCI_HOST_BRIDGE derives from TYPE_SYS_BUS_DEVICE which
is a not a PCI device and does not help us here (not a PCI_DEVICE
on the bus)
   
   Oops, yes, I get those two the wrong way round a lot. Anyway,
   if we need to make all host bridges have a common subclass
   we could certainly refactor them accordingly.
   
Strangely TYPE_Q35_HOST_DEVICE derives from TYPE_SYS_BUS_DEVICE
and it hold as composition a PCIDevice that will be part of
the bus, as opposed to TYPE_I440FX_PCI_DEVICE which directly
inherits from PCI_DEVICE.
   
   This may just be wrong choice of name rather than actually
   wrong hierarchy.
  I try not to judge the naming convention, so let's leave it
  aside for now.
  My issue is that we have at least 2 ways to model the bridges:
  1. TYPE_PCI_HOST_BRIDGE
 * derives from TYPE_SYS_BUS_DEVICE
 * has a bus
 * one of the bus devices is a TYPE_I440FX_PCI_DEVICE which
  derives from TYPE_PCI_DEVICE
  2. TYPE_PCIE_HOST_BRIDGE
 * derives from TYPE_PCI_HOST_BRIDGE which derives
   from TYPE_SYS_BUS_DEVICE
 * has a PciDevice and register it to the bus in order
   to work as (1)
  
  I would like to implement an hierarchy that will allow
  all the host bridge devices to have a common ancestor
  In this was, we can scan the PCI bus to look for
  master...
 
 I wouldn't object to is_host is stuct PCIDeviceClass.
 That's probably easier that trying to create
 a common class for devices that share no common code.
This will work too, and less changes to the code.
However Peter suggested that we can unify both upstream
and downstream handling of the master abort by putting
it all in this class.
But if we have is_host flag, we can differentiate
regular devices from host devices and handle them different.

If there are no objections, I will chose this path.
Thanks!
Marcel




 
 
  
   
   -- PMM

Re: [Qemu-devel] [PATCH RFC v2 2/2] hw/pci: handle unassigned pci addresses

2013-09-09 Thread Peter Maydell

On 9 September 2013 15:04, Marcel Apfelbaum marce...@redhat.com wrote:
 By the way, I am not sure that the upstream transactions (DMA)
 can actually end with a master abort. Master abort would happen
 if a transaction will not be claimed by any device on the bus.
 But in DMA transactions, maybe the host bridge will always claim it
 and return with error. Does anybody know something about this?

No, it's perfectly possible for a bus master transaction
to abort. The PC's host controller happens to be set up so
that bus master DMA covers the whole of the PCI memory space
and so it's probably not possible to get an abort on that
platform, but this isn't necessarily the case. For instance
the versatilePB's PCI controller only responds to accesses
within its programmed MMIO BAR ranges, so if the device
or the controller have been misconfigured you can get an
abort when the device tries to do DMA. (This usually causes
the device to decide something has gone seriously wrong.
For instance an EHCI USB controller device will stop
and set the STS_FATAL bit in its status register:
http://lxr.free-electrons.com/source/include/linux/usb/ehci_def.h#L96

If we wanted to implement this correctly we would need
to be able to return an access succeeded/failed indicator
from dma_memory_write(): check hw/usb/hcd-ehci.c:get_dwords()
(which already has the logic for whoops, DMA failed
for the extremely simple case of no DMA address space).

-- PMM

1 2 3 >

1 - 100 of 213 matches

Mail list logo