date:20130723

Re: [Qemu-devel] [qom-cpu PATCH 2/2] i386: disable PMU passthrough mode by default

2013-07-23 Thread Igor Mammedov

On Mon, 22 Jul 2013 16:25:35 -0300
Eduardo Habkost ehabk...@redhat.com wrote:

 Bug description: QEMU currently gets all bits from GET_SUPPORTED_CPUID
 for CPUID leaf 0xA and passes them directly to the guest. This makes
 the guest ABI depend on host kernel and host CPU capabilities, and
 breaks live migration if we migrate between host with different
 capabilities (e.g. different number of PMU counters).
 
 This patch adds a pmu-passthrough property to X86CPU, and set it to
 true only on -cpu host, or on pc-*-1.5 and older machine-types.
 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 ---
  include/hw/i386/pc.h  |  4 
  target-i386/cpu-qom.h |  7 +++
  target-i386/cpu.c | 11 ++-
  3 files changed, 21 insertions(+), 1 deletion(-)
 
 diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
 index 7fb97b0..3cea83f 100644
 --- a/include/hw/i386/pc.h
 +++ b/include/hw/i386/pc.h
 @@ -235,6 +235,10 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t);
  .driver   = virtio-net-pci,\
  .property = any_layout,\
  .value= off,\
 +},{\
 +.driver = TYPE_X86_CPU,\
 +.property = pmu-passthrough,\
 +.value = on,\
  }
  
  #define PC_COMPAT_1_4 \
 diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
 index 7e55e5f..b505a45 100644
 --- a/target-i386/cpu-qom.h
 +++ b/target-i386/cpu-qom.h
 @@ -68,6 +68,13 @@ typedef struct X86CPU {
  
  /* Features that were filtered out because of missing host capabilities 
 */
  uint32_t filtered_features[FEATURE_WORDS];
 +
 +/* Pass all PMU CPUID bits to the guest directly from 
 GET_SUPPORTED_CPUID.
 + * This can't be enabled by default because it breaks live-migration,
 + * as it makes the guest ABI change depending on host CPU/kernel
 + * capabilities.
 + */
 +bool pmu_passthrough;
  } X86CPU;
  
  static inline X86CPU *x86_env_get_cpu(CPUX86State *env)
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 41c81af..e192f63 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -1475,17 +1475,25 @@ static void x86_cpu_get_feature_words(Object *obj, 
 Visitor *v, void *opaque,
  error_propagate(errp, err);
  }
  
 +static Property cpu_x86_properties[] = {
 +DEFINE_PROP_BOOL(pmu-passthrough, X86CPU, pmu_passthrough, false),
 +DEFINE_PROP_END_OF_LIST(),
 +};
 +
  static int cpu_x86_find_by_name(X86CPU *cpu, x86_def_t *x86_cpu_def,
  const char *name)
  {
  x86_def_t *def;
  int i;
 +Error *err = NULL;
  
  if (name == NULL) {
  return -1;
  }
  if (kvm_enabled()  strcmp(name, host) == 0) {
  kvm_cpu_fill_host(x86_cpu_def);
 +object_property_set_bool(OBJECT(cpu), true, pmu-passthrough, err);
 +assert_no_error(err);
Could this hunk be implemented using compat props?
That would spare us dealing with it later.

  return 0;
  }
  
 @@ -2017,7 +2025,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
 uint32_t count,
  break;
  case 0xA:
  /* Architectural Performance Monitoring Leaf */
 -if (kvm_enabled()) {
 +if (kvm_enabled()  cpu-pmu_passthrough) {
  KVMState *s = cs-kvm_state;
  
  *eax = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EAX);
 @@ -2516,6 +2524,7 @@ static void x86_cpu_common_class_init(ObjectClass *oc, 
 void *data)
  xcc-parent_realize = dc-realize;
  dc-realize = x86_cpu_realizefn;
  dc-bus_type = TYPE_ICC_BUS;
 +dc-props = cpu_x86_properties;
  
  xcc-parent_reset = cc-reset;
  cc-reset = x86_cpu_reset;

Re: [Qemu-devel] Emulating mips

2013-07-23 Thread Renich Bon Ciric

On Tue, Jul 23, 2013 at 12:41 AM, Rob Landley r...@landley.net wrote:
 On 07/23/2013 12:16:53 AM, Renich Bon Ciric wrote:

 Hello,

 I am new to this...

 I'm trying to run some rom file I got from a client. It's a sc2005
 processor; supposedly compatible with 4k.

 Anyway, I do this:

 qemu-system-mips -M mips -pflash 301-3100\ -\ user\ specified\ -\
 Full.bin -serial stdio

 The processor goes to 100% but I see nothing, not in the serial
 console nor in the window (monitor, maybe?)

 I'd appreciate some tips


 I have working mips images at

 http://landley.net/aboriginal/bin/system-image-mips.tar.bz2

 Grab that extract it, and ./run-emulator.sh.

 That should let you know what working looks like, and if you can dig a
 chroot or loopback mount out of your rom image, you can probably mount it
 under there and try running the binaries.

 Rob

Ah, I've noticed on the kernel output that it's configured for a 24kc
and my rom is desinged for a 4k compatible cpu. I think I should
recreate the vmlinux right?

Anyway, it totally stops when it seems not able to find the right
partition for the rom:

Linux version 3.10.0 (landley@driftwood) (collect2: ld returned 1 exit
status) #1 Wed Jul 3 00:54:09 CDT 2013
bootconsole [early0] enabled
CPU revision is: 00019300 (MIPS 24Kc)
FPU revision is: 00739300
Software DMA cache coherency enabled
Determined physical RAM map:
 memory: 1000 @  (reserved)
 memory: 000ef000 @ 1000 (ROM data)
 memory: 0038f000 @ 000f (reserved)
 memory: 07b8 @ 0047f000 (usable)
Wasting 36832 bytes for tracking 1151 unused pages
Zone ranges:
  DMA  [mem 0x-0x00ff]
  Normal   [mem 0x0100-0x07ffefff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x-0x07ffefff]
Primary instruction cache 2kB, VIPT, 2-way, linesize 16 bytes.
Primary data cache 2kB, 2-way, VIPT, no aliases, linesize 16 bytes
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32511
Kernel command line: root=/dev/hda rw console=ttyS0 HOST=mips
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Writing ErrCtl register=
Readback ErrCtl register=
Memory: 125296k/126464k available (2596k kernel code, 1168k reserved,
606k data, 188k init, 0k highmem)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
NR_IRQS:256
CPU frequency 200.00 MHz
Console: colour dummy device 80x25
Calibrating delay loop... 970.75 BogoMIPS (lpj=1941504)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
devtmpfs: initialized
NET: Registered protocol family 16
bio: create slab bio-0 at 0
vgaarb: loaded
SCSI subsystem initialized
PCI host bridge to bus :00
pci_bus :00: root bus resource [mem 0x1000-0x17ff]
pci_bus :00: root bus resource [io  0x2000-0x1f]
pci_bus :00: No busn resource found for root bus, will use [bus 00-ff]
pci :00:0a.3: no compatible bridge window for [io  0x1100-0x110f]
vgaarb: device added: PCI::00:12.0,decodes=io+mem,owns=none,locks=none
pci :00:0a.3: BAR 8: [io  0x1100-0x110f] has bogus alignment
pci :00:12.0: BAR 0: assigned [mem 0x1000-0x11ff pref]
pci :00:0b.0: BAR 6: assigned [mem 0x1200-0x1201 pref]
pci :00:12.0: BAR 6: assigned [mem 0x1202-0x1202 pref]
pci :00:12.0: BAR 1: assigned [mem 0x1203-0x12030fff]
pci :00:0a.2: BAR 4: assigned [io  0x2000-0x201f]
pci :00:0b.0: BAR 0: assigned [io  0x2020-0x203f]
pci :00:0b.0: BAR 1: assigned [mem 0x12031000-0x1203101f]
pci :00:0a.1: BAR 4: assigned [io  0x2040-0x204f]
Switching to clocksource pit
NET: Registered protocol family 2
TCP established hash table entries: 1024 (order: 1, 8192 bytes)
TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
TCP: Hash tables configured (established 1024 bind 1024)
TCP: reno registered
UDP hash table entries: 256 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
NET: Registered protocol family 1
PCI: Enabling device :00:0a.2 ( - 0001)
squashfs: version 4.0 (2009/01/31) Phillip Lougher
9p: Installing v9fs 9p2000 file system support
msgmni has been set to 244
io scheduler noop registered (default)
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
serial8250.0: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
console [ttyS0] enabled, bootconsole disabled
console [ttyS0] enabled, bootconsole disabled
serial8250.0: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
loop: module loaded
Uniform Multi-Platform E-IDE driver
piix :00:0a.1: IDE controller (0x8086:0x7111 rev 0x00)
PCI: Enabling device :00:0a.1 ( - 0001)
piix :00:0a.1: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x2040-0x2047
ide1: BM-DMA at 0x2048-0x204f
hda: QEMU HARDDISK, ATA DISK drive
hda: UDMA/33 mode selected
hdc: QEMU DVD-ROM, ATAPI CD/DVD-ROM drive
hdc: UDMA/33 mode

Re: [Qemu-devel] [PATCH qom-cpu] HACKING: Document vaddr type usage

2013-07-23 Thread Paolo Bonzini

Il 22/07/2013 19:27, Peter Maydell ha scritto:
 On 22 July 2013 17:36, Andreas Färber afaer...@suse.de wrote:
 Signed-off-by: Andreas Färber afaer...@suse.de
 ---
  HACKING | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/HACKING b/HACKING
 index e73ac79..d9dbb46 100644
 --- a/HACKING
 +++ b/HACKING
 @@ -42,6 +42,7 @@ ram_addr_t.

  Use target_ulong (or abi_ulong) for CPU virtual addresses, however
  devices should not need to use target_ulong.
 +Use vaddr for CPU virtual addresses in target-independent code.
 
 Here's my suggestion for this paragraph (ie to replace
 both the Use target_ulong... and Use vaddr sentences
 above):
 
 ===begin===
 For CPU virtual addresses there are several possible types.
 vaddr is the best type to use to hold a CPU virtual address
 in target-independent code, including most devices. It is
 guaranteed to be large enough to hold a virtual address for
 any target, and it does not change size from target to target.
 It is always unsigned.
 target_ulong is a type the size of a virtual address on the CPU;
 this means it may be 32 or 64 bits depending on which target
 is being built. It should therefore be used only in target
 specific code, and in some performance-critical built-per-target
 core code such as the TLB code. There is also a signed version,
 target_long.
 abi_ulong is for the *-user targets, and represents a type the
 size of 'void *' in that target's ABI. (This may not be the same
 as the size of a full CPU virtual address in the case of target
 ABIs which use 32 bit pointers on 64 bit CPUs, like sparc32plus.)
 Definitions of structures that must match the target's ABI
 must use this type for anything that on the target is defined
 to be an 'unsigned long' or a pointer type. There is also a
 signed version, abi_long.
 ===endit===
 
 (cc'ing Paolo to check I didn't mangle the abi_ulong/target_ulong
 distinction.)

Yes, it's fine.  You might add that abi_short, abi_int, etc. also exist,
and that they have the same alignment as short/int/etc in the target ABI.

There is one nit: tn fact abi_ulong has the size of 'long'---which is
the same as 'void *', but only because our *-user targets are all ILP32
or LP64.  A hypotectical windows-user target would make abi_ullong the
size of 'void *'.

Paolo

[Qemu-devel] KVM call agenda for 2013-07-30

2013-07-23 Thread Juan Quintela


Hi

Please, send any topic that you are interested in covering.

Thanks, Juan.

PD.  If you want to attend and you don't have the call details,
  contact me.

Re: [Qemu-devel] trim in windows guest witch virtio

2013-07-23 Thread Paolo Bonzini

Il 23/07/2013 03:05, Libaiqing ha scritto:
 Hi paolo,
 Recently I test trim function,and it works well in linux guest with ext4 
 fs.
 
 How to test it in windows guest? I got some info like this: 
   1 windows7 can send discard command when the storage device is ssd;
   2 find a tool like 'fstrim', 'TRIM' the volume manually.

I think it only works with IDE and AHCI on Windows.  You need a filter
driver to send it on SCSI disks.

Paolo

Re: [Qemu-devel] [PATCH v2 00/11] Point-in-time snapshot exporting over NBD

2013-07-23 Thread Paolo Bonzini

Il 23/07/2013 03:52, Wenchao Xia ha scritto:
 Dirty change tracking is what miss for a full solution, which
  naturally exist in backing chain snapshot.
 
   Any one is doing dirty tracking feature? If no, I'd like to draft a
 version based on add-cow.

I had started it, I can dig it up.  Remind me if I don't write about it
in a couple of days.

Paolo

Re: [Qemu-devel] [PATCH] Convert stderr message calling error_get_pretty() to error_report() to prepend timestamp

2013-07-23 Thread Markus Armbruster

Andreas Färber afaer...@suse.de writes:

 Am 22.07.2013 23:03, schrieb Seiji Aguchi:
 Convert stderr messages calling error_get_pretty()
 to error_report().
 
 Timestamp is prepended by -msg timstamp option with it.
 
 This is suggested by Luiz Capitulino.
 
 http://marc.info/?l=qemu-develm=137331442628866w=2
 
 Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
 ---
  arch_init.c |4 ++--
  hw/char/serial.c|5 +++--
  hw/i386/pc.c|6 +++---
  qemu-char.c |2 +-
  target-i386/cpu.c   |2 +-
  target-ppc/translate_init.c |3 ++-
  vl.c|9 +
  7 files changed, 17 insertions(+), 14 deletions(-)

 How is this related to error_get_pretty()?

 And does error_report() expect a trailing \n or not? I would assume no,
 but spotted some in this patch.

You're correct, and the patch needs fixing.  See commits

312fd5f error: Strip trailing '\n' from error string arguments (again)
be62a2e Strip trailing '\n' from error_report()'s first argument (again)
6daf194 Strip trailing '\n' from error_report()'s first argument

[...]

Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types

2013-07-23 Thread Michael S. Tsirkin

On Mon, Jul 22, 2013 at 03:29:46PM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Sun, Jul 21, 2013 at 04:09:04PM +0200, Andreas Färber wrote:
  Signed-off-by: Andreas Färber afaer...@suse.de
  ---
   hw/pci-bridge/ioh3420.c| 23 ++-
   hw/pci-bridge/xio3130_downstream.c | 23 ++-
   hw/pci-bridge/xio3130_upstream.c   | 15 +++
   hw/pci/pcie_port.c | 22 ++
   include/hw/pci/pcie_port.h | 14 --
   5 files changed, 61 insertions(+), 36 deletions(-)
  
  diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
  index 728f658..83db054 100644
  --- a/hw/pci-bridge/ioh3420.c
  +++ b/hw/pci-bridge/ioh3420.c
  @@ -92,9 +92,8 @@ static void ioh3420_reset(DeviceState *qdev)
   
   static int ioh3420_initfn(PCIDevice *d)
   {
  -PCIBridge *br = PCI_BRIDGE(d);
  -PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
  -PCIESlot *s = DO_UPCAST(PCIESlot, port, p);
  +PCIEPort *p = PCIE_PORT(d);
  +PCIESlot *s = PCIE_SLOT(d);
   int rc;
   
   rc = pci_bridge_initfn(d, TYPE_PCIE_BUS);
  @@ -148,9 +147,7 @@ err_bridge:
   
   static void ioh3420_exitfn(PCIDevice *d)
   {
  -PCIBridge *br = PCI_BRIDGE(d);
  -PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
  -PCIESlot *s = DO_UPCAST(PCIESlot, port, p);
  +PCIESlot *s = PCIE_SLOT(d);
   
   pcie_aer_exit(d);
   pcie_chassis_del_slot(s);
  @@ -180,7 +177,7 @@ PCIESlot *ioh3420_init(PCIBus *bus, int devfn, bool 
  multifunction,
   qdev_prop_set_uint16(qdev, slot, slot);
   qdev_init_nofail(qdev);
   
  -return DO_UPCAST(PCIESlot, port, DO_UPCAST(PCIEPort, br, br));
  +return PCIE_SLOT(d);
   }
   
   static const VMStateDescription vmstate_ioh3420 = {
  @@ -190,19 +187,19 @@ static const VMStateDescription vmstate_ioh3420 = {
   .minimum_version_id_old = 1,
   .post_load = pcie_cap_slot_post_load,
   .fields = (VMStateField[]) {
  -VMSTATE_PCIE_DEVICE(port.br.parent_obj, PCIESlot),
  -VMSTATE_STRUCT(port.br.parent_obj.exp.aer_log, PCIESlot, 0,
  +VMSTATE_PCIE_DEVICE(parent_obj, PCIBridge),
  +VMSTATE_STRUCT(exp.aer_log, PCIDevice, 0,
  vmstate_pcie_aer_log, PCIEAERLog),
   VMSTATE_END_OF_LIST()
   }
   };
   
   static Property ioh3420_properties[] = {
  -DEFINE_PROP_UINT8(port, PCIESlot, port.port, 0),
  +DEFINE_PROP_UINT8(port, PCIEPort, port, 0),
   DEFINE_PROP_UINT8(chassis, PCIESlot, chassis, 0),
   DEFINE_PROP_UINT16(slot, PCIESlot, slot, 0),
  -DEFINE_PROP_UINT16(aer_log_max, PCIESlot,
  -   port.br.parent_obj.exp.aer_log.log_max,
  +DEFINE_PROP_UINT16(aer_log_max, PCIDevice,
  +   exp.aer_log.log_max,
  PCIE_AER_LOG_MAX_DEFAULT),
   DEFINE_PROP_END_OF_LIST(),
   };
 
 
  This looks scary. This does a cast to different types
  without any checks at all.
 
 What cast?
 
 VMstate takes a void *.
 
 One an object is cast to a void *, it's just as much as PCIESlot as it
 is a PCIEPort.
 
 That said, for consistency, I think having everything be relatively to
 *one* type for a Property list is pretty helpful.
 
 Expecting someone to know the type hierarchy by heart such that this
 doesn't look like a bug is too much IMHO.
 
 Regards,
 
 Anthony Liguori

Exactly.

Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types

2013-07-23 Thread Michael S. Tsirkin

On Mon, Jul 22, 2013 at 11:04:49PM +0200, Andreas Färber wrote:
 Am 22.07.2013 22:29, schrieb Anthony Liguori:
  Michael S. Tsirkin m...@redhat.com writes:
  
  On Sun, Jul 21, 2013 at 04:09:04PM +0200, Andreas Färber wrote:
  Signed-off-by: Andreas Färber afaer...@suse.de
  ---
   hw/pci-bridge/ioh3420.c| 23 ++-
   hw/pci-bridge/xio3130_downstream.c | 23 ++-
   hw/pci-bridge/xio3130_upstream.c   | 15 +++
   hw/pci/pcie_port.c | 22 ++
   include/hw/pci/pcie_port.h | 14 --
   5 files changed, 61 insertions(+), 36 deletions(-)
 
  diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
  index 728f658..83db054 100644
  --- a/hw/pci-bridge/ioh3420.c
  +++ b/hw/pci-bridge/ioh3420.c
  @@ -92,9 +92,8 @@ static void ioh3420_reset(DeviceState *qdev)
   
   static int ioh3420_initfn(PCIDevice *d)
   {
  -PCIBridge *br = PCI_BRIDGE(d);
  -PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
  -PCIESlot *s = DO_UPCAST(PCIESlot, port, p);
  +PCIEPort *p = PCIE_PORT(d);
  +PCIESlot *s = PCIE_SLOT(d);
   int rc;
   
   rc = pci_bridge_initfn(d, TYPE_PCIE_BUS);
  @@ -148,9 +147,7 @@ err_bridge:
   
   static void ioh3420_exitfn(PCIDevice *d)
   {
  -PCIBridge *br = PCI_BRIDGE(d);
  -PCIEPort *p = DO_UPCAST(PCIEPort, br, br);
  -PCIESlot *s = DO_UPCAST(PCIESlot, port, p);
  +PCIESlot *s = PCIE_SLOT(d);
   
   pcie_aer_exit(d);
   pcie_chassis_del_slot(s);
  @@ -180,7 +177,7 @@ PCIESlot *ioh3420_init(PCIBus *bus, int devfn, bool 
  multifunction,
   qdev_prop_set_uint16(qdev, slot, slot);
   qdev_init_nofail(qdev);
   
  -return DO_UPCAST(PCIESlot, port, DO_UPCAST(PCIEPort, br, br));
  +return PCIE_SLOT(d);
   }
   
   static const VMStateDescription vmstate_ioh3420 = {
  @@ -190,19 +187,19 @@ static const VMStateDescription vmstate_ioh3420 = {
   .minimum_version_id_old = 1,
   .post_load = pcie_cap_slot_post_load,
   .fields = (VMStateField[]) {
  -VMSTATE_PCIE_DEVICE(port.br.parent_obj, PCIESlot),
  -VMSTATE_STRUCT(port.br.parent_obj.exp.aer_log, PCIESlot, 0,
  +VMSTATE_PCIE_DEVICE(parent_obj, PCIBridge),
  +VMSTATE_STRUCT(exp.aer_log, PCIDevice, 0,
  vmstate_pcie_aer_log, PCIEAERLog),
   VMSTATE_END_OF_LIST()
   }
   };
   
   static Property ioh3420_properties[] = {
  -DEFINE_PROP_UINT8(port, PCIESlot, port.port, 0),
  +DEFINE_PROP_UINT8(port, PCIEPort, port, 0),
   DEFINE_PROP_UINT8(chassis, PCIESlot, chassis, 0),
   DEFINE_PROP_UINT16(slot, PCIESlot, slot, 0),
  -DEFINE_PROP_UINT16(aer_log_max, PCIESlot,
  -   port.br.parent_obj.exp.aer_log.log_max,
  +DEFINE_PROP_UINT16(aer_log_max, PCIDevice,
  +   exp.aer_log.log_max,
  PCIE_AER_LOG_MAX_DEFAULT),
   DEFINE_PROP_END_OF_LIST(),
   };
 
 
  This looks scary. This does a cast to different types
  without any checks at all.
  
  What cast?
  
  VMstate takes a void *.
  
  One an object is cast to a void *, it's just as much as PCIESlot as it
  is a PCIEPort.
  
  That said, for consistency, I think having everything be relatively to
  *one* type for a Property list is pretty helpful.
  
  Expecting someone to know the type hierarchy by heart such that this
  doesn't look like a bug is too much IMHO.
 
 I'm updating the patch to that effect for VMState. But I notice the real
 fix for qdev properties would be to move the PCIEPort property to the
 new PCIEPort type, so that all derived types inherit it. :)
 
 For VMState I believe the real follow-up fix would be mst defining a
 central macro VMSTATE_PCI_DEVICE_AER_LOG() operating on PCIDevice.
 Why is that separate from VMSTATE_PCI_DEVICE() or VMSTATE_PCIE_DEVICE()
 in the first place?
 
 Regards,
 Andreas

The real fix is savevm/loadvm taking into account
the class hierarchy.


 -- 
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 4/9] linux-user: Clean up sendrecvmsg message parsing

2013-07-23 Thread Riku Voipio

On 6 July 2013 15:17, Alexander Graf ag...@suse.de wrote:
 The cmsg handling in the linux-user code is very hard to read as it tries
 to follow glibc's unreadable code closely. Let's clean it up, converting
 all helpers into static inline functions, so that QEMU developers have a
 chance to understand what's going on.

 While at it, this also allows us to make the next target header lookup more
 obvious and GUEST_BASE safe. We only compare host mapped pointers between each
 other now.

 During the rewrite I also saw that we never advance our target cmsg structure
 to the next one. This behavior is identical to the previous code, but looks
 very bogus to me.

recvmsg01 and sendmsg01 both segfault (arm target and x86_64 host)
with both current linux-user code and after this patch. So there is
more to fix here. On the bright side this patch is not an regression
:)

 Signed-off-by: Alexander Graf ag...@suse.de
 ---
  linux-user/syscall.c  |   25 ---
  linux-user/syscall_defs.h |   58 ++--
  2 files changed, 55 insertions(+), 28 deletions(-)

 diff --git a/linux-user/syscall.c b/linux-user/syscall.c
 index 89b7698..8eb5c31 100644
 --- a/linux-user/syscall.c
 +++ b/linux-user/syscall.c
 @@ -1120,6 +1120,7 @@ static inline abi_long target_to_host_cmsg(struct 
 msghdr *msgh,
  abi_long msg_controllen;
  abi_ulong target_cmsg_addr;
  struct target_cmsghdr *target_cmsg;
 +struct target_cmsghdr *first_target_cmsg;
  socklen_t space = 0;

  msg_controllen = tswapal(target_msgh-msg_controllen);
 @@ -1129,13 +1130,14 @@ static inline abi_long target_to_host_cmsg(struct 
 msghdr *msgh,
  target_cmsg = lock_user(VERIFY_READ, target_cmsg_addr, msg_controllen, 
 1);
  if (!target_cmsg)
  return -TARGET_EFAULT;
 +first_target_cmsg = target_cmsg;

  while (cmsg  target_cmsg) {
  void *data = CMSG_DATA(cmsg);
 -void *target_data = TARGET_CMSG_DATA(target_cmsg);
 +void *target_data = target_cmsg_data(target_cmsg);

  int len = tswapal(target_cmsg-cmsg_len)
 -  - TARGET_CMSG_ALIGN(sizeof (struct target_cmsghdr));
 +  - target_cmsg_align(sizeof (struct target_cmsghdr));

  space += CMSG_SPACE(len);
  if (space  msgh-msg_controllen) {
 @@ -1161,7 +1163,8 @@ static inline abi_long target_to_host_cmsg(struct 
 msghdr *msgh,
  }

  cmsg = CMSG_NXTHDR(msgh, cmsg);
 -target_cmsg = TARGET_CMSG_NXTHDR(target_msgh, target_cmsg);
 +target_cmsg = target_cmsg_nxthdr(target_msgh, target_cmsg,
 + first_target_cmsg);
  }
  unlock_user(target_cmsg, target_cmsg_addr, 0);
   the_end:
 @@ -1176,6 +1179,7 @@ static inline abi_long host_to_target_cmsg(struct 
 target_msghdr *target_msgh,
  abi_long msg_controllen;
  abi_ulong target_cmsg_addr;
  struct target_cmsghdr *target_cmsg;
 +struct target_cmsghdr *first_target_cmsg;
  socklen_t space = 0;

  msg_controllen = tswapal(target_msgh-msg_controllen);
 @@ -1183,25 +1187,27 @@ static inline abi_long host_to_target_cmsg(struct 
 target_msghdr *target_msgh,
  goto the_end;
  target_cmsg_addr = tswapal(target_msgh-msg_control);
  target_cmsg = lock_user(VERIFY_WRITE, target_cmsg_addr, msg_controllen, 
 0);
 -if (!target_cmsg)
 +if (!target_cmsg) {
  return -TARGET_EFAULT;
 +}
 +first_target_cmsg = target_cmsg;

  while (cmsg  target_cmsg) {
  void *data = CMSG_DATA(cmsg);
 -void *target_data = TARGET_CMSG_DATA(target_cmsg);
 +void *target_data = target_cmsg_data(target_cmsg);

  int len = cmsg-cmsg_len - CMSG_ALIGN(sizeof (struct cmsghdr));

 -space += TARGET_CMSG_SPACE(len);
 +space += target_cmsg_space(len);
  if (space  msg_controllen) {
 -space -= TARGET_CMSG_SPACE(len);
 +space -= target_cmsg_space(len);
  gemu_log(Target cmsg overflow\n);
  break;
  }

  target_cmsg-cmsg_level = tswap32(cmsg-cmsg_level);
  target_cmsg-cmsg_type = tswap32(cmsg-cmsg_type);
 -target_cmsg-cmsg_len = tswapal(TARGET_CMSG_LEN(len));
 +target_cmsg-cmsg_len = tswapal(target_cmsg_len(len));

  if ((cmsg-cmsg_level == TARGET_SOL_SOCKET) 
  (cmsg-cmsg_type == SCM_RIGHTS)) {
 @@ -1228,7 +1234,8 @@ static inline abi_long host_to_target_cmsg(struct 
 target_msghdr *target_msgh,
  }

  cmsg = CMSG_NXTHDR(msgh, cmsg);
 -target_cmsg = TARGET_CMSG_NXTHDR(target_msgh, target_cmsg);
 +target_cmsg = target_cmsg_nxthdr(target_msgh, target_cmsg,
 + first_target_cmsg);
  }
  unlock_user(target_cmsg, target_cmsg_addr, space);
   the_end:
 diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
 index 92c01a9..84da7f7 100644
 ---

Re: [Qemu-devel] [PATCHv3 0/2] seccomp: remove unused syscalls - for 1.6

2013-07-23 Thread Paolo Bonzini

Il 22/07/2013 20:33, Eduardo Otubo ha scritto:
 In this small patch series I basically:
 
   v3 update:
   - reincluded getrlimit(), it's used by Xen.
 
   v2 update:
   - set libseccomp 2.1.0 as requirement on configure script.
   - reincluded setrlimit() (used by Xen) and removed sendfile64() from
 the whitelist.
 
   1) Remove the ifdef's for the (not so) new libseccomp version that does a
   best effort and translates x86_32 syscalls into x86_64 when possible.
 
   2) Remove unused syscalls on the seccomp whitelist. For that removal, I've 
 been
   running several instances of Qemu using a script written on top of
   virt-test[0]. After some weeks testing I could come up with this small list,
   and safely remove them without breaking anything.
 
 [0] - https://github.com/autotest/virt-test/wiki
 
 Eduardo Otubo (2):
   seccomp: no need to check arch in syscall whitelist
   seccomp: removing unused syscalls gtom whitelist
 
  configure  |  2 +-
  qemu-seccomp.c | 17 -
  2 files changed, 1 insertion(+), 18 deletions(-)
 

Looks good.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Paolo

Re: [Qemu-devel] KVM call agenda for 2013-07-30

2013-07-23 Thread Christian Borntraeger

On 23/07/13 08:33, Juan Quintela wrote:
 
 Hi
 
 Please, send any topic that you are interested in covering.

- soft reset and other reset variants. What is the right way to go?
(e.g. on s390 there are several reset variants that reset a defined subset of 
the
system. This can be triggered by operating systems, e.g. kdump uses a diagnose 
instruction that resets the device subsystem and parts of the cpu registers 
(some
are unchanged, this allows to take a proper crash dump with a well defined 
system
state)

[Qemu-devel] [PATCH 09/11] sheepdog: try to reconnect to sheepdog after network error

2013-07-23 Thread MORITA Kazutaka

This introduces a failed request queue and links all the inflight
requests to the list after network error happens.  After QEMU
reconnects to the sheepdog server successfully, the sheepdog block
driver will retry all the requests in the failed queue.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 72 
 1 file changed, 57 insertions(+), 15 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 1173605..cd72927 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -318,8 +318,11 @@ typedef struct BDRVSheepdogState {
 Coroutine *co_recv;
 
 uint32_t aioreq_seq_num;
+
+/* Every aio request must be linked to either of these queues. */
 QLIST_HEAD(inflight_aio_head, AIOReq) inflight_aio_head;
 QLIST_HEAD(pending_aio_head, AIOReq) pending_aio_head;
+QLIST_HEAD(failed_aio_head, AIOReq) failed_aio_head;
 } BDRVSheepdogState;
 
 static const char * sd_strerror(int err)
@@ -669,6 +672,8 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char 
*tag);
+static int get_sheep_fd(BDRVSheepdogState *s);
+static void co_write_request(void *opaque);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -710,6 +715,44 @@ static void coroutine_fn 
send_pending_req(BDRVSheepdogState *s, uint64_t oid)
 }
 }
 
+static coroutine_fn void reconnect_to_sdog(void *opaque)
+{
+BDRVSheepdogState *s = opaque;
+AIOReq *aio_req, *next;
+
+qemu_aio_set_fd_handler(s-fd, NULL, NULL, NULL, NULL);
+close(s-fd);
+s-fd = -1;
+
+/* Wait for outstanding write requests to be completed. */
+while (s-co_send != NULL) {
+co_write_request(opaque);
+}
+
+/* Move all the inflight requests to the failed queue. */
+QLIST_FOREACH_SAFE(aio_req, s-inflight_aio_head, aio_siblings, next) {
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(s-failed_aio_head, aio_req, aio_siblings);
+}
+
+/* Try to reconnect the sheepdog server every one second. */
+while (s-fd  0) {
+s-fd = get_sheep_fd(s);
+if (s-fd  0) {
+dprintf(Wait for connection to be established\n);
+co_aio_sleep_ns(10ULL);
+}
+};
+
+/* Resend all the failed aio requests. */
+while (!QLIST_EMPTY(s-failed_aio_head)) {
+aio_req = QLIST_FIRST(s-failed_aio_head);
+QLIST_REMOVE(aio_req, aio_siblings);
+QLIST_INSERT_HEAD(s-inflight_aio_head, aio_req, aio_siblings);
+resend_aioreq(s, aio_req);
+}
+}
+
 /*
  * Receive responses of the I/O requests.
  *
@@ -726,15 +769,11 @@ static void coroutine_fn aio_read_response(void *opaque)
 SheepdogAIOCB *acb;
 uint64_t idx;
 
-if (QLIST_EMPTY(s-inflight_aio_head)) {
-goto out;
-}
-
 /* read a header */
 ret = qemu_co_recv(fd, rsp, sizeof(rsp));
 if (ret  sizeof(rsp)) {
 error_report(failed to get the header, %s, strerror(errno));
-goto out;
+goto err;
 }
 
 /* find the right aio_req from the inflight aio list */
@@ -745,7 +784,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 if (!aio_req) {
 error_report(cannot find aio_req %x, rsp.id);
-goto out;
+goto err;
 }
 
 acb = aio_req-aiocb;
@@ -785,7 +824,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 aio_req-iov_offset, rsp.data_length);
 if (ret  rsp.data_length) {
 error_report(failed to get the data, %s, strerror(errno));
-goto out;
+goto err;
 }
 break;
 case AIOCB_FLUSH_CACHE:
@@ -819,10 +858,9 @@ static void coroutine_fn aio_read_response(void *opaque)
 if (s-inode.vdi_id == oid_to_vid(aio_req-oid)) {
 ret = reload_inode(s, 0, );
 if (ret  0) {
-goto out;
+goto err;
 }
 }
-
 if (is_data_obj(aio_req-oid)) {
 aio_req-oid = vid_to_data_oid(s-inode.vdi_id,
data_oid_to_idx(aio_req-oid));
@@ -850,6 +888,10 @@ static void coroutine_fn aio_read_response(void *opaque)
 }
 out:
 s-co_recv = NULL;
+return;
+err:
+s-co_recv = NULL;
+reconnect_to_sdog(opaque);
 }
 
 static void co_read_response(void *opaque)
@@ -875,7 +917,8 @@ static int aio_flush_request(void *opaque)
 BDRVSheepdogState *s = opaque;
 
 return !QLIST_EMPTY(s-inflight_aio_head) ||
-!QLIST_EMPTY(s-pending_aio_head);
+!QLIST_EMPTY(s-pending_aio_head) ||
+!QLIST_EMPTY(s-failed_aio_head);
 }
 
 /*
@@ -1150,23 +1193,21 @@ static int coroutine_fn

[Qemu-devel] [PATCH 03/11] qemu-sockets: make wait_for_connect be invoked in qemu_aio_wait

2013-07-23 Thread MORITA Kazutaka

This allows us to use inet_nonblocking_connect() and
unix_nonblocking_connect() in block drivers.

qemu-ga needs to link block-obj to resolve dependencies of
qemu_aio_set_fd_handler().

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 Makefile|  4 ++--
 util/qemu-sockets.c | 15 ++-
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index c06bfab..5fe2e0f 100644
--- a/Makefile
+++ b/Makefile
@@ -197,7 +197,7 @@ fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap
 qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h  $  $@,  GEN  
 $@)
 
-qemu-ga$(EXESUF): LIBS = $(LIBS_QGA)
+qemu-ga$(EXESUF): LIBS = $(LIBS_QGA) $(LIBS_TOOLS)
 qemu-ga$(EXESUF): QEMU_CFLAGS += -I qga/qapi-generated
 
 gen-out-type = $(subst .,-,$(suffix $@))
@@ -227,7 +227,7 @@ $(SRC_PATH)/qapi-schema.json 
$(SRC_PATH)/scripts/qapi-commands.py $(qapi-py)
 QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h 
qga-qmp-commands.h)
 $(qga-obj-y) qemu-ga.o: $(QGALIB_GEN)
 
-qemu-ga$(EXESUF): $(qga-obj-y) libqemuutil.a libqemustub.a
+qemu-ga$(EXESUF): $(qga-obj-y) $(block-obj-y) libqemuutil.a libqemustub.a
$(call LINK, $^)
 
 clean:
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 095716e..8b21fd1 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -218,6 +218,11 @@ typedef struct ConnectState {
 static int inet_connect_addr(struct addrinfo *addr, bool *in_progress,
  ConnectState *connect_state, Error **errp);
 
+static int return_true(void *opaque)
+{
+return 1;
+}
+
 static void wait_for_connect(void *opaque)
 {
 ConnectState *s = opaque;
@@ -225,7 +230,7 @@ static void wait_for_connect(void *opaque)
 socklen_t valsize = sizeof(val);
 bool in_progress;
 
-qemu_set_fd_handler2(s-fd, NULL, NULL, NULL, NULL);
+qemu_aio_set_fd_handler(s-fd, NULL, NULL, NULL, NULL);
 
 do {
 rc = qemu_getsockopt(s-fd, SOL_SOCKET, SO_ERROR, val, valsize);
@@ -288,8 +293,8 @@ static int inet_connect_addr(struct addrinfo *addr, bool 
*in_progress,
 
 if (connect_state != NULL  QEMU_SOCKET_RC_INPROGRESS(rc)) {
 connect_state-fd = sock;
-qemu_set_fd_handler2(sock, NULL, NULL, wait_for_connect,
- connect_state);
+qemu_aio_set_fd_handler(sock, NULL, wait_for_connect, return_true,
+connect_state);
 *in_progress = true;
 } else if (rc  0) {
 error_set_errno(errp, errno, QERR_SOCKET_CONNECT_FAILED);
@@ -749,8 +754,8 @@ int unix_connect_opts(QemuOpts *opts, Error **errp,
 
 if (connect_state != NULL  QEMU_SOCKET_RC_INPROGRESS(rc)) {
 connect_state-fd = sock;
-qemu_set_fd_handler2(sock, NULL, NULL, wait_for_connect,
- connect_state);
+qemu_aio_set_fd_handler(sock, NULL, wait_for_connect, return_true,
+connect_state);
 return sock;
 } else if (rc = 0) {
 /* non blocking socket immediate success, call callback */
-- 
1.8.1.3.566.gaa39828

[Qemu-devel] [PATCH 06/11] sheepdog: handle vdi objects in resend_aio_req

2013-07-23 Thread MORITA Kazutaka

The current resend_aio_req() doesn't work when the request is against
vdi objects.  This fixes the problem.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 567f52e..018eab2 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1265,11 +1265,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 return ret;
 }
 
-aio_req-oid = vid_to_data_oid(s-inode.vdi_id,
-   data_oid_to_idx(aio_req-oid));
+if (is_data_obj(aio_req-oid)) {
+aio_req-oid = vid_to_data_oid(s-inode.vdi_id,
+   data_oid_to_idx(aio_req-oid));
+} else {
+aio_req-oid = vid_to_vdi_oid(s-inode.vdi_id);
+}
 
 /* check whether this request becomes a CoW one */
-if (acb-aiocb_type == AIOCB_WRITE_UDATA) {
+if (acb-aiocb_type == AIOCB_WRITE_UDATA  is_data_obj(aio_req-oid)) {
 int idx = data_oid_to_idx(aio_req-oid);
 AIOReq *areq;
 
@@ -1297,8 +1301,15 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 create = true;
 }
 out:
-return add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov,
-   create, acb-aiocb_type);
+if (is_data_obj(aio_req-oid)) {
+return add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov,
+   create, acb-aiocb_type);
+} else {
+struct iovec iov;
+iov.iov_base = s-inode;
+iov.iov_len = sizeof(s-inode);
+return add_aio_request(s, aio_req, iov, 1, false, AIOCB_WRITE_UDATA);
+}
 }
 
 /* TODO Convert to fine grained options */
-- 
1.8.1.3.566.gaa39828

[Qemu-devel] [PATCH 11/11] sheepdog: cancel aio requests if possible

2013-07-23 Thread MORITA Kazutaka

This patch tries to cancel aio requests in pending queue and failed
queue.  When the sheepdog driver cannot cancel the requests, it waits
for them to be completed.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 70 +++-
 1 file changed, 59 insertions(+), 11 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 8a6c432..43be479 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -294,7 +294,8 @@ struct SheepdogAIOCB {
 Coroutine *coroutine;
 void (*aio_done_func)(SheepdogAIOCB *);
 
-bool canceled;
+bool cancelable;
+bool *finished;
 int nr_pending;
 };
 
@@ -411,6 +412,7 @@ static inline void free_aio_req(BDRVSheepdogState *s, 
AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req-aiocb;
 
+acb-cancelable = false;
 QLIST_REMOVE(aio_req, aio_siblings);
 g_free(aio_req);
 
@@ -419,23 +421,68 @@ static inline void free_aio_req(BDRVSheepdogState *s, 
AIOReq *aio_req)
 
 static void coroutine_fn sd_finish_aiocb(SheepdogAIOCB *acb)
 {
-if (!acb-canceled) {
-qemu_coroutine_enter(acb-coroutine, NULL);
+qemu_coroutine_enter(acb-coroutine, NULL);
+if (acb-finished) {
+*acb-finished = true;
 }
 qemu_aio_release(acb);
 }
 
+/*
+ * Check whether the specified acb can be canceled
+ *
+ * We can cancel aio when any request belonging to the acb is:
+ *  - Not processed by the sheepdog server.
+ *  - Not linked to the inflight queue.
+ */
+static bool sd_acb_cancelable(const SheepdogAIOCB *acb)
+{
+BDRVSheepdogState *s = acb-common.bs-opaque;
+AIOReq *aioreq;
+
+if (!acb-cancelable) {
+return false;
+}
+
+QLIST_FOREACH(aioreq, s-inflight_aio_head, aio_siblings) {
+if (aioreq-aiocb == acb) {
+return false;
+}
+}
+
+return false;
+}
+
 static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 SheepdogAIOCB *acb = (SheepdogAIOCB *)blockacb;
+BDRVSheepdogState *s = acb-common.bs-opaque;
+AIOReq *aioreq, *next;
+bool finished = false;
+
+acb-finished = finished;
+while (!finished) {
+if (sd_acb_cancelable(acb)) {
+/* Remove outstanding requests from pending and failed queues.  */
+QLIST_FOREACH_SAFE(aioreq, s-pending_aio_head, aio_siblings,
+   next) {
+if (aioreq-aiocb == acb) {
+free_aio_req(s, aioreq);
+}
+}
+QLIST_FOREACH_SAFE(aioreq, s-failed_aio_head, aio_siblings,
+   next) {
+if (aioreq-aiocb == acb) {
+free_aio_req(s, aioreq);
+}
+}
 
-/*
- * Sheepdog cannot cancel the requests which are already sent to
- * the servers, so we just complete the request with -EIO here.
- */
-acb-ret = -EIO;
-qemu_coroutine_enter(acb-coroutine, NULL);
-acb-canceled = true;
+assert(acb-nr_pending == 0);
+sd_finish_aiocb(acb);
+return;
+}
+qemu_aio_wait();
+}
 }
 
 static const AIOCBInfo sd_aiocb_info = {
@@ -456,7 +503,8 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, 
QEMUIOVector *qiov,
 acb-nb_sectors = nb_sectors;
 
 acb-aio_done_func = NULL;
-acb-canceled = false;
+acb-cancelable = true;
+acb-finished = NULL;
 acb-coroutine = qemu_coroutine_self();
 acb-ret = 0;
 acb-nr_pending = 0;
-- 
1.8.1.3.566.gaa39828

[Qemu-devel] [PATCH 08/11] coroutine: add co_aio_sleep_ns() to allow sleep in block drivers

2013-07-23 Thread MORITA Kazutaka

This helper function behaves similarly to co_sleep_ns(), but the
sleeping coroutine will be resumed when using qemu_aio_wait().

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 include/block/coroutine.h |  8 
 qemu-coroutine-sleep.c| 47 +++
 2 files changed, 55 insertions(+)

diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index 377805a..23ea6e9 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -210,6 +210,14 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
 void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns);
 
 /**
+ * Yield the coroutine for a given duration
+ *
+ * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
+ * resumed when using qemu_aio_wait().
+ */
+void coroutine_fn co_aio_sleep_ns(int64_t ns);
+
+/**
  * Yield until a file descriptor becomes readable
  *
  * Note that this function clobbers the handlers for the file descriptor.
diff --git a/qemu-coroutine-sleep.c b/qemu-coroutine-sleep.c
index 169ce5c..3955347 100644
--- a/qemu-coroutine-sleep.c
+++ b/qemu-coroutine-sleep.c
@@ -13,6 +13,7 @@
 
 #include block/coroutine.h
 #include qemu/timer.h
+#include qemu/thread.h
 
 typedef struct CoSleepCB {
 QEMUTimer *ts;
@@ -37,3 +38,49 @@ void coroutine_fn co_sleep_ns(QEMUClock *clock, int64_t ns)
 qemu_del_timer(sleep_cb.ts);
 qemu_free_timer(sleep_cb.ts);
 }
+
+typedef struct CoAioSleepCB {
+QEMUBH *bh;
+int64_t ns;
+Coroutine *co;
+} CoAioSleepCB;
+
+static void co_aio_sleep_cb(void *opaque)
+{
+CoAioSleepCB *aio_sleep_cb = opaque;
+
+qemu_coroutine_enter(aio_sleep_cb-co, NULL);
+}
+
+static void *sleep_thread(void *opaque)
+{
+CoAioSleepCB *aio_sleep_cb = opaque;
+struct timespec req = {
+.tv_sec = aio_sleep_cb-ns / 10,
+.tv_nsec = aio_sleep_cb-ns % 10,
+};
+struct timespec rem;
+
+while (nanosleep(req, rem)  0  errno == EINTR) {
+req = rem;
+}
+
+qemu_bh_schedule(aio_sleep_cb-bh);
+
+return NULL;
+}
+
+void coroutine_fn co_aio_sleep_ns(int64_t ns)
+{
+CoAioSleepCB aio_sleep_cb = {
+.ns = ns,
+.co = qemu_coroutine_self(),
+};
+QemuThread thread;
+
+aio_sleep_cb.bh = qemu_bh_new(co_aio_sleep_cb, aio_sleep_cb);
+qemu_thread_create(thread, sleep_thread, aio_sleep_cb,
+   QEMU_THREAD_DETACHED);
+qemu_coroutine_yield();
+qemu_bh_delete(aio_sleep_cb.bh);
+}
-- 
1.8.1.3.566.gaa39828

[Qemu-devel] [PATCH 07/11] sheepdog: reload inode outside of resend_aioreq

2013-07-23 Thread MORITA Kazutaka

This prepares for using resend_aioreq() after reconnecting to the
sheepdog server.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 018eab2..1173605 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -222,6 +222,11 @@ static inline uint64_t data_oid_to_idx(uint64_t oid)
 return oid  (MAX_DATA_OBJS - 1);
 }
 
+static inline uint32_t oid_to_vid(uint64_t oid)
+{
+return (oid  ~VDI_BIT)  VDI_SPACE_SHIFT;
+}
+
 static inline uint64_t vid_to_vdi_oid(uint32_t vid)
 {
 return VDI_BIT | ((uint64_t)vid  VDI_SPACE_SHIFT);
@@ -663,7 +668,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type);
 static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
-
+static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char 
*tag);
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
@@ -811,6 +816,19 @@ static void coroutine_fn aio_read_response(void *opaque)
 case SD_RES_SUCCESS:
 break;
 case SD_RES_READONLY:
+if (s-inode.vdi_id == oid_to_vid(aio_req-oid)) {
+ret = reload_inode(s, 0, );
+if (ret  0) {
+goto out;
+}
+}
+
+if (is_data_obj(aio_req-oid)) {
+aio_req-oid = vid_to_data_oid(s-inode.vdi_id,
+   data_oid_to_idx(aio_req-oid));
+} else {
+aio_req-oid = vid_to_vdi_oid(s-inode.vdi_id);
+}
 ret = resend_aioreq(s, aio_req);
 if (ret == SD_RES_SUCCESS) {
 goto out;
@@ -1258,19 +1276,6 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req-aiocb;
 bool create = false;
-int ret;
-
-ret = reload_inode(s, 0, );
-if (ret  0) {
-return ret;
-}
-
-if (is_data_obj(aio_req-oid)) {
-aio_req-oid = vid_to_data_oid(s-inode.vdi_id,
-   data_oid_to_idx(aio_req-oid));
-} else {
-aio_req-oid = vid_to_vdi_oid(s-inode.vdi_id);
-}
 
 /* check whether this request becomes a CoW one */
 if (acb-aiocb_type == AIOCB_WRITE_UDATA  is_data_obj(aio_req-oid)) {
-- 
1.8.1.3.566.gaa39828

[Qemu-devel] [PATCH 04/11] sheepdog: make connect nonblocking

2013-07-23 Thread MORITA Kazutaka

This uses nonblocking connect functions to connect to the sheepdog
server.  The connect operation is done in a coroutine function and it
will be yielded until the created socked is ready for IO.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 70 ++--
 1 file changed, 63 insertions(+), 7 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 6a41ad9..6f5ede4 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -455,18 +455,51 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, 
QEMUIOVector *qiov,
 return acb;
 }
 
-static int connect_to_sdog(BDRVSheepdogState *s)
-{
+typedef struct SheepdogConnectCo {
+BDRVSheepdogState *bs;
+Coroutine *co;
 int fd;
+bool finished;
+} SheepdogConnectCo;
+
+static void sd_connect_completed(int fd, void *opaque)
+{
+SheepdogConnectCo *scco = opaque;
+
+if (fd  0) {
+int val, rc;
+socklen_t valsize = sizeof(val);
+
+do {
+rc = qemu_getsockopt(scco-fd, SOL_SOCKET, SO_ERROR, val,
+ valsize);
+} while (rc == -1  socket_error() == EINTR);
+
+scco-fd = rc  0 ? -errno : -val;
+}
+
+scco-finished = true;
+
+if (scco-co != NULL) {
+qemu_coroutine_enter(scco-co, NULL);
+}
+}
+
+static coroutine_fn void co_connect_to_sdog(void *opaque)
+{
+SheepdogConnectCo *scco = opaque;
+BDRVSheepdogState *s = scco-bs;
 Error *err = NULL;
 
 if (s-is_unix) {
-fd = unix_connect(s-host_spec, err);
+scco-fd = unix_nonblocking_connect(s-host_spec, sd_connect_completed,
+opaque, err);
 } else {
-fd = inet_connect(s-host_spec, err);
+scco-fd = inet_nonblocking_connect(s-host_spec, sd_connect_completed,
+opaque, err);
 
 if (err == NULL) {
-int ret = socket_set_nodelay(fd);
+int ret = socket_set_nodelay(scco-fd);
 if (ret  0) {
 error_report(%s, strerror(errno));
 }
@@ -476,11 +509,34 @@ static int connect_to_sdog(BDRVSheepdogState *s)
 if (err != NULL) {
 qerror_report_err(err);
 error_free(err);
+}
+
+if (!scco-finished) {
+/* wait for connect to finish */
+scco-co = qemu_coroutine_self();
+qemu_coroutine_yield();
+}
+}
+
+static int connect_to_sdog(BDRVSheepdogState *s)
+{
+Coroutine *co;
+SheepdogConnectCo scco = {
+.bs = s,
+.finished = false,
+};
+
+if (qemu_in_coroutine()) {
+co_connect_to_sdog(scco);
 } else {
-qemu_set_nonblock(fd);
+co = qemu_coroutine_create(co_connect_to_sdog);
+qemu_coroutine_enter(co, scco);
+while (!scco.finished) {
+qemu_aio_wait();
+}
 }
 
-return fd;
+return scco.fd;
 }
 
 static coroutine_fn int send_co_req(int sockfd, SheepdogReq *hdr, void *data,
-- 
1.8.1.3.566.gaa39828

[Qemu-devel] [PATCH 10/11] sheepdog: make add_aio_request and send_aioreq void functions

2013-07-23 Thread MORITA Kazutaka

These functions no longer return errors.  We can make them void
functions and simplify the codes.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 66 +++-
 1 file changed, 17 insertions(+), 49 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index cd72927..8a6c432 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -667,10 +667,10 @@ static int do_req(int sockfd, SheepdogReq *hdr, void 
*data,
 return srco.ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type);
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 static int reload_inode(BDRVSheepdogState *s, uint32_t snapid, const char 
*tag);
 static int get_sheep_fd(BDRVSheepdogState *s);
 static void co_write_request(void *opaque);
@@ -696,22 +696,14 @@ static void coroutine_fn 
send_pending_req(BDRVSheepdogState *s, uint64_t oid)
 {
 AIOReq *aio_req;
 SheepdogAIOCB *acb;
-int ret;
 
 while ((aio_req = find_pending_req(s, oid)) != NULL) {
 acb = aio_req-aiocb;
 /* move aio_req from pending list to inflight one */
 QLIST_REMOVE(aio_req, aio_siblings);
 QLIST_INSERT_HEAD(s-inflight_aio_head, aio_req, aio_siblings);
-ret = add_aio_request(s, aio_req, acb-qiov-iov,
-  acb-qiov-niov, false, acb-aiocb_type);
-if (ret  0) {
-error_report(add_aio_request is failed);
-free_aio_req(s, aio_req);
-if (!acb-nr_pending) {
-sd_finish_aiocb(acb);
-}
-}
+add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov, false,
+acb-aiocb_type);
 }
 }
 
@@ -867,11 +859,8 @@ static void coroutine_fn aio_read_response(void *opaque)
 } else {
 aio_req-oid = vid_to_vdi_oid(s-inode.vdi_id);
 }
-ret = resend_aioreq(s, aio_req);
-if (ret == SD_RES_SUCCESS) {
-goto out;
-}
-/* fall through */
+resend_aioreq(s, aio_req);
+goto out;
 default:
 acb-ret = -EIO;
 error_report(%s, sd_strerror(rsp.result));
@@ -1129,7 +1118,7 @@ out:
 return ret;
 }
 
-static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, bool create,
enum AIOCBState aiocb_type)
 {
@@ -1209,8 +1198,6 @@ out:
 aio_flush_request, s);
 s-co_send = NULL;
 qemu_co_mutex_unlock(s-lock);
-
-return 0;
 }
 
 static int read_write_object(int fd, char *buf, uint64_t oid, int copies,
@@ -1313,7 +1300,7 @@ out:
 return ret;
 }
 
-static int coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
+static void coroutine_fn resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
 {
 SheepdogAIOCB *acb = aio_req-aiocb;
 bool create = false;
@@ -1338,7 +1325,7 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 dprintf(simultaneous CoW to % PRIx64 \n, aio_req-oid);
 QLIST_REMOVE(aio_req, aio_siblings);
 QLIST_INSERT_HEAD(s-pending_aio_head, aio_req, aio_siblings);
-return SD_RES_SUCCESS;
+return;
 }
 }
 
@@ -1348,13 +1335,13 @@ static int coroutine_fn resend_aioreq(BDRVSheepdogState 
*s, AIOReq *aio_req)
 }
 out:
 if (is_data_obj(aio_req-oid)) {
-return add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov,
-   create, acb-aiocb_type);
+add_aio_request(s, aio_req, acb-qiov-iov, acb-qiov-niov, create,
+acb-aiocb_type);
 } else {
 struct iovec iov;
 iov.iov_base = s-inode;
 iov.iov_len = sizeof(s-inode);
-return add_aio_request(s, aio_req, iov, 1, false, AIOCB_WRITE_UDATA);
+add_aio_request(s, aio_req, iov, 1, false, AIOCB_WRITE_UDATA);
 }
 }
 
@@ -1744,7 +1731,6 @@ static int sd_truncate(BlockDriverState *bs, int64_t 
offset)
  */
 static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
 {
-int ret;
 BDRVSheepdogState *s = acb-common.bs-opaque;
 struct iovec iov;
 AIOReq *aio_req;
@@ -1766,18 +1752,13 @@ static void coroutine_fn sd_write_done(SheepdogAIOCB 
*acb)
 aio_req = alloc_aio_req(s, acb, vid_to_vdi_oid(s-inode.vdi_id),
 data_len, offset, 0, 0, offset);
 QLIST_INSERT_HEAD(s-inflight_aio_head, aio_req, aio_siblings);
-

[Qemu-devel] [PATCH 02/11] iov: handle EOF in iov_send_recv

2013-07-23 Thread MORITA Kazutaka

Without this patch, iov_send_recv() never returns when do_send_recv()
returns zero.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 util/iov.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/util/iov.c b/util/iov.c
index cc6e837..f705586 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -202,6 +202,12 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, 
unsigned iov_cnt,
 return -1;
 }
 
+if (ret == 0  !do_send) {
+/* recv returns 0 when the peer has performed an orderly
+ * shutdown. */
+break;
+}
+
 /* Prepare for the next iteration */
 offset += ret;
 total += ret;
-- 
1.8.1.3.566.gaa39828

[Qemu-devel] [PATCH 05/11] sheepdog: check return values of qemu_co_recv/send correctly

2013-07-23 Thread MORITA Kazutaka

qemu_co_recv/send return shorter length on error.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 block/sheepdog.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 6f5ede4..567f52e 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -727,7 +727,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 
 /* read a header */
 ret = qemu_co_recv(fd, rsp, sizeof(rsp));
-if (ret  0) {
+if (ret  sizeof(rsp)) {
 error_report(failed to get the header, %s, strerror(errno));
 goto out;
 }
@@ -778,7 +778,7 @@ static void coroutine_fn aio_read_response(void *opaque)
 case AIOCB_READ_UDATA:
 ret = qemu_co_recvv(fd, acb-qiov-iov, acb-qiov-niov,
 aio_req-iov_offset, rsp.data_length);
-if (ret  0) {
+if (ret  rsp.data_length) {
 error_report(failed to get the data, %s, strerror(errno));
 goto out;
 }
@@ -1131,7 +1131,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
 
 /* send a header */
 ret = qemu_co_send(s-fd, hdr, sizeof(hdr));
-if (ret  0) {
+if (ret  sizeof(hdr)) {
 qemu_co_mutex_unlock(s-lock);
 error_report(failed to send a req, %s, strerror(errno));
 return -errno;
@@ -1139,7 +1139,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState 
*s, AIOReq *aio_req,
 
 if (wlen) {
 ret = qemu_co_sendv(s-fd, iov, niov, aio_req-iov_offset, wlen);
-if (ret  0) {
+if (ret  wlen) {
 qemu_co_mutex_unlock(s-lock);
 error_report(failed to send a data, %s, strerror(errno));
 return -errno;
-- 
1.8.1.3.566.gaa39828

[Qemu-devel] [PATCH 01/11] ignore SIGPIPE in qemu-img and qemu-io

2013-07-23 Thread MORITA Kazutaka

This prevents the tools from being stopped when they write data to a
closed connection in the other side.

Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
---
 qemu-img.c | 4 
 qemu-io.c  | 4 
 2 files changed, 8 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index c55ca5c..919d464 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2319,6 +2319,10 @@ int main(int argc, char **argv)
 const img_cmd_t *cmd;
 const char *cmdname;
 
+#ifdef CONFIG_POSIX
+signal(SIGPIPE, SIG_IGN);
+#endif
+
 error_set_progname(argv[0]);
 
 qemu_init_main_loop();
diff --git a/qemu-io.c b/qemu-io.c
index cb9def5..d54dc86 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -335,6 +335,10 @@ int main(int argc, char **argv)
 int opt_index = 0;
 int flags = BDRV_O_UNMAP;
 
+#ifdef CONFIG_POSIX
+signal(SIGPIPE, SIG_IGN);
+#endif
+
 progname = basename(argv[0]);
 
 while ((c = getopt_long(argc, argv, sopt, lopt, opt_index)) != -1) {
-- 
1.8.1.3.566.gaa39828

[Qemu-devel] [PATCH 00/11] sheepdog: reconnect server after connection failure

2013-07-23 Thread MORITA Kazutaka

Currently, if a sheepdog server exits, all the connecting VMs need to
be restarted.  This series implements a feature to reconnect the
server, and enables us to do online sheepdog upgrade and avoid
restarting VMs when sheepdog servers crash unexpectedly.

MORITA Kazutaka (11):
  ignore SIGPIPE in qemu-img and qemu-io
  iov: handle eof in iov_send_recv
  qemu-sockets: make wait_for_connect be invoked in qemu_aio_wait
  sheepdog: make connect nonblocking
  sheepdog: check return values of qemu_co_recv/send correctly
  sheepdog: handle vdi objects in resend_aio_req
  sheepdog: reload inode outside of resend_aioreq
  coroutine: add co_aio_sleep_ns() to allow sleep in block drivers
  sheepdog: try to reconnect to sheepdog after network error
  sheepdog: make add_aio_request and send_aioreq void functions
  sheepdog: cancel aio requests if possible

 Makefile  |   4 +-
 block/sheepdog.c  | 314 --
 include/block/coroutine.h |   8 ++
 qemu-coroutine-sleep.c|  47 +++
 qemu-img.c|   4 +
 qemu-io.c |   4 +
 util/iov.c|   6 +
 util/qemu-sockets.c   |  15 ++-
 8 files changed, 303 insertions(+), 99 deletions(-)

-- 
1.8.1.3.566.gaa39828

Re: [Qemu-devel] [PATCH 0/3] dataplane: virtio-blk live migration with x-data-plane=on

2013-07-23 Thread Stefan Hajnoczi

On Fri, Jul 19, 2013 at 01:33:12PM +0800, yinyin wrote:
   I use systemtap to test this patch,the migration will success. But I 
 found the dataplane will start again after migration start. the 
 virtio_blk_handle_output will start dataplane.

Hi Yin Yin,
Thank you for testing the patch.  It is not clear to me whether you
encountered a problem or not.

It is expected that the destination will start the dataplane thread.
Was there a crash or some reason why you posted these traces?

Stefan

Re: [Qemu-devel] Emulating mips

2013-07-23 Thread Andreas Färber

Hello,

Am 23.07.2013 07:16, schrieb Renich Bon Ciric:
 I'm trying to run some rom file I got from a client. It's a sc2005
 processor; supposedly compatible with 4k.
 
 Anyway, I do this:
 
 qemu-system-mips -M mips -pflash 301-3100\ -\ user\ specified\ -\
 Full.bin -serial stdio
 
 The processor goes to 100% but I see nothing, not in the serial
 console nor in the window (monitor, maybe?)

You didn't mention which version you're using, so try latest stable 1.5
or qemu.git.

You need to know what board the ROM file was for, you can view the list
with -M '?' - if it's none of those, chances are you need to implement
the machine first.

Note that there's qemu-system-mips and qemu-system-mipsel depending on
endianness, and you can usually override the CPU via -cpu, again see
-cpu '?' for a list.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH] watchdog: Remove break after exit

2013-07-23 Thread Andreas Färber

Am 23.07.2013 06:46, schrieb Stefan Weil:
 This was dead code.
 
 Signed-off-by: Stefan Weil s...@weilnetz.de

Reviewed-by: Andreas Färber afaer...@suse.de

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types

2013-07-23 Thread Andreas Färber

Am 23.07.2013 09:07, schrieb Michael S. Tsirkin:
 On Mon, Jul 22, 2013 at 11:04:49PM +0200, Andreas Färber wrote:
 For VMState I believe the real follow-up fix would be mst defining a
 central macro VMSTATE_PCI_DEVICE_AER_LOG() operating on PCIDevice.
 Why is that separate from VMSTATE_PCI_DEVICE() or VMSTATE_PCIE_DEVICE()
 in the first place?
 
 The real fix is savevm/loadvm taking into account
 the class hierarchy.

That's not helping, unless you write a patch to show what you mean and
how that is going to be migration-compatible.

Does your not answering the question mean you don't know?

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH qom-cpu] HACKING: Document vaddr type usage

2013-07-23 Thread Andreas Färber

Am 23.07.2013 08:33, schrieb Paolo Bonzini:
 Il 22/07/2013 19:27, Peter Maydell ha scritto:
 On 22 July 2013 17:36, Andreas Färber afaer...@suse.de wrote:
 Signed-off-by: Andreas Färber afaer...@suse.de
 ---
  HACKING | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/HACKING b/HACKING
 index e73ac79..d9dbb46 100644
 --- a/HACKING
 +++ b/HACKING
 @@ -42,6 +42,7 @@ ram_addr_t.

  Use target_ulong (or abi_ulong) for CPU virtual addresses, however
  devices should not need to use target_ulong.
 +Use vaddr for CPU virtual addresses in target-independent code.

 Here's my suggestion for this paragraph (ie to replace
 both the Use target_ulong... and Use vaddr sentences
 above):

 ===begin===
 For CPU virtual addresses there are several possible types.
 vaddr is the best type to use to hold a CPU virtual address
 in target-independent code, including most devices. It is
 guaranteed to be large enough to hold a virtual address for
 any target, and it does not change size from target to target.
 It is always unsigned.
 target_ulong is a type the size of a virtual address on the CPU;
 this means it may be 32 or 64 bits depending on which target
 is being built. It should therefore be used only in target
 specific code, and in some performance-critical built-per-target
 core code such as the TLB code. There is also a signed version,
 target_long.
 abi_ulong is for the *-user targets, and represents a type the
 size of 'void *' in that target's ABI. (This may not be the same
 as the size of a full CPU virtual address in the case of target
 ABIs which use 32 bit pointers on 64 bit CPUs, like sparc32plus.)
 Definitions of structures that must match the target's ABI
 must use this type for anything that on the target is defined
 to be an 'unsigned long' or a pointer type. There is also a
 signed version, abi_long.
 ===endit===

 (cc'ing Paolo to check I didn't mangle the abi_ulong/target_ulong
 distinction.)
 
 Yes, it's fine.  You might add that abi_short, abi_int, etc. also exist,
 and that they have the same alignment as short/int/etc in the target ABI.
 
 There is one nit: tn fact abi_ulong has the size of 'long'---which is
 the same as 'void *', but only because our *-user targets are all ILP32
 or LP64.  A hypotectical windows-user target would make abi_ullong the
 size of 'void *'.

Given the number of people that will read HACKING to that detail level,
I hope you can send a follow-up patch to clarify that once merged. :)

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [qom-cpu PATCH 2/2] i386: disable PMU passthrough mode by default

2013-07-23 Thread Paolo Bonzini

Il 22/07/2013 21:25, Eduardo Habkost ha scritto:
 Bug description: QEMU currently gets all bits from GET_SUPPORTED_CPUID
 for CPUID leaf 0xA and passes them directly to the guest. This makes
 the guest ABI depend on host kernel and host CPU capabilities, and
 breaks live migration if we migrate between host with different
 capabilities (e.g. different number of PMU counters).
 
 This patch adds a pmu-passthrough property to X86CPU, and set it to
 true only on -cpu host, or on pc-*-1.5 and older machine-types.

Can we just call the property pmu?  It doesn't have to be passthough.

Later we can support setting the right values for leaf 0xA.  This way
migration will work even for -cpu other than -cpu host, and the same
option will work.

Paolo

 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
 ---
  include/hw/i386/pc.h  |  4 
  target-i386/cpu-qom.h |  7 +++
  target-i386/cpu.c | 11 ++-
  3 files changed, 21 insertions(+), 1 deletion(-)
 
 diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
 index 7fb97b0..3cea83f 100644
 --- a/include/hw/i386/pc.h
 +++ b/include/hw/i386/pc.h
 @@ -235,6 +235,10 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t);
  .driver   = virtio-net-pci,\
  .property = any_layout,\
  .value= off,\
 +},{\
 +.driver = TYPE_X86_CPU,\
 +.property = pmu-passthrough,\
 +.value = on,\
  }
  
  #define PC_COMPAT_1_4 \
 diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
 index 7e55e5f..b505a45 100644
 --- a/target-i386/cpu-qom.h
 +++ b/target-i386/cpu-qom.h
 @@ -68,6 +68,13 @@ typedef struct X86CPU {
  
  /* Features that were filtered out because of missing host capabilities 
 */
  uint32_t filtered_features[FEATURE_WORDS];
 +
 +/* Pass all PMU CPUID bits to the guest directly from 
 GET_SUPPORTED_CPUID.
 + * This can't be enabled by default because it breaks live-migration,
 + * as it makes the guest ABI change depending on host CPU/kernel
 + * capabilities.
 + */
 +bool pmu_passthrough;
  } X86CPU;
  
  static inline X86CPU *x86_env_get_cpu(CPUX86State *env)
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 41c81af..e192f63 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -1475,17 +1475,25 @@ static void x86_cpu_get_feature_words(Object *obj, 
 Visitor *v, void *opaque,
  error_propagate(errp, err);
  }
  
 +static Property cpu_x86_properties[] = {
 +DEFINE_PROP_BOOL(pmu-passthrough, X86CPU, pmu_passthrough, false),
 +DEFINE_PROP_END_OF_LIST(),
 +};
 +
  static int cpu_x86_find_by_name(X86CPU *cpu, x86_def_t *x86_cpu_def,
  const char *name)
  {
  x86_def_t *def;
  int i;
 +Error *err = NULL;
  
  if (name == NULL) {
  return -1;
  }
  if (kvm_enabled()  strcmp(name, host) == 0) {
  kvm_cpu_fill_host(x86_cpu_def);
 +object_property_set_bool(OBJECT(cpu), true, pmu-passthrough, err);
 +assert_no_error(err);
  return 0;
  }
  
 @@ -2017,7 +2025,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
 uint32_t count,
  break;
  case 0xA:
  /* Architectural Performance Monitoring Leaf */
 -if (kvm_enabled()) {
 +if (kvm_enabled()  cpu-pmu_passthrough) {
  KVMState *s = cs-kvm_state;
  
  *eax = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EAX);
 @@ -2516,6 +2524,7 @@ static void x86_cpu_common_class_init(ObjectClass *oc, 
 void *data)
  xcc-parent_realize = dc-realize;
  dc-realize = x86_cpu_realizefn;
  dc-bus_type = TYPE_ICC_BUS;
 +dc-props = cpu_x86_properties;
  
  xcc-parent_reset = cc-reset;
  cc-reset = x86_cpu_reset;

Re: [Qemu-devel] [PATCH for-1.6 01/11] ignore SIGPIPE in qemu-img and qemu-io

2013-07-23 Thread Paolo Bonzini

Il 23/07/2013 10:30, MORITA Kazutaka ha scritto:
 This prevents the tools from being stopped when they write data to a
 closed connection in the other side.
 
 Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 ---
  qemu-img.c | 4 
  qemu-io.c  | 4 
  2 files changed, 8 insertions(+)
 
 diff --git a/qemu-img.c b/qemu-img.c
 index c55ca5c..919d464 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -2319,6 +2319,10 @@ int main(int argc, char **argv)
  const img_cmd_t *cmd;
  const char *cmdname;
  
 +#ifdef CONFIG_POSIX
 +signal(SIGPIPE, SIG_IGN);
 +#endif
 +
  error_set_progname(argv[0]);
  
  qemu_init_main_loop();
 diff --git a/qemu-io.c b/qemu-io.c
 index cb9def5..d54dc86 100644
 --- a/qemu-io.c
 +++ b/qemu-io.c
 @@ -335,6 +335,10 @@ int main(int argc, char **argv)
  int opt_index = 0;
  int flags = BDRV_O_UNMAP;
  
 +#ifdef CONFIG_POSIX
 +signal(SIGPIPE, SIG_IGN);
 +#endif
 +
  progname = basename(argv[0]);
  
  while ((c = getopt_long(argc, argv, sopt, lopt, opt_index)) != -1) {
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

and adding qemu-stable for this one.

Re: [Qemu-devel] [PATCH 0/3] dataplane: virtio-blk live migration with x-data-plane=on

2013-07-23 Thread yinyin

Hi, Stefan:
during the migration, the source, not the destination, will start 
dataplane again
I think the process of migration with dataplane as follows:
1. migration begin to start
2. the migration source stop the dataplane
3. do migration ...
4. migration completed, the destination start the dataplane.

when migration start, the source dataplane should already stopped, and not 
start again, if there is no cancel or abort.
But the trace show that, in step 3 above, the source dataplane will be start by 
virtio_blk_handle_output. I'm afraid of some inconsistent will happen there.Is 
it right?

There is no crash found, I just use this trace to understand the flow of 
dataplane migration. 

Yin Yin
yin@cs2c.com.cn

 
在 2013-7-23，下午4:51，Stefan Hajnoczi stefa...@redhat.com 写道：

 On Fri, Jul 19, 2013 at 01:33:12PM +0800, yinyin wrote:
  I use systemtap to test this patch,the migration will success. But I 
 found the dataplane will start again after migration start. the 
 virtio_blk_handle_output will start dataplane.
 
 Hi Yin Yin,
 Thank you for testing the patch.  It is not clear to me whether you
 encountered a problem or not.
 
 It is expected that the destination will start the dataplane thread.
 Was there a crash or some reason why you posted these traces?
 
 Stefan

Re: [Qemu-devel] [RFC PATCH] vga: Start supporting resolution not multiple of 16 correctly.

2013-07-23 Thread Fabio Fantoni


Il 18/06/2013 12:17, Frediano Ziglio ha scritto:

Modern notebook support 1366x768 resolution. The resolution width is
not multiple of 16 causing some problems.

QEMU VGA emulation requires width resolution to be multiple of 8.

VNC implementation requires width resolution to be multiple of 16.

Signed-off-by: Frediano Ziglio frediano.zig...@citrix.com 
mailto:frediano.zig...@citrix.com


Tested-by: Fabio Fantoni fabio.fant...@m2r.biz

I tested it for a long time with spice on xen (because qxl will be fully 
working only after adding SSE support on hvm domUs). It works, I think 
it is good to add this and the respective vgabios patch on upstream.



---
 hw/display/vga.c |2 +-
 ui/vnc.c |   34 +++---
 2 files changed, 20 insertions(+), 16 deletions(-)

Updates:
- rebased
- fixed style problems
- fixed typos in comment

Attached patch for last vgabios in order to get some new resolutions.
Still had no time to test deeply.

diff --git a/hw/display/vga.c b/hw/display/vga.c
index 21a108d..0053b0f 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -648,7 +648,7 @@ void vbe_ioport_write_data(void *opaque, uint32_t 
addr, uint32_t val)

 }
 break;
 case VBE_DISPI_INDEX_XRES:
-if ((val = VBE_DISPI_MAX_XRES)  ((val  7) == 0)) {
+if ((val = VBE_DISPI_MAX_XRES)  ((val  1) == 0)) {
 s-vbe_regs[s-vbe_index] = val;
 }
 break;
diff --git a/ui/vnc.c b/ui/vnc.c
index dfc7459..2a2bb90 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -911,26 +911,30 @@ static int vnc_update_client(VncState *vs, int 
has_dirty)

 for (y = 0; y  height; y++) {
 int x;
 int last_x = -1;
-for (x = 0; x  width / 16; x++) {
-if (test_and_clear_bit(x, vs-dirty[y])) {
+for (x = 0; x  width; x += 16) {
+if (test_and_clear_bit(x/16, vs-dirty[y])) {
 if (last_x == -1) {
 last_x = x;
 }
 } else {
 if (last_x != -1) {
-int h = find_and_clear_dirty_height(vs, y, 
last_x, x,

- height);
+int h = find_and_clear_dirty_height(vs, y, 
last_x/16,

+ x/16, height);

-n += vnc_job_add_rect(job, last_x * 16, y,
-  (x - last_x) * 16, h);
+n += vnc_job_add_rect(job, last_x, y,
+  (x - last_x), h);
 }
 last_x = -1;
 }
 }
 if (last_x != -1) {
-int h = find_and_clear_dirty_height(vs, y, last_x, x, 
height);

-n += vnc_job_add_rect(job, last_x * 16, y,
-  (x - last_x) * 16, h);
+int h = find_and_clear_dirty_height(vs, y, last_x/16, 
x/16,

+ height);
+if (x  width) {
+x = width;
+}
+n += vnc_job_add_rect(job, last_x, y,
+  (x - last_x), h);
 }
 }

@@ -1861,7 +1865,7 @@ static void framebuffer_update_request(VncState 
*vs, int incremental,

int w, int h)
 {
 int i;
-const size_t width = surface_width(vs-vd-ds) / 16;
+const size_t width = (surface_width(vs-vd-ds)+15) / 16;
 const size_t height = surface_height(vs-vd-ds);

 if (y_position  height) {
@@ -2686,10 +2690,6 @@ static int 
vnc_refresh_server_surface(VncDisplay *vd)

  * Check and copy modified bits from guest to server surface.
  * Update server dirty map.
  */
-cmp_bytes = 64;
-if (cmp_bytes  vnc_server_fb_stride(vd)) {
-cmp_bytes = vnc_server_fb_stride(vd);
-}
 if (vd-guest.format != VNC_SERVER_FB_FORMAT) {
 int width = pixman_image_get_width(vd-server);
 tmpbuf = qemu_pixman_linebuf_create(VNC_SERVER_FB_FORMAT, width);
@@ -2710,8 +2710,12 @@ static int 
vnc_refresh_server_surface(VncDisplay *vd)

 }
 server_ptr = server_row;

-for (x = 0; x + 15  width;
+cmp_bytes = 64;
+for (x = 0; x  width;
 x += 16, guest_ptr += cmp_bytes, server_ptr += 
cmp_bytes) {

+if (width - x  16) {
+cmp_bytes = 4 * (width - x);
+}
 if (!test_and_clear_bit((x / 16), vd-guest.dirty[y]))
 continue;
 if (memcmp(server_ptr, guest_ptr, cmp_bytes) == 0)
--
1.7.10.4



2013/5/10 Andreas Färber afaer...@suse.de mailto:afaer...@suse.de

Am 15.03.2013 19:14, schrieb Frediano Ziglio:
 Modern notebook support 136x768 resolution. The resolution width is

1366?

 not multiple of 16 causing some problems.

a multiple? (me not a native English speaker)

Re: [Qemu-devel] [PATCH 00/28] Memory API for 1.6: fix I/O port endianness mess

2013-07-23 Thread Paolo Bonzini

Il 22/07/2013 22:16, Hervé Poussineau ha scritto:
 PReP is an exception, but
 I think it could be rewritten to use an IOMMU memory region.
 
 PReP PCI I/O area is located at 0x8000, up to 0xbf7f (in main
 memory space region), while ISA I/O area is at 0x8000, up to
 0x8000 (size=64KB)

or up to 0x807ff for non-contiguous mode.  The IOMMU memory region
would let you implement non-contiguous mode without calling
cpu_inb/cpu_outb.

 However, as they are overlapped, some strange things can happen.
 For example, IBM 40p firmware configures the PCI SCSI bar at 0x2000
 (ie 0xa000 in main memory), while Linux sets bar to 0x1000 (ie
 0x80001000 in main memory), ie also in ISA I/O space.

If BARs have a lower priority than assigned ISA I/O space, this should
just work if you use an alias memory region for ISA I/O space.  Gaps
in the ISA I/O space will let you see through the ISA I/O space and
access BARs.

If BARs have a higher priority, you need to set the priority accordingly
for the ISA I/O space alias (using memory_region_add_subregion_overlap),
but that's it.

By the way, i82378.c can also use memory_region_init_alias to initialize
the memory regions that are passed to pci_register_bar.  I didn't do
this in this series.

 I don't know exactly what you mean by an IOMMU memory region, but how
 would you modelize it, so that 0x80001000 and 0xa000 accesses are
 redirected to PCI SCSI card, while 0x83f8 redirects (for example) to
 an ISA serial port?

See above.

Paolo

 If you create a new memory region for ISA I/O space, and you redirect
 all accesses from 0x8000-0x8000 to this new address space,
 0x80001000 won't work to access the SCSI I/O bar (located in the PCI I/O
 address space).

Re: [Qemu-devel] [PATCH 0/3] dataplane: virtio-blk live migration with x-data-plane=on

2013-07-23 Thread Andreas Färber

Hi,

Am 23.07.2013 11:19, schrieb yinyin:
 Hi, Stefan:
   during the migration, the source, not the destination, will start 
 dataplane again
   I think the process of migration with dataplane as follows:
 1. migration begin to start
 2. the migration source stop the dataplane
 3. do migration ...
 4. migration completed, the destination start the dataplane.

I can't speak for the dataplane, but in general the source guest is
expected to continue working during live migration (that's the live
part) until it has been fully transferred to the destination.

HTH,
Andreas

 when migration start, the source dataplane should already stopped, and not 
 start again, if there is no cancel or abort.
 But the trace show that, in step 3 above, the source dataplane will be start 
 by virtio_blk_handle_output. I'm afraid of some inconsistent will happen 
 there.Is it right?
 
 There is no crash found, I just use this trace to understand the flow of 
 dataplane migration. 
 
 Yin Yin
 yin@cs2c.com.cn
 
  
 在 2013-7-23，下午4:51，Stefan Hajnoczi stefa...@redhat.com 写道：
 
 On Fri, Jul 19, 2013 at 01:33:12PM +0800, yinyin wrote:
 I use systemtap to test this patch,the migration will success. But I 
 found the dataplane will start again after migration start. the 
 virtio_blk_handle_output will start dataplane.

 Hi Yin Yin,
 Thank you for testing the patch.  It is not clear to me whether you
 encountered a problem or not.

 It is expected that the destination will start the dataplane thread.
 Was there a crash or some reason why you posted these traces?

 Stefan

 
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [PATCH v2 1/7] AARCH64: Add A57 CPU to default AArch64 configuration and enable KVM

2013-07-23 Thread Mian M. Hamayun

From: Mian M. Hamayun m.hama...@virtualopensystems.com

Introduce the A57 cpu to the default AArch64 configuration and enable KVM for
64-bit guests only.

Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com
---
 configure   |2 +-
 default-configs/aarch64-softmmu.mak |3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 2f83771..021c1dd 100755
--- a/configure
+++ b/configure
@@ -4293,7 +4293,7 @@ case $target_name in
   *)
 esac
 case $target_name in
-  arm|i386|x86_64|ppcemb|ppc|ppc64|s390x)
+  arm|aarch64|i386|x86_64|ppcemb|ppc|ppc64|s390x)
 # Make sure the target and host cpus are compatible
 if test $kvm = yes -a $target_softmmu = yes -a \
   \( $target_name = $cpu -o \
diff --git a/default-configs/aarch64-softmmu.mak 
b/default-configs/aarch64-softmmu.mak
index 27cbe3d..00c8bdd 100644
--- a/default-configs/aarch64-softmmu.mak
+++ b/default-configs/aarch64-softmmu.mak
@@ -1,4 +1,4 @@
-# Default configuration for arm-softmmu
+# Default configuration for aarch64-softmmu
 
 include pci.mak
 include usb.mak
@@ -37,6 +37,7 @@ CONFIG_USB_MUSB=y
 CONFIG_ARM9MPCORE=y
 CONFIG_ARM11MPCORE=y
 CONFIG_ARM15MPCORE=y
+CONFIG_ARM57MPCORE=y
 
 CONFIG_ARM_GIC=y
 CONFIG_ARM_GIC_KVM=$(CONFIG_KVM)
-- 
1.7.9.5

[Qemu-devel] [PATCH v2 2/7] Add the additional parent parameter to memory region init calls

2013-07-23 Thread Mian M. Hamayun

From: Mian M. Hamayun m.hama...@virtualopensystems.com

The memory region init calls require an additional parent parameter, so
introduce a null parent parameter to make it happy.

Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com
---
 hw/arm/virt.c  |2 +-
 hw/cpu/a57mpcore.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 97712d7..8a2bdc7 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -407,7 +407,7 @@ static void machvirt_init(QEMUMachineInitArgs *args)
 }
 fdt_add_cpu_nodes(virt_fdt, mi, smp_cpus);
 
-memory_region_init_ram(ram, mach-virt.ram, ram_size);
+memory_region_init_ram(ram, NULL, mach-virt.ram, ram_size);
 vmstate_register_ram_global(ram);
 memory_region_add_subregion(sysmem, mi-mem_base, ram);
 
diff --git a/hw/cpu/a57mpcore.c b/hw/cpu/a57mpcore.c
index 2923a2a..1ab6dc0 100644
--- a/hw/cpu/a57mpcore.c
+++ b/hw/cpu/a57mpcore.c
@@ -70,7 +70,7 @@ static int a57mp_priv_init(SysBusDevice *dev)
  *  0x5000-0x5fff -- GIC virtual interface control (not modelled)
  *  0x6000-0x7fff -- GIC virtual CPU interface (not modelled)
  */
-memory_region_init(s-container, a57mp-priv-container, 0x8000);
+memory_region_init(s-container, NULL, a57mp-priv-container, 0x8000);
 memory_region_add_subregion(s-container, 0x1000,
 sysbus_mmio_get_region(busdev, 0));
 memory_region_add_subregion(s-container, 0x2000,
-- 
1.7.9.5

[Qemu-devel] [PATCH v2 0/7] AARCH64 support on machvirt machine model using KVM

2013-07-23 Thread Mian M. Hamayun

From: Mian M. Hamayun m.hama...@virtualopensystems.com

This is the v2 of patch series that implements KVM support in QEMU for the ARMv8
Cortex A57 CPU. It depends on the previously submitted AArch64 preparation patch
series v5 and machvirt patches, and uses the already available KVM in-kernel GIC
support. Current implementation for 64-bit guests only, and 32-bit guest support
will be introduced in near future.

As a reference, KVM Tool and the AArch64 bootwrapper were used, as well as
public documentation from ARM. The following work has been tested with SMP
capabilities, under ARMv8 Fast and Foundation Models (Open Embedded userspace
with an emulated MMC).

The v1 of this patch series related to AArch64 CPU model for Versatile Express
was sponsored by Huawei, and developed in collaboration between Huawei
Technologies Duesseldorf GmbH - European Research Center Munich (ERC) and
Virtual Open Systems.

A working tree of this implementation is available on the kvm-aarch64-v2
branch of the following github repository.

https://github.com/virtualopensystems/qemu/tree/kvm-aarch64-v2

Summary of Changes:

Changes v1 - v2
 * Based on AArch64 Preparation Patchset V5 and machvirt patches.
 * Implemented for Machvirt Machine Model.
 * Architecture-specific CPU initialization code improved. Removed hardcoding
   from register set/get loops and introduced CPU target type array to find
   appropriate ARMv8 CPU type supported by KVM.
 * Disable the PSCI method in case of AArch64 and use the spin-table method
   instead for booting secondary CPUs.
 * 32-bit guest support still missing

v1
 * Based on AArch64 Preparation Patchset V4
 * Implemented for Versatile Express Machine Model
 * Support for SMP using bootcode injection
 * No 32-bit guest support

Alexander Spyridakis (1):
  AARCH64: Add SMP support for aarch64 processors

Mian M. Hamayun (6):
  AARCH64: Add A57 CPU to default AArch64 configuration and enable KVM
  Add the additional parent parameter to memory region init calls
  AARCH64: Add aarch64 CPU initialization, get and put registers
support
  AARCH64: Add boot support for aarch64 processor
  AARCH64: Disable the non-aarch64 specific reset code
  AARCH64: Use the spin-table method for booting secondary processors
in machvirt

 configure   |2 +-
 default-configs/aarch64-softmmu.mak |3 +-
 hw/arm/boot.c   |   62 +++--
 hw/arm/virt.c   |   16 -
 hw/cpu/a57mpcore.c  |2 +-
 linux-headers/linux/kvm.h   |1 +
 target-arm/kvm.c|  127 +++
 7 files changed, 201 insertions(+), 12 deletions(-)

-- 
1.7.9.5

[Qemu-devel] [PATCH v2 6/7] AARCH64: Add SMP support for aarch64 processors

2013-07-23 Thread Mian M. Hamayun

From: Alexander Spyridakis a.spyrida...@virtualopensystems.com

AArch64 uses a cpu-release-addr memory location (defined in the dts) as
a way to inform secondary CPUs where to jump to and enter their holding
pen. Inject a very simple bootloader that polls this memory location,
until the primary CPU sets it to the right address.

Signed-off-by: Alexander Spyridakis a.spyrida...@virtualopensystems.com
---
 hw/arm/boot.c |   20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index b9b0beb..efbd984 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -17,6 +17,8 @@
 #include sysemu/device_tree.h
 #include qemu/config-file.h
 
+#define DSB_INSN 0xf57ff04f
+#define CP15_DSB_INSN 0xee070f9a /* mcr cp15, 0, r0, c7, c10, 4 */
 #define KERNEL_ARGS_ADDR 0x100
 
 #ifdef TARGET_AARCH64
@@ -40,6 +42,16 @@ static uint32_t bootloader[] = {
 0x /* .word @Board ID Higher 32-bits -- Placeholder */
 };
 
+static uint32_t smpboot[] = {
+0x18c5, /* ldr w5, =mbox_value - mbox value for secondary CPUs */
+0xf94000a4, /* 1: ldr  x4, [x5] - Read address to jump to */
+0xb4e4, /* cbz x4, 1b - Check if mbox value is zero, if yes retry */
+0xd61f0080, /* br  x4 - Branch to given address */
+0x0,/* padding word */
+0x0,/* gic_cpu_if_addr */
+0x8000fff8  /* mbox_value: default mbox value (aka cpu_release_addr) */
+};
+
 #else
 #define KERNEL_LOAD_ADDR0x0001
 #define KERNEL_BOARDID_INDEX4
@@ -56,7 +68,6 @@ static uint32_t bootloader[] = {
   0, /* Address of kernel args.  Set by integratorcp_init.  */
   0  /* Kernel entry point.  Set by integratorcp_init.  */
 };
-#endif
 
 /* Handling for secondary CPU boot in a multicore system.
  * Unlike the uniprocessor/primary CPU boot, this is platform
@@ -72,8 +83,6 @@ static uint32_t bootloader[] = {
  * for an interprocessor interrupt and polling a configurable
  * location for the kernel secondary CPU entry point.
  */
-#define DSB_INSN 0xf57ff04f
-#define CP15_DSB_INSN 0xee070f9a /* mcr cp15, 0, r0, c7, c10, 4 */
 
 static uint32_t smpboot[] = {
   0xe59f2028, /* ldr r2, gic_cpu_if */
@@ -91,6 +100,7 @@ static uint32_t smpboot[] = {
   0,  /* gic_cpu_if: base address of GIC CPU interface */
   0   /* bootreg: Boot register address is held here */
 };
+#endif
 
 static void default_write_secondary(ARMCPU *cpu,
 const struct arm_boot_info *info)
@@ -115,8 +125,12 @@ static void default_reset_secondary(ARMCPU *cpu,
 {
 CPUARMState *env = cpu-env;
 
+#ifdef TARGET_AARCH64
+env-pc = info-smp_loader_start;
+#else
 stl_phys_notdirty(info-smp_bootreg_addr, 0);
 env-regs[15] = info-smp_loader_start;
+#endif
 }
 
 #define WRITE_WORD(p, value) do { \
-- 
1.7.9.5

[Qemu-devel] [PATCH v2 4/7] AARCH64: Add boot support for aarch64 processor

2013-07-23 Thread Mian M. Hamayun

From: Mian M. Hamayun m.hama...@virtualopensystems.com

This version supports booting of a single Aarch64 CPU by setting appropriate
registers. The bootloader includes placehoders for Board-ID that are used to
implementing uniform indexing across different bootloaders. The same macro
names are used with different values when compiling for different processors.

Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com

Conflicts:

hw/arm/boot.c
---
 hw/arm/boot.c |   44 +++-
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 7cca2b3..b9b0beb 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -18,7 +18,33 @@
 #include qemu/config-file.h
 
 #define KERNEL_ARGS_ADDR 0x100
-#define KERNEL_LOAD_ADDR 0x0001
+
+#ifdef TARGET_AARCH64
+#define KERNEL_LOAD_ADDR0x0008
+#define KERNEL_ARGS_INDEX   6
+#define KERNEL_ENTRY_INDEX  8
+#define KERNEL_BOARDID_INDEX10
+
+static uint32_t bootloader[] = {
+0x58c0,/* ldr  x0, 18 ; Load the lower 32-bits of DTB */
+0xaa1f03e1,/* mov  x1, xzr */
+0xaa1f03e2,/* mov  x2, xzr */
+0xaa1f03e3,/* mov  x3, xzr */
+0x5884,/* ldr  x4, 20 ; Load the lower 32-bits of kernel entry 
*/
+0xd61f0080,/* br   x4 ; Jump to the kernel entry point */
+0x,/* .word @DTB Lower 32-bits */
+0x,/* .word @DTB Higher 32-bits */
+0x,/* .word @Kernel Entry Lower 32-bits */
+0x,/* .word @Kernel Entry Higher 32-bits */
+0x,/* .word @Board ID Lower 32-bits -- Placeholder */
+0x /* .word @Board ID Higher 32-bits -- Placeholder */
+};
+
+#else
+#define KERNEL_LOAD_ADDR0x0001
+#define KERNEL_BOARDID_INDEX4
+#define KERNEL_ARGS_INDEX   5
+#define KERNEL_ENTRY_INDEX  6
 
 /* The worlds second smallest bootloader.  Set r0-r2, then jump to kernel.  */
 static uint32_t bootloader[] = {
@@ -30,6 +56,7 @@ static uint32_t bootloader[] = {
   0, /* Address of kernel args.  Set by integratorcp_init.  */
   0  /* Kernel entry point.  Set by integratorcp_init.  */
 };
+#endif
 
 /* Handling for secondary CPU boot in a multicore system.
  * Unlike the uniprocessor/primary CPU boot, this is platform
@@ -341,8 +368,15 @@ static void do_cpu_reset(void *opaque)
 env-regs[15] = info-entry  0xfffe;
 env-thumb = info-entry  1;
 } else {
+#ifdef TARGET_AARCH64
+env-pstate = PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | 
PSR_MODE_EL1h;
+#endif
 if (env == first_cpu) {
+#ifdef TARGET_AARCH64
+env-pc = info-loader_start;
+#else
 env-regs[15] = info-loader_start;
+#endif
 if (!info-dtb_filename) {
 if (old_param) {
 set_kernel_args_old(info);
@@ -447,7 +481,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
*info)
 }
 info-initrd_size = initrd_size;
 
-bootloader[4] = info-board_id;
+bootloader[KERNEL_BOARDID_INDEX] = info-board_id;
 
 /* for device tree boot, we pass the DTB directly in r2. Otherwise
  * we point to the kernel args.
@@ -462,9 +496,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
*info)
 if (load_dtb(dtb_start, info)) {
 exit(1);
 }
-bootloader[5] = dtb_start;
+bootloader[KERNEL_ARGS_INDEX] = dtb_start;
 } else {
-bootloader[5] = info-loader_start + KERNEL_ARGS_ADDR;
+bootloader[KERNEL_ARGS_INDEX] = info-loader_start + 
KERNEL_ARGS_ADDR;
 if (info-ram_size = (1ULL  32)) {
 fprintf(stderr, qemu: RAM size must be less than 4GB to boot
  Linux kernel using ATAGS (try passing a device tree
@@ -472,7 +506,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
*info)
 exit(1);
 }
 }
-bootloader[6] = entry;
+bootloader[KERNEL_ENTRY_INDEX] = entry;
 for (n = 0; n  sizeof(bootloader) / 4; n++) {
 bootloader[n] = tswap32(bootloader[n]);
 }
-- 
1.7.9.5

Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard

2013-07-23 Thread Stefan Hajnoczi

On Wed, Jul 17, 2013 at 05:42:06PM +0800, Fam Zheng wrote:
 Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
 to BlockDriverState, since in_use mechanism cannot provide proper
 management of lifecycle when a BDS is referenced in multiple places
 (e.g. pointed to by another bs's backing_hd while also used as a block
 job device, in the use case of image fleecing).
 
 The original in_use case is considered a hard reference in this patch,
 where the bs is busy and should not be used in other tasks that require
 a hard reference. (However the interface doesn't force this, caller
 still need to call bdrv_in_use() to check by itself.).
 
 A soft reference is implemented but not used yet. It will be used in
 following patches to manage the lifecycle together with hard reference.
 
 If bdrv_ref() is called on a BDS, it must be released by exactly the
 same numbers of bdrv_unref() with the same soft/hard type, and never
 call bdrv_delete() directly. If the BDS is only used locally (unnamed),
 bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().

It is risky to keep bdrv_delete() public.  I suggest replacing
bdrv_delete() callers with bdrv_unref() and then making bdrv_delete()
static in block.c.

This way it is impossible to make the mistake of calling bdrv_delete()
on a BDS which has refcnt  1.

I don't really understand this patch.  There are now two separate
refcounts.  They must both reach 0 for deletion to occur.  I think
you plan to treat the hard refcount like the in_use counter (there
should only be 0 or 1 refs) but you don't enforce it.  It seems cleaner
to keep in_use separate: let in_use callers take a refcount and also set
in_use.

[Qemu-devel] [PATCH v2 5/7] AARCH64: Disable the non-aarch64 specific reset code

2013-07-23 Thread Mian M. Hamayun

From: Mian M. Hamayun m.hama...@virtualopensystems.com

This commit disables the co-processor registers reset code for KVM, when
compiling for AArch64 cpus.

Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com
---
 target-arm/kvm.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index c96b871..5909d75 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -700,6 +700,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 
 void kvm_arch_reset_vcpu(CPUState *cs)
 {
+#ifndef TARGET_AARCH64
 /* Feed the kernel back its initial register state */
 ARMCPU *cpu = ARM_CPU(cs);
 
@@ -709,6 +710,7 @@ void kvm_arch_reset_vcpu(CPUState *cs)
 if (!write_list_to_kvmstate(cpu)) {
 abort();
 }
+#endif
 }
 
 bool kvm_arch_stop_on_emulation_error(CPUState *cs)
-- 
1.7.9.5

[Qemu-devel] [PATCH v2 3/7] AARCH64: Add aarch64 CPU initialization, get and put registers support

2013-07-23 Thread Mian M. Hamayun

From: Mian M. Hamayun m.hama...@virtualopensystems.com

The cpu init function tries to initialize with all possible cpu types, as
KVM does not provide a means to detect the real cpu type and simply refuses
to initialize on cpu type mis-match. By using the loop based init function,
we avoid the need to modify code if the underlying platform is different,
such as Fast Models instead of Foundation Models.

Get and Put Registers deal with the basic state of Aarch64 CPUs for the moment.

Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com

Conflicts:

target-arm/kvm.c

Conflicts:

target-arm/cpu.c
---
 linux-headers/linux/kvm.h |1 +
 target-arm/kvm.c  |  125 +
 2 files changed, 126 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index c614070..4df5292 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -783,6 +783,7 @@ struct kvm_dirty_tlb {
 #define KVM_REG_IA64   0x3000ULL
 #define KVM_REG_ARM0x4000ULL
 #define KVM_REG_S390   0x5000ULL
+#define KVM_REG_ARM64  0x6000ULL
 
 #define KVM_REG_SIZE_SHIFT 52
 #define KVM_REG_SIZE_MASK  0x00f0ULL
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index b92e00d..c96b871 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -32,6 +32,11 @@
 #error mismatch between cpu.h and KVM header definitions
 #endif
 
+#ifdef TARGET_AARCH64
+#define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
+ KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
+#endif
+
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
 };
@@ -50,6 +55,33 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
 return cpu-cpu_index;
 }
 
+#ifdef TARGET_AARCH64
+static uint32_t kvm_arm_targets[KVM_ARM_NUM_TARGETS] = {
+KVM_ARM_TARGET_AEM_V8,
+KVM_ARM_TARGET_FOUNDATION_V8,
+KVM_ARM_TARGET_CORTEX_A57
+};
+
+int kvm_arch_init_vcpu(CPUState *cs)
+{
+struct kvm_vcpu_init init;
+int ret, i;
+
+memset(init.features, 0, sizeof(init.features));
+/* Find an appropriate target CPU type.
+ * KVM does not provide means to detect the host CPU type on aarch64,
+ * and simply refuses to initialize, if the CPU type mis-matches;
+ * so we try each possible CPU type on aarch64 before giving up! */
+for (i = 0; i  KVM_ARM_NUM_TARGETS; ++i) {
+init.target = kvm_arm_targets[i];
+ret = kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_INIT, init);
+if (!ret)
+break;
+}
+
+return ret;
+}
+#else
 static bool reg_syncs_via_tuple_list(uint64_t regidx)
 {
 /* Return true if the regidx is a register we should synchronize
@@ -173,6 +205,7 @@ out:
 g_free(rlp);
 return ret;
 }
+#endif
 
 /* We track all the KVM devices which need their memory addresses
  * passing to the kernel in a list of these structures.
@@ -339,6 +372,7 @@ typedef struct Reg {
 int offset;
 } Reg;
 
+#ifndef TARGET_AARCH64
 #define COREREG(KERNELNAME, QEMUFIELD)   \
 {\
 KVM_REG_ARM | KVM_REG_SIZE_U32 | \
@@ -402,7 +436,52 @@ static const Reg regs[] = {
 VFPSYSREG(FPINST),
 VFPSYSREG(FPINST2),
 };
+#endif
 
+#ifdef TARGET_AARCH64
+int kvm_arch_put_registers(CPUState *cs, int level)
+{
+struct kvm_one_reg reg;
+int i;
+int ret;
+
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = cpu-env;
+
+for (i = 0; i  ARRAY_SIZE(env-xregs); i++) {
+reg.id = AARCH64_CORE_REG(regs.regs[i]);
+reg.addr = (uintptr_t) env-xregs[i];
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, reg);
+if (ret) {
+return ret;
+}
+}
+
+reg.id = AARCH64_CORE_REG(regs.sp);
+reg.addr = (uintptr_t) env-xregs[31];
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, reg);
+if (ret) {
+return ret;
+}
+
+reg.id = AARCH64_CORE_REG(regs.pstate);
+reg.addr = (uintptr_t) env-pstate;
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, reg);
+if (ret) {
+return ret;
+}
+
+reg.id = AARCH64_CORE_REG(regs.pc);
+reg.addr = (uintptr_t) env-pc;
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, reg);
+if (ret) {
+return ret;
+}
+
+/* TODO: Set Rest of Registers */
+return ret;
+}
+#else
 int kvm_arch_put_registers(CPUState *cs, int level)
 {
 ARMCPU *cpu = ARM_CPU(cs);
@@ -488,7 +567,52 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 
 return ret;
 }
+#endif
 
+#ifdef TARGET_AARCH64
+int kvm_arch_get_registers(CPUState *cs)
+{
+struct kvm_one_reg reg;
+int i;
+int ret;
+
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = cpu-env;
+
+for (i = 0; i  ARRAY_SIZE(env-xregs); i++) {
+reg.id = AARCH64_CORE_REG(regs.regs[i]);
+reg.addr = (uintptr_t)

[Qemu-devel] [PATCH v2 7/7] AARCH64: Use the spin-table method for booting secondary processors in machvirt

2013-07-23 Thread Mian M. Hamayun

From: Mian M. Hamayun m.hama...@virtualopensystems.com

As the SMP bootloader uses a spin-table to wait for the cpu_release_addr, we
disable the PSCI method for AArch64 in machvirt and use spin-table instead.

The CPU_RELEASE_OFFSET is introduced in machvirt and is to calculate the
cpu_release_addr by addition of this value to the memory base address, and this
value is updated in the smp bootloader using the smp_bootreg_addr board info
parameter.

Tested-by: Alexander Spyridakis a.spyrida...@virtualopensystems.com
Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com
---
 hw/arm/virt.c |   14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 8a2bdc7..44ab59d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -52,6 +52,7 @@
 #define IO_LEN 0x000f
 
 #if defined(TARGET_AARCH64)
+#define CPU_RELEASE_OFFSET 0xfff8
 #define DEFAULT_CPU_MODEL cortex-a57
 #elif defined(TARGET_ARM)
 #define DEFAULT_CPU_MODEL cortex-a15
@@ -170,7 +171,7 @@ static void *initial_fdt(struct machine_info *mi)
 qemu_devtree_setprop_cell(fdt, /soc, #interrupt-cells, 0x1);
 
 /* No PSCI for TCG yet */
-#ifdef CONFIG_KVM
+#if defined (CONFIG_KVM)  !defined(TARGET_AARCH64)
 if (kvm_enabled()) {
 qemu_devtree_add_subnode(fdt, /psci);
 qemu_devtree_setprop_string(fdt, /psci, compatible, arm,psci);
@@ -234,7 +235,14 @@ static void fdt_add_cpu_nodes(void *fdt, struct 
machine_info *mi, int smp_cpus)
 mi-cpu_compatible);
 
 if (smp_cpus  1) {
+#if defined(TARGET_AARCH64)
+qemu_devtree_setprop_string(fdt, cpu_name, enable-method,
+spin-table);
+qemu_devtree_setprop_u64(fdt, cpu_name, cpu-release-addr,
+(mi-mem_base + CPU_RELEASE_OFFSET));
+#else
 qemu_devtree_setprop_string(fdt, cpu_name, enable-method, 
psci);
+#endif
 }
 
 qemu_devtree_setprop_cell(fdt, cpu_name, reg, cpu);
@@ -437,6 +445,10 @@ static void machvirt_init(QEMUMachineInitArgs *args)
 machvirt_binfo.nb_cpus = smp_cpus;
 machvirt_binfo.board_id = -1;
 machvirt_binfo.loader_start = mi-mem_base;
+machvirt_binfo.smp_loader_start = mi-mem_base + 0x1000;
+#if defined(TARGET_AARCH64)
+machvirt_binfo.smp_bootreg_addr = mi-mem_base + CPU_RELEASE_OFFSET;
+#endif
 machvirt_binfo.get_dtb = machvirt_dtb;
 arm_load_kernel(arm_env_get_cpu(first_cpu), machvirt_binfo);
 }
-- 
1.7.9.5

Re: [Qemu-devel] [PATCH v2 2/7] Add the additional parent parameter to memory region init calls

2013-07-23 Thread Andreas Färber

Am 23.07.2013 11:33, schrieb Mian M. Hamayun:
 From: Mian M. Hamayun m.hama...@virtualopensystems.com
 
 The memory region init calls require an additional parent parameter, so
 introduce a null parent parameter to make it happy.
 
 Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com

This is not OK for something labelled PATCH. Patch series need to be
bisectable, not fixing up earlier patch series that have not been
applied yet.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 0/3] dataplane: virtio-blk live migration with x-data-plane=on

2013-07-23 Thread yinyin

Hi,
在 2013-7-23，下午5:30，Andreas Färber afaer...@suse.de 写道：

 Hi,
 
 Am 23.07.2013 11:19, schrieb yinyin:
 Hi, Stefan:
  during the migration, the source, not the destination, will start 
 dataplane again
  I think the process of migration with dataplane as follows:
 1. migration begin to start
 2. the migration source stop the dataplane
 3. do migration ...
 4. migration completed, the destination start the dataplane.
 
 I can't speak for the dataplane, but in general the source guest is
 expected to continue working during live migration (that's the live
 part) until it has been fully transferred to the destination.

when dataplane stop, the source guest can continue work, but not use dataplane 
thread.

 
 HTH,
 Andreas
 
 when migration start, the source dataplane should already stopped, and not 
 start again, if there is no cancel or abort.
 But the trace show that, in step 3 above, the source dataplane will be start 
 by virtio_blk_handle_output. I'm afraid of some inconsistent will happen 
 there.Is it right?
 
 There is no crash found, I just use this trace to understand the flow of 
 dataplane migration. 
 
 Yin Yin
 yin@cs2c.com.cn
 
 
 在 2013-7-23，下午4:51，Stefan Hajnoczi stefa...@redhat.com 写道：
 
 On Fri, Jul 19, 2013 at 01:33:12PM +0800, yinyin wrote:
I use systemtap to test this patch,the migration will success. But I 
 found the dataplane will start again after migration start. the 
 virtio_blk_handle_output will start dataplane.
 
 Hi Yin Yin,
 Thank you for testing the patch.  It is not clear to me whether you
 encountered a problem or not.
 
 It is expected that the destination will start the dataplane thread.
 Was there a crash or some reason why you posted these traces?
 
 Stefan
 
 
 
 
 
 -- 
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH v2 04/11] block: use refcnt for device attach/detach

2013-07-23 Thread Stefan Hajnoczi

On Wed, Jul 17, 2013 at 05:42:09PM +0800, Fam Zheng wrote:
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  block.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/block.c b/block.c
 index 7b46669..57a3876 100644
 --- a/block.c
 +++ b/block.c
 @@ -1622,6 +1622,7 @@ int bdrv_attach_dev(BlockDriverState *bs, void *dev)
  return -EBUSY;
  }
  bs-dev = dev;
 +bdrv_ref(bs, false);
  bdrv_iostatus_reset(bs);
  return 0;
  }
 @@ -1639,6 +1640,7 @@ void bdrv_detach_dev(BlockDriverState *bs, void *dev)
  {
  assert(bs-dev == dev);
  bs-dev = NULL;
 +bdrv_unref(bs, false);
  bs-dev_ops = NULL;
  bs-dev_opaque = NULL;
  bs-buffer_alignment = 512;

I'm not 100% sure about this.  sd_init() and
ide_init2_with_non_qdev_drives() call bdrv_attach_dev_nofail() but I
don't see a bdrv_detach_dev() call.

It may be necessary to audit the lifecycle of emulated storage
controllers more closely before the refcounts work correctly.

Re: [Qemu-devel] [PATCH v2 05/11] migration: omit drive ref as we have bdrv_ref now

2013-07-23 Thread Stefan Hajnoczi

On Wed, Jul 17, 2013 at 05:42:10PM +0800, Fam Zheng wrote:
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  block-migration.c | 2 --
  1 file changed, 2 deletions(-)
 
 diff --git a/block-migration.c b/block-migration.c
 index d558410..d14f4eb 100644
 --- a/block-migration.c
 +++ b/block-migration.c
 @@ -320,7 +320,6 @@ static void init_blk_migration_it(void *opaque, 
 BlockDriverState *bs)
  bmds-completed_sectors = 0;
  bmds-shared_base = block_mig_state.shared_base;
  alloc_aio_bitmap(bmds);
 -drive_get_ref(drive_get_by_blockdev(bs));
  bdrv_ref(bs, true);
  
  block_mig_state.total_sector_sum += sectors;
 @@ -558,7 +557,6 @@ static void blk_mig_cleanup(void)
  while ((bmds = QSIMPLEQ_FIRST(block_mig_state.bmds_list)) != NULL) {
  QSIMPLEQ_REMOVE_HEAD(block_mig_state.bmds_list, entry);
  bdrv_unref(bmds-bs, true);
 -drive_put_ref(drive_get_by_blockdev(bmds-bs));
  g_free(bmds-aio_bitmap);
  g_free(bmds);
  }
 -- 
 1.8.3.2

The key information here is that block-migration.c does not actually use
DriveInfo anywhere.  Hence it's safe to drop this code since we really
only cared about referencing BDS.

I suggest including an explanation like this in the commit description.
I had to audit the code to check whether the DriveInfo was used anywhere
else.

Re: [Qemu-devel] [PATCH v2 06/11] xen_disk: simplify blk_disconnect with refcnt

2013-07-23 Thread Stefan Hajnoczi

On Wed, Jul 17, 2013 at 05:42:11PM +0800, Fam Zheng wrote:
 We call bdrv_attach_dev when initializing whether or not bs is created
 locally, so call bdrv_detach_dev and let the refcnt handle the
 lifecycle.
 
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  hw/block/xen_disk.c | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)
 
 diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
 index 247f32f..ae17acc 100644
 --- a/hw/block/xen_disk.c
 +++ b/hw/block/xen_disk.c
 @@ -910,12 +910,7 @@ static void blk_disconnect(struct XenDevice *xendev)
  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
 xendev);
  
  if (blkdev-bs) {
 -if (!blkdev-dinfo) {
 -/* close/delete only if we created it ourself */
 -bdrv_close(blkdev-bs);
 -bdrv_detach_dev(blkdev-bs, blkdev);
 -bdrv_delete(blkdev-bs);
 -}
 +bdrv_detach_dev(blkdev-bs, blkdev);
  blkdev-bs = NULL;
  }
  xen_be_unbind_evtchn(blkdev-xendev);

This reminds me that bdrv_detach_dev() needs a comment documenting that
it decrements the refcount; bs may be deleted when the function returns
and must not be accessed anymore.

Re: [Qemu-devel] [PATCH v2 07/11] block: hold hard reference for backup/mirror target

2013-07-23 Thread Stefan Hajnoczi

On Wed, Jul 17, 2013 at 05:42:12PM +0800, Fam Zheng wrote:
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  block/backup.c | 3 ++-
  block/mirror.c | 4 ++--
  2 files changed, 4 insertions(+), 3 deletions(-)

Should we update the blockjob.c in_use code instead of adding
refcounting to specific block jobs?  This ought to be handled
generically for all block jobs.

Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types

2013-07-23 Thread Michael S. Tsirkin

On Tue, Jul 23, 2013 at 11:10:45AM +0200, Andreas Färber wrote:
 Am 23.07.2013 09:07, schrieb Michael S. Tsirkin:
  On Mon, Jul 22, 2013 at 11:04:49PM +0200, Andreas Färber wrote:
  For VMState I believe the real follow-up fix would be mst defining a
  central macro VMSTATE_PCI_DEVICE_AER_LOG() operating on PCIDevice.
  Why is that separate from VMSTATE_PCI_DEVICE() or VMSTATE_PCIE_DEVICE()
  in the first place?
  
  The real fix is savevm/loadvm taking into account
  the class hierarchy.
 
 That's not helping, unless you write a patch to show what you mean and

I merely mean that if I inherit a class I should
inherit it's vmstate.
So explicitly adding VMSTATE_PCI_DEVICE should not be
necessary.

 how that is going to be migration-compatible.

Most devices put VMSTATE_PCI_DEVICE at the beninning,
so just calling that before vmstate for the device
should be compatible.

 Does your not answering the question mean you don't know?
 
 Andreas

I think the answer is that most pcie devices
don't implement AER. AFAIK PCI devices can't
support AER at all.

 -- 
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 0/3] dataplane: virtio-blk live migration with x-data-plane=on

2013-07-23 Thread Andreas Färber

Am 23.07.2013 11:43, schrieb yinyin:
 在 2013-7-23，下午5:30，Andreas Färber afaer...@suse.de 写道：
 Am 23.07.2013 11:19, schrieb yinyin:
 during the migration, the source, not the destination, will start 
 dataplane again
 I think the process of migration with dataplane as follows:
 1. migration begin to start
 2. the migration source stop the dataplane
 3. do migration ...
 4. migration completed, the destination start the dataplane.

 I can't speak for the dataplane, but in general the source guest is
 expected to continue working during live migration (that's the live
 part) until it has been fully transferred to the destination.
 
 when dataplane stop, the source guest can continue work, but not use 
 dataplane thread.

So how would it do I/O then?

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH v2 2/7] Add the additional parent parameter to memory region init calls

2013-07-23 Thread Peter Maydell

On 23 July 2013 10:43, Andreas Färber afaer...@suse.de wrote:
 Am 23.07.2013 11:33, schrieb Mian M. Hamayun:
 From: Mian M. Hamayun m.hama...@virtualopensystems.com

 The memory region init calls require an additional parent parameter, so
 introduce a null parent parameter to make it happy.

 Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com

 This is not OK for something labelled PATCH. Patch series need to be
 bisectable, not fixing up earlier patch series that have not been
 applied yet.

I have a rebased version of John's mach-virt patch which includes
these fixes; I haven't sent it out yet because I've still been
pondering whether the create device tree nodes for everything
code can be made less ugly...

-- PMM

Re: [Qemu-devel] [PATCH v2 2/7] Add the additional parent parameter to memory region init calls

2013-07-23 Thread Andreas Färber

Am 23.07.2013 12:00, schrieb Peter Maydell:
 On 23 July 2013 10:43, Andreas Färber afaer...@suse.de wrote:
 Am 23.07.2013 11:33, schrieb Mian M. Hamayun:
 From: Mian M. Hamayun m.hama...@virtualopensystems.com

 The memory region init calls require an additional parent parameter, so
 introduce a null parent parameter to make it happy.

 Signed-off-by: Mian M. Hamayun m.hama...@virtualopensystems.com

 This is not OK for something labelled PATCH. Patch series need to be
 bisectable, not fixing up earlier patch series that have not been
 applied yet.
 
 I have a rebased version of John's mach-virt patch which includes
 these fixes; I haven't sent it out yet because I've still been
 pondering whether the create device tree nodes for everything
 code can be made less ugly...

I'd also appreciate if you would update cpu/a57core.c wrt the container
MemoryRegion and QOM realize (still a SysBus initfn here) - that was the
intent of my a15mpcore patches I cc'ed all aarch64 people on.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH v2 11/11] qmp: add command 'blockdev-backup'

2013-07-23 Thread Stefan Hajnoczi

On Fri, Jul 19, 2013 at 06:16:55PM +0800, Wenchao Xia wrote:
 于 2013-7-18 12:41, Fam Zheng 写道:
 On Wed, 07/17 06:44, Eric Blake wrote:
 On 07/17/2013 03:42 AM, Fam Zheng wrote:
 Similar to drive-backup, but this command uses a device id as target
 instead of creating/opening an image file.
 
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
   blockdev.c   | 71 
  
   qapi-schema.json | 49 ++
   qmp-commands.hx  | 22 ++
   3 files changed, 142 insertions(+)
 
 
 +++ b/qapi-schema.json
 @@ -1665,6 +1665,40 @@
   '*on-target-error': 'BlockdevOnError' } }
 
   ##
 +# @BlockdevBackup
 +#
 
 +{ 'type': 'BlockdevBackup',
 +  'data': { 'device': 'str', 'target': 'str',
 +'sync': 'MirrorSyncMode',
 +'*speed': 'int',
 +'*on-source-error': 'BlockdevOnError',
 +'*on-target-error': 'BlockdevOnError' } }
 
 Seems okay.  But what is missing is the addition of this type into the
 union used for 'transaction' - shouldn't it be possible to mix this with
 other transaction capabilities?
 
 
 Should be possible, as users may want point-in-time snapshot of multiple
 disks. If this API looks OK, I will include it into transaction in the
 next version.
 
   Instead of add this new API, how about extend Driver-backup API? This
 patch is basically doing similar things with driver-backup, extension
 of old API will save trouble to do same things driver-backup already
 does, such as support qmp_transaction.

The rationale was that drive-* commands should be high-level and can
create image files.  blockdev-* commands should be low-level and work on
an existing -drive.

The reason for keeping two separate commands is to avoid adding in
parameters that work at different levels (filename vs drive name).

In terms of API design I think drive- + blockdev- really is cleaner.
Kevin can explain in more detail if I got something wrong.

Stefan

Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types

2013-07-23 Thread Andreas Färber

Am 23.07.2013 11:59, schrieb Michael S. Tsirkin:
 On Tue, Jul 23, 2013 at 11:10:45AM +0200, Andreas Färber wrote:
 Am 23.07.2013 09:07, schrieb Michael S. Tsirkin:
 On Mon, Jul 22, 2013 at 11:04:49PM +0200, Andreas Färber wrote:
 For VMState I believe the real follow-up fix would be mst defining a
 central macro VMSTATE_PCI_DEVICE_AER_LOG() operating on PCIDevice.
 Why is that separate from VMSTATE_PCI_DEVICE() or VMSTATE_PCIE_DEVICE()
 in the first place?
 
 I think the answer is that most pcie devices
 don't implement AER. AFAIK PCI devices can't
 support AER at all.

Okay, so if it's just PCIe, then XHCI is the oddball preventing moving
it into VMSTATE_PCIE_DEVICE(). XHCI has VMSTATE_MSIX() in its place,
also operating on PCIDevice.

Is there a way to detect use of AER or MSIX to place those into
subsections of VMSTATE_PCIE_DEVICE()?

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff

2013-07-23 Thread Paolo Bonzini

Il 23/07/2013 04:53, liu ping fan ha scritto:
 The scenior I can figure out is if adopting timeout of poll, then when
 changing the deadline, we need to invoke poll, and set the new
 timeout, right?

Yes, you need to call aio_notify so that poll is reinvoked.

Paolo

Re: [Qemu-devel] [PATCH qom-cpu v3 00/41] QOM CPUState, part 11: GDB stub

2013-07-23 Thread Max Filippov

On Wed, Jul 10, 2013 at 2:23 AM, Andreas Färber afaer...@suse.de wrote:
 Hello,

 This series cleans up gdbstub by changing all its internal CPU state to 
 CPUState
 and by moving most target-specific code into the target directories.

 Support for m68k, moxie and unicore32 to set the PC via gdbstub is added.

 Lightweight subclasses for XtensaCPU are introduced, keeping the XtensaConfig
 mechanisms, to stop xtensa from deviating at gdbstub level wrt register count.

 I still wonder whether there would be interest in adding a program-counter
 dynamic property to the CPU, given that a setter has been factored out here?

 v3 avoids find_cpu() related breakages by deferring GDBState::c_cpu conversion
 until GDBState::g_cpu and find_cpu() can easily be converted, too.

 Available for testing at:
 git://github.com/afaerber/qemu-cpu.git qom-cpu-11.v3
 https://github.com/afaerber/qemu-cpu/commits/qom-cpu-11.v3

xtensa parts: Acked-by: Max Filippov jcmvb...@gmail.com

-- 
Thanks.
-- Max

[Qemu-devel] [PULL 0/3] OpenRISC patch queue

2013-07-23 Thread Jia Liu

Hi Anthony,

This is my OpenRISC patch queue, and myfirst time to send a pull requests,
please pull.

It fix OpenRISC CPU and sim broad:
* Free typename in openrisc_cpu_class_by_name
* Use stderr output in openrisc_sim.c
* fixed a indent typo.


The following changes since commit 3464700f6aecb3e2aa9098839d90672d6b3fa974:

  tests: Add test-bitops.c with some sextract tests (2013-07-22 15:41:49 -0500)

are available in the git repository at:

  git://github.com/J-Liu/qemu.git or32

for you to fetch changes up to 9b146e9a28bbd9567f5ac6a8e2bcb543aa3b9392:

  target-openrisc: Free typename in openrisc_cpu_class_by_name (2013-07-23 
18:32:30 +0800)


Jia Liu (3):
  hw/openrisc: Indent typo
  hw/openrisc: Use stderr output instead of qemu_log
  target-openrisc: Free typename in openrisc_cpu_class_by_name

 hw/openrisc/openrisc_sim.c | 6 +++---
 target-openrisc/cpu.c  | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

[Qemu-devel] [PULL 1/3] hw/openrisc: Indent typo

2013-07-23 Thread Jia Liu

Indent typo.

Signed-off-by: Jia Liu pro...@gmail.com
Reviewed-by: Peter Maydell peter.mayd...@linaro.org
Reviewed-by: Andreas Färber afaer...@suse.de
---
 hw/openrisc/openrisc_sim.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c
index 924438b..250f5b5 100644
--- a/hw/openrisc/openrisc_sim.c
+++ b/hw/openrisc/openrisc_sim.c
@@ -96,7 +96,7 @@ static void openrisc_sim_init(QEMUMachineInitArgs *args)
 ram_addr_t ram_size = args-ram_size;
 const char *cpu_model = args-cpu_model;
 const char *kernel_filename = args-kernel_filename;
-   OpenRISCCPU *cpu = NULL;
+OpenRISCCPU *cpu = NULL;
 MemoryRegion *ram;
 int n;
 
-- 
1.7.12.4 (Apple Git-37)

Re: [Qemu-devel] [PATCH v2 01/11] block: replace in_use with refcnt_soft and refcnt_hard

2013-07-23 Thread Fam Zheng

On Tue, 07/23 11:36, Stefan Hajnoczi wrote:
 On Wed, Jul 17, 2013 at 05:42:06PM +0800, Fam Zheng wrote:
  Introduce refcnt_soft (soft reference) and refcnt_hard (hard reference)
  to BlockDriverState, since in_use mechanism cannot provide proper
  management of lifecycle when a BDS is referenced in multiple places
  (e.g. pointed to by another bs's backing_hd while also used as a block
  job device, in the use case of image fleecing).
  
  The original in_use case is considered a hard reference in this patch,
  where the bs is busy and should not be used in other tasks that require
  a hard reference. (However the interface doesn't force this, caller
  still need to call bdrv_in_use() to check by itself.).
  
  A soft reference is implemented but not used yet. It will be used in
  following patches to manage the lifecycle together with hard reference.
  
  If bdrv_ref() is called on a BDS, it must be released by exactly the
  same numbers of bdrv_unref() with the same soft/hard type, and never
  call bdrv_delete() directly. If the BDS is only used locally (unnamed),
  bdrv_ref/bdrv_unref can be skipped and just use bdrv_delete().
 
 It is risky to keep bdrv_delete() public.  I suggest replacing
 bdrv_delete() callers with bdrv_unref() and then making bdrv_delete()
 static in block.c.
 
 This way it is impossible to make the mistake of calling bdrv_delete()
 on a BDS which has refcnt  1.
 
 I don't really understand this patch.  There are now two separate
 refcounts.  They must both reach 0 for deletion to occur.  I think
 you plan to treat the hard refcount like the in_use counter (there
 should only be 0 or 1 refs) but you don't enforce it.  It seems cleaner
 to keep in_use separate: let in_use callers take a refcount and also set
 in_use.

OK, I like your ideas, make bdrv_delete private is much cleaner. Will
fix in next revision.

I plan to make it like this:

/* soft ref */
void bdrv_{ref,unref}(bs)

/* hard ref */
bool bdrv_hard_{ref,unref}(bs)

usage:
bs = bdrv_new()
implicit bdrv_ref(bs) called
...
bdrv_unref(bs)
automatically deleted here

or with hard ref:
bs = bdrv_new()
implicit bdrv_ref() called

bdrv_hard_ref(bs)
...
bdrv_hard_unref(bs)

bdrv_unref(bs)
automatically deleted here

The second bdrv_hard_ref call to a bs returns false, caller check the
return value.

-- 
Fam

[Qemu-devel] [PULL 3/3] target-openrisc: Free typename in openrisc_cpu_class_by_name

2013-07-23 Thread Jia Liu

We should free typename here.

Signed-off-by: Jia Liu pro...@gmail.com
Reviewed-by: Andreas Färber afaer...@suse.de
---
 target-openrisc/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target-openrisc/cpu.c b/target-openrisc/cpu.c
index 6d40f1b..e348df0 100644
--- a/target-openrisc/cpu.c
+++ b/target-openrisc/cpu.c
@@ -99,6 +99,7 @@ static ObjectClass *openrisc_cpu_class_by_name(const char 
*cpu_model)
 
 typename = g_strdup_printf(%s- TYPE_OPENRISC_CPU, cpu_model);
 oc = object_class_by_name(typename);
+g_free(typename);
 if (oc != NULL  (!object_class_dynamic_cast(oc, TYPE_OPENRISC_CPU) ||
object_class_is_abstract(oc))) {
 return NULL;
-- 
1.7.12.4 (Apple Git-37)

[Qemu-devel] [PULL 2/3] hw/openrisc: Use stderr output instead of qemu_log

2013-07-23 Thread Jia Liu

We should use stderr output instead of qemu_log in order to output ErrMsg
onto the screen.

Signed-off-by: Jia Liu pro...@gmail.com
Reviewed-by: Peter Maydell peter.mayd...@linaro.org
Reviewed-by: Andreas Färber afaer...@suse.de
---
 hw/openrisc/openrisc_sim.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c
index 250f5b5..a08f27c 100644
--- a/hw/openrisc/openrisc_sim.c
+++ b/hw/openrisc/openrisc_sim.c
@@ -82,7 +82,7 @@ static void cpu_openrisc_load_kernel(ram_addr_t ram_size,
 }
 
 if (kernel_size  0) {
-qemu_log(QEMU: couldn't load the kernel '%s'\n,
+fprintf(stderr, QEMU: couldn't load the kernel '%s'\n,
 kernel_filename);
 exit(1);
 }
@@ -107,7 +107,7 @@ static void openrisc_sim_init(QEMUMachineInitArgs *args)
 for (n = 0; n  smp_cpus; n++) {
 cpu = cpu_openrisc_init(cpu_model);
 if (cpu == NULL) {
-qemu_log(Unable to find CPU definition!\n);
+fprintf(stderr, Unable to find CPU definition!\n);
 exit(1);
 }
 qemu_register_reset(main_cpu_reset, cpu);
-- 
1.7.12.4 (Apple Git-37)

Re: [Qemu-devel] [PATCH 1/2] block: allow live commit of active image

2013-07-23 Thread Paolo Bonzini

Il 23/07/2013 04:03, Fam Zheng ha scritto:
  stop
  block-job-complete ide0-hd0
  cont
  
 I see, this way the job needs to stop vm in the point of all copying
 drained, then report 'ready' and wait for manual complete before
 swapping active, sounds not so good. Ideally we should mirror writes
 _synchronously_ to 'top' and 'base' after 'ready' state and wait for
 manual complete to switch image. Do you think this is easy to do?

Synchronous mirroring makes it harder to handle errors in writing the
target, especially if you're not running with rerror=stop/werror=stop
(they would be reported to the guest).

Paolo

Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types

2013-07-23 Thread Gerd Hoffmann

  Hi,

 Okay, so if it's just PCIe, then XHCI is the oddball preventing moving
 it into VMSTATE_PCIE_DEVICE(). XHCI has VMSTATE_MSIX() in its place,
 also operating on PCIDevice.

Given that live migration support for xhci was added post-1.5 so we
don't have a release with it yet this shouldn't be a blocker in case we
get this sorted in time for 1.6.

 Is there a way to detect use of AER or MSIX to place those into
 subsections of VMSTATE_PCIE_DEVICE()?

There is msix_enabled() ...

Dunno about AER.

cheers,
  Gerd

Re: [Qemu-devel] [RFC PATCH] vga: Start supporting resolution not multiple of 16 correctly.

2013-07-23 Thread Gerd Hoffmann

  Hi,

 Tested-by: Fabio Fantoni fabio.fant...@m2r.biz
 
 I tested it for a long time with spice on xen (because qxl will be fully
 working only after adding SSE support on hvm domUs). It works, I think
 it is good to add this and the respective vgabios patch on upstream.

  case VBE_DISPI_INDEX_XRES:
 -if ((val = VBE_DISPI_MAX_XRES)  ((val  7) == 0)) {
 +if ((val = VBE_DISPI_MAX_XRES)  ((val  1) == 0)) {
  s-vbe_regs[s-vbe_index] = val;
  }
  break;

It's not that simple.  With 32bit depths common today it will work fine,
but for lower depths (especially those lower than 8bit) this will give
you broken scanline alignment.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 02/11] iov: handle EOF in iov_send_recv

2013-07-23 Thread Paolo Bonzini

Il 23/07/2013 10:30, MORITA Kazutaka ha scritto:
 Without this patch, iov_send_recv() never returns when do_send_recv()
 returns zero.
 
 Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 ---
  util/iov.c | 6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/util/iov.c b/util/iov.c
 index cc6e837..f705586 100644
 --- a/util/iov.c
 +++ b/util/iov.c
 @@ -202,6 +202,12 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, 
 unsigned iov_cnt,
  return -1;
  }
  
 +if (ret == 0  !do_send) {
 +/* recv returns 0 when the peer has performed an orderly
 + * shutdown. */
 +break;
 +}
 +
  /* Prepare for the next iteration */
  offset += ret;
  total += ret;
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

... and should also be in 1.5.2.

Paolo

Re: [Qemu-devel] [PATCH 03/11] qemu-sockets: make wait_for_connect be invoked in qemu_aio_wait

2013-07-23 Thread Paolo Bonzini

Il 23/07/2013 10:30, MORITA Kazutaka ha scritto:
 This allows us to use inet_nonblocking_connect() and
 unix_nonblocking_connect() in block drivers.
 
 qemu-ga needs to link block-obj to resolve dependencies of
 qemu_aio_set_fd_handler().
 
 Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp

I'm not sure this is safe.  You could have e.g. migration start during
qemu_aio_wait().

Paolo

 ---
  Makefile|  4 ++--
  util/qemu-sockets.c | 15 ++-
  2 files changed, 12 insertions(+), 7 deletions(-)
 
 diff --git a/Makefile b/Makefile
 index c06bfab..5fe2e0f 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -197,7 +197,7 @@ fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap
  qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
   $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h  $  $@,  GEN  
  $@)
  
 -qemu-ga$(EXESUF): LIBS = $(LIBS_QGA)
 +qemu-ga$(EXESUF): LIBS = $(LIBS_QGA) $(LIBS_TOOLS)
  qemu-ga$(EXESUF): QEMU_CFLAGS += -I qga/qapi-generated
  
  gen-out-type = $(subst .,-,$(suffix $@))
 @@ -227,7 +227,7 @@ $(SRC_PATH)/qapi-schema.json 
 $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py)
  QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h 
 qga-qapi-visit.h qga-qmp-commands.h)
  $(qga-obj-y) qemu-ga.o: $(QGALIB_GEN)
  
 -qemu-ga$(EXESUF): $(qga-obj-y) libqemuutil.a libqemustub.a
 +qemu-ga$(EXESUF): $(qga-obj-y) $(block-obj-y) libqemuutil.a libqemustub.a
   $(call LINK, $^)
  
  clean:
 diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
 index 095716e..8b21fd1 100644
 --- a/util/qemu-sockets.c
 +++ b/util/qemu-sockets.c
 @@ -218,6 +218,11 @@ typedef struct ConnectState {
  static int inet_connect_addr(struct addrinfo *addr, bool *in_progress,
   ConnectState *connect_state, Error **errp);
  
 +static int return_true(void *opaque)
 +{
 +return 1;
 +}
 +
  static void wait_for_connect(void *opaque)
  {
  ConnectState *s = opaque;
 @@ -225,7 +230,7 @@ static void wait_for_connect(void *opaque)
  socklen_t valsize = sizeof(val);
  bool in_progress;
  
 -qemu_set_fd_handler2(s-fd, NULL, NULL, NULL, NULL);
 +qemu_aio_set_fd_handler(s-fd, NULL, NULL, NULL, NULL);
  
  do {
  rc = qemu_getsockopt(s-fd, SOL_SOCKET, SO_ERROR, val, valsize);
 @@ -288,8 +293,8 @@ static int inet_connect_addr(struct addrinfo *addr, bool 
 *in_progress,
  
  if (connect_state != NULL  QEMU_SOCKET_RC_INPROGRESS(rc)) {
  connect_state-fd = sock;
 -qemu_set_fd_handler2(sock, NULL, NULL, wait_for_connect,
 - connect_state);
 +qemu_aio_set_fd_handler(sock, NULL, wait_for_connect, return_true,
 +connect_state);
  *in_progress = true;
  } else if (rc  0) {
  error_set_errno(errp, errno, QERR_SOCKET_CONNECT_FAILED);
 @@ -749,8 +754,8 @@ int unix_connect_opts(QemuOpts *opts, Error **errp,
  
  if (connect_state != NULL  QEMU_SOCKET_RC_INPROGRESS(rc)) {
  connect_state-fd = sock;
 -qemu_set_fd_handler2(sock, NULL, NULL, wait_for_connect,
 - connect_state);
 +qemu_aio_set_fd_handler(sock, NULL, wait_for_connect, return_true,
 +connect_state);
  return sock;
  } else if (rc = 0) {
  /* non blocking socket immediate success, call callback */

Re: [Qemu-devel] [Question] why x2apic's set by default without host support(on Nehalem CPU).

2013-07-23 Thread Jiri Denemark

On Tue, Jul 23, 2013 at 10:44:48 +0800, Peter Huang(Peng) wrote:
  libvirt's host-passthrough uses -cpu host', and it -cpu host
  enables every feature that can be enabled on the host.
 From my test results, I found that even when use host-passthrough mode, VM's
 cpu features are very different from host, this doesn't match what 
 host-passthrough
 mode's explanation.
 
 libvirt's option exlanation:
 With this mode, the CPU visible to the guest should be exactly  the  same as  
 the host 
 CPU even in  the aspects  that libvirt  does not understand.

The libvirt documentation is what needs to be updated. While
host-passthrough is asking for a CPU which is as close as possible to
the real host CPU, there are features that need special handling before
they can be provided to a guest. And if the hypervisor does not provide
that handling, it may just filter such feature out. Also if some
features can be efficiently provided to a guest even though the host CPU
does not provide them (x2apic is an example of such feature), they may
be provided to a guest.

Jirka

Re: [Qemu-devel] [PATCH 0/7] s390 fixes

2013-07-23 Thread Christian Borntraeger

On 15/07/13 21:57, Christian Borntraeger wrote:
 Alex,
 
 here is a bunch of fixes/cleanups for s390.
 
 Heinz Graalfs (1):
   s390/sclpconsole: handle char layer busy conditions
 
 Thomas Huth (6):
   s390x/ioinst: Add missing alignment checks for IO instructions
   s390x/ioinst: Throw addressing exception when memory_map failed
   s390x/ioinst: Fixed alignment check in SCHM instruction
   s390x/ioinst: Fixed priority of operand exceptions
   s390x/kvm: Reworked/fixed handling of cc3 in kvm_handle_css_inst()
   s390x/kvm: Remove redundant return code
 
  hw/char/sclpconsole.c | 18 +-
  target-s390x/ioinst.c | 65 
 ++-
  target-s390x/kvm.c| 64 +++---
  3 files changed, 57 insertions(+), 90 deletions(-)
 

Ping?

Re: [Qemu-devel] [PATCH V6 0/3] Implement sync modes for drive-backup.

2013-07-23 Thread Stefan Hajnoczi

On Mon, Jul 22, 2013 at 03:09:17PM -0700, Ian Main wrote:
 This patch adds sync modes on top of the work that Stefan Hajnoczi has done.
 
 These patches apply on kevin/block.
 
 Hopefully all is in order as this is my first QEMU patch.  Many thanks to
 Stephan and Fam Zheng for their help.
 
 V2:
 
 - No longer poll, instead use qemu_coroutine_yield().
 - Use bdrv_co_is_allocated().
 - Much better SYNC_MODE_NONE test.
 
 V3:
 
 - A few style fixes.
 - Better commit message explaining how TOP and NONE operate.
 - Verified using checkpatch.pl.
 
 V4:
 
 - Add patch to use the source as a backing hd during backup.
 - Add patch to default sync mode none to qcow2.
 
 V5:
 
 - Fix qcow2 patch.  Forgot to git add final version.
 
 V6:
 
 - Default to requiring 'format' when mode is absolute-paths.
 - Removed one bad hunk that was misapplying.
 - Fixed docs, examples and tests to match changes.
 - Added tests for format bad/missing.
 - Added bdrv_set_in_use() to target.
 - Default to qcow2 patch not required.
 
 Ian Main (3):
   Implement sync modes for drive-backup.
   Add tests for sync modes 'TOP' and 'NONE'
   Add backing drive while performing backup.
 
  block/backup.c| 107 +
  blockdev.c|  36 +-
  include/block/block_int.h |   4 +-
  qapi-schema.json  |   4 +-
  qmp-commands.hx   |   2 +
  tests/qemu-iotests/055| 108 
 +-
  tests/qemu-iotests/055.out|   4 +-
  tests/qemu-iotests/group  |   2 +-
  tests/qemu-iotests/iotests.py |   5 ++
  9 files changed, 211 insertions(+), 61 deletions(-)

This patch mostly takes care of image fleecing except it does not give
the target a device name which can be used by nbd-server-add.

Fam's series tackles the target device name and some of the overlapping
problems with your series.

The core feature in your series is sync=top|none and that needs to be
merged.

Now we need to figure out which patches to take and what must be
changed.  Please see the sub-threads on Fam's series.  Perhaps we can
reach a consensus there.

Stefan

[Qemu-devel] [PATCH v2] spice: fix display initialization

2013-07-23 Thread Gerd Hoffmann

Spice has two display interface implementations:  One integrated into
the qxl graphics card, and one generic which can operate with every
qemu-emulated graphics card.

The generic one is activated in case spice is used without qxl.  The
logic for that only caught the -vga qxl case, -device qxl-vga goes
unnoticed.  Fix that by adding a check in the spice interface
registration so we'll notice the qxl card no matter how it is created.

https://bugzilla.redhat.com/show_bug.cgi?id=981094

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 include/sysemu/sysemu.h |1 -
 include/ui/qemu-spice.h |2 ++
 ui/spice-core.c |5 +
 vl.c|2 +-
 4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 3caeb66..d7a77b6 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -103,7 +103,6 @@ typedef enum {
 
 extern int vga_interface_type;
 #define xenfb_enabled (vga_interface_type == VGA_XENFB)
-#define qxl_enabled (vga_interface_type == VGA_QXL)
 
 extern int graphic_width;
 extern int graphic_height;
diff --git a/include/ui/qemu-spice.h b/include/ui/qemu-spice.h
index eba6d77..c6c756b 100644
--- a/include/ui/qemu-spice.h
+++ b/include/ui/qemu-spice.h
@@ -27,6 +27,7 @@
 #include monitor/monitor.h
 
 extern int using_spice;
+extern int spice_displays;
 
 void qemu_spice_init(void);
 void qemu_spice_input_init(void);
@@ -57,6 +58,7 @@ static inline CharDriverState *qemu_chr_open_spice_port(const 
char *name)
 #include monitor/monitor.h
 
 #define using_spice 0
+#define spice_displays 0
 static inline int qemu_spice_set_passwd(const char *passwd,
 bool fail_if_connected,
 bool disconnect_if_connected)
diff --git a/ui/spice-core.c b/ui/spice-core.c
index f308fd9..f0da673 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -48,6 +48,7 @@ static char *auth_passwd;
 static time_t auth_expires = TIME_MAX;
 static int spice_migration_completed;
 int using_spice = 0;
+int spice_displays;
 
 static QemuThread me;
 
@@ -836,6 +837,10 @@ int qemu_spice_add_interface(SpiceBaseInstance *sin)
 qemu_add_vm_change_state_handler(vm_change_state_handler, NULL);
 }
 
+if (strcmp(sin-sif-type, SPICE_INTERFACE_QXL) == 0) {
+spice_displays++;
+}
+
 return spice_server_add_interface(spice_server, sin);
 }
 
diff --git a/vl.c b/vl.c
index 25b8f2f..f422a1c 100644
--- a/vl.c
+++ b/vl.c
@@ -4387,7 +4387,7 @@ int main(int argc, char **argv, char **envp)
 }
 #endif
 #ifdef CONFIG_SPICE
-if (using_spice  !qxl_enabled) {
+if (using_spice  !spice_displays) {
 qemu_spice_display_init(ds);
 }
 #endif
-- 
1.7.9.7

Re: [Qemu-devel] [RFC PATCH v1 1/2] gluster: Use pkg-config to configure GlusterFS block driver

2013-07-23 Thread Daniel P. Berrange

On Fri, Jul 12, 2013 at 12:28:54PM +0530, Bharata B Rao wrote:
 gluster: Use pkg-config to configure GlusterFS block driver
 
 Use pkg-config to determine the version and library dependency
 for GlusterFS block driver.
 
 Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
 ---
  configure |   20 +++-
  1 file changed, 7 insertions(+), 13 deletions(-)
 
 diff --git a/configure b/configure
 index cb0f870..76adcb1 100755
 --- a/configure
 +++ b/configure
 @@ -2566,23 +2566,17 @@ fi
  ##
  # glusterfs probe
  if test $glusterfs != no ; then
 -  cat  $TMPC EOF
 -#include glusterfs/api/glfs.h
 -int main(void) {
 -(void) glfs_new(volume);
 -return 0;
 -}
 -EOF
 -  glusterfs_libs=-lgfapi -lgfrpc -lgfxdr
 -  if compile_prog  $glusterfs_libs ; then
 -glusterfs=yes
 -libs_tools=$glusterfs_libs $libs_tools
 -libs_softmmu=$glusterfs_libs $libs_softmmu
 +  if $pkg_config --atleast-version=3 glusterfs-api /dev/null 21; then
 +glusterfs=yes
 +glusterfs_cflags=`$pkg_config --cflags glusterfs-api 2/dev/null`
 +glusterfs_libs=`$pkg_config --libs glusterfs-api 2/dev/null`
 +CFLAGS=$CFLAGS $glusterfs_cflags
 +LIBS=$LIBS $glusterfs_libs

The glusterfs v 3.4 RPMs in Fedora do not include any pkg-config files.
So with this change now in GIT, QEMU no longer detects support for
glusterfs even though it is present.

Has the min required glusterfs been increased to a new 3.5 version
which does include pkg-config support ?  If not, then I think this
patch needs to be reverted, so that it does a non-pkg-config based
check for glusterfs.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [RFC PATCH v1 1/2] gluster: Use pkg-config to configure GlusterFS block driver

2013-07-23 Thread Anand Avati


On 7/23/13 4:57 AM, Daniel P. Berrange wrote:

On Fri, Jul 12, 2013 at 12:28:54PM +0530, Bharata B Rao wrote:

gluster: Use pkg-config to configure GlusterFS block driver

Use pkg-config to determine the version and library dependency
for GlusterFS block driver.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---
  configure |   20 +++-
  1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/configure b/configure
index cb0f870..76adcb1 100755
--- a/configure
+++ b/configure
@@ -2566,23 +2566,17 @@ fi
  ##
  # glusterfs probe
  if test $glusterfs != no ; then
-  cat  $TMPC EOF
-#include glusterfs/api/glfs.h
-int main(void) {
-(void) glfs_new(volume);
-return 0;
-}
-EOF
-  glusterfs_libs=-lgfapi -lgfrpc -lgfxdr
-  if compile_prog  $glusterfs_libs ; then
-glusterfs=yes
-libs_tools=$glusterfs_libs $libs_tools
-libs_softmmu=$glusterfs_libs $libs_softmmu
+  if $pkg_config --atleast-version=3 glusterfs-api /dev/null 21; then
+glusterfs=yes
+glusterfs_cflags=`$pkg_config --cflags glusterfs-api 2/dev/null`
+glusterfs_libs=`$pkg_config --libs glusterfs-api 2/dev/null`
+CFLAGS=$CFLAGS $glusterfs_cflags
+LIBS=$LIBS $glusterfs_libs


The glusterfs v 3.4 RPMs in Fedora do not include any pkg-config files.
So with this change now in GIT, QEMU no longer detects support for
glusterfs even though it is present.

Has the min required glusterfs been increased to a new 3.5 version
which does include pkg-config support ?  If not, then I think this
patch needs to be reverted, so that it does a non-pkg-config based
check for glusterfs.

Regards,
Daniel



Copying Kaleb.

We should just include the pkg-config file in the Fedora RPM for 
glusterfs if it already isn't.


Avati

Re: [Qemu-devel] [RFC PATCH v1 1/2] gluster: Use pkg-config to configure GlusterFS block driver

2013-07-23 Thread Daniel P. Berrange

On Tue, Jul 23, 2013 at 05:02:20AM -0700, Anand Avati wrote:
 On 7/23/13 4:57 AM, Daniel P. Berrange wrote:
 On Fri, Jul 12, 2013 at 12:28:54PM +0530, Bharata B Rao wrote:
 gluster: Use pkg-config to configure GlusterFS block driver
 
 Use pkg-config to determine the version and library dependency
 for GlusterFS block driver.
 
 Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
 ---
   configure |   20 +++-
   1 file changed, 7 insertions(+), 13 deletions(-)
 
 diff --git a/configure b/configure
 index cb0f870..76adcb1 100755
 --- a/configure
 +++ b/configure
 @@ -2566,23 +2566,17 @@ fi
   ##
   # glusterfs probe
   if test $glusterfs != no ; then
 -  cat  $TMPC EOF
 -#include glusterfs/api/glfs.h
 -int main(void) {
 -(void) glfs_new(volume);
 -return 0;
 -}
 -EOF
 -  glusterfs_libs=-lgfapi -lgfrpc -lgfxdr
 -  if compile_prog  $glusterfs_libs ; then
 -glusterfs=yes
 -libs_tools=$glusterfs_libs $libs_tools
 -libs_softmmu=$glusterfs_libs $libs_softmmu
 +  if $pkg_config --atleast-version=3 glusterfs-api /dev/null 21; then
 +glusterfs=yes
 +glusterfs_cflags=`$pkg_config --cflags glusterfs-api 2/dev/null`
 +glusterfs_libs=`$pkg_config --libs glusterfs-api 2/dev/null`
 +CFLAGS=$CFLAGS $glusterfs_cflags
 +LIBS=$LIBS $glusterfs_libs
 
 The glusterfs v 3.4 RPMs in Fedora do not include any pkg-config files.
 So with this change now in GIT, QEMU no longer detects support for
 glusterfs even though it is present.
 
 Has the min required glusterfs been increased to a new 3.5 version
 which does include pkg-config support ?  If not, then I think this
 patch needs to be reverted, so that it does a non-pkg-config based
 check for glusterfs.
 
 Regards,
 Daniel
 
 
 Copying Kaleb.
 
 We should just include the pkg-config file in the Fedora RPM for
 glusterfs if it already isn't.

That doesn't help anyone trying to build QEMU with gluster support
on all the existing released distros which lack the pkg-config files.

If you really want a pkg-config file check for glusterfs in QEMU,
then it must at least fallback to probing the non-pkg-config way
to support existing deployed distros.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] RFC [PATCH] Make bdrv_flush synchronous only and update callers

2013-07-23 Thread Stefan Hajnoczi

On Fri, Jul 19, 2013 at 10:37:11AM +0200, Kevin Wolf wrote:
 Am 19.07.2013 um 07:27 hat Stefan Hajnoczi geschrieben:
  On Thu, Jul 18, 2013 at 11:21:42PM +0200, Charlie Shepherd wrote:
   This patch makes bdrv_flush a synchronous function and updates any 
   callers from
   a coroutine context to use bdrv_co_flush instead.
   
   The motivation for this patch comes from the GSoC Continuation-Passing C
   project. When coroutines were introduced, synchronous functions in the 
   block
   layer were converted to use asynchronous methods by dynamically detecting 
   if
   they were being run from a coroutine context by calling 
   qemu_in_coroutine(), and
   yielding if so. If they were not, they would spawn a new coroutine and 
   poll
   until the asynchronous counterpart finished.
   
   However this approach does not work with CPC as the CPC translator 
   converts all
   functions annotated coroutine_fn to a different (continuation based) 
   calling
   convention. This means that coroutine_fn annotated functions cannot be 
   called
   from a non-coroutine context.
   
   This patch is a Request For Comments on the approach of splitting these
   dynamic functions into synchronous and asynchronous versions. This is 
   easy for
   bdrv_flush as it already has an asynchronous counterpart - bdrv_co_flush. 
   The
   only caller of bdrv_flush from a coroutine context is mirror_drain in
   block/mirror.c - this should be annotated as a coroutine_fn as it calls
   qemu_coroutine_yield().
   
   If this approach meets with approval I will develop a patchset splitting 
   the
   other dynamic functions in the block layer. This will allow all 
   coroutine
   functions to have a coroutine_fn annotation that can be statically 
   checked (CPC
   can be used to verify annotations).
   
   I have audited the other callers of bdrv_flush, they are included below:
   
   block.c: bdrv_reopen_prepare, bdrv_close, bdrv_commit, bdrv_pwrite_sync
  
  bdrv_pwrite_sync() is a dynamic function.  If called from coroutine
  context it will yield (indirectly from bdrv_pwrite() or bdrv_flush()).
  
   block/qcow2-cache.c: qcow2_cache_entry_flush, qcow2_cache_flush
   block/qcow2-refcount.c: qcow2_update_snapshot_refcount
   block/qcow2-snapshot.c: qcow2_write_snapshots
   block/qcow2.c: qcow2_mark_dirty, qcow2_mark_clean
  
  qcow2 runs in coroutine context, the coroutine_fn annotations are just
  missing.  See block/qcow2.c:bdrv_qcow2 for the entry points like
  qcow2_co_readv.
 
 Yes, you can't rely on coroutine_fn, it's missing in many places where
 it should be there. But that was still the optimistic view.
 
 The truth is that the greatest part of the qcow2 functions can be called
 from eiher coroutine or non-coroutine context. You get coroutine context
 for read/write/discard/flush, but anything else like doing snapshots,
 resizing, preallocating the image, writing compressed data also accesses
 the same metadata management functions outside coroutines.
 
 It's only getting worse for function like bdrv_pwrite().

A built-time check for coroutine_fn would be valuable if we ever hope to
get disciplined with this annotation.

The check can detect when a coroutine_fn is invoked outside coroutine
context.  I wonder if Coccinelle can detect this, although I never
figured out how to use it as a grep-like tool instead of just a
patch-like tool.

Stefan

Re: [Qemu-devel] RFC [PATCH] Make bdrv_flush synchronous only and update callers

2013-07-23 Thread Gabriel Kerneis

On Tue, Jul 23, 2013 at 02:05:15PM +0200, Stefan Hajnoczi wrote:
 A built-time check for coroutine_fn would be valuable if we ever hope to
 get disciplined with this annotation.
 
 The check can detect when a coroutine_fn is invoked outside coroutine
 context.  I wonder if Coccinelle can detect this, although I never
 figured out how to use it as a grep-like tool instead of just a
 patch-like tool.

The recent cps-inference branch of CPC enables precisely that kind of check.
Charlie is using it to drive his modifications to QEMU and has already suggested
several improvements that I have implemented.  Hopefully we should reach
something fully covering QEMU by the end his GSoC.

If there is interest, I can post a script showing how to build it and use it to
check QEMU annotations (it does not require any modifications to QEMU, only a
couple of configure switches).

Best regards,
-- 
Gabriel

Re: [Qemu-devel] [RFC PATCH v1 1/2] gluster: Use pkg-config to configure GlusterFS block driver

2013-07-23 Thread Daniel P. Berrange

On Tue, Jul 23, 2013 at 05:37:54PM +0530, Kaleb KEITHLEY wrote:
 On 07/23/2013 05:32 PM, Anand Avati wrote:
 On 7/23/13 4:57 AM, Daniel P. Berrange wrote:
 On Fri, Jul 12, 2013 at 12:28:54PM +0530, Bharata B Rao wrote:
 gluster: Use pkg-config to configure GlusterFS block driver
 
 Use pkg-config to determine the version and library dependency
 for GlusterFS block driver.
 
 Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
 ---
   configure |   20 +++-
   1 file changed, 7 insertions(+), 13 deletions(-)
 
 diff --git a/configure b/configure
 index cb0f870..76adcb1 100755
 --- a/configure
 +++ b/configure
 @@ -2566,23 +2566,17 @@ fi
   ##
   # glusterfs probe
   if test $glusterfs != no ; then
 -  cat  $TMPC EOF
 -#include glusterfs/api/glfs.h
 -int main(void) {
 -(void) glfs_new(volume);
 -return 0;
 -}
 -EOF
 -  glusterfs_libs=-lgfapi -lgfrpc -lgfxdr
 -  if compile_prog  $glusterfs_libs ; then
 -glusterfs=yes
 -libs_tools=$glusterfs_libs $libs_tools
 -libs_softmmu=$glusterfs_libs $libs_softmmu
 +  if $pkg_config --atleast-version=3 glusterfs-api /dev/null 21;
 then
 +glusterfs=yes
 +glusterfs_cflags=`$pkg_config --cflags glusterfs-api 2/dev/null`
 +glusterfs_libs=`$pkg_config --libs glusterfs-api 2/dev/null`
 +CFLAGS=$CFLAGS $glusterfs_cflags
 +LIBS=$LIBS $glusterfs_libs
 
 The glusterfs v 3.4 RPMs in Fedora do not include any pkg-config files.
 So with this change now in GIT, QEMU no longer detects support for
 glusterfs even though it is present.
 
 Has the min required glusterfs been increased to a new 3.5 version
 which does include pkg-config support ?  If not, then I think this
 patch needs to be reverted, so that it does a non-pkg-config based
 check for glusterfs.
 
 Regards,
 Daniel
 
 
 Copying Kaleb.
 
 We should just include the pkg-config file in the Fedora RPM for
 glusterfs if it already isn't.
 
 It's in the glusterfs-api-devel rpm:
 
 % rpm -ql glusterfs-api-devel
 /usr/include/glusterfs/api/glfs.h
 /usr/lib64/libgfapi.so
 /usr/lib64/pkgconfig/glusterfs-api.pc

Oooh, not the main glusterfs-devel RPM. Ok, ignore my earlier message

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-devel] [PATCH V2 for-1.6 0/5] Leaky bucket throttling and features

2013-07-23 Thread Benoît Canet

The first patch fixes the throttling which was broken by a previous commit.

The next patch replace the existing throttling algorithm by the well described
leaky bucket algorithm.

Third patch implement bursting by adding *_threshold parameters to
qmp_block_set_io_throttle.

The last one allow to define the max size of an io when throttling by iops via
iops_sector_count to avoid vm users cheating on the iops limit.

The last patch adds a metric reflecting how much the I/O are throttled.

since v1:
Add throttling percentage metric [Benoît]

Benoît Canet (5):
  block: Repair the throttling code.
  block: Modify the throttling code to implement the leaky bucket
algorithm.
  block: Add support for throttling burst threshold in QMP and the
command line.
  block: Add iops_sector_count to do the iops accounting for a given io
size.
  block: Add throttling percentage metrics.

 block.c   |  439 ++---
 block/qapi.c  |   32 
 blockdev.c|  174 --
 hmp.c |   38 +++-
 include/block/block_int.h |   18 +-
 include/block/coroutine.h |5 +
 qapi-schema.json  |   42 -
 qemu-coroutine-lock.c |   14 ++
 qemu-options.hx   |2 +-
 qmp-commands.hx   |   34 +++-
 10 files changed, 586 insertions(+), 212 deletions(-)

-- 
1.7.10.4

[Qemu-devel] [PATCH V2 for-1.6 1/5] block: Repair the throttling code.

2013-07-23 Thread Benoît Canet

The throttling code was segfaulting since commit
02ffb504485f0920cfc75a0982a602f824a9a4f4 because some qemu_co_queue_next caller
does not run in a coroutine.
qemu_co_queue_do_restart assume that the caller is a coroutinne.
As sugested by Stefan fix this by entering the coroutine directly.

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block.c   |8 
 include/block/coroutine.h |5 +
 qemu-coroutine-lock.c |   14 ++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index b560241..dc72643 100644
--- a/block.c
+++ b/block.c
@@ -127,7 +127,8 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
 {
 bs-io_limits_enabled = false;
 
-while (qemu_co_queue_next(bs-throttled_reqs));
+while (qemu_co_enter_next(bs-throttled_reqs)) {
+}
 
 if (bs-block_timer) {
 qemu_del_timer(bs-block_timer);
@@ -143,7 +144,7 @@ static void bdrv_block_timer(void *opaque)
 {
 BlockDriverState *bs = opaque;
 
-qemu_co_queue_next(bs-throttled_reqs);
+qemu_co_enter_next(bs-throttled_reqs);
 }
 
 void bdrv_io_limits_enable(BlockDriverState *bs)
@@ -1445,8 +1446,7 @@ void bdrv_drain_all(void)
  * a busy wait.
  */
 QTAILQ_FOREACH(bs, bdrv_states, list) {
-if (!qemu_co_queue_empty(bs-throttled_reqs)) {
-qemu_co_queue_restart_all(bs-throttled_reqs);
+while (qemu_co_enter_next(bs-throttled_reqs)) {
 busy = true;
 }
 }
diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index 377805a..66e331c 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -138,6 +138,11 @@ bool qemu_co_queue_next(CoQueue *queue);
 void qemu_co_queue_restart_all(CoQueue *queue);
 
 /**
+ * Enter the next coroutine in the queue
+ */
+bool qemu_co_enter_next(CoQueue *queue);
+
+/**
  * Checks if the CoQueue is empty.
  */
 bool qemu_co_queue_empty(CoQueue *queue);
diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
index d9fea49..c61d300 100644
--- a/qemu-coroutine-lock.c
+++ b/qemu-coroutine-lock.c
@@ -98,6 +98,20 @@ void qemu_co_queue_restart_all(CoQueue *queue)
 qemu_co_queue_do_restart(queue, false);
 }
 
+bool qemu_co_enter_next(CoQueue *queue)
+{
+Coroutine *next;
+
+next = QTAILQ_FIRST(queue-entries);
+if (!next) {
+return false;
+}
+
+QTAILQ_REMOVE(queue-entries, next, co_queue_next);
+qemu_coroutine_enter(next, NULL);
+return true;
+}
+
 bool qemu_co_queue_empty(CoQueue *queue)
 {
 return (QTAILQ_FIRST(queue-entries) == NULL);
-- 
1.7.10.4

Re: [Qemu-devel] [PATCH 0/9] Add platform bus

2013-07-23 Thread Paolo Bonzini

Il 22/07/2013 20:21, Peter Maydell ha scritto:
  Platforms without ISA and/or PCI have had a seriously hard time in the 
  dynamic
  device creation world of QEMU. Devices on these were modeled as SysBus 
  devices
  which can only be instantiated in machine files, not through -device.
 
  Why is that so?
 Because you can't as a user of this sort of hardware
 plug in an extra serial port to a SoC, because there's just nowhere
 to plug it in. So why should it be possible to plug an extra
 serial port into the QEMU model of the SoC?

And why exactly should QEMU be limited to modeling an existing SoC?

Perhaps the user is not working with an existing SoC.  They are working
with with IP building blocks that they can combine the way they prefer,
and they haven't yet made up their mind on the exact set of devices
they'll have.  (because not all the world is a PC, but then not all the
non-PC world is ARM either).

Perhaps the user is working on kernel support for device tree / ACPI,
wants to test many device combinations, and does not want to touch C
code in order to do that.

Perhaps the user can plug daughterboards that connect to the SoC and add
an extra serial port, visible as yet another MMIO device.

 And why should
 the user of QEMU have to know very low hardware level detail
 like what a particular devboard's IRQ and memory map are?

Of course in some scenarios the user of QEMU doesn't need them, but
there are many kinds of users...

Paolo

[Qemu-devel] [PATCH V2 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line.

2013-07-23 Thread Benoît Canet

The thresholds of the leaky bucket algorithm can be used to allow some
burstiness.

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/qapi.c |   24 +
 blockdev.c   |  105 +++---
 hmp.c|   32 +++--
 qapi-schema.json |   34 --
 qemu-options.hx  |2 +-
 qmp-commands.hx  |   30 ++--
 6 files changed, 205 insertions(+), 22 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index a4bc411..03f1604 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -235,6 +235,30 @@ void bdrv_query_info(BlockDriverState *bs,
bs-io_limits.iops[BLOCK_IO_LIMIT_READ];
 info-inserted-iops_wr =
bs-io_limits.iops[BLOCK_IO_LIMIT_WRITE];
+info-inserted-has_bps_threshold =
+   bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL];
+info-inserted-bps_threshold =
+   bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL];
+info-inserted-has_bps_rd_threshold =
+   bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_READ];
+info-inserted-bps_rd_threshold =
+   bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_READ];
+info-inserted-has_bps_wr_threshold =
+   bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE];
+info-inserted-bps_wr_threshold =
+   bs-io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE];
+info-inserted-has_iops_threshold =
+   bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL];
+info-inserted-iops_threshold =
+   bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL];
+info-inserted-iops_rd_threshold =
+   bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_READ];
+info-inserted-has_iops_rd_threshold =
+   bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_READ];
+info-inserted-has_iops_wr_threshold =
+   bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
+info-inserted-iops_wr_threshold =
+   bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
 }
 
 bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index a78fba4..9bda359 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -341,6 +341,17 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, 
Error **errp)
 return false;
 }
 
+if (io_limits-bps_threshold[BLOCK_IO_LIMIT_TOTAL]  0 ||
+io_limits-bps_threshold[BLOCK_IO_LIMIT_WRITE]  0 ||
+io_limits-bps_threshold[BLOCK_IO_LIMIT_READ]  0 ||
+io_limits-iops_threshold[BLOCK_IO_LIMIT_TOTAL]  0 ||
+io_limits-iops_threshold[BLOCK_IO_LIMIT_WRITE]  0 ||
+io_limits-iops_threshold[BLOCK_IO_LIMIT_READ]  0) {
+error_setg(errp,
+   threshold values must be 0 or greater);
+return false;
+}
+
 return true;
 }
 
@@ -523,24 +534,34 @@ DriveInfo *drive_init(QemuOpts *all_opts, 
BlockInterfaceType block_default_type)
qemu_opt_get_number(opts, bps_rd, 0);
 io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
qemu_opt_get_number(opts, bps_wr, 0);
+/* bps thresholds */
+io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL]  =
+qemu_opt_get_number(opts, bps_threshold,
+io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ);
+io_limits.bps_threshold[BLOCK_IO_LIMIT_READ]  =
+qemu_opt_get_number(opts, bps_rd_threshold,
+io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ);
+io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE]  =
+qemu_opt_get_number(opts, bps_wr_threshold,
+io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ);
+
 io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
qemu_opt_get_number(opts, iops, 0);
 io_limits.iops[BLOCK_IO_LIMIT_READ]  =
qemu_opt_get_number(opts, iops_rd, 0);
 io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
qemu_opt_get_number(opts, iops_wr, 0);
-io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] =
-   io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
-io_limits.bps_threshold[BLOCK_IO_LIMIT_READ]  =
-   io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
-io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] =
-   io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
-io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] =
-   io_limits.iops[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
+
+/* iops thresholds */
+io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL]  =
+qemu_opt_get_number(opts, iops_threshold,
+

[Qemu-devel] [PATCH V2 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm.

2013-07-23 Thread Benoît Canet

This patch replace the previous algorithm by the well described leaky bucket
algorithm: A bucket is filled by the incoming IOs and a periodic timer decrement
the counter to make the bucket leak. When a given threshold is reached the
bucket is full and the IOs are hold.

In this patch the threshold is set to a default value to make the code behave
like the previous implementation.

In the next patch the threshold will be exposed in QMP to let the user control
the burstiness of the throttling.

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block.c   |  410 +
 blockdev.c|   71 ++--
 include/block/block_int.h |   15 +-
 3 files changed, 299 insertions(+), 197 deletions(-)

diff --git a/block.c b/block.c
index dc72643..2d6e9b4 100644
--- a/block.c
+++ b/block.c
@@ -86,13 +86,6 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque);
 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors);
 
-static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
-bool is_write, double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
-double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
-bool is_write, int64_t *wait);
-
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
 QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -101,6 +94,8 @@ static QLIST_HEAD(, BlockDriver) bdrv_drivers =
 
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
+/* boolean used to inform the throttling code that a bdrv_drain_all is issued 
*/
+static bool draining;
 
 #ifdef _WIN32
 static int is_windows_drive_prefix(const char *filename)
@@ -135,15 +130,122 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
 qemu_free_timer(bs-block_timer);
 bs-block_timer = NULL;
 }
+}
 
-bs-slice_start = 0;
-bs-slice_end   = 0;
+static void bdrv_make_bps_buckets_leak(BlockDriverState *bs, int64_t delta)
+{
+int64_t *bytes = bs-leaky_buckets.bytes;
+int64_t read_leak, write_leak;
+
+/* the limit apply to both reads and writes */
+if (bs-io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+/* compute half the total leak */
+int64_t leak = ((bs-io_limits.bps[BLOCK_IO_LIMIT_TOTAL] * delta) /
+   NANOSECONDS_PER_SECOND);
+int remain = leak % 2;
+leak /= 2;
+
+/* the read bucket is smaller than half the quantity to leak so take
+ * care adding the leak difference to write leak
+ */
+if (bytes[BLOCK_IO_LIMIT_READ] = leak) {
+read_leak = bytes[BLOCK_IO_LIMIT_READ];
+write_leak = 2 * leak + remain - bytes[BLOCK_IO_LIMIT_READ];
+/* symetric case */
+} else if (bytes[BLOCK_IO_LIMIT_WRITE] = leak) {
+write_leak = bytes[BLOCK_IO_LIMIT_WRITE];
+read_leak = 2 * leak + remain - bytes[BLOCK_IO_LIMIT_WRITE];
+/* both bucket above leak count use half the total leak for both */
+} else {
+write_leak = leak;
+read_leak = leak + remain;
+}
+/* else we consider that limits are separated */
+} else {
+read_leak = (bs-io_limits.bps[BLOCK_IO_LIMIT_READ] * delta) /
+NANOSECONDS_PER_SECOND;
+write_leak = (bs-io_limits.bps[BLOCK_IO_LIMIT_WRITE] * delta) /
+ NANOSECONDS_PER_SECOND;
+}
+
+/* make the buckets leak */
+bytes[BLOCK_IO_LIMIT_READ]  = MAX(bytes[BLOCK_IO_LIMIT_READ] - read_leak,
+  0);
+bytes[BLOCK_IO_LIMIT_WRITE] = MAX(bytes[BLOCK_IO_LIMIT_WRITE] - write_leak,
+  0);
 }
 
+static void bdrv_make_iops_buckets_leak(BlockDriverState *bs, int64_t delta)
+{
+double *ios = bs-leaky_buckets.ios;
+int64_t read_leak, write_leak;
+
+/* the limit apply to both reads and writes */
+if (bs-io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+/* compute half the total leak */
+int64_t leak = ((bs-io_limits.iops[BLOCK_IO_LIMIT_TOTAL] * delta) /
+   NANOSECONDS_PER_SECOND);
+int remain = leak % 2;
+leak /= 2;
+
+/* the read bucket is smaller than half the quantity to leak so take
+ * care adding the leak difference to write leak
+ */
+if (ios[BLOCK_IO_LIMIT_READ] = leak) {
+read_leak = ios[BLOCK_IO_LIMIT_READ];
+write_leak = 2 * leak + remain - ios[BLOCK_IO_LIMIT_READ];
+/* symetric case */
+} else if (ios[BLOCK_IO_LIMIT_WRITE] = leak) {
+write_leak = ios[BLOCK_IO_LIMIT_WRITE];
+read_leak = 2 * leak + remain - ios[BLOCK_IO_LIMIT_WRITE];
+/* both bucket above leak count use half the total leak for both */
+} else {
+write_leak

[Qemu-devel] [PATCH V2 for-1.6 4/5] block: Add iops_sector_count to do the iops accounting for a given io size.

2013-07-23 Thread Benoît Canet

This feature can be used in case where users are avoiding the iops limit by
doing jumbo I/Os hammering the storage backend.

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block.c   |8 +++-
 block/qapi.c  |4 
 blockdev.c|   22 +-
 hmp.c |8 ++--
 include/block/block_int.h |1 +
 qapi-schema.json  |   10 --
 qemu-options.hx   |2 +-
 qmp-commands.hx   |8 ++--
 8 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 2d6e9b4..cfa890d 100644
--- a/block.c
+++ b/block.c
@@ -337,6 +337,12 @@ static bool 
bdrv_is_any_threshold_exceeded(BlockDriverState *bs, int nb_sectors,
bool is_write)
 {
 bool bps_ret, iops_ret;
+double   ios = 1.0;
+
+if (bs-io_limits.iops_sector_count) {
+ios = ((double) nb_sectors) / bs-io_limits.iops_sector_count;
+ios = MAX(ios, 1.0);
+}
 
 /* check if any bandwith or per IO threshold has been exceeded */
 bps_ret = bdrv_is_bps_threshold_exceeded(bs, is_write);
@@ -360,7 +366,7 @@ static bool bdrv_is_any_threshold_exceeded(BlockDriverState 
*bs, int nb_sectors,
 /* the IO is authorized so do the accounting and return false */
 bs-leaky_buckets.bytes[is_write] += (int64_t)nb_sectors *
  BDRV_SECTOR_SIZE;
-bs-leaky_buckets.ios[is_write]++;
+bs-leaky_buckets.ios[is_write] += ios;
 
 return false;
 }
diff --git a/block/qapi.c b/block/qapi.c
index 03f1604..f81081c 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -259,6 +259,10 @@ void bdrv_query_info(BlockDriverState *bs,
bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
 info-inserted-iops_wr_threshold =
bs-io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
+info-inserted-has_iops_sector_count =
+   bs-io_limits.iops_sector_count;
+info-inserted-iops_sector_count =
+   bs-io_limits.iops_sector_count;
 }
 
 bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 9bda359..8ddd710 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -352,6 +352,11 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, 
Error **errp)
 return false;
 }
 
+if (io_limits-iops_sector_count  0) {
+error_setg(errp, iops_sector_count must be 0 or greater);
+return false;
+}
+
 return true;
 }
 
@@ -563,6 +568,10 @@ DriveInfo *drive_init(QemuOpts *all_opts, 
BlockInterfaceType block_default_type)
 qemu_opt_get_number(opts, iops_wr_threshold,
 io_limits.iops[BLOCK_IO_LIMIT_WRITE] / 
THROTTLE_HZ);
 
+io_limits.iops_sector_count =
+   qemu_opt_get_number(opts, iops_sector_count, 0);
+ 
+
 if (!do_check_io_limits(io_limits, error)) {
 error_report(%s, error_get_pretty(error));
 error_free(error);
@@ -1260,7 +1269,9 @@ void qmp_block_set_io_throttle(const char *device, 
int64_t bps, int64_t bps_rd,
bool has_iops_rd_threshold,
int64_t iops_rd_threshold,
bool has_iops_wr_threshold,
-   int64_t iops_wr_threshold, Error **errp)
+   int64_t iops_wr_threshold,
+   bool has_iops_sector_count,
+   int64_t iops_sector_count, Error **errp)
 {
 BlockIOLimit io_limits;
 BlockDriverState *bs;
@@ -1283,6 +1294,7 @@ void qmp_block_set_io_throttle(const char *device, 
int64_t bps, int64_t bps_rd,
 io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = iops / THROTTLE_HZ;
 io_limits.iops_threshold[BLOCK_IO_LIMIT_READ]  = iops_rd / THROTTLE_HZ;
 io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr / THROTTLE_HZ;
+io_limits.iops_sector_count = 0;
 
 /* override them with givens values if present */
 if (has_bps_threshold) {
@@ -1304,6 +1316,10 @@ void qmp_block_set_io_throttle(const char *device, 
int64_t bps, int64_t bps_rd,
 io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr_threshold;
 }
 
+if (has_iops_sector_count) {
+io_limits.iops_sector_count = iops_sector_count;
+}
+
 if (!do_check_io_limits(io_limits, errp)) {
 return;
 }
@@ -2007,6 +2023,10 @@ QemuOptsList qemu_common_drive_opts = {
 .type = QEMU_OPT_NUMBER,
 .help = write bytes threshold,
 },{
+.name = iops_sector_count,
+.type = QEMU_OPT_NUMBER,
+.help = when limiting by iops max size of an I/O in sector,
+},{
 .name = copy-on-read,
 .type = QEMU_OPT_BOOL,
 .help = copy read data from backing file into image file,
diff --git a/hmp.c b/hmp.c
index

[Qemu-devel] [PATCH V2 for-1.6 5/5] block: Add throttling percentage metrics.

2013-07-23 Thread Benoît Canet

This metrics show how many percent of the time I/Os are put on hold by the
throttling algorithm.
This metric could be used by system administrators or cloud stack developpers
to decide when the throttling settings must be changed.

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block.c   |   17 -
 block/qapi.c  |4 
 hmp.c |6 --
 include/block/block_int.h |2 ++
 qapi-schema.json  |4 +++-
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index cfa890d..749f276 100644
--- a/block.c
+++ b/block.c
@@ -130,6 +130,8 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
 qemu_free_timer(bs-block_timer);
 bs-block_timer = NULL;
 }
+bs-throttling_percentage = 0;
+bs-full_since = 0;
 }
 
 static void bdrv_make_bps_buckets_leak(BlockDriverState *bs, int64_t delta)
@@ -219,7 +221,8 @@ static void bdrv_make_iops_buckets_leak(BlockDriverState 
*bs, int64_t delta)
 static void bdrv_leak_if_needed(BlockDriverState *bs)
 {
 int64_t now;
-int64_t delta;
+int64_t delta; /* the delta that would be ideally the timer period */
+int64_t delta_full; /* the delta where the bucket is full */
 
 if (!bs-must_leak) {
 return;
@@ -229,6 +232,11 @@ static void bdrv_leak_if_needed(BlockDriverState *bs)
 
 now = qemu_get_clock_ns(rt_clock);
 delta = now - bs-previous_leak;
+/* compute throttle percentage reflecting how long IO are hold on average 
*/
+if (bs-full_since) {
+delta_full = now - bs-full_since;
+bs-throttling_percentage = (delta_full * 100) / delta;
+}
 bs-previous_leak = now;
 
 bdrv_make_bps_buckets_leak(bs, delta);
@@ -255,6 +263,8 @@ void bdrv_io_limits_enable(BlockDriverState *bs)
 bs-block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
 bs-io_limits_enabled = true;
 bs-previous_leak = qemu_get_clock_ns(rt_clock);
+bs-throttling_percentage = 0;
+bs-full_since = 0;
 qemu_mod_timer(bs-block_timer,
qemu_get_clock_ns(vm_clock) +
BLOCK_IO_THROTTLE_PERIOD);
@@ -396,6 +406,11 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
  * not full
  */
 while (bdrv_is_any_threshold_exceeded(bs, nb_sectors, is_write)) {
+/* remember since when the code decided to block the first I/O */
+if (qemu_co_queue_empty(bs-throttled_reqs)) {
+bs-full_since = qemu_get_clock_ns(rt_clock);
+}
+
 bdrv_leak_if_needed(bs);
 qemu_co_queue_wait_insert_head(bs-throttled_reqs);
 bdrv_leak_if_needed(bs);
diff --git a/block/qapi.c b/block/qapi.c
index f81081c..bd1c6af 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -263,6 +263,10 @@ void bdrv_query_info(BlockDriverState *bs,
bs-io_limits.iops_sector_count;
 info-inserted-iops_sector_count =
bs-io_limits.iops_sector_count;
+info-inserted-has_throttling_percentage =
+   bs-throttling_percentage;
+info-inserted-throttling_percentage =
+   bs-throttling_percentage;
 }
 
 bs0 = bs;
diff --git a/hmp.c b/hmp.c
index 3912305..9dc4862 100644
--- a/hmp.c
+++ b/hmp.c
@@ -348,7 +348,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
  iops_threshold=% PRId64
  iops_rd_threshold=% PRId64
  iops_wr_threshold=% PRId64
- iops_sector_count=% PRId64 \n,
+ iops_sector_count=% PRId64
+ throttling_percentage=% PRId64 \n,
 info-value-inserted-bps,
 info-value-inserted-bps_rd,
 info-value-inserted-bps_wr,
@@ -361,7 +362,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
 info-value-inserted-iops_threshold,
 info-value-inserted-iops_rd_threshold,
 info-value-inserted-iops_wr_threshold,
-info-value-inserted-iops_sector_count);
+info-value-inserted-iops_sector_count,
+info-value-inserted-throttling_percentage);
 } else {
 monitor_printf(mon,  [not inserted]);
 }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 74d7503..4487cd9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -271,6 +271,8 @@ struct BlockDriverState {
 BlockIOLimit io_limits;
 BlockIOBaseValue leaky_buckets;
 int64_t  previous_leak;
+int64_t  full_since;
+int  throttling_percentage;
 bool must_leak;
 CoQueue  throttled_reqs;
 QEMUTimer*block_timer;
diff --git

Re: [Qemu-devel] [PATCH 0/9] Add platform bus

2013-07-23 Thread Peter Maydell

On 23 July 2013 13:19, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 22/07/2013 20:21, Peter Maydell ha scritto:
  Platforms without ISA and/or PCI have had a seriously hard time in the 
  dynamic
  device creation world of QEMU. Devices on these were modeled as SysBus 
  devices
  which can only be instantiated in machine files, not through -device.
 
  Why is that so?
 Because you can't as a user of this sort of hardware
 plug in an extra serial port to a SoC, because there's just nowhere
 to plug it in. So why should it be possible to plug an extra
 serial port into the QEMU model of the SoC?

 And why exactly should QEMU be limited to modeling an existing SoC?

 Perhaps the user is not working with an existing SoC.  They are working
 with with IP building blocks that they can combine the way they prefer,
 and they haven't yet made up their mind on the exact set of devices
 they'll have.  (because not all the world is a PC, but then not all the
 non-PC world is ARM either).

This sounds like (a) a good thing (b) something that will
turn into an incredible incomprehensible mess if we try
to specify it on the command line. Why would we want to do that?

That is, you're arguing for a scripting/config language for
putting together board models so you don't have to write them
in C. That's a good thing, but not what this patch series is doing.

 Perhaps the user is working on kernel support for device tree / ACPI,
 wants to test many device combinations, and does not want to touch C
 code in order to do that.

 Perhaps the user can plug daughterboards that connect to the SoC and add
 an extra serial port, visible as yet another MMIO device.

Pluggable daughterboards should be implemented by actually
defining the bus/socket that exists between the mainboard
and the daughterboard, so you could say -device my-daughterboard
and have it plug in to the mainboard.

-- PMM

Re: [Qemu-devel] Question on aio_poll

2013-07-23 Thread Stefan Hajnoczi

On Sat, Jul 20, 2013 at 02:14:43PM +0100, Alex Bligh wrote:
 As part of adding timer operations to aio_poll and a clock to AioContext, I
 am trying to figure out a couple of points on how aio_poll works. This
 primarily revolves around the flag busy.
 
 Firstly, at the end of aio_poll, it has
 assert(progress || busy);
 
 This assert in fact checks nothing, because before the poll (20 lines or
 so higher), it has
 if (!busy) {
 return progress;
 }
 
 and as 'busy' is not altered, busy must be true at the assert.
 
 Is this in fact meant to be checking that we have made progress if aio_poll
 is called blocking? IE assert(progress || !blocking) ?
 
 Secondly, the tests I am writing for the timer fail because g_poll is
 not called, because busy is false. This is because I haven't got an
 fd set up. But in the instance where aio_poll is called with blocking=true
 and there are no fd's to wait on, surely aio_poll should either hang
 indefinitely or (perhaps better) assert, rather than return immediately
 which is a recipe for an unexpected busywait? If I have timers, it should
 be running those timers rather than returning.

FWIW this goes away in my series that gets rid of .io_flush():

http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg00092.html

Unfortunately there is an issue with the series which I haven't had time
to look into yet.  I don't remember the details but I think make check
is failing.

The current qemu.git/master code is doing the correct thing though.
Callers of aio_poll() are using it to complete any pending I/O requests
and process BHs.  If there is no work left, we do not want to block
indefinitely.  Instead we want to return.

 Thirdly, I don't quite understand how/why busy is being set. It seems
 to be set if the flush callback returns non-zero. That would imply (I think)
 the fd handler has something to write. But what if it is just interested
 in any data to read that is available (and never writes)? If this is the
 only fd aio_poll has, it would appear it never polls.

The point of .io_flush() is to select file descriptors that are awaiting
I/O (either direction).  For example, consider an iSCSI TCP socket with
no I/O requests pending.  In that case .io_flush() returns 0 and we will
not block in aio_poll().  But if there is an iSCSI request pending, then
.io_flush() will return 1 and we'll wait for the iSCSI response to be
received.

The effect of .io_flush() is that aio_poll() will return false if there
is no I/O pending.

It turned out that this behavior could be implemented at the block layer
instead of using the .io_flush() interface at the AioContext layer.  The
patch series I linked to above modifies the code so AioContext can
eliminate the .io_flush() concept.

 When adapting this for timers, I think it should be returning true only
 if a an AIO dispatch did something, or a BH was executed, or a timer ran.
 Specifically if the poll simply times out, it should not be returning
 true unless a timer ran. Correct?

Yes, the return value is about making progress.  If there is still work
pending then it must return true.  If there is no work pending it must
return false.

Stefan

Re: [Qemu-devel] [PATCH] gluster: Add image resize support

2013-07-23 Thread Stefan Hajnoczi

On Fri, Jul 19, 2013 at 07:51:33PM +0530, Bharata B Rao wrote:
 From: Paolo Bonzini pbonz...@redhat.com
 
 Implement .bdrv_truncate in GlusterFS block driver so that GlusterFS backend
 can support image resizing.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
 Tested-by: Bharata B Rao bhar...@linux.vnet.ibm.com
 ---
  block/gluster.c | 17 +
  1 file changed, 17 insertions(+)

Acked-by: Stefan Hajnoczi stefa...@redhat.com

Re: [Qemu-devel] [PATCH 0/9] Add platform bus

2013-07-23 Thread François Revol

On 23/07/2013 14:19, Paolo Bonzini wrote:
 Il 22/07/2013 20:21, Peter Maydell ha scritto:
   Platforms without ISA and/or PCI have had a seriously hard time in 
   the dynamic
   device creation world of QEMU. Devices on these were modeled as 
   SysBus devices
   which can only be instantiated in machine files, not through -device.
  
   Why is that so?
  Because you can't as a user of this sort of hardware
  plug in an extra serial port to a SoC, because there's just nowhere
  to plug it in. So why should it be possible to plug an extra
  serial port into the QEMU model of the SoC?
 And why exactly should QEMU be limited to modeling an existing SoC?
 
 Perhaps the user is not working with an existing SoC.  They are working
 with with IP building blocks that they can combine the way they prefer,
 and they haven't yet made up their mind on the exact set of devices
 they'll have.  (because not all the world is a PC, but then not all the
 non-PC world is ARM either).
 
 Perhaps the user is working on kernel support for device tree / ACPI,
 wants to test many device combinations, and does not want to touch C
 code in order to do that.

For what it's worth, ont sure it matters here, but the Sam460ex (PPC)
(I've been writing code to support it recently) has an FDT, and a PCI
controller...
And there is a firmware setting to switch between the 2nd SATA port and
the PCI-e slot as they are mutually exclusive...

François.

Re: [Qemu-devel] APM regression since v1.3.0-408-g9ee59f3

2013-07-23 Thread Gerd Hoffmann

On 07/03/13 22:25, Sebastian Herbszt wrote:
 Commit 9ee59f3 (pc: remove bochs bios debug ports) broke the APM
 interface
 between QEMU and Bochs BIOS/SeaBIOS. Without APM support older guests
 are no longer able to power off the VM. This regression also affects
 older machine
 types like pc-1.2.

--verbose please.  Which guest?  Which firmware?

ACPI poweroff with seabios works just fine.  If APM support in seabios
uses something else it should be switched to use the same hardware ports
ACPI uses for poweroff (some piix4 pm device register I think) instead
of the debug ports which where never meant to be used that way.

Does bochs bios run on recent qemu versions in the first place?

cheers,
  Gerd

Re: [Qemu-devel] [PATCH RFC qom-next 4/4] pcie_port: Turn PCIEPort and PCIESlot into abstract QOM types

2013-07-23 Thread Michael S. Tsirkin

On Tue, Jul 23, 2013 at 01:21:13PM +0200, Gerd Hoffmann wrote:
   Hi,
 
  Okay, so if it's just PCIe, then XHCI is the oddball preventing moving
  it into VMSTATE_PCIE_DEVICE(). XHCI has VMSTATE_MSIX() in its place,
  also operating on PCIDevice.
 
 Given that live migration support for xhci was added post-1.5 so we
 don't have a release with it yet this shouldn't be a blocker in case we
 get this sorted in time for 1.6.
 
  Is there a way to detect use of AER or MSIX to place those into
  subsections of VMSTATE_PCIE_DEVICE()?
 
 There is msix_enabled() ...

msix_present()

 Dunno about AER.
 
 cheers,
   Gerd


Not at the moment but it's not hard to add.

-- 
MST

Re: [Qemu-devel] [PATCH 0/9] Add platform bus

2013-07-23 Thread Paolo Bonzini

Il 23/07/2013 14:22, Peter Maydell ha scritto:
 On 23 July 2013 13:19, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 22/07/2013 20:21, Peter Maydell ha scritto
 you can't as a user of this sort of hardware
 plug in an extra serial port to a SoC, because there's just nowhere
 to plug it in. So why should it be possible to plug an extra
 serial port into the QEMU model of the SoC?

 And why exactly should QEMU be limited to modeling an existing SoC?

 Perhaps the user is not working with an existing SoC.  They are working
 with with IP building blocks that they can combine the way they prefer,
 and they haven't yet made up their mind on the exact set of devices
 they'll have.  (because not all the world is a PC, but then not all the
 non-PC world is ARM either).
 
 This sounds like (a) a good thing (b) something that will
 turn into an incredible incomprehensible mess if we try
 to specify it on the command line. Why would we want to do that?

It is an incomprehensible mess on the command line, but it is actually
quite fine if you use -readconfig instead.

It will not support for -serial and similar shortcut options, but
they aren't necessary.  PCI-enabled or paravirtual boards have been able
to run for years now with -nodefaults and only -device options.

 That is, you're arguing for a scripting/config language for
 putting together board models so you don't have to write them
 in C. That's a good thing, but not what this patch series is doing.

It is, modulo lack of support for -serial and similar shortcut
options.  Gluing those still requires C code (for PC as well).

 Perhaps the user is working on kernel support for device tree / ACPI,
 wants to test many device combinations, and does not want to touch C
 code in order to do that.

 Perhaps the user can plug daughterboards that connect to the SoC and add
 an extra serial port, visible as yet another MMIO device.
 
 Pluggable daughterboards should be implemented by actually
 defining the bus/socket that exists between the mainboard
 and the daughterboard, so you could say -device my-daughterboard
 and have it plug in to the mainboard.

The bus might just be the processor's data bus + the interrupt
controller's pins, basically the same as sysbus.

In fact, the main thing I dislike about Alex's patch is adding a new bus
instead of making sysbus devices just work as pluggable devices.

Paolo

Re: [Qemu-devel] [PATCH 0/9] Add platform bus

2013-07-23 Thread Peter Maydell

On 23 July 2013 13:34, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 23/07/2013 14:22, Peter Maydell ha scritto:
 On 23 July 2013 13:19, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 22/07/2013 20:21, Peter Maydell ha scritto
 you can't as a user of this sort of hardware
 plug in an extra serial port to a SoC, because there's just nowhere
 to plug it in. So why should it be possible to plug an extra
 serial port into the QEMU model of the SoC?

 And why exactly should QEMU be limited to modeling an existing SoC?

 Perhaps the user is not working with an existing SoC.  They are working
 with with IP building blocks that they can combine the way they prefer,
 and they haven't yet made up their mind on the exact set of devices
 they'll have.  (because not all the world is a PC, but then not all the
 non-PC world is ARM either).

 This sounds like (a) a good thing (b) something that will
 turn into an incredible incomprehensible mess if we try
 to specify it on the command line. Why would we want to do that?

 It is an incomprehensible mess on the command line, but it is actually
 quite fine if you use -readconfig instead.

Well, the justification for this whole new bus appears to be
so you can easily just add a new device on the command line.

 Perhaps the user can plug daughterboards that connect to the SoC and add
 an extra serial port, visible as yet another MMIO device.

 Pluggable daughterboards should be implemented by actually
 defining the bus/socket that exists between the mainboard
 and the daughterboard, so you could say -device my-daughterboard
 and have it plug in to the mainboard.

 The bus might just be the processor's data bus + the interrupt
 controller's pins, basically the same as sysbus.

Yes, we should have easy support for defining a pluggable
bus as a collection of pins.

 In fact, the main thing I dislike about Alex's patch is adding a new bus
 instead of making sysbus devices just work as pluggable devices.

Agreed, more or less. Actually I'd rather sysbus devices
went away -- the requirement for interrupt and GPIO and
memory regions to all be defined as single arrays (so you
have to know what interrupt line 3 happens to be, and
that memory region 1 is the registers, and so on) is
pretty unfriendly. We should be able to define all these
as named connections.

-- PMM

Re: [Qemu-devel] [PATCH v2 1/2] ps2: add support of auto-repeat

2013-07-23 Thread Gerd Hoffmann

On 07/02/13 08:49, Amos Kong wrote:
 Actually it's not a urgent/necessary request. Host provided
 auto-repeat works, and we didn't have real application of holding
 key by sendkey command now.

So it's a solution for a non-existing problem ...

Which guests do care about the hardware repeat rate in the first place?
 I know linux emulates it in software for a loong time (IIRC
basically since usb keyboards exist).  I wouldn't be surprised if other
guests do the same.

So I think we should simply not worry about it ;)

cheers,
  Gerd

[Qemu-devel] QCOW2 cryptography and secure key handling

2013-07-23 Thread Benoît Canet


Hi,

I have some budget to improve QCOW2's cryptography.

My main concern is that the QCOW2 image crypto key is passed in clear text.

Do you (the block maintainers) have an idea on how the code could be improved
to securely pass the crypto key to the QCOW2 code ?

Best regards

Benoît

Re: [Qemu-devel] KVM call agenda for 2013-07-30

2013-07-23 Thread Peter Maydell

On 23 July 2013 08:55, Christian Borntraeger borntrae...@de.ibm.com wrote:
 - soft reset and other reset variants. What is the right way to go?
 (e.g. on s390 there are several reset variants that reset a defined subset of 
 the
 system. This can be triggered by operating systems, e.g. kdump uses a diagnose
 instruction that resets the device subsystem and parts of the cpu registers 
 (some
 are unchanged, this allows to take a proper crash dump with a well defined 
 system
 state)

FWIW my suggestion is that we should actually model
reset lines, any associated reset/power controller,
etc: basically do what the hardware does. The qdev
reset method should be restricted to we powercycled
the system.

-- PMM

Re: [Qemu-devel] QCOW2 cryptography and secure key handling

2013-07-23 Thread Daniel P. Berrange

On Tue, Jul 23, 2013 at 02:47:06PM +0200, Benoît Canet wrote:
 
 Hi,
 
 I have some budget to improve QCOW2's cryptography.
 
 My main concern is that the QCOW2 image crypto key is passed in clear text.

That is only a problem if someone can sniff the communications channel
used by the monitor socket between QEMU  the management application.
IOW, this is only a problem if someone has configured QEMU to listen on
a TCP / UDP socket for monitor traffic. If they had done this, it would
be considered an insecure configuration regardless of whether qcow2
encryption is used or not. So I don't think there's any problem which
needs solving from the POV of clear text keys over the monitor, besides
to document that you should configure QEMU such that its monitor is
only accessible to the app managing it. eg use a UNIX domain socket
for configuration.

 Do you (the block maintainers) have an idea on how the code could be improved
 to securely pass the crypto key to the QCOW2 code ?

More generally, QCow2's current encryption support is woefully inadequate
from a design POV. If we wanted better encryption built-in to QEMU it is
best to just deprecate the current encryption support and define a new
qcow2 extension based around something like the LUKS data format. Using
the LUKS data format precisely would be good from a data portability
POV, since then you can easily switch your images between LUKS encrypted
block device  qcow2-with-luks image file, without needing to re-encrypt
the data.


Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-devel] [PATCH 01/18] qapi-types.py: Split off generate_struct_fields()

2013-07-23 Thread Kevin Wolf

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 scripts/qapi-types.py | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/qapi-types.py b/scripts/qapi-types.py
index ddcfed9..e1239e1 100644
--- a/scripts/qapi-types.py
+++ b/scripts/qapi-types.py
@@ -57,12 +57,8 @@ typedef struct %(name)sList
 ''',
  name=name)
 
-def generate_struct(structname, fieldname, members):
-ret = mcgen('''
-struct %(name)s
-{
-''',
-  name=structname)
+def generate_struct_fields(members):
+ret = ''
 
 for argname, argentry, optional, structured in parse_args(members):
 if optional:
@@ -80,6 +76,17 @@ struct %(name)s
 ''',
  c_type=c_type(argentry), c_name=c_var(argname))
 
+return ret
+
+def generate_struct(structname, fieldname, members):
+ret = mcgen('''
+struct %(name)s
+{
+''',
+  name=structname)
+
+ret += generate_struct_fields(members)
+
 if len(fieldname):
 fieldname =   + fieldname
 ret += mcgen('''
-- 
1.8.1.4

[Qemu-devel] [PATCH 09/18] qapi.py: Maintain a list of union types

2013-07-23 Thread Kevin Wolf

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 scripts/qapi.py | 13 +
 1 file changed, 13 insertions(+)

diff --git a/scripts/qapi.py b/scripts/qapi.py
index baf1321..3a54c7f 100644
--- a/scripts/qapi.py
+++ b/scripts/qapi.py
@@ -105,6 +105,7 @@ def parse_schema(fp):
 if expr_eval.has_key('enum'):
 add_enum(expr_eval['enum'])
 elif expr_eval.has_key('union'):
+add_union(expr_eval)
 add_enum('%sKind' % expr_eval['union'])
 elif expr_eval.has_key('type'):
 add_struct(expr_eval)
@@ -188,6 +189,7 @@ def type_name(name):
 
 enum_types = []
 struct_types = []
+union_types = []
 
 def add_struct(definition):
 global struct_types
@@ -200,6 +202,17 @@ def find_struct(name):
 return struct
 return None
 
+def add_union(definition):
+global union_types
+union_types.append(definition)
+
+def find_union(name):
+global union_types
+for union in union_types:
+if union['union'] == name:
+return union
+return None
+
 def add_enum(name):
 global enum_types
 enum_types.append(name)
-- 
1.8.1.4

[Qemu-devel] [PATCH 00/18] 'blockdev-add' QMP command

2013-07-23 Thread Kevin Wolf

Kevin Wolf (18):
  qapi-types.py: Split off generate_struct_fields()
  qapi-types.py: Implement 'base' for unions
  qapi-visit.py: Split off generate_visit_struct_fields()
  qapi-visit.py: Implement 'base' for unions
  docs: Document QAPI union types
  qapi: Add visitor for implicit structs
  qapi: Flat unions with arbitrary discriminator
  qapi: Add consume argument to qmp_input_get_object()
  qapi.py: Maintain a list of union types
  qapi: Anonymous unions
  block: Allow driver option on the top level
  QemuOpts: Add qemu_opt_unset()
  blockdev: Rename I/O throttling options for QMP
  qcow2: Use dashes instead of underscores in options
  blockdev: Rename 'readonly' option to 'read-only'
  blockdev: Split up 'cache' option
  Implement qdict_flatten()
  blockdev: 'blockdev-add' QMP command

 block.c |   7 ++
 block/qcow2.h   |   8 +-
 blockdev.c  | 184 ++--
 docs/qapi-code-gen.txt  | 105 +++-
 include/qapi/qmp/qdict.h|   1 +
 include/qapi/qmp/qobject.h  |   1 +
 include/qapi/visitor-impl.h |   6 +
 include/qapi/visitor.h  |   6 +
 include/qemu/option.h   |   1 +
 qapi-schema.json| 293 
 qapi/qapi-visit-core.c  |  25 
 qapi/qmp-input-visitor.c|  47 +--
 qmp-commands.hx |  26 
 qobject/qdict.c |  50 
 qobject/qjson.c |   2 +
 scripts/qapi-types.py   |  83 +++--
 scripts/qapi-visit.py   | 179 ++-
 scripts/qapi.py |  28 +
 util/qemu-option.c  |  14 +++
 19 files changed, 970 insertions(+), 96 deletions(-)

-- 
1.8.1.4

1 2 3 >

1 - 100 of 284 matches

Mail list logo