Re: [Qemu-devel] [PATCHv7 4/9] slirp: Factorizing tcpiphdr structure with an union

2016-02-21 Thread Thomas Huth
On 22.02.2016 02:48, Samuel Thibault wrote:
> Hello,
> 
> Thomas Huth, on Fri 19 Feb 2016 14:44:59 +0100, wrote:
>>> +   m->m_data -= sizeof(struct tcpiphdr) - (sizeof(struct ip)
>>> ++ sizeof(struct tcphdr));
>>> +   m->m_len += sizeof(struct tcpiphdr) - (sizeof(struct ip)
>>> +   + sizeof(struct tcphdr));
>>
>> I'm somewhat having a hard time to understand the  "+ sizeof(struct
>> tcphdr))" here.
>>
>> In the tcp_output.c code, there is this:
>>
>>  m->m_data += sizeof(struct tcpiphdr) - sizeof(struct tcphdr)
>>   - sizeof(struct ip);
>>
>> So with my limited point of view, I'd rather expect this here in
>> tcp_input.c:
>>
>>  m->m_data -= sizeof(struct tcpiphdr) - (sizeof(struct ip)
>>   - sizeof(struct tcphdr));
>> i.e. "-" instead of "+" here ^
> 
> The parentheses and indentation were actually misleading; here is how it
> should look:
> 
>>> +   m->m_data -= sizeof(struct tcpiphdr) - ( sizeof(struct ip)
>>> +                                            + sizeof(struct tcphdr));
> 
> I've now dropped the parentheses, so it looks like the tcp_output.c code:
> 
>   m->m_data -= sizeof(struct tcpiphdr) - sizeof(struct ip)
>                                        - sizeof(struct tcphdr);

Ah, sorry, I simply got confused because it was written in two
different ways :-/ ... would it maybe be possible to use the
TCPIPHDR_DELTA macro here instead?

Apart from that, the patch looks ok to me.

 Thomas
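
To make the arithmetic in this exchange concrete, here is a minimal C sketch.
The three sizes are made-up stand-ins (the real values come from the slirp
structures), and the TOY_* macros and toy_* helpers are hypothetical names,
not part of slirp. It only shows that the parenthesized form and the flat form
give the same delta, while turning the inner "+" into "-" does not.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in sizes, chosen for illustration only; the real values come
 * from struct tcpiphdr, struct ip and struct tcphdr in slirp. */
#define TOY_TCPIPHDR_SIZE ((size_t)48)
#define TOY_IP_SIZE       ((size_t)20)
#define TOY_TCP_SIZE      ((size_t)20)

/* Form used in the original patch: a - (b + c). */
static size_t toy_delta_parens(void)
{
    return TOY_TCPIPHDR_SIZE - (TOY_IP_SIZE + TOY_TCP_SIZE);
}

/* Flat form, as in the tcp_output.c line quoted above: a - b - c. */
static size_t toy_delta_flat(void)
{
    return TOY_TCPIPHDR_SIZE - TOY_IP_SIZE - TOY_TCP_SIZE;
}

/* The misreading: a - (b - c), which adds the TCP header size back in. */
static size_t toy_delta_misread(void)
{
    return TOY_TCPIPHDR_SIZE - (TOY_IP_SIZE - TOY_TCP_SIZE);
}

/* Small self-check using the stand-in sizes above. */
static void toy_delta_check(void)
{
    assert(toy_delta_parens() == toy_delta_flat());
    assert(toy_delta_parens() == 8);
    assert(toy_delta_misread() == 48);
}
```

Hiding the expression behind a single macro, as suggested with
TCPIPHDR_DELTA, would avoid having the same delta spelled two different ways.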




Re: [Qemu-devel] [RFC PATCH v0 4/8] spapr: Introduce CPU core device

2016-02-21 Thread David Gibson
On Mon, Feb 22, 2016 at 07:44:40AM +0100, Andreas Färber wrote:
> Am 22.02.2016 um 06:01 schrieb Bharata B Rao:
> > sPAPR CPU core device is a container of CPU thread devices. CPU hotplug is
> > performed in the granularity of CPU core device by setting the "realized"
> > property of this device to "true". When hotplugged, CPU core creates CPU
> > thread devices.
> > 
> > TODO: Right now allows for only homogeneous configurations as we depend
> > on global smp_threads and machine->cpu_model.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  hw/ppc/Makefile.objs   |  1 +
> >  hw/ppc/spapr_cpu_package.c | 50 ++
> >  include/hw/ppc/spapr_cpu_package.h | 26 
> >  3 files changed, 77 insertions(+)
> >  create mode 100644 hw/ppc/spapr_cpu_package.c
> >  create mode 100644 include/hw/ppc/spapr_cpu_package.h
> > 
> > diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> > index c1ffc77..3000982 100644
> > --- a/hw/ppc/Makefile.objs
> > +++ b/hw/ppc/Makefile.objs
> > @@ -4,6 +4,7 @@ obj-y += ppc.o ppc_booke.o
> >  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
> >  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
> >  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
> > +obj-$(CONFIG_PSERIES) += spapr_cpu_package.o
> >  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
> >  obj-y += spapr_pci_vfio.o
> >  endif
> > diff --git a/hw/ppc/spapr_cpu_package.c b/hw/ppc/spapr_cpu_package.c
> > new file mode 100644
> > index 000..3120a16
> > --- /dev/null
> > +++ b/hw/ppc/spapr_cpu_package.c
> > @@ -0,0 +1,50 @@
> > +/*
> > + * sPAPR CPU package device, acts as container of CPU thread devices.
> > + *
> > + * Copyright (C) 2016 Bharata B Rao 
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +#include "hw/cpu/package.h"
> > +#include "hw/ppc/spapr_cpu_package.h"
> > +#include "hw/boards.h"
> > +#include 
> > +#include "qemu/error-report.h"
> > +
> > +static void spapr_cpu_package_instance_init(Object *obj)
> > +{
> > +int i;
> > +CPUState *cpu;
> > +MachineState *machine = MACHINE(qdev_get_machine());
> > +sPAPRCPUPackage *package = SPAPR_CPU_PACKAGE(obj);
> > +
> > +/* Create as many CPU threads as specified in the topology */
> > +for (i = 0; i < smp_threads; i++) {
> > +cpu = cpu_generic_init(machine->cpu_type, machine->cpu_model);
> 
> No, no, no. This is horribly violating QOM design.

Ok.. why?  There does not, to me, seem to be any remotely easily
discoverable means of finding out what QOM design principles are.

It also would have been nice if you weighed in on my RFC this code is
based on.

> Please compare the x86 RFC.

Where do I find this?

From all I could tell there just seemed to be a lot of spinning of
wheels on the cpu hotplug stuff, which is why I made the cpu-packages
proposal to try and move things forward.

There's been a lot of "don't do it like that" but precious little
"here's how you _should_ do it" that actually addresses the needs of
the various platforms.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [V6 0/4] AMD IOMMU

2016-02-21 Thread Jan Kiszka
On 2016-02-22 06:57, David Kiarie wrote:
> On Sun, Feb 21, 2016 at 11:20 PM, Jan Kiszka  wrote:
>> On 2016-02-21 19:10, David Kiarie wrote:
>>> Hello there,
>>>
>>> Repost, AMD IOMMU patches version 6.
>>>
>>> Changes since version 5
>>>  -Fixed macro formatting issues
>>>  -changed occurrences of IO MMU to IOMMU for consistency
>>>  -Fixed capability registers duplication
>>>  -Rebased to current master
>>
>> I suspect this still has some subtle bugs: I'm running the patches over
>> master with standard Linux distro as guest, full desktop, and I'm
>> getting sporadic segfaults of arbitrary programs. These disappear once I
>> disable the IOMMU or switch to the Intel version.
> 
> Is this L1 guest or L2 guest ? - haven't got any such so far.

It's L1 only.

> 
>>
>> How did you test so far?
> 
> I mainly test by logging. I've tested L1 without any iommu-related
> command line parameters and L1 with 'iommu=1 iommu=pt'. For the L2 guest, I
> passed through a device and checked that it's working correctly; that's all.
> These guests barely have any load though.

I quickly reproduced the issue by starting some "heavier" applications,
a browser or an office suite. Something is apparently always corrupted
then, data or code, thus the crashes.

Jan






[Qemu-devel] Memory mapping on MIPS

2016-02-21 Thread Igor R
I have some issues when accessing guest Linux kernel memory above
0xC0000000 by means of cpu_memory_rw_debug (x86_64 host, MIPS guest),
and I'm trying to debug it.

Here is an excerpt from r4k_map_address(), related to addresses >= 0x80000000.
Actually, it maps 0x80100000 and 0xA0100000 to the same physical
address. What's the idea behind that?
What should happen if I map KSEG2 directly as a continuation of KSEG1,
i.e. substitute TLB lookup with "address - (int32_t)KSEG1_BASE"? Guest
Linux seems to work correctly (but maybe it's just a matter of luck?).

Thanks!

#define KSEG0_BASE 0x80000000UL
#define KSEG1_BASE 0xA0000000UL
#define KSEG2_BASE 0xC0000000UL
#define KSEG3_BASE 0xE0000000UL
//..
if (address < (int32_t)KSEG1_BASE) {
    /* kseg0 */
    if (kernel_mode) {
        *physical = address - (int32_t)KSEG0_BASE;
        *prot = PAGE_READ | PAGE_WRITE;
    } else {
        ret = TLBRET_BADADDR;
    }
} else if (address < (int32_t)KSEG2_BASE) {
    /* kseg1 */
    if (kernel_mode) {
        *physical = address - (int32_t)KSEG1_BASE;
        *prot = PAGE_READ | PAGE_WRITE;
    } else {
        ret = TLBRET_BADADDR;
    }
} else if (address < (int32_t)KSEG3_BASE) {
    /* sseg (kseg2) */
    if (supervisor_mode || kernel_mode) {
        ret = env->tlb->map_address(env, physical, prot, real_address,
                                    rw, access_type);
    } else {
        ret = TLBRET_BADADDR;
    }
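
On the first question: in the MIPS32 memory map, kseg0 and kseg1 are both
unmapped windows onto the same low 512 MB of physical memory (kseg0 cached,
kseg1 uncached), while kseg2 goes through the TLB. The sketch below is a
simplified stand-alone model of just that rule. toy_unmapped_to_phys() and the
TOY_* constants are hypothetical names for illustration, not QEMU's
r4k_map_address(), and privilege checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard MIPS32 segment bases (32-bit virtual addresses). */
#define TOY_KSEG0_BASE 0x80000000u
#define TOY_KSEG1_BASE 0xA0000000u
#define TOY_KSEG2_BASE 0xC0000000u

/* Translate an address in one of the unmapped kernel segments.
 * Returns true and stores the physical address for kseg0/kseg1;
 * returns false for anything that would need a TLB lookup (or is
 * outside the kernel segments altogether). */
static bool toy_unmapped_to_phys(uint32_t vaddr, uint32_t *paddr)
{
    if (vaddr >= TOY_KSEG0_BASE && vaddr < TOY_KSEG1_BASE) {
        *paddr = vaddr - TOY_KSEG0_BASE;   /* kseg0: cached window */
        return true;
    }
    if (vaddr >= TOY_KSEG1_BASE && vaddr < TOY_KSEG2_BASE) {
        *paddr = vaddr - TOY_KSEG1_BASE;   /* kseg1: uncached window */
        return true;
    }
    return false;                          /* useg, kseg2, kseg3 */
}

/* Self-check: both windows alias the same physical address. */
static void toy_kseg_check(void)
{
    uint32_t p0 = 0, p1 = 0;

    assert(toy_unmapped_to_phys(0x80100000u, &p0));
    assert(toy_unmapped_to_phys(0xA0100000u, &p1));
    assert(p0 == 0x00100000u && p1 == p0);
    assert(!toy_unmapped_to_phys(0xC0000000u, &p0));
}
```

Because kseg2 is defined as a TLB-mapped segment, a fixed subtraction there
would only give the right answer for pages whose page-table mapping happens to
match that linear offset.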



Re: [Qemu-devel] [PATCH v5 0/2] add avx2 instruction optimization

2016-02-21 Thread Li, Liang Z
> Not sure; I could take it from the migration tree if no one objects.
> 
>   Amit

Thanks, Amit. If rework is needed, just let me know.

Liang



Re: [Qemu-devel] [RFC PATCH v0 4/8] spapr: Introduce CPU core device

2016-02-21 Thread Andreas Färber
Am 22.02.2016 um 06:01 schrieb Bharata B Rao:
> sPAPR CPU core device is a container of CPU thread devices. CPU hotplug is
> performed in the granularity of CPU core device by setting the "realized"
> property of this device to "true". When hotplugged, CPU core creates CPU
> thread devices.
> 
> TODO: Right now allows for only homogeneous configurations as we depend
> on global smp_threads and machine->cpu_model.
> 
> Signed-off-by: Bharata B Rao 
> ---
>  hw/ppc/Makefile.objs   |  1 +
>  hw/ppc/spapr_cpu_package.c | 50 ++
>  include/hw/ppc/spapr_cpu_package.h | 26 
>  3 files changed, 77 insertions(+)
>  create mode 100644 hw/ppc/spapr_cpu_package.c
>  create mode 100644 include/hw/ppc/spapr_cpu_package.h
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index c1ffc77..3000982 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -4,6 +4,7 @@ obj-y += ppc.o ppc_booke.o
>  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
>  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
>  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
> +obj-$(CONFIG_PSERIES) += spapr_cpu_package.o
>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>  obj-y += spapr_pci_vfio.o
>  endif
> diff --git a/hw/ppc/spapr_cpu_package.c b/hw/ppc/spapr_cpu_package.c
> new file mode 100644
> index 000..3120a16
> --- /dev/null
> +++ b/hw/ppc/spapr_cpu_package.c
> @@ -0,0 +1,50 @@
> +/*
> + * sPAPR CPU package device, acts as container of CPU thread devices.
> + *
> + * Copyright (C) 2016 Bharata B Rao 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "hw/cpu/package.h"
> +#include "hw/ppc/spapr_cpu_package.h"
> +#include "hw/boards.h"
> +#include 
> +#include "qemu/error-report.h"
> +
> +static void spapr_cpu_package_instance_init(Object *obj)
> +{
> +int i;
> +CPUState *cpu;
> +MachineState *machine = MACHINE(qdev_get_machine());
> +sPAPRCPUPackage *package = SPAPR_CPU_PACKAGE(obj);
> +
> +/* Create as many CPU threads as specified in the topology */
> +for (i = 0; i < smp_threads; i++) {
> +cpu = cpu_generic_init(machine->cpu_type, machine->cpu_model);

No, no, no. This is horribly violating QOM design.

Please compare the x86 RFC.

Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)



Re: [Qemu-devel] [RFC PATCH v2 08/10] net/colo-proxy: Handle packet and connection

2016-02-21 Thread Zhang Chen



On 02/20/2016 04:04 AM, Dr. David Alan Gilbert wrote:

* Zhang Chen (zhangchen.f...@cn.fujitsu.com) wrote:

From: zhangchen 

In here we will handle ip packet and connection

Signed-off-by: zhangchen 
Signed-off-by: zhanghailiang 
---
  net/colo-proxy.c | 130 +++
  1 file changed, 130 insertions(+)

diff --git a/net/colo-proxy.c b/net/colo-proxy.c
index 5e5c72e..06bab80 100644
--- a/net/colo-proxy.c
+++ b/net/colo-proxy.c
@@ -167,11 +167,141 @@ static int connection_key_equal(const void *opaque1, const void *opaque2)
  return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
  }
  
+static void connection_destroy(void *opaque)

+{
+Connection *conn = opaque;
+
+g_queue_foreach(&conn->primary_list, packet_destroy, NULL);
+g_queue_free(&conn->primary_list);
+g_queue_foreach(&conn->secondary_list, packet_destroy, NULL);

Be careful about these lists and which threads access them;
I found I could occasionally trigger a seg fault as two
threads tried to manipulate them at once; I just put a 'list_lock'
in the connection, which seems to fix it, but I might have to be
more careful with deadlocks.


Thanks for your work on COLO.
Where can I see your code for colo-proxy?
It may help me make my code better.
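
As an illustration of the per-connection 'list_lock' idea mentioned above,
here is a minimal sketch. It is not Dave's code (which is not shown in this
thread): the ToyConnection and ToyPacket types and the toy_conn_* helpers are
hypothetical, it uses plain POSIX mutexes rather than QEMU's own locking
wrappers, and it uses hand-rolled singly linked lists instead of GQueue.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct ToyPacket {
    struct ToyPacket *next;
    int id;
} ToyPacket;

typedef struct ToyConnection {
    pthread_mutex_t list_lock;   /* guards both lists below */
    ToyPacket *primary_list;
    ToyPacket *secondary_list;
} ToyConnection;

static void toy_conn_init(ToyConnection *conn)
{
    pthread_mutex_init(&conn->list_lock, NULL);
    conn->primary_list = NULL;
    conn->secondary_list = NULL;
}

/* Push onto the head of one of the connection's lists under the lock. */
static void toy_conn_push(ToyConnection *conn, ToyPacket **list,
                          ToyPacket *pkt)
{
    pthread_mutex_lock(&conn->list_lock);
    pkt->next = *list;
    *list = pkt;
    pthread_mutex_unlock(&conn->list_lock);
}

/* Pop from the head under the lock; returns NULL if the list is empty. */
static ToyPacket *toy_conn_pop(ToyConnection *conn, ToyPacket **list)
{
    ToyPacket *pkt;

    pthread_mutex_lock(&conn->list_lock);
    pkt = *list;
    if (pkt) {
        *list = pkt->next;
        pkt->next = NULL;
    }
    pthread_mutex_unlock(&conn->list_lock);
    return pkt;
}

/* Single-threaded self-check of the push/pop order. */
static void toy_conn_check(void)
{
    ToyConnection conn;
    ToyPacket a = { NULL, 1 }, b = { NULL, 2 };

    toy_conn_init(&conn);
    toy_conn_push(&conn, &conn.primary_list, &a);
    toy_conn_push(&conn, &conn.primary_list, &b);
    assert(toy_conn_pop(&conn, &conn.primary_list) == &b);
    assert(toy_conn_pop(&conn, &conn.primary_list) == &a);
    assert(toy_conn_pop(&conn, &conn.primary_list) == NULL);
    pthread_mutex_destroy(&conn.list_lock);
}
```

Keeping every list access inside one short critical section, and never
calling out to other code while the lock is held, is one way to limit the
deadlock risk raised above.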




+g_queue_free(&conn->secondary_list);
+g_slice_free(Connection, conn);
+}
+
+static Connection *connection_new(ConnectionKey *key)
+{
+Connection *conn = g_slice_new(Connection);
+
+conn->ip_proto = key->ip_proto;
+conn->processing = false;
+g_queue_init(&conn->primary_list);
+g_queue_init(&conn->secondary_list);
+
+return conn;
+}
+
+/*
+ * Clear hashtable, stop this hash growing really huge
+ */
+static void clear_connection_hashtable(COLOProxyState *s)
+{
+s->hashtable_size = 0;
+g_hash_table_remove_all(colo_conn_hash);
+trace_colo_proxy("clear_connection_hashtable");
+}
+
  bool colo_proxy_query_checkpoint(void)
  {
  return colo_do_checkpoint;
  }
  
+/* Return 0 on success, or return -1 if the pkt is corrupted */

+static int parse_packet_early(Packet *pkt, ConnectionKey *key)
+{
+int network_length;
+uint8_t *data = pkt->data;
+uint16_t l3_proto;
+uint32_t tmp_ports;
+ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
+
+pkt->network_layer = data + ETH_HLEN;
+l3_proto = eth_get_l3_proto(data, l2hdr_len);
+if (l3_proto != ETH_P_IP) {
+if (l3_proto == ETH_P_ARP) {
+return -1;
+}
+return 0;
+}
+
+network_length = pkt->ip->ip_hl * 4;
+pkt->transport_layer = pkt->network_layer + network_length;
+key->ip_proto = pkt->ip->ip_p;
+key->src = pkt->ip->ip_src;
+key->dst = pkt->ip->ip_dst;
+
+switch (key->ip_proto) {
+case IPPROTO_TCP:
+case IPPROTO_UDP:
+case IPPROTO_DCCP:
+case IPPROTO_ESP:
+case IPPROTO_SCTP:
+case IPPROTO_UDPLITE:
+tmp_ports = *(uint32_t *)(pkt->transport_layer);
+key->src_port = tmp_ports & 0xffff;
+key->dst_port = tmp_ports >> 16;

These fields are not byteswapped; it makes it very confusing
when printing them for debug;  I added htons around every
reading of the ports from the packets.

Dave


I will fix it in colo-compare module.

Thanks
zhangchen
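
The byte-order point can be sketched as follows. This is only an illustration
under stated assumptions: toy_read_ports() is a hypothetical helper, not code
from the patch, and it assumes the transport header starts with a 16-bit
source port followed by a 16-bit destination port in network byte order, as
TCP and UDP headers do. It uses ntohs(), which performs the same swap as htons().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read the source and destination ports from the start of a transport
 * header and convert them from network to host byte order. memcpy()
 * avoids unaligned 16-bit loads from the packet buffer. */
static void toy_read_ports(const uint8_t *transport,
                           uint16_t *src_port, uint16_t *dst_port)
{
    uint16_t raw[2];

    memcpy(raw, transport, sizeof(raw));
    *src_port = ntohs(raw[0]);
    *dst_port = ntohs(raw[1]);
}

/* Self-check with a hand-built header: ports 80 and 8080. */
static void toy_ports_check(void)
{
    const uint8_t hdr[4] = { 0x00, 0x50, 0x1f, 0x90 };
    uint16_t src = 0, dst = 0;

    toy_read_ports(hdr, &src, &dst);
    assert(src == 80);
    assert(dst == 8080);
}
```

With the ports stored in host order, debug output shows the same numbers a
packet capture would.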


+break;
+case IPPROTO_AH:
+tmp_ports = *(uint32_t *)(pkt->transport_layer + 4);
+key->src_port = tmp_ports & 0xffff;
+key->dst_port = tmp_ports >> 16;
+break;
+default:
+break;
+}
+
+return 0;
+}
+
+static Packet *packet_new(COLOProxyState *s, void *data,
+  int size, ConnectionKey *key, NetClientState *sender)
+{
+Packet *pkt = g_slice_new(Packet);
+
+pkt->data = data;
+pkt->size = size;
+pkt->s = s;
+pkt->sender = sender;
+
+if (parse_packet_early(pkt, key)) {
+packet_destroy(pkt, NULL);
+pkt = NULL;
+}
+
+return pkt;
+}
+
+static void packet_destroy(void *opaque, void *user_data)
+{
+Packet *pkt = opaque;
+g_free(pkt->data);
+g_slice_free(Packet, pkt);
+}
+
+/* if not found, create a new connection and add to hash table */
+static Connection *colo_proxy_get_conn(COLOProxyState *s,
+ConnectionKey *key)
+{
+/* FIXME: protect colo_conn_hash */
+Connection *conn = g_hash_table_lookup(colo_conn_hash, key);
+
+if (conn == NULL) {
+ConnectionKey *new_key = g_malloc(sizeof(*key));
+
+conn = connection_new(key);
+memcpy(new_key, key, sizeof(*key));
+
+s->hashtable_size++;
+if (s->hashtable_size > hashtable_max_size) {
+trace_colo_proxy("colo proxy connection hashtable full, clear it");
+clear_connection_hashtable(s);
+} else {
+g_hash_table_insert(colo_conn_hash, new_key, conn);
+}
+}
+
+ return conn;
+}
+
  static ssize_t 

Re: [Qemu-devel] [PATCH v5 0/2] add avx2 instruction optimization

2016-02-21 Thread Amit Shah
On (Mon) 22 Feb 2016 [06:23:54], Li, Liang Z wrote:
> > On 27/01/2016 08:33, Liang Li wrote:
> > > buffer_find_nonzero_offset() is a hot function during live migration.
> > > Now it use SSE2 instructions for optimization. For platform supports
> > > AVX2 instructions, use the AVX2 instructions for optimization can help
> > > to improve the performance of zero page checking about 30% comparing
> > > to SSE2.
> > > Live migration can be faster with this optimization, the test result
> > > shows that for an 8GB RAM idle guest, this patch can help to shorten
> > > the total live migration time about 6%.
> > >
> > > This patch use the ifunc mechanism to select the proper function when
> > > running, for platform supports AVX2, execute the AVX2 instructions,
> > > else, execute the original instructions.
> > >
> > > With this patch, the QEMU binary can run on both platforms support
> > > AVX2 or not.
> > >
> > > Compiler which doesn't support the AVX2 and ifunc attribute can also
> > > build the source code successfully.
> > >
> > > v5 -> v4 changes:
> > >   * Enhance the ifunc attribute detection (Paolo's suggestion)
> > >
> > > v3 -> v4 changes:
> > >   * Use the GCC #pragma to make things simple (Paolo's suggestion)
> > >   * Put avx2 related code in cutils.c (Richard's suggestion)
> > >   * Change the configure, detect ifunc and avx2 attributes together
> > >
> > > v2 -> v3 changes:
> > >   * Detect the ifunc attribute support (Paolo's suggestion)
> > >   * Use the ifunc attribute instead of the inline asm (Richard's suggestion)
> > >   * Change the configure (Juan's suggestion)
> > >
> > > Liang Li (2):
> > >   configure: detect ifunc and avx2 attribute
> > >   cutils: add avx2 instruction optimization
> > >
> > >  configure |  21 +
> > >  include/qemu-common.h |   8 +---
> > >  util/cutils.c | 118 --
> > >  3 files changed, 136 insertions(+), 11 deletions(-)
> > 
> > Reviewed-by: Paolo Bonzini 
> 
> This patch set has been pending for a long time; who can help get it
> merged?

Not sure; I could take it from the migration tree if no one objects.


Amit
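
For readers skimming the thread, the runtime selection described in the
quoted cover letter can be sketched roughly as below. This is not the patch
itself: it uses a plain function pointer instead of the GCC ifunc attribute,
the "wide" variant is a portable 64-bit stand-in rather than real AVX2 code,
and all toy_* names are hypothetical. __builtin_cpu_supports() is a GCC/Clang
builtin on x86; on other compilers or architectures the sketch falls back to
the byte-wise version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef bool (*toy_is_zero_fn)(const uint8_t *buf, size_t len);

/* Baseline: check one byte at a time. */
static bool toy_is_zero_bytewise(const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Stand-in for the optimized variant: check 8 bytes at a time,
 * then finish the tail byte by byte. */
static bool toy_is_zero_wide(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t w;

        memcpy(&w, buf + i, sizeof(w));
        if (w != 0) {
            return false;
        }
    }
    return toy_is_zero_bytewise(buf + i, len - i);
}

static bool toy_cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

/* Pick an implementation once, based on what the CPU reports. */
static toy_is_zero_fn toy_select_is_zero(void)
{
    return toy_cpu_has_avx2() ? toy_is_zero_wide : toy_is_zero_bytewise;
}

/* Self-check: whichever variant is selected must agree with the baseline. */
static void toy_dispatch_check(void)
{
    uint8_t buf[37] = { 0 };
    toy_is_zero_fn fn = toy_select_is_zero();

    assert(fn(buf, sizeof(buf)));
    buf[36] = 1;
    assert(!fn(buf, sizeof(buf)));
    assert(toy_is_zero_wide(buf, 36) == toy_is_zero_bytewise(buf, 36));
}
```

The point of either mechanism is the same: one binary runs on both kinds of
host, and the feature check is paid once rather than on every call.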



Re: [Qemu-devel] [PATCH qemu] memory: Fix IOMMU replay base address

2016-02-21 Thread David Gibson
On Mon, Feb 22, 2016 at 05:09:39PM +1100, Alexey Kardashevskiy wrote:
> Since a788f227 "memory: Allow replay of IOMMU mapping notifications"
> when new VFIO listener is added, all existing IOMMU mappings are replayed.
> However there is a problem that the base address of an IOMMU memory region
> (IOMMU MR) is ignored which is not a problem for the existing user (which is
> pseries) with its default 32bit DMA window starting at 0 but it is if there is
> another DMA window.
> 
> This adjusts the replaying address by mr->addr.

Uh.. this doesn't look right to me.  AFAICT from the existing
implementations the 'addr' parameter to the translate function is an
offset within the memory region, which would make the original version
correct.
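
To illustrate the concern, here is a toy model, assuming the translate
callback really does take an offset within the region (which is exactly the
point under discussion). ToyRegion, toy_translate() and toy_replay() are
hypothetical and do not correspond to the QEMU memory API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ToyRegion {
    uint64_t base;   /* where the region sits in its parent address space */
    uint64_t size;
} ToyRegion;

/* Toy translate callback: expects an offset within the region and
 * reports whether that offset has a mapping. */
static bool toy_translate(const ToyRegion *mr, uint64_t offset)
{
    return offset < mr->size;
}

/* Walk the region the way the replay loop does and count how many
 * pages translate. add_base models the proposed "mr->addr + addr". */
static unsigned toy_replay(const ToyRegion *mr, uint64_t granularity,
                           bool add_base)
{
    unsigned mapped = 0;
    uint64_t off;

    for (off = 0; off < mr->size; off += granularity) {
        uint64_t arg = add_base ? mr->base + off : off;

        if (toy_translate(mr, arg)) {
            mapped++;
        }
    }
    return mapped;
}

/* Self-check with a 16 KiB window placed at a non-zero base. */
static void toy_replay_check(void)
{
    ToyRegion mr = { 0x80000000ull, 0x4000 };

    assert(toy_replay(&mr, 0x1000, false) == 4);
    assert(toy_replay(&mr, 0x1000, true) == 0);
}
```

Under that assumption, adding the base makes no difference for a window at 0
but pushes every lookup out of range for a window placed anywhere else.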

> Signed-off-by: Alexey Kardashevskiy 
> ---
>  memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/memory.c b/memory.c
> index 09041ed..377269b 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1436,7 +1436,7 @@ void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
>  IOMMUTLBEntry iotlb;
>  
>  for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> -iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +iotlb = mr->iommu_ops->translate(mr, mr->addr + addr, is_write);
>  if (iotlb.perm != IOMMU_NONE) {
>  n->notify(n, &iotlb);
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v5 0/2] add avx2 instruction optimization

2016-02-21 Thread Li, Liang Z
> On 27/01/2016 08:33, Liang Li wrote:
> > buffer_find_nonzero_offset() is a hot function during live migration.
> > Now it use SSE2 instructions for optimization. For platform supports
> > AVX2 instructions, use the AVX2 instructions for optimization can help
> > to improve the performance of zero page checking about 30% comparing
> > to SSE2.
> > Live migration can be faster with this optimization, the test result
> > shows that for an 8GB RAM idle guest, this patch can help to shorten
> > the total live migration time about 6%.
> >
> > This patch use the ifunc mechanism to select the proper function when
> > running, for platform supports AVX2, execute the AVX2 instructions,
> > else, execute the original instructions.
> >
> > With this patch, the QEMU binary can run on both platforms support
> > AVX2 or not.
> >
> > Compiler which doesn't support the AVX2 and ifunc attribute can also
> > build the source code successfully.
> >
> > v5 -> v4 changes:
> >   * Enhance the ifunc attribute detection (Paolo's suggestion)
> >
> > v3 -> v4 changes:
> >   * Use the GCC #pragma to make things simple (Paolo's suggestion)
> >   * Put avx2 related code in cutils.c (Richard's suggestion)
> >   * Change the configure, detect ifunc and avx2 attributes together
> >
> > v2 -> v3 changes:
> >   * Detect the ifunc attribute support (Paolo's suggestion)
> >   * Use the ifunc attribute instead of the inline asm (Richard's suggestion)
> >   * Change the configure (Juan's suggestion)
> >
> > Liang Li (2):
> >   configure: detect ifunc and avx2 attribute
> >   cutils: add avx2 instruction optimization
> >
> >  configure |  21 +
> >  include/qemu-common.h |   8 +---
> >  util/cutils.c | 118 --
> >  3 files changed, 136 insertions(+), 11 deletions(-)
> 
> Reviewed-by: Paolo Bonzini 

This patch set has been pending for a long time; who can help get it merged?

Liang



[Qemu-devel] [PATCH qemu] memory: Fix IOMMU replay base address

2016-02-21 Thread Alexey Kardashevskiy
Since a788f227 "memory: Allow replay of IOMMU mapping notifications"
when new VFIO listener is added, all existing IOMMU mappings are replayed.
However there is a problem that the base address of an IOMMU memory region
(IOMMU MR) is ignored which is not a problem for the existing user (which is
pseries) with its default 32bit DMA window starting at 0 but it is if there is
another DMA window.

This adjusts the replaying address by mr->addr.

Signed-off-by: Alexey Kardashevskiy 
---
 memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/memory.c b/memory.c
index 09041ed..377269b 100644
--- a/memory.c
+++ b/memory.c
@@ -1436,7 +1436,7 @@ void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
 IOMMUTLBEntry iotlb;
 
 for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
-iotlb = mr->iommu_ops->translate(mr, addr, is_write);
+iotlb = mr->iommu_ops->translate(mr, mr->addr + addr, is_write);
 if (iotlb.perm != IOMMU_NONE) {
 n->notify(n, &iotlb);
 }
-- 
2.5.0.rc3




Re: [Qemu-devel] [V6 0/4] AMD IOMMU

2016-02-21 Thread David Kiarie
On Sun, Feb 21, 2016 at 11:20 PM, Jan Kiszka  wrote:
> On 2016-02-21 19:10, David Kiarie wrote:
>> Hello there,
>>
>> Repost, AMD IOMMU patches version 6.
>>
>> Changes since version 5
>>  -Fixed macro formatting issues
>>  -changed occurrences of IO MMU to IOMMU for consistency
>>  -Fixed capability registers duplication
>>  -Rebased to current master
>
> I suspect this still has some subtle bugs: I'm running the patches over
> master with standard Linux distro as guest, full desktop, and I'm
> getting sporadic segfaults of arbitrary programs. These disappear once I
> disable the IOMMU or switch to the Intel version.

Is this L1 guest or L2 guest ? - haven't got any such so far.

>
> How did you test so far?

I mainly test by logging. I've tested L1 without any iommu-related
command line parameters and L1 with 'iommu=1 iommu=pt'. For the L2 guest, I
passed through a device and checked that it's working correctly; that's all.
These guests barely have any load though.

>
> Jan
>
>



Re: [Qemu-devel] [RFC PATCH v2 06/10] net/colo-proxy: add socket used by forward func

2016-02-21 Thread Zhang Chen



On 02/20/2016 04:01 AM, Dr. David Alan Gilbert wrote:

* Zhang Chen (zhangchen.f...@cn.fujitsu.com) wrote:

From: zhangchen 

Colo need to forward packets
we start socket server in secondary and primary
connect to secondary in startup
the packet recv by primary forward to secondary
the packet send by secondary forward to primary

Signed-off-by: zhangchen 
Signed-off-by: zhanghailiang 

I found one problem with the socket setup is that the
packets from the primary and secondary aren't tied to the
checkpoint they are part of; so for example a packet from the secondary
may reach the primary at the start of the next checkpoint, causing a
miscomparison.
I added a counter to discard old packets.

Dave


I will fix it in colo-compare module.

Thanks
zhangchen
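
One way to picture the "counter to discard old packets" idea is to tag each
forwarded packet with the checkpoint it belongs to and drop anything that
carries a different tag. The sketch below is only an assumption about how
such a counter could work: ToyTaggedPacket and toy_drop_stale() are
hypothetical and are not taken from either implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyTaggedPacket {
    uint32_t checkpoint_id;   /* checkpoint during which it was sent */
    int payload;              /* stand-in for the packet data */
} ToyTaggedPacket;

/* Compact the array in place, keeping only packets that belong to the
 * current checkpoint. Returns the number of packets kept. */
static size_t toy_drop_stale(ToyTaggedPacket *pkts, size_t n,
                             uint32_t current_checkpoint)
{
    size_t kept = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (pkts[i].checkpoint_id == current_checkpoint) {
            pkts[kept++] = pkts[i];
        }
    }
    return kept;
}

/* Self-check: two packets from checkpoint 6 arrive late and are dropped. */
static void toy_stale_check(void)
{
    ToyTaggedPacket pkts[4] = {
        { 6, 10 }, { 7, 11 }, { 6, 12 }, { 7, 13 },
    };
    size_t kept = toy_drop_stale(pkts, 4, 7);

    assert(kept == 2);
    assert(pkts[0].payload == 11);
    assert(pkts[1].payload == 13);
}
```

Comparing only packets with matching tags keeps a late arrival from the
previous checkpoint from being reported as a mismatch.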




---
  net/colo-proxy.c | 114 +++
  1 file changed, 114 insertions(+)

diff --git a/net/colo-proxy.c b/net/colo-proxy.c
index ba2bbe7..2347bbf 100644
--- a/net/colo-proxy.c
+++ b/net/colo-proxy.c
@@ -172,6 +172,69 @@ bool colo_proxy_query_checkpoint(void)
  return colo_do_checkpoint;
  }
  
+/*

+ * send a packet to peer
+ * >=0: success
+ * <0: fail
+ */
+static ssize_t colo_proxy_sock_send(NetFilterState *nf,
+ const struct iovec *iov,
+ int iovcnt)
+{
+COLOProxyState *s = FILTER_COLO_PROXY(nf);
+ssize_t ret = 0;
+ssize_t size = 0;
+struct iovec sizeiov = {
+.iov_base = &size,
+.iov_len = sizeof(size)
+};
+size = iov_size(iov, iovcnt);
+if (!size) {
+return 0;
+}
+
+ret = iov_send(s->sockfd, &sizeiov, 1, 0, sizeof(size));
+if (ret < 0) {
+return ret;
+}
+ret = iov_send(s->sockfd, iov, iovcnt, 0, size);
+return ret;
+}
+
+/*
+ * receive a packet from peer
+ * in primary: enqueue packet to secondary_list
+ * in secondary: pass packet to next
+ */
+static void colo_proxy_sock_receive(void *opaque)
+{
+NetFilterState *nf = opaque;
+COLOProxyState *s = FILTER_COLO_PROXY(nf);
+ssize_t len = 0;
+struct iovec sizeiov = {
+.iov_base = &len,
+.iov_len = sizeof(len)
+};
+
+iov_recv(s->sockfd, &sizeiov, 1, 0, sizeof(len));
+if (len > 0 && len < NET_BUFSIZE) {
+char *buf = g_malloc0(len);
+struct iovec iov = {
+.iov_base = buf,
+.iov_len = len
+};
+
+iov_recv(s->sockfd, &iov, 1, 0, len);
+if (s->colo_mode == COLO_MODE_PRIMARY) {
+colo_proxy_enqueue_secondary_packet(nf, buf, len);
+/* buf will be released when the packet is destroyed */
+} else {
+qemu_net_queue_send(s->incoming_queue, nf->netdev,
+0, (const uint8_t *)buf, len, NULL);
+}
+}
+}
+
  static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
   NetClientState *sender,
   unsigned flags,
@@ -208,6 +271,57 @@ static void colo_proxy_cleanup(NetFilterState *nf)
  qemu_event_destroy(&s->need_compare_ev);
  }
  
+/* wait for peer connecting

+ * NOTE: this function will block the caller
+ * 0 on success, otherwise returns -1
+ */
+static int colo_wait_incoming(COLOProxyState *s)
+{
+struct sockaddr_in addr;
+socklen_t addrlen = sizeof(addr);
+int accept_sock, err;
+int fd = inet_listen(s->addr, NULL, 256, SOCK_STREAM, 0, NULL);
+
+if (fd < 0) {
+error_report("colo proxy listen failed");
+return -1;
+}
+
+do {
+accept_sock = qemu_accept(fd, (struct sockaddr *)&addr, &addrlen);
+err = socket_error();
+} while (accept_sock < 0 && err == EINTR);
+closesocket(fd);
+
+if (accept_sock < 0) {
+error_report("colo proxy accept failed(%s)", strerror(err));
+return -1;
+}
+s->sockfd = accept_sock;
+
+qemu_set_fd_handler(s->sockfd, colo_proxy_sock_receive, NULL, (void *)s);
+
+return 0;
+}
+
+/* try to connect listening server
+ * 0 on success, otherwise something wrong
+ */
+static ssize_t colo_proxy_connect(COLOProxyState *s)
+{
+int sock;
+sock = inet_connect(s->addr, NULL);
+
+if (sock < 0) {
+error_report("colo proxy inet_connect failed");
+return -1;
+}
+s->sockfd = sock;
+qemu_set_fd_handler(s->sockfd, colo_proxy_sock_receive, NULL, (void *)s);
+
+return 0;
+}
+
  static void colo_proxy_notify_checkpoint(void)
  {
  trace_colo_proxy("colo_proxy_notify_checkpoint");
--
1.9.1





--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


.



--
Thanks
zhangchen






Re: [Qemu-devel] [RFC PATCH v0 0/8] cpu-package hotplug

2016-02-21 Thread Bharata B Rao
On Mon, Feb 22, 2016 at 10:31:17AM +0530, Bharata B Rao wrote:
> Hi,
> 
> This is an attempt to implement David Gibson's RFC that was posted at
> https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg0.html
> I am not sure if I have followed all the aspects of the RFC fully, but we
> can make changes going forward.
> 
> An example cpu-package implementation is done for sPAPR in this patchset.
> Hot removal is not yet done in this patchset.
> 
> For the command line,
> 
> -smp 8,sockets=1,cores=1,threads=8,maxcpus=16 -numa node,nodeid=0,cpus=0-7 
> -numa node,nodeid=1,cpus=8-15
> 
> the HMP query looks like this:
> 
> (qemu) info cpu-packages 
> CPU Package: ""
>   type: "spapr-cpu-package"
>   qom_path: "/machine/cpu-package[0]"
>   realized: true
>   nr_cpus: 8
>   CPU: 0
> Type: "host-powerpc64-cpu"
> Arch ID: 0
> Thread: 0
> Core: 0
> Socket: 0
> Node: 0

  qom_path: "/machine/cpu-package[0]/thread[0]"

Missed qom_path.

Regards,
Bharata.




[Qemu-devel] [RFC PATCH v0 3/8] cpu: CPU package abstract device

2016-02-21 Thread Bharata B Rao
A minimal abstract device that target machines can create sub-types of
to define their own cpu-package devices. Provides a realize routine that
walks the child objects and realizes them iteratively. Hence cpu-package
interface expects the target implementations to have a hierarchical
setup for their CPU objects.

Signed-off-by: Bharata B Rao 
---
 hw/cpu/Makefile.objs |  1 +
 hw/cpu/package.c | 66 
 include/hw/cpu/package.h | 27 
 3 files changed, 94 insertions(+)
 create mode 100644 hw/cpu/package.c
 create mode 100644 include/hw/cpu/package.h

diff --git a/hw/cpu/Makefile.objs b/hw/cpu/Makefile.objs
index 0954a18..f540826 100644
--- a/hw/cpu/Makefile.objs
+++ b/hw/cpu/Makefile.objs
@@ -2,4 +2,5 @@ obj-$(CONFIG_ARM11MPCORE) += arm11mpcore.o
 obj-$(CONFIG_REALVIEW) += realview_mpcore.o
 obj-$(CONFIG_A9MPCORE) += a9mpcore.o
 obj-$(CONFIG_A15MPCORE) += a15mpcore.o
+obj-y += package.o
 
diff --git a/hw/cpu/package.c b/hw/cpu/package.c
new file mode 100644
index 000..259dbfa
--- /dev/null
+++ b/hw/cpu/package.c
@@ -0,0 +1,66 @@
+/*
+ * CPU package device
+ *
+ * Copyright (C) 2016 Bharata B Rao 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "hw/cpu/package.h"
+#include "qom/object_interfaces.h"
+
+Object *cpu_package_create_object(const char *typename, uint32_t index,
+  Error **errp)
+{
+char *id;
+Object *obj;
+
+id = g_strdup_printf("" TYPE_CPU_PACKAGE "[%"PRIu32"]", index);
+obj = object_new(typename);
+object_property_add_child(qdev_get_machine(), id, obj, errp);
+g_free(id);
+
+if (*errp) {
+return NULL;
+} else {
+return obj;
+}
+}
+
+static int cpu_package_realize_child(Object *child, void *opaque)
+{
+Error **errp = opaque;
+
+object_property_set_bool(child, true, "realized", errp);
+if (*errp) {
+return 1;
+}
+return 0;
+}
+
+static void cpu_package_realize(DeviceState *dev, Error **errp)
+{
+object_child_foreach(OBJECT(dev), cpu_package_realize_child, errp);
+}
+
+static void cpu_package_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+dc->realize = cpu_package_realize;
+}
+
+static const TypeInfo cpu_package_info = {
+.name = TYPE_CPU_PACKAGE,
+.parent = TYPE_DEVICE,
+.abstract = true,
+.instance_size = sizeof(CPUPackage),
+.class_init = cpu_package_class_init,
+};
+
+static void cpu_package_register_types(void)
+{
+type_register_static(&cpu_package_info);
+}
+
+type_init(cpu_package_register_types)
diff --git a/include/hw/cpu/package.h b/include/hw/cpu/package.h
new file mode 100644
index 000..0579a42
--- /dev/null
+++ b/include/hw/cpu/package.h
@@ -0,0 +1,27 @@
+/*
+ * CPU package device
+ *
+ * Copyright (C) 2016 Bharata B Rao 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef HW_CPU_PACKAGE_H
+#define HW_CPU_PACKAGE_H
+
+#include "hw/qdev.h"
+
+#define TYPE_CPU_PACKAGE "cpu-package"
+#define CPU_PACKAGE(obj) \
+OBJECT_CHECK(CPUPackage, (obj), TYPE_CPU_PACKAGE)
+
+typedef struct CPUPackage {
+/* private */
+DeviceState parent_obj;
+
+/* public */
+} CPUPackage;
+
+Object *cpu_package_create_object(const char *typename, uint32_t index,
+  Error **errp);
+#endif
-- 
2.1.0




[Qemu-devel] [RFC PATCH v0 8/8] hmp: Implement 'info cpu-slots'

2016-02-21 Thread Bharata B Rao
Signed-off-by: Bharata B Rao 
---
 hmp-commands-info.hx | 14 ++
 hmp.c| 50 ++
 hmp.h|  1 +
 3 files changed, 65 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 9b71351..6ca6da3 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -786,6 +786,20 @@ STEXI
 Display the value of a storage key (s390 only)
 ETEXI
 
+{
+.name   = "cpu-packages",
+.args_type  = "",
+.params = "",
+.help   = "show CPU packages",
+.mhandler.cmd = hmp_info_cpu_packages,
+},
+
+STEXI
+@item info cpu-packages
+@findex cpu-packages
+Show CPU packages
+ETEXI
+
 STEXI
 @end table
 ETEXI
diff --git a/hmp.c b/hmp.c
index cb03a15..3450600 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2374,3 +2374,53 @@ void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict)
 
 qapi_free_RockerOfDpaGroupList(list);
 }
+
+void hmp_info_cpu_packages(Monitor *mon, const QDict *qdict)
+{
+Error *err = NULL;
+CPUPackageInfoList *cpu_packageinfo_list = qmp_query_cpu_packages(&err);
+CPUPackageInfoList *s = cpu_packageinfo_list;
+CPUInfoList *cpu;
+int i;
+
+while (s) {
+monitor_printf(mon, "CPU Package: \"%s\"\n",
+   s->value->has_id ? s->value->id : "");
+monitor_printf(mon, "  type: \"%s\"\n", s->value->type);
+monitor_printf(mon, "  qom_path: \"%s\"\n", s->value->qom_path);
+monitor_printf(mon, "  realized: %s\n",
+   s->value->realized ? "true" : "false");
+monitor_printf(mon, "  nr_cpus: %" PRId64 "\n", s->value->nr_cpus);
+if (s->value->nr_cpus) {
+for (i = 0, cpu = s->value->cpus; cpu; cpu = cpu->next, i++) {
+monitor_printf(mon, "  CPU: %" PRId32 "\n", i);
+monitor_printf(mon, "Type: \"%s\"\n", cpu->value->type);
+monitor_printf(mon, "Arch ID: %" PRId64 "\n",
+   cpu->value->arch_id);
+if (cpu->value->has_thread) {
+monitor_printf(mon, "Thread: %" PRId64 "\n",
+   cpu->value->thread);
+}
+if (cpu->value->has_core) {
+monitor_printf(mon, "Core: %" PRId64 "\n",
+   cpu->value->core);
+}
+if (cpu->value->has_socket) {
+monitor_printf(mon, "Socket: %" PRId64 "\n",
+   cpu->value->socket);
+}
+if (cpu->value->has_node) {
+monitor_printf(mon, "Node: %" PRId64 "\n",
+   cpu->value->node);
+}
+if (cpu->value->has_qom_path) {
+monitor_printf(mon, "qom_path: \"%s\"\n",
+   cpu->value->qom_path);
+}
+}
+}
+s = s->next;
+}
+
+qapi_free_CPUPackageInfoList(cpu_packageinfo_list);
+}
diff --git a/hmp.h b/hmp.h
index a8c5b5a..f78f55f 100644
--- a/hmp.h
+++ b/hmp.h
@@ -131,5 +131,6 @@ void hmp_rocker(Monitor *mon, const QDict *qdict);
 void hmp_rocker_ports(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_flows(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict);
+void hmp_info_cpu_packages(Monitor *mon, const QDict *qdict);
 
 #endif
-- 
2.1.0




[Qemu-devel] [RFC PATCH v0 7/8] qmp: Implement query cpu-packages

2016-02-21 Thread Bharata B Rao
Signed-off-by: Bharata B Rao 
---
 hw/cpu/package.c| 19 +
 hw/ppc/spapr.c  | 79 +
 include/hw/boards.h |  1 +
 qapi-schema.json| 48 
 4 files changed, 147 insertions(+)

diff --git a/hw/cpu/package.c b/hw/cpu/package.c
index 259dbfa..4ff20fa 100644
--- a/hw/cpu/package.c
+++ b/hw/cpu/package.c
@@ -7,7 +7,26 @@
  * See the COPYING file in the top-level directory.
  */
 #include "hw/cpu/package.h"
+#include "hw/boards.h"
 #include "qom/object_interfaces.h"
+#include "qmp-commands.h"
+#include "qapi/qmp/qerror.h"
+
+/*
+ * QMP: query cpu-packages
+ */
+CPUPackageInfoList *qmp_query_cpu_packages(Error **errp)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+
+if (!mc->cpu_packages) {
+error_setg(errp, QERR_UNSUPPORTED);
+return NULL;
+}
+
+return mc->cpu_packages(ms);
+}
 
 Object *cpu_package_create_object(const char *typename, uint32_t index,
   Error **errp)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0bbbaf8..147b9d1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2441,6 +2441,84 @@ static unsigned spapr_cpu_index_to_socket_id(unsigned cpu_index)
 return cpu_index / smp_threads / smp_cores;
 }
 
+static int spapr_cpuinfo_list(Object *obj, void *opaque)
+{
+MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
+CPUInfoList ***prev = opaque;
+
+if (object_dynamic_cast(obj, TYPE_CPU)) {
+CPUInfoList *elem = g_new0(CPUInfoList, 1);
+CPUInfo *s = g_new0(CPUInfo, 1);
+CPUState *cpu = CPU(obj);
+PowerPCCPU *pcpu = POWERPC_CPU(cpu);
+
+s->arch_id = ppc_get_vcpu_dt_id(pcpu);
+s->type = g_strdup(object_get_typename(obj));
+s->thread = cpu->cpu_index;
+s->has_thread = true;
+s->core = cpu->cpu_index / smp_threads;
+s->has_core = true;
+if (mc->cpu_index_to_socket_id) {
+s->socket = mc->cpu_index_to_socket_id(cpu->cpu_index);
+} else {
+s->socket = cpu->cpu_index / smp_threads / smp_cores;
+}
+s->has_socket = true;
+s->node = cpu->numa_node;
+s->has_node = true;
+s->qom_path = object_get_canonical_path(obj);
+
+elem->value = s;
+elem->next = NULL;
+**prev = elem;
+*prev = &elem->next;
+}
+object_child_foreach(obj, spapr_cpuinfo_list, opaque);
+return 0;
+}
+
+static int spapr_cpu_packageinfo_list(Object *obj, void *opaque)
+{
+CPUPackageInfoList ***prev = opaque;
+
+if (object_dynamic_cast(obj, TYPE_CPU_PACKAGE)) {
+DeviceState *dev = DEVICE(obj);
+CPUPackageInfoList *elem = g_new0(CPUPackageInfoList, 1);
+CPUPackageInfo *s = g_new0(CPUPackageInfo, 1);
+CPUInfoList *cpu_head = NULL;
+CPUInfoList **cpu_prev = &cpu_head;
+
+if (dev->id) {
+s->has_id = true;
+s->id = g_strdup(dev->id);
+}
+s->realized = object_property_get_bool(obj, "realized", NULL);
+s->nr_cpus = smp_threads;
+s->qom_path = object_get_canonical_path(obj);
+s->type = g_strdup(TYPE_SPAPR_CPU_PACKAGE);
+if (s->realized) {
+spapr_cpuinfo_list(obj, &cpu_prev);
+}
+s->cpus = cpu_head;
+elem->value = s;
+elem->next = NULL;
+**prev = elem;
+*prev = &elem->next;
+}
+
+object_child_foreach(obj, spapr_cpu_packageinfo_list, opaque);
+return 0;
+}
+
+static CPUPackageInfoList *spapr_cpu_packages(MachineState *machine)
+{
+CPUPackageInfoList *head = NULL;
+CPUPackageInfoList **prev = &head;
+
+spapr_cpu_packageinfo_list(qdev_get_machine(), &prev);
+return head;
+}
+
 static void spapr_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -2467,6 +2545,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
 mc->has_dynamic_sysbus = true;
 mc->pci_allow_0_address = true;
 mc->get_hotplug_handler = spapr_get_hotpug_handler;
+mc->cpu_packages = spapr_cpu_packages;
 hc->plug = spapr_machine_device_plug;
 hc->unplug = spapr_machine_device_unplug;
 mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index cf95d10..66d8780 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -99,6 +99,7 @@ struct MachineClass {
 HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
DeviceState *dev);
 unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
+CPUPackageInfoList *(*cpu_packages)(MachineState *machine);
 };
 
 /**
diff --git a/qapi-schema.json b/qapi-schema.json
index 8d04897..5a0dd80 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4083,3 +4083,51 @@
 ##
 { 'enum': 'ReplayMode',
   'data': 

[Qemu-devel] [RFC PATCH v0 4/8] spapr: Introduce CPU core device

2016-02-21 Thread Bharata B Rao
sPAPR CPU core device is a container of CPU thread devices. CPU hotplug is
performed in the granularity of CPU core device by setting the "realized"
property of this device to "true". When hotplugged, CPU core creates CPU
thread devices.

TODO: Right now allows for only homogeneous configurations as we depend
on global smp_threads and machine->cpu_model.

Signed-off-by: Bharata B Rao 
---
 hw/ppc/Makefile.objs   |  1 +
 hw/ppc/spapr_cpu_package.c | 50 ++
 include/hw/ppc/spapr_cpu_package.h | 26 
 3 files changed, 77 insertions(+)
 create mode 100644 hw/ppc/spapr_cpu_package.c
 create mode 100644 include/hw/ppc/spapr_cpu_package.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c1ffc77..3000982 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -4,6 +4,7 @@ obj-y += ppc.o ppc_booke.o
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
+obj-$(CONFIG_PSERIES) += spapr_cpu_package.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/spapr_cpu_package.c b/hw/ppc/spapr_cpu_package.c
new file mode 100644
index 000..3120a16
--- /dev/null
+++ b/hw/ppc/spapr_cpu_package.c
@@ -0,0 +1,50 @@
+/*
+ * sPAPR CPU package device, acts as container of CPU thread devices.
+ *
+ * Copyright (C) 2016 Bharata B Rao 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "hw/cpu/package.h"
+#include "hw/ppc/spapr_cpu_package.h"
+#include "hw/boards.h"
+#include 
+#include "qemu/error-report.h"
+
+static void spapr_cpu_package_instance_init(Object *obj)
+{
+int i;
+CPUState *cpu;
+MachineState *machine = MACHINE(qdev_get_machine());
+sPAPRCPUPackage *package = SPAPR_CPU_PACKAGE(obj);
+
+/* Create as many CPU threads as specified in the topology */
+for (i = 0; i < smp_threads; i++) {
+cpu = cpu_generic_init(machine->cpu_type, machine->cpu_model);
+if (!cpu) {
+error_setg(&error_fatal, "Unable to find CPU definition: %s\n",
+   machine->cpu_model);
+}
+object_property_add_child(obj, "thread[*]", OBJECT(cpu), &error_fatal);
+object_unref(OBJECT(cpu));
+DEVICE(OBJECT(cpu))->hotplugged = true;
+if (!i) {
+package->thread0 = POWERPC_CPU(cpu);
+}
+}
+}
+
+static const TypeInfo spapr_cpu_package_type_info = {
+.name = TYPE_SPAPR_CPU_PACKAGE,
+.parent = TYPE_CPU_PACKAGE,
+.instance_init = spapr_cpu_package_instance_init,
+.instance_size = sizeof(sPAPRCPUPackage),
+};
+
+static void spapr_cpu_package_register_types(void)
+{
+type_register_static(&spapr_cpu_package_type_info);
+}
+
+type_init(spapr_cpu_package_register_types)
diff --git a/include/hw/ppc/spapr_cpu_package.h b/include/hw/ppc/spapr_cpu_package.h
new file mode 100644
index 000..547dbc1
--- /dev/null
+++ b/include/hw/ppc/spapr_cpu_package.h
@@ -0,0 +1,26 @@
+/*
+ * sPAPR CPU package device.
+ *
+ * Copyright (C) 2016 Bharata B Rao 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef HW_SPAPR_CPU_PACKAGE_H
+#define HW_SPAPR_CPU_PACKAGE_H
+
+#include "hw/qdev.h"
+
+#define TYPE_SPAPR_CPU_PACKAGE "spapr-cpu-package"
+#define SPAPR_CPU_PACKAGE(obj) \
+OBJECT_CHECK(sPAPRCPUPackage, (obj), TYPE_SPAPR_CPU_PACKAGE)
+
+typedef struct sPAPRCPUPackage {
+/*< private >*/
+DeviceState parent_obj;
+
+/*< public >*/
+PowerPCCPU *thread0;
+} sPAPRCPUPackage;
+
+#endif
-- 
2.1.0




[Qemu-devel] [RFC PATCH v0 6/8] spapr: CPU hotplug support

2016-02-21 Thread Bharata B Rao
Set up device tree entries for the hotplugged CPU core and use the
existing EPOW event infrastructure to send CPU hotplug notification to
the guest.

Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c | 136 -
 hw/ppc/spapr_events.c  |   3 ++
 hw/ppc/spapr_rtas.c|  24 +
 include/hw/ppc/spapr.h |   1 +
 4 files changed, 163 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f3913b4..0bbbaf8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -604,6 +604,18 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
 size_t page_sizes_prop_size;
 uint32_t vcpus_per_socket = smp_threads * smp_cores;
 uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
+sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
+sPAPRDRConnector *drc;
+sPAPRDRConnectorClass *drck;
+int drc_index;
+
+if (smc->dr_cpu_enabled) {
+drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, index);
+g_assert(drc);
+drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+drc_index = drck->get_index(drc);
+_FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index)));
+}
 
 /* Note: we keep CI large pages off for now because a 64K capable guest
  * provisioned with large pages might otherwise try to map a qemu
@@ -988,6 +1000,16 @@ static void spapr_finalize_fdt(sPAPRMachineState *spapr,
 _FDT(spapr_drc_populate_dt(fdt, 0, NULL, SPAPR_DR_CONNECTOR_TYPE_LMB));
 }
 
+if (smc->dr_cpu_enabled) {
+int offset = fdt_path_offset(fdt, "/cpus");
+ret = spapr_drc_populate_dt(fdt, offset, NULL,
+SPAPR_DR_CONNECTOR_TYPE_CPU);
+if (ret < 0) {
+fprintf(stderr, "Couldn't set up CPU DR device tree properties\n");
+exit(1);
+}
+}
+
 _FDT((fdt_pack(fdt)));
 
 if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
@@ -1756,6 +1778,7 @@ static void ppc_spapr_init(MachineState *machine)
 char *filename;
 int spapr_packages = smp_cpus / smp_threads;
 int spapr_max_packages = max_cpus / smp_threads;
+int smt = kvmppc_smt_threads();
 
 msi_supported = true;
 
@@ -1822,6 +1845,15 @@ static void ppc_spapr_init(MachineState *machine)
 spapr_validate_node_memory(machine, &error_fatal);
 }
 
+if (smc->dr_cpu_enabled) {
+for (i = 0; i < spapr_max_packages; i++) {
+sPAPRDRConnector *drc =
+spapr_dr_connector_new(OBJECT(spapr),
+   SPAPR_DR_CONNECTOR_TYPE_CPU, i * smt);
+qemu_register_reset(spapr_drc_reset, drc);
+}
+}
+
 /* init CPUs */
 if (machine->cpu_model == NULL) {
 machine->cpu_model = kvm_enabled() ? "host" : "POWER7";
@@ -2235,6 +2267,88 @@ out:
 error_propagate(errp, local_err);
 }
 
+static void *spapr_populate_hotplug_cpu_dt(DeviceState *dev, CPUState *cs,
+   int *fdt_offset,
+   sPAPRMachineState *spapr)
+{
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+DeviceClass *dc = DEVICE_GET_CLASS(cs);
+int id = ppc_get_vcpu_dt_id(cpu);
+void *fdt;
+int offset, fdt_size;
+char *nodename;
+
+fdt = create_device_tree(&fdt_size);
+nodename = g_strdup_printf("%s@%x", dc->fw_name, id);
+offset = fdt_add_subnode(fdt, 0, nodename);
+
+spapr_populate_cpu_dt(cs, fdt, offset, spapr);
+g_free(nodename);
+
+*fdt_offset = offset;
+return fdt;
+}
+
+static void spapr_core_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+Error **errp)
+{
+sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
+sPAPRMachineState *ms = SPAPR_MACHINE(qdev_get_machine());
+sPAPRCPUPackage *package = SPAPR_CPU_PACKAGE(OBJECT(dev));
+PowerPCCPU *cpu = package->thread0;
+CPUState *cs = CPU(cpu);
+int id = ppc_get_vcpu_dt_id(cpu);
+sPAPRDRConnector *drc =
+spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
+sPAPRDRConnectorClass *drck;
+Error *local_err = NULL;
+void *fdt = NULL;
+int fdt_offset = 0;
+
+if (!smc->dr_cpu_enabled) {
+/*
+ * This is a cold plugged CPU core but the machine doesn't support
+ * DR. So skip the hotplug path ensuring that the core is brought
+ * up online without an associated DR connector.
+ */
+return;
+}
+
+g_assert(drc);
+
+/*
+ * Setup CPU DT entries only for hotplugged CPUs. For boot time or
+ * coldplugged CPUs DT entries are setup in spapr_finalize_fdt().
+ */
+if (qdev_hotplug) {
+fdt = spapr_populate_hotplug_cpu_dt(dev, cs, &fdt_offset, ms);
+dev->hotplugged = true;
+}
+
+drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+drck->attach(drc, dev, fdt, fdt_offset, 

[Qemu-devel] [RFC PATCH v0 5/8] spapr: Convert boot CPUs into CPU core device initialization

2016-02-21 Thread Bharata B Rao
Initialize boot CPUs specified with -smp option as CPU core devices.
Create as many CPU package devices as necessary to fit in max_cpus and
populate (i.e., realize) only as many as required.

Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c | 31 +++
 include/hw/ppc/spapr_cpu_package.h |  1 +
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3892a99..f3913b4 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -64,6 +64,8 @@
 
 #include "hw/compat.h"
 #include "qemu-common.h"
+#include "hw/ppc/spapr_cpu_package.h"
+#include "monitor/monitor.h"
 
 #include 
 
@@ -1739,7 +1741,6 @@ static void ppc_spapr_init(MachineState *machine)
 const char *kernel_filename = machine->kernel_filename;
 const char *kernel_cmdline = machine->kernel_cmdline;
 const char *initrd_filename = machine->initrd_filename;
-PowerPCCPU *cpu;
 PCIHostState *phb;
 int i;
 MemoryRegion *sysmem = get_system_memory();
@@ -1753,6 +1754,8 @@ static void ppc_spapr_init(MachineState *machine)
 long load_limit, fw_size;
 bool kernel_le = false;
 char *filename;
+int spapr_packages = smp_cpus / smp_threads;
+int spapr_max_packages = max_cpus / smp_threads;
 
 msi_supported = true;
 
@@ -1825,13 +1828,18 @@ static void ppc_spapr_init(MachineState *machine)
 }
 machine->cpu_type = TYPE_POWERPC_CPU;
 
-for (i = 0; i < smp_cpus; i++) {
-cpu = cpu_ppc_init(machine->cpu_model);
-if (cpu == NULL) {
-error_report("Unable to find PowerPC CPU definition");
-exit(1);
+/*
+ * Create enough CPU package devices for max_cpus and realize the
+ * required number of them.
+ */
+for (i = 0; i < spapr_max_packages; i++) {
+Object *spapr_cpu_package  =
+cpu_package_create_object(TYPE_SPAPR_CPU_PACKAGE, i, &error_fatal);
+
+if (i < spapr_packages) {
+object_property_set_bool(spapr_cpu_package, true, "realized",
+ &error_fatal);
 }
-spapr_cpu_init(spapr, cpu, &error_fatal);
 }
 
 if (kvm_enabled()) {
@@ -2231,6 +2239,7 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp)
 {
 sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
+sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
 
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
 int node;
@@ -2267,6 +2276,11 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
 }
 
 spapr_memory_plug(hotplug_dev, dev, node, errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+CPUState *cs = CPU(dev);
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+spapr_cpu_init(ms, cpu, errp);
 }
 }
 
@@ -2281,7 +2295,8 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
 static HotplugHandler *spapr_get_hotpug_handler(MachineState *machine,
  DeviceState *dev)
 {
-if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
+object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 return HOTPLUG_HANDLER(machine);
 }
 return NULL;
diff --git a/include/hw/ppc/spapr_cpu_package.h b/include/hw/ppc/spapr_cpu_package.h
index 547dbc1..a3810d8 100644
--- a/include/hw/ppc/spapr_cpu_package.h
+++ b/include/hw/ppc/spapr_cpu_package.h
@@ -10,6 +10,7 @@
 #define HW_SPAPR_CPU_PACKAGE_H
 
 #include "hw/qdev.h"
+#include "hw/cpu/package.h"
 
 #define TYPE_SPAPR_CPU_PACKAGE "spapr-cpu-package"
 #define SPAPR_CPU_PACKAGE(obj) \
-- 
2.1.0




[Qemu-devel] [RFC PATCH v0 0/8] cpu-package hotplug

2016-02-21 Thread Bharata B Rao
Hi,

This is an attempt to implement David Gibson's RFC that was posted at
https://lists.gnu.org/archive/html/qemu-ppc/2016-02/msg0.html
I am not sure if I have followed all the aspects of the RFC fully, but we
can make changes going forward.

An example cpu-package implementation is done for sPAPR in this patchset.
Hot removal is not yet done in this patchset.

For the command line,

-smp 8,sockets=1,cores=1,threads=8,maxcpus=16 -numa node,nodeid=0,cpus=0-7 
-numa node,nodeid=1,cpus=8-15

the HMP query looks like this:

(qemu) info cpu-packages 
CPU Package: ""
  type: "spapr-cpu-package"
  qom_path: "/machine/cpu-package[0]"
  realized: true
  nr_cpus: 8
  CPU: 0
Type: "host-powerpc64-cpu"
Arch ID: 0
Thread: 0
Core: 0
Socket: 0
Node: 0
  CPU: 1
Type: "host-powerpc64-cpu"
Arch ID: 1
Thread: 1
Core: 0
Socket: 0
Node: 0
  CPU: 2
Type: "host-powerpc64-cpu"
Arch ID: 2
Thread: 2
Core: 0
Socket: 0
Node: 0
  CPU: 3
Type: "host-powerpc64-cpu"
Arch ID: 3
Thread: 3
Core: 0
Socket: 0
Node: 0
  CPU: 4
Type: "host-powerpc64-cpu"
Arch ID: 4
Thread: 4
Core: 0
Socket: 0
Node: 0
  CPU: 5
Type: "host-powerpc64-cpu"
Arch ID: 5
Thread: 5
Core: 0
Socket: 0
Node: 0
  CPU: 6
Type: "host-powerpc64-cpu"
Arch ID: 6
Thread: 6
Core: 0
Socket: 0
Node: 0
  CPU: 7
Type: "host-powerpc64-cpu"
Arch ID: 7
Thread: 7
Core: 0
Socket: 0
Node: 0
CPU Package: ""
  type: "spapr-cpu-package"
  qom_path: "/machine/cpu-package[1]"
  realized: false
  nr_cpus: 8

As can be seen above, all the cores up to max_cpus are created up front
here, and hotplug is done in the following manner:

(qemu) qom-set /machine/cpu-package[1] realized true

This makes the 2nd cpu-package, consisting of a core with 8 threads,
available.
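
For illustration only (this is not code from the patchset): the package counts in the example above follow from the two divisions that patch 5/8 adds to ppc_spapr_init(), i.e. smp_cpus / smp_threads and max_cpus / smp_threads. A minimal C sketch, where the struct and function names are hypothetical and only the arithmetic is taken from the series:

```c
#include <assert.h>

/* Hypothetical helper type for this sketch; not a QEMU structure. */
struct package_counts {
    int realized;   /* packages realized at boot */
    int total;      /* packages created up front */
};

/*
 * Mirror the arithmetic from patch 5/8:
 *   spapr_packages     = smp_cpus / smp_threads
 *   spapr_max_packages = max_cpus / smp_threads
 */
struct package_counts count_packages(int smp_cpus, int max_cpus,
                                     int smp_threads)
{
    struct package_counts c;

    assert(smp_threads > 0);
    c.realized = smp_cpus / smp_threads;
    c.total = max_cpus / smp_threads;
    return c;
}
```

With -smp 8,threads=8,maxcpus=16 this gives one package realized at boot out of two created, which matches the two entries in the HMP listing.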

I am not fully sure if the QMP enumeration here works for all archs, but
I just wanted to share what I currently have.

Bharata B Rao (8):
  cpu: Store CPU typename in MachineState
  cpu: Don't realize CPU from cpu_generic_init()
  cpu: CPU package abstract device
  spapr: Introduce CPU core device
  spapr: Convert boot CPUs into CPU core device initialization
  spapr: CPU hotplug support
  qmp: Implement query cpu-packages
  hmp: Implement 'info cpu-slots'

 hmp-commands-info.hx   |  14 +++
 hmp.c  |  50 
 hmp.h  |   1 +
 hw/cpu/Makefile.objs   |   1 +
 hw/cpu/package.c   |  85 +
 hw/ppc/Makefile.objs   |   1 +
 hw/ppc/spapr.c | 246 +++--
 hw/ppc/spapr_cpu_package.c |  50 
 hw/ppc/spapr_events.c  |   3 +
 hw/ppc/spapr_rtas.c|  24 
 include/hw/boards.h|   2 +
 include/hw/cpu/package.h   |  27 
 include/hw/ppc/spapr.h |   1 +
 include/hw/ppc/spapr_cpu_package.h |  27 
 qapi-schema.json   |  48 
 qom/cpu.c  |   6 -
 target-arm/helper.c|  16 ++-
 target-cris/cpu.c  |  16 ++-
 target-lm32/helper.c   |  16 ++-
 target-moxie/cpu.c |  16 ++-
 target-openrisc/cpu.c  |  16 ++-
 target-ppc/translate_init.c|  16 ++-
 target-sh4/cpu.c   |  16 ++-
 target-tricore/helper.c|  16 ++-
 target-unicore32/helper.c  |  16 ++-
 25 files changed, 707 insertions(+), 23 deletions(-)
 create mode 100644 hw/cpu/package.c
 create mode 100644 hw/ppc/spapr_cpu_package.c
 create mode 100644 include/hw/cpu/package.h
 create mode 100644 include/hw/ppc/spapr_cpu_package.h

-- 
2.1.0




[Qemu-devel] [RFC PATCH v0 1/8] cpu: Store CPU typename in MachineState

2016-02-21 Thread Bharata B Rao
Storing the CPU typename in MachineState lets us create CPU threads
for all architectures in a uniform manner from arch-neutral code.

TODO: Touching only sPAPR target for now

Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c  | 2 ++
 include/hw/boards.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5bd8fd3..3892a99 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1823,6 +1823,8 @@ static void ppc_spapr_init(MachineState *machine)
 if (machine->cpu_model == NULL) {
 machine->cpu_model = kvm_enabled() ? "host" : "POWER7";
 }
+machine->cpu_type = TYPE_POWERPC_CPU;
+
 for (i = 0; i < smp_cpus; i++) {
 cpu = cpu_ppc_init(machine->cpu_model);
 if (cpu == NULL) {
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 0f30959..cf95d10 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -137,6 +137,7 @@ struct MachineState {
 char *kernel_cmdline;
 char *initrd_filename;
 const char *cpu_model;
+const char *cpu_type;
 AccelState *accelerator;
 };
 
-- 
2.1.0




[Qemu-devel] [RFC PATCH v0 2/8] cpu: Don't realize CPU from cpu_generic_init()

2016-02-21 Thread Bharata B Rao
Don't do CPU realization from cpu_generic_init(). With this,
cpu_generic_init() is used just to create CPU threads, which
should be realized separately from a realizefn call.

Convert the existing callers to do explicit realization.
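
As a rough illustration of the create/realize split (a toy sketch, not QEMU code): the types and function names below are hypothetical stand-ins, and only the shape of the converted callers (create, check for NULL, realize, clean up on failure) follows the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a CPU object; not QEMU's CPUState. */
struct toy_cpu {
    bool realized;
};

/* Creation only: allocate the object and leave it unrealized. */
struct toy_cpu *toy_cpu_create(void)
{
    return calloc(1, sizeof(struct toy_cpu));
}

/* Separate realization step; returns false on failure. */
bool toy_cpu_realize(struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->realized = true;
    return true;
}

/* What a converted caller does: create, then realize explicitly. */
struct toy_cpu *toy_cpu_init(void)
{
    struct toy_cpu *cpu = toy_cpu_create();

    if (!cpu) {
        return NULL;
    }
    if (!toy_cpu_realize(cpu)) {
        free(cpu);
        return NULL;
    }
    return cpu;
}
```

A caller that wants an unrealized object (such as a container that realizes its children later) would call only the create step.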

Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
Reviewed-by: Eduardo Habkost 
---
 qom/cpu.c   |  6 --
 target-arm/helper.c | 16 +++-
 target-cris/cpu.c   | 16 +++-
 target-lm32/helper.c| 16 +++-
 target-moxie/cpu.c  | 16 +++-
 target-openrisc/cpu.c   | 16 +++-
 target-ppc/translate_init.c | 16 +++-
 target-sh4/cpu.c| 16 +++-
 target-tricore/helper.c | 16 +++-
 target-unicore32/helper.c   | 16 +++-
 10 files changed, 135 insertions(+), 15 deletions(-)

diff --git a/qom/cpu.c b/qom/cpu.c
index 38dc713..c0211fa 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -64,13 +64,7 @@ CPUState *cpu_generic_init(const char *typename, const char *cpu_model)
 featurestr = strtok(NULL, ",");
 cc->parse_features(cpu, featurestr, &err);
 g_free(str);
-if (err != NULL) {
-goto out;
-}
-
-object_property_set_bool(OBJECT(cpu), true, "realized", &err);
 
-out:
 if (err != NULL) {
 error_report_err(err);
 object_unref(OBJECT(cpu));
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 5ea507f..d76c55c 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4564,7 +4564,21 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 
 ARMCPU *cpu_arm_init(const char *cpu_model)
 {
-return ARM_CPU(cpu_generic_init(TYPE_ARM_CPU, cpu_model));
+CPUState *cpu = cpu_generic_init(TYPE_ARM_CPU, cpu_model);
+Error *err = NULL;
+
+if (!cpu) {
+return NULL;
+}
+
+object_property_set_bool(OBJECT(cpu), true, "realized", &err);
+if (err != NULL) {
+error_report_err(err);
+object_unref(OBJECT(cpu));
+return NULL;
+} else {
+return ARM_CPU(cpu);
+}
 }
 
 void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
diff --git a/target-cris/cpu.c b/target-cris/cpu.c
index b2c8624..82f0ce5 100644
--- a/target-cris/cpu.c
+++ b/target-cris/cpu.c
@@ -90,7 +90,21 @@ static ObjectClass *cris_cpu_class_by_name(const char *cpu_model)
 
 CRISCPU *cpu_cris_init(const char *cpu_model)
 {
-return CRIS_CPU(cpu_generic_init(TYPE_CRIS_CPU, cpu_model));
+CPUState *cpu = cpu_generic_init(TYPE_CRIS_CPU, cpu_model);
+Error *err = NULL;
+
+if (!cpu) {
+return NULL;
+}
+
+object_property_set_bool(OBJECT(cpu), true, "realized", &err);
+if (err != NULL) {
+error_report_err(err);
+object_unref(OBJECT(cpu));
+return NULL;
+} else {
+return CRIS_CPU(cpu);
+}
 }
 
 /* Sort alphabetically by VR. */
diff --git a/target-lm32/helper.c b/target-lm32/helper.c
index 655248f..4080496 100644
--- a/target-lm32/helper.c
+++ b/target-lm32/helper.c
@@ -220,7 +220,21 @@ bool lm32_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 
 LM32CPU *cpu_lm32_init(const char *cpu_model)
 {
-return LM32_CPU(cpu_generic_init(TYPE_LM32_CPU, cpu_model));
+CPUState *cpu = cpu_generic_init(TYPE_LM32_CPU, cpu_model);
+Error *err = NULL;
+
+if (!cpu) {
+return NULL;
+}
+
+object_property_set_bool(OBJECT(cpu), true, "realized", &err);
+if (err != NULL) {
+error_report_err(err);
+object_unref(OBJECT(cpu));
+return NULL;
+} else {
+return LM32_CPU(cpu);
+}
 }
 
 /* Some soc ignores the MSB on the address bus. Thus creating a shadow memory
diff --git a/target-moxie/cpu.c b/target-moxie/cpu.c
index b33c2b3..95fc3d0 100644
--- a/target-moxie/cpu.c
+++ b/target-moxie/cpu.c
@@ -153,7 +153,21 @@ static const MoxieCPUInfo moxie_cpus[] = {
 
 MoxieCPU *cpu_moxie_init(const char *cpu_model)
 {
-return MOXIE_CPU(cpu_generic_init(TYPE_MOXIE_CPU, cpu_model));
+CPUState *cpu = cpu_generic_init(TYPE_MOXIE_CPU, cpu_model);
+Error *err = NULL;
+
+if (!cpu) {
+return NULL;
+}
+
+object_property_set_bool(OBJECT(cpu), true, "realized", &err);
+if (err != NULL) {
+error_report_err(err);
+object_unref(OBJECT(cpu));
+return NULL;
+} else {
+return MOXIE_CPU(cpu);
+}
 }
 
 static void cpu_register(const MoxieCPUInfo *info)
diff --git a/target-openrisc/cpu.c b/target-openrisc/cpu.c
index cafc07f..e82bcfc 100644
--- a/target-openrisc/cpu.c
+++ b/target-openrisc/cpu.c
@@ -223,7 +223,21 @@ static void openrisc_cpu_register_types(void)
 
 OpenRISCCPU *cpu_openrisc_init(const char *cpu_model)
 {
-return OPENRISC_CPU(cpu_generic_init(TYPE_OPENRISC_CPU, cpu_model));
+CPUState *cpu = cpu_generic_init(TYPE_OPENRISC_CPU, cpu_model);
+Error *err = NULL;
+
+if (!cpu) {
+return NULL;
+}
+
+   

Re: [Qemu-devel] [RFC] QMP: add query-hotpluggable-cpus

2016-02-21 Thread David Gibson
On Fri, Feb 19, 2016 at 04:49:11PM +0100, Igor Mammedov wrote:
> On Fri, 19 Feb 2016 15:38:48 +1100
> David Gibson  wrote:
> 
> CCing thread a couple of libvirt guys.
> 
> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
> > > On Thu, 18 Feb 2016 14:39:52 +1100
> > > David Gibson  wrote:
> > >   
> > > > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:  
> > > > > On Mon, 15 Feb 2016 20:43:41 +0100
> > > > > Markus Armbruster  wrote:
> > > > > 
> > > > > > Igor Mammedov  writes:
> > > > > > 
> > > > > > > it will allow mgmt to query present and possible to hotplug CPUs
> > > > > > > it is required from a target platform that wish to support
> > > > > > > command to set board specific MachineClass.possible_cpus() hook,
> > > > > > > which will return a list of possible CPUs with options
> > > > > > > that would be needed for hotplugging possible CPUs.
> > > > > > >
> > > > > > > For RFC there are:
> > > > > > >'arch_id': 'int' - mandatory unique CPU number,
> > > > > > >   for x86 it's APIC ID for ARM it's MPIDR
> > > > > > >'type': 'str' - CPU object type for usage with device_add
> > > > > > >
> > > > > > > and a set of optional fields that would allows mgmt tools
> > > > > > > to know at what granularity and where a new CPU could be
> > > > > > > hotplugged;
> > > > > > > [node],[socket],[core],[thread]
> > > > > > > Hopefully that should cover needs for CPU hotplug porposes for
> > > > > > > magor targets and we can extend structure in future adding
> > > > > > > more fields if it will be needed.
> > > > > > >
> > > > > > > also for present CPUs there is a 'cpu_link' field which
> > > > > > > would allow mgmt inspect whatever object/abstraction
> > > > > > > the target platform considers as CPU object.
> > > > > > >
> > > > > > > For RFC purposes implements only for x86 target so far.  
> > > > > > 
> > > > > > Adding ad hoc queries as we go won't scale.  Could this be solved 
> > > > > > by a
> > > > > > generic introspection interface?
> > > > > Do you mean generic QOM introspection?
> > > > > 
> > > > > Using QOM we could have '/cpus' container and create QOM links
> > > > > for exiting (populated links) and possible (empty links) CPUs.
> > > > > However in that case link's name will need have a special format
> > > > > that will convey an information necessary for mgmt to hotplug
> > > > > a CPU object, at least:
> > > > >   - where: [node],[socket],[core],[thread] options
> > > > >   - optionally what CPU object to use with device_add command
> > > > 
> > > > Hmm.. is it not enough to follow the link and get the topology
> > > > information by examining the target?  
> > > One can't follow a link if it's an empty one, hence
> > > CPU placement information should be provided somehow,
> > > either:  
> > 
> > Ah, right, so the issue is determining the socket/core/thread
> > addresses that cpus which aren't yet present will have.
> > 
> > >  * by precreating cpu-package objects with properties that
> > >would describe it /could be inspected via OQM/  
> > 
> > So, we could do this, but I think the natural way would be to have the
> > information for each potential thread in the package.  Just putting
> > say "core number" in the package itself assumes more than I'd like
> > about how packages sit in the heirarchy.  Plus, it means that
> > management has a bunch of cases to deal with: package has all the
> > information, package has just a core id, package has just a socket id,
> > and so forth.
> > 
> > It is a but clunky that when the package is plugged, this information
> > will have to sit parallel to the array of actual thread links.
> >
> > Markus or Andreas is there a natural way to present a list of (node,
> > socket, core, thread) tuples in the package object?  Preferably
> > without having to create a whole bunch of "potential thread" objects
> > just for the purpose.
> I'm sorry but I couldn't parse above 2 paragraphs. The way I see
> whatever placement info QEMU will provide to mgmt, mgmt will have
> to deal with it in one way or another.
> Perhaps rephrasing and adding some examples might help to explain
> suggestion a bit better?

Ok, so what I'm saying is that I think describing a location for the
package itself could be problematic.  For some cases it will be ok,
but depending on exactly what the package represents on a particular
platform there could be a lot of options for how to represent it.

What I'm suggesting instead is that instead of giving a location for
itself, the package lists the locations of all the threads it will
contain when it is enabled/present/whatever.  Those locations can be
given as node/socket/core/thread tuples - which are properties that
cpu threads already need to have, so we're not making the possible
inadequacy of that information any worse than it already was.
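
As an illustration of such a per-package list (a sketch in plain C, not from any posted patch): every name below is hypothetical. It assumes a package holds one core's worth of threads with consecutive CPU indexes, uses the core and socket formulas that the sPAPR code in the cpu-package RFC uses (cpu_index / smp_threads and cpu_index / smp_threads / smp_cores), and takes the node as a parameter because it comes from the -numa configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuple type; not an existing QEMU or QAPI structure. */
struct thread_location {
    int node;
    int socket;
    int core;
    int thread;
};

/*
 * Fill 'out' with the locations of the threads that package
 * 'package_index' would contain once plugged, and return how many
 * entries were written (at most 'out_len').
 */
size_t package_thread_locations(int package_index, int smp_threads,
                                int smp_cores, int node,
                                struct thread_location *out, size_t out_len)
{
    size_t n = 0;
    int i;

    assert(smp_threads > 0 && smp_cores > 0 && package_index >= 0);

    for (i = 0; i < smp_threads && n < out_len; i++) {
        /* Assumption: threads of a package have consecutive indexes. */
        int cpu_index = package_index * smp_threads + i;

        out[n].thread = cpu_index;
        out[n].core = cpu_index / smp_threads;
        out[n].socket = cpu_index / smp_threads / smp_cores;
        out[n].node = node;
        n++;
    }
    return n;
}
```

Management would then read the tuples for a not-yet-plugged package instead of interpreting a single location attached to the package itself.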

Examples.. so I'm 

Re: [Qemu-devel] [RFC] QMP: add query-hotpluggable-cpus

2016-02-21 Thread David Gibson
On Fri, Feb 19, 2016 at 10:51:11AM +0100, Markus Armbruster wrote:
> David Gibson  writes:
> 
> > On Thu, Feb 18, 2016 at 11:37:39AM +0100, Igor Mammedov wrote:
> >> On Thu, 18 Feb 2016 14:39:52 +1100
> >> David Gibson  wrote:
> >> 
> >> > On Tue, Feb 16, 2016 at 11:36:55AM +0100, Igor Mammedov wrote:
> >> > > On Mon, 15 Feb 2016 20:43:41 +0100
> >> > > Markus Armbruster  wrote:
> >> > >   
> >> > > > Igor Mammedov  writes:
> >> > > >   
> >> > > > > it will allow mgmt to query present and possible to hotplug CPUs
> >> > > > > it is required from a target platform that wish to support
> >> > > > > command to set board specific MachineClass.possible_cpus() hook,
> >> > > > > which will return a list of possible CPUs with options
> >> > > > > that would be needed for hotplugging possible CPUs.
> >> > > > >
> >> > > > > For RFC there are:
> >> > > > >'arch_id': 'int' - mandatory unique CPU number,
> >> > > > >   for x86 it's APIC ID for ARM it's MPIDR
> >> > > > >'type': 'str' - CPU object type for usage with device_add
> >> > > > >
> >> > > > > and a set of optional fields that would allows mgmt tools
> >> > > > > to know at what granularity and where a new CPU could be
> >> > > > > hotplugged;
> >> > > > > [node],[socket],[core],[thread]
> >> > > > > Hopefully that should cover needs for CPU hotplug purposes for
> >> > > > > major targets and we can extend structure in future adding
> >> > > > > more fields if it will be needed.
> >> > > > >
> >> > > > > also for present CPUs there is a 'cpu_link' field which
> >> > > > > would allow mgmt inspect whatever object/abstraction
> >> > > > > the target platform considers as CPU object.
> >> > > > >
> >> > > > > For RFC purposes implements only for x86 target so far.
> >> > > > 
> >> > > > Adding ad hoc queries as we go won't scale.  Could this be solved by
> >> > > > a generic introspection interface?
> >> > > Do you mean generic QOM introspection?
> >> > > 
> >> > > Using QOM we could have '/cpus' container and create QOM links
> >> > > for existing (populated links) and possible (empty links) CPUs.
> >> > > However in that case link's name will need have a special format
> >> > > that will convey an information necessary for mgmt to hotplug
> >> > > a CPU object, at least:
> >> > >   - where: [node],[socket],[core],[thread] options
> >> > >   - optionally what CPU object to use with device_add command  
> >> > 
> >> > Hmm.. is it not enough to follow the link and get the topology
> >> > information by examining the target?
> >> One can't follow a link if it's an empty one, hence
> >> CPU placement information should be provided somehow,
> >> either:
> >
> > Ah, right, so the issue is determining the socket/core/thread
> > addresses that cpus which aren't yet present will have.
> >
> >>  * by precreating cpu-package objects with properties that
> >>would describe it /could be inspected via QOM/
> >
> > So, we could do this, but I think the natural way would be to have the
> > information for each potential thread in the package.  Just putting
> > say "core number" in the package itself assumes more than I'd like
> > about how packages sit in the hierarchy.  Plus, it means that
> > management has a bunch of cases to deal with: package has all the
> > information, package has just a core id, package has just a socket id,
> > and so forth.
> >
> > It is a bit clunky that when the package is plugged, this information
> > will have to sit parallel to the array of actual thread links.
> >
> > Markus or Andreas is there a natural way to present a list of (node,
> > socket, core, thread) tuples in the package object?  Preferably
> > without having to create a whole bunch of "potential thread" objects
> > just for the purpose.
> 
> I'm just a dabbler when it comes to QOM, but I can try.
> 
> I view a concrete cpu-package device (subtype of the abstract
> cpu-package device) as a composite device containing stuff like actual
> cores.

So.. the idea is it's a bit more abstract than that.  My intention is
that the package lists - in some manner - each of the threads
(i.e. vcpus) it contains / can contain.  Depending on the platform it
*might* also have internal structure such as cores / sockets, but it
doesn't have to.  Either way, the contained threads will be listed in
a common way, as a flat array.

> To create a composite device, you start with the outer shell, then plug
> in components one by one.  Components can be nested arbitrarily deep.
> 
> Perhaps you can define the concrete cpu-package shell in a way that lets
> you query what you need to know from a mere shell (no components
> plugged).

Right.. that's exactly what I'm suggesting, but I don't know enough
about the presentation of basic data in QOM to know quite how to
accomplish it.

> >> or
> >>  * via QMP/HMP command that would provide the same information
> >>   

Re: [Qemu-devel] [PATCH] migration: reorder code to make it symmetric

2016-02-21 Thread Amit Shah
On (Thu) 04 Feb 2016 [22:50:30], Wei Yang wrote:
> In qemu_savevm_state_complete_precopy(), it iterates on each device to add
> a json object and transfer related status to destination, while the order
> of the last two steps could be refined.
> 
> Current order:
> 
> json_start_object()
>   save_section_header()
>   vmstate_save()
> json_end_object()
>   save_section_footer()
> 
> After the change:
> 
> json_start_object()
>   save_section_header()
>   vmstate_save()
>   save_section_footer()
> json_end_object()
> 
> This patch reorders the code to make it symmetric. No functional change.
> 
> Signed-off-by: Wei Yang 

Reviewed-by: Amit Shah 

Thanks,

Amit



[Qemu-devel] [Bug 1548166] [NEW] QEMU crash after send data from Host through serial port

2016-02-21 Thread Sugar
Public bug reported:

Hi All

I have two computers, one running Win7 32-bit and the other Win7 64-bit; both
hit this issue.
My QEMU version is qemu-w32-setup-20160215

I want to use EDK2 OVMF with Intel UDK Debugger tools for source-level debugging.
I have installed the com0com virtual COM port driver and connected COM3 to COM4.

Intel UDK Debugger tools uses COM3
QEMU running OVMF uses COM4

First I start Intel UDK Debugger tools, then launch QEMU:
C:\Program Files\qemu\qemu-system-x86_64.exe -bios "C:\EDK2\Build\OvmfX64\DEBUG_VS2010\FV\OVMF.fd" -serial COM4
Then QEMU crashes on startup.

I did an experiment:
Start the terminal tool Tera Term on COM3
Launch QEMU on COM4
C:\Program Files\qemu\qemu-system-x86_64.exe -bios "C:\EDK2\Build\OvmfX64\DEBUG_VS2010\FV\OVMF.fd" -serial COM4
This works, and I can see the OVMF trace log on the terminal.
But if I press the "Down" key in the terminal, QEMU crashes.
It is caused by the terminal sending data (the "Down" key) to QEMU.

Can somebody share some information about this?

Thanks a lot.
Sugar

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1548166

Title:
  QEMU crash after send data from Host through serial port

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1548166/+subscriptions



[Qemu-devel] [Bug 1423124] Re: QEMU crash after sending data on host serial port

2016-02-21 Thread Sugar
** Changed in: qemu
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1423124

Title:
  QEMU crash after sending data on host serial port

Status in QEMU:
  Fix Released

Bug description:
  Good morning,

  I'm using the latest version of QEMU for Windows.
  The host system is Windows 7 64-bit.
  I'm executing the following statement:

  qemu-system-x86_64w.exe -hda debian.img -m 256 -net nic -net
  tap,ifname=TAP32 -soundhw all -serial COM9

  QEMU starts the emulated Debian and it runs correctly.

  If I try to send data from Windows to QEMU using COM9 (either a "real" port
  or one emulated by the COM0COM driver), QEMU crashes. A Windows dump is
  available if required.
  If I try to send data to /dev/ttyS0 (which should be the Linux side of COM9)
  from Debian, the virtual machine crashes as well.

  More details if necessary
  Best regards
  U.Poddine

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1423124/+subscriptions



Re: [Qemu-devel] [PATCH 1/3] exec: store RAMBlock pointer into memory region

2016-02-21 Thread Gonglei (Arei)
Hi Fam,

> From: Fam Zheng [mailto:f...@redhat.com]
> Sent: Monday, February 22, 2016 10:46 AM
> 
> On Sat, 02/20 10:35, Gonglei wrote:
> > Each RAM memory region has a unique corresponding RAMBlock.
> > In the current implementation, the memory region only stores
> > the ram_addr, i.e. the offset into the RAM address space.
> > To get the RAM block we need to query the global ram.list
> > by ram_addr, which is very
> > expensive.
> >
> > Now, we store the RAMBlock pointer into memory region
> > structure. So, if we know the mr, we can easily get the
> > RAMBlock.
> >
> > Signed-off-by: Gonglei 
> > ---
> >  exec.c| 2 ++
> >  include/exec/memory.h | 1 +
> >  memory.c  | 1 +
> >  3 files changed, 4 insertions(+)
> >
> > diff --git a/exec.c b/exec.c
> > index 1f24500..e29e369 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -1717,6 +1717,8 @@ ram_addr_t
> qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
> >  error_propagate(errp, local_err);
> >  return -1;
> >  }
> > +/* store the ram block pointer into memroy region */
> 
> The comment is superfluous IMHO, the code is quite self-explanatory.
> 
Yes, agree.

> > +mr->ram_block = new_block;
> >  return addr;
> >  }
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index c92734a..23e2e3e 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -172,6 +172,7 @@ struct MemoryRegion {
> >  bool global_locking;
> >  uint8_t dirty_log_mask;
> >  ram_addr_t ram_addr;
> > +void *ram_block;   /* RAMBlock pointer */
> 
> Why not add
> 
> typedef struct RAMBlock RAMBlock;
> 
> then
> 
> RAMBlock *ram_block;
> 
> ?
> 
It's clearer. Will fix in v2, thanks :)
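
For reference, a minimal sketch of what the forward declaration buys us.
MemoryRegionSketch, region_ram_size() and the single used_length member are
only stand-ins for illustration, not the layout that will be in v2:

```c
#include <stddef.h>

/* Forward declaration: users of the header only need the type name. */
typedef struct RAMBlock RAMBlock;

/* Illustrative stand-in for the real MemoryRegion. */
typedef struct MemoryRegionSketch {
    const char *name;
    RAMBlock *ram_block;    /* typed pointer instead of void * */
} MemoryRegionSketch;

/* The full definition can live in whichever file owns the layout. */
struct RAMBlock {
    size_t used_length;     /* placeholder member for this sketch */
};

/* No cast is needed to reach the block from the region. */
static size_t region_ram_size(const MemoryRegionSketch *mr)
{
    return mr->ram_block ? mr->ram_block->used_length : 0;
}
```

With void * every user would need a cast, and the compiler could not catch
a wrong pointer type being stored in the field.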

Regards,
-Gonglei

> >  Object *owner;
> >  const MemoryRegionIOMMUOps *iommu_ops;
> >
> > diff --git a/memory.c b/memory.c
> > index 09041ed..b4451dd 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -912,6 +912,7 @@ void memory_region_init(MemoryRegion *mr,
> >  }
> >  mr->name = g_strdup(name);
> >  mr->owner = owner;
> > +mr->ram_block = NULL;
> >
> >  if (name) {
> >  char *escaped_name = memory_region_escape_name(name);
> > --
> > 1.8.5.2
> >
> >
> >



[Qemu-devel] [PATCH v4 1/9] hw/timer: QOM'ify etraxfs_timer

2016-02-21 Thread xiaoqiang zhao
assign etraxfs_timer_init to etraxfs_timer_info.instance_init
and drop the SysBusDeviceClass::init

Reviewed-by: Edgar E. Iglesias 
Signed-off-by: xiaoqiang zhao 
---
 hw/timer/etraxfs_timer.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/timer/etraxfs_timer.c b/hw/timer/etraxfs_timer.c
index 36d8f46..4f115c7 100644
--- a/hw/timer/etraxfs_timer.c
+++ b/hw/timer/etraxfs_timer.c
@@ -315,9 +315,10 @@ static void etraxfs_timer_reset(void *opaque)
 qemu_irq_lower(t->irq);
 }
 
-static int etraxfs_timer_init(SysBusDevice *dev)
+static void etraxfs_timer_init(Object *obj)
 {
-ETRAXTimerState *t = ETRAX_TIMER(dev);
+ETRAXTimerState *t = ETRAX_TIMER(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
 t->bh_t0 = qemu_bh_new(timer0_hit, t);
 t->bh_t1 = qemu_bh_new(timer1_hit, t);
@@ -329,24 +330,23 @@ static int etraxfs_timer_init(SysBusDevice *dev)
 sysbus_init_irq(dev, &t->irq);
 sysbus_init_irq(dev, &t->nmi);
 
-memory_region_init_io(&t->mmio, OBJECT(t), &timer_ops, t,
+memory_region_init_io(&t->mmio, obj, &timer_ops, t,
   "etraxfs-timer", 0x5c);
 sysbus_init_mmio(dev, &t->mmio);
-qemu_register_reset(etraxfs_timer_reset, t);
-return 0;
 }
 
 static void etraxfs_timer_class_init(ObjectClass *klass, void *data)
 {
-SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass);
+DeviceClass *dc = DEVICE_CLASS(klass);
 
-sdc->init = etraxfs_timer_init;
+dc->reset = etraxfs_timer_reset;
 }
 
 static const TypeInfo etraxfs_timer_info = {
 .name  = TYPE_ETRAX_FS_TIMER,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(ETRAXTimerState),
+.instance_init = etraxfs_timer_init,
 .class_init= etraxfs_timer_class_init,
 };
 
-- 
2.1.4





[Qemu-devel] [PATCH v4 8/9] hw/timer: QOM'ify slavio_timer

2016-02-21 Thread xiaoqiang zhao
rename slavio_timer_init1 to slavio_timer_init and assign
it to slavio_timer_info.instance_init, then we drop the
SysBusDeviceClass::init

Signed-off-by: xiaoqiang zhao 
---
 hw/timer/slavio_timer.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/timer/slavio_timer.c b/hw/timer/slavio_timer.c
index fb3e08b..b2c9364 100644
--- a/hw/timer/slavio_timer.c
+++ b/hw/timer/slavio_timer.c
@@ -373,9 +373,10 @@ static void slavio_timer_reset(DeviceState *d)
 s->cputimer_mode = 0;
 }
 
-static int slavio_timer_init1(SysBusDevice *dev)
+static void slavio_timer_init(Object *obj)
 {
-SLAVIO_TIMERState *s = SLAVIO_TIMER(dev);
+SLAVIO_TIMERState *s = SLAVIO_TIMER(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 QEMUBH *bh;
 unsigned int i;
 TimerContext *tc;
@@ -394,14 +395,12 @@ static int slavio_timer_init1(SysBusDevice *dev)
 
 size = i == 0 ? SYS_TIMER_SIZE : CPU_TIMER_SIZE;
 snprintf(timer_name, sizeof(timer_name), "timer-%i", i);
-memory_region_init_io(&tc->iomem, OBJECT(s), &slavio_timer_mem_ops, tc,
+memory_region_init_io(&tc->iomem, obj, &slavio_timer_mem_ops, tc,
   timer_name, size);
 sysbus_init_mmio(dev, &tc->iomem);
 
 sysbus_init_irq(dev, &s->cputimer[i].irq);
 }
-
-return 0;
 }
 
 static Property slavio_timer_properties[] = {
@@ -412,9 +411,7 @@ static Property slavio_timer_properties[] = {
 static void slavio_timer_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = slavio_timer_init1;
 dc->reset = slavio_timer_reset;
 dc->vmsd = &vmstate_slavio_timer;
 dc->props = slavio_timer_properties;
@@ -424,6 +421,7 @@ static const TypeInfo slavio_timer_info = {
 .name  = TYPE_SLAVIO_TIMER,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(SLAVIO_TIMERState),
+.instance_init = slavio_timer_init,
 .class_init= slavio_timer_class_init,
 };
 
-- 
2.1.4





[Qemu-devel] [PATCH v4 4/9] hw/timer: QOM'ify m48txx_sysbus (pass 1)

2016-02-21 Thread xiaoqiang zhao
* split the old SysBus init function into an instance_init
  and a Device realize function
* use DeviceClass::realize instead of SysBusDeviceClass::init

Signed-off-by: xiaoqiang zhao 
---
 hw/timer/m48t59.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/hw/timer/m48t59.c b/hw/timer/m48t59.c
index bbcfeb2..3c683aa 100644
--- a/hw/timer/m48t59.c
+++ b/hw/timer/m48t59.c
@@ -763,30 +763,31 @@ static void m48t59_isa_realize(DeviceState *dev, Error **errp)
 }
 }
 
-static int m48t59_init1(SysBusDevice *dev)
+static void m48t59_init1(Object *obj)
 {
-M48txxSysBusDeviceClass *u = M48TXX_SYS_BUS_GET_CLASS(dev);
-M48txxSysBusState *d = M48TXX_SYS_BUS(dev);
-Object *o = OBJECT(dev);
+M48txxSysBusDeviceClass *u = M48TXX_SYS_BUS_GET_CLASS(obj);
+M48txxSysBusState *d = M48TXX_SYS_BUS(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 M48t59State *s = &d->state;
-Error *err = NULL;
 
 s->model = u->info.model;
 s->size = u->info.size;
 sysbus_init_irq(dev, &s->IRQ);
 
-memory_region_init_io(&s->iomem, o, &nvram_ops, s, "m48t59.nvram",
+memory_region_init_io(&s->iomem, obj, &nvram_ops, s, "m48t59.nvram",
   s->size);
-memory_region_init_io(&d->io, o, &m48t59_io_ops, s, "m48t59", 4);
-sysbus_init_mmio(dev, &s->iomem);
-sysbus_init_mmio(dev, &d->io);
-m48t59_realize_common(s, &err);
-if (err != NULL) {
-error_free(err);
-return -1;
-}
+memory_region_init_io(&d->io, obj, &m48t59_io_ops, s, "m48t59", 4);
+}
 
-return 0;
+static void m48t59_realize(DeviceState *dev, Error **errp)
+{
+M48txxSysBusState *d = M48TXX_SYS_BUS(dev);
+M48t59State *s = &d->state;
+SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+
+sysbus_init_mmio(sbd, &s->iomem);
+sysbus_init_mmio(sbd, &d->io);
+m48t59_realize_common(s, errp);
 }
 
 static uint32_t m48txx_isa_read(Nvram *obj, uint32_t addr)
@@ -860,10 +861,9 @@ static Property m48t59_sysbus_properties[] = {
 static void m48txx_sysbus_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 NvramClass *nc = NVRAM_CLASS(klass);
 
-k->init = m48t59_init1;
+dc->realize = m48t59_realize;
 dc->reset = m48t59_reset_sysbus;
 dc->props = m48t59_sysbus_properties;
 nc->read = m48txx_sysbus_read;
@@ -889,6 +889,7 @@ static const TypeInfo m48txx_sysbus_type_info = {
 .name = TYPE_M48TXX_SYS_BUS,
 .parent = TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(M48txxSysBusState),
+.instance_init = m48t59_init1,
 .abstract = true,
 .class_init = m48txx_sysbus_class_init,
 .interfaces = (InterfaceInfo[]) {
-- 
2.1.4





[Qemu-devel] [PATCH v4 0/9] QOM'ify hw/timer/*

2016-02-21 Thread xiaoqiang zhao
This patch series QOM'ifies the timer code under the hw/timer directory.
The main idea is to split the initfn's work: some goes into
TypeInfo.instance_init and some into DeviceClass::realize.
Drop the use of SysBusDeviceClass::init where possible.
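
As a rough illustration of the split, here is a toy model in plain C.
ToyTimer and both functions are invented for this sketch and are not QEMU's
Object/DeviceState API; the only point is that work depending on a property
(here freq_hz) has to wait for realize, because properties are set after
instance_init has run.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: a stand-in, not QEMU's Object or DeviceState. */
typedef struct ToyTimer {
    uint32_t freq_hz;    /* "property": set by the board after instance_init */
    uint32_t period_ns;  /* derived from freq_hz, so computed in realize */
    bool mmio_ready;     /* setup that does not depend on any property */
    bool realized;
} ToyTimer;

/* instance_init: cannot fail and must not look at property values. */
static void toy_timer_instance_init(ToyTimer *t)
{
    t->freq_hz = 0;
    t->period_ns = 0;
    t->mmio_ready = true;
    t->realized = false;
}

/* realize: runs once properties are final, and is allowed to fail. */
static bool toy_timer_realize(ToyTimer *t)
{
    if (t->freq_hz == 0) {
        return false;    /* property was never set */
    }
    t->period_ns = 1000000000u / t->freq_hz;
    t->realized = true;
    return true;
}
```

This is the same reason the ptimer_set_freq() calls that read s->freq_hz
move to the realize functions in the lm32_timer and milkymist_sysctl patches.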

Some patches in v3:
  hw/timer: QOM'ify arm_timer (pass 1)
  hw/timer: QOM'ify arm_timer (pass 2)
  hw/timer: QOM'ify exynos4210_mct
  hw/timer: QOM'ify exynos4210_pwm
  hw/timer: QOM'ify exynos4210_rtc
  hw/timer: QOM'ify pl031
  hw/timer: QOM'ify pxa2xx_timer
have been taken by Peter Maydell, others
still need some comments from the relevant maintainers.

changes in v4: 
* correct some misused "Reviewed-by" tags
* fix 'make check' fail case in the "/arm/device/introspect/concrete"
  test in tusb6010.c 

changes in v3: 
* remove unnecessary OBJECT cast
* refine some commit message
* use DeviceClass::vmsd instead of vmstate_register to register
  the VMState if possible

changes in v2: 
fix a stupid typo (timmer->timer)

xiaoqiang zhao (9):
  hw/timer: QOM'ify etraxfs_timer
  hw/timer: QOM'ify grlib_gptimer
  hw/timer: QOM'ify lm32_timer
  hw/timer: QOM'ify m48txx_sysbus (pass 1)
  hw/timer: QOM'ify m48txx_sysbus (pass 2)
  hw/timer: QOM'ify milkymist_sysctl
  hw/timer: QOM'ify puv3_ost
  hw/timer: QOM'ify slavio_timer
  hw/timer: QOM'ify tusb6010

 hw/timer/etraxfs_timer.c| 14 +++---
 hw/timer/grlib_gptimer.c| 30 ++
 hw/timer/lm32_timer.c   | 19 ---
 hw/timer/m48t59.c   | 39 ---
 hw/timer/milkymist-sysctl.c | 21 +
 hw/timer/puv3_ost.c | 18 +-
 hw/timer/slavio_timer.c | 12 +---
 hw/timer/tusb6010.c | 23 ---
 8 files changed, 96 insertions(+), 80 deletions(-)

-- 
2.1.4





[Qemu-devel] [PATCH v4 7/9] hw/timer: QOM'ify puv3_ost

2016-02-21 Thread xiaoqiang zhao
assign puv3_ost_init to puv3_ost_info.instance_init
and drop the SysBusDeviceClass::init

Signed-off-by: xiaoqiang zhao 
---
 hw/timer/puv3_ost.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/hw/timer/puv3_ost.c b/hw/timer/puv3_ost.c
index 93650b7..72c87ba 100644
--- a/hw/timer/puv3_ost.c
+++ b/hw/timer/puv3_ost.c
@@ -113,9 +113,10 @@ static void puv3_ost_tick(void *opaque)
 }
 }
 
-static int puv3_ost_init(SysBusDevice *dev)
+static void puv3_ost_init(Object *obj)
 {
-PUV3OSTState *s = PUV3_OST(dev);
+PUV3OSTState *s = PUV3_OST(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
 s->reg_OIER = 0;
 s->reg_OSSR = 0;
@@ -128,25 +129,16 @@ static int puv3_ost_init(SysBusDevice *dev)
 s->ptimer = ptimer_init(s->bh);
 ptimer_set_freq(s->ptimer, 50 * 1000 * 1000);
 
-memory_region_init_io(&s->iomem, OBJECT(s), &puv3_ost_ops, s, "puv3_ost",
+memory_region_init_io(&s->iomem, obj, &puv3_ost_ops, s, "puv3_ost",
 PUV3_REGS_OFFSET);
 sysbus_init_mmio(dev, &s->iomem);
-
-return 0;
-}
-
-static void puv3_ost_class_init(ObjectClass *klass, void *data)
-{
-SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass);
-
-sdc->init = puv3_ost_init;
 }
 
 static const TypeInfo puv3_ost_info = {
 .name = TYPE_PUV3_OST,
 .parent = TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(PUV3OSTState),
-.class_init = puv3_ost_class_init,
+.instance_init = puv3_ost_init,
 };
 
 static void puv3_ost_register_type(void)
-- 
2.1.4





[Qemu-devel] [PATCH v4 6/9] hw/timer: QOM'ify milkymist_sysctl

2016-02-21 Thread xiaoqiang zhao
* split the old SysBus init function into an instance_init
  and a Device realize function
* use DeviceClass::realize instead of SysBusDeviceClass::init

Reviewed-by: Peter Maydell 
Signed-off-by: xiaoqiang zhao 
---
 hw/timer/milkymist-sysctl.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/hw/timer/milkymist-sysctl.c b/hw/timer/milkymist-sysctl.c
index 5f29480..30a4bc4 100644
--- a/hw/timer/milkymist-sysctl.c
+++ b/hw/timer/milkymist-sysctl.c
@@ -270,9 +270,10 @@ static void milkymist_sysctl_reset(DeviceState *d)
 s->regs[R_GPIO_IN] = s->strappings;
 }
 
-static int milkymist_sysctl_init(SysBusDevice *dev)
+static void milkymist_sysctl_init(Object *obj)
 {
-MilkymistSysctlState *s = MILKYMIST_SYSCTL(dev);
+MilkymistSysctlState *s = MILKYMIST_SYSCTL(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
 sysbus_init_irq(dev, &s->gpio_irq);
 sysbus_init_irq(dev, &s->timer0_irq);
@@ -282,14 +283,18 @@ static int milkymist_sysctl_init(SysBusDevice *dev)
 s->bh1 = qemu_bh_new(timer1_hit, s);
 s->ptimer0 = ptimer_init(s->bh0);
 s->ptimer1 = ptimer_init(s->bh1);
-ptimer_set_freq(s->ptimer0, s->freq_hz);
-ptimer_set_freq(s->ptimer1, s->freq_hz);
 
-memory_region_init_io(&s->regs_region, OBJECT(s), &sysctl_mmio_ops, s,
+memory_region_init_io(&s->regs_region, obj, &sysctl_mmio_ops, s,
 "milkymist-sysctl", R_MAX * 4);
 sysbus_init_mmio(dev, &s->regs_region);
+}
 
-return 0;
+static void milkymist_sysctl_realize(DeviceState *dev, Error **errp)
+{
+MilkymistSysctlState *s = MILKYMIST_SYSCTL(dev);
+
+ptimer_set_freq(s->ptimer0, s->freq_hz);
+ptimer_set_freq(s->ptimer1, s->freq_hz);
 }
 
 static const VMStateDescription vmstate_milkymist_sysctl = {
@@ -319,9 +324,8 @@ static Property milkymist_sysctl_properties[] = {
 static void milkymist_sysctl_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = milkymist_sysctl_init;
+dc->realize = milkymist_sysctl_realize;
 dc->reset = milkymist_sysctl_reset;
 dc->vmsd = &vmstate_milkymist_sysctl;
 dc->props = milkymist_sysctl_properties;
@@ -331,6 +335,7 @@ static const TypeInfo milkymist_sysctl_info = {
 .name  = TYPE_MILKYMIST_SYSCTL,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(MilkymistSysctlState),
+.instance_init = milkymist_sysctl_init,
 .class_init= milkymist_sysctl_class_init,
 };
 
-- 
2.1.4





[Qemu-devel] [PATCH v4 9/9] hw/timer: QOM'ify tusb6010

2016-02-21 Thread xiaoqiang zhao
Move the majority of the old SysBus init's work into instance_init.

Note:
musb_init must be called in SysBus's init, otherwise it will
break "make check" with error message as follows:

qom/object.c:1576:object_get_canonical_path_component: assertion failed: (obj->parent != NULL)

Signed-off-by: xiaoqiang zhao 
---
 hw/timer/tusb6010.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/hw/timer/tusb6010.c b/hw/timer/tusb6010.c
index 9f6af90..2d6bdd3 100644
--- a/hw/timer/tusb6010.c
+++ b/hw/timer/tusb6010.c
@@ -776,21 +776,29 @@ static void tusb6010_reset(DeviceState *dev)
 musb_reset(s->musb);
 }
 
-static int tusb6010_init(SysBusDevice *sbd)
+static void tusb6010_init(Object *obj)
 {
-DeviceState *dev = DEVICE(sbd);
-TUSBState *s = TUSB(dev);
+DeviceState *dev = DEVICE(obj);
+TUSBState *s = TUSB(obj);
+SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
 
 s->otg_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, tusb_otg_tick, s);
 s->pwr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, tusb_power_tick, s);
-memory_region_init_io(&s->iomem[1], OBJECT(s), &tusb_async_ops, s,
+memory_region_init_io(&s->iomem[1], obj, &tusb_async_ops, s,
   "tusb-async", UINT32_MAX);
 sysbus_init_mmio(sbd, &s->iomem[0]);
 sysbus_init_mmio(sbd, &s->iomem[1]);
 sysbus_init_irq(sbd, &s->irq);
 qdev_init_gpio_in(dev, tusb6010_irq, musb_irq_max + 1);
-s->musb = musb_init(dev, 1);
-return 0;
+}
+
+static int tusb6010_bus_init(SysBusDevice *sbd)
+{
+   TUSBState *s = TUSB(sbd);
+   DeviceState *dev = DEVICE(sbd);
+
+   s->musb = musb_init(dev, 1);
+   return 0;
 }
 
 static void tusb6010_class_init(ObjectClass *klass, void *data)
@@ -798,7 +806,7 @@ static void tusb6010_class_init(ObjectClass *klass, void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = tusb6010_init;
+k->init = tusb6010_bus_init;
 dc->reset = tusb6010_reset;
 }
 
@@ -806,6 +814,7 @@ static const TypeInfo tusb6010_info = {
 .name  = TYPE_TUSB6010,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(TUSBState),
+.instance_init = tusb6010_init,
 .class_init= tusb6010_class_init,
 };
 
-- 
2.1.4





[Qemu-devel] [PATCH v4 3/9] hw/timer: QOM'ify lm32_timer

2016-02-21 Thread xiaoqiang zhao
* split the old SysBus init function into an instance_init
  and a Device realize function
* use DeviceClass::realize instead of SysBusDeviceClass::init

Reviewed-by: Peter Maydell 
Signed-off-by: xiaoqiang zhao 
---
 hw/timer/lm32_timer.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/timer/lm32_timer.c b/hw/timer/lm32_timer.c
index 3198355..e45a65b 100644
--- a/hw/timer/lm32_timer.c
+++ b/hw/timer/lm32_timer.c
@@ -176,21 +176,26 @@ static void timer_reset(DeviceState *d)
 ptimer_stop(s->ptimer);
 }
 
-static int lm32_timer_init(SysBusDevice *dev)
+static void lm32_timer_init(Object *obj)
 {
-LM32TimerState *s = LM32_TIMER(dev);
+LM32TimerState *s = LM32_TIMER(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
 sysbus_init_irq(dev, &s->irq);
 
 s->bh = qemu_bh_new(timer_hit, s);
 s->ptimer = ptimer_init(s->bh);
-ptimer_set_freq(s->ptimer, s->freq_hz);
 
-memory_region_init_io(&s->iomem, OBJECT(s), &timer_ops, s,
+memory_region_init_io(&s->iomem, obj, &timer_ops, s,
   "timer", R_MAX * 4);
 sysbus_init_mmio(dev, &s->iomem);
+}
 
-return 0;
+static void lm32_timer_realize(DeviceState *dev, Error **errp)
+{
+LM32TimerState *s = LM32_TIMER(dev);
+
+ptimer_set_freq(s->ptimer, s->freq_hz);
 }
 
 static const VMStateDescription vmstate_lm32_timer = {
@@ -213,9 +218,8 @@ static Property lm32_timer_properties[] = {
 static void lm32_timer_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = lm32_timer_init;
+dc->realize = lm32_timer_realize;
 dc->reset = timer_reset;
 dc->vmsd = &vmstate_lm32_timer;
 dc->props = lm32_timer_properties;
@@ -225,6 +229,7 @@ static const TypeInfo lm32_timer_info = {
 .name  = TYPE_LM32_TIMER,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(LM32TimerState),
+.instance_init = lm32_timer_init,
 .class_init= lm32_timer_class_init,
 };
 
-- 
2.1.4





[Qemu-devel] [PATCH v4 2/9] hw/timer: QOM'ify grlib_gptimer

2016-02-21 Thread xiaoqiang zhao
* split the old SysBus init function into an instance_init
  and a Device realize function
* use DeviceClass::realize instead of SysBusDeviceClass::init

Signed-off-by: xiaoqiang zhao 
---
 hw/timer/grlib_gptimer.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/hw/timer/grlib_gptimer.c b/hw/timer/grlib_gptimer.c
index dd000f5..02a19c9 100644
--- a/hw/timer/grlib_gptimer.c
+++ b/hw/timer/grlib_gptimer.c
@@ -348,16 +348,29 @@ static void grlib_gptimer_reset(DeviceState *d)
 }
 }
 
-static int grlib_gptimer_init(SysBusDevice *dev)
+static void grlib_gptimer_init(Object *obj)
 {
-GPTimerUnit  *unit = GRLIB_GPTIMER(dev);
-unsigned int  i;
+GPTimerUnit  *unit = GRLIB_GPTIMER(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
 assert(unit->nr_timers > 0);
 assert(unit->nr_timers <= GPTIMER_MAX_TIMERS);
 
 unit->timers = g_malloc0(sizeof unit->timers[0] * unit->nr_timers);
 
+memory_region_init_io(&unit->iomem, obj, &grlib_gptimer_ops,
+  unit, "gptimer",
+  UNIT_REG_SIZE + GPTIMER_REG_SIZE * unit->nr_timers);
+
+sysbus_init_mmio(dev, &unit->iomem);
+}
+
+static void grlib_gptimer_realize(DeviceState *dev, Error **errp)
+{
+GPTimerUnit  *unit = GRLIB_GPTIMER(dev);
+SysBusDevice *dev = SYS_BUS_DEVICE(dev);
+unsigned int  i;
+
 for (i = 0; i < unit->nr_timers; i++) {
 GPTimer *timer = &unit->timers[i];
 
@@ -371,13 +384,6 @@ static int grlib_gptimer_init(SysBusDevice *dev)
 
 ptimer_set_freq(timer->ptimer, unit->freq_hz);
 }
-
-memory_region_init_io(&unit->iomem, OBJECT(unit), &grlib_gptimer_ops,
-  unit, "gptimer",
-  UNIT_REG_SIZE + GPTIMER_REG_SIZE * unit->nr_timers);
-
-sysbus_init_mmio(dev, &unit->iomem);
-return 0;
 }
 
 static Property grlib_gptimer_properties[] = {
@@ -390,9 +396,8 @@ static Property grlib_gptimer_properties[] = {
 static void grlib_gptimer_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = grlib_gptimer_init;
+dc->realize = grlib_gptimer_realize;
 dc->reset = grlib_gptimer_reset;
 dc->props = grlib_gptimer_properties;
 }
@@ -401,6 +406,7 @@ static const TypeInfo grlib_gptimer_info = {
 .name  = TYPE_GRLIB_GPTIMER,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(GPTimerUnit),
+.instance_init = grlib_gptimer_init,
 .class_init= grlib_gptimer_class_init,
 };
 
-- 
2.1.4





Re: [Qemu-devel] [PATCH v2 1/1] quorum: Change vote rules for 64 bits hash

2016-02-21 Thread Changlong Xie

On 02/20/2016 10:28 PM, Max Reitz wrote:

On 19.02.2016 12:24, Alberto Garcia wrote:

On Fri 19 Feb 2016 09:26:53 AM CET, Wen Congyang  wrote:


If quorum has two children (A, B), and A flushes successfully but B
fails to flush, we MUST choose A as the winner rather than just picking
either of them. Otherwise the guest filesystem will become
read-only with the following errors:

end_request: I/O error, dev vda, sector 11159960
Aborting journal on device vda3-8
EXT4-fs error (device vda3): ext4_journal_start_sb:327: Detected abort journal
EXT4-fs (vda3): Remounting filesystem read-only


Hi Xie,

Let's see if I'm getting this right:

- When Quorum flushes to disk, there's a vote among the return values of
   the flush operations of its members, and the one that wins is the one
   that Quorum returns.

- If there's a tie then Quorum chooses the first result from the list of
   winners.

- With your patch you want to give priority to the vote with result == 0
   if there's any, so Quorum would return 0 (and succeed).

This seems to me like an ad-hoc fix for a particular use case. What
if you have 3 members and two of them fail with the same error code?
Would you still return 0 or the error code from the other two?


For example:
children.0 returns 0
children.1 returns -EIO
children.2 returns -EPIPE

In this case, quorum returns -EPIPE now(without this patch).

For example:
children.0 returns -EPIPE
children.1 returns -EIO
children.2 returns 0
In this case, quorum returns 0 now.


My question is: what's the rationale for returning 0 in case a) but not
in case b)?

   a)
 children.0 returns -EPIPE
 children.1 returns -EIO
 children.2 returns 0

   b)
 children.0 returns -EIO
 children.1 returns -EIO
 children.2 returns 0

In both cases you have one successful flush and two errors. You want to
return always 0 in case a) and always -EIO in case b). But the only
difference is that in case b) the errors happen to be the same, so why
does that matter?

That said, I'm not very convinced of the current logic of the Quorum
flush code either, so it's not even a problem with your patch... it
seems to me that the code should follow the same logic as in the
read/write case: if the number of correct flushes >= threshold then
return 0, else select the most common error code.


I'm not convinced of the logic either, which is why I waited for you to
respond to this patch. :-)

Intuitively, I'd expect Quorum to return an error if flushing failed for
any of the children, because, well, flushing failed. I somehow feel like
flushing is different from a read or write operation and therefore
ignoring the threshold would be fine here. However, maybe my intuition
is just off.

Anyway, regardless of that, if we do take the threshold into account, we
should not use the exact error value for voting but just whether an
error occurred or not. If all but one of the children fail to flush (all for
different reasons), I find it totally wrong to return success. We should
then just return -EIO or something.


Hi Berto & Max

Thanks for your comments. I'd like to summarize here. For the flush case:

1) If a flush succeeds (result >= 0), set result = 0; otherwise (result < 0)
set result = -EIO. Then invoke quorum_count_vote.

2) If the number of correct flushes >= threshold, mark the correct flushes
as the winner directly.

Will fix in next version.
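
The two-step rule summarized above can be sketched roughly as follows. This is only an illustration, not the actual block/quorum.c code: flush_vote() is a hypothetical helper, and returning -EIO whenever the threshold is not reached is an assumption about how the vote would come out once all errors have been collapsed to -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the real quorum code): decide the result of
 * a flush across n children, given a vote threshold.
 */
static int flush_vote(const int *results, size_t n, size_t threshold)
{
    size_t ok = 0;

    assert(threshold >= 1 && threshold <= n);

    for (size_t i = 0; i < n; i++) {
        /* Step 1: collapse every outcome to "success" or "failure". */
        if (results[i] >= 0) {
            ok++;
        }
    }

    /*
     * Step 2: enough successful flushes win outright. Otherwise every
     * remaining vote is the same normalized error (assumed: -EIO).
     */
    return ok >= threshold ? 0 : -EIO;
}
```

With children returning { -EPIPE, -EIO, 0 } and a threshold of 2, this sketch reports -EIO, because the two failures no longer count as different votes.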

Thanks
-Xie

Max







Re: [Qemu-devel] [RFC PATCH v2 05/10] net/colo-proxy: Add colo interface to use proxy

2016-02-21 Thread Zhang Chen



On 02/20/2016 03:58 AM, Dr. David Alan Gilbert wrote:

* Zhang Chen (zhangchen.f...@cn.fujitsu.com) wrote:

From: zhangchen 

Add interface used by migration/colo.c
so colo framework can work with proxy

Signed-off-by: zhangchen 
Signed-off-by: zhanghailiang 
---
  net/colo-proxy.c | 93 
  1 file changed, 93 insertions(+)

diff --git a/net/colo-proxy.c b/net/colo-proxy.c
index f448ee1..ba2bbe7 100644
--- a/net/colo-proxy.c
+++ b/net/colo-proxy.c
@@ -167,6 +167,11 @@ static int connection_key_equal(const void *opaque1, const 
void *opaque2)
  return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
  }
  
+bool colo_proxy_query_checkpoint(void)

+{
+return colo_do_checkpoint;
+}
+
  static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
   NetClientState *sender,
   unsigned flags,
@@ -203,6 +208,94 @@ static void colo_proxy_cleanup(NetFilterState *nf)
  qemu_event_destroy(>need_compare_ev);
  }
  
+static void colo_proxy_notify_checkpoint(void)

+{
+trace_colo_proxy("colo_proxy_notify_checkpoint");
+colo_do_checkpoint = true;
+}
+
+static void colo_proxy_start_one(NetFilterState *nf,
+  void *opaque, Error **errp)
+{
+COLOProxyState *s;
+int mode, ret;
+
+if (strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_COLO_PROXY)) {
+return;
+}
+
+mode = *(int *)opaque;
+s = FILTER_COLO_PROXY(nf);
+assert(s->colo_mode == mode);
+
+if (s->colo_mode == COLO_MODE_PRIMARY) {
+char thread_name[1024];
+
+ret = colo_proxy_connect(s);
+if (ret) {
+error_setg(errp, "colo proxy connect failed");
+return ;
+}
+
+s->status = COLO_PROXY_RUNNING;
+sprintf(thread_name, "proxy compare %s", nf->netdev_id);
+qemu_thread_create(>thread, thread_name,
+colo_proxy_compare_thread, s,
+QEMU_THREAD_JOINABLE);

Note most OSs have a ~14 character limit on the size of the thread
name, otherwise they ignore the request to set the name (and the
thread shows up as 'migration'), so I suggest keeping it as "proxy:%s".

Dave


I will fix it in the colo-compare module.

Thanks
zhangchen
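
As an illustration of the thread-name point above, here is a minimal sketch of building a short, bounded name. The helper name and the THREAD_NAME_MAX constant are assumptions made for this example; 16 bytes including the terminating NUL is the limit that Linux applies to thread names, and other systems may differ.

```c
#include <stdio.h>
#include <string.h>

/* Assumed limit: 15 visible characters plus the terminating NUL. */
#define THREAD_NAME_MAX 16

/*
 * Hypothetical helper: build "proxy:<netdev_id>" into buf. snprintf()
 * truncates instead of overflowing, so the result always fits in len.
 */
static void make_proxy_thread_name(char *buf, size_t len,
                                   const char *netdev_id)
{
    snprintf(buf, len, "proxy:%s", netdev_id);
}
```

A caller would pass a buffer of THREAD_NAME_MAX bytes rather than the 1024-byte buffer used in the patch; a long netdev id is then cut off rather than causing the whole name to be rejected.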


+} else {
+ret = colo_wait_incoming(s);
+if (ret) {
+error_setg(errp, "colo proxy wait incoming failed");
+return ;
+}
+s->status = COLO_PROXY_RUNNING;
+}
+}
+
+int colo_proxy_start(int mode)
+{
+Error *err = NULL;
+qemu_foreach_netfilter(colo_proxy_start_one, , );
+if (err) {
+return -1;
+}
+return 0;
+}
+
+static void colo_proxy_stop_one(NetFilterState *nf,
+  void *opaque, Error **errp)
+{
+COLOProxyState *s;
+int mode;
+
+if (strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_COLO_PROXY)) {
+return;
+}
+
+s = FILTER_COLO_PROXY(nf);
+mode = *(int *)opaque;
+assert(s->colo_mode == mode);
+
+s->status = COLO_PROXY_DONE;
+if (s->sockfd >= 0) {
+qemu_set_fd_handler(s->sockfd, NULL, NULL, NULL);
+closesocket(s->sockfd);
+}
+if (s->colo_mode == COLO_MODE_PRIMARY) {
+colo_proxy_primary_checkpoint(s);
+qemu_event_set(>need_compare_ev);
+qemu_thread_join(>thread);
+} else {
+colo_proxy_secondary_checkpoint(s);
+}
+}
+
+void colo_proxy_stop(int mode)
+{
+Error *err = NULL;
+qemu_foreach_netfilter(colo_proxy_stop_one, , );
+}
+
  static void colo_proxy_setup(NetFilterState *nf, Error **errp)
  {
  COLOProxyState *s = FILTER_COLO_PROXY(nf);
--
1.9.1





--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


.



--
Thanks
zhangchen






Re: [Qemu-devel] [RFC PATCH v2 03/10] Colo-proxy: add colo-proxy framework

2016-02-21 Thread Zhang Chen



On 02/20/2016 03:57 AM, Dr. David Alan Gilbert wrote:

* Zhang Chen (zhangchen.f...@cn.fujitsu.com) wrote:

From: zhangchen 

+static void colo_proxy_setup(NetFilterState *nf, Error **errp)
+{
+COLOProxyState *s = FILTER_COLO_PROXY(nf);
+
+if (!s->addr) {
+error_setg(errp, "filter colo_proxy needs 'addr' property set!");
+return;
+}
+
+if (nf->direction != NET_FILTER_DIRECTION_ALL) {
+error_setg(errp, "colo need queue all packet,"
+"please startup colo-proxy with queue=all\n");
+return;
+}
+
+s->sockfd = -1;
+s->hashtable_size = 0;
+colo_do_checkpoint = false;
+qemu_event_init(>need_compare_ev, false);
+
+s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);

I found that I had to be careful that this queue got flushed.  If the packet
can't be sent immediately, then the packet only gets sent if another
packet is added to the queue later.  I added a state change notifier to
flush it when the VM started running (this is more of a problem in my hybrid
mode case).

Note also that the queue is not protected by locks; so take care since packets
are sent from both the comparison thread and the colo thread (when it flushes)
and I think it's read by the main thread as well potentially as packets are
sent.

Dave
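
The flushing concern described above can be modelled with a toy queue. Everything here is hypothetical (PacketQueue, queue_send(), on_vm_state_change()) and is not QEMU's NetQueue API; it only shows why a packet queued while delivery is impossible stays stuck until something flushes the queue explicitly. The sketch is single-threaded; as the review notes, a real queue touched from several threads would also need a lock.

```c
#include <stdbool.h>
#include <stddef.h>

#define PENDING_MAX 8

/* Hypothetical stand-in for a net queue. */
typedef struct {
    int pending[PENDING_MAX];
    size_t n_pending;
    int delivered;       /* number of packets handed to the receiver */
    bool can_deliver;    /* false while the VM is stopped */
} PacketQueue;

/* Deliver everything that is queued, if the receiver accepts packets. */
static void queue_flush(PacketQueue *q)
{
    if (!q->can_deliver) {
        return;
    }
    q->delivered += (int)q->n_pending;
    q->n_pending = 0;
}

/* Queue a packet and try to deliver; returns false if the queue is full. */
static bool queue_send(PacketQueue *q, int pkt)
{
    if (q->n_pending == PENDING_MAX) {
        return false;
    }
    q->pending[q->n_pending++] = pkt;
    queue_flush(q);
    return true;
}

/* Hypothetical state-change hook: flush once the VM starts running. */
static void on_vm_state_change(PacketQueue *q, bool running)
{
    q->can_deliver = running;
    if (running) {
        queue_flush(q);
    }
}
```

Without the flush in on_vm_state_change(), a packet sent while the VM is stopped would only leave the queue when a later queue_send() happens to succeed.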



Hi, Dave.
Thanks for your review, I will pay attention to this problem in the
following modules.
We have also split colo-proxy into filter-mirror, filter-redirector,
filter-rewriter and colo-compare, following Jason's comments. For details,
please see the discussion about
"[RFC PATCH v2 00/10] Add colo-proxy based on netfilter". If you have time,
please review it.

Thanks
zhangchen


+colo_conn_hash = g_hash_table_new_full(connection_key_hash,
+   connection_key_equal,
+   g_free,
+   connection_destroy);
+g_queue_init(>conn_list);
+}
+
+static void colo_proxy_class_init(ObjectClass *oc, void *data)
+{
+NetFilterClass *nfc = NETFILTER_CLASS(oc);
+
+nfc->setup = colo_proxy_setup;
+nfc->cleanup = colo_proxy_cleanup;
+nfc->receive_iov = colo_proxy_receive_iov;
+}
+
+static int colo_proxy_get_mode(Object *obj, Error **errp)
+{
+COLOProxyState *s = FILTER_COLO_PROXY(obj);
+
+return s->colo_mode;
+}
+
+static void
+colo_proxy_set_mode(Object *obj, int mode, Error **errp)
+{
+COLOProxyState *s = FILTER_COLO_PROXY(obj);
+
+s->colo_mode = mode;
+}
+
+static char *colo_proxy_get_addr(Object *obj, Error **errp)
+{
+COLOProxyState *s = FILTER_COLO_PROXY(obj);
+
+return g_strdup(s->addr);
+}
+
+static void
+colo_proxy_set_addr(Object *obj, const char *value, Error **errp)
+{
+COLOProxyState *s = FILTER_COLO_PROXY(obj);
+g_free(s->addr);
+s->addr = g_strdup(value);
+if (!s->addr) {
+error_setg(errp, "colo_proxy needs 'addr'"
+ "property set!");
+return;
+}
+}
+
+static void colo_proxy_init(Object *obj)
+{
+object_property_add_enum(obj, "mode", "COLOMode", COLOMode_lookup,
+ colo_proxy_get_mode, colo_proxy_set_mode, NULL);
+object_property_add_str(obj, "addr", colo_proxy_get_addr,
+colo_proxy_set_addr, NULL);
+}
+
+static void colo_proxy_fini(Object *obj)
+{
+COLOProxyState *s = FILTER_COLO_PROXY(obj);
+g_free(s->addr);
+}
+
+static const TypeInfo colo_proxy_info = {
+.name = TYPE_FILTER_COLO_PROXY,
+.parent = TYPE_NETFILTER,
+.class_init = colo_proxy_class_init,
+.instance_init = colo_proxy_init,
+.instance_finalize = colo_proxy_fini,
+.instance_size = sizeof(COLOProxyState),
+};
+
+static void register_types(void)
+{
+type_register_static(_proxy_info);
+}
+
+type_init(register_types);
diff --git a/net/colo-proxy.h b/net/colo-proxy.h
new file mode 100644
index 000..affc117
--- /dev/null
+++ b/net/colo-proxy.h
@@ -0,0 +1,24 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Author: Zhang Chen 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+
+#ifndef QEMU_COLO_PROXY_H
+#define QEMU_COLO_PROXY_H
+
+int colo_proxy_start(int mode);
+void colo_proxy_stop(int mode);
+int colo_proxy_do_checkpoint(int mode);
+bool colo_proxy_query_checkpoint(void);
+
+#endif /* QEMU_COLO_PROXY_H */
--
1.9.1





--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


.



--
Thanks
zhangchen






Re: [Qemu-devel] [PATCH 1/3] exec: store RAMBlock pointer into memory region

2016-02-21 Thread Fam Zheng
On Sat, 02/20 10:35, Gonglei wrote:
> Each RAM memory region has a unique corresponding RAMBlock.
> In the current implementation, the memory region only stores
> the ram_addr, i.e. the offset into the RAM address space.
> We need to query the global ram.list to find the ram block
> by ram_addr if we want to get the ram block, which is very
> expensive.
> 
> Now, we store the RAMBlock pointer in the memory region
> structure. So, if we know the mr, we can easily get the
> RAMBlock.
> 
> Signed-off-by: Gonglei 
> ---
>  exec.c| 2 ++
>  include/exec/memory.h | 1 +
>  memory.c  | 1 +
>  3 files changed, 4 insertions(+)
> 
> diff --git a/exec.c b/exec.c
> index 1f24500..e29e369 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1717,6 +1717,8 @@ ram_addr_t qemu_ram_alloc_internal(ram_addr_t size, 
> ram_addr_t max_size,
>  error_propagate(errp, local_err);
>  return -1;
>  }
> +/* store the ram block pointer into memroy region */

The comment is superfluous IMHO, the code is quite self-explanatory.

> +mr->ram_block = new_block;
>  return addr;
>  }
>  
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index c92734a..23e2e3e 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -172,6 +172,7 @@ struct MemoryRegion {
>  bool global_locking;
>  uint8_t dirty_log_mask;
>  ram_addr_t ram_addr;
> +void *ram_block;   /* RAMBlock pointer */

Why not add

typedef struct RAMBlock RAMBlock;

then

RAMBlock *ram_block;

?
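
A minimal sketch of the suggestion above: a forward typedef lets the header hold a typed pointer without seeing the full struct definition. The MemoryRegion here is a trimmed-down stand-in, and the used_length field and region_used_length() helper are illustrative assumptions, not part of the patch.

```c
#include <stddef.h>

/* Forward declaration: the full definition can live in another file. */
typedef struct RAMBlock RAMBlock;

/* Trimmed-down stand-in for the real struct; only the new field matters. */
typedef struct MemoryRegion {
    RAMBlock *ram_block;    /* typed pointer instead of void * */
} MemoryRegion;

/* Illustrative definition; the field chosen here is an assumption. */
struct RAMBlock {
    size_t used_length;
};

/* Hypothetical accessor: no cast is needed because the pointer is typed. */
static size_t region_used_length(const MemoryRegion *mr)
{
    return mr->ram_block ? mr->ram_block->used_length : 0;
}
```

Compared with void *, the typed pointer lets the compiler reject accidental assignments from unrelated pointer types.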

>  Object *owner;
>  const MemoryRegionIOMMUOps *iommu_ops;
>  
> diff --git a/memory.c b/memory.c
> index 09041ed..b4451dd 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -912,6 +912,7 @@ void memory_region_init(MemoryRegion *mr,
>  }
>  mr->name = g_strdup(name);
>  mr->owner = owner;
> +mr->ram_block = NULL;
>  
>  if (name) {
>  char *escaped_name = memory_region_escape_name(name);
> -- 
> 1.8.5.2
> 
> 
> 



[Qemu-devel] [PATCH COLO-Frame v15 38/38] COLO: Add block replication into colo process

2016-02-21 Thread zhanghailiang
Make sure the master starts block replication only after the slave's
block replication has started.

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Li Zhijian 
Cc: Stefan Hajnoczi 
Cc: Kevin Wolf 
Cc: Max Reitz 
---
 migration/colo.c  | 48 
 migration/migration.c |  6 +-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/migration/colo.c b/migration/colo.c
index a2d489b..abb7b14 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -21,6 +21,7 @@
 #include "net/net.h"
 #include "net/filter.h"
 #include "net/vhost_net.h"
+#include "replication.h"
 
 static bool vmstate_loading;
 
@@ -63,6 +64,7 @@ static void secondary_vm_do_failover(void)
 {
 int old_state;
 MigrationIncomingState *mis = migration_incoming_get_current();
+Error *local_err = NULL;
 
 /* Can not do failover during the process of VM's loading VMstate, Or
   * it will break the secondary VM.
@@ -80,6 +82,11 @@ static void secondary_vm_do_failover(void)
 migrate_set_state(>state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
 
+replication_stop_all(true, _err);
+if (local_err) {
+error_report_err(local_err);
+}
+
 if (!autostart) {
 error_report("\"-S\" qemu option will be ignored in secondary side");
 /* recover runstate to normal migration finish state */
@@ -170,6 +177,11 @@ static void primary_vm_do_failover(void)
 }
 colo_flush_filter_packets(NULL);
 
+replication_stop_all(true, _err);
+if (local_err) {
+error_report_err(local_err);
+}
+
 /* Notify COLO thread that failover work is finished */
 qemu_sem_post(>colo_sem);
 }
@@ -329,6 +341,14 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
+/* we call this api although this may do nothing on primary side */
+qemu_mutex_lock_iothread();
+replication_do_checkpoint_all(_err);
+qemu_mutex_unlock_iothread();
+if (local_err) {
+goto out;
+}
+
 colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, _err);
 if (local_err) {
 goto out;
@@ -505,6 +525,13 @@ static void colo_process_checkpoint(MigrationState *s)
 }
 
 qemu_mutex_lock_iothread();
+/* start block replication */
+replication_start_all(REPLICATION_MODE_PRIMARY, _err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vm_start();
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("stop", "run");
@@ -600,6 +627,7 @@ static void colo_wait_handle_cmd(QEMUFile *f, int 
*checkpoint_request,
 case COLO_MESSAGE_GUEST_SHUTDOWN:
 qemu_mutex_lock_iothread();
 vm_stop_force_state(RUN_STATE_COLO);
+replication_stop_all(false, NULL);
 qemu_system_shutdown_request_core();
 qemu_mutex_unlock_iothread();
 /* the main thread will exit and terminate the whole
@@ -669,6 +697,14 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+qemu_mutex_lock_iothread();
+/* start block replication */
+replication_start_all(REPLICATION_MODE_SECONDARY, _err);
+qemu_mutex_unlock_iothread();
+if (local_err) {
+goto out;
+}
+
 colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
  _err);
 if (local_err) {
@@ -746,6 +782,18 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+replication_get_error_all(_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+/* discard colo disk buffer */
+replication_do_checkpoint_all(_err);
+if (local_err) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
+
 vmstate_loading = false;
 qemu_mutex_unlock_iothread();
 
diff --git a/migration/migration.c b/migration/migration.c
index 324dcb6..068edb0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1591,7 +1591,11 @@ static void migration_completion(MigrationState *s, int 
current_active_state,
 
 if (!ret) {
 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-if (ret >= 0) {
+/*
+* Don't mark image with BDRV_O_INACTIVE flag if
+* we will go into COLO stage later.
+*/
+if (ret >= 0 && !migrate_colo_enabled()) {
 ret = bdrv_inactivate_all();
 }
 if (ret >= 0) {
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 36/38] filter-buffer: make filter_buffer_flush() public

2016-02-21 Thread zhanghailiang
We will use it in COLO to flush the buffered packets.

Signed-off-by: zhanghailiang 
Cc: Jason Wang 
Cc: Yang Hongyang 
---
v14:
- New patch
---
 include/net/filter.h | 2 ++
 net/filter-buffer.c  | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/filter.h b/include/net/filter.h
index faccedd..8ffd53b 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -76,4 +76,6 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
 int iovcnt,
 void *opaque);
 
+void filter_buffer_flush(NetFilterState *nf);
+
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 34dc312..91ddd68 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -27,7 +27,7 @@ typedef struct FilterBufferState {
 QEMUTimer release_timer;
 } FilterBufferState;
 
-static void filter_buffer_flush(NetFilterState *nf)
+void filter_buffer_flush(NetFilterState *nf)
 {
 FilterBufferState *s = FILTER_BUFFER(nf);
 
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 13/38] COLO: Load VMState into qsb before restore it

2016-02-21 Thread zhanghailiang
We should not destroy the state of the SVM (Secondary VM) until we receive the
whole state from the PVM (Primary VM), in case the primary fails in the middle
of sending the state. So here we cache the device state on the Secondary before
restoring it.

Besides, we should call qemu_system_reset() before loading the VM state,
which ensures the data is intact.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
Cc: Dr. David Alan Gilbert 
---
v13:
- Fix the define of colo_get_cmd_value() to use 'Error **errp' instead of
  return value.
v12:
- Use the new helper colo_get_cmd_value() instead of colo_ctl_get()
---
 migration/colo.c | 74 ++--
 1 file changed, 72 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 57a1132..b9f60c7 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -114,6 +114,28 @@ static void colo_get_check_cmd(QEMUFile *f, COLOMessage 
expect_cmd,
 }
 }
 
+static uint64_t colo_get_cmd_value(QEMUFile *f, uint32_t expect_cmd,
+   Error **errp)
+{
+Error *local_err = NULL;
+uint64_t value;
+int ret;
+
+colo_get_check_cmd(f, expect_cmd, _err);
+if (local_err) {
+error_propagate(errp, local_err);
+return 0;
+}
+
+value = qemu_get_be64(f);
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to get value for COLO command: %s",
+ COLOMessage_lookup[expect_cmd]);
+}
+return value;
+}
+
 static int colo_do_checkpoint_transaction(MigrationState *s,
   QEMUSizedBuffer *buffer)
 {
@@ -297,6 +319,10 @@ static void colo_wait_handle_cmd(QEMUFile *f, int 
*checkpoint_request,
 void *colo_process_incoming_thread(void *opaque)
 {
 MigrationIncomingState *mis = opaque;
+QEMUFile *fb = NULL;
+QEMUSizedBuffer *buffer = NULL; /* Cache incoming device state */
+uint64_t total_size;
+uint64_t value;
 Error *local_err = NULL;
 int ret;
 
@@ -320,6 +346,12 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+if (buffer == NULL) {
+error_report("Failed to allocate colo buffer!");
+goto out;
+}
+
 colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
  _err);
 if (local_err) {
@@ -347,7 +379,21 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
-/* TODO: read migration data into colo buffer */
+/* read the VM state total size first */
+value = colo_get_cmd_value(mis->from_src_file,
+ COLO_MESSAGE_VMSTATE_SIZE, _err);
+if (local_err) {
+goto out;
+}
+
+/* read vm device state into colo buffer */
+total_size = qsb_fill_buffer(buffer, mis->from_src_file, value);
+if (total_size != value) {
+error_report("Got %lu VMState data, less than expected %lu",
+ total_size, value);
+ret = -EINVAL;
+goto out;
+}
 
 colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED,
  _err);
@@ -355,13 +401,32 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
-/* TODO: load vm state */
+/* open colo buffer for read */
+fb = qemu_bufopen("r", buffer);
+if (!fb) {
+error_report("Can't open colo buffer for read");
+goto out;
+}
+
+qemu_mutex_lock_iothread();
+qemu_system_reset(VMRESET_SILENT);
+if (qemu_loadvm_state(fb) < 0) {
+error_report("COLO: loadvm failed");
+qemu_mutex_unlock_iothread();
+goto out;
+}
+qemu_mutex_unlock_iothread();
+
+/* TODO: flush vm state */
 
 colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
  _err);
 if (local_err) {
 goto out;
 }
+
+qemu_fclose(fb);
+fb = NULL;
 }
 
 out:
@@ -370,6 +435,11 @@ out:
 error_report_err(local_err);
 }
 
+if (fb) {
+qemu_fclose(fb);
+}
+qsb_free(buffer);
+
 qemu_mutex_lock_iothread();
 colo_release_ram_cache();
 qemu_mutex_unlock_iothread();
-- 
1.8.3.1
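
The idea in the commit message above (cache the complete incoming state first, and only then touch the live state) can be sketched in isolation. This is a simplified model under stated assumptions: DeviceState, STATE_MAX and load_state_staged() are made up for the example and do not correspond to QEMU's qsb_* or qemu_loadvm_state() functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical live state; the size is arbitrary. */
#define STATE_MAX 64

typedef struct {
    unsigned char data[STATE_MAX];
    size_t len;
} DeviceState;

/*
 * Apply an incoming state only if all `expected` bytes arrived.
 * `received` models how many bytes the stream delivered before it ended.
 */
static bool load_state_staged(DeviceState *live, const unsigned char *stream,
                              size_t received, size_t expected)
{
    unsigned char staging[STATE_MAX];

    if (expected > STATE_MAX || received < expected) {
        return false;    /* short read: the live state is left untouched */
    }

    memcpy(staging, stream, expected);        /* stage 1: cache it all */
    memcpy(live->data, staging, expected);    /* stage 2: commit */
    live->len = expected;
    return true;
}
```

If the sender dies halfway through, the function returns false and the previous state is still intact, which is the property the patch aims for on the Secondary.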





[Qemu-devel] [PATCH COLO-Frame v15 27/38] migration/savevm: Add new helpers to process the different stages of loadvm

2016-02-21 Thread zhanghailiang
There are several stages in the loadvm process. In each stage, the
incoming migration processes a different section.
We want to control these stages more precisely, to optimize the COLO
capability.

Here we add two new helper functions: qemu_loadvm_state_begin()
and qemu_load_device_state().
Besides, we make qemu_loadvm_state_main() API public.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
v14:
- Split from patch 'COLO: Separate the process of saving/loading
  ram and device state'
---
 include/sysemu/sysemu.h |  3 +++
 migration/savevm.c  | 38 +++---
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 91eeda3..c0694a1 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -134,6 +134,9 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, 
const char *name,
uint64_t *length_list);
 
 int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state_begin(QEMUFile *f);
+int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
+int qemu_load_device_state(QEMUFile *f);
 
 typedef enum DisplayType
 {
diff --git a/migration/savevm.c b/migration/savevm.c
index 9e3c18a..954e0a7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1249,8 +1249,6 @@ enum LoadVMExitCodes {
 LOADVM_QUIT =  1,
 };
 
-static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
-
 /* -- incoming postcopy messages -- */
 /* 'advise' arrives before any transfers just to tell us that a postcopy
  * *might* happen - it might be skipped if precopy transferred everything
@@ -1832,7 +1830,7 @@ qemu_loadvm_section_part_end(QEMUFile *f, 
MigrationIncomingState *mis)
 return 0;
 }
 
-static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
+int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
 uint8_t section_type;
 int ret;
@@ -1965,6 +1963,40 @@ int qemu_loadvm_state(QEMUFile *f)
 return ret;
 }
 
+int qemu_loadvm_state_begin(QEMUFile *f)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+Error *local_err = NULL;
+int ret;
+
+if (qemu_savevm_state_blocked(_err)) {
+error_report_err(local_err);
+return -EINVAL;
+}
+/* Load QEMU_VM_SECTION_START section */
+ret = qemu_loadvm_state_main(f, mis);
+if (ret < 0) {
+error_report("Failed to loadvm begin work: %d", ret);
+}
+return ret;
+}
+
+int qemu_load_device_state(QEMUFile *f)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+int ret;
+
+/* Load QEMU_VM_SECTION_FULL section */
+ret = qemu_loadvm_state_main(f, mis);
+if (ret < 0) {
+error_report("Failed to load device state: %d", ret);
+return ret;
+}
+
+cpu_synchronize_all_post_init();
+return 0;
+}
+
 void hmp_savevm(Monitor *mon, const QDict *qdict)
 {
 BlockDriverState *bs, *bs1;
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 35/38] COLO: manage the status of buffer filters for PVM

2016-02-21 Thread zhanghailiang
Enable all buffer filters that were added by COLO when
entering the COLO process, and disable them when exiting COLO.

Signed-off-by: zhanghailiang 
Cc: Jason Wang 
Cc: Yang Hongyang 
---
v15:
- Re-implement colo_set_filter_status() based on COLOBufferFilters list.
- Fix the title of this patch
---
 migration/colo.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index bbff4e8..4c39204 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -113,10 +113,22 @@ static void secondary_vm_do_failover(void)
 }
 }
 
+static void colo_set_filter_status(const char *status, Error **errp)
+{
+struct COLOListNode *e, *next;
+NetFilterState *nf;
+
+QLIST_FOREACH_SAFE(e, , node, next) {
+nf = e->opaque;
+object_property_set_str(OBJECT(nf), status, "status", errp);
+}
+}
+
 static void primary_vm_do_failover(void)
 {
 MigrationState *s = migrate_get_current();
 int old_state;
+Error *local_err = NULL;
 
 migrate_set_state(>state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
@@ -140,6 +152,12 @@ static void primary_vm_do_failover(void)
  old_state);
 return;
 }
+
+colo_set_filter_status("disable", _err);
+if (local_err) {
+error_report_err(local_err);
+}
+
 /* Notify COLO thread that failover work is finished */
 qemu_sem_post(>colo_sem);
 }
@@ -440,6 +458,11 @@ static void colo_process_checkpoint(MigrationState *s)
 
 failover_init_state();
 
+colo_set_filter_status("enable", _err);
+if (local_err) {
+goto out;
+}
+
 s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
 if (!s->rp_state.from_dst_file) {
 error_report("Open QEMUFile from_dst_file failed");
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 28/38] migration/savevm: Export two helper functions for savevm process

2016-02-21 Thread zhanghailiang
We add a new helper function, qemu_savevm_live_state(),
and make qemu_save_device_state() public.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
v14:
- New patch split from previous
 'COLO: Separate the process of saving/loading ram and device state'
---
 include/sysemu/sysemu.h |  3 +++
 migration/savevm.c  | 15 +++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index c0694a1..7b1748c 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -133,6 +133,9 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, 
const char *name,
uint64_t *start_list,
uint64_t *length_list);
 
+void qemu_savevm_live_state(QEMUFile *f);
+int qemu_save_device_state(QEMUFile *f);
+
 int qemu_loadvm_state(QEMUFile *f);
 int qemu_loadvm_state_begin(QEMUFile *f);
 int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
diff --git a/migration/savevm.c b/migration/savevm.c
index 954e0a7..60c7b57 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1192,13 +1192,20 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
 return ret;
 }
 
-static int qemu_save_device_state(QEMUFile *f)
+void qemu_savevm_live_state(QEMUFile *f)
 {
-SaveStateEntry *se;
+/* save QEMU_VM_SECTION_END section */
+qemu_savevm_state_complete_precopy(f, true);
+qemu_put_byte(f, QEMU_VM_EOF);
+}
 
-qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+int qemu_save_device_state(QEMUFile *f)
+{
+SaveStateEntry *se;
 
+if (!migration_in_colo_state()) {
+qemu_savevm_state_header(f);
+}
 cpu_synchronize_all_states();
 
 QTAILQ_FOREACH(se, _state.handlers, entry) {
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 16/38] COLO: synchronize PVM's state to SVM periodically

2016-02-21 Thread zhanghailiang
Do a checkpoint periodically; the default interval is 200ms.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v12:
- Add Reviewed-by tag
v11:
- Fix wrong sleep time for checkpoint period. (Dave's comment)
---
 migration/colo.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 473fb14..ba3b310 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -11,6 +11,7 @@
  */
 
 #include 
+#include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
@@ -230,6 +231,7 @@ out:
 static void colo_process_checkpoint(MigrationState *s)
 {
 QEMUSizedBuffer *buffer = NULL;
+int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 Error *local_err = NULL;
 int ret;
 
@@ -261,11 +263,21 @@ static void colo_process_checkpoint(MigrationState *s)
 trace_colo_vm_state_change("stop", "run");
 
 while (s->state == MIGRATION_STATUS_COLO) {
+current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+if (current_time - checkpoint_time <
+s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
+int64_t delay_ms;
+
+delay_ms = s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] -
+   (current_time - checkpoint_time);
+g_usleep(delay_ms * 1000);
+}
 /* start a colo checkpoint */
 ret = colo_do_checkpoint_transaction(s, buffer);
 if (ret < 0) {
 goto out;
 }
+checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 }
 
 out:
-- 
1.8.3.1
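
The pacing logic in the loop above boils down to one small computation, shown here as a standalone sketch. checkpoint_delay_ms() is a hypothetical helper name; the patch performs the same arithmetic inline before calling g_usleep().

```c
#include <stdint.h>

/*
 * Hypothetical helper: how long to sleep (in ms) before the next
 * checkpoint, given the current time, the time the last checkpoint
 * finished, and the configured checkpoint period.
 */
static int64_t checkpoint_delay_ms(int64_t now_ms,
                                   int64_t last_checkpoint_ms,
                                   int64_t period_ms)
{
    int64_t elapsed = now_ms - last_checkpoint_ms;

    /* Sleep only for the remainder of the period, never a negative time. */
    return elapsed < period_ms ? period_ms - elapsed : 0;
}
```

For example, with a 200 ms period and 50 ms already elapsed since the last checkpoint, the remaining delay is 150 ms; once 200 ms or more have passed, the next checkpoint starts immediately.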





[Qemu-devel] [PATCH COLO-Frame v15 21/38] qmp event: Add COLO_EXIT event to notify users while exited from COLO

2016-02-21 Thread zhanghailiang
If some errors happen during the VM's COLO FT stage, it's important to notify
the users of this event. Together with 'x_colo_lost_heartbeat', users can
intervene in COLO's failover work immediately.
If users don't want to get involved in COLO's failover verdict,
it is still necessary to notify users that we exited COLO mode.

Cc: Markus Armbruster 
Cc: Michael Roth 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
v13:
- Remove optional 'error' string for this event.
  (I doubted it was useful for users, since users shouldn't
   interpret it and can't depend on it to decide what happened
   exactly. Besides it is really hard to organize.)
- Remove unused 'unknown' member for enum COLOExitReason.
 (Eric's suggestion)
- Fix comment for COLO_EXIT
v11:
- Fix several typos found by Eric
---
 docs/qmp-events.txt | 16 
 migration/colo.c| 20 
 qapi-schema.json| 14 ++
 qapi/event.json | 15 +++
 4 files changed, 65 insertions(+)

diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
index 52eb7e2..b6e8937 100644
--- a/docs/qmp-events.txt
+++ b/docs/qmp-events.txt
@@ -184,6 +184,22 @@ Example:
 Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
 event.
 
+COLO_EXIT
+-
+
+Emitted when VM finishes COLO mode due to some errors happening or
+at the request of users.
+
+Data:
+
+ - "mode": COLO mode, primary or secondary side (json-string)
+ - "reason": the exit reason, internal error or external request. (json-string)
+
+Example:
+
+{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
+ "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
+
 DEVICE_DELETED
 --
 
diff --git a/migration/colo.c b/migration/colo.c
index a65b22b..814480c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -17,6 +17,7 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "migration/failover.h"
+#include "qapi-event.h"
 
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
@@ -367,6 +368,18 @@ out:
 if (local_err) {
 error_report_err(local_err);
 }
+/*
+* There are only two reasons we can go here, something error happened,
+* Or users triggered failover.
+*/
+if (!failover_request_is_active()) {
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+  COLO_EXIT_REASON_ERROR, NULL);
+} else {
+qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+  COLO_EXIT_REASON_REQUEST, NULL);
+}
+
 qsb_free(buffer);
 buffer = NULL;
 
@@ -530,6 +543,13 @@ out:
 if (local_err) {
 error_report_err(local_err);
 }
+if (!failover_request_is_active()) {
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+  COLO_EXIT_REASON_ERROR, NULL);
+} else {
+qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+  COLO_EXIT_REASON_REQUEST, NULL);
+}
 
 if (fb) {
 qemu_fclose(fb);
diff --git a/qapi-schema.json b/qapi-schema.json
index 73325ed..7fec696 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -776,6 +776,20 @@
   'data': [ 'unknown', 'primary', 'secondary'] }
 
 ##
+# @COLOExitReason
+#
+# The reason for a COLO exit
+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 2.6
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'request', 'error' ] }
+
+##
 # @x-colo-lost-heartbeat
 #
 # Tell qemu that heartbeat is lost, request it to do takeover procedures.
diff --git a/qapi/event.json b/qapi/event.json
index 390fd45..cfcc887 100644
--- a/qapi/event.json
+++ b/qapi/event.json
@@ -268,6 +268,21 @@
   'data': { 'pass': 'int' } }
 
 ##
+# @COLO_EXIT
+#
+# Emitted when VM finishes COLO mode due to some errors happening or
+# at the request of users.
+#
+# @mode: which COLO mode the VM was in when it exited.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# Since: 2.6
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
+
+##
 # @ACPI_DEVICE_OST
 #
 # Emitted when guest executes ACPI _OST method.
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 37/38] COLO: flush buffered packets in checkpoint process or exit COLO

2016-02-21 Thread zhanghailiang
In COLO periodic mode, packets sent by the VM must not be released
during the interval between two checkpoints. We release all of the
buffered packets after the checkpoint process completes, before the
VM is resumed.

In this way, we avoid breaking network services if COLO goes into
the failover process.
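
The hold-then-flush behaviour described above can be sketched in plain C.
This is only an illustration under stated assumptions: the SketchQueue type
and the sketch_* functions are hypothetical and are not part of QEMU's
filter-buffer code, and packets are modelled as plain integers.

```c
#include <stddef.h>

#define SKETCH_QUEUE_MAX 16

/* Hypothetical fixed-size packet queue; a packet is just an int here. */
typedef struct SketchQueue {
    int packets[SKETCH_QUEUE_MAX];
    size_t len;
} SketchQueue;

typedef void SketchDeliver(int packet, void *opaque);

/* Hold a packet back. Returns 0 on success, -1 if the queue is full. */
int sketch_buffer_packet(SketchQueue *q, int packet)
{
    if (q->len >= SKETCH_QUEUE_MAX) {
        return -1;
    }
    q->packets[q->len++] = packet;
    return 0;
}

/* Release every held packet in arrival order; return how many were sent. */
size_t sketch_flush(SketchQueue *q, SketchDeliver *deliver, void *opaque)
{
    size_t released = q->len;

    for (size_t i = 0; i < released; i++) {
        deliver(q->packets[i], opaque);
    }
    q->len = 0;
    return released;
}

/* Example delivery callback: add each released packet to a running sum. */
void sketch_sum(int packet, void *opaque)
{
    *(int *)opaque += packet;
}
```

Nothing reaches the delivery callback until sketch_flush() runs, which
corresponds to releasing the buffered packets only once a checkpoint (or a
failover) has completed.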

Signed-off-by: zhanghailiang 
Cc: Jason Wang 
Cc: Yang Hongyang 
---
v15:
- Re-implement colo_flush_filter_packets() based on COLOBufferFilters list
---
 migration/colo.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 4c39204..a2d489b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -124,6 +124,17 @@ static void colo_set_filter_status(const char *status, Error **errp)
 }
 }
 
+static void colo_flush_filter_packets(Error **errp)
+{
+struct COLOListNode *e, *next;
+NetFilterState *nf;
+
+QLIST_FOREACH_SAFE(e, &COLOBufferFilters, node, next) {
+nf = e->opaque;
+filter_buffer_flush(nf);
+}
+}
+
 static void primary_vm_do_failover(void)
 {
 MigrationState *s = migrate_get_current();
@@ -157,6 +168,7 @@ static void primary_vm_do_failover(void)
 if (local_err) {
 error_report_err(local_err);
 }
+colo_flush_filter_packets(NULL);
 
 /* Notify COLO thread that failover work is finished */
 qemu_sem_post(&s->colo_sem);
@@ -364,6 +376,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
 if (local_err) {
 goto out;
 }
+/* FIXME: Remove this after switch to use colo-proxy */
+colo_flush_filter_packets(NULL);
 
 if (colo_shutdown_requested) {
 colo_put_cmd(s->to_dst_file, COLO_MESSAGE_GUEST_SHUTDOWN, &local_err);
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 26/38] savevm: Introduce two helper functions for save/find loadvm_handlers entry

2016-02-21 Thread zhanghailiang
For COLO's checkpoint process, we do savevm/loadvm repeatedly.
Every time we call qemu_loadvm_section_start_full(), we add the
information for every section to the loadvm_handlers list again.
The list then holds many entries for the same section, which
leads to a memory leak.

We need to check whether the section info is already in the
loadvm_handlers list before saving it. For normal migration, this
check is harmless.
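
The find-before-insert idea can be sketched outside of QEMU as follows.
This is a minimal sketch under assumptions: SketchEntry and the sketch_*
functions are hypothetical stand-ins for LoadStateEntry and the two helpers
introduced by the patch, and a hand-rolled singly linked list replaces the
QLIST macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for LoadStateEntry; only the fields needed here. */
typedef struct SketchEntry {
    uint32_t section_id;
    uint32_t version_id;
    struct SketchEntry *next;
} SketchEntry;

/* Return the entry for section_id, or NULL if it was never recorded. */
SketchEntry *sketch_find(SketchEntry *head, uint32_t section_id)
{
    for (SketchEntry *e = head; e != NULL; e = e->next) {
        if (e->section_id == section_id) {
            return e;
        }
    }
    return NULL;
}

/* Record a section only on first sight, so repeated loads do not grow the list. */
SketchEntry *sketch_find_or_add(SketchEntry **head, uint32_t section_id,
                                uint32_t version_id)
{
    SketchEntry *e = sketch_find(*head, section_id);

    if (e != NULL) {
        return e;
    }
    e = calloc(1, sizeof(*e));
    if (e == NULL) {
        return NULL;
    }
    e->section_id = section_id;
    e->version_id = version_id;
    e->next = *head;    /* insert at the head of the list */
    *head = e;
    return e;
}

/* Count the entries, to show that the list stays bounded. */
size_t sketch_count(const SketchEntry *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next) {
        n++;
    }
    return n;
}
```

Calling sketch_find_or_add() once per checkpoint for the same section_id
leaves the list at one entry per section, which is the property the commit
message is after.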

Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
v14:
- Add Reviewed-by tag
-
v13:
- New patch
---
 migration/savevm.c | 56 ++
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 94f2894..9e3c18a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1718,6 +1718,37 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
 }
 }
 
+static LoadStateEntry *loadvm_save_section_entry(MigrationIncomingState *mis,
+ SaveStateEntry *se,
+ uint32_t section_id,
+ uint32_t version_id)
+{
+LoadStateEntry *le;
+
+/* Add entry */
+le = g_malloc0(sizeof(*le));
+
+le->se = se;
+le->section_id = section_id;
+le->version_id = version_id;
+QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
+return le;
+}
+
+static LoadStateEntry *loadvm_find_section_entry(MigrationIncomingState *mis,
+ uint32_t section_id)
+{
+LoadStateEntry *le;
+
+QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
+if (le->section_id == section_id) {
+break;
+}
+}
+
+return le;
+}
+
 static int
 qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
 {
@@ -1753,16 +1784,12 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
  version_id, idstr, se->version_id);
 return -EINVAL;
 }
-
-/* Add entry */
-le = g_malloc0(sizeof(*le));
-
-le->se = se;
-le->section_id = section_id;
-le->version_id = version_id;
-QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
-
-ret = vmstate_load(f, le->se, le->version_id);
+ /* Check if we have saved this section info before, if not, save it */
+le = loadvm_find_section_entry(mis, section_id);
+if (!le) {
+le = loadvm_save_section_entry(mis, se, section_id, version_id);
+}
+ret = vmstate_load(f, se, version_id);
 if (ret < 0) {
 error_report("error while loading state for instance 0x%x of"
  " device '%s'", instance_id, idstr);
@@ -1785,12 +1812,9 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
 section_id = qemu_get_be32(f);
 
 trace_qemu_loadvm_state_section_partend(section_id);
-QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
-if (le->section_id == section_id) {
-break;
-}
-}
-if (le == NULL) {
+
+le = loadvm_find_section_entry(mis, section_id);
+if (!le) {
 error_report("Unknown savevm section %d", section_id);
 return -EINVAL;
 }
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 33/38] net: Add notifier/callback for netdev init

2016-02-21 Thread zhanghailiang
Add a notifier list for netdev initialization, on which callbacks
can be registered. COLO will use it to register a callback that
adds a buffer filter to each netdev.
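
The register-then-notify pattern can be sketched in standalone C as below.
It is an illustration only: the Sketch* types and sketch_* functions are
hypothetical names mirroring the shape of netdev_init_add_handler() and
netdev_init_notify(), with a plain linked list in place of QLIST.

```c
#include <stdlib.h>

typedef void SketchInitHandler(const char *netdev_id, void *opaque);

/* One registered callback plus the opaque pointer handed back to it. */
typedef struct SketchInitEntry {
    SketchInitHandler *cb;
    void *opaque;
    struct SketchInitEntry *next;
} SketchInitEntry;

/* Head of the handler list; empty (NULL) until something registers. */
SketchInitEntry *sketch_init_head;

/* Register a callback; returns the new entry, or NULL on allocation failure. */
SketchInitEntry *sketch_init_add_handler(SketchInitHandler *cb, void *opaque)
{
    SketchInitEntry *e = calloc(1, sizeof(*e));

    if (e == NULL) {
        return NULL;
    }
    e->cb = cb;
    e->opaque = opaque;
    e->next = sketch_init_head;
    sketch_init_head = e;
    return e;
}

/* Call every registered handler for a newly initialized netdev. */
void sketch_init_notify(const char *netdev_id)
{
    SketchInitEntry *e, *next;

    for (e = sketch_init_head; e != NULL; e = next) {
        next = e->next;    /* read first, in case a handler unlinks e */
        e->cb(netdev_id, e->opaque);
    }
}

/* Example handler: count how many netdevs were announced. */
void sketch_count_netdev(const char *netdev_id, void *opaque)
{
    (void)netdev_id;
    ++*(int *)opaque;
}
```

A subsystem registers once at startup and is then called back for each
netdev as it is created, without net.c needing to know who is listening.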

Signed-off-by: zhanghailiang 
Cc: Jason Wang 
Cc: Yang Hongyang 
---
v14:
- New patch
---
 include/net/net.h |  4 
 net/net.c | 33 +
 2 files changed, 37 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 73e4c46..f6f0194 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -176,6 +176,10 @@ struct NICInfo {
 int nvectors;
 };
 
+typedef struct netdev_init_entry NetdevInitEntry;
+typedef void NetdevInitHandler(const char *netdev_id, void *opaque);
+NetdevInitEntry *netdev_init_add_handler(NetdevInitHandler *cb, void *opaque);
+
 extern int nb_nics;
 extern NICInfo nd_table[MAX_NICS];
 extern int default_net;
diff --git a/net/net.c b/net/net.c
index aebf753..bdd3e7b 100644
--- a/net/net.c
+++ b/net/net.c
@@ -55,6 +55,14 @@
 static VMChangeStateEntry *net_change_state_entry;
 static QTAILQ_HEAD(, NetClientState) net_clients;
 
+struct netdev_init_entry {
+NetdevInitHandler *cb;
+void *opaque;
+QLIST_ENTRY(netdev_init_entry) entries;
+};
+
+static QLIST_HEAD(netdev_init_head, netdev_init_entry)netdev_init_head;
+
 const char *host_net_devices[] = {
 "tap",
 "socket",
@@ -953,6 +961,26 @@ static int net_init_nic(const NetClientOptions *opts, const char *name,
 return idx;
 }
 
+NetdevInitEntry *netdev_init_add_handler(NetdevInitHandler *cb, void *opaque)
+{
+NetdevInitEntry *e;
+
+e = g_malloc0(sizeof(*e));
+
+e->cb = cb;
+e->opaque = opaque;
+QLIST_INSERT_HEAD(&netdev_init_head, e, entries);
+return e;
+}
+
+static void netdev_init_notify(const char *netdev_id)
+{
+NetdevInitEntry *e, *next;
+
+QLIST_FOREACH_SAFE(e, &netdev_init_head, entries, next) {
+e->cb(netdev_id, e->opaque);
+}
+}
 
 static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND__MAX])(
 const NetClientOptions *opts,
@@ -1039,6 +1067,11 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
 }
 return -1;
 }
+if (is_netdev) {
+const Netdev *netdev = object;
+
+netdev_init_notify(netdev->id);
+}
 return 0;
 }
 
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 25/38] COLO: Update the global runstate after going into colo state

2016-02-21 Thread zhanghailiang
If we start qemu with -S, the runstate will change from 'prelaunch' to 'running'
after going into colo state.
So it is necessary to update the global runstate after going into colo state.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v13:
- Add Reviewed-by tag
---
 migration/colo.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 855edee..16bada6 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -397,6 +397,11 @@ static void colo_process_checkpoint(MigrationState *s)
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("stop", "run");
 
+ret = global_state_store();
+if (ret < 0) {
+goto out;
+}
+
 while (s->state == MIGRATION_STATUS_COLO) {
 if (failover_request_is_active()) {
 error_report("failover request");
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 34/38] COLO/filter: add each netdev a buffer filter

2016-02-21 Thread zhanghailiang
For COLO periodic mode, we need to buffer the packets sent by
the VM, and we do not release these packets until a checkpoint
has finished.

Here, we add to each netdev a buffer filter that is controlled
by COLO. It is disabled by default, so packets do not pass
through these filters. If users don't enable COLO when configuring
qemu, these buffer filters are not added.

Signed-off-by: zhanghailiang 
Cc: Jason Wang 
Cc: Yang Hongyang 
---
v15:
- call object_new_with_props() directly to add filter in
  colo_add_buffer_filter. (Jason's suggestion)
v14:
- New patch
---
 include/migration/colo.h |  2 ++
 include/net/filter.h |  2 ++
 migration/colo-comm.c|  5 +
 migration/colo.c | 49 
 net/filter-buffer.c  |  2 --
 stubs/migration-colo.c   |  4 
 6 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 919b135..22b92c9 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -37,4 +37,6 @@ COLOMode get_colo_mode(void);
 void colo_do_failover(MigrationState *s);
 
 bool colo_shutdown(void);
+void colo_add_buffer_filter(const char *netdev_id, void *opaque);
+
 #endif
diff --git a/include/net/filter.h b/include/net/filter.h
index af3c53c..faccedd 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -22,6 +22,8 @@
 #define NETFILTER_CLASS(klass) \
 OBJECT_CLASS_CHECK(NetFilterClass, (klass), TYPE_NETFILTER)
 
+#define TYPE_FILTER_BUFFER "filter-buffer"
+
 typedef void (FilterSetup) (NetFilterState *nf, Error **errp);
 typedef void (FilterCleanup) (NetFilterState *nf);
 /*
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 3943e94..91d873e 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -13,6 +13,7 @@
 
 #include 
 #include "trace.h"
+#include 
 
 typedef struct {
  bool colo_requested;
@@ -58,6 +59,10 @@ static const VMStateDescription colo_state = {
 void colo_info_init(void)
 {
 vmstate_register(NULL, 0, &colo_state, &colo_info);
+/* FIXME: Remove this after COLO switch to use colo-proxy */
+if (colo_supported()) {
+netdev_init_add_handler(colo_add_buffer_filter, NULL);
+}
 }
 
 bool migration_incoming_enable_colo(void)
diff --git a/migration/colo.c b/migration/colo.c
index 0140203..bbff4e8 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -18,12 +18,23 @@
 #include "qemu/error-report.h"
 #include "migration/failover.h"
 #include "qapi-event.h"
+#include "net/net.h"
+#include "net/filter.h"
+#include "net/vhost_net.h"
 
 static bool vmstate_loading;
 
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
+typedef struct COLOListNode {
+void *opaque;
+QLIST_ENTRY(COLOListNode) node;
+} COLOListNode;
+
+static QLIST_HEAD(, COLOListNode) COLOBufferFilters =
+QLIST_HEAD_INITIALIZER(COLOBufferFilters);
+
 bool colo_supported(void)
 {
 return true;
@@ -382,6 +393,44 @@ static int colo_prepare_before_save(MigrationState *s)
 return ret;
 }
 
+void colo_add_buffer_filter(const char *netdev_id, void *opaque)
+{
+NetFilterState *nf;
+char filter_name[128];
+Object *filter;
+COLOListNode *filternode;
+NetClientState *nc = qemu_find_netdev(netdev_id);
+
+/* FIXME: Not support multiple queues */
+if (!nc || nc->queue_index > 1) {
+return;
+}
+
+ /* Not support vhost-net */
+if (get_vhost_net(nc)) {
+return;
+}
+
+snprintf(filter_name, sizeof(filter_name),
+"%scolo", netdev_id);
+
+filter = object_new_with_props(TYPE_FILTER_BUFFER,
+object_get_objects_root(),
+filter_name, NULL,
+"netdev", netdev_id,
+"status", "disable",
+NULL);
+if (!filter) {
+return;
+}
+nf =  NETFILTER(filter);
+/* Only buffer the packets that sent out by VM */
+nf->direction = NET_FILTER_DIRECTION_RX;
+filternode = g_new0(COLOListNode, 1);
+filternode->opaque = nf;
+QLIST_INSERT_HEAD(&COLOBufferFilters, filternode, node);
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
 QEMUSizedBuffer *buffer = NULL;
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index f0a9151..34dc312 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -16,8 +16,6 @@
 #include "qapi-visit.h"
 #include "qom/object.h"
 
-#define TYPE_FILTER_BUFFER "filter-buffer"
-
 #define FILTER_BUFFER(obj) \
 OBJECT_CHECK(FilterBufferState, (obj), TYPE_FILTER_BUFFER)
 
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index 1996cd9..8e74acb 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -48,3 +48,7 @@ bool colo_shutdown(void)
 {
 return false;
 }
+
+void colo_add_buffer_filter(const char *netdev_id, void *opaque)
+{
+}
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 22/38] COLO failover: Shutdown related socket fd when do failover

2016-02-21 Thread zhanghailiang
If the network connection between COLO's two sides is broken while the
colo/colo incoming thread is blocked in a 'read'/'write' on the socket fd,
the thread will not detect the error until the connection times out, which
can take a long time.

Here we shut down all the related socket file descriptors in the failover BH
to wake up the blocking operation. Besides, we should close the corresponding
file descriptors only after the failover BH has shut them down, or there will
be an error.
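
The effect of shutting down a socket on pending I/O can be illustrated with
plain POSIX calls. This is a sketch under assumptions: it uses socketpair()
and shutdown() directly rather than QEMU's qemu_file_shutdown(), the
sketch_* names are hypothetical, and the read is issued after the shutdown
instead of from a second, already blocked thread.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Shut down both directions so that read/write on fd stop blocking. */
int sketch_wake_blocked_io(int fd)
{
    return shutdown(fd, SHUT_RDWR);
}

/*
 * Demonstration: create a connected socket pair with no data queued,
 * shut one end down, then read from it. On Linux the read reports end
 * of file instead of waiting for the peer. Returns the read() result,
 * or -1 if the socket pair could not be created.
 */
long sketch_read_after_shutdown(void)
{
    int sv[2];
    char byte;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    sketch_wake_blocked_io(sv[0]);
    n = read(sv[0], &byte, 1);

    /* Close only after the shutdown, mirroring the ordering in the patch. */
    close(sv[0]);
    close(sv[1]);
    return (long)n;
}
```

The ordering matters for the reason given above: if the fd were closed
first, its number could be reused by another thread, and a later shutdown
would hit the wrong descriptor.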

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
Cc: Dr. David Alan Gilbert 
---
v13:
- Add Reviewed-by tag
- Use semaphore to notify colo/colo incoming loop that
  failover work is finished.
v12:
- Shutdown both QEMUFile's fd though they may use the
  same fd. (Dave's suggestion)
v11:
- Only shutdown fd for once
---
 include/migration/migration.h |  3 +++
 migration/colo.c  | 43 +++
 2 files changed, 46 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 14b9f3d..b34def6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -112,6 +112,7 @@ struct MigrationIncomingState {
 QemuThread colo_incoming_thread;
 /* The coroutine we should enter (back) after failover */
 Coroutine *migration_incoming_co;
+QemuSemaphore colo_incoming_sem;
 
 /* See savevm.c */
 LoadStateEntry_Head loadvm_handlers;
@@ -175,6 +176,8 @@ struct MigrationState
 QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
 /* The RAMBlock used in the last src_page_request */
 RAMBlock *last_req_rb;
+
+QemuSemaphore colo_sem;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/colo.c b/migration/colo.c
index 814480c..5c87a8e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void)
 /* recover runstate to normal migration finish state */
 autostart = true;
 }
+/*
+* Make sure colo incoming thread not block in recv or send,
+* If mis->from_src_file and mis->to_src_file use the same fd,
+* The second shutdown() will return -1, we ignore this value,
+* it is harmless.
+*/
+if (mis->from_src_file) {
+qemu_file_shutdown(mis->from_src_file);
+}
+if (mis->to_src_file) {
+qemu_file_shutdown(mis->to_src_file);
+}
 
 old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
FAILOVER_STATUS_COMPLETED);
@@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void)
  "secondary VM", old_state);
 return;
 }
+/* Notify COLO incoming thread that failover work is finished */
+qemu_sem_post(&mis->colo_incoming_sem);
 /* For Secondary VM, jump to incoming co */
 if (mis->migration_incoming_co) {
 qemu_coroutine_enter(mis->migration_incoming_co, NULL);
@@ -81,6 +95,18 @@ static void primary_vm_do_failover(void)
 migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
 
+/*
+* Make sure colo thread no block in recv or send,
+* The s->rp_state.from_dst_file and s->to_dst_file may use the
+* same fd, but we still shutdown the fd for twice, it is harmless.
+*/
+if (s->to_dst_file) {
+qemu_file_shutdown(s->to_dst_file);
+}
+if (s->rp_state.from_dst_file) {
+qemu_file_shutdown(s->rp_state.from_dst_file);
+}
+
 old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
FAILOVER_STATUS_COMPLETED);
 if (old_state != FAILOVER_STATUS_HANDLING) {
@@ -88,6 +114,8 @@ static void primary_vm_do_failover(void)
  old_state);
 return;
 }
+/* Notify COLO thread that failover work is finished */
+qemu_sem_post(&s->colo_sem);
 }
 
 void colo_do_failover(MigrationState *s)
@@ -383,6 +411,14 @@ out:
 qsb_free(buffer);
 buffer = NULL;
 
+/* Hope this not to be too long to wait here */
+qemu_sem_wait(&s->colo_sem);
+qemu_sem_destroy(&s->colo_sem);
+/*
+* Must be called after failover BH is completed,
+* Or the failover BH may shutdown the wrong fd, that
+* re-used by other thread after we release here.
+*/
 if (s->rp_state.from_dst_file) {
 qemu_fclose(s->rp_state.from_dst_file);
 }
@@ -391,6 +427,7 @@ out:
 void migrate_start_colo_process(MigrationState *s)
 {
 qemu_mutex_unlock_iothread();
+qemu_sem_init(&s->colo_sem, 0);
 migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_COLO);
 colo_process_checkpoint(s);
@@ -430,6 +467,8 @@ void *colo_process_incoming_thread(void *opaque)
 Error *local_err = NULL;
 int ret;
 
+qemu_sem_init(&mis->colo_incoming_sem, 0);
+
 

[Qemu-devel] [PATCH COLO-Frame v15 32/38] filter-buffer: Accept zero interval

2016-02-21 Thread zhanghailiang
We may want to accept zero interval when VM FT solutions like MC
or COLO use this filter to release packets on demand.

Signed-off-by: zhanghailiang 
Reviewed-by: Yang Hongyang 
Cc: Jason Wang 
Cc: Yang Hongyang 
---
 net/filter-buffer.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 12ad2e3..f0a9151 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -104,16 +104,6 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
 {
 FilterBufferState *s = FILTER_BUFFER(nf);
 
-/*
- * We may want to accept zero interval when VM FT solutions like MC
- * or COLO use this filter to release packets on demand.
- */
-if (!s->interval) {
-error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "interval",
-   "a non-zero interval");
-return;
-}
-
 s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
 if (s->interval) {
 timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 19/38] COLO: Implement failover work for Primary VM

2016-02-21 Thread zhanghailiang
For the PVM, if there is a failover request from users, the colo
thread will exit the loop while the failover BH does the cleanup
work and resumes the VM.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v13:
- Add Reviewed-by tag
v12:
- Fix error report and remove unnecessary check in
  primary_vm_do_failover() (Dave's suggestion)
v11:
- Don't call migration_end() in primary_vm_do_failover(),
 The cleanup work will be done in migration_thread().
- Remove vm_start() in primary_vm_do_failover(), which is also
  done in migration_thread()
v10:
- Call migration_end() in primary_vm_do_failover()
---
 include/migration/colo.h |  3 +++
 include/migration/failover.h |  1 +
 migration/colo-failover.c|  7 +-
 migration/colo.c | 53 +---
 4 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index e9ac2c3..e32eef4 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
 
 COLOMode get_colo_mode(void);
+
+/* failover */
+void colo_do_failover(MigrationState *s);
 #endif
diff --git a/include/migration/failover.h b/include/migration/failover.h
index fe71bb4..c4bd81e 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -26,5 +26,6 @@ void failover_init_state(void);
 int failover_set_state(int old_state, int new_state);
 int failover_get_state(void);
 void failover_request_active(Error **errp);
+bool failover_request_is_active(void);
 
 #endif
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index e94b3ba..0a1d4bd 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -32,7 +32,7 @@ static void colo_failover_bh(void *opaque)
 error_report("Unkown error for failover, old_state=%d", old_state);
 return;
 }
-/*TODO: Do failover work */
+colo_do_failover(NULL);
 }
 
 void failover_request_active(Error **errp)
@@ -67,6 +67,11 @@ int failover_get_state(void)
 return atomic_read(&failover_state);
 }
 
+bool failover_request_is_active(void)
+{
+return failover_get_state() != FAILOVER_STATUS_NONE;
+}
+
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
 if (get_colo_mode() == COLO_MODE_UNKNOWN) {
diff --git a/migration/colo.c b/migration/colo.c
index bf1ac2e..89cea58 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -40,6 +40,40 @@ bool migration_incoming_in_colo_state(void)
 return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+static bool colo_runstate_is_stopped(void)
+{
+return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
+}
+
+static void primary_vm_do_failover(void)
+{
+MigrationState *s = migrate_get_current();
+int old_state;
+
+migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
+  MIGRATION_STATUS_COMPLETED);
+
+old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+   FAILOVER_STATUS_COMPLETED);
+if (old_state != FAILOVER_STATUS_HANDLING) {
+error_report("Incorrect state (%d) while doing failover for Primary VM",
+ old_state);
+return;
+}
+}
+
+void colo_do_failover(MigrationState *s)
+{
+/* Make sure vm stopped while failover */
+if (!colo_runstate_is_stopped()) {
+vm_stop_force_state(RUN_STATE_COLO);
+}
+
+if (get_colo_mode() == COLO_MODE_PRIMARY) {
+primary_vm_do_failover();
+}
+}
+
 static void colo_put_cmd(QEMUFile *f, COLOMessage cmd,
  Error **errp)
 {
@@ -166,9 +200,20 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
 }
 
 qemu_mutex_lock_iothread();
+if (failover_request_is_active()) {
+qemu_mutex_unlock_iothread();
+goto out;
+}
 vm_stop_force_state(RUN_STATE_COLO);
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("run", "stop");
+/*
+ * failover request bh could be called after
+ * vm_stop_force_state so we check failover_request_is_active() again.
+ */
+if (failover_request_is_active()) {
+goto out;
+}
 
 /* Disable block migration */
 s->params.blk = 0;
@@ -266,6 +311,11 @@ static void colo_process_checkpoint(MigrationState *s)
 trace_colo_vm_state_change("stop", "run");
 
 while (s->state == MIGRATION_STATUS_COLO) {
+if (failover_request_is_active()) {
+error_report("failover request");
+goto out;
+}
+
 current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 if (current_time - checkpoint_time <
 s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
@@ -288,9 +338,6 @@ out:
 if (local_err) {
 error_report_err(local_err);
 }
-

[Qemu-devel] [PATCH COLO-Frame v15 18/38] COLO failover: Introduce state to record failover process

2016-02-21 Thread zhanghailiang
When handling failover, we do different things depending on the stage of
the failover process. Here we introduce a global atomic variable to record
the failover status.

We add four failover statuses to indicate the different stages of the
failover process. You should use the helpers to get and set the value.
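
The compare-and-swap state machine can be sketched with C11 atomics. This
is an illustration under assumptions: it uses <stdatomic.h> instead of
QEMU's atomic_cmpxchg()/atomic_read() wrappers, and the SKETCH_* constants
and sketch_* functions are hypothetical names that mirror the four statuses
and the helpers in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum {
    SKETCH_NONE = 0,
    SKETCH_REQUEST,     /* requested but not handled yet */
    SKETCH_HANDLING,    /* failover work in progress */
    SKETCH_COMPLETED,   /* failover finished */
};

atomic_int sketch_state = SKETCH_NONE;

/*
 * Move from 'expected' to 'desired' only if the state is still 'expected'.
 * Returns the value seen before the attempt, so the caller can tell
 * whether the transition happened (return value == expected).
 */
int sketch_set_state(int expected, int desired)
{
    int old = expected;

    /* On failure, the current value is written back into 'old'. */
    atomic_compare_exchange_strong(&sketch_state, &old, desired);
    return old;
}

int sketch_get_state(void)
{
    return atomic_load(&sketch_state);
}

/* A request is active from the moment it leaves SKETCH_NONE. */
bool sketch_request_is_active(void)
{
    return sketch_get_state() != SKETCH_NONE;
}
```

Because only one caller can win the NONE to REQUEST transition, a second
failover request observes a return value other than SKETCH_NONE and can be
rejected, which is how failover_request_active() reports a duplicate.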

Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
v11:
- fix several typos found by Dave
- Add Reviewed-by tag
---
 include/migration/failover.h | 10 ++
 migration/colo-failover.c| 37 +
 migration/colo.c |  4 
 trace-events |  1 +
 4 files changed, 52 insertions(+)

diff --git a/include/migration/failover.h b/include/migration/failover.h
index 3274735..fe71bb4 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -15,6 +15,16 @@
 
 #include "qemu-common.h"
 
+typedef enum COLOFailoverStatus {
+FAILOVER_STATUS_NONE = 0,
+FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
+FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
+FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
+} COLOFailoverStatus;
+
+void failover_init_state(void);
+int failover_set_state(int old_state, int new_state);
+int failover_get_state(void);
 void failover_request_active(Error **errp);
 
 #endif
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index 3533409..e94b3ba 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -14,22 +14,59 @@
 #include "migration/failover.h"
 #include "qmp-commands.h"
 #include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+#include "trace.h"
 
 static QEMUBH *failover_bh;
+static COLOFailoverStatus failover_state;
 
 static void colo_failover_bh(void *opaque)
 {
+int old_state;
+
 qemu_bh_delete(failover_bh);
 failover_bh = NULL;
+old_state = failover_set_state(FAILOVER_STATUS_REQUEST,
+   FAILOVER_STATUS_HANDLING);
+if (old_state != FAILOVER_STATUS_REQUEST) {
+error_report("Unkown error for failover, old_state=%d", old_state);
+return;
+}
 /*TODO: Do failover work */
 }
 
 void failover_request_active(Error **errp)
 {
+   if (failover_set_state(FAILOVER_STATUS_NONE, FAILOVER_STATUS_REQUEST)
+ != FAILOVER_STATUS_NONE) {
+error_setg(errp, "COLO failover is already actived");
+return;
+}
 failover_bh = qemu_bh_new(colo_failover_bh, NULL);
 qemu_bh_schedule(failover_bh);
 }
 
+void failover_init_state(void)
+{
+failover_state = FAILOVER_STATUS_NONE;
+}
+
+int failover_set_state(int old_state, int new_state)
+{
+int old;
+
+old = atomic_cmpxchg(&failover_state, old_state, new_state);
+if (old == old_state) {
+trace_colo_failover_set_state(new_state);
+}
+return old;
+}
+
+int failover_get_state(void)
+{
+return atomic_read(&failover_state);
+}
+
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
 if (get_colo_mode() == COLO_MODE_UNKNOWN) {
diff --git a/migration/colo.c b/migration/colo.c
index 1aede64..bf1ac2e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -236,6 +236,8 @@ static void colo_process_checkpoint(MigrationState *s)
 Error *local_err = NULL;
 int ret;
 
+failover_init_state();
+
 s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
 if (!s->rp_state.from_dst_file) {
 error_report("Open QEMUFile from_dst_file failed");
@@ -342,6 +344,8 @@ void *colo_process_incoming_thread(void *opaque)
 migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_COLO);
 
+failover_init_state();
+
 mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
 if (!mis->to_src_file) {
 error_report("colo incoming thread: Open QEMUFile to_src_file failed");
diff --git a/trace-events b/trace-events
index ee4a2fb..8fb6b31 100644
--- a/trace-events
+++ b/trace-events
@@ -1609,6 +1609,7 @@ postcopy_ram_incoming_cleanup_join(void) ""
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
 colo_put_cmd(const char *msg) "Send '%s' cmd"
 colo_get_cmd(const char *msg) "Receive '%s' cmd"
+colo_failover_set_state(int new_state) "new state %d"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 29/38] COLO: Separate the process of saving/loading ram and device state

2016-02-21 Thread zhanghailiang
We separate the process of saving/loading ram and device state when doing
a checkpoint, and add new helpers for saving/loading ram/device state. With
this change, we can transfer ram directly from master to slave without using
a QEMUSizedBuffer as an intermediary, which also reduces the amount of extra
memory used during checkpoint.

Besides, we move the colo_flush_ram_cache() call to the proper position
after the above change.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
v14:
- split two new patches from this patch
- Some minor fixes from Dave
v13:
- Re-use some existing helper functions to implement saving/loading
  ram and device.
v11:
- Remove load configuration section in qemu_loadvm_state_begin()
---
 migration/colo.c   | 48 ++--
 migration/ram.c|  5 -
 migration/savevm.c |  5 +
 3 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 16bada6..300fa54 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -288,21 +288,37 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
 goto out;
 }
 
+colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
+if (local_err) {
+goto out;
+}
+
 /* Disable block migration */
 s->params.blk = 0;
 s->params.shared = 0;
-qemu_savevm_state_header(trans);
-qemu_savevm_state_begin(trans, &s->params);
+qemu_savevm_state_begin(s->to_dst_file, &s->params);
+ret = qemu_file_get_error(s->to_dst_file);
+if (ret < 0) {
+error_report("Save vm state begin error");
+goto out;
+}
+
 qemu_mutex_lock_iothread();
-qemu_savevm_state_complete_precopy(trans, false);
+/*
+* Only save VM's live state, which not including device state.
+* TODO: We may need a timeout mechanism to prevent COLO process
+* to be blocked here.
+*/
+qemu_savevm_live_state(s->to_dst_file);
+/* Note: device state is saved into buffer */
+ret = qemu_save_device_state(trans);
 qemu_mutex_unlock_iothread();
-
-qemu_fflush(trans);
-
-colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
-if (local_err) {
+if (ret < 0) {
+error_report("Save device state error");
 goto out;
 }
+qemu_fflush(trans);
+
 /* we send the total size of the vmstate first */
 size = qsb_get_length(buffer);
 colo_put_cmd_value(s->to_dst_file, COLO_MESSAGE_VMSTATE_SIZE,
@@ -573,6 +589,16 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+ret = qemu_loadvm_state_begin(mis->from_src_file);
+if (ret < 0) {
+error_report("Load vm state begin error, ret=%d", ret);
+goto out;
+}
+ret = qemu_loadvm_state_main(mis->from_src_file, mis);
+if (ret < 0) {
+error_report("Load VM's live state (ram) error");
+goto out;
+}
 /* read the VM state total size first */
 value = colo_get_cmd_value(mis->from_src_file,
   COLO_MESSAGE_VMSTATE_SIZE, &local_err);
@@ -605,8 +631,10 @@ void *colo_process_incoming_thread(void *opaque)
 qemu_mutex_lock_iothread();
 qemu_system_reset(VMRESET_SILENT);
 vmstate_loading = true;
-if (qemu_loadvm_state(fb) < 0) {
-error_report("COLO: loadvm failed");
+colo_flush_ram_cache();
+ret = qemu_load_device_state(fb);
+if (ret < 0) {
+error_report("COLO: load device state failed");
 qemu_mutex_unlock_iothread();
 goto out;
 }
diff --git a/migration/ram.c b/migration/ram.c
index 891f3b2..8f416d5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2465,7 +2465,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
  * be atomic
  */
 bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
-bool need_flush = false;
 
 seq_iter++;
 
@@ -2500,7 +2499,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 /* After going into COLO, we should load the Page into colo_cache */
 if (ram_cache_enable) {
 host = colo_cache_from_block_offset(block, addr);
-need_flush = true;
 } else {
 host = host_from_ram_block_offset(block, addr);
 }
@@ -2594,9 +2592,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 
 rcu_read_unlock();
 
-if (!ret  && ram_cache_enable && need_flush) {
-colo_flush_ram_cache();
-}
 DPRINTF("Completed load of VM with exit code %d seq iteration "
 "%" PRIu64 "\n", ret, seq_iter);
 return ret;
diff --git a/migration/savevm.c b/migration/savevm.c
index 60c7b57..1551fbb 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -50,6 +50,7 @@
 #include "qemu/iov.h"
 #include 

[Qemu-devel] [PATCH COLO-Frame v15 12/38] ram/COLO: Record the dirty pages that SVM received

2016-02-21 Thread zhanghailiang
Record the addresses of the dirty pages that the SVM has received;
this helps when flushing the cached pages into the SVM's RAM.
The pages are recorded by reusing the migration dirty bitmap.
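
The recording step can be sketched roughly as follows. This is only an illustration, not the code from this patch: the names dirty_tracker, tracker_test_and_set and tracker_record are hypothetical, and the bit operation here is deliberately non-atomic, unlike QEMU's test_and_set_bit().

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the receive-side dirty bitmap. */
struct dirty_tracker {
    unsigned long *bmap;   /* one bit per guest page */
    size_t nr_pages;       /* number of pages covered by bmap */
    size_t dirty_pages;    /* number of bits currently set */
};

/* Set the bit for 'page' and report whether it was already set. */
static bool tracker_test_and_set(struct dirty_tracker *t, size_t page)
{
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    unsigned long *word = &t->bmap[page / BITS_PER_WORD];
    bool was_set = (*word & mask) != 0;

    *word |= mask;
    return was_set;
}

/* Record a received page; count it only the first time it is seen. */
static void tracker_record(struct dirty_tracker *t, size_t page)
{
    if (page < t->nr_pages && !tracker_test_and_set(t, page)) {
        t->dirty_pages++;
    }
}
```

Counting a page only on its first arrival keeps the dirty-page counter equal to the number of set bits, so a later flush can stop once it has handled that many pages.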

Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
v12:
- Add Reviewed-by tag
v11:
- Split a new helper function from original
  host_from_stream_offset() (Dave's suggestion)
- Only do recording work in this patch
v10:
- New patch split from v9's patch 13
- Rebase to master to use 'migration_bitmap_rcu'
---
 migration/ram.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 027c5bc..7373df3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2195,6 +2195,9 @@ static inline void *host_from_ram_block_offset(RAMBlock 
*block,
 static inline void *colo_cache_from_block_offset(RAMBlock *block,
  ram_addr_t offset)
 {
+unsigned long *bitmap;
+long k;
+
 if (!offset_in_ramblock(block, offset)) {
 return NULL;
 }
@@ -2203,6 +2206,17 @@ static inline void 
*colo_cache_from_block_offset(RAMBlock *block,
  __func__, block->idstr);
 return NULL;
 }
+
+k = (block->mr->ram_addr + offset) >> TARGET_PAGE_BITS;
+bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
+/*
+* During a COLO checkpoint, we need a bitmap of these migrated pages.
+* It helps us decide which pages in the RAM cache should be flushed
+* into the VM's RAM later.
+*/
+if (!test_and_set_bit(k, bitmap)) {
+migration_dirty_pages++;
+}
 return block->colo_cache + offset;
 }
 
@@ -2589,6 +2603,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 int colo_init_ram_cache(void)
 {
 RAMBlock *block;
+int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
 
 rcu_read_lock();
 QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
@@ -2603,6 +2618,15 @@ int colo_init_ram_cache(void)
 }
 rcu_read_unlock();
 ram_cache_enable = true;
+/*
+* Record the dirty pages sent by the PVM. We use this dirty bitmap to
+* decide which pages in the cache should be flushed into the SVM's RAM.
+* Here we reuse the name 'migration_bitmap_rcu' from migration.
+*/
+migration_bitmap_rcu = g_new0(struct BitmapRcu, 1);
+migration_bitmap_rcu->bmap = bitmap_new(ram_cache_pages);
+migration_dirty_pages = 0;
+
 return 0;
 
 out_locked:
@@ -2620,9 +2644,15 @@ out_locked:
 void colo_release_ram_cache(void)
 {
 RAMBlock *block;
+struct BitmapRcu *bitmap = migration_bitmap_rcu;
 
 ram_cache_enable = false;
 
+atomic_rcu_set(&migration_bitmap_rcu, NULL);
+if (bitmap) {
+call_rcu(bitmap, migration_bitmap_free, rcu);
+}
+
 rcu_read_lock();
 QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 if (block->colo_cache) {
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 31/38] net/filter: Add a 'status' property for filter object

2016-02-21 Thread zhanghailiang
With this property, users can control whether a filter is enabled
('enable') or disabled ('disable'). Filters are enabled by default.

Disabled filters are skipped when packets are delivered in the net layer.
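
As a rough, self-contained sketch of the intended semantics (this is not the QOM property code from the diff below; the filter_sketch type and its helpers are hypothetical names used only for illustration):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the filter state. */
struct filter_sketch {
    bool enabled;
};

/* Filters start out enabled, matching the default described above. */
static void filter_sketch_init(struct filter_sketch *f)
{
    f->enabled = true;
}

/*
 * Accept only "enable" or "disable".
 * Returns 0 on success, -1 on an invalid value (state left unchanged).
 */
static int filter_sketch_set_status(struct filter_sketch *f, const char *str)
{
    if (strcmp(str, "enable") == 0) {
        f->enabled = true;
    } else if (strcmp(str, "disable") == 0) {
        f->enabled = false;
    } else {
        return -1;
    }
    return 0;
}

/* A disabled filter is skipped when delivering packets. */
static bool filter_sketch_can_skip(const struct filter_sketch *f)
{
    return !f->enabled;
}
```

Rejecting unknown strings without touching the state means a mistyped value cannot silently disable a filter.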

Signed-off-by: zhanghailiang 
Cc: Jason Wang 
Cc: Yang Hongyang 
---
v15:
- Rename qemu_need_skip_netfilter to qemu_netfilter_can_skip (Jason)
- Remove some useless comment (Jason)
---
 include/net/filter.h |  1 +
 net/filter.c | 40 
 qemu-options.hx  |  4 +++-
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/include/net/filter.h b/include/net/filter.h
index 5639976..af3c53c 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -55,6 +55,7 @@ struct NetFilterState {
 char *netdev_id;
 NetClientState *netdev;
 NetFilterDirection direction;
+bool enabled;
 QTAILQ_ENTRY(NetFilterState) next;
 };
 
diff --git a/net/filter.c b/net/filter.c
index d2a514e..f114dfb 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -17,6 +17,11 @@
 #include "qom/object_interfaces.h"
 #include "qemu/iov.h"
 
+static inline bool qemu_can_skip_netfilter(NetFilterState *nf)
+{
+return !nf->enabled;
+}
+
 ssize_t qemu_netfilter_receive(NetFilterState *nf,
NetFilterDirection direction,
NetClientState *sender,
@@ -25,6 +30,9 @@ ssize_t qemu_netfilter_receive(NetFilterState *nf,
int iovcnt,
NetPacketSent *sent_cb)
 {
+if (qemu_can_skip_netfilter(nf)) {
+return 0;
+}
 if (nf->direction == direction ||
 nf->direction == NET_FILTER_DIRECTION_ALL) {
 return NETFILTER_GET_CLASS(OBJECT(nf))->receive_iov(
@@ -134,8 +142,37 @@ static void netfilter_set_direction(Object *obj, int 
direction, Error **errp)
 nf->direction = direction;
 }
 
+static char *netfilter_get_status(Object *obj, Error **errp)
+{
+NetFilterState *nf = NETFILTER(obj);
+
+if (nf->enabled) {
+return g_strdup("enable");
+} else {
+return g_strdup("disable");
+}
+}
+
+static void netfilter_set_status(Object *obj, const char *str, Error **errp)
+{
+NetFilterState *nf = NETFILTER(obj);
+
+if (!strcmp(str, "enable")) {
+nf->enabled = true;
+} else if (!strcmp(str, "disable")) {
+nf->enabled = false;
+} else {
+error_setg(errp, "Invalid value for netfilter status, "
+ "should be 'enable' or 'disable'");
+}
+}
+
 static void netfilter_init(Object *obj)
 {
+NetFilterState *nf = NETFILTER(obj);
+
+nf->enabled = true;
+
 object_property_add_str(obj, "netdev",
 netfilter_get_netdev_id, netfilter_set_netdev_id,
 NULL);
@@ -143,6 +180,9 @@ static void netfilter_init(Object *obj)
  NetFilterDirection_lookup,
  netfilter_get_direction, netfilter_set_direction,
  NULL);
+object_property_add_str(obj, "status",
+netfilter_get_status, netfilter_set_status,
+NULL);
 }
 
 static void netfilter_complete(UserCreatable *uc, Error **errp)
diff --git a/qemu-options.hx b/qemu-options.hx
index 2f0465e..6f302e6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3742,11 +3742,13 @@ version by providing the @var{passwordid} parameter. 
This provides
 the ID of a previously created @code{secret} object containing the
 password for decryption.
 
-@item -object 
filter-buffer,id=@var{id},netdev=@var{netdevid},interval=@var{t}[,queue=@var{all|rx|tx}]
+@item -object 
filter-buffer,id=@var{id},netdev=@var{netdevid},interval=@var{t}[,queue=@var{all|rx|tx}][,status=@var{enable|disable}]
 
 Interval @var{t} can't be 0, this filter batches the packet delivery: all
 packets arriving in a given interval on netdev @var{netdevid} are delayed
 until the end of the interval. Interval is in microseconds.
+@option{status} is optional and indicates whether the netfilter is enabled
+or disabled; by default, the netfilter is enabled.
 
 queue @var{all|rx|tx} is an option that can be applied to any netfilter.
 
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 30/38] COLO: Split qemu_savevm_state_begin out of checkpoint process

2016-02-21 Thread zhanghailiang
It is unnecessary to call qemu_savevm_state_begin() in every checkpoint process.
It mainly sets up devices and does the first device state pass. This data does
not change during later checkpoints. So we split it out of
colo_do_checkpoint_transaction(); this way, the data no longer has to be
transferred again in later checkpoints.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v13:
- Fix some minor issues found by Dave
- Add Reviewed-by tag
---
 migration/colo.c | 51 ---
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 300fa54..0140203 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -293,16 +293,6 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
-/* Disable block migration */
-s->params.blk = 0;
-s->params.shared = 0;
-qemu_savevm_state_begin(s->to_dst_file, &s->params);
-ret = qemu_file_get_error(s->to_dst_file);
-if (ret < 0) {
-error_report("Save vm state begin error");
-goto out;
-}
-
 qemu_mutex_lock_iothread();
 /*
 * Only save VM's live state, which not including device state.
@@ -377,6 +367,21 @@ out:
 return ret;
 }
 
+static int colo_prepare_before_save(MigrationState *s)
+{
+int ret;
+
+/* Disable block migration */
+s->params.blk = 0;
+s->params.shared = 0;
+qemu_savevm_state_begin(s->to_dst_file, &s->params);
+ret = qemu_file_get_error(s->to_dst_file);
+if (ret < 0) {
+error_report("Save vm state begin error");
+}
+return ret;
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
 QEMUSizedBuffer *buffer = NULL;
@@ -392,6 +397,11 @@ static void colo_process_checkpoint(MigrationState *s)
 goto out;
 }
 
+ret = colo_prepare_before_save(s);
+if (ret < 0) {
+goto out;
+}
+
 /*
  * Wait for Secondary finish loading vm states and enter COLO
  * restore.
@@ -517,6 +527,17 @@ static void colo_wait_handle_cmd(QEMUFile *f, int 
*checkpoint_request,
 }
 }
 
+static int colo_prepare_before_load(QEMUFile *f)
+{
+int ret;
+
+ret = qemu_loadvm_state_begin(f);
+if (ret < 0) {
+error_report("load vm state begin error, ret=%d", ret);
+}
+return ret;
+}
+
 void *colo_process_incoming_thread(void *opaque)
 {
 MigrationIncomingState *mis = opaque;
@@ -557,6 +578,11 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
+ret = colo_prepare_before_load(mis->from_src_file);
+if (ret < 0) {
+goto out;
+}
+
 colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
  &local_err);
 if (local_err) {
@@ -589,11 +615,6 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 
-ret = qemu_loadvm_state_begin(mis->from_src_file);
-if (ret < 0) {
-error_report("Load vm state begin error, ret=%d", ret);
-goto out;
-}
 ret = qemu_loadvm_state_main(mis->from_src_file, mis);
 if (ret < 0) {
 error_report("Load VM's live state (ram) error");
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 04/38] migration: Integrate COLO checkpoint process into migration

2016-02-21 Thread zhanghailiang
Add a migration state, MIGRATION_STATUS_COLO, which is entered
after the first live migration has finished successfully.

We reuse the migration thread, so if COLO is enabled by the user, the migration
thread goes into the COLO process.
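
The resulting state flow can be sketched as below. This is a simplified model under stated assumptions, not the patch's code: the sketch_status enum and sketch_next_status() are hypothetical, and error and cancel paths are left out.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of migration states. */
enum sketch_status {
    SKETCH_ACTIVE,
    SKETCH_COLO,
    SKETCH_COMPLETED
};

/*
 * Next state once the current phase is done:
 *  - ACTIVE goes to COLO when COLO is enabled, otherwise to COMPLETED;
 *  - COLO goes to COMPLETED only after the checkpoint loop has exited.
 */
static enum sketch_status sketch_next_status(enum sketch_status cur,
                                             bool colo_enabled,
                                             bool colo_finished)
{
    switch (cur) {
    case SKETCH_ACTIVE:
        return colo_enabled ? SKETCH_COLO : SKETCH_COMPLETED;
    case SKETCH_COLO:
        return colo_finished ? SKETCH_COMPLETED : SKETCH_COLO;
    default:
        return cur;
    }
}
```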

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
---
v11:
- Rebase to master
- Add Reviewed-by tag
v10:
- Simplify process by dropping colo thread and reusing migration thread.
 (Dave's suggestion)
---
 include/migration/colo.h |  3 +++
 migration/colo.c | 31 +++
 migration/migration.c| 30 ++
 qapi-schema.json |  4 +++-
 stubs/migration-colo.c   |  9 +
 trace-events |  3 +++
 6 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 1c899a0..bf84b99 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -19,4 +19,7 @@
 bool colo_supported(void);
 void colo_info_init(void);
 
+void migrate_start_colo_process(MigrationState *s);
+bool migration_in_colo_state(void);
+
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index cb3e22d..8d0d851 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,9 +10,40 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "sysemu/sysemu.h"
 #include "migration/colo.h"
+#include "trace.h"
 
 bool colo_supported(void)
 {
 return true;
 }
+
+bool migration_in_colo_state(void)
+{
+MigrationState *s = migrate_get_current();
+
+return (s->state == MIGRATION_STATUS_COLO);
+}
+
+static void colo_process_checkpoint(MigrationState *s)
+{
+qemu_mutex_lock_iothread();
+vm_start();
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("stop", "run");
+
+/*TODO: COLO checkpoint savevm loop*/
+
+migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
+  MIGRATION_STATUS_COMPLETED);
+}
+
+void migrate_start_colo_process(MigrationState *s)
+{
+qemu_mutex_unlock_iothread();
+migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
+  MIGRATION_STATUS_COLO);
+colo_process_checkpoint(s);
+qemu_mutex_lock_iothread();
+}
diff --git a/migration/migration.c b/migration/migration.c
index 68b5019..d7228f5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -641,6 +641,10 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
 get_xbzrle_cache_stats(info);
 break;
+case MIGRATION_STATUS_COLO:
+info->has_status = true;
+/* TODO: display COLO specific information (checkpoint info etc.) */
+break;
 case MIGRATION_STATUS_COMPLETED:
 get_xbzrle_cache_stats(info);
 
@@ -1001,7 +1005,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 params.shared = has_inc && inc;
 
 if (migration_is_setup_or_active(s->state) ||
-s->state == MIGRATION_STATUS_CANCELLING) {
+s->state == MIGRATION_STATUS_CANCELLING ||
+s->state == MIGRATION_STATUS_COLO) {
 error_setg(errp, QERR_MIGRATION_ACTIVE);
 return;
 }
@@ -1595,8 +1600,11 @@ static void migration_completion(MigrationState *s, int 
current_active_state,
 goto fail;
 }
 
-migrate_set_state(&s->state, current_active_state,
-  MIGRATION_STATUS_COMPLETED);
+if (!migrate_colo_enabled()) {
+migrate_set_state(&s->state, current_active_state,
+  MIGRATION_STATUS_COMPLETED);
+}
+
 return;
 
 fail:
@@ -1628,6 +1636,7 @@ static void *migration_thread(void *opaque)
 bool entered_postcopy = false;
 /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
 enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
+bool enable_colo = migrate_colo_enabled();
 
 rcu_register_thread();
 
@@ -1736,7 +1745,11 @@ static void *migration_thread(void *opaque)
 end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
 qemu_mutex_lock_iothread();
-qemu_savevm_state_cleanup();
+/* The resources allocated by migration will be reused in the COLO
+  process, so don't release them. */
+if (!enable_colo) {
+qemu_savevm_state_cleanup();
+}
 if (s->state == MIGRATION_STATUS_COMPLETED) {
 uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
 s->total_time = end_time - s->total_time;
@@ -1749,6 +1762,15 @@ static void *migration_thread(void *opaque)
 }
 runstate_set(RUN_STATE_POSTMIGRATE);
 } else {
+if (s->state == MIGRATION_STATUS_ACTIVE && enable_colo) {
+migrate_start_colo_process(s);
+qemu_savevm_state_cleanup();
+/*
+* Fixme: we will run VM in COLO no matter its old running state.
+* After exited COLO, we will 

[Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2016-02-21 Thread zhanghailiang
From: root 

This is the 15th version of COLO (it still supports only periodic checkpoints).

This series contains only the COLO frame part; the complete code is on GitHub:
https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode

There are few changes in this series apart from the network-related part.

Patch status:
Unreviewed: patch 21,27,28,29,33,38
Updated: patch 31,34,35,37

TODO:
1. Checkpoint based on proxy in qemu
2. The capability of continuous FT
3. Optimize the VM's downtime during checkpoint

v15:
 - Continue the shutdown process if an error is encountered while sending
   the shutdown message to the SVM. (patch 24)
 - Rename qemu_need_skip_netfilter to qemu_netfilter_can_skip and remove
   some useless comments. (patch 31, Jason)
 - Call object_new_with_props() directly to add filter in
   colo_add_buffer_filter. (patch 34, Jason)
 - Re-implement colo_set_filter_status() based on COLOBufferFilters
   list. (patch 35)
 - Re-implement colo_flush_filter_packets() based on COLOBufferFilters
   list. (patch 37)
v14:
 - Re-implement the network processing based on netfilter (Jason Wang)
 - Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
 - Split two new patches (patch 27/28) from patch 29
 - Fix some other comments from Dave and Markus.

v13:
 - Refactor colo_*_cmd helper functions to use 'Error **errp' parameter
  instead of return value to indicate success or failure. (patch 10)
 - Remove the optional error message for COLO_EXIT event. (patch 25)
 - Use semaphore to notify colo/colo incoming loop that failover work is
   finished. (patch 26)
 - Move COLO shutdown related codes to colo.c file. (patch 28)
 - Fix memory leak bug for colo incoming loop. (new patch 31)
 - Reuse some existing helper functions to implement the process of
   saving/loading RAM and device state. (patch 32)
 - Fix some other comments from Dave and Markus.

zhanghailiang (38):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'x-colo' to migration
  COLO: migrate colo related info to secondary node
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  COLO/migration: Create a new communication path from destination to
source
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save PVM state to secondary side when do checkpoint
  COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  ram/COLO: Record the dirty pages that SVM received
  COLO: Load VMState into qsb before restore it
  COLO: Flush PVM's cached RAM into SVM's memory
  COLO: Add checkpoint-delay parameter for migrate-set-parameters
  COLO: synchronize PVM's state to SVM periodically
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Introduce state to record failover process
  COLO: Implement failover work for Primary VM
  COLO: Implement failover work for Secondary VM
  qmp event: Add COLO_EXIT event to notify users while exited from COLO
  COLO failover: Shutdown related socket fd when do failover
  COLO failover: Don't do failover during loading VM's state
  COLO: Process shutdown command for VM in COLO state
  COLO: Update the global runstate after going into colo state
  savevm: Introduce two helper functions for save/find loadvm_handlers
entry
  migration/savevm: Add new helpers to process the different stages of
loadvm
  migration/savevm: Export two helper functions for savevm process
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  net/filter: Add a 'status' property for filter object
  filter-buffer: Accept zero interval
  net: Add notifier/callback for netdev init
  COLO/filter: add each netdev a buffer filter
  COLO: manage the status of buffer filters for PVM
  filter-buffer: make filter_buffer_flush() public
  COLO: flush buffered packets in checkpoint process or exit COLO
  COLO: Add block replication into colo process

 configure |  11 +
 docs/qmp-events.txt   |  16 +
 hmp-commands.hx   |  15 +
 hmp.c |  15 +
 hmp.h |   1 +
 include/exec/ram_addr.h   |   1 +
 include/migration/colo.h  |  42 ++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  16 +
 include/migration/qemu-file.h |   3 +-
 include/net/filter.h  |   5 +
 include/net/net.h |   4 +
 include/sysemu/sysemu.h   |   9 +
 migration/Makefile.objs   |   2 +
 migration/colo-comm.c |  76 
 migration/colo-failover.c |  83 
 migration/colo.c  | 866 ++
 migration/migration.c | 109 +-
 migration/qemu-file-buf.c |  61 +++
 migration/ram.c   | 175 -
 migration/savevm.c| 114 

[Qemu-devel] [PATCH COLO-Frame v15 17/38] COLO failover: Introduce a new command to trigger a failover

2016-02-21 Thread zhanghailiang
Users are free to choose whatever heartbeat solution they want. If the
heartbeat is lost, or they detect some other error, they can use the
experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
and COLO will act accordingly.

For example, if the command is sent to the PVM, the Primary side will
exit COLO mode and take over operation. If sent to the Secondary, the
secondary will run failover work, then take over server operation to
become the new Primary.

Cc: Luiz Capitulino 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v13:
- Add Reviewed-by tag
v11:
- Add more comments for x-colo-lost-heartbeat command (Eric's suggestion)
- Return 'enum' instead of 'int' for get_colo_mode() (Eric's suggestion)
v10:
- Rename command colo_lost_heartbeat to experimental 'x_colo_lost_heartbeat'
---
 hmp-commands.hx  | 15 +++
 hmp.c|  8 
 hmp.h|  1 +
 include/migration/colo.h |  3 +++
 include/migration/failover.h | 20 
 migration/Makefile.objs  |  2 +-
 migration/colo-comm.c| 11 +++
 migration/colo-failover.c| 41 +
 migration/colo.c |  1 +
 qapi-schema.json | 29 +
 qmp-commands.hx  | 19 +++
 stubs/migration-colo.c   |  8 
 12 files changed, 157 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-failover.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index bb52e4d..a381b0b 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1039,6 +1039,21 @@ migration (or once already in postcopy).
 ETEXI
 
 {
+.name   = "x_colo_lost_heartbeat",
+.args_type  = "",
+.params = "",
+.help   = "Tell COLO that heartbeat is lost,\n\t\t\t"
+  "a failover or takeover is needed.",
+.mhandler.cmd = hmp_x_colo_lost_heartbeat,
+},
+
+STEXI
+@item x_colo_lost_heartbeat
+@findex x_colo_lost_heartbeat
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+ETEXI
+
+{
 .name   = "client_migrate_info",
 .args_type  = 
"protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
 .params = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index 786954d..531963c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1312,6 +1312,14 @@ void hmp_migrate_start_postcopy(Monitor *mon, const 
QDict *qdict)
 hmp_handle_error(mon, &err);
 }
 
+void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
+{
+Error *err = NULL;
+
+qmp_x_colo_lost_heartbeat(&err);
+hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
 const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index a8c5b5a..864a300 100644
--- a/hmp.h
+++ b/hmp.h
@@ -70,6 +70,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
 void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
+void hmp_x_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/colo.h b/include/migration/colo.h
index b40676c..e9ac2c3 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -17,6 +17,7 @@
 #include "migration/migration.h"
 #include "qemu/coroutine_int.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 
 bool colo_supported(void);
 void colo_info_init(void);
@@ -29,4 +30,6 @@ bool migration_incoming_enable_colo(void);
 void migration_incoming_exit_colo(void);
 void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
+
+COLOMode get_colo_mode(void);
 #endif
diff --git a/include/migration/failover.h b/include/migration/failover.h
new file mode 100644
index 000..3274735
--- /dev/null
+++ b/include/migration/failover.h
@@ -0,0 +1,20 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_FAILOVER_H
+#define QEMU_FAILOVER_H
+
+#include "qemu-common.h"
+

[Qemu-devel] [PATCH COLO-Frame v15 24/38] COLO: Process shutdown command for VM in COLO state

2016-02-21 Thread zhanghailiang
If the VM is in COLO FT state, some extra work is needed before the normal
shutdown process. The SVM ignores a shutdown command that is issued directly
to it; the PVM, when it gets the command, forwards it to the SVM.
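
The decision described above can be sketched as follows. The names sketch_role and sketch_colo_shutdown() are hypothetical, and the flag is passed in explicitly instead of being a global, so this only illustrates the logic rather than reproducing the patch.

```c
#include <stdbool.h>

/* Hypothetical role of this QEMU instance with respect to COLO. */
enum sketch_role {
    SKETCH_NOT_IN_COLO,
    SKETCH_PRIMARY,
    SKETCH_SECONDARY
};

/*
 * Returns true when COLO takes over handling of the shutdown request,
 * false when the normal shutdown path should run.
 */
static bool sketch_colo_shutdown(enum sketch_role role,
                                 bool *shutdown_requested)
{
    switch (role) {
    case SKETCH_SECONDARY:
        /* Ignored locally: shutting down is the primary's responsibility. */
        return true;
    case SKETCH_PRIMARY:
        /* Noted here, then acted on by the checkpoint loop. */
        *shutdown_requested = true;
        return true;
    default:
        return false;
    }
}
```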

Cc: Paolo Bonzini 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v15:
- Continue the shutdown process even if an error happens
  while sending the 'SHUTDOWN' message to the SVM.
- Add Reviewed-by tag
v14:
- Remove 'colo_shutdown' variable, use colo_shutdown_request directly
v13:
- Move COLO shutdown related codes to colo.c file (Dave's suggestion)
---
 include/migration/colo.h |  2 ++
 include/sysemu/sysemu.h  |  3 +++
 migration/colo.c | 44 ++--
 qapi-schema.json |  4 +++-
 stubs/migration-colo.c   |  5 +
 vl.c | 19 ---
 6 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index e32eef4..919b135 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -35,4 +35,6 @@ COLOMode get_colo_mode(void);
 
 /* failover */
 void colo_do_failover(MigrationState *s);
+
+bool colo_shutdown(void);
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 3bb8897..91eeda3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -52,6 +52,8 @@ typedef enum WakeupReason {
 QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+extern int colo_shutdown_requested;
+
 void qemu_system_reset_request(void);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
@@ -59,6 +61,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request_core(void);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/migration/colo.c b/migration/colo.c
index 515d561..855edee 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -330,6 +330,20 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 goto out;
 }
 
+if (colo_shutdown_requested) {
+colo_put_cmd(s->to_dst_file, COLO_MESSAGE_GUEST_SHUTDOWN, &local_err);
+if (local_err) {
+error_free(local_err);
+/* Go on the shutdown process and throw the error message */
+error_report("Failed to send shutdown message to SVM");
+}
+qemu_fflush(s->to_dst_file);
+colo_shutdown_requested = 0;
+qemu_system_shutdown_request_core();
+/* Fix me: Just let the colo thread exit ? */
+qemu_thread_exit(0);
+}
+
 ret = 0;
 /* Resume primary guest */
 qemu_mutex_lock_iothread();
@@ -390,8 +404,9 @@ static void colo_process_checkpoint(MigrationState *s)
 }
 
 current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-if (current_time - checkpoint_time <
-s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
+if ((current_time - checkpoint_time <
+s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) &&
+!colo_shutdown_requested) {
 int64_t delay_ms;
 
 delay_ms = s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] -
@@ -465,6 +480,15 @@ static void colo_wait_handle_cmd(QEMUFile *f, int 
*checkpoint_request,
 case COLO_MESSAGE_CHECKPOINT_REQUEST:
 *checkpoint_request = 1;
 break;
+case COLO_MESSAGE_GUEST_SHUTDOWN:
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+qemu_system_shutdown_request_core();
+qemu_mutex_unlock_iothread();
+/* the main thread will exit and terminate the whole
+* process, do we need some cleanup?
+*/
+qemu_thread_exit(0);
 default:
 *checkpoint_request = 0;
 error_setg(errp, "Got unknown COLO command: %d", cmd);
@@ -636,3 +660,19 @@ out:
 
 return NULL;
 }
+
+bool colo_shutdown(void)
+{
+/*
+* If in COLO mode, we need to do some significant work before responding
+* to the shutdown request.
+*/
+if (migration_incoming_in_colo_state()) {
+return true; /* primary's responsibility */
+}
+if (migration_in_colo_state()) {
+colo_shutdown_requested = 1;
+return true;
+}
+return false;
+}
diff --git a/qapi-schema.json b/qapi-schema.json
index 7fec696..4d8ba04 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -752,12 +752,14 @@
 #
 # @vmstate-loaded: VM's state has been loaded by SVM.
 #
+# @guest-shutdown: shutdown request from PVM to SVM
+#
 # Since: 2.6
 ##
 { 'enum': 'COLOMessage',
   

[Qemu-devel] [PATCH COLO-Frame v15 15/38] COLO: Add checkpoint-delay parameter for migrate-set-parameters

2016-02-21 Thread zhanghailiang
Add checkpoint-delay parameter for migrate-set-parameters, so that
we can control the checkpoint frequency when COLO is in periodic mode.
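
For periodic mode, the wait before the next checkpoint can be derived from the configured delay roughly like this. It is a sketch under assumptions: checkpoint_wait_ms() is a hypothetical helper, and all times are taken to be in milliseconds.

```c
#include <stdint.h>

/*
 * How long to wait before starting the next checkpoint.
 * Returns 0 when the configured delay has already elapsed.
 */
static int64_t checkpoint_wait_ms(int64_t now_ms,
                                  int64_t last_checkpoint_ms,
                                  int64_t delay_ms)
{
    int64_t elapsed = now_ms - last_checkpoint_ms;

    return elapsed < delay_ms ? delay_ms - elapsed : 0;
}
```

With a delay of 200 ms, a checkpoint that finished 50 ms ago leaves 150 ms to wait, while one that finished 250 ms ago can start immediately.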

Cc: Luiz Capitulino 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v12:
- Change checkpoint-delay to x-checkpoint-delay (Dave's suggestion)
- Add Reviewed-by tag
v11:
- Move this patch ahead of the patch that uses 'checkpoint_delay'
 (Dave's suggestion)
v10:
- Fix related qmp command
---
 hmp.c |  7 +++
 migration/migration.c | 24 +++-
 qapi-schema.json  | 19 ---
 qmp-commands.hx   |  3 ++-
 4 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/hmp.c b/hmp.c
index bfbd667..786954d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -285,6 +285,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, " %s: %" PRId64,
 
MigrationParameter_lookup[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT],
 params->x_cpu_throttle_increment);
+monitor_printf(mon, " %s: %" PRId64,
+MigrationParameter_lookup[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY],
+params->x_checkpoint_delay);
 monitor_printf(mon, "\n");
 }
 
@@ -1241,6 +1244,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 bool has_decompress_threads = false;
 bool has_x_cpu_throttle_initial = false;
 bool has_x_cpu_throttle_increment = false;
+bool has_x_checkpoint_delay = false;
 int i;
 
 for (i = 0; i < MIGRATION_PARAMETER__MAX; i++) {
@@ -1260,6 +1264,9 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 break;
 case MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT:
 has_x_cpu_throttle_increment = true;
+break;
+case MIGRATION_PARAMETER_X_CHECKPOINT_DELAY:
+has_x_checkpoint_delay = true;
 break;
 }
 qmp_migrate_set_parameters(has_compress_level, value,
@@ -1267,6 +1273,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
has_decompress_threads, value,
has_x_cpu_throttle_initial, value,
has_x_cpu_throttle_increment, value,
+   has_x_checkpoint_delay, value,
    &err);
 break;
 }
diff --git a/migration/migration.c b/migration/migration.c
index 6e19c15..324dcb6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -57,6 +57,11 @@
 /* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_CACHE_SIZE (64 * 1024 * 1024)
 
+/* The delay time (in ms) between two COLO checkpoints
+ * Note: Please change this default value to 1 when we support hybrid mode.
+ */
+#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200
+
 static NotifierList migration_state_notifiers =
 NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 
@@ -92,6 +97,8 @@ MigrationState *migrate_get_current(void)
 DEFAULT_MIGRATE_X_CPU_THROTTLE_INITIAL,
 .parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
 DEFAULT_MIGRATE_X_CPU_THROTTLE_INCREMENT,
+.parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] =
+DEFAULT_MIGRATE_X_CHECKPOINT_DELAY,
 };
 
 if (!once) {
@@ -531,6 +538,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL];
 params->x_cpu_throttle_increment =
 s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT];
+params->x_checkpoint_delay =
+s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY];
 
 return params;
 }
@@ -738,7 +747,10 @@ void qmp_migrate_set_parameters(bool has_compress_level,
 bool has_x_cpu_throttle_initial,
 int64_t x_cpu_throttle_initial,
 bool has_x_cpu_throttle_increment,
-int64_t x_cpu_throttle_increment, Error **errp)
+int64_t x_cpu_throttle_increment,
+bool has_x_checkpoint_delay,
+int64_t x_checkpoint_delay,
+Error **errp)
 {
 MigrationState *s = migrate_get_current();
 
@@ -773,6 +785,11 @@ void qmp_migrate_set_parameters(bool has_compress_level,
"x_cpu_throttle_increment",
"an integer in the range of 1 to 99");
 }
+if (has_x_checkpoint_delay && (x_checkpoint_delay < 0)) {
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+ 

[Qemu-devel] [PATCH COLO-Frame v15 07/38] COLO: Implement colo checkpoint protocol

2016-02-21 Thread zhanghailiang
We need a user-defined communication protocol to control the checkpoint
process.

A new checkpoint request is started by the Primary VM, and the interaction
proceeds as shown below:
Checkpoint synchronizing points:

                      Primary               Secondary
                                            initial work
'checkpoint-ready'    <-------------------- @

'checkpoint-request'  @ -------------------->
                                            Suspend (Only in hybrid mode)
'checkpoint-reply'    <-------------------- @
                      Suspend state
'vmstate-send'        @ -------------------->
                      Send state            Receive state
'vmstate-received'    <-------------------- @
                      Release packets       Load state
'vmstate-load'        <-------------------- @
                      Resume                Resume (Only in hybrid mode)

                      Start Comparing (Only in hybrid mode)
NOTE:
 1) '@' marks the side that sends the message.
 2) Every sync-point is synchronized by the two sides with only
    one handshake (single direction) for low latency.
    If stricter synchronization is required, an opposite-direction
    sync-point should be added.
 3) Since sync-points are single direction, the remote side may
    have gone far ahead by the time this side receives the sync-point.
 4) For now, we only support 'periodic' checkpoint mode, in which
    the Secondary VM is not running; 'hybrid' mode will be supported later.
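
As a rough illustration of the message exchange described above (not part of
the patch), each sync-point can be thought of as a 32-bit big-endian message
ID that the receiver validates against the ID it expects next. All sketch_*
names below are hypothetical, and the byte buffer stands in for the QEMUFile
channel; this is a sketch under those assumptions, not the actual helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message IDs mirroring the sync-points listed above. */
enum sketch_colo_msg {
    SKETCH_MSG_CHECKPOINT_READY = 0,
    SKETCH_MSG_CHECKPOINT_REQUEST,
    SKETCH_MSG_CHECKPOINT_REPLY,
    SKETCH_MSG_VMSTATE_SEND,
    SKETCH_MSG_VMSTATE_RECEIVED,
    SKETCH_MSG_VMSTATE_LOAD,
    SKETCH_MSG__MAX
};

/* Encode a message ID as a 32-bit big-endian word.
 * Returns the number of bytes written (4), or -1 on invalid input. */
static int sketch_put_msg(uint8_t *buf, size_t cap, uint32_t msg)
{
    if (cap < 4 || msg >= SKETCH_MSG__MAX) {
        return -1;
    }
    buf[0] = (uint8_t)(msg >> 24);
    buf[1] = (uint8_t)(msg >> 16);
    buf[2] = (uint8_t)(msg >> 8);
    buf[3] = (uint8_t)msg;
    return 4;
}

/* Decode a message ID and compare it with the one the caller expects.
 * Returns 0 on a match, -1 on short input, invalid ID, or mismatch. */
static int sketch_get_check_msg(const uint8_t *buf, size_t len,
                                uint32_t expect)
{
    uint32_t msg;

    if (len < 4) {
        return -1;
    }
    msg = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
          ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    if (msg >= SKETCH_MSG__MAX || msg != expect) {
        return -1;
    }
    return 0;
}
```

In this sketch the Primary would call sketch_put_msg() with
SKETCH_MSG_CHECKPOINT_REQUEST and the Secondary would call
sketch_get_check_msg() expecting the same ID; each sync-point is one such
single-direction handshake.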

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Cc: Eric Blake 
Cc: Markus Armbruster 
Cc: Dr. David Alan Gilbert 
Reviewed-by: Dr. David Alan Gilbert 
---
v14:
- Rename 'COLOCommand' to 'COLOMessage'. (Markus's suggestion)
- Add Reviewed-by tag
v13:
- Refactor colo command related helper functions, use 'Error **errp' parameter
  instead of return value to indicate success or failure.
- Fix some other comments from Markus.

v12:
- Rename colo_ctl_put() to colo_put_cmd()
- Rename colo_ctl_get() to colo_get_check_cmd() and drop
  the third parameter
- Rename colo_ctl_get_cmd() to colo_get_cmd()
- Remove useless 'invalid' member for COLOcommand enum.
v11:
- Add missing 'checkpoint-ready' communication in comment.
- Use parameter to return 'value' for colo_ctl_get() (Dave's suggestion)
- Fix trace for colo_ctl_get() to trace command and value both
v10:
- Rename enum COLOCmd to COLOCommand (Eric's suggestion).
- Remove unused 'ram-steal'
---
 migration/colo.c | 201 ++-
 qapi-schema.json |  25 +++
 trace-events |   2 +
 3 files changed, 226 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 43e9890..c0ff088 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,6 +10,7 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include 
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
@@ -34,22 +35,147 @@ bool migration_incoming_in_colo_state(void)
 return mis && (mis->state == MIGRATION_STATUS_COLO);
 }
 
+static void colo_put_cmd(QEMUFile *f, COLOMessage cmd,
+ Error **errp)
+{
+int ret;
+
+if (cmd >= COLO_MESSAGE__MAX) {
+error_setg(errp, "%s: Invalid cmd", __func__);
+return;
+}
+qemu_put_be32(f, cmd);
+qemu_fflush(f);
+
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Can't put COLO command");
+}
+trace_colo_put_cmd(COLOMessage_lookup[cmd]);
+}
+
+static COLOMessage colo_get_cmd(QEMUFile *f, Error **errp)
+{
+COLOMessage cmd;
+int ret;
+
+cmd = qemu_get_be32(f);
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Can't get COLO command");
+return cmd;
+}
+if (cmd >= COLO_MESSAGE__MAX) {
+error_setg(errp, "%s: Invalid cmd", __func__);
+return cmd;
+}
+trace_colo_get_cmd(COLOMessage_lookup[cmd]);
+return cmd;
+}
+
+static void colo_get_check_cmd(QEMUFile *f, COLOMessage expect_cmd,
+   Error **errp)
+{
+COLOMessage cmd;
+Error *local_err = NULL;
+
+cmd = colo_get_cmd(f, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+if (cmd != expect_cmd) {
+error_setg(errp, "Unexpected COLO command %d, expected %d",
+  cmd, expect_cmd);
+}
+}
+
+static int colo_do_checkpoint_transaction(MigrationState *s)
+{
+Error *local_err = NULL;
+
+colo_put_cmd(s->to_dst_file, COLO_MESSAGE_CHECKPOINT_REQUEST,
+ &local_err);
+if (local_err) {
+goto out;
+}
+
+colo_get_check_cmd(s->rp_state.from_dst_file,
+   COLO_MESSAGE_CHECKPOINT_REPLY, &local_err);
+if (local_err) {
+

[Qemu-devel] [PATCH COLO-Frame v15 10/38] COLO: Save PVM state to secondary side when do checkpoint

2016-02-21 Thread zhanghailiang
The main job of a checkpoint is to synchronize SVM with PVM.
A VM's state includes RAM and device state, so we migrate PVM's
state to SVM when doing a checkpoint, just like migration does.

We cache PVM's state on the secondary side, using a QEMUSizedBuffer (qsb)
to store the data. Because the secondary needs to know the size of the VM
state, the primary first stores the VM state in a qsb temporarily, gets the
data size by calling qsb_get_length(), and then migrates the data to the
qsb on the secondary side.
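
To illustrate the "size first, then data" idea described above, here is a
minimal sketch of length-prefixed framing over a plain byte buffer. It is not
the patch's code: the sketch_* names are hypothetical, and the buffer stands
in for the QEMUFile stream. It assumes an 8-byte big-endian size followed by
the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write an 8-byte big-endian length followed by the payload.
 * Returns the total number of bytes written, or 0 if 'out' is too small. */
static size_t sketch_frame_write(uint8_t *out, size_t cap,
                                 const uint8_t *payload, uint64_t len)
{
    int i;

    if (cap < 8 || len > cap - 8) {
        return 0;
    }
    for (i = 0; i < 8; i++) {
        out[i] = (uint8_t)(len >> (56 - 8 * i));
    }
    memcpy(out + 8, payload, (size_t)len);
    return 8 + (size_t)len;
}

/* Read the length prefix, check that the whole payload is available and
 * fits into 'dst', then copy it. Returns the payload size, or 0 on failure
 * (a zero-length payload is indistinguishable from failure in this sketch). */
static size_t sketch_frame_read(const uint8_t *in, size_t avail,
                                uint8_t *dst, size_t dst_cap)
{
    uint64_t len = 0;
    int i;

    if (avail < 8) {
        return 0;
    }
    for (i = 0; i < 8; i++) {
        len = (len << 8) | in[i];
    }
    if (len > avail - 8 || len > dst_cap) {
        return 0;
    }
    memcpy(dst, in + 8, (size_t)len);
    return (size_t)len;
}
```

Knowing the size up front lets the receiver refuse to load a partially
received state, which is the reason the primary buffers the state before
sending it.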

Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
Cc: Dr. David Alan Gilbert 
---
v13:
- Refactor colo_put_cmd_value() to use 'Error **errp' to indicate success
  or failure.
v12:
- Replace the old colo_ctl_get() with the new helper function
  colo_put_cmd_value()
v11:
- Add Reviewed-by tag
---
 migration/colo.c | 92 +++-
 migration/ram.c  | 39 ++--
 2 files changed, 114 insertions(+), 17 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index c0ff088..7e4692c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -16,6 +16,9 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 
+/* colo buffer */
+#define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
+
 bool colo_supported(void)
 {
 return true;
@@ -54,6 +57,27 @@ static void colo_put_cmd(QEMUFile *f, COLOMessage cmd,
 trace_colo_put_cmd(COLOMessage_lookup[cmd]);
 }
 
+static void colo_put_cmd_value(QEMUFile *f, COLOMessage cmd,
+   uint64_t value, Error **errp)
+{
+Error *local_err = NULL;
+int ret;
+
+colo_put_cmd(f, cmd, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+qemu_put_be64(f, value);
+qemu_fflush(f);
+
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to send value for command:%s",
+ COLOMessage_lookup[cmd]);
+}
+}
+
 static COLOMessage colo_get_cmd(QEMUFile *f, Error **errp)
 {
 COLOMessage cmd;
@@ -90,9 +114,13 @@ static void colo_get_check_cmd(QEMUFile *f, COLOMessage expect_cmd,
 }
 }
 
-static int colo_do_checkpoint_transaction(MigrationState *s)
+static int colo_do_checkpoint_transaction(MigrationState *s,
+  QEMUSizedBuffer *buffer)
 {
+QEMUFile *trans = NULL;
+size_t size;
 Error *local_err = NULL;
+int ret = -1;
 
 colo_put_cmd(s->to_dst_file, COLO_MESSAGE_CHECKPOINT_REQUEST,
  &local_err);
@@ -105,15 +133,48 @@ static int colo_do_checkpoint_transaction(MigrationState *s)
 if (local_err) {
 goto out;
 }
+/* Reset colo buffer and open it for write */
+qsb_set_length(buffer, 0);
+trans = qemu_bufopen("w", buffer);
+if (!trans) {
+error_report("Open colo buffer for write failed");
+goto out;
+}
 
-/* TODO: suspend and save vm state to colo buffer */
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("run", "stop");
+
+/* Disable block migration */
+s->params.blk = 0;
+s->params.shared = 0;
+qemu_savevm_state_header(trans);
+qemu_savevm_state_begin(trans, &s->params);
+qemu_mutex_lock_iothread();
+qemu_savevm_state_complete_precopy(trans, false);
+qemu_mutex_unlock_iothread();
+
+qemu_fflush(trans);
 
 colo_put_cmd(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
 if (local_err) {
 goto out;
 }
+/* we send the total size of the vmstate first */
+size = qsb_get_length(buffer);
+colo_put_cmd_value(s->to_dst_file, COLO_MESSAGE_VMSTATE_SIZE,
+   size, &local_err);
+if (local_err) {
+goto out;
+}
 
-/* TODO: send vmstate to Secondary */
+qsb_put_buffer(s->to_dst_file, buffer, size);
+qemu_fflush(s->to_dst_file);
+ret = qemu_file_get_error(s->to_dst_file);
+if (ret < 0) {
+goto out;
+}
 
 colo_get_check_cmd(s->rp_state.from_dst_file,
 COLO_MESSAGE_VMSTATE_RECEIVED, &local_err);
@@ -127,18 +188,26 @@ static int colo_do_checkpoint_transaction(MigrationState *s)
 goto out;
 }
 
-/* TODO: resume Primary */
+ret = 0;
+/* Resume primary guest */
+qemu_mutex_lock_iothread();
+vm_start();
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("stop", "run");
 
-return 0;
 out:
 if (local_err) {
 error_report_err(local_err);
 }
-return -EINVAL;
+if (trans) {
+qemu_fclose(trans);
+}
+return ret;
 }
 
 static void colo_process_checkpoint(MigrationState *s)
 {
+QEMUSizedBuffer *buffer = NULL;
 Error *local_err = NULL;
 int ret;
 
@@ -158,6 +227,12 @@ static void 

[Qemu-devel] [PATCH COLO-Frame v15 20/38] COLO: Implement failover work for Secondary VM

2016-02-21 Thread zhanghailiang
If the user requires the SVM to take over, the COLO incoming thread should
exit its loop, while the failover BH helps it return to the migration
incoming coroutine.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v12:
- Improve error message as suggested by Dave
- Add Reviewed-by tag
---
 migration/colo.c | 41 ++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 89cea58..a65b22b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -45,6 +45,33 @@ static bool colo_runstate_is_stopped(void)
 return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
 }
 
+static void secondary_vm_do_failover(void)
+{
+int old_state;
+MigrationIncomingState *mis = migration_incoming_get_current();
+
+migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
+  MIGRATION_STATUS_COMPLETED);
+
+if (!autostart) {
+error_report("\"-S\" qemu option will be ignored in secondary side");
+/* recover runstate to normal migration finish state */
+autostart = true;
+}
+
+old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+   FAILOVER_STATUS_COMPLETED);
+if (old_state != FAILOVER_STATUS_HANDLING) {
+error_report("Incorrect state (%d) while doing failover for "
+ "secondary VM", old_state);
+return;
+}
+/* For Secondary VM, jump to incoming co */
+if (mis->migration_incoming_co) {
+qemu_coroutine_enter(mis->migration_incoming_co, NULL);
+}
+}
+
 static void primary_vm_do_failover(void)
 {
 MigrationState *s = migrate_get_current();
@@ -71,6 +98,8 @@ void colo_do_failover(MigrationState *s)
 
 if (get_colo_mode() == COLO_MODE_PRIMARY) {
 primary_vm_do_failover();
+} else {
+secondary_vm_do_failover();
 }
 }
 
@@ -430,6 +459,11 @@ void *colo_process_incoming_thread(void *opaque)
 goto out;
 }
 assert(request);
+if (failover_request_is_active()) {
+error_report("failover request");
+goto out;
+}
+
 /* FIXME: This is unnecessary for periodic checkpoint mode */
 colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
  &local_err);
@@ -501,10 +535,11 @@ out:
 qemu_fclose(fb);
 }
 qsb_free(buffer);
-
-qemu_mutex_lock_iothread();
+/* Here, we can ensure the BH holds the global lock and will join the colo
+* incoming thread, so it is not necessary to take the lock again here,
+* or there will be a deadlock.
+*/
 colo_release_ram_cache();
-qemu_mutex_unlock_iothread();
 
 if (mis->to_src_file) {
 qemu_fclose(mis->to_src_file);
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 23/38] COLO failover: Don't do failover during loading VM's state

2016-02-21 Thread zhanghailiang
We should not do failover work while the main thread is loading the
VM's state, otherwise it will destroy the consistency of the VM's memory
and device state.

Here we add a new failover status 'RELAUNCH' which means we should
relaunch the process of failover.
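
The deferral described above can be sketched as a small state machine driven
by compare-and-swap. This is only an illustration under assumptions: the
sketch_* and SK_* names are hypothetical, C11 atomics stand in for whatever
primitive the real failover_set_state() uses, and the plain 'sketch_loading'
flag assumes the caller already serializes access to it (as a global lock
would).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum sketch_failover {
    SK_NONE = 0,
    SK_REQUEST,
    SK_HANDLING,
    SK_COMPLETED,
    SK_RELAUNCH
};

static _Atomic int sketch_state = SK_NONE;
static bool sketch_loading; /* assumed to be protected by the caller */

/* Move from old_state to new_state only if the current state is old_state.
 * Returns the state observed before the attempt. */
static int sketch_set_state(int old_state, int new_state)
{
    int seen = old_state;

    atomic_compare_exchange_strong(&sketch_state, &seen, new_state);
    return seen;
}

/* Failover handler: if a load is in progress, mark the request for
 * relaunch instead of completing it now. */
static void sketch_do_failover(void)
{
    if (sketch_loading) {
        sketch_set_state(SK_HANDLING, SK_RELAUNCH);
        return;
    }
    sketch_set_state(SK_HANDLING, SK_COMPLETED);
}

/* Called once loading is done. Returns true if a deferred failover was
 * pending and must be requested again by the caller. */
static bool sketch_finish_loading(void)
{
    sketch_loading = false;
    return sketch_set_state(SK_RELAUNCH, SK_NONE) == SK_RELAUNCH;
}
```

The point of the extra state is that the handler never touches VM state
while it is half loaded; the loader notices the pending request afterwards
and restarts failover from 'NONE'.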

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v14:
- Move the place of 'vmstate_loading = false;'.
v13:
- Add Reviewed-by tag
---
 include/migration/failover.h |  2 ++
 migration/colo.c | 25 +
 2 files changed, 27 insertions(+)

diff --git a/include/migration/failover.h b/include/migration/failover.h
index c4bd81e..99b0d58 100644
--- a/include/migration/failover.h
+++ b/include/migration/failover.h
@@ -20,6 +20,8 @@ typedef enum COLOFailoverStatus {
 FAILOVER_STATUS_REQUEST = 1, /* Request but not handled */
 FAILOVER_STATUS_HANDLING = 2, /* In the process of handling failover */
 FAILOVER_STATUS_COMPLETED = 3, /* Finish the failover process */
+/* Optional, Relaunch the failover process, again 'NONE' -> 'COMPLETED' */
+FAILOVER_STATUS_RELAUNCH = 4,
 } COLOFailoverStatus;
 
 void failover_init_state(void);
diff --git a/migration/colo.c b/migration/colo.c
index 5c87a8e..515d561 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -19,6 +19,8 @@
 #include "migration/failover.h"
 #include "qapi-event.h"
 
+static bool vmstate_loading;
+
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
@@ -51,6 +53,19 @@ static void secondary_vm_do_failover(void)
 int old_state;
 MigrationIncomingState *mis = migration_incoming_get_current();
 
+/* Cannot do failover while the VM is loading its VMstate, or
+  * it will break the secondary VM.
+  */
+if (vmstate_loading) {
+old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
+   FAILOVER_STATUS_RELAUNCH);
+if (old_state != FAILOVER_STATUS_HANDLING) {
+error_report("Unknown error while doing failover for secondary VM, "
+ "old_state: %d", old_state);
+}
+return;
+}
+
 migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
 
@@ -560,13 +575,22 @@ void *colo_process_incoming_thread(void *opaque)
 
 qemu_mutex_lock_iothread();
 qemu_system_reset(VMRESET_SILENT);
+vmstate_loading = true;
 if (qemu_loadvm_state(fb) < 0) {
 error_report("COLO: loadvm failed");
 qemu_mutex_unlock_iothread();
 goto out;
 }
+
+vmstate_loading = false;
 qemu_mutex_unlock_iothread();
 
+if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
+failover_set_state(FAILOVER_STATUS_RELAUNCH, FAILOVER_STATUS_NONE);
+failover_request_active(NULL);
+goto out;
+}
+
 colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
  &local_err);
 if (local_err) {
@@ -578,6 +602,7 @@ void *colo_process_incoming_thread(void *opaque)
 }
 
 out:
+ vmstate_loading = false;
 /* Throw the unreported error message after exited from loop */
 if (local_err) {
 error_report_err(local_err);
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 06/38] COLO/migration: Create a new communication path from destination to source

2016-02-21 Thread zhanghailiang
This new communication path will be used for returning messages
from destination to source.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v13:
- Remove useless error report
v12:
- Add Reviewed-by tag
v11:
- Rebase master to use qemu_file_get_return_path() for opening return path
v10:
- Fix the error log (Dave's suggestion).
---
 migration/colo.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 20052d9..43e9890 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -36,6 +36,12 @@ bool migration_incoming_in_colo_state(void)
 
 static void colo_process_checkpoint(MigrationState *s)
 {
+s->rp_state.from_dst_file = qemu_file_get_return_path(s->to_dst_file);
+if (!s->rp_state.from_dst_file) {
+error_report("Open QEMUFile from_dst_file failed");
+goto out;
+}
+
 qemu_mutex_lock_iothread();
 vm_start();
 qemu_mutex_unlock_iothread();
@@ -43,8 +49,13 @@ static void colo_process_checkpoint(MigrationState *s)
 
 /*TODO: COLO checkpoint savevm loop*/
 
+out:
 migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
   MIGRATION_STATUS_COMPLETED);
+
+if (s->rp_state.from_dst_file) {
+qemu_fclose(s->rp_state.from_dst_file);
+}
 }
 
 void migrate_start_colo_process(MigrationState *s)
@@ -63,8 +74,23 @@ void *colo_process_incoming_thread(void *opaque)
 migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_COLO);
 
+mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
+if (!mis->to_src_file) {
+error_report("colo incoming thread: Open QEMUFile to_src_file failed");
+goto out;
+}
+/* Note: We set the fd to non-blocking in the migration incoming coroutine,
+*  but here we are in the colo incoming thread, so it is ok to set the
+*  fd back to blocking.
+*/
+qemu_file_set_blocking(mis->from_src_file, true);
+
 /* TODO: COLO checkpoint restore loop */
 
+out:
+if (mis->to_src_file) {
+qemu_fclose(mis->to_src_file);
+}
 migration_incoming_exit_colo();
 
 return NULL;
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 14/38] COLO: Flush PVM's cached RAM into SVM's memory

2016-02-21 Thread zhanghailiang
While the VM is running, PVM may dirty some pages. We transfer PVM's dirty
pages to SVM and store them in SVM's RAM cache at the next checkpoint, so
the content of SVM's RAM cache is always the same as PVM's memory after a
checkpoint.

Instead of flushing the entire cached copy of PVM's RAM into SVM's memory,
we do this in a more efficient way: only flush the pages dirtied by PVM
since the last checkpoint. In this way, we can ensure SVM's memory is the
same as PVM's.

Besides, we must make sure the RAM cache is flushed before loading the
device state.
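
The selective flush described above can be sketched as follows. This is a
simplified illustration, not the patch's implementation: it assumes a single
contiguous RAM region, a fixed 4 KiB page size, and one dirty flag byte per
page instead of the migration bitmap; all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Copy only the pages flagged dirty from the cache into live memory and
 * clear their flags. Returns the number of pages copied. */
static size_t sketch_flush_dirty(uint8_t *memory, const uint8_t *cache,
                                 uint8_t *dirty, size_t npages)
{
    size_t page;
    size_t copied = 0;

    for (page = 0; page < npages; page++) {
        if (!dirty[page]) {
            continue; /* clean page: memory already matches the cache */
        }
        memcpy(memory + page * SKETCH_PAGE_SIZE,
               cache + page * SKETCH_PAGE_SIZE,
               SKETCH_PAGE_SIZE);
        dirty[page] = 0;
        copied++;
    }
    return copied;
}
```

After a flush no page should remain flagged, which corresponds to the
"dirty pages == 0" condition the real code asserts.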

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
---
v12:
- Add a trace point in the end of colo_flush_ram_cache() (Dave's suggestion)
- Add Reviewed-by tag
v11:
- Move the place of 'need_flush' (Dave's suggestion)
- Remove unused 'DPRINTF("Flush ram_cache\n")'
v10:
- Trace the number of dirty pages received.
---
 include/migration/migration.h |  1 +
 migration/colo.c  |  2 --
 migration/ram.c   | 38 ++
 trace-events  |  2 ++
 4 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 6907986..14b9f3d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -336,4 +336,5 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
 /* ram cache */
 int colo_init_ram_cache(void);
 void colo_release_ram_cache(void);
+void colo_flush_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index b9f60c7..473fb14 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -417,8 +417,6 @@ void *colo_process_incoming_thread(void *opaque)
 }
 qemu_mutex_unlock_iothread();
 
-/* TODO: flush vm state */
-
 colo_put_cmd(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED,
  &local_err);
 if (local_err) {
diff --git a/migration/ram.c b/migration/ram.c
index 7373df3..891f3b2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2465,6 +2465,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
  * be atomic
  */
 bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
+bool need_flush = false;
 
 seq_iter++;
 
@@ -2499,6 +2500,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 /* After going into COLO, we should load the Page into colo_cache */
 if (ram_cache_enable) {
 host = colo_cache_from_block_offset(block, addr);
+need_flush = true;
 } else {
 host = host_from_ram_block_offset(block, addr);
 }
@@ -2591,6 +2593,10 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 }
 
 rcu_read_unlock();
+
+if (!ret  && ram_cache_enable && need_flush) {
+colo_flush_ram_cache();
+}
 DPRINTF("Completed load of VM with exit code %d seq iteration "
 "%" PRIu64 "\n", ret, seq_iter);
 return ret;
@@ -2663,6 +2669,38 @@ void colo_release_ram_cache(void)
 rcu_read_unlock();
 }
 
+/*
+ * Flush content of RAM cache into SVM's memory.
+ * Only flush the pages that were dirtied by PVM or SVM or both.
+ */
+void colo_flush_ram_cache(void)
+{
+RAMBlock *block = NULL;
+void *dst_host;
+void *src_host;
+ram_addr_t offset = 0;
+
+trace_colo_flush_ram_cache_begin(migration_dirty_pages);
+rcu_read_lock();
+block = QLIST_FIRST_RCU(&ram_list.blocks);
+while (block) {
+ram_addr_t ram_addr_abs;
+offset = migration_bitmap_find_dirty(block, offset, &ram_addr_abs);
+migration_bitmap_clear_dirty(ram_addr_abs);
+if (offset >= block->used_length) {
+offset = 0;
+block = QLIST_NEXT_RCU(block, next);
+} else {
+dst_host = block->host + offset;
+src_host = block->colo_cache + offset;
+memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+}
+}
+rcu_read_unlock();
+trace_colo_flush_ram_cache_end();
+assert(migration_dirty_pages == 0);
+}
+
 static SaveVMHandlers savevm_ram_handlers = {
 .save_live_setup = ram_save_setup,
 .save_live_iterate = ram_save_iterate,
diff --git a/trace-events b/trace-events
index 97807cd..ee4a2fb 100644
--- a/trace-events
+++ b/trace-events
@@ -1290,6 +1290,8 @@ migration_throttle(void) ""
 ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
+colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
+colo_flush_ram_cache_end(void) ""
 
 # hw/display/qxl.c
 disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 09/38] QEMUSizedBuffer: Introduce two help functions for qsb

2016-02-21 Thread zhanghailiang
Introduce two new QEMUSizedBuffer APIs which will be used by COLO to buffer
VM state:
- qsb_put_buffer() puts the content of a given QEMUSizedBuffer into a
  QEMUFile; it is used to send the buffered VM state to the secondary.
- qsb_fill_buffer() reads 'size' bytes of data from the file into a qsb;
  it is used to read the VM state from the socket into a buffer.
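
As a rough sketch of what these two helpers do (gather a chunked buffer into
a stream, and scatter a stream into a chunked buffer), consider the
simplified version below. It is not the QEMU API: 'struct sketch_chunk' is a
hypothetical stand-in for the iovec array inside a QEMUSizedBuffer, and flat
byte arrays stand in for the QEMUFile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one iovec entry of a sized buffer. */
struct sketch_chunk {
    uint8_t *base;
    size_t len;
};

/* Gather: copy the first 'size' bytes stored across the chunks into a flat
 * sink. Returns the number of bytes copied. */
static size_t sketch_put_chunks(uint8_t *sink, const struct sketch_chunk *c,
                                size_t n, size_t size)
{
    size_t i;
    size_t done = 0;

    for (i = 0; i < n && size > 0; i++) {
        size_t l = c[i].len < size ? c[i].len : size;

        memcpy(sink + done, c[i].base, l);
        done += l;
        size -= l;
    }
    return done;
}

/* Scatter: fill the chunks in order from a flat source, starting at the
 * beginning of the first chunk. Returns the number of bytes stored. */
static size_t sketch_fill_chunks(struct sketch_chunk *c, size_t n,
                                 const uint8_t *src, size_t size)
{
    size_t i;
    size_t used = 0;

    for (i = 0; i < n && used < size; i++) {
        size_t room = size - used;
        size_t l = c[i].len < room ? c[i].len : room;

        memcpy(c[i].base, src + used, l);
        used += l;
    }
    return used;
}
```

A real stream may return fewer bytes than requested per read, so an actual
fill routine also has to track its offset within the current chunk between
partial reads; the sketch avoids that by reading from memory.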

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v11:
- size_t'ify these two helper functions (Dave's suggestion)
---
 include/migration/qemu-file.h |  3 ++-
 migration/qemu-file-buf.c | 61 +++
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index b5d08d2..ca6a582 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -150,7 +150,8 @@ ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
uint8_t *buf);
 ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
  off_t pos, size_t count);
-
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, size_t size);
+size_t qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, size_t size);
 
 /*
  * For use on files opened with qemu_bufopen
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index 7b8e78e..7801780 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -367,6 +367,67 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
 return count;
 }
 
+/**
+ * Put the content of a given QEMUSizedBuffer into QEMUFile.
+ *
+ * @f: A QEMUFile
+ * @qsb: A QEMUSizedBuffer
+ * @size: size of content to write
+ */
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, size_t size)
+{
+size_t l;
+int i;
+
+for (i = 0; i < qsb->n_iov && size > 0; i++) {
+l = MIN(qsb->iov[i].iov_len, size);
+qemu_put_buffer(f, qsb->iov[i].iov_base, l);
+size -= l;
+}
+}
+
+/*
+ * Read 'size' bytes of data from the file into qsb.
+ * always fill from pos 0 and used after qsb_create().
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fd's which
+ * all current QEMUFile are)
+ */
+size_t qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, size_t size)
+{
+ssize_t rc = qsb_grow(qsb, size);
+ssize_t pending = size;
+int i;
+uint8_t *buf = NULL;
+
+qsb->used = 0;
+
+if (rc < 0) {
+return rc;
+}
+
+for (i = 0; i < qsb->n_iov && pending > 0; i++) {
+size_t doneone = 0;
+/* read until iov full */
+while (doneone < qsb->iov[i].iov_len && pending > 0) {
+size_t readone = 0;
+
+buf = (uint8_t *)qsb->iov[i].iov_base + doneone;
+readone = qemu_get_buffer(f, buf,
+MIN(qsb->iov[i].iov_len - doneone, pending));
+if (readone == 0) {
+return qsb->used;
+}
+buf += readone;
+doneone += readone;
+pending -= readone;
+qsb->used += readone;
+}
+}
+return qsb->used;
+}
+
 typedef struct QEMUBuffer {
 QEMUSizedBuffer *qsb;
 QEMUFile *file;
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 08/38] COLO: Add a new RunState RUN_STATE_COLO

2016-02-21 Thread zhanghailiang
Guest will enter this state when paused to save/restore VM state
under colo checkpoint.
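
For readers unfamiliar with how run states are validated, the idea is a
table of allowed (from, to) pairs that is consulted on every state change.
The sketch below is a reduced, hypothetical version with only three states;
the names are not QEMU's, and the table is just a small subset chosen for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_run_state {
    SK_RS_PAUSED,
    SK_RS_RUNNING,
    SK_RS_COLO,
    SK_RS__MAX
};

struct sketch_transition {
    enum sketch_run_state from;
    enum sketch_run_state to;
};

/* Allowed transitions: a guest may enter the COLO state from paused or
 * running, and leaves it only by going back to running. */
static const struct sketch_transition sketch_transitions[] = {
    { SK_RS_PAUSED, SK_RS_RUNNING },
    { SK_RS_PAUSED, SK_RS_COLO },
    { SK_RS_RUNNING, SK_RS_PAUSED },
    { SK_RS_RUNNING, SK_RS_COLO },
    { SK_RS_COLO, SK_RS_RUNNING },
};

/* Return true if the table contains the (from, to) pair. */
static bool sketch_transition_ok(enum sketch_run_state from,
                                 enum sketch_run_state to)
{
    size_t i;
    size_t n = sizeof(sketch_transitions) / sizeof(sketch_transitions[0]);

    for (i = 0; i < n; i++) {
        if (sketch_transitions[i].from == from &&
            sketch_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}
```

Adding a new run state therefore means adding both the enum value and every
transition into and out of it, which is what this patch does for 'colo'.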

Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Eric Blake 
---
 qapi-schema.json | 5 -
 vl.c | 8 
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 29afbb9..935870d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -154,12 +154,15 @@
 # @watchdog: the watchdog action is configured to pause and has been triggered
 #
 # @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @colo: guest is paused to save/restore VM state under colo checkpoint (since
+# 2.6)
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-'guest-panicked' ] }
+'guest-panicked', 'colo' ] }
 
 ##
 # @StatusInfo:
diff --git a/vl.c b/vl.c
index f35703f..1cde195 100644
--- a/vl.c
+++ b/vl.c
@@ -593,6 +593,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE },
 { RUN_STATE_INMIGRATE, RUN_STATE_PRELAUNCH },
 { RUN_STATE_INMIGRATE, RUN_STATE_POSTMIGRATE },
+{ RUN_STATE_INMIGRATE, RUN_STATE_COLO },
 
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
@@ -605,6 +606,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
 { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
 { RUN_STATE_PAUSED, RUN_STATE_PRELAUNCH },
+{ RUN_STATE_PAUSED, RUN_STATE_COLO},
 
 { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -617,10 +619,13 @@ static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PRELAUNCH },
+{ RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
 
 { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
 { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
 
+{ RUN_STATE_COLO, RUN_STATE_RUNNING },
+
 { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
 { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
 { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
@@ -631,6 +636,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
 { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
 { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
+{ RUN_STATE_RUNNING, RUN_STATE_COLO},
 
 { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
@@ -643,10 +649,12 @@ static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
 { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
 { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
+{ RUN_STATE_SUSPENDED, RUN_STATE_COLO},
 
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
 { RUN_STATE_WATCHDOG, RUN_STATE_PRELAUNCH },
+{ RUN_STATE_WATCHDOG, RUN_STATE_COLO},
 
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 05/38] migration: Integrate COLO checkpoint process into loadvm

2016-02-21 Thread zhanghailiang
Switch from the normal migration loadvm process to the COLO checkpoint
process if COLO mode is enabled.
We add three new members to struct MigrationIncomingState:
'have_colo_incoming_thread' and 'colo_incoming_thread' record the
COLO-related thread for the secondary VM, and 'migration_incoming_co'
records the original migration incoming coroutine.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
v12:
- Add Reviewed-by tag
v11:
- We moved the call to bdrv_invalidate_cache_all(), but the deletion of the
  old call was done in another patch. Fix it.
- Add documentation for colo in 'MigrationStatus' (Eric's review comment)
v10:
- fix a bug about fd leak which is found by Dave.
---
 include/migration/colo.h  |  7 +++
 include/migration/migration.h |  7 +++
 migration/colo-comm.c | 10 ++
 migration/colo.c  | 22 ++
 migration/migration.c | 31 +--
 stubs/migration-colo.c| 10 ++
 6 files changed, 77 insertions(+), 10 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index bf84b99..b40676c 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -15,6 +15,8 @@
 
 #include "qemu-common.h"
 #include "migration/migration.h"
+#include "qemu/coroutine_int.h"
+#include "qemu/thread.h"
 
 bool colo_supported(void);
 void colo_info_init(void);
@@ -22,4 +24,9 @@ void colo_info_init(void);
 void migrate_start_colo_process(MigrationState *s);
 bool migration_in_colo_state(void);
 
+/* loadvm */
+bool migration_incoming_enable_colo(void);
+void migration_incoming_exit_colo(void);
+void *colo_process_incoming_thread(void *opaque);
+bool migration_incoming_in_colo_state(void);
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index c962ad4..e7a516c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -22,6 +22,7 @@
 #include "migration/vmstate.h"
 #include "qapi-types.h"
 #include "exec/cpu-common.h"
+#include "qemu/coroutine_int.h"
 
 #define QEMU_VM_FILE_MAGIC   0x5145564d
 #define QEMU_VM_FILE_VERSION_COMPAT  0x0002
@@ -106,6 +107,12 @@ struct MigrationIncomingState {
 void *postcopy_tmp_page;
 
 int state;
+
+bool have_colo_incoming_thread;
+QemuThread colo_incoming_thread;
+/* The coroutine we should enter (back) after failover */
+Coroutine *migration_incoming_co;
+
 /* See savevm.c */
 LoadStateEntry_Head loadvm_handlers;
 };
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 723d86d..c36d13f 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -48,3 +48,13 @@ void colo_info_init(void)
 {
 vmstate_register(NULL, 0, &colo_state, &colo_info);
 }
+
+bool migration_incoming_enable_colo(void)
+{
+return colo_info.colo_requested;
+}
+
+void migration_incoming_exit_colo(void)
+{
+colo_info.colo_requested = 0;
+}
diff --git a/migration/colo.c b/migration/colo.c
index 8d0d851..20052d9 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -13,6 +13,7 @@
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
 #include "trace.h"
+#include "qemu/error-report.h"
 
 bool colo_supported(void)
 {
@@ -26,6 +27,13 @@ bool migration_in_colo_state(void)
 return (s->state == MIGRATION_STATUS_COLO);
 }
 
+bool migration_incoming_in_colo_state(void)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+
+return mis && (mis->state == MIGRATION_STATUS_COLO);
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
 qemu_mutex_lock_iothread();
@@ -47,3 +55,17 @@ void migrate_start_colo_process(MigrationState *s)
 colo_process_checkpoint(s);
 qemu_mutex_lock_iothread();
 }
+
+void *colo_process_incoming_thread(void *opaque)
+{
+MigrationIncomingState *mis = opaque;
+
+migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+  MIGRATION_STATUS_COLO);
+
+/* TODO: COLO checkpoint restore loop */
+
+migration_incoming_exit_colo();
+
+return NULL;
+}
diff --git a/migration/migration.c b/migration/migration.c
index d7228f5..6e19c15 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -359,6 +359,27 @@ static void process_incoming_migration_co(void *opaque)
 /* Else if something went wrong then just fall out of the normal exit */
 }
 
+if (!ret) {
+/* Make sure all file formats flush their mutable metadata */
+bdrv_invalidate_cache_all(&local_err);
+if (local_err) {
+error_report_err(local_err);
+migrate_decompress_threads_join();
+exit(EXIT_FAILURE);
+}
+}
+/* we get colo info, and know if we are in colo mode */
+if (!ret && migration_incoming_enable_colo()) {
+mis->migration_incoming_co = qemu_coroutine_self();
+qemu_thread_create(&mis->colo_incoming_thread, "colo 

[Qemu-devel] [PATCH COLO-Frame v15 01/38] configure: Add parameter for configure to enable/disable COLO support

2016-02-21 Thread zhanghailiang
Add the configure options --enable-colo/--disable-colo to switch COLO
support on/off.
COLO support is on by default.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
---
v11:
- Turn COLO on by default (Eric's suggestion)
---
 configure | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/configure b/configure
index 0aa249b..3b89fe2 100755
--- a/configure
+++ b/configure
@@ -229,6 +229,7 @@ xfs=""
 vhost_net="no"
 vhost_scsi="no"
 kvm="no"
+colo="yes"
 rdma=""
 gprof="no"
 debug_tcg="no"
@@ -911,6 +912,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-colo) colo="no"
+  ;;
+  --enable-colo) colo="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1334,6 +1339,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   fdt             fdt device tree
   bluez           bluez stack connectivity
   kvm             KVM acceleration support
+  colo            COarse-grain LOck-stepping VM for Non-stop Service
   rdma            RDMA-based migration support
   uuid            uuid support
   vde             support for vde network
@@ -4725,6 +4731,7 @@ echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs $blobs"
 echo "KVM support   $kvm"
+echo "COLO support  $colo"
 echo "RDMA support  $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support   $fdt"
@@ -5320,6 +5327,10 @@ if have_backend "ftrace"; then
 fi
 echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
 
+if test "$colo" = "yes"; then
+  echo "CONFIG_COLO=y" >> $config_host_mak
+fi
+
 if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 11/38] COLO: Load PVM's dirty pages into SVM's RAM cache temporarily

2016-02-21 Thread zhanghailiang
We should not load the PVM's state directly into the SVM, because errors may
occur while the SVM is receiving data, which would break the SVM.

We need to ensure that all data has been received before loading the state into
the SVM. We use extra memory to cache this data (the PVM's RAM). The RAM cache
on the secondary side is initially the same as the SVM/PVM's memory. During a
checkpoint, we first cache the PVM's dirty pages in this RAM cache, so the RAM
cache is always the same as the PVM's memory at every checkpoint. We then flush
the cached RAM to the SVM after we have received all of the PVM's state.
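
As a rough illustration of the staging idea described above (not the code from
this patch), the following self-contained sketch stages incoming pages in a
cache and copies them into the running memory only once the checkpoint is
complete. SketchBlock, the sketch_cache_* functions, the page size, and the
per-page dirty flags are all assumptions made for the example. The patch itself
uses RAMBlock->colo_cache and leaves dirty-page recording to a separate patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* hypothetical page size */
#define SKETCH_NUM_PAGES 4u      /* hypothetical block size in pages */

/* Hypothetical stand-in for a RAM block with a staging cache. */
typedef struct {
    uint8_t host[SKETCH_NUM_PAGES * SKETCH_PAGE_SIZE];  /* memory the SVM runs from */
    uint8_t cache[SKETCH_NUM_PAGES * SKETCH_PAGE_SIZE]; /* staged copy of PVM memory */
    uint8_t dirty[SKETCH_NUM_PAGES];                    /* pages staged since last flush */
} SketchBlock;

/* Start with the cache identical to the SVM's memory. */
static void sketch_cache_init(SketchBlock *b)
{
    memcpy(b->cache, b->host, sizeof(b->host));
    memset(b->dirty, 0, sizeof(b->dirty));
}

/* Stage one incoming page in the cache; the SVM's memory is not touched. */
static int sketch_cache_load_page(SketchBlock *b, size_t page,
                                  const uint8_t *data)
{
    assert(b != NULL && data != NULL);
    if (page >= SKETCH_NUM_PAGES) {
        return -1;  /* offset outside the block */
    }
    memcpy(b->cache + page * SKETCH_PAGE_SIZE, data, SKETCH_PAGE_SIZE);
    b->dirty[page] = 1;
    return 0;
}

/* Checkpoint fully received: copy the staged pages into the SVM's memory.
 * Returns the number of pages copied. */
static size_t sketch_cache_flush(SketchBlock *b)
{
    size_t flushed = 0;

    for (size_t i = 0; i < SKETCH_NUM_PAGES; i++) {
        if (b->dirty[i]) {
            memcpy(b->host + i * SKETCH_PAGE_SIZE,
                   b->cache + i * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
            b->dirty[i] = 0;
            flushed++;
        }
    }
    return flushed;
}
```

If receiving fails partway through a checkpoint, the flush is simply never
called, so the running memory keeps its last consistent contents. The sketch
ignores pages dirtied by the SVM itself, which a real implementation would
also have to handle.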

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
---
v12:
- Fix minor error in error_report (Dave's comment)
- Add Reviewed-by tag
v11:
- Rename 'host_cache' to 'colo_cache' (Dave's suggestion)
v10:
- Split the process of dirty pages recording into a new patch
---
 include/exec/ram_addr.h   |  1 +
 include/migration/migration.h |  4 +++
 migration/colo.c  | 11 +++
 migration/ram.c   | 73 ++-
 4 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 5d33def..53c1f48 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -26,6 +26,7 @@ struct RAMBlock {
 struct rcu_head rcu;
 struct MemoryRegion *mr;
 uint8_t *host;
+uint8_t *colo_cache; /* For colo, VM's ram cache */
 ram_addr_t offset;
 ram_addr_t used_length;
 ram_addr_t max_length;
diff --git a/include/migration/migration.h b/include/migration/migration.h
index e7a516c..6907986 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -332,4 +332,8 @@ int ram_save_queue_pages(MigrationState *ms, const char 
*rbname,
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
+
+/* ram cache */
+int colo_init_ram_cache(void);
+void colo_release_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 7e4692c..57a1132 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -298,6 +298,7 @@ void *colo_process_incoming_thread(void *opaque)
 {
 MigrationIncomingState *mis = opaque;
 Error *local_err = NULL;
+int ret;
 
 migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_COLO);
@@ -313,6 +314,12 @@ void *colo_process_incoming_thread(void *opaque)
 */
 qemu_file_set_blocking(mis->from_src_file, true);
 
+ret = colo_init_ram_cache();
+if (ret < 0) {
+error_report("Failed to initialize ram cache");
+goto out;
+}
+
 colo_put_cmd(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
  &local_err);
 if (local_err) {
@@ -363,6 +370,10 @@ out:
 error_report_err(local_err);
 }
 
+qemu_mutex_lock_iothread();
+colo_release_ram_cache();
+qemu_mutex_unlock_iothread();
+
 if (mis->to_src_file) {
 qemu_fclose(mis->to_src_file);
 }
diff --git a/migration/ram.c b/migration/ram.c
index 627ffea..027c5bc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -224,6 +224,7 @@ static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
 static QemuMutex migration_bitmap_mutex;
 static uint64_t migration_dirty_pages;
+static bool ram_cache_enable;
 static uint32_t last_version;
 static bool ram_bulk_stage;
 
@@ -2191,6 +2192,20 @@ static inline void *host_from_ram_block_offset(RAMBlock 
*block,
 return block->host + offset;
 }
 
+static inline void *colo_cache_from_block_offset(RAMBlock *block,
+ ram_addr_t offset)
+{
+if (!offset_in_ramblock(block, offset)) {
+return NULL;
+}
+if (!block->colo_cache) {
+error_report("%s: colo_cache is NULL in block :%s",
+ __func__, block->idstr);
+return NULL;
+}
+return block->colo_cache + offset;
+}
+
 /*
  * If a page (or a whole RDMA chunk) has been
  * determined to be zero, then zap it.
@@ -2467,7 +2482,12 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
  RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
 RAMBlock *block = ram_block_from_stream(f, flags);
 
-host = host_from_ram_block_offset(block, addr);
+/* After going into COLO, we should load the Page into colo_cache */
+if (ram_cache_enable) {
+host = colo_cache_from_block_offset(block, addr);
+} else {
+host = host_from_ram_block_offset(block, addr);
+}
 if (!host) {
 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
 ret = -EINVAL;
@@ -2562,6 +2582,57 @@ static int ram_load(QEMUFile *f, 

[Qemu-devel] [PATCH COLO-Frame v15 03/38] COLO: migrate colo related info to secondary node

2016-02-21 Thread zhanghailiang
The VM on the destination can tell whether it should go into COLO mode by
referring to the info migrated from the PVM.

We skip this section if COLO is not enabled (i.e.
migrate_set_capability colo off), so that it does not break migration
compatibility, regardless of whether the source and destination were
configured with --enable-colo or --disable-colo.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
---
v14:
- Adjust the place of calling colo_info_init()
v11:
- Add Reviewed-by tag
v10:
- Use VMSTATE_BOOL instead of VMSTATE_UINT32 for 'colo_requested' (Dave's suggestion)
---
 include/migration/colo.h |  2 ++
 migration/Makefile.objs  |  1 +
 migration/colo-comm.c| 50 
 vl.c |  4 +++-
 4 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 migration/colo-comm.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 59a632a..1c899a0 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -14,7 +14,9 @@
 #define QEMU_COLO_H
 
 #include "qemu-common.h"
+#include "migration/migration.h"
 
 bool colo_supported(void);
+void colo_info_init(void);
 
 #endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 65ecc35..81b5713 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,5 +1,6 @@
 common-obj-y += migration.o tcp.o
 common-obj-$(CONFIG_COLO) += colo.o
+common-obj-y += colo-comm.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o postcopy-ram.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
new file mode 100644
index 0000000..723d86d
--- /dev/null
+++ b/migration/colo-comm.c
@@ -0,0 +1,50 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ *
+ */
+
+#include 
+#include "trace.h"
+
+typedef struct {
+ bool colo_requested;
+} COLOInfo;
+
+static COLOInfo colo_info;
+
+static void colo_info_pre_save(void *opaque)
+{
+COLOInfo *s = opaque;
+
+s->colo_requested = migrate_colo_enabled();
+}
+
+static bool colo_info_need(void *opaque)
+{
+   return migrate_colo_enabled();
+}
+
+static const VMStateDescription colo_state = {
+ .name = "COLOState",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .pre_save = colo_info_pre_save,
+ .needed = colo_info_need,
+ .fields = (VMStateField[]) {
+ VMSTATE_BOOL(colo_requested, COLOInfo),
+ VMSTATE_END_OF_LIST()
+},
+};
+
+void colo_info_init(void)
+{
+vmstate_register(NULL, 0, &colo_state, &colo_info);
+}
diff --git a/vl.c b/vl.c
index b87e292..f35703f 100644
--- a/vl.c
+++ b/vl.c
@@ -85,6 +85,7 @@ int main(int argc, char **argv)
 #include "sysemu/dma.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/colo.h"
 #include "sysemu/kvm.h"
 #include "qapi/qmp/qjson.h"
 #include "qemu/option.h"
@@ -4394,6 +4395,8 @@ int main(int argc, char **argv, char **envp)
 /* clean up network at qemu process termination */
 atexit(&net_cleanup);
 
+colo_info_init();
+
 if (net_init_clients() < 0) {
 exit(1);
 }
@@ -4425,7 +4428,6 @@ int main(int argc, char **argv, char **envp)
 
 blk_mig_init();
 ram_mig_init();
-
 /* If the currently selected machine wishes to override the units-per-bus
  * property of its default HBA interface type, do so now. */
 if (machine_class->units_per_default_bus) {
-- 
1.8.3.1





[Qemu-devel] [PATCH COLO-Frame v15 02/38] migration: Introduce capability 'x-colo' to migration

2016-02-21 Thread zhanghailiang
Add a helper function colo_supported() to indicate whether COLO is
supported. We use it to control whether the 'x-colo' string is shown
to users, who can use the QMP command 'query-migrate-capabilities' or
the HMP command 'info migrate_capabilities' to learn whether COLO is
supported.

Cc: Juan Quintela 
Cc: Amit Shah 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
Reviewed-by: Eric Blake 
---
v14:
- Fix the date of Copyright to 2016
v10:
- Rename capability 'colo' to experimental 'x-colo' (Eric's suggestion).
- Rename migrate_enable_colo() to migrate_colo_enabled() (Eric's suggestion).
---
 include/migration/colo.h  | 20 
 include/migration/migration.h |  1 +
 migration/Makefile.objs   |  1 +
 migration/colo.c  | 18 ++
 migration/migration.c | 18 ++
 qapi-schema.json  |  6 +-
 qmp-commands.hx   |  1 +
 stubs/Makefile.objs   |  1 +
 stubs/migration-colo.c| 18 ++
 9 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
new file mode 100644
index 0000000..59a632a
--- /dev/null
+++ b/include/migration/colo.h
@@ -0,0 +1,20 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_COLO_H
+#define QEMU_COLO_H
+
+#include "qemu-common.h"
+
+bool colo_supported(void);
+
+#endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 74684ad..c962ad4 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -271,6 +271,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t 
*dst, int dlen);
 
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
+bool migrate_colo_enabled(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 0cac6d7..65ecc35 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,5 @@
 common-obj-y += migration.o tcp.o
+common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o postcopy-ram.o
diff --git a/migration/colo.c b/migration/colo.c
new file mode 100644
index 0000000..cb3e22d
--- /dev/null
+++ b/migration/colo.c
@@ -0,0 +1,18 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/colo.h"
+
+bool colo_supported(void)
+{
+return true;
+}
diff --git a/migration/migration.c b/migration/migration.c
index a64cfcd..68b5019 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -34,6 +34,7 @@
 #include "qom/cpu.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
+#include "migration/colo.h"
 
 #define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed throttling */
 
@@ -485,6 +486,9 @@ MigrationCapabilityStatusList 
*qmp_query_migrate_capabilities(Error **errp)
 
 caps = NULL; /* silence compiler warning */
 for (i = 0; i < MIGRATION_CAPABILITY__MAX; i++) {
+if (i == MIGRATION_CAPABILITY_X_COLO && !colo_supported()) {
+continue;
+}
 if (head == NULL) {
 head = g_malloc0(sizeof(*caps));
 caps = head;
@@ -684,6 +688,14 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 }
 
 for (cap = params; cap; cap = cap->next) {
+if (cap->value->capability == MIGRATION_CAPABILITY_X_COLO) {
+if (!colo_supported()) {
+error_setg(errp, "COLO is not currently supported, please"
+ " configure with --enable-colo option in order to"
+ " support COLO feature");
+continue;
+}
+}
 s->enabled_capabilities[cap->value->capability] = cap->value->state;
 }
 
@@ 

Re: [Qemu-devel] [PATCHv7 4/9] slirp: Factorizing tcpiphdr structure with an union

2016-02-21 Thread Samuel Thibault
Hello,

Thomas Huth, on Fri 19 Feb 2016 14:44:59 +0100, wrote:
> > +   m->m_data -= sizeof(struct tcpiphdr) - (sizeof(struct ip)
> > ++ sizeof(struct tcphdr));
> > +   m->m_len += sizeof(struct tcpiphdr) - (sizeof(struct ip)
> > +   + sizeof(struct tcphdr));
> 
> I'm somewhat having a hard time to understand the  "+ sizeof(struct
> tcphdr))" here.
> 
> In the tcp_output.c code, there is this:
> 
>   m->m_data += sizeof(struct tcpiphdr) - sizeof(struct tcphdr)
>- sizeof(struct ip);
> 
> So with my limited point of view, I'd rather expect this here in
> tcp_input.c:
> 
>   m->m_data -= sizeof(struct tcpiphdr) - (sizeof(struct ip)
>- sizeof(struct tcphdr));
> i.e. "-" instead of "+" here ^

The parentheses and indentation were actually misleading; here is how it
should actually look:

> > +   m->m_data -= sizeof(struct tcpiphdr) - ( sizeof(struct ip)
> > ++ sizeof(struct tcphdr));

I've now dropped the parentheses, so it looks like the tcp_output.c code:

m->m_data -= sizeof(struct tcpiphdr) - sizeof(struct ip)
 - sizeof(struct tcphdr);
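
For readers following the arithmetic, a minimal sketch with made-up header
sizes may help. The values 48, 20 and 20 are assumptions chosen for
illustration, not the real sizeof() results, and the function names are
hypothetical. It suggests why the parenthesised form and the flat form give
the same adjustment, while reading the inner "+" as "-" would not:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header sizes in bytes; the real values come from sizeof()
 * on the slirp structures and are not taken from the patch. */
enum { SKETCH_TCPIPHDR = 48, SKETCH_IP = 20, SKETCH_TCP = 20 };

/* Form used in the original tcp_input.c hunk: a - (b + c). */
static size_t delta_parenthesised(size_t a, size_t b, size_t c)
{
    assert(a >= b + c);
    return a - (b + c);
}

/* Form used in tcp_output.c and in the updated patch: a - b - c. */
static size_t delta_flat(size_t a, size_t b, size_t c)
{
    assert(a >= b + c);
    return a - b - c;
}

/* The misreading discussed in the thread: a - (b - c). */
static size_t delta_misread(size_t a, size_t b, size_t c)
{
    assert(b >= c && a >= b - c);
    return a - (b - c);
}
```

With these sizes the first two functions both return 8, while the misread
form returns 48, so dropping the parentheses changes the layout of the
expression but not its value.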

Samuel



Re: [Qemu-devel] [RFC PATCH 0/2] ARM: add QMP command to query GIC version

2016-02-21 Thread Peter Xu
On Fri, Feb 19, 2016 at 01:33:09PM +0100, Andrea Bolognani wrote:
> I didn't say it would be hard :)
> 
> I just said that such compatibility code would have to be kept
> around forever. We already support lots and lots of similar cases
> in libvirt, the difference being that in this case we would add
> support for a new command *knowing in advance* that it will become
> obsolete as soon as a proper implementation is available.
> 
> It might still be the right thing to do! I just want to make sure
> everything's been properly considered and discussed beforehand.

I totally agree that we should think more before acting. :)

Then I will try to move on. I appreciate all the review comments!

Peter



Re: [Qemu-devel] [PATCH v3 0/2] Fix migration of old pseries

2016-02-21 Thread David Gibson
On Fri, Feb 19, 2016 at 08:59:44AM +0100, Greg Kurz wrote:
> On Fri, 19 Feb 2016 11:11:47 +1100
> David Gibson  wrote:
> 
> > On Thu, Feb 18, 2016 at 12:32:11PM +0100, Greg Kurz wrote:
> > > QEMU 2.4 broke the migration of old pseries machine with the addition
> > > of configuration sections, which are sent unconditionally.
> > > 
> > > We assume that QEMU 2.3 is more deployed than any newer release (based on
> > > the versions currently shipped by most distros). This v3 series hence
> > > reverses the logic from v2: it now fully fixes migration of old pseries
> > > from/to QEMU 2.3 and provides a manual workaround for the QEMU
> > > 2.4/2.4.1/2.5 case.
> > > 
> > > With this series, I could migrate the same pseries-2.3 instance in a full
> > > 2.3->2.6->2.5->2.6->2.4->2.6->2.3 cycle.  
> > 
> > Sorry, I've lost track slightly here.  Does this series apply on top
> > of, or instead of your earlier series that peeks for the config
> > section?
> > 
> 
> This v3 series applies instead of the v2 that peeks for the config section.

Ok, thanks for the clarification.

> It was suggested by Laurent during review, and motivated by your decision
> to favor fixing 2.3 over 2.4.
> 
> As shown in Laurent's detailed test report, migration from/to 2.3.x now works
> out of the box and 2.4.x/2.5 requires qom-set.
> 
> I was also feeling a bit uncomfortable with all these machine properties to
> disable the configuration section, which was explicitly coded to be non-optional
> according to the changelog of commit 61964c23. The logic inversion in v3 seems
> to be friendlier with the configuration section design.
> 
> Juan, could you share your thoughts ?

With an ack from Juan I'll be happy to merge this to ppc-for-2.6.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v3 5/7] STM32F205: Connect the ADC devices

2016-02-21 Thread Peter Maydell
On 21 February 2016 at 23:35, Alistair Francis  wrote:
> On Tue, Feb 2, 2016 at 7:27 AM, Peter Maydell wrote:
>> On 19 January 2016 at 07:23, Alistair Francis  wrote:
>> You can't just wire multiple irq lines up like this; I think if
>> you do then if devices A and B both assert the IRQ and then A
>> deasserts it, then the receiving device will see an IRQ deassert
>> when it should not (since B still holds it high).
>
> I can't figure out if that is how HW actually does it. I can't find
> too much in the data sheet on how these interrupts behave.
>
> In saying that, I am fine with what you described being the behaviour.
> I don't know any better way to connect the 3 devices to one interrupt
> line. Do you have any suggestions?

You're right that the data sheet is unclear, but I think the
only vaguely plausible setup is that the three lines are ORed
together. That way if any ADC asserts the line then the guest
presumably looks at all of them to find which one has asserted
it, and then writes to the register to acknowledge the interrupt.
So if two ADCs assert at the same time, the guest will still
(correctly) see an interrupt until it acks the second ADC.

Unfortunately we don't have a qemu_irq OR gate at the moment
I think, but it's a pretty simple thing to write.
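
A rough sketch of what such an OR gate might look like, written as a plain C
model rather than real QEMU code. SketchOrGate and sketch_or_set_input() are
hypothetical names, the input count of 3 is taken from the three ADCs
discussed here, and no qemu_irq API is used:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_OR_INPUTS 3  /* one input per ADC in this discussion */

/* Hypothetical OR gate: remembers every input level, drives one output. */
typedef struct {
    bool level[SKETCH_OR_INPUTS];
    bool out;
} SketchOrGate;

/* Record the new level of input n and recompute the output as the OR of
 * all inputs. Returns the resulting output level. */
static bool sketch_or_set_input(SketchOrGate *g, int n, bool level)
{
    bool out = false;

    assert(n >= 0 && n < SKETCH_OR_INPUTS);
    g->level[n] = level;
    for (int i = 0; i < SKETCH_OR_INPUTS; i++) {
        out = out || g->level[i];
    }
    g->out = out;
    return out;
}
```

Because the gate keeps the level of each input, device A deasserting its line
does not lower the output while device B is still asserting, which is the
case that wiring the lines directly together gets wrong.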

thanks
-- PMM



[Qemu-devel] Integrating a Memory Simulator

2016-02-21 Thread Hao Bai
Hi All,

I was trying to integrate the DRAMSim2 memory simulator [1] into QEMU.
Basically I wanted to modify the current memory interface of QEMU so that
all memory accesses will be directed to DRAMSim2. Can anyone give me
hints/comments/thoughts on how to do this? I am targeting x86-64
architecture in user mode.

[1] Repository: https://github.com/dramninjasUMD/DRAMSim2
Paper: https://www.ece.umd.edu/~blj/papers/cal10-1.pdf

Cheers


Re: [Qemu-devel] [PATCH v3 5/7] STM32F205: Connect the ADC devices

2016-02-21 Thread Alistair Francis
On Tue, Feb 2, 2016 at 7:27 AM, Peter Maydell  wrote:
> On 19 January 2016 at 07:23, Alistair Francis  wrote:
>> Connect the ADC devices to the STM32F205 SoC.
>>
>> Signed-off-by: Alistair Francis 
>> ---
>> V2:
>>  - Fix up the device/devices commit message
>>
>>  hw/arm/stm32f205_soc.c | 22 ++
>>  include/hw/arm/stm32f205_soc.h |  3 +++
>>  2 files changed, 25 insertions(+)
>>
>> diff --git a/hw/arm/stm32f205_soc.c b/hw/arm/stm32f205_soc.c
>> index a2bd970..28d4301 100644
>> --- a/hw/arm/stm32f205_soc.c
>> +++ b/hw/arm/stm32f205_soc.c
>> @@ -32,9 +32,12 @@ static const uint32_t timer_addr[STM_NUM_TIMERS] = { 0x40000000, 0x40000400,
>>  0x40000800, 0x40000C00 };
>>  static const uint32_t usart_addr[STM_NUM_USARTS] = { 0x40011000, 0x40004400,
>>  0x40004800, 0x40004C00, 0x40005000, 0x40011400 };
>> +static const uint32_t adc_addr[STM_NUM_ADCS] = { 0x40012000, 0x40012100,
>> +0x40012200 };
>>
>>  static const int timer_irq[STM_NUM_TIMERS] = {28, 29, 30, 50};
>>  static const int usart_irq[STM_NUM_USARTS] = {37, 38, 39, 52, 53, 71};
>> +#define ADC_IRQ 18
>
> Really three devices but only one IRQ ?

Yep, that's how HW does it. At least according to the reference manual.

>
>> +/* ADC 1 to 3 */
>> +for (i = 0; i < STM_NUM_ADCS; i++) {
>> +dev = DEVICE(&(s->adc[i]));
>> +object_property_set_bool(OBJECT(&s->adc[i]), true, "realized", &err);
>> +if (err != NULL) {
>> +error_propagate(errp, err);
>> +return;
>> +}
>> +busdev = SYS_BUS_DEVICE(dev);
>> +sysbus_mmio_map(busdev, 0, adc_addr[i]);
>> +sysbus_connect_irq(busdev, 0, qdev_get_gpio_in(nvic, ADC_IRQ));
>
> You can't just wire multiple irq lines up like this; I think if
> you do then if devices A and B both assert the IRQ and then A
> deasserts it, then the receiving device will see an IRQ deassert
> when it should not (since B still holds it high).

I can't figure out if that is how HW actually does it. I can't find
too much in the data sheet on how these interrupts behave.

In saying that, I am fine with what you described being the behaviour.
I don't know any better way to connect the 3 devices to one interrupt
line. Do you have any suggestions?

Thanks,

Alistair

>
> thanks
> -- PMM



Re: [Qemu-devel] [PATCH v3 4/7] STM32F2xx: Add the SPI device

2016-02-21 Thread Alistair Francis
On Tue, Feb 2, 2016 at 7:30 AM, Peter Maydell  wrote:
> On 19 January 2016 at 07:23, Alistair Francis  wrote:
>> Add the STM32F2xx SPI device.
>>
>> Signed-off-by: Alistair Francis 
>> ---
>> V2:
>>  - Address Peter C's comments
>>
>>  default-configs/arm-softmmu.mak |   1 +
>>  hw/ssi/Makefile.objs|   1 +
>>  hw/ssi/stm32f2xx_spi.c  | 205 
>> 
>>  include/hw/ssi/stm32f2xx_spi.h  |  72 ++
>>  4 files changed, 279 insertions(+)
>>  create mode 100644 hw/ssi/stm32f2xx_spi.c
>>  create mode 100644 include/hw/ssi/stm32f2xx_spi.h
>
>> +static uint64_t stm32f2xx_spi_read(void *opaque, hwaddr addr,
>> + unsigned int size)
>> +{
>> +STM32F2XXSPIState *s = opaque;
>> +uint32_t retval;
>> +
>> +DB_PRINT("Address: 0x%"HWADDR_PRIx"\n", addr);
>> +
>> +switch (addr) {
>> +case STM_SPI_CR1:
>> +return s->spi_cr1;
>> +case STM_SPI_CR2:
>> +qemu_log_mask(LOG_UNIMP, "%s: Interrupts and DMA are not 
>> implemented\n",
>> +  __func__);
>> +return s->spi_cr2;
>> +case STM_SPI_SR:
>> +retval = s->spi_sr;
>> +return retval;
>> +case STM_SPI_DR:
>> +stm32f2xx_spi_transfer(s);
>> +s->spi_sr &= ~STM_SPI_SR_RXNE;
>> +return s->spi_dr;
>> +case STM_SPI_CRCPR:
>> +qemu_log_mask(LOG_UNIMP, "%s: CRC is not implemented, the registers 
>> " \
>> +  "are included for compatability\n", __func__);
>
> "compatibility" again, here and below.

Fixed

>
>
>> +return s->spi_crcpr;
>> +case STM_SPI_RXCRCR:
>> +qemu_log_mask(LOG_UNIMP, "%s: CRC is not implemented, the registers 
>> " \
>> +  "are included for compatability\n", __func__);
>> +return s->spi_rxcrcr;
>> +case STM_SPI_TXCRCR:
>> +qemu_log_mask(LOG_UNIMP, "%s: CRC is not implemented, the registers 
>> " \
>> +  "are included for compatability\n", __func__);
>> +return s->spi_txcrcr;
>> +case STM_SPI_I2SCFGR:
>> +qemu_log_mask(LOG_UNIMP, "%s: I2S is not implemented, the registers 
>> " \
>> +  "are included for compatability\n", __func__);
>> +return s->spi_i2scfgr;
>> +case STM_SPI_I2SPR:
>> +qemu_log_mask(LOG_UNIMP, "%s: I2S is not implemented, the registers 
>> " \
>> +  "are included for compatability\n", __func__);
>> +return s->spi_i2spr;
>> +default:
>> +qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%"HWADDR_PRIx"\n",
>
> Spaces, please.

Fixed

>
>> +static void stm32f2xx_spi_class_init(ObjectClass *klass, void *data)
>> +{
>> +DeviceClass *dc = DEVICE_CLASS(klass);
>> +
>> +dc->reset = stm32f2xx_spi_reset;
>> +}
>> +
>> +static const TypeInfo stm32f2xx_spi_info = {
>> +.name  = TYPE_STM32F2XX_SPI,
>> +.parent= TYPE_SYS_BUS_DEVICE,
>> +.instance_size = sizeof(STM32F2XXSPIState),
>> +.instance_init = stm32f2xx_spi_init,
>> +.class_init= stm32f2xx_spi_class_init,
>> +};
>
> Can we have a VMState for migration, please?

Yep, I added one.

>
>> +
>> +static void stm32f2xx_spi_register_types(void)
>> +{
>> +type_register_static(&stm32f2xx_spi_info);
>> +}
>
>> +typedef struct {
>> +/* <private> */
>> +SysBusDevice parent_obj;
>> +
>> +/* <public> */
>> +MemoryRegion mmio;
>> +
>> +uint32_t spi_cr1;
>> +uint32_t spi_cr2;
>> +uint32_t spi_sr;
>> +uint32_t spi_dr;
>> +uint32_t spi_crcpr;
>> +uint32_t spi_rxcrcr;
>> +uint32_t spi_txcrcr;
>> +uint32_t spi_i2scfgr;
>> +uint32_t spi_i2spr;
>> +
>> +qemu_irq irq;
>> +SSIBus *ssi;
>> +} STM32F2XXSPIState;
>
> Personally I like to order the struct fields of a device to put all
> the not-migration data first (so here, mmio, irq, ssi), and then
> the fields that correspond to real device state after that.
> I don't feel very strongly about that though and of course a
> lot of our devices don't do it.

So far I think all the STM devices are like this, so I'm going to
keep it as is. I don't really mind which way it is either, but I think
they should all be consistent and at the moment this is the consistent
way :)

Thanks,

Alistair

>
> thanks
> -- PMM



Re: [Qemu-devel] [PATCH v3 3/7] STM32F2xx: Add the ADC device

2016-02-21 Thread Alistair Francis
On Tue, Feb 2, 2016 at 7:17 AM, Peter Maydell  wrote:
> On 19 January 2016 at 07:23, Alistair Francis  wrote:
>> Add the STM32F2xx ADC device. This device randomly
>> generates values on each read.
>>
>> This also includes creating a hw/adc directory.
>>
>> Signed-off-by: Alistair Francis 
>
>> +static uint32_t stm32f2xx_adc_generate_value(STM32F2XXADCState *s)
>> +{
>> +/* Attempts to fake some ADC values */
>> +#ifdef RAND_AVALIABLE
>> +s->adc_dr = s->adc_dr + rand();
>> +#else
>> +s->adc_dr = s->adc_dr + 7;
>> +#endif
>
> We shouldn't be using rand() in devices I think. (Among other things
> it means we won't be deterministic, which will break record-replay.)
>
> In any case you've typoed your #ifdef constant name, which means
> that code is never used :-)

Woops, I didn't realise that. I'll take the rand() function out then.
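
A small sketch of the effect Peter describes, using hypothetical names and
values. The header defines the correctly spelled macro, but the #ifdef tests
the misspelled one, so the preprocessor always selects the fallback branch:

```c
#include <assert.h>
#include <stdint.h>

/* The header side defines the correctly spelled macro. */
#define RAND_AVAILABLE

/* Hypothetical helper mirroring the shape of the code under review. */
static uint32_t sketch_next_value(uint32_t current)
{
#ifdef RAND_AVALIABLE           /* misspelled, as in the patch under review */
    return current + 1000;      /* stand-in for the rand() branch; never compiled */
#else
    return current + 7;         /* this branch is always the one compiled */
#endif
}
```

The fixed increment is also deterministic, which matches the concern about
record-replay raised in the review.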

I have made all of the other changes as well.

Thanks,

Alistair


>
>> +static uint64_t stm32f2xx_adc_read(void *opaque, hwaddr addr,
>> + unsigned int size)
>> +{
>> +STM32F2XXADCState *s = opaque;
>> +
>> +DB_PRINT("Address: 0x%"HWADDR_PRIx"\n", addr);
>
> Spaces around the HWADDR_PRIx would be nice.
>
>> +
>> +if (addr >= ADC_COMMON_ADDRESS) {
>> +qemu_log_mask(LOG_UNIMP,
>> +  "%s: ADC Common Register Unsupported\n", __func__);
>> +}
>> +
>> +switch (addr) {
>> +case ADC_SR:
>> +return s->adc_sr;
>> +case ADC_CR1:
>> +return s->adc_cr1;
>> +case ADC_CR2:
>> +return s->adc_cr2 & 0xFFF;
>> +case ADC_SMPR1:
>> +return s->adc_smpr1;
>> +case ADC_SMPR2:
>> +return s->adc_smpr2;
>> +case ADC_JOFR1:
>> +case ADC_JOFR2:
>> +case ADC_JOFR3:
>> +case ADC_JOFR4:
>> +qemu_log_mask(LOG_UNIMP, "%s: " \
>> +  "Injection ADC is not implemented, the registers are 
>> " \
>> +  "included for compatability\n", __func__);
>
> "compatibility"
>
>> +return s->adc_jofr[(addr - ADC_JOFR1) / 4];
>> +case ADC_HTR:
>> +return s->adc_htr;
>> +case ADC_LTR:
>> +return s->adc_ltr;
>> +case ADC_SQR1:
>> +return s->adc_sqr1;
>> +case ADC_SQR2:
>> +return s->adc_sqr2;
>> +case ADC_SQR3:
>> +return s->adc_sqr3;
>> +case ADC_JSQR:
>> +qemu_log_mask(LOG_UNIMP, "%s: " \
>> +  "Injection ADC is not implemented, the registers are 
>> " \
>> +  "included for compatability\n", __func__);
>> +return s->adc_jsqr;
>> +case ADC_JDR1:
>> +case ADC_JDR2:
>> +case ADC_JDR3:
>> +case ADC_JDR4:
>> +qemu_log_mask(LOG_UNIMP, "%s: " \
>> +  "Injection ADC is not implemented, the registers are 
>> " \
>> +  "included for compatability\n", __func__);
>> +return s->adc_jdr[(addr - ADC_JDR1) / 4] -
>> +   s->adc_jofr[(addr - ADC_JDR1) / 4];
>> +case ADC_DR:
>> +if ((s->adc_cr2 & ADC_CR2_ADON) && (s->adc_cr2 & ADC_CR2_SWSTART)) {
>> +s->adc_cr2 ^= ADC_CR2_SWSTART;
>> +return stm32f2xx_adc_generate_value(s);
>> +} else {
>> +return 0x00000000;
>
> Just "0" seems more readable to me.
>
>> +#ifdef RAND_MAX
>> +/* The rand() function is avaliable */
>> +#define RAND_AVAILABLE
>> +#undef RAND_MAX
>> +#define RAND_MAX 0xFF
>> +#endif /* RAND_MAX */
>
> What platforms don't have rand()?
> If we need an "exists everywhere" random number function
> then there is one in glib.
>
> (but as noted earlier I don't think we should be using rand() here)
>
>> +
>> +typedef struct {
>> +/* <private> */
>> +SysBusDevice parent_obj;
>> +
>> +/* <public> */
>> +MemoryRegion mmio;
>> +
>> +uint32_t adc_sr;
>> +uint32_t adc_cr1;
>> +uint32_t adc_cr2;
>> +uint32_t adc_smpr1;
>> +uint32_t adc_smpr2;
>> +uint32_t adc_jofr[4];
>> +uint32_t adc_htr;
>> +uint32_t adc_ltr;
>> +uint32_t adc_sqr1;
>> +uint32_t adc_sqr2;
>> +uint32_t adc_sqr3;
>> +uint32_t adc_jsqr;
>> +uint32_t adc_jdr[4];
>> +uint32_t adc_dr;
>> +
>> +qemu_irq irq;
>> +} STM32F2XXADCState;
>> +
>> +#endif /* HW_STM32F2XX_ADC_H */
>
> You need to implement the VMState structure for migration.
>
> thanks
> -- PMM



Re: [Qemu-devel] [PATCH] gtk: implement set_echo

2016-02-21 Thread Jan Kiszka
On 2016-02-17 15:05, Paolo Bonzini wrote:
> 
> 
> On 17/02/2016 14:53, Kevin Wolf wrote:
>> Waiting didn't fix the bug, so I tried a git bisect now and it pointed
>> me to this commit.
>>
>> I'm using HMP with the default vc backend. Starting with this commit,
>> the echo is broken sometimes, in a way that the first character in the
>> entered command is duplicated for each new character I enter, like this:
>>
>> (qemu) ii
>> (qemu) iiin
>> (qemu) nf
>> (qemu) info
>>
>> This doesn't happen always, but often enough to be annoying.
> 
> The patch for it is already on the way.

Is it still on the road, or did it already arrive on the list at least?
I just bisected to this bug as well and would be happy to see this fixed.

Thanks,
Jan





[Qemu-devel] [Bug 1423124] Re: QEMU crash after sending data on host serial port

2016-02-21 Thread Sugar
Hi all,

I am hitting this issue too.
I have two computers, one running Win7 32-bit and the other Win7 64-bit, and
both hit this issue.
My QEMU version is qemu-w32-setup-20160215.

I want to use EDK2 OVMF with the Intel UDK Debugger tools for source-level debugging.
I have installed the com0com virtual COM port driver and connected COM3 to COM4.

The Intel UDK Debugger tools use COM3.
QEMU running OVMF uses COM4.

I first start the Intel UDK Debugger tools, then launch QEMU:
C:\Program Files\qemu\qemu-system-x86_64.exe -bios "C:\EDK2\Build\OvmfX64\DEBUG_VS2010\FV\OVMF.fd" -serial COM4
QEMU then crashes on startup.

I did an experiment:
I ran the terminal tool Tera Term on COM3 and launched QEMU on COM4:
C:\Program Files\qemu\qemu-system-x86_64.exe -bios "C:\EDK2\Build\OvmfX64\DEBUG_VS2010\FV\OVMF.fd" -serial COM4
This works, and I can see the OVMF trace log in the terminal.
But if I press the "Down" key in the terminal, QEMU crashes.
This is caused by the terminal sending data (the "Down" key) to QEMU.

Can somebody share some information about this?

Thanks a lot.
Sugar
Sugar

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1423124

Title:
  QEMU crash after sending data on host serial port

Status in QEMU:
  New

Bug description:
  Good morning,

  I'm using the latest version of QEMU for Windows.
  The host system is Windows 7 64-bit.
  I'm executing the following command:

  qemu-system-x86_64w.exe -hda debian.img -m 256 -net nic -net
  tap,ifname=TAP32 -soundhw all -serial COM9

  QEMU starts the emulated Debian and it runs correctly.

  If I try to send data from Windows to QEMU over COM9 (either a "real" port
  or one emulated by the COM0COM driver), QEMU crashes. A Windows dump is
  available if required.
  If I try to send data to /dev/ttyS0 (which should be the Linux side of
  COM9) from Debian, the virtual machine crashes as well.

  I can provide more details if necessary.
  Best regards
  U.Poddine

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1423124/+subscriptions



Re: [Qemu-devel] [V6 0/4] AMD IOMMU

2016-02-21 Thread Jan Kiszka
On 2016-02-21 19:10, David Kiarie wrote:
> Hello there,
> 
> Repost, AMD IOMMU patches version 6.
> 
> Changes since version 5
>  -Fixed macro formatting issues
>  -changed occurrences of IO MMU to IOMMU for consistency
>  -Fixed capability registers duplication
>  -Rebased to current master

I suspect this still has some subtle bugs: I'm running the patches over
master with standard Linux distro as guest, full desktop, and I'm
getting sporadic segfaults of arbitrary programs. These disappear once I
disable the IOMMU or switch to the Intel version.

How did you test so far?

Jan






Re: [Qemu-devel] [V6 2/4] hw/core: Add AMD IOMMU to machine properties

2016-02-21 Thread Jan Kiszka
On 2016-02-21 19:10, David Kiarie wrote:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 2f0465e..dad160f 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -38,7 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>  "kvm_shadow_mem=size of KVM shadow MMU\n"
>  "dump-guest-core=on|off include guest memory in a core 
> dump (default=on)\n"
>  "mem-merge=on|off controls memory merge support 
> (default: on)\n"
> -"iommu=on|off controls emulated Intel IOMMU (VT-d) 
> support (default=off)\n"
> +"iommu=amd|intel enables and selects the emulated IOMMU 
> (default: off)\n"

We should also support "iommu=off" or "none" to explicitly disable it.
That is consistent with other switches, and there may one day be a
machine type (chipset) with the IOMMU enabled by default.
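
One way to accept an explicit "off" or "none" next to the two vendor names is to map the option string onto a small enum before it reaches the machine state. The sketch below is plain C and only illustrative: `IommuKind` and `iommu_kind_from_string` are hypothetical names that do not appear in the posted patch, and treating an absent option as "none" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; the posted patch keeps the raw string in MachineState. */
typedef enum {
    IOMMU_KIND_NONE,
    IOMMU_KIND_INTEL,
    IOMMU_KIND_AMD,
    IOMMU_KIND_INVALID
} IommuKind;

/*
 * Map the value of "iommu=" to a kind.
 * Assumption: NULL (option not given), "off" and "none" all disable the IOMMU.
 */
static IommuKind iommu_kind_from_string(const char *value)
{
    if (value == NULL ||
        strcmp(value, "off") == 0 ||
        strcmp(value, "none") == 0) {
        return IOMMU_KIND_NONE;
    }
    if (strcmp(value, "intel") == 0) {
        return IOMMU_KIND_INTEL;
    }
    if (strcmp(value, "amd") == 0) {
        return IOMMU_KIND_AMD;
    }
    /* Anything else is rejected so the caller can report an error. */
    return IOMMU_KIND_INVALID;
}
```

With a mapping like this, a future machine type that enables an IOMMU by default could still be switched off from the command line.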

Jan





[Qemu-devel] [PATCH 3/4] hw/i386: ACPI table for AMD IOMMU

2016-02-21 Thread David Kiarie
Add IVRS table for AMD IOMMU. Generate IVRS or DMAR
depending on emulated IOMMU

Signed-off-by: David Kiarie 
---
 hw/i386/acpi-build.c| 98 -
 include/hw/acpi/acpi-defs.h | 55 +
 2 files changed, 142 insertions(+), 11 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4554eb8..76ef75f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -51,6 +51,7 @@
 #include "hw/pci/pci_bus.h"
 #include "hw/pci-host/q35.h"
 #include "hw/i386/intel_iommu.h"
+#include "hw/i386/amd_iommu.h"
 #include "hw/timer/hpet.h"
 
 #include "hw/acpi/aml-build.h"
@@ -121,6 +122,12 @@ typedef struct AcpiBuildPciBusHotplugState {
 bool pcihp_bridge_en;
 } AcpiBuildPciBusHotplugState;
 
+typedef enum iommu_type {
+TYPE_AMD,
+TYPE_INTEL,
+TYPE_NONE
+} iommu_type;
+
 static
 int acpi_add_cpu_info(Object *o, void *opaque)
 {
@@ -2513,6 +2520,78 @@ build_dmar_q35(GArray *table_data, GArray *linker)
  "DMAR", table_data->len - dmar_start, 1, NULL, NULL);
 }
 
+static void
+build_amd_iommu(GArray *table_data, GArray *linker)
+{
+int iommu_start = table_data->len;
+bool iommu_ambig;
+
+AcpiAMDIOMMUIVRS *ivrs;
+AcpiAMDIOMMUHardwareUnit *iommu;
+
+/* IVRS definition */
+ivrs = acpi_data_push(table_data, sizeof(*ivrs));
+ivrs->revision = cpu_to_le16(ACPI_IOMMU_IVRS_TYPE);
+ivrs->length = cpu_to_le16((sizeof(*ivrs) + sizeof(*iommu)));
+ivrs->v_common_info = cpu_to_le64(AMD_IOMMU_HOST_ADDRESS_WIDTH << 8);
+
+AMDIOMMUState *s = (AMDIOMMUState *)object_resolve_path_type("",
+TYPE_AMD_IOMMU_DEVICE, &iommu_ambig);
+
+/* IVDB definition - type 10h */
+iommu = acpi_data_push(table_data, sizeof(*iommu));
+if (!iommu_ambig) {
+iommu->type = cpu_to_le16(0x10);
+/* IVHD flags */
+iommu->flags = cpu_to_le16(iommu->flags);
+iommu->flags = cpu_to_le16(IVHD_HT_TUNEN | IVHD_PPRSUP | IVHD_IOTLBSUP
+   | IVHD_PREFSUP);
+iommu->length = cpu_to_le16(sizeof(*iommu));
+iommu->device_id = cpu_to_le16(PCI_DEVICE_ID_RD890_IOMMU);
+iommu->capability_offset = cpu_to_le16(s->capab_offset);
+iommu->mmio_base = cpu_to_le64(s->mmio.addr);
+iommu->pci_segment = 0;
+iommu->interrupt_info = 0;
+/* EFR features */
+iommu->efr_register = cpu_to_le64(IVHD_EFR_GTSUP | IVHD_EFR_HATS
+  | IVHD_EFR_GATS);
+iommu->efr_register = cpu_to_le64(iommu->efr_register);
+/* device entries */
+memset(iommu->dev_entries, 0, 20);
+/* Add device flags here
+ *  These are 4-byte device entries currently reporting the range of
+ *  devices 00h - ffffh; all devices
+ *
+ *  Device setting affecting all devices should be made here
+ *
+ *  Refer to
+ *  (http://developer.amd.com/wordpress/media/2012/10/488821.pdf)
+ *  5.2.2.1
+ */
+iommu->dev_entries[12] = 3;
+iommu->dev_entries[16] = 4;
+iommu->dev_entries[17] = 0xff;
+iommu->dev_entries[18] = 0xff;
+}
+
+build_header(linker, table_data, (void *)(table_data->data + iommu_start),
+ "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
+}
+
+static iommu_type has_iommu(void)
+{
+bool ambiguous;
+
+if (object_resolve_path_type("", TYPE_AMD_IOMMU_DEVICE, &ambiguous)
+&& !ambiguous)
+return TYPE_AMD;
+else if (object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE, &ambiguous)
+&& !ambiguous)
+return TYPE_INTEL;
+else
+return TYPE_NONE;
+}
+
 static GArray *
 build_rsdp(GArray *rsdp_table, GArray *linker, unsigned rsdt)
 {
@@ -2570,16 +2649,6 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
 return true;
 }
 
-static bool acpi_has_iommu(void)
-{
-bool ambiguous;
-Object *intel_iommu;
-
-intel_iommu = object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE,
-   &ambiguous);
-return intel_iommu && !ambiguous;
-}
-
 static bool acpi_has_nvdimm(void)
 {
 PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
@@ -2600,6 +2669,7 @@ void acpi_build(AcpiBuildTables *tables)
 AcpiMcfgInfo mcfg;
 PcPciInfo pci;
 uint8_t *u;
+iommu_type type = has_iommu();
 size_t aml_len = 0;
 GArray *tables_blob = tables->table_data;
 AcpiSlicOem slic_oem = { .id = NULL, .table_id = NULL };
@@ -2666,7 +2736,13 @@ void acpi_build(AcpiBuildTables *tables)
 acpi_add_table(table_offsets, tables_blob);
 build_mcfg_q35(tables_blob, tables->linker, &mcfg);
 }
-if (acpi_has_iommu()) {
+
+if (type == TYPE_AMD) {
+acpi_add_table(table_offsets, tables_blob);
+build_amd_iommu(tables_blob, tables->linker);
+}
+
+if (type == TYPE_INTEL) {
 acpi_add_table(table_offsets, tables_blob);
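
The selection at the end of this hunk boils down to: emit at most one IOMMU table, IVRS for AMD or DMAR for Intel. A minimal standalone sketch of that decision follows. It reuses the `iommu_type` enum from the patch, while `iommu_acpi_table_name` is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same enumerators as the patch's iommu_type. */
typedef enum iommu_type {
    TYPE_AMD,
    TYPE_INTEL,
    TYPE_NONE
} iommu_type;

/*
 * Hypothetical helper: return the ACPI table signature that describes
 * the emulated IOMMU, or NULL when no IOMMU table should be built.
 */
static const char *iommu_acpi_table_name(iommu_type type)
{
    switch (type) {
    case TYPE_AMD:
        return "IVRS";
    case TYPE_INTEL:
        return "DMAR";
    case TYPE_NONE:
    default:
        return NULL;
    }
}
```

A `switch` over the enum makes it explicit that the two tables are mutually exclusive, which the two independent `if` statements in the hunk only imply.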
  

Re: [Qemu-devel] [V6 3/4] hw/i386: ACPI table for AMD IOMMU

2016-02-21 Thread David Kiarie
On Sun, Feb 21, 2016 at 9:20 PM, Jan Kiszka  wrote:
> On 2016-02-21 19:10, David Kiarie wrote:
>> Add IVRS table for AMD IOMMU. Generate IVRS or DMAR
>> depending on emulated IOMMU
>>
>> Signed-off-by: David Kiarie 
>> ---
>>  hw/i386/acpi-build.c| 208 
>> +---
>>  include/hw/acpi/acpi-defs.h |  55 
>>  2 files changed, 252 insertions(+), 11 deletions(-)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 4554eb8..fa1310f 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -51,6 +51,7 @@
>>  #include "hw/pci/pci_bus.h"
>>  #include "hw/pci-host/q35.h"
>>  #include "hw/i386/intel_iommu.h"
>> +#include "hw/i386/amd_iommu.h"
>>  #include "hw/timer/hpet.h"
>>
>>  #include "hw/acpi/aml-build.h"
>> @@ -121,6 +122,12 @@ typedef struct AcpiBuildPciBusHotplugState {
>>  bool pcihp_bridge_en;
>>  } AcpiBuildPciBusHotplugState;
>>
>> +typedef enum iommu_type {
>> +TYPE_AMD,
>> +TYPE_INTEL,
>> +TYPE_NONE
>> +} iommu_type;
>> +
>>  static
>>  int acpi_add_cpu_info(Object *o, void *opaque)
>>  {
>> @@ -2513,6 +2520,188 @@ build_dmar_q35(GArray *table_data, GArray *linker)
>>   "DMAR", table_data->len - dmar_start, 1, NULL, NULL);
>>  }
>>
>> +static void
>> +build_amd_iommu(GArray *table_data, GArray *linker)
>> +{
>> +int iommu_start = table_data->len;
>> +bool iommu_ambig;
>> +
>> +AcpiAMDIOMMUIVRS *ivrs;
>> +AcpiAMDIOMMUHardwareUnit *iommu;
>> +
>> +/* IVRS definition */
>> +ivrs = acpi_data_push(table_data, sizeof(*ivrs));
>> +ivrs->revision = cpu_to_le16(ACPI_IOMMU_IVRS_TYPE);
>> +ivrs->length = cpu_to_le16((sizeof(*ivrs) + sizeof(*iommu)));
>> +ivrs->v_common_info = cpu_to_le64(AMD_IOMMU_HOST_ADDRESS_WIDTH << 8);
>> +
>> +AMDIOMMUState *s = (AMDIOMMUState *)object_resolve_path_type("",
>> +TYPE_AMD_IOMMU_DEVICE, &iommu_ambig);
>> +
>> +/* IVDB definition - type 10h */
>> +iommu = acpi_data_push(table_data, sizeof(*iommu));
>> +if (!iommu_ambig) {
>> +iommu->type = cpu_to_le16(0x10);
>> +/* IVHD flags */
>> +iommu->flags = cpu_to_le16(iommu->flags);
>> +iommu->flags = cpu_to_le16(IVHD_HT_TUNEN | IVHD_PPRSUP | 
>> IVHD_IOTLBSUP
>> +   | IVHD_PREFSUP);
>> +iommu->length = cpu_to_le16(sizeof(*iommu));
>> +iommu->device_id = cpu_to_le16(PCI_DEVICE_ID_RD890_IOMMU);
>> +iommu->capability_offset = cpu_to_le16(s->capab_offset);
>> +iommu->mmio_base = cpu_to_le64(s->mmio.addr);
>> +iommu->pci_segment = 0;
>> +iommu->interrupt_info = 0;
>> +/* EFR features */
>> +iommu->efr_register = cpu_to_le64(IVHD_EFR_GTSUP | IVHD_EFR_HATS
>> +  | IVHD_EFR_GATS);
>> +iommu->efr_register = cpu_to_le64(iommu->efr_register);
>> +/* device entries */
>> +memset(iommu->dev_entries, 0, 20);
>> +/* Add device flags here
>> + *  These are 4-byte device entries currently reporting the range of
>> + *  devices 00h - ffffh; all devices
>> + *
>> + *  Device setting affecting all devices should be made here
>> + *
>> + *  Refer to
>> + *  (http://developer.amd.com/wordpress/media/2012/10/488821.pdf)
>> + *  5.2.2.1
>> + */
>> +iommu->dev_entries[12] = 3;
>> +iommu->dev_entries[16] = 4;
>> +iommu->dev_entries[17] = 0xff;
>> +iommu->dev_entries[18] = 0xff;
>> +}
>> +
>> +build_header(linker, table_data, (void *)(table_data->data + 
>> iommu_start),
>> + "IVRS", table_data->len - iommu_start, 1, NULL);
>> +}
>> +
>> +static iommu_type has_iommu(void)
>> +{
>> +bool ambiguous;
>> +
>> +if (object_resolve_path_type("", TYPE_AMD_IOMMU_DEVICE, &ambiguous)
>> +&& !ambiguous)
>> +return TYPE_AMD;
>> +else if (object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE, &ambiguous)
>> +&& !ambiguous)
>> +return TYPE_INTEL;
>> +else
>> +return TYPE_NONE;
>> +}
>> +
>> +static void
>> +build_dsdt(GArray *table_data, GArray *linker,
>> +   AcpiPmInfo *pm, AcpiMiscInfo *misc)
>> +{
>> +Aml *dsdt, *sb_scope, *scope, *dev, *method, *field;
>> +MachineState *machine = MACHINE(qdev_get_machine());
>> +uint32_t nr_mem = machine->ram_slots;
>> +
>> +dsdt = init_aml_allocator();
>> +
>> +/* Reserve space for header */
>> +acpi_data_push(dsdt->buf, sizeof(AcpiTableHeader));
>> +
>> +build_dbg_aml(dsdt);
>> +if (misc->is_piix4) {
>> +sb_scope = aml_scope("_SB");
>> +dev = aml_device("PCI0");
>> +aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
>> +aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
>> +aml_append(dev, aml_name_decl("_UID", aml_int(1)));
>> +aml_append(sb_scope, dev);

Re: [Qemu-devel] [V6 3/4] hw/i386: ACPI table for AMD IOMMU

2016-02-21 Thread Jan Kiszka
On 2016-02-21 19:10, David Kiarie wrote:
> Add IVRS table for AMD IOMMU. Generate IVRS or DMAR
> depending on emulated IOMMU
> 
> Signed-off-by: David Kiarie 
> ---
>  hw/i386/acpi-build.c| 208 
> +---
>  include/hw/acpi/acpi-defs.h |  55 
>  2 files changed, 252 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 4554eb8..fa1310f 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -51,6 +51,7 @@
>  #include "hw/pci/pci_bus.h"
>  #include "hw/pci-host/q35.h"
>  #include "hw/i386/intel_iommu.h"
> +#include "hw/i386/amd_iommu.h"
>  #include "hw/timer/hpet.h"
>  
>  #include "hw/acpi/aml-build.h"
> @@ -121,6 +122,12 @@ typedef struct AcpiBuildPciBusHotplugState {
>  bool pcihp_bridge_en;
>  } AcpiBuildPciBusHotplugState;
>  
> +typedef enum iommu_type {
> +TYPE_AMD,
> +TYPE_INTEL,
> +TYPE_NONE
> +} iommu_type;
> +
>  static
>  int acpi_add_cpu_info(Object *o, void *opaque)
>  {
> @@ -2513,6 +2520,188 @@ build_dmar_q35(GArray *table_data, GArray *linker)
>   "DMAR", table_data->len - dmar_start, 1, NULL, NULL);
>  }
>  
> +static void
> +build_amd_iommu(GArray *table_data, GArray *linker)
> +{
> +int iommu_start = table_data->len;
> +bool iommu_ambig;
> +
> +AcpiAMDIOMMUIVRS *ivrs;
> +AcpiAMDIOMMUHardwareUnit *iommu;
> +
> +/* IVRS definition */
> +ivrs = acpi_data_push(table_data, sizeof(*ivrs));
> +ivrs->revision = cpu_to_le16(ACPI_IOMMU_IVRS_TYPE);
> +ivrs->length = cpu_to_le16((sizeof(*ivrs) + sizeof(*iommu)));
> +ivrs->v_common_info = cpu_to_le64(AMD_IOMMU_HOST_ADDRESS_WIDTH << 8);
> +
> +AMDIOMMUState *s = (AMDIOMMUState *)object_resolve_path_type("",
> +TYPE_AMD_IOMMU_DEVICE, &iommu_ambig);
> +
> +/* IVDB definition - type 10h */
> +iommu = acpi_data_push(table_data, sizeof(*iommu));
> +if (!iommu_ambig) {
> +iommu->type = cpu_to_le16(0x10);
> +/* IVHD flags */
> +iommu->flags = cpu_to_le16(iommu->flags);
> +iommu->flags = cpu_to_le16(IVHD_HT_TUNEN | IVHD_PPRSUP | 
> IVHD_IOTLBSUP
> +   | IVHD_PREFSUP);
> +iommu->length = cpu_to_le16(sizeof(*iommu));
> +iommu->device_id = cpu_to_le16(PCI_DEVICE_ID_RD890_IOMMU);
> +iommu->capability_offset = cpu_to_le16(s->capab_offset);
> +iommu->mmio_base = cpu_to_le64(s->mmio.addr);
> +iommu->pci_segment = 0;
> +iommu->interrupt_info = 0;
> +/* EFR features */
> +iommu->efr_register = cpu_to_le64(IVHD_EFR_GTSUP | IVHD_EFR_HATS
> +  | IVHD_EFR_GATS);
> +iommu->efr_register = cpu_to_le64(iommu->efr_register);
> +/* device entries */
> +memset(iommu->dev_entries, 0, 20);
> +/* Add device flags here
> + *  These are 4-byte device entries currently reporting the range of
> + *  devices 00h - ffffh; all devices
> + *
> + *  Device setting affecting all devices should be made here
> + *
> + *  Refer to
> + *  (http://developer.amd.com/wordpress/media/2012/10/488821.pdf)
> + *  5.2.2.1
> + */
> +iommu->dev_entries[12] = 3;
> +iommu->dev_entries[16] = 4;
> +iommu->dev_entries[17] = 0xff;
> +iommu->dev_entries[18] = 0xff;
> +}
> +
> +build_header(linker, table_data, (void *)(table_data->data + 
> iommu_start),
> + "IVRS", table_data->len - iommu_start, 1, NULL);
> +}
> +
> +static iommu_type has_iommu(void)
> +{
> +bool ambiguous;
> +
> +if (object_resolve_path_type("", TYPE_AMD_IOMMU_DEVICE, &ambiguous)
> +&& !ambiguous)
> +return TYPE_AMD;
> +else if (object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE, &ambiguous)
> +&& !ambiguous)
> +return TYPE_INTEL;
> +else
> +return TYPE_NONE;
> +}
> +
> +static void
> +build_dsdt(GArray *table_data, GArray *linker,
> +   AcpiPmInfo *pm, AcpiMiscInfo *misc)
> +{
> +Aml *dsdt, *sb_scope, *scope, *dev, *method, *field;
> +MachineState *machine = MACHINE(qdev_get_machine());
> +uint32_t nr_mem = machine->ram_slots;
> +
> +dsdt = init_aml_allocator();
> +
> +/* Reserve space for header */
> +acpi_data_push(dsdt->buf, sizeof(AcpiTableHeader));
> +
> +build_dbg_aml(dsdt);
> +if (misc->is_piix4) {
> +sb_scope = aml_scope("_SB");
> +dev = aml_device("PCI0");
> +aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
> +aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
> +aml_append(dev, aml_name_decl("_UID", aml_int(1)));
> +aml_append(sb_scope, dev);
> +aml_append(dsdt, sb_scope);
> +
> +build_hpet_aml(dsdt);
> +build_piix4_pm(dsdt);
> +build_piix4_isa_bridge(dsdt);
> +build_isa_devices_aml(dsdt);
> +

[Qemu-devel] [V6 4/4] hw/pci-host: Emulate AMD IOMMU

2016-02-21 Thread David Kiarie
Add AMD IOMMU emulation support to q35 chipset

Signed-off-by: David Kiarie 
---
 hw/pci-host/piix.c|  1 +
 hw/pci-host/q35.c | 14 --
 include/hw/i386/intel_iommu.h |  1 +
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 41aa66f..ab2e24a 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -36,6 +36,7 @@
 #include "hw/i386/ioapic.h"
 #include "qapi/visitor.h"
 #include "qemu/error-report.h"
+#include "hw/i386/amd_iommu.h"
 
 /*
  * I440FX chipset data sheet.
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 115fb8c..355fb32 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -31,6 +31,7 @@
 #include "hw/hw.h"
 #include "hw/pci-host/q35.h"
 #include "qapi/visitor.h"
+#include "hw/i386/amd_iommu.h"
 
 /****************************************************************************
  * Q35 host
@@ -505,9 +506,18 @@ static void mch_realize(PCIDevice *d, Error **errp)
  mch->pci_address_space, &mch->pam_regions[i+1],
  PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
 }
-/* Intel IOMMU (VT-d) */
-if (object_property_get_bool(qdev_get_machine(), "iommu", NULL)) {
+
+if (g_strcmp0(MACHINE(qdev_get_machine())->iommu, INTEL_IOMMU_STR) == 0) {
+/* Intel IOMMU (VT-d) */
 mch_init_dmar(mch);
+} else if (g_strcmp0(MACHINE(qdev_get_machine())->iommu, AMD_IOMMU_STR)
+   == 0) {
+AMDIOMMUState *iommu_state;
+PCIDevice *iommu;
+PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(mch)));
+iommu = pci_create_simple(bus, 0x20, TYPE_AMD_IOMMU_DEVICE);
+iommu_state = AMD_IOMMU_DEVICE(iommu);
+pci_setup_iommu(bus, bridge_host_amd_iommu, iommu_state);
 }
 }
 
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index b024ffa..539530c 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -27,6 +27,7 @@
 #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
 #define INTEL_IOMMU_DEVICE(obj) \
  OBJECT_CHECK(IntelIOMMUState, (obj), TYPE_INTEL_IOMMU_DEVICE)
+#define INTEL_IOMMU_STR "intel"
 
 /* DMAR Hardware Unit Definition address (IOMMU unit) */
 #define Q35_HOST_BRIDGE_IOMMU_ADDR  0xfed90000ULL
-- 
2.1.4




[Qemu-devel] [V6 1/4] hw/i386: Introduce AMD IOMMU

2016-02-21 Thread David Kiarie
Add AMD IOMMU emulation to QEMU in addition to Intel IOMMU.
The IOMMU does basic translation, error checking and has a
minimal IOTLB implementation.

Signed-off-by: David Kiarie 
---
 hw/i386/Makefile.objs |1 +
 hw/i386/amd_iommu.c   | 1432 +
 hw/i386/amd_iommu.h   |  395 ++
 include/hw/pci/pci.h  |2 +
 4 files changed, 1830 insertions(+)
 create mode 100644 hw/i386/amd_iommu.c
 create mode 100644 hw/i386/amd_iommu.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index b52d5b8..2f1a265 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -3,6 +3,7 @@ obj-y += multiboot.o
 obj-y += pc.o pc_piix.o pc_q35.o
 obj-y += pc_sysfw.o
 obj-y += intel_iommu.o
+obj-y += amd_iommu.o
 obj-$(CONFIG_XEN) += ../xenpv/ xen/
 
 obj-y += kvmvapic.o
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
new file mode 100644
index 0000000..3dac043
--- /dev/null
+++ b/hw/i386/amd_iommu.c
@@ -0,0 +1,1432 @@
+/*
+ * QEMU emulation of AMD IOMMU (AMD-Vi)
+ *
+ * Copyright (C) 2011 Eduard - Gabriel Munteanu
+ * Copyright (C) 2015 David Kiarie, 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ * Cache implementation inspired by hw/i386/intel_iommu.c
+ *
+ */
+#include "hw/i386/amd_iommu.h"
+
+/*#define DEBUG_AMD_IOMMU*/
+#ifdef DEBUG_AMD_IOMMU
+enum {
+DEBUG_GENERAL, DEBUG_CAPAB, DEBUG_MMIO, DEBUG_ELOG,
+DEBUG_CACHE, DEBUG_COMMAND, DEBUG_MMU
+};
+
+#define IOMMU_DBGBIT(x)   (1 << DEBUG_##x)
+static int iommu_dbgflags = IOMMU_DBGBIT(MMIO);
+
+#define IOMMU_DPRINTF(what, fmt, ...) do { \
+if (iommu_dbgflags & IOMMU_DBGBIT(what)) { \
+fprintf(stderr, "(amd-iommu)%s: " fmt "\n", __func__, \
+## __VA_ARGS__); } \
+} while (0)
+#else
+#define IOMMU_DPRINTF(what, fmt, ...) do {} while (0)
+#endif
+
+typedef struct AMDIOMMUAddressSpace {
+uint8_t bus_num;/* bus number   */
+uint8_t devfn;  /* device function  */
+AMDIOMMUState *iommu_state; /* IOMMU - one per machine  */
+MemoryRegion iommu; /* Device's iommu region*/
+AddressSpace as;/* device's corresponding address space */
+} AMDIOMMUAddressSpace;
+
+/* IOMMU cache entry */
+typedef struct IOMMUIOTLBEntry {
+uint64_t gfn;
+uint16_t domid;
+uint64_t devid;
+uint64_t perms;
+uint64_t translated_addr;
+} IOMMUIOTLBEntry;
+
+/* configure MMIO registers at startup/reset */
+static void amd_iommu_set_quad(AMDIOMMUState *s, hwaddr addr, uint64_t val,
+   uint64_t romask, uint64_t w1cmask)
+{
+stq_le_p(&s->mmior[addr], val);
+stq_le_p(&s->romask[addr], romask);
+stq_le_p(&s->w1cmask[addr], w1cmask);
+}
+
+static uint16_t amd_iommu_readw(AMDIOMMUState *s, hwaddr addr)
+{
+return lduw_le_p(&s->mmior[addr]);
+}
+
+static uint32_t amd_iommu_readl(AMDIOMMUState *s, hwaddr addr)
+{
+return ldl_le_p(&s->mmior[addr]);
+}
+
+static uint64_t amd_iommu_readq(AMDIOMMUState *s, hwaddr addr)
+{
+return ldq_le_p(&s->mmior[addr]);
+}
+
+/* internal write */
+static void amd_iommu_writeq_raw(AMDIOMMUState *s, uint64_t val, hwaddr addr)
+{
+stq_le_p(&s->mmior[addr], val);
+}
+
+/* external write */
+static void amd_iommu_writew(AMDIOMMUState *s, hwaddr addr, uint16_t val)
+{
+uint16_t romask = lduw_le_p(&s->romask[addr]);
+uint16_t w1cmask = lduw_le_p(&s->w1cmask[addr]);
+uint16_t oldval = lduw_le_p(&s->mmior[addr]);
+stw_le_p(&s->mmior[addr], (val & ~(val & w1cmask)) | (romask & oldval));
+}
+
+static void amd_iommu_writel(AMDIOMMUState *s, hwaddr addr, uint32_t val)
+{
+uint32_t romask = ldl_le_p(&s->romask[addr]);
+uint32_t w1cmask = ldl_le_p(&s->w1cmask[addr]);
+uint32_t oldval = ldl_le_p(&s->mmior[addr]);
+stl_le_p(&s->mmior[addr], (val & ~(val & w1cmask)) | (romask & oldval));
+}
+
+static void amd_iommu_writeq(AMDIOMMUState *s, hwaddr addr, uint64_t val)
+{
+uint64_t romask = ldq_le_p(&s->romask[addr]);
+uint64_t w1cmask = ldq_le_p(&s->w1cmask[addr]);
+uint64_t oldval = ldq_le_p(&s->mmior[addr]);
+stq_le_p(&s->mmior[addr], (val & ~(val & w1cmask)) | (romask & oldval));
+}
+
+static void amd_iommu_log_event(AMDIOMMUState *s, uint16_t *evt)
+{
+/* event logging not enabled */
+if (!s->evtlog_enabled || *(uint64_t 
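
The external write helpers in this patch merge the guest's value with a read-only mask and a write-1-to-clear (W1C) mask. For comparison, here is a standalone sketch of one common way to combine the two masks. It is not the expression used in the patch: `reg_write_masked` is a hypothetical name, and the sketch assumes that W1C bits can never be set directly by a write.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the new register contents.
 *   oldval  - current register value
 *   val     - value the guest is writing
 *   romask  - bits the guest may not change
 *   w1cmask - bits that are cleared when the guest writes 1 to them
 */
static uint64_t reg_write_masked(uint64_t oldval, uint64_t val,
                                 uint64_t romask, uint64_t w1cmask)
{
    /* Read-only and W1C bits keep their old value at first. */
    uint64_t keep = romask | w1cmask;
    uint64_t merged = (oldval & keep) | (val & ~keep);

    /* Then every W1C bit written as 1 is cleared. */
    return merged & ~(val & w1cmask);
}
```

For example, with `oldval = 0xA5`, `romask = 0xF0` and `w1cmask = 0x01`, writing `0x0F` yields `0xAE`: the upper nibble is preserved, bits 1 to 3 take the written value, and bit 0 is cleared because a 1 was written to it. All intermediates are kept at the full 64-bit width so that read-only bits in the upper half of a quad register survive.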

[Qemu-devel] [V6 3/4] hw/i386: ACPI table for AMD IOMMU

2016-02-21 Thread David Kiarie
Add IVRS table for AMD IOMMU. Generate IVRS or DMAR
depending on emulated IOMMU

Signed-off-by: David Kiarie 
---
 hw/i386/acpi-build.c| 208 +---
 include/hw/acpi/acpi-defs.h |  55 
 2 files changed, 252 insertions(+), 11 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4554eb8..fa1310f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -51,6 +51,7 @@
 #include "hw/pci/pci_bus.h"
 #include "hw/pci-host/q35.h"
 #include "hw/i386/intel_iommu.h"
+#include "hw/i386/amd_iommu.h"
 #include "hw/timer/hpet.h"
 
 #include "hw/acpi/aml-build.h"
@@ -121,6 +122,12 @@ typedef struct AcpiBuildPciBusHotplugState {
 bool pcihp_bridge_en;
 } AcpiBuildPciBusHotplugState;
 
+typedef enum iommu_type {
+TYPE_AMD,
+TYPE_INTEL,
+TYPE_NONE
+} iommu_type;
+
 static
 int acpi_add_cpu_info(Object *o, void *opaque)
 {
@@ -2513,6 +2520,188 @@ build_dmar_q35(GArray *table_data, GArray *linker)
  "DMAR", table_data->len - dmar_start, 1, NULL, NULL);
 }
 
+static void
+build_amd_iommu(GArray *table_data, GArray *linker)
+{
+int iommu_start = table_data->len;
+bool iommu_ambig;
+
+AcpiAMDIOMMUIVRS *ivrs;
+AcpiAMDIOMMUHardwareUnit *iommu;
+
+/* IVRS definition */
+ivrs = acpi_data_push(table_data, sizeof(*ivrs));
+ivrs->revision = cpu_to_le16(ACPI_IOMMU_IVRS_TYPE);
+ivrs->length = cpu_to_le16((sizeof(*ivrs) + sizeof(*iommu)));
+ivrs->v_common_info = cpu_to_le64(AMD_IOMMU_HOST_ADDRESS_WIDTH << 8);
+
+AMDIOMMUState *s = (AMDIOMMUState *)object_resolve_path_type("",
+TYPE_AMD_IOMMU_DEVICE, &iommu_ambig);
+
+/* IVDB definition - type 10h */
+iommu = acpi_data_push(table_data, sizeof(*iommu));
+if (!iommu_ambig) {
+iommu->type = cpu_to_le16(0x10);
+/* IVHD flags */
+iommu->flags = cpu_to_le16(iommu->flags);
+iommu->flags = cpu_to_le16(IVHD_HT_TUNEN | IVHD_PPRSUP | IVHD_IOTLBSUP
+   | IVHD_PREFSUP);
+iommu->length = cpu_to_le16(sizeof(*iommu));
+iommu->device_id = cpu_to_le16(PCI_DEVICE_ID_RD890_IOMMU);
+iommu->capability_offset = cpu_to_le16(s->capab_offset);
+iommu->mmio_base = cpu_to_le64(s->mmio.addr);
+iommu->pci_segment = 0;
+iommu->interrupt_info = 0;
+/* EFR features */
+iommu->efr_register = cpu_to_le64(IVHD_EFR_GTSUP | IVHD_EFR_HATS
+  | IVHD_EFR_GATS);
+iommu->efr_register = cpu_to_le64(iommu->efr_register);
+/* device entries */
+memset(iommu->dev_entries, 0, 20);
+/* Add device flags here
+ *  These are 4-byte device entries currently reporting the range of
+ *  devices 00h - ffffh; all devices
+ *
+ *  Device setting affecting all devices should be made here
+ *
+ *  Refer to
+ *  (http://developer.amd.com/wordpress/media/2012/10/488821.pdf)
+ *  5.2.2.1
+ */
+iommu->dev_entries[12] = 3;
+iommu->dev_entries[16] = 4;
+iommu->dev_entries[17] = 0xff;
+iommu->dev_entries[18] = 0xff;
+}
+
+build_header(linker, table_data, (void *)(table_data->data + iommu_start),
+ "IVRS", table_data->len - iommu_start, 1, NULL);
+}
+
+static iommu_type has_iommu(void)
+{
+bool ambiguous;
+
+if (object_resolve_path_type("", TYPE_AMD_IOMMU_DEVICE, &ambiguous)
+&& !ambiguous)
+return TYPE_AMD;
+else if (object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE, &ambiguous)
+&& !ambiguous)
+return TYPE_INTEL;
+else
+return TYPE_NONE;
+}
+
+static void
+build_dsdt(GArray *table_data, GArray *linker,
+   AcpiPmInfo *pm, AcpiMiscInfo *misc)
+{
+Aml *dsdt, *sb_scope, *scope, *dev, *method, *field;
+MachineState *machine = MACHINE(qdev_get_machine());
+uint32_t nr_mem = machine->ram_slots;
+
+dsdt = init_aml_allocator();
+
+/* Reserve space for header */
+acpi_data_push(dsdt->buf, sizeof(AcpiTableHeader));
+
+build_dbg_aml(dsdt);
+if (misc->is_piix4) {
+sb_scope = aml_scope("_SB");
+dev = aml_device("PCI0");
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
+aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
+aml_append(dev, aml_name_decl("_UID", aml_int(1)));
+aml_append(sb_scope, dev);
+aml_append(dsdt, sb_scope);
+
+build_hpet_aml(dsdt);
+build_piix4_pm(dsdt);
+build_piix4_isa_bridge(dsdt);
+build_isa_devices_aml(dsdt);
+build_piix4_pci_hotplug(dsdt);
+build_piix4_pci0_int(dsdt);
+} else {
+sb_scope = aml_scope("_SB");
+aml_append(sb_scope,
+aml_operation_region("PCST", AML_SYSTEM_IO, 0xae00, 0x0c));
+aml_append(sb_scope,
+aml_operation_region("PCSB", AML_SYSTEM_IO, 0xae0c, 0x01));
+   
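
The raw byte stores into `dev_entries` in `build_amd_iommu` are easier to follow when each 4-byte device entry is built from named fields. The sketch below assumes the entry layout implied by the patch (one type byte, a little-endian 16-bit device ID, one settings byte) and uses types 3 and 4 as the start and end of a device range, as the patch does. The function and constant names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Entry types used by the patch; the names are made up for this sketch. */
enum {
    IVHD_DEV_RANGE_START = 3,
    IVHD_DEV_RANGE_END   = 4
};

/* Pack one 4-byte device entry into a host integer, lowest byte first. */
static uint32_t ivhd_dev_entry(uint8_t type, uint16_t devid, uint8_t setting)
{
    return (uint32_t)type |
           ((uint32_t)devid << 8) |
           ((uint32_t)setting << 24);
}

/* Store a packed entry into a table buffer in little-endian byte order. */
static void ivhd_store_dev_entry(uint8_t *dst, uint32_t entry)
{
    dst[0] = entry & 0xff;          /* type            */
    dst[1] = (entry >> 8) & 0xff;   /* device ID, low  */
    dst[2] = (entry >> 16) & 0xff;  /* device ID, high */
    dst[3] = (entry >> 24) & 0xff;  /* data setting    */
}
```

Storing `ivhd_dev_entry(IVHD_DEV_RANGE_START, 0x0000, 0)` at offset 12 and `ivhd_dev_entry(IVHD_DEV_RANGE_END, 0xffff, 0)` at offset 16 produces the same bytes as the four assignments in the patch.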

[Qemu-devel] [V6 2/4] hw/core: Add AMD IOMMU to machine properties

2016-02-21 Thread David Kiarie
Add IOMMU as a string machine property that controls whether
an IOMMU is emulated and, if so, which type

Signed-off-by: David Kiarie 
---
 hw/core/machine.c   | 28 
 include/hw/boards.h |  3 ++-
 qemu-options.hx |  6 +++---
 util/qemu-config.c  |  4 ++--
 4 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 6d1a0d8..001ace9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -14,6 +14,8 @@
 #include "hw/boards.h"
 #include "qapi-visit.h"
 #include "qapi/visitor.h"
+#include "hw/i386/amd_iommu.h"
+#include "hw/i386/intel_iommu.h"
 #include "hw/sysbus.h"
 #include "sysemu/sysemu.h"
 #include "qemu/error-report.h"
@@ -284,18 +286,28 @@ static void machine_set_firmware(Object *obj, const char 
*value, Error **errp)
 ms->firmware = g_strdup(value);
 }
 
-static bool machine_get_iommu(Object *obj, Error **errp)
+static char *machine_get_iommu(Object *obj, Error **errp)
 {
 MachineState *ms = MACHINE(obj);
 
-return ms->iommu;
+return g_strdup(ms->iommu);
 }
 
-static void machine_set_iommu(Object *obj, bool value, Error **errp)
+static void machine_set_iommu(Object *obj, const char *value, Error **errp)
 {
 MachineState *ms = MACHINE(obj);
+Error *err = NULL;
+
+g_free(ms->iommu);
+
+if (g_strcmp0(value, AMD_IOMMU_STR) &&
+g_strcmp0(value, INTEL_IOMMU_STR)) {
+error_setg(errp, "Invalid IOMMU type %s", value);
+error_propagate(errp, err);
+return;
+}
 
-ms->iommu = value;
+ms->iommu = g_strdup(value);
 }
 
 static void machine_set_suppress_vmdesc(Object *obj, bool value, Error **errp)
@@ -455,11 +467,10 @@ static void machine_initfn(Object *obj)
 object_property_set_description(obj, "firmware",
 "Firmware image",
 NULL);
-object_property_add_bool(obj, "iommu",
- machine_get_iommu,
- machine_set_iommu, NULL);
+object_property_add_str(obj, "iommu",
+machine_get_iommu, machine_set_iommu, NULL);
 object_property_set_description(obj, "iommu",
-"Set on/off to enable/disable Intel IOMMU 
(VT-d)",
+"IOMMU list",
 NULL);
 object_property_add_bool(obj, "suppress-vmdesc",
  machine_get_suppress_vmdesc,
@@ -485,6 +496,7 @@ static void machine_finalize(Object *obj)
 g_free(ms->dumpdtb);
 g_free(ms->dt_compatible);
 g_free(ms->firmware);
+g_free(ms->iommu);
 }
 
 bool machine_usb(MachineState *machine)
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 0f30959..b119245 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -36,6 +36,7 @@ bool machine_usb(MachineState *machine);
 bool machine_kernel_irqchip_allowed(MachineState *machine);
 bool machine_kernel_irqchip_required(MachineState *machine);
 bool machine_kernel_irqchip_split(MachineState *machine);
+bool machine_amd_iommu(MachineState *machine);
 int machine_kvm_shadow_mem(MachineState *machine);
 int machine_phandle_start(MachineState *machine);
 bool machine_dump_guest_core(MachineState *machine);
@@ -126,7 +127,7 @@ struct MachineState {
 bool usb_disabled;
 bool igd_gfx_passthru;
 char *firmware;
-bool iommu;
+char *iommu;
 bool suppress_vmdesc;
 
 ram_addr_t ram_size;
diff --git a/qemu-options.hx b/qemu-options.hx
index 2f0465e..dad160f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -38,7 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "kvm_shadow_mem=size of KVM shadow MMU\n"
 "dump-guest-core=on|off include guest memory in a core 
dump (default=on)\n"
 "mem-merge=on|off controls memory merge support (default: 
on)\n"
-"iommu=on|off controls emulated Intel IOMMU (VT-d) support 
(default=off)\n"
+"iommu=amd|intel enables and selects the emulated IOMMU 
(default: off)\n"
 "igd-passthru=on|off controls IGD GFX passthrough support 
(default=off)\n"
 "aes-key-wrap=on|off controls support for AES key wrapping 
(default=on)\n"
 "dea-key-wrap=on|off controls support for DEA key wrapping 
(default=on)\n"
@@ -72,8 +72,8 @@ Include guest memory in a core dump. The default is on.
 Enables or disables memory merge support. This feature, when supported by
 the host, de-duplicates identical memory pages among VMs instances
 (enabled by default).
-@item iommu=on|off
-Enables or disables emulated Intel IOMMU (VT-d) support. The default is off.
+@item iommu=intel|amd
+Enables and selects the emulated IOMMU. The default is off.
 @item aes-key-wrap=on|off
 Enables or disables AES key wrapping support on s390-ccw hosts. This feature
 controls 
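
As posted, `machine_set_iommu` frees the old string before it validates the new one, so a rejected value appears to leave `ms->iommu` pointing at freed memory that `machine_finalize` would free again. A minimal sketch of the validate-first ordering follows, in plain C without GLib. `MachineSketch` and `machine_sketch_set_iommu` are hypothetical stand-ins for the QEMU types and setter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the one MachineState field that matters here. */
typedef struct MachineSketch {
    char *iommu;
} MachineSketch;

/*
 * Set the IOMMU type. Returns 0 on success and -1 on an invalid value or
 * allocation failure; in both failure cases the old value is left untouched.
 */
static int machine_sketch_set_iommu(MachineSketch *ms, const char *value)
{
    char *copy;

    /* Validate before touching the existing string. */
    if (value == NULL ||
        (strcmp(value, "amd") != 0 && strcmp(value, "intel") != 0)) {
        return -1;
    }

    copy = malloc(strlen(value) + 1);
    if (copy == NULL) {
        return -1;
    }
    strcpy(copy, value);

    /* Only now replace the old value. */
    free(ms->iommu);
    ms->iommu = copy;
    return 0;
}
```

In the QEMU setter the same ordering would mean calling `error_setg()` and returning before `g_free(ms->iommu)` is reached.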
