date:20170607

Re: [Qemu-devel] [PATCH v3 5/7] pci: Make errp the last parameter of pci_add_capability()

2017-06-07 Thread Markus Armbruster

Mao Zhongyi  writes:

> Hi, Eduardo
>
> On 06/06/2017 10:52 PM, Eduardo Habkost wrote:
>> On Tue, Jun 06, 2017 at 07:26:30PM +0800, Mao Zhongyi wrote:
>>> Add Error argument for pci_add_capability() to leverage the errp
>>> to pass info on errors. This way is helpful for its callers to
>>> make a better error handling when moving to 'realize'.
[...]
>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>> index b73bfea..2bba37a 100644
>>> --- a/hw/pci/pci.c
>>> +++ b/hw/pci/pci.c
>>> @@ -2264,15 +2264,13 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>   * in pci config space
>>>   */
>>>  int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>> -   uint8_t offset, uint8_t size)
>>> +   uint8_t offset, uint8_t size,
>>> +   Error **errp)
>>>  {
>>>  int ret;
>>> -Error *local_err = NULL;
>>>
>>> -ret = pci_add_capability2(pdev, cap_id, offset, size, &local_err);
>>> -if (ret < 0) {
>>> -error_report_err(local_err);
>>> -}
>>> +ret = pci_add_capability2(pdev, cap_id, offset, size, errp);
>>> +
>>>  return ret;
>>>  }
>>
>> pci_add_capability() and pci_add_capability2() now do exactly the
>> same, why are both being kept?  I suggest replacing
>> pci_add_capability2() with pci_add_capability() everywhere (on a
>> separate patch).
>>
>
> Would it be more thorough to remove pci_add_capability() completely and
> use pci_add_capability2() directly everywhere?

You're converting pci_add_capability() to Error because you need the
Error for your conversions to realize().

I recommend changing the calls where you need the Error (and only
those) to call pci_add_capability2() instead.

When no calls to pci_add_capability() remain, we remove it.  If that
becomes the case in your series, you remove it.

Okay?
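
To illustrate the distinction discussed above, here is a minimal,
self-contained sketch of the errp convention.  It does not use QEMU's
real Error API: the Error struct, error_set(), error_free() and the two
add_capability functions below are simplified stand-ins invented for this
illustration, and the 256-byte limit is only an assumed bound.  The point
is that the Error-taking variant lets the caller decide what to do with a
failure, while the legacy wrapper reports it on the spot.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object (not the real API). */
typedef struct Error {
    char msg[128];
} Error;

/* Store an error in *errp if the caller asked for one. */
void error_set(Error **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = calloc(1, sizeof(**errp));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

void error_free(Error *err)
{
    free(err);
}

/* Error-taking variant: returns the offset, or -1 and sets *errp. */
int add_capability2(int offset, int size, Error **errp)
{
    if (offset + size > 256) {
        error_set(errp, "capability does not fit in config space");
        return -1;
    }
    return offset;
}

/* Legacy wrapper: reports the error itself, so the caller only sees -1. */
int add_capability(int offset, int size)
{
    Error *local_err = NULL;
    int ret = add_capability2(offset, size, &local_err);

    if (ret < 0) {
        fprintf(stderr, "%s\n", local_err ? local_err->msg : "unknown error");
        error_free(local_err);
    }
    return ret;
}
```

A caller that is being converted to realize() would use the Error-taking
variant and pass its own errp through; callers that do not need the Error
can keep using the wrapper until none remain.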

Re: [Qemu-devel] [PATCH] Add chardev-send-break monitor command

2017-06-07 Thread Markus Armbruster

Paolo Bonzini  writes:

>> >> Is there an obvious test that we can enhance to add coverage of the new
>> >> QMP command?
>> >
>> > You could have a new test covering hw/char/serial.c, but I wouldn't let
>> > that hold the patch.
>> 
>> Holding patches is pretty much the only leverage I have to get tests for
>> new stuff :)
>> 
>> Asking for tests that cover all of serial.c wouldn't be fair.  But I am
>> asking for basic test coverage of new QMP commands.
>
> I agree, on the other hand it's not exactly a comparable area.  I am planning
> to write a serial qtest for migration as well (which is more complex than this
> QMP command) so I might as well write the test for this new QMP command myself
> to get my feet wet...

I'm willing to take a commitment from someone I trust in lieu of actual
tests.  Is this one?

[Qemu-devel] [PATCH] pseries: Correct panic behaviour for pseries machine type

2017-06-07 Thread David Gibson

The pseries machine type doesn't usually use the 'pvpanic' device as such,
because it has a firmware/hypervisor facility with roughly the same
purpose.  The 'ibm,os-term' RTAS call notifies the hypervisor that the
guest has crashed.

Our implementation of this call was sending a GUEST_PANICKED qmp event;
however, it was not doing the other usual panic actions, making its
behaviour different from pvpanic for no good reason.

To correct this, we should call qemu_system_guest_panicked() rather than
directly sending the panic event.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr_rtas.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 707c4d4..94a2799 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -293,12 +293,9 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
 target_ulong args,
 uint32_t nret, target_ulong rets)
 {
-target_ulong ret = 0;
+qemu_system_guest_panicked(NULL);
 
-qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, false, NULL,
-   &error_abort);
-
-rtas_st(rets, 0, ret);
+rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 }
 
 static void rtas_set_power_level(PowerPCCPU *cpu, sPAPRMachineState *spapr,
-- 
2.9.4

Re: [Qemu-devel] [PATCH] Add chardev-send-break monitor command

2017-06-07 Thread Paolo Bonzini



- Original Message -
> From: "Markus Armbruster" 
> To: "Paolo Bonzini" 
> Cc: "Stefan Fritsch" , qemu-devel@nongnu.org, "Dr. David 
> Alan Gilbert" ,
> "Marc-André Lureau" 
> Sent: Wednesday, June 7, 2017 9:06:53 AM
> Subject: Re: [Qemu-devel] [PATCH] Add chardev-send-break monitor command
> 
> Paolo Bonzini  writes:
> 
> >> >> Is there an obvious test that we can enhance to add coverage of the new
> >> >> QMP command?
> >> >
> >> > You could have a new test covering hw/char/serial.c, but I wouldn't let
> >> > that hold the patch.
> >> 
> >> Holding patches is pretty much the only leverage I have to get tests for
> >> new stuff :)
> >> 
> >> Asking for tests that cover all of serial.c wouldn't be fair.  But I am
> >> asking for basic test coverage of new QMP commands.
> >
> > I agree, on the other hand it's not exactly a comparable area.  I am planning
> > to write a serial qtest for migration as well (which is more complex than this
> > QMP command) so I might as well write the test for this new QMP command
> > myself to get my feet wet...
> 
> I'm willing to take a commitment from someone I trust in lieu of actual
> tests.  Is this one?

Sure, though it looks like Stefan also wrote actual tests so we might get
two birds with one stone.

Paolo

Re: [Qemu-devel] [PATCH v1 1/1] char-socket: Don't report TCP socket waiting as an error

2017-06-07 Thread Markus Armbruster

Paolo Bonzini  writes:

> On 06/06/2017 18:30, Alistair Francis wrote:
>>>
>>> This is somehow confusing. I don't think it is worth having another
>>> qemu_log_stderr() function rather than using error_report() but this very
>>> call might deserve a comment explaining this unusual use. What do you think?
>> 
>> The problem with stderr is that this isn't an error. Some uses of QEMU
>> (inside Eclipse for example) flag everything printed on stderr as red
>> which confuses users that they are seeing an error when they really
>> aren't.
>
> But they are wrong.

Concur.  We also print warnings and informational messages to stderr.

We should make errors easy to recognize.  Fortunately, error_report()
prints errors to stderr in a rigid format.  Unfortunately, error
messages bypassing error_report() still exist in places.  We suck.

The format is

timestamp-if-enabled progname ':' location message

timestamp-if-enabled is normally empty.  With -msg timestamp=on, it's
the current time in ISO 8601 format, followed by a space.

progname is the program name (main()'s argv[0]).

location is either empty, or a reference to the command line or a
configuration file.

See error_vreport() for details.
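
The format above can be sketched as a small formatter.  This is an
illustration only, not QEMU's error_vreport(): the function name
format_error_line() and its parameters are made up for this example, the
timestamp is passed in as a ready-made string rather than taken from the
clock, and the exact spacing and the ": " separator after the location
are assumptions of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "timestamp-if-enabled progname ':' location message" into buf.
 * timestamp and location may be NULL or "" when absent.
 * Returns the number of characters snprintf() would have written.
 */
int format_error_line(char *buf, size_t len, const char *timestamp,
                      const char *progname, const char *location,
                      const char *message)
{
    int has_ts = timestamp && timestamp[0];
    int has_loc = location && location[0];

    return snprintf(buf, len, "%s%s%s: %s%s%s",
                    has_ts ? timestamp : "", has_ts ? " " : "",
                    progname,
                    has_loc ? location : "", has_loc ? ": " : "",
                    message);
}
```

A front end that wants to highlight only real errors could match on this
shape instead of treating everything on stderr as an error.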

[...]

Re: [Qemu-devel] [PATCH] pseries: Correct panic behaviour for pseries machine type

2017-06-07 Thread Paolo Bonzini



- Original Message -
> From: "David Gibson" 
> To: mdr...@linux.vnet.ibm.com, th...@redhat.com, lviv...@redhat.com
> Cc: pbonz...@redhat.com, qemu-...@nongnu.org, qemu-devel@nongnu.org, "David 
> Gibson" 
> Sent: Wednesday, June 7, 2017 9:07:32 AM
> Subject: [PATCH] pseries: Correct panic behaviour for pseries machine type
> 
> The pseries machine type doesn't usually use the 'pvpanic' device as such,
> because it has a firmware/hypervisor facility with roughly the same
> purpose.  The 'ibm,os-term' RTAS call notifies the hypervisor that the
> guest has crashed.
> 
> Our implementation of this call was sending a GUEST_PANICKED qmp event;
> however, it was not doing the other usual panic actions, making its
> behaviour different from pvpanic for no good reason.
> 
> To correct this, we should call qemu_system_guest_panicked() rather than
> directly sending the panic event.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr_rtas.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 707c4d4..94a2799 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -293,12 +293,9 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
>  target_ulong args,
>  uint32_t nret, target_ulong rets)
>  {
> -target_ulong ret = 0;
> +qemu_system_guest_panicked(NULL);
>  
> -qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, false, NULL,
> -   &error_abort);
> -
> -rtas_st(rets, 0, ret);
> +rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>  }

It's possible to "cont" a panicked guest, so I think you should keep
the rtas_st.

Paolo

Re: [Qemu-devel] [PATCH] pseries: Correct panic behaviour for pseries machine type

2017-06-07 Thread Thomas Huth

On 07.06.2017 09:07, David Gibson wrote:
> The pseries machine type doesn't usually use the 'pvpanic' device as such,
> because it has a firmware/hypervisor facility with roughly the same
> purpose.  The 'ibm,os-term' RTAS call notifies the hypervisor that the
> guest has crashed.
> 
> Our implementation of this call was sending a GUEST_PANICKED qmp event;
> however, it was not doing the other usual panic actions, making its
> behaviour different from pvpanic for no good reason.
> 
> To correct this, we should call qemu_system_guest_panicked() rather than
> directly sending the panic event.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr_rtas.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 707c4d4..94a2799 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -293,12 +293,9 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
>  target_ulong args,
>  uint32_t nret, target_ulong rets)
>  {
> -target_ulong ret = 0;
> +qemu_system_guest_panicked(NULL);
>  
> -qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, false, NULL,
> -   &error_abort);
> -
> -rtas_st(rets, 0, ret);
> +rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>  }
>  
>  static void rtas_set_power_level(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> 

If I get that qemu_system_guest_panicked() function right, it will stop
the VM, won't it? That contradicts the LoPAPR spec that says that the
RTAS call returns if the "ibm,extended-os-term" property is available in
the device tree. And we currently present this property in the device
tree. So either the guest should not be stopped here, or we've got to
remove the property from the device tree again.

 Thomas

[Qemu-devel] [[PATCH V7] 00/11] calculate blocktime for postcopy live migration

2017-06-07 Thread Alexey Perevalov

This is the 7th version.

(V6 -> V7)
1. The copied bitmap was placed into RAMBlock, like the other migration
related bitmaps.
2. The ordering of the mark_postcopy_blocktime_end call and the ordering
of the copied bitmap check were changed.
3. Fixed line-wrap style defects.
4. New patch "postcopy_place_page factoring out".
5. postcopy_ram_supported_by_host accepts
MigrationIncomingState in qmp_migrate_set_capabilities.
6. Minor documentation fixes; the huge description of
get_postcopy_total_blocktime was moved (David's comment).

This patch set is based on commit
a0d4aac7467dd02e5657b79e867f067330266a24
of git://git.qemu-project.org/qemu.git

Alexey Perevalov (11):
  userfault: add pid into uffd_msg & update UFFD_FEATURE_*
  migration: pass MigrationIncomingState* into migration check functions
  migration: fix hardcoded function name in error report
  migration: split ufd_version_check onto receive/request features part
  migration: introduce postcopy-blocktime capability
  migration: add postcopy blocktime ctx into MigrationIncomingState
  migration: add bitmap for copied page
  migration: postcopy_place_page factoring out
  migration: calculate vCPU blocktime on dst side
  migration: add postcopy total blocktime into query-migrate
  migration: postcopy_blocktime documentation

 docs/migration.txt|  10 +
 hmp.c |  15 ++
 include/exec/ram_addr.h   |   2 +
 include/migration/migration.h |  13 ++
 linux-headers/linux/userfaultfd.h |   4 +
 migration/migration.c |  52 +-
 migration/postcopy-ram.c  | 374 --
 migration/postcopy-ram.h  |   6 +-
 migration/ram.c   |  40 +++-
 migration/ram.h   |   4 +
 migration/savevm.c|   2 +-
 migration/trace-events|   6 +-
 qapi-schema.json  |  14 +-
 13 files changed, 514 insertions(+), 28 deletions(-)

-- 
1.9.1

[Qemu-devel] [[PATCH V7] 06/11] migration: add postcopy blocktime ctx into MigrationIncomingState

2017-06-07 Thread Alexey Perevalov

This patch adds a request to kernel space for UFFD_FEATURE_THREAD_ID,
in case this feature is provided by the kernel.

PostcopyBlocktimeContext is encapsulated inside postcopy-ram.c,
because it is a postcopy-only feature.
The patch also defines the lifetime of the PostcopyBlocktimeContext
instance. Information from the PostcopyBlocktimeContext instance will be
provided long after postcopy migration ends, so the instance will live
until QEMU exits, but the part of it used only during the calculation
(vcpu_addr, page_fault_vcpu_time) will be released when postcopy ends
or fails.

To enable postcopy blocktime calculation on the destination, the proper
capability needs to be requested (the documentation patch is at the tail
of the patch set).

As an example, the following command enables that capability, assuming
QEMU was started with the
-chardev socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock
option to control it:

[root@host]#printf "{\"execute\" : \"qmp_capabilities\"}\r\n \
{\"execute\": \"migrate-set-capabilities\" , \"arguments\":   {
\"capabilities\": [ { \"capability\": \"postcopy-blocktime\", \"state\":
true } ] } }" | nc -U /var/lib/migrate-vm-monitor.sock

Or just with HMP
(qemu) migrate_set_capability postcopy-blocktime on

Signed-off-by: Alexey Perevalov 
---
 include/migration/migration.h |  8 ++
 migration/postcopy-ram.c  | 65 +++
 2 files changed, 73 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2e61df5..766e802 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -49,6 +49,8 @@ enum mig_rp_message_type {
 MIG_RP_MSG_MAX
 };
 
+struct PostcopyBlocktimeContext;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
 QEMUFile *from_src_file;
@@ -86,6 +88,12 @@ struct MigrationIncomingState {
 /* The coroutine we should enter (back) after failover */
 Coroutine *migration_incoming_co;
 QemuSemaphore colo_incoming_sem;
+
+/*
+ * PostcopyBlocktimeContext to keep information for postcopy
+ * live migration, to calculate vCPU block time
+ * */
+struct PostcopyBlocktimeContext *blocktime_ctx;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index cbe8f9f..ade7f1c 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -63,6 +63,58 @@ struct PostcopyDiscardState {
 #include 
 #include 
 
+typedef struct PostcopyBlocktimeContext {
+/* time when page fault initiated per vCPU */
+int64_t *page_fault_vcpu_time;
+/* page address per vCPU */
+uint64_t *vcpu_addr;
+int64_t total_blocktime;
+/* blocktime per vCPU */
+int64_t *vcpu_blocktime;
+/* point in time when last page fault was initiated */
+int64_t last_begin;
+/* number of vCPU are suspended */
+int smp_cpus_down;
+
+/*
+ * Handler for exit event, necessary for
+ * releasing whole blocktime_ctx
+ */
+Notifier exit_notifier;
+/*
+ * Handler for postcopy event, necessary for
+ * releasing unnecessary part of blocktime_ctx
+ */
+Notifier postcopy_notifier;
+} PostcopyBlocktimeContext;
+
+static void destroy_blocktime_context(struct PostcopyBlocktimeContext *ctx)
+{
+g_free(ctx->page_fault_vcpu_time);
+g_free(ctx->vcpu_addr);
+g_free(ctx->vcpu_blocktime);
+g_free(ctx);
+}
+
+static void migration_exit_cb(Notifier *n, void *data)
+{
+PostcopyBlocktimeContext *ctx = container_of(n, PostcopyBlocktimeContext,
+ exit_notifier);
+destroy_blocktime_context(ctx);
+}
+
+static struct PostcopyBlocktimeContext *blocktime_context_new(void)
+{
+PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
+ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
+ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
+ctx->vcpu_blocktime = g_new0(int64_t, smp_cpus);
+
+ctx->exit_notifier.notify = migration_exit_cb;
+qemu_add_exit_notifier(&ctx->exit_notifier);
+add_migration_state_change_notifier(&ctx->postcopy_notifier);
+return ctx;
+}
 
 /**
  * receive_ufd_features: check userfault fd features, to request only supported
@@ -155,6 +207,19 @@ static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
 }
 }
 
+#ifdef UFFD_FEATURE_THREAD_ID
+if (migrate_postcopy_blocktime() && mis &&
+UFFD_FEATURE_THREAD_ID & supported_features) {
+/* kernel supports that feature */
+/* don't create blocktime_context if it exists */
+if (!mis->blocktime_ctx) {
+mis->blocktime_ctx = blocktime_context_new();
+}
+
+asked_features |= UFFD_FEATURE_THREAD_ID;
+}
+#endif
+
 /*
  * request features, even if asked_features is 0, due to
  * kernel expects UFFD_API before UFFDIO_REGISTER, per
-- 
1.9.1
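
The two-stage lifetime described in the commit message of patch 06 above
(scratch arrays released when postcopy ends, the whole context released at
exit) can be sketched in isolation as follows.  This is a hedged
illustration, not the patch's code: Notifier and container_of are minimal
stand-ins for QEMU's definitions, BlocktimeCtx and all function names are
hypothetical, and how the postcopy notifier is actually wired up is not
shown in the patch, so the callback below is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal stand-ins for QEMU's Notifier and container_of. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *n, void *data);
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct BlocktimeCtx {
    int64_t *page_fault_vcpu_time;  /* needed only while postcopy runs */
    uint64_t *vcpu_addr;            /* needed only while postcopy runs */
    int64_t *vcpu_blocktime;        /* result, kept until exit */
    Notifier exit_notifier;
    Notifier postcopy_notifier;
} BlocktimeCtx;

/* Postcopy ended or failed: drop the scratch arrays, keep the results. */
void blocktime_postcopy_done_cb(Notifier *n, void *data)
{
    BlocktimeCtx *ctx = container_of(n, BlocktimeCtx, postcopy_notifier);

    (void)data;
    free(ctx->page_fault_vcpu_time);
    ctx->page_fault_vcpu_time = NULL;
    free(ctx->vcpu_addr);
    ctx->vcpu_addr = NULL;
}

/* Process exit: release everything that is still allocated. */
void blocktime_exit_cb(Notifier *n, void *data)
{
    BlocktimeCtx *ctx = container_of(n, BlocktimeCtx, exit_notifier);

    (void)data;
    free(ctx->page_fault_vcpu_time);    /* free(NULL) is a no-op */
    free(ctx->vcpu_addr);
    free(ctx->vcpu_blocktime);
    free(ctx);
}

BlocktimeCtx *blocktime_ctx_new(size_t ncpus)
{
    BlocktimeCtx *ctx = calloc(1, sizeof(*ctx));

    if (!ctx) {
        return NULL;
    }
    ctx->page_fault_vcpu_time = calloc(ncpus, sizeof(int64_t));
    ctx->vcpu_addr = calloc(ncpus, sizeof(uint64_t));
    ctx->vcpu_blocktime = calloc(ncpus, sizeof(int64_t));
    ctx->exit_notifier.notify = blocktime_exit_cb;
    ctx->postcopy_notifier.notify = blocktime_postcopy_done_cb;
    if (!ctx->page_fault_vcpu_time || !ctx->vcpu_addr ||
        !ctx->vcpu_blocktime) {
        blocktime_exit_cb(&ctx->exit_notifier, NULL);
        return NULL;
    }
    return ctx;
}
```

Setting the freed pointers to NULL in the postcopy callback is what makes
the later exit callback safe to run unconditionally.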

[Qemu-devel] [[PATCH V7] 01/11] userfault: add pid into uffd_msg & update UFFD_FEATURE_*

2017-06-07 Thread Alexey Perevalov

This commit duplicates the header changes of the Linux kernel patch
"userfaultfd: provide pid in userfault msg".

Signed-off-by: Alexey Perevalov 
---
 linux-headers/linux/userfaultfd.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
index 9701772..eda028c 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -78,6 +78,9 @@ struct uffd_msg {
struct {
__u64   flags;
__u64   address;
+   union {
+   __u32   ptid;
+   } feat;
} pagefault;
 
struct {
@@ -161,6 +164,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MISSING_HUGETLBFS (1<<4)
 #define UFFD_FEATURE_MISSING_SHMEM (1<<5)
 #define UFFD_FEATURE_EVENT_UNMAP   (1<<6)
+#define UFFD_FEATURE_THREAD_ID (1<<7)
__u64 features;
 
__u64 ioctls;
-- 
1.9.1

[Qemu-devel] [[PATCH V7] 02/11] migration: pass MigrationIncomingState* into migration check functions

2017-06-07 Thread Alexey Perevalov

This tiny refactoring is necessary to be able to set
UFFD_FEATURE_THREAD_ID while requesting features, and then
to create the downtime context if the kernel supports it.

Signed-off-by: Alexey Perevalov 
---
 migration/migration.c|  3 ++-
 migration/postcopy-ram.c | 10 +-
 migration/postcopy-ram.h |  2 +-
 migration/savevm.c   |  2 +-
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 48c94c9..2a77636 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -726,6 +726,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
   Error **errp)
 {
 MigrationState *s = migrate_get_current();
+MigrationIncomingState *mis = migration_incoming_get_current();
 MigrationCapabilityStatusList *cap;
 bool old_postcopy_cap = migrate_postcopy_ram();
 
@@ -772,7 +773,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
  * special support.
  */
 if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
-!postcopy_ram_supported_by_host()) {
+!postcopy_ram_supported_by_host(mis)) {
 /* postcopy_ram_supported_by_host will have emitted a more
  * detailed message
  */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9c41887..10d39a0 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -63,7 +63,7 @@ struct PostcopyDiscardState {
 #include 
 #include 
 
-static bool ufd_version_check(int ufd)
+static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
 {
 struct uffdio_api api_struct;
 uint64_t ioctl_mask;
@@ -126,7 +126,7 @@ static int test_ramblock_postcopiable(const char *block_name, void *host_addr,
  * normally fine since if the postcopy succeeds it gets turned back on at the
  * end.
  */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
 long pagesize = getpagesize();
 int ufd = -1;
@@ -149,7 +149,7 @@ bool postcopy_ram_supported_by_host(void)
 }
 
 /* Version and features check */
-if (!ufd_version_check(ufd)) {
+if (!ufd_version_check(ufd, mis)) {
 goto out;
 }
 
@@ -525,7 +525,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
  * Although the host check already tested the API, we need to
  * do the check again as an ABI handshake on the new fd.
  */
-if (!ufd_version_check(mis->userfault_fd)) {
+if (!ufd_version_check(mis->userfault_fd, mis)) {
 return -1;
 }
 
@@ -663,7 +663,7 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
 
 #else
 /* No target OS support, stubs just fail */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
 error_report("%s: No OS support", __func__);
 return false;
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 52d51e8..587a8b8 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -14,7 +14,7 @@
 #define QEMU_POSTCOPY_RAM_H
 
 /* Return true if the host supports everything we need to do postcopy-ram */
-bool postcopy_ram_supported_by_host(void);
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
 
 /*
  * Make all of RAM sensitive to accesses to areas that haven't yet been written
diff --git a/migration/savevm.c b/migration/savevm.c
index 9c320f5..8b7bab8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1380,7 +1380,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
 return -1;
 }
 
-if (!postcopy_ram_supported_by_host()) {
+if (!postcopy_ram_supported_by_host(mis)) {
 postcopy_state_set(POSTCOPY_INCOMING_NONE);
 return -1;
 }
-- 
1.9.1

[Qemu-devel] [[PATCH V7] 03/11] migration: fix hardcoded function name in error report

2017-06-07 Thread Alexey Perevalov

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Alexey Perevalov 
---
 migration/postcopy-ram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 10d39a0..9963ce4 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -71,7 +71,7 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
 api_struct.api = UFFD_API;
 api_struct.features = 0;
 if (ioctl(ufd, UFFDIO_API, &api_struct)) {
-error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
+error_report("%s: UFFDIO_API failed: %s", __func__
  strerror(errno));
 return false;
 }
-- 
1.9.1

[Qemu-devel] [[PATCH V7] 07/11] migration: add bitmap for copied page

2017-06-07 Thread Alexey Perevalov

This patch adds the ability to track already copied
pages. It is necessary for calculating vCPU block time in
the postcopy migration feature, and possibly for recovery after
a postcopy migration failure.
It is also necessary to solve the shared memory issue in
postcopy live migration. Information about copied pages
will be transferred to the software virtual bridge
(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
already copied pages. The fallocate syscall is required for
remapped shared memory, because the remapping itself blocks
ioctl(UFFDIO_COPY): the ioctl in this case will fail with an EEXIST
error (the struct page exists after the remap).

The bitmap is placed into RAMBlock like the other postcopy/precopy
related bitmaps. The helpers are in migration/ram.c, because
that file is allowed to work with RAMBlock.

Signed-off-by: Alexey Perevalov 
---
 include/exec/ram_addr.h |  2 ++
 migration/ram.c | 36 
 migration/ram.h |  4 
 3 files changed, 42 insertions(+)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 140efa8..6a3780b 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -47,6 +47,8 @@ struct RAMBlock {
  * of the postcopy phase
  */
 unsigned long *unsentmap;
+/* bitmap of already copied pages in postcopy */
+unsigned long *copiedmap;
 };
 
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
diff --git a/migration/ram.c b/migration/ram.c
index f387e9c..a7c0db4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -149,6 +149,25 @@ out:
 return ret;
 }
 
+static unsigned long int get_copied_bit_offset(uint64_t addr, RAMBlock *rb)
+{
+uint64_t addr_offset = addr - (uint64_t)(uintptr_t)rb->host;
+int page_shift = find_first_bit((unsigned long *)&rb->page_size,
+sizeof(rb->page_size));
+
+return addr_offset >> page_shift;
+}
+
+int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
+{
+return test_bit(get_copied_bit_offset(addr, rb), rb->copiedmap);
+}
+
+void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
+{
+set_bit_atomic(get_copied_bit_offset(addr, rb), rb->copiedmap);
+}
+
 /*
  * An outstanding page request, on the source, having been received
  * and queued
@@ -1449,6 +1468,8 @@ static void ram_migration_cleanup(void *opaque)
 block->bmap = NULL;
 g_free(block->unsentmap);
 block->unsentmap = NULL;
+g_free(block->copiedmap);
+block->copiedmap = NULL;
 }
 
 XBZRLE_cache_lock();
@@ -2517,6 +2538,14 @@ static int ram_load_postcopy(QEMUFile *f)
 return ret;
 }
 
+static unsigned long get_copiedmap_size(RAMBlock *rb)
+{
+unsigned long pages;
+pages = rb->max_length >> find_first_bit((unsigned long *)&rb->page_size,
+ sizeof(rb->page_size));
+return pages;
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
 int flags = 0, ret = 0;
@@ -2544,6 +2573,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 rcu_read_lock();
 
 if (postcopy_running) {
+RAMBlock *rb;
+RAMBLOCK_FOREACH(rb) {
+/* need for destination, bitmap_new calls
+ * g_try_malloc0 and this function
+ * Attempts to allocate @n_bytes, initialized to 0'sh */
+rb->copiedmap = bitmap_new(get_copiedmap_size(rb));
+}
 ret = ram_load_postcopy(f);
 }
 
diff --git a/migration/ram.h b/migration/ram.h
index c9563d1..1f32824 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -67,4 +67,8 @@ int ram_discard_range(const char *block_name, uint64_t start, size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+
+int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
+void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
+
 #endif
-- 
1.9.1
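
The address-to-bit mapping used by the copied-bitmap helpers in patch 07
above can be sketched on its own as follows.  This is a simplified
illustration under stated assumptions, not QEMU code: Block is a
hypothetical stand-in for RAMBlock, the page size is assumed to be a power
of two, the shift is computed with a plain loop, and set_copied() is not
atomic (unlike the set_bit_atomic() call in the patch).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, simplified block descriptor (not QEMU's RAMBlock). */
typedef struct Block {
    uint64_t host;              /* start address of the block */
    uint64_t length;            /* length in bytes */
    uint64_t page_size;         /* power of two */
    unsigned long *copiedmap;   /* one bit per page */
} Block;

/* log2 of a power-of-two page size. */
unsigned page_shift(uint64_t page_size)
{
    unsigned shift = 0;

    while (page_size > 1) {
        page_size >>= 1;
        shift++;
    }
    return shift;
}

/* Index of the page containing addr, relative to the block start. */
uint64_t copied_bit_offset(const Block *b, uint64_t addr)
{
    return (addr - b->host) >> page_shift(b->page_size);
}

/* Allocate a zeroed bitmap with one bit per page; 0 on success. */
int block_init(Block *b, uint64_t host, uint64_t length, uint64_t page_size)
{
    uint64_t pages = length >> page_shift(page_size);
    size_t longs = (size_t)((pages + BITS_PER_LONG - 1) / BITS_PER_LONG);

    b->host = host;
    b->length = length;
    b->page_size = page_size;
    b->copiedmap = calloc(longs ? longs : 1, sizeof(unsigned long));
    return b->copiedmap ? 0 : -1;
}

/* Mark the page containing addr as copied (non-atomic in this sketch). */
void set_copied(Block *b, uint64_t addr)
{
    uint64_t bit = copied_bit_offset(b, addr);

    b->copiedmap[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

/* Return 1 if the page containing addr was marked as copied. */
int test_copied(const Block *b, uint64_t addr)
{
    uint64_t bit = copied_bit_offset(b, addr);

    return (b->copiedmap[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1;
}
```

Any address inside a page maps to the same bit, so a fault address and the
page-aligned address used when placing the page refer to the same entry.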

[Qemu-devel] [[PATCH V7] 11/11] migration: postcopy_blocktime documentation

2017-06-07 Thread Alexey Perevalov

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Alexey Perevalov 
---
 docs/migration.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 1b940a8..4b625ca 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -402,6 +402,16 @@ will now cause the transition from precopy to postcopy.
 It can be issued immediately after migration is started or any
 time later on.  Issuing it after the end of a migration is harmless.
 
+Blocktime is a postcopy live migration metric, intended to show
+how long the vCPU was in a state of interruptible sleep due to a page fault.
+This value is calculated on the destination side.
+To enable postcopy blocktime calculation, enter the following command on the destination
+monitor:
+
+migrate_set_capability postcopy-blocktime on
+
+Postcopy blocktime can be retrieved by the query-migrate QMP command.
+
 Note: During the postcopy phase, the bandwidth limits set using
 migrate_set_speed is ignored (to avoid delaying requested pages that
 the destination is waiting for).
-- 
1.9.1

[Qemu-devel] [[PATCH V7] 04/11] migration: split ufd_version_check onto receive/request features part

2017-06-07 Thread Alexey Perevalov

This modification is necessary for userfault fd features which are
required to be requested from userspace.
UFFD_FEATURE_THREAD_ID is one such "on demand" feature, which will
be introduced in the next patch.

QEMU has to use a separate userfault file descriptor, because the
userfault context has internal state: after the first UFFDIO_API
ioctl call it changes its state to UFFD_STATE_RUNNING (on
success), but the kernel expects UFFD_STATE_WAIT_API while handling the
UFFDIO_API ioctl.
So only one UFFDIO_API ioctl is possible per ufd.

Signed-off-by: Alexey Perevalov 
---
 migration/postcopy-ram.c | 96 
 1 file changed, 89 insertions(+), 7 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9963ce4..cbe8f9f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -63,15 +63,66 @@ struct PostcopyDiscardState {
 #include 
 #include 
 
-static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
+
+/**
+ * receive_ufd_features: check userfault fd features, to request only supported
+ * features in the future.
+ *
+ * Returns: true on success
+ *
+ * __NR_userfaultfd - should be checked before
+ *  @features: out parameter will contain uffdio_api.features provided by kernel
+ *  in case of success
+ */
+static bool receive_ufd_features(uint64_t *features)
 {
-struct uffdio_api api_struct;
-uint64_t ioctl_mask;
+struct uffdio_api api_struct = {0};
+int ufd;
+bool ret = true;
+
+/* if we are here __NR_userfaultfd should exists */
+ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+if (ufd == -1) {
+error_report("%s: syscall __NR_userfaultfd failed: %s", __func__,
+ strerror(errno));
+return false;
+}
 
+/* ask features */
 api_struct.api = UFFD_API;
 api_struct.features = 0;
 if (ioctl(ufd, UFFDIO_API, &api_struct)) {
-error_report("%s: UFFDIO_API failed: %s", __func__
+error_report("%s: UFFDIO_API failed: %s", __func__,
+ strerror(errno));
+ret = false;
+goto release_ufd;
+}
+
+*features = api_struct.features;
+
+release_ufd:
+close(ufd);
+return ret;
+}
+
+/**
+ * request_ufd_features: this function should be called only once on a newly
+ * opened ufd, subsequent calls will lead to error.
+ *
+ * Returns: true on succes
+ *
+ * @ufd: fd obtained from userfaultfd syscall
+ * @features: bit mask see UFFD_API_FEATURES
+ */
+static bool request_ufd_features(int ufd, uint64_t features)
+{
+struct uffdio_api api_struct = {0};
+uint64_t ioctl_mask;
+
+api_struct.api = UFFD_API;
+api_struct.features = features;
+if (ioctl(ufd, UFFDIO_API, &api_struct)) {
+error_report("%s failed: UFFDIO_API failed: %s", __func__,
  strerror(errno));
 return false;
 }
@@ -84,11 +135,42 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
 return false;
 }
 
+return true;
+}
+
+static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
+{
+uint64_t asked_features = 0;
+static uint64_t supported_features;
+
+/*
+ * it's not possible to
+ * request UFFD_API twice per one fd
+ * userfault fd features is persistent
+ */
+if (!supported_features) {
+if (!receive_ufd_features(&supported_features)) {
+error_report("%s failed", __func__);
+return false;
+}
+}
+
+/*
+ * request features, even if asked_features is 0, due to
+ * kernel expects UFFD_API before UFFDIO_REGISTER, per
+ * userfault file descriptor
+ */
+if (!request_ufd_features(ufd, asked_features)) {
+error_report("%s failed: features %" PRIu64, __func__,
+ asked_features);
+return false;
+}
+
 if (getpagesize() != ram_pagesize_summary()) {
 bool have_hp = false;
 /* We've got a huge page */
 #ifdef UFFD_FEATURE_MISSING_HUGETLBFS
-have_hp = api_struct.features & UFFD_FEATURE_MISSING_HUGETLBFS;
+have_hp = supported_features & UFFD_FEATURE_MISSING_HUGETLBFS;
 #endif
 if (!have_hp) {
 error_report("Userfault on this host does not support huge pages");
@@ -149,7 +231,7 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 }
 
 /* Version and features check */
-if (!ufd_version_check(ufd, mis)) {
+if (!ufd_check_and_apply(ufd, mis)) {
 goto out;
 }
 
@@ -525,7 +607,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
  * Although the host check already tested the API, we need to
  * do the check again as an ABI handshake on the new fd.
  */
-if (!ufd_version_check(mis->userfault_fd, mis)) {
+if (!ufd_check_and_apply(mis->userfault_fd, mis)) {
 return -1;
 }
 
-- 
1.9.1
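
The probe-then-request flow from patch 04 above, and the reason a
throwaway descriptor is needed for the probe, can be modelled without
touching the kernel.  The following is a toy simulation only: FakeUfd,
the FAKE_* names and all functions are invented for this sketch, the
feature bit values are arbitrary, and the state machine merely mimics the
"one API handshake per descriptor" rule described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the per-descriptor state (not kernel code). */
enum FakeUfdState { FAKE_WAIT_API, FAKE_RUNNING };

typedef struct FakeUfd {
    enum FakeUfdState state;
    uint64_t enabled;
} FakeUfd;

/* Features this pretend kernel offers; the bit values are arbitrary. */
#define FAKE_FEATURE_HUGETLBFS (1u << 4)
#define FAKE_FEATURE_THREAD_ID (1u << 7)
#define FAKE_SUPPORTED (FAKE_FEATURE_HUGETLBFS | FAKE_FEATURE_THREAD_ID)

/*
 * Model of the API handshake: allowed once per descriptor.  On success
 * the supported feature mask is reported back through *features.
 */
bool fake_api_handshake(FakeUfd *ufd, uint64_t *features)
{
    if (ufd->state != FAKE_WAIT_API) {
        return false;               /* second handshake on the same fd */
    }
    if (*features & ~(uint64_t)FAKE_SUPPORTED) {
        return false;               /* asked for an unsupported feature */
    }
    ufd->enabled = *features;
    ufd->state = FAKE_RUNNING;
    *features = FAKE_SUPPORTED;
    return true;
}

/* Step 1: probe with a throwaway descriptor, asking for nothing. */
bool probe_features(uint64_t *supported)
{
    FakeUfd tmp = { FAKE_WAIT_API, 0 };

    *supported = 0;
    return fake_api_handshake(&tmp, supported);
}

/* Step 2: request only what the probe reported, on the real descriptor. */
bool request_features(FakeUfd *ufd, uint64_t wanted, uint64_t supported)
{
    uint64_t asked = wanted & supported;

    return fake_api_handshake(ufd, &asked);
}
```

Because the probe consumes the single handshake of its own descriptor, the
descriptor that will later be registered still has its handshake available
for the real feature request.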

[Qemu-devel] [[PATCH V7] 08/11] migration: postcopy_place_page factoring out

2017-06-07 Thread Alexey Perevalov

The copied bitmap needs to be set as close as possible to
mark_postcopy_blocktime_end, so postcopy_place_page is the proper place
for it. Passing the RAMBlock as an argument avoids an additional RAMBlock
lookup and also reduces the number of arguments (there is no need to pass
a pointer to the copied bitmap).
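
As a rough illustration of the signature change (not the QEMU code itself),
the sketch below shows a callee deriving the page size from the block it is
given instead of receiving it as a separate argument. DemoRAMBlock,
demo_ram_pagesize() and the demo_place_page_*() helpers are hypothetical
stand-ins for RAMBlock, qemu_ram_pagesize() and postcopy_place_page().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's RAMBlock; only the field used here. */
typedef struct DemoRAMBlock {
    size_t page_size;
} DemoRAMBlock;

/* Stand-in for qemu_ram_pagesize(): the block knows its own page size. */
static size_t demo_ram_pagesize(const DemoRAMBlock *rb)
{
    assert(rb != NULL);
    return rb->page_size;
}

/* Before: the caller looks up the page size and passes it explicitly. */
static size_t demo_place_page_old(void *host, size_t pagesize)
{
    (void)host;
    return pagesize; /* number of bytes that would be placed */
}

/* After: the caller passes the block and the callee derives the size. */
static size_t demo_place_page_new(void *host, const DemoRAMBlock *rb)
{
    (void)host;
    return demo_ram_pagesize(rb);
}
```

Both variants place the same number of bytes; the second one also leaves
room for the callee to reach other per-block state later.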

Signed-off-by: Alexey Perevalov 
---
 migration/postcopy-ram.c | 13 -
 migration/postcopy-ram.h |  4 ++--
 migration/ram.c  |  4 ++--
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index ade7f1c..62a272a 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -713,9 +713,10 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
  * returns 0 on success
  */
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-size_t pagesize)
+RAMBlock *rb)
 {
 struct uffdio_copy copy_struct;
+size_t pagesize = qemu_ram_pagesize(rb);
 
 copy_struct.dst = (uint64_t)(uintptr_t)host;
 copy_struct.src = (uint64_t)(uintptr_t)from;
@@ -744,10 +745,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
  * returns 0 on success
  */
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
- size_t pagesize)
+ RAMBlock *rb)
 {
+size_t pagesize;
 trace_postcopy_place_page_zero(host);
 
+pagesize = qemu_ram_pagesize(rb);
 if (pagesize == getpagesize()) {
 struct uffdio_zeropage zero_struct;
 zero_struct.range.start = (uint64_t)(uintptr_t)host;
@@ -778,7 +781,7 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
 memset(mis->postcopy_tmp_zero_page, '\0', mis->largest_page_size);
 }
 return postcopy_place_page(mis, host, mis->postcopy_tmp_zero_page,
-   pagesize);
+   rb);
 }
 
 return 0;
@@ -841,14 +844,14 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 }
 
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-size_t pagesize)
+RAMBlock *rb)
 {
 assert(0);
 return -1;
 }
 
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
-size_t pagesize)
+RAMBlock *rb)
 {
 assert(0);
 return -1;
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 587a8b8..77ea0fd 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -72,14 +72,14 @@ void postcopy_discard_send_finish(MigrationState *ms,
  * returns 0 on success
  */
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-size_t pagesize);
+RAMBlock *rb);
 
 /*
  * Place a zero page at (host) atomically
  * returns 0 on success
  */
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
- size_t pagesize);
+ RAMBlock *rb);
 
 /* The current postcopy state is read/set by postcopy_state_get/set
  * which update it atomically.
diff --git a/migration/ram.c b/migration/ram.c
index a7c0db4..a791d40 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2524,10 +2524,10 @@ static int ram_load_postcopy(QEMUFile *f)
 
 if (all_zero) {
 ret = postcopy_place_page_zero(mis, place_dest,
-   block->page_size);
+   block);
 } else {
 ret = postcopy_place_page(mis, place_dest,
-  place_source, block->page_size);
+  place_source, block);
 }
 }
 if (!ret) {
-- 
1.9.1

[Qemu-devel] [[PATCH V7] 10/11] migration: add postcopy total blocktime into query-migrate

2017-06-07 Thread Alexey Perevalov

Postcopy total blocktime is available on the destination side only, but
query-migrate was possible only on the source. This patch adds the
ability to call query-migrate on the destination. To see the postcopy
blocktime, the postcopy-blocktime capability must be requested.

The query-migrate command will show the following sample result:
{"return": {
"postcopy-vcpu-blocktime": [115, 100],
"status": "completed",
"postcopy-blocktime": 100
}}

postcopy-vcpu-blocktime contains a list, where the first item corresponds
to the first vCPU in QEMU.

This patch has a drawback: it combines the states of incoming and
outgoing migration. The outgoing migration state will overwrite the
incoming state. It looks better to separate query-migrate for incoming and
outgoing migration, or to add a parameter to indicate the type of migration.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Alexey Perevalov 
---
 hmp.c | 15 
 include/migration/migration.h |  4 +++
 migration/migration.c | 40 +++---
 migration/postcopy-ram.c  | 57 +++
 migration/trace-events|  1 +
 qapi-schema.json  |  9 ++-
 6 files changed, 122 insertions(+), 4 deletions(-)

diff --git a/hmp.c b/hmp.c
index 8c72c58..e0c4fdf 100644
--- a/hmp.c
+++ b/hmp.c
@@ -262,6 +262,21 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
info->cpu_throttle_percentage);
 }
 
+if (info->has_postcopy_blocktime) {
+monitor_printf(mon, "postcopy blocktime: %" PRId64 "\n",
+   info->postcopy_blocktime);
+}
+
+if (info->has_postcopy_vcpu_blocktime) {
+Visitor *v;
+char *str;
+v = string_output_visitor_new(false, &str);
+visit_type_int64List(v, NULL, &info->postcopy_vcpu_blocktime, NULL);
+visit_complete(v, &str);
+monitor_printf(mon, "postcopy vcpu blocktime: %s\n", str);
+g_free(str);
+visit_free(v);
+}
 qapi_free_MigrationInfo(info);
 qapi_free_MigrationCapabilityStatusList(caps);
 }
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 766e802..7d20470 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -98,6 +98,10 @@ struct MigrationIncomingState {
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+/*
+ * Functions to work with blocktime context
+ */
+void fill_destination_postcopy_migration_info(MigrationInfo *info);
 
 struct MigrationState
 {
diff --git a/migration/migration.c b/migration/migration.c
index d1cc34f..b80d5b5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -625,14 +625,15 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 }
 }
 
-MigrationInfo *qmp_query_migrate(Error **errp)
+static void fill_source_migration_info(MigrationInfo *info)
 {
-MigrationInfo *info = g_malloc0(sizeof(*info));
 MigrationState *s = migrate_get_current();
 
 switch (s->state) {
 case MIGRATION_STATUS_NONE:
 /* no migration has happened ever */
+/* do not overwrite destination migration status */
+return;
 break;
 case MIGRATION_STATUS_SETUP:
 info->has_status = true;
@@ -718,10 +719,43 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 break;
 }
 info->status = s->state;
+}
 
-return info;
+static void fill_destination_migration_info(MigrationInfo *info)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+
+switch (mis->state) {
+case MIGRATION_STATUS_NONE:
+return;
+break;
+case MIGRATION_STATUS_SETUP:
+case MIGRATION_STATUS_CANCELLING:
+case MIGRATION_STATUS_CANCELLED:
+case MIGRATION_STATUS_ACTIVE:
+case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+case MIGRATION_STATUS_FAILED:
+case MIGRATION_STATUS_COLO:
+info->has_status = true;
+break;
+case MIGRATION_STATUS_COMPLETED:
+info->has_status = true;
+fill_destination_postcopy_migration_info(info);
+break;
+}
+info->status = mis->state;
 }
 
+MigrationInfo *qmp_query_migrate(Error **errp)
+{
+MigrationInfo *info = g_malloc0(sizeof(*info));
+
+fill_destination_migration_info(info);
+fill_source_migration_info(info);
+
+return info;
+ }
+
 void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
   Error **errp)
 {
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 0ad9f9f..7f5b402 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -117,6 +117,55 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
 return ctx;
 }
 
+static int64List *get_vcpu_blocktime_list(PostcopyBlocktimeContext *ctx)
+{
+int64List *list = NULL, *entry = NULL;
+int i;
+
+for (i = smp_cpus - 1; i >= 0; i--) {
+entry = g_new0(in

[Qemu-devel] [[PATCH V7] 09/11] migration: calculate vCPU blocktime on dst side

2017-06-07 Thread Alexey Perevalov

This patch provides blocktime calculation per vCPU, both as a summary
and as an overlapped value for all vCPUs.

This approach was suggested by Peter Xu as an improvement on the
previous approach, where QEMU kept a tree with the faulted page address and
a bitmask of CPUs in it. Now QEMU keeps an array with the faulted page
address as the value and the vCPU as the index. This helps to find the
proper vCPU at UFFD_COPY time. It also keeps a list of blocktime per vCPU
(which could be traced with page_fault_addr).

Blocktime will not be calculated if the postcopy_blocktime field of
MigrationIncomingState wasn't initialized.
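
The bookkeeping described above can be sketched roughly as follows. This
is a simplified, single-threaded illustration without atomics and with a
fixed number of vCPUs; DemoBlocktime, demo_begin() and demo_end() are
hypothetical names, and the clock value is passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS 2

typedef struct DemoBlocktime {
    uint64_t vcpu_addr[DEMO_NR_CPUS];     /* faulted address, 0 if running */
    int64_t fault_time[DEMO_NR_CPUS];     /* when the current fault began */
    int64_t vcpu_blocktime[DEMO_NR_CPUS]; /* accumulated per-vCPU blocktime */
    int64_t last_begin;                   /* time of the most recent fault */
    int64_t total_blocktime;              /* time with all vCPUs blocked */
    int cpus_down;                        /* number of blocked vCPUs */
} DemoBlocktime;

/* A vCPU faulted on addr at now_ms. */
static void demo_begin(DemoBlocktime *dc, int cpu, uint64_t addr,
                       int64_t now_ms)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS && addr != 0);
    if (dc->vcpu_addr[cpu] == 0) {
        dc->cpus_down++;
    }
    dc->vcpu_addr[cpu] = addr;
    dc->last_begin = now_ms;
    dc->fault_time[cpu] = now_ms;
}

/* The page at addr was copied at now_ms; unblock every vCPU waiting on it. */
static void demo_end(DemoBlocktime *dc, uint64_t addr, int64_t now_ms)
{
    int i, affected = 0;
    int all_blocked = 0;

    for (i = 0; i < DEMO_NR_CPUS; i++) {
        if (dc->vcpu_addr[i] != addr) {
            continue;
        }
        dc->vcpu_addr[i] = 0;
        dc->vcpu_blocktime[i] += now_ms - dc->fault_time[i];
        affected++;
        if (dc->cpus_down == DEMO_NR_CPUS) {
            all_blocked = 1; /* every vCPU was blocked until now */
        }
    }
    dc->cpus_down -= affected;
    if (all_blocked) {
        dc->total_blocktime += now_ms - dc->last_begin;
    }
}
```

Only the interval during which every vCPU was blocked is added to the
total; per-vCPU intervals are accumulated independently of that.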

Signed-off-by: Alexey Perevalov 
---
 migration/postcopy-ram.c | 139 ++-
 migration/trace-events   |   5 +-
 2 files changed, 142 insertions(+), 2 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 62a272a..0ad9f9f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -27,6 +27,7 @@
 #include "ram.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/balloon.h"
+#include 
 #include "qemu/error-report.h"
 #include "trace.h"
 
@@ -561,6 +562,133 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
 return 0;
 }
 
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+CPUState *cpu_iter;
+
+CPU_FOREACH(cpu_iter) {
+if (cpu_iter->thread_id == pid) {
+return cpu_iter->cpu_index;
+}
+}
+trace_get_mem_fault_cpu_index(pid);
+return -1;
+}
+
+/*
+ * This function is being called when pagefault occurs. It
+ * tracks down vCPU blocking time.
+ *
+ * @addr: faulted host virtual address
+ * @ptid: faulted process thread id
+ * @rb: ramblock appropriate to addr
+ */
+static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
+  RAMBlock *rb)
+{
+int cpu;
+MigrationIncomingState *mis = migration_incoming_get_current();
+PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+int64_t now_ms;
+
+if (!dc || ptid == 0) {
+return;
+}
+cpu = get_mem_fault_cpu_index(ptid);
+if (cpu < 0) {
+return;
+}
+
+now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+if (dc->vcpu_addr[cpu] == 0) {
+atomic_inc(&dc->smp_cpus_down);
+}
+
+atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
+atomic_xchg__nocheck(&dc->last_begin, now_ms);
+atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
+
+if (test_copiedmap_by_addr(addr, rb)) {
+atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
+atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
+atomic_sub(&dc->smp_cpus_down, 1);
+}
+trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
+cpu);
+}
+
+/*
+ *  This function just provide calculated blocktime per cpu and trace it.
+ *  Total blocktime is calculated in mark_postcopy_blocktime_end.
+ *
+ *
+ * Assume we have 3 CPU
+ *
+ *  S1E1   S1   E1
+ * -***xxx***> CPU1
+ *
+ * S2E2
+ * xxx---> CPU2
+ *
+ * S3E3
+ * xxx---> CPU3
+ *
+ * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
+ * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
+ * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
+ *it's a part of total blocktime.
+ * S1 - here is last_begin
+ * Legend of the picture is following:
+ *  * - means blocktime per vCPU
+ *  x - means overlapped blocktime (total blocktime)
+ *
+ * @addr: host virtual address
+ */
+static void mark_postcopy_blocktime_end(uint64_t addr)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+int i, affected_cpu = 0;
+int64_t now_ms;
+bool vcpu_total_blocktime = false;
+
+if (!dc) {
+return;
+}
+
+now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+/* lookup cpu, to clear it,
+ * that algorithm looks straighforward, but it's not
+ * optimal, more optimal algorithm is keeping tree or hash
+ * where key is address value is a list of  */
+for (i = 0; i < smp_cpus; i++) {
+uint64_t vcpu_blocktime = 0;
+if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr) {
+continue;
+}
+atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
+vcpu_blocktime = now_ms -
+atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
+affected_cpu += 1;
+/* we need to know is that mark_postcopy_end was due to
+ * faulted page, another possible case it's prefetched
+ * page and in that case we shouldn't be here */
+if (!vcp

Re: [Qemu-devel] [PATCH V6 08/10] migration: calculate vCPU blocktime on dst side

2017-06-07 Thread Alexey Perevalov


On 06/01/2017 01:57 PM, Dr. David Alan Gilbert wrote:

* Alexey Perevalov (a.pereva...@samsung.com) wrote:

This patch provides blocktime calculation per vCPU, both as a summary
and as an overlapped value for all vCPUs.

This approach was suggested by Peter Xu as an improvement on the
previous approach, where QEMU kept a tree with the faulted page address and
a bitmask of CPUs in it. Now QEMU keeps an array with the faulted page
address as the value and the vCPU as the index. This helps to find the
proper vCPU at UFFD_COPY time. It also keeps a list of blocktime per vCPU
(which could be traced with page_fault_addr).

Blocktime will not be calculated if the postcopy_blocktime field of
MigrationIncomingState wasn't initialized.

Signed-off-by: Alexey Perevalov 




+if (dc->vcpu_addr[cpu] == 0) {
+atomic_inc(&dc->smp_cpus_down);
+}
+
+atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);

I was wondering if this could be done with atomic_cmpxchg with old=0,
but the behaviour would be different in the case where vcpu_addr[cpu]
wasn't zero  or the 'addr'; so I think allowing it to cope with that
case seems better.


atomic_xchg__nocheck isn't atomic_cmpxchg; it is based on __atomic_exchange_n
(from the reference: "It writes val into *ptr, and returns the previous
contents of *ptr"), so I'll leave it as is.



Dave


+atomic_xchg__nocheck(&dc->last_begin, now_ms);
+atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
+
+trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
+cpu);
+}
+
+static void mark_postcopy_blocktime_end(uint64_t addr)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+int i, affected_cpu = 0;
+int64_t now_ms;
+bool vcpu_total_blocktime = false;
+unsigned long int nr_bit;
+
+if (!dc) {
+return;
+}
+/* mark that page as copied */
+nr_bit = get_copied_bit_offset(addr);
+set_bit_atomic(nr_bit, mis->copied_pages);
+
+now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+/* lookup cpu, to clear it,
+ * that algorithm looks straighforward, but it's not
+ * optimal, more optimal algorithm is keeping tree or hash
+ * where key is address value is a list of  */
+for (i = 0; i < smp_cpus; i++) {
+uint64_t vcpu_blocktime = 0;
+if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr) {
+continue;
+}
+atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
+vcpu_blocktime = now_ms -
+atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
+affected_cpu += 1;
+/* we need to know is that mark_postcopy_end was due to
+ * faulted page, another possible case it's prefetched
+ * page and in that case we shouldn't be here */
+if (!vcpu_total_blocktime &&
+atomic_fetch_add(&dc->smp_cpus_down, 0) == smp_cpus) {
+vcpu_total_blocktime = true;
+}
+/* continue cycle, due to one page could affect several vCPUs */
+dc->vcpu_blocktime[i] += vcpu_blocktime;
+}
+
+atomic_sub(&dc->smp_cpus_down, affected_cpu);
+if (vcpu_total_blocktime) {
+dc->total_blocktime += now_ms - atomic_fetch_add(&dc->last_begin, 0);
+}
+trace_mark_postcopy_blocktime_end(addr, dc, dc->total_blocktime);
+}
+
  /*
   * Handle faults detected by the USERFAULT markings
   */
@@ -654,8 +750,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
  rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
  trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
  qemu_ram_get_idstr(rb),
-rb_offset);
+rb_offset,
+msg.arg.pagefault.feat.ptid);
  
+mark_postcopy_blocktime_begin((uintptr_t)(msg.arg.pagefault.address),

+msg.arg.pagefault.feat.ptid, rb);
  /*
   * Send the request to the source - we want to request one
   * of our host page sizes (which is >= TPS)
@@ -750,6 +849,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
  
  return -e;

  }
+mark_postcopy_blocktime_end((uint64_t)(uintptr_t)host);
  
  trace_postcopy_place_page(host);

  return 0;
diff --git a/migration/trace-events b/migration/trace-events
index 5b8ccf3..7bdadbb 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -112,6 +112,8 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
  process_incoming_migration_co_postcopy_end_main(void) ""
  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname) "ioc=%p ioctype=%s hostname=%s"
+mark_postcopy_blocktime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "add

[Qemu-devel] [[PATCH V7] 05/11] migration: introduce postcopy-blocktime capability

2017-06-07 Thread Alexey Perevalov

Right now it can be used on the destination side to enable vCPU blocktime
calculation for postcopy live migration. vCPU blocktime is the time from
when a vCPU thread was put into interruptible sleep until the memory page
was copied and the thread woke up.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Alexey Perevalov 
---
 include/migration/migration.h | 1 +
 migration/migration.c | 9 +
 qapi-schema.json  | 5 -
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 79b5484..2e61df5 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -189,6 +189,7 @@ int migrate_compress_level(void);
 int migrate_compress_threads(void);
 int migrate_decompress_threads(void);
 bool migrate_use_events(void);
+bool migrate_postcopy_blocktime(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_message(MigrationIncomingState *mis,
diff --git a/migration/migration.c b/migration/migration.c
index 2a77636..d1cc34f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1371,6 +1371,15 @@ bool migrate_zero_blocks(void)
 return s->enabled_capabilities[MIGRATION_CAPABILITY_ZERO_BLOCKS];
 }
 
+bool migrate_postcopy_blocktime(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME];
+}
+
 bool migrate_use_compression(void)
 {
 MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 4b50b65..e906953 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -900,12 +900,15 @@
 #  offers more flexibility.
 #  (Since 2.10)
 #
+# @postcopy-blocktime: Calculate downtime for postcopy live migration
+# (since 2.10)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
-   'block' ] }
+   'block', 'postcopy-blocktime'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
1.9.1

[Qemu-devel] [PATCH 0/5] Code cleanups with Coccinelle

2017-06-07 Thread Marc-André Lureau

Hi,

Here are a few patches I came up with while playing with some semantic
patches. If necessary, I can split them further by unit/domain, but
that would likely make for quite a long series of patches.

Marc-André Lureau (5):
  coccinelle: replace code with ROUND_UP macro
  coccinelle: use DIV_ROUND_UP
  arch: introduce ELF_NOTE_SIZE macro
  Replace g_malloc()+memcpy() with g_memdup()
  coccinelle: prefer glib g_new/g_renew macros

 hw/lm32/lm32_hwsetup.h  |  2 +-
 include/hw/elf_ops.h|  2 +-
 include/qemu/timer.h|  2 +-
 audio/alsaaudio.c   |  2 +-
 audio/coreaudio.c   |  2 +-
 audio/dsoundaudio.c |  2 +-
 audio/ossaudio.c|  2 +-
 audio/paaudio.c |  2 +-
 audio/wavaudio.c|  2 +-
 backends/cryptodev.c|  2 +-
 block/qed-check.c   |  3 ++-
 bootdevice.c|  2 +-
 bsd-user/syscall.c  |  2 +-
 bt-host.c   |  2 +-
 bt-vhci.c   |  2 +-
 cpus-common.c   |  4 ++--
 cpus.c  | 16 
 disas/ia64.c|  4 ++--
 dma-helpers.c   |  4 ++--
 dump.c  | 10 +-
 gdbstub.c   |  4 ++--
 hw/9pfs/9p-handle.c |  2 +-
 hw/9pfs/9p-proxy.c  |  2 +-
 hw/9pfs/9p-synth.c  |  5 ++---
 hw/9pfs/9p.c|  6 +++---
 hw/9pfs/xen-9p-backend.c|  6 +++---
 hw/acpi/memory_hotplug.c|  2 +-
 hw/audio/intel-hda.c|  2 +-
 hw/bt/core.c|  4 ++--
 hw/bt/hci.c |  4 ++--
 hw/bt/l2cap.c   |  4 ++--
 hw/bt/sdp.c |  6 +++---
 hw/char/parallel.c  |  2 +-
 hw/char/serial.c|  4 ++--
 hw/char/sh_serial.c |  2 +-
 hw/char/virtio-serial-bus.c | 19 +--
 hw/core/irq.c   |  2 +-
 hw/core/ptimer.c|  2 +-
 hw/core/reset.c |  2 +-
 hw/cris/axis_dev88.c|  2 +-
 hw/display/pxa2xx_lcd.c |  2 +-
 hw/display/tc6393xb.c   |  2 +-
 hw/display/vga.c|  2 +-
 hw/display/virtio-gpu.c |  8 
 hw/display/xenfb.c  |  4 ++--
 hw/dma/etraxfs_dma.c|  2 +-
 hw/dma/rc4030.c |  4 ++--
 hw/dma/soc_dma.c|  6 ++
 hw/i2c/bitbang_i2c.c|  2 +-
 hw/i2c/core.c   |  4 ++--
 hw/i386/amd_iommu.c |  4 ++--
 hw/i386/intel_iommu.c   |  2 +-
 hw/i386/kvm/pci-assign.c|  2 +-
 hw/i386/multiboot.c |  3 +--
 hw/i386/pc.c|  5 ++---
 hw/i386/xen/xen-hvm.c   | 12 ++--
 hw/i386/xen/xen-mapcache.c  | 14 +++---
 hw/input/pckbd.c|  2 +-
 hw/input/ps2.c  |  4 ++--
 hw/input/pxa2xx_keypad.c|  2 +-
 hw/input/tsc2005.c  |  3 +--
 hw/input/virtio-input.c |  4 ++--
 hw/intc/exynos4210_gic.c|  2 +-
 hw/intc/heathrow_pic.c  |  2 +-
 hw/intc/xics.c  |  2 +-
 hw/intc/xics_kvm.c  |  2 +-
 hw/lm32/lm32_boards.c   |  4 ++--
 hw/lm32/milkymist.c |  2 +-
 hw/m68k/mcf5206.c   |  4 ++--
 hw/m68k/mcf5208.c   |  2 +-
 hw/mips/mips_malta.c|  2 +-
 hw/mips/mips_mipssim.c  |  2 +-
 hw/mips/mips_r4k.c  |  2 +-
 hw/misc/applesmc.c  |  2 +-
 hw/misc/imx6_src.c  |  2 +-
 hw/misc/ivshmem.c   |  4 ++--
 hw/misc/macio/mac_dbdma.c   |  2 +-
 hw/misc/pci-testdev.c   |  2 +-
 hw/net/eepro100.c   |  3 +--
 hw/net/net_rx_pkt.c |  2 +-
 hw/net/virtio-net.c |  2 +-
 hw/pci/msix.c   |  6 +++---
 hw/pci/pci.c|  2 +-
 hw/pci/pcie_aer.c   |  4 ++--
 hw/ppc/e500.c   |  4 ++--
 hw/ppc/mac_newworld.c   |  2 +-
 hw/ppc/mac_oldworld.c   |  2 +-
 hw/ppc/ppc.c|  8 
 hw/ppc/ppc405_boards.c  |  8 
 hw/ppc/ppc405_uc.c  | 28 ++--
 hw/ppc/ppc440_bamboo.c  |  4 ++--
 hw/ppc/ppc4xx_devs.c|  4 ++--
 hw/ppc/ppc_booke.c  |  4 ++--
 hw/ppc/prep.c   |  2 +-
 hw/ppc/spapr.c  |  4 ++--
 hw/ppc/spapr_events.c   |  2 +-
 hw/ppc/spapr_iommu.c|  2 +-
 hw/ppc/spapr_pci.c

[Qemu-devel] [PATCH 1/5] coccinelle: replace code with ROUND_UP macro

2017-06-07 Thread Marc-André Lureau

I used the following coccinelle script:

@@
expression e1;
@@
- ((e1) + (3)) / (4) * (4)
+ ROUND_UP(e1,4)

@@
expression e1;
expression e2;
@@
-(ROUND_UP(e1,e2))
+ROUND_UP(e1,e2)

I tried with various other values (4, 8, 16, 32), but got only the
matches in this patch.
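
As a quick sanity check of the rewrite, the sketch below compares the
open-coded expression with a round-up macro. DEMO_ROUND_UP is a local,
illustrative definition; QEMU ships its own ROUND_UP macro, which may be
written differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local illustrative definition: round n up to a multiple of d. */
#define DEMO_ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* The open-coded form matched by the semantic patch. */
static size_t demo_round_open_coded(size_t n)
{
    assert(n <= SIZE_MAX - 3); /* avoid wrap-around in n + 3 */
    return ((n + 3) / 4) * 4;
}

/* The same computation expressed through the macro. */
static size_t demo_round_macro(size_t n)
{
    assert(n <= SIZE_MAX - 3);
    return DEMO_ROUND_UP(n, (size_t)4);
}
```

Both helpers return the same value for every input in range, for example
8 for an input of 5 and 12 for an input of 12.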

Signed-off-by: Marc-André Lureau 
---
 target/i386/arch_dump.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/i386/arch_dump.c b/target/i386/arch_dump.c
index fe0aa36932..e6788250b8 100644
--- a/target/i386/arch_dump.c
+++ b/target/i386/arch_dump.c
@@ -84,9 +84,9 @@ static int x86_64_write_elf64_note(WriteCoreDumpFunction f,
 note->n_descsz = cpu_to_le32(descsz);
 note->n_type = cpu_to_le32(NT_PRSTATUS);
 buf = (char *)note;
-buf += ((sizeof(Elf64_Nhdr) + 3) / 4) * 4;
+buf += ROUND_UP(sizeof(Elf64_Nhdr), 4);
 memcpy(buf, name, name_size);
-buf += ((name_size + 3) / 4) * 4;
+buf += ROUND_UP(name_size, 4);
 memcpy(buf + 32, &id, 4); /* pr_pid */
 buf += descsz - sizeof(x86_64_user_regs_struct)-sizeof(target_ulong);
 memcpy(buf, &regs, sizeof(x86_64_user_regs_struct));
@@ -163,9 +163,9 @@ static int x86_write_elf64_note(WriteCoreDumpFunction f, CPUX86State *env,
 note->n_descsz = cpu_to_le32(descsz);
 note->n_type = cpu_to_le32(NT_PRSTATUS);
 buf = (char *)note;
-buf += ((sizeof(Elf64_Nhdr) + 3) / 4) * 4;
+buf += ROUND_UP(sizeof(Elf64_Nhdr), 4);
 memcpy(buf, name, name_size);
-buf += ((name_size + 3) / 4) * 4;
+buf += ROUND_UP(name_size, 4);
 memcpy(buf, &prstatus, sizeof(prstatus));
 
 ret = f(note, note_size, opaque);
@@ -218,9 +218,9 @@ int x86_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
 note->n_descsz = cpu_to_le32(descsz);
 note->n_type = cpu_to_le32(NT_PRSTATUS);
 buf = (char *)note;
-buf += ((sizeof(Elf32_Nhdr) + 3) / 4) * 4;
+buf += ROUND_UP(sizeof(Elf32_Nhdr), 4);
 memcpy(buf, name, name_size);
-buf += ((name_size + 3) / 4) * 4;
+buf += ROUND_UP(name_size, 4);
 memcpy(buf, &prstatus, sizeof(prstatus));
 
 ret = f(note, note_size, opaque);
-- 
2.13.0.91.g00982b8dd

[Qemu-devel] [PATCH 2/5] coccinelle: use DIV_ROUND_UP

2017-06-07 Thread Marc-André Lureau

The coccinelle/round.cocci script doesn't catch hard coded values.

I used the following script over qemu code base:

(
- ((e1) + 3) / (4)
+ DIV_ROUND_UP(e1,4)
|
- ((e1) + (3)) / (4)
+ DIV_ROUND_UP(e1,4)
|
- ((e1) + 7) / (8)
+ DIV_ROUND_UP(e1,8)
|
- ((e1) + (7)) / (8)
+ DIV_ROUND_UP(e1,8)
|
- ((e1) + 15) / (16)
+ DIV_ROUND_UP(e1,16)
|
- ((e1) + (15)) / (16)
+ DIV_ROUND_UP(e1,16)
|
- ((e1) + 31) / (32)
+ DIV_ROUND_UP(e1,32)
|
- ((e1) + (31)) / (32)
+ DIV_ROUND_UP(e1,32)
)
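
For illustration, this is the kind of computation the rewritten sites
perform: counting how many 32-bit words are needed to hold a bitmap of a
given number of bits. DEMO_DIV_ROUND_UP is a local definition assumed to
behave like QEMU's DIV_ROUND_UP, and demo_bitmap_words() is a hypothetical
helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local illustrative definition: integer division rounding upwards. */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of 32-bit words needed to store nbits bits. */
static size_t demo_bitmap_words(size_t nbits)
{
    assert(nbits <= SIZE_MAX - 31); /* avoid wrap-around in nbits + 31 */
    return DEMO_DIV_ROUND_UP(nbits, (size_t)32);
}
```

This matches the open-coded (nbits + 31) / 32 form, with the divisor
written once instead of twice.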

Signed-off-by: Marc-André Lureau 
---
 block/qed-check.c   |  3 ++-
 disas/ia64.c|  4 ++--
 hw/char/virtio-serial-bus.c | 11 ++-
 hw/display/vga.c|  2 +-
 hw/display/virtio-gpu.c |  4 ++--
 hw/pci/msix.c   |  4 ++--
 hw/usb/dev-hub.c|  8 
 libdecnumber/decNumber.c|  2 +-
 target/i386/arch_dump.c | 25 +++--
 target/ppc/kvm.c|  4 ++--
 target/ppc/mem_helper.c |  2 +-
 target/ppc/translate.c  |  2 +-
 ui/cursor.c |  2 +-
 ui/vnc-enc-tight.c  |  2 +-
 ui/vnc.c|  3 ++-
 15 files changed, 43 insertions(+), 35 deletions(-)

diff --git a/block/qed-check.c b/block/qed-check.c
index dcd4f036b8..f447053d24 100644
--- a/block/qed-check.c
+++ b/block/qed-check.c
@@ -228,7 +228,8 @@ int qed_check(BDRVQEDState *s, BdrvCheckResult *result, bool fix)
 };
 int ret;
 
-check.used_clusters = g_try_new0(uint32_t, (check.nclusters + 31) / 32);
+check.used_clusters = g_try_new0(uint32_t,
+ DIV_ROUND_UP(check.nclusters, 32));
 if (check.nclusters && check.used_clusters == NULL) {
 return -ENOMEM;
 }
diff --git a/disas/ia64.c b/disas/ia64.c
index 140754c944..bf576d3099 100644
--- a/disas/ia64.c
+++ b/disas/ia64.c
@@ -10156,14 +10156,14 @@ locate_opcode_ent (ia64_insn opcode, enum ia64_insn_type type)
}
  if (x > count)
{
- next_op = op_pointer + ((oplen + 7) / 8);
+ next_op = op_pointer + (DIV_ROUND_UP(oplen, 8));
  currbitnum -= count;
  break;
}
}
  else if (! currbit)
{
- next_op = op_pointer + ((oplen + 7) / 8);
+ next_op = op_pointer + (DIV_ROUND_UP(oplen, 8));
  break;
}
}
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index f5bc173844..823e1c915c 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -663,7 +663,7 @@ static void virtio_serial_save_device(VirtIODevice *vdev, QEMUFile *f)
 
 /* The ports map */
 max_nr_ports = s->serial.max_virtserial_ports;
-for (i = 0; i < (max_nr_ports + 31) / 32; i++) {
+for (i = 0; i < DIV_ROUND_UP(max_nr_ports, 32); i++) {
 qemu_put_be32s(f, &s->ports_map[i]);
 }
 
@@ -798,7 +798,7 @@ static int virtio_serial_load_device(VirtIODevice *vdev, QEMUFile *f,
 qemu_get_be32s(f, &tmp);
 
 max_nr_ports = s->serial.max_virtserial_ports;
-for (i = 0; i < (max_nr_ports + 31) / 32; i++) {
+for (i = 0; i < DIV_ROUND_UP(max_nr_ports, 32); i++) {
 qemu_get_be32s(f, &ports_map);
 
 if (ports_map != s->ports_map[i]) {
@@ -863,7 +863,7 @@ static uint32_t find_free_port_id(VirtIOSerial *vser)
 unsigned int i, max_nr_ports;
 
 max_nr_ports = vser->serial.max_virtserial_ports;
-for (i = 0; i < (max_nr_ports + 31) / 32; i++) {
+for (i = 0; i < DIV_ROUND_UP(max_nr_ports, 32); i++) {
 uint32_t map, zeroes;
 
 map = vser->ports_map[i];
@@ -1075,8 +1075,9 @@ static void virtio_serial_device_realize(DeviceState *dev, Error **errp)
 vser->ovqs[i] = virtio_add_queue(vdev, 128, handle_output);
 }
 
-vser->ports_map = g_malloc0(((vser->serial.max_virtserial_ports + 31) / 32)
-* sizeof(vser->ports_map[0]));
+vser->ports_map =
+g_malloc0(DIV_ROUND_UP(vser->serial.max_virtserial_ports, 32)
+  * sizeof(vser->ports_map[0]));
 /*
  * Reserve location 0 for a console port for backward compat
  * (old kernel, new qemu)
diff --git a/hw/display/vga.c b/hw/display/vga.c
index dcc95f88e2..c2d3e8f54b 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -1621,7 +1621,7 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
s->line_compare, sr(s, VGA_SEQ_CLOCK_MODE));
 #endif
 addr1 = (s->start_addr * 4);
-bwidth = (width * bits + 7) / 8;
+bwidth = DIV_ROUND_UP(width * bits, 8);
 y_start = -1;
 d = surface_data(surface);
 linesize = surface_stride(surface);
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 58dc0b2737..641f57e7c5 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -408,7 +408,7 @@ static void virtio_gpu_transfer_to_host_2d(VirtIOGPU *g,
 }

[Qemu-devel] [PATCH 3/5] arch: introduce ELF_NOTE_SIZE macro

2017-06-07 Thread Marc-André Lureau

Factor out a common pattern to compute the ELF note size.
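
A worked example of the computation, as a sketch: each of the three parts
of a note (header, name, descriptor) is padded to a 4-byte boundary and
the padded sizes are summed. The DEMO_ macros mirror the shape of the
macro introduced by this patch but are local to this illustration, and
the sizes used below are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Sum of the three note parts, each padded to a multiple of 4 bytes. */
#define DEMO_ELF_NOTE_SIZE(hdr_size, name_size, desc_size) \
    ((DEMO_DIV_ROUND_UP((hdr_size), 4)                     \
      + DEMO_DIV_ROUND_UP((name_size), 4)                  \
      + DEMO_DIV_ROUND_UP((desc_size), 4)) * 4)

static size_t demo_note_size(size_t hdr_size, size_t name_size,
                             size_t desc_size)
{
    size_t total = DEMO_ELF_NOTE_SIZE(hdr_size, name_size, desc_size);

    assert(total % 4 == 0); /* the result is always 4-byte aligned */
    return total;
}
```

With a 12-byte header, a 5-byte name and a 336-byte descriptor, the name
is padded to 8 bytes and the total comes to 12 + 8 + 336 = 356 bytes.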

Signed-off-by: Marc-André Lureau 
---
 target/i386/arch_dump.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/target/i386/arch_dump.c b/target/i386/arch_dump.c
index 158e056b59..4471f44e3d 100644
--- a/target/i386/arch_dump.c
+++ b/target/i386/arch_dump.c
@@ -18,6 +18,11 @@
 #include "elf.h"
 #include "sysemu/memory_mapping.h"
 
+#define ELF_NOTE_SIZE(hdr_size, name_size, desc_size)   \
+((DIV_ROUND_UP((hdr_size), 4)   \
+  + DIV_ROUND_UP((name_size), 4)\
+  + DIV_ROUND_UP((desc_size), 4)) * 4)
+
 #ifdef TARGET_X86_64
 typedef struct {
 target_ulong r15, r14, r13, r12, rbp, rbx, r11, r10;
@@ -77,9 +82,7 @@ static int x86_64_write_elf64_note(WriteCoreDumpFunction f,
 regs.gs = env->segs[R_GS].selector;
 
 descsz = sizeof(x86_64_elf_prstatus);
-note_size = (DIV_ROUND_UP(sizeof(Elf64_Nhdr), 4)
- + DIV_ROUND_UP(name_size, 4)
- + DIV_ROUND_UP(descsz, 4)) * 4;
+note_size = ELF_NOTE_SIZE(sizeof(Elf64_Nhdr), name_size, descsz);
 note = g_malloc0(note_size);
 note->n_namesz = cpu_to_le32(name_size);
 note->n_descsz = cpu_to_le32(descsz);
@@ -157,9 +160,7 @@ static int x86_write_elf64_note(WriteCoreDumpFunction f, CPUX86State *env,
 
 x86_fill_elf_prstatus(&prstatus, env, id);
 descsz = sizeof(x86_elf_prstatus);
-note_size = (DIV_ROUND_UP(sizeof(Elf64_Nhdr), 4)
- + DIV_ROUND_UP(name_size, 4)
- + DIV_ROUND_UP(descsz, 4)) * 4;
+note_size = ELF_NOTE_SIZE(sizeof(Elf64_Nhdr), name_size, descsz);
 note = g_malloc0(note_size);
 note->n_namesz = cpu_to_le32(name_size);
 note->n_descsz = cpu_to_le32(descsz);
@@ -213,9 +214,7 @@ int x86_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
 
 x86_fill_elf_prstatus(&prstatus, &cpu->env, cpuid);
 descsz = sizeof(x86_elf_prstatus);
-note_size = (DIV_ROUND_UP(sizeof(Elf32_Nhdr), 4)
- + DIV_ROUND_UP(name_size, 4)
- + DIV_ROUND_UP(descsz, 4)) * 4;
+note_size = ELF_NOTE_SIZE(sizeof(Elf32_Nhdr), name_size, descsz);
 note = g_malloc0(note_size);
 note->n_namesz = cpu_to_le32(name_size);
 note->n_descsz = cpu_to_le32(descsz);
@@ -446,12 +445,8 @@ ssize_t cpu_get_note_size(int class, int machine, int nr_cpus)
 #endif
 qemu_desc_size = sizeof(QEMUCPUState);
 
-elf_note_size = (DIV_ROUND_UP(note_head_size, 4)
- + DIV_ROUND_UP(name_size, 4)
- + DIV_ROUND_UP(elf_desc_size, 4)) * 4;
-qemu_note_size = (DIV_ROUND_UP(note_head_size, 4)
-  + DIV_ROUND_UP(name_size, 4)
-  + DIV_ROUND_UP(qemu_desc_size, 4)) * 4;
+elf_note_size = ELF_NOTE_SIZE(note_head_size, name_size, elf_desc_size);
+qemu_note_size = ELF_NOTE_SIZE(note_head_size, name_size, qemu_desc_size);
 
 return (elf_note_size + qemu_note_size) * nr_cpus;
 }
-- 
2.13.0.91.g00982b8dd

[Qemu-devel] [PATCH 4/5] Replace g_malloc()+memcpy() with g_memdup()

2017-06-07 Thread Marc-André Lureau

I found these patterns by grepping the source tree. I don't have a
coccinelle script for it!
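
The pattern being replaced looks roughly like the sketch below.
demo_memdup() is a hypothetical stand-in for g_memdup(), written with
plain malloc() so that the example builds without GLib; it is only meant
to show that the allocate-then-copy pair collapses into a single call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for g_memdup(): allocate size bytes and copy src into them. */
static void *demo_memdup(const void *src, size_t size)
{
    void *copy;

    if (src == NULL || size == 0) {
        return NULL;
    }
    copy = malloc(size);
    assert(copy != NULL); /* this sketch simply aborts on allocation failure */
    memcpy(copy, src, size);
    return copy;
}
```

A call site then shrinks from an allocation followed by a memcpy() to one
line, and the size expression is written only once.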

Signed-off-by: Marc-André Lureau 
---
 hw/9pfs/9p-synth.c  | 3 +--
 hw/i386/multiboot.c | 3 +--
 hw/net/eepro100.c   | 3 +--
 tests/test-iov.c| 3 +--
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c
index 4b6d4e6a3f..df0a8de08a 100644
--- a/hw/9pfs/9p-synth.c
+++ b/hw/9pfs/9p-synth.c
@@ -494,8 +494,7 @@ static int synth_name_to_path(FsContext *ctx, V9fsPath *dir_path,
 }
 out:
 /* Copy the node pointer to fid */
-target->data = g_malloc(sizeof(void *));
-memcpy(target->data, &node, sizeof(void *));
+target->data = g_memdup(&node, sizeof(void *));
 target->size = sizeof(void *);
 return 0;
 }
diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index f13e23139b..6001f4caa2 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -352,8 +352,7 @@ int load_multiboot(FWCfgState *fw_cfg,
 mb_debug("   mb_mods_count = %d\n", mbs.mb_mods_count);
 
 /* save bootinfo off the stack */
-mb_bootinfo_data = g_malloc(sizeof(bootinfo));
-memcpy(mb_bootinfo_data, bootinfo, sizeof(bootinfo));
+mb_bootinfo_data = g_memdup(bootinfo, sizeof(bootinfo));
 
 /* Pass variables to option rom */
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, mh_entry_addr);
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index 4bf71f2d85..7d3b2e52c7 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -1894,8 +1894,7 @@ static void e100_nic_realize(PCIDevice *pci_dev, Error 
**errp)
 
 qemu_register_reset(nic_reset, s);
 
-s->vmstate = g_malloc(sizeof(vmstate_eepro100));
-memcpy(s->vmstate, &vmstate_eepro100, sizeof(vmstate_eepro100));
+s->vmstate = g_memdup(&vmstate_eepro100, sizeof(vmstate_eepro100));
 s->vmstate->name = qemu_get_queue(s->nic)->model;
 vmstate_register(&pci_dev->qdev, -1, s->vmstate, s);
 }
diff --git a/tests/test-iov.c b/tests/test-iov.c
index a22d71fd2c..fa3d75aee1 100644
--- a/tests/test-iov.c
+++ b/tests/test-iov.c
@@ -167,8 +167,7 @@ static void test_io(void)
 }
 iov_from_buf(iov, niov, 0, buf, sz);
 
-siov = g_malloc(sizeof(*iov) * niov);
-memcpy(siov, iov, sizeof(*iov) * niov);
+siov = g_memdup(iov, sizeof(*iov) * niov);
 
 if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) < 0) {
perror("socketpair");
-- 
2.13.0.91.g00982b8dd
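
Each hunk in this patch folds an allocate-then-copy pair into one call.
The following standalone sketch shows the behaviour being relied on, using
only the C library. memdup_sketch is a hypothetical helper written for
illustration; it is not glib's implementation, and the NULL/zero-size
handling is an assumption about how such a helper would reasonably behave.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative equivalent of the g_malloc() + memcpy() pairs removed
 * above: allocate size bytes and copy them from src in one step.
 */
static void *memdup_sketch(const void *src, size_t size)
{
    void *copy;

    if (src == NULL || size == 0) {
        return NULL;            /* nothing to duplicate */
    }
    copy = malloc(size);
    if (copy == NULL) {
        abort();                /* mirror the abort-on-failure style of g_malloc() */
    }
    memcpy(copy, src, size);
    return copy;
}
```

The caller owns the returned buffer and frees it, just as with the
original two-step form.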

[Qemu-devel] [PATCH 5/5] coccinelle: prefer glib g_new/g_renew macros

2017-06-07 Thread Marc-André Lureau

The g_new() family of macros is simpler and safer than g_malloc().

"The return pointer is cast to the given type... Care is taken to
avoid overflow when calculating the size of the allocated block."

I left out the common g_malloc(sizeof(*ptr)) pattern, since the
alternative "g_new(typeof(*ptr), 1)" isn't very common. But we may want
to change that too?
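
The overflow remark in the quoted documentation is the main safety
argument: an open-coded "n * sizeof(T)" can wrap around and allocate too
little. A minimal sketch of that check, using only the C library, could
look like this. alloc_array_checked is a hypothetical name; it returns
NULL on overflow purely for illustration, which is an assumption of this
sketch and not a description of how glib reports the error.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate count elements of elem_size bytes,
 * refusing the request when count * elem_size would not fit in size_t.
 */
static void *alloc_array_checked(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return NULL;            /* multiplication would overflow */
    }
    return malloc(count * elem_size);
}
```

With a plain g_malloc(n * sizeof(T)) the multiplication happens before
the allocator sees the value, so no such check is possible there.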

Here is the cocci script I used; afterwards I manually edited a few of
the changes (for example, I removed useless casts):

@@
expression e1;
expression e2;
expression mem;
type t1;
@@
(
- g_malloc0(sizeof(*e2))
+ g_malloc0(sizeof(*e2))
|
- g_malloc(sizeof(*e2))
+ g_malloc(sizeof(*e2))
|
- g_realloc(mem, (e1) * sizeof(*e2))
+ g_renew(typeof(*e2), mem, e1)
|
- g_malloc0((e1) * sizeof(*e2))
+ g_new0(typeof(*e2), e1)
|
- g_malloc((e1) * sizeof(*e2))
+ g_new(typeof(*e2), e1)
|
- g_realloc(mem, (e1) * sizeof(e2[0]))
+ g_renew(typeof(e2[0]), mem, e1)
|
- g_realloc(mem, (e1) * sizeof(e2))
+ g_renew(e2, mem, e1)
|
- g_malloc0((e1) * sizeof(e2[0]))
+ g_new0(typeof(e2[0]), e1)
|
- g_malloc0((e1) * sizeof(e2))
+ g_new0(e2, e1)
|
- g_malloc((e1) * sizeof(e2[0]))
+ g_new(typeof(e2[0]), e1)
|
- g_malloc((e1) * sizeof(e2))
+ g_new(e2, e1)
|
- g_realloc(mem, (e1) * sizeof(t1))
+ g_renew(t1, mem, e1)
|
- g_malloc0((e1) * sizeof(t1))
+ g_new0(t1, e1)
|
- g_malloc((e1) * sizeof(t1))
+ g_new(t1, e1)
|
- g_malloc0(sizeof(e2[0]))
+ g_new0(typeof(e2[0]), 1)
|
- g_malloc0(sizeof(e2))
+ g_new0(e2, 1)
|
- g_malloc(sizeof(e2[0]))
+ g_new(typeof(e2[0]), 1)
|
- g_malloc(sizeof(e2))
+ g_new(e2, 1)
|
- g_malloc0(sizeof(t1))
+ g_new0(t1, 1)
|
- g_malloc(sizeof(t1))
+ g_new(t1, 1)
)

Signed-off-by: Marc-André Lureau 
---
 hw/lm32/lm32_hwsetup.h  |  2 +-
 include/hw/elf_ops.h|  2 +-
 include/qemu/timer.h|  2 +-
 audio/alsaaudio.c   |  2 +-
 audio/coreaudio.c   |  2 +-
 audio/dsoundaudio.c |  2 +-
 audio/ossaudio.c|  2 +-
 audio/paaudio.c |  2 +-
 audio/wavaudio.c|  2 +-
 backends/cryptodev.c|  2 +-
 bootdevice.c|  2 +-
 bsd-user/syscall.c  |  2 +-
 bt-host.c   |  2 +-
 bt-vhci.c   |  2 +-
 cpus-common.c   |  4 ++--
 cpus.c  | 16 
 dma-helpers.c   |  4 ++--
 dump.c  | 10 +-
 gdbstub.c   |  4 ++--
 hw/9pfs/9p-handle.c |  2 +-
 hw/9pfs/9p-proxy.c  |  2 +-
 hw/9pfs/9p-synth.c  |  2 +-
 hw/9pfs/9p.c|  6 +++---
 hw/9pfs/xen-9p-backend.c|  6 +++---
 hw/acpi/memory_hotplug.c|  2 +-
 hw/audio/intel-hda.c|  2 +-
 hw/bt/core.c|  4 ++--
 hw/bt/hci.c |  4 ++--
 hw/bt/l2cap.c   |  4 ++--
 hw/bt/sdp.c |  6 +++---
 hw/char/parallel.c  |  2 +-
 hw/char/serial.c|  4 ++--
 hw/char/sh_serial.c |  2 +-
 hw/char/virtio-serial-bus.c | 12 +---
 hw/core/irq.c   |  2 +-
 hw/core/ptimer.c|  2 +-
 hw/core/reset.c |  2 +-
 hw/cris/axis_dev88.c|  2 +-
 hw/display/pxa2xx_lcd.c |  2 +-
 hw/display/tc6393xb.c   |  2 +-
 hw/display/virtio-gpu.c |  4 ++--
 hw/display/xenfb.c  |  4 ++--
 hw/dma/etraxfs_dma.c|  2 +-
 hw/dma/rc4030.c |  4 ++--
 hw/dma/soc_dma.c|  6 ++
 hw/i2c/bitbang_i2c.c|  2 +-
 hw/i2c/core.c   |  4 ++--
 hw/i386/amd_iommu.c |  4 ++--
 hw/i386/intel_iommu.c   |  2 +-
 hw/i386/kvm/pci-assign.c|  2 +-
 hw/i386/pc.c|  5 ++---
 hw/i386/xen/xen-hvm.c   | 12 ++--
 hw/i386/xen/xen-mapcache.c  | 14 +++---
 hw/input/pckbd.c|  2 +-
 hw/input/ps2.c  |  4 ++--
 hw/input/pxa2xx_keypad.c|  2 +-
 hw/input/tsc2005.c  |  3 +--
 hw/input/virtio-input.c |  4 ++--
 hw/intc/exynos4210_gic.c|  2 +-
 hw/intc/heathrow_pic.c  |  2 +-
 hw/intc/xics.c  |  2 +-
 hw/intc/xics_kvm.c  |  2 +-
 hw/lm32/lm32_boards.c   |  4 ++--
 hw/lm32/milkymist.c |  2 +-
 hw/m68k/mcf5206.c   |  4 ++--
 hw/m68k/mcf5208.c   |  2 +-
 hw/mips/mips_malta.c|  2 +-
 hw/mips/mips_mipssim.c  |  2 +-
 hw/mips/mips_r4k.c  |  2 +-
 hw/misc/applesmc.c  |  2 +-
 hw/misc/imx6_src.c  |  2 +-
 hw/misc/ivshmem.c   |  4 ++--
 hw/misc/

Re: [Qemu-devel] [RFC PATCH 8/8] iommu: introduce hw/core/iommu

2017-06-07 Thread Liu, Yi L

Hi Peter,

Some updates on it.

> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Thursday, April 27, 2017 5:34 PM
> To: qemu-devel@nongnu.org
> Cc: Lan, Tianyu ; Paolo Bonzini ;
> Tian, Kevin ; Liu, Yi L ;
> pet...@redhat.com; Jason Wang ; David Gibson
> ; Alex Williamson 
> Subject: [RFC PATCH 8/8] iommu: introduce hw/core/iommu
> 
> Time to consider a common stuff for IOMMU. Let's start from an common IOMMU
> object (which should be inlayed in custom IOMMU implementations) and a 
> notifier
> mechanism.
> 
> Let VT-d IOMMU be the first user.
> 
> An example to use this per-iommu notifier:
> 
>   (when registering)
>   iommu = address_space_iommu_get(pci_device_iommu_address_space(dev));
>   notifier = iommu_notifier_register(iommu, IOMMU_EVENT_SVM_PASID, func);
>   ...
>   (when notify)
>   IOMMUEvent event = { .type = IOMMU_EVENT_SVM_PASID ... };
>   iommu_notify(iommu, &event);
>   ...
>   (when releasing)
>   iommu_notifier_unregister(notifier);
>   notifier = NULL;
> 
> Signed-off-by: Peter Xu 
> ---

[...]

> +#include "qemu/osdep.h"
> +#include "hw/core/iommu.h"
> +
> +IOMMUNotifier *iommu_notifier_register(IOMMUObject *iommu,
> +   IOMMUNotifyFn fn,
> +   uint64_t event_mask) {
> +IOMMUNotifier *notifier = g_new0(IOMMUNotifier, 1);

For this part, I think we may need to consider allocating the memory in a
way similar to IOMMUMRNotifier. The notifier surely needs to
be connected with the vfio container so that it can manipulate
the pIOMMU through vfio IOCTLs.

I'm thinking of adding a new struct VFIOGuestIOMMUObject, which
is similar to struct VFIOGuestIOMMU, and renaming the original struct
VFIOGuestIOMMU to VFIOGuestIOMMUMR.

Then there would be the following definitions in
"include/hw/vfio/vfio-common.h":

typedef struct VFIOGuestIOMMUObject {
    VFIOContainer *container;
    IOMMUObject *iommu;
    /* n is for non-MemoryRegion related events, e.g. pasid table binding */
    IOMMUNotifier n;
    QLIST_ENTRY(VFIOGuestIOMMUObject) giommu_next;
} VFIOGuestIOMMUObject;

typedef struct VFIOGuestIOMMUMR {
    VFIOContainer *container;
    MemoryRegion *iommu;
    hwaddr iommu_offset;
    IOMMUNotifier n;
    QLIST_ENTRY(VFIOGuestIOMMUMR) giommu_next;
} VFIOGuestIOMMUMR;

What do you think?

Thanks,
Yi L

Re: [Qemu-devel] [PATCH v4 1/2] spapr: Add a "no HPT" encoding to HTAB migration stream

2017-06-07 Thread Bharata B Rao

On Thu, Jun 01, 2017 at 02:54:48PM +1000, David Gibson wrote:
> On Wed, May 31, 2017 at 04:56:44PM +0530, Bharata B Rao wrote:
> > Add a "no HPT" encoding (using value -1) to the HTAB migration
> > stream (in the place of HPT size) when the guest doesn't allocate HPT.
> > This will help the target side to match target HPT with the source HPT
> > and thus enable successful migration.
> > 
> > A few more fixes to enable TCG migration to work correctly are also
> > included in this commit:
> > 
> > - HTAB savevm handlers have a few asserts on kvm_enabled() when
> >   spapr->htab != 0. Convert these into conditional checks as it is now
> >   possible to have no HTAB with TCG radix guests.
> > - htab_save_setup() asserts for kvm_enabled() when spapr->htab != 0.
> >   Remove this as we can't assert this for TCG radix guests.
> > 
> > Suggested-by: David Gibson 
> >   [no HPT encoding suggestion]
> > Signed-off-by: Bharata B Rao 
> 
> Looks basically ok, but there are still some details to address.
> 
> > ---
> >  hw/ppc/spapr.c | 31 +--
> >  1 file changed, 17 insertions(+), 14 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index ab3aab1..b589ed4 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1559,17 +1559,18 @@ static int htab_save_setup(QEMUFile *f, void 
> > *opaque)
> >  {
> >  sPAPRMachineState *spapr = opaque;
> >  
> > -/* "Iteration" header */
> > -qemu_put_be32(f, spapr->htab_shift);
> > +/* "Iteration" header: no-HPT or HPT size encoding */
> > +if (!spapr->htab_shift) {
> > +qemu_put_be32(f, -1);
> 
> We're already using htab_shift == 0 to represent no HPT in the runtime
> structure; we might as well do the same on the wire.  As a bonus it
> slightly simplifies the logic here.

A non-zero value of the iteration header (which is htab_shift) causes
htab_load() on the target to reallocate the HTAB.

A zero value of the iteration header is used by htab_save_iterate() and
htab_save_complete() to tell htab_load() not to freshly allocate the HTAB
on the target.

Hence we can't use the value 0 to mean no-HPT.

I have addressed the rest of the comments on asserts by ensuring that
those code paths are taken only when HPT is present. v5 has those
changes.

Regards,
Bharata.
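
The three header values described above can be summarised in a small
standalone sketch. The enum and function names are hypothetical and exist
only for illustration; the actual handling lives in htab_load(), which
reads the header with qemu_get_be32().

```c
#include <stdint.h>

/* Hypothetical labels for the three meanings of the iteration header. */
enum htab_hdr_kind {
    HTAB_HDR_NO_HPT,    /* -1: source guest has no HPT (e.g. radix) */
    HTAB_HDR_CONTINUE,  /*  0: keep the HTAB, more entries follow   */
    HTAB_HDR_ALLOCATE   /* >0: htab_shift, (re)allocate the HTAB    */
};

static enum htab_hdr_kind classify_htab_header(uint32_t section_hdr)
{
    if (section_hdr == (uint32_t)-1) {
        return HTAB_HDR_NO_HPT;
    }
    if (section_hdr == 0) {
        return HTAB_HDR_CONTINUE;
    }
    return HTAB_HDR_ALLOCATE;
}
```

Because 0 already means "continue with the existing table", a distinct
value such as -1 is needed to signal that no table exists at all.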

[Qemu-devel] [PATCH v5 0/2] ppc/spapr: Fix migration of radix guests

2017-06-07 Thread Bharata B Rao

This patchset fixes the migration of sPAPR radix guests.

Changes in v5
-------------
- Ensured that assert(kvm_enabled()) isn't touched in any HTAB savevm
  handlers except in htab_save_setup() where it is made conditional on
  htab_shift.

v4: https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg07058.html

Bharata B Rao (2):
  spapr: Add a "no HPT" encoding to HTAB migration stream
  spapr: Fix migration of Radix guests

 hw/ppc/spapr.c | 40 
 1 file changed, 36 insertions(+), 4 deletions(-)

-- 
2.7.4

[Qemu-devel] [PATCH v5 2/2] spapr: Fix migration of Radix guests

2017-06-07 Thread Bharata B Rao

Fix migration of radix guests by ensuring that we issue
KVM_PPC_CONFIGURE_V3_MMU for radix case post migration.

Reported-by: Nageswara R Sastry 
Signed-off-by: Bharata B Rao 
Reviewed-by: Suraj Jitindar Singh 
---
 hw/ppc/spapr.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index df27c5c..4a33c06 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1442,6 +1442,18 @@ static int spapr_post_load(void *opaque, int version_id)
 err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset);
 }
 
+if (spapr->patb_entry) {
+PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
+bool radix = !!(spapr->patb_entry & PATBE1_GR);
+bool gtse = !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE);
+
+err = kvmppc_configure_v3_mmu(cpu, radix, gtse, spapr->patb_entry);
+if (err) {
+error_report("Process table config unsupported by the host");
+return -EINVAL;
+}
+}
+
 return err;
 }
 
-- 
2.7.4

[Qemu-devel] [PATCH v5 1/2] spapr: Add a "no HPT" encoding to HTAB migration stream

2017-06-07 Thread Bharata B Rao

Add a "no HPT" encoding (using value -1) to the HTAB migration
stream (in the place of HPT size) when the guest doesn't allocate HPT.
This will help the target side to match target HPT with the source HPT
and thus enable successful migration.

Suggested-by: David Gibson 
Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 86e6228..df27c5c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1557,13 +1557,19 @@ static int htab_save_setup(QEMUFile *f, void *opaque)
 sPAPRMachineState *spapr = opaque;
 
 /* "Iteration" header */
-qemu_put_be32(f, spapr->htab_shift);
+if (!spapr->htab_shift) {
+qemu_put_be32(f, -1);
+} else {
+qemu_put_be32(f, spapr->htab_shift);
+}
 
 if (spapr->htab) {
 spapr->htab_save_index = 0;
 spapr->htab_first_pass = true;
 } else {
-assert(kvm_enabled());
+if (spapr->htab_shift) {
+assert(kvm_enabled());
+}
 }
 
 
@@ -1709,7 +1715,12 @@ static int htab_save_iterate(QEMUFile *f, void *opaque)
 int rc = 0;
 
 /* Iteration header */
-qemu_put_be32(f, 0);
+if (!spapr->htab_shift) {
+qemu_put_be32(f, -1);
+return 0;
+} else {
+qemu_put_be32(f, 0);
+}
 
 if (!spapr->htab) {
 assert(kvm_enabled());
@@ -1743,7 +1754,12 @@ static int htab_save_complete(QEMUFile *f, void *opaque)
 int fd;
 
 /* Iteration header */
-qemu_put_be32(f, 0);
+if (!spapr->htab_shift) {
+qemu_put_be32(f, -1);
+return 0;
+} else {
+qemu_put_be32(f, 0);
+}
 
 if (!spapr->htab) {
 int rc;
@@ -1787,6 +1803,10 @@ static int htab_load(QEMUFile *f, void *opaque, int 
version_id)
 
 section_hdr = qemu_get_be32(f);
 
+if (section_hdr == -1) {
+return 0;
+}
+
 if (section_hdr) {
 Error *local_err = NULL;
 
-- 
2.7.4

[Qemu-devel] [PATCH] nvdimm acpi: fix region format interface code

2017-06-07 Thread Haozhong Zhang

Per ACPI 6.2, section 5.2.25.6 and JEDEC Annex L Release 3, the
current region format interface code 0x201 indicates the block
addressed function interface 1, rather than a byte addressable
interface. Fix it by using 0x301 which indicates the byte addressable
no energy backed function interface 1.

Signed-off-by: Haozhong Zhang 
---
 hw/acpi/nvdimm.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 8e7d6ec034..b5734f5897 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -338,9 +338,10 @@ static void nvdimm_build_structure_dcr(GArray *structures, 
DeviceState *dev)
 nfit_dcr->revision_id = cpu_to_le16(1 /* Current Revision supported
  in ACPI 6.0 is 1. */);
 nfit_dcr->serial_number = cpu_to_le32(sn);
-nfit_dcr->fic = cpu_to_le16(0x201 /* Format Interface Code. See Chapter
- 2: NVDIMM Device Specific Method
- (DSM) in DSM Spec Rev1.*/);
+nfit_dcr->fic = cpu_to_le16(0x301 /* Format Interface Code:
+ Byte addressable, no energy backed.
+ See ACPI 6.2, sect 5.2.25.6 and
+ JEDEC Annex L Release 3. */);
 }
 
 static GArray *nvdimm_build_device_structure(void)
-- 
2.11.0

Re: [Qemu-devel] [PULL 01/17] migration: remove register_savevm()

2017-06-07 Thread Juan Quintela

Peter Maydell  wrote:
> On 6 June 2017 at 03:51, David Gibson  wrote:
>> From: Laurent Vivier 
>>
>> We can replace the four remaining calls of register_savevm() by
>> calls to register_savevm_live(). So we can remove the function and
>> as we don't allocate anymore the ops pointer with g_new0()
>> we don't have to free it then.
>>
>> Signed-off-by: Laurent Vivier 
>> Reviewed-by: Juan Quintela 
>> Signed-off-by: David Gibson 
>> ---
>>  hw/net/vmxnet3.c|  8 ++--
>>  hw/s390x/s390-skeys.c   |  9 +++--
>>  hw/s390x/s390-virtio-ccw.c  |  8 ++--
>>  include/migration/vmstate.h |  8 
>>  migration/savevm.c  | 16 
>>  slirp/slirp.c   |  8 ++--
>>  6 files changed, 25 insertions(+), 32 deletions(-)
>
> Great to see register_savevm() finally disappearing.
>
> Any chance of an update to docs/migration.txt, which still
> mentions register_savevm(), but on the other hand doesn't
> say anything about register_savevm_live() and unregister_savevm().
> (Doc comments in the .h file for those functions would be
> nice too...)

Ok, will take a look.

> Things that would be interesting to explain/document:
>  * what is special about vmxnet3 that makes it the only pci device
>that needs to use this rather than having a vmstate struct?

Will take a look.  vmxnet3 used to be a mess (in relation to migration).

>  * why does s390-skeys call the register function with a NULL
>pointer but the unregister pointer with a device pointer?

No clue, will leave that

> (Could we replace the uses of these which pass a dev pointer
> with vmstate structs and then drop the dev parameter?)

Not sure, have to take a look.

Thanks, Juan.

Re: [Qemu-devel] [PATCH v3 11/16] virtio-scsi: Request BLK_PERM_AIO_CONTEXT_CHANGE for dataplane

2017-06-07 Thread Fam Zheng

On Wed, 05/24 10:52, Fam Zheng wrote:
> blk_set_aio_context is audited by perm API, so follow the protocol and
> request for permission first.
> 
> Signed-off-by: Fam Zheng 
> ---
>  hw/scsi/virtio-scsi.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 46a3e3f..074e235 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -794,6 +794,10 @@ static void virtio_scsi_hotplug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  return;
>  }
>  virtio_scsi_acquire(s);
> +if (!blk_request_perm(sd->conf.blk, BLK_PERM_AIO_CONTEXT_CHANGE, 
> errp)) {

Inverted condition; should be s/!//.

Fam

> +virtio_scsi_release(s);
> +return;
> +}
>  blk_set_aio_context(sd->conf.blk, s->ctx);
>  virtio_scsi_release(s);
>  
> -- 
> 2.9.4
> 
>

Re: [Qemu-devel] [[PATCH V7] 03/11] migration: fix hardcoded function name in error report

2017-06-07 Thread no-reply

Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Type: series
Subject: [Qemu-devel] [[PATCH V7] 03/11] migration: fix hardcoded function name 
in error report
Message-id: 1496820931-27416-4-git-send-email-a.pereva...@samsung.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-mingw@fedora
time make docker-test-build@min-glib
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
68e091d 03/11] migration: fix hardcoded function name in error report

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-y__ux6w1/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-y__ux6w1/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY    RUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
gcc-4.4.7-17.el6.x86_64
git-1.7.1-4.el6_7.1.x86_64
glib2-devel-2.28.8-5.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=940f478d6aab
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix    /var/tmp/qemu-build/install
BIOS directory    /var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
C compiler        cc
Host C compiler   cc
C++ compiler      
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS       -I/usr/include/pixman-1   -I$(SRC_PATH)/dtc/libfdt -pthread 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels 
-Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make              make
install           install
python            python -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
pixman            system
SDL support       yes (1.2.14)
GTK support       no 
GTK GL support    no
VTE support       no 
TLS priority      NORMAL
GNUTLS support    no
GNUTLS rnd        no
libgcrypt         no
libgcrypt kdf     no
nettle            no 
nettle kdf        no
libtasn1          no
curses support    no
virgl support     no
curl support      no
mingw32 support   no
Audio drivers     oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS support    no
VNC support       yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support       no
brlapi support    no
bluez  support    no
Documentation     no
PIE               yes
vde support       no
netmap support    no
Linux AIO support no
ATTR/XATTR support yes
Install blobs     yes
KVM support       yes
HAX support       no
RDMA support      no
TCG interpreter

[Qemu-devel] [PATCH] hw/ppc/spapr: Adjust firmware name for PCI bridges

2017-06-07 Thread Thomas Huth

SLOF uses "pci" as name for PCI bridges nodes in the device tree instead
of "pci-bridges", so booting via bootindex from a device behind a PCI
bridge currently does not work since QEMU passes the wrong name in the
"qemu,boot-list" property. Fix it by changing the name of the PCI bridge
nodes to "pci" instead.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1459170
Signed-off-by: Thomas Huth 
---
 hw/ppc/spapr.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 91b4057..27b1f3c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2441,6 +2441,12 @@ static char *spapr_get_fw_dev_path(FWPathProvider *p, 
BusState *bus,
 return g_strdup_printf("disk@%"PRIX64, (uint64_t)id << 32);
 }
 
+if (g_str_equal("pci-bridge", qdev_fw_name(dev))) {
+/* SLOF uses "pci" instead of "pci-bridge" for PCI bridges */
+PCIDevice *pcidev = CAST(PCIDevice, dev, TYPE_PCI_DEVICE);
+return g_strdup_printf("pci@%x", PCI_SLOT(pcidev->devfn));
+}
+
 return NULL;
 }
 
-- 
1.8.3.1
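
The name built by the patch is "pci@" followed by the slot number in hex.
The standalone sketch below reproduces just that formatting step.
pci_slot_sketch mirrors the usual devfn layout (slot in bits 7:3, function
in bits 2:0) and format_bridge_fw_name is a hypothetical helper; the patch
itself uses PCI_SLOT() and g_strdup_printf().

```c
#include <stdint.h>
#include <stdio.h>

/* Slot number is held in bits 7:3 of devfn; bits 2:0 are the function. */
static unsigned pci_slot_sketch(uint8_t devfn)
{
    return (devfn >> 3) & 0x1f;
}

/*
 * Hypothetical helper: write the SLOF-style bridge node name into buf.
 * Returns the number of characters the full name needs, like snprintf().
 */
static int format_bridge_fw_name(char *buf, size_t len, uint8_t devfn)
{
    return snprintf(buf, len, "pci@%x", pci_slot_sketch(devfn));
}
```

A bridge at devfn 0x18 (slot 3, function 0) would therefore be named
"pci@3".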

Re: [Qemu-devel] [RFC PATCH 8/8] iommu: introduce hw/core/iommu

2017-06-07 Thread Peter Xu

On Wed, Jun 07, 2017 at 07:51:55AM +, Liu, Yi L wrote:
> Hi Peter,
> 
> Some updates on it.
> 
> > -Original Message-
> > From: Peter Xu [mailto:pet...@redhat.com]
> > Sent: Thursday, April 27, 2017 5:34 PM
> > To: qemu-devel@nongnu.org
> > Cc: Lan, Tianyu ; Paolo Bonzini ;
> > Tian, Kevin ; Liu, Yi L ;
> > pet...@redhat.com; Jason Wang ; David Gibson
> > ; Alex Williamson 
> > Subject: [RFC PATCH 8/8] iommu: introduce hw/core/iommu
> > 
> > Time to consider a common stuff for IOMMU. Let's start from an common IOMMU
> > object (which should be inlayed in custom IOMMU implementations) and a 
> > notifier
> > mechanism.
> > 
> > Let VT-d IOMMU be the first user.
> > 
> > An example to use this per-iommu notifier:
> > 
> >   (when registering)
> >   iommu = address_space_iommu_get(pci_device_iommu_address_space(dev));
> >   notifier = iommu_notifier_register(iommu, IOMMU_EVENT_SVM_PASID, func);
> >   ...
> >   (when notify)
> >   IOMMUEvent event = { .type = IOMMU_EVENT_SVM_PASID ... };
> >   iommu_notify(iommu, &event);
> >   ...
> >   (when releasing)
> >   iommu_notifier_unregister(notifier);
> >   notifier = NULL;
> > 
> > Signed-off-by: Peter Xu 
> > ---
> 
> [...]
> 
> > +#include "qemu/osdep.h"
> > +#include "hw/core/iommu.h"
> > +
> > +IOMMUNotifier *iommu_notifier_register(IOMMUObject *iommu,
> > +   IOMMUNotifyFn fn,
> > +   uint64_t event_mask) {
> > +IOMMUNotifier *notifier = g_new0(IOMMUNotifier, 1);
> 
> For this part, I think may need to consider to alloc the memory in a
> similar way with IOMMUMRNotifier. The notifier surely needs to
> be connect with vfio container so that it could manipulate
> pIOMMU through vfio IOCTL.

Hmm yes. Or we can add one more parameter for it?

IOMMUNotifier *iommu_notifier_register(IOMMUObject *iommu,
                                       IOMMUNotifyFn fn,
                                       void *private,
                                       uint64_t event_mask);

Both work imho.
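
To make the "extra parameter" option concrete, here is a standalone
sketch of a notifier that carries an opaque pointer back to its owner.
All Sketch* and sketch_* names are hypothetical and only illustrate the
pattern; they are not the types or functions from the RFC series, and the
IOMMU object argument is left out to keep the example short.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct SketchNotifier SketchNotifier;
typedef void (*SketchNotifyFn)(SketchNotifier *n, uint64_t event,
                               void *private_data);

struct SketchNotifier {
    SketchNotifyFn fn;
    void *private_data;     /* e.g. a pointer to the owning container */
    uint64_t event_mask;
};

/* Allocate a notifier and remember the caller's opaque pointer. */
static SketchNotifier *sketch_notifier_register(SketchNotifyFn fn,
                                                void *private_data,
                                                uint64_t event_mask)
{
    SketchNotifier *n = calloc(1, sizeof(*n));

    if (n == NULL) {
        return NULL;
    }
    n->fn = fn;
    n->private_data = private_data;
    n->event_mask = event_mask;
    return n;
}

/* Deliver an event only if the notifier subscribed to it. */
static void sketch_notify(SketchNotifier *n, uint64_t event)
{
    if (n->event_mask & event) {
        n->fn(n, event, n->private_data);
    }
}

/* Example callback: count deliveries in the int behind private_data. */
static void sketch_count_event(SketchNotifier *n, uint64_t event,
                               void *private_data)
{
    (void)n;
    (void)event;
    ++*(int *)private_data;
}
```

The alternative discussed in the quoted mail, embedding the notifier in a
larger struct, reaches the owner through the notifier's own address
instead of a stored pointer.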

> 
> I'm thinking of adding a new struct VFIOGuestIOMMUObject which
> is similar to strcut VFIOGuestIOMMU. And have the original struct
> VFIOGuestIOMMU modified to be VFIOGuestIOMMUMR.
> 
> Then there would be following definition in
> "include\hw\vfio\vfio-common.h":
> 
> typedef struct VFIOGuestIOMMUObject {
> VFIOContainer *container;
> IOMMUObject *iommu;
> IOMMUNotifier n; // n is for non-MemoryRegion related events, e.g. pasid 
> table binding
> QLIST_ENTRY(VFIOGuestIOMMUObject) giommu_next;
> } VFIOGuestIOMMUObject;
> 
> typedef struct VFIOGuestIOMMUMR {
> VFIOContainer *container;
> MemoryRegion *iommu;
> hwaddr iommu_offset;
> IOMMUNotifier n;
> QLIST_ENTRY(VFIOGuestIOMMUMR) giommu_next;
> } VFIOGuestIOMMUMR;
> 
> How about your opinion?

Currently this series "blocks" on the assumption that "for each
address space we have one IOMMU backend". If you look at this series, it
does not do very much: it tries to get the IOMMU object for a
device, then adds a notifier to it. David proposed a possible model
that contradicts this: say, is it possible that there is more than one
IOMMU behind one device? So before we move on to "how VFIO will
deal with this interface", we may need to first settle on how
we can provide an IOMMU-oriented notifier that no one dislikes.

IMHO this series may still be okay before we move on to more
complicated IOMMU models, however I cannot really persuade David since
my reasoning is not strong enough. :-)

Maybe you can come up with something better that everyone will like?
Any thoughts?

(I think I'll give my "investigate system IOMMU hierarchy" TODO item
 a higher priority - imho we need to know exactly all the scenarios
 of IOMMU usage, like nested IOMMUs? parallel IOMMUs to separate
 translation windows? etc. Only after that can we finally provide a
 general and good vIOMMU framework for QEMU; then the notifier thing
 should be far easier.)

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCHv2 01/04] colo-compare: Use IOThread context timer to Check old packet regularly

2017-06-07 Thread Jason Wang




On 2017年06月05日 18:44, Yong Wang wrote:

From: Wang Yong

Remove the task which checks old packets in the comparing thread,
then use the IOThread context timer to handle it.

Signed-off-by: Wang Yong
Signed-off-by: Wang Guang
---
  net/colo-compare.c | 62 +++---
  1 file changed, 50 insertions(+), 12 deletions(-)


I suggest squashing this into patch 2, since we will have 2 threads now.

Thanks

[Qemu-devel] [PATCH qemu v7] memory/iommu: QOM'fy IOMMU MemoryRegion

2017-06-07 Thread Alexey Kardashevskiy

This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion
as a parent.

This moves IOMMU-related fields from MR to IOMMU MR. However, to avoid
dynamic QOM casting in the fast path (address_space_translate, etc),
this adds an @is_iommu boolean flag to MR and provides a new helper to
do a simple cast to IOMMU MR - memory_region_get_iommu. The flag
is set in the instance init callback. This defines
memory_region_is_iommu as memory_region_get_iommu()!=NULL.

This switches MemoryRegion to IOMMUMemoryRegion in most places except
the ones where MemoryRegion may be an alias.

This defines memory_region_init_iommu_type() to allow creating
IOMMUMemoryRegion subclasses. In order to support custom QOM type,
this splits memory_region_init() to object_initialize() +
memory_region_do_init.
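
The flag-plus-cast idea can be sketched without QOM as follows. The
Sketch* types and sketch_* functions are hypothetical simplifications for
illustration only; they assume the parent struct is the first member of
the subclass, and they ignore aliases and everything else the real
MemoryRegion carries.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified parent type: only the flag that marks IOMMU regions. */
typedef struct SketchMR {
    bool is_iommu;
} SketchMR;

/* Simplified subclass: the parent must be the first member. */
typedef struct SketchIOMMUMR {
    SketchMR parent_obj;
    int iommu_state;        /* placeholder for IOMMU-only fields */
} SketchIOMMUMR;

/* Cheap cast for the fast path: one flag test, no dynamic type check. */
static SketchIOMMUMR *sketch_get_iommu(SketchMR *mr)
{
    return mr->is_iommu ? (SketchIOMMUMR *)mr : NULL;
}

/* "Is it an IOMMU region?" defined in terms of the cast helper. */
static bool sketch_is_iommu(SketchMR *mr)
{
    return sketch_get_iommu(mr) != NULL;
}
```

The flag trades one byte per region for avoiding a type lookup on every
translation.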

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v7:
* rebased on top of the current upstream

v6:
* s/\/iommu_mr/g

v5:
* fixed sparc64, first time in many years did run "./configure" without
--target-list :-D Sorry for the noise though :(

v4:
* fixed alpha, mips64el and sparc

v3:
* rebased on sha1 81b2d5ceb0

v2:
* added mr->is_iommu
* updated i386/x86_64/s390/sun
---
 hw/s390x/s390-pci-bus.h   |   2 +-
 include/exec/memory.h |  76 +++---
 include/hw/i386/intel_iommu.h |   2 +-
 include/hw/mips/mips.h|   2 +-
 include/hw/ppc/spapr.h|   3 +-
 include/hw/vfio/vfio-common.h |   2 +-
 include/qemu/typedefs.h   |   1 +
 exec.c|  12 ++--
 hw/alpha/typhoon.c|   8 ++-
 hw/dma/rc4030.c   |   8 +--
 hw/i386/amd_iommu.c   |   9 +--
 hw/i386/intel_iommu.c |  17 +++---
 hw/mips/mips_jazz.c   |   2 +-
 hw/pci-host/apb.c |   6 +-
 hw/ppc/spapr_iommu.c  |  20 ---
 hw/s390x/s390-pci-bus.c   |   6 +-
 hw/s390x/s390-pci-inst.c  |   8 +--
 hw/vfio/common.c  |  12 ++--
 hw/vfio/spapr.c   |   3 +-
 memory.c  | 124 +-
 20 files changed, 209 insertions(+), 114 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index cf142a3e68..6a599ed353 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -266,7 +266,7 @@ typedef struct S390PCIIOMMU {
 S390PCIBusDevice *pbdev;
 AddressSpace as;
 MemoryRegion mr;
-MemoryRegion iommu_mr;
+IOMMUMemoryRegion iommu_mr;
 bool enabled;
 uint64_t g_iota;
 uint64_t pba;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 80e605a96a..563453ff9e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -35,6 +35,10 @@
 #define MEMORY_REGION(obj) \
 OBJECT_CHECK(MemoryRegion, (obj), TYPE_MEMORY_REGION)
 
+#define TYPE_IOMMU_MEMORY_REGION "qemu:iommu-memory-region"
+#define IOMMU_MEMORY_REGION(obj) \
+OBJECT_CHECK(IOMMUMemoryRegion, (obj), TYPE_IOMMU_MEMORY_REGION)
+
 typedef struct MemoryRegionOps MemoryRegionOps;
 typedef struct MemoryRegionMmio MemoryRegionMmio;
 
@@ -189,16 +193,16 @@ struct MemoryRegionIOMMUOps {
  * set flag to IOMMU_NONE to mean that we don't need any
  * read/write permission checks, like, when for region replay.
  */
-IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr,
+IOMMUTLBEntry (*translate)(IOMMUMemoryRegion *iommu, hwaddr addr,
IOMMUAccessFlags flag);
 /* Returns minimum supported page size */
-uint64_t (*get_min_page_size)(MemoryRegion *iommu);
+uint64_t (*get_min_page_size)(IOMMUMemoryRegion *iommu);
 /* Called when IOMMU Notifier flag changed */
-void (*notify_flag_changed)(MemoryRegion *iommu,
+void (*notify_flag_changed)(IOMMUMemoryRegion *iommu,
 IOMMUNotifierFlag old_flags,
 IOMMUNotifierFlag new_flags);
 /* Set this up to provide customized IOMMU replay function */
-void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
+void (*replay)(IOMMUMemoryRegion *iommu, IOMMUNotifier *notifier);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -220,7 +224,6 @@ struct MemoryRegion {
 uint8_t dirty_log_mask;
 RAMBlock *ram_block;
 Object *owner;
-const MemoryRegionIOMMUOps *iommu_ops;
 
 const MemoryRegionOps *ops;
 void *opaque;
@@ -243,6 +246,13 @@ struct MemoryRegion {
 const char *name;
 unsigned ioeventfd_nb;
 MemoryRegionIoeventfd *ioeventfds;
+bool is_iommu;
+};
+
+struct IOMMUMemoryRegion {
+MemoryRegion parent_obj;
+
+const MemoryRegionIOMMUOps *iommu_ops;
 QLIST_HEAD(, IOMMUNotifier) iommu_notify;
 IOMMUNotifierFlag iommu_notify_flags;
 };
@@ -583,19 +593,40 @@ static inline void 
memory_region_init_reservation(MemoryRegion *mr,
 }
 
 /**
+ * memory_region_init_iommu_type: Initialize a memory region of a custom type
+ * that translates addresses
+ *
+ * An IOMMU region translates addresses and forwards acce

Re: [Qemu-devel] [PATCHv2 02/04] colo-compare: Process packets in the IOThread of the primary

2017-06-07 Thread Jason Wang




On 2017-06-05 18:44, Yong Wang wrote:

From: Wang Yong 

Process packets that arrive over the socket in the IOThread.
We use qio_channel_set_aio_fd_handler to set the handlers on the
IOThread AioContext. The packets from the primary and the secondary
are then processed in the IOThread.
Finally, remove the colo-compare thread and use the IOThread instead.

Signed-off-by: Wang Yong
Signed-off-by: Wang Guang
---
  net/colo-compare.c | 133 -
  net/colo.h |   1 +
  2 files changed, 91 insertions(+), 43 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index b0942a4..e3af791 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -29,6 +29,7 @@
  #include "qemu/sockets.h"
  #include "qapi-visit.h"
  #include "net/colo.h"
+#include "io/channel.h"
  #include "sysemu/iothread.h"
  
  #define TYPE_COLO_COMPARE "colo-compare"

@@ -82,11 +83,6 @@ typedef struct CompareState {
  GQueue conn_list;
  /* hashtable to save connection */
  GHashTable *connection_track_table;
-/* compare thread, a thread for each NIC */
-QemuThread thread;
-
-GMainContext *worker_context;
-GMainLoop *compare_loop;
  
  /*compare iothread*/

  IOThread *iothread;
@@ -95,6 +91,14 @@ typedef struct CompareState {
  QEMUTimer *packet_check_timer;
  } CompareState;
  
+typedef struct {

+Chardev parent;
+QIOChannel *ioc; /*I/O channel */


We probably don't want to manipulate the char backend's internal io channel.
All we need here is access to the frontend API (char-fe.c), I believe, so
that the internal implementation stays hidden.



+} CompareChardev;
+
+#define COMPARE_CHARDEV(obj) \
+OBJECT_CHECK(CompareChardev, (obj), TYPE_CHARDEV_SOCKET)
+
  typedef struct CompareClass {
  ObjectClass parent_class;
  } CompareClass;
@@ -107,6 +111,12 @@ enum {
  static int compare_chr_send(CharBackend *out,
  const uint8_t *buf,
  uint32_t size);
+static void compare_chr_set_aio_fd_handlers(CharBackend *b,
+AioContext *ctx,
+IOCanReadHandler *fd_can_read,
+IOReadHandler *fd_read,
+IOEventHandler *fd_event,
+void *opaque);
  
  static gint seq_sorter(Packet *a, Packet *b, gpointer data)

  {
@@ -534,6 +544,30 @@ err:
  return ret < 0 ? ret : -EIO;
  }
  
+static void compare_chr_read(void *opaque)

+{
+Chardev *chr = opaque;
+uint8_t buf[CHR_READ_BUF_LEN];
+int len, size;
+int max_size;
+
+max_size = qemu_chr_be_can_write(chr);
+if (max_size <= 0) {
+return;
+}
+
+len = sizeof(buf);
+if (len > max_size) {
+len = max_size;
+}
+size = CHARDEV_GET_CLASS(chr)->chr_sync_read(chr, (void *)buf, len);
+if (size == 0) {
+return;
+} else if (size > 0) {
+qemu_chr_be_write(chr, buf, size);
+}
+}
+
  static int compare_chr_can_read(void *opaque)
  {
  return COMPARE_READ_LEN_MAX;
@@ -550,8 +584,8 @@ static void compare_pri_chr_in(void *opaque, const uint8_t 
*buf, int size)
  
  ret = net_fill_rstate(&s->pri_rs, buf, size);

  if (ret == -1) {
-qemu_chr_fe_set_handlers(&s->chr_pri_in, NULL, NULL, NULL,
- NULL, NULL, true);
+compare_chr_set_aio_fd_handlers(&s->chr_pri_in, s->ctx,
+NULL, NULL, NULL, NULL);
  error_report("colo-compare primary_in error");
  }
  }
@@ -567,8 +601,8 @@ static void compare_sec_chr_in(void *opaque, const uint8_t 
*buf, int size)
  
  ret = net_fill_rstate(&s->sec_rs, buf, size);

  if (ret == -1) {
-qemu_chr_fe_set_handlers(&s->chr_sec_in, NULL, NULL, NULL,
- NULL, NULL, true);
+compare_chr_set_aio_fd_handlers(&s->chr_sec_in, s->ctx,
+NULL, NULL, NULL, NULL);
  error_report("colo-compare secondary_in error");
  }
  }
@@ -605,34 +639,57 @@ static void colo_compare_timer_del(CompareState *s)
  }
  }
  
-static void *colo_compare_thread(void *opaque)

-{
-CompareState *s = opaque;
-
-s->worker_context = g_main_context_new();
-
-qemu_chr_fe_set_handlers(&s->chr_pri_in, compare_chr_can_read,
-  compare_pri_chr_in, NULL, s, s->worker_context, 
true);
-qemu_chr_fe_set_handlers(&s->chr_sec_in, compare_chr_can_read,
-  compare_sec_chr_in, NULL, s, s->worker_context, 
true);
-
-s->compare_loop = g_main_loop_new(s->worker_context, FALSE);
-
-g_main_loop_run(s->compare_loop);
-
-g_main_loop_unref(s->compare_loop);
-g_main_context_unref(s->worker_context);
-return NULL;
-}
  
  static void colo_compare_iothread(CompareState *s)

  {
  object_ref(OBJECT(s->iothread));
  s->ctx

[Qemu-devel] [RFC 1/8] update-linux-headers: import virtio_iommu.h

2017-06-07 Thread Eric Auger

Update the script to update the virtio_iommu.h header.

Signed-off-by: Eric Auger 
---
 scripts/update-linux-headers.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 2f906c4..03f6712 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -134,6 +134,9 @@ EOF
 cat <<EOF >$output/linux-headers/linux/virtio_config.h
 #include "standard-headers/linux/virtio_config.h"
 EOF
+cat <<EOF >$output/linux-headers/linux/virtio_iommu.h
+#include "standard-headers/linux/virtio_iommu.h"
+EOF
 cat <<EOF >$output/linux-headers/linux/virtio_ring.h
 #include "standard-headers/linux/virtio_ring.h"
 EOF
-- 
2.5.5

[Qemu-devel] [RFC 2/8] linux-headers: Update for virtio-iommu

2017-06-07 Thread Eric Auger

This is a partial linux header update against Jean-Philippe's branch:
git://linux-arm.org/linux-jpb.git virtio-iommu/base (unstable)

Signed-off-by: Eric Auger 
---
 include/standard-headers/linux/virtio_ids.h   |   1 +
 include/standard-headers/linux/virtio_iommu.h | 142 ++
 linux-headers/linux/virtio_iommu.h|   1 +
 3 files changed, 144 insertions(+)
 create mode 100644 include/standard-headers/linux/virtio_iommu.h
 create mode 100644 linux-headers/linux/virtio_iommu.h

diff --git a/include/standard-headers/linux/virtio_ids.h 
b/include/standard-headers/linux/virtio_ids.h
index 6d5c3b2..934ed3d 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -43,5 +43,6 @@
 #define VIRTIO_ID_INPUT        18 /* virtio input */
 #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
+#define VIRTIO_ID_IOMMU        61216 /* virtio IOMMU (temporary) */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/standard-headers/linux/virtio_iommu.h 
b/include/standard-headers/linux/virtio_iommu.h
new file mode 100644
index 000..e139587
--- /dev/null
+++ b/include/standard-headers/linux/virtio_iommu.h
@@ -0,0 +1,142 @@
+/*
+ * Copyright (C) 2017 ARM Ltd.
+ *
+ * This header is BSD licensed so anyone can use the definitions
+ * to implement compatible drivers/servers:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of ARM Ltd. nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#ifndef _LINUX_VIRTIO_IOMMU_H
+#define _LINUX_VIRTIO_IOMMU_H
+
+/* Feature bits */
+#define VIRTIO_IOMMU_F_INPUT_RANGE 0
+#define VIRTIO_IOMMU_F_IOASID_BITS 1
+#define VIRTIO_IOMMU_F_MAP_UNMAP   2
+#define VIRTIO_IOMMU_F_BYPASS  3
+
+QEMU_PACKED
+struct virtio_iommu_config {
+   /* Supported page sizes */
+   uint64_t page_sizes;
+   struct virtio_iommu_range {
+       uint64_t start;
+       uint64_t end;
+   } input_range;
+   uint8_t ioasid_bits;
+};
+
+/* Request types */
+#define VIRTIO_IOMMU_T_ATTACH  0x01
+#define VIRTIO_IOMMU_T_DETACH  0x02
+#define VIRTIO_IOMMU_T_MAP 0x03
+#define VIRTIO_IOMMU_T_UNMAP   0x04
+
+/* Status types */
+#define VIRTIO_IOMMU_S_OK  0x00
+#define VIRTIO_IOMMU_S_IOERR   0x01
+#define VIRTIO_IOMMU_S_UNSUPP  0x02
+#define VIRTIO_IOMMU_S_DEVERR  0x03
+#define VIRTIO_IOMMU_S_INVAL   0x04
+#define VIRTIO_IOMMU_S_RANGE   0x05
+#define VIRTIO_IOMMU_S_NOENT   0x06
+#define VIRTIO_IOMMU_S_FAULT   0x07
+
+QEMU_PACKED
+struct virtio_iommu_req_head {
+   uint8_t type;
+   uint8_t reserved[3];
+};
+
+QEMU_PACKED
+struct virtio_iommu_req_tail {
+   uint8_t status;
+   uint8_t reserved[3];
+};
+
+QEMU_PACKED
+struct virtio_iommu_req_attach {
+   struct virtio_iommu_req_head head;
+
+   uint32_t address_space;
+   uint32_t device;
+   uint32_t reserved;
+
+   struct virtio_iommu_req_tail

[Qemu-devel] [RFC 0/8] VIRTIO-IOMMU device

2017-06-07 Thread Eric Auger

This series implements the virtio-iommu device. This is a proof
of concept based on the virtio-iommu specification written by
Jean-Philippe Brucker [1]. This was tested with a guest using
the virtio-iommu driver [2] and exposed with a virtio-net-pci
using dma ops.

The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with the ARM virt machine only, as the machine
must handle the DT binding between the virtio-mmio "iommu" node and
the PCI host bridge node. ACPI booting is not yet supported.

This should make it possible to start benchmarking against a
purely emulated IOMMU (especially the ARM SMMU).

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/virtio-iommu-rfcv1

References:
[1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU,
[2] [RFC PATCH linux] iommu: Add virtio-iommu driver
[3] [RFC PATCH kvmtool 00/15] Add virtio-iommu

Eric Auger (8):
  update-linux-headers: import virtio_iommu.h
  linux-headers: Update for virtio-iommu
  virtio_iommu: add skeleton
  virtio-iommu: Decode the command payload
  virtio_iommu: Add the iommu regions
  virtio-iommu: Implement the translation and commands
  hw/arm/virt: Add 2.10 machine type
  hw/arm/virt: Add virtio-iommu to the virt board

 hw/arm/virt.c | 116 -
 hw/virtio/Makefile.objs   |   1 +
 hw/virtio/trace-events|  14 +
 hw/virtio/virtio-iommu.c  | 623 ++
 include/hw/arm/virt.h |   5 +
 include/hw/virtio/virtio-iommu.h  |  60 +++
 include/standard-headers/linux/virtio_ids.h   |   1 +
 include/standard-headers/linux/virtio_iommu.h | 142 ++
 linux-headers/linux/virtio_iommu.h|   1 +
 scripts/update-linux-headers.sh   |   3 +
 10 files changed, 957 insertions(+), 9 deletions(-)
 create mode 100644 hw/virtio/virtio-iommu.c
 create mode 100644 include/hw/virtio/virtio-iommu.h
 create mode 100644 include/standard-headers/linux/virtio_iommu.h
 create mode 100644 linux-headers/linux/virtio_iommu.h

-- 
2.5.5

[Qemu-devel] [RFC 3/8] virtio_iommu: add skeleton

2017-06-07 Thread Eric Auger

This patch adds the skeleton for the virtio-iommu device.

Signed-off-by: Eric Auger 
---
 hw/virtio/Makefile.objs  |   1 +
 hw/virtio/virtio-iommu.c | 247 +++
 include/hw/virtio/virtio-iommu.h |  60 ++
 3 files changed, 308 insertions(+)
 create mode 100644 hw/virtio/virtio-iommu.c
 create mode 100644 include/hw/virtio/virtio-iommu.h

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 765d363..8967a4a 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-y += virtio-mmio.o
 
 obj-y += virtio.o virtio-balloon.o 
 obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
+obj-$(CONFIG_LINUX) += virtio-iommu.o
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
 obj-y += virtio-crypto.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
new file mode 100644
index 000..86129ef
--- /dev/null
+++ b/hw/virtio/virtio-iommu.c
@@ -0,0 +1,247 @@
+/*
+ * virtio-iommu device
+ *
+ * Copyright (c) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iov.h"
+#include "qemu-common.h"
+#include "hw/virtio/virtio.h"
+#include "sysemu/kvm.h"
+#include "qapi-event.h"
+#include "trace.h"
+
+#include "standard-headers/linux/virtio_ids.h"
+#include 
+
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-iommu.h"
+
+/* Max size */
+#define VIOMMU_DEFAULT_QUEUE_SIZE 256
+
+static int virtio_iommu_handle_attach(VirtIOIOMMU *s,
+  struct iovec *iov,
+  unsigned int iov_cnt)
+{
+return -ENOENT;
+}
+static int virtio_iommu_handle_detach(VirtIOIOMMU *s,
+  struct iovec *iov,
+  unsigned int iov_cnt)
+{
+return -ENOENT;
+}
+static int virtio_iommu_handle_map(VirtIOIOMMU *s,
+   struct iovec *iov,
+   unsigned int iov_cnt)
+{
+return -ENOENT;
+}
+static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
+ struct iovec *iov,
+ unsigned int iov_cnt)
+{
+return -ENOENT;
+}
+
+static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
+VirtQueueElement *elem;
+struct virtio_iommu_req_head head;
+struct virtio_iommu_req_tail tail;
+unsigned int iov_cnt;
+struct iovec *iov;
+size_t sz;
+
+for (;;) {
+elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+if (!elem) {
+return;
+}
+
+if (iov_size(elem->in_sg, elem->in_num) < sizeof(tail) ||
+iov_size(elem->out_sg, elem->out_num) < sizeof(head)) {
+virtio_error(vdev, "virtio-iommu erroneous head or tail");
+virtqueue_detach_element(vq, elem, 0);
+g_free(elem);
+break;
+}
+
+iov_cnt = elem->out_num;
+iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
+sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
+if (sz != sizeof(head)) {
+tail.status = VIRTIO_IOMMU_S_UNSUPP;
+}
+qemu_mutex_lock(&s->mutex);
+switch (head.type) {
+case VIRTIO_IOMMU_T_ATTACH:
+tail.status = virtio_iommu_handle_attach(s, iov, iov_cnt);
+break;
+case VIRTIO_IOMMU_T_DETACH:
+tail.status = virtio_iommu_handle_detach(s, iov, iov_cnt);
+break;
+case VIRTIO_IOMMU_T_MAP:
+tail.status = virtio_iommu_handle_map(s, iov, iov_cnt);
+break;
+case VIRTIO_IOMMU_T_UNMAP:
+tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
+break;
+default:
+tail.status = VIRTIO_IOMMU_S_UNSUPP;
+}
+qemu_mutex_unlock(&s->mutex);
+
+sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
+  &tail, sizeof(tail));
+assert(sz == sizeof(tail));
+
+virtqueue_push(vq, elem, sizeof(tail));
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
+static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
+{
+VirtI

Re: [Qemu-devel] [PATCHv2 04/04] colo-compare: Update the COLO document to fix the processing of secondary packets in the main thread

2017-06-07 Thread Jason Wang




On 2017-06-05 18:44, Yong Wang wrote:

From: Wang Yong 

In my test, the secondary does not do the packet comparison in the IOThread
but in the QEMU main thread.
The secondary's configuration is
" -chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait".
Here we should use 'wait' instead of 'nowait'.
With the 'wait' configuration, packets from the secondary are processed
in the IOThread, not in the QEMU main thread.

Signed-off-by: Wang Yong
Signed-off-by: Wang Guang
---


This is probably a hint that something is wrong in the code. We need to
figure out the root cause.


Thanks


  docs/colo-proxy.txt | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/colo-proxy.txt b/docs/colo-proxy.txt
index ce3f783..567cc8b 100644
--- a/docs/colo-proxy.txt
+++ b/docs/colo-proxy.txt
@@ -165,7 +165,7 @@ Primary(ip:3.3.3.3):
  -netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
  -device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
  -chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
--chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
+-chardev socket,id=compare1,host=3.3.3.3,port=9004,server,wait
  -chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
  -chardev socket,id=compare0-0,host=3.3.3.3,port=9001
  -chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait

[Qemu-devel] [RFC 4/8] virtio-iommu: Decode the command payload

2017-06-07 Thread Eric Auger

This patch adds the command payload decoding and
introduces the functions that will do the actual
command handling. Those functions are not yet implemented.
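
For readers following the decoding step, here is a minimal standalone sketch of the size check described above: the payload is the request structure minus its tail, and a short read is reported as an invalid request. It is not the patch's code. All sketch_* names are hypothetical, the structure only mirrors the attach request from patch 2/8, and the little-endian conversion that the patch does with le32_to_cpu() is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Status values mirroring VIRTIO_IOMMU_S_OK / VIRTIO_IOMMU_S_INVAL. */
enum { SKETCH_S_OK = 0x00, SKETCH_S_INVAL = 0x04 };

struct sketch_req_head { uint8_t type;   uint8_t reserved[3]; };
struct sketch_req_tail { uint8_t status; uint8_t reserved[3]; };

/* Hypothetical stand-in for struct virtio_iommu_req_attach. */
struct sketch_req_attach {
    struct sketch_req_head head;
    uint32_t address_space;
    uint32_t device;
    uint32_t reserved;
    struct sketch_req_tail tail;
};

/* Copy up to len bytes out of a scatter-gather list; return bytes copied. */
static size_t sketch_iov_to_buf(const struct iovec *iov, unsigned int iov_cnt,
                                void *buf, size_t len)
{
    size_t done = 0;

    for (unsigned int i = 0; i < iov_cnt && done < len; i++) {
        size_t n = iov[i].iov_len;

        if (n > len - done) {
            n = len - done;
        }
        memcpy((uint8_t *)buf + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}

/* Fill *req from the guest buffers; the tail is written by the device. */
static int sketch_decode_attach(const struct iovec *iov, unsigned int iov_cnt,
                                struct sketch_req_attach *req)
{
    size_t payload_sz = sizeof(*req) - sizeof(struct sketch_req_tail);

    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    if (sketch_iov_to_buf(iov, iov_cnt, req, payload_sz) != payload_sz) {
        return SKETCH_S_INVAL;   /* request shorter than head + payload */
    }
    return SKETCH_S_OK;
}
```

Subtracting the tail size works here because the tail is the last member and the structure has no padding; a layout with trailing padding would need offsetof() instead.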

Signed-off-by: Eric Auger 
---
 hw/virtio/trace-events   |  7 
 hw/virtio/virtio-iommu.c | 97 ++--
 2 files changed, 100 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index e24d8fa..fba1da6 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -25,3 +25,10 @@ virtio_balloon_handle_output(const char *name, uint64_t gpa) 
"section name: %s g
 virtio_balloon_get_config(uint32_t num_pages, uint32_t actual) "num_pages: %d 
actual: %d"
 virtio_balloon_set_config(uint32_t actual, uint32_t oldactual) "actual: %d 
oldactual: %d"
 virtio_balloon_to_target(uint64_t target, uint32_t num_pages) "balloon target: 
%"PRIx64" num_pages: %d"
+
+# hw/virtio/virtio-iommu.c
+#
+virtio_iommu_attach(uint32_t as, uint32_t dev, uint32_t flags) "as=%d dev=%d flags=%d"
+virtio_iommu_detach(uint32_t dev, uint32_t flags) "dev=%d flags=%d"
+virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr, uint64_t size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64" virt_addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d"
+virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size, uint32_t reserved) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64" reserved=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 86129ef..ea1caa7 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -35,29 +35,118 @@
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
+static int virtio_iommu_attach(VirtIOIOMMU *s,
+   struct virtio_iommu_req_attach *req)
+{
+uint32_t asid = le32_to_cpu(req->address_space);
+uint32_t devid = le32_to_cpu(req->device);
+uint32_t reserved = le32_to_cpu(req->reserved);
+
+trace_virtio_iommu_attach(asid, devid, reserved);
+
+return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_detach(VirtIOIOMMU *s,
+   struct virtio_iommu_req_detach *req)
+{
+uint32_t devid = le32_to_cpu(req->device);
+uint32_t reserved = le32_to_cpu(req->reserved);
+
+trace_virtio_iommu_detach(devid, reserved);
+
+return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_map(VirtIOIOMMU *s,
+struct virtio_iommu_req_map *req)
+{
+uint32_t asid = le32_to_cpu(req->address_space);
+uint64_t phys_addr = le64_to_cpu(req->phys_addr);
+uint64_t virt_addr = le64_to_cpu(req->virt_addr);
+uint64_t size = le64_to_cpu(req->size);
+uint32_t flags = le32_to_cpu(req->flags);
+
+trace_virtio_iommu_map(asid, phys_addr, virt_addr, size, flags);
+
+return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static int virtio_iommu_unmap(VirtIOIOMMU *s,
+  struct virtio_iommu_req_unmap *req)
+{
+uint32_t asid = le32_to_cpu(req->address_space);
+uint64_t virt_addr = le64_to_cpu(req->virt_addr);
+uint64_t size = le64_to_cpu(req->size);
+uint32_t flags = le32_to_cpu(req->flags);
+
+trace_virtio_iommu_unmap(asid, virt_addr, size, flags);
+
+return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+#define get_payload_size(req) (\
+sizeof((req)) - sizeof(struct virtio_iommu_req_tail))
+
 static int virtio_iommu_handle_attach(VirtIOIOMMU *s,
   struct iovec *iov,
   unsigned int iov_cnt)
 {
-return -ENOENT;
+struct virtio_iommu_req_attach req;
+size_t sz, payload_sz;
+
+payload_sz = get_payload_size(req);
+
+sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+if (sz != payload_sz) {
+return VIRTIO_IOMMU_S_INVAL;
+}
+return virtio_iommu_attach(s, &req);
 }
 static int virtio_iommu_handle_detach(VirtIOIOMMU *s,
   struct iovec *iov,
   unsigned int iov_cnt)
 {
-return -ENOENT;
+struct virtio_iommu_req_detach req;
+size_t sz, payload_sz;
+
+payload_sz = get_payload_size(req);
+
+sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+if (sz != payload_sz) {
+return VIRTIO_IOMMU_S_INVAL;
+}
+return virtio_iommu_detach(s, &req);
 }
 static int virtio_iommu_handle_map(VirtIOIOMMU *s,
struct iovec *iov,
unsigned int iov_cnt)
 {
-return -ENOENT;
+struct virtio_iommu_req_map req;
+size_t sz, payload_sz;
+
+payload_sz = get_payload_size(req);
+
+sz = iov_to_buf(iov, iov_cnt, 0, &req, payload_sz);
+if (sz != payload_sz) {
+return VIRTIO_IOMMU_S_INVAL;
+}
+return virtio_iommu_map(s, &req);
 }
 static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
  struct iovec *iov,
  unsigned int iov_cnt)
 {
-return -ENOENT;
+struct virti

[Qemu-devel] [RFC 5/8] virtio_iommu: Add the iommu regions

2017-06-07 Thread Eric Auger

This patch initializes the IOMMU memory regions so that
PCIe endpoint transactions get translated. The translation function
is not yet implemented at this stage.
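
As a side note on how an endpoint is identified in this patch: smmu_get_sid() builds a 16-bit ID from the PCI bus number and the devfn. The snippet below is a standalone illustration of that packing only. The sketch_* names are hypothetical, and the slot/function split of devfn is the usual PCI convention rather than something this patch defines.

```c
#include <stdint.h>

/* Usual PCI encoding: 5-bit slot (device) number, 3-bit function number. */
static inline uint8_t sketch_devfn(uint8_t slot, uint8_t func)
{
    return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}

/* Same packing as smmu_get_sid(): bus number in bits 15:8, devfn in 7:0. */
static inline uint16_t sketch_get_sid(uint8_t bus_num, uint8_t devfn)
{
    return (uint16_t)(((uint16_t)bus_num << 8) | devfn);
}
```

For example, slot 1 function 0 on bus 0 gives ID 0x0008, and slot 2 function 1 on bus 1 gives 0x0111.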

Signed-off-by: Eric Auger 
---
 hw/virtio/trace-events   |  1 +
 hw/virtio/virtio-iommu.c | 97 
 2 files changed, 98 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index fba1da6..341dbdf 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -32,3 +32,4 @@ virtio_iommu_attach(uint32_t as, uint32_t dev, uint32_t 
flags) "as=%d dev=%d fla
 virtio_iommu_detach(uint32_t dev, uint32_t flags) "dev=%d flags=%d"
 virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr, uint64_t 
size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64" virt_addr=0x%"PRIx64" 
size=0x%"PRIx64" flags=%d"
 virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size, uint32_t 
reserved) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64" reserved=%d"
+virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index ea1caa7..902c779 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -23,6 +23,7 @@
 #include "hw/virtio/virtio.h"
 #include "sysemu/kvm.h"
 #include "qapi-event.h"
+#include "qemu/error-report.h"
 #include "trace.h"
 
 #include "standard-headers/linux/virtio_ids.h"
@@ -35,6 +36,59 @@
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
+static inline uint16_t smmu_get_sid(IOMMUDevice *dev)
+{
+return  ((pci_bus_num(dev->bus) & 0xff) << 8) | dev->devfn;
+}
+
+static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
+  int devfn)
+{
+VirtIOIOMMU *s = opaque;
+uintptr_t key = (uintptr_t)bus;
+IOMMUPciBus *sbus = g_hash_table_lookup(s->as_by_busptr, &key);
+IOMMUDevice *sdev;
+
+if (!sbus) {
+uintptr_t *new_key = g_malloc(sizeof(*new_key));
+
+*new_key = (uintptr_t)bus;
+sbus = g_malloc0(sizeof(IOMMUPciBus) +
+ sizeof(IOMMUDevice *) * IOMMU_PCI_DEVFN_MAX);
+sbus->bus = bus;
+g_hash_table_insert(s->as_by_busptr, new_key, sbus);
+}
+
+sdev = sbus->pbdev[devfn];
+if (!sdev) {
+sdev = sbus->pbdev[devfn] = g_malloc0(sizeof(IOMMUDevice));
+
+sdev->viommu = s;
+sdev->bus = bus;
+sdev->devfn = devfn;
+
+memory_region_init_iommu(&sdev->iommu_mr, OBJECT(s),
+ &s->iommu_ops, TYPE_VIRTIO_IOMMU,
+ UINT64_MAX);
+address_space_init(&sdev->as, &sdev->iommu_mr, TYPE_VIRTIO_IOMMU);
+}
+
+return &sdev->as;
+
+}
+
+static void virtio_iommu_init_as(VirtIOIOMMU *s)
+{
+PCIBus *pcibus = pci_find_primary_bus();
+
+if (pcibus) {
+pci_setup_iommu(pcibus, virtio_iommu_find_add_as, s);
+} else {
+error_report("No PCI bus, virtio-iommu is not registered");
+}
+}
+
+
 static int virtio_iommu_attach(VirtIOIOMMU *s,
struct virtio_iommu_req_attach *req)
 {
@@ -208,6 +262,26 @@ static void virtio_iommu_handle_command(VirtIODevice 
*vdev, VirtQueue *vq)
 }
 }
 
+static IOMMUTLBEntry virtio_iommu_translate(MemoryRegion *mr, hwaddr addr,
+IOMMUAccessFlags flag)
+{
+IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+uint32_t sid;
+
+IOMMUTLBEntry entry = {
+.target_as = &address_space_memory,
+.iova = addr,
+.translated_addr = addr,
+.addr_mask = ~(hwaddr)0,
+.perm = IOMMU_NONE,
+};
+
+sid = smmu_get_sid(sdev);
+
+trace_virtio_iommu_translate(mr->name, sid, addr, flag);
+return entry;
+}
+
 static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
 VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
@@ -253,6 +327,21 @@ static const VMStateDescription 
vmstate_virtio_iommu_device = {
 },
 };
 
+/*
+ * Hash Table
+ */
+
+static inline gboolean as_uint64_equal(gconstpointer v1, gconstpointer v2)
+{
+return *((const uint64_t *)v1) == *((const uint64_t *)v2);
+}
+
+static inline guint as_uint64_hash(gconstpointer v)
+{
+return (guint)*(const uint64_t *)v;
+}
+
+
 static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -266,6 +355,14 @@ static void virtio_iommu_device_realize(DeviceState *dev, 
Error **errp)
 
 s->config.page_sizes = ~((1ULL << 12) - 1);
 s->config.input_range.end = -1UL;
+
+s->iommu_ops.translate = virtio_iommu_translate;
+memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
+s->as_by_busptr = g_hash_table_new_full(as_uint64_hash,
+as_uint64_equal,
+

[Qemu-devel] [RFC 8/8] hw/arm/virt: Add virtio-iommu to the virt board

2017-06-07 Thread Eric Auger

The dedicated virtio-mmio node is unconditionally added at
machine init, while the binding between this node and the
PCIe host bridge is done in the machine-init-done notifier, and
only if -device virtio-iommu-device was added to the QEMU command
line.
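
To make the binding concrete: the "iommu-map" property set on the PCIe host bridge node is, per the generic PCI IOMMU device tree binding, a list of (rid-base, iommu phandle, iommu-base, length) entries. The sketch below shows how a consumer would resolve a requester ID through one such entry. It is illustrative only: the sketch_* names are hypothetical, and the values used in the example are not taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* One iommu-map entry: <rid-base phandle iommu-base length>. */
struct sketch_iommu_map {
    uint32_t rid_base;    /* first requester ID covered by the entry */
    uint32_t phandle;     /* phandle of the IOMMU node */
    uint32_t iommu_base;  /* ID seen by the IOMMU for rid_base */
    uint32_t length;      /* number of consecutive IDs covered */
};

/* Translate rid through one entry; return false if rid is outside it. */
static bool sketch_map_rid(const struct sketch_iommu_map *m, uint32_t rid,
                           uint32_t *iommu_id)
{
    if (rid < m->rid_base || rid - m->rid_base >= m->length) {
        return false;
    }
    *iommu_id = m->iommu_base + (rid - m->rid_base);
    return true;
}
```

With rid-base and iommu-base both 0, as in the property written by bind_virtio_iommu_device(), the mapping is the identity over the covered range.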

Signed-off-by: Eric Auger 

---
 hw/arm/virt.c | 92 +++
 include/hw/arm/virt.h |  4 +++
 2 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 6eb0d2a..6bcfbcd 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -52,6 +52,7 @@
 #include "hw/arm/fdt.h"
 #include "hw/intc/arm_gic.h"
 #include "hw/intc/arm_gicv3_common.h"
+#include "hw/virtio/virtio-iommu.h"
 #include "kvm_arm.h"
 #include "hw/smbios/smbios.h"
 #include "qapi/visitor.h"
@@ -139,6 +140,7 @@ static const MemMapEntry a15memmap[] = {
 [VIRT_FW_CFG] = { 0x0902, 0x0018 },
 [VIRT_GPIO] =   { 0x0903, 0x1000 },
 [VIRT_SECURE_UART] ={ 0x0904, 0x1000 },
+[VIRT_SMMU] =   { 0x0905, 0x0200 },
 [VIRT_MMIO] =   { 0x0a00, 0x0200 },
 /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
 [VIRT_PLATFORM_BUS] =   { 0x0c00, 0x0200 },
@@ -159,6 +161,7 @@ static const int a15irqmap[] = {
 [VIRT_SECURE_UART] = 8,
 [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
 [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
+[VIRT_SMMU] = 74,
 [VIRT_PLATFORM_BUS] = 112, /* ...to 112 + PLATFORM_BUS_NUM_IRQS -1 */
 };
 
@@ -991,7 +994,81 @@ static void create_pcie_irq_map(const VirtMachineState 
*vms,
0x7   /* PCI irq */);
 }
 
-static void create_pcie(const VirtMachineState *vms, qemu_irq *pic)
+static int bind_virtio_iommu_device(Object *obj, void *opaque)
+{
+VirtMachineState *vms = (VirtMachineState *)opaque;
+struct arm_boot_info *info = &vms->bootinfo;
+int dtb_size;
+void *fdt = info->get_dtb(info, &dtb_size);
+Object *dev;
+
+dev = object_dynamic_cast(obj, TYPE_VIRTIO_IOMMU);
+
+if (!dev) {
+/* Container, traverse it for children */
+return object_child_foreach(obj, bind_virtio_iommu_device, opaque);
+}
+
+qemu_fdt_setprop_cells(fdt, vms->pcie_host_nodename, "iommu-map",
+   0x0, vms->smmu_phandle, 0x0, 0x1);
+
+return true;
+}
+
+static
+void virtio_iommu_notifier(Notifier *notifier, void *data)
+{
+VirtMachineState *vms = container_of(notifier, VirtMachineState,
+ virtio_iommu_done);
+VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+Object *container;
+
+
+if (vmc->no_iommu) {
+return;
+}
+
+container = container_get(qdev_get_machine(), "/peripheral");
+bind_virtio_iommu_device(container, vms);
+container = container_get(qdev_get_machine(), "/peripheral-anon");
+bind_virtio_iommu_device(container, vms);
+}
+
+static void create_virtio_iommu(VirtMachineState *vms, qemu_irq *pic)
+{
+char *smmu;
+const char compat[] = "virtio,mmio";
+int irq =  vms->irqmap[VIRT_SMMU];
+hwaddr base = vms->memmap[VIRT_SMMU].base;
+hwaddr size = vms->memmap[VIRT_SMMU].size;
+VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+
+if (vmc->no_iommu) {
+return;
+}
+
+vms->smmu_phandle = qemu_fdt_alloc_phandle(vms->fdt);
+
+sysbus_create_simple("virtio-mmio", base, pic[irq]);
+
+smmu = g_strdup_printf("/virtio_mmio@%" PRIx64, base);
+qemu_fdt_add_subnode(vms->fdt, smmu);
+qemu_fdt_setprop(vms->fdt, smmu, "compatible", compat, sizeof(compat));
+qemu_fdt_setprop_sized_cells(vms->fdt, smmu, "reg", 2, base, 2, size);
+
+qemu_fdt_setprop_cells(vms->fdt, smmu, "interrupts",
+GIC_FDT_IRQ_TYPE_SPI, irq, GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
+
+qemu_fdt_setprop(vms->fdt, smmu, "dma-coherent", NULL, 0);
+qemu_fdt_setprop_cell(vms->fdt, smmu, "#iommu-cells", 1);
+qemu_fdt_setprop_cell(vms->fdt, smmu, "phandle", vms->smmu_phandle);
+g_free(smmu);
+
+vms->virtio_iommu_done.notify = virtio_iommu_notifier;
+qemu_add_machine_init_done_notifier(&vms->virtio_iommu_done);
+}
+
+static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
 {
 hwaddr base_mmio = vms->memmap[VIRT_PCIE_MMIO].base;
 hwaddr size_mmio = vms->memmap[VIRT_PCIE_MMIO].size;
@@ -1064,7 +1141,8 @@ static void create_pcie(const VirtMachineState *vms, 
qemu_irq *pic)
 }
 }
 
-nodename = g_strdup_printf("/pcie@%" PRIx64, base);
+vms->pcie_host_nodename = g_strdup_printf("/pcie@%" PRIx64, base);
+nodename = vms->pcie_host_nodename;
 qemu_fdt_add_subnode(vms->fdt, nodename);
 qemu_fdt_setprop_string(vms->fdt, nodename,
 "compatible", "pci-host-ecam-generic");
@@ -1103,7 +1181,6 @@ static void create_pcie(const VirtMachineState *vms, 
qemu_irq

Re: [Qemu-devel] [[PATCH V7] 09/11] migration: calculate vCPU blocktime on dst side

2017-06-07 Thread no-reply

Hi,

This series failed build test on s390x host. Please find the details below.

Type: series
Message-id: 1496820931-27416-10-git-send-email-a.pereva...@samsung.com
Subject: [Qemu-devel] [[PATCH V7] 09/11] migration: calculate vCPU blocktime on 
dst side

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e
echo "=== ENV ==="
env
echo "=== PACKAGES ==="
rpm -qa
echo "=== TEST BEGIN ==="
CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
echo -n "Using CC: "
realpath $CC
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20170607083243.8983-1-...@ozlabs.ru -> patchew/20170607083243.8983-1-...@ozlabs.ru
Switched to a new branch 'test'
451d40f 09/11] migration: calculate vCPU blocktime on dst side

=== OUTPUT BEGIN ===
=== ENV ===
XDG_SESSION_ID=86076
SHELL=/bin/sh
USER=fam
PATCHEW=/home/fam/patchew/patchew-cli -s http://patchew.org --nodebug
PATH=/usr/bin:/bin
PWD=/var/tmp/patchew-tester-tmp-a31ds_kd/src
LANG=en_US.UTF-8
HOME=/home/fam
SHLVL=2
LOGNAME=fam
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1012/bus
XDG_RUNTIME_DIR=/run/user/1012
_=/usr/bin/env
=== PACKAGES ===
gpg-pubkey-873529b8-54e386ff
xz-libs-5.2.2-2.fc24.s390x
libxshmfence-1.2-3.fc24.s390x
giflib-4.1.6-15.fc24.s390x
trousers-lib-0.3.13-6.fc24.s390x
ncurses-base-6.0-6.20160709.fc25.noarch
gmp-6.1.1-1.fc25.s390x
libidn-1.33-1.fc25.s390x
slang-2.3.0-7.fc25.s390x
libsemanage-2.5-8.fc25.s390x
pkgconfig-0.29.1-1.fc25.s390x
alsa-lib-1.1.1-2.fc25.s390x
yum-metadata-parser-1.1.4-17.fc25.s390x
python3-slip-dbus-0.6.4-4.fc25.noarch
python2-cssselect-0.9.2-1.fc25.noarch
python-fedora-0.8.0-2.fc25.noarch
createrepo_c-libs-0.10.0-6.fc25.s390x
initscripts-9.69-1.fc25.s390x
wget-1.18-2.fc25.s390x
dhcp-client-4.3.5-1.fc25.s390x
parted-3.2-21.fc25.s390x
flex-2.6.0-3.fc25.s390x
colord-libs-1.3.4-1.fc25.s390x
python-osbs-client-0.33-3.fc25.noarch
perl-Pod-Simple-3.35-1.fc25.noarch
python2-simplejson-3.10.0-1.fc25.s390x
brltty-5.4-2.fc25.s390x
librados2-10.2.4-2.fc25.s390x
tcp_wrappers-7.6-83.fc25.s390x
libcephfs_jni1-10.2.4-2.fc25.s390x
nettle-devel-3.3-1.fc25.s390x
bzip2-devel-1.0.6-21.fc25.s390x
libuuid-2.28.2-2.fc25.s390x
pango-1.40.4-1.fc25.s390x
python3-dnf-1.1.10-6.fc25.noarch
cryptsetup-libs-1.7.4-1.fc25.s390x
texlive-kpathsea-doc-svn41139-33.fc25.1.noarch
netpbm-10.77.00-3.fc25.s390x
openssh-7.4p1-4.fc25.s390x
texlive-kpathsea-bin-svn40473-33.20160520.fc25.1.s390x
texlive-graphics-svn41015-33.fc25.1.noarch
texlive-dvipdfmx-def-svn40328-33.fc25.1.noarch
texlive-mfware-svn40768-33.fc25.1.noarch
texlive-texlive-scripts-svn41433-33.fc25.1.noarch
texlive-euro-svn22191.1.1-33.fc25.1.noarch
texlive-etex-svn37057.0-33.fc25.1.noarch
texlive-iftex-svn29654.0.2-33.fc25.1.noarch
texlive-palatino-svn31835.0-33.fc25.1.noarch
texlive-texlive-docindex-svn41430-33.fc25.1.noarch
texlive-xunicode-svn30466.0.981-33.fc25.1.noarch
texlive-koma-script-svn41508-33.fc25.1.noarch
texlive-pst-grad-svn15878.1.06-33.fc25.1.noarch
texlive-pst-blur-svn15878.2.0-33.fc25.1.noarch
texlive-jknapltx-svn19440.0-33.fc25.1.noarch
netpbm-progs-10.77.00-3.fc25.s390x
texinfo-6.1-4.fc25.s390x
openssl-devel-1.0.2k-1.fc25.s390x
python2-sssdconfig-1.15.2-1.fc25.noarch
gdk-pixbuf2-2.36.6-1.fc25.s390x
mesa-libEGL-13.0.4-3.fc25.s390x
pcre-cpp-8.40-6.fc25.s390x
pcre-utf16-8.40-6.fc25.s390x
glusterfs-extra-xlators-3.10.1-1.fc25.s390x
mesa-libGL-devel-13.0.4-3.fc25.s390x
nss-devel-3.29.3-1.1.fc25.s390x
libaio-0.3.110-6.fc24.s390x
libfontenc-1.1.3-3.fc24.s390x
lzo-2.08-8.fc24.s390x
isl-0.14-5.fc24.s390x
libXau-1.0.8-6.fc24.s390x
linux-atm-libs-2.5.1-14.fc24.s390x
libXext-1.3.3-4.fc24.s390x
libXxf86vm-1.1.4-3.fc24.s390x
bison-3.0.4-4.fc24.s390x
perl-srpm-macros-1-20.fc25.noarch
gawk-4.1.3-8.fc25.s390x
libwayland-client-1.12.0-1.fc25.s390x
perl-Exporter-5.72-366.fc25.noarch
perl-version-0.99.17-1.fc25.s390x
fftw-libs-double-3.3.5-3.fc25.s390x
libssh2-1.8.0-1.fc25.s390x
ModemManager-glib-1.6.4-1.fc25.s390x
newt-python3-0.52.19-2.fc25.s390x
python-munch-2.0.4-3.fc25.noarch
python-bugzilla-1.2.2-4.fc25.noarch
libedit-3.1-16.20160618cvs.fc25.s390x
python-pycurl-7.43.0-4.fc25.s390x
createrepo_c-0.10.0-6.fc25.s390x
device-mapper-multipath-libs-0.4.9-83.fc25.s390x
yum-3.4.3-510.fc25.noarch
dhcp-common-4.3.5-1.fc25.noarch
dracut-config-rescue-044-78.fc25.s390x
teamd-1.26-1.fc25.s390x
mozjs17-17.0.0-16.fc25.s390x
libselinux-2.5-13.fc25.s390x
libgo-devel-6.3.1-1.fc25.s390x
NetworkManager-libnm-1.4.4-3.fc25.s390x
python2-pyparsing-2.1.10-1.fc25.noarch
cairo-gobject-1.14.8-1.fc25.s390x
ethtool-4.8-1.fc25.s390x
xorg-x11-proto-devel-7.7-20.fc25.noarch
brlapi-0.6.5-2.fc25.s390x
librados-devel-10.2.4-2.fc25.s390x
libXinerama-devel-

[Qemu-devel] [RFC 7/8] hw/arm/virt: Add 2.10 machine type

2017-06-07 Thread Eric Auger

The new machine type allows virtio-iommu instantiation.

Signed-off-by: Eric Auger 

---

 hw/arm/virt.c | 24 ++--
 include/hw/arm/virt.h |  1 +
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 010f724..6eb0d2a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1639,7 +1639,7 @@ static void machvirt_machine_init(void)
 }
 type_init(machvirt_machine_init);
 
-static void virt_2_9_instance_init(Object *obj)
+static void virt_2_10_instance_init(Object *obj)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
 VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
@@ -1699,10 +1699,30 @@ static void virt_2_9_instance_init(Object *obj)
 vms->irqmap = a15irqmap;
 }
 
+static void virt_machine_2_10_options(MachineClass *mc)
+{
+}
+DEFINE_VIRT_MACHINE_AS_LATEST(2, 10)
+
+#define VIRT_COMPAT_2_9 \
+HW_COMPAT_2_9
+
+static void virt_2_9_instance_init(Object *obj)
+{
+virt_2_10_instance_init(obj);
+}
+
 static void virt_machine_2_9_options(MachineClass *mc)
 {
+VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
+virt_machine_2_10_options(mc);
+SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_9);
+
+vmc->no_iommu = true;
 }
-DEFINE_VIRT_MACHINE_AS_LATEST(2, 9)
+DEFINE_VIRT_MACHINE(2, 9)
+
 
 #define VIRT_COMPAT_2_8 \
 HW_COMPAT_2_8
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 33b0ff3..ff27551 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -84,6 +84,7 @@ typedef struct {
 bool disallow_affinity_adjustment;
 bool no_its;
 bool no_pmu;
+bool no_iommu;
 bool claim_edge_triggered_timers;
 } VirtMachineClass;
 
-- 
2.5.5

[Qemu-devel] [RFC 6/8] virtio-iommu: Implement the translation and commands

2017-06-07 Thread Eric Auger

This patch adds the actual implementation for the translation routine
and the virtio-iommu commands.

Signed-off-by: Eric Auger 
---
 hw/virtio/trace-events   |   6 ++
 hw/virtio/virtio-iommu.c | 202 +--
 2 files changed, 202 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 341dbdf..9196b63 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -33,3 +33,9 @@ virtio_iommu_detach(uint32_t dev, uint32_t flags) "dev=%d flags=%d"
 virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr, uint64_t size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64" virt_addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d"
 virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size, uint32_t reserved) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64" reserved=%d"
 virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
+virtio_iommu_new_asid(uint32_t asid) "Allocate a new asid=%d"
+virtio_iommu_new_devid(uint32_t devid) "Allocate a new devid=%d"
+virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap left [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"], new interval=[0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc [0x%"PRIx64",0x%"PRIx64"]"
+virtio_iommu_translate_result(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 902c779..0bbdd76 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -32,10 +32,37 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "hw/virtio/virtio-iommu.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"
 
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
+typedef struct viommu_as viommu_as;
+
+typedef struct viommu_mapping {
+uint64_t virt_addr;
+uint64_t phys_addr;
+uint64_t size;
+uint32_t flags;
+} viommu_mapping;
+
+typedef struct viommu_interval {
+uint64_t low;
+uint64_t high;
+} viommu_interval;
+
+typedef struct viommu_dev {
+uint32_t id;
+viommu_as *as;
+} viommu_dev;
+
+typedef struct viommu_as {
+uint32_t id;
+uint32_t nr_devices;
+GTree *mappings;
+} viommu_as;
+
 static inline uint16_t smmu_get_sid(IOMMUDevice *dev)
 {
 return  ((pci_bus_num(dev->bus) & 0xff) << 8) | dev->devfn;
@@ -88,6 +115,19 @@ static void virtio_iommu_init_as(VirtIOIOMMU *s)
 }
 }
 
+static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+viommu_interval *inta = (viommu_interval *)a;
+viommu_interval *intb = (viommu_interval *)b;
+
+if (inta->high <= intb->low) {
+return -1;
+} else if (intb->high <= inta->low) {
+return 1;
+} else {
+return 0;
+}
+}
 
 static int virtio_iommu_attach(VirtIOIOMMU *s,
struct virtio_iommu_req_attach *req)
@@ -95,10 +135,34 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
 uint32_t asid = le32_to_cpu(req->address_space);
 uint32_t devid = le32_to_cpu(req->device);
 uint32_t reserved = le32_to_cpu(req->reserved);
+viommu_as *as;
+viommu_dev *dev;
 
 trace_virtio_iommu_attach(asid, devid, reserved);
 
-return VIRTIO_IOMMU_S_UNSUPP;
+dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
+if (dev) {
+return -1;
+}
+
+as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
+if (!as) {
+as = g_malloc0(sizeof(*as));
+as->id = asid;
+as->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
+ NULL, NULL, (GDestroyNotify)g_free);
+g_tree_insert(s->address_spaces, GUINT_TO_POINTER(asid), as);
+trace_virtio_iommu_new_asid(asid);
+}
+
+dev = g_malloc0(sizeof(*dev));
+dev->as = as;
+dev->id = devid;
+as->nr_devices++;
+trace_virtio_iommu_new_devid(devid);
+g_tree_insert(s->devices, GUINT_TO_POINTER(devid), dev);
+
+return VIRTIO_IOMMU_S_OK;
 }
 
 static int virtio_iommu_detach(VirtIOIOMMU *s,
@@ -106,10 +170,13 @@ static int virtio_iommu_detach(VirtIOIOMMU *s,
 {
 uint32_t devid = le32_to_cpu(req->device);
 uint32_t reserved = le32_to_cpu(req->reserved);
+int ret;
 
 trace_virtio_iommu_detach(devid, reserved);
 
-return VIRTIO_IOMMU_S_UNSUPP;
+ret = g_tree_remove(s->devices, GUINT_TO_POINTER(devid));
+
+return ret ? VIRTIO_IOMMU_S_OK : VIRTIO_IOMMU_S_INVAL;
 }
 
 static int virtio_iommu_map(VirtIOIOMMU *s,
@@ -120,10 +187,37 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
 uint64_t virt_addr = le64_to_cpu(req->vir

Re: [Qemu-devel] [PATCH V3 0/3] COLO-compare: Make COLO-compare support remote COLO-frame

2017-06-07 Thread Jason Wang




On 2017-06-06 16:12, Zhang Chen wrote:

This series focuses on remote COLO-frame support in COLO-proxy.
Xen COLO-frame is the first user. We add a new chardev socket
in colo-compare as the way to communicate with the remote COLO-frame.
The part where the remote COLO-frame notifies colo-proxy depends on this series:
https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg03904.html

I will send the other part of this series after the patch set it depends on
has been merged.

V3:
  - Fix codestyle.

V2:
  - Rename this series.
  - Change communication way to remote colo-frame.
  - Some bugfix.
  - Split the main function; the other part waits for the patch set it depends on.


Zhang Chen (3):
   COLO-compare: Add new parameter for communicate with remote colo-frame
   COLO-compare: Add remote checkpoint notify chardev socket handler
 frame
   COLO-compare: Add remote initialization and checkpoint notification

  net/colo-compare.c | 91 --
  qemu-options.hx| 41 
  2 files changed, 124 insertions(+), 8 deletions(-)



Thanks for the series.

To speed up things, I would like to see IOThread conversion first. Then 
we can have this on top.

[Qemu-devel] [Bug 779151] Re: qemu-nbd crash during using with chroot

2017-06-07 Thread Thomas Huth

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/779151

Title:
  qemu-nbd crash during using with chroot

Status in QEMU:
  Incomplete

Bug description:
  I use qemu-nbd to mount my image. After some time, qemu-nbd
  crashes and so the chroot freezes.

  ps aux | grep qemu :
  root  2223  0.0  0.0   9776   548 ?Ss   18:03   0:00 qemu-nbd --connect=/dev/nbd0 /chroots/test/virtual.img
  root  2224  0.0  0.0  10800   544 ?D18:03   0:00 qemu-nbd --connect=/dev/nbd0 /chroots/test/virtual.img
  root  2227  0.0  0.0  0 0 ?Z18:03   0:00 [qemu-nbd] 

  root  2357  0.0  0.0   5212   768 pts/0D+   18:07   0:00 grep qemu

  mount :
  /dev/nbd0p1 on /chroots/test/amd64 type ext3 (rw,errors=remount-ro,commit=0)
  /dev on /chroots/test/amd64/dev type devtmpfs (rw,mode=0755)
  /dev/pts on /chroots/test/amd64/dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
  /proc on /chroots/test/amd64/proc type proc (rw)
  /sys on /chroots/test/amd64/sys type sysfs (rw,noexec,nosuid,nodev)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/779151/+subscriptions

[Qemu-devel] [Bug 816860] Re: Guest machine freezes when NFS mount goes offline

2017-06-07 Thread Thomas Huth

Can you still reproduce this problem with the latest version of QEMU
(currently version 2.9.0)?

** Changed in: qemu
   Status: Confirmed => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/816860

Title:
  Guest machine freezes when NFS mount goes offline

Status in QEMU:
  Incomplete

Bug description:
  I have a virtual KVM machine that has 2 CDROM units with ISOs mounted
  from a NFS mount point. When NFS server goes offline the virtual
  machine blocks completely instead of throwing read errors for the
  CDROM device.

  Host: Proxmox VE 1.8-11 (Debian GNU/Linux 5.0)
  KVM commandline version: QEMU emulator version 0.14.1 (qemu-kvm-devel)
  Guest: Windows 7 professional SP 1

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/816860/+subscriptions

[Qemu-devel] [Bug 796480] Re: Addresses with 4GB differences are consider as one single address in QEMU

2017-06-07 Thread Thomas Huth

Can you still reproduce this problem with the latest version of QEMU
(currently version 2.9.0)?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/796480

Title:
  Addresses with 4GB differences are consider as one single address in
  QEMU

Status in QEMU:
  Incomplete

Bug description:
  THIS IS THE ISSUE OF USER MODE EMULATION
  Information about guest and host
  **
  guest: 64 bit x86 user mode binary
  host: 32 bit Linux OS
  uname -a :Linux KICS-HPCNL-32blue 2.6.33.3-85.fc13.i686.PAE #1 SMP
  architecture: intel64
  Bug Description
  
  For memory reference instructions, suppose I have two addresses in the guest address space (64 bit):
  0x22000
  0x32000
  As the lower 32 bits of both addresses are the same, when the instructions are translated into host code (32 bit),
  in both cases the value is loaded from the same memory and we get the same value, whereas the expected behaviour is to get two different values.
  Here is the program which I used to test:
  #include <stdio.h>
  #include <stdlib.h>
  #define SIZE 4294967297 /* 4 GiB + 1 */

  int main() {
     char *array;
     unsigned int i;

     array = malloc(sizeof(char) * SIZE);
     if (array == NULL) {
        fprintf(stderr, "Could not allocate that much memory");
        return 1;
     }
     array[0] = 'a';
     array[SIZE-1] = 'z';
     printf("array[SIZE-1] = %c array[0] = %c\n", array[SIZE-1], array[0]);
     return 0;
  }
  I have 8 GiB of RAM.
  I compiled this program on 64-bit Linux and ran it on 32-bit Linux with qemu.
  QEMU command line and output
  **
  $x86_64-linux-user/qemu-x86_64 ~/ar_x86 
  output: array[SIZE-1] = z,array[0] = z 
  Release information
  
  x86_64 binary is tested with latest release : qemu-0.14.1
  and with current development tree as well( live code of QEMU using git)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/796480/+subscriptions
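
The aliasing described in this report can be illustrated with a small sketch. It is not QEMU code: `truncate_guest_addr` and `addrs_alias_after_truncation` are hypothetical helpers that only model what happens if a 64-bit guest address is reduced to its low 32 bits, which is the behaviour the reporter suspects. With SIZE = 4294967297, the two offsets the test program touches (0 and SIZE-1) differ by exactly 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the address that results if only the low
 * 32 bits of a 64-bit guest address are kept. */
static uint32_t truncate_guest_addr(uint64_t guest_addr)
{
    return (uint32_t)guest_addr;
}

/* True when two distinct 64-bit guest addresses become the same
 * address after truncation to 32 bits. */
static bool addrs_alias_after_truncation(uint64_t a, uint64_t b)
{
    return a != b && truncate_guest_addr(a) == truncate_guest_addr(b);
}

/* The two array offsets used by the test program in the report. */
static void check_report_offsets(void)
{
    const uint64_t size = 4294967297ULL; /* SIZE from the report */

    assert(addrs_alias_after_truncation(0, size - 1));
}
```

Calling `check_report_offsets()` passes because offsets 0 and 4294967296 share the same low 32 bits, which would be consistent with both reads returning 'z'.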

[Qemu-devel] [Bug 893956] Re: qemu-img bug with dynamic vhd

2017-06-07 Thread Thomas Huth

Can you still reproduce this problem with the latest version of QEMU
(currently version 2.9.0)?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/893956

Title:
  qemu-img bug with dynamic vhd

Status in QEMU:
  Incomplete

Bug description:
  Hi, I found a problem with qemu-img when trying to get info of a
  dynamic vhd. I made an image of my 60GB computer hard drive with
  disk2vhd. The dynamic vhd is 21GB in size.

  With 1.0-rc3 version :
  running command: qemu-img info 60_GB.VHD
  qemu-img:  Could not open '60_GB.VHD' : File too large

  0.14.1 version gives me wrong information :
  image: 60_GB.VHD
  file format: vpc
  virtual size: 127G (13683600 bytes)
  disk size: 21G

  Thanks for reply.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/893956/+subscriptions

Re: [Qemu-devel] [RFC] q35/mch: implement extended TSEG sizes

2017-06-07 Thread Laszlo Ersek

On 06/07/17 07:52, Gerd Hoffmann wrote:
>   Hi,
> 
> Patch looks sane overall.
> 
>> Invent a new, QEMU-specific register in the config space of the DRAM
>> Controller, at offset 0x50, in order to allow guest firmware to query
>> the
>> TSEG (SMRAM) size.
> 
> Hmm, 0x50 appears to be the only unused config space register in the
> specs.

I did find more holes, in "Table 5-1. DRAM Controller Register Address
Map (D0:F0)", in "Document Number: 316966-002". The hole at 0x50 is just
the one with the lowest config space offset (after the standard PCI
device header). The others are:

- 0x58-0x59 (word)
- 0x9c (byte)
- 0x9f (byte)
- 0xb2-0xc7 (eight words)
- 0xce-0xdb (seven words)
- 0xeb-0xff (fifteen bytes, but this range appears to come after the
 capability list, so I felt it would be unsafe to use)

> I suspect in reality it isn't unused but undocumented.  I don't
> have a better idea though, and in practice it probably isn't much of a
> problem.

Thanks for confirming. Then I'll set out to write the OVMF-side patches
for this. Once I have it all working, I'm going to send out the QEMU
change as a real patch.

Thanks!
Laszlo

Re: [Qemu-devel] [PATCH] target/m68k: implement rtd

2017-06-07 Thread John Paul Adrian Glaubitz

> Add "Return and Deallocate" (rtd) instruction.
>
>   RTD #d
>
> (SP) -> PC
> SP + 4 + d -> SP

Tested-By: John Paul Adrian Glaubitz 

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

[Qemu-devel] [Bug 965867] Re: 9p virtual file system on qemu slow

2017-06-07 Thread Thomas Huth

Can you still reproduce this problem with the latest version of QEMU
(currently version 2.9.0)?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/965867

Title:
  9p virtual file system on qemu slow

Status in QEMU:
  Incomplete
Status in qemu-kvm package in Ubuntu:
  Confirmed

Bug description:
  Hi, 
  The 9p virtual file system is slow. Several examples below: 
  -
  Host for the first time: 
  $ time ls bam.unsorted/
  ...
  real0m0.084s
  user0m0.000s
  sys 0m0.028s
  --
  Host second and following: 

  real0m0.009s
  user0m0.000s
  sys 0m0.008s
  --
  VM for the first time: 
  $ time ls bam.unsorted/
  
  real0m23.141s
  user0m0.064s
  sys 0m2.156s
  --
  VM for the second time
  real0m3.643s
  user0m0.024s
  sys 0m0.424s
  

  Copy on host: 
  $ time cp 5173T.root.bak test.tmp
  real0m30.346s
  user0m0.004s
  sys 0m5.324s

  $ ls -lahs test.tmp
  2.7G -rw--- 1 oneadmin cloud 2.7G Mar 26 21:47 test.tmp

  ---
  $ copy on VM for the same file

  time cp 5173T.root.bak test.tmp

  real5m46.978s
  user0m0.352s
  sys 1m38.962s

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/965867/+subscriptions

Re: [Qemu-devel] [PATCH 2/5] coccinelle: use DIV_ROUND_UP

2017-06-07 Thread Peter Maydell

On 7 June 2017 at 08:46, Marc-André Lureau  wrote:
> The coccinelle/round.cocci script doesn't catch hard coded values.
>
> I used the following script over qemu code base:
>
> (
> - ((e1) + 3) / (4)
> + DIV_ROUND_UP(e1,4)
> |
> - ((e1) + (3)) / (4)
> + DIV_ROUND_UP(e1,4)

Why do we need both of these? Is it just "coccinelle is weird" ? :-)

> |
> - ((e1) + 7) / (8)
> + DIV_ROUND_UP(e1,8)
> |
> - ((e1) + (7)) / (8)
> + DIV_ROUND_UP(e1,8)
> |
> - ((e1) + 15) / (16)
> + DIV_ROUND_UP(e1,16)
> |
> - ((e1) + (15)) / (16)
> + DIV_ROUND_UP(e1,16)
> |
> - ((e1) + 31) / (32)
> + DIV_ROUND_UP(e1,32)
> |
> - ((e1) + (31)) / (32)
> + DIV_ROUND_UP(e1,32)
> )

> - next_op = op_pointer + ((oplen + 7) / 8);
> + next_op = op_pointer + (DIV_ROUND_UP(oplen, 8));

I think there's a coccinelle trick for making it drop
now-unnecessary brackets in substitutions like this, but I forget
what it is. Maybe it's as simple as having substitutions for

> - (((e1) + 7) / (8))
> + DIV_ROUND_UP(e1,8)

as well?

thanks
-- PMM

Re: [Qemu-devel] [RFC 0/8] VIRTIO-IOMMU device

2017-06-07 Thread Jason Wang




On 2017-06-07 16:35, Eric Auger wrote:

This series implements the virtio-iommu device. This is a proof
of concept based on the virtio-iommu specification written by
Jean-Philippe Brucker [1]. This was tested with a guest using
the virtio-iommu driver [2] and exposed with a virtio-net-pci
using dma ops.

The device gets instantiated using the "-device virtio-iommu-device"
option. It currently works with ARM virt machine only as the machine
must handle the dt binding between the virtio-mmio "iommu" node and
the PCI host bridge node. ACPI booting is not yet supported.

This should allow starting some benchmarking against
purely emulated IOMMUs (especially ARM SMMU).


Yes, it would also be interesting to compare it with the Intel IOMMU.
Actually the core function is similar to the subset of the Intel one with CM
enabled. Since each map and unmap requires a command, it would be very
slow for dynamic mappings. I wonder whether we can do any
optimization on this.


Thanks



Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/virtio-iommu-rfcv1

References:
[1] [RFC 0/3] virtio-iommu: a paravirtualized IOMMU,
[2] [RFC PATCH linux] iommu: Add virtio-iommu driver
[3] [RFC PATCH kvmtool 00/15] Add virtio-iommu

Eric Auger (8):
   update-linux-headers: import virtio_iommu.h
   linux-headers: Update for virtio-iommu
   virtio_iommu: add skeleton
   virtio-iommu: Decode the command payload
   virtio_iommu: Add the iommu regions
   virtio-iommu: Implement the translation and commands
   hw/arm/virt: Add 2.10 machine type
   hw/arm/virt: Add virtio-iommu the virt board

  hw/arm/virt.c | 116 -
  hw/virtio/Makefile.objs   |   1 +
  hw/virtio/trace-events|  14 +
  hw/virtio/virtio-iommu.c  | 623 ++
  include/hw/arm/virt.h |   5 +
  include/hw/virtio/virtio-iommu.h  |  60 +++
  include/standard-headers/linux/virtio_ids.h   |   1 +
  include/standard-headers/linux/virtio_iommu.h | 142 ++
  linux-headers/linux/virtio_iommu.h|   1 +
  scripts/update-linux-headers.sh   |   3 +
  10 files changed, 957 insertions(+), 9 deletions(-)
  create mode 100644 hw/virtio/virtio-iommu.c
  create mode 100644 include/hw/virtio/virtio-iommu.h
  create mode 100644 include/standard-headers/linux/virtio_iommu.h
  create mode 100644 linux-headers/linux/virtio_iommu.h
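
Patch 6/8 of this series keeps the mappings of each address space in a GTree whose comparator (`interval_cmp`) treats any two overlapping intervals as equal. The sketch below shows why that ordering makes address lookup simple. It is a standalone illustration, not the patch's code: it uses `bsearch` over a sorted array instead of a GTree, `find_mapping` is a hypothetical helper, and reading the intervals as half-open [low, high) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Interval [low, high); the half-open reading is an assumption. */
typedef struct {
    uint64_t low;
    uint64_t high;
} interval;

/* Same ordering rule as interval_cmp() in patch 6/8: disjoint
 * intervals order by position, overlapping ones compare equal. */
static int interval_cmp(const void *a, const void *b)
{
    const interval *ia = a;
    const interval *ib = b;

    if (ia->high <= ib->low) {
        return -1;
    } else if (ib->high <= ia->low) {
        return 1;
    }
    return 0;
}

/* Hypothetical lookup: return the mapping that contains iova, or NULL.
 * 'sorted' must hold non-overlapping intervals in ascending order. */
static const interval *find_mapping(const interval *sorted, size_t n,
                                    uint64_t iova)
{
    interval key = { iova, iova + 1 };

    assert(iova != UINT64_MAX); /* keep the one-byte key from wrapping */
    return bsearch(&key, sorted, n, sizeof(*sorted), interval_cmp);
}
```

Because a one-byte key overlaps exactly the mapping that contains it, an ordinary ordered search finds the covering mapping without a separate containment test.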

Re: [Qemu-devel] [[PATCH V7] 09/11] migration: calculate vCPU blocktime on dst side

2017-06-07 Thread Fam Zheng

On Wed, 06/07 01:40, no-re...@patchew.org wrote:
> Hi,
> 
> This series failed build test on s390x host. Please find the details below.

So what happened is the double '[[' in the subject line confused Patchew, and
each patch in this series is treated as a standalone patch.

Fam

Re: [Qemu-devel] [PATCH 2/5] coccinelle: use DIV_ROUND_UP

2017-06-07 Thread Marc-André Lureau

Hi

- Original Message -
> On 7 June 2017 at 08:46, Marc-André Lureau 
> wrote:
> > The coccinelle/round.cocci script doesn't catch hard coded values.
> >
> > I used the following script over qemu code base:
> >
> > (
> > - ((e1) + 3) / (4)
> > + DIV_ROUND_UP(e1,4)
> > |
> > - ((e1) + (3)) / (4)
> > + DIV_ROUND_UP(e1,4)
> 
> Why do we need both of these? Is it just "coccinelle is weird" ? :-)

I am a total newbie to Coccinelle-land, but I think this one is a useless
duplication.

> 
> > |
> > - ((e1) + 7) / (8)
> > + DIV_ROUND_UP(e1,8)
> > |
> > - ((e1) + (7)) / (8)
> > + DIV_ROUND_UP(e1,8)
> > |
> > - ((e1) + 15) / (16)
> > + DIV_ROUND_UP(e1,16)
> > |
> > - ((e1) + (15)) / (16)
> > + DIV_ROUND_UP(e1,16)
> > |
> > - ((e1) + 31) / (32)
> > + DIV_ROUND_UP(e1,32)
> > |
> > - ((e1) + (31)) / (32)
> > + DIV_ROUND_UP(e1,32)
> > )
> 
> > - next_op = op_pointer + ((oplen + 7) / 8);
> > + next_op = op_pointer + (DIV_ROUND_UP(oplen, 8));
> 
> I think there's a coccinelle trick for making it drop
> now-unnecessary brackets in substitutions like this, but I forget
> what it is. Maybe it's as simple as having substitutions for
> 

> > - (((e1) + 7) / (8))
> > + DIV_ROUND_UP(e1,8)
> 
> as well?


I think you need a second rule:
@@
expression e1;
expression e2;
@@
-(DIV_ROUND_UP(e1,e2))
+DIV_ROUND_UP(e1,e2)


I will fix it in the second version.
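
For reference, the rewrite discussed in this thread is value-preserving. The sketch below assumes the conventional definition of `DIV_ROUND_UP`, which I believe matches QEMU's macro but is restated here as an assumption; the two helper functions are hypothetical and only exist to compare the open-coded form with the macro form for a divisor of 8.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional round-up division; assumed to match QEMU's macro. */
#ifndef DIV_ROUND_UP
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#endif

/* Open-coded form that the semantic patch looks for (divisor 8). */
static uint32_t bytes_for_bits_open_coded(uint32_t bits)
{
    return (bits + 7) / 8;
}

/* The same computation after the rewrite. */
static uint32_t bytes_for_bits(uint32_t bits)
{
    return DIV_ROUND_UP(bits, 8);
}
```

Since the macro expands to a fully parenthesized expression, the extra brackets left around `DIV_ROUND_UP(oplen, 8)` in the generated diff are redundant, which is what the proposed second rule removes.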

Re: [Qemu-devel] [[PATCH V7] 09/11] migration: calculate vCPU blocktime on dst side

2017-06-07 Thread Alexey Perevalov


On 06/07/2017 12:24 PM, Fam Zheng wrote:

On Wed, 06/07 01:40, no-re...@patchew.org wrote:

Hi,

This series failed build test on s390x host. Please find the details below.

So what happened is the double '[[' in the subject line confused Patchew, and
each patch in this series is treated as a standalone patch.

It's not bad to check every patch in sequence separately,
but with the previous patches included :)



Fam





--
Best regards,
Alexey Perevalov

[Qemu-devel] [PULL 1/1] target/m68k: implement rtd

2017-06-07 Thread Laurent Vivier

Add "Return and Deallocate" (rtd) instruction.

  RTD #d

(SP) -> PC
SP + 4 + d -> SP

Signed-off-by: Laurent Vivier 
Reviewed-by: Richard Henderson 
Tested-By: John Paul Adrian Glaubitz 
Message-Id: <20170605100014.22981-1-laur...@vivier.eu>
---
 target/m68k/cpu.c   |  2 ++
 target/m68k/cpu.h   |  1 +
 target/m68k/translate.c | 11 +++
 3 files changed, 14 insertions(+)

diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
index fa10b6e..f068922 100644
--- a/target/m68k/cpu.c
+++ b/target/m68k/cpu.c
@@ -130,6 +130,7 @@ static void m68020_cpu_initfn(Object *obj)
 m68k_set_feature(env, M68K_FEATURE_FPU);
 m68k_set_feature(env, M68K_FEATURE_CAS);
 m68k_set_feature(env, M68K_FEATURE_BKPT);
+m68k_set_feature(env, M68K_FEATURE_RTD);
 }
 #define m68030_cpu_initfn m68020_cpu_initfn
 #define m68040_cpu_initfn m68020_cpu_initfn
@@ -151,6 +152,7 @@ static void m68060_cpu_initfn(Object *obj)
 m68k_set_feature(env, M68K_FEATURE_FPU);
 m68k_set_feature(env, M68K_FEATURE_CAS);
 m68k_set_feature(env, M68K_FEATURE_BKPT);
+m68k_set_feature(env, M68K_FEATURE_RTD);
 }
 
 static void m5208_cpu_initfn(Object *obj)
diff --git a/target/m68k/cpu.h b/target/m68k/cpu.h
index 8095822..384ec5d 100644
--- a/target/m68k/cpu.h
+++ b/target/m68k/cpu.h
@@ -251,6 +251,7 @@ enum m68k_features {
 M68K_FEATURE_FPU,
 M68K_FEATURE_CAS,
 M68K_FEATURE_BKPT,
+M68K_FEATURE_RTD,
 };
 
 static inline int m68k_feature(CPUM68KState *env, int feature)
diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 9f60fbc..ad4d4ef 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -2483,6 +2483,16 @@ DISAS_INSN(nop)
 {
 }
 
+DISAS_INSN(rtd)
+{
+TCGv tmp;
+int16_t offset = read_im16(env, s);
+
+tmp = gen_load(s, OS_LONG, QREG_SP, 0);
+tcg_gen_addi_i32(QREG_SP, QREG_SP, offset + 4);
+gen_jmp(s, tmp);
+}
+
 DISAS_INSN(rts)
 {
 TCGv tmp;
@@ -4904,6 +4914,7 @@ void register_m68k_insns (CPUM68KState *env)
 BASE(nop,   4e71, );
 BASE(stop,  4e72, );
 BASE(rte,   4e73, );
+INSN(rtd,   4e74, , RTD);
 BASE(rts,   4e75, );
 INSN(movec, 4e7b, , CF_ISA_A);
 BASE(jump,  4e80, ffc0);
-- 
2.9.4
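
The semantics stated in the commit message above, (SP) -> PC followed by SP + 4 + d -> SP, can be modelled outside of TCG with a toy interpreter. This is only a sketch: `toy_cpu`, `load_long` and `rtd` are hypothetical names, the 64-byte memory is arbitrary, and the displacement is treated as a sign-extended 16-bit value as in the patch's `int16_t offset`.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 64u

/* Hypothetical toy machine state: just enough to model the stack. */
typedef struct {
    uint32_t pc;
    uint32_t sp;
    uint8_t mem[MEM_SIZE];
} toy_cpu;

/* Read a big-endian 32-bit value, the byte order used by the m68k. */
static uint32_t load_long(const toy_cpu *cpu, uint32_t addr)
{
    assert(addr <= MEM_SIZE - 4);
    return ((uint32_t)cpu->mem[addr] << 24) |
           ((uint32_t)cpu->mem[addr + 1] << 16) |
           ((uint32_t)cpu->mem[addr + 2] << 8) |
           (uint32_t)cpu->mem[addr + 3];
}

/* RTD #d: (SP) -> PC, then SP + 4 + d -> SP, with d sign-extended. */
static void rtd(toy_cpu *cpu, int16_t d)
{
    cpu->pc = load_long(cpu, cpu->sp);
    cpu->sp += 4 + (int32_t)d;
}
```

With a return address stored at the top of the stack and d = 8, `rtd` pops the 4-byte return address and then skips 8 further bytes of caller-pushed arguments in one step.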

Re: [Qemu-devel] [[PATCH V7] 03/11] migration: fix hardcoded function name in error report

2017-06-07 Thread no-reply

Hi,

This series failed build test on s390x host. Please find the details below.

Subject: [Qemu-devel] [[PATCH V7] 03/11] migration: fix hardcoded function name in error report
Message-id: 1496820931-27416-4-git-send-email-a.pereva...@samsung.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e
echo "=== ENV ==="
env
echo "=== PACKAGES ==="
rpm -qa
echo "=== TEST BEGIN ==="
CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
echo -n "Using CC: "
realpath $CC
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
68e091d 03/11] migration: fix hardcoded function name in error report

=== OUTPUT BEGIN ===
=== ENV ===
XDG_SESSION_ID=86252
SHELL=/bin/sh
USER=fam
PATCHEW=/home/fam/patchew/patchew-cli -s http://patchew.org --nodebug
PATH=/usr/bin:/bin
PWD=/var/tmp/patchew-tester-tmp-w_vngi4s/src
LANG=en_US.UTF-8
HOME=/home/fam
SHLVL=2
LOGNAME=fam
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1012/bus
XDG_RUNTIME_DIR=/run/user/1012
_=/usr/bin/env
=== PACKAGES ===
gpg-pubkey-873529b8-54e386ff
xz-libs-5.2.2-2.fc24.s390x
libxshmfence-1.2-3.fc24.s390x
giflib-4.1.6-15.fc24.s390x
trousers-lib-0.3.13-6.fc24.s390x
ncurses-base-6.0-6.20160709.fc25.noarch
gmp-6.1.1-1.fc25.s390x
libidn-1.33-1.fc25.s390x
slang-2.3.0-7.fc25.s390x
libsemanage-2.5-8.fc25.s390x
pkgconfig-0.29.1-1.fc25.s390x
alsa-lib-1.1.1-2.fc25.s390x
yum-metadata-parser-1.1.4-17.fc25.s390x
python3-slip-dbus-0.6.4-4.fc25.noarch
python2-cssselect-0.9.2-1.fc25.noarch
python-fedora-0.8.0-2.fc25.noarch
createrepo_c-libs-0.10.0-6.fc25.s390x
initscripts-9.69-1.fc25.s390x
wget-1.18-2.fc25.s390x
dhcp-client-4.3.5-1.fc25.s390x
parted-3.2-21.fc25.s390x
flex-2.6.0-3.fc25.s390x
colord-libs-1.3.4-1.fc25.s390x
python-osbs-client-0.33-3.fc25.noarch
perl-Pod-Simple-3.35-1.fc25.noarch
python2-simplejson-3.10.0-1.fc25.s390x
brltty-5.4-2.fc25.s390x
librados2-10.2.4-2.fc25.s390x
tcp_wrappers-7.6-83.fc25.s390x
libcephfs_jni1-10.2.4-2.fc25.s390x
nettle-devel-3.3-1.fc25.s390x
bzip2-devel-1.0.6-21.fc25.s390x
libuuid-2.28.2-2.fc25.s390x
pango-1.40.4-1.fc25.s390x
python3-dnf-1.1.10-6.fc25.noarch
cryptsetup-libs-1.7.4-1.fc25.s390x
texlive-kpathsea-doc-svn41139-33.fc25.1.noarch
netpbm-10.77.00-3.fc25.s390x
openssh-7.4p1-4.fc25.s390x
texlive-kpathsea-bin-svn40473-33.20160520.fc25.1.s390x
texlive-graphics-svn41015-33.fc25.1.noarch
texlive-dvipdfmx-def-svn40328-33.fc25.1.noarch
texlive-mfware-svn40768-33.fc25.1.noarch
texlive-texlive-scripts-svn41433-33.fc25.1.noarch
texlive-euro-svn22191.1.1-33.fc25.1.noarch
texlive-etex-svn37057.0-33.fc25.1.noarch
texlive-iftex-svn29654.0.2-33.fc25.1.noarch
texlive-palatino-svn31835.0-33.fc25.1.noarch
texlive-texlive-docindex-svn41430-33.fc25.1.noarch
texlive-xunicode-svn30466.0.981-33.fc25.1.noarch
texlive-koma-script-svn41508-33.fc25.1.noarch
texlive-pst-grad-svn15878.1.06-33.fc25.1.noarch
texlive-pst-blur-svn15878.2.0-33.fc25.1.noarch
texlive-jknapltx-svn19440.0-33.fc25.1.noarch
netpbm-progs-10.77.00-3.fc25.s390x
texinfo-6.1-4.fc25.s390x
openssl-devel-1.0.2k-1.fc25.s390x
python2-sssdconfig-1.15.2-1.fc25.noarch
gdk-pixbuf2-2.36.6-1.fc25.s390x
mesa-libEGL-13.0.4-3.fc25.s390x
pcre-cpp-8.40-6.fc25.s390x
pcre-utf16-8.40-6.fc25.s390x
glusterfs-extra-xlators-3.10.1-1.fc25.s390x
mesa-libGL-devel-13.0.4-3.fc25.s390x
nss-devel-3.29.3-1.1.fc25.s390x
libaio-0.3.110-6.fc24.s390x
libfontenc-1.1.3-3.fc24.s390x
lzo-2.08-8.fc24.s390x
isl-0.14-5.fc24.s390x
libXau-1.0.8-6.fc24.s390x
linux-atm-libs-2.5.1-14.fc24.s390x
libXext-1.3.3-4.fc24.s390x
libXxf86vm-1.1.4-3.fc24.s390x
bison-3.0.4-4.fc24.s390x
perl-srpm-macros-1-20.fc25.noarch
gawk-4.1.3-8.fc25.s390x
libwayland-client-1.12.0-1.fc25.s390x
perl-Exporter-5.72-366.fc25.noarch
perl-version-0.99.17-1.fc25.s390x
fftw-libs-double-3.3.5-3.fc25.s390x
libssh2-1.8.0-1.fc25.s390x
ModemManager-glib-1.6.4-1.fc25.s390x
newt-python3-0.52.19-2.fc25.s390x
python-munch-2.0.4-3.fc25.noarch
python-bugzilla-1.2.2-4.fc25.noarch
libedit-3.1-16.20160618cvs.fc25.s390x
python-pycurl-7.43.0-4.fc25.s390x
createrepo_c-0.10.0-6.fc25.s390x
device-mapper-multipath-libs-0.4.9-83.fc25.s390x
yum-3.4.3-510.fc25.noarch
dhcp-common-4.3.5-1.fc25.noarch
dracut-config-rescue-044-78.fc25.s390x
teamd-1.26-1.fc25.s390x
mozjs17-17.0.0-16.fc25.s390x
libselinux-2.5-13.fc25.s390x
libgo-devel-6.3.1-1.fc25.s390x
NetworkManager-libnm-1.4.4-3.fc25.s390x
python2-pyparsing-2.1.10-1.fc25.noarch
cairo-gobject-1.14.8-1.fc25.s390x
ethtool-4.8-1.fc25.s390x
xorg-x11-proto-devel-7.7-20.fc25.noarch
brlapi-0.6.5-2.fc25.s390x
librados-devel-10.2.4-2.fc25.s390x
libXinerama-devel-1.1.3-6.fc24.s390x
quota-4.03-7.fc25.s390x
lua-posix-33.3.1-3.fc25.s390x
usbredir-devel-0.7.1-2.fc24.s390x
python-libs-2.7.13-1.fc25.s390x
libX11

[Qemu-devel] [PULL 0/1] M68k for 2.10 patches

2017-06-07 Thread Laurent Vivier

The following changes since commit 65dfad62a176f5265f801683be64149c5ad55f7d:

  Merge remote-tracking branch 'remotes/xtensa/tags/20170606-xtensa' into staging (2017-06-06 17:00:12 +0100)

are available in the git repository at:

  git://github.com/vivier/qemu-m68k.git tags/m68k-for-2.10-pull-request

for you to fetch changes up to 18059c9e1648bf4fc5c7c1bae6f54690742b05ba:

  target/m68k: implement rtd (2017-06-07 11:18:30 +0200)





Laurent Vivier (1):
  target/m68k: implement rtd

 target/m68k/cpu.c   |  2 ++
 target/m68k/cpu.h   |  1 +
 target/m68k/translate.c | 11 +++
 3 files changed, 14 insertions(+)

-- 
2.9.4

Re: [Qemu-devel] [PATCH v3 5/7] pci: Make errp the last parameter of pci_add_capability()

2017-06-07 Thread Mao Zhongyi


Hi, Markus

On 06/07/2017 03:05 PM, Markus Armbruster wrote:

> Mao Zhongyi  writes:
>
>> Hi, Eduardo
>>
>> On 06/06/2017 10:52 PM, Eduardo Habkost wrote:
>>> On Tue, Jun 06, 2017 at 07:26:30PM +0800, Mao Zhongyi wrote:
>>>> Add Error argument for pci_add_capability() to leverage the errp
>>>> to pass info on errors. This way is helpful for its callers to
>>>> make a better error handling when moving to 'realize'.
> [...]
>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>> index b73bfea..2bba37a 100644
>>>> --- a/hw/pci/pci.c
>>>> +++ b/hw/pci/pci.c
>>>> @@ -2264,15 +2264,13 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>>   * in pci config space
>>>>   */
>>>>  int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>> -   uint8_t offset, uint8_t size)
>>>> +   uint8_t offset, uint8_t size,
>>>> +   Error **errp)
>>>>  {
>>>>  int ret;
>>>> -Error *local_err = NULL;
>>>>
>>>> -ret = pci_add_capability2(pdev, cap_id, offset, size, &local_err);
>>>> -if (ret < 0) {
>>>> -error_report_err(local_err);
>>>> -}
>>>> +ret = pci_add_capability2(pdev, cap_id, offset, size, errp);
>>>> +
>>>>  return ret;
>>>>  }
>>>
>>> pci_add_capability() and pci_add_capability2() now do exactly the
>>> same, why are both being kept?  I suggest replacing
>>> pci_add_capability2() with pci_add_capability() everywhere (on a
>>> separate patch).
>>>
>>
>> Completely remove pci_add_capability and direct use pci_add_capability2()
>> everywhere is it a more thorough way?
>
> You're converting pci_add_capability() to Error because you need the
> Error for your conversions to realize().

it's true.

> I recommend to change the calls where you need the Error (and only
> these) to call pci_add_capability2() instead.
>
> When no calls to pci_add_capability() remain, we remove it.  If that
> becomes the case in your series, you remove it.
>
> Okay?


This is a gentle way of doing it. After reading the code, I found that
only some of the calls in my series need to be replaced by
pci_add_capability2() if I follow your advice, which is no problem. But
it means that the remaining calls will have to be reworked in the future,
although they could all be fixed in a separate patch now. Of course, this
is just my own opinion; for the sake of the git history I would rather
hear your advice. :)

Thanks
Mao
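
The two calling conventions discussed in this thread can be illustrated with a minimal sketch. It assumes a toy Error type and hypothetical add_capability()/add_capability2() functions that only mirror the shape of the QEMU ones; the 256-byte limit and the message text are assumptions made for illustration, not QEMU's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error; not the real type. */
typedef struct Error {
    char msg[128];
} Error;

/* Hypothetical helper: store a message in *errp if the caller wants one. */
static void set_error(Error **errp, const char *msg)
{
    if (errp) {
        Error *err = calloc(1, sizeof(*err));
        assert(err);
        strncpy(err->msg, msg, sizeof(err->msg) - 1);
        *errp = err;
    }
}

/* Errp-style variant: hands the failure to the caller, prints nothing. */
static int add_capability2(unsigned offset, unsigned size, Error **errp)
{
    if (offset + size > 256) {
        set_error(errp, "capability does not fit in config space");
        return -1;
    }
    return (int)offset;
}

/* Legacy-style wrapper: same work, but consumes and prints the error. */
static int add_capability(unsigned offset, unsigned size)
{
    Error *local_err = NULL;
    int ret = add_capability2(offset, size, &local_err);

    if (ret < 0) {
        fprintf(stderr, "%s\n", local_err->msg);
        free(local_err);
    }
    return ret;
}
```

Callers that need the error for realize() would call the errp-style variant directly, while untouched callers keep the wrapper until none are left.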

Re: [Qemu-devel] [Qemu-block] [PATCH] blockjob: cancel blockjobs before stopping all iothreads

2017-06-07 Thread Alberto Garcia

On Sat 03 Jun 2017 07:48:37 AM CEST, sochin.jiang wrote:

> --- a/block.c
> +++ b/block.c
> @@ -3084,9 +3084,16 @@ static void bdrv_close(BlockDriverState *bs)
>  bdrv_drained_end(bs);
>  }
>  
> +void bdrv_cancel_all(void)
> +{
> +if (!block_jobs_is_empty()) {
> +block_job_cancel_sync_all();
> +}
> +}

Why do you need this function at all? block_job_cancel_sync_all() is
already doing nothing when the block_jobs list is empty.

Berto
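
The point about the redundant guard can be shown with a small sketch. It assumes a hypothetical singly linked Job list and made-up function names; QEMU's real block job list and block_job_cancel_sync_all() are more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job node; QEMU's real BlockJob is far richer. */
typedef struct Job {
    struct Job *next;
    int cancelled;
} Job;

/* Cancel every job on the list and return how many were cancelled. */
static int cancel_all(Job *head)
{
    int n = 0;

    for (Job *j = head; j != NULL; j = j->next) {
        j->cancelled = 1;
        n++;
    }
    return n;
}

/* A wrapper with an emptiness guard behaves identically. */
static int cancel_all_guarded(Job *head)
{
    if (head != NULL) {
        return cancel_all(head);
    }
    return 0;
}

/* Usage: both variants agree on empty and non-empty lists. */
static void example(void)
{
    Job second = { NULL, 0 };
    Job first = { &second, 0 };

    assert(cancel_all_guarded(NULL) == cancel_all(NULL));
    assert(cancel_all(&first) == 2);
}
```

Because the loop body never runs on an empty list, the guard adds no behavior of its own.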

[Qemu-devel] [PATCH v8 00/11] calculate blocktime for postcopy live migration

2017-06-07 Thread Alexey Perevalov

This is the 8th version.

The rationale for the idea is as follows:
a vCPU can be suspended during postcopy live migration until the faulted
page has been copied into the kernel. Downtime on the source side is the
time interval from when the source turns the vCPUs off until the destination
starts running them. That value is appropriate for precopy migration, where it
really shows the amount of time the vCPUs are down, but not for postcopy
migration, because several vCPU threads can be suspended after the vCPUs were
started. This is important for estimating packet drop for SDN software.


(V7 -> V8)
1. just one comma in
"migration: fix hardcoded function name in error report"
It was really missing, but was fixed in a further patch.

(V6 -> V7)
1. copied bitmap was placed into RAMBlock like the other migration
related bitmaps.
2. Ordering of mark_postcopy_blocktime_end call and ordering
of checking copied bitmap were changed.
3. linewrap style defects
4. new patch "postcopy_place_page factoring out"
5. postcopy_ram_supported_by_host accepts
MigrationIncomingState in qmp_migrate_set_capabilities
6. minor fixes of documentation,
and the huge description of get_postcopy_total_blocktime was
moved. David's comment.

(V5 -> V6)
- blocktime was added into hmp command. Comment from David.
- bitmap for copied pages was added as well as check in *_begin/_end
functions. Patch uses just introduced RAMBLOCK_FOREACH. Comment from David.
- description of receive_ufd_features/request_ufd_features. Comment from David.
- commit message headers/@since references were modified. Comment from Eric.
- also typos in documentation. Comment from Eric.
- style and description of field in MigrationInfo. Comment from Eric.
- ufd_check_and_apply (formerly ufd_version_check) is called twice,
so my previous patch contained a double allocation of the blocktime context and,
as a result, a memory leak. This is fixed in this patch series.

(V4 -> V5)
- fill_destination_postcopy_migration_info empty stub was missing for
non-Linux builds

(V3 -> V4)
- get rid of Downtime as a name for vCPU waiting time during postcopy migration
- PostcopyBlocktimeContext renamed (it was just BlocktimeContext)
- atomic operations are used for dealing with fields of
PostcopyBlocktimeContext that are affected in both threads.
- hardcoded function names in error_report were replaced with %s and __line__
- this patch set includes the postcopy-downtime capability, but it is used on
the destination; coupled with the impossibility of returning the calculated
downtime back to the source to show it in query-migrate, it looks like a big
trade-off
- UFFD_API has to be sent regardless of whether a feature needs to be requested
from the kernel, because the kernel expects it in any case (see patch comment)
- postcopy_downtime included into query-migrate output
- also this patch set includes a trivial fix,
migration: fix hardcoded function name in error report;
maybe that is a candidate for the qemu-trivial mailing list, but I already
sent "migration: Fixed code style" and it was unclaimed.

(V2 -> V3)
- Downtime calculation approach was changed, thanks to Peter Xu
- Due to the previous point there is no more need to keep GTree as well as the
bitmap of cpus, so glib changes aren't included in this patch set; they could
be resent in another patch set if there is a good reason for it.
- No procfs traces in this patchset; if somebody wants them, they can be taken
from the patchwork site to track down page fault initiators.
- UFFD_FEATURE_THREAD_ID is requested only when the kernel supports it
- It doesn't send back the downtime, just traces it

This patch set is based on commit
a0d4aac7467dd02e5657b79e867f067330266a24
of git://git.qemu-project.org/qemu.git

Alexey Perevalov (11):
  userfault: add pid into uffd_msg & update UFFD_FEATURE_*
  migration: pass MigrationIncomingState* into migration check functions
  migration: fix hardcoded function name in error report
  migration: split ufd_version_check onto receive/request features part
  migration: introduce postcopy-blocktime capability
  migration: add postcopy blocktime ctx into MigrationIncomingState
  migration: add bitmap for copied page
  migration: postcopy_place_page factoring out
  migration: calculate vCPU blocktime on dst side
  migration: add postcopy total blocktime into query-migrate
  migration: postcopy_blocktime documentation

 docs/migration.txt|  10 +
 hmp.c |  15 ++
 include/exec/ram_addr.h   |   2 +
 include/migration/migration.h |  13 ++
 linux-headers/linux/userfaultfd.h |   4 +
 migration/migration.c |  52 +-
 migration/postcopy-ram.c  | 374 --
 migration/postcopy-ram.h  |   6 +-
 migration/ram.c   |  40 +++-
 migration/ram.h   |   4 +
 migration/savevm.c|   2 +-
 migration/trace-events|   6 +-
 qapi-schema.json  |  14 +-
 13 files changed, 514 insertions(+

[Qemu-devel] [PATCH v8 08/11] migration: postcopy_place_page factoring out

2017-06-07 Thread Alexey Perevalov

The page needs to be marked as copied as close as possible to the place
where the copy is tracked. This will be needed in a further patch.

Signed-off-by: Alexey Perevalov 
---
 migration/postcopy-ram.c | 13 -
 migration/postcopy-ram.h |  4 ++--
 migration/ram.c  |  4 ++--
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index ade7f1c..62a272a 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -713,9 +713,10 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
  * returns 0 on success
  */
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-size_t pagesize)
+RAMBlock *rb)
 {
 struct uffdio_copy copy_struct;
+size_t pagesize = qemu_ram_pagesize(rb);
 
 copy_struct.dst = (uint64_t)(uintptr_t)host;
 copy_struct.src = (uint64_t)(uintptr_t)from;
@@ -744,10 +745,12 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
  * returns 0 on success
  */
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
- size_t pagesize)
+ RAMBlock *rb)
 {
+size_t pagesize;
 trace_postcopy_place_page_zero(host);
 
+pagesize = qemu_ram_pagesize(rb);
 if (pagesize == getpagesize()) {
 struct uffdio_zeropage zero_struct;
 zero_struct.range.start = (uint64_t)(uintptr_t)host;
@@ -778,7 +781,7 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
 memset(mis->postcopy_tmp_zero_page, '\0', mis->largest_page_size);
 }
 return postcopy_place_page(mis, host, mis->postcopy_tmp_zero_page,
-   pagesize);
+   rb);
 }
 
 return 0;
@@ -841,14 +844,14 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 }
 
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-size_t pagesize)
+RAMBlock *rb)
 {
 assert(0);
 return -1;
 }
 
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
-size_t pagesize)
+RAMBlock *rb)
 {
 assert(0);
 return -1;
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 587a8b8..77ea0fd 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -72,14 +72,14 @@ void postcopy_discard_send_finish(MigrationState *ms,
  * returns 0 on success
  */
 int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
-size_t pagesize);
+RAMBlock *rb);
 
 /*
  * Place a zero page at (host) atomically
  * returns 0 on success
  */
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
- size_t pagesize);
+ RAMBlock *rb);
 
 /* The current postcopy state is read/set by postcopy_state_get/set
  * which update it atomically.
diff --git a/migration/ram.c b/migration/ram.c
index a7c0db4..a791d40 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2524,10 +2524,10 @@ static int ram_load_postcopy(QEMUFile *f)
 
 if (all_zero) {
 ret = postcopy_place_page_zero(mis, place_dest,
-   block->page_size);
+   block);
 } else {
 ret = postcopy_place_page(mis, place_dest,
-  place_source, block->page_size);
+  place_source, block);
 }
 }
 if (!ret) {
-- 
1.9.1

[Qemu-devel] [PATCH v8 05/11] migration: introduce postcopy-blocktime capability

2017-06-07 Thread Alexey Perevalov

Right now it can be used on the destination side to
enable vCPU blocktime calculation for postcopy live migration.
vCPU blocktime is the time from when a vCPU thread is put into
interruptible sleep until the memory page has been copied and the
thread is awake again.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Alexey Perevalov 
---
 include/migration/migration.h | 1 +
 migration/migration.c | 9 +
 qapi-schema.json  | 5 -
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 79b5484..2e61df5 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -189,6 +189,7 @@ int migrate_compress_level(void);
 int migrate_compress_threads(void);
 int migrate_decompress_threads(void);
 bool migrate_use_events(void);
+bool migrate_postcopy_blocktime(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_message(MigrationIncomingState *mis,
diff --git a/migration/migration.c b/migration/migration.c
index 2a77636..d1cc34f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1371,6 +1371,15 @@ bool migrate_zero_blocks(void)
 return s->enabled_capabilities[MIGRATION_CAPABILITY_ZERO_BLOCKS];
 }
 
+bool migrate_postcopy_blocktime(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME];
+}
+
 bool migrate_use_compression(void)
 {
 MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 4b50b65..e906953 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -900,12 +900,15 @@
 #  offers more flexibility.
 #  (Since 2.10)
 #
+# @postcopy-blocktime: Calculate downtime for postcopy live migration
+# (since 2.10)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
-   'block' ] }
+   'block', 'postcopy-blocktime'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
1.9.1

[Qemu-devel] [PATCH v8 06/11] migration: add postcopy blocktime ctx into MigrationIncomingState

2017-06-07 Thread Alexey Perevalov

This patch adds a request to kernel space for UFFD_FEATURE_THREAD_ID,
in case this feature is provided by the kernel.

PostcopyBlocktimeContext is encapsulated inside postcopy-ram.c,
because it is a postcopy-only feature.
This also defines the lifetime of the PostcopyBlocktimeContext instance.
Information from the PostcopyBlocktimeContext instance will be provided
long after the postcopy migration ends, so the instance of
PostcopyBlocktimeContext will live until QEMU exits, but the part of it
(vcpu_addr, page_fault_vcpu_time) used only during calculation will be
released when postcopy has ended or failed.

To enable postcopy blocktime calculation on the destination, the proper
capability must be requested (a patch for the documentation is at the tail
of the patch set).

As an example, the following command enables that capability, assuming QEMU
was started with the
-chardev socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock
option to control it:

[root@host]#printf "{\"execute\" : \"qmp_capabilities\"}\r\n \
{\"execute\": \"migrate-set-capabilities\" , \"arguments\":   {
\"capabilities\": [ { \"capability\": \"postcopy-blocktime\", \"state\":
true } ] } }" | nc -U /var/lib/migrate-vm-monitor.sock

Or just with HMP
(qemu) migrate_set_capability postcopy-blocktime on

Signed-off-by: Alexey Perevalov 
---
 include/migration/migration.h |  8 ++
 migration/postcopy-ram.c  | 65 +++
 2 files changed, 73 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2e61df5..766e802 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -49,6 +49,8 @@ enum mig_rp_message_type {
 MIG_RP_MSG_MAX
 };
 
+struct PostcopyBlocktimeContext;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
 QEMUFile *from_src_file;
@@ -86,6 +88,12 @@ struct MigrationIncomingState {
 /* The coroutine we should enter (back) after failover */
 Coroutine *migration_incoming_co;
 QemuSemaphore colo_incoming_sem;
+
+/*
+ * PostcopyBlocktimeContext to keep information for postcopy
+ * live migration, to calculate vCPU block time
+ * */
+struct PostcopyBlocktimeContext *blocktime_ctx;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index cbe8f9f..ade7f1c 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -63,6 +63,58 @@ struct PostcopyDiscardState {
 #include 
 #include 
 
+typedef struct PostcopyBlocktimeContext {
+/* time when page fault initiated per vCPU */
+int64_t *page_fault_vcpu_time;
+/* page address per vCPU */
+uint64_t *vcpu_addr;
+int64_t total_blocktime;
+/* blocktime per vCPU */
+int64_t *vcpu_blocktime;
+/* point in time when last page fault was initiated */
+int64_t last_begin;
+/* number of vCPU are suspended */
+int smp_cpus_down;
+
+/*
+ * Handler for exit event, necessary for
+ * releasing whole blocktime_ctx
+ */
+Notifier exit_notifier;
+/*
+ * Handler for postcopy event, necessary for
+ * releasing unnecessary part of blocktime_ctx
+ */
+Notifier postcopy_notifier;
+} PostcopyBlocktimeContext;
+
+static void destroy_blocktime_context(struct PostcopyBlocktimeContext *ctx)
+{
+g_free(ctx->page_fault_vcpu_time);
+g_free(ctx->vcpu_addr);
+g_free(ctx->vcpu_blocktime);
+g_free(ctx);
+}
+
+static void migration_exit_cb(Notifier *n, void *data)
+{
+PostcopyBlocktimeContext *ctx = container_of(n, PostcopyBlocktimeContext,
+ exit_notifier);
+destroy_blocktime_context(ctx);
+}
+
+static struct PostcopyBlocktimeContext *blocktime_context_new(void)
+{
+PostcopyBlocktimeContext *ctx = g_new0(PostcopyBlocktimeContext, 1);
+ctx->page_fault_vcpu_time = g_new0(int64_t, smp_cpus);
+ctx->vcpu_addr = g_new0(uint64_t, smp_cpus);
+ctx->vcpu_blocktime = g_new0(int64_t, smp_cpus);
+
+ctx->exit_notifier.notify = migration_exit_cb;
+qemu_add_exit_notifier(&ctx->exit_notifier);
+add_migration_state_change_notifier(&ctx->postcopy_notifier);
+return ctx;
+}
 
 /**
  * receive_ufd_features: check userfault fd features, to request only supported
@@ -155,6 +207,19 @@ static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
 }
 }
 
+#ifdef UFFD_FEATURE_THREAD_ID
+if (migrate_postcopy_blocktime() && mis &&
+UFFD_FEATURE_THREAD_ID & supported_features) {
+/* kernel supports that feature */
+/* don't create blocktime_context if it exists */
+if (!mis->blocktime_ctx) {
+mis->blocktime_ctx = blocktime_context_new();
+}
+
+asked_features |= UFFD_FEATURE_THREAD_ID;
+}
+#endif
+
 /*
  * request features, even if asked_features is 0, due to
  * kernel expects UFFD_API before UFFDIO_REGISTER, per
-- 
1.9.1

[Qemu-devel] [PATCH v8 09/11] migration: calculate vCPU blocktime on dst side

2017-06-07 Thread Alexey Perevalov

This patch provides blocktime calculation per vCPU,
as a summary and as an overlapped value for all vCPUs.

This approach was suggested by Peter Xu as an improvement on the
previous approach, where QEMU kept a tree with the faulted page address and a
cpus bitmask in it. Now QEMU keeps an array with the faulted page address as
the value and the vCPU as the index. This helps to find the proper vCPU at
UFFD_COPY time. It also keeps a list of blocktime per vCPU (which can be
traced with page_fault_addr).

Blocktime will not be calculated if the postcopy_blocktime field of
MigrationIncomingState wasn't initialized.
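
The array-indexed-by-vCPU idea described above can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: BlocktimeSketch, blocktime_begin(), blocktime_end() and the fixed NR_VCPUS are hypothetical names, and the sketch omits the atomics, the copied-bitmap check and the overlapped (total) blocktime that the patch itself handles.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VCPUS 4  /* hypothetical fixed vCPU count for the sketch */

/* Hypothetical, single-threaded stand-in for the blocktime context. */
typedef struct BlocktimeSketch {
    uint64_t vcpu_addr[NR_VCPUS];      /* faulted address, 0 = not blocked */
    int64_t fault_time[NR_VCPUS];      /* when the fault began, in ms */
    int64_t vcpu_blocktime[NR_VCPUS];  /* accumulated blocktime per vCPU */
} BlocktimeSketch;

/* Record that @cpu faulted on @addr at time @now_ms. */
static void blocktime_begin(BlocktimeSketch *s, int cpu, uint64_t addr,
                            int64_t now_ms)
{
    assert(cpu >= 0 && cpu < NR_VCPUS && addr != 0);
    s->vcpu_addr[cpu] = addr;
    s->fault_time[cpu] = now_ms;
}

/*
 * Page @addr was copied at @now_ms: unblock every vCPU waiting on it.
 * Returns the number of vCPUs that were waiting.
 */
static int blocktime_end(BlocktimeSketch *s, uint64_t addr, int64_t now_ms)
{
    int affected = 0;

    assert(addr != 0);
    for (int i = 0; i < NR_VCPUS; i++) {
        if (s->vcpu_addr[i] != addr) {
            continue;
        }
        s->vcpu_addr[i] = 0;
        s->vcpu_blocktime[i] += now_ms - s->fault_time[i];
        affected++;
    }
    return affected;
}
```

The linear scan at copy time is the cost of dropping the tree: lookup is proportional to the number of vCPUs, which keeps the bookkeeping simple.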

Signed-off-by: Alexey Perevalov 
---
 migration/postcopy-ram.c | 139 ++-
 migration/trace-events   |   5 +-
 2 files changed, 142 insertions(+), 2 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 62a272a..0ad9f9f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -27,6 +27,7 @@
 #include "ram.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/balloon.h"
+#include 
 #include "qemu/error-report.h"
 #include "trace.h"
 
@@ -561,6 +562,133 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
 return 0;
 }
 
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+CPUState *cpu_iter;
+
+CPU_FOREACH(cpu_iter) {
+if (cpu_iter->thread_id == pid) {
+return cpu_iter->cpu_index;
+}
+}
+trace_get_mem_fault_cpu_index(pid);
+return -1;
+}
+
+/*
+ * This function is being called when pagefault occurs. It
+ * tracks down vCPU blocking time.
+ *
+ * @addr: faulted host virtual address
+ * @ptid: faulted process thread id
+ * @rb: ramblock appropriate to addr
+ */
+static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
+  RAMBlock *rb)
+{
+int cpu;
+MigrationIncomingState *mis = migration_incoming_get_current();
+PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+int64_t now_ms;
+
+if (!dc || ptid == 0) {
+return;
+}
+cpu = get_mem_fault_cpu_index(ptid);
+if (cpu < 0) {
+return;
+}
+
+now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+if (dc->vcpu_addr[cpu] == 0) {
+atomic_inc(&dc->smp_cpus_down);
+}
+
+atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
+atomic_xchg__nocheck(&dc->last_begin, now_ms);
+atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
+
+if (test_copiedmap_by_addr(addr, rb)) {
+atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
+atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], 0);
+atomic_sub(&dc->smp_cpus_down, 1);
+}
+trace_mark_postcopy_blocktime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
+cpu);
+}
+
+/*
+ *  This function just provide calculated blocktime per cpu and trace it.
+ *  Total blocktime is calculated in mark_postcopy_blocktime_end.
+ *
+ *
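+ * Assume we have 3 CPU
+ *
+ *      S1         E1          S1              E1
+ * -----***********------------xxx*************------------> CPU1
+ *
+ *             S2                 E2
+ * ------------****************xxx-------------------------> CPU2
+ *
+ *                         S3            E3
+ * ------------------------****xxx*******------------------> CPU3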
+ *
+ * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
+ * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
+ * S3,S1,E2 - sequence includes all CPUs, in this case overlap will be S1,E2 -
+ *it's a part of total blocktime.
+ * S1 - here is last_begin
+ * Legend of the picture is following:
+ *  * - means blocktime per vCPU
+ *  x - means overlapped blocktime (total blocktime)
+ *
+ * @addr: host virtual address
+ */
+static void mark_postcopy_blocktime_end(uint64_t addr)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+int i, affected_cpu = 0;
+int64_t now_ms;
+bool vcpu_total_blocktime = false;
+
+if (!dc) {
+return;
+}
+
+now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+/* lookup cpu, to clear it,
+ * that algorithm looks straighforward, but it's not
+ * optimal, more optimal algorithm is keeping tree or hash
+ * where key is address value is a list of  */
+for (i = 0; i < smp_cpus; i++) {
+uint64_t vcpu_blocktime = 0;
+if (atomic_fetch_add(&dc->vcpu_addr[i], 0) != addr) {
+continue;
+}
+atomic_xchg__nocheck(&dc->vcpu_addr[i], 0);
+vcpu_blocktime = now_ms -
+atomic_fetch_add(&dc->page_fault_vcpu_time[i], 0);
+affected_cpu += 1;
+/* we need to know is that mark_postcopy_end was due to
+ * faulted page, another possible case it's prefetched
+ * page and in that case we shouldn't be here */
+if (!vcp

[Qemu-devel] [PATCH v8 01/11] userfault: add pid into uffd_msg & update UFFD_FEATURE_*

2017-06-07 Thread Alexey Perevalov

This commit duplicates the header changes from the Linux kernel patch
"userfaultfd: provide pid in userfault msg".

Signed-off-by: Alexey Perevalov 
---
 linux-headers/linux/userfaultfd.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
index 9701772..eda028c 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -78,6 +78,9 @@ struct uffd_msg {
struct {
__u64   flags;
__u64   address;
+   union {
+   __u32   ptid;
+   } feat;
} pagefault;
 
struct {
@@ -161,6 +164,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MISSING_HUGETLBFS (1<<4)
 #define UFFD_FEATURE_MISSING_SHMEM (1<<5)
 #define UFFD_FEATURE_EVENT_UNMAP   (1<<6)
+#define UFFD_FEATURE_THREAD_ID (1<<7)
__u64 features;
 
__u64 ioctls;
-- 
1.9.1

[Qemu-devel] [PATCH v8 03/11] migration: fix hardcoded function name in error report

2017-06-07 Thread Alexey Perevalov

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Alexey Perevalov 
---
 migration/postcopy-ram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 10d39a0..8838901 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -71,7 +71,7 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
 api_struct.api = UFFD_API;
 api_struct.features = 0;
 if (ioctl(ufd, UFFDIO_API, &api_struct)) {
-error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
+error_report("%s: UFFDIO_API failed: %s", __func__,
  strerror(errno));
 return false;
 }
-- 
1.9.1

[Qemu-devel] [PATCH v8 07/11] migration: add bitmap for copied page

2017-06-07 Thread Alexey Perevalov

This patch adds the ability to track already copied
pages. It is necessary for calculating vCPU block time in the
postcopy migration feature, and maybe for restoring after a
postcopy migration failure.
It is also necessary for solving the shared memory issue in
postcopy live migration. Information about copied pages
will be transferred to the software virtual bridge
(e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for
already copied pages. The fallocate syscall is required for
remapped shared memory, because remapping itself blocks
ioctl(UFFDIO_COPY); the ioctl in this case will end with an EEXIST
error (the struct page exists after remap).

The bitmap is placed into RAMBlock like the other postcopy/precopy
related bitmaps. The helpers are in migration/ram.c, because
that file is allowed to work with RAMBlock.
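
A per-block bitmap with one bit per page, as described above, could look roughly like the sketch below. It is an illustration only: BlockSketch and the copiedmap_* helpers are hypothetical names, the bit operations are not atomic (the patch uses an atomic set), and the page shift is derived with a plain loop rather than QEMU's bit-search helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the RAMBlock fields the helpers need. */
typedef struct BlockSketch {
    uint64_t host;          /* start address of the block */
    uint64_t length;        /* block length in bytes */
    uint64_t page_size;     /* must be a power of two */
    unsigned long *copiedmap;
} BlockSketch;

/* log2 of a power-of-two page size (count of trailing zero bits). */
static unsigned page_shift(uint64_t page_size)
{
    unsigned shift = 0;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    while (!(page_size & 1)) {
        page_size >>= 1;
        shift++;
    }
    return shift;
}

/* Index of the bitmap bit that covers @addr. */
static uint64_t copied_bit(const BlockSketch *b, uint64_t addr)
{
    assert(addr >= b->host && addr - b->host < b->length);
    return (addr - b->host) >> page_shift(b->page_size);
}

/* Allocate a zeroed bitmap with one bit per page of the block. */
static void copiedmap_alloc(BlockSketch *b)
{
    uint64_t pages = b->length >> page_shift(b->page_size);
    size_t words = (pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    assert(pages > 0);
    b->copiedmap = calloc(words, sizeof(unsigned long));
    assert(b->copiedmap);
}

static void copiedmap_set(BlockSketch *b, uint64_t addr)
{
    uint64_t bit = copied_bit(b, addr);

    b->copiedmap[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

static bool copiedmap_test(const BlockSketch *b, uint64_t addr)
{
    uint64_t bit = copied_bit(b, addr);

    return (b->copiedmap[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}
```

Any address inside a page maps to the same bit, so the fault path and the copy path can use unaligned and aligned addresses interchangeably.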

Signed-off-by: Alexey Perevalov 
---
 include/exec/ram_addr.h |  2 ++
 migration/ram.c | 36 
 migration/ram.h |  4 
 3 files changed, 42 insertions(+)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 140efa8..6a3780b 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -47,6 +47,8 @@ struct RAMBlock {
  * of the postcopy phase
  */
 unsigned long *unsentmap;
+/* bitmap of already copied pages in postcopy */
+unsigned long *copiedmap;
 };
 
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
diff --git a/migration/ram.c b/migration/ram.c
index f387e9c..a7c0db4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -149,6 +149,25 @@ out:
 return ret;
 }
 
+static unsigned long int get_copied_bit_offset(uint64_t addr, RAMBlock *rb)
+{
+uint64_t addr_offset = addr - (uint64_t)(uintptr_t)rb->host;
+int page_shift = find_first_bit((unsigned long *)&rb->page_size,
+sizeof(rb->page_size));
+
+return addr_offset >> page_shift;
+}
+
+int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
+{
+return test_bit(get_copied_bit_offset(addr, rb), rb->copiedmap);
+}
+
+void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb)
+{
+set_bit_atomic(get_copied_bit_offset(addr, rb), rb->copiedmap);
+}
+
 /*
  * An outstanding page request, on the source, having been received
  * and queued
@@ -1449,6 +1468,8 @@ static void ram_migration_cleanup(void *opaque)
 block->bmap = NULL;
 g_free(block->unsentmap);
 block->unsentmap = NULL;
+g_free(block->copiedmap);
+block->copiedmap = NULL;
 }
 
 XBZRLE_cache_lock();
@@ -2517,6 +2538,14 @@ static int ram_load_postcopy(QEMUFile *f)
 return ret;
 }
 
+static unsigned long get_copiedmap_size(RAMBlock *rb)
+{
+unsigned long pages;
+pages = rb->max_length >> find_first_bit((unsigned long *)&rb->page_size,
+ sizeof(rb->page_size));
+return pages;
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
 int flags = 0, ret = 0;
@@ -2544,6 +2573,13 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 rcu_read_lock();
 
 if (postcopy_running) {
+RAMBlock *rb;
+RAMBLOCK_FOREACH(rb) {
+/* need for destination, bitmap_new calls
+ * g_try_malloc0 and this function
+ * Attempts to allocate @n_bytes, initialized to 0'sh */
+rb->copiedmap = bitmap_new(get_copiedmap_size(rb));
+}
 ret = ram_load_postcopy(f);
 }
 
diff --git a/migration/ram.h b/migration/ram.h
index c9563d1..1f32824 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -67,4 +67,8 @@ int ram_discard_range(const char *block_name, uint64_t start, size_t length);
 int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+
+int test_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
+void set_copiedmap_by_addr(uint64_t addr, RAMBlock *rb);
+
 #endif
-- 
1.9.1

[Qemu-devel] [PATCH v8 10/11] migration: add postcopy total blocktime into query-migrate

2017-06-07 Thread Alexey Perevalov

Postcopy total blocktime is available on the destination side only,
but query-migrate was possible only on the source. This patch
adds the ability to call query-migrate on the destination.
To see the postcopy blocktime, the postcopy-blocktime capability
must be requested.

The query-migrate command will show the following sample result:
{"return": {
"postcopy-vcpu-blocktime": [115, 100],
"status": "completed",
"postcopy-blocktime": 100
}}

postcopy-vcpu-blocktime contains a list, where the first item corresponds
to the first vCPU in QEMU.

This patch has a drawback: it combines the states of incoming and
outgoing migration. The outgoing migration state will overwrite the incoming
state. It looks better to separate query-migrate for incoming and
outgoing migration, or to add a parameter indicating the type of migration.
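
The overwrite behaviour mentioned above follows from the fill order, which can be sketched in isolation. The types and names below (StatusSketch, InfoSketch, fill_info, query_migrate) are hypothetical and reduced to the bare minimum; they only mirror the order in which the destination and source states are applied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of states; QEMU's enum has more members. */
typedef enum { ST_NONE, ST_ACTIVE, ST_COMPLETED } StatusSketch;

typedef struct InfoSketch {
    bool has_status;
    StatusSketch status;
} InfoSketch;

/* Fill @info from one side's state; an idle side leaves @info untouched. */
static void fill_info(InfoSketch *info, StatusSketch state)
{
    if (state == ST_NONE) {
        return;
    }
    info->has_status = true;
    info->status = state;
}

/* Destination first, then source, so a non-idle source wins. */
static InfoSketch query_migrate(StatusSketch incoming, StatusSketch outgoing)
{
    InfoSketch info = { false, ST_NONE };

    fill_info(&info, incoming);
    fill_info(&info, outgoing);
    return info;
}

/* Usage: the two cases the commit message is concerned with. */
static void example(void)
{
    /* Idle source: the destination's status survives. */
    assert(query_migrate(ST_COMPLETED, ST_NONE).status == ST_COMPLETED);
    /* Active source: it overwrites the destination's status. */
    assert(query_migrate(ST_COMPLETED, ST_ACTIVE).status == ST_ACTIVE);
}
```

A separate command or a direction parameter would remove the need for this ordering altogether.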

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Alexey Perevalov 
---
 hmp.c | 15 
 include/migration/migration.h |  4 +++
 migration/migration.c | 40 +++---
 migration/postcopy-ram.c  | 57 +++
 migration/trace-events|  1 +
 qapi-schema.json  |  9 ++-
 6 files changed, 122 insertions(+), 4 deletions(-)

diff --git a/hmp.c b/hmp.c
index 8c72c58..e0c4fdf 100644
--- a/hmp.c
+++ b/hmp.c
@@ -262,6 +262,21 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
info->cpu_throttle_percentage);
 }
 
+if (info->has_postcopy_blocktime) {
+monitor_printf(mon, "postcopy blocktime: %" PRId64 "\n",
+   info->postcopy_blocktime);
+}
+
+if (info->has_postcopy_vcpu_blocktime) {
+Visitor *v;
+char *str;
+v = string_output_visitor_new(false, &str);
+visit_type_int64List(v, NULL, &info->postcopy_vcpu_blocktime, NULL);
+visit_complete(v, &str);
+monitor_printf(mon, "postcopy vcpu blocktime: %s\n", str);
+g_free(str);
+visit_free(v);
+}
 qapi_free_MigrationInfo(info);
 qapi_free_MigrationCapabilityStatusList(caps);
 }
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 766e802..7d20470 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -98,6 +98,10 @@ struct MigrationIncomingState {
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+/*
+ * Functions to work with blocktime context
+ */
+void fill_destination_postcopy_migration_info(MigrationInfo *info);
 
 struct MigrationState
 {
diff --git a/migration/migration.c b/migration/migration.c
index d1cc34f..b80d5b5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -625,14 +625,15 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 }
 }
 
-MigrationInfo *qmp_query_migrate(Error **errp)
+static void fill_source_migration_info(MigrationInfo *info)
 {
-MigrationInfo *info = g_malloc0(sizeof(*info));
 MigrationState *s = migrate_get_current();
 
 switch (s->state) {
 case MIGRATION_STATUS_NONE:
 /* no migration has happened ever */
+/* do not overwrite destination migration status */
+return;
 break;
 case MIGRATION_STATUS_SETUP:
 info->has_status = true;
@@ -718,10 +719,43 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 break;
 }
 info->status = s->state;
+}
 
-return info;
+static void fill_destination_migration_info(MigrationInfo *info)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+
+switch (mis->state) {
+case MIGRATION_STATUS_NONE:
+return;
+break;
+case MIGRATION_STATUS_SETUP:
+case MIGRATION_STATUS_CANCELLING:
+case MIGRATION_STATUS_CANCELLED:
+case MIGRATION_STATUS_ACTIVE:
+case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+case MIGRATION_STATUS_FAILED:
+case MIGRATION_STATUS_COLO:
+info->has_status = true;
+break;
+case MIGRATION_STATUS_COMPLETED:
+info->has_status = true;
+fill_destination_postcopy_migration_info(info);
+break;
+}
+info->status = mis->state;
 }
 
+MigrationInfo *qmp_query_migrate(Error **errp)
+{
+MigrationInfo *info = g_malloc0(sizeof(*info));
+
+fill_destination_migration_info(info);
+fill_source_migration_info(info);
+
+return info;
+ }
+
 void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
   Error **errp)
 {
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 0ad9f9f..7f5b402 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -117,6 +117,55 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
 return ctx;
 }
 
+static int64List *get_vcpu_blocktime_list(PostcopyBlocktimeContext *ctx)
+{
+int64List *list = NULL, *entry = NULL;
+int i;
+
+for (i = smp_cpus - 1; i >= 0; i--) {
+entry = g_new0(in

Re: [Qemu-devel] [RFC PATCH 1/3] vmstate: error hint for failed equal checks

2017-06-07 Thread Dr. David Alan Gilbert

* Halil Pasic (pa...@linux.vnet.ibm.com) wrote:
> In some cases a failing VMSTATE_*_EQUAL does not mean we detected a bug
> (it's actually the best we can do). Especially in these cases a verbose
> error message is required.
> 
> Let's introduce infrastructure for specifying an error hint to be used if
> the equal check fails.
> 
> Signed-off-by: Halil Pasic 
> ---
> Macros come in part 2. Once we are happy with the macros,
> these two patches should be squashed into one.
> ---
>  include/migration/vmstate.h |  1 +
>  migration/vmstate-types.c   | 36 +++-
>  2 files changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 66895623da..d90d9b12ca 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -200,6 +200,7 @@ typedef enum {
>  
>  struct VMStateField {
>  const char *name;
> +const char *err_hint;
>  size_t offset;
>  size_t size;
>  size_t start;
> diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
> index 7287c6baa6..84d0545a38 100644
> --- a/migration/vmstate-types.c
> +++ b/migration/vmstate-types.c
> @@ -19,6 +19,7 @@
>  #include "qemu/error-report.h"
>  #include "qemu/queue.h"
>  #include "trace.h"
> +#include "qapi/error.h"
>  
>  /* bool */
>  
> @@ -118,6 +119,7 @@ const VMStateInfo vmstate_info_int32 = {
>  static int get_int32_equal(QEMUFile *f, void *pv, size_t size,
> VMStateField *field)
>  {
> +Error *err = NULL;
>  int32_t *v = pv;
>  int32_t v2;
>  qemu_get_sbe32s(f, &v2);
> @@ -125,7 +127,11 @@ static int get_int32_equal(QEMUFile *f, void *pv, size_t size,
>  if (*v == v2) {
>  return 0;
>  }
> -error_report("%" PRIx32 " != %" PRIx32, *v, v2);
> +error_setg(&err, "%" PRIx32 " != %" PRIx32, *v, v2);
> +if (field->err_hint) {
> +error_append_hint(&err, "%s\n", field->err_hint);
> +}
> +error_report_err(err);

I'm a bit worried about whether the error_append_hint data gets
printed out by error_report_err if we're being driven by a QMP
monitor: error_report_err uses error_printf_unless_qmp.

Since this code doesn't really pass Error *'s back up,
and always prints its errors to stderr, I'd prefer if you just
used error_report again for the hint, something like:

if (field->err_hint) {
  error_report("%" PRIx32 " != %" PRIx32 "(%s)",
   *v, v2, field->err_hint);
} else {
  error_report("%" PRIx32 " != %" PRIx32, *v, v2);
}

Dave
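
For illustration only, here is a standalone sketch of the single-line formatting suggested above. It uses snprintf() in place of error_report() so it can be compiled outside QEMU; format_equal_mismatch() is a hypothetical helper name, and the space before the parenthesis is a cosmetic assumption:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the "values differ" message into buf. With a hint, the hint is
 * appended in parentheses on the same line; without one, the message is
 * the plain "a != b" form. Returns the snprintf() result.
 */
static int format_equal_mismatch(char *buf, size_t len,
                                 uint32_t expected, uint32_t got,
                                 const char *err_hint)
{
    assert(buf && len > 0);
    if (err_hint) {
        return snprintf(buf, len, "%" PRIx32 " != %" PRIx32 " (%s)",
                        expected, got, err_hint);
    }
    return snprintf(buf, len, "%" PRIx32 " != %" PRIx32, expected, got);
}
```

Keeping the hint on the same line as the mismatch means it goes through the same output path as the error itself, which is the point of the suggestion.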

>  return -EINVAL;
>  }
>  
> @@ -259,6 +265,7 @@ const VMStateInfo vmstate_info_uint32 = {
>  static int get_uint32_equal(QEMUFile *f, void *pv, size_t size,
>  VMStateField *field)
>  {
> +Error *err = NULL;
>  uint32_t *v = pv;
>  uint32_t v2;
>  qemu_get_be32s(f, &v2);
> @@ -266,7 +273,11 @@ static int get_uint32_equal(QEMUFile *f, void *pv, size_t size,
>  if (*v == v2) {
>  return 0;
>  }
> -error_report("%" PRIx32 " != %" PRIx32, *v, v2);
> +error_setg(&err, "%" PRIx32 " != %" PRIx32, *v, v2);
> +if (field->err_hint) {
> +error_append_hint(&err, "%s\n", field->err_hint);
> +}
> +error_report_err(err);
>  return -EINVAL;
>  }
>  
> @@ -333,6 +344,7 @@ const VMStateInfo vmstate_info_nullptr = {
>  static int get_uint64_equal(QEMUFile *f, void *pv, size_t size,
>  VMStateField *field)
>  {
> +Error *err = NULL;
>  uint64_t *v = pv;
>  uint64_t v2;
>  qemu_get_be64s(f, &v2);
> @@ -340,7 +352,11 @@ static int get_uint64_equal(QEMUFile *f, void *pv, size_t size,
>  if (*v == v2) {
>  return 0;
>  }
> -error_report("%" PRIx64 " != %" PRIx64, *v, v2);
> +error_setg(&err, "%" PRIx64 " != %" PRIx64, *v, v2);
> +if (field->err_hint) {
> +error_append_hint(&err, "%s\n", field->err_hint);
> +}
> +error_report_err(err);
>  return -EINVAL;
>  }
>  
> @@ -356,6 +372,7 @@ const VMStateInfo vmstate_info_uint64_equal = {
>  static int get_uint8_equal(QEMUFile *f, void *pv, size_t size,
> VMStateField *field)
>  {
> +Error *err = NULL;
>  uint8_t *v = pv;
>  uint8_t v2;
>  qemu_get_8s(f, &v2);
> @@ -363,7 +380,11 @@ static int get_uint8_equal(QEMUFile *f, void *pv, size_t size,
>  if (*v == v2) {
>  return 0;
>  }
> -error_report("%x != %x", *v, v2);
> +error_setg(&err, "%x != %x", *v, v2);
> +if (field->err_hint) {
> +error_append_hint(&err, "%s\n", field->err_hint);
> +}
> +error_report_err(err);
>  return -EINVAL;
>  }
>  
> @@ -379,6 +400,7 @@ const VMStateInfo vmstate_info_uint8_equal = {
>  static int get_uint16_equal(QEMUFile *f, void *pv, size_t size,
>  VMStateField *field)
>  {
> +Error *err = NULL;
>  uint16_t *v = pv;
>  uint16_t v2;
>  qemu_get_be16s(f, &v2);
> @@ -386,7 +40

[Qemu-devel] [PATCH v8 04/11] migration: split ufd_version_check onto receive/request features part

2017-06-07 Thread Alexey Perevalov

This modification is necessary for userfault fd features that must be
requested from userspace.
UFFD_FEATURE_THREAD_ID is one such "on demand" feature; it will
be introduced in the next patch.

QEMU has to use a separate userfault file descriptor because the
userfault context has internal state: after the first successful
UFFDIO_API ioctl it changes its state to UFFD_STATE_RUNNING, but the
kernel expects UFFD_STATE_WAIT_API when handling UFFDIO_API.
So only one UFFDIO_API ioctl is possible per ufd.
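
The constraint can be pictured with a toy model. This is not kernel code: the struct, the state names and MODEL_SUPPORTED_FEATURES are all made up for the sketch, and only the "one handshake per fd" rule is taken from the description above. It shows why one fd is spent on probing the supported features and a fresh fd is used to request them:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model of the per-fd userfault context; not kernel code. */
enum model_uffd_state { MODEL_WAIT_API, MODEL_RUNNING };

struct model_uffd_ctx {
    enum model_uffd_state state;
    uint64_t enabled;       /* features enabled by the handshake */
};

/* Features the modelled "kernel" supports (arbitrary bits for the sketch). */
#define MODEL_SUPPORTED_FEATURES 0x3ULL

/*
 * Model of the API handshake. *features is in/out, like
 * uffdio_api.features: requested features in, supported features out.
 * Allowed only in WAIT_API state; on success the context moves to
 * RUNNING, so a second call on the same context fails.
 */
static int model_uffdio_api(struct model_uffd_ctx *ctx, uint64_t *features)
{
    assert(ctx && features);
    if (ctx->state != MODEL_WAIT_API) {
        return -EINVAL;
    }
    if (*features & ~MODEL_SUPPORTED_FEATURES) {
        return -EINVAL;
    }
    ctx->enabled = *features;
    *features = MODEL_SUPPORTED_FEATURES;
    ctx->state = MODEL_RUNNING;
    return 0;
}
```

In the patch below, receive_ufd_features() plays the role of the probe on a throwaway fd, and request_ufd_features() performs the single handshake on the fd that is actually used.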

Signed-off-by: Alexey Perevalov 
---
 migration/postcopy-ram.c | 94 
 1 file changed, 88 insertions(+), 6 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 8838901..cbe8f9f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -63,16 +63,67 @@ struct PostcopyDiscardState {
 #include 
 #include 
 
-static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
+
+/**
+ * receive_ufd_features: check userfault fd features, to request only supported
+ * features in the future.
+ *
+ * Returns: true on success
+ *
+ * __NR_userfaultfd - should be checked before
+ *  @features: out parameter will contain uffdio_api.features provided by kernel
+ *  in case of success
+ */
+static bool receive_ufd_features(uint64_t *features)
 {
-struct uffdio_api api_struct;
-uint64_t ioctl_mask;
+struct uffdio_api api_struct = {0};
+int ufd;
+bool ret = true;
+
+/* if we are here __NR_userfaultfd should exist */
+ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+if (ufd == -1) {
+error_report("%s: syscall __NR_userfaultfd failed: %s", __func__,
+ strerror(errno));
+return false;
+}
 
+/* ask features */
 api_struct.api = UFFD_API;
 api_struct.features = 0;
 if (ioctl(ufd, UFFDIO_API, &api_struct)) {
 error_report("%s: UFFDIO_API failed: %s", __func__,
  strerror(errno));
+ret = false;
+goto release_ufd;
+}
+
+*features = api_struct.features;
+
+release_ufd:
+close(ufd);
+return ret;
+}
+
+/**
+ * request_ufd_features: this function should be called only once on a newly
+ * opened ufd, subsequent calls will lead to error.
+ *
+ * Returns: true on success
+ *
+ * @ufd: fd obtained from userfaultfd syscall
+ * @features: bit mask see UFFD_API_FEATURES
+ */
+static bool request_ufd_features(int ufd, uint64_t features)
+{
+struct uffdio_api api_struct = {0};
+uint64_t ioctl_mask;
+
+api_struct.api = UFFD_API;
+api_struct.features = features;
+if (ioctl(ufd, UFFDIO_API, &api_struct)) {
+error_report("%s failed: UFFDIO_API failed: %s", __func__,
+ strerror(errno));
 return false;
 }
 
@@ -84,11 +135,42 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
 return false;
 }
 
+return true;
+}
+
+static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
+{
+uint64_t asked_features = 0;
+static uint64_t supported_features;
+
+/*
+ * it's not possible to
+ * request UFFD_API twice on one fd;
+ * userfault fd features are persistent
+ */
+if (!supported_features) {
+if (!receive_ufd_features(&supported_features)) {
+error_report("%s failed", __func__);
+return false;
+}
+}
+
+/*
+ * request features, even if asked_features is 0, because the
+ * kernel expects UFFD_API before UFFDIO_REGISTER, per
+ * userfault file descriptor
+ */
+if (!request_ufd_features(ufd, asked_features)) {
+error_report("%s failed: features %" PRIu64, __func__,
+ asked_features);
+return false;
+}
+
 if (getpagesize() != ram_pagesize_summary()) {
 bool have_hp = false;
 /* We've got a huge page */
 #ifdef UFFD_FEATURE_MISSING_HUGETLBFS
-have_hp = api_struct.features & UFFD_FEATURE_MISSING_HUGETLBFS;
+have_hp = supported_features & UFFD_FEATURE_MISSING_HUGETLBFS;
 #endif
 if (!have_hp) {
 error_report("Userfault on this host does not support huge pages");
@@ -149,7 +231,7 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 }
 
 /* Version and features check */
-if (!ufd_version_check(ufd, mis)) {
+if (!ufd_check_and_apply(ufd, mis)) {
 goto out;
 }
 
@@ -525,7 +607,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
  * Although the host check already tested the API, we need to
  * do the check again as an ABI handshake on the new fd.
  */
-if (!ufd_version_check(mis->userfault_fd, mis)) {
+if (!ufd_check_and_apply(mis->userfault_fd, mis)) {
 return -1;
 }
 
-- 
1.9.1

[Qemu-devel] [PATCH v8 11/11] migration: postcopy_blocktime documentation

2017-06-07 Thread Alexey Perevalov

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Alexey Perevalov 
---
 docs/migration.txt | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index 1b940a8..4b625ca 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -402,6 +402,16 @@ will now cause the transition from precopy to postcopy.
 It can be issued immediately after migration is started or any
 time later on.  Issuing it after the end of a migration is harmless.
 
+Blocktime is a postcopy live migration metric, intended to show
+how long a vCPU was in a state of interruptible sleep due to a page fault.
+This value is calculated on the destination side.
+To enable postcopy blocktime calculation, enter the following command on the
+destination monitor:
+
+migrate_set_capability postcopy-blocktime on
+
+Postcopy blocktime can be retrieved with the query-migrate QMP command.
+
 Note: During the postcopy phase, the bandwidth limits set using
 migrate_set_speed is ignored (to avoid delaying requested pages that
 the destination is waiting for).
-- 
1.9.1

[Qemu-devel] [PATCH v8 02/11] migration: pass MigrationIncomingState* into migration check functions

2017-06-07 Thread Alexey Perevalov

This tiny refactoring is necessary to be able to set
UFFD_FEATURE_THREAD_ID while requesting features, and then
to create the downtime context when the kernel supports it.

Signed-off-by: Alexey Perevalov 
---
 migration/migration.c|  3 ++-
 migration/postcopy-ram.c | 10 +-
 migration/postcopy-ram.h |  2 +-
 migration/savevm.c   |  2 +-
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 48c94c9..2a77636 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -726,6 +726,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
   Error **errp)
 {
 MigrationState *s = migrate_get_current();
+MigrationIncomingState *mis = migration_incoming_get_current();
 MigrationCapabilityStatusList *cap;
 bool old_postcopy_cap = migrate_postcopy_ram();
 
@@ -772,7 +773,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
  * special support.
  */
 if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
-!postcopy_ram_supported_by_host()) {
+!postcopy_ram_supported_by_host(mis)) {
 /* postcopy_ram_supported_by_host will have emitted a more
  * detailed message
  */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9c41887..10d39a0 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -63,7 +63,7 @@ struct PostcopyDiscardState {
 #include 
 #include 
 
-static bool ufd_version_check(int ufd)
+static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
 {
 struct uffdio_api api_struct;
 uint64_t ioctl_mask;
@@ -126,7 +126,7 @@ static int test_ramblock_postcopiable(const char *block_name, void *host_addr,
  * normally fine since if the postcopy succeeds it gets turned back on at the
  * end.
  */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
 long pagesize = getpagesize();
 int ufd = -1;
@@ -149,7 +149,7 @@ bool postcopy_ram_supported_by_host(void)
 }
 
 /* Version and features check */
-if (!ufd_version_check(ufd)) {
+if (!ufd_version_check(ufd, mis)) {
 goto out;
 }
 
@@ -525,7 +525,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
  * Although the host check already tested the API, we need to
  * do the check again as an ABI handshake on the new fd.
  */
-if (!ufd_version_check(mis->userfault_fd)) {
+if (!ufd_version_check(mis->userfault_fd, mis)) {
 return -1;
 }
 
@@ -663,7 +663,7 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
 
 #else
 /* No target OS support, stubs just fail */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
 error_report("%s: No OS support", __func__);
 return false;
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 52d51e8..587a8b8 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -14,7 +14,7 @@
 #define QEMU_POSTCOPY_RAM_H
 
 /* Return true if the host supports everything we need to do postcopy-ram */
-bool postcopy_ram_supported_by_host(void);
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
 
 /*
  * Make all of RAM sensitive to accesses to areas that haven't yet been written
diff --git a/migration/savevm.c b/migration/savevm.c
index 9c320f5..8b7bab8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1380,7 +1380,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
 return -1;
 }
 
-if (!postcopy_ram_supported_by_host()) {
+if (!postcopy_ram_supported_by_host(mis)) {
 postcopy_state_set(POSTCOPY_INCOMING_NONE);
 return -1;
 }
-- 
1.9.1

[Qemu-devel] [PATCH V6 04/10] net/net.c: Add vnet_hdr support in SocketReadState

2017-06-07 Thread Zhang Chen

Add a flag that decides whether net_fill_rstate() reads
the vnet_hdr_len field or not.
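
A self-contained sketch of the resulting three-state reader, assuming the stream layout [4-byte length][optional 4-byte vnet_hdr_len][data] in network byte order. SketchReadState, sketch_fill() and the buffer size are hypothetical names chosen for this example; the real code is in the diff below and calls a finalize callback instead of counting packets:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BUFSIZE 256

/* 0 = getting length, 1 = getting vnet header length, 2 = getting data */
enum { ST_LEN = 0, ST_VNET_HDR_LEN = 1, ST_DATA = 2 };

typedef struct SketchReadState {
    int state;
    bool vnet_hdr;          /* does the stream carry a vnet_hdr_len field? */
    uint32_t index;
    uint32_t packet_len;
    uint32_t vnet_hdr_len;
    uint8_t buf[SKETCH_BUFSIZE];
    int packets;            /* completed packets, for the sketch only */
} SketchReadState;

/* Decode a big-endian 32-bit value without alignment assumptions. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Feed bytes; returns 0 on success, -1 on an empty or oversized packet. */
static int sketch_fill(SketchReadState *rs, const uint8_t *buf, size_t size)
{
    assert(rs && buf);
    while (size > 0) {
        uint32_t want = (rs->state == ST_DATA) ? rs->packet_len : 4;
        size_t l = want - rs->index;

        if (l > size) {
            l = size;
        }
        memcpy(rs->buf + rs->index, buf, l);
        buf += l;
        size -= l;
        rs->index += (uint32_t)l;
        if (rs->index < want) {
            break;              /* input exhausted, field still incomplete */
        }
        rs->index = 0;
        switch (rs->state) {
        case ST_LEN:
            rs->packet_len = be32(rs->buf);
            if (rs->packet_len == 0 || rs->packet_len > SKETCH_BUFSIZE) {
                return -1;
            }
            rs->vnet_hdr_len = 0;
            rs->state = rs->vnet_hdr ? ST_VNET_HDR_LEN : ST_DATA;
            break;
        case ST_VNET_HDR_LEN:
            rs->vnet_hdr_len = be32(rs->buf);
            rs->state = ST_DATA;
            break;
        case ST_DATA:
            rs->packets++;      /* rs->buf now holds one whole packet */
            rs->state = ST_LEN;
            break;
        }
    }
    return 0;
}
```

With vnet_hdr left false the reader skips state 1 entirely, which keeps the old two-field stream format working unchanged.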

Signed-off-by: Zhang Chen 
Suggested-by: Jason Wang 
---
 include/net/net.h   |  6 +-
 net/filter-mirror.c |  1 +
 net/net.c   | 33 ++---
 3 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 9a92c70..b2167ae 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -112,9 +112,13 @@ typedef struct NICState {
 } NICState;
 
 struct SocketReadState {
-int state; /* 0 = getting length, 1 = getting data */
+/* 0 = getting length, 1 = getting vnet header length, 2 = getting data */
+int state;
+/* This flag decides whether to read the vnet_hdr_len field */
+bool vnet_hdr;
 uint32_t index;
 uint32_t packet_len;
+uint32_t vnet_hdr_len;
 uint8_t buf[NET_BUFSIZE];
 SocketReadStateFinalize *finalize;
 };
diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index 3413e82..4b03dda 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -267,6 +267,7 @@ static void filter_redirector_setup(NetFilterState *nf, Error **errp)
 }
 
 net_socket_rs_init(&s->rs, redirector_rs_finalize);
+s->rs.vnet_hdr = s->vnet_hdr;
 
 if (s->indev) {
 chr = qemu_chr_find(s->indev);
diff --git a/net/net.c b/net/net.c
index 4e7a305..b9b90c9 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1606,8 +1606,10 @@ void net_socket_rs_init(SocketReadState *rs,
 SocketReadStateFinalize *finalize)
 {
 rs->state = 0;
+rs->vnet_hdr = false;
 rs->index = 0;
 rs->packet_len = 0;
+rs->vnet_hdr_len = 0;
 memset(rs->buf, 0, sizeof(rs->buf));
 rs->finalize = finalize;
 }
@@ -1622,8 +1624,12 @@ int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size)
 unsigned int l;
 
 while (size > 0) {
-/* reassemble a packet from the network */
-switch (rs->state) { /* 0 = getting length, 1 = getting data */
+/* Reassemble a packet from the network.
+ * 0 = getting length.
+ * 1 = getting vnet header length.
+ * 2 = getting data.
+ */
+switch (rs->state) {
 case 0:
 l = 4 - rs->index;
 if (l > size) {
@@ -1637,10 +1643,31 @@ int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size)
 /* got length */
 rs->packet_len = ntohl(*(uint32_t *)rs->buf);
 rs->index = 0;
-rs->state = 1;
+if (rs->vnet_hdr) {
+rs->state = 1;
+} else {
+rs->state = 2;
+rs->vnet_hdr_len = 0;
+}
 }
 break;
 case 1:
+l = 4 - rs->index;
+if (l > size) {
+l = size;
+}
+memcpy(rs->buf + rs->index, buf, l);
+buf += l;
+size -= l;
+rs->index += l;
+if (rs->index == 4) {
+/* got vnet header length */
+rs->vnet_hdr_len = ntohl(*(uint32_t *)rs->buf);
+rs->index = 0;
+rs->state = 2;
+}
+break;
+case 2:
 l = rs->packet_len - rs->index;
 if (l > size) {
 l = size;
-- 
2.7.4

[Qemu-devel] [PATCH V6 01/10] net: Add vnet_hdr_len arguments in NetClientState

2017-06-07 Thread Zhang Chen

Add a vnet_hdr_len field to NetClientState so that
other modules can easily get the real vnet_hdr_len.

Signed-off-by: Zhang Chen 
---
 include/net/net.h | 1 +
 net/net.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 99b28d5..9a92c70 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -100,6 +100,7 @@ struct NetClientState {
 unsigned int queue_index;
 unsigned rxfilter_notify_enabled:1;
 int vring_enable;
+int vnet_hdr_len;
 QTAILQ_HEAD(NetFilterHead, NetFilterState) filters;
 };
 
diff --git a/net/net.c b/net/net.c
index 0ac3b9e..4e7a305 100644
--- a/net/net.c
+++ b/net/net.c
@@ -491,6 +491,7 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
 return;
 }
 
+nc->vnet_hdr_len = len;
 nc->info->set_vnet_hdr_len(nc, len);
 }
 
-- 
2.7.4

[Qemu-devel] [PATCH V6 05/10] net/colo.c: Make vnet_hdr_len as packet property

2017-06-07 Thread Zhang Chen

We can use this property to flush and send packets with vnet_hdr_len.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c| 8 ++--
 net/colo.c| 3 ++-
 net/colo.h| 4 +++-
 net/filter-rewriter.c | 2 +-
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 4ab80b1..bf0b856 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -121,9 +121,13 @@ static int packet_enqueue(CompareState *s, int mode)
 Connection *conn;
 
 if (mode == PRIMARY_IN) {
-pkt = packet_new(s->pri_rs.buf, s->pri_rs.packet_len);
+pkt = packet_new(s->pri_rs.buf,
+ s->pri_rs.packet_len,
+ s->pri_rs.vnet_hdr_len);
 } else {
-pkt = packet_new(s->sec_rs.buf, s->sec_rs.packet_len);
+pkt = packet_new(s->sec_rs.buf,
+ s->sec_rs.packet_len,
+ s->sec_rs.vnet_hdr_len);
 }
 
 if (parse_packet_early(pkt)) {
diff --git a/net/colo.c b/net/colo.c
index 8cc166b..180eaed 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -153,13 +153,14 @@ void connection_destroy(void *opaque)
 g_slice_free(Connection, conn);
 }
 
-Packet *packet_new(const void *data, int size)
+Packet *packet_new(const void *data, int size, int vnet_hdr_len)
 {
 Packet *pkt = g_slice_new(Packet);
 
 pkt->data = g_memdup(data, size);
 pkt->size = size;
 pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+pkt->vnet_hdr_len = vnet_hdr_len;
 
 return pkt;
 }
diff --git a/net/colo.h b/net/colo.h
index 7c524f3..caedb0d 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -43,6 +43,8 @@ typedef struct Packet {
 int size;
 /* Time of packet creation, in wall clock ms */
 int64_t creation_ms;
+/* Get vnet_hdr_len from filter */
+uint32_t vnet_hdr_len;
 } Packet;
 
 typedef struct ConnectionKey {
@@ -82,7 +84,7 @@ Connection *connection_get(GHashTable *connection_track_table,
ConnectionKey *key,
GQueue *conn_list);
 void connection_hashtable_reset(GHashTable *connection_track_table);
-Packet *packet_new(const void *data, int size);
+Packet *packet_new(const void *data, int size, int vnet_hdr_len);
 void packet_destroy(void *opaque, void *user_data);
 
 #endif /* QEMU_COLO_PROXY_H */
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index afa06e8..63256c7 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -158,7 +158,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
 char *buf = g_malloc0(size);
 
 iov_to_buf(iov, iovcnt, 0, buf, size);
-pkt = packet_new(buf, size);
+pkt = packet_new(buf, size, 0);
 g_free(buf);
 
 /*
-- 
2.7.4

[Qemu-devel] [PATCH V6 08/10] net/colo-compare.c: Add vnet packet's tcp/udp/icmp compare

2017-06-07 Thread Zhang Chen

COLO-proxy only compares the packet payload, so we skip the vnet header.
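
As a rough illustration, assuming both packets carry a vnet header of the same length at the start of their data, the comparison can be sketched as follows. SketchPacket and sketch_compare() are hypothetical names, and the bounds check is an addition for the sketch rather than something the patch does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down version of the Packet struct in net/colo.h. */
typedef struct SketchPacket {
    const uint8_t *data;
    size_t size;
    uint32_t vnet_hdr_len;
} SketchPacket;

/*
 * Compare two packets from 'offset' bytes past the vnet header onwards.
 * Returns 0 when the compared bytes match, nonzero otherwise.
 */
static int sketch_compare(const SketchPacket *p, const SketchPacket *s,
                          size_t offset)
{
    assert(p && s);
    if (p->size != s->size) {
        return -1;
    }
    offset += p->vnet_hdr_len;      /* skip the virtio-net header */
    if (offset > p->size) {
        return -1;
    }
    return memcmp(p->data + offset, s->data + offset, p->size - offset);
}
```

Two packets whose vnet headers differ but whose payloads are identical then compare as equal, which is the behaviour the commit message asks for.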

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index e33cf7e..ad1c3d5 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -201,8 +201,11 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
sec_ip_src, sec_ip_dst);
 }
 
+offset = ppkt->vnet_hdr_len + offset;
+
 if (ppkt->size == spkt->size) {
-return memcmp(ppkt->data + offset, spkt->data + offset,
+return memcmp(ppkt->data + offset,
+  spkt->data + offset,
   spkt->size - offset);
 } else {
 trace_colo_compare_main("Net packet size are not the same");
@@ -261,8 +264,9 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
  */
 if (ptcp->th_off > 5) {
 ptrdiff_t tcp_offset;
+
 tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
- + (ptcp->th_off * 4);
+ + (ptcp->th_off * 4) - ppkt->vnet_hdr_len;
 res = colo_packet_compare_common(ppkt, spkt, tcp_offset);
 } else if (ptcp->th_sum == stcp->th_sum) {
 res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);
-- 
2.7.4

[Qemu-devel] [PATCH V6 02/10] net/filter-mirror.c: Make filter mirror support vnet support.

2017-06-07 Thread Zhang Chen

Add the vnet_hdr_support option to filter-mirror; it is disabled by default.
If you use the virtio-net-pci net driver, please enable it.
For example:
-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0,vnet_hdr_support

If the vnet_hdr_support flag is set, the sent packet format changes from
struct {int size; const uint8_t buf[];} to
struct {int size; int vnet_hdr_len; const uint8_t buf[];}
so that other modules (like colo-compare) know how to parse the net packet
correctly.
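
The two wire formats can be sketched with a small serializer. This is an illustration under assumptions: sketch_build_frame() is a hypothetical helper writing into a flat buffer, whereas the patch writes each field to the chardev with qemu_chr_fe_write_all(). Both length fields are sent in network byte order, as in the diff below:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Serialize one packet into out:
 *   vnet_hdr == false: [size][payload]
 *   vnet_hdr == true:  [size][vnet_hdr_len][payload]
 * Returns the number of bytes written, or 0 if out is too small.
 */
static size_t sketch_build_frame(uint8_t *out, size_t out_len,
                                 const uint8_t *payload, uint32_t size,
                                 bool vnet_hdr, uint32_t vnet_hdr_len)
{
    size_t need = sizeof(uint32_t) + (vnet_hdr ? sizeof(uint32_t) : 0) + size;
    size_t pos = 0;
    uint32_t be;

    assert(out && payload);
    if (need > out_len) {
        return 0;
    }
    be = htonl(size);
    memcpy(out + pos, &be, sizeof(be));
    pos += sizeof(be);
    if (vnet_hdr) {
        be = htonl(vnet_hdr_len);
        memcpy(out + pos, &be, sizeof(be));
        pos += sizeof(be);
    }
    memcpy(out + pos, payload, size);
    return pos + size;
}
```

Because the extra field changes the stream layout, the sender and every reader of that stream have to agree on whether vnet_hdr_support is set.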

Signed-off-by: Zhang Chen 
---
 net/filter-mirror.c | 69 +
 qemu-options.hx |  5 ++--
 2 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index 72fa7c2..50aa81b 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -38,15 +38,17 @@ typedef struct MirrorState {
 NetFilterState parent_obj;
 char *indev;
 char *outdev;
+bool vnet_hdr;
 CharBackend chr_in;
 CharBackend chr_out;
 SocketReadState rs;
 } MirrorState;
 
-static int filter_mirror_send(CharBackend *chr_out,
+static int filter_mirror_send(MirrorState *s,
   const struct iovec *iov,
   int iovcnt)
 {
+NetFilterState *nf = NETFILTER(s);
 int ret = 0;
 ssize_t size = 0;
 uint32_t len = 0;
@@ -58,14 +60,43 @@ static int filter_mirror_send(CharBackend *chr_out,
 }
 
 len = htonl(size);
-ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)&len, sizeof(len));
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
 if (ret != sizeof(len)) {
 goto err;
 }
 
+if (s->vnet_hdr) {
+/*
+ * If vnet_hdr = on, we send vnet header len to make other
+ * module(like colo-compare) know how to parse net
+ * packet correctly.
+ */
+ssize_t vnet_hdr_len;
+
+/*
+ * nf->netdev and nf->netdev->peer each always have a vnet_hdr_len.
+ * Here we just find out which one we need: the real vnet_hdr_len
+ * differs depending on whether the filter is set to RX or TX.
+ */
+if (nf->direction == NET_FILTER_DIRECTION_RX ||
+nf->direction == NET_FILTER_DIRECTION_ALL) {
+vnet_hdr_len = nf->netdev->vnet_hdr_len;
+} else if (nf->direction == NET_FILTER_DIRECTION_TX) {
+vnet_hdr_len = nf->netdev->peer->vnet_hdr_len;
+} else {
+return 0;
+}
+
+len = htonl(vnet_hdr_len);
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
+if (ret != sizeof(len)) {
+goto err;
+}
+}
+
 buf = g_malloc(size);
 iov_to_buf(iov, iovcnt, 0, buf, size);
-ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)buf, size);
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
 g_free(buf);
 if (ret != size) {
 goto err;
@@ -141,7 +172,7 @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf,
 MirrorState *s = FILTER_MIRROR(nf);
 int ret;
 
-ret = filter_mirror_send(&s->chr_out, iov, iovcnt);
+ret = filter_mirror_send(s, iov, iovcnt);
 if (ret) {
 error_report("filter_mirror_send failed(%s)", strerror(-ret));
 }
@@ -164,7 +195,7 @@ static ssize_t filter_redirector_receive_iov(NetFilterState *nf,
 int ret;
 
 if (qemu_chr_fe_get_driver(&s->chr_out)) {
-ret = filter_mirror_send(&s->chr_out, iov, iovcnt);
+ret = filter_mirror_send(s, iov, iovcnt);
 if (ret) {
 error_report("filter_mirror_send failed(%s)", strerror(-ret));
 }
@@ -308,6 +339,13 @@ static char *filter_mirror_get_outdev(Object *obj, Error **errp)
 return g_strdup(s->outdev);
 }
 
+static bool filter_mirror_get_vnet_hdr(Object *obj, Error **errp)
+{
+MirrorState *s = FILTER_MIRROR(obj);
+
+return s->vnet_hdr;
+}
+
 static void
 filter_mirror_set_outdev(Object *obj, const char *value, Error **errp)
 {
@@ -322,6 +360,15 @@ filter_mirror_set_outdev(Object *obj, const char *value, Error **errp)
 }
 }
 
+static void filter_mirror_set_vnet_hdr(Object *obj,
+   bool value,
+   Error **errp)
+{
+MirrorState *s = FILTER_MIRROR(obj);
+
+s->vnet_hdr = value;
+}
+
 static char *filter_redirector_get_outdev(Object *obj, Error **errp)
 {
 MirrorState *s = FILTER_REDIRECTOR(obj);
@@ -340,8 +387,20 @@ filter_redirector_set_outdev(Object *obj, const char *value, Error **errp)
 
 static void filter_mirror_init(Object *obj)
 {
+MirrorState *s = FILTER_MIRROR(obj);
+
 object_property_add_str(obj, "outdev", filter_mirror_get_outdev,
 filter_mirror_set_outdev, NULL);
+
+/*
+ * The vnet_hdr is disabled by default, if you want to enable
+ * this option, you must enable all the option on related modules
+ * (like other filter or colo-compare).
+

[Qemu-devel] [PATCH V6 00/10] Add COLO-proxy virtio-net support

2017-06-07 Thread Zhang Chen

If the user uses -device virtio-net-pci, the virtio-net driver adds a
header to the raw net packet that colo-proxy can't handle. COLO-proxy
only looks at the packet payload, so we skip the virtio-net header when
comparing the packets sent by the primary guest and the secondary guest.

V6:
 - p1: Remove using_vnet_hdr; I will send an independent
   patchset about it.
 - p2: Change option input way from vnet_hdr=on/off to vnet_hdr_support.
   Use nf->direction to decide whether to check nf->netdev->vnet_hdr_len
   or nf->netdev->peer->vnet_hdr_len.
 - p3: Change option input way from vnet_hdr=on/off to vnet_hdr_support.
 - p4: No change.
 - p5: No change.
 - p6: Change option input way from vnet_hdr=on/off to vnet_hdr_support.
   Fix commit log.
 - p7: No change.
 - p8: No change.
 - p9: Change option input way from vnet_hdr=on/off to vnet_hdr_support.
   Use nf->direction to decide whether to check nf->netdev->vnet_hdr_len
   or nf->netdev->peer->vnet_hdr_len.
 - p10: New patch to add new example for the case needs vnet hdr.

V5:
 - patch1: No change.
 - patch2: Keep the long line in qemu-option.hx.
   Squash patch2 into old patch3.
   Add more comments.
   Fix the return value.
 - patch3: Add more comments in commit log.
 - patch4: Add Suggested-by tag.
   Fix commit log.
   Move vnet_hdr to SocketReadState.
 - patch5: No change.
 - patch6: Squash old patch6 into this patch.
 - patch7: No change.
 - patch8: Remove the offset_all.
 - patch9: Squash old patch11 and the patch12.


V4:
 - Add vnet_hdr option for filter-mirror, filter-redirector,
   filter-rewriter,colo-compare.
 - Use new design to implement virtio-net support for colo-proxy.
 - Fix codestyle.
 - Remove unused option for filter-rewriter.
 - Add filter-rewriter virtio-net support.
 - Address other comments.


Zhang Chen (10):
  net: Add vnet_hdr_len arguments in NetClientState
  net/filter-mirror.c: Make filter mirror support vnet support.
  net/filter-mirror.c: Add new option to enable vnet support for
filter-redirector
  net/net.c: Add vnet_hdr support in SocketReadState
  net/colo.c: Make vnet_hdr_len as packet property
  net/colo-compare.c: Make colo-compare support vnet_hdr_len
  net/colo.c: Add vnet packet parse feature in colo-proxy
  net/colo-compare.c: Add vnet packet's tcp/udp/icmp compare
  net/filter-rewriter.c: Make filter-rewriter support vnet_hdr_len
  docs/colo-proxy.txt: Update colo-proxy usage of net driver with
vnet_header

 docs/colo-proxy.txt   | 26 ++
 include/net/net.h |  7 +++-
 net/colo-compare.c| 86 +---
 net/colo.c|  9 ++---
 net/colo.h|  4 ++-
 net/filter-mirror.c   | 98 ---
 net/filter-rewriter.c | 51 ++-
 net/net.c | 34 --
 qemu-options.hx   | 19 +-
 9 files changed, 296 insertions(+), 38 deletions(-)

-- 
2.7.4

[Qemu-devel] [PATCH V6 10/10] docs/colo-proxy.txt: Update colo-proxy usage of net driver with vnet_header

2017-06-07 Thread Zhang Chen

Signed-off-by: Zhang Chen 
---
 docs/colo-proxy.txt | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/docs/colo-proxy.txt b/docs/colo-proxy.txt
index c4941de..f6a624f 100644
--- a/docs/colo-proxy.txt
+++ b/docs/colo-proxy.txt
@@ -182,6 +182,32 @@ Secondary(ip:3.3.3.8):
 -chardev socket,id=red1,host=3.3.3.3,port=9004
 -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
 -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
+-object filter-rewriter,id=f3,netdev=hn0,queue=all
+
+If you want to use virtio-net-pci or other driver with vnet_header:
+
+Primary(ip:3.3.3.3):
+-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
+-device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
+-chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
+-chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
+-chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
+-chardev socket,id=compare0-0,host=3.3.3.3,port=9001
+-chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait
+-chardev socket,id=compare_out0,host=3.3.3.3,port=9005
+-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0,vnet_hdr_support
+-object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out,vnet_hdr_support
+-object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0,vnet_hdr_support
+-object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,vnet_hdr_support
+
+Secondary(ip:3.3.3.8):
+-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
+-device e1000,netdev=hn0,mac=52:a4:00:12:78:66
+-chardev socket,id=red0,host=3.3.3.3,port=9003
+-chardev socket,id=red1,host=3.3.3.3,port=9004
+-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0,vnet_hdr_support
+-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1,vnet_hdr_support
+-object filter-rewriter,id=f3,netdev=hn0,queue=all,vnet_hdr_support
 
 Note:
   a.COLO-proxy must work with COLO-frame and Block-replication.
-- 
2.7.4

[Qemu-devel] [PATCH V6 09/10] net/filter-rewriter.c: Make filter-rewriter support vnet_hdr_len

2017-06-07 Thread Zhang Chen

Add the vnet_hdr_support option to filter-rewriter; it is disabled by default.
If you use the virtio-net-pci net driver, please enable it.
For example:
-object filter-rewriter,id=rew0,netdev=hn0,queue=all,vnet_hdr_support

We get the vnet_hdr_len from NetClientState so that we can
parse net packets correctly.

Signed-off-by: Zhang Chen 
---
 net/filter-rewriter.c | 51 ++-
 qemu-options.hx   |  4 ++--
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index 63256c7..8eaf0e8 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -17,6 +17,7 @@
 #include "qemu-common.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
 #include "qapi-visit.h"
 #include "qom/object.h"
 #include "qemu/main-loop.h"
@@ -33,6 +34,7 @@ typedef struct RewriterState {
 NetQueue *incoming_queue;
 /* hashtable to save connection */
 GHashTable *connection_track_table;
+bool vnet_hdr;
 } RewriterState;
 
 static void filter_rewriter_flush(NetFilterState *nf)
@@ -155,10 +157,25 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
 ConnectionKey key;
 Packet *pkt;
 ssize_t size = iov_size(iov, iovcnt);
+ssize_t vnet_hdr_len = 0;
 char *buf = g_malloc0(size);
 
 iov_to_buf(iov, iovcnt, 0, buf, size);
-pkt = packet_new(buf, size, 0);
+
+if (s->vnet_hdr) {
+if (nf->direction == NET_FILTER_DIRECTION_RX ||
+nf->direction == NET_FILTER_DIRECTION_ALL) {
+vnet_hdr_len = nf->netdev->vnet_hdr_len;
+} else if (nf->direction == NET_FILTER_DIRECTION_TX) {
+vnet_hdr_len = nf->netdev->peer->vnet_hdr_len;
+} else {
+error_report("filter-rewriter get vnet_hdr_len failed");
+/* When error occurred we drop the packet  */
+return 1;
+}
+}
+
+pkt = packet_new(buf, size, vnet_hdr_len);
 g_free(buf);
 
 /*
@@ -237,6 +254,37 @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
 s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
 }
 
+static bool filter_rewriter_get_vnet_hdr(Object *obj, Error **errp)
+{
+RewriterState *s = FILTER_COLO_REWRITER(obj);
+
+return s->vnet_hdr;
+}
+
+static void filter_rewriter_set_vnet_hdr(Object *obj,
+ bool value,
+ Error **errp)
+{
+RewriterState *s = FILTER_COLO_REWRITER(obj);
+
+s->vnet_hdr = value;
+}
+
+static void filter_rewriter_init(Object *obj)
+{
+RewriterState *s = FILTER_COLO_REWRITER(obj);
+
+/*
+ * The vnet_hdr is disabled by default, if you want to enable
+ * this option, you must enable all the option on related modules
+ * (like other filter or colo-compare).
+ */
+s->vnet_hdr = false;
+object_property_add_bool(obj, "vnet_hdr_support",
+ filter_rewriter_get_vnet_hdr,
+ filter_rewriter_set_vnet_hdr, NULL);
+}
+
 static void colo_rewriter_class_init(ObjectClass *oc, void *data)
 {
 NetFilterClass *nfc = NETFILTER_CLASS(oc);
@@ -250,6 +298,7 @@ static const TypeInfo colo_rewriter_info = {
 .name = TYPE_FILTER_REWRITER,
 .parent = TYPE_NETFILTER,
 .class_init = colo_rewriter_class_init,
+.instance_init = filter_rewriter_init,
 .instance_size = sizeof(RewriterState),
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index fbfd604..8655842 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4037,12 +4037,12 @@ Create a filter-redirector we need to differ outdev id from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or outdev
 need to be specified.
 
-@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},rewriter-mode=@var{mode}[,queue=@var{all|rx|tx}]
+@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},rewriter-mode=@var{mode},queue=@var{all|rx|tx},[vnet_hdr_support]
 
 Filter-rewriter is a part of COLO project.It will rewrite tcp packet to
 secondary from primary to keep secondary tcp connection,and rewrite
 tcp packet to primary from secondary make tcp packet can be handled by
-client.
+client. If the vnet_hdr_support flag is set, it can parse packets with a vnet header.
 
 usage:
 colo secondary:
-- 
2.7.4
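
The direction check in colo_rewriter_receive_iov() above can be read as a small pure function. The following is only a sketch under stated assumptions: the enum, the struct, and pick_vnet_hdr_len() are hypothetical stand-ins (not QEMU's NetFilterDirection or NetClientState), and the NULL-peer check is an extra guard that the patch itself does not have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's filter direction and net client types. */
enum direction { DIR_ALL, DIR_RX, DIR_TX };

struct client {
    int vnet_hdr_len;
    struct client *peer;
};

/*
 * Return the vnet header length the filter should skip, or -1 if it
 * cannot be determined. With the option disabled the length is 0.
 */
static int pick_vnet_hdr_len(int vnet_hdr_enabled, enum direction dir,
                             const struct client *netdev)
{
    if (!vnet_hdr_enabled) {
        return 0;
    }
    switch (dir) {
    case DIR_RX:
    case DIR_ALL:
        /* Use the length recorded on the filter's own netdev. */
        return netdev->vnet_hdr_len;
    case DIR_TX:
        /* Use the peer's length (assumed guard: the peer may be absent). */
        return netdev->peer ? netdev->peer->vnet_hdr_len : -1;
    }
    return -1;
}

/* Usage example: a netdev with a 10-byte header and a peer with 12 bytes. */
void pick_vnet_hdr_len_demo(void)
{
    struct client peer = { 12, NULL };
    struct client dev = { 10, &peer };

    assert(pick_vnet_hdr_len(1, DIR_RX, &dev) == 10);
    assert(pick_vnet_hdr_len(1, DIR_TX, &dev) == 12);
    assert(pick_vnet_hdr_len(0, DIR_TX, &dev) == 0);
}
```

Returning a sentinel instead of reporting the error inline keeps the sketch free of QEMU's error_report(); the caller would decide whether to drop the packet.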

[Qemu-devel] [PATCH V6 03/10] net/filter-mirror.c: Add new option to enable vnet support for filter-redirector

2017-06-07 Thread Zhang Chen

Add the vnet_hdr_support option to filter-redirector; it is disabled by default.
If you use the virtio-net-pci net driver, please enable it.
colo-compare and other modules need the vnet_hdr_len to parse
packets, so this new option sends the length to them.
For example:
-object filter-redirector,id=r0,netdev=hn0,queue=tx,outdev=red0,vnet_hdr_support

Signed-off-by: Zhang Chen 
---
 net/filter-mirror.c | 28 
 qemu-options.hx |  6 +++---
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index 50aa81b..3413e82 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -376,6 +376,13 @@ static char *filter_redirector_get_outdev(Object *obj, Error **errp)
 return g_strdup(s->outdev);
 }
 
+static bool filter_redirector_get_vnet_hdr(Object *obj, Error **errp)
+{
+MirrorState *s = FILTER_REDIRECTOR(obj);
+
+return s->vnet_hdr;
+}
+
 static void
 filter_redirector_set_outdev(Object *obj, const char *value, Error **errp)
 {
@@ -385,6 +392,15 @@ filter_redirector_set_outdev(Object *obj, const char *value, Error **errp)
 s->outdev = g_strdup(value);
 }
 
+static void filter_redirector_set_vnet_hdr(Object *obj,
+   bool value,
+   Error **errp)
+{
+MirrorState *s = FILTER_REDIRECTOR(obj);
+
+s->vnet_hdr = value;
+}
+
 static void filter_mirror_init(Object *obj)
 {
 MirrorState *s = FILTER_MIRROR(obj);
@@ -405,10 +421,22 @@ static void filter_mirror_init(Object *obj)
 
 static void filter_redirector_init(Object *obj)
 {
+MirrorState *s = FILTER_REDIRECTOR(obj);
+
 object_property_add_str(obj, "indev", filter_redirector_get_indev,
 filter_redirector_set_indev, NULL);
 object_property_add_str(obj, "outdev", filter_redirector_get_outdev,
 filter_redirector_set_outdev, NULL);
+
+/*
+ * The vnet_hdr is disabled by default, if you want to enable
+ * this option, you must enable all the option on related modules
+ * (like other filter or colo-compare).
+ */
+s->vnet_hdr = false;
+object_property_add_bool(obj, "vnet_hdr_support",
+ filter_redirector_get_vnet_hdr,
+ filter_redirector_set_vnet_hdr, NULL);
 }
 
 static void filter_mirror_fini(Object *obj)
diff --git a/qemu-options.hx b/qemu-options.hx
index 5c09fae..e78b942 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4028,11 +4028,11 @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
 
 filter-mirror on netdev @var{netdevid},mirror net packet to chardev@var{chardevid}, if have the vnet_hdr_support flag, filter-mirror will mirror packet with vnet_hdr_len.
 
-@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},
-outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
+@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support]
 
 filter-redirector on netdev @var{netdevid},redirect filter's net packet to chardev
-@var{chardevid},and redirect indev's packet to filter.
+@var{chardevid},and redirect indev's packet to filter.if have the vnet_hdr_support flag,
+filter-redirector will redirect packet with vnet_hdr_len.
 Create a filter-redirector we need to differ outdev id from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or outdev
 need to be specified.
-- 
2.7.4

[Qemu-devel] [PULL 4/7] ram: Move ZERO_TARGET_PAGE inside XBZRLE

2017-06-07 Thread Juan Quintela

It was only used by XBZRLE anyways.

Signed-off-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Peter Xu 
---
 migration/ram.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 701a1e6..ac30e9e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -69,8 +69,6 @@
 /* 0x80 is reserved in migration.h start with 0x100 next */
 #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
 
-static uint8_t *ZERO_TARGET_PAGE;
-
 static inline bool is_zero_range(uint8_t *p, uint64_t size)
 {
 return buffer_is_zero(p, size);
@@ -86,6 +84,8 @@ static struct {
 /* Cache for XBZRLE, Protected by lock. */
 PageCache *cache;
 QemuMutex lock;
+/* it will store a page full of zeros */
+uint8_t *zero_target_page;
 } XBZRLE;
 
 /* buffer used for XBZRLE decoding */
@@ -512,7 +512,7 @@ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
 
 /* We don't care if this fails to allocate a new cache page
  * as long as it updated an old one */
-cache_insert(XBZRLE.cache, current_addr, ZERO_TARGET_PAGE,
+cache_insert(XBZRLE.cache, current_addr, XBZRLE.zero_target_page,
  rs->bitmap_sync_count);
 }
 
@@ -1456,10 +1456,11 @@ static void ram_migration_cleanup(void *opaque)
 cache_fini(XBZRLE.cache);
 g_free(XBZRLE.encoded_buf);
 g_free(XBZRLE.current_buf);
-g_free(ZERO_TARGET_PAGE);
+g_free(XBZRLE.zero_target_page);
 XBZRLE.cache = NULL;
 XBZRLE.encoded_buf = NULL;
 XBZRLE.current_buf = NULL;
+XBZRLE.zero_target_page = NULL;
 }
 XBZRLE_cache_unlock();
 migration_page_queue_free(rs);
@@ -1880,7 +1881,7 @@ static int ram_state_init(RAMState *rs)
 
 if (migrate_use_xbzrle()) {
 XBZRLE_cache_lock();
-ZERO_TARGET_PAGE = g_malloc0(TARGET_PAGE_SIZE);
+XBZRLE.zero_target_page = g_malloc0(TARGET_PAGE_SIZE);
 XBZRLE.cache = cache_init(migrate_xbzrle_cache_size() /
   TARGET_PAGE_SIZE,
   TARGET_PAGE_SIZE);
-- 
2.9.4

[Qemu-devel] [PULL 0/7] Migration PULL requset

2017-06-07 Thread Juan Quintela

Hi

This is the migration pull request, it contains:
- fix for segfault of io tests (QingFeng)
- half of make consistent output
- Make RAMState dynamic.

Please, apply.

Thanks, Juan.

The following changes since commit 65dfad62a176f5265f801683be64149c5ad55f7d:

  Merge remote-tracking branch 'remotes/xtensa/tags/20170606-xtensa' into staging (2017-06-06 17:00:12 +0100)

are available in the git repository at:

  git://github.com/juanquintela/qemu.git tags/migration/20170607

for you to fetch changes up to eefff991d059d299b917627d2a95bce34d2f97f3:

  qemu/migration: fix the double free problem on from_src_file (2017-06-07 10:20:56 +0200)


migration/next for 20170607


Juan Quintela (6):
  ram: Unfold get_xbzrle_cache_stats() into populate_ram_info()
  ram: We only print throttling information sometimes
  ram: Call migration_page_queue_free() at ram_migration_cleanup()
  ram: Move ZERO_TARGET_PAGE inside XBZRLE
  ram: Use MigrationStats for statistics
  ram: Make RAMState dynamic

QingFeng Hao (1):
  qemu/migration: fix the double free problem on from_src_file

 migration/migration.c |  59 ++---
 migration/ram.c   | 233 +++---
 migration/ram.h   |  15 +---
 migration/savevm.c|   1 -
 4 files changed, 114 insertions(+), 194 deletions(-)

Re: [Qemu-devel] [PATCH 1/5] coccinelle: replace code with ROUND_UP macro

2017-06-07 Thread Juan Quintela

Marc-André Lureau  wrote:
> I used the following coccinelle script:
>
> @@
> expression e1;
> @@
> - ((e1) + (3)) / (4) * (4)
> + ROUND_UP(e1,4)
>
> @@
> expression e1;
> expression e2;
> @@
> -(ROUND_UP(e1,e2))
> +ROUND_UP(e1,e2)
>
> I tried with various other values (4, 8, 16, 32), but got only the
> matches in this patch.
>
> Signed-off-by: Marc-André Lureau 

Reviewed-by: Juan Quintela
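
For readers unfamiliar with the rewrite, the first rule of the script replaces the open-coded expression (e1 + 3) / 4 * 4 with a round-up macro. The sketch below checks that the two forms agree for unsigned values. SKETCH_ROUND_UP is a local illustration written for this example; QEMU's own ROUND_UP macro may be defined differently, so this is not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* Local illustration only: round n up to the next multiple of d (d > 0). */
#define SKETCH_ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* The open-coded form that the coccinelle rule matches. */
static uint32_t open_coded_round_up4(uint32_t n)
{
    return (n + 3) / 4 * 4;
}

/* Compare both forms over a small range of inputs. */
void round_up_demo(void)
{
    for (uint32_t n = 0; n < 1024; n++) {
        assert(open_coded_round_up4(n) == SKETCH_ROUND_UP(n, 4u));
    }
}
```

The comparison only covers small non-negative inputs; values near the top of the type's range overflow in both forms and are outside the scope of the sketch.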

[Qemu-devel] [PATCH V6 07/10] net/colo.c: Add vnet packet parse feature in colo-proxy

2017-06-07 Thread Zhang Chen

Make colo-compare and filter-rewriter able to parse vnet packets.

Signed-off-by: Zhang Chen 
---
 net/colo.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index 180eaed..28ce7c8 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -43,11 +43,11 @@ int parse_packet_early(Packet *pkt)
 {
 int network_length;
 static const uint8_t vlan[] = {0x81, 0x00};
-uint8_t *data = pkt->data;
+uint8_t *data = pkt->data + pkt->vnet_hdr_len;
 uint16_t l3_proto;
 ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
 
-if (pkt->size < ETH_HLEN) {
+if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
 trace_colo_proxy_main("pkt->size < ETH_HLEN");
 return 1;
 }
@@ -73,7 +73,7 @@ int parse_packet_early(Packet *pkt)
 }
 
 network_length = pkt->ip->ip_hl * 4;
-if (pkt->size < l2hdr_len + network_length) {
+if (pkt->size < l2hdr_len + network_length + pkt->vnet_hdr_len) {
 trace_colo_proxy_main("pkt->size < network_header + network_length");
 return 1;
 }
-- 
2.7.4
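
The change to parse_packet_early() amounts to skipping the vnet header before looking at the Ethernet header, and including that header in each length check. A minimal sketch of that idea follows, assuming a trimmed-down hypothetical packet struct; struct sketch_packet, SKETCH_ETH_HLEN, and sketch_l2_start() are illustrative names, not QEMU's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN 14 /* Ethernet header: two 6-byte MACs + 2-byte type */

/* Hypothetical stand-in for the fields of Packet used by the early parser. */
struct sketch_packet {
    const uint8_t *data;
    size_t size;
    size_t vnet_hdr_len;
};

/*
 * Return a pointer to the start of the Ethernet header, or NULL if the
 * buffer is too short to hold the vnet header plus an Ethernet header.
 */
static const uint8_t *sketch_l2_start(const struct sketch_packet *pkt)
{
    if (pkt->size < SKETCH_ETH_HLEN + pkt->vnet_hdr_len) {
        return NULL;
    }
    return pkt->data + pkt->vnet_hdr_len;
}

/* Usage example: a 12-byte vnet header in front of a 14-byte frame. */
void sketch_l2_start_demo(void)
{
    static const uint8_t buf[26];
    struct sketch_packet ok = { buf, sizeof(buf), 12 };
    struct sketch_packet too_short = { buf, 20, 12 };

    assert(sketch_l2_start(&ok) == buf + 12);
    assert(sketch_l2_start(&too_short) == NULL);
}
```

Unlike the patch, the sketch performs the length check before computing the offset pointer, so the pointer is never formed for an undersized buffer.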

[Qemu-devel] [PULL 1/7] ram: Unfold get_xbzrle_cache_stats() into populate_ram_info()

2017-06-07 Thread Juan Quintela

They were called consecutively always.

Signed-off-by: Juan Quintela 
Reviewed-by: Eric Blake 
---
 migration/migration.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 48c94c9..b1e68c0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -588,20 +588,6 @@ static bool migration_is_setup_or_active(int state)
 }
 }
 
-static void get_xbzrle_cache_stats(MigrationInfo *info)
-{
-if (migrate_use_xbzrle()) {
-info->has_xbzrle_cache = true;
-info->xbzrle_cache = g_malloc0(sizeof(*info->xbzrle_cache));
-info->xbzrle_cache->cache_size = migrate_xbzrle_cache_size();
-info->xbzrle_cache->bytes = xbzrle_mig_bytes_transferred();
-info->xbzrle_cache->pages = xbzrle_mig_pages_transferred();
-info->xbzrle_cache->cache_miss = xbzrle_mig_pages_cache_miss();
-info->xbzrle_cache->cache_miss_rate = xbzrle_mig_cache_miss_rate();
-info->xbzrle_cache->overflow = xbzrle_mig_pages_overflow();
-}
-}
-
 static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 {
 info->has_ram = true;
@@ -619,6 +605,17 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 info->ram->postcopy_requests = ram_postcopy_requests();
 info->ram->page_size = qemu_target_page_size();
 
+if (migrate_use_xbzrle()) {
+info->has_xbzrle_cache = true;
+info->xbzrle_cache = g_malloc0(sizeof(*info->xbzrle_cache));
+info->xbzrle_cache->cache_size = migrate_xbzrle_cache_size();
+info->xbzrle_cache->bytes = xbzrle_mig_bytes_transferred();
+info->xbzrle_cache->pages = xbzrle_mig_pages_transferred();
+info->xbzrle_cache->cache_miss = xbzrle_mig_pages_cache_miss();
+info->xbzrle_cache->cache_miss_rate = xbzrle_mig_cache_miss_rate();
+info->xbzrle_cache->overflow = xbzrle_mig_pages_overflow();
+}
+
 if (s->state != MIGRATION_STATUS_COMPLETED) {
 info->ram->remaining = ram_bytes_remaining();
 info->ram->dirty_pages_rate = ram_dirty_pages_rate();
@@ -664,7 +661,6 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info->cpu_throttle_percentage = cpu_throttle_get_percentage();
 }
 
-get_xbzrle_cache_stats(info);
 break;
 case MIGRATION_STATUS_POSTCOPY_ACTIVE:
 /* Mostly the same as active; TODO add some postcopy stats */
@@ -687,15 +683,12 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info->disk->total = blk_mig_bytes_total();
 }
 
-get_xbzrle_cache_stats(info);
 break;
 case MIGRATION_STATUS_COLO:
 info->has_status = true;
 /* TODO: display COLO specific information (checkpoint info etc.) */
 break;
 case MIGRATION_STATUS_COMPLETED:
-get_xbzrle_cache_stats(info);
-
 info->has_status = true;
 info->has_total_time = true;
 info->total_time = s->total_time;
-- 
2.9.4

Re: [Qemu-devel] [PATCH 4/5] Replace g_malloc()+memcpy() with g_memdup()

2017-06-07 Thread Juan Quintela

Marc-André Lureau  wrote:
> I found these patterns via grepping the source tree. I don't have a
> coccinelle script for it!
>
> Signed-off-by: Marc-André Lureau 

Reviewed-by: Juan Quintela

[Qemu-devel] [PATCH V6 06/10] net/colo-compare.c: Make colo-compare support vnet_hdr_len

2017-06-07 Thread Zhang Chen

Add the vnet_hdr_support option to colo-compare; it is disabled by default.
If you use virtio-net-pci or another driver that needs vnet_hdr, please enable it.
For example:
-object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,vnet_hdr_support

COLO-compare can get the vnet header length from the filter.
Add vnet_hdr_len to the Packet struct and output packets with
the vnet_hdr_len.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 70 +++---
 qemu-options.hx|  4 ++--
 2 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index bf0b856..e33cf7e 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -73,6 +73,7 @@ typedef struct CompareState {
 CharBackend chr_out;
 SocketReadState pri_rs;
 SocketReadState sec_rs;
+bool vnet_hdr;
 
 /* connection list: the connections belonged to this NIC could be found
  * in this list.
@@ -97,9 +98,10 @@ enum {
 SECONDARY_IN,
 };
 
-static int compare_chr_send(CharBackend *out,
+static int compare_chr_send(CompareState *s,
 const uint8_t *buf,
-uint32_t size);
+uint32_t size,
+uint32_t vnet_hdr_len);
 
 static gint seq_sorter(Packet *a, Packet *b, gpointer data)
 {
@@ -472,7 +474,10 @@ static void colo_compare_connection(void *opaque, void *user_data)
 }
 
 if (result) {
-ret = compare_chr_send(&s->chr_out, pkt->data, pkt->size);
+ret = compare_chr_send(s,
+   pkt->data,
+   pkt->size,
+   pkt->vnet_hdr_len);
 if (ret < 0) {
 error_report("colo_send_primary_packet failed");
 }
@@ -493,9 +498,10 @@ static void colo_compare_connection(void *opaque, void *user_data)
 }
 }
 
-static int compare_chr_send(CharBackend *out,
+static int compare_chr_send(CompareState *s,
 const uint8_t *buf,
-uint32_t size)
+uint32_t size,
+uint32_t vnet_hdr_len)
 {
 int ret = 0;
 uint32_t len = htonl(size);
@@ -504,12 +510,24 @@ static int compare_chr_send(CharBackend *out,
 return 0;
 }
 
-ret = qemu_chr_fe_write_all(out, (uint8_t *)&len, sizeof(len));
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
 if (ret != sizeof(len)) {
 goto err;
 }
 
-ret = qemu_chr_fe_write_all(out, (uint8_t *)buf, size);
+if (s->vnet_hdr) {
+/*
+ * We send vnet header len make other module(like filter-redirector)
+ * know how to parse net packet correctly.
+ */
+len = htonl(vnet_hdr_len);
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
+if (ret != sizeof(len)) {
+goto err;
+}
+}
+
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
 if (ret != size) {
 goto err;
 }
@@ -646,13 +664,32 @@ static void compare_set_outdev(Object *obj, const char *value, Error **errp)
 s->outdev = g_strdup(value);
 }
 
+static bool compare_get_vnet_hdr(Object *obj, Error **errp)
+{
+CompareState *s = COLO_COMPARE(obj);
+
+return s->vnet_hdr;
+}
+
+static void compare_set_vnet_hdr(Object *obj,
+ bool value,
+ Error **errp)
+{
+CompareState *s = COLO_COMPARE(obj);
+
+s->vnet_hdr = value;
+}
+
 static void compare_pri_rs_finalize(SocketReadState *pri_rs)
 {
 CompareState *s = container_of(pri_rs, CompareState, pri_rs);
 
 if (packet_enqueue(s, PRIMARY_IN)) {
 trace_colo_compare_main("primary: unsupported packet in");
-compare_chr_send(&s->chr_out, pri_rs->buf, pri_rs->packet_len);
+compare_chr_send(s,
+ pri_rs->buf,
+ pri_rs->packet_len,
+ pri_rs->vnet_hdr_len);
 } else {
 /* compare connection */
 g_queue_foreach(&s->conn_list, colo_compare_connection, s);
@@ -735,7 +772,9 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
 }
 
 net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize);
+s->pri_rs.vnet_hdr = s->vnet_hdr;
 net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize);
+s->sec_rs.vnet_hdr = s->vnet_hdr;
 
 g_queue_init(&s->conn_list);
 
@@ -761,7 +800,10 @@ static void colo_flush_packets(void *opaque, void *user_data)
 
 while (!g_queue_is_empty(&conn->primary_list)) {
 pkt = g_queue_pop_head(&conn->primary_list);
-compare_chr_send(&s->chr_out, pkt->data, pkt->size);
+compare_chr_send(s,
+ pkt->data,
+ pkt->size,
+
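
As far as the visible hunks show, compare_chr_send() writes a 4-byte network-order packet size, then (only when vnet_hdr is set) a 4-byte network-order vnet_hdr_len, then the payload. The sketch below serializes that layout into a caller-supplied buffer instead of a chardev. sketch_frame() and put_be32() are hypothetical helpers written for illustration; nothing here calls QEMU's qemu_chr_fe_write_all().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/*
 * Build [size][vnet_hdr_len (optional)][payload] in out.
 * Returns the number of bytes written, or 0 if out is too small.
 */
static size_t sketch_frame(uint8_t *out, size_t out_cap,
                           const uint8_t *payload, uint32_t size,
                           int vnet_hdr, uint32_t vnet_hdr_len)
{
    size_t need = 4 + (vnet_hdr ? 4 : 0) + (size_t)size;
    size_t off = 0;

    if (out_cap < need) {
        return 0;
    }
    put_be32(out + off, size);
    off += 4;
    if (vnet_hdr) {
        /* The extra field tells the receiver how many bytes to skip. */
        put_be32(out + off, vnet_hdr_len);
        off += 4;
    }
    memcpy(out + off, payload, size);
    off += size;
    return off;
}

/* Usage example: a 2-byte payload framed with a 12-byte vnet header length. */
void sketch_frame_demo(void)
{
    const uint8_t payload[2] = { 0xaa, 0xbb };
    uint8_t out[16];

    assert(sketch_frame(out, sizeof(out), payload, 2, 1, 12) == 10);
    assert(out[3] == 2 && out[7] == 12 && out[8] == 0xaa && out[9] == 0xbb);
}
```

Because the extra length field is only present when the option is enabled, both ends of the socket need the same vnet_hdr_support setting, which matches the comment in the patch about enabling the option on all related modules.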

[Qemu-devel] [PULL 2/7] ram: We only print throttling information sometimes

2017-06-07 Thread Juan Quintela

Change it to be consistent with everything else.

Signed-off-by: Juan Quintela 
Reviewed-by: Eric Blake 
---
 migration/migration.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b1e68c0..9c5ff57 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -616,6 +616,11 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 info->xbzrle_cache->overflow = xbzrle_mig_pages_overflow();
 }
 
+if (cpu_throttle_active()) {
+info->has_cpu_throttle_percentage = true;
+info->cpu_throttle_percentage = cpu_throttle_get_percentage();
+}
+
 if (s->state != MIGRATION_STATUS_COMPLETED) {
 info->ram->remaining = ram_bytes_remaining();
 info->ram->dirty_pages_rate = ram_dirty_pages_rate();
@@ -656,11 +661,6 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info->disk->total = blk_mig_bytes_total();
 }
 
-if (cpu_throttle_active()) {
-info->has_cpu_throttle_percentage = true;
-info->cpu_throttle_percentage = cpu_throttle_get_percentage();
-}
-
 break;
 case MIGRATION_STATUS_POSTCOPY_ACTIVE:
 /* Mostly the same as active; TODO add some postcopy stats */
-- 
2.9.4

[Qemu-devel] [PULL 3/7] ram: Call migration_page_queue_free() at ram_migration_cleanup()

2017-06-07 Thread Juan Quintela

We shouldn't be using memory later than that.

Signed-off-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Peter Xu 
---
 migration/migration.c | 2 --
 migration/ram.c   | 5 +++--
 migration/ram.h   | 1 -
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 9c5ff57..9cf47d3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -948,8 +948,6 @@ static void migrate_fd_cleanup(void *opaque)
 qemu_bh_delete(s->cleanup_bh);
 s->cleanup_bh = NULL;
 
-migration_page_queue_free();
-
 if (s->to_dst_file) {
 trace_migrate_fd_cleanup();
 qemu_mutex_unlock_iothread();
diff --git a/migration/ram.c b/migration/ram.c
index f387e9c..701a1e6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1184,10 +1184,9 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
  * be some left.  in case that there is any page left, we drop it.
  *
  */
-void migration_page_queue_free(void)
+static void migration_page_queue_free(RAMState *rs)
 {
 struct RAMSrcPageRequest *mspr, *next_mspr;
-RAMState *rs = &ram_state;
 /* This queue generally should be empty - but in the case of a failed
  * migration might have some droppings in.
  */
@@ -1437,6 +1436,7 @@ void free_xbzrle_decoded_buf(void)
 
 static void ram_migration_cleanup(void *opaque)
 {
+RAMState *rs = opaque;
 RAMBlock *block;
 
 /* caller have hold iothread lock or is in a bh, so there is
@@ -1462,6 +1462,7 @@ static void ram_migration_cleanup(void *opaque)
 XBZRLE.current_buf = NULL;
 }
 XBZRLE_cache_unlock();
+migration_page_queue_free(rs);
 }
 
 static void ram_state_reset(RAMState *rs)
diff --git a/migration/ram.h b/migration/ram.h
index c9563d1..d4da419 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -53,7 +53,6 @@ void migrate_decompress_threads_create(void);
 void migrate_decompress_threads_join(void);
 
 uint64_t ram_pagesize_summary(void);
-void migration_page_queue_free(void);
 int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
 void acct_update_position(QEMUFile *f, size_t size, bool zero);
 void free_xbzrle_decoded_buf(void);
-- 
2.9.4
