date:20150611

Re: [Qemu-devel] [PATCH v5 2/4] monitor: cleanup parsing of cmd name and cmd arguments

2015-06-11 Thread Markus Armbruster

Bandan Das  writes:

> There's too much going on in monitor_parse_command().
> Split up the arguments parsing bits into a separate function
> monitor_parse_arguments(). Let the original function check for
> command validity and sub-commands if any and return data (*cmd)
> that the newly introduced function can process and return a
> QDict. Also, pass a pointer to the cmdline to track current
> parser location.
>
> Suggested-by: Markus Armbruster 
> Signed-off-by: Bandan Das 

Doesn't apply cleanly anymore.  Please double-check my conflict
resolution carefully:

diff --git a/monitor.c b/monitor.c
index bcb88cd..0b0a8df 100644
--- a/monitor.c
+++ b/monitor.c
[...]
@@ -4156,13 +4168,17 @@ static void handle_hmp_command(Monitor *mon, const char *cmdline)
 QDict *qdict;
 const mon_cmd_t *cmd;
 
-qdict = qdict_new();
+cmd = monitor_parse_command(mon, &cmdline, mon->cmd_table);
+if (!cmd) {
+return;
+}
 
-cmd = monitor_parse_command(mon, cmdline, 0, mon->cmd_table, qdict);
-if (cmd) {
-cmd->mhandler.cmd(mon, qdict);
+qdict = monitor_parse_arguments(mon, &cmdline, cmd);
+if (!qdict) {
+return;
 }
 
+cmd->mhandler.cmd(mon, qdict);
 QDECREF(qdict);
 }
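
The split above can be pictured with a small standalone sketch: one function
consumes the command name, a second consumes the arguments, and both advance a
shared cursor. Everything here (DemoCmd, demo_table, demo_parse_command,
demo_parse_arguments) is hypothetical and far simpler than the monitor code; it
only illustrates the pointer-to-cmdline idea under those assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical command descriptor; much simpler than QEMU's mon_cmd_t. */
typedef struct {
    const char *name;
    int nargs;              /* number of decimal arguments expected */
} DemoCmd;

static const DemoCmd demo_table[] = {
    { "stop", 0 },
    { "seek", 2 },
};

/* Stage 1: match the command name and leave *cursor just past it. */
static const DemoCmd *demo_parse_command(const char **cursor)
{
    const char *p = *cursor;
    size_t len, i;

    while (isspace((unsigned char)*p)) {
        p++;
    }
    len = strcspn(p, " \t");
    for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
        if (strlen(demo_table[i].name) == len &&
            strncmp(demo_table[i].name, p, len) == 0) {
            *cursor = p + len;
            return &demo_table[i];
        }
    }
    return NULL;            /* unknown command; *cursor is left untouched */
}

/*
 * Stage 2: parse cmd->nargs integers starting at *cursor.
 * Returns 0 on success, -1 if an argument is missing or malformed.
 */
static int demo_parse_arguments(const char **cursor, const DemoCmd *cmd,
                                long *out)
{
    const char *p = *cursor;
    int i;

    assert(cmd != NULL);
    for (i = 0; i < cmd->nargs; i++) {
        char *end;

        out[i] = strtol(p, &end, 10);
        if (end == p) {
            return -1;      /* nothing parsable where an argument should be */
        }
        p = end;
    }
    *cursor = p;
    return 0;
}
```

A caller would mirror the new handle_hmp_command(): bail out if stage 1
returns NULL, bail out if stage 2 fails, and only then invoke the handler.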

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Gerd Hoffmann

  Hi,

> On each boot, coreboot might decide to assign a different bus id to
> the extra roots (for example, if a device with a PCI bridge is
> inserted and its bus allocation causes bus ids to shift).
> Technically, coreboot could even change the order extra buses are
> assigned bus ids, but doesn't today.
> 
> This was seen on several AMD systems - I'm told at least some Intel
> systems have multiple root buses, but the bus numbers are just hard
> wired.

This is how the qemu pxb works: root bus numbers are a config option for
the root bridge device, i.e. from the guest point of view they are
hard-wired.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 1/2] qobject: Use 'bool' for qbool

2015-06-11 Thread Markus Armbruster

Patch looks good to me, but it made me wonder about something.  Please
find the question inline.

Eric Blake  writes:

> We require a C99 compiler, so let's use 'bool' instead of 'int'
> when dealing with boolean values.  There are few enough clients
> to fix them all in one pass.
>
> Signed-off-by: Eric Blake 
[...]
> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
> index 717cb8f..015d785 100644
> --- a/qobject/json-parser.c
> +++ b/qobject/json-parser.c
> @@ -558,9 +558,9 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
>  }
>
>  if (token_is_keyword(token, "true")) {
> -ret = QOBJECT(qbool_from_int(true));
> +ret = QOBJECT(qbool_from_bool(true));
>  } else if (token_is_keyword(token, "false")) {
> -ret = QOBJECT(qbool_from_int(false));
> +ret = QOBJECT(qbool_from_bool(false));
>  } else if (token_is_keyword(token, "null")) {
>  ret = qnull();
>  } else {
> @@ -593,7 +593,7 @@ static QObject *parse_escape(JSONParserContext *ctxt, 
> va_list *ap)
>  if (token_is_escape(token, "%p")) {
>  obj = va_arg(*ap, QObject *);
>  } else if (token_is_escape(token, "%i")) {
> -obj = QOBJECT(qbool_from_int(va_arg(*ap, int)));
> +obj = QOBJECT(qbool_from_bool(va_arg(*ap, int)));

Funny: JSON_ESCAPE "%i" gets an int, but maps it to bool.  See also
patch to check-qjson.c below.

Is this feature actually used anywhere other than the tests?

>  } else if (token_is_escape(token, "%d")) {
>  obj = QOBJECT(qint_from_int(va_arg(*ap, int)));
>  } else if (token_is_escape(token, "%ld")) {
[...]
> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
> index 60e5b22..1cfffa5 100644
> --- a/tests/check-qjson.c
> +++ b/tests/check-qjson.c
> @@ -1013,7 +1013,7 @@ static void keyword_literal(void)
>  g_assert(qobject_type(obj) == QTYPE_QBOOL);
>
>  qbool = qobject_to_qbool(obj);
> -g_assert(qbool_get_int(qbool) != 0);
> +g_assert(qbool_get_bool(qbool) == true);
>
>  str = qobject_to_json(obj);
>  g_assert(strcmp(qstring_get_str(str), "true") == 0);
> @@ -1026,7 +1026,7 @@ static void keyword_literal(void)
>  g_assert(qobject_type(obj) == QTYPE_QBOOL);
>
>  qbool = qobject_to_qbool(obj);
> -g_assert(qbool_get_int(qbool) == 0);
> +g_assert(qbool_get_bool(qbool) == false);
>
>  str = qobject_to_json(obj);
>  g_assert(strcmp(qstring_get_str(str), "false") == 0);
> @@ -1039,16 +1039,17 @@ static void keyword_literal(void)
>  obj = qobject_from_jsonf("%i", false);
>  g_assert(obj != NULL);
>  g_assert(qobject_type(obj) == QTYPE_QBOOL);
>
>  qbool = qobject_to_qbool(obj);
> -g_assert(qbool_get_int(qbool) == 0);
> +g_assert(qbool_get_bool(qbool) == false);
>
>  QDECREF(qbool);
>
> -obj = qobject_from_jsonf("%i", true);
> +/* Test that non-zero values other than 1 get collapsed to true */
> +obj = qobject_from_jsonf("%i", 2);
>  g_assert(obj != NULL);
>  g_assert(qobject_type(obj) == QTYPE_QBOOL);

These are the test cases for JSON_ESCAPE "%i".

>
>  qbool = qobject_to_qbool(obj);
> -g_assert(qbool_get_int(qbool) != 0);
> +g_assert(qbool_get_bool(qbool) == true);
>
>  QDECREF(qbool);
>
[...]
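
As background for the "%i" question: a bool passed through "..." is promoted
to int, so reading it back with va_arg(ap, int) is the portable form, and the
conversion to bool is what collapses any non-zero value to true. A minimal
standalone sketch follows; demo_count_true is a hypothetical helper and not
part of qobject.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>

/* Hypothetical helper: count how many of 'count' variadic flags are true. */
static int demo_count_true(int count, ...)
{
    va_list ap;
    int i, n = 0;

    va_start(ap, count);
    for (i = 0; i < count; i++) {
        /* bool arguments arrive as int because of default promotions */
        bool b = va_arg(ap, int);   /* int -> bool: non-zero becomes true */

        n += b;                     /* adds exactly 0 or 1 */
    }
    va_end(ap);
    return n;
}
```

With an int accumulator instead of the bool conversion, a caller passing 2 or
256 would be counted by its raw value, which is the behaviour the new test
case with "%i", 2 guards against.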

[Qemu-devel] [PULL 19/21] Teach analyze-migration.py about section footers

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Juan Quintela 
---
 scripts/analyze-migration.py | 5 +
 1 file changed, 5 insertions(+)

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index 0c8b22f..f6894be 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -474,6 +474,7 @@ class MigrationDump(object):
 QEMU_VM_SECTION_FULL  = 0x04
 QEMU_VM_SUBSECTION= 0x05
 QEMU_VM_VMDESCRIPTION = 0x06
+QEMU_VM_SECTION_FOOTER= 0x7e

 def __init__(self, filename):
 self.section_classes = { ( 'ram', 0 ) : [ RamSection, None ],
@@ -526,6 +527,10 @@ class MigrationDump(object):
 elif section_type == self.QEMU_VM_SECTION_PART or section_type == self.QEMU_VM_SECTION_END:
 section_id = file.read32()
 self.sections[section_id].read()
+elif section_type == self.QEMU_VM_SECTION_FOOTER:
+read_section_id = file.read32()
+if read_section_id != section_id:
+raise Exception("Mismatched section footer: %x vs %x" % (read_section_id, section_id))
 else:
 raise Exception("Unknown section type: %d" % section_type)
 file.close()
-- 
2.4.3

[Qemu-devel] [PULL 16/21] Merge section header writing

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

The header writing for device sections is open coded in
a few places, merge it into one.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/savevm.c | 72 +-
 1 file changed, 28 insertions(+), 44 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 5324c4c..2942ed6 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -611,6 +611,27 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
 vmstate_save_state(f, se->vmsd, se->opaque, vmdesc);
 }

+/*
+ * Write the header for device section (QEMU_VM_SECTION START/END/PART/FULL)
+ */
+static void save_section_header(QEMUFile *f, SaveStateEntry *se,
+uint8_t section_type)
+{
+qemu_put_byte(f, section_type);
+qemu_put_be32(f, se->section_id);
+
+if (section_type == QEMU_VM_SECTION_FULL ||
+section_type == QEMU_VM_SECTION_START) {
+/* ID string */
+size_t len = strlen(se->idstr);
+qemu_put_byte(f, len);
+qemu_put_buffer(f, (uint8_t *)se->idstr, len);
+
+qemu_put_be32(f, se->instance_id);
+qemu_put_be32(f, se->version_id);
+}
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
 SaveStateEntry *se;
@@ -647,8 +668,6 @@ void qemu_savevm_state_begin(QEMUFile *f,
 }

 QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-int len;
-
 if (!se->ops || !se->ops->save_live_setup) {
 continue;
 }
@@ -657,17 +676,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
 continue;
 }
 }
-/* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_START);
-qemu_put_be32(f, se->section_id);
-
-/* ID string */
-len = strlen(se->idstr);
-qemu_put_byte(f, len);
-qemu_put_buffer(f, (uint8_t *)se->idstr, len);
-
-qemu_put_be32(f, se->instance_id);
-qemu_put_be32(f, se->version_id);
+save_section_header(f, se, QEMU_VM_SECTION_START);

 ret = se->ops->save_live_setup(f, se->opaque);
 if (ret < 0) {
@@ -702,9 +711,8 @@ int qemu_savevm_state_iterate(QEMUFile *f)
 return 0;
 }
 trace_savevm_section_start(se->idstr, se->section_id);
-/* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_PART);
-qemu_put_be32(f, se->section_id);
+
+save_section_header(f, se, QEMU_VM_SECTION_PART);

 ret = se->ops->save_live_iterate(f, se->opaque);
 trace_savevm_section_end(se->idstr, se->section_id, ret);
@@ -750,9 +758,8 @@ void qemu_savevm_state_complete(QEMUFile *f)
 }
 }
 trace_savevm_section_start(se->idstr, se->section_id);
-/* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_END);
-qemu_put_be32(f, se->section_id);
+
+save_section_header(f, se, QEMU_VM_SECTION_END);

 ret = se->ops->save_live_complete(f, se->opaque);
 trace_savevm_section_end(se->idstr, se->section_id, ret);
@@ -766,7 +773,6 @@ void qemu_savevm_state_complete(QEMUFile *f)
 json_prop_int(vmdesc, "page_size", TARGET_PAGE_SIZE);
 json_start_array(vmdesc, "devices");
 QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-int len;

 if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
 continue;
@@ -777,17 +783,7 @@ void qemu_savevm_state_complete(QEMUFile *f)
 json_prop_str(vmdesc, "name", se->idstr);
 json_prop_int(vmdesc, "instance_id", se->instance_id);

-/* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_FULL);
-qemu_put_be32(f, se->section_id);
-
-/* ID string */
-len = strlen(se->idstr);
-qemu_put_byte(f, len);
-qemu_put_buffer(f, (uint8_t *)se->idstr, len);
-
-qemu_put_be32(f, se->instance_id);
-qemu_put_be32(f, se->version_id);
+save_section_header(f, se, QEMU_VM_SECTION_FULL);

 vmstate_save(f, se, vmdesc);

@@ -887,8 +883,6 @@ static int qemu_save_device_state(QEMUFile *f)
 cpu_synchronize_all_states();

 QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-int len;
-
 if (se->is_ram) {
 continue;
 }
@@ -896,17 +890,7 @@ static int qemu_save_device_state(QEMUFile *f)
 continue;
 }

-/* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_FULL);
-qemu_put_be32(f, se->section_id);
-
-/* ID string */
-len = strlen(se->idstr);
-qemu_put_byte(f, len);
-qemu_put_buffer(f, (uint8_t *)se->idstr, len);
-
-qemu_put_be32(f, se->instance_id);
-qemu_put_be32(f, se->version_id);
+save_section_header(f, se, QEMU_VM_SECTION_FULL);

 vmstate_save(f, se, NULL);
 }
-- 
2.4.3
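
To make the on-the-wire layout produced by save_section_header() easier to
see, here is a standalone sketch that writes the same fields into a plain byte
buffer. The names (demo_section_header, demo_put_be32, the DEMO_* constants)
are hypothetical, and the numeric section types are illustrative values chosen
for the sketch; the real definitions live in savevm.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative section types for this sketch only. */
enum {
    DEMO_SECTION_START = 1,
    DEMO_SECTION_PART  = 2,
    DEMO_SECTION_END   = 3,
    DEMO_SECTION_FULL  = 4,
};

/* Store a 32-bit value big-endian; returns the number of bytes written. */
static size_t demo_put_be32(uint8_t *buf, uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
    return 4;
}

/*
 * Every header starts with the type byte and the section id.
 * Only START and FULL sections also carry the id string (length-prefixed),
 * the instance id and the version id.
 */
static size_t demo_section_header(uint8_t *buf, uint8_t type,
                                  uint32_t section_id, const char *idstr,
                                  uint32_t instance_id, uint32_t version_id)
{
    size_t n = 0;

    buf[n++] = type;
    n += demo_put_be32(buf + n, section_id);

    if (type == DEMO_SECTION_FULL || type == DEMO_SECTION_START) {
        size_t len = strlen(idstr);

        assert(len <= 255);         /* length must fit in one byte */
        buf[n++] = (uint8_t)len;
        memcpy(buf + n, idstr, len);
        n += len;
        n += demo_put_be32(buf + n, instance_id);
        n += demo_put_be32(buf + n, version_id);
    }
    return n;
}
```

PART and END headers therefore stay at five bytes, while START and FULL
headers grow with the length of the id string.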

[Qemu-devel] [PULL 12/21] qemu_ram_foreach_block: pass up error value, and down the ramblock name

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

Check the return value of the function it calls and error out if it is
non-zero.  Fix up qemu_rdma_init_one_block, which is the only current caller,
and rdma_add_block, the only function it calls using it.

Pass the name of the ramblock to the function; helps in debugging.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: David Gibson 
Reviewed-by: Amit Shah 
Reviewed-by: Michael R. Hines 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 exec.c| 10 --
 include/exec/cpu-common.h |  4 ++--
 migration/rdma.c  |  4 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index ba3f2cf..76bfc4a 100644
--- a/exec.c
+++ b/exec.c
@@ -3345,14 +3345,20 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
 return res;
 }

-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 {
 RAMBlock *block;
+int ret = 0;

 rcu_read_lock();
 QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
-func(block->host, block->offset, block->used_length, opaque);
+ret = func(block->idstr, block->host, block->offset,
+   block->used_length, opaque);
+if (ret) {
+break;
+}
 }
 rcu_read_unlock();
+return ret;
 }
 #endif
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 43428bd..de8a720 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -126,10 +126,10 @@ void cpu_flush_icache_range(hwaddr start, int len);
 extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;

-typedef void (RAMBlockIterFunc)(void *host_addr,
+typedef int (RAMBlockIterFunc)(const char *block_name, void *host_addr,
 ram_addr_t offset, ram_addr_t length, void *opaque);

-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);

 #endif

diff --git a/migration/rdma.c b/migration/rdma.c
index 3671903..791ef44 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -570,10 +570,10 @@ static int rdma_add_block(RDMAContext *rdma, void *host_addr,
  * in advanced before the migration starts. This tells us where the RAM blocks
  * are so that we can register them individually.
  */
-static void qemu_rdma_init_one_block(void *host_addr,
+static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
 ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-rdma_add_block(opaque, host_addr, block_offset, length);
+return rdma_add_block(opaque, host_addr, block_offset, length);
 }

 /*
-- 
2.4.3
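
The iteration contract introduced here (stop at the first non-zero return and
hand that value back to the caller) can be sketched without any QEMU types.
DemoBlock, DemoIterFunc, demo_foreach_block and demo_sum_lengths are all
hypothetical names used only for this illustration; locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a RAM block: just a name and a length. */
typedef struct {
    const char *name;
    size_t length;
} DemoBlock;

/* Callback type: return 0 to continue, non-zero to stop the walk. */
typedef int (DemoIterFunc)(const char *name, size_t length, void *opaque);

/* Call func on each block; propagate the first non-zero return value. */
static int demo_foreach_block(const DemoBlock *blocks, size_t n,
                              DemoIterFunc func, void *opaque)
{
    int ret = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        ret = func(blocks[i].name, blocks[i].length, opaque);
        if (ret) {
            break;
        }
    }
    return ret;
}

/* Example callback: sum the lengths, but reject zero-length blocks. */
static int demo_sum_lengths(const char *name, size_t length, void *opaque)
{
    size_t *total = opaque;

    (void)name;             /* the name would only be used for diagnostics */
    if (length == 0) {
        return -1;
    }
    *total += length;
    return 0;
}
```

Blocks after the failing one are never visited, so a caller that sees a
non-zero result should treat any accumulated state as partial.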

[Qemu-devel] [PULL 15/21] Move loadvm_handlers into MigrationIncomingState

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

In postcopy we need the loadvm_handlers to be used in a couple
of different instances of the loadvm loop/routine, and thus
it can't be local any more.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: David Gibson 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 include/migration/migration.h |  5 +
 include/migration/vmstate.h   |  2 ++
 include/qemu/typedefs.h   |  1 +
 migration/migration.c |  2 ++
 migration/savevm.c| 28 
 5 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1323e3d..720a949 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -42,9 +42,14 @@ struct MigrationParams {

 typedef struct MigrationState MigrationState;

+typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
 QEMUFile *file;
+
+/* See savevm.c */
+LoadStateEntry_Head loadvm_handlers;
 };

 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index fc5e643..7153b1e 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -808,6 +808,8 @@ extern const VMStateInfo vmstate_info_bitmap;

 #define SELF_ANNOUNCE_ROUNDS 5

+void loadvm_free_handlers(MigrationIncomingState *mis);
+
 int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
void *opaque, int version_id);
 void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 74dfad3..6fdcbcd 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -31,6 +31,7 @@ typedef struct I2CBus I2CBus;
 typedef struct I2SCodec I2SCodec;
 typedef struct ISABus ISABus;
 typedef struct ISADevice ISADevice;
+typedef struct LoadStateEntry LoadStateEntry;
 typedef struct MACAddr MACAddr;
 typedef struct MachineClass MachineClass;
 typedef struct MachineState MachineState;
diff --git a/migration/migration.c b/migration/migration.c
index 66c0b57..b04b457 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -84,12 +84,14 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
 {
 mis_current = g_malloc0(sizeof(MigrationIncomingState));
 mis_current->file = f;
+QLIST_INIT(&mis_current->loadvm_handlers);

 return mis_current;
 }

 void migration_incoming_state_destroy(void)
 {
+loadvm_free_handlers(mis_current);
 g_free(mis_current);
 mis_current = NULL;
 }
diff --git a/migration/savevm.c b/migration/savevm.c
index d0991e8..5324c4c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -936,18 +936,26 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
 return NULL;
 }

-typedef struct LoadStateEntry {
+struct LoadStateEntry {
 QLIST_ENTRY(LoadStateEntry) entry;
 SaveStateEntry *se;
 int section_id;
 int version_id;
-} LoadStateEntry;
+};

-int qemu_loadvm_state(QEMUFile *f)
+void loadvm_free_handlers(MigrationIncomingState *mis)
 {
-QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
-QLIST_HEAD_INITIALIZER(loadvm_handlers);
 LoadStateEntry *le, *new_le;
+
+QLIST_FOREACH_SAFE(le, &mis->loadvm_handlers, entry, new_le) {
+QLIST_REMOVE(le, entry);
+g_free(le);
+}
+}
+
+int qemu_loadvm_state(QEMUFile *f)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
 Error *local_err = NULL;
 uint8_t section_type;
 unsigned int v;
@@ -978,6 +986,7 @@ int qemu_loadvm_state(QEMUFile *f)
 while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
 uint32_t instance_id, version_id, section_id;
 SaveStateEntry *se;
+LoadStateEntry *le;
 char idstr[256];

 trace_qemu_loadvm_state_section(section_type);
@@ -1019,7 +1028,7 @@ int qemu_loadvm_state(QEMUFile *f)
 le->se = se;
 le->section_id = section_id;
 le->version_id = version_id;
-QLIST_INSERT_HEAD(&loadvm_handlers, le, entry);
+QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);

 ret = vmstate_load(f, le->se, le->version_id);
 if (ret < 0) {
@@ -1033,7 +1042,7 @@ int qemu_loadvm_state(QEMUFile *f)
 section_id = qemu_get_be32(f);

 trace_qemu_loadvm_state_section_partend(section_id);
-QLIST_FOREACH(le, &loadvm_handlers, entry) {
+QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
 if (le->section_id == section_id) {
 break;
 }
@@ -1081,11 +1090,6 @@ int qemu_loadvm_state(QEMUFile *f)
 ret = 0;

 out:
-QLIST_FOREACH_SAFE(le, &loadvm_handlers, entry, new_le) {
-QLIST_REMOVE(le, entry);
-g_free(le);
-}
-
 if (ret == 0) {
 /

[Qemu-devel] [PULL 21/21] Remove unneeded memset

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Michael R. Hines 
Signed-off-by: Juan Quintela 
---
 migration/rdma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 6c1e73f..48b3e64 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2452,7 +2452,6 @@ static void *qemu_rdma_data_init(const char *host_port, Error **errp)

 if (host_port) {
 rdma = g_malloc0(sizeof(RDMAContext));
-memset(rdma, 0, sizeof(RDMAContext));
 rdma->current_index = -1;
 rdma->current_chunk = -1;

-- 
2.4.3

[Qemu-devel] [PULL 14/21] Move copy out of qemu_peek_buffer

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

qemu_peek_buffer currently copies the data it reads into a buffer,
however a future patch wants access to the buffer without the copy,
hence rework to remove the copy to the layer above.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Amit Shah 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 include/migration/qemu-file.h |  2 +-
 migration/qemu-file.c | 12 +++-
 migration/vmstate.c   |  5 +++--
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 318aa1e..4f67d79 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -157,7 +157,7 @@ static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 void qemu_put_be16(QEMUFile *f, unsigned int v);
 void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
-int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
+int qemu_peek_buffer(QEMUFile *f, uint8_t **buf, int size, size_t offset);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
 ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
   int level);
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 0ef543a..965a757 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -349,14 +349,14 @@ void qemu_file_skip(QEMUFile *f, int size)
 }

 /*
- * Read 'size' bytes from file (at 'offset') into buf without moving the
- * pointer.
+ * Read 'size' bytes from file (at 'offset') without moving the
+ * pointer and set 'buf' to point to that data.
  *
  * It will return size bytes unless there was an error, in which case it will
  * return as many as it managed to read (assuming blocking fd's which
  * all current QEMUFile are)
  */
-int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
+int qemu_peek_buffer(QEMUFile *f, uint8_t **buf, int size, size_t offset)
 {
 int pending;
 int index;
@@ -392,7 +392,7 @@ int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 size = pending;
 }

-memcpy(buf, f->buf + index, size);
+*buf = f->buf + index;
 return size;
 }

@@ -411,11 +411,13 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)

 while (pending > 0) {
 int res;
+uint8_t *src;

-res = qemu_peek_buffer(f, buf, MIN(pending, IO_BUF_SIZE), 0);
+res = qemu_peek_buffer(f, &src, MIN(pending, IO_BUF_SIZE), 0);
 if (res == 0) {
 return done;
 }
+memcpy(buf, src, res);
 qemu_file_skip(f, res);
 buf += res;
 pending -= res;
diff --git a/migration/vmstate.c b/migration/vmstate.c
index 108995e..6138d1a 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -358,7 +358,7 @@ static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd,
 trace_vmstate_subsection_load(vmsd->name);

 while (qemu_peek_byte(f, 0) == QEMU_VM_SUBSECTION) {
-char idstr[256];
+char idstr[256], *idstr_ret;
 int ret;
 uint8_t version_id, len, size;
 const VMStateDescription *sub_vmsd;
@@ -369,11 +369,12 @@ static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd,
 trace_vmstate_subsection_load_bad(vmsd->name, "(short)");
 return 0;
 }
-size = qemu_peek_buffer(f, (uint8_t *)idstr, len, 2);
+size = qemu_peek_buffer(f, (uint8_t **)&idstr_ret, len, 2);
 if (size != len) {
 trace_vmstate_subsection_load_bad(vmsd->name, "(peek fail)");
 return 0;
 }
+memcpy(idstr, idstr_ret, size);
 idstr[size] = 0;

 if (strncmp(vmsd->name, idstr, strlen(vmsd->name)) != 0) {
-- 
2.4.3
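
The reworked interface (peek hands back a pointer into the internal buffer,
and the copy moves up into the caller) can be sketched over a plain in-memory
reader. DemoReader, demo_peek and demo_get are hypothetical names; unlike
qemu_peek_buffer(), this sketch never refills the buffer, it only shows who
owns the memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader over a fixed in-memory buffer. */
typedef struct {
    const uint8_t *data;
    size_t size;
    size_t pos;             /* current read position */
} DemoReader;

/*
 * Point *out at up to 'size' bytes located 'offset' bytes past the current
 * position, without consuming anything. Returns the number of bytes that
 * are actually available there (possibly fewer than requested, or 0).
 */
static size_t demo_peek(DemoReader *r, const uint8_t **out,
                        size_t size, size_t offset)
{
    size_t start = r->pos + offset;
    size_t avail;

    if (start >= r->size) {
        *out = NULL;
        return 0;
    }
    avail = r->size - start;
    if (size > avail) {
        size = avail;
    }
    *out = r->data + start;
    return size;
}

/* Consuming read built on top of peek: the copy now lives here. */
static size_t demo_get(DemoReader *r, uint8_t *buf, size_t size)
{
    const uint8_t *src;
    size_t got = demo_peek(r, &src, size, 0);

    if (got) {
        memcpy(buf, src, got);
    }
    r->pos += got;
    return got;
}
```

The pointer returned by the peek is only meaningful while the underlying
buffer is unchanged, so a caller that needs the data later still has to copy
it, as vmstate_subsection_load() does with idstr in the patch.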

[Qemu-devel] [PULL 11/21] Split header writing out of qemu_savevm_state_begin

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

Split qemu_savevm_state_begin to:
  qemu_savevm_state_header   That writes the initial file header.
  qemu_savevm_state_begin    That sets up devices and does the first
                             device pass.

Used later in postcopy.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Amit Shah 
Reviewed-by: David Gibson 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 include/sysemu/sysemu.h |  1 +
 migration/migration.c   |  1 +
 migration/savevm.c  | 11 ---
 trace-events|  1 +
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 853d90a..ef793f7 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -84,6 +84,7 @@ void qemu_announce_self(void);
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
  const MigrationParams *params);
+void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
diff --git a/migration/migration.c b/migration/migration.c
index 5d77046..438bf91 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -738,6 +738,7 @@ static void *migration_thread(void *opaque)
 int64_t start_time = initial_time;
 bool old_vm_running = false;

+qemu_savevm_state_header(s->file);
 qemu_savevm_state_begin(s->file, &s->params);

 s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
diff --git a/migration/savevm.c b/migration/savevm.c
index 2b0aa65..903dbeb 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -625,6 +625,13 @@ bool qemu_savevm_state_blocked(Error **errp)
 return false;
 }

+void qemu_savevm_state_header(QEMUFile *f)
+{
+trace_savevm_state_header();
+qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
+qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+}
+
 void qemu_savevm_state_begin(QEMUFile *f,
  const MigrationParams *params)
 {
@@ -639,9 +646,6 @@ void qemu_savevm_state_begin(QEMUFile *f,
 se->ops->set_params(params, se->opaque);
 }

-qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-qemu_put_be32(f, QEMU_VM_FILE_VERSION);
-
 QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
 int len;

@@ -851,6 +855,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
 }

 qemu_mutex_unlock_iothread();
+qemu_savevm_state_header(f);
 qemu_savevm_state_begin(f, &params);
 qemu_mutex_lock_iothread();

diff --git a/trace-events b/trace-events
index b64e125..1abca7a 100644
--- a/trace-events
+++ b/trace-events
@@ -1186,6 +1186,7 @@ qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint
 savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
 savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
 savevm_state_begin(void) ""
+savevm_state_header(void) ""
 savevm_state_iterate(void) ""
 savevm_state_complete(void) ""
 savevm_state_cancel(void) ""
-- 
2.4.3

[Qemu-devel] [PULL 17/21] Disable section footers on older machine types

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

The next patch adds section footers, but we don't want to
break migration compatibility, so disable them on older
machine types.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 hw/i386/pc_piix.c | 2 ++
 hw/i386/pc_q35.c  | 2 ++
 include/migration/migration.h | 2 +-
 migration/savevm.c| 7 +++
 4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 5253e6d..e142f75 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -52,6 +52,7 @@
 #ifdef CONFIG_XEN
 #  include 
 #endif
+#include "migration/migration.h"

 #define MAX_IDE_BUS 2

@@ -305,6 +306,7 @@ static void pc_init1(MachineState *machine)

 static void pc_compat_2_3(MachineState *machine)
 {
+savevm_skip_section_footers();
 }

 static void pc_compat_2_2(MachineState *machine)
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 110dfb7..b68263d 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -45,6 +45,7 @@
 #include "hw/usb.h"
 #include "hw/cpu/icc_bus.h"
 #include "qemu/error-report.h"
+#include "migration/migration.h"

 /* ICH9 AHCI has 6 ports */
 #define MAX_SATA_PORTS 6
@@ -289,6 +290,7 @@ static void pc_q35_init(MachineState *machine)

 static void pc_compat_2_3(MachineState *machine)
 {
+savevm_skip_section_footers();
 }

 static void pc_compat_2_2(MachineState *machine)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 720a949..7bdaf55 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -194,6 +194,6 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
  ram_addr_t offset, size_t size,
  uint64_t *bytes_sent);

-
 void ram_mig_init(void);
+void savevm_skip_section_footers(void);
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 2942ed6..80c4389 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -55,6 +55,8 @@
 #define ARP_PTYPE_IP 0x0800
 #define ARP_OP_REQUEST_REV 0x3

+static bool skip_section_footers;
+
 static int announce_self_create(uint8_t *buf,
 uint8_t *mac_addr)
 {
@@ -611,6 +613,11 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se, QJSON *vmdesc)
 vmstate_save_state(f, se->vmsd, se->opaque, vmdesc);
 }

+void savevm_skip_section_footers(void)
+{
+skip_section_footers = true;
+}
+
 /*
  * Write the header for device section (QEMU_VM_SECTION START/END/PART/FULL)
  */
-- 
2.4.3

[Qemu-devel] [PULL 13/21] Create MigrationIncomingState

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

There are currently lots of pieces of incoming migration state scattered
around, and postcopy is adding more, and it seems better to try and keep
it together.

Allocate the MigrationIncomingState (MIS) in process_incoming_migration_co.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Amit Shah 
Reviewed-by: David Gibson 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 include/migration/migration.h |  9 +
 include/qemu/typedefs.h   |  1 +
 migration/migration.c | 28 
 migration/savevm.c|  2 ++
 4 files changed, 40 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index b78a3b9..1323e3d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -42,6 +42,15 @@ struct MigrationParams {

 typedef struct MigrationState MigrationState;

+/* State for the incoming migration */
+struct MigrationIncomingState {
+QEMUFile *file;
+};
+
+MigrationIncomingState *migration_incoming_get_current(void);
+MigrationIncomingState *migration_incoming_state_new(QEMUFile *f);
+void migration_incoming_state_destroy(void);
+
 struct MigrationState
 {
 int64_t bandwidth_limit;
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index cde3314..74dfad3 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -38,6 +38,7 @@ typedef struct MemoryListener MemoryListener;
 typedef struct MemoryMappingList MemoryMappingList;
 typedef struct MemoryRegion MemoryRegion;
 typedef struct MemoryRegionSection MemoryRegionSection;
+typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
 typedef struct Monitor Monitor;
 typedef struct MouseTransformInfo MouseTransformInfo;
diff --git a/migration/migration.c b/migration/migration.c
index 438bf91..66c0b57 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -53,6 +53,7 @@ static bool deferred_incoming;
migrations at once.  For now we don't need to add
dynamic creation of migration */

+/* For outgoing */
 MigrationState *migrate_get_current(void)
 {
 static MigrationState current_migration = {
@@ -71,6 +72,28 @@ MigrationState *migrate_get_current(void)
 return &current_migration;
 }

+/* For incoming */
+static MigrationIncomingState *mis_current;
+
+MigrationIncomingState *migration_incoming_get_current(void)
+{
+return mis_current;
+}
+
+MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
+{
+mis_current = g_malloc0(sizeof(MigrationIncomingState));
+mis_current->file = f;
+
+return mis_current;
+}
+
+void migration_incoming_state_destroy(void)
+{
+g_free(mis_current);
+mis_current = NULL;
+}
+
 /*
  * Called on -incoming with a defer: uri.
  * The migration can be started later after any parameters have been
@@ -115,9 +138,14 @@ static void process_incoming_migration_co(void *opaque)
 Error *local_err = NULL;
 int ret;

+migration_incoming_state_new(f);
+
 ret = qemu_loadvm_state(f);
+
 qemu_fclose(f);
 free_xbzrle_decoded_buf();
+migration_incoming_state_destroy();
+
 if (ret < 0) {
 error_report("load of migration failed: %s", strerror(-ret));
 migrate_decompress_threads_join();
diff --git a/migration/savevm.c b/migration/savevm.c
index 903dbeb..d0991e8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1329,9 +1329,11 @@ int load_vmstate(const char *name)
 }

 qemu_system_reset(VMRESET_SILENT);
+migration_incoming_state_new(f);
 ret = qemu_loadvm_state(f);

 qemu_fclose(f);
+migration_incoming_state_destroy();
 if (ret < 0) {
 error_report("Error %d while loading VM state", ret);
 return ret;
-- 
2.4.3
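
The lifecycle this patch sets up (create the incoming state before loading,
look it up through an accessor, destroy it afterwards) is a single-instance
pattern. A minimal standalone sketch, assuming hypothetical names
(DemoIncomingState, demo_incoming_*) and plain calloc/free in place of
g_malloc0/g_free:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the incoming-migration state. */
typedef struct {
    void *file;             /* would be a QEMUFile * in the real code */
} DemoIncomingState;

/* The one live instance, or NULL when no incoming migration is active. */
static DemoIncomingState *demo_current;

static DemoIncomingState *demo_incoming_get_current(void)
{
    return demo_current;
}

/* Allocate zeroed state and make it current. Returns NULL on failure. */
static DemoIncomingState *demo_incoming_state_new(void *file)
{
    /* Assumption of this sketch: callers never nest two incoming loads. */
    assert(demo_current == NULL);

    demo_current = calloc(1, sizeof(*demo_current));
    if (demo_current) {
        demo_current->file = file;
    }
    return demo_current;
}

/* Release the state and clear the pointer so later lookups see NULL. */
static void demo_incoming_state_destroy(void)
{
    free(demo_current);
    demo_current = NULL;
}
```

Clearing the pointer on destroy matters because code reached outside a load
can then detect that no incoming state exists instead of touching freed
memory.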

[Qemu-devel] [PULL 20/21] Rename RDMA structures to make destination clear

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

RDMA has two data types that are named confusingly;
   RDMALocalBlock (pointed to indirectly by local_ram_blocks)
   RDMARemoteBlock (pointed to by block in RDMAContext)

RDMALocalBlocks, as the name suggests, is a data structure that
represents the RDMAable RAM Blocks on the current side of the migration,
whichever that is.

RDMARemoteBlock always describes the shape of the RAMBlocks on the
destination, even when the code is running on the destination.

Rename:
 RDMARemoteBlock -> RDMADestBlock
 context->'block' -> context->dest_blocks

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Michael R. Hines 
Signed-off-by: Juan Quintela 
---
 migration/rdma.c | 66 
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 791ef44..6c1e73f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -236,13 +236,13 @@ typedef struct RDMALocalBlock {
  * corresponding RDMALocalBlock with
  * the information needed to perform the actual RDMA.
  */
-typedef struct QEMU_PACKED RDMARemoteBlock {
+typedef struct QEMU_PACKED RDMADestBlock {
 uint64_t remote_host_addr;
 uint64_t offset;
 uint64_t length;
 uint32_t remote_rkey;
 uint32_t padding;
-} RDMARemoteBlock;
+} RDMADestBlock;

 static uint64_t htonll(uint64_t v)
 {
@@ -258,20 +258,20 @@ static uint64_t ntohll(uint64_t v) {
 return ((uint64_t)ntohl(u.lv[0]) << 32) | (uint64_t) ntohl(u.lv[1]);
 }

-static void remote_block_to_network(RDMARemoteBlock *rb)
+static void dest_block_to_network(RDMADestBlock *db)
 {
-rb->remote_host_addr = htonll(rb->remote_host_addr);
-rb->offset = htonll(rb->offset);
-rb->length = htonll(rb->length);
-rb->remote_rkey = htonl(rb->remote_rkey);
+db->remote_host_addr = htonll(db->remote_host_addr);
+db->offset = htonll(db->offset);
+db->length = htonll(db->length);
+db->remote_rkey = htonl(db->remote_rkey);
 }

-static void network_to_remote_block(RDMARemoteBlock *rb)
+static void network_to_dest_block(RDMADestBlock *db)
 {
-rb->remote_host_addr = ntohll(rb->remote_host_addr);
-rb->offset = ntohll(rb->offset);
-rb->length = ntohll(rb->length);
-rb->remote_rkey = ntohl(rb->remote_rkey);
+db->remote_host_addr = ntohll(db->remote_host_addr);
+db->offset = ntohll(db->offset);
+db->length = ntohll(db->length);
+db->remote_rkey = ntohl(db->remote_rkey);
 }

 /*
@@ -350,7 +350,7 @@ typedef struct RDMAContext {
  * Description of ram blocks used throughout the code.
  */
 RDMALocalBlocks local_ram_blocks;
-RDMARemoteBlock *block;
+RDMADestBlock  *dest_blocks;

 /*
  * Migration on *destination* started.
@@ -590,7 +590,7 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
 memset(local, 0, sizeof *local);
 qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
 trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
-rdma->block = (RDMARemoteBlock *) g_malloc0(sizeof(RDMARemoteBlock) *
+rdma->dest_blocks = (RDMADestBlock *) g_malloc0(sizeof(RDMADestBlock) *
 rdma->local_ram_blocks.nb_blocks);
 local->init = true;
 return 0;
@@ -2184,8 +2184,8 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
 rdma->connected = false;
 }

-g_free(rdma->block);
-rdma->block = NULL;
+g_free(rdma->dest_blocks);
+rdma->dest_blocks = NULL;

 for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
 if (rdma->wr_data[idx].control_mr) {
@@ -2974,25 +2974,25 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque,
  * their "local" descriptions with what was sent.
  */
 for (i = 0; i < local->nb_blocks; i++) {
-rdma->block[i].remote_host_addr =
+rdma->dest_blocks[i].remote_host_addr =
 (uintptr_t)(local->block[i].local_host_addr);

 if (rdma->pin_all) {
-rdma->block[i].remote_rkey = local->block[i].mr->rkey;
+rdma->dest_blocks[i].remote_rkey = local->block[i].mr->rkey;
 }

-rdma->block[i].offset = local->block[i].offset;
-rdma->block[i].length = local->block[i].length;
+rdma->dest_blocks[i].offset = local->block[i].offset;
+rdma->dest_blocks[i].length = local->block[i].length;

-remote_block_to_network(&rdma->block[i]);
+dest_block_to_network(&rdma->dest_blocks[i]);
 }

 blocks.len = rdma->local_ram_blocks.nb_blocks
-* sizeof(RDMARemoteBlock);
+* sizeof(RDMADestBlock);


 ret = qemu_rdma_post_send_control(rdma,
-(uint8_t *) rdma->block, &blocks);
+(uint8_t *) rdma->dest_blocks, &blocks);
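
As a side note on dest_block_to_network(): the fields are converted to network (big-endian) order before the block descriptions go on the wire. The patch does this with htonll()/ntohll() built on htonl(). The sketch below shows the same wire format using plain byte shifts instead, which is a different technique and not what rdma.c does; put_be64() and get_be64() are hypothetical names.

```c
#include <stdint.h>

/* Store v into out[] most significant byte first (network order). */
void put_be64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Rebuild a host-order value from 8 big-endian bytes. */
uint64_t get_be64(const uint8_t in[8])
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | in[i];
    }
    return v;
}
```

Because the shifts operate on values rather than memory layout, this gives the same result on little-endian and big-endian hosts.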

[Qemu-devel] [PULL 10/21] Add qemu_get_counted_string to read a string prefixed by a count byte

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

and use it in loadvm_state and ram_load.

Wherever it's used, check the return value and error out if it failed.

Minor: ram_load was using a 257 byte array for its string, the
   maximum length is 255 bytes + 0 terminator, so fix to 256

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Amit Shah 
Reviewed-by: David Gibson 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 include/migration/qemu-file.h |  3 +++
 migration/qemu-file.c | 17 +
 migration/savevm.c| 11 ++-
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index a01c5b8..318aa1e 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -312,4 +312,7 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
 {
 qemu_get_be64s(f, (uint64_t *)pv);
 }
+
+size_t qemu_get_counted_string(QEMUFile *f, char buf[256]);
+
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2750365..0ef543a 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -585,3 +585,20 @@ int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src)
 }
 return len;
 }
+
+/*
+ * Get a string whose length is determined by a single preceding byte
+ * A preallocated 256 byte buffer must be passed in.
+ * Returns: len on success and a 0 terminated string in the buffer
+ *  else 0
+ *  (Note a 0 length string will return 0 either way)
+ */
+size_t qemu_get_counted_string(QEMUFile *f, char buf[256])
+{
+size_t len = qemu_get_byte(f);
+size_t res = qemu_get_buffer(f, (uint8_t *)buf, len);
+
+buf[res] = 0;
+
+return res == len ? res : 0;
+}
diff --git a/migration/savevm.c b/migration/savevm.c
index 002f9b8..2b0aa65 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -973,8 +973,7 @@ int qemu_loadvm_state(QEMUFile *f)
 while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
 uint32_t instance_id, version_id, section_id;
 SaveStateEntry *se;
-char idstr[257];
-int len;
+char idstr[256];

 trace_qemu_loadvm_state_section(section_type);
 switch (section_type) {
@@ -982,9 +981,11 @@ int qemu_loadvm_state(QEMUFile *f)
 case QEMU_VM_SECTION_FULL:
 /* Read section start */
 section_id = qemu_get_be32(f);
-len = qemu_get_byte(f);
-qemu_get_buffer(f, (uint8_t *)idstr, len);
-idstr[len] = 0;
+if (!qemu_get_counted_string(f, idstr)) {
+error_report("Unable to read ID string for section %u",
+section_id);
+return -EINVAL;
+}
 instance_id = qemu_get_be32(f);
 version_id = qemu_get_be32(f);

-- 
2.4.3
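
To make the contract of qemu_get_counted_string() easier to see in isolation, here is a rough standalone sketch that reads from a memory buffer rather than a QEMUFile. ByteStream, stream_read() and get_counted_string() are hypothetical names invented for this illustration; only the return convention (length on success, 0 on a short read, buffer always terminated) is taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stream standing in for QEMUFile. */
typedef struct ByteStream {
    const uint8_t *data;
    size_t len;
    size_t pos;
} ByteStream;

/* Copy up to n bytes out of the stream; returns how many were copied. */
static size_t stream_read(ByteStream *s, uint8_t *out, size_t n)
{
    size_t avail = s->len - s->pos;

    if (n > avail) {
        n = avail;
    }
    memcpy(out, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/*
 * Read one count byte followed by that many characters.
 * A count byte can describe at most 255 characters, so a 256 byte
 * buffer always has room for the 0 terminator.
 */
size_t get_counted_string(ByteStream *s, char buf[256])
{
    uint8_t count;
    size_t res;

    if (stream_read(s, &count, 1) != 1) {
        buf[0] = 0;
        return 0;
    }
    res = stream_read(s, (uint8_t *)buf, count);
    buf[res] = 0;

    return res == count ? res : 0;
}
```

As in the patch, a zero-length string and a failed read both return 0, so callers that need to tell them apart would have to check the stream state separately.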

[Qemu-devel] [PULL 01/21] migration: move ram stuff to migration/ram

2015-06-11 Thread Juan Quintela

For historical reasons, RAM migration has lived in arch_init.c.  Just
split it out into migration/ram.c, the same as was done with block.c.

This is pure code movement, with no functional changes.

Signed-off-by: Juan Quintela 
Reviewed-by: Eric Blake 
---
 MAINTAINERS   |1 -
 Makefile.target   |1 +
 arch_init.c   | 1588 ---
 include/migration/migration.h |2 +
 include/sysemu/arch_init.h|1 -
 migration/ram.c   | 1639 +
 trace-events  |2 +-
 7 files changed, 1643 insertions(+), 1591 deletions(-)
 create mode 100644 migration/ram.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4ed8215..b183395 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1015,7 +1015,6 @@ S: Maintained
 F: include/migration/
 F: migration/
 F: savevm.c
-F: arch_init.c
 F: scripts/vmstate-static-checker.py
 F: tests/vmstate-static-checker-data/

diff --git a/Makefile.target b/Makefile.target
index ec5b92c..27209a7 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -135,6 +135,7 @@ obj-$(CONFIG_KVM) += kvm-all.o
 obj-y += memory.o savevm.o cputlb.o
 obj-y += memory_mapping.o
 obj-y += dump.o
+obj-y += migration/ram.o
 LIBS := $(libs_softmmu) $(LIBS)

 # xen support
diff --git a/arch_init.c b/arch_init.c
index d294474..63c44d3 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -55,14 +55,6 @@
 #include "qemu/host-utils.h"
 #include "qemu/rcu_queue.h"

-#ifdef DEBUG_ARCH_INIT
-#define DPRINTF(fmt, ...) \
-do { fprintf(stdout, "arch_init: " fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-do { } while (0)
-#endif
-
 #ifdef TARGET_SPARC
 int graphic_width = 1024;
 int graphic_height = 768;
@@ -111,24 +103,6 @@ int graphic_depth = 32;
 #endif

 const uint32_t arch_type = QEMU_ARCH;
-static bool mig_throttle_on;
-static int dirty_rate_high_cnt;
-static void check_guest_throttling(void);
-
-static uint64_t bitmap_sync_count;
-
-/***/
-/* ram save/restore */
-
-#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
-#define RAM_SAVE_FLAG_COMPRESS 0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE 0x08
-#define RAM_SAVE_FLAG_EOS  0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-#define RAM_SAVE_FLAG_XBZRLE   0x40
-/* 0x80 is reserved in migration.h start with 0x100 next */
-#define RAM_SAVE_FLAG_COMPRESS_PAGE0x100

 static struct defconfig_file {
 const char *filename;
@@ -139,8 +113,6 @@ static struct defconfig_file {
 { NULL }, /* end of list */
 };

-static const uint8_t ZERO_TARGET_PAGE[TARGET_PAGE_SIZE];
-
 int qemu_read_default_config_files(bool userconfig)
 {
 int ret;
@@ -159,1517 +131,6 @@ int qemu_read_default_config_files(bool userconfig)
 return 0;
 }

-static inline bool is_zero_range(uint8_t *p, uint64_t size)
-{
-return buffer_find_nonzero_offset(p, size) == size;
-}
-
-/* struct contains XBZRLE cache and a static page
-   used by the compression */
-static struct {
-/* buffer used for XBZRLE encoding */
-uint8_t *encoded_buf;
-/* buffer for storing page content */
-uint8_t *current_buf;
-/* Cache for XBZRLE, Protected by lock. */
-PageCache *cache;
-QemuMutex lock;
-} XBZRLE;
-
-/* buffer used for XBZRLE decoding */
-static uint8_t *xbzrle_decoded_buf;
-
-static void XBZRLE_cache_lock(void)
-{
-if (migrate_use_xbzrle())
-qemu_mutex_lock(&XBZRLE.lock);
-}
-
-static void XBZRLE_cache_unlock(void)
-{
-if (migrate_use_xbzrle())
-qemu_mutex_unlock(&XBZRLE.lock);
-}
-
-/*
- * called from qmp_migrate_set_cache_size in main thread, possibly while
- * a migration is in progress.
- * A running migration maybe using the cache and might finish during this
- * call, hence changes to the cache are protected by XBZRLE.lock().
- */
-int64_t xbzrle_cache_resize(int64_t new_size)
-{
-PageCache *new_cache;
-int64_t ret;
-
-if (new_size < TARGET_PAGE_SIZE) {
-return -1;
-}
-
-XBZRLE_cache_lock();
-
-if (XBZRLE.cache != NULL) {
-if (pow2floor(new_size) == migrate_xbzrle_cache_size()) {
-goto out_new_size;
-}
-new_cache = cache_init(new_size / TARGET_PAGE_SIZE,
-TARGET_PAGE_SIZE);
-if (!new_cache) {
-error_report("Error creating cache");
-ret = -1;
-goto out;
-}
-
-cache_fini(XBZRLE.cache);
-XBZRLE.cache = new_cache;
-}
-
-out_new_size:
-ret = pow2floor(new_size);
-out:
-XBZRLE_cache_unlock();
-return ret;
-}
-
-/* accounting for migration statistics */
-typedef struct AccountingInfo {
-uint64_t dup_pages;
-uint64_t skipped_pages;
-uint64_t norm_pages;
-uint64_t iterations;
-uint64_t xbzrle_bytes;
-uint64_t xbzrle_pages;
-uint64_t xbzrle_cache_miss;
-double xbzrle_cache_miss_rate;
-

[Qemu-devel] [PULL 09/21] migration: Use normal VMStateDescriptions for Subsections

2015-06-11 Thread Juan Quintela

We create optional sections with this patch.  But we already have
optional subsections.  Instead of having two mechanisms that do the
same thing, we can just generalize it.

For subsections we just change:

- Add a needed function to VMStateDescription
- Remove VMStateSubsection (after removal of the needed function
  it is just a VMStateDescription)
- Adjust the whole tree, moving the needed function to the corresponding
  VMStateDescription

Signed-off-by: Juan Quintela 
---
 cpus.c  | 11 +++---
 docs/migration.txt  | 11 +++---
 exec.c  | 11 +++---
 hw/acpi/ich9.c  | 10 +++---
 hw/acpi/piix4.c | 10 +++---
 hw/block/fdc.c  | 42 +--
 hw/char/serial.c| 41 +--
 hw/display/qxl.c| 11 +++---
 hw/display/vga.c| 11 +++---
 hw/ide/core.c   | 32 +++---
 hw/ide/pci.c| 16 -
 hw/input/pckbd.c| 22 ++--
 hw/input/ps2.c  | 11 +++---
 hw/intc/apic_common.c   | 10 +++---
 hw/isa/lpc_ich9.c   | 10 +++---
 hw/net/e1000.c  | 11 +++---
 hw/net/rtl8139.c| 11 +++---
 hw/net/vmxnet3.c| 12 +++
 hw/pci-host/piix.c  | 10 +++---
 hw/scsi/scsi-bus.c  | 11 +++---
 hw/timer/hpet.c | 11 +++---
 hw/timer/mc146818rtc.c  | 23 ++---
 hw/usb/hcd-ohci.c   | 11 +++---
 hw/usb/redirect.c   | 34 +--
 hw/virtio/virtio.c  | 16 -
 include/migration/vmstate.h |  8 ++---
 migration/savevm.c  | 10 +++---
 migration/vmstate.c | 16 -
 target-arm/machine.c| 26 ++-
 target-i386/machine.c   | 81 ++---
 target-ppc/machine.c| 62 ++
 target-s390x/machine.c  | 30 -
 32 files changed, 253 insertions(+), 389 deletions(-)

diff --git a/cpus.c b/cpus.c
index f38b858..b85fb5f 100644
--- a/cpus.c
+++ b/cpus.c
@@ -480,6 +480,7 @@ static const VMStateDescription icount_vmstate_timers = {
 .name = "timer/icount",
 .version_id = 1,
 .minimum_version_id = 1,
+.needed = icount_state_needed,
 .fields = (VMStateField[]) {
 VMSTATE_INT64(qemu_icount_bias, TimersState),
 VMSTATE_INT64(qemu_icount, TimersState),
@@ -497,13 +498,9 @@ static const VMStateDescription vmstate_timers = {
 VMSTATE_INT64_V(cpu_clock_offset, TimersState, 2),
 VMSTATE_END_OF_LIST()
 },
-.subsections = (VMStateSubsection[]) {
-{
-.vmsd = &icount_vmstate_timers,
-.needed = icount_state_needed,
-}, {
-/* empty */
-}
+.subsections = (const VMStateDescription*[]) {
+&icount_vmstate_timers,
+NULL
 }
 };

diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..f6df4be 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -257,6 +257,7 @@ const VMStateDescription vmstate_ide_drive_pio_state = {
 .minimum_version_id = 1,
 .pre_save = ide_drive_pio_pre_save,
 .post_load = ide_drive_pio_post_load,
+.needed = ide_drive_pio_state_needed,
 .fields = (VMStateField[]) {
 VMSTATE_INT32(req_nb_sectors, IDEState),
 VMSTATE_VARRAY_INT32(io_buffer, IDEState, io_buffer_total_len, 1,
@@ -279,13 +280,9 @@ const VMStateDescription vmstate_ide_drive = {
  several fields 
 VMSTATE_END_OF_LIST()
 },
-.subsections = (VMStateSubsection []) {
-{
-.vmsd = &vmstate_ide_drive_pio_state,
-.needed = ide_drive_pio_state_needed,
-}, {
-/* empty */
-}
+.subsections = (const VMStateDescription*[]) {
+&vmstate_ide_drive_pio_state,
+NULL
 }
 };

diff --git a/exec.c b/exec.c
index 487583b..ba3f2cf 100644
--- a/exec.c
+++ b/exec.c
@@ -454,6 +454,7 @@ static const VMStateDescription vmstate_cpu_common_exception_index = {
 .name = "cpu_common/exception_index",
 .version_id = 1,
 .minimum_version_id = 1,
+.needed = cpu_common_exception_index_needed,
 .fields = (VMStateField[]) {
 VMSTATE_INT32(exception_index, CPUState),
 VMSTATE_END_OF_LIST()
@@ -471,13 +472,9 @@ const VMStateDescription vmstate_cpu_common = {
 VMSTATE_UINT32(interrupt_request, CPUState),
 VMSTATE_END_OF_LIST()
 },
-.subsections = (VMStateSubsection[]) {
-{
-.vmsd = &vmstate_cpu_common_exception_index,
-.needed = cpu_common_exception_index_needed,
-} , {
-/* empty */
-}
+.subsections = (const VMStateDescription*[]) {
+&vmstate_cpu_common_exception_index,
+NULL
 }
 };

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 25bc023..8a64ffb 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -152,6 +152,7 @@ static const VMSta
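
The shape of the conversion, a NULL-terminated array of description pointers where each description carries its own needed() callback, can be sketched in isolation as below. Desc is a hypothetical, heavily reduced stand-in for VMStateDescription, and treating a missing callback as "always needed" is an assumption of this sketch, not something the patch states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for VMStateDescription. */
typedef struct Desc Desc;
struct Desc {
    const char *name;
    bool (*needed)(void *opaque);       /* assumption: NULL means always needed */
    const Desc * const *subsections;    /* NULL-terminated array, or NULL */
};

/* Example callback: the subsection is needed when the flag behind opaque is set. */
bool flag_set(void *opaque)
{
    return *(bool *)opaque;
}

/* Count the subsections that would be written out for this object. */
size_t count_needed_subsections(const Desc *d, void *opaque)
{
    size_t n = 0;
    const Desc * const *sub = d->subsections;

    while (sub && *sub) {
        if (!(*sub)->needed || (*sub)->needed(opaque)) {
            n++;
        }
        sub++;
    }
    return n;
}
```

With the callback living in the description itself, the wrapper struct that paired a description with a callback is no longer needed, which is what lets VMStateSubsection go away.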

[Qemu-devel] [PULL 07/21] migration: Remove duplicated assignment of SETUP status

2015-06-11 Thread Juan Quintela

We assign the MIGRATION_STATUS_SETUP status in two places, one right
after the other.  Just remove the second one.

Signed-off-by: Juan Quintela 
Reviewed-by: Eric Blake 
---
 migration/migration.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 732d229..5d77046 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -838,9 +838,6 @@ static void *migration_thread(void *opaque)

 void migrate_fd_connect(MigrationState *s)
 {
-s->state = MIGRATION_STATUS_SETUP;
-trace_migrate_set_state(MIGRATION_STATUS_SETUP);
-
 /* This is a best 1st approximation. ns to ms */
 s->expected_downtime = max_downtime/100;
 s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
-- 
2.4.3

[Qemu-devel] [PULL 18/21] Add a protective section footer

2015-06-11 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

Badly formatted migration streams can go undetected or produce
misleading errors due to a lack of checking at the end of sections.
In particular a section that adds an extra 0x00 at the end
causes what looks like a normal end of stream and thus doesn't produce
any errors, and something that ends in 0x01..0x04 kind of looks
like a real section header and then fails when the section parser tries
to figure out which section it is.  This is made worse by the
choice of 0x00..0x04 being small numbers that are particularly common
in normal section data.

This patch adds a section footer consisting of a marker (0x7e - ~)
followed by the section-id that was also sent in the header.  If
they mismatch then it throws an error explaining which section was
being loaded.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 include/migration/migration.h |  1 +
 migration/savevm.c| 61 +++
 2 files changed, 62 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 7bdaf55..9387c8c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -34,6 +34,7 @@
 #define QEMU_VM_SECTION_FULL 0x04
 #define QEMU_VM_SUBSECTION   0x05
 #define QEMU_VM_VMDESCRIPTION0x06
+#define QEMU_VM_SECTION_FOOTER   0x7e

 struct MigrationParams {
 bool blk;
diff --git a/migration/savevm.c b/migration/savevm.c
index 80c4389..2091882 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -639,6 +639,53 @@ static void save_section_header(QEMUFile *f, SaveStateEntry *se,
 }
 }

+/*
+ * Write a footer onto device sections that catches cases misformatted device
+ * sections.
+ */
+static void save_section_footer(QEMUFile *f, SaveStateEntry *se)
+{
+if (!skip_section_footers) {
+qemu_put_byte(f, QEMU_VM_SECTION_FOOTER);
+qemu_put_be32(f, se->section_id);
+}
+}
+
+/*
+ * Read a footer off the wire and check that it matches the expected section
+ *
+ * Returns: true if the footer was good
+ *  false if there is a problem (and calls error_report to say why)
+ */
+static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
+{
+uint8_t read_mark;
+uint32_t read_section_id;
+
+if (skip_section_footers) {
+/* No footer to check */
+return true;
+}
+
+read_mark = qemu_get_byte(f);
+
+if (read_mark != QEMU_VM_SECTION_FOOTER) {
+error_report("Missing section footer for %s", se->idstr);
+return false;
+}
+
+read_section_id = qemu_get_be32(f);
+if (read_section_id != se->section_id) {
+error_report("Mismatched section id in footer for %s -"
+ " read 0x%x expected 0x%x",
+ se->idstr, read_section_id, se->section_id);
+return false;
+}
+
+/* All good */
+return true;
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
 SaveStateEntry *se;
@@ -686,6 +733,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
 save_section_header(f, se, QEMU_VM_SECTION_START);

 ret = se->ops->save_live_setup(f, se->opaque);
+save_section_footer(f, se);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 break;
@@ -723,6 +771,7 @@ int qemu_savevm_state_iterate(QEMUFile *f)

 ret = se->ops->save_live_iterate(f, se->opaque);
 trace_savevm_section_end(se->idstr, se->section_id, ret);
+save_section_footer(f, se);

 if (ret < 0) {
 qemu_file_set_error(f, ret);
@@ -770,6 +819,7 @@ void qemu_savevm_state_complete(QEMUFile *f)

 ret = se->ops->save_live_complete(f, se->opaque);
 trace_savevm_section_end(se->idstr, se->section_id, ret);
+save_section_footer(f, se);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 return;
@@ -796,6 +846,7 @@ void qemu_savevm_state_complete(QEMUFile *f)

 json_end_object(vmdesc);
 trace_savevm_section_end(se->idstr, se->section_id, 0);
+save_section_footer(f, se);
 }

 qemu_put_byte(f, QEMU_VM_EOF);
@@ -900,6 +951,8 @@ static int qemu_save_device_state(QEMUFile *f)
 save_section_header(f, se, QEMU_VM_SECTION_FULL);

 vmstate_save(f, se, NULL);
+
+save_section_footer(f, se);
 }

 qemu_put_byte(f, QEMU_VM_EOF);
@@ -1027,6 +1080,10 @@ int qemu_loadvm_state(QEMUFile *f)
  " device '%s'", instance_id, idstr);
 goto out;
 }
+if (!check_section_footer(f, le->se)) {
+ret = -EINVAL;
+goto out;
+}
 break;
 case QEMU_VM_SECTION_PART:
 case QEMU_VM_SECTION_END:
@@ -1050,6 +1107,10 @@ int qemu_loadvm_state(QEMUFile *f)
  section_id, le->se->idstr);
 goto out;
 }
+
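
For illustration, the footer format described in the commit message (a 0x7e marker byte followed by the big-endian section id) can be exercised on a plain byte buffer as sketched below. write_footer() and check_footer() are hypothetical helpers, not the QEMU functions; the explicit length check is an addition of this sketch, since a QEMUFile reader has no equivalent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_FOOTER_MARK 0x7e   /* same value as QEMU_VM_SECTION_FOOTER */
#define FOOTER_LEN 5               /* 1 marker byte + 4 byte section id */

/* Write marker + big-endian section id; returns the number of bytes written. */
size_t write_footer(uint8_t *out, uint32_t section_id)
{
    out[0] = SECTION_FOOTER_MARK;
    out[1] = (uint8_t)(section_id >> 24);
    out[2] = (uint8_t)(section_id >> 16);
    out[3] = (uint8_t)(section_id >> 8);
    out[4] = (uint8_t)section_id;
    return FOOTER_LEN;
}

/* True only if the marker is present and the id matches the expected one. */
bool check_footer(const uint8_t *in, size_t len, uint32_t expected_id)
{
    uint32_t id;

    if (len < FOOTER_LEN || in[0] != SECTION_FOOTER_MARK) {
        return false;
    }
    id = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
         ((uint32_t)in[3] << 8) | (uint32_t)in[4];
    return id == expected_id;
}
```

A stray 0x00 or 0x01..0x04 at the end of a section now fails the marker comparison instead of being taken for end of stream or a new section header.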

[Qemu-devel] [PULL 06/21] rdma: Fix qemu crash when IPv6 address is used for migration

2015-06-11 Thread Juan Quintela

From: Padmanabh Ratnakar 

Qemu crashes when an IPv6 address is specified for migration and access
to any RDMA uverbs device available on the system is blocked using cgroups.
Fix the crash by checking the return value of the ibv_open_device routine.

Signed-off-by: Meghana Cheripady 
Signed-off-by: Padmanabh Ratnakar 
Signed-off-by: Juan Quintela 
---
 migration/rdma.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 77e3444..3671903 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -790,6 +790,13 @@ static int qemu_rdma_broken_ipv6_kernel(Error **errp, struct ibv_context *verbs)

 for (x = 0; x < num_devices; x++) {
 verbs = ibv_open_device(dev_list[x]);
+if (!verbs) {
+if (errno == EPERM) {
+continue;
+} else {
+return -EINVAL;
+}
+}

 if (ibv_query_port(verbs, 1, &port_attr)) {
 ibv_close_device(verbs);
-- 
2.4.3

[Qemu-devel] [PULL 04/21] migration: reduce include files

2015-06-11 Thread Juan Quintela

To make the changes easier, I kept almost all include files when
copying the code.  Now I remove the unnecessary ones in this patch.  This
compiles on Linux x64 with all architectures configured, and cross-compiles
for Windows 32 and 64 bits.

Signed-off-by: Juan Quintela 
Reviewed-by: Eric Blake 
---
 arch_init.c | 23 ---
 migration/ram.c | 18 ++
 2 files changed, 2 insertions(+), 39 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 63c44d3..725c638 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -22,38 +22,15 @@
  * THE SOFTWARE.
  */
 #include 
-#include 
-#include 
-#include 
-#ifndef _WIN32
-#include 
-#include 
-#endif
-#include "config.h"
-#include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
-#include "qemu/bitops.h"
-#include "qemu/bitmap.h"
 #include "sysemu/arch_init.h"
-#include "audio/audio.h"
-#include "hw/i386/pc.h"
 #include "hw/pci/pci.h"
 #include "hw/audio/audio.h"
-#include "sysemu/kvm.h"
-#include "migration/migration.h"
 #include "hw/i386/smbios.h"
-#include "exec/address-spaces.h"
-#include "hw/audio/pcspk.h"
-#include "migration/page_cache.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
 #include "qmp-commands.h"
-#include "trace.h"
-#include "exec/cpu-all.h"
-#include "exec/ram_addr.h"
 #include "hw/acpi/acpi.h"
-#include "qemu/host-utils.h"
-#include "qemu/rcu_queue.h"

 #ifdef TARGET_SPARC
 int graphic_width = 1024;
diff --git a/migration/ram.c b/migration/ram.c
index 9db72a4..3945328 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -26,31 +26,17 @@
  * THE SOFTWARE.
  */
 #include 
-#include 
-#include 
 #include 
-#ifndef _WIN32
-#include 
-#include 
-#endif
-#include "config.h"
-#include "monitor/monitor.h"
-#include "sysemu/sysemu.h"
 #include "qemu/bitops.h"
 #include "qemu/bitmap.h"
-#include "hw/i386/pc.h"
-#include "hw/pci/pci.h"
-#include "hw/audio/audio.h"
+#include "qemu/timer.h"
+#include "qemu/main-loop.h"
 #include "migration/migration.h"
 #include "exec/address-spaces.h"
 #include "migration/page_cache.h"
-#include "qemu/config-file.h"
 #include "qemu/error-report.h"
-#include "qmp-commands.h"
 #include "trace.h"
-#include "exec/cpu-all.h"
 #include "exec/ram_addr.h"
-#include "qemu/host-utils.h"
 #include "qemu/rcu_queue.h"

 #ifdef DEBUG_MIGRATION_RAM
-- 
2.4.3

[Qemu-devel] [PULL 03/21] migration: Add myself to the copyright list of both files

2015-06-11 Thread Juan Quintela

If anyone feels like adding himself to the list, just send me a patch.

Signed-off-by: Juan Quintela 
Reviewed-by: Eric Blake 
---
 migration/ram.c| 4 
 migration/savevm.c | 4 
 2 files changed, 8 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index ff889ba..9db72a4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2,6 +2,10 @@
  * QEMU System Emulator
  *
  * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2011-2015 Red Hat Inc
+ *
+ * Authors:
+ *  Juan Quintela 
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff --git a/migration/savevm.c b/migration/savevm.c
index 3b0e222..3dfa425 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2,6 +2,10 @@
  * QEMU System Emulator
  *
  * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2009-2015 Red Hat Inc
+ *
+ * Authors:
+ *  Juan Quintela 
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
-- 
2.4.3

[Qemu-devel] [PULL 02/21] migration: move savevm.c inside migration/

2015-06-11 Thread Juan Quintela

Now, everything is in place.

Signed-off-by: Juan Quintela 
Reviewed-by: Eric Blake 
---
 MAINTAINERS| 1 -
 Makefile.target| 4 ++--
 savevm.c => migration/savevm.c | 0
 trace-events   | 2 +-
 4 files changed, 3 insertions(+), 4 deletions(-)
 rename savevm.c => migration/savevm.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index b183395..e728d3a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1014,7 +1014,6 @@ M: Amit Shah 
 S: Maintained
 F: include/migration/
 F: migration/
-F: savevm.c
 F: scripts/vmstate-static-checker.py
 F: tests/vmstate-static-checker-data/

diff --git a/Makefile.target b/Makefile.target
index 27209a7..3e7aafd 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -132,10 +132,10 @@ obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
 obj-y += qtest.o bootdevice.o
 obj-y += hw/
 obj-$(CONFIG_KVM) += kvm-all.o
-obj-y += memory.o savevm.o cputlb.o
+obj-y += memory.o cputlb.o
 obj-y += memory_mapping.o
 obj-y += dump.o
-obj-y += migration/ram.o
+obj-y += migration/ram.o migration/savevm.o
 LIBS := $(libs_softmmu) $(LIBS)

 # xen support
diff --git a/savevm.c b/migration/savevm.c
similarity index 100%
rename from savevm.c
rename to migration/savevm.c
diff --git a/trace-events b/trace-events
index dc1ef1f..b64e125 100644
--- a/trace-events
+++ b/trace-events
@@ -1179,7 +1179,7 @@ virtio_gpu_cmd_res_flush(uint32_t res, uint32_t w, uint32_t h, uint32_t x, uint3
 virtio_gpu_fence_ctrl(uint64_t fence, uint32_t type) "fence 0x%" PRIx64 ", type 0x%x"
 virtio_gpu_fence_resp(uint64_t fence) "fence 0x%" PRIx64

-# savevm.c
+# migration/savevm.c
 qemu_loadvm_state_section(unsigned int section_type) "%d"
 qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
-- 
2.4.3

[Qemu-devel] [PULL 08/21] migration: create savevm_state

2015-06-11 Thread Juan Quintela

This way, we will put savevm global state here, instead of lots of variables.

Signed-off-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/savevm.c | 51 ---
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 3dfa425..1a45d39 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -239,10 +239,15 @@ typedef struct SaveStateEntry {
 int is_ram;
 } SaveStateEntry;

+typedef struct SaveState {
+QTAILQ_HEAD(, SaveStateEntry) handlers;
+int global_section_id;
+} SaveState;

-static QTAILQ_HEAD(savevm_handlers, SaveStateEntry) savevm_handlers =
-QTAILQ_HEAD_INITIALIZER(savevm_handlers);
-static int global_section_id;
+static SaveState savevm_state = {
+.handlers = QTAILQ_HEAD_INITIALIZER(savevm_state.handlers),
+.global_section_id = 0,
+};

 static void dump_vmstate_vmsd(FILE *out_file,
   const VMStateDescription *vmsd, int indent,
@@ -387,7 +392,7 @@ static int calculate_new_instance_id(const char *idstr)
 SaveStateEntry *se;
 int instance_id = 0;

-QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
 if (strcmp(idstr, se->idstr) == 0
 && instance_id <= se->instance_id) {
 instance_id = se->instance_id + 1;
@@ -401,7 +406,7 @@ static int calculate_compat_instance_id(const char *idstr)
 SaveStateEntry *se;
 int instance_id = 0;

-QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
 if (!se->compat) {
 continue;
 }
@@ -429,7 +434,7 @@ int register_savevm_live(DeviceState *dev,

 se = g_malloc0(sizeof(SaveStateEntry));
 se->version_id = version_id;
-se->section_id = global_section_id++;
+se->section_id = savevm_state.global_section_id++;
 se->ops = ops;
 se->opaque = opaque;
 se->vmsd = NULL;
@@ -461,7 +466,7 @@ int register_savevm_live(DeviceState *dev,
 }
 assert(!se->compat || se->instance_id == 0);
 /* add at the end of list */
-QTAILQ_INSERT_TAIL(&savevm_handlers, se, entry);
+QTAILQ_INSERT_TAIL(&savevm_state.handlers, se, entry);
 return 0;
 }

@@ -495,9 +500,9 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
 }
 pstrcat(id, sizeof(id), idstr);

-QTAILQ_FOREACH_SAFE(se, &savevm_handlers, entry, new_se) {
+QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) {
 if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
-QTAILQ_REMOVE(&savevm_handlers, se, entry);
+QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
 if (se->compat) {
 g_free(se->compat);
 }
@@ -519,7 +524,7 @@ int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,

 se = g_malloc0(sizeof(SaveStateEntry));
 se->version_id = vmsd->version_id;
-se->section_id = global_section_id++;
+se->section_id = savevm_state.global_section_id++;
 se->opaque = opaque;
 se->vmsd = vmsd;
 se->alias_id = alias_id;
@@ -547,7 +552,7 @@ int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
 }
 assert(!se->compat || se->instance_id == 0);
 /* add at the end of list */
-QTAILQ_INSERT_TAIL(&savevm_handlers, se, entry);
+QTAILQ_INSERT_TAIL(&savevm_state.handlers, se, entry);
 return 0;
 }

@@ -556,9 +561,9 @@ void vmstate_unregister(DeviceState *dev, const VMStateDescription *vmsd,
 {
 SaveStateEntry *se, *new_se;

-QTAILQ_FOREACH_SAFE(se, &savevm_handlers, entry, new_se) {
+QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) {
 if (se->vmsd == vmsd && se->opaque == opaque) {
-QTAILQ_REMOVE(&savevm_handlers, se, entry);
+QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
 if (se->compat) {
 g_free(se->compat);
 }
@@ -610,7 +615,7 @@ bool qemu_savevm_state_blocked(Error **errp)
 {
 SaveStateEntry *se;

-QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
 if (se->vmsd && se->vmsd->unmigratable) {
 error_setg(errp, "State blocked by non-migratable device '%s'",
se->idstr);
@@ -627,7 +632,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
 int ret;

 trace_savevm_state_begin();
-QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
 if (!se->ops || !se->ops->set_params) {
 continue;
 }
@@ -637,7 +642,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
 qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
 qemu_put_be32(f, QEMU_VM_FILE_VERSION);

-QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
 int len;

 if (!s
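
The refactoring pattern here, folding a list head and a counter into one statically initialized struct, can be tried outside QEMU with the BSD <sys/queue.h> macros, which QEMU's QTAILQ macros resemble. This is a sketch under that assumption: Entry, State, register_entry() and count_entries() are hypothetical names, and <sys/queue.h> is a BSD/glibc header rather than ISO C.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical reduced stand-in for SaveStateEntry. */
typedef struct Entry {
    int section_id;
    TAILQ_ENTRY(Entry) entry;
} Entry;

/* All former globals live in one struct, as in the patch's SaveState. */
typedef struct State {
    TAILQ_HEAD(, Entry) handlers;
    int global_section_id;
} State;

static State state = {
    /* The head initializer needs the address of the field it initializes. */
    .handlers = TAILQ_HEAD_INITIALIZER(state.handlers),
    .global_section_id = 0,
};

/* Assign the next section id and append at the end of the list. */
void register_entry(Entry *e)
{
    e->section_id = state.global_section_id++;
    TAILQ_INSERT_TAIL(&state.handlers, e, entry);
}

int count_entries(void)
{
    Entry *e;
    int n = 0;

    TAILQ_FOREACH(e, &state.handlers, entry) {
        n++;
    }
    return n;
}
```

Every former reference to the standalone list head then becomes &state.handlers, which is the mechanical change that makes up most of this diff.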

[Qemu-devel] [PULL 05/21] arch_init: Clean up the duplicate variable 'len' defining in ram_load()

2015-06-11 Thread Juan Quintela

From: zhanghailiang 

There are two places that define a 'len' variable. This compiles fine,
but it makes the code hard to read.

Remove the local one that is defined inside the 'while' loop.

Signed-off-by: zhanghailiang 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 3945328..57368e1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1459,7 +1459,6 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 total_ram_bytes = addr;
 while (!ret && total_ram_bytes) {
 RAMBlock *block;
-uint8_t len;
 char id[256];
 ram_addr_t length;

-- 
2.4.3

[Qemu-devel] [PULL v2 00/21] migration pull request

2015-06-11 Thread Juan Quintela


Hi

[v2]

Just rebased.

[v1]

Here is the pull request. It includes:
- generic patches from postcopy that are reviewed (dave)
- generic patches from RDMA fixes that are reviewed (dave)
- patches for optional sections reviewed (me)
- patches for migration events reviewed (me)
- fix RDMA and ipv6 (Padmanabh)
- Remove extra variable (zhanghailiang)

Please, apply.

Later, Juan.



The following changes since commit d8e3b729cf452d2689c8669f1ec18158db29fd5a:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
(2015-06-11 15:33:38 +0100)

are available in the git repository at:

  git://github.com/juanquintela/qemu.git tags/migration/20150612

for you to fetch changes up to 4fa3dd17dc29c316726f0d4a354a4d895e130c73:

  Remove unneeded memset (2015-06-12 06:54:02 +0200)


migration/next for 20150612


Dr. David Alan Gilbert (12):
  Add qemu_get_counted_string to read a string prefixed by a count byte
  Split header writing out of qemu_savevm_state_begin
  qemu_ram_foreach_block: pass up error value, and down the ramblock name
  Create MigrationIncomingState
  Move copy out of qemu_peek_buffer
  Move loadvm_handlers into MigrationIncomingState
  Merge section header writing
  Disable section footers on older machine types
  Add a protective section footer
  Teach analyze-migration.py about section footers
  Rename RDMA structures to make destination clear
  Remove unneeded memset

Juan Quintela (7):
  migration: move ram stuff to migration/ram
  migration: move savevm.c inside migration/
  migration: Add myself to the copyright list of both files
  migration: reduce include files
  migration: Remove duplicated assignment of SETUP status
  migration: create savevm_state
  migration: Use normal VMStateDescriptions for Subsections

Padmanabh Ratnakar (1):
  rdma: Fix qemu crash when IPv6 address is used for migration

zhanghailiang (1):
  arch_init: Clean up the duplicate variable 'len' defining in ram_load()

 MAINTAINERS|2 -
 Makefile.target|3 +-
 arch_init.c| 1611 ---
 cpus.c |   11 +-
 docs/migration.txt |   11 +-
 exec.c |   21 +-
 hw/acpi/ich9.c |   10 +-
 hw/acpi/piix4.c|   10 +-
 hw/block/fdc.c |   42 +-
 hw/char/serial.c   |   41 +-
 hw/display/qxl.c   |   11 +-
 hw/display/vga.c   |   11 +-
 hw/i386/pc_piix.c  |2 +
 hw/i386/pc_q35.c   |2 +
 hw/ide/core.c  |   32 +-
 hw/ide/pci.c   |   16 +-
 hw/input/pckbd.c   |   22 +-
 hw/input/ps2.c |   11 +-
 hw/intc/apic_common.c  |   10 +-
 hw/isa/lpc_ich9.c  |   10 +-
 hw/net/e1000.c |   11 +-
 hw/net/rtl8139.c   |   11 +-
 hw/net/vmxnet3.c   |   12 +-
 hw/pci-host/piix.c |   10 +-
 hw/scsi/scsi-bus.c |   11 +-
 hw/timer/hpet.c|   11 +-
 hw/timer/mc146818rtc.c |   23 +-
 hw/usb/hcd-ohci.c  |   11 +-
 hw/usb/redirect.c  |   34 +-
 hw/virtio/virtio.c |   16 +-
 include/exec/cpu-common.h  |4 +-
 include/migration/migration.h  |   17 +
 include/migration/qemu-file.h  |5 +-
 include/migration/vmstate.h|   10 +-
 include/qemu/typedefs.h|2 +
 include/sysemu/arch_init.h |1 -
 include/sysemu/sysemu.h|1 +
 migration/migration.c  |   34 +-
 migration/qemu-file.c  |   29 +-
 migration/ram.c| 1628 
 migration/rdma.c   |   78 +-
 savevm.c => migration/savevm.c |  257 ---
 migration/vmstate.c|   21 +-
 scripts/analyze-migration.py   |5 +
 target-arm/machine.c   |   26 +-
 target-i386/machine.c  |   81 +-
 target-ppc/machine.c   |   62 +-
 target-s390x/machine.c |   30 +-
 trace-events   |5 +-
 49 files changed, 2191 insertions(+), 2144 deletions(-)
 create mode 100644 migration/ram.c
 rename savevm.c => migration/savevm.c (88%)

Re: [Qemu-devel] [PULL 20/22] hw/arm/boot: arm_load_kernel implemented as a machine init done notifier

2015-06-11 Thread Peter Crosthwaite

On Tue, Jun 2, 2015 at 9:33 AM, Peter Maydell  wrote:
> From: Eric Auger 
>
> Device tree nodes for the platform bus and its children dynamic sysbus
> devices are added in a machine init done notifier. To load the dtb once,
> after those latter nodes are built and before ROM freeze, the actual
> arm_load_kernel existing code is moved into a notifier notify function,
> arm_load_kernel_notify. arm_load_kernel now only registers the
> corresponding notifier.
>

Does this work? I am experiencing a regression on this patch for
xlnx-ep108 board. I think it is because this is now delaying
arm_load_kernel_notify call until after rom_load_all. From vl.c:

if (rom_load_all() != 0) {
fprintf(stderr, "rom loading failed\n");
exit(1);
}

/* TODO: once all bus devices are qdevified, this should be done
 * when bus is created by qdev.c */
qemu_register_reset(qbus_reset_all_fn, sysbus_get_default());
qemu_run_machine_init_done_notifiers();

the machine_init_done_notifiers are called after the rom_load_all()
call which does the image loading. So the image-to-load registration
is too late.
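
To illustrate the ordering problem described above, here is a minimal
standalone sketch. It is not QEMU code: struct boot_sketch,
register_image(), load_all() and startup() are hypothetical names that
only model "images must be registered before the one-shot load step".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROMS 8

/* Hypothetical model of the ROM list: images are registered first,
 * then a single load step copies everything registered so far. */
struct boot_sketch {
    const char *pending[MAX_ROMS];
    size_t n_pending;   /* images registered */
    size_t n_loaded;    /* images that the load step picked up */
};

static void register_image(struct boot_sketch *b, const char *name)
{
    assert(b->n_pending < MAX_ROMS);
    b->pending[b->n_pending++] = name;
}

/* Stand-in for the one-shot load step: only sees what is registered now. */
static void load_all(struct boot_sketch *b)
{
    b->n_loaded = b->n_pending;
}

/* Stand-in for a machine-init-done notifier that registers the kernel. */
static void kernel_notifier(struct boot_sketch *b)
{
    register_image(b, "kernel");
}

/* Returns the number of registered images that were never loaded. */
static size_t startup(struct boot_sketch *b, bool notifier_first)
{
    if (notifier_first) {
        kernel_notifier(b);
        load_all(b);
    } else {
        load_all(b);          /* load step runs first ... */
        kernel_notifier(b);   /* ... so this registration is too late */
    }
    return b->n_pending - b->n_loaded;
}
```

With the notifier running after the load step, startup() reports one
image left unloaded; with the order swapped it reports none.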

Straight revert of this patch fixes the issue for me.

Regards,
Peter


> Machine files that do not support platform bus stay unchanged. Machine
> files willing to support dynamic sysbus devices must call arm_load_kernel
> before sysbus-fdt arm_register_platform_bus_fdt_creator to make sure
> dynamic sysbus device nodes are integrated in the dtb.
>
> Signed-off-by: Eric Auger 
> Reviewed-by: Shannon Zhao 
> Reviewed-by: Alexander Graf 
> Reviewed-by: Alex Bennée 
> Message-id: 1433244554-12898-3-git-send-email-eric.au...@linaro.org
> Signed-off-by: Peter Maydell 
> ---
>  hw/arm/boot.c| 14 +-
>  include/hw/arm/arm.h | 28 
>  2 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index fa69503..d036624 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -557,7 +557,7 @@ static void load_image_to_fw_cfg(FWCfgState *fw_cfg, 
> uint16_t size_key,
>  fw_cfg_add_bytes(fw_cfg, data_key, data, size);
>  }
>
> -void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
> +static void arm_load_kernel_notify(Notifier *notifier, void *data)
>  {
>  CPUState *cs;
>  int kernel_size;
> @@ -568,6 +568,11 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
> *info)
>  hwaddr entry, kernel_load_offset;
>  int big_endian;
>  static const ARMInsnFixup *primary_loader;
> +ArmLoadKernelNotifier *n = DO_UPCAST(ArmLoadKernelNotifier,
> + notifier, notifier);
> +ARMCPU *cpu = n->cpu;
> +struct arm_boot_info *info =
> +container_of(n, struct arm_boot_info, load_kernel_notifier);
>
>  /* CPU objects (unlike devices) are not automatically reset on system
>   * reset, so we must always register a handler to do so. If we're
> @@ -775,3 +780,10 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
> *info)
>  ARM_CPU(cs)->env.boot_info = info;
>  }
>  }
> +
> +void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
> +{
> +info->load_kernel_notifier.cpu = cpu;
> +info->load_kernel_notifier.notifier.notify = arm_load_kernel_notify;
> +
> qemu_add_machine_init_done_notifier(&info->load_kernel_notifier.notifier);
> +}
> diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
> index 5c940eb..760804c 100644
> --- a/include/hw/arm/arm.h
> +++ b/include/hw/arm/arm.h
> @@ -13,11 +13,21 @@
>
>  #include "exec/memory.h"
>  #include "hw/irq.h"
> +#include "qemu/notify.h"
>
>  /* armv7m.c */
>  qemu_irq *armv7m_init(MemoryRegion *system_memory, int mem_size, int num_irq,
>const char *kernel_filename, const char *cpu_model);
>
> +/*
> + * struct used as a parameter of the arm_load_kernel machine init
> + * done notifier
> + */
> +typedef struct {
> +Notifier notifier; /* actual notifier */
> +ARMCPU *cpu; /* handle to the first cpu object */
> +} ArmLoadKernelNotifier;
> +
>  /* arm_boot.c */
>  struct arm_boot_info {
>  uint64_t ram_size;
> @@ -64,6 +74,8 @@ struct arm_boot_info {
>   * the user it should implement this hook.
>   */
>  void (*modify_dtb)(const struct arm_boot_info *info, void *fdt);
> +/* machine init done notifier executing arm_load_dtb */
> +ArmLoadKernelNotifier load_kernel_notifier;
>  /* Used internally by arm_boot.c */
>  int is_linux;
>  hwaddr initrd_start;
> @@ -75,6 +87,22 @@ struct arm_boot_info {
>   */
>  bool firmware_loaded;
>  };
> +
> +/**
> + * arm_load_kernel - Loads memory with everything needed to boot
> + *
> + * @cpu: handle to the first CPU object
> + * @info: handle to the boot info struct
> + * Registers a machine init done notifier that copies to memory
> + * everything needed to boot, depending on machine and user options:
> + * kernel image, boot loaders, initrd, dtb. Also registers the CPU

Re: [Qemu-devel] [PATCH target-arm v1 8/9] arm: xlnx-zynqmp: Preface CPU variables with "A"

2015-06-11 Thread Alistair Francis

On Thu, Jun 11, 2015 at 9:58 AM, Peter Crosthwaite
 wrote:
> On Tue, Jun 2, 2015 at 4:57 PM, Alistair Francis
>  wrote:
>> On Tue, Jun 2, 2015 at 4:04 AM, Peter Crosthwaite
>>  wrote:
>>> The CPUs currently supported by zynqmp are the APU (application
>>> processing unit) CPUs. There are other CPUs in Zynqmp so unqualified
>>> "cpus" is ambiguous. Preface the variables with "A" accordingly, to
>>> prepare for adding the RPU (realtime processing unit) processors.
>>>
>>> Signed-off-by: Peter Crosthwaite 
>>> ---
>>>  hw/arm/xlnx-ep108.c  |  2 +-
>>>  hw/arm/xlnx-zynqmp.c | 24 
>>>  include/hw/arm/xlnx-zynqmp.h |  4 ++--
>>>  3 files changed, 15 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/hw/arm/xlnx-ep108.c b/hw/arm/xlnx-ep108.c
>>> index b924f5e..1893b9f 100644
>>> --- a/hw/arm/xlnx-ep108.c
>>> +++ b/hw/arm/xlnx-ep108.c
>>> @@ -65,7 +65,7 @@ static void xlnx_ep108_init(MachineState *machine)
>>>  xlnx_ep108_binfo.kernel_cmdline = machine->kernel_cmdline;
>>>  xlnx_ep108_binfo.initrd_filename = machine->initrd_filename;
>>>  xlnx_ep108_binfo.loader_start = 0;
>>> -arm_load_kernel(&s->soc.cpu[0], &xlnx_ep108_binfo);
>>> +arm_load_kernel(&s->soc.acpu[0], &xlnx_ep108_binfo);
>>>  }
>>>
>>
>> Hey Peter,
>>
>> Why is this acpu instead of apu? APU follows the standard ZynqMP naming
>> conventions, while Application Central Processing Unit (ACPU) doesn't really
>> make sense.
>>
>
> So "apu" (or "rpu") doesn't work either, as each "processing unit" can
> contain more than just CPUs. E.G. The GIC should actually have this
> "apu" preface as well. I was trying to avoid text bloat with the short
> form, but I think the correct answer is going to be:
>
> "apu_cpu"
>
> The PU in each does mean the same thing though :|

Ok, I see what you are saying. I agree with you, I think the best option
is the long names 'apu_*' and 'rpu_*'.

Thanks,

Alistair

>
> Regards,
> Peter
>
>> Thanks,
>>
>> Alistair
>>
>>>  static QEMUMachine xlnx_ep108_machine = {
>>> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
>>> index 6b01965..6faa578 100644
>>> --- a/hw/arm/xlnx-zynqmp.c
>>> +++ b/hw/arm/xlnx-zynqmp.c
>>> @@ -64,10 +64,10 @@ static void xlnx_zynqmp_init(Object *obj)
>>>  XlnxZynqMPState *s = XLNX_ZYNQMP(obj);
>>>  int i;
>>>
>>> -for (i = 0; i < XLNX_ZYNQMP_NUM_CPUS; i++) {
>>> -object_initialize(&s->cpu[i], sizeof(s->cpu[i]),
>>> +for (i = 0; i < XLNX_ZYNQMP_NUM_ACPUS; i++) {
>>> +object_initialize(&s->acpu[i], sizeof(s->acpu[i]),
>>>"cortex-a53-" TYPE_ARM_CPU);
>>> -object_property_add_child(obj, "cpu[*]", OBJECT(&s->cpu[i]),
>>> +object_property_add_child(obj, "acpu[*]", OBJECT(&s->acpu[i]),
>>>&error_abort);
>>>  }
>>>
>>> @@ -95,7 +95,7 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
>>> **errp)
>>>
>>>  qdev_prop_set_uint32(DEVICE(&s->gic), "num-irq", GIC_NUM_SPI_INTR + 
>>> 32);
>>>  qdev_prop_set_uint32(DEVICE(&s->gic), "revision", 2);
>>> -qdev_prop_set_uint32(DEVICE(&s->gic), "num-cpu", XLNX_ZYNQMP_NUM_CPUS);
>>> +qdev_prop_set_uint32(DEVICE(&s->gic), "num-cpu", 
>>> XLNX_ZYNQMP_NUM_ACPUS);
>>>  object_property_set_bool(OBJECT(&s->gic), true, "realized", &err);
>>>  if (err) {
>>>  error_propagate((errp), (err));
>>> @@ -121,38 +121,38 @@ static void xlnx_zynqmp_realize(DeviceState *dev, 
>>> Error **errp)
>>>  }
>>>  }
>>>
>>> -for (i = 0; i < XLNX_ZYNQMP_NUM_CPUS; i++) {
>>> +for (i = 0; i < XLNX_ZYNQMP_NUM_ACPUS; i++) {
>>>  qemu_irq irq;
>>>
>>> -object_property_set_int(OBJECT(&s->cpu[i]), QEMU_PSCI_CONDUIT_SMC,
>>> +object_property_set_int(OBJECT(&s->acpu[i]), QEMU_PSCI_CONDUIT_SMC,
>>>  "psci-conduit", &error_abort);
>>>  if (i > 0) {
>>>  /* Secondary CPUs start in PSCI powered-down state */
>>> -object_property_set_bool(OBJECT(&s->cpu[i]), true,
>>> +object_property_set_bool(OBJECT(&s->acpu[i]), true,
>>>   "start-powered-off", &error_abort);
>>>  }
>>>
>>> -object_property_set_int(OBJECT(&s->cpu[i]), GIC_BASE_ADDR,
>>> +object_property_set_int(OBJECT(&s->acpu[i]), GIC_BASE_ADDR,
>>>  "reset-cbar", &err);
>>>  if (err) {
>>>  error_propagate((errp), (err));
>>>  return;
>>>  }
>>>
>>> -object_property_set_bool(OBJECT(&s->cpu[i]), true, "realized", 
>>> &err);
>>> +object_property_set_bool(OBJECT(&s->acpu[i]), true, "realized", 
>>> &err);
>>>  if (err) {
>>>  error_propagate((errp), (err));
>>>  return;
>>>  }
>>>
>>>  sysbus_connect_irq(SYS_BUS_DEVICE(&s->gic), i,
>>> -   qdev_get_gpio_in(DEVICE(&s->cpu[i]), 
>>> ARM_CPU_IRQ));
>>> +

Re: [Qemu-devel] [PATCH] dma/rc4030: do multiple calls to address_space_rw when doing DMA transfers

2015-06-11 Thread Aurelien Jarno

On 2015-06-11 22:30, Hervé Poussineau wrote:
> This works around a bug in memory management.
> 
> To reproduce the problem, try to start the Windows NT 4.0/MIPS installer.
> After loading some files, you should see a screen saying
> "To set up Windows NT now, press ENTER."
> However, you're welcomed with an IRQL_NOT_LESS_OR_EQUAL bugcheck or an
> Unknown Hard Error c221.
> 
> Signed-off-by: Hervé Poussineau 
> ---
>  hw/dma/rc4030.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/hw/dma/rc4030.c b/hw/dma/rc4030.c
> index 3efa6de..d265d6c 100644
> --- a/hw/dma/rc4030.c
> +++ b/hw/dma/rc4030.c
> @@ -681,6 +681,7 @@ static void rc4030_do_dma(void *opaque, int n, uint8_t 
> *buf, int len, int is_wri
>  rc4030State *s = opaque;
>  hwaddr dma_addr;
>  int dev_to_mem;
> +int i;
>  
>  s->dma_regs[n][DMA_REG_ENABLE] &= ~(DMA_FLAG_TC_INTR | DMA_FLAG_MEM_INTR 
> | DMA_FLAG_ADDR_INTR);
>  
> @@ -699,8 +700,22 @@ static void rc4030_do_dma(void *opaque, int n, uint8_t 
> *buf, int len, int is_wri
>  dma_addr = s->dma_regs[n][DMA_REG_ADDRESS];
>  
>  /* Read/write data at right place */
> +#if 1 /* workaround for a bug in memory management */
> +for (i = 0; i < len; ) {
> +int ncpy = DMA_PAGESIZE - (dma_addr & (DMA_PAGESIZE - 1));
> +if (ncpy > len - i) {
> +ncpy = len - i;
> +}
> +address_space_rw(&s->dma_as, dma_addr, MEMTXATTRS_UNSPECIFIED,
> + buf + i, ncpy, is_write);
> +
> +dma_addr += ncpy;
> +i += ncpy;
> +}
> +#else
>  address_space_rw(&s->dma_as, dma_addr, MEMTXATTRS_UNSPECIFIED,
>   buf, len, is_write);
> +#endif

Hmm, basically your code splits the transfers so that they don't cross
DMA page boundaries. It seems that your DMA memory region is actually
made of small subregions of size DMA_PAGESIZE aliased to the RAM.

Now looking at the address_space_rw function, it seems it optimizes the
write to RAM case by calling address_space_translate() and then doing a
memcpy() of the whole region. It doesn't work given the memory region is
not linear.

That said, address_space_translate() is supposed to adjust the length if
needed, but it does so only if iommu_ops is defined. I therefore wonder
whether you should model these DMA translation tables using IOMMU ops
instead of subregions.
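
For reference, the splitting done by the workaround can be sketched as
standalone C. This is only an illustration under assumptions:
SKETCH_PAGESIZE is a hypothetical stand-in for DMA_PAGESIZE, and
chunk_len() and count_chunks() are made-up helper names, not functions
from hw/dma/rc4030.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch; the real DMA_PAGESIZE lives in
 * hw/dma/rc4030.c and may differ. Must be a power of two. */
#define SKETCH_PAGESIZE 4096u

/* Bytes that can be transferred starting at addr without crossing a
 * page boundary, capped at the bytes still remaining. */
static size_t chunk_len(uint64_t addr, size_t remaining)
{
    size_t to_boundary =
        SKETCH_PAGESIZE - (size_t)(addr & (SKETCH_PAGESIZE - 1));
    return to_boundary < remaining ? to_boundary : remaining;
}

/* Number of per-page accesses a transfer of len bytes at addr is
 * split into. Each iteration corresponds to one memory access. */
static unsigned count_chunks(uint64_t addr, size_t len)
{
    unsigned n = 0;

    while (len > 0) {
        size_t c = chunk_len(addr, len);
        assert(c > 0);  /* guarantees forward progress */
        addr += c;
        len -= c;
        n++;
    }
    return n;
}
```

A 100-byte transfer starting 16 bytes before a page boundary is split
into two accesses of 16 and 84 bytes, so no single access spans two of
the aliased subregions.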

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature

2015-06-11 Thread John Snow



On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy 
> 
> Adds dirty-bitmaps feature to qcow2 format as specified in
> docs/specs/qcow2.txt
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/Makefile.objs|   2 +-
>  block/qcow2-dirty-bitmap.c | 503 
> +
>  block/qcow2.c  |  56 +
>  block/qcow2.h  |  50 +
>  include/block/block_int.h  |  10 +
>  5 files changed, 620 insertions(+), 1 deletion(-)
>  create mode 100644 block/qcow2-dirty-bitmap.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 0d8c2a4..bff12b4 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -1,5 +1,5 @@
>  block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o
> -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
> qcow2-cache.o
> +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
> qcow2-cache.o qcow2-dirty-bitmap.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
>  block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
> new file mode 100644
> index 000..bc0167c
> --- /dev/null
> +++ b/block/qcow2-dirty-bitmap.c
> @@ -0,0 +1,503 @@
> +/*
> + * Dirty bitmpas for the QCOW version 2 format
> + *
> + * Copyright (c) 2014-2015 Vladimir Sementsov-Ogievskiy
> + *
> + * This file is derived from qcow2-snapshot.c, original copyright:
> + * Copyright (c) 2004-2006 Fabrice Bellard
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu-common.h"
> +#include "block/block_int.h"
> +#include "block/qcow2.h"
> +
> +void qcow2_free_dirty_bitmaps(BlockDriverState *bs)
> +{
> +BDRVQcowState *s = bs->opaque;
> +int i;
> +
> +for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +g_free(s->dirty_bitmaps[i].name);
> +}
> +g_free(s->dirty_bitmaps);
> +s->dirty_bitmaps = NULL;
> +s->nb_dirty_bitmaps = 0;
> +}
> +
> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs)
> +{
> +BDRVQcowState *s = bs->opaque;
> +QCowDirtyBitmapHeader h;
> +QCowDirtyBitmap *bm;
> +int i, name_size;
> +int64_t offset;
> +int ret;
> +
> +if (!s->nb_dirty_bitmaps) {
> +s->dirty_bitmaps = NULL;
> +s->dirty_bitmaps_size = 0;
> +return 0;
> +}
> +
> +offset = s->dirty_bitmaps_offset;
> +s->dirty_bitmaps = g_new0(QCowDirtyBitmap, s->nb_dirty_bitmaps);
> +
> +for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +/* Read statically sized part of the dirty_bitmap header */
> +offset = align_offset(offset, 8);
> +ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
> +if (ret < 0) {
> +goto fail;
> +}
> +
> +offset += sizeof(h);
> +bm = s->dirty_bitmaps + i;
> +bm->l1_table_offset = be64_to_cpu(h.l1_table_offset);
> +bm->l1_size = be32_to_cpu(h.l1_size);
> +bm->bitmap_granularity = be32_to_cpu(h.bitmap_granularity);
> +bm->bitmap_size = be64_to_cpu(h.bitmap_size);
> +
> +name_size = be16_to_cpu(h.name_size);
> +
> +/* Read dirty_bitmap name */
> +bm->name = g_malloc(name_size + 1);
> +ret = bdrv_pread(bs->file, offset, bm->name, name_size);
> +if (ret < 0) {
> +goto fail;
> +}
> +offset += name_size;
> +bm->name[name_size] = '\0';
> +
> +if (offset - s->dirty_bitmaps_offset > QCOW_MAX_DIRTY_BITMAPS_SIZE) {
> +ret = -EFBIG;
> +goto fail;
> +}
> +}
> +
> +assert(offset - s->dirty_bitmaps_offset <= INT_MAX);
> +s->dirty_bitmaps_size = offset - s->dirty_bitmaps_offset;
>

Re: [Qemu-devel] [PATCH] MIPS: exceptions handling in icount mode

2015-06-11 Thread Aurelien Jarno

On 2015-06-10 11:33, Pavel Dovgalyuk wrote:
> This patch fixes exception handling in MIPS.
> MIPS instructions generate several types of exceptions.
> When exception is generated, it breaks the execution of the current 
> translation
> block. Implementation of the exceptions handling in MIPS does not correctly
> restore icount for the instruction which caused the exception. In most cases
> icount will be decreased by the value equal to the size of TB.

I don't think this is correct. There is no real point in always doing a
retranslation for an exception triggered from the helpers, especially
when the CPU state has already been saved anyway.

> This patch passes pointer to the translation block internals to the exception
> handler. It allows correct restoring of the icount value.

Your patch doesn't do that for all the helpers, for example all the
memory access helpers. It probably improves the situation, but it
doesn't fully fix it.

From my point of view, it looks like the problem is actually elsewhere
in the common icount code. Do we know if it works correctly on other
emulated architectures? Also do you have a quick example to reproduce
the issue?


> Signed-off-by: Pavel Dovgalyuk 
> ---
>  target-mips/cpu.h|   28 +
>  target-mips/msa_helper.c |5 +++-
>  target-mips/op_helper.c  |   52 
> +++---
>  target-mips/translate.c  |2 ++
>  4 files changed, 45 insertions(+), 42 deletions(-)

[ snip ]

> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index fd063a2..9c2ff7c 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -1675,6 +1675,7 @@ generate_exception_err (DisasContext *ctx, int excp, 
> int err)
>  TCGv_i32 terr = tcg_const_i32(err);
>  save_cpu_state(ctx, 1);
>  gen_helper_raise_exception_err(cpu_env, texcp, terr);
> +ctx->bstate = BS_STOP;
>  tcg_temp_free_i32(terr);
>  tcg_temp_free_i32(texcp);
>  }
> @@ -1684,6 +1685,7 @@ generate_exception (DisasContext *ctx, int excp)
>  {
>  save_cpu_state(ctx, 1);
>  gen_helper_0e0i(raise_exception, excp);
> +ctx->bstate = BS_STOP;
>  }
>  

Why do we need to stop the translation here? The exception might be
conditional (for example for ADDU or SUBU).

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] openbios.git mirror on git.qemu.org

2015-06-11 Thread Mark Cave-Ayland

On 09/06/15 11:22, Stefan Hajnoczi wrote:
> On Tue, May 19, 2015 at 09:55:12PM +0100, Peter Maydell wrote:
>> On 19 May 2015 at 21:47, Mark Cave-Ayland  
>> wrote:
>>> On 19/05/15 13:55, Andreas Färber wrote:
>>>
>>>> On 19.05.2015 at 12:42, Stefan Hajnoczi wrote:
>>>>> Ping.  Should we stick with an old mirror of OpenBIOS for QEMU 2.4 or
>>>>> switch to the official upstream repo?
>>>>
>>>> I don't quite understand the question. OpenBIOS is still using SVN
>>>> AFAIK. QEMU is using Git, so I thought we always need a Git mirror for
>>>> submodules? Has the git-svn integration been extended to svn submodules
>>>> and which versions of git support that?
>>>
>>> I'm not sure why submodules make anything different
>>
>> They mean we can't literally just point our submodule at upstream,
>> because upstream isn't a git repo and submodules must point
>> at git repos. (However I think we generally prefer to point at
>> a git.qemu.org mirror of upstream's repo anyway.)
>>
>>> , but for an SVN
>>> repository I can't see why the nightly cron job on git.qemu.org can't
>>> just run "git svn fetch && git svn rebase" directly against OpenBIOS SVN
>>> to update its master branch?
>>
>> Sounds reasonable.
>>
>>>> Unless I'm missing something, the only question is which mirror do we
>>>> use, not whether we use a mirror.
>>>
>>> The problem at the moment is that the repository on git.qemu.org is
>>> pointing to a git repository on a plain IP address (with no sensible
>>> reverse DNS) and so far no-one has admitted ownership. This was a
>>> problem last week when the repository stopped syncing with OpenBIOS SVN
>>> trunk and both Stefan and myself had no idea who to contact in order to
>>> get it fixed.
>>
>> It's also an obvious problem in terms of traceability and trust
>> of the code we're shipping to people... We must fix this for 2.4
>> (or ideally ASAP) I think.
> 
> Okay.  I will set up a cronjob to use git-svn to grab the latest
> OpenBIOS from upstream soon.

Hi Stefan,

As the freeze is coming up, I need to send an OpenBIOS pull request fairly
soon. Do you need anything from me to get this done beforehand?


ATB,

Mark.

Re: [Qemu-devel] [PATCH v3 0/7] target-mips: add support for large physical addresses

2015-06-11 Thread Aurelien Jarno

On 2015-06-09 17:42, Leon Alrae wrote:
> Hi,
> 
> This patchset adds large physical address support in MIPS, specifically:
> * eXtended Physical Addressing (XPA)
> * Large Physical Addressing (LPA)
> 
> XPA and LPA are enabled in MIPS32R5-generic and MIPS64R6-generic cores
> respectively.
> 
> The series applies on top of the Config5.FRE patches.
> 
> Regards,
> Leon

The whole series is:

Reviewed-by: Aurelien Jarno 

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] Steal time MSR not set properly during live migration?

2015-06-11 Thread Michael Tokarev

11.06.2015 23:46, Apollon Oikonomopoulos wrote:
> On 15:12 Wed 03 Jun , Apollon Oikonomopoulos wrote:
>> Any ideas?
> 
> As far as I understand, there is an issue when reading the MSR on the 
> incoming side: there is a KVM_SET_MSRS vcpu ioctl issued by the main 
> thread during initialization, that causes the initial vCPU steal time 
> value to be set using the main thread's (and not the vCPU thread's) 
> run_delay. Then, upon resuming execution, kvm_arch_load_vcpu uses the 
> vCPU thread's run_delay to determine steal time, causing an overflow.  
> The issue was introduced by commit 
> 917367aa968fd4fef29d340e0c7ec8c608dffaab.
> 
> For the full analysis, see https://bugs.debian.org/785557#64 and the 
> followup e-mail.

Adding Cc's...

Thanks,

/mjt

Re: [Qemu-devel] [PATCH 7/8] qemu: command line option for dirty bitmaps

2015-06-11 Thread John Snow



On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy 
> 
> The patch adds the following command line option:
> 
> -dirty-bitmap [option1=val1][,option2=val2]...
> Available options are:
> name         The name for the bitmap (required).
>
> file         The file to load the bitmap from.
>
> file_id      When specified together with the 'file' option, the file
>              becomes available through this id to other -dirty-bitmap
>              options. When specified without the 'file' option, it is
>              a reference to a 'file' given in another -dirty-bitmap
>              option, and the bitmap is loaded from that file.
>
> drive        The drive to bind the bitmap to. It should be specified
>              as the 'id' suboption of one of the -drive options. If
>              neither 'file' nor 'file_id' is specified, the bitmap
>              is loaded from that drive (internal dirty bitmap).
>
> granularity  The granularity for the bitmap. Optional; a default
>              value is used if it is omitted.
>
> enabled      on|off. Default is 'on'. A disabled bitmap does not
>              change regardless of writes to the corresponding drive.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  blockdev.c|  38 ++
>  include/sysemu/blockdev.h |   1 +
>  include/sysemu/sysemu.h   |   1 +
>  qemu-options.hx   |  37 +
>  vl.c  | 100 
> ++
>  5 files changed, 177 insertions(+)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 5eaf77e..2a74395 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -176,6 +176,11 @@ QemuOpts *drive_def(const char *optstr)
>  return qemu_opts_parse(qemu_find_opts("drive"), optstr, 0);
>  }
>  
> +QemuOpts *dirty_bitmap_def(const char *optstr)
> +{
> +return qemu_opts_parse(qemu_find_opts("dirty-bitmap"), optstr, 0);
> +}
> +
>  QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
>  const char *optstr)
>  {
> @@ -3093,6 +3098,39 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
>  return head;
>  }
>  
> +QemuOptsList qemu_dirty_bitmap_opts = {
> +.name = "dirty-bitmap",
> +.head = QTAILQ_HEAD_INITIALIZER(qemu_dirty_bitmap_opts.head),
> +.desc = {
> +{
> +.name = "name",
> +.type = QEMU_OPT_STRING,
> +.help = "Name of the dirty bitmap",
> +},{
> +.name = "file",
> +.type = QEMU_OPT_STRING,
> +.help = "file name to load the bitmap from",
> +},{
> +.name = "file_id",
> +.type = QEMU_OPT_STRING,
> +.help = "node name to load the bitmap from (or to set id for"
> +" for file, opened by previous option)",
> +},{
> +.name = "drive",
> +.type = QEMU_OPT_STRING,
> +.help = "drive id to bind the bitmap to",
> +},{
> +.name = "granularity",
> +.type = QEMU_OPT_NUMBER,
> +.help = "granularity",
> +},{
> +.name = "enabled",
> +.type = QEMU_OPT_BOOL,
> +.help = "enabled flag (default is 'on')",
> +}
> +}
> +};
> +
>  QemuOptsList qemu_common_drive_opts = {
>  .name = "drive",
>  .head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head),
> diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
> index 7ca59b5..5b101b8 100644
> --- a/include/sysemu/blockdev.h
> +++ b/include/sysemu/blockdev.h
> @@ -57,6 +57,7 @@ int drive_get_max_devs(BlockInterfaceType type);
>  DriveInfo *drive_get_next(BlockInterfaceType type);
>  
>  QemuOpts *drive_def(const char *optstr);
> +QemuOpts *dirty_bitmap_def(const char *optstr);
>  QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
>  const char *optstr);
>  DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type);
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 8a52934..681a8f3 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -207,6 +207,7 @@ bool usb_enabled(void);
>  
>  extern QemuOptsList qemu_legacy_drive_opts;
>  extern QemuOptsList qemu_common_drive_opts;
> +extern QemuOptsList qemu_dirty_bitmap_opts;
>  extern QemuOptsList qemu_drive_opts;
>  extern QemuOptsList qemu_chardev_opts;
>  extern QemuOptsList qemu_device_opts;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index ec356f6..5e93122 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -614,6 +614,43 @@ qemu-system-i386 -hda a -hdb b
>  @end example
>  ETEXI
>  
> +DEF("dirty-bitmap", HAS_ARG, QEMU_OPTION_dirty_bitmap,
> +"-dirty-bitmap 
> name=name[,file=file][,file_id=file_id][,drive=@var{id}]\n"
> +"  [,granulari

Re: [Qemu-devel] Steal time MSR not set properly during live migration?

2015-06-11 Thread Apollon Oikonomopoulos

On 15:12 Wed 03 Jun , Apollon Oikonomopoulos wrote:
> Any ideas?

As far as I understand, there is an issue when reading the MSR on the 
incoming side: there is a KVM_SET_MSRS vcpu ioctl issued by the main 
thread during initialization, that causes the initial vCPU steal time 
value to be set using the main thread's (and not the vCPU thread's) 
run_delay. Then, upon resuming execution, kvm_arch_load_vcpu uses the 
vCPU thread's run_delay to determine steal time, causing an overflow.  
The issue was introduced by commit 
917367aa968fd4fef29d340e0c7ec8c608dffaab.

For the full analysis, see https://bugs.debian.org/785557#64 and the 
followup e-mail.
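
As a rough illustration of why mixing the two threads' run_delay values
can overflow, here is a minimal sketch. It assumes steal time is
accumulated as an unsigned 64-bit delta between the current run_delay
and the last recorded one; accumulate_steal() is a hypothetical name
and this is not the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulation step: add the run_delay delta since the
 * last update to the running steal time. All values are unsigned, so
 * a "negative" delta wraps around to a huge number. */
static uint64_t accumulate_steal(uint64_t steal,
                                 uint64_t run_delay_now,
                                 uint64_t last_run_delay)
{
    return steal + (run_delay_now - last_run_delay);
}
```

If last_run_delay was taken from the main thread and is larger than the
vCPU thread's run_delay_now, the subtraction wraps and the reported
steal time jumps close to the maximum 64-bit value.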

Regards,
Apollon

[Qemu-devel] [PATCH] dma/rc4030: do multiple calls to address_space_rw when doing DMA transfers

2015-06-11 Thread Hervé Poussineau

This works around a bug in memory management.

To reproduce the problem, try to start the Windows NT 4.0/MIPS installer.
After loading some files, you should see a screen saying
"To set up Windows NT now, press ENTER."
Instead, you're greeted with an IRQL_NOT_LESS_OR_EQUAL bugcheck or an
Unknown Hard Error c221.

Signed-off-by: Hervé Poussineau 
---
 hw/dma/rc4030.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/hw/dma/rc4030.c b/hw/dma/rc4030.c
index 3efa6de..d265d6c 100644
--- a/hw/dma/rc4030.c
+++ b/hw/dma/rc4030.c
@@ -681,6 +681,7 @@ static void rc4030_do_dma(void *opaque, int n, uint8_t 
*buf, int len, int is_wri
 rc4030State *s = opaque;
 hwaddr dma_addr;
 int dev_to_mem;
+int i;
 
 s->dma_regs[n][DMA_REG_ENABLE] &= ~(DMA_FLAG_TC_INTR | DMA_FLAG_MEM_INTR | 
DMA_FLAG_ADDR_INTR);
 
@@ -699,8 +700,22 @@ static void rc4030_do_dma(void *opaque, int n, uint8_t 
*buf, int len, int is_wri
 dma_addr = s->dma_regs[n][DMA_REG_ADDRESS];
 
 /* Read/write data at right place */
+#if 1 /* workaround for a bug in memory management */
+for (i = 0; i < len; ) {
+int ncpy = DMA_PAGESIZE - (dma_addr & (DMA_PAGESIZE - 1));
+if (ncpy > len - i) {
+ncpy = len - i;
+}
+address_space_rw(&s->dma_as, dma_addr, MEMTXATTRS_UNSPECIFIED,
+ buf + i, ncpy, is_write);
+
+dma_addr += ncpy;
+i += ncpy;
+}
+#else
 address_space_rw(&s->dma_as, dma_addr, MEMTXATTRS_UNSPECIFIED,
  buf, len, is_write);
+#endif
 
 s->dma_regs[n][DMA_REG_ENABLE] |= DMA_FLAG_TC_INTR;
 s->dma_regs[n][DMA_REG_COUNT] -= len;
-- 
2.1.4
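
The page-bounded splitting done by the loop in the patch above can be sketched standalone. This is only an illustration: PAGE_SIZE_ASSUMED stands in for DMA_PAGESIZE (its value of 4096 is an assumption here), and the helper names are hypothetical, not part of hw/dma/rc4030.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u  /* assumption: stand-in for DMA_PAGESIZE */

/* Length of the next chunk starting at addr: it never crosses a page
 * boundary and never exceeds the number of bytes still to transfer. */
static size_t next_chunk_len(uint64_t addr, size_t remaining)
{
    size_t to_page_end =
        PAGE_SIZE_ASSUMED - (size_t)(addr & (PAGE_SIZE_ASSUMED - 1));
    return remaining < to_page_end ? remaining : to_page_end;
}

/* Count how many separate accesses a transfer of len bytes is split into. */
static unsigned count_chunks(uint64_t addr, size_t len)
{
    unsigned n = 0;
    size_t done = 0;

    while (done < len) {
        size_t ncpy = next_chunk_len(addr, len - done);
        assert(ncpy > 0);   /* guarantees forward progress */
        addr += ncpy;
        done += ncpy;
        n++;
    }
    return n;
}
```

A 0x300-byte transfer starting at 0x1F00 is split into two accesses (0x100 bytes up to the page boundary, then 0x200 bytes), while a page-aligned transfer of exactly one page stays a single access.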

Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps

2015-06-11 Thread Stefan Hajnoczi

The load/store API is not scalable when bitmaps are 1 MB or larger.

For example, a 500 GB disk image with 64 KB granularity requires a 1 MB
bitmap.  If a guest has several disk images of this size, then multiple
megabytes must be read to start the guest and written out to shut down
the guest.

By comparison, the L1 table for the 500 GB disk image is less than 8 KB.

I think something like qcow2-cache.c or metabitmaps should be used to
lazily read/write persistent bitmaps.  That way only small portions need
to be read/written at a time.
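
The sizes quoted above can be checked with a short calculation. This is a rough sketch that reads "500 GB" as 500 GiB and "64 KB" as 64 KiB, and assumes a qcow2-style layout with 8-byte table entries where each L2 table occupies one cluster; the function names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a dirty bitmap with one bit per granularity-sized chunk. */
static uint64_t bitmap_bytes(uint64_t disk_bytes, uint64_t granularity)
{
    assert(granularity > 0);
    uint64_t bits = (disk_bytes + granularity - 1) / granularity;
    return (bits + 7) / 8;
}

/* Bytes needed for an L1 table, assuming 8-byte entries: each L2 table
 * fills one cluster and therefore maps (cluster_size / 8) clusters. */
static uint64_t l1_table_bytes(uint64_t disk_bytes, uint64_t cluster_size)
{
    assert(cluster_size >= 8);
    uint64_t l2_coverage = (cluster_size / 8) * cluster_size;
    uint64_t l1_entries = (disk_bytes + l2_coverage - 1) / l2_coverage;
    return l1_entries * 8;
}
```

Under these assumptions, bitmap_bytes(500ULL << 30, 64 * 1024) gives 1,024,000 bytes (about 1 MB) and l1_table_bytes(500ULL << 30, 64 * 1024) gives 8,000 bytes, which matches the "1 MB" and "less than 8 KB" figures.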

Stefan



Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Kevin O'Connor

On Thu, Jun 11, 2015 at 08:34:56PM +0200, Laszlo Ersek wrote:
> On 06/11/15 19:46, Marcel Apfelbaum wrote:
> > On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
> >> On real machines, the firmware assigns the 4 - it's not a physical
> >> address; it's a logical address (like all bus numbers in PCI).  The
> >> firmware might assign a totally different number on the next boot.
> > Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly
> > try to understand it.
> > 
> > I looked up a real hardware machine and it seemed to me that the extra
> > pci root numbers
> > are provided in the ACPI tables, meaning by the vendor, not the fw.
> > In this case QEMU is the vendor, i440fx is the machine, right?
> > 
> > I am not aware that Seabios/OVMF are deciding the bus numbers for the
> > *PCI roots*.
> > They are doing it for the pci-2-pci bridges of course.
> > I saw that Seabios is trying to "guess" the root-buses by going over all
> > the 0-0xff range
> > and probing all the slots, looking for devices. So it expects the hw to
> > be hardwired regarding
> > PCI root buses.
> 
> This is exactly how I understood it.
> 
> We're not interested in placing such bus numbers in device paths that
> are assigned during PCI enumeration. (Like subordinate bus numbers.)
> We're talking about the root bus numbers.
> 
> OVMF implements the same kind of probing that SeaBIOS does (based on
> natural language description from Michael and Marcel, not on the actual
> code). Devices on the root buses respond without any prior bus number
> assignments.

Alas, that is not correct.  Coreboot supports several AMD boards that
have multiple southbridge chips which provide independent PCI root
buses.  These chips have to be configured and assigned a bus number
prior to use (which coreboot does).

-Kevin

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Kevin O'Connor

On Thu, Jun 11, 2015 at 08:46:01PM +0300, Marcel Apfelbaum wrote:
> On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
> >On real machines, the firmware assigns the 4 - it's not a physical
> >address; it's a logical address (like all bus numbers in PCI).  The
> >firmware might assign a totally different number on the next boot.
> Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly
> try to understand it.
> 
> I looked up a real hardware machine and it seemed to me that the extra pci 
> root numbers
> are provided in the ACPI tables, meaning by the vendor, not the fw.
> In this case QEMU is the vendor, i440fx is the machine, right?
> 
> I am not aware that Seabios/OVMF are deciding the bus numbers for the *PCI 
> roots*.

So, I'm also not an expert on this.  It seems to be a fairly esoteric
area of PC initialization.

My understanding is that extra PCI roots are configured by coreboot
outside of the normal PCI bridge mechanism.  They are configured by
assigning a base bus number and range (similar to the way PCI bridges
are configured).  All the PCI roots see all the PCI traffic, but they
only forward those requests that fall within their assigned bus range.
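
As a rough model of that forwarding rule (a sketch only; the struct and function names are hypothetical and do not come from coreboot or SeaBIOS), each extra root can be thought of as claiming an inclusive range of bus numbers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an extra PCI root: firmware assigns it a base bus
 * number and a limit, and it forwards only accesses inside that range. */
struct pci_root_range {
    uint8_t base_bus;   /* first bus number assigned to this root */
    uint8_t limit_bus;  /* last bus number assigned to this root */
};

/* Returns true if this root would forward a request aimed at 'bus'. */
static bool root_forwards(const struct pci_root_range *root, uint8_t bus)
{
    assert(root->base_bus <= root->limit_bus);
    return bus >= root->base_bus && bus <= root->limit_bus;
}
```

A root assigned buses 4 through 7 would forward a request for bus 5 and ignore one for bus 3; if the firmware assigns a different base on the next boot, the same hardware answers on different bus numbers.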

On each boot, coreboot might decide to assign a different bus id to
the extra roots (for example, if a device with a PCI bridge is
inserted and its bus allocation causes bus ids to shift).
Technically, coreboot could even change the order extra buses are
assigned bus ids, but doesn't today.

This was seen on several AMD systems - I'm told at least some Intel
systems have multiple root buses, but the bus numbers are just hard
wired.

> They are doing it for the pci-2-pci bridges of course.
> I saw that Seabios is trying to "guess" the root-buses by going over all the 
> 0-0xff range
> and probing all the slots, looking for devices. So it expects the hw to be 
> hardwired regarding
> PCI root buses.
> Is my understanding incorrect?

SeaBIOS doesn't assign the extra PCI bus numbers on real hardware (nor
even regular PCI bridge numbers) - that's all handled by coreboot.

Under coreboot, SeaBIOS scans the PCI buses to figure out what
coreboot assigned - it doesn't mean the assignments are hard wired.

-Kevin

Re: [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space

2015-06-11 Thread Michael R. Hines


On 06/11/2015 01:58 PM, Dr. David Alan Gilbert wrote:

* Michael R. Hines (mrhi...@linux.vnet.ibm.com) wrote:

On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

The 'offset' field in RDMACompress and 'current_addr' field
in RDMARegister are commented as being offsets within a particular
RAMBlock, however they appear to actually be offsets within the
ram_addr_t space.

The code currently assumes that the offsets on the source/destination
match, this change removes the need for the assumption for these
structures by translating the addresses into the ram_addr_t space of
the destination host.

Note: An alternative would be to change the fields to actually
take the data they're commented for; this would potentially be
simpler but would break stream compatibility for those cases
that currently work.

Signed-off-by: Dr. David Alan Gilbert 
---
  migration/rdma.c | 31 ---
  1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 9532461..cb66721 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
   */
  typedef struct QEMU_PACKED {
  union QEMU_PACKED {
-uint64_t current_addr;  /* offset into the ramblock of the chunk */
+uint64_t current_addr;  /* offset into the ram_addr_t space */
  uint64_t chunk; /* chunk to lookup if unregistering */
  } key;
  uint32_t current_index; /* which ramblock the chunk belongs to */
@@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
  uint64_t chunks;/* how many sequential chunks to register */
  } RDMARegister;

-static void register_to_network(RDMARegister *reg)
+static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
  {
+RDMALocalBlock *local_block;
+local_block  = &rdma->local_ram_blocks.block[reg->current_index];
+
+if (local_block->is_ram_block) {
+/*
+ * current_addr as passed in is an address in the local ram_addr_t
+ * space, we need to translate this for the destination
+ */
+reg->key.current_addr -= local_block->offset;
+reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
+}
  reg->key.current_addr = htonll(reg->key.current_addr);
  reg->current_index = htonl(reg->current_index);
  reg->chunks = htonll(reg->chunks);
@@ -436,13 +447,19 @@ static void network_to_register(RDMARegister *reg)
  typedef struct QEMU_PACKED {
  uint32_t value; /* if zero, we will madvise() */
  uint32_t block_idx; /* which ram block index */
-uint64_t offset;/* where in the remote ramblock this chunk */
+uint64_t offset;/* Address in remote ram_addr_t space */
  uint64_t length;/* length of the chunk */
  } RDMACompress;

-static void compress_to_network(RDMACompress *comp)
+static void compress_to_network(RDMAContext *rdma, RDMACompress *comp)
  {
  comp->value = htonl(comp->value);
+/*
+ * comp->offset as passed in is an address in the local ram_addr_t
+ * space, we need to translate this for the destination
+ */
+comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
+comp->offset += rdma->dest_blocks[comp->block_idx].offset;
  comp->block_idx = htonl(comp->block_idx);
  comp->offset = htonll(comp->offset);
  comp->length = htonll(comp->length);

So, why add the destination block's offset on the source side
just for it to be re-adjusted again when it gets to the destination side?

Can you just stop at this:

+reg->key.current_addr -= local_block->offset;

Without this:

+reg->key.current_addr +=
rdma->dest_blocks[reg->current_index].offset;

... on the source, followed by this on the destination:

+comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;

Without this:

+comp->offset += rdma->dest_blocks[comp->block_idx].offset;

Did I follow correctly?

Aren't both of those conversions happening on the source?
Anyway, I think what you're saying is that we change the value sent over
the network to be an offset within the block instead of an offset in
the whole ram_addr_t space (i.e. that's what happens if you don't
add back on the dest_blocks[].offset).


Yes, right. Can you skip adding/subtracting the local block offset on 
each side?


- Michael

Re: [Qemu-devel] [PATCH 0/4] PPC IBM 40p PReP emulation

2015-06-11 Thread Hervé Poussineau


Hi Artyom,

Le 11/06/2015 10:02, Artyom Tarasenko a écrit :

Hi Hervé,

On Wed, Jun 10, 2015 at 11:18 PM, Hervé Poussineau  wrote:

Hi,

This patchset adds the emulation of the IBM RS/6000 7020 (40p).


Well done! Congratulations on a good job!


The real machine is
able to run AIX (up to 4.3.3), Windows NT (up to 4.0 SP1), the beta of OS/2 
PowerPC,
Solaris, Linux, NetBSD/PReP ...



I've tested current emulation with Open Firmware PReP and with official 
firmware.
Patch 2 has been of a great help when using official firmware. However, if 
required,
I can drop it.

Linux kernel runs.
Windows NT starts up to the point where it wants to change endianness.
Other OSes have not been tested.


Solaris would likely have the same problem: it's little-endian on PReP.


To test, download firmware at http://tyom.de/qprepofw-serial-svn-3738.rom .
Thanks Artyom!


You are welcome. I see your machine is using a S3 graphic card. If you
like I can add a driver for it.
Not within the next days though. Out of curiosity: is the proprietary
firmware also able to use a Cirrus Logic card?


No, it only handles S3 graphics cards, as it uses some IBM/8514 commands.
It would probably be more interesting to add support for LSI 53c810 SCSI card, 
if possible.

Regards,

Hervé

Re: [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space

2015-06-11 Thread Dr. David Alan Gilbert

* Michael R. Hines (mrhi...@linux.vnet.ibm.com) wrote:
> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" 
> >
> >The 'offset' field in RDMACompress and 'current_addr' field
> >in RDMARegister are commented as being offsets within a particular
> >RAMBlock, however they appear to actually be offsets within the
> >ram_addr_t space.
> >
> >The code currently assumes that the offsets on the source/destination
> >match, this change removes the need for the assumption for these
> >structures by translating the addresses into the ram_addr_t space of
> >the destination host.
> >
> >Note: An alternative would be to change the fields to actually
> >take the data they're commented for; this would potentially be
> >simpler but would break stream compatibility for those cases
> >that currently work.
> >
> >Signed-off-by: Dr. David Alan Gilbert 
> >---
> >  migration/rdma.c | 31 ---
> >  1 file changed, 24 insertions(+), 7 deletions(-)
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index 9532461..cb66721 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader 
> >*control)
> >   */
> >  typedef struct QEMU_PACKED {
> >  union QEMU_PACKED {
> >-uint64_t current_addr;  /* offset into the ramblock of the chunk */
> >+uint64_t current_addr;  /* offset into the ram_addr_t space */
> >  uint64_t chunk; /* chunk to lookup if unregistering */
> >  } key;
> >  uint32_t current_index; /* which ramblock the chunk belongs to */
> >@@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
> >  uint64_t chunks;/* how many sequential chunks to register 
> > */
> >  } RDMARegister;
> >
> >-static void register_to_network(RDMARegister *reg)
> >+static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
> >  {
> >+RDMALocalBlock *local_block;
> >+local_block  = &rdma->local_ram_blocks.block[reg->current_index];
> >+
> >+if (local_block->is_ram_block) {
> >+/*
> >+ * current_addr as passed in is an address in the local ram_addr_t
> >+ * space, we need to translate this for the destination
> >+ */
> >+reg->key.current_addr -= local_block->offset;
> >+reg->key.current_addr += 
> >rdma->dest_blocks[reg->current_index].offset;
> >+}
> >  reg->key.current_addr = htonll(reg->key.current_addr);
> >  reg->current_index = htonl(reg->current_index);
> >  reg->chunks = htonll(reg->chunks);
> >@@ -436,13 +447,19 @@ static void network_to_register(RDMARegister *reg)
> >  typedef struct QEMU_PACKED {
> >  uint32_t value; /* if zero, we will madvise() */
> >  uint32_t block_idx; /* which ram block index */
> >-uint64_t offset;/* where in the remote ramblock this chunk */
> >+uint64_t offset;/* Address in remote ram_addr_t space */
> >  uint64_t length;/* length of the chunk */
> >  } RDMACompress;
> >
> >-static void compress_to_network(RDMACompress *comp)
> >+static void compress_to_network(RDMAContext *rdma, RDMACompress *comp)
> >  {
> >  comp->value = htonl(comp->value);
> >+/*
> >+ * comp->offset as passed in is an address in the local ram_addr_t
> >+ * space, we need to translate this for the destination
> >+ */
> >+comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
> >+comp->offset += rdma->dest_blocks[comp->block_idx].offset;
> >  comp->block_idx = htonl(comp->block_idx);
> >  comp->offset = htonll(comp->offset);
> >  comp->length = htonll(comp->length);
> 
> So, why add the destination block's offset on the source side
> just for it to be re-adjusted again when it gets to the destination side?
> 
> Can you just stop at this:
> 
> +reg->key.current_addr -= local_block->offset;
> 
> Without this:
> 
> +reg->key.current_addr +=
> rdma->dest_blocks[reg->current_index].offset;
> 
> ... on the source, followed by this on the destination:
> 
> +comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
> 
> Without this:
> 
> +comp->offset += rdma->dest_blocks[comp->block_idx].offset;
> 
> Did I follow correctly?

Aren't both of those conversions happening on the source?
Anyway, I think what you're saying is that we change the value sent over
the network to be an offset within the block instead of an offset in
the whole ram_addr_t space (i.e. that's what happens if you don't
add back on the dest_blocks[].offset).  As I commented in the commit
message, that would work but it would break compatibility with existing
RDMA migrations since the offset field would now have a different meaning.
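
To make the two options concrete, here is a minimal sketch of the arithmetic, using plain integers in place of the RDMA structures. The function names are illustrative and not taken from migration/rdma.c:

```c
#include <assert.h>
#include <stdint.h>

/* What the patch sends: the local ram_addr_t address rebased onto the
 * destination's offset for the same block. */
static uint64_t to_dest_addr(uint64_t local_addr, uint64_t local_block_offset,
                             uint64_t dest_block_offset)
{
    assert(local_addr >= local_block_offset);
    return local_addr - local_block_offset + dest_block_offset;
}

/* The alternative discussed: send only the offset within the block,
 * which changes the meaning of the field on the wire. */
static uint64_t to_block_relative(uint64_t local_addr,
                                  uint64_t local_block_offset)
{
    assert(local_addr >= local_block_offset);
    return local_addr - local_block_offset;
}
```

For a block at 0x4000 locally and 0x9000 on the destination, a local address of 0x5000 is sent as 0xA000 with the first form and as 0x1000 with the second. When both sides happen to use the same block offset, the first form equals the untranslated address, which is why it stays compatible with the cases that already work.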

Dave

> 
> >@@ -1288,7 +1305,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext 
> >*rdma)
> >  rdma->total_registrations--;
> >
> >  reg.key.chunk = chunk;
> >-register_to_network(&reg);
> >+register_to_network

Re: [Qemu-devel] [PATCH v2 10/12] Sort destination RAMBlocks to be the same as the source

2015-06-11 Thread Michael R. Hines


On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

Use the order of incoming RAMBlocks from the source to record
an index number; that then allows us to sort the destination
local RAMBlock list to match the source.

Now that the RAMBlocks are known to be in the same order, this
simplifies the RDMA Registration step which previously tried to
match RAMBlocks based on offset (which isn't guaranteed to match).

Looking at the existing compress code, I think it was erroneously
relying on an assumption of matching ordering, which this fixes.

Signed-off-by: Dr. David Alan Gilbert 
---
  migration/rdma.c | 101 ---
  trace-events |   2 ++
  2 files changed, 75 insertions(+), 28 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index f541586..92dc5c1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -224,6 +224,7 @@ typedef struct RDMALocalBlock {
  uint32_t  *remote_keys; /* rkeys for chunk-level registration */
  uint32_t   remote_rkey; /* rkeys for non-chunk-level registration 
*/
  intindex;   /* which block are we */
+unsigned int   src_index;   /* (Only used on dest) */
  bool   is_ram_block;
  intnb_chunks;
  unsigned long *transit_bitmap;
@@ -353,6 +354,9 @@ typedef struct RDMAContext {
  RDMALocalBlocks local_ram_blocks;
  RDMADestBlock  *dest_blocks;

+/* Index of the next RAMBlock received during block registration */
+unsigned intnext_src_index;
+
  /*
   * Migration on *destination* started.
   * Then use coroutine yield function.
@@ -561,6 +565,7 @@ static int rdma_add_block(RDMAContext *rdma, const char 
*block_name,
  block->offset = block_offset;
  block->length = length;
  block->index = local->nb_blocks;
+block->src_index = ~0U; /* Filled in by the receipt of the block list */
  block->nb_chunks = ram_chunk_index(host_addr, host_addr + length) + 1UL;
  block->transit_bitmap = bitmap_new(block->nb_chunks);
  bitmap_clear(block->transit_bitmap, 0, block->nb_chunks);
@@ -2909,6 +2914,14 @@ err_rdma_dest_wait:
  return ret;
  }

+static int dest_ram_sort_func(const void *a, const void *b)
+{
+unsigned int a_index = ((const RDMALocalBlock *)a)->src_index;
+unsigned int b_index = ((const RDMALocalBlock *)b)->src_index;
+
+return (a_index < b_index) ? -1 : (a_index != b_index);
+}
+
  /*
   * During each iteration of the migration, we listen for instructions
   * by the source VM to perform dynamic page registrations before they
@@ -2986,6 +2999,13 @@ static int qemu_rdma_registration_handle(QEMUFile *f, 
void *opaque)
  case RDMA_CONTROL_RAM_BLOCKS_REQUEST:
  trace_qemu_rdma_registration_handle_ram_blocks();

+/* Sort our local RAM Block list so it's the same as the source,
+ * we can do this since we've filled in a src_index in the list
+ * as we received the RAMBlock list earlier.
+ */
+qsort(rdma->local_ram_blocks.block,
+  rdma->local_ram_blocks.nb_blocks,
+  sizeof(RDMALocalBlock), dest_ram_sort_func);
  if (rdma->pin_all) {
  ret = qemu_rdma_reg_whole_ram_blocks(rdma);
  if (ret) {
@@ -3013,6 +3033,12 @@ static int qemu_rdma_registration_handle(QEMUFile *f, 
void *opaque)
  rdma->dest_blocks[i].length = local->block[i].length;

  dest_block_to_network(&rdma->dest_blocks[i]);
+trace_qemu_rdma_registration_handle_ram_blocks_loop(
+local->block[i].block_name,
+local->block[i].offset,
+local->block[i].length,
+local->block[i].local_host_addr,
+local->block[i].src_index);
  }

  blocks.len = rdma->local_ram_blocks.nb_blocks
@@ -3136,13 +3162,44 @@ out:
  return ret;
  }

+/* Destination:
+ * Called via a ram_control_load_hook during the initial RAM load section which
+ * lists the RAMBlocks by name.  This lets us know the order of the RAMBlocks
+ * on the source.
+ * We've already built our local RAMBlock list, but not yet sent the list to
+ * the source.
+ */
+static int rdma_block_notification_handle(QEMUFileRDMA *rfile, const char 
*name)
+{
+RDMAContext *rdma = rfile->rdma;
+int curr;
+int found = -1;
+
+/* Find the matching RAMBlock in our local list */
+for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
+if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
+found = curr;
+break;
+}
+}
+
+if (found == -1) {
+error_report("RAMBlock '%s' not found on destination", name);
+return -ENOENT;
+}
+
+rdma->local_ram_blocks.block[curr].src_index = rdma->next_src_index;
+trace_rdm

Re: [Qemu-devel] [PATCH v2 09/12] Rework ram block hash

2015-06-11 Thread Dr. David Alan Gilbert

* Michael R. Hines (mrhi...@linux.vnet.ibm.com) wrote:
> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" 
> >
> >RDMA uses a hash from block offset->RAM Block; this isn't needed
> >on the destination, and it becomes harder to maintain after the next
> >patch in the series that sorts the block list.
> >
> >Split the hash so that it's only generated on the source.
> >
> >Signed-off-by: Dr. David Alan Gilbert 
> >---
> >  migration/rdma.c | 32 
> >  1 file changed, 20 insertions(+), 12 deletions(-)
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index 8d99378..f541586 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -533,23 +533,22 @@ static int rdma_add_block(RDMAContext *rdma, const 
> >char *block_name,
> >   ram_addr_t block_offset, uint64_t length)
> >  {
> >  RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >-RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
> >-(void *)(uintptr_t)block_offset);
> >+RDMALocalBlock *block;
> >  RDMALocalBlock *old = local->block;
> >
> >-assert(block == NULL);
> >-
> >  local->block = g_malloc0(sizeof(RDMALocalBlock) * (local->nb_blocks + 
> > 1));
> >
> >  if (local->nb_blocks) {
> >  int x;
> >
> >-for (x = 0; x < local->nb_blocks; x++) {
> >-g_hash_table_remove(rdma->blockmap,
> >-(void *)(uintptr_t)old[x].offset);
> >-g_hash_table_insert(rdma->blockmap,
> >-(void *)(uintptr_t)old[x].offset,
> >-&local->block[x]);
> >+if (rdma->blockmap) {
> >+for (x = 0; x < local->nb_blocks; x++) {
> >+g_hash_table_remove(rdma->blockmap,
> >+(void *)(uintptr_t)old[x].offset);
> >+g_hash_table_insert(rdma->blockmap,
> >+(void *)(uintptr_t)old[x].offset,
> >+&local->block[x]);
> >+}
> >  }
> >  memcpy(local->block, old, sizeof(RDMALocalBlock) * 
> > local->nb_blocks);
> >  g_free(old);
> >@@ -571,7 +570,9 @@ static int rdma_add_block(RDMAContext *rdma, const char 
> >*block_name,
> >
> >  block->is_ram_block = local->init ? false : true;
> >
> >-g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
> >+if (rdma->blockmap) {
> >+g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
> >+}
> >
> >  trace_rdma_add_block(block_name, local->nb_blocks,
> >   (uintptr_t) block->local_host_addr,
> >@@ -607,7 +608,6 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
> >  RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >
> >  assert(rdma->blockmap == NULL);
> >-rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
> >  memset(local, 0, sizeof *local);
> >  qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
> >  trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
> >@@ -2292,6 +2292,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, 
> >Error **errp, bool pin_all)
> >  goto err_rdma_source_init;
> >  }
> >
> >+/* Build the hash that maps from offset to RAMBlock */
> >+rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
> >+for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
> >+g_hash_table_insert(rdma->blockmap,
> >+(void *)(uintptr_t)rdma->local_ram_blocks.block[idx].offset,
> >+&rdma->local_ram_blocks.block[idx]);
> >+}
> >+
> >  for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
> >  ret = qemu_rdma_reg_control(rdma, idx);
> >  if (ret) {
> 
> You didn't want to use the ID string as a key? I forget

I was trying not to; it sounded expensive to hash on strings
if we didn't need it.

> Reviewed-by: Michael R. Hines 

Thanks.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load

2015-06-11 Thread Dr. David Alan Gilbert

* Michael R. Hines (mrhi...@linux.vnet.ibm.com) wrote:
> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" 
> >
> >We need the names of RAMBlocks as they're loaded for RDMA,
> >reuse a slightly modified ram_control_load_hook:
> >   a) Pass a 'data' parameter to use for the name in the block-reg
> >  case
> >   b) Only some hook types now require the presence of a hook function.
> >
> >Signed-off-by: Dr. David Alan Gilbert 
> >---
> >  arch_init.c   |  4 +++-
> >  include/migration/migration.h |  2 +-
> >  include/migration/qemu-file.h | 14 +-
> >  migration/qemu-file.c | 16 +++-
> >  migration/rdma.c  | 28 ++--
> >  trace-events  |  2 +-
> >  6 files changed, 47 insertions(+), 19 deletions(-)
> >
> >diff --git a/arch_init.c b/arch_init.c
> >index d294474..dc9cc7e 100644
> >--- a/arch_init.c
> >+++ b/arch_init.c
> >@@ -1569,6 +1569,8 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> >version_id)
> >  error_report_err(local_err);
> >  }
> >  }
> >+ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
> >+  block->idstr);
> >  break;
> >  }
> >  }
> >@@ -1637,7 +1639,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> >version_id)
> >  break;
> >  default:
> >  if (flags & RAM_SAVE_FLAG_HOOK) {
> >-ram_control_load_hook(f, flags);
> >+ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
> >  } else {
> >  error_report("Unknown combination of migration flags: %#x",
> >   flags);
> >diff --git a/include/migration/migration.h b/include/migration/migration.h
> >index a6e025a..096e1ea 100644
> >--- a/include/migration/migration.h
> >+++ b/include/migration/migration.h
> >@@ -164,7 +164,7 @@ int migrate_decompress_threads(void);
> >
> >  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
> >  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> >-void ram_control_load_hook(QEMUFile *f, uint64_t flags);
> >+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
> >
> >  /* Whenever this is found in the data stream, the flags
> >   * will be passed to ram_control_load_hook in the incoming-migration
> >diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> >index a01c5b8..7aafe19 100644
> >--- a/include/migration/qemu-file.h
> >+++ b/include/migration/qemu-file.h
> >@@ -63,16 +63,20 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, 
> >struct iovec *iov,
> >  /*
> >   * This function provides hooks around different
> >   * stages of RAM migration.
> >+ * 'opaque' is the backend specific data in QEMUFile
> >+ * 'data' is call specific data associated with the 'flags' value
> >   */
> >-typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags);
> >+typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags,
> >+  void *data);
> >
> >  /*
> >   * Constants used by ram_control_* hooks
> >   */
> >-#define RAM_CONTROL_SETUP0
> >-#define RAM_CONTROL_ROUND1
> >-#define RAM_CONTROL_HOOK 2
> >-#define RAM_CONTROL_FINISH   3
> >+#define RAM_CONTROL_SETUP 0
> >+#define RAM_CONTROL_ROUND 1
> >+#define RAM_CONTROL_HOOK  2
> >+#define RAM_CONTROL_FINISH3
> >+#define RAM_CONTROL_BLOCK_REG 4
> >
> >  /*
> >   * This function allows override of where the RAM page
> >diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >index 2750365..5493977 100644
> >--- a/migration/qemu-file.c
> >+++ b/migration/qemu-file.c
> >@@ -128,7 +128,7 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t 
> >flags)
> >  int ret = 0;
> >
> >  if (f->ops->before_ram_iterate) {
> >-ret = f->ops->before_ram_iterate(f, f->opaque, flags);
> >+ret = f->ops->before_ram_iterate(f, f->opaque, flags, NULL);
> >  if (ret < 0) {
> >  qemu_file_set_error(f, ret);
> >  }
> >@@ -140,24 +140,30 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t 
> >flags)
> >  int ret = 0;
> >
> >  if (f->ops->after_ram_iterate) {
> >-ret = f->ops->after_ram_iterate(f, f->opaque, flags);
> >+ret = f->ops->after_ram_iterate(f, f->opaque, flags, NULL);
> >  if (ret < 0) {
> >  qemu_file_set_error(f, ret);
> >  }
> >  }
> >  }
> >
> >-void ram_control_load_hook(QEMUFile *f, uint64_t flags)
> >+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
> >  {
> >  int ret = -EINVAL;
> >
> >  if (f->ops->hook_ram_load) {
> >-ret = f->ops->hook_ram_load(f, f->opaque, flags);
> >+ret = f->ops->hook_ram_load(f, f->opaque

Re: [Qemu-devel] [PATCH v2 09/12] Rework ram block hash

2015-06-11 Thread Michael R. Hines


On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

RDMA uses a hash from block offset->RAM Block; this isn't needed
on the destination, and it becomes harder to maintain after the next
patch in the series that sorts the block list.

Split the hash so that it's only generated on the source.

Signed-off-by: Dr. David Alan Gilbert 
---
  migration/rdma.c | 32 
  1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 8d99378..f541586 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -533,23 +533,22 @@ static int rdma_add_block(RDMAContext *rdma, const char 
*block_name,
   ram_addr_t block_offset, uint64_t length)
  {
  RDMALocalBlocks *local = &rdma->local_ram_blocks;
-RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
-(void *)(uintptr_t)block_offset);
+RDMALocalBlock *block;
  RDMALocalBlock *old = local->block;

-assert(block == NULL);
-
  local->block = g_malloc0(sizeof(RDMALocalBlock) * (local->nb_blocks + 1));

  if (local->nb_blocks) {
  int x;

-for (x = 0; x < local->nb_blocks; x++) {
-g_hash_table_remove(rdma->blockmap,
-(void *)(uintptr_t)old[x].offset);
-g_hash_table_insert(rdma->blockmap,
-(void *)(uintptr_t)old[x].offset,
-&local->block[x]);
+if (rdma->blockmap) {
+for (x = 0; x < local->nb_blocks; x++) {
+g_hash_table_remove(rdma->blockmap,
+(void *)(uintptr_t)old[x].offset);
+g_hash_table_insert(rdma->blockmap,
+(void *)(uintptr_t)old[x].offset,
+&local->block[x]);
+}
  }
  memcpy(local->block, old, sizeof(RDMALocalBlock) * local->nb_blocks);
  g_free(old);
@@ -571,7 +570,9 @@ static int rdma_add_block(RDMAContext *rdma, const char 
*block_name,

  block->is_ram_block = local->init ? false : true;

-g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
+if (rdma->blockmap) {
+g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
+}

  trace_rdma_add_block(block_name, local->nb_blocks,
   (uintptr_t) block->local_host_addr,
@@ -607,7 +608,6 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
  RDMALocalBlocks *local = &rdma->local_ram_blocks;

  assert(rdma->blockmap == NULL);
-rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
  memset(local, 0, sizeof *local);
  qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
  trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
@@ -2292,6 +2292,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, 
Error **errp, bool pin_all)
  goto err_rdma_source_init;
  }

+/* Build the hash that maps from offset to RAMBlock */
+rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
+for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
+g_hash_table_insert(rdma->blockmap,
+(void *)(uintptr_t)rdma->local_ram_blocks.block[idx].offset,
+&rdma->local_ram_blocks.block[idx]);
+}
+
  for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
  ret = qemu_rdma_reg_control(rdma, idx);
  if (ret) {


You didn't want to use the ID string as a key? I forget

Reviewed-by: Michael R. Hines

Re: [Qemu-devel] [PATCH v2 08/12] Allow rdma_delete_block to work without the hash

2015-06-11 Thread Dr. David Alan Gilbert

* Michael R. Hines (mrhi...@linux.vnet.ibm.com) wrote:
> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" 
> >
> >In the next patch we remove the hash on the destination,
> >rdma_delete_block does two things with the hash which can be avoided:
> >   a) The caller passes the offset and rdma_delete_block looks it up
> >  in the hash; fixed by getting the caller to pass the block
> >   b) The hash gets recreated after deletion; fixed by making that
> >  conditional on the hash being initialised.
> >
> >While this function is currently only used during cleanup, Michael
> >asked that we keep it general for future dynamic block registration
> >work.
> >
> >Signed-off-by: Dr. David Alan Gilbert 
> >---
> >  migration/rdma.c | 27 ---
> >  trace-events |  2 +-
> >  2 files changed, 17 insertions(+), 12 deletions(-)
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index 396329c..8d99378 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -617,16 +617,19 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
> >  return 0;
> >  }
> >
> >-static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
> >+/*
> >+ * Note: If used outside of cleanup, the caller must ensure that the 
> >destination
> >+ * block structures are also updated
> >+ */
> >+static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
> >  {
> >  RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >-RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
> >-(void *) block_offset);
> >  RDMALocalBlock *old = local->block;
> >  int x;
> >
> >-assert(block);
> >-
> >+if (rdma->blockmap) {
> >+g_hash_table_remove(rdma->blockmap, (void 
> >*)(uintptr_t)block->offset);
> >+}
> >  if (block->pmr) {
> >  int j;
> >
> >@@ -659,8 +662,11 @@ static int rdma_delete_block(RDMAContext *rdma, 
> >ram_addr_t block_offset)
> >  g_free(block->block_name);
> >  block->block_name = NULL;
> >
> >-for (x = 0; x < local->nb_blocks; x++) {
> >-g_hash_table_remove(rdma->blockmap, (void 
> >*)(uintptr_t)old[x].offset);
> >+if (rdma->blockmap) {
> >+for (x = 0; x < local->nb_blocks; x++) {
> >+g_hash_table_remove(rdma->blockmap,
> >+(void *)(uintptr_t)old[x].offset);
> >+}
> >  }
> >
> >  if (local->nb_blocks > 1) {
> >@@ -682,8 +688,7 @@ static int rdma_delete_block(RDMAContext *rdma, 
> >ram_addr_t block_offset)
> >  local->block = NULL;
> >  }
> >
> >-trace_rdma_delete_block(local->nb_blocks,
> >-   (uintptr_t)block->local_host_addr,
> >+trace_rdma_delete_block(block, (uintptr_t)block->local_host_addr,
> > block->offset, block->length,
> >  (uintptr_t)(block->local_host_addr + 
> > block->length),
> > BITS_TO_LONGS(block->nb_chunks) *
> >@@ -693,7 +698,7 @@ static int rdma_delete_block(RDMAContext *rdma, 
> >ram_addr_t block_offset)
> >
> >  local->nb_blocks--;
> >
> >-if (local->nb_blocks) {
> >+if (local->nb_blocks && rdma->blockmap) {
> >  for (x = 0; x < local->nb_blocks; x++) {
> >  g_hash_table_insert(rdma->blockmap,
> >  (void *)(uintptr_t)local->block[x].offset,
> >@@ -2214,7 +2219,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
> >
> >  if (rdma->local_ram_blocks.block) {
> >  while (rdma->local_ram_blocks.nb_blocks) {
> >-rdma_delete_block(rdma, rdma->local_ram_blocks.block->offset);
> >+rdma_delete_block(rdma, &rdma->local_ram_blocks.block[0]);
> >  }
> >  }
> 
> Looks good overall. Maybe this is a silly question, but have you done
> a few migrations over actual RDMA hardware yet?

Yes, I wouldn't call it heavy testing but I've done a few basic f22 migrates
with load.

Dave

> 
> Reviewed-by: Michael R. Hines 
> 
> >diff --git a/trace-events b/trace-events
> >index 0f37a4b..7dff362 100644
> >--- a/trace-events
> >+++ b/trace-events
> >@@ -1452,7 +1452,7 @@ qemu_rdma_write_one_sendreg(uint64_t chunk, int len, 
> >int index, int64_t offset)
> >  qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 
> > " chunks, (%" PRIu64 " MB)"
> >  qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t 
> > offset) "Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, 
> > index: %d, offset: %" PRId64
> >  rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t 
> > offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added 
> > Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " 
> > end: %" PRIu64 " bits %" PRIu64 " chunks %d"
> >-rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, 
> >uint64_t end, uint64_t bits, int chunks) "Del

Re: [Qemu-devel] [SeaBIOS] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Laszlo Ersek

On 06/11/15 18:48, Kevin O'Connor wrote:
> On Thu, Jun 11, 2015 at 04:35:33PM +0200, Laszlo Ersek wrote:
>> On 06/11/15 15:58, Kevin O'Connor wrote:
>>> On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
>>>> The fixes solve the following issue:
>>>> The PXB device exposes a new pci root bridge with the
>>>> fw path:  /pci-root@4/..., in which 4 is the root bus number.
>>>> Before this patch the fw path was wrongly computed:
>>>> /pci-root@1/pci@i0cf8/...
>>>> Fix the above issues: Correct the bus number and remove the
>>>> extra host bridge description.
>>>
>>> Why is that wrong?  The previous path looks correct to me.
>>>
>>>> The IEEE Std 1275-1994:
>>>>
>>>>   IEEE Standard for Boot (Initialization Configuration)
>>>>   Firmware: Core Requirements and Practices
>>>>   3.2.1.1 Node names
>>>>   Each node in the device tree is identified by a node name
>>>>   using the following notation:
>>>>   driver-name@unit-address:device-arguments
>>>>
>>>>   The driver name field is a sequence of between one and 31
>>>>   letters [...]. By convention, this name includes the name of
>>>>   the device’s manufacturer and the device’s model name
>>>>   separated by a “,”.
>>>>
>>>>   The unit address field is the text representation of the
>>>>   physical address of the device within the address space
>>>>   defined by its parent node. The form of the text
>>>>   representation is bus-dependent.
>>>
>>> Note the "physical address" part in the above.  Your patch changes the
>>> "pci-root@" syntax to use a logical address instead of a physical
>>> address.  That is, unless I've missed something, SeaBIOS today uses a
>>> physical address (the n'th root bus) and the patch would change it to
>>> use a logical address.
>>>
>>> One of the goals of using an "openfirmware" like address was so that
>>> they would be stable across boots (the same mechanism is also used
>>> with coreboot).  Using a physical address is key for this, because
>>> simply adding or removing a PCI device could cause the logical PCI
>>> bridge enumeration to change - and that would mess up the bootorder
>>> list if it was based on logical addresses.
>>
>> There are two questions here. The first is the inclusion of the
>> "pci@i0cf8" node even if a "pci-root@x" node is present in front of it.
>> The hunk that changes that is not your main concern, right? (And Marcel
>> just described that hunk in more detail.)
>>
>> The other question is how "x" is selected in "pci-root@x".
>>
>> On the QEMU side, and in OVMF, "x" is keyed off of the bus_nr property.
>> If you change that property from (say) 3 to 4, then the device paths
>> exported by QEMU will change. However, the location (in the PCI
>> hierarchy) of all the affected devices will *also* change at once, and
>> their auto-enumerated, firmware-side device paths will reflect that.
>> Therefore the new "bootorder" fw_cfg entries will match the freshly
>> generated firmware-side device paths.
>>
>> So why is this not stable? If you change the hardware without
>> automatically updating any stashed firmware-side device paths, then
>> things will fall apart without "bootorder" entries in the picture anyway.
>>
>> Also, assuming you key off "x" of the running counter that counts root
>> buses as they are found during enumeration, that's a possibility too,
>> but I don't see how it gives more stability. If you insert a new root
>> bus (with a device on it) between two preexistent ones, that will offset
>> all the "x" values for the root buses that come after it by one.
> 
> The SeaBIOS code is used on both virtual machines and real machines.
> The bus number is something that is generated by software

Not the root bus numbers, as far as I understand.

(Please see the rest of my reply in the other sub-thread.)

Thanks
Laszlo
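
For readers following the syntax question, here is a minimal sketch of the
root-bus path prefix as the patch under discussion proposes it. It is not
SeaBIOS code: the helper name is hypothetical, and it assumes that root bus 0
keeps the /pci@i0cf8 form that appears elsewhere in this thread and that the
bus number is printed in hex.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not SeaBIOS code: write the first path component
 * for a device that sits on the given root bus.
 * Assumption: root bus 0 is the default host bridge ("/pci@i0cf8") and
 * any other root bus is named by its bus number, printed in hex.
 */
static int fw_root_prefix(char *buf, size_t len, unsigned root_bus)
{
    assert(buf != NULL && len > 0);
    if (root_bus == 0) {
        return snprintf(buf, len, "/pci@i0cf8");
    }
    /* Extra root bus: no host bridge node is appended after it. */
    return snprintf(buf, len, "/pci-root@%x", root_bus);
}
```

With bus_nr 4 this yields "/pci-root@4", to which the per-device components
would then be appended.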

> and it is
> not assured to be stable between boots.  (For example, if someone adds
> a PCI device to their machine between boots then every bus number in
> the system might be different on the next boot.)  The open firmware
> paths go to great length to avoid arbitrary bus numbers today - for
> example:
> 
> /pci@i0cf8/pci-bridge@1/usb@1,2/hub@3/storage@1/channel@0/disk@0,0
> 
> Given the complexity to avoid arbitrary bus numbers I'm confused why
> one would want to add them.
> 
>> In UEFI at least (I'm not speaking about OVMF in particular, but the
>> UEFI spec), there is a "short-form device path" concept for hard drive
>> and USB boot options. For hard disks, it is practically a relative
>> device path that lacks the path fragment from the root node until just
>> before the GPT partition identifier. The idea being, if you plug your
>> SCSI controller in another PCI slot, the change in the full device path
>> will be local to the path fragment that is not captured in the
>> (persistent) boot option. The GPT GUID can identify the partition
>> uniquely in the system w

Re: [Qemu-devel] [PATCH v2 04/12] rdma typos

2015-06-11 Thread Dr. David Alan Gilbert

* Michael R. Hines (mrhi...@linux.vnet.ibm.com) wrote:
> On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" 
> >
> >A couple of typo fixes.
> >
> >Signed-off-by: Dr. David Alan Gilbert 
> >---
> >  migration/rdma.c | 6 +++---
> >  trace-events | 4 ++--
> >  2 files changed, 5 insertions(+), 5 deletions(-)
> >
> >diff --git a/migration/rdma.c b/migration/rdma.c
> >index bc73ff8..44ed996 100644
> >--- a/migration/rdma.c
> >+++ b/migration/rdma.c
> >@@ -1215,7 +1215,7 @@ const char *print_wrid(int wrid)
> >
> >  /*
> >   * Perform a non-optimized memory unregistration after every transfer
> >- * for demonsration purposes, only if pin-all is not requested.
> >+ * for demonstration purposes, only if pin-all is not requested.
> >   *
> >   * Potential optimizations:
> >   * 1. Start a new thread to run this function continuously
> >@@ -3279,7 +3279,7 @@ static void rdma_accept_incoming_migration(void 
> >*opaque)
> >  QEMUFile *f;
> >  Error *local_err = NULL, **errp = &local_err;
> >
> >-trace_qemu_dma_accept_incoming_migration();
> >+trace_qemu_rdma_accept_incoming_migration();
> >  ret = qemu_rdma_accept(rdma);
> >
> >  if (ret) {
> >@@ -3287,7 +3287,7 @@ static void rdma_accept_incoming_migration(void 
> >*opaque)
> >  return;
> >  }
> >
> >-trace_qemu_dma_accept_incoming_migration_accepted();
> >+trace_qemu_rdma_accept_incoming_migration_accepted();
> >
> >  f = qemu_fopen_rdma(rdma, "rb");
> >  if (f == NULL) {
> >diff --git a/trace-events b/trace-events
> >index 2662ffa..8b468fe 100644
> >--- a/trace-events
> >+++ b/trace-events
> >@@ -1398,8 +1398,8 @@ migrate_pending(uint64_t size, uint64_t max) "pending 
> >size %" PRIu64 " max %" PR
> >  migrate_transferred(uint64_t tranferred, uint64_t time_spent, double 
> > bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " 
> > bandwidth %g max_size %" PRId64
> >
> >  # migration/rdma.c
> >-qemu_dma_accept_incoming_migration(void) ""
> >-qemu_dma_accept_incoming_migration_accepted(void) ""
> >+qemu_rdma_accept_incoming_migration(void) ""
> >+qemu_rdma_accept_incoming_migration_accepted(void) ""
> 
> What happened to the actual message inside the quotes? =)

You don't need them in most cases; the trace tools normally print
the name of the event.

Dave

> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2 08/12] Allow rdma_delete_block to work without the hash

2015-06-11 Thread Michael R. Hines


On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

In the next patch we remove the hash on the destination,
rdma_delete_block does two things with the hash which can be avoided:
   a) The caller passes the offset and rdma_delete_block looks it up
  in the hash; fixed by getting the caller to pass the block
   b) The hash gets recreated after deletion; fixed by making that
  conditional on the hash being initialised.

While this function is currently only used during cleanup, Michael
asked that we keep it general for future dynamic block registration
work.

Signed-off-by: Dr. David Alan Gilbert 
---
  migration/rdma.c | 27 ---
  trace-events |  2 +-
  2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 396329c..8d99378 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -617,16 +617,19 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
  return 0;
  }

-static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
+/*
+ * Note: If used outside of cleanup, the caller must ensure that the 
destination
+ * block structures are also updated
+ */
+static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
  {
  RDMALocalBlocks *local = &rdma->local_ram_blocks;
-RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
-(void *) block_offset);
  RDMALocalBlock *old = local->block;
  int x;

-assert(block);
-
+if (rdma->blockmap) {
+g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)block->offset);
+}
  if (block->pmr) {
  int j;

@@ -659,8 +662,11 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t 
block_offset)
  g_free(block->block_name);
  block->block_name = NULL;

-for (x = 0; x < local->nb_blocks; x++) {
-g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
+if (rdma->blockmap) {
+for (x = 0; x < local->nb_blocks; x++) {
+g_hash_table_remove(rdma->blockmap,
+(void *)(uintptr_t)old[x].offset);
+}
  }

  if (local->nb_blocks > 1) {
@@ -682,8 +688,7 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t 
block_offset)
  local->block = NULL;
  }

-trace_rdma_delete_block(local->nb_blocks,
-   (uintptr_t)block->local_host_addr,
+trace_rdma_delete_block(block, (uintptr_t)block->local_host_addr,
 block->offset, block->length,
  (uintptr_t)(block->local_host_addr + 
block->length),
 BITS_TO_LONGS(block->nb_chunks) *
@@ -693,7 +698,7 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t 
block_offset)

  local->nb_blocks--;

-if (local->nb_blocks) {
+if (local->nb_blocks && rdma->blockmap) {
  for (x = 0; x < local->nb_blocks; x++) {
  g_hash_table_insert(rdma->blockmap,
  (void *)(uintptr_t)local->block[x].offset,
@@ -2214,7 +2219,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)

  if (rdma->local_ram_blocks.block) {
  while (rdma->local_ram_blocks.nb_blocks) {
-rdma_delete_block(rdma, rdma->local_ram_blocks.block->offset);
+rdma_delete_block(rdma, &rdma->local_ram_blocks.block[0]);
  }
  }


Looks good overall. Maybe this is a silly question, but have you done
a few migrations over actual RDMA hardware yet?

Reviewed-by: Michael R. Hines 


diff --git a/trace-events b/trace-events
index 0f37a4b..7dff362 100644
--- a/trace-events
+++ b/trace-events
@@ -1452,7 +1452,7 @@ qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int 
index, int64_t offset)
  qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " chunks, 
(%" PRIu64 " MB)"
  qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) "Entire chunk 
is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, offset: %" PRId64
  rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) 
"Added Block: '%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 
" bits %" PRIu64 " chunks %d"
-rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: 
%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 
" chunks %d"
+rdma_delete_block(void *block, uint64_t addr, uint64_t offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Deleted Block: 
%p, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 
" chunks %d"
  rdma_start_incoming_migration(void) ""
  rdma_start_incoming_migration_after_dest_init(void) ""
  rdma_start_incoming_migration_after_rdma_list

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Laszlo Ersek

On 06/11/15 19:46, Marcel Apfelbaum wrote:
> On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
>> On Thu, Jun 11, 2015 at 05:36:06PM +0300, Marcel Apfelbaum wrote:
>>> On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
>>>> On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
>>>>> On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
>>>>>> On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
>>>>>>> The fixes solve the following issue:
>>>>>>> The PXB device exposes a new pci root bridge with the
>>>>>>> fw path:  /pci-root@4/..., in which 4 is the root bus number.
>>>>>>> Before this patch the fw path was wrongly computed:
>>>>>>>  /pci-root@1/pci@i0cf8/...
>>>>>>> Fix the above issues: Correct the bus number and remove the
>>>>>>> extra host bridge description.
>>>>>>
>>>>>> Why is that wrong?  The previous path looks correct to me.
>>>>> The prev path includes both the extra root bridge and *then* the
>>>>> usual host bridge.
>>>>>   /pci-root@1/pci@i0cf8/   ...
>>>>>  ^ new   ^ regular  ^ devices
>>>>>
>>>>> Since the new pci root bridge (and bus) is in "parallel" with the
>>>>> regular one,
>>>>> it is not correct to add it to the path.
>>>>>
>>>>> The architecture is:
>>>>>   //devices...
>>>>>   /extra root bridge/devices...
>>>>>   /extra root bridge/devices...
>>>>> And not
>>>>> /extra root bridge///devices
>>>>
>>>> Your patch changed both the "/extra root bridge/devices..." part and
>>>> the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
>>>> correct IMO.
>>> Why? @1 should be the unit address which is the text representation
>>> of the physical address, in our case the slot. Since the bus number
>>> in our case is 4, I think /pci-root@4/ is the 'correct' address.
>>
>> On real machines, the firmware assigns the 4 - it's not a physical
>> address; it's a logical address (like all bus numbers in PCI).  The
>> firmware might assign a totally different number on the next boot.
> Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly
> try to understand it.
> 
> I looked up a real hardware machine and it seemed to me that the extra
> pci root numbers
> are provided in the ACPI tables, meaning by the vendor, not the fw.
> In this case QEMU is the vendor, i440fx is the machine, right?
> 
> I am not aware that Seabios/OVMF are deciding the bus numbers for the
> *PCI roots*.
> They are doing it for the pci-2-pci bridges of course.
> I saw that Seabios is trying to "guess" the root-buses by going over all
> the 0-0xff range
> and probing all the slots, looking for devices. So it expects the hw to
> be hardwired regarding
> PCI root buses.

This is exactly how I understood it.

We're not interested in placing such bus numbers in device paths that
are assigned during PCI enumeration. (Like subordinate bus numbers.)
We're talking about the root bus numbers.

OVMF implements the same kind of probing that SeaBIOS does (based on
natural language description from Michael and Marcel, not on the actual
code). Devices on the root buses respond without any prior bus number
assignments. Therefore it makes sense to place those root bus numbers
into device paths.

The bus numbers assignable by the firmware come from the intervals
*between* the (fixed-in-hardware) root bus numbers. As I understand it,
for two adjacent root bus numbers R1 and R2, the both-sides-exclusive
interval (R1, R2) is available for secondary bus number assignment, to
non-root buses that are (recursively) behind PCI bridges that hang off
the R1 root bus (ie. the LHS of the interval).

We don't care about such firmware-assigned bus numbers at all, but R1,
R2 etc. must be communicated to the firmware *somehow* in order to
identify devices for booting.

Since R1, R2 etc are not *assigned* by the firmware, only detected (the
assignment happens in QEMU, and also by the hw vendor in case of
physical hardware), R1, R2 etc are permanent as long as the physical
configuration does not change. Hence they qualify for the physical
addressing nature of OFW device paths. We've just been looking for a
*syntax* to express them.

> Is my understanding incorrect?

FWIW I'm relieved that at least the two of us have been understanding
each other ;)

Laszlo
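
The interval rule described above can be sketched in a few lines. This is only
an illustration under stated assumptions: the type and function names are
hypothetical (they are not taken from SeaBIOS, OVMF or QEMU), the root bus
numbers are assumed to be already detected and sorted ascending, and for the
last root bus the available range is assumed to run up to 0xff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a run of bus numbers; empty when count == 0. */
typedef struct {
    uint8_t first;
    unsigned count;
} BusRange;

/*
 * Given root bus numbers sorted ascending, return the bus numbers the
 * firmware may assign to buses behind bridges on roots[i]: everything
 * strictly between roots[i] and the next root bus.
 * Assumption: for the last root bus the range runs up to 0xff.
 */
static BusRange secondary_bus_range(const uint8_t *roots, size_t n, size_t i)
{
    BusRange r = { 0, 0 };
    unsigned lo, hi;

    assert(roots != NULL && i < n);
    lo = roots[i] + 1u;                          /* first candidate bus */
    hi = (i + 1 < n) ? roots[i + 1] : 0x100u;    /* exclusive upper bound */
    if (lo < hi) {
        r.first = (uint8_t)lo;
        r.count = hi - lo;
    }
    return r;
}
```

For root buses 0, 4 and 5, this gives buses 1..3 to bridges behind root bus 0
and nothing to bridges behind root bus 4, which is why the root bus numbers
themselves have to be chosen with enough room between them.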

Re: [Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load

2015-06-11 Thread Michael R. Hines


On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

We need the names of RAMBlocks as they're loaded for RDMA,
reuse a slightly modified ram_control_load_hook:
   a) Pass a 'data' parameter to use for the name in the block-reg
  case
   b) Only some hook types now require the presence of a hook function.

Signed-off-by: Dr. David Alan Gilbert 
---
  arch_init.c   |  4 +++-
  include/migration/migration.h |  2 +-
  include/migration/qemu-file.h | 14 +-
  migration/qemu-file.c | 16 +++-
  migration/rdma.c  | 28 ++--
  trace-events  |  2 +-
  6 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d294474..dc9cc7e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1569,6 +1569,8 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
  error_report_err(local_err);
  }
  }
+ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
+  block->idstr);
  break;
  }
  }
@@ -1637,7 +1639,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
  break;
  default:
  if (flags & RAM_SAVE_FLAG_HOOK) {
-ram_control_load_hook(f, flags);
+ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
  } else {
  error_report("Unknown combination of migration flags: %#x",
   flags);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index a6e025a..096e1ea 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -164,7 +164,7 @@ int migrate_decompress_threads(void);

  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
-void ram_control_load_hook(QEMUFile *f, uint64_t flags);
+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);

  /* Whenever this is found in the data stream, the flags
   * will be passed to ram_control_load_hook in the incoming-migration
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index a01c5b8..7aafe19 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -63,16 +63,20 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, 
struct iovec *iov,
  /*
   * This function provides hooks around different
   * stages of RAM migration.
+ * 'opaque' is the backend specific data in QEMUFile
+ * 'data' is call specific data associated with the 'flags' value
   */
-typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags);
+typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags,
+  void *data);

  /*
   * Constants used by ram_control_* hooks
   */
-#define RAM_CONTROL_SETUP    0
-#define RAM_CONTROL_ROUND    1
-#define RAM_CONTROL_HOOK     2
-#define RAM_CONTROL_FINISH   3
+#define RAM_CONTROL_SETUP     0
+#define RAM_CONTROL_ROUND     1
+#define RAM_CONTROL_HOOK      2
+#define RAM_CONTROL_FINISH    3
+#define RAM_CONTROL_BLOCK_REG 4

  /*
   * This function allows override of where the RAM page
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2750365..5493977 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -128,7 +128,7 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
  int ret = 0;

  if (f->ops->before_ram_iterate) {
-ret = f->ops->before_ram_iterate(f, f->opaque, flags);
+ret = f->ops->before_ram_iterate(f, f->opaque, flags, NULL);
  if (ret < 0) {
  qemu_file_set_error(f, ret);
  }
@@ -140,24 +140,30 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t 
flags)
  int ret = 0;

  if (f->ops->after_ram_iterate) {
-ret = f->ops->after_ram_iterate(f, f->opaque, flags);
+ret = f->ops->after_ram_iterate(f, f->opaque, flags, NULL);
  if (ret < 0) {
  qemu_file_set_error(f, ret);
  }
  }
  }

-void ram_control_load_hook(QEMUFile *f, uint64_t flags)
+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
  {
  int ret = -EINVAL;

  if (f->ops->hook_ram_load) {
-ret = f->ops->hook_ram_load(f, f->opaque, flags);
+ret = f->ops->hook_ram_load(f, f->opaque, flags, data);
  if (ret < 0) {
  qemu_file_set_error(f, ret);
  }
  } else {
-qemu_file_set_error(f, ret);
+/*
+ * Hook is a hook specifically requested by the source sending a flag
+ * that expects there to be a hook on the destination.
+ */
+if (flags == RAM_CONTROL_HOOK) {
+qemu_file_set_error(f, ret);
+}
  }
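
The behaviour change in ram_control_load_hook above (a missing hook is only an
error for RAM_CONTROL_HOOK) can be shown in isolation. This is a cut-down
sketch, not the QEMU code: FakeFile, HookFn and load_hook are hypothetical
stand-ins for QEMUFile, QEMURamHookFunc and ram_control_load_hook, and only
the two constants needed here are reproduced from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_CONTROL_HOOK      2
#define RAM_CONTROL_BLOCK_REG 4

/* Hypothetical, cut-down stand-ins for QEMUFile and its hook type. */
typedef int (HookFn)(void *opaque, uint64_t flags, void *data);

typedef struct {
    HookFn *hook_ram_load;  /* NULL when the transport has no hook */
    void *opaque;
    int last_error;         /* 0 while no error has been recorded */
} FakeFile;

static void load_hook(FakeFile *f, uint64_t flags, void *data)
{
    assert(f != NULL);
    if (f->hook_ram_load) {
        int ret = f->hook_ram_load(f->opaque, flags, data);
        if (ret < 0) {
            f->last_error = ret;
        }
    } else if (flags == RAM_CONTROL_HOOK) {
        /* The source explicitly asked for a hook that does not exist. */
        f->last_error = -EINVAL;
    }
    /* Any other flag without a hook, e.g. BLOCK_REG, is silently ignored. */
}
```

The point of the split is that ram_load can call the BLOCK_REG hook for every
block unconditionally, and transports without a hook are unaffected.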

Re: [Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space

2015-06-11 Thread Michael R. Hines


On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

The 'offset' field in RDMACompress and 'current_addr' field
in RDMARegister are commented as being offsets within a particular
RAMBlock, however they appear to actually be offsets within the
ram_addr_t space.

The code currently assumes that the offsets on the source/destination
match, this change removes the need for the assumption for these
structures by translating the addresses into the ram_addr_t space of
the destination host.

Note: An alternative would be to change the fields to actually
take the data they're commented for; this would potentially be
simpler but would break stream compatibility for those cases
that currently work.

Signed-off-by: Dr. David Alan Gilbert 
---
  migration/rdma.c | 31 ---
  1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 9532461..cb66721 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
   */
  typedef struct QEMU_PACKED {
  union QEMU_PACKED {
-uint64_t current_addr;  /* offset into the ramblock of the chunk */
+uint64_t current_addr;  /* offset into the ram_addr_t space */
  uint64_t chunk; /* chunk to lookup if unregistering */
  } key;
  uint32_t current_index; /* which ramblock the chunk belongs to */
@@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
  uint64_t chunks;/* how many sequential chunks to register */
  } RDMARegister;

-static void register_to_network(RDMARegister *reg)
+static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
  {
+RDMALocalBlock *local_block;
+local_block  = &rdma->local_ram_blocks.block[reg->current_index];
+
+if (local_block->is_ram_block) {
+/*
+ * current_addr as passed in is an address in the local ram_addr_t
+ * space, we need to translate this for the destination
+ */
+reg->key.current_addr -= local_block->offset;
+reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
+}
  reg->key.current_addr = htonll(reg->key.current_addr);
  reg->current_index = htonl(reg->current_index);
  reg->chunks = htonll(reg->chunks);
@@ -436,13 +447,19 @@ static void network_to_register(RDMARegister *reg)
  typedef struct QEMU_PACKED {
  uint32_t value; /* if zero, we will madvise() */
  uint32_t block_idx; /* which ram block index */
-uint64_t offset;/* where in the remote ramblock this chunk */
+uint64_t offset;/* Address in remote ram_addr_t space */
  uint64_t length;/* length of the chunk */
  } RDMACompress;

-static void compress_to_network(RDMACompress *comp)
+static void compress_to_network(RDMAContext *rdma, RDMACompress *comp)
  {
  comp->value = htonl(comp->value);
+/*
+ * comp->offset as passed in is an address in the local ram_addr_t
+ * space, we need to translate this for the destination
+ */
+comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
+comp->offset += rdma->dest_blocks[comp->block_idx].offset;
  comp->block_idx = htonl(comp->block_idx);
  comp->offset = htonll(comp->offset);
  comp->length = htonll(comp->length);


So, why add the destination block's offset on the source side
just for it to be re-adjusted again when it gets to the destination side?

Can you just stop at this:

+reg->key.current_addr -= local_block->offset;

Without this:

+reg->key.current_addr += 
rdma->dest_blocks[reg->current_index].offset;


... on the source, followed by this on the destination:

+comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;

Without this:

+comp->offset += rdma->dest_blocks[comp->block_idx].offset;

Did I follow correctly?


@@ -1288,7 +1305,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
  rdma->total_registrations--;

  reg.key.chunk = chunk;
-register_to_network(&reg);
+register_to_network(rdma, &reg);
  ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
  &resp, NULL, NULL);
  if (ret < 0) {
@@ -1909,7 +1926,7 @@ retry:
  trace_qemu_rdma_write_one_zero(chunk, sge.length,
 current_index, current_addr);

-compress_to_network(&comp);
+compress_to_network(rdma, &comp);
  ret = qemu_rdma_exchange_send(rdma, &head,
  (uint8_t *) &comp, NULL, NULL, NULL);

@@ -1936,7 +1953,7 @@ retry:
  trace_qemu_rdma_write_one_sendreg(chunk, sge.length, 
current_index,
current_addr);

-register_to_network(&reg);
+register_to_network(rdma, &reg);
  ret = qemu_rdma_excha

Re: [Qemu-devel] [PATCH v2 0/2] Makefile: Generate tag files under $SRC_PATH

2015-06-11 Thread John Snow

On 06/11/2015 04:41 AM, Fam Zheng wrote:
> On Fri, 05/22 13:35, Fam Zheng wrote:
> 
> Ping :)
> 
> Fam
> 
>>
>>
>> Fam Zheng (2):
>>   Makefile: Fix "make cscope TAGS"
>>   Makefile: Add "make ctags"
>>
>>  Makefile | 20 +++-
>>  1 file changed, 15 insertions(+), 5 deletions(-)
>>
>> -- 
>> 2.4.1
>>
>>
> 

Looks good to me -- this has long been a major annoyance of mine when I
want to review code (and zip around) without trying to configure it first.

Reviewed-by: John Snow

Re: [Qemu-devel] [RFC v6 0/2] monitor: add memory search commands s, sp

2015-06-11 Thread Luiz Capitulino

On Thu, 28 May 2015 16:18:41 -0400
Luiz Capitulino  wrote:

> On Mon, 18 May 2015 13:22:16 +0200
> hw.clau...@gmail.com wrote:
> 
> > From: Claudio Fontana 
> > 
> > This is the latest iteration of the memory search patch,
> > including a trivial replacement for the memmem function for systems
> > which don't provide one (notably Windows).
> > 
> > It detects the presence of memmem in configure and sets CONFIG_MEMMEM,
> > providing a trivial implementation for the !CONFIG_MEMMEM case.
> > 
> > The new code is MIT licensed, following usage of other files in the same
> > directory dealing with replacement functions (osdep, oslib, getauxval etc),
> > and to maximize reusability.
> > 
> > I have tested this in both CONFIG_MEMMEM defined/undefined scenarios,
> > but more feedback and testing is welcome of course.
> > 
> > changes from v5:
> > dropped the import from gnulib and implemented a trivial replacement.
> > 
> > changes from v4:
> > made into a series of two patches.
> > Introduced a memmem replacement function (import from gnulib)
> > and detection code in configure.
> > 
> > changes from v3:
> > initialize pointer variable to NULL to finally get rid of spurious warning
> > 
> > changes from v2:
> > move code to try to address spurious warning
> > 
> > changes from v1:
> > make checkpatch happy by adding braces here and there.
> > 
> > 
> > Claudio Fontana (2):
> >   util: add memmem replacement function
> >   monitor: add memory search commands s, sp
> 
> Applied to the qmp branch, thanks.


Unfortunately, I'm quite busy and won't have time to push this
through my tree. Markus is going to pick up this series soon.

Acked-by: Luiz Capitulino 

> 
> > 
> >  configure|  15 ++
> >  hmp-commands.hx  |  28 +++
> >  include/qemu/osdep.h |   4 ++
> >  monitor.c| 140 
> > +++
> >  util/Makefile.objs   |   1 +
> >  util/memmem.c|  62 +++
> >  6 files changed, 250 insertions(+)
> >  create mode 100644 util/memmem.c
> > 
>
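
For readers following the thread: the cover letter above mentions a trivial memmem replacement for the !CONFIG_MEMMEM case. The sketch below shows what such a fallback could look like. It is not the code from util/memmem.c in this series; the name sketch_memmem is hypothetical and was chosen so it does not clash with a libc-provided memmem.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for memmem(): return a pointer to the first
 * occurrence of needle (n_len bytes) inside haystack (h_len bytes),
 * or NULL if there is none.  Naive O(h_len * n_len) scan.
 */
static void *sketch_memmem(const void *haystack, size_t h_len,
                           const void *needle, size_t n_len)
{
    const unsigned char *h = haystack;
    size_t i;

    if (n_len == 0) {
        return (void *)h;       /* an empty needle matches at the start */
    }
    if (n_len > h_len) {
        return NULL;            /* needle cannot fit in the haystack */
    }
    /* Try every start position at which the needle still fits. */
    for (i = 0; i <= h_len - n_len; i++) {
        if (memcmp(h + i, needle, n_len) == 0) {
            return (void *)(h + i);
        }
    }
    return NULL;
}
```

A naive scan like this is probably adequate for an interactive monitor command; a faster algorithm would only matter for very large search ranges.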

Re: [Qemu-devel] [PATCH 0/2] Use bool for QBool

2015-06-11 Thread Luiz Capitulino

On Thu, 28 May 2015 15:54:12 -0400
Luiz Capitulino  wrote:

> On Fri, 15 May 2015 16:24:58 -0600
> Eric Blake  wrote:
> 
> > Passing around an 'int' for a QBool type is weird, when we already
> > use a C99 compiler and have a sane 'bool' that does just fine.
> > 
> > I half-debated sending this through qemu-trivial, but think it
> > better belongs through the QMP tree.  There turned out to be few
> > enough clients that I grouped it into two patches touching a number
> > of files each; but I'm also okay with splitting into finer-grained
> > patches that focus on fewer files at a time if that is desired.
> > 
> > Eric Blake (2):
> >   qobject: Use 'bool' for qbool
> >   qobject: Use 'bool' inside qdict
> 
> Applied to the qmp branch, thanks.

Unfortunately, I'm quite busy and won't have time to push this
through my tree. Markus is going to pick up this series soon.

Acked-by: Luiz Capitulino 

> 
> > 
> >  block/qapi.c|  2 +-
> >  block/quorum.c  |  4 ++--
> >  block/vvfat.c   |  4 ++--
> >  hmp.c   | 40 
> > 
> >  hw/pci/pcie_aer.c   |  4 ++--
> >  include/qapi/qmp/qbool.h|  8 
> >  include/qapi/qmp/qdict.h|  4 ++--
> >  monitor.c   | 12 ++--
> >  qapi/qmp-input-visitor.c|  2 +-
> >  qapi/qmp-output-visitor.c   |  2 +-
> >  qobject/json-parser.c   |  6 +++---
> >  qobject/qbool.c |  8 
> >  qobject/qdict.c |  8 
> >  qobject/qjson.c |  2 +-
> >  qom/object.c|  4 ++--
> >  tests/check-qjson.c | 11 ++-
> >  tests/test-qmp-event.c  |  4 ++--
> >  tests/test-qmp-output-visitor.c |  6 +++---
> >  util/qemu-option.c  |  2 +-
> >  19 files changed, 67 insertions(+), 66 deletions(-)
> > 
>

Re: [Qemu-devel] [PATCH v5 0/4] monitor: suggest running "help" for command errors

2015-06-11 Thread Luiz Capitulino

On Mon, 08 Jun 2015 10:53:23 +0200
Markus Armbruster  wrote:

> Copying HMP maintainer Luiz.
> 
> Series
> Reviewed-by: Markus Armbruster 
> 
> Bandan, thanks for your patience.
> 
> Luiz, my monitor/QMP queue is currently empty, but if it fills up before
> you get around to doing a monitor/HMP pull request, I'm happy to take
> this series along, if it gets your Acked-by.

I'd be immensely grateful if you pick this series along with the
other ones, as we've spoken in pvt. Thanks a lot Markus for your
help!

Acked-by: Luiz Capitulino

Re: [Qemu-devel] [PATCH v2 0/2] monitor+disas: Remove uses of ENV_GET_CPU

2015-06-11 Thread Luiz Capitulino

On Sun, 24 May 2015 14:20:39 -0700
Peter Crosthwaite  wrote:

> Neither the monitor nor the disassembly core has a good reason to navigate
> from an env pointer to a cpu pointer. Disas should not need env awareness
> at all; that is removed in P2.
> 
> The monitor is trickier: the env is still needed by some #ifdef switched
> target specific code, but all common code only needs to trade in CPU
> pointers. As the monitor always has access to a CPU pointer naturally,
> remove ENV_GET_CPU usages (P1).
> 
> This is related to my multi-arch work, where the goal is to minimise use of
> architecture defined global definitions, ENV_GET_CPU being a major headache in
> that whole effort. The longer term goal is to limit ENV_GET_CPU use to
> genuinely architecture specific code.
> 
> But I think these two patches stand in their own right, so sending ahead of
> the motherlode series. This brings both modules closer to
> common-obj-y'ification.
> 
> First RFC for multi arch is available here:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg01771.html
> 
> The two patches are done together to avoid a conflict with monitor_disas which
> is touched by both patches. If one patch gets acked, the other nacked then
> either can be merged independently with trivial edits.

Unfortunately, I'm quite busy and won't have time to push this
through my tree. Markus is going to pick up this series soon.

Acked-by: Luiz Capitulino 

> 
> Changed since v1:
> Addressed RH and Andreas comments on P1.
> 
> Peter Crosthwaite (2):
>   monitor: Split mon_get_cpu fn to remove ENV_GET_CPU
>   disas: Remove uses of CPU env
> 
>  disas.c   | 14 +-
>  include/disas/disas.h |  4 +--
>  include/qemu/log.h|  4 +--
>  monitor.c | 65 
> +++
>  target-alpha/translate.c  |  2 +-
>  target-arm/translate-a64.c|  2 +-
>  target-arm/translate.c|  2 +-
>  target-cris/translate.c   |  2 +-
>  target-i386/translate.c   |  2 +-
>  target-lm32/translate.c   |  2 +-
>  target-m68k/translate.c   |  2 +-
>  target-microblaze/translate.c |  2 +-
>  target-mips/translate.c   |  2 +-
>  target-openrisc/translate.c   |  2 +-
>  target-ppc/translate.c|  2 +-
>  target-s390x/translate.c  |  2 +-
>  target-sh4/translate.c|  2 +-
>  target-sparc/translate.c  |  2 +-
>  target-tricore/translate.c|  2 +-
>  target-unicore32/translate.c  |  2 +-
>  target-xtensa/translate.c |  2 +-
>  21 files changed, 57 insertions(+), 64 deletions(-)
>

Re: [Qemu-devel] [PATCH v2 04/12] rdma typos

2015-06-11 Thread Michael R. Hines


On 06/11/2015 12:17 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

A couple of typo fixes.

Signed-off-by: Dr. David Alan Gilbert 
---
  migration/rdma.c | 6 +++---
  trace-events | 4 ++--
  2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index bc73ff8..44ed996 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1215,7 +1215,7 @@ const char *print_wrid(int wrid)

  /*
   * Perform a non-optimized memory unregistration after every transfer
- * for demonsration purposes, only if pin-all is not requested.
+ * for demonstration purposes, only if pin-all is not requested.
   *
   * Potential optimizations:
   * 1. Start a new thread to run this function continuously
@@ -3279,7 +3279,7 @@ static void rdma_accept_incoming_migration(void *opaque)
  QEMUFile *f;
  Error *local_err = NULL, **errp = &local_err;

-trace_qemu_dma_accept_incoming_migration();
+trace_qemu_rdma_accept_incoming_migration();
  ret = qemu_rdma_accept(rdma);

  if (ret) {
@@ -3287,7 +3287,7 @@ static void rdma_accept_incoming_migration(void *opaque)
  return;
  }

-trace_qemu_dma_accept_incoming_migration_accepted();
+trace_qemu_rdma_accept_incoming_migration_accepted();

  f = qemu_fopen_rdma(rdma, "rb");
  if (f == NULL) {
diff --git a/trace-events b/trace-events
index 2662ffa..8b468fe 100644
--- a/trace-events
+++ b/trace-events
@@ -1398,8 +1398,8 @@ migrate_pending(uint64_t size, uint64_t max) "pending size %" 
PRIu64 " max %" PR
  migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) 
"transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64

  # migration/rdma.c
-qemu_dma_accept_incoming_migration(void) ""
-qemu_dma_accept_incoming_migration_accepted(void) ""
+qemu_rdma_accept_incoming_migration(void) ""
+qemu_rdma_accept_incoming_migration_accepted(void) ""


What happened to the actual message inside the quotes? =)

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Marcel Apfelbaum


On 06/11/2015 07:54 PM, Kevin O'Connor wrote:

On Thu, Jun 11, 2015 at 05:36:06PM +0300, Marcel Apfelbaum wrote:

On 06/11/2015 05:24 PM, Kevin O'Connor wrote:

On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:

On 06/11/2015 04:58 PM, Kevin O'Connor wrote:

On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:

The fixes solve the following issue:
The PXB device exposes a new  pci root bridge with the
fw path:  /pci-root@4/..., in which 4 is the root bus number.
Before this patch the fw path was wrongly computed:
 /pci-root@1/pci@i0cf8/...
Fix the above issues: Correct the bus number and remove the
extra host bridge description.


Why is that wrong?  The previous path looks correct to me.

The prev path includes both the extra root bridge and *then* the usual host bridge.
  /pci-root@1/pci@i0cf8/   ...
 ^ new   ^ regular  ^ devices

Since the new pci root bridge (and bus) is in parallel with the regular one,
it is not correct to add it to the path.

The architecture is:
  //devices...
  /extra root bridge/devices...
  /extra root bridge/devices...
And not
/extra root bridge///devices


Your patch changed both the "/extra root bridge/devices..." part and
the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
correct IMO.

Why? @1 should be the unit address which is the text representation
of the physical address, in our case the slot. Since the bus number
in our case is 4, I think /pci-root@4/ is the 'correct' address.


On real machines, the firmware assigns the 4 - it's not a physical
address; it's a logical address (like all bus numbers in PCI).  The
firmware might assign a totally different number on the next boot.

Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly
try to understand it.

I looked up a real hardware machine and it seemed to me that the extra pci root
numbers are provided in the ACPI tables, meaning by the vendor, not the fw.
In this case QEMU is the vendor, i440fx is the machine, right?

I am not aware that Seabios/OVMF are deciding the bus numbers for the *PCI
roots*. They are doing it for the pci-2-pci bridges of course.
I saw that Seabios is trying to "guess" the root-buses by going over all the
0-0xff range and probing all the slots, looking for devices. So it expects the
hw to be hardwired regarding PCI root buses.
Is my understanding incorrect?

Thanks,
Marcel








-Kevin

[Qemu-devel] [PATCH v2 12/12] Fail more cleanly in mismatched RAM cases

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

If the number of RAMBlocks was different on the source from the
destination, QEMU would hang waiting for a disconnect on the source
and wouldn't release from that hang until the destination was manually
killed.

Mark the stream as being in error; this causes the destination to die
and the source to carry on.

(It still gets a whole bunch of warnings on the destination, and I've
not managed to complete another migration after the 1st one, still
progress).

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 1f3a9fb..9156308 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3320,6 +3320,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void 
*opaque,
 "Your QEMU command line parameters are probably "
 "not identical on both the source and destination.",
 local->nb_blocks, nb_dest_blocks);
+rdma->error_state = -EINVAL;
 return -EINVAL;
 }
 
@@ -3335,6 +3336,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void 
*opaque,
 "vs %" PRIu64, local->block[i].block_name, i,
 local->block[i].length,
 rdma->dest_blocks[i].length);
+rdma->error_state = -EINVAL;
 return -EINVAL;
 }
 local->block[i].remote_host_addr =
-- 
2.4.2
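
The patch above records the failure in rdma->error_state before returning. The sketch below illustrates that "latch the error" idea in isolation, on the assumption that later operations consult the field before doing any work. The struct and function names are hypothetical simplifications, not QEMU code.

```c
#include <errno.h>

/* Hypothetical, much simplified stand-in for the RDMA context. */
struct sketch_ctx {
    int error_state;    /* 0 while healthy, negative errno once failed */
};

/* Compare block counts; on mismatch, latch the error before returning. */
static int sketch_check_blocks(struct sketch_ctx *ctx,
                               int nb_local, int nb_dest)
{
    if (nb_local != nb_dest) {
        ctx->error_state = -EINVAL;
        return -EINVAL;
    }
    return 0;
}

/* A later operation refuses to proceed once an error has been latched. */
static int sketch_send(struct sketch_ctx *ctx)
{
    if (ctx->error_state) {
        return ctx->error_state;
    }
    return 0;   /* the real work would happen here */
}
```

Without the latch, only the immediate caller sees -EINVAL; with it, every subsequent operation on the same context fails quickly instead of waiting on a peer that will never respond.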

[Qemu-devel] [PATCH v2 07/12] Rework ram_control_load_hook to hook during block load

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

We need the names of RAMBlocks as they're loaded for RDMA,
reuse a slightly modified ram_control_load_hook:
  a) Pass a 'data' parameter to use for the name in the block-reg
 case
  b) Only some hook types now require the presence of a hook function.

Signed-off-by: Dr. David Alan Gilbert 
---
 arch_init.c   |  4 +++-
 include/migration/migration.h |  2 +-
 include/migration/qemu-file.h | 14 +-
 migration/qemu-file.c | 16 +++-
 migration/rdma.c  | 28 ++--
 trace-events  |  2 +-
 6 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d294474..dc9cc7e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1569,6 +1569,8 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 error_report_err(local_err);
 }
 }
+ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
+  block->idstr);
 break;
 }
 }
@@ -1637,7 +1639,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 break;
 default:
 if (flags & RAM_SAVE_FLAG_HOOK) {
-ram_control_load_hook(f, flags);
+ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
 } else {
 error_report("Unknown combination of migration flags: %#x",
  flags);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index a6e025a..096e1ea 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -164,7 +164,7 @@ int migrate_decompress_threads(void);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
-void ram_control_load_hook(QEMUFile *f, uint64_t flags);
+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
 
 /* Whenever this is found in the data stream, the flags
  * will be passed to ram_control_load_hook in the incoming-migration
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index a01c5b8..7aafe19 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -63,16 +63,20 @@ typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, 
struct iovec *iov,
 /*
  * This function provides hooks around different
  * stages of RAM migration.
+ * 'opaque' is the backend specific data in QEMUFile
+ * 'data' is call specific data associated with the 'flags' value
  */
-typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags);
+typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags,
+  void *data);
 
 /*
  * Constants used by ram_control_* hooks
  */
-#define RAM_CONTROL_SETUP0
-#define RAM_CONTROL_ROUND1
-#define RAM_CONTROL_HOOK 2
-#define RAM_CONTROL_FINISH   3
+#define RAM_CONTROL_SETUP 0
+#define RAM_CONTROL_ROUND 1
+#define RAM_CONTROL_HOOK  2
+#define RAM_CONTROL_FINISH3
+#define RAM_CONTROL_BLOCK_REG 4
 
 /*
  * This function allows override of where the RAM page
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2750365..5493977 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -128,7 +128,7 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
 int ret = 0;
 
 if (f->ops->before_ram_iterate) {
-ret = f->ops->before_ram_iterate(f, f->opaque, flags);
+ret = f->ops->before_ram_iterate(f, f->opaque, flags, NULL);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -140,24 +140,30 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t 
flags)
 int ret = 0;
 
 if (f->ops->after_ram_iterate) {
-ret = f->ops->after_ram_iterate(f, f->opaque, flags);
+ret = f->ops->after_ram_iterate(f, f->opaque, flags, NULL);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
 }
 }
 
-void ram_control_load_hook(QEMUFile *f, uint64_t flags)
+void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
 {
 int ret = -EINVAL;
 
 if (f->ops->hook_ram_load) {
-ret = f->ops->hook_ram_load(f, f->opaque, flags);
+ret = f->ops->hook_ram_load(f, f->opaque, flags, data);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
 } else {
-qemu_file_set_error(f, ret);
+/*
+ * Hook is a hook specifically requested by the source sending a flag
+ * that expects there to be a hook on the destination.
+ */
+if (flags == RAM_CONTROL_HOOK) {
+qemu_file_set_error(f, ret);
+}
 }
 }
 
diff --git a/migration/rdma.c b/migration/rdma.c
index cb66721..396329c 100644
--- a/migration/rdma.c
+
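
The qemu-file.c hunk above changes ram_control_load_hook so that a missing hook function is an error only for the RAM_CONTROL_HOOK case. The sketch below reproduces that dispatch rule on its own. The types and names are hypothetical simplifications (the real hook also receives the QEMUFile and its opaque pointer), and the two constants simply mirror the values defined in the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HOOK       2   /* mirrors RAM_CONTROL_HOOK */
#define SKETCH_BLOCK_REG  4   /* mirrors RAM_CONTROL_BLOCK_REG */

/* Hypothetical hook signature: flags plus call-specific data. */
typedef int (*sketch_hook_fn)(uint64_t flags, void *data);

struct sketch_file {
    sketch_hook_fn hook_ram_load;   /* may be NULL if the backend has none */
    int last_error;                 /* 0, or the first negative errno seen */
};

static void sketch_load_hook(struct sketch_file *f, uint64_t flags,
                             void *data)
{
    int ret = -EINVAL;

    if (f->hook_ram_load) {
        ret = f->hook_ram_load(flags, data);
        if (ret < 0) {
            f->last_error = ret;
        }
    } else if (flags == SKETCH_HOOK) {
        /* Only an explicitly requested hook makes a missing handler fatal. */
        f->last_error = ret;
    }
}
```

With this rule, a backend that provides no hook can still load a stream that triggers block-registration notifications, while a stream that explicitly requests a hook still fails on such a backend.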

[Qemu-devel] [PATCH v2 08/12] Allow rdma_delete_block to work without the hash

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

In the next patch we remove the hash on the destination,
rdma_delete_block does two things with the hash which can be avoided:
  a) The caller passes the offset and rdma_delete_block looks it up
 in the hash; fixed by getting the caller to pass the block
  b) The hash gets recreated after deletion; fixed by making that
 conditional on the hash being initialised.

While this function is currently only used during cleanup, Michael
asked that we keep it general for future dynamic block registration
work.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 27 ---
 trace-events |  2 +-
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 396329c..8d99378 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -617,16 +617,19 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
 return 0;
 }
 
-static int rdma_delete_block(RDMAContext *rdma, ram_addr_t block_offset)
+/*
+ * Note: If used outside of cleanup, the caller must ensure that the 
destination
+ * block structures are also updated
+ */
+static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
 {
 RDMALocalBlocks *local = &rdma->local_ram_blocks;
-RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
-(void *) block_offset);
 RDMALocalBlock *old = local->block;
 int x;
 
-assert(block);
-
+if (rdma->blockmap) {
+g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)block->offset);
+}
 if (block->pmr) {
 int j;
 
@@ -659,8 +662,11 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t 
block_offset)
 g_free(block->block_name);
 block->block_name = NULL;
 
-for (x = 0; x < local->nb_blocks; x++) {
-g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
+if (rdma->blockmap) {
+for (x = 0; x < local->nb_blocks; x++) {
+g_hash_table_remove(rdma->blockmap,
+(void *)(uintptr_t)old[x].offset);
+}
 }
 
 if (local->nb_blocks > 1) {
@@ -682,8 +688,7 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t 
block_offset)
 local->block = NULL;
 }
 
-trace_rdma_delete_block(local->nb_blocks,
-   (uintptr_t)block->local_host_addr,
+trace_rdma_delete_block(block, (uintptr_t)block->local_host_addr,
block->offset, block->length,
 (uintptr_t)(block->local_host_addr + 
block->length),
BITS_TO_LONGS(block->nb_chunks) *
@@ -693,7 +698,7 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t 
block_offset)
 
 local->nb_blocks--;
 
-if (local->nb_blocks) {
+if (local->nb_blocks && rdma->blockmap) {
 for (x = 0; x < local->nb_blocks; x++) {
 g_hash_table_insert(rdma->blockmap,
 (void *)(uintptr_t)local->block[x].offset,
@@ -2214,7 +2219,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
 
 if (rdma->local_ram_blocks.block) {
 while (rdma->local_ram_blocks.nb_blocks) {
-rdma_delete_block(rdma, rdma->local_ram_blocks.block->offset);
+rdma_delete_block(rdma, &rdma->local_ram_blocks.block[0]);
 }
 }
 
diff --git a/trace-events b/trace-events
index 0f37a4b..7dff362 100644
--- a/trace-events
+++ b/trace-events
@@ -1452,7 +1452,7 @@ qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int 
index, int64_t offset)
 qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " 
chunks, (%" PRIu64 " MB)"
 qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) 
"Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, 
offset: %" PRId64
 rdma_add_block(const char *block_name, int block, uint64_t addr, uint64_t 
offset, uint64_t len, uint64_t end, uint64_t bits, int chunks) "Added Block: 
'%s':%d, addr: %" PRIu64 ", offset: %" PRIu64 " length: %" PRIu64 " end: %" 
PRIu64 " bits %" PRIu64 " chunks %d"
-rdma_delete_block(int block, uint64_t addr, uint64_t offset, uint64_t len, 
uint64_t end, uint64_t bits, int chunks) "Deleted Block: %d, addr: %" PRIu64 ", 
offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " 
chunks %d"
+rdma_delete_block(void *block, uint64_t addr, uint64_t offset, uint64_t len, 
uint64_t end, uint64_t bits, int chunks) "Deleted Block: %p, addr: %" PRIu64 ", 
offset: %" PRIu64 " length: %" PRIu64 " end: %" PRIu64 " bits %" PRIu64 " 
chunks %d"
 rdma_start_incoming_migration(void) ""
 rdma_start_incoming_migration_after_dest_init(void) ""
 rdma_start_incoming_migration_after_rdma_listen(void) ""
-- 
2.4.2

[Qemu-devel] [PATCH v2 04/12] rdma typos

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

A couple of typo fixes.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 6 +++---
 trace-events | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index bc73ff8..44ed996 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1215,7 +1215,7 @@ const char *print_wrid(int wrid)
 
 /*
  * Perform a non-optimized memory unregistration after every transfer
- * for demonsration purposes, only if pin-all is not requested.
+ * for demonstration purposes, only if pin-all is not requested.
  *
  * Potential optimizations:
  * 1. Start a new thread to run this function continuously
@@ -3279,7 +3279,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 QEMUFile *f;
 Error *local_err = NULL, **errp = &local_err;
 
-trace_qemu_dma_accept_incoming_migration();
+trace_qemu_rdma_accept_incoming_migration();
 ret = qemu_rdma_accept(rdma);
 
 if (ret) {
@@ -3287,7 +3287,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 return;
 }
 
-trace_qemu_dma_accept_incoming_migration_accepted();
+trace_qemu_rdma_accept_incoming_migration_accepted();
 
 f = qemu_fopen_rdma(rdma, "rb");
 if (f == NULL) {
diff --git a/trace-events b/trace-events
index 2662ffa..8b468fe 100644
--- a/trace-events
+++ b/trace-events
@@ -1398,8 +1398,8 @@ migrate_pending(uint64_t size, uint64_t max) "pending 
size %" PRIu64 " max %" PR
 migrate_transferred(uint64_t tranferred, uint64_t time_spent, double 
bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " 
bandwidth %g max_size %" PRId64
 
 # migration/rdma.c
-qemu_dma_accept_incoming_migration(void) ""
-qemu_dma_accept_incoming_migration_accepted(void) ""
+qemu_rdma_accept_incoming_migration(void) ""
+qemu_rdma_accept_incoming_migration_accepted(void) ""
 qemu_rdma_accept_pin_state(bool pin) "%d"
 qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
 qemu_rdma_block_for_wrid_miss(const char *wcompstr, int wcomp, const char 
*gcompstr, uint64_t req) "A Wanted wrid %s (%d) but got %s (%" PRIu64 ")"
-- 
2.4.2

[Qemu-devel] [PATCH v2 06/12] Translate offsets to destination address space

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

The 'offset' field in RDMACompress and 'current_addr' field
in RDMARegister are commented as being offsets within a particular
RAMBlock, however they appear to actually be offsets within the
ram_addr_t space.

The code currently assumes that the offsets on the source/destination
match, this change removes the need for the assumption for these
structures by translating the addresses into the ram_addr_t space of
the destination host.

Note: An alternative would be to change the fields to actually
take the data they're commented for; this would potentially be
simpler but would break stream compatibility for those cases
that currently work.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 9532461..cb66721 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -411,7 +411,7 @@ static void network_to_control(RDMAControlHeader *control)
  */
 typedef struct QEMU_PACKED {
 union QEMU_PACKED {
-uint64_t current_addr;  /* offset into the ramblock of the chunk */
+uint64_t current_addr;  /* offset into the ram_addr_t space */
 uint64_t chunk; /* chunk to lookup if unregistering */
 } key;
 uint32_t current_index; /* which ramblock the chunk belongs to */
@@ -419,8 +419,19 @@ typedef struct QEMU_PACKED {
 uint64_t chunks;/* how many sequential chunks to register */
 } RDMARegister;
 
-static void register_to_network(RDMARegister *reg)
+static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
 {
+RDMALocalBlock *local_block;
+local_block  = &rdma->local_ram_blocks.block[reg->current_index];
+
+if (local_block->is_ram_block) {
+/*
+ * current_addr as passed in is an address in the local ram_addr_t
+ * space, we need to translate this for the destination
+ */
+reg->key.current_addr -= local_block->offset;
+reg->key.current_addr += rdma->dest_blocks[reg->current_index].offset;
+}
 reg->key.current_addr = htonll(reg->key.current_addr);
 reg->current_index = htonl(reg->current_index);
 reg->chunks = htonll(reg->chunks);
@@ -436,13 +447,19 @@ static void network_to_register(RDMARegister *reg)
 typedef struct QEMU_PACKED {
 uint32_t value; /* if zero, we will madvise() */
 uint32_t block_idx; /* which ram block index */
-uint64_t offset;/* where in the remote ramblock this chunk */
+uint64_t offset;/* Address in remote ram_addr_t space */
 uint64_t length;/* length of the chunk */
 } RDMACompress;
 
-static void compress_to_network(RDMACompress *comp)
+static void compress_to_network(RDMAContext *rdma, RDMACompress *comp)
 {
 comp->value = htonl(comp->value);
+/*
+ * comp->offset as passed in is an address in the local ram_addr_t
+ * space, we need to translate this for the destination
+ */
+comp->offset -= rdma->local_ram_blocks.block[comp->block_idx].offset;
+comp->offset += rdma->dest_blocks[comp->block_idx].offset;
 comp->block_idx = htonl(comp->block_idx);
 comp->offset = htonll(comp->offset);
 comp->length = htonll(comp->length);
@@ -1288,7 +1305,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
 rdma->total_registrations--;
 
 reg.key.chunk = chunk;
-register_to_network(&reg);
+register_to_network(rdma, &reg);
 ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
 &resp, NULL, NULL);
 if (ret < 0) {
@@ -1909,7 +1926,7 @@ retry:
 trace_qemu_rdma_write_one_zero(chunk, sge.length,
current_index, current_addr);
 
-compress_to_network(&comp);
+compress_to_network(rdma, &comp);
 ret = qemu_rdma_exchange_send(rdma, &head,
 (uint8_t *) &comp, NULL, NULL, NULL);
 
@@ -1936,7 +1953,7 @@ retry:
 trace_qemu_rdma_write_one_sendreg(chunk, sge.length, current_index,
   current_addr);
 
-register_to_network(&reg);
+register_to_network(rdma, &reg);
 ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
 &resp, &reg_result_idx, NULL);
 if (ret < 0) {
-- 
2.4.2
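
The translation performed in register_to_network and compress_to_network above boils down to one subtraction and one addition. The sketch below isolates that arithmetic; the helper name and parameter names are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert an address in the local ram_addr_t space
 * into the destination's ram_addr_t space, given where the same block
 * starts on each side.
 */
static uint64_t sketch_translate(uint64_t local_addr,
                                 uint64_t local_block_offset,
                                 uint64_t dest_block_offset)
{
    /* Offset within the block, rebased onto the destination's block. */
    return local_addr - local_block_offset + dest_block_offset;
}
```

For example, an address of 0x5000 in a block that starts at 0x4000 locally and at 0x10000 on the destination lies 0x1000 bytes into the block, so it translates to 0x11000.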

[Qemu-devel] [PATCH v2 10/12] Sort destination RAMBlocks to be the same as the source

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Use the order of incoming RAMBlocks from the source to record
an index number; that then allows us to sort the destination
local RAMBlock list to match the source.

Now that the RAMBlocks are known to be in the same order, this
simplifies the RDMA Registration step which previously tried to
match RAMBlocks based on offset (which isn't guaranteed to match).

Looking at the existing compress code, I think it was erroneously
relying on an assumption of matching ordering, which this fixes.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 101 ---
 trace-events |   2 ++
 2 files changed, 75 insertions(+), 28 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index f541586..92dc5c1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -224,6 +224,7 @@ typedef struct RDMALocalBlock {
 uint32_t  *remote_keys; /* rkeys for chunk-level registration */
 uint32_t   remote_rkey; /* rkeys for non-chunk-level registration 
*/
 intindex;   /* which block are we */
+unsigned int   src_index;   /* (Only used on dest) */
 bool   is_ram_block;
 intnb_chunks;
 unsigned long *transit_bitmap;
@@ -353,6 +354,9 @@ typedef struct RDMAContext {
 RDMALocalBlocks local_ram_blocks;
 RDMADestBlock  *dest_blocks;
 
+/* Index of the next RAMBlock received during block registration */
+unsigned intnext_src_index;
+
 /*
  * Migration on *destination* started.
  * Then use coroutine yield function.
@@ -561,6 +565,7 @@ static int rdma_add_block(RDMAContext *rdma, const char 
*block_name,
 block->offset = block_offset;
 block->length = length;
 block->index = local->nb_blocks;
+block->src_index = ~0U; /* Filled in by the receipt of the block list */
 block->nb_chunks = ram_chunk_index(host_addr, host_addr + length) + 1UL;
 block->transit_bitmap = bitmap_new(block->nb_chunks);
 bitmap_clear(block->transit_bitmap, 0, block->nb_chunks);
@@ -2909,6 +2914,14 @@ err_rdma_dest_wait:
 return ret;
 }
 
+static int dest_ram_sort_func(const void *a, const void *b)
+{
+unsigned int a_index = ((const RDMALocalBlock *)a)->src_index;
+unsigned int b_index = ((const RDMALocalBlock *)b)->src_index;
+
+return (a_index < b_index) ? -1 : (a_index != b_index);
+}
+
 /*
  * During each iteration of the migration, we listen for instructions
  * by the source VM to perform dynamic page registrations before they
@@ -2986,6 +2999,13 @@ static int qemu_rdma_registration_handle(QEMUFile *f, 
void *opaque)
 case RDMA_CONTROL_RAM_BLOCKS_REQUEST:
 trace_qemu_rdma_registration_handle_ram_blocks();
 
+/* Sort our local RAM Block list so it's the same as the source,
+ * we can do this since we've filled in a src_index in the list
+ * as we received the RAMBlock list earlier.
+ */
+qsort(rdma->local_ram_blocks.block,
+  rdma->local_ram_blocks.nb_blocks,
+  sizeof(RDMALocalBlock), dest_ram_sort_func);
 if (rdma->pin_all) {
 ret = qemu_rdma_reg_whole_ram_blocks(rdma);
 if (ret) {
@@ -3013,6 +3033,12 @@ static int qemu_rdma_registration_handle(QEMUFile *f, 
void *opaque)
 rdma->dest_blocks[i].length = local->block[i].length;
 
 dest_block_to_network(&rdma->dest_blocks[i]);
+trace_qemu_rdma_registration_handle_ram_blocks_loop(
+local->block[i].block_name,
+local->block[i].offset,
+local->block[i].length,
+local->block[i].local_host_addr,
+local->block[i].src_index);
 }
 
 blocks.len = rdma->local_ram_blocks.nb_blocks
@@ -3136,13 +3162,44 @@ out:
 return ret;
 }
 
+/* Destination:
+ * Called via a ram_control_load_hook during the initial RAM load section which
+ * lists the RAMBlocks by name.  This lets us know the order of the RAMBlocks
+ * on the source.
+ * We've already built our local RAMBlock list, but not yet sent the list to
+ * the source.
+ */
+static int rdma_block_notification_handle(QEMUFileRDMA *rfile, const char 
*name)
+{
+RDMAContext *rdma = rfile->rdma;
+int curr;
+int found = -1;
+
+/* Find the matching RAMBlock in our local list */
+for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
+if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
+found = curr;
+break;
+}
+}
+
+if (found == -1) {
+error_report("RAMBlock '%s' not found on destination", name);
+return -ENOENT;
+}
+
+rdma->local_ram_blocks.block[curr].src_index = rdma->next_src_index;
+trace_rdma_block_notification_handle(name, rdma->next_src_index);
+rdma->next_src_index++;
+
+
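
The sorting step in the patch above can be tried on its own. The sketch below uses the same three-way comparison as dest_ram_sort_func, applied to a hypothetical, cut-down block structure; only src_index corresponds to a field in the patch, and the block names in the usage note are illustrative.

```c
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for RDMALocalBlock. */
struct sketch_block {
    const char *name;
    unsigned int src_index;   /* position of this block on the source */
};

/* Same comparison as in the patch: yields -1, 0 or 1. */
static int sketch_sort_func(const void *a, const void *b)
{
    unsigned int a_index = ((const struct sketch_block *)a)->src_index;
    unsigned int b_index = ((const struct sketch_block *)b)->src_index;

    return (a_index < b_index) ? -1 : (a_index != b_index);
}

/* Reorder the local list so it matches the source's ordering. */
static void sketch_sort_blocks(struct sketch_block *blocks, size_t n)
{
    qsort(blocks, n, sizeof(blocks[0]), sketch_sort_func);
}
```

Given blocks named "pc.bios", "pc.ram" and "vga.vram" with src_index 2, 0 and 1, the sorted order is "pc.ram", "vga.vram", "pc.bios". Comparing explicitly rather than subtracting the two indexes avoids wraparound with unsigned values, which matters here because unfilled entries are initialised to ~0U.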

[Qemu-devel] [PATCH v2 09/12] Rework ram block hash

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

RDMA uses a hash from block offset->RAM Block; this isn't needed
on the destination, and it becomes harder to maintain after the next
patch in the series that sorts the block list.

Split the hash so that it's only generated on the source.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 32 
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 8d99378..f541586 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -533,23 +533,22 @@ static int rdma_add_block(RDMAContext *rdma, const char 
*block_name,
  ram_addr_t block_offset, uint64_t length)
 {
 RDMALocalBlocks *local = &rdma->local_ram_blocks;
-RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
-(void *)(uintptr_t)block_offset);
+RDMALocalBlock *block;
 RDMALocalBlock *old = local->block;
 
-assert(block == NULL);
-
 local->block = g_malloc0(sizeof(RDMALocalBlock) * (local->nb_blocks + 1));
 
 if (local->nb_blocks) {
 int x;
 
-for (x = 0; x < local->nb_blocks; x++) {
-g_hash_table_remove(rdma->blockmap,
-(void *)(uintptr_t)old[x].offset);
-g_hash_table_insert(rdma->blockmap,
-(void *)(uintptr_t)old[x].offset,
-&local->block[x]);
+if (rdma->blockmap) {
+for (x = 0; x < local->nb_blocks; x++) {
+g_hash_table_remove(rdma->blockmap,
+(void *)(uintptr_t)old[x].offset);
+g_hash_table_insert(rdma->blockmap,
+(void *)(uintptr_t)old[x].offset,
+&local->block[x]);
+}
 }
 memcpy(local->block, old, sizeof(RDMALocalBlock) * local->nb_blocks);
 g_free(old);
@@ -571,7 +570,9 @@ static int rdma_add_block(RDMAContext *rdma, const char 
*block_name,
 
 block->is_ram_block = local->init ? false : true;
 
-g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
+if (rdma->blockmap) {
+g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
+}
 
 trace_rdma_add_block(block_name, local->nb_blocks,
  (uintptr_t) block->local_host_addr,
@@ -607,7 +608,6 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
 RDMALocalBlocks *local = &rdma->local_ram_blocks;
 
 assert(rdma->blockmap == NULL);
-rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
 memset(local, 0, sizeof *local);
 qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
 trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
@@ -2292,6 +2292,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, 
Error **errp, bool pin_all)
 goto err_rdma_source_init;
 }
 
+/* Build the hash that maps from offset to RAMBlock */
+rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
+for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
+g_hash_table_insert(rdma->blockmap,
+(void *)(uintptr_t)rdma->local_ram_blocks.block[idx].offset,
+&rdma->local_ram_blocks.block[idx]);
+}
+
 for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
 ret = qemu_rdma_reg_control(rdma, idx);
 if (ret) {
-- 
2.4.2

[Qemu-devel] [PATCH v2 05/12] Store block name in local blocks structure

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

In a later patch the block name will be used to match up two views
of the block list.  Keep a copy of the block name with the local block
list.

(At some point it could be argued that it would be best just to let
migration see the innards of RAMBlock and avoid the need to use
foreach).

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Michael R. Hines 
---
 migration/rdma.c | 35 +--
 trace-events |  2 +-
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 44ed996..9532461 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -214,17 +214,18 @@ static void network_to_caps(RDMACapabilities *cap)
  * the information. It's small anyway, so a list is overkill.
  */
 typedef struct RDMALocalBlock {
-uint8_t  *local_host_addr; /* local virtual address */
-uint64_t remote_host_addr; /* remote virtual address */
-uint64_t offset;
-uint64_t length;
-struct   ibv_mr **pmr; /* MRs for chunk-level registration */
-struct   ibv_mr *mr;   /* MR for non-chunk-level registration */
-uint32_t *remote_keys; /* rkeys for chunk-level registration */
-uint32_t remote_rkey;  /* rkeys for non-chunk-level registration */
-int  index;/* which block are we */
-bool is_ram_block;
-int  nb_chunks;
+char  *block_name;
+uint8_t   *local_host_addr; /* local virtual address */
+uint64_t   remote_host_addr; /* remote virtual address */
+uint64_t   offset;
+uint64_t   length;
+struct ibv_mr **pmr;/* MRs for chunk-level registration */
+struct ibv_mr *mr;  /* MR for non-chunk-level registration */
+uint32_t  *remote_keys; /* rkeys for chunk-level registration */
+uint32_t   remote_rkey; /* rkeys for non-chunk-level registration 
*/
+intindex;   /* which block are we */
+bool   is_ram_block;
+intnb_chunks;
 unsigned long *transit_bitmap;
 unsigned long *unregister_bitmap;
 } RDMALocalBlock;
@@ -510,7 +511,8 @@ static inline uint8_t *ram_chunk_end(const RDMALocalBlock 
*rdma_ram_block,
 return result;
 }
 
-static int rdma_add_block(RDMAContext *rdma, void *host_addr,
+static int rdma_add_block(RDMAContext *rdma, const char *block_name,
+ void *host_addr,
  ram_addr_t block_offset, uint64_t length)
 {
 RDMALocalBlocks *local = &rdma->local_ram_blocks;
@@ -538,6 +540,7 @@ static int rdma_add_block(RDMAContext *rdma, void 
*host_addr,
 
 block = &local->block[local->nb_blocks];
 
+block->block_name = g_strdup(block_name);
 block->local_host_addr = host_addr;
 block->offset = block_offset;
 block->length = length;
@@ -553,7 +556,8 @@ static int rdma_add_block(RDMAContext *rdma, void 
*host_addr,
 
 g_hash_table_insert(rdma->blockmap, (void *) block_offset, block);
 
-trace_rdma_add_block(local->nb_blocks, (uintptr_t) block->local_host_addr,
+trace_rdma_add_block(block_name, local->nb_blocks,
+ (uintptr_t) block->local_host_addr,
  block->offset, block->length,
  (uintptr_t) (block->local_host_addr + block->length),
  BITS_TO_LONGS(block->nb_chunks) *
@@ -573,7 +577,7 @@ static int rdma_add_block(RDMAContext *rdma, void 
*host_addr,
 static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
 ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-return rdma_add_block(opaque, host_addr, block_offset, length);
+return rdma_add_block(opaque, block_name, host_addr, block_offset, length);
 }
 
 /*
@@ -635,6 +639,9 @@ static int rdma_delete_block(RDMAContext *rdma, ram_addr_t 
block_offset)
 g_free(block->remote_keys);
 block->remote_keys = NULL;
 
+g_free(block->block_name);
+block->block_name = NULL;
+
 for (x = 0; x < local->nb_blocks; x++) {
 g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset);
 }
diff --git a/trace-events b/trace-events
index 8b468fe..557770c 100644
--- a/trace-events
+++ b/trace-events
@@ -1451,7 +1451,7 @@ qemu_rdma_write_one_recvregres(int mykey, int theirkey, 
uint64_t chunk) "Receive
 qemu_rdma_write_one_sendreg(uint64_t chunk, int len, int index, int64_t 
offset) "Sending registration request chunk %" PRIu64 " for %d bytes, index: 
%d, offset: %" PRId64
 qemu_rdma_write_one_top(uint64_t chunks, uint64_t size) "Writing %" PRIu64 " 
chunks, (%" PRIu64 " MB)"
 qemu_rdma_write_one_zero(uint64_t chunk, int len, int index, int64_t offset) 
"Entire chunk is zero, sending compress: %" PRIu64 " for %d bytes, index: %d, 
offset: %" PRId64
-rdma_add_block(int block, uint64_t addr, uint64_t offset, uint64_t len, 
uint64_t end, uint64_t bits, int chunks) "Added Block: %d, addr: %" PRIu64 ", 
offset: %" PR

[Qemu-devel] [PATCH v2 03/12] Remove unneeded memset

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Michael R. Hines 
---
 migration/rdma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 38e5f44..bc73ff8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2445,7 +2445,6 @@ static void *qemu_rdma_data_init(const char *host_port, 
Error **errp)
 
 if (host_port) {
 rdma = g_malloc0(sizeof(RDMAContext));
-memset(rdma, 0, sizeof(RDMAContext));
 rdma->current_index = -1;
 rdma->current_chunk = -1;
 
-- 
2.4.2

[Qemu-devel] [PATCH v2 02/12] qemu_ram_foreach_block: pass up error value, and down the ramblock name

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Make qemu_ram_foreach_block check the return value of the function it
calls and stop with an error if it's non-0.
Fix up qemu_rdma_init_one_block, which is the only current caller,
  and rdma_add_block, the only function it calls using it.

Pass the name of the ramblock to the function; helps in debugging.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: David Gibson 
Reviewed-by: Amit Shah 
Reviewed-by: Michael R. Hines 
---
 exec.c| 10 --
 include/exec/cpu-common.h |  4 ++--
 migration/rdma.c  |  4 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 487583b..c4df2a4 100644
--- a/exec.c
+++ b/exec.c
@@ -3348,14 +3348,20 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr)
 return res;
 }
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 {
 RAMBlock *block;
+int ret = 0;
 
 rcu_read_lock();
 QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
-func(block->host, block->offset, block->used_length, opaque);
+ret = func(block->idstr, block->host, block->offset,
+   block->used_length, opaque);
+if (ret) {
+break;
+}
 }
 rcu_read_unlock();
+return ret;
 }
 #endif
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 43428bd..de8a720 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -126,10 +126,10 @@ void cpu_flush_icache_range(hwaddr start, int len);
 extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
 
-typedef void (RAMBlockIterFunc)(void *host_addr,
+typedef int (RAMBlockIterFunc)(const char *block_name, void *host_addr,
 ram_addr_t offset, ram_addr_t length, void *opaque);
 
-void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 
 #endif
 
diff --git a/migration/rdma.c b/migration/rdma.c
index 089adcf..38e5f44 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -570,10 +570,10 @@ static int rdma_add_block(RDMAContext *rdma, void 
*host_addr,
  * in advanced before the migration starts. This tells us where the RAM blocks
  * are so that we can register them individually.
  */
-static void qemu_rdma_init_one_block(void *host_addr,
+static int qemu_rdma_init_one_block(const char *block_name, void *host_addr,
 ram_addr_t block_offset, ram_addr_t length, void *opaque)
 {
-rdma_add_block(opaque, host_addr, block_offset, length);
+return rdma_add_block(opaque, host_addr, block_offset, length);
 }
 
 /*
-- 
2.4.2
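
The iteration pattern described in the commit message above can be sketched in isolation as follows. This is a minimal illustration, not the QEMU code: DemoBlock, DemoBlockIterFunc, demo_foreach_block and demo_count_nonempty are hypothetical names, a plain array stands in for the RCU-protected RAMBlock list, and no locking is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a RAMBlock; only the fields the callback sees. */
typedef struct DemoBlock {
    const char *name;
    void *host;
    unsigned long offset;
    unsigned long length;
} DemoBlock;

/* Callback type: return 0 to continue the walk, non-zero to stop it. */
typedef int (DemoBlockIterFunc)(const char *block_name, void *host_addr,
                                unsigned long offset, unsigned long length,
                                void *opaque);

/* Walk the blocks in order and pass up the first non-zero callback result. */
static int demo_foreach_block(const DemoBlock *blocks, size_t nb_blocks,
                              DemoBlockIterFunc func, void *opaque)
{
    int ret = 0;
    size_t i;

    assert(func != NULL);
    for (i = 0; i < nb_blocks; i++) {
        ret = func(blocks[i].name, blocks[i].host, blocks[i].offset,
                   blocks[i].length, opaque);
        if (ret) {
            break;
        }
    }
    return ret;
}

/* Example callback: count blocks, but fail on a zero-length block. */
static int demo_count_nonempty(const char *block_name, void *host_addr,
                               unsigned long offset, unsigned long length,
                               void *opaque)
{
    int *count = opaque;

    (void)block_name;
    (void)host_addr;
    (void)offset;
    if (length == 0) {
        return -1;
    }
    (*count)++;
    return 0;
}
```

A caller would pass demo_count_nonempty and a pointer to an int counter as the opaque argument; the walk stops at the first block for which the callback returns non-zero, and that value is what the caller sees.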

[Qemu-devel] [PATCH v2 11/12] Sanity check RDMA remote data

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Perform some basic (but probably not complete) sanity checking on
requests from the RDMA source.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Michael R. Hines 
---
 migration/rdma.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 92dc5c1..1f3a9fb 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2984,6 +2984,13 @@ static int qemu_rdma_registration_handle(QEMUFile *f, 
void *opaque)
 trace_qemu_rdma_registration_handle_compress(comp->length,
  comp->block_idx,
  comp->offset);
+if (comp->block_idx >= rdma->local_ram_blocks.nb_blocks) {
+error_report("rdma: 'compress' bad block index %u (vs %d)",
+ (unsigned int)comp->block_idx,
+ rdma->local_ram_blocks.nb_blocks);
+ret = -EIO;
+break;
+}
 block = &(rdma->local_ram_blocks.block[comp->block_idx]);
 
 host_addr = block->local_host_addr +
@@ -3072,8 +3079,23 @@ static int qemu_rdma_registration_handle(QEMUFile *f, 
void *opaque)
 trace_qemu_rdma_registration_handle_register_loop(count,
  reg->current_index, reg->key.current_addr, 
reg->chunks);
 
+if (reg->current_index >= rdma->local_ram_blocks.nb_blocks) {
+error_report("rdma: 'register' bad block index %u (vs %d)",
+ (unsigned int)reg->current_index,
+ rdma->local_ram_blocks.nb_blocks);
+ret = -ENOENT;
+break;
+}
 block = &(rdma->local_ram_blocks.block[reg->current_index]);
 if (block->is_ram_block) {
+if (block->offset > reg->key.current_addr) {
+error_report("rdma: bad register address for block %s"
+" offset: %" PRIx64 " current_addr: %" PRIx64,
+block->block_name, block->offset,
+reg->key.current_addr);
+ret = -ERANGE;
+break;
+}
 host_addr = (block->local_host_addr +
 (reg->key.current_addr - block->offset));
 chunk = ram_chunk_index(block->local_host_addr,
@@ -3082,6 +3104,14 @@ static int qemu_rdma_registration_handle(QEMUFile *f, 
void *opaque)
 chunk = reg->key.chunk;
 host_addr = block->local_host_addr +
 (reg->key.chunk * (1UL << RDMA_REG_CHUNK_SHIFT));
+/* Check for particularly bad chunk value */
+if (host_addr < (void *)block->local_host_addr) {
+error_report("rdma: bad chunk for block %s"
+" chunk: %" PRIx64,
+block->block_name, reg->key.chunk);
+ret = -ERANGE;
+break;
+}
 }
 chunk_start = ram_chunk_start(block, chunk);
 chunk_end = ram_chunk_end(block, chunk + reg->chunks);
-- 
2.4.2
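
The kind of check this patch adds can be sketched outside QEMU as below. It is only an illustration under stated assumptions: DemoLocalBlock and demo_check_request are hypothetical names, the error codes are chosen to mirror the patch, and the upper-bound test on the address is an extra check that the patch itself does not perform.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, reduced view of a local block: just its address range. */
typedef struct DemoLocalBlock {
    uint64_t offset;   /* start of the block in the local address space */
    uint64_t length;   /* size of the block in bytes */
} DemoLocalBlock;

/*
 * Validate a (block index, address) pair received from the peer.
 * Returns 0 and stores the offset within the block on success,
 * or a negative errno value if the request is out of range.
 */
static int demo_check_request(const DemoLocalBlock *blocks, int nb_blocks,
                              uint32_t block_idx, uint64_t addr,
                              uint64_t *offset_in_block)
{
    const DemoLocalBlock *block;

    assert(blocks != NULL && nb_blocks >= 0 && offset_in_block != NULL);

    /* Reject an index that does not name one of our blocks. */
    if (block_idx >= (uint32_t)nb_blocks) {
        return -ENOENT;
    }
    block = &blocks[block_idx];

    /* Reject an address below the start of the block. */
    if (addr < block->offset) {
        return -ERANGE;
    }
    /* Assumption beyond the patch: also reject addresses past the end. */
    if (addr - block->offset >= block->length) {
        return -ERANGE;
    }
    *offset_in_block = addr - block->offset;
    return 0;
}
```

Checking the index before dereferencing the block array, and the address before subtracting the block offset, keeps a malformed request from turning into an out-of-bounds access or an unsigned underflow.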

[Qemu-devel] [PATCH v2 00/12] Remove RDMA migration dependence on RAMBlock offset

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

RDMA migration currently relies on the source and destination RAMBlocks
having the same offsets within ram_addr_t space;  unfortunately that's
just not true when:
   a) You hotplug on the source but then create the device on the command line
 on the destination.

   b) Across two versions of qemu

Thus there are migrations that work with TCP that don't with RDMA.

The changes keep stream compatibility with existing RDMA migration,
so cases that already work (i.e. no hotplug) will keep working.

The new requirements do rely on the block indexes being the same on
both sides (hence we sort to ensure that); my reading of the existing
code is that it also relies on that in various places but doesn't ensure
it's true.

With some light testing this seems to work; hopefully I've got all the
cases that pass offsets back and forward.

v2:
  Keep rdma_delete_block's ability to delete entries from the hash
 (on the source side)
  Fix ram-control-hook modification so they don't break non-rdma migrate
  Added a fix that makes the RDMA migration exit in the case of some mismatched
configs instead of hanging.
  Clarified comments/commit messages
  Added a pair of typo fixes

Dr. David Alan Gilbert (12):
  Rename RDMA structures to make destination clear
  qemu_ram_foreach_block: pass up error value, and down the ramblock
name
  Remove unneeded memset
  rdma typos
  Store block name in local blocks structure
  Translate offsets to destination address space
  Rework ram_control_load_hook to hook during block load
  Allow rdma_delete_block to work without the hash
  Rework ram block hash
  Sort destination RAMBlocks to be the same as the source
  Sanity check RDMA remote data
  Fail more cleanly in mismatched RAM cases

 arch_init.c   |   4 +-
 exec.c|  10 +-
 include/exec/cpu-common.h |   4 +-
 include/migration/migration.h |   2 +-
 include/migration/qemu-file.h |  14 +-
 migration/qemu-file.c |  16 +-
 migration/rdma.c  | 347 +-
 trace-events  |  12 +-
 8 files changed, 279 insertions(+), 130 deletions(-)

-- 
2.4.2
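
The approach outlined in this cover letter (match blocks by name, record the position each one has on the source, then sort the destination list into that order) can be sketched as follows. This is a simplified illustration rather than the code from the series: DemoBlock, demo_note_source_block, demo_sort_by_src_index and demo_sort_blocks are hypothetical names, and only the name and source index of each block are modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced local block: a name and its position on the source. */
typedef struct DemoBlock {
    const char *block_name;
    int src_index;   /* position of this block on the source, -1 if unknown */
} DemoBlock;

/* Record the source position of the block called 'name'; -1 if not found. */
static int demo_note_source_block(DemoBlock *blocks, int nb_blocks,
                                  const char *name, int next_src_index)
{
    int i;

    assert(blocks != NULL && name != NULL);
    for (i = 0; i < nb_blocks; i++) {
        if (!strcmp(blocks[i].block_name, name)) {
            blocks[i].src_index = next_src_index;
            return 0;
        }
    }
    return -1;
}

/* qsort comparator: ascending src_index. */
static int demo_sort_by_src_index(const void *a, const void *b)
{
    const DemoBlock *x = a;
    const DemoBlock *y = b;

    return (x->src_index > y->src_index) - (x->src_index < y->src_index);
}

/* Reorder the local list so that index i matches index i on the source. */
static void demo_sort_blocks(DemoBlock *blocks, int nb_blocks)
{
    qsort(blocks, (size_t)nb_blocks, sizeof(DemoBlock),
          demo_sort_by_src_index);
}
```

After every name announced by the source has been noted and the list sorted, a block index received from the source refers to the same named block on both sides, regardless of where each side placed it in its own address space.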

[Qemu-devel] [PATCH v2 01/12] Rename RDMA structures to make destination clear

2015-06-11 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

RDMA has two data types that are named confusingly;
   RDMALocalBlock (pointed to indirectly by local_ram_blocks)
   RDMARemoteBlock (pointed to by block in RDMAContext)

RDMALocalBlocks, as the name suggests, is a data structure that
represents the RDMAable RAM Blocks on the current side of the migration,
whichever that is.

RDMARemoteBlocks is always the shape of the RAMBlocks on the
destination, even when the code is running on the destination.

Rename:
 RDMARemoteBlock -> RDMADestBlock
 context->'block' -> context->dest_blocks

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Michael R. Hines 
---
 migration/rdma.c | 66 
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 77e3444..089adcf 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -236,13 +236,13 @@ typedef struct RDMALocalBlock {
  * corresponding RDMALocalBlock with
  * the information needed to perform the actual RDMA.
  */
-typedef struct QEMU_PACKED RDMARemoteBlock {
+typedef struct QEMU_PACKED RDMADestBlock {
 uint64_t remote_host_addr;
 uint64_t offset;
 uint64_t length;
 uint32_t remote_rkey;
 uint32_t padding;
-} RDMARemoteBlock;
+} RDMADestBlock;
 
 static uint64_t htonll(uint64_t v)
 {
@@ -258,20 +258,20 @@ static uint64_t ntohll(uint64_t v) {
 return ((uint64_t)ntohl(u.lv[0]) << 32) | (uint64_t) ntohl(u.lv[1]);
 }
 
-static void remote_block_to_network(RDMARemoteBlock *rb)
+static void dest_block_to_network(RDMADestBlock *db)
 {
-rb->remote_host_addr = htonll(rb->remote_host_addr);
-rb->offset = htonll(rb->offset);
-rb->length = htonll(rb->length);
-rb->remote_rkey = htonl(rb->remote_rkey);
+db->remote_host_addr = htonll(db->remote_host_addr);
+db->offset = htonll(db->offset);
+db->length = htonll(db->length);
+db->remote_rkey = htonl(db->remote_rkey);
 }
 
-static void network_to_remote_block(RDMARemoteBlock *rb)
+static void network_to_dest_block(RDMADestBlock *db)
 {
-rb->remote_host_addr = ntohll(rb->remote_host_addr);
-rb->offset = ntohll(rb->offset);
-rb->length = ntohll(rb->length);
-rb->remote_rkey = ntohl(rb->remote_rkey);
+db->remote_host_addr = ntohll(db->remote_host_addr);
+db->offset = ntohll(db->offset);
+db->length = ntohll(db->length);
+db->remote_rkey = ntohl(db->remote_rkey);
 }
 
 /*
@@ -350,7 +350,7 @@ typedef struct RDMAContext {
  * Description of ram blocks used throughout the code.
  */
 RDMALocalBlocks local_ram_blocks;
-RDMARemoteBlock *block;
+RDMADestBlock  *dest_blocks;
 
 /*
  * Migration on *destination* started.
@@ -590,7 +590,7 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
 memset(local, 0, sizeof *local);
 qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
 trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
-rdma->block = (RDMARemoteBlock *) g_malloc0(sizeof(RDMARemoteBlock) *
+rdma->dest_blocks = (RDMADestBlock *) g_malloc0(sizeof(RDMADestBlock) *
 rdma->local_ram_blocks.nb_blocks);
 local->init = true;
 return 0;
@@ -2177,8 +2177,8 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
 rdma->connected = false;
 }
 
-g_free(rdma->block);
-rdma->block = NULL;
+g_free(rdma->dest_blocks);
+rdma->dest_blocks = NULL;
 
 for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
 if (rdma->wr_data[idx].control_mr) {
@@ -2967,25 +2967,25 @@ static int qemu_rdma_registration_handle(QEMUFile *f, 
void *opaque,
  * their "local" descriptions with what was sent.
  */
 for (i = 0; i < local->nb_blocks; i++) {
-rdma->block[i].remote_host_addr =
+rdma->dest_blocks[i].remote_host_addr =
 (uintptr_t)(local->block[i].local_host_addr);
 
 if (rdma->pin_all) {
-rdma->block[i].remote_rkey = local->block[i].mr->rkey;
+rdma->dest_blocks[i].remote_rkey = 
local->block[i].mr->rkey;
 }
 
-rdma->block[i].offset = local->block[i].offset;
-rdma->block[i].length = local->block[i].length;
+rdma->dest_blocks[i].offset = local->block[i].offset;
+rdma->dest_blocks[i].length = local->block[i].length;
 
-remote_block_to_network(&rdma->block[i]);
+dest_block_to_network(&rdma->dest_blocks[i]);
 }
 
 blocks.len = rdma->local_ram_blocks.nb_blocks
-* sizeof(RDMARemoteBlock);
+* sizeof(RDMADestBlock);
 
 
 ret = qemu_rdma_post_send_control(rdma,
-(uint8_t *) rdma->block, &blocks);
+(uint8_t *) rdma->dest_blocks, 
&blocks);
 
 if (r

[Qemu-devel] Runtime-modified DIMMs and live migration issue

2015-06-11 Thread Andrey Korolyov

Hello Igor,

the current hotplug code for DIMMs effectively prohibits a
successful migration of a VM if memory was added after startup:

- start a VM with certain amount of empty memory slots,
- add some dimms and online them in guest (I am transitioning from 2
to 16G with 512Mb DIMMs),
- migrate a VM and observe guest null pointer dereference (or BSOD
with reboot, for Windows).

The issue currently affects all stable versions and presumably master,
as there have been no related fixes/RFCs since 2.3, which I'm currently
using for testing. The issue is related to incorrect population of the
regions during runtime hotplugging; hopefully 2.4 will get the fix.

You may run some workload in the guest to be one hundred percent
certain of hitting the issue, for example, fio against
http://xdel.ru/downloads/fio.txt . QEMU args are similar to '... -m
512,slots=31,maxmem=16384M -object
memory-backend-ram,id=mem0,size=512M -device
pc-dimm,id=dimm0,node=0,memdev=mem0 -object
memory-backend-ram,id=mem1,size=512M -device
pc-dimm,id=dimm1,node=0,memdev=mem1 -object
memory-backend-ram,id=mem2,size=512M -device
pc-dimm,id=dimm2,node=0,memdev=mem2...'

Thanks for looking into this!
11 June 2015, 19:50:14  [ 141.005630] fio[2742]: segfault at 0 ip (null) sp 
7f841ab5aeb8 error 14
11 June 2015, 19:50:14  in fio[40+58000]
11 June 2015, 19:50:14  NULL pointer dereference
11 June 2015, 19:50:14  at 0028
11 June 2015, 19:50:14  [ 141.006282] IP:
11 June 2015, 19:50:14  [ 141.006316] PGD 107ccc067
11 June 2015, 19:50:14  PUD 106056067
11 June 2015, 19:50:14  [ 141.006319] Oops:  [#1]
11 June 2015, 19:50:14  SMP
11 June 2015, 19:50:14  nfsd
11 June 2015, 19:50:14  auth_rpcgss
11 June 2015, 19:50:14  oid_registry
11 June 2015, 19:50:14  nfs
11 June 2015, 19:50:14  lockd
11 June 2015, 19:50:14  fscache
11 June 2015, 19:50:14  netconsole
11 June 2015, 19:50:14  configfs
11 June 2015, 19:50:14  crct10dif_pclmul
11 June 2015, 19:50:14  crct10dif_common
11 June 2015, 19:50:14  ghash_clmulni_intel
11 June 2015, 19:50:14  aesni_intel
11 June 2015, 19:50:14  lrw
11 June 2015, 19:50:14  gf128mul
11 June 2015, 19:50:14  ablk_helper
11 June 2015, 19:50:14  psmouse
11 June 2015, 19:50:14  parport_pc
11 June 2015, 19:50:14  virtio_console
11 June 2015, 19:50:14  serio_raw
11 June 2015, 19:50:14  evdev
11 June 2015, 19:50:14  pcspkr
11 June 2015, 19:50:14  processor
11 June 2015, 19:50:14  thermal_sys
11 June 2015, 19:50:14  button
11 June 2015, 19:50:14  ext4
11 June 2015, 19:50:14  mbcache
11 June 2015, 19:50:14  ata_generic
11 June 2015, 19:50:14  virtio_blk
11 June 2015, 19:50:14  crc32c_intel
11 June 2015, 19:50:14  floppy
11 June 2015, 19:50:14  xhci_hcd
11 June 2015, 19:50:14  libata
11 June 2015, 19:50:14  virtio_ring
11 June 2015, 19:50:14  usbcore
11 June 2015, 19:50:14  usb_common
11 June 2015, 19:50:14  [ 141.006396] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt7-1~bpo70+1
11 June 2015, 19:50:14  [ 141.006397] Hardware name: SuperMicro Virtual 
Appliance, BIOS 1.1
11 June 2015, 19:50:14  [ 141.006403] RIP: 0010:[]
11 June 2015, 19:50:14  [] ext4_finish_bio+0xd8/0x220 [ext4]
11 June 2015, 19:50:14  [ 141.006415] RAX:  RBX: 
 RCX: 1000
11 June 2015, 19:50:14  [ 141.006415] RDX: 000d RSI: 
ea0010ea6818 RDI: 88001c291300
11 June 2015, 19:50:14  [ 141.006417] R10: 0002 R11: 
0040 R12: 8804a41aaf98
11 June 2015, 19:50:14  [ 141.006419] FS: () 
GS:88001fc0() knlGS:
11 June 2015, 19:50:14  [ 141.006420] CS: 0010 DS:  ES:  CR0: 
80050033
11 June 2015, 19:50:14  [ 141.006434] Stack:
11 June 2015, 19:50:14  [ 141.006435] 005e
11 June 2015, 19:50:14  0007
11 June 2015, 19:50:14  88001fdd2ec0
11 June 2015, 19:50:14  88001c291300
11 June 2015, 19:50:14  88001d84c240
11 June 2015, 19:50:14  [ 141.006439] 0093
11 June 2015, 19:50:14  88001d84c940
11 June 2015, 19:50:14  d794c666350eb7d9
11 June 2015, 19:50:14  [ 141.006442] 
11 June 2015, 19:50:14  [] ? ext4_end_bio+0xc6/0x130 [ext4]
11 June 2015, 19:50:14  [] ? blk_update_request+0x9b/0x310
11 June 2015, 19:50:14  [ 141.006488]
11 June 2015, 19:50:14  [ 141.006494]
11 June 2015, 19:50:14  [] ? 
__blk_mq_complete_request+0x79/0x110
11 June 2015, 19:50:14  [] ? virtblk_done+0x4d/0xb0 
[virtio_blk]
11 June 2015, 19:50:14  [ 141.006506]
11 June 2015, 19:50:14  [ 141.006512]
11 June 2015, 19:50:14  [] ? 
handle_irq_event_percpu+0x54/0x1e0
11 June 2015, 19:50:14  [] ? 
update_blocked_averages+0x24a/0x5f0
11 June 2015, 19:50:14  [ 141.006540]
11 June 2015, 19:50:14  [ 141.006542]
11 June 2015, 19:50:14  [] ? handle_edge_irq+0x7d/0x120
11 June 2015, 19:50:14  [] ? handle_irq+0x1d/0x30
11 June 2015, 19:50:14  [ 141.006559]
11 June 2015, 19:50:14  [ 141.006579]
11 June 2015, 19:50:15  [ 141.006582]
11 June 2015, 19:50:15  [] ? __do_softirq+0x88/0x2e

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Kevin O'Connor

On Thu, Jun 11, 2015 at 05:36:06PM +0300, Marcel Apfelbaum wrote:
> On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
> >On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
> >>On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
> >>>On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
> The fix solves the following issue:
> The PXB device exposes a new  pci root bridge with the
> fw path:  /pci-root@4/..., in which 4 is the root bus number.
> Before this patch the fw path was wrongly computed:
>  /pci-root@1/pci@i0cf8/...
> Fix the above issues: Correct the bus number and remove the
> extra host bridge description.
> >>>
> >>>Why is that wrong?  The previous path looks correct to me.
> >>The prev path includes both the extra root bridge and *then* the usual host 
> >>bridge.
> >>  /pci-root@1/pci@i0cf8/   ...
> >> ^ new   ^ regular  ^ devices
> >>
> >>Since the new pci root bridge (and bus) is in "parallel" with the regular
> >>one, it is not correct to add it to the path.
> >>
> >>The architecture is:
> >>  //devices...
> >>  /extra root bridge/devices...
> >>  /extra root bridge/devices...
> >>And not
> >>/extra root bridge///devices
> >
> >Your patch changed both the "/extra root bridge/devices..." part and
> >the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
> >correct IMO.
> Why? @1 should be the unit address which is the text representation
> of the physical address, in our case the slot. Since the bus number
> in our case is 4, I think /pci-root@4/ is the 'correct' address.

On real machines, the firmware assigns the 4 - it's not a physical
address; it's a logical address (like all bus numbers in PCI).  The
firmware might assign a totally different number on the next boot.

-Kevin

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Kevin O'Connor

On Thu, Jun 11, 2015 at 04:35:33PM +0200, Laszlo Ersek wrote:
> On 06/11/15 15:58, Kevin O'Connor wrote:
> > On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
> >> The fix solves the following issue:
> >> The PXB device exposes a new  pci root bridge with the
> >> fw path:  /pci-root@4/..., in which 4 is the root bus number.
> >> Before this patch the fw path was wrongly computed:
> >> /pci-root@1/pci@i0cf8/...
> >> Fix the above issues: Correct the bus number and remove the
> >> extra host bridge description.
> > 
> > Why is that wrong?  The previous path looks correct to me.
> > 
> >> The IEEE Std 1275-1994:
> >>
> >>   IEEE Standard for Boot (Initialization Configuration)
> >> Firmware: Core Requirements and Practices
> >>   3.2.1.1 Node names
> >>   Each node in the device tree is identified by a node name
> >>   using the following notation:
> >>   driver-name@unit-address:device-arguments
> >>
> >>   The driver name field is a sequence of between one and 31
> >>   letters [...]. By convention, this name includes the name of
> >>   the device’s manufacturer and the device’s model name separated
> >>   by a “,”.
> >>
> >>   The unit address field is the text representation of the
> >>   physical address of the device within the address space
> >>   defined by its parent node. The form of the text
> >>   representation is bus-dependent.
> > 
> > Note the "physical address" part in the above.  Your patch changes the
> > "pci-root@" syntax to use a logical address instead of a physical
> > address.  That is, unless I've missed something, SeaBIOS today uses a
> > physical address (the n'th root bus) and the patch would change it to
> > use a logical address.
> > 
> > One of the goals of using an "openfirmware" like address was so that
> > they would be stable across boots (the same mechanism is also used
> > with coreboot).  Using a physical address is key for this, because
> > simply adding or removing a PCI device could cause the logical PCI
> > bridge enumeration to change - and that would mess up the bootorder
> > list if it was based on logical addresses.
> 
> There are two questions here. The first is the inclusion of the
> "pci@i0cf8" node even if a "pci-root@x" node is present in front of it.
> The hunk that changes that is not your main concern, right? (And Marcel
> just described that hunk in more detail.)
> 
> The other question is how "x" is selected in "pci-root@x".
> 
> On the QEMU side, and in OVMF, "x" is keyed off of the bus_nr property.
> If you change that property from (say) 3 to 4, then the device paths
> exported by QEMU will change. However, the location (in the PCI
> hierarchy) of all the affected devices will *also* change at once, and
> their auto-enumerated, firmware-side device paths will reflect that.
> Therefore the new "bootorder" fw_cfg entries will match the freshly
> generated firmware-side device paths.
> 
> So why is this not stable? If you change the hardware without
> automatically updating any stashed firmware-side device paths, then
> things will fall apart without "bootorder" entries in the picture anyway.
> 
> Also, assuming you key off "x" of the running counter that counts root
> buses as they are found during enumeration, that's a possibility too,
> but I don't see how it gives more stability. If you insert a new root
> bus (with a device on it) between two preexisting ones, that will offset
> all the "x" values for the root buses that come after it by one.

The SeaBIOS code is used on both virtual machines and real machines.
The bus number is something that is generated by software and it is
not assured to be stable between boots.  (For example, if someone adds
a PCI device to their machine between boots then every bus number in
the system might be different on the next boot.)  The open firmware
paths go to great lengths to avoid arbitrary bus numbers today - for
example:

/pci@i0cf8/pci-bridge@1/usb@1,2/hub@3/storage@1/channel@0/disk@0,0

Given the effort spent on avoiding arbitrary bus numbers, I'm confused
why one would want to add them.

> In UEFI at least (I'm not speaking about OVMF in particular, but the
> UEFI spec), there is a "short-form device path" concept for hard drive
> and USB boot options. For hard disks, it is practically a relative
> device path that lacks the path fragment from the root node until just
> before the GPT partition identifier. The idea being, if you plug your
> SCSI controller in another PCI slot, the change in the full device path
> will be local to the path fragment that is not captured in the
> (persistent) boot option. The GPT GUID can identify the partition
> uniquely in the system wherever it exists, so it can be booted even
> without fully enumerating all devices and reproducing all the default
> boot options.
> 
> Short of such a "uniquely identifying relative devpath" trick, I don't
> think stability

[Qemu-devel] [PATCH 5/6] nmi: Implement inject_nmi() for non-monitor context use

2015-06-11 Thread Christian Borntraeger

From: Xu Wang 

Let's introduce a general "inject_nmi()" function that doesn't rely on the cpu
index of the monitor, but uses cpu index 0 as default (except for x86).
This function can then later be used from a non-monitor context.

Signed-off-by: Xu Wang 
Reviewed-by: David Hildenbrand 
CC: Alexey Kardashevskiy 
Signed-off-by: Christian Borntraeger 
---
 hw/core/nmi.c| 20 
 include/hw/nmi.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/hw/core/nmi.c b/hw/core/nmi.c
index 3dff020..5260d6c 100644
--- a/hw/core/nmi.c
+++ b/hw/core/nmi.c
@@ -21,6 +21,7 @@
 
 #include "hw/nmi.h"
 #include "qapi/qmp/qerror.h"
+#include "monitor/monitor.h"
 
 struct do_nmi_s {
 int cpu_index;
@@ -70,6 +71,25 @@ void nmi_monitor_handle(int cpu_index, Error **errp)
 }
 }
 
+void inject_nmi(void)
+{
+#if defined(TARGET_I386)
+CPUState *cs;
+
+CPU_FOREACH(cs) {
+X86CPU *cpu = X86_CPU(cs);
+
+if (!cpu->apic_state) {
+cpu_interrupt(cs, CPU_INTERRUPT_NMI);
+} else {
+apic_deliver_nmi(cpu->apic_state);
+}
+}
+#else
+nmi_monitor_handle(0, NULL);
+#endif
+}
+
 static const TypeInfo nmi_info = {
 .name  = TYPE_NMI,
 .parent= TYPE_INTERFACE,
diff --git a/include/hw/nmi.h b/include/hw/nmi.h
index b541772..f4cec62 100644
--- a/include/hw/nmi.h
+++ b/include/hw/nmi.h
@@ -45,5 +45,6 @@ typedef struct NMIClass {
 } NMIClass;
 
 void nmi_monitor_handle(int cpu_index, Error **errp);
+void inject_nmi(void);
 
 #endif /* NMI_H */
-- 
2.3.0

[Qemu-devel] [PATCH 1/6] watchdog: change option wording to allow for more watchdogs

2015-06-11 Thread Christian Borntraeger

From: Xu Wang 

We will introduce a new watchdog for s390x. Let's adapt
qemu-options.hx to allow for more watchdog devices.

Signed-off-by: Xu Wang 
Reviewed-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
[split out qemu-options.hx base changes]
---
 qemu-options.hx | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 1d281f6..a295c0f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3152,7 +3152,7 @@ when the shift value is high (how high depends on the host machine).
 ETEXI
 
 DEF("watchdog", HAS_ARG, QEMU_OPTION_watchdog, \
-"-watchdog i6300esb|ib700\n" \
+"-watchdog model\n" \
 "enable virtual hardware watchdog [default=none]\n",
 QEMU_ARCH_ALL)
 STEXI
@@ -3160,16 +3160,21 @@ STEXI
 @findex -watchdog
 Create a virtual hardware watchdog device.  Once enabled (by a guest
 action), the watchdog must be periodically polled by an agent inside
-the guest or else the guest will be restarted.
+the guest or else the guest will be restarted. Choose a model for
+which your guest has drivers.
 
-The @var{model} is the model of hardware watchdog to emulate.  Choices
-for model are: @code{ib700} (iBASE 700) which is a very simple ISA
-watchdog with a single timer, or @code{i6300esb} (Intel 6300ESB I/O
-controller hub) which is a much more featureful PCI-based dual-timer
-watchdog.  Choose a model for which your guest has drivers.
-
-Use @code{-watchdog help} to list available hardware models.  Only one
+The @var{model} is the model of hardware watchdog to emulate. Use
+@code{-watchdog help} to list available hardware models. Only one
 watchdog can be enabled for a guest.
+
+The following models may be available:
+@table @option
+@item ib700
+iBASE 700 is a very simple ISA watchdog with a single timer.
+@item i6300esb
+Intel 6300ESB I/O controller hub is a much more featureful PCI-based
+dual-timer watchdog.
+@end table
 ETEXI
 
 DEF("watchdog-action", HAS_ARG, QEMU_OPTION_watchdog_action, \
-- 
2.3.0

Re: [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps

2015-06-11 Thread John Snow



On 06/11/2015 06:49 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 11.06.2015 02:42, John Snow wrote:
>>
>> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> From: Vladimir Sementsov-Ogievskiy 
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/qcow2-dirty-bitmap.c |  5 +
>>>   block/qcow2.c  | 13 +++--
>>>   block/qcow2.h  |  9 +
>>>   3 files changed, 25 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
>>> index db83112..686a121 100644
>>> --- a/block/qcow2-dirty-bitmap.c
>>> +++ b/block/qcow2-dirty-bitmap.c
>>> @@ -188,6 +188,11 @@ static int
>>> qcow2_write_dirty_bitmaps(BlockDriverState *bs)
>>> s->dirty_bitmaps_offset = dirty_bitmaps_offset;
>>>   s->dirty_bitmaps_size = dirty_bitmaps_size;
>>> +if (s->nb_dirty_bitmaps > 0) {
>>> +s->autoclear_features |= QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
>>> +} else {
>>> +s->autoclear_features &= ~QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
>>> +}
>>>   ret = qcow2_update_header(bs);
>>>   if (ret < 0) {
>>>   fprintf(stderr, "Could not update qcow2 header\n");
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 406e55d..f85a55a 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -182,6 +182,14 @@ static int
>>> qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>>>   return ret;
>>>   }
>>>   +if (!(s->autoclear_features &
>>> QCOW2_AUTOCLEAR_DIRTY_BITMAPS) &&
>>> +s->nb_dirty_bitmaps > 0) {
>>> +ret = qcow2_delete_all_dirty_bitmaps(bs, errp);
>>> +if (ret < 0) {
>>> +return ret;
>>> +}
>>> +}
>>> +
>>>   #ifdef DEBUG_EXT
>>>   printf("Qcow2: Got dirty bitmaps extension:"
>>>  " offset=%" PRIu64 " nb_bitmaps=%" PRIu32 "\n",
>>> @@ -928,8 +936,9 @@ static int qcow2_open(BlockDriverState *bs, QDict
>>> *options, int flags,
>>>   }
>>> /* Clear unknown autoclear feature bits */
>>> -if (!bs->read_only && !(flags & BDRV_O_INCOMING) &&
>>> s->autoclear_features) {
>>> -s->autoclear_features = 0;
>>> +if (!bs->read_only && !(flags & BDRV_O_INCOMING) &&
>>> +(s->autoclear_features & ~QCOW2_AUTOCLEAR_MASK)) {
>>> +s->autoclear_features |= QCOW2_AUTOCLEAR_MASK;
>> Like Stefan already mentioned, fixing this |= to &= will fix iotest 036,
>> which is otherwise broken by this patch.
>>
>>>   ret = qcow2_update_header(bs);
>>>   if (ret < 0) {
>>>   error_setg_errno(errp, -ret, "Could not update qcow2
>>> header");
>>> diff --git a/block/qcow2.h b/block/qcow2.h
>>> index b5e576c..14bd6f9 100644
>>> --- a/block/qcow2.h
>>> +++ b/block/qcow2.h
>>> @@ -215,6 +215,15 @@ enum {
>>>   QCOW2_COMPAT_FEAT_MASK= QCOW2_COMPAT_LAZY_REFCOUNTS,
>>>   };
>>>   +/* Autoclear feature bits */
>>> +enum {
>>> +QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR = 0,
>>> +QCOW2_AUTOCLEAR_DIRTY_BITMAPS   =
>>> +1 << QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR,
>>> +
>>> +QCOW2_AUTOCLEAR_MASK=
>>> QCOW2_AUTOCLEAR_DIRTY_BITMAPS,
>>> +};
>>> +
>> I find it a little awkward to have an enum with three different kinds of
>> data in it, unless I am reading this incorrectly. (bit position, bit
>> masks, and accumulated bit mask.)
>>
>> Just enumerating the indices is probably sufficient:
>>
>> enum {
>>QCOW2_AUTOCLEAR_BEGIN = 0,
>>QCOW2_AUTOCLEAR_DIRTY_BITMAPS = QCOW2_AUTOCLEAR_BEGIN,
>>...,
>>QCOW2_AUTOCLEAR_END
>> }
>>
>> and then the QCOW2_AUTOCLEAR_MASK can either be programmatically defined
>> via a function, or just pre-computed as a #define.
>>
>> If you still want the mask definitions, you could do something cheeky
>> like this:
>>
>> #define AUTOCLEAR_MASK(X) (1 << QCOW2_AUTOCLEAR_ ## X)
>>
>> and then you can use things like AUTOCLEAR_MASK(DIRTY_BITMAPS) without
>> having to create and maintain two separate tables if you want both forms
>> easily available.
> 
> 
> This enum is modelled on the enums for QCOW2_INCOMPAT_* and QCOW2_COMPAT_*,
> which are already in the code... So may I make a patch for them too?
> I agree, it is a strange solution to put things of a different nature
> into one enum.
> 

Follow Kevin's lead, here -- It looked strange to me, but it _is_ best
to follow the existing style. I didn't look at the surrounding code too
carefully.

> 
>>
>>>   enum qcow2_discard_type {
>>>   QCOW2_DISCARD_NEVER = 0,
>>>   QCOW2_DISCARD_ALWAYS,
>>>
> 
>
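
For reference, here is a minimal standalone sketch of the masking discussed
in this thread. Clearing unknown autoclear bits needs `&=` with the mask of
known bits; `|=` would instead set the known bits and leave the unknown ones
in place. The enum copies the names from the patch under review;
clear_unknown_autoclear() is a hypothetical helper made up for illustration
and is not QEMU code.

```c
#include <stdint.h>

/* Names copied from the patch under review. */
enum {
    QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR = 0,
    QCOW2_AUTOCLEAR_DIRTY_BITMAPS       =
        1 << QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR,

    QCOW2_AUTOCLEAR_MASK                = QCOW2_AUTOCLEAR_DIRTY_BITMAPS,
};

/*
 * Hypothetical helper (not in QEMU): keep only the autoclear bits this
 * version understands and drop everything else.
 */
uint64_t clear_unknown_autoclear(uint64_t features)
{
    return features & (uint64_t)QCOW2_AUTOCLEAR_MASK;
}
```

With this, a header carrying the dirty-bitmaps bit plus one unknown bit
comes back with only the dirty-bitmaps bit set.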

Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification

2015-06-11 Thread John Snow



On 06/11/2015 06:25 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 10.06.2015 18:34, Kevin Wolf wrote:
>> Am 08.06.2015 um 17:21 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> From: Vladimir Sementsov-Ogievskiy 
>>>
>>> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
>>> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
>>> other drives (there may be qcow2 file with zero disk size but with
>>> several dirty bitmaps for other drives).
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   docs/specs/qcow2.txt | 66
>>> 
>>>   1 file changed, 66 insertions(+)
>>>
>>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>>> index 121dfc8..0fffba2 100644
>>> --- a/docs/specs/qcow2.txt
>>> +++ b/docs/specs/qcow2.txt
>>> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like
>>> the following:
>>>   0x - End of the header extension area
>>>   0xE2792ACA - Backing file format name
>>>   0x6803f857 - Feature name table
>>> +0x23852875 - Dirty bitmaps
>>>   other  - Unknown header extension, can
>>> be safely
>>>ignored
>>>   @@ -166,6 +167,19 @@ the header extension data. Each entry look
>>> like this:
>>>   terminated if it has full length)
>>> +== Dirty bitmaps ==
>>> +
>>> +Dirty bitmaps is an optional header extension. It provides a
>>> possibility of
>>> +storing dirty bitmaps in qcow2 image. The fields are:
>>> +
>>> +  0 -  3:  nb_dirty_bitmaps
>>> +   Number of dirty bitmaps contained in the image
>>> +
>>> +  4 - 11:  dirty_bitmaps_offset
>>> +   Offset into the image file at which the dirty
>>> bitmaps table
>>> +   starts. Must be aligned to a cluster boundary.
>>> +
>>> +
>>>   == Host cluster management ==
>> You need to use a compatibility flag because for old qemu versions, the
>> dirty bitmaps (and associated metadata) are leaked clusters and qemu-img
>> check would "repair" them by resetting the refcount to 0.
>>
>> At second sight, I see that your patches add an autoclear flag.
>> Presumably the contents of the dirty bitmaps is outdated when you
>> accessed the image with an older version, so this seems right. We just
>> need to document it.
>>
>>>   qcow2 manages the allocation of host clusters by maintaining a
>>> reference count
>>> @@ -360,3 +374,55 @@ Snapshot table entry:
>>> variable:   Padding to round up the snapshot table entry
>>> size to the
>>>   next multiple of 8.
>>> +
>>> +
>>> +== Dirty bitmaps ==
>>> +
>>> +The feature supports storing several dirty bitmaps in the qcow2 file.
>>> +
>>> +=== Cluster mapping ===
>>> +
>>> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
>>> +bitmaps to host clusters. There is only an L1 table.
>>> +
>>> +The L1 table has a variable size (stored in the Bitmap table entry)
>>> and may
>>> +use multiple clusters, however it must be contiguous in the image file.
>>> +
>>> +Given an offset into the bitmap, the offset into the image file can be
>>> +obtained as follows:
>>> +
>>> +offset = l1_table[offset / cluster_size] + (offset % cluster_size)
>>> +
>>> +L1 table entry:
>>> +
>>> +Bit  0 -  61:   Standard cluster descriptor
>>> +
>>> +62 -  63:   Reserved
>> Stefan already mentioned that we don't have a "L1" when there is only
>> one level, and that you shouldn't reuse the cluster descriptors from L2
>> tables.
>>
>>> +=== Bitmap table ===
>>> +
>>> +A directory of all bitmaps is stored in the bitmap table, a
>>> contiguous area in
>>> +the image file, whose starting offset and length are given by the
>>> header fields
>>> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap
>>> table have
>>> +variable length, depending on the length of name and extra data.
>>> +
>>> +Bitmap table entry:
>>> +
>>> +Byte 0 -  7:Offset into the image file at which the L1 table
>>> for the
>>> +bitmap starts. Must be aligned to a cluster
>>> boundary.
>>> +
>>> + 8 - 11:Number of entries in the L1 table of the bitmap
>> Worth using 64 bits here? This can only cover 4 * 512 GB = 2 TB for the
>> smallest possible cluster size. Though it's 65536 * 512 = 32 PB for the
>> default, which might be enough for a while.
>>
>>> +12 - 15:Bitmap granularity in bytes
>>> +
>>> +16 - 23:Bitmap size in sectors
>> Please don't use sectors, that's a meaningless unit. Bytes is better.
> Just a bad description. Actually it is ~ (number of bits in the bitmap *
> granularity), and it corresponds to the number of sectors in the image.

In defense of this, it does happen to be sectors, but what it /really/
represents is the virtual addressable range of the bitmap (its

[Qemu-devel] [PATCH 2/6] s390x/watchdog: introduce diag288 watchdog device

2015-06-11 Thread Christian Borntraeger

From: Xu Wang 

This patch introduces a new diag288 watchdog device that will, just like
other watchdogs, monitor a guest and take corresponding actions when it
detects that the guest is not responding.

diag288 is s390x specific. The wiring to s390x KVM will be done in
separate patches.

Signed-off-by: Xu Wang 
Reviewed-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
[split out qemu-options.hx base changes]
---
 default-configs/s390x-softmmu.mak |   1 +
 hw/watchdog/Makefile.objs |   1 +
 hw/watchdog/wdt_diag288.c | 110 ++
 include/hw/watchdog/wdt_diag288.h |  36 +
 qemu-options.hx   |   3 ++
 5 files changed, 151 insertions(+)
 create mode 100644 hw/watchdog/wdt_diag288.c
 create mode 100644 include/hw/watchdog/wdt_diag288.h

diff --git a/default-configs/s390x-softmmu.mak b/default-configs/s390x-softmmu.mak
index f9e13f1..36e15de 100644
--- a/default-configs/s390x-softmmu.mak
+++ b/default-configs/s390x-softmmu.mak
@@ -4,3 +4,4 @@ CONFIG_VIRTIO=y
 CONFIG_SCLPCONSOLE=y
 CONFIG_S390_FLIC=y
 CONFIG_S390_FLIC_KVM=$(CONFIG_KVM)
+CONFIG_WDT_DIAG288=y
diff --git a/hw/watchdog/Makefile.objs b/hw/watchdog/Makefile.objs
index 4b0374a..72e3ffd 100644
--- a/hw/watchdog/Makefile.objs
+++ b/hw/watchdog/Makefile.objs
@@ -1,3 +1,4 @@
 common-obj-y += watchdog.o
 common-obj-$(CONFIG_WDT_IB6300ESB) += wdt_i6300esb.o
 common-obj-$(CONFIG_WDT_IB700) += wdt_ib700.o
+common-obj-$(CONFIG_WDT_DIAG288) += wdt_diag288.o
diff --git a/hw/watchdog/wdt_diag288.c b/hw/watchdog/wdt_diag288.c
new file mode 100644
index 000..351b5a8
--- /dev/null
+++ b/hw/watchdog/wdt_diag288.c
@@ -0,0 +1,110 @@
+/*
+ * watchdog device diag288 support
+ *
+ * Copyright IBM, Corp. 2015
+ *
+ * Authors:
+ *  Xu Wang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at your
+ * option) any later version.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "sysemu/watchdog.h"
+#include "hw/sysbus.h"
+#include "qemu/timer.h"
+#include "hw/watchdog/wdt_diag288.h"
+
+static WatchdogTimerModel model = {
+.wdt_name = TYPE_WDT_DIAG288,
+.wdt_description = "diag288 device for s390x platform",
+};
+
+static void wdt_diag288_reset(DeviceState *dev)
+{
+DIAG288State *diag288 = DIAG288(dev);
+
+diag288->enabled = false;
+timer_del(diag288->timer);
+}
+
+static void diag288_timer_expired(void *dev)
+{
+qemu_log_mask(CPU_LOG_RESET, "Watchdog timer expired.\n");
+watchdog_perform_action();
+wdt_diag288_reset(dev);
+}
+
+static int wdt_diag288_handle_timer(DIAG288State *diag288,
+ uint64_t func, uint64_t timeout)
+{
+switch (func) {
+case WDT_DIAG288_INIT:
+diag288->enabled = true;
+/* fall through */
+case WDT_DIAG288_CHANGE:
+if (!diag288->enabled) {
+return -1;
+}
+timer_mod(diag288->timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
+  timeout * get_ticks_per_sec());
+break;
+case WDT_DIAG288_CANCEL:
+if (!diag288->enabled) {
+return -1;
+}
+diag288->enabled = false;
+timer_del(diag288->timer);
+break;
+default:
+return -1;
+}
+
+return 0;
+}
+
+static void wdt_diag288_realize(DeviceState *dev, Error **errp)
+{
+DIAG288State *diag288 = DIAG288(dev);
+
+diag288->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, diag288_timer_expired,
+  dev);
+}
+
+static void wdt_diag288_unrealize(DeviceState *dev, Error **errp)
+{
+DIAG288State *diag288 = DIAG288(dev);
+
+timer_del(diag288->timer);
+timer_free(diag288->timer);
+}
+
+static void wdt_diag288_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+DIAG288Class *diag288 = DIAG288_CLASS(klass);
+
+dc->realize = wdt_diag288_realize;
+dc->unrealize = wdt_diag288_unrealize;
+dc->reset = wdt_diag288_reset;
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+diag288->handle_timer = wdt_diag288_handle_timer;
+}
+
+static const TypeInfo wdt_diag288_info = {
+.class_init = wdt_diag288_class_init,
+.parent = TYPE_DEVICE,
+.name  = TYPE_WDT_DIAG288,
+.instance_size  = sizeof(DIAG288State),
+.class_size = sizeof(DIAG288Class),
+};
+
+static void wdt_diag288_register_types(void)
+{
+watchdog_add_model(&model);
+type_register_static(&wdt_diag288_info);
+}
+
+type_init(wdt_diag288_register_types)
diff --git a/include/hw/watchdog/wdt_diag288.h b/include/hw/watchdog/wdt_diag288.h
new file mode 100644
index 000..7f3fd45
--- /dev/null
+++ b/include/hw/watchdog/wdt_diag288.h
@@ -0,0 +1,36 @@
+#ifndef WDT_DIAG288_H
+#define WDT_DIAG288_H
+
+#include "hw/qdev.h"
+
+#define TYPE_WDT_DIAG288 "diag288"
+#define DIAG288(obj) \
+OBJECT_CHECK(DIAG288State, (obj), TYPE_WDT_DIAG288)
+#define DIAG288_CLASS(klass) \
+OBJECT_CLASS_CHE

Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification

2015-06-11 Thread John Snow


On 06/11/2015 09:03 AM, Stefan Hajnoczi wrote:
> On Thu, Jun 11, 2015 at 01:19:24PM +0300, Vladimir
> Sementsov-Ogievskiy wrote:
>> On 10.06.2015 16:24, Stefan Hajnoczi wrote:
>>> On Wed, Jun 10, 2015 at 11:19:30AM +0300, Vladimir
>>> Sementsov-Ogievskiy wrote:
>>>> On 09.06.2015 20:03, Stefan Hajnoczi wrote:
>>>>> On Mon, Jun 08, 2015 at 06:21:19PM +0300, Vladimir
>>>>> Sementsov-Ogievskiy wrote:
>>>>>> @@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
>>>>>>  terminated if it has full length)
>>>>>> +== Dirty bitmaps ==
>>>>>> +
>>>>>> +Dirty bitmaps is an optional header extension. It provides a possibility of
>>>>>> +storing dirty bitmaps in qcow2 image. The fields are:
>>>>>> +
>>>>>> +  0 -  3:  nb_dirty_bitmaps
>>>>>> +   Number of dirty bitmaps contained in the image
>>>>> Is there a maximum?
>>>> hmm. any proposals for this?
>>> 65535 seems practical.
>> 
>> So, you suggest to reduce this field width to 2b? And additional
>> 2 bytes reserved field, to achieve 8b-alignment?
> 
> No, I would leave it 32-bit but impose a limit (which can be
> increased later if necessary).  That's how nb_snapshots works too.
> 

Doesn't the code already limit the number of bitmaps via +#define
QCOW_MAX_DIRTY_BITMAPS 65536, from patch 2?
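
To make the suggestion concrete: keeping the field 32 bits wide while
enforcing a smaller limit at open time could look roughly like the sketch
below. The limit value is the one quoted from patch 2;
check_nb_dirty_bitmaps() is a hypothetical name, and whether the limit is
inclusive is my assumption, not something the series states.

```c
#include <stdint.h>

/* Limit quoted from patch 2 of the series. */
#define QCOW_MAX_DIRTY_BITMAPS 65536

/*
 * Hypothetical check (not QEMU code): the on-disk field stays 32 bits wide,
 * but values above the limit are rejected.  Treating the limit itself as
 * acceptable is an assumption.
 * Returns 0 if the value is acceptable, -1 otherwise.
 */
int check_nb_dirty_bitmaps(uint32_t nb_dirty_bitmaps)
{
    return nb_dirty_bitmaps <= QCOW_MAX_DIRTY_BITMAPS ? 0 : -1;
}
```

Raising the limit later only means changing the #define; the header layout
stays the same.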

[Qemu-devel] [PATCH 3/6] s390x/kvm: diag288 instruction interception and handling

2015-06-11 Thread Christian Borntraeger

From: Xu Wang 

Intercept the diag288 requests from kvm guests, and hand the
requested command to the diag288 watchdog device for further
handling.

Signed-off-by: Xu Wang 
Reviewed-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 target-s390x/cpu.h |  1 +
 target-s390x/kvm.c | 18 ++
 target-s390x/misc_helper.c | 29 +
 3 files changed, 48 insertions(+)

diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index 584e74b..d63eb51 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -1100,6 +1100,7 @@ uint32_t set_cc_nz_f128(float128 v);
 
 /* misc_helper.c */
 #ifndef CONFIG_USER_ONLY
+int handle_diag_288(CPUS390XState *env, uint64_t r1, uint64_t r3);
 void handle_diag_308(CPUS390XState *env, uint64_t r1, uint64_t r3);
 #endif
 void program_interrupt(CPUS390XState *env, uint32_t code, int ilen);
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index f6f61b9..b02ff8d 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -98,6 +98,7 @@
 #define PRIV_E3_MPCIFC  0xd0
 #define PRIV_E3_STPCIFC 0xd4
 
+#define DIAG_TIMEREVENT 0x288
 #define DIAG_IPL0x308
 #define DIAG_KVM_HYPERCALL  0x500
 #define DIAG_KVM_BREAKPOINT 0x501
@@ -1267,6 +1268,20 @@ static int handle_hypercall(S390CPU *cpu, struct kvm_run *run)
 return ret;
 }
 
+static void kvm_handle_diag_288(S390CPU *cpu, struct kvm_run *run)
+{
+uint64_t r1, r3;
+int rc;
+
+cpu_synchronize_state(CPU(cpu));
+r1 = (run->s390_sieic.ipa & 0x00f0) >> 4;
+r3 = run->s390_sieic.ipa & 0x000f;
+rc = handle_diag_288(&cpu->env, r1, r3);
+if (rc) {
+enter_pgmcheck(cpu, PGM_SPECIFICATION);
+}
+}
+
 static void kvm_handle_diag_308(S390CPU *cpu, struct kvm_run *run)
 {
 uint64_t r1, r3;
@@ -1306,6 +1321,9 @@ static int handle_diag(S390CPU *cpu, struct kvm_run *run, uint32_t ipb)
  */
 func_code = decode_basedisp_rs(&cpu->env, ipb, NULL) & DIAG_KVM_CODE_MASK;
 switch (func_code) {
+case DIAG_TIMEREVENT:
+kvm_handle_diag_288(cpu, run);
+break;
 case DIAG_IPL:
 kvm_handle_diag_308(cpu, run);
 break;
diff --git a/target-s390x/misc_helper.c b/target-s390x/misc_helper.c
index b375ab7..6711504 100644
--- a/target-s390x/misc_helper.c
+++ b/target-s390x/misc_helper.c
@@ -30,6 +30,7 @@
 #include 
 #endif
 #include "exec/cpu_ldst.h"
+#include "hw/watchdog/wdt_diag288.h"
 
 #if !defined(CONFIG_USER_ONLY)
 #include "sysemu/cpus.h"
@@ -153,6 +154,34 @@ static int load_normal_reset(S390CPU *cpu)
 return 0;
 }
 
+int handle_diag_288(CPUS390XState *env, uint64_t r1, uint64_t r3)
+{
+uint64_t func = env->regs[r1];
+uint64_t timeout = env->regs[r1 + 1];
+uint64_t action = env->regs[r3];
+Object *obj;
+DIAG288State *diag288;
+DIAG288Class *diag288_class;
+
+if (r1 % 2 || action != 0) {
+return -1;
+}
+
+/* Timeout must be more than 15 seconds except for timer deletion */
+if (func != WDT_DIAG288_CANCEL && timeout < 15) {
+return -1;
+}
+
+obj = object_resolve_path_type("", TYPE_WDT_DIAG288, NULL);
+if (!obj) {
+return -1;
+}
+
+diag288 = DIAG288(obj);
+diag288_class = DIAG288_GET_CLASS(diag288);
+return diag288_class->handle_timer(diag288, func, timeout);
+}
+
 #define DIAG_308_RC_OK  0x0001
 #define DIAG_308_RC_NO_CONF 0x0102
 #define DIAG_308_RC_INVALID 0x0402
-- 
2.3.0
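
As a reading aid, the register decoding and the argument checks in this
patch can be exercised outside QEMU with a sketch like the one below. The
masks and the checks are taken from the diff; the function names are
hypothetical, and the numeric value of WDT_DIAG288_CANCEL is an assumption,
since the header that defines it is not shown in full here.

```c
#include <stdint.h>

/* Assumed value, for illustration only; the real header is not shown. */
#define WDT_DIAG288_CANCEL 2

/* Extract the two 4-bit register numbers from the low byte of ipa. */
void diag288_decode_regs(uint16_t ipa, unsigned *r1, unsigned *r3)
{
    *r1 = (ipa & 0x00f0) >> 4;
    *r3 = ipa & 0x000f;
}

/*
 * Same argument checks as handle_diag_288() in the patch.
 * Note that "timeout < 15" accepts a timeout of exactly 15 seconds.
 * Returns 0 if the request may be passed on to the device, -1 otherwise.
 */
int diag288_check_args(unsigned r1, uint64_t func, uint64_t timeout,
                       uint64_t action)
{
    if (r1 % 2 || action != 0) {
        return -1;      /* r1 must name an even register, action must be 0 */
    }
    if (func != WDT_DIAG288_CANCEL && timeout < 15) {
        return -1;      /* too short, unless the timer is being cancelled */
    }
    return 0;
}
```

A failing check corresponds to the PGM_SPECIFICATION path in
kvm_handle_diag_288().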

[Qemu-devel] [PATCH 0/6] s390x/watchdog: add diag288 based watchdog

2015-06-11 Thread Christian Borntraeger

This is the reworked patch set for the s390 diag288 watchdog.
The previous version was posted by Cornelia in this thread:
https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg01950.html

This patch set should address all review comments.
If there are no blockers, a pull request is planned for next
week.

Mao Chuan Li (1):
  watchdog: Add new Virtual Watchdog action INJECT-NMI

Xu Wang (5):
  watchdog: change option wording to allow for more watchdogs
  s390x/watchdog: introduce diag288 watchdog device
  s390x/kvm: diag288 instruction interception and handling
  s390x/watchdog: diag288 migration support
  nmi: Implement inject_nmi() for non-monitor context use

 default-configs/s390x-softmmu.mak |   1 +
 hw/core/nmi.c |  20 +++
 hw/watchdog/Makefile.objs |   1 +
 hw/watchdog/watchdog.c|  10 
 hw/watchdog/wdt_diag288.c | 122 ++
 include/hw/nmi.h  |   1 +
 include/hw/watchdog/wdt_diag288.h |  36 +++
 qapi-schema.json  |   6 +-
 qemu-options.hx   |  26 +---
 target-s390x/cpu.h|   1 +
 target-s390x/kvm.c|  18 ++
 target-s390x/misc_helper.c|  29 +
 12 files changed, 261 insertions(+), 10 deletions(-)
 create mode 100644 hw/watchdog/wdt_diag288.c
 create mode 100644 include/hw/watchdog/wdt_diag288.h

-- 
2.3.0

[Qemu-devel] [PATCH 4/6] s390x/watchdog: diag288 migration support

2015-06-11 Thread Christian Borntraeger

From: Xu Wang 

Add vmstate structure to keep state and data during migration.

Signed-off-by: Xu Wang 
Reviewed-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 hw/watchdog/wdt_diag288.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/watchdog/wdt_diag288.c b/hw/watchdog/wdt_diag288.c
index 351b5a8..1185e06 100644
--- a/hw/watchdog/wdt_diag288.c
+++ b/hw/watchdog/wdt_diag288.c
@@ -21,6 +21,17 @@ static WatchdogTimerModel model = {
 .wdt_description = "diag288 device for s390x platform",
 };
 
+static const VMStateDescription vmstate_diag288 = {
+.name = "vmstate_diag288",
+.version_id = 0,
+.minimum_version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_TIMER_PTR(timer, DIAG288State),
+VMSTATE_BOOL(enabled, DIAG288State),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static void wdt_diag288_reset(DeviceState *dev)
 {
 DIAG288State *diag288 = DIAG288(dev);
@@ -90,6 +101,7 @@ static void wdt_diag288_class_init(ObjectClass *klass, void *data)
 dc->unrealize = wdt_diag288_unrealize;
 dc->reset = wdt_diag288_reset;
 set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+dc->vmsd = &vmstate_diag288;
 diag288->handle_timer = wdt_diag288_handle_timer;
 }
 
-- 
2.3.0
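
For readers following the series: the two fields this patch migrates (the
timer and the enabled flag) are the whole state of the device. The sketch
below is a simplified standalone model of how the handler from patch 2/6
drives that state. It is not the QEMU code: a plain deadline stands in for
the QEMUTimer, the struct and function names are hypothetical, and the
numeric function codes are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Function codes: names from the series, numeric values assumed. */
enum {
    WDT_DIAG288_INIT   = 0,
    WDT_DIAG288_CHANGE = 1,
    WDT_DIAG288_CANCEL = 2,
};

/* Simplified stand-in for DIAG288State. */
struct diag288_model {
    bool enabled;
    uint64_t deadline;  /* in seconds; only meaningful while enabled */
};

/* Returns 0 on success, -1 if the request is invalid in the current state. */
int diag288_model_handle(struct diag288_model *s, uint64_t func,
                         uint64_t now, uint64_t timeout)
{
    switch (func) {
    case WDT_DIAG288_INIT:
        s->enabled = true;
        /* fall through */
    case WDT_DIAG288_CHANGE:
        if (!s->enabled) {
            return -1;  /* CHANGE without a prior INIT */
        }
        s->deadline = now + timeout;
        return 0;
    case WDT_DIAG288_CANCEL:
        if (!s->enabled) {
            return -1;  /* nothing to cancel */
        }
        s->enabled = false;
        return 0;
    default:
        return -1;
    }
}
```

The model shows why both fields need to be saved: after migration the
destination has to know whether a CHANGE or CANCEL is still legal and when
the watchdog is due to fire.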

[Qemu-devel] [PATCH 6/6] watchdog: Add new Virtual Watchdog action INJECT-NMI

2015-06-11 Thread Christian Borntraeger

From: Mao Chuan Li 

This patch allows QEMU to inject an NMI into a guest when the
watchdog expires.

Signed-off-by: Mao Chuan Li 
Reviewed-by: David Hildenbrand 
CC: Eric Blake 
CC: Markus Armbruster 
Signed-off-by: Christian Borntraeger 
---
 hw/watchdog/watchdog.c | 10 ++
 qapi-schema.json   |  6 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/watchdog/watchdog.c b/hw/watchdog/watchdog.c
index 54440c9..8d4b0ee 100644
--- a/hw/watchdog/watchdog.c
+++ b/hw/watchdog/watchdog.c
@@ -27,6 +27,7 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/watchdog.h"
 #include "qapi-event.h"
+#include "hw/nmi.h"
 
 /* Possible values for action parameter. */
 #define WDT_RESET1 /* Hard reset. */
@@ -35,6 +36,7 @@
 #define WDT_PAUSE4 /* Pause. */
 #define WDT_DEBUG5 /* Prints a message and continues running. */
 #define WDT_NONE 6 /* Do nothing. */
+#define WDT_NMI  7 /* Inject nmi into the guest */
 
 static int watchdog_action = WDT_RESET;
 static QLIST_HEAD(watchdog_list, WatchdogTimerModel) watchdog_list;
@@ -95,6 +97,8 @@ int select_watchdog_action(const char *p)
 watchdog_action = WDT_DEBUG;
 else if (strcasecmp(p, "none") == 0)
 watchdog_action = WDT_NONE;
+else if (strcasecmp(p, "inject-nmi") == 0)
+watchdog_action = WDT_NMI;
 else
 return -1;
 
@@ -138,5 +142,11 @@ void watchdog_perform_action(void)
 case WDT_NONE:
 qapi_event_send_watchdog(WATCHDOG_EXPIRATION_ACTION_NONE, &error_abort);
 break;
+
+case WDT_NMI:
+qapi_event_send_watchdog(WATCHDOG_EXPIRATION_ACTION_INJECT_NMI,
+ &error_abort);
+inject_nmi();
+break;
 }
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 6e17a5c..c4ee3ea 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3746,10 +3746,14 @@
 #
 # @none: nothing is done
 #
+# @inject-nmi: a non-maskable interrupt is injected into the first VCPU (all
+#  VCPUS on x86) (since 2.4)
+#
 # Since: 2.1
 ##
 { 'enum': 'WatchdogExpirationAction',
-  'data': [ 'reset', 'shutdown', 'poweroff', 'pause', 'debug', 'none' ] }
+  'data': [ 'reset', 'shutdown', 'poweroff', 'pause', 'debug', 'none',
+'inject-nmi' ] }
 
 ##
 # @IoOperationType
-- 
2.3.0
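
The name-to-action mapping that this patch extends can be tried in
isolation with a sketch like the one below. It uses a lookup table instead
of the if/else chain in select_watchdog_action(), which is my choice for
brevity and not what the patch does. WDT_RESET, WDT_PAUSE, WDT_DEBUG,
WDT_NONE and WDT_NMI use the values visible in the diff; the values for
WDT_SHUTDOWN and WDT_POWEROFF are assumed, and wdt_action_from_name() is a
hypothetical name.

```c
#include <stddef.h>
#include <strings.h>    /* strcasecmp() (POSIX) */

#define WDT_RESET    1
#define WDT_SHUTDOWN 2  /* assumed value */
#define WDT_POWEROFF 3  /* assumed value */
#define WDT_PAUSE    4
#define WDT_DEBUG    5
#define WDT_NONE     6
#define WDT_NMI      7

static const struct {
    const char *name;
    int action;
} wdt_actions[] = {
    { "reset",      WDT_RESET },
    { "shutdown",   WDT_SHUTDOWN },
    { "poweroff",   WDT_POWEROFF },
    { "pause",      WDT_PAUSE },
    { "debug",      WDT_DEBUG },
    { "none",       WDT_NONE },
    { "inject-nmi", WDT_NMI },
};

/* Case-insensitive lookup; returns the action code or -1 if unknown. */
int wdt_action_from_name(const char *p)
{
    size_t i;

    for (i = 0; i < sizeof(wdt_actions) / sizeof(wdt_actions[0]); i++) {
        if (strcasecmp(p, wdt_actions[i].name) == 0) {
            return wdt_actions[i].action;
        }
    }
    return -1;
}
```

Like the original, the lookup is case-insensitive, so "INJECT-NMI" and
"inject-nmi" select the same action.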

[Qemu-devel] About usb passthru and speed mismatch

2015-06-11 Thread Lin Ma


Hi Gerd,

This is about the current implementation of 'usb_host_open' in
hw/usb/host-libusb.c.


When the user performs usb_add, the usb device is first detached from
the kernel and only then checked for a speed mismatch.
If a speed mismatch is found, the usb device isn't attached to the guest,
but it can't be reattached to the kernel either.


I'd like to write a patch that either adds 'usb_check_attach' before
detaching the device from the kernel  _or_  adds 'usb_host_attach_kernel'
under 'fail:' in usb_host_open (I don't think the latter makes sense).

May I have your thoughts?


BTW, have you missed the patch "usb: Use usb_bus_find(-1) instead of
usb_enabled() in usb_device_add/usb_device_del", which was sent on June
4th? May I have your thoughts about that patch as well?



Thanks,
Lin
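
To illustrate the ordering proposed in this mail, here is a toy model of
"check first, detach second". Nothing in it is QEMU or libusb API: the
struct, the function name and the speed-mask convention are all
hypothetical and only show why the check should come before the detach.

```c
#include <stdbool.h>

/* Hypothetical model of a host device; not QEMU or libusb API. */
struct host_dev {
    int  speed;             /* speed index reported by the device */
    bool kernel_attached;   /* true while a host kernel driver owns it */
};

/*
 * Check compatibility before touching the kernel driver.
 * Returns 0 on success, -1 on a speed mismatch.  On mismatch the device
 * is left exactly as it was, so the host can keep using it.
 */
int open_check_first(struct host_dev *d, int port_speedmask)
{
    if (!(port_speedmask & (1 << d->speed))) {
        return -1;                  /* kernel driver is still attached */
    }
    d->kernel_attached = false;     /* detach only once attach can succeed */
    return 0;
}
```

With the current order (detach, then check), the failure path would have to
undo the detach explicitly, which is the second option mentioned in the
mail.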

Re: [Qemu-devel] [PULL 00/42] pc, acpi, virtio

2015-06-11 Thread Peter Maydell

On 11 June 2015 at 12:57, Michael S. Tsirkin  wrote:
> The following changes since commit 309750fad51f17d1ec6195c5d8ad7d741596ddb6:
>
>   vhost: logs sharing (2015-06-04 12:44:49 +0200)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>
> for you to fetch changes up to 4ebc736e9938a7e88ecc785734b17145bf802a56:
>
>   i386/acpi-build: fix PXB workarounds for unsupported BIOSes (2015-06-11 
> 12:40:30 +0200)
>
> 
> pc, acpi, virtio
>
> Most notably this includes virtio 1 patches
> Still not all devices converted, and not fully spec compliant,
> so disabled by default.
>
> Signed-off-by: Michael S. Tsirkin 
>

Applied, thanks.

-- PMM

Re: [Qemu-devel] [vhost] virtio (guest) interrupt kernel modules

2015-06-11 Thread Catalin Vasile

It seems that the problem is in the code itself.
If I call vhost_dev_start() in set_status() (in qemu) when the status
changes to VIRTIO_CONFIG_S_DRIVER_OK, only the communication path from
guest to vhost works.

In another scenario I send a message from guest to qemu, and the vq
handler in qemu calls vhost_dev_start(). From that point on,
communication works both ways between guest and vhost (I can also
receive signals from vhost to guest).

I used the first scenario because I took vhost-scsi as an example. The
thing is, I don't understand what I'm doing differently.

On Thu, Jun 11, 2015 at 5:26 PM, Catalin Vasile
 wrote:
> Is there something I need to install (kernel module or something like
> that) so that the virtio guest receives signals from vhost?
> The communication from and to qemu works (I can send a message from
> guest to qemu, and I can send a message from qemu to guest), and I can
> send a message from guest to vhost, but I don't get any signals when I
> call vhost_add_used_and_signal().

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Laszlo Ersek

On 06/11/15 16:36, Marcel Apfelbaum wrote:
> On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
>> On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
>>> On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
>>>> On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
>>>>> The fixes solves the following issue:
>>>>> The PXB device exposes a new  pci root bridge with the
>>>>> fw path:  /pci-root@4/..., in which 4 is the root bus number.
>>>>> Before this patch the fw path was wrongly computed:
>>>>>  /pci-root@1/pci@i0cf8/...
>>>>> Fix the above issues: Correct the bus number and remove the
>>>>> extra host bridge description.
>>>>
>>>> Why is that wrong?  The previous path looks correct to me.
>>> The prev path includes both the extra root bridge and *then* the
>>> usual host bridge.
>>>   /pci-root@1/pci@i0cf8/   ...
>>>  ^ new   ^ regular  ^ devices
>>>
>>> Since the new pci root bridge (and bus) is in "parallel" with the
>>> regular one,
>>> it is not correct to add it to the path.
>>>
>>> The architecture is:
>>>   //devices...
>>>   /extra root bridge/devices...
>>>   /extra root bridge/devices...
>>> And not
>>> /extra root bridge///devices
>>
>> Your patch changed both the "/extra root bridge/devices..." part and
>> the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
>> correct IMO.
> Why? @1 should be the unit address which is the text representation
> of the physical address, in our case the slot. Since the bus number
> in our case is 4, I think /pci-root@4/ is the 'correct' address.
>>
>> Does open-firmware have any examples for PCI paths and in particular
>> PCI paths when there are multiple root-buses?
> Maybe Laszlo can say more, but we both agreed that this would be the
> best representation of extra root buses on both OVMF and Seabios.

The

  PCI Bus Binding to:
  IEEE Std 1275-1994 Standard for Boot
  (Initialization Configuration) Firmware

document (binding) does speak about this, as far as I can see, in

  2.2.1. Physical Address Formats

It first gives a "Numerical Representation" in device tree format (same
thing as in DTB / FDT), and then a "Text Representation" with references
to "Numerical Representation". It is *completely* Greek to me. It took
me minutes of staring just to vaguely understand how the current

  i0cf8

unit address comes together.

I've always treated the OFW devpaths that QEMU generates only
*syntactically* conformant to the (base) OFW spec, and never considered
the particular bindings 100% binding. That said, if someone finds where
the PCI binding defines unit addresses for *root* buses, please let me
know, just for reference.

>> It's possible to replace the "pci@i0cf8" with "pci-root@1" but that
>> seems odd as the extra root bus is accessible via io accesses to
>> 0x0cf8.
> While this is true, /pci-root@[...]/ may also represent other kinds of host
> bridges, not only PXBs. But we can change this of course, as long as OVMF
> can also work with it.
> 
>>
>> Another option would be to place the pci-root@1 behind the pci@i0cf8
>> as in "/pci@i0cf8/pci-root@1/...".  Or, the root bus could be appended
>> to the host bridge as in "/pci@i0cf8,1/...".
> The last representation makes sense to me, but as "/pci@i0cf8,4/...", with
> the bus number after the comma.
> 
> Laszlo, will this work for OVMF?

With the v3 patchset for QEMU, we could probably easily generate the
"i0cf8,4" unit address inside the PXB device model itself. (Of course
exactly what number should stand after the comma remains a question.)

Parsing it in OVMF is doable, albeit somewhat ugly.
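
To make the parsing question concrete, here is a minimal sketch in C of how a node of the proposed form "pci@i0cf8,4" could be split into its two numbers. It is not code from QEMU, SeaBIOS, or OVMF; the helper name parse_pci_node() and the choice to read the number after the comma as decimal are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not taken from any of the projects discussed here):
 * split a node such as "pci@i0cf8,4" into the hexadecimal I/O port that
 * follows "i" and the decimal number that follows the comma.
 * Returns 0 on success, -1 if the node does not have that shape.
 */
static int parse_pci_node(const char *node, unsigned *ioport, unsigned *bus)
{
    int consumed = 0;

    assert(node && ioport && bus);

    /* %n stores the number of characters consumed so far. */
    if (sscanf(node, "pci@i%x,%u%n", ioport, bus, &consumed) != 2) {
        return -1;
    }

    /* Reject anything left over after the second number. */
    return node[consumed] == '\0' ? 0 : -1;
}
```

A caller would pass one path component at a time, so "pci@i0cf8,4" yields port 0x0cf8 and number 4, while a plain "pci@i0cf8" is reported as not matching and can be handled by the existing code path.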

In any case, I'm not convinced at all why this is a better idea than the
proposal in this patch.

Laszlo

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Marcel Apfelbaum


On 06/11/2015 05:24 PM, Kevin O'Connor wrote:

On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:

On 06/11/2015 04:58 PM, Kevin O'Connor wrote:

On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:

The fix solves the following issue:
The PXB device exposes a new  pci root bridge with the
fw path:  /pci-root@4/..., in which 4 is the root bus number.
Before this patch the fw path was wrongly computed:
 /pci-root@1/pci@i0cf8/...
Fix the above issues: Correct the bus number and remove the
extra host bridge description.


Why is that wrong?  The previous path looks correct to me.

The prev path includes both the extra root bridge and *then* the usual
host bridge.
  /pci-root@1/pci@i0cf8/   ...
 ^ new   ^ regular  ^ devices

Since the new pci root bridge (and bus) is in "parallel" with the regular one,
it is not correct to add it to the path.

The architecture is:
  //devices...
  /extra root bridge/devices...
  /extra root bridge/devices...
And not
/extra root bridge///devices


Your patch changed both the "/extra root bridge/devices..." part and
the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
correct IMO.

Why? @1 should be the unit address which is the text representation
of the physical address, in our case the slot. Since the bus number
in our case is 4, I think /pci-root@4/ is the 'correct' address.


Does open-firmware have any examples for PCI paths and in particular
PCI paths when there are multiple root-buses?

Maybe Laszlo can say more, but we both agreed that this would be the
best representation of extra root buses on both OVMF and SeaBIOS.


It's possible to replace the "pci@i0cf8" with "pci-root@1" but that
seems odd as the extra root bus is accessible via io accesses to
0x0cf8.

While this is true, /pci-root@[...]/ may also represent other kinds of host
bridges, not only PXBs. But we can change this of course, as long as OVMF can
also work with it.



Another option would be to place the pci-root@1 behind the pci@i0cf8
as in "/pci@i0cf8/pci-root@1/...".  Or, the root bus could be appended
to the host bridge as in "/pci@i0cf8,1/...".

The last representation makes sense to me, but as "/pci@i0cf8,4/...", with
the bus number after the comma.

Laszlo, will this work for OVMF?

Thanks,
Marcel



-Kevin

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Laszlo Ersek

On 06/11/15 15:58, Kevin O'Connor wrote:
> On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
>> The fix solves the following issue:
>> The PXB device exposes a new  pci root bridge with the
>> fw path:  /pci-root@4/..., in which 4 is the root bus number.
>> Before this patch the fw path was wrongly computed:
>> /pci-root@1/pci@i0cf8/...
>> Fix the above issues: Correct the bus number and remove the
>> extra host bridge description.
> 
> Why is that wrong?  The previous path looks correct to me.
> 
>> The IEEE Std 1275-1994:
>>
>>   IEEE Standard for Boot (Initialization Configuration)
>> Firmware: Core Requirements and Practices
>>   3.2.1.1 Node names
>>   Each node in the device tree is identified by a node name
>>   using the following notation:
>>   driver-name@unit-address:device-arguments
>>
>>   The driver name field is a sequence of between one and 31
>>   letters [...]. By convention, this name includes the name of
>>   the device’s manufacturer and the device’s model name separated by
>>   a “,”.
>>
>>   The unit address field is the text representation of the
>>   physical address of the device within the address space
>>   defined by its parent node. The form of the text
>>   representation is bus-dependent.
> 
> Note the "physical address" part in the above.  Your patch changes the
> "pci-root@" syntax to use a logical address instead of a physical
> address.  That is, unless I've missed something, SeaBIOS today uses a
> physical address (the n'th root bus) and the patch would change it to
> use a logical address.
> 
> One of the goals of using an "openfirmware" like address was so that
> they would be stable across boots (the same mechanism is also used
> with coreboot).  Using a physical address is key for this, because
> simply adding or removing a PCI device could cause the logical PCI
> bridge enumeration to change - and that would mess up the bootorder
> list if it was based on logical addresses.

There are two questions here. The first is the inclusion of the
"pci@i0cf8" node even if a "pci-root@x" node is present in front of it.
The hunk that changes that is not your main concern, right? (And Marcel
just described that hunk in more detail.)

The other question is how "x" is selected in "pci-root@x".

On the QEMU side, and in OVMF, "x" is keyed off of the bus_nr property.
If you change that property from (say) 3 to 4, then the device paths
exported by QEMU will change. However, the location (in the PCI
hierarchy) of all the affected devices will *also* change at once, and
their auto-enumerated, firmware-side device paths will reflect that.
Therefore the new "bootorder" fw_cfg entries will match the freshly
generated firmware-side device paths.

So why is this not stable? If you change the hardware without
automatically updating any stashed firmware-side device paths, then
things will fall apart without "bootorder" entries in the picture anyway.

Also, assuming you key off "x" of the running counter that counts root
buses as they are found during enumeration, that's a possibility too,
but I don't see how it gives more stability. If you insert a new root
bus (with a device on it) between two preexisting ones, that will offset
all the "x" values for the root buses that come after it by one.
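
The offset effect can be shown with a small sketch in C. It is purely illustrative: serial_of() is a hypothetical helper, and the 1-based numbering of extra root buses in increasing bus number order is an assumption about how such a counter would be defined, not something taken from SeaBIOS.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical: return the 1-based "serial number" of an extra root bus,
 * i.e. its position in a list of extra root bus numbers sorted in
 * increasing order. Returns -1 if the bus number is not in the list.
 */
static int serial_of(const unsigned *extra_bus_nrs, size_t count,
                     unsigned bus_nr)
{
    assert(extra_bus_nrs != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (extra_bus_nrs[i] == bus_nr) {
            return (int)i + 1;
        }
    }
    return -1;
}
```

With extra root buses { 4, 8 }, bus 8 has serial number 2. After a root bus 6 is inserted, the list is { 4, 6, 8 } and the same bus 8 has serial number 3, although its bus number did not change.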

In UEFI at least (I'm not speaking about OVMF in particular, but the
UEFI spec), there is a "short-form device path" concept for hard drive
and USB boot options. For hard disks, it is practically a relative
device path that lacks the path fragment from the root node until just
before the GPT partition identifier. The idea being, if you plug your
SCSI controller in another PCI slot, the change in the full device path
will be local to the path fragment that is not captured in the
(persistent) boot option. The GPT GUID can identify the partition
uniquely in the system wherever it exists, so it can be booted even
without fully enumerating all devices and reproducing all the default
boot options.

Short of such a "uniquely identifying relative devpath" trick, I don't
think stability in firmware-stashed (ie. not regenerated) device paths
exists in general, if the underlying hardware configuration is changed.

In summary: I think we could modify both QEMU and OVMF to use the
"serial numbers" of the extra PCI root buses, in increasing bus number
order, instead of their actual bus numbers, for identifying them. That's
just a convention. Then the second hunk of this patch would not be
necessary for SeaBIOS. But I think this convention would be only less
logical, and not more stable.

Can you please elaborate? I'm confused.

Thanks
Laszlo

Re: [Qemu-devel] [PULL 0/1] sdl patch queue

2015-06-11 Thread Peter Maydell

On 11 June 2015 at 09:37, Gerd Hoffmann  wrote:
>   Hi,
>
> Single fix patch queue for sdl.
>
> please pull,
>   Gerd
>
> The following changes since commit ee09f84e6bf5383a23c9624115c26b72aa1e076c:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging (2015-06-08 15:57:41 +0100)
>
> are available in the git repository at:
>
>
>   git://git.kraxel.org/qemu tags/pull-sdl-20150611-1
>
> for you to fetch changes up to 08d49df0dbaacc220a099dbfb644e1dc0eda57be:
>
>   sdl2: fix crash in handle_windowevent() when restoring the screen size (2015-06-09 10:25:21 +0200)
>
> 
> sdl2: fix crash in handle_windowevent() when restoring the screen size
>
> 
> Alberto Garcia (1):
>   sdl2: fix crash in handle_windowevent() when restoring the screen size

Applied, thanks.

-- PMM

[Qemu-devel] [vhost] virtio (guest) interrupt kernel modules

2015-06-11 Thread Catalin Vasile

Is there something I need to install (kernel module or something like
that) so that the virtio guest receives signals from vhost?
The communication from and to qemu works (I can send a message from
guest to qemu, and I can send a message from qemu to guest), and I can
send a message from guest to vhost, but I don't get any signals when I
call vhost_add_used_and_signal().

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Kevin O'Connor

On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
> On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
> >On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
> >>The fix solves the following issue:
> >>The PXB device exposes a new  pci root bridge with the
> >>fw path:  /pci-root@4/..., in which 4 is the root bus number.
> >>Before this patch the fw path was wrongly computed:
> >> /pci-root@1/pci@i0cf8/...
> >>Fix the above issues: Correct the bus number and remove the
> >>extra host bridge description.
> >
> >Why is that wrong?  The previous path looks correct to me.
> The prev path includes both the extra root bridge and *then* the usual
> host bridge.
>  /pci-root@1/pci@i0cf8/   ...
> ^ new   ^ regular  ^ devices
> 
> Since the new pci root bridge (and bus) is in "parallel" with the regular one,
> it is not correct to add it to the path.
> 
> The architecture is:
>  //devices...
>  /extra root bridge/devices...
>  /extra root bridge/devices...
> And not
> /extra root bridge///devices

Your patch changed both the "/extra root bridge/devices..." part and
the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
correct IMO.

Does open-firmware have any examples for PCI paths and in particular
PCI paths when there are multiple root-buses?

It's possible to replace the "pci@i0cf8" with "pci-root@1" but that
seems odd as the extra root bus is accessible via io accesses to
0x0cf8.

Another option would be to place the pci-root@1 behind the pci@i0cf8
as in "/pci@i0cf8/pci-root@1/...".  Or, the root bus could be appended
to the host bridge as in "/pci@i0cf8,1/...".

-Kevin

Re: [Qemu-devel] [PATCH] ui/cocoa.m: Give laptop users ability to scroll in monitor

2015-06-11 Thread Peter Maydell

On 11 May 2015 at 07:53, Gerd Hoffmann  wrote:
> On So, 2015-05-10 at 23:51 +0100, Peter Maydell wrote:
>> So looking at the code in ui/console.c that implements our
>> virtual consoles, the scrolling is hooked up to the keycodes
>> QEMU_KEY_CTRL_{UP,DOWN,PAGEUP,PAGEDOWN}. These only seem
>> to be output by one of our UI frontends, SDL.
>>
>> Gerd, how is this supposed to work? Shouldn't something
>> in the generic console code be handling converting the
>> Q_KEY_CODE_CTRL/CTRL_R + Q_KEY_CODE_PGUP/DOWN/etc into
>> what the vc layer expects, rather than having each of the
>> ui frontends doing it?
>
> Unfortunately it isn't that easy as we have two very different modes of
> operation here:  For vc's we need the keyboard input already mapped to
> your local keyboard layout (i.e. the keysyms).  For guest input we need
> the raw scancodes of the keys as the keyboard layout handling is done by
> the guest.  The differences between the UIs (especially when it comes to
> raw scancodes) are big enough that it is next to impossible to hide all
> that in common code.
>
> Specifically for the vc control keys there is a little helper function
> though: kbd_put_qcode_console(), used by sdl2 and gtk, which might be
> useful for cocoa too.

I was just looking back at this, and I'm confused --
kbd_put_qcode_console() doesn't seem to do anything with the
QEMU_KEY_CTRL_* keycodes... And indeed ctrl-pageup/pagedown/up/down
don't work in the GTK UI, though they do work in the SDL1 UI.

-- PMM

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Marcel Apfelbaum


On 06/11/2015 04:58 PM, Kevin O'Connor wrote:

On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:

The fix solves the following issue:
The PXB device exposes a new  pci root bridge with the
fw path:  /pci-root@4/..., in which 4 is the root bus number.
Before this patch the fw path was wrongly computed:
 /pci-root@1/pci@i0cf8/...
Fix the above issues: Correct the bus number and remove the
extra host bridge description.


Why is that wrong?  The previous path looks correct to me.

The prev path includes both the extra root bridge and *then* the usual
host bridge.
 /pci-root@1/pci@i0cf8/   ...
^ new   ^ regular  ^ devices

Since the new pci root bridge (and bus) is in "parallel" with the regular one,
it is not correct to add it to the path.

The architecture is:
 //devices...
 /extra root bridge/devices...
 /extra root bridge/devices...
And not
/extra root bridge///devices

Thanks,
Marcel






The IEEE Std 1275-1994:

   IEEE Standard for Boot (Initialization Configuration)
 Firmware: Core Requirements and Practices
   3.2.1.1 Node names
   Each node in the device tree is identified by a node name
   using the following notation:
   driver-name@unit-address:device-arguments

   The driver name field is a sequence of between one and 31
   letters [...]. By convention, this name includes the name of
   the device’s manufacturer and the device’s model name separated by
   a “,”.

   The unit address field is the text representation of the
   physical address of the device within the address space
   defined by its parent node. The form of the text
   representation is bus-dependent.


Note the "physical address" part in the above.  Your patch changes the
"pci-root@" syntax to use a logical address instead of a physical
address.  That is, unless I've missed something, SeaBIOS today uses a
physical address (the n'th root bus) and the patch would change it to
use a logical address.

One of the goals of using an "openfirmware" like address was so that
they would be stable across boots (the same mechanism is also used
with coreboot).  Using a physical address is key for this, because
simply adding or removing a PCI device could cause the logical PCI
bridge enumeration to change - and that would mess up the bootorder
list if it was based on logical addresses.

-Kevin

Re: [Qemu-devel] [PATCH v2] hw/vfio/platform: replace g_malloc0_n by g_new0

2015-06-11 Thread Peter Maydell

On 11 June 2015 at 09:44, Eric Auger  wrote:
> g_malloc0_n() is introduced since glib-2.24 while QEMU currently
> requires glib-2.22. This may cause a link error on some distributions.
>
> Signed-off-by: Eric Auger 
>
> ---
>
> v1 -> v2:
> - replace g_malloc0 by g_new0
> ---
>  hw/vfio/platform.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index 35266a8..9382bb7 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -346,8 +346,7 @@ static int vfio_populate_device(VFIODevice *vbasedev)
>  return ret;
>  }
>
> -vdev->regions = g_malloc0_n(vbasedev->num_regions,
> -sizeof(VFIORegion *));
> +vdev->regions = g_new0(VFIORegion *, vbasedev->num_regions);
>
>  for (i = 0; i < vbasedev->num_regions; i++) {
>  struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
> --
> 1.8.3.2

This looks like the right fix to me -- if somebody would like to
give it a reviewed-by tag I can apply it to master as a buildfix...

thanks
-- PMM
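
For readers unfamiliar with the idiom, the sketch below shows what a typed, zero-initialised array allocation in the style of g_new0(type, count) looks like. It is a stand-in built on calloc() so that it compiles without glib; SKETCH_NEW0, SketchRegion, and alloc_region_table() are hypothetical names, and unlike glib's allocator this version does not abort by itself when memory runs out.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for illustration only; this is not glib's implementation. */
#define SKETCH_NEW0(type, count) ((type *)calloc((count), sizeof(type)))

/* Hypothetical opaque type standing in for VFIORegion. */
typedef struct SketchRegion SketchRegion;

/* Allocate a zero-filled table of num_regions pointers. */
static SketchRegion **alloc_region_table(size_t num_regions)
{
    SketchRegion **regions = SKETCH_NEW0(SketchRegion *, num_regions);

    /* calloc() may legitimately return NULL for a zero-sized request. */
    assert(regions != NULL || num_regions == 0);
    return regions;
}
```

The element type appears once, as the first macro argument, and the result is already cast to a pointer to that type, so the size computation and the assignment cannot drift apart.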

[Qemu-devel] [PATCH 1/2] Constify some variable

2015-06-11 Thread Frediano Ziglio

Signed-off-by: Frediano Ziglio 
---
 hw/display/qxl-logger.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/display/qxl-logger.c b/hw/display/qxl-logger.c
index c900c2c..d944d3f 100644
--- a/hw/display/qxl-logger.c
+++ b/hw/display/qxl-logger.c
@@ -22,7 +22,7 @@
 #include "qemu/timer.h"
 #include "qxl.h"
 
-static const char *qxl_type[] = {
+static const char *const qxl_type[] = {
 [ QXL_CMD_NOP ] = "nop",
 [ QXL_CMD_DRAW ]= "draw",
 [ QXL_CMD_UPDATE ]  = "update",
@@ -31,7 +31,7 @@ static const char *qxl_type[] = {
 [ QXL_CMD_SURFACE ] = "surface",
 };
 
-static const char *qxl_draw_type[] = {
+static const char *const qxl_draw_type[] = {
 [ QXL_DRAW_NOP ] = "nop",
 [ QXL_DRAW_FILL] = "fill",
 [ QXL_DRAW_OPAQUE  ] = "opaque",
@@ -48,7 +48,7 @@ static const char *qxl_draw_type[] = {
 [ QXL_DRAW_ALPHA_BLEND ] = "alpha-blend",
 };
 
-static const char *qxl_draw_effect[] = {
+static const char *const qxl_draw_effect[] = {
 [ QXL_EFFECT_BLEND] = "blend",
 [ QXL_EFFECT_OPAQUE   ] = "opaque",
 [ QXL_EFFECT_REVERT_ON_DUP] = "revert-on-dup",
@@ -59,12 +59,12 @@ static const char *qxl_draw_effect[] = {
 [ QXL_EFFECT_OPAQUE_BRUSH ] = "opaque-brush",
 };
 
-static const char *qxl_surface_cmd[] = {
+static const char *const qxl_surface_cmd[] = {
[ QXL_SURFACE_CMD_CREATE  ] = "create",
[ QXL_SURFACE_CMD_DESTROY ] = "destroy",
 };
 
-static const char *spice_surface_fmt[] = {
+static const char *const spice_surface_fmt[] = {
[ SPICE_SURFACE_FMT_INVALID  ] = "invalid",
[ SPICE_SURFACE_FMT_1_A  ] = "alpha/1",
[ SPICE_SURFACE_FMT_8_A  ] = "alpha/8",
@@ -74,14 +74,14 @@ static const char *spice_surface_fmt[] = {
[ SPICE_SURFACE_FMT_32_ARGB  ] = "ARGB/32",
 };
 
-static const char *qxl_cursor_cmd[] = {
+static const char *const qxl_cursor_cmd[] = {
[ QXL_CURSOR_SET   ] = "set",
[ QXL_CURSOR_MOVE  ] = "move",
[ QXL_CURSOR_HIDE  ] = "hide",
[ QXL_CURSOR_TRAIL ] = "trail",
 };
 
-static const char *spice_cursor_type[] = {
+static const char *const spice_cursor_type[] = {
[ SPICE_CURSOR_TYPE_ALPHA   ] = "alpha",
[ SPICE_CURSOR_TYPE_MONO] = "mono",
[ SPICE_CURSOR_TYPE_COLOR4  ] = "color4",
@@ -91,7 +91,7 @@ static const char *spice_cursor_type[] = {
[ SPICE_CURSOR_TYPE_COLOR32 ] = "color32",
 };
 
-static const char *qxl_v2n(const char *n[], size_t l, int v)
+static const char *qxl_v2n(const char *const n[], size_t l, int v)
 {
 if (v >= l || !n[v]) {
 return "???";
-- 
2.1.0

Re: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses.

2015-06-11 Thread Kevin O'Connor

On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
> The fix solves the following issue:
> The PXB device exposes a new  pci root bridge with the
> fw path:  /pci-root@4/..., in which 4 is the root bus number.
> Before this patch the fw path was wrongly computed:
> /pci-root@1/pci@i0cf8/...
> Fix the above issues: Correct the bus number and remove the
> extra host bridge description.

Why is that wrong?  The previous path looks correct to me.

> The IEEE Std 1275-1994:
> 
>   IEEE Standard for Boot (Initialization Configuration)
> Firmware: Core Requirements and Practices
>   3.2.1.1 Node names
>   Each node in the device tree is identified by a node name
>   using the following notation:
>   driver-name@unit-address:device-arguments
> 
>   The driver name field is a sequence of between one and 31
>   letters [...]. By convention, this name includes the name of
>   the device’s manufacturer and the device’s model name separated by
>   a “,”.
> 
>   The unit address field is the text representation of the
>   physical address of the device within the address space
>   defined by its parent node. The form of the text
>   representation is bus-dependent.

Note the "physical address" part in the above.  Your patch changes the
"pci-root@" syntax to use a logical address instead of a physical
address.  That is, unless I've missed something, SeaBIOS today uses a
physical address (the n'th root bus) and the patch would change it to
use a logical address.

One of the goals of using an "openfirmware" like address was so that
they would be stable across boots (the same mechanism is also used
with coreboot).  Using a physical address is key for this, because
simply adding or removing a PCI device could cause the logical PCI
bridge enumeration to change - and that would mess up the bootorder
list if it was based on logical addresses.

-Kevin
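
For reference, the node name notation quoted above, driver-name@unit-address:device-arguments, can be split with a few lines of C. This is only a sketch: the struct, the function split_node_name(), and the 31-character limit applied to the unit address and arguments fields are assumptions for illustration (the quoted text states a length limit only for the driver name).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the three fields of one node name. */
struct sketch_node_name {
    char driver[32];   /* one to 31 characters, per the quoted text */
    char unit[32];     /* empty if there is no "@..." part */
    char args[32];     /* empty if there is no ":..." part */
};

/* Split "driver@unit:args"; returns 0 on success, -1 on malformed input. */
static int split_node_name(const char *node, struct sketch_node_name *out)
{
    size_t len;
    const char *p;

    assert(node && out);

    /* The driver name runs up to the first '@' or ':'. */
    len = strcspn(node, "@:");
    if (len == 0 || len >= sizeof(out->driver)) {
        return -1;
    }
    memcpy(out->driver, node, len);
    out->driver[len] = '\0';
    out->unit[0] = '\0';
    out->args[0] = '\0';

    p = node + len;
    if (*p == '@') {
        /* The unit address runs up to the next ':' or the end. */
        p++;
        len = strcspn(p, ":");
        if (len >= sizeof(out->unit)) {
            return -1;
        }
        memcpy(out->unit, p, len);
        out->unit[len] = '\0';
        p += len;
    }
    if (*p == ':') {
        /* Everything after ':' is the device arguments field. */
        p++;
        len = strlen(p);
        if (len >= sizeof(out->args)) {
            return -1;
        }
        memcpy(out->args, p, len + 1);
    }
    return 0;
}
```

Applied to the components discussed in this thread, "pci-root@4" splits into driver "pci-root" and unit address "4", and "pci@i0cf8" into driver "pci" and unit address "i0cf8". How the unit address itself is interpreted is bus-dependent and outside the scope of this sketch.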
