Re: [Qemu-devel] [PATCH] Add warmup phase for live migration of large memory apps

2011-05-12 Thread Yoshiaki Tamura
2011/5/12 Isaku Yamahata yamah...@valinux.co.jp:
 On Thu, May 12, 2011 at 12:39:22PM +0200, Juan Quintela wrote:
 Shribman, Aidan aidan.shrib...@sap.com wrote:
  On Wed, May 11, 2011 at 8:58 AM, Shribman, Aidan
  aidan.shrib...@sap.com wrote:
   From: Aidan Shribman aidan.shrib...@sap.com
  
   [PATCH] Add warmup phase for live migration of large memory apps
  
   By invoking migrate -w url we initiate a background
  live-migration
   transferring of dirty pages continuously until invocation
  of migrate_end
   which attempts to complete the live migration operation.
 
  What is the purpose of this patch?  How and when do I use it?
 
 
  The warmup patch adds non-converging background update of guest
  memory during live-migration such that on request of live-migration
  completion (via migrate_end command) we get much faster
  response. This is especially needed when running a payload of large
  enterprise applications which have high memory demands.

 We should integrate this with Kemari (Kemari is doing something like
 this, just that it has more requirements).  Isaku, do you have any comments?

 Yochi and Kei are familiar with Kemari. Not me. Cced to them.

I think it's OK to implement this feature by checking max_downtime ==
0.  But I'm concerned that if users type commands like:

migrate_set_downtime 0
migrate url # w/o -d

it'll lock the monitor forever in most cases.  So forcing users to
set -d, or enabling it automatically when max_downtime == 0,
seems better to me.  Sorry if I'm missing the point...
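
The second option (detaching automatically when max_downtime == 0) could be sketched roughly as below. This is only an illustration: effective_detach() and its parameters are hypothetical names, not functions from the QEMU tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decide whether the monitor command should detach.
 * max_downtime_ns == 0 means the migration never tries to complete on its
 * own, so a foreground "migrate" would block the monitor indefinitely. */
static bool effective_detach(uint64_t max_downtime_ns, bool detach_requested)
{
    if (max_downtime_ns == 0) {
        return true;    /* force background mode, as if -d had been given */
    }
    return detach_requested;
}
```

With this shape, "migrate url" after "migrate_set_downtime 0" would behave like "migrate -d url" instead of hanging the monitor.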

Yoshi




 BTW, what loads have you tested for this?

 if I setup an image with 1GB RAM and a DVD iso image, and do in the
 guest:

 while true; do find /media/cdrom -type f | xargs md5sum; done

 Migration never converges with current code (if you use more than 1GB
 memory, then all the DVD will be cached inside).

 So, I see this as useful only for guests that are almost idle, and in that
 case, migration speed is not the biggest of your problems, no?
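
 The convergence argument can be put into numbers with a deliberately simplified model. This is an assumption for illustration, not QEMU's actual algorithm: each pass sends the currently dirty bytes at a fixed bandwidth while the guest keeps dirtying memory at a fixed rate. passes_to_converge() and all of its parameters are hypothetical.

```c
/* Simplified model (an assumption, not QEMU's algorithm): every pass sends
 * the currently dirty bytes at `bw` bytes/s while the guest dirties memory
 * at `dirty_rate` bytes/s during that pass.
 * Returns the number of passes until the remaining dirty data fits into
 * `max_downtime_s` seconds of transfer, or -1 if that does not happen
 * within `max_passes`. */
static int passes_to_converge(double ram_bytes, double bw, double dirty_rate,
                              double max_downtime_s, int max_passes)
{
    double remaining = ram_bytes;

    for (int pass = 0; pass < max_passes; pass++) {
        if (remaining <= bw * max_downtime_s) {
            return pass;
        }
        /* time spent sending this pass, during which new pages get dirty */
        double t = remaining / bw;
        remaining = dirty_rate * t;
        if (remaining > ram_bytes) {
            remaining = ram_bytes;  /* cannot dirty more than all of RAM */
        }
    }
    return -1;
}
```

 In this model a guest that dirties memory faster than the link can carry it never converges, and neither does any guest once max_downtime is 0, which is the warmup case discussed in this thread.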

 Later, Juan.


 --
 yamahata




Re: [Qemu-devel] [PATCH 12/18] Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c.

2011-04-27 Thread Yoshiaki Tamura
On Apr 26, 2011, at 11:51 PM, Jan Kiszka jan.kis...@siemens.com wrote:

 On 2011-04-26 16:24, 大村 圭 wrote:
 
 2011/4/25 Jan Kiszka jan.kis...@web.de:
 On 2011-04-25 13:00, OHMURA Kei wrote:
 From: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 
 Record mmio write event to replay it upon failover.
 
 Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
 ---
 exec.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)
 
 diff --git a/exec.c b/exec.c
 index c3dc68a..3c3cece 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -33,6 +33,7 @@
 #include "osdep.h"
 #include "kvm.h"
 #include "qemu-timer.h"
 +#include "event-tap.h"
 #if defined(CONFIG_USER_ONLY)
 #include <qemu.h>
 #include <signal.h>
 @@ -3736,6 +3737,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, 
 uint8_t *buf,
io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
if (p)
addr1 = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
 +
 +event_tap_mmio(addr, buf, len);
 +
 
 You know that this is incomplete? A few devices are calling st*_phys
 directly, specifically virtio.
 
 What kind of mmio should be traced here, device or CPU originated? Or both?
 
 Jan
 
 
 
 To let Kemari replay outputs upon failover, tracing CPU originated
 mmio (specifically write requests) should be enough.
 IIUC, we can reproduce device originated mmio as a result of cpu
 originated mmio.
 
 
 OK, I see.
 
 But this tap will only work for KVM. I think you either have to catch
 the other paths that TCG could take as well or maybe better move the
 hook into kvm-all - then it's absolutely clear that this is no generic
 feature.

Hi Jan,

Indeed Kemari is for KVM, so moving to kvm-all.c seems to be reasonable.
However, I would like to keep this feature generic rather than locking it
into KVM only.

Could you describe the difference between KVM and TCG in processing mmio, so 
that we can see the issue?

Yoshi

 
 Jan
 
 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux




Re: [Qemu-devel] [PATCH 12/18] Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c.

2011-04-27 Thread Yoshiaki Tamura
On Apr 27, 2011, at 2:51 PM, Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp 
wrote:

 
 What kind of mmio should be traced here, device or CPU originated? Or both?
 
 Jan
 
 
 
 To let Kemari replay outputs upon failover, tracing CPU originated
 mmio (specifically write requests) should be enough.
 IIUC, we can reproduce device originated mmio as a result of cpu
 originated mmio.
 
 
 Sorry, but I don't understand why it is safe yet.
 
 The problem is not if the mmio's are to be replayed but if replaying
 them will produce the same result, is it?

No.  That's the functionality of event-tap queuing.
The mmio tap is for recording which CPU originated mmio resulted in I/O 
monitored at event-tap queuing.

We expect the replayed result to be the same as on the primary, but we don't
have to guarantee that while it's queued.

Thanks,

Yoshi

 
 In other words, is it really idempotent?
 
 Takuya
 
 
 OK, I see.
 
 But this tap will only work for KVM. I think you either have to catch
 the other paths that TCG could take as well or maybe better move the
 hook into kvm-all - then it's absolutely clear that this is no generic
 feature.
 
 Jan
 




[Qemu-devel] Re: [PATCH v2 2/2] rbd: allow configuration of rados from the rbd filename

2011-04-07 Thread Yoshiaki Tamura
2011/4/7 Stefan Hajnoczi stefa...@gmail.com:
 On Thu, Apr 07, 2011 at 10:14:03AM +0900, Yoshiaki Tamura wrote:
 2011/3/29 Josh Durgin josh.dur...@dreamhost.com:
  The new format is 
  rbd:pool/image[@snapshot][:option1=value1[:option2=value2...]]
  Each option is used to configure rados, and may be any Ceph option, or 
  conf.
  The conf option specifies a Ceph configuration file to read.
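
  To make the filename format concrete, here is a minimal parsing sketch. It is not the code from block/rbd.c: RbdSpec, copy_part() and rbd_parse() are hypothetical names, the 64-byte name limit is arbitrary, and the sketch assumes that no component contains an escaped ':' or '@'.

```c
#include <stddef.h>
#include <string.h>

#define RBD_MAX_NAME 64

/* Hypothetical result type for this sketch; block/rbd.c uses its own. */
typedef struct {
    char pool[RBD_MAX_NAME];
    char image[RBD_MAX_NAME];
    char snap[RBD_MAX_NAME];    /* empty string when no @snapshot is given */
    const char *options;        /* points at "opt=val:..." or NULL */
} RbdSpec;

/* Copy the n bytes at src into dst as a C string; fail if empty or too long. */
static int copy_part(char *dst, const char *src, size_t n)
{
    if (n == 0 || n >= RBD_MAX_NAME) {
        return -1;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Parse rbd:pool/image[@snapshot][:option1=value1[:option2=value2...]] */
static int rbd_parse(const char *filename, RbdSpec *out)
{
    const char *p, *slash, *end, *at;

    if (strncmp(filename, "rbd:", 4) != 0) {
        return -1;
    }
    p = filename + 4;

    /* options start at the first ':' after the prefix */
    end = strchr(p, ':');
    out->options = end ? end + 1 : NULL;
    if (!end) {
        end = p + strlen(p);
    }

    slash = memchr(p, '/', (size_t)(end - p));
    if (!slash || copy_part(out->pool, p, (size_t)(slash - p)) < 0) {
        return -1;
    }

    p = slash + 1;
    at = memchr(p, '@', (size_t)(end - p));
    out->snap[0] = '\0';
    if (at) {
        if (copy_part(out->snap, at + 1, (size_t)(end - at - 1)) < 0) {
            return -1;
        }
        end = at;
    }
    return copy_part(out->image, p, (size_t)(end - p));
}
```

  For example, "rbd:data/vm1@snap1:conf=/etc/ceph/ceph.conf" (a made-up name and path) would yield pool "data", image "vm1", snapshot "snap1", and the option string "conf=/etc/ceph/ceph.conf" left for the rados configuration step.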
 
  This allows rbd volumes from more than one Ceph cluster to be used by
  specifying different monitor addresses, as well as having different
  logging levels or locations for different volumes.
 
  Signed-off-by: Josh Durgin josh.dur...@dreamhost.com
  ---
   block/rbd.c |  119 
  ++
   1 files changed, 102 insertions(+), 17 deletions(-)
 
  diff --git a/block/rbd.c b/block/rbd.c
  index cb76dd3..bc3323d 100644
  --- a/block/rbd.c
  +++ b/block/rbd.c
  @@ -22,13 +22,17 @@
   /*
   * When specifying the image filename use:
   *
  - * rbd:poolname/devicename
  + * 
  rbd:poolname/devicename[@snapshotname][:option1=value1[:option2=value2...]]

 I'm not sure IIUC, but currently this @snapshotname seems to be
 meaningless; it doesn't allow you to boot from a snapshot because it's
 read only.  Am I misunderstanding or tested incorrectly?

 Read-only block devices are supported by QEMU and can be useful.

I agree.  My expectation was that @snapshotname was introduced to provide
writable snapshots.

Yoshi


 Stefan




[Qemu-devel] Re: [PATCH v2 2/2] rbd: allow configuration of rados from the rbd filename

2011-04-07 Thread Yoshiaki Tamura
2011/4/8 Yehuda Sadeh Weinraub yehud...@gmail.com:
 On Thu, Apr 7, 2011 at 2:54 AM, Yoshiaki Tamura
 tamura.yoshi...@gmail.com wrote:
 2011/4/7 Stefan Hajnoczi stefa...@gmail.com:
 On Thu, Apr 07, 2011 at 10:14:03AM +0900, Yoshiaki Tamura wrote:
 2011/3/29 Josh Durgin josh.dur...@dreamhost.com:
  The new format is 
  rbd:pool/image[@snapshot][:option1=value1[:option2=value2...]]
  Each option is used to configure rados, and may be any Ceph option, or 
  conf.
  The conf option specifies a Ceph configuration file to read.
 
  This allows rbd volumes from more than one Ceph cluster to be used by
  specifying different monitor addresses, as well as having different
  logging levels or locations for different volumes.
 
  Signed-off-by: Josh Durgin josh.dur...@dreamhost.com
  ---
   block/rbd.c |  119 
  ++
   1 files changed, 102 insertions(+), 17 deletions(-)
 
  diff --git a/block/rbd.c b/block/rbd.c
  index cb76dd3..bc3323d 100644
  --- a/block/rbd.c
  +++ b/block/rbd.c
  @@ -22,13 +22,17 @@
   /*
   * When specifying the image filename use:
   *
  - * rbd:poolname/devicename
  + * 
  rbd:poolname/devicename[@snapshotname][:option1=value1[:option2=value2...]]

 I'm not sure IIUC, but currently this @snapshotname seems to be
 meaningless; it doesn't allow you to boot from a snapshot because it's
 read only.  Am I misunderstanding or tested incorrectly?

 Read-only block devices are supported by QEMU and can be useful.

 I agree.  My expectation was that @snapshotname is introduced to have
 writable snapshot.

 The RADOS backend doesn't support writable snapshots. However, down
 the rbd roadmap we plan to have layering which in a sense is writable
 snapshots. The whole shift to librbd was done so that introducing such
 new functionality will be transparent and will not require much or any
 changes in the qemu code.

Thanks.  It made things clear :)  I think it's a good move.

Yoshi


 Yehuda




Re: [Qemu-devel] [PATCH v2 00/28] Refactor and cleanup migration code

2011-04-02 Thread Yoshiaki Tamura
2011/2/24 Juan Quintela quint...@redhat.com:
 v2:
 - make Jan^Wcheckpatch.pl happy
 - Yoshiaki Tamura suggestions:
  - include its two patches to clean things
  - MAX_THROTTLE define
  - migration_state enum
 - I removed spurious differences between migration-{tcp,unix}
 - better error propagation, after this patch:
   migrate -d tcp:name_don_exist:port
   migrate -d tcp:name:port_dont_exist
   migrate -d exec: prog_dont_exist
   migrate -d exec: gzip > /path/dont/exist
  fail as expected.  Last two used to enter an infinite loop.

 Later, Juan.

 v1:
 This series:
 - Fold MigrationState into FdMigrationState (and then rename)
 - Factorize migration state creation in a single place
 - Make use of MIG_STATE_*, setup through helpers and make them local
 - remove release & cancel callbacks (they were used only once, in the
  same file where they were defined)
 - get_status() is no more, just access .state directly
 - clean up current_migration use, and make the variable static
 - max_throttle is gone, now inside current_migration
 - change get_migration_status() to migration_has_finished()
  and update its single user.

 Please review.

 Later, Juan.

 Juan Quintela (26):
  migration: Make *start_outgoing_migration return FdMigrationState
  migration: Use FdMigrationState instead of MigrationState when
    possible
  migration: Fold MigrationState into FdMigrationState
  migration: Rename FdMigrationState MigrationState
  migration: Refactor MigrationState creation
  migration: Make all posible migration functions static
  migration: move migrate_create_state to do_migrate
  migration: Check that migration is active before cancel it
  migration: Introduce MIG_STATE_NONE
  migration: Refactor and simplify error checking in
    migrate_fd_put_ready
  migration: Introduce migrate_fd_completed() for consistenncy
  migration: Our release callback was just free
  migration: Remove get_status() accessor
  migration: Remove migration cancel() callback
  migration: Move exported functions to the end of the file
  migration: use global variable directly
  migration: another case of global variable assigned to local one
  migration: convert current_migration from pointer to struct
  migration: Use bandwidth_limit directly
  migration: Export a function that tells if the migration has finished
    correctly
  migration: Make state definitions local
  migration: Don't use callback on file defining it
  migration: propagate error correctly
  migration: qemu_savevm_iterate has three return values
  migration: If there is one error, it makes no sense to continue
  migration: make migration-{tcp,unix} consistent

 Yoshiaki Tamura (2):
  savevm: avoid qemu_savevm_state_iteate() to return 1 when qemu file
    has error.
  migration: add error handling to migrate_fd_put_notify().

  buffered_file.c  |    2 +-
  migration-exec.c |   39 +
  migration-fd.c   |   42 ++-
  migration-tcp.c  |   76 --
  migration-unix.c |  112 ++-
  migration.c      |  407 
 ++
  migration.h      |   85 ++--
  savevm.c         |    7 +-
  ui/spice-core.c  |    4 +-
  9 files changed, 307 insertions(+), 467 deletions(-)

 --
 1.7.4




Acked-by: Yoshiaki Tamura tamura.yoshi...@gmail.com



Re: Supsend/resume regression in c995b4 WAS: Re: [Qemu-devel] [PATCH] Fix migration uint8 arrys handled

2011-03-23 Thread Yoshiaki Tamura
2011/3/23 Avi Kivity a...@redhat.com:
 On 03/22/2011 03:26 PM, Anthony Liguori wrote:

 Here's how I propose we tackle this.  This patch adds a -dump-savevm
 option that takes a version.  It spits out all of the fields we save for a
 particular version (well, not really, but it should).  We also can add type
 information.  The idea is that we'd write a simple test case (using gtester)
 that ran through and dumped the schema for each version.  We'd store the
 schema's in the tree and the test can compare old schema's to the current
 schema to check for failure.
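
 The comparison step of such a test could look roughly like the sketch below. It assumes a schema is just an ordered list of field names with their wire sizes; SchemaField and schema_first_mismatch() are hypothetical names, not part of the proposed patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical schema entry: one saved field with its wire size in bytes. */
typedef struct {
    const char *name;
    size_t size;
} SchemaField;

/* Returns the index of the first field where the stored schema and the
 * current one disagree, or -1 when they match exactly. */
static int schema_first_mismatch(const SchemaField *stored, size_t n_stored,
                                 const SchemaField *current, size_t n_current)
{
    size_t n = n_stored < n_current ? n_stored : n_current;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(stored[i].name, current[i].name) != 0 ||
            stored[i].size != current[i].size) {
            return (int)i;
        }
    }
    /* one schema is a strict prefix of the other: mismatch at the tail */
    return n_stored == n_current ? -1 : (int)n;
}
```

 A change in how an array length field is sized, as in the uint8 varray regression this thread is about, would show up as a size mismatch at that field's index.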


 Instead of generating the schema and comparing, what about the other way
 round?  Write vmstate in a formal schema, and generate the code at runtime.

I agree :)

Yoshi


 --
 error compiling committee.c: too many arguments to function






Re: [Qemu-devel] Re: [PATCH 2/9] vmstate: Fix varrays with uint8 indexes

2011-03-22 Thread Yoshiaki Tamura
2011/3/18 Juan Quintela quint...@redhat.com:
 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 Juan, Anthony,

 It seems this patch broke live migration in my environment.  The guest
 hangs after switching to remote.  The following is parameters of the
 guest.

 -L /usr/local/seabios --enable-kvm -M pc -m 512 -smp 1 -monitor stdio
 -localtime -boot c -drive file=/vm/fedora13.img,if=virtio -net
 nic,macaddr=54:52:00:47:a5:a8,model=virtio -net tap -parallel none
 -usb  -vnc :0

 Have you seen similar issues?

 Fix sent to the list, waiting for Anthony to apply it.

 Subject: [PATCH] Fix migration uint8 arrys handled

 Anthony, please apply that one.

This patch (commit:b784421ce4cc860315f4ec31bbc3d67e91984074)
actually fixed the problem, but it got broken again by the merge
after the commit...

Yoshi


 Later, Juan.





[Qemu-devel] [PATCH 04/18] qemu-char: export socket_set_nodelay().

2011-03-22 Thread Yoshiaki Tamura
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 qemu-char.c   |2 +-
 qemu_socket.h |1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 31c9e79..fa16d36 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2111,7 +2111,7 @@ static void tcp_chr_telnet_init(int fd)
 send(fd, (char *)buf, 3, 0);
 }
 
-static void socket_set_nodelay(int fd)
+void socket_set_nodelay(int fd)
 {
 int val = 1;
 setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&val, sizeof(val));
diff --git a/qemu_socket.h b/qemu_socket.h
index 180e4db..a05e1e5 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -36,6 +36,7 @@ int inet_aton(const char *cp, struct in_addr *ia);
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 void socket_set_nonblock(int fd);
+void socket_set_nodelay(int fd);
 int send_all(int fd, const void *buf, int len1);
 
 /* New, ipv6-ready socket helper functions, see qemu-sockets.c */
-- 
1.7.1.2
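
For readers unfamiliar with the option being exported here, the following is a minimal POSIX sketch of setting TCP_NODELAY and reading it back. set_nodelay() and get_nodelay() are hypothetical names; unlike the patch's void helper, this variant propagates the setsockopt() result.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on fd. setsockopt() takes a pointer to the
 * option value, hence &val. Returns 0 on success, -1 on error. */
static int set_nodelay(int fd)
{
    int val = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val));
}

/* Read the option back; returns 1 if set, 0 if clear, -1 on error. */
static int get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0) {
        return -1;
    }
    return val != 0;
}
```

Small, latency-sensitive messages such as transaction acknowledgements benefit from this because they are sent immediately instead of being coalesced.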




[Qemu-devel] [PATCH 17/18] migration-tcp: modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled.

2011-03-22 Thread Yoshiaki Tamura
When ft_mode is set in the header, tcp_accept_incoming_migration()
sets ft_trans_incoming() as a callback, which calls
qemu_file_get_notify() to receive FT transactions iteratively.  We also
need a hack not to close the fd before moving to ft_transaction mode, so
that we can reuse the fd for it.  vm_change_state_handler is added to
turn off ft_mode when cont is issued.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-tcp.c |   67 ++-
 1 files changed, 66 insertions(+), 1 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 62ec0ea..096781b 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -18,6 +18,8 @@
 #include "sysemu.h"
 #include "buffered_file.h"
 #include "block.h"
+#include "ft_trans_file.h"
+#include "event-tap.h"
 
 //#define DEBUG_MIGRATION_TCP
 
@@ -29,6 +31,8 @@
 do { } while (0)
 #endif
 
+static VMChangeStateEntry *vmstate;
+
 static int socket_errno(FdMigrationState *s)
 {
 return socket_error();
@@ -56,7 +60,8 @@ static int socket_read(FdMigrationState *s, const void * buf, 
size_t size)
 static int tcp_close(FdMigrationState *s)
 {
 DPRINTF("tcp_close\n");
-if (s->fd != -1) {
+/* FIX ME: accessing ft_mode here isn't clean */
+if (s->fd != -1 && ft_mode != FT_INIT) {
 close(s->fd);
 s->fd = -1;
 }
@@ -150,6 +155,36 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 return &s->mig_state;
 }
 
+static void ft_trans_incoming(void *opaque)
+{
+QEMUFile *f = opaque;
+
+qemu_file_get_notify(f);
+if (qemu_file_has_error(f)) {
+ft_mode = FT_ERROR;
+qemu_fclose(f);
+}
+}
+
+static void ft_trans_reset(void *opaque, int running, int reason)
+{
+QEMUFile *f = opaque;
+
+if (running) {
+if (ft_mode != FT_ERROR) {
+qemu_fclose(f);
+}
+ft_mode = FT_OFF;
+qemu_del_vm_change_state_handler(vmstate);
+}
+}
+
+static void ft_trans_schedule_replay(QEMUFile *f)
+{
+event_tap_schedule_replay();
+vmstate = qemu_add_vm_change_state_handler(ft_trans_reset, f);
+}
+
 static void tcp_accept_incoming_migration(void *opaque)
 {
 struct sockaddr_in addr;
@@ -175,8 +210,38 @@ static void tcp_accept_incoming_migration(void *opaque)
 goto out;
 }
 
+if (ft_mode == FT_INIT) {
+autostart = 0;
+}
+
 process_incoming_migration(f);
+
+if (ft_mode == FT_INIT) {
+int ret;
+
+socket_set_nodelay(c);
+
+f = qemu_fopen_ft_trans(s, c);
+if (f == NULL) {
+fprintf(stderr, "could not qemu_fopen_ft_trans\n");
+goto out;
+}
+
+/* need to wait sender to setup */
+ret = qemu_ft_trans_begin(f);
+if (ret < 0) {
+goto out;
+}
+
+qemu_set_fd_handler2(c, NULL, ft_trans_incoming, NULL, f);
+ft_trans_schedule_replay(f);
+ft_mode = FT_TRANSACTION_RECV;
+
+return;
+}
+
 qemu_fclose(f);
+
 out:
 close(c);
 out2:
-- 
1.7.1.2
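
The mode handling in this patch can be summarized in a small sketch. The mode names are the ones used in the diff; the enum itself, its numeric values, and the two helper functions are assumptions made for illustration (the real declarations live in ft_trans_file.h, which is not shown in full here).

```c
#include <stdbool.h>

/* Mode names as used in the patch; the numeric values here are arbitrary. */
enum ft_mode_sketch { FT_OFF, FT_INIT, FT_TRANSACTION_RECV, FT_ERROR };

/* Mirrors the tcp_close() condition: keep the fd open while FT_INIT is
 * pending so that the FT transaction file can reuse it. */
static bool should_close_fd(int fd, enum ft_mode_sketch mode)
{
    return fd != -1 && mode != FT_INIT;
}

/* Mirrors ft_trans_reset(): once the VM starts running (cont), FT mode is
 * switched off regardless of the previous state. */
static enum ft_mode_sketch on_vm_state_change(enum ft_mode_sketch mode,
                                              bool running)
{
    return running ? FT_OFF : mode;
}
```

On the receiving side the sequence is FT_INIT (set from the URI), then FT_TRANSACTION_RECV after qemu_ft_trans_begin() succeeds, and finally FT_OFF on cont or FT_ERROR if the file reports an error.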




[Qemu-devel] [PATCH 03/18] Introduce qemu_loadvm_state_no_header() and make qemu_loadvm_state() a wrapper.

2011-03-22 Thread Yoshiaki Tamura
Introduce qemu_loadvm_state_no_header() so that it can be called
iteratively without reading the header, and qemu_loadvm_state()
becomes a wrapper of it.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 savevm.c |   45 +++--
 1 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/savevm.c b/savevm.c
index d293f9c..4a76e32 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1743,30 +1743,14 @@ typedef struct LoadStateEntry {
 int version_id;
 } LoadStateEntry;
 
-int qemu_loadvm_state(QEMUFile *f)
+static int qemu_loadvm_state_no_header(QEMUFile *f)
 {
 QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
 QLIST_HEAD_INITIALIZER(loadvm_handlers);
 LoadStateEntry *le, *new_le;
 uint8_t section_type;
-unsigned int v;
-int ret;
-
-if (qemu_savevm_state_blocked(default_mon)) {
-return -EINVAL;
-}
-
-v = qemu_get_be32(f);
-if (v != QEMU_VM_FILE_MAGIC)
-return -EINVAL;
 
-v = qemu_get_be32(f);
-if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
-return -ENOTSUP;
-}
-if (v != QEMU_VM_FILE_VERSION)
-return -ENOTSUP;
+int ret;
 
 while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
 uint32_t instance_id, version_id, section_id;
@@ -1861,6 +1845,31 @@ out:
 return ret;
 }
 
+int qemu_loadvm_state(QEMUFile *f)
+{
+unsigned int v;
+
+if (qemu_savevm_state_blocked(default_mon)) {
+return -EINVAL;
+}
+
+v = qemu_get_be32(f);
+if (v != QEMU_VM_FILE_MAGIC) {
+return -EINVAL;
+}
+
+v = qemu_get_be32(f);
+if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
+return -ENOTSUP;
+}
+if (v != QEMU_VM_FILE_VERSION) {
+return -ENOTSUP;
+}
+
+return qemu_loadvm_state_no_header(f);
+}
+
 static int bdrv_snapshot_find(BlockDriverState *bs, QEMUSnapshotInfo *sn_info,
   const char *name)
 {
-- 
1.7.1.2




[Qemu-devel] [PATCH 18/18] Introduce kemari: to enable FT migration mode (Kemari).

2011-03-22 Thread Yoshiaki Tamura
When kemari: is put in front of the URI of the migrate command, it turns
on ft_mode to start FT migration mode (Kemari).  On the receiver side,
the option looks like: -incoming kemari:protocol:address:port

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Acked-by: Paolo Bonzini pbonz...@redhat.com
---
 hmp-commands.hx |4 +++-
 migration.c |   12 
 qmp-commands.hx |4 +++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 834e6a8..4cd7bfa 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -760,7 +760,9 @@ ETEXI
  "\n\t\t\t -b for migration without shared storage with"
  " full copy of disk\n\t\t\t -i for migration without "
  "shared storage with incremental copy of disk "
- "(base image shared between src and destination)",
+ "(base image shared between src and destination)"
+ "\n\t\t\t put \"kemari:\" in front of URI to enable "
+ "Fault Tolerance mode (Kemari protocol)",
 .user_print = monitor_user_noop,   
.mhandler.cmd_new = do_migrate,
 },
diff --git a/migration.c b/migration.c
index d536df0..5017dea 100644
--- a/migration.c
+++ b/migration.c
@@ -48,6 +48,12 @@ int qemu_start_incoming_migration(const char *uri)
 const char *p;
 int ret;
 
+/* check ft_mode (Kemari protocol) */
+if (strstart(uri, "kemari:", &p)) {
+ft_mode = FT_INIT;
+uri = p;
+}
+
 if (strstart(uri, "tcp:", &p))
 ret = tcp_start_incoming_migration(p);
 #if !defined(WIN32)
@@ -99,6 +105,12 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 return -1;
 }
 
+/* check ft_mode (Kemari protocol) */
+if (strstart(uri, "kemari:", &p)) {
+ft_mode = FT_INIT;
+uri = p;
+}
+
 if (strstart(uri, "tcp:", &p)) {
 s = tcp_start_outgoing_migration(mon, p, max_throttle, detach,
  blk, inc);
diff --git a/qmp-commands.hx b/qmp-commands.hx
index fbd98ee..71e4f0e 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -437,7 +437,9 @@ EQMP
  "\n\t\t\t -b for migration without shared storage with"
  " full copy of disk\n\t\t\t -i for migration without "
  "shared storage with incremental copy of disk "
- "(base image shared between src and destination)",
+ "(base image shared between src and destination)"
+ "\n\t\t\t put \"kemari:\" in front of URI to enable "
+ "Fault Tolerance mode (Kemari protocol)",
 .user_print = monitor_user_noop,   
.mhandler.cmd_new = do_migrate,
 },
-- 
1.7.1.2




[Qemu-devel] [PATCH 06/18] virtio: decrement last_avail_idx with inuse before saving.

2011-03-22 Thread Yoshiaki Tamura
For regular migration inuse == 0 always as requests are flushed before
save. However, event-tap log when enabled introduces an extra queue
for requests which is not being flushed, thus the last inuse requests
are left in the event-tap queue.  Move the last_avail_idx value sent
to the remote back to make it repeat the last inuse requests.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/virtio.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 31bd9e3..f05d1b6 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -673,12 +673,20 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 qemu_put_be32(f, i);
 
 for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
+/* For regular migration inuse == 0 always as
+ * requests are flushed before save. However,
+ * event-tap log when enabled introduces an extra
+ * queue for requests which is not being flushed,
+ * thus the last inuse requests are left in the event-tap queue.
+ * Move the last_avail_idx value sent to the remote back
+ * to make it repeat the last inuse requests. */
+uint16_t last_avail = vdev->vq[i].last_avail_idx - vdev->vq[i].inuse;
 if (vdev->vq[i].vring.num == 0)
 break;
 
 qemu_put_be32(f, vdev->vq[i].vring.num);
 qemu_put_be64(f, vdev->vq[i].pa);
-qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
+qemu_put_be16s(f, &last_avail);
 if (vdev->binding->save_queue)
 vdev->binding->save_queue(vdev->binding_opaque, i, f);
 }
-- 
1.7.1.2




[Qemu-devel] [PATCH 11/18] ioport: insert event_tap_ioport() to ioport_write().

2011-03-22 Thread Yoshiaki Tamura
Record ioport event to replay it upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 ioport.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ioport.c b/ioport.c
index 2e971fa..f485bab 100644
--- a/ioport.c
+++ b/ioport.c
@@ -27,6 +27,7 @@
 
 #include "ioport.h"
 #include "trace.h"
+#include "event-tap.h"
 
 /***/
 /* IO Port */
@@ -76,6 +77,7 @@ static void ioport_write(int index, uint32_t address, 
uint32_t data)
 default_ioport_writel
 };
 IOPortWriteFunc *func = ioport_write_table[index][address];
+event_tap_ioport(index, address, data);
 if (!func)
 func = default_func[index];
 func(ioport_opaque[address], address, data);
-- 
1.7.1.2




[Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.

2011-03-22 Thread Yoshiaki Tamura
Currently FdMigrationState doesn't support read(), and this patch
introduces it to receive responses from the other side.  Note that this
doesn't make the existing migration protocol bi-directional.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-tcp.c |   15 +++
 migration.c |   13 +
 migration.h |3 +++
 3 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index e8dff9d..62ec0ea 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -39,6 +39,20 @@ static int socket_write(FdMigrationState *s, const void * 
buf, size_t size)
 return send(s->fd, buf, size, 0);
 }
 
+static int socket_read(FdMigrationState *s, const void * buf, size_t size)
+{
+ssize_t len;
+
+do {
+len = recv(s->fd, (void *)buf, size, 0);
+} while (len == -1 && socket_error() == EINTR);
+if (len == -1) {
+len = -socket_error();
+}
+
+return len;
+}
+
 static int tcp_close(FdMigrationState *s)
 {
 DPRINTF(tcp_close\n);
@@ -94,6 +108,7 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 
 s->get_error = socket_errno;
 s->write = socket_write;
+s->read = socket_read;
 s->close = tcp_close;
 s->mig_state.cancel = migrate_fd_cancel;
 s->mig_state.get_status = migrate_fd_get_status;
diff --git a/migration.c b/migration.c
index af3a1f2..302b8fe 100644
--- a/migration.c
+++ b/migration.c
@@ -340,6 +340,19 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void 
*data, size_t size)
 return ret;
 }
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t 
size)
+{
+FdMigrationState *s = opaque;
+int ret;
+
+ret = s->read(s, data, size);
+if (ret == -1) {
+ret = -(s->get_error(s));
+}
+
+return ret;
+}
+
 void migrate_fd_connect(FdMigrationState *s)
 {
 int ret;
diff --git a/migration.h b/migration.h
index 2170792..88a6987 100644
--- a/migration.h
+++ b/migration.h
@@ -48,6 +48,7 @@ struct FdMigrationState
 int (*get_error)(struct FdMigrationState*);
 int (*close)(struct FdMigrationState*);
 int (*write)(struct FdMigrationState*, const void *, size_t);
+int (*read)(struct FdMigrationState *, const void *, size_t);
 void *opaque;
 };
 
@@ -116,6 +117,8 @@ void migrate_fd_put_notify(void *opaque);
 
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size);
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t 
size);
+
 void migrate_fd_connect(FdMigrationState *s);
 
 void migrate_fd_put_ready(void *opaque);
-- 
1.7.1.2




[Qemu-devel] [PATCH 07/18] Introduce fault tolerant VM transaction QEMUFile and ft_mode.

2011-03-22 Thread Yoshiaki Tamura
This code implements VM transaction protocol.  Like buffered_file, it
sits between savevm and migration layer.  With this architecture, VM
transaction protocol is implemented mostly independent from other
existing code.
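
As with buffered_file, output that the transport cannot take immediately is parked in a growable buffer and drained later. The sketch below shows that buffering idea only; PendingBuf, pending_append() and pending_consume() are hypothetical names, and it uses plain realloc() with error checking rather than qemu_realloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t *buf;
    size_t max_size;
    size_t put_offset;      /* number of bytes waiting to be sent */
} PendingBuf;

/* Append size bytes, growing the buffer when the free space is too small.
 * Returns 0 on success, -1 if realloc fails (buffer left unchanged). */
static int pending_append(PendingBuf *b, const uint8_t *data, size_t size)
{
    if (size == 0) {
        return 0;
    }
    if (size > b->max_size - b->put_offset) {
        size_t new_size = b->max_size + size + 1024;
        uint8_t *p = realloc(b->buf, new_size);

        if (!p) {
            return -1;
        }
        b->buf = p;
        b->max_size = new_size;
    }
    memcpy(b->buf + b->put_offset, data, size);
    b->put_offset += size;
    return 0;
}

/* Drop the first n bytes (n <= put_offset) once they were written out. */
static void pending_consume(PendingBuf *b, size_t n)
{
    memmove(b->buf, b->buf + n, b->put_offset - n);
    b->put_offset -= n;
}
```

A flush routine would call the transport's write function on the buffer, pass the number of bytes accepted to pending_consume(), and keep output frozen while put_offset is non-zero.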

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.objs   |1 +
 ft_trans_file.c |  624 +++
 ft_trans_file.h |   72 +++
 migration.c |3 +
 trace-events|   15 ++
 5 files changed, 715 insertions(+), 0 deletions(-)
 create mode 100644 ft_trans_file.c
 create mode 100644 ft_trans_file.h

diff --git a/Makefile.objs b/Makefile.objs
index f8cf199..c084905 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -101,6 +101,7 @@ common-obj-y += qdev.o qdev-properties.o
 common-obj-y += block-migration.o
 common-obj-y += pflib.o
 common-obj-y += bitmap.o bitops.o
+common-obj-y += ft_trans_file.o
 
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
diff --git a/ft_trans_file.c b/ft_trans_file.c
new file mode 100644
index 000..2b42b95
--- /dev/null
+++ b/ft_trans_file.c
@@ -0,0 +1,624 @@
+/*
+ * Fault tolerant VM transaction QEMUFile
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * This source code is based on buffered_file.c.
+ * Copyright IBM, Corp. 2008
+ * Authors:
+ *  Anthony Liguori   aligu...@us.ibm.com
+ */
+
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "hw/hw.h"
+#include "qemu-timer.h"
+#include "sysemu.h"
+#include "qemu-char.h"
+#include "trace.h"
+#include "ft_trans_file.h"
+
+typedef struct FtTransHdr
+{
+uint16_t cmd;
+uint16_t id;
+uint32_t seq;
+uint32_t payload_len;
+} FtTransHdr;
+
+typedef struct QEMUFileFtTrans
+{
+FtTransPutBufferFunc *put_buffer;
+FtTransGetBufferFunc *get_buffer;
+FtTransPutReadyFunc *put_ready;
+FtTransGetReadyFunc *get_ready;
+FtTransWaitForUnfreezeFunc *wait_for_unfreeze;
+FtTransCloseFunc *close;
+void *opaque;
+QEMUFile *file;
+
+enum QEMU_VM_TRANSACTION_STATE state;
+uint32_t seq;
+uint16_t id;
+
+int has_error;
+
+bool freeze_output;
+bool freeze_input;
+bool rate_limit;
+bool is_sender;
+bool is_payload;
+
+uint8_t *buf;
+size_t buf_max_size;
+size_t put_offset;
+size_t get_offset;
+
+FtTransHdr header;
+size_t header_offset;
+} QEMUFileFtTrans;
+
+#define IO_BUF_SIZE 32768
+
+static void ft_trans_append(QEMUFileFtTrans *s,
+const uint8_t *buf, size_t size)
+{
+if (size > (s->buf_max_size - s->put_offset)) {
+trace_ft_trans_realloc(s->buf_max_size, size + 1024);
+s->buf_max_size += size + 1024;
+s->buf = qemu_realloc(s->buf, s->buf_max_size);
+}
+
+trace_ft_trans_append(size);
+memcpy(s->buf + s->put_offset, buf, size);
+s->put_offset += size;
+}
+
+static void ft_trans_flush(QEMUFileFtTrans *s)
+{
+size_t offset = 0;
+
+if (s->has_error) {
+error_report("flush when error %d, bailing", s->has_error);
+return;
+}
+
+while (offset < s->put_offset) {
+ssize_t ret;
+
+ret = s->put_buffer(s->opaque, s->buf + offset, s->put_offset - offset);
+if (ret == -EAGAIN) {
+break;
+}
+
+if (ret <= 0) {
+error_report("error flushing data, %s", strerror(errno));
+s->has_error = FT_TRANS_ERR_FLUSH;
+break;
+} else {
+offset += ret;
+}
+}
+
+trace_ft_trans_flush(offset, s->put_offset);
+memmove(s->buf, s->buf + offset, s->put_offset - offset);
+s->put_offset -= offset;
+s->freeze_output = !!s->put_offset;
+}
+
+static ssize_t ft_trans_put(void *opaque, void *buf, int size)
+{
+QEMUFileFtTrans *s = opaque;
+size_t offset = 0;
+ssize_t len;
+
+/* flush buffered data before putting next */
+if (s->put_offset) {
+ft_trans_flush(s);
+}
+
+while (!s->freeze_output && offset < size) {
+len = s->put_buffer(s->opaque, (uint8_t *)buf + offset, size - offset);
+
+if (len == -EAGAIN) {
+trace_ft_trans_freeze_output();
+s->freeze_output = 1;
+break;
+}
+
+if (len <= 0) {
+error_report("putting data failed, %s", strerror(errno));
+s->has_error = 1;
+offset = -EINVAL;
+break;
+}
+
+offset += len;
+}
+
+if (s->freeze_output) {
+ft_trans_append(s, buf + offset, size - offset);
+offset = size;
+}
+
+return offset;
+}
+
+static int ft_trans_send_header(QEMUFileFtTrans *s,
+enum QEMU_VM_TRANSACTION_STATE state,
+uint32_t payload_len)
+{
+int ret
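
The write path in ft_trans_put() and ft_trans_flush() above can be sketched
as a standalone program.  This is only a minimal illustration under stated
assumptions, not the code from the patch: SketchFile, BudgetSink and the
sketch_* functions are hypothetical names, plain realloc() stands in for
qemu_realloc(), and the sink simply accepts a fixed number of bytes before
reporting -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical sink callback: returns bytes accepted, or -EAGAIN when busy. */
typedef ssize_t (*SinkFn)(void *opaque, const uint8_t *buf, size_t size);

/* Hypothetical stand-in for QEMUFileFtTrans; only the fields used here. */
typedef struct {
    SinkFn put;
    void *opaque;
    uint8_t *buf;          /* bytes the sink has not accepted yet */
    size_t buf_max_size;
    size_t put_offset;     /* number of pending bytes in buf */
    bool freeze_output;
    int has_error;
} SketchFile;

/* Hypothetical sink that accepts `budget` bytes and then returns -EAGAIN. */
typedef struct {
    size_t budget;
    size_t accepted;
} BudgetSink;

static ssize_t budget_sink_put(void *opaque, const uint8_t *buf, size_t size)
{
    BudgetSink *k = opaque;
    size_t n;

    (void)buf;
    if (k->budget == 0) {
        return -EAGAIN;
    }
    n = size < k->budget ? size : k->budget;
    k->budget -= n;
    k->accepted += n;
    return (ssize_t)n;
}

/* Grow the pending buffer if needed and copy the unsent bytes into it. */
static void sketch_append(SketchFile *s, const uint8_t *data, size_t size)
{
    if (size > s->buf_max_size - s->put_offset) {
        s->buf_max_size += size + 1024;
        s->buf = realloc(s->buf, s->buf_max_size);
        assert(s->buf != NULL);
    }
    memcpy(s->buf + s->put_offset, data, size);
    s->put_offset += size;
}

/* Push pending bytes to the sink; stay frozen while any remain. */
static void sketch_flush(SketchFile *s)
{
    size_t offset = 0;

    if (s->has_error || s->put_offset == 0) {
        return;
    }
    while (offset < s->put_offset) {
        ssize_t ret = s->put(s->opaque, s->buf + offset,
                             s->put_offset - offset);
        if (ret == -EAGAIN) {
            break;
        }
        if (ret <= 0) {
            s->has_error = 1;
            break;
        }
        offset += (size_t)ret;
    }
    memmove(s->buf, s->buf + offset, s->put_offset - offset);
    s->put_offset -= offset;
    s->freeze_output = s->put_offset != 0;
}

/* Write directly while the sink accepts data; buffer the rest on -EAGAIN. */
static ssize_t sketch_put(SketchFile *s, const uint8_t *data, size_t size)
{
    size_t offset = 0;

    if (s->put_offset) {
        sketch_flush(s);
    }
    while (!s->freeze_output && offset < size) {
        ssize_t len = s->put(s->opaque, data + offset, size - offset);
        if (len == -EAGAIN) {
            s->freeze_output = true;
            break;
        }
        if (len <= 0) {
            s->has_error = 1;
            return -EINVAL;
        }
        offset += (size_t)len;
    }
    if (s->freeze_output) {
        sketch_append(s, data + offset, size - offset);
    }
    return (ssize_t)size;
}
```

With a sink that accepts only 4 of 10 bytes, sketch_put() still reports all
10 bytes as consumed and keeps the remaining 6 pending until a later
sketch_flush() succeeds, which is the behaviour the caller relies on.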

[Qemu-devel] [PATCH 12/18] Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c.

2011-03-22 Thread Yoshiaki Tamura
Record mmio write event to replay it upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 exec.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index 964ce31..be71464 100644
--- a/exec.c
+++ b/exec.c
@@ -33,6 +33,7 @@
 #include "osdep.h"
 #include "kvm.h"
 #include "qemu-timer.h"
+#include "event-tap.h"
 #if defined(CONFIG_USER_ONLY)
 #include <qemu.h>
 #include <signal.h>
@@ -3733,6 +3734,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
 io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
 if (p)
 addr1 = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
+
+event_tap_mmio(addr, buf, len);
+
 /* XXX: could force cpu_single_env to NULL to avoid
 potential bugs */
 if (l >= 4 && ((addr1 & 3) == 0)) {
-- 
1.7.1.2




[Qemu-devel] [PATCH 01/18] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer().

2011-03-22 Thread Yoshiaki Tamura
Currently buf size is fixed at 32KB.  It would be useful if it could
be flexible.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |2 ++
 savevm.c |   20 +++-
 2 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index 1b09039..f90ff15 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -58,6 +58,8 @@ void qemu_fflush(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
+void *qemu_realloc_buffer(QEMUFile *f, int size);
+void qemu_clear_buffer(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index 03fce62..d293f9c 100644
--- a/savevm.c
+++ b/savevm.c
@@ -171,7 +171,8 @@ struct QEMUFile {
when reading */
 int buf_index;
 int buf_size; /* 0 when writing */
-uint8_t buf[IO_BUF_SIZE];
+int buf_max_size;
+uint8_t *buf;
 
 int has_error;
 };
@@ -422,6 +423,9 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 f->get_rate_limit = get_rate_limit;
 f->is_write = 0;
 
+f->buf_max_size = IO_BUF_SIZE;
+f->buf = qemu_malloc(sizeof(uint8_t) * f->buf_max_size);
+
 return f;
 }
 
@@ -452,6 +456,19 @@ void qemu_fflush(QEMUFile *f)
 }
 }
 
+void *qemu_realloc_buffer(QEMUFile *f, int size)
+{
+f->buf_max_size = size;
+f->buf = qemu_realloc(f->buf, f->buf_max_size);
+
+return f->buf;
+}
+
+void qemu_clear_buffer(QEMUFile *f)
+{
+f->buf_size = f->buf_index = f->buf_offset = 0;
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
 int len;
@@ -477,6 +494,7 @@ int qemu_fclose(QEMUFile *f)
 qemu_fflush(f);
 if (f->close)
 ret = f->close(f->opaque);
+qemu_free(f->buf);
 qemu_free(f);
 return ret;
 }
-- 
1.7.1.2
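
The change from a fixed array to a heap buffer with a separate capacity
field can be sketched on its own as follows.  This is a rough illustration,
assuming plain malloc()/realloc() in place of qemu_malloc()/qemu_realloc();
SketchQEMUFile and the sketch_* functions are hypothetical names and keep
only the fields that matter here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_IO_BUF_SIZE 32768  /* the 32KB default mentioned above */

/* Hypothetical cut-down QEMUFile: only the buffer bookkeeping. */
typedef struct {
    int buf_index;
    int buf_size;
    int buf_max_size;   /* current capacity of buf */
    uint8_t *buf;       /* heap buffer instead of uint8_t buf[IO_BUF_SIZE] */
} SketchQEMUFile;

static SketchQEMUFile *sketch_open(void)
{
    SketchQEMUFile *f = calloc(1, sizeof(*f));

    assert(f != NULL);
    f->buf_max_size = SKETCH_IO_BUF_SIZE;
    f->buf = malloc((size_t)f->buf_max_size);
    assert(f->buf != NULL);
    return f;
}

/* Resize the buffer; the returned pointer may differ from the old one. */
static void *sketch_realloc_buffer(SketchQEMUFile *f, int size)
{
    uint8_t *nbuf;

    assert(size > 0);
    nbuf = realloc(f->buf, (size_t)size);
    assert(nbuf != NULL);
    f->buf = nbuf;
    f->buf_max_size = size;
    return f->buf;
}

/* Forget buffered data without touching the capacity. */
static void sketch_clear_buffer(SketchQEMUFile *f)
{
    f->buf_size = f->buf_index = 0;
}

/* The buffer is now a separate allocation, so close must free it too. */
static void sketch_close(SketchQEMUFile *f)
{
    free(f->buf);
    free(f);
}
```

Callers that cached the old buffer pointer have to re-read it after a
resize, which is presumably why the patch returns f->buf from
qemu_realloc_buffer().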




[Qemu-devel] [PATCH 15/18] savevm: introduce qemu_savevm_trans_{begin, commit}.

2011-03-22 Thread Yoshiaki Tamura
Introduce qemu_savevm_trans_{begin,commit} to send the memory and
device info together, while avoiding cancelling memory state tracking.
This patch also abstracts common code between
qemu_savevm_state_{begin,iterate,commit}.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 savevm.c |  157 +++---
 sysemu.h |2 +
 2 files changed, 101 insertions(+), 58 deletions(-)

diff --git a/savevm.c b/savevm.c
index 48a0f65..4793be0 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1629,29 +1629,68 @@ bool qemu_savevm_state_blocked(Monitor *mon)
 return false;
 }
 
-int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
-int shared)
+/*
+ * section: header to write
+ * inc: if true, forces to pass SECTION_PART instead of SECTION_START
+ * pause: if true, breaks the loop when live handler returned 0
+ */
+static int qemu_savevm_state_live(Monitor *mon, QEMUFile *f, int section,
+  bool inc, bool pause)
 {
 SaveStateEntry *se;
+int skip = 0, ret;
 
 QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-if(se->set_params == NULL) {
+int len, stage;
+
+if (se->save_live_state == NULL) {
 continue;
-   }
-   se->set_params(blk_enable, shared, se->opaque);
+}
+
+/* Section type */
+qemu_put_byte(f, section);
+qemu_put_be32(f, se->section_id);
+
+if (section == QEMU_VM_SECTION_START) {
+/* ID string */
+len = strlen(se->idstr);
+qemu_put_byte(f, len);
+qemu_put_buffer(f, (uint8_t *)se->idstr, len);
+
+qemu_put_be32(f, se->instance_id);
+qemu_put_be32(f, se->version_id);
+
+stage = inc ? QEMU_VM_SECTION_PART : QEMU_VM_SECTION_START;
+} else {
+assert(inc);
+stage = section;
+}
+
+ret = se->save_live_state(mon, f, stage, se->opaque);
+if (!ret) {
+skip++;
+if (pause) {
+break;
+}
+}
 }
-
-qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+
+return skip;
+}
+
+static void qemu_savevm_state_full(QEMUFile *f)
+{
+SaveStateEntry *se;
 
 QTAILQ_FOREACH(se, &savevm_handlers, entry) {
 int len;
 
-if (se->save_live_state == NULL)
+if (se->save_state == NULL && se->vmsd == NULL) {
 continue;
+}
 
 /* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_START);
+qemu_put_byte(f, QEMU_VM_SECTION_FULL);
 qemu_put_be32(f, se->section_id);
 
 /* ID string */
@@ -1662,9 +1701,29 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
 qemu_put_be32(f, se->instance_id);
 qemu_put_be32(f, se->version_id);
 
-se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque);
+vmstate_save(f, se);
+}
+
+qemu_put_byte(f, QEMU_VM_EOF);
+}
+
+int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
+int shared)
+{
+SaveStateEntry *se;
+
+QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+if (se->set_params == NULL) {
+continue;
+}
+se->set_params(blk_enable, shared, se->opaque);
 }
 
+qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
+qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+
+qemu_savevm_state_live(mon, f, QEMU_VM_SECTION_START, 0, 0);
+
 if (qemu_file_has_error(f)) {
 qemu_savevm_state_cancel(mon, f);
 return -EIO;
@@ -1675,29 +1734,16 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
 
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
 {
-SaveStateEntry *se;
 int ret = 1;
 
-QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-if (se->save_live_state == NULL)
-continue;
-
-/* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_PART);
-qemu_put_be32(f, se->section_id);
-
-ret = se->save_live_state(mon, f, QEMU_VM_SECTION_PART, se->opaque);
-if (!ret) {
-/* Do not proceed to the next vmstate before this one reported
-   completion of the current stage. This serializes the migration
-   and reduces the probability that a faster changing state is
-   synchronized over and over again. */
-break;
-}
-}
-
-if (ret)
+/* Do not proceed to the next vmstate before this one reported
+   completion of the current stage. This serializes the migration
+   and reduces the probability that a faster changing state is
+   synchronized over and over again. */
+ret = qemu_savevm_state_live(mon, f, QEMU_VM_SECTION_PART, 1, 1);
+if (!ret) {
 return 1;
+}
 
 if (qemu_file_has_error(f)) {
 qemu_savevm_state_cancel(mon, f);
@@ -1709,46 +1755,41 @@ int

[Qemu-devel] [PATCH 00/18] [PATCH 00/18] Kemari for KVM v0.2.13

2011-03-22 Thread Yoshiaki Tamura
 the
max_downtime to decide when to switch from async to sync mode.

The repository contains all patches I'm sending with this message.
For those who want to try, please pull the following repository.  It
also includes dirty bitmap optimizations which aren't ready for posting
yet.  To remove the dirty bitmap optimizations, please look at HEAD~5
of the tree.

git://kemari.git.sourceforge.net/gitroot/kemari/kemari next

Thanks,

Yoshi

Yoshiaki Tamura (18):
  Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and
qemu_clear_buffer().
  Introduce read() to FdMigrationState.
  Introduce qemu_loadvm_state_no_header() and make qemu_loadvm_state()
a wrapper.
  qemu-char: export socket_set_nodelay().
  vl.c: add deleted flag for deleting the handler.
  virtio: decrement last_avail_idx with inuse before saving.
  Introduce fault tolerant VM transaction QEMUFile and ft_mode.
  savevm: introduce util functions to control ft_trans_file from savevm
layer.
  Introduce event-tap.
  Call init handler of event-tap at main() in vl.c.
  ioport: insert event_tap_ioport() to ioport_write().
  Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c.
  net: insert event-tap to qemu_send_packet() and
qemu_sendv_packet_async().
  block: insert event-tap to bdrv_aio_writev(), bdrv_aio_flush() and
bdrv_flush().
  savevm: introduce qemu_savevm_trans_{begin,commit}.
  migration: introduce migrate_ft_trans_{put,get}_ready(), and modify
migrate_fd_put_ready() when ft_mode is on.
  migration-tcp: modify tcp_accept_incoming_migration() to handle
ft_mode, and add a hack not to close fd when ft_mode is enabled.
  Introduce kemari: to enable FT migration mode (Kemari).

 Makefile.objs   |1 +
 Makefile.target |1 +
 block.c |   15 +
 event-tap.c |  940 +++
 event-tap.h |   44 +++
 exec.c  |4 +
 ft_trans_file.c |  624 
 ft_trans_file.h |   72 +
 hmp-commands.hx |4 +-
 hw/hw.h |7 +
 hw/virtio.c |   10 +-
 ioport.c|2 +
 migration-tcp.c |   82 +-
 migration.c |  294 +-
 migration.h |3 +
 net.c   |9 +
 qemu-char.c |2 +-
 qemu-tool.c |   27 ++
 qemu_socket.h   |1 +
 qmp-commands.hx |4 +-
 savevm.c|  372 +-
 sysemu.h|2 +
 trace-events|   25 ++
 vl.c|   18 +-
 24 files changed, 2475 insertions(+), 88 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h
 create mode 100644 ft_trans_file.c
 create mode 100644 ft_trans_file.h




[Qemu-devel] [PATCH 13/18] net: insert event-tap to qemu_send_packet() and qemu_sendv_packet_async().

2011-03-22 Thread Yoshiaki Tamura
event-tap function is called only when it is on.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 net.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index ddcca97..a541ede 100644
--- a/net.c
+++ b/net.c
@@ -37,6 +37,7 @@
 #include "qemu_socket.h"
 #include "hw/qdev.h"
 #include "iov.h"
+#include "event-tap.h"
 
 static QTAILQ_HEAD(, VLANState) vlans;
 static QTAILQ_HEAD(, VLANClientState) non_vlan_clients;
@@ -519,6 +520,10 @@ ssize_t qemu_send_packet_async(VLANClientState *sender,
 
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size)
 {
+if (event_tap_is_on()) {
+return event_tap_send_packet(vc, buf, size);
+}
+
 qemu_send_packet_async(vc, buf, size, NULL);
 }
 
@@ -600,6 +605,10 @@ ssize_t qemu_sendv_packet_async(VLANClientState *sender,
 {
 NetQueue *queue;
 
+if (event_tap_is_on()) {
+return event_tap_sendv_packet_async(sender, iov, iovcnt, sent_cb);
+}
+
 if (sender->link_down || (!sender->peer && !sender->vlan)) {
 return iov_size(iov, iovcnt);
 }
-- 
1.7.1.2




[Qemu-devel] [PATCH 05/18] vl.c: add deleted flag for deleting the handler.

2011-03-22 Thread Yoshiaki Tamura
Make deleting handlers robust against deletion of any elements in a
handler by using a deleted flag like in file descriptors.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |   15 ++-
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/vl.c b/vl.c
index dbb927d..483e2e3 100644
--- a/vl.c
+++ b/vl.c
@@ -1158,6 +1158,7 @@ static void nographic_update(void *opaque)
 struct vm_change_state_entry {
 VMChangeStateHandler *cb;
 void *opaque;
+int deleted;
 QLIST_ENTRY (vm_change_state_entry) entries;
 };
 
@@ -1178,18 +1179,22 @@ VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
 
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e)
 {
-QLIST_REMOVE (e, entries);
-qemu_free (e);
+e->deleted = 1;
 }
 
 void vm_state_notify(int running, int reason)
 {
-VMChangeStateEntry *e;
+VMChangeStateEntry *e, *ne;
 
 trace_vm_state_notify(running, reason);
 
-for (e = vm_change_state_head.lh_first; e; e = e->entries.le_next) {
-e->cb(e->opaque, running, reason);
+QLIST_FOREACH_SAFE(e, &vm_change_state_head, entries, ne) {
+if (e->deleted) {
+QLIST_REMOVE(e, entries);
+qemu_free(e);
+} else {
+e->cb(e->opaque, running, reason);
+}
 }
 }
 
-- 
1.7.1.2




[Qemu-devel] [PATCH 08/18] savevm: introduce util functions to control ft_trans_file from savevm layer.

2011-03-22 Thread Yoshiaki Tamura
To utilize ft_trans_file function, savevm needs interfaces to be
exported.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |5 ++
 savevm.c |  150 ++
 2 files changed, 155 insertions(+), 0 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index f90ff15..2d4d595 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -51,6 +51,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_ft_trans(int s_fd, int c_fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
@@ -60,6 +61,9 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
 void *qemu_realloc_buffer(QEMUFile *f, int size);
 void qemu_clear_buffer(QEMUFile *f);
+int qemu_ft_trans_begin(QEMUFile *f);
+int qemu_ft_trans_commit(QEMUFile *f);
+int qemu_ft_trans_cancel(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
@@ -94,6 +98,7 @@ void qemu_file_set_error(QEMUFile *f);
  * halted due to rate limiting or EAGAIN errors occur as it can be used to
  * resume output. */
 void qemu_file_put_notify(QEMUFile *f);
+void qemu_file_get_notify(void *opaque);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
diff --git a/savevm.c b/savevm.c
index 4a76e32..48a0f65 100644
--- a/savevm.c
+++ b/savevm.c
@@ -82,6 +82,7 @@
 #include "migration.h"
 #include "qemu_socket.h"
 #include "qemu-queue.h"
+#include "ft_trans_file.h"
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -189,6 +190,13 @@ typedef struct QEMUFileSocket
 QEMUFile *file;
 } QEMUFileSocket;
 
+typedef struct QEMUFileSocketTrans
+{
+int fd;
+QEMUFileSocket *s;
+VMChangeStateEntry *e;
+} QEMUFileSocketTrans;
+
 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
 QEMUFileSocket *s = opaque;
@@ -204,6 +212,22 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 return len;
 }
 
+static ssize_t socket_put_buffer(void *opaque, const void *buf, size_t size)
+{
+QEMUFileSocket *s = opaque;
+ssize_t len;
+
+do {
+len = send(s->fd, (void *)buf, size, 0);
+} while (len == -1 && socket_error() == EINTR);
+
+if (len == -1) {
+len = -socket_error();
+}
+
+return len;
+}
+
 static int socket_close(void *opaque)
 {
 QEMUFileSocket *s = opaque;
@@ -211,6 +235,71 @@ static int socket_close(void *opaque)
 return 0;
 }
 
+static int socket_trans_get_buffer(void *opaque, uint8_t *buf, int64_t pos, size_t size)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+ssize_t len;
+
+len = socket_get_buffer(s, buf, pos, size);
+
+return len;
+}
+
+static ssize_t socket_trans_put_buffer(void *opaque, const void *buf, size_t size)
+{
+QEMUFileSocketTrans *t = opaque;
+
+return socket_put_buffer(t->s, buf, size);
+}
+
+static int qemu_loadvm_state_no_header(QEMUFile *f);
+
+static int socket_trans_get_ready(void *opaque)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+QEMUFile *f = s->file;
+int ret = 0;
+
+ret = qemu_loadvm_state_no_header(f);
+if (ret < 0) {
+fprintf(stderr,
+"socket_trans_get_ready: error while loading vmstate\n");
+}
+
+return ret;
+}
+
+static int socket_trans_close(void *opaque)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+
+qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
+qemu_set_fd_handler2(t->fd, NULL, NULL, NULL, NULL);
+qemu_del_vm_change_state_handler(t->e);
+close(s->fd);
+close(t->fd);
+qemu_free(s);
+qemu_free(t);
+
+return 0;
+}
+
+static void socket_trans_resume(void *opaque, int running, int reason)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+
+if (!running) {
+return;
+}
+
+qemu_announce_self();
+qemu_fclose(s->file);
+}
+
 static int stdio_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size)
 {
 QEMUFileStdio *s = opaque;
@@ -333,6 +422,26 @@ QEMUFile *qemu_fopen_socket(int fd)
 return s->file;
 }
 
+QEMUFile *qemu_fopen_ft_trans(int s_fd, int c_fd)
+{
+QEMUFileSocketTrans *t = qemu_mallocz(sizeof(QEMUFileSocketTrans));
+QEMUFileSocket *s = qemu_mallocz(sizeof(QEMUFileSocket));
+
+t->s = s;
+t->fd = s_fd;
+t->e = qemu_add_vm_change_state_handler(socket_trans_resume, t);
+
+s->fd = c_fd;
+s->file = qemu_fopen_ops_ft_trans(t, socket_trans_put_buffer,
+  socket_trans_get_buffer, NULL,
+  socket_trans_get_ready,
+  migrate_fd_wait_for_unfreeze

[Qemu-devel] [PATCH 14/18] block: insert event-tap to bdrv_aio_writev(), bdrv_aio_flush() and bdrv_flush().

2011-03-22 Thread Yoshiaki Tamura
event-tap function is called only when it is on, and requests were
sent from device emulators.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Acked-by: Kevin Wolf kw...@redhat.com
---
 block.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index c8e2f97..952543a 100644
--- a/block.c
+++ b/block.c
@@ -28,6 +28,7 @@
 #include "block_int.h"
 #include "module.h"
 #include "qemu-objects.h"
+#include "event-tap.h"
 
 #ifdef CONFIG_BSD
 #include <sys/types.h>
@@ -1585,6 +1586,10 @@ int bdrv_flush(BlockDriverState *bs)
 }
 
 if (bs->drv && bs->drv->bdrv_flush) {
+if (*bs->device_name && event_tap_is_on()) {
+event_tap_bdrv_flush();
+}
+
 return bs->drv->bdrv_flush(bs);
 }
 
@@ -2220,6 +2225,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
 if (bdrv_check_request(bs, sector_num, nb_sectors))
 return NULL;
 
+if (*bs->device_name && event_tap_is_on()) {
+return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
+ cb, opaque);
+}
+
 if (bs->dirty_bitmap) {
 blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
  opaque);
@@ -2493,6 +2503,11 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
 
 if (!drv)
 return NULL;
+
+if (*bs->device_name && event_tap_is_on()) {
+return event_tap_bdrv_aio_flush(bs, cb, opaque);
+}
+
 return drv->bdrv_aio_flush(bs, cb, opaque);
 }
 
-- 
1.7.1.2




[Qemu-devel] [PATCH 09/18] Introduce event-tap.

2011-03-22 Thread Yoshiaki Tamura
event-tap controls when to start an FT transaction, and provides proxy
functions to be called from net/block devices.  During an FT transaction,
it queues up net/block requests, and flushes them when the transaction
completes.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.target |1 +
 event-tap.c |  940 +++
 event-tap.h |   44 +++
 qemu-tool.c |   27 ++
 trace-events|   10 +
 5 files changed, 1022 insertions(+), 0 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h

diff --git a/Makefile.target b/Makefile.target
index 62b102a..f088121 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -199,6 +199,7 @@ obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
 LIBS+=-lz
+obj-y += event-tap.o
 
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
diff --git a/event-tap.c b/event-tap.c
new file mode 100644
index 000..95c147a
--- /dev/null
+++ b/event-tap.c
@@ -0,0 +1,940 @@
+/*
+ * Event Tap functions for QEMU
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "block.h"
+#include "block_int.h"
+#include "ioport.h"
+#include "osdep.h"
+#include "sysemu.h"
+#include "hw/hw.h"
+#include "net.h"
+#include "event-tap.h"
+#include "trace.h"
+
+enum EVENT_TAP_STATE {
+EVENT_TAP_OFF,
+EVENT_TAP_ON,
+EVENT_TAP_SUSPEND,
+EVENT_TAP_FLUSH,
+EVENT_TAP_LOAD,
+EVENT_TAP_REPLAY,
+};
+
+static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
+
+typedef struct EventTapIOport {
+uint32_t address;
+uint32_t data;
+int  index;
+} EventTapIOport;
+
+#define MMIO_BUF_SIZE 8
+
+typedef struct EventTapMMIO {
+uint64_t address;
+uint8_t  buf[MMIO_BUF_SIZE];
+int  len;
+} EventTapMMIO;
+
+typedef struct EventTapNetReq {
+char *device_name;
+int iovcnt;
+int vlan_id;
+bool vlan_needed;
+bool async;
+struct iovec *iov;
+NetPacketSent *sent_cb;
+} EventTapNetReq;
+
+#define MAX_BLOCK_REQUEST 32
+
+typedef struct EventTapAIOCB EventTapAIOCB;
+
+typedef struct EventTapBlkReq {
+char *device_name;
+int num_reqs;
+int num_cbs;
+bool is_flush;
+BlockRequest reqs[MAX_BLOCK_REQUEST];
+EventTapAIOCB *acb[MAX_BLOCK_REQUEST];
+} EventTapBlkReq;
+
+#define EVENT_TAP_IOPORT (1 << 0)
+#define EVENT_TAP_MMIO   (1 << 1)
+#define EVENT_TAP_NET    (1 << 2)
+#define EVENT_TAP_BLK    (1 << 3)
+
+#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1)
+
+typedef struct EventTapLog {
+int mode;
+union {
+EventTapIOport ioport;
+EventTapMMIO mmio;
+};
+union {
+EventTapNetReq net_req;
+EventTapBlkReq blk_req;
+};
+QTAILQ_ENTRY(EventTapLog) node;
+} EventTapLog;
+
+struct EventTapAIOCB {
+BlockDriverAIOCB common;
+BlockDriverAIOCB *acb;
+bool is_canceled;
+};
+
+static EventTapLog *last_event_tap;
+
+static QTAILQ_HEAD(, EventTapLog) event_list;
+static QTAILQ_HEAD(, EventTapLog) event_pool;
+
+static int (*event_tap_cb)(void);
+static QEMUBH *event_tap_bh;
+static VMChangeStateEntry *vmstate;
+
+static void event_tap_bh_cb(void *p)
+{
+if (event_tap_cb) {
+event_tap_cb();
+}
+
+qemu_bh_delete(event_tap_bh);
+event_tap_bh = NULL;
+}
+
+static void event_tap_schedule_bh(void)
+{
+trace_event_tap_ignore_bh(!!event_tap_bh);
+
+/* if bh is already set, we ignore it for now */
+if (event_tap_bh) {
+return;
+}
+
+event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL);
+qemu_bh_schedule(event_tap_bh);
+
+return;
+}
+
+static void *event_tap_alloc_log(void)
+{
+EventTapLog *log;
+
+if (QTAILQ_EMPTY(&event_pool)) {
+log = qemu_mallocz(sizeof(EventTapLog));
+} else {
+log = QTAILQ_FIRST(&event_pool);
+QTAILQ_REMOVE(&event_pool, log, node);
+}
+
+return log;
+}
+
+static void event_tap_free_net_req(EventTapNetReq *net_req);
+static void event_tap_free_blk_req(EventTapBlkReq *blk_req);
+
+static void event_tap_free_log(EventTapLog *log)
+{
+int mode = log->mode & ~EVENT_TAP_TYPE_MASK;
+
+if (mode == EVENT_TAP_NET) {
+event_tap_free_net_req(&log->net_req);
+} else if (mode == EVENT_TAP_BLK) {
+event_tap_free_blk_req(&log->blk_req);
+}
+
+log->mode = 0;
+
+/* return the log to event_pool */
+QTAILQ_INSERT_HEAD(&event_pool, log, node);
+}
+
+static void event_tap_free_pool(void)
+{
+EventTapLog *log, *next;
+
+QTAILQ_FOREACH_SAFE(log, &event_pool, node, next) {
+QTAILQ_REMOVE(&event_pool, log, node);
+qemu_free(log);
+}
+}
+
+static void event_tap_free_net_req(EventTapNetReq *net_req)
+{
+int i;
+
+if (!net_req->async
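
The queue-and-flush behaviour described in the commit message, together
with the node pool used by event_tap_alloc_log() and event_tap_free_log(),
can be sketched roughly as below.  This is a simplified model under stated
assumptions rather than the event-tap implementation: requests are reduced
to an int payload, hand-rolled lists replace the QTAILQ macros, and every
tap_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical log node; payload stands in for a net/block request. */
typedef struct TapLog {
    int payload;
    struct TapLog *next;
} TapLog;

static bool tap_on;                 /* is the tap currently intercepting? */
static TapLog *tap_events;          /* FIFO of queued requests (head) */
static TapLog *tap_events_tail;
static TapLog *tap_pool;            /* recycled nodes */
static int tap_sent_total;          /* what the "device" really emitted */

static void tap_real_send(int payload)
{
    tap_sent_total += payload;
}

/* Reuse a node from the pool when possible, otherwise allocate one. */
static TapLog *tap_alloc_log(void)
{
    TapLog *log = tap_pool;

    if (log) {
        tap_pool = log->next;
        log->next = NULL;
        log->payload = 0;
    } else {
        log = calloc(1, sizeof(*log));
        assert(log != NULL);
    }
    return log;
}

/* Return a node to the pool instead of freeing it. */
static void tap_free_log(TapLog *log)
{
    log->next = tap_pool;
    tap_pool = log;
}

/* Proxy used by device code: send directly, or queue while the tap is on. */
static void tap_send(int payload)
{
    TapLog *log;

    if (!tap_on) {
        tap_real_send(payload);
        return;
    }
    log = tap_alloc_log();
    log->payload = payload;
    if (tap_events_tail) {
        tap_events_tail->next = log;
    } else {
        tap_events = log;
    }
    tap_events_tail = log;
}

/* Release queued requests in order; returns how many were released. */
static int tap_flush(void)
{
    int n = 0;

    while (tap_events) {
        TapLog *log = tap_events;

        tap_events = log->next;
        tap_real_send(log->payload);
        tap_free_log(log);
        n++;
    }
    tap_events_tail = NULL;
    return n;
}

/* Drop all pooled nodes, e.g. when the tap is unregistered. */
static void tap_free_pool(void)
{
    while (tap_pool) {
        TapLog *log = tap_pool;

        tap_pool = log->next;
        free(log);
    }
}
```

Keeping freed nodes in a pool avoids an allocation per intercepted request,
which appears to be the motivation for event_pool in the patch.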

[Qemu-devel] [PATCH 16/18] migration: introduce migrate_ft_trans_{put, get}_ready(), and modify migrate_fd_put_ready() when ft_mode is on.

2011-03-22 Thread Yoshiaki Tamura
Introduce migrate_ft_trans_put_ready() which kicks the FT transaction
cycle.  When ft_mode is on, migrate_fd_put_ready() would open
ft_trans_file and turn on event_tap.  To end or cancel an FT transaction,
ft_mode and event_tap are turned off.  migrate_ft_trans_get_ready() is
called to receive ack from the receiver.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |  266 ++-
 1 files changed, 265 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 1c2d956..d536df0 100644
--- a/migration.c
+++ b/migration.c
@@ -21,6 +21,7 @@
 #include "qemu_socket.h"
 #include "block-migration.h"
 #include "qemu-objects.h"
+#include "event-tap.h"
 
 //#define DEBUG_MIGRATION
 
@@ -283,6 +284,17 @@ void migrate_fd_error(FdMigrationState *s)
 migrate_fd_cleanup(s);
 }
 
+static void migrate_ft_trans_error(FdMigrationState *s)
+{
+ft_mode = FT_ERROR;
+qemu_savevm_state_cancel(s->mon, s->file);
+migrate_fd_error(s);
+/* we need to set vm running to avoid assert in virtio-net */
+vm_start();
+event_tap_unregister();
+vm_stop(0);
+}
+
 int migrate_fd_cleanup(FdMigrationState *s)
 {
 int ret = 0;
@@ -318,6 +330,17 @@ void migrate_fd_put_notify(void *opaque)
 qemu_file_put_notify(s->file);
 }
 
+static void migrate_fd_get_notify(void *opaque)
+{
+FdMigrationState *s = opaque;
+
+qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
+qemu_file_get_notify(s->file);
+if (qemu_file_has_error(s->file)) {
+migrate_ft_trans_error(s);
+}
+}
+
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
 {
 FdMigrationState *s = opaque;
@@ -353,6 +376,10 @@ int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t size)
 ret = -(s->get_error(s));
 }
 
+if (ret == -EAGAIN) {
+qemu_set_fd_handler2(s->fd, NULL, migrate_fd_get_notify, NULL, s);
+}
+
 return ret;
 }
 
@@ -379,6 +406,230 @@ void migrate_fd_connect(FdMigrationState *s)
 migrate_fd_put_ready(s);
 }
 
+static int migrate_ft_trans_commit(void *opaque)
+{
+FdMigrationState *s = opaque;
+int ret = -1;
+
+if (ft_mode != FT_TRANSACTION_COMMIT && ft_mode != FT_TRANSACTION_ATOMIC) {
+fprintf(stderr,
+"migrate_ft_trans_commit: invalid ft_mode %d\n", ft_mode);
+goto out;
+}
+
+do {
+if (ft_mode == FT_TRANSACTION_ATOMIC) {
+if (qemu_ft_trans_begin(s->file) < 0) {
+fprintf(stderr, "qemu_ft_trans_begin failed\n");
+goto out;
+}
+
+ret = qemu_savevm_trans_begin(s->mon, s->file, 0);
+if (ret < 0) {
+fprintf(stderr, "qemu_savevm_trans_begin failed\n");
+goto out;
+}
+
+ft_mode = FT_TRANSACTION_COMMIT;
+if (ret) {
+/* don't proceed until if fd isn't ready */
+goto out;
+}
+}
+
+/* make the VM state consistent by flushing outstanding events */
+vm_stop(0);
+
+/* send at full speed */
+qemu_file_set_rate_limit(s->file, 0);
+
+ret = qemu_savevm_trans_complete(s->mon, s->file);
+if (ret < 0) {
+fprintf(stderr, "qemu_savevm_trans_complete failed\n");
+goto out;
+}
+
+ret = qemu_ft_trans_commit(s->file);
+if (ret < 0) {
+fprintf(stderr, "qemu_ft_trans_commit failed\n");
+goto out;
+}
+
+if (ret) {
+ft_mode = FT_TRANSACTION_RECV;
+ret = 1;
+goto out;
+}
+
+/* flush and check if events are remaining */
+vm_start();
+ret = event_tap_flush_one();
+if (ret < 0) {
+fprintf(stderr, "event_tap_flush_one failed\n");
+goto out;
+}
+}
+
+ft_mode =  ret ? FT_TRANSACTION_BEGIN : FT_TRANSACTION_ATOMIC;
+} while (ft_mode != FT_TRANSACTION_BEGIN);
+
+vm_start();
+ret = 0;
+
+out:
+return ret;
+}
+
+static int migrate_ft_trans_get_ready(void *opaque)
+{
+FdMigrationState *s = opaque;
+int ret = -1;
+
+if (ft_mode != FT_TRANSACTION_RECV) {
+fprintf(stderr,
+"migrate_ft_trans_get_ready: invalid ft_mode %d\n", ft_mode);
+goto error_out;
+}
+
+/* flush and check if events are remaining */
+vm_start();
+ret = event_tap_flush_one();
+if (ret < 0) {
+fprintf(stderr, "event_tap_flush_one failed\n");
+goto error_out;
+}
+
+if (ret) {
+ft_mode = FT_TRANSACTION_BEGIN;
+} else {
+ft_mode = FT_TRANSACTION_ATOMIC;
+
+ret = migrate_ft_trans_commit(s);
+if (ret < 0) {
+goto error_out;
+}
+if (ret) {
+goto out;
+}
+}
+
+vm_start();
+ret = 0;
+goto out;
+
+error_out:
+migrate_ft_trans_error(s);
+
+out:
+return ret;
+}
+
+static int
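
The mode change made when the receiver's ack arrives can be summarised as a
small pure function.  This is only a sketch of the visible part of
migrate_ft_trans_get_ready(); the SkFtMode enum and sk_mode_after_ack() are
hypothetical names, the enum values are not taken from the patch, and
flush_ret simply stands for whatever event_tap_flush_one() returned.

```c
#include <assert.h>

/* Hypothetical mirror of the ft_mode values named in the patch. */
typedef enum {
    SK_FT_ERROR,
    SK_FT_TRANSACTION_BEGIN,
    SK_FT_TRANSACTION_ATOMIC,
    SK_FT_TRANSACTION_COMMIT,
    SK_FT_TRANSACTION_RECV,
} SkFtMode;

/*
 * Mode selected after the ack: any state other than RECV, or a negative
 * flush result, ends in the error state; otherwise a non-zero flush result
 * selects BEGIN and zero selects ATOMIC.
 */
static SkFtMode sk_mode_after_ack(SkFtMode cur, int flush_ret)
{
    if (cur != SK_FT_TRANSACTION_RECV || flush_ret < 0) {
        return SK_FT_ERROR;
    }
    return flush_ret ? SK_FT_TRANSACTION_BEGIN : SK_FT_TRANSACTION_ATOMIC;
}
```

In the patch the ATOMIC case does not wait: it calls
migrate_ft_trans_commit() straight away to start the next transaction.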

[Qemu-devel] [PATCH 10/18] Call init handler of event-tap at main() in vl.c.

2011-03-22 Thread Yoshiaki Tamura
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 483e2e3..6ed9b20 100644
--- a/vl.c
+++ b/vl.c
@@ -160,6 +160,7 @@ int main(int argc, char **argv)
 #include "qemu-queue.h"
 #include "cpus.h"
 #include "arch_init.h"
+#include "event-tap.h"
 
 #include "ui/qemu-spice.h"
 
@@ -3042,6 +3043,8 @@ int main(int argc, char **argv, char **envp)
 
 blk_mig_init();
 
+event_tap_init();
+
 /* open the virtual block devices */
 if (snapshot)
 qemu_opts_foreach(qemu_find_opts("drive"), drive_enable_snapshot, NULL, 0);
-- 
1.7.1.2




Re: [Qemu-devel] [PATCH 2/9] vmstate: Fix varrays with uint8 indexes

2011-03-18 Thread Yoshiaki Tamura
Juan, Anthony,

It seems this patch broke live migration in my environment.  The guest
hangs after switching to remote.  The following is parameters of the
guest.

-L /usr/local/seabios --enable-kvm -M pc -m 512 -smp 1 -monitor stdio
-localtime -boot c -drive file=/vm/fedora13.img,if=virtio -net
nic,macaddr=54:52:00:47:a5:a8,model=virtio -net tap -parallel none
-usb  -vnc :0

Have you seen similar issues?

Thanks,

Yoshi

2011/3/10 Juan Quintela quint...@redhat.com:
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  hw/hw.h  |    5 +++--
  savevm.c |    2 ++
  2 files changed, 5 insertions(+), 2 deletions(-)

 diff --git a/hw/hw.h b/hw/hw.h
 index 0299207..40c6396 100644
 --- a/hw/hw.h
 +++ b/hw/hw.h
 @@ -298,6 +298,7 @@ enum VMStateFlags {
     VMS_VARRAY_UINT16    = 0x080,  /* Array with size in uint16_t field */
     VMS_VBUFFER          = 0x100,  /* Buffer with size in int32_t field */
     VMS_MULTIPLY         = 0x200,  /* multiply size field by field_size */
 +    VMS_VARRAY_UINT8     = 0x400,  /* Array with size in uint8_t field*/
  };

  typedef struct {
 @@ -489,11 +490,11 @@ extern const VMStateInfo vmstate_info_unused_buffer;

  #define VMSTATE_STRUCT_VARRAY_UINT8(_field, _state, _field_num, _version, 
 _vmsd, _type) { \
     .name       = (stringify(_field)),                               \
 -    .num_offset = vmstate_offset_value(_state, _field_num, uint8_t),  \
 +    .num_offset = vmstate_offset_value(_state, _field_num, uint8_t), \
     .version_id = (_version),                                        \
     .vmsd       = (_vmsd),                                          \
     .size       = sizeof(_type),                                     \
 -    .flags      = VMS_STRUCT|VMS_VARRAY_INT32,                       \
 +    .flags      = VMS_STRUCT|VMS_VARRAY_UINT8,                       \
     .offset     = offsetof(_state, _field),                          \
  }

 diff --git a/savevm.c b/savevm.c
 index ce063d1..4db036b 100644
 --- a/savevm.c
 +++ b/savevm.c
 @@ -1331,6 +1331,8 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
                  n_elems = *(int32_t *)(opaque+field->num_offset);
              } else if (field->flags & VMS_VARRAY_UINT16) {
                  n_elems = *(uint16_t *)(opaque+field->num_offset);
 +            } else if (field->flags & VMS_VARRAY_UINT8) {
 +                n_elems = *(uint8_t *)(opaque+field->num_offset);
              }
              if (field->flags & VMS_POINTER) {
                  base_addr = *(void **)base_addr + field->start;
 --
 1.7.4
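
Why the width flag matters can be seen in a small standalone sketch.  It is
an illustration under assumptions, not the savevm.c code: SkField, SkState
and sk_n_elems() are hypothetical names, the num_timers field is invented
for the example, and the SK_VARRAY_INT32 value is assumed (only the UINT16
and UINT8 values appear in the quoted diff).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    SK_VARRAY_INT32  = 0x040,   /* assumed value, not shown in the diff */
    SK_VARRAY_UINT16 = 0x080,
    SK_VARRAY_UINT8  = 0x400,
};

/* Hypothetical cut-down field description. */
typedef struct {
    size_t num_offset;  /* offset of the element-count field in the state */
    int flags;          /* which width to use when reading the count */
    int num;            /* fixed count when no VARRAY flag is set */
} SkField;

/* Read the element count with the width selected by the flags. */
static int sk_n_elems(const void *opaque, const SkField *field)
{
    const uint8_t *p = (const uint8_t *)opaque + field->num_offset;
    int n = field->num;

    if (field->flags & SK_VARRAY_INT32) {
        int32_t v;
        memcpy(&v, p, sizeof(v));
        n = v;
    } else if (field->flags & SK_VARRAY_UINT16) {
        uint16_t v;
        memcpy(&v, p, sizeof(v));
        n = v;
    } else if (field->flags & SK_VARRAY_UINT8) {
        n = *p;
    }
    return n;
}

/* Hypothetical device state with a one-byte element count. */
typedef struct {
    uint8_t num_timers;
    uint8_t other[3];   /* neighbouring bytes a 32-bit read would pick up */
} SkState;
```

Reading a uint8_t count through the INT32 path also pulls in the three
neighbouring bytes, so the count only comes out right when those bytes
happen to be zero on a little-endian host; the UINT8 flag avoids that.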






Re: [Qemu-devel] Re: [PATCH 2/9] vmstate: Fix varrays with uint8 indexes

2011-03-18 Thread Yoshiaki Tamura
Ah, now I see what the problem was.

Yoshi

2011/3/18 Juan Quintela quint...@redhat.com:
 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 Juan, Anthony,

 It seems this patch broke live migration in my environment.  The guest
 hangs after switching to remote.  The following is parameters of the
 guest.

 -L /usr/local/seabios --enable-kvm -M pc -m 512 -smp 1 -monitor stdio
 -localtime -boot c -drive file=/vm/fedora13.img,if=virtio -net
 nic,macaddr=54:52:00:47:a5:a8,model=virtio -net tap -parallel none
 -usb  -vnc :0

 Have you seen similar issues?

 Fix sent to the list, waiting for Anthony to apply it.

 Subject: [PATCH] Fix migration uint8 arrys handled

 Anthony, please apply that one.

 Later, Juan.





Re: [Qemu-devel] [PATCH] Fix migration uint8 arrys handled

2011-03-18 Thread Yoshiaki Tamura
2011/3/15 Juan Quintela quint...@redhat.com:
 commit 82fa39b75181b730d6d4d09f443bd26bcfcd045c

 only contains half of the fix.  It forgot the save state fix for
 UINT8 indexes.

 Anthony, please apply, without this migration using hpet is broken.
 (only current user).

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  savevm.c |    2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

 diff --git a/savevm.c b/savevm.c
 index 60d2f2a..67459a7 100644
 --- a/savevm.c
 +++ b/savevm.c
 @@ -1395,6 +1395,8 @@ void vmstate_save_state(QEMUFile *f, const 
 VMStateDescription *vmsd,
                 n_elems = *(int32_t *)(opaque+field->num_offset);
             } else if (field->flags & VMS_VARRAY_UINT16) {
                 n_elems = *(uint16_t *)(opaque+field->num_offset);
 +            } else if (field->flags & VMS_VARRAY_UINT8) {
 +                n_elems = *(uint8_t *)(opaque+field->num_offset);
             }
             if (field->flags & VMS_POINTER) {
                 base_addr = *(void **)base_addr + field->start;
 --
 1.7.4

Acked-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp



Re: [Qemu-devel] Re: KVM call agenda for Mars 14th

2011-03-14 Thread Yoshiaki Tamura
 On Mar 15, 2011, at 2:49 AM, Anthony Liguori anth...@codemonkey.ws wrote:

 On 03/14/2011 11:36 AM, Juan Quintela wrote:
 Jes Sorensenjes.soren...@redhat.com  wrote:
 On 03/14/11 13:14, Juan Quintela wrote:
 Please send any agenda items you are interested in covering.

 Thanks, Juan.
 I presume you mean for March 15? Today is the 14th and it is Monday :)
 Dunno what calendar I looked at :p

 Yes, you are right.


 - QCFG, http://wiki.qemu.org/Features/QCFG

 Regards,

 Anthony Liguori

- Kemari merge plan,
http://www.mail-archive.com/qemu-devel@nongnu.org/msg56493.html

Although I won't be able to join the conference, I would like to know
the status or plans at least.

Thanks,

Yoshi


 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Qemu-devel] [PATCH v2] migration: allow setting MIG_STATE_CANCEL even if s->state != MIG_STATE_ACTIVE.

2011-02-28 Thread Yoshiaki Tamura
After a migration failure, even if the user issues migrate_cancel, the
status keeps saying:

Migration status: failed

This patch checks s->state == MIG_STATE_CANCELLED instead of s->state !=
MIG_STATE_ACTIVE.  With this patch the message above would be:

Migration status: cancelled

Please note that the following patches are prerequisites.

http://www.mail-archive.com/qemu-devel@nongnu.org/msg56448.html
http://www.mail-archive.com/qemu-devel@nongnu.org/msg56446.html

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/migration.c b/migration.c
index 14a125f..0e551e2 100644
--- a/migration.c
+++ b/migration.c
@@ -406,16 +406,19 @@ void migrate_fd_cancel(MigrationState *mig_state)
 {
     FdMigrationState *s = migrate_to_fms(mig_state);
 
-    if (s->state != MIG_STATE_ACTIVE)
+    if (s->state == MIG_STATE_CANCELLED) {
         return;
+    }
 
     DPRINTF("cancelling migration\n");
 
     s->state = MIG_STATE_CANCELLED;
     notifier_list_notify(&migration_state_notifiers);
-    qemu_savevm_state_cancel(s->mon, s->file);
 
-    migrate_fd_cleanup(s);
+    if (s->file) {
+        qemu_savevm_state_cancel(s->mon, s->file);
+        migrate_fd_cleanup(s);
+    }
 }
 
 void migrate_fd_release(MigrationState *mig_state)
-- 
1.7.1.2




[Qemu-devel] [PATCH 18/18] Introduce kemari: to enable FT migration mode (Kemari).

2011-02-24 Thread Yoshiaki Tamura
When "kemari:" is put in front of the URI of the migrate command, it
turns on ft_mode to start FT migration mode (Kemari).  On the receiver
side, the option looks like: -incoming kemari:protocol:address:port
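The prefix handling can be sketched as follows. This is a standalone illustration, not the patch itself: `demo_strip_prefix()` and `demo_parse_migrate_uri()` are hypothetical helpers, and the address in the usage note is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: if str starts with prefix, store the remainder
 * in *rest and return true; otherwise leave *rest untouched. */
static bool demo_strip_prefix(const char *str, const char *prefix,
                              const char **rest)
{
    size_t n = strlen(prefix);

    if (strncmp(str, prefix, n) != 0) {
        return false;
    }
    *rest = str + n;
    return true;
}

/* Returns true when FT mode was requested.  *transport_uri receives the
 * URI that the ordinary transport selection (tcp:, ...) should parse. */
static bool demo_parse_migrate_uri(const char *uri,
                                   const char **transport_uri)
{
    const char *p;

    assert(uri != NULL && transport_uri != NULL);
    if (demo_strip_prefix(uri, "kemari:", &p)) {
        *transport_uri = p;     /* hand the rest to the normal path */
        return true;
    }
    *transport_uri = uri;
    return false;
}
```

For example, `demo_parse_migrate_uri("kemari:tcp:192.168.0.20:4444", &rest)` would report FT mode and leave `rest` pointing at `tcp:192.168.0.20:4444`, so the existing transport code needs no changes.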

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Acked-by: Paolo Bonzini pbonz...@redhat.com
---
 hmp-commands.hx |4 +++-
 migration.c |   12 
 qmp-commands.hx |4 +++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 372bef4..4588f38 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -760,7 +760,9 @@ ETEXI
                       "\n\t\t\t -b for migration without shared storage with"
                       " full copy of disk\n\t\t\t -i for migration without "
                       "shared storage with incremental copy of disk "
-                      "(base image shared between src and destination)",
+                      "(base image shared between src and destination)"
+                      "\n\t\t\t put \"kemari:\" in front of URI to enable "
+                      "Fault Tolerance mode (Kemari protocol)",
         .user_print = monitor_user_noop,
         .mhandler.cmd_new = do_migrate,
     },
diff --git a/migration.c b/migration.c
index cdea459..ee57b0e 100644
--- a/migration.c
+++ b/migration.c
@@ -48,6 +48,12 @@ int qemu_start_incoming_migration(const char *uri)
     const char *p;
     int ret;
 
+    /* check ft_mode (Kemari protocol) */
+    if (strstart(uri, "kemari:", &p)) {
+        ft_mode = FT_INIT;
+        uri = p;
+    }
+
     if (strstart(uri, "tcp:", &p))
         ret = tcp_start_incoming_migration(p);
 #if !defined(WIN32)
@@ -99,6 +105,12 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
         return -1;
     }
 
+    /* check ft_mode (Kemari protocol) */
+    if (strstart(uri, "kemari:", &p)) {
+        ft_mode = FT_INIT;
+        uri = p;
+    }
+
     if (strstart(uri, "tcp:", &p)) {
         s = tcp_start_outgoing_migration(mon, p, max_throttle, detach,
                                          blk, inc);
diff --git a/qmp-commands.hx b/qmp-commands.hx
index df40a3d..68ca48a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -437,7 +437,9 @@ EQMP
                       "\n\t\t\t -b for migration without shared storage with"
                       " full copy of disk\n\t\t\t -i for migration without "
                       "shared storage with incremental copy of disk "
-                      "(base image shared between src and destination)",
+                      "(base image shared between src and destination)"
+                      "\n\t\t\t put \"kemari:\" in front of URI to enable "
+                      "Fault Tolerance mode (Kemari protocol)",
         .user_print = monitor_user_noop,
         .mhandler.cmd_new = do_migrate,
     },
-- 
1.7.1.2




[Qemu-devel] [PATCH 15/18] savevm: introduce qemu_savevm_trans_{begin, commit}.

2011-02-24 Thread Yoshiaki Tamura
Introduce qemu_savevm_trans_{begin,commit} to send the memory and
device info together, while avoiding cancelling memory state tracking.
This patch also abstracts common code between
qemu_savevm_state_{begin,iterate,commit}.
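The section header that the shared helper emits for a START section (type byte, big-endian section id, length-prefixed ID string, instance id, version id, as in the diff that follows) can be sketched against a plain byte buffer. The `demo_*` functions are hypothetical, and the type value and "ram" ID used in the usage note are placeholders rather than values taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static size_t demo_put_byte(uint8_t *buf, size_t pos, uint8_t v)
{
    buf[pos] = v;
    return pos + 1;
}

/* Store a 32-bit value most significant byte first. */
static size_t demo_put_be32(uint8_t *buf, size_t pos, uint32_t v)
{
    buf[pos + 0] = (uint8_t)(v >> 24);
    buf[pos + 1] = (uint8_t)(v >> 16);
    buf[pos + 2] = (uint8_t)(v >> 8);
    buf[pos + 3] = (uint8_t)v;
    return pos + 4;
}

/* Layout: type(1) section_id(4) idlen(1) idstr(idlen)
 *         instance_id(4) version_id(4).  Returns bytes written. */
static size_t demo_put_section_start(uint8_t *buf, size_t cap, uint8_t type,
                                     uint32_t section_id, const char *idstr,
                                     uint32_t instance_id,
                                     uint32_t version_id)
{
    size_t len = strlen(idstr);
    size_t pos = 0;

    assert(len <= 255);                      /* length is a single byte */
    assert(cap >= 1 + 4 + 1 + len + 4 + 4);  /* caller provides the room */

    pos = demo_put_byte(buf, pos, type);
    pos = demo_put_be32(buf, pos, section_id);
    pos = demo_put_byte(buf, pos, (uint8_t)len);
    memcpy(buf + pos, idstr, len);
    pos += len;
    pos = demo_put_be32(buf, pos, instance_id);
    pos = demo_put_be32(buf, pos, version_id);
    return pos;
}
```

Calling `demo_put_section_start(buf, sizeof(buf), 0x01, 2, "ram", 0, 4)` writes 17 bytes. A PART section would carry only the first two fields, which is why the helper in the patch writes the ID string only for START.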

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 savevm.c |  157 +++---
 sysemu.h |2 +
 2 files changed, 101 insertions(+), 58 deletions(-)

diff --git a/savevm.c b/savevm.c
index aa760b7..c21b901 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1602,29 +1602,68 @@ bool qemu_savevm_state_blocked(Monitor *mon)
     return false;
 }
 
-int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
-                            int shared)
+/*
+ * section: header to write
+ * inc: if true, forces to pass SECTION_PART instead of SECTION_START
+ * pause: if true, breaks the loop when live handler returned 0
+ */
+static int qemu_savevm_state_live(Monitor *mon, QEMUFile *f, int section,
+                                  bool inc, bool pause)
 {
     SaveStateEntry *se;
+    int skip = 0, ret;
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-        if(se->set_params == NULL) {
+        int len, stage;
+
+        if (se->save_live_state == NULL) {
             continue;
-        }
-        se->set_params(blk_enable, shared, se->opaque);
+        }
+
+        /* Section type */
+        qemu_put_byte(f, section);
+        qemu_put_be32(f, se->section_id);
+
+        if (section == QEMU_VM_SECTION_START) {
+            /* ID string */
+            len = strlen(se->idstr);
+            qemu_put_byte(f, len);
+            qemu_put_buffer(f, (uint8_t *)se->idstr, len);
+
+            qemu_put_be32(f, se->instance_id);
+            qemu_put_be32(f, se->version_id);
+
+            stage = inc ? QEMU_VM_SECTION_PART : QEMU_VM_SECTION_START;
+        } else {
+            assert(inc);
+            stage = section;
+        }
+
+        ret = se->save_live_state(mon, f, stage, se->opaque);
+        if (!ret) {
+            skip++;
+            if (pause) {
+                break;
+            }
+        }
     }
-
-    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+
+    return skip;
+}
+
+static void qemu_savevm_state_full(QEMUFile *f)
+{
+    SaveStateEntry *se;
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         int len;
 
-        if (se->save_live_state == NULL)
+        if (se->save_state == NULL && se->vmsd == NULL) {
             continue;
+        }
 
         /* Section type */
-        qemu_put_byte(f, QEMU_VM_SECTION_START);
+        qemu_put_byte(f, QEMU_VM_SECTION_FULL);
         qemu_put_be32(f, se->section_id);
 
         /* ID string */
@@ -1635,9 +1674,29 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, 
int blk_enable,
         qemu_put_be32(f, se->instance_id);
         qemu_put_be32(f, se->version_id);
 
-        se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque);
+        vmstate_save(f, se);
+    }
+
+    qemu_put_byte(f, QEMU_VM_EOF);
+}
+
+int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
+                            int shared)
+{
+    SaveStateEntry *se;
+
+    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+        if (se->set_params == NULL) {
+            continue;
+        }
+        se->set_params(blk_enable, shared, se->opaque);
     }
 
+    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
+    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+
+    qemu_savevm_state_live(mon, f, QEMU_VM_SECTION_START, 0, 0);
+
     if (qemu_file_has_error(f)) {
         qemu_savevm_state_cancel(mon, f);
         return -EIO;
@@ -1648,29 +1707,16 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, 
int blk_enable,
 
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
 {
-    SaveStateEntry *se;
     int ret = 1;
 
-    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-        if (se->save_live_state == NULL)
-            continue;
-
-        /* Section type */
-        qemu_put_byte(f, QEMU_VM_SECTION_PART);
-        qemu_put_be32(f, se->section_id);
-
-        ret = se->save_live_state(mon, f, QEMU_VM_SECTION_PART, se->opaque);
-        if (!ret) {
-            /* Do not proceed to the next vmstate before this one reported
-               completion of the current stage. This serializes the migration
-               and reduces the probability that a faster changing state is
-               synchronized over and over again. */
-            break;
-        }
-    }
-
-    if (ret)
+    /* Do not proceed to the next vmstate before this one reported
+       completion of the current stage. This serializes the migration
+       and reduces the probability that a faster changing state is
+       synchronized over and over again. */
+    ret = qemu_savevm_state_live(mon, f, QEMU_VM_SECTION_PART, 1, 1);
+    if (!ret) {
         return 1;
+    }
 
     if (qemu_file_has_error(f)) {
         qemu_savevm_state_cancel(mon, f);
@@ -1682,46 +1728,41 @@ int

[Qemu-devel] [PATCH 16/18] migration: introduce migrate_ft_trans_{put, get}_ready(), and modify migrate_fd_put_ready() when ft_mode is on.

2011-02-24 Thread Yoshiaki Tamura
Introduce migrate_ft_trans_put_ready(), which kicks the FT transaction
cycle.  When ft_mode is on, migrate_fd_put_ready() opens
ft_trans_file and turns on event_tap.  To end or cancel an FT transaction,
ft_mode and event_tap are turned off.  migrate_ft_trans_get_ready() is
called to receive the ack from the receiver.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |  261 ++-
 1 files changed, 260 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 1c2d956..cdea459 100644
--- a/migration.c
+++ b/migration.c
@@ -21,6 +21,7 @@
 #include "qemu_socket.h"
 #include "block-migration.h"
 #include "qemu-objects.h"
+#include "event-tap.h"
 
 //#define DEBUG_MIGRATION
 
@@ -283,6 +284,14 @@ void migrate_fd_error(FdMigrationState *s)
     migrate_fd_cleanup(s);
 }
 
+static void migrate_ft_trans_error(FdMigrationState *s)
+{
+    ft_mode = FT_ERROR;
+    qemu_savevm_state_cancel(s->mon, s->file);
+    migrate_fd_error(s);
+    event_tap_unregister();
+}
+
 int migrate_fd_cleanup(FdMigrationState *s)
 {
     int ret = 0;
@@ -318,6 +327,17 @@ void migrate_fd_put_notify(void *opaque)
     qemu_file_put_notify(s->file);
 }
 
+static void migrate_fd_get_notify(void *opaque)
+{
+    FdMigrationState *s = opaque;
+
+    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
+    qemu_file_get_notify(s->file);
+    if (qemu_file_has_error(s->file)) {
+        migrate_ft_trans_error(s);
+    }
+}
+
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
 {
     FdMigrationState *s = opaque;
@@ -353,6 +373,10 @@ int migrate_fd_get_buffer(void *opaque, uint8_t *data, 
int64_t pos, size_t size)
         ret = -(s->get_error(s));
     }
 
+    if (ret == -EAGAIN) {
+        qemu_set_fd_handler2(s->fd, NULL, migrate_fd_get_notify, NULL, s);
+    }
+
     return ret;
 }
 
@@ -379,6 +403,230 @@ void migrate_fd_connect(FdMigrationState *s)
     migrate_fd_put_ready(s);
 }
 
+static int migrate_ft_trans_commit(void *opaque)
+{
+    FdMigrationState *s = opaque;
+    int ret = -1;
+
+    if (ft_mode != FT_TRANSACTION_COMMIT && ft_mode != FT_TRANSACTION_ATOMIC) {
+        fprintf(stderr,
+                "migrate_ft_trans_commit: invalid ft_mode %d\n", ft_mode);
+        goto out;
+    }
+
+    do {
+        if (ft_mode == FT_TRANSACTION_ATOMIC) {
+            if (qemu_ft_trans_begin(s->file) < 0) {
+                fprintf(stderr, "qemu_ft_trans_begin failed\n");
+                goto out;
+            }
+
+            ret = qemu_savevm_trans_begin(s->mon, s->file, 0);
+            if (ret < 0) {
+                fprintf(stderr, "qemu_savevm_trans_begin failed\n");
+                goto out;
+            }
+
+            ft_mode = FT_TRANSACTION_COMMIT;
+            if (ret) {
+                /* don't proceed until if fd isn't ready */
+                goto out;
+            }
+        }
+
+        /* make the VM state consistent by flushing outstanding events */
+        vm_stop(0);
+
+        /* send at full speed */
+        qemu_file_set_rate_limit(s->file, 0);
+
+        ret = qemu_savevm_trans_complete(s->mon, s->file);
+        if (ret < 0) {
+            fprintf(stderr, "qemu_savevm_trans_complete failed\n");
+            goto out;
+        }
+
+        ret = qemu_ft_trans_commit(s->file);
+        if (ret < 0) {
+            fprintf(stderr, "qemu_ft_trans_commit failed\n");
+            goto out;
+        }
+
+        if (ret) {
+            ft_mode = FT_TRANSACTION_RECV;
+            ret = 1;
+            goto out;
+        }
+
+        /* flush and check if events are remaining */
+        vm_start();
+        ret = event_tap_flush_one();
+        if (ret < 0) {
+            fprintf(stderr, "event_tap_flush_one failed\n");
+            goto out;
+        }
+
+        ft_mode =  ret ? FT_TRANSACTION_BEGIN : FT_TRANSACTION_ATOMIC;
+    } while (ft_mode != FT_TRANSACTION_BEGIN);
+
+    vm_start();
+    ret = 0;
+
+out:
+    return ret;
+}
+
+static int migrate_ft_trans_get_ready(void *opaque)
+{
+    FdMigrationState *s = opaque;
+    int ret = -1;
+
+    if (ft_mode != FT_TRANSACTION_RECV) {
+        fprintf(stderr,
+                "migrate_ft_trans_get_ready: invalid ft_mode %d\n", ft_mode);
+        goto error_out;
+    }
+
+    /* flush and check if events are remaining */
+    vm_start();
+    ret = event_tap_flush_one();
+    if (ret < 0) {
+        fprintf(stderr, "event_tap_flush_one failed\n");
+        goto error_out;
+    }
+
+    if (ret) {
+        ft_mode = FT_TRANSACTION_BEGIN;
+    } else {
+        ft_mode = FT_TRANSACTION_ATOMIC;
+
+        ret = migrate_ft_trans_commit(s);
+        if (ret < 0) {
+            goto error_out;
+        }
+        if (ret) {
+            goto out;
+        }
+    }
+
+    vm_start();
+    ret = 0;
+    goto out;
+
+error_out:
+    migrate_ft_trans_error(s);
+
+out:
+    return ret;
+}
+
+static int migrate_ft_trans_put_ready(void)
+{
+    FdMigrationState *s = migrate_to_fms(current_migration

[Qemu-devel] [PATCH 09/18] Introduce event-tap.

2011-02-24 Thread Yoshiaki Tamura
event-tap controls when to start an FT transaction, and provides proxy
functions to be called from net/block devices.  During an FT transaction,
it queues up net/block requests, and flushes them when the transaction
is completed.
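The queue-then-flush behaviour can be sketched with a plain FIFO. This is a deliberately simplified illustration and not the event-tap.c design: the `demo_*` types are hypothetical, requests are reduced to an integer id, and "reaching the device" is modelled as a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_event {
    int id;
    struct demo_event *next;
};

struct demo_tap {
    bool in_transaction;
    struct demo_event *head, *tail;  /* FIFO of held-back requests */
    int released;                    /* requests that reached the device */
};

/* Proxy entry point: pass the request through, or hold it back while a
 * transaction is in flight. */
static void demo_submit(struct demo_tap *t, int id)
{
    struct demo_event *ev;

    if (!t->in_transaction) {
        t->released++;
        return;
    }

    ev = malloc(sizeof(*ev));
    assert(ev != NULL);
    ev->id = id;
    ev->next = NULL;
    if (t->tail) {
        t->tail->next = ev;
    } else {
        t->head = ev;
    }
    t->tail = ev;
}

/* Transaction completed: release everything in arrival order.
 * Returns the number of flushed requests. */
static int demo_commit(struct demo_tap *t)
{
    int flushed = 0;

    while (t->head) {
        struct demo_event *ev = t->head;

        t->head = ev->next;
        free(ev);
        t->released++;
        flushed++;
    }
    t->tail = NULL;
    t->in_transaction = false;
    return flushed;
}
```

Holding output back until the receiver has the matching state means the outside world should not observe effects of a VM state that a failover could roll back.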

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.target |1 +
 event-tap.c |  940 +++
 event-tap.h |   44 +++
 qemu-tool.c |   28 ++
 trace-events|   10 +
 5 files changed, 1023 insertions(+), 0 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h

diff --git a/Makefile.target b/Makefile.target
index 220589e..da57efe 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -199,6 +199,7 @@ obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
 LIBS+=-lz
+obj-y += event-tap.o
 
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
diff --git a/event-tap.c b/event-tap.c
new file mode 100644
index 000..95c147a
--- /dev/null
+++ b/event-tap.c
@@ -0,0 +1,940 @@
+/*
+ * Event Tap functions for QEMU
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "block.h"
+#include "block_int.h"
+#include "ioport.h"
+#include "osdep.h"
+#include "sysemu.h"
+#include "hw/hw.h"
+#include "net.h"
+#include "event-tap.h"
+#include "trace.h"
+
+enum EVENT_TAP_STATE {
+    EVENT_TAP_OFF,
+    EVENT_TAP_ON,
+    EVENT_TAP_SUSPEND,
+    EVENT_TAP_FLUSH,
+    EVENT_TAP_LOAD,
+    EVENT_TAP_REPLAY,
+};
+
+static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
+
+typedef struct EventTapIOport {
+    uint32_t address;
+    uint32_t data;
+    int      index;
+} EventTapIOport;
+
+#define MMIO_BUF_SIZE 8
+
+typedef struct EventTapMMIO {
+    uint64_t address;
+    uint8_t  buf[MMIO_BUF_SIZE];
+    int      len;
+} EventTapMMIO;
+
+typedef struct EventTapNetReq {
+    char *device_name;
+    int iovcnt;
+    int vlan_id;
+    bool vlan_needed;
+    bool async;
+    struct iovec *iov;
+    NetPacketSent *sent_cb;
+} EventTapNetReq;
+
+#define MAX_BLOCK_REQUEST 32
+
+typedef struct EventTapAIOCB EventTapAIOCB;
+
+typedef struct EventTapBlkReq {
+    char *device_name;
+    int num_reqs;
+    int num_cbs;
+    bool is_flush;
+    BlockRequest reqs[MAX_BLOCK_REQUEST];
+    EventTapAIOCB *acb[MAX_BLOCK_REQUEST];
+} EventTapBlkReq;
+
+#define EVENT_TAP_IOPORT (1 << 0)
+#define EVENT_TAP_MMIO   (1 << 1)
+#define EVENT_TAP_NET    (1 << 2)
+#define EVENT_TAP_BLK    (1 << 3)
+
+#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1)
+
+typedef struct EventTapLog {
+    int mode;
+    union {
+        EventTapIOport ioport;
+        EventTapMMIO mmio;
+    };
+    union {
+        EventTapNetReq net_req;
+        EventTapBlkReq blk_req;
+    };
+    QTAILQ_ENTRY(EventTapLog) node;
+} EventTapLog;
+
+struct EventTapAIOCB {
+    BlockDriverAIOCB common;
+    BlockDriverAIOCB *acb;
+    bool is_canceled;
+};
+
+static EventTapLog *last_event_tap;
+
+static QTAILQ_HEAD(, EventTapLog) event_list;
+static QTAILQ_HEAD(, EventTapLog) event_pool;
+
+static int (*event_tap_cb)(void);
+static QEMUBH *event_tap_bh;
+static VMChangeStateEntry *vmstate;
+
+static void event_tap_bh_cb(void *p)
+{
+    if (event_tap_cb) {
+        event_tap_cb();
+    }
+
+    qemu_bh_delete(event_tap_bh);
+    event_tap_bh = NULL;
+}
+
+static void event_tap_schedule_bh(void)
+{
+    trace_event_tap_ignore_bh(!!event_tap_bh);
+
+    /* if bh is already set, we ignore it for now */
+    if (event_tap_bh) {
+        return;
+    }
+
+    event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL);
+    qemu_bh_schedule(event_tap_bh);
+
+    return;
+}
+
+static void *event_tap_alloc_log(void)
+{
+    EventTapLog *log;
+
+    if (QTAILQ_EMPTY(&event_pool)) {
+        log = qemu_mallocz(sizeof(EventTapLog));
+    } else {
+        log = QTAILQ_FIRST(&event_pool);
+        QTAILQ_REMOVE(&event_pool, log, node);
+    }
+
+    return log;
+}
+
+static void event_tap_free_net_req(EventTapNetReq *net_req);
+static void event_tap_free_blk_req(EventTapBlkReq *blk_req);
+
+static void event_tap_free_log(EventTapLog *log)
+{
+    int mode = log->mode & ~EVENT_TAP_TYPE_MASK;
+
+    if (mode == EVENT_TAP_NET) {
+        event_tap_free_net_req(&log->net_req);
+    } else if (mode == EVENT_TAP_BLK) {
+        event_tap_free_blk_req(&log->blk_req);
+    }
+
+    log->mode = 0;
+
+    /* return the log to event_pool */
+    QTAILQ_INSERT_HEAD(&event_pool, log, node);
+}
+
+static void event_tap_free_pool(void)
+{
+    EventTapLog *log, *next;
+
+    QTAILQ_FOREACH_SAFE(log, &event_pool, node, next) {
+        QTAILQ_REMOVE(&event_pool, log, node);
+        qemu_free(log);
+    }
+}
+
+static void event_tap_free_net_req(EventTapNetReq *net_req)
+{
+    int i;
+
+    if (!net_req->async

[Qemu-devel] Re: [PATCH 21/28] migration: Make state definitions local

2011-02-24 Thread Yoshiaki Tamura

Juan Quintela wrote:

Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp  wrote:

2011/2/24 Juan Quintelaquint...@redhat.com:


Signed-off-by: Juan Quintelaquint...@redhat.com
---
  migration.c |8 
  migration.h |8 
  2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/migration.c b/migration.c
index 493c2d7..697c74f 100644
--- a/migration.c
+++ b/migration.c
@@ -31,6 +31,14 @@
 do { } while (0)
  #endif

+enum migration_state {
+MIG_STATE_ERROR,


Would be better to say:

MIG_STATE_ERROR = -1,


I thought about it, but basically it shouldn't matter, no?


It shouldn't.  Just gives clear impression :)
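For readers following the exchange, a small sketch of what the explicit initializer changes. The enumerator names are hypothetical; only the C rule is relied on: the first enumerator defaults to 0 and each following one is the previous value plus one.

```c
#include <assert.h>

/* Implicit numbering: the error state is 0, the same as "false". */
enum demo_implicit {
    DEMO_I_ERROR,
    DEMO_I_COMPLETED,
    DEMO_I_CANCELLED,
    DEMO_I_ACTIVE,
};

/* Explicit -1: the error state stands apart from the non-negative ones. */
enum demo_explicit {
    DEMO_E_ERROR = -1,
    DEMO_E_COMPLETED,
    DEMO_E_CANCELLED,
    DEMO_E_ACTIVE,
};

/* With the explicit form an "is this an error?" test reads naturally. */
static int demo_is_error(int state)
{
    return state < 0;
}
```

Either numbering works as long as the values are only compared by name, which matches the "it shouldn't matter" conclusion above.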

Yoshi



Later, Juan.






[Qemu-devel] Re: [PATCH 07/18] Introduce fault tolerant VM transaction QEMUFile and ft_mode.

2011-02-24 Thread Yoshiaki Tamura
2011/2/24 Juan Quintela quint...@redhat.com:

 [ trimming cc to kvm & qemu lists]

 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 Juan Quintela wrote:
 Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp  wrote:
 This code implements VM transaction protocol.  Like buffered_file, it
 sits between savevm and migration layer.  With this architecture, VM
 transaction protocol is implemented mostly independent from other
 existing code.

 Could you explain what is the difference with buffered_file.c?
 I am fixing problems on buffered_file, and having something that copies
 lot of code from there makes me nervous.

 The objective is different:

 buffered_file buffers data for transmission control.
 ft_trans_file adds headers to the stream, and controls the transaction
 between sender and receiver.

 Although ft_trans_file sometimes buffers data, that is not its main 
 objective.
 If you're fixing the problems in buffered_file, I'll keep an eye on them.

 +typedef ssize_t (FtTransPutBufferFunc)(void *opaque, const void *data, 
 size_t size);

 Can we get some sharing here?
 typedef ssize_t (BufferedPutFunc)(void *opaque, const void *data, size_t 
 size);

 There are not so much types for a write function that the 1st element is
 one opaque :p

 You're right, but I want to keep ft_trans_file independent of
 buffered_file at this point.  Once Kemari gets merged, I'm happy to
 work with you to fix the problems on buffered_file and ft_trans_file,
 and refactoring them.

 My goal is getting its own thread for migration on 0.15, that
 basically means that we can do rm buffered_file.c.  I guess that
 something similar could happen for kemari.

That means both get initiated by their own threads, unlike the
current poll-based approach.  I'm still skeptical whether Anthony
agrees, but I'll keep it in mind.

 But for now, this is just the start + handwaving; once I start doing the
 work I will tell you.

Yes, please.

Yoshi


 Later, Juan.




[Qemu-devel] [PATCH] migration: allow setting MIG_STATE_CANCEL even if s->state != MIG_STATE_ACTIVE.

2011-02-24 Thread Yoshiaki Tamura
After a migration failure, even if the user issues migrate_cancel, the
status keeps saying:

Migration status: failed

Move the check that s->state is MIG_STATE_ACTIVE, to allow setting
MIG_STATE_CANCELLED even if s->state != MIG_STATE_ACTIVE.  With this
patch the message above would be:

Migration status: cancelled

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/migration.c b/migration.c
index af3a1f2..098885f 100644
--- a/migration.c
+++ b/migration.c
@@ -409,15 +409,16 @@ void migrate_fd_cancel(MigrationState *mig_state)
 {
     FdMigrationState *s = migrate_to_fms(mig_state);
 
-    if (s->state != MIG_STATE_ACTIVE)
-        return;
-
     DPRINTF("cancelling migration\n");
 
     s->state = MIG_STATE_CANCELLED;
     notifier_list_notify(&migration_state_notifiers);
-    qemu_savevm_state_cancel(s->mon, s->file);
 
+    if (s->state != MIG_STATE_ACTIVE) {
+        return;
+    }
+
+    qemu_savevm_state_cancel(s->mon, s->file);
     migrate_fd_cleanup(s);
 }
 
-- 
1.7.1.2




Re: [Qemu-devel] Re: [PATCH 22/22] migration: Make state definitions local

2011-02-24 Thread Yoshiaki Tamura
2011/2/24 Juan Quintela quint...@redhat.com:
 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 2011/2/23 Juan Quintela quint...@redhat.com:
 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 2011/2/23 Juan Quintela quint...@redhat.com:

 Although you're right, I would prefer to keep it so that somebody
 outside of migration may understand the status in the future if
 there are no harms.

 my plan is to move MigrationState inside migration.c, and then decide
 what to export/not export.

 Well, it may be just a policy, but it's already exported, and I
 would like to keep it unless it bothers your plan.  IIUC, I don't
 think it does.

 Next thing to do is move migration to its
 own thread.  Before doing that, I need to know what parts are used/not
 used outside migration.c.  Removing it now means that nothing gets to
 use it without needing a patch.

 I've once asked Anthony whether it's possible to make migration
 to different threads, but his answer was no due to hard
 dependency of qemu's internal code, and making migration to
 different threads are bad design.

 I know.  But Anthony is seeing the light O:-)

I'm with you on making live migration fast, but let me make a few
comments.

 Basically, without an own thread we are not able to:
 - do anything else while on incoming migration
  (namely using the monitor)

That's not true.  Just adding fixed headers to the buffered file should
be enough for that.  I've done it for Kemari, as you can see
here (http://www.osrg.net/kemari/download/kemari-v0.2-fedora11.mov).

 - do anything else than migration.  We can try hard and let vcpus to
  run, but we would still clog the io_thread.
 - We are not able to saturate 10Gbit networking (basically we are doing
  2/3 levels of buffering, depending on how you count).

I think the current byte-based dirty bitmap used for sending RAM is
responsible for this too.  I've converted it to a bit-based dirty
bitmap, which made traversal 100x faster.  Bypassing the QEMUFile
buffer for RAM would also help to some degree.
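A rough sketch of why a bit-based bitmap is cheaper to traverse: a whole machine word of clean pages can be skipped with one comparison instead of testing one byte per page. This is a generic illustration with hypothetical `demo_*` names, not the code referred to above, and it makes no claim about the measured speedup.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Mark one page dirty: one bit per page. */
static void demo_set_dirty(unsigned long *bitmap, size_t page)
{
    bitmap[page / DEMO_BITS_PER_WORD] |= 1UL << (page % DEMO_BITS_PER_WORD);
}

/* Return the first dirty page at or after start, or npages if none.
 * The bitmap must hold at least npages bits, rounded up to whole words. */
static size_t demo_next_dirty(const unsigned long *bitmap, size_t npages,
                              size_t start)
{
    size_t page = start;

    while (page < npages) {
        size_t word = page / DEMO_BITS_PER_WORD;
        size_t bit = page % DEMO_BITS_PER_WORD;
        unsigned long rest = bitmap[word] >> bit;

        if (rest == 0) {
            /* nothing dirty in the remainder of this word: skip it */
            page = (word + 1) * DEMO_BITS_PER_WORD;
            continue;
        }
        while (!(rest & 1UL)) {
            rest >>= 1;
            page++;
        }
        return page < npages ? page : npages;
    }
    return npages;
}
```

On a mostly clean guest the loop advances one word at a time, so the cost scales with the number of words plus the number of dirty pages rather than with the number of pages.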

Thanks,

Yoshi

 So, once code is there, I guess we will convince Anthony to commit it.

 Later, Juan.





Re: [Qemu-devel] Re: [PATCH 22/22] migration: Make state definitions local

2011-02-24 Thread Yoshiaki Tamura
2011/2/24 Anthony Liguori anth...@codemonkey.ws:
 On 02/24/2011 06:23 AM, Juan Quintela wrote:

 Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp  wrote:


 2011/2/23 Juan Quintelaquint...@redhat.com:


 Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp  wrote:


 2011/2/23 Juan Quintelaquint...@redhat.com:




 Although you're right, I would prefer to keep it so that somebody
 outside of migration may understand the status in the future if
 there are no harms.


 my plan is to move MigrationState inside migration.c, and then decide
 what to export/not export.


 Well, it may be just a policy, but it's already exported, and I
 would like to keep it unless it bothers your plan.  IIUC, I don't
 think it does.



 Next thing to do is move migration to its
 own thread.  Before doing that, I need to know what parts are used/not
 used outside migration.c.  Removing it now means that nothing gets to
 use it without needing a patch.


 I've once asked Anthony whether it's possible to make migration
 to different threads, but his answer was no due to hard
 dependency of qemu's internal code, and making migration to
 different threads are bad design.


 I know.  But Anthony is seeing the light O:-)


 Let's be very careful about quoting Anthony as he's known to be incoherent
 90% of the time :-)

There you are :)

 I don't quite recall the context of the discussion with Yoshi, but I'm not
 quite there in terms of advocating that we throw a bucket full of threads at
 migration.  I think we should move the ram migration to another I/O thread
 that doesn't hold a lock against the main I/O thread.  That's all.

Let's just forget about the old discussion and have a new one.
Why do you want to have a new thread only for ram migration?  I
know that it's the majority of the migration, but how do you
serialize it with other device migration for a single QEMUFile?
IIUC, it seems to get mixed up easily or lose parallelism.

Thanks,

Yoshi


 Regards,

 Anthony Liguori

 Basically, without an own thread we are not able to:
 - do anything else while on incoming migration
   (namely using the monitor)
 - do anything else than migration.  We can try hard and let vcpus to
   run, but we would still clog the io_thread.
 - We are not able to saturate 10Gbit networking (basically we are doing
   2/3 levels of buffering, depending on how you count).

 So, once code is there, I guess we will convince Anthony to commit it.

 Later, Juan.








Re: [Qemu-devel] [PATCH 03/22] migration: Fold MigrationState into FdMigrationState

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration-exec.c |   10 +-
  migration-fd.c   |   10 +-
  migration-tcp.c  |   10 +-
  migration-unix.c |   10 +-
  migration.c      |   11 +--
  migration.h      |   23 +--
  6 files changed, 30 insertions(+), 44 deletions(-)

 diff --git a/migration-exec.c b/migration-exec.c
 index b49a475..02b0667 100644
 --- a/migration-exec.c
 +++ b/migration-exec.c
 @@ -93,12 +93,12 @@ FdMigrationState *exec_start_outgoing_migration(Monitor 
 *mon,
     s->close = exec_close;
     s->get_error = file_errno;
     s->write = file_write;
 -    s->mig_state.cancel = migrate_fd_cancel;
 -    s->mig_state.get_status = migrate_fd_get_status;
 -    s->mig_state.release = migrate_fd_release;
 +    s->cancel = migrate_fd_cancel;
 +    s->get_status = migrate_fd_get_status;
 +    s->release = migrate_fd_release;

 -    s->mig_state.blk = blk;
 -    s->mig_state.shared = inc;
 +    s->blk = blk;
 +    s->shared = inc;

     s->state = MIG_STATE_ACTIVE;
     s->mon = NULL;
 diff --git a/migration-fd.c b/migration-fd.c
 index bd5e8a9..ccba86b 100644
 --- a/migration-fd.c
 +++ b/migration-fd.c
 @@ -76,12 +76,12 @@ FdMigrationState *fd_start_outgoing_migration(Monitor 
 *mon,
     s->get_error = fd_errno;
     s->write = fd_write;
     s->close = fd_close;
 -    s->mig_state.cancel = migrate_fd_cancel;
 -    s->mig_state.get_status = migrate_fd_get_status;
 -    s->mig_state.release = migrate_fd_release;
 +    s->cancel = migrate_fd_cancel;
 +    s->get_status = migrate_fd_get_status;
 +    s->release = migrate_fd_release;

 -    s->mig_state.blk = blk;
 -    s->mig_state.shared = inc;
 +    s->blk = blk;
 +    s->shared = inc;

     s->state = MIG_STATE_ACTIVE;
     s->mon = NULL;
 diff --git a/migration-tcp.c b/migration-tcp.c
 index 355bc37..02b01ed 100644
 --- a/migration-tcp.c
 +++ b/migration-tcp.c
 @@ -95,12 +95,12 @@ FdMigrationState *tcp_start_outgoing_migration(Monitor 
 *mon,
     s->get_error = socket_errno;
     s->write = socket_write;
     s->close = tcp_close;
 -    s->mig_state.cancel = migrate_fd_cancel;
 -    s->mig_state.get_status = migrate_fd_get_status;
 -    s->mig_state.release = migrate_fd_release;
 +    s->cancel = migrate_fd_cancel;
 +    s->get_status = migrate_fd_get_status;
 +    s->release = migrate_fd_release;

 -    s->mig_state.blk = blk;
 -    s->mig_state.shared = inc;
 +    s->blk = blk;
 +    s->shared = inc;

     s->state = MIG_STATE_ACTIVE;
     s->mon = NULL;
 diff --git a/migration-unix.c b/migration-unix.c
 index b9b0dbf..fb73f46 100644
 --- a/migration-unix.c
 +++ b/migration-unix.c
 @@ -94,12 +94,12 @@ FdMigrationState *unix_start_outgoing_migration(Monitor 
 *mon,
     s->get_error = unix_errno;
     s->write = unix_write;
     s->close = unix_close;
 -    s->mig_state.cancel = migrate_fd_cancel;
 -    s->mig_state.get_status = migrate_fd_get_status;
 -    s->mig_state.release = migrate_fd_release;
 +    s->cancel = migrate_fd_cancel;
 +    s->get_status = migrate_fd_get_status;
 +    s->release = migrate_fd_release;

 -    s->mig_state.blk = blk;
 -    s->mig_state.shared = inc;
 +    s->blk = blk;
 +    s->shared = inc;

     s->state = MIG_STATE_ACTIVE;
     s->mon = NULL;
 diff --git a/migration.c b/migration.c
 index 3a371a3..dd4cdab 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -86,7 +86,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
     const char *uri = qdict_get_str(qdict, uri);

     if (current_migration 
 -        current_migration-mig_state.get_status(current_migration) == 
 MIG_STATE_ACTIVE) {
 +        current_migration-get_status(current_migration) == 
 MIG_STATE_ACTIVE) {
         monitor_printf(mon, migration already in progress\n);
         return -1;
     }
 @@ -120,7 +120,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
     }

     if (current_migration) {
 -        current_migration-mig_state.release(current_migration);
 +        current_migration-release(current_migration);
     }

     current_migration = s;
 @@ -133,7 +133,7 @@ int do_migrate_cancel(Monitor *mon, const QDict *qdict, 
 QObject **ret_data)
     FdMigrationState *s = current_migration;

     if (s)
 -        s-mig_state.cancel(s);
 +        s-cancel(s);

     return 0;
  }
 @@ -229,7 +229,7 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
     QDict *qdict;

     if (current_migration) {
 -        MigrationState *s = &current_migration->mig_state;
 +        FdMigrationState *s = current_migration;

         switch (s->get_status(current_migration)) {
         case MIG_STATE_ACTIVE:
 @@ -353,8 +353,7 @@ void migrate_fd_connect(FdMigrationState *s)
                                       migrate_fd_close);

     DPRINTF("beginning savevm\n");
 -    ret = qemu_savevm_state_begin(s->mon, s->file, s->mig_state.blk,
 -                                  s->mig_state.shared);
 +    ret = qemu_savevm_state_begin(s->mon, s->file, s->blk, s->shared);
   

Re: [Qemu-devel] [PATCH 17/22] migration: use global variable directly

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 We assign the global variable to a local pointer on the previous line; just
 use the global variable directly.  We remove the ->file test because
 qemu_file_set_rate_limit() already performs it.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    6 ++
  1 files changed, 2 insertions(+), 4 deletions(-)

 diff --git a/migration.c b/migration.c
 index d7dfe1e..accc6e4 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -451,7 +451,6 @@ int do_migrate_cancel(Monitor *mon, const QDict *qdict, 
 QObject **ret_data)
  int do_migrate_set_speed(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
  {
     int64_t d;
 -    MigrationState *s;

     d = qdict_get_int(qdict, "value");
     if (d < 0) {
 @@ -459,9 +458,8 @@ int do_migrate_set_speed(Monitor *mon, const QDict 
 *qdict, QObject **ret_data)
     }
     max_throttle = d;

 -    s = current_migration;
 -    if (s && s->file) {
 -        qemu_file_set_rate_limit(s->file, max_throttle);
 +    if (current_migration) {
 +        qemu_file_set_rate_limit(current_migration->file, max_throttle);
     }

     return 0;

Looks good to me.
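
The reasoning in the commit message (the caller can drop its own ->file
check because the callee already tolerates a missing file) can be sketched
as below.  This is only a minimal model under that assumption, not QEMU
code: SketchFile, SketchMigration and the sketch_* functions are
hypothetical stand-ins for QEMUFile, MigrationState,
qemu_file_set_rate_limit() and do_migrate_set_speed().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMUFile: only the field the sketch needs. */
typedef struct SketchFile {
    int64_t rate_limit;
} SketchFile;

/* The callee tolerates a NULL file, so callers need no check of their own.
 * Returns 0 when the limit was applied, -1 when there was no file. */
static int sketch_file_set_rate_limit(SketchFile *f, int64_t new_rate)
{
    if (f == NULL) {
        return -1;
    }
    f->rate_limit = new_rate;
    return 0;
}

/* Hypothetical stand-in for MigrationState and the global pointer. */
typedef struct SketchMigration {
    SketchFile *file;
} SketchMigration;

static SketchMigration *sketch_current_migration;

/* The caller only checks the global pointer; ->file may still be NULL. */
static void sketch_set_speed(int64_t max_throttle)
{
    if (sketch_current_migration) {
        sketch_file_set_rate_limit(sketch_current_migration->file,
                                   max_throttle);
    }
}
```

The simplification is only safe as long as the callee keeps its NULL
check; if that check were ever removed, the caller would dereference
nothing itself but would pass NULL straight through.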

Yoshi

 --
 1.7.4






Re: [Qemu-devel] [PATCH 12/22] migration: Use migrate_fd_error() in last place that set status to ERROR

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 This also calls migrate_fd_cleanup(), but note that this is the right
 thing to do.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    6 +-
  1 files changed, 1 insertions(+), 5 deletions(-)

 diff --git a/migration.c b/migration.c
 index ab98664..3983257 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -351,11 +351,7 @@ static ssize_t migrate_fd_put_buffer(void *opaque, const 
 void *data, size_t size
     if (ret == -EAGAIN) {
         qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
     } else if (ret < 0) {
 -        if (s->mon) {
 -            monitor_resume(s->mon);
 -        }
 -        s->state = MIG_STATE_ERROR;
 -        notifier_list_notify(&migration_state_notifiers);
 +        migrate_fd_error(s);
     }

Are you sure about this?  migrate_fd_error() may call qemu_fclose()
through migrate_fd_cleanup(), but migrate_fd_put_buffer() is called
from buffered_file, which sits underneath the QEMUFile.  In my
previous posting,

http://permalink.gmane.org/gmane.comp.emulators.qemu/94688

I suggested that migrate_fd_put_buffer() should just return the error
and let the original caller (migrate_fd_put_notify() or whoever it is)
actually call migrate_fd_error().
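
The layering I have in mind can be sketched as follows.  This is a
minimal, hypothetical model, not QEMU code: SketchState and the sketch_*
functions are invented stand-ins for MigrationState,
migrate_fd_put_buffer(), migrate_fd_error() and the outer caller.  The
low-level write callback only reports the error; the outer caller, which
does not run underneath the file layer, does the cleanup.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

enum { SKETCH_STATE_ERROR = -1, SKETCH_STATE_ACTIVE = 2 };

/* Hypothetical stand-in for MigrationState; only what the sketch needs. */
typedef struct SketchState {
    int state;          /* one of the SKETCH_STATE_* values */
    bool file_open;     /* stands in for the file owned by the state */
    ssize_t next_write; /* result the fake transport will report */
} SketchState;

/* Lower layer: reports failure to its caller and never tears anything
 * down, because it may be running underneath the file it would close. */
static ssize_t sketch_put_buffer(SketchState *s, const void *data,
                                 size_t size)
{
    (void)data;
    (void)size;
    return s->next_write;
}

/* Stand-in for the error path: set the state and release the file. */
static void sketch_error(SketchState *s)
{
    s->state = SKETCH_STATE_ERROR;
    s->file_open = false;
}

/* Outer caller: sits above the file layer, so cleanup is safe here. */
static void sketch_put_notify(SketchState *s)
{
    ssize_t ret = sketch_put_buffer(s, "x", 1);

    if (ret < 0 && ret != -EAGAIN) {
        sketch_error(s);
    }
}
```

With this split, a failed write leaves the state untouched until control
is back in the outer caller.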

Thanks,

Yoshi


     return ret;
 --
 1.7.4






Re: [Qemu-devel] [PATCH 08/22] migration: Check that migration is active before cancel it

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/migration.c b/migration.c
 index 397a0b9..55f58c8 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -138,7 +138,7 @@ int do_migrate_cancel(Monitor *mon, const QDict *qdict, 
 QObject **ret_data)
  {
     MigrationState *s = current_migration;

 -    if (s)
 +    if (s && s->get_status(s) == MIG_STATE_ACTIVE)
         s->cancel(s);

     return 0;

Why don't you remove *s again?

Yoshi

 --
 1.7.4






Re: [Qemu-devel] [PATCH 22/22] migration: Make state definitions local

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    6 ++
  migration.h |    6 --
  2 files changed, 6 insertions(+), 6 deletions(-)

 diff --git a/migration.c b/migration.c
 index 383ebaf..90fc2a0 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -31,6 +31,12 @@
     do { } while (0)
  #endif

 +#define MIG_STATE_ERROR                -1
 +#define MIG_STATE_NONE         0
 +#define MIG_STATE_CANCELLED    1
 +#define MIG_STATE_ACTIVE       2
 +#define MIG_STATE_COMPLETED    3
 +
  static MigrationState current_migration = {
     .state = MIG_STATE_NONE,
      /* Migration speed throttling */
 diff --git a/migration.h b/migration.h
 index 9457807..493fbe5 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -18,12 +18,6 @@
  #include "qemu-common.h"
  #include "notify.h"

 -#define MIG_STATE_ERROR                -1
 -#define MIG_STATE_NONE         0
 -#define MIG_STATE_CANCELLED    1
 -#define MIG_STATE_ACTIVE       2
 -#define MIG_STATE_COMPLETED    3
 -

Although you're right, I would prefer to keep them in the header, as
long as that does no harm, so that code outside of migration can
interpret the status in the future.

Yoshi

  typedef struct MigrationState MigrationState;

  struct MigrationState
 --
 1.7.4






Re: [Qemu-devel] [PATCH 14/22] migration: Remove get_status() accessor

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 It is only used inside migration.c, and fields on that struct are
 accessed all around the place on that file.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |   16 +---
  migration.h |    1 -
  2 files changed, 5 insertions(+), 12 deletions(-)

 diff --git a/migration.c b/migration.c
 index dfe6a96..2b873fa 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -90,7 +90,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
     int ret;

     if (current_migration &&
 -        current_migration->get_status(current_migration) ==
 MIG_STATE_ACTIVE) {
 +        current_migration->state == MIG_STATE_ACTIVE) {
         monitor_printf(mon, "migration already in progress\n");
         return -1;
     }
 @@ -135,7 +135,7 @@ int do_migrate_cancel(Monitor *mon, const QDict *qdict, 
 QObject **ret_data)
  {
     MigrationState *s = current_migration;

 -    if (s && s->get_status(s) == MIG_STATE_ACTIVE)
 +    if (s && s->state == MIG_STATE_ACTIVE)
         s->cancel(s);

     return 0;
 @@ -234,7 +234,7 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
     if (current_migration) {
         MigrationState *s = current_migration;

 -        switch (s->get_status(current_migration)) {
 +        switch (s->state) {
         case MIG_STATE_NONE:
             /* no migration has happened ever */
             break;
 @@ -375,7 +375,7 @@ static void migrate_fd_put_ready(void *opaque)
         } else {
             migrate_fd_completed(s);
         }
 -        if (s->get_status(s) != MIG_STATE_COMPLETED) {
 +        if (s->state != MIG_STATE_COMPLETED) {
             if (old_vm_running) {
                 vm_start();
             }
 @@ -383,11 +383,6 @@ static void migrate_fd_put_ready(void *opaque)
     }
  }

 -static int migrate_fd_get_status(MigrationState *s)
 -{
 -    return s->state;
 -}
 -
  static void migrate_fd_cancel(MigrationState *s)
  {
     if (s->state != MIG_STATE_ACTIVE)
 @@ -442,7 +437,7 @@ void remove_migration_state_change_notifier(Notifier 
 *notify)
  int get_migration_state(void)
  {
     if (current_migration) {
 -        return migrate_fd_get_status(current_migration);
 +        return current_migration->state;
     } else {
         return MIG_STATE_ERROR;
     }
 @@ -477,7 +472,6 @@ static MigrationState *migrate_create_state(Monitor *mon, 
 int64_t bandwidth_limi
     MigrationState *s = qemu_mallocz(sizeof(*s));

     s->cancel = migrate_fd_cancel;
 -    s->get_status = migrate_fd_get_status;
     s->blk = blk;
     s->shared = inc;
     s->mon = NULL;
 diff --git a/migration.h b/migration.h
 index 5455d8b..58a6e06 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -37,7 +37,6 @@ struct MigrationState
     int (*close)(MigrationState*);
     int (*write)(MigrationState*, const void *, size_t);
     void (*cancel)(MigrationState *s);
 -    int (*get_status)(MigrationState *s);
     void *opaque;
     int blk;
     int shared;

I agree with accessing s->state directly inside migration.c, but I
disagree with removing the get_status() accessor right away.  AFAIK we
don't have a strong motivation for doing that.

Yoshi

 --
 1.7.4






Re: [Qemu-devel] [PATCH 21/22] migration: Export a function that tells if the migration has finished correctly

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 This will allow us to hide the status values.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c     |    4 ++--
  migration.h     |    2 +-
  ui/spice-core.c |    4 +---
  3 files changed, 4 insertions(+), 6 deletions(-)

 diff --git a/migration.c b/migration.c
 index 312a029..383ebaf 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -335,9 +335,9 @@ void remove_migration_state_change_notifier(Notifier 
 *notify)
     notifier_list_remove(&migration_state_notifiers, notify);
  }

 -int get_migration_state(void)
 +bool migration_has_finished(void)
  {
 -    return current_migration.state;
 +    return current_migration.state == MIG_STATE_COMPLETED;
  }

  void migrate_fd_connect(MigrationState *s)
 diff --git a/migration.h b/migration.h
 index 6477b51..9457807 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -82,6 +82,6 @@ void migrate_fd_connect(MigrationState *s);

  void add_migration_state_change_notifier(Notifier *notify);
  void remove_migration_state_change_notifier(Notifier *notify);
 -int get_migration_state(void);
 +bool migration_has_finished(void);

  #endif
 diff --git a/ui/spice-core.c b/ui/spice-core.c
 index 1aa1a5e..997603d 100644
 --- a/ui/spice-core.c
 +++ b/ui/spice-core.c
 @@ -422,9 +422,7 @@ void do_info_spice(Monitor *mon, QObject **ret_data)

  static void migration_state_notifier(Notifier *notifier)
  {
 -    int state = get_migration_state();
 -
 -    if (state == MIG_STATE_COMPLETED) {
 +    if (migration_has_finished()) {
  #if SPICE_SERVER_VERSION >= 0x000701 /* 0.7.1 */
         spice_server_migrate_switch(spice_server);
  #endif

I agree with adding migration_has_finished(), but I don't see why we
want to remove get_migration_state().  Are we going to add a
migration_has_*() helper for each state, even as migration gets more
complicated?
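
To make the alternative concrete, here is a minimal sketch in which both
entry points coexist.  It is not the code from the patch: the SKETCH_
and sketch_ names are hypothetical stand-ins, and the state is a plain
static variable instead of current_migration.  The predicate is just a
thin wrapper, so callers that need another state can still use the
generic getter without a new helper per state.

```c
#include <stdbool.h>

/* Same values as the MIG_STATE_* defines; names are illustrative only. */
#define SKETCH_MIG_STATE_ERROR     (-1)
#define SKETCH_MIG_STATE_NONE      0
#define SKETCH_MIG_STATE_CANCELLED 1
#define SKETCH_MIG_STATE_ACTIVE    2
#define SKETCH_MIG_STATE_COMPLETED 3

/* Stand-in for current_migration.state. */
static int sketch_state = SKETCH_MIG_STATE_NONE;

/* Generic getter: covers every state, present and future. */
static int sketch_get_migration_state(void)
{
    return sketch_state;
}

/* Convenience predicate for the common case, built on the getter. */
static bool sketch_migration_has_finished(void)
{
    return sketch_get_migration_state() == SKETCH_MIG_STATE_COMPLETED;
}
```

The cost of keeping the getter is that the state values stay visible
outside migration.c, which is exactly what the series is trying to hide.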

Yoshi

 --
 1.7.4






Re: [Qemu-devel] [PATCH 19/22] migration: convert current_migration from pointer to struct

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 This cleans up the code a lot, as we no longer have to check whether
 the variable is NULL.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |  119 --
  1 files changed, 49 insertions(+), 70 deletions(-)

 diff --git a/migration.c b/migration.c
 index 4014330..7b1e679 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -34,7 +34,9 @@
  /* Migration speed throttling */
  static int64_t max_throttle = (32 << 20);

 -static MigrationState *current_migration;
 +static MigrationState current_migration = {
 +    .state = MIG_STATE_NONE,
 +};

  static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 @@ -135,37 +137,34 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
  {
     QDict *qdict;

 -    if (current_migration) {
 -
 -        switch (current_migration->state) {
 -        case MIG_STATE_NONE:
 -            /* no migration has happened ever */
 -            break;
 -        case MIG_STATE_ACTIVE:
 -            qdict = qdict_new();
 -            qdict_put(qdict, "status", qstring_from_str("active"));
 -
 -            migrate_put_status(qdict, "ram", ram_bytes_transferred(),
 -                               ram_bytes_remaining(), ram_bytes_total());
 -
 -            if (blk_mig_active()) {
 -                migrate_put_status(qdict, "disk",
 blk_mig_bytes_transferred(),
 -                                   blk_mig_bytes_remaining(),
 -                                   blk_mig_bytes_total());
 -            }
 -
 -            *ret_data = QOBJECT(qdict);
 -            break;
 -        case MIG_STATE_COMPLETED:
 -            *ret_data = qobject_from_jsonf("{ 'status': 'completed' }");
 -            break;
 -        case MIG_STATE_ERROR:
 -            *ret_data = qobject_from_jsonf("{ 'status': 'failed' }");
 -            break;
 -        case MIG_STATE_CANCELLED:
 -            *ret_data = qobject_from_jsonf("{ 'status': 'cancelled' }");
 -            break;
 +    switch (current_migration.state) {
 +    case MIG_STATE_NONE:
 +        /* no migration has happened ever */
 +        break;
 +    case MIG_STATE_ACTIVE:
 +        qdict = qdict_new();
 +        qdict_put(qdict, "status", qstring_from_str("active"));
 +
 +        migrate_put_status(qdict, "ram", ram_bytes_transferred(),
 +                           ram_bytes_remaining(), ram_bytes_total());
 +
 +        if (blk_mig_active()) {
 +            migrate_put_status(qdict, "disk", blk_mig_bytes_transferred(),
 +                               blk_mig_bytes_remaining(),
 +                               blk_mig_bytes_total());
         }
 +
 +        *ret_data = QOBJECT(qdict);
 +        break;
 +    case MIG_STATE_COMPLETED:
 +        *ret_data = qobject_from_jsonf("{ 'status': 'completed' }");
 +        break;
 +    case MIG_STATE_ERROR:
 +        *ret_data = qobject_from_jsonf("{ 'status': 'failed' }");
 +        break;
 +    case MIG_STATE_CANCELLED:
 +        *ret_data = qobject_from_jsonf("{ 'status': 'cancelled' }");
 +        break;
     }
  }

 @@ -339,11 +338,7 @@ void remove_migration_state_change_notifier(Notifier 
 *notify)

  int get_migration_state(void)
  {
 -    if (current_migration) {
 -        return current_migration->state;
 -    } else {
 -        return MIG_STATE_ERROR;
 -    }
 +    return current_migration.state;
  }

  void migrate_fd_connect(MigrationState *s)
 @@ -369,27 +364,22 @@ void migrate_fd_connect(MigrationState *s)
     migrate_fd_put_ready(s);
  }

 -static MigrationState *migrate_create_state(Monitor *mon, int64_t 
 bandwidth_limit,
 -                                            int detach, int blk, int inc)
 +static void migrate_init_state(Monitor *mon, int64_t bandwidth_limit,
 +                               int detach, int blk, int inc)
  {
 -    MigrationState *s = qemu_mallocz(sizeof(*s));
 -
 -    s->blk = blk;
 -    s->shared = inc;
 -    s->mon = NULL;
 -    s->bandwidth_limit = bandwidth_limit;
 -    s->state = MIG_STATE_NONE;
 +    current_migration.blk = blk;
 +    current_migration.shared = inc;
 +    current_migration.mon = NULL;
 +    current_migration.bandwidth_limit = bandwidth_limit;
 +    current_migration.state = MIG_STATE_NONE;

     if (!detach) {
 -        migrate_fd_monitor_suspend(s, mon);
 +        migrate_fd_monitor_suspend(&current_migration, mon);
     }
 -
 -    return s;
  }

  int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
  {
 -    MigrationState *s = NULL;
     const char *p;
     int detach = qdict_get_try_bool(qdict, "detach", 0);
     int blk = qdict_get_try_bool(qdict, "blk", 0);
 @@ -397,8 +387,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
     const char *uri = qdict_get_str(qdict, "uri");
     int ret;

 -    if (current_migration &&
 -        current_migration->state == MIG_STATE_ACTIVE) {
 +    if (current_migration.state == MIG_STATE_ACTIVE) {
         monitor_printf(mon, migration already 

Re: [Qemu-devel] [PATCH 18/22] migration: another case of global variable assigned to local one

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    3 +--
  1 files changed, 1 insertions(+), 2 deletions(-)

 diff --git a/migration.c b/migration.c
 index accc6e4..4014330 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -136,9 +136,8 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
     QDict *qdict;

     if (current_migration) {
 -        MigrationState *s = current_migration;

 -        switch (s->state) {
 +        switch (current_migration->state) {
         case MIG_STATE_NONE:
             /* no migration has happened ever */
             break;

Looks good to me.

Yoshi

 --
 1.7.4






Re: [Qemu-devel] [PATCH 16/22] migration: Move exported functions to the end of the file

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 This means we can remove the two forward declarations.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |  188 
 +--
  1 files changed, 92 insertions(+), 96 deletions(-)

 diff --git a/migration.c b/migration.c
 index 92bff01..d7dfe1e 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -76,90 +76,6 @@ void process_incoming_migration(QEMUFile *f)
         vm_start();
  }

 -static MigrationState *migrate_create_state(Monitor *mon, int64_t 
 bandwidth_limit,
 -                                            int detach, int blk, int inc);
 -
 -int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
 -{
 -    MigrationState *s = NULL;
 -    const char *p;
 -    int detach = qdict_get_try_bool(qdict, "detach", 0);
 -    int blk = qdict_get_try_bool(qdict, "blk", 0);
 -    int inc = qdict_get_try_bool(qdict, "inc", 0);
 -    const char *uri = qdict_get_str(qdict, "uri");
 -    int ret;
 -
 -    if (current_migration &&
 -        current_migration->state == MIG_STATE_ACTIVE) {
 -        monitor_printf(mon, "migration already in progress\n");
 -        return -1;
 -    }
 -
 -    if (qemu_savevm_state_blocked(mon)) {
 -        return -1;
 -    }
 -
 -    s = migrate_create_state(mon, max_throttle, detach, blk, inc);
 -
 -    if (strstart(uri, "tcp:", &p)) {
 -        ret = tcp_start_outgoing_migration(s, p);
 -#if !defined(WIN32)
 -    } else if (strstart(uri, "exec:", &p)) {
 -        ret = exec_start_outgoing_migration(s, p);
 -    } else if (strstart(uri, "unix:", &p)) {
 -        ret = unix_start_outgoing_migration(s, p);
 -    } else if (strstart(uri, "fd:", &p)) {
 -        ret = fd_start_outgoing_migration(s, p);
 -#endif
 -    } else {
 -        monitor_printf(mon, "unknown migration protocol: %s\n", uri);
 -        ret  = -EINVAL;
 -        goto free_migrate_state;
 -    }
 -
 -    if (ret < 0) {
 -        monitor_printf(mon, "migration failed\n");
 -        goto free_migrate_state;
 -    }
 -
 -    qemu_free(current_migration);
 -    current_migration = s;
 -    notifier_list_notify(&migration_state_notifiers);
 -    return 0;
 -free_migrate_state:
 -    qemu_free(s);
 -    return -1;
 -}
 -
 -static void migrate_fd_cancel(MigrationState *s);
 -
 -int do_migrate_cancel(Monitor *mon, const QDict *qdict, QObject **ret_data)
 -{
 -    if (current_migration)
 -        migrate_fd_cancel(current_migration);
 -
 -    return 0;
 -}
 -
 -int do_migrate_set_speed(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
 -{
 -    int64_t d;
 -    MigrationState *s;
 -
 -    d = qdict_get_int(qdict, "value");
 -    if (d < 0) {
 -        d = 0;
 -    }
 -    max_throttle = d;
 -
 -    s = current_migration;
 -    if (s && s->file) {
 -        qemu_file_set_rate_limit(s->file, max_throttle);
 -    }
 -
 -    return 0;
 -}
 -
  /* amount of nanoseconds we are willing to wait for migration to be down.
  * the choice of nanoseconds is because it is the maximum resolution that
  * get_clock() can achieve. It is an internal measure. All user-visible
 @@ -171,18 +87,6 @@ uint64_t migrate_max_downtime(void)
     return max_downtime;
  }

 -int do_migrate_set_downtime(Monitor *mon, const QDict *qdict,
 -                            QObject **ret_data)
 -{
 -    double d;
 -
 -    d = qdict_get_double(qdict, "value") * 1e9;
 -    d = MAX(0, MIN(UINT64_MAX, d));
 -    max_downtime = (uint64_t)d;
 -
 -    return 0;
 -}
 -
  static void migrate_print_status(Monitor *mon, const char *name,
                                  const QDict *status_dict)
  {
 @@ -483,3 +387,95 @@ static MigrationState *migrate_create_state(Monitor 
 *mon, int64_t bandwidth_limi

     return s;
  }
 +
 +int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
 +{
 +    MigrationState *s = NULL;
 +    const char *p;
 +    int detach = qdict_get_try_bool(qdict, "detach", 0);
 +    int blk = qdict_get_try_bool(qdict, "blk", 0);
 +    int inc = qdict_get_try_bool(qdict, "inc", 0);
 +    const char *uri = qdict_get_str(qdict, "uri");
 +    int ret;
 +
 +    if (current_migration &&
 +        current_migration->state == MIG_STATE_ACTIVE) {
 +        monitor_printf(mon, "migration already in progress\n");
 +        return -1;
 +    }
 +
 +    if (qemu_savevm_state_blocked(mon)) {
 +        return -1;
 +    }
 +
 +    s = migrate_create_state(mon, max_throttle, detach, blk, inc);
 +
 +    if (strstart(uri, "tcp:", &p)) {
 +        ret = tcp_start_outgoing_migration(s, p);
 +#if !defined(WIN32)
 +    } else if (strstart(uri, "exec:", &p)) {
 +        ret = exec_start_outgoing_migration(s, p);
 +    } else if (strstart(uri, "unix:", &p)) {
 +        ret = unix_start_outgoing_migration(s, p);
 +    } else if (strstart(uri, "fd:", &p)) {
 +        ret = fd_start_outgoing_migration(s, p);
 +#endif
 +    } else {
 +        monitor_printf(mon, "unknown migration protocol: %s\n", uri);
 +        ret  = -EINVAL;
 +        goto free_migrate_state;
 +    }
 +
 +    if (ret < 0) {
 +        

Re: [Qemu-devel] [PATCH 02/22] migration: Use FdMigrationState instead of MigrationState when possible

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |   31 ++-
  migration.h |   16 
  2 files changed, 22 insertions(+), 25 deletions(-)

 diff --git a/migration.c b/migration.c
 index f9aaadc..3a371a3 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -34,7 +34,7 @@
  /* Migration speed throttling */
  static int64_t max_throttle = (32 << 20);

 -static MigrationState *current_migration;
 +static FdMigrationState *current_migration;

  static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 @@ -86,7 +86,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
     const char *uri = qdict_get_str(qdict, "uri");

     if (current_migration &&
 -        current_migration->get_status(current_migration) ==
 MIG_STATE_ACTIVE) {
 +        current_migration->mig_state.get_status(current_migration) ==
 MIG_STATE_ACTIVE) {
         monitor_printf(mon, "migration already in progress\n");
         return -1;
     }
 @@ -120,20 +120,20 @@ int do_migrate(Monitor *mon, const QDict *qdict, 
 QObject **ret_data)
     }

     if (current_migration) {
 -        current_migration->release(current_migration);
 +        current_migration->mig_state.release(current_migration);
     }

 -    current_migration = &s->mig_state;
 +    current_migration = s;
     notifier_list_notify(&migration_state_notifiers);
     return 0;
  }

  int do_migrate_cancel(Monitor *mon, const QDict *qdict, QObject **ret_data)
  {
 -    MigrationState *s = current_migration;
 +    FdMigrationState *s = current_migration;

     if (s)
 -        s->cancel(s);
 +        s->mig_state.cancel(s);

     return 0;
  }
 @@ -149,7 +149,7 @@ int do_migrate_set_speed(Monitor *mon, const QDict 
 *qdict, QObject **ret_data)
     }
     max_throttle = d;

 -    s = migrate_to_fms(current_migration);
 +    s = current_migration;
     if (s && s->file) {
         qemu_file_set_rate_limit(s->file, max_throttle);
     }
 @@ -227,10 +227,11 @@ static void migrate_put_status(QDict *qdict, const char 
 *name,
  void do_info_migrate(Monitor *mon, QObject **ret_data)
  {
     QDict *qdict;
 -    MigrationState *s = current_migration;

 -    if (s) {
 -        switch (s->get_status(s)) {
 +    if (current_migration) {
 +        MigrationState *s = &current_migration->mig_state;
 +
 +        switch (s->get_status(current_migration)) {
         case MIG_STATE_ACTIVE:
             qdict = qdict_new();
             qdict_put(qdict, "status", qstring_from_str("active"));
 @@ -399,16 +400,13 @@ void migrate_fd_put_ready(void *opaque)
     }
  }

 -int migrate_fd_get_status(MigrationState *mig_state)
 +int migrate_fd_get_status(FdMigrationState *s)
  {
 -    FdMigrationState *s = migrate_to_fms(mig_state);
     return s->state;
  }

 -void migrate_fd_cancel(MigrationState *mig_state)
 +void migrate_fd_cancel(FdMigrationState *s)
  {
 -    FdMigrationState *s = migrate_to_fms(mig_state);
 -
     if (s->state != MIG_STATE_ACTIVE)
         return;

 @@ -421,9 +419,8 @@ void migrate_fd_cancel(MigrationState *mig_state)
     migrate_fd_cleanup(s);
  }

 -void migrate_fd_release(MigrationState *mig_state)
 +void migrate_fd_release(FdMigrationState *s)
  {
 -    FdMigrationState *s = migrate_to_fms(mig_state);

     DPRINTF("releasing state\n");

 diff --git a/migration.h b/migration.h
 index db0e45a..f49a9e2 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -25,18 +25,18 @@

  typedef struct MigrationState MigrationState;

 +typedef struct FdMigrationState FdMigrationState;
 +
  struct MigrationState
  {
     /* FIXME: add more accessors to print migration info */
 -    void (*cancel)(MigrationState *s);
 -    int (*get_status)(MigrationState *s);
 -    void (*release)(MigrationState *s);
 +    void (*cancel)(FdMigrationState *s);
 +    int (*get_status)(FdMigrationState *s);
 +    void (*release)(FdMigrationState *s);
     int blk;
     int shared;
  };

 -typedef struct FdMigrationState FdMigrationState;
 -
  struct FdMigrationState
  {
     MigrationState mig_state;
 @@ -120,11 +120,11 @@ void migrate_fd_connect(FdMigrationState *s);

  void migrate_fd_put_ready(void *opaque);

 -int migrate_fd_get_status(MigrationState *mig_state);
 +int migrate_fd_get_status(FdMigrationState *mig_state);

 -void migrate_fd_cancel(MigrationState *mig_state);
 +void migrate_fd_cancel(FdMigrationState *mig_state);

 -void migrate_fd_release(MigrationState *mig_state);
 +void migrate_fd_release(FdMigrationState *mig_state);

  void migrate_fd_wait_for_unfreeze(void *opaque);


Looks good to me.

Yoshi

 --
 1.7.4






Re: [Qemu-devel] [PATCH 06/22] migration: Make all posible migration functions static

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 I have to move two functions to new positions to avoid forward declarations.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |   72 +-
  migration.h |   12 -
  2 files changed, 36 insertions(+), 48 deletions(-)

 diff --git a/migration.c b/migration.c
 index e773806..1853380 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -273,15 +273,7 @@ static void migrate_fd_monitor_suspend(MigrationState 
 *s, Monitor *mon)
     }
  }

 -void migrate_fd_error(MigrationState *s)
 -{
 -    DPRINTF("setting error state\n");
 -    s->state = MIG_STATE_ERROR;
 -    notifier_list_notify(&migration_state_notifiers);
 -    migrate_fd_cleanup(s);
 -}
 -
 -int migrate_fd_cleanup(MigrationState *s)
 +static int migrate_fd_cleanup(MigrationState *s)
  {
     int ret = 0;

 @@ -308,7 +300,15 @@ int migrate_fd_cleanup(MigrationState *s)
     return ret;
  }

 -void migrate_fd_put_notify(void *opaque)
 +void migrate_fd_error(MigrationState *s)
 +{
 +    DPRINTF("setting error state\n");
 +    s->state = MIG_STATE_ERROR;
 +    notifier_list_notify(&migration_state_notifiers);
 +    migrate_fd_cleanup(s);
 +}
 +
 +static void migrate_fd_put_notify(void *opaque)
  {
     MigrationState *s = opaque;

 @@ -316,7 +316,7 @@ void migrate_fd_put_notify(void *opaque)
     qemu_file_put_notify(s->file);
  }

 -ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
 +static ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t 
 size)
  {
     MigrationState *s = opaque;
     ssize_t ret;
 @@ -341,29 +341,7 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void 
 *data, size_t size)
     return ret;
  }

 -void migrate_fd_connect(MigrationState *s)
 -{
 -    int ret;
 -
 -    s->file = qemu_fopen_ops_buffered(s,
 -                                      s->bandwidth_limit,
 -                                      migrate_fd_put_buffer,
 -                                      migrate_fd_put_ready,
 -                                      migrate_fd_wait_for_unfreeze,
 -                                      migrate_fd_close);
 -
 -    DPRINTF("beginning savevm\n");
 -    ret = qemu_savevm_state_begin(s->mon, s->file, s->blk, s->shared);
 -    if (ret < 0) {
 -        DPRINTF("failed, %d\n", ret);
 -        migrate_fd_error(s);
 -        return;
 -    }
 -
 -    migrate_fd_put_ready(s);
 -}
 -
 -void migrate_fd_put_ready(void *opaque)
 +static void migrate_fd_put_ready(void *opaque)
  {
     MigrationState *s = opaque;

 @@ -431,7 +409,7 @@ static void migrate_fd_release(MigrationState *s)
     qemu_free(s);
  }

 -void migrate_fd_wait_for_unfreeze(void *opaque)
 +static void migrate_fd_wait_for_unfreeze(void *opaque)
  {
     MigrationState *s = opaque;
     int ret;
 @@ -450,7 +428,7 @@ void migrate_fd_wait_for_unfreeze(void *opaque)
     } while (ret == -1 && (s->get_error(s)) == EINTR);
  }

 -int migrate_fd_close(void *opaque)
 +static int migrate_fd_close(void *opaque)
  {
     MigrationState *s = opaque;

 @@ -477,6 +455,28 @@ int get_migration_state(void)
     }
  }

 +void migrate_fd_connect(MigrationState *s)
 +{
 +    int ret;
 +
 +    s->file = qemu_fopen_ops_buffered(s,
 +                                      s->bandwidth_limit,
 +                                      migrate_fd_put_buffer,
 +                                      migrate_fd_put_ready,
 +                                      migrate_fd_wait_for_unfreeze,
 +                                      migrate_fd_close);
 +
 +    DPRINTF("beginning savevm\n");
 +    ret = qemu_savevm_state_begin(s->mon, s->file, s->blk, s->shared);
 +    if (ret < 0) {
 +        DPRINTF("failed, %d\n", ret);
 +        migrate_fd_error(s);
 +        return;
 +    }
 +
 +    migrate_fd_put_ready(s);
 +}
 +
  MigrationState *migrate_create_state(Monitor *mon, int64_t bandwidth_limit,
                                      int detach, int blk, int inc)
  {
 diff --git a/migration.h b/migration.h
 index 0178414..048ee46 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -100,20 +100,8 @@ MigrationState *fd_start_outgoing_migration(Monitor *mon,

  void migrate_fd_error(MigrationState *s);

 -int migrate_fd_cleanup(MigrationState *s);
 -
 -void migrate_fd_put_notify(void *opaque);
 -
 -ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size);
 -
  void migrate_fd_connect(MigrationState *s);

 -void migrate_fd_put_ready(void *opaque);
 -
 -void migrate_fd_wait_for_unfreeze(void *opaque);
 -
 -int migrate_fd_close(void *opaque);
 -
  MigrationState *migrate_create_state(Monitor *mon, int64_t bandwidth_limit,
                                      int detach, int blk, int inc);


Looks good to me.

Yoshi

 --
 1.7.4






Re: [Qemu-devel] [PATCH 09/22] migration: Introduce MIG_STATE_NONE

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 Use MIG_STATE_ACTIVE only when migration has really started

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    6 +-
  migration.h |    3 ++-
  2 files changed, 7 insertions(+), 2 deletions(-)

 diff --git a/migration.c b/migration.c
 index 55f58c8..f015e02 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -238,6 +238,9 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
         MigrationState *s = current_migration;

         switch (s->get_status(current_migration)) {
 +        case MIG_STATE_NONE:
 +            /* no migration has happened ever */
 +            break;
         case MIG_STATE_ACTIVE:
             qdict = qdict_new();
             qdict_put(qdict, "status", qstring_from_str("active"));
 @@ -465,6 +468,7 @@ void migrate_fd_connect(MigrationState *s)
  {
     int ret;

 +    s->state = MIG_STATE_ACTIVE;
     s->file = qemu_fopen_ops_buffered(s,
                                        s->bandwidth_limit,
                                       migrate_fd_put_buffer,
 @@ -495,7 +499,7 @@ static MigrationState *migrate_create_state(Monitor *mon, 
 int64_t bandwidth_limi
     s->shared = inc;
     s->mon = NULL;
     s->bandwidth_limit = bandwidth_limit;
 -    s->state = MIG_STATE_ACTIVE;
 +    s->state = MIG_STATE_NONE;

     if (!detach) {
         migrate_fd_monitor_suspend(s, mon);
 diff --git a/migration.h b/migration.h
 index 7d28dd3..3df2293 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -19,9 +19,10 @@
  #include "notify.h"

  #define MIG_STATE_ERROR                -1
 -#define MIG_STATE_COMPLETED    0
 +#define MIG_STATE_NONE         0
  #define MIG_STATE_CANCELLED    1
  #define MIG_STATE_ACTIVE       2
 +#define MIG_STATE_COMPLETED    3

Would this be a good opportunity to turn these into an enum?

Yoshi


  typedef struct MigrationState MigrationState;

 --
 1.7.4
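The enum suggested in the review above could look roughly like the following sketch.  The numeric values are the ones from the patch; the type name MigState, the helper mig_state_name() and the strings it returns are assumptions made for illustration and are not taken from the QEMU tree.

```c
#include <assert.h>
#include <string.h>

/* Same values as the #defines in the patch; the type name is made up. */
typedef enum MigState {
    MIG_STATE_ERROR     = -1,
    MIG_STATE_NONE      = 0,
    MIG_STATE_CANCELLED = 1,
    MIG_STATE_ACTIVE    = 2,
    MIG_STATE_COMPLETED = 3,
} MigState;

/* Hypothetical helper: map a state to a printable name. */
static const char *mig_state_name(MigState st)
{
    switch (st) {
    case MIG_STATE_ERROR:
        return "failed";
    case MIG_STATE_NONE:
        return "none";
    case MIG_STATE_CANCELLED:
        return "cancelled";
    case MIG_STATE_ACTIVE:
        return "active";
    case MIG_STATE_COMPLETED:
        return "completed";
    }
    return "unknown";
}
```

With an enum, a switch over the state can be checked by the compiler for missing cases, which plain #defines do not allow.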






Re: [Qemu-devel] [PATCH 10/22] migration: Refactor and simplify error checking in migrate_fd_put_ready

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |   20 +---
  1 files changed, 9 insertions(+), 11 deletions(-)

 diff --git a/migration.c b/migration.c
 index f015e02..641df9f 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -361,28 +361,26 @@ static void migrate_fd_put_ready(void *opaque)

     DPRINTF("iterate\n");
     if (qemu_savevm_state_iterate(s->mon, s->file) == 1) {
 -        int state;
         int old_vm_running = vm_running;

         DPRINTF("done iterating\n");
         vm_stop(VMSTOP_MIGRATE);

 -        if ((qemu_savevm_state_complete(s->mon, s->file)) < 0) {
 -            if (old_vm_running) {
 -                vm_start();
 -            }
 -            state = MIG_STATE_ERROR;
 +        if (qemu_savevm_state_complete(s->mon, s->file) < 0) {
 +            migrate_fd_error(s);
         } else {
 -            state = MIG_STATE_COMPLETED;
 +            if (migrate_fd_cleanup(s) < 0) {
 +                migrate_fd_error(s);
 +            } else {
 +                s->state = MIG_STATE_COMPLETED;
 +                notifier_list_notify(&migration_state_notifiers);
 +            }
         }
 -        if (migrate_fd_cleanup(s) < 0) {
 +        if (s->get_status(s) != MIG_STATE_COMPLETED) {
             if (old_vm_running) {
                 vm_start();
             }

I know it's not fair to ask this of you, but calling vm_start()
whenever the status != MIG_STATE_COMPLETED terrifies me: a failure
in migrate_fd_cleanup() alone (which mostly just calls qemu_fclose())
may cause split brain between src and dst.  I haven't hit this
situation myself, but leaving the VM stopped on both sides is safer.

Thanks,

Yoshi

 -            state = MIG_STATE_ERROR;
         }
 -        s->state = state;
 -        notifier_list_notify(&migration_state_notifiers);
     }
  }

 --
 1.7.4






Re: [Qemu-devel] Re: [PATCH 0/2] Fix error handling in migration when the peer is killed.

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 Hi,

 During live migration, if the receiver side of qemu gets killed, the
 sender side seems to be handling the error incorrectly, like it passes
 the iterate phase (stage 2) and moves on to the complete state (stage
 3).  These patches fix the issue.


 Agreed. Integrated into my series of cleanups. Thanks.

Thanks!

Yoshi



Re: [Qemu-devel] Re: [PATCH 10/22] migration: Refactor and simplify error checking in migrate_fd_put_ready

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela quint...@redhat.com:
 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 2011/2/23 Juan Quintela quint...@redhat.com:

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |   20 +---
  1 files changed, 9 insertions(+), 11 deletions(-)

 diff --git a/migration.c b/migration.c
 index f015e02..641df9f 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -361,28 +361,26 @@ static void migrate_fd_put_ready(void *opaque)

     DPRINTF("iterate\n");
     if (qemu_savevm_state_iterate(s->mon, s->file) == 1) {
 -        int state;
         int old_vm_running = vm_running;

         DPRINTF("done iterating\n");
         vm_stop(VMSTOP_MIGRATE);

 -        if ((qemu_savevm_state_complete(s->mon, s->file)) < 0) {
 -            if (old_vm_running) {
 -                vm_start();
 -            }
 -            state = MIG_STATE_ERROR;
 +        if (qemu_savevm_state_complete(s->mon, s->file) < 0) {
 +            migrate_fd_error(s);
         } else {
 -            state = MIG_STATE_COMPLETED;
 +            if (migrate_fd_cleanup(s) < 0) {
 +                migrate_fd_error(s);
 +            } else {
 +                s->state = MIG_STATE_COMPLETED;
 +                notifier_list_notify(&migration_state_notifiers);
 +            }
         }
 -        if (migrate_fd_cleanup(s) < 0) {
 +        if (s->get_status(s) != MIG_STATE_COMPLETED) {
             if (old_vm_running) {
                 vm_start();
             }

 This part, although it's not fair to ask you, but calling
 vm_start when != MIG_STATE_COMPLETED terrifies me because just
 failing migrate_fd_cleanup (mostly calling qemu_fclose) may cause
 split brain between src/dst.  Although I haven't encountered this
 situation, just having stopped VMs on both sides is safer.

 I see your pain. I am not happy at all, but this was integrated by
 Anthony to fix this bug:

 commit 41ef56e61153d7bd27d34a634633bb51b1c5988d
 Author: Anthony Liguori aligu...@us.ibm.com
 Date:   Wed Jun 2 14:55:25 2010 -0500

    migration: respect exit status with exec:

  This fixes https://bugs.launchpad.net/qemu/+bug/391879


Thanks for the link.  I'm not sure I understand it correctly: why
was stopping the VM a problem?  The essential point is that we need
to introduce a flag that says whether the user wants the VM to
continue when something goes wrong during live migration.  Deciding
based on old_vm_running alone is wrong.

 I think that it fixes that bug, but it makes me un-easy to restart vm if
 there is a failure in migrate_fd_cleanup().  As I didn't wanted to
 change behaviour with this series, I left it as it was.

I agree with keeping the behavior unchanged.

 Next on ToDo list is to do something sensible with errors, just now we
 are not very good at handling them.

Yeah.  If we introduce Kemari, the migration code becomes more
important because it'll be part of the normal VM execution
path :)

Thanks,

Yoshi


 Later, Juan.
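The flag proposed in this reply can be pictured with the small sketch below.  It is only an illustration of the idea: the function name and the resume_on_failure parameter are hypothetical, no such flag exists in the patch under review, and the series deliberately keeps the existing behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical policy helper: decide whether the source VM should be
 * restarted after migration finished (successfully or not).
 *   old_vm_running    - the VM was running before vm_stop()
 *   completed         - migration reached the completed state
 *   resume_on_failure - assumed user-controlled flag, not in the patch
 */
static bool demo_should_restart_source(bool old_vm_running, bool completed,
                                       bool resume_on_failure)
{
    if (completed) {
        /* the destination owns the guest now */
        return false;
    }
    return old_vm_running && resume_on_failure;
}
```

With resume_on_failure forced to true this reduces to the current "restart if it was running" behaviour; with it false, a failed cleanup leaves the VM stopped on both sides.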





[Qemu-devel] [PATCH 03/18] Introduce skip_header parameter to qemu_loadvm_state().

2011-02-23 Thread Yoshiaki Tamura
Introduce skip_header parameter to qemu_loadvm_state() so that it can
be called iteratively without reading the header.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |2 +-
 savevm.c|   24 +---
 sysemu.h|2 +-
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/migration.c b/migration.c
index 302b8fe..bd51fef 100644
--- a/migration.c
+++ b/migration.c
@@ -63,7 +63,7 @@ int qemu_start_incoming_migration(const char *uri)
 
 void process_incoming_migration(QEMUFile *f)
 {
-    if (qemu_loadvm_state(f) < 0) {
+    if (qemu_loadvm_state(f, 0) < 0) {
         fprintf(stderr, "load of migration failed\n");
         exit(0);
     }
diff --git a/savevm.c b/savevm.c
index 22010b9..52d5be8 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1716,7 +1716,7 @@ typedef struct LoadStateEntry {
 int version_id;
 } LoadStateEntry;
 
-int qemu_loadvm_state(QEMUFile *f)
+int qemu_loadvm_state(QEMUFile *f, int skip_header)
 {
 QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
 QLIST_HEAD_INITIALIZER(loadvm_handlers);
@@ -1729,17 +1729,19 @@ int qemu_loadvm_state(QEMUFile *f)
 return -EINVAL;
 }
 
-    v = qemu_get_be32(f);
-    if (v != QEMU_VM_FILE_MAGIC)
-        return -EINVAL;
+    if (!skip_header) {
+        v = qemu_get_be32(f);
+        if (v != QEMU_VM_FILE_MAGIC)
+            return -EINVAL;
 
-    v = qemu_get_be32(f);
-    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
-        return -ENOTSUP;
+        v = qemu_get_be32(f);
+        if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+            fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
+            return -ENOTSUP;
+        }
+        if (v != QEMU_VM_FILE_VERSION)
+            return -ENOTSUP;
     }
-    if (v != QEMU_VM_FILE_VERSION)
-        return -ENOTSUP;
 
 while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
 uint32_t instance_id, version_id, section_id;
@@ -2062,7 +2064,7 @@ int load_vmstate(const char *name)
 return -EINVAL;
 }
 
-    ret = qemu_loadvm_state(f);
+    ret = qemu_loadvm_state(f, 0);
 
     qemu_fclose(f);
     if (ret < 0) {
diff --git a/sysemu.h b/sysemu.h
index 0a83ab9..8339eb4 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -93,7 +93,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int 
blk_enable,
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
-int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state(QEMUFile *f, int skip_header);
 
 /* SLIRP */
 void do_info_slirp(Monitor *mon);
-- 
1.7.1.2
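The effect of skip_header can be illustrated outside QEMU with the sketch below.  Everything in it is hypothetical (DemoStream, the demo_* functions and the DEMO_* constants are invented for the example); it only mirrors the shape of the change: the magic/version check runs on the first call and is skipped on later calls over the same stream.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants, not taken from savevm.c. */
#define DEMO_FILE_MAGIC   0x5145564dU
#define DEMO_FILE_VERSION 3U
#define DEMO_EOF          0

typedef struct DemoStream {
    const uint8_t *data;
    size_t len;
    size_t pos;
} DemoStream;

/* Return the next byte, or -1 when the stream is exhausted. */
static int demo_get_byte(DemoStream *s)
{
    return s->pos < s->len ? s->data[s->pos++] : -1;
}

/* Read a big-endian 32-bit value; missing bytes read as 0xff. */
static uint32_t demo_get_be32(DemoStream *s)
{
    uint32_t v = 0;
    int i;

    for (i = 0; i < 4; i++) {
        v = (v << 8) | (uint32_t)(demo_get_byte(s) & 0xff);
    }
    return v;
}

/* Returns the number of section bytes read before EOF, or -errno. */
static int demo_loadvm(DemoStream *s, int skip_header)
{
    int n = 0, b;

    if (!skip_header) {
        if (demo_get_be32(s) != DEMO_FILE_MAGIC) {
            return -EINVAL;
        }
        if (demo_get_be32(s) != DEMO_FILE_VERSION) {
            return -ENOTSUP;
        }
    }

    while ((b = demo_get_byte(s)) != DEMO_EOF) {
        if (b < 0) {
            return -EINVAL;   /* stream ended without EOF marker */
        }
        n++;
    }
    return n;
}
```

A caller would pass 0 for the first load and 1 for every subsequent transaction on the same connection, which is how the later patches in this series are expected to use the parameter.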




[Qemu-devel] [PATCH 01/18] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer().

2011-02-23 Thread Yoshiaki Tamura
Currently the QEMUFile buffer size is fixed at 32KB.  Allocate the
buffer dynamically so that it can be resized.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |2 ++
 savevm.c |   20 +++-
 2 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index 5e24329..a168a37 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -58,6 +58,8 @@ void qemu_fflush(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
+void *qemu_realloc_buffer(QEMUFile *f, int size);
+void qemu_clear_buffer(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index a50fd31..22010b9 100644
--- a/savevm.c
+++ b/savevm.c
@@ -171,7 +171,8 @@ struct QEMUFile {
when reading */
 int buf_index;
 int buf_size; /* 0 when writing */
-uint8_t buf[IO_BUF_SIZE];
+int buf_max_size;
+uint8_t *buf;
 
 int has_error;
 };
@@ -422,6 +423,9 @@ QEMUFile *qemu_fopen_ops(void *opaque, 
QEMUFilePutBufferFunc *put_buffer,
     f->get_rate_limit = get_rate_limit;
     f->is_write = 0;
 
+    f->buf_max_size = IO_BUF_SIZE;
+    f->buf = qemu_malloc(sizeof(uint8_t) * f->buf_max_size);
+
     return f;
 }
 
@@ -452,6 +456,19 @@ void qemu_fflush(QEMUFile *f)
 }
 }
 
+void *qemu_realloc_buffer(QEMUFile *f, int size)
+{
+    f->buf_max_size = size;
+    f->buf = qemu_realloc(f->buf, f->buf_max_size);
+
+    return f->buf;
+}
+
+void qemu_clear_buffer(QEMUFile *f)
+{
+    f->buf_size = f->buf_index = f->buf_offset = 0;
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
     int len;
@@ -477,6 +494,7 @@ int qemu_fclose(QEMUFile *f)
     qemu_fflush(f);
     if (f->close)
         ret = f->close(f->opaque);
+    qemu_free(f->buf);
     qemu_free(f);
     return ret;
 }
-- 
1.7.1.2
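For readers without the tree at hand, the idea of the patch can be sketched in standalone C as below.  DemoFile and the demo_* functions are hypothetical stand-ins for QEMUFile and the new helpers.  One deliberate difference, stated as an assumption: plain realloc() can fail, so the sketch checks the result and also clamps the fill level when the buffer shrinks, which the patch itself does not do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_IO_BUF_SIZE 32768

/* Hypothetical stand-in for the buffer fields of QEMUFile. */
typedef struct DemoFile {
    int buf_index;
    int buf_size;       /* bytes currently held */
    int buf_max_size;   /* allocated capacity */
    uint8_t *buf;
} DemoFile;

static int demo_file_init(DemoFile *f)
{
    f->buf_index = 0;
    f->buf_size = 0;
    f->buf_max_size = DEMO_IO_BUF_SIZE;
    f->buf = malloc((size_t)f->buf_max_size);
    return f->buf ? 0 : -1;
}

/* Resize the buffer; on failure the old buffer stays valid. */
static void *demo_realloc_buffer(DemoFile *f, int size)
{
    uint8_t *p;

    if (size <= 0) {
        return NULL;
    }
    p = realloc(f->buf, (size_t)size);
    if (!p) {
        return NULL;
    }
    f->buf = p;
    f->buf_max_size = size;
    if (f->buf_size > size) {
        f->buf_size = size;          /* shrinking drops the tail */
    }
    if (f->buf_index > f->buf_size) {
        f->buf_index = f->buf_size;
    }
    return f->buf;
}

/* Forget buffered data without releasing the memory. */
static void demo_clear_buffer(DemoFile *f)
{
    f->buf_size = f->buf_index = 0;
}

static void demo_file_free(DemoFile *f)
{
    free(f->buf);
    f->buf = NULL;
}
```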




[Qemu-devel] [PATCH 18/18] Introduce kemari: to enable FT migration mode (Kemari).

2011-02-23 Thread Yoshiaki Tamura
When "kemari:" is put in front of the URI of the migrate command,
ft_mode is turned on and FT migration mode (Kemari) starts.  On the
receiver side, the option looks like: -incoming kemari:protocol:address:port

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Acked-by: Paolo Bonzini pbonz...@redhat.com
---
 hmp-commands.hx |4 +++-
 migration.c |   12 
 qmp-commands.hx |4 +++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 372bef4..4588f38 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -760,7 +760,9 @@ ETEXI
  \n\t\t\t -b for migration without shared storage with
   full copy of disk\n\t\t\t -i for migration without 
  shared storage with incremental copy of disk 
- (base image shared between src and destination),
+ (base image shared between src and destination)
+ \n\t\t\t put \kemari:\ in front of URI to enable 
+ Fault Tolerance mode (Kemari protocol),
 .user_print = monitor_user_noop,   
.mhandler.cmd_new = do_migrate,
 },
diff --git a/migration.c b/migration.c
index 82f4a4d..95096ef 100644
--- a/migration.c
+++ b/migration.c
@@ -48,6 +48,12 @@ int qemu_start_incoming_migration(const char *uri)
     const char *p;
     int ret;
 
+    /* check ft_mode (Kemari protocol) */
+    if (strstart(uri, "kemari:", &p)) {
+        ft_mode = FT_INIT;
+        uri = p;
+    }
+
     if (strstart(uri, "tcp:", &p))
         ret = tcp_start_incoming_migration(p);
 #if !defined(WIN32)
@@ -99,6 +105,12 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
         return -1;
     }
 
+    /* check ft_mode (Kemari protocol) */
+    if (strstart(uri, "kemari:", &p)) {
+        ft_mode = FT_INIT;
+        uri = p;
+    }
+
     if (strstart(uri, "tcp:", &p)) {
         s = tcp_start_outgoing_migration(mon, p, max_throttle, detach,
                                          blk, inc);
diff --git a/qmp-commands.hx b/qmp-commands.hx
index df40a3d..68ca48a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -437,7 +437,9 @@ EQMP
  \n\t\t\t -b for migration without shared storage with
   full copy of disk\n\t\t\t -i for migration without 
  shared storage with incremental copy of disk 
- (base image shared between src and destination),
+ (base image shared between src and destination)
+ \n\t\t\t put \kemari:\ in front of URI to enable 
+ Fault Tolerance mode (Kemari protocol),
 .user_print = monitor_user_noop,   
.mhandler.cmd_new = do_migrate,
 },
-- 
1.7.1.2
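The prefix handling above amounts to "strip kemari: if present, remember that FT mode was requested, then parse the rest as a normal migration URI".  A minimal standalone sketch follows; demo_strstart() is a local re-implementation written for this example and demo_strip_kemari() is a hypothetical wrapper, neither is the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* If str begins with prefix, store the remainder in *rest. */
static bool demo_strstart(const char *str, const char *prefix,
                          const char **rest)
{
    size_t n = strlen(prefix);

    if (strncmp(str, prefix, n) != 0) {
        return false;
    }
    if (rest) {
        *rest = str + n;
    }
    return true;
}

/* Returns the URI with any leading "kemari:" removed. */
static const char *demo_strip_kemari(const char *uri, bool *ft_mode)
{
    const char *p;

    *ft_mode = false;
    if (demo_strstart(uri, "kemari:", &p)) {
        *ft_mode = true;
        uri = p;
    }
    return uri;
}
```

For example, "kemari:tcp:192.168.0.20:4444" yields "tcp:192.168.0.20:4444" with FT mode requested, while a plain "tcp:..." URI is returned unchanged.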




[Qemu-devel] [PATCH 06/18] virtio: decrement last_avail_idx with inuse before saving.

2011-02-23 Thread Yoshiaki Tamura
For regular migration, inuse is always 0 because requests are flushed
before save.  However, when the event-tap log is enabled it introduces
an extra queue of requests that is not flushed, so the last inuse
requests are left in the event-tap queue.  Move the last_avail_idx value
sent to the remote back by inuse so that the remote repeats those requests.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/virtio.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 31bd9e3..f05d1b6 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -673,12 +673,20 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
     qemu_put_be32(f, i);
 
     for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
+        /* For regular migration inuse == 0 always as
+         * requests are flushed before save. However,
+         * event-tap log when enabled introduces an extra
+         * queue for requests which is not being flushed,
+         * thus the last inuse requests are left in the event-tap queue.
+         * Move the last_avail_idx value sent to the remote back
+         * to make it repeat the last inuse requests. */
+        uint16_t last_avail = vdev->vq[i].last_avail_idx - vdev->vq[i].inuse;
         if (vdev->vq[i].vring.num == 0)
             break;
 
         qemu_put_be32(f, vdev->vq[i].vring.num);
         qemu_put_be64(f, vdev->vq[i].pa);
-        qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
+        qemu_put_be16s(f, &last_avail);
         if (vdev->binding->save_queue)
             vdev->binding->save_queue(vdev->binding_opaque, i, f);
     }
-- 
1.7.1.2
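The subtraction in this patch relies on 16-bit wraparound: the ring index is a free-running uint16_t, so rewinding it by inuse must wrap modulo 65536 rather than go negative.  A tiny sketch of just that arithmetic, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rewind a free-running 16-bit ring index by the number of requests
 * still in flight.  The cast makes the modulo-65536 wrap explicit.
 */
static uint16_t demo_rewind_avail_idx(uint16_t last_avail_idx, uint16_t inuse)
{
    return (uint16_t)(last_avail_idx - inuse);
}
```

So an index of 5 with 2 requests in flight is saved as 3, and an index of 1 with 3 in flight wraps to 65534.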




[Qemu-devel] [PATCH 10/18] Call init handler of event-tap at main() in vl.c.

2011-02-23 Thread Yoshiaki Tamura
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 4e263c3..614ac9c 100644
--- a/vl.c
+++ b/vl.c
@@ -162,6 +162,7 @@ int main(int argc, char **argv)
 #include "qemu-queue.h"
 #include "cpus.h"
 #include "arch_init.h"
+#include "event-tap.h"
 
 #include "ui/qemu-spice.h"
 
@@ -2931,6 +2932,8 @@ int main(int argc, char **argv, char **envp)
 
     blk_mig_init();
 
+    event_tap_init();
+
     /* open the virtual block devices */
     if (snapshot)
         qemu_opts_foreach(qemu_find_opts("drive"), drive_enable_snapshot, NULL, 0);
-- 
1.7.1.2




[Qemu-devel] [PATCH 13/18] net: insert event-tap to qemu_send_packet() and qemu_sendv_packet_async().

2011-02-23 Thread Yoshiaki Tamura
The event-tap functions are called only when event-tap is on.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 net.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index ec4745d..724b549 100644
--- a/net.c
+++ b/net.c
@@ -36,6 +36,7 @@
 #include "qemu-common.h"
 #include "qemu_socket.h"
 #include "hw/qdev.h"
+#include "event-tap.h"
 
 static QTAILQ_HEAD(, VLANState) vlans;
 static QTAILQ_HEAD(, VLANClientState) non_vlan_clients;
@@ -559,6 +560,10 @@ ssize_t qemu_send_packet_async(VLANClientState *sender,
 
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size)
 {
+    if (event_tap_is_on()) {
+        return event_tap_send_packet(vc, buf, size);
+    }
+
     qemu_send_packet_async(vc, buf, size, NULL);
 }
 
@@ -657,6 +662,10 @@ ssize_t qemu_sendv_packet_async(VLANClientState *sender,
 {
     NetQueue *queue;
 
+    if (event_tap_is_on()) {
+        return event_tap_sendv_packet_async(sender, iov, iovcnt, sent_cb);
+    }
+
     if (sender->link_down || (!sender->peer && !sender->vlan)) {
         return calc_iov_length(iov, iovcnt);
     }
-- 
1.7.1.2




[Qemu-devel] [PATCH 09/18] Introduce event-tap.

2011-02-23 Thread Yoshiaki Tamura
event-tap controls when to start an FT transaction, and provides proxy
functions to be called from net/block devices.  During an FT transaction,
it queues up net/block requests and flushes them when the transaction
completes.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.target |1 +
 event-tap.c |  940 +++
 event-tap.h |   44 +++
 qemu-tool.c |   28 ++
 trace-events|   10 +
 5 files changed, 1023 insertions(+), 0 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h

diff --git a/Makefile.target b/Makefile.target
index 220589e..da57efe 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -199,6 +199,7 @@ obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
 LIBS+=-lz
+obj-y += event-tap.o
 
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
diff --git a/event-tap.c b/event-tap.c
new file mode 100644
index 000..95c147a
--- /dev/null
+++ b/event-tap.c
@@ -0,0 +1,940 @@
+/*
+ * Event Tap functions for QEMU
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "block.h"
+#include "block_int.h"
+#include "ioport.h"
+#include "osdep.h"
+#include "sysemu.h"
+#include "hw/hw.h"
+#include "net.h"
+#include "event-tap.h"
+#include "trace.h"
+
+enum EVENT_TAP_STATE {
+EVENT_TAP_OFF,
+EVENT_TAP_ON,
+EVENT_TAP_SUSPEND,
+EVENT_TAP_FLUSH,
+EVENT_TAP_LOAD,
+EVENT_TAP_REPLAY,
+};
+
+static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
+
+typedef struct EventTapIOport {
+uint32_t address;
+uint32_t data;
+int  index;
+} EventTapIOport;
+
+#define MMIO_BUF_SIZE 8
+
+typedef struct EventTapMMIO {
+uint64_t address;
+uint8_t  buf[MMIO_BUF_SIZE];
+int  len;
+} EventTapMMIO;
+
+typedef struct EventTapNetReq {
+char *device_name;
+int iovcnt;
+int vlan_id;
+bool vlan_needed;
+bool async;
+struct iovec *iov;
+NetPacketSent *sent_cb;
+} EventTapNetReq;
+
+#define MAX_BLOCK_REQUEST 32
+
+typedef struct EventTapAIOCB EventTapAIOCB;
+
+typedef struct EventTapBlkReq {
+char *device_name;
+int num_reqs;
+int num_cbs;
+bool is_flush;
+BlockRequest reqs[MAX_BLOCK_REQUEST];
+EventTapAIOCB *acb[MAX_BLOCK_REQUEST];
+} EventTapBlkReq;
+
+#define EVENT_TAP_IOPORT (1 << 0)
+#define EVENT_TAP_MMIO   (1 << 1)
+#define EVENT_TAP_NET    (1 << 2)
+#define EVENT_TAP_BLK    (1 << 3)
+
+#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1)
+
+typedef struct EventTapLog {
+int mode;
+union {
+EventTapIOport ioport;
+EventTapMMIO mmio;
+};
+union {
+EventTapNetReq net_req;
+EventTapBlkReq blk_req;
+};
+QTAILQ_ENTRY(EventTapLog) node;
+} EventTapLog;
+
+struct EventTapAIOCB {
+BlockDriverAIOCB common;
+BlockDriverAIOCB *acb;
+bool is_canceled;
+};
+
+static EventTapLog *last_event_tap;
+
+static QTAILQ_HEAD(, EventTapLog) event_list;
+static QTAILQ_HEAD(, EventTapLog) event_pool;
+
+static int (*event_tap_cb)(void);
+static QEMUBH *event_tap_bh;
+static VMChangeStateEntry *vmstate;
+
+static void event_tap_bh_cb(void *p)
+{
+if (event_tap_cb) {
+event_tap_cb();
+}
+
+qemu_bh_delete(event_tap_bh);
+event_tap_bh = NULL;
+}
+
+static void event_tap_schedule_bh(void)
+{
+trace_event_tap_ignore_bh(!!event_tap_bh);
+
+/* if bh is already set, we ignore it for now */
+if (event_tap_bh) {
+return;
+}
+
+event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL);
+qemu_bh_schedule(event_tap_bh);
+
+return;
+}
+
+static void *event_tap_alloc_log(void)
+{
+    EventTapLog *log;
+
+    if (QTAILQ_EMPTY(&event_pool)) {
+        log = qemu_mallocz(sizeof(EventTapLog));
+    } else {
+        log = QTAILQ_FIRST(&event_pool);
+        QTAILQ_REMOVE(&event_pool, log, node);
+    }
+
+    return log;
+}
+
+static void event_tap_free_net_req(EventTapNetReq *net_req);
+static void event_tap_free_blk_req(EventTapBlkReq *blk_req);
+
+static void event_tap_free_log(EventTapLog *log)
+{
+    int mode = log->mode & ~EVENT_TAP_TYPE_MASK;
+
+    if (mode == EVENT_TAP_NET) {
+        event_tap_free_net_req(&log->net_req);
+    } else if (mode == EVENT_TAP_BLK) {
+        event_tap_free_blk_req(&log->blk_req);
+    }
+
+    log->mode = 0;
+
+    /* return the log to event_pool */
+    QTAILQ_INSERT_HEAD(&event_pool, log, node);
+}
+
+static void event_tap_free_pool(void)
+{
+    EventTapLog *log, *next;
+
+    QTAILQ_FOREACH_SAFE(log, &event_pool, node, next) {
+        QTAILQ_REMOVE(&event_pool, log, node);
+        qemu_free(log);
+    }
+}
+
+static void event_tap_free_net_req(EventTapNetReq *net_req)
+{
+int i;
+
+if (!net_req-async

[Qemu-devel] [PATCH 00/18] Kemari for KVM v0.2.11

2011-02-23 Thread Yoshiaki Tamura

Yoshiaki Tamura (18):
  Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and
qemu_clear_buffer().
  Introduce read() to FdMigrationState.
  Introduce skip_header parameter to qemu_loadvm_state().
  qemu-char: export socket_set_nodelay().
  vl.c: add deleted flag for deleting the handler.
  virtio: decrement last_avail_idx with inuse before saving.
  Introduce fault tolerant VM transaction QEMUFile and ft_mode.
  savevm: introduce util functions to control ft_trans_file from savevm
layer.
  Introduce event-tap.
  Call init handler of event-tap at main() in vl.c.
  ioport: insert event_tap_ioport() to ioport_write().
  Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c.
  net: insert event-tap to qemu_send_packet() and
qemu_sendv_packet_async().
  block: insert event-tap to bdrv_aio_writev(), bdrv_aio_flush() and
bdrv_flush().
  savevm: introduce qemu_savevm_trans_{begin,commit}.
  migration: introduce migrate_ft_trans_{put,get}_ready(), and modify
migrate_fd_put_ready() when ft_mode is on.
  migration-tcp: modify tcp_accept_incoming_migration() to handle
ft_mode, and add a hack not to close fd when ft_mode is enabled.
  Introduce kemari: to enable FT migration mode (Kemari).

 Makefile.objs   |1 +
 Makefile.target |1 +
 block.c |   15 +
 event-tap.c |  940 +++
 event-tap.h |   44 +++
 exec.c  |4 +
 ft_trans_file.c |  624 
 ft_trans_file.h |   72 +
 hmp-commands.hx |4 +-
 hw/hw.h |7 +
 hw/virtio.c |   10 +-
 ioport.c|2 +
 migration-tcp.c |   82 +-
 migration.c |  291 +-
 migration.h |3 +
 net.c   |9 +
 qemu-char.c |2 +-
 qemu-tool.c |   28 ++
 qemu_socket.h   |1 +
 qmp-commands.hx |4 +-
 savevm.c|  350 +
 sysemu.h|4 +-
 trace-events|   25 ++
 vl.c|   16 +-
 24 files changed, 2457 insertions(+), 82 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h
 create mode 100644 ft_trans_file.c
 create mode 100644 ft_trans_file.h




[Qemu-devel] [PATCH 12/18] Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c.

2011-02-23 Thread Yoshiaki Tamura
Record MMIO write events so that they can be replayed upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 exec.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index d611100..e192eec 100644
--- a/exec.c
+++ b/exec.c
@@ -33,6 +33,7 @@
 #include "osdep.h"
 #include "kvm.h"
 #include "qemu-timer.h"
+#include "event-tap.h"
 #if defined(CONFIG_USER_ONLY)
 #include "qemu.h"
 #include <signal.h>
@@ -3662,6 +3663,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, 
uint8_t *buf,
                 io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
                 if (p)
                     addr1 = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
+
+                event_tap_mmio(addr, buf, len);
+
                 /* XXX: could force cpu_single_env to NULL to avoid
                    potential bugs */
                 if (l >= 4 && ((addr1 & 3) == 0)) {
-- 
1.7.1.2




[Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.

2011-02-23 Thread Yoshiaki Tamura
Currently FdMigrationState doesn't support read(), and this patch
introduces it to get response from the other side.  Note that this
won't change the existing migration protocol to be bi-directional.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-tcp.c |   15 +++
 migration.c |   13 +
 migration.h |3 +++
 3 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index b55f419..55777c8 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -39,6 +39,20 @@ static int socket_write(FdMigrationState *s, const void * 
buf, size_t size)
     return send(s->fd, buf, size, 0);
 }
 
+static int socket_read(FdMigrationState *s, const void * buf, size_t size)
+{
+    ssize_t len;
+
+    do {
+        len = recv(s->fd, (void *)buf, size, 0);
+    } while (len == -1 && socket_error() == EINTR);
+    if (len == -1) {
+        len = -socket_error();
+    }
+
+    return len;
+}
+
 static int tcp_close(FdMigrationState *s)
 {
     DPRINTF("tcp_close\n");
@@ -94,6 +108,7 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 
     s->get_error = socket_errno;
     s->write = socket_write;
+    s->read = socket_read;
     s->close = tcp_close;
     s->mig_state.cancel = migrate_fd_cancel;
     s->mig_state.get_status = migrate_fd_get_status;
diff --git a/migration.c b/migration.c
index af3a1f2..302b8fe 100644
--- a/migration.c
+++ b/migration.c
@@ -340,6 +340,19 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void 
*data, size_t size)
     return ret;
 }
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t size)
+{
+    FdMigrationState *s = opaque;
+    int ret;
+
+    ret = s->read(s, data, size);
+    if (ret == -1) {
+        ret = -(s->get_error(s));
+    }
+
+    return ret;
+}
+
 void migrate_fd_connect(FdMigrationState *s)
 {
     int ret;
diff --git a/migration.h b/migration.h
index 2170792..88a6987 100644
--- a/migration.h
+++ b/migration.h
@@ -48,6 +48,7 @@ struct FdMigrationState
 int (*get_error)(struct FdMigrationState*);
 int (*close)(struct FdMigrationState*);
 int (*write)(struct FdMigrationState*, const void *, size_t);
+int (*read)(struct FdMigrationState *, const void *, size_t);
 void *opaque;
 };
 
@@ -116,6 +117,8 @@ void migrate_fd_put_notify(void *opaque);
 
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size);
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t 
size);
+
 void migrate_fd_connect(FdMigrationState *s);
 
 void migrate_fd_put_ready(void *opaque);
-- 
1.7.1.2
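The retry-on-EINTR read that socket_read() performs can be tried in isolation with the POSIX-only sketch below.  demo_socket_read() is a hypothetical name; it uses errno directly where the patch uses QEMU's socket_error() wrapper, and takes a non-const buffer since the function writes into it.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to size bytes from a connected socket.  The call is retried
 * when a signal interrupts it; other failures are returned as -errno.
 */
static ssize_t demo_socket_read(int fd, void *buf, size_t size)
{
    ssize_t len;

    do {
        len = recv(fd, buf, size, 0);
    } while (len == -1 && errno == EINTR);
    if (len == -1) {
        len = -errno;
    }

    return len;
}
```

Returning -errno rather than -1 lets the caller tell "try again later" (for example -EAGAIN on a non-blocking socket) apart from a hard error without a second call.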




[Qemu-devel] [PATCH 11/18] ioport: insert event_tap_ioport() to ioport_write().

2011-02-23 Thread Yoshiaki Tamura
Record ioport events so that they can be replayed upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 ioport.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ioport.c b/ioport.c
index aa4188a..74aebf5 100644
--- a/ioport.c
+++ b/ioport.c
@@ -27,6 +27,7 @@
 
 #include "ioport.h"
 #include "trace.h"
+#include "event-tap.h"
 
 /***/
 /* IO Port */
@@ -76,6 +77,7 @@ static void ioport_write(int index, uint32_t address, 
uint32_t data)
 default_ioport_writel
 };
 IOPortWriteFunc *func = ioport_write_table[index][address];
+event_tap_ioport(index, address, data);
 if (!func)
 func = default_func[index];
 func(ioport_opaque[address], address, data);
-- 
1.7.1.2




[Qemu-devel] [PATCH 15/18] savevm: introduce qemu_savevm_trans_{begin, commit}.

2011-02-23 Thread Yoshiaki Tamura
Introduce qemu_savevm_trans_{begin,commit} to send the memory and
device info together, while avoiding cancelling memory state tracking.
This patch also abstracts common code between
qemu_savevm_state_{begin,iterate,commit}.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 savevm.c |  157 +++---
 sysemu.h |2 +
 2 files changed, 101 insertions(+), 58 deletions(-)

diff --git a/savevm.c b/savevm.c
index 78c1972..c96a393 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1601,29 +1601,68 @@ bool qemu_savevm_state_blocked(Monitor *mon)
 return false;
 }
 
-int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
-int shared)
+/*
+ * section: header to write
+ * inc: if true, forces to pass SECTION_PART instead of SECTION_START
+ * pause: if true, breaks the loop when live handler returned 0
+ */
+static int qemu_savevm_state_live(Monitor *mon, QEMUFile *f, int section,
+  bool inc, bool pause)
 {
 SaveStateEntry *se;
+int skip = 0, ret;
 
 QTAILQ_FOREACH(se, savevm_handlers, entry) {
-if(se-set_params == NULL) {
+int len, stage;
+
+if (se-save_live_state == NULL) {
 continue;
-   }
-   se-set_params(blk_enable, shared, se-opaque);
+}
+
+/* Section type */
+qemu_put_byte(f, section);
+qemu_put_be32(f, se-section_id);
+
+if (section == QEMU_VM_SECTION_START) {
+/* ID string */
+len = strlen(se-idstr);
+qemu_put_byte(f, len);
+qemu_put_buffer(f, (uint8_t *)se-idstr, len);
+
+qemu_put_be32(f, se-instance_id);
+qemu_put_be32(f, se-version_id);
+
+stage = inc ? QEMU_VM_SECTION_PART : QEMU_VM_SECTION_START;
+} else {
+assert(inc);
+stage = section;
+}
+
+ret = se-save_live_state(mon, f, stage, se-opaque);
+if (!ret) {
+skip++;
+if (pause) {
+break;
+}
+}
 }
-
-qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+
+return skip;
+}
+
+static void qemu_savevm_state_full(QEMUFile *f)
+{
+SaveStateEntry *se;
 
 QTAILQ_FOREACH(se, savevm_handlers, entry) {
 int len;
 
-if (se-save_live_state == NULL)
+if (se-save_state == NULL  se-vmsd == NULL) {
 continue;
+}
 
 /* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_START);
+qemu_put_byte(f, QEMU_VM_SECTION_FULL);
 qemu_put_be32(f, se-section_id);
 
 /* ID string */
@@ -1634,9 +1673,29 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, 
int blk_enable,
 qemu_put_be32(f, se-instance_id);
 qemu_put_be32(f, se-version_id);
 
-se-save_live_state(mon, f, QEMU_VM_SECTION_START, se-opaque);
+vmstate_save(f, se);
+}
+
+qemu_put_byte(f, QEMU_VM_EOF);
+}
+
+int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
+int shared)
+{
+SaveStateEntry *se;
+
+QTAILQ_FOREACH(se, savevm_handlers, entry) {
+if (se-set_params == NULL) {
+continue;
+}
+se-set_params(blk_enable, shared, se-opaque);
 }
 
+qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
+qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+
+qemu_savevm_state_live(mon, f, QEMU_VM_SECTION_START, 0, 0);
+
 if (qemu_file_has_error(f)) {
 qemu_savevm_state_cancel(mon, f);
 return -EIO;
@@ -1647,29 +1706,16 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, 
int blk_enable,
 
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
 {
-SaveStateEntry *se;
 int ret = 1;
 
-QTAILQ_FOREACH(se, savevm_handlers, entry) {
-if (se-save_live_state == NULL)
-continue;
-
-/* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_PART);
-qemu_put_be32(f, se-section_id);
-
-ret = se-save_live_state(mon, f, QEMU_VM_SECTION_PART, se-opaque);
-if (!ret) {
-/* Do not proceed to the next vmstate before this one reported
-   completion of the current stage. This serializes the migration
-   and reduces the probability that a faster changing state is
-   synchronized over and over again. */
-break;
-}
-}
-
-if (ret)
+/* Do not proceed to the next vmstate before this one reported
+   completion of the current stage. This serializes the migration
+   and reduces the probability that a faster changing state is
+   synchronized over and over again. */
+ret = qemu_savevm_state_live(mon, f, QEMU_VM_SECTION_PART, 1, 1);
+if (!ret) {
 return 1;
+}
 
 if (qemu_file_has_error(f)) {
 qemu_savevm_state_cancel(mon, f);
@@ -1681,46 +1727,41 @@ int

[Qemu-devel] [PATCH 05/18] vl.c: add deleted flag for deleting the handler.

2011-02-23 Thread Yoshiaki Tamura
Make deleting handlers robust against deletion of any elements in a
handler by using a deleted flag like in file descriptors.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |   13 +
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/vl.c b/vl.c
index b436952..4e263c3 100644
--- a/vl.c
+++ b/vl.c
@@ -1158,6 +1158,7 @@ static void nographic_update(void *opaque)
 struct vm_change_state_entry {
 VMChangeStateHandler *cb;
 void *opaque;
+int deleted;
 QLIST_ENTRY (vm_change_state_entry) entries;
 };
 
@@ -1178,8 +1179,7 @@ VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
 
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e)
 {
-QLIST_REMOVE (e, entries);
-qemu_free (e);
+e->deleted = 1;
 }
 
 void vm_state_notify(int running, int reason)
@@ -1188,8 +1188,13 @@ void vm_state_notify(int running, int reason)
 
 trace_vm_state_notify(running, reason);
 
-for (e = vm_change_state_head.lh_first; e; e = e->entries.le_next) {
-e->cb(e->opaque, running, reason);
+QLIST_FOREACH(e, &vm_change_state_head, entries) {
+if (e->deleted) {
+QLIST_REMOVE(e, entries);
+qemu_free(e);
+} else {
+e->cb(e->opaque, running, reason);
+}
 }
 }
 
-- 
1.7.1.2




[Qemu-devel] [PATCH 17/18] migration-tcp: modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled.

2011-02-23 Thread Yoshiaki Tamura
When ft_mode is set in the header, tcp_accept_incoming_migration()
sets ft_trans_incoming() as a callback, which calls
qemu_file_get_notify() to receive FT transactions iteratively.  We also
need a hack not to close the fd before moving to ft_transaction mode, so
that we can reuse the fd for it.  A vm_change_state_handler is added to
turn off ft_mode when the cont command is issued.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-tcp.c |   67 ++-
 1 files changed, 66 insertions(+), 1 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 55777c8..84076d6 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -18,6 +18,8 @@
 #include "sysemu.h"
 #include "buffered_file.h"
 #include "block.h"
+#include "ft_trans_file.h"
+#include "event-tap.h"
 
 //#define DEBUG_MIGRATION_TCP
 
@@ -29,6 +31,8 @@
 do { } while (0)
 #endif
 
+static VMChangeStateEntry *vmstate;
+
 static int socket_errno(FdMigrationState *s)
 {
 return socket_error();
@@ -56,7 +60,8 @@ static int socket_read(FdMigrationState *s, const void * buf, size_t size)
 static int tcp_close(FdMigrationState *s)
 {
 DPRINTF("tcp_close\n");
-if (s->fd != -1) {
+/* FIX ME: accessing ft_mode here isn't clean */
+if (s->fd != -1 && ft_mode != FT_INIT) {
 close(s->fd);
 s->fd = -1;
 }
@@ -150,6 +155,36 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 return &s->mig_state;
 }
 
+static void ft_trans_incoming(void *opaque)
+{
+QEMUFile *f = opaque;
+
+qemu_file_get_notify(f);
+if (qemu_file_has_error(f)) {
+ft_mode = FT_ERROR;
+qemu_fclose(f);
+}
+}
+
+static void ft_trans_reset(void *opaque, int running, int reason)
+{
+QEMUFile *f = opaque;
+
+if (running) {
+if (ft_mode != FT_ERROR) {
+qemu_fclose(f);
+}
+ft_mode = FT_OFF;
+qemu_del_vm_change_state_handler(vmstate);
+}
+}
+
+static void ft_trans_schedule_replay(QEMUFile *f)
+{
+event_tap_schedule_replay();
+vmstate = qemu_add_vm_change_state_handler(ft_trans_reset, f);
+}
+
 static void tcp_accept_incoming_migration(void *opaque)
 {
 struct sockaddr_in addr;
@@ -175,8 +210,38 @@ static void tcp_accept_incoming_migration(void *opaque)
 goto out;
 }
 
+if (ft_mode == FT_INIT) {
+autostart = 0;
+}
+
 process_incoming_migration(f);
+
+if (ft_mode == FT_INIT) {
+int ret;
+
+socket_set_nodelay(c);
+
+f = qemu_fopen_ft_trans(s, c);
+if (f == NULL) {
+fprintf(stderr, "could not qemu_fopen_ft_trans\n");
+goto out;
+}
+
+/* need to wait sender to setup */
+ret = qemu_ft_trans_begin(f);
+if (ret < 0) {
+goto out;
+}
+
+qemu_set_fd_handler2(c, NULL, ft_trans_incoming, NULL, f);
+ft_trans_schedule_replay(f);
+ft_mode = FT_TRANSACTION_RECV;
+
+return;
+}
+
 qemu_fclose(f);
+
 out:
 close(c);
 out2:
-- 
1.7.1.2




[Qemu-devel] [PATCH 14/18] block: insert event-tap to bdrv_aio_writev(), bdrv_aio_flush() and bdrv_flush().

2011-02-23 Thread Yoshiaki Tamura
The event-tap function is called only when event-tap is on and the
request was sent from a device emulator.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Acked-by: Kevin Wolf kw...@redhat.com
---
 block.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index f7d91a2..b19729a 100644
--- a/block.c
+++ b/block.c
@@ -28,6 +28,7 @@
 #include "block_int.h"
 #include "module.h"
 #include "qemu-objects.h"
+#include "event-tap.h"
 
 #ifdef CONFIG_BSD
 #include <sys/types.h>
@@ -1585,6 +1586,10 @@ int bdrv_flush(BlockDriverState *bs)
 }
 
 if (bs->drv && bs->drv->bdrv_flush) {
+if (*bs->device_name && event_tap_is_on()) {
+event_tap_bdrv_flush();
+}
+
 return bs->drv->bdrv_flush(bs);
 }
 
@@ -2220,6 +2225,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
 if (bdrv_check_request(bs, sector_num, nb_sectors))
 return NULL;
 
+if (*bs->device_name && event_tap_is_on()) {
+return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
+ cb, opaque);
+}
+
 if (bs->dirty_bitmap) {
 blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
  opaque);
@@ -2483,6 +2493,11 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
 
 if (!drv)
 return NULL;
+
+if (*bs->device_name && event_tap_is_on()) {
+return event_tap_bdrv_aio_flush(bs, cb, opaque);
+}
+
 return drv->bdrv_aio_flush(bs, cb, opaque);
 }
 
-- 
1.7.1.2
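
The hooks in this patch all follow the same interception pattern: a request is diverted to event-tap only when the tap is on and the block device has a non-empty device name, which the patch appears to use as the sign that the request came from a device emulator. A minimal sketch of that pattern, assuming hypothetical names (`DemoDisk`, `tap_on`, `tap_queue_write`, `demo_write`) that are not part of QEMU or event-tap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these names come from QEMU. */
typedef struct DemoDisk {
    const char *device_name;   /* empty string: not attached to a device emulator */
    size_t direct_writes;      /* requests that went straight to the driver */
} DemoDisk;

static bool tap_on;            /* plays the role of event_tap_is_on() */
static size_t tap_queued;      /* requests captured by the tap */

/* Capture the request instead of issuing it. */
static void tap_queue_write(DemoDisk *d)
{
    (void)d;
    tap_queued++;
}

/* Divert the request only when the tap is on AND the disk has a device name. */
static void demo_write(DemoDisk *d)
{
    assert(d != NULL && d->device_name != NULL);
    if (*d->device_name && tap_on) {
        tap_queue_write(d);
        return;
    }
    d->direct_writes++;
}
```

With the tap off every write goes straight through; with it on, only writes to named devices are captured, so internal writes (for example from block migration) keep their normal path.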




[Qemu-devel] [PATCH 08/18] savevm: introduce util functions to control ft_trans_file from savevm layer.

2011-02-23 Thread Yoshiaki Tamura
To make use of the ft_trans_file functions, savevm needs to export
interfaces for controlling them.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |5 ++
 savevm.c |  149 ++
 2 files changed, 154 insertions(+), 0 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index a168a37..a9eff5a 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -51,6 +51,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_ft_trans(int s_fd, int c_fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
@@ -60,6 +61,9 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
 void *qemu_realloc_buffer(QEMUFile *f, int size);
 void qemu_clear_buffer(QEMUFile *f);
+int qemu_ft_trans_begin(QEMUFile *f);
+int qemu_ft_trans_commit(QEMUFile *f);
+int qemu_ft_trans_cancel(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
@@ -94,6 +98,7 @@ void qemu_file_set_error(QEMUFile *f);
  * halted due to rate limiting or EAGAIN errors occur as it can be used to
  * resume output. */
 void qemu_file_put_notify(QEMUFile *f);
+void qemu_file_get_notify(void *opaque);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
diff --git a/savevm.c b/savevm.c
index 52d5be8..78c1972 100644
--- a/savevm.c
+++ b/savevm.c
@@ -82,6 +82,7 @@
 #include "migration.h"
 #include "qemu_socket.h"
 #include "qemu-queue.h"
+#include "ft_trans_file.h"
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -189,6 +190,13 @@ typedef struct QEMUFileSocket
 QEMUFile *file;
 } QEMUFileSocket;
 
+typedef struct QEMUFileSocketTrans
+{
+int fd;
+QEMUFileSocket *s;
+VMChangeStateEntry *e;
+} QEMUFileSocketTrans;
+
 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
 QEMUFileSocket *s = opaque;
@@ -204,6 +212,22 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 return len;
 }
 
+static ssize_t socket_put_buffer(void *opaque, const void *buf, size_t size)
+{
+QEMUFileSocket *s = opaque;
+ssize_t len;
+
+do {
+len = send(s->fd, (void *)buf, size, 0);
+} while (len == -1 && socket_error() == EINTR);
+
+if (len == -1) {
+len = -socket_error();
+}
+
+return len;
+}
+
 static int socket_close(void *opaque)
 {
 QEMUFileSocket *s = opaque;
@@ -211,6 +235,70 @@ static int socket_close(void *opaque)
 return 0;
 }
 
+static int socket_trans_get_buffer(void *opaque, uint8_t *buf, int64_t pos, size_t size)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+ssize_t len;
+
+len = socket_get_buffer(s, buf, pos, size);
+
+return len;
+}
+
+static ssize_t socket_trans_put_buffer(void *opaque, const void *buf, size_t size)
+{
+QEMUFileSocketTrans *t = opaque;
+
+return socket_put_buffer(t->s, buf, size);
+}
+
+
+static int socket_trans_get_ready(void *opaque)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+QEMUFile *f = s->file;
+int ret = 0;
+
+ret = qemu_loadvm_state(f, 1);
+if (ret < 0) {
+fprintf(stderr,
+"socket_trans_get_ready: error while loading vmstate\n");
+}
+
+return ret;
+}
+
+static int socket_trans_close(void *opaque)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+
+qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
+qemu_set_fd_handler2(t->fd, NULL, NULL, NULL, NULL);
+qemu_del_vm_change_state_handler(t->e);
+close(s->fd);
+close(t->fd);
+qemu_free(s);
+qemu_free(t);
+
+return 0;
+}
+
+static void socket_trans_resume(void *opaque, int running, int reason)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+
+if (!running) {
+return;
+}
+
+qemu_announce_self();
+qemu_fclose(s->file);
+}
+
 static int stdio_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size)
 {
 QEMUFileStdio *s = opaque;
@@ -333,6 +421,26 @@ QEMUFile *qemu_fopen_socket(int fd)
 return s->file;
 }
 
+QEMUFile *qemu_fopen_ft_trans(int s_fd, int c_fd)
+{
+QEMUFileSocketTrans *t = qemu_mallocz(sizeof(QEMUFileSocketTrans));
+QEMUFileSocket *s = qemu_mallocz(sizeof(QEMUFileSocket));
+
+t->s = s;
+t->fd = s_fd;
+t->e = qemu_add_vm_change_state_handler(socket_trans_resume, t);
+
+s->fd = c_fd;
+s->file = qemu_fopen_ops_ft_trans(t, socket_trans_put_buffer,
+  socket_trans_get_buffer, NULL,
+  socket_trans_get_ready,
+  migrate_fd_wait_for_unfreeze,
+  socket_trans_close, 0

[Qemu-devel] [PATCH 07/18] Introduce fault tolerant VM transaction QEMUFile and ft_mode.

2011-02-23 Thread Yoshiaki Tamura
This code implements VM transaction protocol.  Like buffered_file, it
sits between savevm and migration layer.  With this architecture, VM
transaction protocol is implemented mostly independent from other
existing code.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.objs   |1 +
 ft_trans_file.c |  624 +++
 ft_trans_file.h |   72 +++
 migration.c |3 +
 trace-events|   15 ++
 5 files changed, 715 insertions(+), 0 deletions(-)
 create mode 100644 ft_trans_file.c
 create mode 100644 ft_trans_file.h

diff --git a/Makefile.objs b/Makefile.objs
index c144df1..8856160 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -100,6 +100,7 @@ common-obj-y += msmouse.o ps2.o
 common-obj-y += qdev.o qdev-properties.o
 common-obj-y += block-migration.o
 common-obj-y += pflib.o
+common-obj-y += ft_trans_file.o
 
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
diff --git a/ft_trans_file.c b/ft_trans_file.c
new file mode 100644
index 000..2b42b95
--- /dev/null
+++ b/ft_trans_file.c
@@ -0,0 +1,624 @@
+/*
+ * Fault tolerant VM transaction QEMUFile
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * This source code is based on buffered_file.c.
+ * Copyright IBM, Corp. 2008
+ * Authors:
+ *  Anthony Liguori <aligu...@us.ibm.com>
+ */
+
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "hw/hw.h"
+#include "qemu-timer.h"
+#include "sysemu.h"
+#include "qemu-char.h"
+#include "trace.h"
+#include "ft_trans_file.h"
+
+typedef struct FtTransHdr
+{
+uint16_t cmd;
+uint16_t id;
+uint32_t seq;
+uint32_t payload_len;
+} FtTransHdr;
+
+typedef struct QEMUFileFtTrans
+{
+FtTransPutBufferFunc *put_buffer;
+FtTransGetBufferFunc *get_buffer;
+FtTransPutReadyFunc *put_ready;
+FtTransGetReadyFunc *get_ready;
+FtTransWaitForUnfreezeFunc *wait_for_unfreeze;
+FtTransCloseFunc *close;
+void *opaque;
+QEMUFile *file;
+
+enum QEMU_VM_TRANSACTION_STATE state;
+uint32_t seq;
+uint16_t id;
+
+int has_error;
+
+bool freeze_output;
+bool freeze_input;
+bool rate_limit;
+bool is_sender;
+bool is_payload;
+
+uint8_t *buf;
+size_t buf_max_size;
+size_t put_offset;
+size_t get_offset;
+
+FtTransHdr header;
+size_t header_offset;
+} QEMUFileFtTrans;
+
+#define IO_BUF_SIZE 32768
+
+static void ft_trans_append(QEMUFileFtTrans *s,
+const uint8_t *buf, size_t size)
+{
+if (size > (s->buf_max_size - s->put_offset)) {
+trace_ft_trans_realloc(s->buf_max_size, size + 1024);
+s->buf_max_size += size + 1024;
+s->buf = qemu_realloc(s->buf, s->buf_max_size);
+}
+
+trace_ft_trans_append(size);
+memcpy(s->buf + s->put_offset, buf, size);
+s->put_offset += size;
+}
+
+static void ft_trans_flush(QEMUFileFtTrans *s)
+{
+size_t offset = 0;
+
+if (s->has_error) {
+error_report("flush when error %d, bailing", s->has_error);
+return;
+}
+
+while (offset < s->put_offset) {
+ssize_t ret;
+
+ret = s->put_buffer(s->opaque, s->buf + offset, s->put_offset - offset);
+if (ret == -EAGAIN) {
+break;
+}
+
+if (ret <= 0) {
+error_report("error flushing data, %s", strerror(errno));
+s->has_error = FT_TRANS_ERR_FLUSH;
+break;
+} else {
+offset += ret;
+}
+}
+
+trace_ft_trans_flush(offset, s->put_offset);
+memmove(s->buf, s->buf + offset, s->put_offset - offset);
+s->put_offset -= offset;
+s->freeze_output = !!s->put_offset;
+}
+
+static ssize_t ft_trans_put(void *opaque, void *buf, int size)
+{
+QEMUFileFtTrans *s = opaque;
+size_t offset = 0;
+ssize_t len;
+
+/* flush buffered data before putting next */
+if (s->put_offset) {
+ft_trans_flush(s);
+}
+
+while (!s->freeze_output && offset < size) {
+len = s->put_buffer(s->opaque, (uint8_t *)buf + offset, size - offset);
+
+if (len == -EAGAIN) {
+trace_ft_trans_freeze_output();
+s->freeze_output = 1;
+break;
+}
+
+if (len <= 0) {
+error_report("putting data failed, %s", strerror(errno));
+s->has_error = 1;
+offset = -EINVAL;
+break;
+}
+
+offset += len;
+}
+
+if (s->freeze_output) {
+ft_trans_append(s, buf + offset, size - offset);
+offset = size;
+}
+
+return offset;
+}
+
+static int ft_trans_send_header(QEMUFileFtTrans *s,
+enum QEMU_VM_TRANSACTION_STATE state,
+uint32_t payload_len)
+{
+int ret;
+FtTransHdr

[Qemu-devel] [PATCH 16/18] migration: introduce migrate_ft_trans_{put, get}_ready(), and modify migrate_fd_put_ready() when ft_mode is on.

2011-02-23 Thread Yoshiaki Tamura
Introduce migrate_ft_trans_put_ready(), which kicks the FT transaction
cycle.  When ft_mode is on, migrate_fd_put_ready() opens
ft_trans_file and turns on event_tap.  To end or cancel an FT transaction,
ft_mode and event_tap are turned off.  migrate_ft_trans_get_ready() is
called to receive the ack from the receiver.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |  261 ++-
 1 files changed, 260 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 3be3554..82f4a4d 100644
--- a/migration.c
+++ b/migration.c
@@ -21,6 +21,7 @@
 #include "qemu_socket.h"
 #include "block-migration.h"
 #include "qemu-objects.h"
+#include "event-tap.h"
 
 //#define DEBUG_MIGRATION
 
@@ -283,6 +284,14 @@ void migrate_fd_error(FdMigrationState *s)
 migrate_fd_cleanup(s);
 }
 
+static void migrate_ft_trans_error(FdMigrationState *s)
+{
+ft_mode = FT_ERROR;
+qemu_savevm_state_cancel(s->mon, s->file);
+migrate_fd_error(s);
+event_tap_unregister();
+}
+
 int migrate_fd_cleanup(FdMigrationState *s)
 {
 int ret = 0;
@@ -318,6 +327,17 @@ void migrate_fd_put_notify(void *opaque)
 qemu_file_put_notify(s->file);
 }
 
+static void migrate_fd_get_notify(void *opaque)
+{
+FdMigrationState *s = opaque;
+
+qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
+qemu_file_get_notify(s->file);
+if (qemu_file_has_error(s->file)) {
+migrate_ft_trans_error(s);
+}
+}
+
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
 {
 FdMigrationState *s = opaque;
@@ -353,6 +373,10 @@ int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t size)
 ret = -(s->get_error(s));
 }
 
+if (ret == -EAGAIN) {
+qemu_set_fd_handler2(s->fd, NULL, migrate_fd_get_notify, NULL, s);
+}
+
 return ret;
 }
 
@@ -379,6 +403,230 @@ void migrate_fd_connect(FdMigrationState *s)
 migrate_fd_put_ready(s);
 }
 
+static int migrate_ft_trans_commit(void *opaque)
+{
+FdMigrationState *s = opaque;
+int ret = -1;
+
+if (ft_mode != FT_TRANSACTION_COMMIT && ft_mode != FT_TRANSACTION_ATOMIC) {
+fprintf(stderr,
+"migrate_ft_trans_commit: invalid ft_mode %d\n", ft_mode);
+goto out;
+}
+
+do {
+if (ft_mode == FT_TRANSACTION_ATOMIC) {
+if (qemu_ft_trans_begin(s->file) < 0) {
+fprintf(stderr, "qemu_ft_trans_begin failed\n");
+goto out;
+}
+
+ret = qemu_savevm_trans_begin(s->mon, s->file, 0);
+if (ret < 0) {
+fprintf(stderr, "qemu_savevm_trans_begin failed\n");
+goto out;
+}
+
+ft_mode = FT_TRANSACTION_COMMIT;
+if (ret) {
+/* don't proceed until if fd isn't ready */
+goto out;
+}
+}
+
+/* make the VM state consistent by flushing outstanding events */
+vm_stop(0);
+
+/* send at full speed */
+qemu_file_set_rate_limit(s->file, 0);
+
+ret = qemu_savevm_trans_complete(s->mon, s->file);
+if (ret < 0) {
+fprintf(stderr, "qemu_savevm_trans_complete failed\n");
+goto out;
+}
+
+ret = qemu_ft_trans_commit(s->file);
+if (ret < 0) {
+fprintf(stderr, "qemu_ft_trans_commit failed\n");
+goto out;
+}
+
+if (ret) {
+ft_mode = FT_TRANSACTION_RECV;
+ret = 1;
+goto out;
+}
+
+/* flush and check if events are remaining */
+vm_start();
+ret = event_tap_flush_one();
+if (ret < 0) {
+fprintf(stderr, "event_tap_flush_one failed\n");
+goto out;
+}
+
+ft_mode =  ret ? FT_TRANSACTION_BEGIN : FT_TRANSACTION_ATOMIC;
+} while (ft_mode != FT_TRANSACTION_BEGIN);
+
+vm_start();
+ret = 0;
+
+out:
+return ret;
+}
+
+static int migrate_ft_trans_get_ready(void *opaque)
+{
+FdMigrationState *s = opaque;
+int ret = -1;
+
+if (ft_mode != FT_TRANSACTION_RECV) {
+fprintf(stderr,
+"migrate_ft_trans_get_ready: invalid ft_mode %d\n", ft_mode);
+goto error_out;
+}
+
+/* flush and check if events are remaining */
+vm_start();
+ret = event_tap_flush_one();
+if (ret < 0) {
+fprintf(stderr, "event_tap_flush_one failed\n");
+goto error_out;
+}
+
+if (ret) {
+ft_mode = FT_TRANSACTION_BEGIN;
+} else {
+ft_mode = FT_TRANSACTION_ATOMIC;
+
+ret = migrate_ft_trans_commit(s);
+if (ret < 0) {
+goto error_out;
+}
+if (ret) {
+goto out;
+}
+}
+
+vm_start();
+ret = 0;
+goto out;
+
+error_out:
+migrate_ft_trans_error(s);
+
+out:
+return ret;
+}
+
+static int migrate_ft_trans_put_ready(void)
+{
+FdMigrationState *s = migrate_to_fms(current_migration

[Qemu-devel] [PATCH 04/18] qemu-char: export socket_set_nodelay().

2011-02-23 Thread Yoshiaki Tamura
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 qemu-char.c   |2 +-
 qemu_socket.h |1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index bd4e944..c4f1940 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2111,7 +2111,7 @@ static void tcp_chr_telnet_init(int fd)
 send(fd, (char *)buf, 3, 0);
 }
 
-static void socket_set_nodelay(int fd)
+void socket_set_nodelay(int fd)
 {
 int val = 1;
 setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&val, sizeof(val));
diff --git a/qemu_socket.h b/qemu_socket.h
index 897a8ae..b7f8465 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -36,6 +36,7 @@ int inet_aton(const char *cp, struct in_addr *ia);
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 void socket_set_nonblock(int fd);
+void socket_set_nodelay(int fd);
 int send_all(int fd, const void *buf, int len1);
 
 /* New, ipv6-ready socket helper functions, see qemu-sockets.c */
-- 
1.7.1.2




[Qemu-devel] Re: [PATCH 05/18] vl.c: add deleted flag for deleting the handler.

2011-02-23 Thread Yoshiaki Tamura

Juan Quintela wrote:
> Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp> wrote:
>> Make deleting handlers robust against deletion of any elements in a
>> handler by using a deleted flag like in file descriptors.
>>
>> Signed-off-by: Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp>
>> ---
>>  vl.c |   13 +
>>  1 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/vl.c b/vl.c
>> index b436952..4e263c3 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -1158,6 +1158,7 @@ static void nographic_update(void *opaque)
>>  struct vm_change_state_entry {
>>  VMChangeStateHandler *cb;
>>  void *opaque;
>> +int deleted;
>>  QLIST_ENTRY (vm_change_state_entry) entries;
>>  };
>>
>> @@ -1178,8 +1179,7 @@ VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
>>
>>  void qemu_del_vm_change_state_handler(VMChangeStateEntry *e)
>>  {
>> -QLIST_REMOVE (e, entries);
>> -qemu_free (e);
>> +e->deleted = 1;
>>  }
>>
>>  void vm_state_notify(int running, int reason)
>> @@ -1188,8 +1188,13 @@ void vm_state_notify(int running, int reason)
>>
>>  trace_vm_state_notify(running, reason);
>>
>> -for (e = vm_change_state_head.lh_first; e; e = e->entries.le_next) {
>> -e->cb(e->opaque, running, reason);
>
> this needs to become:
>
>> +QLIST_FOREACH(e, &vm_change_state_head, entries) {
>> +if (e->deleted) {
>> +QLIST_REMOVE(e, entries);
>> +qemu_free(e);
>> +} else {
>> +e->cb(e->opaque, running, reason);
>> +}
>
> VMChangeState_entry *next;
>
> QLIST_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
>    .
>
>    Otherwise you are accessing e after qemu_free and being put out of
>    the list.

You're right.  Thanks.

Yoshi

> Later, Juan.
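
The point raised in this review can be illustrated outside QEMU: when an entry may be unlinked and freed inside the loop body, the pointer to the next entry has to be saved before the body runs. A minimal sketch, assuming a hypothetical hand-rolled singly linked list (`Handler`, `demo_add`, `demo_del`, `demo_notify`) rather than QEMU's QLIST macros:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature handler list; names are illustrative, not QEMU's. */
typedef struct Handler {
    void (*cb)(void *opaque);
    void *opaque;
    int deleted;               /* set by demo_del(), reaped by demo_notify() */
    struct Handler *next;
} Handler;

static Handler *handlers;

static Handler *demo_add(void (*cb)(void *), void *opaque)
{
    Handler *h = calloc(1, sizeof(*h));
    assert(h != NULL);
    h->cb = cb;
    h->opaque = opaque;
    h->next = handlers;
    handlers = h;
    return h;
}

/* Only mark the entry; this is safe to call from inside a callback. */
static void demo_del(Handler *h)
{
    h->deleted = 1;
}

static void demo_notify(void)
{
    Handler **link = &handlers;    /* pointer that currently points at h */
    Handler *h, *next;

    for (h = handlers; h != NULL; h = next) {
        next = h->next;            /* saved before h may be freed */
        if (h->deleted) {
            *link = next;          /* unlink, then free */
            free(h);
        } else {
            h->cb(h->opaque);
            link = &h->next;
        }
    }
}

/* Sample callback: increments the int that opaque points to. */
static void demo_count(void *opaque)
{
    ++*(int *)opaque;
}
```

Reading `h->next` in the loop increment instead, as a plain FOREACH does, would touch freed memory whenever the current entry was reaped.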






[Qemu-devel] Re: [PATCH 03/18] Introduce skip_header parameter to qemu_loadvm_state().

2011-02-23 Thread Yoshiaki Tamura

Juan Quintela wrote:
> Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp> wrote:
>> Introduce skip_header parameter to qemu_loadvm_state() so that it can
>> be called iteratively without reading the header.
>>
>> Signed-off-by: Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp>
>> ---
>>  migration.c |2 +-
>>  savevm.c|   24 +---
>>  sysemu.h|2 +-
>>  3 files changed, 15 insertions(+), 13 deletions(-)
>>
>> diff --git a/migration.c b/migration.c
>> index 302b8fe..bd51fef 100644
>> --- a/migration.c
>> +++ b/migration.c
>> @@ -63,7 +63,7 @@ int qemu_start_incoming_migration(const char *uri)
>>
>>  void process_incoming_migration(QEMUFile *f)
>>  {
>> -if (qemu_loadvm_state(f) < 0) {
>> +if (qemu_loadvm_state(f, 0) < 0) {
>>  fprintf(stderr, "load of migration failed\n");
>>  exit(0);
>>  }
>
> I think it would be better to just create a different function
>
> qemu_loadvm_state_internal() (better name)
>
> and that qemu_loadvm_state() just does the other tests and call
> qemu_loadvm_state_internal?

Sounds reasonable.  Let me try.

Yoshi

> Later, Juan.





[Qemu-devel] Re: [PATCH 07/18] Introduce fault tolerant VM transaction QEMUFile and ft_mode.

2011-02-23 Thread Yoshiaki Tamura

Juan Quintela wrote:
> Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp> wrote:
>> This code implements VM transaction protocol.  Like buffered_file, it
>> sits between savevm and migration layer.  With this architecture, VM
>> transaction protocol is implemented mostly independent from other
>> existing code.
>
> Could you explain what is the difference with buffered_file.c?
> I am fixing problems on buffered_file, and having something that copies
> lot of code from there makes me nervous.

The objective is different:

buffered_file buffers data for transmission control.
ft_trans_file adds headers to the stream, and controls the transaction between
sender and receiver.

Although ft_trans_file sometimes buffers data, that is not its main objective.
If you're fixing the problems in buffered_file, I'll keep an eye on them.
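
To make the distinction concrete, here is a rough sketch of what "adding headers to the stream" can look like. The header fields mirror the `FtTransHdr` struct from PATCH 07/18 (cmd, id, seq, payload_len), but the big-endian 12-byte wire layout and every `demo_` name are assumptions made for illustration; the patch body is truncated in this archive, so the real encoding is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HDR_LEN 12   /* 2 + 2 + 4 + 4 bytes; assumed wire size */

/* Same fields as FtTransHdr in the patch; the type name is hypothetical. */
typedef struct DemoHdr {
    uint16_t cmd;
    uint16_t id;
    uint32_t seq;
    uint32_t payload_len;
} DemoHdr;

static void demo_put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void demo_put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)((v >> 16) & 0xff);
    p[2] = (uint8_t)((v >> 8) & 0xff);
    p[3] = (uint8_t)(v & 0xff);
}

static uint16_t demo_get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t demo_get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write header + payload into out; returns bytes written, 0 if out is too small. */
static size_t demo_frame(uint8_t *out, size_t out_len,
                         const DemoHdr *h, const uint8_t *payload)
{
    size_t total;

    assert(out != NULL && h != NULL);
    total = DEMO_HDR_LEN + (size_t)h->payload_len;
    if (out_len < total) {
        return 0;
    }
    demo_put_be16(out, h->cmd);
    demo_put_be16(out + 2, h->id);
    demo_put_be32(out + 4, h->seq);
    demo_put_be32(out + 8, h->payload_len);
    if (h->payload_len) {
        memcpy(out + DEMO_HDR_LEN, payload, h->payload_len);
    }
    return total;
}

/* Parse a frame; returns a pointer to the payload, or NULL if it is truncated. */
static const uint8_t *demo_unframe(const uint8_t *in, size_t in_len, DemoHdr *h)
{
    assert(in != NULL && h != NULL);
    if (in_len < DEMO_HDR_LEN) {
        return NULL;
    }
    h->cmd = demo_get_be16(in);
    h->id = demo_get_be16(in + 2);
    h->seq = demo_get_be32(in + 4);
    h->payload_len = demo_get_be32(in + 8);
    if (in_len - DEMO_HDR_LEN < h->payload_len) {
        return NULL;
    }
    return in + DEMO_HDR_LEN;
}
```

A plain buffering layer would pass the payload bytes through unchanged; the framing layer lets the receiver tell commands apart and detect where each transaction's payload ends.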


>> +typedef ssize_t (FtTransPutBufferFunc)(void *opaque, const void *data, size_t size);
>
> Can we get some sharing here?
> typedef ssize_t (BufferedPutFunc)(void *opaque, const void *data, size_t size);
>
> There are not so much types for a write function that the 1st element is
> one opaque :p

You're right, but I want to keep ft_trans_file independent of buffered_file at
this point.  Once Kemari gets merged, I'm happy to work with you to fix the
problems in buffered_file and ft_trans_file, and to refactor them.

Thanks,

Yoshi

> Later, Juan.





Re: [Qemu-devel] [PATCH 21/28] migration: Make state definitions local

2011-02-23 Thread Yoshiaki Tamura
2011/2/24 Juan Quintela <quint...@redhat.com>:
>
> Signed-off-by: Juan Quintela <quint...@redhat.com>
> ---
>  migration.c |    8
>  migration.h |    8
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/migration.c b/migration.c
> index 493c2d7..697c74f 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -31,6 +31,14 @@
>      do { } while (0)
>  #endif
>
> +enum migration_state {
> +    MIG_STATE_ERROR,

Would be better to say:

MIG_STATE_ERROR = -1,

Yoshi
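
A short sketch of why the explicit initializer matters: a C enum starts counting at 0, so without `= -1` every state would shift by one compared with the `#define` version of these constants, which used -1 for the error state. The `DEMO_` names and the helper below are hypothetical, used only for illustration:

```c
#include <assert.h>

/* Hypothetical copy of the enum with the suggested explicit initializer. */
enum demo_migration_state {
    DEMO_MIG_STATE_ERROR = -1,
    DEMO_MIG_STATE_NONE,        /*  0 */
    DEMO_MIG_STATE_CANCELLED,   /*  1 */
    DEMO_MIG_STATE_ACTIVE,      /*  2 */
    DEMO_MIG_STATE_COMPLETED,   /*  3 */
};

/* Compile-time checks that the values match the old #define constants. */
static_assert(DEMO_MIG_STATE_NONE == 0, "NONE must stay 0");
static_assert(DEMO_MIG_STATE_COMPLETED == 3, "COMPLETED must stay 3");

/* With ERROR as the only negative value, a sign test identifies it. */
static int demo_state_is_error(enum demo_migration_state s)
{
    return s < 0;
}
```

Keeping the numeric values stable also means that zero-initialized state still reads as "none", which is an assumption about how callers use it rather than something stated in the thread.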

> +    MIG_STATE_NONE,
> +    MIG_STATE_CANCELLED,
> +    MIG_STATE_ACTIVE,
> +    MIG_STATE_COMPLETED,
> +};
> +
>  #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
>
>  static MigrationState current_migration = {
> diff --git a/migration.h b/migration.h
> index 3c5bb6a..e1fc921 100644
> --- a/migration.h
> +++ b/migration.h
> @@ -18,14 +18,6 @@
>  #include "qemu-common.h"
>  #include "notify.h"
>
> -enum migration_state {
> -    MIG_STATE_ERROR,
> -    MIG_STATE_NONE,
> -    MIG_STATE_CANCELLED,
> -    MIG_STATE_ACTIVE,
> -    MIG_STATE_COMPLETED,
> -};
> -
>  typedef struct MigrationState MigrationState;
>
>  struct MigrationState
> --
> 1.7.4






Re: [Qemu-devel] Re: [PATCH 22/22] migration: Make state definitions local

2011-02-23 Thread Yoshiaki Tamura
2011/2/23 Juan Quintela <quint...@redhat.com>:
> Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp> wrote:
>> 2011/2/23 Juan Quintela <quint...@redhat.com>:
>>>
>>> Signed-off-by: Juan Quintela <quint...@redhat.com>
>>> ---
>>>  migration.c |    6 ++
>>>  migration.h |    6 --
>>>  2 files changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/migration.c b/migration.c
>>> index 383ebaf..90fc2a0 100644
>>> --- a/migration.c
>>> +++ b/migration.c
>>> @@ -31,6 +31,12 @@
>>>      do { } while (0)
>>>  #endif
>>>
>>> +#define MIG_STATE_ERROR        -1
>>> +#define MIG_STATE_NONE         0
>>> +#define MIG_STATE_CANCELLED    1
>>> +#define MIG_STATE_ACTIVE       2
>>> +#define MIG_STATE_COMPLETED    3
>>> +
>>>  static MigrationState current_migration = {
>>>      .state = MIG_STATE_NONE,
>>>       /* Migration speed throttling */
>>> diff --git a/migration.h b/migration.h
>>> index 9457807..493fbe5 100644
>>> --- a/migration.h
>>> +++ b/migration.h
>>> @@ -18,12 +18,6 @@
>>>  #include "qemu-common.h"
>>>  #include "notify.h"
>>>
>>> -#define MIG_STATE_ERROR        -1
>>> -#define MIG_STATE_NONE         0
>>> -#define MIG_STATE_CANCELLED    1
>>> -#define MIG_STATE_ACTIVE       2
>>> -#define MIG_STATE_COMPLETED    3
>>> -
>>
>> Although you're right, I would prefer to keep it so that somebody
>> outside of migration may understand the status in the future if
>> there are no harms.
>
> my plan is to move MigrationState inside migration.c, and then decide
> what to export/not export.

Well, it may be just a matter of policy, but it's already exported, and I
would like to keep it unless it gets in the way of your plan.  IIUC, I don't
think it does.

> Next thing to do is move migration to its
> own thread.  Before doing that, I need to know what parts are used/not
> used outside migration.c.  Removing it now means that nothing gets to
> use it without needing a patch.

I once asked Anthony whether it's possible to move migration
to a separate thread, but his answer was no, due to hard
dependencies in qemu's internal code, and that moving migration to
a separate thread is bad design.

Thanks,

Yoshi


> Later, Juan..





[Qemu-devel] [PATCH 12/18] Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c.

2011-02-23 Thread Yoshiaki Tamura
Record MMIO write events so that they can be replayed upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 exec.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index d611100..e192eec 100644
--- a/exec.c
+++ b/exec.c
@@ -33,6 +33,7 @@
 #include "osdep.h"
 #include "kvm.h"
 #include "qemu-timer.h"
+#include "event-tap.h"
 #if defined(CONFIG_USER_ONLY)
 #include <qemu.h>
 #include <signal.h>
@@ -3662,6 +3663,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
 io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
 if (p)
 addr1 = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
+
+event_tap_mmio(addr, buf, len);
+
 /* XXX: could force cpu_single_env to NULL to avoid
 potential bugs */
 if (l >= 4 && ((addr1 & 3) == 0)) {
-- 
1.7.1.2
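
The idea of recording a write so it can be replayed later can be sketched independently of QEMU. The following is a minimal illustration, assuming a hypothetical fixed-size log and hypothetical names (`DemoMmioEvent`, `demo_record_mmio`, `demo_replay_mmio`, `demo_regs_write`); it does not reflect how event-tap actually stores or replays events:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_EVENTS 16
#define DEMO_MAX_LEN 8

/* One recorded write: where it went and what was written. */
typedef struct DemoMmioEvent {
    uint64_t addr;
    uint8_t buf[DEMO_MAX_LEN];
    int len;
} DemoMmioEvent;

static DemoMmioEvent demo_log[DEMO_MAX_EVENTS];
static int demo_log_count;

/* Record a write; returns 0 on success, -1 if it does not fit in the log. */
static int demo_record_mmio(uint64_t addr, const uint8_t *buf, int len)
{
    DemoMmioEvent *ev;

    if (len <= 0 || len > DEMO_MAX_LEN || demo_log_count == DEMO_MAX_EVENTS) {
        return -1;
    }
    ev = &demo_log[demo_log_count++];
    ev->addr = addr;
    ev->len = len;
    memcpy(ev->buf, buf, (size_t)len);
    return 0;
}

typedef void DemoMmioWrite(void *opaque, uint64_t addr,
                           const uint8_t *buf, int len);

/* Replay recorded writes in order, then clear the log. Returns the count. */
static int demo_replay_mmio(DemoMmioWrite *write, void *opaque)
{
    int i, n = demo_log_count;

    for (i = 0; i < n; i++) {
        write(opaque, demo_log[i].addr, demo_log[i].buf, demo_log[i].len);
    }
    demo_log_count = 0;
    return n;
}

/* Sample device: a 16-byte register file that opaque points to. */
static void demo_regs_write(void *opaque, uint64_t addr,
                            const uint8_t *buf, int len)
{
    uint8_t *regs = opaque;

    assert(addr + (uint64_t)len <= 16);
    memcpy(regs + addr, buf, (size_t)len);
}
```

Replaying in recording order matters: in the usage below, the second write overlaps the first, and only in-order replay leaves the device in the state the guest expected.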




[Qemu-devel] [PATCH 13/18] net: insert event-tap to qemu_send_packet() and qemu_sendv_packet_async().

2011-02-23 Thread Yoshiaki Tamura
event-tap function is called only when it is on.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 net.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index ec4745d..724b549 100644
--- a/net.c
+++ b/net.c
@@ -36,6 +36,7 @@
 #include "qemu-common.h"
 #include "qemu_socket.h"
 #include "hw/qdev.h"
+#include "event-tap.h"
 
 static QTAILQ_HEAD(, VLANState) vlans;
 static QTAILQ_HEAD(, VLANClientState) non_vlan_clients;
@@ -559,6 +560,10 @@ ssize_t qemu_send_packet_async(VLANClientState *sender,
 
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size)
 {
+if (event_tap_is_on()) {
+return event_tap_send_packet(vc, buf, size);
+}
+
 qemu_send_packet_async(vc, buf, size, NULL);
 }
 
@@ -657,6 +662,10 @@ ssize_t qemu_sendv_packet_async(VLANClientState *sender,
 {
 NetQueue *queue;
 
+if (event_tap_is_on()) {
+return event_tap_sendv_packet_async(sender, iov, iovcnt, sent_cb);
+}
+
 if (sender->link_down || (!sender->peer && !sender->vlan)) {
 return calc_iov_length(iov, iovcnt);
 }
-- 
1.7.1.2




[Qemu-devel] [PATCH 03/18] Introduce qemu_loadvm_state_no_header() and make qemu_loadvm_state() a wrapper.

2011-02-23 Thread Yoshiaki Tamura
Introduce qemu_loadvm_state_no_header() so that it can be called
iteratively without reading the header, and qemu_loadvm_state()
becomes a wrapper of it.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 savevm.c |   45 +++--
 1 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/savevm.c b/savevm.c
index 22010b9..fc62bcb 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1716,30 +1716,14 @@ typedef struct LoadStateEntry {
 int version_id;
 } LoadStateEntry;
 
-int qemu_loadvm_state(QEMUFile *f)
+static int qemu_loadvm_state_no_header(QEMUFile *f)
 {
 QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
 QLIST_HEAD_INITIALIZER(loadvm_handlers);
 LoadStateEntry *le, *new_le;
 uint8_t section_type;
-unsigned int v;
-int ret;
-
-if (qemu_savevm_state_blocked(default_mon)) {
-return -EINVAL;
-}
-
-v = qemu_get_be32(f);
-if (v != QEMU_VM_FILE_MAGIC)
-return -EINVAL;
 
-v = qemu_get_be32(f);
-if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
-return -ENOTSUP;
-}
-if (v != QEMU_VM_FILE_VERSION)
-return -ENOTSUP;
+int ret;
 
 while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
 uint32_t instance_id, version_id, section_id;
@@ -1834,6 +1818,31 @@ out:
 return ret;
 }
 
+int qemu_loadvm_state(QEMUFile *f)
+{
+unsigned int v;
+
+if (qemu_savevm_state_blocked(default_mon)) {
+return -EINVAL;
+}
+
+v = qemu_get_be32(f);
+if (v != QEMU_VM_FILE_MAGIC) {
+return -EINVAL;
+}
+
+v = qemu_get_be32(f);
+if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
+return -ENOTSUP;
+}
+if (v != QEMU_VM_FILE_VERSION) {
+return -ENOTSUP;
+}
+
+return qemu_loadvm_state_no_header(f);
+}
+
 static int bdrv_snapshot_find(BlockDriverState *bs, QEMUSnapshotInfo *sn_info,
   const char *name)
 {
-- 
1.7.1.2
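
For readers following the split, here is a minimal standalone sketch of
the same idea: validate the stream header once in a wrapper, and keep
the section loop in a separate function that can be re-entered without a
header.  It is not the QEMU code.  The Demo* names, the constants and the
one-byte "sections" are assumptions made only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the magic, version and EOF markers. */
#define DEMO_MAGIC   0x5145564dU
#define DEMO_VERSION 3U
#define DEMO_EOF     0x00

/* A read-only byte stream over a memory buffer (assumed for the demo). */
typedef struct DemoStream {
    const uint8_t *data;
    size_t len;
    size_t pos;
} DemoStream;

/* Return the next byte, or DEMO_EOF once the buffer is exhausted. */
static int demo_get_byte(DemoStream *s)
{
    assert(s != NULL);
    return s->pos < s->len ? s->data[s->pos++] : DEMO_EOF;
}

/* Read a 32-bit big-endian value. */
static uint32_t demo_get_be32(DemoStream *s)
{
    uint32_t v = 0;
    int i;

    for (i = 0; i < 4; i++) {
        v = (v << 8) | (uint32_t)demo_get_byte(s);
    }
    return v;
}

/* Body only: consume one-byte sections until the EOF marker and
 * return how many were seen.  Safe to call repeatedly on one stream. */
static int demo_load_no_header(DemoStream *s)
{
    int sections = 0;

    while (demo_get_byte(s) != DEMO_EOF) {
        sections++;
    }
    return sections;
}

/* Wrapper: check magic and version once, then delegate to the body. */
static int demo_load(DemoStream *s)
{
    if (demo_get_be32(s) != DEMO_MAGIC) {
        return -EINVAL;
    }
    if (demo_get_be32(s) != DEMO_VERSION) {
        return -ENOTSUP;
    }
    return demo_load_no_header(s);
}
```

A caller that receives several transactions on one stream would call
demo_load() for the first one and demo_load_no_header() for the rest.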




[Qemu-devel] [PATCH 01/18] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer().

2011-02-23 Thread Yoshiaki Tamura
Currently buf size is fixed at 32KB.  It would be useful if it could
be flexible.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |2 ++
 savevm.c |   20 +++-
 2 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index 5e24329..a168a37 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -58,6 +58,8 @@ void qemu_fflush(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
+void *qemu_realloc_buffer(QEMUFile *f, int size);
+void qemu_clear_buffer(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index a50fd31..22010b9 100644
--- a/savevm.c
+++ b/savevm.c
@@ -171,7 +171,8 @@ struct QEMUFile {
when reading */
 int buf_index;
 int buf_size; /* 0 when writing */
-uint8_t buf[IO_BUF_SIZE];
+int buf_max_size;
+uint8_t *buf;
 
 int has_error;
 };
@@ -422,6 +423,9 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 f->get_rate_limit = get_rate_limit;
 f->is_write = 0;
 
+f->buf_max_size = IO_BUF_SIZE;
+f->buf = qemu_malloc(sizeof(uint8_t) * f->buf_max_size);
+
 return f;
 }
 
@@ -452,6 +456,19 @@ void qemu_fflush(QEMUFile *f)
 }
 }
 
+void *qemu_realloc_buffer(QEMUFile *f, int size)
+{
+f->buf_max_size = size;
+f->buf = qemu_realloc(f->buf, f->buf_max_size);
+
+return f->buf;
+}
+
+void qemu_clear_buffer(QEMUFile *f)
+{
+f->buf_size = f->buf_index = f->buf_offset = 0;
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
 int len;
@@ -477,6 +494,7 @@ int qemu_fclose(QEMUFile *f)
 qemu_fflush(f);
 if (f->close)
 ret = f->close(f->opaque);
+qemu_free(f->buf);
 qemu_free(f);
 return ret;
 }
-- 
1.7.1.2
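
As an illustration of the buffer change, the sketch below shows a file
object whose I/O buffer lives on the heap and can be resized or reset.
It is a standalone approximation under stated assumptions: the Demo*
names are hypothetical, and it uses plain malloc()/realloc(), which can
return NULL, so it checks for failure and keeps the old buffer in that
case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_IO_BUF_SIZE 32768   /* same initial size as the old fixed buffer */

typedef struct DemoFile {
    int buf_index;      /* read/write cursor inside buf */
    int buf_size;       /* number of valid bytes in buf */
    int buf_max_size;   /* current capacity of buf */
    uint8_t *buf;       /* heap buffer instead of a fixed array */
} DemoFile;

/* Allocate the object with the default buffer size. */
static DemoFile *demo_open(void)
{
    DemoFile *f = calloc(1, sizeof(*f));

    if (!f) {
        return NULL;
    }
    f->buf_max_size = DEMO_IO_BUF_SIZE;
    f->buf = malloc((size_t)f->buf_max_size);
    if (!f->buf) {
        free(f);
        return NULL;
    }
    return f;
}

/* Resize the buffer.  On failure the old buffer stays valid and
 * NULL is returned; existing contents are preserved on success. */
static uint8_t *demo_realloc_buffer(DemoFile *f, int size)
{
    uint8_t *p;

    assert(size > 0);
    p = realloc(f->buf, (size_t)size);
    if (!p) {
        return NULL;
    }
    f->buf = p;
    f->buf_max_size = size;
    return f->buf;
}

/* Forget buffered data without touching the capacity. */
static void demo_clear_buffer(DemoFile *f)
{
    f->buf_size = f->buf_index = 0;
}

/* Free the buffer before the object, mirroring the close path. */
static void demo_close(DemoFile *f)
{
    free(f->buf);
    free(f);
}
```

Shrinking below buf_size is not guarded here; a real caller would
probably want to clear the buffer first.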




[Qemu-devel] [PATCH 06/18] virtio: decrement last_avail_idx with inuse before saving.

2011-02-23 Thread Yoshiaki Tamura
For regular migration inuse == 0 always as requests are flushed before
save. However, event-tap log when enabled introduces an extra queue
for requests which is not being flushed, thus the last inuse requests
are left in the event-tap queue.  Move the last_avail_idx value sent
to the remote back to make it repeat the last inuse requests.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/virtio.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 31bd9e3..f05d1b6 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -673,12 +673,20 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 qemu_put_be32(f, i);
 
 for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
+/* For regular migration inuse == 0 always as
+ * requests are flushed before save. However,
+ * event-tap log when enabled introduces an extra
+ * queue for requests which is not being flushed,
+ * thus the last inuse requests are left in the event-tap queue.
+ * Move the last_avail_idx value sent to the remote back
+ * to make it repeat the last inuse requests. */
+uint16_t last_avail = vdev->vq[i].last_avail_idx - vdev->vq[i].inuse;
 if (vdev->vq[i].vring.num == 0)
 break;
 
 qemu_put_be32(f, vdev->vq[i].vring.num);
 qemu_put_be64(f, vdev->vq[i].pa);
-qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
+qemu_put_be16s(f, &last_avail);
 if (vdev->binding->save_queue)
 vdev->binding->save_queue(vdev->binding_opaque, i, f);
 }
-- 
1.7.1.2
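
The subtraction relies on 16-bit wraparound, which is easy to overlook.
A tiny standalone sketch of the arithmetic follows.  The function name
is hypothetical, and inuse is taken as uint16_t here only to keep the
example self-contained.

```c
#include <stdint.h>

/* Index to send to the remote: the ring index moved back by the
 * number of requests still in flight.  The result is reduced modulo
 * 2^16, so it stays a valid free-running ring index even when
 * last_avail_idx has recently wrapped past zero. */
static uint16_t demo_saved_avail_idx(uint16_t last_avail_idx, uint16_t inuse)
{
    return (uint16_t)(last_avail_idx - inuse);
}
```

For example, with last_avail_idx == 2 and inuse == 5 the saved value is
65533, and advancing it by 5 again (mod 2^16) gives back 2, so the
receiver replays exactly the five outstanding requests.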




[Qemu-devel] [PATCH 14/18] block: insert event-tap to bdrv_aio_writev(), bdrv_aio_flush() and bdrv_flush().

2011-02-23 Thread Yoshiaki Tamura
The event-tap functions are called only when event-tap is on and the
request was sent from a device emulator.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Acked-by: Kevin Wolf kw...@redhat.com
---
 block.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index f7d91a2..b19729a 100644
--- a/block.c
+++ b/block.c
@@ -28,6 +28,7 @@
 #include "block_int.h"
 #include "module.h"
 #include "qemu-objects.h"
+#include "event-tap.h"
 
 #ifdef CONFIG_BSD
 #include <sys/types.h>
@@ -1585,6 +1586,10 @@ int bdrv_flush(BlockDriverState *bs)
 }
 
 if (bs->drv && bs->drv->bdrv_flush) {
+if (*bs->device_name && event_tap_is_on()) {
+event_tap_bdrv_flush();
+}
+
 return bs->drv->bdrv_flush(bs);
 }
 
@@ -2220,6 +2225,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
 if (bdrv_check_request(bs, sector_num, nb_sectors))
 return NULL;
 
+if (*bs->device_name && event_tap_is_on()) {
+return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
+ cb, opaque);
+}
+
 if (bs->dirty_bitmap) {
 blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
  opaque);
@@ -2483,6 +2493,11 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
 
 if (!drv)
 return NULL;
+
+if (*bs->device_name && event_tap_is_on()) {
+return event_tap_bdrv_aio_flush(bs, cb, opaque);
+}
+
 return drv->bdrv_aio_flush(bs, cb, opaque);
 }
 
-- 
1.7.1.2
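
The condition used in all three hooks can be read as "only tap requests
that come from a guest-visible device, and only while the tap is on".
The sketch below models the bdrv_flush() case in isolation.  All Demo*
and demo_* names are hypothetical, and the counters stand in for the
real event-tap and driver calls.

```c
#include <stdbool.h>

/* Minimal stand-in for a block device state. */
typedef struct DemoBlockState {
    char device_name[32];  /* empty for internal (non-device) users */
    int driver_flushes;    /* how often the driver flush ran */
} DemoBlockState;

static bool demo_tap_on;       /* stand-in for event_tap_is_on() */
static int demo_tap_flushes;   /* how often the tap was notified */

/* Notify the tap only for named devices while the tap is enabled,
 * then always fall through to the driver flush. */
static int demo_flush(DemoBlockState *bs)
{
    if (*bs->device_name && demo_tap_on) {
        demo_tap_flushes++;
    }
    bs->driver_flushes++;
    return 0;
}
```

Note that the two asynchronous hooks in the patch differ from this
flush case: they return the event-tap result instead of falling through
to the driver.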




[Qemu-devel] [PATCH 00/18] Kemari for KVM v0.2.12

2011-02-23 Thread Yoshiaki Tamura
 dirty bitmap optimization which aren't ready for posting
yet.  To remove the dirty bitmap optimization, please look at HEAD~4
of the tree.

git://kemari.git.sourceforge.net/gitroot/kemari/kemari next

Thanks,

Yoshi

Yoshiaki Tamura (18):
  Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and
qemu_clear_buffer().
  Introduce read() to FdMigrationState.
  Introduce qemu_loadvm_state_no_header() and make qemu_loadvm_state()
a wrapper.
  qemu-char: export socket_set_nodelay().
  vl.c: add deleted flag for deleting the handler.
  virtio: decrement last_avail_idx with inuse before saving.
  Introduce fault tolerant VM transaction QEMUFile and ft_mode.
  savevm: introduce util functions to control ft_trans_file from savevm
layer.
  Introduce event-tap.
  Call init handler of event-tap at main() in vl.c.
  ioport: insert event_tap_ioport() to ioport_write().
  Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c.
  net: insert event-tap to qemu_send_packet() and
qemu_sendv_packet_async().
  block: insert event-tap to bdrv_aio_writev(), bdrv_aio_flush() and
bdrv_flush().
  savevm: introduce qemu_savevm_trans_{begin,commit}.
  migration: introduce migrate_ft_trans_{put,get}_ready(), and modify
migrate_fd_put_ready() when ft_mode is on.
  migration-tcp: modify tcp_accept_incoming_migration() to handle
ft_mode, and add a hack not to close fd when ft_mode is enabled.
  Introduce kemari: to enable FT migration mode (Kemari).

 Makefile.objs   |1 +
 Makefile.target |1 +
 block.c |   15 +
 event-tap.c |  940 +++
 event-tap.h |   44 +++
 exec.c  |4 +
 ft_trans_file.c |  624 
 ft_trans_file.h |   72 +
 hmp-commands.hx |4 +-
 hw/hw.h |7 +
 hw/virtio.c |   10 +-
 ioport.c|2 +
 migration-tcp.c |   82 +-
 migration.c |  289 +-
 migration.h |3 +
 net.c   |9 +
 qemu-char.c |2 +-
 qemu-tool.c |   28 ++
 qemu_socket.h   |1 +
 qmp-commands.hx |4 +-
 savevm.c|  372 +-
 sysemu.h|2 +
 trace-events|   25 ++
 vl.c|   18 +-
 24 files changed, 2471 insertions(+), 88 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h
 create mode 100644 ft_trans_file.c
 create mode 100644 ft_trans_file.h




[Qemu-devel] [PATCH 08/18] savevm: introduce util functions to control ft_trans_file from savevm layer.

2011-02-23 Thread Yoshiaki Tamura
To utilize ft_trans_file function, savevm needs interfaces to be
exported.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |5 ++
 savevm.c |  150 ++
 2 files changed, 155 insertions(+), 0 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index a168a37..a9eff5a 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -51,6 +51,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_ft_trans(int s_fd, int c_fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
@@ -60,6 +61,9 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
 void *qemu_realloc_buffer(QEMUFile *f, int size);
 void qemu_clear_buffer(QEMUFile *f);
+int qemu_ft_trans_begin(QEMUFile *f);
+int qemu_ft_trans_commit(QEMUFile *f);
+int qemu_ft_trans_cancel(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
@@ -94,6 +98,7 @@ void qemu_file_set_error(QEMUFile *f);
  * halted due to rate limiting or EAGAIN errors occur as it can be used to
  * resume output. */
 void qemu_file_put_notify(QEMUFile *f);
+void qemu_file_get_notify(void *opaque);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
diff --git a/savevm.c b/savevm.c
index fc62bcb..aa760b7 100644
--- a/savevm.c
+++ b/savevm.c
@@ -82,6 +82,7 @@
 #include "migration.h"
 #include "qemu_socket.h"
 #include "qemu-queue.h"
+#include "ft_trans_file.h"
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -189,6 +190,13 @@ typedef struct QEMUFileSocket
 QEMUFile *file;
 } QEMUFileSocket;
 
+typedef struct QEMUFileSocketTrans
+{
+int fd;
+QEMUFileSocket *s;
+VMChangeStateEntry *e;
+} QEMUFileSocketTrans;
+
 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
 QEMUFileSocket *s = opaque;
@@ -204,6 +212,22 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 return len;
 }
 
+static ssize_t socket_put_buffer(void *opaque, const void *buf, size_t size)
+{
+QEMUFileSocket *s = opaque;
+ssize_t len;
+
+do {
+len = send(s->fd, (void *)buf, size, 0);
+} while (len == -1 && socket_error() == EINTR);
+
+if (len == -1) {
+len = -socket_error();
+}
+
+return len;
+}
+
 static int socket_close(void *opaque)
 {
 QEMUFileSocket *s = opaque;
@@ -211,6 +235,71 @@ static int socket_close(void *opaque)
 return 0;
 }
 
+static int socket_trans_get_buffer(void *opaque, uint8_t *buf, int64_t pos, size_t size)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+ssize_t len;
+
+len = socket_get_buffer(s, buf, pos, size);
+
+return len;
+}
+
+static ssize_t socket_trans_put_buffer(void *opaque, const void *buf, size_t size)
+{
+QEMUFileSocketTrans *t = opaque;
+
+return socket_put_buffer(t->s, buf, size);
+}
+
+static int qemu_loadvm_state_no_header(QEMUFile *f);
+
+static int socket_trans_get_ready(void *opaque)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+QEMUFile *f = s->file;
+int ret = 0;
+
+ret = qemu_loadvm_state_no_header(f);
+if (ret < 0) {
+fprintf(stderr,
+"socket_trans_get_ready: error while loading vmstate\n");
+}
+
+return ret;
+}
+
+static int socket_trans_close(void *opaque)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+
+qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
+qemu_set_fd_handler2(t->fd, NULL, NULL, NULL, NULL);
+qemu_del_vm_change_state_handler(t->e);
+close(s->fd);
+close(t->fd);
+qemu_free(s);
+qemu_free(t);
+
+return 0;
+}
+
+static void socket_trans_resume(void *opaque, int running, int reason)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t->s;
+
+if (!running) {
+return;
+}
+
+qemu_announce_self();
+qemu_fclose(s->file);
+}
+
 static int stdio_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size)
 {
 QEMUFileStdio *s = opaque;
@@ -333,6 +422,26 @@ QEMUFile *qemu_fopen_socket(int fd)
 return s->file;
 }
 
+QEMUFile *qemu_fopen_ft_trans(int s_fd, int c_fd)
+{
+QEMUFileSocketTrans *t = qemu_mallocz(sizeof(QEMUFileSocketTrans));
+QEMUFileSocket *s = qemu_mallocz(sizeof(QEMUFileSocket));
+
+t->s = s;
+t->fd = s_fd;
+t->e = qemu_add_vm_change_state_handler(socket_trans_resume, t);
+
+s->fd = c_fd;
+s->file = qemu_fopen_ops_ft_trans(t, socket_trans_put_buffer,
+  socket_trans_get_buffer, NULL,
+  socket_trans_get_ready,
+  migrate_fd_wait_for_unfreeze

[Qemu-devel] [PATCH 17/18] migration-tcp: modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled.

2011-02-23 Thread Yoshiaki Tamura
When ft_mode is set in the header, tcp_accept_incoming_migration()
sets ft_trans_incoming() as a callback, and calls
qemu_file_get_notify() to receive FT transactions iteratively.  We also
need a hack not to close the fd before moving to ft_transaction mode, so
that we can reuse the fd for it.  vm_change_state_handler is added to
turn off ft_mode when cont is pressed.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-tcp.c |   67 ++-
 1 files changed, 66 insertions(+), 1 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 55777c8..84076d6 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -18,6 +18,8 @@
 #include "sysemu.h"
 #include "buffered_file.h"
 #include "block.h"
+#include "ft_trans_file.h"
+#include "event-tap.h"
 
 //#define DEBUG_MIGRATION_TCP
 
@@ -29,6 +31,8 @@
 do { } while (0)
 #endif
 
+static VMChangeStateEntry *vmstate;
+
 static int socket_errno(FdMigrationState *s)
 {
 return socket_error();
@@ -56,7 +60,8 @@ static int socket_read(FdMigrationState *s, const void * buf, size_t size)
 static int tcp_close(FdMigrationState *s)
 {
 DPRINTF("tcp_close\n");
-if (s->fd != -1) {
+/* FIX ME: accessing ft_mode here isn't clean */
+if (s->fd != -1 && ft_mode != FT_INIT) {
 close(s->fd);
 s->fd = -1;
 }
@@ -150,6 +155,36 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 return &s->mig_state;
 }
 
+static void ft_trans_incoming(void *opaque)
+{
+QEMUFile *f = opaque;
+
+qemu_file_get_notify(f);
+if (qemu_file_has_error(f)) {
+ft_mode = FT_ERROR;
+qemu_fclose(f);
+}
+}
+
+static void ft_trans_reset(void *opaque, int running, int reason)
+{
+QEMUFile *f = opaque;
+
+if (running) {
+if (ft_mode != FT_ERROR) {
+qemu_fclose(f);
+}
+ft_mode = FT_OFF;
+qemu_del_vm_change_state_handler(vmstate);
+}
+}
+
+static void ft_trans_schedule_replay(QEMUFile *f)
+{
+event_tap_schedule_replay();
+vmstate = qemu_add_vm_change_state_handler(ft_trans_reset, f);
+}
+
 static void tcp_accept_incoming_migration(void *opaque)
 {
 struct sockaddr_in addr;
@@ -175,8 +210,38 @@ static void tcp_accept_incoming_migration(void *opaque)
 goto out;
 }
 
+if (ft_mode == FT_INIT) {
+autostart = 0;
+}
+
 process_incoming_migration(f);
+
+if (ft_mode == FT_INIT) {
+int ret;
+
+socket_set_nodelay(c);
+
+f = qemu_fopen_ft_trans(s, c);
+if (f == NULL) {
+fprintf(stderr, "could not qemu_fopen_ft_trans\n");
+goto out;
+}
+
+/* need to wait sender to setup */
+ret = qemu_ft_trans_begin(f);
+if (ret < 0) {
+goto out;
+}
+
+qemu_set_fd_handler2(c, NULL, ft_trans_incoming, NULL, f);
+ft_trans_schedule_replay(f);
+ft_mode = FT_TRANSACTION_RECV;
+
+return;
+}
+
 qemu_fclose(f);
+
 out:
 close(c);
 out2:
-- 
1.7.1.2




[Qemu-devel] [PATCH 10/18] Call init handler of event-tap at main() in vl.c.

2011-02-23 Thread Yoshiaki Tamura
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 0bda77d..0ea1fb7 100644
--- a/vl.c
+++ b/vl.c
@@ -162,6 +162,7 @@ int main(int argc, char **argv)
 #include "qemu-queue.h"
 #include "cpus.h"
 #include "arch_init.h"
+#include "event-tap.h"
 
 #include "ui/qemu-spice.h"
 
@@ -2931,6 +2932,8 @@ int main(int argc, char **argv, char **envp)
 
 blk_mig_init();
 
+event_tap_init();
+
 /* open the virtual block devices */
 if (snapshot)
 qemu_opts_foreach(qemu_find_opts("drive"), drive_enable_snapshot, NULL, 0);
-- 
1.7.1.2




[Qemu-devel] [PATCH 11/18] ioport: insert event_tap_ioport() to ioport_write().

2011-02-23 Thread Yoshiaki Tamura
Record ioport event to replay it upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 ioport.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ioport.c b/ioport.c
index aa4188a..74aebf5 100644
--- a/ioport.c
+++ b/ioport.c
@@ -27,6 +27,7 @@
 
 #include "ioport.h"
 #include "trace.h"
+#include "event-tap.h"
 
 /***/
 /* IO Port */
@@ -76,6 +77,7 @@ static void ioport_write(int index, uint32_t address, uint32_t data)
 default_ioport_writel
 };
 IOPortWriteFunc *func = ioport_write_table[index][address];
+event_tap_ioport(index, address, data);
 if (!func)
 func = default_func[index];
 func(ioport_opaque[address], address, data);
-- 
1.7.1.2




[Qemu-devel] [PATCH 04/18] qemu-char: export socket_set_nodelay().

2011-02-23 Thread Yoshiaki Tamura
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 qemu-char.c   |2 +-
 qemu_socket.h |1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index bd4e944..c4f1940 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2111,7 +2111,7 @@ static void tcp_chr_telnet_init(int fd)
 send(fd, (char *)buf, 3, 0);
 }
 
-static void socket_set_nodelay(int fd)
+void socket_set_nodelay(int fd)
 {
 int val = 1;
 setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&val, sizeof(val));
diff --git a/qemu_socket.h b/qemu_socket.h
index 897a8ae..b7f8465 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -36,6 +36,7 @@ int inet_aton(const char *cp, struct in_addr *ia);
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 void socket_set_nonblock(int fd);
+void socket_set_nodelay(int fd);
 int send_all(int fd, const void *buf, int len1);
 
 /* New, ipv6-ready socket helper functions, see qemu-sockets.c */
-- 
1.7.1.2




[Qemu-devel] [PATCH 05/18] vl.c: add deleted flag for deleting the handler.

2011-02-23 Thread Yoshiaki Tamura
Make deleting handlers robust against deletion of any elements in a
handler by using a deleted flag like in file descriptors.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |   15 ++-
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/vl.c b/vl.c
index b436952..0bda77d 100644
--- a/vl.c
+++ b/vl.c
@@ -1158,6 +1158,7 @@ static void nographic_update(void *opaque)
 struct vm_change_state_entry {
 VMChangeStateHandler *cb;
 void *opaque;
+int deleted;
 QLIST_ENTRY (vm_change_state_entry) entries;
 };
 
@@ -1178,18 +1179,22 @@ VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
 
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e)
 {
-QLIST_REMOVE (e, entries);
-qemu_free (e);
+e->deleted = 1;
 }
 
 void vm_state_notify(int running, int reason)
 {
-VMChangeStateEntry *e;
+VMChangeStateEntry *e, *ne;
 
 trace_vm_state_notify(running, reason);
 
-for (e = vm_change_state_head.lh_first; e; e = e->entries.le_next) {
-e->cb(e->opaque, running, reason);
+QLIST_FOREACH_SAFE(e, &vm_change_state_head, entries, ne) {
+if (e->deleted) {
+QLIST_REMOVE(e, entries);
+qemu_free(e);
+} else {
+e->cb(e->opaque, running, reason);
+}
+}
 }
 }
 
-- 
1.7.1.2
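
To make the deferred-deletion idea concrete, here is a standalone sketch
using a plain singly linked list instead of QLIST.  All names are
hypothetical.  Deleting an entry only sets a flag, and the notifier
reaps flagged entries while it walks the list, so a handler may
unregister itself (or another handler) from inside its own callback
without invalidating the iteration.

```c
#include <assert.h>
#include <stdlib.h>

typedef void DemoHandler(void *opaque, int running);

typedef struct DemoEntry {
    DemoHandler *cb;
    void *opaque;
    int deleted;               /* set by demo_del(), reaped in demo_notify() */
    struct DemoEntry *next;
} DemoEntry;

static DemoEntry *demo_head;

/* Register a handler at the head of the list. */
static DemoEntry *demo_add(DemoHandler *cb, void *opaque)
{
    DemoEntry *e = calloc(1, sizeof(*e));

    assert(e != NULL);
    e->cb = cb;
    e->opaque = opaque;
    e->next = demo_head;
    demo_head = e;
    return e;
}

/* Deleting only marks the entry; memory is released later. */
static void demo_del(DemoEntry *e)
{
    e->deleted = 1;
}

/* Walk the list: unlink and free flagged entries, call the rest. */
static void demo_notify(int running)
{
    DemoEntry **link = &demo_head;

    while (*link) {
        DemoEntry *e = *link;

        if (e->deleted) {
            *link = e->next;
            free(e);
        } else {
            e->cb(e->opaque, running);
            link = &e->next;
        }
    }
}

/* Example handler that counts calls and unregisters itself. */
typedef struct DemoCounter {
    int calls;
    DemoEntry *self;
} DemoCounter;

static void demo_once(void *opaque, int running)
{
    DemoCounter *c = opaque;

    (void)running;
    c->calls++;
    demo_del(c->self);
}
```

One consequence of this scheme is that a flagged entry stays allocated
until the next notification runs, so the entry (and whatever its opaque
pointer refers to) must not be assumed freed right after deletion.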




[Qemu-devel] [PATCH 07/18] Introduce fault tolerant VM transaction QEMUFile and ft_mode.

2011-02-23 Thread Yoshiaki Tamura
This code implements VM transaction protocol.  Like buffered_file, it
sits between savevm and migration layer.  With this architecture, VM
transaction protocol is implemented mostly independent from other
existing code.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.objs   |1 +
 ft_trans_file.c |  624 +++
 ft_trans_file.h |   72 +++
 migration.c |3 +
 trace-events|   15 ++
 5 files changed, 715 insertions(+), 0 deletions(-)
 create mode 100644 ft_trans_file.c
 create mode 100644 ft_trans_file.h

diff --git a/Makefile.objs b/Makefile.objs
index c144df1..8856160 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -100,6 +100,7 @@ common-obj-y += msmouse.o ps2.o
 common-obj-y += qdev.o qdev-properties.o
 common-obj-y += block-migration.o
 common-obj-y += pflib.o
+common-obj-y += ft_trans_file.o
 
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
diff --git a/ft_trans_file.c b/ft_trans_file.c
new file mode 100644
index 000..2b42b95
--- /dev/null
+++ b/ft_trans_file.c
@@ -0,0 +1,624 @@
+/*
+ * Fault tolerant VM transaction QEMUFile
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * This source code is based on buffered_file.c.
+ * Copyright IBM, Corp. 2008
+ * Authors:
+ *  Anthony Liguori   <aligu...@us.ibm.com>
+ */
+
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "hw/hw.h"
+#include "qemu-timer.h"
+#include "sysemu.h"
+#include "qemu-char.h"
+#include "trace.h"
+#include "ft_trans_file.h"
+
+typedef struct FtTransHdr
+{
+uint16_t cmd;
+uint16_t id;
+uint32_t seq;
+uint32_t payload_len;
+} FtTransHdr;
+
+typedef struct QEMUFileFtTrans
+{
+FtTransPutBufferFunc *put_buffer;
+FtTransGetBufferFunc *get_buffer;
+FtTransPutReadyFunc *put_ready;
+FtTransGetReadyFunc *get_ready;
+FtTransWaitForUnfreezeFunc *wait_for_unfreeze;
+FtTransCloseFunc *close;
+void *opaque;
+QEMUFile *file;
+
+enum QEMU_VM_TRANSACTION_STATE state;
+uint32_t seq;
+uint16_t id;
+
+int has_error;
+
+bool freeze_output;
+bool freeze_input;
+bool rate_limit;
+bool is_sender;
+bool is_payload;
+
+uint8_t *buf;
+size_t buf_max_size;
+size_t put_offset;
+size_t get_offset;
+
+FtTransHdr header;
+size_t header_offset;
+} QEMUFileFtTrans;
+
+#define IO_BUF_SIZE 32768
+
+static void ft_trans_append(QEMUFileFtTrans *s,
+const uint8_t *buf, size_t size)
+{
+if (size > (s->buf_max_size - s->put_offset)) {
+trace_ft_trans_realloc(s->buf_max_size, size + 1024);
+s->buf_max_size += size + 1024;
+s->buf = qemu_realloc(s->buf, s->buf_max_size);
+}
+
+trace_ft_trans_append(size);
+memcpy(s->buf + s->put_offset, buf, size);
+s->put_offset += size;
+}
+
+static void ft_trans_flush(QEMUFileFtTrans *s)
+{
+size_t offset = 0;
+
+if (s->has_error) {
+error_report("flush when error %d, bailing", s->has_error);
+return;
+}
+
+while (offset < s->put_offset) {
+ssize_t ret;
+
+ret = s->put_buffer(s->opaque, s->buf + offset, s->put_offset - offset);
+if (ret == -EAGAIN) {
+break;
+}
+
+if (ret <= 0) {
+error_report("error flushing data, %s", strerror(errno));
+s->has_error = FT_TRANS_ERR_FLUSH;
+break;
+} else {
+offset += ret;
+}
+}
+
+trace_ft_trans_flush(offset, s->put_offset);
+memmove(s->buf, s->buf + offset, s->put_offset - offset);
+s->put_offset -= offset;
+s->freeze_output = !!s->put_offset;
+}
+
+static ssize_t ft_trans_put(void *opaque, void *buf, int size)
+{
+QEMUFileFtTrans *s = opaque;
+size_t offset = 0;
+ssize_t len;
+
+/* flush buffered data before putting next */
+if (s->put_offset) {
+ft_trans_flush(s);
+}
+
+while (!s->freeze_output && offset < size) {
+len = s->put_buffer(s->opaque, (uint8_t *)buf + offset, size - offset);
+
+if (len == -EAGAIN) {
+trace_ft_trans_freeze_output();
+s->freeze_output = 1;
+break;
+}
+
+if (len <= 0) {
+error_report("putting data failed, %s", strerror(errno));
+s->has_error = 1;
+offset = -EINVAL;
+break;
+}
+
+offset += len;
+}
+
+if (s->freeze_output) {
+ft_trans_append(s, buf + offset, size - offset);
+offset = size;
+}
+
+return offset;
+}
+
+static int ft_trans_send_header(QEMUFileFtTrans *s,
+enum QEMU_VM_TRANSACTION_STATE state,
+uint32_t payload_len)
+{
+int ret;
+FtTransHdr

[Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.

2011-02-23 Thread Yoshiaki Tamura
Currently FdMigrationState doesn't support read(), and this patch
introduces it to get response from the other side.  Note that this
won't change the existing migration protocol to be bi-directional.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-tcp.c |   15 +++
 migration.c |   13 +
 migration.h |3 +++
 3 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index b55f419..55777c8 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -39,6 +39,20 @@ static int socket_write(FdMigrationState *s, const void * buf, size_t size)
 return send(s->fd, buf, size, 0);
 }
 
+static int socket_read(FdMigrationState *s, const void * buf, size_t size)
+{
+ssize_t len;
+
+do {
+len = recv(s->fd, (void *)buf, size, 0);
+} while (len == -1 && socket_error() == EINTR);
+if (len == -1) {
+len = -socket_error();
+}
+
+return len;
+}
+
 static int tcp_close(FdMigrationState *s)
 {
 DPRINTF(tcp_close\n);
@@ -94,6 +108,7 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 
 s->get_error = socket_errno;
 s->write = socket_write;
+s->read = socket_read;
 s->close = tcp_close;
 s->mig_state.cancel = migrate_fd_cancel;
 s->mig_state.get_status = migrate_fd_get_status;
diff --git a/migration.c b/migration.c
index af3a1f2..302b8fe 100644
--- a/migration.c
+++ b/migration.c
@@ -340,6 +340,19 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
 return ret;
 }
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t size)
+{
+FdMigrationState *s = opaque;
+int ret;
+
+ret = s->read(s, data, size);
+if (ret == -1) {
+ret = -(s->get_error(s));
+}
+
+return ret;
+}
+
 void migrate_fd_connect(FdMigrationState *s)
 {
 int ret;
diff --git a/migration.h b/migration.h
index 2170792..88a6987 100644
--- a/migration.h
+++ b/migration.h
@@ -48,6 +48,7 @@ struct FdMigrationState
 int (*get_error)(struct FdMigrationState*);
 int (*close)(struct FdMigrationState*);
 int (*write)(struct FdMigrationState*, const void *, size_t);
+int (*read)(struct FdMigrationState *, const void *, size_t);
 void *opaque;
 };
 
@@ -116,6 +117,8 @@ void migrate_fd_put_notify(void *opaque);
 
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size);
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t size);
+
 void migrate_fd_connect(FdMigrationState *s);
 
 void migrate_fd_put_ready(void *opaque);
-- 
1.7.1.2
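
The read helper follows the usual "retry on EINTR, report other errors
as a negative errno" pattern.  Below is a standalone POSIX sketch of
that pattern.  The name demo_socket_read is hypothetical, it takes a
bare fd rather than an FdMigrationState, and it uses errno directly
where the patch goes through socket_error().

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Receive up to size bytes from fd.
 * Returns the byte count (0 means the peer closed the connection),
 * or a negative errno value on failure.  Interrupted calls are retried. */
static ssize_t demo_socket_read(int fd, void *buf, size_t size)
{
    ssize_t len;

    assert(buf != NULL);
    do {
        len = recv(fd, buf, size, 0);
    } while (len == -1 && errno == EINTR);

    if (len == -1) {
        len = -errno;
    }
    return len;
}
```

Since a helper of this shape already returns a negated error code, a
caller can simply test for a negative return value.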




[Qemu-devel] [PATCH 0/2] Fix error handling in migration when the peer is killed.

2011-02-22 Thread Yoshiaki Tamura
Hi,

During live migration, if the receiver side of qemu gets killed, the
sender side handles the error incorrectly: it passes the iterate phase
(stage 2) and moves on to the complete phase (stage 3) instead of
failing.  These patches fix the issue.

Yoshiaki Tamura (2):
  savevm: avoid qemu_savevm_state_iterate() to return 1 when qemu file
    has error.
  migration: add error handling to migrate_fd_put_notify().

 migration.c |9 +++--
 savevm.c|7 ---
 2 files changed, 7 insertions(+), 9 deletions(-)




[Qemu-devel] [PATCH 1/2] savevm: avoid qemu_savevm_state_iterate() to return 1 when qemu file has error.

2011-02-22 Thread Yoshiaki Tamura
When qemu on the receiver gets killed during live migration, if debug
is turned on, migrate_fd_put_ready() says,

migration: done iterating

and proceeds.  The reason was qemu_savevm_state_iterate() returning 1
even when the qemu file has an error.  This patch checks
qemu_file_has_error() before returning 1/0, and prevents
migrate_fd_put_ready() from proceeding in case of error.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 savevm.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/savevm.c b/savevm.c
index a50fd31..1a0be58 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1501,14 +1501,15 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
 }
 }
 
-if (ret)
-return 1;
-
 if (qemu_file_has_error(f)) {
 qemu_savevm_state_cancel(mon, f);
 return -EIO;
 }
 
+if (ret) {
+return 1;
+}
+
 return 0;
 }
 
-- 
1.7.1.2
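
The fix is purely a matter of ordering: the error check has to come
before the "done" check.  A minimal sketch of the resulting decision
logic, with a hypothetical function name and the two conditions passed
in as plain booleans:

```c
#include <errno.h>
#include <stdbool.h>

/* Result of one iterate step:
 *   -EIO  the stream has an error (takes priority over everything),
 *    1    all handlers reported completion,
 *    0    more iterations are needed. */
static int demo_iterate_result(bool handlers_done, bool file_has_error)
{
    if (file_has_error) {
        return -EIO;
    }
    return handlers_done ? 1 : 0;
}
```

With the old ordering, the combination "handlers done" plus "file has
error" returned 1, which is the case where the sender printed "done
iterating" and moved on.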




[Qemu-devel] [PATCH 2/2] migration: add error handling to migrate_fd_put_notify().

2011-02-22 Thread Yoshiaki Tamura
Although migrate_fd_put_buffer() sets MIG_STATE_ERROR if it fails,
migrate_fd_put_notify() doesn't check the error state of the underlying
QEMUFile, so its resources are kept open.  This patch checks it and
calls migrate_fd_error() in case of error.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |9 +++--
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/migration.c b/migration.c
index af3a1f2..14a125f 100644
--- a/migration.c
+++ b/migration.c
@@ -313,6 +313,9 @@ void migrate_fd_put_notify(void *opaque)
 
 qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
 qemu_file_put_notify(s->file);
+if (qemu_file_has_error(s->file)) {
+migrate_fd_error(s);
+}
 }
 
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
@@ -329,12 +332,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
 
 if (ret == -EAGAIN) {
 qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
-} else if (ret < 0) {
-if (s->mon) {
-monitor_resume(s->mon);
-}
-s->state = MIG_STATE_ERROR;
-notifier_list_notify(&migration_state_notifiers);
 }
 
 return ret;
-- 
1.7.1.2




Re: [Qemu-devel] Re: [PATCH 07/18] Introduce fault tolerant VM transaction QEMUFile and ft_mode.

2011-02-22 Thread Yoshiaki Tamura
2011/2/23 ya su suya94...@gmail.com:
 Yoshi:

    thanks for your explaining.
    if you introduce a new stage as 3, I think stage 1 also need to change as
 it will mark all pages dirty.
    looking forward to your new patch update.

Unless there are strong objections from others, I won't put it in
this series, because I want to avoid touching other components
as much as possible this time.

Yoshi


 Green.


 2011/2/21 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp

 Hi Green,

 2011/2/21 ya su suya94...@gmail.com:
  Yoshiaki:
 
      I have one question about ram_save_live, during migration 3
  stage (completion stage), it will call
  cpu_physical_memory_set_dirty_tracking(0) to stop recording ram dirty
  pages.
  at the end of migrate_ft_trans_connect function, it will invoke
  vm_start(),
  at this time, cpu_physical_memory_set_dirty_tracking(1) is not called
  yet,
  so there may have some ram pages not recorded when
  qemu_savevm_trans_begin
  is called.  I think you need to call
  cpu_physical_memory_set_dirty_tracking(1) in migrate_ft_trans_connect
  function, Am I right?

 Thank you for taking a look.
 When qemu_savevm_trans_begin is called for the first time, it
 calls ram_save_live with stage 1, that sends all pages and sets
 dirty tracking, so there won't be missing pages.  Note that
 event-tap is turned on by then, meaning no outputs are sent before
 finishing the first transaction.  I understand that this
 implementation is inefficient, and planning to introduce a new
 stage that is almost same as stage 3 but keeps dirty tracking in
 the future.

 Thanks,

 Yoshi

 
  BR
 
  Green.
 
 
  2011/2/10 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 
  This code implements VM transaction protocol.  Like buffered_file, it
  sits between savevm and migration layer.  With this architecture, VM
  transaction protocol is implemented mostly independent from other
  existing code.
 
  Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
  Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
  ---
   Makefile.objs   |    1 +
   ft_trans_file.c |  624
  +++
   ft_trans_file.h |   72 +++
   migration.c     |    3 +
   trace-events    |   15 ++
   5 files changed, 715 insertions(+), 0 deletions(-)
   create mode 100644 ft_trans_file.c
   create mode 100644 ft_trans_file.h
 
  diff --git a/Makefile.objs b/Makefile.objs
  index 353b1a8..04148b5 100644
  --- a/Makefile.objs
  +++ b/Makefile.objs
  @@ -100,6 +100,7 @@ common-obj-y += msmouse.o ps2.o
   common-obj-y += qdev.o qdev-properties.o
   common-obj-y += block-migration.o
   common-obj-y += pflib.o
  +common-obj-y += ft_trans_file.o
 
   common-obj-$(CONFIG_BRLAPI) += baum.o
   common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o
  migration-fd.o
  diff --git a/ft_trans_file.c b/ft_trans_file.c
  new file mode 100644
  index 000..2b42b95
  --- /dev/null
  +++ b/ft_trans_file.c
  @@ -0,0 +1,624 @@
  +/*
  + * Fault tolerant VM transaction QEMUFile
  + *
  + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
  + *
  + * This work is licensed under the terms of the GNU GPL, version 2.
   See
  + * the COPYING file in the top-level directory.
  + *
  + * This source code is based on buffered_file.c.
  + * Copyright IBM, Corp. 2008
  + * Authors:
  + *  Anthony Liguori        aligu...@us.ibm.com
  + */
  +
  +#include qemu-common.h
  +#include qemu-error.h
  +#include hw/hw.h
  +#include qemu-timer.h
  +#include sysemu.h
  +#include qemu-char.h
  +#include trace.h
  +#include ft_trans_file.h
  +
  +typedef struct FtTransHdr
  +{
  +    uint16_t cmd;
  +    uint16_t id;
  +    uint32_t seq;
  +    uint32_t payload_len;
  +} FtTransHdr;
  +
  +typedef struct QEMUFileFtTrans
  +{
  +    FtTransPutBufferFunc *put_buffer;
  +    FtTransGetBufferFunc *get_buffer;
  +    FtTransPutReadyFunc *put_ready;
  +    FtTransGetReadyFunc *get_ready;
  +    FtTransWaitForUnfreezeFunc *wait_for_unfreeze;
  +    FtTransCloseFunc *close;
  +    void *opaque;
  +    QEMUFile *file;
  +
  +    enum QEMU_VM_TRANSACTION_STATE state;
  +    uint32_t seq;
  +    uint16_t id;
  +
  +    int has_error;
  +
  +    bool freeze_output;
  +    bool freeze_input;
  +    bool rate_limit;
  +    bool is_sender;
  +    bool is_payload;
  +
  +    uint8_t *buf;
  +    size_t buf_max_size;
  +    size_t put_offset;
  +    size_t get_offset;
  +
  +    FtTransHdr header;
  +    size_t header_offset;
  +} QEMUFileFtTrans;
  +
  +#define IO_BUF_SIZE 32768
  +
  +static void ft_trans_append(QEMUFileFtTrans *s,
  +                            const uint8_t *buf, size_t size)
  +{
  +    if (size  (s-buf_max_size - s-put_offset)) {
  +        trace_ft_trans_realloc(s-buf_max_size, size + 1024);
  +        s-buf_max_size += size + 1024;
  +        s-buf = qemu_realloc(s-buf, s-buf_max_size);
  +    }
  +
  +    trace_ft_trans_append(size);
  +    memcpy(s-buf + s-put_offset, buf, size);
  +    s-put_offset += size
