date:20110223

Re: [Qemu-devel] how to judge machine's state in user-level/kernel-level

2011-02-23 Thread Stefan Hajnoczi

On Wed, Feb 23, 2011 at 4:05 AM, Jianhui Yue jianhui@gmail.com wrote:
 hi, All,
       I am wondering how to determine machine's state in user-level
 or kernel level from Operating System perspective.
 what's meaning of CPUState.exception_is_int?  Thanks.

If kernel level from Operating System perspective means privileged
then you could look at the current privilege level in a target
architecture-specific way.  So on x86 look at the code segment
selector.

Stefan

Re: [Qemu-devel] [PATCH 03/22] migration: Fold MigrationState into FdMigrationState

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration-exec.c |   10 +-
  migration-fd.c   |   10 +-
  migration-tcp.c  |   10 +-
  migration-unix.c |   10 +-
  migration.c      |   11 +--
  migration.h      |   23 +--
  6 files changed, 30 insertions(+), 44 deletions(-)

 diff --git a/migration-exec.c b/migration-exec.c
 index b49a475..02b0667 100644
 --- a/migration-exec.c
 +++ b/migration-exec.c
 @@ -93,12 +93,12 @@ FdMigrationState *exec_start_outgoing_migration(Monitor 
 *mon,
     s-close = exec_close;
     s-get_error = file_errno;
     s-write = file_write;
 -    s-mig_state.cancel = migrate_fd_cancel;
 -    s-mig_state.get_status = migrate_fd_get_status;
 -    s-mig_state.release = migrate_fd_release;
 +    s-cancel = migrate_fd_cancel;
 +    s-get_status = migrate_fd_get_status;
 +    s-release = migrate_fd_release;

 -    s-mig_state.blk = blk;
 -    s-mig_state.shared = inc;
 +    s-blk = blk;
 +    s-shared = inc;

     s-state = MIG_STATE_ACTIVE;
     s-mon = NULL;
 diff --git a/migration-fd.c b/migration-fd.c
 index bd5e8a9..ccba86b 100644
 --- a/migration-fd.c
 +++ b/migration-fd.c
 @@ -76,12 +76,12 @@ FdMigrationState *fd_start_outgoing_migration(Monitor 
 *mon,
     s-get_error = fd_errno;
     s-write = fd_write;
     s-close = fd_close;
 -    s-mig_state.cancel = migrate_fd_cancel;
 -    s-mig_state.get_status = migrate_fd_get_status;
 -    s-mig_state.release = migrate_fd_release;
 +    s-cancel = migrate_fd_cancel;
 +    s-get_status = migrate_fd_get_status;
 +    s-release = migrate_fd_release;

 -    s-mig_state.blk = blk;
 -    s-mig_state.shared = inc;
 +    s-blk = blk;
 +    s-shared = inc;

     s-state = MIG_STATE_ACTIVE;
     s-mon = NULL;
 diff --git a/migration-tcp.c b/migration-tcp.c
 index 355bc37..02b01ed 100644
 --- a/migration-tcp.c
 +++ b/migration-tcp.c
 @@ -95,12 +95,12 @@ FdMigrationState *tcp_start_outgoing_migration(Monitor 
 *mon,
     s-get_error = socket_errno;
     s-write = socket_write;
     s-close = tcp_close;
 -    s-mig_state.cancel = migrate_fd_cancel;
 -    s-mig_state.get_status = migrate_fd_get_status;
 -    s-mig_state.release = migrate_fd_release;
 +    s-cancel = migrate_fd_cancel;
 +    s-get_status = migrate_fd_get_status;
 +    s-release = migrate_fd_release;

 -    s-mig_state.blk = blk;
 -    s-mig_state.shared = inc;
 +    s-blk = blk;
 +    s-shared = inc;

     s-state = MIG_STATE_ACTIVE;
     s-mon = NULL;
 diff --git a/migration-unix.c b/migration-unix.c
 index b9b0dbf..fb73f46 100644
 --- a/migration-unix.c
 +++ b/migration-unix.c
 @@ -94,12 +94,12 @@ FdMigrationState *unix_start_outgoing_migration(Monitor 
 *mon,
     s-get_error = unix_errno;
     s-write = unix_write;
     s-close = unix_close;
 -    s-mig_state.cancel = migrate_fd_cancel;
 -    s-mig_state.get_status = migrate_fd_get_status;
 -    s-mig_state.release = migrate_fd_release;
 +    s-cancel = migrate_fd_cancel;
 +    s-get_status = migrate_fd_get_status;
 +    s-release = migrate_fd_release;

 -    s-mig_state.blk = blk;
 -    s-mig_state.shared = inc;
 +    s-blk = blk;
 +    s-shared = inc;

     s-state = MIG_STATE_ACTIVE;
     s-mon = NULL;
 diff --git a/migration.c b/migration.c
 index 3a371a3..dd4cdab 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -86,7 +86,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
     const char *uri = qdict_get_str(qdict, uri);

     if (current_migration 
 -        current_migration-mig_state.get_status(current_migration) == 
 MIG_STATE_ACTIVE) {
 +        current_migration-get_status(current_migration) == 
 MIG_STATE_ACTIVE) {
         monitor_printf(mon, migration already in progress\n);
         return -1;
     }
 @@ -120,7 +120,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
     }

     if (current_migration) {
 -        current_migration-mig_state.release(current_migration);
 +        current_migration-release(current_migration);
     }

     current_migration = s;
 @@ -133,7 +133,7 @@ int do_migrate_cancel(Monitor *mon, const QDict *qdict, 
 QObject **ret_data)
     FdMigrationState *s = current_migration;

     if (s)
 -        s-mig_state.cancel(s);
 +        s-cancel(s);

     return 0;
  }
 @@ -229,7 +229,7 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
     QDict *qdict;

     if (current_migration) {
 -        MigrationState *s = current_migration-mig_state;
 +        FdMigrationState *s = current_migration;

         switch (s-get_status(current_migration)) {
         case MIG_STATE_ACTIVE:
 @@ -353,8 +353,7 @@ void migrate_fd_connect(FdMigrationState *s)
                                       migrate_fd_close);

     DPRINTF(beginning savevm\n);
 -    ret = qemu_savevm_state_begin(s-mon, s-file, s-mig_state.blk,
 -                                  s-mig_state.shared);
 +    ret = qemu_savevm_state_begin(s-mon, s-file, s-blk, s-shared);

Re: [Qemu-devel] [PATCH 17/22] migration: use global variable directly

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 We are setting a pointer to a local variable in the previous line, just use
 the global variable directly.  We remove the -file test because it is already
 done inside qemu_file_set_rate_limit() function.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    6 ++
  1 files changed, 2 insertions(+), 4 deletions(-)

 diff --git a/migration.c b/migration.c
 index d7dfe1e..accc6e4 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -451,7 +451,6 @@ int do_migrate_cancel(Monitor *mon, const QDict *qdict, 
 QObject **ret_data)
  int do_migrate_set_speed(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
  {
     int64_t d;
 -    MigrationState *s;

     d = qdict_get_int(qdict, value);
     if (d  0) {
 @@ -459,9 +458,8 @@ int do_migrate_set_speed(Monitor *mon, const QDict 
 *qdict, QObject **ret_data)
     }
     max_throttle = d;

 -    s = current_migration;
 -    if (s  s-file) {
 -        qemu_file_set_rate_limit(s-file, max_throttle);
 +    if (current_migration) {
 +        qemu_file_set_rate_limit(current_migration-file, max_throttle);
     }

     return 0;

Looks good to me.

Yoshi

 --
 1.7.4

Re: [Qemu-devel] Re: QEMU regression problems - Update FPU

2011-02-23 Thread Peter Maydell

On 18 February 2011 07:12, Gerhard Wiesinger li...@wiesinger.com wrote:
 Issue 1.) with FPU still present
 I tracked down the problematic code and it is a rounding error from double
 precision to 64bit floats: Any ideas how to fix such an issue in general?

 QEMU result in ST0: 0.42925860786976457 (wrong emulated)
 KVM result in ST0:  0.42925860786975449 (correct)

This is an error when running QEMU in TCG mode, right?
At the moment x86 is the odd-one-out in that it doesn't
use CONFIG_SOFTFLOAT for its FPU emulation, so somebody
has made an explicit choice of preferring speed over
accuracy, and I am unsurprised that there are rounding
errors as a result.

-- PMM

Re: [Qemu-devel] [PATCH 12/22] migration: Use migrate_fd_error() in last place that set status to ERROR

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 We are also calling to migrate_fd_cleanup(), but notice that it is the
 right thing to do.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    6 +-
  1 files changed, 1 insertions(+), 5 deletions(-)

 diff --git a/migration.c b/migration.c
 index ab98664..3983257 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -351,11 +351,7 @@ static ssize_t migrate_fd_put_buffer(void *opaque, const 
 void *data, size_t size
     if (ret == -EAGAIN) {
         qemu_set_fd_handler2(s-fd, NULL, NULL, migrate_fd_put_notify, s);
     } else if (ret  0) {
 -        if (s-mon) {
 -            monitor_resume(s-mon);
 -        }
 -        s-state = MIG_STATE_ERROR;
 -        notifier_list_notify(migration_state_notifiers);
 +        migrate_fd_error(s);
     }

Are you sure about this?  migrate_fd_error may call qemu_fclose
through migrate_fd_cleanup, but the caller of
migrate_fd_put_buffer gets called by buffered_file that sits
under qemu file.  In my previous posting,

http://permalink.gmane.org/gmane.comp.emulators.qemu/94688

I thought migrate_fd_put_buffer should just return error, and let
the original caller (migrate_fd_put_notify or any) to actually call
migrate_fd_error.

Thanks,

Yoshi


     return ret;
 --
 1.7.4

Re: [Qemu-devel] [PATCH] `qdev_free` when unplug a pci device

2011-02-23 Thread Markus Armbruster

Isaku Yamahata yamah...@valinux.co.jp writes:

 On Tue, Feb 22, 2011 at 06:36:20PM +0100, William Dauchy wrote:
  `qdev_free` when unplug a pci device
  It resolves a bug when detaching/attaching a network device
 
  # virsh detach-interface dom0 network --mac 52:54:00:f6:84:ba
  Interface detached successfully
 
  # virsh attach-interface dom0 network default --mac 52:54:00:f6:84:ba
  error: Failed to attach interface
  error: internal error unable to execute QEMU command 'device_add': 
 Duplicate ID 'net0' for device
 
 A detached pci device was not freed of the list `qemu_device_opts`
 ---
  hw/pci.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)
 
 diff --git a/hw/pci.c b/hw/pci.c
 index 8b76cea..1e9a8f0 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -1671,13 +1671,16 @@ static int pci_unplug_device(DeviceState *qdev)
  {
  PCIDevice *dev = DO_UPCAST(PCIDevice, qdev, qdev);
  PCIDeviceInfo *info = container_of(qdev-info, PCIDeviceInfo, qdev);
 +int unplug;
  
  if (info-no_hotplug) {
  qerror_report(QERR_DEVICE_NO_HOTPLUG, info-qdev.name);
  return -1;
  }
 -return dev-bus-hotplug(dev-bus-hotplug_qdev, dev,
 +unplug = dev-bus-hotplug(dev-bus-hotplug_qdev, dev,
   PCI_HOTPLUG_DISABLED);
 +qdev_free(qdev);

 Good catch.
 qdev_free() should be called only when unplug had succeeded.
 Although piix4 unplug always successes, pcie unplug may fail.

 +return unplug;
  }
  
  void pci_qdev_register(PCIDeviceInfo *info)

I don't think this patch is correct.  Let me explain.

Device hot unplug is *not* guaranteed to succeed.

For some buses, such as USB, it always succeeds immediately, i.e. when
the device_del monitor command finishes, the device is gone.  Live is
good.

But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
doesn't wait for the dance to complete.  Why?  The dance can take an
unpredictable amount of time, including forever.

Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
slot, and the unplug has not yet completed (race condition), or it
failed.  Yes, Virginia, PCI hotplug *can* fail.

When unplug succeeds, the qdev is automatically destroyed.
pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
does it for PCIE.

Your patch adds a *second* qdev_free().  No good.

Re: [Qemu-devel] [PATCH 08/22] migration: Check that migration is active before cancel it

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/migration.c b/migration.c
 index 397a0b9..55f58c8 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -138,7 +138,7 @@ int do_migrate_cancel(Monitor *mon, const QDict *qdict, 
 QObject **ret_data)
  {
     MigrationState *s = current_migration;

 -    if (s)
 +    if (s  s-get_status(s) == MIG_STATE_ACTIVE)
         s-cancel(s);

     return 0;

Why don't you remove *s again?

Yoshi

 --
 1.7.4

Re: [Qemu-devel] [PATCH 22/22] migration: Make state definitions local

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    6 ++
  migration.h |    6 --
  2 files changed, 6 insertions(+), 6 deletions(-)

 diff --git a/migration.c b/migration.c
 index 383ebaf..90fc2a0 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -31,6 +31,12 @@
     do { } while (0)
  #endif

 +#define MIG_STATE_ERROR                -1
 +#define MIG_STATE_NONE         0
 +#define MIG_STATE_CANCELLED    1
 +#define MIG_STATE_ACTIVE       2
 +#define MIG_STATE_COMPLETED    3
 +
  static MigrationState current_migration = {
     .state = MIG_STATE_NONE,
      /* Migration speed throttling */
 diff --git a/migration.h b/migration.h
 index 9457807..493fbe5 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -18,12 +18,6 @@
  #include qemu-common.h
  #include notify.h

 -#define MIG_STATE_ERROR                -1
 -#define MIG_STATE_NONE         0
 -#define MIG_STATE_CANCELLED    1
 -#define MIG_STATE_ACTIVE       2
 -#define MIG_STATE_COMPLETED    3
 -

Although you're right, I would prefer to keep it so that somebody
outside of migration may understand the status in the future if
there are no harms.

Yoshi

  typedef struct MigrationState MigrationState;

  struct MigrationState
 --
 1.7.4

[Qemu-devel] [PATCH] reset nic when we init it

2011-02-23 Thread Wen Congyang

When we hotplug a nic to guest OS, we can see it
but the mac address of nic is wrong.

The reason of this bug is guest OS does not reset
nic, and we init eeprom(rtl8139) only when we reset
nic.

I think we should reset nic when we init it.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com

---
 hw/e1000.c |   14 --
 hw/pcnet-pci.c |   19 +++
 hw/rtl8139.c   |1 +
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 0a4574c..5bd91a8 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1137,6 +1137,12 @@ static NetClientInfo net_e1000_info = {
 .link_status_changed = e1000_set_link_status,
 };
 
+static void qdev_e1000_reset(DeviceState *dev)
+{
+E1000State *d = DO_UPCAST(E1000State, dev.qdev, dev);
+e1000_reset(d);
+}
+
 static int pci_e1000_init(PCIDevice *pci_dev)
 {
 E1000State *d = DO_UPCAST(E1000State, dev, pci_dev);
@@ -1186,13 +1192,9 @@ static int pci_e1000_init(PCIDevice *pci_dev)
 
 add_boot_device_path(d-conf.bootindex, pci_dev-qdev, /ethernet-phy@0);
 
-return 0;
-}
+qdev_e1000_reset(pci_dev-qdev);
 
-static void qdev_e1000_reset(DeviceState *dev)
-{
-E1000State *d = DO_UPCAST(E1000State, dev.qdev, dev);
-e1000_reset(d);
+return 0;
 }
 
 static PCIDeviceInfo e1000_info = {
diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c
index 339a401..205f2c7 100644
--- a/hw/pcnet-pci.c
+++ b/hw/pcnet-pci.c
@@ -265,11 +265,19 @@ static NetClientInfo net_pci_pcnet_info = {
 .cleanup = pci_pcnet_cleanup,
 };
 
+static void pci_reset(DeviceState *dev)
+{
+PCIPCNetState *d = DO_UPCAST(PCIPCNetState, pci_dev.qdev, dev);
+
+pcnet_h_reset(d-state);
+}
+
 static int pci_pcnet_init(PCIDevice *pci_dev)
 {
 PCIPCNetState *d = DO_UPCAST(PCIPCNetState, pci_dev, pci_dev);
 PCNetState *s = d-state;
 uint8_t *pci_conf;
+int ret;
 
 #if 0
 printf(sizeof(RMD)=%d, sizeof(TMD)=%d\n,
@@ -315,14 +323,9 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
 }
 }
 
-return pcnet_common_init(pci_dev-qdev, s, net_pci_pcnet_info);
-}
-
-static void pci_reset(DeviceState *dev)
-{
-PCIPCNetState *d = DO_UPCAST(PCIPCNetState, pci_dev.qdev, dev);
-
-pcnet_h_reset(d-state);
+ret = pcnet_common_init(pci_dev-qdev, s, net_pci_pcnet_info);
+pci_reset(pci_dev-qdev);
+return ret;
 }
 
 static PCIDeviceInfo pcnet_info = {
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index a22530c..15425a8 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -3405,6 +3405,7 @@ static int pci_rtl8139_init(PCIDevice *dev)
 rtl8139_set_next_tctr_time(s, qemu_get_clock(vm_clock));
 
 add_boot_device_path(s-conf.bootindex, dev-qdev, /ethernet-phy@0);
+rtl8139_reset(dev-qdev);
 
 return 0;
 }
-- 
1.7.1

Re: [Qemu-devel] [PATCH 14/22] migration: Remove get_status() accessor

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 It is only used inside migration.c, and fields on that struct are
 accessed all around the place on that file.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |   16 +---
  migration.h |    1 -
  2 files changed, 5 insertions(+), 12 deletions(-)

 diff --git a/migration.c b/migration.c
 index dfe6a96..2b873fa 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -90,7 +90,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
     int ret;

     if (current_migration 
 -        current_migration-get_status(current_migration) == 
 MIG_STATE_ACTIVE) {
 +        current_migration-state == MIG_STATE_ACTIVE) {
         monitor_printf(mon, migration already in progress\n);
         return -1;
     }
 @@ -135,7 +135,7 @@ int do_migrate_cancel(Monitor *mon, const QDict *qdict, 
 QObject **ret_data)
  {
     MigrationState *s = current_migration;

 -    if (s  s-get_status(s) == MIG_STATE_ACTIVE)
 +    if (s  s-state == MIG_STATE_ACTIVE)
         s-cancel(s);

     return 0;
 @@ -234,7 +234,7 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
     if (current_migration) {
         MigrationState *s = current_migration;

 -        switch (s-get_status(current_migration)) {
 +        switch (s-state) {
         case MIG_STATE_NONE:
             /* no migration has happened ever */
             break;
 @@ -375,7 +375,7 @@ static void migrate_fd_put_ready(void *opaque)
         } else {
             migrate_fd_completed(s);
         }
 -        if (s-get_status(s) != MIG_STATE_COMPLETED) {
 +        if (s-state != MIG_STATE_COMPLETED) {
             if (old_vm_running) {
                 vm_start();
             }
 @@ -383,11 +383,6 @@ static void migrate_fd_put_ready(void *opaque)
     }
  }

 -static int migrate_fd_get_status(MigrationState *s)
 -{
 -    return s-state;
 -}
 -
  static void migrate_fd_cancel(MigrationState *s)
  {
     if (s-state != MIG_STATE_ACTIVE)
 @@ -442,7 +437,7 @@ void remove_migration_state_change_notifier(Notifier 
 *notify)
  int get_migration_state(void)
  {
     if (current_migration) {
 -        return migrate_fd_get_status(current_migration);
 +        return current_migration-state;
     } else {
         return MIG_STATE_ERROR;
     }
 @@ -477,7 +472,6 @@ static MigrationState *migrate_create_state(Monitor *mon, 
 int64_t bandwidth_limi
     MigrationState *s = qemu_mallocz(sizeof(*s));

     s-cancel = migrate_fd_cancel;
 -    s-get_status = migrate_fd_get_status;
     s-blk = blk;
     s-shared = inc;
     s-mon = NULL;
 diff --git a/migration.h b/migration.h
 index 5455d8b..58a6e06 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -37,7 +37,6 @@ struct MigrationState
     int (*close)(MigrationState*);
     int (*write)(MigrationState*, const void *, size_t);
     void (*cancel)(MigrationState *s);
 -    int (*get_status)(MigrationState *s);
     void *opaque;
     int blk;
     int shared;

I agree to access s-state directly inside of migration.c, but I
disagree to remove get_status() accessor right away.  We don't
have strong motivations for doing that AFAIK.

Yoshi

 --
 1.7.4

Re: [Qemu-devel] [PATCH 2/2 V1] Fixed EPROM for AMD driver compatibility under DOS with Netware driver

2011-02-23 Thread Peter Maydell

On 22 February 2011 21:00, Gerhard Wiesinger li...@wiesinger.com wrote:
 Hello,

 No comments? Can someone commit?

Not my area of the code, but some general remarks:

(1) your patch email seems to be in an odd format with two patches
concatenated, the first of which just changes a line introduced in
the first. You should fix this and resubmit as a single patch in
the right format that makes the changes you want.

(2) Style issues: QEMU's coding style is for C-style comments
(/* .. */) not C++-style //. Also lines should be 80 characters
maximum, so your end-of-line comments should probably be moved.

(3) Your commit message could be better. Convention is that the
summary (subject) should start with an indication of what area
is being patched, eg
  hw/pcnet.c: Fix EPROM contents to suit AMD netware drivers

and then in the body of the commit message you have more detail.
Remarks about what you're fixing and what you've tested go
here, not in comments like:
// bugfix under DOS with AMD netware driver: AMD PCNTNW Ethernet MLID
v3.10 (960115), network card not found
// works well under DOS with AMD NDIS driver v2.0.1, Knoppix 6.2 ok

If you fix these cosmetic issues you'll have a patch which
is easier to review and more likely to be applied.

-- PMM

Re: [Qemu-devel] [PATCH 21/22] migration: Export a function that tells if the migration has finished correctly

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 This will allows us to hide the status values.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c     |    4 ++--
  migration.h     |    2 +-
  ui/spice-core.c |    4 +---
  3 files changed, 4 insertions(+), 6 deletions(-)

 diff --git a/migration.c b/migration.c
 index 312a029..383ebaf 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -335,9 +335,9 @@ void remove_migration_state_change_notifier(Notifier 
 *notify)
     notifier_list_remove(migration_state_notifiers, notify);
  }

 -int get_migration_state(void)
 +bool migration_has_finished(void)
  {
 -    return current_migration.state;
 +    return current_migration.state == MIG_STATE_COMPLETED;
  }

  void migrate_fd_connect(MigrationState *s)
 diff --git a/migration.h b/migration.h
 index 6477b51..9457807 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -82,6 +82,6 @@ void migrate_fd_connect(MigrationState *s);

  void add_migration_state_change_notifier(Notifier *notify);
  void remove_migration_state_change_notifier(Notifier *notify);
 -int get_migration_state(void);
 +bool migration_has_finished(void);

  #endif
 diff --git a/ui/spice-core.c b/ui/spice-core.c
 index 1aa1a5e..997603d 100644
 --- a/ui/spice-core.c
 +++ b/ui/spice-core.c
 @@ -422,9 +422,7 @@ void do_info_spice(Monitor *mon, QObject **ret_data)

  static void migration_state_notifier(Notifier *notifier)
  {
 -    int state = get_migration_state();
 -
 -    if (state == MIG_STATE_COMPLETED) {
 +    if (migration_has_finished()) {
  #if SPICE_SERVER_VERSION = 0x000701 /* 0.7.1 */
         spice_server_migrate_switch(spice_server);
  #endif

I agree to add migration_has_finished, but I don't see why we
want to remove get_migration_state.  Are we going to make
migration_has_* for each state even migration gets complicated?

Yoshi

 --
 1.7.4

Re: [Qemu-devel] [PATCH 19/22] migration: convert current_migration from pointer to struct

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 This cleans up a lot the code as we don't have to check anymore if
 the variable is NULL or not.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |  119 --
  1 files changed, 49 insertions(+), 70 deletions(-)

 diff --git a/migration.c b/migration.c
 index 4014330..7b1e679 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -34,7 +34,9 @@
  /* Migration speed throttling */
  static int64_t max_throttle = (32  20);

 -static MigrationState *current_migration;
 +static MigrationState current_migration = {
 +    .state = MIG_STATE_NONE,
 +};

  static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 @@ -135,37 +137,34 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
  {
     QDict *qdict;

 -    if (current_migration) {
 -
 -        switch (current_migration-state) {
 -        case MIG_STATE_NONE:
 -            /* no migration has happened ever */
 -            break;
 -        case MIG_STATE_ACTIVE:
 -            qdict = qdict_new();
 -            qdict_put(qdict, status, qstring_from_str(active));
 -
 -            migrate_put_status(qdict, ram, ram_bytes_transferred(),
 -                               ram_bytes_remaining(), ram_bytes_total());
 -
 -            if (blk_mig_active()) {
 -                migrate_put_status(qdict, disk, 
 blk_mig_bytes_transferred(),
 -                                   blk_mig_bytes_remaining(),
 -                                   blk_mig_bytes_total());
 -            }
 -
 -            *ret_data = QOBJECT(qdict);
 -            break;
 -        case MIG_STATE_COMPLETED:
 -            *ret_data = qobject_from_jsonf({ 'status': 'completed' });
 -            break;
 -        case MIG_STATE_ERROR:
 -            *ret_data = qobject_from_jsonf({ 'status': 'failed' });
 -            break;
 -        case MIG_STATE_CANCELLED:
 -            *ret_data = qobject_from_jsonf({ 'status': 'cancelled' });
 -            break;
 +    switch (current_migration.state) {
 +    case MIG_STATE_NONE:
 +        /* no migration has happened ever */
 +        break;
 +    case MIG_STATE_ACTIVE:
 +        qdict = qdict_new();
 +        qdict_put(qdict, status, qstring_from_str(active));
 +
 +        migrate_put_status(qdict, ram, ram_bytes_transferred(),
 +                           ram_bytes_remaining(), ram_bytes_total());
 +
 +        if (blk_mig_active()) {
 +            migrate_put_status(qdict, disk, blk_mig_bytes_transferred(),
 +                               blk_mig_bytes_remaining(),
 +                               blk_mig_bytes_total());
         }
 +
 +        *ret_data = QOBJECT(qdict);
 +        break;
 +    case MIG_STATE_COMPLETED:
 +        *ret_data = qobject_from_jsonf({ 'status': 'completed' });
 +        break;
 +    case MIG_STATE_ERROR:
 +        *ret_data = qobject_from_jsonf({ 'status': 'failed' });
 +        break;
 +    case MIG_STATE_CANCELLED:
 +        *ret_data = qobject_from_jsonf({ 'status': 'cancelled' });
 +        break;
     }
  }

 @@ -339,11 +338,7 @@ void remove_migration_state_change_notifier(Notifier 
 *notify)

  int get_migration_state(void)
  {
 -    if (current_migration) {
 -        return current_migration-state;
 -    } else {
 -        return MIG_STATE_ERROR;
 -    }
 +    return current_migration.state;
  }

  void migrate_fd_connect(MigrationState *s)
 @@ -369,27 +364,22 @@ void migrate_fd_connect(MigrationState *s)
     migrate_fd_put_ready(s);
  }

 -static MigrationState *migrate_create_state(Monitor *mon, int64_t 
 bandwidth_limit,
 -                                            int detach, int blk, int inc)
 +static void migrate_init_state(Monitor *mon, int64_t bandwidth_limit,
 +                               int detach, int blk, int inc)
  {
 -    MigrationState *s = qemu_mallocz(sizeof(*s));
 -
 -    s-blk = blk;
 -    s-shared = inc;
 -    s-mon = NULL;
 -    s-bandwidth_limit = bandwidth_limit;
 -    s-state = MIG_STATE_NONE;
 +    current_migration.blk = blk;
 +    current_migration.shared = inc;
 +    current_migration.mon = NULL;
 +    current_migration.bandwidth_limit = bandwidth_limit;
 +    current_migration.state = MIG_STATE_NONE;

     if (!detach) {
 -        migrate_fd_monitor_suspend(s, mon);
 +        migrate_fd_monitor_suspend(current_migration, mon);
     }
 -
 -    return s;
  }

  int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
  {
 -    MigrationState *s = NULL;
     const char *p;
     int detach = qdict_get_try_bool(qdict, detach, 0);
     int blk = qdict_get_try_bool(qdict, blk, 0);
 @@ -397,8 +387,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
     const char *uri = qdict_get_str(qdict, uri);
     int ret;

 -    if (current_migration 
 -        current_migration-state == MIG_STATE_ACTIVE) {
 +    if (current_migration.state == MIG_STATE_ACTIVE) {
         monitor_printf(mon, migration already

Re: [Qemu-devel] [PATCH 18/22] migration: another case of global variable assigned to local one

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    3 +--
  1 files changed, 1 insertions(+), 2 deletions(-)

 diff --git a/migration.c b/migration.c
 index accc6e4..4014330 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -136,9 +136,8 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
     QDict *qdict;

     if (current_migration) {
 -        MigrationState *s = current_migration;

 -        switch (s-state) {
 +        switch (current_migration-state) {
         case MIG_STATE_NONE:
             /* no migration has happened ever */
             break;

Looks good to me.

Yoshi

 --
 1.7.4

[Qemu-devel] Re: [PATCH 00/22] Refactor and cleaup migration code

2011-02-23 Thread Paolo Bonzini


On 02/23/2011 01:44 AM, Juan Quintela wrote:

This series:
- Fold MigrationState into FdMigrationState (and then rename)
- Factorize migration statec creation in a single place
- Make use of MIG_STATE_*, setup through helpers and make them local
- remove relase  cancel callbacks (where used only one in same
   file than defined)
- get_status() is no more, just access directly to .state
- current_migration use cleanup, and make variable static
- max_throotle is gone, now inside current_migration
- change get_migration_status() to migration_has_finished()
   and actualize single user.

Please review.


Some changes are a matter of taste, but overall looks very nice!

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Paolo

Re: [Qemu-devel] [PATCH 16/22] migration: Move exported functions to the end of the file

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 This means we can remove the two forward declarations.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |  188 
 +--
  1 files changed, 92 insertions(+), 96 deletions(-)

 diff --git a/migration.c b/migration.c
 index 92bff01..d7dfe1e 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -76,90 +76,6 @@ void process_incoming_migration(QEMUFile *f)
         vm_start();
  }

 -static MigrationState *migrate_create_state(Monitor *mon, int64_t 
 bandwidth_limit,
 -                                            int detach, int blk, int inc);
 -
 -int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
 -{
 -    MigrationState *s = NULL;
 -    const char *p;
 -    int detach = qdict_get_try_bool(qdict, detach, 0);
 -    int blk = qdict_get_try_bool(qdict, blk, 0);
 -    int inc = qdict_get_try_bool(qdict, inc, 0);
 -    const char *uri = qdict_get_str(qdict, uri);
 -    int ret;
 -
 -    if (current_migration 
 -        current_migration-state == MIG_STATE_ACTIVE) {
 -        monitor_printf(mon, migration already in progress\n);
 -        return -1;
 -    }
 -
 -    if (qemu_savevm_state_blocked(mon)) {
 -        return -1;
 -    }
 -
 -    s = migrate_create_state(mon, max_throttle, detach, blk, inc);
 -
 -    if (strstart(uri, tcp:, p)) {
 -        ret = tcp_start_outgoing_migration(s, p);
 -#if !defined(WIN32)
 -    } else if (strstart(uri, exec:, p)) {
 -        ret = exec_start_outgoing_migration(s, p);
 -    } else if (strstart(uri, unix:, p)) {
 -        ret = unix_start_outgoing_migration(s, p);
 -    } else if (strstart(uri, fd:, p)) {
 -        ret = fd_start_outgoing_migration(s, p);
 -#endif
 -    } else {
 -        monitor_printf(mon, unknown migration protocol: %s\n, uri);
 -        ret  = -EINVAL;
 -        goto free_migrate_state;
 -    }
 -
 -    if (ret  0) {
 -        monitor_printf(mon, migration failed\n);
 -        goto free_migrate_state;
 -    }
 -
 -    qemu_free(current_migration);
 -    current_migration = s;
 -    notifier_list_notify(migration_state_notifiers);
 -    return 0;
 -free_migrate_state:
 -    qemu_free(s);
 -    return -1;
 -}
 -
 -static void migrate_fd_cancel(MigrationState *s);
 -
 -int do_migrate_cancel(Monitor *mon, const QDict *qdict, QObject **ret_data)
 -{
 -    if (current_migration)
 -        migrate_fd_cancel(current_migration);
 -
 -    return 0;
 -}
 -
 -int do_migrate_set_speed(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
 -{
 -    int64_t d;
 -    MigrationState *s;
 -
 -    d = qdict_get_int(qdict, value);
 -    if (d  0) {
 -        d = 0;
 -    }
 -    max_throttle = d;
 -
 -    s = current_migration;
 -    if (s  s-file) {
 -        qemu_file_set_rate_limit(s-file, max_throttle);
 -    }
 -
 -    return 0;
 -}
 -
  /* amount of nanoseconds we are willing to wait for migration to be down.
  * the choice of nanoseconds is because it is the maximum resolution that
  * get_clock() can achieve. It is an internal measure. All user-visible
 @@ -171,18 +87,6 @@ uint64_t migrate_max_downtime(void)
     return max_downtime;
  }

 -int do_migrate_set_downtime(Monitor *mon, const QDict *qdict,
 -                            QObject **ret_data)
 -{
 -    double d;
 -
 -    d = qdict_get_double(qdict, value) * 1e9;
 -    d = MAX(0, MIN(UINT64_MAX, d));
 -    max_downtime = (uint64_t)d;
 -
 -    return 0;
 -}
 -
  static void migrate_print_status(Monitor *mon, const char *name,
                                  const QDict *status_dict)
  {
 @@ -483,3 +387,95 @@ static MigrationState *migrate_create_state(Monitor 
 *mon, int64_t bandwidth_limi

     return s;
  }
 +
 +int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
 +{
 +    MigrationState *s = NULL;
 +    const char *p;
 +    int detach = qdict_get_try_bool(qdict, detach, 0);
 +    int blk = qdict_get_try_bool(qdict, blk, 0);
 +    int inc = qdict_get_try_bool(qdict, inc, 0);
 +    const char *uri = qdict_get_str(qdict, uri);
 +    int ret;
 +
 +    if (current_migration 
 +        current_migration-state == MIG_STATE_ACTIVE) {
 +        monitor_printf(mon, migration already in progress\n);
 +        return -1;
 +    }
 +
 +    if (qemu_savevm_state_blocked(mon)) {
 +        return -1;
 +    }
 +
 +    s = migrate_create_state(mon, max_throttle, detach, blk, inc);
 +
 +    if (strstart(uri, tcp:, p)) {
 +        ret = tcp_start_outgoing_migration(s, p);
 +#if !defined(WIN32)
 +    } else if (strstart(uri, exec:, p)) {
 +        ret = exec_start_outgoing_migration(s, p);
 +    } else if (strstart(uri, unix:, p)) {
 +        ret = unix_start_outgoing_migration(s, p);
 +    } else if (strstart(uri, fd:, p)) {
 +        ret = fd_start_outgoing_migration(s, p);
 +#endif
 +    } else {
 +        monitor_printf(mon, unknown migration protocol: %s\n, uri);
 +        ret  = -EINVAL;
 +        goto free_migrate_state;
 +    }
 +
 +    if (ret  0) {
 +

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Kevin Wolf

Am 22.02.2011 19:18, schrieb Anthony Liguori:
 On 02/22/2011 10:15 AM, Kevin Wolf wrote:
 Am 22.02.2011 16:57, schrieb Anthony Liguori:

 On 02/22/2011 02:56 AM, Kevin Wolf wrote:
  
 *sigh*

 It starts to get annoying, but if you really insist, I can repeat it
 once more: These features that you don't need (this is the correct
 description for what you call misfeatures) _are_ implemented in a way
 that they don't impact the normal case.

 Except that they require a refcount table that adds additional metadata
 that needs to be updated in the fast path.  I consider that impacting
 the normal case.
  
 Like it or not, this requirement exists anyway, without any of your
 misfeatures.

 You chose to use the dirty flag in QED in order to avoid having to flush
 metadata too often, which is an approach that any other format, even one
 using refcounts, can take as well.

 
 It's a minor detail, but flushing and the amount of metadata are 
 separate points.

I agree that they are separate...

 
 The dirty flag prevents metadata from being flushed to disk very often 
 but the use of a refcount table adds additional metadata.
 
 A refcount table is definitely not required even if you claim the 
 requirement exists for other features.  I assume you mean to implement 
 trim/discard support but instead of a refcount table, a free list would 
 work just as well and would leave the metadata update out of the fast 
 path (allocating writes) and instead only be in the slow path 
 (trim/discard).

...but here you're arguing about writing metadata out in the fast path,
so you're actually not interested in the amount of metadata but in the
overhead of flushing it. Which is a problem that's solved.

A refcount table is essential for internal snapshots and compression,
it's useful for discard and for running on block devices, it's necessary
for avoiding the dirty flag and fsck on startup.

These are five use cases that I can enumerate without thinking a lot
about it, there might be more. You propose using three different
mechanisms for allowing normal allocations (use the file size), block
devices (add a size field into the header) and discard (free list), and
the other three features, for which you can't think of a hack, you
declare misfeatures.

I don't think what you're proposing is a satisfactory solution. In my
book, a single data structure that can provide all of the features is
better than a bunch of independent hacks that allows only half of it.

 As a format feature, a refcount table really only makes sense if the 
 refcount is required to be greater than a single bit.  There are more 
 optimal data structures that can be used if the refcount of a block is 
 fixed to 1-bit (like a free list) which is what the fundamental design 
 difference between qcow2 and qed is.

Okay, so even assuming that there's something like misfeatures that we
can kick out (with which I strongly disagree), what's the crucial
advantage of free lists that would make you switch the image format?

That you only access it in the slow path (discard) isn't true, because
you certainly want to reallocate freed clusters. Otherwise you could
just leak them without maintaining a list of leaked clusters...

 The only use of a refcount of more than 1-bit is internal snapshots AFAICT.

Of the currently implemented features, internal snapshots and compression.

Kevin

[Qemu-devel] Re: [PATCH 03/22] migration: Fold MigrationState into FdMigrationState

2011-02-23 Thread Juan Quintela

Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 2011/2/23 Juan Quintela quint...@redhat.com:

  struct FdMigrationState
  {
 -    MigrationState mig_state;
     int64_t bandwidth_limit;
     QEMUFile *file;
     int fd;
 @@ -48,7 +35,12 @@ struct FdMigrationState
     int (*get_error)(struct FdMigrationState*);
     int (*close)(struct FdMigrationState*);
     int (*write)(struct FdMigrationState*, const void *, size_t);
 +    void (*cancel)(FdMigrationState *s);
 +    int (*get_status)(FdMigrationState *s);
 +    void (*release)(FdMigrationState *s);
     void *opaque;
 +    int blk;
 +    int shared;
  };

 Although I don't have objections for folding MigrationState into
 FdMigrationState, it would be good to ask why the original author
 split it intentionally.

I asked, Anthony answer was that it is a historical artifact.

Later, Juan.

[Qemu-devel] Re: [PATCH 08/22] migration: Check that migration is active before cancel it

2011-02-23 Thread Juan Quintela

Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 2011/2/23 Juan Quintela quint...@redhat.com:
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/migration.c b/migration.c
 index 397a0b9..55f58c8 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -138,7 +138,7 @@ int do_migrate_cancel(Monitor *mon, const QDict *qdict, 
 QObject **ret_data)
  {
     MigrationState *s = current_migration;

 -    if (s)
 +    if (s  s-get_status(s) == MIG_STATE_ACTIVE)
         s-cancel(s);

     return 0;

 Why don't you remove *s again?

Removed in a next patch.

Later, Juan.

Re: [Qemu-devel] [PATCH 02/22] migration: Use FdMigrationState instead of MigrationState when possible

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |   31 ++-
  migration.h |   16 
  2 files changed, 22 insertions(+), 25 deletions(-)

 diff --git a/migration.c b/migration.c
 index f9aaadc..3a371a3 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -34,7 +34,7 @@
  /* Migration speed throttling */
  static int64_t max_throttle = (32  20);

 -static MigrationState *current_migration;
 +static FdMigrationState *current_migration;

  static NotifierList migration_state_notifiers =
     NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
 @@ -86,7 +86,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
     const char *uri = qdict_get_str(qdict, uri);

     if (current_migration 
 -        current_migration-get_status(current_migration) == 
 MIG_STATE_ACTIVE) {
 +        current_migration-mig_state.get_status(current_migration) == 
 MIG_STATE_ACTIVE) {
         monitor_printf(mon, migration already in progress\n);
         return -1;
     }
 @@ -120,20 +120,20 @@ int do_migrate(Monitor *mon, const QDict *qdict, 
 QObject **ret_data)
     }

     if (current_migration) {
 -        current_migration-release(current_migration);
 +        current_migration-mig_state.release(current_migration);
     }

 -    current_migration = s-mig_state;
 +    current_migration = s;
     notifier_list_notify(migration_state_notifiers);
     return 0;
  }

  int do_migrate_cancel(Monitor *mon, const QDict *qdict, QObject **ret_data)
  {
 -    MigrationState *s = current_migration;
 +    FdMigrationState *s = current_migration;

     if (s)
 -        s-cancel(s);
 +        s-mig_state.cancel(s);

     return 0;
  }
 @@ -149,7 +149,7 @@ int do_migrate_set_speed(Monitor *mon, const QDict 
 *qdict, QObject **ret_data)
     }
     max_throttle = d;

 -    s = migrate_to_fms(current_migration);
 +    s = current_migration;
     if (s  s-file) {
         qemu_file_set_rate_limit(s-file, max_throttle);
     }
 @@ -227,10 +227,11 @@ static void migrate_put_status(QDict *qdict, const char 
 *name,
  void do_info_migrate(Monitor *mon, QObject **ret_data)
  {
     QDict *qdict;
 -    MigrationState *s = current_migration;

 -    if (s) {
 -        switch (s-get_status(s)) {
 +    if (current_migration) {
 +        MigrationState *s = current_migration-mig_state;
 +
 +        switch (s-get_status(current_migration)) {
         case MIG_STATE_ACTIVE:
             qdict = qdict_new();
             qdict_put(qdict, status, qstring_from_str(active));
 @@ -399,16 +400,13 @@ void migrate_fd_put_ready(void *opaque)
     }
  }

 -int migrate_fd_get_status(MigrationState *mig_state)
 +int migrate_fd_get_status(FdMigrationState *s)
  {
 -    FdMigrationState *s = migrate_to_fms(mig_state);
     return s-state;
  }

 -void migrate_fd_cancel(MigrationState *mig_state)
 +void migrate_fd_cancel(FdMigrationState *s)
  {
 -    FdMigrationState *s = migrate_to_fms(mig_state);
 -
     if (s-state != MIG_STATE_ACTIVE)
         return;

 @@ -421,9 +419,8 @@ void migrate_fd_cancel(MigrationState *mig_state)
     migrate_fd_cleanup(s);
  }

 -void migrate_fd_release(MigrationState *mig_state)
 +void migrate_fd_release(FdMigrationState *s)
  {
 -    FdMigrationState *s = migrate_to_fms(mig_state);

     DPRINTF(releasing state\n);

 diff --git a/migration.h b/migration.h
 index db0e45a..f49a9e2 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -25,18 +25,18 @@

  typedef struct MigrationState MigrationState;

 +typedef struct FdMigrationState FdMigrationState;
 +
  struct MigrationState
  {
     /* FIXME: add more accessors to print migration info */
 -    void (*cancel)(MigrationState *s);
 -    int (*get_status)(MigrationState *s);
 -    void (*release)(MigrationState *s);
 +    void (*cancel)(FdMigrationState *s);
 +    int (*get_status)(FdMigrationState *s);
 +    void (*release)(FdMigrationState *s);
     int blk;
     int shared;
  };

 -typedef struct FdMigrationState FdMigrationState;
 -
  struct FdMigrationState
  {
     MigrationState mig_state;
 @@ -120,11 +120,11 @@ void migrate_fd_connect(FdMigrationState *s);

  void migrate_fd_put_ready(void *opaque);

 -int migrate_fd_get_status(MigrationState *mig_state);
 +int migrate_fd_get_status(FdMigrationState *mig_state);

 -void migrate_fd_cancel(MigrationState *mig_state);
 +void migrate_fd_cancel(FdMigrationState *mig_state);

 -void migrate_fd_release(MigrationState *mig_state);
 +void migrate_fd_release(FdMigrationState *mig_state);

  void migrate_fd_wait_for_unfreeze(void *opaque);


Looks good to me.

Yoshi

 --
 1.7.4

[Qemu-devel] Re: [PATCH 14/22] migration: Remove get_status() accessor

2011-02-23 Thread Juan Quintela

Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 2011/2/23 Juan Quintela quint...@redhat.com:
 It is only used inside migration.c, and fields on that struct are
 accessed all around the place on that file.


 I agree to access s-state directly inside of migration.c, but I
 disagree to remove get_status() accessor right away.  We don't
 have strong motivations for doing that AFAIK.

Only user outside of migration.c was ui/spice-core.c, and it just wanted
to know if migration has finished at all.

At this point I was trying to isolate what other parts of MigrationState
are used externally.  That way, it gets easier to change that later.

At this point, only things used outside of migration.c are:
- write, clase, get_error: trivial to fix, just add setters for them.
- fd: that is not enterely trivial to fix.

Later, Juan.

[Qemu-devel] Re: [PATCH 22/22] migration: Make state definitions local

2011-02-23 Thread Juan Quintela

Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 2011/2/23 Juan Quintela quint...@redhat.com:

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    6 ++
  migration.h |    6 --
  2 files changed, 6 insertions(+), 6 deletions(-)

 diff --git a/migration.c b/migration.c
 index 383ebaf..90fc2a0 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -31,6 +31,12 @@
     do { } while (0)
  #endif

 +#define MIG_STATE_ERROR                -1
 +#define MIG_STATE_NONE         0
 +#define MIG_STATE_CANCELLED    1
 +#define MIG_STATE_ACTIVE       2
 +#define MIG_STATE_COMPLETED    3
 +
  static MigrationState current_migration = {
     .state = MIG_STATE_NONE,
      /* Migration speed throttling */
 diff --git a/migration.h b/migration.h
 index 9457807..493fbe5 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -18,12 +18,6 @@
  #include qemu-common.h
  #include notify.h

 -#define MIG_STATE_ERROR                -1
 -#define MIG_STATE_NONE         0
 -#define MIG_STATE_CANCELLED    1
 -#define MIG_STATE_ACTIVE       2
 -#define MIG_STATE_COMPLETED    3
 -

 Although you're right, I would prefer to keep it so that somebody
 outside of migration may understand the status in the future if
 there are no harms.

my plan is to move MigrationState inside migration.c, and then decide
what to export/not export.  Next thing to do is move migration to its
own thread.  Before doing that, I need to know what parts are used/not
used outside migration.c.  Removing it now means that nothing gets to
use it without needing a patch.

Later, Juan..

Re: [Qemu-devel] Building QEMU on PS3

2011-02-23 Thread 陳韋任

Hi, Roy

 Seems to be related:
 http://www.mail-archive.com/qemu-devel@nongnu.org/msg55914.html

  Thanks for the info. I wonder if it is possible to cross compile qemu for
ppc (host) on a x86 machine.

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Parallel Processing Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667

Re: [Qemu-devel] [PATCH 06/22] migration: Make all posible migration functions static

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 I have to move two functions postions to avoid forward declarations

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |   72 +-
  migration.h |   12 -
  2 files changed, 36 insertions(+), 48 deletions(-)

 diff --git a/migration.c b/migration.c
 index e773806..1853380 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -273,15 +273,7 @@ static void migrate_fd_monitor_suspend(MigrationState 
 *s, Monitor *mon)
     }
  }

 -void migrate_fd_error(MigrationState *s)
 -{
 -    DPRINTF(setting error state\n);
 -    s-state = MIG_STATE_ERROR;
 -    notifier_list_notify(migration_state_notifiers);
 -    migrate_fd_cleanup(s);
 -}
 -
 -int migrate_fd_cleanup(MigrationState *s)
 +static int migrate_fd_cleanup(MigrationState *s)
  {
     int ret = 0;

 @@ -308,7 +300,15 @@ int migrate_fd_cleanup(MigrationState *s)
     return ret;
  }

 -void migrate_fd_put_notify(void *opaque)
 +void migrate_fd_error(MigrationState *s)
 +{
 +    DPRINTF(setting error state\n);
 +    s-state = MIG_STATE_ERROR;
 +    notifier_list_notify(migration_state_notifiers);
 +    migrate_fd_cleanup(s);
 +}
 +
 +static void migrate_fd_put_notify(void *opaque)
  {
     MigrationState *s = opaque;

 @@ -316,7 +316,7 @@ void migrate_fd_put_notify(void *opaque)
     qemu_file_put_notify(s-file);
  }

 -ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
 +static ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t 
 size)
  {
     MigrationState *s = opaque;
     ssize_t ret;
 @@ -341,29 +341,7 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void 
 *data, size_t size)
     return ret;
  }

 -void migrate_fd_connect(MigrationState *s)
 -{
 -    int ret;
 -
 -    s-file = qemu_fopen_ops_buffered(s,
 -                                      s-bandwidth_limit,
 -                                      migrate_fd_put_buffer,
 -                                      migrate_fd_put_ready,
 -                                      migrate_fd_wait_for_unfreeze,
 -                                      migrate_fd_close);
 -
 -    DPRINTF(beginning savevm\n);
 -    ret = qemu_savevm_state_begin(s-mon, s-file, s-blk, s-shared);
 -    if (ret  0) {
 -        DPRINTF(failed, %d\n, ret);
 -        migrate_fd_error(s);
 -        return;
 -    }
 -
 -    migrate_fd_put_ready(s);
 -}
 -
 -void migrate_fd_put_ready(void *opaque)
 +static void migrate_fd_put_ready(void *opaque)
  {
     MigrationState *s = opaque;

 @@ -431,7 +409,7 @@ static void migrate_fd_release(MigrationState *s)
     qemu_free(s);
  }

 -void migrate_fd_wait_for_unfreeze(void *opaque)
 +static void migrate_fd_wait_for_unfreeze(void *opaque)
  {
     MigrationState *s = opaque;
     int ret;
 @@ -450,7 +428,7 @@ void migrate_fd_wait_for_unfreeze(void *opaque)
     } while (ret == -1  (s-get_error(s)) == EINTR);
  }

 -int migrate_fd_close(void *opaque)
 +static int migrate_fd_close(void *opaque)
  {
     MigrationState *s = opaque;

 @@ -477,6 +455,28 @@ int get_migration_state(void)
     }
  }

 +void migrate_fd_connect(MigrationState *s)
 +{
 +    int ret;
 +
 +    s-file = qemu_fopen_ops_buffered(s,
 +                                      s-bandwidth_limit,
 +                                      migrate_fd_put_buffer,
 +                                      migrate_fd_put_ready,
 +                                      migrate_fd_wait_for_unfreeze,
 +                                      migrate_fd_close);
 +
 +    DPRINTF(beginning savevm\n);
 +    ret = qemu_savevm_state_begin(s-mon, s-file, s-blk, s-shared);
 +    if (ret  0) {
 +        DPRINTF(failed, %d\n, ret);
 +        migrate_fd_error(s);
 +        return;
 +    }
 +
 +    migrate_fd_put_ready(s);
 +}
 +
  MigrationState *migrate_create_state(Monitor *mon, int64_t bandwidth_limit,
                                      int detach, int blk, int inc)
  {
 diff --git a/migration.h b/migration.h
 index 0178414..048ee46 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -100,20 +100,8 @@ MigrationState *fd_start_outgoing_migration(Monitor *mon,

  void migrate_fd_error(MigrationState *s);

 -int migrate_fd_cleanup(MigrationState *s);
 -
 -void migrate_fd_put_notify(void *opaque);
 -
 -ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size);
 -
  void migrate_fd_connect(MigrationState *s);

 -void migrate_fd_put_ready(void *opaque);
 -
 -void migrate_fd_wait_for_unfreeze(void *opaque);
 -
 -int migrate_fd_close(void *opaque);
 -
  MigrationState *migrate_create_state(Monitor *mon, int64_t bandwidth_limit,
                                      int detach, int blk, int inc);


Looks good to me.

Yoshi

 --
 1.7.4

Re: [Qemu-devel] [PATCH] `qdev_free` when unplug a pci device

2011-02-23 Thread William Dauchy

Hi Markus,

On Wed, Feb 23, 2011 at 9:30 AM, Markus Armbruster arm...@redhat.com wrote:
 I don't think this patch is correct.  Let me explain.

 Device hot unplug is *not* guaranteed to succeed.

 For some buses, such as USB, it always succeeds immediately, i.e. when
 the device_del monitor command finishes, the device is gone.  Live is
 good.

 But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
 doesn't wait for the dance to complete.  Why?  The dance can take an
 unpredictable amount of time, including forever.

 Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
 slot, and the unplug has not yet completed (race condition), or it
 failed.  Yes, Virginia, PCI hotplug *can* fail.

 When unplug succeeds, the qdev is automatically destroyed.
 pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
 does it for PCIE.

 Your patch adds a *second* qdev_free().  No good.

Thanks for the complete explanation. I now understand my mistake and
find out why it wasn't working as expected.
On a Linux virtual machine I forgot to load the `acpiphp` module which
makes the pci device disappear when doing a detach. I thought that,
even without this module the detach should have freed the
corresponding device on qemu side, regardless if the virtual machine
was supporting or not acpi signals.
Please ignore my patch.

Regards,
-- 
William

Re: [Qemu-devel] [PATCH 09/22] migration: Introduce MIG_STATE_NONE

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 Use MIG_STATE_ACTIVE only when migration has really started

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |    6 +-
  migration.h |    3 ++-
  2 files changed, 7 insertions(+), 2 deletions(-)

 diff --git a/migration.c b/migration.c
 index 55f58c8..f015e02 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -238,6 +238,9 @@ void do_info_migrate(Monitor *mon, QObject **ret_data)
         MigrationState *s = current_migration;

         switch (s-get_status(current_migration)) {
 +        case MIG_STATE_NONE:
 +            /* no migration has happened ever */
 +            break;
         case MIG_STATE_ACTIVE:
             qdict = qdict_new();
             qdict_put(qdict, status, qstring_from_str(active));
 @@ -465,6 +468,7 @@ void migrate_fd_connect(MigrationState *s)
  {
     int ret;

 +    s-state = MIG_STATE_ACTIVE;
     s-file = qemu_fopen_ops_buffered(s,
                                       s-bandwidth_limit,
                                       migrate_fd_put_buffer,
 @@ -495,7 +499,7 @@ static MigrationState *migrate_create_state(Monitor *mon, 
 int64_t bandwidth_limi
     s-shared = inc;
     s-mon = NULL;
     s-bandwidth_limit = bandwidth_limit;
 -    s-state = MIG_STATE_ACTIVE;
 +    s-state = MIG_STATE_NONE;

     if (!detach) {
         migrate_fd_monitor_suspend(s, mon);
 diff --git a/migration.h b/migration.h
 index 7d28dd3..3df2293 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -19,9 +19,10 @@
  #include notify.h

  #define MIG_STATE_ERROR                -1
 -#define MIG_STATE_COMPLETED    0
 +#define MIG_STATE_NONE         0
  #define MIG_STATE_CANCELLED    1
  #define MIG_STATE_ACTIVE       2
 +#define MIG_STATE_COMPLETED    3

It may be a good chance to make them enum?

Yoshi


  typedef struct MigrationState MigrationState;

 --
 1.7.4

[Qemu-devel] Re: [PATCH 0/2] Fix error handling in migration when the peer is killed.

2011-02-23 Thread Juan Quintela

Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 Hi,

 During live migration, if the receiver side of qemu gets killed, the
 sender side seems to be handling the error incorrectly, like it passes
 the iterate phase (stage 2) and moves on to the complete state (stage
 3).  These patches fix the issue.


Agreed. Integrated into my series of cleanups. Thanks.

Re: [Qemu-devel] [PATCH] Remove a detached device from qemu_device_opts.

2011-02-23 Thread William Dauchy

Hi Minoru,

On Tue, Feb 15, 2011 at 3:32 AM, Minoru Usui u...@mxm.nes.nec.co.jp wrote:
 I can reproduce, too.
 But strangely, it don't occur in case of loading acpiphp driver
 to the guest VM on below environment.

  Host : RHEL6.0
  Guest: RHEL5.5

 Unfortunately, I'm not familiar with qemu-kvm.
 I investigated below questions about this problem, but I couldn't resolve 
 them.

  - How to call qdev_free() asynchronously. (How should we fix this problem)
  - Why it don't occur with acpiphp driver

 If anyone knows answer of above questions or its clue, please let me know.

If fact this is not a bug.
`qdev_free` is called when the acpi detach succeed in `pciej_write`.
The virtual machine has to correctly support acpi signals.
Please read the explanation from Markus Armbruster on
http://lists.nongnu.org/archive/html/qemu-devel/2011-02/msg02637.html

Regards,

-- 
William

Re: [Qemu-devel] Re: QEMU regression problems - Update FPU

2011-02-23 Thread Laurent Desnogues

On Wed, Feb 23, 2011 at 9:16 AM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 18 February 2011 07:12, Gerhard Wiesinger li...@wiesinger.com wrote:
 Issue 1.) with FPU still present
 I tracked down the problematic code and it is a rounding error from double
 precision to 64bit floats: Any ideas how to fix such an issue in general?

 QEMU result in ST0: 0.42925860786976457 (wrong emulated)
 KVM result in ST0:  0.42925860786975449 (correct)

 This is an error when running QEMU in TCG mode, right?
 At the moment x86 is the odd-one-out in that it doesn't
 use CONFIG_SOFTFLOAT for its FPU emulation, so somebody
 has made an explicit choice of preferring speed over
 accuracy, and I am unsurprised that there are rounding
 errors as a result.

Even if you were using SoftFloat, you'd probably still get wrong
results given that this test seems to test trigonometric instructions
which aren't implemented in SoftFloat, but are relying on libm.


Laurent

Re: [Qemu-devel] [PATCH 10/22] migration: Refactor and simplify error checking in migrate_fd_put_ready

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |   20 +---
  1 files changed, 9 insertions(+), 11 deletions(-)

 diff --git a/migration.c b/migration.c
 index f015e02..641df9f 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -361,28 +361,26 @@ static void migrate_fd_put_ready(void *opaque)

     DPRINTF(iterate\n);
     if (qemu_savevm_state_iterate(s-mon, s-file) == 1) {
 -        int state;
         int old_vm_running = vm_running;

         DPRINTF(done iterating\n);
         vm_stop(VMSTOP_MIGRATE);

 -        if ((qemu_savevm_state_complete(s-mon, s-file))  0) {
 -            if (old_vm_running) {
 -                vm_start();
 -            }
 -            state = MIG_STATE_ERROR;
 +        if (qemu_savevm_state_complete(s-mon, s-file)  0) {
 +            migrate_fd_error(s);
         } else {
 -            state = MIG_STATE_COMPLETED;
 +            if (migrate_fd_cleanup(s)  0) {
 +                migrate_fd_error(s);
 +            } else {
 +                s-state = MIG_STATE_COMPLETED;
 +                notifier_list_notify(migration_state_notifiers);
 +            }
         }
 -        if (migrate_fd_cleanup(s)  0) {
 +        if (s-get_status(s) != MIG_STATE_COMPLETED) {
             if (old_vm_running) {
                 vm_start();
             }

This part, although it's not fair to ask you, but calling
vm_start when != MIG_STATE_COMPLETED terrifies me because just
failing migrate_fd_cleanup (mostly calling qemu_fclose) may cause
split brain between src/dst.  Although I haven't encountered this
situation, just having stopped VMs on both sides is safer.

Thanks,

Yoshi

 -            state = MIG_STATE_ERROR;
         }
 -        s-state = state;
 -        notifier_list_notify(migration_state_notifiers);
     }
  }

 --
 1.7.4

Re: [Qemu-devel] Building QEMU on PS3

2011-02-23 Thread Roy Tam

Hi chenwj,

2011/2/23 陳韋任 che...@iis.sinica.edu.tw:
 Hi, Roy

 Seems to be related:
 http://www.mail-archive.com/qemu-devel@nongnu.org/msg55914.html

  Thanks for the info. I wonder if it is possible to cross compile qemu for
 ppc (host) on a x86 machine.


If you have cross compile toolchain and cross compiled necessary
libraries, then it should work.
You can also try adding extra swap on PS3 to see if it can compile the sources.
http://psubuntu.com/wiki/PSUbuntuGPU
http://stason.org/TULARC/os/linux-faq/038-How-Do-I-Add-Temporary-Swap-Space.html

Best regards,
Roy

Re: [Qemu-devel] Re: [PATCH 0/2] Fix error handling in migration when the peer is killed.

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 Hi,

 During live migration, if the receiver side of qemu gets killed, the
 sender side seems to be handling the error incorrectly, like it passes
 the iterate phase (stage 2) and moves on to the complete state (stage
 3).  These patches fix the issue.


 Agreed. Integrated into my series of cleanups. Thanks.

Thanks!

Yoshi

Re: [Qemu-devel] General IO ports in pc386

2011-02-23 Thread Markus Armbruster

Tomas Bures bu...@d3s.mff.cuni.cz writes:

 Thank you. This is exactly what I was looking for. I've created the
 device, added the compilation of it to Makefile.target . Now, I'm
 wondering how to persuade QEMU to initialize it?

 I've registered it using the code below. The registration executes
 during QEMU launch, the the initialization doesn't. Please where
 should I add it?

 static void ers_io_register_devices(void)
 {
 isa_qdev_register(ers_io_info);
 ers_io_debug(\n);
 }

 device_init(ers_io_register_devices);

Try starting with -device NAME, where NAME is whatever you put into
qdev.name.  Your device should then be visible in info qtree.

[Qemu-devel] Re: [PATCH 10/22] migration: Refactor and simplify error checking in migrate_fd_put_ready

2011-02-23 Thread Juan Quintela

Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 2011/2/23 Juan Quintela quint...@redhat.com:

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |   20 +---
  1 files changed, 9 insertions(+), 11 deletions(-)

 diff --git a/migration.c b/migration.c
 index f015e02..641df9f 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -361,28 +361,26 @@ static void migrate_fd_put_ready(void *opaque)

     DPRINTF(iterate\n);
     if (qemu_savevm_state_iterate(s-mon, s-file) == 1) {
 -        int state;
         int old_vm_running = vm_running;

         DPRINTF(done iterating\n);
         vm_stop(VMSTOP_MIGRATE);

 -        if ((qemu_savevm_state_complete(s-mon, s-file))  0) {
 -            if (old_vm_running) {
 -                vm_start();
 -            }
 -            state = MIG_STATE_ERROR;
 +        if (qemu_savevm_state_complete(s-mon, s-file)  0) {
 +            migrate_fd_error(s);
         } else {
 -            state = MIG_STATE_COMPLETED;
 +            if (migrate_fd_cleanup(s)  0) {
 +                migrate_fd_error(s);
 +            } else {
 +                s-state = MIG_STATE_COMPLETED;
 +                notifier_list_notify(migration_state_notifiers);
 +            }
         }
 -        if (migrate_fd_cleanup(s)  0) {
 +        if (s-get_status(s) != MIG_STATE_COMPLETED) {
             if (old_vm_running) {
                 vm_start();
             }

 This part, although it's not fair to ask you, but calling
 vm_start when != MIG_STATE_COMPLETED terrifies me because just
 failing migrate_fd_cleanup (mostly calling qemu_fclose) may cause
 split brain between src/dst.  Although I haven't encountered this
 situation, just having stopped VMs on both sides is safer.

I see your pain. I am not happy at all, but this was integrated by
Anthony to fix this bug:

commit 41ef56e61153d7bd27d34a634633bb51b1c5988d
Author: Anthony Liguori aligu...@us.ibm.com
Date:   Wed Jun 2 14:55:25 2010 -0500

migration: respect exit status with exec:

 This fixes https://bugs.launchpad.net/qemu/+bug/391879


I think that it fixes that bug, but it makes me un-easy to restart vm if
there is a failure in migrate_fd_cleanup().  As I didn't wanted to
change behaviour with this series, I left it as it was.

Next on ToDo list is to do something sensible with errors, just now we
are not very good at handling them.

Later, Juan.

[Qemu-devel] Re: [PATCH 00/22] Refactor and cleaup migration code

2011-02-23 Thread Jan Kiszka

On 2011-02-23 01:44, Juan Quintela wrote:
 This series:
 - Fold MigrationState into FdMigrationState (and then rename)
 - Factorize migration statec creation in a single place
 - Make use of MIG_STATE_*, setup through helpers and make them local
 - remove relase  cancel callbacks (where used only one in same
   file than defined)
 - get_status() is no more, just access directly to .state
 - current_migration use cleanup, and make variable static
 - max_throotle is gone, now inside current_migration
 - change get_migration_status() to migration_has_finished()
   and actualize single user.
 
 Please review.

Tried checkpatch.pl? :) I can only recommend to include it in personal
preparation scripts for patch series.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread

2011-02-23 Thread Edgar E. Iglesias

On Mon, Feb 21, 2011 at 09:51:22AM +0100, Paolo Bonzini wrote:
 This series redoes the way time spent waiting for I/O is accounted to
 the vm_clock.
 
 The current code is advancing qemu_icount before waiting for I/O.
 Instead, after the patch qemu_icount is left aside (it is a pure
 instruction counter) and qemu_icount_bias is changed according to
 the actual amount of time spent in the wait.  This is more
 accurate, and actually works in the iothread case as well.
 
 (I started this as an experiment while trying to understand what was
 going on.  But it fixes the bug and does not break the no-iothread
 case, so hey...).
 
 Patch 1 is a cleanup to Edgar's commit 225d02c (Avoid deadlock whith
 iothread and icount, 2011-01-23).

Thanks, this one was a good cleanup, I've applied it.


 Patch 2 fixes another misunderstanding in the role of qemu_next_deadline.
 
 Patches 3 and 4 implement the actual new accounting algorithm.

Sorry, I don't know the code well enough to give any sensible feedback
on patch 2 - 4. I did test them with some of my guests and things seem
to be OK with them but quite a bit slower.
I saw around 10 - 20% slowdown with a cris guest and -icount 10.

The slow down might be related to the issue with super slow icount together
with iothread (adressed by Marcelos iothread timeout patch).


 With these patches, iothread -icount N doesn't work when the actual
 execution speed cannot keep up with the requested speed; the execution
 in that case is not deterministic.  It works when the requested speed
 is slow enough.

Sorry, would you mind explaning this a bit?

For example, if I have a machine and guest sw that does no IO. It runs
the CPU and only uses devices that use the virtual time (e.g timers
and peripherals that compute stuff). Can I expect the guest (with
fixed icount speed -icount N) to run deterministically regardless of
host speed?

Cheers

[Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread

2011-02-23 Thread Paolo Bonzini


On 02/23/2011 11:18 AM, Edgar E. Iglesias wrote:

Sorry, I don't know the code well enough to give any sensible feedback
on patch 2 - 4. I did test them with some of my guests and things seem
to be OK with them but quite a bit slower.
I saw around 10 - 20% slowdown with a cris guest and -icount 10.

The slow down might be related to the issue with super slow icount together
with iothread (adressed by Marcelos iothread timeout patch).


No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to 
looks like it deadlocked anyway.  Also, Jan has ideas on how to remove 
the synchronization overhead in the main loop for TCG+iothread.


Have you tested without iothread too, both to check the speed and to 
ensure this introduces no regressions?



  With these patches, iothread -icount N doesn't work when the actual
  execution speed cannot keep up with the requested speed; the execution
  in that case is not deterministic.  It works when the requested speed
  is slow enough.

Sorry, would you mind explaning this a bit?


-icount 0 doesn't give 1000 BogoMIPS unless the machine is fast enough 
to sustain it.  But that's a bug.  These patches are meant to be a start.



For example, if I have a machine and guest sw that does no IO. It runs
the CPU and only uses devices that use the virtual time (e.g timers
and peripherals that compute stuff). Can I expect the guest (with
fixed icount speed -icount N) to run deterministically regardless of
host speed?


Right now, only if the N is large enough for the host machine to sustain 
that speed.


Paolo

[Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread

2011-02-23 Thread Edgar E. Iglesias

On Wed, Feb 23, 2011 at 11:25:54AM +0100, Paolo Bonzini wrote:
 On 02/23/2011 11:18 AM, Edgar E. Iglesias wrote:
  Sorry, I don't know the code well enough to give any sensible feedback
  on patch 2 - 4. I did test them with some of my guests and things seem
  to be OK with them but quite a bit slower.
  I saw around 10 - 20% slowdown with a cris guest and -icount 10.
 
  The slow down might be related to the issue with super slow icount together
  with iothread (adressed by Marcelos iothread timeout patch).
 
 No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to 
 looks like it deadlocked anyway.  Also, Jan has ideas on how to remove 
 the synchronization overhead in the main loop for TCG+iothread.

I see. I tried booting two of my MIPS and CRIS linux guests with iothread
and -icount 4. Without your patch, the boot crawls super slow. Your patch
gives a huge improvement. This was the deadlock scenario which I
mentioned in previous emails.

Just to clarify the previous test where I saw slowdown with your patch:
A CRIS setup that has a CRIS and basically only two peripherals,
a timer block and a device (X) that computes stuff but delays the results
with a virtual timer. The guest CPU is 99% of the time just
busy-waiting for device X to get ready.

This latter test runs in 3.7s with icount 4 and without iothread,
with or without your patch.

With icount 4 and iothread it runs in ~1m5s without your patch and
~1m20s with your patch. That was the 20% slowdown I mentioned earlier.

Don't know if that info helps...


 Have you tested without iothread too, both to check the speed and to 
 ensure this introduces no regressions?

I tried now, I see no regression without iothread.

With these patches, iothread -icount N doesn't work when the actual
execution speed cannot keep up with the requested speed; the execution
in that case is not deterministic.  It works when the requested speed
is slow enough.
 
  Sorry, would you mind explaning this a bit?
 
 -icount 0 doesn't give 1000 BogoMIPS unless the machine is fast enough 
 to sustain it.  But that's a bug.  These patches are meant to be a start.
 
  For example, if I have a machine and guest sw that does no IO. It runs
  the CPU and only uses devices that use the virtual time (e.g timers
  and peripherals that compute stuff). Can I expect the guest (with
  fixed icount speed -icount N) to run deterministically regardless of
  host speed?
 
 Right now, only if the N is large enough for the host machine to sustain 
 that speed.

OK thanks.

Cheers

[Qemu-devel] [PATCH 2/7] introduce libcacard/vscard_common.h

2011-02-23 Thread Alon Levy

---

Signed-off-by: Alon Levy al...@redhat.com

v19-v20 changes:
 * checkpatch.pl

v15-v16 changes:

Protocol change:
 * VSCMsgInit capabilities and magic
 * removed ReaderResponse, will use Error instead with code==VSC_SUCCESS.
 * adaded Flush and FlushComplete, remove Reconnect.
 * define VSCARD_MAGIC
 * added error code VSC_SUCCESS.

Fixes:
 * update VSCMsgInit comment
 * fix message type enum
 * remove underscore from wrapping define
 * update copyright
 * updated comments.
 * Header comment updated
 * remove C++ style comment
 * fix comment for VSCMsgError
 * give names to enums in typedefs
---
 libcacard/vscard_common.h |  167 +
 1 files changed, 167 insertions(+), 0 deletions(-)
 create mode 100644 libcacard/vscard_common.h

diff --git a/libcacard/vscard_common.h b/libcacard/vscard_common.h
new file mode 100644
index 000..7449314
--- /dev/null
+++ b/libcacard/vscard_common.h
@@ -0,0 +1,167 @@
+/* Virtual Smart Card protocol definition
+ *
+ * This protocol is between a host using virtual smart card readers,
+ * and a client providing the smart cards, perhaps by emulating them or by
+ * access to real cards.
+ *
+ * Definitions for this protocol:
+ *  Host   - user of the card
+ *  Client - owner of the card
+ *
+ * The current implementation passes the raw APDU's from 7816 and additionally
+ * contains messages to setup and teardown readers, handle insertion and
+ * removal of cards, negotiate the protocol via capabilities and provide
+ * for error responses.
+ *
+ * Copyright (c) 2011 Red Hat.
+ *
+ * This code is licensed under the LGPL.
+ */
+
+#ifndef VSCARD_COMMON_H
+#define VSCARD_COMMON_H
+
+#include stdint.h
+
+#define VERSION_MAJOR_BITS 11
+#define VERSION_MIDDLE_BITS 11
+#define VERSION_MINOR_BITS 10
+
+#define MAKE_VERSION(major, middle, minor) \
+ ((major   (VERSION_MINOR_BITS + VERSION_MIDDLE_BITS)) \
+  | (middle   VERSION_MINOR_BITS) \
+  | (minor))
+
+/** IMPORTANT NOTE on VERSION
+ *
+ * The version below MUST be changed whenever a change in this file is made.
+ *
+ * The last digit, the minor, is for bug fix changes only.
+ *
+ * The middle digit is for backward / forward compatible changes, updates
+ * to the existing messages, addition of fields.
+ *
+ * The major digit is for a breaking change of protocol, presumably
+ * something that cannot be accomodated with the existing protocol.
+ */
+
+#define VSCARD_VERSION MAKE_VERSION(0, 0, 2)
+
+typedef enum VSCMsgType {
+VSC_Init = 1,
+VSC_Error,
+VSC_ReaderAdd,
+VSC_ReaderRemove,
+VSC_ATR,
+VSC_CardRemove,
+VSC_APDU,
+VSC_Flush,
+VSC_FlushComplete
+} VSCMsgType;
+
+typedef enum VSCErrorCode {
+VSC_SUCCESS = 0,
+VSC_GENERAL_ERROR = 1,
+VSC_CANNOT_ADD_MORE_READERS,
+VSC_CARD_ALREAY_INSERTED,
+} VSCErrorCode;
+
+#define VSCARD_UNDEFINED_READER_ID  0x
+#define VSCARD_MINIMAL_READER_ID0
+
+#define VSCARD_MAGIC (*(uint32_t *)VSCD)
+
+/* Header
+ * Each message starts with the header.
+ * type - message type
+ * reader_id - used by messages that are reader specific
+ * length - length of payload (not including header, i.e. zero for
+ *  messages containing empty payloads)
+ */
+typedef struct VSCMsgHeader {
+uint32_t   type;
+uint32_t   reader_id;
+uint32_t   length;
+uint8_tdata[0];
+} VSCMsgHeader;
+
+/* VSCMsgInit   Client - Host
+ * Client sends it on connection, with its own capabilities.
+ * Host replies with VSCMsgInit filling in its capabilities.
+ *
+ * It is not meant to be used for negotiation, i.e. sending more then
+ * once from any side, but could be used for that in the future.
+ * */
+typedef struct VSCMsgInit {
+uint32_t   magic;
+uint32_t   version;
+uint32_t   capabilities[1]; /* receiver must check length,
+   array may grow in the future*/
+} VSCMsgInit;
+
+/* VSCMsgError  Client - Host
+ * This message is a response to any of:
+ *  Reader Add
+ *  Reader Remove
+ *  Card Remove
+ * If the operation was successful then VSC_SUCCESS
+ * is returned, other wise a specific error code.
+ * */
+typedef struct VSCMsgError {
+uint32_t   code;
+} VSCMsgError;
+
+/* VSCMsgReaderAdd  Client - Host
+ * Host replies with allocated reader id in VSCMsgError with code==SUCCESS.
+ *
+ * name - name of the reader on client side, UTF-8 encoded. Only used
+ *  for client presentation (may be translated to the device presented to the
+ *  guest), protocol wise only reader_id is important.
+ * */
+typedef struct VSCMsgReaderAdd {
+uint8_tname[0];
+} VSCMsgReaderAdd;
+
+/* VSCMsgReaderRemove   Client - Host
+ * The client's reader has been removed.
+ * */
+typedef struct VSCMsgReaderRemove {
+} VSCMsgReaderRemove;
+
+/* VSCMsgATRClient - Host
+ * Answer to reset. Sent for card insertion or card reset. The reset/insertion
+ * happens on the client side, they do not require any action from the

[Qemu-devel] [PATCH v20 0/7] usb-ccid

2011-02-23 Thread Alon Levy

This patchset adds three new devices, usb-ccid, ccid-card-passthru and
ccid-card-emulated, providing a CCID bus, a simple passthru protocol
implementing card requiring a client, and a standalone emulated card.

It also introduces a new directory libcaccard with CAC card emulation,
CAC is a type of ISO 7816 smart card.

Tree for pull: git://anongit.freedesktop.org/~alon/qemu usb_ccid.v20

v19-v20 changes:
 * checkpatch.pl. Here are the remaining errors with explanation:
  * ignored 5 macro errors of the type
   ERROR: Macros with complex values should be enclosed in parenthesis
   because fixing them breaks current code, if it really bothers someone
   I can fix it.
   * four of them are in libcacard/card_7816t.h:
   /* give the subfields a unified look */
   ..
#define a_cla a_header-ah_cla /* class */
#define a_ins a_header-ah_ins /* instruction */
#define a_p1 a_header-ah_p1   /* parameter 1 */
#define a_p2 a_header-ah_p2   /* parameter 2 */
   * and the fifth:
#4946: FILE: libcacard/vcardt.h:31:
+#define VCARD_ATR_PREFIX(size) 0x3b, 0x66+(size), 0x00, 0xff, \
+   'V', 'C', 'A', 'R', 'D', '_'
  * Ignored this warning since I couldn't figure it out, and it's a test
   file:
WARNING: externs should be avoided in .c files
#2343: FILE: libcacard/link_test.c:7:
+VCardStatus cac_card_init(const char *flags, VCard *card,

v18-v19 changes:
 * more merges, down to a single digit number of patches.
 * drop enumeration property, use string.
 * rebased (trivial)

v17-v18 changes:
 * merge vscard_common.h patches.
 * actually provide a tree to pull.

v16-v17 changes:
 * merged all the v15-v16 patches
 * merged some more wherever it was easy (all same file commits).
 * added signed off by to first four patches
 * ccid.h: added copyright, removed underscore in defines, and replaced
 non C89 comments

v15-v16 changes:
 * split vscard_common introducing patch for ease of review
 * sum of commit logs for the v15-v16 commits: (whitespace fixes
removed for space, see original commit messages in later patches)
  * usb-ccid:
   * fix abort on client answer after card remove
   * enable migration
   * remove side affect code from asserts
   * return consistent self-powered state
   * mask out reserved bits in ccid_set_parameters
   * add missing abRFU in SetParameters (no affect on linux guest)
  * vscard_common.h protocol change:
   * VSCMsgInit capabilities and magic
   * removed ReaderResponse, will use Error instead with code==VSC_SUCCESS.
   * added Flush and FlushComplete, remove Reconnect.
   * define VSCARD_MAGIC
   * added error code VSC_SUCCESS.
  * ccid-card-passthru
   * return correct size
   * return error instead of assert if client sent too large ATR
   * don't assert if client sent too large a size, but add asserts for indices 
to buffer
   * reset vscard_in indices on chardev disconnect
   * handle init from client
   * error if no chardev supplied
   * use ntoh, hton
   * eradicate reader_id_t
   * remove Reconnect usage (removed from VSCARD protocol)
   * send VSC_SUCCESS on card insert/remove and reader add/remove
  * ccid-card-emulated
   * fix error reporting in initfn

v14-v15 changes:
 * add patch with --enable-smartcard and --disable-smartcard and only
  disable ccid-card-emulated if nss not found.
 * add patch with description strings
 * s/libcaccard/libcacard/ in docs/ccid.txt

v13-v14 changes:
 - support device_del/device_add on ccid-card-* and usb-ccid
 * usb-ccid:
  * lose card reference when card device deleted
  * check slot number and deny adding a slot if one is already added.
 * ccid-card-*: use qdev_simple_unplug_cb in both emulated and passthru ccid 
cards,
   the exitfn already takes care of triggering card removal in the usb dev.
 * libcacard:
  * remove double include of config-host.mak
  * add replay of card events to libcacard to support second and more emulation
  * don't initialize more then once (doesn't support it right now, so one
   thread, NSS thread, is left when device_del is done)
  * add VCARD_EMUL_INIT_ALREADY_INITED
 * ccid-card-emulated:
  * take correct mutexes on signaling to fix deadlocks on device_del
  * allow card insertion/removal event without proper reader insertion event

v12-v13 changes:
 * libcacard:
  * fix Makefile clean to remove vscclient
  * fix double include of config-host in Makefile
 * usb-ccid: remove attach/detach logic, usb is always attached. Guest
  doesn't care if there is a reader attached with no card anyway.
 * ccid-card-passthru: don't close chr_dev on removal, makes it possible
  to use device_del/device_add to create remove/insertion for debugging.

v11-v12 changes:
 * fix out of tree build

v10-v11 changes:
 * fix last patch that removed one of the doc files.
 * updated flow table in docs/ccid.txt

v8-v10 changes:
 * usb-ccid:
  * add slot for future use (Gerd)
  * ifdef ENABLE_MIGRATION for migration support on account of usb
   migration not being ready in general. (Gerd)
 * verbosified commit messages.

[Qemu-devel] [PATCH 6/7] ccid: add docs

2011-02-23 Thread Alon Levy

Add documentation for the usb-ccid device and accompanying two card
devices, ccid-card-emulated and ccid-card-passthru.

Signed-off-by: Alon Levy al...@redhat.com
---
 docs/ccid.txt |  135 +
 1 files changed, 135 insertions(+), 0 deletions(-)
 create mode 100644 docs/ccid.txt

diff --git a/docs/ccid.txt b/docs/ccid.txt
new file mode 100644
index 000..b8e504a
--- /dev/null
+++ b/docs/ccid.txt
@@ -0,0 +1,135 @@
+Qemu CCID Device Documentation.
+
+Contents
+1. USB CCID device
+2. Building
+3. Using ccid-card-emulated with hardware
+4. Using ccid-card-emulated with certificates
+5. Using ccid-card-passthru with client side hardware
+6. Using ccid-card-passthru with client side certificates
+7. Passthrough protocol scenario
+8. libcacard
+
+1. USB CCID device
+
+The USB CCID device is a USB device implementing the CCID specification, which
+lets one connect smart card readers that implement the same spec. For more
+information see the specification:
+
+ Universal Serial Bus
+ Device Class: Smart Card
+ CCID
+ Specification for
+ Integrated Circuit(s) Cards Interface Devices
+ Revision 1.1
+ April 22rd, 2005
+
+Smartcard are used for authentication, single sign on, decryption in
+public/private schemes and digital signatures. A smartcard reader on the client
+cannot be used on a guest with simple usb passthrough since it will then not be
+available on the client, possibly locking the computer when it is removed. On
+the other hand this device can let you use the smartcard on both the client and
+the guest machine. It is also possible to have a completely virtual smart card
+reader and smart card (i.e. not backed by a physical device) using this device.
+
+2. Building
+
+The cryptographic functions and access to the physical card is done via NSS.
+
+Installing NSS:
+
+In redhat/fedora:
+yum install nss-devel
+In ubuntu/debian:
+apt-get install libnss3-dev
+(not tested on ubuntu)
+
+Configuring and building:
+./configure --enable-smartcard  make
+
+3. Using ccid-card-emulated with hardware
+
+Assuming you have a working smartcard on the host with the current
+user, using NSS, qemu acts as another NSS client using ccid-card-emulated:
+
+qemu -usb -device usb-ccid -device ccid-card-emualated
+
+4. Using ccid-card-emulated with certificates
+
+You must create the certificates. This is a one time process. We use NSS
+certificates:
+
+certutil -d /etc/pki/nssdb -x -t CT,CT,CT -S -s CN=cert1 -n cert1
+
+Note: you must have exactly three certificates.
+
+Assuming the current user can access the certificates (use certutil -L to
+verify), you can use the emulated card type with the certificates backend:
+
+qemu -usb -device usb-ccid -device 
ccid-card-emulated,backend=certificates,cert1=cert1,cert2=cert2,cert3=cert3
+
+5. Using ccid-card-passthru with client side hardware
+
+on the host specify the ccid-card-passthru device with a suitable chardev:
+
+qemu -chardev socket,server,host=0.0.0.0,port=2001,id=ccid,nowait -usb 
-device usb-ccid -device ccid-card-passthru,chardev=ccid
+
+on the client run vscclient, built when you built the libcacard library:
+libcacard/vscclient qemu-host 2001
+
+6. Using ccid-card-passthru with client side certificates
+
+Run qemu as per #5, and run vscclient as follows:
+(Note: vscclient command line interface is in a state of change)
+
+libcacard/vscclient -e db=\/etc/pki/nssdb\ use_hw=no 
soft=(,Test,CAC,,cert1,cert2,cert3) qemu-host 2001
+
+7. Passthrough protocol scenario
+
+This is a typical interchange of messages when using the passthru card device.
+usb-ccid is a usb device. It defaults to an unattached usb device on startup.
+usb-ccid expects a chardev and expects the protocol defined in
+cac_card/vscard_common.h to be passed over that.
+The usb-ccid device can be in one of three modes:
+ * detached
+ * attached with no card
+ * attached with card
+
+A typical interchange is: (the arrow shows who started each exchange, it can 
be client
+originated or guest originated)
+
+client event  |  vscclient   |passthru| usb-ccid  
|  guest event
+--
+  |  VSC_Init||   |
+  |  VSC_ReaderAdd   || attach|
+  |  ||   
|  sees new usb device.
+card inserted -  |  ||   |
+  |  VSC_ATR |   insert   | insert
|  see new card
+  |  ||   |
+  |  VSC_APDU|   VSC_APDU |   
| - guest sends APDU
+client-physical |  ||   |
+card APDU exchange|

Re: [Qemu-devel] Re: [PATCH 10/22] migration: Refactor and simplify error checking in migrate_fd_put_ready

2011-02-23 Thread Yoshiaki Tamura

2011/2/23 Juan Quintela quint...@redhat.com:
 Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote:
 2011/2/23 Juan Quintela quint...@redhat.com:

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  migration.c |   20 +---
  1 files changed, 9 insertions(+), 11 deletions(-)

 diff --git a/migration.c b/migration.c
 index f015e02..641df9f 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -361,28 +361,26 @@ static void migrate_fd_put_ready(void *opaque)

     DPRINTF(iterate\n);
     if (qemu_savevm_state_iterate(s-mon, s-file) == 1) {
 -        int state;
         int old_vm_running = vm_running;

         DPRINTF(done iterating\n);
         vm_stop(VMSTOP_MIGRATE);

 -        if ((qemu_savevm_state_complete(s-mon, s-file))  0) {
 -            if (old_vm_running) {
 -                vm_start();
 -            }
 -            state = MIG_STATE_ERROR;
 +        if (qemu_savevm_state_complete(s-mon, s-file)  0) {
 +            migrate_fd_error(s);
         } else {
 -            state = MIG_STATE_COMPLETED;
 +            if (migrate_fd_cleanup(s)  0) {
 +                migrate_fd_error(s);
 +            } else {
 +                s-state = MIG_STATE_COMPLETED;
 +                notifier_list_notify(migration_state_notifiers);
 +            }
         }
 -        if (migrate_fd_cleanup(s)  0) {
 +        if (s-get_status(s) != MIG_STATE_COMPLETED) {
             if (old_vm_running) {
                 vm_start();
             }

 This part, although it's not fair to ask you, but calling
 vm_start when != MIG_STATE_COMPLETED terrifies me because just
 failing migrate_fd_cleanup (mostly calling qemu_fclose) may cause
 split brain between src/dst.  Although I haven't encountered this
 situation, just having stopped VMs on both sides is safer.

 I see your pain. I am not happy at all, but this was integrated by
 Anthony to fix this bug:

 commit 41ef56e61153d7bd27d34a634633bb51b1c5988d
 Author: Anthony Liguori aligu...@us.ibm.com
 Date:   Wed Jun 2 14:55:25 2010 -0500

    migration: respect exit status with exec:

  This fixes https://bugs.launchpad.net/qemu/+bug/391879


Thanks for the link.  I don't know IIUC, why stopping the VM was
a problem?  The essential thing is that we need to introduce a
flag that whether user wants to continue a VM when something goes
wrong during live migration.  Deciding only with old_vm_running is
wrong.

 I think that it fixes that bug, but it makes me un-easy to restart vm if
 there is a failure in migrate_fd_cleanup().  As I didn't wanted to
 change behaviour with this series, I left it as it was.

I agree with keeping the behavior unchanged.

 Next on ToDo list is to do something sensible with errors, just now we
 are not very good at handling them.

Yeah.  If we introduce Kemari, the migration code becomes more
important because it'll be part of the normal VM execution
path :)

Thanks,

Yoshi


 Later, Juan.

[Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread

2011-02-23 Thread Jan Kiszka

On 2011-02-23 12:08, Edgar E. Iglesias wrote:
 On Wed, Feb 23, 2011 at 11:25:54AM +0100, Paolo Bonzini wrote:
 On 02/23/2011 11:18 AM, Edgar E. Iglesias wrote:
 Sorry, I don't know the code well enough to give any sensible feedback
 on patch 2 - 4. I did test them with some of my guests and things seem
 to be OK with them but quite a bit slower.
 I saw around 10 - 20% slowdown with a cris guest and -icount 10.

 The slow down might be related to the issue with super slow icount together
 with iothread (adressed by Marcelos iothread timeout patch).

 No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to 
 looks like it deadlocked anyway.  Also, Jan has ideas on how to remove 
 the synchronization overhead in the main loop for TCG+iothread.
 
 I see. I tried booting two of my MIPS and CRIS linux guests with iothread
 and -icount 4. Without your patch, the boot crawls super slow. Your patch
 gives a huge improvement. This was the deadlock scenario which I
 mentioned in previous emails.
 
 Just to clarify the previous test where I saw slowdown with your patch:
 A CRIS setup that has a CRIS and basically only two peripherals,
 a timer block and a device (X) that computes stuff but delays the results
 with a virtual timer. The guest CPU is 99% of the time just
 busy-waiting for device X to get ready.
 
 This latter test runs in 3.7s with icount 4 and without iothread,
 with or without your patch.
 
 With icount 4 and iothread it runs in ~1m5s without your patch and
 ~1m20s with your patch. That was the 20% slowdown I mentioned earlier.
 
 Don't know if that info helps...

You should try to trace the event flow in qemu, either via strace, via
the built-in tracer (which likely requires a bit more tracepoints), or
via a system-level tracer (ftrace / kernelshark).

Did my patches contribute a bit to overhead reduction? They specifically
target the costly vcpu/iothread switches in TCG mode (caused by TCGs
excessive lock-holding times).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] [PATCH] tools: Use real async.c instead of stubs

2011-02-23 Thread Kevin Wolf

It's wrong to call BHs directly, even in tools. The only operations that
schedule BHs are called in a loop that (indirectly) contains a call to
qemu_bh_poll anyway, so we're not losing the scheduled BHs: Tools either use
synchronous functions, which are guaranteed to have completed (including any
BHs) when they return; or if they use asynchronous functions, they need to call
qemu_aio_wait() or similar functions already today.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 Makefile.objs |4 ++--
 qemu-tool.c   |   47 ---
 2 files changed, 6 insertions(+), 45 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index c144df1..fc369dc 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -13,7 +13,7 @@ oslib-obj-$(CONFIG_POSIX) += oslib-posix.o
 ###
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
-block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
+block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o 
async.o
 block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o
 block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
@@ -63,7 +63,7 @@ common-obj-y = $(block-obj-y) blockdev.o
 common-obj-y += $(net-obj-y)
 common-obj-y += $(qobject-obj-y)
 common-obj-$(CONFIG_LINUX) += $(fsdev-obj-$(CONFIG_LINUX))
-common-obj-y += readline.o console.o cursor.o async.o qemu-error.o
+common-obj-y += readline.o console.o cursor.o qemu-error.o
 common-obj-y += $(oslib-obj-y)
 common-obj-$(CONFIG_WIN32) += os-win32.o
 common-obj-$(CONFIG_POSIX) += os-posix.o
diff --git a/qemu-tool.c b/qemu-tool.c
index 392e1c9..d45840d 100644
--- a/qemu-tool.c
+++ b/qemu-tool.c
@@ -56,53 +56,10 @@ void monitor_print_filename(Monitor *mon, const char 
*filename)
 {
 }
 
-void async_context_push(void)
-{
-}
-
-void async_context_pop(void)
-{
-}
-
-int get_async_context_id(void)
-{
-return 0;
-}
-
 void monitor_protocol_event(MonitorEvent event, QObject *data)
 {
 }
 
-QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
-{
-QEMUBH *bh;
-
-bh = qemu_malloc(sizeof(*bh));
-bh-cb = cb;
-bh-opaque = opaque;
-
-return bh;
-}
-
-int qemu_bh_poll(void)
-{
-return 0;
-}
-
-void qemu_bh_schedule(QEMUBH *bh)
-{
-bh-cb(bh-opaque);
-}
-
-void qemu_bh_cancel(QEMUBH *bh)
-{
-}
-
-void qemu_bh_delete(QEMUBH *bh)
-{
-qemu_free(bh);
-}
-
 int qemu_set_fd_handler2(int fd,
  IOCanReadHandler *fd_read_poll,
  IOHandler *fd_read,
@@ -111,3 +68,7 @@ int qemu_set_fd_handler2(int fd,
 {
 return 0;
 }
+
+void qemu_notify_event(void)
+{
+}
-- 
1.7.2.3

[Qemu-devel] [PATCH 5/7] ccid: add ccid-card-emulated device

2011-02-23 Thread Alon Levy

This devices uses libcacard (internal) to emulate a smartcard conforming
to the CAC standard. It attaches to the usb-ccid bus. Usage instructions
(example command lines) are in the following patch in docs/ccid.txt. It
uses libcacard which uses nss, so it can work with both hw cards and
certificates (files).

Signed-off-by: Alon Levy al...@redhat.com

---

changes from v19-v20:
 * checkpatch.pl

changes from v18-v19:
 * add qdev.desc
 * backend: drop the enumeration property, back to using a string one.

changes from v16-v17:
 * use PROP_TYPE_ENUM for backend

changes from v15-v16:
 * fix error reporting in initfn
 * bump copyright year
 * update copyright license

changes from v1:
 * remove stale comments, use only c-style comments
 * bugfix, forgot to set recv_len
 * change reader name to 'Virtual Reader'
---
 Makefile.objs   |2 +-
 hw/ccid-card-emulated.c |  599 +++
 2 files changed, 600 insertions(+), 1 deletions(-)
 create mode 100644 hw/ccid-card-emulated.c

diff --git a/Makefile.objs b/Makefile.objs
index b5e001e..f0933f3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -197,7 +197,7 @@ hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
 hw-obj-$(CONFIG_DMA) += dma.o
 hw-obj-$(CONFIG_HPET) += hpet.o
 hw-obj-$(CONFIG_APPLESMC) += applesmc.o
-hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o
+hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o 
ccid-card-emulated.o
 
 # PPC devices
 hw-obj-$(CONFIG_OPENPIC) += openpic.o
diff --git a/hw/ccid-card-emulated.c b/hw/ccid-card-emulated.c
new file mode 100644
index 000..bd84d45
--- /dev/null
+++ b/hw/ccid-card-emulated.c
@@ -0,0 +1,599 @@
+/*
+ * CCID Card Device. Emulated card.
+ *
+ * Copyright (c) 2011 Red Hat.
+ * Written by Alon Levy.
+ *
+ * This code is licenced under the GNU LGPL, version 2 or later.
+ */
+
+/*
+ * It can be used to provide access to the local hardware in a non exclusive
+ * way, or it can use certificates. It requires the usb-ccid bus.
+ *
+ * Usage 1: standard, mirror hardware reader+card:
+ * qemu .. -usb -device usb-ccid -device ccid-card-emulated
+ *
+ * Usage 2: use certificates, no hardware required
+ * one time: create the certificates:
+ *  for i in 1 2 3; do
+ *  certutil -d /etc/pki/nssdb -x -t CT,CT,CT -S -s CN=user$i -n user$i
+ *  done
+ * qemu .. -usb -device usb-ccid \
+ *  -device ccid-card-emulated,cert1=user1,cert2=user2,cert3=user3
+ *
+ * If you use a non default db for the certificates you can specify it using
+ * the db parameter.
+ */
+
+#include pthread.h
+#include eventt.h
+#include vevent.h
+#include vreader.h
+#include vcard_emul.h
+#include qemu-char.h
+#include monitor.h
+#include hw/ccid.h
+
+#define DPRINTF(card, lvl, fmt, ...) \
+do {\
+if (lvl = card-debug) {\
+printf(ccid-card-emul: %s:  fmt , __func__, ## __VA_ARGS__);\
+} \
+} while (0)
+
+#define EMULATED_DEV_NAME ccid-card-emulated
+
+#define BACKEND_NSS_EMULATED_NAME nss-emulated
+#define BACKEND_CERTIFICATES_NAME certificates
+
+enum {
+BACKEND_NSS_EMULATED = 1,
+BACKEND_CERTIFICATES
+};
+
+#define DEFAULT_BACKEND BACKEND_NSS_EMULATED
+
+typedef struct EmulatedState EmulatedState;
+
+enum {
+EMUL_READER_INSERT = 0,
+EMUL_READER_REMOVE,
+EMUL_CARD_INSERT,
+EMUL_CARD_REMOVE,
+EMUL_GUEST_APDU,
+EMUL_RESPONSE_APDU,
+EMUL_ERROR,
+};
+
+static const char *emul_event_to_string(uint32_t emul_event)
+{
+switch (emul_event) {
+case EMUL_READER_INSERT: return EMUL_READER_INSERT;
+case EMUL_READER_REMOVE: return EMUL_READER_REMOVE;
+case EMUL_CARD_INSERT: return EMUL_CARD_INSERT;
+case EMUL_CARD_REMOVE: return EMUL_CARD_REMOVE;
+case EMUL_GUEST_APDU: return EMUL_GUEST_APDU;
+case EMUL_RESPONSE_APDU: return EMUL_RESPONSE_APDU;
+case EMUL_ERROR: return EMUL_ERROR;
+default:
+break;
+}
+return UNKNOWN;
+}
+
+typedef struct EmulEvent {
+QSIMPLEQ_ENTRY(EmulEvent) entry;
+union {
+struct {
+uint32_t type;
+} gen;
+struct {
+uint32_t type;
+uint64_t code;
+} error;
+struct {
+uint32_t type;
+uint32_t len;
+uint8_t data[];
+} data;
+} p;
+} EmulEvent;
+
+#define MAX_ATR_SIZE 40
+struct EmulatedState {
+CCIDCardState base;
+uint8_t  debug;
+char*backend_str;
+uint32_t backend;
+char*cert1;
+char*cert2;
+char*cert3;
+char*db;
+uint8_t  atr[MAX_ATR_SIZE];
+uint8_t  atr_length;
+QSIMPLEQ_HEAD(event_list, EmulEvent) event_list;
+pthread_mutex_t event_list_mutex;
+VReader *reader;
+QSIMPLEQ_HEAD(guest_apdu_list, EmulEvent) guest_apdu_list;
+pthread_mutex_t vreader_mutex; /* and guest_apdu_list mutex */
+pthread_mutex_t handle_apdu_mutex;
+pthread_cond_t handle_apdu_cond;
+int  pipe[2];
+int  quit_apdu_thread;
+pthread_mutex_t apdu_thread_quit_mutex;
+

[Qemu-devel] [PATCH 7/7] ccid: configure: improve --enable-smartcard flags

2011-02-23 Thread Alon Levy

 * add --enable-smartcard and --disable-smartcard flags
 * let the nss check only disable building the ccid-card-emulated device
 * report only if nss is found or not, not smartcard build inclusion
 * don't link with NSS if --disable-smartcard-nss

Signed-off-by: Alon Levy al...@redhat.com
---
 Makefile.objs   |3 ++-
 Makefile.target |2 +-
 configure   |   55 ++-
 3 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index f0933f3..bba862c 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -197,7 +197,8 @@ hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
 hw-obj-$(CONFIG_DMA) += dma.o
 hw-obj-$(CONFIG_HPET) += hpet.o
 hw-obj-$(CONFIG_APPLESMC) += applesmc.o
-hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o 
ccid-card-emulated.o
+hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o
+hw-obj-$(CONFIG_SMARTCARD_NSS) += ccid-card-emulated.o
 
 # PPC devices
 hw-obj-$(CONFIG_OPENPIC) += openpic.o
diff --git a/Makefile.target b/Makefile.target
index 1b744be..0951ba0 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -340,7 +340,7 @@ obj-y += $(addprefix $(HWDIR)/, $(hw-obj-y))
 
 endif # CONFIG_SOFTMMU
 
-obj-y += $(addprefix ../libcacard/, $(libcacard-$(CONFIG_SMARTCARD)))
+obj-y += $(addprefix ../libcacard/, $(libcacard-$(CONFIG_SMARTCARD_NSS)))
 
 obj-y += $(addprefix ../, $(trace-obj-y))
 obj-$(CONFIG_GDBSTUB_XML) += gdbstub-xml.o
diff --git a/configure b/configure
index e2d0e9d..4f19613 100755
--- a/configure
+++ b/configure
@@ -174,7 +174,8 @@ trace_backend=nop
 trace_file=trace
 spice=
 rbd=
-smartcard=yes
+smartcard=
+smartcard_nss=
 
 # parse CC options first
 for opt do
@@ -720,6 +721,14 @@ for opt do
   ;;
   --enable-rbd) rbd=yes
   ;;
+  --disable-smartcard) smartcard=no
+  ;;
+  --enable-smartcard) smartcard=yes
+  ;;
+  --disable-smartcard-nss) smartcard_nss=no
+  ;;
+  --enable-smartcard-nss) smartcard_nss=yes
+  ;;
   *) echo ERROR: unknown option $opt; show_help=yes
   ;;
   esac
@@ -915,6 +924,10 @@ echoDefault:trace-pid
 echo   --disable-spice  disable spice
 echo   --enable-spice   enable spice
 echo   --enable-rbd enable building the rados block device (rbd)
+echo   --disable-smartcard  disable smartcard support
+echo   --enable-smartcard   enable smartcard support
+echo   --disable-smartcard-nss  disable smartcard nss support
+echo   --enable-smartcard-nss   enable smartcard nss support
 echo 
 echo NOTE: The object files are built at the place where configure is 
launched
 exit 1
@@ -2292,16 +2305,28 @@ EOF
 fi
 
 # check for libcacard for smartcard support
-smartcard_cflags=-I\$(SRC_PATH)/libcacard
-libcacard_libs=$($pkgconfig --libs nss 2/dev/null)
-libcacard_cflags=$($pkgconfig --cflags nss)
-# TODO - what's the minimal nss version we support?
-if $pkgconfig --atleast-version=3.12.8 nss; then
+if test $smartcard != no ; then
 smartcard=yes
-QEMU_CFLAGS=$QEMU_CFLAGS $smartcard_cflags $libcacard_cflags
-LIBS=$libcacard_libs $LIBS
-else
-smartcard=no
+smartcard_cflags=
+# TODO - what's the minimal nss version we support?
+if test $smartcard_nss != no; then
+if $pkg_config --atleast-version=3.12.8 nss /dev/null 21 ; then
+smartcard_nss=yes
+smartcard_cflags=-I\$(SRC_PATH)/libcacard
+libcacard_libs=$($pkg_config --libs nss 2/dev/null)
+libcacard_cflags=$($pkg_config --cflags nss 2/dev/null)
+QEMU_CFLAGS=$QEMU_CFLAGS $smartcard_cflags $libcacard_cflags
+LIBS=$libcacard_libs $LIBS
+else
+if test $smartcard_nss == yes; then
+feature_not_found nss
+fi
+smartcard_nss=no
+fi
+fi
+fi
+if test $smartcard == no ; then
+smartcard_nss=no
 fi
 
 ##
@@ -2537,7 +2562,7 @@ echo Trace output file $trace_file-pid
 echo spice support $spice
 echo rbd support   $rbd
 echo xfsctl support$xfs
-echo smartcard support $smartcard
+echo nss used  $smartcard_nss
 
 if test $sdl_too_old = yes; then
 echo - Your SDL version is too old - please upgrade to have SDL support
@@ -2823,6 +2848,10 @@ if test $smartcard = yes ; then
   echo CONFIG_SMARTCARD=y  $config_host_mak
 fi
 
+if test $smartcard_nss = yes ; then
+  echo CONFIG_SMARTCARD_NSS=y  $config_host_mak
+fi
+
 # XXX: suppress that
 if [ $bsd = yes ] ; then
   echo CONFIG_BSD=y  $config_host_mak
@@ -3165,7 +3194,7 @@ fi
 if test $target_darwin_user = yes ; then
   echo CONFIG_DARWIN_USER=y  $config_target_mak
 fi
-if test $smartcard = yes ; then
+if test $smartcard_nss = yes ; then
   echo subdir-$target: subdir-libcacard  $config_host_mak
   echo libcacard_libs=$libcacard_libs  $config_host_mak
   echo libcacard_cflags=$libcacard_cflags  $config_host_mak
@@ -3383,7 +3412,7 @@ for hwlib in 32 64; do
   echo

[Qemu-devel] [PATCH 3/7] ccid: add passthru card device

2011-02-23 Thread Alon Levy

The passthru ccid card is a device sitting on the usb-ccid bus and
using a chardevice to communicate with a remote device using the
VSCard protocol defined in libcacard/vscard_common.h

Usage docs available in following patch in docs/ccid.txt

Signed-off-by: Alon Levy al...@redhat.com

---

Changes from v19-v20:
 * checkpatch.pl

Changes from v18-v19:
 * add qdev.desc
 * remove .qdev.unplug (no hot unplug support for ccid bus)

Changes from v16-v17:
 * fix wrong cast when receiving VSC_Error
 * ccid-card-passthru: force chardev user wakeup by sending Init
   see lengthy comment below.

Changes from v15-v16:

Behavioral changes:
 * return correct size
 * return error instead of assert if client sent too large ATR
 * don't assert if client sent too large a size, but add asserts for indices to 
buffer
 * reset vscard_in indices on chardev disconnect
 * handle init from client
 * error if no chardev supplied
 * use ntoh, hton
 * eradicate reader_id_t
 * remove Reconnect usage (removed from VSCARD protocol)
 * send VSC_SUCCESS on card insert/remove and reader add/remove

Style fixes:
 * width of line fix
 * update copyright
 * remove old TODO's
 * update file header comment
 * use macros for debug levels
 * c++ style comment replacement
 * update copyright license
 * fix ATR size comment
 * fix whitespace in struct def
 * fix DPRINTF prefix
 * line width fix

ccid-card-passthru: force chardev user wakeup by sending Init

The problem: how to wakeup the user of the smartcard when the smartcard
device is initialized?

Long term solution: have a callback interface. This was done via
the deprecated so called chardev ioctl interface.

Short term solution: do a write. Specifically we write an Init message.
And we change the client to send it's own Init message regardless of
receiving this one. Additional Init messages will be regarded as
acceptable, the first one received after connection establishment is
the determining one wrt capabilities.
---
 Makefile.objs   |2 +-
 hw/ccid-card-passthru.c |  337 +++
 2 files changed, 338 insertions(+), 1 deletions(-)
 create mode 100644 hw/ccid-card-passthru.c

diff --git a/Makefile.objs b/Makefile.objs
index 414d206..a3ac45a 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -197,7 +197,7 @@ hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
 hw-obj-$(CONFIG_DMA) += dma.o
 hw-obj-$(CONFIG_HPET) += hpet.o
 hw-obj-$(CONFIG_APPLESMC) += applesmc.o
-hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o
+hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o
 
 # PPC devices
 hw-obj-$(CONFIG_OPENPIC) += openpic.o
diff --git a/hw/ccid-card-passthru.c b/hw/ccid-card-passthru.c
new file mode 100644
index 000..ad2adb7
--- /dev/null
+++ b/hw/ccid-card-passthru.c
@@ -0,0 +1,337 @@
+/*
+ * CCID Passthru Card Device emulation
+ *
+ * Copyright (c) 2011 Red Hat.
+ * Written by Alon Levy.
+ *
+ * This code is licenced under the GNU LGPL, version 2 or later.
+ */
+
+#include arpa/inet.h
+
+#include qemu-char.h
+#include monitor.h
+#include hw/ccid.h
+#include libcacard/vscard_common.h
+
+#define DPRINTF(card, lvl, fmt, ...)\
+do {\
+if (lvl = card-debug) {   \
+printf(ccid-card-passthru:  fmt , ## __VA_ARGS__); \
+}   \
+} while (0)
+
+#define D_WARN 1
+#define D_INFO 2
+#define D_MORE_INFO 3
+#define D_VERBOSE 4
+
+/* TODO: do we still need this? */
+uint8_t DEFAULT_ATR[] = {
+/* From some example somewhere
+ 0x3B, 0xB0, 0x18, 0x00, 0xD1, 0x81, 0x05, 0xB1, 0x40, 0x38, 0x1F, 0x03, 0x28
+ */
+
+/* From an Athena smart card */
+ 0x3B, 0xD5, 0x18, 0xFF, 0x80, 0x91, 0xFE, 0x1F, 0xC3, 0x80, 0x73, 0xC8, 0x21,
+ 0x13, 0x08
+};
+
+
+#define PASSTHRU_DEV_NAME ccid-card-passthru
+#define VSCARD_IN_SIZE 65536
+
+/* maximum size of ATR - from 7816-3 */
+#define MAX_ATR_SIZE40
+
+typedef struct PassthruState PassthruState;
+
+struct PassthruState {
+CCIDCardState base;
+CharDriverState *cs;
+uint8_t  vscard_in_data[VSCARD_IN_SIZE];
+uint32_t vscard_in_pos;
+uint32_t vscard_in_hdr;
+uint8_t  atr[MAX_ATR_SIZE];
+uint8_t  atr_length;
+uint8_t  debug;
+};
+
+/* VSCard protocol over chardev
+ * This code should not depend on the card type.
+ * */
+
+static void ccid_card_vscard_send_msg(PassthruState *s,
+VSCMsgType type, uint32_t reader_id,
+const uint8_t *payload, uint32_t length)
+{
+VSCMsgHeader scr_msg_header;
+
+scr_msg_header.type = htonl(type);
+scr_msg_header.reader_id = htonl(reader_id);
+scr_msg_header.length = htonl(length);
+qemu_chr_write(s-cs, (uint8_t *)scr_msg_header, sizeof(VSCMsgHeader));
+qemu_chr_write(s-cs, payload, length);
+}
+
+static void ccid_card_vscard_send_apdu(PassthruState *s,
+const uint8_t *apdu, uint32_t length)
+{
+ccid_card_vscard_send_msg(
+s, VSC_APDU,

[Qemu-devel] [PATCH 1/7] usb-ccid: add CCID bus

2011-02-23 Thread Alon Levy

A CCID device is a smart card reader. It is a USB device, defined at [1].
This patch introduces the usb-ccid device that is a ccid bus. Next patches will
introduce two card types to use it, a passthru card and an emulated card.

 [1] http://www.usb.org/developers/devclass_docs/DWG_Smart-Card_CCID_Rev110.

Signed-off-by: Alon Levy al...@redhat.com

---

changes from v19-v20:
 * checkpatch.pl

changes from v18-v19:
 * merged: ccid.h: add copyright, fix define and remove non C89 comments
 * add qdev.desc

changes from v15-v16:

Behavioral changes:
 * fix abort on client answer after card remove
 * enable migration
 * remove side affect code from asserts
 * return consistent self-powered state
 * mask out reserved bits in ccid_set_parameters
 * add missing abRFU in SetParameters (no affect on linux guest)

whitefixes / comments / consts defines:
 * remove stale comment
 * remove ccid_print_pending_answers if no DEBUG_CCID
 * replace printf's with DPRINTF, remove DEBUG_CCID, add verbosity defines
 * use error_report
 * update copyright (most of the code is not original)
 * reword known bug comment
 * add missing closing quote in comment
 * add missing whitespace on one line
 * s/CCID_SetParameter/CCID_SetParameters/
 * add comments
 * use define for max packet size

Comment for return consistent self-powered state:

the Configuration Descriptor bmAttributes claims we are self powered,
but we were returning not self powered to USB_REQ_GET_STATUS control message.

In practice, this message is not sent by a linux 2.6.35.10-74.fc14.x86_64
guest (not tested on other guests), unless you issue lsusb -v as root (for
example).
---
 Makefile.objs |1 +
 configure |6 +
 hw/ccid.h |   54 +++
 hw/usb-ccid.c | 1391 +
 4 files changed, 1452 insertions(+), 0 deletions(-)
 create mode 100644 hw/ccid.h
 create mode 100644 hw/usb-ccid.c

diff --git a/Makefile.objs b/Makefile.objs
index c144df1..414d206 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -197,6 +197,7 @@ hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
 hw-obj-$(CONFIG_DMA) += dma.o
 hw-obj-$(CONFIG_HPET) += hpet.o
 hw-obj-$(CONFIG_APPLESMC) += applesmc.o
+hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o
 
 # PPC devices
 hw-obj-$(CONFIG_OPENPIC) += openpic.o
diff --git a/configure b/configure
index 791b71d..147aab3 100755
--- a/configure
+++ b/configure
@@ -174,6 +174,7 @@ trace_backend=nop
 trace_file=trace
 spice=
 rbd=
+smartcard=yes
 
 # parse CC options first
 for opt do
@@ -2523,6 +2524,7 @@ echo Trace output file $trace_file-pid
 echo spice support $spice
 echo rbd support   $rbd
 echo xfsctl support$xfs
+echo smartcard support $smartcard
 
 if test $sdl_too_old = yes; then
 echo - Your SDL version is too old - please upgrade to have SDL support
@@ -2804,6 +2806,10 @@ if test $spice = yes ; then
   echo CONFIG_SPICE=y  $config_host_mak
 fi
 
+if test $smartcard = yes ; then
+  echo CONFIG_SMARTCARD=y  $config_host_mak
+fi
+
 # XXX: suppress that
 if [ $bsd = yes ] ; then
   echo CONFIG_BSD=y  $config_host_mak
diff --git a/hw/ccid.h b/hw/ccid.h
new file mode 100644
index 000..4350bc2
--- /dev/null
+++ b/hw/ccid.h
@@ -0,0 +1,54 @@
+/*
+ * CCID Passthru Card Device emulation
+ *
+ * Copyright (c) 2011 Red Hat.
+ * Written by Alon Levy.
+ *
+ * This code is licenced under the GNU LGPL, version 2 or later.
+ */
+
+#ifndef CCID_H
+#define CCID_H
+
+#include qdev.h
+
+typedef struct CCIDCardState CCIDCardState;
+typedef struct CCIDCardInfo CCIDCardInfo;
+
+/* state of the CCID Card device (i.e. hw/ccid-card-*.c)
+ */
+struct CCIDCardState {
+DeviceState qdev;
+uint32_tslot; /* For future use with multiple slot reader. */
+};
+
+/* callbacks to be used by the CCID device (hw/usb-ccid.c) to call
+ * into the smartcard device (hw/ccid-card-*.c)
+ */
+struct CCIDCardInfo {
+DeviceInfo qdev;
+void (*print)(Monitor *mon, CCIDCardState *card, int indent);
+const uint8_t *(*get_atr)(CCIDCardState *card, uint32_t *len);
+void (*apdu_from_guest)(CCIDCardState *card,
+const uint8_t *apdu,
+uint32_t len);
+int (*exitfn)(CCIDCardState *card);
+int (*initfn)(CCIDCardState *card);
+};
+
+/* API for smartcard calling the CCID device (used by hw/ccid-card-*.c)
+ */
+void ccid_card_send_apdu_to_guest(CCIDCardState *card,
+  uint8_t *apdu,
+  uint32_t len);
+void ccid_card_card_removed(CCIDCardState *card);
+void ccid_card_card_inserted(CCIDCardState *card);
+void ccid_card_card_error(CCIDCardState *card, uint64_t error);
+void ccid_card_qdev_register(CCIDCardInfo *card);
+
+/* support guest visible insertion/removal of ccid devices based on actual
+ * devices connected/removed. Called by card implementation (passthru, local) 
*/
+int ccid_card_ccid_attach(CCIDCardState *card);
+void ccid_card_ccid_detach(CCIDCardState *card);
+
+#endif /* CCID_H */
diff --git

[Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread

2011-02-23 Thread Edgar E. Iglesias

On Wed, Feb 23, 2011 at 12:39:52PM +0100, Jan Kiszka wrote:
 On 2011-02-23 12:08, Edgar E. Iglesias wrote:
  On Wed, Feb 23, 2011 at 11:25:54AM +0100, Paolo Bonzini wrote:
  On 02/23/2011 11:18 AM, Edgar E. Iglesias wrote:
  Sorry, I don't know the code well enough to give any sensible feedback
  on patch 2 - 4. I did test them with some of my guests and things seem
  to be OK with them but quite a bit slower.
  I saw around 10 - 20% slowdown with a cris guest and -icount 10.
 
  The slow down might be related to the issue with super slow icount 
  together
  with iothread (adressed by Marcelos iothread timeout patch).
 
  No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to 
  looks like it deadlocked anyway.  Also, Jan has ideas on how to remove 
  the synchronization overhead in the main loop for TCG+iothread.
  
  I see. I tried booting two of my MIPS and CRIS linux guests with iothread
  and -icount 4. Without your patch, the boot crawls super slow. Your patch
  gives a huge improvement. This was the deadlock scenario which I
  mentioned in previous emails.
  
  Just to clarify the previous test where I saw slowdown with your patch:
  A CRIS setup that has a CRIS and basically only two peripherals,
  a timer block and a device (X) that computes stuff but delays the results
  with a virtual timer. The guest CPU is 99% of the time just
  busy-waiting for device X to get ready.
  
  This latter test runs in 3.7s with icount 4 and without iothread,
  with or without your patch.
  
  With icount 4 and iothread it runs in ~1m5s without your patch and
  ~1m20s with your patch. That was the 20% slowdown I mentioned earlier.
  
  Don't know if that info helps...
 
 You should try to trace the event flow in qemu, either via strace, via
 the built-in tracer (which likely requires a bit more tracepoints), or
 via a system-level tracer (ftrace / kernelshark).

Thanks, I'll see if I can get some time to run this more carefully during
some weekend.

 
 Did my patches contribute a bit to overhead reduction? They specifically
 target the costly vcpu/iothread switches in TCG mode (caused by TCGs
 excessive lock-holding times).

Do you have a tree for quick access to your patches? (couldnt find them
on my inbox).

I could give them a quick go and post results.

Cheers

[Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread

2011-02-23 Thread Paolo Bonzini


On 02/23/2011 12:08 PM, Edgar E. Iglesias wrote:

  No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to
  looks like it deadlocked anyway.  Also, Jan has ideas on how to remove
  the synchronization overhead in the main loop for TCG+iothread.

I see. I tried booting two of my MIPS and CRIS linux guests with iothread
and -icount 4. Without your patch, the boot crawls super slow. Your patch
gives a huge improvement. This was the deadlock scenario which I
mentioned in previous emails.

Just to clarify the previous test where I saw slowdown with your patch:
A CRIS setup that has a CRIS and basically only two peripherals,
a timer block and a device (X) that computes stuff but delays the results
with a virtual timer. The guest CPU is 99% of the time just
busy-waiting for device X to get ready.

This latter test runs in 3.7s with icount 4 and without iothread,
with or without your patch.


Thanks for testing this.


With icount 4 and iothread it runs in ~1m5s without your patch and
~1m20s with your patch. That was the 20% slowdown I mentioned earlier.


Ok, so it is in both cases with iothread.  We go from 16x slowdown to 
19x on one testcase :) and huge improvement on another.  (Also, the 
CRIS images on qemu.org simply hang for me without my patch and numeric 
icount---and the watchdog triggers---so that's another factor in favor 
of the patches).  I guess we can live with the slowdown for now, if 
somebody else finds the patch okay.


Do you have images for the slow test?

Paolo

[Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread

2011-02-23 Thread Jan Kiszka

On 2011-02-23 13:40, Edgar E. Iglesias wrote:
 On Wed, Feb 23, 2011 at 12:39:52PM +0100, Jan Kiszka wrote:
 On 2011-02-23 12:08, Edgar E. Iglesias wrote:
 On Wed, Feb 23, 2011 at 11:25:54AM +0100, Paolo Bonzini wrote:
 On 02/23/2011 11:18 AM, Edgar E. Iglesias wrote:
 Sorry, I don't know the code well enough to give any sensible feedback
 on patch 2 - 4. I did test them with some of my guests and things seem
 to be OK with them but quite a bit slower.
 I saw around 10 - 20% slowdown with a cris guest and -icount 10.

 The slow down might be related to the issue with super slow icount 
 together
 with iothread (adressed by Marcelos iothread timeout patch).

 No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to 
 looks like it deadlocked anyway.  Also, Jan has ideas on how to remove 
 the synchronization overhead in the main loop for TCG+iothread.

 I see. I tried booting two of my MIPS and CRIS linux guests with iothread
 and -icount 4. Without your patch, the boot crawls super slow. Your patch
 gives a huge improvement. This was the deadlock scenario which I
 mentioned in previous emails.

 Just to clarify the previous test where I saw slowdown with your patch:
 A CRIS setup that has a CRIS and basically only two peripherals,
 a timer block and a device (X) that computes stuff but delays the results
 with a virtual timer. The guest CPU is 99% of the time just
 busy-waiting for device X to get ready.

 This latter test runs in 3.7s with icount 4 and without iothread,
 with or without your patch.

 With icount 4 and iothread it runs in ~1m5s without your patch and
 ~1m20s with your patch. That was the 20% slowdown I mentioned earlier.

 Don't know if that info helps...

 You should try to trace the event flow in qemu, either via strace, via
 the built-in tracer (which likely requires a bit more tracepoints), or
 via a system-level tracer (ftrace / kernelshark).
 
 Thanks, I'll see if I can get some time to run this more carefully during
 some weekend.
 

 Did my patches contribute a bit to overhead reduction? They specifically
 target the costly vcpu/iothread switches in TCG mode (caused by TCGs
 excessive lock-holding times).
 
 Do you have a tree for quick access to your patches? (couldnt find them
 on my inbox).

http://thread.gmane.org/gmane.comp.emulators.qemu/93765
(looks like I failed to CC you)

and they are also part of

git://git.kiszka.org/qemu-kvm.git queues/kvm-upstream

 
 I could give them a quick go and post results.
 
 Cheers

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-23 Thread Avi Kivity


On 02/22/2011 11:11 PM, Anthony Liguori wrote:

The commit file is considered reliable storage for the result of image
switch operation. Think of the following scenario:

- mgmt app requests live copy of guests ide1-hd0
from /a/image.img to /b/image.img.
- mgmt app dies.
- guest switches to /b/image.img, /a/image.img is outdated.
- guest dies.

Notifying the switch via QMP would not be reliable in this case.


But this is true of any number of operations in QEMU such as hotplug 
where if a management tool dies after requesting hotplug and then the 
guest dies before the management tool reconnects, it's never known 
whether it's truly connected or not.


The only way to robustly solve this is for QEMU to maintain a stateful 
config file.


Or, the management tool can remember that it tries to hotplug, 
reconnect, query qemu (or just retry), and proceed according to the result:


- insert record in config for the new device, status = inactive
- commit
- tell qemu to hotplug
crash
- restart, read database, see inactive device
- query qemu
- if device is there, mark as active, commit
- else, tell qemu to hotplug, when successful, mark as active, commit



  A one-off for this particular command doesn't seem wise to me.


Strongly agreed.

--
error compiling committee.c: too many arguments to function

[Qemu-devel] Re: [PATCH] tools: Use real async.c instead of stubs

2011-02-23 Thread Stefan Hajnoczi

On Wed, Feb 23, 2011 at 11:54 AM, Kevin Wolf kw...@redhat.com wrote:
 It's wrong to call BHs directly, even in tools. The only operations that
 schedule BHs are called in a loop that (indirectly) contains a call to
 qemu_bh_poll anyway, so we're not losing the scheduled BHs: Tools either use
 synchronous functions, which are guaranteed to have completed (including any
 BHs) when they return; or if they use asynchronous functions, they need to 
 call
 qemu_aio_wait() or similar functions already today.

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  Makefile.objs |    4 ++--
  qemu-tool.c   |   47 ---
  2 files changed, 6 insertions(+), 45 deletions(-)

We discussed this on IRC and I think it makes sense.  Review from
others would be appreciated.

Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Re: [Qemu-devel] Re: Network bridging without adding bridge with brctl, possible?

2011-02-23 Thread Jan Kiszka

On 2011-02-23 07:38, Gerhard Wiesinger wrote:
 On Mon, 21 Feb 2011, Arnd Bergmann wrote:
 
 On Monday 21 February 2011, Jan Kiszka wrote:
 Now I think I tried all useful possible combinations:
 1.) macvtap0 and macvlan0 in bridged and non bridge mode
 2.) macvlan0 based on eth0 or based on macvtap0
 3.) Using and ip address on macvlan0 or not (tried both for 7b mac and
 7c mac)
 4.) Using 7b and 7c mac address in KVM (MAC in KVM is always the
 command
 line configured) and used tap interface from macvtap0 (as no second tap
 devices shows up)

 Summary:
 1.) Using macvtap0 only with 7b mac on interface and also at qemu works
 well in bridged mode as well as non bridged mode but with limitation no
 guest/host communication possible, used interface in KVM is tap
 interface created by macvtap0. Quite logically that it doesn't work
 with
 guest/host because of missing hairpin mode on the switch. But according
 to http://virt.kernelnewbies.org/MacVTap bridge mode should work even
 without hairpinning available on the switch.
 2.) macvlan0 doesn't create any tap interface therefore it can't be
 used
 as a device on KVM.
 3.) Using 7c mac address in KVM doesn't work at all regardsless of
 setting ip addresses on macvlan0 or any other setup.

 @Arnd: Can you explain a setup in detail which should work also with
 host/guest communication. Thnx.

 Any further ideas?
 Which kernel is needed to work?
 (current: 2.6.34.7-56.fc13.x86_64)

 Actually, I tried this successfully over a 2.6.38-rcX, but I have no
 idea what changes related to macvlan/tap went in since .34 and if this
 is required for this.

 We had a few bugs in .34, but it should work in general.

 One thing to make sure is that the host has connectivity through the
 macvlan interface and the lower interface (eth0) has no IP address
 assigned
 but is 'up'.

 It could also be a bug in the NIC, you could try to set the NIC into
 promiscious mode using ethtool to work around that.
 
 Hello,
 
 After some further tests and looking at the iproute ip and kernel code I
 finally gave up because I thing such a setup it is not possible without
 breaking up/reconfiguring eth0. When I have to reconfigure eth0 I think
 a better approach is to configure a bridge which I finally did and works
 well.
 
 I tried to explain/document the macvtap/macvlan concepts and limitations
 below. Please comment on it whether this is true or false.
 
 macvtap/macvlan driver concepts and limitations:
 1.) macvlan driver adds a MAC address to a lower interface device where
 the actual macvlanx device is based on
 2.) macvtap driver is based on macvlan driver and macvtap driver adds
 additional functionality of interface = external program communication
 with stdin/stdout channel.
 3.) Limitations: macvtap/macvlan based devices can only communicate with
 childs based on the same lower device (e.g. eth0 in this sample) but not
 to the lower device itself, only to the outside world of the interface
 (I guess limitations are derived from function macvlan_queue_xmit:
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/net/macvlan.c;h=6ed577b065df136a18af5ee483810773ca4f43f8;hb=HEAD#l217)
 
 
 Example:
 # Lower device eth0 must not have an IP address assigned, but when also
 no communication is possible with macv* devices
 /root/download/iproute2/iproute2-2.6.37/ip/ip link add link eth0 name
 macvtap0 type macvtap mode bridge
 /root/download/iproute2/iproute2-2.6.37/ip/ip link set macvtap0 address
 1a:46:0b:ca:bc:7b up
 /root/download/iproute2/iproute2-2.6.37/ip/ip link add link eth0 name
 macvtap1 type macvtap mode bridge
 /root/download/iproute2/iproute2-2.6.37/ip/ip link set macvtap1 address
 1a:46:0b:ca:bc:7c up
 /root/download/iproute2/iproute2-2.6.37/ip/ip link add link eth0 name
 macvlan0 type macvtap mode bridge
 /root/download/iproute2/iproute2-2.6.37/ip/ip link set macvlan0 address
 1a:46:0b:ca:bc:7d up
 
 Example usage:
 eth0 should not be assigned any IP. macvlan0 should be used as the main
 host interface and the main IP address is configured.
 macvtap0 and macvtap1 can be used in e.g. VMs.
 Bridge communication is possible between the 3 interfaces and the outer
 world: macvtap0, macvtap1, macvlan0
 E.g.:
 1.) macvtap0 = macvtap1, macvlan0, eth0 external
 2.) macvtap1 = macvtap0, macvlan0, eth0 external
 3.) macvlan0 = macvtap0, macvtap1, eth0 external
 
 But no communication is possible between lower device eth0 and the child
 interfaces macvtap0, macvtap1, macvlan0:
 1.) eth0 = macvtap0
 2.) eth0 = macvtap1
 3.) eth0 = macvlan0

Right, but if I set IP(eth0) == IP(macvlan0), I'm able to communicate
between macvlan0 and mactapX, thus between guest and host. Just
re-checked here, still works (after resolving the usual MAC address mess
I caused by configuring manually).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-23 Thread Avi Kivity


On 02/23/2011 01:14 AM, Anthony Liguori wrote:


-drive already ties into the qemuopts infrastructure and we have 
readconfig and writeconfig.  I don't think we're missing any major 
pieces to do this in a more proper fashion.


The problem with qemu config files is that it splits the authoritative 
source of where images are stored into two.  Is it in the management 
tool's database or is it in qemu's config file?


For the problem at hand, one solution is to make qemu stop after the 
copy, and then management can issue an additional command to rearrange 
the disk and resume the guest.  A drawback here is that if management 
dies, the guest is stopped until it restarts.  We also make management 
latency guest visible, even if it doesn't die at an inconvenient place.


An alternative approach is to have the copy be performed by a new 
layered block format driver:


- create a new image, type = live-copy, containing three pieces of 
information

   - source image
   - destination image
   - copy state (initially nothing is copied)
- tell qemu switch to the new image
- qemu starts copying, updates copy state as needed
- copy finishes, event is emitted; reads and writes still serviced
- management receives event, switches qemu to destination image
- management removes live-copy image

If management dies while this is happening, it can simply query the 
state of the copy.  Similarly, if qemu dies, the copy state is 
persistent (could be 0/1 or real range of blocks).


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Markus Armbruster

Chunqiang Tang ct...@us.ibm.com writes:

[...]
 Now let’s talk about features. It seems that there is great interest in 
 QCOW2’ internal snapshot feature. If we really want to do that, the right 

Great interest?  Its use cases are demo, debugging, testing and such.
Kind of useful for developers, but I wouldn't want to use it in anger.
Nice to have if we can get it cheaply, but I'm not prepared to pay much
for it in performance or complexity, and I doubt I'm the only one.

Users always say yes when you ask them whether they need some feature.
Hence, the question is useless.  A better question to ask is how much
are you willing to pay for it?

 solution is to follow VMDK’s approach of storing each snapshot as a 
 separate COW file (see http://www.vmware.com/app/vmdk/?src=vmdk ), rather 
 than using the reference count table. VMDK’s approach can be easily 
 implemented for any COW format, or even as a function of the generic block 
 layer, without complicating any COW format or hurting its performance. I 
 know the snapshots are not really “internal” as stored in a single file 
 but instead more like external snapshots, but users don’t care about that 
 so long as they support the same use cases. Probably many people who use 
 VMware don't even know that the snapshots are stored as separate files. Do 
 they care?

I certainly wouldn't.

Re: [Qemu-devel] [PATCH] target-arm: Implement a minimal set of cp14 debug registers

2011-02-23 Thread Peter Maydell

On 22 February 2011 18:19, Peter Maydell peter.mayd...@linaro.org wrote:
 +            tmp = tcg_const_i32(0);
 +            store_reg(s, rt, tmp);

This generates spurious resource leak warnings, because
store_reg() calls dead_tmp() so you can't store anything
you didn't get from new_tmp().

I think I'll do what Aurelien suggested and put together
a patch that implements the resource-leak debug as a proper
facility at the tcg level.

-- PMM

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Avi Kivity


On 02/22/2011 10:56 AM, Kevin Wolf wrote:

*sigh*

It starts to get annoying, but if you really insist, I can repeat it
once more: These features that you don't need (this is the correct
description for what you call misfeatures) _are_ implemented in a way
that they don't impact the normal case. And they are it today.



Plus, encryption and snapshots can be implemented in a way that doesn't 
impact performance more than is reasonable.  Compression perhaps not, 
but if you choose compression, then performance is not your top 
consideration.  That's the case with filesystems that support 
compression as well.


--
error compiling committee.c: too many arguments to function

[Qemu-devel] [PATCH 03/18] Introduce skip_header parameter to qemu_loadvm_state().

2011-02-23 Thread Yoshiaki Tamura

Introduce skip_header parameter to qemu_loadvm_state() so that it can
be called iteratively without reading the header.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |2 +-
 savevm.c|   24 +---
 sysemu.h|2 +-
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/migration.c b/migration.c
index 302b8fe..bd51fef 100644
--- a/migration.c
+++ b/migration.c
@@ -63,7 +63,7 @@ int qemu_start_incoming_migration(const char *uri)
 
 void process_incoming_migration(QEMUFile *f)
 {
-if (qemu_loadvm_state(f)  0) {
+if (qemu_loadvm_state(f, 0)  0) {
 fprintf(stderr, load of migration failed\n);
 exit(0);
 }
diff --git a/savevm.c b/savevm.c
index 22010b9..52d5be8 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1716,7 +1716,7 @@ typedef struct LoadStateEntry {
 int version_id;
 } LoadStateEntry;
 
-int qemu_loadvm_state(QEMUFile *f)
+int qemu_loadvm_state(QEMUFile *f, int skip_header)
 {
 QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
 QLIST_HEAD_INITIALIZER(loadvm_handlers);
@@ -1729,17 +1729,19 @@ int qemu_loadvm_state(QEMUFile *f)
 return -EINVAL;
 }
 
-v = qemu_get_be32(f);
-if (v != QEMU_VM_FILE_MAGIC)
-return -EINVAL;
+if (!skip_header) {
+v = qemu_get_be32(f);
+if (v != QEMU_VM_FILE_MAGIC)
+return -EINVAL;
 
-v = qemu_get_be32(f);
-if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-fprintf(stderr, SaveVM v2 format is obsolete and don't work 
anymore\n);
-return -ENOTSUP;
+v = qemu_get_be32(f);
+if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+fprintf(stderr, SaveVM v2 format is obsolete and don't work 
anymore\n);
+return -ENOTSUP;
+}
+if (v != QEMU_VM_FILE_VERSION)
+return -ENOTSUP;
 }
-if (v != QEMU_VM_FILE_VERSION)
-return -ENOTSUP;
 
 while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
 uint32_t instance_id, version_id, section_id;
@@ -2062,7 +2064,7 @@ int load_vmstate(const char *name)
 return -EINVAL;
 }
 
-ret = qemu_loadvm_state(f);
+ret = qemu_loadvm_state(f, 0);
 
 qemu_fclose(f);
 if (ret  0) {
diff --git a/sysemu.h b/sysemu.h
index 0a83ab9..8339eb4 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -93,7 +93,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int 
blk_enable,
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
-int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state(QEMUFile *f, int skip_header);
 
 /* SLIRP */
 void do_info_slirp(Monitor *mon);
-- 
1.7.1.2

[Qemu-devel] [PATCH 01/18] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer().

2011-02-23 Thread Yoshiaki Tamura

Currently buf size is fixed at 32KB.  It would be useful if it could
be flexible.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |2 ++
 savevm.c |   20 +++-
 2 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index 5e24329..a168a37 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -58,6 +58,8 @@ void qemu_fflush(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
+void *qemu_realloc_buffer(QEMUFile *f, int size);
+void qemu_clear_buffer(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index a50fd31..22010b9 100644
--- a/savevm.c
+++ b/savevm.c
@@ -171,7 +171,8 @@ struct QEMUFile {
when reading */
 int buf_index;
 int buf_size; /* 0 when writing */
-uint8_t buf[IO_BUF_SIZE];
+int buf_max_size;
+uint8_t *buf;
 
 int has_error;
 };
@@ -422,6 +423,9 @@ QEMUFile *qemu_fopen_ops(void *opaque, 
QEMUFilePutBufferFunc *put_buffer,
 f-get_rate_limit = get_rate_limit;
 f-is_write = 0;
 
+f-buf_max_size = IO_BUF_SIZE;
+f-buf = qemu_malloc(sizeof(uint8_t) * f-buf_max_size);
+
 return f;
 }
 
@@ -452,6 +456,19 @@ void qemu_fflush(QEMUFile *f)
 }
 }
 
+void *qemu_realloc_buffer(QEMUFile *f, int size)
+{
+f-buf_max_size = size;
+f-buf = qemu_realloc(f-buf, f-buf_max_size);
+
+return f-buf;
+}
+
+void qemu_clear_buffer(QEMUFile *f)
+{
+f-buf_size = f-buf_index = f-buf_offset = 0;
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
 int len;
@@ -477,6 +494,7 @@ int qemu_fclose(QEMUFile *f)
 qemu_fflush(f);
 if (f-close)
 ret = f-close(f-opaque);
+qemu_free(f-buf);
 qemu_free(f);
 return ret;
 }
-- 
1.7.1.2

[Qemu-devel] [PATCH 18/18] Introduce kemari: to enable FT migration mode (Kemari).

2011-02-23 Thread Yoshiaki Tamura

When kemari: is set in front of URI of migrate command, it will turn
on ft_mode to start FT migration mode (Kemari).  On the receiver side,
the option looks like, -incoming kemari:protocol:address:port

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Acked-by: Paolo Bonzini pbonz...@redhat.com
---
 hmp-commands.hx |4 +++-
 migration.c |   12 
 qmp-commands.hx |4 +++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 372bef4..4588f38 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -760,7 +760,9 @@ ETEXI
  \n\t\t\t -b for migration without shared storage with
   full copy of disk\n\t\t\t -i for migration without 
  shared storage with incremental copy of disk 
- (base image shared between src and destination),
+ (base image shared between src and destination)
+ \n\t\t\t put \kemari:\ in front of URI to enable 
+ Fault Tolerance mode (Kemari protocol),
 .user_print = monitor_user_noop,   
.mhandler.cmd_new = do_migrate,
 },
diff --git a/migration.c b/migration.c
index 82f4a4d..95096ef 100644
--- a/migration.c
+++ b/migration.c
@@ -48,6 +48,12 @@ int qemu_start_incoming_migration(const char *uri)
 const char *p;
 int ret;
 
+/* check ft_mode (Kemari protocol) */
+if (strstart(uri, kemari:, p)) {
+ft_mode = FT_INIT;
+uri = p;
+}
+
 if (strstart(uri, tcp:, p))
 ret = tcp_start_incoming_migration(p);
 #if !defined(WIN32)
@@ -99,6 +105,12 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 return -1;
 }
 
+/* check ft_mode (Kemari protocol) */
+if (strstart(uri, kemari:, p)) {
+ft_mode = FT_INIT;
+uri = p;
+}
+
 if (strstart(uri, tcp:, p)) {
 s = tcp_start_outgoing_migration(mon, p, max_throttle, detach,
  blk, inc);
diff --git a/qmp-commands.hx b/qmp-commands.hx
index df40a3d..68ca48a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -437,7 +437,9 @@ EQMP
  \n\t\t\t -b for migration without shared storage with
   full copy of disk\n\t\t\t -i for migration without 
  shared storage with incremental copy of disk 
- (base image shared between src and destination),
+ (base image shared between src and destination)
+ \n\t\t\t put \kemari:\ in front of URI to enable 
+ Fault Tolerance mode (Kemari protocol),
 .user_print = monitor_user_noop,   
.mhandler.cmd_new = do_migrate,
 },
-- 
1.7.1.2

[Qemu-devel] [PATCH 06/18] virtio: decrement last_avail_idx with inuse before saving.

2011-02-23 Thread Yoshiaki Tamura

For regular migration inuse == 0 always as requests are flushed before
save. However, event-tap log when enabled introduces an extra queue
for requests which is not being flushed, thus the last inuse requests
are left in the event-tap queue.  Move the last_avail_idx value sent
to the remote back to make it repeat the last inuse requests.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/virtio.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 31bd9e3..f05d1b6 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -673,12 +673,20 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 qemu_put_be32(f, i);
 
 for (i = 0; i  VIRTIO_PCI_QUEUE_MAX; i++) {
+/* For regular migration inuse == 0 always as
+ * requests are flushed before save. However,
+ * event-tap log when enabled introduces an extra
+ * queue for requests which is not being flushed,
+ * thus the last inuse requests are left in the event-tap queue.
+ * Move the last_avail_idx value sent to the remote back
+ * to make it repeat the last inuse requests. */
+uint16_t last_avail = vdev-vq[i].last_avail_idx - vdev-vq[i].inuse;
 if (vdev-vq[i].vring.num == 0)
 break;
 
 qemu_put_be32(f, vdev-vq[i].vring.num);
 qemu_put_be64(f, vdev-vq[i].pa);
-qemu_put_be16s(f, vdev-vq[i].last_avail_idx);
+qemu_put_be16s(f, last_avail);
 if (vdev-binding-save_queue)
 vdev-binding-save_queue(vdev-binding_opaque, i, f);
 }
-- 
1.7.1.2

[Qemu-devel] [PATCH 10/18] Call init handler of event-tap at main() in vl.c.

2011-02-23 Thread Yoshiaki Tamura

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 4e263c3..614ac9c 100644
--- a/vl.c
+++ b/vl.c
@@ -162,6 +162,7 @@ int main(int argc, char **argv)
 #include qemu-queue.h
 #include cpus.h
 #include arch_init.h
+#include event-tap.h
 
 #include ui/qemu-spice.h
 
@@ -2931,6 +2932,8 @@ int main(int argc, char **argv, char **envp)
 
 blk_mig_init();
 
+event_tap_init();
+
 /* open the virtual block devices */
 if (snapshot)
 qemu_opts_foreach(qemu_find_opts(drive), drive_enable_snapshot, 
NULL, 0);
-- 
1.7.1.2

[Qemu-devel] [PATCH 13/18] net: insert event-tap to qemu_send_packet() and qemu_sendv_packet_async().

2011-02-23 Thread Yoshiaki Tamura

event-tap function is called only when it is on.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 net.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index ec4745d..724b549 100644
--- a/net.c
+++ b/net.c
@@ -36,6 +36,7 @@
 #include qemu-common.h
 #include qemu_socket.h
 #include hw/qdev.h
+#include event-tap.h
 
 static QTAILQ_HEAD(, VLANState) vlans;
 static QTAILQ_HEAD(, VLANClientState) non_vlan_clients;
@@ -559,6 +560,10 @@ ssize_t qemu_send_packet_async(VLANClientState *sender,
 
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size)
 {
+if (event_tap_is_on()) {
+return event_tap_send_packet(vc, buf, size);
+}
+
 qemu_send_packet_async(vc, buf, size, NULL);
 }
 
@@ -657,6 +662,10 @@ ssize_t qemu_sendv_packet_async(VLANClientState *sender,
 {
 NetQueue *queue;
 
+if (event_tap_is_on()) {
+return event_tap_sendv_packet_async(sender, iov, iovcnt, sent_cb);
+}
+
 if (sender-link_down || (!sender-peer  !sender-vlan)) {
 return calc_iov_length(iov, iovcnt);
 }
-- 
1.7.1.2

[Qemu-devel] [PATCH 09/18] Introduce event-tap.

2011-02-23 Thread Yoshiaki Tamura

event-tap controls when to start FT transaction, and provides proxy
functions to called from net/block devices.  While FT transaction, it
queues up net/block requests, and flush them when the transaction gets
completed.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.target |1 +
 event-tap.c |  940 +++
 event-tap.h |   44 +++
 qemu-tool.c |   28 ++
 trace-events|   10 +
 5 files changed, 1023 insertions(+), 0 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h

diff --git a/Makefile.target b/Makefile.target
index 220589e..da57efe 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -199,6 +199,7 @@ obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
 LIBS+=-lz
+obj-y += event-tap.o
 
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
diff --git a/event-tap.c b/event-tap.c
new file mode 100644
index 000..95c147a
--- /dev/null
+++ b/event-tap.c
@@ -0,0 +1,940 @@
+/*
+ * Event Tap functions for QEMU
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include qemu-common.h
+#include qemu-error.h
+#include block.h
+#include block_int.h
+#include ioport.h
+#include osdep.h
+#include sysemu.h
+#include hw/hw.h
+#include net.h
+#include event-tap.h
+#include trace.h
+
+enum EVENT_TAP_STATE {
+EVENT_TAP_OFF,
+EVENT_TAP_ON,
+EVENT_TAP_SUSPEND,
+EVENT_TAP_FLUSH,
+EVENT_TAP_LOAD,
+EVENT_TAP_REPLAY,
+};
+
+static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
+
+typedef struct EventTapIOport {
+uint32_t address;
+uint32_t data;
+int  index;
+} EventTapIOport;
+
+#define MMIO_BUF_SIZE 8
+
+typedef struct EventTapMMIO {
+uint64_t address;
+uint8_t  buf[MMIO_BUF_SIZE];
+int  len;
+} EventTapMMIO;
+
+typedef struct EventTapNetReq {
+char *device_name;
+int iovcnt;
+int vlan_id;
+bool vlan_needed;
+bool async;
+struct iovec *iov;
+NetPacketSent *sent_cb;
+} EventTapNetReq;
+
+#define MAX_BLOCK_REQUEST 32
+
+typedef struct EventTapAIOCB EventTapAIOCB;
+
+typedef struct EventTapBlkReq {
+char *device_name;
+int num_reqs;
+int num_cbs;
+bool is_flush;
+BlockRequest reqs[MAX_BLOCK_REQUEST];
+EventTapAIOCB *acb[MAX_BLOCK_REQUEST];
+} EventTapBlkReq;
+
+#define EVENT_TAP_IOPORT (1  0)
+#define EVENT_TAP_MMIO   (1  1)
+#define EVENT_TAP_NET(1  2)
+#define EVENT_TAP_BLK(1  3)
+
+#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1)
+
+typedef struct EventTapLog {
+int mode;
+union {
+EventTapIOport ioport;
+EventTapMMIO mmio;
+};
+union {
+EventTapNetReq net_req;
+EventTapBlkReq blk_req;
+};
+QTAILQ_ENTRY(EventTapLog) node;
+} EventTapLog;
+
+struct EventTapAIOCB {
+BlockDriverAIOCB common;
+BlockDriverAIOCB *acb;
+bool is_canceled;
+};
+
+static EventTapLog *last_event_tap;
+
+static QTAILQ_HEAD(, EventTapLog) event_list;
+static QTAILQ_HEAD(, EventTapLog) event_pool;
+
+static int (*event_tap_cb)(void);
+static QEMUBH *event_tap_bh;
+static VMChangeStateEntry *vmstate;
+
+static void event_tap_bh_cb(void *p)
+{
+if (event_tap_cb) {
+event_tap_cb();
+}
+
+qemu_bh_delete(event_tap_bh);
+event_tap_bh = NULL;
+}
+
+static void event_tap_schedule_bh(void)
+{
+trace_event_tap_ignore_bh(!!event_tap_bh);
+
+/* if bh is already set, we ignore it for now */
+if (event_tap_bh) {
+return;
+}
+
+event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL);
+qemu_bh_schedule(event_tap_bh);
+
+return;
+}
+
+static void *event_tap_alloc_log(void)
+{
+EventTapLog *log;
+
+if (QTAILQ_EMPTY(event_pool)) {
+log = qemu_mallocz(sizeof(EventTapLog));
+} else {
+log = QTAILQ_FIRST(event_pool);
+QTAILQ_REMOVE(event_pool, log, node);
+}
+
+return log;
+}
+
+static void event_tap_free_net_req(EventTapNetReq *net_req);
+static void event_tap_free_blk_req(EventTapBlkReq *blk_req);
+
+static void event_tap_free_log(EventTapLog *log)
+{
+int mode = log-mode  ~EVENT_TAP_TYPE_MASK;
+
+if (mode == EVENT_TAP_NET) {
+event_tap_free_net_req(log-net_req);
+} else if (mode == EVENT_TAP_BLK) {
+event_tap_free_blk_req(log-blk_req);
+}
+
+log-mode = 0;
+
+/* return the log to event_pool */
+QTAILQ_INSERT_HEAD(event_pool, log, node);
+}
+
+static void event_tap_free_pool(void)
+{
+EventTapLog *log, *next;
+
+QTAILQ_FOREACH_SAFE(log, event_pool, node, next) {
+QTAILQ_REMOVE(event_pool, log, node);
+qemu_free(log);
+}
+}
+
+static void event_tap_free_net_req(EventTapNetReq *net_req)
+{
+int i;
+
+if (!net_req-async) {
+for

[Qemu-devel] [PATCH 00/18] Kemari for KVM v0.2.11

2011-02-23 Thread Yoshiaki Tamura

Hi,

This patch series is a revised version of Kemari for KVM, which
applied comments for the previous post.  The current code is based on
qemu.git 9a31334f419c1d773cf4b4bfbbdace96fbf8a4f4.

The changes from v0.2.10 - v0.2.11 are:

- rebased to 0.14
- upon unregistering event-tap, set event_tap_state after event_tap_flush
- modify commit log of 02/18 that it won't make existing migration
  bi-directional.

The changes from v0.2.9 - v0.2.10 are:

- change migrate format to kemari:protocol:host:port (Paolo)

The changes from v0.2.8 - v0.2.9 are:

- abstract common code between qemu_savevm_{state,trans}_* (Paolo)
- change incoming format to kemari:protocol:host:port (Paolo)

The changes from v0.2.7 - v0.2.8 are:

- fixed calling wrong cb in event-tap
- add missing qemu_aio_release in event-tap

The changes from v0.2.6 - v0.2.7 are:

- add AIOCB, AIOPool and cancel functions (Kevin)
- insert event-tap for bdrv_flush (Kevin)
- add error handing when calling bdrv functions (Kevin)
- fix usage of qemu_aio_flush and bdrv_flush (Kevin)
- use bs in AIOCB on the primary (Kevin)
- reorder event-tap functions to gather with block/net (Kevin)
- fix checking bs-device_name (Kevin)

The changes from v0.2.5 - v0.2.6 are:

- use qemu_{put,get}_be32() to save/load niov in event-tap

The changes from v0.2.4 - v0.2.5 are:

- fixed braces and trailing spaces by using Blue's checkpatch.pl (Blue)
- event-tap: don't try to send blk_req if it's a bdrv_aio_flush event

The changes from v0.2.3 - v0.2.4 are:

- call vm_start() before event_tap_flush_one() to avoid failure in
  virtio-net assertion
- add vm_change_state_handler to turn off ft_mode
- use qemu_iovec functions in event-tap
- remove duplicated code in migration
- remove unnecessary new line for error_report in ft_trans_file

The changes from v0.2.2 - v0.2.3 are:

- queue async net requests without copying (MST)
-- if not async, contents of the packets are sent to the secondary
- better description for option -k (MST)
- fix memory transfer failure
- fix ft transaction initiation failure

The changes from v0.2.1 - v0.2.2 are:

- decrement last_avaid_idx with inuse before saving (MST)
- remove qemu_aio_flush() and bdrv_flush_all() in migrate_ft_trans_commit()

The changes from v0.2 - v0.2.1 are:

- Move event-tap to net/block layer and use stubs (Blue, Paul, MST, Kevin)
- Tap bdrv_aio_flush (Marcelo)
- Remove multiwrite interface in event-tap (Stefan)
- Fix event-tap to use pio/mmio to replay both net/block (Stefan)
- Improve error handling in event-tap (Stefan)
- Fix leak in event-tap (Stefan)
- Revise virtio last_avail_idx manipulation (MST)
- Clean up migration.c hook (Marcelo)
- Make deleting change state handler robust (Isaku, Anthony)

The changes from v0.1.1 - v0.2 are:

- Introduce a queue in event-tap to make VM sync live.
- Change transaction receiver to a state machine for async receiving.
- Replace net/block layer functions with event-tap proxy functions.
- Remove dirty bitmap optimization for now.
- convert DPRINTF() in ft_trans_file to trace functions.
- convert fprintf() in ft_trans_file to error_report().
- improved error handling in ft_trans_file.
- add a tmp pointer to qemu_del_vm_change_state_handler.

The changes from v0.1 - v0.1.1 are:

- events are tapped in net/block layer instead of device emulation layer.
- Introduce a new option for -incoming to accept FT transaction.

- Removed writev() support to QEMUFile and FdMigrationState for now.
  I would post this work in a different series.

- Modified virtio-blk save/load handler to send inuse variable to
  correctly replay.

- Removed configure --enable-ft-mode.
- Removed unnecessary check for qemu_realloc().

The first 6 patches modify several functions of qemu to prepare
introducing Kemari specific components.

The next 6 patches are the components of Kemari.  They introduce
event-tap and the FT transaction protocol file based on buffered file.
The design document of FT transaction protocol can be found at,
http://wiki.qemu.org/images/b/b1/Kemari_sender_receiver_0.5a.pdf

Then the following 2 patches modifies net/block layer functions with
event-tap functions.  Please note that if Kemari is off, event-tap
will just passthrough, and there is most no intrusion to exisiting
functions including normal live migration.

Finally, the migration layer are modified to support Kemari in the
last 4 patches.  Again, there shouldn't be any affection if a user
doesn't specify Kemari specific options.  The transaction is now async
on both sender and receiver side.  The sender side respects the
max_downtime to decide when to switch from async to sync mode.

The repository contains all patches I'm sending with this message.
For those who want to try, please pull the following repository.  It
also includes dirty bitmap optimization which aren't ready for posting
yet.  To remove the dirty bitmap optimization, please look at HEAD~4
of the tree.

git://kemari.git.sourceforge.net/gitroot/kemari/kemari next

Thanks,

Yoshi

[Qemu-devel] [PATCH 12/18] Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c.

2011-02-23 Thread Yoshiaki Tamura

Record mmio write event to replay it upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 exec.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index d611100..e192eec 100644
--- a/exec.c
+++ b/exec.c
@@ -33,6 +33,7 @@
 #include osdep.h
 #include kvm.h
 #include qemu-timer.h
+#include event-tap.h
 #if defined(CONFIG_USER_ONLY)
 #include qemu.h
 #include signal.h
@@ -3662,6 +3663,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, 
uint8_t *buf,
 io_index = (pd  IO_MEM_SHIFT)  (IO_MEM_NB_ENTRIES - 1);
 if (p)
 addr1 = (addr  ~TARGET_PAGE_MASK) + p-region_offset;
+
+event_tap_mmio(addr, buf, len);
+
 /* XXX: could force cpu_single_env to NULL to avoid
potential bugs */
 if (l = 4  ((addr1  3) == 0)) {
-- 
1.7.1.2

[Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.

2011-02-23 Thread Yoshiaki Tamura

Currently FdMigrationState doesn't support read(), and this patch
introduces it to get response from the other side.  Note that this
won't change the existing migration protocol to be bi-directional.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-tcp.c |   15 +++
 migration.c |   13 +
 migration.h |3 +++
 3 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index b55f419..55777c8 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -39,6 +39,20 @@ static int socket_write(FdMigrationState *s, const void * 
buf, size_t size)
 return send(s-fd, buf, size, 0);
 }
 
+static int socket_read(FdMigrationState *s, const void * buf, size_t size)
+{
+ssize_t len;
+
+do {
+len = recv(s-fd, (void *)buf, size, 0);
+} while (len == -1  socket_error() == EINTR);
+if (len == -1) {
+len = -socket_error();
+}
+
+return len;
+}
+
 static int tcp_close(FdMigrationState *s)
 {
 DPRINTF(tcp_close\n);
@@ -94,6 +108,7 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 
 s-get_error = socket_errno;
 s-write = socket_write;
+s-read = socket_read;
 s-close = tcp_close;
 s-mig_state.cancel = migrate_fd_cancel;
 s-mig_state.get_status = migrate_fd_get_status;
diff --git a/migration.c b/migration.c
index af3a1f2..302b8fe 100644
--- a/migration.c
+++ b/migration.c
@@ -340,6 +340,19 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void 
*data, size_t size)
 return ret;
 }
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t 
size)
+{
+FdMigrationState *s = opaque;
+int ret;
+
+ret = s-read(s, data, size);
+if (ret == -1) {
+ret = -(s-get_error(s));
+}
+
+return ret;
+}
+
 void migrate_fd_connect(FdMigrationState *s)
 {
 int ret;
diff --git a/migration.h b/migration.h
index 2170792..88a6987 100644
--- a/migration.h
+++ b/migration.h
@@ -48,6 +48,7 @@ struct FdMigrationState
 int (*get_error)(struct FdMigrationState*);
 int (*close)(struct FdMigrationState*);
 int (*write)(struct FdMigrationState*, const void *, size_t);
+int (*read)(struct FdMigrationState *, const void *, size_t);
 void *opaque;
 };
 
@@ -116,6 +117,8 @@ void migrate_fd_put_notify(void *opaque);
 
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size);
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t 
size);
+
 void migrate_fd_connect(FdMigrationState *s);
 
 void migrate_fd_put_ready(void *opaque);
-- 
1.7.1.2

[Qemu-devel] [PATCH 11/18] ioport: insert event_tap_ioport() to ioport_write().

2011-02-23 Thread Yoshiaki Tamura

Record ioport event to replay it upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 ioport.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ioport.c b/ioport.c
index aa4188a..74aebf5 100644
--- a/ioport.c
+++ b/ioport.c
@@ -27,6 +27,7 @@
 
 #include ioport.h
 #include trace.h
+#include event-tap.h
 
 /***/
 /* IO Port */
@@ -76,6 +77,7 @@ static void ioport_write(int index, uint32_t address, 
uint32_t data)
 default_ioport_writel
 };
 IOPortWriteFunc *func = ioport_write_table[index][address];
+event_tap_ioport(index, address, data);
 if (!func)
 func = default_func[index];
 func(ioport_opaque[address], address, data);
-- 
1.7.1.2

[Qemu-devel] [PATCH 15/18] savevm: introduce qemu_savevm_trans_{begin, commit}.

2011-02-23 Thread Yoshiaki Tamura

Introduce qemu_savevm_trans_{begin,commit} to send the memory and
device info together, while avoiding cancelling memory state tracking.
This patch also abstracts common code between
qemu_savevm_state_{begin,iterate,commit}.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 savevm.c |  157 +++---
 sysemu.h |2 +
 2 files changed, 101 insertions(+), 58 deletions(-)

diff --git a/savevm.c b/savevm.c
index 78c1972..c96a393 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1601,29 +1601,68 @@ bool qemu_savevm_state_blocked(Monitor *mon)
 return false;
 }
 
-int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
-int shared)
+/*
+ * section: header to write
+ * inc: if true, forces to pass SECTION_PART instead of SECTION_START
+ * pause: if true, breaks the loop when live handler returned 0
+ */
+static int qemu_savevm_state_live(Monitor *mon, QEMUFile *f, int section,
+  bool inc, bool pause)
 {
 SaveStateEntry *se;
+int skip = 0, ret;
 
 QTAILQ_FOREACH(se, savevm_handlers, entry) {
-if(se-set_params == NULL) {
+int len, stage;
+
+if (se-save_live_state == NULL) {
 continue;
-   }
-   se-set_params(blk_enable, shared, se-opaque);
+}
+
+/* Section type */
+qemu_put_byte(f, section);
+qemu_put_be32(f, se-section_id);
+
+if (section == QEMU_VM_SECTION_START) {
+/* ID string */
+len = strlen(se-idstr);
+qemu_put_byte(f, len);
+qemu_put_buffer(f, (uint8_t *)se-idstr, len);
+
+qemu_put_be32(f, se-instance_id);
+qemu_put_be32(f, se-version_id);
+
+stage = inc ? QEMU_VM_SECTION_PART : QEMU_VM_SECTION_START;
+} else {
+assert(inc);
+stage = section;
+}
+
+ret = se-save_live_state(mon, f, stage, se-opaque);
+if (!ret) {
+skip++;
+if (pause) {
+break;
+}
+}
 }
-
-qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+
+return skip;
+}
+
+static void qemu_savevm_state_full(QEMUFile *f)
+{
+SaveStateEntry *se;
 
 QTAILQ_FOREACH(se, savevm_handlers, entry) {
 int len;
 
-if (se-save_live_state == NULL)
+if (se-save_state == NULL  se-vmsd == NULL) {
 continue;
+}
 
 /* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_START);
+qemu_put_byte(f, QEMU_VM_SECTION_FULL);
 qemu_put_be32(f, se-section_id);
 
 /* ID string */
@@ -1634,9 +1673,29 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, 
int blk_enable,
 qemu_put_be32(f, se-instance_id);
 qemu_put_be32(f, se-version_id);
 
-se-save_live_state(mon, f, QEMU_VM_SECTION_START, se-opaque);
+vmstate_save(f, se);
+}
+
+qemu_put_byte(f, QEMU_VM_EOF);
+}
+
+int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
+int shared)
+{
+SaveStateEntry *se;
+
+QTAILQ_FOREACH(se, savevm_handlers, entry) {
+if (se-set_params == NULL) {
+continue;
+}
+se-set_params(blk_enable, shared, se-opaque);
 }
 
+qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
+qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+
+qemu_savevm_state_live(mon, f, QEMU_VM_SECTION_START, 0, 0);
+
 if (qemu_file_has_error(f)) {
 qemu_savevm_state_cancel(mon, f);
 return -EIO;
@@ -1647,29 +1706,16 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, 
int blk_enable,
 
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
 {
-SaveStateEntry *se;
 int ret = 1;
 
-QTAILQ_FOREACH(se, savevm_handlers, entry) {
-if (se-save_live_state == NULL)
-continue;
-
-/* Section type */
-qemu_put_byte(f, QEMU_VM_SECTION_PART);
-qemu_put_be32(f, se-section_id);
-
-ret = se-save_live_state(mon, f, QEMU_VM_SECTION_PART, se-opaque);
-if (!ret) {
-/* Do not proceed to the next vmstate before this one reported
-   completion of the current stage. This serializes the migration
-   and reduces the probability that a faster changing state is
-   synchronized over and over again. */
-break;
-}
-}
-
-if (ret)
+/* Do not proceed to the next vmstate before this one reported
+   completion of the current stage. This serializes the migration
+   and reduces the probability that a faster changing state is
+   synchronized over and over again. */
+ret = qemu_savevm_state_live(mon, f, QEMU_VM_SECTION_PART, 1, 1);
+if (!ret) {
 return 1;
+}
 
 if (qemu_file_has_error(f)) {
 qemu_savevm_state_cancel(mon, f);
@@ -1681,46 +1727,41 @@ int

[Qemu-devel] [PATCH 05/18] vl.c: add deleted flag for deleting the handler.

2011-02-23 Thread Yoshiaki Tamura

Make deleting handlers robust against deletion of any elements in a
handler by using a deleted flag like in file descriptors.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |   13 +
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/vl.c b/vl.c
index b436952..4e263c3 100644
--- a/vl.c
+++ b/vl.c
@@ -1158,6 +1158,7 @@ static void nographic_update(void *opaque)
 struct vm_change_state_entry {
 VMChangeStateHandler *cb;
 void *opaque;
+int deleted;
 QLIST_ENTRY (vm_change_state_entry) entries;
 };
 
@@ -1178,8 +1179,7 @@ VMChangeStateEntry 
*qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
 
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e)
 {
-QLIST_REMOVE (e, entries);
-qemu_free (e);
+e-deleted = 1;
 }
 
 void vm_state_notify(int running, int reason)
@@ -1188,8 +1188,13 @@ void vm_state_notify(int running, int reason)
 
 trace_vm_state_notify(running, reason);
 
-for (e = vm_change_state_head.lh_first; e; e = e-entries.le_next) {
-e-cb(e-opaque, running, reason);
+QLIST_FOREACH(e, vm_change_state_head, entries) {
+if (e-deleted) {
+QLIST_REMOVE(e, entries);
+qemu_free(e);
+} else {
+e-cb(e-opaque, running, reason);
+}
 }
 }
 
-- 
1.7.1.2

[Qemu-devel] [PATCH 17/18] migration-tcp: modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled.

2011-02-23 Thread Yoshiaki Tamura

When ft_mode is set in the header, tcp_accept_incoming_migration()
sets ft_trans_incoming() as a callback, and call
qemu_file_get_notify() to receive FT transaction iteratively.  We also
need a hack no to close fd before moving to ft_transaction mode, so
that we can reuse the fd for it.  vm_change_state_handler is added to
turn off ft_mode when cont is pressed.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-tcp.c |   67 ++-
 1 files changed, 66 insertions(+), 1 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 55777c8..84076d6 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -18,6 +18,8 @@
 #include sysemu.h
 #include buffered_file.h
 #include block.h
+#include ft_trans_file.h
+#include event-tap.h
 
 //#define DEBUG_MIGRATION_TCP
 
@@ -29,6 +31,8 @@
 do { } while (0)
 #endif
 
+static VMChangeStateEntry *vmstate;
+
 static int socket_errno(FdMigrationState *s)
 {
 return socket_error();
@@ -56,7 +60,8 @@ static int socket_read(FdMigrationState *s, const void * buf, 
size_t size)
 static int tcp_close(FdMigrationState *s)
 {
 DPRINTF(tcp_close\n);
-if (s-fd != -1) {
+/* FIX ME: accessing ft_mode here isn't clean */
+if (s-fd != -1  ft_mode != FT_INIT) {
 close(s-fd);
 s-fd = -1;
 }
@@ -150,6 +155,36 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 return s-mig_state;
 }
 
+static void ft_trans_incoming(void *opaque)
+{
+QEMUFile *f = opaque;
+
+qemu_file_get_notify(f);
+if (qemu_file_has_error(f)) {
+ft_mode = FT_ERROR;
+qemu_fclose(f);
+}
+}
+
+static void ft_trans_reset(void *opaque, int running, int reason)
+{
+QEMUFile *f = opaque;
+
+if (running) {
+if (ft_mode != FT_ERROR) {
+qemu_fclose(f);
+}
+ft_mode = FT_OFF;
+qemu_del_vm_change_state_handler(vmstate);
+}
+}
+
+static void ft_trans_schedule_replay(QEMUFile *f)
+{
+event_tap_schedule_replay();
+vmstate = qemu_add_vm_change_state_handler(ft_trans_reset, f);
+}
+
 static void tcp_accept_incoming_migration(void *opaque)
 {
 struct sockaddr_in addr;
@@ -175,8 +210,38 @@ static void tcp_accept_incoming_migration(void *opaque)
 goto out;
 }
 
+if (ft_mode == FT_INIT) {
+autostart = 0;
+}
+
 process_incoming_migration(f);
+
+if (ft_mode == FT_INIT) {
+int ret;
+
+socket_set_nodelay(c);
+
+f = qemu_fopen_ft_trans(s, c);
+if (f == NULL) {
+fprintf(stderr, could not qemu_fopen_ft_trans\n);
+goto out;
+}
+
+/* need to wait sender to setup */
+ret = qemu_ft_trans_begin(f);
+if (ret  0) {
+goto out;
+}
+
+qemu_set_fd_handler2(c, NULL, ft_trans_incoming, NULL, f);
+ft_trans_schedule_replay(f);
+ft_mode = FT_TRANSACTION_RECV;
+
+return;
+}
+
 qemu_fclose(f);
+
 out:
 close(c);
 out2:
-- 
1.7.1.2

[Qemu-devel] [PATCH 14/18] block: insert event-tap to bdrv_aio_writev(), bdrv_aio_flush() and bdrv_flush().

2011-02-23 Thread Yoshiaki Tamura

event-tap function is called only when it is on, and requests were
sent from device emulators.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Acked-by: Kevin Wolf kw...@redhat.com
---
 block.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index f7d91a2..b19729a 100644
--- a/block.c
+++ b/block.c
@@ -28,6 +28,7 @@
 #include block_int.h
 #include module.h
 #include qemu-objects.h
+#include event-tap.h
 
 #ifdef CONFIG_BSD
 #include sys/types.h
@@ -1585,6 +1586,10 @@ int bdrv_flush(BlockDriverState *bs)
 }
 
 if (bs-drv  bs-drv-bdrv_flush) {
+if (*bs-device_name  event_tap_is_on()) {
+event_tap_bdrv_flush();
+}
+
 return bs-drv-bdrv_flush(bs);
 }
 
@@ -2220,6 +2225,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, 
int64_t sector_num,
 if (bdrv_check_request(bs, sector_num, nb_sectors))
 return NULL;
 
+if (*bs-device_name  event_tap_is_on()) {
+return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
+ cb, opaque);
+}
+
 if (bs-dirty_bitmap) {
 blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
  opaque);
@@ -2483,6 +2493,11 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
 
 if (!drv)
 return NULL;
+
+if (*bs-device_name  event_tap_is_on()) {
+return event_tap_bdrv_aio_flush(bs, cb, opaque);
+}
+
 return drv-bdrv_aio_flush(bs, cb, opaque);
 }
 
-- 
1.7.1.2

[Qemu-devel] [PATCH 08/18] savevm: introduce util functions to control ft_trans_file from savevm layer.

2011-02-23 Thread Yoshiaki Tamura

To utilize ft_trans_file function, savevm needs interfaces to be
exported.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |5 ++
 savevm.c |  149 ++
 2 files changed, 154 insertions(+), 0 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index a168a37..a9eff5a 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -51,6 +51,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc 
*put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_ft_trans(int s_fd, int c_fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
@@ -60,6 +61,9 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int 
size);
 void qemu_put_byte(QEMUFile *f, int v);
 void *qemu_realloc_buffer(QEMUFile *f, int size);
 void qemu_clear_buffer(QEMUFile *f);
+int qemu_ft_trans_begin(QEMUFile *f);
+int qemu_ft_trans_commit(QEMUFile *f);
+int qemu_ft_trans_cancel(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
@@ -94,6 +98,7 @@ void qemu_file_set_error(QEMUFile *f);
  * halted due to rate limiting or EAGAIN errors occur as it can be used to
  * resume output. */
 void qemu_file_put_notify(QEMUFile *f);
+void qemu_file_get_notify(void *opaque);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
diff --git a/savevm.c b/savevm.c
index 52d5be8..78c1972 100644
--- a/savevm.c
+++ b/savevm.c
@@ -82,6 +82,7 @@
 #include migration.h
 #include qemu_socket.h
 #include qemu-queue.h
+#include ft_trans_file.h
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -189,6 +190,13 @@ typedef struct QEMUFileSocket
 QEMUFile *file;
 } QEMUFileSocket;
 
+typedef struct QEMUFileSocketTrans
+{
+int fd;
+QEMUFileSocket *s;
+VMChangeStateEntry *e;
+} QEMUFileSocketTrans;
+
 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
 QEMUFileSocket *s = opaque;
@@ -204,6 +212,22 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, 
int64_t pos, int size)
 return len;
 }
 
+static ssize_t socket_put_buffer(void *opaque, const void *buf, size_t size)
+{
+QEMUFileSocket *s = opaque;
+ssize_t len;
+
+do {
+len = send(s-fd, (void *)buf, size, 0);
+} while (len == -1  socket_error() == EINTR);
+
+if (len == -1) {
+len = -socket_error();
+}
+
+return len;
+}
+
 static int socket_close(void *opaque)
 {
 QEMUFileSocket *s = opaque;
@@ -211,6 +235,70 @@ static int socket_close(void *opaque)
 return 0;
 }
 
+static int socket_trans_get_buffer(void *opaque, uint8_t *buf, int64_t pos, 
size_t size)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t-s;
+ssize_t len;
+
+len = socket_get_buffer(s, buf, pos, size);
+
+return len;
+}
+
+static ssize_t socket_trans_put_buffer(void *opaque, const void *buf, size_t 
size)
+{
+QEMUFileSocketTrans *t = opaque;
+
+return socket_put_buffer(t-s, buf, size);
+}
+
+
+static int socket_trans_get_ready(void *opaque)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t-s;
+QEMUFile *f = s-file;
+int ret = 0;
+
+ret = qemu_loadvm_state(f, 1);
+if (ret  0) {
+fprintf(stderr,
+socket_trans_get_ready: error while loading vmstate\n);
+}
+
+return ret;
+}
+
+static int socket_trans_close(void *opaque)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t-s;
+
+qemu_set_fd_handler2(s-fd, NULL, NULL, NULL, NULL);
+qemu_set_fd_handler2(t-fd, NULL, NULL, NULL, NULL);
+qemu_del_vm_change_state_handler(t-e);
+close(s-fd);
+close(t-fd);
+qemu_free(s);
+qemu_free(t);
+
+return 0;
+}
+
+static void socket_trans_resume(void *opaque, int running, int reason)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t-s;
+
+if (!running) {
+return;
+}
+
+qemu_announce_self();
+qemu_fclose(s-file);
+}
+
 static int stdio_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int 
size)
 {
 QEMUFileStdio *s = opaque;
@@ -333,6 +421,26 @@ QEMUFile *qemu_fopen_socket(int fd)
 return s-file;
 }
 
+QEMUFile *qemu_fopen_ft_trans(int s_fd, int c_fd)
+{
+QEMUFileSocketTrans *t = qemu_mallocz(sizeof(QEMUFileSocketTrans));
+QEMUFileSocket *s = qemu_mallocz(sizeof(QEMUFileSocket));
+
+t-s = s;
+t-fd = s_fd;
+t-e = qemu_add_vm_change_state_handler(socket_trans_resume, t);
+
+s-fd = c_fd;
+s-file = qemu_fopen_ops_ft_trans(t, socket_trans_put_buffer,
+  socket_trans_get_buffer, NULL,
+  socket_trans_get_ready,
+  migrate_fd_wait_for_unfreeze,
+  socket_trans_close, 0);

[Qemu-devel] [PATCH] docs: Update stderr and simple backend, add systemtap backend

2011-02-23 Thread Stefan Hajnoczi

The following additions to the tracing documentation are included:

1. Move stderr backend documentation to top-level and out of simple
   backend.  Include hints on when this backend is useful.

2. Document the simple backend thread-safety limitation.

3. Document the dtrace backend for SystemTap.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 docs/tracing.txt |   30 ++
 1 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/docs/tracing.txt b/docs/tracing.txt
index 21183f9..a6cc56f 100644
--- a/docs/tracing.txt
+++ b/docs/tracing.txt
@@ -126,6 +126,14 @@ The nop backend generates empty trace event functions so 
that the compiler
 can optimize out trace events completely.  This is the default and imposes no
 performance penalty.
 
+=== Stderr ===
+
+The stderr backend sends trace events directly to standard error.  This
+effectively turns trace events into debug printfs.
+
+This is the simplest backend and can be used together with existing code that
+uses DPRINTF().
+
 === Simpletrace ===
 
 The simple backend supports common use cases and comes as part of the QEMU
@@ -133,10 +141,10 @@ source tree.  It may not be as powerful as 
platform-specific or third-party
 trace backends but it is portable.  This is the recommended trace backend
 unless you have specific needs for more advanced backends.
 
-=== Stderr ===
-
-The stderr backend sends trace events directly to standard error output
-during emulation.
+Warning: the simple backend is not thread-safe so only enable trace events
+that are executed while the global mutex is held.  Much of QEMU meets this
+requirement but some utility functions like qemu_malloc() or thread-related
+code cannot be safely traced using the simple backend.
 
  Monitor commands 
 
@@ -187,3 +195,17 @@ consistent.
 The ust backend uses the LTTng Userspace Tracer library.  There are no
 monitor commands built into QEMU, instead UST utilities should be used to list,
 enable/disable, and dump traces.
+
+=== SystemTap ===
+
+The dtrace backend uses DTrace sdt probes but has only been tested with
+SystemTap.  When SystemTap support is detected a .stp file with wrapper probes
+is generated to make use in scripts more convenient.  This step can also be
+performed manually after a build in order to change the binary name in the .stp
+probes:
+
+scripts/tracetool --dtrace --stap \
+  --binary path/to/qemu-binary \
+  --target-type system \
+  --target-arch x86_64 \
+  trace-events qemu.stp
-- 
1.7.2.3

[Qemu-devel] [PATCH 07/18] Introduce fault tolerant VM transaction QEMUFile and ft_mode.

2011-02-23 Thread Yoshiaki Tamura

This code implements VM transaction protocol.  Like buffered_file, it
sits between savevm and migration layer.  With this architecture, VM
transaction protocol is implemented mostly independent from other
existing code.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.objs   |1 +
 ft_trans_file.c |  624 +++
 ft_trans_file.h |   72 +++
 migration.c |3 +
 trace-events|   15 ++
 5 files changed, 715 insertions(+), 0 deletions(-)
 create mode 100644 ft_trans_file.c
 create mode 100644 ft_trans_file.h

diff --git a/Makefile.objs b/Makefile.objs
index c144df1..8856160 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -100,6 +100,7 @@ common-obj-y += msmouse.o ps2.o
 common-obj-y += qdev.o qdev-properties.o
 common-obj-y += block-migration.o
 common-obj-y += pflib.o
+common-obj-y += ft_trans_file.o
 
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
diff --git a/ft_trans_file.c b/ft_trans_file.c
new file mode 100644
index 000..2b42b95
--- /dev/null
+++ b/ft_trans_file.c
@@ -0,0 +1,624 @@
+/*
+ * Fault tolerant VM transaction QEMUFile
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * This source code is based on buffered_file.c.
+ * Copyright IBM, Corp. 2008
+ * Authors:
+ *  Anthony Liguorialigu...@us.ibm.com
+ */
+
+#include qemu-common.h
+#include qemu-error.h
+#include hw/hw.h
+#include qemu-timer.h
+#include sysemu.h
+#include qemu-char.h
+#include trace.h
+#include ft_trans_file.h
+
+typedef struct FtTransHdr
+{
+uint16_t cmd;
+uint16_t id;
+uint32_t seq;
+uint32_t payload_len;
+} FtTransHdr;
+
+typedef struct QEMUFileFtTrans
+{
+FtTransPutBufferFunc *put_buffer;
+FtTransGetBufferFunc *get_buffer;
+FtTransPutReadyFunc *put_ready;
+FtTransGetReadyFunc *get_ready;
+FtTransWaitForUnfreezeFunc *wait_for_unfreeze;
+FtTransCloseFunc *close;
+void *opaque;
+QEMUFile *file;
+
+enum QEMU_VM_TRANSACTION_STATE state;
+uint32_t seq;
+uint16_t id;
+
+int has_error;
+
+bool freeze_output;
+bool freeze_input;
+bool rate_limit;
+bool is_sender;
+bool is_payload;
+
+uint8_t *buf;
+size_t buf_max_size;
+size_t put_offset;
+size_t get_offset;
+
+FtTransHdr header;
+size_t header_offset;
+} QEMUFileFtTrans;
+
+#define IO_BUF_SIZE 32768
+
+static void ft_trans_append(QEMUFileFtTrans *s,
+const uint8_t *buf, size_t size)
+{
+if (size  (s-buf_max_size - s-put_offset)) {
+trace_ft_trans_realloc(s-buf_max_size, size + 1024);
+s-buf_max_size += size + 1024;
+s-buf = qemu_realloc(s-buf, s-buf_max_size);
+}
+
+trace_ft_trans_append(size);
+memcpy(s-buf + s-put_offset, buf, size);
+s-put_offset += size;
+}
+
+static void ft_trans_flush(QEMUFileFtTrans *s)
+{
+size_t offset = 0;
+
+if (s-has_error) {
+error_report(flush when error %d, bailing, s-has_error);
+return;
+}
+
+while (offset  s-put_offset) {
+ssize_t ret;
+
+ret = s-put_buffer(s-opaque, s-buf + offset, s-put_offset - 
offset);
+if (ret == -EAGAIN) {
+break;
+}
+
+if (ret = 0) {
+error_report(error flushing data, %s, strerror(errno));
+s-has_error = FT_TRANS_ERR_FLUSH;
+break;
+} else {
+offset += ret;
+}
+}
+
+trace_ft_trans_flush(offset, s-put_offset);
+memmove(s-buf, s-buf + offset, s-put_offset - offset);
+s-put_offset -= offset;
+s-freeze_output = !!s-put_offset;
+}
+
+static ssize_t ft_trans_put(void *opaque, void *buf, int size)
+{
+QEMUFileFtTrans *s = opaque;
+size_t offset = 0;
+ssize_t len;
+
+/* flush buffered data before putting next */
+if (s-put_offset) {
+ft_trans_flush(s);
+}
+
+while (!s-freeze_output  offset  size) {
+len = s-put_buffer(s-opaque, (uint8_t *)buf + offset, size - offset);
+
+if (len == -EAGAIN) {
+trace_ft_trans_freeze_output();
+s-freeze_output = 1;
+break;
+}
+
+if (len = 0) {
+error_report(putting data failed, %s, strerror(errno));
+s-has_error = 1;
+offset = -EINVAL;
+break;
+}
+
+offset += len;
+}
+
+if (s-freeze_output) {
+ft_trans_append(s, buf + offset, size - offset);
+offset = size;
+}
+
+return offset;
+}
+
+static int ft_trans_send_header(QEMUFileFtTrans *s,
+enum QEMU_VM_TRANSACTION_STATE state,
+uint32_t payload_len)
+{
+int ret;
+FtTransHdr

Re: [Qemu-devel] [RFC PATCH] block: Fix eject -f for locked devices

2011-02-23 Thread Markus Armbruster

Sorry for my slow reply.

Blue Swirl blauwir...@gmail.com writes:

 On Fri, Feb 18, 2011 at 5:16 PM, Markus Armbruster arm...@redhat.com wrote:
 From 8cd4978c9be6ff2bcc414bb1c1b258b96b9a74c1 Mon Sep 17 00:00:00 2001
 From: Markus Armbruster arm...@redhat.com
 Date: Fri, 18 Feb 2011 15:54:02 +0100

 After forcefully ejecting media locked by the guest, you can't ever
 again insert new media.

 Example:

    (qemu) info block
    hda: type=hd removable=0 file=test.img ro=0 drv=raw encrypted=0
    cd: type=cdrom removable=1 locked=1 file=x.iso ro=0 drv=raw encrypted=0
    (qemu) eject cd
    Device 'cd' is locked
    (qemu) eject -f cd
    (qemu) info block
    hda: type=hd removable=0 file=test.img ro=0 drv=raw encrypted=0
    cd: type=cdrom removable=1 locked=1 [not inserted]
    (qemu) change cd x.iso
    Device 'cd' is locked

 Signed-off-by: Markus Armbruster arm...@redhat.com
 ---
 I'm not entirely sure this is the appropriate fix, and that's why
 there's RFC in the subject.

 Both IDE and SCSI devices expose their drive's BlockDriverState member
 locked to the guest via mode sense.

 What does real hardware do when I force-eject media (typically by
 rummaging in that little hole with a paperclip)?  Does it actively
 notify the OS?  Does mode sense change?

 No idea, but IIRC the drive is still usable after that,

That's what I'd expect.  If the OS recovers from losing the media, the
drive should be fine.

 so locking the
 drive does not look correct.

I'm not sure I get you here.

 A possible alternative fix is to make do_change_block() ignore
 bdrv_is_locked() when inserting media into an empty drive.

 Then the meaning of locked would change, maybe eject_disabled would
 then describe the state better.

I like the current meaning of BlockDriverState member locked just fine,
it nicely mirrors the SCSI / ATAPI mode sense page bit.

I'm worried about unwanted side effects of my change.  For instance,
what about this code in do_snapshot_blkdev():

flags = bs-open_flags;
bdrv_close(bs);
ret = bdrv_open(bs, filename, flags, drv);

I'm afraid my change would make it screw up bs-locked.  Jes?

[Qemu-devel] [PATCH 16/18] migration: introduce migrate_ft_trans_{put, get}_ready(), and modify migrate_fd_put_ready() when ft_mode is on.

2011-02-23 Thread Yoshiaki Tamura

Introduce migrate_ft_trans_put_ready() which kicks the FT transaction
cycle.  When ft_mode is on, migrate_fd_put_ready() would open
ft_trans_file and turn on event_tap.  To end or cancel FT transaction,
ft_mode and event_tap is turned off.  migrate_ft_trans_get_ready() is
called to receive ack from the receiver.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |  261 ++-
 1 files changed, 260 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 3be3554..82f4a4d 100644
--- a/migration.c
+++ b/migration.c
@@ -21,6 +21,7 @@
 #include qemu_socket.h
 #include block-migration.h
 #include qemu-objects.h
+#include event-tap.h
 
 //#define DEBUG_MIGRATION
 
@@ -283,6 +284,14 @@ void migrate_fd_error(FdMigrationState *s)
 migrate_fd_cleanup(s);
 }
 
+static void migrate_ft_trans_error(FdMigrationState *s)
+{
+ft_mode = FT_ERROR;
+qemu_savevm_state_cancel(s-mon, s-file);
+migrate_fd_error(s);
+event_tap_unregister();
+}
+
 int migrate_fd_cleanup(FdMigrationState *s)
 {
 int ret = 0;
@@ -318,6 +327,17 @@ void migrate_fd_put_notify(void *opaque)
 qemu_file_put_notify(s-file);
 }
 
+static void migrate_fd_get_notify(void *opaque)
+{
+FdMigrationState *s = opaque;
+
+qemu_set_fd_handler2(s-fd, NULL, NULL, NULL, NULL);
+qemu_file_get_notify(s-file);
+if (qemu_file_has_error(s-file)) {
+migrate_ft_trans_error(s);
+}
+}
+
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
 {
 FdMigrationState *s = opaque;
@@ -353,6 +373,10 @@ int migrate_fd_get_buffer(void *opaque, uint8_t *data, 
int64_t pos, size_t size)
 ret = -(s-get_error(s));
 }
 
+if (ret == -EAGAIN) {
+qemu_set_fd_handler2(s-fd, NULL, migrate_fd_get_notify, NULL, s);
+}
+
 return ret;
 }
 
@@ -379,6 +403,230 @@ void migrate_fd_connect(FdMigrationState *s)
 migrate_fd_put_ready(s);
 }
 
+static int migrate_ft_trans_commit(void *opaque)
+{
+FdMigrationState *s = opaque;
+int ret = -1;
+
+if (ft_mode != FT_TRANSACTION_COMMIT  ft_mode != FT_TRANSACTION_ATOMIC) {
+fprintf(stderr,
+migrate_ft_trans_commit: invalid ft_mode %d\n, ft_mode);
+goto out;
+}
+
+do {
+if (ft_mode == FT_TRANSACTION_ATOMIC) {
+if (qemu_ft_trans_begin(s-file)  0) {
+fprintf(stderr, qemu_ft_trans_begin failed\n);
+goto out;
+}
+
+ret = qemu_savevm_trans_begin(s-mon, s-file, 0);
+if (ret  0) {
+fprintf(stderr, qemu_savevm_trans_begin failed\n);
+goto out;
+}
+
+ft_mode = FT_TRANSACTION_COMMIT;
+if (ret) {
+/* don't proceed until if fd isn't ready */
+goto out;
+}
+}
+
+/* make the VM state consistent by flushing outstanding events */
+vm_stop(0);
+
+/* send at full speed */
+qemu_file_set_rate_limit(s-file, 0);
+
+ret = qemu_savevm_trans_complete(s-mon, s-file);
+if (ret  0) {
+fprintf(stderr, qemu_savevm_trans_complete failed\n);
+goto out;
+}
+
+ret = qemu_ft_trans_commit(s-file);
+if (ret  0) {
+fprintf(stderr, qemu_ft_trans_commit failed\n);
+goto out;
+}
+
+if (ret) {
+ft_mode = FT_TRANSACTION_RECV;
+ret = 1;
+goto out;
+}
+
+/* flush and check if events are remaining */
+vm_start();
+ret = event_tap_flush_one();
+if (ret  0) {
+fprintf(stderr, event_tap_flush_one failed\n);
+goto out;
+}
+
+ft_mode =  ret ? FT_TRANSACTION_BEGIN : FT_TRANSACTION_ATOMIC;
+} while (ft_mode != FT_TRANSACTION_BEGIN);
+
+vm_start();
+ret = 0;
+
+out:
+return ret;
+}
+
+static int migrate_ft_trans_get_ready(void *opaque)
+{
+FdMigrationState *s = opaque;
+int ret = -1;
+
+if (ft_mode != FT_TRANSACTION_RECV) {
+fprintf(stderr,
+migrate_ft_trans_get_ready: invalid ft_mode %d\n, ft_mode);
+goto error_out;
+}
+
+/* flush and check if events are remaining */
+vm_start();
+ret = event_tap_flush_one();
+if (ret  0) {
+fprintf(stderr, event_tap_flush_one failed\n);
+goto error_out;
+}
+
+if (ret) {
+ft_mode = FT_TRANSACTION_BEGIN;
+} else {
+ft_mode = FT_TRANSACTION_ATOMIC;
+
+ret = migrate_ft_trans_commit(s);
+if (ret  0) {
+goto error_out;
+}
+if (ret) {
+goto out;
+}
+}
+
+vm_start();
+ret = 0;
+goto out;
+
+error_out:
+migrate_ft_trans_error(s);
+
+out:
+return ret;
+}
+
+static int migrate_ft_trans_put_ready(void)
+{
+FdMigrationState *s = migrate_to_fms(current_migration);
+

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Anthony Liguori


On 02/23/2011 07:43 AM, Avi Kivity wrote:

On 02/22/2011 10:56 AM, Kevin Wolf wrote:

*sigh*

It starts to get annoying, but if you really insist, I can repeat it
once more: These features that you don't need (this is the correct
description for what you call misfeatures) _are_ implemented in a way
that they don't impact the normal case. And they are it today.



Plus, encryption and snapshots can be implemented in a way that 
doesn't impact performance more than is reasonable.


We're still missing the existence proof of this, but even assuming it 
existed, what about snapshots?  Are we okay having a feature in a 
prominent format that isn't going to meet user's expectations?


Is there any hope that an image with 1000, 1000, or 1 snapshots is 
going to have even reasonable performance in qcow2?


Regards,

Anthony Liguori

  Compression perhaps not, but if you choose compression, then 
performance is not your top consideration.  That's the case with 
filesystems that support compression as well.

[Qemu-devel] [PATCH 04/18] qemu-char: export socket_set_nodelay().

2011-02-23 Thread Yoshiaki Tamura

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 qemu-char.c   |2 +-
 qemu_socket.h |1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index bd4e944..c4f1940 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2111,7 +2111,7 @@ static void tcp_chr_telnet_init(int fd)
 send(fd, (char *)buf, 3, 0);
 }
 
-static void socket_set_nodelay(int fd)
+void socket_set_nodelay(int fd)
 {
 int val = 1;
 setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)val, sizeof(val));
diff --git a/qemu_socket.h b/qemu_socket.h
index 897a8ae..b7f8465 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -36,6 +36,7 @@ int inet_aton(const char *cp, struct in_addr *ia);
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 void socket_set_nonblock(int fd);
+void socket_set_nodelay(int fd);
 int send_all(int fd, const void *buf, int len1);
 
 /* New, ipv6-ready socket helper functions, see qemu-sockets.c */
-- 
1.7.1.2

[Qemu-devel] Re: virtio-serial semantics for binary data and guest agents

2011-02-23 Thread Michael Roth


On 02/22/2011 10:59 PM, Amit Shah wrote:

On (Tue) 22 Feb 2011 [16:40:55], Michael Roth wrote:

If something in the guest is attempting to read/write from the
virtio-serial device, and nothing is connected to virtio-serial's
host character device (say, a socket)

1. writes will block until something connect()s, at which point the
write will succeed

2. reads will always return 0 until something connect()s, at which
point the reads will block until there's data

This makes it difficult (impossible?) to implement the notion of
connect/disconnect or open/close over virtio-serial without layering
another protocol on top using hackish things like length-encoded
payloads or sentinel values to determine the end of one
RPC/request/response/session and the start of the next.

For instance, if the host side disconnects, then reconnects before
we read(), we may never get the read()=0, and our FD remains valid.
Whereas with a tcp/unix socket our FD is no longer valid, and the
read()=0 is an event we can check for at any point after the other
end does a close/disconnect.


There's SIGIO support, so host connect-disconnect notifications can be
caught via the signal.


I recall looking into this at some pointbut don't we get a SIGIO for 
read/write-ability in general? So you still need some way differentiate, 
say, readability from a disconnect/EOF, and the read()=0 that could 
determine this is still racing with host-side reconnects.




Also, nonblocking reads/writes will return -EPIPE if the host-side
connection is not up.


But we still essentially need to poll() for a host-side disconnected 
state, which is still racy since they may reconnect before we've done a 
read/write that would've generated the -EPIPE. It seems like what we 
really need is for the FD to be invalid from that point forward.


Also, I focused more on the guest-side connect/disconnect detection, but 
as Anthony mentioned I think the host side shares similar limitations as 
well. AFAIK once we connect to the chardev that FD remains valid until 
the connected process closes it, and so races with the guest side on 
detecting connect/disconnect events in a similar manner. For the host 
side it looks like virtio-console has guest_close/guest_open callbacks 
already that we could potentially use...seems like it's just a matter of 
tying them to the chardev... basically having virtio-serial's 
guest_close() result in a close() on the corresponding chardev 
connection's FD.




Amit

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Anthony Liguori


On 02/23/2011 03:13 AM, Kevin Wolf wrote:

Am 22.02.2011 19:18, schrieb Anthony Liguori:
   

On 02/22/2011 10:15 AM, Kevin Wolf wrote:
 

Am 22.02.2011 16:57, schrieb Anthony Liguori:

   

On 02/22/2011 02:56 AM, Kevin Wolf wrote:

 

*sigh*

It starts to get annoying, but if you really insist, I can repeat it
once more: These features that you don't need (this is the correct
description for what you call misfeatures) _are_ implemented in a way
that they don't impact the normal case.

   

Except that they require a refcount table that adds additional metadata
that needs to be updated in the fast path.  I consider that impacting
the normal case.

 

Like it or not, this requirement exists anyway, without any of your
misfeatures.

You chose to use the dirty flag in QED in order to avoid having to flush
metadata too often, which is an approach that any other format, even one
using refcounts, can take as well.

   

It's a minor detail, but flushing and the amount of metadata are
separate points.
 

I agree that they are separate...

   

The dirty flag prevents metadata from being flushed to disk very often
but the use of a refcount table adds additional metadata.

A refcount table is definitely not required even if you claim the
requirement exists for other features.  I assume you mean to implement
trim/discard support but instead of a refcount table, a free list would
work just as well and would leave the metadata update out of the fast
path (allocating writes) and instead only be in the slow path
(trim/discard).
 

...but here you're arguing about writing metadata out in the fast path,
so you're actually not interested in the amount of metadata but in the
overhead of flushing it. Which is a problem that's solved.
   


I'm interested in both.  An extra write is always going to be an extra 
write.  The flush just makes it very painful.



A refcount table is essential for internal snapshots and compression,
it's useful for discard and for running on block devices, it's necessary
for avoiding the dirty flag and fsck on startup.
   


No, as designed today, qcow2 still needs a dirty flag to avoid leaking 
blocks.



These are five use cases that I can enumerate without thinking a lot
about it, there might be more. You propose using three different
mechanisms for allowing normal allocations (use the file size), block
devices (add a size field into the header) and discard (free list), and
the other three features, for which you can't think of a hack, you
declare misfeatures.
   


No, I only label compression and internal snapshots as misfeatures.  
Encryption is a completely reasonable feature.


So even with qcow3, what's the expectation of snapshots?  Are we going 
to scale to images with over 1000 snapshots?  I believe snapshot support 
in qcow2 is not a feature that has been designed with any serious 
thought.  If we truly want to support internal snapshots, let's design 
it correctly.



As a format feature, a refcount table really only makes sense if the
refcount is required to be greater than a single bit.  There are more
optimal data structures that can be used if the refcount of a block is
fixed to 1-bit (like a free list) which is what the fundamental design
difference between qcow2 and qed is.
 

Okay, so even assuming that there's something like misfeatures that we
can kick out (with which I strongly disagree), what's the crucial
advantage of free lists that would make you switch the image format?
   


Performance.  One thing we haven't tested with qcow2 is O_SYNC 
performance in the guest but my suspicion is that an O_SYNC workload is 
going to perform poorly even with cache=none.


Starting with a simple format that we don't have to jump through 
tremendous hoops to get reasonable performance out of has a lot of virtues.


Regards,

Anthony Liguori

Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-23 Thread Anthony Liguori


On 02/23/2011 07:01 AM, Avi Kivity wrote:

On 02/23/2011 01:14 AM, Anthony Liguori wrote:


-drive already ties into the qemuopts infrastructure and we have 
readconfig and writeconfig.  I don't think we're missing any major 
pieces to do this in a more proper fashion.


The problem with qemu config files is that it splits the authoritative 
source of where images are stored into two.  Is it in the management 
tool's database or is it in qemu's config file?


I like to use the phrase stateful config file.  To me, it's just a 
database for QEMU to persist data about the VM.  It's the only way for 
QEMU to make certain transactions atomic in the face of QEMU crashing.


The user visible config file is a totally different concept.  A 
management tool launches QEMU and tells it where to keep it's state 
database.  The management application may prepopulate the state database 
or it may just use an empty file.


QEMU uses the state database to store information that is created 
dynamically.  For instance, devices added through device_add.  A device 
added via -device wouldn't necessary get added to the state database.


Practically speaking, it let's you invoke QEMU with a fixed command 
line, while still using the monitor to make changes that would otherwise 
require the command line being updated.


For the problem at hand, one solution is to make qemu stop after the 
copy, and then management can issue an additional command to rearrange 
the disk and resume the guest.  A drawback here is that if management 
dies, the guest is stopped until it restarts.  We also make management 
latency guest visible, even if it doesn't die at an inconvenient place.


An alternative approach is to have the copy be performed by a new 
layered block format driver:


- create a new image, type = live-copy, containing three pieces of 
information

   - source image
   - destination image
   - copy state (initially nothing is copied)
- tell qemu switch to the new image
- qemu starts copying, updates copy state as needed
- copy finishes, event is emitted; reads and writes still serviced
- management receives event, switches qemu to destination image
- management removes live-copy image

If management dies while this is happening, it can simply query the 
state of the copy.  Similarly, if qemu dies, the copy state is 
persistent (could be 0/1 or real range of blocks).


This is a more elegant solution to the problem than the commit problem 
but it's also a one-off.  I think we have a generic problem here and we 
ought to try to solve it generically (within reason).


Regards,

Anthony Liguori

[Qemu-devel] Re: virtio-serial semantics for binary data and guest agents

2011-02-23 Thread Michael Roth


On 02/23/2011 08:31 AM, Michael Roth wrote:

On 02/22/2011 10:59 PM, Amit Shah wrote:

On (Tue) 22 Feb 2011 [16:40:55], Michael Roth wrote:

If something in the guest is attempting to read/write from the
virtio-serial device, and nothing is connected to virtio-serial's
host character device (say, a socket)

1. writes will block until something connect()s, at which point the
write will succeed

2. reads will always return 0 until something connect()s, at which
point the reads will block until there's data

This makes it difficult (impossible?) to implement the notion of
connect/disconnect or open/close over virtio-serial without layering
another protocol on top using hackish things like length-encoded
payloads or sentinel values to determine the end of one
RPC/request/response/session and the start of the next.

For instance, if the host side disconnects, then reconnects before
we read(), we may never get the read()=0, and our FD remains valid.
Whereas with a tcp/unix socket our FD is no longer valid, and the
read()=0 is an event we can check for at any point after the other
end does a close/disconnect.


There's SIGIO support, so host connect-disconnect notifications can be
caught via the signal.


I recall looking into this at some pointbut don't we get a SIGIO for
read/write-ability in general? So you still need some way differentiate,
say, readability from a disconnect/EOF, and the read()=0 that could
determine this is still racing with host-side reconnects.



Also, nonblocking reads/writes will return -EPIPE if the host-side
connection is not up.


But we still essentially need to poll() for a host-side disconnected


heh, just poll not poll() sorry


state, which is still racy since they may reconnect before we've done a
read/write that would've generated the -EPIPE. It seems like what we
really need is for the FD to be invalid from that point forward.

Also, I focused more on the guest-side connect/disconnect detection, but
as Anthony mentioned I think the host side shares similar limitations as
well. AFAIK once we connect to the chardev that FD remains valid until
the connected process closes it, and so races with the guest side on
detecting connect/disconnect events in a similar manner. For the host
side it looks like virtio-console has guest_close/guest_open callbacks
already that we could potentially use...seems like it's just a matter of
tying them to the chardev... basically having virtio-serial's
guest_close() result in a close() on the corresponding chardev
connection's FD.



Amit

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Kevin Wolf

Am 23.02.2011 15:23, schrieb Anthony Liguori:
 On 02/23/2011 07:43 AM, Avi Kivity wrote:
 On 02/22/2011 10:56 AM, Kevin Wolf wrote:
 *sigh*

 It starts to get annoying, but if you really insist, I can repeat it
 once more: These features that you don't need (this is the correct
 description for what you call misfeatures) _are_ implemented in a way
 that they don't impact the normal case. And they are it today.


 Plus, encryption and snapshots can be implemented in a way that 
 doesn't impact performance more than is reasonable.
 
 We're still missing the existence proof of this, but even assuming it 

Define reasonable. I sent you some numbers not too long for
encryption, and I consider them reasonable (iirc, between 25% and 40%
slower than without encryption).

 existed, what about snapshots?  Are we okay having a feature in a 
 prominent format that isn't going to meet user's expectations?
 
 Is there any hope that an image with 1000, 1000, or 1 snapshots is 
 going to have even reasonable performance in qcow2?

Is there any hope for backing file chains of 1000 files or more? I
haven't tried it out, but in theory I'd expect that internal snapshots
could cope better with it than external ones because internal snapshots
don't have to go through the whole chain all the time.

What are the points where you think that performance of internal
snapshots suffers?

The argument that I would understand is that internal snapshots are
probably not as handy in all scenarios.

Kevin

Re: [Qemu-devel] Re: Network bridging without adding bridge with brctl, possible?

2011-02-23 Thread Arnd Bergmann

On Wednesday 23 February 2011, Gerhard Wiesinger wrote:
 After some further tests and looking at the iproute ip and kernel code I 
 finally gave up because I thing such a setup it is not possible without 
 breaking up/reconfiguring eth0. When I have to reconfigure eth0 I think a 
 better approach is to configure a bridge which I finally did and works 
 well.
 
 I tried to explain/document the macvtap/macvlan concepts and limitations 
 below. Please comment on it whether this is true or false.
 
 macvtap/macvlan driver concepts and limitations:
 1.) macvlan driver adds a MAC address to a lower interface device where 
 the actual macvlanx device is based on
 2.) macvtap driver is based on macvlan driver and macvtap driver adds 
 additional functionality of interface = external program communication 
 with stdin/stdout channel.
 3.) Limitations: macvtap/macvlan based devices can only communicate with 
 childs based on the same lower device (e.g. eth0 in this sample) but not 
 to the lower device itself, only to the outside world of the interface

Correct.

 finally this makes the macvlan/macvtap approach useless because main eth0 
 interface must still be broken in the chain and reconfigured which was 
 against the requirements that eth0 should not be touched and reconfigured!

Yes, that is unfortunate, but it's the same that you'd get with a bridge
device: 
When you have a bridge on top of eth0, you can no longer assign an IP
address to eth0 and let it communicate with the virtual ports on the
bridge. You need to instead set the IP address on the bridge itself.

Macvlan is slightly better because it allows you to have multiple
host devices that can each have their own MAC/IP address, unlike
the bridge, but of course it can not be connected to anything else
besides macvlan or macvtap ports.

Arnd

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Kevin Wolf

Am 23.02.2011 15:21, schrieb Anthony Liguori:
 On 02/23/2011 03:13 AM, Kevin Wolf wrote:
 Am 22.02.2011 19:18, schrieb Anthony Liguori:

 On 02/22/2011 10:15 AM, Kevin Wolf wrote:
  
 Am 22.02.2011 16:57, schrieb Anthony Liguori:


 On 02/22/2011 02:56 AM, Kevin Wolf wrote:

  
 *sigh*

 It starts to get annoying, but if you really insist, I can repeat it
 once more: These features that you don't need (this is the correct
 description for what you call misfeatures) _are_ implemented in a way
 that they don't impact the normal case.


 Except that they require a refcount table that adds additional metadata
 that needs to be updated in the fast path.  I consider that impacting
 the normal case.

  
 Like it or not, this requirement exists anyway, without any of your
 misfeatures.

 You chose to use the dirty flag in QED in order to avoid having to flush
 metadata too often, which is an approach that any other format, even one
 using refcounts, can take as well.


 It's a minor detail, but flushing and the amount of metadata are
 separate points.
  
 I agree that they are separate...


 The dirty flag prevents metadata from being flushed to disk very often
 but the use of a refcount table adds additional metadata.

 A refcount table is definitely not required even if you claim the
 requirement exists for other features.  I assume you mean to implement
 trim/discard support but instead of a refcount table, a free list would
 work just as well and would leave the metadata update out of the fast
 path (allocating writes) and instead only be in the slow path
 (trim/discard).
  
 ...but here you're arguing about writing metadata out in the fast path,
 so you're actually not interested in the amount of metadata but in the
 overhead of flushing it. Which is a problem that's solved.

 
 I'm interested in both.  An extra write is always going to be an extra 
 write.  The flush just makes it very painful.

One extra write of 64k every 2 GB. Hardly relevant.

 A refcount table is essential for internal snapshots and compression,
 it's useful for discard and for running on block devices, it's necessary
 for avoiding the dirty flag and fsck on startup.

 
 No, as designed today, qcow2 still needs a dirty flag to avoid leaking 
 blocks.

I know that this is your opinion and I do respect that, this is one of
the reasons why there is the suggestion to add the dirty flag for you.

On the other hand, it would be about time for you to accept that there
are people who think differently about it and who don't want the same as
you. This is why using the dirty flag should be optional.

 These are five use cases that I can enumerate without thinking a lot
 about it, there might be more. You propose using three different
 mechanisms for allowing normal allocations (use the file size), block
 devices (add a size field into the header) and discard (free list), and
 the other three features, for which you can't think of a hack, you
 declare misfeatures.

 
 No, I only label compression and internal snapshots as misfeatures.  
 Encryption is a completely reasonable feature.

I didn't even mention encryption. It's obvious that it's a reasonable
feature and not a misfeature, because it fits relatively easily in
your QED design. :-)

The three features you don't like because they don't fit are
compression, internal snapshots and not having to fsck (thanks for
proving the latter above)

 So even with qcow3, what's the expectation of snapshots?  Are we going 
 to scale to images with over 1000 snapshots?  I believe snapshot support 
 in qcow2 is not a feature that has been designed with any serious 
 thought.  If we truly want to support internal snapshots, let's design 
 it correctly.

So what would be the key differences between your design and qcow2's? We
can always check if there's room to improve.

 As a format feature, a refcount table really only makes sense if the
 refcount is required to be greater than a single bit.  There are more
 optimal data structures that can be used if the refcount of a block is
 fixed to 1-bit (like a free list) which is what the fundamental design
 difference between qcow2 and qed is.
  
 Okay, so even assuming that there's something like misfeatures that we
 can kick out (with which I strongly disagree), what's the crucial
 advantage of free lists that would make you switch the image format?
 
 Performance.  One thing we haven't tested with qcow2 is O_SYNC 
 performance in the guest but my suspicion is that an O_SYNC workload is 
 going to perform poorly even with cache=none.

But wasn't it you who wants to use the dirty flag in any case? The
refcounts aren't even written then.

 Starting with a simple format that we don't have to jump through 
 tremendous hoops to get reasonable performance out of has a lot of virtues.

I know that you don't mean it like I read this, but it's entirely true:
You're _starting_ with a simple

Re: [Qemu-devel] [PATCH 1/7] usb-ccid: add CCID bus

2011-02-23 Thread Alon Levy

On Tue, Feb 22, 2011 at 10:03:45AM -0600, Anthony Liguori wrote:
 On 02/07/2011 10:34 AM, Alon Levy wrote:
 +static int ccid_post_load(void *opaque, int version_id)
 +{
 +USBCCIDState *s = opaque;
 +
 +// This must be done after usb_device_attach, which sets state to 
 ATTACHED,
 +// while it must be DEFAULT in order to accept packets (like it is after
 +// reset, but reset will reset our addr and call our reset handler which
 +// may change state, and we don't want to do that when migrating).
 +s-dev.state = s-state_vmstate;
 +return 0;
 +}
 +
 +static void ccid_pre_save(void *opaque)
 +{
 +USBCCIDState *s = opaque;
 +
 +s-state_vmstate = s-dev.state;
 +if (s-dev.attached) {
 +// migrating an open device, ignore reconnection CHR_EVENT to avoid 
 an
 +// erronous detach.
 +s-migration_state = MIGRATION_MIGRATED;
 +}
 
 
 Still using C99 comments, should be C89.

Fixed (by checkpatch.pl iteration) in v20.

 
 Regards,
 
 Anthony Liguori

[Qemu-devel] [PATCH 0/3] tcg: Support debugging leakage of temporaries

2011-02-23 Thread Peter Maydell

This patchset removes the ad-hoc debug code in target-arm for
identifying cases where we leaked TCG temporary variables, in
favour of an implementation in tcg itself.

Generally any temporaries created by a target while it is
translating an instruction should be freed by the end of that
instruction; otherwise carefully crafted guest code could cause
TCG to run out of temporaries and assert.

Putting the leak-debugging code into TCG proper (a) makes more
sense as this isn't at all arm-specific (b) makes it more
comprehensive, as it now covers temporaries created in all ways,
not just via the new_tmp()/dead_tmp() wrapper functions
(c) avoids annoying false positives where eg a TCG temp created
with tcg_const_i32() was passed to dead_tmp().

The tracking only happens if qemu was configured with
--enable-debug-tcg. It should be easy to add to other targets if
desired; it's just a matter of calling tcg_clear_temp_count()
and tcg_check_temp_count() in the appropriate places.

Peter Maydell (3):
  tcg: Add support for debugging leakage of temporaries
  target-arm: Remove ad-hoc leak checking code
  target-arm: Use TCG temporary leak debugging facilities

 target-arm/translate.c |  705 +++
 tcg/tcg.c  |   32 +++
 tcg/tcg.h  |   17 ++
 3 files changed, 394 insertions(+), 360 deletions(-)

[Qemu-devel] [PATCH 3/3] target-arm: Use TCG temporary leak debugging facilities

2011-02-23 Thread Peter Maydell

Use the new TCG temporary leak debugging facilities to
check that each ARM instruction does not leak temporaries.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 target-arm/translate.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 31067d5..b96a136 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -9125,6 +9125,8 @@ static inline void 
gen_intermediate_code_internal(CPUState *env,
 
 gen_icount_start();
 
+tcg_clear_temp_count();
+
 /* A note on handling of the condexec (IT) bits:
  *
  * We want to avoid the overhead of having to write the updated condexec
@@ -9234,6 +9236,11 @@ static inline void 
gen_intermediate_code_internal(CPUState *env,
 gen_set_label(dc-condlabel);
 dc-condjmp = 0;
 }
+
+if (tcg_check_temp_count()) {
+fprintf(stderr, TCG temporary leak before %08x\n, dc-pc);
+}
+
 /* Translation stops when a conditional branch is encountered.
  * Otherwise the subsequent code could get translated several times.
  * Also stop translation when a page boundary is reached.  This
-- 
1.7.1

[Qemu-devel] [PATCH 1/3] tcg: Add support for debugging leakage of temporaries

2011-02-23 Thread Peter Maydell

Add support (if CONFIG_DEBUG_TCG is defined) for debugging leakage
of temporary variables. Generally any temporaries created by
a target while it is translating an instruction should be freed
by the end of that instruction; otherwise carefully crafted
guest code could cause TCG to run out of temporaries and assert.
By calling tcg_check_temp_count() after each instruction we can
check that we are not leaking temporaries in this way.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 tcg/tcg.c |   32 
 tcg/tcg.h |   17 +
 2 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5dd6a2c..8748c05 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -450,6 +450,10 @@ static inline int tcg_temp_new_internal(TCGType type, int 
temp_local)
 s-nb_temps++;
 }
 }
+
+#if defined(CONFIG_DEBUG_TCG)
+s-temps_in_use++;
+#endif
 return idx;
 }
 
@@ -475,6 +479,13 @@ static inline void tcg_temp_free_internal(int idx)
 TCGTemp *ts;
 int k;
 
+#if defined(CONFIG_DEBUG_TCG)
+s-temps_in_use--;
+if (s-temps_in_use  0) {
+fprintf(stderr, More temporaries freed than allocated!\n);
+}
+#endif
+
 assert(idx = s-nb_globals  idx  s-nb_temps);
 ts = s-temps[idx];
 assert(ts-temp_allocated != 0);
@@ -528,6 +539,27 @@ TCGv_i64 tcg_const_local_i64(int64_t val)
 return t0;
 }
 
+#if defined(CONFIG_DEBUG_TCG)
+void tcg_clear_temp_count(void)
+{
+TCGContext *s = tcg_ctx;
+s-temps_in_use = 0;
+}
+
+int tcg_check_temp_count(void)
+{
+TCGContext *s = tcg_ctx;
+if (s-temps_in_use) {
+/* Clear the count so that we don't give another
+ * warning immediately next time around.
+ */
+s-temps_in_use = 0;
+return 1;
+}
+return 0;
+}
+#endif
+
 void tcg_register_helper(void *func, const char *name)
 {
 TCGContext *s = tcg_ctx;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index e1afde2..5538db9 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -323,6 +323,10 @@ struct TCGContext {
 int64_t restore_count;
 int64_t restore_time;
 #endif
+
+#ifdef CONFIG_DEBUG_TCG
+int temps_in_use;
+#endif
 };
 
 extern TCGContext tcg_ctx;
@@ -392,6 +396,19 @@ static inline TCGv_i64 tcg_temp_local_new_i64(void)
 void tcg_temp_free_i64(TCGv_i64 arg);
 char *tcg_get_arg_str_i64(TCGContext *s, char *buf, int buf_size, TCGv_i64 
arg);
 
+#if defined(CONFIG_DEBUG_TCG)
+/* If you call tcg_clear_temp_count() at the start of a section of
+ * code which is not supposed to leak any TCG temporaries, then
+ * calling tcg_check_temp_count() at the end of the section will
+ * return 1 if the section did in fact leak a temporary.
+ */
+void tcg_clear_temp_count(void);
+int tcg_check_temp_count(void);
+#else
+#define tcg_clear_temp_count()
+#define tcg_check_temp_count() 0
+#endif
+
 void tcg_dump_info(FILE *f, fprintf_function cpu_fprintf);
 
 #define TCG_CT_ALIAS  0x80
-- 
1.7.1

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Avi Kivity


On 02/23/2011 04:23 PM, Anthony Liguori wrote:

On 02/23/2011 07:43 AM, Avi Kivity wrote:

On 02/22/2011 10:56 AM, Kevin Wolf wrote:

*sigh*

It starts to get annoying, but if you really insist, I can repeat it
once more: These features that you don't need (this is the correct
description for what you call misfeatures) _are_ implemented in a way
that they don't impact the normal case. And they are it today.



Plus, encryption and snapshots can be implemented in a way that 
doesn't impact performance more than is reasonable.


We're still missing the existence proof of this, but even assuming it 
existed,


dm-crypt isn't any more complicated, and it's used by default in most 
distributions these days.


what about snapshots?  Are we okay having a feature in a prominent 
format that isn't going to meet user's expectations?


Is there any hope that an image with 1000, 1000, or 1 snapshots is 
going to have even reasonable performance in qcow2?




Are thousands of snapshots for a single image a reasonable user 
expectation?  What's the use case?


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Anthony Liguori


On 02/23/2011 08:38 AM, Kevin Wolf wrote:

Am 23.02.2011 15:23, schrieb Anthony Liguori:
   

On 02/23/2011 07:43 AM, Avi Kivity wrote:
 

On 02/22/2011 10:56 AM, Kevin Wolf wrote:
   

*sigh*

It starts to get annoying, but if you really insist, I can repeat it
once more: These features that you don't need (this is the correct
description for what you call misfeatures) _are_ implemented in a way
that they don't impact the normal case. And they are it today.

 

Plus, encryption and snapshots can be implemented in a way that
doesn't impact performance more than is reasonable.
   

We're still missing the existence proof of this, but even assuming it
 

Define reasonable. I sent you some numbers not too long for
encryption, and I consider them reasonable (iirc, between 25% and 40%
slower than without encryption).
   


I was really referring to snapshots.  I have absolutely no doubt that 
encryption can be implemented with a reasonable performance overhead.



existed, what about snapshots?  Are we okay having a feature in a
prominent format that isn't going to meet user's expectations?

Is there any hope that an image with 1000, 1000, or 1 snapshots is
going to have even reasonable performance in qcow2?
 

Is there any hope for backing file chains of 1000 files or more? I
haven't tried it out, but in theory I'd expect that internal snapshots
could cope better with it than external ones because internal snapshots
don't have to go through the whole chain all the time.
   


I don't think there's a user expectation of backing file chains of 1000 
files performing well.  However, I've talked to a number of customers 
that have been interested in using internal snapshots for checkpointing 
which would involve a large number of snapshots.


In fact, Fabrice originally added qcow2 because he was interested in 
doing reverse debugging.  The idea of internal snapshots was to store a 
high number of checkpoints to allow reverse debugging to be optimized.


I think the way snapshot metadata is stored makes this not realistic 
since they're stored in more or less a linear array.  I think to really 
support a high number of snapshots, you'd want to store a hash with each 
block that contained a refcount  1.  I think you quickly end up 
reinventing btrfs though in the process.


Regards,

Anthony Liguori


What are the points where you think that performance of internal
snapshots suffers?

The argument that I would understand is that internal snapshots are
probably not as handy in all scenarios.

Kevin

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Daniel P. Berrange

On Wed, Feb 23, 2011 at 05:23:33PM +0200, Avi Kivity wrote:
 On 02/23/2011 04:23 PM, Anthony Liguori wrote:
 On 02/23/2011 07:43 AM, Avi Kivity wrote:
 On 02/22/2011 10:56 AM, Kevin Wolf wrote:
 *sigh*
 
 It starts to get annoying, but if you really insist, I can repeat it
 once more: These features that you don't need (this is the correct
 description for what you call misfeatures) _are_ implemented in a way
 that they don't impact the normal case. And they are it today.
 
 
 Plus, encryption and snapshots can be implemented in a way that
 doesn't impact performance more than is reasonable.
 
 We're still missing the existence proof of this, but even assuming
 it existed,
 
 dm-crypt isn't any more complicated, and it's used by default in
 most distributions these days.

IMHO dm-crypt isn't a generally usable alternative to native built
in encryption in qcow2. It isn't usable at all by non-root. If you
want to use with plain files, then you need to turn the file into
a loopback device and then layer in dm-crypt. It is generally
just a PITA to manage.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-23 Thread Avi Kivity


On 02/23/2011 04:35 PM, Anthony Liguori wrote:

On 02/23/2011 07:01 AM, Avi Kivity wrote:

On 02/23/2011 01:14 AM, Anthony Liguori wrote:


-drive already ties into the qemuopts infrastructure and we have 
readconfig and writeconfig.  I don't think we're missing any major 
pieces to do this in a more proper fashion.


The problem with qemu config files is that it splits the 
authoritative source of where images are stored into two.  Is it in 
the management tool's database or is it in qemu's config file?


I like to use the phrase stateful config file.  To me, it's just a 
database for QEMU to persist data about the VM.  It's the only way for 
QEMU to make certain transactions atomic in the face of QEMU crashing.


The user visible config file is a totally different concept.  A 
management tool launches QEMU and tells it where to keep it's state 
database.  The management application may prepopulate the state 
database or it may just use an empty file.


In that case the word 'config' is misleading.  To me, it implies that 
the user configures something, and qemu reads it, not something mostly 
internal to qemu.


Qemu does keep state.  Currently only images, but in theory also the 
on-board NVRAM.




QEMU uses the state database to store information that is created 
dynamically.  For instance, devices added through device_add.  A 
device added via -device wouldn't necessary get added to the state 
database.


Practically speaking, it let's you invoke QEMU with a fixed command 
line, while still using the monitor to make changes that would 
otherwise require the command line being updated.


Then the invoker quickly loses track of what the actual state is.  It 
can't just remember which commands it issued (presumably in response to 
the user updating user visible state).  It has to parse the stateful 
config file qemu outputs.  But at which points should it parse it?


I don't think it's reasonable to have three different ways to interact 
with qemu, all needed: the command line, reading and writing the 
stateful config file, and the monitor.  I'd rather push for starting 
qemu with a blank guest and assembling (cold-plugging) all the hardware 
via the monitor before starting the guest.


For the problem at hand, one solution is to make qemu stop after the 
copy, and then management can issue an additional command to 
rearrange the disk and resume the guest.  A drawback here is that if 
management dies, the guest is stopped until it restarts.  We also 
make management latency guest visible, even if it doesn't die at an 
inconvenient place.


An alternative approach is to have the copy be performed by a new 
layered block format driver:


- create a new image, type = live-copy, containing three pieces of 
information

   - source image
   - destination image
   - copy state (initially nothing is copied)
- tell qemu switch to the new image
- qemu starts copying, updates copy state as needed
- copy finishes, event is emitted; reads and writes still serviced
- management receives event, switches qemu to destination image
- management removes live-copy image

If management dies while this is happening, it can simply query the 
state of the copy.  Similarly, if qemu dies, the copy state is 
persistent (could be 0/1 or real range of blocks).


This is a more elegant solution to the problem than the commit problem 
but it's also a one-off.  I think we have a generic problem here and 
we ought to try to solve it generically (within reason).


Can you give more examples?

I think I demonstrated that hot-plug can be solved via the existing 
interfaces.


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Anthony Liguori


On 02/23/2011 09:23 AM, Avi Kivity wrote:

On 02/23/2011 04:23 PM, Anthony Liguori wrote:

On 02/23/2011 07:43 AM, Avi Kivity wrote:

On 02/22/2011 10:56 AM, Kevin Wolf wrote:

*sigh*

It starts to get annoying, but if you really insist, I can repeat it
once more: These features that you don't need (this is the correct
description for what you call misfeatures) _are_ implemented in a 
way

that they don't impact the normal case. And they are it today.



Plus, encryption and snapshots can be implemented in a way that 
doesn't impact performance more than is reasonable.


We're still missing the existence proof of this, but even assuming it 
existed,


dm-crypt isn't any more complicated, and it's used by default in most 
distributions these days.


what about snapshots?  Are we okay having a feature in a prominent 
format that isn't going to meet user's expectations?


Is there any hope that an image with 1000, 1000, or 1 snapshots 
is going to have even reasonable performance in qcow2?




Are thousands of snapshots for a single image a reasonable user 
expectation?  What's the use case?


Checkpointing.  It was the original use-case that led to qcow2 being 
invented.


Regards,

Anthony Liguori

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Avi Kivity


On 02/23/2011 05:29 PM, Anthony Liguori wrote:



existed, what about snapshots?  Are we okay having a feature in a
prominent format that isn't going to meet user's expectations?

Is there any hope that an image with 1000, 1000, or 1 snapshots is
going to have even reasonable performance in qcow2?

Is there any hope for backing file chains of 1000 files or more? I
haven't tried it out, but in theory I'd expect that internal snapshots
could cope better with it than external ones because internal snapshots
don't have to go through the whole chain all the time.


I don't think there's a user expectation of backing file chains of 
1000 files performing well.  However, I've talked to a number of 
customers that have been interested in using internal snapshots for 
checkpointing which would involve a large number of snapshots.


In fact, Fabrice originally added qcow2 because he was interested in 
doing reverse debugging.  The idea of internal snapshots was to store 
a high number of checkpoints to allow reverse debugging to be optimized.


I don't see how that works, since the memory image is duplicated for 
each snapshot.  So thousands of snapshots = terabytes of storage, and 
hours of creating the snapshots.


Migrate-to-file with block live migration, or even better, something 
based on Kemari would be a lot faster.




I think the way snapshot metadata is stored makes this not realistic 
since they're stored in more or less a linear array.  I think to 
really support a high number of snapshots, you'd want to store a hash 
with each block that contained a refcount  1.  I think you quickly 
end up reinventing btrfs though in the process.


Can you elaborate?  What's the problem with a linear array of snapshots 
(say up to 10,000 snapshots)?


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Avi Kivity


On 02/23/2011 05:33 PM, Daniel P. Berrange wrote:

On Wed, Feb 23, 2011 at 05:23:33PM +0200, Avi Kivity wrote:
  On 02/23/2011 04:23 PM, Anthony Liguori wrote:
  On 02/23/2011 07:43 AM, Avi Kivity wrote:
  On 02/22/2011 10:56 AM, Kevin Wolf wrote:
  *sigh*
  
  It starts to get annoying, but if you really insist, I can repeat it
  once more: These features that you don't need (this is the correct
  description for what you call misfeatures) _are_ implemented in a way
  that they don't impact the normal case. And they are it today.
  
  
  Plus, encryption and snapshots can be implemented in a way that
  doesn't impact performance more than is reasonable.
  
  We're still missing the existence proof of this, but even assuming
  it existed,

  dm-crypt isn't any more complicated, and it's used by default in
  most distributions these days.

IMHO dm-crypt isn't a generally usable alternative to native built
in encryption in qcow2. It isn't usable at all by non-root. If you
want to use with plain files, then you need to turn the file into
a loopback device and then layer in dm-crypt. It is generally
just a PITA to manage.


I wasn't suggesting dm-crypt is a replacement for qcow2 encyption, just 
that it shows that block-level encryption can be done with reasonable 
overhead.


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Avi Kivity


On 02/23/2011 05:31 PM, Anthony Liguori wrote:


what about snapshots?  Are we okay having a feature in a prominent 
format that isn't going to meet user's expectations?


Is there any hope that an image with 1000, 1000, or 1 snapshots 
is going to have even reasonable performance in qcow2?




Are thousands of snapshots for a single image a reasonable user 
expectation?  What's the use case?



Checkpointing.  It was the original use-case that led to qcow2 being 
invented.


I still don't see.  What would you do with thousands of checkpoints?

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] Re: Strategic decision: COW format

2011-02-23 Thread Anthony Liguori


On 02/23/2011 09:36 AM, Avi Kivity wrote:

On 02/23/2011 05:29 PM, Anthony Liguori wrote:



existed, what about snapshots?  Are we okay having a feature in a
prominent format that isn't going to meet user's expectations?

Is there any hope that an image with 1000, 1000, or 1 snapshots is
going to have even reasonable performance in qcow2?

Is there any hope for backing file chains of 1000 files or more? I
haven't tried it out, but in theory I'd expect that internal snapshots
could cope better with it than external ones because internal snapshots
don't have to go through the whole chain all the time.


I don't think there's a user expectation of backing file chains of 
1000 files performing well.  However, I've talked to a number of 
customers that have been interested in using internal snapshots for 
checkpointing which would involve a large number of snapshots.


In fact, Fabrice originally added qcow2 because he was interested in 
doing reverse debugging.  The idea of internal snapshots was to store 
a high number of checkpoints to allow reverse debugging to be optimized.


I don't see how that works, since the memory image is duplicated for 
each snapshot.  So thousands of snapshots = terabytes of storage, and 
hours of creating the snapshots.


Fabrice wanted to use CoW to as a mechanism to deduplicate the memory 
contents with the on-disk state specifically to address this problem.  
For the longest time, there was a comment in the savevm code along these 
lines.  It might still be there.


I think the lack of on-disk hashes was a critical missing bit to make 
this feature really work well.


Migrate-to-file with block live migration, or even better, something 
based on Kemari would be a lot faster.




I think the way snapshot metadata is stored makes this not realistic 
since they're stored in more or less a linear array.  I think to 
really support a high number of snapshots, you'd want to store a hash 
with each block that contained a refcount  1.  I think you quickly 
end up reinventing btrfs though in the process.


Can you elaborate?  What's the problem with a linear array of 
snapshots (say up to 10,000 snapshots)?


Lots of things.  The array will start to consume quite a bit of 
contiguous space as it gets larger which means it needs to be 
relocated.  Deleting a snapshot is a far more expensive operation than 
it needs to be.


Regards,

Anthony Liguori

1 2 3 >

1 - 100 of 222 matches

Mail list logo