date:20140218

Re: [Qemu-devel] [PULL v2 46/49] qdev: Add enum property types to QAPI schema

2014-02-18 Thread Paolo Bonzini


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Il 17/02/2014 23:30, Eric Blake ha scritto:
| Oops - we lost the ## header before add_client.  Sounds like a
| followup patch might be called for, unless you are okay tweaking
| the pull request.

Yeah, I can do that with qemu-trivial.

Paolo
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTAxRgAAoJEBvWZb6bTYby+0kP/jo/84UmFlN6zJssfed++kkm
tffNBa2Aom2oWiPiUv80J8sstiL7QRI44szHrnj9IVew8q/iSuIjHKQSvG9NvdE+
zEB1XKqcm4dlXOpVZ+Jq4w9g2K7BLmtoEJY75oZEE/u8731EI1IMVqmzGJ1dBYsh
/XDS9drIsHfZvDNcqTtRdY7ZhFP4QZo/UrDcX4vHx8ak6QcGWllfTn8sUGmkV3rS
p643v54WCwE+ZhTtfWEHwhTU7c0tUZzr+UxGXZ6bumJ7CXCdgeZltYkqxUc/qOlb
Fg9HEZiFaDxGDtI2IzJu4G2Y67j7twM8h/USSkx1lUqaQNQ2wC7nLDZ3GsnkY8XY
450DfVKfBAN7bAFf8jVT+h2D/AgfX1qYsKrozgOGppxQRmIhU0QlPFm/1/V9paCX
0Nje6oG+89XMUl3KKstyXzKiKGQSnETu2iyVMpeLcQyDUZWdst0c7yh4yME3giIH
sS8354qcdOs2VYE02z7ceRU7M3vwv6sfuGyrN15oeFI/x42CTCseH2B8Unvn815F
Da/hHNqNy58PTgVbMybz/sxYRBKBl8euTvQuxjLyg+2SvOJUtvK040lRuVk9YwTO
Q0RmnVU5bXkO8A27l34zccnekLAbIJVgJKLVzB9tpGdONNLsUuXXwgtfP2Kw+IVI
I0fyhNi64xs6LCER/QI6
=FzxL
-END PGP SIGNATURE-

Re: [Qemu-devel] [PATCH v20 18/26] sheepdog.c: replace QEMUOptionParameter with QemuOpts

2014-02-18 Thread Chunyan Liu

2014-02-18 3:01 GMT+08:00 Eric Blake ebl...@redhat.com:

 On 02/11/2014 11:33 PM, Chunyan Liu wrote:
  sheepdog.c: replace QEMUOptionParameter with QemuOpts
 
  Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
  Signed-off-by: Chunyan Liu cy...@suse.com
  ---
   block/sheepdog.c |  101
 +-
   1 files changed, 47 insertions(+), 54 deletions(-)
 

  +
  +buf = NULL;
  +buf = qemu_opt_get_del(opts, BLOCK_OPT_REDUNDANCY);

 Dead NULL assignment.

  +if (buf) {
  +ret = parse_redundancy(s, buf);
  +if (ret  0) {
  +goto out;
   }

 Do you need to store into ret here, or is it sufficient to use 'if
 (parse_redundancy(s, buf)  0) {'

 It followed the logic in existing code. If function ret  0, return
with that return value. Same logic in earlier codes of the function.
if (strstr(filename, ://)) {
ret = sd_parse_uri(s, filename, s-name, snapid, tag);
} else {
ret = parse_vdiname(s, filename, s-name, snapid, tag);
}
if (ret  0) {
goto out;
}


  -{
  -.name = BLOCK_OPT_REDUNDANCY,
  -.type = OPT_STRING,
  -.help = Redundancy of the image
  -},
  -{ NULL }
  +static QemuOptsList sd_create_opts = {
  +.name = sheepdog-create-opts,
  +.head = QTAILQ_HEAD_INITIALIZER(sd_create_opts.head),
  +.desc = {
  +{

  +.name = BLOCK_OPT_PREALLOC,
  +.type = QEMU_OPT_STRING,
  +.help = Preallocation mode (allowed values: off, full)
  +},
  +{ /* end of list */ }

 Missing redundancy in the new list.

  @@ -2538,7 +2533,7 @@ static BlockDriver bdrv_sheepdog = {
   .bdrv_save_vmstate  = sd_save_vmstate,
   .bdrv_load_vmstate  = sd_load_vmstate,
 
  -.create_options = sd_create_options,
  +.create_opts   = sd_create_opts,

 Another example of the inconsistent use of .

   };
 
   static BlockDriver bdrv_sheepdog_tcp = {
  @@ -2548,7 +2543,7 @@ static BlockDriver bdrv_sheepdog_tcp = {
   .bdrv_needs_filename = true,
   .bdrv_file_open = sd_open,
   .bdrv_close = sd_close,
  -.bdrv_create= sd_create,
  +.bdrv_create2   = sd_create,
   .bdrv_has_zero_init = bdrv_has_zero_init_1,
   .bdrv_getlength = sd_getlength,
   .bdrv_get_allocated_file_size = sd_get_allocated_file_size,
  @@ -2568,7 +2563,6 @@ static BlockDriver bdrv_sheepdog_tcp = {
   .bdrv_save_vmstate  = sd_save_vmstate,
   .bdrv_load_vmstate  = sd_load_vmstate,
 
  -.create_options = sd_create_options,
   };

 Why no .create_opts replacement?

 --
 Eric Blake   eblake redhat com+1-919-301-3266
 Libvirt virtualization library http://libvirt.org

[Qemu-devel] [RFC PATCH v2 00/12] mc: fault tolerante through micro-checkpointing

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

Changes since v1:

1. Re-based against Juan's improved migration_bitmap performance changes
2. Overhauled RDMA support to prepare for better usage of RDMA in 
   other parts of the QEMU code base (such as storage).
3. Fix for netlink issues that failed to cleanup the network buffer
   device for development testing.

Michael R. Hines (12):
  mc: add documentation for micro-checkpointing
  mc: timestamp migration_bitmap and KVM logdirty usage
  mc: introduce a 'checkpointing' status check into the VCPU states
  mc: support custom page loading and copying
  rdma: accelerated memcpy() support and better external RDMA user
interfaces
  mc: introduce state machine changes for MC
  mc: introduce additional QMP statistics for micro-checkpointing
  mc: core logic
  mc: configure and makefile support
  mc: expose tunable parameter for checkpointing frequency
  mc: introduce new capabilities to control micro-checkpointing
  mc: activate and use MC if requested

 Makefile.objs |1 +
 arch_init.c   |   72 +-
 configure |   45 +
 cpus.c|9 +-
 docs/mc.txt   |  222 
 hmp-commands.hx   |   16 +-
 hmp.c |   23 +
 hmp.h |1 +
 include/migration/migration.h |   70 +-
 include/migration/qemu-file.h |   55 +-
 migration-checkpoint.c| 1565 +
 migration-rdma.c  | 2605 +++--
 migration.c   |  156 ++-
 qapi-schema.json  |   86 +-
 qemu-file.c   |   80 +-
 qmp-commands.hx   |   23 +
 vl.c  |9 +
 17 files changed, 4097 insertions(+), 941 deletions(-)
 create mode 100644 docs/mc.txt
 create mode 100644 migration-checkpoint.c

-- 
1.8.1.2

[Qemu-devel] [RFC PATCH v2 02/12] mc: timestamp migration_bitmap and KVM logdirty usage

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

We also later export these statistics over QMP for better
monitoring of micro-checkpointing as the workload changes.

Signed-off-by: Michael R. Hines mrhi...@us.ibm.com
---
 arch_init.c | 34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 80574a0..b8364b0 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -193,6 +193,8 @@ typedef struct AccountingInfo {
 uint64_t skipped_pages;
 uint64_t norm_pages;
 uint64_t iterations;
+uint64_t log_dirty_time;
+uint64_t migration_bitmap_time;
 uint64_t xbzrle_bytes;
 uint64_t xbzrle_pages;
 uint64_t xbzrle_cache_miss;
@@ -201,7 +203,7 @@ typedef struct AccountingInfo {
 
 static AccountingInfo acct_info;
 
-static void acct_clear(void)
+void acct_clear(void)
 {
 memset(acct_info, 0, sizeof(acct_info));
 }
@@ -236,6 +238,16 @@ uint64_t norm_mig_pages_transferred(void)
 return acct_info.norm_pages;
 }
 
+uint64_t norm_mig_log_dirty_time(void)
+{
+return acct_info.log_dirty_time;
+}
+
+uint64_t norm_mig_bitmap_time(void)
+{
+return acct_info.migration_bitmap_time;
+}
+
 uint64_t xbzrle_mig_bytes_transferred(void)
 {
 return acct_info.xbzrle_bytes;
@@ -426,27 +438,35 @@ static void migration_bitmap_sync(void)
 static int64_t num_dirty_pages_period;
 int64_t end_time;
 int64_t bytes_xfer_now;
+int64_t begin_time;
+int64_t dirty_time;
 
 if (!bytes_xfer_prev) {
 bytes_xfer_prev = ram_bytes_transferred();
 }
 
+begin_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 if (!start_time) {
 start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 }
-
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(address_space_memory);
 
+dirty_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
 QTAILQ_FOREACH(block, ram_list.blocks, next) {
 migration_bitmap_sync_range(block-mr-ram_addr, block-length);
 }
+
 trace_migration_bitmap_sync_end(migration_dirty_pages
 - num_dirty_pages_init);
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
 end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
-/* more than 1 second = 1000 millisecons */
+acct_info.log_dirty_time += dirty_time - begin_time;
+acct_info.migration_bitmap_time += end_time - dirty_time;
+
+/* more than 1 second = 1000 milliseconds */
 if (end_time  start_time + 1000) {
 if (migrate_auto_converge()) {
 /* The following detection logic can be refined later. For now:
@@ -548,9 +568,11 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 /* XBZRLE overflow or normal page */
 if (bytes_sent == -1) {
 bytes_sent = save_block_hdr(f, block, offset, cont, 
RAM_SAVE_FLAG_PAGE);
-qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE);
-bytes_sent += TARGET_PAGE_SIZE;
-acct_info.norm_pages++;
+if (ret != RAM_SAVE_CONTROL_DELAYED) {
+qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE);
+bytes_sent += TARGET_PAGE_SIZE;
+acct_info.norm_pages++;
+}
 }
 
 /* if page is unmodified, continue to the next */
-- 
1.8.1.2

[Qemu-devel] [RFC PATCH v2 07/12] mc: introduce additional QMP statistics for micro-checkpointing

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

MC provides a lot of new information, including the same RAM statistics
that ordinary migration does, so we centralize a lot of that printing
code into a common function so that the QMP printing statements don't
get duplicated too much.

We also introduce a new MCStats structure (like MigrationStats) due
to the large number of non-migration related statistics - don't want
to confuse migration and MC too much, so let's keep them separate for now.

Signed-off-by: Michael R. Hines mrhi...@us.ibm.com
---
 hmp.c | 17 +
 include/migration/migration.h |  6 +++
 migration.c   | 86 ++-
 qapi-schema.json  | 33 +
 4 files changed, 109 insertions(+), 33 deletions(-)

diff --git a/hmp.c b/hmp.c
index 1af0809..edf062e 100644
--- a/hmp.c
+++ b/hmp.c
@@ -203,6 +203,23 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
info-disk-total  10);
 }
 
+if (info-has_mc) {
+monitor_printf(mon, checkpoints: % PRIu64 \n,
+   info-mc-checkpoints);
+monitor_printf(mon, xmit_time: % PRIu64  ms\n,
+   info-mc-xmit_time);
+monitor_printf(mon, log_dirty_time: % PRIu64  ms\n,
+   info-mc-log_dirty_time);
+monitor_printf(mon, migration_bitmap_time: % PRIu64  ms\n,
+   info-mc-migration_bitmap_time);
+monitor_printf(mon, ram_copy_time: % PRIu64  ms\n,
+   info-mc-ram_copy_time);
+monitor_printf(mon, copy_mbps: %0.2f mbps\n,
+   info-mc-copy_mbps);
+monitor_printf(mon, throughput: %0.2f mbps\n,
+   info-mc-mbps);
+}
+
 if (info-has_xbzrle_cache) {
 monitor_printf(mon, cache size: % PRIu64  bytes\n,
info-xbzrle_cache-cache_size);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index e876a2c..f18ff5e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -53,14 +53,20 @@ struct MigrationState
 int state;
 MigrationParams params;
 double mbps;
+double copy_mbps;
 int64_t total_time;
 int64_t downtime;
 int64_t expected_downtime;
+int64_t xmit_time;
+int64_t ram_copy_time;
+int64_t log_dirty_time;
+int64_t bitmap_time;
 int64_t dirty_pages_rate;
 int64_t dirty_bytes_rate;
 bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
 int64_t xbzrle_cache_size;
 int64_t setup_time;
+int64_t checkpoints;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/migration.c b/migration.c
index f42dae4..0ccbeaa 100644
--- a/migration.c
+++ b/migration.c
@@ -59,7 +59,6 @@ MigrationState *migrate_get_current(void)
 .state = MIG_STATE_NONE,
 .bandwidth_limit = MAX_THROTTLE,
 .xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE,
-.mbps = -1,
 };
 
 return current_migration;
@@ -173,6 +172,31 @@ static void get_xbzrle_cache_stats(MigrationInfo *info)
 }
 }
 
+static void get_ram_stats(MigrationState *s, MigrationInfo *info)
+{
+info-has_total_time = true;
+info-total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
+- s-total_time;
+
+info-has_ram = true;
+info-ram = g_malloc0(sizeof(*info-ram));
+info-ram-transferred = ram_bytes_transferred();
+info-ram-total = ram_bytes_total();
+info-ram-duplicate = dup_mig_pages_transferred();
+info-ram-skipped = skipped_mig_pages_transferred();
+info-ram-normal = norm_mig_pages_transferred();
+info-ram-normal_bytes = norm_mig_bytes_transferred();
+info-ram-mbps = s-mbps;
+
+if (blk_mig_active()) {
+info-has_disk = true;
+info-disk = g_malloc0(sizeof(*info-disk));
+info-disk-transferred = blk_mig_bytes_transferred();
+info-disk-remaining = blk_mig_bytes_remaining();
+info-disk-total = blk_mig_bytes_total();
+}
+}
+
 MigrationInfo *qmp_query_migrate(Error **errp)
 {
 MigrationInfo *info = g_malloc0(sizeof(*info));
@@ -199,26 +223,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info-has_setup_time = true;
 info-setup_time = s-setup_time;
 
-info-has_ram = true;
-info-ram = g_malloc0(sizeof(*info-ram));
-info-ram-transferred = ram_bytes_transferred();
-info-ram-remaining = ram_bytes_remaining();
-info-ram-total = ram_bytes_total();
-info-ram-duplicate = dup_mig_pages_transferred();
-info-ram-skipped = skipped_mig_pages_transferred();
-info-ram-normal = norm_mig_pages_transferred();
-info-ram-normal_bytes = norm_mig_bytes_transferred();
+get_ram_stats(s, info);
 info-ram-dirty_pages_rate = s-dirty_pages_rate;
-info-ram-mbps = s-mbps;
-
-if (blk_mig_active()) {
-info-has_disk = true;
-

[Qemu-devel] [RFC PATCH v2 01/12] mc: add documentation for micro-checkpointing

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

Wiki: http://wiki.qemu.org/Features/MicroCheckpointing
Github: g...@github.com:hinesmr/qemu.git, 'mc' branch

NOTE: This is a direct copy of the QEMU wiki page for the convenience
of the review process. Since this series very much in flux, instead of
maintaing two copies of documentation in two different formats, this
documentation will be properly formatted in the future when the review
process has completed.

Signed-off-by: Michael R. Hines mrhi...@us.ibm.com
---
 docs/mc.txt | 222 
 1 file changed, 222 insertions(+)
 create mode 100644 docs/mc.txt

diff --git a/docs/mc.txt b/docs/mc.txt
new file mode 100644
index 000..5d4b5fe
--- /dev/null
+++ b/docs/mc.txt
@@ -0,0 +1,222 @@
+Micro Checkpointing Specification v1
+==
+Wiki: http://wiki.qemu.org/Features/MicroCheckpointing
+Github: g...@github.com:hinesmr/qemu.git, 'mc' branch
+
+Copyright (C) 2014 Michael R. Hines mrhi...@us.ibm.com
+
+Contents
+1 Summary
+1.1 Contact
+1.2 Introduction
+2 The Micro-Checkpointing Process
+2.1 Basic Algorithm
+2.2 I/O buffering
+2.3 Failure Recovery
+3 Optimizations
+3.1 Memory Management
+3.2 RDMA Integration
+4 Usage
+4.1 BEFORE Running
+4.2 Running
+5 Performance
+6 TODO
+7 FAQ / Frequently Asked Questions
+7.1 What happens if a failure occurs in the *middle* of a flush of the network 
buffer?
+7.2 What's different about this implementation?
+Summary
+This is an implementation of Micro Checkpointing for memory and cpu state. 
Also known as: Continuous Replication or Fault Tolerance or 100 other 
different names - choose your poison.
+
+Contact
+Name: Michael Hines
+Email: mrhi...@us.ibm.com
+Wiki: http://wiki.qemu.org/Features/MicroCheckpointing
+
+Github: http://github.com/hinesmr/qemu.git, 'mc' branch
+
+Libvirt Support: http://github.com/hinesmr/libvirt.git, 'mc' branch
+
+Copyright (C) 2014 IBM Michael R. Hines mrhi...@us.ibm.com
+
+Introduction
+Micro-Checkpointing (MC) is one method for providing Fault Tolerance to a 
running virtual machine (VM) with little or no runtime assistance from the 
guest kernel or guest application software. Furthermore, Fault Tolerance is one 
method of providing high availability to a VM such that, from the perspective 
of the outside world (clients, devices, and neighboring VMs that may be paired 
with it), the VM and its applications have not lost any runtime state in the 
event of either a failure of the hypervisor/hardware to allow the VM to make 
forward progress or a complete loss of power. This mechanism for providing 
fault tolerance does *not* provide any protection whatsoever against 
software-level faults in the guest kernel or applications. In fact, due to the 
potentially extended lifetime of the VM because of this type of high 
availability, such software-level bugs may in fact manifest themselves more 
often than they otherwise ordinarily would, in which case you would need to 
empl!
 oy other
+
+This implementation is also fully compatible with RDMA and has undergone 
special optimizations to suppor the use of RDMA. (See docs/rdma.txt for more 
details).
+
+The Micro-Checkpointing Process
+Basic Algorithm
+Micro-Checkpoints (MC) work against the existing live migration path in QEMU, 
and can effectively be understood as a live migration that never ends. As 
such, iteration rounds happen at the granularity of 10s of milliseconds and 
perform the following steps:
+
+1. After N milliseconds, stop the VM.
+3. Generate a MC by invoking the live migration software path to identify and 
copy dirty memory into a local staging area inside QEMU.
+4. Resume the VM immediately so that it can make forward progress.
+5. Transmit the checkpoint to the destination.
+6. Repeat
+Upon failure, load the contents of the last MC at the destination back into 
memory and run the VM normally.
+
+I/O buffering
+Additionally, a MC must include a consistent view of device I/O, particularly 
the network, a problem commonly referred to as output commit. This means that 
the outside world can not be allowed to experience duplicate state that was 
committed by the virtual machine after failure. This is possible because a 
checkpoint may diverge by N milliseconds of time and commit state while the 
current MC is being transmitted to the destination.
+
+To guard against this problem, first, we must buffer the TX output of the 
network (not the input) between MCs until the current MC is safely received by 
the destination. For example, all outbound network packets must be held at the 
source until the MC is transmitted. After transmission is complete, those 
packets can be released. Similarly, in the case of disk I/O, we must ensure 
that either the contents of the local disk are safely mirrored to a remote disk 
before completing a MC or that the output to a shared disk, such as iSCSI, is 
also buffered between checkpoints and then later released in the

[Qemu-devel] [RFC PATCH v2 03/12] mc: introduce a 'checkpointing' status check into the VCPU states

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

During micro-checkpointing, the VCPUs get repeatedly paused and
resumed. We need to not freak out when the VM begins micro-checkpointing.

Signed-off-by: Michael R. Hines mrhi...@us.ibm.com
---
 cpus.c|  9 -
 include/migration/migration.h | 21 +
 qapi-schema.json  |  4 +++-
 vl.c  |  7 +++
 4 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 945d85b..b876d6b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -532,7 +532,14 @@ static int do_vm_stop(RunState state)
 pause_all_vcpus();
 runstate_set(state);
 vm_state_notify(0, state);
-monitor_protocol_event(QEVENT_STOP, NULL);
+/*
+ * If MC is enabled, libvirt gets confused 
+ * because it thinks the VM is stopped when 
+ * its just being micro-checkpointed.
+ */
+if(state != RUN_STATE_CHECKPOINT_VM) {
+monitor_protocol_event(QEVENT_STOP, NULL);
+}
 }
 
 bdrv_drain_all();
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3e1e6c7..9c62e2f 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -121,10 +121,31 @@ uint64_t skipped_mig_bytes_transferred(void);
 uint64_t skipped_mig_pages_transferred(void);
 uint64_t norm_mig_bytes_transferred(void);
 uint64_t norm_mig_pages_transferred(void);
+uint64_t norm_mig_log_dirty_time(void);
+uint64_t norm_mig_bitmap_time(void);
 uint64_t xbzrle_mig_bytes_transferred(void);
 uint64_t xbzrle_mig_pages_transferred(void);
 uint64_t xbzrle_mig_pages_overflow(void);
 uint64_t xbzrle_mig_pages_cache_miss(void);
+void acct_clear(void);
+
+void migrate_set_state(MigrationState *s, int old_state, int new_state);
+
+enum {
+MIG_STATE_ERROR = -1,
+MIG_STATE_NONE,
+MIG_STATE_SETUP,
+MIG_STATE_CANCELLED,
+MIG_STATE_CANCELLING,
+MIG_STATE_ACTIVE,
+MIG_STATE_CHECKPOINTING,
+MIG_STATE_COMPLETED,
+};
+
+int mc_enable_buffering(void);
+int mc_start_buffer(void);
+void mc_init_checkpointer(MigrationState *s);
+void mc_process_incoming_checkpoints_if_requested(QEMUFile *f);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 7cfb5e5..3c2ee4d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -169,6 +169,8 @@
 #
 # @save-vm: guest is paused to save the VM state
 #
+# @checkpoint-vm: guest is paused to checkpoint the VM state
+#
 # @shutdown: guest is shut down (and -no-shutdown is in use)
 #
 # @suspended: guest is suspended (ACPI S3)
@@ -181,7 +183,7 @@
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-'guest-panicked' ] }
+'guest-panicked', 'checkpoint-vm' ] }
 
 ##
 # @SnapshotInfo
diff --git a/vl.c b/vl.c
index 316de54..2fb5b1f 100644
--- a/vl.c
+++ b/vl.c
@@ -552,6 +552,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 
 { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
 { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_PAUSED, RUN_STATE_CHECKPOINT_VM },
 
 { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -562,14 +563,18 @@ static const RunStateTransition 
runstate_transitions_def[] = {
 
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
+{ RUN_STATE_FINISH_MIGRATE, RUN_STATE_CHECKPOINT_VM },
 
 { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
 
+{ RUN_STATE_CHECKPOINT_VM, RUN_STATE_RUNNING },
+
 { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
 { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
 { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
 { RUN_STATE_RUNNING, RUN_STATE_PAUSED },
 { RUN_STATE_RUNNING, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_RUNNING, RUN_STATE_CHECKPOINT_VM },
 { RUN_STATE_RUNNING, RUN_STATE_RESTORE_VM },
 { RUN_STATE_RUNNING, RUN_STATE_SAVE_VM },
 { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
@@ -585,9 +590,11 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
 { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
 { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_SUSPENDED, RUN_STATE_CHECKPOINT_VM },
 
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_WATCHDOG, RUN_STATE_CHECKPOINT_VM },
 
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-- 
1.8.1.2

[Qemu-devel] [RFC PATCH v2 11/12] mc: introduce new capabilities to control micro-checkpointing

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

New capabilities include the use of RDMA acceleration,
use of network buffering, and keepalive support, as documented
in patch #1.

Signed-off-by: Michael R. Hines mrhi...@us.ibm.com
---
 qapi-schema.json | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 98abdac..1fdf208 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -720,10 +720,44 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #  to speed up convergence of RAM migration. (since 1.6)
 #
+# @mc: The migration will never end, and the VM will instead be continuously
+#  micro-checkpointed (MC). Use the command migrate-set-mc-delay to 
+#  control the frequency at which the checkpoints occur. 
+#  Disabled by default. (Since 2.x)
+#
+# @mc-net-disable: Deactivate network buffering against outbound network 
+#  traffic while Micro-Checkpointing (@mc) is active.
+#  Enabled by default. Disabling will make the MC protocol inconsistent
+#  and potentially break network connections upon an actual failure.
+#  Only for performance testing. (Since 2.x)
+#
+# @mc-rdma-copy: MC requires creating a local-memory checkpoint before
+#  transmission to the destination. This requires heavy use of 
+#  memcpy() which dominates the processor pipeline. This option 
+#  makes use of *local* RDMA to perform the copy instead of the CPU.
+#  Enabled by default only if the migration transport is RDMA.
+#  Disabled by default otherwise. (Since 2.x)
+#
+# @rdma-keepalive: RDMA connections do not timeout by themselves if a peer
+# has disconnected prematurely or failed. User-level keepalives
+# allow the migration to abort cleanly if there is a problem with the
+# destination host. For debugging, this can be problematic as
+# the keepalive may cause the peer to abort prematurely if we are
+# at a GDB breakpoint, for example.
+# Enabled by default. (Since 2.x)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'x-rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 
+   'rdma-pin-all', 
+   'auto-converge', 
+   'zero-blocks',
+   'mc', 
+   'mc-net-disable',
+   'mc-rdma-copy',
+   'rdma-keepalive'
+  ] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.8.1.2

[Qemu-devel] [RFC PATCH v2 08/12] mc: core logic

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

This implements the core logic,
all described in the first patch (docs/mc.txt).

Signed-off-by: Michael R. Hines mrhi...@us.ibm.com
---
 migration-checkpoint.c | 1565 
 1 file changed, 1565 insertions(+)
 create mode 100644 migration-checkpoint.c

diff --git a/migration-checkpoint.c b/migration-checkpoint.c
new file mode 100644
index 000..a69edb2
--- /dev/null
+++ b/migration-checkpoint.c
@@ -0,0 +1,1565 @@
+/*
+ *  Micro-Checkpointing (MC) support 
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ *  Copyright IBM, Corp. 2014
+ *
+ *  Authors:
+ *   Michael R. Hines mrhi...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+#include libnl3/netlink/route/qdisc/plug.h
+#include libnl3/netlink/route/class.h
+#include libnl3/netlink/cli/utils.h
+#include libnl3/netlink/cli/tc.h
+#include libnl3/netlink/cli/qdisc.h
+#include libnl3/netlink/cli/link.h
+#include qemu-common.h
+#include hw/virtio/virtio.h
+#include hw/virtio/virtio-net.h
+#include qemu/sockets.h
+#include migration/migration.h
+#include migration/qemu-file.h
+#include qmp-commands.h
+#include net/tap-linux.h
+#include sys/ioctl.h
+
+#define DEBUG_MC
+//#define DEBUG_MC_VERBOSE
+//#define DEBUG_MC_REALLY_VERBOSE
+
+#ifdef DEBUG_MC
+#define DPRINTF(fmt, ...) \
+do { printf(mc:  fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
+#ifdef DEBUG_MC_VERBOSE
+#define DDPRINTF(fmt, ...) \
+do { printf(mc:  fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DDPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
+#ifdef DEBUG_MC_REALLY_VERBOSE
+#define DDDPRINTF(fmt, ...) \
+do { printf(mc:  fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DDDPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
+/*
+ * Micro checkpoints (MC)s are typically only a few MB when idle.
+ * However, they can easily be very large during heavy workloads.
+ * In the *extreme* worst-case, QEMU might need double the amount of main 
memory
+ * than that of what was originally allocated to the virtual machine.
+ *
+ * To support this variability during transient periods, a MC
+ * consists of a linked list of slabs, each of identical size. A better name
+ * would be welcome, as the name was only chosen because it resembles linux
+ * memory allocation. Because MCs occur several times per second 
+ * (a frequency of 10s of milliseconds), slabs allow MCs to grow and shrink 
+ * without constantly re-allocating all memory in place during each checkpoint.
+ *
+ * During steady-state, the 'head' slab is permanently allocated and never goes
+ * away, so when the VM is idle, there is no memory allocation at all.
+ * This design supports the use of RDMA. Since RDMA requires memory pinning, we
+ * must be able to hold on to a slab for a reasonable amount of time to get any
+ * real use out of it.
+ *
+ * Regardless, the current strategy taken is:
+ * 
+ * 1. If the checkpoint size increases,
+ *then grow the number of slabs to support it,
+ *(if and only if RDMA is activated, these slabs will be pinned.)
+ * 2. If the next checkpoint size is smaller than the last one,
+  then that's a strike.
+ * 3. After N strikes, cut the size of the slab cache in half
+ *(to a minimum of 1 slab as described before).
+ *
+ * As of this writing, a typical average size of 
+ * an Idle-VM checkpoint is under 5MB.
+ */
+
+#define MC_SLAB_BUFFER_SIZE (5UL * 1024UL * 1024UL) /* empirical */
+#define MC_DEV_NAME_MAX_SIZE256
+
+#define MC_DEFAULT_CHECKPOINT_FREQ_MS 100 /* too slow, but best for now */
+#define CALC_MAX_STRIKES()   \
+do {  max_strikes = (max_strikes_delay_secs * 1000) / freq_ms; } \
+while (0)
+
+/*
+ * How many seconds-worth of checkpoints to wait before re-evaluating the 
size
+ * of the slab list?
+ *
+ * #strikes_until_shrink_cache = Function(#checkpoints/sec)
+ *
+ * Increasing the number of seconds also increases the number of strikes needed
+ * to be reached until it is time to cut the cache in half.
+ *
+ * Below value is open for debate - we just want it to be small enough to 
ensure
+ * that a large, idle slab list doesn't stay too large for too long.
+ */
+#define MC_DEFAULT_SLAB_MAX_CHECK_DELAY_SECS 10
+
+/* 
+ * MC serializes the actual RAM page contents in such a way that the actual
+ * pages are separated from the meta-data (all the QEMUFile stuff).
+ *
+ * This is done strictly for the purposes of being able to use RDMA
+ * and to replace memcpy() on the local machine for hardware with very
+ * fast RAM memory speeds.
+ * 
+ * This serialization requires recording the page descriptions and then
+ * pushing them into slabs after the checkpoint has been captured
+ * (minus the page data).
+ *
+ * The memory holding the page descriptions are allocated

[Qemu-devel] [RFC PATCH v2 04/12] mc: support custom page loading and copying

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

Just as RDMA has custom routines for saving memory,
this provides RDMA with custom routines for loading
and copying memory as well.

Micro-checkpointing needs this support to avoid modifying
the arch_init.c as little as possible while stilling being
able to load RDMA-based memory from checkpoints in a
performance-optimal way as they are received from the network.

Signed-off-by: Michael R. Hines mrhi...@us.ibm.com
---
 arch_init.c   |  9 +++--
 include/migration/migration.h | 33 --
 include/migration/qemu-file.h | 54 +++--
 qemu-file.c   | 80 +--
 4 files changed, 167 insertions(+), 9 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index b8364b0..db75120 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -540,7 +540,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 /* In doubt sent page as normal */
 bytes_sent = -1;
 ret = ram_control_save_page(f, block-offset,
-   offset, TARGET_PAGE_SIZE, bytes_sent);
+   block-host, offset, TARGET_PAGE_SIZE, bytes_sent);
 
 if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
 if (ret != RAM_SAVE_CONTROL_DELAYED) {
@@ -1004,13 +1004,18 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
 } else if (flags  RAM_SAVE_FLAG_PAGE) {
 void *host;
+int r;
 
 host = host_from_stream_offset(f, addr, flags);
 if (!host) {
 return -EINVAL;
 }
 
-qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+r = ram_control_load_page(f, host, TARGET_PAGE_SIZE);
+
+if (r == RAM_LOAD_CONTROL_NOT_SUPP) {
+qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
+}
 } else if (flags  RAM_SAVE_FLAG_XBZRLE) {
 void *host = host_from_stream_offset(f, addr, flags);
 if (!host) {
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 9c62e2f..5c1a574 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -190,9 +190,38 @@ void ram_control_load_hook(QEMUFile *f, uint64_t flags);
 
 #define RAM_SAVE_CONTROL_NOT_SUPP -1000
 #define RAM_SAVE_CONTROL_DELAYED  -2000
+#define RAM_LOAD_CONTROL_NOT_SUPP -3000
+#define RAM_LOAD_CONTROL_DELAYED  -4000
+#define RAM_COPY_CONTROL_NOT_SUPP -5000
+#define RAM_COPY_CONTROL_DELAYED  -6000
 
-size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
- ram_addr_t offset, size_t size,
+#define RDMA_CONTROL_VERSION_CURRENT 1
+
+int ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
+ uint8_t *host_addr,
+ ram_addr_t offset, long size,
  int *bytes_sent);
 
+int ram_control_load_page(QEMUFile *f,
+ void *host_addr,
+ long size);
+
+int ram_control_copy_page(QEMUFile *f, 
+ ram_addr_t block_offset_dest,
+ ram_addr_t offset_dest,
+ ram_addr_t block_offset_source,
+ ram_addr_t offset_source,
+ long size);
+
+int migrate_use_mc(void);
+int migrate_use_mc_net(void);
+int migrate_use_mc_rdma_copy(void);
+
+#define MC_VERSION 1
+
+int mc_info_load(QEMUFile *f, void *opaque, int version_id);
+void mc_info_save(QEMUFile *f, void *opaque);
+
+void qemu_rdma_info_save(QEMUFile *f, void *opaque);
+int qemu_rdma_info_load(QEMUFile *f, void *opaque, int version_id);
 #endif
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index a191fb6..c50de0d 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -71,17 +71,63 @@ typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, 
uint64_t flags);
 #define RAM_CONTROL_ROUND1
 #define RAM_CONTROL_HOOK 2
 #define RAM_CONTROL_FINISH   3
+#define RAM_CONTROL_FLUSH4
 
 /*
  * This function allows override of where the RAM page
  * is saved (such as RDMA, for example.)
  */
-typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
+typedef int (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
ram_addr_t block_offset,
+   uint8_t *host_addr,
ram_addr_t offset,
-   size_t size,
+   long size,
int *bytes_sent);
 
+/*
+ * This function allows override of where the RAM page
+ * is saved (such as RDMA, for example.)
+ */
+typedef int (QEMURamLoadFunc)(QEMUFile *f,
+   void *opaque,
+   void

[Qemu-devel] [RFC PATCH v2 09/12] mc: configure and makefile support

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

Self-explanatory.

Signed-off-by: Michael R. Hines mrhi...@us.ibm.com
---
 Makefile.objs |  1 +
 configure | 45 +
 2 files changed, 46 insertions(+)

diff --git a/Makefile.objs b/Makefile.objs
index ac1d0e1..db70f93 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -53,6 +53,7 @@ common-obj-y += migration.o migration-tcp.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o
 common-obj-$(CONFIG_RDMA) += migration-rdma.o
+common-obj-$(CONFIG_MC) += migration-checkpoint.o
 common-obj-y += qemu-char.o #aio.o
 common-obj-y += block-migration.o
 common-obj-y += page_cache.o xbzrle.o
diff --git a/configure b/configure
index 0eadab5..507ee99 100755
--- a/configure
+++ b/configure
@@ -197,6 +197,7 @@ kvm=no
 rdma=
 gprof=no
 debug_tcg=no
+mc=
 debug=no
 strip_opt=yes
 tcg_interpreter=no
@@ -1001,6 +1002,10 @@ for opt do
   ;;
   --enable-libssh2) libssh2=yes
   ;;
+  --disable-mc) mc=no
+  ;;
+  --enable-mc) mc=yes
+  ;;
   --enable-vhdx) vhdx=yes
   ;;
   --disable-vhdx) vhdx=no
@@ -1189,6 +1194,8 @@ Advanced options (experts only):
   --enable-kvm enable KVM acceleration support
   --disable-rdma   disable RDMA-based migration support
   --enable-rdmaenable RDMA-based migration support
+  --disable-mc disable Micro-Checkpointing support
+  --enable-mc  enable Micro-Checkpointing support
   --enable-tcg-interpreter enable TCG with bytecode interpreter (TCI)
   --enable-system  enable all system emulation targets
   --disable-system disable all system emulation targets
@@ -1901,6 +1908,35 @@ EOF
   fi
 fi
 
+##
+# Micro-Checkpointing requires netlink (libnl3)
+if test $mc != no ; then
+  cat  $TMPC EOF
+#include libnl3/netlink/route/qdisc/plug.h
+#include libnl3/netlink/route/class.h
+#include libnl3/netlink/cli/utils.h
+#include libnl3/netlink/cli/tc.h
+#include libnl3/netlink/cli/qdisc.h
+#include libnl3/netlink/cli/link.h
+int main(void) { return 0; }
+EOF
+  mc_libs=-lnl-3 -lnl-cli-3 -lnl-route-3
+  mc_cflags=-I/usr/include/libnl3
+  if compile_prog $mc_cflags $mc_libs ; then
+mc=yes
+libs_softmmu=$libs_softmmu $mc_libs
+QEMU_CFLAGS=$QEMU_CFLAGS $mc_cflags
+  else
+if test $mc = yes ; then
+error_exit \
+ NetLink v3 libs/headers not present. \
+ Please install the libnl3-*-dev(el) packages from your distro.
+fi
+mc=no
+  fi
+fi
+
+
 ##
 # VNC TLS/WS detection
 if test $vnc = yes -a \( $vnc_tls != no -o $vnc_ws != no \) ; then
@@ -3829,6 +3865,7 @@ echo KVM support   $kvm
 echo RDMA support  $rdma
 echo TCG interpreter   $tcg_interpreter
 echo fdt support   $fdt
+echo Micro checkpointing $mc
 echo preadv support$preadv
 echo fdatasync $fdatasync
 echo madvise   $madvise
@@ -4334,6 +4371,10 @@ if test $rdma = yes ; then
   echo CONFIG_RDMA=y  $config_host_mak
 fi
 
+if test $mc = yes ; then
+  echo CONFIG_MC=y  $config_host_mak
+fi
+
 if test $tcg_interpreter = yes; then
   QEMU_INCLUDES=-I\$(SRC_PATH)/tcg/tci $QEMU_INCLUDES
 elif test $ARCH = sparc64 ; then
@@ -4769,6 +4810,10 @@ echo QEMU_CFLAGS+=$cflags  $config_target_mak
 
 done # for target in $targets
 
+if test $mc = yes ; then
+echo CONFIG_MC=y  $config_host_mak
+fi
+
 if [ $pixman = internal ]; then
   echo config-host.h: subdir-pixman  $config_host_mak
 fi
-- 
1.8.1.2

[Qemu-devel] [RFC PATCH v2 06/12] mc: introduce state machine changes for MC

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

This patch sets up the initial changes to the migration state
machine and prototypes to be used by the checkpointing code
to interact with the state machine so that we can later handle
failure and recovery scenarios.

Signed-off-by: Michael R. Hines mrhi...@us.ibm.com
---
 arch_init.c   | 29 -
 include/migration/migration.h |  2 ++
 migration.c   | 37 +
 3 files changed, 47 insertions(+), 21 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index db75120..e9d4d9e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -658,13 +658,13 @@ static void ram_migration_cancel(void *opaque)
 migration_end();
 }
 
-static void reset_ram_globals(void)
+static void reset_ram_globals(bool reset_bulk_stage)
 {
 last_seen_block = NULL;
 last_sent_block = NULL;
 last_offset = 0;
 last_version = ram_list.version;
-ram_bulk_stage = true;
+ram_bulk_stage = reset_bulk_stage;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -674,6 +674,15 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 RAMBlock *block;
 int64_t ram_pages = last_ram_offset()  TARGET_PAGE_BITS;
 
+/*
+ * RAM stays open during micro-checkpointing for the next transaction.
+ */
+if (migration_is_mc(migrate_get_current())) {
+qemu_mutex_lock_ramlist();
+reset_ram_globals(false);
+goto skip_setup;
+}
+
 migration_bitmap = bitmap_new(ram_pages);
 bitmap_set(migration_bitmap, 0, ram_pages);
 migration_dirty_pages = ram_pages;
@@ -710,12 +719,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 qemu_mutex_lock_iothread();
 qemu_mutex_lock_ramlist();
 bytes_transferred = 0;
-reset_ram_globals();
+reset_ram_globals(true);
 
 memory_global_dirty_log_start();
 migration_bitmap_sync();
 qemu_mutex_unlock_iothread();
 
+skip_setup:
+
 qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
 QTAILQ_FOREACH(block, ram_list.blocks, next) {
@@ -744,7 +755,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 qemu_mutex_lock_ramlist();
 
 if (ram_list.version != last_version) {
-reset_ram_globals();
+reset_ram_globals(true);
 }
 
 ram_control_before_iterate(f, RAM_CONTROL_ROUND);
@@ -825,7 +836,15 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 
 ram_control_after_iterate(f, RAM_CONTROL_FINISH);
-migration_end();
+
+/*
+ * Only cleanup at the end of normal migrations
+ * or if the MC destination failed and we got an error.
+ * Otherwise, we are (or will soon be) in MIG_STATE_CHECKPOINTING.
+ */
+if(!migrate_use_mc() || migration_has_failed(migrate_get_current())) {
+migration_end();
+}
 
 qemu_mutex_unlock_ramlist();
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index a7c54fe..e876a2c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -101,7 +101,9 @@ int migrate_fd_close(MigrationState *s);
 
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
+bool migration_is_active(MigrationState *);
 bool migration_in_setup(MigrationState *);
+bool migration_is_mc(MigrationState *s);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
 MigrationState *migrate_get_current(void);
diff --git a/migration.c b/migration.c
index 25add6f..f42dae4 100644
--- a/migration.c
+++ b/migration.c
@@ -36,16 +36,6 @@
 do { } while (0)
 #endif
 
-enum {
-MIG_STATE_ERROR = -1,
-MIG_STATE_NONE,
-MIG_STATE_SETUP,
-MIG_STATE_CANCELLING,
-MIG_STATE_CANCELLED,
-MIG_STATE_ACTIVE,
-MIG_STATE_COMPLETED,
-};
-
 #define MAX_THROTTLE  (32  20)  /* Migration speed throttling */
 
 /* Amount of time to allocate to each chunk of bandwidth-throttled
@@ -273,7 +263,7 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 MigrationState *s = migrate_get_current();
 MigrationCapabilityStatusList *cap;
 
-if (s-state == MIG_STATE_ACTIVE || s-state == MIG_STATE_SETUP) {
+if (migration_is_active(s)) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
 return;
 }
@@ -285,7 +275,13 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+bool migration_is_active(MigrationState *s)
+{
+return (s-state == MIG_STATE_ACTIVE) || s-state == MIG_STATE_SETUP
+|| s-state == MIG_STATE_CHECKPOINTING;
+}
+
+void migrate_set_state(MigrationState *s, int old_state, int new_state)
 {
 if (atomic_cmpxchg(s-state, old_state, new_state) == new_state) {
 trace_migrate_set_state(new_state);

[Qemu-devel] [RFC PATCH v2 10/12] mc: expose tunable parameter for checkpointing frequency

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

This exposes a QMP command that allows the management software
or policy to control the frequency of micro-checkpointing.

Signed-off-by: Michael R. Hines mrhi...@us.ibm.com
---
 hmp-commands.hx  | 16 +++-
 hmp.c|  6 ++
 hmp.h|  1 +
 qapi-schema.json | 13 +
 qmp-commands.hx  | 23 +++
 5 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index f3fc514..2066c76 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -888,7 +888,7 @@ ETEXI
  \n\t\t\t -b for migration without shared storage with
   full copy of disk\n\t\t\t -i for migration without 
  shared storage with incremental copy of disk 
- (base image shared between src and destination),
+ (base image shared between src and destination),
 .mhandler.cmd = hmp_migrate,
 },
 
@@ -965,6 +965,20 @@ Set maximum tolerated downtime (in seconds) for migration.
 ETEXI
 
 {
+.name   = migrate-set-mc-delay,
+.args_type  = value:i,
+.params = value,
+.help   = Set maximum delay (in milliseconds) between 
micro-checkpoints,
+.mhandler.cmd = hmp_migrate_set_mc_delay,
+},
+
+STEXI
+@item migrate-set-mc-delay @var{millisecond}
+@findex migrate-set-mc-delay
+Set maximum delay (in milliseconds) between micro-checkpoints.
+ETEXI
+
+{
 .name   = migrate_set_capability,
 .args_type  = capability:s,state:b,
 .params = capability state,
diff --git a/hmp.c b/hmp.c
index edf062e..9880bc8 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1029,6 +1029,12 @@ void hmp_migrate_set_downtime(Monitor *mon, const QDict 
*qdict)
 qmp_migrate_set_downtime(value, NULL);
 }
 
+void hmp_migrate_set_mc_delay(Monitor *mon, const QDict *qdict)
+{
+int64_t value = qdict_get_int(qdict, value);
+qmp_migrate_set_mc_delay(value, NULL);
+}
+
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict)
 {
 int64_t value = qdict_get_int(qdict, value);
diff --git a/hmp.h b/hmp.h
index ed58f0e..068b2c1 100644
--- a/hmp.h
+++ b/hmp.h
@@ -60,6 +60,7 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict);
 void hmp_drive_backup(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
+void hmp_migrate_set_mc_delay(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index 7306adc..98abdac 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2160,6 +2160,19 @@
 { 'command': 'migrate_set_downtime', 'data': {'value': 'number'} }
 
 ##
+# @migrate-set-mc-delay
+#
+# Set delay (in milliseconds) between micro checkpoints.
+#
+# @value: maximum delay in milliseconds 
+#
+# Returns: nothing on success
+#
+# Since: 2.x
+##
+{ 'command': 'migrate-set-mc-delay', 'data': {'value': 'int'} }
+
+##
 # @migrate_set_speed
 #
 # Set maximum speed for migration.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index cce6b81..d8b9c34 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -754,6 +754,29 @@ Example:
 EQMP
 
 {
+.name   = migrate-set-mc-delay,
+.args_type  = value:i,
+.mhandler.cmd_new = qmp_marshal_input_migrate_set_mc_delay,
+},
+
+SQMP
+migrate-set-mc-delay
+
+
+Set maximum delay (in milliseconds) between micro-checkpoints.
+
+Arguments:
+
+- value: maximum delay (json-int)
+
+Example:
+
+- { execute: migrate-set-mc-delay, arguments: { value: 100 } }
+- { return: {} }
+
+EQMP
+
+{
 .name   = client_migrate_info,
 .args_type  = 
protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?,
 .params = protocol hostname port tls-port cert-subject,
-- 
1.8.1.2

[Qemu-devel] [RFC PATCH v2 12/12] mc: activate and use MC if requested

2014-02-18 Thread mrhines

From: Michael R. Hines mrhi...@us.ibm.com

Once the initial migration has completed, we kickoff the migration_thread,
which never dies. Additionally, we register load/save functions for MC
which allow us to inform the destination that we are requesting a
micro-checkpointing session without needing to add additional command-line
switches on the destination side.

Signed-off-by: Michael R. Hines mrhi...@us.ibm.com
---
 include/migration/migration.h |  2 +-
 include/migration/qemu-file.h |  1 +
 migration.c   | 33 -
 vl.c  |  2 ++
 4 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index f18ff5e..1695e9e 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -46,7 +46,7 @@ struct MigrationState
 int64_t bandwidth_limit;
 size_t bytes_xfer;
 size_t xfer_limit;
-QemuThread thread;
+QemuThread *thread;
 QEMUBH *cleanup_bh;
 QEMUFile *file;
 
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index c50de0d..89e7c19 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -148,6 +148,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps 
*ops);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd, const char *mode);
+QEMUFile *qemu_fopen_mc(void *opaque, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
diff --git a/migration.c b/migration.c
index 0ccbeaa..1a6fdbd 100644
--- a/migration.c
+++ b/migration.c
@@ -93,6 +93,9 @@ static void process_incoming_migration_co(void *opaque)
 int ret;
 
 ret = qemu_loadvm_state(f);
+if (ret = 0) {
+mc_process_incoming_checkpoints_if_requested(f);
+}
 qemu_fclose(f);
 free_xbzrle_decoded_buf();
 if (ret  0) {
@@ -313,14 +316,17 @@ static void migrate_fd_cleanup(void *opaque)
 {
 MigrationState *s = opaque;
 
-qemu_bh_delete(s-cleanup_bh);
-s-cleanup_bh = NULL;
+if(s-cleanup_bh) {
+qemu_bh_delete(s-cleanup_bh);
+s-cleanup_bh = NULL;
+}
 
 if (s-file) {
 DPRINTF(closing file\n);
 qemu_mutex_unlock_iothread();
-qemu_thread_join(s-thread);
+qemu_thread_join(s-thread);
 qemu_mutex_lock_iothread();
+g_free(s-thread);
 
 qemu_fclose(s-file);
 s-file = NULL;
@@ -695,11 +701,27 @@ static void *migration_thread(void *opaque)
 s-downtime = end_time - start_time;
 runstate_set(RUN_STATE_POSTMIGRATE);
 } else {
+if(migrate_use_mc()) {
+qemu_fflush(s-file);
+if (migrate_use_mc_net()) {
+if (mc_enable_buffering()  0 ||
+mc_start_buffer()  0) {
+migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
+}
+}
+}
+
 if (old_vm_running) {
 vm_start();
 }
 }
-qemu_bh_schedule(s-cleanup_bh);
+
+if (migrate_use_mc()  s-state != MIG_STATE_ERROR) {
+mc_init_checkpointer(s);
+} else {
+qemu_bh_schedule(s-cleanup_bh);
+}
+
 qemu_mutex_unlock_iothread();
 
 return NULL;
@@ -720,6 +742,7 @@ void migrate_fd_connect(MigrationState *s)
 /* Notify before starting migration thread */
 notifier_list_notify(migration_state_notifiers, s);
 
-qemu_thread_create(s-thread, migration_thread, s,
+s-thread = g_malloc0(sizeof(*s-thread));
+qemu_thread_create(s-thread, migration_thread, s,
QEMU_THREAD_JOINABLE);
 }
diff --git a/vl.c b/vl.c
index 2fb5b1f..093eb20 100644
--- a/vl.c
+++ b/vl.c
@@ -4145,6 +4145,8 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
 register_savevm_live(NULL, ram, 0, 4, savevm_ram_handlers, NULL);
+register_savevm(NULL, mc, -1, MC_VERSION, mc_info_save, 
+mc_info_load, NULL); 
 
 if (nb_numa_nodes  0) {
 int i;
-- 
1.8.1.2

Re: [Qemu-devel] [PATCH 2/3] qtest: make QEMU our direct child process

2014-02-18 Thread Stefan Hajnoczi

On Mon, Feb 17, 2014 at 05:44:55PM +0100, Markus Armbruster wrote:
 Stefan Hajnoczi stefa...@redhat.com writes:
 
  qtest_init() cannot use exec*p() to launch QEMU since the exec*p()
  functions take an argument array while qtest_init() takes char
  *extra_args.  Therefore we execute /bin/sh -c command-line and let the
  shell parse the argument string.
 
  This left /bin/sh as our child process and our child's child was QEMU.
  We still want QEMU's pid so the -pidfile option was used to let QEMU
  report its pid.
 
  The pidfile needs to be unlinked when the test case exits or fails.  In
  other words, the pidfile creates a new problem for us!
 
  Simplify all this using the shell 'exec' command.  It allows us to
  replace the /bin/sh process with QEMU.  Then we no longer need to use
  -pidfile because we already know our fork child's pid.
 
  Note: Yes, it seems silly to exec /bin/sh when we could just exec QEMU
  directly.  But remember qtest_init() takes a single char *extra_args
  command-line fragment instead of a real argv[] array, so we need
  /bin/sh's argument parsing behavior.
 
 Sounds like a design mistake to me.

I wouldn't call char *extra_args a mistake because strings are still
simpler to manipulate in C than char *argv[] arrays.  So we write less
code in test cases and libqtest.c at the expense of a roundabout way to
spawn the process.

Stefan

Re: [Qemu-devel] [PATCH 3/3] qtest: kill QEMU process on g_assert() failure

2014-02-18 Thread Stefan Hajnoczi

On Mon, Feb 17, 2014 at 06:00:03PM +0100, Markus Armbruster wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 
  Il 17/02/2014 16:44, Stefan Hajnoczi ha scritto:
   }
 
  +static void sigabrt_handler(int signo)
  +{
  +qtest_end();
  +}
  +
 
  void qtest_quit(QTestState *s)
  {
  int status;
 
  if (s-qemu_pid != -1) {
  kill(s-qemu_pid, SIGTERM);
  waitpid(s-qemu_pid, status, 0);
  }
 
  close(s-fd);
  close(s-qmp_fd);
  g_string_free(s-rx, true);
  g_free(s);
  }
 
  Not async-signal safe.  You need to ignore the g_string_free and
  g_free (perhaps even the closes) if calling from the sigabrt_handler.
 
 kill(), waitpid() and close() are all async-signal-safe.
 
 SIGABRT is normally synchronous enough: it's sent by abort().  But of
 course, nothing stops the user from kill -ABRT.  Or GLib from calling
 abort() in some place where an attempt to reenter it crashes  burns.
 Not sure I'd care, but I'm pretty sure I don't care for freeing stuff on
 exit :)

Yes, SIGABRT is synchronous for all purposes.  So the only danger is
that g_string_free() or g_free() could fail while we're in
g_assert(false).  But they don't, which makes sense because they are
totally unrelated to g_assert() and therefore can handle re-entrancy.

In practice there is no issue and I've tested assertion failure with
glib 1.2.10.

Stefan

Re: [Qemu-devel] [PATCH 3/3] qtest: kill QEMU process on g_assert() failure

2014-02-18 Thread Stefan Hajnoczi

On Mon, Feb 17, 2014 at 05:49:31PM +0100, Markus Armbruster wrote:
 Stefan Hajnoczi stefa...@redhat.com writes:
  +/* Catch SIGABRT to clean up on g_assert() failure */
  +sigact = (struct sigaction){
  +.sa_handler = sigabrt_handler,
  +.sa_flags = SA_RESETHAND,
  +};
 
 Assumes zero-initialization has the same effect as
 sigemptyset(sigact.sa_mask).  Quoting POSIX:
 
 The implementation of the sigemptyset() (or sigfillset()) function
 could quite trivially clear (or set) all the bits in the signal set.
 Alternatively, it would be reasonable to initialize part of the
 structure, such as a version field, to permit binary-compatibility
 between releases where the size of the set varies.  For such
 reasons, either sigemptyset() or sigfillset() must be called prior
 to any other use of the signal set, even if such use is read-only
 (for example, as an argument to sigpending()).
 
 Looks like you better sigemptyset() here, for maximum portability.

Okay, will do that.

  +sigaction(SIGABRT, sigact, s-sigact_old);
   
   socket_path = g_strdup_printf(/tmp/qtest-%d.sock, getpid());
   qmp_socket_path = g_strdup_printf(/tmp/qtest-%d.qmp, getpid());
  @@ -150,13 +167,19 @@ void qtest_quit(QTestState *s)
   {
   int status;
   
  +sigaction(SIGABRT, s-sigact_old, NULL);
  +
 
 Can this ever restore the action to anything but the default action?

This code is in a library so I don't want to make assumptions about
the callers.  Keeping sigact_old around is literally 1 line of code so I
think it's worth it.

   if (s-qemu_pid != -1) {
   kill(s-qemu_pid, SIGTERM);
   waitpid(s-qemu_pid, status, 0);
   }
   
  -close(s-fd);
  -close(s-qmp_fd);
  +if (s-fd != -1) {
  +close(s-fd);
  +}
  +if (s-qmp_fd != -1) {
  +close(s-qmp_fd);
  +}
 
 I generally don't bother to avoid close(-1).

When I drive on the highway I stay on the lanes but I guess I could just
slide along the side barriers :).  It's a style issue but close(-1)
annoys me in strace so I try to avoid doing it.

Re: [Qemu-devel] [PATCH v3 0/4] X86/KVM: enable Intel MPX for KVM

2014-02-18 Thread Paolo Bonzini


Il 22/01/2014 13:03, Paolo Bonzini ha scritto:

Il 22/01/2014 06:29, Liu, Jinsong ha scritto:

These patches are version 3 to enalbe Intel MPX for KVM.

Version 1:
  * Add some Intel MPX definiation
  * Fix a cpuid(0x0d, 0) exposing bug, dynamic per XCR0 features
enable/disable
  * vmx and msr handle for MPX support at KVM
  * enalbe MPX feature for guest

Version 2:
  * remove generic MPX definiation, Qiaowei's patch has add the
definiation at kernel side
  * add MSR_IA32_BNDCFGS to msrs_to_save

Version 3:
  * rebase on latest kernel, which include Qiaowei's MPX common
definiation pulled from HPA's tree


I am afraid there is still some work to do on these patches, so they
need to be delayed to 3.15.

Patch 1:
this seems mostly separate from the rest of the MPX work.  I
commented on the missing ULL suffix, but I would also like to
understand why you put this patch in this series.

Patch 2:
As remarked in the reply to this patch:
- the vmx_disable_intercept_for_msr has to be unconditional
- you need a new kvm_x86_ops member mpx_supported, to disable
MPX whenever the two VMX controls are not available.

Patch 3:
this patch needs to be rebased.  Apart from that it is fine,
but please move the VMX bits together with patch 2, and the
other bits together with patch 4.

Patch 4:
this patch needs to be rebased and to use the new mpx_supported
member

If you also want to look at nested VMX support for MPX, that would be
nice.  It should not be hard.  Otherwise we can take care of that later.

Thanks for your work,

Paolo


Are you going to send v4?

Paolo

Re: [Qemu-devel] [RFC PATCH v2 00/12] mc: fault tolerante through micro-checkpointing

2014-02-18 Thread Li Guang


Hi, Michael

this patch-set will break normal build(without --enable-mc):

migration.c: In function ‘migrate_rdma_pin_all’:
migration.c:564: error: ‘MIGRATION_CAPABILITY_X_RDMA_PIN_ALL’ undeclared 
(first use in this function)

migration.c:564: error: for each function it appears in.)

Thanks!
Li Guang

mrhi...@linux.vnet.ibm.com wrote:

From: Michael R. Hinesmrhi...@us.ibm.com

Changes since v1:

1. Re-based against Juan's improved migration_bitmap performance changes
2. Overhauled RDMA support to prepare for better usage of RDMA in
other parts of the QEMU code base (such as storage).
3. Fix for netlink issues that failed to cleanup the network buffer
device for development testing.

Michael R. Hines (12):
   mc: add documentation for micro-checkpointing
   mc: timestamp migration_bitmap and KVM logdirty usage
   mc: introduce a 'checkpointing' status check into the VCPU states
   mc: support custom page loading and copying
   rdma: accelerated memcpy() support and better external RDMA user
 interfaces
   mc: introduce state machine changes for MC
   mc: introduce additional QMP statistics for micro-checkpointing
   mc: core logic
   mc: configure and makefile support
   mc: expose tunable parameter for checkpointing frequency
   mc: introduce new capabilities to control micro-checkpointing
   mc: activate and use MC if requested

  Makefile.objs |1 +
  arch_init.c   |   72 +-
  configure |   45 +
  cpus.c|9 +-
  docs/mc.txt   |  222 
  hmp-commands.hx   |   16 +-
  hmp.c |   23 +
  hmp.h |1 +
  include/migration/migration.h |   70 +-
  include/migration/qemu-file.h |   55 +-
  migration-checkpoint.c| 1565 +
  migration-rdma.c  | 2605 +++--
  migration.c   |  156 ++-
  qapi-schema.json  |   86 +-
  qemu-file.c   |   80 +-
  qmp-commands.hx   |   23 +
  vl.c  |9 +
  17 files changed, 4097 insertions(+), 941 deletions(-)
  create mode 100644 docs/mc.txt
  create mode 100644 migration-checkpoint.c

Re: [Qemu-devel] [PATCH v3 0/4] X86/KVM: enable Intel MPX for KVM

2014-02-18 Thread Liu, Jinsong

Paolo Bonzini wrote:
 Il 22/01/2014 13:03, Paolo Bonzini ha scritto:
 Il 22/01/2014 06:29, Liu, Jinsong ha scritto:
 These patches are version 3 to enalbe Intel MPX for KVM.
 
 Version 1:
   * Add some Intel MPX definiation
   * Fix a cpuid(0x0d, 0) exposing bug, dynamic per XCR0 features
 enable/disable 
   * vmx and msr handle for MPX support at KVM
   * enalbe MPX feature for guest
 
 Version 2:
   * remove generic MPX definiation, Qiaowei's patch has add the
 definiation at kernel side 
   * add MSR_IA32_BNDCFGS to msrs_to_save
 
 Version 3:
   * rebase on latest kernel, which include Qiaowei's MPX common
 definiation pulled from HPA's tree
 
 I am afraid there is still some work to do on these patches, so they
 need to be delayed to 3.15. 
 
 Patch 1:
 this seems mostly separate from the rest of the MPX work.  I
 commented on the missing ULL suffix, but I would also like to
 understand why you put this patch in this series.
 
 Patch 2:
 As remarked in the reply to this patch:
 - the vmx_disable_intercept_for_msr has to be unconditional
 - you need a new kvm_x86_ops member mpx_supported, to disable
 MPX whenever the two VMX controls are not available.
 
 Patch 3:
 this patch needs to be rebased.  Apart from that it is fine,
 but please move the VMX bits together with patch 2, and the
 other bits together with patch 4.
 
 Patch 4:
 this patch needs to be rebased and to use the new mpx_supported 
 member 
 
 If you also want to look at nested VMX support for MPX, that would be
 nice.  It should not be hard.  Otherwise we can take care of that
 later. 
 
 Thanks for your work,
 
 Paolo
 
 Are you going to send v4?
 
 Paolo

Yes, I just return from long Chinese Spring Festival, I will send V4 later.

Thanks,
Jinsong

Re: [Qemu-devel] [PATCH v2] RFC: Add blockdev-del QMP command

2014-02-18 Thread Markus Armbruster

Kevin Wolf kw...@redhat.com writes:

 Am 12.02.2014 um 18:36 hat Ian Main geschrieben:
 This is the sister command to blockdev-add.  In Fam's example he uses
 the drive_del HMP command to clean up but it would be much nicer to
 have a way to do this via QMP.
 
 Signed-off-by: Ian Main im...@redhat.com

 diff --git a/qapi-schema.json b/qapi-schema.json
 index d22651c..01186cd 100644
 --- a/qapi-schema.json
 +++ b/qapi-schema.json
 @@ -4469,3 +4469,14 @@
  # Since: 1.7
  ##
  { 'command': 'blockdev-add', 'data': { 'options': 'BlockdevOptions' } }
 +
 +##
 +# @blockdev-del:
 +#
 +# Delete a block device.
 +#
 +# @device: Identifier for the block device to be deleted.
 +#
 +# Since: 2.0
 +##
 +{ 'command': 'blockdev-del', 'data': { 'device': 'str' } }

 I believe the full documentation should go here as well, not just in
 qmp-commands.hx.

Yes.

 diff --git a/qmp-commands.hx b/qmp-commands.hx
 index c3ee46a..f08045d 100644
 --- a/qmp-commands.hx
 +++ b/qmp-commands.hx
 @@ -3442,6 +3442,36 @@ Example (2):
  EQMP
  
  {
 +.name   = blockdev-del,
 +.args_type  = device:s,
 +.mhandler.cmd_new = qmp_marshal_input_blockdev_del,
 +},
 +
 +SQMP
 +blockdev-del
 +
 +
 +Remove host block device.  The result is that guest generated IO is no
 +longer submitted against the host device underlying the disk.  Once a
 +drive has been deleted, the QEMU Block layer returns -EIO which results
 +in IO errors in the guest for applications that are reading/writing to
 +the device.  These errors are always reported to the guest, regardless
 +of the drive's error actions (drive options rerror, werror).

 I think we wanted to have different semantics for blockdev-del.
 Specifically, it is a backend command that should have no effect on
 users of that backend.

 Let me paste and comment on some notes I made in a previous blockdev
 discussion:

 * Make sure that an explicit blockdev-del is needed to remove the
   BDS; it shouldn't happen automagically just because the guest
   device was unplugged
   [done]

 * By default, return an error for blockdev-del if reference count  1

Yes.

Deleting a block device that is not in use is a perfectly innocent
operation.  Deleting one that is in use isn't; it can corrupt data.
Innocent and dangerous operations should be separate.  A force option is
an acceptable separator.  Having a separate command would be another.

   [ The assumption is here that one reference is held by the
 monitor/external user, which isn't true today to my knowledge ]

Here's my working hypothesis on this matter:

blockdev_init() initializes the reference count to 1.  This is for the
reference it puts into the list of drives.  It also returns a weak
reference.

We pair blockdev_init() with a drive_put_ref().  drive_put_ref() knows
it must be paired with blockdev_init() when the reference count is 1.
Then it deletes from drives in addition to decrementing the count.

Actually, nothing else ever uses the reference count since commit
fa510eb switch block jobs over to the BDS reference count.  Worth a
cleanup.

   - But have a force option that closes the image file, even if it
 breaks the remaining users (e.g. uncooperative guest that doesn't
 release its PCI device)
 [ Here we need working refcounting including the external user,
   because otherwise we don't free the (closed) BDS even when the
   guest device is unplugged. It is an open question whether and
   how BDSes without an external reference are shown in the
   monitor. ]

Despite its name, the purpose of drive_del is not really deleting a
drive, it's revoking access to the host resources backing a drive.

drive_del doesn't delete the drive, it merely detaches the
BlockDriverState from its host resources, and schedules it for automatic
deletion.  That way, references beyond drive_del's reach remain valid.

Special case: if it's not in use by a front end, delete it right away.
Assumes that references beyond drive_del's reach exist only while it is
in use by a frone end.

I think there are basic operations struggling to get out of this mess:

* Create block device (and put it into drives)

* Delete block device (and remove it from drives)

  Fails if the block device is in use

* Revoke access to host resources

* Start use of block device

* Stop use of block device


 * Prevent mixing blockdev-add with drive_del and vice versa

   - Ideally drive_add BDSes are exactly those with a DriveInfo
 [ not true today, but DriveInfo.enable_auto_del can be used to
   distinguish them ]

 So I believe we still have some design work to do before we can actually
 implement this. I would prefer not to merge this for 2.0.

I'm afraid you're right.

Re: [Qemu-devel] [PATCH v3] Fix QEMU build on OpenBSD on x86 archs

2014-02-18 Thread Peter Maydell

On 18 February 2014 00:36, Brad Smith b...@comstyle.com wrote:
 Just one issue... just curious why my last name is missing
 from the commit?

Ah, we've run into a bug in patchwork. I downloaded your patch:
 http://patchwork.ozlabs.org/patch/299656/

but patchwork reconstructs patch mboxes from its database
rather than just returning the original email, and it has a bug
where it will always remember the name used by the very
first email it ever saw from you. I guess it has a patch in
the system way back from you with a misconfigured fullname.
Sorry about that. I know they've discussed this issue on the
patchwork mailing list but they obviously haven't implemented
a fix yet.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Peter Maydell

On 18 February 2014 01:02, Andreas Färber afaer...@suse.de wrote:
 Shouldn't 1/3 and 3/3 be included in 1.7.1 as guest-triggerable bugs?
 There's no qemu-sta...@nongnu.org in the commit message.

That's up to whoever maintains stable, I guess. I have no objection.

thanks
-- PMM

Re: [Qemu-devel] Cortex-M3: reading NVIC registers causes segfaults

2014-02-18 Thread Peter Maydell

On 18 February 2014 01:14, Andreas Färber afaer...@suse.de wrote:
 While first_cpu may help Andreas in his local copy for STM32, that
 assumption is not okay in general. The Vybrid VF6 has both a GIC and an
 NVIC, so our NVIC code should not make assumptions which CPU it can
 access.

Do you mean it has an M core CPU with an NVIC plus some other
A-class CPU with a GIC. Certainly in a system with heterogenous
CPUs this change won't work, but we don't support that at all
currently and there's a lot of work between here and there.

 I assume we would shield both using CPU address spaces

The GIC needs to know which CPU it's being accessed by.
I guess in theory you could have it export N distributor memory
regions which were identical apart from making I assume I'm
being accessed by CPU X assumptions, but that seems like a
bit of a kludge. Ideally we should have some way for the target
of a memory transaction to get at information about the source
(ie all the out of band info that hardware sends like master ID,
privilege level, secure vs nonsecure, etc).

 but I
 wonder if either of you two has already thought about how those will
 interact for gdbstub?

I'd need to look at the protocol to figure out if gdbstub does
memory accesses in the context of a particular thread (CPU
for us) or not. If not, that could be awkward.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 2/3] qtest: make QEMU our direct child process

2014-02-18 Thread Markus Armbruster

Stefan Hajnoczi stefa...@gmail.com writes:

 On Mon, Feb 17, 2014 at 05:44:55PM +0100, Markus Armbruster wrote:
 Stefan Hajnoczi stefa...@redhat.com writes:
 
  qtest_init() cannot use exec*p() to launch QEMU since the exec*p()
  functions take an argument array while qtest_init() takes char
  *extra_args.  Therefore we execute /bin/sh -c command-line and let the
  shell parse the argument string.
 
  This left /bin/sh as our child process and our child's child was QEMU.
  We still want QEMU's pid so the -pidfile option was used to let QEMU
  report its pid.
 
  The pidfile needs to be unlinked when the test case exits or fails.  In
  other words, the pidfile creates a new problem for us!
 
  Simplify all this using the shell 'exec' command.  It allows us to
  replace the /bin/sh process with QEMU.  Then we no longer need to use
  -pidfile because we already know our fork child's pid.
 
  Note: Yes, it seems silly to exec /bin/sh when we could just exec QEMU
  directly.  But remember qtest_init() takes a single char *extra_args
  command-line fragment instead of a real argv[] array, so we need
  /bin/sh's argument parsing behavior.
 
 Sounds like a design mistake to me.

 I wouldn't call char *extra_args a mistake because strings are still
 simpler to manipulate in C than char *argv[] arrays.  So we write less
 code in test cases and libqtest.c at the expense of a roundabout way to
 spawn the process.

Building command arguments in a string is deceptively simple.  The
deception collapses once you add funny characters that make the shell do
funny and unexpected things.

Our extra_args string manipulations are typically of the form format a
bunch of options and option arguments into a string.  I suspect using a
function to collect a bunch of options and option arguments into an
array would be about the same amount of code in most cases.

Just sayin'; I'm not objecting to your patch.

Re: [Qemu-devel] [PATCH 3/3] qtest: kill QEMU process on g_assert() failure

2014-02-18 Thread Markus Armbruster

Stefan Hajnoczi stefa...@gmail.com writes:

 On Mon, Feb 17, 2014 at 05:49:31PM +0100, Markus Armbruster wrote:
 Stefan Hajnoczi stefa...@redhat.com writes:
  +/* Catch SIGABRT to clean up on g_assert() failure */
  +sigact = (struct sigaction){
  +.sa_handler = sigabrt_handler,
  +.sa_flags = SA_RESETHAND,
  +};
 
 Assumes zero-initialization has the same effect as
 sigemptyset(sigact.sa_mask).  Quoting POSIX:
 
 The implementation of the sigemptyset() (or sigfillset()) function
 could quite trivially clear (or set) all the bits in the signal set.
 Alternatively, it would be reasonable to initialize part of the
 structure, such as a version field, to permit binary-compatibility
 between releases where the size of the set varies.  For such
 reasons, either sigemptyset() or sigfillset() must be called prior
 to any other use of the signal set, even if such use is read-only
 (for example, as an argument to sigpending()).
 
 Looks like you better sigemptyset() here, for maximum portability.

 Okay, will do that.

  +sigaction(SIGABRT, sigact, s-sigact_old);
   
   socket_path = g_strdup_printf(/tmp/qtest-%d.sock, getpid());
   qmp_socket_path = g_strdup_printf(/tmp/qtest-%d.qmp, getpid());
  @@ -150,13 +167,19 @@ void qtest_quit(QTestState *s)
   {
   int status;
   
  +sigaction(SIGABRT, s-sigact_old, NULL);
  +
 
 Can this ever restore the action to anything but the default action?

 This code is in a library so I don't want to make assumptions about
 the callers.  Keeping sigact_old around is literally 1 line of code so I
 think it's worth it.

   if (s-qemu_pid != -1) {
   kill(s-qemu_pid, SIGTERM);
   waitpid(s-qemu_pid, status, 0);
   }
   
  -close(s-fd);
  -close(s-qmp_fd);
  +if (s-fd != -1) {
  +close(s-fd);
  +}
  +if (s-qmp_fd != -1) {
  +close(s-qmp_fd);
  +}
 
 I generally don't bother to avoid close(-1).

 When I drive on the highway I stay on the lanes but I guess I could just
 slide along the side barriers :).  It's a style issue but close(-1)
 annoys me in strace so I try to avoid doing it.

For me, it's in the same category as free(NULL).

Re: [Qemu-devel] [PATCH] virtio-net: only output the vlan table when VIRTIO_NET_F_CTRL_VLAN is negotiated

2014-02-18 Thread Michael S. Tsirkin

On Mon, Feb 17, 2014 at 11:58:11AM -0500, Vlad Yasevich wrote:
 On 02/17/2014 11:56 AM, Eric Blake wrote:
  On 02/17/2014 09:52 AM, Eric Blake wrote:
  On 02/16/2014 07:27 PM, Amos Kong wrote:
  Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
  filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated.
 
  We should also not send the vlan table to management, this patch makes
  the vlan-talbe optional.
 
  s/talbe/table/
 
  
  @@ -4053,7 +4053,7 @@
   'multicast-overflow': 'bool',
   'unicast-overflow':   'bool',
   'main-mac':   'str',
  -'vlan-table': ['int'],
  +'*vlan-table': ['int'],
 
  Indentation is now off.
 
   - main-mac: main macaddr string (json-string)
  -- vlan-table: a json-array of active vlan id
  +- vlan-table: a json-array of active vlan id (optoinal)
 
  s/optoinal/optional/
 
  Those fixes are trivial enough, so I'm okay if you correct them then add:
  
  Scratch that.  I revoke my R-b, on the following grounds:
  
  On 02/17/2014 08:27 AM, Vlad Yasevich wrote:
  On 02/16/2014 09:27 PM, Amos Kong wrote:
  Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
  filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't
  negotiated.
 
  We should also not send the vlan table to management, this patch makes
  the vlan-talbe optional.
^^
 table.
 
  One question I have from the API perspective is can we suddenly change
  something to be optional?  If there are any users of this ,
  wouldn't they have to change now to check the existence of this
  list?
  
  You are correct.  Since the parameter is an output field, older clients
  may be depending on it existing.  It is okay to generate an empty array,
  but you must not entirely omit the array unless you add further
  justification in your commit message that you are 100% positive that
  there are no clients of 1.6 that will be broken when the array no longer
  appears in the output.
  
  Can you rework the patch to just leave the array empty in the case where
  the bit does not indicate it is used?  Or do we need to add a new bool
  field to the output for new enough management to know whether to use the
  array?
  
 
 I think a completely empty array should be sufficient.
 
 -vlad

An empty array could mean either no vlans allowed or
all vlans allowed. Also it's nice if users can detect an old
buggy qemu.

Let's just add an rx state.

Re: [Qemu-devel] [PATCH 3/3] qtest: kill QEMU process on g_assert() failure

2014-02-18 Thread Markus Armbruster

Markus Armbruster arm...@redhat.com writes:

 Stefan Hajnoczi stefa...@redhat.com writes:

 The QEMU process stays running if the test case fails.  This patch fixes
 the leak by installing a SIGABRT signal handler which invokes
 qtest_end().

 In order to make that work for assertion failures during qtest_init(),
 we need to initialize QTestState fields including file descriptors and
 pids carefully.  qtest_quit() is then safe to call even during
 qtest_init().

 Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
 ---
  tests/libqtest.c | 29 ++---
  1 file changed, 26 insertions(+), 3 deletions(-)

 diff --git a/tests/libqtest.c b/tests/libqtest.c
 index 8b2b2d7..09a0481 100644
 --- a/tests/libqtest.c
 +++ b/tests/libqtest.c
 @@ -44,6 +44,7 @@ struct QTestState
  bool irq_level[MAX_IRQ];
  GString *rx;
  pid_t qemu_pid;  /* our child QEMU process */
 +struct sigaction sigact_old; /* restored on exit */
  };
  
  #define g_assert_no_errno(ret) do { \
 @@ -88,6 +89,11 @@ static int socket_accept(int sock)
  return ret;
  }
  
 +static void sigabrt_handler(int signo)
 +{
 +qtest_end();

Don't you have to re-raise SIGABRT here, to actually terminate the
process?

 +}
 +
  QTestState *qtest_init(const char *extra_args)
  {
  QTestState *s;
[...]

Re: [Qemu-devel] [PATCH 3/3] qtest: kill QEMU process on g_assert() failure

2014-02-18 Thread Markus Armbruster

Stefan Hajnoczi stefa...@gmail.com writes:

 On Mon, Feb 17, 2014 at 06:00:03PM +0100, Markus Armbruster wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 
  Il 17/02/2014 16:44, Stefan Hajnoczi ha scritto:
   }
 
  +static void sigabrt_handler(int signo)
  +{
  +qtest_end();
  +}
  +
 
  void qtest_quit(QTestState *s)
  {
  int status;
 
  if (s-qemu_pid != -1) {
  kill(s-qemu_pid, SIGTERM);
  waitpid(s-qemu_pid, status, 0);
  }
 
  close(s-fd);
  close(s-qmp_fd);
  g_string_free(s-rx, true);
  g_free(s);
  }
 
  Not async-signal safe.  You need to ignore the g_string_free and
  g_free (perhaps even the closes) if calling from the sigabrt_handler.
 
 kill(), waitpid() and close() are all async-signal-safe.
 
 SIGABRT is normally synchronous enough: it's sent by abort().  But of
 course, nothing stops the user from kill -ABRT.  Or GLib from calling
 abort() in some place where an attempt to reenter it crashes  burns.
 Not sure I'd care, but I'm pretty sure I don't care for freeing stuff on
 exit :)

 Yes, SIGABRT is synchronous for all purposes.  So the only danger is
 that g_string_free() or g_free() could fail while we're in
 g_assert(false).  But they don't, which makes sense because they are
 totally unrelated to g_assert() and therefore can handle re-entrancy.

The (theoretical!) problem is when it aborts in the bowels of g_*free(),
and your SIGABRT handler reenters g_*free().

 In practice there is no issue and I've tested assertion failure with
 glib 1.2.10.

Worst that can happen is we crash on the way from abort() to process
termination.  Tolerable.

Still, avoiding unnecessary cleanup on that path seems prudent to me.
If you agree, factor out the kill()/waitpid(), and call only that from
the signal handler.

Re: [Qemu-devel] [PATCH 3/3] qtest: kill QEMU process on g_assert() failure

2014-02-18 Thread Paolo Bonzini


Il 18/02/2014 10:05, Stefan Hajnoczi ha scritto:

 SIGABRT is normally synchronous enough: it's sent by abort().  But of
 course, nothing stops the user from kill -ABRT.  Or GLib from calling
 abort() in some place where an attempt to reenter it crashes  burns.
 Not sure I'd care, but I'm pretty sure I don't care for freeing stuff on
 exit :)

Yes, SIGABRT is synchronous for all purposes.  So the only danger is
that g_string_free() or g_free() could fail while we're in
g_assert(false).  But they don't, which makes sense because they are
totally unrelated to g_assert() and therefore can handle re-entrancy.


If malloc aborts due to a double free or other similar problem, you may 
risk reentering it.


Paolo

[Qemu-devel] [PATCH v2 1/2] include/qemu/crc32c.h: Rename include guards to match filename

2014-02-18 Thread Will Newton

Signed-off-by: Will Newton will.new...@linaro.org
Reviewed-by: Peter Maydell peter.mayd...@linaro.org
---
 include/qemu/crc32c.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/qemu/crc32c.h b/include/qemu/crc32c.h
index 56d1c3b..dafb6a1 100644
--- a/include/qemu/crc32c.h
+++ b/include/qemu/crc32c.h
@@ -25,8 +25,8 @@
  *
  */
 
-#ifndef QEMU_CRC32_H
-#define QEMU_CRC32_H
+#ifndef QEMU_CRC32C_H
+#define QEMU_CRC32C_H
 
 #include qemu-common.h
 
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 0/2] target-arm: Add support for AArch32 ARMv8 CRC32 instructions

2014-02-18 Thread Will Newton

This series adds support for the AArch32 CRC32 instructions added in
ARMv8.

Will Newton (2):
  include/qemu/crc32c.h: Rename include guards to match filename
  target-arm: Add support for AArch32 ARMv8 CRC32 instructions

 configure  |  2 +-
 include/qemu/crc32c.h  |  4 ++--
 target-arm/helper.c| 39 +++
 target-arm/helper.h|  3 +++
 target-arm/translate.c | 48 
 5 files changed, 93 insertions(+), 3 deletions(-)

-- 
1.8.1.4

[Qemu-devel] [PATCH v2 2/2] target-arm: Add support for AArch32 ARMv8 CRC32 instructions

2014-02-18 Thread Will Newton

Add support for AArch32 CRC32 and CRC32C instructions added in ARMv8.
The CRC32-C implementation used is the built-in qemu implementation
and The CRC-32 implementation is from zlib. This requires adding zlib
to LIBS to ensure it is linked for the linux-user binary.

Signed-off-by: Will Newton will.new...@linaro.org
---
 configure  |  2 +-
 target-arm/helper.c| 39 +++
 target-arm/helper.h|  3 +++
 target-arm/translate.c | 48 
 4 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 4648117..822842c 100755
--- a/configure
+++ b/configure
@@ -1550,7 +1550,7 @@ EOF
 Make sure to have the zlib libs and headers installed.
 fi
 fi
-libs_softmmu=$libs_softmmu -lz
+LIBS=$LIBS -lz
 
 ##
 # libseccomp check
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 5ae08c9..294cfaf 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -5,6 +5,8 @@
 #include sysemu/arch_init.h
 #include sysemu/sysemu.h
 #include qemu/bitops.h
+#include qemu/crc32c.h
+#include zlib.h /* For crc32 */
 
 #ifndef CONFIG_USER_ONLY
 static inline int get_phys_addr(CPUARMState *env, uint32_t address,
@@ -4468,3 +4470,40 @@ int arm_rmode_to_sf(int rmode)
 }
 return rmode;
 }
+
+static void crc_init_buffer(uint8_t *buf, uint32_t val, uint32_t bytes)
+{
+memset(buf, 0, 4);
+
+if (bytes == 1) {
+buf[0] = val  0xff;
+} else if (bytes == 2) {
+buf[0] = val  0xff;
+buf[1] = (val  8)  0xff;
+} else {
+buf[0] = val  0xff;
+buf[1] = (val  8)  0xff;
+buf[2] = (val  16)  0xff;
+buf[3] = (val  24)  0xff;
+}
+}
+
+uint32_t HELPER(crc32)(uint32_t acc, uint32_t val, uint32_t bytes)
+{
+uint8_t buf[4];
+
+crc_init_buffer(buf, val, bytes);
+
+/* zlib crc32 converts the accumulator and output to one's complement.  */
+return crc32(acc ^ 0x, buf, bytes) ^ 0x;
+}
+
+uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
+{
+uint8_t buf[4];
+
+crc_init_buffer(buf, val, bytes);
+
+/* Linux crc32c converts the output to one's complement.  */
+return crc32c(acc, buf, bytes) ^ 0x;
+}
diff --git a/target-arm/helper.h b/target-arm/helper.h
index 951e6ad..fb92e53 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -494,6 +494,9 @@ DEF_HELPER_3(neon_qzip32, void, env, i32, i32)
 DEF_HELPER_4(crypto_aese, void, env, i32, i32, i32)
 DEF_HELPER_4(crypto_aesmc, void, env, i32, i32, i32)
 
+DEF_HELPER_3(crc32, i32, i32, i32, i32)
+DEF_HELPER_3(crc32c, i32, i32, i32, i32)
+
 #ifdef TARGET_AARCH64
 #include helper-a64.h
 #endif
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 782aab8..9410b6a 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7541,6 +7541,32 @@ static void disas_arm_insn(CPUARMState * env, 
DisasContext *s)
 store_reg(s, 14, tmp2);
 gen_bx(s, tmp);
 break;
+case 0x4:
+{
+/* crc32/crc32c */
+uint32_t c = extract32(insn, 9, 1);
+
+/* size == 64 is UNPREDICTABLE but handle as UNDEFINED.  */
+if (op1 == 0x3) {
+goto illegal_op;
+}
+
+rn = (insn  16)  0xf;
+rd = (insn  12)  0xf;
+
+tmp = load_reg(s, rn);
+tmp2 = load_reg(s, rm);
+tmp3 = tcg_const_i32(1  op1);
+if (c) {
+gen_helper_crc32c(tmp, tmp, tmp2, tmp3);
+} else {
+gen_helper_crc32(tmp, tmp, tmp2, tmp3);
+}
+tcg_temp_free_i32(tmp2);
+tcg_temp_free_i32(tmp3);
+store_reg(s, rd, tmp);
+break;
+}
 case 0x5: /* saturating add/subtract */
 ARCH(5TE);
 rd = (insn  12)  0xf;
@@ -9125,6 +9151,28 @@ static int disas_thumb2_insn(CPUARMState *env, 
DisasContext *s, uint16_t insn_hw
 case 0x18: /* clz */
 gen_helper_clz(tmp, tmp);
 break;
+case 0x20:
+case 0x21:
+case 0x22:
+case 0x28:
+case 0x29:
+case 0x2a:
+{
+/* crc32/crc32c */
+uint32_t sz = op  0x3;
+uint32_t c = op  0x8;
+
+tmp2 = load_reg(s, rm);
+tmp3 = tcg_const_i32(1  sz);
+if (c) {
+gen_helper_crc32c(tmp, tmp, tmp2, tmp3);
+} else {
+gen_helper_crc32(tmp, tmp, tmp2, tmp3);
+}
+tcg_temp_free_i32(tmp2);
+tcg_temp_free_i32(tmp3);
+break;
+}
 default:

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Andreas Färber

Am 18.02.2014 10:46, schrieb Peter Maydell:
 On 18 February 2014 01:02, Andreas Färber afaer...@suse.de wrote:
 Shouldn't 1/3 and 3/3 be included in 1.7.1 as guest-triggerable bugs?
 There's no qemu-sta...@nongnu.org in the commit message.
 
 That's up to whoever maintains stable, I guess. I have no objection.

No, we've had that topic before: It's your job as submitter and
maintainer to flag that appropriately in the commit message, as per QEMU
Summit 2012.

Thanks,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 3/3] qtest: kill QEMU process on g_assert() failure

2014-02-18 Thread Daniel P. Berrange

On Tue, Feb 18, 2014 at 11:07:53AM +0100, Paolo Bonzini wrote:
 Il 18/02/2014 10:05, Stefan Hajnoczi ha scritto:
  SIGABRT is normally synchronous enough: it's sent by abort().  But of
  course, nothing stops the user from kill -ABRT.  Or GLib from calling
  abort() in some place where an attempt to reenter it crashes  burns.
  Not sure I'd care, but I'm pretty sure I don't care for freeing stuff on
  exit :)
 Yes, SIGABRT is synchronous for all purposes.  So the only danger is
 that g_string_free() or g_free() could fail while we're in
 g_assert(false).  But they don't, which makes sense because they are
 totally unrelated to g_assert() and therefore can handle re-entrancy.
 
 If malloc aborts due to a double free or other similar problem, you
 may risk reentering it.

If you register the custom SIGABRT handler with sigaction + SA_RESETHAND
then you'd avoid the re-entrancy risk, since a cascading SIGABRT would
get handled by the system default handler, which would immediately
terminate the process.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH v2, Ping] SMBIOS: Upgrade Type17 to v2.3, add Type2

2014-02-18 Thread Gerd Hoffmann

On Mo, 2014-02-17 at 15:33 -0500, Kevin O'Connor wrote:
 On Mon, Feb 17, 2014 at 11:09:48AM -0500, Gabriel L. Somlo wrote:
  On Fri, Feb 07, 2014 at 04:37:58PM +0100, Paolo Bonzini wrote:
  Il 06/02/2014 14:38, Gabriel L. Somlo ha scritto:
  On Wed, Feb 05, 2014 at 08:02:25PM -0500, Kevin O'Connor wrote:
   Thanks.  In general, though, it is preferred to make smbios changes at
   the QEMU layer.  Indeed, going forward, I'd like to see all the smbios
   stuff moved up to QEMU.
  
   Is there something in these two patches that can't be done in QEMU?
  
   But anyhow, right now QEMU seems to be making relatively minor tweaks
   to something that's firmly at home in SeaBIOS, which is why I sent my
   patches to the latter...
  
  Yeah, that's correct. There's really no particular hook in QEMU to do this.
  
  Would it be OK to apply this in SeaBIOS now, so that 1. it can be
  useful until whenever SMBIOS gets transferred/absorbed into QEMU,
  and 2. it won't fall through the cracks when that transition happens ?
 
 Unfortunately, if we change the smbios in SeaBIOS, it will show up on
 all machine types that use the new version of SeaBIOS.  We've had
 issues with this type of change before as various OSes react
 differently to the change.
 
 Gerd, what's your take on this change?

I think we can do this in qemu, without touching seabios at all.  The
currently used fw_cfg interface allows to pass both individual fields
and complete tables.  In practice the individual fields interface is
used for type0 and type1 table fields.  I think the only way to pass
complete tables is 'qemu -smbios file=pregenerated-blob-here'.

If seabios finds a table provided by qemu it used it, otherwise it
(possibly) generates its own.  So we can smoothly switch over to qemu,
table-by-table.  You can have qemu provide type2+type17 tables, and
leave everything else as-is.  And when doing it in qemu it is easy to do
it for new machine types only.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 3/3] qtest: kill QEMU process on g_assert() failure

2014-02-18 Thread Paolo Bonzini


Il 18/02/2014 11:17, Daniel P. Berrange ha scritto:

 Yes, SIGABRT is synchronous for all purposes.  So the only danger is
 that g_string_free() or g_free() could fail while we're in
 g_assert(false).  But they don't, which makes sense because they are
 totally unrelated to g_assert() and therefore can handle re-entrancy.


 If malloc aborts due to a double free or other similar problem, you
 may risk reentering it.

If you register the custom SIGABRT handler with sigaction + SA_RESETHAND
then you'd avoid the re-entrancy risk, since a cascading SIGABRT would
get handled by the system default handler, which would immediately
terminate the process.


I meant reentering malloc.

Paolo

Re: [Qemu-devel] [PATCH 3/3] qtest: kill QEMU process on g_assert() failure

2014-02-18 Thread Paolo Bonzini


Il 18/02/2014 11:05, Markus Armbruster ha scritto:

 Yes, SIGABRT is synchronous for all purposes.  So the only danger is
 that g_string_free() or g_free() could fail while we're in
 g_assert(false).  But they don't, which makes sense because they are
 totally unrelated to g_assert() and therefore can handle re-entrancy.

The (theoretical!) problem is when it aborts in the bowels of g_*free(),
and your SIGABRT handler reenters g_*free().


 In practice there is no issue and I've tested assertion failure with
 glib 1.2.10.

Worst that can happen is we crash on the way from abort() to process
termination.  Tolerable.


What about recursive locking of a non-recursive mutex?

Paolo

Re: [Qemu-devel] [RFC PATCH v2 02/12] mc: timestamp migration_bitmap and KVM logdirty usage

2014-02-18 Thread Dr. David Alan Gilbert

* mrhi...@linux.vnet.ibm.com (mrhi...@linux.vnet.ibm.com) wrote:
 From: Michael R. Hines mrhi...@us.ibm.com
 
 We also later export these statistics over QMP for better
 monitoring of micro-checkpointing as the workload changes.

snip

 @@ -548,9 +568,11 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
  /* XBZRLE overflow or normal page */
  if (bytes_sent == -1) {
  bytes_sent = save_block_hdr(f, block, offset, cont, 
 RAM_SAVE_FLAG_PAGE);
 -qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE);
 -bytes_sent += TARGET_PAGE_SIZE;
 -acct_info.norm_pages++;
 +if (ret != RAM_SAVE_CONTROL_DELAYED) {
 +qemu_put_buffer_async(f, p, TARGET_PAGE_SIZE);
 +bytes_sent += TARGET_PAGE_SIZE;
 +acct_info.norm_pages++;
 +}
  }

Is that last change intended for this patch; it doesn't look
timestamp related.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-18 Thread Michael S. Tsirkin

On Tue, Feb 18, 2014 at 02:38:40AM +, Zhanghaoyu (A) wrote:
 Hi, all
 
 The VM will get stuck for a while(about 6s for a VM with 20GB memory) when 
 attaching a pass-through PCI card to the non-pass-through VM for the first 
 time. 
 The reason is that the host will build the whole VT-d GPA-HPA DMAR 
 page-table, which needs a lot of time, and during this time, the 
 qemu_global_mutex
 lock is hold by the main-thread, if the vcpu thread IOCTL return, it will be 
 blocked to waiting main-thread to release the qemu_global_mutex lock,
 so the VM got stuck.
 The race between qemu-main-thread and vcpu-thread is shown as below,
 
   QEMU-main-threadvcpu-thread 
   
  | |
   qemu_mutex_lock_iothread 
 qemu_mutex_lock(qemu_global_mutex)
  | |
 +loop- -+   +loop+  
  
 ||   | |
 |  qemu_mutex_unlock_iothread| 
 qemu_mutex_unlock_iothread 
 ||   | |  
 
 |   poll |
 kvm_vcpu_ioctl(KVM_RUN) 
 ||   | |  
 
 | qemu_mutex_lock_iothread   | |
 ||   | | 
  
 --
 ||   |  
 qemu_mutex_lock_iothread
 |   kvm_device_pci_assign| |  
 
 ||   |   blocked to waiting 
 main-thread to release the qemu lock
 |  about 6 sec for 20GB memory   | |  
 
 ||   | |  

 ++   +-+  
 
 
 
 Any advises?
 
 Thanks,
 Zhang Haoyu

What if you detach and re-attach?
Is it fast then?
If yes this means the issue is COW breaking that occurs
with get_user_pages, not translation as such.
Try hugepages with prealloc - does it help?




-- 
MST

Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-18 Thread Paolo Bonzini


Il 18/02/2014 11:38, Michael S. Tsirkin ha scritto:

What if you detach and re-attach?
Is it fast then?
If yes this means the issue is COW breaking that occurs
with get_user_pages, not translation as such.
Try hugepages with prealloc - does it help?


I agree it's either COW breaking or (similarly) locking pages that the 
guest hasn't touched yet.


You can use prealloc or -rt mlock=on to avoid this problem.

Paolo

Re: [Qemu-devel] [PATCH 3/3] qtest: kill QEMU process on g_assert() failure

2014-02-18 Thread Markus Armbruster

Paolo Bonzini pbonz...@redhat.com writes:

 Il 18/02/2014 11:05, Markus Armbruster ha scritto:
  Yes, SIGABRT is synchronous for all purposes.  So the only danger is
  that g_string_free() or g_free() could fail while we're in
  g_assert(false).  But they don't, which makes sense because they are
  totally unrelated to g_assert() and therefore can handle re-entrancy.
 The (theoretical!) problem is when it aborts in the bowels of g_*free(),
 and your SIGABRT handler reenters g_*free().

  In practice there is no issue and I've tested assertion failure with
  glib 1.2.10.
 Worst that can happen is we crash on the way from abort() to process
 termination.  Tolerable.

 What about recursive locking of a non-recursive mutex?

You're right, deadlock is possible.  Strengthens the argument for:

 Still, avoiding unnecessary cleanup on that path seems prudent to me.
 If you agree, factor out the kill()/waitpid(), and call only that from
 the signal handler.

Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-18 Thread Michael S. Tsirkin

On Tue, Feb 18, 2014 at 11:42:19AM +0100, Paolo Bonzini wrote:
 Il 18/02/2014 11:38, Michael S. Tsirkin ha scritto:
 What if you detach and re-attach?
 Is it fast then?
 If yes this means the issue is COW breaking that occurs
 with get_user_pages, not translation as such.
 Try hugepages with prealloc - does it help?
 
 I agree it's either COW breaking or (similarly) locking pages that
 the guest hasn't touched yet.
 
 You can use prealloc or -rt mlock=on to avoid this problem.
 
 Paolo

Or the new shared flag - IIRC shared VMAs don't do COW either.

Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-18 Thread Zhanghaoyu (A)

 Hi, all
 
 The VM will get stuck for a while(about 6s for a VM with 20GB memory) when 
 attaching a pass-through PCI card to the non-pass-through VM for the first 
 time. 
 The reason is that the host will build the whole VT-d GPA-HPA DMAR 
 page-table, which needs a lot of time, and during this time, the 
 qemu_global_mutex lock is hold by the main-thread, if the vcpu thread IOCTL 
 return, it will be blocked to waiting main-thread to release the 
 qemu_global_mutex lock, so the VM got stuck.
 The race between qemu-main-thread and vcpu-thread is shown as below,
 
   QEMU-main-threadvcpu-thread

  | |
   qemu_mutex_lock_iothread 
 qemu_mutex_lock(qemu_global_mutex)
  | |
 +loop- -+   +loop+ 
   
 ||   | |
 |  qemu_mutex_unlock_iothread| 
 qemu_mutex_unlock_iothread 
 ||   | | 
  
 |   poll |
 kvm_vcpu_ioctl(KVM_RUN) 
 ||   | | 
  
 | qemu_mutex_lock_iothread   | |
 ||   | | 
  
 --
 ||   |  
 qemu_mutex_lock_iothread
 |   kvm_device_pci_assign| | 
  
 ||   |   blocked to waiting 
 main-thread to release the qemu lock
 |  about 6 sec for 20GB memory   | | 
  
 ||   | | 
 
 ++   +-+ 
  
 
 
 Any advises?
 
 Thanks,
 Zhang Haoyu

What if you detach and re-attach?
Is it fast then?
Yes, because the VT-d GPA-HPA DMAR page-table has been built, no need to 
re-build it.

If yes this means the issue is COW breaking that occurs with get_user_pages, 
not translation as such.
Try hugepages with prealloc - does it help?
Yes, a bit help gained, but it cannot resolve the problem completely, the stuck 
still happened.

Thanks,
Zhang Haoyu

Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-18 Thread Paolo Bonzini


Il 18/02/2014 11:51, Michael S. Tsirkin ha scritto:

 I agree it's either COW breaking or (similarly) locking pages that
 the guest hasn't touched yet.

 You can use prealloc or -rt mlock=on to avoid this problem.

 Paolo

Or the new shared flag - IIRC shared VMAs don't do COW either.


Only if the problem isn't locking and zeroing of untouched pages (also, 
it is not upstream is it?).


Can you make a profile with perf?

Paolo

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Peter Maydell

On 18 February 2014 10:13, Andreas Färber afaer...@suse.de wrote:
 Am 18.02.2014 10:46, schrieb Peter Maydell:
 On 18 February 2014 01:02, Andreas Färber afaer...@suse.de wrote:
 Shouldn't 1/3 and 3/3 be included in 1.7.1 as guest-triggerable bugs?
 There's no qemu-sta...@nongnu.org in the commit message.

 That's up to whoever maintains stable, I guess. I have no objection.

 No, we've had that topic before: It's your job as submitter and
 maintainer to flag that appropriately in the commit message, as per QEMU
 Summit 2012.

I don't think this workflow works. I have no idea what
stable's criteria are, and if you rely on people
adding a cc you're going to miss stuff.

thanks
-- PMM

Re: [Qemu-devel] [RFC 0/9] generate dynamic _CRS for motherboard resources

2014-02-18 Thread Igor Mammedov

On Mon, 17 Feb 2014 13:02:18 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Mon, Feb 17, 2014 at 11:33:23AM +0100, Igor Mammedov wrote:
  On Sun, 16 Feb 2014 17:53:45 +0200
  Michael S. Tsirkin m...@redhat.com wrote:
  
   On Fri, Feb 07, 2014 at 01:51:27PM +0100, Igor Mammedov wrote:
Since introduction of PCIHP, it became problematic to
punch hole in PCI0._CRS statically since PCI hotplug
region size became runtime changeable.
   
   What makes it runtime changeable?
  piix machine acpi-pci-hotplug-with-bridge-support=on effectively
  changes size of pcihp MMIO region
 
 It adds 4 bytes. Let's just reserve a reasonably sized region,
 like 512 bytes.
Reserve where and for how log it will be enough?
Eventually we will continue punching holes or increasing size of
this one big hole.

Q35 adds more to the problem with guest programmable pm_io_base,
we can't statically punch hole for it.

 
   
So replace static hole punching with dynamically consumed
resources in a child device on PCI0 bus. i.e generate
PNP0C02 device as a child of PCI0 bus at runtime and
consume GPE0, PCI/CPU hotplug IO resources in it instead
of punching holes in static PCI0._CRS.
   
   It seems that we are being too exact with
   IO resources here.
   Can't we roughly reserve 0xae00 to 0xafff
   and be done with it?
  that would be easiest way for this specific case if we could agree
  for ranges on PIIX/Q35 machines.
  
  But I also use it as excuse to introduce ASL like macros so that
  building dynamic SSDT would be easier i.e. replace template patching
  with single place where dynamic device is defined and its values
  are set in simple and straightforward manner.
 
 We don't need excuses really :)
 But the approach itself needs some work IMHO, in particular
 ASL like syntax by using macros and varargs is not something to strive
 for IMHO :)
 
 Why don't we just pass GArrays around?
 
 crs = build_alloc_crs();
 build_append_resource(crs, )

I don't like much macros myself, but compared to alternative
approach ^^^ (I've tried it) it's still harder to read/write
ASL constructs right.

I think that creating AML using ASL like flow and primitives
is better than building it using custom APIs, since it looks
like it was documented in spec.
Macros+vararg provide a necessary degree of flexibility to
create AML using ASL like flow and constructs.

 
 
 Or if you really want this, using array of GArray:
 
 build_append_crs(ssdt,
   {
   build_alloc_resource(...),
   build_alloc_resource(...),
   build_alloc_resource(...),
   build_alloc_resource(...),
   NULL
   })
that would work if there was a way to specify arbitrary statements
inside embedded array (like in the last hunk of [9/9]).

 
 
Tested with Windows XPsp3, Vista, Windows Server 2003, 2008, 2012r2.

PS:
Series adds several ASL like macros to simplify
code for dynamic generation of AML structures.

Igor Mammedov (9):
  Revert pc: Q35 DSDT: exclude CPU hotplug IO range from PCI bus
resources
  Revert pc: PIIX DSDT: exclude CPU/PCI hotplug  GPE0 IO range from
PCI bus resources
  Partial revert pc: ACPI: expose PRST IO range via _CRS
  acpi: replace opencoded opcodes with defines
  acpi: add PNP0C02 to PCI0 bus
  acpi: consume GPE0 IO resources in PNP0C02 device
  acpi: consume CPU hotplug IO resource in PNP0C02 device
  pcihp: expose PCI hotplug MMIO base/length as properties of piix4pm
  acpi: consume PCIHP IO resource in PNP0C02 device

 hw/acpi/pcihp.c   |   28 ++
 hw/acpi/piix4.c   |1 +
 hw/i386/acpi-build.c  |  177 
++--
 hw/i386/acpi-dsdt-cpu-hotplug.dsl |   11 ---
 hw/i386/acpi-dsdt-pci-crs.dsl |   15 +++-
 hw/i386/acpi-dsdt.dsl |   39 
 hw/i386/q35-acpi-dsdt.dsl |   16 
 include/hw/acpi/pcihp.h   |4 +
 8 files changed, 214 insertions(+), 77 deletions(-)
  
  
  -- 
  Regards,
Igor

Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-18 Thread Michael S. Tsirkin

On Tue, Feb 18, 2014 at 12:05:19PM +0100, Paolo Bonzini wrote:
 Il 18/02/2014 11:51, Michael S. Tsirkin ha scritto:
  I agree it's either COW breaking or (similarly) locking pages that
  the guest hasn't touched yet.
 
  You can use prealloc or -rt mlock=on to avoid this problem.
 
  Paolo
 Or the new shared flag - IIRC shared VMAs don't do COW either.
 
 Only if the problem isn't locking and zeroing of untouched pages
 (also, it is not upstream is it?).

No but it's a small patch - part of vhost user patchset.

 Can you make a profile with perf?
 
 Paolo

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Paolo Bonzini


Il 18/02/2014 12:09, Peter Maydell ha scritto:

 No, we've had that topic before: It's your job as submitter and
 maintainer to flag that appropriately in the commit message, as per QEMU
 Summit 2012.

I don't think this workflow works. I have no idea what
stable's criteria are, and if you rely on people
adding a cc you're going to miss stuff.


There isn't really a standard criterion.  It's up to each maintainer to 
be stricter or looser on what goes to stable.


Paolo

Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-18 Thread Zhanghaoyu (A)

 What if you detach and re-attach?
 Is it fast then?
 If yes this means the issue is COW breaking that occurs with 
 get_user_pages, not translation as such.
 Try hugepages with prealloc - does it help?

I agree it's either COW breaking or (similarly) locking pages that the guest 
hasn't touched yet.

You can use prealloc or -rt mlock=on to avoid this problem.

It gets better if using -rt mlock=on, but still cannot resolve the problem 
completely.
VT-d and EPT do not share the GPA-HPA page-table, still need to build VT-d 
GPA-HPA DMAR page-table,
Although the -rt mlock=on option guarantees that all of vm memory have been 
touched before attaching 
the pass-through device, the building is faster, but which still need some time.

Thanks,
Zhang Haoyu

Paolo

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Peter Maydell

On 18 February 2014 11:16, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 18/02/2014 12:09, Peter Maydell ha scritto:

  No, we've had that topic before: It's your job as submitter and
  maintainer to flag that appropriately in the commit message, as per QEMU
  Summit 2012.

 I don't think this workflow works. I have no idea what
 stable's criteria are, and if you rely on people
 adding a cc you're going to miss stuff.


 There isn't really a standard criterion.  It's up to each maintainer to be
 stricter or looser on what goes to stable.

My criteria for ARM in the past has typically been there's
a new release every three months, anything that got past
the release testing process for release N is sufficiently
non-critical it can just go into release N+1.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 2/2] target-arm: Add support for AArch32ARMv8 CRC32 instructionss

2014-02-18 Thread Alex Bennée


will.new...@linaro.org writes:

 Add support for AArch32 CRC32 and CRC32C instructions added in ARMv8.
 The CRC32-C implementation used is the built-in qemu implementation
 and The CRC-32 implementation is from zlib. This requires adding zlib
 to LIBS to ensure it is linked for the linux-user binary.

 Signed-off-by: Will Newton will.new...@linaro.org
 ---
  configure  |  2 +-
  target-arm/helper.c| 39 +++
  target-arm/helper.h|  3 +++
  target-arm/translate.c | 48 
  4 files changed, 91 insertions(+), 1 deletion(-)

 diff --git a/configure b/configure
 index 4648117..822842c 100755
 --- a/configure
 +++ b/configure
 @@ -1550,7 +1550,7 @@ EOF
  Make sure to have the zlib libs and headers installed.
  fi
  fi
 -libs_softmmu=$libs_softmmu -lz
 +LIBS=$LIBS -lz
  
  ##
  # libseccomp check
 diff --git a/target-arm/helper.c b/target-arm/helper.c
 index 5ae08c9..294cfaf 100644
 --- a/target-arm/helper.c
 +++ b/target-arm/helper.c
 @@ -5,6 +5,8 @@
  #include sysemu/arch_init.h
  #include sysemu/sysemu.h
  #include qemu/bitops.h
 +#include qemu/crc32c.h
 +#include zlib.h /* For crc32 */
  
  #ifndef CONFIG_USER_ONLY
  static inline int get_phys_addr(CPUARMState *env, uint32_t address,
 @@ -4468,3 +4470,40 @@ int arm_rmode_to_sf(int rmode)
  }
  return rmode;
  }
 +
 +static void crc_init_buffer(uint8_t *buf, uint32_t val, uint32_t bytes)
 +{
 +memset(buf, 0, 4);
 +
 +if (bytes == 1) {
 +buf[0] = val  0xff;
 +} else if (bytes == 2) {
 +buf[0] = val  0xff;
 +buf[1] = (val  8)  0xff;
 +} else {
 +buf[0] = val  0xff;
 +buf[1] = (val  8)  0xff;
 +buf[2] = (val  16)  0xff;
 +buf[3] = (val  24)  0xff;
 +}
 +}
 +
 +uint32_t HELPER(crc32)(uint32_t acc, uint32_t val, uint32_t bytes)
 +{
 +uint8_t buf[4];
 +
 +crc_init_buffer(buf, val, bytes);
 +
 +/* zlib crc32 converts the accumulator and output to one's complement.  
 */
 +return crc32(acc ^ 0x, buf, bytes) ^ 0x;
 +}
 +
 +uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
 +{
 +uint8_t buf[4];
 +
 +crc_init_buffer(buf, val, bytes);
 +
 +/* Linux crc32c converts the output to one's complement.  */
 +return crc32c(acc, buf, bytes) ^ 0x;
 +}
 diff --git a/target-arm/helper.h b/target-arm/helper.h
 index 951e6ad..fb92e53 100644
 --- a/target-arm/helper.h
 +++ b/target-arm/helper.h
 @@ -494,6 +494,9 @@ DEF_HELPER_3(neon_qzip32, void, env, i32, i32)
  DEF_HELPER_4(crypto_aese, void, env, i32, i32, i32)
  DEF_HELPER_4(crypto_aesmc, void, env, i32, i32, i32)
  
 +DEF_HELPER_3(crc32, i32, i32, i32, i32)
 +DEF_HELPER_3(crc32c, i32, i32, i32, i32)
 +
  #ifdef TARGET_AARCH64
  #include helper-a64.h
  #endif
 diff --git a/target-arm/translate.c b/target-arm/translate.c
 index 782aab8..9410b6a 100644
 --- a/target-arm/translate.c
 +++ b/target-arm/translate.c
 @@ -7541,6 +7541,32 @@ static void disas_arm_insn(CPUARMState * env, 
 DisasContext *s)
  store_reg(s, 14, tmp2);
  gen_bx(s, tmp);
  break;
 +case 0x4:
 +{
 +/* crc32/crc32c */
 +uint32_t c = extract32(insn, 9, 1);
 +
 +/* size == 64 is UNPREDICTABLE but handle as UNDEFINED.  */
 +if (op1 == 0x3) {
 +goto illegal_op;
 +}
 +
 +rn = (insn  16)  0xf;
 +rd = (insn  12)  0xf;

use extract32?

 +
 +tmp = load_reg(s, rn);
 +tmp2 = load_reg(s, rm);
 +tmp3 = tcg_const_i32(1  op1);
 +if (c) {
 +gen_helper_crc32c(tmp, tmp, tmp2, tmp3);
 +} else {
 +gen_helper_crc32(tmp, tmp, tmp2, tmp3);
 +}
 +tcg_temp_free_i32(tmp2);
 +tcg_temp_free_i32(tmp3);
 +store_reg(s, rd, tmp);
 +break;
 +}
  case 0x5: /* saturating add/subtract */
  ARCH(5TE);
  rd = (insn  12)  0xf;
 @@ -9125,6 +9151,28 @@ static int disas_thumb2_insn(CPUARMState *env, 
 DisasContext *s, uint16_t insn_hw
  case 0x18: /* clz */
  gen_helper_clz(tmp, tmp);
  break;
 +case 0x20:
 +case 0x21:
 +case 0x22:
 +case 0x28:
 +case 0x29:
 +case 0x2a:
 +{
 +/* crc32/crc32c */
 +uint32_t sz = op  0x3;
 +uint32_t c = op  0x8;
 +
 +tmp2 = load_reg(s, rm);
 +tmp3 = tcg_const_i32(1  sz);
 +if (c) {
 +gen_helper_crc32c(tmp, tmp, tmp2, tmp3);
 +} else {
 +gen_helper_crc32(tmp, tmp, tmp2, tmp3);
 +

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Paolo Bonzini


Il 18/02/2014 12:22, Peter Maydell ha scritto:

 There isn't really a standard criterion.  It's up to each maintainer to be
 stricter or looser on what goes to stable.

My criteria for ARM in the past has typically been there's
a new release every three months, anything that got past
the release testing process for release N is sufficiently
non-critical it can just go into release N+1.


Then you can ignore qemu-stable. :)

Paolo

Re: [Qemu-devel] [RFC 0/9] generate dynamic _CRS for motherboard resources

2014-02-18 Thread Michael S. Tsirkin

On Tue, Feb 18, 2014 at 12:10:27PM +0100, Igor Mammedov wrote:
 On Mon, 17 Feb 2014 13:02:18 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 
  On Mon, Feb 17, 2014 at 11:33:23AM +0100, Igor Mammedov wrote:
   On Sun, 16 Feb 2014 17:53:45 +0200
   Michael S. Tsirkin m...@redhat.com wrote:
   
On Fri, Feb 07, 2014 at 01:51:27PM +0100, Igor Mammedov wrote:
 Since introduction of PCIHP, it became problematic to
 punch hole in PCI0._CRS statically since PCI hotplug
 region size became runtime changeable.

What makes it runtime changeable?
   piix machine acpi-pci-hotplug-with-bridge-support=on effectively
   changes size of pcihp MMIO region
  
  It adds 4 bytes. Let's just reserve a reasonably sized region,
  like 512 bytes.
 Reserve where and for how log it will be enough?
 Eventually we will continue punching holes or increasing size of
 this one big hole.

I agree this might become necessary.  But why not keep it simple while
we can? The advantage of ACPI in qemu is we can fix it without
touching guest code. Let's use it.

So if there's just pm_io_base, we can just write
it in ASL and patch in that value.

 
 Q35 adds more to the problem with guest programmable pm_io_base,
 we can't statically punch hole for it.

Aha. That's a good point.
But I still worry how we'll handle e.g. pvpanic.
Paolo had an idea to scan the memory range,
and punch holes for each section, making an
exception for regions owned by PCI devices.
I thought it would work originally too, but it
seems that e.g. VGA range is also owned by pci
devices sometimes.


  

 So replace static hole punching with dynamically consumed
 resources in a child device on PCI0 bus. i.e generate
 PNP0C02 device as a child of PCI0 bus at runtime and
 consume GPE0, PCI/CPU hotplug IO resources in it instead
 of punching holes in static PCI0._CRS.

It seems that we are being too exact with
IO resources here.
Can't we roughly reserve 0xae00 to 0xafff
and be done with it?
   that would be easiest way for this specific case if we could agree
   for ranges on PIIX/Q35 machines.
   
   But I also use it as excuse to introduce ASL like macros so that
   building dynamic SSDT would be easier i.e. replace template patching
   with single place where dynamic device is defined and its values
   are set in simple and straightforward manner.
  
  We don't need excuses really :)
  But the approach itself needs some work IMHO, in particular
  ASL like syntax by using macros and varargs is not something to strive
  for IMHO :)
  
  Why don't we just pass GArrays around?
  
  crs = build_alloc_crs();
  build_append_resource(crs, )
 
 I don't like much macros myself, but compared to alternative
 approach ^^^ (I've tried it) it's still harder to read/write
 ASL constructs right.
 
 I think that creating AML using ASL like flow and primitives
 is better than building it using custom APIs, since it looks
 like it was documented in spec.
 Macros+vararg provide a necessary degree of flexibility to
 create AML using ASL like flow and constructs.

Only you lose all type safety. When compiling ASL with IASL
at least it does some type checking.

  
  
  Or if you really want this, using array of GArray:
  
  build_append_crs(ssdt,
  {
  build_alloc_resource(...),
  build_alloc_resource(...),
  build_alloc_resource(...),
  build_alloc_resource(...),
  NULL
  })
 that would work if there was a way to specify arbitrary statements
 inside embedded array (like in the last hunk of [9/9]).

In C we have functions for this :)
Last chunk:

-if (pm-pcihp_io_base) {
-ACPI_IO(RESBUF, Decode16,
-pm-pcihp_io_base, /* _MIN */
-pm-pcihp_io_base, /* _MAX */
-0x0,   /* _ALN */
-pm-pcihp_io_len); /* _LEN */
-}


+   build_io_resource_16(pm-pcihp_io_base, pm-pcihp_io_base,
 0x0, pm-pcihp_io_len);


Which actually makes one think. Isn't _MAX wrong here?
I'd expect pm-pcihp_io_base + pm-pcihp_io_len - 1

I would also key this off pcihp_io_len.

  
  
 Tested with Windows XPsp3, Vista, Windows Server 2003, 2008, 2012r2.
 
 PS:
 Series adds several ASL like macros to simplify
 code for dynamic generation of AML structures.
 
 Igor Mammedov (9):
   Revert pc: Q35 DSDT: exclude CPU hotplug IO range from PCI bus
 resources
   Revert pc: PIIX DSDT: exclude CPU/PCI hotplug  GPE0 IO range from
 PCI bus resources
   Partial revert pc: ACPI: expose PRST IO range via _CRS
   acpi: replace opencoded opcodes with defines
   acpi: add PNP0C02 to PCI0 bus
   acpi: consume GPE0 IO resources in PNP0C02 device
   acpi: consume

Re: [Qemu-devel] [PATCH v2] qcow2: Set zero flag for discarded clusters

2014-02-18 Thread Kevin Wolf

Am 17.02.2014 um 18:12 hat Eric Blake geschrieben:
 On 02/17/2014 07:45 AM, Kevin Wolf wrote:
  Instead of making the backing file contents visible again after a discard
  request, set the zero flag if possible (i.e. on version = 3).
  
  Signed-off-by: Kevin Wolf kw...@redhat.com
  ---
   block/qcow2-cluster.c  | 22 --
   tests/qemu-iotests/046 | 18 ++
   2 files changed, 34 insertions(+), 6 deletions(-)
  
 
  +if (!!(old_offset  QCOW_OFLAG_ZERO)) {
 
 The !! is not necessary here; any non-zero value in a boolean context
 gives the same result as an explicit conversion to 0-or-1.

Thanks, that looks a bit odd indeed. I'll change it.

Kevin


pgpQg0digIRKr.pgp
Description: PGP signature

[Qemu-devel] [PATCH] block/iscsi: fix deadlock on scsi check condition

2014-02-18 Thread Peter Lieven

the retry logic was broken because the complete status
of the task structure was not reset. this resulted in
an infinite loop retrying the command over and over.

CC: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 3c0b728..ded414e 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -145,12 +145,13 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int 
status,
 
 if (iTask-retries--  0  status == SCSI_STATUS_CHECK_CONDITION
  task-sense.key == SCSI_SENSE_UNIT_ATTENTION) {
+error_report(iSCSI CheckCondition: %s, iscsi_get_error(iscsi));
 iTask-do_retry = 1;
 goto out;
 }
 
 if (status != SCSI_STATUS_GOOD) {
-error_report(iSCSI: Failure. %s, iscsi_get_error(iscsi));
+error_report(iSCSI Failure: %s, iscsi_get_error(iscsi));
 }
 
 out:
@@ -325,6 +326,7 @@ retry:
 }
 
 if (iTask.do_retry) {
+iTask.complete = 0;
 goto retry;
 }
 
@@ -399,6 +401,7 @@ retry:
 }
 
 if (iTask.do_retry) {
+iTask.complete = 0;
 goto retry;
 }
 
@@ -433,6 +436,7 @@ retry:
 }
 
 if (iTask.do_retry) {
+iTask.complete = 0;
 goto retry;
 }
 
@@ -683,6 +687,7 @@ retry:
 scsi_free_scsi_task(iTask.task);
 iTask.task = NULL;
 }
+iTask.complete = 0;
 goto retry;
 }
 
@@ -767,6 +772,7 @@ retry:
 }
 
 if (iTask.do_retry) {
+iTask.complete = 0;
 goto retry;
 }
 
@@ -836,6 +842,7 @@ retry:
 }
 
 if (iTask.do_retry) {
+iTask.complete = 0;
 goto retry;
 }
 
-- 
1.7.9.5

[Qemu-devel] [PATCH V18 02/12] quorum: Create BDRVQuorumState and BlkDriver and do init.

2014-02-18 Thread Benoît Canet

From: Benoît Canet ben...@irqsave.net

Create the structure holding the quorum settings and write the minimal block
driver instanciation boilerplate.

Signed-off-by: Benoit Canet ben...@irqsave.net
Reviewed-by: Max Reitz mre...@redhat.com
---
 block/quorum.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 950f5cc..375ffd5 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -15,6 +15,23 @@
 
 #include block/block_int.h
 
+/* the following structure holds the state of one quorum instance */
+typedef struct BDRVQuorumState {
+BlockDriverState **bs; /* children BlockDriverStates */
+int num_children;  /* children count */
+int threshold; /* if less than threshold children reads gave the
+* same result a quorum error occurs.
+*/
+bool is_blkverify; /* true if the driver is in blkverify mode
+* Writes are mirrored on two children devices.
+* On reads the two children devices' contents are
+* compared and if a difference is spotted its
+* location is printed and the code aborts.
+* It is useful to debug other block drivers by
+* comparing them with a reference one.
+*/
+} BDRVQuorumState;
+
 typedef struct QuorumAIOCB QuorumAIOCB;
 
 /* Quorum will create one instance of the following structure per operation it
@@ -51,3 +68,17 @@ struct QuorumAIOCB {
 bool is_read;
 int vote_ret;
 };
+
+static BlockDriver bdrv_quorum = {
+.format_name= quorum,
+.protocol_name  = quorum,
+
+.instance_size  = sizeof(BDRVQuorumState),
+};
+
+static void bdrv_quorum_init(void)
+{
+bdrv_register(bdrv_quorum);
+}
+
+block_init(bdrv_quorum_init);
-- 
1.8.3.2

[Qemu-devel] [PATCH V18 08/12] quorum: Add quorum_invalidate_cache().

2014-02-18 Thread Benoît Canet

From: Benoît Canet ben...@irqsave.net

We really want that live migration works with quorum so implement
quorum_invalidate_cache().

Signed-off-by: Benoit Canet ben...@irqsave.net
Reviewed-by: Max Reitz mre...@redhat.com
---
 block/quorum.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 131ffbf..d924055 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -619,6 +619,16 @@ static int64_t quorum_getlength(BlockDriverState *bs)
 return result;
 }
 
+static void quorum_invalidate_cache(BlockDriverState *bs)
+{
+BDRVQuorumState *s = bs-opaque;
+int i;
+
+for (i = 0; i  s-num_children; i++) {
+bdrv_invalidate_cache(s-bs[i]);
+}
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= quorum,
 .protocol_name  = quorum,
@@ -629,6 +639,7 @@ static BlockDriver bdrv_quorum = {
 
 .bdrv_aio_readv = quorum_aio_readv,
 .bdrv_aio_writev= quorum_aio_writev,
+.bdrv_invalidate_cache = quorum_invalidate_cache,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.8.3.2

[Qemu-devel] [PATCH V18 03/12] quorum: Add quorum_aio_writev and its dependencies.

2014-02-18 Thread Benoît Canet

From: Benoît Canet ben...@irqsave.net

Writes are mirrored num_children times on num_children devices.

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/quorum.c | 103 +
 1 file changed, 103 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 375ffd5..99e0e03 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -69,11 +69,114 @@ struct QuorumAIOCB {
 int vote_ret;
 };
 
+static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
+BDRVQuorumState *s = acb-common.bs-opaque;
+int i;
+
+/* cancel all callbacks */
+for (i = 0; i  s-num_children; i++) {
+bdrv_aio_cancel(acb-qcrs[i].aiocb);
+}
+
+g_free(acb-qcrs);
+qemu_aio_release(acb);
+}
+
+static AIOCBInfo quorum_aiocb_info = {
+.aiocb_size = sizeof(QuorumAIOCB),
+.cancel = quorum_aio_cancel,
+};
+
+static void quorum_aio_finalize(QuorumAIOCB *acb)
+{
+int ret = 0;
+
+acb-common.cb(acb-common.opaque, ret);
+
+g_free(acb-qcrs);
+qemu_aio_release(acb);
+}
+
+static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
+   BlockDriverState *bs,
+   QEMUIOVector *qiov,
+   uint64_t sector_num,
+   int nb_sectors,
+   BlockDriverCompletionFunc *cb,
+   void *opaque)
+{
+QuorumAIOCB *acb = qemu_aio_get(quorum_aiocb_info, bs, cb, opaque);
+int i;
+
+acb-common.bs-opaque = s;
+acb-sector_num = sector_num;
+acb-nb_sectors = nb_sectors;
+acb-qiov = qiov;
+acb-qcrs = g_new0(QuorumChildRequest, s-num_children);
+acb-count = 0;
+acb-success_count = 0;
+acb-is_read = false;
+acb-vote_ret = 0;
+
+for (i = 0; i  s-num_children; i++) {
+acb-qcrs[i].buf = NULL;
+acb-qcrs[i].ret = 0;
+acb-qcrs[i].parent = acb;
+}
+
+return acb;
+}
+
+static void quorum_aio_cb(void *opaque, int ret)
+{
+QuorumChildRequest *sacb = opaque;
+QuorumAIOCB *acb = sacb-parent;
+BDRVQuorumState *s = acb-common.bs-opaque;
+
+sacb-ret = ret;
+acb-count++;
+if (ret == 0) {
+acb-success_count++;
+}
+assert(acb-count = s-num_children);
+assert(acb-success_count = s-num_children);
+if (acb-count  s-num_children) {
+return;
+}
+
+quorum_aio_finalize(acb);
+}
+
+static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
+  int64_t sector_num,
+  QEMUIOVector *qiov,
+  int nb_sectors,
+  BlockDriverCompletionFunc *cb,
+  void *opaque)
+{
+BDRVQuorumState *s = bs-opaque;
+QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num, nb_sectors,
+  cb, opaque);
+int i;
+
+for (i = 0; i  s-num_children; i++) {
+acb-qcrs[i].aiocb = bdrv_aio_writev(s-bs[i], sector_num, qiov,
+ nb_sectors, quorum_aio_cb,
+ acb-qcrs[i]);
+}
+
+return acb-common;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= quorum,
 .protocol_name  = quorum,
 
 .instance_size  = sizeof(BDRVQuorumState),
+
+.bdrv_aio_writev= quorum_aio_writev,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.8.3.2

[Qemu-devel] [PATCH V18 01/12] quorum: Create quorum.c, add QuorumChildRequest and QuorumAIOCB.

2014-02-18 Thread Benoît Canet

From: Benoît Canet ben...@irqsave.net

Quorum is a block filter mirroring writes to num_children children.
For reads quorum reads each children and does a vote.
If more than vote_threshold versions are identical the quorum is reached and
this winning version is returned to the guest. So quorum prevents bit 
corruption.
For high availability purpose minority errors are reported via QMP but the guest
does not see them.

This patch creates the driver C source file and introduces the structures that
will be used in asynchronous reads and writes.

Signed-off-by: Benoit Canet ben...@irqsave.net
Reviewed-by: Max Reitz mre...@redhat.com
---
 block/Makefile.objs |  1 +
 block/quorum.c  | 53 +
 2 files changed, 54 insertions(+)
 create mode 100644 block/quorum.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index e254a21..716556f 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -3,6 +3,7 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
qcow2-snapshot.o qcow2-c
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
+block-obj-y += quorum.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
diff --git a/block/quorum.c b/block/quorum.c
new file mode 100644
index 000..950f5cc
--- /dev/null
+++ b/block/quorum.c
@@ -0,0 +1,53 @@
+/*
+ * Quorum Block filter
+ *
+ * Copyright (C) 2012-2014 Nodalink, EURL.
+ *
+ * Author:
+ *   Benoît Canet benoit.ca...@irqsave.net
+ *
+ * Based on the design and code of blkverify.c (Copyright (C) 2010 IBM, Corp)
+ * and blkmirror.c (Copyright (C) 2011 Red Hat, Inc).
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include block/block_int.h
+
+typedef struct QuorumAIOCB QuorumAIOCB;
+
+/* Quorum will create one instance of the following structure per operation it
+ * performs on its children.
+ * So for each read/write operation coming from the upper layer there will be
+ * $children_count QuorumChildRequest.
+ */
+typedef struct QuorumChildRequest {
+BlockDriverAIOCB *aiocb;
+QEMUIOVector qiov;
+uint8_t *buf;
+int ret;
+QuorumAIOCB *parent;
+} QuorumChildRequest;
+
+/* Quorum will use the following structure to track progress of each read/write
+ * operation received by the upper layer.
+ * This structure hold pointers to the QuorumChildRequest structures instances
+ * used to do operations on each children and track overall progress.
+ */
+struct QuorumAIOCB {
+BlockDriverAIOCB common;
+
+/* Request metadata */
+uint64_t sector_num;
+int nb_sectors;
+
+QEMUIOVector *qiov; /* calling IOV */
+
+QuorumChildRequest *qcrs;   /* individual child requests */
+int count;  /* number of completed AIOCB */
+int success_count;  /* number of successfully completed AIOCB */
+
+bool is_read;
+int vote_ret;
+};
-- 
1.8.3.2

[Qemu-devel] [PATCH V18 09/12] quorum: Add quorum_co_flush().

2014-02-18 Thread Benoît Canet

From: Benoît Canet ben...@irqsave.net

Makes a vote to select error if any.

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/quorum.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index d924055..1007dfc 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -629,12 +629,40 @@ static void quorum_invalidate_cache(BlockDriverState *bs)
 }
 }
 
+static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
+{
+BDRVQuorumState *s = bs-opaque;
+QuorumVoteVersion *winner = NULL;
+QuorumVotes error_votes;
+QuorumVoteValue result_value;
+int i;
+int result = 0;
+
+QLIST_INIT(error_votes.vote_list);
+error_votes.compare = quorum_64bits_compare;
+
+for (i = 0; i  s-num_children; i++) {
+result = bdrv_co_flush(s-bs[i]);
+result_value.l = result;
+quorum_count_vote(error_votes, result_value, i);
+}
+
+winner = quorum_get_vote_winner(error_votes);
+result = winner-value.l;
+
+quorum_free_vote_list(error_votes);
+
+return result;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= quorum,
 .protocol_name  = quorum,
 
 .instance_size  = sizeof(BDRVQuorumState),
 
+.bdrv_co_flush_to_disk = quorum_co_flush,
+
 .bdrv_getlength = quorum_getlength,
 
 .bdrv_aio_readv = quorum_aio_readv,
-- 
1.8.3.2

[Qemu-devel] [PATCH V18 10/12] quorum: Implement recursive .bdrv_recurse_is_first_non_filter in quorum.

2014-02-18 Thread Benoît Canet

From: Benoît Canet ben...@irqsave.net

This is used to activate quorum snapshot.

Signed-off-by: Benoit Canet ben...@irqsave.net
Reviewed-by: Max Reitz mre...@redhat.com
---
 block/quorum.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 1007dfc..40832c0 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -655,6 +655,23 @@ static coroutine_fn int quorum_co_flush(BlockDriverState 
*bs)
 return result;
 }
 
+static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
+   BlockDriverState *candidate)
+{
+BDRVQuorumState *s = bs-opaque;
+int i;
+
+for (i = 0; i  s-num_children; i++) {
+bool perm = bdrv_recurse_is_first_non_filter(s-bs[i],
+ candidate);
+if (perm) {
+return true;
+}
+}
+
+return false;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= quorum,
 .protocol_name  = quorum,
@@ -668,6 +685,8 @@ static BlockDriver bdrv_quorum = {
 .bdrv_aio_readv = quorum_aio_readv,
 .bdrv_aio_writev= quorum_aio_writev,
 .bdrv_invalidate_cache = quorum_invalidate_cache,
+
+.bdrv_recurse_is_first_non_filter = quorum_recurse_is_first_non_filter,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.8.3.2

[Qemu-devel] [PATCH V18 11/12] quorum: Add quorum_open() and quorum_close().

2014-02-18 Thread Benoît Canet

From: Benoît Canet ben...@irqsave.net

Example of command line:

-drive if=virtio,driver=quorum,\
children.0.file.filename=1.raw,\
children.0.node-name=1.raw,\
children.0.driver=raw,\
children.1.file.filename=2.raw,\
children.1.node-name=2.raw,\
children.1.driver=raw,\
children.2.file.filename=3.raw,\
children.2.node-name=3.raw,\
children.2.driver=raw,\
vote-threshold=2

blkverify=on with vote-threshold=2 and two files can be passed to
emulate blkverify.

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/quorum.c   | 161 +++
 monitor.c|   3 ++
 qapi-schema.json |  21 +++-
 3 files changed, 184 insertions(+), 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index 40832c0..18721ba 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -20,6 +20,9 @@
 
 #define HASH_LENGTH 32
 
+#define QUORUM_OPT_VOTE_THRESHOLD vote-threshold
+#define QUORUM_OPT_BLKVERIFY  blkverify
+
 /* This union holds a vote hash value */
 typedef union QuorumVoteValue {
 char h[HASH_LENGTH];   /* SHA-256 hash */
@@ -672,12 +675,170 @@ static bool 
quorum_recurse_is_first_non_filter(BlockDriverState *bs,
 return false;
 }
 
+static int quorum_valid_threshold(int threshold, int num_children, Error 
**errp)
+{
+
+if (threshold  1) {
+error_set(errp, QERR_INVALID_PARAMETER_VALUE,
+  vote-threshold, value = 1);
+return -ERANGE;
+}
+
+if (threshold  num_children) {
+error_setg(errp, threshold may not exceed children count);
+return -ERANGE;
+}
+
+return 0;
+}
+
+static QemuOptsList quorum_runtime_opts = {
+.name = quorum,
+.head = QTAILQ_HEAD_INITIALIZER(quorum_runtime_opts.head),
+.desc = {
+{
+.name = QUORUM_OPT_VOTE_THRESHOLD,
+.type = QEMU_OPT_NUMBER,
+.help = The number of vote needed for reaching quorum,
+},
+{
+.name = QUORUM_OPT_BLKVERIFY,
+.type = QEMU_OPT_BOOL,
+.help = Trigger block verify mode if set,
+},
+{ /* end of list */ }
+},
+};
+
+static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
+   Error **errp)
+{
+BDRVQuorumState *s = bs-opaque;
+Error *local_err = NULL;
+QemuOpts *opts;
+bool *opened;
+QDict *sub = NULL;
+QList *list = NULL;
+const QListEntry *lentry;
+const QDictEntry *dentry;
+int i;
+int ret = 0;
+
+qdict_flatten(options);
+qdict_extract_subqdict(options, sub, children.);
+qdict_array_split(sub, list);
+
+/* count how many different children are present and validate
+ * qdict_size(sub) address the open by reference case
+ */
+s-num_children = !qlist_size(list) ? qdict_size(sub) : qlist_size(list);
+if (s-num_children  2) {
+error_setg(local_err,
+   Number of provided children must be greater than 1);
+ret = -EINVAL;
+goto exit;
+}
+
+opts = qemu_opts_create(quorum_runtime_opts, NULL, 0, error_abort);
+qemu_opts_absorb_qdict(opts, options, local_err);
+if (error_is_set(local_err)) {
+ret = -EINVAL;
+goto exit;
+}
+
+s-threshold = qemu_opt_get_number(opts, QUORUM_OPT_VOTE_THRESHOLD, 0);
+
+/* and validate it against s-num_children */
+ret = quorum_valid_threshold(s-threshold, s-num_children, local_err);
+if (ret  0) {
+goto exit;
+}
+
+/* is the driver in blkverify mode */
+if (qemu_opt_get_bool(opts, QUORUM_OPT_BLKVERIFY, false) 
+s-num_children == 2  s-threshold == 2) {
+s-is_blkverify = true;
+} else if (qemu_opt_get_bool(opts, QUORUM_OPT_BLKVERIFY, false)) {
+fprintf(stderr, blkverify mode is set by setting blkverify=on 
+and using two files with vote_threshold=2\n);
+}
+
+/* allocate the children BlockDriverState array */
+s-bs = g_new0(BlockDriverState *, s-num_children);
+opened = g_new0(bool, s-num_children);
+
+/* Open by file name or options dict (command line or QMP) */
+if (s-num_children == qlist_size(list)) {
+for (i = 0, lentry = qlist_first(list); lentry;
+lentry = qlist_next(lentry), i++) {
+QDict *d = qobject_to_qdict(lentry-value);
+QINCREF(d);
+ret = bdrv_open(s-bs[i], NULL, NULL, d, flags, NULL, local_err);
+if (ret  0) {
+goto close_exit;
+}
+opened[i] = true;
+}
+/* Open by QMP references */
+} else {
+for (i = 0, dentry = qdict_first(sub); dentry;
+ dentry = qdict_next(sub, dentry), i++) {
+QString *string = qobject_to_qstring(dentry-value);
+ret = bdrv_open(s-bs[i], NULL, qstring_get_str(string), NULL,
+flags, NULL, local_err);
+if (ret  0) {
+goto close_exit;
+}
+

[Qemu-devel] [PATCH V18 04/12] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify.

2014-02-18 Thread Benoît Canet

From: Benoît Canet ben...@irqsave.net

qemu_iovec_compare() will be used to compare IOs vectors in quorum blkverify
mode. The patch extracts these functions in order to factorize the code.

Signed-off-by: Benoit Canet ben...@irqsave.net
Reviewed-by: Max Reitz mre...@redhat.com
---
 block/blkverify.c | 108 +-
 include/qemu-common.h |   2 +
 util/iov.c| 106 +
 3 files changed, 110 insertions(+), 106 deletions(-)

diff --git a/block/blkverify.c b/block/blkverify.c
index b57da59..c86ad7a 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -173,110 +173,6 @@ static int64_t blkverify_getlength(BlockDriverState *bs)
 return bdrv_getlength(s-test_file);
 }
 
-/**
- * Check that I/O vector contents are identical
- *
- * @a:  I/O vector
- * @b:  I/O vector
- * @ret:Offset to first mismatching byte or -1 if match
- */
-static ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
-{
-int i;
-ssize_t offset = 0;
-
-assert(a-niov == b-niov);
-for (i = 0; i  a-niov; i++) {
-size_t len = 0;
-uint8_t *p = (uint8_t *)a-iov[i].iov_base;
-uint8_t *q = (uint8_t *)b-iov[i].iov_base;
-
-assert(a-iov[i].iov_len == b-iov[i].iov_len);
-while (len  a-iov[i].iov_len  *p++ == *q++) {
-len++;
-}
-
-offset += len;
-
-if (len != a-iov[i].iov_len) {
-return offset;
-}
-}
-return -1;
-}
-
-typedef struct {
-int src_index;
-struct iovec *src_iov;
-void *dest_base;
-} IOVectorSortElem;
-
-static int sortelem_cmp_src_base(const void *a, const void *b)
-{
-const IOVectorSortElem *elem_a = a;
-const IOVectorSortElem *elem_b = b;
-
-/* Don't overflow */
-if (elem_a-src_iov-iov_base  elem_b-src_iov-iov_base) {
-return -1;
-} else if (elem_a-src_iov-iov_base  elem_b-src_iov-iov_base) {
-return 1;
-} else {
-return 0;
-}
-}
-
-static int sortelem_cmp_src_index(const void *a, const void *b)
-{
-const IOVectorSortElem *elem_a = a;
-const IOVectorSortElem *elem_b = b;
-
-return elem_a-src_index - elem_b-src_index;
-}
-
-/**
- * Copy contents of I/O vector
- *
- * The relative relationships of overlapping iovecs are preserved.  This is
- * necessary to ensure identical semantics in the cloned I/O vector.
- */
-static void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
-  void *buf)
-{
-IOVectorSortElem sortelems[src-niov];
-void *last_end;
-int i;
-
-/* Sort by source iovecs by base address */
-for (i = 0; i  src-niov; i++) {
-sortelems[i].src_index = i;
-sortelems[i].src_iov = src-iov[i];
-}
-qsort(sortelems, src-niov, sizeof(sortelems[0]), sortelem_cmp_src_base);
-
-/* Allocate buffer space taking into account overlapping iovecs */
-last_end = NULL;
-for (i = 0; i  src-niov; i++) {
-struct iovec *cur = sortelems[i].src_iov;
-ptrdiff_t rewind = 0;
-
-/* Detect overlap */
-if (last_end  last_end  cur-iov_base) {
-rewind = last_end - cur-iov_base;
-}
-
-sortelems[i].dest_base = buf - rewind;
-buf += cur-iov_len - MIN(rewind, cur-iov_len);
-last_end = MAX(cur-iov_base + cur-iov_len, last_end);
-}
-
-/* Sort by source iovec index and build destination iovec */
-qsort(sortelems, src-niov, sizeof(sortelems[0]), sortelem_cmp_src_index);
-for (i = 0; i  src-niov; i++) {
-qemu_iovec_add(dest, sortelems[i].dest_base, src-iov[i].iov_len);
-}
-}
-
 static BlkverifyAIOCB *blkverify_aio_get(BlockDriverState *bs, bool is_write,
  int64_t sector_num, QEMUIOVector 
*qiov,
  int nb_sectors,
@@ -340,7 +236,7 @@ static void blkverify_aio_cb(void *opaque, int ret)
 
 static void blkverify_verify_readv(BlkverifyAIOCB *acb)
 {
-ssize_t offset = blkverify_iovec_compare(acb-qiov, acb-raw_qiov);
+ssize_t offset = qemu_iovec_compare(acb-qiov, acb-raw_qiov);
 if (offset != -1) {
 blkverify_err(acb, contents mismatch in sector % PRId64,
   acb-sector_num + (int64_t)(offset / BDRV_SECTOR_SIZE));
@@ -358,7 +254,7 @@ static BlockDriverAIOCB 
*blkverify_aio_readv(BlockDriverState *bs,
 acb-verify = blkverify_verify_readv;
 acb-buf = qemu_blockalign(bs-file, qiov-size);
 qemu_iovec_init(acb-raw_qiov, acb-qiov-niov);
-blkverify_iovec_clone(acb-raw_qiov, qiov, acb-buf);
+qemu_iovec_clone(acb-raw_qiov, qiov, acb-buf);
 
 bdrv_aio_readv(s-test_file, sector_num, qiov, nb_sectors,
blkverify_aio_cb, acb);
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 5054836..d70783e 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -346,6 +346,8 @@ size_t

[Qemu-devel] [PATCH V18 00/12] Quorum block filter

2014-02-18 Thread Benoît Canet

Hi,

This is the latest version of the quorum block filter.
Since the patch fixing the external_snapshot_prepare QDECREF(options) brain
damage that I introduced earlier quorum works fine in every cases.

in V18:
Address all Max comments [Max]
Fix sample quorum command line usage [Benoît]

tested:

Ran qemu-iotest including 081 ok

quorum command line ok
quorum command line snapshot transaction ok
quorum QMP by reference ok
quorum QMP by reference and snapshot transaction ok
quorum QMP at once ok
quorum QMP at once and snapshot transaction ok

Benoît Canet (12):
  quorum: Create quorum.c, add QuorumChildRequest and QuorumAIOCB.
  quorum: Create BDRVQuorumState and BlkDriver and do init.
  quorum: Add quorum_aio_writev and its dependencies.
  blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from
blkverify.
  quorum: Add quorum_aio_readv.
  quorum: Add quorum mechanism.
  quorum: Add quorum_getlength().
  quorum: Add quorum_invalidate_cache().
  quorum: Add quorum_co_flush().
  quorum: Implement recursive .bdrv_recurse_is_first_non_filter in
quorum.
  quorum: Add quorum_open() and quorum_close().
  quorum: Add unit test.

 block/Makefile.objs|   1 +
 block/blkverify.c  | 108 +-
 block/quorum.c | 858 +
 configure  |  36 ++
 docs/qmp/qmp-events.txt|  36 ++
 include/monitor/monitor.h  |   2 +
 include/qemu-common.h  |   2 +
 monitor.c  |   5 +
 qapi-schema.json   |  21 +-
 tests/qemu-iotests/081 |  95 +
 tests/qemu-iotests/081.out |  34 ++
 tests/qemu-iotests/group   |   1 +
 util/iov.c | 106 ++
 13 files changed, 1198 insertions(+), 107 deletions(-)
 create mode 100644 block/quorum.c
 create mode 100755 tests/qemu-iotests/081
 create mode 100644 tests/qemu-iotests/081.out

-- 
1.8.3.2

[Qemu-devel] [PATCH V18 06/12] quorum: Add quorum mechanism.

2014-02-18 Thread Benoît Canet

From: Benoît Canet ben...@irqsave.net

This patchset enables the core of the quorum mechanism.
The num_children reads are compared to get the majority version and if this
version exists more than threshold times the guest won't see the error at all.

If a block is corrupted or if an error occurs during an IO or if the quorum
cannot be established QMP events are used to report to the management.

Use gnutls's SHA-256 to compare versions.

--enable-quorum must be used to enable the feature.

Signed-off-by: Benoit Canet ben...@irqsave.net
---
 block/Makefile.objs   |   2 +-
 block/quorum.c| 391 +-
 configure |  36 +
 docs/qmp/qmp-events.txt   |  36 +
 include/monitor/monitor.h |   2 +
 monitor.c |   2 +
 6 files changed, 467 insertions(+), 2 deletions(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 716556f..425d7fb 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -3,7 +3,7 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
qcow2-snapshot.o qcow2-c
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
-block-obj-y += quorum.o
+block-obj-$(CONFIG_QUORUM) += quorum.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
diff --git a/block/quorum.c b/block/quorum.c
index fd19662..7acaa97 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -13,7 +13,43 @@
  * See the COPYING file in the top-level directory.
  */
 
+#include gnutls/gnutls.h
+#include gnutls/crypto.h
 #include block/block_int.h
+#include qapi/qmp/qjson.h
+
+#define HASH_LENGTH 32
+
+/* This union holds a vote hash value */
+typedef union QuorumVoteValue {
+char h[HASH_LENGTH];   /* SHA-256 hash */
+int64_t l; /* simpler 64 bits hash */
+} QuorumVoteValue;
+
+/* A vote item */
+typedef struct QuorumVoteItem {
+int index;
+QLIST_ENTRY(QuorumVoteItem) next;
+} QuorumVoteItem;
+
+/* this structure is a vote version. A version is the set of votes sharing the
+ * same vote value.
+ * The set of votes will be tracked with the items field and its cardinality is
+ * vote_count.
+ */
+typedef struct QuorumVoteVersion {
+QuorumVoteValue value;
+int index;
+int vote_count;
+QLIST_HEAD(, QuorumVoteItem) items;
+QLIST_ENTRY(QuorumVoteVersion) next;
+} QuorumVoteVersion;
+
+/* this structure holds a group of vote versions together */
+typedef struct QuorumVotes {
+QLIST_HEAD(, QuorumVoteVersion) vote_list;
+bool (*compare)(QuorumVoteValue *a, QuorumVoteValue *b);
+} QuorumVotes;
 
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
@@ -65,10 +101,14 @@ struct QuorumAIOCB {
 int count;  /* number of completed AIOCB */
 int success_count;  /* number of successfully completed AIOCB */
 
+QuorumVotes votes;
+
 bool is_read;
 int vote_ret;
 };
 
+static void quorum_vote(QuorumAIOCB *acb);
+
 static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
@@ -94,6 +134,10 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
 BDRVQuorumState *s = acb-common.bs-opaque;
 int i, ret = 0;
 
+if (acb-vote_ret) {
+ret = acb-vote_ret;
+}
+
 acb-common.cb(acb-common.opaque, ret);
 
 if (acb-is_read) {
@@ -107,6 +151,16 @@ static void quorum_aio_finalize(QuorumAIOCB *acb)
 qemu_aio_release(acb);
 }
 
+static bool quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
+{
+return !memcmp(a-h, b-h, HASH_LENGTH);
+}
+
+static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
+{
+return a-l == b-l;
+}
+
 static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
BlockDriverState *bs,
QEMUIOVector *qiov,
@@ -125,6 +179,8 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
 acb-qcrs = g_new0(QuorumChildRequest, s-num_children);
 acb-count = 0;
 acb-success_count = 0;
+acb-votes.compare = quorum_sha256_compare;
+QLIST_INIT(acb-votes.vote_list);
 acb-is_read = false;
 acb-vote_ret = 0;
 
@@ -137,6 +193,48 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
 return acb;
 }
 
+static void quorum_report_bad(QuorumAIOCB *acb, char *node_name, int ret)
+{
+QObject *data;
+assert(node_name);
+data = qobject_from_jsonf({ 'ret': %i
+  , 'node-name': \%s\
+  , 'sector-num': % PRId64
+  , 'sectors-count': %i },
+  ret, node_name, acb-sector_num, 
acb-nb_sectors);
+monitor_protocol_event(QEVENT_QUORUM_REPORT_BAD, data);
+qobject_decref(data);

Re: [Qemu-devel] [PULL 00/20] acpi,pc,pci fixes and enhancements

2014-02-18 Thread Stefano Stabellini

On Thu, 13 Feb 2014, Peter Maydell wrote:
 On 10 February 2014 16:47, Michael S. Tsirkin m...@redhat.com wrote:
  The following changes since commit 2b2449f7e467957778ca006904471b231dc0ac8e:
 
Merge remote-tracking branch 'remotes/borntraeger/tags/kvm-s390-20140131' 
  into staging (2014-02-04 18:46:33 +)
 
  are available in the git repository at:
 
 
git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
 
  for you to fetch changes up to 417c45ab2f847c0a47b1232f611aa886df6a97d5:
 
ACPI: Remove commented-out code from HPET._CRS (2014-02-10 11:09:33 +0200)
 
 Applied, thanks.

It looks like that this series breaks disk unplug
(hw/ide/piix.c:pci_piix3_xen_ide_unplug).

I bisected it and the problem is caused by:

commit 5e95494380ecf83c97d28f72134ab45e0cace8f9
Author: Igor Mammedov imamm...@redhat.com
Date:   Wed Feb 5 16:36:52 2014 +0100

hw/pci: switch to a generic hotplug handling for PCIDevice

make qdev_unplug()/device_set_realized() to call hotplug handler's
plug/unplug methods if available and remove not needed anymore
hot(un)plug handling from PCIDevice.

In case if hotplug handler is not available, revert to the legacy
hotplug method for compatibility with not yet converted buses.

Signed-off-by: Igor Mammedov imamm...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Alexander Graf


On 02/18/2014 12:22 PM, Peter Maydell wrote:

On 18 February 2014 11:16, Paolo Bonzini pbonz...@redhat.com wrote:

Il 18/02/2014 12:09, Peter Maydell ha scritto:


No, we've had that topic before: It's your job as submitter and
maintainer to flag that appropriately in the commit message, as per QEMU
Summit 2012.

I don't think this workflow works. I have no idea what
stable's criteria are, and if you rely on people
adding a cc you're going to miss stuff.


There isn't really a standard criterion.  It's up to each maintainer to be
stricter or looser on what goes to stable.

My criteria for ARM in the past has typically been there's
a new release every three months, anything that got past
the release testing process for release N is sufficiently
non-critical it can just go into release N+1.


Unfortunately this doesn't work for distributions. Distros need to 
maintain a stable branch for the lifetime of a release to ensure that 
we're reasonably regression free.


If you indicate that this doesn't apply to ARM it basically means you 
admit that ARM systems are not yet ready for stable use by customers 
when they want to use KVM. At least at the point when we agree that 
customers do want to run on a stable base for virtualization on ARM we 
need a working -stable system for critical fixes.



Alex

[Qemu-devel] [PATCH V18 12/12] quorum: Add unit test.

2014-02-18 Thread Benoît Canet

Signed-off-by: Benoit Canet ben...@irqsave.net
Reviewed-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/081 | 95 ++
 tests/qemu-iotests/081.out | 34 +
 tests/qemu-iotests/group   |  1 +
 3 files changed, 130 insertions(+)
 create mode 100755 tests/qemu-iotests/081
 create mode 100644 tests/qemu-iotests/081.out

diff --git a/tests/qemu-iotests/081 b/tests/qemu-iotests/081
new file mode 100755
index 000..be34544
--- /dev/null
+++ b/tests/qemu-iotests/081
@@ -0,0 +1,95 @@
+#!/bin/bash
+#
+# Test Quorum block driver
+#
+# Copyright (C) 2013 Nodalink, SARL.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+# creator
+owner=ben...@irqsave.net
+
+seq=`basename $0`
+echo QA output created by $seq
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+rm -rf $TEST_DIR/1.raw
+rm -rf $TEST_DIR/2.raw
+rm -rf $TEST_DIR/3.raw
+}
+trap _cleanup; exit \$status 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto generic
+_supported_os Linux
+
+quorum=file.driver=quorum,file.children.0.file.filename=$TEST_DIR/1.raw
+quorum=$quorum,file.children.1.file.filename=$TEST_DIR/2.raw
+quorum=$quorum,file.children.2.file.filename=$TEST_DIR/3.raw,file.vote-threshold=2
+
+echo
+echo == creating quorum files ==
+
+size=10M
+
+TEST_IMG=$TEST_DIR/1.raw _make_test_img $size
+TEST_IMG=$TEST_DIR/2.raw _make_test_img $size
+TEST_IMG=$TEST_DIR/3.raw _make_test_img $size
+
+echo
+echo == writing images ==
+
+$QEMU_IO -c open -o $quorum -c write -P 0x32 0 $size | _filter_qemu_io
+
+echo
+echo == checking quorum write ==
+
+$QEMU_IO -c read -P 0x32 0 $size $TEST_DIR/1.raw | _filter_qemu_io
+$QEMU_IO -c read -P 0x32 0 $size $TEST_DIR/2.raw | _filter_qemu_io
+$QEMU_IO -c read -P 0x32 0 $size $TEST_DIR/3.raw | _filter_qemu_io
+
+echo
+echo == corrupting image ==
+
+$QEMU_IO -c write -P 0x42 0 $size $TEST_DIR/2.raw | _filter_qemu_io
+
+echo
+echo == checking quorum correction ==
+
+$QEMU_IO -c open -o $quorum -c read -P 0x32 0 $size | _filter_qemu_io
+
+echo
+echo == breaking quorum ==
+
+$QEMU_IO -c write -P 0x41 0 $size $TEST_DIR/1.raw | _filter_qemu_io
+echo
+echo == checking that quorum is broken ==
+
+$QEMU_IO -c open -o $quorum -c read -P 0x32 0 $size | _filter_qemu_io
+
+
+# success, all done
+echo *** done
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/081.out b/tests/qemu-iotests/081.out
new file mode 100644
index 000..b5b55ab
--- /dev/null
+++ b/tests/qemu-iotests/081.out
@@ -0,0 +1,34 @@
+QA output created by 081
+
+== creating quorum files ==
+Formatting 'TEST_DIR/1.IMGFMT', fmt=IMGFMT size=10485760 
+Formatting 'TEST_DIR/2.IMGFMT', fmt=IMGFMT size=10485760 
+Formatting 'TEST_DIR/3.IMGFMT', fmt=IMGFMT size=10485760 
+
+== writing images ==
+wrote 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== checking quorum write ==
+read 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== corrupting image ==
+wrote 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== checking quorum correction ==
+read 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== breaking quorum ==
+wrote 10485760/10485760 bytes at offset 0
+10 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== checking that quorum is broken ==
+qemu-io: can't open device (null): Could not read image for determining its 
format: Input/output error
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index d8be74a..fe5d764 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -83,3 +83,4 @@
 074 rw auto
 077 rw auto
 079 rw auto
+081 rw auto
-- 
1.8.3.2

Re: [Qemu-devel] [PULL 00/20] acpi,pc,pci fixes and enhancements

2014-02-18 Thread Paolo Bonzini


Il 18/02/2014 13:16, Stefano Stabellini ha scritto:

It looks like that this series breaks disk unplug
(hw/ide/piix.c:pci_piix3_xen_ide_unplug).

I bisected it and the problem is caused by:

commit 5e95494380ecf83c97d28f72134ab45e0cace8f9
Author: Igor Mammedov imamm...@redhat.com
Date:   Wed Feb 5 16:36:52 2014 +0100

hw/pci: switch to a generic hotplug handling for PCIDevice

make qdev_unplug()/device_set_realized() to call hotplug handler's
plug/unplug methods if available and remove not needed anymore
hot(un)plug handling from PCIDevice.

In case if hotplug handler is not available, revert to the legacy
hotplug method for compatibility with not yet converted buses.

Signed-off-by: Igor Mammedov imamm...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com




What exactly breaks?

Paolo

Re: [Qemu-devel] [PATCH V17 11/12] quorum: Add quorum_open() and quorum_close().

2014-02-18 Thread Benoît Canet

The Tuesday 18 Feb 2014 à 12:06:57 (+0800), Fam Zheng wrote :
 On Thu, 02/13 10:09, Benoît Canet wrote:
  The Wednesday 12 Feb 2014 à 23:06:38 (+0100), Benoît Canet wrote :
   +static void quorum_close(BlockDriverState *bs)
   +{
   +BDRVQuorumState *s = bs-opaque;
   +int i;
   +
   +for (i = 0; i  s-num_children; i++) {
   +bdrv_unref(s-bs[i]);
  Quorum crash here from time to time I don't understand why.
 
 I think you could add printf or use gdb to examine every bdrv_unref on the
 crashing bs, so you can track down to the unbalanced reference.

I found the cause of the crash since: I added a spurious QDECREF() in the
external snapshot prepare function.
It's fixed now.

Thanks,

Benoît

 
 Fam

[Qemu-devel] [PATCH] kvm: fix kvm_set_migration_log() behavior

2014-02-18 Thread Vincent KHERBACHE

The test (!!(mem-flags  KVM_MEM_LOG_DIRTY_PAGES) == enable) is wrong because 
the condition is valid when enable = 0 and current dirty log memory flag is set.
As a consequence kvm_log_global_stop() does not stop the KVM dirty log
tracking: kvm_set_migration_log(0) didn't do its job.
So instead I propose to use kvm_slot_dirty_pages_log_change() which correctly
compare the memory flags (old/new).

Signed-off-by: Vincent KHERBACHE vincent.kherba...@inria.fr
---
 kvm-all.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 2ca9143..f104f87 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -355,7 +355,7 @@ static int kvm_set_migration_log(int enable)
 {
 KVMState *s = kvm_state;
 KVMSlot *mem;
-int i, err;
+int i, err = 0;
 
 s-migration_log = enable;
 
@@ -365,15 +365,9 @@ static int kvm_set_migration_log(int enable)
 if (!mem-memory_size) {
 continue;
 }
-if (!!(mem-flags  KVM_MEM_LOG_DIRTY_PAGES) == enable) {
-continue;
-}
-err = kvm_set_user_memory_region(s, mem);
-if (err) {
-return err;
-}
+err = kvm_slot_dirty_pages_log_change(mem, (bool)enable);
 }
-return 0;
+return err;
 }
 
 /* get kvm's dirty pages bitmap and update qemu's */
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Peter Maydell

On 18 February 2014 12:17, Alexander Graf ag...@suse.de wrote:
 On 02/18/2014 12:22 PM, Peter Maydell wrote:
 My criteria for ARM in the past has typically been there's
 a new release every three months, anything that got past
 the release testing process for release N is sufficiently
 non-critical it can just go into release N+1.

 Unfortunately this doesn't work for distributions. Distros
 need to maintain a stable branch for the lifetime of a release
 to ensure that we're reasonably regression free.

 If you indicate that this doesn't apply to ARM it basically means you admit
 that ARM systems are not yet ready for stable use by customers when they
 want to use KVM. At least at the point when we agree that customers do want
 to run on a stable base for virtualization on ARM we need a working -stable
 system for critical fixes.

I agree in general that ARM support needs to move from
its traditional this is just a dev tool situation to
a broader level of support/stability guarantees for KVM.
(We're not yet guaranteeing cross-version migration,
for another example there.)

However again we run into the definition of what's a
critical fix?. I think if distros need a stable branch
then they need to be prepared to do the work of sorting
through what counts as a critical fix that needs to be
ported to that branch. (For instance, which boards and
targets do they care about?)

For instance patch 3 only applies to the integrator
board, and I don't consider the guest-to-host border
to be a security boundary for most of our legacy board
models: there's just too much unaudited device code for
that to be trustable.

thanks
-- PMM

[Qemu-devel] [PATCH 0/8] virtio endian-ambivalent target fixes (rebased)

2014-02-18 Thread Greg Kurz

On Fri, 14 Feb 2014 12:59:49 +0100
Andreas Färber afaer...@suse.de wrote:

 Hi,
 
 [...]
 
 It might've helped if Rusty had actually used our
 scripts/get_maintainer.pl script to CC people. While Anthony seems to
 have reviewed some patches (usually Reviewed-by should be before the
 final Signed-off-by fwiw), neither Stefan (virtio-net) nor Kevin
 (virtio-blk) nor Paolo (virtio-scsi) were CC'ed, and recently Michael
 stepped up as virtio maintainer, so maybe he can take them once ready.
 
 1/7 looks okay to me; 3-7 are rather mechanical - people will need to
 review that those changes are sufficient for the current code base.
 We've since converted virtio devices to QOM realize, so a rebase is
 likely needed for such an old series.
 
 Regards,
 Andreas

Hi,

Thank you Andreas for your support. :)

Here is a rebased patchset. Only the two first patches (virtio_get_byteswap:
and virtio:) had to be updated actually. FWIW, this the very same patchset
I use, along with some ppc64 specific enablement code, to have functionnal
ppc64 LE guests.

---

Greg Kurz (1):
  hw/9pfs/virtio_9p_device: use virtio wrappers to access headers.

Rusty Russell (7):
  virtio_get_byteswap: function for endian-ambivalent targets using virtio.
  virtio: allow byte swapping for vring and config access
  hw/net/virtio-net: use virtio wrappers to access headers.
  hw/net/virtio-balloon: use virtio wrappers to access page frame numbers.
  hw/block/virtio-blk: use virtio wrappers  to access headers.
  hw/scsi/virtio-scsi: use virtio wrappers  to access headers.
  hw/char/virtio-serial-bus: use virtio wrappers to access headers.


 hw/9pfs/virtio-9p-device.c|3 +
 hw/block/virtio-blk.c |   35 +-
 hw/char/virtio-serial-bus.c   |   34 +-
 hw/net/virtio-net.c   |   15 ++--
 hw/scsi/virtio-scsi.c |   33 +
 hw/virtio/virtio-balloon.c|3 +
 hw/virtio/virtio.c|   38 ++-
 include/hw/virtio/virtio-access.h |  132 +
 include/hw/virtio/virtio.h|2 +
 stubs/Makefile.objs   |1 
 stubs/virtio_get_byteswap.c   |6 ++
 11 files changed, 228 insertions(+), 74 deletions(-)
 create mode 100644 include/hw/virtio/virtio-access.h
 create mode 100644 stubs/virtio_get_byteswap.c

Best Regards.

-- 
Greg

[Qemu-devel] [PATCH 3/8] hw/net/virtio-net: use virtio wrappers to access headers.

2014-02-18 Thread Greg Kurz

From: Rusty Russell ru...@rustcorp.com.au

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
---
 hw/net/virtio-net.c |   15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 3626608..34d6b48 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -23,6 +23,7 @@
 #include hw/virtio/virtio-bus.h
 #include qapi/qmp/qjson.h
 #include monitor/monitor.h
+#include hw/virtio/virtio-access.h
 
 #define VIRTIO_NET_VM_VERSION11
 
@@ -72,8 +73,8 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t 
*config)
 VirtIONet *n = VIRTIO_NET(vdev);
 struct virtio_net_config netcfg;
 
-stw_p(netcfg.status, n-status);
-stw_p(netcfg.max_virtqueue_pairs, n-max_queues);
+virtio_stw_p(netcfg.status, n-status);
+virtio_stw_p(netcfg.max_virtqueue_pairs, n-max_queues);
 memcpy(netcfg.mac, n-mac, ETH_ALEN);
 memcpy(config, netcfg, n-config_size);
 }
@@ -618,7 +619,7 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
 
 s = iov_to_buf(iov, iov_cnt, 0, mac_data.entries,
sizeof(mac_data.entries));
-mac_data.entries = ldl_p(mac_data.entries);
+mac_data.entries = virtio_ldl_p(mac_data.entries);
 if (s != sizeof(mac_data.entries)) {
 goto error;
 }
@@ -645,7 +646,7 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
 
 s = iov_to_buf(iov, iov_cnt, 0, mac_data.entries,
sizeof(mac_data.entries));
-mac_data.entries = ldl_p(mac_data.entries);
+mac_data.entries = virtio_ldl_p(mac_data.entries);
 if (s != sizeof(mac_data.entries)) {
 goto error;
 }
@@ -690,7 +691,7 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, 
uint8_t cmd,
 NetClientState *nc = qemu_get_queue(n-nic);
 
 s = iov_to_buf(iov, iov_cnt, 0, vid, sizeof(vid));
-vid = lduw_p(vid);
+vid = virtio_lduw_p(vid);
 if (s != sizeof(vid)) {
 return VIRTIO_NET_ERR;
 }
@@ -727,7 +728,7 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
 return VIRTIO_NET_ERR;
 }
 
-queues = lduw_p(mq.virtqueue_pairs);
+queues = virtio_lduw_p(mq.virtqueue_pairs);
 
 if (queues  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN ||
 queues  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX ||
@@ -1026,7 +1027,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, 
const uint8_t *buf, size_t
 }
 
 if (mhdr_cnt) {
-stw_p(mhdr.num_buffers, i);
+virtio_stw_p(mhdr.num_buffers, i);
 iov_from_buf(mhdr_sg, mhdr_cnt,
  0,
  mhdr.num_buffers, sizeof mhdr.num_buffers);

[Qemu-devel] [PATCH 6/8] hw/scsi/virtio-scsi: use virtio wrappers to access headers.

2014-02-18 Thread Greg Kurz

From: Rusty Russell ru...@rustcorp.com.au

Note that st*_raw and ld*_raw are effectively replaced by st*_p and ld*_p.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
---
 hw/scsi/virtio-scsi.c |   33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 6610b3a..df6c0e2 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -18,6 +18,7 @@
 #include hw/scsi/scsi.h
 #include block/scsi.h
 #include hw/virtio/virtio-bus.h
+#include hw/virtio/virtio-access.h
 
 typedef struct VirtIOSCSIReq {
 VirtIOSCSI *dev;
@@ -313,12 +314,12 @@ static void virtio_scsi_command_complete(SCSIRequest *r, 
uint32_t status,
 req-resp.cmd-response = VIRTIO_SCSI_S_OK;
 req-resp.cmd-status = status;
 if (req-resp.cmd-status == GOOD) {
-req-resp.cmd-resid = tswap32(resid);
+req-resp.cmd-resid = virtio_tswap32(resid);
 } else {
 req-resp.cmd-resid = 0;
 sense_len = scsi_req_get_sense(r, req-resp.cmd-sense,
VIRTIO_SCSI_SENSE_SIZE);
-req-resp.cmd-sense_len = tswap32(sense_len);
+req-resp.cmd-sense_len = virtio_tswap32(sense_len);
 }
 virtio_scsi_complete_req(req);
 }
@@ -414,16 +415,16 @@ static void virtio_scsi_get_config(VirtIODevice *vdev,
 VirtIOSCSIConfig *scsiconf = (VirtIOSCSIConfig *)config;
 VirtIOSCSICommon *s = VIRTIO_SCSI_COMMON(vdev);
 
-stl_raw(scsiconf-num_queues, s-conf.num_queues);
-stl_raw(scsiconf-seg_max, 128 - 2);
-stl_raw(scsiconf-max_sectors, s-conf.max_sectors);
-stl_raw(scsiconf-cmd_per_lun, s-conf.cmd_per_lun);
-stl_raw(scsiconf-event_info_size, sizeof(VirtIOSCSIEvent));
-stl_raw(scsiconf-sense_size, s-sense_size);
-stl_raw(scsiconf-cdb_size, s-cdb_size);
-stw_raw(scsiconf-max_channel, VIRTIO_SCSI_MAX_CHANNEL);
-stw_raw(scsiconf-max_target, VIRTIO_SCSI_MAX_TARGET);
-stl_raw(scsiconf-max_lun, VIRTIO_SCSI_MAX_LUN);
+virtio_stl_p(scsiconf-num_queues, s-conf.num_queues);
+virtio_stl_p(scsiconf-seg_max, 128 - 2);
+virtio_stl_p(scsiconf-max_sectors, s-conf.max_sectors);
+virtio_stl_p(scsiconf-cmd_per_lun, s-conf.cmd_per_lun);
+virtio_stl_p(scsiconf-event_info_size, sizeof(VirtIOSCSIEvent));
+virtio_stl_p(scsiconf-sense_size, s-sense_size);
+virtio_stl_p(scsiconf-cdb_size, s-cdb_size);
+virtio_stw_p(scsiconf-max_channel, VIRTIO_SCSI_MAX_CHANNEL);
+virtio_stw_p(scsiconf-max_target, VIRTIO_SCSI_MAX_TARGET);
+virtio_stl_p(scsiconf-max_lun, VIRTIO_SCSI_MAX_LUN);
 }
 
 static void virtio_scsi_set_config(VirtIODevice *vdev,
@@ -432,14 +433,14 @@ static void virtio_scsi_set_config(VirtIODevice *vdev,
 VirtIOSCSIConfig *scsiconf = (VirtIOSCSIConfig *)config;
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
 
-if ((uint32_t) ldl_raw(scsiconf-sense_size) = 65536 ||
-(uint32_t) ldl_raw(scsiconf-cdb_size) = 256) {
+if ((uint32_t) virtio_ldl_p(scsiconf-sense_size) = 65536 ||
+(uint32_t) virtio_ldl_p(scsiconf-cdb_size) = 256) {
 error_report(bad data written to virtio-scsi configuration space);
 exit(1);
 }
 
-vs-sense_size = ldl_raw(scsiconf-sense_size);
-vs-cdb_size = ldl_raw(scsiconf-cdb_size);
+vs-sense_size = virtio_ldl_p(scsiconf-sense_size);
+vs-cdb_size = virtio_ldl_p(scsiconf-cdb_size);
 }
 
 static uint32_t virtio_scsi_get_features(VirtIODevice *vdev,

[Qemu-devel] [PATCH 5/8] hw/block/virtio-blk: use virtio wrappers to access headers.

2014-02-18 Thread Greg Kurz

From: Rusty Russell ru...@rustcorp.com.au

Note that st*_raw and ld*_raw are effectively replaced by st*_p and ld*_p.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
---
 hw/block/virtio-blk.c |   35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 8a568e5..f55f89c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -26,6 +26,7 @@
 # include scsi/sg.h
 #endif
 #include hw/virtio/virtio-bus.h
+#include hw/virtio/virtio-access.h
 
 typedef struct VirtIOBlockReq
 {
@@ -77,7 +78,7 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 trace_virtio_blk_rw_complete(req, ret);
 
 if (ret) {
-bool is_read = !(ldl_p(req-out-type)  VIRTIO_BLK_T_OUT);
+bool is_read = !(virtio_ldl_p(req-out-type)  VIRTIO_BLK_T_OUT);
 if (virtio_blk_handle_rw_error(req, -ret, is_read))
 return;
 }
@@ -224,12 +225,12 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 hdr.status = CHECK_CONDITION;
 }
 
-stl_p(req-scsi-errors,
-  hdr.status | (hdr.msg_status  8) |
-  (hdr.host_status  16) | (hdr.driver_status  24));
-stl_p(req-scsi-residual, hdr.resid);
-stl_p(req-scsi-sense_len, hdr.sb_len_wr);
-stl_p(req-scsi-data_len, hdr.dxfer_len);
+virtio_stl_p(req-scsi-errors,
+ hdr.status | (hdr.msg_status  8) |
+ (hdr.host_status  16) | (hdr.driver_status  24));
+virtio_stl_p(req-scsi-residual, hdr.resid);
+virtio_stl_p(req-scsi-sense_len, hdr.sb_len_wr);
+virtio_stl_p(req-scsi-data_len, hdr.dxfer_len);
 
 virtio_blk_req_complete(req, status);
 g_free(req);
@@ -240,7 +241,7 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 
 fail:
 /* Just put anything nonzero so that the ioctl fails in the guest.  */
-stl_p(req-scsi-errors, 255);
+virtio_stl_p(req-scsi-errors, 255);
 virtio_blk_req_complete(req, status);
 g_free(req);
 }
@@ -286,7 +287,7 @@ static void virtio_blk_handle_write(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
 BlockRequest *blkreq;
 uint64_t sector;
 
-sector = ldq_p(req-out-sector);
+sector = virtio_ldq_p(req-out-sector);
 
 bdrv_acct_start(req-dev-bs, req-acct, req-qiov.size, BDRV_ACCT_WRITE);
 
@@ -320,7 +321,7 @@ static void virtio_blk_handle_read(VirtIOBlockReq *req)
 {
 uint64_t sector;
 
-sector = ldq_p(req-out-sector);
+sector = virtio_ldq_p(req-out-sector);
 
 bdrv_acct_start(req-dev-bs, req-acct, req-qiov.size, BDRV_ACCT_READ);
 
@@ -358,7 +359,7 @@ static void virtio_blk_handle_request(VirtIOBlockReq *req,
 req-out = (void *)req-elem.out_sg[0].iov_base;
 req-in = (void *)req-elem.in_sg[req-elem.in_num - 1].iov_base;
 
-type = ldl_p(req-out-type);
+type = virtio_ldl_p(req-out-type);
 
 if (type  VIRTIO_BLK_T_FLUSH) {
 virtio_blk_handle_flush(req, mrb);
@@ -487,12 +488,12 @@ static void virtio_blk_update_config(VirtIODevice *vdev, 
uint8_t *config)
 
 bdrv_get_geometry(s-bs, capacity);
 memset(blkcfg, 0, sizeof(blkcfg));
-stq_raw(blkcfg.capacity, capacity);
-stl_raw(blkcfg.seg_max, 128 - 2);
-stw_raw(blkcfg.cylinders, s-conf-cyls);
-stl_raw(blkcfg.blk_size, blk_size);
-stw_raw(blkcfg.min_io_size, s-conf-min_io_size / blk_size);
-stw_raw(blkcfg.opt_io_size, s-conf-opt_io_size / blk_size);
+virtio_stq_p(blkcfg.capacity, capacity);
+virtio_stl_p(blkcfg.seg_max, 128 - 2);
+virtio_stw_p(blkcfg.cylinders, s-conf-cyls);
+virtio_stl_p(blkcfg.blk_size, blk_size);
+virtio_stw_p(blkcfg.min_io_size, s-conf-min_io_size / blk_size);
+virtio_stw_p(blkcfg.opt_io_size, s-conf-opt_io_size / blk_size);
 blkcfg.heads = s-conf-heads;
 /*
  * We must ensure that the block device capacity is a multiple of

[Qemu-devel] [PATCH 1/8] virtio_get_byteswap: function for endian-ambivalent targets using virtio.

2014-02-18 Thread Greg Kurz

From: Rusty Russell ru...@rustcorp.com.au

virtio data structures are defined as target endian, which assumes
that's a fixed value.  In fact, that actually means it's
platform-specific.

The OASIS virtio 1.0 spec will fix this.  Meanwhile, create a hook for
little endian ppc (and potentially ARM).  This is called at device
reset time (which is done before any driver is loaded) since it
may involve a system call to get the status when running under kvm.

[ fixed checkpatch.pl error with the virtio_byteswap initialisation,
  ldq_phys() API change, Greg Kurz gk...@linux.vnet.ibm.com ]
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
---
 hw/virtio/virtio.c|6 ++
 include/hw/virtio/virtio-access.h |  132 +
 include/hw/virtio/virtio.h|2 +
 stubs/Makefile.objs   |1 
 stubs/virtio_get_byteswap.c   |6 ++
 5 files changed, 147 insertions(+)
 create mode 100644 include/hw/virtio/virtio-access.h
 create mode 100644 stubs/virtio_get_byteswap.c

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index aeabf3a..4fd6ac2 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -19,6 +19,9 @@
 #include hw/virtio/virtio.h
 #include qemu/atomic.h
 #include hw/virtio/virtio-bus.h
+#include hw/virtio/virtio-access.h
+
+bool virtio_byteswap;
 
 /*
  * The alignment to use between consumer and producer parts of vring.
@@ -546,6 +549,9 @@ void virtio_reset(void *opaque)
 
 virtio_set_status(vdev, 0);
 
+/* We assume all devices are the same endian. */
+virtio_byteswap = virtio_get_byteswap();
+
 if (k-reset) {
 k-reset(vdev);
 }
diff --git a/include/hw/virtio/virtio-access.h 
b/include/hw/virtio/virtio-access.h
new file mode 100644
index 000..2e22a47
--- /dev/null
+++ b/include/hw/virtio/virtio-access.h
@@ -0,0 +1,132 @@
+/*
+ * Virtio Accessor Support: In case your target can change endian.
+ *
+ * Copyright IBM, Corp. 2013
+ *
+ * Authors:
+ *  Rusty Russell   ru...@au.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+#ifndef _QEMU_VIRTIO_ACCESS_H
+#define _QEMU_VIRTIO_ACCESS_H
+#include hw/virtio/virtio.h
+
+/* Initialized by virtio_get_byteswap() at any virtio device reset. */
+extern bool virtio_byteswap;
+
+static inline uint16_t virtio_lduw_phys(AddressSpace *as, hwaddr pa)
+{
+if (virtio_byteswap) {
+return bswap16(lduw_phys(as, pa));
+}
+return lduw_phys(as, pa);
+}
+
+static inline uint32_t virtio_ldl_phys(AddressSpace *as, hwaddr pa)
+{
+if (virtio_byteswap) {
+return bswap32(ldl_phys(as, pa));
+}
+return ldl_phys(as, pa);
+}
+
+static inline uint64_t virtio_ldq_phys(AddressSpace *as, hwaddr pa)
+{
+if (virtio_byteswap) {
+return bswap64(ldq_phys(as, pa));
+}
+return ldq_phys(as, pa);
+}
+
+static inline void virtio_stw_phys(AddressSpace *as, hwaddr pa, uint16_t value)
+{
+if (virtio_byteswap) {
+stw_phys(as, pa, bswap16(value));
+} else {
+stw_phys(as, pa, value);
+}
+}
+
+static inline void virtio_stl_phys(AddressSpace *as, hwaddr pa, uint32_t value)
+{
+if (virtio_byteswap) {
+stl_phys(as, pa, bswap32(value));
+} else {
+stl_phys(as, pa, value);
+}
+}
+
+static inline void virtio_stw_p(void *ptr, uint16_t v)
+{
+if (virtio_byteswap) {
+stw_p(ptr, bswap16(v));
+} else {
+stw_p(ptr, v);
+}
+}
+
+static inline void virtio_stl_p(void *ptr, uint32_t v)
+{
+if (virtio_byteswap) {
+stl_p(ptr, bswap32(v));
+} else {
+stl_p(ptr, v);
+}
+}
+
+static inline void virtio_stq_p(void *ptr, uint64_t v)
+{
+if (virtio_byteswap) {
+stq_p(ptr, bswap64(v));
+} else {
+stq_p(ptr, v);
+}
+}
+
+static inline int virtio_lduw_p(const void *ptr)
+{
+if (virtio_byteswap) {
+return bswap16(lduw_p(ptr));
+} else {
+return lduw_p(ptr);
+}
+}
+
+static inline int virtio_ldl_p(const void *ptr)
+{
+if (virtio_byteswap) {
+return bswap32(ldl_p(ptr));
+} else {
+return ldl_p(ptr);
+}
+}
+
+static inline uint64_t virtio_ldq_p(const void *ptr)
+{
+if (virtio_byteswap) {
+return bswap64(ldq_p(ptr));
+} else {
+return ldq_p(ptr);
+}
+}
+
+static inline uint32_t virtio_tswap32(uint32_t s)
+{
+if (virtio_byteswap) {
+return bswap32(tswap32(s));
+} else {
+return tswap32(s);
+}
+}
+
+static inline void virtio_tswap32s(uint32_t *s)
+{
+tswap32s(s);
+if (virtio_byteswap) {
+*s = bswap32(*s);
+}
+}
+#endif /* _QEMU_VIRTIO_ACCESS_H */
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 3e54e90..5009945 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -253,4 +253,6 @@ void

Re: [Qemu-devel] [PATCH] block/iscsi: fix deadlock on scsi check condition

2014-02-18 Thread Paolo Bonzini


Il 18/02/2014 13:08, Peter Lieven ha scritto:

the retry logic was broken because the complete status
of the task structure was not reset. this resulted in
an infinite loop retrying the command over and over.

CC: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 3c0b728..ded414e 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -145,12 +145,13 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int 
status,

 if (iTask-retries--  0  status == SCSI_STATUS_CHECK_CONDITION
  task-sense.key == SCSI_SENSE_UNIT_ATTENTION) {
+error_report(iSCSI CheckCondition: %s, iscsi_get_error(iscsi));
 iTask-do_retry = 1;
 goto out;
 }

 if (status != SCSI_STATUS_GOOD) {
-error_report(iSCSI: Failure. %s, iscsi_get_error(iscsi));
+error_report(iSCSI Failure: %s, iscsi_get_error(iscsi));
 }

 out:
@@ -325,6 +326,7 @@ retry:
 }

 if (iTask.do_retry) {
+iTask.complete = 0;
 goto retry;
 }

@@ -399,6 +401,7 @@ retry:
 }

 if (iTask.do_retry) {
+iTask.complete = 0;
 goto retry;
 }

@@ -433,6 +436,7 @@ retry:
 }

 if (iTask.do_retry) {
+iTask.complete = 0;
 goto retry;
 }

@@ -683,6 +687,7 @@ retry:
 scsi_free_scsi_task(iTask.task);
 iTask.task = NULL;
 }
+iTask.complete = 0;
 goto retry;
 }

@@ -767,6 +772,7 @@ retry:
 }

 if (iTask.do_retry) {
+iTask.complete = 0;
 goto retry;
 }

@@ -836,6 +842,7 @@ retry:
 }

 if (iTask.do_retry) {
+iTask.complete = 0;
 goto retry;
 }




Applied to scsi-next, thanks.

Paolo

[Qemu-devel] [PATCH 2/8] virtio: allow byte swapping for vring and config access

2014-02-18 Thread Greg Kurz

From: Rusty Russell ru...@rustcorp.com.au

This is based on a simpler patch by Anthony Liguouri, which only handled
the vring accesses.  We also need some drivers to access these helpers,
eg. for data which contains headers.

[ ldq_phys() API change, Greg Kurz gk...@linux.vnet.ibm.com ]
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
---
 hw/virtio/virtio.c |   32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 4fd6ac2..c2c3a7c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -108,49 +108,49 @@ static inline uint64_t vring_desc_addr(hwaddr desc_pa, 
int i)
 {
 hwaddr pa;
 pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, addr);
-return ldq_phys(address_space_memory, pa);
+return virtio_ldq_phys(address_space_memory, pa);
 }
 
 static inline uint32_t vring_desc_len(hwaddr desc_pa, int i)
 {
 hwaddr pa;
 pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, len);
-return ldl_phys(address_space_memory, pa);
+return virtio_ldl_phys(address_space_memory, pa);
 }
 
 static inline uint16_t vring_desc_flags(hwaddr desc_pa, int i)
 {
 hwaddr pa;
 pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, flags);
-return lduw_phys(address_space_memory, pa);
+return virtio_lduw_phys(address_space_memory, pa);
 }
 
 static inline uint16_t vring_desc_next(hwaddr desc_pa, int i)
 {
 hwaddr pa;
 pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, next);
-return lduw_phys(address_space_memory, pa);
+return virtio_lduw_phys(address_space_memory, pa);
 }
 
 static inline uint16_t vring_avail_flags(VirtQueue *vq)
 {
 hwaddr pa;
 pa = vq-vring.avail + offsetof(VRingAvail, flags);
-return lduw_phys(address_space_memory, pa);
+return virtio_lduw_phys(address_space_memory, pa);
 }
 
 static inline uint16_t vring_avail_idx(VirtQueue *vq)
 {
 hwaddr pa;
 pa = vq-vring.avail + offsetof(VRingAvail, idx);
-return lduw_phys(address_space_memory, pa);
+return virtio_lduw_phys(address_space_memory, pa);
 }
 
 static inline uint16_t vring_avail_ring(VirtQueue *vq, int i)
 {
 hwaddr pa;
 pa = vq-vring.avail + offsetof(VRingAvail, ring[i]);
-return lduw_phys(address_space_memory, pa);
+return virtio_lduw_phys(address_space_memory, pa);
 }
 
 static inline uint16_t vring_used_event(VirtQueue *vq)
@@ -162,44 +162,44 @@ static inline void vring_used_ring_id(VirtQueue *vq, int 
i, uint32_t val)
 {
 hwaddr pa;
 pa = vq-vring.used + offsetof(VRingUsed, ring[i].id);
-stl_phys(address_space_memory, pa, val);
+virtio_stl_phys(address_space_memory, pa, val);
 }
 
 static inline void vring_used_ring_len(VirtQueue *vq, int i, uint32_t val)
 {
 hwaddr pa;
 pa = vq-vring.used + offsetof(VRingUsed, ring[i].len);
-stl_phys(address_space_memory, pa, val);
+virtio_stl_phys(address_space_memory, pa, val);
 }
 
 static uint16_t vring_used_idx(VirtQueue *vq)
 {
 hwaddr pa;
 pa = vq-vring.used + offsetof(VRingUsed, idx);
-return lduw_phys(address_space_memory, pa);
+return virtio_lduw_phys(address_space_memory, pa);
 }
 
 static inline void vring_used_idx_set(VirtQueue *vq, uint16_t val)
 {
 hwaddr pa;
 pa = vq-vring.used + offsetof(VRingUsed, idx);
-stw_phys(address_space_memory, pa, val);
+virtio_stw_phys(address_space_memory, pa, val);
 }
 
 static inline void vring_used_flags_set_bit(VirtQueue *vq, int mask)
 {
 hwaddr pa;
 pa = vq-vring.used + offsetof(VRingUsed, flags);
-stw_phys(address_space_memory,
- pa, lduw_phys(address_space_memory, pa) | mask);
+virtio_stw_phys(address_space_memory,
+ pa, virtio_lduw_phys(address_space_memory, pa) | mask);
 }
 
 static inline void vring_used_flags_unset_bit(VirtQueue *vq, int mask)
 {
 hwaddr pa;
 pa = vq-vring.used + offsetof(VRingUsed, flags);
-stw_phys(address_space_memory,
- pa, lduw_phys(address_space_memory, pa)  ~mask);
+virtio_stw_phys(address_space_memory,
+pa, lduw_phys(address_space_memory, pa)  ~mask);
 }
 
 static inline void vring_avail_event(VirtQueue *vq, uint16_t val)
@@ -209,7 +209,7 @@ static inline void vring_avail_event(VirtQueue *vq, 
uint16_t val)
 return;
 }
 pa = vq-vring.used + offsetof(VRingUsed, ring[vq-vring.num]);
-stw_phys(address_space_memory, pa, val);
+virtio_stw_phys(address_space_memory, pa, val);
 }
 
 void virtio_queue_set_notification(VirtQueue *vq, int enable)

[Qemu-devel] [PATCH V18 05/12] quorum: Add quorum_aio_readv.

2014-02-18 Thread Benoît Canet

From: Benoît Canet ben...@irqsave.net

Add code to do num_children reads in parallel and cleanup the structures
afterwards.

Signed-off-by: Benoit Canet ben...@irqsave.net
Reviewed-by: Max Reitz mre...@redhat.com
---
 block/quorum.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/block/quorum.c b/block/quorum.c
index 99e0e03..fd19662 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -91,10 +91,18 @@ static AIOCBInfo quorum_aiocb_info = {
 
 static void quorum_aio_finalize(QuorumAIOCB *acb)
 {
-int ret = 0;
+BDRVQuorumState *s = acb-common.bs-opaque;
+int i, ret = 0;
 
 acb-common.cb(acb-common.opaque, ret);
 
+if (acb-is_read) {
+for (i = 0; i  s-num_children; i++) {
+qemu_vfree(acb-qcrs[i].buf);
+qemu_iovec_destroy(acb-qcrs[i].qiov);
+}
+}
+
 g_free(acb-qcrs);
 qemu_aio_release(acb);
 }
@@ -149,6 +157,34 @@ static void quorum_aio_cb(void *opaque, int ret)
 quorum_aio_finalize(acb);
 }
 
+static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+BDRVQuorumState *s = bs-opaque;
+QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
+  nb_sectors, cb, opaque);
+int i;
+
+acb-is_read = true;
+
+for (i = 0; i  s-num_children; i++) {
+acb-qcrs[i].buf = qemu_blockalign(s-bs[i], qiov-size);
+qemu_iovec_init(acb-qcrs[i].qiov, qiov-niov);
+qemu_iovec_clone(acb-qcrs[i].qiov, qiov, acb-qcrs[i].buf);
+}
+
+for (i = 0; i  s-num_children; i++) {
+bdrv_aio_readv(s-bs[i], sector_num, qiov, nb_sectors,
+   quorum_aio_cb, acb-qcrs[i]);
+}
+
+return acb-common;
+}
+
 static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
   int64_t sector_num,
   QEMUIOVector *qiov,
@@ -176,6 +212,7 @@ static BlockDriver bdrv_quorum = {
 
 .instance_size  = sizeof(BDRVQuorumState),
 
+.bdrv_aio_readv = quorum_aio_readv,
 .bdrv_aio_writev= quorum_aio_writev,
 };
 
-- 
1.8.3.2

[Qemu-devel] [PATCH 4/8] hw/net/virtio-balloon: use virtio wrappers to access page frame numbers.

2014-02-18 Thread Greg Kurz

From: Rusty Russell ru...@rustcorp.com.au

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
---
 hw/virtio/virtio-balloon.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a470a0b..5e7f26f 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -30,6 +30,7 @@
 #endif
 
 #include hw/virtio/virtio-bus.h
+#include hw/virtio/virtio-access.h
 
 static void balloon_page(void *addr, int deflate)
 {
@@ -192,7 +193,7 @@ static void virtio_balloon_handle_output(VirtIODevice 
*vdev, VirtQueue *vq)
 ram_addr_t pa;
 ram_addr_t addr;
 
-pa = (ram_addr_t)ldl_p(pfn)  VIRTIO_BALLOON_PFN_SHIFT;
+pa = (ram_addr_t)virtio_ldl_p(pfn)  VIRTIO_BALLOON_PFN_SHIFT;
 offset += 4;
 
 /* FIXME: remove get_system_memory(), but how? */

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Andreas Färber

Am 18.02.2014 12:16, schrieb Paolo Bonzini:
 Il 18/02/2014 12:09, Peter Maydell ha scritto:
  No, we've had that topic before: It's your job as submitter and
  maintainer to flag that appropriately in the commit message, as per
 QEMU
  Summit 2012.

 I don't think this workflow works. I have no idea what
 stable's criteria are, and if you rely on people
 adding a cc you're going to miss stuff.
 
 There isn't really a standard criterion.  It's up to each maintainer to
 be stricter or looser on what goes to stable.

The criteria is pretty simple: Was the breakage in the last release
already or was it introduced only intermittently.

It simply does not scale to have Michael or other stable maintainers
(mjt, me, ...) look at each commit from vX.Y to HEAD and decide whether
to backport or not. That's why that task of flagging as backport
*candidates* is pushed out to maintainers and recursively to authors
where applicable, to reduce the number of commits to sift through and to
allow to do this for actually committed patches rather than mails on the
mailing list that might not get committed or change subject.

What especially annoys me here is that Peter wants to play on Anthony's
level on the project but is openly ignoring both our stable releases as
a concept (we wouldn't need a release in the first place if we don't
care about it working!) and the procedures decided in his presence at
QEMU Summit (having maintainer/contributor flag it via Cc: line). If you
feel the conclusion we reached there is not working out, feel free to
bring this topic up on the KVM call later today - playing
Rumpelstilzchen and exempting you from what everyone else is doing is
not an acceptable solution. Either we all do it this way or we all
decide on another way. It was not my suggestion, just a proposed
solution to an issue that affects me, so I'm open to alternatives.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [PATCH 7/8] hw/char/virtio-serial-bus: use virtio wrappers to access headers.

2014-02-18 Thread Greg Kurz

From: Rusty Russell ru...@rustcorp.com.au

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
---
 hw/char/virtio-serial-bus.c |   34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 226e9f9..243bf31 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -24,6 +24,7 @@
 #include hw/sysbus.h
 #include trace.h
 #include hw/virtio/virtio-serial.h
+#include hw/virtio/virtio-access.h
 
 static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, uint32_t id)
 {
@@ -185,9 +186,9 @@ static size_t send_control_event(VirtIOSerial *vser, 
uint32_t port_id,
 {
 struct virtio_console_control cpkt;
 
-stl_p(cpkt.id, port_id);
-stw_p(cpkt.event, event);
-stw_p(cpkt.value, value);
+virtio_stl_p(cpkt.id, port_id);
+virtio_stw_p(cpkt.event, event);
+virtio_stw_p(cpkt.value, value);
 
 trace_virtio_serial_send_control_event(port_id, event, value);
 return send_control_msg(vser, cpkt, sizeof(cpkt));
@@ -291,8 +292,8 @@ static void handle_control_message(VirtIOSerial *vser, void 
*buf, size_t len)
 return;
 }
 
-cpkt.event = lduw_p(gcpkt-event);
-cpkt.value = lduw_p(gcpkt-value);
+cpkt.event = virtio_lduw_p(gcpkt-event);
+cpkt.value = virtio_lduw_p(gcpkt-value);
 
 trace_virtio_serial_handle_control_message(cpkt.event, cpkt.value);
 
@@ -312,10 +313,10 @@ static void handle_control_message(VirtIOSerial *vser, 
void *buf, size_t len)
 return;
 }
 
-port = find_port_by_id(vser, ldl_p(gcpkt-id));
+port = find_port_by_id(vser, virtio_ldl_p(gcpkt-id));
 if (!port) {
 error_report(virtio-serial-bus: Unexpected port id %u for device %s,
- ldl_p(gcpkt-id), vser-bus.qbus.name);
+ virtio_ldl_p(gcpkt-id), vser-bus.qbus.name);
 return;
 }
 
@@ -342,9 +343,9 @@ static void handle_control_message(VirtIOSerial *vser, void 
*buf, size_t len)
 }
 
 if (port-name) {
-stl_p(cpkt.id, port-id);
-stw_p(cpkt.event, VIRTIO_CONSOLE_PORT_NAME);
-stw_p(cpkt.value, 1);
+virtio_stl_p(cpkt.id, port-id);
+virtio_stw_p(cpkt.event, VIRTIO_CONSOLE_PORT_NAME);
+virtio_stw_p(cpkt.value, 1);
 
 buffer_len = sizeof(cpkt) + strlen(port-name) + 1;
 buffer = g_malloc(buffer_len);
@@ -536,7 +537,7 @@ static void virtio_serial_save(QEMUFile *f, void *opaque)
 qemu_put_be32s(f, s-config.max_nr_ports);
 
 /* The ports map */
-max_nr_ports = tswap32(s-config.max_nr_ports);
+max_nr_ports = virtio_tswap32(s-config.max_nr_ports);
 for (i = 0; i  (max_nr_ports + 31) / 32; i++) {
 qemu_put_be32s(f, s-ports_map[i]);
 }
@@ -690,8 +691,8 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 qemu_get_be16s(f, s-config.rows);
 
 qemu_get_be32s(f, max_nr_ports);
-tswap32s(max_nr_ports);
-if (max_nr_ports  tswap32(s-config.max_nr_ports)) {
+virtio_tswap32s(max_nr_ports);
+if (max_nr_ports  virtio_tswap32(s-config.max_nr_ports)) {
 /* Source could have had more ports than us. Fail migration. */
 return -EINVAL;
 }
@@ -760,7 +761,7 @@ static uint32_t find_free_port_id(VirtIOSerial *vser)
 {
 unsigned int i, max_nr_ports;
 
-max_nr_ports = tswap32(vser-config.max_nr_ports);
+max_nr_ports = virtio_tswap32(vser-config.max_nr_ports);
 for (i = 0; i  (max_nr_ports + 31) / 32; i++) {
 uint32_t map, bit;
 
@@ -846,7 +847,7 @@ static int virtser_port_qdev_init(DeviceState *qdev)
 }
 }
 
-max_nr_ports = tswap32(port-vser-config.max_nr_ports);
+max_nr_ports = virtio_tswap32(port-vser-config.max_nr_ports);
 if (port-id = max_nr_ports) {
 error_report(virtio-serial-bus: Out-of-range port id specified, max. 
allowed: %u,
  max_nr_ports - 1);
@@ -949,7 +950,8 @@ static void virtio_serial_device_realize(DeviceState *dev, 
Error **errp)
 vser-ovqs[i] = virtio_add_queue(vdev, 128, handle_output);
 }
 
-vser-config.max_nr_ports = tswap32(vser-serial.max_virtserial_ports);
+vser-config.max_nr_ports =
+virtio_tswap32(vser-serial.max_virtserial_ports);
 vser-ports_map = g_malloc0(((vser-serial.max_virtserial_ports + 31) / 32)
 * sizeof(vser-ports_map[0]));
 /*

[Qemu-devel] [PATCH 8/8] hw/9pfs/virtio_9p_device: use virtio wrappers to access headers.

2014-02-18 Thread Greg Kurz

Note that st*_raw and ld*_raw are effectively replaced by st*_p and ld*_p.

Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
---
 hw/9pfs/virtio-9p-device.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 15a4983..b3161ec 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -19,6 +19,7 @@
 #include fsdev/qemu-fsdev.h
 #include virtio-9p-xattr.h
 #include virtio-9p-coth.h
+#include hw/virtio/virtio-access.h
 
 static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features)
 {
@@ -34,7 +35,7 @@ static void virtio_9p_get_config(VirtIODevice *vdev, uint8_t 
*config)
 
 len = strlen(s-tag);
 cfg = g_malloc0(sizeof(struct virtio_9p_config) + len);
-stw_raw(cfg-tag_len, len);
+virtio_stw_p(cfg-tag_len, len);
 /* We don't copy the terminating null to config space */
 memcpy(cfg-tag, s-tag, len);
 memcpy(config, cfg, s-config_size);

Re: [Qemu-devel] [PULL 00/20] acpi,pc,pci fixes and enhancements

2014-02-18 Thread Stefano Stabellini

On Tue, 18 Feb 2014, Paolo Bonzini wrote:
 Il 18/02/2014 13:16, Stefano Stabellini ha scritto:
  It looks like that this series breaks disk unplug
  (hw/ide/piix.c:pci_piix3_xen_ide_unplug).
  
  I bisected it and the problem is caused by:
  
  commit 5e95494380ecf83c97d28f72134ab45e0cace8f9
  Author: Igor Mammedov imamm...@redhat.com
  Date:   Wed Feb 5 16:36:52 2014 +0100
  
  hw/pci: switch to a generic hotplug handling for PCIDevice
  
  make qdev_unplug()/device_set_realized() to call hotplug handler's
  plug/unplug methods if available and remove not needed anymore
  hot(un)plug handling from PCIDevice.
  
  In case if hotplug handler is not available, revert to the legacy
  hotplug method for compatibility with not yet converted buses.
  
  Signed-off-by: Igor Mammedov imamm...@redhat.com
  Reviewed-by: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  
  
 
 What exactly breaks?

Disk unplug: hw/ide/piix.c:pci_piix3_xen_ide_unplug (see the beginning
of the email :-P).
It is called by hw/xen/xen_platform.c:platform_fixed_ioport_writew, in
response to the guest writing to a magic ioport specifically to unplug
the emulated disk.
With this patch after the guest boots I can still access both xvda and
sda for the same disk, leading to fs corruptions.

Re: [Qemu-devel] [RFC PATCH v2 01/12] mc: add documentation for micro-checkpointing

2014-02-18 Thread Dr. David Alan Gilbert

* mrhi...@linux.vnet.ibm.com (mrhi...@linux.vnet.ibm.com) wrote:
 From: Michael R. Hines mrhi...@us.ibm.com
 
 Wiki: http://wiki.qemu.org/Features/MicroCheckpointing
 Github: g...@github.com:hinesmr/qemu.git, 'mc' branch
 
 NOTE: This is a direct copy of the QEMU wiki page for the convenience
 of the review process. Since this series very much in flux, instead of
 maintaing two copies of documentation in two different formats, this
 documentation will be properly formatted in the future when the review
 process has completed.

It seems to be picking up some truncations as well.

 +The Micro-Checkpointing Process
 +Basic Algorithm
 +Micro-Checkpoints (MC) work against the existing live migration path in 
 QEMU, and can effectively be understood as a live migration that never 
 ends. As such, iteration rounds happen at the granularity of 10s of 
 milliseconds and perform the following steps:
 +
 +1. After N milliseconds, stop the VM.
 +3. Generate a MC by invoking the live migration software path to identify 
 and copy dirty memory into a local staging area inside QEMU.
 +4. Resume the VM immediately so that it can make forward progress.
 +5. Transmit the checkpoint to the destination.
 +6. Repeat
 +Upon failure, load the contents of the last MC at the destination back into 
 memory and run the VM normally.

Later you talk about the memory allocation and how you grow the memory as needed
to fit the checkpoint, have you tried going the other way and triggering the
checkpoints sooner if they're taking too much memory?

 +1. MC over TCP/IP: Once the socket connection breaks, we assume
 failure. This happens very early in the loss of the latest MC not only
 because a very large amount of bytes is typically being sequenced in a
 TCP stream but perhaps also because of the timeout in acknowledgement
 of the receipt of a commit message by the destination.
 +
 +2. MC over RDMA: Since Infiniband does not provide any underlying
 timeout mechanisms, this implementation enhances QEMU's RDMA migration
 protocol to include a simple keep-alive. Upon the loss of multiple
 keep-alive messages, the sender is deemed to have failed.
 +
 +In both cases, either due to a failed TCP socket connection or lost RDMA 
 keep-alive group, both the sender or the receiver can be deemed to have 
 failed.
 +
 +If the sender is deemed to have failed, the destination takes over 
 immediately using the contents of the last checkpoint.
 +
 +If the destination is deemed to be lost, we perform the same action
 as a live migration: resume the sender normally and wait for management
 software to make a policy decision about whether or not to re-protect
 the VM, which may involve a third-party to identify a new destination
host again to use as a backup for the VM.

In this world what is making the decision about whether the sender/destination
should win - how do you avoid a split brain situation where both
VMs are running but the only thing that failed is the comms between them?
Is there any guarantee that you'll have received knowledge of the comms
failure before you pull the plug out and enable the corked packets to be
sent on the sender side?

snip

 +RDMA is used for two different reasons:
 +
 +1. Checkpoint generation (RDMA-based memcpy):
 +2. Checkpoint transmission
 +Checkpoint generation must be done while the VM is paused. In the
 worst case, the size of the checkpoint can be equal in size to the amount
 of memory in total use by the VM. In order to resume VM execution as
 fast as possible, the checkpoint is copied consistently locally into
 a staging area before transmission. A standard memcpy() of potentially
 such a large amount of memory not only gets no use out of the CPU cache
 but also potentially clogs up the CPU pipeline which would otherwise
 be useful by other neighbor VMs on the same physical node that could be
 scheduled for execution. To minimize the effect on neighbor VMs, we use
 RDMA to perform a local memcpy(), bypassing the host processor. On
 more recent processors, a 'beefy' enough memory bus architecture can
 move memory just as fast (sometimes faster) as a pure-software CPU-only
 optimized memcpy() from libc. However, on older computers, this feature
 only gives you the benefit of lower CPU-utilization at the expense of

Isn't there a generic kernel DMA ABI for doing this type of thing (I
think there was at one point, people have suggested things like using
graphics cards to do it but I don't know if it ever happened).
The other question is, do you always need to copy - what about something
like COWing the pages?

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] Fwd: Trying to write a new device / virtio-i2c ?

2014-02-18 Thread Alex David

2014-02-17 17:11 GMT+01:00 Paolo Bonzini pbonz...@redhat.com:

 Il 17/02/2014 16:33, Alex David ha scritto:

  If you need more than one bus, you need a new device exposing the
 I2C bus, besides the new sensor devices.  USB-I2C could be one such
 device.

 So let me see if I understood well. USB-I2C (host QEMU device) seems a
 good idea, I could normally do : qemu-system-i386 -device
 usb-I2c,chardev=foo -device usb-i2c,chardev=bar -chardev
 socket,path=/tmp/test0,server,nowait,id=foo -chardev
 socket,path=/tmp/test1,server,nowait,id=bar.


 Almost. For QOM:

 -device usb-i2c,id=usb-i2c-0
 -device i2c-my-sensor,address=0x48,bus=usb-i2c-0.0

 For chardev:

 -device usb-i2c,id=usb-i2c-0
 -chardev socket,path=/tmp/test0,server,nowait,id=chr-foo-0
 -device i2c-my-sensor,address=0x48,bus=usb-i2c-0.0,chardev=chr-foo-0

 Repeat for the other buses, replacing -0 with -1 and -2.


  I need a USB-I2C guest kernel driver that would register a bus (i2c-1
 for chardev foo, i2c-2 for chardev bar etc...), I guess ?


 It exists already, drivers/i2c/busses/i2c-tiny-usb.c.


I'm now trying to write this new USB-I2C device, I look through some of the
code (dev-serial, dev-...), I used dev-serial code and removed the code
that wanted to declare a chardev (I don't need one to create my bus, right
?).

Now I'm wondering, how will be i2c-tiny-usb launched and declare a new
/dev/i2c-N, with my device. I found :

static struct usb_device_id i2c_tiny_usb_table [] = {
{ USB_DEVICE(0x0403, 0xc631) },   /* FTDI */
{ USB_DEVICE(0x1c40, 0x0534) },   /* EZPrototypes */
{ }   /* Terminating entry */
};

So I figured my USB-I2C should register using these numbers ?

static const USBDesc desc_i2c = {
.id = {
.idVendor  = 0x0403,
.idProduct = 0xc631,
.bcdDevice = 0x0400,
.iManufacturer = STR_MANUFACTURER,
.iProduct  = STR_PRODUCT_I2C,
.iSerialNumber = STR_SERIALNUMBER,
},
.full = desc_device,
.str  = desc_strings,
};

However, this doesn't work, after a $ qemu-system-i386
debian_wheezy_i386_standard.qcow2 -usb -device USB-I2C,id=usb-i2c-0, the
driver doesn't seem to be used (/sys/class/i2c-dev/i2c-0/name isn't for
it...), even if I modprobe i2c-tiny-usb...
The dmesg shows it doesn't go into the probe function.

What did I miss ?

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Alexander Graf


On 02/18/2014 01:37 PM, Peter Maydell wrote:

On 18 February 2014 12:17, Alexander Graf ag...@suse.de wrote:

On 02/18/2014 12:22 PM, Peter Maydell wrote:

My criteria for ARM in the past has typically been there's
a new release every three months, anything that got past
the release testing process for release N is sufficiently
non-critical it can just go into release N+1.

Unfortunately this doesn't work for distributions. Distros
need to maintain a stable branch for the lifetime of a release
to ensure that we're reasonably regression free.

If you indicate that this doesn't apply to ARM it basically means you admit
that ARM systems are not yet ready for stable use by customers when they
want to use KVM. At least at the point when we agree that customers do want
to run on a stable base for virtualization on ARM we need a working -stable
system for critical fixes.

I agree in general that ARM support needs to move from
its traditional this is just a dev tool situation to
a broader level of support/stability guarantees for KVM.
(We're not yet guaranteeing cross-version migration,
for another example there.)


Yup. I think it's reasonably to not declare ARM a stable target at the 
current point in time. But I think we agree that we want to change that 
- timeframe wise probably around the release after 2.0 at which point 
hopefully PCI and virtio 1.0 have settled.



However again we run into the definition of what's a
critical fix?. I think if distros need a stable branch
then they need to be prepared to do the work of sorting
through what counts as a critical fix that needs to be
ported to that branch. (For instance, which boards and
targets do they care about?)


I think this is up for discussion. If I had to declare anything, I 
wouldn't consider anything but the virt machine as supported for example 
- similar to how x86 only really considers the pc machine type stable.



For instance patch 3 only applies to the integrator
board, and I don't consider the guest-to-host border
to be a security boundary for most of our legacy board
models: there's just too much unaudited device code for
that to be trustable.


Yes, I fully agree. Traditionally what I've done is to reply to patches 
that I consider stable material and nag the maintainer about CCing it. 
After a while people got so afraid of my emails that they started doing 
the CC themselves :). But in case of the integrator board I personally 
wouldn't bother ;).



Alex

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Andreas Färber

Am 18.02.2014 13:51, schrieb Alexander Graf:
 Traditionally what I've done is to reply to patches
 that I consider stable material and nag the maintainer about CCing it.
 After a while people got so afraid of my emails that they started doing
 the CC themselves :). But in case of the integrator board I personally
 wouldn't bother ;).

Well, I have been doing exactly that here, asking Peter to add the CC,
and still Peter is refusing to add that single line before applying the
patch to his branch... :(

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Alexander Graf


On 02/18/2014 01:53 PM, Andreas Färber wrote:

Am 18.02.2014 13:51, schrieb Alexander Graf:

Traditionally what I've done is to reply to patches
that I consider stable material and nag the maintainer about CCing it.
After a while people got so afraid of my emails that they started doing
the CC themselves :). But in case of the integrator board I personally
wouldn't bother ;).

Well, I have been doing exactly that here, asking Peter to add the CC,
and still Peter is refusing to add that single line before applying the
patch to his branch... :(


Peter, can you please put CC stable on them? Let's consider this a 
training exercise on how to add a stable tag and call it a day ;).



Alex

Re: [Qemu-devel] [PATCH v4] qdev: Keep global allocation counter per bus

2014-02-18 Thread Andreas Färber

Am 18.02.2014 13:54, schrieb Markus Armbruster:
 Peter, can you merge this patch?  v2 is from December 20, and the
 changes since then have been limited to the commit message.
 
 In case Peter doesn't want to take it directly: Andreas, would you be
 willing to take it through your tree?  It's not really QOM, but your
 tree is the closest fit I can find.

Last thing I read was some open discussions related to migration
problems, so I didn't take it yet. If those are resolved, as a qdev
patch I would take it through qom-next, if mst acks for the PC part.

Regards,
Andreas

 
 Markus Armbruster arm...@redhat.com writes:
 
 From: Alexander Graf ag...@suse.de

 When we have 2 separate qdev devices that both create a qbus of the
 same type without specifying a bus name or device name, we end up
 with two buses of the same name, such as ide.0 on the Mac machines:

   dev: macio-ide, id 
 bus: ide.0
   type IDE
   dev: macio-ide, id 
 bus: ide.0
   type IDE

 If we now spawn a device that connects to a ide.0 the last created
 bus gets the device, with the first created bus inaccessible to the
 command line.

 After some discussion on IRC we concluded that the best quick fix way
 forward for this is to make automated bus-class type based allocation
 count a global counter. That's what this patch implements. With this
 we instead get

   dev: macio-ide, id 
 bus: ide.1
   type IDE
   dev: macio-ide, id 
 bus: ide.0
   type IDE

 on the example mentioned above.

 This also means that if you did -device ...,bus=ide.0 you got a device
 on the first bus (the last created one) before this patch and get that
 device on the second one (the first created one) now.  Breaks
 migration unless you change bus=ide.0 to bus=ide.1 on the destination.

 This is intended and makes the bus enumeration work as expected.

 As per review request follows a list of otherwise affected boards and
 the reasoning for the conclusion that they are ok:

target  machine bus id  times
--  --- --  -

aarch64 n800i2c-bus.0   2
aarch64 n810i2c-bus.0   2
arm n800i2c-bus.0   2
arm n810i2c-bus.0   2

 - Devices are only created explicitly on one of the two buses, using
s-mpu-i2c[0], so no change to the guest.

aarch64 vexpress-a15virtio-mmio-bus.0   4
aarch64 vexpress-a9 virtio-mmio-bus.0   4
aarch64 virtvirtio-mmio-bus.0   32
arm vexpress-a15virtio-mmio-bus.0   4
arm vexpress-a9 virtio-mmio-bus.0   4
arm virtvirtio-mmio-bus.0   32

 - Makes -device bus= work for all virtio-mmio buses.  Breaks
migration.  Workaround for migration from old to new: specify
virtio-mmio-bus.4 or .32 respectively rather than .0 on the
destination.

aarch64 xilinx-zynq-a9  usb-bus.0   2
arm xilinx-zynq-a9  usb-bus.0   2
mips64elfulong2eusb-bus.0   2

 - Normal USB operation not affected. Migration driver needs command
line to use the other bus.

i386isapc   ide.0   2
x86_64  isapc   ide.0   2
mipsmipside.0   2
mips64  mipside.0   2
mips64elmipside.0   2
mipsel  mipside.0   2
ppc g3beige ide.0   2
ppc mac99   ide.0   2
ppc prepide.0   2
ppc64   g3beige ide.0   2
ppc64   mac99   ide.0   2
ppc64   prepide.0   2

 - Makes -device bus= work for all IDE buses.  Breaks migration.
Workaround for migration from old to new: specify ide.1 rather than
ide.0 on the destination.

 CC: Paolo Bonzini pbonz...@redhat.com
 CC: Anthony Liguori aligu...@amazon.com
 Signed-off-by: Alexander Graf ag...@suse.de
 Signed-off-by: Markus Armbruster arm...@redhat.com
 ---
 v1 - v2:

   - add fix for isapc which was searching for 2 buses called ide.0
   - explain the semantic change more in the commit message

 v2 - v3:

   - add board list to commit message

 v3 - v4:

   - Explain impact on migration more clearly in commit message.

  hw/core/qdev.c | 20 +---
  hw/i386/pc_piix.c  |  8 +++-
  include/hw/qdev-core.h |  2 ++
  3 files changed, 22 insertions(+), 8 deletions(-)

 diff --git a/hw/core/qdev.c b/hw/core/qdev.c
 index 82a9123..e7985fe 100644
 --- a/hw/core/qdev.c
 +++ b/hw/core/qdev.c
 @@ -430,27 +430,33 @@ DeviceState *qdev_find_recursive(BusState *bus, const 
 char *id)
  static void qbus_realize(BusState *bus, DeviceState *parent, const char 
 *name)
  {
  const char *typename =

Re: [Qemu-devel] [PATCH v4] qdev: Keep global allocation counter per bus

2014-02-18 Thread Markus Armbruster

Peter, can you merge this patch?  v2 is from December 20, and the
changes since then have been limited to the commit message.

In case Peter doesn't want to take it directly: Andreas, would you be
willing to take it through your tree?  It's not really QOM, but your
tree is the closest fit I can find.

Markus Armbruster arm...@redhat.com writes:

 From: Alexander Graf ag...@suse.de

 When we have 2 separate qdev devices that both create a qbus of the
 same type without specifying a bus name or device name, we end up
 with two buses of the same name, such as ide.0 on the Mac machines:

   dev: macio-ide, id 
 bus: ide.0
   type IDE
   dev: macio-ide, id 
 bus: ide.0
   type IDE

 If we now spawn a device that connects to a ide.0 the last created
 bus gets the device, with the first created bus inaccessible to the
 command line.

 After some discussion on IRC we concluded that the best quick fix way
 forward for this is to make automated bus-class type based allocation
 count a global counter. That's what this patch implements. With this
 we instead get

   dev: macio-ide, id 
 bus: ide.1
   type IDE
   dev: macio-ide, id 
 bus: ide.0
   type IDE

 on the example mentioned above.

 This also means that if you did -device ...,bus=ide.0 you got a device
 on the first bus (the last created one) before this patch and get that
 device on the second one (the first created one) now.  Breaks
 migration unless you change bus=ide.0 to bus=ide.1 on the destination.

 This is intended and makes the bus enumeration work as expected.

 As per review request follows a list of otherwise affected boards and
 the reasoning for the conclusion that they are ok:

target  machine bus id  times
--  --- --  -

aarch64 n800i2c-bus.0   2
aarch64 n810i2c-bus.0   2
arm n800i2c-bus.0   2
arm n810i2c-bus.0   2

 - Devices are only created explicitly on one of the two buses, using
s-mpu-i2c[0], so no change to the guest.

aarch64 vexpress-a15virtio-mmio-bus.0   4
aarch64 vexpress-a9 virtio-mmio-bus.0   4
aarch64 virtvirtio-mmio-bus.0   32
arm vexpress-a15virtio-mmio-bus.0   4
arm vexpress-a9 virtio-mmio-bus.0   4
arm virtvirtio-mmio-bus.0   32

 - Makes -device bus= work for all virtio-mmio buses.  Breaks
migration.  Workaround for migration from old to new: specify
virtio-mmio-bus.4 or .32 respectively rather than .0 on the
destination.

aarch64 xilinx-zynq-a9  usb-bus.0   2
arm xilinx-zynq-a9  usb-bus.0   2
mips64elfulong2eusb-bus.0   2

 - Normal USB operation not affected. Migration driver needs command
line to use the other bus.

i386isapc   ide.0   2
x86_64  isapc   ide.0   2
mipsmipside.0   2
mips64  mipside.0   2
mips64elmipside.0   2
mipsel  mipside.0   2
ppc g3beige ide.0   2
ppc mac99   ide.0   2
ppc prepide.0   2
ppc64   g3beige ide.0   2
ppc64   mac99   ide.0   2
ppc64   prepide.0   2

 - Makes -device bus= work for all IDE buses.  Breaks migration.
Workaround for migration from old to new: specify ide.1 rather than
ide.0 on the destination.

 CC: Paolo Bonzini pbonz...@redhat.com
 CC: Anthony Liguori aligu...@amazon.com
 Signed-off-by: Alexander Graf ag...@suse.de
 Signed-off-by: Markus Armbruster arm...@redhat.com
 ---
 v1 - v2:

   - add fix for isapc which was searching for 2 buses called ide.0
   - explain the semantic change more in the commit message

 v2 - v3:

   - add board list to commit message

 v3 - v4:

   - Explain impact on migration more clearly in commit message.

  hw/core/qdev.c | 20 +---
  hw/i386/pc_piix.c  |  8 +++-
  include/hw/qdev-core.h |  2 ++
  3 files changed, 22 insertions(+), 8 deletions(-)

 diff --git a/hw/core/qdev.c b/hw/core/qdev.c
 index 82a9123..e7985fe 100644
 --- a/hw/core/qdev.c
 +++ b/hw/core/qdev.c
 @@ -430,27 +430,33 @@ DeviceState *qdev_find_recursive(BusState *bus, const 
 char *id)
  static void qbus_realize(BusState *bus, DeviceState *parent, const char 
 *name)
  {
  const char *typename = object_get_typename(OBJECT(bus));
 +BusClass *bc;
  char *buf;
 -int i,len;
 +int i, len, bus_id;
  
  bus-parent = parent;
  
  if (name) {
  bus-name = g_strdup(name);
  } else if (bus-parent  bus-parent-id) {
 -/* parent device has

[Qemu-devel] [PATCH V18 07/12] quorum: Add quorum_getlength().

2014-02-18 Thread Benoît Canet

From: Benoît Canet ben...@irqsave.net

Check that every bs file returns the same length.
Otherwise, return -EIO to disable the quorum and
avoid length discrepancy.

Signed-off-by: Benoit Canet ben...@irqsave.net
Reviewed-by: Max Reitz mre...@redhat.com
---
 block/quorum.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 7acaa97..131ffbf 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -595,12 +595,38 @@ static BlockDriverAIOCB 
*quorum_aio_writev(BlockDriverState *bs,
 return acb-common;
 }
 
+static int64_t quorum_getlength(BlockDriverState *bs)
+{
+BDRVQuorumState *s = bs-opaque;
+int64_t result;
+int i;
+
+/* check that all file have the same length */
+result = bdrv_getlength(s-bs[0]);
+if (result  0) {
+return result;
+}
+for (i = 1; i  s-num_children; i++) {
+int64_t value = bdrv_getlength(s-bs[i]);
+if (value  0) {
+return value;
+}
+if (value != result) {
+return -EIO;
+}
+}
+
+return result;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= quorum,
 .protocol_name  = quorum,
 
 .instance_size  = sizeof(BDRVQuorumState),
 
+.bdrv_getlength = quorum_getlength,
+
 .bdrv_aio_readv = quorum_aio_readv,
 .bdrv_aio_writev= quorum_aio_writev,
 };
-- 
1.8.3.2

Re: [Qemu-devel] [PATCH 0/3] ARM: three easy patches for coverity-reported issues

2014-02-18 Thread Peter Maydell

On 18 February 2014 12:44, Andreas Färber afaer...@suse.de wrote:
 What especially annoys me here is that Peter wants to play on Anthony's
 level on the project but is openly ignoring both our stable releases as
 a concept (we wouldn't need a release in the first place if we don't
 care about it working!) and the procedures decided in his presence at
 QEMU Summit (having maintainer/contributor flag it via Cc: line). If you
 feel the conclusion we reached there is not working out, feel free to
 bring this topic up on the KVM call later today - playing
 Rumpelstilzchen and exempting you from what everyone else is doing is
 not an acceptable solution. Either we all do it this way or we all
 decide on another way. It was not my suggestion, just a proposed
 solution to an issue that affects me, so I'm open to alternatives.

I'm just pointing out (as I do pretty much any time somebody
says hey you should have cc'd stable) that this workflow
isn't working for me as a contributor or as a submaintainer.
If you can document and define how it's actually supposed to
work and what the policy is for what counts as a bugfix that
should target stable (I couldn't find anything on the wiki)
then that might help. We should probably discuss this on the
call, yes.

thanks
-- PMM

Re: [Qemu-devel] Fwd: Trying to write a new device / virtio-i2c ?

2014-02-18 Thread Paolo Bonzini


Il 18/02/2014 13:48, Alex David ha scritto:


2014-02-17 17:11 GMT+01:00 Paolo Bonzini pbonz...@redhat.com
mailto:pbonz...@redhat.com:

Il 17/02/2014 16:33, Alex David ha scritto:

If you need more than one bus, you need a new device
exposing the
I2C bus, besides the new sensor devices.  USB-I2C could be
one such
device.

So let me see if I understood well. USB-I2C (host QEMU device)
seems a
good idea, I could normally do : qemu-system-i386 -device
usb-I2c,chardev=foo -device usb-i2c,chardev=bar -chardev
socket,path=/tmp/test0,server,__nowait,id=foo -chardev
socket,path=/tmp/test1,server,__nowait,id=bar.


Almost. For QOM:

-device usb-i2c,id=usb-i2c-0
-device i2c-my-sensor,address=0x48,__bus=usb-i2c-0.0

For chardev:

-device usb-i2c,id=usb-i2c-0
-chardev socket,path=/tmp/test0,server,__nowait,id=chr-foo-0
-device
i2c-my-sensor,address=0x48,__bus=usb-i2c-0.0,chardev=chr-__foo-0

Repeat for the other buses, replacing -0 with -1 and -2.


I need a USB-I2C guest kernel driver that would register a bus
(i2c-1
for chardev foo, i2c-2 for chardev bar etc...), I guess ?


It exists already, drivers/i2c/busses/i2c-tiny-usb.c.


I'm now trying to write this new USB-I2C device, I look through some of
the code (dev-serial, dev-...), I used dev-serial code and removed the
code that wanted to declare a chardev (I don't need one to create my
bus, right ?).

Now I'm wondering, how will be i2c-tiny-usb launched and declare a new
/dev/i2c-N, with my device. I found :

static struct usb_device_id i2c_tiny_usb_table [] = {
{ USB_DEVICE(0x0403, 0xc631) },   /* FTDI */
{ USB_DEVICE(0x1c40, 0x0534) },   /* EZPrototypes */
{ }   /* Terminating entry */
};

So I figured my USB-I2C should register using these numbers ?

static const USBDesc desc_i2c = {
.id = {
.idVendor  = 0x0403,
.idProduct = 0xc631,
.bcdDevice = 0x0400,
.iManufacturer = STR_MANUFACTURER,
.iProduct  = STR_PRODUCT_I2C,
.iSerialNumber = STR_SERIALNUMBER,
},
.full = desc_device,
.str  = desc_strings,
};

However, this doesn't work, after a $ qemu-system-i386
debian_wheezy_i386_standard.qcow2 -usb -device USB-I2C,id=usb-i2c-0, the
driver doesn't seem to be used (/sys/class/i2c-dev/i2c-0/name isn't for
it...), even if I modprobe i2c-tiny-usb...
The dmesg shows it doesn't go into the probe function.

What did I miss ?


I honestly don't know, you'll have to debug it.

Paolo

Re: [Qemu-devel] [PATCH 2/8] virtio: allow byte swapping for vring and config access

2014-02-18 Thread Cornelia Huck

On Tue, 18 Feb 2014 13:38:54 +0100
Greg Kurz gk...@linux.vnet.ibm.com wrote:

 From: Rusty Russell ru...@rustcorp.com.au
 
 This is based on a simpler patch by Anthony Liguouri, which only handled
 the vring accesses.  We also need some drivers to access these helpers,
 eg. for data which contains headers.
 
 [ ldq_phys() API change, Greg Kurz gk...@linux.vnet.ibm.com ]
 Signed-off-by: Rusty Russell ru...@rustcorp.com.au
 Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
 ---
  hw/virtio/virtio.c |   32 
  1 file changed, 16 insertions(+), 16 deletions(-)
 

  static inline void vring_used_flags_unset_bit(VirtQueue *vq, int mask)
  {
  hwaddr pa;
  pa = vq-vring.used + offsetof(VRingUsed, flags);
 -stw_phys(address_space_memory,
 - pa, lduw_phys(address_space_memory, pa)  ~mask);
 +virtio_stw_phys(address_space_memory,
 +pa, lduw_phys(address_space_memory, pa)  ~mask);
  }

This needs to be virtio_lduw_phys(), no?

Re: [Qemu-devel] [PULL 00/20] acpi,pc,pci fixes and enhancements

2014-02-18 Thread Igor Mammedov

On Tue, 18 Feb 2014 12:45:29 +
Stefano Stabellini stefano.stabell...@eu.citrix.com wrote:

 On Tue, 18 Feb 2014, Paolo Bonzini wrote:
  Il 18/02/2014 13:16, Stefano Stabellini ha scritto:
   It looks like that this series breaks disk unplug
   (hw/ide/piix.c:pci_piix3_xen_ide_unplug).
   
   I bisected it and the problem is caused by:
   
   commit 5e95494380ecf83c97d28f72134ab45e0cace8f9
   Author: Igor Mammedov imamm...@redhat.com
   Date:   Wed Feb 5 16:36:52 2014 +0100
   
   hw/pci: switch to a generic hotplug handling for PCIDevice
   
   make qdev_unplug()/device_set_realized() to call hotplug handler's
   plug/unplug methods if available and remove not needed anymore
   hot(un)plug handling from PCIDevice.
   
   In case if hotplug handler is not available, revert to the legacy
   hotplug method for compatibility with not yet converted buses.
   
   Signed-off-by: Igor Mammedov imamm...@redhat.com
   Reviewed-by: Michael S. Tsirkin m...@redhat.com
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   
   
  
  What exactly breaks?
 
 Disk unplug: hw/ide/piix.c:pci_piix3_xen_ide_unplug (see the beginning
 of the email :-P).
 It is called by hw/xen/xen_platform.c:platform_fixed_ioport_writew, in
 response to the guest writing to a magic ioport specifically to unplug
 the emulated disk.
 With this patch after the guest boots I can still access both xvda and
 sda for the same disk, leading to fs corruptions.
 
Could you try with following debug patch?

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 64b66e0..84aa8be 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -214,6 +214,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 return;
 }
 
+fprintf(stderr, dc-hotpluggable %d\n, dc-hotpluggable);
 if (!dc-hotpluggable) {
 error_set(errp, QERR_DEVICE_NO_HOTPLUG,
   object_get_typename(OBJECT(dev)));
@@ -223,8 +224,12 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 qdev_hot_removed = true;
 
 if (dev-parent_bus  dev-parent_bus-hotplug_handler) {
+fprintf(stderr, bus name: %s, hotplug_handler: %s\n,
+dev-parent_bus-name,
+object_get_typename(OBJECT(dev-parent_bus-hotplug_handler)));
 hotplug_handler_unplug(dev-parent_bus-hotplug_handler, dev, errp);
 } else {
+fprintf(stderr, legacy unplug: %p\n, dc-unplug);
 assert(dc-unplug != NULL);
 if (dc-unplug(dev)  0) { /* legacy handler */
 error_set(errp, QERR_UNDEFINED_ERROR);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 1acd2b2..74b0cac 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -238,6 +238,7 @@ static void pc_init1(QEMUMachineInitArgs *args,
 if (pci_enabled  acpi_enabled) {
 i2c_bus *smbus;
 
+   fprintf(stderr, create piix4_pm_init\n);
 smi_irq = qemu_allocate_irqs(pc_acpi_smi_interrupt, first_cpu, 1);
 /* TODO: Populate SPD eeprom data.  */
 smbus = piix4_pm_init(pci_bus, piix3_devfn + 3, 0xb100,

Re: [Qemu-devel] [PULL 00/20] acpi,pc,pci fixes and enhancements

2014-02-18 Thread Paolo Bonzini


Il 18/02/2014 13:45, Stefano Stabellini ha scritto:

Disk unplug: hw/ide/piix.c:pci_piix3_xen_ide_unplug (see the beginning
of the email :-P).
It is called by hw/xen/xen_platform.c:platform_fixed_ioport_writew, in
response to the guest writing to a magic ioport specifically to unplug
the emulated disk.
With this patch after the guest boots I can still access both xvda and
sda for the same disk, leading to fs corruptions.


Ok, the last paragraph is what I was missing.

So this is dc-unplug for the PIIX3 IDE device.  Because PCI declares a 
hotplug handler, dc-unplug is not called anymore.


But unlike other dc-unplug callbacks, pci_piix3_xen_ide_unplug doesn't 
free the device, it just drops the disks underneath.  I think the 
simplest solution is to _not_ make it a dc-unplug callback at all, and 
call pci_piix3_xen_ide_unplug from unplug_disks instead of qdev_unplug. 
 qdev_unplug means ask guest to start unplug, which is not what Xen 
wants to do here.


Paolo

Re: [Qemu-devel] [PATCH 2/8] virtio: allow byte swapping for vring and config access

2014-02-18 Thread Greg Kurz

On Tue, 18 Feb 2014 14:08:35 +0100
Cornelia Huck cornelia.h...@de.ibm.com wrote:
 On Tue, 18 Feb 2014 13:38:54 +0100
 Greg Kurz gk...@linux.vnet.ibm.com wrote:
 
  From: Rusty Russell ru...@rustcorp.com.au
  
  This is based on a simpler patch by Anthony Liguouri, which only handled
  the vring accesses.  We also need some drivers to access these helpers,
  eg. for data which contains headers.
  
  [ ldq_phys() API change, Greg Kurz gk...@linux.vnet.ibm.com ]
  Signed-off-by: Rusty Russell ru...@rustcorp.com.au
  Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
  ---
   hw/virtio/virtio.c |   32 
   1 file changed, 16 insertions(+), 16 deletions(-)
  
 
   static inline void vring_used_flags_unset_bit(VirtQueue *vq, int mask)
   {
   hwaddr pa;
   pa = vq-vring.used + offsetof(VRingUsed, flags);
  -stw_phys(address_space_memory,
  - pa, lduw_phys(address_space_memory, pa)  ~mask);
  +virtio_stw_phys(address_space_memory,
  +pa, lduw_phys(address_space_memory, pa)  ~mask);
   }
 
 This needs to be virtio_lduw_phys(), no?

Oops yes it should be... my mistake. :) 

-- 
Gregory Kurz kurzg...@fr.ibm.com
 gk...@linux.vnet.ibm.com
Software Engineer @ IBM/Meiosys  http://www.ibm.com
Tel +33 (0)562 165 496

Anarchy is about taking complete responsibility for yourself.
Alan Moore.

1 2 3 >

1 - 100 of 253 matches

Mail list logo