date:20120625

Re: [Qemu-devel] [PATCH 1/8] Add spent time for migration

2012-06-25 Thread Orit Wasserman

On 06/22/2012 04:46 PM, Juan Quintela wrote:
 We add time spent for migration to the output of info migrate
 command.  'total_time' means time since the start fo migration if
 migration is 'active', and total time of migration if migration is
 completed.  As we are also interested in transferred ram when
 migration completes, adding all ram statistics
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  hmp.c|2 ++
  migration.c  |   11 +++
  migration.h  |1 +
  qapi-schema.json |   12 +---
  4 files changed, 23 insertions(+), 3 deletions(-)
 
 diff --git a/hmp.c b/hmp.c
 index b9cec1d..4c6d4ae 100644
 --- a/hmp.c
 +++ b/hmp.c
 @@ -145,6 +145,8 @@ void hmp_info_migrate(Monitor *mon)
 info-ram-remaining  10);
  monitor_printf(mon, total ram: % PRIu64  kbytes\n,
 info-ram-total  10);
 +monitor_printf(mon, total time: % PRIu64  milliseconds\n,
 +   info-ram-total_time);
  }
 
  if (info-has_disk) {
 diff --git a/migration.c b/migration.c
 index 3f485d3..599bb6c 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -131,6 +131,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
  info-ram-transferred = ram_bytes_transferred();
  info-ram-remaining = ram_bytes_remaining();
  info-ram-total = ram_bytes_total();
 +info-ram-total_time = qemu_get_clock_ms(rt_clock)
 +- s-total_time;
 
  if (blk_mig_active()) {
  info-has_disk = true;
 @@ -143,6 +145,13 @@ MigrationInfo *qmp_query_migrate(Error **errp)
  case MIG_STATE_COMPLETED:
  info-has_status = true;
  info-status = g_strdup(completed);
 +
 +info-has_ram = true;
 +info-ram = g_malloc0(sizeof(*info-ram));
 +info-ram-transferred = ram_bytes_transferred();
 +info-ram-remaining = 0;
 +info-ram-total = ram_bytes_total();
 +info-ram-total_time = s-total_time;
  break;
  case MIG_STATE_ERROR:
  info-has_status = true;
 @@ -260,6 +269,7 @@ static void migrate_fd_put_ready(void *opaque)
  } else {
  migrate_fd_completed(s);
  }
 +s-total_time = qemu_get_clock_ms(rt_clock) - s-total_time;
  if (s-state != MIG_STATE_COMPLETED) {
  if (old_vm_running) {
  vm_start();
 @@ -373,6 +383,7 @@ static MigrationState *migrate_init(int blk, int inc)
 
  s-bandwidth_limit = bandwidth_limit;
  s-state = MIG_STATE_SETUP;
 +s-total_time = qemu_get_clock_ms(rt_clock);
 
  return s;
  }
 diff --git a/migration.h b/migration.h
 index 2e9ca2e..165b27b 100644
 --- a/migration.h
 +++ b/migration.h
 @@ -33,6 +33,7 @@ struct MigrationState
  void *opaque;
  int blk;
  int shared;
 +int64_t total_time;
  };
 
  void process_incoming_migration(QEMUFile *f);
 diff --git a/qapi-schema.json b/qapi-schema.json
 index 3b6e346..1ab5dbd 100644
 --- a/qapi-schema.json
 +++ b/qapi-schema.json
 @@ -260,10 +260,15 @@
  #
  # @total: total amount of bytes involved in the migration process
  #
 +# @total_time: tota0l amount of ms since migration started.  If
 +#migration has ended, it returns the total migration
 +#time. (since 1.2)
 +#
  # Since: 0.14.0.
  ##
  { 'type': 'MigrationStats',
 -  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' } }
 +  'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' ,
 +   'total_time': 'int' } }
 
  ##
  # @MigrationInfo
 @@ -275,8 +280,9 @@
  #  'cancelled'. If this field is not returned, no migration process
  #  has been initiated
  #
 -# @ram: #optional @MigrationStats containing detailed migration status,
 -#   only returned if status is 'active'
 +# @ram: #optional @MigrationStats containing detailed migration
 +#   status, only returned if status is 'active' or
 +#   'completed'. 'comppleted' (since 1.2)
  #
  # @disk: #optional @MigrationStats containing detailed disk migration
  #status, only returned if status is 'active' and it is a block

Reviewed-by: Orit Wasserman owass...@redhat.com

Re: [Qemu-devel] [PATCHv3 02/14] unicore32-softmmu: Add coprocessor 0(sysctrl) and 1(ocd) instruction support

2012-06-25 Thread Wei-Ren Chen

On Mon, Jun 25, 2012 at 11:23:25AM +0800, guanxue...@mprc.pku.edu.cn wrote:
 [snip]
   +unrecognized:
   + ?? ??cpu_abort(env, Wrong register (%d) or wrong operation (%d) in
  cp0_set!\n,
   + ?? ?? ?? ?? ?? ??creg, cop);
 
  The call to cpu_abort() would mean that the guest is able to terminate
  QEMU at will, which is not OK. What does real HW do?
  In my opinion, I just want to terminate qemu when any unhandled or
  unknown operations happen.
 
  This can make the emulator vulnerable in the security sense. Probably
  Unicore CPUs are not used now in an environment where the guest can
  not be trusted (like cloud computing), but who knows the future?
 Is it proper to print such information to monitor? by using monitor_printf().

  What if user doesn't open a monitor?

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

Re: [Qemu-devel] [PATCH 2/8] Add tracepoints for savevm section start/end

2012-06-25 Thread Orit Wasserman

On 06/22/2012 04:46 PM, Juan Quintela wrote:
 This allows to know how long each section takes to save.
 
 An awk script like this tells us sections that takes more that 10ms
 
 $1 ~ /savevm_state_iterate_end/ {
   /* Print savevm_section_end line when  10ms duration */
   if ($2  1) {
   printf(%s times_missing=%u\n, $0, times_missing++);
   }
 }
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 
 fix ws tracepoints
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  savevm.c |8 
  trace-events |5 +
  2 files changed, 13 insertions(+)
 
 diff --git a/savevm.c b/savevm.c
 index faa8145..40320be 100644
 --- a/savevm.c
 +++ b/savevm.c
 @@ -85,6 +85,7 @@
  #include cpus.h
  #include memory.h
  #include qmp-commands.h
 +#include trace.h
 
  #define SELF_ANNOUNCE_ROUNDS 5
 
 @@ -1624,11 +1625,14 @@ int qemu_savevm_state_iterate(QEMUFile *f)
  if (se-save_live_state == NULL)
  continue;
 
 +trace_savevm_section_start();
  /* Section type */
  qemu_put_byte(f, QEMU_VM_SECTION_PART);
  qemu_put_be32(f, se-section_id);
 
  ret = se-save_live_state(f, QEMU_VM_SECTION_PART, se-opaque);
 +trace_savevm_section_end(se-section_id);
 +
  if (ret = 0) {
  /* Do not proceed to the next vmstate before this one reported
 completion of the current stage. This serializes the migration
 @@ -1658,11 +1662,13 @@ int qemu_savevm_state_complete(QEMUFile *f)
  if (se-save_live_state == NULL)
  continue;
 
 +trace_savevm_section_start();
  /* Section type */
  qemu_put_byte(f, QEMU_VM_SECTION_END);
  qemu_put_be32(f, se-section_id);
 
  ret = se-save_live_state(f, QEMU_VM_SECTION_END, se-opaque);
 +trace_savevm_section_end(se-section_id);
  if (ret  0) {
  return ret;
  }
 @@ -1674,6 +1680,7 @@ int qemu_savevm_state_complete(QEMUFile *f)
   if (se-save_state == NULL  se-vmsd == NULL)
   continue;
 
 +trace_savevm_section_start();
  /* Section type */
  qemu_put_byte(f, QEMU_VM_SECTION_FULL);
  qemu_put_be32(f, se-section_id);
 @@ -1687,6 +1694,7 @@ int qemu_savevm_state_complete(QEMUFile *f)
  qemu_put_be32(f, se-version_id);
 
  vmstate_save(f, se);
 +trace_savevm_section_end(se-section_id);
  }
 
  qemu_put_byte(f, QEMU_VM_EOF);
 diff --git a/trace-events b/trace-events
 index 5c82b3a..82c7619 100644
 --- a/trace-events
 +++ b/trace-events
 @@ -782,6 +782,11 @@ displaysurface_resize(void *display_state, void 
 *display_surface, int width, int
  # vga.c
  ppm_save(const char *filename, void *display_surface) %s surface=%p
 
 +# savevm.c
 +
 +savevm_section_start(void) 
 +savevm_section_end(unsigned int section_id) section_id %u
 +
  # hw/qxl.c
  disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) %d %d
  disable qxl_io_write_vga(int qid, const char *mode, uint32_t addr, uint32_t 
 val) %d %s addr=%u val=%u

Reviewed-by: Orit Wasserman owass...@redhat.com

Re: [Qemu-devel] [PATCH 3/8] No need to iterate if we already are over the limit

2012-06-25 Thread Orit Wasserman

On 06/22/2012 04:46 PM, Juan Quintela wrote:
 If buffers are full, don't iterate, just exit.
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  savevm.c |3 +++
  1 file changed, 3 insertions(+)
 
 diff --git a/savevm.c b/savevm.c
 index 40320be..9101bfb 100644
 --- a/savevm.c
 +++ b/savevm.c
 @@ -1625,6 +1625,9 @@ int qemu_savevm_state_iterate(QEMUFile *f)
  if (se-save_live_state == NULL)
  continue;
 
 +if (qemu_file_rate_limit(f)) {
 +return 0;
 +}
  trace_savevm_section_start();
  /* Section type */
  qemu_put_byte(f, QEMU_VM_SECTION_PART);

Reviewed-by: Orit Wasserman owass...@redhat.com

Re: [Qemu-devel] [PATCH 4/8] Only TCG needs TLB handling

2012-06-25 Thread Orit Wasserman

On 06/22/2012 04:46 PM, Juan Quintela wrote:
 Refactor the code that is only needed for tcg to an static function.
 Call that only when tcg is enabled.  We can't refactor to a dummy
 function in the kvm case, as qemu can be compiled at the same time
 with tcg and kvm.
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  exec.c |   31 +--
  1 file changed, 21 insertions(+), 10 deletions(-)
 
 diff --git a/exec.c b/exec.c
 index 8244d54..a68b65c 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -1824,11 +1824,29 @@ void tb_flush_jmp_cache(CPUArchState *env, 
 target_ulong addr)
  TB_JMP_PAGE_SIZE * sizeof(TranslationBlock *));
  }
 
 +static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t end,
 +  uintptr_t length)
 +{
 +uintptr_t start1;
 +
 +/* we modify the TLB cache so that the dirty bit will be set again
 +   when accessing the range */
 +start1 = (uintptr_t)qemu_safe_ram_ptr(start);
 +/* Check that we don't span multiple blocks - this breaks the
 +   address comparisons below.  */
 +if ((uintptr_t)qemu_safe_ram_ptr(end - 1) - start1
 +!= (end - 1) - start) {
 +abort();
 +}
 +cpu_tlb_reset_dirty_all(start1, length);
 +
 +}
 +
  /* Note: start and end must be within the same ram block.  */
  void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
   int dirty_flags)
  {
 -uintptr_t length, start1;
 +uintptr_t length;
 
  start = TARGET_PAGE_MASK;
  end = TARGET_PAGE_ALIGN(end);
 @@ -1838,16 +1856,9 @@ void cpu_physical_memory_reset_dirty(ram_addr_t start, 
 ram_addr_t end,
  return;
  cpu_physical_memory_mask_dirty_range(start, length, dirty_flags);
 
 -/* we modify the TLB cache so that the dirty bit will be set again
 -   when accessing the range */
 -start1 = (uintptr_t)qemu_safe_ram_ptr(start);
 -/* Check that we don't span multiple blocks - this breaks the
 -   address comparisons below.  */
 -if ((uintptr_t)qemu_safe_ram_ptr(end - 1) - start1
 -!= (end - 1) - start) {
 -abort();
 +if (tcg_enabled()) {
 +tlb_reset_dirty_range_all(start, end, length);
  }
 -cpu_tlb_reset_dirty_all(start1, length);
  }
 
  int cpu_physical_memory_set_dirty_tracking(int enable)

Reviewed-by: Orit Wasserman owass...@redhat.com

Re: [Qemu-devel] [PATCH 6/8] Exit loop if we have been there too long

2012-06-25 Thread Orit Wasserman

On 06/22/2012 04:46 PM, Juan Quintela wrote:
 Checking each 64 pages is a random magic number as good as any other.
 We don't want to test too many times, but on the other hand,
 qemu_get_clock_ns() is not so expensive either.  We want to be sure
 that we spent less than 50ms (half of buffered_file timer), if we
 spent more than 100ms, all the accounting got wrong.
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  arch_init.c |   17 +
  1 file changed, 17 insertions(+)
 
 diff --git a/arch_init.c b/arch_init.c
 index 424efe7..7de1abf 100644
 --- a/arch_init.c
 +++ b/arch_init.c
 @@ -294,12 +294,15 @@ static void sort_ram_list(void)
  g_free(blocks);
  }
 
 +#define MAX_WAIT 50 /* ms, half buffered_file limit */
 +
  int ram_save_live(QEMUFile *f, int stage, void *opaque)
  {
  ram_addr_t addr;
  uint64_t bytes_transferred_last;
  double bwidth = 0;
  int ret;
 +int i;
 
  if (stage  0) {
  memory_global_dirty_log_stop();
 @@ -339,6 +342,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
  bytes_transferred_last = bytes_transferred;
  bwidth = qemu_get_clock_ns(rt_clock);
 
 +i = 0;
  while ((ret = qemu_file_rate_limit(f)) == 0) {
  int bytes_sent;
 
 @@ -347,6 +351,19 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
  if (bytes_sent == 0) { /* no more blocks */
  break;
  }
 +/* we want to check in the 1st loop, just in case it was the 1st time
 +   and we had to sync the dirty bitmap.
 +   qemu_get_clock_ns() is a bit expensive, so we only check each some
 +   iterations
 +*/
 +if ((i  63) == 0) {
 +uint64_t t1 = (qemu_get_clock_ns(rt_clock) - bwidth) / 100;
 +if (t1  MAX_WAIT) {
 +DPRINTF(big wait: %ld milliseconds, %d iterations\n, t1, 
 i);
 +break;
 +}
 +}
 +i++;
  }
 
  if (ret  0) {

Reviewed-by: Orit Wasserman owass...@redhat.com

[Qemu-devel] buildbot failure in qemu on default_openbsd_4.9

2012-06-25 Thread qemu

The Buildbot has detected a new failure on builder default_openbsd_4.9 while 
building qemu.
Full details are available at:
 http://buildbot.b1-systems.de/qemu/builders/default_openbsd_4.9/builds/298

Buildbot URL: http://buildbot.b1-systems.de/qemu/

Buildslave for this Build: kraxel_openbsd49

Build Reason: The Nightly scheduler named 'nightly_default' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot

Re: [Qemu-devel] [PATCH 5/8] Only calculate expected_time for stage 2

2012-06-25 Thread Orit Wasserman

On 06/22/2012 04:46 PM, Juan Quintela wrote:
 ram_save_remaining() is an expensive operation when there is a lot of memory.
 So we only call the function when we need it.
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  arch_init.c |   10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)
 
 diff --git a/arch_init.c b/arch_init.c
 index a9e8b74..424efe7 100644
 --- a/arch_init.c
 +++ b/arch_init.c
 @@ -299,7 +299,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
  ram_addr_t addr;
  uint64_t bytes_transferred_last;
  double bwidth = 0;
 -uint64_t expected_time = 0;
  int ret;
 
  if (stage  0) {
 @@ -376,9 +375,12 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 
  qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
 -expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
 -
 -return (stage == 2)  (expected_time = migrate_max_downtime());
 +if (stage == 2) {
 +uint64_t expected_time;
 +expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
 +return expected_time = migrate_max_downtime();
 +}
 +return 0;
  }
 
  static inline void *host_from_stream_offset(QEMUFile *f,
Reviewed-by: Orit Wasserman owass...@redhat.com

Re: [Qemu-devel] [PATCH 7/8] dirty bitmap: abstract its use

2012-06-25 Thread Orit Wasserman

On 06/22/2012 04:46 PM, Juan Quintela wrote:
 Always use accessors to read/set the dirty bitmap.
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  exec-obsolete.h |   40 
  exec.c  |3 +--
  2 files changed, 21 insertions(+), 22 deletions(-)
 
 diff --git a/exec-obsolete.h b/exec-obsolete.h
 index 792c831..f8ffce6 100644
 --- a/exec-obsolete.h
 +++ b/exec-obsolete.h
 @@ -45,15 +45,15 @@ int cpu_physical_memory_set_dirty_tracking(int enable);
  #define CODE_DIRTY_FLAG  0x02
  #define MIGRATION_DIRTY_FLAG 0x08
 
 -/* read dirty bit (return 0 or 1) */
 -static inline int cpu_physical_memory_is_dirty(ram_addr_t addr)
 +static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr)
  {
 -return ram_list.phys_dirty[addr  TARGET_PAGE_BITS] == 0xff;
 +return ram_list.phys_dirty[addr  TARGET_PAGE_BITS];
  }
 
 -static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr)
 +/* read dirty bit (return 0 or 1) */
 +static inline int cpu_physical_memory_is_dirty(ram_addr_t addr)
  {
 -return ram_list.phys_dirty[addr  TARGET_PAGE_BITS];
 +return cpu_physical_memory_get_dirty_flags(addr) == 0xff;
  }
Juan,
you changed the order of the functions , can your restore the order as it was.

 
  static inline int cpu_physical_memory_get_dirty(ram_addr_t start,
 @@ -61,41 +61,45 @@ static inline int 
 cpu_physical_memory_get_dirty(ram_addr_t start,
  int dirty_flags)
  {
  int ret = 0;
 -uint8_t *p;
  ram_addr_t addr, end;
 
  end = TARGET_PAGE_ALIGN(start + length);
  start = TARGET_PAGE_MASK;
 -p = ram_list.phys_dirty + (start  TARGET_PAGE_BITS);
  for (addr = start; addr  end; addr += TARGET_PAGE_SIZE) {
 -ret |= *p++  dirty_flags;
 +ret |= cpu_physical_memory_get_dirty_flags(addr)  dirty_flags;
  }
  return ret;
  }
 
 +static inline int cpu_physical_memory_set_dirty_flags(ram_addr_t addr,
 +  int dirty_flags)
 +{
 +return ram_list.phys_dirty[addr  TARGET_PAGE_BITS] |= dirty_flags;
 +}
 +
  static inline void cpu_physical_memory_set_dirty(ram_addr_t addr)
  {
 -ram_list.phys_dirty[addr  TARGET_PAGE_BITS] = 0xff;
 +cpu_physical_memory_set_dirty_flags(addr, 0xff);
  }
 
 -static inline int cpu_physical_memory_set_dirty_flags(ram_addr_t addr,
 -  int dirty_flags)
 +static inline int cpu_physical_memory_clear_dirty_flags(ram_addr_t addr,
 +int dirty_flags)
  {
 -return ram_list.phys_dirty[addr  TARGET_PAGE_BITS] |= dirty_flags;
 +int mask = ~dirty_flags;
 +
 +return ram_list.phys_dirty[addr  TARGET_PAGE_BITS] = mask;
  }
 
  static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
 ram_addr_t length,
 int dirty_flags)
  {
 -uint8_t *p;
  ram_addr_t addr, end;
 
  end = TARGET_PAGE_ALIGN(start + length);
  start = TARGET_PAGE_MASK;
 -p = ram_list.phys_dirty + (start  TARGET_PAGE_BITS);
  for (addr = start; addr  end; addr += TARGET_PAGE_SIZE) {
 -*p++ |= dirty_flags;
 +cpu_physical_memory_set_dirty_flags(addr, dirty_flags);
  }
  }
 
 @@ -103,16 +107,12 @@ static inline void 
 cpu_physical_memory_mask_dirty_range(ram_addr_t start,
  ram_addr_t length,
  int dirty_flags)
  {
 -int mask;
 -uint8_t *p;
  ram_addr_t addr, end;
 
  end = TARGET_PAGE_ALIGN(start + length);
  start = TARGET_PAGE_MASK;
 -mask = ~dirty_flags;
 -p = ram_list.phys_dirty + (start  TARGET_PAGE_BITS);
  for (addr = start; addr  end; addr += TARGET_PAGE_SIZE) {
 -*p++ = mask;
 +cpu_physical_memory_clear_dirty_flags(addr, dirty_flags);
  }
  }
 
 diff --git a/exec.c b/exec.c
 index a68b65c..dd4833d 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -2565,8 +2565,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
 void *host,
 
  ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
 last_ram_offset()  
 TARGET_PAGE_BITS);
 -memset(ram_list.phys_dirty + (new_block-offset  TARGET_PAGE_BITS),
 -   0xff, size  TARGET_PAGE_BITS);
 +cpu_physical_memory_set_dirty_range(new_block-offset, size, 0xff);

This will be much slower than memset , do we use it much ?


Orit
 
  if (kvm_enabled())
  kvm_setup_guest_memory(new_block-host, size);

Re: [Qemu-devel] [PATCH 8/8] Maintain the number of dirty pages

2012-06-25 Thread Orit Wasserman

On 06/22/2012 04:46 PM, Juan Quintela wrote:
 Calculate the number of dirty pages takes a lot on hosts with lots
 of memory.  Just maintain how many pages are dirty.
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  arch_init.c |   15 +--
  cpu-all.h   |1 +
  exec-obsolete.h |   10 ++
  3 files changed, 12 insertions(+), 14 deletions(-)
 
 diff --git a/arch_init.c b/arch_init.c
 index 7de1abf..545cf8f 100644
 --- a/arch_init.c
 +++ b/arch_init.c
 @@ -228,20 +228,7 @@ static uint64_t bytes_transferred;
 
  static ram_addr_t ram_save_remaining(void)
  {
 -RAMBlock *block;
 -ram_addr_t count = 0;
 -
 -QLIST_FOREACH(block, ram_list.blocks, next) {
 -ram_addr_t addr;
 -for (addr = 0; addr  block-length; addr += TARGET_PAGE_SIZE) {
 -if (memory_region_get_dirty(block-mr, addr, TARGET_PAGE_SIZE,
 -DIRTY_MEMORY_MIGRATION)) {
 -count++;
 -}
 -}
 -}
 -
 -return count;
 +return ram_list.dirty_pages;
  }
 
  uint64_t ram_bytes_remaining(void)
 diff --git a/cpu-all.h b/cpu-all.h
 index 50c8b62..88cedba 100644
 --- a/cpu-all.h
 +++ b/cpu-all.h
 @@ -477,6 +477,7 @@ typedef struct RAMBlock {
  typedef struct RAMList {
  uint8_t *phys_dirty;
  QLIST_HEAD(, RAMBlock) blocks;
 +uint64_t dirty_pages;
  } RAMList;
  extern RAMList ram_list;
 
 diff --git a/exec-obsolete.h b/exec-obsolete.h
 index f8ffce6..c099256 100644
 --- a/exec-obsolete.h
 +++ b/exec-obsolete.h
 @@ -74,6 +74,11 @@ static inline int cpu_physical_memory_get_dirty(ram_addr_t 
 start,
  static inline int cpu_physical_memory_set_dirty_flags(ram_addr_t addr,
int dirty_flags)
  {
 +if ((dirty_flags  MIGRATION_DIRTY_FLAG) 
 +!cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
 +   MIGRATION_DIRTY_FLAG)) {
 +ram_list.dirty_pages++;
 +}
  return ram_list.phys_dirty[addr  TARGET_PAGE_BITS] |= dirty_flags;
  }
 
 @@ -87,6 +92,11 @@ static inline int 
 cpu_physical_memory_clear_dirty_flags(ram_addr_t addr,
  {
  int mask = ~dirty_flags;
 
 +if ((dirty_flags  MIGRATION_DIRTY_FLAG) 
 +cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
 +  MIGRATION_DIRTY_FLAG)) {
 +ram_list.dirty_pages--;
 +}
  return ram_list.phys_dirty[addr  TARGET_PAGE_BITS] = mask;
  }
 
Reviewed-by: Orit Wasserman owass...@redhat.com

Re: [Qemu-devel] [PATCH 1/5] i82378: Remove bogus MMIO coalescing

2012-06-25 Thread Hervé Poussineau


Jan Kiszka a écrit :

This MMIO area is an entry gate to legacy PC ISA devices, addressed via
PIO over there. Quite a few of the PIO ports have side effects on access
like starting/stopping timers that must be executed properly ordered
/wrt the CPU. So we have to remove the coalescing mark.

CC: Andreas Färber andreas.faer...@web.de
CC: Hervé Poussineau hpous...@reactos.org
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i82378.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)


Acked-by: Hervé Poussineau hpous...@reactos.org

Re: [Qemu-devel] [PATCH 1/5] i82378: Remove bogus MMIO coalescing

2012-06-25 Thread Andreas Färber

Am 25.06.2012 09:11, schrieb Hervé Poussineau:
 Jan Kiszka a écrit :
 This MMIO area is an entry gate to legacy PC ISA devices, addressed via
 PIO over there. Quite a few of the PIO ports have side effects on access
 like starting/stopping timers that must be executed properly ordered
 /wrt the CPU. So we have to remove the coalescing mark.

 CC: Andreas Färber andreas.faer...@web.de
 CC: Hervé Poussineau hpous...@reactos.org
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  hw/i82378.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)
 
 Acked-by: Hervé Poussineau hpous...@reactos.org

Fine with me then.

Andreas

[Qemu-devel] [PATCH 0/5] kvm: Get coalesced MMIO flushing out of the hot-path

2012-06-25 Thread Jan Kiszka

We currently flush the coalesced MMIO buffer on every vmexit to
userspace. KVM only provides a single buffer per VM, so a central lock
is required to read from it. This is a contention point given a large
enough VCPU set. Moreover, we need to hold the BQL while replaying the
queued requests, probably for a long time until there is more fine
grained locking available. Good reasons to overcome the unconditional
flush.

The series achieves this by flushing only on selected memory region
accesses, either generically via the memory access dispatcher or
directly on certain VGA PIO accesses that are not yet fully converted.
Another reason to flush are remappings or other relevant region state
changes.

Please review carefully.

CC: Andreas Färber andreas.faer...@web.de
CC: Hervé Poussineau hpous...@reactos.org

Jan Kiszka (5):
  i82378: Remove bogus MMIO coalescing
  memory: Flush coalesced MMIO on selected region access
  memory: Flush coalesced MMIO on mapping and state changes
  VGA: Flush coalesced MMIO on related MMIO/PIO accesses
  kvm: Stop flushing coalesced MMIO on vmexit

 hw/cirrus_vga.c |7 +++
 hw/i82378.c |1 -
 hw/qxl.c|1 +
 hw/vga-isa-mm.c |1 +
 hw/vga.c|5 +
 hw/vmware_vga.c |1 +
 kvm-all.c   |2 --
 memory.c|   36 
 memory.h|   13 +
 9 files changed, 64 insertions(+), 3 deletions(-)

-- 
1.7.3.4

[Qemu-devel] [PATCH 4/5] VGA: Flush coalesced MMIO on related MMIO/PIO accesses

2012-06-25 Thread Jan Kiszka

In preparation of stopping to flush coalesced MMIO unconditionally on
vmexits, mark VGA MMIO and PIO regions as synchronous /wrt coalesced
MMIO and flush the buffer explicitly on PIO accesses that do not use
generic memory regions yet.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/cirrus_vga.c |7 +++
 hw/qxl.c|1 +
 hw/vga-isa-mm.c |1 +
 hw/vga.c|5 +
 hw/vmware_vga.c |1 +
 5 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index afedaa4..cf551a3 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -2439,6 +2439,8 @@ static uint32_t cirrus_vga_ioport_read(void *opaque, 
uint32_t addr)
 VGACommonState *s = c-vga;
 int val, index;
 
+qemu_flush_coalesced_mmio_buffer();
+
 if (vga_ioport_invalid(s, addr)) {
val = 0xff;
 } else {
@@ -2532,6 +2534,8 @@ static void cirrus_vga_ioport_write(void *opaque, 
uint32_t addr, uint32_t val)
 VGACommonState *s = c-vga;
 int index;
 
+qemu_flush_coalesced_mmio_buffer();
+
 /* check port range access depending on color/monochrome mode */
 if (vga_ioport_invalid(s, addr)) {
return;
@@ -2852,6 +2856,7 @@ static void cirrus_init_common(CirrusVGAState * s, int 
device_id, int is_pci,
 /* I/O handler for LFB */
 memory_region_init_io(s-cirrus_linear_io, cirrus_linear_io_ops, s,
   cirrus-linear-io, VGA_RAM_SIZE);
+memory_region_set_flush_coalesced(s-cirrus_linear_io);
 
 /* I/O handler for LFB */
 memory_region_init_io(s-cirrus_linear_bitblt_io,
@@ -2859,10 +2864,12 @@ static void cirrus_init_common(CirrusVGAState * s, int 
device_id, int is_pci,
   s,
   cirrus-bitblt-mmio,
   0x40);
+memory_region_set_flush_coalesced(s-cirrus_linear_bitblt_io);
 
 /* I/O handler for memory-mapped I/O */
 memory_region_init_io(s-cirrus_mmio_io, cirrus_mmio_io_ops, s,
   cirrus-mmio, CIRRUS_PNPMMIO_SIZE);
+memory_region_set_flush_coalesced(s-cirrus_mmio_io);
 
 s-real_vram_size =
 (s-device_id == CIRRUS_ID_CLGD5446) ? 4096 * 1024 : 2048 * 1024;
diff --git a/hw/qxl.c b/hw/qxl.c
index 3da3399..ef21176 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -1778,6 +1778,7 @@ static int qxl_init_common(PCIQXLDevice *qxl)
 if (qxl-id == 0) {
 vga_dirty_log_start(qxl-vga);
 }
+memory_region_set_flush_coalesced(qxl-io_bar);
 
 
 pci_register_bar(qxl-pci, QXL_IO_RANGE_INDEX,
diff --git a/hw/vga-isa-mm.c b/hw/vga-isa-mm.c
index f8984c6..aab78de 100644
--- a/hw/vga-isa-mm.c
+++ b/hw/vga-isa-mm.c
@@ -105,6 +105,7 @@ static void vga_mm_init(ISAVGAMMState *s, 
target_phys_addr_t vram_base,
 s_ioport_ctrl = g_malloc(sizeof(*s_ioport_ctrl));
 memory_region_init_io(s_ioport_ctrl, vga_mm_ctrl_ops, s,
   vga-mm-ctrl, 0x10);
+memory_region_set_flush_coalesced(s_ioport_ctrl);
 
 vga_io_memory = g_malloc(sizeof(*vga_io_memory));
 /* XXX: endianness? */
diff --git a/hw/vga.c b/hw/vga.c
index d784df7..2a6f040 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -357,6 +357,8 @@ uint32_t vga_ioport_read(void *opaque, uint32_t addr)
 VGACommonState *s = opaque;
 int val, index;
 
+qemu_flush_coalesced_mmio_buffer();
+
 if (vga_ioport_invalid(s, addr)) {
 val = 0xff;
 } else {
@@ -449,6 +451,8 @@ void vga_ioport_write(void *opaque, uint32_t addr, uint32_t 
val)
 VGACommonState *s = opaque;
 int index;
 
+qemu_flush_coalesced_mmio_buffer();
+
 /* check port range access depending on color/monochrome mode */
 if (vga_ioport_invalid(s, addr)) {
 return;
@@ -2320,6 +2324,7 @@ MemoryRegion *vga_init_io(VGACommonState *s,
 vga_mem = g_malloc(sizeof(*vga_mem));
 memory_region_init_io(vga_mem, vga_mem_ops, s,
   vga-lowmem, 0x2);
+memory_region_set_flush_coalesced(vga_mem);
 
 return vga_mem;
 }
diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index 142d9f4..cf2638d 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -1182,6 +1182,7 @@ static int pci_vmsvga_initfn(PCIDevice *dev)
 
 memory_region_init_io(s-io_bar, vmsvga_io_ops, s-chip,
   vmsvga-io, 0x10);
+memory_region_set_flush_coalesced(s-io_bar);
 pci_register_bar(s-card, 0, PCI_BASE_ADDRESS_SPACE_IO, s-io_bar);
 
 vmsvga_init(s-chip, VGA_RAM_SIZE, pci_address_space(dev),
-- 
1.7.3.4

[Qemu-devel] [PATCH 2/5] memory: Flush coalesced MMIO on selected region access

2012-06-25 Thread Jan Kiszka

Instead of flushing pending coalesced MMIO requests on every vmexit,
this provides a mechanism to selectively flush when memory regions
related to the coalesced one are accessed. This first of all includes
the coalesced region itself but can also applied to other regions, e.g.
of the same device, by calling memory_region_set_flush_coalesced.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 memory.c |   13 +
 memory.h |   13 +
 2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index aab4a31..ba55b3e 100644
--- a/memory.c
+++ b/memory.c
@@ -311,6 +311,9 @@ static void memory_region_read_accessor(void *opaque,
 MemoryRegion *mr = opaque;
 uint64_t tmp;
 
+if (mr-flush_coalesced_mmio) {
+qemu_flush_coalesced_mmio_buffer();
+}
 tmp = mr-ops-read(mr-opaque, addr, size);
 *value |= (tmp  mask)  shift;
 }
@@ -325,6 +328,9 @@ static void memory_region_write_accessor(void *opaque,
 MemoryRegion *mr = opaque;
 uint64_t tmp;
 
+if (mr-flush_coalesced_mmio) {
+qemu_flush_coalesced_mmio_buffer();
+}
 tmp = (*value  shift)  mask;
 mr-ops-write(mr-opaque, addr, tmp, size);
 }
@@ -826,6 +832,7 @@ void memory_region_init(MemoryRegion *mr,
 mr-dirty_log_mask = 0;
 mr-ioeventfd_nb = 0;
 mr-ioeventfds = NULL;
+mr-flush_coalesced_mmio = false;
 }
 
 static bool memory_region_access_valid(MemoryRegion *mr,
@@ -1176,6 +1183,7 @@ void memory_region_add_coalescing(MemoryRegion *mr,
 cmr-addr = addrrange_make(int128_make64(offset), int128_make64(size));
 QTAILQ_INSERT_TAIL(mr-coalesced, cmr, link);
 memory_region_update_coalesced_range(mr);
+memory_region_set_flush_coalesced(mr);
 }
 
 void memory_region_clear_coalescing(MemoryRegion *mr)
@@ -1190,6 +1198,11 @@ void memory_region_clear_coalescing(MemoryRegion *mr)
 memory_region_update_coalesced_range(mr);
 }
 
+void memory_region_set_flush_coalesced(MemoryRegion *mr)
+{
+mr-flush_coalesced_mmio = true;
+}
+
 void memory_region_add_eventfd(MemoryRegion *mr,
target_phys_addr_t addr,
unsigned size,
diff --git a/memory.h b/memory.h
index 740c48e..dca7a86 100644
--- a/memory.h
+++ b/memory.h
@@ -133,6 +133,7 @@ struct MemoryRegion {
 bool enabled;
 bool rom_device;
 bool warning_printed; /* For reservations */
+bool flush_coalesced_mmio;
 MemoryRegion *alias;
 target_phys_addr_t alias_offset;
 unsigned priority;
@@ -521,6 +522,18 @@ void memory_region_add_coalescing(MemoryRegion *mr,
 void memory_region_clear_coalescing(MemoryRegion *mr);
 
 /**
+ * memory_region_set_flush_coalesced: Enforce memory coalescing flush before
+ *accesses.
+ *
+ * Ensure that pending coalesced MMIO request are flushed before the memory
+ * region is accessed. This property is automatically enabled for all regions
+ * passed to memory_region_set_coalescing() and memory_region_add_coalescing().
+ *
+ * @mr: the memory region to be updated.
+ */
+void memory_region_set_flush_coalesced(MemoryRegion *mr);
+
+/**
  * memory_region_add_eventfd: Request an eventfd to be triggered when a word
  *is written to a location.
  *
-- 
1.7.3.4

Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch

2012-06-25 Thread Andreas Färber

Am 25.06.2012 03:51, schrieb Peter Crosthwaite:
 Ping! @PMM
 
 You rejected my workaround in V2 (i.e. using -dtb to force is_linux)
 on the grounds that youd accept some reasonable way of saying this
 ELF file is a Linux kernel, That brings us back to V1 - this patch.
 Any objections?

I won't object to it, but I don't like it: If it's really needed, it
should be os=linux or something (enum?) for extensibility or at some
point we'll need windows=on, vxworks=on, aix=on, etc.

Andreas

 
 Regards,
 Peter
 
 On Fri, Jun 1, 2012 at 11:41 AM, Peter Maydell peter.mayd...@linaro.org 
 wrote:
 On 1 June 2012 02:16, Peter A. G. Crosthwaite
 peter.crosthwa...@petalogix.com wrote:
 Added a switch to tell the bootloader that the image is linux and should be
 bootstrapped as such. This is needed to boot an elf that is linux.

 Syntax would be:

 qemu-system-arm ... -kernel linux.elf -machine linux=on

 This whole area is a mess. Personally I think we should have a
 standard way to say load this ELF file into memory which isn't
 target-dependent, and then we could make -kernel actually load
 a kernel rather than its current load a kernel unless it's an
 ELF file in which case do something else.

 There are some back-compatibility issues there, though.

 -- PMM
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] buildbot failure in qemu on default_openbsd_current

2012-06-25 Thread qemu

The Buildbot has detected a new failure on builder default_openbsd_current 
while building qemu.
Full details are available at:
 http://buildbot.b1-systems.de/qemu/builders/default_openbsd_current/builds/298

Buildbot URL: http://buildbot.b1-systems.de/qemu/

Buildslave for this Build: brad_openbsd_current

Build Reason: The Nightly scheduler named 'nightly_default' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot

[Qemu-devel] [PATCH 5/5] kvm: Stop flushing coalesced MMIO on vmexit

2012-06-25 Thread Jan Kiszka

The memory subsystem will now take care of flushing whenever affected
regions are accessed or the memory mapping changes.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kvm-all.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index f8e4328..a1d32f6 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1483,8 +1483,6 @@ int kvm_cpu_exec(CPUArchState *env)
 qemu_mutex_lock_iothread();
 kvm_arch_post_run(env, run);
 
-kvm_flush_coalesced_mmio_buffer();
-
 if (run_ret  0) {
 if (run_ret == -EINTR || run_ret == -EAGAIN) {
 DPRINTF(io window exit\n);
-- 
1.7.3.4

Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch

2012-06-25 Thread Peter Crosthwaite

On Mon, Jun 25, 2012 at 5:31 PM, Andreas Färber afaer...@suse.de wrote:
 Am 25.06.2012 03:51, schrieb Peter Crosthwaite:
 Ping! @PMM

 You rejected my workaround in V2 (i.e. using -dtb to force is_linux)
 on the grounds that youd accept some reasonable way of saying this
 ELF file is a Linux kernel, That brings us back to V1 - this patch.
 Any objections?

 I won't object to it, but I don't like it: If it's really needed, it
 should be os=linux

String machine opt os=foo work?

 or something (enum?) for extensibility or at some
 point we'll need windows=on, vxworks=on, aix=on, etc.

 Andreas

[Qemu-devel] [PATCH 1/5] i82378: Remove bogus MMIO coalescing

2012-06-25 Thread Jan Kiszka

This MMIO area is an entry gate to legacy PC ISA devices, addressed via
PIO over there. Quite a few of the PIO ports have side effects on access
like starting/stopping timers that must be executed properly ordered
/wrt the CPU. So we have to remove the coalescing mark.

CC: Andreas Färber andreas.faer...@web.de
CC: Hervé Poussineau hpous...@reactos.org
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i82378.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/hw/i82378.c b/hw/i82378.c
index 9b11d90..2123c14 100644
--- a/hw/i82378.c
+++ b/hw/i82378.c
@@ -225,7 +225,6 @@ static int pci_i82378_init(PCIDevice *dev)
 pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, s-io);
 
 memory_region_init_io(s-mem, i82378_mem_ops, s, i82378-mem, 
0x0100);
-memory_region_set_coalescing(s-mem);
 pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, s-mem);
 
 /* Make I/O address read only */
-- 
1.7.3.4

Re: [Qemu-devel] [PATCH 2/2] virtio-scsi: Implement hotplug support for virtio-scsi

2012-06-25 Thread mc



Quoting Stefan Hajnoczi stefa...@gmail.com:


On Wed, Jun 20, 2012 at 7:47 AM, Cong Meng m...@linux.vnet.ibm.com wrote:

Implement the hotplug() and hot_unplug() interfaces in virtio-scsi,
by signal
the virtio_scsi.ko in guest kernel via event virtual queue.

The counterpart patch of virtio_scsi.ko will be sent soon in another thread.



Signed-off-by: Cong Meng m...@linux.vnet.ibm.com
Signed-off-by: Sen Wang senw...@linux.vnet.ibm.com
---
 hw/virtio-scsi.c |   72
+++--
 1 files changed, 69 insertions(+), 3 deletions(-)


I compared against the virtio-scsi specification and this looks good:
http://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf

Dropped events and event throttling are not implemented by this patch.
 This means that the guest can miss events if it runs out of event
queue elements.  A scenario that might be able to trigger this is if
multiple LUNs are hotplugged in a single QEMU monitor callback.

Implementing dropped events is easy in hw/virtio-scsi.c.  Keep a bool
or counter of dropped events and report them when the guest kicks us
with a free event element (virtio_scsi_handle_event).


Yes. It's easy to do this in qemu. But I'm not sure what should be done
in virtio-scsi.ko to respond the VIRTIO_SCSI_T_EVENTS_MISSED event.
The spec says poll the logical units for unit attention conditions, or
just a whole bus rescan?



Stefan

[Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes

2012-06-25 Thread Jan Kiszka

Flush pending coalesced MMIO before performing mapping or state changes
that could affect the event orderings or route the buffered requests to
a wrong region.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com

In addition, we also have to
---
 memory.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index ba55b3e..141f92b 100644
--- a/memory.c
+++ b/memory.c
@@ -759,6 +759,7 @@ static void memory_region_update_topology(MemoryRegion *mr)
 
 void memory_region_transaction_begin(void)
 {
+qemu_flush_coalesced_mmio_buffer();
 ++memory_region_transaction_depth;
 }
 
@@ -1109,6 +1110,9 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
 
 void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
 {
+if (!QTAILQ_EMPTY(mr-coalesced)) {
+qemu_flush_coalesced_mmio_buffer();
+}
 if (mr-readonly != readonly) {
 mr-readonly = readonly;
 memory_region_update_topology(mr);
@@ -1117,6 +1121,9 @@ void memory_region_set_readonly(MemoryRegion *mr, bool 
readonly)
 
 void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
 {
+if (!QTAILQ_EMPTY(mr-coalesced)) {
+qemu_flush_coalesced_mmio_buffer();
+}
 if (mr-readable != readable) {
 mr-readable = readable;
 memory_region_update_topology(mr);
@@ -1190,6 +1197,8 @@ void memory_region_clear_coalescing(MemoryRegion *mr)
 {
 CoalescedMemoryRange *cmr;
 
+qemu_flush_coalesced_mmio_buffer();
+
 while (!QTAILQ_EMPTY(mr-coalesced)) {
 cmr = QTAILQ_FIRST(mr-coalesced);
 QTAILQ_REMOVE(mr-coalesced, cmr, link);
@@ -1219,6 +1228,9 @@ void memory_region_add_eventfd(MemoryRegion *mr,
 };
 unsigned i;
 
+if (!QTAILQ_EMPTY(mr-coalesced)) {
+qemu_flush_coalesced_mmio_buffer();
+}
 for (i = 0; i  mr-ioeventfd_nb; ++i) {
 if (memory_region_ioeventfd_before(mrfd, mr-ioeventfds[i])) {
 break;
@@ -1249,6 +1261,9 @@ void memory_region_del_eventfd(MemoryRegion *mr,
 };
 unsigned i;
 
+if (!QTAILQ_EMPTY(mr-coalesced)) {
+qemu_flush_coalesced_mmio_buffer();
+}
 for (i = 0; i  mr-ioeventfd_nb; ++i) {
 if (memory_region_ioeventfd_equal(mrfd, mr-ioeventfds[i])) {
 break;
@@ -1269,6 +1284,8 @@ static void 
memory_region_add_subregion_common(MemoryRegion *mr,
 {
 MemoryRegion *other;
 
+qemu_flush_coalesced_mmio_buffer();
+
 assert(!subregion-parent);
 subregion-parent = mr;
 subregion-addr = offset;
@@ -1327,6 +1344,8 @@ void memory_region_add_subregion_overlap(MemoryRegion *mr,
 void memory_region_del_subregion(MemoryRegion *mr,
  MemoryRegion *subregion)
 {
+qemu_flush_coalesced_mmio_buffer();
+
 assert(subregion-parent == mr);
 subregion-parent = NULL;
 QTAILQ_REMOVE(mr-subregions, subregion, subregions_link);
@@ -1335,6 +1354,8 @@ void memory_region_del_subregion(MemoryRegion *mr,
 
 void memory_region_set_enabled(MemoryRegion *mr, bool enabled)
 {
+qemu_flush_coalesced_mmio_buffer();
+
 if (enabled == mr-enabled) {
 return;
 }
@@ -1367,6 +1388,8 @@ void memory_region_set_alias_offset(MemoryRegion *mr, 
target_phys_addr_t offset)
 {
 target_phys_addr_t old_offset = mr-alias_offset;
 
+qemu_flush_coalesced_mmio_buffer();
+
 assert(mr-alias);
 mr-alias_offset = offset;
 
-- 
1.7.3.4

Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch

2012-06-25 Thread Peter Maydell

On 25 June 2012 02:51, Peter Crosthwaite
peter.crosthwa...@petalogix.com wrote:
 You rejected my workaround in V2 (i.e. using -dtb to force is_linux)
 on the grounds that youd accept some reasonable way of saying this
 ELF file is a Linux kernel, That brings us back to V1 - this patch.
 Any objections?

I'd really like to see some input from people who deal with the
other QEMU architectures on whether we can come up with something
coherent as a plan for handling these loader command line options.
I don't really like here's a switch but it happens to only
work on ARM targets if we can avoid it.

-- PMM

Re: [Qemu-devel] [PATCH 2/5] memory: Flush coalesced MMIO on selected region access

2012-06-25 Thread Avi Kivity

On 06/25/2012 10:00 AM, Jan Kiszka wrote:
 Instead of flushing pending coalesced MMIO requests on every vmexit,
 this provides a mechanism to selectively flush when memory regions
 related to the coalesced one are accessed. This first of all includes
 the coalesced region itself but can also applied to other regions, e.g.

 @@ -521,6 +522,18 @@ void memory_region_add_coalescing(MemoryRegion *mr,
  void memory_region_clear_coalescing(MemoryRegion *mr);
  
  /**
 + * memory_region_set_flush_coalesced: Enforce memory coalescing flush before
 + *accesses.
 + *
 + * Ensure that pending coalesced MMIO request are flushed before the memory
 + * region is accessed. This property is automatically enabled for all regions
 + * passed to memory_region_set_coalescing() and 
 memory_region_add_coalescing().
 + *
 + * @mr: the memory region to be updated.
 + */
 +void memory_region_set_flush_coalesced(MemoryRegion *mr);
 +
 +/**


Please provide a way to clear the flag (and autoclear on clear_coalesced).


-- 
error compiling committee.c: too many arguments to function

[Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)

2012-06-25 Thread Mike Lovell


Hi all,
Here is something I've been tinkering with the past few weeks and now have it
in a state where the basic idea makes sense, it works, and could use some
feedback from the community.

This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I
first had the idea when I was playing with the udp and mcast socket network
backends while exploring how to build a VM infrastructure. I liked the idea of
using the sockets backends cause it doesn't require escalated permissions to
configure and run as well as the ability to talk over IP networks.

But the built in socket backends either allowed for only 2 guests talking
directly or for multiple guests where all traffic is sent to all. So one can
either have two guests talking or have bandwidth wasted with multiple guests.
There wasn't something that could talk to multiple guests but also utilize
unicast traffic.

So I made a backend that can do this. It takes the basics of how the udp and
mcast socket backends work and combines them with some switching based on the
ethernet packets. The result is multiple guests can talk to each other but
not waste bandwidth by delivering unicast traffic to all guests. The backend
also adds some header data to each packet. This header includes a network
identifier so multiple logical networks can be created using the same
multicast configuration but still have separation in the guests.

Its kind of like VXLAN or NVGRE but replace the GRE tunnels with UDP packets.

There are a couple advantages that I see to this. It allows for multiple guests
in multiple locations to talk to each other while keeping unicast traffic to
just between two hosts. It doesn't require root permissions to run. It can
operate over non-ethernet networks (like IPoIB). It doesn't require changing
network configuration on the host. It allows for a ton of logical networks to
be created (currently 65536 per multicast address and port combination).

There are a few disadvantages as well. It does add some more processing to the
QEMU process but not much (I saw it go as fast as the socket backends). It is
encapsulating an Ethernet frame inside a UDP packet so there is the overhead of
the IP and UDP headers as well as the transport medium headers (most likely
Ethernet again). Because there is additional header data and MTU of the guest
could be limited depending on the ability to send larger multicast packet from
the host. (I haven't really looked closely at this last one). There isn't the
ability for something besides QEMU processes to communicate using this, though
I hope to build a utility to work with a tap device.

Overall, I think this is something that's pretty cool. I don't know how much
people give any thought to the socket backends for real world use and so I
don't know if this would be of much use to anyone. I am looking for some
feedback into what the community thinks and for comments about the code. Its
only my second time doing more than 20 lines of C so I'm sure I did some stupid
things. I have only tested on 64 bit x86 Linux systems so far.

Hopefully you all have good things to say. :)

mike

[Qemu-devel] [PATCH] Initial commit for QDES - QEMU Distributed Ethernet Switch

2012-06-25 Thread Mike Lovell

This commit adds a new network backend to QEMU. It combines the basic behavior
of the unicast udp and multicast socket backends with some intelligence about
the source and destination of the packets. It also adds a header to the
packets to allow for creating multiple logical networks using the same
underlying network infrastructure. (Kind of like GRE Keys). This provides a
network backend that acts as a type of Ethernet switch.

During initialization, QDES will create two sockets. One that is used for
receiving unicast udp traffic and for sending both unicast and multicast udp
traffic. The second socket is for receiving multicast udp traffic. A
GHashTable is also created that will serve as an address table for where
packets should be delivered. A timer is also configured to regularly clean the
contents of the address table.

When a packet is received by either of the two sockets, the header is parsed
and checked to make sure the packet is a member of the correct logical
network. Then the ethernet frame that is received is inspected for the source
MAC address. The address table is then updated with the source MAC address,
the address of where the packets was received from, and the current time.

When QEMU delivers a packet to be sent, the destination MAC address is looked
for in the address table. If it is found, the packet is sent to the remote
address stored in the table along with the approriate header. If the address
is not found, the packet and header is sent to the configured multicast
address so all members of the network will receive the packet.

Unfortunately, only IPv4 is currently supported. IPv6 is on the short list of
improvements to be made.

Signed-off-by: Mike Lovell m...@dev-zero.net
---
 hmp-commands.hx   |2 +-
 net.c |   31 +++-
 net.h |1 +
 net/Makefile.objs |1 +
 net/qdes.c|  453 +
 net/qdes.h|   38 +
 net/socket.c  |3 +-
 net/socket.h  |3 +
 qemu-options.hx   |5 +
 9 files changed, 533 insertions(+), 4 deletions(-)
 create mode 100644 net/qdes.c
 create mode 100644 net/qdes.h

diff --git a/hmp-commands.hx b/hmp-commands.hx
index f5d9d91..042bb85 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1007,7 +1007,7 @@ ETEXI
 {
 .name   = host_net_add,
 .args_type  = device:s,opts:s?,
-.params = tap|user|socket|vde|dump [options],
+.params = tap|user|socket|qdes|vde|dump [options],
 .help   = add host VLAN client,
 .mhandler.cmd = net_host_device_add,
 },
diff --git a/net.c b/net.c
index 4aa416c..dbb4a48 100644
--- a/net.c
+++ b/net.c
@@ -30,6 +30,7 @@
 #include net/dump.h
 #include net/slirp.h
 #include net/vde.h
+#include net/qdes.h
 #include net/util.h
 #include monitor.h
 #include qemu-common.h
@@ -1016,6 +1017,30 @@ static const struct {
 { /* end of list */ }
 },
 },
+[NET_CLIENT_TYPE_QDES] = {
+.type = qdes,
+.init = net_init_qdes,
+.desc = {
+NET_COMMON_PARAMS_DESC,
+{
+.name = timer,
+.type = QEMU_OPT_NUMBER,
+.help = Seconds between cleaning mac address table
+}, {
+.name = mcast,
+.type = QEMU_OPT_STRING,
+.help = UDP multicast address and port number,
+}, {
+.name = localaddr,
+.type = QEMU_OPT_STRING,
+.help = source address for multicast and udp packets,
+}, {
+.name = network,
+.type = QEMU_OPT_NUMBER,
+.help = qdes network number,
+},
+},
+},
 #ifdef CONFIG_VDE
 [NET_CLIENT_TYPE_VDE] = {
 .type = vde,
@@ -1104,7 +1129,8 @@ int net_client_init(QemuOpts *opts, int is_netdev, Error 
**errp)
 #ifdef CONFIG_VDE
 strcmp(type, vde) != 0 
 #endif
-strcmp(type, socket) != 0) {
+strcmp(type, socket) != 0 
+strcmp(type, qdes) != 0) {
 error_set(errp, QERR_INVALID_PARAMETER_VALUE, type,
   a netdev backend type);
 return -1;
@@ -1170,7 +1196,7 @@ int net_client_init(QemuOpts *opts, int is_netdev, Error 
**errp)
 static int net_host_check_device(const char *device)
 {
 int i;
-const char *valid_param_list[] = { tap, socket, dump
+const char *valid_param_list[] = { tap, socket, dump, qdes
 #ifdef CONFIG_NET_BRIDGE
, bridge
 #endif
@@ -1405,6 +1431,7 @@ void net_check_clients(void)
 case NET_CLIENT_TYPE_USER:
 case NET_CLIENT_TYPE_TAP:
 case NET_CLIENT_TYPE_SOCKET:
+case NET_CLIENT_TYPE_QDES:
 case NET_CLIENT_TYPE_VDE:
 has_host_dev = 1;
 break;
diff --git a/net.h b/net.h
index bdc2a06..bf932b0 100644
--- a/net.h
+++ b/net.h

[Qemu-devel] [Bug 1013467] Re: network in bridge mode doesn't work in SMP Linux guest

2012-06-25 Thread Xuekun Hu

The regression is due to the patch of rtl8139: do the network/host 
communication only in normal operating mode. (Commit
ff71f2e8cacefae99179993204172bc65e4303df). 

However after applying the patch, someone already reported this issue.
(http://comments.gmane.org/gmane.comp.emulators.qemu/150064). So
revertted this patch.

Not sure why this patch revert-revert again? So the bug happened again.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1013467

Title:
  network in bridge mode doesn't work in SMP Linux guest

Status in QEMU:
  New

Bug description:
  Environment:
  
  Host OS (ia32/ia32e/IA64):ia32e
  Guest OS (ia32/ia32e/IA64):ia32e
  Guest OS Type (Linux/Windows):Linux
  kvm.git Commit:25e531a988ea5a64bd97a72dc9d3c65ad5850120
  qemu-kvm Commit:0a948cbb1835e5f36990e173966d30bc4c8cc038
  Host Kernel Version:3.5.0-rc1
  Hardware:WSM-EP, Romley-EP

  
  Bug detailed description:
  --
  Network in bridge mode doesn't work in SMP RHEL6.1/6.2 guest.
  After booting up, use “dhclient ethX” to get IP (we use DHCP in the network), 
guest can not get IP.
  Note:  1. UP guest(-smp 1) doesn’t have this issue.
   2. Windows(e.g. Win7) guest(-smp 4) has no issue.

  This should be a qemu-kvm bug.
  kvm  + qemu-kvm   =  result
  25e531a9 + 0a948cbb=  bad 
  25e531a9 + 18b01275   =  good

  
  Reproduce steps:
  
  1.start up a host with kvm(25e531a9) + qemu-kvm (0a948cbb)
  2.qemu-system-x86_64 -m 1024 -smp 4 –net nic,model=rt18139 –net 
tap,script=/etc/kvm/qemu-ifup –hda /root/rhel6u1.qcow
  3. ifconfig
  4. ping 192.168.199.98  (a IP in the same subnet)
  5. dhclient ethx   (can't get dynamic IP address)

  
  Current result:
  
  The guest can not get IP

  Expected result:
  
  The guest get IP and work fine.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1013467/+subscriptions

Re: [Qemu-devel] [PATCH] Initial commit for QDES - QEMU Distributed Ethernet Switch

2012-06-25 Thread Mike Lovell

I'm not sure why but it looks like my intro email for this got eaten by 
something. Here it is again and sorry if it shows up twice. This is my 
first time posting to the list and submitting a patch and I guess 
something doesn't like the way I did it.


-
Hi all,
Here is something I've been tinkering with the past few weeks and now 
have it

in a state where the basic idea makes sense, it works, and could use some
feedback from the community.

This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I
first had the idea when I was playing with the udp and mcast socket network
backends while exploring how to build a VM infrastructure. I liked the 
idea of

using the sockets backends cause it doesn't require escalated permissions to
configure and run as well as the ability to talk over IP networks.

But the built in socket backends either allowed for only 2 guests talking
directly or for multiple guests where all traffic is sent to all. So one can
either have two guests talking or have bandwidth wasted with multiple 
guests.

There wasn't something that could talk to multiple guests but also utilize
unicast traffic.

So I made a backend that can do this. It takes the basics of how the udp and
mcast socket backends work and combines them with some switching based 
on the

ethernet packets. The result is multiple guests can talk to each other but
not waste bandwidth by delivering unicast traffic to all guests. The backend
also adds some header data to each packet. This header includes a network
identifier so multiple logical networks can be created using the same
multicast configuration but still have separation in the guests.

There are a couple advantages that I see to this. It allows for multiple 
guests

in multiple locations to talk to each other while keeping unicast traffic to
just between two hosts. It doesn't require root permissions to run. It can
operate over non-ethernet networks (like IPoIB). It doesn't require changing
network configuration on the host. It allows for a ton of logical 
networks to

be created (currently 65536 per multicast address and port combination).

There are a few disadvantages as well. It does add some more processing 
to the
QEMU process but not much (I saw it go as fast as the socket backends). 
It is
encapsulating an Ethernet frame inside a UDP packet so there is the 
overhead of

the IP and UDP headers as well as the transport medium headers (most likely
Ethernet again). Because there is additional header data and MTU of the 
guest
could be limited depending on the ability to send larger multicast 
packet from
the host. (I haven't really looked closely at this last one). There 
isn't the
ability for something besides QEMU processes to communicate using this, 
though

I hope to build a utility to work with a tap device.

Overall, I think this is something that's pretty cool. I don't know how much
people give any thought to the socket backends for real world use and so I
don't know if this would be of much use to anyone. I am looking for some
feedback into what the community thinks and for comments about the code. Its
only my second time doing more than 20 lines of C so I'm sure I did some 
stupid

things. I have only tested on 64 bit x86 Linux systems so far.

Hopefully you all have good things to say. :)

mike

[Qemu-devel] [PATCH] Exynos4: added RTC device

2012-06-25 Thread Oleg Ogurtsov

This patch add RTC device

Oleg Ogurtsov (1):
  Exynos4: added RTC device

 hw/arm/Makefile.objs |1 +
 hw/exynos4210.c  |8 +
 hw/exynos4210_rtc.c  |  607 ++
 3 files changed, 616 insertions(+), 0 deletions(-)
 create mode 100644 hw/exynos4210_rtc.c

-- 
1.7.5.4

[Qemu-devel] [PATCH] Exynos4: added RTC device

2012-06-25 Thread Oleg Ogurtsov


Signed-off-by: Oleg Ogurtsov o.ogurt...@samsung.com
---
 hw/arm/Makefile.objs |1 +
 hw/exynos4210.c  |8 +
 hw/exynos4210_rtc.c  |  607 ++
 3 files changed, 616 insertions(+), 0 deletions(-)
 create mode 100644 hw/exynos4210_rtc.c

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 88ff47d..0fdb832 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -11,6 +11,7 @@ obj-y += realview_gic.o realview.o arm_sysctl.o arm11mpcore.o 
a9mpcore.o
 obj-y += exynos4210_gic.o exynos4210_combiner.o exynos4210.o
 obj-y += exynos4_boards.o exynos4210_uart.o exynos4210_pwm.o
 obj-y += exynos4210_pmu.o exynos4210_mct.o exynos4210_fimd.o
+obj-y += exynos4210_rtc.o
 obj-y += arm_l2x0.o
 obj-y += arm_mptimer.o a15mpcore.o
 obj-y += armv7m.o armv7m_nvic.o stellaris.o pl022.o stellaris_enet.o
diff --git a/hw/exynos4210.c b/hw/exynos4210.c
index 9c20b3f..6e105e0 100644
--- a/hw/exynos4210.c
+++ b/hw/exynos4210.c
@@ -33,6 +33,9 @@
 /* PWM */
 #define EXYNOS4210_PWM_BASE_ADDR   0x139D
 
+/* RTC */
+#define EXYNOS4210_RTC_BASE_ADDR   0x1007
+
 /* MCT */
 #define EXYNOS4210_MCT_BASE_ADDR   0x1005
 
@@ -258,6 +261,11 @@ Exynos4210State *exynos4210_init(MemoryRegion *system_mem,
   s-irq_table[exynos4210_get_irq(22, 3)],
   s-irq_table[exynos4210_get_irq(22, 4)],
   NULL);
+/* RTC */
+sysbus_create_varargs(exynos4210.rtc, EXYNOS4210_RTC_BASE_ADDR,
+  s-irq_table[exynos4210_get_irq(23, 0)],
+  s-irq_table[exynos4210_get_irq(23, 1)],
+  NULL);
 
 /* Multi Core Timer */
 dev = qdev_create(NULL, exynos4210.mct);
diff --git a/hw/exynos4210_rtc.c b/hw/exynos4210_rtc.c
new file mode 100644
index 000..2ac6301
--- /dev/null
+++ b/hw/exynos4210_rtc.c
@@ -0,0 +1,607 @@
+/*
+ * Samsung exynos4210 Real Time Clock
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ *  Ogurtsov Oleg o.ogurt...@samsung.com
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the
+ *  Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but WITHOUT
+ *  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *  FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ *  for more details.
+ *
+ *  You should have received a copy of the GNU General Public License along
+ *  with this program; if not, see http://www.gnu.org/licenses/.
+ *
+ */
+
+/* Description:
+ * Register RTCCON:
+ *  CLKSEL Bit[1] not used
+ *  CLKOUTEN Bit[9] not used
+ */
+
+#include sysbus.h
+#include qemu-timer.h
+#include qemu-common.h
+#include ptimer.h
+
+#include hw.h
+#include qemu-timer.h
+#include sysemu.h
+
+#include exynos4210.h
+
+#define DEBUG_RTC 0
+
+#if DEBUG_RTC
+#define DPRINTF(fmt, ...) \
+do { fprintf(stdout, RTC: [%24s:%5d]  fmt, __func__, __LINE__, \
+## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) do {} while (0)
+#endif
+
+#define EXYNOS4210_RTC_REG_MEM_SIZE 0x0100
+
+#define INTP0x0030
+#define RTCCON  0x0040
+#define TICCNT  0x0044
+#define RTCALM  0x0050
+#define ALMSEC  0x0054
+#define ALMMIN  0x0058
+#define ALMHOUR 0x005C
+#define ALMDAY  0x0060
+#define ALMMON  0x0064
+#define ALMYEAR 0x0068
+#define BCDSEC  0x0070
+#define BDCMIN  0x0074
+#define BCDHOUR 0x0078
+#define BCDDAY  0x007C
+#define BCDDAYWEEK  0x0080
+#define BCDMON  0x0084
+#define BCDYEAR 0x0088
+#define CURTICNT0x0090
+
+#define TICK_TIMER_ENABLE   0x0100
+#define TICNT_THRESHHOLD2
+
+
+#define RTC_ENABLE  0x0001
+
+#define INTP_TICK_ENABLE0x0001
+#define INTP_ALM_ENABLE 0x0002
+
+#define ALARM_INT_ENABLE0x0040
+
+#define RTC_BASE_FREQ   32768
+
+typedef struct Exynos4210RTCState {
+SysBusDevice busdev;
+MemoryRegion iomem;
+
+/* registers */
+uint32_treg_intp;
+uint32_treg_rtccon;
+uint32_treg_ticcnt;
+uint32_treg_rtcalm;
+uint32_treg_almsec;
+uint32_treg_almmin;
+uint32_treg_almhour;
+uint32_treg_almday;
+uint32_treg_almmon;
+uint32_treg_almyear;
+uint32_treg_curticcnt;
+
+ptimer_state*ptimer;/* tick timer */
+ptimer_state*ptimer_1Hz;/* clock timer */
+uint32_tfreq;
+
+qemu_irqtick_irq;   /* Time Tick Generator irq */
+qemu_irqalm_irq;/* alarm irq */
+
+struct tm   current_tm; /* current time

Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes

2012-06-25 Thread Andreas Färber

Am 25.06.2012 09:01, schrieb Jan Kiszka:
 Flush pending coalesced MMIO before performing mapping or state changes
 that could affect the event orderings or route the buffered requests to
 a wrong region.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

 
 In addition, we also have to

Stray paste or missing addition to the above?

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 0/7] esp: add AMD PCscsi emulation

2012-06-25 Thread Kevin Wolf

Am 24.06.2012 19:15, schrieb Hervé Poussineau:
 This patchset adds a PCI connection to the 53c9x emulation already
 present in QEMU.  The emulated card is the AMD PCscsi adapter,
 and has been tested on multiple Microsoft operating systems.
 
 Changes v1 - v2:
 - use qemu_log_mask(LOG_UNIMP, ...) instead of error_report()
 - use prefix esp_pci_* for functions related to PCI emulation
 - add missing break
 - fix DMA start for Select and Transfer Information commands
 
 Hervé Poussineau (9):
   esp: execute select commands immediately when it is a non-dma command
   esp: delay Transfer Information command if dma is not enabled
   esp: implement Disable selection command
   esp: implement Reset ATN command
   esp: support future change of chip_id
   esp: use hba_private field instead of a complex cast
   esp: split esp code into generic chip emulation and sysbus layer
   esp: use trace framework instead of stderr output
   esp: add AMD PCscsi emulation (PCI SCSI adapter)
 
  default-configs/i386-softmmu.mak |1 +
  hw/esp.c |  544 
 --
  hw/pci_ids.h |1 +
  trace-events |   17 ++
  4 files changed, 481 insertions(+), 82 deletions(-)
 

Please keep Paolo CCed, he is the SCSI maintainer (though he's still on
vacation this week, iirc).

Kevin

Re: [Qemu-devel] [PATCH v6 5/6] fdc_test: update media_change test

2012-06-25 Thread Kevin Wolf

Am 24.06.2012 08:16, schrieb Blue Swirl:
 On Fri, Jun 22, 2012 at 10:33 AM, Pavel Hrdina phrd...@redhat.com wrote:
 After rewrite DSKCHG bit handling the test has to be updated. Now
 is needed to seek to different track to clear DSKCHG bit.

 Signed-off-by: Pavel Hrdina phrd...@redhat.com
 ---
  tests/fdc-test.c |   29 +
  1 files changed, 21 insertions(+), 8 deletions(-)

 diff --git a/tests/fdc-test.c b/tests/fdc-test.c
 index 610e2f1..5280eff 100644
 --- a/tests/fdc-test.c
 +++ b/tests/fdc-test.c
 @@ -156,19 +156,20 @@ static uint8_t send_read_command(void)
 return ret;
  }

 -static void send_step_pulse(void)
 +static void send_step_pulse(bool chg_cyl)
  {
 int drive = 0;
 int head = 0;
 -static int cyl = 0;
 +int cyl = 0;
 +
 +if (chg_cyl)
 +cyl = (cyl + 1) % 4;
 
 Missing braces, please use checkpatch.pl to avoid these issues.
 
 % 4 could be turned into  3, maybe with a separate patch.

I wouldn't do that. It's not something like a mask for a bitfield, but
just a way to let counting start from the beginning after a while. If
you have two way to express the same, choose the one that expresses the
clearest what the real intention is. The compiler will optimise it (not
that this optimisation in a test case like this really mattered...)

Kevin

Re: [Qemu-devel] [PATCH] Exynos4: added RTC device

2012-06-25 Thread Evgeny Voevodin


On 25.06.2012 11:55, Oleg Ogurtsov wrote:

Signed-off-by: Oleg Ogurtsovo.ogurt...@samsung.com
---
  hw/arm/Makefile.objs |1 +
  hw/exynos4210.c  |8 +
  hw/exynos4210_rtc.c  |  607 ++
  3 files changed, 616 insertions(+), 0 deletions(-)
  create mode 100644 hw/exynos4210_rtc.c

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 88ff47d..0fdb832 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -11,6 +11,7 @@ obj-y += realview_gic.o realview.o arm_sysctl.o arm11mpcore.o 
a9mpcore.o
  obj-y += exynos4210_gic.o exynos4210_combiner.o exynos4210.o
  obj-y += exynos4_boards.o exynos4210_uart.o exynos4210_pwm.o
  obj-y += exynos4210_pmu.o exynos4210_mct.o exynos4210_fimd.o
+obj-y += exynos4210_rtc.o
  obj-y += arm_l2x0.o
  obj-y += arm_mptimer.o a15mpcore.o
  obj-y += armv7m.o armv7m_nvic.o stellaris.o pl022.o stellaris_enet.o
diff --git a/hw/exynos4210.c b/hw/exynos4210.c
index 9c20b3f..6e105e0 100644
--- a/hw/exynos4210.c
+++ b/hw/exynos4210.c
@@ -33,6 +33,9 @@
  /* PWM */
  #define EXYNOS4210_PWM_BASE_ADDR   0x139D

+/* RTC */
+#define EXYNOS4210_RTC_BASE_ADDR   0x1007
+
  /* MCT */
  #define EXYNOS4210_MCT_BASE_ADDR   0x1005

@@ -258,6 +261,11 @@ Exynos4210State *exynos4210_init(MemoryRegion *system_mem,
s-irq_table[exynos4210_get_irq(22, 3)],
s-irq_table[exynos4210_get_irq(22, 4)],
NULL);
+/* RTC */
+sysbus_create_varargs(exynos4210.rtc, EXYNOS4210_RTC_BASE_ADDR,
+  s-irq_table[exynos4210_get_irq(23, 0)],
+  s-irq_table[exynos4210_get_irq(23, 1)],
+  NULL);

  /* Multi Core Timer */
  dev = qdev_create(NULL, exynos4210.mct);
diff --git a/hw/exynos4210_rtc.c b/hw/exynos4210_rtc.c
new file mode 100644
index 000..2ac6301
--- /dev/null
+++ b/hw/exynos4210_rtc.c
@@ -0,0 +1,607 @@
+/*
+ * Samsung exynos4210 Real Time Clock
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ *  Ogurtsov Olego.ogurt...@samsung.com
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the
+ *  Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but WITHOUT
+ *  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *  FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ *  for more details.
+ *
+ *  You should have received a copy of the GNU General Public License along
+ *  with this program; if not, seehttp://www.gnu.org/licenses/.
+ *
+ */
+
+/* Description:
+ * Register RTCCON:
+ *  CLKSEL Bit[1] not used
+ *  CLKOUTEN Bit[9] not used
+ */
+
+#include sysbus.h
+#include qemu-timer.h
+#include qemu-common.h
+#include ptimer.h
+
+#include hw.h
+#include qemu-timer.h
+#include sysemu.h
+
+#include exynos4210.h
+
+#define DEBUG_RTC 0
+
+#if DEBUG_RTC
+#define DPRINTF(fmt, ...) \
+do { fprintf(stdout, RTC: [%24s:%5d]  fmt, __func__, __LINE__, \
+## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) do {} while (0)
+#endif
+
+#define EXYNOS4210_RTC_REG_MEM_SIZE 0x0100
+
+#define INTP0x0030
+#define RTCCON  0x0040
+#define TICCNT  0x0044
+#define RTCALM  0x0050
+#define ALMSEC  0x0054
+#define ALMMIN  0x0058
+#define ALMHOUR 0x005C
+#define ALMDAY  0x0060
+#define ALMMON  0x0064
+#define ALMYEAR 0x0068
+#define BCDSEC  0x0070
+#define BDCMIN  0x0074
+#define BCDHOUR 0x0078
+#define BCDDAY  0x007C
+#define BCDDAYWEEK  0x0080
+#define BCDMON  0x0084
+#define BCDYEAR 0x0088
+#define CURTICNT0x0090
+
+#define TICK_TIMER_ENABLE   0x0100
+#define TICNT_THRESHHOLD2
+
+
+#define RTC_ENABLE  0x0001
+
+#define INTP_TICK_ENABLE0x0001
+#define INTP_ALM_ENABLE 0x0002
+
+#define ALARM_INT_ENABLE0x0040
+
+#define RTC_BASE_FREQ   32768
+
+typedef struct Exynos4210RTCState {
+SysBusDevice busdev;
+MemoryRegion iomem;
+
+/* registers */
+uint32_treg_intp;
+uint32_treg_rtccon;
+uint32_treg_ticcnt;
+uint32_treg_rtcalm;
+uint32_treg_almsec;
+uint32_treg_almmin;
+uint32_treg_almhour;
+uint32_treg_almday;
+uint32_treg_almmon;
+uint32_treg_almyear;
+uint32_treg_curticcnt;
+
+ptimer_state*ptimer;/* tick timer */
+ptimer_state*ptimer_1Hz;/* clock timer */
+uint32_tfreq;
+
+qemu_irqtick_irq;   /* Time Tick Generator irq */
+qemu_irqalm_irq;/* alarm

Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch

2012-06-25 Thread Peter Crosthwaite

On Mon, Jun 25, 2012 at 5:59 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 25 June 2012 02:51, Peter Crosthwaite
 peter.crosthwa...@petalogix.com wrote:
 You rejected my workaround in V2 (i.e. using -dtb to force is_linux)
 on the grounds that youd accept some reasonable way of saying this
 ELF file is a Linux kernel, That brings us back to V1 - this patch.
 Any objections?

 I'd really like to see some input from people who deal with the
 other QEMU architectures on whether we can come up with something
 coherent as a plan for handling these loader command line options.
 I don't really like here's a switch but it happens to only
 work on ARM targets if we can avoid it.

Well, this switch itself is generic. Theres no armisms in the machine
opt itself.

As for architectures, I can speak for Microblaze and the PPC
bootloader for virtex_ml507. The DTB will blob in regardless of
whether its an elf or not, so no is_linux switch is needed there for
my Linux elfs. There is no functional difference between linux and not
linux in those bootloaders.

 -- PMM

Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes

2012-06-25 Thread Avi Kivity

On 06/25/2012 10:01 AM, Jan Kiszka wrote:
 Flush pending coalesced MMIO before performing mapping or state changes
 that could affect the event orderings or route the buffered requests to
 a wrong region.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
 In addition, we also have to

Yes, we do.

  
  void memory_region_transaction_begin(void)
  {
 +qemu_flush_coalesced_mmio_buffer();
  ++memory_region_transaction_depth;
  }

Why is this needed?

  
 @@ -1109,6 +1110,9 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
  
  void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
  {
 +if (!QTAILQ_EMPTY(mr-coalesced)) {
 +qemu_flush_coalesced_mmio_buffer();
 +}

The readonly attribute is inherited by subregions and alias targets, so
this check is insufficient.  See render_memory_region().  Need to flush
unconditionally.

  if (mr-readonly != readonly) {
  mr-readonly = readonly;
  memory_region_update_topology(mr);
 @@ -1117,6 +1121,9 @@ void memory_region_set_readonly(MemoryRegion *mr, bool 
 readonly)
  
  void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
  {
 +if (!QTAILQ_EMPTY(mr-coalesced)) {
 +qemu_flush_coalesced_mmio_buffer();
 +}

This property is not inherited, but let's flush unconditionally just the
same, to reduce fragility.

 @@ -1219,6 +1228,9 @@ void memory_region_add_eventfd(MemoryRegion *mr,
  };
  unsigned i;
  
 +if (!QTAILQ_EMPTY(mr-coalesced)) {
 +qemu_flush_coalesced_mmio_buffer();
 +}

Ditto.  It's possible that an eventfd overlays a subregion which has
coalescing enabled.  It's not really defined what happens in this case,
and such code and its submitter should be perma-nacked, but let's play
it safe here since there isn't much to be gained by avoiding the flush.
 This code is a very slow path anyway, including and rcu and/or srcu
synchronization, and a rebuild of the dispatch radix tree (trees when we
dma-enable this code).

  for (i = 0; i  mr-ioeventfd_nb; ++i) {
  if (memory_region_ioeventfd_before(mrfd, mr-ioeventfds[i])) {
  break;
 @@ -1249,6 +1261,9 @@ void memory_region_del_eventfd(MemoryRegion *mr,
  };
  unsigned i;
  
 +if (!QTAILQ_EMPTY(mr-coalesced)) {
 +qemu_flush_coalesced_mmio_buffer();
 +}

Same.

The repetitiveness of this code suggests a different way of doing this:
make every API call be its own subtransaction and perform the flush in
memory_region_begin_transaction() (maybe that's the answer to my
question above).

-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH v6 08/16] target-or32: Add instruction tanslation

2012-06-25 Thread Max Filippov

On Mon, Jun 25, 2012 at 6:50 AM, Jia Liu pro...@gmail.com wrote:
 Hi Max,

 On Thu, Jun 21, 2012 at 6:24 PM, Max Filippov jcmvb...@gmail.com wrote:
 On Thu, Jun 21, 2012 at 6:58 AM, Jia Liu pro...@gmail.com wrote:
 Add OpenRISC instruction tanslation routines.

 Signed-off-by: Jia Liu pro...@gmail.com

 [...]

 +    case 0x0009:
 +        switch (op1) {
 +        case 0x03:   /*l.div*/
 +            LOG_DIS(l.div r%d, r%d, r%d\n, rd, ra, rb);
 +            {
 +                int lab0 = gen_new_label();
 +                int lab1 = gen_new_label();
 +                int lab2 = gen_new_label();
 +                TCGv_i32 sr_ove = tcg_temp_local_new_i32();
 +                if (rb == 0) {
 +                    tcg_gen_ori_tl(cpu_sr, cpu_sr, (SR_OV | SR_CY));
 +                    tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab0);
 +                    tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
 +                    gen_exception(dc, EXCP_RANGE);
 +                    gen_set_label(lab0);
 +                } else {
 +                    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_R[rb],
 +                                       0x, lab1);
 +                    tcg_gen_brcondi_tl(TCG_COND_NE, cpu_R[ra],
 +                                       0x, lab2);
 +                    tcg_gen_brcondi_tl(TCG_COND_NE, cpu_R[rb],
 +                                       0x8000, lab2);
 +                    gen_set_label(lab1);
 +                    tcg_gen_ori_tl(cpu_sr, cpu_sr, (SR_OV | SR_CY));
 +                    tcg_gen_andi_tl(sr_ove, cpu_sr, SR_OVE);
 +                    tcg_gen_brcondi_tl(TCG_COND_NE, sr_ove, SR_OVE, lab2);

 Causes host division by zero/overflow. I'd suggest to brcond to lab3 set 
 after
 the final tcg_gen_div.

 Causes host division by zero/overflow? Can I handle the host code? I'm
 confused about this.
 May I get more comment about this? Sorry I didn't understand it.

The brcondi in question jumps to the division operation although we
know for sure
that it's going to be a division by zero or an overflow. The host is
not smarter than
we are, it cannot magically divide by zero. So, if the result register value is
undefined in case of division by zero/overflow than you should skip the division
and brcondi to a label lab3, right after it.

 +                    gen_exception(dc, EXCP_RANGE);
 +                    gen_set_label(lab2);
 +                    tcg_gen_div_tl(cpu_R[rd], cpu_R[ra], cpu_R[rb]);

+ gen_set_label(lab3);

 +                }
 +                tcg_temp_free_i32(sr_ove);
 +            }
 +            break;

A picture worth a thousand words is the test for this specific case.
I think it can look like this (because SR_OVE is cleared upon reset):

int main(void)
{
int a, b, c;

b = 0x8000;
c = 0x;

__asm
(l.div  %0, %1, %2\n
 : =r(a)
 : r(b), r(c)
);
return 0;
}

I believe that this test with the current div implementation would crash qemu.

BTW, shouldn't tests that detect wrong behavior do return 1 or something
like this to make make check fully automatic?

-- 
Thanks.
-- Max

Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch

2012-06-25 Thread Peter Maydell

On 25 June 2012 09:55, Peter Crosthwaite
peter.crosthwa...@petalogix.com wrote:
 As for architectures, I can speak for Microblaze and the PPC
 bootloader for virtex_ml507. The DTB will blob in regardless of
 whether its an elf or not, so no is_linux switch is needed there for
 my Linux elfs. There is no functional difference between linux and not
 linux in those bootloaders.

Yes, it is exactly the on architecture X we behave like this and
on architecture Y we do something else that I dislike, and this is
why I'm not happy with any patch which is just trying to band-aid
a situation on architecture X rather than trying to address the
more general situation.

-- PMM

Re: [Qemu-devel] [PATCH 2/2] virtio-scsi: Implement hotplug support for virtio-scsi

2012-06-25 Thread Stefan Hajnoczi

On Mon, Jun 25, 2012 at 03:51:13AM -0400, m...@linux.vnet.ibm.com wrote:
 
 Quoting Stefan Hajnoczi stefa...@gmail.com:
 
 On Wed, Jun 20, 2012 at 7:47 AM, Cong Meng m...@linux.vnet.ibm.com wrote:
 Implement the hotplug() and hot_unplug() interfaces in virtio-scsi,
 by signal
 the virtio_scsi.ko in guest kernel via event virtual queue.
 
 The counterpart patch of virtio_scsi.ko will be sent soon in another thread.
 
 Signed-off-by: Cong Meng m...@linux.vnet.ibm.com
 Signed-off-by: Sen Wang senw...@linux.vnet.ibm.com
 ---
  hw/virtio-scsi.c |   72
 +++--
  1 files changed, 69 insertions(+), 3 deletions(-)
 
 I compared against the virtio-scsi specification and this looks good:
 http://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf
 
 Dropped events and event throttling are not implemented by this patch.
  This means that the guest can miss events if it runs out of event
 queue elements.  A scenario that might be able to trigger this is if
 multiple LUNs are hotplugged in a single QEMU monitor callback.
 
 Implementing dropped events is easy in hw/virtio-scsi.c.  Keep a bool
 or counter of dropped events and report them when the guest kicks us
 with a free event element (virtio_scsi_handle_event).
 
 Yes. It's easy to do this in qemu. But I'm not sure what should be done
 in virtio-scsi.ko to respond the VIRTIO_SCSI_T_EVENTS_MISSED event.
 The spec says poll the logical units for unit attention conditions, or
 just a whole bus rescan?

I'm not sure what the answer is either, maybe you can find an existing
SCSI LLD that does what you need.

Stefan

Re: [Qemu-devel] [PATCH v4 1/7] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg

2012-06-25 Thread Stefan Hajnoczi

On Fri, Jun 22, 2012 at 01:31:17PM -0600, Eric Blake wrote:
 On 06/22/2012 12:36 PM, Corey Bryant wrote:
  This sets the close-on-exec flag for the file descriptor received
  via SCM_RIGHTS.
  
  Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com
  ---
  v4
   -This patch is new in v4 (ebl...@redhat.com)
  
   qemu-char.c |2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/qemu-char.c b/qemu-char.c
  index c2aaaee..f890113 100644
  --- a/qemu-char.c
  +++ b/qemu-char.c
  @@ -2263,7 +2263,7 @@ static ssize_t tcp_chr_recv(CharDriverState *chr, 
  char *buf, size_t len)
   msg.msg_control = msg_control;
   msg.msg_controllen = sizeof(msg_control);
   
  -ret = recvmsg(s-fd, msg, 0);
  +ret = recvmsg(s-fd, msg, MSG_CMSG_CLOEXEC);
 
 MSG_CMSG_CLOEXEC is not (yet) in POSIX (although it has been proposed
 for addition); therefore, at the moment, it only exists on Linux and
 Cygwin.  Does this need to have conditional code to allow compilation on
 BSD, such as:
 
 #ifndef MSG_CMSG_CLOEXEC
 # define MSG_CMSG_CLOEXEC 0
 #endif
 
 as well as fallback code that sets FD_CLOEXEC manually via fcntl() when
 MSG_CMSG_CLOEXEC is missing?

Good point.  I think the answer is yes.  Just like qemu_open() we can
wrap recvmsg(2) with fd so that platforms with MSG_CMSG_CLOEXEC use
that flag and other platforms use qemu_set_cloexec().

Stefan

Re: [Qemu-devel] [RFC PATCH V1 1/2] arm_boot: added linux switch

2012-06-25 Thread Peter Crosthwaite

On Mon, Jun 25, 2012 at 7:03 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 25 June 2012 09:55, Peter Crosthwaite
 peter.crosthwa...@petalogix.com wrote:
 As for architectures, I can speak for Microblaze and the PPC
 bootloader for virtex_ml507. The DTB will blob in regardless of
 whether its an elf or not, so no is_linux switch is needed there for
 my Linux elfs. There is no functional difference between linux and not
 linux in those bootloaders.

 Yes, it is exactly the on architecture X we behave like this and
 on architecture Y we do something else that I dislike, and this is
 why I'm not happy with any patch which is just trying to band-aid
 a situation on architecture X rather than trying to address the
 more general situation.


Well if X and Y behave differently and you want to unify them, then
either X or Y loses backwards compatibility. Someone is going to have
their command line change on them.

Looking at some of the other bootloaders, I dont see others that
assume elfs are not linux, so it seems to me that ARM is the
exception, not the rule. So from that unification would involve
getting rid of this ARM assumption that elfs are different. Just boot
the elf with the linux style bootloader like most other platforms.

 -- PMM

Re: [Qemu-devel] [PATCH] Exynos4: added RTC device

2012-06-25 Thread Andreas Färber

Am 25.06.2012 09:55, schrieb Oleg Ogurtsov:
 Signed-off-by: Oleg Ogurtsov o.ogurt...@samsung.com
 ---
  hw/arm/Makefile.objs |1 +
  hw/exynos4210.c  |8 +
  hw/exynos4210_rtc.c  |  607 
 ++
  3 files changed, 616 insertions(+), 0 deletions(-)
  create mode 100644 hw/exynos4210_rtc.c

This RTC like many other Exynos devices has no dependency on the CPU. I
have a patch in preparation that moves such devices from
hw/arm/Makefile.objs to hw/Makefile.objs.
I don't object to this patch, not even minor style nits spotted,
compliment, but if you have to respin for some reason, it would be nice
if you could consider that improvement.

One thing I noted though, is it time to share the static BCD conversion
helpers used in the various RTCs? (bcd.c? qemu-common.h?) Or are they
slightly different?

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [RFC] Attaching EHCI to sysbus.

2012-06-25 Thread Peter Crosthwaite

Hi All,

I have a platform (Xilinx Zynq) that has a USB EHCI controller that
attaches directly to the system bus and not through PCI. We are
looking for a way to disentangle EHCI from PCI - currently it inherits
from TYPE_PCI_DEVICE:

static TypeInfo ehci_info = {
.name  = usb-ehci,
.parent= TYPE_PCI_DEVICE,
.instance_size = sizeof(EHCIState),
.class_init= ehci_class_init,
};

How would we go about edit EHCI to inherit from TYPE_SYSBUS_DEVICE as well?

Regards,
Peter

Re: [Qemu-devel] [PATCH] Exynos4: added RTC device

2012-06-25 Thread Peter Maydell

On 25 June 2012 08:55, Oleg Ogurtsov o.ogurt...@samsung.com wrote:
 Subject:  [PATCH] Exynos4: added RTC device

add, please.


 Signed-off-by: Oleg Ogurtsov o.ogurt...@samsung.com

(You don't need to send a cover letter for a single patch, by the way.)

 +#define     BCDSEC          0x0070
 +#define     BDCMIN          0x0074

typo, should be BCDMIN I assume.

 +#define     BCDHOUR         0x0078
 +#define     BCDDAY          0x007C
 +#define     BCDDAYWEEK      0x0080
 +#define     BCDMON          0x0084
 +#define     BCDYEAR         0x0088

 +/*
 + * 1Hz clock handler
 + */
 +static void exynos4210_rtc_1Hz_tick(void *opaque)
 +{
 +    Exynos4210RTCState *s = (Exynos4210RTCState *)opaque;
 +
 +    rtc_next_second(s-current_tm);
 +    /* DPRINTF(1Hz tick\n); */
 +
 +    /* raise IRQ */
 +    if (s-reg_rtcalm  ALARM_INT_ENABLE) {
 +        check_alarm_raise(s);
 +    }
 +
 +    ptimer_set_count(s-ptimer_1Hz, RTC_BASE_FREQ);
 +    ptimer_run(s-ptimer_1Hz, 1);
 +}

You could I think structure this so that instead of
running the timer every second you only have it go
off at the alarm time (you then recalculate the right
values etc if the guest reads/writes the registers).
That would be more complicated though and I guess one
timer fire every second isn't a big deal. We seem to
do it this way in other RTCs too, so this code is OK.

-- PMM

[Qemu-devel] [net-next RFC V4 PATCH 3/4] virtio: introduce a method to get the irq of a specific virtqueue

2012-06-25 Thread Jason Wang

Device specific irq optimizations such as irq affinity may be used by virtio
drivers. So this patch introduce a new method to get the irq of a specific
virtqueue.

After this patch, virtio device drivers could query the irq and do device
specific optimizations. First user would be virtio-net.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/lguest/lguest_device.c |8 
 drivers/s390/kvm/kvm_virtio.c  |6 ++
 drivers/virtio/virtio_mmio.c   |8 
 drivers/virtio/virtio_pci.c|   12 
 include/linux/virtio_config.h  |4 
 5 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index 9e8388e..bcd080f 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -392,6 +392,13 @@ static const char *lg_bus_name(struct virtio_device *vdev)
return ;
 }
 
+static int lg_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+   struct lguest_vq_info *lvq = vq-priv;
+
+   return lvq-config.irq;
+}
+
 /* The ops structure which hooks everything together. */
 static struct virtio_config_ops lguest_config_ops = {
.get_features = lg_get_features,
@@ -404,6 +411,7 @@ static struct virtio_config_ops lguest_config_ops = {
.find_vqs = lg_find_vqs,
.del_vqs = lg_del_vqs,
.bus_name = lg_bus_name,
+   .get_vq_irq = lg_get_vq_irq,
 };
 
 /*
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index d74e9ae..a897de2 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -268,6 +268,11 @@ static const char *kvm_bus_name(struct virtio_device *vdev)
return ;
 }
 
+static int kvm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+   return 0x2603;
+}
+
 /*
  * The config ops structure as defined by virtio config
  */
@@ -282,6 +287,7 @@ static struct virtio_config_ops kvm_vq_configspace_ops = {
.find_vqs = kvm_find_vqs,
.del_vqs = kvm_del_vqs,
.bus_name = kvm_bus_name,
+   .get_vq_irq = kvm_get_vq_irq,
 };
 
 /*
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index f5432b6..2ba37ed 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -411,6 +411,13 @@ static const char *vm_bus_name(struct virtio_device *vdev)
return vm_dev-pdev-name;
 }
 
+static int vm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
+
+   return platform_get_irq(vm_dev-pdev, 0);
+}
+
 static struct virtio_config_ops virtio_mmio_config_ops = {
.get= vm_get,
.set= vm_set,
@@ -422,6 +429,7 @@ static struct virtio_config_ops virtio_mmio_config_ops = {
.get_features   = vm_get_features,
.finalize_features = vm_finalize_features,
.bus_name   = vm_bus_name,
+   .get_vq_irq = vm_get_vq_irq,
 };
 
 
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index adb24f2..c062227 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -607,6 +607,17 @@ static const char *vp_bus_name(struct virtio_device *vdev)
return pci_name(vp_dev-pci_dev);
 }
 
+static int vp_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   struct virtio_pci_vq_info *info = vq-priv;
+
+   if (vp_dev-intx_enabled)
+   return vp_dev-pci_dev-irq;
+   else
+   return vp_dev-msix_entries[info-msix_vector].vector;
+}
+
 static struct virtio_config_ops virtio_pci_config_ops = {
.get= vp_get,
.set= vp_set,
@@ -618,6 +629,7 @@ static struct virtio_config_ops virtio_pci_config_ops = {
.get_features   = vp_get_features,
.finalize_features = vp_finalize_features,
.bus_name   = vp_bus_name,
+   .get_vq_irq = vp_get_vq_irq,
 };
 
 static void virtio_pci_release_dev(struct device *_d)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index fc457f4..acd6930 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -98,6 +98,9 @@
  * vdev: the virtio_device
  *  This returns a pointer to the bus name a la pci_name from which
  *  the caller can then copy.
+ * @get_vq_irq: get the irq numer of the specific virt queue.
+ *  vdev: the virtio_device
+ *  vq: the virtqueue
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
@@ -116,6 +119,7 @@ struct virtio_config_ops {
u32 (*get_features)(struct virtio_device *vdev);
void (*finalize_features)(struct virtio_device *vdev);
const char *(*bus_name)(struct virtio_device *vdev);
+   int (*get_vq_irq)(struct virtio_device *vdev, struct virtqueue *vq);
 };
 
 /* If driver didn't advertise

[Qemu-devel] [net-next RFC V4 PATCH 4/4] virtio_net: multiqueue support

2012-06-25 Thread Jason Wang

This addes multiqueue support to virtio_net driver. This feature is negotiated
through VIRTIO_NET_F_MULTIQUEUE.

The driver expects the number of rx/tx queue paris is equal to the number of
vcpus. To maximize the performance under this per-cpu rx/tx queue pairs, some
optimization were introduced:

- Txq selection is based on the processor id in order to avoid contending a lock
 whose owner may exits to host.
- Since the txq/txq were per-cpu, affinity hint were set to the cpu that owns
 the queue pairs.

Signed-off-by: Krishna Kumar krkum...@in.ibm.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c   |  695 
 include/linux/virtio_net.h |2 +
 2 files changed, 506 insertions(+), 191 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f18149a..5c719de 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,7 @@
 #include linux/scatterlist.h
 #include linux/if_vlan.h
 #include linux/slab.h
+#include linux/interrupt.h
 
 static int napi_weight = 128;
 module_param(napi_weight, int, 0444);
@@ -51,43 +52,73 @@ struct virtnet_stats {
u64 rx_packets;
 };
 
-struct virtnet_info {
-   struct virtio_device *vdev;
-   struct virtqueue *rvq, *svq, *cvq;
-   struct net_device *dev;
+/* Internal representation of a send virtqueue */
+struct send_queue {
+   /* Virtqueue associated with this send _queue */
+   struct virtqueue *vq;
+
+   /* TX: fragments + linear part + virtio header */
+   struct scatterlist sg[MAX_SKB_FRAGS + 2];
+
+   cpumask_var_t affinity_mask;
+};
+
+/* Internal representation of a receive virtqueue */
+struct receive_queue {
+   /* Virtqueue associated with this receive_queue */
+   struct virtqueue *vq;
+
+   /* Back pointer to the virtnet_info */
+   struct virtnet_info *vi;
+
struct napi_struct napi;
-   unsigned int status;
 
/* Number of input buffers, and max we've ever had. */
unsigned int num, max;
 
+   /* Work struct for refilling if we run low on memory. */
+   struct delayed_work refill;
+
+   /* Chain pages by the private ptr. */
+   struct page *pages;
+
+   /* RX: fragments + linear part + virtio header */
+   struct scatterlist sg[MAX_SKB_FRAGS + 2];
+
+   cpumask_var_t affinity_mask;
+};
+
+struct virtnet_info {
+   int num_queue_pairs;/* # of RX/TX vq pairs */
+
+   struct send_queue **sq;
+   struct receive_queue **rq;
+   struct virtqueue *cvq;
+
+   struct virtio_device *vdev;
+   struct net_device *dev;
+   unsigned int status;
+
/* I like... big packets and I cannot lie! */
bool big_packets;
 
/* Host will merge rx buffers for big packets (shake it! shake it!) */
bool mergeable_rx_bufs;
 
+   /* Has control virtqueue */
+   bool has_cvq;
+
/* enable config space updates */
bool config_enable;
 
/* Active statistics */
struct virtnet_stats __percpu *stats;
 
-   /* Work struct for refilling if we run low on memory. */
-   struct delayed_work refill;
-
/* Work struct for config space updates */
struct work_struct config_work;
 
/* Lock for config space updates */
struct mutex config_lock;
-
-   /* Chain pages by the private ptr. */
-   struct page *pages;
-
-   /* fragments + linear part + virtio header */
-   struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
-   struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
 };
 
 struct skb_vnet_hdr {
@@ -108,6 +139,22 @@ struct padded_vnet_hdr {
char padding[6];
 };
 
+static inline int txq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq)
+{
+   int ret = virtqueue_get_queue_index(vq);
+
+   /* skip ctrl vq */
+   if (vi-has_cvq)
+   return (ret - 1) / 2;
+   else
+   return ret / 2;
+}
+
+static inline int rxq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq)
+{
+   return virtqueue_get_queue_index(vq) / 2;
+}
+
 static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
 {
return (struct skb_vnet_hdr *)skb-cb;
@@ -117,22 +164,22 @@ static inline struct skb_vnet_hdr *skb_vnet_hdr(struct 
sk_buff *skb)
  * private is used to chain pages for big packets, put the whole
  * most recent used list in the beginning for reuse
  */
-static void give_pages(struct virtnet_info *vi, struct page *page)
+static void give_pages(struct receive_queue *rq, struct page *page)
 {
struct page *end;
 
/* Find end of list, sew whole thing into vi-pages. */
for (end = page; end-private; end = (struct page *)end-private);
-   end-private = (unsigned long)vi-pages;
-   vi-pages = page;
+   end-private = (unsigned long)rq-pages;
+   rq-pages = page;
 }
 
-static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
+static struct

Re: [Qemu-devel] [PATCH v6 5/6] fdc_test: update media_change test

2012-06-25 Thread Kevin Wolf

Am 22.06.2012 12:33, schrieb Pavel Hrdina:
 After rewrite DSKCHG bit handling the test has to be updated. Now
 is needed to seek to different track to clear DSKCHG bit.
 
 Signed-off-by: Pavel Hrdina phrd...@redhat.com
 ---
  tests/fdc-test.c |   29 +
  1 files changed, 21 insertions(+), 8 deletions(-)
 
 diff --git a/tests/fdc-test.c b/tests/fdc-test.c
 index 610e2f1..5280eff 100644
 --- a/tests/fdc-test.c
 +++ b/tests/fdc-test.c
 @@ -156,19 +156,20 @@ static uint8_t send_read_command(void)
  return ret;
  }
  
 -static void send_step_pulse(void)
 +static void send_step_pulse(bool chg_cyl)
  {
  int drive = 0;
  int head = 0;
 -static int cyl = 0;
 +int cyl = 0;
 +
 +if (chg_cyl)
 +cyl = (cyl + 1) % 4;

Why do you remove the static? Previously, the function would cycle
through cylinders 1-4. Now it's always 0 if !chg_cyl and 1 if chg_cyl. I
think you need to fix this.

I also don't quite understand why you move the increment to here instead
of leaving it as the bottom. It doesn't look wrong, but pointless.

  
  floppy_send(CMD_SEEK);
  floppy_send(head  2 | drive);
  g_assert(!get_irq(FLOPPY_IRQ));
  floppy_send(cyl);
  ack_irq();
 -
 -cyl = (cyl + 1) % 4;
  }
  
  static uint8_t cmos_read(uint8_t reg)
 @@ -195,8 +196,7 @@ static void test_no_media_on_start(void)
  assert_bit_set(dir, DSKCHG);
  dir = inb(FLOPPY_BASE + reg_dir);
  assert_bit_set(dir, DSKCHG);
 -send_step_pulse();
 -send_step_pulse();
 +send_step_pulse(1);

This is a bool parameter, so you should pass true/false instead of 0/1.

Other than that, the test case additions look good and I'll accept it as
sufficient for the DSKCHG handling change. It would be great, though, to
add some more cases that test the implicit seeks during commands like
READ (ideally one for each command that can seek).

Kevin

Re: [Qemu-devel] [RFC] Attaching EHCI to sysbus.

2012-06-25 Thread Andreas Färber

Hi Peter,

Am 25.06.2012 11:34, schrieb Peter Crosthwaite:
 I have a platform (Xilinx Zynq) that has a USB EHCI controller that
 attaches directly to the system bus and not through PCI. We are
 looking for a way to disentangle EHCI from PCI - currently it inherits
 from TYPE_PCI_DEVICE:
 
 static TypeInfo ehci_info = {
 .name  = usb-ehci,
 .parent= TYPE_PCI_DEVICE,
 .instance_size = sizeof(EHCIState),
 .class_init= ehci_class_init,
 };
 
 How would we go about edit EHCI to inherit from TYPE_SYSBUS_DEVICE as well?

As mentioned recently, I need that for the Tegra platform as well,
please keep me in the loop.

Basically it means adding a second TypeInfo with its own class_init.
We'd need to have PCIEHCIState with PCIDevice as parent and a
SysBusEHCIState with SysBusDevice as parent, sharing a common EHCIState.
The m48t59 comes to mind as example with both ISADevice and
SysBusDevice. Maintaining VMState compatibility is likely going to be
tricky.

Following David's DMA refactoring this also means that the PCI DMA
helpers used in EHCI need to be replaced by the generic ones now
suggested by Ben, as done for OHCI.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock

2012-06-25 Thread Stefan Hajnoczi

On Sun, Jun 24, 2012 at 3:05 PM, liu ping fan qemul...@gmail.com wrote:
 On Fri, Jun 22, 2012 at 8:36 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, Jun 21, 2012 at 4:06 PM, Liu Ping Fan qemul...@gmail.com wrote:
 diff --git a/cpu-defs.h b/cpu-defs.h
 index f49e950..7305822 100644
 --- a/cpu-defs.h
 +++ b/cpu-defs.h
 @@ -30,6 +30,7 @@
  #include osdep.h
  #include qemu-queue.h
  #include targphys.h
 +#include qemu-thread-posix.h

 This breaks Windows, you need qemu-thread.h.


  #ifndef TARGET_LONG_BITS
  #error TARGET_LONG_BITS must be defined before including this header
 @@ -220,6 +221,7 @@ typedef struct CPUWatchpoint {
     CPU_COMMON_THREAD                                                   \
     struct QemuCond *halt_cond;                                         \
     int thread_kicked;                                                  \
 +    struct QemuMutex *cpu_lock;                                         \

 It would be nicer to declare it QemuMutex cpu_lock (no pointer) so
 that you don't need to worry about malloc/free.

 Yes :), I have considered about it, and not quite sure.  But I figure
 out the lock for per-device to break BQL will be like this for some
 reason.
 After all, have not decided yet.

Start simple.  If you need it to be a pointer later that's fine but
for now I see no reason to do the malloc (especially since there is no
free!).

 diff --git a/main-loop.h b/main-loop.h
 index dce1cd9..d8d44a4 100644
 --- a/main-loop.h
 +++ b/main-loop.h
 @@ -323,6 +323,9 @@ void qemu_bh_delete(QEMUBH *bh);
  int qemu_add_child_watch(pid_t pid);
  #endif

 +void qemu_mutex_lock_cpu(void *_env);
 +void qemu_mutex_unlock_cpu(void *_env);

 Why void* instead of CPUArchState*?

 Because we can hide the CPUArchState from apic.c

There's probably a nicer way of doing that than void*.

Stefan

Re: [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock

2012-06-25 Thread Andreas Färber

Am 25.06.2012 12:04, schrieb Stefan Hajnoczi:
 On Sun, Jun 24, 2012 at 3:05 PM, liu ping fan qemul...@gmail.com wrote:
 On Fri, Jun 22, 2012 at 8:36 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, Jun 21, 2012 at 4:06 PM, Liu Ping Fan qemul...@gmail.com wrote:
 diff --git a/main-loop.h b/main-loop.h
 index dce1cd9..d8d44a4 100644
 --- a/main-loop.h
 +++ b/main-loop.h
 @@ -323,6 +323,9 @@ void qemu_bh_delete(QEMUBH *bh);
  int qemu_add_child_watch(pid_t pid);
  #endif

 +void qemu_mutex_lock_cpu(void *_env);
 +void qemu_mutex_unlock_cpu(void *_env);

 Why void* instead of CPUArchState*?

 Because we can hide the CPUArchState from apic.c
 
 There's probably a nicer way of doing that than void*.

CPUState, as requested.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [RFC V2 PATCH 0/4] Multiqueue support for tap and virtio-net/vhost

2012-06-25 Thread Jason Wang

Hello all:

This seires is an update of last version of multiqueue support to add multiqueue
capability to both tap and virtio-net.

Some kinds of tap backends has (macvatp in linux) or would (tap) support
multiqueue. In such kind of tap backend, each file descriptor of a tap is a
qeueu and ioctls were prodived to attach an exist tap file descriptor to the
tun/tap device. So the patch let qemu to use this kind of backend, and let it
can transmit and receving packets through multiple file descriptors.

Patch 1 introduce a new help to get all matched options, after this patch, we
could pass multiple file descriptors to a signle netdev by:

  qemu -netdev tap,id=hn0,fd=10,fd=11,...

Patch 2 introduce generic helpers in tap to attach or detach a file descriptor
from a tap device, emulated nics could use this helper to enable/disable queues.

Patch 3 modifies the NICState to allow multiple VLANClientState to be stored in
it, with this patch, qemu has basic support of multiple capable tap backend.

Patch 4 converts virtio-net/vhost to be multiple capable. The vhost device were
created per tx/rx queue pairs as usual.

Changes from V1:

- rebase to the latest
- fix memory leak in parse_netdev
- fix guest notifiers assignment/de-assignment
- changes the command lines to:
   qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2

TODO:
- netdev_add
- bridge helper for multiple queue backend

References:
- V1 http://comments.gmane.org/gmane.comp.emulators.qemu/100481

Please review and comments.
---

Jason Wang (4):
  option: introduce qemu_get_opt_all()
  tap: multiqueue support
  net: multiqueue support
  virtio-net: add multiqueue support


 hw/dp8393x.c |2 
 hw/mcf_fec.c |2 
 hw/qdev-properties.c |   33 +++-
 hw/qdev.h|3 
 hw/vhost.c   |   58 --
 hw/vhost.h   |1 
 hw/vhost_net.c   |7 +
 hw/vhost_net.h   |2 
 hw/virtio-net.c  |  461 +-
 hw/virtio-net.h  |3 
 net.c|   62 ++-
 net.h|   16 +-
 net/tap-aix.c|   13 +
 net/tap-bsd.c|   13 +
 net/tap-haiku.c  |   13 +
 net/tap-linux.c  |   55 ++
 net/tap-linux.h  |3 
 net/tap-solaris.c|   13 +
 net/tap-win32.c  |   11 +
 net/tap.c|  189 +---
 net/tap.h|7 +
 qemu-option.c|   19 ++
 qemu-option.h|2 
 23 files changed, 714 insertions(+), 274 deletions(-)

-- 
Signature

[Qemu-devel] [RFC V2 PATCH 1/4] option: introduce qemu_get_opt_all()

2012-06-25 Thread Jason Wang

Sometimes, we need to pass option like -netdev tap,fd=100,fd=101,fd=102 which
can not be properly parsed by qemu_find_opt() because it only returns the first
matched option. So qemu_get_opt_all() were introduced to return an array of
pointers which contains all matched option.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 qemu-option.c |   19 +++
 qemu-option.h |2 ++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/qemu-option.c b/qemu-option.c
index bb3886c..9263125 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -545,6 +545,25 @@ static QemuOpt *qemu_opt_find(QemuOpts *opts, const char 
*name)
 return NULL;
 }
 
+int qemu_opt_get_all(QemuOpts *opts, const char *name, const char **optp,
+ int max)
+{
+QemuOpt *opt;
+int index = 0;
+
+QTAILQ_FOREACH_REVERSE(opt, opts-head, QemuOptHead, next) {
+if (strcmp(opt-name, name) == 0) {
+if (index  max) {
+optp[index++] = opt-str;
+}
+if (index == max) {
+break;
+}
+}
+}
+return index;
+}
+
 const char *qemu_opt_get(QemuOpts *opts, const char *name)
 {
 QemuOpt *opt = qemu_opt_find(opts, name);
diff --git a/qemu-option.h b/qemu-option.h
index 951dec3..3c9a273 100644
--- a/qemu-option.h
+++ b/qemu-option.h
@@ -106,6 +106,8 @@ struct QemuOptsList {
 QemuOptDesc desc[];
 };
 
+int qemu_opt_get_all(QemuOpts *opts, const char *name, const char **optp,
+ int max);
 const char *qemu_opt_get(QemuOpts *opts, const char *name);
 bool qemu_opt_get_bool(QemuOpts *opts, const char *name, bool defval);
 uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t 
defval);

[Qemu-devel] [RFC V2 PATCH 2/4] tap: multiqueue support

2012-06-25 Thread Jason Wang

This patch adds basic support for the multiple queue capable tap device. When
multiqueue were enabled for a tap device, user can attach/detach multiple files
(sockets) to the device through TUNATTACHQUEUE/TUNDETACHQUEUE.

Two helpers tun_attach() and tun_deatch() were introduced to attach and detach
file. Platform-specific helpers were called and only linux helper has its
content as multiqueue tap were only supported in linux.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 net.c |4 +
 net/tap-aix.c |   13 +++-
 net/tap-bsd.c |   13 +++-
 net/tap-haiku.c   |   13 +++-
 net/tap-linux.c   |   55 +++
 net/tap-linux.h   |3 +
 net/tap-solaris.c |   13 +++-
 net/tap-win32.c   |   11 +++
 net/tap.c |  189 ++---
 net/tap.h |7 ++
 10 files changed, 245 insertions(+), 76 deletions(-)

diff --git a/net.c b/net.c
index 4aa416c..eabe830 100644
--- a/net.c
+++ b/net.c
@@ -978,6 +978,10 @@ static const struct {
 .name = vhostforce,
 .type = QEMU_OPT_BOOL,
 .help = force vhost on for non-MSIX virtio guests,
+}, {
+.name = queues,
+.type = QEMU_OPT_NUMBER,
+.help = number of queues the backend can provides,
 },
 #endif /* _WIN32 */
 { /* end of list */ }
diff --git a/net/tap-aix.c b/net/tap-aix.c
index e19aaba..f111e0f 100644
--- a/net/tap-aix.c
+++ b/net/tap-aix.c
@@ -25,7 +25,8 @@
 #include net/tap.h
 #include stdio.h
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int 
vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+ int vnet_hdr_required, int attach)
 {
 fprintf(stderr, no tap on AIX\n);
 return -1;
@@ -59,3 +60,13 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
 int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd, const char *ifname)
+{
+return -1;
+}
+
+int tap_fd_detach(int fd, const char *ifname)
+{
+return -1;
+}
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 937a94b..44f3421 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -33,7 +33,8 @@
 #include net/if_tap.h
 #endif
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int 
vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+ int vnet_hdr_required, int attach)
 {
 int fd;
 #ifdef TAPGIFNAME
@@ -145,3 +146,13 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
 int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd, const char *ifname)
+{
+return -1;
+}
+
+int tap_fd_detach(int fd, const char *ifname)
+{
+return -1;
+}
diff --git a/net/tap-haiku.c b/net/tap-haiku.c
index 91dda8e..6fb6719 100644
--- a/net/tap-haiku.c
+++ b/net/tap-haiku.c
@@ -25,7 +25,8 @@
 #include net/tap.h
 #include stdio.h
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int 
vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+ int vnet_hdr_required, int attach)
 {
 fprintf(stderr, no tap on Haiku\n);
 return -1;
@@ -59,3 +60,13 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
 int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_attach(int fd, const char *ifname)
+{
+return -1;
+}
+
+int tap_fd_detach(int fd, const char *ifname)
+{
+return -1;
+}
diff --git a/net/tap-linux.c b/net/tap-linux.c
index 41d581b..5d74b53 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -35,7 +35,8 @@
 
 #define PATH_NET_TUN /dev/net/tun
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int 
vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+ int vnet_hdr_required, int attach)
 {
 struct ifreq ifr;
 int fd, ret;
@@ -47,6 +48,8 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, 
int vnet_hdr_required
 }
 memset(ifr, 0, sizeof(ifr));
 ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
+if (!attach)
+ifr.ifr_flags |= IFF_MULTI_QUEUE;
 
 if (*vnet_hdr) {
 unsigned int features;
@@ -71,7 +74,10 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, 
int vnet_hdr_required
 pstrcpy(ifr.ifr_name, IFNAMSIZ, ifname);
 else
 pstrcpy(ifr.ifr_name, IFNAMSIZ, tap%d);
-ret = ioctl(fd, TUNSETIFF, (void *) ifr);
+if (attach)
+ret = ioctl(fd, TUNATTACHQUEUE, (void *) ifr);
+else
+ret = ioctl(fd, TUNSETIFF, (void *) ifr);
 if (ret != 0) {
 if (ifname[0] != '\0') {
 error_report(could not configure %s (%s): %m, PATH_NET_TUN, 
ifr.ifr_name);
@@ -197,3 +203,48 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
 }
 }
 }
+
+/* Attach a file descriptor to a TUN/TAP device. This descriptor should be
+ * detached before.
+ */
+int tap_fd_attach(int fd, const char *ifname)
+{
+struct ifreq ifr;
+int ret;
+
+

[Qemu-devel] [RFC V2 PATCH 3/4] net: multiqueue support

2012-06-25 Thread Jason Wang

This patch adds the multiqueues support for emulated nics. Each VLANClientState
pairs are now abstract as a queue instead of a nic, and multiple VLANClientState
pointers were stored in the NICState. A queue_index were also introduced to let
the emulated nics know which queue the packet were came from or sent
out. Virtio-net would be the first user.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 hw/dp8393x.c |2 +-
 hw/mcf_fec.c |2 +-
 hw/qdev-properties.c |   33 +++-
 hw/qdev.h|3 ++-
 net.c|   58 +++---
 net.h|   16 ++
 6 files changed, 93 insertions(+), 21 deletions(-)

diff --git a/hw/dp8393x.c b/hw/dp8393x.c
index 017d074..483a868 100644
--- a/hw/dp8393x.c
+++ b/hw/dp8393x.c
@@ -900,7 +900,7 @@ void dp83932_init(NICInfo *nd, target_phys_addr_t base, int 
it_shift,
 
 s-conf.macaddr = nd-macaddr;
 s-conf.vlan = nd-vlan;
-s-conf.peer = nd-netdev;
+s-conf.peers[0] = nd-netdev;
 
 s-nic = qemu_new_nic(net_dp83932_info, s-conf, nd-model, nd-name, s);
 
diff --git a/hw/mcf_fec.c b/hw/mcf_fec.c
index ae37bef..69f508d 100644
--- a/hw/mcf_fec.c
+++ b/hw/mcf_fec.c
@@ -473,7 +473,7 @@ void mcf_fec_init(MemoryRegion *sysmem, NICInfo *nd,
 
 s-conf.macaddr = nd-macaddr;
 s-conf.vlan = nd-vlan;
-s-conf.peer = nd-netdev;
+s-conf.peers[0] = nd-netdev;
 
 s-nic = qemu_new_nic(net_mcf_fec_info, s-conf, nd-model, nd-name, s);
 
diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 9ae3187..d45fcef 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -554,16 +554,37 @@ PropertyInfo qdev_prop_chr = {
 
 static int parse_netdev(DeviceState *dev, const char *str, void **ptr)
 {
-VLANClientState *netdev = qemu_find_netdev(str);
+VLANClientState ***nc = (VLANClientState ***)ptr;
+VLANClientState *vcs[MAX_QUEUE_NUM];
+int queues, i = 0;
+int ret;
 
-if (netdev == NULL) {
-return -ENOENT;
+*nc = g_malloc(MAX_QUEUE_NUM * sizeof(VLANClientState *));
+queues = qemu_find_netdev_all(str, vcs, MAX_QUEUE_NUM);
+if (queues == 0) {
+ret = -ENOENT;
+goto err;
 }
-if (netdev-peer) {
-return -EEXIST;
+
+for (i = 0; i  queues; i++) {
+if (vcs[i] == NULL) {
+ret = -ENOENT;
+goto err;
+}
+
+if (vcs[i]-peer) {
+ret = -EEXIST;
+goto err;
+}
+
+(*nc)[i] = vcs[i];
 }
-*ptr = netdev;
+
 return 0;
+
+err:
+g_free(*nc);
+return ret;
 }
 
 static const char *print_netdev(void *ptr)
diff --git a/hw/qdev.h b/hw/qdev.h
index 5386b16..1c023b4 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -248,6 +248,7 @@ extern PropertyInfo qdev_prop_blocksize;
 .defval= (bool)_defval,  \
 }
 
+
 #define DEFINE_PROP_UINT8(_n, _s, _f, _d)   \
 DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_uint8, uint8_t)
 #define DEFINE_PROP_UINT16(_n, _s, _f, _d)  \
@@ -274,7 +275,7 @@ extern PropertyInfo qdev_prop_blocksize;
 #define DEFINE_PROP_STRING(_n, _s, _f) \
 DEFINE_PROP(_n, _s, _f, qdev_prop_string, char*)
 #define DEFINE_PROP_NETDEV(_n, _s, _f) \
-DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState*)
+DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState**)
 #define DEFINE_PROP_VLAN(_n, _s, _f) \
 DEFINE_PROP(_n, _s, _f, qdev_prop_vlan, VLANState*)
 #define DEFINE_PROP_DRIVE(_n, _s, _f) \
diff --git a/net.c b/net.c
index eabe830..026a03a 100644
--- a/net.c
+++ b/net.c
@@ -238,16 +238,40 @@ NICState *qemu_new_nic(NetClientInfo *info,
 {
 VLANClientState *nc;
 NICState *nic;
+int i;
 
 assert(info-type == NET_CLIENT_TYPE_NIC);
 assert(info-size = sizeof(NICState));
 
-nc = qemu_new_net_client(info, conf-vlan, conf-peer, model, name);
+if (conf-peers) {
+nc = qemu_new_net_client(info, NULL, conf-peers[0], model, name);
+} else {
+nc = qemu_new_net_client(info, conf-vlan, NULL, model, name);
+}
 
 nic = DO_UPCAST(NICState, nc, nc);
 nic-conf = conf;
 nic-opaque = opaque;
 
+/* For compatiablity with single queue nic */
+nic-ncs[0] = nc;
+nc-opaque = nic;
+
+for (i = 1 ; i  conf-queues; i++) {
+VLANClientState *vc = g_malloc0(sizeof(*vc));
+vc-opaque = nic;
+nic-ncs[i] = vc;
+vc-peer = conf-peers[i];
+vc-info = info;
+vc-queue_index = i;
+vc-peer-peer = vc;
+QTAILQ_INSERT_TAIL(non_vlan_clients, vc, next);
+
+vc-send_queue = qemu_new_net_queue(qemu_deliver_packet,
+qemu_deliver_packet_iov,
+vc);
+}
+
 return nic;
 }
 
@@ -283,11 +307,10 @@ void qemu_del_vlan_client(VLANClientState *vc)
 {
 /*

[Qemu-devel] [RFC V2 PATCH 4/4] virtio-net: add multiqueue support

2012-06-25 Thread Jason Wang

This patch let the virtio-net can transmit and recevie packets through multiuple
VLANClientStates and abstract them as multiple virtqueues to guest. A new
parameter 'queues' were introduced to specify the number of queue pairs.

The main goal for vhost support is to let the multiqueue could be used without
changes in vhost code. So each vhost_net structure were used to track a single
VLANClientState and two virtqueues in the past. As multiple VLANClientState were
stored in the NICState, we can infer the correspond VLANClientState from this
and queue_index easily.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 hw/vhost.c  |   58 ---
 hw/vhost.h  |1 
 hw/vhost_net.c  |7 +
 hw/vhost_net.h  |2 
 hw/virtio-net.c |  461 +--
 hw/virtio-net.h |3 
 6 files changed, 355 insertions(+), 177 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index 43664e7..6318bb2 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -620,11 +620,12 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
 {
 target_phys_addr_t s, l, a;
 int r;
+int vhost_vq_index = (idx  2 ? idx - 1 : idx) % dev-nvqs;
 struct vhost_vring_file file = {
-.index = idx,
+   .index = vhost_vq_index
 };
 struct vhost_vring_state state = {
-.index = idx,
+.index = vhost_vq_index
 };
 struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
 
@@ -670,11 +671,12 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
 goto fail_alloc_ring;
 }
 
-r = vhost_virtqueue_set_addr(dev, vq, idx, dev-log_enabled);
+r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev-log_enabled);
 if (r  0) {
 r = -errno;
 goto fail_alloc;
 }
+
 file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq));
 r = ioctl(dev-control, VHOST_SET_VRING_KICK, file);
 if (r) {
@@ -715,7 +717,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
 unsigned idx)
 {
 struct vhost_vring_state state = {
-.index = idx,
+.index = (idx  2 ? idx - 1 : idx) % dev-nvqs,
 };
 int r;
 r = ioctl(dev-control, VHOST_GET_VRING_BASE, state);
@@ -829,7 +831,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, 
VirtIODevice *vdev)
 }
 
 for (i = 0; i  hdev-nvqs; ++i) {
-r = vdev-binding-set_host_notifier(vdev-binding_opaque, i, true);
+r = vdev-binding-set_host_notifier(vdev-binding_opaque,
+hdev-start_idx + i,
+true);
 if (r  0) {
 fprintf(stderr, vhost VQ %d notifier binding failed: %d\n, i, 
-r);
 goto fail_vq;
@@ -839,7 +843,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, 
VirtIODevice *vdev)
 return 0;
 fail_vq:
 while (--i = 0) {
-r = vdev-binding-set_host_notifier(vdev-binding_opaque, i, false);
+r = vdev-binding-set_host_notifier(vdev-binding_opaque,
+hdev-start_idx + i,
+false);
 if (r  0) {
 fprintf(stderr, vhost VQ %d notifier cleanup error: %d\n, i, -r);
 fflush(stderr);
@@ -860,7 +866,9 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, 
VirtIODevice *vdev)
 int i, r;
 
 for (i = 0; i  hdev-nvqs; ++i) {
-r = vdev-binding-set_host_notifier(vdev-binding_opaque, i, false);
+r = vdev-binding-set_host_notifier(vdev-binding_opaque,
+hdev-start_idx + i,
+false);
 if (r  0) {
 fprintf(stderr, vhost VQ %d notifier cleanup failed: %d\n, i, 
-r);
 fflush(stderr);
@@ -874,15 +882,17 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice 
*vdev)
 {
 int i, r;
 if (!vdev-binding-set_guest_notifiers) {
-fprintf(stderr, binding does not support guest notifiers\n);
+fprintf(stderr, binding does not support guest notifier\n);
 r = -ENOSYS;
 goto fail;
 }
 
-r = vdev-binding-set_guest_notifiers(vdev-binding_opaque, true);
-if (r  0) {
-fprintf(stderr, Error binding guest notifier: %d\n, -r);
-goto fail_notifiers;
+if (hdev-start_idx == 0) {
+r = vdev-binding-set_guest_notifiers(vdev-binding_opaque, true);
+if (r  0) {
+fprintf(stderr, Error binding guest notifier: %d\n, -r);
+goto fail_notifiers;
+}
 }
 
 r = vhost_dev_set_features(hdev, hdev-log_enabled);
@@ -898,7 +908,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice 
*vdev)
 r = vhost_virtqueue_init(hdev,
  vdev,
  hdev-vqs + i,
- i);
+ hdev-start_idx + i);

Re: [Qemu-devel] [net-next RFC V4 PATCH 3/4] virtio: introduce a method to get the irq of a specific virtqueue

2012-06-25 Thread Michael S. Tsirkin

On Mon, Jun 25, 2012 at 05:41:17PM +0800, Jason Wang wrote:
 Device specific irq optimizations such as irq affinity may be used by virtio
 drivers. So this patch introduce a new method to get the irq of a specific
 virtqueue.
 
 After this patch, virtio device drivers could query the irq and do device
 specific optimizations. First user would be virtio-net.
 
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  drivers/lguest/lguest_device.c |8 
  drivers/s390/kvm/kvm_virtio.c  |6 ++
  drivers/virtio/virtio_mmio.c   |8 
  drivers/virtio/virtio_pci.c|   12 
  include/linux/virtio_config.h  |4 
  5 files changed, 38 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
 index 9e8388e..bcd080f 100644
 --- a/drivers/lguest/lguest_device.c
 +++ b/drivers/lguest/lguest_device.c
 @@ -392,6 +392,13 @@ static const char *lg_bus_name(struct virtio_device 
 *vdev)
   return ;
  }
  
 +static int lg_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
 +{
 + struct lguest_vq_info *lvq = vq-priv;
 +
 + return lvq-config.irq;
 +}
 +
  /* The ops structure which hooks everything together. */
  static struct virtio_config_ops lguest_config_ops = {
   .get_features = lg_get_features,
 @@ -404,6 +411,7 @@ static struct virtio_config_ops lguest_config_ops = {
   .find_vqs = lg_find_vqs,
   .del_vqs = lg_del_vqs,
   .bus_name = lg_bus_name,
 + .get_vq_irq = lg_get_vq_irq,
  };
  
  /*
 diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
 index d74e9ae..a897de2 100644
 --- a/drivers/s390/kvm/kvm_virtio.c
 +++ b/drivers/s390/kvm/kvm_virtio.c
 @@ -268,6 +268,11 @@ static const char *kvm_bus_name(struct virtio_device 
 *vdev)
   return ;
  }
  
 +static int kvm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
 +{
 + return 0x2603;
 +}
 +
  /*
   * The config ops structure as defined by virtio config
   */
 @@ -282,6 +287,7 @@ static struct virtio_config_ops kvm_vq_configspace_ops = {
   .find_vqs = kvm_find_vqs,
   .del_vqs = kvm_del_vqs,
   .bus_name = kvm_bus_name,
 + .get_vq_irq = kvm_get_vq_irq,
  };
  
  /*
 diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
 index f5432b6..2ba37ed 100644
 --- a/drivers/virtio/virtio_mmio.c
 +++ b/drivers/virtio/virtio_mmio.c
 @@ -411,6 +411,13 @@ static const char *vm_bus_name(struct virtio_device 
 *vdev)
   return vm_dev-pdev-name;
  }
  
 +static int vm_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
 +{
 + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
 +
 + return platform_get_irq(vm_dev-pdev, 0);
 +}
 +
  static struct virtio_config_ops virtio_mmio_config_ops = {
   .get= vm_get,
   .set= vm_set,
 @@ -422,6 +429,7 @@ static struct virtio_config_ops virtio_mmio_config_ops = {
   .get_features   = vm_get_features,
   .finalize_features = vm_finalize_features,
   .bus_name   = vm_bus_name,
 + .get_vq_irq = vm_get_vq_irq,
  };
  
  
 diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
 index adb24f2..c062227 100644
 --- a/drivers/virtio/virtio_pci.c
 +++ b/drivers/virtio/virtio_pci.c
 @@ -607,6 +607,17 @@ static const char *vp_bus_name(struct virtio_device 
 *vdev)
   return pci_name(vp_dev-pci_dev);
  }
  
 +static int vp_get_vq_irq(struct virtio_device *vdev, struct virtqueue *vq)
 +{
 + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 + struct virtio_pci_vq_info *info = vq-priv;
 +
 + if (vp_dev-intx_enabled)
 + return vp_dev-pci_dev-irq;
 + else
 + return vp_dev-msix_entries[info-msix_vector].vector;
 +}
 +
  static struct virtio_config_ops virtio_pci_config_ops = {
   .get= vp_get,
   .set= vp_set,
 @@ -618,6 +629,7 @@ static struct virtio_config_ops virtio_pci_config_ops = {
   .get_features   = vp_get_features,
   .finalize_features = vp_finalize_features,
   .bus_name   = vp_bus_name,
 + .get_vq_irq = vp_get_vq_irq,
  };
  
  static void virtio_pci_release_dev(struct device *_d)
 diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
 index fc457f4..acd6930 100644
 --- a/include/linux/virtio_config.h
 +++ b/include/linux/virtio_config.h
 @@ -98,6 +98,9 @@
   *   vdev: the virtio_device
   *  This returns a pointer to the bus name a la pci_name from which
   *  the caller can then copy.
 + * @get_vq_irq: get the irq numer of the specific virt queue.
 + *  vdev: the virtio_device
 + *  vq: the virtqueue

What if the vq does not have an IRQ? E.g. control vqs don't.
What if the IRQ is shared between VQs? Between devices?
The need to cleanup affinity on destroy is also nasty.
How about we expose a set_affinity API instead?
Then:
- non PCI can ignore for now
- with a per vq vector we

Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes

2012-06-25 Thread Jan Kiszka

On 2012-06-25 10:57, Avi Kivity wrote:
 On 06/25/2012 10:01 AM, Jan Kiszka wrote:
 Flush pending coalesced MMIO before performing mapping or state changes
 that could affect the event orderings or route the buffered requests to
 a wrong region.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

 In addition, we also have to
 
 Yes, we do.
 
  
  void memory_region_transaction_begin(void)
  {
 +qemu_flush_coalesced_mmio_buffer();
  ++memory_region_transaction_depth;
  }
 
 Why is this needed?
 
  
 @@ -1109,6 +1110,9 @@ void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
  
  void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
  {
 +if (!QTAILQ_EMPTY(mr-coalesced)) {
 +qemu_flush_coalesced_mmio_buffer();
 +}
 
 The readonly attribute is inherited by subregions and alias targets, so
 this check is insufficient.  See render_memory_region().  Need to flush
 unconditionally.
 
  if (mr-readonly != readonly) {
  mr-readonly = readonly;
  memory_region_update_topology(mr);
 @@ -1117,6 +1121,9 @@ void memory_region_set_readonly(MemoryRegion *mr, bool 
 readonly)
  
  void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
  {
 +if (!QTAILQ_EMPTY(mr-coalesced)) {
 +qemu_flush_coalesced_mmio_buffer();
 +}
 
 This property is not inherited, but let's flush unconditionally just the
 same, to reduce fragility.
 
 @@ -1219,6 +1228,9 @@ void memory_region_add_eventfd(MemoryRegion *mr,
  };
  unsigned i;
  
 +if (!QTAILQ_EMPTY(mr-coalesced)) {
 +qemu_flush_coalesced_mmio_buffer();
 +}
 
 Ditto.  It's possible that an eventfd overlays a subregion which has
 coalescing enabled.  It's not really defined what happens in this case,
 and such code and its submitter should be perma-nacked, but let's play
 it safe here since there isn't much to be gained by avoiding the flush.
  This code is a very slow path anyway, including and rcu and/or srcu
 synchronization, and a rebuild of the dispatch radix tree (trees when we
 dma-enable this code).
 
  for (i = 0; i  mr-ioeventfd_nb; ++i) {
  if (memory_region_ioeventfd_before(mrfd, mr-ioeventfds[i])) {
  break;
 @@ -1249,6 +1261,9 @@ void memory_region_del_eventfd(MemoryRegion *mr,
  };
  unsigned i;
  
 +if (!QTAILQ_EMPTY(mr-coalesced)) {
 +qemu_flush_coalesced_mmio_buffer();
 +}
 
 Same.

OK.

 
 The repetitiveness of this code suggests a different way of doing this:
 make every API call be its own subtransaction and perform the flush in
 memory_region_begin_transaction() (maybe that's the answer to my
 question above).

So you want me to wrap the core of those services in
begin/commit_transaction instead? Just to be sure I got the idea.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] running qemu with image file on a block device

2012-06-25 Thread Stefan Hajnoczi

On Sun, Jun 24, 2012 at 7:49 AM, udit agarwal fzdu...@gmail.com wrote:
  I have set up a block device of 20G on my hard drive and is located at
 /dev/sda2. Now I have created a raw image inside this device via command:
  hp1:~ # qemu-img create -f raw /dev/sda2:server.img 10G

 Then I copied the iso file on this device via command:
 hp1:~# cp ~/CD.iso /dev/sda2:CD,iso
 and it also worked fine.

 But when I ran the following qemu command:
 hp1:~ # qemu -name Gnome-desktop -M pc-0.12 -m 256 -smp 2 -boot d -drive
 file=/dev/sda2:server.img,if=virtio,index=0,media=disk,format=raw -drive
 file=/dev/sda2:CD.iso,index=1,media=cdrom -net
 nic,model=virtio,macaddr=52:54:00:05:11:11 -net user -vga cirrus -balloon
 virtio

 I got the following errors:
 qemu: -drive
 file=/dev/sda2:server.img,if=virtio,index=0,media=disk,format=raw: could not
 open disk image /dev/sda2:server.img: No such file or directory

 Can anybody help me with this??

The commands you pasted seem to be creating files called
/dev/sda2:server.img and /dev/sda2:CD.iso.  Note that these files
are created in /dev, they are not located on the /dev/sda2 device as
you intended.  The ':' notation you tried to use has no special
meaning on Linux, you are creating new files that have nothing to do
with /dev/sda2.

If you want to put several disk images onto a single block device then
you probably need to create a file system on the device, mount it, and
then put files into this new file system:

# mkfs.ext4 /dev/sda2 # WARNING: any existing data in /dev/sda2 will be lost
# mount /dev/sda2 /mnt
# cp CD.iso /mnt
# qemu-img create /mnt/server.img 10G
# ls /mnt
CD.iso
server.img

Stefan

Re: [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock

2012-06-25 Thread Andreas Färber

Am 24.06.2012 16:02, schrieb liu ping fan:
 On Fri, Jun 22, 2012 at 8:55 PM, Andreas Färber afaer...@suse.de wrote:
 Am 21.06.2012 17:06, schrieb Liu Ping Fan:
 introduce a lock for per-cpu to protect agaist accesing from
 other vcpu thread.

 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  cpu-defs.h  |2 ++
  cpus.c  |   17 +
  main-loop.h |3 +++
  3 files changed, 22 insertions(+), 0 deletions(-)

 diff --git a/cpu-defs.h b/cpu-defs.h
 index f49e950..7305822 100644
 --- a/cpu-defs.h
 +++ b/cpu-defs.h
 @@ -30,6 +30,7 @@
  #include osdep.h
  #include qemu-queue.h
  #include targphys.h
 +#include qemu-thread-posix.h

  #ifndef TARGET_LONG_BITS
  #error TARGET_LONG_BITS must be defined before including this header
 @@ -220,6 +221,7 @@ typedef struct CPUWatchpoint {
  CPU_COMMON_THREAD   \
  struct QemuCond *halt_cond; \
  int thread_kicked;  \
 +struct QemuMutex *cpu_lock; \
  struct qemu_work_item *queued_work_first, *queued_work_last;\
  const char *cpu_model_str;  \
  struct KVMState *kvm_state; \

 Please don't add stuff to CPU_COMMON. Instead add to CPUState in
 qom/cpu.c. The QOM CPUState part 4 series moves many of those fields there.

My mistake: The struct to add to is in include/qemu/cpu.h obviously.


 Ok, thanks.
 diff --git a/cpus.c b/cpus.c
 index b182b3d..554f7bc 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -735,6 +735,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
  env-thread_id = qemu_get_thread_id();
  cpu_single_env = env;

 +

 Stray whitespace addition.

  r = kvm_init_vcpu(env);
  if (r  0) {
  fprintf(stderr, kvm_init_vcpu failed: %s\n, strerror(-r));
 @@ -891,6 +892,20 @@ int qemu_cpu_is_self(void *_env)
  return qemu_thread_is_self(env-thread);
  }

 +void qemu_mutex_lock_cpu(void *_env)
 +{
 +CPUArchState *env = _env;
 +
 +qemu_mutex_lock(env-cpu_lock);
 +}
 +
 +void qemu_mutex_unlock_cpu(void *_env)
 +{
 +CPUArchState *env = _env;
 +
 +qemu_mutex_unlock(env-cpu_lock);
 +}
 +

 I don't like these helpers. For one, you are using void * arguments and
 casting them, for another you are using CPUArchState at all. With my
 suggestion above these can be CPUState *cpu.

 For using it in apic.c, we need to hide the  CPUArchState structure

CPUState serves to hide CPUArchState, with added type safety. :)

Regards,
Andreas

  void qemu_mutex_lock_iothread(void)
  {
  if (!tcg_enabled()) {
 @@ -1027,6 +1042,8 @@ void qemu_init_vcpu(void *_env)
  env-nr_cores = smp_cores;
  env-nr_threads = smp_threads;
  env-stopped = 1;
 +env-cpu_lock = g_malloc0(sizeof(QemuMutex));
 +qemu_mutex_init(env-cpu_lock);

 Are you sure this is not needed for linux-user/bsd-user? If not needed,
 then the field should be #ifdef'ed in the struct to assure that.
 Otherwise this function is never called and you need to move the
 initialization to the initfn in qom/cpu.c and then should also clean it
 up in a finalizer.

 It is not needed for linux-user/bsd-user. I will use the macro,
 
 Thanks and regards,
 pingfan

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes

2012-06-25 Thread Jan Kiszka

On 2012-06-25 12:15, Jan Kiszka wrote:
 On 2012-06-25 10:57, Avi Kivity wrote:
 The repetitiveness of this code suggests a different way of doing this:
 make every API call be its own subtransaction and perform the flush in
 memory_region_begin_transaction() (maybe that's the answer to my
 question above).
 
 So you want me to wrap the core of those services in
 begin/commit_transaction instead? Just to be sure I got the idea.

What we would lose this way (transaction_commit) is the ability to skip
updates on !mr-enabled.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] net: RFC New Socket-Based, Switched Network Backend (QDES)

2012-06-25 Thread Stefan Hajnoczi

On Mon, Jun 25, 2012 at 6:42 AM, Mike Lovell m...@dev-zero.net wrote:
 This is what I've been calling QDES or QEMU Distributed Ethernet Switch. I
 first had the idea when I was playing with the udp and mcast socket network
 backends while exploring how to build a VM infrastructure. I liked the idea of
 using the sockets backends cause it doesn't require escalated permissions to
 configure and run as well as the ability to talk over IP networks.

 But the built in socket backends either allowed for only 2 guests talking
 directly or for multiple guests where all traffic is sent to all. So one can
 either have two guests talking or have bandwidth wasted with multiple guests.
 There wasn't something that could talk to multiple guests but also utilize
 unicast traffic.

Have you looked at QEMU's net/vde.c backend?

Does VDE (http://vde.sourceforge.net/) already do everything that QDES does?

Stefan

Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes

2012-06-25 Thread Avi Kivity

On 06/25/2012 01:26 PM, Jan Kiszka wrote:
 On 2012-06-25 12:15, Jan Kiszka wrote:
 On 2012-06-25 10:57, Avi Kivity wrote:
 The repetitiveness of this code suggests a different way of doing this:
 make every API call be its own subtransaction and perform the flush in
 memory_region_begin_transaction() (maybe that's the answer to my
 question above).
 
 So you want me to wrap the core of those services in
 begin/commit_transaction instead? Just to be sure I got the idea.
 
 What we would lose this way (transaction_commit) is the ability to skip
 updates on !mr-enabled.

We could have an internal memory_region_begin_transaction_mr() which
checks mr-enabled and notes it if its clear (but anything else would
have to undo this).  I don't think it's worth it, let's lose the
optimization and see if it shows up anywhere.

-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH 3/5] memory: Flush coalesced MMIO on mapping and state changes

2012-06-25 Thread Jan Kiszka

On 2012-06-25 13:01, Avi Kivity wrote:
 On 06/25/2012 01:26 PM, Jan Kiszka wrote:
 On 2012-06-25 12:15, Jan Kiszka wrote:
 On 2012-06-25 10:57, Avi Kivity wrote:
 The repetitiveness of this code suggests a different way of doing this:
 make every API call be its own subtransaction and perform the flush in
 memory_region_begin_transaction() (maybe that's the answer to my
 question above).

 So you want me to wrap the core of those services in
 begin/commit_transaction instead? Just to be sure I got the idea.

 What we would lose this way (transaction_commit) is the ability to skip
 updates on !mr-enabled.
 
 We could have an internal memory_region_begin_transaction_mr() which
 checks mr-enabled and notes it if its clear (but anything else would
 have to undo this).  I don't think it's worth it, let's lose the
 optimization and see if it shows up anywhere.

OK, will send a new series with a conversion of this included.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH stable-1.1 00/26] Initial tree and candidates for stable-1.1

2012-06-25 Thread Stefano Stabellini

On Sat, 23 Jun 2012, Michael Roth wrote:
 Hi everyone,
 
 I've set up a tree for stable-1.1 at:
 
 git://github.com/mdroth/qemu.git stable-1.1

Thanks for doing this.


 That is simply the initial commit for v1.1.0
 
 I've set up a staging tree for the first batch of commits at:
 
 git://github.com/mdroth/qemu.git stable-1.1-staging
 
 This should capture everything explicitly tagged for stable/1.1. I've also
 pulled in relevant build fixes and a handful of applicable bug fixes.
 
 Please let me know if I've missed any patches, or if any of these shouldn't be
 applied.

Please backport this one too:

commit 64c27e5b1fdb6d94bdc0bda3b1869d7383a35c65
Author: Jan Beulich jbeul...@suse.com
Date:   Mon Jun 11 09:52:27 2012 +

qemu/xendisk: set maximum number of grants to be used


Thanks!

- Stefano

Re: [Qemu-devel] [PATCH] Exynos4: added RTC device

2012-06-25 Thread Evgeny Voevodin


On 25.06.2012 13:24, Andreas Färber wrote:

Am 25.06.2012 09:55, schrieb Oleg Ogurtsov:

Signed-off-by: Oleg Ogurtsovo.ogurt...@samsung.com
---
  hw/arm/Makefile.objs |1 +
  hw/exynos4210.c  |8 +
  hw/exynos4210_rtc.c  |  607 ++
  3 files changed, 616 insertions(+), 0 deletions(-)
  create mode 100644 hw/exynos4210_rtc.c

This RTC like many other Exynos devices has no dependency on the CPU. I
have a patch in preparation that moves such devices from
hw/arm/Makefile.objs to hw/Makefile.objs.
I don't object to this patch, not even minor style nits spotted,
compliment, but if you have to respin for some reason, it would be nice
if you could consider that improvement.


These devices are SOC specific and this SOC is based on ARM only.
Do we really need to move them?

--
Kind regards,
Evgeny Voevodin,
Leading Software Engineer,
ASWG, Moscow RD center, Samsung Electronics
e-mail: e.voevo...@samsung.com

Re: [Qemu-devel] [PATCH v6 6/6] fdc_test: introduce test_sense_interrupt

2012-06-25 Thread Kevin Wolf

Am 22.06.2012 12:33, schrieb Pavel Hrdina:
 Calling sense interrupt status while there is no interrupt should
 return invalid command (0x80).
 
 Read command should always returns in st0 seek_end bit set to 1.
 
 Signed-off-by: Pavel Hrdina phrd...@redhat.com
 ---
  tests/fdc-test.c |   25 -
  1 files changed, 24 insertions(+), 1 deletions(-)
 
 diff --git a/tests/fdc-test.c b/tests/fdc-test.c
 index 5280eff..be14d83 100644
 --- a/tests/fdc-test.c
 +++ b/tests/fdc-test.c
 @@ -142,7 +142,7 @@ static uint8_t send_read_command(void)
  }
  
  st0 = floppy_recv();
 -if (st0 != 0x40) {
 +if (st0 != 0x60) {
  ret = 1;
  }
  
 @@ -263,6 +263,28 @@ static void test_media_change(void)
  assert_bit_set(dir, DSKCHG);
  }
  
 +static void test_sense_interrupt(void)
 +{
 +int drive = 0;
 +int head = 0;
 +int cyl = 0;
 +int ret = 0;
 +
 +floppy_send(CMD_SENSE_INT);
 +ret = floppy_recv();
 +g_assert(ret == 0x80);

If I'm not mistakes, CMD_SENSE_INT returns two bytes (not just one), the
first of which is ST0 and the second one the Present Cylinder Number.

0x80 for ST0 would mean invalid command, which is the right return
value if there is no active interrupt condition. So you seem to check
the right value and just miss a second floppy_recv().

 +
 +floppy_send(CMD_SEEK);
 +floppy_send(head  2 | drive);
 +g_assert(!get_irq(FLOPPY_IRQ));
 +floppy_send(cyl);

Maybe add another assertion here that the IRQ is now active?

 +
 +floppy_send(CMD_SENSE_INT);
 +ret = floppy_recv();
 +g_assert(ret != 0x80);

I think we can do better and tell the exact value that should result. I
believe g_assert(ret == 0x20); would be right.

Kevin

Re: [Qemu-devel] [PATCH] Exynos4: added RTC device

2012-06-25 Thread Andreas Färber

Am 25.06.2012 13:46, schrieb Evgeny Voevodin:
 On 25.06.2012 13:24, Andreas Färber wrote:
 Am 25.06.2012 09:55, schrieb Oleg Ogurtsov:
 Signed-off-by: Oleg Ogurtsovo.ogurt...@samsung.com
 ---
   hw/arm/Makefile.objs |1 +
   hw/exynos4210.c  |8 +
   hw/exynos4210_rtc.c  |  607
 ++
   3 files changed, 616 insertions(+), 0 deletions(-)
   create mode 100644 hw/exynos4210_rtc.c
 This RTC like many other Exynos devices has no dependency on the CPU. I
 have a patch in preparation that moves such devices from
 hw/arm/Makefile.objs to hw/Makefile.objs.
 I don't object to this patch, not even minor style nits spotted,
 compliment, but if you have to respin for some reason, it would be nice
 if you could consider that improvement.
 
 These devices are SOC specific and this SOC is based on ARM only.
 Do we really need to move them?

For one, they do not need to be rebuilt when cpu.h changes and they
should get the usual device poisoning for proper modeling.
For another, someone on IRC started work on an armeb-softmmu, for which
we would probably not want to compile in the Exynos devices. Or if we
do, we certainly don't want to compile everything twice (cf. xilinx).

If devices are ARM-specific and need access to the CPU (e.g., machines,
PICs) then according to Paolo they should be placed in hw/arm/ with the
new scheme. I'm trying to stay away from moving other people's files
around, but the Makefile changes are pretty non-intrusive and can go
through arm-devs.next.

Cheers,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH v3 1/1] virtio-rng: hardware random number generator device

2012-06-25 Thread Daniel P. Berrange

On Fri, Jun 22, 2012 at 02:59:13PM -0500, Anthony Liguori wrote:
 On 06/22/2012 01:50 PM, Amit Shah wrote:
 On (Fri) 22 Jun 2012 [08:44:52], Anthony Liguori wrote:
 On 06/22/2012 08:34 AM, Daniel P. Berrange wrote:
 
 Oh, that's a good point.
 
 But easily fixed.
 
 Of course, except that now we have to maintain compatibility so some
 hideous hack goes in.
 
 This is what fundamentally makes using events a bad approach.  There
 are more problems lurking.  This is the fundamental complexity of
 having two non-synchronized communication channels when you're
 trying to synchronize some sort of activity.
 
 BTW, despite what danpb mentioned, you are rate limiting entropy
 requests in this patch series
 
 All of these problems are naturally solved using a protocol over a 
 CharDriverState.
 
 Can we at least agree on merging a patch which just includes the
 raw chardev backend support for virtio-rng ? ie drop the QMP
 event for now, so we can make some step forward.
 
 All we need to do to support EGD is instead of doing:
 
 +QObject *data;
 +
 +data = qobject_from_jsonf({ 'bytes': % PRId64  },
 +  size);
 +monitor_protocol_event(QEVENT_ENTROPY_NEEDED, data);
 +qobject_decref(data);
 
 Do:
 
  while (size  0) {
  uint8_t partial_size = MIN(255, size);
  uint8_t msg[2] = { 0x02, partial_size };
 
  qemu_chr_write(s-chr, msg, sizeof(msg));
 
  size -= partial_size;
  }
 
 And that's it.  It's an extremely simple protocol to support.  It
 would actually reduce the total size of the patch.
 
 So I now get what your objection is, and that is because of an
 assumption you're making which is incorrect.
 
 An OS asking for 5038 bytes of entropy is doing just that: asking for
 those bytes.  That doesn't mean a hardware device can provide it with
 that much entropy.  So, the hardware device here will just provide
 whatever is available, and the OS has to be happy with what it got.
 If it got 200 bytes from the device, the OS is then free to demand
 even 7638 bytes more, but it may not get anything for quite a while.
 
 (This is exactly how things work with /dev/random and /dev/urandom
 too, btw.  And /dev/urandom was invented for apps which can't wait for
 all the data they're asking to come to them.)
 
 As it turns out, the EGD protocol also has a mechanism to ask for
 ask much entropy as is available at the current moment.  It also has
 a mechanism to query the amount of available entropy.
 
 And no, you're assertion that we don't care about how much entropy
 the guest is requesting is wrong.  If we lose entropy requests, then
 we never know if the guest is in a state where it's expecting
 entropy and we're not sending it.
 
 Consider:
 
 - Guest requests entropy (X bytes)
 - libvirt sees request
 - libvirt sends X bytes to guest
 - Guest requests entropy (Y bytes), QEMU filters due to rate limiting
 
 The guest will never request entropy again and libvirt will never
 send entropy again.  The device is effectively dead-locked.

WRT the impl of rate limiting Amit has used in this patch, you
are correct, however, this is not how QEMU should be rate
limiting this event. As mentioned in an earlier reply in this
thread, any rate limited /must/ work as follows:

   - Guest sends randomness request for 10 bytes
   - QMP event gets sent for 10 bytes
   - Guest sends randomness request for 4 bytes
   - QMP is dropped
   - Guest sends randomness request for 8 bytes
   - QMP event gets sent for 12 bytes

ie, the byte count for any events which are dropped, must be added to
the byte count in the next emitted event. Also as Amit mentioned in his
reply to me, it should take account of multiple devices, or we should
limit QEMU to 1 single RNG device per guest, as we do for the balloon
driver.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH] target-ppc: Fix 2nd parameter for tcg_gen_shri_tl

2012-06-25 Thread Andreas Färber

Am 24.06.2012 17:14, schrieb Alexander Graf:
 On 24.06.2012, at 16:18, Stefan Weil s...@weilnetz.de wrote:
 
 This fixes a compiler error when QEMU was configured with --enable-debug.

 Signed-off-by: Stefan Weil s...@weilnetz.de
 ---

 Alex, please review. It fixes the compiler error, but I did the change
 simply by comparision with other functions, so I have no idea whether
 it is really correct.
 
 Looks good at a first glance, but will verify tonight. Thanks a lot for 
 catching it!
 
 I guess we should add an --enable-debug build to buildbot.

Blue/Stefan, could you set up a feature page on the Wiki for the AREG0
refactoring? Then we could document what maintainers/testers need to
verify for the next series there.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH v3 1/1] virtio-rng: hardware random number generator device

2012-06-25 Thread Anthony Liguori


On 06/25/2012 07:10 AM, Daniel P. Berrange wrote:

On Fri, Jun 22, 2012 at 02:59:13PM -0500, Anthony Liguori wrote:

On 06/22/2012 01:50 PM, Amit Shah wrote:

On (Fri) 22 Jun 2012 [08:44:52], Anthony Liguori wrote:

On 06/22/2012 08:34 AM, Daniel P. Berrange wrote:


Oh, that's a good point.


But easily fixed.


Of course, except that now we have to maintain compatibility so some
hideous hack goes in.

This is what fundamentally makes using events a bad approach.  There
are more problems lurking.  This is the fundamental complexity of
having two non-synchronized communication channels when you're
trying to synchronize some sort of activity.

BTW, despite what danpb mentioned, you are rate limiting entropy
requests in this patch series


All of these problems are naturally solved using a protocol over a 
CharDriverState.


Can we at least agree on merging a patch which just includes the
raw chardev backend support for virtio-rng ? ie drop the QMP
event for now, so we can make some step forward.


All we need to do to support EGD is instead of doing:

+QObject *data;
+
+data = qobject_from_jsonf({ 'bytes': % PRId64  },
+  size);
+monitor_protocol_event(QEVENT_ENTROPY_NEEDED, data);
+qobject_decref(data);

Do:

 while (size   0) {
 uint8_t partial_size = MIN(255, size);
 uint8_t msg[2] = { 0x02, partial_size };

 qemu_chr_write(s-chr, msg, sizeof(msg));

 size -= partial_size;
 }

And that's it.  It's an extremely simple protocol to support.  It
would actually reduce the total size of the patch.


So I now get what your objection is, and that is because of an
assumption you're making which is incorrect.

An OS asking for 5038 bytes of entropy is doing just that: asking for
those bytes.  That doesn't mean a hardware device can provide it with
that much entropy.  So, the hardware device here will just provide
whatever is available, and the OS has to be happy with what it got.
If it got 200 bytes from the device, the OS is then free to demand
even 7638 bytes more, but it may not get anything for quite a while.

(This is exactly how things work with /dev/random and /dev/urandom
too, btw.  And /dev/urandom was invented for apps which can't wait for
all the data they're asking to come to them.)


As it turns out, the EGD protocol also has a mechanism to ask for
ask much entropy as is available at the current moment.  It also has
a mechanism to query the amount of available entropy.

And no, you're assertion that we don't care about how much entropy
the guest is requesting is wrong.  If we lose entropy requests, then
we never know if the guest is in a state where it's expecting
entropy and we're not sending it.

Consider:

- Guest requests entropy (X bytes)
- libvirt sees request
- libvirt sends X bytes to guest
- Guest requests entropy (Y bytes), QEMU filters due to rate limiting

The guest will never request entropy again and libvirt will never
send entropy again.  The device is effectively dead-locked.


WRT the impl of rate limiting Amit has used in this patch, you
are correct, however, this is not how QEMU should be rate
limiting this event. As mentioned in an earlier reply in this
thread, any rate limited /must/ work as follows:

- Guest sends randomness request for 10 bytes
- QMP event gets sent for 10 bytes
- Guest sends randomness request for 4 bytes
- QMP is dropped
- Guest sends randomness request for 8 bytes
- QMP event gets sent for 12 bytes

ie, the byte count for any events which are dropped, must be added to
the byte count in the next emitted event. Also as Amit mentioned in his
reply to me, it should take account of multiple devices, or we should
limit QEMU to 1 single RNG device per guest, as we do for the balloon
driver.


QMP events are not meant to be used like this.  They are posted events and since 
we cannot know if there is something connected to the monitor, it's always 
possible for the backend to lose the information associated with an event.


Events really should just be used to indicate, hey, you should go query 
information from QEMU now.


Even what you suggest above doesn't handle the case where a management 
application restarts.


If you were going to do this via QMP, you'd want to introduce a command to query 
the total outstanding requested entropy for a given device and then send a event 
whenever that value changes.


But again, it's silly to use QMP for this.  Using an inline protocol simplifies 
things quite a bit.


Regards,

Anthony Liguori



Daniel

Re: [Qemu-devel] [PATCH v3 1/1] virtio-rng: hardware random number generator device

2012-06-25 Thread Daniel P. Berrange

On Mon, Jun 25, 2012 at 07:22:13AM -0500, Anthony Liguori wrote:
 On 06/25/2012 07:10 AM, Daniel P. Berrange wrote:
 On Fri, Jun 22, 2012 at 02:59:13PM -0500, Anthony Liguori wrote:
 On 06/22/2012 01:50 PM, Amit Shah wrote:
 On (Fri) 22 Jun 2012 [08:44:52], Anthony Liguori wrote:
 On 06/22/2012 08:34 AM, Daniel P. Berrange wrote:
 
 Oh, that's a good point.
 
 But easily fixed.
 
 Of course, except that now we have to maintain compatibility so some
 hideous hack goes in.
 
 This is what fundamentally makes using events a bad approach.  There
 are more problems lurking.  This is the fundamental complexity of
 having two non-synchronized communication channels when you're
 trying to synchronize some sort of activity.
 
 BTW, despite what danpb mentioned, you are rate limiting entropy
 requests in this patch series
 
 All of these problems are naturally solved using a protocol over a 
 CharDriverState.
 
 Can we at least agree on merging a patch which just includes the
 raw chardev backend support for virtio-rng ? ie drop the QMP
 event for now, so we can make some step forward.
 
 All we need to do to support EGD is instead of doing:
 
 +QObject *data;
 +
 +data = qobject_from_jsonf({ 'bytes': % PRId64  },
 +  size);
 +monitor_protocol_event(QEVENT_ENTROPY_NEEDED, data);
 +qobject_decref(data);
 
 Do:
 
  while (size   0) {
  uint8_t partial_size = MIN(255, size);
  uint8_t msg[2] = { 0x02, partial_size };
 
  qemu_chr_write(s-chr, msg, sizeof(msg));
 
  size -= partial_size;
  }
 
 And that's it.  It's an extremely simple protocol to support.  It
 would actually reduce the total size of the patch.
 
 So I now get what your objection is, and that is because of an
 assumption you're making which is incorrect.
 
 An OS asking for 5038 bytes of entropy is doing just that: asking for
 those bytes.  That doesn't mean a hardware device can provide it with
 that much entropy.  So, the hardware device here will just provide
 whatever is available, and the OS has to be happy with what it got.
 If it got 200 bytes from the device, the OS is then free to demand
 even 7638 bytes more, but it may not get anything for quite a while.
 
 (This is exactly how things work with /dev/random and /dev/urandom
 too, btw.  And /dev/urandom was invented for apps which can't wait for
 all the data they're asking to come to them.)
 
 As it turns out, the EGD protocol also has a mechanism to ask for
 ask much entropy as is available at the current moment.  It also has
 a mechanism to query the amount of available entropy.
 
 And no, you're assertion that we don't care about how much entropy
 the guest is requesting is wrong.  If we lose entropy requests, then
 we never know if the guest is in a state where it's expecting
 entropy and we're not sending it.
 
 Consider:
 
 - Guest requests entropy (X bytes)
 - libvirt sees request
 - libvirt sends X bytes to guest
 - Guest requests entropy (Y bytes), QEMU filters due to rate limiting
 
 The guest will never request entropy again and libvirt will never
 send entropy again.  The device is effectively dead-locked.
 
 WRT the impl of rate limiting Amit has used in this patch, you
 are correct, however, this is not how QEMU should be rate
 limiting this event. As mentioned in an earlier reply in this
 thread, any rate limited /must/ work as follows:
 
 - Guest sends randomness request for 10 bytes
 - QMP event gets sent for 10 bytes
 - Guest sends randomness request for 4 bytes
 - QMP is dropped
 - Guest sends randomness request for 8 bytes
 - QMP event gets sent for 12 bytes
 
 ie, the byte count for any events which are dropped, must be added to
 the byte count in the next emitted event. Also as Amit mentioned in his
 reply to me, it should take account of multiple devices, or we should
 limit QEMU to 1 single RNG device per guest, as we do for the balloon
 driver.
 
 QMP events are not meant to be used like this.  They are posted
 events and since we cannot know if there is something connected to
 the monitor, it's always possible for the backend to lose the
 information associated with an event.
 
 Events really should just be used to indicate, hey, you should go
 query information from QEMU now.
 
 Even what you suggest above doesn't handle the case where a
 management application restarts.
 
 If you were going to do this via QMP, you'd want to introduce a
 command to query the total outstanding requested entropy for a given
 device and then send a event whenever that value changes.
 
 But again, it's silly to use QMP for this.  Using an inline protocol
 simplifies things quite a bit.

Ok, I agree that being rebust against mgmt layer restarts is a good
reason for not relying on QMP events for this, and that using the
EGD protocol would nicely solve this problem. So lets drop the QMP
event. I still think it is key to allow use of raw chardevs in
addition to EDGE though, so you

Re: [Qemu-devel] [PATCH v6 4/6] fdc: fix interrupt handling

2012-06-25 Thread Kevin Wolf

Am 22.06.2012 12:33, schrieb Pavel Hrdina:
 If you call the SENSE INTERRUPT STATUS command while there is no interrupt
 waiting you get as result unknown command.
 
 Fixed status0 register handling for read/write/format commands.
 
 Signed-off-by: Pavel Hrdina phrd...@redhat.com
 ---
  hw/fdc.c |   34 +-
  1 files changed, 21 insertions(+), 13 deletions(-)
 
 diff --git a/hw/fdc.c b/hw/fdc.c
 index 0270264..e28841c 100644
 --- a/hw/fdc.c
 +++ b/hw/fdc.c
 @@ -307,6 +307,9 @@ enum {
  };
  
  enum {
 +FD_SR0_DS0  = 0x01,
 +FD_SR0_DS1  = 0x02,
 +FD_SR0_HEAD = 0x04,
  FD_SR0_EQPMT= 0x10,
  FD_SR0_SEEK = 0x20,
  FD_SR0_ABNTERM  = 0x40,
 @@ -972,14 +975,15 @@ static void fdctrl_reset_fifo(FDCtrl *fdctrl)
  }
  
  /* Set FIFO status for the host to read */
 -static void fdctrl_set_fifo(FDCtrl *fdctrl, int fifo_len, int do_irq)
 +static void fdctrl_set_fifo(FDCtrl *fdctrl, int fifo_len, uint8_t status0)
  {
  fdctrl-data_dir = FD_DIR_READ;
  fdctrl-data_len = fifo_len;
  fdctrl-data_pos = 0;
  fdctrl-msr |= FD_MSR_CMDBUSY | FD_MSR_RQM | FD_MSR_DIO;
 -if (do_irq)
 -fdctrl_raise_irq(fdctrl, 0x00);
 +if (status0) {
 +fdctrl_raise_irq(fdctrl, status0);
 +}

Is status0 != 0 the real condition or is it just what we have here and
what happens to give the right result with our implementation?

  }
  
  /* Set an error: unimplemented/unknown command */
 @@ -1044,10 +1048,12 @@ static void fdctrl_stop_transfer(FDCtrl *fdctrl, 
 uint8_t status0,
  FDrive *cur_drv;
  
  cur_drv = get_cur_drv(fdctrl);
 +fdctrl-status0 = status0 | FD_SR0_SEEK | (cur_drv-head  2) |
 +  GET_CUR_DRV(fdctrl);
 +
  FLOPPY_DPRINTF(transfer status: %02x %02x %02x (%02x)\n,
 -   status0, status1, status2,
 -   status0 | (cur_drv-head  2) | GET_CUR_DRV(fdctrl));
 -fdctrl-fifo[0] = status0 | (cur_drv-head  2) | GET_CUR_DRV(fdctrl);
 +   status0, status1, status2, fdctrl-status0);
 +fdctrl-fifo[0] = fdctrl-status0;
  fdctrl-fifo[1] = status1;
  fdctrl-fifo[2] = status2;
  fdctrl-fifo[3] = cur_drv-track;
 @@ -1060,7 +1066,7 @@ static void fdctrl_stop_transfer(FDCtrl *fdctrl, 
 uint8_t status0,
  }
  fdctrl-msr |= FD_MSR_RQM | FD_MSR_DIO;
  fdctrl-msr = ~FD_MSR_NONDMA;
 -fdctrl_set_fifo(fdctrl, 7, 1);
 +fdctrl_set_fifo(fdctrl, 7, fdctrl-status0);
  }
  
  /* Prepare a data transfer (either DMA or FIFO) */
 @@ -1175,7 +1181,7 @@ static void fdctrl_start_transfer(FDCtrl *fdctrl, int 
 direction)
  if (direction != FD_DIR_WRITE)
  fdctrl-msr |= FD_MSR_DIO;
  /* IO based transfer: calculate len */
 -fdctrl_raise_irq(fdctrl, 0x00);
 +fdctrl_raise_irq(fdctrl, FD_SR0_SEEK);
  
  return;
  }
 @@ -1604,16 +1610,18 @@ static void 
 fdctrl_handle_sense_interrupt_status(FDCtrl *fdctrl, int direction)
  {
  FDrive *cur_drv = get_cur_drv(fdctrl);
  
 -if(fdctrl-reset_sensei  0) {
 +if (fdctrl-reset_sensei  0) {
  fdctrl-fifo[0] =
  FD_SR0_RDYCHG + FD_RESET_SENSEI_COUNT - fdctrl-reset_sensei;
  fdctrl-reset_sensei--;
 +} else if (!(fdctrl-sra  FD_SRA_INTPEND)) {
 +fdctrl-fifo[0] = FD_SR0_INVCMD;
 +fdctrl_set_fifo(fdctrl, 1, 0);
 +return;
  } else {
 -/* XXX: status0 handling is broken for read/write
 -   commands, so we do this hack. It should be suppressed
 -   ASAP */
  fdctrl-fifo[0] =
 -FD_SR0_SEEK | (cur_drv-head  2) | GET_CUR_DRV(fdctrl);
 +(fdctrl-status0  ~(FD_SR0_HEAD | FD_SR0_DS1 | FD_SR0_DS0))
 +| GET_CUR_DRV(fdctrl);

Why isn't fdctrl-status0 already right?

In any case, I think if you're masking out the head number from status0,
you need to or it back, even if from a different source.

Kevin

[Qemu-devel] [PATCH 0/4] Define and use PRI*PLX format specifier macros for target_phys_addr_t

2012-06-25 Thread Peter Maydell

This patchset defines a set of macros for use in printf format strings
for handling target_phys_addr_t values, as suggested by Andreas Färber
(http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01547.html)

We can then use them in some places which were previously either
making assumptions about the width of target_phys_addr_t or had
an ifdef to handle the 32 and 64 bit cases separately. Patches 2-4
cover the format-string related parts of Anthony's original make
target_phys_addr_t 64 bits in all cases patch:
http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01523.html

Peter Maydell (4):
  targphys.h: Define PRI*PLX format specifier macros
  hw/omap.h: Use PRIxPLX to define OMAP_FMT_plx
  hw/sh_serial: Use PRIxPLX rather than %x for physaddr
  monitor: Use PRI*PLX to avoid TARGET_PHYS_ADDR_BITS ifdef

 hw/omap.h  |8 +---
 hw/sh_serial.c |5 +++--
 monitor.c  |   29 -
 targphys.h |   16 
 4 files changed, 24 insertions(+), 34 deletions(-)

[Qemu-devel] [PATCH 1/4] targphys.h: Define PRI*PLX format specifier macros

2012-06-25 Thread Peter Maydell

Define a set of PRI*PLX format specifier macros for working
with target_phys_addr_t types. These follow the standard pattern
for such macros, and are more flexible than TARGET_FMT_plx, which
does not allow specification of field widths.

Suggested-by: Andreas Färber afaer...@suse.de
Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 targphys.h |   16 
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/targphys.h b/targphys.h
index 95648d6..d5b5636 100644
--- a/targphys.h
+++ b/targphys.h
@@ -11,10 +11,26 @@
 typedef uint32_t target_phys_addr_t;
 #define TARGET_PHYS_ADDR_MAX UINT32_MAX
 #define TARGET_FMT_plx %08x
+/* Format strings for printing target_phys_addr_t types.
+ * These are recommended over the less flexible TARGET_FMT_plx,
+ * which is retained for the benefit of existing code.
+ */
+#define PRIdPLX PRId32
+#define PRIiPLX PRIi32
+#define PRIoPLX PRIo32
+#define PRIuPLX PRIu32
+#define PRIxPLX PRIx32
+#define PRIXPLX PRIX32
 #elif TARGET_PHYS_ADDR_BITS == 64
 typedef uint64_t target_phys_addr_t;
 #define TARGET_PHYS_ADDR_MAX UINT64_MAX
 #define TARGET_FMT_plx %016 PRIx64
+#define PRIdPLX PRId64
+#define PRIiPLX PRIi64
+#define PRIoPLX PRIo64
+#define PRIuPLX PRIu64
+#define PRIxPLX PRIx64
+#define PRIXPLX PRIX64
 #endif
 #endif
 
-- 
1.7.1

[Qemu-devel] [PATCH 4/4] monitor: Use PRI*PLX to avoid TARGET_PHYS_ADDR_BITS ifdef

2012-06-25 Thread Peter Maydell

Now we have PRX*PLX for printing target_phys_addr_t values, we
can use them in monitor.c rather than having duplicate code
in two arms of a TARGET_PHYS_ADDR_BITS ifdef.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 monitor.c |   29 -
 1 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/monitor.c b/monitor.c
index f6107ba..6dea2a3 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1262,45 +1262,24 @@ static void do_print(Monitor *mon, const QDict *qdict)
 int format = qdict_get_int(qdict, format);
 target_phys_addr_t val = qdict_get_int(qdict, val);
 
-#if TARGET_PHYS_ADDR_BITS == 32
 switch(format) {
 case 'o':
-monitor_printf(mon, %#o, val);
+monitor_printf(mon, %# PRIoPLX, val);
 break;
 case 'x':
-monitor_printf(mon, %#x, val);
+monitor_printf(mon, %# PRIxPLX, val);
 break;
 case 'u':
-monitor_printf(mon, %u, val);
+monitor_printf(mon, % PRIuPLX, val);
 break;
 default:
 case 'd':
-monitor_printf(mon, %d, val);
+monitor_printf(mon, % PRIdPLX, val);
 break;
 case 'c':
 monitor_printc(mon, val);
 break;
 }
-#else
-switch(format) {
-case 'o':
-monitor_printf(mon, %# PRIo64, val);
-break;
-case 'x':
-monitor_printf(mon, %# PRIx64, val);
-break;
-case 'u':
-monitor_printf(mon, % PRIu64, val);
-break;
-default:
-case 'd':
-monitor_printf(mon, % PRId64, val);
-break;
-case 'c':
-monitor_printc(mon, val);
-break;
-}
-#endif
 monitor_printf(mon, \n);
 }
 
-- 
1.7.1

Re: [Qemu-devel] [PATCH v3 1/1] virtio-rng: hardware random number generator device

2012-06-25 Thread Anthony Liguori


On 06/25/2012 07:30 AM, Daniel P. Berrange wrote:

On Mon, Jun 25, 2012 at 07:22:13AM -0500, Anthony Liguori wrote:

On 06/25/2012 07:10 AM, Daniel P. Berrange wrote:

On Fri, Jun 22, 2012 at 02:59:13PM -0500, Anthony Liguori wrote:

On 06/22/2012 01:50 PM, Amit Shah wrote:

On (Fri) 22 Jun 2012 [08:44:52], Anthony Liguori wrote:

On 06/22/2012 08:34 AM, Daniel P. Berrange wrote:


Oh, that's a good point.


But easily fixed.


Of course, except that now we have to maintain compatibility so some
hideous hack goes in.

This is what fundamentally makes using events a bad approach.  There
are more problems lurking.  This is the fundamental complexity of
having two non-synchronized communication channels when you're
trying to synchronize some sort of activity.

BTW, despite what danpb mentioned, you are rate limiting entropy
requests in this patch series


All of these problems are naturally solved using a protocol over a 
CharDriverState.


Can we at least agree on merging a patch which just includes the
raw chardev backend support for virtio-rng ? ie drop the QMP
event for now, so we can make some step forward.


All we need to do to support EGD is instead of doing:

+QObject *data;
+
+data = qobject_from_jsonf({ 'bytes': % PRId64  },
+  size);
+monitor_protocol_event(QEVENT_ENTROPY_NEEDED, data);
+qobject_decref(data);

Do:

 while (size0) {
 uint8_t partial_size = MIN(255, size);
 uint8_t msg[2] = { 0x02, partial_size };

 qemu_chr_write(s-chr, msg, sizeof(msg));

 size -= partial_size;
 }

And that's it.  It's an extremely simple protocol to support.  It
would actually reduce the total size of the patch.


So I now get what your objection is, and that is because of an
assumption you're making which is incorrect.

An OS asking for 5038 bytes of entropy is doing just that: asking for
those bytes.  That doesn't mean a hardware device can provide it with
that much entropy.  So, the hardware device here will just provide
whatever is available, and the OS has to be happy with what it got.
If it got 200 bytes from the device, the OS is then free to demand
even 7638 bytes more, but it may not get anything for quite a while.

(This is exactly how things work with /dev/random and /dev/urandom
too, btw.  And /dev/urandom was invented for apps which can't wait for
all the data they're asking to come to them.)


As it turns out, the EGD protocol also has a mechanism to ask for
ask much entropy as is available at the current moment.  It also has
a mechanism to query the amount of available entropy.

And no, you're assertion that we don't care about how much entropy
the guest is requesting is wrong.  If we lose entropy requests, then
we never know if the guest is in a state where it's expecting
entropy and we're not sending it.

Consider:

- Guest requests entropy (X bytes)
- libvirt sees request
- libvirt sends X bytes to guest
- Guest requests entropy (Y bytes), QEMU filters due to rate limiting

The guest will never request entropy again and libvirt will never
send entropy again.  The device is effectively dead-locked.


WRT the impl of rate limiting Amit has used in this patch, you
are correct, however, this is not how QEMU should be rate
limiting this event. As mentioned in an earlier reply in this
thread, any rate limited /must/ work as follows:

- Guest sends randomness request for 10 bytes
- QMP event gets sent for 10 bytes
- Guest sends randomness request for 4 bytes
- QMP is dropped
- Guest sends randomness request for 8 bytes
- QMP event gets sent for 12 bytes

ie, the byte count for any events which are dropped, must be added to
the byte count in the next emitted event. Also as Amit mentioned in his
reply to me, it should take account of multiple devices, or we should
limit QEMU to 1 single RNG device per guest, as we do for the balloon
driver.


QMP events are not meant to be used like this.  They are posted
events and since we cannot know if there is something connected to
the monitor, it's always possible for the backend to lose the
information associated with an event.

Events really should just be used to indicate, hey, you should go
query information from QEMU now.

Even what you suggest above doesn't handle the case where a
management application restarts.

If you were going to do this via QMP, you'd want to introduce a
command to query the total outstanding requested entropy for a given
device and then send a event whenever that value changes.

But again, it's silly to use QMP for this.  Using an inline protocol
simplifies things quite a bit.


Ok, I agree that being rebust against mgmt layer restarts is a good
reason for not relying on QMP events for this, and that using the
EGD protocol would nicely solve this problem. So lets drop the QMP
event. I still think it is key to allow use of raw chardevs in
addition to EDGE though, so you can easily attached QEMU directly
to a

[Qemu-devel] [PATCH 2/4] hw/omap.h: Use PRIxPLX to define OMAP_FMT_plx

2012-06-25 Thread Peter Maydell

Use the new PRIxPLX macro to avoid the need to define an
OMAP_FMT_plx macro whose expansion depends directly on
TARGET_PHYS_ADDR_BITS.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/omap.h |8 +---
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/hw/omap.h b/hw/omap.h
index 3d98941..2560aba 100644
--- a/hw/omap.h
+++ b/hw/omap.h
@@ -942,13 +942,7 @@ struct omap_mpu_state_s *omap2420_mpu_init(MemoryRegion 
*sysmem,
 unsigned long sdram_size,
 const char *core);
 
-# if TARGET_PHYS_ADDR_BITS == 32
-#  define OMAP_FMT_plx %#08x
-# elif TARGET_PHYS_ADDR_BITS == 64
-#  define OMAP_FMT_plx %#08 PRIx64
-# else
-#  error TARGET_PHYS_ADDR_BITS undefined
-# endif
+#define OMAP_FMT_plx %#08 PRIxPLX
 
 uint32_t omap_badwidth_read8(void *opaque, target_phys_addr_t addr);
 void omap_badwidth_write8(void *opaque, target_phys_addr_t addr,
-- 
1.7.1

[Qemu-devel] [PATCH 3/4] hw/sh_serial: Use PRIxPLX rather than %x for physaddr

2012-06-25 Thread Peter Maydell

Switch a format string from %x to PRIxPLX so that it will
continue to work even if target_phys_addr_t is changed
to 64 bits in the future.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 hw/sh_serial.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/sh_serial.c b/hw/sh_serial.c
index 43b0eb1..6f89696 100644
--- a/hw/sh_serial.c
+++ b/hw/sh_serial.c
@@ -186,7 +186,7 @@ static void sh_serial_write(void *opaque, 
target_phys_addr_t offs,
 }
 }
 
-fprintf(stderr, sh_serial: unsupported write to 0x%02x\n, offs);
+fprintf(stderr, sh_serial: unsupported write to 0x%02 PRIxPLX \n, 
offs);
 abort();
 }
 
@@ -287,7 +287,8 @@ static uint64_t sh_serial_read(void *opaque, 
target_phys_addr_t offs,
 #endif
 
 if (ret  ~((1  16) - 1)) {
-fprintf(stderr, sh_serial: unsupported read from 0x%02x\n, offs);
+fprintf(stderr,
+sh_serial: unsupported read from 0x%02 PRIxPLX \n, offs);
 abort();
 }
 
-- 
1.7.1

Re: [Qemu-devel] [PATCH stable-1.1 00/26] Initial tree and candidates for stable-1.1

2012-06-25 Thread Kevin Wolf

Am 23.06.2012 02:33, schrieb Michael Roth:
 Hi everyone,
 
 I've set up a tree for stable-1.1 at:
 
 git://github.com/mdroth/qemu.git stable-1.1
 
 That is simply the initial commit for v1.1.0
 
 I've set up a staging tree for the first batch of commits at:
 
 git://github.com/mdroth/qemu.git stable-1.1-staging
 
 This should capture everything explicitly tagged for stable/1.1. I've also
 pulled in relevant build fixes and a handful of applicable bug fixes.
 
 Please let me know if I've missed any patches, or if any of these shouldn't be
 applied.
 
 Testing so far has been all-target builds on x86, make check, qemu-test, and
 installation + old(current stable-1.1)-new and new-old migration for
 fc15/fc16/winxp. I plan on doing a more extensive final round of testing later
 next week before committing, as that will more or less be QEMU v1.1.1
 
 For the next release I plan on doing weekly or bi-weekly testing of staged
 patches and emails of this sort before committing to stable-1.1, depending on
 the number of patches and code churn, and will continue working off my
 stable-1.1-staging tree on github for those.

Please consider as well:

4bb9c939a57103898f5a51aa6a7336eb3320d923 ahci: SATA FIS is 20 bytes...
6f3c714eb7730630241fd0b33b799352d7feb876 sheepdog: fix return value...
47ce9ef7f89032c4079bf5132a12d1bfd4d5bca5 virtio: Fix compiler warning...
f085800e245836fed27fddb3b624a29326637657 qemu-img: document qed...

Also I think you should add your SoB to each of the patches and document
which commit in master you backported (i.e. use git cherry-pick -sx).

Kevin

Re: [Qemu-devel] [PATCH v3 1/1] virtio-rng: hardware random number generator device

2012-06-25 Thread Daniel P. Berrange

On Mon, Jun 25, 2012 at 07:54:18AM -0500, Anthony Liguori wrote:
 On 06/25/2012 07:30 AM, Daniel P. Berrange wrote:
 On Mon, Jun 25, 2012 at 07:22:13AM -0500, Anthony Liguori wrote:
 On 06/25/2012 07:10 AM, Daniel P. Berrange wrote:
 On Fri, Jun 22, 2012 at 02:59:13PM -0500, Anthony Liguori wrote:
 On 06/22/2012 01:50 PM, Amit Shah wrote:
 On (Fri) 22 Jun 2012 [08:44:52], Anthony Liguori wrote:
 On 06/22/2012 08:34 AM, Daniel P. Berrange wrote:
 
 Oh, that's a good point.
 
 But easily fixed.
 
 Of course, except that now we have to maintain compatibility so some
 hideous hack goes in.
 
 This is what fundamentally makes using events a bad approach.  There
 are more problems lurking.  This is the fundamental complexity of
 having two non-synchronized communication channels when you're
 trying to synchronize some sort of activity.
 
 BTW, despite what danpb mentioned, you are rate limiting entropy
 requests in this patch series
 
 All of these problems are naturally solved using a protocol over a 
 CharDriverState.
 
 Can we at least agree on merging a patch which just includes the
 raw chardev backend support for virtio-rng ? ie drop the QMP
 event for now, so we can make some step forward.
 
 All we need to do to support EGD is instead of doing:
 
 +QObject *data;
 +
 +data = qobject_from_jsonf({ 'bytes': % PRId64  },
 +  size);
 +monitor_protocol_event(QEVENT_ENTROPY_NEEDED, data);
 +qobject_decref(data);
 
 Do:
 
  while (size0) {
  uint8_t partial_size = MIN(255, size);
  uint8_t msg[2] = { 0x02, partial_size };
 
  qemu_chr_write(s-chr, msg, sizeof(msg));
 
  size -= partial_size;
  }
 
 And that's it.  It's an extremely simple protocol to support.  It
 would actually reduce the total size of the patch.
 
 So I now get what your objection is, and that is because of an
 assumption you're making which is incorrect.
 
 An OS asking for 5038 bytes of entropy is doing just that: asking for
 those bytes.  That doesn't mean a hardware device can provide it with
 that much entropy.  So, the hardware device here will just provide
 whatever is available, and the OS has to be happy with what it got.
 If it got 200 bytes from the device, the OS is then free to demand
 even 7638 bytes more, but it may not get anything for quite a while.
 
 (This is exactly how things work with /dev/random and /dev/urandom
 too, btw.  And /dev/urandom was invented for apps which can't wait for
 all the data they're asking to come to them.)
 
 As it turns out, the EGD protocol also has a mechanism to ask for
 ask much entropy as is available at the current moment.  It also has
 a mechanism to query the amount of available entropy.
 
 And no, you're assertion that we don't care about how much entropy
 the guest is requesting is wrong.  If we lose entropy requests, then
 we never know if the guest is in a state where it's expecting
 entropy and we're not sending it.
 
 Consider:
 
 - Guest requests entropy (X bytes)
 - libvirt sees request
 - libvirt sends X bytes to guest
 - Guest requests entropy (Y bytes), QEMU filters due to rate limiting
 
 The guest will never request entropy again and libvirt will never
 send entropy again.  The device is effectively dead-locked.
 
 WRT the impl of rate limiting Amit has used in this patch, you
 are correct, however, this is not how QEMU should be rate
 limiting this event. As mentioned in an earlier reply in this
 thread, any rate limited /must/ work as follows:
 
 - Guest sends randomness request for 10 bytes
 - QMP event gets sent for 10 bytes
 - Guest sends randomness request for 4 bytes
 - QMP is dropped
 - Guest sends randomness request for 8 bytes
 - QMP event gets sent for 12 bytes
 
 ie, the byte count for any events which are dropped, must be added to
 the byte count in the next emitted event. Also as Amit mentioned in his
 reply to me, it should take account of multiple devices, or we should
 limit QEMU to 1 single RNG device per guest, as we do for the balloon
 driver.
 
 QMP events are not meant to be used like this.  They are posted
 events and since we cannot know if there is something connected to
 the monitor, it's always possible for the backend to lose the
 information associated with an event.
 
 Events really should just be used to indicate, hey, you should go
 query information from QEMU now.
 
 Even what you suggest above doesn't handle the case where a
 management application restarts.
 
 If you were going to do this via QMP, you'd want to introduce a
 command to query the total outstanding requested entropy for a given
 device and then send a event whenever that value changes.
 
 But again, it's silly to use QMP for this.  Using an inline protocol
 simplifies things quite a bit.
 
 Ok, I agree that being rebust against mgmt layer restarts is a good
 reason for not relying on QMP events for this, and that using the
 EGD protocol would nicely solve this

Re: [Qemu-devel] [PATCH 7/8] dirty bitmap: abstract its use

2012-06-25 Thread Juan Quintela

Orit Wasserman owass...@redhat.com wrote:
 On 06/22/2012 04:46 PM, Juan Quintela wrote:
 Always use accessors to read/set the dirty bitmap.
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 -static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr)
 +/* read dirty bit (return 0 or 1) */
 +static inline int cpu_physical_memory_is_dirty(ram_addr_t addr)
  {
 -return ram_list.phys_dirty[addr  TARGET_PAGE_BITS];
 +return cpu_physical_memory_get_dirty_flags(addr) == 0xff;
  }
 Juan,
 you changed the order of the functions , can your restore the order as it was.

There is reason after madness.  Now cpu_physical_memory_is_dirty calls
cpu_physical_memory_get_dirty_flags().  After this patch, only 4
functions touch the bitmap directly.
 index a68b65c..dd4833d 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -2565,8 +2565,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
 void *host,
 
  ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
 last_ram_offset()  
 TARGET_PAGE_BITS);
 -memset(ram_list.phys_dirty + (new_block-offset  TARGET_PAGE_BITS),
 -   0xff, size  TARGET_PAGE_BITS);
 +cpu_physical_memory_set_dirty_range(new_block-offset, size, 0xff);

 This will be much slower than memset , do we use it much ?

No, we only use it each time that we add another region of RAM (i.e. a
very rare event).

Next step is to change the bitmap to three bitmaps of 1 bit per page.
That way, we only need to look at that 4 functions for changing it.

On an un-related note, I can't see how migration + TCG can work at the
moment.  TCG uses *is_dirty() to see if the page has changed, but if
we have migrated that page, is_dirty is going to return _not-dirty_.
AKA as bad as it can be :-(

Later, Juan.)

Re: [Qemu-devel] [PATCH 1/4] targphys.h: Define PRI*PLX format specifier macros

2012-06-25 Thread Eric Blake

On 06/25/2012 06:45 AM, Peter Maydell wrote:
 Define a set of PRI*PLX format specifier macros for working
 with target_phys_addr_t types. These follow the standard pattern
 for such macros, and are more flexible than TARGET_FMT_plx, which
 does not allow specification of field widths.
 
 Suggested-by: Andreas Färber afaer...@suse.de
 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
  targphys.h |   16 
  1 files changed, 16 insertions(+), 0 deletions(-)
 
 diff --git a/targphys.h b/targphys.h
 index 95648d6..d5b5636 100644
 --- a/targphys.h
 +++ b/targphys.h
 @@ -11,10 +11,26 @@
  typedef uint32_t target_phys_addr_t;
  #define TARGET_PHYS_ADDR_MAX UINT32_MAX
  #define TARGET_FMT_plx %08x
 +/* Format strings for printing target_phys_addr_t types.
 + * These are recommended over the less flexible TARGET_FMT_plx,
 + * which is retained for the benefit of existing code.
 + */
 +#define PRIdPLX PRId32

This risks collisions with future POSIX:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/inttypes.h.html#tag_13_20

Macro names beginning with PRI or SCN followed by any lowercase
letter or 'X' may be added to the macros defined in the inttypes.h header.

The correct way to do this is to first undefine any potential conflicts,
per this text in
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_01

If any header in the following table is included, macros with the
prefixes shown may be defined. After the last inclusion of a given
header, an application may use identifiers with the corresponding
prefixes for its own purpose, provided their use is preceded by a #undef
of the corresponding macro.
...
inttypes.h PRI[Xa-z], SCN[Xa-z]

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 1/8] Add spent time for migration

2012-06-25 Thread Eric Blake

On 06/25/2012 12:24 AM, Orit Wasserman wrote:

  # @MigrationInfo
 @@ -275,8 +280,9 @@
  #  'cancelled'. If this field is not returned, no migration process
  #  has been initiated
  #
 -# @ram: #optional @MigrationStats containing detailed migration status,
 -#   only returned if status is 'active'
 +# @ram: #optional @MigrationStats containing detailed migration
 +#   status, only returned if status is 'active' or
 +#   'completed'. 'comppleted' (since 1.2)

s/comppleted/completed/

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC][PATCH 05/11] kvm: Introduce kvm_irqchip_add_msi_route

2012-06-25 Thread Alex Williamson

On Mon, 2012-05-14 at 18:07 -0300, Jan Kiszka wrote:
 Add a service that establishes a static route from a virtual IRQ line to
 an MSI message. Will be used for IRQFD and device assignment. As we will
 use this service outside of CONFIG_KVM protected code, stub it properly.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  kvm-all.c  |   31 +++
  kvm-stub.c |8 
  kvm.h  |   10 ++
  3 files changed, 45 insertions(+), 4 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 8ab83db..f45b852 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1080,6 +1080,32 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
  return kvm_irqchip_set_irq(s, route-kroute.gsi, 1);
  }
  
 +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
 +{
 +struct kvm_irq_routing_entry kroute;
 +int gsi;
 +
 +if (!kvm_irqchip_in_kernel()) {
 +return -ENOSYS;
 +}
 +
 +gsi = kvm_get_pseudo_gsi(s);
 +if (gsi  0) {
 +return gsi;
 +}
 +
 +kroute.gsi = gsi;
 +kroute.type = KVM_IRQ_ROUTING_MSI;
 +kroute.flags = 0;
 +kroute.u.msi.address_lo = (uint32_t)msg.address;
 +kroute.u.msi.address_hi = msg.address  32;
 +kroute.u.msi.data = msg.data;
 +
 +kvm_add_routing_entry(s, kroute);
 +
 +return gsi;
 +}
 +
  #else /* !KVM_CAP_IRQ_ROUTING */
  
  static void kvm_init_irq_routing(KVMState *s)
 @@ -1090,6 +1116,11 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
  {
  abort();
  }
 +
 +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
 +{
 +abort();
 +}
  #endif /* !KVM_CAP_IRQ_ROUTING */


Jan,

Could we perhaps return a sane error value here?  This seems to be the
initial function used to setup direct MSI injection and if we have code
like:

virq = kvm_irqchip_add_msi_route(...)
if (virq  0) {
slow path...
} else {
fast path...
}

It works on x86 w/ or w/o irqchip, works with kvm disabled, but gives an
abort on !x86.  I really don't want to have to surround the above in a
#define or in-kernel ioapic test.  Thanks,

Alex

  static int kvm_irqchip_create(KVMState *s)
 diff --git a/kvm-stub.c b/kvm-stub.c
 index 47c573d..db3a7dc 100644
 --- a/kvm-stub.c
 +++ b/kvm-stub.c
 @@ -12,10 +12,13 @@
  
  #include qemu-common.h
  #include hw/hw.h
 +#include hw/msi.h
  #include cpu.h
  #include gdbstub.h
  #include kvm.h
  
 +KVMState *kvm_state;
 +
  int kvm_init_vcpu(CPUArchState *env)
  {
  return -ENOSYS;
 @@ -128,3 +131,8 @@ int kvm_on_sigbus(int code, void *addr)
  {
  return 1;
  }
 +
 +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
 +{
 +return -ENOSYS;
 +}
 diff --git a/kvm.h b/kvm.h
 index 8b061bd..67df1f1 100644
 --- a/kvm.h
 +++ b/kvm.h
 @@ -44,6 +44,10 @@ typedef struct KVMCapabilityInfo {
  #define KVM_CAP_INFO(CAP) { KVM_CAP_ stringify(CAP), KVM_CAP_##CAP }
  #define KVM_CAP_LAST_INFO { NULL, 0 }
  
 +struct KVMState;
 +typedef struct KVMState KVMState;
 +extern KVMState *kvm_state;
 +
  /* external API */
  
  int kvm_init(void);
 @@ -88,10 +92,6 @@ int kvm_on_sigbus(int code, void *addr);
  
  /* internal API */
  
 -struct KVMState;
 -typedef struct KVMState KVMState;
 -extern KVMState *kvm_state;
 -
  int kvm_ioctl(KVMState *s, int type, ...);
  
  int kvm_vm_ioctl(KVMState *s, int type, ...);
 @@ -213,4 +213,6 @@ int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t 
 val, bool assign,
 uint32_t size);
  
  int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool 
 assign);
 +
 +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg);
  #endif

Re: [Qemu-devel] [PATCH v4 1/7] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg

2012-06-25 Thread Corey Bryant




On 06/25/2012 05:16 AM, Stefan Hajnoczi wrote:

On Fri, Jun 22, 2012 at 01:31:17PM -0600, Eric Blake wrote:

On 06/22/2012 12:36 PM, Corey Bryant wrote:

This sets the close-on-exec flag for the file descriptor received
via SCM_RIGHTS.

Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com
---
v4
  -This patch is new in v4 (ebl...@redhat.com)

  qemu-char.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-char.c b/qemu-char.c
index c2aaaee..f890113 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2263,7 +2263,7 @@ static ssize_t tcp_chr_recv(CharDriverState *chr, char 
*buf, size_t len)
  msg.msg_control = msg_control;
  msg.msg_controllen = sizeof(msg_control);

-ret = recvmsg(s-fd, msg, 0);
+ret = recvmsg(s-fd, msg, MSG_CMSG_CLOEXEC);


MSG_CMSG_CLOEXEC is not (yet) in POSIX (although it has been proposed
for addition); therefore, at the moment, it only exists on Linux and
Cygwin.  Does this need to have conditional code to allow compilation on
BSD, such as:

#ifndef MSG_CMSG_CLOEXEC
# define MSG_CMSG_CLOEXEC 0
#endif

as well as fallback code that sets FD_CLOEXEC manually via fcntl() when
MSG_CMSG_CLOEXEC is missing?


Good point.  I think the answer is yes.  Just like qemu_open() we can
wrap recvmsg(2) with fd so that platforms with MSG_CMSG_CLOEXEC use
that flag and other platforms use qemu_set_cloexec().

Stefan



Thanks for pointing this out.  I'll make the updates in v5.

--
Regards,
Corey

Re: [Qemu-devel] [PATCH 0/2] introduce bdrv_swap, implement bdrv_append on top

2012-06-25 Thread Stefan Hajnoczi

On Fri, Jun 15, 2012 at 10:38 AM, Kevin Wolf kw...@redhat.com wrote:
 Am 14.06.2012 16:55, schrieb Paolo Bonzini:
 Yet another tiny bit extracted from block mirroring, looks like it
 should be useful for block commit too.

 Paolo Bonzini (2):
   block: copy over job and dirty bitmap fields in bdrv_append
   block: introduce bdrv_swap, implement bdrv_append on top of it

  block.c |  175 
 +--
  block.h |    1 +
  2 files changed, 103 insertions(+), 73 deletions(-)


 I was really hoping we could get rid of bdrv_append() rather than extend
 it and spread its use...

 What exactly do we need this for? I'm sure you have good reasons, but
 with such hackish approaches the justification should be explicit.

Opening twice and switching to the new state is an okay approach.
It's just made very difficult because BlockDriverState has grown into
a monster struct.  So the implementation is ugly.

If we split BlockDriverState (e.g. guest-visible block device state vs
block driver state) can we do away with the in-place swap?  If it's
possible to implement this using regular pointer assignment instead of
swapping fields, then this approach will become reasonable IMO.

Stefan

Re: [Qemu-devel] [PATCH 0/4] Define and use PRI*PLX format specifier macros for target_phys_addr_t

2012-06-25 Thread Andreas Färber

Am 25.06.2012 14:45, schrieb Peter Maydell:
 This patchset defines a set of macros for use in printf format strings
 for handling target_phys_addr_t values, as suggested by Andreas Färber
 (http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01547.html)

What is the X for again? I assume PL = Physical Long, X as in
hexadecimal would then contradict the d/u and duplicate x/X.

Thus I would suggest TARGET_PRIdPHYSADDR, also addressing Eric's
comment. Judging from patches 2-4 it's not used to frequently unlike its
user TARGET_FMT_plx, so we don't have to badly shorten it for usability.

Andreas

 We can then use them in some places which were previously either
 making assumptions about the width of target_phys_addr_t or had
 an ifdef to handle the 32 and 64 bit cases separately. Patches 2-4
 cover the format-string related parts of Anthony's original make
 target_phys_addr_t 64 bits in all cases patch:
 http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01523.html
 
 Peter Maydell (4):
   targphys.h: Define PRI*PLX format specifier macros
   hw/omap.h: Use PRIxPLX to define OMAP_FMT_plx
   hw/sh_serial: Use PRIxPLX rather than %x for physaddr
   monitor: Use PRI*PLX to avoid TARGET_PHYS_ADDR_BITS ifdef
 
  hw/omap.h  |8 +---
  hw/sh_serial.c |5 +++--
  monitor.c  |   29 -
  targphys.h |   16 
  4 files changed, 24 insertions(+), 34 deletions(-)

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [RFC][PATCH 05/11] kvm: Introduce kvm_irqchip_add_msi_route

2012-06-25 Thread Jan Kiszka

On 2012-06-25 15:38, Alex Williamson wrote:
 On Mon, 2012-05-14 at 18:07 -0300, Jan Kiszka wrote:
 Add a service that establishes a static route from a virtual IRQ line to
 an MSI message. Will be used for IRQFD and device assignment. As we will
 use this service outside of CONFIG_KVM protected code, stub it properly.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  kvm-all.c  |   31 +++
  kvm-stub.c |8 
  kvm.h  |   10 ++
  3 files changed, 45 insertions(+), 4 deletions(-)

 diff --git a/kvm-all.c b/kvm-all.c
 index 8ab83db..f45b852 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1080,6 +1080,32 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
  return kvm_irqchip_set_irq(s, route-kroute.gsi, 1);
  }
  
 +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
 +{
 +struct kvm_irq_routing_entry kroute;
 +int gsi;
 +
 +if (!kvm_irqchip_in_kernel()) {
 +return -ENOSYS;
 +}
 +
 +gsi = kvm_get_pseudo_gsi(s);
 +if (gsi  0) {
 +return gsi;
 +}
 +
 +kroute.gsi = gsi;
 +kroute.type = KVM_IRQ_ROUTING_MSI;
 +kroute.flags = 0;
 +kroute.u.msi.address_lo = (uint32_t)msg.address;
 +kroute.u.msi.address_hi = msg.address  32;
 +kroute.u.msi.data = msg.data;
 +
 +kvm_add_routing_entry(s, kroute);
 +
 +return gsi;
 +}
 +
  #else /* !KVM_CAP_IRQ_ROUTING */
  
  static void kvm_init_irq_routing(KVMState *s)
 @@ -1090,6 +1116,11 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
  {
  abort();
  }
 +
 +int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
 +{
 +abort();
 +}
  #endif /* !KVM_CAP_IRQ_ROUTING */
 
 
 Jan,
 
 Could we perhaps return a sane error value here?  This seems to be the
 initial function used to setup direct MSI injection and if we have code
 like:
 
 virq = kvm_irqchip_add_msi_route(...)
 if (virq  0) {
 slow path...
 } else {
 fast path...
 }
 
 It works on x86 w/ or w/o irqchip, works with kvm disabled, but gives an
 abort on !x86.  I really don't want to have to surround the above in a
 #define or in-kernel ioapic test.  Thanks,

No problems, just send a patch.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 7/8] dirty bitmap: abstract its use

2012-06-25 Thread Orit Wasserman

On 06/25/2012 04:20 PM, Juan Quintela wrote:
 Orit Wasserman owass...@redhat.com wrote:
 On 06/22/2012 04:46 PM, Juan Quintela wrote:
 Always use accessors to read/set the dirty bitmap.

 Signed-off-by: Juan Quintela quint...@redhat.com
 -static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr)
 +/* read dirty bit (return 0 or 1) */
 +static inline int cpu_physical_memory_is_dirty(ram_addr_t addr)
  {
 -return ram_list.phys_dirty[addr  TARGET_PAGE_BITS];
 +return cpu_physical_memory_get_dirty_flags(addr) == 0xff;
  }
 Juan,
 you changed the order of the functions , can your restore the order as it 
 was.
 
 There is reason after madness.  Now cpu_physical_memory_is_dirty calls
 cpu_physical_memory_get_dirty_flags().  After this patch, only 4
 functions touch the bitmap directly.
ok.
 index a68b65c..dd4833d 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -2565,8 +2565,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
 void *host,

  ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
 last_ram_offset()  
 TARGET_PAGE_BITS);
 -memset(ram_list.phys_dirty + (new_block-offset  TARGET_PAGE_BITS),
 -   0xff, size  TARGET_PAGE_BITS);
 +cpu_physical_memory_set_dirty_range(new_block-offset, size, 0xff);

 This will be much slower than memset , do we use it much ?
 
 No, we only use it each time that we add another region of RAM (i.e. a
 very rare event).
if it is rare , than it is fine
 
 Next step is to change the bitmap to three bitmaps of 1 bit per page.
 That way, we only need to look at that 4 functions for changing it.
 
Great.
 On an un-related note, I can't see how migration + TCG can work at the
 moment.  TCG uses *is_dirty() to see if the page has changed, but if
 we have migrated that page, is_dirty is going to return _not-dirty_.
 AKA as bad as it can be :-(
 
 Later, Juan.)

Reviewed-by: Orit Wasserman owass...@redhat.com

[Qemu-devel] [PATCH 2/4] target-xtensa: drop usage of prev_debug_excp_handler

2012-06-25 Thread Igor Mammedov

Chains of exception handlers are currently unused feature. Dropping it
to be consistent with target-i386 but it may simplify qom-ifying CPU
in future like for target-i386.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 target-xtensa/helper.c |8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/target-xtensa/helper.c b/target-xtensa/helper.c
index 044ce18..d5bb171 100644
--- a/target-xtensa/helper.c
+++ b/target-xtensa/helper.c
@@ -54,8 +54,6 @@ static uint32_t check_hw_breakpoints(CPUXtensaState *env)
 return 0;
 }
 
-static CPUDebugExcpHandler *prev_debug_excp_handler;
-
 static void breakpoint_handler(CPUXtensaState *env)
 {
 if (env-watchpoint_hit) {
@@ -70,9 +68,6 @@ static void breakpoint_handler(CPUXtensaState *env)
 cpu_resume_from_signal(env, NULL);
 }
 }
-if (prev_debug_excp_handler) {
-prev_debug_excp_handler(env);
-}
 }
 
 XtensaCPU *cpu_xtensa_init(const char *cpu_model)
@@ -105,8 +100,7 @@ XtensaCPU *cpu_xtensa_init(const char *cpu_model)
 
 if (!debug_handler_inited  tcg_enabled()) {
 debug_handler_inited = 1;
-prev_debug_excp_handler =
-cpu_set_debug_excp_handler(breakpoint_handler);
+cpu_set_debug_excp_handler(breakpoint_handler);
 }
 
 xtensa_irq_init(env);
-- 
1.7.10.2

[Qemu-devel] [PATCH 0/4 v2] target-i386: move tcg intialization inside CPU object

2012-06-25 Thread Igor Mammedov

v2:
  - drop usage of prev_debug_excp_handler consistently in all users
  - split from reset patches to avoid confusion of inter-dependency

Compile  Run tested:
  target-i386: tcg and kvm mode
  i386-linux-user: running of /bin/ls
Compile tested:
  xtensa-softmmu  xtensaeb-softmmu

git tree for testing: 
  https://github.com/imammedo/qemu/tree/x86cpu_qom_tcg_v2


Igor Mammedov (4):
  target-i386: drop usage of prev_debug_excp_handler
  target-xtensa: drop usage of prev_debug_excp_handler
  cleanup cpu_set_debug_excp_handler
  target-i386: move tcg initialization into x86_cpu_initfn()

 cpu-exec.c |5 +
 exec-all.h |2 +-
 target-i386/cpu.c  |   10 ++
 target-i386/cpu.h  |1 +
 target-i386/helper.c   |   16 +---
 target-xtensa/helper.c |8 +---
 6 files changed, 15 insertions(+), 27 deletions(-)

-- 
1.7.10.2

[Qemu-devel] [PATCH 3/4] cleanup cpu_set_debug_excp_handler

2012-06-25 Thread Igor Mammedov

There are no users left for previous exception handler returned from
cpu_set_debug_excp_handler. It should simplify code a little.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 cpu-exec.c |5 +
 exec-all.h |2 +-
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 624c409..24607fb 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -156,12 +156,9 @@ static inline TranslationBlock *tb_find_fast(CPUArchState 
*env)
 
 static CPUDebugExcpHandler *debug_excp_handler;
 
-CPUDebugExcpHandler *cpu_set_debug_excp_handler(CPUDebugExcpHandler *handler)
+void cpu_set_debug_excp_handler(CPUDebugExcpHandler *handler)
 {
-CPUDebugExcpHandler *old_handler = debug_excp_handler;
-
 debug_excp_handler = handler;
-return old_handler;
 }
 
 static void cpu_handle_debug_exception(CPUArchState *env)
diff --git a/exec-all.h b/exec-all.h
index 9bda7f7..c5ec8e1 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -357,7 +357,7 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, 
target_ulong addr);
 
 typedef void (CPUDebugExcpHandler)(CPUArchState *env);
 
-CPUDebugExcpHandler *cpu_set_debug_excp_handler(CPUDebugExcpHandler *handler);
+void cpu_set_debug_excp_handler(CPUDebugExcpHandler *handler);
 
 /* vl.c */
 extern int singlestep;
-- 
1.7.10.2

[Qemu-devel] [PATCH 1/4] target-i386: drop usage of prev_debug_excp_handler

2012-06-25 Thread Igor Mammedov

Chains of exception handlers are currently unused feature, drop it
for now so as not to expose prev_debug_excp_handler at global
scope when moving tcg initialization into target-i386/cpu.c

Later we probably could re-invent better interface for this.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 target-i386/helper.c |7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 2cc8097..b9384f6 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -941,8 +941,6 @@ int check_hw_breakpoints(CPUX86State *env, int 
force_dr6_update)
 return hit_enabled;
 }
 
-static CPUDebugExcpHandler *prev_debug_excp_handler;
-
 static void breakpoint_handler(CPUX86State *env)
 {
 CPUBreakpoint *bp;
@@ -965,8 +963,6 @@ static void breakpoint_handler(CPUX86State *env)
 break;
 }
 }
-if (prev_debug_excp_handler)
-prev_debug_excp_handler(env);
 }
 
 typedef struct MCEInjectionParams {
@@ -1166,8 +1162,7 @@ X86CPU *cpu_x86_init(const char *cpu_model)
 inited = 1;
 optimize_flags_init();
 #ifndef CONFIG_USER_ONLY
-prev_debug_excp_handler =
-cpu_set_debug_excp_handler(breakpoint_handler);
+cpu_set_debug_excp_handler(breakpoint_handler);
 #endif
 }
 if (cpu_x86_register(cpu, cpu_model)  0) {
-- 
1.7.10.2

[Qemu-devel] [PATCH 4/4] target-i386: move tcg initialization into x86_cpu_initfn()

2012-06-25 Thread Igor Mammedov

In order to make cpu object not depended on external ad-hoc
initialization routines, move tcg initialization from cpu_x86_init
inside cpu object x86_cpu_initfn().

Signed-off-by: Igor Mammedov imamm...@redhat.com
Reviewed-by: Andreas Färber afaer...@suse.de
---
 target-i386/cpu.c|   10 ++
 target-i386/cpu.h|1 +
 target-i386/helper.c |   11 +--
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index fdd95be..445274c 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1716,6 +1716,7 @@ static void x86_cpu_initfn(Object *obj)
 {
 X86CPU *cpu = X86_CPU(obj);
 CPUX86State *env = cpu-env;
+static int inited;
 
 cpu_exec_init(env);
 
@@ -1745,6 +1746,15 @@ static void x86_cpu_initfn(Object *obj)
 x86_cpuid_set_tsc_freq, NULL, NULL, NULL);
 
 env-cpuid_apic_id = env-cpu_index;
+
+/* init various static tables used in TCG mode */
+if (tcg_enabled()  !inited) {
+inited = 1;
+optimize_flags_init();
+#ifndef CONFIG_USER_ONLY
+cpu_set_debug_excp_handler(breakpoint_handler);
+#endif
+}
 }
 
 static void x86_cpu_common_class_init(ObjectClass *oc, void *data)
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index bcf663e..80dcb49 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -933,6 +933,7 @@ static inline int hw_breakpoint_len(unsigned long dr7, int 
index)
 void hw_breakpoint_insert(CPUX86State *env, int index);
 void hw_breakpoint_remove(CPUX86State *env, int index);
 int check_hw_breakpoints(CPUX86State *env, int force_dr6_update);
+void breakpoint_handler(CPUX86State *env);
 
 /* will be suppressed */
 void cpu_x86_update_cr0(CPUX86State *env, uint32_t new_cr0);
diff --git a/target-i386/helper.c b/target-i386/helper.c
index b9384f6..c52ec13 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -941,7 +941,7 @@ int check_hw_breakpoints(CPUX86State *env, int 
force_dr6_update)
 return hit_enabled;
 }
 
-static void breakpoint_handler(CPUX86State *env)
+void breakpoint_handler(CPUX86State *env)
 {
 CPUBreakpoint *bp;
 
@@ -1151,20 +1151,11 @@ X86CPU *cpu_x86_init(const char *cpu_model)
 {
 X86CPU *cpu;
 CPUX86State *env;
-static int inited;
 
 cpu = X86_CPU(object_new(TYPE_X86_CPU));
 env = cpu-env;
 env-cpu_model_str = cpu_model;
 
-/* init various static tables used in TCG mode */
-if (tcg_enabled()  !inited) {
-inited = 1;
-optimize_flags_init();
-#ifndef CONFIG_USER_ONLY
-cpu_set_debug_excp_handler(breakpoint_handler);
-#endif
-}
 if (cpu_x86_register(cpu, cpu_model)  0) {
 object_delete(OBJECT(cpu));
 return NULL;
-- 
1.7.10.2

Re: [Qemu-devel] [PATCH stable-1.1 00/26] Initial tree and candidates for stable-1.1

2012-06-25 Thread Michael Roth

On Mon, Jun 25, 2012 at 12:46:28PM +0100, Stefano Stabellini wrote:
 On Sat, 23 Jun 2012, Michael Roth wrote:
  Hi everyone,
  
  I've set up a tree for stable-1.1 at:
  
  git://github.com/mdroth/qemu.git stable-1.1
 
 Thanks for doing this.
 
 
  That is simply the initial commit for v1.1.0
  
  I've set up a staging tree for the first batch of commits at:
  
  git://github.com/mdroth/qemu.git stable-1.1-staging
  
  This should capture everything explicitly tagged for stable/1.1. I've also
  pulled in relevant build fixes and a handful of applicable bug fixes.
  
  Please let me know if I've missed any patches, or if any of these shouldn't 
  be
  applied.
 
 Please backport this one too:
 
 commit 64c27e5b1fdb6d94bdc0bda3b1869d7383a35c65
 Author: Jan Beulich jbeul...@suse.com
 Date:   Mon Jun 11 09:52:27 2012 +
 
 qemu/xendisk: set maximum number of grants to be used

Applied to staging, thanks.

 
 
 Thanks!
 
 - Stefano

Re: [Qemu-devel] [PATCH 2/4] target-xtensa: drop usage of prev_debug_excp_handler

2012-06-25 Thread Max Filippov

On Mon, Jun 25, 2012 at 5:55 PM, Igor Mammedov imamm...@redhat.com wrote:
 Chains of exception handlers are currently unused feature. Dropping it
 to be consistent with target-i386 but it may simplify qom-ifying CPU
 in future like for target-i386.

 Signed-off-by: Igor Mammedov imamm...@redhat.com
 ---
  target-xtensa/helper.c |    8 +---
  1 file changed, 1 insertion(+), 7 deletions(-)

Acked-by: Max Filippov jcmvb...@gmail.com

-- 
Thanks.
-- Max

Re: [Qemu-devel] [RFC 0/7] qcow2: implement lazy refcounts optimization

2012-06-25 Thread Zhi Yong Wu

On Fri, Jun 22, 2012 at 11:08 PM, Stefan Hajnoczi
stefa...@linux.vnet.ibm.com wrote:
 This series aims to improve qcow2 performance with cache=writethrough and
 cache=directsync.  In particular it reduces the impact of metadata updates for
What does cache=directsync mean? what is the difference between them?
 allocating writes.

 Allocating writes are expensive because they involve updating L2 tables and
 refcount blocks.  In addition they can also cause L2 table allocation and
 refcount block allocation but these remain unaffected by this optimization.

 The key insight is that refcounts are not required to access data in the 
 image.
Can you elaborate this? why?
 This means that we can postpone refcount updates without violating the data
 integrity guarantee that cache=writethrough and cache=directsync give.

 The trade-off for postponing refcount updates is that the image may not be
 completely consistent in case of power failure or crash.  If the image is 
 dirty
 then it must be repaired before performing further modifications, in other
 words we need an fsck-like scan on startup.

 I don't have performance results to share yet but I wanted to get the code out
 there.  The bigger picture is that this optimization should help make qcow2 a
 good choice even for cache=writethrough and cache=directsync where QED has
 traditionally had an advantage due to less metadata - this allows us to
 converge image format development in QEMU around the qcow2v3 format.
Why do we design one new image format? e.g. use btree to manage
metadata and disk data layout? Since btree has a lot of advantages
than bitmap or tables.

 Stefan Hajnoczi (7):
  docs: add dirty bit to qcow2 specification
  qcow2: introduce dirty bit
  docs: add lazy refcounts bit to qcow2 specification
  qemu-iotests: ignore qemu-img create lazy_refcounts output
  qcow2: implement lazy refcounts
  qemu-io: add abort command to simulate program crash
  qemu-iotests: add 039 qcow2 lazy refcounts test

  block/qcow2-cluster.c        |    5 +-
  block/qcow2.c                |  111 
 +++---
  block/qcow2.h                |   11 +
  block_int.h                  |   26 +-
  docs/specs/qcow2.txt         |   12 -
  qemu-io.c                    |   12 +
  tests/qemu-iotests/039       |   99 +
  tests/qemu-iotests/039.out   |   34 +
  tests/qemu-iotests/common.rc |    3 +-
  tests/qemu-iotests/group     |    1 +
  10 files changed, 292 insertions(+), 22 deletions(-)
  create mode 100755 tests/qemu-iotests/039
  create mode 100644 tests/qemu-iotests/039.out

 --
 1.7.10





-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] [PATCH 0/4] Define and use PRI*PLX format specifier macros for target_phys_addr_t

2012-06-25 Thread Peter Maydell

On 25 June 2012 14:48, Andreas Färber afaer...@suse.de wrote:
 Am 25.06.2012 14:45, schrieb Peter Maydell:
 This patchset defines a set of macros for use in printf format strings
 for handling target_phys_addr_t values, as suggested by Andreas Färber
 (http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg01547.html)

 What is the X for again? I assume PL = Physical Long, X as in
 hexadecimal would then contradict the d/u and duplicate x/X.

I'm just following the existing use of 'plx' in TARGET_FMT_plx,
but I'm not very attached to the abbreviation.

 Thus I would suggest TARGET_PRIdPHYSADDR, also addressing Eric's
 comment. Judging from patches 2-4 it's not used to frequently unlike its
 user TARGET_FMT_plx, so we don't have to badly shorten it for usability.

The hope is that it is used more extensively in the future (ie
any device that would have used TARGET_FMT_plx should use these
instead); I think TARGET_PRIdPHYSADDR is too long.

-- PMM

Re: [Qemu-devel] [PATCH stable-1.1 00/26] Initial tree and candidates for stable-1.1

2012-06-25 Thread Michael Roth

On Mon, Jun 25, 2012 at 02:58:40PM +0200, Kevin Wolf wrote:
 Am 23.06.2012 02:33, schrieb Michael Roth:
  Hi everyone,
  
  I've set up a tree for stable-1.1 at:
  
  git://github.com/mdroth/qemu.git stable-1.1
  
  That is simply the initial commit for v1.1.0
  
  I've set up a staging tree for the first batch of commits at:
  
  git://github.com/mdroth/qemu.git stable-1.1-staging
  
  This should capture everything explicitly tagged for stable/1.1. I've also
  pulled in relevant build fixes and a handful of applicable bug fixes.
  
  Please let me know if I've missed any patches, or if any of these shouldn't 
  be
  applied.
  
  Testing so far has been all-target builds on x86, make check, qemu-test, and
  installation + old(current stable-1.1)-new and new-old migration for
  fc15/fc16/winxp. I plan on doing a more extensive final round of testing 
  later
  next week before committing, as that will more or less be QEMU v1.1.1
  
  For the next release I plan on doing weekly or bi-weekly testing of staged
  patches and emails of this sort before committing to stable-1.1, depending 
  on
  the number of patches and code churn, and will continue working off my
  stable-1.1-staging tree on github for those.
 
 Please consider as well:
 
 4bb9c939a57103898f5a51aa6a7336eb3320d923 ahci: SATA FIS is 20 bytes...

I originally had this in the queue but it looks like the only effect is
to avoid allocating/initializing an extra 16 bytes on the stack. I
didn't see any other potential side-effects so decided to drop it. Let
me know if I'm missing something though.

 6f3c714eb7730630241fd0b33b799352d7feb876 sheepdog: fix return value...
 47ce9ef7f89032c4079bf5132a12d1bfd4d5bca5 virtio: Fix compiler warning...
 f085800e245836fed27fddb3b624a29326637657 qemu-img: document qed...

Applied to staging, thanks.

 
 Also I think you should add your SoB to each of the patches and document
 which commit in master you backported (i.e. use git cherry-pick -sx).

Thanks, corrected in staging now. Didn't know about about the -x option,
pretty spiffy :)

 
 Kevin

Re: [Qemu-devel] [RFC 2/7] qcow2: introduce dirty bit

2012-06-25 Thread Kevin Wolf

Am 22.06.2012 17:08, schrieb Stefan Hajnoczi:
 This patch adds an incompatible feature bit to mark images that have not
 been closed cleanly.  When a dirty image file is opened a consistency
 check and repair is performed.
 
 Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 ---
  block/qcow2.c |   44 ++--
  block/qcow2.h |3 +++
  2 files changed, 45 insertions(+), 2 deletions(-)

 diff --git a/block/qcow2.h b/block/qcow2.h
 index 455b6d7..5c7cfb6 100644
 --- a/block/qcow2.h
 +++ b/block/qcow2.h
 @@ -108,6 +108,9 @@ enum {
  QCOW2_FEAT_TYPE_INCOMPATIBLE= 0,
  QCOW2_FEAT_TYPE_COMPATIBLE  = 1,
  QCOW2_FEAT_TYPE_AUTOCLEAR   = 2,
 +
 +QCOW2_INCOMPATIBLE_FEAT_DIRTY   = 0x1,
 +QCOW2_INCOMPATIBLE_FEAT_MASK= QCOW2_INCOMPATIBLE_FEAT_DIRTY,
  };

I would use a separate enum for each of the three flag categories.

Also, you should add a feature table entry for the dirty bit so that
older qemu versions can display a useful error message.

Kevin

1 2 >

1 - 100 of 196 matches

Mail list logo