Re: [Qemu-devel] [PATCH 2/2] hostmem-file: allow option 'size' optional

2016-10-25 Thread Haozhong Zhang

On 10/25/16 17:50 -0200, Eduardo Habkost wrote:

On Mon, Oct 24, 2016 at 05:21:51PM +0800, Haozhong Zhang wrote:

If 'size' option of hostmem-file is not given, QEMU will use the file
size of 'mem-path' instead. For an empty file, a non-zero size must be
specified by the option 'size'.

Signed-off-by: Haozhong Zhang 
---
 backends/hostmem-file.c | 10 ++++++----
 exec.c                  |  8 ++++++++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 42efb2f..f94d2f7 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -40,10 +40,6 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 {
 HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);

-if (!backend->size) {
-error_setg(errp, "can't create backend with size 0");
-return;
-}
 if (!fb->mem_path) {
 error_setg(errp, "mem-path property not set");
 return;
@@ -62,6 +58,12 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 g_free(path);
 }
 #endif
+if (!errp && !backend->size) {


This condition is always false because non-NULL errp is always
provided by the only caller (host_memory_backend_memory_complete()).



Oops, I meant !*errp. Anyway, I'll change to the way you suggested below.


To simplify error checking, I suggest moving the error path to a
label at the end of the function, e.g.:

static void file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
{
   Error *local_err = NULL;
   /* ... */
   memory_region_init_ram_from_file(..., &local_err);
   if (local_err) {
   goto out;
   }
   /* ... */
   if (!backend->size) {
   backend->size = memory_region_size(&backend->mr);
   }
   /* ... */
out:
   error_propagate(errp, local_err);
}
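
The difference between `!errp` and `!*errp`, and the local-error pattern suggested above, can be shown with a small stand-alone sketch. The `Error` type and the `error_setg()`/`error_propagate()` helpers below are simplified stand-ins written for this example, not QEMU's implementations, and `probe_size()` and `alloc_backend()` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for QEMU's Error object (assumption: message only). */
typedef struct Error {
    const char *msg;
} Error;

/* Stand-in for error_setg(): store a new error if the caller wants one. */
static void error_setg(Error **errp, const char *msg)
{
    Error *err;

    if (!errp) {
        return;             /* '!errp': the caller passed no pointer at all */
    }
    err = malloc(sizeof(*err));
    if (!err) {
        abort();
    }
    err->msg = msg;
    *errp = err;            /* afterwards '!*errp' is false: an error is set */
}

/* Stand-in for error_propagate(): hand local_err to the caller, or drop it. */
static void error_propagate(Error **dst, Error *local_err)
{
    if (!local_err) {
        return;
    }
    if (dst && !*dst) {
        *dst = local_err;
    } else {
        free(local_err);
    }
}

/* Hypothetical step that fails when the file is empty. */
static size_t probe_size(size_t file_size, Error **errp)
{
    if (file_size == 0) {
        error_setg(errp, "can't create backend with size 0");
    }
    return file_size;
}

/* Checks a local error, so the test never depends on what the caller passed. */
static size_t alloc_backend(size_t requested, size_t file_size, Error **errp)
{
    Error *local_err = NULL;
    size_t size = requested;

    if (!size) {
        size = probe_size(file_size, &local_err);
        if (local_err) {
            goto out;
        }
    }
out:
    error_propagate(errp, local_err);
    return local_err ? 0 : size;
}
```

With this shape every failure check reads `if (local_err)`, and the caller's `errp` is touched in exactly one place.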


+backend->size = memory_region_size(&backend->mr);
+if (!backend->size) {
+error_setg(errp, "can't create backend with size 0");
+}
+}
 }

 static char *get_mem_path(Object *o, Error **errp)
diff --git a/exec.c b/exec.c
index 95983c9..91adc62 100644
--- a/exec.c
+++ b/exec.c
@@ -1274,6 +1274,14 @@ static void *file_ram_alloc(RAMBlock *block,
 goto error;
 }

+if (memory) {
+memory = memory ?: file_size;


This doesn't make sense to me. You already checked if memory is
zero above, and now you are checking if it's zero again.
file_size is never going to be used here.


+memory_region_set_size(block->mr, memory);
+memory = HOST_PAGE_ALIGN(memory);
+block->used_length = memory;
+block->max_length = memory;


This is fragile: it duplicates the logic that initializes
used_length and max_length in qemu_ram_alloc_*().

Maybe it's better to keep the file-size-probing magic inside
hostmem-file.c, and always give a non-zero size to
memory_region_init_ram_from_file().



Yes, I can move the logic in above if-statement to qemu_ram_alloc_from_file().

Thanks,
Haozhong
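
A minimal sketch of keeping the size probing in one place, so that the allocator always receives a non-zero size. The helper name `resolve_backend_size()` is hypothetical, and the sketch assumes a regular file; device nodes may not report a meaningful `st_size`, so real code might need a different probe.

```c
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: pick the size to pass on to the allocator.
 * Returns 0 on success and -1 if no non-zero size can be determined.
 */
static int resolve_backend_size(int fd, uint64_t requested, uint64_t *size)
{
    struct stat st;

    if (requested) {
        *size = requested;          /* an explicit 'size' always wins */
        return 0;
    }
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        return -1;                  /* empty file and no 'size' given */
    }
    *size = (uint64_t)st.st_size;   /* fall back to the file size */
    return 0;
}
```

The caller would report "can't create backend with size 0" when the helper returns -1, and would otherwise pass `*size` on unchanged.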


+}
+
 if (memory < block->page_size) {
 error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
"or larger than page size 0x%zx",
--
2.10.1



--
Eduardo




Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 13/17] COLO: Introduce state to record failover process

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:10:09], zhanghailiang wrote:
> When handling failover, COLO processes differently according to
> the different stage of failover process, here we introduce a global
> atomic variable to record the status of failover.
> 
> We add four failover status to indicate the different stage of failover 
> process.
> You should use the helpers to get and set the value.
> 
> Signed-off-by: zhanghailiang 
> Reviewed-by: Dr. David Alan Gilbert 

Reviewed-by: Amit Shah 


Amit
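
A rough sketch of a global atomic status with get and set helpers, as the commit message describes. It uses C11 atomics rather than QEMU's own atomic macros, and the four status names are assumptions made for this example.

```c
#include <stdatomic.h>

/* Four stages of failover; the names are assumptions for this sketch. */
typedef enum {
    FAILOVER_NONE,       /* no failover requested */
    FAILOVER_REQUESTED,  /* a request has been posted */
    FAILOVER_ACTIVE,     /* failover work is in progress */
    FAILOVER_COMPLETED   /* failover has finished */
} FailoverStatus;

static _Atomic int failover_state = FAILOVER_NONE;

/* Read the current status. */
static FailoverStatus failover_get_state(void)
{
    return (FailoverStatus)atomic_load(&failover_state);
}

/*
 * Move from old_state to new_state only if the status is still old_state.
 * Returns the status observed before the attempt, so the caller can tell
 * whether the transition happened.
 */
static FailoverStatus failover_set_state(FailoverStatus old_state,
                                         FailoverStatus new_state)
{
    int expected = old_state;

    atomic_compare_exchange_strong(&failover_state, &expected, new_state);
    return (FailoverStatus)expected;
}
```

A compare-and-exchange setter means two threads racing to start failover cannot both succeed: only the one that still sees the old status makes the transition.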



Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 12/17] COLO: Add 'x-colo-lost-heartbeat' command to trigger failover

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:10:08], zhanghailiang wrote:
> We leave users to choose whatever heartbeat solution they want,
> if the heartbeat is lost, or other errors they detect, they can use
> experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover,
> COLO will do operations accordingly.
> 
> For example, if the command is sent to the PVM, the Primary side will
> exit COLO mode and take over operation.

Primary should already be in control, so there's nothing special
needed to 'take over operation'?  At max, it should not do periodic
syncs anymore till it hears from a (new) secondary.

> If sent to the Secondary, the
> secondary will run failover work, then take over server operation to
> become the new Primary.
> 
> Cc: Luiz Capitulino 
> Cc: Eric Blake 
> Cc: Markus Armbruster 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Li Zhijian 
> Reviewed-by: Dr. David Alan Gilbert 

Reviewed-by: Amit Shah 


Amit



Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 11/17] COLO: Synchronize PVM's state to SVM periodically

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:10:07], zhanghailiang wrote:
> Do checkpoint periodically, the default interval is 200ms.
> 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Li Zhijian 
> Reviewed-by: Dr. David Alan Gilbert 

Reviewed-by: Amit Shah 


Amit



Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 10/17] COLO: Add checkpoint-delay parameter for migrate-set-parameters

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:10:06], zhanghailiang wrote:
> Add checkpoint-delay parameter for migrate-set-parameters, so that
> we can control the checkpoint frequency when COLO is in periodic mode.
> 
> Cc: Luiz Capitulino 
> Cc: Eric Blake 
> Cc: Markus Armbruster 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Li Zhijian 
> Reviewed-by: Dr. David Alan Gilbert 


> diff --git a/hmp.c b/hmp.c
> index 80f7f1f..759f4f4 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -318,6 +318,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const 
> QDict *qdict)
>  monitor_printf(mon, " %s: %" PRId64 " milliseconds",
>  MigrationParameter_lookup[MIGRATION_PARAMETER_DOWNTIME_LIMIT],
>  params->downtime_limit);
> +monitor_printf(mon, " %s: %" PRId64,
> +
> MigrationParameter_lookup[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY],
> +params->x_checkpoint_delay);
>  monitor_printf(mon, "\n");
>  }
>  
> @@ -1363,7 +1366,6 @@ void hmp_migrate_set_parameter(Monitor *mon, const 
> QDict *qdict)
>  case MIGRATION_PARAMETER_CPU_THROTTLE_INCREMENT:
>  p.has_cpu_throttle_increment = true;
>  use_int_value = true;
> -break;

Hm?

>  case MIGRATION_PARAMETER_TLS_CREDS:
>  p.has_tls_creds = true;
>  p.tls_creds = (char *) valuestr;
> @@ -1386,6 +1388,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const 
> QDict *qdict)


Amit
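
The "Hm?" above appears to question the removed `break`. A small stand-alone sketch of what dropping a `break` does in a C switch; the enum, struct and function names are hypothetical and only mirror the shape of the quoted hunk.

```c
#include <stdbool.h>

/* Hypothetical parameter ids and result flags, for illustration only. */
enum Param { PARAM_THROTTLE_INCREMENT, PARAM_TLS_CREDS };

struct Parsed {
    bool has_throttle_increment;
    bool has_tls_creds;
    bool use_int_value;
};

/* with_break selects between the switch with and without the 'break'. */
static struct Parsed parse_param(enum Param p, bool with_break)
{
    struct Parsed out = { false, false, false };

    switch (p) {
    case PARAM_THROTTLE_INCREMENT:
        out.has_throttle_increment = true;
        out.use_int_value = true;
        if (with_break) {
            break;
        }
        /* Without 'break', control continues into the next case. */
        /* fall through */
    case PARAM_TLS_CREDS:
        out.has_tls_creds = true;
        break;
    }
    return out;
}
```

With `with_break` set to false, a throttle-increment request also marks `has_tls_creds`, which is the kind of side effect an accidental fall-through produces.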



Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 09/17] COLO: Load VMState into QIOChannelBuffer before restore it

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:10:05], zhanghailiang wrote:
> We should not destroy the state of SVM (Secondary VM) until we receive
> the complete data of PVM's state, in case the primary fails in the process
> of sending the state, so we cache the VM's state in secondary side before
> load it into SVM.
> 
> Besides, we should call qemu_system_reset() before load VM state,
> which can ensure the data is intact.
> 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Li Zhijian 
> Signed-off-by: Gonglei 
> Reviewed-by: Dr. David Alan Gilbert 
> Cc: Dr. David Alan Gilbert 

Reviewed-by: Amit Shah 


Amit
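
The idea of caching the incoming state and only then loading it can be sketched as follows. This uses plain memory buffers instead of QIOChannelBuffer, and `VmState` and `apply_checkpoint()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical VM state: a fixed blob standing in for device and RAM state. */
typedef struct {
    unsigned char data[64];
    size_t len;
} VmState;

/*
 * Stage 'received' bytes of an 'expected'-byte checkpoint in a scratch
 * buffer and overwrite 'vm' only when the whole checkpoint has arrived.
 * Returns true if the state was applied.
 */
static bool apply_checkpoint(VmState *vm, const unsigned char *wire,
                             size_t received, size_t expected)
{
    unsigned char *cache;

    if (expected > sizeof(vm->data) || received > expected) {
        return false;
    }
    cache = malloc(expected ? expected : 1);
    if (!cache) {
        return false;
    }
    memcpy(cache, wire, received);      /* cache what has arrived so far */
    if (received < expected) {
        free(cache);                    /* sender failed: keep old state */
        return false;
    }
    memcpy(vm->data, cache, expected);  /* complete: now replace the state */
    vm->len = expected;
    free(cache);
    return true;
}
```

If the primary dies mid-transfer, the secondary still holds its previous, consistent state and can take over from it.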



Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 08/17] COLO: Send PVM state to secondary side when do checkpoint

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:10:04], zhanghailiang wrote:
> VM checkpointing is to synchronize the state of PVM to SVM, just
> like migration does, we re-use save helpers to achieve migrating
> PVM's state to Secondary side.
> 
> COLO need to cache the data of VM's state in the secondary side before
> synchronize it to SVM. COLO need the size of the data to determine
> how much data should be read in the secondary side.
> So here, we can get the size of the data by saving it into I/O channel
> before send it to the secondary side.

BTW PVM and SVM and Primary and Secondary are used interchangeably and
inconsistently.  I'd prefer if you stuck with one usage.  I prefer
Primary and Secondary, but it doesn't matter what you choose.

Amit
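
A sketch of why saving the state into a buffer first helps: once the state is in memory its size is known, so it can be sent as a length header followed by the payload. The 8-byte big-endian header and the function names are assumptions for this example, not the COLO wire format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write a 64-bit big-endian length followed by the payload into 'out'.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t frame_state(const uint8_t *state, uint64_t len,
                          uint8_t *out, size_t out_cap)
{
    if (out_cap < 8 || len > out_cap - 8) {
        return 0;
    }
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(len >> (56 - 8 * i));
    }
    memcpy(out + 8, state, (size_t)len);
    return 8 + (size_t)len;
}

/* Read the length header so the receiver knows how much to read next. */
static uint64_t read_state_size(const uint8_t *in)
{
    uint64_t len = 0;

    for (int i = 0; i < 8; i++) {
        len = (len << 8) | in[i];
    }
    return len;
}
```

The receiver reads the header, allocates that many bytes, and reads exactly that much before touching the running VM.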



Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 08/17] COLO: Send PVM state to secondary side when do checkpoint

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:10:04], zhanghailiang wrote:
> VM checkpointing is to synchronize the state of PVM to SVM, just
> like migration does, we re-use save helpers to achieve migrating
> PVM's state to Secondary side.
> 
> COLO need to cache the data of VM's state in the secondary side before
> synchronize it to SVM. COLO need the size of the data to determine
> how much data should be read in the secondary side.
> So here, we can get the size of the data by saving it into I/O channel
> before send it to the secondary side.
> 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Gonglei 
> Signed-off-by: Li Zhijian 
> Reviewed-by: Dr. David Alan Gilbert 
> Cc: Dr. David Alan Gilbert 

Reviewed-by: Amit Shah 

Amit



Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 06/17] COLO: Introduce checkpointing protocol

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:10:02], zhanghailiang wrote:
> We need communications protocol of user-defined to control
> the checkpointing process.
> 
> The new checkpointing request is started by Primary VM,
> and the interactive process like below:
> 
> Checkpoint synchronizing points:
> 
>                       Primary               Secondary
>                                             initial work
> 'checkpoint-ready'    <-------------------- @
> 
> 'checkpoint-request'  @ -------------------->
>                                             Suspend (Only in hybrid mode)
> 'checkpoint-reply'    <-------------------- @
>                       Suspend state
> 'vmstate-send'        @ -------------------->
>                       Send state            Receive state
> 'vmstate-received'    <-------------------- @
>                       Release packets       Load state
> 'vmstate-load'        <-------------------- @
>                       Resume                Resume (Only in hybrid mode)
> 
>                       Start Comparing (Only in hybrid mode)
> NOTE:
>  1) '@' who sends the message
>  2) Every sync-point is synchronized by two sides with only
> one handshake(single direction) for low-latency.
> If more strict synchronization is required, a opposite direction
> sync-point should be added.
>  3) Since sync-points are single direction, the remote side may
> go forward a lot when this side just receives the sync-point.
>  4) For now, we only support 'periodic' checkpoint, for which
>the Secondary VM is not running, later we will support 'hybrid' mode.
> 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Li Zhijian 
> Signed-off-by: Gonglei 
> Cc: Eric Blake 
> Cc: Markus Armbruster 
> Cc: Dr. David Alan Gilbert 
> Reviewed-by: Dr. David Alan Gilbert 

Reviewed-by: Amit Shah 
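
The ordering in the diagram above can be written down as a small checker. The enum and function names are hypothetical; the sketch assumes 'checkpoint-ready' is sent once before the first round, so a round starts at 'checkpoint-request'.

```c
#include <stdbool.h>
#include <stddef.h>

/* Message names follow the diagram; the enum itself is hypothetical. */
typedef enum {
    MSG_CHECKPOINT_READY,
    MSG_CHECKPOINT_REQUEST,
    MSG_CHECKPOINT_REPLY,
    MSG_VMSTATE_SEND,
    MSG_VMSTATE_RECEIVED,
    MSG_VMSTATE_LOAD
} ColoMsg;

typedef enum { FROM_PRIMARY, FROM_SECONDARY } Sender;

/* Which side sends each message (the '@' column in the diagram). */
static Sender msg_sender(ColoMsg m)
{
    return (m == MSG_CHECKPOINT_REQUEST || m == MSG_VMSTATE_SEND)
           ? FROM_PRIMARY : FROM_SECONDARY;
}

/*
 * Check one checkpoint round: request, reply, vmstate-send,
 * vmstate-received, vmstate-load, in that order.
 */
static bool round_is_valid(const ColoMsg *msgs, size_t n)
{
    static const ColoMsg expected[] = {
        MSG_CHECKPOINT_REQUEST, MSG_CHECKPOINT_REPLY, MSG_VMSTATE_SEND,
        MSG_VMSTATE_RECEIVED, MSG_VMSTATE_LOAD
    };
    size_t want = sizeof(expected) / sizeof(expected[0]);

    if (n != want) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if (msgs[i] != expected[i]) {
            return false;
        }
    }
    return true;
}
```

Because each sync-point is a single one-way message, a checker like this only validates order; it says nothing about how far the remote side has run ahead.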

> +static int colo_do_checkpoint_transaction(MigrationState *s)
> +{
> +Error *local_err = NULL;
> +
> +colo_send_message(s->to_dst_file, COLO_MESSAGE_CHECKPOINT_REQUEST,
> +  &local_err);
> +if (local_err) {
> +goto out;
> +}
> +
> +colo_receive_check_message(s->rp_state.from_dst_file,
> +COLO_MESSAGE_CHECKPOINT_REPLY, &local_err);
> +if (local_err) {
> +goto out;
> +}
> +
> +/* TODO: suspend and save vm state to colo buffer */

I like how you've split up the patches - makes it easier to review.
Thanks for doing this!

> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -785,6 +785,31 @@
>  { 'command': 'migrate-start-postcopy' }
>  
>  ##
> +# @COLOMessage
> +#
> +# The message transmission between PVM and SVM

Can you expand PVM and SVM for the first use?  It's obvious to someone
who's familiar with COLO, but someone looking at the api may not know
what it all means.  Also, please expand COLO if not already done in
the qapi-schema file.


Amit



Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 05/17] COLO: Establish a new communicating path for COLO

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:10:01], zhanghailiang wrote:
> This new communication path will be used for returning messages
> from Secondary side to Primary side.
> 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Li Zhijian 
> Reviewed-by: Dr. David Alan Gilbert 

Reviewed-by: Amit Shah 

> @@ -63,8 +75,24 @@ void *colo_process_incoming_thread(void *opaque)
>  migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
>MIGRATION_STATUS_COLO);
>  
> +mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
> +if (!mis->to_src_file) {
> +error_report("COLO incoming thread: Open QEMUFile to_src_file 
> failed");
> +goto out;
> +}
> +/*
> + * Note: We set the fd to unblocked in migration incoming coroutine,
> + * But here we are in the COLO incoming thread, so it is ok to set the
> + * fd back to blocked.
> + */
> +qemu_file_set_blocking(mis->from_src_file, true);

Why does it need to be blocking?

Amit
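
For context on the blocking question: on POSIX the mode is the `O_NONBLOCK` file status flag, which can be toggled per descriptor. A minimal sketch with hypothetical helper names; it is not how `qemu_file_set_blocking()` is implemented.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Set or clear O_NONBLOCK on fd. Returns 0 on success, -1 on error. */
static int fd_set_blocking(int fd, bool blocking)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0) {
        return -1;
    }
    if (blocking) {
        flags &= ~O_NONBLOCK;
    } else {
        flags |= O_NONBLOCK;
    }
    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}

/* Report whether fd is currently in blocking mode (-1 on error). */
static int fd_is_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    return flags < 0 ? -1 : !(flags & O_NONBLOCK);
}
```

A dedicated thread can simply sleep in a blocking read, whereas code that shares a thread with other work usually cannot, which is presumably the distinction the quoted comment draws.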



Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 04/17] migration: Switch to COLO process after finishing loadvm

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:10:00], zhanghailiang wrote:
> Switch from normal migration loadvm process into COLO checkpoint process if
> COLO mode is enabled.
> 
> We add three new members to struct MigrationIncomingState,
> 'have_colo_incoming_thread' and 'colo_incoming_thread' record the COLO
> related thread for secondary VM, 'migration_incoming_co' records the
> original migration incoming coroutine.
> 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Li Zhijian 
> Reviewed-by: Dr. David Alan Gilbert 

(snip)

> +void migration_incoming_exit_colo(void)
> +{
> +colo_info.colo_requested = 0;

Please use 'true' and 'false' for bools.

Otherwise,

Reviewed-by: Amit Shah 



Amit



Re: [Qemu-devel] [PATCH 0/7] blockjobs: preliminary refactoring work, Pt 1

2016-10-25 Thread Jeff Cody
On Fri, Oct 14, 2016 at 02:32:55PM -0400, John Snow wrote:
> 
> 
> On 10/13/2016 06:56 PM, John Snow wrote:
> >This is a follow-up to patches 1-6 of:
> >[PATCH v2 00/11] blockjobs: Fix transactional race condition
> >
> >That series started trying to refactor blockjobs with the goal of
> >internalizing BlockJob state as a side effect of having gone through
> >the effort of figuring out which commands were "safe" to call on
> >a Job that has no coroutine object.
> >
> >I've split out the less contentious bits so I can move forward with my
> >original work of focusing on the transactional race condition in a
> >different series.
> >
> >Functionally the biggest difference here is the presence of "internal"
> >block jobs, which do not emit QMP events or show up in block query
> >requests. This is done for the sake of replication jobs, which should
> >not be interfering with the public jobs namespace.
> >
> 
> I have v2 ready to send out correcting Kevin's comments in patch #01, but
> I'd like to have the Replication maintainers at Fujitsu take a look at how I
> have modified replication and at least 'ACK' the change.
> 
> As a recap: I am creating "internal" block jobs that have no ID and
> therefore do not collide with the user-specified jobs namespace. This way
> users cannot query, cancel, pause, or otherwise accidentally interfere with
> the replication job lifetime.
> 
> It also means that management layers such as libvirt will not be aware of
> the presence of such "internal" jobs.
> 
> Relevant patches are 1-3. Please let me know if you have questions.
> 
> Thanks,
> --John Snow
>

Looks good to me, once you address Kevin's comments in patch 1.
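
As a rough illustration of the recap above: if an internal job is simply one without an ID, both queries and lookups by name skip it. The `Job` struct and the helper names below are hypothetical, not the blockjob API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical job record: internal jobs carry no user-visible ID. */
typedef struct Job {
    const char *id;      /* NULL marks an internal job */
    struct Job *next;
} Job;

static int job_is_internal(const Job *job)
{
    return job->id == NULL;
}

/* Count the jobs a management query would report, skipping internal ones. */
static size_t query_visible_jobs(const Job *head)
{
    size_t n = 0;

    for (const Job *j = head; j; j = j->next) {
        if (!job_is_internal(j)) {
            n++;
        }
    }
    return n;
}

/* Look up a job by ID; internal jobs can never match a user-supplied name. */
static const Job *find_job(const Job *head, const char *id)
{
    for (const Job *j = head; j; j = j->next) {
        if (j->id && strcmp(j->id, id) == 0) {
            return j;
        }
    }
    return NULL;
}
```

Since cancel, pause and resume commands all start from a lookup by name, a job without a name is out of reach of every one of them.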

> 
> >
> >
> >For convenience, this branch is available at:
> >https://github.com/jnsnow/qemu.git branch job-refactor-pt1
> >https://github.com/jnsnow/qemu/tree/job-refactor-pt1
> >
> >This version is tagged job-refactor-pt1-v1:
> >https://github.com/jnsnow/qemu/releases/tag/job-refactor-pt1-v1
> >
> >John Snow (7):
> >  blockjobs: hide internal jobs from management API
> >  blockjobs: Allow creating internal jobs
> >  Replication/Blockjobs: Create replication jobs as internal
> >  blockjob: centralize QMP event emissions
> >  Blockjobs: Internalize user_pause logic
> >  blockjobs: split interface into public/private, Part 1
> >  blockjobs: fix documentation
> >
> > block/backup.c   |   5 +-
> > block/commit.c   |  10 +-
> > block/mirror.c   |  28 +++--
> > block/replication.c  |  14 +--
> > block/stream.c   |   9 +-
> > block/trace-events   |   5 +-
> > blockdev.c   |  74 +
> > blockjob.c   | 109 ++
> > include/block/block.h|   3 +-
> > include/block/block_int.h|  26 ++---
> > include/block/blockjob.h | 257 
> > +++
> > include/block/blockjob_int.h | 232 ++
> > qemu-img.c   |   5 +-
> > tests/test-blockjob-txn.c|   5 +-
> > tests/test-blockjob.c|   4 +-
> > 15 files changed, 443 insertions(+), 343 deletions(-)
> > create mode 100644 include/block/blockjob_int.h
> >



Re: [Qemu-devel] [PATCH 5/7] Blockjobs: Internalize user_pause logic

2016-10-25 Thread Jeff Cody
On Thu, Oct 13, 2016 at 06:57:00PM -0400, John Snow wrote:
> BlockJobs will begin hiding their state in preparation for some
> refactorings anyway, so let's internalize the user_pause mechanism
> instead of leaving it to callers to correctly manage.
> 
> Signed-off-by: John Snow 
> ---
>  blockdev.c               | 12 +++++-------
>  blockjob.c               | 22 ++++++++++++++++++++--
>  include/block/blockjob.h | 26 ++++++++++++++++++++++++++
>  3 files changed, 51 insertions(+), 9 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 22a1280..1661d08 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3579,7 +3579,7 @@ void qmp_block_job_cancel(const char *device,
>  force = false;
>  }
>  
> -if (job->user_paused && !force) {
> +if (block_job_user_paused(job) && !force) {
>  error_setg(errp, "The block job for device '%s' is currently paused",
> device);
>  goto out;
> @@ -3596,13 +3596,12 @@ void qmp_block_job_pause(const char *device, Error 
> **errp)
>  AioContext *aio_context;
>  BlockJob *job = find_block_job(device, &aio_context, errp);
>  
> -if (!job || job->user_paused) {
> +if (!job || block_job_user_paused(job)) {
>  return;
>  }
>  
> -job->user_paused = true;
>  trace_qmp_block_job_pause(job);
> -block_job_pause(job);
> +block_job_user_pause(job);
>  aio_context_release(aio_context);
>  }
>  
> @@ -3611,14 +3610,13 @@ void qmp_block_job_resume(const char *device, Error 
> **errp)
>  AioContext *aio_context;
>  BlockJob *job = find_block_job(device, &aio_context, errp);
>  
> -if (!job || !job->user_paused) {
> +if (!job || !block_job_user_paused(job)) {
>  return;
>  }
>  
> -job->user_paused = false;
>  trace_qmp_block_job_resume(job);
>  block_job_iostatus_reset(job);
> -block_job_resume(job);
> +block_job_user_resume(job);
>  aio_context_release(aio_context);
>  }
>  
> diff --git a/blockjob.c b/blockjob.c
> index e32cb78..d118a1f 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -362,11 +362,22 @@ void block_job_pause(BlockJob *job)
>  job->pause_count++;
>  }
>  
> +void block_job_user_pause(BlockJob *job)
> +{
> +job->user_paused = true;
> +block_job_pause(job);
> +}
> +
>  static bool block_job_should_pause(BlockJob *job)
>  {
>  return job->pause_count > 0;
>  }
>  
> +bool block_job_user_paused(BlockJob *job)
> +{
> +return job ? job->user_paused : 0;
> +}
> +
>  void coroutine_fn block_job_pause_point(BlockJob *job)
>  {
>  if (!block_job_should_pause(job)) {
> @@ -403,6 +414,14 @@ void block_job_resume(BlockJob *job)
>  block_job_enter(job);
>  }
>  
> +void block_job_user_resume(BlockJob *job)
> +{
> +if (job && job->user_paused && job->pause_count > 0) {
> +job->user_paused = false;
> +block_job_resume(job);
> +}
> +}
> +
>  void block_job_enter(BlockJob *job)
>  {
>  if (job->co && !job->busy) {
> @@ -626,8 +645,7 @@ BlockErrorAction block_job_error_action(BlockJob *job, 
> BlockdevOnError on_err,
>  }
>  if (action == BLOCK_ERROR_ACTION_STOP) {
>  /* make the pause user visible, which will be resumed from QMP. */
> -job->user_paused = true;
> -block_job_pause(job);
> +block_job_user_pause(job);
>  block_job_iostatus_set_err(job, error);
>  }
>  return action;
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 928f0b8..5b61140 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -358,6 +358,23 @@ void coroutine_fn block_job_pause_point(BlockJob *job);
>  void block_job_pause(BlockJob *job);
>  
>  /**
> + * block_job_user_pause:
> + * @job: The job to be paused.
> + *
> + * Asynchronously pause the specified job.
> + * Do not allow a resume until a matching call to block_job_user_resume.
> + */
> +void block_job_user_pause(BlockJob *job);
> +
> +/**
> + * block_job_paused:
> + * @job: The job to query.
> + *
> + * Returns true if the job is user-paused.
> + */
> +bool block_job_user_paused(BlockJob *job);
> +
> +/**
>   * block_job_resume:
>   * @job: The job to be resumed.
>   *
> @@ -366,6 +383,15 @@ void block_job_pause(BlockJob *job);
>  void block_job_resume(BlockJob *job);
>  
>  /**
> + * block_job_user_resume:
> + * @job: The job to be resumed.
> + *
> + * Resume the specified job.
> + * Must be paired with a preceding block_job_user_pause.
> + */
> +void block_job_user_resume(BlockJob *job);
> +
> +/**
>   * block_job_enter:
>   * @job: The job to enter.
>   *
> -- 
> 2.7.4
> 

Reviewed-by: Jeff Cody 
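
The interaction between the pause counter and the user-pause flag in the patch above can be tried out in isolation. This sketch keeps only those two fields; `MiniJob` and the `job_*` names are hypothetical, and the plain resume here just decrements the counter instead of re-entering a coroutine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced job state: only the two fields used by pausing. */
typedef struct {
    int pause_count;    /* nested pause requests from any source */
    bool user_paused;   /* set only by a user-initiated pause */
} MiniJob;

static void job_pause(MiniJob *job)
{
    job->pause_count++;
}

static void job_resume(MiniJob *job)
{
    if (job->pause_count > 0) {
        job->pause_count--;
    }
}

/* A user pause is a normal pause plus a flag remembering who asked. */
static void job_user_pause(MiniJob *job)
{
    job->user_paused = true;
    job_pause(job);
}

static bool job_user_paused(const MiniJob *job)
{
    return job ? job->user_paused : false;
}

/* Only undo a pause that the user actually requested. */
static void job_user_resume(MiniJob *job)
{
    if (job && job->user_paused && job->pause_count > 0) {
        job->user_paused = false;
        job_resume(job);
    }
}
```

Keeping the flag behind helpers means a user resume can never release a pause that some internal caller still holds.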



Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 03/17] migration: Enter into COLO mode after migration if COLO is enabled

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:09:59], zhanghailiang wrote:
> Add a new migration state: MIGRATION_STATUS_COLO. Migration source side
> enters this state after the first live migration successfully finished
> if COLO is enabled by command 'migrate_set_capability x-colo on'.
> 
> We reuse migration thread, so the process of checkpointing will be handled
> in migration thread.
> 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Li Zhijian 
> Signed-off-by: Gonglei 
> Reviewed-by: Dr. David Alan Gilbert 

(snip)

> +static void colo_process_checkpoint(MigrationState *s)
> +{
> +qemu_mutex_lock_iothread();
> +vm_start();
> +qemu_mutex_unlock_iothread();
> +trace_colo_vm_state_change("stop", "run");
> +
> +/* TODO: COLO checkpoint savevm loop */
> +
> +migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
> +  MIGRATION_STATUS_COMPLETED);

Is this just a temporary thing that'll be removed in the next patches?
I guess so - because once you enter COLO state, you want to remain in
it, right?

I think the commit message implies that.  So the commit msg and the
code are not in sync.

(snip)

> diff --git a/migration/migration.c b/migration/migration.c
> index f7dd9c6..462007d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -695,6 +695,10 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>  
>  get_xbzrle_cache_stats(info);
>  break;
> +case MIGRATION_STATUS_COLO:
> +info->has_status = true;
> +/* TODO: display COLO specific information (checkpoint info etc.) */
> +break;

When do you plan to add this?  I guess it's important for debugging
and also to get the state of the system while colo is active.  What
info do you have planned to display here?


Amit



Re: [Qemu-devel] [PATCH 7/7] blockjobs: fix documentation

2016-10-25 Thread Jeff Cody
On Thu, Oct 13, 2016 at 06:57:02PM -0400, John Snow wrote:
> (Trivial)
> 
> Fix wrong function names in documentation.
> 
> Signed-off-by: John Snow 
> ---
>  include/block/blockjob_int.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index 8eced19..10ebb38 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -191,8 +191,8 @@ void coroutine_fn block_job_pause_point(BlockJob *job);
>  void block_job_enter(BlockJob *job);
>  
>  /**
> - * block_job_ready:
> - * @job: The job which is now ready to complete.
> + * block_job_event_ready:
> + * @job: The job which is now ready to be completed.
>   *
>   * Send a BLOCK_JOB_READY event for the specified job.
>   */
> -- 
> 2.7.4
> 

Reviewed-by: Jeff Cody 



Re: [Qemu-devel] [PATCH 6/7] blockjobs: split interface into public/private, Part 1

2016-10-25 Thread Jeff Cody
On Thu, Oct 13, 2016 at 06:57:01PM -0400, John Snow wrote:
> To make it a little more obvious which functions are intended to be
> public interface and which are intended to be for use only by jobs
> themselves, split the interface into "public" and "private" files.
> 
> Convert blockjobs (e.g. block/backup) to using the private interface.
> Leave blockdev and others on the public interface.
> 
> There are remaining uses of private state by qemu-img, and several
> cases in blockdev.c and block/io.c where we grab job->blk for the
> purposes of acquiring an AIOContext.
> 
> These will be corrected in future patches.
> 
> Signed-off-by: John Snow 
> ---
>  block/backup.c   |   2 +-
>  block/commit.c   |   2 +-
>  block/mirror.c   |   2 +-
>  block/stream.c   |   2 +-
>  blockjob.c   |   2 +-
>  include/block/block.h|   3 +-
>  include/block/blockjob.h | 205 +-
>  include/block/blockjob_int.h | 232 
> +++
>  tests/test-blockjob-txn.c|   2 +-
>  tests/test-blockjob.c|   2 +-
>  10 files changed, 244 insertions(+), 210 deletions(-)
>  create mode 100644 include/block/blockjob_int.h
> 
> diff --git a/block/backup.c b/block/backup.c
> index 6a60ca8..6d12100 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -16,7 +16,7 @@
>  #include "trace.h"
>  #include "block/block.h"
>  #include "block/block_int.h"
> -#include "block/blockjob.h"
> +#include "block/blockjob_int.h"
>  #include "block/block_backup.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/qerror.h"
> diff --git a/block/commit.c b/block/commit.c
> index 475a375..d555600 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -15,7 +15,7 @@
>  #include "qemu/osdep.h"
>  #include "trace.h"
>  #include "block/block_int.h"
> -#include "block/blockjob.h"
> +#include "block/blockjob_int.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qemu/ratelimit.h"
> diff --git a/block/mirror.c b/block/mirror.c
> index 4374fb4..c81b5e0 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -13,7 +13,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "trace.h"
> -#include "block/blockjob.h"
> +#include "block/blockjob_int.h"
>  #include "block/block_int.h"
>  #include "sysemu/block-backend.h"
>  #include "qapi/error.h"
> diff --git a/block/stream.c b/block/stream.c
> index 7d6877d..906f7f3 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -14,7 +14,7 @@
>  #include "qemu/osdep.h"
>  #include "trace.h"
>  #include "block/block_int.h"
> -#include "block/blockjob.h"
> +#include "block/blockjob_int.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qemu/ratelimit.h"
> diff --git a/blockjob.c b/blockjob.c
> index d118a1f..e6f0d97 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -27,7 +27,7 @@
>  #include "qemu-common.h"
>  #include "trace.h"
>  #include "block/block.h"
> -#include "block/blockjob.h"
> +#include "block/blockjob_int.h"
>  #include "block/block_int.h"
>  #include "sysemu/block-backend.h"
>  #include "qapi/qmp/qerror.h"
> diff --git a/include/block/block.h b/include/block/block.h
> index 107c603..89b5feb 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -7,16 +7,15 @@
>  #include "qemu/coroutine.h"
>  #include "block/accounting.h"
>  #include "block/dirty-bitmap.h"
> +#include "block/blockjob.h"
>  #include "qapi/qmp/qobject.h"
>  #include "qapi-types.h"
>  #include "qemu/hbitmap.h"
>  
>  /* block.c */
>  typedef struct BlockDriver BlockDriver;
> -typedef struct BlockJob BlockJob;
>  typedef struct BdrvChild BdrvChild;
>  typedef struct BdrvChildRole BdrvChildRole;
> -typedef struct BlockJobTxn BlockJobTxn;
>  
>  typedef struct BlockDriverInfo {
>  /* in bytes, 0 if irrelevant */
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 5b61140..bfc8233 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -28,78 +28,15 @@
>  
>  #include "block/block.h"
>  
> -/**
> - * BlockJobDriver:
> - *
> - * A class type for block job driver.
> - */
> -typedef struct BlockJobDriver {
> -/** Derived BlockJob struct size */
> -size_t instance_size;
> -
> -/** String describing the operation, part of query-block-jobs QMP API */
> -BlockJobType job_type;
> -
> -/** Optional callback for job types that support setting a speed limit */
> -void (*set_speed)(BlockJob *job, int64_t speed, Error **errp);
> -
> -/** Optional callback for job types that need to forward I/O status 
> reset */
> -void (*iostatus_reset)(BlockJob *job);
> -
> -/**
> - * Optional callback for job types whose completion must be triggered
> - * manually.
> - */
> -void (*complete)(BlockJob *job, Error **errp);
> -
> -/**
> - * If the callback is not NULL, it will be invoked when all the jobs
> - * belonging to the same transaction 

Re: [Qemu-devel] [PATCH 4/7] blockjob: centralize QMP event emissions

2016-10-25 Thread Jeff Cody
On Thu, Oct 13, 2016 at 06:56:59PM -0400, John Snow wrote:
> There's no reason to leave this to blockdev; we can do it in blockjobs
> directly and get rid of an extra callback for most users.
> 
> All non-internal events, even those created outside of QMP, will
> consistently emit events.
> 
> Signed-off-by: John Snow 
> ---
>  block/commit.c|  8 
>  block/mirror.c|  6 ++
>  block/stream.c|  7 +++
>  block/trace-events|  5 ++---
>  blockdev.c| 42 --
>  blockjob.c| 23 +++
>  include/block/block_int.h | 17 -
>  include/block/blockjob.h  | 17 -
>  8 files changed, 42 insertions(+), 83 deletions(-)
> 
> diff --git a/block/commit.c b/block/commit.c
> index f29e341..475a375 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -209,8 +209,8 @@ static const BlockJobDriver commit_job_driver = {
>  
>  void commit_start(const char *job_id, BlockDriverState *bs,
>BlockDriverState *base, BlockDriverState *top, int64_t 
> speed,
> -  BlockdevOnError on_error, BlockCompletionFunc *cb,
> -  void *opaque, const char *backing_file_str, Error **errp)
> +  BlockdevOnError on_error, const char *backing_file_str,
> +  Error **errp)
>  {
>  CommitBlockJob *s;
>  BlockReopenQueue *reopen_queue = NULL;
> @@ -233,7 +233,7 @@ void commit_start(const char *job_id, BlockDriverState 
> *bs,
>  }
>  
>  s = block_job_create(job_id, &commit_job_driver, bs, speed,
> - BLOCK_JOB_DEFAULT, cb, opaque, errp);
> + BLOCK_JOB_DEFAULT, NULL, NULL, errp);
>  if (!s) {
>  return;
>  }
> @@ -276,7 +276,7 @@ void commit_start(const char *job_id, BlockDriverState 
> *bs,
>  s->on_error = on_error;
>  s->common.co = qemu_coroutine_create(commit_run, s);
>  
> -trace_commit_start(bs, base, top, s, s->common.co, opaque);
> +trace_commit_start(bs, base, top, s, s->common.co);
>  qemu_coroutine_enter(s->common.co);
>  }
>  
> diff --git a/block/mirror.c b/block/mirror.c
> index 15d2d10..4374fb4 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -979,9 +979,7 @@ void mirror_start(const char *job_id, BlockDriverState 
> *bs,
>MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
>BlockdevOnError on_source_error,
>BlockdevOnError on_target_error,
> -  bool unmap,
> -  BlockCompletionFunc *cb,
> -  void *opaque, Error **errp)
> +  bool unmap, Error **errp)
>  {
>  bool is_none_mode;
>  BlockDriverState *base;
> @@ -994,7 +992,7 @@ void mirror_start(const char *job_id, BlockDriverState 
> *bs,
>  base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
>  mirror_start_job(job_id, bs, BLOCK_JOB_DEFAULT, target, replaces,
>   speed, granularity, buf_size, backing_mode,
> - on_source_error, on_target_error, unmap, cb, opaque, 
> errp,
> + on_source_error, on_target_error, unmap, NULL, NULL, 
> errp,
>   &mirror_job_driver, is_none_mode, base, false);
>  }
>  
> diff --git a/block/stream.c b/block/stream.c
> index eeb6f52..7d6877d 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -216,13 +216,12 @@ static const BlockJobDriver stream_job_driver = {
>  
>  void stream_start(const char *job_id, BlockDriverState *bs,
>BlockDriverState *base, const char *backing_file_str,
> -  int64_t speed, BlockdevOnError on_error,
> -  BlockCompletionFunc *cb, void *opaque, Error **errp)
> +  int64_t speed, BlockdevOnError on_error, Error **errp)
>  {
>  StreamBlockJob *s;
>  
>  s = block_job_create(job_id, &stream_job_driver, bs, speed,
> - BLOCK_JOB_DEFAULT, cb, opaque, errp);
> + BLOCK_JOB_DEFAULT, NULL, NULL, errp);
>  if (!s) {
>  return;
>  }
> @@ -232,6 +231,6 @@ void stream_start(const char *job_id, BlockDriverState 
> *bs,
>  
>  s->on_error = on_error;
>  s->common.co = qemu_coroutine_create(stream_run, s);
> -trace_stream_start(bs, base, s, s->common.co, opaque);
> +trace_stream_start(bs, base, s, s->common.co);
>  qemu_coroutine_enter(s->common.co);
>  }
> diff --git a/block/trace-events b/block/trace-events
> index 05fa13c..c12f91b 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -20,11 +20,11 @@ bdrv_co_do_copy_on_readv(void *bs, int64_t offset, 
> unsigned int bytes, int64_t c
>  
>  # block/stream.c
>  stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int 
> is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
> -stream_start(void *bs, void *base, void *s, 

Re: [Qemu-devel] [PATCH 3/7] Replication/Blockjobs: Create replication jobs as internal

2016-10-25 Thread Jeff Cody
On Thu, Oct 13, 2016 at 06:56:58PM -0400, John Snow wrote:
> Bubble up the internal interface to commit and backup jobs, then switch
> replication tasks over to using this methodology.
> 
> Signed-off-by: John Snow 
> ---
>  block/backup.c|  3 ++-
>  block/mirror.c| 21 ++---
>  block/replication.c   | 14 +++---
>  blockdev.c| 11 +++
>  include/block/block_int.h |  9 +++--
>  qemu-img.c|  5 +++--
>  6 files changed, 36 insertions(+), 27 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 5acb5c4..6a60ca8 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -527,6 +527,7 @@ void backup_start(const char *job_id, BlockDriverState 
> *bs,
>bool compress,
>BlockdevOnError on_source_error,
>BlockdevOnError on_target_error,
> +  int creation_flags,
>BlockCompletionFunc *cb, void *opaque,
>BlockJobTxn *txn, Error **errp)
>  {
> @@ -596,7 +597,7 @@ void backup_start(const char *job_id, BlockDriverState 
> *bs,
>  }
>  
>  job = block_job_create(job_id, &backup_job_driver, bs, speed,
> -   BLOCK_JOB_DEFAULT, cb, opaque, errp);
> +   creation_flags, cb, opaque, errp);
>  if (!job) {
>  goto error;
>  }
> diff --git a/block/mirror.c b/block/mirror.c
> index 74c03ae..15d2d10 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -906,9 +906,9 @@ static const BlockJobDriver commit_active_job_driver = {
>  };
>  
>  static void mirror_start_job(const char *job_id, BlockDriverState *bs,
> - BlockDriverState *target, const char *replaces,
> - int64_t speed, uint32_t granularity,
> - int64_t buf_size,
> + int creation_flags, BlockDriverState *target,
> + const char *replaces, int64_t speed,
> + uint32_t granularity, int64_t buf_size,
>   BlockMirrorBackingMode backing_mode,
>   BlockdevOnError on_source_error,
>   BlockdevOnError on_target_error,
> @@ -936,8 +936,8 @@ static void mirror_start_job(const char *job_id, 
> BlockDriverState *bs,
>  buf_size = DEFAULT_MIRROR_BUF_SIZE;
>  }
>  
> -s = block_job_create(job_id, driver, bs, speed,
> - BLOCK_JOB_DEFAULT, cb, opaque, errp);
> +s = block_job_create(job_id, driver, bs, speed, creation_flags,
> + cb, opaque, errp);
>  if (!s) {
>  return;
>  }
> @@ -992,17 +992,16 @@ void mirror_start(const char *job_id, BlockDriverState 
> *bs,
>  }
>  is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
>  base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
> -mirror_start_job(job_id, bs, target, replaces,
> +mirror_start_job(job_id, bs, BLOCK_JOB_DEFAULT, target, replaces,
>   speed, granularity, buf_size, backing_mode,
>   on_source_error, on_target_error, unmap, cb, opaque, 
> errp,
>   &mirror_job_driver, is_none_mode, base, false);
>  }
>  
>  void commit_active_start(const char *job_id, BlockDriverState *bs,
> - BlockDriverState *base, int64_t speed,
> - BlockdevOnError on_error,
> - BlockCompletionFunc *cb,
> - void *opaque, Error **errp,
> + BlockDriverState *base, int creation_flags,
> + int64_t speed, BlockdevOnError on_error,
> + BlockCompletionFunc *cb, void *opaque, Error **errp,
>   bool auto_complete)
>  {
>  int64_t length, base_length;
> @@ -1041,7 +1040,7 @@ void commit_active_start(const char *job_id, 
> BlockDriverState *bs,
>  }
>  }
>  
> -mirror_start_job(job_id, bs, base, NULL, speed, 0, 0,
> +mirror_start_job(job_id, bs, creation_flags, base, NULL, speed, 0, 0,
>   MIRROR_LEAVE_BACKING_CHAIN,
>   on_error, on_error, false, cb, opaque, &local_err,
>   &commit_active_job_driver, false, base, auto_complete);
> diff --git a/block/replication.c b/block/replication.c
> index 3bd1cf1..d4f4a7b 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -496,10 +496,11 @@ static void replication_start(ReplicationState *rs, 
> ReplicationMode mode,
>  bdrv_op_block_all(top_bs, s->blocker);
>  bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
>  
> -backup_start("replication-backup", s->secondary_disk->bs,
> - s->hidden_disk->bs, 0, MIRROR_SYNC_MODE_NONE, NULL, 
> false,
> +backup_start(NULL, s->secondary_disk->bs, 

Re: [Qemu-devel] [PATCH 2/7] blockjobs: Allow creating internal jobs

2016-10-25 Thread Jeff Cody
On Thu, Oct 13, 2016 at 06:56:57PM -0400, John Snow wrote:
> Add the ability to create jobs without an ID.
> 
> Signed-off-by: John Snow 
> ---
>  block/backup.c|  2 +-
>  block/commit.c|  2 +-
>  block/mirror.c|  3 ++-
>  block/stream.c|  2 +-
>  blockjob.c| 25 -
>  include/block/blockjob.h  |  7 ++-
>  tests/test-blockjob-txn.c |  3 ++-
>  tests/test-blockjob.c |  2 +-
>  8 files changed, 30 insertions(+), 16 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 582bd0f..5acb5c4 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -596,7 +596,7 @@ void backup_start(const char *job_id, BlockDriverState 
> *bs,
>  }
>  
>  job = block_job_create(job_id, &backup_job_driver, bs, speed,
> -   cb, opaque, errp);
> +   BLOCK_JOB_DEFAULT, cb, opaque, errp);
>  if (!job) {
>  goto error;
>  }
> diff --git a/block/commit.c b/block/commit.c
> index 9f67a8b..f29e341 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -233,7 +233,7 @@ void commit_start(const char *job_id, BlockDriverState 
> *bs,
>  }
>  
>  s = block_job_create(job_id, &commit_job_driver, bs, speed,
> - cb, opaque, errp);
> + BLOCK_JOB_DEFAULT, cb, opaque, errp);
>  if (!s) {
>  return;
>  }
> diff --git a/block/mirror.c b/block/mirror.c
> index f9d1fec..74c03ae 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -936,7 +936,8 @@ static void mirror_start_job(const char *job_id, 
> BlockDriverState *bs,
>  buf_size = DEFAULT_MIRROR_BUF_SIZE;
>  }
>  
> -s = block_job_create(job_id, driver, bs, speed, cb, opaque, errp);
> +s = block_job_create(job_id, driver, bs, speed,
> + BLOCK_JOB_DEFAULT, cb, opaque, errp);
>  if (!s) {
>  return;
>  }
> diff --git a/block/stream.c b/block/stream.c
> index 3187481..eeb6f52 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -222,7 +222,7 @@ void stream_start(const char *job_id, BlockDriverState 
> *bs,
>  StreamBlockJob *s;
>  
>  s = block_job_create(job_id, &stream_job_driver, bs, speed,
> - cb, opaque, errp);
> + BLOCK_JOB_DEFAULT, cb, opaque, errp);
>  if (!s) {
>  return;
>  }
> diff --git a/blockjob.c b/blockjob.c
> index e78ad94..017905a 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -118,7 +118,7 @@ static void block_job_detach_aio_context(void *opaque)
>  }
>  
>  void *block_job_create(const char *job_id, const BlockJobDriver *driver,
> -   BlockDriverState *bs, int64_t speed,
> +   BlockDriverState *bs, int64_t speed, int flags,
> BlockCompletionFunc *cb, void *opaque, Error **errp)
>  {
>  BlockBackend *blk;
> @@ -130,7 +130,7 @@ void *block_job_create(const char *job_id, const 
> BlockJobDriver *driver,
>  return NULL;
>  }
>  
> -if (job_id == NULL) {
> +if (job_id == NULL && !(flags & BLOCK_JOB_INTERNAL)) {
>  job_id = bdrv_get_device_name(bs);
>  if (!*job_id) {
>  error_setg(errp, "An explicit job ID is required for this node");
> @@ -138,14 +138,21 @@ void *block_job_create(const char *job_id, const 
> BlockJobDriver *driver,
>  }
>  }
>  
> -if (!id_wellformed(job_id)) {
> -error_setg(errp, "Invalid job ID '%s'", job_id);
> -return NULL;
> -}
> +if (job_id) {
> +if (flags & BLOCK_JOB_INTERNAL) {
> +error_setg(errp, "Cannot specify job ID for internal block job");
> +return NULL;
> +}
>  
> -if (block_job_get(job_id)) {
> -error_setg(errp, "Job ID '%s' already in use", job_id);
> -return NULL;
> +if (!id_wellformed(job_id)) {
> +error_setg(errp, "Invalid job ID '%s'", job_id);
> +return NULL;
> +}
> +
> +if (block_job_get(job_id)) {
> +error_setg(errp, "Job ID '%s' already in use", job_id);
> +return NULL;
> +}
>  }
>  
>  blk = blk_new();
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 6ecfa2e..fdb31e0 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -200,6 +200,11 @@ struct BlockJob {
>  QLIST_ENTRY(BlockJob) txn_list;
>  };
>  
> +typedef enum BlockJobCreateFlags {
> +BLOCK_JOB_DEFAULT = 0x00,
> +BLOCK_JOB_INTERNAL = 0x01,
> +} BlockJobCreateFlags;
> +
>  /**
>   * block_job_next:
>   * @job: A block job, or %NULL.
> @@ -242,7 +247,7 @@ BlockJob *block_job_get(const char *id);
>   * called from a wrapper that is specific to the job type.
>   */
>  void *block_job_create(const char *job_id, const BlockJobDriver *driver,
> -   BlockDriverState *bs, int64_t speed,
> +   BlockDriverState 

Re: [Qemu-devel] [PATCH 3/4] target-ppc: add vrldnmi and vrlwmi instructions

2016-10-25 Thread Nikunj A Dadhania
Richard Henderson  writes:

> On 10/24/2016 11:02 PM, Nikunj A Dadhania wrote:
>> Richard Henderson  writes:
>> 
>> 
>>> We already have rol32 and rol64.
>>>
>>> Which I see are broken for shift == 0.
>> 
>> I tried with different shift (including 0) in a test program, and the
>> result is as expected:
>> 
>> 0: ccddeeff
>> 
>> static inline unsigned int rol32(unsigned int word, unsigned int shift)
>> {
>>   return (word << shift) | (word >> (32 - shift));
>> }
>
> Technically, a shift by 32 is invalid.  Practically, there are two common
> cases: shift >= 32 produces zero and shift is truncated to the word size, both
> of which produce the correct results here.
>
> That said, there's also the case of clang's sanitizers, which will in fact
> signal this as a runtime error.

In that case, I will send a patch updating them as part of my next revision.

Regards
Nikunj
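
For reference, a rotate that stays well defined for every shift count
could look like the sketch below. It only illustrates the concern raised
above and is not the patch that was later sent; the name rol32_safe is
hypothetical, and reducing the count modulo 32 is an assumption about
the intended semantics.

```c
#include <stdint.h>

/*
 * Rotate a 32-bit word left without ever shifting by the full word
 * width. Both shift counts are masked to the range 0..31, so a count
 * of 0 (or 32) never produces the "word >> 32" that the C standard
 * leaves undefined.
 */
static inline uint32_t rol32_safe(uint32_t word, unsigned int shift)
{
    shift &= 31;
    return (word << shift) | (word >> (-shift & 31));
}
```

With shift == 0 both halves reduce to the unshifted word, so the result
is the input itself and a sanitizer has nothing to flag.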




Re: [Qemu-devel] [PATCH v1 2/3] target-ppc: add vrldnmi and vrlwmi instructions

2016-10-25 Thread Nikunj A Dadhania
Richard Henderson  writes:

> On 10/24/2016 11:19 PM, Nikunj A Dadhania wrote:
>> +begin = extract##size(src2, size - begin_last - 1, num_bits); \
>> +end = extract##size(src2, size - end_last - 1, num_bits); \
>> +shift = extract##size(src2, size - shift_last - 1, num_bits); \
>
> What I mean is
>
>   shift = extract##size(src2, 0, 6);
>   end = extract##size(src2, 8, 6);
>   begin = extract##size(src2, 16, 6);
>
> The values are at the *same* position for both instructions.  There's no need
> to parameterize with silly bigendian numberings.

Ah.. ok. You are right.

Regards
Nikunj
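
The fixed field positions from the review comment can be sketched in
plain C as follows. This is a standalone illustration, not the code from
the patch: field64() is a hypothetical stand-in for QEMU's extract64(),
and the struct and function names are made up for the example. The
offsets and widths (bits 0, 8 and 16, six bits each) are taken from the
suggestion quoted above.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for extract64(): return 'length' bits of
 * 'value' starting at bit 'start', where bit 0 is the least
 * significant bit. Assumes 0 < length < 64.
 */
static inline uint64_t field64(uint64_t value, unsigned start, unsigned length)
{
    return (value >> start) & ((UINT64_C(1) << length) - 1);
}

/* Illustrative container for the three control fields. */
struct rot_mask_fields {
    unsigned shift;
    unsigned end;
    unsigned begin;
};

/* Decode the fields at the same little-endian bit positions each time. */
static inline struct rot_mask_fields decode_fields(uint64_t src2)
{
    struct rot_mask_fields f;

    f.shift = (unsigned)field64(src2, 0, 6);
    f.end = (unsigned)field64(src2, 8, 6);
    f.begin = (unsigned)field64(src2, 16, 6);
    return f;
}
```

Because the positions are counted from bit 0, nothing depends on the
element size, which is the point of dropping the big-endian
parameterization.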




Re: [Qemu-devel] [PATCH v1] block/vxhs: Add Veritas HyperScale VxHS block device support

2016-10-25 Thread Jeff Cody
On Tue, Oct 25, 2016 at 03:02:07PM -0700, Ashish Mittal wrote:
> This patch adds support for a new block device type called "vxhs".
> Source code for the library that this code loads can be downloaded from:
> https://github.com/MittalAshish/libqnio.git
>

I grabbed the latest of libqnio, compiled it (had to disable -Werror), and
tried it out.  I was able to do a qemu-img info on a raw file, but it would
just hang when trying a format such as qcow2.  I am assuming
this is a limitation of test_server, and not libqnio.

This will make qemu-iotests more difficult however.

I haven't looked at the latest qnio code yet (other than compiling the
test-server to test), so the rest of this review is on the qemu driver.

> Sample command line using JSON syntax:
> ./qemu-system-x86_64 -name instance-0008 -S -vnc 0.0.0.0:0 -k en-us -vga 
> cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg 
> timestamp=on 
> 'json:{"driver":"vxhs","vdisk_id":"{c3e9095a-a5ee-4dce-afeb-2a59fb387410}","server":[{"host":"172.172.17.4","port":""}]}'
> 
> Sample command line using URI syntax:
> qemu-img convert -f raw -O raw -n 
> /var/lib/nova/instances/_base/0c5eacd5ebea5ed914b6a3e7b18f1ce734c386ad 
> vxhs://192.168.0.1:/%7Bc6718f6b-0401-441d-a8c3-1f0064d75ee0%7D
> 
> Signed-off-by: Ashish Mittal 
> ---
>  block/Makefile.objs |   2 +
>  block/trace-events  |  22 ++
>  block/vxhs.c| 736 
> 
>  configure   |  41 +++
>  4 files changed, 801 insertions(+)
>  create mode 100644 block/vxhs.c

I think this version still does not address Daniel's concerns regarding a
QAPI schema for vxhs.

We also still need qemu-iotests, and a test-server suitable to run
the tests.

> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 67a036a..58313a2 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -18,6 +18,7 @@ block-obj-$(CONFIG_LIBNFS) += nfs.o
>  block-obj-$(CONFIG_CURL) += curl.o
>  block-obj-$(CONFIG_RBD) += rbd.o
>  block-obj-$(CONFIG_GLUSTERFS) += gluster.o
> +block-obj-$(CONFIG_VXHS) += vxhs.o
>  block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
>  block-obj-$(CONFIG_LIBSSH2) += ssh.o
>  block-obj-y += accounting.o dirty-bitmap.o
> @@ -38,6 +39,7 @@ rbd.o-cflags   := $(RBD_CFLAGS)
>  rbd.o-libs := $(RBD_LIBS)
>  gluster.o-cflags   := $(GLUSTERFS_CFLAGS)
>  gluster.o-libs := $(GLUSTERFS_LIBS)
> +vxhs.o-libs:= $(VXHS_LIBS)
>  ssh.o-cflags   := $(LIBSSH2_CFLAGS)
>  ssh.o-libs := $(LIBSSH2_LIBS)
>  archipelago.o-libs := $(ARCHIPELAGO_LIBS)
> diff --git a/block/trace-events b/block/trace-events
> index 05fa13c..aea97cb 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -114,3 +114,25 @@ qed_aio_write_data(void *s, void *acb, int ret, uint64_t 
> offset, size_t len) "s
>  qed_aio_write_prefill(void *s, void *acb, uint64_t start, size_t len, 
> uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
>  qed_aio_write_postfill(void *s, void *acb, uint64_t start, size_t len, 
> uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
>  qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) 
> "s %p acb %p ret %d offset %"PRIu64" len %zu"
> +
> +# block/vxhs.c
> +vxhs_iio_callback(int error, int reason) "ctx is NULL: error %d, reason %d"
> +vxhs_setup_qnio(void *s) "Context to HyperScale IO manager = %p"
> +vxhs_iio_callback_chnfail(int err, int error) "QNIO channel failed, no i/o 
> %d, %d"
> +vxhs_iio_callback_unknwn(int opcode, int err) "unexpected opcode %d, errno 
> %d"
> +vxhs_open_fail(int ret) "Could not open the device. Error = %d"
> +vxhs_open_epipe(int ret) "Could not create a pipe for device. Bailing out. 
> Error=%d"
> +vxhs_aio_rw_invalid(int req) "Invalid I/O request iodir %d"
> +vxhs_aio_rw_ioerr(char *guid, int iodir, uint64_t size, uint64_t off, void 
> *acb, int ret, int err) "IO ERROR (vDisk %s) FOR : Read/Write = %d size = %lu 
> offset = %lu ACB = %p. Error = %d, errno = %d"
> +vxhs_get_vdisk_stat_err(char *guid, int ret, int err) "vDisk (%s) stat ioctl 
> failed, ret = %d, errno = %d"
> +vxhs_get_vdisk_stat(char *vdisk_guid, uint64_t vdisk_size) "vDisk %s stat 
> ioctl returned size %lu"
> +vxhs_qnio_iio_open(const char *ip) "Failed to connect to storage agent on 
> host-ip %s"
> +vxhs_qnio_iio_devopen(const char *fname) "Failed to open vdisk device: %s"
> +vxhs_complete_aio(void *acb, uint64_t ret) "aio failed acb %p ret %ld"
> +vxhs_parse_uri_filename(const char *filename) "URI passed via 
> bdrv_parse_filename %s"
> +vxhs_qemu_init_vdisk(const char *vdisk_id) "vdisk_id from json %s"
> +vxhs_qemu_init_numservers(int num_servers) "Number of servers passed = %d"
> +vxhs_parse_uri_hostinfo(int num, char *host, int port) "Host %d: IP %s, Port 
> %d"
> +vxhs_qemu_init(char *of_vsa_addr, int port) "Adding host %s:%d to 
> BDRVVXHSState"
> +vxhs_qemu_init_filename(const char 

Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 02/17] COLO: migrate COLO related info to secondary node

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:09:58], zhanghailiang wrote:
> We can determine whether or not VM in destination should go into COLO mode
> by referring to the info that was migrated.
> 
> We skip this section if COLO is not enabled (i.e.
> migrate_set_capability colo off), so that it doesn't break migration
> compatibility, regardless of whether users configure
> --enable-colo/disable-colo
> on the source/destination side.
> 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Li Zhijian 
> Signed-off-by: Gonglei 
> Reviewed-by: Dr. David Alan Gilbert 

Reviewed-by: Amit Shah 

Amit



Re: [Qemu-devel] [PATCH COLO-Frame (Base) v21 01/17] migration: Introduce capability 'x-colo' to migration

2016-10-25 Thread Amit Shah
On (Tue) 18 Oct 2016 [20:09:57], zhanghailiang wrote:
> We add a helper function colo_supported() to indicate whether
> COLO is supported; it controls whether the 'x-colo' string is
> shown to users. They can use the qmp command
> 'query-migrate-capabilities' or the hmp command 'info migrate_capabilities'
> to learn whether COLO is supported.
> 
> The default value for COLO is disabled.
> 
> Cc: Juan Quintela 
> Cc: Amit Shah 
> Cc: Eric Blake 
> Cc: Markus Armbruster 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Li Zhijian 
> Signed-off-by: Gonglei 
> Reviewed-by: Eric Blake 

Reviewed-by: Amit Shah 

Amit



Re: [Qemu-devel] [PATCHv5 00/12] Cleanups to qtest PCI handling

2016-10-25 Thread David Gibson
On Tue, Oct 25, 2016 at 03:14:00PM +0200, Greg Kurz wrote:
> On Tue, 25 Oct 2016 14:35:52 +1100
> David Gibson  wrote:
> 
> > On Mon, Oct 24, 2016 at 03:59:49PM +1100, David Gibson wrote:
> > > This series contains a number of cleanups to the libqos code for
> > > accessing PCI devices, and to tests which use it.
> > > 
> > > The general aim is to improve the consistency of semantics across
> > > functions, and reduce the amount of intimate knowledge of the libqos
> > > PCI layer needed by tests.
> > > 
> > > This should make it easier to write PCI tests which will be portable
> > > to different guest machines with different PCI host bridge
> > > arrangements.
> > > 
> > > This series is on top of my ppc-for-2.8 branch, since it contains
> > > patches enabling the virtio tests on ppc, which would otherwise
> > > conflict with the changes here.  
> > 
> > Greg, Alexey, Michael,
> > 
> > Some reviews from outside RH would be really welcome.
> > 
> 
> Done.
> 
> I also took time to run 'make check' with all targets on ppc64le, ppc64be,
> i686 and ppc32 hosts.
> 
> Everything passes for ppc64le, ppc64be and i686.
> 
> It fails on ppc32 but this seems to be a TCG issue (QEMU fails with SIGILL
> in code_gen_buffer()), not related to this patchset.
> 
> Tested-by: Greg Kurz 

Thanks for the reviews; I've now merged this into ppc-for-2.8.

> 
> > > 
> > > Changes since v4:
> > >   * Fixed some remaining abstraction breaks in ahci-test
> > >   * Removed QPCI_BAR_INVALID, turned out not to really be useful
> > > 
> > > Changes since v3:
> > >   * Fixed another endian bug introduced in ide-test
> > > 
> > > Changes since v2:
> > >   * Fixed build bugs in virtio-9p-test, which I didn't find earlier
> > > due to not having the right libraries installed
> > >   * Fixed an endian bug I accidentally introduced in ide-test
> > >   * Better handling of invalid BAR tokens
> > > 
> > > Changes since v1:
> > >   * Split out updates to tco-test into separate patch
> > >   * Split out updates to ide-test into separate patch
> > >   * Neater and more general handling of legacy PIO addresses
> > >   * Removed now-redundant fields from platform specific bus structures
> > >   * Introduced CONFIG_BASE() macro to virtio-pci to remove many
> > > similar assignments
> > >   * Fixed handling of two guest testcases in ivshmem
> > >   * Added 64-bit accessors
> > >   * Rebase on ppc-for-2.8 to avoid conflict with Laurent's series in
> > > the same area
> > > 
> > > David Gibson (12):
> > >   libqos: Give qvirtio_config_read*() consistent semantics
> > >   libqos: Handle PCI IO de-multiplexing in common code
> > >   libqos: Move BAR assignment to common code
> > >   libqos: Better handling of PCI legacy IO
> > >   tests: Adjust tco-test to use qpci_legacy_iomap()
> > >   libqos: Add streaming accessors for PCI MMIO
> > >   libqos: Implement mmio accessors in terms of mem{read,write}
> > >   tests: Clean up IO handling in ide-test
> > >   libqos: Add 64-bit PCI IO accessors
> > >   tests: Use qpci_mem{read,write} in ivshmem-test
> > >   tests: Don't assume structure of PCI IO base in ahci-test
> > >   libqos: Change PCI accessors to take opaque BAR handle
> > > 
> > >  tests/ahci-test.c  |  13 +--
> > >  tests/e1000e-test.c|   7 +-
> > >  tests/ide-test.c   | 177 
> > > +++--
> > >  tests/ivshmem-test.c   |  47 +++
> > >  tests/libqos/ahci.c|   4 +-
> > >  tests/libqos/ahci.h|   7 +-
> > >  tests/libqos/pci-pc.c  | 187 
> > > ++-
> > >  tests/libqos/pci-spapr.c   | 194 
> > > -
> > >  tests/libqos/pci.c | 194 
> > > +
> > >  tests/libqos/pci.h |  66 ++-
> > >  tests/libqos/usb.c |   6 +-
> > >  tests/libqos/usb.h |   2 +-
> > >  tests/libqos/virtio-mmio.c |  16 ++--
> > >  tests/libqos/virtio-pci.c  | 122 ++--
> > >  tests/libqos/virtio-pci.h  |   2 +-
> > >  tests/rtl8139-test.c   |  10 +--
> > >  tests/tco-test.c   |  80 +--
> > >  tests/usb-hcd-ehci-test.c  |   5 +-
> > >  tests/virtio-9p-test.c |   8 +-
> > >  tests/virtio-blk-test.c|  42 +++---
> > >  tests/virtio-scsi-test.c   |   4 +-
> > >  21 files changed, 598 insertions(+), 595 deletions(-)
> > >   
> > 
> 



-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCHv5 07/12] libqos: Implement mmio accessors in terms of mem{read, write}

2016-10-25 Thread David Gibson
On Wed, Oct 26, 2016 at 12:18:03PM +1100, Alexey Kardashevskiy wrote:
> On 25/10/16 23:16, David Gibson wrote:
> > On Tue, Oct 25, 2016 at 05:47:43PM +1100, Alexey Kardashevskiy wrote:
> >> On 24/10/16 15:59, David Gibson wrote:
> >>> In the libqos PCI code we now have accessors both for registers (byte
> >>> significance preserving) and for streaming data (byte address order
> >>> preserving).  These exist in both the interface for qtest drivers and in
> >>> the machine specific backends.
> >>>
> >>> However, the register-style accessors aren't actually necessary in the
> >>> backend.  They can be implemented in terms of the byte address order
> >>> preserving accessors by the libqos wrappers.  This works because PCI is
> >>> always little endian.
> >>>
> >>> This does assume that the back end byte address order preserving accessors
> >>> will perform the equivalent of a single bus transaction for short lengths.
> >>> This is the case, and in fact they currently end up using the same
> >>> cpu_physical_memory_rw() implementation within the qtest accelerator.
> >>>
> >>> Signed-off-by: David Gibson 
> >>> Reviewed-by: Laurent Vivier 
> >>> Reviewed-by: Greg Kurz 
> >>> ---
> >>>  tests/libqos/pci-pc.c| 38 --
> >>>  tests/libqos/pci-spapr.c | 44 
> >>> 
> >>>  tests/libqos/pci.c   | 20 ++--
> >>>  tests/libqos/pci.h   |  8 
> >>>  4 files changed, 14 insertions(+), 96 deletions(-)
> >>>
> >>
> >> [...]
> >>
> >>> diff --git a/tests/libqos/pci.h b/tests/libqos/pci.h
> >>> index 2b08362..ce6ed08 100644
> >>> --- a/tests/libqos/pci.h
> >>> +++ b/tests/libqos/pci.h
> >>> @@ -27,18 +27,10 @@ struct QPCIBus {
> >>>  uint16_t (*pio_readw)(QPCIBus *bus, uint32_t addr);
> >>>  uint32_t (*pio_readl)(QPCIBus *bus, uint32_t addr);
> >>>  
> >>> -uint8_t (*mmio_readb)(QPCIBus *bus, uint32_t addr);
> >>> -uint16_t (*mmio_readw)(QPCIBus *bus, uint32_t addr);
> >>> -uint32_t (*mmio_readl)(QPCIBus *bus, uint32_t addr);
> >>> -
> >>>  void (*pio_writeb)(QPCIBus *bus, uint32_t addr, uint8_t value);
> >>>  void (*pio_writew)(QPCIBus *bus, uint32_t addr, uint16_t value);
> >>>  void (*pio_writel)(QPCIBus *bus, uint32_t addr, uint32_t value);
> >>>  
> >>> -void (*mmio_writeb)(QPCIBus *bus, uint32_t addr, uint8_t value);
> >>> -void (*mmio_writew)(QPCIBus *bus, uint32_t addr, uint16_t value);
> >>> -void (*mmio_writel)(QPCIBus *bus, uint32_t addr, uint32_t value);
> >>> -
> >>>  void (*memread)(QPCIBus *bus, uint32_t addr, void *buf, size_t len);
> >>>  void (*memwrite)(QPCIBus *bus, uint32_t addr, const void *buf, 
> >>> size_t len);
> >>>  
> >>>
> >>
> >> You added them in "libqos: Handle PCI IO de-multiplexing in common code"
> >> (a few patches before) and are removing them now - if you moved this patch
> >> earlier, it would reduce the series, or what am I missing?
> > 
> > Well, it can't go before the PIO / MMIO split, because on x86 the PIO
> > part is implemented with inw/outw instead of readw/writew, and those
> > don't have a memread/memwrite equivalent.
> > 
> > The change could go at the same time, but my feeling was that logical
> > separation of the steps was worth a bit of temporary extra code.
> 
> It is a bit hard to follow the logic of the patchset when you do not know
> if the new code is going to stay or not - I automatically assumed it is
> staying and when I saw it is being removed - I wondered if you are removing
> what you just added, and this - in my opinion - kills the idea of making
> smaller patches to make review easier, better just squash them all... But
> since Greg is happy and things do not seem to work any worse (make check fails on
> my setup but whatever), you can ignore me :)

Well, I guess it's a trade-off between conceptual simplicity and
minimal code changes.  Putting those callbacks in temporarily means
more code change, but I think it's worth it to make each patch
conceptually simpler.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH 1/2] exec.c: do not truncate non-empty memory backend file

2016-10-25 Thread Haozhong Zhang

On 10/25/16 17:30 -0200, Eduardo Habkost wrote:

On Mon, Oct 24, 2016 at 05:21:50PM +0800, Haozhong Zhang wrote:

For '-object memory-backend-file,mem-path=foo,size=xyz', if the size of
file 'foo' does not match the given size 'xyz', the current QEMU will
truncate the file to the given size, which may corrupt the existing data
in that file. To avoid such data corruption, this patch disables
truncating non-empty backend files.

Signed-off-by: Haozhong Zhang 
---
 exec.c | 37 -
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index e63c5a1..95983c9 100644
--- a/exec.c
+++ b/exec.c
@@ -1188,6 +1188,15 @@ void qemu_mutex_unlock_ramlist(void)
 }

 #ifdef __linux__
+static int64_t get_file_size(int fd)
+{
+int64_t size = lseek(fd, 0, SEEK_END);
+if (size < 0) {
+return -errno;
+}
+return size;
+}
+
 static void *file_ram_alloc(RAMBlock *block,
 ram_addr_t memory,
 const char *path,
@@ -1199,6 +1208,7 @@ static void *file_ram_alloc(RAMBlock *block,
 char *c;
 void *area = MAP_FAILED;
 int fd = -1;
+int64_t file_size;

 if (kvm_enabled() && !kvm_has_sync_mmu()) {
 error_setg(errp,
@@ -1256,6 +1266,14 @@ static void *file_ram_alloc(RAMBlock *block,
 block->page_size = qemu_fd_getpagesize(fd);
 block->mr->align = MAX(block->page_size, QEMU_VMALLOC_ALIGN);

+file_size = get_file_size(fd);
+if (file_size < 0) {
+error_setg_errno(errp, file_size,
+ "can't get size of backing store %s",
+ path);


What about block devices or filesystems where lseek(SEEK_END) is
not supported? They work today, and would break with this patch.

I suggest just continuing without any errors (and not truncating
the file) in case it is not possible to get the file size.



If getting the file size fails, I'd fall back to the 'size' option. If
it is non-zero, QEMU will use it and will not truncate the file. If it
is zero, QEMU will fail with an error message like "cannot get file
size, 'size' option should be provided".
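
A rough sketch of that fallback, as I understand the plan so far. The
enum and the function name plan_backing_file() are hypothetical and
exist only for this illustration; a negative file_size stands for
"size could not be determined". The separate size-mismatch check
discussed further down is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical outcome of inspecting the backing file. */
enum backing_action {
    BACKING_ERROR,    /* no usable size: report an error */
    BACKING_KEEP,     /* map the file as it is, never truncate */
    BACKING_TRUNCATE, /* empty file: resize it to the 'size' option */
};

/*
 * file_size < 0 : the size could not be obtained (e.g. lseek failed)
 * file_size == 0: the backing file is empty
 * size_opt      : value of the 'size' option, 0 if it was not given
 */
static enum backing_action plan_backing_file(int64_t file_size,
                                             uint64_t size_opt)
{
    if (file_size < 0) {
        /* Unknown size: trust 'size' if present, but do not truncate. */
        return size_opt ? BACKING_KEEP : BACKING_ERROR;
    }
    if (file_size > 0) {
        /* Non-empty file: leave the existing data alone. */
        return BACKING_KEEP;
    }
    /* Empty file: a non-zero 'size' is required to create the backing. */
    return size_opt ? BACKING_TRUNCATE : BACKING_ERROR;
}
```

Truncation only ever happens on the empty-file path, so existing data in
the file cannot be lost.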


+goto error;
+}
+
 if (memory < block->page_size) {
 error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
"or larger than page size 0x%zx",
@@ -1266,12 +1284,29 @@ static void *file_ram_alloc(RAMBlock *block,
 memory = ROUND_UP(memory, block->page_size);

 /*
+ * Do not extend/shrink the backend file if it's not empty, or its
+ * size does not match the aligned 'size=xxx' option. Otherwise,
+ * it is possible to corrupt the existing data in the file.
+ *
+ * Disabling shrinking is not enough. For example, the current
+ * vNVDIMM implementation stores the guest NVDIMM labels at the
+ * end of the backend file. If the backend file is later extended,
+ * QEMU will not be able to find those labels. Therefore,
+ * extending the non-empty backend file is disabled as well.
+ */
+if (file_size && file_size != memory) {
+error_setg(errp, "backing store %s size %"PRId64
+   " does not math with aligned 'size' option %"PRIu64,


Did you mean "specified 'size' option"?



I meant aligned, because the original 'size' option might have been
aligned to a value larger than file_size at this point. I'll use
'specified' in the next version.


+   path, file_size, memory);


We already support size being smaller than the backing file and
people may rely on it, so we shouldn't change this behavior. This
can be changed to:
   if (file_size > 0 && file_size < memory)



will change


I also suggest doing this check in a separate patch. The two
changes (skipping truncation of non-empty files and exiting on
size mismatch) don't depend on each other.



will do

Thanks,
Haozhong


+goto error;
+}
+/*
  * ftruncate is not supported by hugetlbfs in older
  * hosts, so don't bother bailing out on errors.
  * If anything goes wrong with it under other filesystems,
  * mmap will fail.
  */
-if (ftruncate(fd, memory)) {
+if (!file_size && ftruncate(fd, memory)) {
 perror("ftruncate");
 }

--
2.10.1



--
Eduardo




Re: [Qemu-devel] [PATCHv5 08/12] tests: Clean up IO handling in ide-test

2016-10-25 Thread David Gibson
On Wed, Oct 26, 2016 at 12:57:26PM +1100, Alexey Kardashevskiy wrote:
> On 25/10/16 23:25, David Gibson wrote:
> > On Tue, Oct 25, 2016 at 06:01:41PM +1100, Alexey Kardashevskiy wrote:
> >> On 24/10/16 15:59, David Gibson wrote:
> >>> ide-test uses many explicit inb() / outb() operations for its IO, which
> >>> means it's not portable to non-x86 platforms.  This cleans it up to use
> >>> the libqos PCI accessors instead.
> >>>
> >>> Signed-off-by: David Gibson 
> > [snip]
> > 
> >>> -static void send_scsi_cdb_read10(uint64_t lba, int nblocks)
> >>> +static void send_scsi_cdb_read10(QPCIDevice *dev, void *ide_base,
> >>> + uint64_t lba, int nblocks)
> >>>  {
> >>>  Read10CDB pkt = { .padding = 0 };
> >>>  int i;
> >>> @@ -670,7 +717,8 @@ static void send_scsi_cdb_read10(uint64_t lba, int 
> >>> nblocks)
> >>>  
> >>>  /* Send Packet */
> >>>  for (i = 0; i < sizeof(Read10CDB)/2; i++) {
> >>> -outw(IDE_BASE + reg_data, cpu_to_le16(((uint16_t *)&pkt)[i]));
> >>> +qpci_io_writew(dev, ide_base + reg_data,
> >>> +   le16_to_cpu(((uint16_t *)&pkt)[i]));
> >>
> >>
> >> cpu_to_le16 -> le16_to_cpu conversion here and below (at the very end) is
> >> not obvious. Right above this chunk the @pkt fields are initialized as BE:
> >>
> >>  /* Construct SCSI CDB packet */
> >>  pkt.opcode = 0x28;
> >>  pkt.lba = cpu_to_be32(lba);
> >>  pkt.nblocks = cpu_to_be16(nblocks);
> >>
> >> outw() seems to be CPU-endian, and qpci_io_writew() as well, or not?
> > 
> > outw() is guest CPU endian (which is stupid, but that's another
> > matter).  qpci_io_writew() is different - it is always LE, because PCI
> > devices are always LE (well, ok, nearly always).
> > 
> > So, yes, this is a bit confusing.  Here's what's going on:
> >   * the SCSI standard uses BE, so that's what we put into the
> > packet structure
> >   * We need to transfer the packet to the device as a bytestream - so
> > no endianness conversions
> >   * But.. we do so in 16-bit chunks
> >   * .. and qpci_io_writew() is designed to take CPU values and write
> > them out as LE - ie, it contains an implicit cpu_to_le16()
> 
> dev->bus->pio_writew() calls outw() which calls qtest_outw() and
> qtest_sendf() where @value is a text - where does this implicit
> cpu_to_le16() happen? Or I am reading the code wrong?

You're looking at the PC specific backend, which knows that the target
endianness is LE, and so target_to_le16() is a NOP.  The translation
from host to guest endianness happens down inside the outw logic.
qtest.c calls outw, which calls stw_p, which is defined to do the swap
for the target endianness in include/exec/cpu-all.h

If you look at the spapr backend, you'll see that the PIO callbacks
have an unconditional byteswap in them.  The spapr backend is ppc
specific which is notionally BE, so it always needs a swap in order to
get LE writes.

> The other branch (for MMIO) in qpci_io_writew() calls cpu_to_le16() 
> explicitly.
> 
> I'd expect a function with a generic name as qpci_io_writew() to always
> take data in some known and always the same endianness (LE in this case
> as it is PCI).

It does.  It's just that the means of accomplishing that is a bit
convoluted for the PIO case.  That's exactly why I think the base
in/out operations should also be fixed endianness, rather than guest
endian, but that's an argument I'm having elsewhere.

> In the chunk above we convert host-CPU-endian @lba to BE then treat it as
> LE when converting to CPU-endian and then expect qpci_io_writew() to do
> swapping again (or not - depends on BAR type - IO vs. MMIO - or conversion
> always happens?), this confuses me a lot. However, everybody else is happy
> so am I :)

You need to think of this in two different parts.  Building the buffer
as a bytestream, which includes BE components.  Then sending the
buffer to the hardware as a bytestream.  This has balanced le<->cpu
conversions in order to preserve bytestream order.

Remember that endianness is a property of a value - something that has a
specific length and location, not of a bytestream or bus of itself.
The fields in the request are BE, hence the BE conversions.  The data
*register* which we write stuff out to is treated as LE, hence the LE
conversions.
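
The two-part view above can be modelled in a few lines. This is only a
sketch in Python, not the ide-test code: the packet layout is a simplified
12-byte READ(10) packet, and build_read10_cdb(), le16_to_cpu(), writew_le()
and send_packet() are hypothetical helpers written for this illustration.
It assumes, as described above, that the register write always puts the
16-bit value on the bus as LE.

```python
import struct


def build_read10_cdb(lba, nblocks):
    # Part 1: build the buffer as a bytestream with BE fields.
    # opcode, flags, LBA (BE32), reserved, block count (BE16), control,
    # then two pad bytes, 12 bytes in total (simplified layout).
    return struct.pack(">BBIBHB2x", 0x28, 0, lba, 0, nblocks, 0)


def bswap16(value):
    return ((value & 0xFF) << 8) | (value >> 8)


def le16_to_cpu(value, host):
    # No-op on an LE host ("<"), byte swap on a BE host (">").
    return value if host == "<" else bswap16(value)


def writew_le(value):
    # Model of a register write that always emits the value as LE,
    # i.e. it contains the implicit cpu_to_le16().
    return struct.pack("<H", value)


def send_packet(buf, host):
    # Part 2: send the buffer in 16-bit chunks.  Each chunk is loaded in
    # host byte order, passed through le16_to_cpu(), then written as LE.
    wire = bytearray()
    for i in range(len(buf) // 2):
        raw = struct.unpack_from(host + "H", buf, 2 * i)[0]
        wire += writew_le(le16_to_cpu(raw, host))
    return bytes(wire)


pkt = build_read10_cdb(0x12345678, 1)
for host in ("<", ">"):
    # The balanced conversions cancel out, so the byte order is preserved.
    print(host, send_packet(pkt, host) == pkt)
```

On either model host the bytes that reach the device are the bytes of the
buffer, which is the point of pairing le16_to_cpu() with an LE write.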

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives

2016-10-25 Thread David Gibson
On Mon, Oct 24, 2016 at 12:04:33PM -0700, Richard Henderson wrote:
> Pinging target maintainers.  If I don't get responses by the end of the
> week, I'll only push the generic tcg bits and the two targets that I
> maintain.

Sorry, missed this first time around.  The ppc host side looked ok to
me, but the ppc target side didn't look quite right.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




[Qemu-devel] [PATCH v2] trace: Fix 'char **' compilation error in simple backend

2016-10-25 Thread Fam Zheng
Currently, the generated function body will do "strlen(arg)" but the
argument could be 'char **' or 'char * const *'. Avoid that by excluding
such cases in is_string check.

Reported by patchew's "make docker-test-mingw@fedora".

Suggested-by: Eric Blake 
Signed-off-by: Fam Zheng 

---

v2: Fix typo in commit message and "yeah we'll be counting stars".  [Eric]
---
 scripts/tracetool/backend/simple.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/tracetool/backend/simple.py 
b/scripts/tracetool/backend/simple.py
index 9885e83..85f6102 100644
--- a/scripts/tracetool/backend/simple.py
+++ b/scripts/tracetool/backend/simple.py
@@ -21,7 +21,8 @@ PUBLIC = True
 
 def is_string(arg):
 strtype = ('const char*', 'char*', 'const char *', 'char *')
-if arg.lstrip().startswith(strtype):
+arg_strip = arg.lstrip()
+if arg_strip.startswith(strtype) and arg_strip.count('*') == 1:
 return True
 else:
 return False
-- 
2.7.4




Re: [Qemu-devel] [PATCH] trace: Fix 'char **' compilation error in simple backend

2016-10-25 Thread Fam Zheng
On Tue, 10/25 21:56, Eric Blake wrote:
> On 10/25/2016 09:46 PM, Fam Zheng wrote:
> > On Tue, 10/25 21:29, Eric Blake wrote:
> >> On 10/25/2016 08:59 PM, Fam Zheng wrote:
> >>> Currently, the generated function body will do "strlen(arg)" but the
> >>> argument could be 'char **'. Avoid that by exclusding such cases in
> >>
> 
> >>>  def is_string(arg):
> >>>  strtype = ('const char*', 'char*', 'const char *', 'char *')
> >>> -if arg.lstrip().startswith(strtype):
> >>> +non_strtype = ('const char**', 'char**', 'const char **', 'char **')
> >>> +arg_strip = arg.lstrip()
> >>> +if arg_strip.startswith(strtype) and not 
> >>> arg_strip.startswith(non_strtype):
> >>
> >> There may be a more compact way to write it, but I'm not enough of a
> >> python expert to know offhand what else to suggest (it's not as simple
> >> as string concatenation of strtype + '*', since strtype is a tuple
> >> rather than a string).
> > 
> > Did you mean
> > 
> > non_strtype = tuple(x + '*' for x in strtype)
> 
> Hmm, I guess that would work.
> 
> Or, what about a different approach, something like:
>   if arg_strip.startswith(strtype) and no_multiple_star(arg_strip):
> for some sane definition of no_multiple_star() that checks that there is
> exactly one '*' in a string.  In C, I'd check roughly:
>   p = strchr(str, '*');
>   if (p && !strchr(p + 1, '*')) {
> // treat str as string
>   }
> but again, I'm not enough of an expert to pop that out late at night,
> even if python has an easy one-liner way to express that.

That's indeed a nicer approach:

if arg_strip.startswith(strtype) and arg_strip.count("*") == 1:

Do you want a respin with your suggested-by? :-)

Fam
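
For comparison, the two variants discussed in this thread can be put side
by side. This is a standalone sketch, not the code in simple.py: the
function names is_string_prefix() and is_string_count() are made up for the
illustration, and the sample type strings are assumptions about what
tracetool might pass in.

```python
STRTYPE = ('const char*', 'char*', 'const char *', 'char *')


def is_string_prefix(arg):
    # First version: exclude the known 'char **' spellings by prefix.
    non_strtype = tuple(x + '*' for x in STRTYPE)
    arg_strip = arg.lstrip()
    return (arg_strip.startswith(STRTYPE)
            and not arg_strip.startswith(non_strtype))


def is_string_count(arg):
    # Second version: a string type has exactly one '*'.
    arg_strip = arg.lstrip()
    return arg_strip.startswith(STRTYPE) and arg_strip.count('*') == 1


for sample in ('const char *', 'char **', 'const char *const *'):
    print(repr(sample), is_string_prefix(sample), is_string_count(sample))
```

Both variants reject 'char **', but only the count-based one also rejects
'const char *const *', because the second '*' is no longer adjacent to the
first.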

> 
> 
> > But personally I'd stick to the flatten version in this specific case for
> > a bit more readability.
> 
> Indeed, and that's why I gave R-b as-is, even if it fails when there are
> multiple 'const' qualifiers in a string with multiple '*' :)
> 
> -- 
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 





Re: [Qemu-devel] [Qemu-ppc] [PATCH 14/15] target-ppc: Use tcg_gen_extract_*

2016-10-25 Thread David Gibson
On Sat, Oct 15, 2016 at 08:37:49PM -0700, Richard Henderson wrote:
> Use the new primitives for RLWINM and RLDICL.
> 
> Cc: qemu-...@nongnu.org
> Signed-off-by: Richard Henderson 
> ---
>  target-ppc/translate.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index bfc1301..724d95c 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -1977,9 +1977,8 @@ static void gen_rlwinm(DisasContext *ctx)
>  if (mb == 0 && me == (31 - sh)) {
>  tcg_gen_shli_tl(t_ra, t_rs, sh);
>  tcg_gen_ext32u_tl(t_ra, t_ra);
> -} else if (sh != 0 && me == 31 && sh == (32 - mb)) {
> -tcg_gen_ext32u_tl(t_ra, t_rs);
> -tcg_gen_shri_tl(t_ra, t_ra, mb);
> +} else if (me == 31 && (me - mb + 1) + sh <= 32) {

I'm having trouble figuring out what the second part of this condition
is supposed to be checking for, and it seems like it's too
restrictive.

For example, everything except the LSB of a word would be:
rlwinm rT,rA,31,1,31
which would fail the test, but it should be fine to implement that
with an extract op.
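
The example can be checked with a small model. This is a sketch in Python,
assuming the usual PowerPC semantics for rlwinm (rotate the low 32 bits
left by sh, then AND with MASK(mb, me), where bit 0 is the most significant
bit).  The helpers rotl32(), mask32(), rlwinm(), extract_field() and
patch_condition() are written for this illustration and are not QEMU or
TCG functions.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, sh):
    sh &= 31
    return ((x << sh) | (x >> (32 - sh))) & MASK32


def mask32(mb, me):
    # IBM bit numbering: bit 0 is the MSB, bit 31 the LSB.
    hi = MASK32 >> mb
    lo = (MASK32 << (31 - me)) & MASK32
    # mb > me selects a wrap-around mask.
    return hi & lo if mb <= me else hi | lo


def rlwinm(rs, sh, mb, me):
    return rotl32(rs & MASK32, sh) & mask32(mb, me)


def extract_field(x, pos, length):
    # 'length' bits of x starting at bit 'pos', counted from the LSB.
    return (x >> pos) & ((1 << length) - 1)


def patch_condition(sh, mb, me):
    # The condition as written in the quoted hunk.
    return me == 31 and (me - mb + 1) + sh <= 32


x = 0xDEADBEEF
# "Everything except the LSB of a word" is a 31-bit field at position 1.
print(hex(rlwinm(x, 31, 1, 31)), hex(extract_field(x, 1, 31)))
print(patch_condition(31, 1, 31))
```

The two values agree, so the instruction is expressible as a field
extract, while the quoted condition evaluates to false for these operands
(31 + 31 > 32).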

> +tcg_gen_extract_tl(t_ra, t_rs, sh, me - mb + 1);
>  } else {
>  target_ulong mask;
>  #if defined(TARGET_PPC64)
> @@ -2094,8 +2093,8 @@ static void gen_rldinm(DisasContext *ctx, int mb, int 
> me, int sh)
>  
>  if (sh != 0 && mb == 0 && me == (63 - sh)) {
>  tcg_gen_shli_tl(t_ra, t_rs, sh);
> -} else if (sh != 0 && me == 63 && sh == (64 - mb)) {
> -tcg_gen_shri_tl(t_ra, t_rs, mb);
> +} else if (me == 63 && (me - mb + 1) + sh <= 64) {
> +tcg_gen_extract_tl(t_ra, t_rs, sh, me - mb + 1);
>  } else {
>  tcg_gen_rotli_tl(t_ra, t_rs, sh);
>  tcg_gen_andi_tl(t_ra, t_ra, MASK(mb, me));

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH 08/15] tcg/ppc: Implement field extraction opcodes

2016-10-25 Thread David Gibson
On Sat, Oct 15, 2016 at 08:37:43PM -0700, Richard Henderson wrote:
> Signed-off-by: Richard Henderson 

Reviewed-by: David Gibson 

> ---
>  tcg/ppc/tcg-target.h |  4 ++--
>  tcg/ppc/tcg-target.inc.c | 10 ++
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index c765d3e..b42c57a 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -69,7 +69,7 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i32 1
>  #define TCG_TARGET_HAS_nor_i32  1
>  #define TCG_TARGET_HAS_deposit_i32  1
> -#define TCG_TARGET_HAS_extract_i32  0
> +#define TCG_TARGET_HAS_extract_i32  1
>  #define TCG_TARGET_HAS_sextract_i32 0
>  #define TCG_TARGET_HAS_movcond_i32  1
>  #define TCG_TARGET_HAS_mulu2_i320
> @@ -102,7 +102,7 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i64 1
>  #define TCG_TARGET_HAS_nor_i64  1
>  #define TCG_TARGET_HAS_deposit_i64  1
> -#define TCG_TARGET_HAS_extract_i64  0
> +#define TCG_TARGET_HAS_extract_i64  1
>  #define TCG_TARGET_HAS_sextract_i64 0
>  #define TCG_TARGET_HAS_movcond_i64  1
>  #define TCG_TARGET_HAS_add2_i64 1
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index a3262cf..7ec54a2 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -2396,6 +2396,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, 
> const TCGArg *args,
>  }
>  break;
>  
> +case INDEX_op_extract_i32:
> +tcg_out_rlw(s, RLWINM, args[0], args[1],
> +32 - args[2], 32 - args[3], 31);
> +break;
> +case INDEX_op_extract_i64:
> +tcg_out_rld(s, RLDICL, args[0], args[1], 64 - args[2], 64 - args[3]);
> +break;
> +
>  case INDEX_op_movcond_i32:
>  tcg_out_movcond(s, TCG_TYPE_I32, args[5], args[0], args[1], args[2],
>  args[3], args[4], const_args[2]);
> @@ -2530,6 +2538,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
>  { INDEX_op_movcond_i32, { "r", "r", "ri", "rZ", "rZ" } },
>  
>  { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
> +{ INDEX_op_extract_i32, { "r", "r" } },
>  
>  { INDEX_op_muluh_i32, { "r", "r", "r" } },
>  { INDEX_op_mulsh_i32, { "r", "r", "r" } },
> @@ -2585,6 +2594,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
>  { INDEX_op_movcond_i64, { "r", "r", "ri", "rZ", "rZ" } },
>  
>  { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
> +{ INDEX_op_extract_i64, { "r", "r" } },
>  
>  { INDEX_op_mulsh_i64, { "r", "r", "r" } },
>  { INDEX_op_muluh_i64, { "r", "r", "r" } },

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH] trace: Fix 'char **' compilation error in simple backend

2016-10-25 Thread Eric Blake
On 10/25/2016 09:46 PM, Fam Zheng wrote:
> On Tue, 10/25 21:29, Eric Blake wrote:
>> On 10/25/2016 08:59 PM, Fam Zheng wrote:
>>> Currently, the generated function body will do "strlen(arg)" but the
>>> argument could be 'char **'. Avoid that by exclusding such cases in
>>

>>>  def is_string(arg):
>>>  strtype = ('const char*', 'char*', 'const char *', 'char *')
>>> -if arg.lstrip().startswith(strtype):
>>> +non_strtype = ('const char**', 'char**', 'const char **', 'char **')
>>> +arg_strip = arg.lstrip()
>>> +if arg_strip.startswith(strtype) and not 
>>> arg_strip.startswith(non_strtype):
>>
>> There may be a more compact way to write it, but I'm not enough of a
>> python expert to know offhand what else to suggest (it's not as simple
>> as string concatenation of strtype + '*', since strtype is a tuple
>> rather than a string).
> 
> Did you mean
> 
> non_strtype = tuple(x + '*' for x in strtype)

Hmm, I guess that would work.

Or, what about a different approach, something like:
  if arg_strip.startswith(strtype) and no_multiple_star(arg_strip):
for some sane definition of no_multiple_star() that checks that there is
exactly one '*' in a string.  In C, I'd check roughly:
  p = strchr(str, '*');
  if (p && !strchr(p + 1, '*')) {
// treat str as string
  }
but again, I'm not enough of an expert to pop that out late at night,
even if python has an easy one-liner way to express that.


> But personally I'd stick to the flatten version in this specific case for
> a bit more readability.

Indeed, and that's why I gave R-b as-is, even if it fails when there are
multiple 'const' qualifiers in a string with multiple '*' :)

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH] trace: Fix 'char **' compilation error in simple backend

2016-10-25 Thread Fam Zheng
On Tue, 10/25 21:29, Eric Blake wrote:
> On 10/25/2016 08:59 PM, Fam Zheng wrote:
> > Currently, the generated function body will do "strlen(arg)" but the
> > argument could be 'char **'. Avoid that by exclusding such cases in
> 
> s/exclusding/excluding/

Yes, I blame the insomnia last night. @.@

I assume this can be fixed when applying.

> 
> > is_string check.
> > 
> > Reported by patchew's "make docker-test-mingw@fedora".
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  scripts/tracetool/backend/simple.py | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/scripts/tracetool/backend/simple.py 
> > b/scripts/tracetool/backend/simple.py
> > index 9885e83..2538795 100644
> > --- a/scripts/tracetool/backend/simple.py
> > +++ b/scripts/tracetool/backend/simple.py
> > @@ -21,7 +21,9 @@ PUBLIC = True
> >  
> >  def is_string(arg):
> >  strtype = ('const char*', 'char*', 'const char *', 'char *')
> > -if arg.lstrip().startswith(strtype):
> > +non_strtype = ('const char**', 'char**', 'const char **', 'char **')
> > +arg_strip = arg.lstrip()
> > +if arg_strip.startswith(strtype) and not 
> > arg_strip.startswith(non_strtype):
> 
> There may be a more compact way to write it, but I'm not enough of a
> python expert to know offhand what else to suggest (it's not as simple
> as string concatenation of strtype + '*', since strtype is a tuple
> rather than a string).

Did you mean

non_strtype = tuple(x + '*' for x in strtype)

?

But personally I'd stick to the flatten version in this specific case for
a bit more readability.

Thanks!

Fam



[Qemu-devel] [PULL 7/9] net: vmxnet: initialise local tx descriptor

2016-10-25 Thread Jason Wang
From: Li Qiang 

In Vmxnet3 device emulator while processing transmit(tx) queue,
when it reaches end of packet, it calls vmxnet3_complete_packet.
In that local 'txcq_descr' object is not initialised, which could
leak host memory bytes to a guest.

Reported-by: Li Qiang 
Signed-off-by: Prasad J Pandit 
Reviewed-by: Dmitry Fleytman 
Signed-off-by: Jason Wang 
---
 hw/net/vmxnet3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 90f6943..92f6af9 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -531,6 +531,7 @@ static void vmxnet3_complete_packet(VMXNET3State *s, int 
qidx, uint32_t tx_ridx)
 
 VMXNET3_RING_DUMP(VMW_RIPRN, "TXC", qidx, &s->txq_descr[qidx].comp_ring);
 
+memset(&txcq_descr, 0, sizeof(txcq_descr));
 txcq_descr.txdIdx = tx_ridx;
 txcq_descr.gen = vmxnet3_ring_curr_gen(&s->txq_descr[qidx].comp_ring);
 
-- 
2.7.4




Re: [Qemu-devel] [PATCH] trace: Fix 'char **' compilation error in simple backend

2016-10-25 Thread Eric Blake
On 10/25/2016 08:59 PM, Fam Zheng wrote:
> Currently, the generated function body will do "strlen(arg)" but the
> argument could be 'char **'. Avoid that by exclusding such cases in

s/exclusding/excluding/

> is_string check.
> 
> Reported by patchew's "make docker-test-mingw@fedora".
> 
> Signed-off-by: Fam Zheng 
> ---
>  scripts/tracetool/backend/simple.py | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/tracetool/backend/simple.py 
> b/scripts/tracetool/backend/simple.py
> index 9885e83..2538795 100644
> --- a/scripts/tracetool/backend/simple.py
> +++ b/scripts/tracetool/backend/simple.py
> @@ -21,7 +21,9 @@ PUBLIC = True
>  
>  def is_string(arg):
>  strtype = ('const char*', 'char*', 'const char *', 'char *')
> -if arg.lstrip().startswith(strtype):
> +non_strtype = ('const char**', 'char**', 'const char **', 'char **')
> +arg_strip = arg.lstrip()
> +if arg_strip.startswith(strtype) and not 
> arg_strip.startswith(non_strtype):

There may be a more compact way to write it, but I'm not enough of a
python expert to know offhand what else to suggest (it's not as simple
as string concatenation of strtype + '*', since strtype is a tuple
rather than a string).

What you have will fail to detect 'const char *const *' as a non-string
(possible if we have some argv-like function that takes a constant array
of constant strings), but I guess we can worry about that if we actually
try to trace something with that signature.  In the meantime, what you
have solves the immediate failure, so:

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PULL 6/9] e1000e: Don't zero out buffer address in rx descriptor

2016-10-25 Thread Jason Wang
From: Kevin Wolf 

The e1000e emulation zeroes out any used rx descriptor and then writes a
completely newly constructed value there. By doing this, it doesn't only
update the write-back area of the descriptors (as it's supposed to do),
but it also clears the buffer address, which real hardware doesn't do.

The spec explicitly mentions in chapter 7.1.8 that it is valid for a
driver to reuse a descriptor and only update the status field while
doing so, i.e. reusing the old buffer address:

If software statically allocates buffers, and uses memory read to
check for completed descriptors, it simply has to zero the status
byte in the descriptor to make it ready for reuse by hardware.

This patch fixes the behaviour to leave the buffer address in
descriptors unchanged even after the descriptor has been used.

Signed-off-by: Kevin Wolf 
Reviewed-by: Dmitry Fleytman 
Signed-off-by: Jason Wang 
---
 hw/net/e1000e_core.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 6505983..2b11499 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1278,11 +1278,10 @@ e1000e_write_lgcy_rx_descr(E1000ECore *core, uint8_t 
*desc,
 
 struct e1000_rx_desc *d = (struct e1000_rx_desc *) desc;
 
-memset(d, 0, sizeof(*d));
-
 assert(!rss_info->enabled);
 
 d->length = cpu_to_le16(length);
+d->csum = 0;
 
 e1000e_build_rx_metadata(core, pkt, pkt != NULL,
  rss_info,
@@ -1291,6 +1290,7 @@ e1000e_write_lgcy_rx_descr(E1000ECore *core, uint8_t 
*desc,
  &d->special);
 d->errors = (uint8_t) (le32_to_cpu(status_flags) >> 24);
 d->status = (uint8_t) le32_to_cpu(status_flags);
+d->special = 0;
 }
 
 static inline void
@@ -1301,7 +1301,7 @@ e1000e_write_ext_rx_descr(E1000ECore *core, uint8_t *desc,
 {
 union e1000_rx_desc_extended *d = (union e1000_rx_desc_extended *) desc;
 
-memset(d, 0, sizeof(*d));
+memset(&d->wb, 0, sizeof(d->wb));
 
 d->wb.upper.length = cpu_to_le16(length);
 
@@ -1325,7 +1325,7 @@ e1000e_write_ps_rx_descr(E1000ECore *core, uint8_t *desc,
 union e1000_rx_desc_packet_split *d =
 (union e1000_rx_desc_packet_split *) desc;
 
-memset(d, 0, sizeof(*d));
+memset(&d->wb, 0, sizeof(d->wb));
 
 d->wb.middle.length0 = cpu_to_le16((*written)[0]);
 
-- 
2.7.4
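
The behaviour change described in the commit message can be illustrated
with a byte-level model. This is a sketch in Python rather than the
e1000e code; it assumes the legacy rx descriptor layout of a 64-bit buffer
address followed by length, checksum, status, errors and special fields,
all little-endian, and the function names are hypothetical.

```python
import struct

# buffer_addr (u64), length (u16), csum (u16), status (u8), errors (u8),
# special (u16): 16 bytes, little-endian (assumed legacy layout).
LEGACY_RX_DESC = struct.Struct("<QHHBBH")
WRITE_BACK_OFFSET = 8  # everything after the buffer address


def write_back_clearing(desc, length, status, errors):
    # Old behaviour: zero the whole descriptor, then fill in the result.
    desc[:] = bytes(len(desc))
    struct.pack_into("<HHBBH", desc, WRITE_BACK_OFFSET,
                     length, 0, status, errors, 0)


def write_back_preserving(desc, length, status, errors):
    # New behaviour: only touch the write-back fields, so a driver that
    # reuses the descriptor still finds its buffer address there.
    struct.pack_into("<HHBBH", desc, WRITE_BACK_OFFSET,
                     length, 0, status, errors, 0)


old = bytearray(LEGACY_RX_DESC.pack(0x12345000, 0, 0, 0, 0, 0))
new = bytearray(old)
write_back_clearing(old, 60, 0x03, 0)
write_back_preserving(new, 60, 0x03, 0)
print(hex(LEGACY_RX_DESC.unpack(old)[0]), hex(LEGACY_RX_DESC.unpack(new)[0]))
```

Only the preserving variant leaves the buffer address intact, which is
what a driver that merely zeroes the status byte relies on.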




[Qemu-devel] [PULL 9/9] colo-proxy: fix memory leak

2016-10-25 Thread Jason Wang
From: Zhang Chen 

Fix memory leaks in colo-compare.c and filter-rewriter.c
reported by Coverity, and add some comments.

Signed-off-by: Zhang Chen 
Reviewed-by: zhanghailiang 
Signed-off-by: Jason Wang 
---
 net/colo-compare.c| 34 +++---
 net/filter-rewriter.c | 17 +
 net/trace-events  |  1 +
 3 files changed, 21 insertions(+), 31 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 109990f..f791383 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -188,7 +188,6 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet 
*ppkt)
 {
 struct tcphdr *ptcp, *stcp;
 int res;
-char *sdebug, *ddebug;
 
 trace_colo_compare_main("compare tcp");
 if (ppkt->size != spkt->size) {
@@ -219,24 +218,21 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet 
*ppkt)
 (spkt->size - ETH_HLEN));
 
 if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
-sdebug = strdup(inet_ntoa(ppkt->ip->ip_src));
-ddebug = strdup(inet_ntoa(ppkt->ip->ip_dst));
-fprintf(stderr, "%s: src/dst: %s/%s p: seq/ack=%u/%u"
-" s: seq/ack=%u/%u res=%d flags=%x/%x\n",
-__func__, sdebug, ddebug,
-(unsigned int)ntohl(ptcp->th_seq),
-(unsigned int)ntohl(ptcp->th_ack),
-(unsigned int)ntohl(stcp->th_seq),
-(unsigned int)ntohl(stcp->th_ack),
-res, ptcp->th_flags, stcp->th_flags);
-
-fprintf(stderr, "Primary len = %d\n", ppkt->size);
-qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", ppkt->size);
-fprintf(stderr, "Secondary len = %d\n", spkt->size);
-qemu_hexdump((char *)spkt->data, stderr, "colo-compare", spkt->size);
-
-g_free(sdebug);
-g_free(ddebug);
+trace_colo_compare_pkt_info(inet_ntoa(ppkt->ip->ip_src),
+inet_ntoa(ppkt->ip->ip_dst),
+ntohl(ptcp->th_seq),
+ntohl(ptcp->th_ack),
+ntohl(stcp->th_seq),
+ntohl(stcp->th_ack),
+res, ptcp->th_flags,
+stcp->th_flags,
+ppkt->size,
+spkt->size);
+
+qemu_hexdump((char *)ppkt->data, stderr,
+ "colo-compare ppkt", ppkt->size);
+qemu_hexdump((char *)spkt->data, stderr,
+ "colo-compare spkt", spkt->size);
 }
 
 return res;
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index 89abe72..c4ab91c 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -68,15 +68,11 @@ static int handle_primary_tcp_pkt(NetFilterState *nf,
 
 tcp_pkt = (struct tcphdr *)pkt->transport_header;
 if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) {
-char *sdebug, *ddebug;
-sdebug = strdup(inet_ntoa(pkt->ip->ip_src));
-ddebug = strdup(inet_ntoa(pkt->ip->ip_dst));
-trace_colo_filter_rewriter_pkt_info(__func__, sdebug, ddebug,
+trace_colo_filter_rewriter_pkt_info(__func__,
+inet_ntoa(pkt->ip->ip_src), inet_ntoa(pkt->ip->ip_dst),
 ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack),
 tcp_pkt->th_flags);
 trace_colo_filter_rewriter_conn_offset(conn->offset);
-g_free(sdebug);
-g_free(ddebug);
 }
 
 if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
@@ -116,15 +112,11 @@ static int handle_secondary_tcp_pkt(NetFilterState *nf,
 tcp_pkt = (struct tcphdr *)pkt->transport_header;
 
 if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) {
-char *sdebug, *ddebug;
-sdebug = strdup(inet_ntoa(pkt->ip->ip_src));
-ddebug = strdup(inet_ntoa(pkt->ip->ip_dst));
-trace_colo_filter_rewriter_pkt_info(__func__, sdebug, ddebug,
+trace_colo_filter_rewriter_pkt_info(__func__,
+inet_ntoa(pkt->ip->ip_src), inet_ntoa(pkt->ip->ip_dst),
 ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack),
 tcp_pkt->th_flags);
 trace_colo_filter_rewriter_conn_offset(conn->offset);
-g_free(sdebug);
-g_free(ddebug);
 }
 
 if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) {
@@ -162,6 +154,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
 
 iov_to_buf(iov, iovcnt, 0, buf, size);
 pkt = packet_new(buf, size);
+g_free(buf);
 
 /*
  * if we get tcp packet
diff --git a/net/trace-events b/net/trace-events
index d67f048..b1913a6 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -13,6 +13,7 @@ 

[Qemu-devel] [PULL 5/9] net: rocker: set limit to DMA buffer size

2016-10-25 Thread Jason Wang
From: Prasad J Pandit 

Rocker network switch emulator has test registers to help debug
DMA operations. While testing host DMA access, a buffer address
is written to register 'TEST_DMA_ADDR' and its size is written to
register 'TEST_DMA_SIZE'. When performing TEST_DMA_CTRL_INVERT
test, if DMA buffer size was greater than 'INT_MAX', it leads to
an invalid buffer access. Limit the DMA buffer size to avoid it.

Reported-by: Huawei PSIRT 
Signed-off-by: Prasad J Pandit 
Reviewed-by: Jiri Pirko 
Signed-off-by: Jason Wang 
---
 hw/net/rocker/rocker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index 30f2ce4..e9d215a 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -860,7 +860,7 @@ static void rocker_io_writel(void *opaque, hwaddr addr, 
uint32_t val)
 rocker_msix_irq(r, val);
 break;
 case ROCKER_TEST_DMA_SIZE:
-r->test_dma_size = val;
+r->test_dma_size = val & 0x;
 break;
 case ROCKER_TEST_DMA_ADDR + 4:
 r->test_dma_addr = ((uint64_t)val) << 32 | r->lower32;
-- 
2.7.4




[Qemu-devel] [PULL 8/9] net: rtl8139: limit processing of ring descriptors

2016-10-25 Thread Jason Wang
From: Prasad J Pandit 

The RTL8139 ethernet controller in C+ mode supports multiple
descriptor rings, each with a maximum of 64 descriptors. While
processing the transmit descriptor ring in 'rtl8139_cplus_transmit',
it does not limit the descriptor count and runs forever. Add a
check to avoid it.

Reported-by: Andrew Henderson 
Signed-off-by: Prasad J Pandit 
Signed-off-by: Jason Wang 
---
 hw/net/rtl8139.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 3345bc6..f05e59c 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -2350,7 +2350,7 @@ static void rtl8139_cplus_transmit(RTL8139State *s)
 {
 int txcount = 0;
 
-while (rtl8139_cplus_transmit_one(s))
+while (txcount < 64 && rtl8139_cplus_transmit_one(s))
 {
 ++txcount;
 }
-- 
2.7.4
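
The effect of the added bound can be shown with a toy model. This is a
sketch in Python, not the device code: transmit_all() and the callables
passed to it are hypothetical stand-ins for the loop in
rtl8139_cplus_transmit() and for rtl8139_cplus_transmit_one().

```python
MAX_DESCRIPTORS = 64  # ring size given in the commit message


def transmit_all(transmit_one, limit=MAX_DESCRIPTORS):
    # Process descriptors until transmit_one() reports nothing left,
    # but never more than one full ring per call.
    txcount = 0
    while txcount < limit and transmit_one():
        txcount += 1
    return txcount


def finite_ring(pending):
    # A well-behaved guest: 'pending' descriptors, then none.
    state = {"left": pending}

    def transmit_one():
        if state["left"] == 0:
            return False
        state["left"] -= 1
        return True

    return transmit_one


print(transmit_all(finite_ring(3)))  # stops when the ring is drained
print(transmit_all(lambda: True))    # a ring that never drains is cut off
```

Without the limit, the second call would never return; with it, at most
one ring's worth of descriptors is handled per invocation.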




[Qemu-devel] [PULL 2/9] net: pcnet: fix source formatting and indentation

2016-10-25 Thread Jason Wang
From: Prasad J Pandit 

Fix indentation and source formatting in a few places. Add braces
around 'if' and 'while' statements.

Signed-off-by: Prasad J Pandit 
Signed-off-by: Jason Wang 
---
 hw/net/pcnet.c | 130 +
 1 file changed, 67 insertions(+), 63 deletions(-)

diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
index 3078de8..6544553 100644
--- a/hw/net/pcnet.c
+++ b/hw/net/pcnet.c
@@ -302,7 +302,7 @@ static inline void pcnet_tmd_load(PCNetState *s, struct 
pcnet_TMD *tmd,
 uint32_t tbadr;
 int16_t length;
 int16_t status;
-   } xda;
+} xda;
 s->phys_mem_read(s->dma_opaque, addr, (void *)&xda, sizeof(xda), 0);
 tmd->tbadr = le32_to_cpu(xda.tbadr) & 0xff;
 tmd->length = le16_to_cpu(xda.length);
@@ -664,7 +664,9 @@ static inline int ladr_match(PCNetState *s, const uint8_t 
*buf, int size)
 
 static inline hwaddr pcnet_rdra_addr(PCNetState *s, int idx)
 {
-while (idx < 1) idx += CSR_RCVRL(s);
+while (idx < 1) {
+idx += CSR_RCVRL(s);
+}
 return s->rdra + ((CSR_RCVRL(s) - idx) * (BCR_SWSTYLE(s) ? 16 : 8));
 }
 
@@ -672,8 +674,10 @@ static inline int64_t pcnet_get_next_poll_time(PCNetState 
*s, int64_t current_ti
 {
 int64_t next_time = current_time +
 (65536 - (CSR_SPND(s) ? 0 : CSR_POLL(s))) * 30;
-if (next_time <= current_time)
+
+if (next_time <= current_time) {
 next_time = current_time + 1;
+}
 return next_time;
 }
 
@@ -795,13 +799,13 @@ static void pcnet_init(PCNetState *s)
 mode = le16_to_cpu(initblk.mode);
 rlen = initblk.rlen >> 4;
 tlen = initblk.tlen >> 4;
-   ladrf[0] = le16_to_cpu(initblk.ladrf[0]);
-   ladrf[1] = le16_to_cpu(initblk.ladrf[1]);
-   ladrf[2] = le16_to_cpu(initblk.ladrf[2]);
-   ladrf[3] = le16_to_cpu(initblk.ladrf[3]);
-   padr[0] = le16_to_cpu(initblk.padr[0]);
-   padr[1] = le16_to_cpu(initblk.padr[1]);
-   padr[2] = le16_to_cpu(initblk.padr[2]);
+ladrf[0] = le16_to_cpu(initblk.ladrf[0]);
+ladrf[1] = le16_to_cpu(initblk.ladrf[1]);
+ladrf[2] = le16_to_cpu(initblk.ladrf[2]);
+ladrf[3] = le16_to_cpu(initblk.ladrf[3]);
+padr[0] = le16_to_cpu(initblk.padr[0]);
+padr[1] = le16_to_cpu(initblk.padr[1]);
+padr[2] = le16_to_cpu(initblk.padr[2]);
 rdra = le32_to_cpu(initblk.rdra);
 tdra = le32_to_cpu(initblk.tdra);
 } else {
@@ -809,13 +813,13 @@ static void pcnet_init(PCNetState *s)
 s->phys_mem_read(s->dma_opaque, PHYSADDR(s,CSR_IADR(s)),
 (uint8_t *)&initblk, sizeof(initblk), 0);
 mode = le16_to_cpu(initblk.mode);
-   ladrf[0] = le16_to_cpu(initblk.ladrf[0]);
-   ladrf[1] = le16_to_cpu(initblk.ladrf[1]);
-   ladrf[2] = le16_to_cpu(initblk.ladrf[2]);
-   ladrf[3] = le16_to_cpu(initblk.ladrf[3]);
-   padr[0] = le16_to_cpu(initblk.padr[0]);
-   padr[1] = le16_to_cpu(initblk.padr[1]);
-   padr[2] = le16_to_cpu(initblk.padr[2]);
+ladrf[0] = le16_to_cpu(initblk.ladrf[0]);
+ladrf[1] = le16_to_cpu(initblk.ladrf[1]);
+ladrf[2] = le16_to_cpu(initblk.ladrf[2]);
+ladrf[3] = le16_to_cpu(initblk.ladrf[3]);
+padr[0] = le16_to_cpu(initblk.padr[0]);
+padr[1] = le16_to_cpu(initblk.padr[1]);
+padr[2] = le16_to_cpu(initblk.padr[2]);
 rdra = le32_to_cpu(initblk.rdra);
 tdra = le32_to_cpu(initblk.tdra);
 rlen = rdra >> 29;
@@ -858,12 +862,12 @@ static void pcnet_start(PCNetState *s)
 printf("pcnet_start\n");
 #endif
 
-if (!CSR_DTX(s))
+if (!CSR_DTX(s)) {
 s->csr[0] |= 0x0010;/* set TXON */
-
-if (!CSR_DRX(s))
+}
+if (!CSR_DRX(s)) {
 s->csr[0] |= 0x0020;/* set RXON */
-
+}
 s->csr[0] &= ~0x0004;   /* clear STOP bit */
 s->csr[0] |= 0x0002;
 pcnet_poll_timer(s);
@@ -925,8 +929,7 @@ static void pcnet_rdte_poll(PCNetState *s)
crda);
 }
 } else {
-printf("pcnet: BAD RMD RDA=0x" TARGET_FMT_plx "\n",
-   crda);
+printf("pcnet: BAD RMD RDA=0x" TARGET_FMT_plx "\n", crda);
 #endif
 }
 }
@@ -1168,10 +1171,11 @@ ssize_t pcnet_receive(NetClientState *nc, const uint8_t 
*buf, size_t size_)
 #endif
 
 while (pktcount--) {
-if (CSR_RCVRC(s) <= 1)
+if (CSR_RCVRC(s) <= 1) {
 CSR_RCVRC(s) = CSR_RCVRL(s);
-else
+} else {
 CSR_RCVRC(s)--;
+}
 }
 
 pcnet_rdte_poll(s);
@@ -1207,7 +1211,7 @@ static void pcnet_transmit(PCNetState *s)
 
 s->tx_busy = 1;
 
-txagain:
+txagain:
 if (pcnet_tdte_poll(s)) {
 struct pcnet_TMD tmd;
 
@@ -1251,7 +1255,7 @@ static void 

[Qemu-devel] [PULL 1/9] net: pcnet: check rx/tx descriptor ring length

2016-10-25 Thread Jason Wang
From: Prasad J Pandit 

The AMD PC-Net II emulator has a set of control and status (CSR)
registers. Of these, CSR76 and CSR78 hold the receive and transmit
descriptor ring lengths respectively. This ring length can range
from 1 to 65535. Setting the ring length to zero leads to an infinite
loop in pcnet_rdra_addr() or pcnet_transmit(). Add a check to avoid it.

Reported-by: Li Qiang 
Signed-off-by: Prasad J Pandit 
Signed-off-by: Jason Wang 
---
 hw/net/pcnet.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
index 198a01f..3078de8 100644
--- a/hw/net/pcnet.c
+++ b/hw/net/pcnet.c
@@ -1429,8 +1429,11 @@ static void pcnet_csr_writew(PCNetState *s, uint32_t 
rap, uint32_t new_value)
 case 47: /* POLLINT */
 case 72:
 case 74:
+break;
 case 76: /* RCVRL */
 case 78: /* XMTRL */
+val = (val > 0) ? val : 512;
+break;
 case 112:
if (CSR_STOP(s) || CSR_SPND(s))
break;
-- 
2.7.4
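
Why a zero ring length hangs can be seen from the index wrap-around loop
in pcnet_rdra_addr(), which keeps adding the ring length until the index
is at least 1. The following is a sketch in Python, not the device code:
wrap_index() and sanitise_ring_len() are hypothetical helpers, and the
step cap exists only so that the model itself terminates.

```python
DEFAULT_RING_LEN = 512  # fallback value used by the patch


def wrap_index(idx, ring_len, max_steps=1000):
    # Model of "while (idx < 1) idx += ring_len".  Returns None when the
    # loop makes no progress within max_steps (the real loop would spin).
    steps = 0
    while idx < 1:
        if steps == max_steps:
            return None
        idx += ring_len
        steps += 1
    return idx


def sanitise_ring_len(val, default=DEFAULT_RING_LEN):
    # Model of the added check: a zero length is replaced by a default.
    return val if val > 0 else default


print(wrap_index(-3, 8))                     # normal wrap-around
print(wrap_index(0, 0))                      # zero length: no progress
print(wrap_index(0, sanitise_ring_len(0)))   # guarded length terminates
```

With a length of zero the index never changes, so the loop condition stays
true; substituting a non-zero default restores termination.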




[Qemu-devel] [PULL 4/9] net: eepro100: fix memory leak in device uninit

2016-10-25 Thread Jason Wang
From: Li Qiang 

The exit dispatch of the eepro100 network card device doesn't free
the 's->vmstate' field which was allocated in device realize, thus
leading to a host memory leak. This patch avoids this.

Signed-off-by: Li Qiang 
Signed-off-by: Jason Wang 
---
 hw/net/eepro100.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index bab4dbf..4bf71f2 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -1843,6 +1843,7 @@ static void pci_nic_uninit(PCIDevice *pci_dev)
 EEPRO100State *s = DO_UPCAST(EEPRO100State, dev, pci_dev);
 
 vmstate_unregister(&pci_dev->qdev, s->vmstate, s);
+g_free(s->vmstate);
 eeprom93xx_free(&pci_dev->qdev, s->eeprom);
 qemu_del_nic(s->nic);
 }
-- 
2.7.4




[Qemu-devel] [PULL 0/9] Net patches

2016-10-25 Thread Jason Wang
The following changes since commit ede0cbeb7892bdf4a19128853a3a3c61a17fb068:

  Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-10-25' into 
staging (2016-10-25 17:03:11 +0100)

are available in the git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 2061c14c9bea67f8f1fc6bc7acb33c903a0586c1:

  colo-proxy: fix memory leak (2016-10-26 09:58:02 +0800)




Brad Smith (1):
  tap-bsd: OpenBSD uses tap(4) now

Kevin Wolf (1):
  e1000e: Don't zero out buffer address in rx descriptor

Li Qiang (2):
  net: eepro100: fix memory leak in device uninit
  net: vmxnet: initialise local tx descriptor

Prasad J Pandit (4):
  net: pcnet: check rx/tx descriptor ring length
  net: pcnet: fix source formatting and indentation
  net: rocker: set limit to DMA buffer size
  net: rtl8139: limit processing of ring descriptors

Zhang Chen (1):
  colo-proxy: fix memory leak

 hw/net/e1000e_core.c   |   8 +--
 hw/net/eepro100.c  |   1 +
 hw/net/pcnet.c | 133 ++---
 hw/net/rocker/rocker.c |   2 +-
 hw/net/rtl8139.c   |   2 +-
 hw/net/vmxnet3.c   |   1 +
 net/colo-compare.c |  34 ++---
 net/filter-rewriter.c  |  17 ++-
 net/tap-bsd.c  |   6 ++-
 net/trace-events   |   1 +
 10 files changed, 104 insertions(+), 101 deletions(-)

-- 
2.7.4




[Qemu-devel] [PULL 3/9] tap-bsd: OpenBSD uses tap(4) now

2016-10-25 Thread Jason Wang
From: Brad Smith 

Update the tap-bsd code now that OpenBSD uses tap(4).

Signed-off-by: Brad Smith 
Signed-off-by: Jason Wang 
---
 net/tap-bsd.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index c506ac3..6c96922 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -35,6 +35,10 @@
 #include 
 #endif
 
+#if defined(__OpenBSD__)
+#include 
+#endif
+
 #ifndef __FreeBSD__
 int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
  int vnet_hdr_required, int mq_required, Error **errp)
@@ -55,7 +59,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
 if (*ifname) {
 snprintf(dname, sizeof dname, "/dev/%s", ifname);
 } else {
-#if defined(__OpenBSD__)
+#if defined(__OpenBSD__) && OpenBSD < 201605
 snprintf(dname, sizeof dname, "/dev/tun%d", i);
 #else
 snprintf(dname, sizeof dname, "/dev/tap%d", i);
-- 
2.7.4
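
The version check in the second hunk can be sketched as a plain function. This is an illustration only: `tap_dev_name` is a hypothetical helper, and it takes the version as a runtime parameter in place of the `OpenBSD` preprocessor macro (0 meaning "not OpenBSD") so that both branches can be exercised on any host.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. 'openbsd_version' stands in for the OpenBSD
 * macro tested by the patch (year and month, e.g. 201605);
 * 0 means the host is not OpenBSD. */
int tap_dev_name(char *buf, size_t len, long openbsd_version, int unit)
{
    if (openbsd_version != 0 && openbsd_version < 201605) {
        /* Older OpenBSD: layer 2 mode is provided by tun devices. */
        return snprintf(buf, len, "/dev/tun%d", unit);
    }
    /* Newer OpenBSD and the other BSDs: use tap devices. */
    return snprintf(buf, len, "/dev/tap%d", unit);
}
```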




[Qemu-devel] [PATCH qemu] configure, ppc64: Copy skiboot.lid to build directory when configuring

2016-10-25 Thread Alexey Kardashevskiy
When configured to compile out of tree, the configure script
copies BIOS blobs to the build directory. However since the PPC64 powernv
machine ROM has .lid extension, it is ignored and "make check" fails
when trying the powernv machine.

This adds *.lid to the list of copied blobs.

Signed-off-by: Alexey Kardashevskiy 
---
 configure | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configure b/configure
index d3dafcb..300b0cb 100755
--- a/configure
+++ b/configure
@@ -6071,6 +6071,7 @@ FILES="$FILES roms/seabios/Makefile roms/vgabios/Makefile"
 FILES="$FILES pc-bios/qemu-icon.bmp"
 for bios_file in \
 $source_path/pc-bios/*.bin \
+$source_path/pc-bios/*.lid \
 $source_path/pc-bios/*.aml \
 $source_path/pc-bios/*.rom \
 $source_path/pc-bios/*.dtb \
-- 
2.5.0.rc3




[Qemu-devel] [PATCH] trace: Fix 'char **' compilation error in simple backend

2016-10-25 Thread Fam Zheng
Currently, the generated function body will do "strlen(arg)" but the
argument could be 'char **'. Avoid that by excluding such cases in
the is_string check.

Reported by patchew's "make docker-test-mingw@fedora".

Signed-off-by: Fam Zheng 
---
 scripts/tracetool/backend/simple.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scripts/tracetool/backend/simple.py b/scripts/tracetool/backend/simple.py
index 9885e83..2538795 100644
--- a/scripts/tracetool/backend/simple.py
+++ b/scripts/tracetool/backend/simple.py
@@ -21,7 +21,9 @@ PUBLIC = True
 
 def is_string(arg):
 strtype = ('const char*', 'char*', 'const char *', 'char *')
-if arg.lstrip().startswith(strtype):
+non_strtype = ('const char**', 'char**', 'const char **', 'char **')
+arg_strip = arg.lstrip()
+if arg_strip.startswith(strtype) and not arg_strip.startswith(non_strtype):
 return True
 else:
 return False
-- 
2.7.4




Re: [Qemu-devel] [PATCH RFC 0/7] COLO block replication supports shared disk case

2016-10-25 Thread Changlong Xie

I didn't review p5/p6; I think you can merge them into a single patch.
Also, don't forget to update qapi/block-core.json along with p3.

Thanks
-Xie

On 10/20/2016 09:57 PM, zhanghailiang wrote:

COLO block replication doesn't support the shared disk case.
Here we try to implement it.

Just as the scenario of non-shared disk block replication,
we are going to implement block replication from many basic
blocks that are already in QEMU.
The architecture is:

          virtio-blk                    ||                              .----------
              /                         ||                              | Secondary
             /                          ||                              '----------
            /                           ||                                virtio-blk
           /                            ||                                     |
          |                             ||                              replication(5)
          |                    NBD  -------->   NBD   (2)                      |
          |                  client     ||    server ---> hidden disk <-- active disk(4)
          |                     ^       ||                     |
          |              replication(1) ||                     |
          |                     |       ||                     |
          |     +---------------'       ||                     |
         (3)    |drive-backup sync=none ||                     |
--------. |     +---------------+       ||                     |
Primary | |                     |       ||          backing    |
--------' |                     |       ||                     |
          V                     |                              |
   +-----------------------------------+                       |
   |            shared disk            | <---------------------+
   +-----------------------------------+
1) Primary writes will read original data and forward it to Secondary
QEMU.
2) The hidden-disk buffers the original content that is modified
by the primary VM. It should also be an empty disk, and
the driver supports bdrv_make_empty() and backing file.
3) Primary write requests will be written to Shared disk.
4) Secondary write requests will be buffered in the active disk and it
will overwrite the existing sector content in the buffer.

For more details, please refer to patch 1.

The complete code can be found at the link:
https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk

Test steps:
1. Secondary:
# x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :9 
-name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -drive 
if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,backing.driver=raw,backing.file.filename=/work/kvm/suse11_sp3_64
  -drive 
if=virtio,id=active-disk0,driver=replication,mode=secondary,file.driver=qcow2,top-id=active-disk0,file.file.filename=/mnt/ramfs/active_disk.img,file.backing=hidden_disk0,shared-disk=on
 -incoming tcp:0:

Issue qmp commands:
{'execute':'qmp_capabilities'}
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': 
{'host': '0', 'port': '9998'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'hidden_disk0', 
'writable': true } }

2.Primary:
# x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 2048 -smp 2 -qmp stdio -vnc 
:9 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -drive 
if=virtio,id=primary_disk0,file.filename=/work/kvm/suse11_sp3_64,driver=raw -S

Issue qmp commands:
{'execute':'qmp_capabilities'}
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add 
-n buddy 
driver=replication,mode=primary,file.driver=nbd,file.host=9.42.3.17,file.port=9998,file.export=hidden_disk0,shared-disk-id=primary_disk0,shared-disk=on,node-name=rep'}}
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ 
{'capability': 'x-colo', 'state': true } ] } }
{'execute': 'migrate', 'arguments': {'uri': 'tcp:9.42.3.17:' } }

3. Failover
Secondary side:
Issue qmp commands:
{ 'execute': 'nbd-server-stop' }
{ "execute": "x-colo-lost-heartbeat" }

Please review; any comments are welcome.

Cc: Juan Quintela 
Cc: Amit Shah 
Cc: Dr. David Alan Gilbert (git) 

zhanghailiang (7):
   docs/block-replication: Add description for shared-disk case
   block-backend: Introduce blk_root() helper
   replication: add shared-disk and shared-disk-id options
   replication: Split out backup_do_checkpoint() from
 secondary_do_checkpoint()
   replication: fix code logic with the new shared_disk option
   replication: Implement block replication for shared disk case
   nbd/replication: implement .bdrv_get_info() for nbd and replication
 driver

  block/block-backend.c  |   5 ++
  block/nbd.c|  12 

Re: [Qemu-devel] [PATCHv5 08/12] tests: Clean up IO handling in ide-test

2016-10-25 Thread Alexey Kardashevskiy
On 25/10/16 23:25, David Gibson wrote:
> On Tue, Oct 25, 2016 at 06:01:41PM +1100, Alexey Kardashevskiy wrote:
>> On 24/10/16 15:59, David Gibson wrote:
>>> ide-test uses many explicit inb() / outb() operations for its IO, which
>>> means it's not portable to non-x86 platforms.  This cleans it up to use
>>> the libqos PCI accessors instead.
>>>
>>> Signed-off-by: David Gibson 
> [snip]
> 
>>> -static void send_scsi_cdb_read10(uint64_t lba, int nblocks)
>>> +static void send_scsi_cdb_read10(QPCIDevice *dev, void *ide_base,
>>> + uint64_t lba, int nblocks)
>>>  {
>>>  Read10CDB pkt = { .padding = 0 };
>>>  int i;
>>> @@ -670,7 +717,8 @@ static void send_scsi_cdb_read10(uint64_t lba, int 
>>> nblocks)
>>>  
>>>  /* Send Packet */
>>>  for (i = 0; i < sizeof(Read10CDB)/2; i++) {
>>> -outw(IDE_BASE + reg_data, cpu_to_le16(((uint16_t *)&pkt)[i]));
>>> +qpci_io_writew(dev, ide_base + reg_data,
>>> +   le16_to_cpu(((uint16_t *)&pkt)[i]));
>>
>>
>> cpu_to_le16 -> le16_to_cpu conversion here and below (at the very end) is
>> not obvious. Right above this chunk the @pkt fields are initialized as BE:
>>
>>  /* Construct SCSI CDB packet */
>>  pkt.opcode = 0x28;
>>  pkt.lba = cpu_to_be32(lba);
>>  pkt.nblocks = cpu_to_be16(nblocks);
>>
>> outw() seems to be CPU-endian, and qpci_io_writew() as well, or not?
> 
> outw() is guest CPU endian (which is stupid, but that's another
> matter).  qpci_io_writew() is different - it is always LE, because PCI
> devices are always LE (well, ok, nearly always).
> 
> So, yes, this is a bit confusing.  Here's what's going on:
>   * the SCSI standard uses BE, so that's what we put into the
> packet structure
>   * We need to transfer the packet to the device as a bytestream - so
> no endianness conversions
>   * But.. we do so in 16-bit chunks
>   * .. and qpci_io_writew() is designed to take CPU values and write
> them out as LE - ie, it contains an implicit cpu_to_le16()

dev->bus->pio_writew() calls outw() which calls qtest_outw() and
qtest_sendf() where @value is sent as text - where does this implicit
cpu_to_le16() happen? Or am I reading the code wrong?

The other branch (for MMIO) in qpci_io_writew() calls cpu_to_le16() explicitly.

I'd expect a function with a generic name such as qpci_io_writew() to always
take data in one known, fixed endianness (LE in this case,
as it is PCI).

In the chunk above we convert host-CPU-endian @lba to BE then treat it as
LE when converting to CPU-endian and then expect qpci_io_writew() to do
swapping again (or not - depends on BAR type - IO vs. MMIO - or conversion
always happens?), this confuses me a lot. However, everybody else is happy
so am I :)
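
One way to read David's explanation is sketched below. It assumes the writer really does emit a CPU value as little-endian bytes, which is exactly the point questioned above for the PIO path. `load_le16`, `write_le16` and `send_packet` are hypothetical stand-ins, not libqos functions. If that assumption holds, applying le16_to_cpu() to each 16-bit chunk cancels the writer's implicit cpu_to_le16(), so the big-endian SCSI packet reaches the device byte for byte, whatever the host endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Interpret two raw bytes as a little-endian value: the portable
 * equivalent of le16_to_cpu() applied to a 16-bit chunk of memory. */
uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical stand-in for a writer that takes a CPU value and
 * emits it as little-endian bytes (an implicit cpu_to_le16()). */
void write_le16(uint8_t *out, uint16_t cpu_val)
{
    out[0] = (uint8_t)(cpu_val & 0xff);
    out[1] = (uint8_t)(cpu_val >> 8);
}

/* Send 'len' bytes (len must be even) through the LE writer in
 * 16-bit chunks, as the test does with the CDB packet. */
void send_packet(uint8_t *wire, const uint8_t *pkt, size_t len)
{
    for (size_t i = 0; i < len; i += 2) {
        write_le16(wire + i, load_le16(pkt + i));
    }
}
```

The two conversions are inverses of each other, so the sequence is a byte-stream copy performed in 16-bit steps; no endianness is actually changed.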




-- 
Alexey





Re: [Qemu-devel] [PATCH V4] colo-proxy: fix memory leak

2016-10-25 Thread Jason Wang



On 2016-10-24 11:08, Hailiang Zhang wrote:

On 2016/10/17 17:23, Zhang Chen wrote:

Fix memory leaks in colo-compare.c and filter-rewriter.c,
reported by Coverity, and add some comments.

Signed-off-by: Zhang Chen 
---


Reviewed-by: zhanghailiang 


Applied to -net.

Thanks.



Re: [Qemu-devel] [PATCH RFC 3/7] replication: add shared-disk and shared-disk-id options

2016-10-25 Thread Changlong Xie

On 10/20/2016 09:57 PM, zhanghailiang wrote:

We use these two options to identify which disk is
shared

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
  block/replication.c | 33 +
  1 file changed, 33 insertions(+)

diff --git a/block/replication.c b/block/replication.c
index 3bd1cf1..2a2fdb2 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -25,9 +25,12 @@
  typedef struct BDRVReplicationState {
  ReplicationMode mode;
  int replication_state;
+bool is_shared_disk;
+char *shared_disk_id;
  BdrvChild *active_disk;
  BdrvChild *hidden_disk;
  BdrvChild *secondary_disk;
+BdrvChild *primary_disk;
  char *top_id;
  ReplicationState *rs;
  Error *blocker;
@@ -53,6 +56,9 @@ static void replication_stop(ReplicationState *rs, bool 
failover,

  #define REPLICATION_MODE"mode"
  #define REPLICATION_TOP_ID  "top-id"
+#define REPLICATION_SHARED_DISK "shared-disk"
+#define REPLICATION_SHARED_DISK_ID "shared-disk-id"
+
  static QemuOptsList replication_runtime_opts = {
  .name = "replication",
  .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
@@ -65,6 +71,14 @@ static QemuOptsList replication_runtime_opts = {
  .name = REPLICATION_TOP_ID,
  .type = QEMU_OPT_STRING,
  },
+{
+.name = REPLICATION_SHARED_DISK_ID,
+.type = QEMU_OPT_STRING,
+},
+{
+.name = REPLICATION_SHARED_DISK,
+.type = QEMU_OPT_BOOL,
+},
  { /* end of list */ }
  },
  };
@@ -85,6 +99,8 @@ static int replication_open(BlockDriverState *bs, QDict 
*options,
  QemuOpts *opts = NULL;
  const char *mode;
  const char *top_id;
+const char *shared_disk_id;
+BlockBackend *blk;

  ret = -EINVAL;
  opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
@@ -114,6 +130,22 @@ static int replication_open(BlockDriverState *bs, QDict 
*options,
 "The option mode's value should be primary or secondary");
  goto fail;
  }


Now we have four runtime options:
"mode"/"top-id"/"shared-disk"/"shared-disk-id". But the current checking
logic is too weak; I think you need to enhance it to avoid misuse of the options.
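
As a rough illustration of what stricter checking could look like, the sketch below validates the combination of mode, shared-disk and shared-disk-id up front. Everything in it is an assumption rather than the patch's behaviour: `check_replication_opts` is a hypothetical helper, and the rules that shared-disk-id requires shared-disk=on and is only meaningful on the primary side are guesses at the intent. Only the "Missing shared disk blk" case is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mode { MODE_PRIMARY, MODE_SECONDARY };

/* Hypothetical validator: returns NULL if the combination is
 * accepted, otherwise a message describing the problem. */
const char *check_replication_opts(enum mode mode, bool shared_disk,
                                   const char *shared_disk_id)
{
    if (!shared_disk) {
        /* Assumption: an id without shared-disk=on is a mistake. */
        return shared_disk_id ? "shared-disk-id requires shared-disk=on"
                              : NULL;
    }
    if (mode == MODE_PRIMARY) {
        /* Taken from the patch: primary needs the id of the disk. */
        return shared_disk_id ? NULL : "Missing shared disk blk";
    }
    /* Assumption: the id is only meaningful on the primary side. */
    return shared_disk_id ? "shared-disk-id is only valid in primary mode"
                          : NULL;
}
```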



+s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+false);


Missing one space.


+if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+if (!shared_disk_id) {
+error_setg(&local_err, "Missing shared disk blk");
+goto fail;
+}
+s->shared_disk_id = g_strdup(shared_disk_id);
+blk = blk_by_name(s->shared_disk_id);
+if (!blk) {
+error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+goto fail;
+}
+s->primary_disk = blk_root(blk);
+}

  s->rs = replication_new(bs, &replication_ops);

@@ -130,6 +162,7 @@ static void replication_close(BlockDriverState *bs)
  {
  BDRVReplicationState *s = bs->opaque;

+g_free(s->shared_disk_id);
  if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
  replication_stop(s->rs, false, NULL);
  }







Re: [Qemu-devel] [PATCH 10/10] spapr: Memory hot-unplug support

2016-10-25 Thread David Gibson
On Mon, Oct 24, 2016 at 11:47:36PM -0500, Michael Roth wrote:
> From: Bharata B Rao 
> 
> Add support to hot remove pc-dimm memory devices.
> 
> Since we're introducing a machine-level unplug_request hook, we also
> add handling for CPU unplug there to ensure CPU unplug
> continues to work as it did before.
> 
> Signed-off-by: Bharata B Rao 
> * add hooks to CAS/cmdline enablement of hotplug ACR support
> * add hook for CPU unplug
> Signed-off-by: Michael Roth 
> ---
>  hw/ppc/spapr.c | 119 
> -
>  hw/ppc/spapr_drc.c |  17 
>  2 files changed, 135 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0b3aa2f..a4a6058 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2281,6 +2281,90 @@ out:
>  error_propagate(errp, local_err);
>  }
>  
> +typedef struct sPAPRDIMMState {
> +uint32_t nr_lmbs;
> +} sPAPRDIMMState;
> +
> +static void spapr_lmb_release(DeviceState *dev, void *opaque)
> +{
> +sPAPRDIMMState *ds = (sPAPRDIMMState *)opaque;
> +HotplugHandler *hotplug_ctrl = NULL;

No reason for the NULL initializer here - you set the variable
unconditionally below.

Otherwise,

Reviewed-by: David Gibson 
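
The countdown used by spapr_lmb_release() in the quoted patch can be sketched on its own. This is a simplified model, not the sPAPR code: `DimmState`, `dimm_state_new` and `lmb_release` are hypothetical names, and an integer flag stands in for the real pc-dimm unplug handler. Every block shares one state object, and only the release of the last block triggers the final unplug and frees the state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct DimmState {
    uint32_t nr_lmbs;   /* releases still outstanding */
    int *unplugged;     /* hypothetical flag set by the final step */
} DimmState;

/* Allocate the shared state for a DIMM made of 'nr_lmbs' blocks. */
DimmState *dimm_state_new(uint32_t nr_lmbs, int *unplugged)
{
    DimmState *ds = calloc(1, sizeof(*ds));

    if (ds) {
        ds->nr_lmbs = nr_lmbs;
        ds->unplugged = unplugged;
    }
    return ds;
}

/* Called once per block; only the last call does the final unplug. */
void lmb_release(DimmState *ds)
{
    if (--ds->nr_lmbs) {
        return;             /* other blocks are still attached */
    }
    *ds->unplugged = 1;     /* stands in for the real unplug handler */
    free(ds);
}
```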

> +if (--ds->nr_lmbs) {
> +return;
> +}
> +
> +g_free(ds);
> +
> +/*
> + * Now that all the LMBs have been removed by the guest, call the
> + * pc-dimm unplug handler to cleanup up the pc-dimm device.
> + */
> +hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_del_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t 
> size,
> +   Error **errp)
> +{
> +sPAPRDRConnector *drc;
> +sPAPRDRConnectorClass *drck;
> +uint32_t nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
> +int i;
> +sPAPRDIMMState *ds = g_malloc0(sizeof(sPAPRDIMMState));
> +uint64_t addr = addr_start;
> +
> +ds->nr_lmbs = nr_lmbs;
> +for (i = 0; i < nr_lmbs; i++) {
> +drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +addr / SPAPR_MEMORY_BLOCK_SIZE);
> +g_assert(drc);
> +
> +drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +drck->detach(drc, dev, spapr_lmb_release, ds, errp);
> +addr += SPAPR_MEMORY_BLOCK_SIZE;
> +}
> +
> +drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +   addr_start / SPAPR_MEMORY_BLOCK_SIZE);
> +drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +spapr_hotplug_req_remove_by_count_indexed(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +  nr_lmbs,
> +  drck->get_index(drc));
> +}
> +
> +static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState 
> *dev,
> +Error **errp)
> +{
> +sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> +PCDIMMDevice *dimm = PC_DIMM(dev);
> +PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> +MemoryRegion *mr = ddc->get_memory_region(dimm);
> +
> +pc_dimm_memory_unplug(dev, &ms->hotplug_memory, mr);
> +object_unparent(OBJECT(dev));
> +}
> +
> +static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
> +DeviceState *dev, Error **errp)
> +{
> +Error *local_err = NULL;
> +PCDIMMDevice *dimm = PC_DIMM(dev);
> +PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> +MemoryRegion *mr = ddc->get_memory_region(dimm);
> +uint64_t size = memory_region_size(mr);
> +uint64_t addr;
> +
> +addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
> +if (local_err) {
> +goto out;
> +}
> +
> +spapr_del_lmbs(dev, addr, size, &error_abort);
> +out:
> +error_propagate(errp, local_err);
> +}
> +
>  void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
>  sPAPRMachineState *spapr)
>  {
> @@ -2354,10 +2438,42 @@ static void spapr_machine_device_plug(HotplugHandler 
> *hotplug_dev,
>  static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>DeviceState *dev, Error **errp)
>  {
> +sPAPRMachineState *sms = SPAPR_MACHINE(qdev_get_machine());
>  MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
>  
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -error_setg(errp, "Memory hot unplug not supported by sPAPR");
> +if (spapr_ovec_test(sms->ov5_cas, OV5_HP_EVT)) {
> +spapr_memory_unplug(hotplug_dev, dev, errp);
> +} else {
> +error_setg(errp, "Memory hot unplug not supported for this 
> guest");
> +}
> +} else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> +

Re: [Qemu-devel] [PATCH 07/10] spapr_events: add support for dedicated hotplug event source

2016-10-25 Thread David Gibson
On Mon, Oct 24, 2016 at 11:47:33PM -0500, Michael Roth wrote:
> Hotplug events were previously delivered using an EPOW interrupt
> and were queued by linux guests into a circular buffer. For traditional
> EPOW events like shutdown/resets, this isn't an issue, but for hotplug
> events there are cases where this buffer can be exhausted, resulting
> in the loss of hotplug events, resets, etc.
> 
> Newer-style hotplug events are delivered using a dedicated event source.
> We enable this in supported guests by adding an additional standard
> event source in the guest device-tree via /event-sources, and, if
> the guest advertises support for the newer-style hotplug events,
> using the corresponding interrupt to signal the availability of
> hotplug/unplug events.
> 
> Signed-off-by: Michael Roth 
> ---
>  hw/ppc/spapr.c |   4 +-
>  hw/ppc/spapr_events.c  | 202 
> -
>  include/hw/ppc/spapr.h |   5 +-
>  3 files changed, 170 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a3ea140..dc4224b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -973,7 +973,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>  }
>  
>  /* /event-sources */
> -spapr_dt_events(fdt, spapr->check_exception_irq);
> +spapr_dt_events(spapr, fdt);
>  
>  /* /rtas */
>  spapr_dt_rtas(spapr, fdt);
> @@ -1917,7 +1917,7 @@ static void ppc_spapr_init(MachineState *machine)
>  }
>  g_free(filename);
>  
> -/* Set up EPOW events infrastructure */
> +/* Set up RTAS event infrastructure */
>  spapr_events_init(spapr);
>  
>  /* Set up the RTC RTAS interfaces */
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 89aa5a7..b6b3511 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -40,6 +40,7 @@
>  #include "hw/ppc/spapr_drc.h"
>  #include "qemu/help_option.h"
>  #include "qemu/bcd.h"
> +#include "hw/ppc/spapr_ovec.h"
>  #include 
>  
>  struct rtas_error_log {
> @@ -206,27 +207,140 @@ struct hp_log_full {
>  struct rtas_event_log_v6_hp hp;
>  } QEMU_PACKED;
>  
> -#define EVENT_MASK_INTERNAL_ERRORS   0x8000
> -#define EVENT_MASK_EPOW  0x4000
> -#define EVENT_MASK_HOTPLUG   0x1000
> -#define EVENT_MASK_IO0x0800
> +typedef enum EventClass {
> +EVENT_CLASS_INTERNAL_ERRORS = 0,
> +EVENT_CLASS_EPOW= 1,
> +EVENT_CLASS_RESERVED= 2,
> +EVENT_CLASS_HOT_PLUG= 3,
> +EVENT_CLASS_IO  = 4,
> +EVENT_CLASS_MAX
> +} EventClassIndex;
> +#define EVENT_CLASS_MASK(index) (1 << (31 - index))
> +
> +static const char *event_names[EVENT_CLASS_MAX] = {
> +[EVENT_CLASS_INTERNAL_ERRORS]   = "internal-errors",
> +[EVENT_CLASS_EPOW]  = "epow-events",
> +[EVENT_CLASS_HOT_PLUG]  = "hot-plug-events",
> +[EVENT_CLASS_IO]= "ibm,io-events",
> +};
> +
> +struct sPAPREventSource {
> +const char *name;
> +int irq;
> +uint32_t mask;
> +bool enabled;
> +};
> +
> +static sPAPREventSource *spapr_event_sources_new(void)
> +{
> +sPAPREventSource *event_sources = g_new0(sPAPREventSource,
> + EVENT_CLASS_MAX);
> +int i;
> +
> +for (i = 0; i < EVENT_CLASS_MAX; i++) {
> +event_sources[i].name = event_names[i];

You don't really need to have the pointer to the name in
sPAPREventSource.  You only need it for building the DT, and you can
look up event_names in parallel just as easily there.
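
A minimal sketch of that suggestion, under stated assumptions: the enum values and names are copied from the quoted patch, while `event_class_name` is a hypothetical accessor that looks the name up by index instead of storing a pointer in each source. The mask macro here uses an unsigned constant so that index 0 (bit 31) does not shift into the sign bit of an int, which differs slightly from the macro in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    EVENT_CLASS_INTERNAL_ERRORS = 0,
    EVENT_CLASS_EPOW            = 1,
    EVENT_CLASS_RESERVED        = 2,
    EVENT_CLASS_HOT_PLUG        = 3,
    EVENT_CLASS_IO              = 4,
    EVENT_CLASS_MAX
} EventClassIndex;

/* Class 0 maps to the most significant bit of a 32-bit mask. */
#define EVENT_CLASS_MASK(index) (UINT32_C(1) << (31 - (index)))

static const char *const event_names[EVENT_CLASS_MAX] = {
    [EVENT_CLASS_INTERNAL_ERRORS] = "internal-errors",
    [EVENT_CLASS_EPOW]            = "epow-events",
    [EVENT_CLASS_HOT_PLUG]        = "hot-plug-events",
    [EVENT_CLASS_IO]              = "ibm,io-events",
};

/* Hypothetical accessor: look the name up by index when building the
 * device tree; returns NULL for classes without a name. */
const char *event_class_name(EventClassIndex index)
{
    return (unsigned)index < EVENT_CLASS_MAX ? event_names[index] : NULL;
}
```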

> +}
>  
> -void spapr_dt_events(void *fdt, uint32_t check_exception_irq)
> +return event_sources;
> +}
> +
> +static void spapr_event_sources_register(sPAPREventSource *event_sources,
> +EventClassIndex index, int irq)
>  {
> -int event_sources, epow_events;
> -uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), 
> cpu_to_be32(1)};
> -uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
> +/* we only support 1 irq per event class at the moment */
> +g_assert(event_sources);
> +g_assert(!event_sources[index].enabled);
> +event_sources[index].irq = irq;
> +event_sources[index].mask = EVENT_CLASS_MASK(index);
> +event_sources[index].enabled = true;
> +}
> +
> +static const sPAPREventSource
> +*spapr_event_sources_get_source(sPAPREventSource *event_sources,
> +EventClassIndex index)

Having the function return type on the previous line or on the same line
as the function name is fine by me.  But please don't split it across
lines as you have here (with the '*' on the second line).

> +{
> +g_assert(index < EVENT_CLASS_MAX);
> +g_assert(event_sources);
> +
> +return &event_sources[index];
> +}
> +
> +void spapr_dt_events(sPAPRMachineState *spapr, void *fdt)
> +{
> +

Re: [Qemu-devel] [PATCH 04/10] spapr: improve ibm, architecture-vec-5 property handling

2016-10-25 Thread David Gibson
On Wed, Oct 26, 2016 at 10:58:09AM +1100, David Gibson wrote:
> On Mon, Oct 24, 2016 at 11:47:30PM -0500, Michael Roth wrote:
> > ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
> > negotiated between platform/guest. Currently we hardcode this property
> > in the boot-time device tree to advertise a single negotiated
> > capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
> > has been invoked or that capability has actually been negotiated.
> > 
> > Improve this by generating ibm,architecture-vec-5 based on the full
> > set of option vector 5 capabilities negotiated via CAS.
> > 
> > Signed-off-by: Michael Roth 
> > ---
> >  hw/ppc/spapr.c  | 23 +--
> >  include/hw/ppc/spapr_ovec.h |  1 +
> >  2 files changed, 18 insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 3b64580..828072a 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -659,14 +659,28 @@ static int spapr_dt_cas_updates(sPAPRMachineState 
> > *spapr, void *fdt,
> >  sPAPROptionVector *ov5_updates)
> >  {
> >  sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> > -int ret = 0;
> > +int ret = 0, offset;
> >  
> >  /* Generate ibm,dynamic-reconfiguration-memory node if required */
> >  if (spapr_ovec_test(ov5_updates, OV5_DRCONF_MEMORY)) {
> >  g_assert(smc->dr_lmb_enabled);
> >  ret = spapr_populate_drconf_memory(spapr, fdt);
> > +if (ret) {
> > +goto out;
> > +}
> >  }
> >  
> > +offset = fdt_path_offset(fdt, "/chosen");
> > +if (offset < 0) {
> > +offset = fdt_add_subnode(fdt, 0, "chosen");
> > +if (offset < 0) {
> > +return offset;
> > +}
> 
> Just asserting offset >= 0 would be fine here.  We always create a
> /chosen node.

Duh.  Realised during testing that of course this *is* necessary for
the case where we're just making a CAS patch to the tree, rather than
building the whole tree.  I've reverted my ill-considered change in my
tree back to your original patch.

> 
> > +}
> > +ret = spapr_ovec_populate_dt(fdt, offset, spapr->ov5_cas,
> > + "ibm,architecture-vec-5");
> > +
> > +out:
> >  return ret;
> >  }
> >  
> > @@ -792,14 +806,9 @@ static void spapr_dt_chosen(sPAPRMachineState *spapr, 
> > void *fdt)
> >  char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
> >  size_t cb = 0;
> >  char *bootlist = get_boot_devices_list(&cb, true);
> > -unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
> >  
> >  _FDT(chosen = fdt_add_subnode(fdt, 0, "chosen"));
> >  
> > -/* Set Form1_affinity */
> > -_FDT(fdt_setprop(fdt, chosen, "ibm,architecture-vec-5",
> > - vec5, sizeof(vec5)));
> > -
> >  _FDT(fdt_setprop_string(fdt, chosen, "bootargs", 
> > machine->kernel_cmdline));
> >  _FDT(fdt_setprop_cell(fdt, chosen, "linux,initrd-start",
> >spapr->initrd_base));
> > @@ -1778,6 +1787,8 @@ static void ppc_spapr_init(MachineState *machine)
> >  spapr_validate_node_memory(machine, &error_fatal);
> >  }
> >  
> > +spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
> > +
> >  /* init CPUs */
> >  if (machine->cpu_model == NULL) {
> >  machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
> > diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> > index 09afd59..47fa04c 100644
> > --- a/include/hw/ppc/spapr_ovec.h
> > +++ b/include/hw/ppc/spapr_ovec.h
> > @@ -44,6 +44,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
> >  
> >  /* option vector 5 */
> >  #define OV5_DRCONF_MEMORY   OV_BIT(2, 2)
> > +#define OV5_FORM1_AFFINITY  OV_BIT(5, 0)
> >  
> >  /* interfaces */
> >  sPAPROptionVector *spapr_ovec_new(void);
> 



-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/5] nvram: Refactor OpenBIOS NVRAM code to support -prom-env on pseries, too

2016-10-25 Thread David Gibson
On Tue, Oct 25, 2016 at 02:22:08PM +0200, Thomas Huth wrote:
> On 24.10.2016 14:04, David Gibson wrote:
> > On Mon, Oct 24, 2016 at 12:36:05PM +0200, Thomas Huth wrote:
> >> On 24.10.2016 12:22, Bharata B Rao wrote:
> >>>
> >>> On Wed, Oct 19, 2016 at 7:46 AM, David Gibson wrote:
> >>>
> >>> On Tue, Oct 18, 2016 at 10:46:39PM +0200, Thomas Huth wrote:
> >>> > The OpenBIOS NVRAM set-up is based on the layout defined in the CHRP
> >>> > (Common Hardware Reference Platform) specification. This is the same
> >>> > layout that is also used by the PAPR specification and thus by the
> >>> SLOF
> >>> > firmware of the pseries machine. By refactoring the NVRAM code from
> >>> > mac_nvram.c, we can use the same functions for setting up the NVRAM
> >>> > for both, OpenBIOS and SLOF. This way we can support the "-prom-env"
> >>> > parameter of QEMU for SLOF, too, which is very useful to influence
> >>> > the firmware boot process.
> >>> >
> >>> > Thomas Huth (5):
> >>> >   nvram: Introduce helper functions for CHRP "system" and "free 
> >>> space"
> >>> > partitions
> >>> >   sparc: Use the new common NVRAM functions for system and free 
> >>> space
> >>> > partition
> >>> >   spapr_nvram: Pre-initialize the NVRAM to support the -prom-env
> >>> > parameter
> >>> >   nvram: Move the remaining CHRP NVRAM related code to 
> >>> chrp_nvram.[ch]
> >>> >   nvram: Rename openbios_firmware_abi.h into sun_nvram.h
> >>> >
> >>> >  hw/nvram/Makefile.objs |  1 +
> >>> >  hw/nvram/chrp_nvram.c  | 85
> >>> ++
> >>> >  hw/nvram/mac_nvram.c   | 49 
> >>> +++--
> >>> >  hw/nvram/spapr_nvram.c |  6 ++
> >>> >  hw/sparc/sun4m.c   | 35 ++---
> >>> >  hw/sparc64/sun4u.c | 35 ++---
> >>> >  include/hw/nvram/chrp_nvram.h  | 54
> >>> ++
> >>> >  .../nvram/{openbios_firmware_abi.h => sun_nvram.h} | 47 
> >>> +---
> >>> >  tests/postcopy-test.c  |  8 +-
> >>> >  9 files changed, 179 insertions(+), 141 deletions(-)
> >>> >  create mode 100644 hw/nvram/chrp_nvram.c
> >>> >  create mode 100644 include/hw/nvram/chrp_nvram.h
> >>> >  rename include/hw/nvram/{openbios_firmware_abi.h => sun_nvram.h}
> >>> (50%)
> >>>
> >>> Series,
> >>>
> >>> Reviewed-by: David Gibson
> >>>
> >>> I've put it into ppc-for-2.8 tentatively.  However I'd like to get an
> >>> Acked-by from Mark for the Sparc bits before I send my next pull
> >>> request.
> >>>
> >>>
> >>> I observe an early boot failure in SLOF with a commit from this patchset
> >>> on ppc-for-2.8 branch.
> >>>
> >>> 4e1257ed41bce16baa8a010 - spapr_nvram: Pre-initialize the NVRAM to
> >>> support the -prom-env parameter
> >>>
> >>> SLOF 
> >>> **
> >>> QEMU Starting
> >>>  Build Date = Oct 19 2016 09:58:38
> >>>  FW Version = git-efd65f49929d7db7
> >>>  Press "s" to enter Open Firmware.
> >>>
> >>> Populating /vdevice methods
> >>> Populating /vdevice/vty@7100
> >>> Populating /vdevice/nvram@7101
> >>> Populating /vdevice/v-scsi@7102
> >>>SCSI: Looking for devices
> >>>   8200 CD-ROM   : "QEMU QEMU CD-ROM  2.5+"
> >>> Populating /pci@8002000
> >>>  00 1000 (D) : 1033 0194serial bus [ usb-xhci ]
> >>>  00 0800 (D) : 1af4 1001virtio [ block ]
> >>>  00  (D) : 1af4 1000virtio [ net ]
> >>> Scanning USB
> >>>   XHCI: Initializing
> >>> Using default console: /vdevice/vty@7100
> >>> 
> >>>   Welcome to Open Firmware
> >>>
> >>>   Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
> >>>   This program and the accompanying materials are made available
> >>>   under the terms of the BSD License available at
> >>>   http://www.opensource.org/licenses/bsd-license.php
> >>>
> >>>
> >>> Trying to load:  from: /pci@8002000/scsi@1 ...   Successfully 
> >>> loaded
> >>> error: out of memory.
> >>> out of memory
> >>> Aborted. Press any key to exit.
> >>
> >> Yuck. Confirmed. Sorry for the inconvenience. Seems like SLOF does not
> >> create the properties in the /options device tree node anymore in this 
> >> case.
> >>
> >> David, could you please unqueue the "spapr_nvram: Pre-initialize the
> >> NVRAM to support the -prom-env parameter" patch from the ppc-for-2.8
> >> branch until I figure out a fix for this problem? Thanks!
> > 
> > Done.
> 
> FYI, SLOF patch to fix this issue is on the way:
> 

Re: [Qemu-devel] [PATCH 08/10] spapr: Add DRC count indexed hotplug identifier type

2016-10-25 Thread David Gibson
On Mon, Oct 24, 2016 at 11:47:34PM -0500, Michael Roth wrote:
> From: Bharata B Rao 
> 
> Add support for DRC count indexed hotplug ID type which is primarily
> needed for memory hot unplug. This type allows for specifying the
> number of DRs that should be plugged/unplugged starting from a given
> DRC index.
> 
> Signed-off-by: Bharata B Rao 
> * updated rtas_event_log_v6_hp to reflect count/index field ordering
>   used in PAPR hotplug ACR
> Signed-off-by: Michael Roth 

Reviewed-by: David Gibson 
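
For anyone following the layout change, here is a minimal sketch (not taken
from the patch) of how a count-indexed identifier would end up in the event
log, assuming the count word precedes the index word as in the count_indexed
struct quoted below, and that both are stored big-endian as the cpu_to_be32()
calls suggest. put_be32() and pack_count_indexed() are hypothetical names used
only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/* Hypothetical: serialize a count-indexed DRC identifier, count first,
 * then the starting DRC index. */
static void pack_count_indexed(uint8_t out[8], uint32_t count, uint32_t index)
{
    put_be32(out, count);
    put_be32(out + 4, index);
}
```

With count = 4 and index = 0x80000010 the eight bytes would read
00 00 00 04 80 00 00 10.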

> ---
>  hw/ppc/spapr_events.c  | 76 
> --
>  include/hw/ppc/spapr.h |  4 +++
>  2 files changed, 65 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index b6b3511..596e991 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -175,6 +175,16 @@ struct epow_log_full {
>  struct rtas_event_log_v6_epow epow;
>  } QEMU_PACKED;
>  
> +union drc_identifier {
> +uint32_t index;
> +uint32_t count;
> +struct {
> +uint32_t count;
> +uint32_t index;
> +} count_indexed;
> +char name[1];
> +} QEMU_PACKED;
> +
>  struct rtas_event_log_v6_hp {
>  #define RTAS_LOG_V6_SECTION_ID_HOTPLUG  0x4850 /* HP */
>  struct rtas_event_log_v6_section_header hdr;
> @@ -191,12 +201,9 @@ struct rtas_event_log_v6_hp {
>  #define RTAS_LOG_V6_HP_ID_DRC_NAME   1
>  #define RTAS_LOG_V6_HP_ID_DRC_INDEX  2
>  #define RTAS_LOG_V6_HP_ID_DRC_COUNT  3
> +#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED  4
>  uint8_t reserved;
> -union {
> -uint32_t index;
> -uint32_t count;
> -char name[1];
> -} drc;
> +union drc_identifier drc_id;
>  } QEMU_PACKED;
>  
>  struct hp_log_full {
> @@ -496,7 +503,7 @@ static void spapr_hotplug_set_signalled(uint32_t 
> drc_index)
>  
>  static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
>  sPAPRDRConnectorType drc_type,
> -uint32_t drc)
> +union drc_identifier *drc_id)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  struct hp_log_full *new_hp;
> @@ -541,7 +548,7 @@ static void spapr_hotplug_req_event(uint8_t hp_id, 
> uint8_t hp_action,
>  case SPAPR_DR_CONNECTOR_TYPE_PCI:
>  hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
>  if (hp->hotplug_action == RTAS_LOG_V6_HP_ACTION_ADD) {
> -spapr_hotplug_set_signalled(drc);
> +spapr_hotplug_set_signalled(drc_id->index);
>  }
>  break;
>  case SPAPR_DR_CONNECTOR_TYPE_LMB:
> @@ -559,9 +566,18 @@ static void spapr_hotplug_req_event(uint8_t hp_id, 
> uint8_t hp_action,
>  }
>  
>  if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT) {
> -hp->drc.count = cpu_to_be32(drc);
> +hp->drc_id.count = cpu_to_be32(drc_id->count);
>  } else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_INDEX) {
> -hp->drc.index = cpu_to_be32(drc);
> +hp->drc_id.index = cpu_to_be32(drc_id->index);
> +} else if (hp_id == RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED) {
> +/* we should not be using count_indexed value unless the guest
> + * supports dedicated hotplug event source
> + */
> +g_assert(spapr_ovec_test(spapr->ov5_cas, OV5_HP_EVT));
> +hp->drc_id.count_indexed.count =
> +cpu_to_be32(drc_id->count_indexed.count);
> +hp->drc_id.count_indexed.index =
> +cpu_to_be32(drc_id->count_indexed.index);
>  }
>  
>  rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true);
> @@ -575,34 +591,64 @@ void spapr_hotplug_req_add_by_index(sPAPRDRConnector 
> *drc)
>  {
>  sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>  sPAPRDRConnectorType drc_type = drck->get_type(drc);
> -uint32_t index = drck->get_index(drc);
> +union drc_identifier drc_id;
>  
> +drc_id.index = drck->get_index(drc);
>  spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
> -RTAS_LOG_V6_HP_ACTION_ADD, drc_type, index);
> +RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
>  }
>  
>  void spapr_hotplug_req_remove_by_index(sPAPRDRConnector *drc)
>  {
>  sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>  sPAPRDRConnectorType drc_type = drck->get_type(drc);
> -uint32_t index = drck->get_index(drc);
> +union drc_identifier drc_id;
>  
> +drc_id.index = drck->get_index(drc);
>  spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
> -RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, index);
> +RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
>  }
>  
>  void 

Re: [Qemu-devel] [PATCH 00/10] spapr: option vector re-work and memory unplug support

2016-10-25 Thread David Gibson
On Mon, Oct 24, 2016 at 11:47:26PM -0500, Michael Roth wrote:
> This series is based on David's ppc-for-2.8 branch, and is also available 
> from:
> 
>   https://github.com/mdroth/qemu/commits/spapr-hotplug-event-update
> 
> Changes since RFC:
>   * Submit as v1 now that PAPR Hotplug ACR is accepted
>   * Rebase on latest ppc-for-2.8 (with device-tree refactoring)
>   * address Patchew warnings
>   * add comments to clarify spapr->ov5/ov5_cas usage. (David)
>   * revise comment to clarify intent when setting spapr->ov5
> OV5_HP_EVT bit. (Bharata)
>   * drop internal usage of spapr_ovec_from_bitmap() in favor of
> directly assigning bitmap to sPAPROptionVector instances. (David)
>   * standardize meaning of 'vector_len' variable through spapr_ovec_*
> functions to be the byte-wise length of option vectors entries,
> and not including the preceding length byte itself. (David)
>   * fix spapr_ovec_populate_dt() to parse up to OV_MAXBITS bits
> rather than OV_MAXBITS - 1. (David)
>   * fix spapr_ovec_populate_dt() encode the minimum of 1 option
> vector byte instead of the max of OV_MAXBYTES in cases where
> no option bits are set. (David)
>   * add some comments to spapr_ovec_populate_dt() to clarify what
> is being encoded into length byte of ibm,architecture-vec-5
>   * switch 'legacy-hotplug-events' option to
> 'modern-hotplug-events' (David)
>   * modify rtas_event_log_to_source() to check for OV5_HP_EVT
> option rather than relying on whether the hotplug source is
> specifically enabled. Assert the latter in cases where
> OV5_HP_EVT is set. (Bharata)
>   * drop global EventSource list in favor of an sPAPREventSource
> list field within sPAPRMachineState (David)
>   * add CPU unplug hook in mc->unplug_request (Bharata)
> 
> 
> Patches 1-4 address various deficiencies in how we currently handle option
> vectors via ibm,client-architecture-support. This is done here in preparation
> for a new option vector bit introduced later in this series, as well as a
> number of future option vector bits related to other features, but I can
> break this out into a separate series if preferred.

I've now merged these 4 (adjusting for a couple of tiny nits mentioned
in comments).

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH 01/10] spapr_ovec: initial implementation of option vector helpers

2016-10-25 Thread David Gibson
On Mon, Oct 24, 2016 at 11:47:27PM -0500, Michael Roth wrote:
> PAPR guests advertise their capabilities to the platform by passing
> an ibm,architecture-vec structure via an
> ibm,client-architecture-support hcall as described by LoPAPR v11,
> B.6.2.3. during early boot.
> 
> Using this information, the platform enables the capabilities it
> supports, then encodes a subset of those enabled capabilities (the
> 5th option vector of the ibm,architecture-vec structure passed to
> ibm,client-architecture-support) into the guest device tree via
> "/chosen/ibm,architecture-vec-5".
> 
> The logical format of these option vectors is a bit-vector,
> where individual bits are addressed/documented based on the byte-wise
> offset from the beginning of the bit-vector, followed by the bit-wise
> index starting from the byte-wise offset. Thus the bits of each of
> these bytes are stored in reverse order. Additionally, the first
> byte of each option vector encodes the length of the option vector,
> so byte offsets begin at 1, and bit offsets at 0.
> 
> This is not very intuitive for the purposes of mapping these bits to
> a particular documented capability, so this patch introduces a set
> of abstractions that encapsulate the work of parsing/encoding these
> options vectors and testing for individual capabilities.
> 
> Cc: Bharata B Rao 
> Signed-off-by: Michael Roth 

Reviewed-by: David Gibson 
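
To make the addressing scheme in the commit message concrete, here is a small
sketch of testing one documented (byte, bit) pair directly against a raw
option vector. It is not the helper from this patch. It assumes element 0 of
the buffer is the length byte and that bit 0 is the most significant bit of
each byte, which is my reading of the 0x80 used for "Form 1" affinity
(byte 5, bit 0) in the hardcoded vec5[] that patch 04 removes.
ov_raw_test() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: test PAPR-style option (byte, bit) in a raw vector.
 * vec[0] is assumed to be the length byte, so option bytes start at 1.
 * Bit 0 is assumed to be the most significant bit of the byte. */
static bool ov_raw_test(const uint8_t *vec, size_t len,
                        unsigned byte, unsigned bit)
{
    if (byte == 0 || byte >= len || bit > 7) {
        return false;
    }
    return ((vec[byte] >> (7 - bit)) & 1u) != 0;
}
```

Having to remember both the 1-based byte offset and the reversed bit order at
every call site is the kind of thing the spapr_ovec_* abstraction is meant to
hide.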

> ---
>  hw/ppc/Makefile.objs|   2 +-
>  hw/ppc/spapr_ovec.c | 242 
> 
>  include/hw/ppc/spapr_ovec.h |  62 
>  3 files changed, 305 insertions(+), 1 deletion(-)
>  create mode 100644 hw/ppc/spapr_ovec.c
>  create mode 100644 include/hw/ppc/spapr_ovec.h
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index ebc72af..8025129 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -4,7 +4,7 @@ obj-y += ppc.o ppc_booke.o fdt.o
>  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
>  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
>  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
> -obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
> +obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o
>  # IBM PowerNV
>  obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o pnv_lpc.o
>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
> diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
> new file mode 100644
> index 000..c2a0d18
> --- /dev/null
> +++ b/hw/ppc/spapr_ovec.c
> @@ -0,0 +1,242 @@
> +/*
> + * QEMU SPAPR Architecture Option Vector Helper Functions
> + *
> + * Copyright IBM Corp. 2016
> + *
> + * Authors:
> + *  Bharata B Rao 
> + *  Michael Roth  
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/ppc/spapr_ovec.h"
> +#include "qemu/bitmap.h"
> +#include "exec/address-spaces.h"
> +#include "qemu/error-report.h"
> +#include 
> +
> +/* #define DEBUG_SPAPR_OVEC */
> +
> +#ifdef DEBUG_SPAPR_OVEC
> +#define DPRINTFN(fmt, ...) \
> +do { fprintf(stderr, fmt "\n", ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTFN(fmt, ...) \
> +do { } while (0)
> +#endif
> +
> +#define OV_MAXBYTES 256 /* not including length byte */
> +#define OV_MAXBITS (OV_MAXBYTES * BITS_PER_BYTE)
> +
> +/* we *could* work with bitmaps directly, but handling the bitmap privately
> + * allows us to more safely make assumptions about the bitmap size and
> + * simplify the calling code somewhat
> + */
> +struct sPAPROptionVector {
> +unsigned long *bitmap;
> +};
> +
> +sPAPROptionVector *spapr_ovec_new(void)
> +{
> +sPAPROptionVector *ov;
> +
> +ov = g_new0(sPAPROptionVector, 1);
> +ov->bitmap = bitmap_new(OV_MAXBITS);
> +
> +return ov;
> +}
> +
> +sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig)
> +{
> +sPAPROptionVector *ov;
> +
> +g_assert(ov_orig);
> +
> +ov = spapr_ovec_new();
> +bitmap_copy(ov->bitmap, ov_orig->bitmap, OV_MAXBITS);
> +
> +return ov;
> +}
> +
> +void spapr_ovec_intersect(sPAPROptionVector *ov,
> +  sPAPROptionVector *ov1,
> +  sPAPROptionVector *ov2)
> +{
> +g_assert(ov);
> +g_assert(ov1);
> +g_assert(ov2);
> +
> +bitmap_and(ov->bitmap, ov1->bitmap, ov2->bitmap, OV_MAXBITS);
> +}
> +
> +/* returns true if options bits were removed, false otherwise */
> +bool spapr_ovec_diff(sPAPROptionVector *ov,
> + sPAPROptionVector *ov_old,
> + sPAPROptionVector *ov_new)
> +{
> +unsigned long *change_mask = bitmap_new(OV_MAXBITS);
> +unsigned long *removed_bits = bitmap_new(OV_MAXBITS);
> +bool bits_were_removed = false;
> +
> +   

Re: [Qemu-devel] [PATCH 04/10] spapr: improve ibm,architecture-vec-5 property handling

2016-10-25 Thread David Gibson
On Mon, Oct 24, 2016 at 11:47:30PM -0500, Michael Roth wrote:
> ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
> negotiated between platform/guest. Currently we hardcode this property
> in the boot-time device tree to advertise a single negotiated
> capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
> has been invoked or that capability has actually been negotiated.
> 
> Improve this by generating ibm,architecture-vec-5 based on the full
> set of option vector 5 capabilities negotiated via CAS.
> 
> Signed-off-by: Michael Roth 
> ---
>  hw/ppc/spapr.c  | 23 +--
>  include/hw/ppc/spapr_ovec.h |  1 +
>  2 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3b64580..828072a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -659,14 +659,28 @@ static int spapr_dt_cas_updates(sPAPRMachineState 
> *spapr, void *fdt,
>  sPAPROptionVector *ov5_updates)
>  {
>  sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> -int ret = 0;
> +int ret = 0, offset;
>  
>  /* Generate ibm,dynamic-reconfiguration-memory node if required */
>  if (spapr_ovec_test(ov5_updates, OV5_DRCONF_MEMORY)) {
>  g_assert(smc->dr_lmb_enabled);
>  ret = spapr_populate_drconf_memory(spapr, fdt);
> +if (ret) {
> +goto out;
> +}
>  }
>  
> +offset = fdt_path_offset(fdt, "/chosen");
> +if (offset < 0) {
> +offset = fdt_add_subnode(fdt, 0, "chosen");
> +if (offset < 0) {
> +return offset;
> +}

Just asserting offset >= 0 would be fine here.  We always create a
/chosen node.

> +}
> +ret = spapr_ovec_populate_dt(fdt, offset, spapr->ov5_cas,
> + "ibm,architecture-vec-5");
> +
> +out:
>  return ret;
>  }
>  
> @@ -792,14 +806,9 @@ static void spapr_dt_chosen(sPAPRMachineState *spapr, 
> void *fdt)
>  char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
>  size_t cb = 0;
>  char *bootlist = get_boot_devices_list(&cb, true);
> -unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
>  
>  _FDT(chosen = fdt_add_subnode(fdt, 0, "chosen"));
>  
> -/* Set Form1_affinity */
> -_FDT(fdt_setprop(fdt, chosen, "ibm,architecture-vec-5",
> - vec5, sizeof(vec5)));
> -
>  _FDT(fdt_setprop_string(fdt, chosen, "bootargs", 
> machine->kernel_cmdline));
>  _FDT(fdt_setprop_cell(fdt, chosen, "linux,initrd-start",
>spapr->initrd_base));
> @@ -1778,6 +1787,8 @@ static void ppc_spapr_init(MachineState *machine)
>  spapr_validate_node_memory(machine, &error_fatal);
>  }
>  
> +spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
> +
>  /* init CPUs */
>  if (machine->cpu_model == NULL) {
>  machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
> diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> index 09afd59..47fa04c 100644
> --- a/include/hw/ppc/spapr_ovec.h
> +++ b/include/hw/ppc/spapr_ovec.h
> @@ -44,6 +44,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
>  
>  /* option vector 5 */
>  #define OV5_DRCONF_MEMORY   OV_BIT(2, 2)
> +#define OV5_FORM1_AFFINITY  OV_BIT(5, 0)
>  
>  /* interfaces */
>  sPAPROptionVector *spapr_ovec_new(void);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH 03/10] spapr: add option vector handling in CAS-generated resets

2016-10-25 Thread David Gibson
On Mon, Oct 24, 2016 at 11:47:29PM -0500, Michael Roth wrote:
> In some cases, ibm,client-architecture-support calls can fail. This
> could happen in the current code for situations where the modified
> device tree segment exceeds the buffer size provided by the guest
> via the call parameters. In these cases, QEMU will reset, allowing
> an opportunity to regenerate the device tree from scratch via
> boot-time handling. There are potentially other scenarios as well,
> not currently reachable in the current code, but possible in theory,
> such as cases where device-tree properties or nodes need to be removed.
> 
> We currently don't handle either of these properly for option vector
> capabilities however. Instead of carrying the negotiated capability
> beyond the reset and creating the boot-time device tree accordingly,
> we start from scratch, generating the same boot-time device tree as we
> did prior to the CAS-generated reset and the same device tree updates as we
> did before. This could (in theory) cause us to get stuck in a reset
> loop. This hasn't been observed, but depending on the extensiveness
> of CAS-induced device tree updates in the future, could eventually
> become an issue.
> 
> Address this by pulling capability-related device tree
> updates resulting from CAS calls into a common routine,
> spapr_dt_cas_updates(), and adding an sPAPROptionVector*
> parameter that allows us to test for newly-negotiated capabilities.
> We invoke it as follows:
> 
> 1) When ibm,client-architecture-support gets called, we
>call spapr_dt_cas_updates() with the set of capabilities
>added since the previous call to ibm,client-architecture-support.
>For the initial boot, or a system reset generated by something
>other than the CAS call itself, this set will consist of *all*
>options supported by both the platform and the guest. For calls
>to ibm,client-architecture-support immediately after a CAS-induced
>reset, we call spapr_dt_cas_updates() with only the set
>of capabilities added since the previous call, since the other
>capabilities will have already been addressed by the boot-time
>device-tree this time around. In the unlikely event that
>capabilities are *removed* since the previous CAS, we will
>generate a CAS-induced reset. In the unlikely event that we
>cannot fit the device-tree updates into the buffer provided
>by the guest, we'll generate a CAS-induced reset.
> 
> 2) When a CAS update results in the need to reset the machine and
>include the updates in the boot-time device tree, we call the
>spapr_dt_cas_updates() using the full set of negotiated
>capabilities as part of the reset path. At initial boot, or after
>a reset generated by something other than the CAS call itself,
>this set will be empty, resulting in what should be the same
>boot-time device-tree as we generated prior to this patch. For
>CAS-induced reset, this routine will be called with the full set of
>capabilities negotiated by the platform/guest in the previous
>CAS call, which should result in CAS updates from previous call
>being accounted for in the initial boot-time device tree.
> 
> Signed-off-by: Michael Roth 
> Reviewed-by: David Gibson 

One little nit..

[snip]
> @@ -1013,13 +1013,27 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu_,
>   * of guest input. To model these properly we'd want some sort of mask,
>   * but since they only currently apply to memory migration as defined
>   * by LoPAPR 1.1, 14.5.4.8, which QEMU doesn't implement, we don't need
> - * to worry about this.
> + * to worry about this for now.
>   */
> +ov5_cas_old = spapr_ovec_clone(spapr->ov5_cas);
> +/* full range of negotiated ov5 capabilities */
>  spapr_ovec_intersect(spapr->ov5_cas, spapr->ov5, ov5_guest);
>  spapr_ovec_cleanup(ov5_guest);
> +/* capabilities that have been added since CAS-generated guest reset.
> + * if capabilities have since been removed, generate another reset
> + */
> +ov5_updates = spapr_ovec_new();
> +spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
> +ov5_cas_old, spapr->ov5_cas);
> +
> +if (!spapr->cas_reboot) {
> +spapr->cas_reboot =
> +spapr_h_cas_compose_response(spapr, args[1], args[2], cpu_update,
> + ov5_updates);

spapr->cas_reboot is a bool, whereas spapr_h_cas_compose_response()
returns an int error code.  Now that C has real bools, I think the
compiler will do the right thing here, but I'd prefer to see an explicit

cas_reboot = (spapr_h_cas_compose_response() != 0)

for clarity.
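
A minimal sketch of the form I mean, with a stub standing in for
spapr_h_cas_compose_response(); the stub and the needs_reboot() wrapper are
hypothetical and only there to keep the example self-contained:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a function that returns 0 on success and a
 * nonzero error code on failure. */
static int compose_response_stub(int simulate_error)
{
    return simulate_error ? -1 : 0;
}

/* Make the int -> bool conversion explicit rather than relying on the
 * implicit conversion in the assignment. */
static bool needs_reboot(int simulate_error)
{
    return compose_response_stub(simulate_error) != 0;
}
```

The behaviour is the same either way with a C99 bool; the explicit comparison
just makes it obvious that any nonzero error code requests a reset.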

> +}
> +spapr_ovec_cleanup(ov5_updates);
>  
> -if (spapr_h_cas_compose_response(spapr, args[1], args[2],
> - cpu_update)) {
> +if (spapr->cas_reboot) {
> 

Re: [Qemu-devel] [PATCH] adb: change handler only when recognized

2016-10-25 Thread David Gibson
On Tue, Oct 25, 2016 at 09:01:01AM +0200, Hervé Poussineau wrote:
> ADB devices must take a new handler into account only when they recognize it.
> This lets operating systems probe for valid/invalid handlers to learn device
> capabilities.
> 
> Add a FIXME in the keyboard handler, which should use a different translation
> table depending on the selected handler.
> 
> Signed-off-by: Hervé Poussineau 

Applied to ppc-for-2.8, thanks.

> ---
>  hw/input/adb.c | 26 +++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/input/adb.c b/hw/input/adb.c
> index 3d39368..43d3205 100644
> --- a/hw/input/adb.c
> +++ b/hw/input/adb.c
> @@ -396,9 +396,15 @@ static int adb_kbd_request(ADBDevice *d, uint8_t *obuf,
>  d->devaddr = buf[1] & 0xf;
>  break;
>  default:
> -/* XXX: check this */
>  d->devaddr = buf[1] & 0xf;
> -d->handler = buf[2];
> +/* we support handlers:
> + * 1: Apple Standard Keyboard
> + * 2: Apple Extended Keyboard (LShift = RShift)
> + * 3: Apple Extended Keyboard (LShift != RShift)
> + */
> +if (buf[2] == 1 || buf[2] == 2 || buf[2] == 3) {
> +d->handler = buf[2];
> +}
>  break;
>  }
>  }
> @@ -437,6 +443,7 @@ static void adb_keyboard_event(DeviceState *dev, 
> QemuConsole *src,
>  if (qcode >= ARRAY_SIZE(qcode_to_adb_keycode)) {
>  return;
>  }
> +/* FIXME: take handler into account when translating qcode */
>  keycode = qcode_to_adb_keycode[qcode];
>  if (keycode == NO_KEY) {  /* We don't want to send this to the guest */
>  ADB_DPRINTF("Ignoring NO_KEY\n");
> @@ -631,8 +638,21 @@ static int adb_mouse_request(ADBDevice *d, uint8_t *obuf,
>  d->devaddr = buf[1] & 0xf;
>  break;
>  default:
> -/* XXX: check this */
>  d->devaddr = buf[1] & 0xf;
> +/* we support handlers:
> + * 0x01: Classic Apple Mouse Protocol / 100 cpi operations
> + * 0x02: Classic Apple Mouse Protocol / 200 cpi operations
> + * we don't support handlers (at least):
> + * 0x03: Mouse systems A3 trackball
> + * 0x04: Extended Apple Mouse Protocol
> + * 0x2f: Microspeed mouse
> + * 0x42: Macally
> + * 0x5f: Microspeed mouse
> + * 0x66: Microspeed mouse
> + */
> +if (buf[2] == 1 || buf[2] == 2) {
> +d->handler = buf[2];
> +}
>  break;
>  }
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH 06/10] spapr: add hotplug interrupt machine options

2016-10-25 Thread David Gibson
On Mon, Oct 24, 2016 at 11:47:32PM -0500, Michael Roth wrote:
> This adds machine options of the form:
> 
>   -machine pseries,modern-hotplug-events=true
>   -machine pseries,modern-hotplug-events=false
> 
> If false, QEMU will force the use of "legacy" style hotplug events,
> which are surfaced through EPOW events instead of a dedicated
> hot plug event source, and lack certain features necessary, mainly,
> for memory unplug support.
> 
> If true, QEMU will enable support for "modern" dedicated hot plug
> event source. Note that we will still default to "legacy" style unless
> the guest advertises support for the "modern" hotplug events via
> ibm,client-architecture-support hcall during early boot.
> 
> For pseries-2.7 and earlier we default to false, for newer machine
> types we default to true.
> 
> Signed-off-by: Michael Roth 

I think this either needs to go after the next patch, or be merged
with it.

As it stands, after this patch, you're advertising availability of the
new mechanism without having actually implemented it.

> ---
>  hw/ppc/spapr.c  | 33 +
>  include/hw/ppc/spapr.h  |  1 +
>  include/hw/ppc/spapr_ovec.h |  1 +
>  3 files changed, 35 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 828072a..a3ea140 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1789,6 +1789,11 @@ static void ppc_spapr_init(MachineState *machine)
>  
>  spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
>  
> +/* advertise support for dedicated HP event source to guests */
> +if (spapr->use_hotplug_event_source) {
> +spapr_ovec_set(spapr->ov5, OV5_HP_EVT);
> +}
> +
>  /* init CPUs */
>  if (machine->cpu_model == NULL) {
>  machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
> @@ -2138,16 +2143,41 @@ static void spapr_set_kvm_type(Object *obj, const 
> char *value, Error **errp)
>  spapr->kvm_type = g_strdup(value);
>  }
>  
> +static bool spapr_get_modern_hotplug_events(Object *obj, Error **errp)
> +{
> +sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +return spapr->use_hotplug_event_source;
> +}
> +
> +static void spapr_set_modern_hotplug_events(Object *obj, bool value,
> +Error **errp)
> +{
> +sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +spapr->use_hotplug_event_source = value;
> +}
> +
>  static void spapr_machine_initfn(Object *obj)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
>  
>  spapr->htab_fd = -1;
> +spapr->use_hotplug_event_source = true;
>  object_property_add_str(obj, "kvm-type",
>  spapr_get_kvm_type, spapr_set_kvm_type, NULL);
>  object_property_set_description(obj, "kvm-type",
>  "Specifies the KVM virtualization mode 
> (HV, PR)",
>  NULL);
> +object_property_add_bool(obj, "modern-hotplug-events",
> +spapr_get_modern_hotplug_events,
> +spapr_set_modern_hotplug_events,
> +NULL);
> +object_property_set_description(obj, "modern-hotplug-events",
> +"Use dedicated hotplug event mechanism 
> in"
> +" place of standard EPOW events when 
> possible"
> +" (required for memory hot-unplug 
> support)",
> +NULL);
>  }
>  
>  static void spapr_machine_finalizefn(Object *obj)
> @@ -2594,7 +2624,10 @@ static void phb_placement_2_7(sPAPRMachineState 
> *spapr, uint32_t index,
>  
>  static void spapr_machine_2_7_instance_options(MachineState *machine)
>  {
> +sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> +
>  spapr_machine_2_8_instance_options(machine);
> +spapr->use_hotplug_event_source = false;
>  }
>  
>  static void spapr_machine_2_7_class_options(MachineClass *mc)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index b6f9f1b..851f536 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -80,6 +80,7 @@ struct sPAPRMachineState {
>  uint32_t check_exception_irq;
>  Notifier epow_notifier;
>  QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
> +bool use_hotplug_event_source;
>  
>  /* Migration state */
>  int htab_save_index;
> diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> index 47fa04c..92167c6 100644
> --- a/include/hw/ppc/spapr_ovec.h
> +++ b/include/hw/ppc/spapr_ovec.h
> @@ -45,6 +45,7 @@ typedef struct sPAPROptionVector sPAPROptionVector;
>  /* option vector 5 */
>  #define OV5_DRCONF_MEMORY   OV_BIT(2, 2)
>  #define OV5_FORM1_AFFINITY  OV_BIT(5, 0)
> +#define OV5_HP_EVT  OV_BIT(6, 5)
>  
>  /* interfaces */
>  sPAPROptionVector *spapr_ovec_new(void);

-- 
David Gibson

Re: [Qemu-devel] [PATCH 09/10] spapr: use count+index for memory hotplug

2016-10-25 Thread David Gibson
On Mon, Oct 24, 2016 at 11:47:35PM -0500, Michael Roth wrote:
> Commit 0a417869:
> 
> spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT type
> 
> dropped per-DRC/per-LMB hotplug events in favor of a bulk add via a
> single LMB count value. This was to avoid overrunning the guest EPOW
> event queue with hotplug events. This works fine, but relies on the
> guest exhaustively scanning for pluggable LMBs to satisfy the
> requested count by issuing rtas-get-sensor(DR_ENTITY_SENSE, ...) calls
> until all the LMBs associated with the DIMM are identified.
> 
> With newer support for dedicated hotplug event source, this queue
> exhaustion is no longer as much of an issue due to implementation
> details on the guest side, but we still try to avoid excessive hotplug
> events by now supporting both a count and a starting index to avoid
> unnecessary work. This patch makes use of that approach when the
> capability is available.
> 
> Cc: bhar...@linux.vnet.ibm.com
> Signed-off-by: Michael Roth 

Reviewed-by: David Gibson 
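
For reference, the arithmetic behind the count+index pair can be sketched as
follows. This is an illustration, not code from the patch: LMB_SIZE is an
assumed value standing in for SPAPR_MEMORY_BLOCK_SIZE (256 MiB here, which
this thread does not state), and struct lmb_range and lmb_range_for() are
hypothetical names. Note that first_id is the connector id handed to
spapr_dr_connector_by_id(); the DRC index actually sent to the guest comes
from drck->get_index() on that connector.

```c
#include <stdint.h>

/* Assumed LMB granularity; stands in for SPAPR_MEMORY_BLOCK_SIZE. */
#define LMB_SIZE (256ULL * 1024 * 1024)

/* Hypothetical: first LMB connector id and number of LMBs for a DIMM. */
struct lmb_range {
    uint32_t first_id;
    uint32_t count;
};

static struct lmb_range lmb_range_for(uint64_t addr_start, uint64_t size)
{
    struct lmb_range r;

    r.first_id = (uint32_t)(addr_start / LMB_SIZE);
    r.count = (uint32_t)(size / LMB_SIZE);
    return r;
}
```

Under that assumption, a 1 GiB DIMM plugged at 4 GiB covers 4 LMBs starting
at connector id 16, so the guest can be told the starting point and a count
instead of scanning for pluggable LMBs.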

> ---
>  hw/ppc/spapr.c | 22 ++
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index dc4224b..0b3aa2f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2202,14 +2202,16 @@ static void spapr_nmi(NMIState *n, int cpu_index, 
> Error **errp)
>  }
>  }
>  
> -static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
> -   uint32_t node, Error **errp)
> +static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t 
> size,
> +   uint32_t node, bool dedicated_hp_event_source,
> +   Error **errp)
>  {
>  sPAPRDRConnector *drc;
>  sPAPRDRConnectorClass *drck;
>  uint32_t nr_lmbs = size/SPAPR_MEMORY_BLOCK_SIZE;
>  int i, fdt_offset, fdt_size;
>  void *fdt;
> +uint64_t addr = addr_start;
>  
>  for (i = 0; i < nr_lmbs; i++) {
>  drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> @@ -2228,7 +2230,17 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t 
> addr, uint64_t size,
>   * guest only in case of hotplugged memory
>   */
>  if (dev->hotplugged) {
> -   spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB, nr_lmbs);
> +if (dedicated_hp_event_source) {
> +drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +addr_start / SPAPR_MEMORY_BLOCK_SIZE);
> +drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> spapr_hotplug_req_add_by_count_indexed(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +   nr_lmbs,
> +   drck->get_index(drc));
> +} else {
> +spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB,
> +   nr_lmbs);
> +}
>  }
>  }
>  
> @@ -2261,7 +2273,9 @@ static void spapr_memory_plug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  goto out;
>  }
>  
> -spapr_add_lmbs(dev, addr, size, node, &error_abort);
> +spapr_add_lmbs(dev, addr, size, node,
> +   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
> +   &error_abort);
>  
>  out:
>  error_propagate(errp, local_err);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v5 15/17] ppc/pnv: Add cut down PSI bridge model and hookup external interrupt

2016-10-25 Thread David Gibson
On Tue, Oct 25, 2016 at 09:58:10AM +0200, Cédric Le Goater wrote:
> On 10/25/2016 07:30 AM, David Gibson wrote:
> > On Sat, Oct 22, 2016 at 11:46:48AM +0200, Cédric Le Goater wrote:
> >> From: Benjamin Herrenschmidt 
> >>
> >> The PSI (Processor Service Interface) is one of the engines of the
> >> "Bridge" unit which connects the different interfaces to the Power
> >> Processor.
> >>
> >> This adds just enough of the PSI bridge to handle various on-chip and
> >> the one external interrupt. The rest of PSI has to do with the link to
> >> the IBM FSP service processor which we don't plan to emulate (not used
> >> on OpenPower machines).
> >>
> >> Signed-off-by: Benjamin Herrenschmidt 
> >> [clg: - updated for qemu-2.7
> >>   - changed the XSCOM interface to fit new model
> >>   - QOMified the model
> >>   - reworked set_xive ]
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>
> >>  When skiboot initializes PSIHB, it fills the xives with server=0,
> >>  prio=0xff, which is fine, but for some reason the last two xive
> >>  settings reach the qemu MMIO region with a bogus value :
> >>  
> >>pnv_psi_mmio_write: MMIO write 0x30 val 0x00ff
> >>pnv_psi_mmio_write: MMIO write 0x60 val 0x00ff2000
> >>pnv_psi_mmio_write: MMIO write 0x68 val 0x00ff4000
> >>pnv_psi_mmio_write: MMIO write 0x70 val 0x00ff6000
> >>pnv_psi_mmio_write: MMIO write 0x78 val 0x8000
> >>pnv_psi_mmio_write: MMIO write 0x80 val 0xa000
> >>
> >>  It looks like a badly initialized temp variable in the call
> >>  stack. The memory regions look fine, maybe in stdcix ? For the
> >>  moment, I have added a logging error to catch non zero values as the
> >>  guest should not do that in any case.
> > 
> > Just to clarify, I think you're saying that you believe this to be a
> > skiboot (guest side) bug rather than a qemu bug.  Is that right?
> 
> Yes. I just found why. The P8_IRQ_PSI_* values in skiboot need to
> be unsigned because they are shifted left by 29 bits:
> 
>   ...
>   #define P8_IRQ_PSI_LOCAL_ERR    4
>   #define P8_IRQ_PSI_EXTERNAL 5   /* Used for UART */
>   ...
>   out_be64(psi->regs + PSIHB_XIVR_LOCAL_ERR,
>(0xffull << 32) | (P8_IRQ_PSI_LOCAL_ERR << 29));
>   out_be64(psi->regs + PSIHB_XIVR_HOST_ERR,
>(0xffull << 32) | (P8_IRQ_PSI_EXTERNAL << 29));
> 
> 
> I will send a skiboot patch but we need to keep the code as it is. 

Sure.  Seems like the hardware ignores those high bits, so we probably
should too.
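
For illustration, the sign-extension effect described above can be
reproduced in a few lines of standalone C. This is only a sketch, not
skiboot code: the helper names xivr_signed() and xivr_unsigned() are made
up for the example, and the signed shift is modelled through an explicit
cast so that the example itself avoids undefined behaviour.

```c
#include <stdint.h>

/* Same value as P8_IRQ_PSI_LOCAL_ERR in the quoted skiboot snippet. */
#define IRQ_LOCAL_ERR 4

/*
 * Hypothetical helper: models the IRQ number being a plain (signed) int.
 * 4 << 29 sets bit 31, so the 32-bit result is negative and is
 * sign-extended when OR'ed into a 64-bit value.  The shift is done on an
 * unsigned value and cast back to keep this example well defined on
 * common two's-complement compilers.
 */
uint64_t xivr_signed(void)
{
    int32_t shifted = (int32_t)((uint32_t)IRQ_LOCAL_ERR << 29);

    return (0xffull << 32) | (uint64_t)(int64_t)shifted;
}

/* Hypothetical helper: same expression with an unsigned IRQ number. */
uint64_t xivr_unsigned(void)
{
    return (0xffull << 32) | ((unsigned int)IRQ_LOCAL_ERR << 29);
}
```

With the signed variant the upper 32 bits all end up set, which hides the
0xff priority field; with the unsigned variant only bit 31 of the low word
is added.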

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH RFC 4/7] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint()

2016-10-25 Thread Changlong Xie

On 10/20/2016 09:57 PM, zhanghailiang wrote:

The helper backup_do_checkpoint() will be used for primary related
codes. Here we split it out from secondary_do_checkpoint().

Besides, it is unnecessary to call backup_do_checkpoint() in the
replication start path and the normal replication stop path.


This patch is unnecessary. We *really* need to clean
backup_job->done_bitmap in the replication_start/stop paths.



We only need to call it when doing a real checkpoint.

Signed-off-by: zhanghailiang 
---
  block/replication.c | 36 +++-
  1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 2a2fdb2..d687ffc 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -320,20 +320,8 @@ static bool 
replication_recurse_is_first_non_filter(BlockDriverState *bs,

  static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
  {
-Error *local_err = NULL;
  int ret;

-if (!s->secondary_disk->bs->job) {
-error_setg(errp, "Backup job was cancelled unexpectedly");
-return;
-}
-
-backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
-
  ret = s->active_disk->bs->drv->bdrv_make_empty(s->active_disk->bs);
  if (ret < 0) {
  error_setg(errp, "Cannot make active disk empty");
@@ -539,6 +527,8 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,
  aio_context_release(aio_context);
  return;
  }
+
+secondary_do_checkpoint(s, errp);
  break;
  default:
  aio_context_release(aio_context);
@@ -547,10 +537,6 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,

  s->replication_state = BLOCK_REPLICATION_RUNNING;

-if (s->mode == REPLICATION_MODE_SECONDARY) {
-secondary_do_checkpoint(s, errp);
-}
-
  s->error = 0;
  aio_context_release(aio_context);
  }
@@ -560,13 +546,29 @@ static void replication_do_checkpoint(ReplicationState 
*rs, Error **errp)
  BlockDriverState *bs = rs->opaque;
  BDRVReplicationState *s;
  AioContext *aio_context;
+Error *local_err = NULL;

  aio_context = bdrv_get_aio_context(bs);
  aio_context_acquire(aio_context);
  s = bs->opaque;

-if (s->mode == REPLICATION_MODE_SECONDARY) {
+switch (s->mode) {
+case REPLICATION_MODE_PRIMARY:
+break;
+case REPLICATION_MODE_SECONDARY:
+if (!s->secondary_disk->bs->job) {
+error_setg(errp, "Backup job was cancelled unexpectedly");
+break;
+}
+backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+break;
+}
  secondary_do_checkpoint(s, errp);
+break;
+default:
+abort();
  }
  aio_context_release(aio_context);
  }







Re: [Qemu-devel] [PATCHv5 07/12] libqos: Implement mmio accessors in terms of mem{read, write}

2016-10-25 Thread Alexey Kardashevskiy
On 25/10/16 23:16, David Gibson wrote:
> On Tue, Oct 25, 2016 at 05:47:43PM +1100, Alexey Kardashevskiy wrote:
>> On 24/10/16 15:59, David Gibson wrote:
>>> In the libqos PCI code we now have accessors both for registers (byte
>>> significance preserving) and for streaming data (byte address order
>>> preserving).  These exist in both the interface for qtest drivers and in
>>> the machine specific backends.
>>>
>>> However, the register-style accessors aren't actually necessary in the
>>> backend.  They can be implemented in terms of the byte address order
>>> preserving accessors by the libqos wrappers.  This works because PCI is
>>> always little endian.
>>>
>>> This does assume that the back end byte address order preserving accessors
>>> will perform the equivalent of a single bus transaction for short lengths.
>>> This is the case, and in fact they currently end up using the same
>>> cpu_physical_memory_rw() implementation within the qtest accelerator.
>>>
>>> Signed-off-by: David Gibson 
>>> Reviewed-by: Laurent Vivier 
>>> Reviewed-by: Greg Kurz 
>>> ---
>>>  tests/libqos/pci-pc.c| 38 --
>>>  tests/libqos/pci-spapr.c | 44 
>>>  tests/libqos/pci.c   | 20 ++--
>>>  tests/libqos/pci.h   |  8 
>>>  4 files changed, 14 insertions(+), 96 deletions(-)
>>>
>>
>> [...]
>>
>>> diff --git a/tests/libqos/pci.h b/tests/libqos/pci.h
>>> index 2b08362..ce6ed08 100644
>>> --- a/tests/libqos/pci.h
>>> +++ b/tests/libqos/pci.h
>>> @@ -27,18 +27,10 @@ struct QPCIBus {
>>>  uint16_t (*pio_readw)(QPCIBus *bus, uint32_t addr);
>>>  uint32_t (*pio_readl)(QPCIBus *bus, uint32_t addr);
>>>  
>>> -uint8_t (*mmio_readb)(QPCIBus *bus, uint32_t addr);
>>> -uint16_t (*mmio_readw)(QPCIBus *bus, uint32_t addr);
>>> -uint32_t (*mmio_readl)(QPCIBus *bus, uint32_t addr);
>>> -
>>>  void (*pio_writeb)(QPCIBus *bus, uint32_t addr, uint8_t value);
>>>  void (*pio_writew)(QPCIBus *bus, uint32_t addr, uint16_t value);
>>>  void (*pio_writel)(QPCIBus *bus, uint32_t addr, uint32_t value);
>>>  
>>> -void (*mmio_writeb)(QPCIBus *bus, uint32_t addr, uint8_t value);
>>> -void (*mmio_writew)(QPCIBus *bus, uint32_t addr, uint16_t value);
>>> -void (*mmio_writel)(QPCIBus *bus, uint32_t addr, uint32_t value);
>>> -
>>>  void (*memread)(QPCIBus *bus, uint32_t addr, void *buf, size_t len);
>>>  void (*memwrite)(QPCIBus *bus, uint32_t addr, const void *buf, size_t 
>>> len);
>>>  
>>>
>>
>> You added them in "libqos: Handle PCI IO de-multiplexing in common code"
>> (a few patches earlier) and are removing them now - if you moved this patch
>> earlier, it would shrink the series, or what am I missing?
> 
> Well, it can't go before the PIO / MMIO split, because on x86 the PIO
> part is implemented with inw/outw instead of readw/writew, and those
> don't have a memread/memwrite equivalent.
> 
> The change could go at the same time, but my feeling was that logical
> separation of the steps was worth a bit of temporary extra code.

It is a bit hard to follow the logic of the patchset when you do not know
whether the new code is going to stay or not - I automatically assumed it was
staying, and when I saw it being removed I wondered why you were removing
what you had just added. This - in my opinion - kills the idea of making
smaller patches to make review easier; better to just squash them all... But
since Greg is happy and things do not seem to work any worse (make check fails
on my setup but whatever), you can ignore me :)
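
For reference, the idea from the commit message (register-style accessors
built on top of the byte address order preserving ones, relying on PCI
being little endian) can be sketched in standalone C. This is not the
libqos code: FakeBus and the fake_* functions are hypothetical stand-ins,
and the flat byte array does no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a bus backend: just a flat byte array. */
typedef struct FakeBus {
    uint8_t mem[16];
} FakeBus;

/* Byte address order preserving accessors (the "streaming" style). */
void fake_memread(const FakeBus *bus, uint32_t addr, void *buf, size_t len)
{
    memcpy(buf, &bus->mem[addr], len);
}

void fake_memwrite(FakeBus *bus, uint32_t addr, const void *buf, size_t len)
{
    memcpy(&bus->mem[addr], buf, len);
}

/*
 * Register-style accessors layered on top.  PCI is little endian, so the
 * byte at the lowest address is the least significant one, whatever the
 * host byte order is.
 */
uint16_t fake_readw(const FakeBus *bus, uint32_t addr)
{
    uint8_t b[2];

    fake_memread(bus, addr, b, sizeof(b));
    return (uint16_t)(b[0] | (b[1] << 8));
}

void fake_writel(FakeBus *bus, uint32_t addr, uint32_t value)
{
    uint8_t b[4] = {
        (uint8_t)value,
        (uint8_t)(value >> 8),
        (uint8_t)(value >> 16),
        (uint8_t)(value >> 24),
    };

    fake_memwrite(bus, addr, b, sizeof(b));
}
```

Because the wrappers assemble the value byte by byte, they behave the same
on big-endian and little-endian hosts, which is why the backend only needs
the memread/memwrite pair.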


-- 
Alexey





Re: [Qemu-devel] [virtio-dev] Re: [PATCH v9 00/12] virtio-crypto: introduce framework and device emulation

2016-10-25 Thread Gonglei (Arei)
Hi Michael,


> -Original Message-
> From: virtio-...@lists.oasis-open.org [mailto:virtio-...@lists.oasis-open.org]
> On Behalf Of Michael S. Tsirkin
> Sent: Wednesday, October 26, 2016 12:51 AM
> Subject: [virtio-dev] Re: [PATCH v9 00/12] virtio-crypto: introduce framework
> and device emulation
> 
> Will do.
> Meanwhile, could you please create and open the oasis tracker
> in jira, so we can vote on it?
> 
I created an oasis issue one month ago:

https://issues.oasis-open.org/i#browse/VIRTIO-153

I'll update the status based on the newest version of the virtio-crypto spec.


Regards,
-Gonglei

> 
> On Tue, Oct 25, 2016 at 11:20:35AM +, Gonglei (Arei) wrote:
> > Hi Michael and Stefan,
> >
> > Ping...
> >
> > Would you please review and/or merge this feature for QEMU 2.8,
> > because the soft-freeze period is drawing near.
> >
> > Thanks,
> > -Gonglei
> >
> >



Re: [Qemu-devel] [Qemu-stable] [Qemu-ppc] [PULL 0/4] ppc patches for qemu-2.7 stable branch

2016-10-25 Thread Michael Roth
Quoting David Gibson (2016-10-24 20:41:29)
> On Mon, Oct 17, 2016 at 04:24:31PM -0500, Michael Roth wrote:
> > Quoting Peter Maydell (2016-10-17 13:45:21)
> > > On 17 October 2016 at 19:13, Michael Roth  
> > > wrote:
> > > > We could do both though: use some ad-hoc way to tag for a particular
> > > > sub-maintainer tree/stable branch, as well as an explicit "not for
> > > > master" in the cover letter to ensure it doesn't go into master. It's a bit
> > > > more redundant, but flexible in that people can use whatever tagging
> > > > format they want for a particular tree.
> > > 
> > > Yes, that would be my preference. Gmail's filtering is not
> > > very good, and it doesn't seem to be able to support
> > > multiple or complex matches on the subject line, but
> > > it can deal with "doesn't include foo in body".
> > > People who actively want to look for stuff not to go
> > > into master can filter it however they like.
> > 
> > Sounds good to me. For my part I think "for-2.7.1" etc. would be
> > preferable. No need to resend this patchset though.
> > 
> > I suppose MAINTAINERS would be the best place to document something
> > like this?
> 
> So.. regardless of the outcome in general for future stable merges..
> 
> Has this batch been merged for 2.7 stable?  Or do I need to resend it
> in the new style?

No need to resend. I should have the initial staging tree for 2.7 posted
by Monday and will have this included.

> 
> -- 
> David Gibson| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
> | _way_ _around_!
> http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v5 08/13] qapi: Allow blockdev-add for NBD

2016-10-25 Thread Eric Blake
On 10/25/2016 08:11 AM, Max Reitz wrote:
> Signed-off-by: Max Reitz 
> ---
>  qapi/block-core.json | 25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 97b1205..4b4a74c 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1703,14 +1703,15 @@
>  #
>  # @host_device, @host_cdrom: Since 2.1
>  # @gluster: Since 2.7
> +# @nbd: Since 2.8

'replication' was also added in 2.8; we should mention it while touching
this.

>  #
>  # Since: 2.0
>  ##
>  { 'enum': 'BlockdevDriver',
>'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
>  'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom',
> -'host_device', 'http', 'https', 'luks', 'null-aio', 'null-co',
> -'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw',
> +'host_device', 'http', 'https', 'luks', 'nbd', 'null-aio',
> +'null-co', 'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw',
>   'replication', 'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }

Can we fix the TAB damage while at it?

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v4 1/1] qga: minimal support for fstrim for Windows guests

2016-10-25 Thread Michael Roth
Quoting Denis V. Lunev (2016-10-05 06:13:12)
> On 10/04/2016 04:43 PM, Marc-André Lureau wrote:
> > Hi
> >
> > On Mon, Oct 3, 2016 at 6:01 PM Denis V. Lunev wrote:
> >
> > Unfortunately, there is no public Windows API to start trimming the
> > filesystem. The only viable way here is to call 'defrag.exe /L' for
> > each volume.
> >
> > This is working since Win8 and Win2k12.
> >
> > Signed-off-by: Denis V. Lunev
> > Signed-off-by: Denis Plotnikov
> > CC: Michael Roth
> > CC: Stefan Weil
> > CC: Marc-André Lureau
> >
> >
> > overall looks good to me, few remarks below:
> >  
> >
> > ---
> >  qga/commands-win32.c | 97
> > ++--
> >  1 file changed, 94 insertions(+), 3 deletions(-)
> >
> > Changes from v3:
> > - fixed memory leak on error path for FindFirstVolumeW
> > - replaced g_malloc0 with g_malloc for uc_path. g_malloc is better
> > as we are
> >   allocating string, not an object
> >
> > Changes from v1, v2:
> > - next attempt to fix error handling on error in FindFirstVolumeW
> >
> > diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> > index 9c9be12..cebf4cc 100644
> > --- a/qga/commands-win32.c
> > +++ b/qga/commands-win32.c
> > @@ -840,8 +840,99 @@ static void guest_fsfreeze_cleanup(void)
> >  GuestFilesystemTrimResponse *
> >  qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **errp)
> >  {
> > -error_setg(errp, QERR_UNSUPPORTED);
> > -return NULL;
> > +GuestFilesystemTrimResponse *resp;
> > +HANDLE handle;
> > +WCHAR guid[MAX_PATH] = L"";
> > +
> > +handle = FindFirstVolumeW(guid, ARRAYSIZE(guid));
> > +if (handle == INVALID_HANDLE_VALUE) {
> > +error_setg_win32(errp, GetLastError(), "failed to find
> > any volume");
> > +return NULL;
> > +}
> > +
> > +resp = g_new0(GuestFilesystemTrimResponse, 1);
> > +
> > +do {
> > +GuestFilesystemTrimResult *res;
> > +GuestFilesystemTrimResultList *list;
> > +PWCHAR uc_path;
> > +DWORD char_count = 0;
> > +char *path, *out;
> > +GError *gerr = NULL;
> > +gchar * argv[4];
> > +
> > +GetVolumePathNamesForVolumeNameW(guid, NULL, 0, &char_count);
> > +
> >
> >
> > It assumes GetVolumePathNamesForVolumeNameW() == 0, perhaps better be
> > explicit about it with an assert() or a warning()?
> The original assumption was that in this case we'll call
> GetVolumePathNamesForVolumeNameW()
> with exactly the same parameter set and fail there.
> 
> 
> >
> > +if (GetLastError() != ERROR_MORE_DATA) {
> >
> >
> > Would it be useful to log the error in this case?
> >  
> >
> > +continue;
> > +}
> > +if (GetDriveTypeW(guid) != DRIVE_FIXED) {
> > +continue;
> > +}
> > +
> > +uc_path = g_malloc(sizeof(WCHAR) * char_count); 
> >
> > +if (!GetVolumePathNamesForVolumeNameW(guid, uc_path,
> > char_count,
> > +  &char_count) ||
> > !*uc_path) {
> > +/* strange, but this condition could be faced even
> > with size == 2 */
> >
> >
> > What size?
> >  
> with char_count == 2
> 
> > Same remark regarding logging error.
> >
> > +g_free(uc_path);
> > +continue;
> > +}
> > +
> > +res = g_new0(GuestFilesystemTrimResult, 1);
> > +
> > +path = g_utf16_to_utf8(uc_path, char_count, NULL, NULL,
> > &gerr);
> > +
> > +g_free(uc_path);
> > +
> > +if (gerr != NULL && gerr->code) {
> >
> >
> > Why check gerr->code? To be consistent with error checking code, I
> > would check if path == NULL instead, which by glib doc says that gerr
> > will be set in this case.
> >  
> ok

Thanks, applied to qga tree with the above suggestion squashed in:

  https://github.com/mdroth/qemu/commits/qga

> 
> > +res->has_error = true;
> > +res->error = g_strdup(gerr->message);
> > +g_error_free(gerr);
> > +break;
> > +}
> > +
> > +res->path = path;
> > +
> > +list = g_new0(GuestFilesystemTrimResultList, 1);
> > +list->value = res;
> > +list->next = resp->paths;
> > +
> > +resp->paths = list;
> > +
> > +memset(argv, 0, sizeof(argv));
> > +

Re: [Qemu-devel] [PATCH v2 3/4] sockets: add AF_VSOCK support

2016-10-25 Thread Michael Roth
Quoting Stefan Hajnoczi (2016-10-14 04:00:55)
> Add the AF_VSOCK address family so that qemu-ga will be able to use
> virtio-vsock.
> 
> The AF_VSOCK address family uses <cid, port> address tuples.  The cid is
> the unique identifier comparable to an IP address.  AF_VSOCK does not
> use name resolution so it's easy to convert between struct sockaddr_vm
> and strings.
> 
> This patch defines a VsockSocketAddress instead of trying to piggy-back
> on InetSocketAddress.  This is cleaner in the long run since it avoids
> lots of IPv4 vs IPv6 vs vsock special casing.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
> v2:
>  * s/seasy/easy/ typo fix in commit description [Eric]
>  * Use %n to check for trailing characters in addresses [Eric]
> ---
>  qapi-schema.json|  23 +-
>  util/qemu-sockets.c | 227 
> 
>  2 files changed, 249 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 9e47b47..12aea99 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -988,12 +988,14 @@
>  #
>  # @unix: unix socket
>  #
> +# @vsock: vsock family (since 2.8)
> +#
>  # @unknown: otherwise
>  #
>  # Since: 2.1
>  ##
>  { 'enum': 'NetworkAddressFamily',
> -  'data': [ 'ipv4', 'ipv6', 'unix', 'unknown' ] }
> +  'data': [ 'ipv4', 'ipv6', 'unix', 'vsock', 'unknown' ] }
> 
>  ##
>  # @VncBasicInfo
> @@ -3018,6 +3020,24 @@
>  'path': 'str' } }
> 
>  ##
> +# @VsockSocketAddress
> +#
> +# Captures a socket address in the vsock namespace.
> +#
> +# @cid: unique host identifier
> +# @port: port
> +#
> +# Note that string types are used to allow for possible future hostname or
> +# service resolution support.
> +#
> +# Since 2.8
> +##
> +{ 'struct': 'VsockSocketAddress',
> +  'data': {
> +'cid': 'str',
> +'port': 'str' } }
> +
> +##
>  # @SocketAddress
>  #
>  # Captures the address of a socket, which could also be a named file 
> descriptor
> @@ -3028,6 +3048,7 @@
>'data': {
>  'inet': 'InetSocketAddress',
>  'unix': 'UnixSocketAddress',
> +'vsock': 'VsockSocketAddress',
>  'fd': 'String' } }
> 
>  ##
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index 6db48b3..6ef3cc5 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -17,6 +17,10 @@
>   */
>  #include "qemu/osdep.h"
> 
> +#ifdef AF_VSOCK
> +#include <linux/vm_sockets.h>
> +#endif /* AF_VSOCK */

I have this series applied locally but I hit some build issues on Ubuntu
14.04 due to linux/vm_sockets.h not being provided by Ubuntu 14.04's
linux-libc-dev package. It is however included with linux-libc-dev in
16.04. linux-headers package includes it in both cases, but installs
to /usr/src/linux-headers*, which are not part of the default include
path.

Do you think we need a configure check and CONFIG_AF_VSOCK flag instead?
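
Independently of the header question, the range check that
vsock_parse_vaddr_to_sockaddr() applies to cid and port in the hunk below
can be tried in isolation. The following is only a sketch using the
standard C library: parse_u32_decimal() is a hypothetical stand-in for the
parse_uint_full() call plus the UINT32_MAX comparison, not the QEMU helper
itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a base-10 string into a uint32_t.
 * Rejects empty strings, signs, leading whitespace, trailing characters
 * and values that do not fit in 32 bits.
 */
bool parse_u32_decimal(const char *s, uint32_t *out)
{
    char *end;
    unsigned long long val;

    /* strtoull() accepts whitespace and signs; require a leading digit. */
    if (s == NULL || *s < '0' || *s > '9') {
        return false;
    }

    errno = 0;
    val = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0' || val > UINT32_MAX) {
        return false;
    }

    *out = (uint32_t)val;
    return true;
}
```

Both svm_cid and svm_port are 32-bit fields, which is why the same check
is applied to each string.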

> +
>  #include "monitor/monitor.h"
>  #include "qapi/error.h"
>  #include "qemu/sockets.h"
> @@ -75,6 +79,9 @@ NetworkAddressFamily inet_netfamily(int family)
>  case PF_INET6: return NETWORK_ADDRESS_FAMILY_IPV6;
>  case PF_INET:  return NETWORK_ADDRESS_FAMILY_IPV4;
>  case PF_UNIX:  return NETWORK_ADDRESS_FAMILY_UNIX;
> +#ifdef AF_VSOCK
> +case PF_VSOCK: return NETWORK_ADDRESS_FAMILY_VSOCK;
> +#endif /* AF_VSOCK */
>  }
>  return NETWORK_ADDRESS_FAMILY_UNKNOWN;
>  }
> @@ -650,6 +657,181 @@ int inet_connect(const char *str, Error **errp)
>  return sock;
>  }
> 
> +#ifdef AF_VSOCK
> +static bool vsock_parse_vaddr_to_sockaddr(const VsockSocketAddress *vaddr,
> +  struct sockaddr_vm *svm,
> +  Error **errp)
> +{
> +unsigned long long val;
> +
> +memset(svm, 0, sizeof(*svm));
> +svm->svm_family = AF_VSOCK;
> +
> +if (parse_uint_full(vaddr->cid, &val, 10) < 0 ||
> +val > UINT32_MAX) {
> +error_setg(errp, "Failed to parse cid '%s'", vaddr->cid);
> +return false;
> +}
> +svm->svm_cid = val;
> +
> +if (parse_uint_full(vaddr->port, &val, 10) < 0 ||
> +val > UINT32_MAX) {
> +error_setg(errp, "Failed to parse port '%s'", vaddr->port);
> +return false;
> +}
> +svm->svm_port = val;
> +
> +return true;
> +}
> +
> +static int vsock_connect_addr(const struct sockaddr_vm *svm, bool 
> *in_progress,
> +  ConnectState *connect_state, Error **errp)
> +{
> +int sock, rc;
> +
> +*in_progress = false;
> +
> +sock = qemu_socket(AF_VSOCK, SOCK_STREAM, 0);
> +if (sock < 0) {
> +error_setg_errno(errp, errno, "Failed to create socket");
> +return -1;
> +}
> +if (connect_state != NULL) {
> +qemu_set_nonblock(sock);
> +}
> +/* connect to peer */
> +do {
> +rc = 0;
> +if (connect(sock, (const struct sockaddr *)svm, sizeof(*svm)) < 0) {
> +rc = -errno;
> +}
> +} while (rc == -EINTR);
> +
> +if (connect_state != NULL && 

Re: [Qemu-devel] [PATCH RFC] acpi: don't build acpi tables for xen hvm guests

2016-10-25 Thread Stefano Stabellini
CC'ing maintainers

On Tue, 25 Oct 2016, Wei Liu wrote:
> Xen's toolstack is in charge of building ACPI tables. Skip acpi table
> building if running on Xen.
> 
> This issue was discovered because direct kernel boot on Xen no longer
> works: the new ACPI tables cause the guest to exceed its memory
> allocation limit.
> 
> Reported-by: Sander Eikelenboom 
> Signed-off-by: Wei Liu 

Hi Wei,
thanks for the patch. I think the right fix is to set

pcmc->has_acpi_build = false

for the xenfv machine and for the PC machine when accel=xen.

Thoughts?


> Cc: Anthony PERARD 
> Cc: Stefano Stabellini 
> 
> RFC because I'm not sure this is the best way to fix it.
> ---
>  hw/i386/acpi-build.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index a26a4bb..2cdff12 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -45,6 +45,7 @@
>  #include "sysemu/tpm_backend.h"
>  #include "hw/timer/mc146818rtc_regs.h"
>  #include "sysemu/numa.h"
> +#include "hw/xen/xen.h"
>  
>  /* Supported chipsets: */
>  #include "hw/acpi/piix4.h"
> @@ -2865,6 +2866,11 @@ void acpi_setup(void)
>  return;
>  }
>  
> +if (xen_enabled()) {
> +ACPI_BUILD_DPRINTF("Xen enabled. Bailing out.\n");
> +return;
> +}
> +
>  build_state = g_malloc0(sizeof *build_state);
>  
>  acpi_set_pci_info();
> -- 
> 2.1.4
> 



Re: [Qemu-devel] [PATCH 4/5] curses: add option to specify VGA font encoding

2016-10-25 Thread Samuel Thibault
Hello,

It seems that in the flurry of qemu-devel mails, I missed this answer.

Paolo Bonzini wrote:
> > +#ifdef CONFIG_ICONV
> > +if (font_charset) {
> > +unsigned char ch;
> > +wchar_t wch;
> > +char *pch, *pwch;
> > +size_t sch, swch;
> > +iconv_t conv;
> > +
> > +conv = iconv_open("WCHAR_T", font_charset);
> 
> Is this portable?

I confirm it works on at least glibc (thus Linux and other GNUs), MacOS,
OpenBSD, FreeBSD, Windows mingw.

Samuel



Re: [Qemu-devel] qemu-ga virtio-serial socket clarification

2016-10-25 Thread Matt Broadstone
On Tue, Oct 25, 2016 at 6:27 PM, Stefan Hajnoczi  wrote:

> On Tue, Oct 25, 2016 at 7:14 PM, Matt Broadstone 
> wrote:
> > I've been attempting an experimental qemu agent using a node.js daemon on
> > the host side, and have run into an issue I was hoping someone here might
> > be able to help with:
> >
> > Using libvirt I've set up a 'unix' channel for a domain using
> virtio-serial
> > (the same way you would for the existing qemu agent) with the name
> > 'test.agent', in order to bypass libvirt taking ownership of the domain
> > socket. This works as expected, and so does the following test:
> >
> >  - [host] $ echo "testing" | nc -U
> > /var/lib/libvirt/qemu/channel/target/domain-T40001/test.agent
> >  - [guest] $ cat -v < /dev/virtio-ports/test.agent
> >
> > Then I tried the same test, converting the host->guest communication to
> > node.js:
> >
> > 'use strict';
> > const net = require('net');
> > const socketPath =
> > '/var/lib/libvirt/qemu/channel/target/domain-T40001/test.agent';
> > let socket = net.createConnection(socketPath);
> > socket.write('testing');
> >
> > In this case the data makes it across to the guest, however until I
> > explicitly close the socket on the sender side (`socket.write('testing',
> ()
> > => socket.end())`) both sides block indefinitely. I understand closing
> the
> > socket brings the node example to parity with the netcat one, however
> after
> > perusing the qemu-ga and libvirt repositories it looks like glib's io
> > channels are being used on a single socket, and effectively handling
> > bidirectional data.
> >
> > Is this the expected behavior?
> >
> > This would seem to imply that normal async communication over the domain
> > socket is somehow different in the virtio-serial case (as in I can't
> > maintain a duplex socket, but would rather have to juggle opening and
> > closing read/write sockets). In my research I came across another similar
> > project: https://github.com/xolox/python-negotiator, which requires two
> > channels: one for host->guest communication, and another for guest->host
> > communication, likely because of this very issue.
>
> virtio-serial is full-duplex.
>
> Please post the receive side test program you are using.
>
> Stefan
>

Stefan,

The receive side in this case is the same as above: `cat -v <
/dev/virtio-ports/test.agent`, the only variable here is the sending side
changing to the posted node script.

Matt


[Qemu-devel] [PATCH v6] Add 'offset' and 'size' options

2016-10-25 Thread Tomáš Golembiovský
v5 -> v6:
- fix alignment check condition
- when size is not specified and device size is being used take offset
  into account

v4 -> v5:
- added two missing overflow checks
- comments from Eric Blake:
  - renamed 'fail' label to 'end'
  - fixed optional fields in JSON schema
  - no punctuation at the end of error_setg() message
  - spaces around PRI* macros
  - using QEMU_IS_ALIGNED
  - typos

v3 -> v4:
- fix stupid compilation error and formatting issue

v2 -> v3:
- changed overflow check to make it clearer
- produce error instead of warning when size is not multiple of sector
  size

v1 -> v2:
- options were moved from 'file' driver into 'raw' driver as suggested
- added support for writing, reopen and truncate when possible
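
The checks mentioned in the changelog above (overflow, alignment, and
taking the offset into account) can be sketched in standalone C as
follows. This is not the patch itself: raw_window_valid() and
raw_translate() are hypothetical names, and SECTOR_SIZE is assumed to
match BDRV_SECTOR_SIZE (512 bytes).

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumption: same value as BDRV_SECTOR_SIZE */

/*
 * Hypothetical helper: the window [offset, offset + size) must lie inside
 * the containing file, and size must be a multiple of the sector size so
 * that rounding up cannot leak past the end of the window.  The
 * comparison is written as a subtraction so it cannot overflow.
 */
bool raw_window_valid(uint64_t file_size, uint64_t offset, uint64_t size)
{
    if (file_size < offset || file_size - offset < size) {
        return false;
    }
    return size % SECTOR_SIZE == 0;
}

/*
 * Hypothetical helper: translate an offset inside the window into an
 * offset in the containing file, refusing requests that would wrap
 * around UINT64_MAX.  Returns 0 on success, -1 on overflow.
 */
int raw_translate(uint64_t window_offset, uint64_t req_offset, uint64_t *out)
{
    if (req_offset > UINT64_MAX - window_offset) {
        return -1;
    }
    *out = req_offset + window_offset;
    return 0;
}
```

When no size is given, the window size would be file_size - offset, which
is the "take offset into account" item from the v5 -> v6 notes.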

Tomáš Golembiovský (1):
  raw_bsd: add offset and size options

 block/raw_bsd.c  | 176 ++-
 qapi/block-core.json |  16 -
 2 files changed, 188 insertions(+), 4 deletions(-)

-- 
2.10.1




[Qemu-devel] [PATCH v6] raw_bsd: add offset and size options

2016-10-25 Thread Tomáš Golembiovský
Add two new options, 'offset' and 'size'. They make it possible to use
only part of a file as a device. This can be used, for example, to limit
access to a single partition in a disk image or to use a disk inside a
tar archive (like OVA).

When 'size' is specified we do our best to honour it.

Signed-off-by: Tomáš Golembiovský 
---
 block/raw_bsd.c  | 176 ++-
 qapi/block-core.json |  16 -
 2 files changed, 188 insertions(+), 4 deletions(-)

diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 588d408..9eb187a 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -31,6 +31,30 @@
 #include "qapi/error.h"
 #include "qemu/option.h"
 
+typedef struct BDRVRawState {
+uint64_t offset;
+uint64_t size;
+bool has_size;
+} BDRVRawState;
+
+static QemuOptsList raw_runtime_opts = {
+.name = "raw",
+.head = QTAILQ_HEAD_INITIALIZER(raw_runtime_opts.head),
+.desc = {
+{
+.name = "offset",
+.type = QEMU_OPT_SIZE,
+.help = "offset in the disk where the image starts",
+},
+{
+.name = "size",
+.type = QEMU_OPT_SIZE,
+.help = "virtual disk size",
+},
+{ /* end of list */ }
+},
+};
+
 static QemuOptsList raw_create_opts = {
 .name = "raw-create-opts",
 .head = QTAILQ_HEAD_INITIALIZER(raw_create_opts.head),
@@ -44,16 +68,108 @@ static QemuOptsList raw_create_opts = {
 }
 };
 
+static int raw_read_options(QDict *options, BlockDriverState *bs,
+BDRVRawState *s, Error **errp)
+{
+Error *local_err = NULL;
+QemuOpts *opts = NULL;
+int64_t real_size = 0;
+int ret;
+
+real_size = bdrv_getlength(bs->file->bs);
+if (real_size < 0) {
+error_setg_errno(errp, -real_size, "Could not get image size");
+return real_size;
+}
+
+opts = qemu_opts_create(&raw_runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto end;
+}
+
+s->offset = qemu_opt_get_size(opts, "offset", 0);
+if (qemu_opt_find(opts, "size") != NULL) {
+s->size = qemu_opt_get_size(opts, "size", 0);
+s->has_size = true;
+} else {
+s->has_size = false;
+s->size = real_size - s->offset;
+}
+
+/* Check size and offset */
+if (real_size < s->offset || (real_size - s->offset) < s->size) {
+error_setg(errp, "The sum of offset (%" PRIu64 ") and size "
+"(%" PRIu64 ") has to be smaller or equal to the "
+" actual size of the containing file (%" PRId64 ")",
+s->offset, s->size, real_size);
+ret = -EINVAL;
+goto end;
+}
+
+/* Make sure size is multiple of BDRV_SECTOR_SIZE to prevent rounding
+ * up and leaking out of the specified area. */
+if (!QEMU_IS_ALIGNED(s->size, BDRV_SECTOR_SIZE)) {
+error_setg(errp, "Specified size is not multiple of %llu",
+BDRV_SECTOR_SIZE);
+ret = -EINVAL;
+goto end;
+}
+
+ret = 0;
+
+end:
+
+qemu_opts_del(opts);
+
+return ret;
+}
+
 static int raw_reopen_prepare(BDRVReopenState *reopen_state,
   BlockReopenQueue *queue, Error **errp)
 {
-return 0;
+assert(reopen_state != NULL);
+assert(reopen_state->bs != NULL);
+
+reopen_state->opaque = g_new0(BDRVRawState, 1);
+
+return raw_read_options(
+reopen_state->options,
+reopen_state->bs,
+reopen_state->opaque,
+errp);
+}
+
+static void raw_reopen_commit(BDRVReopenState *state)
+{
+BDRVRawState *new_s = state->opaque;
+BDRVRawState *s = state->bs->opaque;
+
+memcpy(s, new_s, sizeof(BDRVRawState));
+
+g_free(state->opaque);
+state->opaque = NULL;
+}
+
+static void raw_reopen_abort(BDRVReopenState *state)
+{
+g_free(state->opaque);
+state->opaque = NULL;
 }
 
 static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
   uint64_t bytes, QEMUIOVector *qiov,
   int flags)
 {
+BDRVRawState *s = bs->opaque;
+
+if (offset > UINT64_MAX - s->offset) {
+return -EINVAL;
+}
+offset += s->offset;
+
 BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
 return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
 }
@@ -62,11 +178,23 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState 
*bs, uint64_t offset,
uint64_t bytes, QEMUIOVector *qiov,
int flags)
 {
+BDRVRawState *s = bs->opaque;
 void *buf = NULL;
 BlockDriver *drv;
 QEMUIOVector local_qiov;
 int ret;
 
+if (s->has_size && (offset > s->size || bytes > (s->size - offset))) {
+/* There's not enough space for the data. Don't write anything and just
+ * fail 

Re: [Qemu-devel] [Bug 1630723] [NEW] UART writes to netduino2/stm32f205-soc disappear

2016-10-25 Thread Alistair Francis
On Thu, Oct 20, 2016 at 3:55 PM, Seth K  wrote:
> I've narrowed this down. In exec.c the address is reduced by
> section->offset_within_address_space. However, half the time that seems to
> be wrong.
>
> For usart1 at 40011004 it is 40011000, a difference of 4 which signals a
> usart write.
>
> For usart2 at 40004404 it is 4c00, a difference of 3804 which means
> nothing.
>
> On Wed, Oct 12, 2016 at 6:25 PM, Seth K  wrote:
>>
>> It's a bare metal program so I don't really have anywhere to print to,
>> other than my custom function to output to the uart. I did double check all
>> the address to make sure they agreed with the documentation and the Qemu
>> source code. I tried changing around the destinations of the output just to
>> verify the order of the write or the destination somehow affected the
>> output. I tried being tricky, like instead of writing to usart 3 I wrote to
>> uart 4 - 0x400 (the same address, it didn't work). The code should be simple
>> enough that I don't have room for any crazy mistakes:
>>
>> volatile unsigned char * const USART1_PTR = (unsigned char *)0x40011000;
>> volatile unsigned char * const USART2_PTR = (unsigned char *)0x40004400;
>> volatile unsigned char * const USART3_PTR = (unsigned char *)0x40004800;
>> volatile unsigned char * const UART4_PTR = (unsigned char *)0x40004c00;
>>
>> void display(const char *string, volatile unsigned char * uart_addr){
>>   while(*string != '\0'){
>> *(uart_addr+4) = *string;
>> string++;
>>   }
>> }
>>
>> int my_init(){
>>   display("Test 1/4\n", USART1_PTR);
>>   display("Test 2/4\n", USART2_PTR);
>>   display("Test 3/4\n", USART3_PTR);
>>   display("Test 4/4\n", UART4_PTR);
>> }
>>
>>
>> In the past I ran a really long test where I wrote to every possible
>> address just to see what happens. No unexpected output occurred. I can do
>> that test again, but it takes hours. I could also write code to convert the
>> address to something printable to verify the address isn't being changed,
>> but that seems unlikely.
>>
>> Another thought I had is maybe there is some sort of interaction between
>> where I am setting the stack top - 0x20001000 - but that doesn't seem like
>> it should interfere. Maybe the linker or objcopy are doing something crazy?
>>
>> I don't understand Qemu enough to know what should be calling the
>> functions that handle UART read/write. Is there something I should look at
>> in Qemu and try to intercept?

Try this diff to enable debug prints. That should print more
information about what is happening in QEMU

diff --git a/hw/char/stm32f2xx_usart.c b/hw/char/stm32f2xx_usart.c
index 4c6640d..4be093d 100644
--- a/hw/char/stm32f2xx_usart.c
+++ b/hw/char/stm32f2xx_usart.c
@@ -27,7 +27,7 @@
 #include "qemu/log.h"

 #ifndef STM_USART_ERR_DEBUG
-#define STM_USART_ERR_DEBUG 0
+#define STM_USART_ERR_DEBUG 1
 #endif

 #define DB_PRINT_L(lvl, fmt, args...) do { \

When the guest writes to a register it should call back to the
stm32f2xx_usart_write() function. Make sure that is happening and the
offsets are correct.
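
As a rough way to sanity-check the offsets, here is a small standalone
sketch (not code from QEMU). The helper name usart_reg_offset() is
hypothetical, the 0x400 window size is the value discussed in this thread,
and 0x04 is the data register offset on STM32F2 USARTs. The guest code
quoted above writes to uart_addr+4, so the write callback should see
offset 0x04 for each device.

```c
#include <assert.h>
#include <stdint.h>

#define USART_REGION_SIZE 0x400u /* assumed per-device MMIO window size */
#define USART_DR_OFFSET   0x04u  /* data register offset on STM32F2 USARTs */

/*
 * Return the offset of addr inside the window that starts at base,
 * or -1 if addr falls outside that window.
 */
static int64_t usart_reg_offset(uint32_t base, uint32_t addr)
{
    assert(base % USART_REGION_SIZE == 0); /* windows are size-aligned */
    if (addr < base || addr - base >= USART_REGION_SIZE) {
        return -1;
    }
    return (int64_t)(addr - base);
}
```

If the offset printed by the debug output differs from what a helper like
this gives for the address the guest used, the write is probably landing in
a different device's window.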

Thanks,

Alistair

>>
>> On Fri, Oct 7, 2016 at 6:27 PM, Alistair Francis 
>> wrote:
>>>
>>> On Fri, Oct 7, 2016 at 1:04 PM, Seth K  wrote:
>>> > I applied that patch, made qemu and ran my code, I didn't see a change.
>>> >
>>> > According to the STM32F20xxx memory map, the memory range seems to be
>>> > 0x400 -- UART 1 is listed as 0x4001 - 0x400103FF. Should that memory
>>> > region be set to 0x400?
>>>
>>> I was hoping that would have fixed it.
>>>
>>> It sounds like it should be 0x400 then, although it doesn't sound like
>>> this is causing this issue.
>>>
>>> >
>>> > I tried that too, no change yet, but maybe I should look at the other
>>> > memory settings.
>>>
>>> Maybe, it is very strange that it's not reaching the read/write
>>> functions.
>>>
>>> Can you try putting print statements in the guest software to make
>>> sure it is writing to the locations you expect and then make sure
>>> there are no conditionals in QEMU that cause the print statements to
>>> not be printed. See what that uncovers.
>>>
>>> Thanks,
>>>
>>> Alistair
>>>
>>> >
>>> > I also tried making these changes in another branch where I made this
>>> > chip have 8 UARTS. That was unchanged: I can only output UARTS 1,4,5,6.
>>> >
>>> > On Fri, Oct 7, 2016 at 12:10 PM, Alistair Francis
>>> > 
>>> > wrote:
>>> >>
>>> >> On Fri, Oct 7, 2016 at 9:03 AM, Alistair Francis
>>> >> 
>>> >> wrote:
>>> >> > On Fri, Oct 7, 2016 at 8:59 AM, Seth K  wrote:
>>> >> >> The only machine I saw listed in the help output is "netduino2." I
>>> >> >> pulled QEMU from github, was that the right thing to do?
>>> >> >>
>>> >> >> I found the specifications for the stm32f2xx and some similar chips
>>> >> >> and verified the addresses and interrupts are correct.
>>> >> 

Re: [Qemu-devel] [PATCH v2] char: cadence: check baud rate generator and divider values

2016-10-25 Thread Alistair Francis
On Tue, Oct 25, 2016 at 11:24 AM, P J P  wrote:
>Hello Alistair,
>
> +-- On Tue, 25 Oct 2016, Alistair Francis wrote --+
> | >   * Device model for Cadence UART
> | > + *  -> http://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf
> |
> | Can you say what page/section the UART spec is in the Xilinx TRM?
>
>   Chapter 19 UART Controller, page 585, 19.2.3 Baud Rate Generator.
>
> | I think it might also be worth noting that the datasheet is a Xilinx
> | datasheet that covers the Cadence UART. Others might be using the IP
> | as well and might get confused why you are referring to a Xilinx
> | datasheet.
>
>   Right, I'll add above section details in the comment.
>
> | > +case R_BRGR: /* Baud rate generator */
> | > +s->r[offset] = 0x028B; /* default reset value */
> |
> | Is this the correct behavior, or should the write just be ignored?
> | pg.587 of the TRM doesn't really make this clear, did you find this
> | somewhere else?
>
>   True, page 587 does not clearly mention if it should be ignored.
> But in Appendix B, Register details for 'Baud_rate_gen_reg0' says
>
> 0: Disables baud_sample
> 1: Clock divisor bypass (baud_sample = sel_clk)
> 2 - 65535: baud_sample
>
> | > +case R_BDIV:/* Baud rate divider */
> | > +s->r[offset] = 0x0F;
>
>   Appendix B, Register details for 'Baud_rate_divider_reg0' says
>
> 0 - 3: ignored
> 4 - 255: Baud rate
>
>
> i.e. values 0-3 are ignored. But should we avoid writing 's->r[R_BRGR]' &
> 's->r[R_BDIV]' for these values? That would lead to undefined values being
> used in 'uart_parameters_setup()', no?

I think your email crossed with Peter. Have a look at what he said.
That should clarify everything.
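
For reference, a small illustration of why these two values matter. This is
a sketch only, not the patch and not QEMU's uart_parameters_setup(). It
assumes the TRM formula baud = sel_clk / (CD * (BDIV + 1)); the function
name cadence_baud() and its parameters are hypothetical. The point is that a
CD of 0 has to be handled before the division.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the baud rate from the generator value (cd) and divider (bdiv).
 * Returns 0 when cd is 0, which the register description quoted above
 * lists as "Disables baud_sample"; this also avoids a division by zero.
 */
static uint32_t cadence_baud(uint32_t sel_clk, uint32_t cd, uint32_t bdiv)
{
    assert(cd <= 0xffff && bdiv <= 0xff); /* field widths: 16 and 8 bits */
    if (cd == 0) {
        return 0;
    }
    /* cd * (bdiv + 1) is at most 65535 * 256, so it fits in 32 bits */
    return sel_clk / (cd * (bdiv + 1));
}
```

With the reset values quoted above (0x028B and 0x0F) and an assumed 50 MHz
sel_clk, this works out to 4800.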

Thanks,

Alistair

>
> Thank you.
> --
> Prasad J Pandit / Red Hat Product Security Team
> 47AF CE69 3A90 54AA 9045 1053 DD13 3D32 FE5B 041F
>



Re: [Qemu-devel] qemu-ga virtio-serial socket clarification

2016-10-25 Thread Stefan Hajnoczi
On Tue, Oct 25, 2016 at 7:14 PM, Matt Broadstone  wrote:
> I've been attempting an experimental qemu agent using a node.js daemon on
> the host side, and have run into an issue I was hoping someone here might
> be able to help with:
>
> Using libvirt I've set up a 'unix' channel for a domain using virtio-serial
> (the same way you would for the existing qemu agent) with the name
> 'test.agent', in order to bypass libvirt taking ownership of the domain
> socket. This works as expected, and so does the following test:
>
>  - [host] $ echo "testing" | nc -U
> /var/lib/libvirt/qemu/channel/target/domain-T40001/test.agent
>  - [guest] $ cat -v < /dev/virtio-ports/test.agent
>
> Then I tried the same test, converting the host->guest communication to
> node.js:
>
> 'use strict';
> const net = require('net');
> const socketPath =
> '/var/lib/libvirt/qemu/channel/target/domain-T40001/test.agent';
> let socket = net.createConnection(socketPath);
> socket.write('testing');
>
> In this case the data makes it across to the guest, however until I
> explicitly close the socket on the sender side (`socket.write('testing', ()
> => socket.end())`) both sides block indefinitely. I understand closing the
> socket brings the node example to parity with the netcat one, however after
> perusing the qemu-ga and libvirt repositories it looks like glib's io
> channels are being used on a single socket, and effectively handling
> bidirectional data.
>
> Is this the expected behavior?
>
> This would seem to imply that normal async communication over the domain
> socket is somehow different in the virtio-serial case (as in I can't
> maintain a duplex socket, but would rather have to juggle opening and
> closing read/write sockets). In my research I came across another similar
> project: https://github.com/xolox/python-negotiator, which requires two
> channels: one for host->guest communication, and another for guest->host
> communication, likely because of this very issue.

virtio-serial is full-duplex.

Please post the receive side test program you are using.
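
To illustrate what full-duplex means here, a minimal sketch follows. It is
not a virtio-serial test: it uses socketpair() as a stand-in for the host
socket and the guest port, and the function name duplex_round_trip() is made
up for this example. It sends one message in each direction without closing
either end in between.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Exchange one short message in each direction over a connected
 * AF_UNIX stream socket pair, keeping both ends open throughout.
 * Returns 0 on success, -1 on any failure.
 */
static int duplex_round_trip(void)
{
    int sv[2];
    char buf[32];
    ssize_t n;
    int ret = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    /* first direction: end 0 writes, end 1 reads */
    if (write(sv[0], "testing\n", 8) != 8) {
        goto out;
    }
    n = read(sv[1], buf, sizeof(buf));
    if (n != 8 || memcmp(buf, "testing\n", 8) != 0) {
        goto out;
    }

    /* opposite direction on the same pair, nothing was closed */
    if (write(sv[1], "ack\n", 4) != 4) {
        goto out;
    }
    n = read(sv[0], buf, sizeof(buf));
    if (n != 4 || memcmp(buf, "ack\n", 4) != 0) {
        goto out;
    }
    ret = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

Note the reader here returns as soon as some bytes arrive; a reader that
waits for end-of-file will block until the writer closes its end.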

Stefan



Re: [Qemu-devel] [PATCH 0/5] curses: wide character support

2016-10-25 Thread Samuel Thibault
Hello,

Ping?

Samuel Thibault, on Sat 15 Oct 2016 21:53:03 +0200, wrote:
> This patch series adds wide character support to the curses frontend of qemu,
> thus allowing to fix a lot of input and output issues with e.g. accented
> letters and semi-graphic glyphs. Since qemu can't know the encoding of the VGA
> font, the user has to specify it (just like he has to specify the keyboard
> layout with -k). I used option -f to make it simple for now, but I welcome any
> other idea :)

I forgot to mention that I updated the patches according to the reviews
on the list (assume cursesw support, and use -display option).

Samuel

> Samuel Thibault (5):
>   curses: fix left/right arrow translation
>   curses: Use cursesw instead of curses
>   curses: use wide output functions
>   curses: add option to specify VGA font encoding
>   curses: support wide input
> 
>  configure   |  71 ++--
>  hw/display/vga.c|   4 +-
>  include/sysemu/sysemu.h |   1 +
>  include/ui/console.h|  16 +-
>  qemu-options.hx |   5 +-
>  ui/curses.c | 436 +---
>  ui/curses_keys.h| 113 +++--
>  vl.c|  21 ++-
>  8 files changed, 543 insertions(+), 124 deletions(-)
> 
> -- 
> 2.9.3
> 

-- 
Samuel
You have read the docs. You have become a computer scientist. Whether you
like it or not. Reading the docs is the First and Only Commandment of the
computer scientist.
-+- TP in: Guide du Linuxien pervers - "L'évangile selon St Thomas"



[Qemu-devel] [PATCH v1] block/vxhs: Add Veritas HyperScale VxHS block device support

2016-10-25 Thread Ashish Mittal
This patch adds support for a new block device type called "vxhs".
Source code for the library that this code loads can be downloaded from:
https://github.com/MittalAshish/libqnio.git

Sample command line using JSON syntax:
./qemu-system-x86_64 -name instance-0008 -S -vnc 0.0.0.0:0 -k en-us -vga 
cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg 
timestamp=on 
'json:{"driver":"vxhs","vdisk_id":"{c3e9095a-a5ee-4dce-afeb-2a59fb387410}","server":[{"host":"172.172.17.4","port":""}]}'

Sample command line using URI syntax:
qemu-img convert -f raw -O raw -n 
/var/lib/nova/instances/_base/0c5eacd5ebea5ed914b6a3e7b18f1ce734c386ad 
vxhs://192.168.0.1:/%7Bc6718f6b-0401-441d-a8c3-1f0064d75ee0%7D

Signed-off-by: Ashish Mittal 
---
 block/Makefile.objs |   2 +
 block/trace-events  |  22 ++
 block/vxhs.c| 736 
 configure   |  41 +++
 4 files changed, 801 insertions(+)
 create mode 100644 block/vxhs.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 67a036a..58313a2 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -18,6 +18,7 @@ block-obj-$(CONFIG_LIBNFS) += nfs.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 block-obj-$(CONFIG_GLUSTERFS) += gluster.o
+block-obj-$(CONFIG_VXHS) += vxhs.o
 block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o dirty-bitmap.o
@@ -38,6 +39,7 @@ rbd.o-cflags   := $(RBD_CFLAGS)
 rbd.o-libs := $(RBD_LIBS)
 gluster.o-cflags   := $(GLUSTERFS_CFLAGS)
 gluster.o-libs := $(GLUSTERFS_LIBS)
+vxhs.o-libs:= $(VXHS_LIBS)
 ssh.o-cflags   := $(LIBSSH2_CFLAGS)
 ssh.o-libs := $(LIBSSH2_LIBS)
 archipelago.o-libs := $(ARCHIPELAGO_LIBS)
diff --git a/block/trace-events b/block/trace-events
index 05fa13c..aea97cb 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -114,3 +114,25 @@ qed_aio_write_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s
 qed_aio_write_prefill(void *s, void *acb, uint64_t start, size_t len, uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
 qed_aio_write_postfill(void *s, void *acb, uint64_t start, size_t len, uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
 qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
+
+# block/vxhs.c
+vxhs_iio_callback(int error, int reason) "ctx is NULL: error %d, reason %d"
+vxhs_setup_qnio(void *s) "Context to HyperScale IO manager = %p"
+vxhs_iio_callback_chnfail(int err, int error) "QNIO channel failed, no i/o %d, %d"
+vxhs_iio_callback_unknwn(int opcode, int err) "unexpected opcode %d, errno %d"
+vxhs_open_fail(int ret) "Could not open the device. Error = %d"
+vxhs_open_epipe(int ret) "Could not create a pipe for device. Bailing out. Error=%d"
+vxhs_aio_rw_invalid(int req) "Invalid I/O request iodir %d"
+vxhs_aio_rw_ioerr(char *guid, int iodir, uint64_t size, uint64_t off, void *acb, int ret, int err) "IO ERROR (vDisk %s) FOR : Read/Write = %d size = %lu offset = %lu ACB = %p. Error = %d, errno = %d"
+vxhs_get_vdisk_stat_err(char *guid, int ret, int err) "vDisk (%s) stat ioctl failed, ret = %d, errno = %d"
+vxhs_get_vdisk_stat(char *vdisk_guid, uint64_t vdisk_size) "vDisk %s stat ioctl returned size %lu"
+vxhs_qnio_iio_open(const char *ip) "Failed to connect to storage agent on host-ip %s"
+vxhs_qnio_iio_devopen(const char *fname) "Failed to open vdisk device: %s"
+vxhs_complete_aio(void *acb, uint64_t ret) "aio failed acb %p ret %ld"
+vxhs_parse_uri_filename(const char *filename) "URI passed via bdrv_parse_filename %s"
+vxhs_qemu_init_vdisk(const char *vdisk_id) "vdisk_id from json %s"
+vxhs_qemu_init_numservers(int num_servers) "Number of servers passed = %d"
+vxhs_parse_uri_hostinfo(int num, char *host, int port) "Host %d: IP %s, Port %d"
+vxhs_qemu_init(char *of_vsa_addr, int port) "Adding host %s:%d to BDRVVXHSState"
+vxhs_qemu_init_filename(const char *filename) "Filename passed as %s"
+vxhs_close(char *vdisk_guid) "Closing vdisk %s"
diff --git a/block/vxhs.c b/block/vxhs.c
new file mode 100644
index 000..97fb804
--- /dev/null
+++ b/block/vxhs.c
@@ -0,0 +1,736 @@
+/*
+ * QEMU Block driver for Veritas HyperScale (VxHS)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "block/block_int.h"
+#include 
+#include "qapi/qmp/qerror.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "trace.h"
+#include "qemu/uri.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+
+#define VDISK_FD_READ   0
+#define VDISK_FD_WRITE  1
+
+#define VXHS_OPT_FILENAME   "filename"
+#define VXHS_OPT_VDISK_ID   "vdisk_id"
+#define VXHS_OPT_SERVER 

Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support

2016-10-25 Thread Paolo Bonzini


On 25/10/2016 23:53, Ketan Nilangekar wrote:
> We need to confirm the perf numbers but it really depends on the way we do 
> failover outside qemu.
> 
> We are looking at a vip based failover implementation which may need
> some handling code in qnio but that overhead should be minimal (at least
> no more than the current impl in qemu driver)

Then it's not outside QEMU's address space, it's only outside
block/vxhs.c... I don't understand.

Paolo

> IMO, the real benefit of qemu + qnio perf comes from:
> 1. the epoll based io multiplexer
> 2. 8 epoll threads
> 3. Zero buffer copies in userland code
> 4. Minimal locking
>
> We are also looking at replacing the existing qnio socket code with
> memory readv/writev calls available with the latest kernel for even
> better performance.

> 
> Ketan
> 
>> On Oct 25, 2016, at 1:01 PM, Paolo Bonzini  wrote:
>>
>>
>>
>>> On 25/10/2016 07:07, Ketan Nilangekar wrote:
>>> We are able to derive significant performance from the qemu block
>>> driver as compared to nbd/iscsi/nfs. We have prototyped nfs and nbd
>>> based io tap in the past and the performance of qemu block driver is
>>> significantly better. Hence we would like to go with the vxhs driver
>>> for now.
>>
>> Is this still true with failover implemented outside QEMU (which
>> requires I/O to be proxied, if I'm not mistaken)?  What does the benefit
>> come from if so, is it the threaded backend and performing multiple
>> connections to the same server?
>>
>> Paolo
>>
>>> Ketan
>>>
>>>
>>>> On Oct 24, 2016, at 4:24 PM, Paolo Bonzini 
>>>> wrote:
>>>>
>>>>
>>>>
>>>>> On 20/10/2016 03:31, Ketan Nilangekar wrote: This way the
>>>>> failover logic will be completely out of qemu address space. We
>>>>> are considering use of some of our proprietary
>>>>> clustering/monitoring services to implement service failover.
>>>>
>>>> Are you implementing a different protocol just for the sake of
>>>> QEMU, in other words, and forwarding from that protocol to your
>>>> proprietary code?
>>>>
>>>> If that is what you are doing, you don't need at all a vxhs driver
>>>> in QEMU.  Just implement NBD or iSCSI on your side, QEMU already
>>>> has drivers for that.
>>>>
>>>> Paolo
> 
> 



Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support

2016-10-25 Thread Ketan Nilangekar
We need to confirm the perf numbers but it really depends on the way we do
failover outside qemu.

We are looking at a vip based failover implementation which may need some
handling code in qnio but that overhead should be minimal (at least no more
than the current impl in qemu driver)

IMO, the real benefit of qemu + qnio perf comes from:
1. the epoll based io multiplexer
2. 8 epoll threads
3. Zero buffer copies in userland code
4. Minimal locking

We are also looking at replacing the existing qnio socket code with memory
readv/writev calls available with the latest kernel for even better performance.

But again this is something that will come in the near future. For now the
existing qnio implementation can give us adequate performance even if we need
to modify it to handle vip based failover.

Ketan

> On Oct 25, 2016, at 1:01 PM, Paolo Bonzini  wrote:
> 
> 
> 
>> On 25/10/2016 07:07, Ketan Nilangekar wrote:
>> We are able to derive significant performance from the qemu block
>> driver as compared to nbd/iscsi/nfs. We have prototyped nfs and nbd
>> based io tap in the past and the performance of qemu block driver is
>> significantly better. Hence we would like to go with the vxhs driver
>> for now.
> 
> Is this still true with failover implemented outside QEMU (which
> requires I/O to be proxied, if I'm not mistaken)?  What does the benefit
> come from if so, is it the threaded backend and performing multiple
> connections to the same server?
> 
> Paolo
> 
>> Ketan
>> 
>> 
>>> On Oct 24, 2016, at 4:24 PM, Paolo Bonzini 
>>> wrote:
>>> 
>>> 
>>> 
>>>> On 20/10/2016 03:31, Ketan Nilangekar wrote: This way the
>>>> failover logic will be completely out of qemu address space. We
>>>> are considering use of some of our proprietary
>>>> clustering/monitoring services to implement service failover.
>>> 
>>> Are you implementing a different protocol just for the sake of
>>> QEMU, in other words, and forwarding from that protocol to your
>>> proprietary code?
>>> 
>>> If that is what you are doing, you don't need at all a vxhs driver
>>> in QEMU.  Just implement NBD or iSCSI on your side, QEMU already
>>> has drivers for that.
>>> 
>>> Paolo



Re: [Qemu-devel] [PATCH v5] raw_bsd: add offset and size options

2016-10-25 Thread Tomáš Golembiovský
I should test my code more before submitting it to ML. I have found two
bugs in the patch.


On Sun, 23 Oct 2016 16:54:37 +0200
Tomáš Golembiovský  wrote:

> +static int raw_read_options(QDict *options, BlockDriverState *bs,
> +BDRVRawState *s, Error **errp)
> +{
> +Error *local_err = NULL;
> +QemuOpts *opts = NULL;
> +int64_t real_size = 0;
> +int ret;
> +
> +real_size = bdrv_getlength(bs->file->bs);
> +if (real_size < 0) {
> +error_setg_errno(errp, -real_size, "Could not get image size");
> +return real_size;
> +}
> +
> +opts = qemu_opts_create(&raw_runtime_opts, NULL, 0, &error_abort);
> +qemu_opts_absorb_qdict(opts, options, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +ret = -EINVAL;
> +goto end;
> +}
> +
> +s->offset = qemu_opt_get_size(opts, "offset", 0);
> +if (qemu_opt_find(opts, "size") != NULL) {
> +s->size = qemu_opt_get_size(opts, "size", 0);
> +s->has_size = true;
> +} else {
> +s->has_size = false;
> +s->size = real_size;

This has to be:

s->size = real_size - s->offset;

.. to account for the offset. Otherwise the following check will fail.

> +}
> +
> +/* Check size and offset */
> +if (real_size < s->offset || (real_size - s->offset) < s->size) {
> +error_setg(errp, "The sum of offset (%" PRIu64 ") and size "
> +"(%" PRIu64 ") has to be smaller or equal to the "
> +" actual size of the containing file (%" PRId64 ")",
> +s->offset, s->size, real_size);
> +ret = -EINVAL;
> +goto end;
> +}
> +
> +/* Make sure size is multiple of BDRV_SECTOR_SIZE to prevent rounding
> + * up and leaking out of the specified area. */
> +if (QEMU_IS_ALIGNED(s->size, BDRV_SECTOR_SIZE)) {

The condition has to be negated. Silly mistake made while rewriting the
condition to use QEMU_IS_ALIGNED.

> +error_setg(errp, "Specified size is not multiple of %llu",
> +BDRV_SECTOR_SIZE);
> +ret = -EINVAL;
> +goto end;
> +}
> +
> +ret = 0;
> +
> +end:
> +
> +qemu_opts_del(opts);
> +
> +return ret;
> +}
> +

-- 
Tomáš Golembiovský 



Re: [Qemu-devel] [PULL 15/17] support replication driver in blockdev-add

2016-10-25 Thread Eric Blake
On 09/12/2016 09:08 AM, Stefan Hajnoczi wrote:
> From: Wen Congyang 
> 
> Signed-off-by: Wen Congyang 
> Signed-off-by: Changlong Xie 
> Signed-off-by: Wang WeiWei 
> Signed-off-by: zhanghailiang 
> Signed-off-by: Gonglei 
> Reviewed-by: Eric Blake 
> Message-id: 1469602913-20979-12-git-send-email-xiecl.f...@cn.fujitsu.com
> Signed-off-by: Stefan Hajnoczi 
> ---
>  qapi/block-core.json | 23 +--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 

> +++ b/qapi/block-core.json
> @@ -252,6 +252,7 @@
>  #   2.3: 'host_floppy' deprecated
>  #   2.5: 'host_floppy' dropped
>  #   2.6: 'luks' added
> +#   2.8: 'replication' added

You added documentation here...

>  #
>  # @backing_file: #optional the name of the backing file (for copy-on-write)
>  #
> @@ -1712,8 +1713,8 @@

...but not for the BlockdevDriver enum.

>'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
>  'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom',
>  'host_device', 'http', 'https', 'luks', 'null-aio', 'null-co',
> -'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp',
> -'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> +'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw',
> + 'replication', 'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }

Also, I failed to notice that you added TAB damage here.  Other patches
are currently proposed to touch the same area (nfs, ssh, nbd), so
hopefully one of them will correct it in the process.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v2 2/2] qapi: allow blockdev-add for NFS

2016-10-25 Thread Eric Blake
On 10/24/2016 02:27 PM, Ashijeet Acharya wrote:
> Introduce new object 'BlockdevOptionsNFS' in qapi/block-core.json to
> support blockdev-add for NFS network protocol driver. Also make a new
> struct NFSServer to support tcp connection.
> 
> Signed-off-by: Ashijeet Acharya 
> ---
>  qapi/block-core.json | 56 
> 
>  1 file changed, 52 insertions(+), 4 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 9d797b8..3ab028d 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1714,9 +1714,9 @@
>  { 'enum': 'BlockdevDriver',
>'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
>  'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom',
> -'host_device', 'http', 'https', 'luks', 'null-aio', 'null-co',
> -'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw',
> - 'replication', 'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> +'host_device', 'http', 'https', 'luks', 'nfs', 'null-aio',
> +'null-co', 'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw',
> +'replication', 'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }

Missing a comment that 'nfs' is since 2.8.

>  ##
> +# @NFSServer
> +#
> +# Captures the address of the socket
> +#
> +# @type:transport type used for NFS (only TCP supported)
> +#
> +# @host:host part of the address
> +#
> +# Since 2.8
> +##
> +{ 'struct': 'NFSServer',
> +  'data': { 'type': 'str',

Please make this an enum, instead of an open-coded string. It's okay if
the enum only has one value 'tcp' for now; but using an enum will make
it introspectable if we later add a second transport, unlike what we get
with an open-coded string.

Must 'type' be mandatory if it must always be 'tcp'?

> +'host': 'str' } }
> +
> +##
> +# @BlockdevOptionsNfs
> +#
> +# Driver specific block device option for NFS
> +#
> +# @server:host address
> +#
> +# @path:  path of the image on the host
> +#
> +# @uid:   #optional UID value to use when talking to the server
> +#
> +# @gid:   #optional GID value to use when talking to the server

Do we want to allow string names in addition to numeric uid/gid values?
I'm not sure if NFS has name-based id mapping, but it's food for thought
on whether we need to use an alternate type here (alternate between
integer id and string name), or leave this as is.

> +#
> +# @tcp-syncnt:#optional number of SYNs during the session establishment

Would tcp-syn-count be any more legible?  What is the default when omitted?

> +#
> +# @readahead: #optional set the readahead size in bytes

What's the default when omitted?

> +#
> +# @pagecache: #optional set the pagecache size in bytes

Default?

> +#
> +# @debug: #optional set the NFS debug level (max 2)

Presumably default 0?

> +#
> +# Since 2.8
> +##
> +{ 'struct': 'BlockdevOptionsNfs',
> +  'data': { 'server': 'NFSServer',
> +'path': 'str',
> +'*uid': 'int',
> +'*gid': 'int',
> +'*tcp-syncnt': 'int',
> +'*readahead': 'int',
> +'*pagecache': 'int',
> +'*debug': 'int' } }
> +

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

2016-10-25 Thread Paolo Bonzini


On 25/10/2016 22:45, Emilio G. Cota wrote:
> On Tue, Oct 25, 2016 at 16:35:48 -0400, Pranith Kumar wrote:
>> On Tue, Oct 25, 2016 at 4:02 PM, Paolo Bonzini  wrote:
>>>
>>>
>>>> I've written a patch (see below) to take the per-bucket sequence locks.
>>>
>>> What's the performance like?
>>>
>>
>> Applying only this patch, the perf numbers are similar to the 128
>> cache line alignment you suggested.
> 
> That makes sense. Having a single seqlock per bucket is simple and fast;
> note that bucket chains should be very short (we use good hashing and
> automatic resize for this purpose).

But why do we get such worse performance in the 100% reader case?  (And
even more puzzling, why does Pranith's original patch improve
performance instead of causing more cache misses?)

Thanks,

Paolo



[Qemu-devel] [QEMU PATCH v8 3/3] tests/migration: Add test for QTAILQ migration

2016-10-25 Thread Jianjun Duan
Add a test for QTAILQ migration to tests/test-vmstate.c.

Signed-off-by: Jianjun Duan 
---
 tests/test-vmstate.c | 160 +++
 1 file changed, 160 insertions(+)

diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
index d8da26f..a992408 100644
--- a/tests/test-vmstate.c
+++ b/tests/test-vmstate.c
@@ -475,6 +475,164 @@ static void test_load_skip(void)
 qemu_fclose(loading);
 }
 
+
+/* test QTAILQ migration */
+typedef struct TestQtailqElement TestQtailqElement;
+
+struct TestQtailqElement {
+bool b;
+uint8_t  u8;
+QTAILQ_ENTRY(TestQtailqElement) next;
+};
+
+typedef struct TestQtailq {
+int16_t  i16;
+QTAILQ_HEAD(TestQtailqHead, TestQtailqElement) q;
+int32_t  i32;
+} TestQtailq;
+
+static const VMStateDescription vmstate_q_element = {
+.name = "test/queue-element",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_BOOL(b, TestQtailqElement),
+VMSTATE_UINT8(u8, TestQtailqElement),
+VMSTATE_END_OF_LIST()
+},
+};
+
+static const VMStateDescription vmstate_q = {
+.name = "test/queue",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_INT16(i16, TestQtailq),
+VMSTATE_QTAILQ_V(q, TestQtailq, 1, vmstate_q_element, TestQtailqElement,
+ next),
+VMSTATE_INT32(i32, TestQtailq),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void test_save_q(void)
+{
+TestQtailq obj_q = {
+.i16 = -512,
+.i32 = 70000,
+};
+
+TestQtailqElement obj_qe1 = {
+.b = true,
+.u8 = 130,
+};
+
+TestQtailqElement obj_qe2 = {
+.b = false,
+.u8 = 65,
+};
+
+uint8_t wire_q[] = {
+/* i16 */ 0xfe, 0x0,
+/* start of element 0 of q */ 0x01,
+/* .b  */ 0x01,
+/* .u8 */ 0x82,
+/* start of element 1 of q */ 0x01,
+/* b */   0x00,
+/* u8 */  0x41,
+/* end of q */0x00,
+/* i32 */ 0x00, 0x01, 0x11, 0x70,
+QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
+};
+
+QTAILQ_INIT(&obj_q.q);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe1, next);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe2, next);
+
+save_vmstate(&vmstate_q, &obj_q);
+compare_vmstate(wire_q, sizeof(wire_q));
+}
+
+static void test_load_q(void)
+{
+TestQtailq obj_q = {
+.i16 = -512,
+.i32 = 70000,
+};
+
+TestQtailqElement obj_qe1 = {
+.b = true,
+.u8 = 130,
+};
+
+TestQtailqElement obj_qe2 = {
+.b = false,
+.u8 = 65,
+};
+
+uint8_t wire_q[] = {
+/* i16 */ 0xfe, 0x0,
+/* start of element 0 of q */ 0x01,
+/* .b  */ 0x01,
+/* .u8 */ 0x82,
+/* start of element 1 of q */ 0x01,
+/* b */   0x00,
+/* u8 */  0x41,
+/* end of q */0x00,
+/* i32 */ 0x00, 0x01, 0x11, 0x70,
+};
+
+QTAILQ_INIT(&obj_q.q);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe1, next);
+QTAILQ_INSERT_TAIL(&obj_q.q, &obj_qe2, next);
+
+QEMUFile *fsave = open_test_file(true);
+
+qemu_put_buffer(fsave, wire_q, sizeof(wire_q));
+qemu_put_byte(fsave, QEMU_VM_EOF);
+g_assert(!qemu_file_get_error(fsave));
+qemu_fclose(fsave);
+
+QEMUFile *fload = open_test_file(false);
+TestQtailq tgt;
+
+QTAILQ_INIT(&tgt.q);
+vmstate_load_state(fload, &vmstate_q, &tgt, 1);
+char eof = qemu_get_byte(fload);
+g_assert(!qemu_file_get_error(fload));
+g_assert_cmpint(tgt.i16, ==, obj_q.i16);
+g_assert_cmpint(tgt.i32, ==, obj_q.i32);
+g_assert_cmpint(eof, ==, QEMU_VM_EOF);
+
+TestQtailqElement *qele_from = QTAILQ_FIRST(&obj_q.q);
+TestQtailqElement *qlast_from = QTAILQ_LAST(&obj_q.q, TestQtailqHead);
+TestQtailqElement *qele_to = QTAILQ_FIRST(&tgt.q);
+TestQtailqElement *qlast_to = QTAILQ_LAST(&tgt.q, TestQtailqHead);
+
+while (1) {
+g_assert_cmpint(qele_to->b, ==, qele_from->b);
+g_assert_cmpint(qele_to->u8, ==, qele_from->u8);
+if ((qele_from == qlast_from) || (qele_to == qlast_to)) {
+break;
+}
+qele_from = QTAILQ_NEXT(qele_from, next);
+qele_to = QTAILQ_NEXT(qele_to, next);
+}
+
+g_assert_cmpint((uint64_t) qele_from, ==, (uint64_t) qlast_from);
+g_assert_cmpint((uint64_t) qele_to, ==, (uint64_t) qlast_to);
+
+/* clean up */
+TestQtailqElement *qele;
+while (!QTAILQ_EMPTY(&tgt.q)) {
+qele = QTAILQ_LAST(&tgt.q, TestQtailqHead);
+QTAILQ_REMOVE(&tgt.q, qele, next);
+free(qele);
+qele = NULL;
+}
+qemu_fclose(fload);
+}
+
 int main(int argc, char **argv)
 {
 temp_fd = mkstemp(temp_file);
@@ 

Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

2016-10-25 Thread Emilio G. Cota
On Tue, Oct 25, 2016 at 11:35:06 -0400, Pranith Kumar wrote:
> Using perf, I see that sequence lock is being a bottleneck since it is
> being read by everyone. Giving it its own cache-line seems to help
> things quite a bit.
> 
> Using qht-bench, I measured the following for:
> 
> $ ./tests/qht-bench -d 10 -n 24 -u 
> 
>          throughput
> update   base   patch   %change
> 0        8.07   13.33   +65%
> 10       7.10   8.90    +25%
> 20       6.34   7.02    +10%
> 30       5.48   6.11    +9.6%
> 40       4.90   5.46    +11.42%
> 
> I am not able to see any significant increases for lower thread counts though.

Honestly I don't know what you're measuring here.

Your results are low (I assume you're showing here throughput per-thread,
but still they're low), and it makes no sense that increasing the cacheline
footprint will make a *read-only* workload (0% updates) run faster.

More below.

> Signed-off-by: Pranith Kumar 
> ---
>  include/qemu/seqlock.h | 2 +-
>  util/qht.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
> index 8dee11d..954abe8 100644
> --- a/include/qemu/seqlock.h
> +++ b/include/qemu/seqlock.h
> @@ -21,7 +21,7 @@ typedef struct QemuSeqLock QemuSeqLock;
>  
>  struct QemuSeqLock {
>  unsigned sequence;
> -};
> +} QEMU_ALIGNED(64);
>  
>  static inline void seqlock_init(QemuSeqLock *sl)
>  {
> diff --git a/util/qht.c b/util/qht.c
> index ff4d2e6..4d82609 100644
> --- a/util/qht.c
> +++ b/util/qht.c
> @@ -101,14 +101,14 @@
>   * be grabbed first.
>   */
>  struct qht_bucket {
> -QemuSpin lock;
>  QemuSeqLock sequence;
> +QemuSpin lock;
>  uint32_t hashes[QHT_BUCKET_ENTRIES];
>  void *pointers[QHT_BUCKET_ENTRIES];
>  struct qht_bucket *next;
>  } QEMU_ALIGNED(QHT_BUCKET_ALIGN);

I understand this is a hack but this would have been more localized:

diff --git a/util/qht.c b/util/qht.c
index ff4d2e6..55db907 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -101,14 +101,16 @@
  * be grabbed first.
  */
 struct qht_bucket {
+struct {
+QemuSeqLock sequence;
+} QEMU_ALIGNED(QHT_BUCKET_ALIGN);
 QemuSpin lock;
-QemuSeqLock sequence;
 uint32_t hashes[QHT_BUCKET_ENTRIES];
 void *pointers[QHT_BUCKET_ENTRIES];
 struct qht_bucket *next;
 } QEMU_ALIGNED(QHT_BUCKET_ALIGN);
 
So I tested my change above vs. master on a 16-core (32-way) Intel machine
(Xeon E5-2690 @ 2.90GHz with turbo-boost disabled), making sure threads are
scheduled on separate cores, favouring same-socket ones.
Results: http://imgur.com/a/c4dTB

So really I don't know what you're measuring.
The idea of decoupling the seqlock from the spinlock's cache line doesn't
make sense to me, because:
- Bucket lock holders are very likely to update the seqlock, so it makes sense
  to have them in the same cache line (exceptions to this are resizes or
  traversals, but those are very rare and we're not measuring those in
  qht-bench)
- Thanks to resizing + good hashing, bucket chains are very short, so
  a single seqlock per bucket is all we need.
- We can have *many* buckets (200K is not crazy for the TB htable), so
  anything that increases their size needs very good justification (see
  200K results above).
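
To put a number on the size argument, here is a minimal sketch. It is not the
actual qht code: the demo_ names, the 4-entry bucket, and the plain
int/unsigned stand-ins for QemuSpin and QemuSeqLock are assumptions made for
illustration, and the aligned attribute is the GCC/Clang spelling. It compares
a packed bucket with one whose sequence counter is padded out to its own
64-byte slot:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUCKET_ALIGN   64
#define DEMO_BUCKET_ENTRIES 4   /* assumed entry count, for illustration only */

/* Lock and sequence counter share the bucket's single cache line. */
struct demo_bucket_packed {
    int lock;                               /* stand-in for QemuSpin */
    unsigned sequence;                      /* stand-in for QemuSeqLock */
    uint32_t hashes[DEMO_BUCKET_ENTRIES];
    void *pointers[DEMO_BUCKET_ENTRIES];
    struct demo_bucket_packed *next;
} __attribute__((aligned(DEMO_BUCKET_ALIGN)));

/* Sequence counter padded to a cache line of its own. */
struct demo_bucket_split {
    struct {
        unsigned sequence;
    } __attribute__((aligned(DEMO_BUCKET_ALIGN))) seq;
    int lock;
    uint32_t hashes[DEMO_BUCKET_ENTRIES];
    void *pointers[DEMO_BUCKET_ENTRIES];
    struct demo_bucket_split *next;
} __attribute__((aligned(DEMO_BUCKET_ALIGN)));

/* Size in bytes of one packed bucket. */
size_t demo_packed_size(void)
{
    return sizeof(struct demo_bucket_packed);
}

/* Size in bytes of one bucket with the sequence counter split out. */
size_t demo_split_size(void)
{
    return sizeof(struct demo_bucket_split);
}
```

On a typical LP64 host the packed layout should come out at 64 bytes and the
split one at 128, so every bucket costs twice the memory.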

E.



[Qemu-devel] [QEMU PATCH v8 1/3] migration: extend VMStateInfo

2016-10-25 Thread Jianjun Duan
Current migration code cannot handle some data structures such as
QTAILQ in qemu/queue.h. Here we extend the signatures of put/get
in VMStateInfo so that customized handling is supported.

Signed-off-by: Jianjun Duan 
---
 hw/display/virtio-gpu.c |   6 ++-
 hw/intc/s390_flic_kvm.c |   6 ++-
 hw/net/vmxnet3.c|  18 +---
 hw/nvram/eeprom93xx.c   |   6 ++-
 hw/nvram/fw_cfg.c   |   6 ++-
 hw/pci/msix.c   |   6 ++-
 hw/pci/pci.c|  12 +++--
 hw/pci/shpc.c   |   5 ++-
 hw/scsi/scsi-bus.c  |   6 ++-
 hw/timer/twl92230.c |   6 ++-
 hw/usb/redirect.c   |  18 +---
 hw/virtio/virtio-pci.c  |   6 ++-
 hw/virtio/virtio.c  |  12 +++--
 include/migration/vmstate.h |  15 +--
 migration/savevm.c  |   5 ++-
 migration/vmstate.c | 106 
 target-alpha/machine.c  |   5 ++-
 target-arm/machine.c|  12 +++--
 target-i386/machine.c   |  21 ++---
 target-mips/machine.c   |  10 +++--
 target-ppc/machine.c|  10 +++--
 target-sparc/machine.c  |   5 ++-
 22 files changed, 198 insertions(+), 104 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index fa6fd0e..2a21150 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -987,7 +987,8 @@ static const VMStateDescription vmstate_virtio_gpu_scanouts 
= {
 },
 };
 
-static void virtio_gpu_save(QEMUFile *f, void *opaque, size_t size)
+static void virtio_gpu_save(QEMUFile *f, void *opaque, size_t size,
+VMStateField *field, QJSON *vmdesc)
 {
 VirtIOGPU *g = opaque;
 struct virtio_gpu_simple_resource *res;
@@ -1014,7 +1015,8 @@ static void virtio_gpu_save(QEMUFile *f, void *opaque, 
size_t size)
 vmstate_save_state(f, &vmstate_virtio_gpu_scanouts, g, NULL);
 }
 
-static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size)
+static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size,
+   VMStateField *field)
 {
 VirtIOGPU *g = opaque;
 struct virtio_gpu_simple_resource *res;
diff --git a/hw/intc/s390_flic_kvm.c b/hw/intc/s390_flic_kvm.c
index 21ac2e2..a80a812 100644
--- a/hw/intc/s390_flic_kvm.c
+++ b/hw/intc/s390_flic_kvm.c
@@ -286,7 +286,8 @@ static void kvm_s390_release_adapter_routes(S390FLICState 
*fs,
  * increase until buffer is sufficient or maxium size is
  * reached
  */
-static void kvm_flic_save(QEMUFile *f, void *opaque, size_t size)
+static void kvm_flic_save(QEMUFile *f, void *opaque, size_t size,
+  VMStateField *field, QJSON *vmdesc)
 {
 KVMS390FLICState *flic = opaque;
 int len = FLIC_SAVE_INITIAL_SIZE;
@@ -331,7 +332,8 @@ static void kvm_flic_save(QEMUFile *f, void *opaque, size_t 
size)
  * Note: Do nothing when no interrupts where stored
  * in QEMUFile
  */
-static int kvm_flic_load(QEMUFile *f, void *opaque, size_t size)
+static int kvm_flic_load(QEMUFile *f, void *opaque, size_t size,
+ VMStateField *field)
 {
 uint64_t len = 0;
 uint64_t count = 0;
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 90f6943..943a960 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2450,7 +2450,8 @@ static void vmxnet3_put_tx_stats_to_file(QEMUFile *f,
 qemu_put_be64(f, tx_stat->pktsTxDiscard);
 }
 
-static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, size_t size,
+VMStateField *field)
 {
 Vmxnet3TxqDescr *r = pv;
 
@@ -2464,7 +2465,8 @@ static int vmxnet3_get_txq_descr(QEMUFile *f, void *pv, 
size_t size)
 return 0;
 }
 
-static void vmxnet3_put_txq_descr(QEMUFile *f, void *pv, size_t size)
+static void vmxnet3_put_txq_descr(QEMUFile *f, void *pv, size_t size,
+VMStateField *field, QJSON *vmdesc)
 {
 Vmxnet3TxqDescr *r = pv;
 
@@ -2511,7 +2513,8 @@ static void vmxnet3_put_rx_stats_to_file(QEMUFile *f,
 qemu_put_be64(f, rx_stat->pktsRxError);
 }
 
-static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, size_t size,
+VMStateField *field)
 {
 Vmxnet3RxqDescr *r = pv;
 int i;
@@ -2529,7 +2532,8 @@ static int vmxnet3_get_rxq_descr(QEMUFile *f, void *pv, 
size_t size)
 return 0;
 }
 
-static void vmxnet3_put_rxq_descr(QEMUFile *f, void *pv, size_t size)
+static void vmxnet3_put_rxq_descr(QEMUFile *f, void *pv, size_t size,
+VMStateField *field, QJSON *vmdesc)
 {
 Vmxnet3RxqDescr *r = pv;
 int i;
@@ -2574,7 +2578,8 @@ static const VMStateInfo rxq_descr_info = {
 .put = vmxnet3_put_rxq_descr
 };
 
-static int vmxnet3_get_int_state(QEMUFile *f, void *pv, size_t size)
+static int vmxnet3_get_int_state(QEMUFile *f, void *pv, size_t size,
+VMStateField *field)
 {
 Vmxnet3IntState *r = pv;
 
@@ -2585,7 +2590,8 @@ static int vmxnet3_get_int_state(QEMUFile *f, 

[Qemu-devel] [QEMU PATCH v8 2/3] migration: migrate QTAILQ

2016-10-25 Thread Jianjun Duan
Currently we cannot directly transfer a QTAILQ instance because of the
limitation in the migration code. Here we introduce an approach to
transfer such structures. We created VMStateInfo vmstate_info_qtailq
for QTAILQ. Similar VMStateInfo can be created for other data structures
such as list.

This approach will be used to transfer pending_events and ccs_list in spapr
state.

We also create some macros in qemu/queue.h to access a QTAILQ using pointer
arithmetic. This ensures that we do not depend on the implementation
details about QTAILQ in the migration code.
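
The offset-based access described above can be sketched in isolation as
follows. This is only an illustration under stated assumptions: the demo_
types and DEMO_ macros are hypothetical, and the layout is assumed to be
head = { first, last } and entry = { next, prev }, each a pair of pointers,
so that the element type never has to be known to the generic code:

```c
#include <stddef.h>

/* Assumed generic layouts: two pointers each, in this order. */
struct demo_head  { void *first; void **last; };
struct demo_entry { void *next;  void **prev; };

struct demo_elem {
    int value;
    struct demo_entry link;     /* embedded queue linkage */
};

/* Byte-offset access; "entry" is the offset of the linkage in an element. */
#define DEMO_RAW_FIELD(base, offset) ((char *)(base) + (offset))
#define DEMO_RAW_FIRST(head) (*(void **)DEMO_RAW_FIELD(head, 0))
#define DEMO_RAW_LAST(head)  (*(void ***)DEMO_RAW_FIELD(head, sizeof(void *)))
#define DEMO_RAW_NEXT(elm, entry) (*(void **)DEMO_RAW_FIELD(elm, (entry)))
#define DEMO_RAW_PREV(elm, entry) \
    (*(void ***)DEMO_RAW_FIELD(elm, (entry) + sizeof(void *)))

/* An empty queue has "last" pointing back at "first". */
void demo_head_init(struct demo_head *h)
{
    h->first = NULL;
    h->last = &h->first;
}

/* Append elm without knowing its type, only the offset of its linkage. */
void demo_raw_insert_tail(void *head, void *elm, size_t entry)
{
    DEMO_RAW_NEXT(elm, entry) = NULL;
    DEMO_RAW_PREV(elm, entry) = DEMO_RAW_LAST(head);
    *DEMO_RAW_LAST(head) = elm;
    DEMO_RAW_LAST(head) = &DEMO_RAW_NEXT(elm, entry);
}

/* Walk the queue and sum the int stored at value_offset in each element. */
int demo_raw_sum(void *head, size_t entry, size_t value_offset)
{
    int sum = 0;
    void *e;

    for (e = DEMO_RAW_FIRST(head); e; e = DEMO_RAW_NEXT(e, entry)) {
        sum += *(int *)DEMO_RAW_FIELD(e, value_offset);
    }
    return sum;
}
```

A caller would pass offsetof(struct demo_elem, link) as the entry offset,
which is the same role the "start" field plays in the macro added below.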

Signed-off-by: Jianjun Duan 
---
 include/migration/vmstate.h | 20 ++
 include/qemu/queue.h| 46 +++
 migration/trace-events  |  4 +++
 migration/vmstate.c | 67 +
 4 files changed, 137 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index d0e37b5..318a6f1 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -251,6 +251,7 @@ extern const VMStateInfo vmstate_info_timer;
 extern const VMStateInfo vmstate_info_buffer;
 extern const VMStateInfo vmstate_info_unused_buffer;
 extern const VMStateInfo vmstate_info_bitmap;
+extern const VMStateInfo vmstate_info_qtailq;
 
 #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
 #define type_check_array(t1,t2,n) ((t1(*)[n])0 - (t2*)0)
@@ -662,6 +663,25 @@ extern const VMStateInfo vmstate_info_bitmap;
 .offset   = offsetof(_state, _field),\
 }
 
+/* For QTAILQ that need customized handling.
+ * Target QTAILQ needs be properly initialized.
+ * _type: type of QTAILQ element
+ * _next: name of QTAILQ entry field in QTAILQ element
+ * _vmsd: VMSD for QTAILQ element
+ * size: size of QTAILQ element
+ * start: offset of QTAILQ entry in QTAILQ element
+ */
+#define VMSTATE_QTAILQ_V(_field, _state, _version, _vmsd, _type, _next)  \
+{\
+.name = (stringify(_field)), \
+.version_id   = (_version),  \
+.vmsd = &(_vmsd),\
+.size = sizeof(_type),   \
+.info = &vmstate_info_qtailq,\
+.offset   = offsetof(_state, _field),\
+.start= offsetof(_type, _next),  \
+}
+
 /* _f : field name
_f_n : num of elements field_name
_n : num of elements
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 342073f..e9378fa 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -438,4 +438,50 @@ struct { \
 #define QTAILQ_PREV(elm, headname, field) \
 (*(((struct headname *)((elm)->field.tqe_prev))->tqh_last))
 
+#define RAW_FIELD(base, offset) \
+((char *) (base) + offset)
+
+/*
+ * Offsets of layout of a tail queue head.
+ */
+#define QTAILQ_FIRST_OFFSET 0
+#define QTAILQ_LAST_OFFSET (sizeof(void *))
+/*
+ * Raw access of elements of a tail queue
+ */
+#define QTAILQ_RAW_FIRST(head) \
+(*((void **) (RAW_FIELD(head,  QTAILQ_FIRST_OFFSET))))
+#define QTAILQ_RAW_LAST(head) \
+(*((void ***) (RAW_FIELD(head,  QTAILQ_LAST_OFFSET))))
+
+/*
+ * Offsets of layout of a tail queue element.
+ */
+#define QTAILQ_NEXT_OFFSET 0
+#define QTAILQ_PREV_OFFSET (sizeof(void *))
+
+/*
+ * Raw access of elements of a tail entry
+ */
+#define QTAILQ_RAW_NEXT(elm, entry) \
+(*((void **) (RAW_FIELD(elm, entry + QTAILQ_NEXT_OFFSET))))
+#define QTAILQ_RAW_PREV(elm, entry) \
+(*((void ***) (RAW_FIELD(elm, entry + QTAILQ_PREV_OFFSET))))
+/*
+ * Tail queue tranversal using pointer arithmetic.
+ */
+#define QTAILQ_RAW_FOREACH(elm, head, entry) \
+for ((elm) = QTAILQ_RAW_FIRST(head); \
+ (elm); \
+ (elm) = QTAILQ_RAW_NEXT(elm, entry))
+/*
+ * Tail queue insertion using pointer arithmetic.
+ */
+#define QTAILQ_RAW_INSERT_TAIL(head, elm, entry) do { \
+QTAILQ_RAW_NEXT(elm, entry) = NULL; \
+QTAILQ_RAW_PREV(elm, entry) = QTAILQ_RAW_LAST(head); \
+*QTAILQ_RAW_LAST(head) = (elm); \
+QTAILQ_RAW_LAST(head) = &QTAILQ_RAW_NEXT(elm, entry); \
+} while (/*CONSTCOND*/0)
+
 #endif /* QEMU_SYS_QUEUE_H */

[Qemu-devel] [QEMU PATCH v8 0/3] migration: migrate QTAILQ

2016-10-25 Thread Jianjun Duan
Hi all,

I fixed a style issue. Comments are welcome.

v8: - Fixed a style issue. 
Previous versions are:

v7: - Fixed merge errors.
- Simplified macro definitions related to pointer arithmetic based QTAILQ access.
- Added test case for QTAILQ migration in tests/test-vmstate.c.
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg00711.html)


v6: - Split from Power specific patches. 
- Dropped VMS_LINKED flag.
- Rebased to master.
- Added comments to clarify about put/get in VMStateInfo.  
(link: http://lists.nongnu.org/archive/html/qemu-ppc/2016-10/msg00336.html)

v5: - Rebased to David's ppc-for-2.8. 
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg00270.html)

v4: - Introduce a way to set customized instance_id in SaveStateEntry. Use it
  to set instance_id for DRC using its unique index to address David 
  Gibson's concern.
- Rename VMS_CSTM to VMS_LINKED based on Paolo Bonzini's suggestions.
- Clean up qjson stuff in put_qtailq. 
- Add trace for put_qtailq and get_qtailq based on David Gilbert's 
  suggestion.
- Based on David's ppc-for-2.7. 
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07720.html)

v3: - Simplify overall design following discussion with Paolo. No longer need
  metadata to migrate QTAILQ.
- Extend VMStateInfo instead of adding similar fields to VMStateField.
- Clean up macros in qemu/queue.h.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg05695.html)

v2: - Introduce a general approach to migrate QTAILQ in qemu/queue.h.
- Migrate signalled field in the DRC state.
- Put the newly added migrating fields in subsections so that backward 
  migration is not broken.  
- Set detach_cb field right after migration so that a migrated hot-unplug
  event could finish its course.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg04188.html)

v1: - Initial version.
(link: https://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02601.html)

Jianjun Duan (3):
  migration: extend VMStateInfo
  migration: migrate QTAILQ
  tests/migration: Add test for QTAILQ migration

 hw/display/virtio-gpu.c |   6 +-
 hw/intc/s390_flic_kvm.c |   6 +-
 hw/net/vmxnet3.c|  18 +++--
 hw/nvram/eeprom93xx.c   |   6 +-
 hw/nvram/fw_cfg.c   |   6 +-
 hw/pci/msix.c   |   6 +-
 hw/pci/pci.c|  12 ++-
 hw/pci/shpc.c   |   5 +-
 hw/scsi/scsi-bus.c  |   6 +-
 hw/timer/twl92230.c |   6 +-
 hw/usb/redirect.c   |  18 +++--
 hw/virtio/virtio-pci.c  |   6 +-
 hw/virtio/virtio.c  |  12 ++-
 include/migration/vmstate.h |  35 -
 include/qemu/queue.h|  46 
 migration/savevm.c  |   5 +-
 migration/trace-events  |   4 +
 migration/vmstate.c | 173 ++--
 target-alpha/machine.c  |   5 +-
 target-arm/machine.c|  12 ++-
 target-i386/machine.c   |  21 --
 target-mips/machine.c   |  10 ++-
 target-ppc/machine.c|  10 ++-
 target-sparc/machine.c  |   5 +-
 tests/test-vmstate.c| 160 
 25 files changed, 495 insertions(+), 104 deletions(-)

-- 
1.9.1




Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

2016-10-25 Thread Emilio G. Cota
On Tue, Oct 25, 2016 at 16:35:48 -0400, Pranith Kumar wrote:
> On Tue, Oct 25, 2016 at 4:02 PM, Paolo Bonzini  wrote:
> >
> >
> >> I've written a patch (see below) to take the per-bucket sequence locks.
> >
> > What's the performance like?
> >
> 
> Applying only this patch, the perf numbers are similar to the 128
> cache line alignment you suggested.

That makes sense. Having a single seqlock per bucket is simple and fast;
note that bucket chains should be very short (we use good hashing and
automatic resize for this purpose).
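
For readers less familiar with the pattern, here is a rough sketch of a
per-bucket seqlock written with C11 atomics. It is not the QEMU
implementation: the demo_ names and the single-entry bucket are made up for
illustration, and writers are assumed to be serialized by some external lock
(the role the bucket spinlock plays in qht):

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct demo_seqlock {
    atomic_uint sequence;           /* odd while a write is in progress */
};

struct demo_bucket {
    struct demo_seqlock seq;
    _Atomic uint32_t hash;
    _Atomic(void *) pointer;
};

void demo_bucket_init(struct demo_bucket *b)
{
    atomic_init(&b->seq.sequence, 0);
    atomic_init(&b->hash, 0);
    atomic_init(&b->pointer, NULL);
}

/* Writer side: callers must already hold the (assumed) bucket lock. */
static void demo_write_begin(struct demo_seqlock *sl)
{
    unsigned s = atomic_load_explicit(&sl->sequence, memory_order_relaxed);

    atomic_store_explicit(&sl->sequence, s + 1, memory_order_relaxed);
    /* Order the odd count before the data stores that follow. */
    atomic_thread_fence(memory_order_release);
}

static void demo_write_end(struct demo_seqlock *sl)
{
    unsigned s = atomic_load_explicit(&sl->sequence, memory_order_relaxed);

    /* Release store: data stores are ordered before the even count. */
    atomic_store_explicit(&sl->sequence, s + 1, memory_order_release);
}

/* Reader side: no lock taken, retry if a writer interfered. */
static unsigned demo_read_begin(struct demo_seqlock *sl)
{
    unsigned s = atomic_load_explicit(&sl->sequence, memory_order_acquire);

    /* Clearing the low bit forces a retry if a write was in progress. */
    return s & ~1u;
}

static int demo_read_retry(struct demo_seqlock *sl, unsigned start)
{
    /* Order the data loads before re-reading the counter. */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}

void demo_insert(struct demo_bucket *b, uint32_t hash, void *p)
{
    demo_write_begin(&b->seq);
    atomic_store_explicit(&b->hash, hash, memory_order_relaxed);
    atomic_store_explicit(&b->pointer, p, memory_order_relaxed);
    demo_write_end(&b->seq);
}

void *demo_lookup(struct demo_bucket *b, uint32_t hash)
{
    unsigned version;
    void *ret;

    do {
        version = demo_read_begin(&b->seq);
        ret = NULL;
        if (atomic_load_explicit(&b->hash, memory_order_relaxed) == hash) {
            ret = atomic_load_explicit(&b->pointer, memory_order_relaxed);
        }
    } while (demo_read_retry(&b->seq, version));
    return ret;
}
```

Readers never write to the bucket, so a read-mostly workload only pays for
the two loads of the counter around the data reads.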

E.



Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

2016-10-25 Thread Pranith Kumar
On Tue, Oct 25, 2016 at 4:02 PM, Paolo Bonzini  wrote:
>
>
>> I've written a patch (see below) to take the per-bucket sequence locks.
>
> What's the performance like?
>

Applying only this patch, the perf numbers are similar to the 128
cache line alignment you suggested.

0 4
10 9.70
20 8.09
30 7.13
40 6.49

I am not sure why only the 100% reader case is so low. Applying the
sequence lock cache alignment patch brings it back up to 13
MT/s/thread.

-- 
Pranith



Re: [Qemu-devel] [PATCH v2 0/2] memory: Convert skip_dump to ram_device and avoid memcpy

2016-10-25 Thread Paolo Bonzini


On 25/10/2016 20:17, Alex Williamson wrote:
> v2: retain ram_device flag to avoid extra cache miss, per Paolo.
> 
> Paolo, posting for completeness, I can merge through my tree if you
> want to Ack.  Thanks,

Great, thanks.

Acked-by: Paolo Bonzini 

> Alex
> 
> ---
> 
> Alex Williamson (2):
>   memory: Replace skip_dump flag with "ram_device"
>   memory: Don't use memcpy for ram_device regions
> 
> 
>  hw/vfio/common.c  |9 ++
>  hw/vfio/spapr.c   |2 +
>  include/exec/memory.h |   47 -
>  memory.c  |   80 
> +++--
>  memory_mapping.c  |2 +
>  trace-events  |2 +
>  6 files changed, 116 insertions(+), 26 deletions(-)
> 
> 



Re: [Qemu-devel] [RFC PATCH] qht: Align sequence lock to cache line

2016-10-25 Thread Paolo Bonzini


On 25/10/2016 21:12, Pranith Kumar wrote:
> 
> Paolo Bonzini writes:
> 
>> On 25/10/2016 17:49, Pranith Kumar wrote:
>>> But we are taking the seqlock of only the head bucket, while the
>>> readers are reading hashes/pointers of the chained buckets.
>>
>> No, we aren't.  See qht_lookup__slowpath.
> 
> 
> I don't see it. The reader is taking the head bucket lock in
> qht_lookup__slowpath() and then iterating over the chained buckets in
> qht_do_lookup(). The writer is doing the same. It is taking the head bucket
> lock in qht_insert__locked().

Uh, you're right.  Sorry I wasn't reading it correctly.

> I've written a patch (see below) to take the per-bucket sequence locks.

What's the performance like?

Paolo

>>
>> This patch:
>>
>>         throughput
>> update  base   patch  %change
>> 0       8.07   13.33  +65%
>> 10      7.10   8.90   +25%
>> 20      6.34   7.02   +10%
>> 30      5.48   6.11   +9.6%
>> 40      4.90   5.46   +11.42%
>>
>>
>> Just doubling the cachesize:
>>
>>         throughput
>> update  base   patch  %change
>> 0       8.07   4.47   -45% ?!?
>> 10      7.10   9.82   +38%
>> 20      6.34   8.13   +28%
>> 30      5.48   7.13   +30%
>> 40      4.90   6.45   +30%
>>
>> It seems to me that your machine has 128-byte cachelines.
>>
> 
> Nope. It is just the regular 64 byte cache line.
> 
> $ getconf LEVEL1_DCACHE_LINESIZE
> 64
> 
> (The machine model is Xeon CPU E5-2620).
> 
> 
> Take the per-bucket sequence locks instead of the head bucket lock.
> 
> Signed-off-by: Pranith Kumar 
> ---
>  util/qht.c | 36 ++--
>  1 file changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/util/qht.c b/util/qht.c
> index 4d82609..cfce5fc 100644
> --- a/util/qht.c
> +++ b/util/qht.c
> @@ -374,19 +374,19 @@ static void qht_bucket_reset__locked(struct qht_bucket 
> *head)
>  struct qht_bucket *b = head;
>  int i;
>  
> -seqlock_write_begin(&head->sequence);
>  do {
> +seqlock_write_begin(&b->sequence);
>  for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
>  if (b->pointers[i] == NULL) {
> -goto done;
> +seqlock_write_end(&b->sequence);
> +return;
>  }
>  atomic_set(&b->hashes[i], 0);
>  atomic_set(&b->pointers[i], NULL);
>  }
> +seqlock_write_end(&b->sequence);
>  b = b->next;
>  } while (b);
> - done:
> -seqlock_write_end(&head->sequence);
>  }
>  
>  /* call with all bucket locks held */
> @@ -446,6 +446,8 @@ void *qht_do_lookup(struct qht_bucket *head, 
> qht_lookup_func_t func,
>  int i;
>  
>  do {
> +void *q = NULL;
> +unsigned int version = seqlock_read_begin(&b->sequence);
>  for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
>  if (atomic_read(&b->hashes[i]) == hash) {
>  /* The pointer is dereferenced before seqlock_read_retry,
> @@ -455,11 +457,16 @@ void *qht_do_lookup(struct qht_bucket *head,
> qht_lookup_func_t func,
>  void *p = atomic_rcu_read(&b->pointers[i]);
>  
>  if (likely(p) && likely(func(p, userp))) {
> -return p;
> +q = p;
> +break;
>  }
>  }
>  }
> -b = atomic_rcu_read(&b->next);
> +if (!q) {
> +b = atomic_rcu_read(&b->next);
> +} else if (!seqlock_read_retry(&b->sequence, version)) {
> +return q;
> +}
> +return q;
> +}
>  } while (b);
>  
>  return NULL;
> @@ -469,14 +476,7 @@ static __attribute__((noinline))
>  void *qht_lookup__slowpath(struct qht_bucket *b, qht_lookup_func_t func,
> const void *userp, uint32_t hash)
>  {
> -unsigned int version;
> -void *ret;
> -
> -do {
> -version = seqlock_read_begin(&b->sequence);
> -ret = qht_do_lookup(b, func, userp, hash);
> -} while (seqlock_read_retry(&b->sequence, version));
> -return ret;
> +return qht_do_lookup(b, func, userp, hash);
>  }
>  
>  void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
> @@ -537,14 +537,14 @@ static bool qht_insert__locked(struct qht *ht, struct 
> qht_map *map,
>  
>   found:
>  /* found an empty key: acquire the seqlock and write */
> -seqlock_write_begin(&head->sequence);
> +seqlock_write_begin(&b->sequence);
>  if (new) {
>  atomic_rcu_set(&prev->next, b);
>  }
>  /* smp_wmb() implicit in seqlock_write_begin.  */
>  atomic_set(&b->hashes[i], hash);
>  atomic_set(&b->pointers[i], p);
> -seqlock_write_end(&head->sequence);
> +seqlock_write_end(&b->sequence);
>  return true;
>  }
>  
> @@ -665,9 +665,9 @@ bool qht_remove__locked(struct qht_map *map, struct 
> qht_bucket *head,
>  }
>  if (q == p) {
>  qht_debug_assert(b->hashes[i] == hash);
> -seqlock_write_begin(&head->sequence);
> +seqlock_write_begin(&b->sequence);
>  

Re: [Qemu-devel] [PATCH v4 5/5] qapi: allow blockdev-add for ssh

2016-10-25 Thread Eric Blake
On 10/25/2016 08:04 AM, Ashijeet Acharya wrote:
> Introduce new object 'BlockdevOptionsSsh' in qapi/block-core.json to
> support blockdev-add for SSH network protocol driver. Use only 'struct
> InetSocketAddress' since SSH only supports connection over TCP.
> 
> Signed-off-by: Ashijeet Acharya 
> Reviewed-by: Kevin Wolf 
> ---
>  qapi/block-core.json | 26 --
>  1 file changed, 24 insertions(+), 2 deletions(-)

Sorry for not noticing this when I finally replied to v4;


> +##
> +# @BlockdevOptionsSsh
> +#
> +# @server:  host address
> +#
> +# @path:path to the image on the host
> +#
> +# @user:#optional user as which to connect, defaults to 
> current
> +#   local user name
> +#
> +# @host_key_check   #optional defines how and what to check the host
> +#   key against, defaults to "yes"

I still have reservations about this parameter. I think we have time to
fix it as followups during soft freeze if Kevin would rather get your
initial patches in now, if that's what it takes to meet soft freeze
deadlines, but I do not want to bake it into the actual 2.8 release
without addressing those concerns.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH V6 2/7] nios2: Add architecture emulation support

2016-10-25 Thread Marek Vasut
From: Chris Wulff 

Add support for emulating Altera NiosII R1 architecture into qemu.
This patch is based on previous work by Chris Wulff from 2012 and
updated to latest mainline QEMU.

Signed-off-by: Marek Vasut 
Cc: Chris Wulff 
Cc: Jeff Da Silva 
Cc: Ley Foon Tan 
Cc: Sandra Loosemore 
Cc: Yves Vandervennet 
---
V3: Thorough cleanup, deal with the review comments all over the place
V4: - Use extract32()
- Fix gen_goto_tb() , suppress tcg_gen_goto_tb()
- Clean up gen_check_supervisor() helper
- Use TCGMemOp type for flags
- Drop jump labels from wrctl/rdctl
- More TCG cleanup
V5: - Simplify load/store handling
- Handle loads into R_ZERO from protected page, add comment
V6: - Fix division opcode handling
- Add missing disas handling
- V5 review comments cleanup
---
 target-nios2/Makefile.objs |   4 +
 target-nios2/cpu.c | 232 +++
 target-nios2/cpu.h | 270 +
 target-nios2/helper.c  | 313 +++
 target-nios2/helper.h  |  27 ++
 target-nios2/mmu.c | 292 ++
 target-nios2/mmu.h |  54 +++
 target-nios2/monitor.c |  35 ++
 target-nios2/op_helper.c   |  47 +++
 target-nios2/translate.c   | 953 +
 10 files changed, 2227 insertions(+)
 create mode 100644 target-nios2/Makefile.objs
 create mode 100644 target-nios2/cpu.c
 create mode 100644 target-nios2/cpu.h
 create mode 100644 target-nios2/helper.c
 create mode 100644 target-nios2/helper.h
 create mode 100644 target-nios2/mmu.c
 create mode 100644 target-nios2/mmu.h
 create mode 100644 target-nios2/monitor.c
 create mode 100644 target-nios2/op_helper.c
 create mode 100644 target-nios2/translate.c

diff --git a/target-nios2/Makefile.objs b/target-nios2/Makefile.objs
new file mode 100644
index 000..2a11c5c
--- /dev/null
+++ b/target-nios2/Makefile.objs
@@ -0,0 +1,4 @@
+obj-y += translate.o op_helper.o helper.o cpu.o mmu.o
+obj-$(CONFIG_SOFTMMU) += monitor.o
+
+$(obj)/op_helper.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
diff --git a/target-nios2/cpu.c b/target-nios2/cpu.c
new file mode 100644
index 000..658d684
--- /dev/null
+++ b/target-nios2/cpu.c
@@ -0,0 +1,232 @@
+/*
+ * QEMU Nios II CPU
+ *
+ * Copyright (c) 2012 Chris Wulff 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "cpu.h"
+#include "exec/log.h"
+#include "exec/gdbstub.h"
+#include "hw/qdev-properties.h"
+
+static void nios2_cpu_set_pc(CPUState *cs, vaddr value)
+{
+Nios2CPU *cpu = NIOS2_CPU(cs);
+CPUNios2State *env = &cpu->env;
+
+env->regs[R_PC] = value;
+}
+
+static bool nios2_cpu_has_work(CPUState *cs)
+{
+return cs->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI);
+}
+
+/* CPUClass::reset() */
+static void nios2_cpu_reset(CPUState *cs)
+{
+Nios2CPU *cpu = NIOS2_CPU(cs);
+Nios2CPUClass *ncc = NIOS2_CPU_GET_CLASS(cpu);
+CPUNios2State *env = &cpu->env;
+
+if (qemu_loglevel_mask(CPU_LOG_RESET)) {
+qemu_log("CPU Reset (CPU %d)\n", cs->cpu_index);
+log_cpu_state(cs, 0);
+}
+
+ncc->parent_reset(cs);
+
+tlb_flush(cs, 1);
+
+memset(env->regs, 0, sizeof(uint32_t) * NUM_CORE_REGS);
+env->regs[R_PC] = cpu->reset_addr;
+
+#if defined(CONFIG_USER_ONLY)
+/* Start in user mode with interrupts enabled. */
+env->regs[CR_STATUS] = CR_STATUS_U | CR_STATUS_PIE;
+#endif
+}
+
+static void nios2_cpu_initfn(Object *obj)
+{
+CPUState *cs = CPU(obj);
+Nios2CPU *cpu = NIOS2_CPU(obj);
+CPUNios2State *env = &cpu->env;
+static bool tcg_initialized;
+
+cpu->mmu_present = true;
+cs->env_ptr = env;
+
+#if !defined(CONFIG_USER_ONLY)
+mmu_init(&env->mmu);
+#endif
+
+if (tcg_enabled() && !tcg_initialized) {
+tcg_initialized = true;
+nios2_tcg_init();
+}
+}
+
+Nios2CPU *cpu_nios2_init(const char *cpu_model)
+{
+Nios2CPU *cpu = NIOS2_CPU(object_new(TYPE_NIOS2_CPU));
+
+object_property_set_bool(OBJECT(cpu), true, "realized", NULL);
+
+return cpu;
+}
+
+static void nios2_cpu_realizefn(DeviceState *dev, Error **errp)
+{
+CPUState *cs = CPU(dev);
+

Re: [Qemu-devel] [PATCH v3 5/5] qapi: allow blockdev-add for ssh

2016-10-25 Thread Eric Blake
On 10/17/2016 12:32 PM, Ashijeet Acharya wrote:
> Introduce new object 'BlockdevOptionsSsh' in qapi/block-core.json to
> support blockdev-add for SSH network protocol driver. Use only 'struct
> InetSocketAddress' since SSH only supports connection over TCP.
> 
> Signed-off-by: Ashijeet Acharya 
> ---
>  qapi/block-core.json | 26 --
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 

> +++ b/qapi/block-core.json
> @@ -1716,7 +1716,8 @@
>  'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom',
>  'host_device', 'http', 'https', 'luks', 'null-aio', 'null-co',
>  'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw',
> - 'replication', 'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> +'replication', 'ssh', 'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc',
> +'vvfat' ] }

Please update the comment just before the enum that mentions 'ssh' as an
addition in 2.8.


> +##
> +# @BlockdevOptionsSsh
> +#
> +# @server:  host address
> +#
> +# @path:path to the image on the host
> +#
> +# @user:#optional user as which to connect, defaults to 
> current
> +#   local user name
> +#
> +# @host_key_check   #optional defines how and what to check the host
> +#   key against, defaults to "yes"

Please s/host_key_check/host-key-check/ - new interfaces should favor
dash, not underscore. (The C code will be the same, though.)

> +#
> +# Since 2.8
> +##
> +{ 'struct': 'BlockdevOptionsSsh',
> +  'data': { 'server': 'InetSocketAddress',
> +'path': 'str',
> +'*user': 'str',
> +'*host_key_check': 'str' } }

Is host-key-check truly a free-form string, or is it only a finite set
of valid possibilities, where 'yes' is the default string?  Would it be
better to express it as an enum instead of a raw string?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




