date:20160614

Re: [Qemu-devel] [PATCH 06/10] ppc: Rework generation of priv and inval interrupts

2016-06-14 Thread David Gibson

On Wed, Jun 15, 2016 at 02:31:56PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2016-06-15 at 11:19 +1000, David Gibson wrote:
> > 
> > >  static void spr_noaccess(DisasContext *ctx, int gprn, int sprn)
> > > @@ -4348,9 +4371,15 @@ static inline void gen_op_mfspr(DisasContext *ctx)
> > >   TARGET_FMT_lx "\n", sprn, sprn, ctx->nip - 
> > > 4);
> > >  }
> > >  }
> > > -gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
> > > +gen_priv_exception(ctx, POWERPC_EXCP_PRIV_REG);
> > >  }
> > >  } else {
> > > +/* ISA 2.07 defines these as no-ops */
> > > +if ((ctx->insns_flags2 & PPC2_ISA207S) &&
> > > +(sprn >= 808 && sprn <= 811)) {
> > > +/* This is a nop */
> > > +return;
> > > +}
> > >  /* Not defined */
> > >  fprintf(stderr, "Trying to read invalid spr %d (0x%03x) at "
> > >  TARGET_FMT_lx "\n", sprn, sprn, ctx->nip - 4);
> > > @@ -4358,9 +4387,18 @@ static inline void gen_op_mfspr(DisasContext *ctx)
> > >  qemu_log("Trying to read invalid spr %d (0x%03x) at "
> > >   TARGET_FMT_lx "\n", sprn, sprn, ctx->nip - 4);
> > >  }
> > > -/* Only generate an exception in user space, otherwise this is a 
> > > nop */
> > > -if (ctx->pr) {
> > > -gen_inval_exception(ctx, POWERPC_EXCP_INVAL_SPR);
> > > +
> > > +/* The behaviour depends on MSR:PR and SPR# bit 0x10,
> > > + * it can generate a priv, a hv emu or a no-op
> > > + */
> > > +if (sprn & 0x10) {
> > > +if (ctx->pr) {
> > > +gen_priv_exception(ctx, POWERPC_EXCP_INVAL_SPR);
> > > +}
> > > +} else {
> > > +if (ctx->pr || sprn == 0 || sprn == 4 || sprn == 5 || sprn 
> > > == 6) {
> > > +gen_hvpriv_exception(ctx, POWERPC_EXCP_INVAL_SPR);
> > 
> > Just double checking this logic.  So in this case we get an exception
> > to the hypervisor if executed in guest user mode, but a no-op if
> > 
> > executed in guest supervisor mode.  That seems.. odd.
> 
> >From the architecture:
> 
> * if spr 0 =0:
>   - if MSR PR =1: Hypervisor Emulation Assistance
> interrupt
>   - if MSR PR =0: Hypervisor Emulation Assistance
> interrupt for SPRs 0, 4, 5, and 6 and no opera-
> tion (i.e. the instruction is treated as a no-op)
> for all other SPRs
> 
> * if spr 0 =1:
>   - if MSR PR =1: Privileged Instruction type Pro-
> gram interrupt
>   - if MSR PR =0: no operation (i.e. the instruction
> is treated as a no-op)
> 
> IE. SPRs with 0x10 are supervisor priv, so PR access will trap to
> the OS, whether they are implemented or not.
> 
> Otherwise, you get the "system illegal isntruction" handler which
> is turned into an HVPRIV on all recent processors (the exception code
> will turn that back into a 0x700 if the processor doesn't support
> HVPRIV).
> 
> It was done this way so that an OS (guest) can context switch a bunch
> of supervisor SPRs without having to test if they individually exist
> on a given processor.

Huh.  Alright then.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v4 1/9] qom: API to get instance_size of a type

2016-06-14 Thread David Gibson

On Fri, Jun 10, 2016 at 09:38:24AM +0200, Igor Mammedov wrote:
> On Fri, 10 Jun 2016 14:04:41 +1000
> David Gibson  wrote:
> 
> > On Fri, Jun 10, 2016 at 06:29:00AM +0530, Bharata B Rao wrote:
> > > Add an API object_type_get_size(const char *typename) that returns the
> > > instance_size of the give typename.
> I'd rename it to
>   object_type_get_instance_size()
> so it'd apparent what size is returned from name.

I'll make that change in my tree.

> 
> > > 
> > > Signed-off-by: Bharata B Rao   
> > 
> > Reviewed-by: David Gibson 
> > 
> > This looks sensible to me, it would be nice to have an ack or two from
> > the qemu community at large.
> > 
> > > ---
> > >  include/qom/object.h | 8 +++-
> > >  qom/object.c | 8 
> > >  2 files changed, 15 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/qom/object.h b/include/qom/object.h
> > > index 21bb5ff..460ddfc 100644
> > > --- a/include/qom/object.h
> > > +++ b/include/qom/object.h
> > > @@ -1608,5 +1608,11 @@ int object_child_foreach_recursive(Object *obj,
> > >   */
> > >  Object *container_get(Object *root, const char *path);
> > >  
> > > -
> > > +/**
> > > + * object_type_get_size:
> > > + * @typename: Name of the Type whose instance_size is required
> > > + *
> > > + * Returns the instance_size of the given @typename.
> > > + */
> > > +size_t object_type_get_size(const char *typename);
> > >  #endif
> > > diff --git a/qom/object.c b/qom/object.c
> > > index 3bc8a00..0e75877 100644
> > > --- a/qom/object.c
> > > +++ b/qom/object.c
> > > @@ -202,6 +202,14 @@ static size_t type_object_get_size(TypeImpl *ti)
> > >  return 0;
> > >  }
> > >  
> > > +size_t object_type_get_size(const char *typename)
> > > +{
> > > +TypeImpl *type = type_get_by_name(typename);
> > > +
> > > +g_assert(type != NULL);
> > > +return type_object_get_size(type);
> > > +}
> > > +
> > >  static bool type_is_ancestor(TypeImpl *type, TypeImpl *target_type)
> > >  {
> > >  assert(target_type);  
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH 06/10] ppc: Rework generation of priv and inval interrupts

2016-06-14 Thread Benjamin Herrenschmidt

On Wed, 2016-06-15 at 11:19 +1000, David Gibson wrote:
> 
> >  static void spr_noaccess(DisasContext *ctx, int gprn, int sprn)
> > @@ -4348,9 +4371,15 @@ static inline void gen_op_mfspr(DisasContext *ctx)
> >   TARGET_FMT_lx "\n", sprn, sprn, ctx->nip - 4);
> >  }
> >  }
> > -gen_inval_exception(ctx, POWERPC_EXCP_PRIV_REG);
> > +gen_priv_exception(ctx, POWERPC_EXCP_PRIV_REG);
> >  }
> >  } else {
> > +/* ISA 2.07 defines these as no-ops */
> > +if ((ctx->insns_flags2 & PPC2_ISA207S) &&
> > +(sprn >= 808 && sprn <= 811)) {
> > +/* This is a nop */
> > +return;
> > +}
> >  /* Not defined */
> >  fprintf(stderr, "Trying to read invalid spr %d (0x%03x) at "
> >  TARGET_FMT_lx "\n", sprn, sprn, ctx->nip - 4);
> > @@ -4358,9 +4387,18 @@ static inline void gen_op_mfspr(DisasContext *ctx)
> >  qemu_log("Trying to read invalid spr %d (0x%03x) at "
> >   TARGET_FMT_lx "\n", sprn, sprn, ctx->nip - 4);
> >  }
> > -/* Only generate an exception in user space, otherwise this is a 
> > nop */
> > -if (ctx->pr) {
> > -gen_inval_exception(ctx, POWERPC_EXCP_INVAL_SPR);
> > +
> > +/* The behaviour depends on MSR:PR and SPR# bit 0x10,
> > + * it can generate a priv, a hv emu or a no-op
> > + */
> > +if (sprn & 0x10) {
> > +if (ctx->pr) {
> > +gen_priv_exception(ctx, POWERPC_EXCP_INVAL_SPR);
> > +}
> > +} else {
> > +if (ctx->pr || sprn == 0 || sprn == 4 || sprn == 5 || sprn == 
> > 6) {
> > +gen_hvpriv_exception(ctx, POWERPC_EXCP_INVAL_SPR);
> 
> Just double checking this logic.  So in this case we get an exception
> to the hypervisor if executed in guest user mode, but a no-op if
> 
> executed in guest supervisor mode.  That seems.. odd.

From the architecture:

* if spr 0 =0:
  - if MSR PR =1: Hypervisor Emulation Assistance
interrupt
  - if MSR PR =0: Hypervisor Emulation Assistance
interrupt for SPRs 0, 4, 5, and 6 and no opera-
tion (i.e. the instruction is treated as a no-op)
for all other SPRs

* if spr 0 =1:
  - if MSR PR =1: Privileged Instruction type Pro-
gram interrupt
  - if MSR PR =0: no operation (i.e. the instruction
is treated as a no-op)

IE. SPRs with 0x10 are supervisor priv, so PR access will trap to
the OS, whether they are implemented or not.

Otherwise, you get the "system illegal isntruction" handler which
is turned into an HVPRIV on all recent processors (the exception code
will turn that back into a 0x700 if the processor doesn't support
HVPRIV).

It was done this way so that an OS (guest) can context switch a bunch
of supervisor SPRs without having to test if they individually exist
on a given processor.

Cheers,
Ben.

>

Re: [Qemu-devel] [Qemu-ppc] Determining interest in PPC e500spin, yield, and openpic patches

2016-06-14 Thread David Gibson

On Mon, Jun 13, 2016 at 06:08:30PM -0500, alar...@ddci.com wrote:
> We've used older versions of QEMU for several years as a virtual
> target for our OS.  Many thanks to the community for providing this
> platform.

Hi,

Thanks for your interest.

> We've been working to get our OS running under QEMU 2.x and have
> identified a few bugs in QEMU, have made some enhancements, and are
> still tracking down some other curious behaviors.  I'm looking for
> some guidance as to how, and whether, you'd like patches for the
> following.

Always interested in bug fixes.  The e500 support probably doesn't get
a whole lot of attention these days, but you're proof that there are
at least a few people who care.

> 1. There is a defect in ppce500_spin.c:spin_kick() that creates an
>incorrectly sized TLB entry.  This was reported as bug
>https://bugs.launchpad.net/qemu/+bug/1587535  I can provide a
>patch if desired.

Absolutely.

> 2. We have implemented the PPC "yield" instruction.  I can provide a
>patch if desired.

Sounds good.

> 3. We're working on support for openpic timers.  We're not finished,
>but it would be helpful to know if a patch is desired or if we
>should expect to maintain the changes independently.

I don't see a reason we wouldn't want it, unless it's horribly
invasive.  By all means post patches, and I'll review as best I can.

> 4. We're currently tracking down why in our e500 (both unicore and
>multi-core) PPC QEMU 2.5 guest (x86 host), that with interrupts
>disabled, after enabling the decrementer and issuing a "wait"
>instruction QEMU continues to "busy loop", consuming an entire host
>CPU doing apparently nothing.  As expected, issuing a "wait" prior
>to enabling the decrementer leaves the host process idle.  We found
>the bug in the PPC "wait" instruction implementation that was
>independently reported and resolved last week, but that did not fix
>the problem.  We also have our OS running on the g3beige and using
>MSR.POW which causes the host to "sleep", but we are having no joy
>with e500 and "wait".  Any pointers would be appreciated.  When we
>find something we'll report back.

Ok.


So, patches for ppc related code should be sent to myself and Alex
Graf , co-maintainers for the target-ppc and related
code.  You should CC both qemu-...@nongnu.org and
qemu-devel@nongnu.org.

I have a (frequently rebased) git tree where I stage ppc patches at:
https://github.com/dgibson/qemu/tree/ppc-for-2.7

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [Qemu-ppc] Bug in ppc/BookE wait instruction

2016-06-14 Thread David Gibson

On Mon, Jun 06, 2016 at 10:47:28AM +0200, Jakub Horak wrote:
> 
> Hello,
> 
> David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Fri, Jun 03, 2016 at 05:45:49PM +0200, Jakub Horak wrote:
> > > Hello,
> > > I think there's a bug in "wait" instruction code generator for PowerPC
> > > architecture. It doesn't make sense to store a non-initialized register.
> > > 
> > > Best regards,
> > > Jakub Horak
> > 
> > The fix looks correct, but I need a Signed-off-by line in order to
> > apply it.
> 
> Here you go:
> 
> Fixed bug in code generator.
> 
> Signed-off-by: Jakub Horak 
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index f5ceae5..6af567b 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -3439,7 +3439,7 @@ static void gen_sync(DisasContext *ctx)
>  /* wait */
>  static void gen_wait(DisasContext *ctx)
>  {
> -TCGv_i32 t0 = tcg_temp_new_i32();
> +TCGv_i32 t0 = tcg_const_i32(1);
>  tcg_gen_st_i32(t0, cpu_env,
> -offsetof(PowerPCCPU, env) + offsetof(CPUState, halted));
>  tcg_temp_free_i32(t0);
> 
> 
> > 
> > In future, please send such patches to myself and Alex Graf
> > (target-ppc maintainers) the qemu-ppc list as well as qemu-devel.  I
> > wouldn't have spotted this if Marc Cave-Ayland hadn't copied it to me.
> 
> Sorry, I'll be more thoughtful next time.

Sorry, I forgot about this one after you resent, I've now applied to 
ppc-for-2.7.

> 
> Best regards,
> Jakub 
> 
> > 
> > > 
> > > 
> > > diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> > > index f5ceae5..6af567b 100644
> > > --- a/target-ppc/translate.c
> > > +++ b/target-ppc/translate.c
> > > @@ -3439,7 +3439,7 @@ static void gen_sync(DisasContext *ctx)
> > >  /* wait */
> > >  static void gen_wait(DisasContext *ctx)
> > >  {
> > > -TCGv_i32 t0 = tcg_temp_new_i32();
> > > +TCGv_i32 t0 = tcg_const_i32(1);
> > >  tcg_gen_st_i32(t0, cpu_env,
> > > -offsetof(PowerPCCPU, env) + offsetof(CPUState, 
> > > halted));
> > >  tcg_temp_free_i32(t0);
> > > 
> > 
> 
> 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH 8/9] mirror: use synch scheme for drive mirror

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> Block commit of the active image to the backing store on a slow disk
> could never end. For example with the guest with the following loop
> inside
> while true; do
> dd bs=1k count=1 if=/dev/zero of=x
> done
> running above slow storage could not complete the operation with a

s/with/within/

> resonable amount of time:

s/resonable/reasonable/

> virsh blockcommit rhel7 sda --active --shallow
> virsh qemu-monitor-event
> virsh qemu-monitor-command rhel7 \
> '{"execute":"block-job-complete",\
>   "arguments":{"device":"drive-scsi0-0-0-0"} }'
> virsh qemu-monitor-event
> Completion event is never received.
> 
> This problem could not be fixed easily with the current architecture. We
> should either prohibit guest writes (making dirty bitmap dirty) or switch
> to the sycnchronous scheme.

s/sycnchronous/synchronous/

> 
> This patch implements the latter. It adds mirror_before_write_notify
> callback. In this case all data written from the guest is synchnonously

s/synchnonously/synchronously/

> written to the mirror target. Though the problem is solved partially.
> We should switch from bdrv_dirty_bitmap to simple hbitmap. This will be
> done in the next patch.
> 

In other words, the mere act of mirroring a guest will now be
guest-visible in that the guest is auto-throttled while waiting for the
mirroring to be written out.  It seems like you would want to be able to
opt in or out of this scheme.  Is it something that can be toggled
mid-operation (try asynchronous, and switch to synchronous if a timeout
elapses)?

> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 78 
> ++
>  1 file changed, 78 insertions(+)
> 

I'll leave the actual idea to others to review, because there may be
some ramifications that I'm not thinking of.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 7/9] mirror: allow to save buffer for QEMUIOVector in MirrorOp

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:

In the subject: 'allow to save buffer' is not idiomatic English; better
would be 'allow saving the buffer' or simply 'save the buffer'

> Properly cook MirrorOp initialization/deinitialization. The field is not
> yet used actually.

Another "what" but not "why" - expanding the commit message to mention
"why" makes it easier to review.

> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 32 +++-
>  1 file changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index d8be80a..7471211 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -73,6 +73,7 @@ typedef struct MirrorOp {
>  QEMUIOVector qiov;
>  int64_t sector_num;
>  int nb_sectors;
> +void *buf;
>  } MirrorOp;
>  
>  static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
> @@ -100,24 +101,28 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>  s->in_flight--;
>  s->sectors_in_flight -= op->nb_sectors;
>  iov = op->qiov.iov;
> -for (i = 0; i < op->qiov.niov; i++) {
> -MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
> -QSIMPLEQ_INSERT_TAIL(>buf_free, buf, next);
> -s->buf_free_count++;
> -}
>  
> -sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> -chunk_num = op->sector_num / sectors_per_chunk;
> -nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
> -bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
> -if (ret >= 0) {
> -if (s->cow_bitmap) {
> -bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
> +if (op->buf == NULL) {
> +for (i = 0; i < op->qiov.niov; i++) {
> +MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
> +QSIMPLEQ_INSERT_TAIL(>buf_free, buf, next);
> +s->buf_free_count++;
> +}
> +
> +sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +chunk_num = op->sector_num / sectors_per_chunk;
> +nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
> +bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);

I still think it might be smarter to fix bitmap operations to work on
byte inputs (still sectors, or rather granularity chunks, under the
hood, but no need to make users scale things just to have it scaled again).

> +if (ret >= 0) {
> +if (s->cow_bitmap) {
> +bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
> +}
> +s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
>  }
> -s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
>  }
>  
>  qemu_iovec_destroy(>qiov);
> +g_free(op->buf);
>  g_free(op);
>  
>  if (s->waiting_for_io) {
> @@ -255,6 +260,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
> sector_num,
>  op->s = s;
>  op->sector_num = sector_num;
>  op->nb_sectors = nb_sectors;
> +op->buf = NULL;
>  
>  /* Now make a QEMUIOVector taking enough granularity-sized chunks
>   * from s->buf_free.
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 6/9] block: pass qiov into before_write notifier

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> Signed-off-by: Denis V. Lunev 

The commit message says what, but not why.  It's useful to give
reviewers a hint as to why a patch makes sense (such as a future patch
being able to use the write notifier to make mirroring more efficient if
it knows what is being mirrored).

> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/io.c| 12 +++-
>  include/block/block_int.h |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 

> @@ -2228,7 +2230,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
> int64_t sector_num,
>  return 0;
>  }
>  
> -tracked_request_begin(, bs, sector_num, nb_sectors,
> +tracked_request_begin(, bs, NULL, sector_num, nb_sectors,
>BDRV_TRACKED_DISCARD);
>  bdrv_set_dirty(bs, sector_num, nb_sectors);
>  
> @@ -2331,7 +2333,7 @@ static int bdrv_co_do_ioctl(BlockDriverState *bs, int 
> req, void *buf)
>  };
>  BlockAIOCB *acb;
>  
> -tracked_request_begin(_req, bs, 0, 0, BDRV_TRACKED_IOCTL);
> +tracked_request_begin(_req, bs, NULL, 0, 0, BDRV_TRACKED_IOCTL);
>  if (!drv || !drv->bdrv_aio_ioctl) {
>  co.ret = -ENOTSUP;
>  goto out;
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 30a9717..72f463a 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -69,6 +69,7 @@ enum BdrvTrackedRequestType {
>  
>  typedef struct BdrvTrackedRequest {
>  BlockDriverState *bs;
> +QEMUIOVector *qiov;
>  int64_t offset;
>  unsigned int bytes;

I guess bytes and qiov->size are not always redundant, because you pass
NULL for qiov for a zero or discard operation (an alternative would be
to always pass a qiov, even for zero/discard, so that we only need a
single size).  But I've been pointing out our inconsistent use of qiov
for zeroes in multiple places, so I don't think it's worth changing in
this series, but in one of its own if we want to do it.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 07/10] ppc: Add real mode CI load/store instructions for P7 and P8

2016-06-14 Thread David Gibson

On Mon, Jun 13, 2016 at 07:24:53AM +0200, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> Those instructions are only available in hypervisor real mode and
> allow cache inhibited garded access to devices in that mode.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> [clg: fixed checkpatch.pl errors ]
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
> 
> This patch still has a couple of checkpatch issues which I did not
> know quite understand:
> 
> ERROR: Macros with complex values should be enclosed in parenthesis
> #176: FILE: target-ppc/translate.c:10238:
> +#define GEN_LDX_E(name, ldop, opc2, opc3, type, type2, chk)  
>  \
>  GEN_HANDLER_E(name##x, 0x1F, opc2, opc3, 0x0001, type, type2),
> 
> ERROR: Macros with complex values should be enclosed in parenthesis
> #200: FILE: target-ppc/translate.c:10277:
> +#define GEN_STX_E(name, stop, opc2, opc3, type, type2, chk)  
>  \
>  GEN_HANDLER_E(name##x, 0x1F, opc2, opc3, 0x0001, type, type2),

I believe that's because of the trailing comma in those expansions - I
think those are making checkpatch think the macro expands to an
expression, which should be parenthesised, but isn't.

That expansion is kinda weird, and quite possibly a bad idea, but it
wasn't actually changed by this patch - this just changes the macro
arguments.  So, I don't think it's within the scope of this patch to
fix it, i.e. I believe you can ignore the checkpatch warnings in this
case.

> total: 2 errors, 0 warnings, 196 lines checked
> 
> 
>  target-ppc/cpu.h|  4 ++-
>  target-ppc/translate.c  | 59 
> -
>  target-ppc/translate_init.c |  6 +++--
>  3 files changed, 55 insertions(+), 14 deletions(-)
> 
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index f005549c352e..61a24b19ffce 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -1912,6 +1912,8 @@ enum {
>  PPC_POPCNTB= 0x1000ULL,
>  /*   string load / store 
> */
>  PPC_STRING = 0x2000ULL,
> +/*   real mode cache inhibited load / store  
> */
> +PPC_CILDST = 0x4000ULL,
>  
>  /* Floating-point unit extensions
> */
>  /*   Optional floating point instructions
> */
> @@ -2026,7 +2028,7 @@ enum {
>  | PPC_MFAPIDI | PPC_TLBIVA | PPC_TLBIVAX \
>  | PPC_4xx_COMMON | PPC_40x_ICBT | PPC_RFMCI \
>  | PPC_RFDI | PPC_DCR | PPC_DCRX | PPC_DCRUX \
> -| PPC_POPCNTWD)
> +| PPC_POPCNTWD | PPC_CILDST)
>  
>  /* extended type values */
>  
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 2ec858063ecc..b594b18399f7 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -192,7 +192,7 @@ struct DisasContext {
>  uint32_t opcode;
>  uint32_t exception;
>  /* Routine used to access memory */
> -bool pr, hv;
> +bool pr, hv, dr;
>  bool lazy_tlb_flush;
>  int mem_idx;
>  int access_type;
> @@ -387,6 +387,7 @@ typedef struct opcode_t {
>  #if defined(CONFIG_USER_ONLY)
>  #define CHK_HV GEN_PRIV
>  #define CHK_SV GEN_PRIV
> +#define CHK_HVRM GEN_PRIV
>  #else
>  #define CHK_HV  \
>  do {\
> @@ -400,6 +401,12 @@ typedef struct opcode_t {
>  GEN_PRIV;\
>  }\
>  } while (0)
> +#define CHK_HVRM\
> +do {\
> +if (unlikely(ctx->pr || !ctx->hv || ctx->dr)) { \
> +GEN_PRIV;   \
> +}   \
> +} while (0)
>  #endif
>  
>  #define CHK_NONE
> @@ -2895,18 +2902,23 @@ static void glue(gen_, name##ux)(DisasContext *ctx)
>  tcg_temp_free(EA);   
>  \
>  }
>  
> -#define GEN_LDX_E(name, ldop, opc2, opc3, type, type2)   
>  \
> +#define GEN_LDX_E(name, ldop, opc2, opc3, type, type2, chk)  
>  \
>  static void glue(gen_, name##x)(DisasContext *ctx)   
>  \
>  {
>  \
>  TCGv EA; 
>  \
> +chk; 
>  \
>  gen_set_access_type(ctx, ACCESS_INT);
>  \
>  EA =

Re: [Qemu-devel] [PATCH] hw/ppc/spapr: Silence deprecation message in qtest mode

2016-06-14 Thread David Gibson

On Tue, Jun 14, 2016 at 07:23:03PM +0200, Thomas Huth wrote:
> When running "make check", there is currently always an error message
> saying "spapr-pci-vfio-host-bridge is deprecated". This happens because
> the QOM tests are instantiating all possible devices, and the error
> message is currently located in the instance_init() function of the
> device. Since it is legal for the tests to instantiate a device without
> using it, the error message should be silenced when we're running in
> test mode.
> 
> Signed-off-by: Thomas Huth 

Applied to ppc-for-2.7, thanks.

> ---
>  hw/ppc/spapr_pci_vfio.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index cbd3d23..f3cb141 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -27,6 +27,7 @@
>  #include "linux/vfio.h"
>  #include "hw/vfio/vfio.h"
>  #include "qemu/error-report.h"
> +#include "sysemu/qtest.h"
>  
>  #define TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE "spapr-pci-vfio-host-bridge"
>  
> @@ -48,7 +49,9 @@ static Property spapr_phb_vfio_properties[] = {
>  
>  static void spapr_phb_vfio_instance_init(Object *obj)
>  {
> -error_report("spapr-pci-vfio-host-bridge is deprecated");
> +if (!qtest_enabled()) {
> +error_report("spapr-pci-vfio-host-bridge is deprecated");
> +}
>  }
>  
>  bool spapr_phb_eeh_available(sPAPRPHBState *sphb)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2] ppc / sparc: Add a tester for checking whether OpenBIOS runs successfully

2016-06-14 Thread David Gibson

On Tue, Jun 14, 2016 at 03:57:56PM +0200, Thomas Huth wrote:
> Since the mac99 and g3beige PowerPC machines recently broke without
> being noticed, it would be good to have a tester for "make check"
> that detects such issues immediately. A simple way to test the firmware
> of these machines is to use the "-prom-env" parameter of QEMU. This
> parameter can be used to put some Forth code into the 'boot-command'
> firmware variable which then can signal success to the tester by
> writing a magic value to a known memory location. And since some of the
> Sparc machines are also using OpenBIOS, they are now tested with this
> prom-env-tester, too.
> 
> Reviewed-by: Markus Armbruster 
> Signed-off-by: Thomas Huth 
> ---
>  v2: Removed unnecessary include statements (as suggested by Markus)

Beautiful, I've applied this to ppc-for-2.7, assuming I don't get an
objection to taking this through my tree.

> 
>  tests/Makefile.include |  5 +++
>  tests/prom-env-test.c  | 90 
> ++
>  2 files changed, 95 insertions(+)
>  create mode 100644 tests/prom-env-test.c
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 7d63d16..f95a3ca 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -258,6 +258,10 @@ check-qtest-ppc-y += tests/boot-order-test$(EXESUF)
>  check-qtest-ppc64-y += tests/boot-order-test$(EXESUF)
>  check-qtest-ppc64-y += tests/spapr-phb-test$(EXESUF)
>  gcov-files-ppc64-y += ppc64-softmmu/hw/ppc/spapr_pci.c
> +check-qtest-ppc-y = tests/prom-env-test$(EXESUF)
> +check-qtest-ppc64-y = tests/prom-env-test$(EXESUF)
> +check-qtest-sparc-y = tests/prom-env-test$(EXESUF)
> +check-qtest-sparc64-y = tests/prom-env-test$(EXESUF)
>  check-qtest-microblazeel-y = $(check-qtest-microblaze-y)
>  check-qtest-xtensaeb-y = $(check-qtest-xtensa-y)
>  
> @@ -549,6 +553,7 @@ tests/rtc-test$(EXESUF): tests/rtc-test.o
>  tests/m48t59-test$(EXESUF): tests/m48t59-test.o
>  tests/endianness-test$(EXESUF): tests/endianness-test.o
>  tests/spapr-phb-test$(EXESUF): tests/spapr-phb-test.o $(libqos-obj-y)
> +tests/prom-env-test$(EXESUF): tests/prom-env-test.o $(libqos-obj-y)
>  tests/fdc-test$(EXESUF): tests/fdc-test.o
>  tests/ide-test$(EXESUF): tests/ide-test.o $(libqos-pc-obj-y)
>  tests/ahci-test$(EXESUF): tests/ahci-test.o $(libqos-pc-obj-y)
> diff --git a/tests/prom-env-test.c b/tests/prom-env-test.c
> new file mode 100644
> index 000..6df57d2
> --- /dev/null
> +++ b/tests/prom-env-test.c
> @@ -0,0 +1,90 @@
> +/*
> + * Test OpenBIOS-based machines.
> + *
> + * Copyright (c) 2016 Red Hat Inc.
> + *
> + * Author:
> + *Thomas Huth 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2
> + * or later. See the COPYING file in the top-level directory.
> + *
> + * This test is used to check that some OpenBIOS machines can be started
> + * successfully in TCG mode. To do this, we first put some Forth code into
> + * the "boot-command" Open Firmware environment variable. This Forth code
> + * writes a well-known magic value to a known location in memory. Then we
> + * start the guest so that OpenBIOS can boot and finally run the Forth code.
> + * The testing code here then can finally check whether the value has been
> + * successfully written into the guest memory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "libqtest.h"
> +
> +#define MAGIC   0xcafec0de
> +#define ADDRESS 0x4000
> +
> +static void check_guest_memory(void)
> +{
> +uint32_t signature;
> +int i;
> +
> +/* Poll until code has run and modified memory. Wait at most 30 seconds 
> */
> +for (i = 0; i < 3000; ++i) {
> +signature = readl(ADDRESS);
> +if (signature == MAGIC) {
> +break;
> +}
> +g_usleep(1);
> +}
> +
> +g_assert_cmphex(signature, ==, MAGIC);
> +}
> +
> +static void test_machine(const void *machine)
> +{
> +char *args;
> +
> +args = g_strdup_printf("-M %s,accel=tcg -prom-env 'boot-command=%x %x 
> l!'",
> +   (const char *)machine, MAGIC, ADDRESS);
> +
> +qtest_start(args);
> +check_guest_memory();
> +qtest_quit(global_qtest);
> +
> +g_free(args);
> +}
> +
> +static void add_tests(const char *machines[])
> +{
> +int i;
> +char *name;
> +
> +for (i = 0; machines[i] != NULL; i++) {
> +name = g_strdup_printf("prom-env/%s", machines[i]);
> +qtest_add_data_func(name, machines[i], test_machine);
> +g_free(name);
> +}
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +const char *sparc_machines[] = { "SPARCbook", "Voyager", "SS-20", NULL };
> +const char *sparc64_machines[] = { "sun4u", "sun4v", NULL };
> +const char *mac_machines[] = { "mac99", "g3beige", NULL };
> +const char *arch = qtest_get_arch();
> +
> +g_test_init(, , NULL);
> +
> +if (!strcmp(arch, "ppc") || !strcmp(arch, "ppc64")) {
> +

[Qemu-devel] [PATCH] ACPI: ARM: Present GIC version in MADT table

2016-06-14 Thread Shannon Zhao

From: Shannon Zhao 

In ACPI 5.1 Errata, it adds GIC version in GIC distributor structure.
This is useful for guest kernel to identify which version GIC hardware
is. Update GIC distributor structure and present GIC version in MADT
table.

Signed-off-by: Shannon Zhao 
---
 hw/arm/virt-acpi-build.c| 1 +
 include/hw/acpi/acpi-defs.h | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 1fa0581..28fc59c 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -523,6 +523,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtGuestInfo *guest_info)
 gicd->type = ACPI_APIC_GENERIC_DISTRIBUTOR;
 gicd->length = sizeof(*gicd);
 gicd->base_address = memmap[VIRT_GIC_DIST].base;
+gicd->version = guest_info->gic_version;
 
 for (i = 0; i < guest_info->smp_cpus; i++) {
 AcpiMadtGenericInterrupt *gicc = acpi_data_push(table_data,
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 850a962..ea9be0b 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -367,7 +367,9 @@ struct AcpiMadtGenericDistributor {
 uint32_t gic_id;
 uint64_t base_address;
 uint32_t global_irq_base;
-uint32_t reserved2;
+/* ACPI 5.1 Errata 1228 Present GIC version in MADT table */
+uint8_t version;
+uint8_t reserved2[3];
 } QEMU_PACKED;
 
 typedef struct AcpiMadtGenericDistributor AcpiMadtGenericDistributor;
-- 
2.0.4

Re: [Qemu-devel] [PATCH 5/9] mirror: improve performance of mirroring of empty disk

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> We should not take into account zero blocks for delay calculations.
> They are not read and thus IO throttling is not required. In the
> other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes
> days.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Seems reasonable, but I'll let others more familiar with throttling give
the final say.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 18/20] hw/intc/arm_gicv3: Add IRQ handling CPU interface registers

2016-06-14 Thread Shannon Zhao



On 2016/6/14 22:38, Peter Maydell wrote:
> Add the CPU interface registers which deal with acknowledging
> and dismissing interrupts.
> 
> Signed-off-by: Peter Maydell 
Reviewed-by: Shannon Zhao 

> ---
>  hw/intc/arm_gicv3_cpuif.c | 437 
> ++
>  hw/intc/gicv3_internal.h  |   5 +
>  trace-events  |   7 +
>  3 files changed, 449 insertions(+)
> 
> diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
> index a368dbb..5b2972e 100644
> --- a/hw/intc/arm_gicv3_cpuif.c
> +++ b/hw/intc/arm_gicv3_cpuif.c
> @@ -219,6 +219,297 @@ static void icc_pmr_write(CPUARMState *env, const 
> ARMCPRegInfo *ri,
>  gicv3_cpuif_update(cs);
>  }
>  
> +static void icc_activate_irq(GICv3CPUState *cs, int irq)
> +{
> +/* Move the interrupt from the Pending state to Active, and update
> + * the Active Priority Registers
> + */
> +uint32_t mask = icc_gprio_mask(cs, cs->hppi.grp);
> +int prio = cs->hppi.prio & mask;
> +int aprbit = prio >> 1;
> +int regno = aprbit / 32;
> +int regbit = aprbit % 32;
> +
> +cs->icc_apr[cs->hppi.grp][regno] |= (1 << regbit);
> +
> +if (irq < GIC_INTERNAL) {
> +cs->gicr_iactiver0 = deposit32(cs->gicr_iactiver0, irq, 1, 1);
> +cs->gicr_ipendr0 = deposit32(cs->gicr_ipendr0, irq, 1, 0);
> +gicv3_redist_update(cs);
> +} else {
> +gicv3_gicd_active_set(cs->gic, irq);
> +gicv3_gicd_pending_clear(cs->gic, irq);
> +gicv3_update(cs->gic, irq, 1);
> +}
> +}
> +
> +static uint64_t icc_hppir0_value(GICv3CPUState *cs, CPUARMState *env)
> +{
> +/* Return the highest priority pending interrupt register value
> + * for group 0.
> + */
> +bool irq_is_secure;
> +
> +if (cs->hppi.prio == 0xff) {
> +return INTID_SPURIOUS;
> +}
> +
> +/* Check whether we can return the interrupt or if we should return
> + * a special identifier, as per the CheckGroup0ForSpecialIdentifiers
> + * pseudocode. (We can simplify a little because for us ICC_SRE_EL1.RM
> + * is always zero.)
> + */
> +irq_is_secure = (!(cs->gic->gicd_ctlr & GICD_CTLR_DS) &&
> + (cs->hppi.grp != GICV3_G1NS));
> +
> +if (cs->hppi.grp != GICV3_G0 && !arm_is_el3_or_mon(env)) {
> +return INTID_SPURIOUS;
> +}
> +if (irq_is_secure && !arm_is_secure(env)) {
> +/* Secure interrupts not visible to Nonsecure */
> +return INTID_SPURIOUS;
> +}
> +
> +if (cs->hppi.grp != GICV3_G0) {
> +/* Indicate to EL3 that there's a Group 1 interrupt for the other
> + * state pending.
> + */
> +return irq_is_secure ? INTID_SECURE : INTID_NONSECURE;
> +}
> +
> +return cs->hppi.irq;
> +}
> +
> +static uint64_t icc_hppir1_value(GICv3CPUState *cs, CPUARMState *env)
> +{
> +/* Return the highest priority pending interrupt register value
> + * for group 1.
> + */
> +bool irq_is_secure;
> +
> +if (cs->hppi.prio == 0xff) {
> +return INTID_SPURIOUS;
> +}
> +
> +/* Check whether we can return the interrupt or if we should return
> + * a special identifier, as per the CheckGroup1ForSpecialIdentifiers
> + * pseudocode. (We can simplify a little because for us ICC_SRE_EL1.RM
> + * is always zero.)
> + */
> +irq_is_secure = (!(cs->gic->gicd_ctlr & GICD_CTLR_DS) &&
> + (cs->hppi.grp != GICV3_G1NS));
> +
> +if (cs->hppi.grp == GICV3_G0) {
> +/* Group 0 interrupts not visible via HPPIR1 */
> +return INTID_SPURIOUS;
> +}
> +if (irq_is_secure) {
> +if (!arm_is_secure(env)) {
> +/* Secure interrupts not visible in Non-secure */
> +return INTID_SPURIOUS;
> +}
> +} else if (!arm_is_el3_or_mon(env) && arm_is_secure(env)) {
> +/* Group 1 non-secure interrupts not visible in Secure EL1 */
> +return INTID_SPURIOUS;
> +}
> +
> +return cs->hppi.irq;
> +}
> +
> +static uint64_t icc_iar0_read(CPUARMState *env, const ARMCPRegInfo *ri)
> +{
> +GICv3CPUState *cs = icc_cs_from_env(env);
> +uint64_t intid;
> +
> +if (!icc_hppi_can_preempt(cs)) {
> +intid = INTID_SPURIOUS;
> +} else {
> +intid = icc_hppir0_value(cs, env);
> +}
> +
> +if (!(intid >= INTID_SECURE && intid <= INTID_SPURIOUS)) {
> +icc_activate_irq(cs, intid);
> +}
> +
> +trace_gicv3_icc_iar0_read(gicv3_redist_affid(cs), intid);
> +return intid;
> +}
> +
> +static uint64_t icc_iar1_read(CPUARMState *env, const ARMCPRegInfo *ri)
> +{
> +GICv3CPUState *cs = icc_cs_from_env(env);
> +uint64_t intid;
> +
> +if (!icc_hppi_can_preempt(cs)) {
> +intid = INTID_SPURIOUS;
> +} else {
> +intid = icc_hppir1_value(cs, env);
> +}
> +
> +if (!(intid >= INTID_SECURE && intid <= INTID_SPURIOUS)) {
> +icc_activate_irq(cs, intid);
> +

Re: [Qemu-devel] [PATCH 06/10] ppc: Rework generation of priv and inval interrupts

2016-06-14 Thread David Gibson

On Mon, Jun 13, 2016 at 07:24:52AM +0200, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> Recent server processors use the Hypervisor Emulation Assistance
> interrupt for illegal instructions and *some* type of SPR accesses.
> 
> Also the code was always generating inval instructions even for priv
> violations due to setting the wrong flags
> 
> Finally, the checking for PR/HV was open coded everywhere.
> 
> This reworks it all, using little helper macros for checking, and
> adding the HV interrupt (which gets converted back to program check
> in the slow path of excp_helper.c on CPUs that don't want it).
> 
> Signed-off-by: Benjamin Herrenschmidt 
> [clg: fixed checkpatch.pl errors ]
> Signed-off-by: Cédric Le Goater 
> ---
>  linux-user/main.c|   1 +
>  target-ppc/excp_helper.c |  19 ++
>  target-ppc/translate.c   | 690 
> ---
>  3 files changed, 311 insertions(+), 399 deletions(-)
> 
> diff --git a/linux-user/main.c b/linux-user/main.c
> index f8a8764ae97a..9e9b88b458c4 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -1721,6 +1721,7 @@ void cpu_loop(CPUPPCState *env)
>  queue_signal(env, info.si_signo, );
>  break;
>  case POWERPC_EXCP_PROGRAM:  /* Program exception 
> */
> +case POWERPC_EXCP_HV_EMU:   /* HV emulation  
> */
>  /* XXX: check this */
>  switch (env->error_code & ~0xF) {
>  case POWERPC_EXCP_FP:
> diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c
> index 7c44c102db39..054c12de3bff 100644
> --- a/target-ppc/excp_helper.c
> +++ b/target-ppc/excp_helper.c
> @@ -128,6 +128,19 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
> excp_model, int excp)
>  ail = 0;
>  }
>  
> +/* Hypervisor emulation assistance interrupt only exists on server
> + * arch 2.05 server or later. We also don't want to generate it if
> + * we don't have HVB in msr_mask (PAPR mode).
> + */
> +if (excp == POWERPC_EXCP_HV_EMU
> +#if defined(TARGET_PPC64)
> +&& !((env->mmu_model & POWERPC_MMU_64) && (env->msr_mask & MSR_HVB))
> +#endif /* defined(TARGET_PPC64) */
> +
> +) {
> +excp = POWERPC_EXCP_PROGRAM;
> +}
> +
>  switch (excp) {
>  case POWERPC_EXCP_NONE:
>  /* Should never happen */
> @@ -249,6 +262,12 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
> excp_model, int excp)
>  break;
>  }
>  goto store_current;
> +case POWERPC_EXCP_HV_EMU:
> +srr0 = SPR_HSRR0;
> +srr1 = SPR_HSRR1;
> +new_msr |= (target_ulong)MSR_HVB;
> +new_msr |= env->msr & ((target_ulong)1 << MSR_RI);
> +goto store_current;
>  case POWERPC_EXCP_FPU:   /* Floating-point unavailable exception 
> */
>  goto store_current;
>  case POWERPC_EXCP_SYSCALL:   /* System call exception
> */
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index a02ddf52bfe6..2ec858063ecc 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -325,7 +325,19 @@ static inline void gen_debug_exception(DisasContext *ctx)
>  
>  static inline void gen_inval_exception(DisasContext *ctx, uint32_t error)
>  {
> -gen_exception_err(ctx, POWERPC_EXCP_PROGRAM, POWERPC_EXCP_INVAL | error);
> +/* Will be converted to program check if needed */
> +gen_exception_err(ctx, POWERPC_EXCP_HV_EMU, POWERPC_EXCP_INVAL | error);
> +}
> +
> +static inline void gen_priv_exception(DisasContext *ctx, uint32_t error)
> +{
> +gen_exception_err(ctx, POWERPC_EXCP_PROGRAM, POWERPC_EXCP_PRIV | error);
> +}
> +
> +static inline void gen_hvpriv_exception(DisasContext *ctx, uint32_t error)
> +{
> +/* Will be converted to program check if needed */
> +gen_exception_err(ctx, POWERPC_EXCP_HV_EMU, POWERPC_EXCP_PRIV | error);
>  }
>  
>  /* Stop translation */
> @@ -366,6 +378,33 @@ typedef struct opcode_t {
>  const char *oname;
>  } opcode_t;
>  
> +/* Helpers for priv. check */
> +#define GEN_PRIV\
> +do {\
> +gen_priv_exception(ctx, POWERPC_EXCP_PRIV_OPC); return; \
> +} while (0)
> +
> +#if defined(CONFIG_USER_ONLY)
> +#define CHK_HV GEN_PRIV
> +#define CHK_SV GEN_PRIV
> +#else
> +#define CHK_HV  \
> +do {\
> +if (unlikely(ctx->pr || !ctx->hv)) {\
> +GEN_PRIV;   \
> +}   \
> +} while (0)
> +#define CHK_SV   \
> +do { \
> +

Re: [Qemu-devel] [PATCH 10/10] ppc: Add P7/P8 Power Management instructions

2016-06-14 Thread David Gibson

On Mon, Jun 13, 2016 at 07:24:56AM +0200, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> This adds the ISA 2.06 and later power management instructions
> (doze, nap, sleep and rvwinkle) and associated wakeup cause testing
> in LPCR
> 
> Signed-off-by: Benjamin Herrenschmidt 
> [clg: fixed checkpatch.pl errors ]
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  target-ppc/cpu-qom.h|  9 +
>  target-ppc/cpu.h| 17 -
>  target-ppc/excp_helper.c| 59 +
>  target-ppc/helper.h |  1 +
>  target-ppc/translate.c  | 66 
>  target-ppc/translate_init.c | 92 
> -
>  6 files changed, 241 insertions(+), 3 deletions(-)
> 
> diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
> index 969ecdfbd40a..0fad2def0a94 100644
> --- a/target-ppc/cpu-qom.h
> +++ b/target-ppc/cpu-qom.h
> @@ -126,6 +126,15 @@ enum powerpc_excp_t {
>  };
>  
>  
> /*/
> +/* PM instructions */
> +typedef enum {
> +PPC_PM_DOZE,
> +PPC_PM_NAP,
> +PPC_PM_SLEEP,
> +PPC_PM_RVWINKLE,
> +} powerpc_pm_insn_t;
> +
> +/*/
>  /* Input pins model  
> */
>  typedef enum powerpc_input_t powerpc_input_t;
>  enum powerpc_input_t {
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 61a24b19ffce..b1354a4791db 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -383,6 +383,14 @@ struct ppc_slb_t {
>  #define LPCR_LPES1(1ull << (63 - 61))
>  #define LPCR_AIL_SHIFT(63 - 40)  /* Alternate interrupt location */
>  #define LPCR_AIL  (3ull << LPCR_AIL_SHIFT)
> +#define LPCR_P7_PECE0 (1ull << (63 - 49))
> +#define LPCR_P7_PECE1 (1ull << (63 - 50))
> +#define LPCR_P7_PECE2 (1ull << (63 - 51))
> +#define LPCR_P8_PECE0 (1ull << (63 - 47))
> +#define LPCR_P8_PECE1 (1ull << (63 - 48))
> +#define LPCR_P8_PECE2 (1ull << (63 - 49))
> +#define LPCR_P8_PECE3 (1ull << (63 - 50))
> +#define LPCR_P8_PECE4 (1ull << (63 - 51))
>  
>  #define msr_sf   ((env->msr >> MSR_SF)   & 1)
>  #define msr_isf  ((env->msr >> MSR_ISF)  & 1)
> @@ -1059,6 +1067,11 @@ struct CPUPPCState {
>   * instructions and SPRs are diallowed if MSR:HV is 0
>   */
>  bool has_hv_mode;
> +/* On P7/P8, set when in PM state, we need to handle resume
> + * in a special way (such as routing some resume causes to
> + * 0x100), so flag this here.
> + */
> +bool in_pm_state;
>  #endif
>  
>  /* Those resources are used only during code translation */
> @@ -2068,6 +2081,8 @@ enum {
>  PPC2_FP_CVT_S64= 0x0001ULL,
>  /* Transactional Memory (ISA 2.07, Book II)  
> */
>  PPC2_TM= 0x0002ULL,
> +/* Server PM instructgions (ISA 2.06, Book III)  
> */
> +PPC2_PM_ISA206 = 0x0004ULL,
>  
>  #define PPC_TCG_INSNS2 (PPC2_BOOKE206 | PPC2_VSX | PPC2_PRCNTL | PPC2_DBRX | 
> \
>  PPC2_ISA205 | PPC2_VSX207 | PPC2_PERM_ISA206 | \
> @@ -2075,7 +2090,7 @@ enum {
>  PPC2_FP_CVT_ISA206 | PPC2_FP_TST_ISA206 | \
>  PPC2_BCTAR_ISA207 | PPC2_LSQ_ISA207 | \
>  PPC2_ALTIVEC_207 | PPC2_ISA207S | PPC2_DFP | \
> -PPC2_FP_CVT_S64 | PPC2_TM)
> +PPC2_FP_CVT_S64 | PPC2_TM | PPC2_PM_ISA206)
>  };
>  
>  
> /*/
> diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c
> index 054c12de3bff..533866b87b60 100644
> --- a/target-ppc/excp_helper.c
> +++ b/target-ppc/excp_helper.c
> @@ -101,6 +101,44 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
> excp_model, int excp)
>  asrr0 = -1;
>  asrr1 = -1;
>  
> +/* check for special resume at 0x100 from doze/nap/sleep/winkle on P7/P8 
> */
> +if (env->in_pm_state) {
> +env->in_pm_state = false;
> +
> +/* Pretend to be returning from doze always as we don't lose state */
> +msr |= (0x1ull << (63 - 47));
> +
> +/* Non-machine check are routed to 0x100 with a wakeup cause
> + * encoded in SRR1
> + */
> +if (excp != POWERPC_EXCP_MCHECK) {
> +switch (excp) {
> +case POWERPC_EXCP_RESET:
> +msr |= 0x4ull << (63 - 45);
> +break;
> +case POWERPC_EXCP_EXTERNAL:
> +msr |= 0x8ull << (63 - 45);
> +break;
> +case POWERPC_EXCP_DECR:
> +msr |= 0x6ull << (63 - 45);
> +

Re: [Qemu-devel] [PATCH 05/10] ppc: Fix generation if ISI/DSI vs. HV mode

2016-06-14 Thread David Gibson

On Tue, Jun 14, 2016 at 08:42:26AM +0200, Cédric Le Goater wrote:
> On 06/14/2016 08:34 AM, David Gibson wrote:
> > On Mon, Jun 13, 2016 at 07:24:51AM +0200, Cédric Le Goater wrote:
> >> From: Benjamin Herrenschmidt 
> >>
> >> Under some circumstances, we need to direct ISI and DSI interrupts
> >> at the hypervisor, turning them into HISI/HDSI, and using different
> >> SPRs (HDSISR and HDAR) depending on the combination of MSR_DR and
> >> the corresponding VPM bits in LPCR.
> >>
> >> This moves part of the code into helpers that are fixed to select
> >> the right exception type and registers. On pre-P7 processors, LPCR
> >> is 0 which provides the old behaviour of directing the interrupts
> >> at the supervisor.
> >>
> >> Thanks to Andrei Warkentin for finding a bug when HV=1
> >>
> >> Signed-off-by: Benjamin Herrenschmidt 
> >> Reviewed-by: David Gibson 
> >> ---
> >>  target-ppc/mmu-hash64.c | 69 
> >> +++--
> >>  1 file changed, 50 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
> >> index 668da5e22653..072a952c8bd5 100644
> >> --- a/target-ppc/mmu-hash64.c
> >> +++ b/target-ppc/mmu-hash64.c
> >> @@ -613,6 +613,47 @@ unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU 
> >> *cpu,
> >>  return 0;
> >>  }
> >>  
> >> +static void ppc_hash64_set_isi(CPUState *cs, CPUPPCState *env,
> >> +   uint64_t error_code)
> >> +{
> >> +bool vpm;
> >> +
> >> +if (msr_ir) {
> >> +vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM1);
> >> +} else {
> >> +vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM0);
> >> +}
> >> +if (vpm && !msr_hv) {
> >> +cs->exception_index = POWERPC_EXCP_HISI;
> > 
> > In the ISI case, you use HISI if !msr_hv..
> > 
> >> +} else {
> >> +cs->exception_index = POWERPC_EXCP_ISI;
> >> +}
> >> +env->error_code = error_code;
> >> +}
> >> +
> >> +static void ppc_hash64_set_dsi(CPUState *cs, CPUPPCState *env, uint64_t 
> >> dar,
> >> +   uint64_t dsisr)
> >> +{
> >> +bool vpm;
> >> +
> >> +if (msr_dr) {
> >> +vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM1);
> >> +} else {
> >> +vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM0);
> >> +}
> >> +if (vpm && msr_hv) {
> >> +cs->exception_index = POWERPC_EXCP_HDSI;
> > 
> > ..but in the DSI case you use HDSI if msr_hv.  Is that really right?
> 
> No. I forgot to add this patch from Andrei :
> 
>   
> https://github.com/legoater/qemu/commit/e218fd3ba945bb0f483f5f12bedbb74d897cd5b9
> 
> I will send it as a fix.

Given that there are a number of tweaks this series needs, I'd prefer
to see it folded in rather than as a separate patch.

> 
> C.  
> 
> >> +env->spr[SPR_HDAR] = dar;
> >> +env->spr[SPR_HDSISR] = dsisr;
> >> +} else {
> >> +cs->exception_index = POWERPC_EXCP_DSI;
> >> +env->spr[SPR_DAR] = dar;
> >> +env->spr[SPR_DSISR] = dsisr;
> >> +   }
> >> +env->error_code = 0;
> >> +}
> >> +
> >> +
> >>  int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
> >>  int rwx, int mmu_idx)
> >>  {
> >> @@ -623,7 +664,7 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr 
> >> eaddr,
> >>  hwaddr pte_offset;
> >>  ppc_hash_pte64_t pte;
> >>  int pp_prot, amr_prot, prot;
> >> -uint64_t new_pte1;
> >> +uint64_t new_pte1, dsisr;
> >>  const int need_prot[] = {PAGE_READ, PAGE_WRITE, PAGE_EXEC};
> >>  hwaddr raddr;
> >>  
> >> @@ -657,26 +698,21 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, 
> >> vaddr eaddr,
> >>  
> >>  /* 3. Check for segment level no-execute violation */
> >>  if ((rwx == 2) && (slb->vsid & SLB_VSID_N)) {
> >> -cs->exception_index = POWERPC_EXCP_ISI;
> >> -env->error_code = 0x1000;
> >> +ppc_hash64_set_isi(cs, env, 0x1000);
> >>  return 1;
> >>  }
> >>  
> >>  /* 4. Locate the PTE in the hash table */
> >>  pte_offset = ppc_hash64_htab_lookup(cpu, slb, eaddr, );
> >>  if (pte_offset == -1) {
> >> +dsisr = 0x4000;
> >>  if (rwx == 2) {
> >> -cs->exception_index = POWERPC_EXCP_ISI;
> >> -env->error_code = 0x4000;
> >> +ppc_hash64_set_isi(cs, env, dsisr);
> >>  } else {
> >> -cs->exception_index = POWERPC_EXCP_DSI;
> >> -env->error_code = 0;
> >> -env->spr[SPR_DAR] = eaddr;
> >>  if (rwx == 1) {
> >> -env->spr[SPR_DSISR] = 0x4200;
> >> -} else {
> >> -env->spr[SPR_DSISR] = 0x4000;
> >> +dsisr |= 0x0200;
> >>  }
> >> +ppc_hash64_set_dsi(cs, env, eaddr, dsisr);
> >>  }
> >>  return 1;
> >>  }
> >> @@ -705,14 +741,9 @@ int

Re: [Qemu-devel] [PATCH 09/10] ppc: Move exception generation code out of line

2016-06-14 Thread David Gibson

On Mon, Jun 13, 2016 at 10:36:24AM +0200, Cédric Le Goater wrote:
> On 06/13/2016 09:44 AM, Thomas Huth wrote:
> > On 13.06.2016 07:24, Cédric Le Goater wrote:
> >> From: Benjamin Herrenschmidt 
> >>
> >> There's no point inlining this, if you hit the exception case you exit
> >> anyway, and not inlining saves about 100K of code size (and cache
> >> footprint).
> >>
> >> Signed-off-by: Benjamin Herrenschmidt 
> >> ---
> >>  target-ppc/translate.c | 9 ++---
> >>  1 file changed, 6 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> >> index f211d175c09c..600d5db2bb9a 100644
> >> --- a/target-ppc/translate.c
> >> +++ b/target-ppc/translate.c
> >> @@ -283,7 +283,8 @@ void gen_update_current_nip(void *opaque)
> >>  tcg_gen_movi_tl(cpu_nip, ctx->nip);
> >>  }
> >>  
> >> -static inline void gen_exception_err(DisasContext *ctx, uint32_t excp, 
> >> uint32_t error)
> >> +static void __attribute__((noinline))
> >> +gen_exception_err(DisasContext *ctx, uint32_t excp, uint32_t error)
> >>  {
> >>  TCGv_i32 t0, t1;
> >>  if (ctx->exception == POWERPC_EXCP_NONE) {
> >> @@ -297,7 +298,8 @@ static inline void gen_exception_err(DisasContext 
> >> *ctx, uint32_t excp, uint32_t
> >>  ctx->exception = (excp);
> >>  }
> >>  
> >> -static inline void gen_exception(DisasContext *ctx, uint32_t excp)
> >> +static void __attribute__((noinline))
> >> +gen_exception(DisasContext *ctx, uint32_t excp)
> >>  {
> >>  TCGv_i32 t0;
> >>  if (ctx->exception == POWERPC_EXCP_NONE) {
> >> @@ -309,7 +311,8 @@ static inline void gen_exception(DisasContext *ctx, 
> >> uint32_t excp)
> >>  ctx->exception = (excp);
> >>  }
> >>  
> >> -static inline void gen_debug_exception(DisasContext *ctx)
> >> +static void __attribute__((noinline))
> >> +gen_debug_exception(DisasContext *ctx)
> >>  {
> >>  TCGv_i32 t0;
> > 
> > Do you get the same results if you just remove the "inline" keyword,
> > without adding the "__attribute__((noinline))" ? If yes, I'd suggest to
> > do this patch without the "__attribute__((noinline))" - that's easier to
> > read, and the compiler can still decide to inline something in case it's
> > better one a certain architecture.
> 
> Yes. They are no differences. 
> 
> The interesting part though is that the .text is about the same size. 
> There is even a slight increase of ~2K with gcc 4.9.2 (intel host) and 
> a slight decrease of ~1K with gcc 5.3.1 (ppc64le host).
> 
> I guess we can just drop that patch. It does not seem to bring much.

I would prefer to see the inline keyword removed.  Except in the case
of tiny header functions, it's very rarely a good idea - usually the
compiler will have better information on whether to inline or not.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2] Fix confusing argument names in some common functions

2016-06-14 Thread David Gibson

On Tue, Jun 14, 2016 at 03:26:17PM +0300, Sergey Sorokin wrote:
> There are functions tlb_fill(), cpu_unaligned_access() and
> do_unaligned_access() that are called with access type and mmu index
> arguments. But these arguments are named 'is_write' and 'is_user' in their
> declarations. The patches fix the arguments to avoid a confusion.
> 
> Signed-off-by: Sergey Sorokin 
> ---
> In the second version of the patch a type of access_type argument
> was changed from int to MMUAccessType. To allow it the declaration of
> MMUAccessType was moved from exec/cpu-common.h into qom/cpu.h
> The series of two patches was merged into one.
> 
>  include/exec/cpu-common.h |  6 --
>  include/exec/exec-all.h   |  4 ++--
>  include/qom/cpu.h | 15 +++
>  target-alpha/cpu.h|  3 ++-
>  target-alpha/mem_helper.c |  7 ---
>  target-arm/internals.h|  5 +++--
>  target-arm/op_helper.c| 30 +-
>  target-cris/op_helper.c   |  6 +++---
>  target-i386/mem_helper.c  |  6 +++---
>  target-lm32/op_helper.c   |  6 +++---
>  target-m68k/op_helper.c   |  6 +++---
>  target-microblaze/op_helper.c |  6 +++---
>  target-mips/cpu.h |  3 ++-
>  target-mips/op_helper.c   | 10 +-
>  target-moxie/helper.c |  6 +++---
>  target-openrisc/mmu_helper.c  |  4 ++--
>  target-ppc/mmu_helper.c   |  8 
>  target-s390x/mem_helper.c |  6 +++---
>  target-sh4/op_helper.c|  6 +++---
>  target-sparc/cpu.h|  7 ---
>  target-sparc/ldst_helper.c| 13 +++--
>  target-tricore/op_helper.c|  6 +++---
>  target-unicore32/op_helper.c  |  4 ++--
>  target-xtensa/cpu.h   |  3 ++-
>  target-xtensa/op_helper.c | 11 ++-
>  25 files changed, 100 insertions(+), 87 deletions(-)
> 
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index aaee995..9ac1eaf 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -23,12 +23,6 @@ typedef struct CPUListState {
>  FILE *file;
>  } CPUListState;
>  
> -typedef enum MMUAccessType {
> -MMU_DATA_LOAD  = 0,
> -MMU_DATA_STORE = 1,
> -MMU_INST_FETCH = 2
> -} MMUAccessType;
> -
>  #if !defined(CONFIG_USER_ONLY)
>  
>  enum device_endian {
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index c1f59fa..db79ab6 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -361,8 +361,8 @@ extern uintptr_t tci_tb_ptr;
>  struct MemoryRegion *iotlb_to_region(CPUState *cpu,
>   hwaddr index, MemTxAttrs attrs);
>  
> -void tlb_fill(CPUState *cpu, target_ulong addr, int is_write, int mmu_idx,
> -  uintptr_t retaddr);
> +void tlb_fill(CPUState *cpu, target_ulong addr, MMUAccessType access_type,
> +  int mmu_idx, uintptr_t retaddr);
>  
>  #endif
>  
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 32f3af3..422ac41 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -60,6 +60,12 @@ typedef uint64_t vaddr;
>  #define CPU_CLASS(class) OBJECT_CLASS_CHECK(CPUClass, (class), TYPE_CPU)
>  #define CPU_GET_CLASS(obj) OBJECT_GET_CLASS(CPUClass, (obj), TYPE_CPU)
>  
> +typedef enum MMUAccessType {
> +MMU_DATA_LOAD  = 0,
> +MMU_DATA_STORE = 1,
> +MMU_INST_FETCH = 2
> +} MMUAccessType;
> +
>  typedef struct CPUWatchpoint CPUWatchpoint;
>  
>  typedef void (*CPUUnassignedAccess)(CPUState *cpu, hwaddr addr,
> @@ -142,7 +148,8 @@ typedef struct CPUClass {
>  void (*do_interrupt)(CPUState *cpu);
>  CPUUnassignedAccess do_unassigned_access;
>  void (*do_unaligned_access)(CPUState *cpu, vaddr addr,
> -int is_write, int is_user, uintptr_t 
> retaddr);
> +MMUAccessType access_type,
> +int mmu_idx, uintptr_t retaddr);
>  bool (*virtio_is_big_endian)(CPUState *cpu);
>  int (*memory_rw_debug)(CPUState *cpu, vaddr addr,
> uint8_t *buf, int len, bool is_write);
> @@ -716,12 +723,12 @@ static inline void cpu_unassigned_access(CPUState *cpu, 
> hwaddr addr,
>  }
>  
>  static inline void cpu_unaligned_access(CPUState *cpu, vaddr addr,
> -int is_write, int is_user,
> -uintptr_t retaddr)
> +MMUAccessType access_type,
> +int mmu_idx, uintptr_t retaddr)
>  {
>  CPUClass *cc = CPU_GET_CLASS(cpu);
>  
> -cc->do_unaligned_access(cpu, addr, is_write, is_user, retaddr);
> +cc->do_unaligned_access(cpu, addr, access_type, mmu_idx, retaddr);
>  }
>  #endif
>  
> diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h
> index e71ea70..cfbb615 100644
> --- a/target-alpha/cpu.h
> +++ b/target-alpha/cpu.h
> @@ -323,7 +323,8 @@ hwaddr

Re: [Qemu-devel] [PATCH 03/10] ppc: Rework POWER7 & POWER8 exception model (part 2)

2016-06-14 Thread David Gibson

On Wed, Jun 15, 2016 at 07:19:39AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2016-06-14 at 16:25 +1000, David Gibson wrote:
> > > Properly implement LPES0/1 handling for HV vs. !HV mode.
> > > 
> > > Signed-off-by: Benjamin Herrenschmidt 
> > > [clg: AIL implementation was fixed in commit 5c94b2a5e5ef
> > >   fixed checkpatch.pl errors ]
> > 
> > Code looks ok, but the short description really needs an update,
> > since
> > this has been taken out of its original series context.
> 
> This is still what this does. It properly implements support for LPCR0
> (LPCR1 isn't supported). It also fixes how the HV bit is handled when
> taking interrupts and which set of SRR's are used in some cases.

It's the "Rework POWER7 & POWER8 exception model (part 2)" bit which
is the problem.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH 02/10] ppc: Create cpu_ppc_set_papr() helper (for LPCR)

2016-06-14 Thread David Gibson

On Tue, Jun 14, 2016 at 08:52:12AM +0200, Cédric Le Goater wrote:
> On 06/14/2016 08:15 AM, David Gibson wrote:
> > On Mon, Jun 13, 2016 at 07:24:48AM +0200, Cédric Le Goater wrote:
> >> From: Benjamin Herrenschmidt 
> >>
> >> And move the code adjusting the MSR mask and calling kvmppc_set_papr()
> >> to it. This allows us to add a few more things such as disabling setting
> >> of MSR:HV and appropriate LPCR bits which will be used when fixing
> >> the exception model.
> >>
> >> Signed-off-by: Benjamin Herrenschmidt 
> >> Reviewed-by: David Gibson 
> >> [clg: previous commit 26a7f1291bb5 did not include the LPCR setting as
> >>   it was not needed at the time ]
> > 
> > I see how this came about, but it means the commit message (both long
> > and short) is confusingly mismatched from the code now.
> 
> OK. I will work on it.
> 
> For my education, how much can we change the initial changelog of a patch ?
> Is it considered as part of the code ? 

I'm not entirely sure I follow the question, but AFAIK it's entirely
ok to reword and clarify a commit message as it moves through the
signed-off-by chain.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH 4/9] mirror: efficiently zero out target

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> With a bdrv_co_write_zeroes method on a target BDS zeroes will not be placed
> into the wire. Thus the target could be very efficiently zeroed out. This
> is should be done with the largest chunk possible.
> 
> This improves the performance of the live migration of the empty disk by
> 150 times if NBD supports write_zeroes.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 32 +++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index c7b3639..c2f8773 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -21,6 +21,7 @@
>  #include "qemu/ratelimit.h"
>  #include "qemu/bitmap.h"
>  
> +#define MIRROR_ZERO_CHUNK   (3u << (29 - BDRV_SECTOR_BITS))  /* 1.5 Gb */

Probably nicer to track this in bytes.  And do you really want a
hard-coded arbitrary limit, or is it better to live with
MIN_NON_ZERO(target_bs->bl.max_pwrite_zeroes, INT_MAX)?

> @@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s)
>  
>  end = s->bdev_length / BDRV_SECTOR_SIZE;
>  
> -if (base == NULL && !bdrv_has_zero_init(target_bs)) {
> +if (base == NULL && !bdrv_has_zero_init(target_bs) &&
> +target_bs->drv->bdrv_co_write_zeroes == NULL) {

Indentation is off, although if checkpatch.pl doesn't complain I guess
it doesn't matter that much.

Why should you care whether the target_bs->drv implements a callback?
Can't you just rely on the normal bdrv_*() functions to do the dirty
work of picking the most efficient implementation without you having to
bypass the block layer?  In fact, isn't that the whole goal of
bdrv_make_zero() - why not call that instead of reimplementing it?

Patch needs rebasing - we've redone this into bdrv_co_pwrite_zeroes and
a byte interface, since upstream commit c1499a5e.

>  bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);
>  return 0;
>  }
> @@ -546,6 +548,34 @@ static int mirror_dirty_init(MirrorBlockJob *s)
>  }
>  sector_num += n;
>  }
> +
> +if (base != NULL || bdrv_has_zero_init(target_bs)) {

You're now repeating the conditional that used to be 'bool
mark_all_dirty' (well, this is !mark_all_dirty); is it worth keeping the
simpler bool around?

> +/* no need to zero out entire disk */
> +return 0;
> +}
> +
> +for (sector_num = 0; sector_num < end; ) {
> +int nb_sectors = MIN(MIRROR_ZERO_CHUNK, end - sector_num);

Why limit yourself to 1.5G? It's either too small for what you can
really do, or too large for what the device permits.  See my above
comment about MIN_NON_ZERO.

> +int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> +
> +if (now - last_pause_ns > SLICE_TIME) {
> +last_pause_ns = now;
> +block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, 0);
> +}
> +
> +if (block_job_is_cancelled(>common)) {
> +return -EINTR;
> +}
> +
> +if (s->in_flight == MAX_IN_FLIGHT) {
> +trace_mirror_yield(s, s->in_flight, s->buf_free_count, -1);
> +mirror_wait_for_io(s);
> +continue;
> +}

Hmm - I guess your mirror yield points are why you couldn't just
directly use bdrv_make_zero(); but is that something where some code
refactoring can share more code rather than duplicating it?

> +
> +mirror_do_zero_or_discard(s, sector_num, nb_sectors, false);
> +sector_num += nb_sectors;
> +}
>  return 0;
>  }
>  
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 00/20] GICv3 emulation

2016-06-14 Thread Shannon Zhao



On 2016/6/14 22:38, Peter Maydell wrote:
> This series implements emulation of the GICv3 interrupt controller.
> It is based to some extent on previous patches from Shlomo and
> Pavel, but the bulk of it has turned out to be new code. (The
> combination of changing the underlying data structures, adding
> support for TrustZone and implementing proper GICv3 behaviour rather
> than borrowing fragments of GICv2 emulation code meant there wasn't
> much left to reuse.) I've tried to reflect this in the various
> authorship credits on the patches, but please let me know if you
> feel I got anything miscredited one way or the other.
> 
> Key points about the GICv3 emulated here:
>  * "non-legacy" only, ie system registers and affinity routing
>  * TrustZone is implemented
>  * no virtualization support
>  * only the "core" GICv3, so no LPI support (via ITS or otherwise)
>  * no attempt to work around the Linux guest kernel bug fixed
>in commit 7c9b973061b0 (so you need that fix for your guest to
>boot with this GICv3)
> 

I've tested this series using a linux kernel with single vcpu and 123
vcpus and also tested with UEFI.

Tested-by: Shannon Zhao 

Thanks,
-- 
Shannon

Re: [Qemu-devel] [PATCH v3 16/20] hw/intc/arm_gicv3: Implement gicv3_cpuif_update()

2016-06-14 Thread Shannon Zhao



On 2016/6/14 22:38, Peter Maydell wrote:
> Implement the gicv3_cpuif_update() function which deals with correctly
> asserting IRQ and FIQ based on the current running priority of the CPU,
> the priority of the highest priority pending interrupt and the CPU's
> current exception level and security state.
> 
> Signed-off-by: Peter Maydell 
Reviewed-by: Shannon Zhao 

> ---
>  hw/intc/arm_gicv3_cpuif.c | 140 
> +-
>  hw/intc/gicv3_internal.h  |   5 +-
>  trace-events  |   2 +
>  3 files changed, 142 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
> index e112646..7faf3c0 100644
> --- a/hw/intc/arm_gicv3_cpuif.c
> +++ b/hw/intc/arm_gicv3_cpuif.c
> @@ -36,6 +36,142 @@ static bool gicv3_use_ns_bank(CPUARMState *env)
>  return !arm_is_secure_below_el3(env);
>  }
>  
> +static int icc_highest_active_prio(GICv3CPUState *cs)
> +{
> +/* Calculate the current running priority based on the set bits
> + * in the Active Priority Registers.
> + */
> +int i;
> +
> +for (i = 0; i < ARRAY_SIZE(cs->icc_apr[0]); i++) {
> +uint32_t apr = cs->icc_apr[GICV3_G0][i] |
> +cs->icc_apr[GICV3_G1][i] | cs->icc_apr[GICV3_G1NS][i];
> +
> +if (!apr) {
> +continue;
> +}
> +return (i * 32 + ctz32(apr)) << (GIC_MIN_BPR + 1);
> +}
> +/* No current active interrupts: return idle priority */
> +return 0xff;
> +}
> +
> +static uint32_t icc_gprio_mask(GICv3CPUState *cs, int group)
> +{
> +/* Return a mask word which clears the subpriority bits from
> + * a priority value for an interrupt in the specified group.
> + * This depends on the BPR value:
> + *  a BPR of 0 means the group priority bits are [7:1];
> + *  a BPR of 1 means they are [7:2], and so on down to
> + *  a BPR of 7 meaning no group priority bits at all.
> + * Which BPR to use depends on the group of the interrupt and
> + * the current ICC_CTLR.CBPR settings.
> + */
> +if ((group == GICV3_G1 && cs->icc_ctlr_el1[GICV3_S] & ICC_CTLR_EL1_CBPR) 
> ||
> +(group == GICV3_G1NS &&
> + cs->icc_ctlr_el1[GICV3_NS] & ICC_CTLR_EL1_CBPR)) {
> +group = GICV3_G0;
> +}
> +
> +return ~0U << ((cs->icc_bpr[group] & 7) + 1);
> +}
> +
> +static bool icc_no_enabled_hppi(GICv3CPUState *cs)
> +{
> +/* Return true if there is no pending interrupt, or the
> + * highest priority pending interrupt is in a group which has been
> + * disabled at the CPU interface by the ICC_IGRPEN* register enable bits.
> + */
> +return cs->hppi.prio == 0xff || (cs->icc_igrpen[cs->hppi.grp] == 0);
> +}
> +
> +static bool icc_hppi_can_preempt(GICv3CPUState *cs)
> +{
> +/* Return true if we have a pending interrupt of sufficient
> + * priority to preempt.
> + */
> +int rprio;
> +uint32_t mask;
> +
> +if (icc_no_enabled_hppi(cs)) {
> +return false;
> +}
> +
> +if (cs->hppi.prio >= cs->icc_pmr_el1) {
> +/* Priority mask masks this interrupt */
> +return false;
> +}
> +
> +rprio = icc_highest_active_prio(cs);
> +if (rprio == 0xff) {
> +/* No currently running interrupt so we can preempt */
> +return true;
> +}
> +
> +mask = icc_gprio_mask(cs, cs->hppi.grp);
> +
> +/* We only preempt a running interrupt if the pending interrupt's
> + * group priority is sufficient (the subpriorities are not considered).
> + */
> +if ((cs->hppi.prio & mask) < (rprio & mask)) {
> +return true;
> +}
> +
> +return false;
> +}
> +
> +void gicv3_cpuif_update(GICv3CPUState *cs)
> +{
> +/* Tell the CPU about its highest priority pending interrupt */
> +int irqlevel = 0;
> +int fiqlevel = 0;
> +ARMCPU *cpu = ARM_CPU(cs->cpu);
> +CPUARMState *env = >env;
> +
> +trace_gicv3_cpuif_update(gicv3_redist_affid(cs), cs->hppi.irq,
> + cs->hppi.grp, cs->hppi.prio);
> +
> +if (cs->hppi.grp == GICV3_G1 && !arm_feature(env, ARM_FEATURE_EL3)) {
> +/* If a Security-enabled GIC sends a G1S interrupt to a
> + * Security-disabled CPU, we must treat it as if it were G0.
> + */
> +cs->hppi.grp = GICV3_G0;
> +}
> +
> +if (icc_hppi_can_preempt(cs)) {
> +/* We have an interrupt: should we signal it as IRQ or FIQ?
> + * This is described in the GICv3 spec section 4.6.2.
> + */
> +bool isfiq;
> +
> +switch (cs->hppi.grp) {
> +case GICV3_G0:
> +isfiq = true;
> +break;
> +case GICV3_G1:
> +isfiq = (!arm_is_secure(env) ||
> + (arm_current_el(env) == 3 && arm_el_is_aa64(env, 3)));
> +break;
> +case GICV3_G1NS:
> +isfiq = arm_is_secure(env);
> +break;
> +default:
> +

Re: [Qemu-devel] [PATCH v3 12/20] hw/intc/arm_gicv3: Implement GICv3 redistributor registers

2016-06-14 Thread Shannon Zhao



On 2016/6/14 22:38, Peter Maydell wrote:
> From: Shlomo Pongratz 
> 
> Implement the redistributor registers of a GICv3.
> 
> Signed-off-by: Shlomo Pongratz 
> [PMM: significantly overhauled/rewritten:
>  * use the new data structures
>  * restructure register read/write to handle different width accesses
>natively, since almost all registers are 32-bit only, rather
>than implementing everything as byte accesses
>  * implemented security extension support
> ]
> Signed-off-by: Peter Maydell 
Reviewed-by: Shannon Zhao 

> ---
>  hw/intc/Makefile.objs  |   1 +
>  hw/intc/arm_gicv3_redist.c | 501 
> +
>  hw/intc/gicv3_internal.h   |   4 +
>  trace-events   |   6 +
>  4 files changed, 512 insertions(+)
>  create mode 100644 hw/intc/arm_gicv3_redist.c
> 
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index a173d29..417db11 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -15,6 +15,7 @@ common-obj-$(CONFIG_ARM_GIC) += arm_gicv2m.o
>  common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_common.o
>  common-obj-$(CONFIG_ARM_GIC) += arm_gicv3.o
>  common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_dist.o
> +common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_redist.o
>  common-obj-$(CONFIG_OPENPIC) += openpic.o
>  
>  obj-$(CONFIG_APIC) += apic.o apic_common.o
> diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
> new file mode 100644
> index 000..e0f23e1
> --- /dev/null
> +++ b/hw/intc/arm_gicv3_redist.c
> @@ -0,0 +1,501 @@
> +/*
> + * ARM GICv3 emulation: Redistributor
> + *
> + * Copyright (c) 2015 Huawei.
> + * Copyright (c) 2016 Linaro Limited.
> + * Written by Shlomo Pongratz, Peter Maydell
> + *
> + * This code is licensed under the GPL, version 2 or (at your option)
> + * any later version.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "trace.h"
> +#include "gicv3_internal.h"
> +
> +static uint32_t mask_group(GICv3CPUState *cs, MemTxAttrs attrs)
> +{
> +/* Return a 32-bit mask which should be applied for this set of 32
> + * interrupts; each bit is 1 if access is permitted by the
> + * combination of attrs.secure and GICR_GROUPR. (GICR_NSACR does
> + * not affect config register accesses, unlike GICD_NSACR.)
> + */
> +if (!attrs.secure && !(cs->gic->gicd_ctlr & GICD_CTLR_DS)) {
> +/* bits for Group 0 or Secure Group 1 interrupts are RAZ/WI */
> +return cs->gicr_igroupr0;
> +}
> +return 0xU;
> +}
> +
> +static void gicr_write_set_bitmap_reg(GICv3CPUState *cs, MemTxAttrs attrs,
> +  uint32_t *reg, uint32_t val)
> +{
> +/* Helper routine to implement writing to a "set-bitmap" register */
> +val &= mask_group(cs, attrs);
> +*reg |= val;
> +gicv3_redist_update(cs);
> +}
> +
> +static void gicr_write_clear_bitmap_reg(GICv3CPUState *cs, MemTxAttrs attrs,
> +uint32_t *reg, uint32_t val)
> +{
> +/* Helper routine to implement writing to a "clear-bitmap" register */
> +val &= mask_group(cs, attrs);
> +*reg &= ~val;
> +gicv3_redist_update(cs);
> +}
> +
> +static uint32_t gicr_read_bitmap_reg(GICv3CPUState *cs, MemTxAttrs attrs,
> + uint32_t reg)
> +{
> +reg &= mask_group(cs, attrs);
> +return reg;
> +}
> +
> +static uint8_t gicr_read_ipriorityr(GICv3CPUState *cs, MemTxAttrs attrs,
> +int irq)
> +{
> +/* Read the value of GICR_IPRIORITYR for the specified interrupt,
> + * honouring security state (these are RAZ/WI for Group 0 or Secure
> + * Group 1 interrupts).
> + */
> +uint32_t prio;
> +
> +prio = cs->gicr_ipriorityr[irq];
> +
> +if (!attrs.secure && !(cs->gic->gicd_ctlr & GICD_CTLR_DS)) {
> +if (!(cs->gicr_igroupr0 & (1U << irq))) {
> +/* Fields for Group 0 or Secure Group 1 interrupts are RAZ/WI */
> +return 0;
> +}
> +/* NS view of the interrupt priority */
> +prio = (prio << 1) & 0xff;
> +}
> +return prio;
> +}
> +
> +static void gicr_write_ipriorityr(GICv3CPUState *cs, MemTxAttrs attrs, int 
> irq,
> +  uint8_t value)
> +{
> +/* Write the value of GICD_IPRIORITYR for the specified interrupt,
> + * honouring security state (these are RAZ/WI for Group 0 or Secure
> + * Group 1 interrupts).
> + */
> +if (!attrs.secure && !(cs->gic->gicd_ctlr & GICD_CTLR_DS)) {
> +if (!(cs->gicr_igroupr0 & (1U << irq))) {
> +/* Fields for Group 0 or Secure Group 1 interrupts are RAZ/WI */
> +return;
> +}
> +/* NS view of the interrupt priority */
> +value = 0x80 | (value >> 1);
> +}
> +cs->gicr_ipriorityr[irq] = value;
> +}
> +
> +static MemTxResult gicr_readb(GICv3CPUState *cs,

Re: [Qemu-devel] [PATCH v3 15/20] hw/intc/arm_gicv3: Implement GICv3 CPU interface registers

2016-06-14 Thread Shannon Zhao



On 2016/6/14 22:38, Peter Maydell wrote:
> Implement the CPU interface registers for the GICv3; these are
> CPU system registers, not MMIO registers.
> 
> This commit implements all the registers which are simple
> accessors for GIC state, but not those which act as interfaces
> for acknowledging, dismissing or generating interrupts. (Those
> will be added in a later commit.)
> 
> Signed-off-by: Peter Maydell 
Reviewed-by: Shannon Zhao 

> ---
>  hw/intc/Makefile.objs |   1 +
>  hw/intc/arm_gicv3.c   |   2 +
>  hw/intc/arm_gicv3_cpuif.c | 646 
> ++
>  hw/intc/gicv3_internal.h  |   1 +
>  trace-events  |  16 ++
>  5 files changed, 666 insertions(+)
>  create mode 100644 hw/intc/arm_gicv3_cpuif.c
> 
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index 417db11..c7bbf88 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -35,3 +35,4 @@ obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
>  obj-$(CONFIG_S390_FLIC_KVM) += s390_flic_kvm.o
>  obj-$(CONFIG_ASPEED_SOC) += aspeed_vic.o
> +obj-$(CONFIG_ARM_GIC) += arm_gicv3_cpuif.o
> diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
> index 65ebca2..8a6c647 100644
> --- a/hw/intc/arm_gicv3.c
> +++ b/hw/intc/arm_gicv3.c
> @@ -369,6 +369,8 @@ static void arm_gic_realize(DeviceState *dev, Error 
> **errp)
>  }
>  
>  gicv3_init_irqs_and_mmio(s, gicv3_set_irq, gic_ops);
> +
> +gicv3_init_cpuif(s);
>  }
>  
>  static void arm_gicv3_class_init(ObjectClass *klass, void *data)
> diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
> new file mode 100644
> index 000..e112646
> --- /dev/null
> +++ b/hw/intc/arm_gicv3_cpuif.c
> @@ -0,0 +1,646 @@
> +/*
> + * ARM Generic Interrupt Controller v3
> + *
> + * Copyright (c) 2016 Linaro Limited
> + * Written by Peter Maydell
> + *
> + * This code is licensed under the GPL, version 2 or (at your option)
> + * any later version.
> + */
> +
> +/* This file contains the code for the system register interface
> + * portions of the GICv3.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "trace.h"
> +#include "gicv3_internal.h"
> +#include "cpu.h"
> +
> +static GICv3CPUState *icc_cs_from_env(CPUARMState *env)
> +{
> +/* Given the CPU, find the right GICv3CPUState struct.
> + * Since we registered the CPU interface with the EL change hook as
> + * the opaque pointer, we can just directly get from the CPU to it.
> + */
> +return arm_get_el_change_hook_opaque(arm_env_get_cpu(env));
> +}
> +
> +static bool gicv3_use_ns_bank(CPUARMState *env)
> +{
> +/* Return true if we should use the NonSecure bank for a banked GIC
> + * CPU interface register. Note that this differs from the
> + * access_secure_reg() function because GICv3 banked registers are
> + * banked even for AArch64, unlike the other CPU system registers.
> + */
> +return !arm_is_secure_below_el3(env);
> +}
> +
> +static uint64_t icc_pmr_read(CPUARMState *env, const ARMCPRegInfo *ri)
> +{
> +GICv3CPUState *cs = icc_cs_from_env(env);
> +uint32_t value = cs->icc_pmr_el1;
> +
> +if (arm_feature(env, ARM_FEATURE_EL3) && !arm_is_secure(env) &&
> +(env->cp15.scr_el3 & SCR_FIQ)) {
> +/* NS access and Group 0 is inaccessible to NS: return the
> + * NS view of the current priority
> + */
> +if (value & 0x80) {
> +/* Secure priorities not visible to NS */
> +value = 0;
> +} else if (value != 0xff) {
> +value = (value << 1) & 0xff;
> +}
> +}
> +
> +trace_gicv3_icc_pmr_read(gicv3_redist_affid(cs), value);
> +
> +return value;
> +}
> +
> +static void icc_pmr_write(CPUARMState *env, const ARMCPRegInfo *ri,
> +  uint64_t value)
> +{
> +GICv3CPUState *cs = icc_cs_from_env(env);
> +
> +trace_gicv3_icc_pmr_write(gicv3_redist_affid(cs), value);
> +
> +value &= 0xff;
> +
> +if (arm_feature(env, ARM_FEATURE_EL3) && !arm_is_secure(env) &&
> +(env->cp15.scr_el3 & SCR_FIQ)) {
> +/* NS access and Group 0 is inaccessible to NS: return the
> + * NS view of the current priority
> + */
> +if (!(cs->icc_pmr_el1 & 0x80)) {
> +/* Current PMR in the secure range, don't allow NS to change it 
> */
> +return;
> +}
> +value = (value >> 1) & 0x80;
> +}
> +cs->icc_pmr_el1 = value;
> +gicv3_cpuif_update(cs);
> +}
> +
> +static uint64_t icc_bpr_read(CPUARMState *env, const ARMCPRegInfo *ri)
> +{
> +GICv3CPUState *cs = icc_cs_from_env(env);
> +int grp = (ri->crm == 8) ? GICV3_G0 : GICV3_G1;
> +bool satinc = false;
> +uint64_t bpr;
> +
> +if (grp == GICV3_G1 && gicv3_use_ns_bank(env)) {
> +grp = GICV3_G1NS;
> +}
> +
> +if (grp == GICV3_G1 && !arm_is_el3_or_mon(env) &&
> +

Re: [Qemu-devel] [PATCH v3 11/20] hw/intc/arm_gicv3: Implement GICv3 distributor registers

2016-06-14 Thread Shannon Zhao



On 2016/6/14 22:38, Peter Maydell wrote:
> From: Shlomo Pongratz 
> 
> Implement the distributor registers of a GICv3.
> 
> Signed-off-by: Shlomo Pongratz 
> [PMM: significantly overhauled/rewritten:
>  * use the new bitmap data structures
>  * restructure register read/write to handle different width accesses
>natively, since almost all registers are 32-bit only, rather
>than implementing everything as byte accesses
>  * implemented security extension support
> ]
> Signed-off-by: Peter Maydell 
Reviewed-by: Shannon Zhao 

> ---
>  hw/intc/Makefile.objs|   1 +
>  hw/intc/arm_gicv3_dist.c | 858 
> +++
>  hw/intc/gicv3_internal.h |   4 +
>  trace-events |   6 +
>  4 files changed, 869 insertions(+)
>  create mode 100644 hw/intc/arm_gicv3_dist.c
> 
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index 4e94fdd..a173d29 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -14,6 +14,7 @@ common-obj-$(CONFIG_ARM_GIC) += arm_gic.o
>  common-obj-$(CONFIG_ARM_GIC) += arm_gicv2m.o
>  common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_common.o
>  common-obj-$(CONFIG_ARM_GIC) += arm_gicv3.o
> +common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_dist.o
>  common-obj-$(CONFIG_OPENPIC) += openpic.o
>  
>  obj-$(CONFIG_APIC) += apic.o apic_common.o
> diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
> new file mode 100644
> index 000..881d555
> --- /dev/null
> +++ b/hw/intc/arm_gicv3_dist.c
> @@ -0,0 +1,858 @@
> +/*
> + * ARM GICv3 emulation: Distributor
> + *
> + * Copyright (c) 2015 Huawei.
> + * Copyright (c) 2016 Linaro Limited.
> + * Written by Shlomo Pongratz, Peter Maydell
> + *
> + * This code is licensed under the GPL, version 2 or (at your option)
> + * any later version.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "trace.h"
> +#include "gicv3_internal.h"
> +
> +/* The GICD_NSACR registers contain a two bit field for each interrupt which
> + * allows the guest to give NonSecure code access to registers controlling
> + * Secure interrupts:
> + *  0b00: no access (NS accesses to bits for Secure interrupts will RAZ/WI)
> + *  0b01: NS r/w accesses permitted to ISPENDR, SETSPI_NSR, SGIR
> + *  0b10: as 0b01, and also r/w to ICPENDR, r/o to ISACTIVER/ICACTIVER,
> + *and w/o to CLRSPI_NSR
> + *  0b11: as 0b10, and also r/w to IROUTER and ITARGETSR
> + *
> + * Given a (multiple-of-32) interrupt number, these mask functions return
> + * a mask word where each bit is 1 if the NSACR settings permit access
> + * to the interrupt. The mask returned can then be ORed with the GICD_GROUP
> + * word for this set of interrupts to give an overall mask.
> + */
> +
> +typedef uint32_t maskfn(GICv3State *s, int irq);
> +
> +static uint32_t mask_nsacr_ge1(GICv3State *s, int irq)
> +{
> +/* Return a mask where each bit is set if the NSACR field is >= 1 */
> +uint64_t raw_nsacr = s->gicd_nsacr[irq / 16 + 1];
> +
> +raw_nsacr = raw_nsacr << 32 | s->gicd_nsacr[irq / 16];
> +raw_nsacr = (raw_nsacr >> 1) | raw_nsacr;
> +return half_unshuffle64(raw_nsacr);
> +}
> +
> +static uint32_t mask_nsacr_ge2(GICv3State *s, int irq)
> +{
> +/* Return a mask where each bit is set if the NSACR field is >= 2 */
> +uint64_t raw_nsacr = s->gicd_nsacr[irq / 16 + 1];
> +
> +raw_nsacr = raw_nsacr << 32 | s->gicd_nsacr[irq / 16];
> +raw_nsacr = raw_nsacr >> 1;
> +return half_unshuffle64(raw_nsacr);
> +}
> +
> +/* We don't need a mask_nsacr_ge3() because IROUTER isn't a bitmap 
> register,
> + * but it would be implemented using:
> + *  raw_nsacr = (raw_nsacr >> 1) & raw_nsacr;
> + */
> +
> +static uint32_t mask_group_and_nsacr(GICv3State *s, MemTxAttrs attrs,
> + maskfn *maskfn, int irq)
> +{
> +/* Return a 32-bit mask which should be applied for this set of 32
> + * interrupts; each bit is 1 if access is permitted by the
> + * combination of attrs.secure, GICD_GROUPR and GICD_NSACR.
> + */
> +uint32_t mask;
> +
> +if (!attrs.secure && !(s->gicd_ctlr & GICD_CTLR_DS)) {
> +/* bits for Group 0 or Secure Group 1 interrupts are RAZ/WI
> + * unless the NSACR bits permit access.
> + */
> +mask = *gic_bmp_ptr32(s->group, irq);
> +if (maskfn) {
> +mask |= maskfn(s, irq);
> +}
> +return mask;
> +}
> +return 0xU;
> +}
> +
> +static int gicd_ns_access(GICv3State *s, int irq)
> +{
> +/* Return the 2 bit NS_access field from GICD_NSACR for the
> + * specified interrupt.
> + */
> +if (irq < GIC_INTERNAL || irq >= s->num_irq) {
> +return 0;
> +}
> +return extract32(s->gicd_nsacr[irq / 16], (irq % 16) * 2, 2);
> +}
> +
> +static void gicd_write_set_bitmap_reg(GICv3State *s, MemTxAttrs attrs,
> +  uint32_t

Re: [Qemu-devel] [PATCH v3 10/20] hw/intc/arm_gicv3: Implement functions to identify next pending irq

2016-06-14 Thread Shannon Zhao



On 2016/6/14 22:38, Peter Maydell wrote:
> Implement the GICv3 logic to recalculate the highest priority pending
> interrupt for each CPU after some part of the GIC state has changed.
> We avoid unnecessary full recalculation where possible.
> 
> Signed-off-by: Peter Maydell 
Reviewed-by: Shannon Zhao 

> ---
>  hw/intc/arm_gicv3.c| 293 
> +
>  hw/intc/arm_gicv3_common.c |   9 ++
>  hw/intc/gicv3_internal.h   | 121 +++
>  include/hw/intc/arm_gicv3_common.h |  18 +++
>  4 files changed, 441 insertions(+)
> 
> diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
> index 96e0d2f..171d587 100644
> --- a/hw/intc/arm_gicv3.c
> +++ b/hw/intc/arm_gicv3.c
> @@ -21,6 +21,287 @@
>  #include "hw/intc/arm_gicv3.h"
>  #include "gicv3_internal.h"
>  
> +static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio)
> +{
> +/* Return true if this IRQ at this priority should take
> + * precedence over the current recorded highest priority
> + * pending interrupt for this CPU. We also return true if
> + * the current recorded highest priority pending interrupt
> + * is the same as this one (a property which the calling code
> + * relies on).
> + */
> +if (prio < cs->hppi.prio) {
> +return true;
> +}
> +/* If multiple pending interrupts have the same priority then it is an
> + * IMPDEF choice which of them to signal to the CPU. We choose to
> + * signal the one with the lowest interrupt number.
> + */
> +if (prio == cs->hppi.prio && irq <= cs->hppi.irq) {
> +return true;
> +}
> +return false;
> +}
> +
> +static uint32_t gicd_int_pending(GICv3State *s, int irq)
> +{
> +/* Recalculate which distributor interrupts are actually pending
> + * in the group of 32 interrupts starting at irq (which should be a 
> multiple
> + * of 32), and return a 32-bit integer which has a bit set for each
> + * interrupt that is eligible to be signaled to the CPU interface.
> + *
> + * An interrupt is pending if:
> + *  + the PENDING latch is set OR it is level triggered and the input is 
> 1
> + *  + its ENABLE bit is set
> + *  + the GICD enable bit for its group is set
> + * Conveniently we can bulk-calculate this with bitwise operations.
> + */
> +uint32_t pend, grpmask;
> +uint32_t pending = *gic_bmp_ptr32(s->pending, irq);
> +uint32_t edge_trigger = *gic_bmp_ptr32(s->edge_trigger, irq);
> +uint32_t level = *gic_bmp_ptr32(s->level, irq);
> +uint32_t group = *gic_bmp_ptr32(s->group, irq);
> +uint32_t grpmod = *gic_bmp_ptr32(s->grpmod, irq);
> +uint32_t enable = *gic_bmp_ptr32(s->enabled, irq);
> +
> +pend = pending | (~edge_trigger & level);
> +pend &= enable;
> +
> +if (s->gicd_ctlr & GICD_CTLR_DS) {
> +grpmod = 0;
> +}
> +
> +grpmask = 0;
> +if (s->gicd_ctlr & GICD_CTLR_EN_GRP1NS) {
> +grpmask |= group;
> +}
> +if (s->gicd_ctlr & GICD_CTLR_EN_GRP1S) {
> +grpmask |= (~group & grpmod);
> +}
> +if (s->gicd_ctlr & GICD_CTLR_EN_GRP0) {
> +grpmask |= (~group & ~grpmod);
> +}
> +pend &= grpmask;
> +
> +return pend;
> +}
> +
> +static uint32_t gicr_int_pending(GICv3CPUState *cs)
> +{
> +/* Recalculate which redistributor interrupts are actually pending,
> + * and return a 32-bit integer which has a bit set for each interrupt
> + * that is eligible to be signaled to the CPU interface.
> + *
> + * An interrupt is pending if:
> + *  + the PENDING latch is set OR it is level triggered and the input is 
> 1
> + *  + its ENABLE bit is set
> + *  + the GICD enable bit for its group is set
> + * Conveniently we can bulk-calculate this with bitwise operations.
> + */
> +uint32_t pend, grpmask, grpmod;
> +
> +pend = cs->gicr_ipendr0 | (~cs->edge_trigger & cs->level);
> +pend &= cs->gicr_ienabler0;
> +
> +if (cs->gic->gicd_ctlr & GICD_CTLR_DS) {
> +grpmod = 0;
> +} else {
> +grpmod = cs->gicr_igrpmodr0;
> +}
> +
> +grpmask = 0;
> +if (cs->gic->gicd_ctlr & GICD_CTLR_EN_GRP1NS) {
> +grpmask |= cs->gicr_igroupr0;
> +}
> +if (cs->gic->gicd_ctlr & GICD_CTLR_EN_GRP1S) {
> +grpmask |= (~cs->gicr_igroupr0 & grpmod);
> +}
> +if (cs->gic->gicd_ctlr & GICD_CTLR_EN_GRP0) {
> +grpmask |= (~cs->gicr_igroupr0 & ~grpmod);
> +}
> +pend &= grpmask;
> +
> +return pend;
> +}
> +
> +/* Update the interrupt status after state in a redistributor
> + * or CPU interface has changed, but don't tell the CPU i/f.
> + */
> +static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
> +{
> +/* Find the highest priority pending interrupt among the
> + * redistributor interrupts (SGIs and PPIs).
> + */
> +bool seenbetter = false;
> +uint8_t prio;
> +int i;
> +

Re: [Qemu-devel] [PATCH 3/9] mirror: optimize dirty bitmap filling in mirror_run a bit

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> There is no need to scan allocation tables if we have mark_all_dirty flag
> set. Just mark it all dirty.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 797659d..c7b3639 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -507,12 +507,16 @@ static int mirror_dirty_init(MirrorBlockJob *s)
>  BlockDriverState *base = s->base;
>  BlockDriverState *bs = blk_bs(s->common.blk);
>  BlockDriverState *target_bs = blk_bs(s->target);
> -bool mark_all_dirty = base == NULL && !bdrv_has_zero_init(target_bs);
>  uint64_t last_pause_ns;
>  int ret, n;
>  
>  end = s->bdev_length / BDRV_SECTOR_SIZE;
>  
> +if (base == NULL && !bdrv_has_zero_init(target_bs)) {
> +bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);

Won't work as written.  'end' is 64 bits, but bdrv_set_dirty_bitmap() is
limited to a 32-bit sector count.  Might be first worth updating
bdrv_set_dirty_bitmap() and friends to be byte-based rather than
sector-based (but still tracking a sector per bit, obviously), as well
as expand it to operate on 64-bit sizes rather than 32-bit.

I'm also worried slightly that the existing code repeated things in a
loop, and therefore had pause points every iteration and could thus
remain responsive to an early cancel.  But doing the entire operation in
one chunk (assuming you fix bitmap code to handle a 64-bit size) may end
up running for so long without interruption that you lose the benefits
of an early interruption that you have by virtue of a 32-bit limit.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 08/20] hw/intc/arm_gicv3: Add vmstate descriptors

2016-06-14 Thread Shannon Zhao



On 2016/6/14 22:38, Peter Maydell wrote:
> From: Pavel Fedin 
> 
> Add state structure descriptors for the GICv3 state. We mark
> the KVM GICv3 device as having a migration blocker until the
> code to save and restore the state in the kernel is implemented.
> 
> Signed-off-by: Pavel Fedin 
> [PMM: Adjust to renamed struct fields; switched to using uint32_t
>  array backed bitmaps; add migration blocker setting]
> Signed-off-by: Peter Maydell 
Reviewed-by: Shannon Zhao 

> ---
>  hw/intc/arm_gicv3_common.c | 50 
> +-
>  hw/intc/arm_gicv3_kvm.c|  7 +++
>  2 files changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
> index 1557833..d1714e4 100644
> --- a/hw/intc/arm_gicv3_common.c
> +++ b/hw/intc/arm_gicv3_common.c
> @@ -49,11 +49,59 @@ static int gicv3_post_load(void *opaque, int version_id)
>  return 0;
>  }
>  
> +static const VMStateDescription vmstate_gicv3_cpu = {
> +.name = "arm_gicv3_cpu",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT32(level, GICv3CPUState),
> +VMSTATE_UINT32(gicr_ctlr, GICv3CPUState),
> +VMSTATE_UINT32_ARRAY(gicr_statusr, GICv3CPUState, 2),
> +VMSTATE_UINT32(gicr_waker, GICv3CPUState),
> +VMSTATE_UINT64(gicr_propbaser, GICv3CPUState),
> +VMSTATE_UINT64(gicr_pendbaser, GICv3CPUState),
> +VMSTATE_UINT32(gicr_igroupr0, GICv3CPUState),
> +VMSTATE_UINT32(gicr_ienabler0, GICv3CPUState),
> +VMSTATE_UINT32(gicr_ipendr0, GICv3CPUState),
> +VMSTATE_UINT32(gicr_iactiver0, GICv3CPUState),
> +VMSTATE_UINT32(edge_trigger, GICv3CPUState),
> +VMSTATE_UINT32(gicr_igrpmodr0, GICv3CPUState),
> +VMSTATE_UINT32(gicr_nsacr, GICv3CPUState),
> +VMSTATE_UINT8_ARRAY(gicr_ipriorityr, GICv3CPUState, GIC_INTERNAL),
> +VMSTATE_UINT64_ARRAY(icc_ctlr_el1, GICv3CPUState, 2),
> +VMSTATE_UINT64(icc_pmr_el1, GICv3CPUState),
> +VMSTATE_UINT64_ARRAY(icc_bpr, GICv3CPUState, 3),
> +VMSTATE_UINT64_2DARRAY(icc_apr, GICv3CPUState, 3, 4),
> +VMSTATE_UINT64_ARRAY(icc_igrpen, GICv3CPUState, 3),
> +VMSTATE_UINT64(icc_ctlr_el3, GICv3CPUState),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
>  static const VMStateDescription vmstate_gicv3 = {
>  .name = "arm_gicv3",
> -.unmigratable = 1,
> +.version_id = 1,
> +.minimum_version_id = 1,
>  .pre_save = gicv3_pre_save,
>  .post_load = gicv3_post_load,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT32(gicd_ctlr, GICv3State),
> +VMSTATE_UINT32_ARRAY(gicd_statusr, GICv3State, 2),
> +VMSTATE_UINT32_ARRAY(group, GICv3State, GICV3_BMP_SIZE),
> +VMSTATE_UINT32_ARRAY(grpmod, GICv3State, GICV3_BMP_SIZE),
> +VMSTATE_UINT32_ARRAY(enabled, GICv3State, GICV3_BMP_SIZE),
> +VMSTATE_UINT32_ARRAY(pending, GICv3State, GICV3_BMP_SIZE),
> +VMSTATE_UINT32_ARRAY(active, GICv3State, GICV3_BMP_SIZE),
> +VMSTATE_UINT32_ARRAY(level, GICv3State, GICV3_BMP_SIZE),
> +VMSTATE_UINT32_ARRAY(edge_trigger, GICv3State, GICV3_BMP_SIZE),
> +VMSTATE_UINT8_ARRAY(gicd_ipriority, GICv3State, GICV3_MAXIRQ),
> +VMSTATE_UINT64_ARRAY(gicd_irouter, GICv3State, GICV3_MAXIRQ),
> +VMSTATE_UINT32_ARRAY(gicd_nsacr, GICv3State,
> + DIV_ROUND_UP(GICV3_MAXIRQ, 16)),
> +VMSTATE_STRUCT_VARRAY_POINTER_UINT32(cpu, GICv3State, num_cpu,
> + vmstate_gicv3_cpu, 
> GICv3CPUState),
> +VMSTATE_END_OF_LIST()
> +}
>  };
>  
>  void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
> diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
> index acc1730..d08808d 100644
> --- a/hw/intc/arm_gicv3_kvm.c
> +++ b/hw/intc/arm_gicv3_kvm.c
> @@ -119,6 +119,13 @@ static void kvm_arm_gicv3_realize(DeviceState *dev, 
> Error **errp)
>  KVM_VGIC_V3_ADDR_TYPE_DIST, s->dev_fd);
>  kvm_arm_register_device(>iomem_redist, -1, KVM_DEV_ARM_VGIC_GRP_ADDR,
>  KVM_VGIC_V3_ADDR_TYPE_REDIST, s->dev_fd);
> +
> +/* Block migration of a KVM GICv3 device: the API for saving and 
> restoring
> + * the state in the kernel is not yet finalised in the kernel or
> + * implemented in QEMU.
> + */
> +error_setg(>migration_blocker, "vGICv3 migration is not implemented");
> +migrate_add_blocker(s->migration_blocker);
>  }
>  
>  static void kvm_arm_gicv3_class_init(ObjectClass *klass, void *data)
> 

-- 
Shannon

Re: [Qemu-devel] [PATCH 2/9] mirror: create mirror_dirty_init helper for mirror_run

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> The code inside the helper will be extended in the next patch. mirror_run
> itself is overbloated at the moment.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 83 
> ++
>  1 file changed, 49 insertions(+), 34 deletions(-)
> 

Looks like a nice split.
Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] backup: Fail early if cannot determine cluster size

2016-06-14 Thread Fam Zheng

On Tue, 06/14 16:49, Max Reitz wrote:
> On 18.05.2016 07:53, Fam Zheng wrote:
> > Otherwise the job is orphaned and block_job_cancel_sync in
> > bdrv_close_all() when quiting will hang.
> > 
> > A simple reproducer is running blockdev-backup from null-co:// to
> > null-co://.
> > 
> > Cc: qemu-sta...@nongnu.org
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/backup.c | 34 ++
> >  1 file changed, 18 insertions(+), 16 deletions(-)
> 
> Sorry for having waited so long (I was thinking that maybe Jeff wanted
> to take this patch), but now the patch no longer applies and I don't
> feel comfortable with just fixing it up myself; git-backport-diff tells
> me the required changes are "[0015] [FC]".

Oh it turns out this patch is superseded by

commit 91ab68837933232bcef99da7c968e6d41900419b
Author: Kevin Wolf 
Date:   Thu Apr 14 12:59:55 2016 +0200

backup: Don't leak BackupBlockJob in error path

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 

So let's just drop it.

Fam

Re: [Qemu-devel] [PATCH] backup: Fail early if cannot determine cluster size

2016-06-14 Thread Fam Zheng

On Tue, 06/14 16:49, Max Reitz wrote:
> On 18.05.2016 07:53, Fam Zheng wrote:
> > Otherwise the job is orphaned and block_job_cancel_sync in
> > bdrv_close_all() when quiting will hang.
> > 
> > A simple reproducer is running blockdev-backup from null-co:// to
> > null-co://.
> > 
> > Cc: qemu-sta...@nongnu.org
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/backup.c | 34 ++
> >  1 file changed, 18 insertions(+), 16 deletions(-)
> 
> Sorry for having waited so long (I was thinking that maybe Jeff wanted
> to take this patch), but now the patch no longer applies and I don't
> feel comfortable with just fixing it up myself; git-backport-diff tells
> me the required changes are "[0015] [FC]".

No problem! I'll rebase it and submit again.

Fam

Re: [Qemu-devel] [PATCH] tests: Rename tests/Makefile to tests/Makefile.include

2016-06-14 Thread Eric Blake

On 06/14/2016 06:33 PM, Fam Zheng wrote:
> On Tue, 06/14 16:05, Eric Blake wrote:
>> On 05/31/2016 08:23 PM, Fam Zheng wrote:
>>> The file is only included from the top Makefile. Rename it to reflect
>>> this more obviously.
>>>
>>> Signed-off-by: Fam Zheng 
>>> ---
>>>  Makefile | 2 +-
>>>  tests/{Makefile => Makefile.include} | 0
>>>  2 files changed, 1 insertion(+), 1 deletion(-)
>>>  rename tests/{Makefile => Makefile.include} (100%)
>>
>> Reviewed-by: Eric Blake 
> 
> This one is already applied in master.
> 

Oh, shows how much I rely on tab-completion.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH RFC 10/16] hw/ppc/spapr: don't use smp_cores, smp_threads

2016-06-14 Thread David Gibson

On Tue, Jun 14, 2016 at 08:23:08AM +0200, Andrew Jones wrote:
> On Tue, Jun 14, 2016 at 01:03:41PM +1000, David Gibson wrote:
> > On Fri, Jun 10, 2016 at 07:40:21PM +0200, Andrew Jones wrote:
> > > Use CPUState nr_cores,nr_threads and MachineState
> > > cores,threads instead.
> > > 
> > > Signed-off-by: Andrew Jones 
> > > ---
> > >  hw/ppc/spapr.c  | 9 +
> > >  hw/ppc/spapr_rtas.c | 2 +-
> > >  2 files changed, 6 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 063664234106e..f78276bb4b164 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -35,7 +35,6 @@
> > >  #include "net/net.h"
> > >  #include "sysemu/device_tree.h"
> > >  #include "sysemu/block-backend.h"
> > > -#include "sysemu/cpus.h"
> > >  #include "sysemu/kvm.h"
> > >  #include "sysemu/device_tree.h"
> > >  #include "kvm_ppc.h"
> > > @@ -603,7 +602,7 @@ static void spapr_populate_cpu_dt(CPUState *cs, void 
> > > *fdt, int offset,
> > >  uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 
> > > 10;
> > >  uint32_t page_sizes_prop[64];
> > >  size_t page_sizes_prop_size;
> > > -uint32_t vcpus_per_socket = smp_threads * smp_cores;
> > > +uint32_t vcpus_per_socket = cs->nr_cores * cs->nr_threads;

This function has an 'spapr' parameter which is a subcloss of
MachineState, so you can get to the machine value from there.

> > >  uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
> > >  
> > >  /* Note: we keep CI large pages off for now because a 64K capable 
> > > guest
> > > @@ -1774,7 +1773,7 @@ static void ppc_spapr_init(MachineState *machine)
> > >  /* Set up Interrupt Controller before we create the VCPUs */
> > >  spapr->icp = xics_system_init(machine,
> > >DIV_ROUND_UP(max_cpus * 
> > > kvmppc_smt_threads(),
> > > -   smp_threads),
> > > +   machine->threads),
> > >XICS_IRQS, _fatal);
> > >  
> > >  if (smc->dr_lmb_enabled) {
> > > @@ -2268,9 +2267,11 @@ static HotplugHandler 
> > > *spapr_get_hotpug_handler(MachineState *machine,
> > >  
> > >  static unsigned spapr_cpu_index_to_socket_id(unsigned cpu_index)
> > >  {
> > > +CPUState *cs = first_cpu;
> > > +
> > >  /* Allocate to NUMA nodes on a "socket" basis (not that concept of
> > >   * socket means much for the paravirtualized PAPR platform) */
> > > -return cpu_index / smp_threads / smp_cores;
> > > +return cpu_index / cs->nr_cores / cs->nr_threads;

Here you can use qdev_get_machine().  That's no more an ugly use of a
global than using first_cpu.

> > >  }
> > >  
> > >  static void spapr_machine_class_init(ObjectClass *oc, void *data)
> > > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > > index 43e2c684fda8d..3fdfbb01a20dd 100644
> > > --- a/hw/ppc/spapr_rtas.c
> > > +++ b/hw/ppc/spapr_rtas.c
> > > @@ -742,7 +742,7 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr 
> > > rtas_addr,
> > >  lrdr_capacity[1] = cpu_to_be32(max_hotplug_addr & 0x);
> > >  lrdr_capacity[2] = 0;
> > >  lrdr_capacity[3] = cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE);
> > > -lrdr_capacity[4] = cpu_to_be32(max_cpus/smp_threads);
> > > +lrdr_capacity[4] = cpu_to_be32(max_cpus / machine->threads);
> > >  ret = qemu_fdt_setprop(fdt, "/rtas", "ibm,lrdr-capacity", 
> > > lrdr_capacity,
> > >   sizeof(lrdr_capacity));
> > >  if (ret < 0) {
> > 
> > I think all the places that use cs->nr_* here it actually makes more
> > sense to use the value in the machine state.
> 
> I think I used machine state whenever I (easily) could. How do I get to
> machine state from a CPU method? I will if I can, for all machines,
> and then gladly kill the CPUState->nr_cores/nr_threads.

I can't speak to all machines, but notes above on how it can be done
for spapr.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH RFC 05/16] hw/core/machine: add smp properites

2016-06-14 Thread David Gibson

On Tue, Jun 14, 2016 at 08:08:07AM +0200, Andrew Jones wrote:
> On Tue, Jun 14, 2016 at 12:00:26PM +1000, David Gibson wrote:
> > On Fri, Jun 10, 2016 at 07:40:16PM +0200, Andrew Jones wrote:
> > > Signed-off-by: Andrew Jones 
> > > ---
> > >  hw/core/machine.c   | 81 
> > > +
> > >  include/hw/boards.h |  6 
> > >  2 files changed, 87 insertions(+)
> > > 
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index 3dce9020e510a..2625044002e57 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -172,6 +172,53 @@ static void machine_set_dumpdtb(Object *obj, const 
> > > char *value, Error **errp)
> > >  ms->dumpdtb = g_strdup(value);
> > >  }
> > >  
> > > +static void machine_get_smp(Object *obj, Visitor *v, const char *name,
> > > +void *opaque, Error **errp)
> > > +{
> > > +MachineState *ms = MACHINE(obj);
> > > +int64_t value;
> > > +
> > > +if (strncmp(name, "sockets", 7) == 0) {
> > > +value = ms->sockets;
> > > +} else if (strncmp(name, "cores", 5) == 0) {
> > > +value = ms->cores;
> > > +} else if (strncmp(name, "threads", 7) == 0) {
> > > +value = ms->threads;
> > > +} else if (strncmp(name, "maxcpus", 7) == 0) {
> > > +value = ms->maxcpus;
> > > +} else if (strncmp(name, "cpus", 4) == 0) {
> > > +value = ms->cpus;
> > > +}
> > > +
> > > +visit_type_int(v, name, , errp);
> > > +}
> > 
> > Any particular for multiplexing all the set / get, rather than having
> > separate callbacks for each property?
> 
> Not really. This way just makes denser code. But I'll go whichever
> direction people prefer.

I'd prefer not to have the multiplexer, but I don't care that much.

> Actually I should probably add an
> else { error_report(...) } in either case, which means the multifunction
> direction would still contain strncmps.

Hrm.. that would seem an odd choice to me if you didn't have the
multiplex.  Not including the strncmp() means you can change the
property name (or add aliases) in a single place, without changing the
callback functions.

Note also that the current set of strncmp()s is only correct because
there is a limited set of things bound to this callback.  Remember
that strncmp(name, "sockets", 7) will match "socketsfoo".

> > > +
> > > +static void machine_set_smp(Object *obj, Visitor *v, const char *name,
> > > +void *opaque, Error **errp)
> > > +{
> > > +MachineState *ms = MACHINE(obj);
> > > +Error *error = NULL;
> > > +int64_t value;
> > > +
> > > +visit_type_int(v, name, , );
> > > +if (error) {
> > > +error_propagate(errp, error);
> > > +return;
> > > +}
> > > +
> > > +if (strncmp(name, "sockets", 7) == 0) {
> > > +ms->sockets = value;
> > > +} else if (strncmp(name, "cores", 5) == 0) {
> > > +ms->cores = value;;
> > > +} else if (strncmp(name, "threads", 7) == 0) {
> > > +ms->threads = value;
> > > +} else if (strncmp(name, "maxcpus", 7) == 0) {
> > > +ms->maxcpus = value;
> > > +} else if (strncmp(name, "cpus", 4) == 0) {
> > > +ms->cpus = value;
> > > +}
> > > +}
> > > +
> > >  static void machine_get_phandle_start(Object *obj, Visitor *v,
> > >const char *name, void *opaque,
> > >Error **errp)
> > > @@ -368,8 +415,18 @@ static void machine_init_notify(Notifier *notifier, 
> > > void *data)
> > >  foreach_dynamic_sysbus_device(error_on_sysbus_device, NULL);
> > >  }
> > >  
> > > +static void machine_set_smp_parameters(MachineState *ms)
> > > +{
> > > +if (ms->sockets != -1 || ms->cores != -1 || ms->threads != -1 ||
> > > +ms->maxcpus != -1 || ms->cpus != -1) {
> > > +error_report("warning: cpu topology: "
> > > + "machine properties currently ignored");
> > > +}
> > > +}
> > > +
> > >  static void machine_pre_init(MachineState *ms)
> > >  {
> > > +machine_set_smp_parameters(ms);
> > >  }
> > >  
> > >  static void machine_class_init(ObjectClass *oc, void *data)
> > > @@ -403,6 +460,11 @@ static void machine_initfn(Object *obj)
> > >  ms->dump_guest_core = true;
> > >  ms->mem_merge = true;
> > >  ms->enable_graphics = true;
> > > +ms->sockets = -1;
> > > +ms->cores = -1;
> > > +ms->threads = -1;
> > > +ms->maxcpus = -1;
> > > +ms->cpus = -1;
> > >  
> > >  object_property_add_str(obj, "accel",
> > >  machine_get_accel, machine_set_accel, NULL);
> > > @@ -462,6 +524,25 @@ static void machine_initfn(Object *obj)
> > >  object_property_set_description(obj, "dt-compatible",
> > >  "Overrides the \"compatible\" 
> > > property of the dt root node",
> > >  NULL);
> > > +

Re: [Qemu-devel] [PATCH RFC 07/16] qom/cpu: make nr-cores, nr-threads real properties

2016-06-14 Thread David Gibson

On Tue, Jun 14, 2016 at 08:19:49AM +0200, Andrew Jones wrote:
> On Tue, Jun 14, 2016 at 12:12:16PM +1000, David Gibson wrote:
> > On Sun, Jun 12, 2016 at 03:48:10PM +0200, Andrew Jones wrote:
> > > On Sat, Jun 11, 2016 at 08:54:35AM +0200, Thomas Huth wrote:
> > > > On 10.06.2016 19:40, Andrew Jones wrote:
> > > > > Signed-off-by: Andrew Jones 
> > > > > ---
> > > > >  qom/cpu.c | 8 
> > > > >  1 file changed, 8 insertions(+)
> > > > > 
> > > > > diff --git a/qom/cpu.c b/qom/cpu.c
> > > > > index 751e992de8823..024cda3eb98c8 100644
> > > > > --- a/qom/cpu.c
> > > > > +++ b/qom/cpu.c
> > > > > @@ -28,6 +28,7 @@
> > > > >  #include "exec/log.h"
> > > > >  #include "qemu/error-report.h"
> > > > >  #include "sysemu/sysemu.h"
> > > > > +#include "hw/qdev-properties.h"
> > > > >  
> > > > >  bool cpu_exists(int64_t id)
> > > > >  {
> > > > > @@ -342,6 +343,12 @@ static int64_t cpu_common_get_arch_id(CPUState 
> > > > > *cpu)
> > > > >  return cpu->cpu_index;
> > > > >  }
> > > > >  
> > > > > +static Property cpu_common_properties[] = {
> > > > > +DEFINE_PROP_INT32("nr-cores", CPUState, nr_cores, 1),
> > > > > +DEFINE_PROP_INT32("nr-threads", CPUState, nr_threads, 1),
> > > > > +DEFINE_PROP_END_OF_LIST()
> > > > > +};
> > > > 
> > > > Are you aware of the current CPU hotplug discussion that is going on?
> > > 
> > > I'm aware of it going on, but haven't been following it.
> > > 
> > > > I'm not very involved there, but I think some of these reworks also move
> > > > "nr_threads" into the CPU state already, e.g. see:
> > > 
> > > nr_threads (and nr_cores) are already state in CPUState. This patch just
> > > exposes that state via properties.
> > > 
> > > > 
> > > > https://github.com/dgibson/qemu/commit/9d07719784ecbeebea71
> > > > 
> > > > ... so you might want to check these patches first to see whether you
> > > > can base your rework on them?
> > > 
> > > Every cpu, and thus every machine, uses CPUState for its cpus. I'm
> > > not sure every machine will want to use that new abstract core class
> > > though. If they did, then we could indeed use nr_threads from there
> > > instead (and remove it from CPUState), but we'd still need nr_cores
> > > from the abstract cpu package class (CPUState).
> > 
> > Hmm.  Since the CPUState object represents just a single thread, it
> > seems weird to me that it would have nr_threads and nr_cores
> > information.
> > 
> > Exposing those as properties makes that much worse, because it's now
> > ABI, rather than internal detail we can clean up at some future time.
> 
> CPUState is supposed to be "State of one CPU core or thread", which
> justifies having nr_threads state, as it may be describing a core.

Um.. does it ever actually represent a (multithread) core in practice?
It would need to have duplicated register state for every thread were
that the case.

> I guess there's no justification for having nr_cores in there though.
> I agree adding the Core class is a good idea, assuming it will get used
> by all machines, and CPUState then gets changed to a Thread class. The
> question then, though, is do we also create a Socket class that contains
> nr_cores?

That was roughly our intention with the way the cross platform hotplug
stuff is evolving.  But the intention was that the Socket objects
would only need to be constructed for machine types where it makes
sense.  So for example on the paravirt pseries platform, we'll only
have Core objects, because the socket distinction isn't really
meaningful.

> And how will a Thread method get that information when it
> needs to emulate, e.g. CPUID, that requires it? It's a bit messy, so
> I'm open to all suggestions on it.

So, if the Thread needs this information, I'm not opposed to it having
it internally (presumably populated earlier from the Core object).
But I am opposed to it being a locked in part of the interface by
having it as an exposed property.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH RFC 06/16] vl: move smp parsing to machine pre_init

2016-06-14 Thread David Gibson

On Tue, Jun 14, 2016 at 04:03:29PM +0200, Andrew Jones wrote:
> On Tue, Jun 14, 2016 at 01:53:05PM +0200, Paolo Bonzini wrote:
> > 
> > 
> > On 14/06/2016 13:39, Andrew Jones wrote:
> > > On Tue, Jun 14, 2016 at 10:17:49AM +0200, Paolo Bonzini wrote:
> > >> On 13/06/2016 22:35, Andrew Jones wrote:
> > >>> On Mon, Jun 13, 2016 at 07:04:01PM +0200, Paolo Bonzini wrote:
> >  On 10/06/2016 19:40, Andrew Jones wrote:
> > >> They should just not specify it and get a default of 1. ;)
> > > 
> > > Yeah, threads, the only one we should never calculate, could be
> > > optional. If not specified, defaulting to 1 makes perfect sense.
> > > But, threads=0, which is weird, but in a way specifying that it's
> > > not specified, also makes some sense.
> > 
> > If it's weird, let's make it invalid. :)
> 
> I'm weird, and starting to like it :-)
> 
> > 
> >  - cpus % (cores * threads) != 0
> > >>>
> > >>> Hmm. This makes sense where cpus is the number of cpu packages,
> > >>
> > >> I'm not sure I understand what you mean here.  The point is that the
> > >> machine starts with an integral number of sockets.
> > > 
> > > OK, s/cpus/maxcpus/ then. By using the currently online number, I
> > > thought you were starting to prepare for cpu packages, which are
> > > indivisible sets of cores and threads.
> > 
> > Yes, that's what I meant to do.  Isn't cpus what you call
> > "total-online-threads" below?
> 
> It is. I was thinking it may need to be redefined if we wanted to
> hotplug these "indivisible sets of cores and threads" (sockets),
> instead of what we do today, which is to hotplug one "cpu" at a time.
> However, I just chatted with Igor, and he says cpu hotplug operates
> on logical processors, and thus it's fine to talk about hotplugging
> threads. Even real hardware does this. Real hardware will plug a
> socket full of threads, but then the firmware may keep most of them
> disabled until they're "hotplugged". So, I think the value of cpus
> can be anything. Even the useless value of zero.

Uh.. this depends on the platform (machine type).  ACPI allows
(logical) hotplugging of individual threads.  However on pseries the
hotplug mechanism works at core granularity only.  I don't know of
real examples off hand, but it's easy to imagine a platform interface
which worked at socket, or even a multi-socket-module level

In other words the "cpu package" doesn't necessarily lie at a fixed
level of the thread/core/socket heirarchy.

This is something we're grappling with in trying to implement
cross-platform cpu hotplug, and we obviously don't want to make it
worse with these -smp changes.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH RFC 06/16] vl: move smp parsing to machine pre_init

2016-06-14 Thread David Gibson

On Tue, Jun 14, 2016 at 10:17:49AM +0200, Paolo Bonzini wrote:
> 
> 
> On 13/06/2016 22:35, Andrew Jones wrote:
> > On Mon, Jun 13, 2016 at 07:04:01PM +0200, Paolo Bonzini wrote:
> >> On 10/06/2016 19:40, Andrew Jones wrote:
> >>> +if (sockets == -1 || cores == -1 || threads == -1 ||
> >>> +maxcpus == -1 || cpus == -1) {
> >>> +error_report("cpu topology: "
> >>> + "all machine properties must be specified");
> >>> +exit(1);
> >>> +}
> >>> +
> >>
> >> I think it's sane to accept some defaults.  It must not be the DWIM
> >> thing that -smp does (which is targeted to Windows's dislike of
> >> multi-socket machine on consumer hardware).  It must be something that
> >> makes sense, and my proposal is:
> >>
> >> - threads: 1
> >> - cores: 1
> >> - sockets:
> >>   - maxcpus / (cores * threads) if maxcpus given
> >>   - cpus / (cores * threads) if cpus given
> >>   - else 1
> >> - maxcpus: cores * threads * sockets
> >> - cpus: maxcpus
> > 
> > I think some machines may prefer
> > 
> > - threads: 1
> > - sockets: 1
> > - cores:
> >   - maxcpus / (sockets * threads) if maxcpus given
> >   - cpus / (sockets * threads) if cpus given
> >   - else 1
> 
> smp_cores is only used by pseries and x86 machines.  I expect machines
> that must be single-socket to disregard smp_sockets altogether.

Note that on pseries (as a purely paravirt platform), the distinction
between cores and sockets is basically meaningless - there is no
important difference between a threads=4,cores=4,sockets=1 machine and
a threads=4,cores=1,sockets=4 machine.s

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH] tests: Rename tests/Makefile to tests/Makefile.include

2016-06-14 Thread Fam Zheng

On Tue, 06/14 16:05, Eric Blake wrote:
> On 05/31/2016 08:23 PM, Fam Zheng wrote:
> > The file is only included from the top Makefile. Rename it to reflect
> > this more obviously.
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  Makefile | 2 +-
> >  tests/{Makefile => Makefile.include} | 0
> >  2 files changed, 1 insertion(+), 1 deletion(-)
> >  rename tests/{Makefile => Makefile.include} (100%)
> 
> Reviewed-by: Eric Blake 

This one is already applied in master.

Fam

Re: [Qemu-devel] [PATCH v2] Use "-s" instead of "--quiet" to resolve non-fatal build error on FreeBSD.

2016-06-14 Thread Fam Zheng

On Tue, 06/14 11:07, Sean Bruno wrote:
> The --quiet argument is not available on all operating systems.  Use -s
> instead to match the rest of the Makefile uses.  This fixes a non-fatal
> error seen on FreeBSD.
> 
> Signed-off-by: Sean Bruno 
> ---
>  Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Makefile b/Makefile
> index ed4032a..a7a356a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -185,7 +185,7 @@ qemu-version.h: FORCE
>   printf '""\n'; \
>   fi; \
>   fi) > $@.tmp)
> - $(call quiet-command, cmp --quiet $@ $@.tmp || mv $@.tmp $@)
> + $(call quiet-command, cmp -s $@ $@.tmp || mv $@.tmp $@)
>  
>  config-host.h: config-host.h-timestamp
>  config-host.h-timestamp: config-host.mak
> -- 
> 2.8.4
> 
> 

Reviewed-by: Fam Zheng

Re: [Qemu-devel] [PATCH v3 08/14] block/nbd: Accept SocketAddress

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> Add a new option "address" to the NBD block driver which accepts a
> SocketAddress.
> 
> "path", "host" and "port" are still supported as legacy options and are
> mapped to their corresponding SocketAddress representation.

The back-compat work is pretty slick.

> 
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c   | 97 
> ++-
>  tests/qemu-iotests/051.out|  4 +-
>  tests/qemu-iotests/051.pc.out |  4 +-
>  3 files changed, 64 insertions(+), 41 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 3adf302..9ae66ba 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -32,6 +32,8 @@
>  #include "qemu/uri.h"
>  #include "block/block_int.h"
>  #include "qemu/module.h"
> +#include "qapi-visit.h"
> +#include "qapi/qmp-input-visitor.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qjson.h"
>  #include "qapi/qmp/qint.h"
> @@ -128,7 +130,9 @@ static bool nbd_has_filename_options_conflict(QDict 
> *options, Error **errp)
>  if (!strcmp(e->key, "host")
>  || !strcmp(e->key, "port")
>  || !strcmp(e->key, "path")
> -|| !strcmp(e->key, "export"))
> +|| !strcmp(e->key, "export")
> +|| !strcmp(e->key, "address")
> +|| !strncmp(e->key, "address.", 8))

May need a tweak if you break before '||'

>  {
>  error_setg(errp, "Option '%s' cannot be used with a file name",
> e->key);
> @@ -202,46 +206,66 @@ out:
>  g_free(file);
>  }
>  
> +static bool nbd_process_legacy_socket_options(QDict *options, Error **errp)
> +{
> +if (qdict_haskey(options, "path") && qdict_haskey(options, "host")) {
> +error_setg(errp, "path and host may not be used at the same time");
> +return false;
> +} else if (qdict_haskey(options, "path")) {
> +if (qdict_haskey(options, "port")) {
> +error_setg(errp, "port may not be used without host");
> +return false;
> +}
> +
> +qdict_put(options, "address.type", qstring_from_str("unix"));
> +qdict_change_key(options, "path", "address.data.path");
> +} else if (qdict_haskey(options, "host")) {
> +qdict_put(options, "address.type", qstring_from_str("inet"));
> +qdict_change_key(options, "host", "address.data.host");
> +if (!qdict_change_key(options, "port", "address.data.port")) {
> +qdict_put(options, "address.data.port",
> +  qstring_from_str(stringify(NBD_DEFAULT_PORT)));
> +}

The rewrite from old to modern is rather nice.  I wonder if we could
then use a generated qapi_visit_SocketAddress instead of manually
handling all the complication of SocketAddress proper.

> +}
> +
> +return true;
> +}
> +
>  static SocketAddress *nbd_config(BDRVNBDState *s, QDict *options, char 
> **export,
>   Error **errp)
>  {
> -SocketAddress *saddr;
> +SocketAddress *saddr = NULL;
> +QDict *addr = NULL;
> +QObject *crumpled_addr;
> +QmpInputVisitor *iv;
> +Error *local_err = NULL;
>  
> -if (qdict_haskey(options, "path") == qdict_haskey(options, "host")) {
> -if (qdict_haskey(options, "path")) {
> -error_setg(errp, "path and host may not be used at the same 
> time");
> -} else {
> -error_setg(errp, "one of path and host must be specified");
> -}
> -return NULL;
> +if (!nbd_process_legacy_socket_options(options, errp)) {
> +goto fail;
>  }
> -if (qdict_haskey(options, "port") && !qdict_haskey(options, "host")) {
> -error_setg(errp, "port may not be used without host");
> -return NULL;
> +
> +qdict_extract_subqdict(options, , "address.");
> +if (!qdict_size(addr)) {
> +error_setg(errp, "NBD server address missing");
> +goto fail;
>  }
>  
> -saddr = g_new0(SocketAddress, 1);
> +crumpled_addr = qdict_crumple(addr, true, errp);
> +if (!crumpled_addr) {
> +goto fail;
> +}
>  

Ah, so you ARE depending on Dan's qdict_crumple().

> -if (qdict_haskey(options, "path")) {
> -UnixSocketAddress *q_unix;
> -saddr->type = SOCKET_ADDRESS_KIND_UNIX;
> -q_unix = saddr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
> -q_unix->path = g_strdup(qdict_get_str(options, "path"));
> -qdict_del(options, "path");
> -} else {
> -InetSocketAddress *inet;
> -saddr->type = SOCKET_ADDRESS_KIND_INET;
> -inet = saddr->u.inet.data = g_new0(InetSocketAddress, 1);
> -inet->host = g_strdup(qdict_get_str(options, "host"));
> -if (!qdict_get_try_str(options, "port")) {
> -inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT);
> -} else {
> -inet->port = g_strdup(qdict_get_str(options, "port"));
> -}
> -

Re: [Qemu-devel] [PATCH v3 07/14] block/nbd: "address" in nbd_refresh_filename()

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> As of a future patch, the NBD block driver will accept a SocketAddress
> structure for a new "address" option. In order to support this,
> nbd_refresh_filename() needs some changes.
> 
> The two TODOs introduced by this patch will be removed in the very next
> one. They exist to explain that it is currently impossible for
> nbd_refresh_filename() to emit an "address.*" option (which the NBD
> block driver does not handle yet). The next patch will arm these code
> paths, but it will also enable handling of these options.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c | 84 
> -
>  1 file changed, 61 insertions(+), 23 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 1736f68..3adf302 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -441,37 +441,75 @@ static void nbd_attach_aio_context(BlockDriverState *bs,
>  static void nbd_refresh_filename(BlockDriverState *bs, QDict *options)
>  {
>  QDict *opts = qdict_new();
> -const char *path   = qdict_get_try_str(options, "path");
> -const char *host   = qdict_get_try_str(options, "host");
> -const char *port   = qdict_get_try_str(options, "port");
> +bool can_generate_filename = true;
> +const char *path = NULL, *host = NULL, *port = NULL;
>  const char *export = qdict_get_try_str(options, "export");
>  const char *tlscreds = qdict_get_try_str(options, "tls-creds");
>  
> -if (host && !port) {
> -port = stringify(NBD_DEFAULT_PORT);
> +if (qdict_get_try_str(options, "address.type")) {
> +/* This path will only be possible as of a future patch;
> + * TODO: Remove this note once it is */
> +
> +const char *type = qdict_get_str(options, "address.type");
> +

Oh, I'm so tempted to teach the QAPI generator how to make a
discriminated union have a default 'type' value (thus making the
discriminator optional), so that we don't need a layer of nesting behind
'address.'.

> +if (!strcmp(type, "unix")) {
> +path = qdict_get_str(options, "address.data.path");
> +} else if (!strcmp(type, "inet")) {
> +host = qdict_get_str(options, "address.data.host");
> +port = qdict_get_str(options, "address.data.port");

It's especially annoying that because SocketAddress is not flat, we have
to expose the 'data.' layer of nesting, even if we could avoid the
'address.' layer.

> +
> +can_generate_filename = !qdict_haskey(options, "address.data.to")
> + && !qdict_haskey(options, 
> "address.data.ipv4")
> + && !qdict_haskey(options, 
> "address.data.ipv6");
> +} else {
> +can_generate_filename = false;
> +}
> +} else {
> +path = qdict_get_try_str(options, "path");
> +host = qdict_get_try_str(options, "host");
> +port = qdict_get_try_str(options, "port");
> +
> +if (host && !port) {
> +port = stringify(NBD_DEFAULT_PORT);
> +}
>  }

Looks clean given the constraints of what you are able to use from QAPI.

> +
> +if (qdict_get_try_str(options, "address.type")) {
> +/* This path will only be possible as of a future patch;
> + * TODO: Remove this note once it is */
> +
> +const QDictEntry *e;
> +for (e = qdict_first(options); e; e = qdict_next(options, e)) {
> +if (!strncmp(e->key, "address.", 8)) {
> +qobject_incref(e->value);
> +qdict_put_obj(opts, e->key, e->value);
> +}
> +}

This part makes me wonder if we want Dan's qdict_crumple() working first.

>  } else {
> -qdict_put(opts, "host", qstring_from_str(host));
> -qdict_put(opts, "port", qstring_from_str(port));
> +if (path) {
> +qdict_put(opts, "path", qstring_from_str(path));
> +} else {
> +qdict_put(opts, "host", qstring_from_str(host));
> +qdict_put(opts, "port", qstring_from_str(port));
> +}
>  }
>  if (export) {
>  qdict_put(opts, "export", qstring_from_str(export));
> 

At this point, I'll reserve giving R-b until I've seen the whole series
(it may need rebasing anyways...)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 06/14] block/nbd: Add nbd_has_filename_options_conflict()

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> Right now, we have four possible options that conflict with specifying
> an NBD filename, and a future patch will add another one ("address").
> This future option is a nested QDict that is flattened at this point,
> requiring as to test each option whether its key has an "address."
> prefix. Therefore, we will then need to iterate through all options.
> 
> Adding this iteration logic now will simplify adding the new option
> later. A nice side effect is that the user will not receive a long list
> of five options which are not supposed to be specified with a filename,
> but we can actually print the problematic option.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c | 26 --
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index d12bcc6..1736f68 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -120,6 +120,25 @@ out:
>  return ret;
>  }
>  
> +static bool nbd_has_filename_options_conflict(QDict *options, Error **errp)
> +{
> +const QDictEntry *e;
> +
> +for (e = qdict_first(options); e; e = qdict_next(options, e)) {
> +if (!strcmp(e->key, "host")
> +|| !strcmp(e->key, "port")

I know there are already instances of breaking before || in this file,
but most of qemu breaks after, as in:

if (!strcmp(e->key, "host") ||
!strcmp(e->key, "port") ||
...


But choice of formatting is trivial, so:
Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 1/9] mirror: fix calling of blk_aio_pwritev/blk_aio_preadv

2016-06-14 Thread Eric Blake

On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> 4th argument is flags rather than size. Fortunately flags occupies
> 5 less significant bits and they are always zero due to alignment.
> 
> Signed-off-by: Denis V. Lunev 
> Reviewed-by: Vladimir Sementsov-Ogievskiy
> CC: Stefan Hajnoczi 
> CC: Fam Zheng 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Jeff Cody 
> CC: Eric Blake 
> ---
>  block/mirror.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)

Duplicate of this patch, already on Kevin's block tree:
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03377.html

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 for-2.7 00/14] qapi: Allow blockdev-add for NBD

2016-06-14 Thread Eric Blake

On 05/03/2016 09:23 AM, Kevin Wolf wrote:
> Am 06.04.2016 um 20:28 hat Max Reitz geschrieben:
>> Turns out NBD is not so simple to do if you do it right. Anyway, this
>> series adds blockdev-add support for NBD clients.
> 
> What the series does seems to make sense to me, though things would be a
> bit nicer (especially on the command line) if SocketAddress had been a
> flat union from the beginning.  It's too late to change now, and I guess
> the ugliness isn't worth duplicating it.

I'm sorely tempted to make SocketAddress a flat union anyways (or even
an alternate that can be flat or nested at will), if only for
convenience.  But that's a pipe dream - there's probably not enough time
before 2.7 to actually achieve it.

> 
> I'll leave the in-detail review to our NBD folks.
> 
> Kevin
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 05/14] block/nbd: Use qdict_put()

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> Instead of inlining this nice macro (i.e. resorting to
> qdict_put_obj(..., QOBJECT(...))), use it.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 04/14] block/nbd: Default port in nbd_refresh_filename()

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> Instead of not emitting the port in nbd_refresh_filename(), just set it
> to the default if the user did not specify it. This makes the logic a
> bit simpler.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c | 18 +++---
>  1 file changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 2112ec0..efa5d3d 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -433,6 +433,10 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
> QDict *options)
>  const char *export = qdict_get_try_str(options, "export");
>  const char *tlscreds = qdict_get_try_str(options, "tls-creds");
>  
> +if (host && !port) {
> +port = stringify(NBD_DEFAULT_PORT);
> +}

It would be nice to store the port as a number rather than a string -
but that's not your task (I've long thought that port should be a QAPI
alternate type, with both string and number branches, but it's a big
audit and code change to actually make it happen).

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [Bug 1592590] [NEW] Prevent qemu-img resize from causing "Active L1 table too large"

2016-06-14 Thread Nowaker

Public bug reported:

This commit prevents qemu from overallocating if qcow2 image is too big
(whatever that means): https://lists.gnu.org/archive/html/qemu-
devel/2014-07/msg01481.html

However, `qemu-img resize` isn't protected by the same code and allows
to go beyond that.

root@nwkr-laptop ~virtkick/hdd # qemu-img resize 
33_test_609dffde-eb51-4b75-918d-b814f1bcb526.qcow2 +10T
Image resized.

Which then causes "Active L1 table too large" error that cannot be
reversed.

root@nwkr-laptop ~virtkick/hdd # qemu-img info 
33_test_609dffde-eb51-4b75-918d-b814f1bcb526.qcow2
qemu-img: Could not open '33_test_609dffde-eb51-4b75-918d-b814f1bcb526.qcow2': 
Active L1 table too large

root@nwkr-laptop ~virtkick/hdd # qemu-img resize 
33_test_609dffde-eb51-4b75-918d-b814f1bcb526.qcow2 -10T
qemu-img: Could not open '33_test_609dffde-eb51-4b75-918d-b814f1bcb526.qcow2': 
Active L1 table too large


I originally faces this bug when I passed wrong parameters to qemu-img in a 
programatic way which caused an image to go corrupt. It's good to protect 
user's images from being resized too much.

** Affects: qemu
 Importance: Undecided
 Status: New


** Tags: qemu-img

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1592590

Title:
  Prevent qemu-img resize from causing "Active L1 table too large"

Status in QEMU:
  New

Bug description:
  This commit prevents qemu from overallocating if qcow2 image is too
  big (whatever that means): https://lists.gnu.org/archive/html/qemu-
  devel/2014-07/msg01481.html

  However, `qemu-img resize` isn't protected by the same code and allows
  to go beyond that.

  root@nwkr-laptop ~virtkick/hdd # qemu-img resize 
33_test_609dffde-eb51-4b75-918d-b814f1bcb526.qcow2 +10T
  Image resized.

  Which then causes "Active L1 table too large" error that cannot be
  reversed.

  root@nwkr-laptop ~virtkick/hdd # qemu-img info 
33_test_609dffde-eb51-4b75-918d-b814f1bcb526.qcow2
  qemu-img: Could not open 
'33_test_609dffde-eb51-4b75-918d-b814f1bcb526.qcow2': Active L1 table too large

  root@nwkr-laptop ~virtkick/hdd # qemu-img resize 
33_test_609dffde-eb51-4b75-918d-b814f1bcb526.qcow2 -10T
  qemu-img: Could not open 
'33_test_609dffde-eb51-4b75-918d-b814f1bcb526.qcow2': Active L1 table too large

  
  I originally faces this bug when I passed wrong parameters to qemu-img in a 
programatic way which caused an image to go corrupt. It's good to protect 
user's images from being resized too much.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1592590/+subscriptions

Re: [Qemu-devel] [PATCH v2] Use "-s" instead of "--quiet" to resolve non-fatal build error on FreeBSD.

2016-06-14 Thread Eric Blake

On 06/14/2016 12:07 PM, Sean Bruno wrote:
> The --quiet argument is not available on all operating systems.  Use -s
> instead to match the rest of the Makefile uses.  This fixes a non-fatal
> error seen on FreeBSD.
> 
> Signed-off-by: Sean Bruno 
> ---
>  Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake 

See also my pending patch related to this code:
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02375.html

if Paolo wants to pull in all the changes related to version generation
in his next pull request.

> 
> diff --git a/Makefile b/Makefile
> index ed4032a..a7a356a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -185,7 +185,7 @@ qemu-version.h: FORCE
>   printf '""\n'; \
>   fi; \
>   fi) > $@.tmp)
> - $(call quiet-command, cmp --quiet $@ $@.tmp || mv $@.tmp $@)
> + $(call quiet-command, cmp -s $@ $@.tmp || mv $@.tmp $@)
>  
>  config-host.h: config-host.h-timestamp
>  config-host.h-timestamp: config-host.mak
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 01/14] qdict: Add qdict_change_key()

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> This is a shorthand function for changing a QDict's entry's key.
> 
> Signed-off-by: Max Reitz 
> ---
>  include/qapi/qmp/qdict.h |  1 +
>  qobject/qdict.c  | 23 +++
>  2 files changed, 24 insertions(+)

Reviewed-by: Eric Blake 

However, it would be nice to enhance tests/check-qdict.c to cover this.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v4 04/11] nbd: Improve server handling of bogus commands

2016-06-14 Thread Paolo Bonzini



On 14/06/2016 17:59, Alex Bligh wrote:
> 
>> On 14 Jun 2016, at 16:11, Paolo Bonzini  wrote:
>>
>>> To illustrate the problem, look consider what qemu itself would do as
>>> a server if it can't buffer the entire read issued to it.
>>
>> Return ENOMEM?
> 
> Well OK, qemu then 'works' on the basis it breaks another
> part of the spec, which is coping with long reads.

ENOMEM is a documented error code, and the limits extension will help
with that as well.

>> However, it looks like the
>> de facto status prior to structured replies is that the error is in the
>> spec, and this patch introduces a regression.
> 
> Well, I guess the patch makes it work the same as the
> reference server implementation and the spec, which I'd
> consider a fix. My view is that the error is in the
> kernel client.

... and QEMU and BSD.  What good is a server that doesn't interoperate
(albeit only in error cases) with any client?

Paolo

Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC 06/16] vl: move smp parsing to machine pre_init

2016-06-14 Thread Paolo Bonzini



On 14/06/2016 23:32, Programmingkid wrote:
> 
> On Jun 14, 2016, at 5:13 PM, BALATON Zoltan wrote:
> 
>> On Tue, 14 Jun 2016, Programmingkid wrote:
>>> On Jun 14, 2016, at 7:39 AM, qemu-ppc-requ...@nongnu.org wrote:
 smp_cores is only used by pseries and x86 machines.  I expect machines
 that must be single-socket to disregard smp_sockets altogether.
>>>
>>> Could smp support be added to the beigeg3 and mac99 targets?
>>
>> I think the machines these are emulating had no SMP so it would not make 
>> much sense. Maybe you'd need to create a new target emulating a Mac model 
>> with multiple CPUs instead or update mac99 to a newer model that had SMP?
> 
> http://www.macworld.com/article/1002601/multiprocessor.html

Do you also have a link to the specs or hardware programming manuals?

Paolo

Re: [Qemu-devel] Regarding the status of Gluster's multi volfile server patches

2016-06-14 Thread Eric Blake

On 06/14/2016 06:36 AM, Jeff Cody wrote:
> On Tue, Jun 14, 2016 at 03:27:55PM +0530, Prasanna Kalever wrote:
>> Hello,
>>
>> Can someone please help me in tracking the QAPI changes for union
>> discriminator addition,
>> the below [1] gluster related patches are awaiting from very long,
>> because of the dependency with QMP design changes.
>>
>> These patches [1] also stuck changes in libvirt/vdsm. It will be
>> really helpful for us, if someone can suggest what can be done to
>> accept this patches as early as possible.
>>
>> Many thanks in advance.
>>
>> [1] https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06324.html
> 
> 
> Hi Prasanna,
> 
> The last I know, as of v15 / v16, is that the libvirt team (Eric Blake)
> still has outstanding concerns on the QAPI interface.
> 
> Eric, are your concerns regarding the API for this series still outstanding,
> or is this something you'd be comfortable supporting in libvirt with the
> current proposed interface?

I've currently posted all patches in my working tree, but review says I
need another round of review before it gets merged:
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg03569.html
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03900.html

I'm also one or two patches shy of being able to use one union type as
the branch of another flat union type, where I still have to write the
patch, but it should be easy on top of the qapi patches that have landed
so far.

I also know that we want to get NBD converted for blockdev-add at the
same time, and that NBD, gluster, sheepdog, and other network-based
devices will share similar issues, so on my plate is to review this series:
https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg01048.html

Can you please point me to the last proposal for gluster, and/or repost
any series you have for proposing the idea?  I want to help get working
blockdev-add into 2.7 if at all possible.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 02/14] block/nbd: Drop trailing "." in error messages

2016-06-14 Thread Eric Blake

On 04/06/2016 12:28 PM, Max Reitz wrote:
> Signed-off-by: Max Reitz 
> ---
>  block/nbd.c   | 4 ++--
>  tests/qemu-iotests/051.out| 4 ++--
>  tests/qemu-iotests/051.pc.out | 4 ++--
>  3 files changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Eric Blake 

Could go in via qemu-trivial, if desired.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 5/7] trace: enable tracing in qemu-nbd

2016-06-14 Thread Eric Blake

On 06/14/2016 04:08 AM, Denis V. Lunev wrote:
> Please note, trace_init_backends() must be called in the final process,
> i.e. after daemonization. This is necessary to keep tracing thread in
> the proper process.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Eric Blake 
> CC: Paolo Bonzini 
> CC: Stefan Hajnoczi 
> CC: Kevin Wolf 
> ---
>  Makefile  |  2 +-
>  qemu-nbd.c| 18 +-
>  qemu-nbd.texi |  3 +++
>  3 files changed, 21 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCHv2 2/2] MAINTAINERS: remove Blue Swirl as SPARC maintainer

2016-06-14 Thread Paolo Bonzini

On 14/06/2016 10:33, Markus Armbruster wrote:
> The remaining ones are
> effectively orphaned, but you can't see that in MAINTAINERS.  I feel
> keeping status there honest is important.
> 
> Subsystems
> --
> Checkpatch
> M: Blue Swirl 
> S: Odd Fixes
> F: scripts/checkpatch.pl

This one is okay to remove M and even bump to Maintained.  It is in
obviously heavy use and has been in the "miscellaneous" pool for a
while.  But I don't really long for having my name attached to it. :)

> 
> Usermode Emulation
> --
> BSD user
> M: Blue Swirl 
> S: Maintained
> F: bsd-user/

The FreeBSD guys have effectively forked bsd-user.  Orphan is appropriate.

> Tiny Code Generator (TCG)
> -
> SPARC target
> M: Blue Swirl 
> S: Maintained
> F: tcg/sparc/
> F: disas/sparc.c

Remove M and leave it as Odd Fixes, the TCG maintainers can have a look
at the occasional patch.

Paolo

Re: [Qemu-devel] [PATCH] tests: Rename tests/Makefile to tests/Makefile.include

2016-06-14 Thread Eric Blake

On 05/31/2016 08:23 PM, Fam Zheng wrote:
> The file is only included from the top Makefile. Rename it to reflect
> this more obviously.
> 
> Signed-off-by: Fam Zheng 
> ---
>  Makefile | 2 +-
>  tests/{Makefile => Makefile.include} | 0
>  2 files changed, 1 insertion(+), 1 deletion(-)
>  rename tests/{Makefile => Makefile.include} (100%)

Reviewed-by: Eric Blake 

> 
> diff --git a/Makefile b/Makefile
> index 3a3c5dc..0cd111b 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -161,7 +161,7 @@ dummy := $(call unnest-vars,, \
>  common-obj-m)
>  
>  ifneq ($(wildcard config-host.mak),)
> -include $(SRC_PATH)/tests/Makefile
> +include $(SRC_PATH)/tests/Makefile.include
>  endif
>  
>  all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all modules
> diff --git a/tests/Makefile b/tests/Makefile.include
> similarity index 100%
> rename from tests/Makefile
> rename to tests/Makefile.include
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 1/2] build-sys: define QEMU_SRC_PATH

2016-06-14 Thread Michael Roth

Quoting marcandre.lur...@redhat.com (2016-06-14 08:16:53)
> From: Marc-André Lureau 
> 
> Define QEMU_SRC_PATH in config-host.h, to ease accessing of tests data
> files.
> 
> Signed-off-by: Marc-André Lureau 

I know this avoids the need to define environment variables for
individual test targets to pass on SRC_PATH, but the fact that
we didn't rely on this before made me a bit apprehensive about
suggesting it. qemu-iotests for instance relies on a symlink
back to SRC_PATH, and check-qapi-schema tests feed individual
schema files to a script via tests/Makefile.include. Both can
be made to check against a different/changing SRC_PATH, but
if we bake it into the code that's not possible.

I'm not sure if there's a valid use-case for needing to do
so, but it seems to be bad form to not have any mechanism
to change them without recompiling. Personally I think I'd
prefer the environment variable approach, even if it means
needing to add it per-target. I think if we wanted to get
fancy we could do this via a recipe that exports environment
variables added to a lazily-evaluated Makefile variable by
each target's dependencies, but it's probably not worth it
outside of a more general cleanup to how we handle
SRC_PATH/BUILD_DIR dependencies throughout tests/

If others are fine with the approach taken here though I
wouldn't hold things up.

> ---
>  scripts/create_config | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/scripts/create_config b/scripts/create_config
> index 1dd6a35..2fbe126 100755
> --- a/scripts/create_config
> +++ b/scripts/create_config
> @@ -116,6 +116,9 @@ case $line in
>   DSOSUF=*)
>  echo "#define HOST_DSOSUF \"${line#*=}\""
>  ;;
> + SRC_PATH=*)
> + echo "#define QEMU_SRC_PATH \"${line#*=}\""
> + ;;
>  esac
> 
>  done # read
> -- 
> 2.7.4
>

[Qemu-devel] [Bug 1588328] Re: Qemu 2.6 Solaris 9 Sparc Segmentation Fault

2016-06-14 Thread Mark Cave-Ayland

Thanks for the test case. It appears that this is a regression that
occurred somewhere between 2.5 and 2.6 - bisecting now.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1588328

Title:
  Qemu 2.6 Solaris 9 Sparc Segmentation Fault

Status in QEMU:
  New

Bug description:
  Hi,
  I tried the following command to boot Solaris 9 sparc:
  qemu-system-sparc -nographic -boot d -hda ./Spark9.disk -m 256 -cdrom 
sol-9-905hw-ga-sparc-dvd.iso -serial telnet:0.0.0.0:3000,server 

  It seems there are a few Segmentation Faults, one from the starting of
  the boot. Another at the beginning of the commandline installation.

  Trying 127.0.0.1...
  Connected to localhost.
  Escape character is '^]'.
  Configuration device id QEMU version 1 machine id 32
  Probing SBus slot 0 offset 0
  Probing SBus slot 1 offset 0
  Probing SBus slot 2 offset 0
  Probing SBus slot 3 offset 0
  Probing SBus slot 4 offset 0
  Probing SBus slot 5 offset 0
  Invalid FCode start byte
  CPUs: 1 x FMI,MB86904
  UUID: ----
  Welcome to OpenBIOS v1.1 built on Apr 18 2016 08:19
Type 'help' for detailed information
  Trying cdrom:d...
  Not a bootable ELF image
  Loading a.out image...
  Loaded 7680 bytes
  entry point is 0x4000
  bootpath: 
/iommu@0,1000/sbus@0,10001000/espdma@5,840/esp@5,880/sd@2,0:d

  Jumping to entry point 4000 for type 0005...
  switching to new context:
  SunOS Release 5.9 Version Generic_118558-34 32-bit
  Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
  Use is subject to license terms.
  WARNING: 
/iommu@0,1000/sbus@0,10001000/espdma@5,840/esp@5,880/sd@0,0 (sd0):
Corrupt label; wrong magic number

  Segmentation Fault
  Configuring /dev and /devices
  NOTICE: Couldn't set value (../../sun/io/audio/sada/drv/audiocs/audio_4231.c, 
Line #1759 0x00 0x88)
  audio may not work correctly until it is stopped and restarted
  Segmentation Fault
  Using RPC Bootparams for network configuration information.
  Skipping interface le0
  Searching for configuration file(s)...
  Search complete.

  

  What type of terminal are you using?
   1) ANSI Standard CRT
   2) DEC VT52
   3) DEC VT100
   4) Heathkit 19
   5) Lear Siegler ADM31
   6) PC Console
   7) Sun Command Tool
   8) Sun Workstation
   9) Televideo 910
   10) Televideo 925
   11) Wyse Model 50
   12) X Terminal Emulator (xterms)
   13) CDE Terminal Emulator (dtterm)
   14) Other
  Type the number of your choice and press Return: 3
  syslog service starting.
  savecore: no dump device configured
  Running in command line mode
  /sbin/disk0_install[109]: 143 Segmentation Fault
  /sbin/run_install[130]: 155 Segmentation Fault

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1588328/+subscriptions

Re: [Qemu-devel] [PATCH v1 01/40] trace: add build framework for merging trace-events files

2016-06-14 Thread Eric Blake

On 06/14/2016 06:26 AM, Stefan Hajnoczi wrote:
> On Thu, Jun 09, 2016 at 05:57:55PM +0100, Daniel P. Berrange wrote:
>> +$(BUILD_DIR)/trace-events-all: $(trace-events-y:%=$(SRC_PATH)/%)
>> +$(call quiet-command,cat $^ > $@)
> 
> $^ needs to be a stable ordering across make invocations and across
> machines to avoid thrashing ccache and spuriously renumbering trace
> event IDs.
> 
> I guess it is stable but just in case anyone knows of a case where $^
> would change order without any changes to the actual files (e.g.
> readdir(2) returning things in a different order, make parallel jobs,
> etc)...

As far as I can tell from 'info make', $^ is the list of prerequisites,
_in prerequisite order_, minus duplicates.  So the only way the order
will change is if you edit Makefile, and thus it looks stable to me.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3 4/4] machine: remove iommu property

2016-06-14 Thread Paolo Bonzini



On 14/06/2016 20:38, Eduardo Habkost wrote:
> On Mon, Jun 13, 2016 at 05:37:00PM +0300, Marcel Apfelbaum wrote:
>> Since iommu devices can be created with '-device' there is
>> no need to keep iommu as machine and mch property.
>>
>> Signed-off-by: Marcel Apfelbaum 
> 
> On the one hand, the option is present since v2.2.0. On the other
> hand, it is present only in q35, and the property really doesn't
> belong to TYPE_MACHINE (so even if we decide to keep it for a few
> releases, property registration needs to be moved to
> pc_q35_machine_options().
> 
> What kind of usage the existing code has, currently? Development
> and debugging only, or is there any chance somebody might be
> using it in production?

Definitely development and debugging only.  Someone might be using it in
automated tests, but that's pretty much it.

Paolo

Re: [Qemu-devel] [PATCH qemu v17 06/12] memory: Add reporting of supported page sizes

2016-06-14 Thread Alexey Kardashevskiy

On 08/06/16 16:05, Alexey Kardashevskiy wrote:
> On 08/06/16 16:00, David Gibson wrote:
>> On Mon, Jun 06, 2016 at 03:31:04PM +0200, Paolo Bonzini wrote:
>>>
>>>
>>> On 02/06/2016 05:35, David Gibson wrote:
 On Wed, Jun 01, 2016 at 06:57:37PM +1000, Alexey Kardashevskiy wrote:
>> Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate
>> uses when translating, however this information is not available outside
>> the translate context for various checks.
>>
>> This adds a get_page_sizes callback to MemoryRegionIOMMUOps and
>> a wrapper for it so IOMMU users (such as VFIO) can know the actual
>> page size(s) used by an IOMMU.
>>
>> As IOMMU MR represents a guest IOMMU, this uses TARGET_PAGE_SIZE
>> as fallback.
>>
>> This removes vfio_container_granularity() and uses new helper in
>> memory_region_iommu_replay() when replaying IOMMU mappings on added
>> IOMMU memory region.
>>
>> Signed-off-by: Alexey Kardashevskiy 
>> Reviewed-by: David Gibson 
 Paolo,

 Looks like you were left off the CC for this one.

 I think this is ready to go - do you want to merge, comment or ack and
 we'll take it either through my tree or Alex's?
>>>
>>> It's okay for you to merge, but the callback should be called
>>> "get_page_size" or "get_replay_granularity".  The plural is weird.
>>
>> Hm, no, it really could return multiple page sizes if the logical
>> (guest side) IOMMU supports them.
> 
> It could but it does not now and I cannot see it coming in near future so I
> am really confused now about the naming and what the callback should return
> - one page size or a mask. What should it be now?


Paolo, David?


> 
>> That might be useful at some point
>> in the future.  For now, it's sufficient for the replay to use the
>> smallest pagesize.
>>
> 
> 


-- 
Alexey



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PULL 5/8] target-sparc: Use global registers for the register window

2016-06-14 Thread Mark Cave-Ayland

On 23/02/16 18:33, Richard Henderson wrote:

> Via indirection off cpu_regwptr.
> 
> Tested-by: Mark Cave-Ayland 
> Signed-off-by: Richard Henderson 
> ---
>  target-sparc/translate.c | 57 
> 
>  1 file changed, 33 insertions(+), 24 deletions(-)
> 
> diff --git a/target-sparc/translate.c b/target-sparc/translate.c
> index 4be56dd..00d61ee 100644
> --- a/target-sparc/translate.c
> +++ b/target-sparc/translate.c
> @@ -43,7 +43,8 @@ static TCGv_ptr cpu_env, cpu_regwptr;
>  static TCGv cpu_cc_src, cpu_cc_src2, cpu_cc_dst;
>  static TCGv_i32 cpu_cc_op;
>  static TCGv_i32 cpu_psr;
> -static TCGv cpu_fsr, cpu_pc, cpu_npc, cpu_gregs[8];
> +static TCGv cpu_fsr, cpu_pc, cpu_npc;
> +static TCGv cpu_regs[32];
>  static TCGv cpu_y;
>  #ifndef CONFIG_USER_ONLY
>  static TCGv cpu_tbr;
> @@ -273,36 +274,31 @@ static inline void gen_address_mask(DisasContext *dc, 
> TCGv addr)
>  
>  static inline TCGv gen_load_gpr(DisasContext *dc, int reg)
>  {
> -if (reg == 0 || reg >= 8) {
> +if (reg > 0) {
> +assert(reg < 32);
> +return cpu_regs[reg];
> +} else {
>  TCGv t = get_temp_tl(dc);
> -if (reg == 0) {
> -tcg_gen_movi_tl(t, 0);
> -} else {
> -tcg_gen_ld_tl(t, cpu_regwptr, (reg - 8) * sizeof(target_ulong));
> -}
> +tcg_gen_movi_tl(t, 0);
>  return t;
> -} else {
> -return cpu_gregs[reg];
>  }
>  }
>  
>  static inline void gen_store_gpr(DisasContext *dc, int reg, TCGv v)
>  {
>  if (reg > 0) {
> -if (reg < 8) {
> -tcg_gen_mov_tl(cpu_gregs[reg], v);
> -} else {
> -tcg_gen_st_tl(v, cpu_regwptr, (reg - 8) * sizeof(target_ulong));
> -}
> +assert(reg < 32);
> +tcg_gen_mov_tl(cpu_regs[reg], v);
>  }
>  }
>  
>  static inline TCGv gen_dest_gpr(DisasContext *dc, int reg)
>  {
> -if (reg == 0 || reg >= 8) {
> -return get_temp_tl(dc);
> +if (reg > 0) {
> +assert(reg < 32);
> +return cpu_regs[reg];
>  } else {
> -return cpu_gregs[reg];
> +return get_temp_tl(dc);
>  }
>  }
>  
> @@ -2158,9 +2154,13 @@ static inline void gen_ldda_asi(DisasContext *dc, TCGv 
> hi, TCGv addr,
>  tcg_temp_free_i32(r_size);
>  tcg_temp_free_i32(r_asi);
>  
> -t = gen_dest_gpr(dc, rd + 1);
> +/* ??? Work around an apparent bug in Ubuntu gcc 4.8.2-10ubuntu2+12,
> +   whereby "rd + 1" elicits "error: array subscript is above array".
> +   Since we have already asserted that rd is even, the semantics
> +   are unchanged.  */
> +t = gen_dest_gpr(dc, rd | 1);
>  tcg_gen_trunc_i64_tl(t, t64);
> -gen_store_gpr(dc, rd + 1, t);
> +gen_store_gpr(dc, rd | 1, t);
>  
>  tcg_gen_shri_i64(t64, t64, 32);
>  tcg_gen_trunc_i64_tl(hi, t64);
> @@ -5330,8 +5330,11 @@ void gen_intermediate_code(CPUSPARCState * env, 
> TranslationBlock * tb)
>  void gen_intermediate_code_init(CPUSPARCState *env)
>  {
>  static int inited;
> -static const char gregnames[8][4] = {
> +static const char gregnames[32][4] = {
>  "g0", "g1", "g2", "g3", "g4", "g5", "g6", "g7",
> +"o0", "o1", "o2", "o3", "o4", "o5", "o6", "o7",
> +"l0", "l1", "l2", "l3", "l4", "l5", "l6", "l7",
> +"i0", "i1", "i2", "i3", "i4", "i5", "i6", "i7",
>  };
>  static const char fregnames[32][4] = {
>  "f0", "f2", "f4", "f6", "f8", "f10", "f12", "f14",
> @@ -5401,11 +5404,17 @@ void gen_intermediate_code_init(CPUSPARCState *env)
>  *rtl[i].ptr = tcg_global_mem_new(cpu_env, rtl[i].off, rtl[i].name);
>  }
>  
> -TCGV_UNUSED(cpu_gregs[0]);
> +TCGV_UNUSED(cpu_regs[0]);
>  for (i = 1; i < 8; ++i) {
> -cpu_gregs[i] = tcg_global_mem_new(cpu_env,
> -  offsetof(CPUSPARCState, gregs[i]),
> -  gregnames[i]);
> +cpu_regs[i] = tcg_global_mem_new(cpu_env,
> + offsetof(CPUSPARCState, gregs[i]),
> + gregnames[i]);
> +}
> +
> +for (i = 8; i < 32; ++i) {
> +cpu_regs[i] = tcg_global_mem_new(cpu_regwptr,
> + (i - 8) * sizeof(target_ulong),
> + gregnames[i]);
>  }
>  
>  for (i = 0; i < TARGET_DPREGS; i++) {
> 

Hi Richard,

Following up the bug report at
https://bugs.launchpad.net/qemu/+bug/1588328, I bisected the regression
down to this particular commit. I can't see anything obvious here, so
perhaps this is exposing another bug somewhere else?


ATB,

Mark.

Re: [Qemu-devel] [PATCH 2/7] doc: sync help descriprion for --trace with man for qemu.1

2016-06-14 Thread Eric Blake

On 06/14/2016 04:08 AM, Denis V. Lunev wrote:
> Signed-off-by: Denis V. Lunev 
> CC: Eric Blake 
> CC: Paolo Bonzini 
> CC: Stefan Hajnoczi 
> CC: Kevin Wolf 
> ---
>  qemu-options.hx | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Ah, this was the change I requested on 1/7.  I might have squashed the
two, or put this one first.  But it's obviously correct.

Reviewed-by: Eric Blake 

> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 1f9b390..f28abd3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3667,7 +3667,7 @@ DEF("trace", HAS_ARG, QEMU_OPTION_trace,
>  STEXI
>  HXCOMM This line is not accurate, as some sub-options are backend-specific 
> but
>  HXCOMM HX does not support conditional compilation of text.
> -@item -trace [events=@var{file}][,file=@var{file}]
> +@item -trace [[enable=]@var{pattern}][,events=@var{file}][,file=@var{file}]
>  @findex -trace
>  @include qemu-option-trace.texi
>  ETEXI
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 7/7] trace: enable tracing in qemu-img

2016-06-14 Thread Eric Blake

On 06/14/2016 04:08 AM, Denis V. Lunev wrote:
> The command will work this way:
> qemu-img --trace qcow2* create -f qcow2 1.img 64G
> 
> Signed-off-by: Denis V. Lunev 
> Suggested by: Daniel P. Berrange 
> CC: Eric Blake 
> CC: Paolo Bonzini 
> CC: Stefan Hajnoczi 
> CC: Kevin Wolf 
> ---
>  Makefile  |  2 +-
>  qemu-img.c| 18 +-
>  qemu-img.texi |  3 +++
>  3 files changed, 21 insertions(+), 2 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] Handling errors caused by -global (was Re: [PATCH v2 4/6] cpu: use CPUClass->parse_features() as convertor to global properties)

2016-06-14 Thread Paolo Bonzini



- Original Message -
> From: "Eduardo Habkost" 
> To: "Igor Mammedov" 
> Cc: qemu-devel@nongnu.org, "peter maydell" , "mark 
> cave-ayland"
> , blauwir...@gmail.com, qemu-...@nongnu.org, 
> pbonz...@redhat.com, r...@twiddle.net,
> "Markus Armbruster" 
> Sent: Tuesday, June 14, 2016 9:47:18 PM
> Subject: Handling errors caused by -global (was Re: [Qemu-devel] [PATCH v2 
> 4/6] cpu: use CPUClass->parse_features()
> as convertor to global properties)
> 
> On Thu, Jun 09, 2016 at 07:11:01PM +0200, Igor Mammedov wrote:
> [...]
> > -static void cpu_common_parse_features(CPUState *cpu, char *features,
> > +static void cpu_common_parse_features(const char *typename, char
> > *features,
> >Error **errp)
> >  {
> >  char *featurestr; /* Single "key=value" string being parsed */
> >  char *val;
> > -Error *err = NULL;
> > +static bool cpu_globals_initialized;
> > +
> > +/* TODO: all callers of ->parse_features() need to be changed to
> > + * call it only once, so we can remove this check (or change it
> > + * to assert(!cpu_globals_initialized).
> > + * Current callers of ->parse_features() are:
> > + * - machvirt_init()
> > + * - cpu_generic_init()
> > + * - cpu_x86_create()
> > + */
> > +if (cpu_globals_initialized) {
> > +return;
> > +}
> > +cpu_globals_initialized = true;
> >  
> >  featurestr = features ? strtok(features, ",") : NULL;
> >  
> >  while (featurestr) {
> >  val = strchr(featurestr, '=');
> >  if (val) {
> > +GlobalProperty *prop = g_new0(typeof(*prop), 1);
> >  *val = 0;
> >  val++;
> > -object_property_parse(OBJECT(cpu), val, featurestr, );
> > -if (err) {
> > -error_propagate(errp, err);
> > -return;
> > -}
> > +prop->driver = typename;
> > +prop->property = g_strdup(featurestr);
> > +prop->value = g_strdup(val);
> > +qdev_prop_register_global(prop);
> 
> This allows the user to trigger an assert:
> 
>   $ ./x86_64-softmmu/qemu-system-x86_64 -cpu qemu64,INVALID=on
>   qemu-system-x86_64: hw/core/qdev-properties.c:1087:
>   qdev_prop_set_globals_for_type: Assertion `prop->user_provided' failed.
>   Aborted (core dumped)
> 
> but even if we fix the assert by setting
> prop->user_provided=true, we have a problem. Previous behavior
> was:
> 
>   $ ./x86_64-softmmu/qemu-system-x86_64 -cpu qemu64,INVALID=on
>   qemu-system-x86_64: Property '.INVALID' not found
>   $
> 
> after this patch, and setting prop->user_provided=true, we have:
> 
>   $ ./x86_64-softmmu/qemu-system-x86_64 -cpu qemu64,INVALID=on
>   qemu-system-x86_64: Warning: global qemu64-x86_64-cpu.INVALID=on ignored:
>   Property '.INVALID' not found
>   [QEMU keeps running]
> 
> QEMU needs to refuse to run if an invalid property is specified
> on -cpu. It is an important mechanism to prevent VMs from running
> if the user is requesting for a unsupported feature that requires
> newer QEMU.
> 
> Any suggestions on how to fix that?

Replace user_provided with a Error* argument to qdev_prop_register_global.
It can be _abort and NULL for the current users of the function,
while -cpu can use _fatal.

Paolo

> Maybe qdev_prop_set_globals() can collect errors in a list in
> DeviceState, and we can check for them in code that creates
> device objects (like cpu_generic_init(), qdev_device_add()), or
> in the beginning of device_set_realized().
> 
> --
> Eduardo
>

Re: [Qemu-devel] [PATCH 1/7] doc: move text describing --trace to specific .texi file

2016-06-14 Thread Eric Blake

On 06/14/2016 03:49 PM, Eric Blake wrote:
> On 06/14/2016 04:08 AM, Denis V. Lunev wrote:
>> This text will be included to qemu-nbd/qemu-img mans in the next patches.
>>
>> Signed-off-by: Denis V. Lunev 
>> CC: Eric Blake 
>> CC: Paolo Bonzini 
>> CC: Stefan Hajnoczi 
>> CC: Kevin Wolf 
>> ---
>>  Makefile   |  3 ++-
>>  qemu-option-trace.texi | 25 +
>>  qemu-options.hx| 27 +--
>>  3 files changed, 28 insertions(+), 27 deletions(-)
>>  create mode 100644 qemu-option-trace.texi
> 
>> +++ b/qemu-options.hx
>> @@ -3669,32 +3669,7 @@ HXCOMM This line is not accurate, as some sub-options 
>> are backend-specific but
>>  HXCOMM HX does not support conditional compilation of text.
>>  @item -trace [events=@var{file}][,file=@var{file}]
>>  @findex -trace
> 
> As long as you are touching here, change the -trace line to match --help
> output:
> 
> -trace [[enable=]][,events=][,file=]

Nevermind - I see you do that in 2/7.

> 
> With that change:
> 
> Reviewed-by: Eric Blake 
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PULL 10/10] target-i386: Print obsolete warnings if +-features are used

2016-06-14 Thread Eduardo Habkost

On Tue, Jun 14, 2016 at 05:38:42PM -0400, Paolo Bonzini wrote:
> 
> 
> - Original Message -
> > From: "Eduardo Habkost" 
> > To: "Paolo Bonzini" 
> > Cc: "Peter Maydell" , "Andreas Färber" 
> > , qemu-devel@nongnu.org, "Richard
> > Henderson" , "Igor Mammedov" 
> > Sent: Tuesday, June 14, 2016 11:31:03 PM
> > Subject: Re: [PULL 10/10] target-i386: Print obsolete warnings if 
> > +-features are used
> > 
> > On Tue, Jun 14, 2016 at 05:16:40PM -0400, Paolo Bonzini wrote:
> > > - Original Message -
> > > > From: "Eduardo Habkost" 
> > > > To: "Peter Maydell" 
> > > > Cc: "Andreas Färber" , qemu-devel@nongnu.org, "Richard
> > > > Henderson" , "Paolo
> > > > Bonzini" , "Igor Mammedov" 
> > > > Sent: Tuesday, June 14, 2016 10:59:08 PM
> > > > Subject: [PULL 10/10] target-i386: Print obsolete warnings if +-features
> > > > are used
> > > > 
> > > > From: Igor Mammedov 
> > > > 
> > > > Signed-off-by: Igor Mammedov 
> > > > [ehabkost: Changed to use error_report()]
> > > > Signed-off-by: Eduardo Habkost 
> > > > ---
> > > >  target-i386/cpu.c | 6 ++
> > > >  1 file changed, 6 insertions(+)
> > > > 
> > > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > > > index 3665fec..baa3783 100644
> > > > --- a/target-i386/cpu.c
> > > > +++ b/target-i386/cpu.c
> > > > @@ -1980,9 +1980,15 @@ static void x86_cpu_parse_featurestr(CPUState 
> > > > *cs,
> > > > char *features,
> > > >  /* Compatibility syntax: */
> > > >  if (featurestr[0] == '+') {
> > > >  add_flagname_to_bitmaps(featurestr + 1, plus_features,
> > > >  _err);
> > > > +error_report(
> > > > +"'+%s' is obsolete and will be removed in future, use
> > > > '%s=on'",
> > > > +featurestr + 1, featurestr + 1);
> > > >  continue;
> > > >  } else if (featurestr[0] == '-') {
> > > >  add_flagname_to_bitmaps(featurestr + 1, minus_features,
> > > >  _err);
> > > > +error_report(
> > > > +"'-%s' is obsolete and will be removed in future, use
> > > > '%s=off'",
> > > > +featurestr + 1, featurestr + 1);
> > > >  continue;
> > > >  }
> > > 
> > > I still disagree with this change.
> > 
> > I've just removed the patch from the x86-pull-request tag, while
> > we sort this out.
> > 
> > Do you suggest supporting the "[+-]feature" syntax forever?
> 
> I suggest supporting it, but removing the awful interaction with 
> "feature=on/off"
> as soon as possible.  This shouldn't block this pull request, of course.

I plan to fix the awful ordering semantics. First with a warning
for 1 or 2 releases (only when the weird semantics is really
triggered), then +feature/-feature could be directly translated
to feature=on/feature=off.

> 
> I just believe it's not practical to remove the feature.  For example
> kvm-unit-tests can be used with new kernel and old QEMU, so I don't think
> it will move away from [+-]feature very soon.
> 
> Regarding libvirt, is "feature=on/off" introspectable?  That would also be
> a problem for libvirt to support both old and new QEMU.

Good point. Removing the feature would require dozens of extra
compatibility code to libvirt and kvm-unit-tests just to save 6
lines of code in QEMU. You convinced me.

-- 
Eduardo

[Qemu-devel] [PATCH v2 16/17] block: Split bdrv_merge_limits() from bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

The raw block driver was blindly copying all limits from bs->file,
even though: 1. the main bdrv_refresh_limits() already does this
for many of gthe limits, and 2. blindly copying from the children
can weaken any stricter limits that were already inherited from
the backing dhain during the main bdrv_refresh_limits().  Also,
the next patch is about to move .request_alignment into
BlockLimits, and that is a limit that should NOT be copied from
other layers in the BDS chain.

Solve the issue by factoring out a new bdrv_merge_limits(),
and using that function to properly merge limits when comparing
two BlockDriverState objects.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 include/block/block.h |  1 +
 include/block/block_int.h |  4 ++--
 include/qemu/typedefs.h   |  1 +
 block/io.c| 31 +--
 block/raw_bsd.c   |  4 ++--
 5 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 733a8ec..c1d4648 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -262,6 +262,7 @@ int64_t bdrv_nb_sectors(BlockDriverState *bs);
 int64_t bdrv_getlength(BlockDriverState *bs);
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
+void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src);
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
 int bdrv_commit(BlockDriverState *bs);
 int bdrv_change_backing_file(BlockDriverState *bs,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 0169019..88ef826 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -323,7 +323,7 @@ struct BlockDriver {
 QLIST_ENTRY(BlockDriver) list;
 };

-typedef struct BlockLimits {
+struct BlockLimits {
 /* maximum number of bytes that can be discarded at once (since it
  * is signed, it must be < 2G, if set), should be multiple of
  * pdiscard_alignment, but need not be power of 2. May be 0 if no
@@ -364,7 +364,7 @@ typedef struct BlockLimits {

 /* maximum number of iovec elements */
 int max_iov;
-} BlockLimits;
+};

 typedef struct BdrvOpBlocker BdrvOpBlocker;

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index b113fcf..e6f72c2 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -14,6 +14,7 @@ typedef struct BdrvDirtyBitmap BdrvDirtyBitmap;
 typedef struct BlockBackend BlockBackend;
 typedef struct BlockBackendRootState BlockBackendRootState;
 typedef struct BlockDriverState BlockDriverState;
+typedef struct BlockLimits BlockLimits;
 typedef struct BusClass BusClass;
 typedef struct BusState BusState;
 typedef struct CharDriverState CharDriverState;
diff --git a/block/io.c b/block/io.c
index 0b5c40d..c6c1f7b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -67,6 +67,17 @@ static void bdrv_parent_drained_end(BlockDriverState *bs)
 }
 }

+void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
+{
+dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
+dst->max_transfer = MIN_NON_ZERO(dst->max_transfer, src->max_transfer);
+dst->opt_mem_alignment = MAX(dst->opt_mem_alignment,
+ src->opt_mem_alignment);
+dst->min_mem_alignment = MAX(dst->min_mem_alignment,
+ src->min_mem_alignment);
+dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov);
+}
+
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BlockDriver *drv = bs->drv;
@@ -88,11 +99,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 error_propagate(errp, local_err);
 return;
 }
-bs->bl.opt_transfer = bs->file->bs->bl.opt_transfer;
-bs->bl.max_transfer = bs->file->bs->bl.max_transfer;
-bs->bl.min_mem_alignment = bs->file->bs->bl.min_mem_alignment;
-bs->bl.opt_mem_alignment = bs->file->bs->bl.opt_mem_alignment;
-bs->bl.max_iov = bs->file->bs->bl.max_iov;
+bdrv_merge_limits(>bl, >file->bs->bl);
 } else {
 bs->bl.min_mem_alignment = 512;
 bs->bl.opt_mem_alignment = getpagesize();
@@ -107,19 +114,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error 
**errp)
 error_propagate(errp, local_err);
 return;
 }
-bs->bl.opt_transfer = MAX(bs->bl.opt_transfer,
-  bs->backing->bs->bl.opt_transfer);
-bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
-   bs->backing->bs->bl.max_transfer);
-bs->bl.opt_mem_alignment =
-MAX(bs->bl.opt_mem_alignment,
-bs->backing->bs->bl.opt_mem_alignment);
-bs->bl.min_mem_alignment =
-MAX(bs->bl.min_mem_alignment,
-bs->backing->bs->bl.min_mem_alignment);
-bs->bl.max_iov =
-MIN(bs->bl.max_iov,
-

Re: [Qemu-devel] [PATCH 1/7] doc: move text describing --trace to specific .texi file

2016-06-14 Thread Eric Blake

On 06/14/2016 04:08 AM, Denis V. Lunev wrote:
> This text will be included to qemu-nbd/qemu-img mans in the next patches.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Eric Blake 
> CC: Paolo Bonzini 
> CC: Stefan Hajnoczi 
> CC: Kevin Wolf 
> ---
>  Makefile   |  3 ++-
>  qemu-option-trace.texi | 25 +
>  qemu-options.hx| 27 +--
>  3 files changed, 28 insertions(+), 27 deletions(-)
>  create mode 100644 qemu-option-trace.texi

> +++ b/qemu-options.hx
> @@ -3669,32 +3669,7 @@ HXCOMM This line is not accurate, as some sub-options 
> are backend-specific but
>  HXCOMM HX does not support conditional compilation of text.
>  @item -trace [events=@var{file}][,file=@var{file}]
>  @findex -trace

As long as you are touching here, change the -trace line to match --help
output:

-trace [[enable=]][,events=][,file=]

With that change:

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v2 08/17] blkdebug: Set request_alignment during .bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

qemu-iotests 77 is particularly sensitive to the fact that we
can specify an artificial alignment override in blkdebug, and
that override must continue to work even when limits are
refreshed on an already open device.

A later patch will be altering when bs->request_alignment
defaults are set, so fall back to sector alignment if we have
not inherited anything yet.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/blkdebug.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 20d25bd..1589fa7 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -37,6 +37,7 @@
 typedef struct BDRVBlkdebugState {
 int state;
 int new_state;
+int align;

 QLIST_HEAD(, BlkdebugRule) rules[BLKDBG__MAX];
 QSIMPLEQ_HEAD(, BlkdebugRule) active_rules;
@@ -382,9 +383,10 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 }

 /* Set request alignment */
-align = qemu_opt_get_size(opts, "align", bs->request_alignment);
-if (align > 0 && align < INT_MAX && !(align & (align - 1))) {
-bs->request_alignment = align;
+align = qemu_opt_get_size(opts, "align",
+  bs->request_alignment ?: BDRV_SECTOR_SIZE);
+if (align > 0 && align < INT_MAX && is_power_of_2(align)) {
+s->align = align;
 } else {
 error_setg(errp, "Invalid alignment");
 ret = -EINVAL;
@@ -720,6 +722,15 @@ static void blkdebug_refresh_filename(BlockDriverState 
*bs, QDict *options)
 bs->full_open_options = opts;
 }

+static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+BDRVBlkdebugState *s = bs->opaque;
+
+if (s->align) {
+bs->request_alignment = s->align;
+}
+}
+
 static int blkdebug_reopen_prepare(BDRVReopenState *reopen_state,
BlockReopenQueue *queue, Error **errp)
 {
@@ -738,6 +749,7 @@ static BlockDriver bdrv_blkdebug = {
 .bdrv_getlength = blkdebug_getlength,
 .bdrv_truncate  = blkdebug_truncate,
 .bdrv_refresh_filename  = blkdebug_refresh_filename,
+.bdrv_refresh_limits= blkdebug_refresh_limits,

 .bdrv_aio_readv = blkdebug_aio_readv,
 .bdrv_aio_writev= blkdebug_aio_writev,
-- 
2.5.5

Re: [Qemu-devel] [PATCH 2/2] block: fix libvirt snapshot with existing bitmaps

2016-06-14 Thread Eric Blake

On 06/14/2016 11:08 AM, Vladimir Sementsov-Ogievskiy wrote:
> Fix the following bug:
> 
>  # virsh start test
>  Domain test started
> 
>  #  virsh qemu-monitor-command test \
>  '{"execute":"block-dirty-bitmap-add",\
>   "arguments":{"node":"drive0","name":"ab"}}'
>  {"return":{},"id":"libvirt-36"}'}'
> 
>  # virsh snapshot-create test
>  error: Unable to read from monitor: Connection reset by peer
> 
> Actually, assert "assert(pos < hb->size)" in hbitmap_iter_init fires,
> because qcow2_save_vmstate just writes to bs (not to bs->file->bs) after
> the end of the drive.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/dirty-bitmap.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 4902ca5..d28b49c 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -364,6 +364,20 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t 
> cur_sector,
>  int nr_sectors)
>  {
>  BdrvDirtyBitmap *bitmap;
> +int64_t bitmap_size;
> +
> +if (QLIST_EMPTY(>dirty_bitmaps)) {
> +return;
> +}
> +
> +bitmap_size = QLIST_FIRST(>dirty_bitmaps)->size;
> +
> +if (cur_sector >= bitmap_size) {
> +/* this may come from qcow2_save_vmstate */
> +return;
> +}

Do we still need this patch after Kevin's work to fix vmstate to no
longer go through the block layer?

https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02832.html

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PULL 10/10] target-i386: Print obsolete warnings if +-features are used

2016-06-14 Thread Paolo Bonzini



- Original Message -
> From: "Eduardo Habkost" 
> To: "Paolo Bonzini" 
> Cc: "Peter Maydell" , "Andreas Färber" 
> , qemu-devel@nongnu.org, "Richard
> Henderson" , "Igor Mammedov" 
> Sent: Tuesday, June 14, 2016 11:31:03 PM
> Subject: Re: [PULL 10/10] target-i386: Print obsolete warnings if +-features 
> are used
> 
> On Tue, Jun 14, 2016 at 05:16:40PM -0400, Paolo Bonzini wrote:
> > - Original Message -
> > > From: "Eduardo Habkost" 
> > > To: "Peter Maydell" 
> > > Cc: "Andreas Färber" , qemu-devel@nongnu.org, "Richard
> > > Henderson" , "Paolo
> > > Bonzini" , "Igor Mammedov" 
> > > Sent: Tuesday, June 14, 2016 10:59:08 PM
> > > Subject: [PULL 10/10] target-i386: Print obsolete warnings if +-features
> > > are used
> > > 
> > > From: Igor Mammedov 
> > > 
> > > Signed-off-by: Igor Mammedov 
> > > [ehabkost: Changed to use error_report()]
> > > Signed-off-by: Eduardo Habkost 
> > > ---
> > >  target-i386/cpu.c | 6 ++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > > index 3665fec..baa3783 100644
> > > --- a/target-i386/cpu.c
> > > +++ b/target-i386/cpu.c
> > > @@ -1980,9 +1980,15 @@ static void x86_cpu_parse_featurestr(CPUState *cs,
> > > char *features,
> > >  /* Compatibility syntax: */
> > >  if (featurestr[0] == '+') {
> > >  add_flagname_to_bitmaps(featurestr + 1, plus_features,
> > >  _err);
> > > +error_report(
> > > +"'+%s' is obsolete and will be removed in future, use
> > > '%s=on'",
> > > +featurestr + 1, featurestr + 1);
> > >  continue;
> > >  } else if (featurestr[0] == '-') {
> > >  add_flagname_to_bitmaps(featurestr + 1, minus_features,
> > >  _err);
> > > +error_report(
> > > +"'-%s' is obsolete and will be removed in future, use
> > > '%s=off'",
> > > +featurestr + 1, featurestr + 1);
> > >  continue;
> > >  }
> > 
> > I still disagree with this change.
> 
> I've just removed the patch from the x86-pull-request tag, while
> we sort this out.
> 
> Do you suggest supporting the "[+-]feature" syntax forever?

I suggest supporting it, but removing the awful interaction with 
"feature=on/off"
as soon as possible.  This shouldn't block this pull request, of course.

I just believe it's not practical to remove the feature.  For example
kvm-unit-tests can be used with new kernel and old QEMU, so I don't think
it will move away from [+-]feature very soon.

Regarding libvirt, is "feature=on/off" introspectable?  That would also be
a problem for libvirt to support both old and new QEMU.

Paolo

[Qemu-devel] [PATCH v2 09/17] iscsi: Set request_alignment during .bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/iscsi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 4290e41..b4661c9 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1588,7 +1588,6 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto out;
 }
 bs->total_sectors = sector_lun2qemu(iscsilun->num_blocks, iscsilun);
-bs->request_alignment = iscsilun->block_size;

 /* We don't have any emulation for devices other than disks and CD-ROMs, so
  * this must be sg ioctl compatible. We force it to be sg, otherwise qemu
@@ -1710,6 +1709,8 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
Error **errp)
 IscsiLun *iscsilun = bs->opaque;
 uint32_t max_xfer_len = iscsilun->use_16_for_rw ? 0x : 0x;

+bs->request_alignment = iscsilun->block_size;
+
 if (iscsilun->bl.max_xfer_len) {
 max_xfer_len = MIN(max_xfer_len, iscsilun->bl.max_xfer_len);
 }
-- 
2.5.5

Re: [Qemu-devel] Handling errors caused by -global (was Re: [PATCH v2 4/6] cpu: use CPUClass->parse_features() as convertor to global properties)

2016-06-14 Thread Eduardo Habkost

On Tue, Jun 14, 2016 at 05:41:03PM -0400, Paolo Bonzini wrote:
> 
> 
> - Original Message -
> > From: "Eduardo Habkost" 
> > To: "Igor Mammedov" 
> > Cc: qemu-devel@nongnu.org, "peter maydell" , 
> > "mark cave-ayland"
> > , blauwir...@gmail.com, qemu-...@nongnu.org, 
> > pbonz...@redhat.com, r...@twiddle.net,
> > "Markus Armbruster" 
> > Sent: Tuesday, June 14, 2016 9:47:18 PM
> > Subject: Handling errors caused by -global (was Re: [Qemu-devel] [PATCH v2 
> > 4/6] cpu: use CPUClass->parse_features()
> > as convertor to global properties)
> > 
> > On Thu, Jun 09, 2016 at 07:11:01PM +0200, Igor Mammedov wrote:
> > [...]
> > > -static void cpu_common_parse_features(CPUState *cpu, char *features,
> > > +static void cpu_common_parse_features(const char *typename, char
> > > *features,
> > >Error **errp)
> > >  {
> > >  char *featurestr; /* Single "key=value" string being parsed */
> > >  char *val;
> > > -Error *err = NULL;
> > > +static bool cpu_globals_initialized;
> > > +
> > > +/* TODO: all callers of ->parse_features() need to be changed to
> > > + * call it only once, so we can remove this check (or change it
> > > + * to assert(!cpu_globals_initialized).
> > > + * Current callers of ->parse_features() are:
> > > + * - machvirt_init()
> > > + * - cpu_generic_init()
> > > + * - cpu_x86_create()
> > > + */
> > > +if (cpu_globals_initialized) {
> > > +return;
> > > +}
> > > +cpu_globals_initialized = true;
> > >  
> > >  featurestr = features ? strtok(features, ",") : NULL;
> > >  
> > >  while (featurestr) {
> > >  val = strchr(featurestr, '=');
> > >  if (val) {
> > > +GlobalProperty *prop = g_new0(typeof(*prop), 1);
> > >  *val = 0;
> > >  val++;
> > > -object_property_parse(OBJECT(cpu), val, featurestr, );
> > > -if (err) {
> > > -error_propagate(errp, err);
> > > -return;
> > > -}
> > > +prop->driver = typename;
> > > +prop->property = g_strdup(featurestr);
> > > +prop->value = g_strdup(val);
> > > +qdev_prop_register_global(prop);
> > 
> > This allows the user to trigger an assert:
> > 
> >   $ ./x86_64-softmmu/qemu-system-x86_64 -cpu qemu64,INVALID=on
> >   qemu-system-x86_64: hw/core/qdev-properties.c:1087:
> >   qdev_prop_set_globals_for_type: Assertion `prop->user_provided' failed.
> >   Aborted (core dumped)
> > 
> > but even if we fix the assert by setting
> > prop->user_provided=true, we have a problem. Previous behavior
> > was:
> > 
> >   $ ./x86_64-softmmu/qemu-system-x86_64 -cpu qemu64,INVALID=on
> >   qemu-system-x86_64: Property '.INVALID' not found
> >   $
> > 
> > after this patch, and setting prop->user_provided=true, we have:
> > 
> >   $ ./x86_64-softmmu/qemu-system-x86_64 -cpu qemu64,INVALID=on
> >   qemu-system-x86_64: Warning: global qemu64-x86_64-cpu.INVALID=on ignored:
> >   Property '.INVALID' not found
> >   [QEMU keeps running]
> > 
> > QEMU needs to refuse to run if an invalid property is specified
> > on -cpu. It is an important mechanism to prevent VMs from running
> > if the user is requesting for a unsupported feature that requires
> > newer QEMU.
> > 
> > Any suggestions on how to fix that?
> 
> Replace user_provided with a Error* argument to qdev_prop_register_global.
> It can be _abort and NULL for the current users of the function,
> while -cpu can use _fatal.

Brilliant. This should work.

-- 
Eduardo

[Qemu-devel] [PATCH v2 10/17] qcow2: Set request_alignment during .bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/qcow2.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c40baca..393afa9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -975,9 +975,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, 
int flags,
 }

 bs->encrypted = 1;
-
-/* Encryption works on a sector granularity */
-bs->request_alignment = BDRV_SECTOR_SIZE;
 }

 s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
@@ -1196,6 +1193,10 @@ static void qcow2_refresh_limits(BlockDriverState *bs, 
Error **errp)
 {
 BDRVQcow2State *s = bs->opaque;

+if (bs->encrypted) {
+/* Encryption works on a sector granularity */
+bs->request_alignment = BDRV_SECTOR_SIZE;
+}
 bs->bl.pwrite_zeroes_alignment = s->cluster_size;
 }

-- 
2.5.5

Re: [Qemu-devel] [PULL 00/10] X86 queue, 2016-06-14

2016-06-14 Thread Eduardo Habkost

x86-pull-request tag was updated (patch 10/10 was removed). See
updated pull request info below.

The following changes since commit 49237b856ae58ee7955be0b959c504c51b014f20:

  Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20160614-tag' into 
staging (2016-06-14 16:32:32 +0100)

are available in the git repository at:

  git://github.com/ehabkost/qemu.git tags/x86-pull-request

for you to fetch changes up to f6750e959a397dea988efd4e488e1ff813011065:

  target-i386: Consolidate calls of object_property_parse() in 
x86_cpu_parse_featurestr (2016-06-14 16:17:09 -0300)


X86 queue, 2016-06-14 (v2)



Eduardo Habkost (4):
  target-i386: add Skylake-Client cpu model
  target-i386: Remove xlevel & hv-spinlocks option fixups
  target-i386: Remove assert(kvm_enabled()) from host_x86_cpu_initfn()
  target-i386: Consolidate calls of object_property_parse() in
x86_cpu_parse_featurestr

Igor Mammedov (4):
  pc: Add 2.7 machine
  target-i386: Move features logic that requires CPUState to realize
time
  target-i386: Move xcc->kvm_required check to realize time
  target-i386: Use cpu_generic_init() in cpu_x86_init()

Radim Krčmář (1):
  target-i386: Implement CPUID[0xB] (Extended Topology Enumeration)

 hw/i386/pc_piix.c|  16 ++-
 hw/i386/pc_q35.c |  13 ++-
 include/hw/i386/pc.h |   9 ++
 target-i386/cpu.c| 276 +++
 target-i386/cpu.h|   8 ++
 5 files changed, 208 insertions(+), 114 deletions(-)

-- 
2.5.5

On Tue, Jun 14, 2016 at 05:58:58PM -0300, Eduardo Habkost wrote:
> The following changes since commit 49237b856ae58ee7955be0b959c504c51b014f20:
> 
>   Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20160614-tag' 
> into staging (2016-06-14 16:32:32 +0100)
> 
> are available in the git repository at:
> 
>   git://github.com/ehabkost/qemu.git tags/x86-pull-request
> 
> for you to fetch changes up to bf68389515c16dbc8cc1f082e724c574d57f7c4d:
> 
>   target-i386: Print obsolete warnings if +-features are used (2016-06-14 
> 16:17:09 -0300)
> 
> 
> X86 queue, 2016-06-14
> 
> 
> 
> Eduardo Habkost (4):
>   target-i386: add Skylake-Client cpu model
>   target-i386: Remove xlevel & hv-spinlocks option fixups
>   target-i386: Remove assert(kvm_enabled()) from host_x86_cpu_initfn()
>   target-i386: Consolidate calls of object_property_parse() in
> x86_cpu_parse_featurestr
> 
> Igor Mammedov (5):
>   pc: Add 2.7 machine
>   target-i386: Move features logic that requires CPUState to realize
> time
>   target-i386: Move xcc->kvm_required check to realize time
>   target-i386: Use cpu_generic_init() in cpu_x86_init()
>   target-i386: Print obsolete warnings if +-features are used
> 
> Radim Krčmář (1):
>   target-i386: Implement CPUID[0xB] (Extended Topology Enumeration)
> 
>  hw/i386/pc_piix.c|  16 ++-
>  hw/i386/pc_q35.c |  13 ++-
>  include/hw/i386/pc.h |   9 ++
>  target-i386/cpu.c| 282 
> +++
>  target-i386/cpu.h|   8 ++
>  5 files changed, 214 insertions(+), 114 deletions(-)
> 
> -- 
> 2.5.5
> 

-- 
Eduardo

[Qemu-devel] [PATCH v2 14/17] block: Switch transfer length bounds to byte-based

2016-06-14 Thread Eric Blake

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_transfer_length
and opt_transfer_length.  Rename them (dropping the _length suffix)
so that the compiler will help us catch the change in semantics
across any rebased code, and improve the documentation.  Use unsigned
values, so that we don't have to worry about negative values and
so that bit-twiddling is easier; however, we are still constrained
by 2^31 of signed int in most APIs.

Signed-off-by: Eric Blake 

---
v2: Hoist unrelated cleanups earlier, use name 'max_transfer' in more
places, tweak iscsi calculations
---
 include/block/block_int.h  | 11 +++
 include/sysemu/block-backend.h |  2 +-
 block/block-backend.c  | 10 +-
 block/io.c | 23 +++
 block/iscsi.c  | 20 
 block/nbd.c|  2 +-
 block/raw-posix.c  |  3 ++-
 hw/block/virtio-blk.c  |  9 +
 hw/scsi/scsi-generic.c | 11 ++-
 qemu-img.c |  8 
 10 files changed, 54 insertions(+), 45 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 16c43e2..2b45d57 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -338,11 +338,14 @@ typedef struct BlockLimits {
  * power of 2, and less than max_pwrite_zeroes if that is set */
 uint32_t pwrite_zeroes_alignment;

-/* optimal transfer length in sectors */
-int opt_transfer_length;
+/* optimal transfer length in bytes (must be power of 2, and
+ * multiple of bs->request_alignment), or 0 if no preferred size */
+uint32_t opt_transfer;

-/* maximal transfer length in sectors */
-int max_transfer_length;
+/* maximal transfer length in bytes (need not be power of 2, but
+ * should be multiple of opt_transfer), or 0 for no 32-bit limit.
+ * For now, anything larger than INT_MAX is clamped down. */
+uint32_t max_transfer;

 /* memory alignment so that no bounce buffer is needed */
 size_t min_mem_alignment;
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index c04af8e..2469a1c 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -170,7 +170,7 @@ bool blk_is_available(BlockBackend *blk);
 void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
-int blk_get_max_transfer_length(BlockBackend *blk);
+uint32_t blk_get_max_transfer(BlockBackend *blk);
 int blk_get_max_iov(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_try_blockalign(BlockBackend *blk, size_t size);
diff --git a/block/block-backend.c b/block/block-backend.c
index 1fb070b..e042544 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1303,16 +1303,16 @@ int blk_get_flags(BlockBackend *blk)
 }
 }

-/* Returns the maximum transfer length, in sectors; guaranteed nonzero */
-int blk_get_max_transfer_length(BlockBackend *blk)
+/* Returns the maximum transfer length, in bytes; guaranteed nonzero */
+uint32_t blk_get_max_transfer(BlockBackend *blk)
 {
 BlockDriverState *bs = blk_bs(blk);
-int max = 0;
+uint32_t max = 0;

 if (bs) {
-max = bs->bl.max_transfer_length;
+max = bs->bl.max_transfer;
 }
-return MIN_NON_ZERO(max, BDRV_REQUEST_MAX_SECTORS);
+return MIN_NON_ZERO(max, INT_MAX);
 }

 int blk_get_max_iov(BlockBackend *blk)
diff --git a/block/io.c b/block/io.c
index c425ce8..5e38ab4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -88,8 +88,8 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 error_propagate(errp, local_err);
 return;
 }
-bs->bl.opt_transfer_length = bs->file->bs->bl.opt_transfer_length;
-bs->bl.max_transfer_length = bs->file->bs->bl.max_transfer_length;
+bs->bl.opt_transfer = bs->file->bs->bl.opt_transfer;
+bs->bl.max_transfer = bs->file->bs->bl.max_transfer;
 bs->bl.min_mem_alignment = bs->file->bs->bl.min_mem_alignment;
 bs->bl.opt_mem_alignment = bs->file->bs->bl.opt_mem_alignment;
 bs->bl.max_iov = bs->file->bs->bl.max_iov;
@@ -107,12 +107,10 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error 
**errp)
 error_propagate(errp, local_err);
 return;
 }
-bs->bl.opt_transfer_length =
-MAX(bs->bl.opt_transfer_length,
-bs->backing->bs->bl.opt_transfer_length);
-bs->bl.max_transfer_length =
-MIN_NON_ZERO(bs->bl.max_transfer_length,
- bs->backing->bs->bl.max_transfer_length);
+bs->bl.opt_transfer = MAX(bs->bl.opt_transfer,
+  bs->backing->bs->bl.opt_transfer);
+bs->bl.max_transfer =

Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC 06/16] vl: move smp parsing to machine pre_init

2016-06-14 Thread Programmingkid


On Jun 14, 2016, at 5:13 PM, BALATON Zoltan wrote:

> On Tue, 14 Jun 2016, Programmingkid wrote:
>> On Jun 14, 2016, at 7:39 AM, qemu-ppc-requ...@nongnu.org wrote:
>>> smp_cores is only used by pseries and x86 machines.  I expect machines
>>> that must be single-socket to disregard smp_sockets altogether.
>> 
>> Could smp support be added to the beigeg3 and mac99 targets?
> 
> I think the machines these are emulating had no SMP so it would not make much 
> sense. Maybe you'd need to create a new target emulating a Mac model with 
> multiple CPUs instead or update mac99 to a newer model that had SMP?

http://www.macworld.com/article/1002601/multiprocessor.html
You will find this article enlightening.

> But what would this bring besides more complexity and more possible bugs?

- An emulator that can take better advantage of the host system's 
multiprocessor offering.
- An emulator that can run more programs and still feel responsive.
- A simulator for testing how a program behaves in a single and multiprocessor 
environment.

[Qemu-devel] [PATCH v2 06/17] iscsi: Advertise realistic limits to block layer

2016-06-14 Thread Eric Blake

The function sector_limits_lun2qemu() returns a value in units of
the block layer's 512-byte sector, and can be as large as
0x4000, which is much larger than the block layer's inherent
limit of BDRV_REQUEST_MAX_SECTORS.  The block layer already
handles '0' as a synonym to the inherent limit, and it is nicer
to return this value than it is to calculate an arbitrary
maximum, for two reasons: we want to ensure that the block layer
continues to special-case '0' as 'no limit beyond the inherent
limits'; and we want to be able to someday expand the block
layer to allow 64-bit limits, where auditing for uses of
BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
artificially constraining iscsi to old block layer limits.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/iscsi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 7e78ade..4290e41 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1697,7 +1697,9 @@ static void iscsi_close(BlockDriverState *bs)

 static int sector_limits_lun2qemu(int64_t sector, IscsiLun *iscsilun)
 {
-return MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1);
+int limit = MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1);
+
+return limit < BDRV_REQUEST_MAX_SECTORS ? limit : 0;
 }

 static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
-- 
2.5.5

[Qemu-devel] [PATCH v2 05/17] nbd: Advertise realistic limits to block layer

2016-06-14 Thread Eric Blake

We were basing the advertisement of maximum discard and transfer
length off of UINT32_MAX, but since the rest of the block layer
has signed int limits on a transaction, nothing could ever reach
that maximum, and we risk overflowing an int once things are
converted to byte-based rather than sector-based limits.  What's
more, we DO have a much smaller limit: both the current kernel
and qemu-nbd have a hard limit of 32M on a read or write
transaction, and while they may also permit up to a full 32 bits
on a discard transaction, the upstream NBD protocol is proposing
wording that without any explicit advertisement otherwise,
clients should limit ALL requests to the same limits as read and
write, even though the other requests do not actually require as
many bytes across the wire.  So the better limit to tell the
block layer is 32M for both values.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/nbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 6015e8b..bf67c8a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -362,8 +362,8 @@ static int nbd_co_flush(BlockDriverState *bs)

 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
-bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
-bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
+bs->bl.max_discard = NBD_MAX_SECTORS;
+bs->bl.max_transfer_length = NBD_MAX_SECTORS;
 }

 static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
-- 
2.5.5

[Qemu-devel] [PATCH v2 15/17] block: Switch discard length bounds to byte-based

2016-06-14 Thread Eric Blake

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_discard and
discard_alignment.  Rename them, using 'pdiscard' as an aid to
track which remaining discard interfaces need conversion, and so
that the compiler will help us catch the change in semantics
across any rebased code.  In iscsi.c, sector_limits_lun2qemu()
is no longer needed; and the BlockLimits type is now completely
byte-based.

Signed-off-by: Eric Blake 

---
v2: rebase nbd and iscsi limits across earlier improvements
---
 include/block/block_int.h | 21 +++--
 block/io.c| 16 +---
 block/iscsi.c | 19 ++-
 block/nbd.c   |  2 +-
 qemu-img.c|  3 ++-
 5 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2b45d57..0169019 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -324,18 +324,27 @@ struct BlockDriver {
 };

 typedef struct BlockLimits {
-/* maximum number of sectors that can be discarded at once */
-int max_discard;
+/* maximum number of bytes that can be discarded at once (since it
+ * is signed, it must be < 2G, if set), should be multiple of
+ * pdiscard_alignment, but need not be power of 2. May be 0 if no
+ * inherent 32-bit limit */
+int32_t max_pdiscard;

-/* optimal alignment for discard requests in sectors */
-int64_t discard_alignment;
+/* optimal alignment for discard requests in bytes, must be power
+ * of 2, less than max_discard if that is set, and multiple of
+ * bs->request_alignment. May be 0 if bs->request_alignment is
+ * good enough */
+uint32_t pdiscard_alignment;

 /* maximum number of bytes that can zeroized at once (since it is
- * signed, it must be < 2G, if set) */
+ * signed, it must be < 2G, if set), should be multiple of
+ * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
 int32_t max_pwrite_zeroes;

 /* optimal alignment for write zeroes requests in bytes, must be
- * power of 2, and less than max_pwrite_zeroes if that is set */
+ * power of 2, less than max_pwrite_zeroes if that is set, and
+ * multiple of bs->request_alignment. May be 0 if
+ * bs->request_alignment is good enough */
 uint32_t pwrite_zeroes_alignment;

 /* optimal transfer length in bytes (must be power of 2, and
diff --git a/block/io.c b/block/io.c
index 5e38ab4..0b5c40d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2352,19 +2352,21 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
   BDRV_TRACKED_DISCARD);
 bdrv_set_dirty(bs, sector_num, nb_sectors);

-max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
+max_discard = MIN_NON_ZERO(bs->bl.max_pdiscard >> BDRV_SECTOR_BITS,
+   BDRV_REQUEST_MAX_SECTORS);
 while (nb_sectors > 0) {
 int ret;
 int num = nb_sectors;
+int discard_alignment = bs->bl.pdiscard_alignment >> BDRV_SECTOR_BITS;

 /* align request */
-if (bs->bl.discard_alignment &&
-num >= bs->bl.discard_alignment &&
-sector_num % bs->bl.discard_alignment) {
-if (num > bs->bl.discard_alignment) {
-num = bs->bl.discard_alignment;
+if (discard_alignment &&
+num >= discard_alignment &&
+sector_num % discard_alignment) {
+if (num > discard_alignment) {
+num = discard_alignment;
 }
-num -= sector_num % bs->bl.discard_alignment;
+num -= sector_num % discard_alignment;
 }

 /* limit request size */
diff --git a/block/iscsi.c b/block/iscsi.c
index 739d5ca..af9babf 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1696,13 +1696,6 @@ static void iscsi_close(BlockDriverState *bs)
 memset(iscsilun, 0, sizeof(IscsiLun));
 }

-static int sector_limits_lun2qemu(int64_t sector, IscsiLun *iscsilun)
-{
-int limit = MIN(sector_lun2qemu(sector, iscsilun), INT_MAX / 2 + 1);
-
-return limit < BDRV_REQUEST_MAX_SECTORS ? limit : 0;
-}
-
 static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 /* We don't actually refresh here, but just return data queried in
@@ -1722,14 +1715,14 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
Error **errp)
 }

 if (iscsilun->lbp.lbpu) {
-if (iscsilun->bl.max_unmap < 0x) {
-bs->bl.max_discard =
-sector_limits_lun2qemu(iscsilun->bl.max_unmap, iscsilun);
+if (iscsilun->bl.max_unmap < 0x / iscsilun->block_size) {
+bs->bl.max_pdiscard =
+iscsilun->bl.max_unmap * iscsilun->block_size;
 }
-bs->bl.discard_alignment =
-

[Qemu-devel] [PATCH v2 12/17] block: Set request_alignment during .bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Add a .bdrv_refresh_limits() to all four of our legacy devices
that will always be sector-only (bochs, cloop, dmg, vvfat), in
spite of their recent conversion to expose a byte interface.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/bochs.c | 7 ++-
 block/cloop.c | 7 ++-
 block/dmg.c   | 7 ++-
 block/vvfat.c | 7 ++-
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/block/bochs.c b/block/bochs.c
index 6c8d0f3..182c50b 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -105,7 +105,6 @@ static int bochs_open(BlockDriverState *bs, QDict *options, 
int flags,
 int ret;

 bs->read_only = 1; // no write support yet
-bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */

 ret = bdrv_pread(bs->file->bs, 0, , sizeof(bochs));
 if (ret < 0) {
@@ -189,6 +188,11 @@ fail:
 return ret;
 }

+static void bochs_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
 {
 BDRVBochsState *s = bs->opaque;
@@ -283,6 +287,7 @@ static BlockDriver bdrv_bochs = {
 .instance_size = sizeof(BDRVBochsState),
 .bdrv_probe= bochs_probe,
 .bdrv_open = bochs_open,
+.bdrv_refresh_limits = bochs_refresh_limits,
 .bdrv_co_preadv = bochs_co_preadv,
 .bdrv_close= bochs_close,
 };
diff --git a/block/cloop.c b/block/cloop.c
index ea5a92b..d574003 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -67,7 +67,6 @@ static int cloop_open(BlockDriverState *bs, QDict *options, 
int flags,
 int ret;

 bs->read_only = 1;
-bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */

 /* read header */
 ret = bdrv_pread(bs->file->bs, 128, >block_size, 4);
@@ -199,6 +198,11 @@ fail:
 return ret;
 }

+static void cloop_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static inline int cloop_read_block(BlockDriverState *bs, int block_num)
 {
 BDRVCloopState *s = bs->opaque;
@@ -280,6 +284,7 @@ static BlockDriver bdrv_cloop = {
 .instance_size  = sizeof(BDRVCloopState),
 .bdrv_probe = cloop_probe,
 .bdrv_open  = cloop_open,
+.bdrv_refresh_limits = cloop_refresh_limits,
 .bdrv_co_preadv = cloop_co_preadv,
 .bdrv_close = cloop_close,
 };
diff --git a/block/dmg.c b/block/dmg.c
index 06eb513..1e53cd8 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -439,7 +439,6 @@ static int dmg_open(BlockDriverState *bs, QDict *options, 
int flags,
 int ret;

 bs->read_only = 1;
-bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */

 s->n_chunks = 0;
 s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
@@ -547,6 +546,11 @@ fail:
 return ret;
 }

+static void dmg_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static inline int is_sector_in_chunk(BDRVDMGState* s,
 uint32_t chunk_num, uint64_t sector_num)
 {
@@ -720,6 +724,7 @@ static BlockDriver bdrv_dmg = {
 .instance_size  = sizeof(BDRVDMGState),
 .bdrv_probe = dmg_probe,
 .bdrv_open  = dmg_open,
+.bdrv_refresh_limits = dmg_refresh_limits,
 .bdrv_co_preadv = dmg_co_preadv,
 .bdrv_close = dmg_close,
 };
diff --git a/block/vvfat.c b/block/vvfat.c
index 6d2e21c..08b1aa3 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1180,7 +1180,6 @@ static int vvfat_open(BlockDriverState *bs, QDict 
*options, int flags,
 bs->read_only = 0;
 }

-bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
 bs->total_sectors = cyls * heads * secs;

 if (init_directories(s, dirname, heads, secs, errp)) {
@@ -1212,6 +1211,11 @@ fail:
 return ret;
 }

+static void vvfat_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
+}
+
 static inline void vvfat_close_current_file(BDRVVVFATState *s)
 {
 if(s->current_mapping) {
@@ -3049,6 +3053,7 @@ static BlockDriver bdrv_vvfat = {

 .bdrv_parse_filename= vvfat_parse_filename,
 .bdrv_file_open = vvfat_open,
+.bdrv_refresh_limits= vvfat_refresh_limits,
 .bdrv_close = vvfat_close,

 .bdrv_co_preadv = vvfat_co_preadv,
-- 
2.5.5

[Qemu-devel] [PATCH v2 17/17] block: Move request_alignment into BlockLimit

2016-06-14 Thread Eric Blake

It makes more sense to have ALL block size limit constraints
in the same struct.  Improve the documentation while at it.

Signed-off-by: Eric Blake 

---
v2: drop hacks for save/restore of alignment, now that earlier
patches handled setting up BlockLimits more sanely
---
 include/block/block_int.h | 22 +-
 block.c   |  2 +-
 block/blkdebug.c  |  4 ++--
 block/bochs.c |  2 +-
 block/cloop.c |  2 +-
 block/dmg.c   |  2 +-
 block/io.c| 12 ++--
 block/iscsi.c |  2 +-
 block/qcow2.c |  2 +-
 block/raw-posix.c | 16 
 block/raw-win32.c |  6 +++---
 block/vvfat.c |  2 +-
 12 files changed, 39 insertions(+), 35 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 88ef826..77f47d9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -324,6 +324,12 @@ struct BlockDriver {
 };

 struct BlockLimits {
+/* Alignment requirement, in bytes, for offset/length of I/O
+ * requests. Must be a power of 2 less than INT_MAX. A value of 0
+ * defaults to 1 for drivers with modern byte interfaces, and to
+ * 512 otherwise. */
+uint32_t request_alignment;
+
 /* maximum number of bytes that can be discarded at once (since it
  * is signed, it must be < 2G, if set), should be multiple of
  * pdiscard_alignment, but need not be power of 2. May be 0 if no
@@ -332,8 +338,8 @@ struct BlockLimits {

 /* optimal alignment for discard requests in bytes, must be power
  * of 2, less than max_discard if that is set, and multiple of
- * bs->request_alignment. May be 0 if bs->request_alignment is
- * good enough */
+ * bl.request_alignment. May be 0 if bl.request_alignment is good
+ * enough */
 uint32_t pdiscard_alignment;

 /* maximum number of bytes that can zeroized at once (since it is
@@ -343,12 +349,12 @@ struct BlockLimits {

 /* optimal alignment for write zeroes requests in bytes, must be
  * power of 2, less than max_pwrite_zeroes if that is set, and
- * multiple of bs->request_alignment. May be 0 if
- * bs->request_alignment is good enough */
+ * multiple of bl.request_alignment. May be 0 if
+ * bl.request_alignment is good enough */
 uint32_t pwrite_zeroes_alignment;

 /* optimal transfer length in bytes (must be power of 2, and
- * multiple of bs->request_alignment), or 0 if no preferred size */
+ * multiple of bl.request_alignment), or 0 if no preferred size */
 uint32_t opt_transfer;

 /* maximal transfer length in bytes (need not be power of 2, but
@@ -356,10 +362,10 @@ struct BlockLimits {
  * For now, anything larger than INT_MAX is clamped down. */
 uint32_t max_transfer;

-/* memory alignment so that no bounce buffer is needed */
+/* memory alignment, in bytes so that no bounce buffer is needed */
 size_t min_mem_alignment;

-/* memory alignment for bounce buffer */
+/* memory alignment, in bytes, for bounce buffer */
 size_t opt_mem_alignment;

 /* maximum number of iovec elements */
@@ -463,8 +469,6 @@ struct BlockDriverState {
 /* I/O Limits */
 BlockLimits bl;

-/* Alignment requirement for offset/length of I/O requests */
-unsigned int request_alignment;
 /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
 unsigned int supported_write_flags;
 /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
diff --git a/block.c b/block.c
index 61fe1df..d578df8 100644
--- a/block.c
+++ b/block.c
@@ -1016,7 +1016,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BdrvChild *file,

 assert(bdrv_opt_mem_align(bs) != 0);
 assert(bdrv_min_mem_align(bs) != 0);
-assert(is_power_of_2(bs->request_alignment) || bdrv_is_sg(bs));
+assert(is_power_of_2(bs->bl.request_alignment) || bdrv_is_sg(bs));

 qemu_opts_del(opts);
 return 0;
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 1589fa7..5e32643 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -384,7 +384,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,

 /* Set request alignment */
 align = qemu_opt_get_size(opts, "align",
-  bs->request_alignment ?: BDRV_SECTOR_SIZE);
+  bs->bl.request_alignment ?: BDRV_SECTOR_SIZE);
 if (align > 0 && align < INT_MAX && is_power_of_2(align)) {
 s->align = align;
 } else {
@@ -727,7 +727,7 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, 
Error **errp)
 BDRVBlkdebugState *s = bs->opaque;

 if (s->align) {
-bs->request_alignment = s->align;
+bs->bl.request_alignment = s->align;
 }
 }

diff --git a/block/bochs.c b/block/bochs.c
index 182c50b..4194f1d 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -190,7 +190,7 @@ fail:

 static void

[Qemu-devel] [PATCH v2 02/17] block: Document supported flags during bdrv_aligned_preadv()

2016-06-14 Thread Eric Blake

We don't pass any flags on to drivers to handle.  Tighten an
assert to explain why we pass 0 to bdrv_driver_preadv(), and add
some comments on things to be aware of if we want to turn on
per-BDS BDRV_REQ_FUA support during reads in the future.  Also,
document that we may want to consider using unmap during
copy-on-read operations where the read is all zeroes.

Signed-off-by: Eric Blake 

---
v2: retitle from 'Honor flags during bdrv_aligned_preadv()', and
change scope of patch
---
 block/io.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 056a101..9191772 100644
--- a/block/io.c
+++ b/block/io.c
@@ -933,6 +933,9 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BlockDriverState *bs,

 if (drv->bdrv_co_pwrite_zeroes &&
 buffer_is_zero(bounce_buffer, iov.iov_len)) {
+/* FIXME: Should we (perhaps conditionally) be setting
+ * BDRV_REQ_MAY_UNMAP, if it will allow for a sparser copy
+ * that still correctly reads as zero? */
 ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, cluster_bytes, 0);
 } else {
 /* This does not change the data on the disk, it is not necessary
@@ -975,7 +978,12 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 assert((bytes & (align - 1)) == 0);
 assert(!qiov || bytes == qiov->size);
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
-assert(!(flags & ~BDRV_REQ_MASK));
+
+/* TODO: We would need a per-BDS .supported_read_flags and
+ * potential fallback support, if we ever implement any read flags
+ * to pass through to drivers.  For now, there aren't any
+ * passthrough flags.  */
+assert(!(flags & ~(BDRV_REQ_NO_SERIALISING | BDRV_REQ_COPY_ON_READ)));

 /* Handle Copy on Read and associated serialisation */
 if (flags & BDRV_REQ_COPY_ON_READ) {
-- 
2.5.5

[Qemu-devel] [PATCH v2 13/17] block: Set default request_alignment during bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Now that all drivers have been updated to supply an override
of request_alignment during their .bdrv_refresh_limits(), as
needed, the block layer itself can defer setting the default
alignment until part of the overall bdrv_refresh_limits().

Signed-off-by: Eric Blake 
---
 block.c| 1 -
 block/io.c | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index b350794..61fe1df 100644
--- a/block.c
+++ b/block.c
@@ -937,7 +937,6 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild 
*file,
 goto fail_opts;
 }

-bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512;
 bs->read_only = !(bs->open_flags & BDRV_O_RDWR);

 if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
diff --git a/block/io.c b/block/io.c
index 383154f..c425ce8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -78,6 +78,9 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
 return;
 }

+/* Default alignment based on whether driver has byte interface */
+bs->request_alignment = drv->bdrv_co_preadv ? 1 : 512;
+
 /* Take some limits from the children as a default */
 if (bs->file) {
 bdrv_refresh_limits(bs->file->bs, _err);
-- 
2.5.5

[Qemu-devel] [PATCH v2 11/17] raw-win32: Set request_alignment during .bdrv_refresh_limits()

2016-06-14 Thread Eric Blake

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

In this case, raw_probe_alignment() already did what we needed,
so just fix its signature and wire it in correctly.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/raw-win32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/raw-win32.c b/block/raw-win32.c
index fd23891..88382d9 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -222,7 +222,7 @@ static void raw_attach_aio_context(BlockDriverState *bs,
 }
 }

-static void raw_probe_alignment(BlockDriverState *bs)
+static void raw_probe_alignment(BlockDriverState *bs, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 DWORD sectorsPerCluster, freeClusters, totalClusters, count;
@@ -365,7 +365,6 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 win32_aio_attach_aio_context(s->aio, bdrv_get_aio_context(bs));
 }

-raw_probe_alignment(bs);
 ret = 0;
 fail:
 qemu_opts_del(opts);
@@ -550,6 +549,7 @@ BlockDriver bdrv_file = {
 .bdrv_needs_filename = true,
 .bdrv_parse_filename = raw_parse_filename,
 .bdrv_file_open = raw_open,
+.bdrv_refresh_limits = raw_probe_alignment,
 .bdrv_close = raw_close,
 .bdrv_create= raw_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-- 
2.5.5

[Qemu-devel] [PATCH v2 01/17] block: Tighter assertions on bdrv_aligned_pwritev()

2016-06-14 Thread Eric Blake

For symmetry with bdrv_aligned_preadv(), assert that the caller
really has aligned things properly. This requires adding an align
parameter, which is used now only in the new asserts, but will
come in handy in a later patch that adds auto-fragmentation to the
max transfer size, since that value need not always be a multiple
of the alignment, and therefore must be rounded down.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/io.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index 42e63f7..056a101 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1242,7 +1242,7 @@ fail:
  */
 static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
 BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
-QEMUIOVector *qiov, int flags)
+int64_t align, QEMUIOVector *qiov, int flags)
 {
 BlockDriver *drv = bs->drv;
 bool waited;
@@ -1251,6 +1251,9 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 int64_t start_sector = offset >> BDRV_SECTOR_BITS;
 int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);

+assert(is_power_of_2(align));
+assert((offset & (align - 1)) == 0);
+assert((bytes & (align - 1)) == 0);
 assert(!qiov || bytes == qiov->size);
 assert((bs->open_flags & BDRV_O_NO_IO) == 0);
 assert(!(flags & ~BDRV_REQ_MASK));
@@ -1337,7 +1340,7 @@ static int coroutine_fn 
bdrv_co_do_zero_pwritev(BlockDriverState *bs,

 memset(buf + head_padding_bytes, 0, zero_bytes);
 ret = bdrv_aligned_pwritev(bs, req, offset & ~(align - 1), align,
-   _qiov,
+   align, _qiov,
flags & ~BDRV_REQ_ZERO_WRITE);
 if (ret < 0) {
 goto fail;
@@ -1350,7 +1353,7 @@ static int coroutine_fn 
bdrv_co_do_zero_pwritev(BlockDriverState *bs,
 if (bytes >= align) {
 /* Write the aligned part in the middle. */
 uint64_t aligned_bytes = bytes & ~(align - 1);
-ret = bdrv_aligned_pwritev(bs, req, offset, aligned_bytes,
+ret = bdrv_aligned_pwritev(bs, req, offset, aligned_bytes, align,
NULL, flags);
 if (ret < 0) {
 goto fail;
@@ -1374,7 +1377,7 @@ static int coroutine_fn 
bdrv_co_do_zero_pwritev(BlockDriverState *bs,
 bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_AFTER_TAIL);

 memset(buf, 0, bytes);
-ret = bdrv_aligned_pwritev(bs, req, offset, align,
+ret = bdrv_aligned_pwritev(bs, req, offset, align, align,
_qiov, flags & ~BDRV_REQ_ZERO_WRITE);
 }
 fail:
@@ -1499,7 +1502,7 @@ int coroutine_fn bdrv_co_pwritev(BlockDriverState *bs,
 bytes = ROUND_UP(bytes, align);
 }

-ret = bdrv_aligned_pwritev(bs, , offset, bytes,
+ret = bdrv_aligned_pwritev(bs, , offset, bytes, align,
use_local_qiov ? _qiov : qiov,
flags);

-- 
2.5.5

[Qemu-devel] [PATCH v2 03/17] block: Fix harmless off-by-one in bdrv_aligned_preadv()

2016-06-14 Thread Eric Blake

If the amount of data to read ends exactly on the total size
of the bs, then we were wasting time creating a local qiov
to read the data in preparation for what would normally be
appending zeroes beyond the end, even though this corner case
has nothing further to do.

Signed-off-by: Eric Blake 
Reviewed-by: Kevin Wolf 

---
v2: no change, add R-b
---
 block/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 9191772..383154f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1024,7 +1024,7 @@ static int coroutine_fn 
bdrv_aligned_preadv(BlockDriverState *bs,
 }

 max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
-if (bytes < max_bytes) {
+if (bytes <= max_bytes) {
 ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 0);
 } else if (max_bytes > 0) {
 QEMUIOVector local_qiov;
-- 
2.5.5

Re: [Qemu-devel] [PULL 10/10] target-i386: Print obsolete warnings if +-features are used

2016-06-14 Thread Eduardo Habkost

On Tue, Jun 14, 2016 at 05:16:40PM -0400, Paolo Bonzini wrote:
> - Original Message -
> > From: "Eduardo Habkost" 
> > To: "Peter Maydell" 
> > Cc: "Andreas Färber" , qemu-devel@nongnu.org, "Richard 
> > Henderson" , "Paolo
> > Bonzini" , "Igor Mammedov" 
> > Sent: Tuesday, June 14, 2016 10:59:08 PM
> > Subject: [PULL 10/10] target-i386: Print obsolete warnings if +-features 
> > are used
> > 
> > From: Igor Mammedov 
> > 
> > Signed-off-by: Igor Mammedov 
> > [ehabkost: Changed to use error_report()]
> > Signed-off-by: Eduardo Habkost 
> > ---
> >  target-i386/cpu.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index 3665fec..baa3783 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -1980,9 +1980,15 @@ static void x86_cpu_parse_featurestr(CPUState *cs,
> > char *features,
> >  /* Compatibility syntax: */
> >  if (featurestr[0] == '+') {
> >  add_flagname_to_bitmaps(featurestr + 1, plus_features,
> >  _err);
> > +error_report(
> > +"'+%s' is obsolete and will be removed in future, use
> > '%s=on'",
> > +featurestr + 1, featurestr + 1);
> >  continue;
> >  } else if (featurestr[0] == '-') {
> >  add_flagname_to_bitmaps(featurestr + 1, minus_features,
> >  _err);
> > +error_report(
> > +"'-%s' is obsolete and will be removed in future, use
> > '%s=off'",
> > +featurestr + 1, featurestr + 1);
> >  continue;
> >  }
> 
> I still disagree with this change.

I've just removed the patch from the x86-pull-request tag, while
we sort this out.

Do you suggest supporting the "[+-]feature" syntax forever? If
that's really what you prefer, I won't complain too loudly.  It's
only a few extra lines of code, after all.

But if you have something else in mind, please clarify what you
suggest.

-- 
Eduardo

[Qemu-devel] [PATCH v2 07/17] block: Give nonzero result to blk_get_max_transfer_length()

2016-06-14 Thread Eric Blake

Making all callers special-case 0 as unlimited is awkward,
and we DO have a hard maximum of BDRV_REQUEST_MAX_SECTORS given
our current block layer API limits.

In the case of scsi, this means that we now always advertise a
limit to the guest, even in cases where the underlying layers
previously use 0 for no inherent limit beyond the block layer.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 block/block-backend.c  |  7 ---
 hw/block/virtio-blk.c  |  3 +--
 hw/scsi/scsi-generic.c | 12 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 34500e6..1fb070b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1303,15 +1303,16 @@ int blk_get_flags(BlockBackend *blk)
 }
 }

+/* Returns the maximum transfer length, in sectors; guaranteed nonzero */
 int blk_get_max_transfer_length(BlockBackend *blk)
 {
 BlockDriverState *bs = blk_bs(blk);
+int max = 0;

 if (bs) {
-return bs->bl.max_transfer_length;
-} else {
-return 0;
+max = bs->bl.max_transfer_length;
 }
+return MIN_NON_ZERO(max, BDRV_REQUEST_MAX_SECTORS);
 }

 int blk_get_max_iov(BlockBackend *blk)
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 284e646..1d2792e 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -382,7 +382,7 @@ static int multireq_compare(const void *a, const void *b)
 void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
 {
 int i = 0, start = 0, num_reqs = 0, niov = 0, nb_sectors = 0;
-int max_xfer_len = 0;
+int max_xfer_len;
 int64_t sector_num = 0;

 if (mrb->num_reqs == 1) {
@@ -392,7 +392,6 @@ void virtio_blk_submit_multireq(BlockBackend *blk, 
MultiReqBuffer *mrb)
 }

 max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk);
-max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS);

 qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs),
   _compare);
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 71372a8..db04a76 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -226,12 +226,12 @@ static void scsi_read_complete(void * opaque, int ret)
 r->req.cmd.buf[0] == INQUIRY &&
 r->req.cmd.buf[2] == 0xb0) {
 uint32_t max_xfer_len = blk_get_max_transfer_length(s->conf.blk);
-if (max_xfer_len) {
-stl_be_p(>buf[8], max_xfer_len);
-/* Also take care of the opt xfer len. */
-if (ldl_be_p(>buf[12]) > max_xfer_len) {
-stl_be_p(>buf[12], max_xfer_len);
-}
+
+assert(max_xfer_len);
+stl_be_p(>buf[8], max_xfer_len);
+/* Also take care of the opt xfer len. */
+if (ldl_be_p(>buf[12]) > max_xfer_len) {
+stl_be_p(>buf[12], max_xfer_len);
 }
 }
 scsi_req_data(>req, len);
-- 
2.5.5

[Qemu-devel] [PATCH v2 04/17] nbd: Allow larger requests

2016-06-14 Thread Eric Blake

The NBD layer was breaking up request at a limit of 2040 sectors
(just under 1M) to cater to old qemu-nbd. But the server limit
was raised to 32M in commit 2d8214885 to match the kernel, more
than three years ago; and the upstream NBD Protocol is proposing
documentation that without any explicit communication to state
otherwise, a client should be able to safely assume that a 32M
transaction will work.  It is time to rely on the larger sizing,
and any downstream distro that cares about maximum
interoperability to older qemu-nbd servers can just tweak the
value of #define NBD_MAX_SECTORS.

Signed-off-by: Eric Blake 

---
v2: new patch
---
 include/block/nbd.h | 1 +
 block/nbd-client.c  | 4 
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index b86a976..36dde24 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -76,6 +76,7 @@ enum {

 /* Maximum size of a single READ/WRITE data buffer */
 #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)
+#define NBD_MAX_SECTORS (NBD_MAX_BUFFER_SIZE / BDRV_SECTOR_SIZE)

 ssize_t nbd_wr_syncv(QIOChannel *ioc,
  struct iovec *iov,
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 4d13444..420bce8 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -269,10 +269,6 @@ static int nbd_co_writev_1(BlockDriverState *bs, int64_t 
sector_num,
 return -reply.error;
 }

-/* qemu-nbd has a limit of slightly less than 1M per request.  Try to
- * remain aligned to 4K. */
-#define NBD_MAX_SECTORS 2040
-
 int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, QEMUIOVector *qiov)
 {
-- 
2.5.5

[Qemu-devel] [PATCH v2 00/17] Byte-based block limits

2016-06-14 Thread Eric Blake

BlockLimits is currently an ugly mix of byte limits vs.
sector limits.  Unify it.  Fix some bugs I found in
bdrv_aligned_preadv() while at it.

Prequisite: Kevin's ongoing work to migrate bdrv_aligned_preadv()
to be byte-based (commit 3de06b2 on his vmstate branch at the
time of this email, but that gets rebased):
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02832.html

Trivial contextual conflict in nbd.h with the pull request Paolo
will soon be posting (both series add a #define near the same
line; resolution is to add both):
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg0.html

Also available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-limits-v2

Since v1:
- drop things already done in Kevin's work
- rebase
- split out lots of cleanup work to bdrv_refresh_limits() so
that qemu-iotests does not gain any problems on 77

001/17:[down] 'block: Tighter assertions on bdrv_aligned_pwritev()'
002/17:[down] 'block: Document supported flags during bdrv_aligned_preadv()'
003/17:[down] 'block: Fix harmless off-by-one in bdrv_aligned_preadv()'
004/17:[down] 'nbd: Allow larger requests'
005/17:[down] 'nbd: Advertise realistic limits to block layer'
006/17:[down] 'iscsi: Advertise realistic limits to block layer'
007/17:[down] 'block: Give nonzero result to blk_get_max_transfer_length()'
008/17:[down] 'blkdebug: Set request_alignment during .bdrv_refresh_limits()'
009/17:[down] 'iscsi: Set request_alignment during .bdrv_refresh_limits()'
010/17:[down] 'qcow2: Set request_alignment during .bdrv_refresh_limits()'
011/17:[down] 'raw-win32: Set request_alignment during .bdrv_refresh_limits()'
012/17:[down] 'block: Set request_alignment during .bdrv_refresh_limits()'
013/17:[down] 'block: Set default request_alignment during 
bdrv_refresh_limits()'
014/17:[0061] [FC] 'block: Switch transfer length bounds to byte-based'
015/17:[0008] [FC] 'block: Switch discard length bounds to byte-based'
016/17:[down] 'block: Split bdrv_merge_limits() from bdrv_refresh_limits()'
017/17:[0044] [FC] 'block: Move request_alignment into BlockLimit'

Eric Blake (17):
  block: Tighter assertions on bdrv_aligned_pwritev()
  block: Document supported flags during bdrv_aligned_preadv()
  block: Fix harmless off-by-one in bdrv_aligned_preadv()
  nbd: Allow larger requests
  nbd: Advertise realistic limits to block layer
  iscsi: Advertise realistic limits to block layer
  block: Give nonzero result to blk_get_max_transfer_length()
  blkdebug: Set request_alignment during .bdrv_refresh_limits()
  iscsi: Set request_alignment during .bdrv_refresh_limits()
  qcow2: Set request_alignment during .bdrv_refresh_limits()
  raw-win32: Set request_alignment during .bdrv_refresh_limits()
  block: Set request_alignment during .bdrv_refresh_limits()
  block: Set default request_alignment during bdrv_refresh_limits()
  block: Switch transfer length bounds to byte-based
  block: Switch discard length bounds to byte-based
  block: Split bdrv_merge_limits() from bdrv_refresh_limits()
  block: Move request_alignment into BlockLimit

 include/block/block.h  |  1 +
 include/block/block_int.h  | 48 ++---
 include/block/nbd.h|  1 +
 include/qemu/typedefs.h|  1 +
 include/sysemu/block-backend.h |  2 +-
 block.c|  3 +-
 block/blkdebug.c   | 18 ++--
 block/block-backend.c  |  9 ++--
 block/bochs.c  |  7 ++-
 block/cloop.c  |  7 ++-
 block/dmg.c|  7 ++-
 block/io.c | 96 +++---
 block/iscsi.c  | 40 +-
 block/nbd-client.c |  4 --
 block/nbd.c|  4 +-
 block/qcow2.c  |  7 +--
 block/raw-posix.c  | 19 +
 block/raw-win32.c  | 10 ++---
 block/raw_bsd.c|  4 +-
 block/vvfat.c  |  7 ++-
 hw/block/virtio-blk.c  | 10 ++---
 hw/scsi/scsi-generic.c | 15 ---
 qemu-img.c |  9 ++--
 23 files changed, 195 insertions(+), 134 deletions(-)

-- 
2.5.5

Re: [Qemu-devel] [PATCH 1/1] cpu: report hyperv feature words through qom

2016-06-14 Thread Denis V. Lunev


On 06/14/2016 11:54 PM, Eduardo Habkost wrote:

On Tue, Jun 14, 2016 at 11:45:08PM +0300, Denis V. Lunev wrote:

On 06/14/2016 10:59 PM, Eduardo Habkost wrote:

On Tue, Jun 14, 2016 at 01:28:40PM +0300, Denis V. Lunev wrote:

From: Evgeny Yakovlev 

This change adds hyperv feature words report through qom rpc.

When VM is configured with hyperv features enabled libvirt will check that
required featured words are set in cpuid leaf 4003 through qom
request.

Currently qemu does not report hyperv feature words which prevents windows
guests from starting with libvirt.

Signed-off-by: Evgeny Yakovlev 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: Marcelo Tosatti 

Which QEMU version did you use to test this? Some of those properties already
exist. See:

static Property x86_cpu_properties[] = {
[...]
{ .name  = "hv-spinlocks", .info  = _prop_spinlocks },
DEFINE_PROP_BOOL("hv-relaxed", X86CPU, hyperv_relaxed_timing, false),
DEFINE_PROP_BOOL("hv-vapic", X86CPU, hyperv_vapic, false),
DEFINE_PROP_BOOL("hv-time", X86CPU, hyperv_time, false),
DEFINE_PROP_BOOL("hv-crash", X86CPU, hyperv_crash, false),
DEFINE_PROP_BOOL("hv-reset", X86CPU, hyperv_reset, false),
DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
DEFINE_PROP_BOOL("hv-stimer", X86CPU, hyperv_stimer, false),
[...]
DEFINE_PROP_STRING("hv-vendor-id", X86CPU, hyperv_vendor_id),
DEFINE_PROP_END_OF_LIST()
};

QEMU will crash if you try to register the properties twice:

$ ./x86_64-softmmu/qemu-system-x86_64
qemu-system-x86_64: /home/ehabkost/rh/proj/virt/qemu/target-i386/cpu.c:3094: 
x86_cpu_register_bit_prop: Assertion `fp->ptr == field' failed.
Aborted (core dumped)

I like the idea of moving hyperv feature information inside the features array,
though.

no, idea is a bit different.

The user selects properties in the command line to enable
different HyperV enlightenments. This is how we do that
and this is how the QEMU is expected to work.

After that libvirt starts to check that these properties do
work. In order to do that it executes qom-get and expects
to find enabled HyperV enlightenments in the guest CPUID.

This is the idea of this patch.

And why exactly moving hyperv feature information inside the
CPUX86State::features array wouldn't do exactly what you say
above?


property remains ;)

OK, may be I have missed you point. I fear word "move"
in your letter. I think we "add" it to features.

Anyway, I got your point that features comes first,
CPUID is filled in the base of features. No prob.

Den

Re: [Qemu-devel] [Qemu-ppc] [PATCH RFC 06/16] vl: move smp parsing to machine pre_init

2016-06-14 Thread BALATON Zoltan


On Tue, 14 Jun 2016, Programmingkid wrote:

On Jun 14, 2016, at 7:39 AM, qemu-ppc-requ...@nongnu.org wrote:

smp_cores is only used by pseries and x86 machines.  I expect machines
that must be single-socket to disregard smp_sockets altogether.


Could smp support be added to the beigeg3 and mac99 targets?


I think the machines these are emulating had no SMP so it would not make 
much sense. Maybe you'd need to create a new target emulating a Mac model 
with multiple CPUs instead or update mac99 to a newer model that had SMP?

But what would this bring besides more complexity and more possible bugs?

Regards,
BALATON Zoltan

Re: [Qemu-devel] [PATCH 03/10] ppc: Rework POWER7 & POWER8 exception model (part 2)

2016-06-14 Thread Benjamin Herrenschmidt

On Tue, 2016-06-14 at 16:25 +1000, David Gibson wrote:
> > Properly implement LPES0/1 handling for HV vs. !HV mode.
> > 
> > Signed-off-by: Benjamin Herrenschmidt 
> > [clg: AIL implementation was fixed in commit 5c94b2a5e5ef
> >   fixed checkpatch.pl errors ]
> 
> Code looks ok, but the short description really needs an update,
> since
> this has been taken out of its original series context.

This is still what this does. It properly implements support for LPCR0
(LPCR1 isn't supported). It also fixes how the HV bit is handled when
taking interrupts and which set of SRR's are used in some cases.

Cheers,
Ben.

[Qemu-devel] [PULL 09/10] target-i386: Consolidate calls of object_property_parse() in x86_cpu_parse_featurestr

2016-06-14 Thread Eduardo Habkost

Signed-off-by: Eduardo Habkost 
Reviewed-by: Igor Mammedov 
Signed-off-by: Igor Mammedov 
Reviewed-by: Eduardo Habkost 
Signed-off-by: Eduardo Habkost 
---
 target-i386/cpu.c | 74 +--
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 329d85c..3665fec 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1966,43 +1966,59 @@ static void x86_cpu_parse_featurestr(CPUState *cs, char 
*features,
 char *featurestr; /* Single 'key=value" string being parsed */
 Error *local_err = NULL;
 
-featurestr = features ? strtok(features, ",") : NULL;
+if (!features) {
+return;
+}
+
+for (featurestr = strtok(features, ",");
+ featurestr  && !local_err;
+ featurestr = strtok(NULL, ",")) {
+const char *name;
+const char *val = NULL;
+char *eq = NULL;
 
-while (featurestr) {
-char *val;
+/* Compatibility syntax: */
 if (featurestr[0] == '+') {
 add_flagname_to_bitmaps(featurestr + 1, plus_features, _err);
+continue;
 } else if (featurestr[0] == '-') {
 add_flagname_to_bitmaps(featurestr + 1, minus_features, 
_err);
-} else if ((val = strchr(featurestr, '='))) {
-*val = 0; val++;
-feat2prop(featurestr);
-if (!strcmp(featurestr, "tsc-freq")) {
-int64_t tsc_freq;
-char *err;
-char num[32];
-
-tsc_freq = qemu_strtosz_suffix_unit(val, ,
-   QEMU_STRTOSZ_DEFSUFFIX_B, 1000);
-if (tsc_freq < 0 || *err) {
-error_setg(errp, "bad numerical value %s", val);
-return;
-}
-snprintf(num, sizeof(num), "%" PRId64, tsc_freq);
-object_property_parse(OBJECT(cpu), num, "tsc-frequency",
-  _err);
-} else {
-object_property_parse(OBJECT(cpu), val, featurestr, 
_err);
-}
+continue;
+}
+
+eq = strchr(featurestr, '=');
+if (eq) {
+*eq++ = 0;
+val = eq;
 } else {
-feat2prop(featurestr);
-object_property_parse(OBJECT(cpu), "on", featurestr, _err);
+val = "on";
 }
-if (local_err) {
-error_propagate(errp, local_err);
-return;
+
+feat2prop(featurestr);
+name = featurestr;
+
+/* Special case: */
+if (!strcmp(name, "tsc-freq")) {
+int64_t tsc_freq;
+char *err;
+char num[32];
+
+tsc_freq = qemu_strtosz_suffix_unit(val, ,
+   QEMU_STRTOSZ_DEFSUFFIX_B, 1000);
+if (tsc_freq < 0 || *err) {
+error_setg(errp, "bad numerical value %s", val);
+return;
+}
+snprintf(num, sizeof(num), "%" PRId64, tsc_freq);
+val = num;
+name = "tsc-frequency";
 }
-featurestr = strtok(NULL, ",");
+
+object_property_parse(OBJECT(cpu), val, name, _err);
+}
+
+if (local_err) {
+error_propagate(errp, local_err);
 }
 }
 
-- 
2.5.5

Re: [Qemu-devel] [PATCH 1/1] cpu: report hyperv feature words through qom

2016-06-14 Thread Eduardo Habkost

On Tue, Jun 14, 2016 at 11:59:18PM +0300, Denis V. Lunev wrote:
> On 06/14/2016 11:54 PM, Eduardo Habkost wrote:
> > On Tue, Jun 14, 2016 at 11:45:08PM +0300, Denis V. Lunev wrote:
> > > On 06/14/2016 10:59 PM, Eduardo Habkost wrote:
> > > > On Tue, Jun 14, 2016 at 01:28:40PM +0300, Denis V. Lunev wrote:
> > > > > From: Evgeny Yakovlev 
> > > > > 
> > > > > This change adds hyperv feature words report through qom rpc.
> > > > > 
> > > > > When VM is configured with hyperv features enabled libvirt will check 
> > > > > that
> > > > > required featured words are set in cpuid leaf 4003 through qom
> > > > > request.
> > > > > 
> > > > > Currently qemu does not report hyperv feature words which prevents 
> > > > > windows
> > > > > guests from starting with libvirt.
> > > > > 
> > > > > Signed-off-by: Evgeny Yakovlev 
> > > > > Signed-off-by: Denis V. Lunev 
> > > > > CC: Paolo Bonzini 
> > > > > CC: Richard Henderson 
> > > > > CC: Eduardo Habkost 
> > > > > CC: Marcelo Tosatti 
> > > > Which QEMU version did you use to test this? Some of those properties 
> > > > already
> > > > exist. See:
> > > > 
> > > > static Property x86_cpu_properties[] = {
> > > > [...]
> > > > { .name  = "hv-spinlocks", .info  = _prop_spinlocks },
> > > > DEFINE_PROP_BOOL("hv-relaxed", X86CPU, hyperv_relaxed_timing, 
> > > > false),
> > > > DEFINE_PROP_BOOL("hv-vapic", X86CPU, hyperv_vapic, false),
> > > > DEFINE_PROP_BOOL("hv-time", X86CPU, hyperv_time, false),
> > > > DEFINE_PROP_BOOL("hv-crash", X86CPU, hyperv_crash, false),
> > > > DEFINE_PROP_BOOL("hv-reset", X86CPU, hyperv_reset, false),
> > > > DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
> > > > DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
> > > > DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
> > > > DEFINE_PROP_BOOL("hv-stimer", X86CPU, hyperv_stimer, false),
> > > > [...]
> > > > DEFINE_PROP_STRING("hv-vendor-id", X86CPU, hyperv_vendor_id),
> > > > DEFINE_PROP_END_OF_LIST()
> > > > };
> > > > 
> > > > QEMU will crash if you try to register the properties twice:
> > > > 
> > > > $ ./x86_64-softmmu/qemu-system-x86_64
> > > > qemu-system-x86_64: 
> > > > /home/ehabkost/rh/proj/virt/qemu/target-i386/cpu.c:3094: 
> > > > x86_cpu_register_bit_prop: Assertion `fp->ptr == field' failed.
> > > > Aborted (core dumped)
> > > > 
> > > > I like the idea of moving hyperv feature information inside the 
> > > > features array,
> > > > though.
> > > no, idea is a bit different.
> > > 
> > > The user selects properties in the command line to enable
> > > different HyperV enlightenments. This is how we do that
> > > and this is how the QEMU is expected to work.
> > > 
> > > After that libvirt starts to check that these properties do
> > > work. In order to do that it executes qom-get and expects
> > > to find enabled HyperV enlightenments in the guest CPUID.
> > > 
> > > This is the idea of this patch.
> > And why exactly moving hyperv feature information inside the
> > CPUX86State::features array wouldn't do exactly what you say
> > above?
> > 
> property remains ;)
> 
> OK, may be I have missed you point. I fear word "move"
> in your letter. I think we "add" it to features.

I mean we move data to a different struct field (from the
hyperv_* booleans to the features array), and let the existing
feature property system handle it.

Your patch does that partially, but it conflicts with the
existing properties registered at x86_cpu_properties.

But we also need to make kvm_arch_get_supported_cpuid() handle
the hyperv CPUID leaves (to replace the existing
kvm_check_extensions() and has_msr_* checks in
kvm_arch_init_vcpu()).

After we do that, some things should work automatically:
* New hv-* properties available in QOM and "-cpu";
* hyperv support for the "check" and "enforce" options;
* hyperv support in "-cpu host'; and
* hyperv information returned in the "feature-words" and
  "filtered-features" QOM properties.

See how FEAT_KVM is handled in target-i386/cpu.c and
kvm_arch_init_vcpu(), for reference.

-- 
Eduardo

Re: [Qemu-devel] [PULL 10/10] target-i386: Print obsolete warnings if +-features are used

2016-06-14 Thread Paolo Bonzini



- Original Message -
> From: "Eduardo Habkost" 
> To: "Peter Maydell" 
> Cc: "Andreas Färber" , qemu-devel@nongnu.org, "Richard 
> Henderson" , "Paolo
> Bonzini" , "Igor Mammedov" 
> Sent: Tuesday, June 14, 2016 10:59:08 PM
> Subject: [PULL 10/10] target-i386: Print obsolete warnings if +-features are 
> used
> 
> From: Igor Mammedov 
> 
> Signed-off-by: Igor Mammedov 
> [ehabkost: Changed to use error_report()]
> Signed-off-by: Eduardo Habkost 
> ---
>  target-i386/cpu.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 3665fec..baa3783 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -1980,9 +1980,15 @@ static void x86_cpu_parse_featurestr(CPUState *cs,
> char *features,
>  /* Compatibility syntax: */
>  if (featurestr[0] == '+') {
>  add_flagname_to_bitmaps(featurestr + 1, plus_features,
>  _err);
> +error_report(
> +"'+%s' is obsolete and will be removed in future, use
> '%s=on'",
> +featurestr + 1, featurestr + 1);
>  continue;
>  } else if (featurestr[0] == '-') {
>  add_flagname_to_bitmaps(featurestr + 1, minus_features,
>  _err);
> +error_report(
> +"'-%s' is obsolete and will be removed in future, use
> '%s=off'",
> +featurestr + 1, featurestr + 1);
>  continue;
>  }

I still disagree with this change.

Paolo

1 2 3 4 5 6 >

1 - 100 of 569 matches

Mail list logo