Re: [Qemu-devel] [PATCH v7 17/17] intel_iommu: enable vfio devices

2017-03-15 Thread Peter Xu
On Tue, Feb 07, 2017 at 04:28:19PM +0800, Peter Xu wrote:
> This patch is based on Aviv Ben-David ()'s patch
> upstream:
> 
>   "IOMMU: enable intel_iommu map and unmap notifiers"
>   https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01453.html
> 
> However I removed/fixed some content, and added my own codes.
> 
> Instead of translate() every page for iotlb invalidations (which is
> slower), we walk the pages when needed and notify in a hook function.
> 
> This patch enables vfio devices for VT-d emulation.
> 
> And, since we already have vhost DMAR support via device-iotlb, a
> natural benefit that this patch brings is that vt-d enabled vhost can
> live even without ATS capability now. Though more tests are needed.
> 

Hi, Michael,

If there is any possiblility that this version be merged in the future
at any point, would you please help add Aviv's sign-off into this
patch as well right here (I think it should be before Jason's r-b):

Signed-off-by: Aviv Ben-David 

Since I think we should definitely give Aviv more credit since he's
done a great work before (and devoted lots of time).

(Aviv, please reply if you have other opinions, or I'll just make
 myself bold)

Thanks,

> Reviewed-by: Jason Wang 
> Signed-off-by: Peter Xu 

[...]

-- peterx



Re: [Qemu-devel] [RFC v2 1/8] block: add bdrv_measure() API

2017-03-15 Thread Max Reitz
On 16.03.2017 04:38, Stefan Hajnoczi wrote:
> On Thu, Mar 16, 2017 at 02:01:03AM +0100, Max Reitz wrote:
>> On 15.03.2017 10:29, Stefan Hajnoczi wrote:
>>> bdrv_measure() provides a conservative maximum for the size of a new
>>> image.  This information is handy if storage needs to be allocated (e.g.
>>> a SAN or an LVM volume) ahead of time.
>>>
>>> Signed-off-by: Stefan Hajnoczi 
>>> ---
>>>  qapi/block-core.json  | 19 +++
>>>  include/block/block.h |  4 
>>>  include/block/block_int.h |  2 ++
>>>  block.c   | 33 +
>>>  4 files changed, 58 insertions(+)
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 786b39e..673569d 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -463,6 +463,25 @@
>>> '*dirty-bitmaps': ['BlockDirtyInfo'] } }
>>>  
>>>  ##
>>> +# @BlockMeasureInfo:
>>> +#
>>> +# Image size calculation information.  This structure describes the size
>>> +# requirements for creating a new image.
>>> +#
>>> +# @required-bytes: Amount of space required for image creation.  This 
>>> value is
>>> +#  the host file size including sparse file regions.  A 
>>> new 5
>>> +#  GB raw file therefore has a required size of 5 GB, not 0
>>> +#  bytes.
>>
>> This should probably note that it's a conservative estimation (and I
>> agree that it should be). It's nice to have it in the commit message but
>> few people are going to run git blame on the QAPI documentation to find
>> out the rest of its story. :-)
> 
> Will fix.
> 
>>> +#
>>> +# @fully-allocated-bytes: Space required once data has been written to all
>>> +# sectors
>>> +#
>>> +# Since: 2.10
>>> +##
>>> +{ 'struct': 'BlockMeasureInfo',
>>> +  'data': {'required-bytes': 'int', 'fully-allocated-bytes': 'int'} }
>>> +
>>> +##
>>>  # @query-block:
>>>  #
>>>  # Get a list of BlockInfo for all virtual block devices.
>>> diff --git a/include/block/block.h b/include/block/block.h
>>> index 5149260..43c789f 100644
>>> --- a/include/block/block.h
>>> +++ b/include/block/block.h
>>> @@ -298,6 +298,10 @@ int bdrv_truncate(BdrvChild *child, int64_t offset);
>>>  int64_t bdrv_nb_sectors(BlockDriverState *bs);
>>>  int64_t bdrv_getlength(BlockDriverState *bs);
>>>  int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
>>> +void bdrv_measure(BlockDriver *drv, QemuOpts *opts,
>>> +  BlockDriverState *in_bs,
>>> +  BlockMeasureInfo *info,
>>> +  Error **errp);
>>>  void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
>>>  void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
>>>  int bdrv_commit(BlockDriverState *bs);
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 6c699ac..45a7fbe 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -201,6 +201,8 @@ struct BlockDriver {
>>>  int64_t (*bdrv_getlength)(BlockDriverState *bs);
>>>  bool has_variable_length;
>>>  int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs);
>>> +void (*bdrv_measure)(QemuOpts *opts, BlockDriverState *in_bs,
>>> + BlockMeasureInfo *info, Error **errp);
>>>  
>>>  int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
>>>  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov);
>>> diff --git a/block.c b/block.c
>>> index cb57370..532a4d1 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -3260,6 +3260,39 @@ int64_t 
>>> bdrv_get_allocated_file_size(BlockDriverState *bs)
>>>  return -ENOTSUP;
>>>  }
>>>  
>>> +/*
>>> + * bdrv_measure:
>>> + * @drv: Format driver
>>> + * @opts: Creation options
>>> + * @in_bs: Existing image containing data for new image (may be NULL)
>>> + * @info: Result object
>>> + * @errp: Error object
>>> + *
>>> + * Calculate file size required to create a new image.
>>> + *
>>> + * If @in_bs is given then space for allocated clusters and zero clusters
>>> + * from that image are included in the calculation.  If @opts contains a
>>> + * backing file that is shared by @in_bs then backing clusters are omitted
>>> + * from the calculation.
>>
>> This seems to run a bit contrary to the documentation of
>> BlockMeasureInfo.required-bytes, and I don't fully understand it either.
>>
>> What does "space for zero clusters" mean? Do zero clusters take space?
>> Does it depend on the image format? (i.e. would they take space for raw
>> but not for qcow2?)
> 
> Yes, zero clusters are an image format-specific feature.  A contrived
> example:
> 
>   in_bs: qcow2 version=3 with zero clusters
>   Output format: qcow2 version=2 (zero clusters not supported!)
>   Image creation options: backing file given
> 
> We must take care to allocate clusters in the new image for zero
> clusters in in_bs.  We cannot simply skip allocating those zero clusters
> 

Re: [Qemu-devel] [PATCH v2] trace: ensure $(tracetool-y) is defined in top level makefile

2017-03-15 Thread Stefan Hajnoczi
On Wed, Mar 15, 2017 at 12:34:21PM +, Daniel P. Berrange wrote:
> The build rules for trace files have a dependancy on $(tracetool-y).
> This variable populated in the trace/Makefile.objs file and thus its
> definition gets pulled into the top level makefile. This happens too
> late in the process though, so by the time $(tracetool-y) is defined,
> make has already evaluated $(tracetool-y) in the dependancies and
> found it to be empty. The result is that when the tracetool source
> is changed, the generated files are not rebuilt. The solution is to
> define the variable in the top level makefile too
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  Makefile| 3 +++
>  trace/Makefile.objs | 8 
>  2 files changed, 3 insertions(+), 8 deletions(-)

Thanks, applied to my tracing tree:
https://github.com/stefanha/qemu/commits/tracing

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [RFC v2 6/8] qemu-img: add measure subcommand

2017-03-15 Thread Stefan Hajnoczi
On Thu, Mar 16, 2017 at 02:46:12AM +0100, Max Reitz wrote:
> On 15.03.2017 10:29, Stefan Hajnoczi wrote:
> > The measure subcommand calculates the size required by a new image file.
> > This can be used by users or management tools that need to allocate
> > space on an LVM volume, SAN LUN, etc before creating or converting an
> > image file.
> > 
> > Suggested-by: Maor Lipchuk 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  qemu-img.c   | 213 
> > +++
> >  qemu-img-cmds.hx |   9 +++
> 
> Looking forward to the documentation in the non-RFC version. O:-)

Thanks for reminding me that the qemu-img man page needs documentation
for this new sub-command :).

> >  2 files changed, 222 insertions(+)
> > 
> > diff --git a/qemu-img.c b/qemu-img.c
> > index 98b836b..e8c6dcc 100644
> > --- a/qemu-img.c
> > +++ b/qemu-img.c
> > @@ -59,6 +59,7 @@ enum {
> >  OPTION_PATTERN = 260,
> >  OPTION_FLUSH_INTERVAL = 261,
> >  OPTION_NO_DRAIN = 262,
> > +OPTION_SIZE = 263,
> >  };
> >  
> >  typedef enum OutputFormat {
> > @@ -4287,6 +4288,218 @@ out:
> >  return 0;
> >  }
> >  
> > +static void dump_json_block_measure_info(BlockMeasureInfo *info)
> > +{
> > +QString *str;
> > +QObject *obj;
> > +Visitor *v = qobject_output_visitor_new();
> > +
> > +visit_type_BlockMeasureInfo(v, NULL, , _abort);
> > +visit_complete(v, );
> > +str = qobject_to_json_pretty(obj);
> > +assert(str != NULL);
> > +printf("%s\n", qstring_get_str(str));
> > +qobject_decref(obj);
> > +visit_free(v);
> > +QDECREF(str);
> > +}
> > +
> > +static int img_measure(int argc, char **argv)
> > +{
> > +static const struct option long_options[] = {
> > +{"help", no_argument, 0, 'h'},
> > +{"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
> > +{"object", required_argument, 0, OPTION_OBJECT},
> > +{"output", required_argument, 0, OPTION_OUTPUT},
> > +{"size", required_argument, 0, OPTION_SIZE},
> > +{0, 0, 0, 0}
> > +};
> > +OutputFormat output_format = OFORMAT_HUMAN;
> > +BlockBackend *in_blk = NULL;
> > +BlockDriver *drv;
> > +const char *filename = NULL;
> > +const char *fmt = NULL;
> > +const char *out_fmt = "raw";
> > +char *options = NULL;
> > +char *snapshot_name = NULL;
> > +QemuOpts *opts = NULL;
> > +QemuOpts *object_opts = NULL;
> > +QemuOpts *sn_opts = NULL;
> > +QemuOptsList *create_opts = NULL;
> > +bool image_opts = false;
> > +uint64_t img_size = ~0ULL;
> > +BlockMeasureInfo info;
> > +Error *local_err = NULL;
> > +int ret = 1;
> > +int c;
> > +
> > +while ((c = getopt_long(argc, argv, "hf:O:o:l:",
> > +long_options, NULL)) != -1) {
> > +switch (c) {
> > +case '?':
> > +case 'h':
> > +help();
> > +break;
> > +case 'f':
> > +fmt = optarg;
> > +break;
> > +case 'O':
> > +out_fmt = optarg;
> > +break;
> > +case 'o':
> > +if (!is_valid_option_list(optarg)) {
> > +error_report("Invalid option list: %s", optarg);
> > +goto out;
> > +}
> > +if (!options) {
> > +options = g_strdup(optarg);
> > +} else {
> > +char *old_options = options;
> > +options = g_strdup_printf("%s,%s", options, optarg);
> > +g_free(old_options);
> > +}
> > +break;
> > +case 'l':
> > +if (strstart(optarg, SNAPSHOT_OPT_BASE, NULL)) {
> > +sn_opts = qemu_opts_parse_noisily(_snapshot_opts,
> > +  optarg, false);
> > +if (!sn_opts) {
> > +error_report("Failed in parsing snapshot param '%s'",
> > + optarg);
> > +goto out;
> > +}
> > +} else {
> > +snapshot_name = optarg;
> > +}
> > +break;
> > +case OPTION_OBJECT:
> > +object_opts = qemu_opts_parse_noisily(_object_opts,
> > +  optarg, true);
> > +if (!object_opts) {
> > +goto out;
> > +}
> > +break;
> > +case OPTION_IMAGE_OPTS:
> > +image_opts = true;
> > +break;
> > +case OPTION_OUTPUT:
> > +if (!strcmp(optarg, "json")) {
> > +output_format = OFORMAT_JSON;
> > +} else if (!strcmp(optarg, "human")) {
> > +output_format = OFORMAT_HUMAN;
> > +} else {
> > +error_report("--output must be used with human or json "
> > + "as argument.");
> > 

Re: [Qemu-devel] [RFC v2 5/8] qcow2: add bdrv_measure() support

2017-03-15 Thread Stefan Hajnoczi
On Thu, Mar 16, 2017 at 02:57:13AM +0100, Max Reitz wrote:
> On 15.03.2017 10:29, Stefan Hajnoczi wrote:
> > Use qcow2_calc_prealloc_size() to get the required file size.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> > TODO:
> >  * Query block status and only count allocated clusters if in_bs != NULL
> >  * Exclude backing file clusters if in_bs != NULL and -o backing_file=
> >is given
> > ---
> >  block/qcow2.c | 53 +
> >  1 file changed, 53 insertions(+)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 19be468..a4caf97 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -2940,6 +2940,58 @@ static coroutine_fn int 
> > qcow2_co_flush_to_os(BlockDriverState *bs)
> >  return 0;
> >  }
> >  
> > +static void qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
> > +  BlockMeasureInfo *info, Error **errp)
> > +{
> > +Error *local_err = NULL;
> > +uint64_t allocated_bytes = 0;
> > +uint64_t prealloc_size;
> > +uint64_t size;
> > +uint64_t refcount_bits;
> > +size_t cluster_size;
> > +int version;
> > +
> > +size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
> > +BDRV_SECTOR_SIZE);
> > +
> > +cluster_size = qcow2_opt_get_cluster_size_del(opts, _err);
> > +if (local_err) {
> > +goto err;
> > +}
> > +
> > +version = qcow2_opt_get_version_del(opts, _err);
> > +if (local_err) {
> > +goto err;
> > +}
> > +
> > +refcount_bits = qcow2_opt_get_refcount_bits_del(opts, version, 
> > _err);
> > +if (local_err) {
> > +goto err;
> > +}
> > +
> > +if (in_bs) {
> > +int64_t ssize = bdrv_getlength(in_bs);
> > +if (ssize < 0) {
> > +error_setg_errno(errp, -ssize, "Unable to get image size");
> > +return;
> > +}
> > +
> > +size = ssize;
> > +
> > +/* TODO How many clusters are allocated modulo backing file in 
> > opts? */
> > +}
> > +
> > +prealloc_size = qcow2_calc_prealloc_size(size, cluster_size,
> > + ctz32(refcount_bits));
> > +info->required_bytes = prealloc_size - (align_offset(size, 
> > cluster_size) -
> > +allocated_bytes);
> 
> But what if @opts contains a preallocation option?

Will implement in the next revision.

Thanks,
Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [RFC v2 3/8] qcow2: extract preallocation calculation function

2017-03-15 Thread Stefan Hajnoczi
On Thu, Mar 16, 2017 at 02:18:36AM +0100, Max Reitz wrote:
> On 15.03.2017 10:29, Stefan Hajnoczi wrote:
> > Calculating the preallocated image size will be needed to implement
> > .bdrv_measure().  Extract the code out into a separate function.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  block/qcow2.c | 134 
> > +-
> >  1 file changed, 76 insertions(+), 58 deletions(-)
> 
> So let's see whether it will be this or
> http://lists.nongnu.org/archive/html/qemu-block/2017-03/msg00342.html
> that lands first! ;-)

Ha, thanks for pointing that out :).

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [RFC v2 1/8] block: add bdrv_measure() API

2017-03-15 Thread Stefan Hajnoczi
On Thu, Mar 16, 2017 at 02:01:03AM +0100, Max Reitz wrote:
> On 15.03.2017 10:29, Stefan Hajnoczi wrote:
> > bdrv_measure() provides a conservative maximum for the size of a new
> > image.  This information is handy if storage needs to be allocated (e.g.
> > a SAN or an LVM volume) ahead of time.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  qapi/block-core.json  | 19 +++
> >  include/block/block.h |  4 
> >  include/block/block_int.h |  2 ++
> >  block.c   | 33 +
> >  4 files changed, 58 insertions(+)
> > 
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 786b39e..673569d 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -463,6 +463,25 @@
> > '*dirty-bitmaps': ['BlockDirtyInfo'] } }
> >  
> >  ##
> > +# @BlockMeasureInfo:
> > +#
> > +# Image size calculation information.  This structure describes the size
> > +# requirements for creating a new image.
> > +#
> > +# @required-bytes: Amount of space required for image creation.  This 
> > value is
> > +#  the host file size including sparse file regions.  A 
> > new 5
> > +#  GB raw file therefore has a required size of 5 GB, not 0
> > +#  bytes.
> 
> This should probably note that it's a conservative estimation (and I
> agree that it should be). It's nice to have it in the commit message but
> few people are going to run git blame on the QAPI documentation to find
> out the rest of its story. :-)

Will fix.

> > +#
> > +# @fully-allocated-bytes: Space required once data has been written to all
> > +# sectors
> > +#
> > +# Since: 2.10
> > +##
> > +{ 'struct': 'BlockMeasureInfo',
> > +  'data': {'required-bytes': 'int', 'fully-allocated-bytes': 'int'} }
> > +
> > +##
> >  # @query-block:
> >  #
> >  # Get a list of BlockInfo for all virtual block devices.
> > diff --git a/include/block/block.h b/include/block/block.h
> > index 5149260..43c789f 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> > @@ -298,6 +298,10 @@ int bdrv_truncate(BdrvChild *child, int64_t offset);
> >  int64_t bdrv_nb_sectors(BlockDriverState *bs);
> >  int64_t bdrv_getlength(BlockDriverState *bs);
> >  int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
> > +void bdrv_measure(BlockDriver *drv, QemuOpts *opts,
> > +  BlockDriverState *in_bs,
> > +  BlockMeasureInfo *info,
> > +  Error **errp);
> >  void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
> >  void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
> >  int bdrv_commit(BlockDriverState *bs);
> > diff --git a/include/block/block_int.h b/include/block/block_int.h
> > index 6c699ac..45a7fbe 100644
> > --- a/include/block/block_int.h
> > +++ b/include/block/block_int.h
> > @@ -201,6 +201,8 @@ struct BlockDriver {
> >  int64_t (*bdrv_getlength)(BlockDriverState *bs);
> >  bool has_variable_length;
> >  int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs);
> > +void (*bdrv_measure)(QemuOpts *opts, BlockDriverState *in_bs,
> > + BlockMeasureInfo *info, Error **errp);
> >  
> >  int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
> >  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov);
> > diff --git a/block.c b/block.c
> > index cb57370..532a4d1 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -3260,6 +3260,39 @@ int64_t 
> > bdrv_get_allocated_file_size(BlockDriverState *bs)
> >  return -ENOTSUP;
> >  }
> >  
> > +/*
> > + * bdrv_measure:
> > + * @drv: Format driver
> > + * @opts: Creation options
> > + * @in_bs: Existing image containing data for new image (may be NULL)
> > + * @info: Result object
> > + * @errp: Error object
> > + *
> > + * Calculate file size required to create a new image.
> > + *
> > + * If @in_bs is given then space for allocated clusters and zero clusters
> > + * from that image are included in the calculation.  If @opts contains a
> > + * backing file that is shared by @in_bs then backing clusters are omitted
> > + * from the calculation.
> 
> This seems to run a bit contrary to the documentation of
> BlockMeasureInfo.required-bytes, and I don't fully understand it either.
> 
> What does "space for zero clusters" mean? Do zero clusters take space?
> Does it depend on the image format? (i.e. would they take space for raw
> but not for qcow2?)

Yes, zero clusters are an image format-specific feature.  A contrived
example:

  in_bs: qcow2 version=3 with zero clusters
  Output format: qcow2 version=2 (zero clusters not supported!)
  Image creation options: backing file given

We must take care to allocate clusters in the new image for zero
clusters in in_bs.  We cannot simply skip allocating those zero clusters
since there is a backing file and the contents of the backing file
must not be 

Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-15 Thread Zhong, Yang
Hello Paolo,

As for your said tcmalloc or jemalloc for optimization memory allocation, we 
also tried the malloc_trim() in glibc, which can get the same memory 
optimization.

diff --git a/util/rcu.c b/util/rcu.c
index 9adc5e4..3643e43 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -32,7 +32,7 @@
#include "qemu/atomic.h"
#include "qemu/thread.h"
#include "qemu/main-loop.h"
-
+#include "malloc.h"
/*
  * Global grace period counter.  Bit 0 is always one in rcu_gp_ctr.
  * Bits 1 and above are defined in synchronize_rcu.
@@ -271,6 +271,7 @@ static void *call_rcu_thread(void *opaque)
 n--;
 node->func(node);
 }
+malloc_trim(0);
 qemu_mutex_unlock_iothread();
 }
 abort();

The man info in malloc_trim  is only release free memory from the top of the 
heap, in fact, we found this function also release hole memory below top of 
heap.

Thanks!

ZhongYang


-Original Message-
From: Paolo Bonzini [mailto:pbonz...@redhat.com] 
Sent: Thursday, March 16, 2017 4:22 AM
To: Xu, Anthony 
Cc: Zhong, Yang ; Peng, Chao P ; 
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 
12252kB to 2752KB



- Original Message -
> From: "Anthony Xu" 
> To: "Paolo Bonzini" 
> Cc: "Yang Zhong" , "Chao P Peng" 
> , qemu-devel@nongnu.org
> Sent: Wednesday, March 15, 2017 8:05:48 PM
> Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size 
> from 12252kB to 2752KB
> 
> > The first unref is done after as->current_map is overwritten.
> > as->current_map is accessed under RCU, so it needs call_rcu.  It
> > balances the initial reference that is present since flatview_init.
> 
> Got it, thanks for explanation.
> 
> > > but it is not clear to me, is this a bug or by design? Is 
> > > flatview_destroy called
> > in current thread
> > > or RCU thread?
> > 
> > You cannot know, because there are also other callers of 
> > address_space_get_flatview.  Usually it's from the RCU thread.
> 
> If it is from current thread, do we need to check if the global lock 
> is acquired?
> As you mentioned flatview_unref may call memory_region_unref.

No, you don't need to check it.  Most functions (in this case,
memory_region_transaction_commit) can only be called under the global lock.
In fact, only address_space_read/write/ld*/st*, plus memory_region_find can be 
called without the global lock.

> In the comments of memory_region_finalize
> "   /* We know the region is not visible in any address space (it
>  * does not have a container and cannot be a root either because
>  * it has no references,
> "
> 
> We know the memory region is not a root region, and the memory region 
> has already been removed from address space even it has sub regions.
> memory_region_transaction_commit should be called when the memory 
> region is removed from address space.
> memory_region_transaction_commit seems not be needed in 
> memory_region_finalize.
> Let me know if you think otherwise.

Yes, you can replace memory_region_del_subregion in memory_region_finalize with 
special code that does

assert(!mr->enabled);
assert(subregion->container == mr);
subregion->container = NULL;
QTAILQ_REMOVE(>subregions, subregion, subregions_link);
memory_region_unref(subregion);

The last four lines are shared with memory_region_del_subregion, so please 
factor them in a new function, for example memory_region_del_subregion_internal.

> > > How about fall back to synchronize_rcu?
> > 
> > I'm afraid it would be too slow, but you can test.
> 
> It is not slow, the contention is not high. Yes we can test.

It's not a matter of contention.  It's a matter of how long an RCU critical 
section can be---not just at startup, but at any point during QEMU execution.

Have you tried tcmalloc or jemalloc?  They use the brk region less and 
sometimes are more aggressive in releasing mmap-ed memory areas.

> Please review below patch, MemoryRegion is protected by RCU.

Removing transaction begin/commit in memory_region_finalize makes little sense 
because memory_region_del_subregion calls those two functions again.  See above 
for an alternative.  Apart from this, the patch is at least correct, though I'm 
not sure it's a good idea (synchronize_rcu needs testing).  I'm not sure I 
understand the address_space_write change).

Paolo

> Thanks,
> Anthony
> 
> 
> 
> diff --git a/exec.c b/exec.c
> index a22f5a0..6631668 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2521,7 +2521,8 @@ static void mem_commit(MemoryListener *listener)
> 
>  atomic_rcu_set(>dispatch, next);
>  if (cur) {
> -call_rcu(cur, address_space_dispatch_free, rcu);
> +synchronize_rcu();
> +address_space_dispatch_free(cur);
>  }
>  }
> 
> @@ -2567,7 +2568,8 @@ void 

[Qemu-devel] [PULL 5/5] virtio-serial-bus: Delete timer from list before free it

2017-03-15 Thread Michael S. Tsirkin
From: zhanghailiang 

Signed-off-by: zhanghailiang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Amit Shah 
---
 hw/char/virtio-serial-bus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index d544cd9..d797a67 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -724,6 +724,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
 }
 }
 g_free(s->post_load->connected);
+timer_del(s->post_load->timer);
 timer_free(s->post_load->timer);
 g_free(s->post_load);
 s->post_load = NULL;
-- 
MST




[Qemu-devel] [PULL 4/5] hw/virtio: fix Power Management Control Register for PCI Express virtio devices

2017-03-15 Thread Michael S. Tsirkin
From: Marcel Apfelbaum 

Make Power Management State flag writable to conform
with the PCI Express spec.

Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-pci.h |  4 
 include/hw/compat.h|  4 
 include/hw/pci/pcie.h  |  2 ++
 hw/virtio/virtio-pci.c | 11 +++
 4 files changed, 21 insertions(+)

diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 9b5dd5a..b095dfc 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -75,6 +75,7 @@ enum {
 VIRTIO_PCI_FLAG_ATS_BIT,
 VIRTIO_PCI_FLAG_INIT_DEVERR_BIT,
 VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT,
+VIRTIO_PCI_FLAG_INIT_PM_BIT,
 };
 
 /* Need to activate work-arounds for buggy guests at vmstate load. */
@@ -108,6 +109,9 @@ enum {
 /* Init Link Control register */
 #define VIRTIO_PCI_FLAG_INIT_LNKCTL (1 << VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT)
 
+/* Init Power Management */
+#define VIRTIO_PCI_FLAG_INIT_PM (1 << VIRTIO_PCI_FLAG_INIT_PM_BIT)
+
 typedef struct {
 MSIMessage msg;
 int virq;
diff --git a/include/hw/compat.h b/include/hw/compat.h
index 0931aa5..90606f9 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -30,6 +30,10 @@
 .driver   = "virtio-pci",\
 .property = "x-pcie-lnkctl-init",\
 .value= "off",\
+},{\
+.driver   = "virtio-pci",\
+.property = "x-pcie-pm-init",\
+.value= "off",\
 },
 
 #define HW_COMPAT_2_7 \
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 11c6247..3d8f24b 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -63,6 +63,8 @@ typedef enum {
 struct PCIExpressDevice {
 /* Offset of express capability in config space */
 uint8_t exp_cap;
+/* Offset of Power Management capability in config space */
+uint8_t pm_cap;
 
 /* SLOT */
 bool hpev_notified; /* Logical AND of conditions for hot plug event.
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 300aa4a..f9b7244 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1812,6 +1812,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 
 pos = pci_add_capability(pci_dev, PCI_CAP_ID_PM, 0, PCI_PM_SIZEOF);
 assert(pos > 0);
+pci_dev->exp.pm_cap = pos;
 
 /*
  * Indicates that this function complies with revision 1.2 of the
@@ -1829,6 +1830,12 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 pcie_cap_lnkctl_init(pci_dev);
 }
 
+if (proxy->flags & VIRTIO_PCI_FLAG_INIT_PM) {
+/* Init Power Management Control Register */
+pci_set_word(pci_dev->wmask + pos + PCI_PM_CTRL,
+ PCI_PM_CTRL_STATE_MASK);
+}
+
 if (proxy->flags & VIRTIO_PCI_FLAG_ATS) {
 pcie_ats_init(pci_dev, 256);
 }
@@ -1877,6 +1884,8 @@ static void virtio_pci_reset(DeviceState *qdev)
 if (pci_is_express(dev)) {
 pcie_cap_deverr_reset(dev);
 pcie_cap_lnkctl_reset(dev);
+
+pci_set_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL, 0);
 }
 }
 
@@ -1902,6 +1911,8 @@ static Property virtio_pci_properties[] = {
 VIRTIO_PCI_FLAG_INIT_DEVERR_BIT, true),
 DEFINE_PROP_BIT("x-pcie-lnkctl-init", VirtIOPCIProxy, flags,
 VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
+DEFINE_PROP_BIT("x-pcie-pm-init", VirtIOPCIProxy, flags,
+VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
MST




[Qemu-devel] [PULL 3/5] hw/virtio: fix Link Control Register for PCI Express virtio devices

2017-03-15 Thread Michael S. Tsirkin
From: Marcel Apfelbaum 

Make several Link Control Register flags writable to conform
with the PCI Express spec.

Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-pci.h |  4 
 include/hw/compat.h|  4 
 include/hw/pci/pcie.h  |  3 +++
 hw/pci/pcie.c  | 14 ++
 hw/virtio/virtio-pci.c |  8 
 5 files changed, 33 insertions(+)

diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 120661d..9b5dd5a 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -74,6 +74,7 @@ enum {
 VIRTIO_PCI_FLAG_PAGE_PER_VQ_BIT,
 VIRTIO_PCI_FLAG_ATS_BIT,
 VIRTIO_PCI_FLAG_INIT_DEVERR_BIT,
+VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT,
 };
 
 /* Need to activate work-arounds for buggy guests at vmstate load. */
@@ -104,6 +105,9 @@ enum {
 /* Init error enabling flags */
 #define VIRTIO_PCI_FLAG_INIT_DEVERR (1 << VIRTIO_PCI_FLAG_INIT_DEVERR_BIT)
 
+/* Init Link Control register */
+#define VIRTIO_PCI_FLAG_INIT_LNKCTL (1 << VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT)
+
 typedef struct {
 MSIMessage msg;
 int virq;
diff --git a/include/hw/compat.h b/include/hw/compat.h
index c98776a..0931aa5 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -26,6 +26,10 @@
 .driver   = "virtio-pci",\
 .property = "x-pcie-deverr-init",\
 .value= "off",\
+},{\
+.driver   = "virtio-pci",\
+.property = "x-pcie-lnkctl-init",\
+.value= "off",\
 },
 
 #define HW_COMPAT_2_7 \
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 163c519..11c6247 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -96,6 +96,9 @@ uint8_t pcie_cap_flags_get_vector(PCIDevice *dev);
 void pcie_cap_deverr_init(PCIDevice *dev);
 void pcie_cap_deverr_reset(PCIDevice *dev);
 
+void pcie_cap_lnkctl_init(PCIDevice *dev);
+void pcie_cap_lnkctl_reset(PCIDevice *dev);
+
 void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot);
 void pcie_cap_slot_reset(PCIDevice *dev);
 void pcie_cap_slot_write_config(PCIDevice *dev,
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 82a8902..18e634f 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -223,6 +223,20 @@ void pcie_cap_deverr_reset(PCIDevice *dev)
  PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE);
 }
 
+void pcie_cap_lnkctl_init(PCIDevice *dev)
+{
+uint32_t pos = dev->exp.exp_cap;
+pci_long_test_and_set_mask(dev->wmask + pos + PCI_EXP_LNKCTL,
+   PCI_EXP_LNKCTL_CCC | PCI_EXP_LNKCTL_ES);
+}
+
+void pcie_cap_lnkctl_reset(PCIDevice *dev)
+{
+uint8_t *lnkctl = dev->config + dev->exp.exp_cap + PCI_EXP_LNKCTL;
+pci_long_test_and_clear_mask(lnkctl,
+ PCI_EXP_LNKCTL_CCC | PCI_EXP_LNKCTL_ES);
+}
+
 static void hotplug_event_update_event_status(PCIDevice *dev)
 {
 uint32_t pos = dev->exp.exp_cap;
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index f6de5ee..300aa4a 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1824,6 +1824,11 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 pcie_cap_deverr_init(pci_dev);
 }
 
+if (proxy->flags & VIRTIO_PCI_FLAG_INIT_LNKCTL) {
+/* Init Link Control Register */
+pcie_cap_lnkctl_init(pci_dev);
+}
+
 if (proxy->flags & VIRTIO_PCI_FLAG_ATS) {
 pcie_ats_init(pci_dev, 256);
 }
@@ -1871,6 +1876,7 @@ static void virtio_pci_reset(DeviceState *qdev)
 
 if (pci_is_express(dev)) {
 pcie_cap_deverr_reset(dev);
+pcie_cap_lnkctl_reset(dev);
 }
 }
 
@@ -1894,6 +1900,8 @@ static Property virtio_pci_properties[] = {
 VIRTIO_PCI_FLAG_ATS_BIT, false),
 DEFINE_PROP_BIT("x-pcie-deverr-init", VirtIOPCIProxy, flags,
 VIRTIO_PCI_FLAG_INIT_DEVERR_BIT, true),
+DEFINE_PROP_BIT("x-pcie-lnkctl-init", VirtIOPCIProxy, flags,
+VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
MST




[Qemu-devel] [PULL 1/5] hw/pcie: fix Extended Configuration Space for devices with no Extended Capabilities

2017-03-15 Thread Michael S. Tsirkin
From: Marcel Apfelbaum 

Absence of any Extended Capabilities is required to be
indicated by an Extended Capability header with a Capability ID of
h, a Capability Version of 0h, and a Next Capability Offset of 000h.

Instead of inserting a 'NULL' capability is simpler to mark the start
of the Extended Configuration Space as read-only to achieve the same
behaviour.

Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/compat.h  | 4 
 include/hw/pci/pci.h | 2 ++
 hw/pci/pci.c | 2 ++
 hw/pci/pcie.c| 6 ++
 4 files changed, 14 insertions(+)

diff --git a/include/hw/compat.h b/include/hw/compat.h
index b7db438..ce3bfe3 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -18,6 +18,10 @@
 .driver   = "pci-bridge",\
 .property = "shpc",\
 .value= "on",\
+},{\
+.driver   = TYPE_PCI_DEVICE,\
+.property = "x-pcie-extcap-init",\
+.value= "off",\
 },
 
 #define HW_COMPAT_2_7 \
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 713ede0..a37a2d5 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -183,6 +183,8 @@ enum {
 /* Link active status in endpoint capability is always set */
 #define QEMU_PCIE_LNKSTA_DLLLA_BITNR 8
 QEMU_PCIE_LNKSTA_DLLLA = (1 << QEMU_PCIE_LNKSTA_DLLLA_BITNR),
+#define QEMU_PCIE_EXTCAP_INIT_BITNR 9
+QEMU_PCIE_EXTCAP_INIT = (1 << QEMU_PCIE_EXTCAP_INIT_BITNR),
 };
 
 #define TYPE_PCI_DEVICE "pci-device"
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index ad46390..e6b08e1 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -64,6 +64,8 @@ static Property pci_props[] = {
 QEMU_PCI_CAP_SERR_BITNR, true),
 DEFINE_PROP_BIT("x-pcie-lnksta-dllla", PCIDevice, cap_present,
 QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
+DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
+QEMU_PCIE_EXTCAP_INIT_BITNR, true),
 DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index fc54bfd..82a8902 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -109,6 +109,12 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t 
type, uint8_t port)
  PCI_EXP_DEVCAP2_EFF | PCI_EXP_DEVCAP2_EETLPP);
 
 pci_set_word(dev->wmask + pos + PCI_EXP_DEVCTL2, PCI_EXP_DEVCTL2_EETLPPB);
+
+if (dev->cap_present & QEMU_PCIE_EXTCAP_INIT) {
+/* read-only to behave like a 'NULL' Extended Capability Header */
+pci_set_long(dev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
+}
+
 return pos;
 }
 
-- 
MST




[Qemu-devel] [PULL 2/5] hw/virtio: fix error enabling flags in Device Control register

2017-03-15 Thread Michael S. Tsirkin
From: Marcel Apfelbaum 

When the virtio devices are PCI Express, make error-enabling flags
writable to respect the PCIe spec.

Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-pci.h |  4 
 include/hw/compat.h|  4 
 hw/virtio/virtio-pci.c | 12 
 dtc|  2 +-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index d00064c..120661d 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -73,6 +73,7 @@ enum {
 VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT,
 VIRTIO_PCI_FLAG_PAGE_PER_VQ_BIT,
 VIRTIO_PCI_FLAG_ATS_BIT,
+VIRTIO_PCI_FLAG_INIT_DEVERR_BIT,
 };
 
 /* Need to activate work-arounds for buggy guests at vmstate load. */
@@ -100,6 +101,9 @@ enum {
 /* address space translation service */
 #define VIRTIO_PCI_FLAG_ATS (1 << VIRTIO_PCI_FLAG_ATS_BIT)
 
+/* Init error enabling flags */
+#define VIRTIO_PCI_FLAG_INIT_DEVERR (1 << VIRTIO_PCI_FLAG_INIT_DEVERR_BIT)
+
 typedef struct {
 MSIMessage msg;
 int virq;
diff --git a/include/hw/compat.h b/include/hw/compat.h
index ce3bfe3..c98776a 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -22,6 +22,10 @@
 .driver   = TYPE_PCI_DEVICE,\
 .property = "x-pcie-extcap-init",\
 .value= "off",\
+},{\
+.driver   = "virtio-pci",\
+.property = "x-pcie-deverr-init",\
+.value= "off",\
 },
 
 #define HW_COMPAT_2_7 \
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 69cc471..f6de5ee 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1819,6 +1819,11 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
  */
 pci_set_word(pci_dev->config + pos + PCI_PM_PMC, 0x3);
 
+if (proxy->flags & VIRTIO_PCI_FLAG_INIT_DEVERR) {
+/* Init error enabling flags */
+pcie_cap_deverr_init(pci_dev);
+}
+
 if (proxy->flags & VIRTIO_PCI_FLAG_ATS) {
 pcie_ats_init(pci_dev, 256);
 }
@@ -1849,6 +1854,7 @@ static void virtio_pci_reset(DeviceState *qdev)
 {
 VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
 VirtioBusState *bus = VIRTIO_BUS(>bus);
+PCIDevice *dev = PCI_DEVICE(qdev);
 int i;
 
 virtio_pci_stop_ioeventfd(proxy);
@@ -1862,6 +1868,10 @@ static void virtio_pci_reset(DeviceState *qdev)
 proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
 proxy->vqs[i].used[0] = proxy->vqs[i].used[1] = 0;
 }
+
+if (pci_is_express(dev)) {
+pcie_cap_deverr_reset(dev);
+}
 }
 
 static Property virtio_pci_properties[] = {
@@ -1882,6 +1892,8 @@ static Property virtio_pci_properties[] = {
  ignore_backend_features, false),
 DEFINE_PROP_BIT("ats", VirtIOPCIProxy, flags,
 VIRTIO_PCI_FLAG_ATS_BIT, false),
+DEFINE_PROP_BIT("x-pcie-deverr-init", VirtIOPCIProxy, flags,
+VIRTIO_PCI_FLAG_INIT_DEVERR_BIT, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/dtc b/dtc
index 558cd81..65cc4d2 16
--- a/dtc
+++ b/dtc
@@ -1 +1 @@
-Subproject commit 558cd81bdd432769b59bff01240c44f82cfb1a9d
+Subproject commit 65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf
-- 
MST




[Qemu-devel] [PULL 0/5] virtio, pci: fixes

2017-03-15 Thread Michael S. Tsirkin
The following changes since commit 1883ff34b540daacae948f493b0ba525edf5f642:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
(2017-03-15 18:44:05 +)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to bdf4c4ec53f293ea1baa7ce7c31fe0301887b513:

  virtio-serial-bus: Delete timer from list before free it (2017-03-16 01:46:42 
+0200)


virtio, pci: fixes

More fixes missed in the previous pull request.

Signed-off-by: Michael S. Tsirkin 


Marcel Apfelbaum (4):
  hw/pcie: fix Extended Configuration Space for devices with no Extended 
Capabilities
  hw/virtio: fix error enabling flags in Device Control register
  hw/virtio: fix Link Control Register for PCI Express virtio devices
  hw/virtio: fix Power Management Control Register for PCI Express virtio 
devices

zhanghailiang (1):
  virtio-serial-bus: Delete timer from list before free it

 hw/virtio/virtio-pci.h  | 12 
 include/hw/compat.h | 16 
 include/hw/pci/pci.h|  2 ++
 include/hw/pci/pcie.h   |  5 +
 hw/char/virtio-serial-bus.c |  1 +
 hw/pci/pci.c|  2 ++
 hw/pci/pcie.c   | 20 
 hw/virtio/virtio-pci.c  | 31 +++
 dtc |  2 +-
 9 files changed, 90 insertions(+), 1 deletion(-)




Re: [Qemu-devel] [PATCH for-2.9] mirror: Fix backwards mirror_yield parameters

2017-03-15 Thread Stefan Hajnoczi
On Wed, Mar 15, 2017 at 10:59:24AM +, Daniel P. Berrange wrote:
> On Wed, Mar 15, 2017 at 10:26:44AM +, Daniel P. Berrange wrote:
> > On Wed, Mar 15, 2017 at 06:18:35PM +0800, Stefan Hajnoczi wrote:
> > > On Fri, Mar 10, 2017 at 02:49:22PM -0600, Eric Blake wrote:
> > > > And here's where I'm stuck: the makefiles are broken.  Touching
> > > > scripts/tracetool/format/h.py does NOT cause tracetool to be re-run by a
> > > > mere 'make'; I've had to resort to 'make -B block/trace.h-timestamp' to
> > > > get things to rebuild.  And this is in spite of the fact that h.py
> > > > _should_ be getting listed in $(tracetool-y) by trace/Makefile.objs, and
> > > > $(tracetool-y) is listed as a dependency of %/trace.h-timestamp in the
> > > > top-level Makefile.  I would appreciate anyone with advice or an idea on
> > > > how to patch Makefile to get the dependency working without me having to
> > > > manually kick it.
> > > 
> > > Also CCing Daniel Berrange.  He recently touched the tracing Makefiles
> > > and may have ideas.
> > 
> > I've been looking at this and I'm damned if I understand what's broken.
> > All the required dependancies look to be expressed in the Makefile
> > 
> >   %/trace.h: %/trace.h-timestamp
> >   %/trace.h-timestamp: $(SRC_PATH)/%/trace-events $(tracetool-y)
> > 
> > and $(tracetool-y) expands to the list of source files
> > 
> >   tracetool-y = $(SRC_PATH)/scripts/tracetool.py
> >   tracetool-y += $(shell find $(SRC_PATH)/scripts/tracetool -name "*.py")
> > 
> > 
> > If I do 'touch hw/net/trace-events', then hw/net/trace.h gets rebuilt,
> > but if do 'touch scripts/tracetool.py' it doesn't get built. So somehow
> > make seems to be dropping the $(tracetool-y) deps despite being listed
> > against the %/trace-h-timestamp file, and despite earlier deps on
> > trace-events being honoured
> 
> Oh this is a fun one.  While the $(tracetool-y) variable *is* defined at
> the time the build rules execute, it is *not* defined at the time make
> evaluates dependancies, so it expanded to be empty ! I copied you on
> the easy fix

Nice find, thanks for the help!

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH] qemu-img: show help for invalid global options

2017-03-15 Thread Max Reitz
On 13.03.2017 06:11, Stefan Hajnoczi wrote:
> The qemu-img sub-command executes regardless of invalid global options:
> 
>   $ qemu-img --foo info test.img
>   qemu-img: unrecognized option '--foo'
>   image: test.img
>   ...
> 
> The unrecognized option warning may be missed by the user.  This can
> hide incorrect command-lines in scripts and confuse users.
> 
> This patch prints the help information and terminates instead of
> executing the sub-command.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  qemu-img.c | 1 +
>  1 file changed, 1 insertion(+)

Well, now you get blown away by a wall of text and spotting what went
wrong is actually not quite simple. Maybe we should follow the way of
the coreutils, that is:

qemu-img: unrecognized option '--foo'
Try 'qemu-img --help' for more information.

?

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC v2 5/8] qcow2: add bdrv_measure() support

2017-03-15 Thread Max Reitz
On 15.03.2017 10:29, Stefan Hajnoczi wrote:
> Use qcow2_calc_prealloc_size() to get the required file size.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
> TODO:
>  * Query block status and only count allocated clusters if in_bs != NULL
>  * Exclude backing file clusters if in_bs != NULL and -o backing_file=
>is given
> ---
>  block/qcow2.c | 53 +
>  1 file changed, 53 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 19be468..a4caf97 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2940,6 +2940,58 @@ static coroutine_fn int 
> qcow2_co_flush_to_os(BlockDriverState *bs)
>  return 0;
>  }
>  
> +static void qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
> +  BlockMeasureInfo *info, Error **errp)
> +{
> +Error *local_err = NULL;
> +uint64_t allocated_bytes = 0;
> +uint64_t prealloc_size;
> +uint64_t size;
> +uint64_t refcount_bits;
> +size_t cluster_size;
> +int version;
> +
> +size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
> +BDRV_SECTOR_SIZE);
> +
> +cluster_size = qcow2_opt_get_cluster_size_del(opts, _err);
> +if (local_err) {
> +goto err;
> +}
> +
> +version = qcow2_opt_get_version_del(opts, _err);
> +if (local_err) {
> +goto err;
> +}
> +
> +refcount_bits = qcow2_opt_get_refcount_bits_del(opts, version, 
> _err);
> +if (local_err) {
> +goto err;
> +}
> +
> +if (in_bs) {
> +int64_t ssize = bdrv_getlength(in_bs);
> +if (ssize < 0) {
> +error_setg_errno(errp, -ssize, "Unable to get image size");
> +return;
> +}
> +
> +size = ssize;
> +
> +/* TODO How many clusters are allocated modulo backing file in opts? 
> */
> +}
> +
> +prealloc_size = qcow2_calc_prealloc_size(size, cluster_size,
> + ctz32(refcount_bits));
> +info->required_bytes = prealloc_size - (align_offset(size, cluster_size) 
> -
> +allocated_bytes);

But what if @opts contains a preallocation option?

Max

> +info->fully_allocated_bytes = prealloc_size;
> +return;
> +
> +err:
> +error_propagate(errp, local_err);
> +}
> +
>  static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>  {
>  BDRVQcow2State *s = bs->opaque;
> @@ -3487,6 +3539,7 @@ BlockDriver bdrv_qcow2 = {
>  .bdrv_snapshot_delete   = qcow2_snapshot_delete,
>  .bdrv_snapshot_list = qcow2_snapshot_list,
>  .bdrv_snapshot_load_tmp = qcow2_snapshot_load_tmp,
> +.bdrv_measure   = qcow2_measure,
>  .bdrv_get_info  = qcow2_get_info,
>  .bdrv_get_specific_info = qcow2_get_specific_info,
>  
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC v2 7/8] qemu-iotests: support per-format golden output files

2017-03-15 Thread Max Reitz
On 15.03.2017 10:29, Stefan Hajnoczi wrote:
> Some tests produce format-dependent output.  Either the difference is
> filtered out and ignored, or the test case is format-specific so we
> don't need to worry about per-format output differences.
> 
> There is a third case: the test script is the same for all image formats
> and the format-dependent output is relevant.  An ugly workaround is to
> copy-paste the test into multiple per-format test cases.  This
> duplicates code and is not maintainable.
> 
> This patch allows test cases to add per-format golden output files so a
> single test case can work correctly when format-dependent output must be
> checked:
> 
>   123.out.qcow2
>   123.out.raw
>   123.out.vmdk
>   ...
> 
> This naming scheme is not composable with 123.out.nocache or 123.pc.out,
> two other scenarios where output files are split.  I don't think it
> matters since few test cases need these features.

Well, it would be pretty simple to implement by just always using
reference_x="$reference.$NEW_EXTENSION" instead of rebuilding the path
from screatch, but the number of combinations would blow up so much it
may be better to leave that out anyway.

(No R-b because I like to save my precious R-bs for non-RFCs)

Max

> Signed-off-by: Stefan Hajnoczi 
> ---
>  tests/qemu-iotests/check | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
> index 4b1c674..29553cf 100755
> --- a/tests/qemu-iotests/check
> +++ b/tests/qemu-iotests/check
> @@ -338,6 +338,11 @@ do
>  reference="$reference_machine"
>  fi
>  
> +reference_format="$source_iotests/$seq.out.$IMGFMT"
> +if [ -f "$reference_format" ]; then
> +reference="$reference_format"
> +fi
> +
>  if [ "$CACHEMODE" = "none" ]; then
>  [ -f "$source_iotests/$seq.out.nocache" ] && 
> reference="$source_iotests/$seq.out.nocache"
>  fi
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC v2 6/8] qemu-img: add measure subcommand

2017-03-15 Thread Max Reitz
On 15.03.2017 10:29, Stefan Hajnoczi wrote:
> The measure subcommand calculates the size required by a new image file.
> This can be used by users or management tools that need to allocate
> space on an LVM volume, SAN LUN, etc before creating or converting an
> image file.
> 
> Suggested-by: Maor Lipchuk 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  qemu-img.c   | 213 
> +++
>  qemu-img-cmds.hx |   9 +++

Looking forward to the documentation in the non-RFC version. O:-)

>  2 files changed, 222 insertions(+)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 98b836b..e8c6dcc 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -59,6 +59,7 @@ enum {
>  OPTION_PATTERN = 260,
>  OPTION_FLUSH_INTERVAL = 261,
>  OPTION_NO_DRAIN = 262,
> +OPTION_SIZE = 263,
>  };
>  
>  typedef enum OutputFormat {
> @@ -4287,6 +4288,218 @@ out:
>  return 0;
>  }
>  
> +static void dump_json_block_measure_info(BlockMeasureInfo *info)
> +{
> +QString *str;
> +QObject *obj;
> +Visitor *v = qobject_output_visitor_new();
> +
> +visit_type_BlockMeasureInfo(v, NULL, , _abort);
> +visit_complete(v, );
> +str = qobject_to_json_pretty(obj);
> +assert(str != NULL);
> +printf("%s\n", qstring_get_str(str));
> +qobject_decref(obj);
> +visit_free(v);
> +QDECREF(str);
> +}
> +
> +static int img_measure(int argc, char **argv)
> +{
> +static const struct option long_options[] = {
> +{"help", no_argument, 0, 'h'},
> +{"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
> +{"object", required_argument, 0, OPTION_OBJECT},
> +{"output", required_argument, 0, OPTION_OUTPUT},
> +{"size", required_argument, 0, OPTION_SIZE},
> +{0, 0, 0, 0}
> +};
> +OutputFormat output_format = OFORMAT_HUMAN;
> +BlockBackend *in_blk = NULL;
> +BlockDriver *drv;
> +const char *filename = NULL;
> +const char *fmt = NULL;
> +const char *out_fmt = "raw";
> +char *options = NULL;
> +char *snapshot_name = NULL;
> +QemuOpts *opts = NULL;
> +QemuOpts *object_opts = NULL;
> +QemuOpts *sn_opts = NULL;
> +QemuOptsList *create_opts = NULL;
> +bool image_opts = false;
> +uint64_t img_size = ~0ULL;
> +BlockMeasureInfo info;
> +Error *local_err = NULL;
> +int ret = 1;
> +int c;
> +
> +while ((c = getopt_long(argc, argv, "hf:O:o:l:",
> +long_options, NULL)) != -1) {
> +switch (c) {
> +case '?':
> +case 'h':
> +help();
> +break;
> +case 'f':
> +fmt = optarg;
> +break;
> +case 'O':
> +out_fmt = optarg;
> +break;
> +case 'o':
> +if (!is_valid_option_list(optarg)) {
> +error_report("Invalid option list: %s", optarg);
> +goto out;
> +}
> +if (!options) {
> +options = g_strdup(optarg);
> +} else {
> +char *old_options = options;
> +options = g_strdup_printf("%s,%s", options, optarg);
> +g_free(old_options);
> +}
> +break;
> +case 'l':
> +if (strstart(optarg, SNAPSHOT_OPT_BASE, NULL)) {
> +sn_opts = qemu_opts_parse_noisily(_snapshot_opts,
> +  optarg, false);
> +if (!sn_opts) {
> +error_report("Failed in parsing snapshot param '%s'",
> + optarg);
> +goto out;
> +}
> +} else {
> +snapshot_name = optarg;
> +}
> +break;
> +case OPTION_OBJECT:
> +object_opts = qemu_opts_parse_noisily(_object_opts,
> +  optarg, true);
> +if (!object_opts) {
> +goto out;
> +}
> +break;
> +case OPTION_IMAGE_OPTS:
> +image_opts = true;
> +break;
> +case OPTION_OUTPUT:
> +if (!strcmp(optarg, "json")) {
> +output_format = OFORMAT_JSON;
> +} else if (!strcmp(optarg, "human")) {
> +output_format = OFORMAT_HUMAN;
> +} else {
> +error_report("--output must be used with human or json "
> + "as argument.");
> +goto out;
> +}
> +break;
> +case OPTION_SIZE:
> +{
> +int64_t sval;
> +
> +sval = cvtnum(optarg);
> +if (sval < 0) {
> +if (sval == -ERANGE) {
> +error_report("Image size must be less than 8 EiB!");
> +} else {
> +error_report("Invalid image size specified! You may 

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-03-15 Thread Huang, Kai

Thanks!

Thanks,
-Kai

On 3/14/2017 5:57 AM, Paolo Bonzini wrote:



On 13/03/2017 15:58, fangying wrote:

Hi, Huang Kai

After weeks of intensive testing, we think the problem is solved and
this issue can be closed.


Thanks for the update.  We got to the same conclusion.

Paolo





Re: [Qemu-devel] [PATCH v2] migration/block: Avoid invoking blk_drain too frequently

2017-03-15 Thread Fam Zheng
On Wed, 03/15 17:31, Dr. David Alan Gilbert wrote:
> * Fam Zheng (f...@redhat.com) wrote:
> > On Wed, 03/15 11:37, Lidong Chen wrote:
> > > Increase bmds->cur_dirty after submit io, so reduce the frequency
> > > involve into blk_drain, and improve the performance obviously
> > > when block migration.
> > > 
> > > The performance test result of this patch:
> > > 
> > > During the block dirty save phase, this patch improve guest os IOPS
> > > from 4.0K to 9.5K. and improve the migration speed from
> > > 505856 rsec/s to 855756 rsec/s.
> > > 
> > > Signed-off-by: Lidong Chen 
> > > ---
> > >  migration/block.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/migration/block.c b/migration/block.c
> > > index 6741228..7734ff7 100644
> > > --- a/migration/block.c
> > > +++ b/migration/block.c
> > > @@ -576,6 +576,9 @@ static int mig_save_device_dirty(QEMUFile *f, 
> > > BlkMigDevState *bmds,
> > >  }
> > >  
> > >  bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, sector, 
> > > nr_sectors);
> > > +sector += nr_sectors;
> > > +bmds->cur_dirty = sector;
> > > +
> > >  break;
> > >  }
> > >  sector += BDRV_SECTORS_PER_DIRTY_CHUNK;
> > > -- 
> > > 1.8.3.1
> > > 
> > 
> > Nice catch above all, thank you!
> > 
> > Reviewed-by: Fam Zheng 
> 
> Are you taking that via a block pull?

I can do that, but I'm not sure whether it should go to 2.9. This is a
performance improvement, which usually doesn't qualify as bug fixes. But this
also looks like a mistake in original code.

Fam



Re: [Qemu-devel] [PATCH v2 2/2] virtio-scsi: Fix acquire/release in dataplane handlers

2017-03-15 Thread Fam Zheng
On Tue, 03/14 23:36, Fam Zheng wrote:
> After the AioContext lock push down, there is a race between
> virtio_scsi_dataplane_start and those "assert(s->ctx &&
> s->dataplane_started)", because the latter doesn't isn't wrapped in

s/doesn't//

> aio_context_acquire.



Re: [Qemu-devel] [RFC v2 3/8] qcow2: extract preallocation calculation function

2017-03-15 Thread Max Reitz
On 15.03.2017 10:29, Stefan Hajnoczi wrote:
> Calculating the preallocated image size will be needed to implement
> .bdrv_measure().  Extract the code out into a separate function.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  block/qcow2.c | 134 
> +-
>  1 file changed, 76 insertions(+), 58 deletions(-)

So let's see whether it will be this or
http://lists.nongnu.org/archive/html/qemu-block/2017-03/msg00342.html
that lands first! ;-)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC v2 1/8] block: add bdrv_measure() API

2017-03-15 Thread Max Reitz
On 15.03.2017 10:29, Stefan Hajnoczi wrote:
> bdrv_measure() provides a conservative maximum for the size of a new
> image.  This information is handy if storage needs to be allocated (e.g.
> a SAN or an LVM volume) ahead of time.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  qapi/block-core.json  | 19 +++
>  include/block/block.h |  4 
>  include/block/block_int.h |  2 ++
>  block.c   | 33 +
>  4 files changed, 58 insertions(+)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 786b39e..673569d 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -463,6 +463,25 @@
> '*dirty-bitmaps': ['BlockDirtyInfo'] } }
>  
>  ##
> +# @BlockMeasureInfo:
> +#
> +# Image size calculation information.  This structure describes the size
> +# requirements for creating a new image.
> +#
> +# @required-bytes: Amount of space required for image creation.  This value 
> is
> +#  the host file size including sparse file regions.  A new 5
> +#  GB raw file therefore has a required size of 5 GB, not 0
> +#  bytes.

This should probably note that it's a conservative estimation (and I
agree that it should be). It's nice to have it in the commit message but
few people are going to run git blame on the QAPI documentation to find
out the rest of its story. :-)

> +#
> +# @fully-allocated-bytes: Space required once data has been written to all
> +# sectors
> +#
> +# Since: 2.10
> +##
> +{ 'struct': 'BlockMeasureInfo',
> +  'data': {'required-bytes': 'int', 'fully-allocated-bytes': 'int'} }
> +
> +##
>  # @query-block:
>  #
>  # Get a list of BlockInfo for all virtual block devices.
> diff --git a/include/block/block.h b/include/block/block.h
> index 5149260..43c789f 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -298,6 +298,10 @@ int bdrv_truncate(BdrvChild *child, int64_t offset);
>  int64_t bdrv_nb_sectors(BlockDriverState *bs);
>  int64_t bdrv_getlength(BlockDriverState *bs);
>  int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
> +void bdrv_measure(BlockDriver *drv, QemuOpts *opts,
> +  BlockDriverState *in_bs,
> +  BlockMeasureInfo *info,
> +  Error **errp);
>  void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
>  void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
>  int bdrv_commit(BlockDriverState *bs);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 6c699ac..45a7fbe 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -201,6 +201,8 @@ struct BlockDriver {
>  int64_t (*bdrv_getlength)(BlockDriverState *bs);
>  bool has_variable_length;
>  int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs);
> +void (*bdrv_measure)(QemuOpts *opts, BlockDriverState *in_bs,
> + BlockMeasureInfo *info, Error **errp);
>  
>  int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
>  uint64_t offset, uint64_t bytes, QEMUIOVector *qiov);
> diff --git a/block.c b/block.c
> index cb57370..532a4d1 100644
> --- a/block.c
> +++ b/block.c
> @@ -3260,6 +3260,39 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState 
> *bs)
>  return -ENOTSUP;
>  }
>  
> +/*
> + * bdrv_measure:
> + * @drv: Format driver
> + * @opts: Creation options
> + * @in_bs: Existing image containing data for new image (may be NULL)
> + * @info: Result object
> + * @errp: Error object
> + *
> + * Calculate file size required to create a new image.
> + *
> + * If @in_bs is given then space for allocated clusters and zero clusters
> + * from that image are included in the calculation.  If @opts contains a
> + * backing file that is shared by @in_bs then backing clusters are omitted
> + * from the calculation.

This seems to run a bit contrary to the documentation of
BlockMeasureInfo.required-bytes, and I don't fully understand it either.

What does "space for zero clusters" mean? Do zero clusters take space?
Does it depend on the image format? (i.e. would they take space for raw
but not for qcow2?)

And is space for unallocated clusters included or not? Do unallocated
clusters without a backing image count as zero clusters?

If that space is not included, then it would run contrary to the QAPI
documentation which states that it should be included.

Finally, how are you supposed to check whether the backing file in @opts
is shared by @in_bs?

> + *
> + * If @in_bs is NULL then the calculation includes no allocated clusters
> + * unless a preallocation option is given in @opts.

But the BlockMeasureInfo.required-bytes documentation states that a new
5 GB raw image should still report 5 GB of required space.

Max

> + *
> + * Note that @in_bs may use a different BlockDriver from @drv.
> + */
> +void bdrv_measure(BlockDriver *drv, 

[Qemu-devel] [PULL for-rc1 3/3] ide: ahci: call cleanup function in ahci unit

2017-03-15 Thread John Snow
From: Li Qiang 

This can avoid memory leak when hotunplug the ahci device.

Signed-off-by: Li Qiang 
Message-id: 1488449293-80280-4-git-send-email-liqiang...@360.cn
Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 6a17acf..f60826d 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1485,6 +1485,18 @@ void ahci_realize(AHCIState *s, DeviceState *qdev, 
AddressSpace *as, int ports)
 
 void ahci_uninit(AHCIState *s)
 {
+int i, j;
+
+for (i = 0; i < s->ports; i++) {
+AHCIDevice *ad = >dev[i];
+
+for (j = 0; j < 2; j++) {
+IDEState *s = >port.ifs[j];
+
+ide_exit(s);
+}
+}
+
 g_free(s->dev);
 }
 
-- 
2.9.3




[Qemu-devel] [PULL for-rc1 1/3] ide: qdev: register ide bus unrealize function

2017-03-15 Thread John Snow
From: Li Qiang 

we have an idebus unrealize function, but it was being
registered as the unrealize function for the IDE Device,
so it was not getting invoked on device teardown because
nothing is "unrealizing" the IDE devices themselves.

Suggested-by: John Snow 
Signed-off-by: Li Qiang 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 1488449293-80280-2-git-send-email-liqiang...@360.cn
Signed-off-by: John Snow 
---
 hw/ide/qdev.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 4383cd1..299e592 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -31,7 +31,7 @@
 /* - */
 
 static char *idebus_get_fw_dev_path(DeviceState *dev);
-static void idebus_unrealize(DeviceState *qdev, Error **errp);
+static void idebus_unrealize(BusState *qdev, Error **errp);
 
 static Property ide_props[] = {
 DEFINE_PROP_UINT32("unit", IDEDevice, unit, -1),
@@ -43,14 +43,15 @@ static void ide_bus_class_init(ObjectClass *klass, void 
*data)
 BusClass *k = BUS_CLASS(klass);
 
 k->get_fw_dev_path = idebus_get_fw_dev_path;
+k->unrealize = idebus_unrealize;
 }
 
-static void idebus_unrealize(DeviceState *qdev, Error **errp)
+static void idebus_unrealize(BusState *bus, Error **errp)
 {
-IDEBus *bus = DO_UPCAST(IDEBus, qbus, qdev->parent_bus);
+IDEBus *ibus = IDE_BUS(bus);
 
-if (bus->vmstate) {
-qemu_del_vm_change_state_handler(bus->vmstate);
+if (ibus->vmstate) {
+qemu_del_vm_change_state_handler(ibus->vmstate);
 }
 }
 
@@ -370,7 +371,6 @@ static void ide_device_class_init(ObjectClass *klass, void 
*data)
 k->init = ide_qdev_init;
 set_bit(DEVICE_CATEGORY_STORAGE, k->categories);
 k->bus_type = TYPE_IDE_BUS;
-k->unrealize = idebus_unrealize;
 k->props = ide_props;
 }
 
-- 
2.9.3




[Qemu-devel] [PULL for-rc1 2/3] ide: core: add cleanup function

2017-03-15 Thread John Snow
From: Li Qiang 

As the pci ahci can be hotplug and unplug, in the ahci unrealize
function it should free all the resource once allocated in the
realized function. This patch add ide_exit to free the resource.

Signed-off-by: Li Qiang 
Message-id: 1488449293-80280-3-git-send-email-liqiang...@360.cn
Signed-off-by: John Snow 
---
 hw/ide/core.c | 8 
 include/hw/ide/internal.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index db509b3..0b48b64 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2603,6 +2603,14 @@ void ide_init2(IDEBus *bus, qemu_irq irq)
 bus->dma = _dma_nop;
 }
 
+void ide_exit(IDEState *s)
+{
+timer_del(s->sector_write_timer);
+timer_free(s->sector_write_timer);
+qemu_vfree(s->smart_selftest_data);
+qemu_vfree(s->io_buffer);
+}
+
 static const MemoryRegionPortio ide_portio_list[] = {
 { 0, 8, 1, .read = ide_ioport_read, .write = ide_ioport_write },
 { 0, 1, 2, .read = ide_data_readw, .write = ide_data_writew },
diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
index 88dc118..482a951 100644
--- a/include/hw/ide/internal.h
+++ b/include/hw/ide/internal.h
@@ -607,6 +607,7 @@ int ide_init_drive(IDEState *s, BlockBackend *blk, 
IDEDriveKind kind,
uint32_t cylinders, uint32_t heads, uint32_t secs,
int chs_trans);
 void ide_init2(IDEBus *bus, qemu_irq irq);
+void ide_exit(IDEState *s);
 void ide_init_ioport(IDEBus *bus, ISADevice *isa, int iobase, int iobase2);
 void ide_register_restart_cb(IDEBus *bus);
 
-- 
2.9.3




[Qemu-devel] [PULL for-rc1 0/3] Ide patches

2017-03-15 Thread John Snow
The following changes since commit 1883ff34b540daacae948f493b0ba525edf5f642:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
(2017-03-15 18:44:05 +)

are available in the git repository at:

  https://github.com/jnsnow/qemu.git tags/ide-pull-request

for you to fetch changes up to d68f0f778e7f4fbd674627274267f269e40f0b04:

  ide: ahci: call cleanup function in ahci unit (2017-03-15 20:50:14 -0400)





Li Qiang (3):
  ide: qdev: register ide bus unrealize function
  ide: core: add cleanup function
  ide: ahci: call cleanup function in ahci unit

 hw/ide/ahci.c | 12 
 hw/ide/core.c |  8 
 hw/ide/qdev.c | 12 ++--
 include/hw/ide/internal.h |  1 +
 4 files changed, 27 insertions(+), 6 deletions(-)

-- 
2.9.3




[Qemu-devel] [RFC patch 0/3] block: pause block jobs for bdrv_drain_begin/end

2017-03-15 Thread John Snow
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1367369#c8

It's possible to wedge QEMU if the guest tries to reset a virtio-pci
device as QEMU is also using the drive for a blockjob. This patchset
aims to allow us to safely pause/resume jobs attached to individual
nodes in a manner similar to how bdrv_drain_all_begin/end do.

Kevin suggested a DevOps approach which I've approximated here, but
I've got a number of questions before I push the point any further.

John Snow (3):
  blockjob: add block_job_start_shim
  block-backend: add drained_begin / drained_end ops
  blockjob: add devops to blockjob backends

 block/block-backend.c  | 25 ++
 blockjob.c | 60 +-
 include/sysemu/block-backend.h |  8 ++
 3 files changed, 81 insertions(+), 12 deletions(-)

-- 
2.9.3




[Qemu-devel] [RFC patch 1/3] blockjob: add block_job_start_shim

2017-03-15 Thread John Snow
The purpose of this shim is to allow us to pause pre-started jobs.
The purpose of *that* is to allow us to buffer a pause request that
will be able to take effect before the job ever does any work, allowing
us to create jobs during a quiescent state (under which they will be
automatically paused), then resuming the jobs after the critical section
in any order, either:

(1) -block_job_start
-block_job_resume (via e.g. drained_end)

(2) -block_job_resume (via e.g. drained_end)
-block_job_start

The problem that requires a shim is the idea that a job must start in the
busy=true state only its first time-- all subsequent entries require busy
to be false, and the toggling of this state is otherwise handled during
existing pause and yield points.

The shim simply allows us to mandate that a job can "start," set busy to
true, then immediately pause only if necessary.

Signed-off-by: John Snow 
---
 blockjob.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 69126af..6d08ce5 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -250,16 +250,28 @@ static bool block_job_started(BlockJob *job)
 return job->co;
 }
 
+/**
+ * This code must run after the job's coroutine has been entered,
+ * but not before.
+ */
+static void coroutine_fn block_job_start_shim(void *opaque)
+{
+BlockJob *job = opaque;
+
+assert(job && job->driver && job->driver->start);
+block_job_pause_point(job);
+job->driver->start(job);
+}
+
 void block_job_start(BlockJob *job)
 {
 assert(job && !block_job_started(job) && job->paused &&
-   !job->busy && job->driver->start);
-job->co = qemu_coroutine_create(job->driver->start, job);
-if (--job->pause_count == 0) {
-job->paused = false;
-job->busy = true;
-qemu_coroutine_enter(job->co);
-}
+   job->driver && job->driver->start);
+job->co = qemu_coroutine_create(block_job_start_shim, job);
+job->pause_count--;
+job->busy = true;
+job->paused = false;
+qemu_coroutine_enter(job->co);
 }
 
 void block_job_ref(BlockJob *job)
-- 
2.9.3




[Qemu-devel] [RFC patch 2/3] block-backend: add drained_begin / drained_end ops

2017-03-15 Thread John Snow
Signed-off-by: John Snow 

---

RFC questions:

- Does the presence of blk->quiesce_counter relieve the burden of needing
  blk->public.io_limits_disabled? I could probably eliminate this counter
  entirely and just spy on the root node's quiescent state at key moments
  instead. I am confident I'm traipsing on delicate drain semantics.

- Should I treat the separation of a quisced BDS/BB as a drained_end request
  as an analogue to how blk_insert_bs (in this patch) handles this?

Signed-off-by: John Snow 
---
 block/block-backend.c  | 25 +
 include/sysemu/block-backend.h |  8 
 2 files changed, 33 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 5742c09..eb85e8b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -65,6 +65,8 @@ struct BlockBackend {
 bool allow_write_beyond_eof;
 
 NotifierList remove_bs_notifiers, insert_bs_notifiers;
+
+int quiesce_counter;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -559,6 +561,11 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, 
Error **errp)
 }
 bdrv_ref(bs);
 
+/* The new BDS may be quiescent, we should attempt to match */
+if (bs->quiesce_counter) {
+blk_root_drained_begin(blk->root);
+}
+
 notifier_list_notify(>insert_bs_notifiers, blk);
 if (blk->public.throttle_state) {
 throttle_timers_attach_aio_context(
@@ -705,6 +712,11 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps 
*ops,
 
 blk->dev_ops = ops;
 blk->dev_opaque = opaque;
+
+/* Are we currently quiesced? Should we enforce this right now? */
+if (blk->quiesce_counter && ops->drained_begin) {
+ops->drained_begin(opaque);
+}
 }
 
 /*
@@ -1870,9 +1882,15 @@ static void blk_root_drained_begin(BdrvChild *child)
 {
 BlockBackend *blk = child->opaque;
 
+blk->quiesce_counter++;
+
 /* Note that blk->root may not be accessible here yet if we are just
  * attaching to a BlockDriverState that is drained. Use child instead. */
 
+if (blk->dev_ops && blk->dev_ops->drained_begin) {
+blk->dev_ops->drained_begin(blk->dev_opaque);
+}
+
 if (blk->public.io_limits_disabled++ == 0) {
 throttle_group_restart_blk(blk);
 }
@@ -1882,6 +1900,13 @@ static void blk_root_drained_end(BdrvChild *child)
 {
 BlockBackend *blk = child->opaque;
 
+assert(blk->quiesce_counter);
+blk->quiesce_counter--;
+
 assert(blk->public.io_limits_disabled);
 --blk->public.io_limits_disabled;
+
+if (blk->dev_ops && blk->dev_ops->drained_end) {
+blk->dev_ops->drained_end(blk->dev_opaque);
+}
 }
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 096c17f..c6f4408 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -58,6 +58,14 @@ typedef struct BlockDevOps {
  * Runs when the size changed (e.g. monitor command block_resize)
  */
 void (*resize_cb)(void *opaque);
+/*
+ * Runs when the backend receives a drain request.
+ */
+void (*drained_begin)(void *opaque);
+/*
+ * Runs when the backend's drain request ends.
+ */
+void (*drained_end)(void *opaque);
 } BlockDevOps;
 
 /* This struct is embedded in (the private) BlockBackend struct and contains
-- 
2.9.3




[Qemu-devel] [RFC patch 3/3] blockjob: add devops to blockjob backends

2017-03-15 Thread John Snow
This lets us hook into drained_begin and drained_end requests from the
backend level, which is particularly useful for making sure that all
jobs associated with a particular node (whether the source or the target)
receive a drain request.

Suggested-by: Kevin Wolf 
Signed-off-by: John Snow 

--

RFC topics:

- BlkDevOps is traditionally only for Qdev devices, and a BlockJob is not
  currently a 'device'... Do we want to loosen this restriction, find another
  way to deliver callbacks to BlockJobs attached to BlkBackends, or do something
  crazy like make a BlockJob device object?

  struct JobDevice {
DeviceState parent_obj;
BlockJob *job;
  } ...??

- Not so sure about leaving the initial job refcount at 1, since we are giving
  away a handle to this job as dev_opaque. By incrementing it to 2, however,
  it's not clear whose responsibility it is to decrement it again.

Signed-off-by: John Snow 
---
 blockjob.c | 34 +-
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 6d08ce5..0542677 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -68,6 +68,23 @@ static const BdrvChildRole child_job = {
 .stay_at_node   = true,
 };
 
+static void block_job_drained_begin(void *opaque)
+{
+BlockJob *job = opaque;
+block_job_pause(job);
+}
+
+static void block_job_drained_end(void *opaque)
+{
+BlockJob *job = opaque;
+block_job_resume(job);
+}
+
+static const BlockDevOps block_job_dev_ops = {
+.drained_begin = block_job_drained_begin,
+.drained_end = block_job_drained_end,
+};
+
 BlockJob *block_job_next(BlockJob *job)
 {
 if (!job) {
@@ -205,11 +222,6 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 }
 
 job = g_malloc0(driver->instance_size);
-error_setg(>blocker, "block device is in use by block job: %s",
-   BlockJobType_lookup[driver->job_type]);
-block_job_add_bdrv(job, "main node", bs, 0, BLK_PERM_ALL, _abort);
-bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
-
 job->driver= driver;
 job->id= g_strdup(job_id);
 job->blk   = blk;
@@ -219,8 +231,20 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 job->paused= true;
 job->pause_count   = 1;
 job->refcnt= 1;
+/* RFC: job is now stored both in bs->job and blk->dev_opaque,
+ * but we cannot expect the backend to know how to put down the ref
+ * so it'd be up to this code here to increment it anyway, making it
+ * a bit of a useless exercise. It would be up to us to ensure that
+ * we destruct the blkbackend before putting down our last reference. */
+
+error_setg(>blocker, "block device is in use by block job: %s",
+   BlockJobType_lookup[driver->job_type]);
+block_job_add_bdrv(job, "main node", bs, 0, BLK_PERM_ALL, _abort);
 bs->job = job;
 
+blk_set_dev_ops(blk, _job_dev_ops, job);
+bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
+
 QLIST_INSERT_HEAD(_jobs, job, job_list);
 
 blk_add_aio_context_notifier(blk, block_job_attached_aio_context,
-- 
2.9.3




Re: [Qemu-devel] [PATCH] nbd-client: fix handling of hungup connections

2017-03-15 Thread Max Reitz
On 14.03.2017 12:11, Paolo Bonzini wrote:
> After the switch to reading replies in a coroutine, nothing is
> reentering pending receive coroutines if the connection hangs.
> Move nbd_recv_coroutines_enter_all to the reply read coroutine,
> which is the place where hangups are detected.  nbd_teardown_connection
> can simply wait for the reply read coroutine to detect the hangup
> and clean up after itself.
> 
> This wouldn't be enough though because nbd_receive_reply returns 0
> (rather than -EPIPE or similar) when reading from a hung connection.
> Fix the return value check in nbd_read_reply_entry.
> 
> This fixes qemu-iotests 083.
> 
> Reported-by: Max Reitz 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/nbd-client.c | 12 ++--
>  nbd/client.c   |  2 +-
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 0dc12c2..1e2952f 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -33,17 +33,15 @@
>  #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs))
>  #define INDEX_TO_HANDLE(bs, index)  ((index)  ^ ((uint64_t)(intptr_t)bs))
>  
> -static void nbd_recv_coroutines_enter_all(BlockDriverState *bs)
> +static void nbd_recv_coroutines_enter_all(NBDClientSession *s)
>  {
> -NBDClientSession *s = nbd_get_client_session(bs);
>  int i;
>  
>  for (i = 0; i < MAX_NBD_REQUESTS; i++) {
>  if (s->recv_coroutine[i]) {
> -qemu_coroutine_enter(s->recv_coroutine[i]);
> +aio_co_wake(s->recv_coroutine[i]);
>  }
>  }
> -BDRV_POLL_WHILE(bs, s->read_reply_co);
>  }
>  
>  static void nbd_teardown_connection(BlockDriverState *bs)
> @@ -58,7 +56,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>  qio_channel_shutdown(client->ioc,
>   QIO_CHANNEL_SHUTDOWN_BOTH,
>   NULL);
> -nbd_recv_coroutines_enter_all(bs);
> +BDRV_POLL_WHILE(bs, client->read_reply_co);
>  
>  nbd_client_detach_aio_context(bs);
>  object_unref(OBJECT(client->sioc));
> @@ -76,7 +74,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>  for (;;) {
>  assert(s->reply.handle == 0);
>  ret = nbd_receive_reply(s->ioc, >reply);
> -if (ret < 0) {
> +if (ret <= 0) {
>  break;
>  }
>  
> @@ -103,6 +101,8 @@ static coroutine_fn void nbd_read_reply_entry(void 
> *opaque)
>  aio_co_wake(s->recv_coroutine[i]);
>  qemu_coroutine_yield();
>  }
> +
> +nbd_recv_coroutines_enter_all(s);
>  s->read_reply_co = NULL;
>  }
>  
> diff --git a/nbd/client.c b/nbd/client.c
> index 5c9dee3..746e9a7 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -812,6 +812,6 @@ ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply 
> *reply)
>  LOG("invalid magic (got 0x%" PRIx32 ")", magic);
>  return -EINVAL;
>  }
> -return 0;
> +return sizeof(buf);

I'd (weakly) prefer for this function to instead return an error value
(e.g. -EPIPE as the commit message says) if read_sync() returned 0. This
is because a return value of 0 is often used to signify success and it's
always a bit weird if functions to not adhere to this convention. Just a
rather weak preference, though.

So:

Reviewed-by: Max Reitz 

And generously leaving it to you (whether) to apply the patch. O:-)

Max

>  }
>  
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] block: quiesce AioContext when detaching from it

2017-03-15 Thread Max Reitz
On 14.03.2017 12:11, Paolo Bonzini wrote:
> While it is true that bdrv_set_aio_context only works on a single
> BlockDriverState subtree (see commit message for 53ec73e, "block: Use
> bdrv_drain to replace uncessary bdrv_drain_all", 2015-07-07), it works
> at the AioContext level rather than the BlockDriverState level.
> 
> Therefore, it is also necessary to trigger pending bottom halves too,
> even if no requests are pending.
> 
> For NBD this ensures that the aio_co_schedule of a previous call to
> nbd_attach_aio_context is completed before detaching from the old
> AioContext; it fixes qemu-iotest 094.  Another similar bug happens
> when the VM is stopped and the virtio-blk dataplane irqfd is torn down.
> In this case it's possible that guest I/O gets stuck if notify_guest_bh
> was scheduled but doesn't run.
> 
> Calling aio_poll from another AioContext is safe if non-blocking; races
> such as the one mentioned in the commit message for c9d1a56 ("block:
> only call aio_poll on the current thread's AioContext", 2016-10-28)
> are a concern for blocking calls.
> 
> I considered other options, including:
> 
> - moving the bs->wakeup mechanism to AioContext, and letting the caller
> check.  This might work for virtio which has a clear place to wakeup
> (notify_place_bh) and check the condition (virtio_blk_data_plane_stop).
> For aio_co_schedule I couldn't find a clear place to check the condition.
> 
> - adding a dummy oneshot bottom half and waiting for it to trigger.
> This has the complication that bottom half list is LIFO for historical
> reasons.  There were performance issues caused by bottom half ordering
> in the past, so I decided against it for 2.9.
> 
> Fixes: 99723548561978da8ef44cf804fb7912698f5d88
> Reported-by: Max Reitz 
> Reported-by: Halil Pasic 
> Tested-by: Halil Pasic 
> Signed-off-by: Paolo Bonzini 
> ---
>  block.c | 7 +++
>  1 file changed, 7 insertions(+)

Thanks, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 06/30] trace: Fix parameter types in migration

2017-03-15 Thread Eric Blake
On 03/15/2017 02:55 PM, Eric Blake wrote:

>> Okay, my next version will insert an explicit cast in any caller that is
>> otherwise passing a __u64 (since we can't guarantee what type __u64
>> resolves to, and therefore what format string is appropriate for it).
> 
> Or maybe I will just omit those changes from my next version, in lieu of
> a gcc bug report.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80060 if you are interested

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] blockdev: fix bitmap clear undo

2017-03-15 Thread Eric Blake
On 03/15/2017 04:28 PM, John Snow wrote:
> Only undo the action if we actually prepared the action.
> 
> Signed-off-by: John Snow 
> ---
>  blockdev.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/blockdev.c b/blockdev.c
> index f1f49bd..c5b2c2c 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -2047,7 +2047,9 @@ static void 
> block_dirty_bitmap_clear_abort(BlkActionState *common)
>  BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>   common, common);
>  
> -bdrv_undo_clear_dirty_bitmap(state->bitmap, state->backup);
> +if (state->backup) {
> +bdrv_undo_clear_dirty_bitmap(state->bitmap, state->backup);
> +}
>  }
>  
>  static void block_dirty_bitmap_clear_commit(BlkActionState *common)
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 6/9] xen/9pfs: receive requests from the frontend

2017-03-15 Thread Stefano Stabellini
On Wed, 15 Mar 2017, Greg Kurz wrote:
> On Mon, 13 Mar 2017 16:55:57 -0700
> Stefano Stabellini  wrote:
> 
> > Upon receiving an event channel notification from the frontend, schedule
> > the bottom half. From the bottom half, read one request from the ring,
> > create a pdu and call pdu_submit to handle it.
> > 
> > For now, only handle one request per ring at a time.
> > 
> > Signed-off-by: Stefano Stabellini 
> > CC: anthony.per...@citrix.com
> > CC: jgr...@suse.com
> > CC: Aneesh Kumar K.V 
> > CC: Greg Kurz 
> > ---
> 
> Oops, one more remark I forgot in my the previous mail. See below.
> 
> >  hw/9pfs/xen-9p-backend.c | 47 
> > +++
> >  1 file changed, 47 insertions(+)
> > 
> > diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> > index 0e4a133..741dd31 100644
> > --- a/hw/9pfs/xen-9p-backend.c
> > +++ b/hw/9pfs/xen-9p-backend.c
> > @@ -94,12 +94,59 @@ static int xen_9pfs_init(struct XenDevice *xendev)
> >  return 0;
> >  }
> >  
> > +static int xen_9pfs_receive(struct Xen9pfsRing *ring)
> > +{
> > +struct xen_9pfs_header h;
> > +RING_IDX cons, prod, masked_prod, masked_cons;
> > +V9fsPDU *pdu;
> > +
> > +if (ring->inprogress) {
> > +return 0;
> > +}
> > +
> > +cons = ring->intf->out_cons;
> > +prod = ring->intf->out_prod;
> > +xen_rmb();
> > +
> > +if (xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < sizeof(h)) {
> > +return 0;
> > +}
> > +ring->inprogress = true;
> > +
> > +masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> > +masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> > +
> > +xen_9pfs_read_packet(ring->ring.out, masked_prod, _cons,
> > + XEN_9PFS_RING_SIZE, (uint8_t*) , sizeof(h));
> > +
> > +pdu = pdu_alloc(>priv->state);
> > +pdu->size = h.size;
> 
> 9P uses little-endian, so this should be:
> 
> pdu->size = le32_to_cpu(h.size);
> 
> > +pdu->id = h.id;
> > +pdu->tag = h.tag;
> 
> and:
> 
> pdu->tag = le16_to_cpu(h.tag);
> 
> > +ring->out_size = h.size;
> > +ring->out_cons = cons + h.size;
> 
> Same here.

OK, thanks. Xen doesn't support big endian today, but they don't hurt.


> > +
> > +qemu_co_queue_init(>complete);
> > +pdu_submit(pdu);
> > +
> > +return 0;
> > +}
> > +
> >  static void xen_9pfs_bh(void *opaque)
> >  {
> > +struct Xen9pfsRing *ring = opaque;
> > +xen_9pfs_receive(ring);
> >  }
> >  
> >  static void xen_9pfs_evtchn_event(void *opaque)
> >  {
> > +struct Xen9pfsRing *ring = opaque;
> > +evtchn_port_t port;
> > +
> > +port = xenevtchn_pending(ring->evtchndev);
> > +xenevtchn_unmask(ring->evtchndev, port);
> > +
> > +qemu_bh_schedule(ring->bh);
> >  }
> >  
> >  static int xen_9pfs_free(struct XenDevice *xendev)
> 
> 



Re: [Qemu-devel] [PATCH v2 8/9] xen/9pfs: send responses back to the frontend

2017-03-15 Thread Stefano Stabellini
On Wed, 15 Mar 2017, Greg Kurz wrote:
> On Mon, 13 Mar 2017 16:55:59 -0700
> Stefano Stabellini  wrote:
> 
> > Once a request is completed, xen_9pfs_push_and_notify gets called. In
> > xen_9pfs_push_and_notify, update the indexes (data has already been
> > copied to the sg by the common code) and send a notification to the
> > frontend.
> > 
> > Schedule the bottom-half to check if we already have any other requests
> > pending.
> > 
> > Signed-off-by: Stefano Stabellini 
> > CC: anthony.per...@citrix.com
> > CC: jgr...@suse.com
> > CC: Aneesh Kumar K.V 
> > CC: Greg Kurz 
> > ---
> >  hw/9pfs/xen-9p-backend.c | 16 
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> > index d72a749..fed369f 100644
> > --- a/hw/9pfs/xen-9p-backend.c
> > +++ b/hw/9pfs/xen-9p-backend.c
> > @@ -166,6 +166,22 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU *pdu,
> >  
> >  static void xen_9pfs_push_and_notify(V9fsPDU *pdu)
> >  {
> > +RING_IDX prod;
> > +struct Xen9pfsDev *priv = container_of(pdu->s, struct Xen9pfsDev, 
> > state);
> > +struct Xen9pfsRing *ring = >rings[pdu->tag % priv->num_rings];
> > +
> 
> Coding style for structured types.

Yep



> > +ring->intf->out_cons = ring->out_cons;
> > +xen_wmb();
> > +
> > +prod = ring->intf->in_prod;
> > +xen_rmb();
> > +ring->intf->in_prod = prod + pdu->size;
> > +xen_wmb();
> > +
> > +ring->inprogress = false;
> > +xenevtchn_notify(ring->evtchndev, ring->local_port);
> > +
> > +qemu_bh_schedule(ring->bh);
> >  }
> >  
> >  static const struct V9fsTransport xen_9p_transport = {
> 
> 



Re: [Qemu-devel] [PATCH v2 7/9] xen/9pfs: implement in/out_iov_from_pdu and vmarshal/vunmarshal

2017-03-15 Thread Stefano Stabellini
On Wed, 15 Mar 2017, Greg Kurz wrote:
> On Mon, 13 Mar 2017 16:55:58 -0700
> Stefano Stabellini  wrote:
> 
> > Implement xen_9pfs_init_in/out_iov_from_pdu and
> > xen_9pfs_pdu_vmarshal/vunmarshall by creating new sg pointing to the
> > data on the ring.
> > 
> > This is safe as we only handle one request per ring at any given time.
> > 
> > Signed-off-by: Stefano Stabellini 
> > CC: anthony.per...@citrix.com
> > CC: jgr...@suse.com
> > CC: Aneesh Kumar K.V 
> > CC: Greg Kurz 
> > ---
> >  hw/9pfs/xen-9p-backend.c | 91 
> > ++--
> >  1 file changed, 89 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> > index 741dd31..d72a749 100644
> > --- a/hw/9pfs/xen-9p-backend.c
> > +++ b/hw/9pfs/xen-9p-backend.c
> > @@ -48,12 +48,77 @@ typedef struct Xen9pfsDev {
> >  struct Xen9pfsRing *rings;
> >  } Xen9pfsDev;
> >  
> > +static void xen_9pfs_in_sg(struct Xen9pfsRing *ring,
> 
> Coding style for structured types.

OK


> > +   struct iovec *in_sg,
> > +   int *num,
> > +   uint32_t idx,
> > +   uint32_t size)
> > +{
> > +RING_IDX cons, prod, masked_prod, masked_cons;
> > +
> > +cons = ring->intf->in_cons;
> > +prod = ring->intf->in_prod;
> > +xen_rmb();
> > +masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> > +masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> > +
> > +if (masked_prod < masked_cons) {
> > +in_sg[0].iov_base = ring->ring.in + masked_prod;
> > +in_sg[0].iov_len = masked_cons - masked_prod;
> > +*num = 1;
> > +} else {
> > +in_sg[0].iov_base = ring->ring.in + masked_prod;
> > +in_sg[0].iov_len = XEN_9PFS_RING_SIZE - masked_prod;
> > +in_sg[1].iov_base = ring->ring.in;
> > +in_sg[1].iov_len = masked_cons;
> > +*num = 2;
> > +}
> > +}
> > +
> > +static void xen_9pfs_out_sg(struct Xen9pfsRing *ring,
> 
> Coding style for structured types.

OK

> > +struct iovec *out_sg,
> > +int *num,
> > +uint32_t idx)
> > +{
> > +RING_IDX cons, prod, masked_prod, masked_cons;
> > +
> > +cons = ring->intf->out_cons;
> > +prod = ring->intf->out_prod;
> > +xen_rmb();
> > +masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> > +masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> > +
> > +if (masked_cons < masked_prod) {
> > +out_sg[0].iov_base = ring->ring.out + masked_cons;
> > +out_sg[0].iov_len = ring->out_size;
> > +*num = 1;
> > +} else {
> > +if (ring->out_size > (XEN_9PFS_RING_SIZE - masked_cons)) {
> > +out_sg[0].iov_base = ring->ring.out + masked_cons;
> > +out_sg[0].iov_len = XEN_9PFS_RING_SIZE - masked_cons;
> > +out_sg[1].iov_base = ring->ring.out;
> > +out_sg[1].iov_len = ring->out_size - (XEN_9PFS_RING_SIZE - 
> > masked_cons);
> > +*num = 2;
> > +} else {
> > +out_sg[0].iov_base = ring->ring.out + masked_cons;
> > +out_sg[0].iov_len = ring->out_size;
> > +*num = 1;
> > +}
> > +}
> > +}
> > +
> >  static ssize_t xen_9pfs_pdu_vmarshal(V9fsPDU *pdu,
> >   size_t offset,
> >   const char *fmt,
> >   va_list ap)
> >  {
> > -return 0;
> > +struct Xen9pfsDev *xen_9pfs = container_of(pdu->s, struct Xen9pfsDev, 
> > state);
> 
> Coding style for structured types.

OK 


> > +struct iovec in_sg[2];
> > +int num;
> > +
> > +xen_9pfs_in_sg(_9pfs->rings[pdu->tag % xen_9pfs->num_rings],
> > +   in_sg, , pdu->idx, ROUND_UP(offset + 128, 512));
> > +return v9fs_iov_vmarshal(in_sg, num, offset, 0, fmt, ap);
> >  }
> >  
> >  static ssize_t xen_9pfs_pdu_vunmarshal(V9fsPDU *pdu,
> > @@ -61,13 +126,27 @@ static ssize_t xen_9pfs_pdu_vunmarshal(V9fsPDU *pdu,
> > const char *fmt,
> > va_list ap)
> >  {
> > -return 0;
> > +struct Xen9pfsDev *xen_9pfs = container_of(pdu->s, struct Xen9pfsDev, 
> > state);
> 
> Coding style for structured types.

OK


> > +struct iovec out_sg[2];
> > +int num;
> > +
> > +xen_9pfs_out_sg(_9pfs->rings[pdu->tag % xen_9pfs->num_rings],
> > +out_sg, , pdu->idx);
> > +return v9fs_iov_vunmarshal(out_sg, num, offset, 0, fmt, ap);
> >  }
> >  
> >  static void xen_9pfs_init_out_iov_from_pdu(V9fsPDU *pdu,
> > struct iovec **piov,
> > unsigned int *pniov)
> >  {
> > +struct Xen9pfsDev 

Re: [Qemu-devel] [PATCH v2 6/9] xen/9pfs: receive requests from the frontend

2017-03-15 Thread Stefano Stabellini
On Wed, 15 Mar 2017, Greg Kurz wrote:
> On Mon, 13 Mar 2017 16:55:57 -0700
> Stefano Stabellini  wrote:
> 
> > Upon receiving an event channel notification from the frontend, schedule
> > the bottom half. From the bottom half, read one request from the ring,
> > create a pdu and call pdu_submit to handle it.
> > 
> > For now, only handle one request per ring at a time.
> > 
> > Signed-off-by: Stefano Stabellini 
> > CC: anthony.per...@citrix.com
> > CC: jgr...@suse.com
> > CC: Aneesh Kumar K.V 
> > CC: Greg Kurz 
> > ---
> >  hw/9pfs/xen-9p-backend.c | 47 
> > +++
> >  1 file changed, 47 insertions(+)
> > 
> > diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> > index 0e4a133..741dd31 100644
> > --- a/hw/9pfs/xen-9p-backend.c
> > +++ b/hw/9pfs/xen-9p-backend.c
> > @@ -94,12 +94,59 @@ static int xen_9pfs_init(struct XenDevice *xendev)
> >  return 0;
> >  }
> >  
> > +static int xen_9pfs_receive(struct Xen9pfsRing *ring)
> 
> Coding style for structured types.

I'll fix


> > +{
> > +struct xen_9pfs_header h;
> > +RING_IDX cons, prod, masked_prod, masked_cons;
> > +V9fsPDU *pdu;
> > +
> > +if (ring->inprogress) {
> > +return 0;
> > +}
> > +
> > +cons = ring->intf->out_cons;
> > +prod = ring->intf->out_prod;
> > +xen_rmb();
> > +
> > +if (xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < sizeof(h)) {
> > +return 0;
> > +}
> > +ring->inprogress = true;
> > +
> > +masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> > +masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> > +
> > +xen_9pfs_read_packet(ring->ring.out, masked_prod, _cons,
> > + XEN_9PFS_RING_SIZE, (uint8_t*) , sizeof(h));
> > +
> > +pdu = pdu_alloc(>priv->state);
> 
> pdu_alloc() can theoretically return NULL. Maybe add a comment to
> indicate this won't happen because "for now [we] only handle one
> request per ring at a time".

OK


> > +pdu->size = h.size;
> > +pdu->id = h.id;
> > +pdu->tag = h.tag;
> > +ring->out_size = h.size;
> > +ring->out_cons = cons + h.size;
> > +
> > +qemu_co_queue_init(>complete);
> > +pdu_submit(pdu);
> > +
> > +return 0;
> > +}
> > +
> >  static void xen_9pfs_bh(void *opaque)
> >  {
> > +struct Xen9pfsRing *ring = opaque;
> 
> Coding style for structured types.

OK


> > +xen_9pfs_receive(ring);
> >  }
> >  
> >  static void xen_9pfs_evtchn_event(void *opaque)
> >  {
> > +struct Xen9pfsRing *ring = opaque;
> 
> Coding style for structured types.

OK


> > +evtchn_port_t port;
> > +
> > +port = xenevtchn_pending(ring->evtchndev);
> > +xenevtchn_unmask(ring->evtchndev, port);
> > +
> > +qemu_bh_schedule(ring->bh);
> >  }
> >  
> >  static int xen_9pfs_free(struct XenDevice *xendev)



Re: [Qemu-devel] [PATCH v2 5/9] xen/9pfs: connect to the frontend

2017-03-15 Thread Stefano Stabellini
On Wed, 15 Mar 2017, Greg Kurz wrote:
> On Mon, 13 Mar 2017 16:55:56 -0700
> Stefano Stabellini  wrote:
> 
> > Write the limits of the backend to xenstore. Connect to the frontend.
> > Upon connection, allocate the rings according to the protocol
> > specification.
> > 
> > Initialize a QEMUBH to schedule work upon receiving an event channel
> > notification from the frontend.
> > 
> > Signed-off-by: Stefano Stabellini 
> > CC: anthony.per...@citrix.com
> > CC: jgr...@suse.com
> > CC: Aneesh Kumar K.V 
> > CC: Greg Kurz 
> > ---
> >  hw/9pfs/xen-9p-backend.c | 159 
> > ++-
> >  1 file changed, 158 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> > index 35032d3..0e4a133 100644
> > --- a/hw/9pfs/xen-9p-backend.c
> > +++ b/hw/9pfs/xen-9p-backend.c
> > @@ -17,8 +17,35 @@
> >  #include "qemu/config-file.h"
> >  #include "fsdev/qemu-fsdev.h"
> >  
> > +#define VERSIONS "1"
> > +#define MAX_RINGS 8
> > +#define MAX_RING_ORDER 8
> > +
> > +struct Xen9pfsRing {
> > +struct Xen9pfsDev *priv;
> > +
> > +int ref;
> > +xenevtchn_handle   *evtchndev;
> > +int evtchn;
> > +int local_port;
> > +struct xen_9pfs_data_intf *intf;
> > +unsigned char *data;
> > +struct xen_9pfs_data ring;
> > +
> > +QEMUBH *bh;
> > +
> > +/* local copies, so that we can read/write PDU data directly from
> > + * the ring */
> > +RING_IDX out_cons, out_size, in_cons;
> > +bool inprogress;
> > +};
> > +
> 
> QEMU Coding style requires a typedef.

OK


> FWIW, maybe you could also convert
> other structured types like XenDevice or XenDevOps.

I'll do in a separate series


> >  typedef struct Xen9pfsDev {
> >  struct XenDevice xendev;  /* must be first */
> > +V9fsState state;
> > +
> > +int num_rings;
> > +struct Xen9pfsRing *rings;
> >  } Xen9pfsDev;
> >  
> >  static ssize_t xen_9pfs_pdu_vmarshal(V9fsPDU *pdu,
> > @@ -67,22 +94,152 @@ static int xen_9pfs_init(struct XenDevice *xendev)
> >  return 0;
> >  }
> >  
> > +static void xen_9pfs_bh(void *opaque)
> > +{
> > +}
> > +
> > +static void xen_9pfs_evtchn_event(void *opaque)
> > +{
> > +}
> > +
> >  static int xen_9pfs_free(struct XenDevice *xendev)
> >  {
> > -return -1;
> > +int i;
> > +struct Xen9pfsDev *xen_9pdev = container_of(xendev, struct Xen9pfsDev, 
> > xendev);
> > +
> 
> Coding style.

OK


> > +for (i = 0; i < xen_9pdev->num_rings; i++) {
> > +if (xen_9pdev->rings[i].data != NULL) {
> > +xengnttab_unmap(xen_9pdev->xendev.gnttabdev,
> > +xen_9pdev->rings[i].data,
> > +(1 << XEN_9PFS_RING_ORDER));
> > +}
> > +if (xen_9pdev->rings[i].intf != NULL) {
> > +xengnttab_unmap(xen_9pdev->xendev.gnttabdev,
> > +xen_9pdev->rings[i].intf,
> > +1);
> > +}
> > +if (xen_9pdev->rings[i].evtchndev > 0) {
> > +
> > qemu_set_fd_handler(xenevtchn_fd(xen_9pdev->rings[i].evtchndev),
> > +NULL, NULL, NULL);
> > +xenevtchn_unbind(xen_9pdev->rings[i].evtchndev, 
> > xen_9pdev->rings[i].local_port);
> > +}
> > +if (xen_9pdev->rings[i].bh != NULL) {
> > +qemu_bh_delete(xen_9pdev->rings[i].bh);
> > +}
> > +}
> > +g_free(xen_9pdev->rings);
> > +return 0;
> >  }
> >  
> >  static int xen_9pfs_connect(struct XenDevice *xendev)
> >  {
> > +int i;
> > +struct Xen9pfsDev *xen_9pdev = container_of(xendev, struct Xen9pfsDev, 
> > xendev);
> 
> Coding style.

OK


> > +V9fsState *s = _9pdev->state;
> > +QemuOpts *fsdev;
> > +char *security_model, *path;
> > +
> > +if (xenstore_read_fe_int(_9pdev->xendev, "num-rings",
> > + _9pdev->num_rings) == -1 ||
> > +xen_9pdev->num_rings > MAX_RINGS) {
> > +return -1;
> > +}
> > +
> > +xen_9pdev->rings = g_malloc0(xen_9pdev->num_rings * sizeof(struct 
> > Xen9pfsRing));
> > +for (i = 0; i < xen_9pdev->num_rings; i++) {
> > +char str[16];
> > +
> > +xen_9pdev->rings[i].priv = xen_9pdev;
> > +xen_9pdev->rings[i].evtchn = -1;
> > +xen_9pdev->rings[i].local_port = -1;
> > +
> > +sprintf(str, "ring-ref%u", i);
> > +if (xenstore_read_fe_int(_9pdev->xendev, str,
> > + _9pdev->rings[i].ref) == -1) {
> > +goto out;
> > +}
> > +sprintf(str, "event-channel-%u", i);
> > +if (xenstore_read_fe_int(_9pdev->xendev, str,
> > + _9pdev->rings[i].evtchn) == -1) {
> > +goto out;
> > +}
> > +
> > +xen_9pdev->rings[i].intf =  xengnttab_map_grant_ref(
> > +xen_9pdev->xendev.gnttabdev,
> > +

[Qemu-devel] [PATCH] blockdev: fix bitmap clear undo

2017-03-15 Thread John Snow
Only undo the action if we actually prepared the action.

Signed-off-by: John Snow 
---
 blockdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index f1f49bd..c5b2c2c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2047,7 +2047,9 @@ static void block_dirty_bitmap_clear_abort(BlkActionState 
*common)
 BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
  common, common);
 
-bdrv_undo_clear_dirty_bitmap(state->bitmap, state->backup);
+if (state->backup) {
+bdrv_undo_clear_dirty_bitmap(state->bitmap, state->backup);
+}
 }
 
 static void block_dirty_bitmap_clear_commit(BlkActionState *common)
-- 
2.9.3




[Qemu-devel] [PULL for-2.9] Update OpenBIOS images

2017-03-15 Thread Mark Cave-Ayland
Hi Peter,

This update contains just the 64-bit PCI BAR fix which enables virtio modern 
devices to
work once again. Please pull.


ATB,

Mark. 


The following changes since commit 1883ff34b540daacae948f493b0ba525edf5f642:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
(2017-03-15 18:44:05 +)

are available in the git repository at:


  https://github.com/mcayland/qemu.git tags/qemu-openbios-signed

for you to fetch changes up to 111308e789b7a3e2771a61f68bd057f8f8b1bc22:

  Update OpenBIOS images to f233c3f built from submodule. (2017-03-15 19:42:08 
+)


Update OpenBIOS images


Mark Cave-Ayland (1):
  Update OpenBIOS images to f233c3f built from submodule.

 pc-bios/openbios-ppc |  Bin 750840 -> 750840 bytes
 pc-bios/openbios-sparc32 |  Bin 382048 -> 382048 bytes
 pc-bios/openbios-sparc64 |  Bin 1593408 -> 1593408 bytes
 roms/openbios|2 +-
 4 files changed, 1 insertion(+), 1 deletion(-)



Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-15 Thread Paolo Bonzini


- Original Message -
> From: "Anthony Xu" 
> To: "Paolo Bonzini" 
> Cc: "Yang Zhong" , "Chao P Peng" 
> , qemu-devel@nongnu.org
> Sent: Wednesday, March 15, 2017 8:05:48 PM
> Subject: Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 
> 12252kB to 2752KB
> 
> > The first unref is done after as->current_map is overwritten.
> > as->current_map is accessed under RCU, so it needs call_rcu.  It
> > balances the initial reference that is present since flatview_init.
> 
> Got it, thanks for explanation.
> 
> > > but it is not clear to me, is this a bug or by design? Is
> > > flatview_destroy called
> > in current thread
> > > or RCU thread?
> > 
> > You cannot know, because there are also other callers of
> > address_space_get_flatview.  Usually it's from the RCU thread.
> 
> If it is from current thread, do we need to check if the global lock is
> acquired?
> As you mentioned flatview_unref may call memory_region_unref.

No, you don't need to check it.  Most functions (in this case,
memory_region_transaction_commit) can only be called under the global lock.
In fact, only address_space_read/write/ld*/st*, plus memory_region_find can
be called without the global lock.

> In the comments of memory_region_finalize
> "   /* We know the region is not visible in any address space (it
>  * does not have a container and cannot be a root either because
>  * it has no references,
> "
> 
> We know the memory region is not a root region, and the memory region has
> already been removed from address space even it has sub regions.
> memory_region_transaction_commit should be called when the
> memory region is removed from address space.
> memory_region_transaction_commit seems not be needed in
> memory_region_finalize.
> Let me know if you think otherwise.

Yes, you can replace memory_region_del_subregion in memory_region_finalize
with special code that does

assert(!mr->enabled);
assert(subregion->container == mr);
subregion->container = NULL;
QTAILQ_REMOVE(>subregions, subregion, subregions_link);
memory_region_unref(subregion);

The last four lines are shared with memory_region_del_subregion, so please
factor them in a new function, for example memory_region_del_subregion_internal.

> > > How about fall back to synchronize_rcu?
> > 
> > I'm afraid it would be too slow, but you can test.
> 
> It is not slow, the contention is not high. Yes we can test.

It's not a matter of contention.  It's a matter of how long an RCU
critical section can be---not just at startup, but at any point
during QEMU execution.

Have you tried tcmalloc or jemalloc?  They use the brk region
less and sometimes are more aggressive in releasing mmap-ed memory
areas.

> Please review below patch, MemoryRegion is protected by RCU.

Removing transaction begin/commit in memory_region_finalize makes
little sense because memory_region_del_subregion calls those two
functions again.  See above for an alternative.  Apart from this,
the patch is at least correct, though I'm not sure it's a good
idea (synchronize_rcu needs testing).  I'm not sure I understand the
address_space_write change).

Paolo

> Thanks,
> Anthony
> 
> 
> 
> diff --git a/exec.c b/exec.c
> index a22f5a0..6631668 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2521,7 +2521,8 @@ static void mem_commit(MemoryListener *listener)
> 
>  atomic_rcu_set(>dispatch, next);
>  if (cur) {
> -call_rcu(cur, address_space_dispatch_free, rcu);
> +synchronize_rcu();
> +address_space_dispatch_free(cur);
>  }
>  }
> 
> @@ -2567,7 +2568,8 @@ void address_space_destroy_dispatch(AddressSpace *as)
> 
>  atomic_rcu_set(>dispatch, NULL);
>  if (d) {
> -call_rcu(d, address_space_dispatch_free, rcu);
> +synchronize_rcu();
> +address_space_dispatch_free(d);
>  }
>  }
> 
> @@ -2712,19 +2714,22 @@ static bool prepare_mmio_access(MemoryRegion *mr)
>  return release_lock;
>  }
> 
> -/* Called within RCU critical section.  */
> -static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr
> addr,
> -MemTxAttrs attrs,
> -const uint8_t *buf,
> -int len, hwaddr addr1,
> -hwaddr l, MemoryRegion *mr)
> +MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
> +  MemTxAttrs attrs, const uint8_t *buf, int len)
>  {
>  uint8_t *ptr;
>  uint64_t val;
> +hwaddr l=len;
> +hwaddr addr1;
> +MemoryRegion *mr;
>  MemTxResult result = MEMTX_OK;
>  bool release_lock = false;
> 
>  for (;;) {
> +rcu_read_lock();
> +mr = address_space_translate(as, addr, , , true);
> +memory_region_ref(mr);
> +rcu_read_unlock();
>  if 

Re: [Qemu-devel] [RFC PATCH v2 30/30] trace: Force compiler warnings on trace parameter type mismatches

2017-03-15 Thread Eric Blake
On 03/13/2017 02:55 PM, Eric Blake wrote:

> ---
> 
> RFC for two reasons:
> 1. the Makefile issue documented above means that incremental
> builds won't benefit from this patch without manual intervention
> (fresh builds, including docker, manage to test it, though)

Dan's posted the fix for this one.

> 2. there are still failures under 'make docker-test-mingw@fedora'
> due to more type mismatches that still need to be squashed. I'm
> still working on fixing those, but wanted to at least post this
> series for initial review, especially so the maintainer can weigh
> in on how much (or little) belongs in 2.9

mingw is a nightmare.  Things like ntohl() having non-standard behavior
on 32-bit mingw can only be worked around by hairy redefinitions in
osdep.h or by casts at the callsites. See the conversation in 6/30, but
I'm hoping gcc will give us a new knob to tone down the level of
-Wformat warnings for two types that have different rank but the same
width; if gcc gives us that, then we'd add a configure test, and
conditionally generate the type-mismatch checking code ONLY when gcc is
smart enough to not care about rank mismatch.  Until then, this RFC
30/30 proved useful for flushing out the earlier cleanups (many of which
you can/should still accept), but should NOT be incorporated as-is, not
even when 2.10 opens.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 06/30] trace: Fix parameter types in migration

2017-03-15 Thread Eric Blake
On 03/14/2017 09:36 AM, Eric Blake wrote:

 So why is a PRIx64 not the right way to print a __u64 ?
>>>
>>> Because __u64 is not the same type as uint64_t.  On 64-bit Linux, __u64
>>> is 'unsigned long long', while uint64_t is 'unsigned long'.
>>>
 (I prefer %llx to the horrid PRIx64 syntax, but it still seems weird in 
 this case)
>>>
>>> As it is, I'm not sure if __u64 is always 'unsigned long long' in ALL
>>> Linux clients; an even-more-conservative patch would be to switch all
>>> callers to use explicit casts to something (like uint64_t or unsigned
>>> long long) that we have full control over, rather than passing __u64
>>> where we have no control over what type it ultimately resolves to.
>>
>> That would be my preference - casting to (uint64_t) at the caller and
>> keep this as PRIx64.  We know it's a 64bit value so we should never
>> use unsigned long long anywhere for it.
> 
> Okay, my next version will insert an explicit cast in any caller that is
> otherwise passing a __u64 (since we can't guarantee what type __u64
> resolves to, and therefore what format string is appropriate for it).

Or maybe I will just omit those changes from my next version, in lieu of
a gcc bug report.  Here's my summary of an IRC conversation on the topic:

I just discovered that 32-bit mingw has a bug: it's  header
declared ntohl() as a function that returns 'u_long' ('unsigned long'),
but that POSIX requires ntohl() to return 'uin32_t' (which is 'unsigned
int' in that platform).  The only workaround to buggy system headers is
to insert casts at all call sites, or to quit using overly-picky
-Wformat warnings.

The gist of my gcc bug report will be that newer gcc recently introduced
a fine-tuning for -Wformat, namely -Wformat-signedness.  Some people
really do care about mismatches between "%d" and 'unsigned', or between
"%x" and 'int', but most of us don't.  When -Wformat was originally
invented, there was a lot of 32-bit code, and very little 64-bit code,
so it made TOTAL sense that you wanted to flag mismatches between "%d"
and 'long', or between "%ld" and 'int', as it was a porting aid to
64-bit platforms.  But these days, when we KNOW that a 32-bit platform
has 32-bit 'int' and 'long', and therefore that printf will behave
identically, it would be nice to squelch the warning, and instead only
issue it when there really is a mismatch (such as "%d" and 'long' on
64-bit).  If there were such a knob (call it -Wformat-rank), then in
qemu, we'd turn it off (-Wno-format-rank) so we can play fast and sloppy
with same-width but different-rank typedefs (such as the kernel's __u64
vs. uint64_t, or mingw's 'u_long' vs. 'uint32_t), but still get the full
benefit of real warnings when there is a 32-bit vs. 64-bit difference.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH V2 0/4] hw/virtio: fix several PCI Express compliance issues

2017-03-15 Thread Marcel Apfelbaum

On 03/02/2017 08:25 AM, Marcel Apfelbaum wrote:

On 02/20/2017 10:43 PM, Marcel Apfelbaum wrote:

v1 -> v2:
  - Added compat properties (Michael S. Tsirkin)
  - Rebased on latest master
  - Regarding the patch 1/4, we don't need to init the PCI "standard"
config capabilities to 0 since they are "protected" by the Capabilities 
List bit
   (bit 4) to indicate that the Capabilities Pointer is located at offset 34h.


Fix a few issues found while running WHQL tests:



ping



ping

Please add it to 2.9 please, the patches are ready for some time.

Thanks,
Marcel


Thanks,
Marcel


 - Assertion 1F27399E-30B9-44BC-8908-D6E6F3836212: FAILED. Enhanced Capability 
Header register
   of the PCI Express Enhanced Capabilities Absent Indicator table must be 
read-only .

   Solved in patch 1/4

 - Assertion 47C39833-84AD-44EA-9723-0695202ADDEA: FAILED. Bit 0 (Correctable 
Error Reporting Enable)
   in the Device Control register (offset 8h) in the PCI Express Capability 
table must be read-writable .
 - Assertion 5CBA2A63-A48E-4443-85FA-A7DCD8EA47BC: FAILED. Bit 1 (Non-Fatal 
Error Reporting Enable)
   in the Device Control register (offset 8h) in the PCI Express Capability 
table must be read-writable .
 - Assertion 0AB06F7C-59CB-4F9A-8363-B51B1ACAB54F: FAILED. Bit 2 (Fatal Error 
Reporting Enable)
   in the Device Control register (offset 8h) in the PCI Express Capability 
table must be read-writable .
 - Assertion E3834E4A-A7BD-410C-9A61-FA91770D2A71: FAILED. Bit 3 (Unsupported 
Request Reporting Enable)
   in the Device Control register (offset 8h) in the PCI Express Capability 
table must be read-writable

   Solved in patch 2/4

 - Assertion 1587DC0B-FE59-494E-85B5-C2A59D0CC098: FAILED. Bit 6 (Common Clock 
Configuration)
   in the Link Control register (offset 10h) in the PCI Express Capability 
table must be read-writable .
 - Assertion 13DD25A3-07E4-4477-BE0F-2273BBB32174: FAILED. Bit 7 (Extended 
Synch) in the Link Control
   register (offset 10h) in the PCI Express Capability table must be 
read-writable .

  Solved in patch 3/4

  - AM Assertion 06779BD9-0C35-4CA1-9EB3-96E7DA9A74F8: FAILED. Bit range 1:0 
(PowerState)in
the Power Management Control/Status register (offset 4h) in the Power 
Management Capability table is 0h.
It must be 3h after a supported D3 transition.

Thanks,
Marcel

Marcel Apfelbaum (4):
  hw/pcie: fix Extended Configuration Space for devices with no Extended
Capabilities
  hw/virtio: fix error enabling flags in Device Control register
  hw/virtio: fix Link Control Register for PCI Express virtio devices
  hw/virtio: fix Power Management Control Register for PCI Express
virtio devices

 hw/pci/pci.c   |  2 ++
 hw/pci/pcie.c  | 20 
 hw/virtio/virtio-pci.c | 31 +++
 hw/virtio/virtio-pci.h | 12 
 include/hw/compat.h| 16 
 include/hw/pci/pci.h   |  2 ++
 include/hw/pci/pcie.h  |  5 +
 7 files changed, 88 insertions(+)








Re: [Qemu-devel] [PULL 0/7] virtio, pc: fixes

2017-03-15 Thread Peter Maydell
On 15 March 2017 at 18:01, Michael S. Tsirkin  wrote:
> The following changes since commit d84f714eafedd8bb9d4aaec8b76417bef8e3535e:
>
>   Update version for v2.9.0-rc0 release (2017-03-14 19:18:23 +)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>
> for you to fetch changes up to 60a8d8023473dd24957b3a66824f66cd35b80d64:
>
>   virtio-pci: reset modern vq meta data (2017-03-15 19:59:18 +0200)
>
> 
> virtio, pc: fixes
>
> Some fixes to fallback from using virtio caching,
> pls a minor vm gen id fix.
>
> Signed-off-by: Michael S. Tsirkin 
>
> 
> Ben Warren (1):
>   Bugfix: Handle error if VM Generation ID device not present
>
> Jason Wang (6):
>   virtio: guard against NULL pfn
>   virtio: destroy region cache during reset
>   virtio: validate address space cache during init
>   pci: introduce a bus master container
>   Revert "virtio: unbreak virtio-pci with IOMMU after caching ring 
> translations"
>   virtio-pci: reset modern vq meta data

Applied, thanks.

-- PMM



Re: [Qemu-devel] [PATCH v1 2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

2017-03-15 Thread Xu, Anthony
> The first unref is done after as->current_map is overwritten.
> as->current_map is accessed under RCU, so it needs call_rcu.  It
> balances the initial reference that is present since flatview_init.

Got it, thanks for explanation.

> > but it is not clear to me, is this a bug or by design? Is flatview_destroy 
> > called
> in current thread
> > or RCU thread?
> 
> You cannot know, because there are also other callers of
> address_space_get_flatview.  Usually it's from the RCU thread.

If it is from current thread, do we need to check if the global lock is 
acquired?
As you mentioned flatview_unref may call memory_region_unref.


> - let memory_region_finalize remove subregions (commit 2e2b8eb,
> "memory:
> allow destroying a non-empty MemoryRegion", 2015-10-09)
> 
> - let memory_region_finalize disable coalesced I/O (in fact there are no
> callers of memory_region_clear_coalescing outside memory.c)

Okay, It may have sub regions,

In the comments of memory_region_finalize
"   /* We know the region is not visible in any address space (it
 * does not have a container and cannot be a root either because
 * it has no references,
"

We know the memory region is not a root region, and the memory region has 
already been removed from address space even it has sub regions.
memory_region_transaction_commit should be called when the 
memory region is removed from address space.
memory_region_transaction_commit seems not be needed in memory_region_finalize.
Let me know if you think otherwise.


> > How about fall back to synchronize_rcu?
> 
> I'm afraid it would be too slow, but you can test.

It is not slow, the contention is not high. Yes we can test.


> Nope.  The RCU read lock protects all MemoryRegions through
> flatview_unref: RCU keeps the FlatView alive, and the FlatView keeps
> MemoryRegions alive.

I see, RCU needs to protect MemoryRegions as well.



Please review below patch, MemoryRegion is protected by RCU.

Thanks,
Anthony



diff --git a/exec.c b/exec.c
index a22f5a0..6631668 100644
--- a/exec.c
+++ b/exec.c
@@ -2521,7 +2521,8 @@ static void mem_commit(MemoryListener *listener)

 atomic_rcu_set(>dispatch, next);
 if (cur) {
-call_rcu(cur, address_space_dispatch_free, rcu);
+synchronize_rcu();
+address_space_dispatch_free(cur);
 }
 }

@@ -2567,7 +2568,8 @@ void address_space_destroy_dispatch(AddressSpace *as)

 atomic_rcu_set(>dispatch, NULL);
 if (d) {
-call_rcu(d, address_space_dispatch_free, rcu);
+synchronize_rcu();
+address_space_dispatch_free(d);
 }
 }

@@ -2712,19 +2714,22 @@ static bool prepare_mmio_access(MemoryRegion *mr)
 return release_lock;
 }

-/* Called within RCU critical section.  */
-static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr,
-MemTxAttrs attrs,
-const uint8_t *buf,
-int len, hwaddr addr1,
-hwaddr l, MemoryRegion *mr)
+MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
+  MemTxAttrs attrs, const uint8_t *buf, int len)
 {
 uint8_t *ptr;
 uint64_t val;
+hwaddr l=len;
+hwaddr addr1;
+MemoryRegion *mr;
 MemTxResult result = MEMTX_OK;
 bool release_lock = false;

 for (;;) {
+rcu_read_lock();
+mr = address_space_translate(as, addr, , , true);
+memory_region_ref(mr);
+rcu_read_unlock();
 if (!memory_access_is_direct(mr, true)) {
 release_lock |= prepare_mmio_access(mr);
 l = memory_access_size(mr, l, addr1);
@@ -2764,7 +2769,7 @@ static MemTxResult 
address_space_write_continue(AddressSpace *as, hwaddr addr,
 memcpy(ptr, buf, l);
 invalidate_and_set_dirty(mr, addr1, l);
 }
-
+memory_region_unref(mr);
 if (release_lock) {
 qemu_mutex_unlock_iothread();
 release_lock = false;
@@ -2779,27 +2784,6 @@ static MemTxResult 
address_space_write_continue(AddressSpace *as, hwaddr addr,
 }

 l = len;
-mr = address_space_translate(as, addr, , , true);
-}
-
-return result;
-}
-
-MemTxResult address_space_write(AddressSpace *as, hwaddr addr, MemTxAttrs 
attrs,
-const uint8_t *buf, int len)
-{
-hwaddr l;
-hwaddr addr1;
-MemoryRegion *mr;
-MemTxResult result = MEMTX_OK;
-
-if (len > 0) {
-rcu_read_lock();
-l = len;
-mr = address_space_translate(as, addr, , , true);
-result = address_space_write_continue(as, addr, attrs, buf, len,
-  addr1, l, mr);
-rcu_read_unlock();
 }

 return result;
diff --git a/memory.c b/memory.c
index 64b0a60..d12437c 100644
--- a/memory.c
+++ b/memory.c
@@ -1505,13 +1505,10 @@ static void 

Re: [Qemu-devel] [PATCH v2 2/9] xen: import ring.h from xen

2017-03-15 Thread Stefano Stabellini
On Wed, 15 Mar 2017, Greg Kurz wrote:
> On Wed, 15 Mar 2017 11:36:12 -0700 (PDT)
> Stefano Stabellini  wrote:
> 
> > On Wed, 15 Mar 2017, Greg Kurz wrote:
> > > On Mon, 13 Mar 2017 16:55:53 -0700
> > > Stefano Stabellini  wrote:
> > >   
> > > > Do not use the ring.h header installed on the system. Instead, import
> > > > the header into the QEMU codebase. This avoids problems when QEMU is
> > > > built against a Xen version too old to provide all the ring macros.
> > > >   
> > > 
> > > What kind of problems ?  
> > 
> > The FLEX macros are only available in Xen 4.9+ (still unreleased).
> > However, aside from these macros, the Xen 9pfs frontends and backends
> > could work fine on any Xen releases. In fact, the Xen public/io headers
> > are only provided as reference.
> > 
> 
> Ok.
> 
> > 
> > > > Signed-off-by: Stefano Stabellini 
> > > > CC: anthony.per...@citrix.com
> > > > CC: jgr...@suse.com
> > > > ---
> > > > NB: The new macros have not been committed to Xen yet. Do not apply this
> > > > patch until they do.  
> > > 
> > > Why ? Does this break compat with older Xen ?  
> > 
> > No, it does not break compatibility. But I think it is a good idea to
> > commit the header to QEMU only after the corresponding Xen header has
> > been accepted. I don't want the two to diverge.
> > 
> 
> Fair enough but this will put the entire patchset on hold then. When Xen 4.9
> is supposed to be released ?

In a few months, but I don't think we need to wait until the release,
just for the reviewed-by on the new macros, that should come soon.


> > > > ---
> > > > ---
> > > >  hw/block/xen_blkif.h |   2 +-
> > > >  hw/usb/xen-usb.c |   2 +-
> > > >  include/hw/xen/io/ring.h | 455 
> > > > +++
> > > >  3 files changed, 457 insertions(+), 2 deletions(-)
> > > >  create mode 100644 include/hw/xen/io/ring.h
> > > > 
> > > > diff --git a/hw/block/xen_blkif.h b/hw/block/xen_blkif.h
> > > > index 3300b6f..3e6e1ea 100644
> > > > --- a/hw/block/xen_blkif.h
> > > > +++ b/hw/block/xen_blkif.h
> > > > @@ -1,7 +1,7 @@
> > > >  #ifndef XEN_BLKIF_H
> > > >  #define XEN_BLKIF_H
> > > >  
> > > > -#include 
> > > > +#include "hw/xen/io/ring.h"
> > > >  #include 
> > > >  #include 
> > > >  
> > > > diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
> > > > index 8e676e6..370b3d9 100644
> > > > --- a/hw/usb/xen-usb.c
> > > > +++ b/hw/usb/xen-usb.c
> > > > @@ -33,7 +33,7 @@
> > > >  #include "qapi/qmp/qint.h"
> > > >  #include "qapi/qmp/qstring.h"
> > > >  
> > > > -#include 
> > > > +#include "hw/xen/io/ring.h"
> > > >  #include 
> > > >  
> > > >  /*
> > > > diff --git a/include/hw/xen/io/ring.h b/include/hw/xen/io/ring.h
> > > > new file mode 100644
> > > > index 000..cf01fc3
> > > > --- /dev/null
> > > > +++ b/include/hw/xen/io/ring.h
> > > > @@ -0,0 +1,455 @@
> > > > +/**
> > > > + * ring.h
> > > > + * 
> > > > + * Shared producer-consumer ring macros.
> > > > + *
> > > > + * Permission is hereby granted, free of charge, to any person 
> > > > obtaining a copy
> > > > + * of this software and associated documentation files (the 
> > > > "Software"), to
> > > > + * deal in the Software without restriction, including without 
> > > > limitation the
> > > > + * rights to use, copy, modify, merge, publish, distribute, 
> > > > sublicense, and/or
> > > > + * sell copies of the Software, and to permit persons to whom the 
> > > > Software is
> > > > + * furnished to do so, subject to the following conditions:
> > > > + *
> > > > + * The above copyright notice and this permission notice shall be 
> > > > included in
> > > > + * all copies or substantial portions of the Software.
> > > > + *
> > > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
> > > > EXPRESS OR
> > > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
> > > > MERCHANTABILITY,
> > > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT 
> > > > SHALL THE
> > > > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
> > > > OTHER
> > > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
> > > > ARISING
> > > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > > > + * DEALINGS IN THE SOFTWARE.
> > > > + *
> > > > + * Tim Deegan and Andrew Warfield November 2004.
> > > > + */
> > > > +
> > > > +#ifndef __XEN_PUBLIC_IO_RING_H__
> > > > +#define __XEN_PUBLIC_IO_RING_H__
> > > > +
> > > > +#if __XEN_INTERFACE_VERSION__ < 0x00030208
> > > > +#define xen_mb()  mb()
> > > > +#define xen_rmb() rmb()
> > > > +#define xen_wmb() wmb()
> > > > +#endif
> > > > +
> > > > +typedef unsigned int RING_IDX;
> > > > +
> > > > +/* Round a 32-bit unsigned constant down to the nearest power of two. 
> > > > */
> > > > +#define __RD2(_x)  (((_x) & 0x0002) ? 0x2  : 

Re: [Qemu-devel] [PATCH v2 3/9] xen: introduce the header file for the Xen 9pfs transport protocol

2017-03-15 Thread Stefano Stabellini
On Wed, 15 Mar 2017, Greg Kurz wrote:
> On Mon, 13 Mar 2017 16:55:54 -0700
> Stefano Stabellini  wrote:
> 
> > It uses the new ring.h macros to declare rings and interfaces.
> > 
> > Signed-off-by: Stefano Stabellini 
> > CC: anthony.per...@citrix.com
> > CC: jgr...@suse.com
> > ---
> >  hw/9pfs/xen_9pfs.h | 20 
> >  1 file changed, 20 insertions(+)
> >  create mode 100644 hw/9pfs/xen_9pfs.h
> > 
> 
> This header file has only one user: please move its content to
> hw/9pfs/xen-9p-backend.c (except the 9P header structure, see
> below).

OK, I can do that. There is going to be a very similar header in the Xen
tree under xen/include/public/io
(http://marc.info/?l=xen-devel=148952997709142), but from QEMU point
of view, it makes sense to fold it in xen-9p-backend.c.


> > diff --git a/hw/9pfs/xen_9pfs.h b/hw/9pfs/xen_9pfs.h
> > new file mode 100644
> > index 000..c4e8901
> > --- /dev/null
> > +++ b/hw/9pfs/xen_9pfs.h
> > @@ -0,0 +1,20 @@
> > +#ifndef XEN_9PFS_H
> > +#define XEN_9PFS_H
> > +
> > +#include "hw/xen/io/ring.h"
> > +#include 
> > +
> > +struct xen_9pfs_header {
> > +   uint32_t size;
> > +   uint8_t id;
> > +   uint16_t tag;
> > +
> > +   /* uint8_t sdata[]; */
> 
> This doesn't seem useful.

I'll remove


> > +} __attribute__((packed));
> > +
> 
> A few remarks:
> - this is a 9P message header actually, which is also used with virtio,
> - QEMU coding style requires a typedef in CamelCase,
> - the 9P protocol explicitely uses little-endian ordering. Since we
>   don't have endian-specific types, it makes sense to indicate that
>   when naming the fields.
> - we have a QEMU_PACKED macro which seems to be used a lot more than
>   the gcc syntax
> 
> Please define a new type in hw/9pfs/9p.h and use it in both transports.
> Something like:
> 
> typedef struct {
> uint32_t size_le;
> uint8_t id;
> uint16_t tag_le;
> } QEMU_PACKED P9MsgHeader;

OK


> > +#define PAGE_SHIFT XC_PAGE_SHIFT
> 
> I don't see any user for this in hw/9pfs/xen-9p-backend.c

PAGE_SHIFT is used by the macros below, but the original Xen headers
don't have the PAGE_SHIFT definition, so, for the sake of keeping the
two in sync, I didn't add it there.


> > +#define XEN_9PFS_RING_ORDER 6
> > +#define XEN_9PFS_RING_SIZE  XEN_FLEX_RING_SIZE(XEN_9PFS_RING_ORDER)
> > +DEFINE_XEN_FLEX_RING_AND_INTF(xen_9pfs);
> > +
> > +#endif



Re: [Qemu-devel] [PATCH v2 2/9] xen: import ring.h from xen

2017-03-15 Thread Greg Kurz
On Wed, 15 Mar 2017 11:36:12 -0700 (PDT)
Stefano Stabellini  wrote:

> On Wed, 15 Mar 2017, Greg Kurz wrote:
> > On Mon, 13 Mar 2017 16:55:53 -0700
> > Stefano Stabellini  wrote:
> >   
> > > Do not use the ring.h header installed on the system. Instead, import
> > > the header into the QEMU codebase. This avoids problems when QEMU is
> > > built against a Xen version too old to provide all the ring macros.
> > >   
> > 
> > What kind of problems ?  
> 
> The FLEX macros are only available in Xen 4.9+ (still unreleased).
> However, aside from these macros, the Xen 9pfs frontends and backends
> could work fine on any Xen releases. In fact, the Xen public/io headers
> are only provided as reference.
> 

Ok.

> 
> > > Signed-off-by: Stefano Stabellini 
> > > CC: anthony.per...@citrix.com
> > > CC: jgr...@suse.com
> > > ---
> > > NB: The new macros have not been committed to Xen yet. Do not apply this
> > > patch until they do.  
> > 
> > Why ? Does this break compat with older Xen ?  
> 
> No, it does not break compatibility. But I think it is a good idea to
> commit the header to QEMU only after the corresponding Xen header has
> been accepted. I don't want the two to diverge.
> 

Fair enough but this will put the entire patchset on hold then. When Xen 4.9
is supposed to be released ?

> 
> > > ---
> > > ---
> > >  hw/block/xen_blkif.h |   2 +-
> > >  hw/usb/xen-usb.c |   2 +-
> > >  include/hw/xen/io/ring.h | 455 
> > > +++
> > >  3 files changed, 457 insertions(+), 2 deletions(-)
> > >  create mode 100644 include/hw/xen/io/ring.h
> > > 
> > > diff --git a/hw/block/xen_blkif.h b/hw/block/xen_blkif.h
> > > index 3300b6f..3e6e1ea 100644
> > > --- a/hw/block/xen_blkif.h
> > > +++ b/hw/block/xen_blkif.h
> > > @@ -1,7 +1,7 @@
> > >  #ifndef XEN_BLKIF_H
> > >  #define XEN_BLKIF_H
> > >  
> > > -#include 
> > > +#include "hw/xen/io/ring.h"
> > >  #include 
> > >  #include 
> > >  
> > > diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
> > > index 8e676e6..370b3d9 100644
> > > --- a/hw/usb/xen-usb.c
> > > +++ b/hw/usb/xen-usb.c
> > > @@ -33,7 +33,7 @@
> > >  #include "qapi/qmp/qint.h"
> > >  #include "qapi/qmp/qstring.h"
> > >  
> > > -#include 
> > > +#include "hw/xen/io/ring.h"
> > >  #include 
> > >  
> > >  /*
> > > diff --git a/include/hw/xen/io/ring.h b/include/hw/xen/io/ring.h
> > > new file mode 100644
> > > index 000..cf01fc3
> > > --- /dev/null
> > > +++ b/include/hw/xen/io/ring.h
> > > @@ -0,0 +1,455 @@
> > > +/**
> > > + * ring.h
> > > + * 
> > > + * Shared producer-consumer ring macros.
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person obtaining 
> > > a copy
> > > + * of this software and associated documentation files (the "Software"), 
> > > to
> > > + * deal in the Software without restriction, including without 
> > > limitation the
> > > + * rights to use, copy, modify, merge, publish, distribute, sublicense, 
> > > and/or
> > > + * sell copies of the Software, and to permit persons to whom the 
> > > Software is
> > > + * furnished to do so, subject to the following conditions:
> > > + *
> > > + * The above copyright notice and this permission notice shall be 
> > > included in
> > > + * all copies or substantial portions of the Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
> > > EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
> > > MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT 
> > > SHALL THE
> > > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
> > > ARISING
> > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > > + * DEALINGS IN THE SOFTWARE.
> > > + *
> > > + * Tim Deegan and Andrew Warfield November 2004.
> > > + */
> > > +
> > > +#ifndef __XEN_PUBLIC_IO_RING_H__
> > > +#define __XEN_PUBLIC_IO_RING_H__
> > > +
> > > +#if __XEN_INTERFACE_VERSION__ < 0x00030208
> > > +#define xen_mb()  mb()
> > > +#define xen_rmb() rmb()
> > > +#define xen_wmb() wmb()
> > > +#endif
> > > +
> > > +typedef unsigned int RING_IDX;
> > > +
> > > +/* Round a 32-bit unsigned constant down to the nearest power of two. */
> > > +#define __RD2(_x)  (((_x) & 0x0002) ? 0x2  : ((_x) & 
> > > 0x1))
> > > +#define __RD4(_x)  (((_x) & 0x000c) ? __RD2((_x)>>2)<<2: 
> > > __RD2(_x))
> > > +#define __RD8(_x)  (((_x) & 0x00f0) ? __RD4((_x)>>4)<<4: 
> > > __RD4(_x))
> > > +#define __RD16(_x) (((_x) & 0xff00) ? __RD8((_x)>>8)<<8: 
> > > __RD8(_x))
> > > +#define __RD32(_x) (((_x) & 0x) ? __RD16((_x)>>16)<<16 : 
> > > __RD16(_x))
> > > +
> > > +/*
> > > + * Calculate size of a shared ring, given 

Re: [Qemu-devel] [PATCH] Default to GSSAPI (Kerberos) instead of DIGEST-MD5 for SASL

2017-03-15 Thread Eric Blake
On 03/15/2017 01:25 PM, Daniel P. Berrange wrote:
> RFC 6331 documents a number of serious security weaknesses in
> the SASL DIGEST-MD5 mechanism. As such, QEMU should not be
> using or recommending it as a default mechanism for VNC auth
> with SASL.
> 
> GSSAPI (Kerberos) is the only other viable SASL mechanism that
> can provide secure session encryption so enable that by defalt
> as the replacement. If users have TLS enabled for VNC, they can
> optionally decide to use SCRAM-SHA-1 instead of GSSAPI, allowing
> plain username and password auth.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  qemu-doc.texi | 50 +-
>  qemu.sasl | 54 +++---
>  2 files changed, 60 insertions(+), 44 deletions(-)
> 
> diff --git a/qemu-doc.texi b/qemu-doc.texi
> index 794ab4a..922b3a4 100644
> --- a/qemu-doc.texi
> +++ b/qemu-doc.texi
> @@ -1732,37 +1732,45 @@ SASL service config /etc/sasl2/qemu.conf. If running 
> QEMU as an
>  unprivileged user, an environment variable SASL_CONF_PATH can be used
>  to make it search alternate locations for the service config.
>  
> -The default configuration might contain
> +If the TLS option is enabled for VNC, then it will provide session 
> encryption,
> +otherwise the SASL mechanism will have to provide encryption. In the latter
> +case the list of possible plugins that can be used is drastically reduced. In
> +fact only the GSSAPI SASL mechanism provides an acceptable level of security
> +by modern standards. Previous versions of QEMU referred to the DIGEST-MD5
> +mechansim, however, it has multiple serious flaws described in detail in

s/mechansim/mechanism/

> +RFC 6331 and thus should never be used any more. The SCRAM-SHA-1 mechanism
> +provides a simple username/password auth facility similar to DIGEST-MD5, but
> +does not support session encryption, so can only be used in conbination with

s/conbination/combination/

> +TLS.
> +
> +When not using TLS the recommended configuration is
>  
>  @example
> -mech_list: digest-md5
> -sasldb_path: /etc/qemu/passwd.db
> +mech_list: gssapi
> +keytab: /etc/qemu/krb5.tab
>  @end example
>  
> -This says to use the 'Digest MD5' mechanism, which is similar to the HTTP
> -Digest-MD5 mechanism. The list of valid usernames & passwords is maintained
> -in the /etc/qemu/passwd.db file, and can be updated using the saslpasswd2
> -command. While this mechanism is easy to configure and use, it is not
> -considered secure by modern standards, so only suitable for developers /
> -ad-hoc testing.
> +This says to use the 'GSSAPI' mechanism with the Kerberos v5 protocol, with
> +the server principal stored in /etc/qemu/krb5.tab. For this to work the
> +administrator of your KDC must generate a Kerberos principal for the server,
> +with a name of  'qemu/somehost.example.com@@EXAMPLE.COM' replacing

spurious double space

> +'somehost.example.com' with the fully qualified host name of the machine
> +running QEMU, and 'EXAMPLE.COM' with the Kerberos Realm.
>  
> -A more serious deployment might use Kerberos, which is done with the 'gssapi'
> -mechanism
> +When using TLS, if username+password authentication is desired, then a
> +reasonable configuration is
>  
>  @example
> -mech_list: gssapi
> -keytab: /etc/qemu/krb5.tab
> +mech_list: scram-sha-1
> +sasldb_path: /etc/qemu/passwd.db
>  @end example
>  
> -For this to work the administrator of your KDC must generate a Kerberos
> -principal for the server, with a name of  
> 'qemu/somehost.example.com@@EXAMPLE.COM'
> -replacing 'somehost.example.com' with the fully qualified host name of the
> -machine running QEMU, and 'EXAMPLE.COM' with the Kerberos Realm.
> +The saslpasswd2 program can be used to populate the passwd.db file with
> +accounts.
>  
> -Other configurations will be left as an exercise for the reader. It should
> -be noted that only Digest-MD5 and GSSAPI provides a SSF layer for data
> -encryption. For all other mechanisms, VNC should always be configured to
> -use TLS and x509 certificates to protect security credentials from snooping.
> +Other SASL configurations will be left as an exercise for the reader. Note 
> that
> +all mechanisms except GSSAPI, should be combined with use of TLS to ensure a
> +security data channel.

s/security/secure/ ?

>  
>  @node gdb_usage
>  @section GDB usage
> diff --git a/qemu.sasl b/qemu.sasl
> index 64fdef3..8c501a1 100644
> --- a/qemu.sasl
> +++ b/qemu.sasl
> @@ -1,36 +1,44 @@
> -# If you want to use the non-TLS socket, then you *must* include
> -# the GSSAPI or DIGEST-MD5 mechanisms, because they are the only
> -# ones that can offer session encryption as well as authentication.
> +# If you want to use VNC remotely without TLS, then you *must*
> +# pick  a mechanism which provides session encryption as well

spurious double space

> +# as authentication.
>  #
> -# If you're only using TLS, then you can turn on any mechanisms
> +# If you are 

Re: [Qemu-devel] [Bug 1217339] [PATCH] Unix signal to send ACPI-shutdown to Guest

2017-03-15 Thread Peter Maydell
On 15 March 2017 at 18:08, Daniel P. Berrange  wrote:
> On Wed, Mar 15, 2017 at 06:00:40PM +, Peter Maydell wrote:
>> On 15 March 2017 at 17:46, Simon  wrote:
>> > OK for not using SIGHUP and keep SIGTERM, SIGINT and SIGHUP to have the
>> > same behavior.
>> >
>> > SIGQUIT is reserved for core files generation.
>> >
>> > SIGUSR1 is already used in 'util/qemu-progress.c' to trigger a report
>> > on ongoing jobs, so it does not seem usable.
>> >
>> > SIGUSR2 is temporarily used in 'util/coroutine-sigaltstack.c' which
>> > takes care however to preserve the original handler. I did not saw
>> > any other place where it is used, so it seems to be a better candidate.
>>
>> I don't think we can use SIGUSR2 here -- as you say, it's used
>> in the sigaltstack version of coroutines, and so there would
>> be races if the user tried to use SIGUSR2 to power down the
>> machine while we happened to be starting a new coroutine.
>>
>> SIGUSR1 is also no good, as it is used by main-loop.c as
>> the SIG_IPI.
>
> Which means we'd be into the realm of having to pick  SIGRTMIN + N for
> some arbitrary N >= 0

That's pretty nasty. Why can't we use SIGHUP, again?

thanks
-- PMM



Re: [Qemu-devel] [PATCH for 2.9] migration: use "" as the default for tls-creds/hostname

2017-03-15 Thread Eric Blake
On 03/15/2017 11:16 AM, Daniel P. Berrange wrote:
> The tls-creds parameter has a default value of NULL indicating
> that TLS should not be used. Setting it to non-NULL enables
> use of TLS. Once tls-creds are set to a non-NULL value via the
> monitor, it isn't possible to set them back to NULL again, due
> to current implementation limitations. The empty string is not
> a valid QObject identifier, so this switches to use "" as the
> default, indicating that TLS will not be used
> 
> The tls-hostname parameter has a default value of NULL indicating
> the the hostname from the migrate connection URI should be used.
> Again, once tls-hostname is set non-NULL, to override the default
> hostname for x509 cert validation, it isn't possible to reset it
> back to NULL via the monitor. The empty string is not a valid
> hostname, so this switches to use "" as the default, indicating
> that the migrate URI hostname should be used.
> 
> Using "" as the default for both, also means that the monitor
> commands "info migrate_parameters" / "query-migrate-parameters"
> will report existance of tls-creds/tls-parameters even when set
> to their default values.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  migration/migration.c | 4 
>  migration/tls.c   | 2 +-
>  qapi-schema.json  | 4 
>  3 files changed, 9 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake 

And still leaves the door open to future growth if we want to add
"foo":null for resetting a value to default in 2.10.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 1/9] configure: change CONFIG_XEN_BACKEND to be a target property

2017-03-15 Thread Stefano Stabellini
On Wed, 15 Mar 2017, Paolo Bonzini wrote:
> On 14/03/2017 21:23, Stefano Stabellini wrote:
> > On Tue, 14 Mar 2017, Stefano Stabellini wrote:
> >>> Then you add to Makefile:
> >>>
> >>>  CONFIG_SOFTMMU := $(if $(filter %-softmmu,$(TARGET_DIRS)),y)
> >>>  CONFIG_USER_ONLY := $(if $(filter %-user,$(TARGET_DIRS)),y)
> >>> +CONFIG_XEN := $(CONFIG_XEN_BACKEND)
> >>>  CONFIG_ALL=y
> >>>  -include config-all-devices.mak
> >>>  -include config-all-disas.mak
> >>>
> >>> The Makefile change ensures that they are built before descending in the
> >>> target-specific directories.
> >>
> >> But I don't understand this. Please correct me if I am wrong, but this
> >> change looks like it would end up setting CONFIG_XEN every time that
> >> CONFIG_XEN_BACKEND is set. Without the configure change at the top, it
> >> would end up setting CONFIG_XEN whenever the host supports Xen, even for
> >> non-x86 and non-ARM targets. What am I missing?
> 
> This CONFIG_XEN assignment applies to the toplevel only, i.e. to files
> that are built once.  Targets will still take CONFIG_XEN from
> config-target.mak, and it will not be set for non-x86/non-ARM targets.
> This CONFIG_XEN assignment applies to files that are compiled once.
> 
> The issue you reported here:
> 
> >   LINKaarch64-softmmu/qemu-system-aarch64
> > ../hw/9pfs/xen-9p-backend.o: In function `xen_9pfs_alloc':
> > /local/qemu/hw/9pfs/xen-9p-backend.c:387: undefined reference to 
> > `xenstore_write_be_str'
> > /local/qemu/hw/9pfs/xen-9p-backend.c:388: undefined reference to 
> > `xenstore_write_be_int'
> 
> is because you need this in patch 9:
> 
> -common-obj-$(CONFIG_XEN_BACKEND) += xen-9p-backend.o
> +common-obj-$(CONFIG_XEN) += xen-9p-backend.o
> 

/me shakes his head in shame.
Thank you for the explanation!



Re: [Qemu-devel] [PULL 0/2] Miscellaneous patches for 2017-03-15

2017-03-15 Thread Peter Maydell
On 15 March 2017 at 13:18, Markus Armbruster  wrote:
> The following changes since commit d84f714eafedd8bb9d4aaec8b76417bef8e3535e:
>
>   Update version for v2.9.0-rc0 release (2017-03-14 19:18:23 +)
>
> are available in the git repository at:
>
>   git://repo.or.cz/qemu/armbru.git tags/pull-misc-2017-03-15
>
> for you to fetch changes up to 4d0e72396b69656f36f484b54ffe64893d793a80:
>
>   coverity-model: model address_space_read/write (2017-03-15 13:59:16 +0100)
>
> 
> Miscellaneous patches for 2017-03-15
>
> 
> Markus Armbruster (1):
>   tests: Use error_free_or_abort() where appropriate
>
> Paolo Bonzini (1):
>   coverity-model: model address_space_read/write
>
>  scripts/coverity-model.c| 17 +
>  tests/test-qemu-opts.c  |  3 +--
>  tests/test-qobject-output-visitor.c |  6 ++
>  3 files changed, 16 insertions(+), 10 deletions(-)
>
> Markus Armbruster (1):
>   tests: Use error_free_or_abort() where appropriate
>
> Paolo Bonzini (1):
>   coverity-model: model address_space_read/write

Applied, thanks.

-- PMM



Re: [Qemu-devel] [PATCH v2 2/9] xen: import ring.h from xen

2017-03-15 Thread Stefano Stabellini
On Wed, 15 Mar 2017, Greg Kurz wrote:
> On Mon, 13 Mar 2017 16:55:53 -0700
> Stefano Stabellini  wrote:
> 
> > Do not use the ring.h header installed on the system. Instead, import
> > the header into the QEMU codebase. This avoids problems when QEMU is
> > built against a Xen version too old to provide all the ring macros.
> > 
> 
> What kind of problems ?

The FLEX macros are only available in Xen 4.9+ (still unreleased).
However, aside from these macros, the Xen 9pfs frontends and backends
could work fine on any Xen releases. In fact, the Xen public/io headers
are only provided as reference.


> > Signed-off-by: Stefano Stabellini 
> > CC: anthony.per...@citrix.com
> > CC: jgr...@suse.com
> > ---
> > NB: The new macros have not been committed to Xen yet. Do not apply this
> > patch until they do.
> 
> Why ? Does this break compat with older Xen ?

No, it does not break compatibility. But I think it is a good idea to
commit the header to QEMU only after the corresponding Xen header has
been accepted. I don't want the two to diverge.


> > ---
> > ---
> >  hw/block/xen_blkif.h |   2 +-
> >  hw/usb/xen-usb.c |   2 +-
> >  include/hw/xen/io/ring.h | 455 
> > +++
> >  3 files changed, 457 insertions(+), 2 deletions(-)
> >  create mode 100644 include/hw/xen/io/ring.h
> > 
> > diff --git a/hw/block/xen_blkif.h b/hw/block/xen_blkif.h
> > index 3300b6f..3e6e1ea 100644
> > --- a/hw/block/xen_blkif.h
> > +++ b/hw/block/xen_blkif.h
> > @@ -1,7 +1,7 @@
> >  #ifndef XEN_BLKIF_H
> >  #define XEN_BLKIF_H
> >  
> > -#include 
> > +#include "hw/xen/io/ring.h"
> >  #include 
> >  #include 
> >  
> > diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
> > index 8e676e6..370b3d9 100644
> > --- a/hw/usb/xen-usb.c
> > +++ b/hw/usb/xen-usb.c
> > @@ -33,7 +33,7 @@
> >  #include "qapi/qmp/qint.h"
> >  #include "qapi/qmp/qstring.h"
> >  
> > -#include 
> > +#include "hw/xen/io/ring.h"
> >  #include 
> >  
> >  /*
> > diff --git a/include/hw/xen/io/ring.h b/include/hw/xen/io/ring.h
> > new file mode 100644
> > index 000..cf01fc3
> > --- /dev/null
> > +++ b/include/hw/xen/io/ring.h
> > @@ -0,0 +1,455 @@
> > +/**
> > + * ring.h
> > + * 
> > + * Shared producer-consumer ring macros.
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a 
> > copy
> > + * of this software and associated documentation files (the "Software"), to
> > + * deal in the Software without restriction, including without limitation 
> > the
> > + * rights to use, copy, modify, merge, publish, distribute, sublicense, 
> > and/or
> > + * sell copies of the Software, and to permit persons to whom the Software 
> > is
> > + * furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included 
> > in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
> > OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 
> > THE
> > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > + * DEALINGS IN THE SOFTWARE.
> > + *
> > + * Tim Deegan and Andrew Warfield November 2004.
> > + */
> > +
> > +#ifndef __XEN_PUBLIC_IO_RING_H__
> > +#define __XEN_PUBLIC_IO_RING_H__
> > +
> > +#if __XEN_INTERFACE_VERSION__ < 0x00030208
> > +#define xen_mb()  mb()
> > +#define xen_rmb() rmb()
> > +#define xen_wmb() wmb()
> > +#endif
> > +
> > +typedef unsigned int RING_IDX;
> > +
> > +/* Round a 32-bit unsigned constant down to the nearest power of two. */
> > +#define __RD2(_x)  (((_x) & 0x0002) ? 0x2  : ((_x) & 
> > 0x1))
> > +#define __RD4(_x)  (((_x) & 0x000c) ? __RD2((_x)>>2)<<2: __RD2(_x))
> > +#define __RD8(_x)  (((_x) & 0x00f0) ? __RD4((_x)>>4)<<4: __RD4(_x))
> > +#define __RD16(_x) (((_x) & 0xff00) ? __RD8((_x)>>8)<<8: __RD8(_x))
> > +#define __RD32(_x) (((_x) & 0x) ? __RD16((_x)>>16)<<16 : 
> > __RD16(_x))
> > +
> > +/*
> > + * Calculate size of a shared ring, given the total available space for the
> > + * ring and indexes (_sz), and the name tag of the request/response 
> > structure.
> > + * A ring contains as many entries as will fit, rounded down to the 
> > nearest 
> > + * power of two (so we can mask with (size-1) to loop around).
> > + */
> > +#define __CONST_RING_SIZE(_s, _sz) \
> > +(__RD32(((_sz) - offsetof(struct _s##_sring, ring)) / \
> > +   sizeof(((struct _s##_sring *)0)->ring[0])))
> > +/*
> > + * The same for passing in an actual pointer instead 

[Qemu-devel] [PATCH] Default to GSSAPI (Kerberos) instead of DIGEST-MD5 for SASL

2017-03-15 Thread Daniel P. Berrange
RFC 6331 documents a number of serious security weaknesses in
the SASL DIGEST-MD5 mechanism. As such, QEMU should not be
using or recommending it as a default mechanism for VNC auth
with SASL.

GSSAPI (Kerberos) is the only other viable SASL mechanism that
can provide secure session encryption so enable that by defalt
as the replacement. If users have TLS enabled for VNC, they can
optionally decide to use SCRAM-SHA-1 instead of GSSAPI, allowing
plain username and password auth.

Signed-off-by: Daniel P. Berrange 
---
 qemu-doc.texi | 50 +-
 qemu.sasl | 54 +++---
 2 files changed, 60 insertions(+), 44 deletions(-)

diff --git a/qemu-doc.texi b/qemu-doc.texi
index 794ab4a..922b3a4 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1732,37 +1732,45 @@ SASL service config /etc/sasl2/qemu.conf. If running 
QEMU as an
 unprivileged user, an environment variable SASL_CONF_PATH can be used
 to make it search alternate locations for the service config.
 
-The default configuration might contain
+If the TLS option is enabled for VNC, then it will provide session encryption,
+otherwise the SASL mechanism will have to provide encryption. In the latter
+case the list of possible plugins that can be used is drastically reduced. In
+fact only the GSSAPI SASL mechanism provides an acceptable level of security
+by modern standards. Previous versions of QEMU referred to the DIGEST-MD5
+mechansim, however, it has multiple serious flaws described in detail in
+RFC 6331 and thus should never be used any more. The SCRAM-SHA-1 mechanism
+provides a simple username/password auth facility similar to DIGEST-MD5, but
+does not support session encryption, so can only be used in conbination with
+TLS.
+
+When not using TLS the recommended configuration is
 
 @example
-mech_list: digest-md5
-sasldb_path: /etc/qemu/passwd.db
+mech_list: gssapi
+keytab: /etc/qemu/krb5.tab
 @end example
 
-This says to use the 'Digest MD5' mechanism, which is similar to the HTTP
-Digest-MD5 mechanism. The list of valid usernames & passwords is maintained
-in the /etc/qemu/passwd.db file, and can be updated using the saslpasswd2
-command. While this mechanism is easy to configure and use, it is not
-considered secure by modern standards, so only suitable for developers /
-ad-hoc testing.
+This says to use the 'GSSAPI' mechanism with the Kerberos v5 protocol, with
+the server principal stored in /etc/qemu/krb5.tab. For this to work the
+administrator of your KDC must generate a Kerberos principal for the server,
+with a name of  'qemu/somehost.example.com@@EXAMPLE.COM' replacing
+'somehost.example.com' with the fully qualified host name of the machine
+running QEMU, and 'EXAMPLE.COM' with the Kerberos Realm.
 
-A more serious deployment might use Kerberos, which is done with the 'gssapi'
-mechanism
+When using TLS, if username+password authentication is desired, then a
+reasonable configuration is
 
 @example
-mech_list: gssapi
-keytab: /etc/qemu/krb5.tab
+mech_list: scram-sha-1
+sasldb_path: /etc/qemu/passwd.db
 @end example
 
-For this to work the administrator of your KDC must generate a Kerberos
-principal for the server, with a name of  
'qemu/somehost.example.com@@EXAMPLE.COM'
-replacing 'somehost.example.com' with the fully qualified host name of the
-machine running QEMU, and 'EXAMPLE.COM' with the Kerberos Realm.
+The saslpasswd2 program can be used to populate the passwd.db file with
+accounts.
 
-Other configurations will be left as an exercise for the reader. It should
-be noted that only Digest-MD5 and GSSAPI provides a SSF layer for data
-encryption. For all other mechanisms, VNC should always be configured to
-use TLS and x509 certificates to protect security credentials from snooping.
+Other SASL configurations will be left as an exercise for the reader. Note that
+all mechanisms except GSSAPI, should be combined with use of TLS to ensure a
+security data channel.
 
 @node gdb_usage
 @section GDB usage
diff --git a/qemu.sasl b/qemu.sasl
index 64fdef3..8c501a1 100644
--- a/qemu.sasl
+++ b/qemu.sasl
@@ -1,36 +1,44 @@
-# If you want to use the non-TLS socket, then you *must* include
-# the GSSAPI or DIGEST-MD5 mechanisms, because they are the only
-# ones that can offer session encryption as well as authentication.
+# If you want to use VNC remotely without TLS, then you *must*
+# pick  a mechanism which provides session encryption as well
+# as authentication.
 #
-# If you're only using TLS, then you can turn on any mechanisms
+# If you are only using TLS, then you can turn on any mechanisms
 # you like for authentication, because TLS provides the encryption
 #
-# Default to a simple username+password mechanism
-# NB digest-md5 is no longer considered secure by current standards
-mech_list: digest-md5
+# If you are only using UNIX, sockets then encryption is not
+# required at all.
+#
+# NB, previously DIGEST-MD5 was set as 

Re: [Qemu-devel] [Bug 1217339] [PATCH] Unix signal to send ACPI-shutdown to Guest

2017-03-15 Thread Daniel P. Berrange
On Wed, Mar 15, 2017 at 06:00:40PM +, Peter Maydell wrote:
> On 15 March 2017 at 17:46, Simon  wrote:
> > OK for not using SIGHUP and keep SIGTERM, SIGINT and SIGHUP to have the
> > same behavior.
> >
> > SIGQUIT is reserved for core files generation.
> >
> > SIGUSR1 is already used in 'util/qemu-progress.c' to trigger a report
> > on ongoing jobs, so it does not seem usable.
> >
> > SIGUSR2 is temporarily used in 'util/coroutine-sigaltstack.c' which
> > takes care however to preserve the original handler. I did not saw
> > any other place where it is used, so it seems to be a better candidate.
> 
> I don't think we can use SIGUSR2 here -- as you say, it's used
> in the sigaltstack version of coroutines, and so there would
> be races if the user tried to use SIGUSR2 to power down the
> machine while we happened to be starting a new coroutine.
> 
> SIGUSR1 is also no good, as it is used by main-loop.c as
> the SIG_IPI.

Which means we'd be into the realm of having to pick  SIGRTMIN + N for
some arbitrary N >= 0


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



[Qemu-devel] [PULL 7/7] virtio-pci: reset modern vq meta data

2017-03-15 Thread Michael S. Tsirkin
From: Jason Wang 

We don't reset proxy->vqs[].{num|desc[]|avail[]|used[]}. This means if
a driver enable the vq without setting vq address after reset. The old
addresses were leaked. Fixing this by resetting modern vq meta data
during device reset.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-pci.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 5ce42af..69cc471 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1857,6 +1857,10 @@ static void virtio_pci_reset(DeviceState *qdev)
 
 for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
 proxy->vqs[i].enabled = 0;
+proxy->vqs[i].num = 0;
+proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
+proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
+proxy->vqs[i].used[0] = proxy->vqs[i].used[1] = 0;
 }
 }
 
-- 
MST




[Qemu-devel] [PULL 4/7] virtio: validate address space cache during init

2017-03-15 Thread Michael S. Tsirkin
From: Jason Wang 

We don't check the return value of address_space_cache_init(), this
may lead buggy driver use incorrect region caches. Instead of
triggering an assert, catch and warn this early in
virtio_init_region_cache().

Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Reviewed-by: Cornelia Huck 
Signed-off-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a00380f..82b6060 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -131,6 +131,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, 
int n)
 VRingMemoryRegionCaches *new;
 hwaddr addr, size;
 int event_size;
+int64_t len;
 
 event_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX) ? 
2 : 0;
 
@@ -140,21 +141,41 @@ static void virtio_init_region_cache(VirtIODevice *vdev, 
int n)
 }
 new = g_new0(VRingMemoryRegionCaches, 1);
 size = virtio_queue_get_desc_size(vdev, n);
-address_space_cache_init(>desc, vdev->dma_as,
- addr, size, false);
+len = address_space_cache_init(>desc, vdev->dma_as,
+   addr, size, false);
+if (len < size) {
+virtio_error(vdev, "Cannot map desc");
+goto err_desc;
+}
 
 size = virtio_queue_get_used_size(vdev, n) + event_size;
-address_space_cache_init(>used, vdev->dma_as,
- vq->vring.used, size, true);
+len = address_space_cache_init(>used, vdev->dma_as,
+   vq->vring.used, size, true);
+if (len < size) {
+virtio_error(vdev, "Cannot map used");
+goto err_used;
+}
 
 size = virtio_queue_get_avail_size(vdev, n) + event_size;
-address_space_cache_init(>avail, vdev->dma_as,
- vq->vring.avail, size, false);
+len = address_space_cache_init(>avail, vdev->dma_as,
+   vq->vring.avail, size, false);
+if (len < size) {
+virtio_error(vdev, "Cannot map avail");
+goto err_avail;
+}
 
 atomic_rcu_set(>vring.caches, new);
 if (old) {
 call_rcu(old, virtio_free_region_cache, rcu);
 }
+return;
+
+err_avail:
+address_space_cache_destroy(>used);
+err_used:
+address_space_cache_destroy(>desc);
+err_desc:
+g_free(new);
 }
 
 /* virt queue functions */
-- 
MST




[Qemu-devel] [PULL 3/7] virtio: destroy region cache during reset

2017-03-15 Thread Michael S. Tsirkin
From: Jason Wang 

We don't destroy region cache during reset which can make the maps
of previous driver leaked to a buggy or malicious driver that don't
set vring address before starting to use the device. Fix this by
destroy the region cache during reset and validate it before trying to
see them.

Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Reviewed-by: Cornelia Huck 
Signed-off-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio.c | 45 ++---
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 9164579..a00380f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -185,10 +185,16 @@ static void vring_desc_read(VirtIODevice *vdev, VRingDesc 
*desc,
 virtio_tswap16s(vdev, >next);
 }
 
+static VRingMemoryRegionCaches *vring_get_region_caches(struct VirtQueue *vq)
+{
+VRingMemoryRegionCaches *caches = atomic_rcu_read(>vring.caches);
+assert(caches != NULL);
+return caches;
+}
 /* Called within rcu_read_lock().  */
 static inline uint16_t vring_avail_flags(VirtQueue *vq)
 {
-VRingMemoryRegionCaches *caches = atomic_rcu_read(>vring.caches);
+VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
 hwaddr pa = offsetof(VRingAvail, flags);
 return virtio_lduw_phys_cached(vq->vdev, >avail, pa);
 }
@@ -196,7 +202,7 @@ static inline uint16_t vring_avail_flags(VirtQueue *vq)
 /* Called within rcu_read_lock().  */
 static inline uint16_t vring_avail_idx(VirtQueue *vq)
 {
-VRingMemoryRegionCaches *caches = atomic_rcu_read(>vring.caches);
+VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
 hwaddr pa = offsetof(VRingAvail, idx);
 vq->shadow_avail_idx = virtio_lduw_phys_cached(vq->vdev, >avail, 
pa);
 return vq->shadow_avail_idx;
@@ -205,7 +211,7 @@ static inline uint16_t vring_avail_idx(VirtQueue *vq)
 /* Called within rcu_read_lock().  */
 static inline uint16_t vring_avail_ring(VirtQueue *vq, int i)
 {
-VRingMemoryRegionCaches *caches = atomic_rcu_read(>vring.caches);
+VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
 hwaddr pa = offsetof(VRingAvail, ring[i]);
 return virtio_lduw_phys_cached(vq->vdev, >avail, pa);
 }
@@ -220,7 +226,7 @@ static inline uint16_t vring_get_used_event(VirtQueue *vq)
 static inline void vring_used_write(VirtQueue *vq, VRingUsedElem *uelem,
 int i)
 {
-VRingMemoryRegionCaches *caches = atomic_rcu_read(>vring.caches);
+VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
 hwaddr pa = offsetof(VRingUsed, ring[i]);
 virtio_tswap32s(vq->vdev, >id);
 virtio_tswap32s(vq->vdev, >len);
@@ -231,7 +237,7 @@ static inline void vring_used_write(VirtQueue *vq, 
VRingUsedElem *uelem,
 /* Called within rcu_read_lock().  */
 static uint16_t vring_used_idx(VirtQueue *vq)
 {
-VRingMemoryRegionCaches *caches = atomic_rcu_read(>vring.caches);
+VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
 hwaddr pa = offsetof(VRingUsed, idx);
 return virtio_lduw_phys_cached(vq->vdev, >used, pa);
 }
@@ -239,7 +245,7 @@ static uint16_t vring_used_idx(VirtQueue *vq)
 /* Called within rcu_read_lock().  */
 static inline void vring_used_idx_set(VirtQueue *vq, uint16_t val)
 {
-VRingMemoryRegionCaches *caches = atomic_rcu_read(>vring.caches);
+VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
 hwaddr pa = offsetof(VRingUsed, idx);
 virtio_stw_phys_cached(vq->vdev, >used, pa, val);
 address_space_cache_invalidate(>used, pa, sizeof(val));
@@ -249,7 +255,7 @@ static inline void vring_used_idx_set(VirtQueue *vq, 
uint16_t val)
 /* Called within rcu_read_lock().  */
 static inline void vring_used_flags_set_bit(VirtQueue *vq, int mask)
 {
-VRingMemoryRegionCaches *caches = atomic_rcu_read(>vring.caches);
+VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
 VirtIODevice *vdev = vq->vdev;
 hwaddr pa = offsetof(VRingUsed, flags);
 uint16_t flags = virtio_lduw_phys_cached(vq->vdev, >used, pa);
@@ -261,7 +267,7 @@ static inline void vring_used_flags_set_bit(VirtQueue *vq, 
int mask)
 /* Called within rcu_read_lock().  */
 static inline void vring_used_flags_unset_bit(VirtQueue *vq, int mask)
 {
-VRingMemoryRegionCaches *caches = atomic_rcu_read(>vring.caches);
+VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
 VirtIODevice *vdev = vq->vdev;
 hwaddr pa = offsetof(VRingUsed, flags);
 uint16_t flags = virtio_lduw_phys_cached(vq->vdev, >used, pa);
@@ -279,7 +285,7 @@ static inline void vring_set_avail_event(VirtQueue *vq, 
uint16_t val)
 return;
 }
 
-caches = atomic_rcu_read(>vring.caches);
+caches = vring_get_region_caches(vq);
 pa 

[Qemu-devel] [PULL 6/7] Revert "virtio: unbreak virtio-pci with IOMMU after caching ring translations"

2017-03-15 Thread Michael S. Tsirkin
From: Jason Wang 

This reverts commit
96a8821d21411f10d77ea994af369c6e5c35a2cc. Previous patch is a better
solution which does not require a strict order between virtio and IOMMU.

CC: Paolo Bonzini 
Signed-off-by: Jason Wang 
---
 hw/virtio/virtio-pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index b76f3f6..5ce42af 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1153,7 +1153,7 @@ static AddressSpace *virtio_pci_get_dma_as(DeviceState *d)
 VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
 PCIDevice *dev = >pci_dev;
 
-return pci_device_iommu_address_space(dev);
+return pci_get_address_space(dev);
 }
 
 static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
-- 
MST




[Qemu-devel] [PULL 1/7] Bugfix: Handle error if VM Generation ID device not present

2017-03-15 Thread Michael S. Tsirkin
From: Ben Warren 

This was crashing due to NULL-pointer dereference

QMP Test case:
==

(QEMU) query-vm-generation-id
{"error": {"class": "GenericError", "desc": "VM Generation ID device not
found"}}

HMP Test case:
==
virsh # qemu-monitor-command --hmp 3 info vm-generation-id
VM Generation ID device not found

Signed-off-by: Ben Warren 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Eric Blake 
---
 hmp.c | 4 +++-
 hw/acpi/vmgenid.c | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index 261843f..edb8970 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2608,9 +2608,11 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict 
*qdict)
 
 void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict)
 {
-GuidInfo *info = qmp_query_vm_generation_id(NULL);
+Error *err = NULL;
+GuidInfo *info = qmp_query_vm_generation_id();
 if (info) {
 monitor_printf(mon, "%s\n", info->guid);
 }
+hmp_handle_error(mon, );
 qapi_free_GuidInfo(info);
 }
diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
index 744f284..7a3ad17 100644
--- a/hw/acpi/vmgenid.c
+++ b/hw/acpi/vmgenid.c
@@ -248,6 +248,7 @@ GuidInfo *qmp_query_vm_generation_id(Error **errp)
 Object *obj = find_vmgenid_dev();
 
 if (!obj) {
+error_setg(errp, "VM Generation ID device not found");
 return NULL;
 }
 vms = VMGENID(obj);
-- 
MST




[Qemu-devel] [PULL 5/7] pci: introduce a bus master container

2017-03-15 Thread Michael S. Tsirkin
From: Jason Wang 

96a8821d2141 ("virtio: unbreak virtio-pci with IOMMU after caching ring
translations") tries to make IOMMU works with virtio memory region
cache, but it requires IOMMU to be created before any virtio
devices. This is sub optimal, fixing this by introduce a bus master
container to make sure address space can be initialized during device
registering, and then we can safely set alias and make
bus_master_enable_region as its subregion during bus master
initialization.

Cc: Paolo Bonzini 
Signed-off-by: Jason Wang 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/pci/pci.h | 1 +
 hw/pci/pci.c | 9 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 9349acb..713ede0 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -284,6 +284,7 @@ struct PCIDevice {
 char name[64];
 PCIIORegion io_regions[PCI_NUM_REGIONS];
 AddressSpace bus_master_as;
+MemoryRegion bus_master_container_region;
 MemoryRegion bus_master_enable_region;
 
 /* do not access the following fields */
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 273f1e4..ad46390 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -88,8 +88,8 @@ static void pci_init_bus_master(PCIDevice *pci_dev)
  OBJECT(pci_dev), "bus master",
  dma_as->root, 0, 
memory_region_size(dma_as->root));
 memory_region_set_enabled(_dev->bus_master_enable_region, false);
-address_space_init(_dev->bus_master_as,
-   _dev->bus_master_enable_region, pci_dev->name);
+memory_region_add_subregion(_dev->bus_master_container_region, 0,
+_dev->bus_master_enable_region);
 }
 
 static void pcibus_machine_done(Notifier *notifier, void *data)
@@ -995,6 +995,11 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 pci_dev->devfn = devfn;
 pci_dev->requester_id_cache = pci_req_id_cache_get(pci_dev);
 
+memory_region_init(_dev->bus_master_container_region, OBJECT(pci_dev),
+   "bus master container", UINT64_MAX);
+address_space_init(_dev->bus_master_as,
+   _dev->bus_master_container_region, pci_dev->name);
+
 if (qdev_hotplug) {
 pci_init_bus_master(pci_dev);
 }
-- 
MST




[Qemu-devel] [PULL 0/7] virtio, pc: fixes

2017-03-15 Thread Michael S. Tsirkin
The following changes since commit d84f714eafedd8bb9d4aaec8b76417bef8e3535e:

  Update version for v2.9.0-rc0 release (2017-03-14 19:18:23 +)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to 60a8d8023473dd24957b3a66824f66cd35b80d64:

  virtio-pci: reset modern vq meta data (2017-03-15 19:59:18 +0200)


virtio, pc: fixes

Some fixes to fallback from using virtio caching,
pls a minor vm gen id fix.

Signed-off-by: Michael S. Tsirkin 


Ben Warren (1):
  Bugfix: Handle error if VM Generation ID device not present

Jason Wang (6):
  virtio: guard against NULL pfn
  virtio: destroy region cache during reset
  virtio: validate address space cache during init
  pci: introduce a bus master container
  Revert "virtio: unbreak virtio-pci with IOMMU after caching ring 
translations"
  virtio-pci: reset modern vq meta data

 include/hw/pci/pci.h   |   1 +
 hmp.c  |   4 +-
 hw/acpi/vmgenid.c  |   1 +
 hw/pci/pci.c   |   9 -
 hw/virtio/virtio-pci.c |   6 ++-
 hw/virtio/virtio.c | 104 +++--
 6 files changed, 100 insertions(+), 25 deletions(-)




[Qemu-devel] [PULL 2/7] virtio: guard against NULL pfn

2017-03-15 Thread Michael S. Tsirkin
From: Jason Wang 

To avoid access stale memory region cache after reset, this patch
check the existence of virtqueue pfn for all exported virtqueue access
helpers before trying to use them.

Cc: Cornelia Huck 
Cc: Paolo Bonzini 
Reviewed-by: Cornelia Huck 
Signed-off-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index efce4b3..9164579 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -318,6 +318,10 @@ int virtio_queue_ready(VirtQueue *vq)
  * Called within rcu_read_lock().  */
 static int virtio_queue_empty_rcu(VirtQueue *vq)
 {
+if (unlikely(!vq->vring.avail)) {
+return 1;
+}
+
 if (vq->shadow_avail_idx != vq->last_avail_idx) {
 return 0;
 }
@@ -329,6 +333,10 @@ int virtio_queue_empty(VirtQueue *vq)
 {
 bool empty;
 
+if (unlikely(!vq->vring.avail)) {
+return 1;
+}
+
 if (vq->shadow_avail_idx != vq->last_avail_idx) {
 return 0;
 }
@@ -431,6 +439,10 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
 return;
 }
 
+if (unlikely(!vq->vring.used)) {
+return;
+}
+
 idx = (idx + vq->used_idx) % vq->vring.num;
 
 uelem.id = elem->index;
@@ -448,6 +460,10 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 return;
 }
 
+if (unlikely(!vq->vring.used)) {
+return;
+}
+
 /* Make sure buffer is written before we update index. */
 smp_wmb();
 trace_virtqueue_flush(vq, count);
@@ -546,6 +562,16 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int 
*in_bytes,
 int64_t len = 0;
 int rc;
 
+if (unlikely(!vq->vring.desc)) {
+if (in_bytes) {
+*in_bytes = 0;
+}
+if (out_bytes) {
+*out_bytes = 0;
+}
+return;
+}
+
 rcu_read_lock();
 idx = vq->last_avail_idx;
 total_bufs = in_total = out_total = 0;
-- 
MST




Re: [Qemu-devel] [Bug 1217339] [PATCH] Unix signal to send ACPI-shutdown to Guest

2017-03-15 Thread Peter Maydell
On 15 March 2017 at 17:46, Simon  wrote:
> OK for not using SIGHUP and keep SIGTERM, SIGINT and SIGHUP to have the
> same behavior.
>
> SIGQUIT is reserved for core files generation.
>
> SIGUSR1 is already used in 'util/qemu-progress.c' to trigger a report
> on ongoing jobs, so it does not seem usable.
>
> SIGUSR2 is temporarily used in 'util/coroutine-sigaltstack.c' which
> takes care however to preserve the original handler. I did not saw
> any other place where it is used, so it seems to be a better candidate.

I don't think we can use SIGUSR2 here -- as you say, it's used
in the sigaltstack version of coroutines, and so there would
be races if the user tried to use SIGUSR2 to power down the
machine while we happened to be starting a new coroutine.

SIGUSR1 is also no good, as it is used by main-loop.c as
the SIG_IPI.

thanks
-- PMM



Re: [Qemu-devel] [PULL for-2.9 0/1] Block patches

2017-03-15 Thread Peter Maydell
On 15 March 2017 at 05:06, Stefan Hajnoczi  wrote:
> The following changes since commit d84f714eafedd8bb9d4aaec8b76417bef8e3535e:
>
>   Update version for v2.9.0-rc0 release (2017-03-14 19:18:23 +)
>
> are available in the git repository at:
>
>   git://github.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to 9dc44aa5829eb3131a01378a738dee28a382bbc1:
>
>   os: don't corrupt pre-existing memory-backend data with prealloc 
> (2017-03-15 11:55:41 +0800)
>
> 
>
> 
>
> Daniel P. Berrange (1):
>   os: don't corrupt pre-existing memory-backend data with prealloc
>
>  util/oslib-posix.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)

Applied, thanks.

-- PMM



Re: [Qemu-devel] [Bug 1217339] [PATCH] Unix signal to send ACPI-shutdown to Guest

2017-03-15 Thread Simon

Daniel P. Berrange:


While I understand your motivation this creates a semantic change for
existing users of QEMU. IOW anyone who is currently relying on use
of SIGHUP will experiance a regression when upgrading QEMU.

So if we want to signal to generate a clean shutdown, we need to pick
one that QEMU hasn't already set a specific behaviour for.

SIGQUIT could be a valid option, or the super generic SIGUSR1

Regards,
Daniel


Thanks for your answer Daniel.

OK for not using SIGHUP and keep SIGTERM, SIGINT and SIGHUP to have the
same behavior.

SIGQUIT is reserved for core files generation.

SIGUSR1 is already used in 'util/qemu-progress.c' to trigger a report
on ongoing jobs, so it does not seem usable.

SIGUSR2 is temporarily used in 'util/coroutine-sigaltstack.c' which
takes care however to preserve the original handler. I did not saw
any other place where it is used, so it seems to be a better candidate.

Here is a second version of my patch using SIGUSR2 to cleanly power off
the guest:

Signed-off-by: Simon Geusebroek 
---
diff -ru qemu-2.8.0/os-posix.c qemu-new/os-posix.c
--- qemu-2.8.0/os-posix.c   2016-12-20 21:16:48.0 +0100
+++ qemu-new/os-posix.c 2017-03-15 17:45:28.290737575 +0100
@@ -72,6 +72,7 @@
 sigaction(SIGINT,  , NULL);
 sigaction(SIGHUP,  , NULL);
 sigaction(SIGTERM, , NULL);
+sigaction(SIGUSR2, , NULL);
 }

 /* Find a likely location for support files using the location of the 
binary.

diff -ru qemu-2.8.0/vl.c qemu-new/vl.c
--- qemu-2.8.0/vl.c 2016-12-20 21:16:54.0 +0100
+++ qemu-new/vl.c   2017-03-15 17:45:20.866737459 +0100
@@ -1871,7 +1871,11 @@
 /* Cannot call qemu_system_shutdown_request directly because
  * we are in a signal handler.
  */
-shutdown_requested = 1;
+if (signal == SIGUSR2) {
+powerdown_requested = 1;
+} else {
+shutdown_requested = 1;
+}
 qemu_notify_event();
 }

--




Re: [Qemu-devel] [PATCH v2] migration/block: Avoid invoking blk_drain too frequently

2017-03-15 Thread Dr. David Alan Gilbert
* Fam Zheng (f...@redhat.com) wrote:
> On Wed, 03/15 11:37, Lidong Chen wrote:
> > Increase bmds->cur_dirty after submit io, so reduce the frequency
> > involve into blk_drain, and improve the performance obviously
> > when block migration.
> > 
> > The performance test result of this patch:
> > 
> > During the block dirty save phase, this patch improve guest os IOPS
> > from 4.0K to 9.5K. and improve the migration speed from
> > 505856 rsec/s to 855756 rsec/s.
> > 
> > Signed-off-by: Lidong Chen 
> > ---
> >  migration/block.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/migration/block.c b/migration/block.c
> > index 6741228..7734ff7 100644
> > --- a/migration/block.c
> > +++ b/migration/block.c
> > @@ -576,6 +576,9 @@ static int mig_save_device_dirty(QEMUFile *f, 
> > BlkMigDevState *bmds,
> >  }
> >  
> >  bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, sector, 
> > nr_sectors);
> > +sector += nr_sectors;
> > +bmds->cur_dirty = sector;
> > +
> >  break;
> >  }
> >  sector += BDRV_SECTORS_PER_DIRTY_CHUNK;
> > -- 
> > 1.8.3.1
> > 
> 
> Nice catch above all, thank you!
> 
> Reviewed-by: Fam Zheng 

Are you taking that via a block pull?

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v2 2/2] virtio-scsi: Fix acquire/release in dataplane handlers

2017-03-15 Thread Ed Swierk
On Tue, Mar 14, 2017 at 8:36 AM, Fam Zheng  wrote:
> After the AioContext lock push down, there is a race between
> virtio_scsi_dataplane_start and those "assert(s->ctx &&
> s->dataplane_started)", because the latter doesn't isn't wrapped in
> aio_context_acquire.
>
> Reproducer is simply booting a Fedora guest with an empty
> virtio-scsi-dataplane controller:
>
> qemu-system-x86_64 \
>   -drive 
> if=none,id=root,format=raw,file=Fedora-Cloud-Base-25-1.3.x86_64.raw \
>   -device virtio-scsi \
>   -device scsi-disk,drive=root,bootindex=1 \
>   -object iothread,id=io \
>   -device virtio-scsi-pci,iothread=io \
>   -net user,hostfwd=tcp::10022-:22 -net nic,model=virtio -m 2048 \
>   --enable-kvm
>
> Fix this by moving acquire/release pairs from virtio_scsi_handle_*_vq to
> their callers - and wrap the broken assertions in.
>
> Signed-off-by: Fam Zheng 

Verified this fixes the assertion failure on 2.9.0-rc0.

Tested-by: Ed Swierk 



Re: [Qemu-devel] [PULL 0/2] submodule-update queue 20170303

2017-03-15 Thread Jeff Cody
On Wed, Mar 15, 2017 at 04:51:07PM +, Peter Maydell wrote:
> On 15 March 2017 at 15:30, James Hanley  wrote:
> > It would appear that the HTTP versions of the repo URLs are /not/
> > referencing the same backend
> >
> > jim@jim-VirtualBox:~$ git ls-remote http://git.qemu-project.org/git/dtc.git
> > 3b9c97093d6e1067f4d24d2bff32f3dd24e0751e HEAD
> > 827ac8eca016c39b13fc916bbdb16f9f2fe3c34c refs/heads/expressions
> > 3b9c97093d6e1067f4d24d2bff32f3dd24e0751e refs/heads/master
> > 17dab7023959256224e800dd77cae57d8ecfaec0 refs/heads/multi-v1-tags
> > 5426757b714e4c142da488fe685220b732f69d7b refs/heads/testing
> > 827ac8eca016c39b13fc916bbdb16f9f2fe3c34c refs/remotes/github/expressions
> > 3b9c97093d6e1067f4d24d2bff32f3dd24e0751e refs/remotes/github/master
> > 17dab7023959256224e800dd77cae57d8ecfaec0 refs/remotes/github/multi-v1-tags
> > 5426757b714e4c142da488fe685220b732f69d7b refs/remotes/github/testing
> > 827ac8eca016c39b13fc916bbdb16f9f2fe3c34c refs/remotes/origin/expressions
> > 120775eb1cf39f8dcecd695c3ff1cfef8aeb669d refs/remotes/origin/master
> > 17dab7023959256224e800dd77cae57d8ecfaec0 refs/remotes/origin/multi-v1-tags
> > f5aa792d81f5911eff088e4f88c0cd0a11ea9ca0 refs/tags/dwg-last
> > 0a1018321b08f89d0f1942c77802aa777a82d437 refs/tags/v1.0.0
> > 5cb1fbdd7cf82e1909e27c81073cf3272cb63fa3 refs/tags/v1.0.0^{}
> > 8e4751ca3600a2d82365e7e9d806f2bab9b81d56 refs/tags/v1.0.0-rc1
> > 74ce242bf3307c7ec77b9ddfff443c247ac8c0a3 refs/tags/v1.0.0-rc1^{}
> > 38738612dec55c0262de2192cbe655f499b8c5de refs/tags/v1.1.0
> > 202863e4dd681d17c06a82943f49485bf7860633 refs/tags/v1.1.0^{}
> > 2d38c152a6cbcd6fcd7a2f2535d3bc8860c975f9 refs/tags/v1.1.0-rc1
> > 7364cc79b5fa11e416dce01802139bc87d690118 refs/tags/v1.1.0-rc1^{}
> > 427b0062114703674688aa581d13499b1b2da896 refs/tags/v1.2.0
> > 52c356d81b1b5b5426f53655e782c37793c3637e refs/tags/v1.2.0^{}
> > 33ea8e2705c6905edcabda65dfa92af56716056b refs/tags/v1.2.0-rc1
> > f8bf4bfc8796b46e6086a52f0cd6c1f9ed58645a refs/tags/v1.2.0-rc1^{}
> > a532a5d2148e6a644bb56f7aa3d29297d19e30de refs/tags/v1.2.0-rc2
> > 17773b0e5148c5ae281ee21492c871292cb7de20 refs/tags/v1.2.0-rc2^{}
> > 00e38ce99a600e146aa20eac082b8d7d8ec70711 refs/tags/v1.3.0
> > bc895d6d09695d05ceb8b52486ffe861d6cfbdde refs/tags/v1.3.0^{}
> > 6d109a2e4885896d2665d4bbcc5bc985110b0950 refs/tags/v1.4.0
> > 65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf refs/tags/v1.4.0^{}
> > 29a9b5177c0bc192f7881b940932f903aca9c360 refs/tags/v1.4.1
> > 302fca9f4c283e1994cf0a5a9ce1cf43ca15e6d2 refs/tags/v1.4.1^{}
> > 24ec6e01bca89179d744d836fe94f2b459abd03d refs/tags/v1.4.2
> > ec02b34c05be04f249ffaaca4b666f5246877dea refs/tags/v1.4.2^{}
> > jim@jim-VirtualBox:~$
> >
> >
> > Note that the tags for 1.4.3 and 1.4.4 are missing... Are the tags not
> > "fetched" with the synchronization?
> 
> Jeff, Stefan -- is something odd with the HTTP versions here?
> 

Hi,

Yes - the post-update hooks to populate the info/refs file (needed for http)
is  not invokled when we update from our script.  I just did it manually for
dtc.git, and I am now updating the script so that this will happen
automatically.

(The tags should now be present)

Thanks,
-Jeff



Re: [Qemu-devel] [PULL 0/2] submodule-update queue 20170303

2017-03-15 Thread Peter Maydell
On 15 March 2017 at 15:30, James Hanley  wrote:
> It would appear that the HTTP versions of the repo URLs are /not/
> referencing the same backend
>
> jim@jim-VirtualBox:~$ git ls-remote http://git.qemu-project.org/git/dtc.git
> 3b9c97093d6e1067f4d24d2bff32f3dd24e0751e HEAD
> 827ac8eca016c39b13fc916bbdb16f9f2fe3c34c refs/heads/expressions
> 3b9c97093d6e1067f4d24d2bff32f3dd24e0751e refs/heads/master
> 17dab7023959256224e800dd77cae57d8ecfaec0 refs/heads/multi-v1-tags
> 5426757b714e4c142da488fe685220b732f69d7b refs/heads/testing
> 827ac8eca016c39b13fc916bbdb16f9f2fe3c34c refs/remotes/github/expressions
> 3b9c97093d6e1067f4d24d2bff32f3dd24e0751e refs/remotes/github/master
> 17dab7023959256224e800dd77cae57d8ecfaec0 refs/remotes/github/multi-v1-tags
> 5426757b714e4c142da488fe685220b732f69d7b refs/remotes/github/testing
> 827ac8eca016c39b13fc916bbdb16f9f2fe3c34c refs/remotes/origin/expressions
> 120775eb1cf39f8dcecd695c3ff1cfef8aeb669d refs/remotes/origin/master
> 17dab7023959256224e800dd77cae57d8ecfaec0 refs/remotes/origin/multi-v1-tags
> f5aa792d81f5911eff088e4f88c0cd0a11ea9ca0 refs/tags/dwg-last
> 0a1018321b08f89d0f1942c77802aa777a82d437 refs/tags/v1.0.0
> 5cb1fbdd7cf82e1909e27c81073cf3272cb63fa3 refs/tags/v1.0.0^{}
> 8e4751ca3600a2d82365e7e9d806f2bab9b81d56 refs/tags/v1.0.0-rc1
> 74ce242bf3307c7ec77b9ddfff443c247ac8c0a3 refs/tags/v1.0.0-rc1^{}
> 38738612dec55c0262de2192cbe655f499b8c5de refs/tags/v1.1.0
> 202863e4dd681d17c06a82943f49485bf7860633 refs/tags/v1.1.0^{}
> 2d38c152a6cbcd6fcd7a2f2535d3bc8860c975f9 refs/tags/v1.1.0-rc1
> 7364cc79b5fa11e416dce01802139bc87d690118 refs/tags/v1.1.0-rc1^{}
> 427b0062114703674688aa581d13499b1b2da896 refs/tags/v1.2.0
> 52c356d81b1b5b5426f53655e782c37793c3637e refs/tags/v1.2.0^{}
> 33ea8e2705c6905edcabda65dfa92af56716056b refs/tags/v1.2.0-rc1
> f8bf4bfc8796b46e6086a52f0cd6c1f9ed58645a refs/tags/v1.2.0-rc1^{}
> a532a5d2148e6a644bb56f7aa3d29297d19e30de refs/tags/v1.2.0-rc2
> 17773b0e5148c5ae281ee21492c871292cb7de20 refs/tags/v1.2.0-rc2^{}
> 00e38ce99a600e146aa20eac082b8d7d8ec70711 refs/tags/v1.3.0
> bc895d6d09695d05ceb8b52486ffe861d6cfbdde refs/tags/v1.3.0^{}
> 6d109a2e4885896d2665d4bbcc5bc985110b0950 refs/tags/v1.4.0
> 65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf refs/tags/v1.4.0^{}
> 29a9b5177c0bc192f7881b940932f903aca9c360 refs/tags/v1.4.1
> 302fca9f4c283e1994cf0a5a9ce1cf43ca15e6d2 refs/tags/v1.4.1^{}
> 24ec6e01bca89179d744d836fe94f2b459abd03d refs/tags/v1.4.2
> ec02b34c05be04f249ffaaca4b666f5246877dea refs/tags/v1.4.2^{}
> jim@jim-VirtualBox:~$
>
>
> Note that the tags for 1.4.3 and 1.4.4 are missing... Are the tags not
> "fetched" with the synchronization?

Jeff, Stefan -- is something odd with the HTTP versions here?

thanks
-- PMM



Re: [Qemu-devel] [PATCH v2] Change the method to calculate dirty-pages-rate

2017-03-15 Thread Juan Quintela
Chao Fan  wrote:
> In function cpu_physical_memory_sync_dirty_bitmap, file
> include/exec/ram_addr.h:
>
> if (src[idx][offset]) {
> unsigned long bits = atomic_xchg([idx][offset], 0);
> unsigned long new_dirty;
> new_dirty = ~dest[k];
> dest[k] |= bits;
> new_dirty &= bits;
> num_dirty += ctpopl(new_dirty);
> }
>
> After these codes executed, only the pages not dirtied in bitmap(dest),
> but dirtied in dirty_memory[DIRTY_MEMORY_MIGRATION] will be calculated.
> For example:
> When ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] = 0b,
> and atomic_rcu_read(_bitmap_rcu)->bmap = 0b0011,
> the new_dirty will be 0b1100, and this function will return 2 but not
> 4 which is expected.
> the dirty pages in dirty_memory[DIRTY_MEMORY_MIGRATION] are all new,
> so these should be calculated also.
>
> Signed-off-by: Chao Fan 
> Signed-off-by: Li Zhijian 
>
> ---
> v2: Remove the parameter 'num_dirty_pages_init'
> Fix incoming parameters of trace_migration_bitmap_sync_end

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [Qemu-arm] [PATCH v2 0/5] hw: arm: exynos: Bring up secondary CPU + CPUIDLE issue

2017-03-15 Thread Krzysztof Kozlowski
On Wed, Mar 15, 2017 at 08:05:56AM +, Alex Bennée wrote:
> 
> Krzysztof Kozlowski  writes:
> 
> > On Tue, Mar 14, 2017 at 06:24:33PM +, Alex Bennée wrote:
> >> That means you're on a pre-mttcg tree so that's not the reason.
> >
> > I managed to build it on current master (v2.8.0-2182-g5e2fb7c598c6). It
> > behaves the same (with "-accel tcg,thread=single": cannot reach
> > switching to init).
> 
> Interesting. Can you post your exact qemu command line and I'll have a
> go at reproducing.

I build usually with:
../qemu/configure --cc="ccache cc" --cxx="ccache c++" --enable-debug 
--enable-fdt \
--enable-kvm --enable-libusb --enable-libssh2 --enable-lzo \
--enable-bzip2 --enable-curses --enable-gtk  --enable-cap-ng

Running with command:
arm-softmmu/qemu-system-arm -m 1024 -M smdkc210 -smp 2 -append \
'console=ttySAC0,115200n8 console=ttyS0 earlyprintk' -d guest_errors \
-serial stdio -D ..//log-smdkc210.log -kernel ../cur-linux/zImage
-dtb ../cur-linux/dts/exynos4210-smdkv310.dtb
-initrd ../armv7-odroidu3-exynos-v4.10-initramfs.cpio.gz

Plus the -accel line, depending on the needs.

The initramfs is a simple one from Arch Arm.

> 
> In the meantime you can always --enable-debug-tcg your build and see if
> any of the asserts fire.

enable-debug-tcg did not change much - no asserts. The last dmesg is:
[   72.616013] Registering SWP/SWPB emulation handler
[   74.484873] s3c64xx-spi 1394.spi: spi bus clock parent not specified, 
using clock at index 0 as parent
[   74.485994] s3c64xx-spi 1394.spi: number of chip select lines not 
specified, assuming 1 chip select line
[   74.493249] s3c64xx-spi 1394.spi: Failed to get RX DMA channel
[   74.503885] hctosys: unable to open rtc device (rtc0)
[   74.523004] ALSA device list:
[   74.523711]   No soundcards found.
[   74.542278] samsung-uart 1380.serial: DMA request failed, DMA will not 
be used


Best regards,
Krzysztof




Re: [Qemu-devel] [PATCH for 2.9] migration: use "" as the default for tls-creds/hostname

2017-03-15 Thread Dr. David Alan Gilbert
* Daniel P. Berrange (berra...@redhat.com) wrote:
> The tls-creds parameter has a default value of NULL indicating
> that TLS should not be used. Setting it to non-NULL enables
> use of TLS. Once tls-creds are set to a non-NULL value via the
> monitor, it isn't possible to set them back to NULL again, due
> to current implementation limitations. The empty string is not
> a valid QObject identifier, so this switches to use "" as the
> default, indicating that TLS will not be used
> 
> The tls-hostname parameter has a default value of NULL indicating
> the the hostname from the migrate connection URI should be used.
> Again, once tls-hostname is set non-NULL, to override the default
> hostname for x509 cert validation, it isn't possible to reset it
> back to NULL via the monitor. The empty string is not a valid
> hostname, so this switches to use "" as the default, indicating
> that the migrate URI hostname should be used.
> 
> Using "" as the default for both, also means that the monitor
> commands "info migrate_parameters" / "query-migrate-parameters"
> will report existance of tls-creds/tls-parameters even when set
> to their default values.
> 
> Signed-off-by: Daniel P. Berrange 

Yes, simple enough.

Reviewed-by: Dr. David Alan Gilbert 

Markus, Eric - are you OK with that?

Dave


> ---
>  migration/migration.c | 4 
>  migration/tls.c   | 2 +-
>  qapi-schema.json  | 4 
>  3 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 3dab684..54060f7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -110,6 +110,8 @@ MigrationState *migrate_get_current(void)
>  
>  if (!once) {
>  qemu_mutex_init(_migration.src_page_req_mutex);
> +current_migration.parameters.tls_creds = g_strdup("");
> +current_migration.parameters.tls_hostname = g_strdup("");
>  once = true;
>  }
>  return _migration;
> @@ -458,6 +460,7 @@ void migration_channel_process_incoming(MigrationState *s,
>  ioc, object_get_typename(OBJECT(ioc)));
>  
>  if (s->parameters.tls_creds &&
> +*s->parameters.tls_creds &&
>  !object_dynamic_cast(OBJECT(ioc),
>   TYPE_QIO_CHANNEL_TLS)) {
>  Error *local_err = NULL;
> @@ -480,6 +483,7 @@ void migration_channel_connect(MigrationState *s,
>  ioc, object_get_typename(OBJECT(ioc)), hostname);
>  
>  if (s->parameters.tls_creds &&
> +*s->parameters.tls_creds &&
>  !object_dynamic_cast(OBJECT(ioc),
>   TYPE_QIO_CHANNEL_TLS)) {
>  Error *local_err = NULL;
> diff --git a/migration/tls.c b/migration/tls.c
> index 203c11d..45bec44 100644
> --- a/migration/tls.c
> +++ b/migration/tls.c
> @@ -141,7 +141,7 @@ void migration_tls_channel_connect(MigrationState *s,
>  return;
>  }
>  
> -if (s->parameters.tls_hostname) {
> +if (s->parameters.tls_hostname && *s->parameters.tls_hostname) {
>  hostname = s->parameters.tls_hostname;
>  }
>  if (!hostname) {
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 32b4a4b..eb9bf67 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1036,6 +1036,8 @@
>  # credentials must be for a 'server' endpoint. Setting this
>  # will enable TLS for all migrations. The default is unset,
>  # resulting in unsecured migration at the QEMU level. (Since 2.7)
> +# An empty string means that QEMU will use plain text mode for
> +# migration, rather than TLS (Since 2.9)
>  #
>  # @tls-hostname: #optional hostname of the target host for the migration. 
> This
>  #is required when using x509 based TLS credentials and the
> @@ -1043,6 +1045,8 @@
>  #example if using fd: or exec: based migration, the
>  #hostname must be provided so that the server's x509
>  #certificate identity can be validated. (Since 2.7)
> +#An empty string means that QEMU will use the hostname
> +#associated with the migration URI, if any. (Since 2.9)
>  #
>  # @max-bandwidth: to set maximum speed for migration. maximum speed in
>  # bytes per second. (Since 2.8)
> -- 
> 2.9.3
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v2] MAINTAINERS: Add myself as openrisc maintainer

2017-03-15 Thread Alex Bennée

Stafford Horne  writes:

> Jia has claimed he is no longer able to maintain.  I have fixing bugs here
> and there and getting familiar with the code base. Orignal thread from Jia:
>
>  https://lists.librecores.org/pipermail/openrisc/2017-January/000321.html
>
> Signed-off-by: Stafford Horne 

Reviewed-by: Alex Bennée 

> ---
> Changes in v2
>  - fix typo Off/Odd
>
>  MAINTAINERS | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4714df8..b359089 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -196,8 +196,8 @@ F: hw/nios2/
>  F: disas/nios2.c
>
>  OpenRISC
> -M: Jia Liu 
> -S: Maintained
> +M: Stafford Horne 
> +S: Odd Fixes
>  F: target/openrisc/
>  F: hw/openrisc/
>  F: tests/tcg/openrisc/


--
Alex Bennée



Re: [Qemu-devel] KVM call for 2017-03-14

2017-03-15 Thread Greg Kurz
On Wed, 15 Mar 2017 12:25:38 +0100
Laurent Vivier  wrote:

> Le 15/03/2017 à 11:29, Greg Kurz a écrit :
> > On Tue, 14 Mar 2017 11:56:36 +0100
> > Peter Maydell  wrote:
> >   
> >> On 14 March 2017 at 09:59, Juan Quintela  wrote:  
> >>> Peter Maydell  wrote:
>  On 14 March 2017 at 09:13, Stefan Hajnoczi  wrote:   
>   
> > On Mon, Mar 13, 2017 at 11:02:01AM +0100, Peter Maydell wrote:
> > The minimum requirements for the new language:
> > 1. Does it support the host operating systems that QEMU runs on?
> > 2. Does it support the host architectures that QEMU runs on?
> 
>  Speaking of this, I was thinking that we should introduce
>  a rule that for any host OS/arch we support we must have
>  a build machine so we can at least do a compile test.
>  For instance if you believe configure we support Solaris
>  and AIX, but I bet they're bit-rotting. The ia64 backend
>  has to be a strong candidate for being dumped too.
>  Demanding "system we can test on or we drop support"
>  would let us more clearly see what we're actually running
>  on and avoid unnecessarily ruling things out because they
>  don't support Itanium or AIX...
> >>>
> >>> YES, YES and YES.
> >>>
> >>> I demand an osX build machine NOW  Remote access is ok.
> >>
> >> OSX is actually in the set that's OK because I have a
> >> machine I can test on. The ones that are problems are
> >> all the BSDs, AIX, Solaris, Haiku, and architectures  
> > 
> > The most relevant links I could find about AIX hosts only mention 
> > QEMU 0.9.1:
> > 
> > http://www.perzl.org/aix/index.php?n=Main.Qemu
> > http://www.vivier.eu/Qemu/
> > 
> > I'm not aware of any effort within IBM to support newer versions
> > of QEMU on an AIX host (Cc'ing Laurent in case he would be aware
> > of a non-IBM initiative).
> >  
> 
> At this time it was a personal initiative, made possible because I have
> worked at Bull on the port of GNOME to AIX and had access to some Bull
> AIX Servers.
> 
> Now this effort has been moved to the AIX Toolbox for Linux Applications:
> 
> http://www-03.ibm.com/systems/power/software/aix/linux/
> 

Heh :)

> Perhaps you can try to contact someone inside IBM about this toolbox?
> 

Not even sure if it's worth the pain as I could not find any trace of QEMU
in the download pages:

http://www-03.ibm.com/systems/power/software/aix/linux/toolbox/alpha.html
https://ftp.software.ibm.com/aix/freeSoftware/aixtoolbox/RPMS/ppc/

Cheers.

--
Greg

> Laurent
> 
> 


pgp5Z7uMhu39r.pgp
Description: OpenPGP digital signature


[Qemu-devel] need help

2017-03-15 Thread oussema ben khedher
hi i m using the log in_asm to get the assembly code of TB so when i searched 
inside qemu i find that the function "log_target_disas its the responsible to 
display assembly instruction but the problem that i did not find how really it 
works because i need to get all the branch address 
thank you 


Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"

2017-03-15 Thread Alex Bennée

Gerd Hoffmann  writes:

>   Hi,
>
>> Instead of having MMIO register spaces we use the dirty tracking
>> mechanism. Here regions are marked as for dirty tracking and when the
>> SoftMMU helper first comes to this bit of memory it will follow the slow
>> path and mark region as visited.
>
> visited?  Do you mean dirty bit is set for the page?  Or is this
> something else?

Yes the dirty page bit (or the clearing of the TLB_NOTDIRTY bit from the
SoftMMU entry).

>
>> Once done this bit is cleared and all
>> future writes to that page are written directly from the translated
>> code. This no longer has an implicit synchronisation from the BQL so
>> there is now a race and you can have memory being updated which might
>> miss this flagging.
>
> Not fully following what you are trying to say.  But pages being updated
> without dirty bit getting set (if clear) certainly is a problem for the
> vga emulation (and live migration too).
>
>> Note that KVM has some similar hacks to avoid trapping all writes to
>> video memory with its coalesced mmio mechanism however I'm not familiar
>> with all the details.
>
> Normal linear framebuffer access doesn't use this.

Ahh OK - as I said I wasn't super familiar with what coalesced mmio was
trying to achieve. I assume it is trying to avoid trapping on every
single MMIO access?

>
>> Now there are mechanisms we can use to ensure there are no races happen
>> and return to the situation that the display is only updated when the
>> TCG cores are not running.
>
> tcg and display updates running in parallel isn't a problem, we have
> that with kvm anyway.  Dirty bit handling must be correct though.
>
> With kvm at the start of each display update vga fetches the dirty
> bitmap from the kernel (memory_region_sync_dirty_bitmap).  Then it goes
> use memory_region_get_dirty to figure which pages have been touched.
>
> When memory_region_sync_dirty_bitmap is called the kernel will clear the
> memory bitmap of the region and also map all pages read-only.  Next
> guest update will pagefault and the kernel can set the dirty bit for the
> page (maybe there is a more efficient way with EPT available).
>
> I suspect the memory_region_sync_dirty_bitmap call on tcg should reset
> the fast path optimization, so the slow path can update the dirty bits
> correctly.

Sure. And for the low level cputlb implementation we can clear those
bits atomically. However when the memory region is synced we also need
to flush any entries in the TLB that have already been hit and cleared
TLB_NOTDIRTY to we can trigger the slow path again. This is tricky from
outside of a vCPU context because we can't just queue the work and exit
the vCPU run loop to reach a known CPU state.

The RFC PATCH I sent earlier solves this by ensuring the update runs in
a quiescent period (i.e. when the vCPUs are not running) but it is
sub-optimal as it means the display code has to have a basic knowledge
of vCPUs and when they run.

--
Alex Bennée



[Qemu-devel] [Bug 1673130] Re: qemu 2.7.0 receives SIGABRT in qemu_coroutine_enter()

2017-03-15 Thread Mohammed Gamal
Another stack trace:

--
(gdb) bt
#0  0x7f2f34690067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f2f34691448 in __GI_abort () at abort.c:89
#2  0x5629b8260b6c in qemu_coroutine_enter (co=0x7f2cd6a00940) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:113
#3  0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cd6a00880) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#4  0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cd6a00880) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#5  0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cee202b00) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#6  0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cee202b00) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#7  0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cee2141d0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#8  0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cee2141d0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#9  0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cf300b370) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#10 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cf300b370) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#11 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cf1a03560) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#12 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cf1a03560) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#13 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cf3e15ba0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#14 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cf3e15ba0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#15 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2ce80087f0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#16 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2ce80087f0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#17 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cee20d9c0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#18 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cee20d9c0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#19 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2ceff04850) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#20 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2ceff04850) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#21 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cf21061c0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#22 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cf21061c0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#23 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cf2105c00) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#24 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cf2105c00) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#25 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cf3e1d590) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#26 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cf3e1d590) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#27 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cf3e16a00) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#28 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cf3e16a00) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#29 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2ce8004da0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#30 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2ce8004da0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#31 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cf3e15dc0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#32 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2cf3e15dc0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#33 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2ccff00420) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#34 0x5629b82609a9 in qemu_coroutine_enter (co=0x7f2ccff00420) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#35 0x5629b8260e74 in qemu_co_queue_run_restart (co=0x7f2cf1e04900) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#36 0x5629b82609a9 in qemu_coroutine_enter 

[Qemu-devel] [Bug 1673130] Re: qemu 2.7.0 receives SIGABRT in qemu_coroutine_enter()

2017-03-15 Thread Mohammed Gamal
Third stack trace:

--
#0  0x7f4d5ad6a067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7f4d5ad6b448 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x562a4c582b6c in qemu_coroutine_enter (co=0x7f4b1bf0a900) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:113
#3  0x562a4c582e55 in qemu_co_queue_run_restart (co=0x7f4b1bf0a830) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#4  0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b1bf0a830) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#5  0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b1bf0f4c0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#6  0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b1bf0f4c0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#7  0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b17e07c40) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#8  0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b17e07c40) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#9  0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b17e11420) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#10 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b17e11420) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#11 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b17e18c30) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#12 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b17e18c30) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#13 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b1bf07ea0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#14 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b1bf07ea0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#15 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b1000c0c0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#16 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b1000c0c0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#17 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b17e11b10) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#18 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b17e11b10) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#19 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b17e10500) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#20 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b17e10500) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#21 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b1bf0a610) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#22 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b1bf0a610) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#23 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b17e12820) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#24 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b17e12820) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#25 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b10002b10) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#26 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b10002b10) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#27 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b1000bfb0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#28 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b1000bfb0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#29 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b17e103f0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#30 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b17e103f0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#31 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b17e078b0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#32 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b17e078b0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#33 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4adfe02b00) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#34 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4adfe02b00) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#35 0x562a4c582e74 in qemu_co_queue_run_restart (co=0x7f4b15701ae0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#36 0x562a4c5829a9 in qemu_coroutine_enter (co=0x7f4b15701ae0) at 

[Qemu-devel] [Bug 1673130] [NEW] qemu 2.7.0 receives SIGABRT in qemu_coroutine_enter()

2017-03-15 Thread Mohammed Gamal
Public bug reported:

I've been experiencing frequent SIGABRTs (in addition to segfaults in
#1671876) lately with qemu 2.7.0 running Ubuntu 16.04 guests. The crash
usually happens in qemu_coroutine_enter(). I haven't seen this so far
with any other guests or distros.

Here is one stack trace I obtained
--
(gdb) bt
#0  0x7fd7cc676067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7fd7cc677448 in __GI_abort () at abort.c:89
#2  0x556aed247b6c in qemu_coroutine_enter (co=0x7fd574300df0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:113
#3  0x556aed247e55 in qemu_co_queue_run_restart (co=0x7fd574300ce0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#4  0x556aed2479a9 in qemu_coroutine_enter (co=0x7fd574300ce0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#5  0x556aed247e74 in qemu_co_queue_run_restart (co=0x7fd589111670) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#6  0x556aed2479a9 in qemu_coroutine_enter (co=0x7fd589111670) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#7  0x556aed247e74 in qemu_co_queue_run_restart (co=0x7fd57430dba0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#8  0x556aed2479a9 in qemu_coroutine_enter (co=0x7fd57430dba0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#9  0x556aed247e74 in qemu_co_queue_run_restart (co=0x7fd589119130) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#10 0x556aed2479a9 in qemu_coroutine_enter (co=0x7fd589119130) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#11 0x556aed247e74 in qemu_co_queue_run_restart (co=0x7fd589117410) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#12 0x556aed2479a9 in qemu_coroutine_enter (co=0x7fd589117410) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#13 0x556aed247e74 in qemu_co_queue_run_restart (co=0x7fd577f00e00) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
#14 0x556aed2479a9 in qemu_coroutine_enter (co=0x7fd577f00e00) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
#15 0x556aed247fa0 in qemu_co_enter_next (queue=queue@entry=0x556aef34e5e0) 
at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:106
#16 0x556aed1e6060 in timer_cb (blk=0x556aef34e590, is_write=) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/block/throttle-groups.c:400
#17 0x556aed1a3615 in timerlist_run_timers (timer_list=0x556aef3bad40) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:528
#18 0x556aed1a3679 in timerlistgroup_run_timers 
(tlg=tlg@entry=0x556af0738758) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:564
#19 0x556aed1a3f47 in aio_dispatch (ctx=ctx@entry=0x556af0738610) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:357
#20 0x556aed1a40e8 in aio_poll (ctx=0x556af0738610, blocking=) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:479
#21 0x556aed005c79 in iothread_run (opaque=0x556af07383c0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/iothread.c:46
#22 0x7fd7cc9f40a4 in start_thread (arg=0x7fd7aafff700) at 
pthread_create.c:403
#23 0x7fd7cc72962d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
--

The code crashes here
--
void qemu_coroutine_enter(Coroutine *co)
{
Coroutine *self = qemu_coroutine_self();
CoroutineAction ret;

trace_qemu_coroutine_enter(self, co, co->entry_arg);

if (co->caller) {
fprintf(stderr, "Co-routine re-entered recursively\n");
abort();  <--- Code aborts here 
  
}

[...]
}
--

Debugging further we see:
--
(gdb) frame 2
#2  0x556aed247b6c in qemu_coroutine_enter (co=0x7fd574300df0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:113
113/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c: No such file 
or directory.
(gdb) print *co
$1 = {entry = 0x7fd793e95a58, entry_arg = 0x1, caller = 0x7fd793e95a38, 
pool_next = {sle_next = 0x10}, co_queue_wakeup = {sqh_first = 0x7fd6ebbd2000, 
sqh_last = 0x1000}, co_queue_next = {
sqe_next = 0x7fd6ebbd1000}}
(gdb) print *co->caller
$2 = {entry = 0x40040001, entry_arg = 0xc546a20, caller = 0x0, pool_next = 
{sle_next = 0x0}, co_queue_wakeup = {sqh_first = 0x0, sqh_last = 
0xea00061f7480}, co_queue_next = {sqe_next = 0x1000}}
(gdb) frame 4
#4  0x556aed2479a9 in qemu_coroutine_enter (co=0x7fd574300ce0) at 

[Qemu-devel] [PATCH for 2.9] migration: use "" as the default for tls-creds/hostname

2017-03-15 Thread Daniel P. Berrange
The tls-creds parameter has a default value of NULL indicating
that TLS should not be used. Setting it to non-NULL enables
use of TLS. Once tls-creds are set to a non-NULL value via the
monitor, it isn't possible to set them back to NULL again, due
to current implementation limitations. The empty string is not
a valid QObject identifier, so this switches to use "" as the
default, indicating that TLS will not be used

The tls-hostname parameter has a default value of NULL indicating
the the hostname from the migrate connection URI should be used.
Again, once tls-hostname is set non-NULL, to override the default
hostname for x509 cert validation, it isn't possible to reset it
back to NULL via the monitor. The empty string is not a valid
hostname, so this switches to use "" as the default, indicating
that the migrate URI hostname should be used.

Using "" as the default for both, also means that the monitor
commands "info migrate_parameters" / "query-migrate-parameters"
will report existance of tls-creds/tls-parameters even when set
to their default values.

Signed-off-by: Daniel P. Berrange 
---
 migration/migration.c | 4 
 migration/tls.c   | 2 +-
 qapi-schema.json  | 4 
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3dab684..54060f7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -110,6 +110,8 @@ MigrationState *migrate_get_current(void)
 
 if (!once) {
 qemu_mutex_init(_migration.src_page_req_mutex);
+current_migration.parameters.tls_creds = g_strdup("");
+current_migration.parameters.tls_hostname = g_strdup("");
 once = true;
 }
 return _migration;
@@ -458,6 +460,7 @@ void migration_channel_process_incoming(MigrationState *s,
 ioc, object_get_typename(OBJECT(ioc)));
 
 if (s->parameters.tls_creds &&
+*s->parameters.tls_creds &&
 !object_dynamic_cast(OBJECT(ioc),
  TYPE_QIO_CHANNEL_TLS)) {
 Error *local_err = NULL;
@@ -480,6 +483,7 @@ void migration_channel_connect(MigrationState *s,
 ioc, object_get_typename(OBJECT(ioc)), hostname);
 
 if (s->parameters.tls_creds &&
+*s->parameters.tls_creds &&
 !object_dynamic_cast(OBJECT(ioc),
  TYPE_QIO_CHANNEL_TLS)) {
 Error *local_err = NULL;
diff --git a/migration/tls.c b/migration/tls.c
index 203c11d..45bec44 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -141,7 +141,7 @@ void migration_tls_channel_connect(MigrationState *s,
 return;
 }
 
-if (s->parameters.tls_hostname) {
+if (s->parameters.tls_hostname && *s->parameters.tls_hostname) {
 hostname = s->parameters.tls_hostname;
 }
 if (!hostname) {
diff --git a/qapi-schema.json b/qapi-schema.json
index 32b4a4b..eb9bf67 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1036,6 +1036,8 @@
 # credentials must be for a 'server' endpoint. Setting this
 # will enable TLS for all migrations. The default is unset,
 # resulting in unsecured migration at the QEMU level. (Since 2.7)
+# An empty string means that QEMU will use plain text mode for
+# migration, rather than TLS (Since 2.9)
 #
 # @tls-hostname: #optional hostname of the target host for the migration. This
 #is required when using x509 based TLS credentials and the
@@ -1043,6 +1045,8 @@
 #example if using fd: or exec: based migration, the
 #hostname must be provided so that the server's x509
 #certificate identity can be validated. (Since 2.7)
+#An empty string means that QEMU will use the hostname
+#associated with the migration URI, if any. (Since 2.9)
 #
 # @max-bandwidth: to set maximum speed for migration. maximum speed in
 # bytes per second. (Since 2.8)
-- 
2.9.3




[Qemu-devel] [Bug 1217339] Re: SIGQUIT to send ACPI-shutdown to Guest

2017-03-15 Thread WhiteWinterWolf
Sorry Thomas, I was not aware of this page. I checked the CODING_STYLE
file present in Qemu source before submitting this, maybe it could be
useful to include this URL there.

Meanwhile, for reference the discussion continues there:
https://lists.nongnu.org/archive/html/qemu-devel/2017-03/msg03039.html

Regards.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1217339

Title:
  SIGQUIT to send ACPI-shutdown to Guest

Status in QEMU:
  New

Bug description:
  When qemu receives SIGQUIT, it should first try to run
  system_powerdown (giving the guest an ACPI signal to begin the
  shutdown process), before ending the whole qemu process.

  At this point there is no way to do a graceful shutdown if you do not
  have access to the monitor and you do not use any wrapper like
  libvirt.

  If, for some reason SIGQUIT would not be accepted as the signal, take
  any free to use signal, like SIGUSR1. There should be a way to get
  ACPI shutdown sent to the guest.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1217339/+subscriptions



[Qemu-devel] [PATCH v5] xen: don't save/restore the physmap on VM save/restore

2017-03-15 Thread Igor Druzhinin
Saving/restoring the physmap to/from xenstore was introduced to
QEMU majorly in order to cover up the VRAM region restore issue.
The sequence of restore operations implies that we should know
the effective guest VRAM address *before* we have the VRAM region
restored (which happens later). Unfortunately, in Xen environment
VRAM memory does actually belong to a guest - not QEMU itself -
which means the position of this region is unknown beforehand and
can't be mapped into QEMU address space immediately.

Previously, recreating xenstore keys, holding the physmap, by the
toolstack helped to get this information in place at the right
moment ready to be consumed by QEMU to map the region properly.

The extraneous complexity of having those keys transferred by the
toolstack and unnecessary redundancy prompted us to propose a
solution which doesn't require any extra data in xenstore. The idea
is to defer the VRAM region mapping till the point we actually know
the effective address and able to map it. To that end, we initially
just skip the mapping request for the framebuffer if we unable to
map it now. Then, after the memory region restore phase, we perform
the mapping again, this time successfully, and update the VRAM region
metadata accordingly.

Signed-off-by: Igor Druzhinin 
---
v5:
* Add an assertion and debug printf

v4:
* Use VGA post_load handler for vram_ptr update

v3:
* Modify qemu_ram_ptr_length similarly with qemu_map_ram_ptr
* Add a comment explaining qemu_map_ram_ptr and qemu_ram_ptr_length
  semantic change for Xen
* Dropped some redundant changes

v2:
* Fix some building and coding style issues
---
 exec.c   |  16 +
 hw/display/vga.c |  11 ++
 xen-hvm.c| 104 ++-
 3 files changed, 46 insertions(+), 85 deletions(-)

diff --git a/exec.c b/exec.c
index aabb035..a1ac8cd 100644
--- a/exec.c
+++ b/exec.c
@@ -2008,6 +2008,14 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t 
addr)
 }
 
 block->host = xen_map_cache(block->offset, block->max_length, 1);
+if (block->host == NULL) {
+/* In case we cannot establish the mapping right away we might
+ * still be able to do it later e.g. on a later stage of restore.
+ * We don't touch the block and return NULL here to indicate
+ * that intention.
+ */
+return NULL;
+}
 }
 return ramblock_ptr(block, addr);
 }
@@ -2041,6 +2049,14 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
ram_addr_t addr,
 }
 
 block->host = xen_map_cache(block->offset, block->max_length, 1);
+if (block->host == NULL) {
+/* In case we cannot establish the mapping right away we might
+ * still be able to do it later e.g. on a later stage of restore.
+ * We don't touch the block and return NULL here to indicate
+ * that intention.
+ */
+return NULL;
+}
 }
 
 return ramblock_ptr(block, addr);
diff --git a/hw/display/vga.c b/hw/display/vga.c
index 69c3e1d..7d85fd8 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -2035,6 +2035,12 @@ static int vga_common_post_load(void *opaque, int 
version_id)
 {
 VGACommonState *s = opaque;
 
+if (xen_enabled() && !s->vram_ptr) {
+/* update VRAM region pointer in case we've failed
+ * the last time during init phase */
+s->vram_ptr = memory_region_get_ram_ptr(>vram);
+assert(s->vram_ptr);
+}
 /* force refresh */
 s->graphic_mode = -1;
 vbe_update_vgaregs(s);
@@ -2165,6 +2171,11 @@ void vga_common_init(VGACommonState *s, Object *obj, 
bool global_vmstate)
 vmstate_register_ram(>vram, global_vmstate ? NULL : DEVICE(obj));
 xen_register_framebuffer(>vram);
 s->vram_ptr = memory_region_get_ram_ptr(>vram);
+/* VRAM pointer might still be NULL here if we are restoring on Xen.
+   We try to get it again later at post-load phase. */
+#ifdef DEBUG_VGA_MEM
+printf("vga: vram ptr: %p\n", s->vram_ptr);
+#endif
 s->get_bpp = vga_get_bpp;
 s->get_offsets = vga_get_offsets;
 s->get_resolution = vga_get_resolution;
diff --git a/xen-hvm.c b/xen-hvm.c
index 5043beb..8bedd9b 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -317,7 +317,6 @@ static int xen_add_to_physmap(XenIOState *state,
 XenPhysmap *physmap = NULL;
 hwaddr pfn, start_gpfn;
 hwaddr phys_offset = memory_region_get_ram_addr(mr);
-char path[80], value[17];
 const char *mr_name;
 
 if (get_physmapping(state, start_addr, size)) {
@@ -340,6 +339,22 @@ go_physmap:
 DPRINTF("mapping vram to %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
 start_addr, start_addr + size);
 
+mr_name = memory_region_name(mr);
+
+physmap = g_malloc(sizeof(XenPhysmap));
+
+physmap->start_addr = start_addr;
+physmap->size = size;
+physmap->name = mr_name;
+

Re: [Qemu-devel] Obsolete QEMU host environments

2017-03-15 Thread Aurelien Jarno
On 2017-03-15 07:09, Richard Henderson wrote:
> On 03/15/2017 03:07 AM, Peter Maydell wrote:
> > On 14 March 2017 at 17:54, Thomas Huth  wrote:
> > > Our ia64 host backend in QEMU (tcg/ia64) is still marked as maintained
> > > ... so it's maybe not as dead as you think? Or should we rather get rid
> > > of that soon, too?
> > 
> > I don't actually mind whether we keep tcg/ia64 or drop it.
> > But if we keep it then we must have a machine we can test
> > it on -- that means one I have access to (and which we
> > can otherwise use for central testing), not just one the
> > maintainer might happen to have.
> > 
> > Also, that MAINTAINERS entry was added in 2011, so it's
> > probably about as out of date as the code :-)
> 
> Red Hat's last ia64 machine died a few years ago, so I haven't
> been able to test it for quite some time.
> 
> I don't know if Aurelien can still test on ia64 or not.

I have the same issue, the last QEMU test I have been able to do on an
ia64 machine dates from last summer, so something like 9 months ago. I
don't think anybody has a lot of interest in ia64 anymore, so I guess
it's time to just remove the ia64 host backend.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"

2017-03-15 Thread Gerd Hoffmann
  Hi,

> Instead of having MMIO register spaces we use the dirty tracking
> mechanism. Here regions are marked as for dirty tracking and when the
> SoftMMU helper first comes to this bit of memory it will follow the slow
> path and mark region as visited.

visited?  Do you mean dirty bit is set for the page?  Or is this
something else?

> Once done this bit is cleared and all
> future writes to that page are written directly from the translated
> code. This no longer has an implicit synchronisation from the BQL so
> there is now a race and you can have memory being updated which might
> miss this flagging.

Not fully following what you are trying to say.  But pages being updated
without dirty bit getting set (if clear) certainly is a problem for the
vga emulation (and live migration too).

> Note that KVM has some similar hacks to avoid trapping all writes to
> video memory with its coalesced mmio mechanism however I'm not familiar
> with all the details.

Normal linear framebuffer access doesn't use this.

> Now there are mechanisms we can use to ensure there are no races happen
> and return to the situation that the display is only updated when the
> TCG cores are not running.

tcg and display updates running in parallel isn't a problem, we have
that with kvm anyway.  Dirty bit handling must be correct though.

With kvm at the start of each display update vga fetches the dirty
bitmap from the kernel (memory_region_sync_dirty_bitmap).  Then it goes
use memory_region_get_dirty to figure which pages have been touched.

When memory_region_sync_dirty_bitmap is called the kernel will clear the
memory bitmap of the region and also map all pages read-only.  Next
guest update will pagefault and the kernel can set the dirty bit for the
page (maybe there is a more efficient way with EPT available).

I suspect the memory_region_sync_dirty_bitmap call on tcg should reset
the fast path optimization, so the slow path can update the dirty bits
correctly.

cheers,
  Gerd




Re: [Qemu-devel] [PULL 0/2] submodule-update queue 20170303

2017-03-15 Thread James Hanley
It would appear that the HTTP versions of the repo URLs are /not/
referencing the same backend

jim@jim-VirtualBox:~$ git ls-remote http://git.qemu-project.org/git/dtc.git
3b9c97093d6e1067f4d24d2bff32f3dd24e0751e HEAD
827ac8eca016c39b13fc916bbdb16f9f2fe3c34c refs/heads/expressions
3b9c97093d6e1067f4d24d2bff32f3dd24e0751e refs/heads/master
17dab7023959256224e800dd77cae57d8ecfaec0 refs/heads/multi-v1-tags
5426757b714e4c142da488fe685220b732f69d7b refs/heads/testing
827ac8eca016c39b13fc916bbdb16f9f2fe3c34c refs/remotes/github/expressions
3b9c97093d6e1067f4d24d2bff32f3dd24e0751e refs/remotes/github/master
17dab7023959256224e800dd77cae57d8ecfaec0 refs/remotes/github/multi-v1-tags
5426757b714e4c142da488fe685220b732f69d7b refs/remotes/github/testing
827ac8eca016c39b13fc916bbdb16f9f2fe3c34c refs/remotes/origin/expressions
120775eb1cf39f8dcecd695c3ff1cfef8aeb669d refs/remotes/origin/master
17dab7023959256224e800dd77cae57d8ecfaec0 refs/remotes/origin/multi-v1-tags
f5aa792d81f5911eff088e4f88c0cd0a11ea9ca0 refs/tags/dwg-last
0a1018321b08f89d0f1942c77802aa777a82d437 refs/tags/v1.0.0
5cb1fbdd7cf82e1909e27c81073cf3272cb63fa3 refs/tags/v1.0.0^{}
8e4751ca3600a2d82365e7e9d806f2bab9b81d56 refs/tags/v1.0.0-rc1
74ce242bf3307c7ec77b9ddfff443c247ac8c0a3 refs/tags/v1.0.0-rc1^{}
38738612dec55c0262de2192cbe655f499b8c5de refs/tags/v1.1.0
202863e4dd681d17c06a82943f49485bf7860633 refs/tags/v1.1.0^{}
2d38c152a6cbcd6fcd7a2f2535d3bc8860c975f9 refs/tags/v1.1.0-rc1
7364cc79b5fa11e416dce01802139bc87d690118 refs/tags/v1.1.0-rc1^{}
427b0062114703674688aa581d13499b1b2da896 refs/tags/v1.2.0
52c356d81b1b5b5426f53655e782c37793c3637e refs/tags/v1.2.0^{}
33ea8e2705c6905edcabda65dfa92af56716056b refs/tags/v1.2.0-rc1
f8bf4bfc8796b46e6086a52f0cd6c1f9ed58645a refs/tags/v1.2.0-rc1^{}
a532a5d2148e6a644bb56f7aa3d29297d19e30de refs/tags/v1.2.0-rc2
17773b0e5148c5ae281ee21492c871292cb7de20 refs/tags/v1.2.0-rc2^{}
00e38ce99a600e146aa20eac082b8d7d8ec70711 refs/tags/v1.3.0
bc895d6d09695d05ceb8b52486ffe861d6cfbdde refs/tags/v1.3.0^{}
6d109a2e4885896d2665d4bbcc5bc985110b0950 refs/tags/v1.4.0
65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf refs/tags/v1.4.0^{}
29a9b5177c0bc192f7881b940932f903aca9c360 refs/tags/v1.4.1
302fca9f4c283e1994cf0a5a9ce1cf43ca15e6d2 refs/tags/v1.4.1^{}
24ec6e01bca89179d744d836fe94f2b459abd03d refs/tags/v1.4.2
ec02b34c05be04f249ffaaca4b666f5246877dea refs/tags/v1.4.2^{}
jim@jim-VirtualBox:~$


Note that the tags for 1.4.3 and 1.4.4 are missing... Are the tags not
"fetched" with the synchronization?

On Mon, Mar 6, 2017 at 11:50 AM, James Hanley  wrote:

> I'm still seeing the same error - the only change I have in my clone is
> the following so that the submodules are accessible across our firewall -
> should the http URLs be referencing the same backend repo as the git URLs?:
> jim@jim-VirtualBox:~/project/test_qemu_repo/qemu$ git diff -v
> 6865190577f240d0c0f15d6537a893d771596404^ 6865190577f240d0c0f15d6537a893
> d771596404
> diff --git a/.gitmodules b/.gitmodules
> index ca323b4..0156c06 100644
> --- a/.gitmodules
> +++ b/.gitmodules
> @@ -1,36 +1,36 @@
>  [submodule "roms/vgabios"]
> path = roms/vgabios
> -   url = git://git.qemu-project.org/vgabios.git/
> +   url = http://git.qemu-project.org/git/vgabios.git
>  [submodule "roms/seabios"]
> path = roms/seabios
> -   url = git://git.qemu-project.org/seabios.git/
> +   url = http://git.qemu-project.org/git/seabios.git
>  [submodule "roms/SLOF"]
> path = roms/SLOF
> -   url = git://git.qemu-project.org/SLOF.git
> +   url = http://git.qemu-project.org/git/SLOF.git
>  [submodule "roms/ipxe"]
> path = roms/ipxe
> -   url = git://git.qemu-project.org/ipxe.git
> +   url = http://git.qemu-project.org/git/ipxe.git
>  [submodule "roms/openbios"]
> path = roms/openbios
> -   url = git://git.qemu-project.org/openbios.git
> +   url = http://git.qemu-project.org/git/openbios.git
>  [submodule "roms/openhackware"]
> path = roms/openhackware
> -   url = git://git.qemu-project.org/openhackware.git
> +   url = http://git.qemu-project.org/git/openhackware.git
>  [submodule "roms/qemu-palcode"]
> path = roms/qemu-palcode
> -   url = git://github.com/rth7680/qemu-palcode.git
> +   url = https://github.com/rth7680/qemu-palcode.git
>  [submodule "roms/sgabios"]
> path = roms/sgabios
> -   url = git://git.qemu-project.org/sgabios.git
> +   url = http://git.qemu-project.org/git/sgabios.git
>  [submodule "pixman"]
> path = pixman
> -   url = git://anongit.freedesktop.org/pixman
> +   url = https://anongit.freedesktop.org/git/pixman.git
>  [submodule "dtc"]
> path = dtc
> -   url = git://git.qemu-project.org/dtc.git
> +   url = http://git.qemu-project.org/git/dtc.git
>  [submodule "roms/u-boot"]
> path = roms/u-boot
> -   url = git://git.qemu-project.org/u-boot.git
> +   url = http://git.qemu-project.org/git/u-boot.git
>  

Re: [Qemu-devel] [PATCH] blk: fix aio context loss on media change

2017-03-15 Thread Kevin Wolf
Am 15.03.2017 um 16:09 hat Paolo Bonzini geschrieben:
>  There should be a policy on which BB sets AioContext on the BDS (e.g.
>  only the device does it), but apart from that, it should not be an issue.
> >>>
> >>> We don't know which BBs are going to be attached. We don't necessarily
> >>> have a device at all, or we could have two of them.
> >>
> >> Wow, can we really have two? :-O
> > 
> > What would prevent you from doing this? The whole blockdev work was
> > about making the block layer more flexible, so now we have this
> > flexibility of attaching more or less anything to anything (unless op
> > blockers prevent it, which is why they are important for actually
> > supporting blockdev).
> 
> Yeah, the actual question was more "will the blockers allow two" devices
> behind the same BDS.  But I suppose there's no reason to prevent that
> (emulating multipath, for example).

For read-only devices, there's no reason anyway. For writable ones, you
have to set the share-rw=on qdev property, but then it's allowed.

> >>> Though maybe we should try to keep a BDS and its children in the same
> >>> AioContext anyway if that's possible? Will it make a difference?
> >>
> >> Everything can make sense---but yes, keeping the whole hierarchy in the
> >> same AioContext makes sense more often.
> > 
> > So I take this to mean that it does make a difference. :-)
> > 
> > If we want to keep users and their child nodes in the same AioContext by
> > default, we'll probably still need to implement all of the callbacks
> > that we would need for proper AioContext management today.
> 
> Or just assume that in the common case people won't specify iothread on
> -blockdev, only on -device.

Right, but then attaching a new device could mean that the BDS moves to
a different AioContext even though the other users are still on the old
one. Not sure if this is bad, but it might be unexpected.

Kevin



Re: [Qemu-devel] [PATCH v3 0/2] hw/i386: Update FADT to Revision 3 (ACPI 2.0)

2017-03-15 Thread G 3


On Mar 15, 2017, at 10:25 AM, qemu-devel-requ...@nongnu.org wrote:


On Wed, Mar 15, 2017 at 07:20:25PM +1300, Phil Dennis-Jordan wrote:
This updates the FADT generated for x86/64 machine types from  
Revision 1 to 3. (Based on ACPI standard 2.0 instead of 1.0) As  
previously, the goal is to make running macOS/OS X guests  
smoother. With a Rev1 FADT, rebooting such a guest doesn't work,  
as the OS uses the reset register information from the FADT.  
Switching to a Rev3 (ACPI 2.0) FADT solves this problem.


The previous discussion of this raised a bunch of points, which  
I'll address/clarify here as well:


 1. No runtime option. The preference was expressed that we try to  
stay backwards-compatible with legacy guests as opposed to adding  
a runtime option for different APCI versions. ACPI 2.0/FADT Rev3  
is the minimum version required for exposing the reset register,  
and it is also backwards-compatible with 1.0/Rev1, so that seemed  
a good version to target.


 2. Legacy guest testing. I've tested this successfully (no  
apparent regressions) with:
   * Windows XP x86 (both "pc" and "q35" machine types, the latter  
using -device piix4-ide)

   * Windows 7, both 32-bit and 64-bit editions
   * Windows 10 x64
   * Fedora 7 x86 Live image
   * Fedora 25 x86_64 Live image
   * Ubuntu 10.04.4 AMD64 Live image
Any other specific OSes and versions I should check?


Windows 2000



Re: [Qemu-devel] [PATCH v2 for-2.9 02/47] qapi: Make doc comments optional where we don't need them

2017-03-15 Thread Eric Blake
On 03/15/2017 10:13 AM, Markus Armbruster wrote:

>>> +Pragma's scope is currently the complete schema.  Setting it to
>>
>> You can do:
>>
>> { 'pragma': { 'doc-required': true } }
>> { 'pragma': { 'whitelist': [...] } }
>>
>> what you can't do is:
>>
>> { 'pragma': { 'doc-required': true } }
>> { 'pragma': { 'doc-required': false } }
>>
>> Maybe s/Setting it/Setting a given pragma name/
> 
> Yes, that's better.  Perhaps "Setting the same pragma to ..."

Works for me.

> 
>>> +different values in parts of the schema doesn't work.
> 
> I considered rejecting all but the first pragma for any given pragma,
> but decided just add this note for now.

Yeah, I noticed. Coding up rejection is harder than just documenting
that repetition is undefined.  And it's not like we're adding new pragma
usages into our *.json files on a daily basis ;)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 for-2.9 02/47] qapi: Make doc comments optional where we don't need them

2017-03-15 Thread Markus Armbruster
Eric Blake  writes:

> On 03/15/2017 07:56 AM, Markus Armbruster wrote:
>> Since we added the documentation generator in commit 3313b61, doc
>> comments are mandatory.  That's a very good idea for a schema that
>> needs to be documented, but has proven to be annoying for testing.
>> 
>> Make doc comments optional again, but add a new directive
>> 
>> { 'pragma': { 'doc-required': true } }
>> 
>> to let a QAPI schema require them.
>> 
>> Add test cases for the new pragma directive.  While there, plug a
>> minor hole in includ directive test coverage.
>
> s/includ/include/

Will fix.

>> Require documentation in the schemas we actually want documented:
>> qapi-schema.json and qga/qapi-schema.json.
>> 
>> We could probably make qapi2texi.py cope with incomplete
>> documentation, but for now, simply make it refuse to run unless the
>> schema has 'doc-required': true.
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>
>> +=== Pragma directives ===
>> +
>> +Usage: { 'pragma': DICT }
>> +
>> +The pragma directive lets you control optional generator behavior.
>> +The dictionary's entries are pragma names and values.
>> +
>> +Pragma's scope is currently the complete schema.  Setting it to
>
> You can do:
>
> { 'pragma': { 'doc-required': true } }
> { 'pragma': { 'whitelist': [...] } }
>
> what you can't do is:
>
> { 'pragma': { 'doc-required': true } }
> { 'pragma': { 'doc-required': false } }
>
> Maybe s/Setting it/Setting a given pragma name/

Yes, that's better.  Perhaps "Setting the same pragma to ..."

>> +different values in parts of the schema doesn't work.

I considered rejecting all but the first pragma for any given pragma,
but decided just add this note for now.

>> +
>> +Pragma 'doc-required' takes a boolean value.  If true, documentation
>> +is required.  Default is false.
>> +
>> +
>
> The new tests are nice; thanks.
>
> Reviewed-by: Eric Blake 

Thanks!



Re: [Qemu-devel] [PATCH] blk: fix aio context loss on media change

2017-03-15 Thread Paolo Bonzini


On 15/03/2017 16:02, Kevin Wolf wrote:
> Am 15.03.2017 um 15:43 hat Paolo Bonzini geschrieben:
>> On 15/03/2017 15:30, Kevin Wolf wrote:
>>> Am 15.03.2017 um 14:39 hat Paolo Bonzini geschrieben:
 On 15/03/2017 12:03, Kevin Wolf wrote:
> But we discussed this earlier, and while I'm not completely sure any
> more about the details, I seem to remeber that Paolo said something
> along the lines that AioContext is going away anyway and building the
> code for proper management would be wasted time.

 AioContext is going to stay, but everybody will be able to send
 operations to a BB/BDS from any AioContext.  The BDS AioContext will
 only matter for network devices, since they have to attach the file
 descriptor handlers somewhere.  For files it won't matter at all because
 you can use multiple Linux AIO context or thread pools at the same time.
>>>
>>> Should the iothread option then become a -blockdev option rather than a
>>> -device one?
>>
>> Well, both.  The device also needs an I/O thread to attach its ioeventfd
>> handler.  And it makes sense to use the -device I/O thread if -blockdev
>> specified none.
> 
> Right, that makes sense. I just wasn't aware until now that we would get
> a per-node option, so that's good to know.

I hadn't thought about it either. :)

 There should be a policy on which BB sets AioContext on the BDS (e.g.
 only the device does it), but apart from that, it should not be an issue.
>>>
>>> We don't know which BBs are going to be attached. We don't necessarily
>>> have a device at all, or we could have two of them.
>>
>> Wow, can we really have two? :-O
> 
> What would prevent you from doing this? The whole blockdev work was
> about making the block layer more flexible, so now we have this
> flexibility of attaching more or less anything to anything (unless op
> blockers prevent it, which is why they are important for actually
> supporting blockdev).

Yeah, the actual question was more "will the blockers allow two" devices
behind the same BDS.  But I suppose there's no reason to prevent that
(emulating multipath, for example).

>>> Though maybe we should try to keep a BDS and its children in the same
>>> AioContext anyway if that's possible? Will it make a difference?
>>
>> Everything can make sense---but yes, keeping the whole hierarchy in the
>> same AioContext makes sense more often.
> 
> So I take this to mean that it does make a difference. :-)
> 
> If we want to keep users and their child nodes in the same AioContext by
> default, we'll probably still need to implement all of the callbacks
> that we would need for proper AioContext management today.

Or just assume that in the common case people won't specify iothread on
-blockdev, only on -device.

Paolo



Re: [Qemu-devel] [PATCH] blk: fix aio context loss on media change

2017-03-15 Thread Kevin Wolf
Am 15.03.2017 um 15:43 hat Paolo Bonzini geschrieben:
> On 15/03/2017 15:30, Kevin Wolf wrote:
> > Am 15.03.2017 um 14:39 hat Paolo Bonzini geschrieben:
> >> On 15/03/2017 12:03, Kevin Wolf wrote:
> >>> But we discussed this earlier, and while I'm not completely sure any
> >>> more about the details, I seem to remeber that Paolo said something
> >>> along the lines that AioContext is going away anyway and building the
> >>> code for proper management would be wasted time.
> >>
> >> AioContext is going to stay, but everybody will be able to send
> >> operations to a BB/BDS from any AioContext.  The BDS AioContext will
> >> only matter for network devices, since they have to attach the file
> >> descriptor handlers somewhere.  For files it won't matter at all because
> >> you can use multiple Linux AIO context or thread pools at the same time.
> > 
> > Should the iothread option then become a -blockdev option rather than a
> > -device one?
> 
> Well, both.  The device also needs an I/O thread to attach its ioeventfd
> handler.  And it makes sense to use the -device I/O thread if -blockdev
> specified none.

Right, that makes sense. I just wasn't aware until now that we would get
a per-node option, so that's good to know.

> >> There should be a policy on which BB sets AioContext on the BDS (e.g.
> >> only the device does it), but apart from that, it should not be an issue.
> > 
> > We don't know which BBs are going to be attached. We don't necessarily
> > have a device at all, or we could have two of them.
> 
> Wow, can we really have two? :-O

What would prevent you from doing this? The whole blockdev work was
about making the block layer more flexible, so now we have this
flexibility of attaching more or less anything to anything (unless op
blockers prevent it, which is why they are important for actually
supporting blockdev).

> > Though maybe we should try to keep a BDS and its children in the same
> > AioContext anyway if that's possible? Will it make a difference?
> 
> Everything can make sense---but yes, keeping the whole hierarchy in the
> same AioContext makes sense more often.

So I take this to mean that it does make a difference. :-)

If we want to keep users and their child nodes in the same AioContext by
default, we'll probably still need to implement all of the callbacks
that we would need for proper AioContext management today.

Kevin



Re: [Qemu-devel] [PATCH v2 for-2.9 31/47] qapi: Fix detection of doc / expression mismatch

2017-03-15 Thread Eric Blake
On 03/15/2017 07:57 AM, Markus Armbruster wrote:
> This fixes the errors uncovered by the previous commit.
> 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Eric Blake 
> ---

>  def check_freeform_doc(doc):
> -if doc.symbol:
> -raise QAPISemError(doc.info,
> -   "Documention for '%s' is not followed"
> -   " by the definition" % doc.symbol)

> +++ b/tests/qapi-schema/doc-missing-expr.err
> @@ -1 +1 @@
> -tests/qapi-schema/doc-missing-expr.json:3: Documention for 'bar' is not 
> followed by the definition
> +tests/qapi-schema/doc-missing-expr.json:3: Documentation for 'bar' is not 
> followed by the definition

I didn't even catch that you were also fixing the typo, on my last round
of reviews :)

R-b stands.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 for-2.9 30/47] tests/qapi-schema: Improve doc / expression mismatch coverage

2017-03-15 Thread Eric Blake
On 03/15/2017 07:57 AM, Markus Armbruster wrote:
> New tests doc-before-include.json and doc-before-pragma.json show we
> fail to reject a misplaced expression comment.
> 
> New test doc-no-symbol.json shows a bad error message.
> 
> Signed-off-by: Markus Armbruster 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [Bug 1671876] Re: qemu 2.7.0 segfaults in qemu_co_queue_run_restart()

2017-03-15 Thread Mohammed Gamal
Unfortunately it'd not be possible to use another version at the moment.
Is it possible that someone takes a look at the stack traces?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1671876

Title:
  qemu 2.7.0 segfaults in qemu_co_queue_run_restart()

Status in QEMU:
  New

Bug description:
  Hi,

  I've been experiencing frequent segfaults lately with qemu 2.7.0
  running Ubuntu 16.04 guests. The crash usually happens in
  qemu_co_queue_run_restart(). I haven't seen this so far with any other
  guests or distros.

  Here is one back trace I obtained from one of the crashing VMs.

  --
  (gdb) bt
  #0  qemu_co_queue_run_restart (co=0x7fba8ff05aa0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:59
  #1  0x55c1656f39a9 in qemu_coroutine_enter (co=0x7fba8ff05aa0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
  #2  0x55c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd20430) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
  #3  0x55c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd20430) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
  #4  0x55c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd14ea0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
  #5  0x55c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd14ea0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
  #6  0x55c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba80c11dc0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
  #7  0x55c1656f39a9 in qemu_coroutine_enter (co=0x7fba80c11dc0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
  #8  0x55c1656f3e74 in qemu_co_queue_run_restart (co=0x7fba8dd0bd70) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:60
  #9  0x55c1656f39a9 in qemu_coroutine_enter (co=0x7fba8dd0bd70) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine.c:119
  #10 0x55c1656f3fa0 in qemu_co_enter_next 
(queue=queue@entry=0x55c1669e75e0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/util/qemu-coroutine-lock.c:106
  #11 0x55c165692060 in timer_cb (blk=0x55c1669e7590, is_write=) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/block/throttle-groups.c:400
  #12 0x55c16564f615 in timerlist_run_timers (timer_list=0x55c166a53e80) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:528
  #13 0x55c16564f679 in timerlistgroup_run_timers 
(tlg=tlg@entry=0x55c167c81cf8) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/qemu-timer.c:564
  #14 0x55c16564ff47 in aio_dispatch (ctx=ctx@entry=0x55c167c81bb0) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:357
  #15 0x55c1656500e8 in aio_poll (ctx=0x55c167c81bb0, blocking=) at /build/pb-qemu-pssKUp/pb-qemu-2.7.0/aio-posix.c:479
  #16 0x55c1654b1c79 in iothread_run (opaque=0x55c167c81960) at 
/build/pb-qemu-pssKUp/pb-qemu-2.7.0/iothread.c:46
  #17 0x7fbc4b64f0a4 in allocate_stack (stack=, 
pdp=, attr=0x0) at allocatestack.c:416
  #18 __pthread_create_2_1 (newthread=, attr=,
  start_routine=, arg=) at pthread_create.c:539
  Backtrace stopped: Cannot access memory at address 0x8
  --

  The code that crashes is this
  --
  void qemu_co_queue_run_restart(Coroutine *co)
  {
  Coroutine *next;

  trace_qemu_co_queue_run_restart(co);
  while ((next = QSIMPLEQ_FIRST(>co_queue_wakeup))) {
  QSIMPLEQ_REMOVE_HEAD(>co_queue_wakeup, co_queue_next); <-Crash
  qemu_coroutine_enter(next);
  }
  }
  --

  Expanding the macro QSIMPLEQ_REMOVE_HEAD gives us
  --
  #define QSIMPLEQ_REMOVE_HEAD(head, field) do {  \
  if (((head)->sqh_first = (head)->sqh_first->field.sqe_next) == NULL)\
  (head)->sqh_last = &(head)->sqh_first;  \
  } while (/*CONSTCOND*/0)
  --

  which corrsponds to
  --
  if (((>co_queue_wakeup)->sqh_first = 
(>co_queue_wakeup)->sqh_first->co_queue_next.sqe_next) == NULL)\
  (>co_queue_wakeup)->sqh_last = &(>co_queue_wakeup)->sqh_first;
  --

  Debugging the list we see
  --
  (gdb) print *(>co_queue_wakeup->sqh_first)
  $6 = (struct Coroutine *) 0x1000
  (gdb) print *(>co_queue_wakeup->sqh_first->co_queue_next)
  Cannot access memory at address 

  1   2   3   4   >