Re: [Qemu-devel] [PATCH] virtio-net: do not start queues that are not enabled by the guest

2019-02-24 Thread Jason Wang



On 2019/2/22 下午12:22, Michael S. Tsirkin wrote:

On Thu, Feb 21, 2019 at 10:10:08PM -0500, Michael S. Tsirkin wrote:

On Fri, Feb 22, 2019 at 11:04:05AM +0800, Jason Wang wrote:

On 2019/2/22 上午9:35, Michael S. Tsirkin wrote:

On Thu, Feb 21, 2019 at 05:40:22PM +0800, Jason Wang wrote:

On 2019/2/21 下午4:18, Yuri Benditovich wrote:

  For 1.0 device, we can fix the queue_enable, but for 0.9x device how 
do
  you enable one specific queue in this case? (setting status?)


  Do I understand correctly that for 0.9 device in some cases the device 
will
  receive feature _MQ set, but will not receive 
VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET?
  Or the problem is different?


Let me clarify, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET is used to control the the
number of queue pairs used by device for doing transmission and reception. It
was not used to enable or disable a virtqueue.

For 1.0 device, we should use queue_enable in pci cfg to enable and disable
queue:


We could do:

1) allocate memory and set queue_enable for vq0

2) allocate memory and set queue_enable for vq1

3) Set vq paris to 1

4) allocate memory and set queue_enable for vq2

5) allocate memory and set queue_enable for vq3

6) set vq pairs to 2

I do not think spec allows this.


The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the device.
4. Read device feature bits, and write the subset of feature bits understood by 
the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write) the 
device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits 
after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, 
the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues for the 
device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration space, and 
population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.


Thus vqs are setup at step 7.

# of vq pairs are set up through a command which is a special
buffer, and spec says:

The driver MUST NOT send any buffer available notifications to the device 
before setting DRIVER_OK.


So you meant write to queue_enable is forbidden after DRIVER_OK (though it's
not very clear to me from the  spec). And if a driver want to enable new
queues, it must reset the device?


That's my reading.  What do you think?



Looks like I can infer this from the spec, maybe it's better to clarify.



Btw some legacy drivers might violate this by addig buffers
before driver_ok.



Yes, but it's probably too late to fix them.

Thanks.





But this requires a proper implementation for queue_enable for vhost which is
missed in qemu and probably what you really want to do.

but for 0.9x device, there's no such way to do this. That's the issue.

0.9x there's no queue enable, assumption is PA!=0 means VQ has
been enabled.



So
driver must allocate all queBes before starting the device, otherwise there's
no way to enable it afterwards.

As per spec queues must be allocated before DRIVER_OK.

That is universal.


If I understand correctly, this is not what is done by current windows
drivers.

Thanks



There're tricks to make it work like what is
done in your patch, but it depends on a specific implementation like qemu which
is sub-optimal.




  A fundamental question is what prevents you from just initialization 
all
  queues during driver start? It looks to me this save lots of efforts
  than allocating queue dynamically.


  This is not so trivial in Windows driver, as it does not have objects for 
queues
  that it does not use. Linux driver first of all allocates all the
  queues and then
  adds Rx/Tx to those it will use. Windows driver first decides how many 
queues
  it will use then allocates objects for them and initializes them from 
zero to
  fully functional state.


Well, you just need to allocate some memory for the virtqueue, there's no need
to make it visible to the rest until it was enabled.

Thanks








Re: [Qemu-devel] [PATCH v3 06/10] block/dirty-bitmap: explicitly lock bitmaps with successors

2019-02-24 Thread Vladimir Sementsov-Ogievskiy
23.02.2019 3:06, John Snow wrote:
> Instead of implying a user_locked/busy status, make it explicit.
> Now, bitmaps in use by migration, NBD or backup operations
> are all treated the same way with the same code paths.
> 
> Signed-off-by: John Snow
> Reviewed-by: Eric Blake

hmm, you forget my r-b:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


[Qemu-devel] [Bug 1817345] Re: configure script breaks when $source_path contains white spaces

2019-02-24 Thread Antonio Ospite
I am OK with just checking and complaining.

Linux too solves the problem in this way:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/Makefile?id=51193b76bfff5027cf96ba63effae808ad67cca7

A general "shellcheck" pass wouldn't hurt, tho.

Thank you,
   Antonio

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1817345

Title:
  configure script breaks when $source_path contains white spaces

Status in QEMU:
  New

Bug description:
  Hi,

  I noticed that the configure script breaks when the qemu source
  directory is in a path containing white spaces, in particular the list
  of targets is not correctly generated when calling "./configure
  --help".

  Steps to reproduce the problem:

  $ mkdir "dir with spaces"
  $ cd dir\ with\ spaces/
  $ git clone https://git.qemu.org/git/qemu.git
  $ cd qemu/
  $ ./configure --help | grep -A3 target-list

  
  Actual result:

--target-list=LIST   set target list (default: build everything)
 Available targets: dir with *-softmmu dir with 
 *-linux-user

  
  Expected result:

--target-list=LIST   set target list (default: build everything)
 Available targets: aarch64-softmmu alpha-softmmu 
 arm-softmmu cris-softmmu hppa-softmmu i386-softmmu 
 lm32-softmmu m68k-softmmu microblaze-softmmu 

  
  This happens because the $mak_wilds variable uses spaces to separate 
different paths, maybe newlines may be used, which are less likely to be in 
directory names.

  BTW "shellcheck" may help finding some other problems.

  Qemu version:

  $ git describe 
  v3.1.0-1960-ga05838cb2a

  Thanks,
 Antonio

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1817345/+subscriptions



Re: [Qemu-devel] [PATCH v3 05/10] nbd: change error checking order for bitmaps

2019-02-24 Thread Vladimir Sementsov-Ogievskiy
23.02.2019 3:06, John Snow wrote:
> Check that the bitmap is not in use prior to it checking if it is
> not enabled/recording guest writes. The bitmap being busy was likely
> at the behest of the user, so this error has a greater chance of being
> understood by the user.
> 
> Signed-off-by: John Snow


Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v3 04/10] block/dirty-bitmap: change semantics of enabled predicate

2019-02-24 Thread Vladimir Sementsov-Ogievskiy
23.02.2019 3:06, John Snow wrote:
> Currently, the enabled predicate means something like:
> "the QAPI status of the bitmap is ACTIVE."
> After this patch, it should mean exclusively:
> "This bitmap is recording guest writes, and is allowed to do so."
> 
> In many places, this is how this predicate was already used.
> Internal usages of the bitmap QPI can call user_locked to find out if
> the bitmap is in use by an operation.
> 
> To accommodate this, modify the create_successor routine to now
> explicitly disable the parent bitmap at creation time.
> 
> 
> Justifications:
> 
> 1. bdrv_dirty_bitmap_status suffers no change from the lack of
> 1:1 parity with the new predicates because of the order in which
> the predicates are checked. This is now only for compatibility.
> 
> 2. bdrv_set_dirty() is unchanged: pre-patch, it was skipping bitmaps that were
> disabled or had a successor, while post-patch it is only skipping bitmaps
> that are disabled. To accommodate this, create_successor now ensures that
> any bitmap with a successor is explicitly disabled.
> 
> 3. qcow2_store_persistent_dirty_bitmaps: No functional change. This function
> cares only about the literal enabled bit, and makes no effort to check if
> the bitmap is in-use or not. After this patch there are still no ways to
> produce an enabled bitmap with a successor.
> 
> 4. block_dirty_bitmap_enable_prepare
> block_dirty_bitmap_disable_prepare
> init_dirty_bitmap_migration
> nbd_export_new
> 
> These functions care about the literal enabled bit,
> and already check user_locked separately.
> 
> Signed-off-by: John Snow

Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v3 03/10] block/dirty-bitmap: remove set/reset assertions against enabled bit

2019-02-24 Thread Vladimir Sementsov-Ogievskiy
about subject: shouldn't it be "against disabled bit" instead?

23.02.2019 3:06, John Snow wrote:
> bdrv_set_dirty_bitmap and bdrv_reset_dirty_bitmap are only used as an
> internal API by the mirror and migration areas of our code. These
> calls modify the bitmap, but do so at the behest of QEMU and not the
> guest.
> 
> Presently, these bitmaps are always "enabled" anyway, but there's no
> reason they have to be.
> 
> Modify these internal APIs to drop this assertion.
> 
> Signed-off-by: John Snow 

Reviewed-by: Vladimir Sementsov-Ogievskiy 

> ---
>   block/dirty-bitmap.c | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index aa3f86bb73..9ea5738420 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -542,7 +542,6 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
>   void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
> int64_t offset, int64_t bytes)
>   {
> -assert(bdrv_dirty_bitmap_enabled(bitmap));
>   assert(!bdrv_dirty_bitmap_readonly(bitmap));
>   hbitmap_set(bitmap->bitmap, offset, bytes);
>   }
> @@ -559,7 +558,6 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>   void bdrv_reset_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
>   int64_t offset, int64_t bytes)
>   {
> -assert(bdrv_dirty_bitmap_enabled(bitmap));
>   assert(!bdrv_dirty_bitmap_readonly(bitmap));
>   hbitmap_reset(bitmap->bitmap, offset, bytes);
>   }
> 


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v3] hw/display: Add basic ATI VGA emulation

2019-02-24 Thread Gerd Hoffmann
> > Attached patch creates two separate devices.  It's just some QOM
> > boilerplate, they still share 95% of the code.
> 
> Thanks. Do you want me to merge this and submit a new version or you want to
> fix up after merging my patch or what should we do with this? (I'm not a fan
> of adding a new function for every new model of these devices so I'd
> probably get rid of the is_ati_* functions and still compare to device id
> internally maybe via a single is_ati_device(id) function though. I also find
> ati-rage128pro a bit too long, I liked the ati-vga better with selecting
> variants via a property but if you think it's better with more QOM
> boilerplate and adding every variant to the default_list[] one by one then I
> can live with that. How about ati-r128pro at least instead?

Just apply the patch to your branch and tidy up as you want.  Shorten
the rage name is fine with me.

I don't expect we'll add too many variants, getting the two work
properly will be difficult enough, so the QOM boilerplate shouldn't be
much of a problem.  I think it'll be more user-friendly to just have two
or three ati devices instead of expecting them to dig out the correct
id.

> >  With that in place
> > vgabios works just fine for both devices.
> 
> But I don't get why is the above needed for this. By the time the client
> runs (after device realize) both should look the same: your QOM subclasses
> and my ati-vga class that sets device_id based on property. Where's the
> difference that makes one work and other doesn't?

See pci_patch_ids() in hw/pci/pci.c

That gets vgabios-stdvga going with the ati cards (as long as the bochs
interface is available), by setting the vendor and device IDs so seabios
accepts and loads the rom.

That requires the device ID being set at that point though.
Specifically setting it in reset handler is too late, and I think doing
it in realize() is too late too.  With two QOM types it is already set,
thats why this gets the vgabios work with the rv100.

> > Neither radeonfb.ko nor radeon.ko (drm driver) can handle the rv100.
> > Loading radeonfb results in a kernel panic.  radeon.ko prints an
> > initialization error.  Seems at least radeonfb tries to pull some info
> > out of the bios image ...
> 
> Is there a live cd iso or something simple to test this with?

No.  Problem is modern distros tend to not build these drivers any more.

I have a script which creates a big initrd with a small distro install
plus a self-compiled kernel, for testing my kernel patches.  This is
what I have used here too.

> I think the ATI option ROMs have some mode timings that are needed to
> program PLLs for setting modes and drivers extract this from the BIOS which
> has some predefined structure so if a driver relies on this it may have a
> problem when we don't have a BIOS with the data it expects. But if it does
> not check BIOS format and just pulls junk out of unexpected BIOS image and
> puts that in clock registers that should not matter as those are ignored
> anyway. I think rv100 will probably need a separate vgabios that looks more
> like what drivers expect but I don't have plans yet to implement that.

For radeonfb this might work.
The radeon drm driver error messaga was quite unspecific.

cheers,
  Gerd




Re: [Qemu-devel] [PATCH v3 02/10] block/dirty-bitmaps: rename frozen predicate helper

2019-02-24 Thread Vladimir Sementsov-Ogievskiy
23.02.2019 3:06, John Snow wrote:
> "Frozen" was a good description a long time ago, but it isn't adequate now.
> Rename the frozen predicate to has_successor to make the semantics of the
> predicate more clear to outside callers.
> 
> In the process, remove some calls to frozen() that no longer semantically
> make sense. For bdrv_enable_dirty_bitmap_locked and
> bdrv_disable_dirty_bitmap_locked, it doesn't make sense to prohibit QEMU
> internals from performing this action when we only wished to prohibit QMP
> users from issuing these commands. All of the QMP API commands for bitmap
> manipulation already check against user_locked() to prohibit these actions.
> 
> Several other assertions really want to check that the bitmap isn't in-use
> by another operation -- use the bitmap_user_locked function for this instead,
> which presently also checks for has_successor. This leaves some redundant
> checks of has_sucessor through different helpers that are addressed in
> forthcoming patches.
> 
> Signed-off-by: John Snow 
> ---
>   block/dirty-bitmap.c   | 32 +---
>   include/block/dirty-bitmap.h   |  2 +-
>   migration/block-dirty-bitmap.c |  2 +-
>   3 files changed, 19 insertions(+), 17 deletions(-)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 101383b3af..aa3f86bb73 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c

[..]

> @@ -285,7 +288,7 @@ void bdrv_dirty_bitmap_enable_successor(BdrvDirtyBitmap 
> *bitmap)
>   static void bdrv_release_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap)
>   {
>   assert(!bitmap->active_iterators);
> -assert(!bdrv_dirty_bitmap_frozen(bitmap));
> +assert(!bdrv_dirty_bitmap_user_locked(bitmap));

hmm. I'd prefere to keep a separate assertion for successor: we don't free it 
here,
so we really depend on absence of this object.

>   assert(!bitmap->meta);
>   QLIST_REMOVE(bitmap, list);
>   hbitmap_free(bitmap->bitmap);
> @@ -325,7 +328,7 @@ BdrvDirtyBitmap 
> *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs,
>   /**
>* In cases of failure where we can no longer safely delete the parent,
>* we may wish to re-join the parent and child/successor.
> - * The merged parent will be un-frozen, but not explicitly re-enabled.
> + * The merged parent will not be user_locked, nor explicitly re-enabled.
>* Called within bdrv_dirty_bitmap_lock..unlock and with BQL taken.
>*/
>   BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BlockDriverState *bs,
> @@ -373,7 +376,7 @@ void bdrv_dirty_bitmap_truncate(BlockDriverState *bs, 
> int64_t bytes)
>   
>   bdrv_dirty_bitmaps_lock(bs);
>   QLIST_FOREACH(bitmap, >dirty_bitmaps, list) {
> -assert(!bdrv_dirty_bitmap_frozen(bitmap));
> +assert(!bdrv_dirty_bitmap_user_locked(bitmap));

and here too. As we don't truncate successors.

>   assert(!bitmap->active_iterators);
>   hbitmap_truncate(bitmap->bitmap, bytes);
>   bitmap->size = bytes;
> @@ -391,7 +394,7 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, 
> BdrvDirtyBitmap *bitmap)
>   
>   /**
>* Release all named dirty bitmaps attached to a BDS (for use in 
> bdrv_close()).
> - * There must not be any frozen bitmaps attached.
> + * There must not be any user_locked bitmaps attached.
>* This function does not remove persistent bitmaps from the storage.
>* Called with BQL taken.
>*/
> @@ -428,7 +431,6 @@ void bdrv_remove_persistent_dirty_bitmap(BlockDriverState 
> *bs,
>   void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
>   {
>   bdrv_dirty_bitmap_lock(bitmap);
> -assert(!bdrv_dirty_bitmap_frozen(bitmap));
>   bitmap->disabled = true;
>   bdrv_dirty_bitmap_unlock(bitmap);
>   }
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index 04a117fc81..cdbb4dfefd 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h

Could you please add scripts/git.orderfile to your .git/config, like

[diff]
 orderFile = /path/to/scripts/git.orderfile

?

> @@ -36,7 +36,7 @@ BlockDirtyInfoList 
> *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
>   uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
>   uint32_t bdrv_dirty_bitmap_granularity(const BdrvDirtyBitmap *bitmap);
>   bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
> -bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
> +bool bdrv_dirty_bitmap_has_successor(BdrvDirtyBitmap *bitmap);
>   const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
>   int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap);
>   DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 6426151e4f..ac6954142f 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -542,7 +542,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
> DirtyBitmapLoadState 

Re: [Qemu-devel] [PATCH qemu v2 0/4] spapr_pci, vfio: NVIDIA V100 + POWER9 passthrough

2019-02-24 Thread Alexey Kardashevskiy



On 15/02/2019 16:21, David Gibson wrote:
> On Fri, Feb 15, 2019 at 03:34:52PM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 15/02/2019 14:54, David Gibson wrote:
>>> On Fri, Feb 15, 2019 at 02:32:14PM +1100, Alexey Kardashevskiy wrote:


 On 15/02/2019 14:24, David Gibson wrote:
> On Fri, Feb 15, 2019 at 11:35:02AM +1100, Alexey Kardashevskiy wrote:
>>
>>
>> On 15/02/2019 10:37, Alex Williamson wrote:
>>> On Thu, 14 Feb 2019 16:21:40 +1100
>>> Alexey Kardashevskiy  wrote:
>>>
 This is for passing through NVIDIA V100 GPUs on POWER9 systems.

 This implements a subdriver for NVIDIA V100 GPU with coherent memory 
 and
 NPU/ATS support available in the POWER9 CPU.

 1/4 is a preparation for bigger DMA windows.
 2/4 is a small cleanup.

 Here is the kernel driver:
 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/vfio/pci?h=v5.0-rc6=7f92891778dff62303c070ac81de7b7d80de331a

 SLOF changes already went in.

 This depends on "pci: Move NVIDIA vendor id to the rest of ids" 
 (posted separately).
>>>
>>> TBH, I'm not sure it was the best idea to let it live or die on it's
>>> own when it now creates a build dependency for this series.
>> I am sure that patch is disgustingly primitive and can make it to
>> upstream in just one click and the rest of the series will take more
>> time anyway (always does :) ).
>>
 This is based on sha1
 1ea6057 Mark Cave-Ayland "mac_newworld: change default NIC to sungem 
 for mac99 machine".
>>>
>>> Perhaps this is why it doesn't apply cleanly against qemu.git.  Are
>>> there dependencies we need to wait for in the ppc tree as well?
>>
>> There are few changes in spapr_pci so conflicts are possible, every time
>> when David updates his tree and I rebase I get some minor ones.
>>
 Please comment. Thanks.
>>>
>>> Besides the build dependency on PCI_VENDOR_ID_NVIDIA, I also get this:
>>>
>>> .../qemu.git/hw/vfio/spapr.c: In function ‘vfio_spapr_create_window’:
>>> .../qemu.git/hw/vfio/spapr.c:212:9: error: ‘ret’ may be used 
>>> uninitialized in this function [-Werror=maybe-uninitialized]
>>>  error_report("Failed to create a window, ret = %d (%m)", ret);
>>>  ^
>>> cc1: all warnings being treated as errors
>>
>> Agrh. How exactly do you make them errors, not warnings? I get no
>> warning/error with --disable-werror but not having  --disable-werror
>> prints warnings, not errors so it does not fail the build and easy to 
>> miss.
>
> Are you sure it's printing warnings with --enable-werror?  Otherwise
> it sounds like you just have a compiler version that's not picking
> this up.

 Ah, here is my mistake - I thought not having --disable-werror means
 that it is enabled as ./configure does not advertise  "--enable-werror":

 [fstn1-p1 qemu]$ ./configure --help | grep werror
   --disable-werror disable compilation abort on warning


 Apparently it is a tri-state flag :)
>>>
>>> Um.. no.. I believe it should be on by default as well, I was just
>>> saying --enable-werror because I thought it was clearer than the
>>> double negative in "without --disable-werror".
>>
>> The default value of werror is "":
>>
>> https://git.qemu.org/?p=qemu.git;a=blob;f=configure;h=a61682c3c727f467ce2710c7469a33b261da8ff6;hb=HEAD#l935
> 
> Which turns into a "yes" on Linux - see line 1835-1844.


No it does not - the 'test -d "$source_path/.git"' fails as "git
worktree" makes .git a file, not a folder.



-- 
Alexey



Re: [Qemu-devel] [PATCH] target-i386: Enhance the stub for kvm_arch_get_supported_cpuid()

2019-02-24 Thread Kamil Rytarowski
Ping.

On 21.02.2019 03:08, Kamil Rytarowski wrote:
> On 20.02.2019 18:29, Paolo Bonzini wrote:
>> On 20/02/19 12:59, Kamil Rytarowski wrote:
>>> Ping, still valid.
>>
>> Sorry, I missed your email.
>>
>>> On 15.02.2019 00:38, Kamil Rytarowski wrote:
 I consider it as fragile hack and certainly not something to depend on.
 Also in some circumstances of such code, especially "if (zero0)" we want
 to enable disabled code under a debugger.
>>
>> That's a good objection, but certainly does not apply to KVM on NetBSD.
>>
> 
> There is KVM for Darwin (experimental and rather toy project) and it
> might be ported to NetBSD (I have actually forked it on GitHub
> recently), but I doubt that someone would enable KVM on any platform
> under a debugger this way and expect something to work.
> 
 There were also kernel backdoors due to this optimization.
>>
>> Citation please?
>>
> 
> I saw an exploit for such case with a .txt writeup on ftp of grsecurity
> but that service seems to be gone (probably long time ago), so please
> defer discussion on it. If someone is interested to find it out, there
> are enough pointers to dig it (assuming that this is still possible).
> 
 Requested cpu.i (hopefully correctly generated)

 http://netbsd.org/~kamil/qemu/cpu.i.bz2
>>
>> So, first thing first I can reproduce clang's behavior with this .i file
>> and also with this reduced test case.
>>
>> extern void f(void);
>> int i, j;
>> int main()
>> {
>> if (0  && i) f();
>> if (j  && 0) f();
>>}
>>
>> The first is eliminated but the second is not, just like in QEMU where
>> this works:
>>
>> if (kvm_enabled() && cpu->enable_pmu) {
>> KVMState *s = cs->kvm_state;
>>
>> *eax = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EAX);
>> *ebx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EBX);
>> *ecx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_ECX);
>> *edx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EDX);
>> } else if (hvf_enabled() && cpu->enable_pmu) {
>> *eax = hvf_get_supported_cpuid(0xA, count, R_EAX);
>> *ebx = hvf_get_supported_cpuid(0xA, count, R_EBX);
>> *ecx = hvf_get_supported_cpuid(0xA, count, R_ECX);
>> *edx = hvf_get_supported_cpuid(0xA, count, R_EDX);
>>
>> while this doesn't:
>>
>> if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) &&
>> kvm_enabled()) {
>> KVMState *s = CPU(cpu)->kvm_state;
>> uint32_t eax_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EAX);
>> uint32_t ebx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EBX);
>> uint32_t ecx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_ECX);
>> uint32_t eax_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EAX);
>> uint32_t ebx_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EBX);
>>
>> But, that's okay, it's -O0 so we give clang a pass for that  Note that
>> clang does do the optimization even in more complex cases like
>>
>> extern _Bool f(void);
>> int main()
>> {
>> if (!0) return 0;
>> if (!f()) return 0;
>> }
>>
>> The problem is that there is a kvm-stub.c entry for that, and in fact
>> my compilation passes and the symbol is resolved correctly:
>>
>> $ nm target/i386/cpu.o |grep kvm_.*get_sup
>>  U kvm_arch_get_supported_cpuid
>> $ nm target/i386/kvm-stub.o|grep kvm_.*get_sup
>> 0030 T kvm_arch_get_supported_cpuid
>> $ nm qemu-system-x86_64 |grep kvm_.*get_sup
>> 0046eab0 T kvm_arch_get_supported_cpuid
>>
>> As expected, something much less obvious is going on for you, in
>> particular __OPTIMIZE__seems not to be working properly.  However,
>> that would also be very surprising.
>>
>> Please:
>>
>> 1) run the last two "nm" commands on your build (wthout grep).
>>
> 
> I cannot run nm(1) on qemu-system-x86_64 as it's not linkable.
> 
> I'm getting the same result for target/i386/cpu.o and
> target/i386/kvm-stub.o.
> 
> $ nm ./i386-softmmu/target/i386/kvm-stub.o
>  U abort
>  T kvm_allows_irq0_override
> 0030 T kvm_arch_get_supported_cpuid
> 0020 T kvm_enable_x2apic
> 0010 T kvm_has_smm
> 0050 T kvm_hv_vpindex_settable
> 
> grep(1) used, but otherwise I would need to upload results somewhere else.
> 
> $ nm ./i386-bsd-user/target/i386/cpu.o |grep kvm
>  U kvm_arch_get_supported_cpuid
> 1290 d kvm_default_props
>  U kvm_state
> 0240 T x86_cpu_change_kvm_default
> 
> Please note that there are 4 types of x86 build: i386, x86_64 and two
> bsd-user (32-bit and 64-bit).
> 
> According to my observations of repeated attempts both builds of
> bsd-user are affected.
> 
> There is also a difference that kvm-stub is twice in !bsd-user
> directories and once in bsd-user ones.
> 
> $ find . -name kvm-stub.o|grep x86_64
> 

Re: [Qemu-devel] [PATCH v3 01/10] block/dirty-bitmap: add recording and busy properties

2019-02-24 Thread Vladimir Sementsov-Ogievskiy
23.02.2019 3:06, John Snow wrote:
> The current API allows us to report a single status, which we've defined as:
> 
> Frozen: has a successor, treated as qmp_locked, may or may not be enabled.
> Locked: no successor, qmp_locked. may or may not be enabled.
> Disabled: Not frozen or locked, disabled.
> Active: Not frozen, locked, or disabled.
> 
> The problem is that both "Frozen" and "Locked" mean nearly the same thing,
> and that both of them do not intuit whether they are recording guest writes
> or not.
> 
> This patch deprecates that status field and introduces two orthogonal
> properties instead to replace it.
> 
> Signed-off-by: John Snow

Reviewed-by: Vladimir Sementsov-Ogievskiy 


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH 4/4] block/nbd-client: use non-blocking io channel for nbd negotiation

2019-02-24 Thread Vladimir Sementsov-Ogievskiy
19.02.2019 16:18, Vladimir Sementsov-Ogievskiy wrote:
> 12.02.2019 1:02, Eric Blake wrote:
>> On 2/11/19 6:56 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Now negotiation is done in coroutine, so to take benefit of it let's
>>> use non-blocking model.
>>>
>>> Note that QIOChannel handle synchronous io calls correctly anyway, so
>>
>> s/handle/handles/
>>
>>> it's not a problem to send final NBD_CMD_DISC to non-blocking channel
>>> but using sync qio interface, even not in coroutine.
>>
>> s/not in/when not in a/
>>
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/nbd-client.c | 10 +++---
>>>   1 file changed, 3 insertions(+), 7 deletions(-)
>>
>> This fixes qemu as NBD client for use in block devices, but what about
>> qemu as NBD client in qemu-nbd?  Does that need any updates?  Then
>> again, your observation about QIO doing the right thing for both
>> blocking (qemu-nbd) and non-blocking (block layer) channels seems to
>> cover that.
> 
> In qemu-nbd client works in a separate thread, so, I think, we don't need
> nonblocking: we have a thread anyway.
> 
> On the other hand, is my code in nbd_receive_common safe, keeping in mind
> that we call it from separate thread?
> 
> Are qio_channel_attach_aio_context, qemu_aio_coroutine_enter, AIO_WAIT_WHILE
> thread-safe?

Could somebody shed a bit of light here, please?

> 
>>
>>> @@ -1072,9 +1072,6 @@ static int nbd_client_connect(BlockDriverState *bs,
>>>   object_ref(OBJECT(client->ioc));
>>>   }
>>> -    /* Now that we're connected, set the socket to be non-blocking and
>>> - * kick the reply mechanism.  */
>>> -    qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
>>>   client->connection_co = qemu_coroutine_create(nbd_connection_entry, 
>>> client);
>>>   nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
>>> @@ -1083,9 +1080,8 @@ static int nbd_client_connect(BlockDriverState *bs,
>>>    fail:
>>>   /*
>>> - * We have connected, but must fail for other reasons. The
>>> - * connection is still blocking; send NBD_CMD_DISC as a courtesy
>>> - * to the server.
>>> + * We have connected, but must fail for other reasons.
>>> + * Send NBD_CMD_DISC as a courtesy to the server.
>>>    */
>>>   {
>>>   NBDRequest request = { .type = NBD_CMD_DISC };
>>>
>>
> 
> 


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH] scsi-cd: Fix crash after remote cdrom detached

2019-02-24 Thread Zheng Xiang
Ping?

On 2019/2/15 11:17, Zheng Xiang wrote:
> Hi Paolo,
> 
> On 2019/2/15 2:07, Paolo Bonzini wrote:
>> On 14/02/19 13:27, Xiang Zheng wrote:
>>> There is a small window between the twice blk_is_available in
>>> scsi_disk_emulate_command which would cause crash due to the later
>>> assertion if the remote cdrom is detached in this window.
>>>
>>> So this patch replaces assertions with return to avoid qemu crash.
>>>
>>> Signed-off-by: Xiang Zheng 
>>> ---
>>> The qemu error log shows:
>>>
>>> qemu-system-aarch64: /home/qemu/hw/scsi/scsi-disk.c:1896: 
>>> scsi_disk_emulate_command: Assertion `blk_is_available(s->qdev.conf.blk)' 
>>> failed.
>>> 2019-02-15 04:35:18.592: shutting down, reason=crashed
>>
>> Is this with virtio-scsi-dataplane?
>>
> 
> No, the QEMU commandline about scsi is bellow:
>   -device virtio-scsi-pci,id=scsi0,bus=pci.4,addr=0x0 \
>   -drive 
> file=/mnt/zhengxiang/guestos.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=native
>  \
>   -device 
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
>  \
>   -drive 
> file=/home/tmp.iso,format=raw,if=none,id=drive-scsi0-0-0-1,readonly=on,cache=none,aio=threads
>  \
>   -device 
> scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
> 
> This problem can be reproduced by detaching and attaching remote cdrom 
> repeatly.
> 
-- 

Thanks,
Xiang





Re: [Qemu-devel] [PATCH] spapr-rtas: add ibm, get-vpd RTAS interface

2019-02-24 Thread David Gibson
On Sat, Feb 23, 2019 at 11:40:57AM -0300, Maxiwell S. Garcia wrote:
> This adds a handler for ibm,get-vpd RTAS calls, allowing pseries
> guest to collect host information. It is disabled by default to
> avoid unwanted information leakage. To enable it, use:
> ‘-M pseries,vpd-export=on’
> 
> Only the SE and TM keywords are returned at the moment:
> SE for Machine or Cabinet Serial Number and
> TM for Machine Type and Model.
> 
> Powerpc-utils tools can dispatch RTAS calls to retrieve host
> information using this ibm,get-vpd interface. The 'host-serial'
> and 'host-model' nodes of device-tree hold the same information but
> in a static manner, which is useless after a migration operation.
> 
> Signed-off-by: Maxiwell S. Garcia 
> ---
>  hw/ppc/spapr.c | 21 ++
>  hw/ppc/spapr_rtas.c| 93 ++
>  include/hw/ppc/spapr.h | 17 +++-
>  3 files changed, 130 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index abf9ebce59..09fd9e2ebb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3026,6 +3026,20 @@ static char *spapr_get_fw_dev_path(FWPathProvider *p, 
> BusState *bus,
>  return NULL;
>  }
>  
> +static bool spapr_get_vpd_export(Object *obj, Error **errp)
> +{
> +sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +return spapr->vpd_export;
> +}
> +
> +static void spapr_set_vpd_export(Object *obj, bool value, Error **errp)
> +{
> +sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +spapr->vpd_export = value;
> +}
> +
>  static char *spapr_get_kvm_type(Object *obj, Error **errp)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> @@ -3150,6 +3164,7 @@ static void spapr_instance_init(Object *obj)
>  sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>  
>  spapr->htab_fd = -1;
> +spapr->vpd_export = false;
>  spapr->use_hotplug_event_source = true;
>  object_property_add_str(obj, "kvm-type",
>  spapr_get_kvm_type, spapr_set_kvm_type, NULL);
> @@ -3182,6 +3197,12 @@ static void spapr_instance_init(Object *obj)
>  object_property_add_bool(obj, "vfio-no-msix-emulation",
>   spapr_get_msix_emulation, NULL, NULL);
>  
> +object_property_add_bool(obj, "vpd-export", spapr_get_vpd_export,
> + spapr_set_vpd_export, NULL);
> +object_property_set_description(obj, "vpd-export",
> +"Export Host's VPD information to guest",
> +_abort);
> +
>  /* The machine class defines the default interrupt controller mode */
>  spapr->irq = smc->irq;
>  object_property_add_str(obj, "ic-mode", spapr_get_ic_mode,
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index d6a0952154..214e0edfc5 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -287,6 +287,97 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU 
> *cpu,
>  rtas_st(rets, 0, ret);
>  }
>  
> +static inline int vpd_st(target_ulong addr, target_ulong len,
> + const void *val, uint16_t val_len)
> +{
> +hwaddr phys = ppc64_phys_to_real(addr);
> +if (len < val_len) {
> +return RTAS_OUT_PARAM_ERROR;
> +}
> +cpu_physical_memory_write(phys, val, val_len);
> +return RTAS_OUT_SUCCESS;
> +}
> +
> +static inline void vpd_ret(target_ulong rets, const int status,
> +   const int next_seq_number, const int 
> bytes_returned)
> +{
> +rtas_st(rets, 0, status);
> +rtas_st(rets, 1, next_seq_number);
> +rtas_st(rets, 2, bytes_returned);
> +}
> +
> +static void rtas_ibm_get_vpd(PowerPCCPU *cpu,
> + sPAPRMachineState *spapr,
> + uint32_t token, uint32_t nargs,
> + target_ulong args,
> + uint32_t nret, target_ulong rets)
> +{
> +sPAPRMachineState *sm = SPAPR_MACHINE(spapr);
> +target_ulong loc_code_addr;
> +target_ulong work_area_addr;
> +target_ulong work_area_size;
> +target_ulong seq_number;
> +unsigned char loc_code = 0;
> +unsigned int next_seq_number = 0;
> +int status = RTAS_IBM_GET_VPD_PARAMETER_ERROR;
> +int ret = 0;
> +char *field = '\0';

ITYM
char *field = "\0";

Assigning field to an empty string.  As it is '\0' is being coerced to
an integer (0) then to a pointer (NULL)...

> +
> +if (!sm->vpd_export) {
> +vpd_ret(rets, RTAS_OUT_NOT_AUTHORIZED, 1, 0);
> +return;
> +}
> +
> +/* Specific Location Code is not supported */
> +loc_code_addr = rtas_ld(args, 0);
> +cpu_physical_memory_read(loc_code_addr, _code, 1);
> +if (loc_code != 0) {
> +vpd_ret(rets, RTAS_IBM_GET_VPD_PARAMETER_ERROR, 1, 0);
> +return;
> +}
> +
> +work_area_addr = rtas_ld(args, 1);
> +work_area_size = rtas_ld(args, 2);
> +seq_number = 

[Qemu-devel] [ping] [PATCH] Fix breakpoint support in Nios II user-mode emulation.

2019-02-24 Thread Sandra Loosemore

Ping?

http://lists.nongnu.org/archive/html/qemu-devel/2019-02/msg03483.html

-Sandra



[Qemu-devel] [ping] [PATCH v5 0/2] Nios II generic board config and semihosting

2019-02-24 Thread Sandra Loosemore

Ping?

http://lists.nongnu.org/archive/html/qemu-devel/2019-02/msg03167.html

-Sandra



Re: [Qemu-devel] [Bug 1816052] Re: qemu system emulator fails to start if no sound card is present on host

2019-02-24 Thread Like Xu

On 2019/2/19 17:02, Gerd Hoffmann wrote:

What happens if you remove "try-alsa" from the configure line?



This issue could be fixed by passing "export QEMU_AUDIO_DRV=none" to 
shell env when alsa support form host kernel is unavailable.





Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-24 Thread Zhao Yan
On Thu, Feb 21, 2019 at 01:40:51PM -0700, Alex Williamson wrote:
> Hi Yan,
> 
> Thanks for working on this!
> 
> On Tue, 19 Feb 2019 16:50:54 +0800
> Yan Zhao  wrote:
> 
> > This patchset enables VFIO devices to have live migration capability.
> > Currently it does not support post-copy phase.
> > 
> > It follows Alex's comments on last version of VFIO live migration patches,
> > including device states, VFIO device state region layout, dirty bitmap's
> > query.
> > 
> > Device Data
> > ---
> > Device data is divided into three types: device memory, device config,
> > and system memory dirty pages produced by device.
> > 
> > Device config: data like MMIOs, page tables...
> > Every device is supposed to possess device config data.
> > Usually device config's size is small (no big than 10M), and it
> 
> I'm not sure how we can really impose a limit here, it is what it is
> for a device.  A smaller state is obviously desirable to reduce
> downtime, but some devices could have very large states.
> 
> > needs to be loaded in certain strict order.
> > Therefore, device config only needs to be saved/loaded in
> > stop-and-copy phase.
> > The data of device config is held in device config region.
> > Size of device config data is smaller than or equal to that of
> > device config region.
> 
> So the intention here is that this is the last data read from the
> device and it's done in one pass, so the region needs to be large
> enough to expose all config data at once.  On restore it's the last
> data written before switching the device to the run state.
> 
> > 
> > Device Memory: device's internal memory, standalone and outside system
> 
> s/system/VM/
> 
> > memory. It is usually very big.
> 
> Or it doesn't exist.  Not sure we should be setting expectations since
> it will vary per device.
> 
> > This kind of data needs to be saved / loaded in pre-copy and
> > stop-and-copy phase.
> > The data of device memory is held in device memory region.
> > Size of devie memory is usually larger than that of device
> > memory region. qemu needs to save/load it in chunks of size of
> > device memory region.
> > Not all device has device memory. Like IGD only uses system memory.
> 
> It seems a little gratuitous to me that this is a separate region or
> that this data is handled separately.  All of this data is opaque to
> QEMU, so why do we need to separate it?
hi Alex,
as the device state interfaces are provided by kernel, it is expected to
meet as general needs as possible. So, do you think there are such use
cases from user space that user space knows well of the device, and
it wants kernel to return desired data back to it.
E.g. It just wants to get whole device config data including all mmios,
page tables, pci config data...
or, It just wants to get current device memory snapshot, not including any
dirty data.
Or, It just needs the dirty pages in device memory or system memory.
With all this accurate query, quite a lot of useful features can be
developped in user space.

If all of this data is opaque to user app, seems the only use case is
for live migration.

>From another aspect, if the final solution is to let the data opaque to
user space, like what NV did, kernel side's implementation will be more
complicated, and actually a little challenge to vendor driver.
in that case, in pre-copy phase,
1. in not LOGGING state, vendor driver first returns full data including
full device memory snapshot
2. user space reads some data (you can't expect it to finish reading all
data)
3. then userspace set the device state to LOGGING to start dirty data
logging
4. vendor driver starts dirty data logging, and appends the dirty data to
the tail of remaining unread full data and increase the pending data size?
5. user space keeps reading data.
6. vendor driver keeps appending new dirty data to the tail of remaining
unread full data/dirty data and increase the pending data size?

in stop-and-copy phase
1. user space sets device state to exit LOGGING state,
2. vendor driver stops data logging. it has to append device config
   data at the tail of remaining dirty data unread by userspace.

during this flow, when vendor driver should get dirty data? just keeps
logging and appends to tail? how to ensure dirty data is refresh new before
LOGGING state exit? how does vendor driver know whether certain dirty data
is copied or not?

I've no idea how NVidia handle this problem, and they don't open their
kernel side code. 
just feel it's a bit hard for other vendor drivers to follow:)

> > System memory dirty pages: If a device produces dirty pages in system
> > memory, it is able to get dirty bitmap for certain range of system
> > memory. This dirty bitmap is queried in pre-copy and stop-and-copy
> > phase in .log_sync callback. By setting dirty bitmap in .log_sync
> > 

[Qemu-devel] [PATCH v4 0/2] CODING_STYLE: trivial update

2019-02-24 Thread Wei Yang
The first one is suggested by Igor Mammedov to provide rule for multiline
code.

The second is a trivial fix to make example code all indented with 4 spaces.

v4:
  * one exception case for function
v3:
  * fix typo in both changelog and example
v2:
  * adjust Patch 1 as suggested by Eric


Wei Yang (2):
  CODING_STYLE: specify the indent rule for multiline code
  CODING_STYLE: indent example code as all others

 CODING_STYLE | 37 +
 1 file changed, 33 insertions(+), 4 deletions(-)

-- 
2.19.1




[Qemu-devel] [PATCH v4 1/2] CODING_STYLE: specify the indent rule for multiline code

2019-02-24 Thread Wei Yang
We didn't specify the indent rule for multiline code here, which may
mislead users. And in current code, the code use different rules.

Add this rule in CODING_STYLE to make sure this is clear to every one.

Signed-off-by: Wei Yang 
Suggested-by: Igor Mammedov 

---
v4:
   * widths -> width
   * add an exception example for function
v3:
   * misleading -> mislead
   * add comma after arg2 in example
v2:
   * rephrase changelog suggested by Eric Blake
 - remove one redundant line
 - fix some awkward grammar
 - add { ; at the end of example
---
 CODING_STYLE | 29 +
 1 file changed, 29 insertions(+)

diff --git a/CODING_STYLE b/CODING_STYLE
index ec075dedc4..1bccf4f865 100644
--- a/CODING_STYLE
+++ b/CODING_STYLE
@@ -29,6 +29,35 @@ Spaces of course are superior to tabs because:
 
 Do not leave whitespace dangling off the ends of lines.
 
+1.1 Multiline Indent
+
+There are several places where indent is necessary:
+
+ - struct definition
+ - if/else
+ - while/for
+ - function definition & call
+
+When breaking up a long line to fit within line width, align the secondary
+lines just after the opening parenthesis of the first.
+
+For example:
+
+if (a == 1 &&
+b == 2) {
+
+while (a == 1 &&
+   b == 2) {
+
+do_something(arg1, arg2,
+ arg3);
+
+One exception for function is indenting following lines relative to function
+name start:
+
+do_something(arg1, arg2,
+arg3);
+
 2. Line width
 
 Lines should be 80 characters; try not to make them longer.
-- 
2.19.1




[Qemu-devel] [PATCH v4 2/2] CODING_STYLE: indent example code as all others

2019-02-24 Thread Wei Yang
All the example code are indented with four spaces except this one.

Fix this by adding four spaces here.

Signed-off-by: Wei Yang 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
---
 CODING_STYLE | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/CODING_STYLE b/CODING_STYLE
index 1bccf4f865..e6d21dcd67 100644
--- a/CODING_STYLE
+++ b/CODING_STYLE
@@ -137,10 +137,10 @@ block to a separate function altogether.
 When comparing a variable for (in)equality with a constant, list the
 constant on the right, as in:
 
-if (a == 1) {
-/* Reads like: "If a equals 1" */
-do_something();
-}
+if (a == 1) {
+/* Reads like: "If a equals 1" */
+do_something();
+}
 
 Rationale: Yoda conditions (as in 'if (1 == a)') are awkward to read.
 Besides, good compilers already warn users when '==' is mis-typed as '=',
-- 
2.19.1




Re: [Qemu-devel] [PATCH v5] i386, acpi: check acpi_memory_hotplug capacity in pre_plug

2019-02-24 Thread Wei Yang
On Mon, Feb 25, 2019 at 09:07:08AM +0800, Wei Yang wrote:
>Currently we do device realization like below:
>
>   hotplug_handler_pre_plug()
>   dc->realize()
>   hotplug_handler_plug()
>
>Before we do device realization and plug, we should allocate necessary
>resources and check if memory-hotplug-support property is enabled.
>
>At the piix4 and ich9, the memory-hotplug-support property is checked at
>plug stage. This means that device has been realized and mapped into guest
>address space 'pc_dimm_plug()' by the time acpi plug handler is called,
>where it might fail and crash QEMU due to reaching g_assert_not_reached()
>(piix4) or error_abort (ich9).
>
>Fix it by checking if memory hotplug is enabled at pre_plug stage
>where we can gracefully abort hotplug request.
>
>Signed-off-by: Wei Yang 
>CC: Igor Mammedov 
>CC: Eric Blake 
>Signed-off-by: Wei Yang 
>
>---
>v5:
>   * rebase on latest upstream
>   * remove a comment before hotplug_handler_pre_plug
>   * fix alignment for ich9_pm_device_pre_plug_cb
>v4:
>   * fix code alignment of piix4_device_pre_plug_cb
>v3:
>   * replace acpi_memory_hotplug with memory-hotplug-support in changelog
>   * fix code alignment of ich9_pm_device_pre_plug_cb
>   * print which device type memory-hotplug-support is disabled in
> ich9_pm_device_pre_plug_cb and piix4_device_pre_plug_cb
>v2:
>   * (imamm...@redhat.com)
> - Almost the whole third paragraph
>   * apply this change to ich9
>   * use hotplug_handler_pre_plug() instead of open-coding check
>---
> hw/acpi/ich9.c | 15 +--
> hw/acpi/piix4.c| 13 ++---
> hw/i386/pc.c   |  2 ++
> hw/isa/lpc_ich9.c  |  1 +
> include/hw/acpi/ich9.h |  2 ++
> 5 files changed, 28 insertions(+), 5 deletions(-)
>
>diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
>index c5d8646abc..e53dfe1ee3 100644
>--- a/hw/acpi/ich9.c
>+++ b/hw/acpi/ich9.c
>@@ -483,13 +483,24 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs 
>*pm, Error **errp)
>  NULL);
> }
> 
>+void ich9_pm_device_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>+Error **errp)
>+{
>+ICH9LPCState *lpc = ICH9_LPC_DEVICE(hotplug_dev);
>+
>+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
>+!lpc->pm.acpi_memory_hotplug.is_enabled)
>+error_setg(errp,
>+   "memory hotplug is not enabled: %s.memory-hotplug-support "
>+   "is not set", object_get_typename(OBJECT(lpc)));
>+}
>+
> void ich9_pm_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> Error **errp)
> {
> ICH9LPCState *lpc = ICH9_LPC_DEVICE(hotplug_dev);
> 
>-if (lpc->pm.acpi_memory_hotplug.is_enabled &&
>-object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
> nvdimm_acpi_plug_cb(hotplug_dev, dev);
> } else {
>diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
>index df8c0db909..8b9654e437 100644
>--- a/hw/acpi/piix4.c
>+++ b/hw/acpi/piix4.c
>@@ -374,9 +374,16 @@ static void piix4_pm_powerdown_req(Notifier *n, void 
>*opaque)
> static void piix4_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> DeviceState *dev, Error **errp)

Hmm, I found one interesting thing.

This function is introduced in

commit ec266f408882fd38475f72c4e864ed576228643b
pci/pcihp: perform check for bus capability in pre_plug handler

This is just implemented in piix4, but don't has the counterpart in ich9.

So I suggest to have a follow up patch to create the counterpart for ich9 and
then apply this patch on top of that.

Is my understanding correct?

> {
>+PIIX4PMState *s = PIIX4_PM(hotplug_dev);
>+
> if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> acpi_pcihp_device_pre_plug_cb(hotplug_dev, dev, errp);
>-} else if (!object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
>+} else if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
>+   !s->acpi_memory_hotplug.is_enabled) {
>+error_setg(errp,
>+   "memory hotplug is not enabled: %s.memory-hotplug-support "
>+   "is not set", object_get_typename(OBJECT(s)));
>+} else if (
>!object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> error_setg(errp, "acpi: device pre plug request for not supported"
>" device type: %s", object_get_typename(OBJECT(dev)));
>@@ -388,8 +395,7 @@ static void piix4_device_plug_cb(HotplugHandler 
>*hotplug_dev,
> {
> PIIX4PMState *s = PIIX4_PM(hotplug_dev);
> 
>-if (s->acpi_memory_hotplug.is_enabled &&
>-object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
> nvdimm_acpi_plug_cb(hotplug_dev, dev);
> } else 

[Qemu-devel] [PATCH v5] i386, acpi: check acpi_memory_hotplug capacity in pre_plug

2019-02-24 Thread Wei Yang
Currently we do device realization like below:

   hotplug_handler_pre_plug()
   dc->realize()
   hotplug_handler_plug()

Before we do device realization and plug, we should allocate necessary
resources and check if memory-hotplug-support property is enabled.

At the piix4 and ich9, the memory-hotplug-support property is checked at
plug stage. This means that device has been realized and mapped into guest
address space 'pc_dimm_plug()' by the time acpi plug handler is called,
where it might fail and crash QEMU due to reaching g_assert_not_reached()
(piix4) or error_abort (ich9).

Fix it by checking if memory hotplug is enabled at pre_plug stage
where we can gracefully abort hotplug request.

Signed-off-by: Wei Yang 
CC: Igor Mammedov 
CC: Eric Blake 
Signed-off-by: Wei Yang 

---
v5:
   * rebase on latest upstream
   * remove a comment before hotplug_handler_pre_plug
   * fix alignment for ich9_pm_device_pre_plug_cb
v4:
   * fix code alignment of piix4_device_pre_plug_cb
v3:
   * replace acpi_memory_hotplug with memory-hotplug-support in changelog
   * fix code alignment of ich9_pm_device_pre_plug_cb
   * print which device type memory-hotplug-support is disabled in
 ich9_pm_device_pre_plug_cb and piix4_device_pre_plug_cb
v2:
   * (imamm...@redhat.com)
 - Almost the whole third paragraph
   * apply this change to ich9
   * use hotplug_handler_pre_plug() instead of open-coding check
---
 hw/acpi/ich9.c | 15 +--
 hw/acpi/piix4.c| 13 ++---
 hw/i386/pc.c   |  2 ++
 hw/isa/lpc_ich9.c  |  1 +
 include/hw/acpi/ich9.h |  2 ++
 5 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index c5d8646abc..e53dfe1ee3 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -483,13 +483,24 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs 
*pm, Error **errp)
  NULL);
 }
 
+void ich9_pm_device_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+Error **errp)
+{
+ICH9LPCState *lpc = ICH9_LPC_DEVICE(hotplug_dev);
+
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
+!lpc->pm.acpi_memory_hotplug.is_enabled)
+error_setg(errp,
+   "memory hotplug is not enabled: %s.memory-hotplug-support "
+   "is not set", object_get_typename(OBJECT(lpc)));
+}
+
 void ich9_pm_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
 Error **errp)
 {
 ICH9LPCState *lpc = ICH9_LPC_DEVICE(hotplug_dev);
 
-if (lpc->pm.acpi_memory_hotplug.is_enabled &&
-object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
 if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
 nvdimm_acpi_plug_cb(hotplug_dev, dev);
 } else {
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index df8c0db909..8b9654e437 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -374,9 +374,16 @@ static void piix4_pm_powerdown_req(Notifier *n, void 
*opaque)
 static void piix4_device_pre_plug_cb(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
+PIIX4PMState *s = PIIX4_PM(hotplug_dev);
+
 if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
 acpi_pcihp_device_pre_plug_cb(hotplug_dev, dev, errp);
-} else if (!object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
+   !s->acpi_memory_hotplug.is_enabled) {
+error_setg(errp,
+   "memory hotplug is not enabled: %s.memory-hotplug-support "
+   "is not set", object_get_typename(OBJECT(s)));
+} else if (
!object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 error_setg(errp, "acpi: device pre plug request for not supported"
" device type: %s", object_get_typename(OBJECT(dev)));
@@ -388,8 +395,7 @@ static void piix4_device_plug_cb(HotplugHandler 
*hotplug_dev,
 {
 PIIX4PMState *s = PIIX4_PM(hotplug_dev);
 
-if (s->acpi_memory_hotplug.is_enabled &&
-object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
 if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
 nvdimm_acpi_plug_cb(hotplug_dev, dev);
 } else {
@@ -704,6 +710,7 @@ static void piix4_pm_class_init(ObjectClass *klass, void 
*data)
 dc->user_creatable = false;
 dc->hotpluggable = false;
 hc->pre_plug = piix4_device_pre_plug_cb;
+hc->pre_plug = piix4_device_pre_plug_cb;
 hc->plug = piix4_device_plug_cb;
 hc->unplug_request = piix4_device_unplug_request_cb;
 hc->unplug = piix4_device_unplug_cb;
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 578f772e7c..64d9e756fa 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2083,6 +2083,8 @@ static void pc_memory_pre_plug(HotplugHandler 
*hotplug_dev, 

Re: [Qemu-devel] [PATCH] tcg/ppc: Add vector opcodes

2019-02-24 Thread David Gibson
On Fri, Feb 22, 2019 at 05:13:37PM +, Mark Cave-Ayland wrote:
> On 22/02/2019 05:59, Richard Henderson wrote:
> 
> > This requires VSX, not just Altivec, so Power7 or later.
> > 
> > Signed-off-by: Richard Henderson 
> > ---
> > 
> > At present there are no tunables that can avoid the 64-bit element
> > load/store requirement.  As with requiring AVX1 for x86 hosts, I'm
> > not sure it's worth inventing such a tunable for pre-power7 hosts.
> > 
> > Tested vs aarch64 risu test cases.  It's probably worth testing
> > this vs Mark's target/ppc conversion.
> 
> Oooh this looks really exciting! However... I only have a G4 Mac Mini around 
> that I
> use for testing which is Altivec-only :(  Is it much work to support
> non-VSX hosts?

I have access to POWER8 and POWER9 machines, but I haven't worked with
RISU before.  If you can give me a straightforward recipe I can try
running the tests.

> 
> This leads me to a related point that came up when Howard and I were testing 
> the PPC
> vector patches - how do we know at runtime which optimisations were being 
> used, e.g.
> what is the value of have_avx2 on a particular CPU running QEMU?
> 
> Under Linux this isn't too bad since you can just do "cat /proc/cpuinfo | 
> grep avx2"
> but it becomes more tricky when getting bug reports from Windows users who 
> aren't
> particularly technical...
> 
> 
> ATB,
> 
> Mark.
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PULL 00/26] pci, pc, virtio: fixes, cleanups, tests

2019-02-24 Thread David Gibson
On Fri, Feb 22, 2019 at 10:53:54AM -0500, Michael S. Tsirkin wrote:
> On Fri, Feb 22, 2019 at 03:47:36PM +, Peter Maydell wrote:
> > On Fri, 22 Feb 2019 at 02:40, Michael S. Tsirkin  wrote:
> > >
> > > The following changes since commit 
> > > fc3dbb90f2eb069801bfb4cfe9cbc83cf9c5f4a9:
> > >
> > >   Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' 
> > > into staging (2019-02-21 13:09:33 +)
> > >
> > > are available in the Git repository at:
> > >
> > >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> > >
> > > for you to fetch changes up to 1f8c04f18d2ee2f6ec88217dfd547ab38d2be5c5:
> > >
> > >   pci: Sanity test minimum downstream LNKSTA (2019-02-21 12:28:41 -0500)
> > >
> > > 
> > > pci, pc, virtio: fixes, cleanups, tests
> > >
> > > Lots of work on tests: BiosTablesTest UEFI app,
> > > vhost-user testing for non-Linux hosts.
> > > Misc cleanups and fixes all over the place
> > >
> > > Signed-off-by: Michael S. Tsirkin 
> > >
> > > 
> > 
> > Compile failure on clang:
> > 
> > /home/petmay01/linaro/qemu-for-merges/hw/virtio/virtio-balloon.c:40:3:
> > error: redefinition of typedef 'PartiallyBalloonedPage' is a C11
> > feature [-Werror,-Wtypedef-redefinition]
> > } PartiallyBalloonedPage;
> >   ^
> > /home/petmay01/linaro/qemu-for-merges/include/hw/virtio/virtio-balloon.h:33:39:
> > note: previous definition is here
> > typedef struct PartiallyBalloonedPage PartiallyBalloonedPage;
> >   ^
> > 1 error generated.
> > /home/petmay01/linaro/qemu-for-merges/rules.mak:69: recipe for target
> > 'hw/virtio/virtio-balloon.o' failed
> > 
> > thanks
> > -- PMM
> 
> Fixed up and re-pushed.
> David, pls note above and don't add duplicate typedefs in the future.
> There's always include/qemu/typedefs.h if you don't know where
> to put a typedef.

Yeah, sorry.  I noticed the failure on Travis and was going to send a
fix, not realizing you'd already picked it up.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/5] virtio-balloon: Corrections to address verification

2019-02-24 Thread David Gibson
On Fri, Feb 22, 2019 at 10:08:22AM +0100, Greg Kurz wrote:
> On Thu, 14 Feb 2019 15:39:13 +1100
> David Gibson  wrote:
> 
> > The virtio-balloon device's verification of the address given to it by the
> > guest has a number of faults:
> > * The addresses here are guest physical addresses, which should be
> >   'hwaddr' rather than 'ram_addr_t' (the distinction is admittedly
> >   pretty subtle and confusing)
> > * We don't check for section.mr being NULL, which is the main way that
> >   memory_region_find() reports basic failures.  We really need to check
> >   that before looking at any other section fields, because
> >   memory_region_find() doesn't initialize them on the failure path
> > * We're passing a length of '1' to memory_region_find(), but really the
> >   guest is requesting that we put the entire page into the balloon,
> >   so it makes more sense to call it with BALLOON_PAGE_SIZE
> > 
> > Signed-off-by: David Gibson 
> > Reviewed-by: David Hildenbrand 
> > Reviewed-by: Michael S. Tsirkin 
> > ---
> >  hw/virtio/virtio-balloon.c | 17 ++---
> >  1 file changed, 10 insertions(+), 7 deletions(-)
> > 
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 43af521884..eb357824d8 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -221,17 +221,20 @@ static void virtio_balloon_handle_output(VirtIODevice 
> > *vdev, VirtQueue *vq)
> >  }
> >  
> >  while (iov_to_buf(elem->out_sg, elem->out_num, offset, , 4) == 
> > 4) {
> > -ram_addr_t pa;
> > -ram_addr_t addr;
> > +hwaddr pa;
> > +hwaddr addr;
> >  int p = virtio_ldl_p(vdev, );
> >  
> > -pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
> > +pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
> >  offset += 4;
> >  
> > -/* FIXME: remove get_system_memory(), but how? */
> > -section = memory_region_find(get_system_memory(), pa, 1);
> > -if (!int128_nz(section.size) ||
> > -!memory_region_is_ram(section.mr) ||
> > +section = memory_region_find(get_system_memory(), pa,
> > + BALLOON_PAGE_SIZE);
> > +if (!section.mr) {
> > +trace_virtio_balloon_bad_addr(pa);
> > +continue;
> > +}
> 
> memory_region_unref(section.mr) with section.mr == NULL is safe and
> resolves to a nop. Not sure you need a separate if to handle this
> case.

memory_region_is_ram() and friends are not, however - they will
dereference their argument unconditionally.

> 
> Apart from that,
> 
> Reviewed-by: Greg Kurz 
> 
> > +if (!memory_region_is_ram(section.mr) ||
> >  memory_region_is_rom(section.mr) ||
> >  memory_region_is_romd(section.mr)) {
> >  trace_virtio_balloon_bad_addr(pa);
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH] hw/ppc: Use object_initialize_child for correct reference counting

2019-02-24 Thread David Gibson
On Fri, Feb 22, 2019 at 07:20:14AM +0100, Thomas Huth wrote:
> On 21/02/2019 19.14, Philippe Mathieu-Daudé wrote:
> > On 2/21/19 12:24 PM, Thomas Huth wrote:
> >> Both functions, object_initialize() and object_property_add_child() 
> >> increase
> >> the reference counter of the new object, so one of the references has to be
> >> dropped afterwards to get the reference counting right. Otherwise the child
> >> object will not be properly cleaned up when the parent gets destroyed.
> >> Thus let's use now object_initialize_child() instead to get the reference
> >> counting here right.
> >>
> >> Suggested-by: Eduardo Habkost 
> >> Signed-off-by: Thomas Huth 
> >> ---
> >>  hw/ppc/pnv.c | 12 ++--
> >>  hw/ppc/pnv_psi.c |  4 ++--
> >>  hw/ppc/spapr.c   |  6 +++---
> >>  3 files changed, 11 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> >> index da54086..9e03e9c 100644
> >> --- a/hw/ppc/pnv.c
> >> +++ b/hw/ppc/pnv.c
> >> @@ -736,18 +736,18 @@ static void pnv_chip_power8_instance_init(Object 
> >> *obj)
> >>  {
> >>  Pnv8Chip *chip8 = PNV8_CHIP(obj);
> >>  
> >> -object_initialize(>psi, sizeof(chip8->psi), TYPE_PNV_PSI);
> >> -object_property_add_child(obj, "psi", OBJECT(>psi), NULL);
> >> +object_initialize_child(obj, "psi",  >psi, sizeof(chip8->psi),
> >> +TYPE_PNV_PSI, _abort, NULL);
> >>  object_property_add_const_link(OBJECT(>psi), "xics",
> >> OBJECT(qdev_get_machine()), 
> >> _abort);
> >>  
> >> -object_initialize(>lpc, sizeof(chip8->lpc), TYPE_PNV_LPC);
> >> -object_property_add_child(obj, "lpc", OBJECT(>lpc), NULL);
> >> +object_initialize_child(obj, "lpc",  >lpc, sizeof(chip8->lpc),
> >> +TYPE_PNV_LPC, _abort, NULL);
> >>  object_property_add_const_link(OBJECT(>lpc), "psi",
> >> OBJECT(>psi), _abort);
> >>  
> >> -object_initialize(>occ, sizeof(chip8->occ), TYPE_PNV_OCC);
> >> -object_property_add_child(obj, "occ", OBJECT(>occ), NULL);
> >> +object_initialize_child(obj, "occ",  >occ, sizeof(chip8->occ),
> >> +TYPE_PNV_OCC, _abort, NULL);
> >>  object_property_add_const_link(OBJECT(>occ), "psi",
> >> OBJECT(>psi), _abort);
> >>  }
> >> diff --git a/hw/ppc/pnv_psi.c b/hw/ppc/pnv_psi.c
> >> index 8ced095..44bc0cb 100644
> >> --- a/hw/ppc/pnv_psi.c
> >> +++ b/hw/ppc/pnv_psi.c
> >> @@ -444,8 +444,8 @@ static void pnv_psi_init(Object *obj)
> >>  {
> >>  PnvPsi *psi = PNV_PSI(obj);
> >>  
> >> -object_initialize(>ics, sizeof(psi->ics), TYPE_ICS_SIMPLE);
> >> -object_property_add_child(obj, "ics-psi", OBJECT(>ics), NULL);
> >> +object_initialize_child(obj, "ics-psi",  >ics, sizeof(psi->ics),
> >> +TYPE_ICS_SIMPLE, _abort, NULL);
> >>  }
> >>  
> >>  static const uint8_t irq_to_xivr[] = {
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index abf9ebc..6c58dca 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -1696,9 +1696,9 @@ static void spapr_create_nvram(sPAPRMachineState 
> >> *spapr)
> >>  
> >>  static void spapr_rtc_create(sPAPRMachineState *spapr)
> >>  {
> >> -object_initialize(>rtc, sizeof(spapr->rtc), TYPE_SPAPR_RTC);
> >> -object_property_add_child(OBJECT(spapr), "rtc", OBJECT(>rtc),
> >> -  _fatal);
> >> +object_initialize_child(OBJECT(spapr), "rtc",
> >> +>rtc, sizeof(spapr->rtc), 
> >> TYPE_SPAPR_RTC,
> >> +_fatal, NULL);
> >>  object_property_set_bool(OBJECT(>rtc), true, "realized",
> >>_fatal);
> >>  object_property_add_alias(OBJECT(spapr), "rtc-time", 
> >> OBJECT(>rtc),
> >>
> > 
> > What about intc/spapr_xive.c?
> 
> Good hint. I missed that one since it's in hw/intc, not in hw/ppc...
> 
> David, could you please squash this on top:
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 290a290..ac5f5ef 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -244,13 +244,12 @@ static void spapr_xive_instance_init(Object *obj)
>  {
>  sPAPRXive *xive = SPAPR_XIVE(obj);
>  
> -object_initialize(>source, sizeof(xive->source), TYPE_XIVE_SOURCE);
> -object_property_add_child(obj, "source", OBJECT(>source), NULL);
> +object_initialize_child(obj, "source", >source, 
> sizeof(xive->source),
> +TYPE_XIVE_SOURCE, _abort, NULL);
>  
> -object_initialize(>end_source, sizeof(xive->end_source),
> -  TYPE_XIVE_END_SOURCE);
> -object_property_add_child(obj, "end_source", OBJECT(>end_source),
> -  NULL);
> +object_initialize_child(obj, "end_source", >end_source,
> +sizeof(xive->end_source), TYPE_XIVE_END_SOURCE,
> +_abort, NULL);
>  }
> 

Re: [Qemu-devel] x86 segment limits enforcement with TCG

2019-02-24 Thread Stephen Checkoway



> On Feb 24, 2019, at 14:46, Peter Maydell  wrote:
> 
> On Sun, 24 Feb 2019 at 19:37, Stephen Checkoway
>  wrote:
>> I think that something about adding the tcg_gen_brcond_tl is causing values 
>> to become dead and then qemu aborts.
> 
> Yep -- all "TCG temporaries" are dead at the end
> of a basic block, and brcond ends a basic block.
> Only globals and "local temporaries" stay live
> across brcond. This is documented in tcg/README,
> though it doesn't spell it out very explicitly.

Ah yes. I see that now. I missed it on my first read through.

> This makes brcond pretty painful to use and
> almost impossible to introduce into the middle
> of some existing sequence of generated code.
> I haven't looked at what the best way to do what
> you're trying to do here is, though.

Are there other examples of straight-line code being converted to a conditional 
I might be able to use as an example? I thought INTO would be a good example, 
but it merely calls a helper. Maybe I should do that? I assume it'll be slow, 
but speed isn't really my primary concern.

> By the way, don't do this:
> +dc->A1 = tcg_temp_new();
> 
> The current use of a small number of tcg temps
> in the i386 translate.c code is an antipattern
> that is a relic from a very old version of the
> code. It's much better to simply create new
> temporaries in the code at the point where you
> need them and then free them once you're done.

Great, thanks. I saw both the A0/T0/T1 and the creation of new temporaries and 
I wasn't sure which pattern I should follow.

-- 
Stephen Checkoway








Re: [Qemu-devel] [Bug 1817345] Re: configure script breaks when $source_path contains white spaces

2019-02-24 Thread Peter Maydell
On Sun, 24 Feb 2019 at 19:46, Michael Tokarev
<1817...@bugs.launchpad.net> wrote:
> I think it is better to just disallow building in a path containing
> spaces, -- there are so many packages and other tools that fails in this
> config, might require lots of work to fix this AND to ensure all future
> changes are still working, and there's an easy workaround

Certainly I bet that the problems with spaces in the path do
not stop with the handling of the target-list help message.
Make does not really get on at all with spaces...
It would probably be helpful if we made configure complain
if it finds spaces in the build or source paths.

thanks
-- PMM



Re: [Qemu-devel] x86 segment limits enforcement with TCG

2019-02-24 Thread Peter Maydell
On Sun, 24 Feb 2019 at 19:37, Stephen Checkoway
 wrote:
> I think that something about adding the tcg_gen_brcond_tl is causing values 
> to become dead and then qemu aborts.

Yep -- all "TCG temporaries" are dead at the end
of a basic block, and brcond ends a basic block.
Only globals and "local temporaries" stay live
across brcond. This is documented in tcg/README,
though it doesn't spell it out very explicitly.

This makes brcond pretty painful to use and
almost impossible to introduce into the middle
of some existing sequence of generated code.
I haven't looked at what the best way to do what
you're trying to do here is, though.

By the way, don't do this:
+dc->A1 = tcg_temp_new();

The current use of a small number of tcg temps
in the i386 translate.c code is an antipattern
that is a relic from a very old version of the
code. It's much better to simply create new
temporaries in the code at the point where you
need them and then free them once you're done.

thanks
-- PMM



[Qemu-devel] [Bug 1817345] Re: configure script breaks when $source_path contains white spaces

2019-02-24 Thread Michael Tokarev
I think it is better to just disallow building in a path containing
spaces, -- there are so many packages and other tools that fails in this
config, might require lots of work to fix this AND to ensure all future
changes are still working, and there's an easy workaround

Just my few cents.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1817345

Title:
  configure script breaks when $source_path contains white spaces

Status in QEMU:
  New

Bug description:
  Hi,

  I noticed that the configure script breaks when the qemu source
  directory is in a path containing white spaces, in particular the list
  of targets is not correctly generated when calling "./configure
  --help".

  Steps to reproduce the problem:

  $ mkdir "dir with spaces"
  $ cd dir\ with\ spaces/
  $ git clone https://git.qemu.org/git/qemu.git
  $ cd qemu/
  $ ./configure --help | grep -A3 target-list

  
  Actual result:

--target-list=LIST   set target list (default: build everything)
 Available targets: dir with *-softmmu dir with 
 *-linux-user

  
  Expected result:

--target-list=LIST   set target list (default: build everything)
 Available targets: aarch64-softmmu alpha-softmmu 
 arm-softmmu cris-softmmu hppa-softmmu i386-softmmu 
 lm32-softmmu m68k-softmmu microblaze-softmmu 

  
  This happens because the $mak_wilds variable uses spaces to separate 
different paths, maybe newlines may be used, which are less likely to be in 
directory names.

  BTW "shellcheck" may help finding some other problems.

  Qemu version:

  $ git describe 
  v3.1.0-1960-ga05838cb2a

  Thanks,
 Antonio

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1817345/+subscriptions



[Qemu-devel] x86 segment limits enforcement with TCG

2019-02-24 Thread Stephen Checkoway
Hi all,

Sorry for the long email. The TL;DR is I'm trying to add x86 segment limit 
checks in TCG and my added code is causing qemu to abort during code generation 
because some registers have become dead.

I have some x86 firmware that refuses to boot unless a variety of x86 security 
mechanisms are enforced. In particular, it checks that the general protection 
fault handler runs if an access to memory is greater than a segment limit: it 
configures a segment with a limit of 3, sets ds to the corresponding selector 
and then tries to load 4 bytes from address 3.

I looked through target/i386/translate.c and it looks like in most (all?) 
cases, gen_lea_v_seg is used, either directly or indirectly, to compute a 
linear address by adding the base address of the segment. I attempted to modify 
this to check the segment limit and raise a general protection fault if the 
base address plus the size of the access is greater than the limit. The full 
diff is below, but the main change adds this.

if (s->pe && !CODE64(s)) {
/*
 * Check that the access is within the segment limit in protected
 * mode.
 */
TCGLabel *label = gen_new_label();
int access_limit = (1 << (aflag & MO_SIZE)) - 1;
tcg_gen_mov_tl(s->A1, a0);
if (access_limit > 0) {
tcg_gen_addi_tl(s->A1, s->A1, access_limit);
}
tcg_gen_brcond_tl(TCG_COND_LE, s->A1, limit, label);
gen_exception(s, EXCP0D_GPF, s->pc_start - s->cs_base);
gen_set_label(label);
}

With this change, qemu aborts when trying to run tests/pxe-test. I ran the test 
in lldb and it's aborting in the TCG generation (full stack trace below).

frame #3: 0x00010001eac4 qemu-system-i386`temp_load(s=0x000105062600, 
ts=0x000105063170, desired_regs=65535, allocated_regs=48, preferred_regs=0) 
at tcg.c:3211
   3208 break;
   3209 case TEMP_VAL_DEAD:
   3210 default:
-> 3211 tcg_abort();
   3212 }
   3213 ts->reg = reg;
   3214 ts->val_type = TEMP_VAL_REG;

because ts->value_type is TEMP_VAL_DEAD.

I tried removing everything except for the tcg_gen_brcond_tl (using a0, rather 
than s->A1) and the gen_set_label and I have the same behavior. The operation 
it aborts on is

(lldb) p *op
(TCGOp) $0 = {
  opc = INDEX_op_add_i32
  param1 = 0
  param2 = 0
  life = 24
  link = {
tqe_next = 0x000105055c30
tqe_circ = {
  tql_next = 0x000105055c30
  tql_prev = 0x000105055b48
}
  }
  args = ([0] = 4378961264, [1] = 4378961264, [2] = 4378960256, [3] = 
18446744073709486096, [4] = 281470681808641, [5] = 4378961880, [6] = 
4378961880, [7] = 0, [8] = 18446744069414649855, [9] = 5, [10] = 4379204752)
  output_pref = ([0] = 65343, [1] = 1)
}

which I think is coming from the add for the segment base. Changing the 
condition to TCG_COND_ALWAYS breaks elsewhere. 

I think that something about adding the tcg_gen_brcond_tl is causing values to 
become dead and then qemu aborts.

My questions are
1. Is this the right place to add segment limit checks? (I guess I don't need 
to check every memory access; catching the majority of them is likely fine.)

2. Am I using tcg_gen_brcond_tl (or any of the other TCG functions) incorrectly?

3. Is there a way I can see the generated TCG for this test? I've been running 
lldb using
$ QTEST_QEMU_BINARY='tmux split-window lldb -- i386-softmmu/qemu-system-i386' 
QTEST_QEMU_IMG=qemu-img tests/pxe-test -m=quick -k --tap
but when I try to add additional options like -d op, the test harness aborts: 
qemu/tests/boot-sector.c:119:boot_sector_init: code should not be reached

Any assistance is very much appreciated.

Thank you,

Steve

Stack trace:
  * frame #0: 0x7fff6b45323e libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff6b509c1c libsystem_pthread.dylib`pthread_kill + 285
frame #2: 0x7fff6b3bc1c9 libsystem_c.dylib`abort + 127
frame #3: 0x00010001eac4 
qemu-system-i386`temp_load(s=0x000105062600, ts=0x000105063170, 
desired_regs=65535, allocated_regs=48, preferred_regs=0) at tcg.c:3211
frame #4: 0x00010001af09 
qemu-system-i386`tcg_reg_alloc_op(s=0x000105062600, op=0x000104a78300) 
at tcg.c:3458
frame #5: 0x0001000176f2 
qemu-system-i386`tcg_gen_code(s=0x000105062600, tb=0x0001058f93c0) at 
tcg.c:3981
frame #6: 0x0001000d3972 
qemu-system-i386`tb_gen_code(cpu=0x000105043c00, pc=787953, cs_base=786432, 
flags=192, cflags=-16252928) at translate-all.c:1751
frame #7: 0x0001000d1696 
qemu-system-i386`tb_find(cpu=0x000105043c00, last_tb=0x, 
tb_exit=0, cf_mask=524288) at cpu-exec.c:407
frame #8: 0x0001000d0dc1 
qemu-system-i386`cpu_exec(cpu=0x000105043c00) at cpu-exec.c:728
frame #9: 0x00010006bbf1 
qemu-system-i386`tcg_cpu_exec(cpu=0x000105043c00) at cpus.c:1429
frame #10: 0x00010006b4b7 

Re: [Qemu-devel] [PATCH] riscv: Add proper alignment check and pending 'C' extension upon misa writes

2019-02-24 Thread Richard Henderson
On 2/23/19 11:57 PM, Amed Magdy wrote:
> Thank you for your review and feedback, Richard.
> As Eric mentioned, this is the first time contribution. I have been exploring
> Qemu for some time and try to understand main flow, internals, ..etc.
> 
>>  You cannot manipulate env like this during translation.
> 
>>  Neither the write to env->pc_next nor the read from env->pending_rvc here 
>>will
>>  be in any synchronization with the execution of write_misa. 
>  
> Does this applies for translated code in a single translation block only or 
> for
> different TBs also ?

All of the time this will not work.

> So should I manipulate "env" from translation context through helpers only ?
> for example:
> 
> TCGv temp;
> tcg_gen_movi_tl(temp, ctx->pc_succ_insn);
> gen_helper_next_pc(cpu_env, temp);
> 
> while the helper function definition like that:
> void helper_next_pc(CPURISCVState *env, target_ulong pc_next)
> {
>     env->pc_next = pc_next;
> }

This is a correct way to write pc_next.  (Although I still don't understand why
you need it.)

> and the same to read "env->pending_rvc"

No, you cannot read pending_rvc this way.  You would need to incorporate
pending_rvc into the flags set by cpu_get_tb_cpu_state.  (Although I still
don't understand why you need it.)

> I'm thinking of a way to add 'C' extension at run time through waiting the
> correct aligned instruction, which I believe it might be after branch 
> something
> quite similar to switching between ARM and THUMB states in ARM, for misa 'RVC'
> enable to take effect since it will be no possibility to check alignment with
> 'C' extension.

It seems to me that the C extension can be enabled at any point, since if C is
off, you know that the next insn is aligned modulo 4.

It is only if the C extension is enabled, and you want to disable it, that is
when we must check to see if the next insn is aligned mod 4.  It is trivial to
arrange for a particular instruction to be aligned, via assembler directives.
So it seems silly to make the definition of the csr write to misa any more
complicated than it is.

You are right that the existing code is broken, in that it is checking the
alignment of the host PC and not the guest PC.  However, I see no reason to
introduce a new "pc_next" field when we already update the pc field ...

> static void gen_system(DisasContext *ctx, uint32_t opc, int rd, int rs1,
>int csr)
> {
> TCGv source1, csr_store, dest, rs1_pass, imm_rs1;
> source1 = tcg_temp_new();
> csr_store = tcg_temp_new();
> dest = tcg_temp_new();
> rs1_pass = tcg_temp_new();
> imm_rs1 = tcg_temp_new();
> gen_get_gpr(source1, rs1);
> tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);

... here.

So as far as I can see, the only thing that needs fixing is GETPC.


r~


diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 960d2b0aa9..8726ef802e 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -370,10 +370,11 @@ static int write_misa(CPURISCVState *env, int csrno,
target_ulong val)
 val &= ~RVD;
 }

-/* Suppress 'C' if next instruction is not aligned
- * TODO: this should check next_pc
+/*
+ * Suppress 'C' if next instruction is not aligned.
+ * We updated env->pc to the next insn in the translator.
  */
-if ((val & RVC) && (GETPC() & ~3) != 0) {
+if ((val & RVC) && (env->pc & ~3) != 0) {
 val &= ~RVC;
 }






Re: [Qemu-devel] [PULL 00/26] pci, pc, virtio: fixes, cleanups, tests

2019-02-24 Thread Michael S. Tsirkin
On Sun, Feb 24, 2019 at 10:21:52AM +, Peter Maydell wrote:
> On Sun, 24 Feb 2019 at 00:34, Michael S. Tsirkin  wrote:
> > Peter, can you merge for_upstream now pls? Don't want to spam
> > the list with a trivial change like that ...
> 
> Yes, it's on my list, but so are seven other pullreqs;
> seems like everybody likes to submit on a Friday, so
> sending on a Friday guarantees maximum delay because
> you'll be in a big queue with other people and I don't
> generally handle pullreqs on the weekend either...

OK I'll try to switch over to middle of the week.

> In general I prefer it if you just re-send the cover-letter
> email as a v2 rather than informally asking for a retry: that
> guarantees I'll see it and automatically makes it appear
> in my list of things to process. You don't need to
> resend all the individual patchmails if the change was
> minor.
> 
> thanks
> -- PMM

Good to know.

-- 
MST



[Qemu-devel] [PATCH] tests/Makefile.include: test all rounding modes of softfloat

2019-02-24 Thread Alex Bennée
We missed a bug in a recent patch as we were not testing all the
rounding modes for all operations. However enabling all rounding modes
for mulAdd does slow down the already slowest test and doesn't really
buy us much additional coverage so lets allow the default test flags
to be overridden.

Signed-off-by: Alex Bennée 
---
 tests/Makefile.include | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 060f765b0e..f260014f02 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -898,12 +898,12 @@ $(FP_TEST_BIN):
 # The full test suite can take a bit of time, default to a quick run
 # "-l 2 -r all" can take more than a day for some operations and is best
 # run manually
-FP_TL=-l 1
+FP_TL=-l 1 -r all
 
-# $1 = tests, $2 = description
+# $1 = tests, $2 = description, $3 = test flags
 test-softfloat = $(call quiet-command, \
cd $(BUILD_DIR)/tests/fp && \
-   ./fp-test -s $(FP_TL) $1 > $2.out 2>&1 || \
+   ./fp-test -s $(if $3,$3,$(FP_TL)) $1 > $2.out 2>&1 || \
(cat $2.out && exit 1;), \
"FLOAT TEST", $2)
 
@@ -984,7 +984,7 @@ check-softfloat-compare: $(SF_COMPARE_RULES)
 check-softfloat-mulAdd: $(FP_TEST_BIN)
$(call test-softfloat, \
f16_mulAdd f32_mulAdd f64_mulAdd f128_mulAdd, \
-   mulAdd)
+   mulAdd,-l 1)
 
 # FIXME: extF80_rem (broken)
 check-softfloat-rem: $(FP_TEST_BIN)
-- 
2.20.1




Re: [Qemu-devel] [PATCH 2/5] decodetree: Move documentation to docs/decodetree.rst

2019-02-24 Thread Peter Maydell
On Sat, 23 Feb 2019 at 23:35, Richard Henderson
 wrote:
>
> One great big block comment isn't the best way to document
> the syntax of a language.
>
> Signed-off-by: Richard Henderson 
> ---
>  MAINTAINERS   |   1 +
>  docs/decodetree.rst   | 156 ++
>  scripts/decodetree.py | 134 +---
>  3 files changed, 158 insertions(+), 133 deletions(-)
>  create mode 100644 docs/decodetree.rst

Could you put this in docs/devel/, please ?
Our plan for documentation is to try to split it into
multiple manuals, each of which gets a subdirectory of
docs/, and eventually have nothing still in the docs/
directory itself. This doc should definitely be in
devel/ as it's relevant only to QEMU developers.

thanks
-- PMM



Re: [Qemu-devel] [PULL 00/26] pci, pc, virtio: fixes, cleanups, tests

2019-02-24 Thread Peter Maydell
On Sun, 24 Feb 2019 at 00:34, Michael S. Tsirkin  wrote:
> Peter, can you merge for_upstream now pls? Don't want to spam
> the list with a trivial change like that ...

Yes, it's on my list, but so are seven other pullreqs;
seems like everybody likes to submit on a Friday, so
sending on a Friday guarantees maximum delay because
you'll be in a big queue with other people and I don't
generally handle pullreqs on the weekend either...

In general I prefer it if you just re-send the cover-letter
email as a v2 rather than informally asking for a retry: that
guarantees I'll see it and automatically makes it appear
in my list of things to process. You don't need to
resend all the individual patchmails if the change was
minor.

thanks
-- PMM



Re: [Qemu-devel] [PATCH] riscv: Add proper alignment check and pending 'C' extension upon misa writes

2019-02-24 Thread Amed Magdy
Thank you so much, Eric.

Sorry about this unclear description.
I forgot to fix user name in git configuration before submitting the patch.

Sorry about any inconvenience.
Thanks,
Ahmed

On Sat, 23 Feb 2019 at 23:46, Eric Blake  wrote:

> On 2/22/19 10:25 AM, amagdy.af...@gmail.com wrote:
> > From: ahmed_magdy 
> >
> > Signed-off-by: ahmed_magdy 
>
> This appears to be your first contribution to qemu. Welcome to the
> community!
>
> Typically, a Signed-off-by designation should be a proper name (what you
> would sign a legal document with, as it has a legal significance on your
> right to contribute the code).  Using all lowercase and _ instead of
> space looks like a username, and while I am not one to tell you it can't
> be a legal name, it is unusual enough to at least raise my suspicions.
>
> Furthermore, your commit message doesn't give any details beyond the
> "what" in the subject line. The body of the commit message should
> explain the "why" (what bug are you fixing, how to reproduce it), so
> that a reviewer stands a chance of determining if the code matches the
> description you gave, and if the issue you describe really does warrant
> the inclusion of your patch.  You gave a brief "why" in your cover letter:
>
> "I'm submiting this patch to properly check the next instruction
> alignment and scheduale compression extenstion enable upon 'MISA'
> register writes to later aligned instruction through exporting next
> instruction 'pc' to riscv cpu state"
>
> where it would be wise to include an improved version of that text with
> this commit proper (since the cover letter does not get applied to git).
>  For that matter, when sending a single patch, a cover letter is
> optional (it is only mandatory when sending a multi-patch series).
>
> For more patch submission hints, see:
> https://wiki.qemu.org/Contribute/SubmitAPatch
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>


Re: [Qemu-devel] [PATCH v6 1/7] vhost-user: Support transferring inflight buffer between qemu and backend

2019-02-24 Thread Yongji Xie
On Sun, 24 Feb 2019 at 08:14, Michael S. Tsirkin  wrote:
>
> On Sat, Feb 23, 2019 at 09:10:01PM +0800, Yongji Xie wrote:
> > On Fri, 22 Feb 2019 at 22:54, Michael S. Tsirkin  wrote:
> > >
> > > On Fri, Feb 22, 2019 at 03:05:23PM +0800, Yongji Xie wrote:
> > > > On Fri, 22 Feb 2019 at 14:21, Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Fri, Feb 22, 2019 at 10:47:03AM +0800, Yongji Xie wrote:
> > > > > > > > +
> > > > > > > > +To track inflight I/O, the queue region should be processed as 
> > > > > > > > follows:
> > > > > > > > +
> > > > > > > > +When receiving available buffers from the driver:
> > > > > > > > +
> > > > > > > > +1. Get the next available head-descriptor index from 
> > > > > > > > available ring, i
> > > > > > > > +
> > > > > > > > +2. Set desc[i].inflight to 1
> > > > > > > > +
> > > > > > > > +When supplying used buffers to the driver:
> > > > > > > > +
> > > > > > > > +1. Get corresponding used head-descriptor index, i
> > > > > > > > +
> > > > > > > > +2. Set desc[i].next to process_head
> > > > > > > > +
> > > > > > > > +3. Set process_head to i
> > > > > > > > +
> > > > > > > > +4. Steps 1,2,3 may be performed repeatedly if batching is 
> > > > > > > > possible
> > > > > > > > +
> > > > > > > > +5. Increase the idx value of used ring by the size of the 
> > > > > > > > batch
> > > > > > > > +
> > > > > > > > +6. Set the inflight field of each DescStateSplit entry in 
> > > > > > > > the batch to 0
> > > > > > > > +
> > > > > > > > +7. Set used_idx to the idx value of used ring
> > > > > > > > +
> > > > > > > > +When reconnecting:
> > > > > > > > +
> > > > > > > > +1. If the value of used_idx does not match the idx value 
> > > > > > > > of used ring,
> > > > > > > > +
> > > > > > > > +(a) Subtract the value of used_idx from the idx value 
> > > > > > > > of used ring to get
> > > > > > > > +the number of in-progress DescStateSplit entries
> > > > > > > > +
> > > > > > > > +(b) Set the inflight field of the in-progress 
> > > > > > > > DescStateSplit entries which
> > > > > > > > +start from process_head to 0
> > > > > > > > +
> > > > > > > > +(c) Set used_idx to the idx value of used ring
> > > > > > > > +
> > > > > > > > +2. Resubmit each inflight DescStateSplit entry
> > > > > > >
> > > > > > > I re-read a couple of time and I still don't understand what it 
> > > > > > > says.
> > > > > > >
> > > > > > > For simplicity consider split ring. So we want a list of heads 
> > > > > > > that are
> > > > > > > outstanding. Fair enough. Now device finishes a head. What now? I 
> > > > > > > needs
> > > > > > > to drop head from the list. But list is unidirectional (just 
> > > > > > > next, no
> > > > > > > prev). So how can you drop an entry from the middle?
> > > > > > >
> > > > > >
> > > > > > The process_head is only used when slave crash between increasing 
> > > > > > the
> > > > > > idx value of used ring and updating used_idx. We use it to find the
> > > > > > in-progress DescStateSplit entries before the crash and complete 
> > > > > > them
> > > > > > when reconnecting. Make sure guest and slave have the same view for
> > > > > > inflight I/Os.
> > > > > >
> > > > >
> > > > > But I don't understand how does the described process help do it?
> > > > >
> > > >
> > > > For example, we need to submit descriptors A, B, C to driver in a batch.
> > > >
> > > > Firstly, we will link those descriptors like:
> > > >
> > > > process_head->A->B->C(A)
> > > >
> > > > Then, we need to update idx value of used vring to mark those
> > > > descriptors as used:
> > > >
> > > > _vring.used->idx += 3(B)
> > > >
> > > > At last, clear the inflight field of those descriptors and update
> > > > used_idx field:
> > > >
> > > > A.inflight = 0; B.inflight = 0; C.inflight = 0;(C)
> > > >
> > > > used_idx = _vring.used->idx;(D)
> > > >
> > > > After (B), guest can consume the descriptors A,B,C. So we must make
> > > > sure the inflight field of A,B,C is cleared when reconnecting to avoid
> > > > re-submitting used descriptor. If slave crash during (C), the inflight
> > > > field of A,B,C may be incorrect. To detect that case, we can see
> > > > whether used_idx matches _vring.used->idx. And through process_head,
> > > > we can get the in-progress descriptors A,B,C and clear their inflight
> > > > field again when reconnecting.
> > > >
> > > > >
> > > > > > In other case, the inflight field is enough to track inflight I/O.
> > > > > > When reconnecting, we go through all DescStateSplit entries and
> > > > > > re-submit the entry whose inflight field is equal to 1.
> > > > >
> > > > > What I don't understand is how do we know the order
> > > > > in which they have to be resubmitted. Reordering
> > > > > operations would be a big problem, won't it?
> > > > >
> > > >
> > > > In previous patch, I record avail_idx for each DescStateSplit entry to
> > > > preserve the order. Is it useful to fix