date:20201103

Re: [PATCH v1] docs/devel: Add VFIO device migration documentation

2020-11-03 Thread Kirti Wankhede





On 11/4/2020 1:57 AM, Alex Williamson wrote:

On Wed, 4 Nov 2020 01:18:12 +0530
Kirti Wankhede  wrote:


On 10/30/2020 12:35 AM, Alex Williamson wrote:

On Thu, 29 Oct 2020 23:11:16 +0530
Kirti Wankhede  wrote:
   





+System memory dirty pages tracking
+--
+
+A ``log_sync`` memory listener callback is added to mark system memory pages


s/is added to mark/marks those/
  

+as dirty which are used for DMA by VFIO device. Dirty pages bitmap is queried


s/by/by the/
s/Dirty/The dirty/
  

+per container. All pages pinned by vendor driver through vfio_pin_pages()


s/by/by the/
  

+external API have to be marked as dirty during migration. When there are CPU
+writes, CPU dirty page tracking can identify dirtied pages, but any page pinned
+by vendor driver can also be written by device. There is currently no device


s/by/by the/ (x2)
  

+which has hardware support for dirty page tracking. So all pages which are
+pinned by vendor driver are considered as dirty.
+Dirty pages are tracked when device is in stop-and-copy phase because if pages
+are marked dirty during pre-copy phase and content is transfered from source to
+destination, there is no way to know newly dirtied pages from the point they
+were copied earlier until device stops. To avoid repeated copy of same content,
+pinned pages are marked dirty only during stop-and-copy phase.


  

Let me take a quick stab at rewriting this paragraph (not sure if I
understood it correctly):

"Dirty pages are tracked when the device is in the stop-and-copy phase.
During the pre-copy phase, it is not possible to distinguish a dirty
page that has been transferred from the source to the destination from
newly dirtied pages, which would lead to repeated copying of the same
content. Therefore, pinned pages are only marked dirty during the
stop-and-copy phase." ?
  


I think above rephrase only talks about repeated copying in pre-copy
phase. Used "copied earlier until device stops" to indicate both
pre-copy and stop-and-copy till device stops.



Now I'm confused, I thought we had abandoned the idea that we can only
report pinned pages during stop-and-copy.  Doesn't the device needs to
expose its dirty memory footprint during the iterative phase regardless
of whether that causes repeat copies?  If QEMU iterates and sees that
all memory is still dirty, it may have transferred more data, but it
can actually predict if it can achieve its downtime tolerances.  Which
is more important, less data transfer or predictability?  Thanks,
   


Even if QEMU copies and transfers content of all sys mem pages during
pre-copy (worst case with IOMMU backed mdev device when its vendor
driver is not smart to pin pages explicitly and all sys mem pages are
marked dirty), then also its prediction about downtime tolerance will
not be correct, because during stop-and-copy again all pages need to be
copied as device can write to any of those pinned pages.


I think you're only reiterating my point.  If QEMU copies all of guest
memory during the iterative phase and each time it sees that all memory
is dirty, such as if CPUs or devices (including assigned devices) are
dirtying pages as fast as it copies them (or continuously marks them
dirty), then QEMU can predict that downtime will require copying all
pages. 


But as of now there is no way to know if device has dirtied pages during 
iterative phase.



If instead devices don't mark dirty pages until the VM is
stopped, then QEMU might iterate through memory copy and predict a short
downtime because not much memory is dirty, only to be surprised that
all of memory is suddenly dirty.  At that point it's too late, the VM
is already stopped, the predicted short downtime takes far longer than
expected.  This is exactly why we made the kernel interface mark pinned
pages persistently dirty when it was proposed that we only report
pinned pages once.  Thanks,



Since there is no way to know if device dirtied pages during iterative 
phase, QEMU should query pinned pages in stop-and-copy phase.


Whenever there will be hardware support or some software mechanism to 
report pages dirtied by device then we will add a capability bit in 
migration capability and based on that capability bit qemu/user space 
app should decide to query dirty pages in iterative phase.


Thanks,
Kirti

Re: VFIO Migration

2020-11-03 Thread Michael S. Tsirkin

On Mon, Nov 02, 2020 at 11:11:53AM +, Stefan Hajnoczi wrote:
> Device States
> -
> The details of the device state representation are not covered in this 
> document
> but the general requirements are discussed here.
> 
> The device state consists of data accessible through the device's hardware
> interface and internal state that is needed to restore device operation.
> State in the hardware interface includes the values of hardware registers.
> An example of internal state is an index value needed to avoid processing
> queued requests more than once.
> 
> Changes can be made to the device state representation as follows. Each change
> to device state must have a corresponding device configuration parameter that
> allows the change to toggled:
> 
> * When the parameter is disabled the hardware interface and device state
>   representation are unchanged. This allows old device states to be loaded.
> 
> * When the parameter is enabled the change comes into effect.
> 
> * The parameter's default value disables the change. Therefore old versions do
>   not have to explicitly specify the parameter.
> 
> The following example illustrates migration from an old device
> implementation to a new one. A version=1 network card is migrated to a
> new device implementation that is also capable of version=2 and adds the
> rx-filter-size=32 parameter. The new device is instantiated with
> version=1, which disables rx-filter-size and is capable of loading the
> version=1 device state. The migration completes successfully but note
> the device is still operating at version=1 level in the new device.
> 
> The following example illustrates migration from a new device
> implementation back to an older one. The new device implementation
> supports version=1 and version=2. The old device implementation supports
> version=1 only. Therefore the device can only be migrated when
> instantiated with version=1 or the equivalent full configuration
> parameters.

So all this is pretty complex and easy for vendors to get wrong.  How
about we introduce a directory under docs/interop/ where each supported
device can list the format of its state and parameters and what is tied
to what?

I am a bit unsure about the usefulness of the version shortcut.
It would be handy if all this was used directly by users
but these are unlikely to want to orchestrate cross version
migrations, and tools do not need shortcuts like these ...

-- 
MST

Re: [PATCH v2 1/3] softmmu: Do not use C99 // comments

2020-11-03 Thread chaihaoyu



Thank you for your replay. That's OK if C99 support both kinds of comment style.


> chaihaoyu  writes:
> 
>> Hi, recently I found some code style problems while using checkpatch.pl 
>> tool,please review.
>>
>> Date: Tue, 3 Nov 2020 11:01:40 +0800
>> signed-off-by: Haoyu Chai
>> ---
>>  softmmu/memory.c | 2 +-
>>  softmmu/memory_mapping.c | 2 +-
>>  softmmu/physmem.c| 2 +-
>>  3 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/softmmu/memory.c b/softmmu/memory.c
>> index 107ce0a4f8..5fb591b001 100644
>> --- a/softmmu/memory.c
>> +++ b/softmmu/memory.c
>> @@ -36,7 +36,7 @@
>>  #include "hw/boards.h"
>>  #include "migration/vmstate.h"
>>
>> -//#define DEBUG_UNASSIGNED
>> +/* #define DEBUG_UNASSIGNED */
>>
>>  static unsigned memory_region_transaction_depth;
>>  static bool memory_region_update_pending;
>> diff --git a/softmmu/memory_mapping.c b/softmmu/memory_mapping.c
>> index 18d0b8067c..f64053499e 100644
>> --- a/softmmu/memory_mapping.c
>> +++ b/softmmu/memory_mapping.c
>> @@ -19,7 +19,7 @@
>>  #include "exec/memory.h"
>>  #include "exec/address-spaces.h"
>>
>> -//#define DEBUG_GUEST_PHYS_REGION_ADD
>> +/* #define DEBUG_GUEST_PHYS_REGION_ADD */
>>
>>  static void memory_mapping_list_add_mapping_sorted(MemoryMappingList *list,
>> MemoryMapping *mapping)
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index 44ffb60b5d..78c1b6580a 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -75,7 +75,7 @@
>>  #include 
>>  #endif
>>
>> -//#define DEBUG_SUBPAGE
>> +/* #define DEBUG_SUBPAGE */
>>
>>  /* ram_list is read under rcu_read_lock()/rcu_read_unlock().  Writes
>>   * are protected by the ramlist lock.
> 
> I recommend to leave these alone.
> 
> CODING_STYLE.rst:
> 
> Rationale: The // form is valid in C99, so this is purely a matter of
> consistency of style. The checkpatch script will warn you about this.
> 
> For "real" comments, we overwhelmingly use /* */, and avoiding // makes
> sense.  Most exceptions are in code we copy from elsewhere, such as
> disas/libvixl/.
> 
> For commenting out *code*, we use both forms.  Here are the counts for
> commenting out macro definitions:
> 
> $ git-grep '^/\* *# *define' | wc -l
> 125
> $ git-grep '^// *# *define' | wc -l
> 192
> 
> .
>

Re: [PATCH v8 06/11] hw/block/nvme: Support allocated CNS command variants

2020-11-03 Thread Klaus Jensen

On Oct 30 11:32, Dmitry Fomichev wrote:
> From: Niklas Cassel 
> 
> Many CNS commands have "allocated" command variants. These include
> a namespace as long as it is allocated, that is a namespace is
> included regardless if it is active (attached) or not.
> 
> While these commands are optional (they are mandatory for controllers
> supporting the namespace attachment command), our QEMU implementation
> is more complete by actually providing support for these CNS values.
> 
> However, since our QEMU model currently does not support the namespace
> attachment command, these new allocated CNS commands will return the
> same result as the active CNS command variants.
> 
> In NVMe, a namespace is active if it exists and is attached to the
> controller.
> 
> Add a new Boolean namespace flag, "attached", to provide the most
> basic namespace attachment support. The default value for this new
> flag is true. Also, implement the logic in the new CNS values to
> include/exclude namespaces based on this new property. The only thing
> missing is hooking up the actual Namespace Attachment command opcode,
> which will allow a user to toggle the "attached" flag per namespace.
> 
> The reason for not hooking up this command completely is because the
> NVMe specification requires the namespace management command to be
> supported if the namespace attachment command is supported.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Dmitry Fomichev 
> Reviewed-by: Keith Busch 
> ---

Please rip out all the ns->attached conditionals (it's dead code).

Just add the new CNS values in the switch and let them fall through.


signature.asc
Description: PGP signature

Re: [PATCH 0/4] qga: Fix some style problems

2020-11-03 Thread Marc-André Lureau

On Mon, Oct 26, 2020 at 1:16 PM AlexChen  wrote:

> Fix some error style problems found by checkpatch.pl.
>
> alexchen (4):
>   qga: Add spaces around operator
>   qga: Delete redundant spaces
>   qga: Open brace '{' following struct go on the same
>   qga: switch and case should be at the same indent
>
>  qga/channel-win32.c  |  6 ++---
>  qga/commands-posix.c |  4 +--
>  qga/commands-win32.c | 28 ++---
>  qga/commands.c   |  4 +--
>  qga/main.c   | 59 ++--
>  5 files changed, 50 insertions(+), 51 deletions(-)
>
>
Reviewed-by: Marc-André Lureau 


-- 
Marc-André Lureau

Re: [PATCH] hw/9pfs: virtio-9p: Ensure config space is a multiple of 4 bytes

2020-11-03 Thread Bin Meng

Hi Michael,

On Tue, Nov 3, 2020 at 8:05 PM Michael S. Tsirkin  wrote:
>
> On Tue, Nov 03, 2020 at 02:26:10PM +0800, Bin Meng wrote:
> > Hi Michael,
> >
> > On Fri, Oct 30, 2020 at 5:29 PM Michael S. Tsirkin  wrote:
> > >
> > > On Thu, Oct 29, 2020 at 04:25:41PM +0800, Bin Meng wrote:
> > > > From: Bin Meng 
> > > >
> > > > At present the virtio device config space access is handled by the
> > > > virtio_config_readX() and virtio_config_writeX() APIs. They perform
> > > > a sanity check on the result of address plus size against the config
> > > > space size before the access occurs.
> > > >
> > > > For unaligned access, the last converted naturally aligned access
> > > > will fail the sanity check on 9pfs. For example, with a mount_tag
> > > > `p9fs`, if guest software tries to read the mount_tag via a 4 byte
> > > > read at the mount_tag offset which is not 4 byte aligned, the read
> > > > result will be `p9\377\377`, which is wrong.
> > > >
> > > > This changes the size of device config space to be a multiple of 4
> > > > bytes so that correct result can be returned in all circumstances.
> > > >
> > > > Signed-off-by: Bin Meng 
> > >
> > >
> > >
> > > The patch is ok, but I'd like to clarify the commit log.
> >
> > Thanks for the review.
> >
> > >
> > > If I understand correctly, what happens is:
> > > - tag is set to a value that is not a multiple of 4 bytes
> >
> > It's not about the mount_tag value, but the length of the mount_tag is 4.
> >
> > > - guest attempts to read the last 4 bytes of the tag
> >
> > Yep. So the config space of a 9pfs looks like the following:
> >
> > offset: 0x14, size: 2 bytes indicating the length of the following mount_tag
> > offset: 0x16, size: value of (offset 0x14).
> >
> > When a 4-byte mount_tag is given, guest software is subject to read 4
> > bytes (value read from offset 0x14) at offset 0x16.
>
>
> Well looking at Linux guest code:
>
>
> static inline void __virtio_cread_many(struct virtio_device *vdev,
>unsigned int offset,
>void *buf, size_t count, size_t bytes)
> {
> u32 old, gen = vdev->config->generation ?
> vdev->config->generation(vdev) : 0;
> int i;
>
> might_sleep();
> do {
> old = gen;
>
> for (i = 0; i < count; i++)
> vdev->config->get(vdev, offset + bytes * i,
>   buf + i * bytes, bytes);
>
> gen = vdev->config->generation ?
> vdev->config->generation(vdev) : 0;
> } while (gen != old);
> }
>
>
>
> static inline void virtio_cread_bytes(struct virtio_device *vdev,
>   unsigned int offset,
>   void *buf, size_t len)
> {
> __virtio_cread_many(vdev, offset, buf, len, 1);
> }
>
> and:
>
>
> virtio_cread_bytes(vdev, offsetof(struct virtio_9p_config, tag),
>tag, tag_len);
>
>
>
> So guest is doing multiple 1-byte reads.
>

Correct.

>
> Spec actually says:
> For device configuration access, the driver MUST use 8-bit wide 
> accesses for 8-bit wide fields, 16-bit wide
>
> and aligned accesses for 16-bit wide fields and 32-bit wide and 
> aligned accesses for 32-bit and 64-bit wide
>
> fields. For 64-bit fields, the driver MAY access each of the high and 
> low 32-bit parts of the field independently.
>

Yes.

> 9p was never standardized, but the linux header at least lists it as
> follows:
>
> struct virtio_9p_config {
> /* length of the tag name */
> __virtio16 tag_len;
> /* non-NULL terminated tag name */
> __u8 tag[0];
> } __attribute__((packed));
>
> In that sense tag is an 8 byte field.
>
> So which guest reads tag using a 32 bit read, and why?
>

The obvious fix can be made to the guest which exposed this issue, but
I was wondering whether we should enforce all virtio devices' config
space size to be a multiple of 4 bytes which sounds more natural.

Regards,
Bin

Re: [PATCH for-5.2 3/4] qtest: escape device name in device-introspect-test

2020-11-03 Thread Thomas Huth

On 03/11/2020 16.14, Paolo Bonzini wrote:
> device-introspect-test uses HMP, so it should escape the device name
> properly.  Because of this, a few devices that had commas in their
> names were escaping testing.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  tests/qtest/device-introspect-test.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/qtest/device-introspect-test.c 
> b/tests/qtest/device-introspect-test.c
> index 9f22340ee5..f471b0e0dd 100644
> --- a/tests/qtest/device-introspect-test.c
> +++ b/tests/qtest/device-introspect-test.c
> @@ -104,7 +104,9 @@ static QList *device_type_list(QTestState *qts, bool 
> abstract)
>  static void test_one_device(QTestState *qts, const char *type)
>  {
>  QDict *resp;
> -char *help;
> +g_autofree char *help;
> +g_autofree GRegex *comma;
> +g_autofree char *escaped;
>  
>  g_test_message("Testing device '%s'", type);
>  
> @@ -113,8 +115,9 @@ static void test_one_device(QTestState *qts, const char 
> *type)
> type);
>  qobject_unref(resp);
>  
> -help = qtest_hmp(qts, "device_add \"%s,help\"", type);
> -g_free(help);
> +comma = g_regex_new(",", 0, 0, NULL);
> +escaped = g_regex_replace_literal(comma, type, -1, 0, ",,", 0, NULL);
> +help = qtest_hmp(qts, "device_add \"%s,help\"", escaped);
>  }

Having "help =" as final statement now, this looks somewhat weird at a first
glance (until you look at the g_autofree at the beginning of the function).
Maybe it's better to drop the help variable completely and just do:
g_free(gtest_hmp(...)) ?

 Thomas

Re: Out-of-Process Device Emulation session at KVM Forum 2020

2020-11-03 Thread Michael S. Tsirkin

On Wed, Nov 04, 2020 at 07:50:52AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > > I think not. Obviously each firmware should have its own ABI no matter
> > > whether its public or proprietary. For proprietary firmware, it should
> > > be understood by the proprietary userspace counterpart.
> > 
> > Userspace does not necessarily need to interpret the contents. The
> > vendor can ship a binary blob and the driver loads the file onto the
> > device without interpreting it.
> 
> Exactly.  Neither userspace nor kernel look at the blob, except maybe
> some headers with version, size, checksum etc.  Only the device does
> something with the actual content.
> 
> Doing the same make sense for migration device state.  The kernel driver
> saves and restores the device state.  Userspace doesn't need to look at
> it.  Again, with an exception for some header fields.
> 
> So requiring userspace being able to interpret the migration data
> (except header) for all devices looks rather pointless to me.

If nothing else we need a good place where vendors can publish this
data.

> Speaking of headers: Defining a common header format makes sense.
> For standard devices (virtio, nvme, ...) it makes sense to try define
> a standard, cross-vendor migration data format.
> For vendor-specific devices (gpus for example) I absolutely don't see
> the point.
> 
> take care,
>   Gerd

Re: [PATCH for-5.2 1/4] ivshmem-test: do not use short-form boolean option

2020-11-03 Thread Thomas Huth

On 03/11/2020 16.14, Paolo Bonzini wrote:
> This QemuOpts idiom will be deprecated, so get rid of it in the tests.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  tests/qtest/ivshmem-test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/ivshmem-test.c b/tests/qtest/ivshmem-test.c
> index d5c8b9f128..dfa69424ed 100644
> --- a/tests/qtest/ivshmem-test.c
> +++ b/tests/qtest/ivshmem-test.c
> @@ -135,7 +135,7 @@ static void setup_vm_cmd(IVState *s, const char *cmd, 
> bool msix)
>  static void setup_vm(IVState *s)
>  {
>  char *cmd = g_strdup_printf("-object memory-backend-file"
> -",id=mb1,size=1M,share,mem-path=/dev/shm%s"
> +
> ",id=mb1,size=1M,share=on,mem-path=/dev/shm%s"
>  " -device ivshmem-plain,memdev=mb1", tmpshm);
>  
>  setup_vm_cmd(s, cmd, false);
> 

Reviewed-by: Thomas Huth

Re: VFIO Migration

2020-11-03 Thread Stefan Hajnoczi

On Tue, Nov 03, 2020 at 06:49:51PM +, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefa...@redhat.com) wrote:
> > On Tue, Nov 03, 2020 at 12:17:09PM +, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefa...@redhat.com) wrote:
> > > > Device Models
> > > > -
> > > > Devices have a *hardware interface* consisting of hardware registers,
> > > > interrupts, and so on.
> > > > 
> > > > The hardware interface together with the device state representation is 
> > > > called
> > > > a *device model*. Device models can be assigned URIs such as
> > > > https://qemu.org/devices/e1000e to uniquely identify them.
> > > 
> > > I think this is a unique identifier, not actually a URI; the https://
> > > isn't needed since no one expects to ever connect to this.
> > 
> > Yes, it could be any unique string. If the URI idea is not popular we
> > can use any similar scheme.
> 
> I'm OK with it being a URI; just drop the https.

Okay.

> > > > However, secondary aspects related to the physical port may affect the 
> > > > device's
> > > > hardware interface and need to be reflected in the device 
> > > > configuration. The
> > > > link speed may depend on the physical port and be reported through the 
> > > > device's
> > > > hardware interface. In that case a ``link-speed`` configuration 
> > > > parameter is
> > > > required to prevent unexpected changes to the link speed after 
> > > > migration.
> > > 
> > > That's an interesting example; because depending on the device, it might
> > > be:
> > > a) Completely virtualised so that the guest *shouldn't* know what
> > > the physical link speed is, precisely to allow the physical network on
> > > the destination to be different.
> > > 
> > > b) Part of the migrated state
> > > 
> > > c) Something that's allowed to be reloaded after migration
> > > 
> > > d) Configurable
> > > 
> > > so I'm not sure whether it's a good example in this case or not.
> > 
> > Can you think of an example that has only one option?
> > 
> > I tried but couldn't. For example take a sound card. The guest is aware
> > the device supports stereo playback (2 output channels), but which exact
> > stereo host device is used doesn't matter, they are all suitable.
> > 
> > Now imagine migrating to a 7.1 surround-sound device. Similar options
> > come into play:
> > 
> > a) Emulate stereo and mix it to 7.1 surround-sound on the physical
> >device. The guest still sees the stereo device.
> > 
> > b) Refuse migration.
> > 
> > c) Indicate that the output has switched and let the guest reconfigure
> >itself (e.g. a sound card with multiple outputs, where one of them is
> >stereo and another is 7.1 surround sound).
> > 
> > Which option is desirable depends on the use case.
> 
> Yes, but I think it might be worth calling out these differences;  there
> are explicitly cases where you don't want external changes to be visible
> and other cases where you do; both are valid, but both need thinking
> about. (Another one, GPU whether you have a monitor plugged in!)

Okay.

> > > Maybe what's needed is a stronger instruction to abstract external
> > > device state so that it's not part of the configuration in most cases.
> > 
> > Do you want to propose something?
> 
> I think something like 'Some part of a devices state may be irrelevant
> to a migration; for example on some NICs it might be preferable to hide
> the physical characteristics of the link from the guest.'

Got it.

> > > > For example, if address filtering support was added to a network card 
> > > > then
> > > > device versions and the corresponding configurations may look like this:
> > > > * ``version=1`` - Behaves as if ``rx-filter-size=0``
> > > > * ``version=2`` - ``rx-filter-size=32``
> > > 
> > > Note configuration parameters might have been added during the life of
> > > the device; e.g. if the original card had no support for rx-filters, it
> > > might not have a rx-filter-size parameter.
> > 
> > version=1 does not explicitly set rx-filter-size=0. When a new parameter
> > is introduced it must have a default value that disables its effect on
> > the hardware interface and/or device state representation. This is
> > described in a bit more detail in the next section, maybe it should be
> > reordered.
> 
> We've generally found the definition of devices tends in practice to be
> done newer->older; i.e. you define the current machine, and then define
> the next older machine setting the defaults that used to be true; then
> define the older version behind that

That is not possible here because an older device implementation is
unaware of new configuration parameters.

Looking at the example above, imagine a version=1 device is instantiated
on a device implementation that supports both version=1 and version=2.
Should the configuration parameter list for version=1 be empty or
rx-filter-size=0?

It must to be empty, otherwise an older device implementation that only
supports version=1

Re: [PATCH v2] qapi, qemu-options: make all parsing visitors parse boolean options the same

2020-11-03 Thread Markus Armbruster

Paolo Bonzini  writes:

> OptsVisitor, StringInputVisitor and the keyval visitor have
> three different ideas of how a human could write the value of
> a boolean option.  Pay homage to the backwards-compatibility
> gods and make the new common helper accept all four sets (on/off,
> true/false, y/n and yes/no), and case-insensitive at that.
>
> Since OptsVisitor is supposed to match qemu-options, adjust
> it as well.

For clarity: s/it/the latter/

> Supersedes: <20201103142344.402353-1-pbonz...@redhat.com>
> Signed-off-by: Paolo Bonzini 
> ---
>  include/qapi/util.h  |  2 ++
>  qapi/opts-visitor.c  | 14 +-
>  qapi/qapi-util.c | 23 +++
>  qapi/qobject-input-visitor.c | 15 +--
>  qapi/string-input-visitor.c  | 17 +
>  util/qemu-option.c   | 20 ++--
>  6 files changed, 34 insertions(+), 57 deletions(-)
>
> diff --git a/include/qapi/util.h b/include/qapi/util.h
> index bc312e90aa..6178e98e97 100644
> --- a/include/qapi/util.h
> +++ b/include/qapi/util.h
> @@ -19,6 +19,8 @@ typedef struct QEnumLookup {
>  const char *qapi_enum_lookup(const QEnumLookup *lookup, int val);
>  int qapi_enum_parse(const QEnumLookup *lookup, const char *buf,
>  int def, Error **errp);
> +bool qapi_bool_parse(const char *name, const char *value, bool *obj,
> + Error **errp);
>  
>  int parse_qapi_name(const char *name, bool complete);
>  
> diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c
> index 7781c23a42..587f31baf6 100644
> --- a/qapi/opts-visitor.c
> +++ b/qapi/opts-visitor.c
> @@ -368,7 +368,6 @@ opts_type_str(Visitor *v, const char *name, char **obj, 
> Error **errp)
>  }
>  
>  
> -/* mimics qemu-option.c::parse_option_bool() */
>  static bool
>  opts_type_bool(Visitor *v, const char *name, bool *obj, Error **errp)
>  {
> @@ -379,19 +378,8 @@ opts_type_bool(Visitor *v, const char *name, bool *obj, 
> Error **errp)
>  if (!opt) {
>  return false;
>  }
> -
>  if (opt->str) {
> -if (strcmp(opt->str, "on") == 0 ||
> -strcmp(opt->str, "yes") == 0 ||
> -strcmp(opt->str, "y") == 0) {
> -*obj = true;
> -} else if (strcmp(opt->str, "off") == 0 ||
> -strcmp(opt->str, "no") == 0 ||
> -strcmp(opt->str, "n") == 0) {
> -*obj = false;
> -} else {
> -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, opt->name,
> -   "on|yes|y|off|no|n");
> +if (!qapi_bool_parse(opt->name, opt->str, obj, errp)) {
>  return false;
>  }
>  } else {
> diff --git a/qapi/qapi-util.c b/qapi/qapi-util.c
> index 29a6c98b53..4dd8f6c313 100644
> --- a/qapi/qapi-util.c
> +++ b/qapi/qapi-util.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "qemu/ctype.h"
> +#include "qapi/qmp/qerror.h"
>  
>  const char *qapi_enum_lookup(const QEnumLookup *lookup, int val)
>  {
> @@ -40,6 +41,28 @@ int qapi_enum_parse(const QEnumLookup *lookup, const char 
> *buf,
>  return def;
>  }
>  
> +bool qapi_bool_parse(const char *name, const char *value, bool *obj, Error 
> **errp)
> +{
> +if (!strcasecmp(value, "on") ||
> +!strcasecmp(value, "yes") ||
> +!strcasecmp(value, "true") ||
> +!strcasecmp(value, "y")) {
> +*obj = true;
> +return true;
> +}
> +if (!strcasecmp(value, "off") ||
> +!strcasecmp(value, "no") ||
> +!strcasecmp(value, "false") ||
> +!strcasecmp(value, "n")) {
> +*obj = false;
> +return true;
> +}
> +
> +error_setg(errp, QERR_INVALID_PARAMETER_VALUE, name,
> +   "boolean (on/off, yes/no, true/false, y/n)");

Recommend to have the error message only mention the preferred form.  I
like the laconic "'on' or 'off'".  It's really all the user needs to
know.

> +return false;
> +}
> +
>  /*
>   * Parse a valid QAPI name from @str.
>   * A valid name consists of letters, digits, hyphen and underscore.
> diff --git a/qapi/qobject-input-visitor.c b/qapi/qobject-input-visitor.c
> index 7b184b50a7..f4541a4fdd 100644
> --- a/qapi/qobject-input-visitor.c
> +++ b/qapi/qobject-input-visitor.c
> @@ -512,16 +512,11 @@ static bool qobject_input_type_bool_keyval(Visitor *v, 
> const char *name,
>  return false;
>  }
>  
> -if (!strcmp(str, "on")) {
> -*obj = true;
> -} else if (!strcmp(str, "off")) {
> -*obj = false;
> -} else {
> -error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> -   full_name(qiv, name), "'on' or 'off'");
> -return false;
> -}
> -return true;
> +/*
> + * Calling full_name is a bit slow, but keyval (human/CLI) parsing
> + * is not a hot path.
> + */
> +return qapi_bool_parse(full_name(qiv, name), str, obj, errp);

Avoiding the full_name() on success isn't hard:

   if

Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-03 Thread Klaus Jensen

On Nov  3 21:37, Philippe Mathieu-Daudé wrote:
> On 11/3/20 8:48 PM, Dmitry Fomichev wrote:
> >> -Original Message-
> >> From: Philippe Mathieu-Daudé 
> ...
> >>>  typedef struct QEMU_PACKED NvmeCqe {
> >>> -uint32_tresult;
> >>> -uint32_trsvd;
> >>> +union {
> >>> +uint64_t result64;
> >>> +uint32_t result32;
> >>> +};
> >>
> >> When using packed structure you want to define all fields to
> >> avoid alignment confusion (and I'm surprised the compiler doesn't
> >> complain...). So this would be:
> >>
> >>union {
> >>uint64_t result64;
> >>struct {
> >>uint32_tresult32;
> >>uint32_trsvd32;
> >>};
> >>};
> >>

I align (hehe...) towards this. The amount of bit-juggling we need for
commands justify the need for separate NvmeCmd's, but in this case I
think an NvmeCqeZA is just unnecessary clutter. If the result value is
complex, the approach used by AERs is better I think:

   typedef struct QEMU_PACKED NvmeAerResult {
   uint8_t event_type;
   uint8_t event_info;
   uint8_t log_page;
   uint8_t resv;
   } NvmeAerResult;

   NvmeAerResult *result = (NvmeAerResult *)>cqe.result32;

Since storing the Zone Append ALBA in the result64 isn't really a
complex operation, let's just assign it into that member directly.

(Addendum) That DW1 is command specific and no longer reserved is
defined by TP 4056 (Namespace Types) - not v1.4 or any of its revisions.

signature.asc
Description: PGP signature

Re: [Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-03 Thread Zheng Chuan

I think i've got what Daniel point in another maillist about this problem.

This is exactly due to Blocking I/O issue of TLS handshake.

Src: (multifd_send_0)   
Dst: (multifd_recv_1)
multifd_channel_connect 
migration_channel_process_incoming
multifd_tls_channel_connect 
   migration_tls_channel_process_incoming
   multifd_tls_channel_connect  
   qio_channel_tls_handshake_task
   qio_channel_tls_handshake
 gnutls_handshake
 qio_channel_tls_handshake_task 
  ...
 qcrypto_tls_session_handshake  
  ...
  gnutls_handshake  
  ...
 ...
  ...
   recvmsg (Blocking I/O waiting for response)  
 recvmsg (Blocking I/O waiting for response)

Here is how hang up happens.
The Src multifd_send_0 invokes tls handshake, it sends hello to sever and wait 
response.
However, the Dst main qemu loop has been waiting recvmsg() for multifd_recv_1.
Both of Src and Dst main qemu loop are blocking and waiting for reponse which 
results in hang forever.

I have verified it through gdb that shows they are belong to different TLS 
handshake socket on Src and Dst.

So to solve this problem, one method maybe is that
we need to extract multifd_channel_connect() from 
multifd_new_send_channel_async as a qio task, which could
offload tls handshake to the thread other than qemu main loop？


On 2020/11/3 13:52, Zheng Chuan wrote:
> 
> 
> On 2020/11/3 4:16, Dr. David Alan Gilbert wrote:
>> * zhengchuan (zhengch...@huawei.com) wrote:
>>> Anyone who could help this would be appreciated since we have stuck for 
>>> three days:(
>>>
>>> IIUC, the client (Src) has sent first hello message to sever(Dst), however 
>>> due to something happened while restarted libvirtd,
>>> The messages is lost, and both of them are waiting which leading to hang 
>>> forever, but I could find out how for now.
>>
>> If you need to un-break things, I suggest killing the destination might
>> free it; but I'm not sure.
>>
> Hi, Dave.
> Unfortunately, no. After killing the destination, it left Src main migration 
> thread stuck at multifd_send_sync_main().
> 
>> An interesting question is if we can make migration-cancel work in this
>> case.
>>
>> Dave
>>
> Bad thing happened, since the main qemu thread is stuck at recvmsg(), qemu 
> could not respond for libvirt qmp_migrate_cancel:(
> 
> During the time, I also found another question is that the Dst socket 
> connections are not closed after migration-cancel,
> multifd channel would be left with status of CLOSE-WAIT if we look at them 
> though 'ss' command.
> 
> This is because the multifd_save_cleanup() is simply call 
> socket_send_channel_destroy and unref the ioc other than calling
> qio_channel_shutdown() in multifd_recv_terminate_threads(), It is not working 
> for tls channel.
> Simply working around by adding qio_channel_shutdown like this
> for (i = 0; i < migrate_multifd_channels(); i++) {
> MultiFDSendParams *p = _send_state->params[i];
> 
> +   qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> socket_send_channel_destroy(p->c);
> }
> The residual socket is closed, but i doubt if it is the correct solution...
> 
> Back to the problem described in this issue, it is still not resolved after 
> this working around, but i think it is also a similiar
> cleanup issue, and i will dig it out more further...
> 
> 
>>> -Original Message-
>>> From: Qemu-devel 
>>> [mailto:qemu-devel-bounces+zhengchuan=huawei@nongnu.org] On Behalf Of 
>>> Yan Jin
>>> Sent: 2020年11月2日 11:12
>>> To: qemu-devel@nongnu.org
>>> Subject: [Bug 1902470] Re: migration with TLS-MultiFD is stuck when the 
>>> dst-libvirtd service restarts
>>>
>>> ** Description changed:
>>>
>>>   hi,
>>>   
>>>   I found that the multi-channel TLS-handshake will be stuck when the dst-
>>>   libvirtd restarts, both the src and dst sockets are blocked in recvmsg.
>>>   In the meantime, live_migration thread is blocked in
>>>   multifd_send_sync_main, so migration cannot be cancelled though src-
>>>   libvirt has delivered the QMP command.
>>>   
>>>   Is there any way to exit migration when the multi-channel TLS-handshake
>>> - is stuck? Does setting TLS handshake timeout function take effect?
>>> + is stuck? Does setting TLS-handshake timeout function take effect?
>>>   
>>>   The stack trace are as follows:
>>>   
>>>   =src

Re: VFIO Migration

2020-11-03 Thread Stefan Hajnoczi

On Wed, Nov 04, 2020 at 11:32:34AM +0800, Jason Wang wrote:
> 
> On 2020/11/3 下午8:15, Stefan Hajnoczi wrote:
> > On Tue, Nov 03, 2020 at 04:46:53PM +0800, Jason Wang wrote:
> > > On 2020/11/2 下午7:11, Stefan Hajnoczi wrote:
> > > > There is discussion about VFIO migration in the "Re: Out-of-Process
> > > > Device Emulation session at KVM Forum 2020" thread. The current status
> > > > is that Kirti proposed a VFIO device region type for saving and loading
> > > > device state. There is currently no guidance on migrating between
> > > > different device versions or device implementations from different
> > > > vendors. This is known to be non-trivial and raised discussion about
> > > > whether it should really be handled by VFIO or centralized in QEMU.
> > > > 
> > > > Below is a document that describes how to ensure migration compatibility
> > > > in VFIO. It does not require changes to the VFIO migration interface. It
> > > > can be used for both VFIO/mdev kernel devices and vfio-user devices.
> > > > 
> > > > The idea is that the device state blob is opaque to the VMM but the same
> > > > level of migration compatibility that exists today is still available.
> > > 
> > > So if we can't mandate this or there's no way to validate this. Vendor is
> > > still free to implement their own protocol which could lead a lot of
> > > maintaining burden.
> > Yes, the device state representation is their responsibility. We can't
> > do that for them since they define the hardware interface and internal
> > state.
> > 
> > As Michael and Paolo have mentioned in the other thread, we can provide
> > guidelines and standardize common aspects.
> > 
> > > > Migration can fail if loading the device state is not possible. It 
> > > > should fail
> > > > early with a clear error message. It must not appear to complete but 
> > > > leave the
> > > > device inoperable due to a migration problem.
> > > 
> > > For VFIO-user, how management know that a VM can be migrated from src to
> > > dst? For kernel, we have sysfs.
> > vfio-user devices will normally be instantiated in one of two ways:
> > 
> > 1. Launching a device backend and passing command-line parameters:
> > 
> >   $ my-nic --socket-path /tmp/my-nic-vfio-user.sock \
> >--model https://vendor-a.com/my-nic \
> >   --rss on
> > 
> > Here "model" is the device model URL. The program could support
> > multiple device models.
> > 
> > The "rss" device configuration parameter enables Receive Side Scaling
> > (RSS) as an example of a configuration parameter.
> > 
> > 2. Creating a device using an RPC interface:
> > 
> >   (qemu) device-add my-nic,rss=on
> > 
> > If the device instantiation succeeds then it is safe to live migrate.
> > The device is exposing the desired hardware interface and expecting the
> > right device state representation.
> 
> 
> Does this mean there will still be a "my-nic" stub in qemu? (I thought it
> should be a generic one like device-add "vfio-user-pci")

No, sorry for the confusing example. I was thinking of
qemu-storage-daemon or multi-process QEMU where devices could be
configured over a QMP/HMP monitor. The device happens to be implemented
in the QEMU codebase but the VMM doesn't need a stub device.

A D-Bus or gRPC example would have been clearer because it's not
associated with a VMM.

> > 
> > > > The rest of this document describes how these requirements can be met.
> > > > 
> > > > Device Models
> > > > -
> > > > Devices have a *hardware interface* consisting of hardware registers,
> > > > interrupts, and so on.
> > > > 
> > > > The hardware interface together with the device state representation is 
> > > > called
> > > > a *device model*. Device models can be assigned URIs such as
> > > > https://qemu.org/devices/e1000e to uniquely identify them.
> > > 
> > > It looks worse than "pci://vendor_id.device_id.subvendor_id.subdevice_id".
> > > "e1000e" means a lot of different 8275X implementations that have subtle 
> > > but
> > > easy to be ignored differences.
> > If you wish to reflect those differences in the device model URI then
> > you can use:
> > 
> >
> > https://qemu.org/devices/pci
> > 
> > Another option is to use device configuration parameters to express
> > differences.
> > 
> > The important thing is that this device model URI has one owner. No one
> > else will use qemu.org. There can be many different e1000e device model
> > URIs, if necessary (with slightly different hardware interfaces and/or
> > device state representations). This avoids collisions.
> > 
> > > And is it possible to have a list of URIs here?
> > A device implementation (mdev driver, vfio-user device backend, etc) may
> > support multiple device model URIs.
> > 
> > A device instance has an immutable device model URI and list of
> > configuration parameters. In other words, once the device is created its
> > ABI is fixed for the lifetime of the device. A new device instance can
> > be

Re: [PATCH] target/openrisc: Remove dead code attempting to check "is timer disabled"

2020-11-03 Thread Stafford Horne

On Tue, Nov 03, 2020 at 11:46:54AM +, Peter Maydell wrote:
> In the mtspr helper we attempt to check for "is the timer disabled"
> with "if (env->ttmr & TIMER_NONE)".  This is wrong because TIMER_NONE
> is zero and the condition is always false (Coverity complains about
> the dead code.)
> 
> The correct check would be to test whether the TTMR_M field in the
> register is equal to TIMER_NONE instead.  However, the
> cpu_openrisc_timer_update() function checks whether the timer is
> enabled (it looks at cpu->env.is_counting, which is set to 0 via
> cpu_openrisc_count_stop() when the TTMR_M field is set to
> TIMER_NONE), so there's no need to check for "timer disabled" in the
> target/openrisc code.  Instead, simply remove the dead code.

Thanks for the patch!

I think the check is needed, but it's coded wrong.  The ttmr bits 31,30
are the timer mode.  If they are equal to zero (TIMER_NONE) then it means
the timer is disabled and we can skip the timer update.

The code should be something like ((env->ttmr >> 30) == TIMER_NONE). But I
haven't tested it.

I may not have time to look at this, in the next few days so if you want to just
remove the code I am fine with that.  It seems to be working fine as is anyway.

Also, we almost always have timers running in my workloads so the optimization
doesn't do much.

-Stafford

> Fixes: Coverity CID 1005812
> Signed-off-by: Peter Maydell 
> ---
>  target/openrisc/sys_helper.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/target/openrisc/sys_helper.c b/target/openrisc/sys_helper.c
> index d9fe6c59489..41390d046f6 100644
> --- a/target/openrisc/sys_helper.c
> +++ b/target/openrisc/sys_helper.c
> @@ -176,9 +176,6 @@ void HELPER(mtspr)(CPUOpenRISCState *env, target_ulong 
> spr, target_ulong rb)
>  
>  case TO_SPR(10, 1): /* TTCR */
>  cpu_openrisc_count_set(cpu, rb);
> -if (env->ttmr & TIMER_NONE) {
> -return;
> -}
>  cpu_openrisc_timer_update(cpu);
>  break;
>  #endif
> -- 
> 2.20.1
>

Re: Out-of-Process Device Emulation session at KVM Forum 2020

2020-11-03 Thread Gerd Hoffmann

  Hi,

> > I think not. Obviously each firmware should have its own ABI no matter
> > whether its public or proprietary. For proprietary firmware, it should
> > be understood by the proprietary userspace counterpart.
> 
> Userspace does not necessarily need to interpret the contents. The
> vendor can ship a binary blob and the driver loads the file onto the
> device without interpreting it.

Exactly.  Neither userspace nor kernel look at the blob, except maybe
some headers with version, size, checksum etc.  Only the device does
something with the actual content.

Doing the same make sense for migration device state.  The kernel driver
saves and restores the device state.  Userspace doesn't need to look at
it.  Again, with an exception for some header fields.

So requiring userspace being able to interpret the migration data
(except header) for all devices looks rather pointless to me.

Speaking of headers: Defining a common header format makes sense.
For standard devices (virtio, nvme, ...) it makes sense to try define
a standard, cross-vendor migration data format.
For vendor-specific devices (gpus for example) I absolutely don't see
the point.

take care,
  Gerd

Debugging with rr

2020-11-03 Thread Sai Pavan Boddu

Hi,

I tired debugging QEMU with rr version 4.1.0, on ubuntu 16.04. And I see below 
errors, any suggestions on the issue would be helpful

rr: /build/rr-jR8ti5/rr-4.1.0/src/Event.h:304: SyscallEvent& Event::Syscall(): 
Assertion `is_syscall_event()' failed.

Regards,
Sai Pavan

Re: [PATCH-for-5.2 2/3] gitlab-ci: Add a job to cover the --without-default-devices config

2020-11-03 Thread Thomas Huth

On 03/11/2020 21.41, Philippe Mathieu-Daudé wrote:
> On 11/3/20 7:43 PM, Thomas Huth wrote:
>> On 03/11/2020 17.46, Philippe Mathieu-Daudé wrote:
[...]
>>> diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
>>> index 3b15ae5c302..6ee098ec53c 100644
>>> --- a/.gitlab-ci.yml
>>> +++ b/.gitlab-ci.yml
>>> @@ -262,6 +262,17 @@ build-user-plugins:
>>>  MAKE_CHECK_ARGS: check-tcg
>>>timeout: 1h 30m
>>>  
>>> +build-system-ubuntu-without-default-devices:
>>> +  <<: *native_build_job_definition
>>> +  variables:
>>> +IMAGE: ubuntu2004
>>> +CONFIGURE_ARGS: --without-default-devices --disable-user --disable-xen 
>>> --disable-tools --disable-docs
>>> +MAKE_CHECK_ARGS: check-build
>>
>> AFAIK "check-build" is pretty much a no-op since the convertion to meson ...
>> could you maybe replace with a set of qtest targets that work, to make sure
>> that we do not regress here? E.g.:
>>
>> MAKE_CHECK_ARGS: check-qtest-avr check-qtestcris check-qtest-m68k
>> check-qtest-microblaze check-qtest-mipsel check-qtest-moxie ...
> 
> qtests don't work with --without-default-devices, as we don't check
> for (un-)available devices.

Sure, "make check-qtest" does not work, I know. But some targets work fine,
e.g. "make check-qtest-avr", and that's what I've suggested. By testing
those targets, we can make sure that the qtests don't regress any further
with --without-default-devices.

> I'll try check-unit.

I think that does not have much benefit since it should be independent of
the devices and is tested in the other pipelines already?

 Thomas

Re: [PATCH-for-5.2 2/3] gitlab-ci: Add a job to cover the --without-default-devices config

2020-11-03 Thread Thomas Huth

On 04/11/2020 03.27, Stefano Stabellini wrote:
[...]
> Actually I care about Xen and 9pfs support, it is one of the few
> combinations that I use regularly and it is even enabled in the Xilinx
> product I look after. But admittedly I don't test QEMU master as much as
> I should. With the recent changes to the build system it is not very
> suprising that there are some issues. It would be great to have a Xen
> and 9pfs test in the gitlab CI-loop.
> 
> 
> FYI I tried to build the latest QEMU on Alpine Linux 3.12 ARM64 and I
> get:
> 
>   ninja: unknown tool 'query'
> 
> Even after rebuilding ninja master by hand. Any ideas? I don't know much
> about ninja.
> 
> 
> So I gave up on that and I spinned up a Debian Buster x86 container for
> this build. That one got past the "ninja: unknown tool 'query'" error.
> The build completed without problems to the end.
> 
> Either way I can't reproduce the build error above.

Did you run "configure" with "--without-default-devices" ?

 Thomas

RE: [PATCH RESEND v2 0/7] some memleak trivial patchs

2020-11-03 Thread Chenqun (kuhn)

Ping!

Hi Laurent,
  These patches have been reviewed by many people 2 months ago. 
Could you help add them to the trivial branch?

 Pan Nengyuan (6):
   qga/channel-posix: Plug memory leak in ga_channel_write_all()
   elf2dmp/qemu_elf: Plug memleak in QEMU_Elf_init
   elf2dmp/pdb: Plug memleak in pdb_init_from_file
   migration/colo: Plug memleaks in colo_process_incoming_thread
   blockdev: Fix a memleak in drive_backup_prepare()
   block/file-posix: fix a possible undefined behavior

Thanks，
Chen Qun

> -Original Message-
> From: Chenqun (kuhn)
> Sent: Friday, October 30, 2020 6:23 PM
> To: qemu-devel@nongnu.org; qemu-triv...@nongnu.org
> Cc: Pannengyuan ; lviv...@redhat.com;
> Zhanghailiang ; ganqixin
> 
> Subject: RE: [PATCH RESEND v2 0/7] some memleak trivial patchs
> 
> Ping!
> 
> Hi all,
>   The patche2 ~7 have been reviewed for some time.
> Could someone please pick it up？
> 
> Thanks,
> Chen Qun
> 
> > -Original Message-
> > From: Chenqun (kuhn)
> > Sent: Friday, October 23, 2020 2:12 PM
> > To: qemu-devel@nongnu.org; qemu-triv...@nongnu.org
> > Cc: Pannengyuan ; lviv...@redhat.com;
> > Zhanghailiang ; ganqixin
> > ; Chenqun (kuhn) 
> > Subject: [PATCH RESEND v2 0/7] some memleak trivial patchs
> >
> > Hi all,
> >
> >   Here are some memory leak patches reported by EulerRobot.
> > Some patch submissions have been unattended for a while and I resend them.
> >
> > Thanks,
> > Chen Qun
> >
> >
> > Chen Qun (1):
> >   tests/migration: fix memleak in wait_command/wait_command_fd
> >
> > Pan Nengyuan (6):
> >   qga/channel-posix: Plug memory leak in ga_channel_write_all()
> >   elf2dmp/qemu_elf: Plug memleak in QEMU_Elf_init
> >   elf2dmp/pdb: Plug memleak in pdb_init_from_file
> >   migration/colo: Plug memleaks in colo_process_incoming_thread
> >   blockdev: Fix a memleak in drive_backup_prepare()
> >   block/file-posix: fix a possible undefined behavior
> >
> >  block/file-posix.c  |  2 +-
> >  blockdev.c  |  1 +
> >  contrib/elf2dmp/pdb.c   |  1 +
> >  contrib/elf2dmp/qemu_elf.c  |  1 +
> >  migration/colo.c|  5 -
> >  qga/channel-posix.c |  6 +-
> >  tests/qtest/migration-helpers.c | 16 
> >  7 files changed, 25 insertions(+), 7 deletions(-)
> >
> > --
> > 2.23.0

[PULL v2 37/38] vhost-user-blk-test: drop unused return value

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

The sock_path return value was unused and bogus (it doesn't make sense
when there are multiple drives because only the last path is arbitrarily
returned).

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-12-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/vhost-user-blk-test.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/vhost-user-blk-test.c 
b/tests/qtest/vhost-user-blk-test.c
index 15daf8ccbc..0d056cc189 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -705,8 +705,8 @@ static void quit_storage_daemon(void *qmp_test_state)
 g_free(qmp_test_state);
 }
 
-static char *start_vhost_user_blk(GString *cmd_line, int vus_instances,
-  int num_queues)
+static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
+ int num_queues)
 {
 const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
 int fd, qmp_fd, i;
@@ -774,7 +774,6 @@ static char *start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
 g_test_queue_destroy(quit_storage_daemon, qmp_test_state);
 
 qobject_unref(qtest_qmp(qmp_test_state, "{'execute': 
'qmp_capabilities'}"));
-return sock_path;
 }
 
 static void *vhost_user_blk_test_setup(GString *cmd_line, void *arg)
-- 
MST

[PULL v2 10/38] hw/acpi : add spaces around operator

2020-11-03 Thread Michael S. Tsirkin

From: Xinhao Zhang 

Fix code style. Operator needs spaces both sides.

Signed-off-by: Xinhao Zhang 
Signed-off-by: Kai Deng 
Message-Id: <20201103102634.273021-3-zhangxinh...@huawei.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/acpi/pcihp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 32ae8b2c0a..17c32e0ffd 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -400,7 +400,7 @@ void acpi_pcihp_init(Object *owner, AcpiPciHpState *s, 
PCIBus *root_bus,
 s->io_len = ACPI_PCIHP_SIZE;
 s->io_base = ACPI_PCIHP_ADDR;
 
-s->root= root_bus;
+s->root = root_bus;
 s->legacy_piix = !bridges_enabled;
 
 memory_region_init_io(>io, owner, _pcihp_io_ops, s,
-- 
MST

[PULL v2 31/38] contrib/vhost-user-blk: fix get_config() information leak

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

Refuse get_config() in excess of sizeof(struct virtio_blk_config).

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-6-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 contrib/vhost-user-blk/vhost-user-blk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c 
b/contrib/vhost-user-blk/vhost-user-blk.c
index 25eccd02b5..caad88637e 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -404,6 +404,8 @@ vub_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
 VugDev *gdev;
 VubDev *vdev_blk;
 
+g_return_val_if_fail(len <= sizeof(struct virtio_blk_config), -1);
+
 gdev = container_of(vu_dev, VugDev, parent);
 vdev_blk = container_of(gdev, VubDev, parent);
 memcpy(config, _blk->blkcfg, len);
-- 
MST

[PULL v2 16/38] virtio-iommu: Call memory notifiers in attach/detach

2020-11-03 Thread Michael S. Tsirkin

From: Bharat Bhushan 

Call the memory notifiers when attaching an endpoint to a domain, to
replay existing mappings, and when detaching the endpoint, to remove all
mappings.

Signed-off-by: Bharat Bhushan 
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20201030180510.747225-5-jean-phili...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-iommu.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 7dd15c5eac..7b64892351 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -170,11 +170,39 @@ static void virtio_iommu_notify_unmap(IOMMUMemoryRegion 
*mr, hwaddr virt_start,
 memory_region_notify_iommu(mr, 0, entry);
 }
 
+static gboolean virtio_iommu_notify_unmap_cb(gpointer key, gpointer value,
+ gpointer data)
+{
+VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
+IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
+
+virtio_iommu_notify_unmap(mr, interval->low, interval->high);
+
+return false;
+}
+
+static gboolean virtio_iommu_notify_map_cb(gpointer key, gpointer value,
+   gpointer data)
+{
+VirtIOIOMMUMapping *mapping = (VirtIOIOMMUMapping *) value;
+VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
+IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
+
+virtio_iommu_notify_map(mr, interval->low, interval->high,
+mapping->phys_addr, mapping->flags);
+
+return false;
+}
+
 static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
 {
+VirtIOIOMMUDomain *domain = ep->domain;
+
 if (!ep->domain) {
 return;
 }
+g_tree_foreach(domain->mappings, virtio_iommu_notify_unmap_cb,
+   ep->iommu_mr);
 QLIST_REMOVE(ep, next);
 ep->domain = NULL;
 }
@@ -317,6 +345,10 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
 
 ep->domain = domain;
 
+/* Replay domain mappings on the associated memory region */
+g_tree_foreach(domain->mappings, virtio_iommu_notify_map_cb,
+   ep->iommu_mr);
+
 return VIRTIO_IOMMU_S_OK;
 }
 
-- 
MST

Re: [PATCH v3 2/7] target/riscv: Add a virtualised MMU Mode

2020-11-03 Thread Alistair Francis

On Tue, Nov 3, 2020 at 12:20 PM Richard Henderson
 wrote:
>
> On 11/3/20 11:50 AM, Alistair Francis wrote:
> > @@ -30,6 +30,10 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
> >  #ifdef CONFIG_USER_ONLY
> >  return 0;
> >  #else
> > +if (riscv_cpu_virt_enabled(env)) {
> > +return env->priv | TB_FLAGS_PRIV_HYP_ACCESS_MASK;
> > +}
>
> Still setting this bit here, incorrectly.

Argh! I must have dreamed fixing it. Sending a new version now.

Alistair

>
>
> r~

[PATCH v4 4/5] target/riscv: Remove the hyp load and store functions

2020-11-03 Thread Alistair Francis

Remove the special Virtulisation load and store functions and just use
the standard tcg tcg_gen_qemu_ld_tl() and tcg_gen_qemu_st_tl() functions
instead.

As part of this change we ensure we still run an access check to make
sure we can perform the operations.

Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h  |  12 +++
 target/riscv/helper.h   |   2 -
 target/riscv/op_helper.c|  86 -
 target/riscv/translate.c|   2 +
 target/riscv/insn_trans/trans_rvh.c.inc | 123 +---
 5 files changed, 59 insertions(+), 166 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0cf48a1521..c0a326c843 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -375,6 +375,8 @@ FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
 FIELD(TB_FLAGS, LMUL, 3, 2)
 FIELD(TB_FLAGS, SEW, 5, 3)
 FIELD(TB_FLAGS, VILL, 8, 1)
+/* Is a Hypervisor instruction load/store allowed? */
+FIELD(TB_FLAGS, HLSX, 9, 1)
 
 /*
  * A simplification for VLMAX
@@ -421,7 +423,17 @@ static inline void cpu_get_tb_cpu_state(CPURISCVState 
*env, target_ulong *pc,
 if (riscv_cpu_fp_enabled(env)) {
 flags |= env->mstatus & MSTATUS_FS;
 }
+
+if (riscv_has_ext(env, RVH)) {
+if (env->priv == PRV_M ||
+(env->priv == PRV_S && !riscv_cpu_virt_enabled(env)) ||
+(env->priv == PRV_U && !riscv_cpu_virt_enabled(env) &&
+get_field(env->hstatus, HSTATUS_HU))) {
+flags = FIELD_DP32(flags, TB_FLAGS, HLSX, 1);
+}
+}
 #endif
+
 *pflags = flags;
 }
 
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4b690147fb..ee35311052 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -81,8 +81,6 @@ DEF_HELPER_1(tlb_flush, void, env)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_1(hyp_tlb_flush, void, env)
 DEF_HELPER_1(hyp_gvma_tlb_flush, void, env)
-DEF_HELPER_4(hyp_load, tl, env, tl, tl, tl)
-DEF_HELPER_5(hyp_store, void, env, tl, tl, tl, tl)
 DEF_HELPER_4(hyp_x_load, tl, env, tl, tl, tl)
 #endif
 
diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
index 5759850e69..980d4f39e1 100644
--- a/target/riscv/op_helper.c
+++ b/target/riscv/op_helper.c
@@ -227,92 +227,6 @@ void helper_hyp_gvma_tlb_flush(CPURISCVState *env)
 helper_hyp_tlb_flush(env);
 }
 
-target_ulong helper_hyp_load(CPURISCVState *env, target_ulong address,
- target_ulong attrs, target_ulong memop)
-{
-if (env->priv == PRV_M ||
-(env->priv == PRV_S && !riscv_cpu_virt_enabled(env)) ||
-(env->priv == PRV_U && !riscv_cpu_virt_enabled(env) &&
-get_field(env->hstatus, HSTATUS_HU))) {
-target_ulong pte;
-int mmu_idx = cpu_mmu_index(env, false) | 
TB_FLAGS_PRIV_HYP_ACCESS_MASK;
-
-switch (memop) {
-case MO_SB:
-pte = cpu_ldsb_mmuidx_ra(env, address, mmu_idx, GETPC());
-break;
-case MO_UB:
-pte = cpu_ldub_mmuidx_ra(env, address, mmu_idx, GETPC());
-break;
-case MO_TESW:
-pte = cpu_ldsw_mmuidx_ra(env, address, mmu_idx, GETPC());
-break;
-case MO_TEUW:
-pte = cpu_lduw_mmuidx_ra(env, address, mmu_idx, GETPC());
-break;
-case MO_TESL:
-pte = cpu_ldl_mmuidx_ra(env, address, mmu_idx, GETPC());
-break;
-case MO_TEUL:
-pte = cpu_ldl_mmuidx_ra(env, address, mmu_idx, GETPC());
-break;
-case MO_TEQ:
-pte = cpu_ldq_mmuidx_ra(env, address, mmu_idx, GETPC());
-break;
-default:
-g_assert_not_reached();
-}
-
-return pte;
-}
-
-if (riscv_cpu_virt_enabled(env)) {
-riscv_raise_exception(env, RISCV_EXCP_VIRT_INSTRUCTION_FAULT, GETPC());
-} else {
-riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
-}
-return 0;
-}
-
-void helper_hyp_store(CPURISCVState *env, target_ulong address,
-  target_ulong val, target_ulong attrs, target_ulong memop)
-{
-if (env->priv == PRV_M ||
-(env->priv == PRV_S && !riscv_cpu_virt_enabled(env)) ||
-(env->priv == PRV_U && !riscv_cpu_virt_enabled(env) &&
-get_field(env->hstatus, HSTATUS_HU))) {
-int mmu_idx = cpu_mmu_index(env, false) | 
TB_FLAGS_PRIV_HYP_ACCESS_MASK;
-
-switch (memop) {
-case MO_SB:
-case MO_UB:
-cpu_stb_mmuidx_ra(env, address, val, mmu_idx, GETPC());
-break;
-case MO_TESW:
-case MO_TEUW:
-cpu_stw_mmuidx_ra(env, address, val, mmu_idx, GETPC());
-break;
-case MO_TESL:
-case MO_TEUL:
-cpu_stl_mmuidx_ra(env, address, val, mmu_idx, GETPC());
-break;
-case MO_TEQ:
-cpu_stq_mmuidx_ra(env, address, val, mmu_idx, GETPC());
-break;
-default:
-

[PULL v2 33/38] tests/qtest: add multi-queue test case to vhost-user-blk-test

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

Signed-off-by: Stefan Hajnoczi 
Message-id: 20201001144604.559733-3-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-8-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/vhost-user-blk-test.c | 81 +--
 1 file changed, 76 insertions(+), 5 deletions(-)

diff --git a/tests/qtest/vhost-user-blk-test.c 
b/tests/qtest/vhost-user-blk-test.c
index e7e44f9bf0..31f2335f97 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -559,6 +559,67 @@ static void pci_hotplug(void *obj, void *data, 
QGuestAllocator *t_alloc)
 qpci_unplug_acpi_device_test(qts, "drv1", PCI_SLOT_HP);
 }
 
+static void multiqueue(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+QVirtioPCIDevice *pdev1 = obj;
+QVirtioDevice *dev1 = >vdev;
+QVirtioPCIDevice *pdev8;
+QVirtioDevice *dev8;
+QTestState *qts = pdev1->pdev->bus->qts;
+uint64_t features;
+uint16_t num_queues;
+
+/*
+ * The primary device has 1 queue and VIRTIO_BLK_F_MQ is not enabled. The
+ * VIRTIO specification allows VIRTIO_BLK_F_MQ to be enabled when there is
+ * only 1 virtqueue, but --device vhost-user-blk-pci doesn't do this (which
+ * is also spec-compliant).
+ */
+features = qvirtio_get_features(dev1);
+g_assert_cmpint(features & (1u << VIRTIO_BLK_F_MQ), ==, 0);
+features = features & ~(QVIRTIO_F_BAD_FEATURE |
+(1u << VIRTIO_RING_F_INDIRECT_DESC) |
+(1u << VIRTIO_F_NOTIFY_ON_EMPTY) |
+(1u << VIRTIO_BLK_F_SCSI));
+qvirtio_set_features(dev1, features);
+
+/* Hotplug a secondary device with 8 queues */
+qtest_qmp_device_add(qts, "vhost-user-blk-pci", "drv1",
+ "{'addr': %s, 'chardev': 'char2', 'num-queues': 8}",
+ stringify(PCI_SLOT_HP) ".0");
+
+pdev8 = virtio_pci_new(pdev1->pdev->bus,
+   &(QPCIAddress) {
+   .devfn = QPCI_DEVFN(PCI_SLOT_HP, 0)
+   });
+g_assert_nonnull(pdev8);
+g_assert_cmpint(pdev8->vdev.device_type, ==, VIRTIO_ID_BLOCK);
+
+qos_object_start_hw(>obj);
+
+dev8 = >vdev;
+features = qvirtio_get_features(dev8);
+g_assert_cmpint(features & (1u << VIRTIO_BLK_F_MQ),
+==,
+(1u << VIRTIO_BLK_F_MQ));
+features = features & ~(QVIRTIO_F_BAD_FEATURE |
+(1u << VIRTIO_RING_F_INDIRECT_DESC) |
+(1u << VIRTIO_F_NOTIFY_ON_EMPTY) |
+(1u << VIRTIO_BLK_F_SCSI) |
+(1u << VIRTIO_BLK_F_MQ));
+qvirtio_set_features(dev8, features);
+
+num_queues = qvirtio_config_readw(dev8,
+offsetof(struct virtio_blk_config, num_queues));
+g_assert_cmpint(num_queues, ==, 8);
+
+qvirtio_pci_device_disable(pdev8);
+qos_object_destroy(>obj);
+
+/* unplug secondary disk */
+qpci_unplug_acpi_device_test(qts, "drv1", PCI_SLOT_HP);
+}
+
 /*
  * Check that setting the vring addr on a non-existent virtqueue does
  * not crash.
@@ -643,7 +704,8 @@ static void quit_storage_daemon(void *qmp_test_state)
 g_free(qmp_test_state);
 }
 
-static char *start_vhost_user_blk(GString *cmd_line, int vus_instances)
+static char *start_vhost_user_blk(GString *cmd_line, int vus_instances,
+  int num_queues)
 {
 const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
 int fd, qmp_fd, i;
@@ -675,8 +737,8 @@ static char *start_vhost_user_blk(GString *cmd_line, int 
vus_instances)
 g_string_append_printf(storage_daemon_command,
 "--blockdev driver=file,node-name=disk%d,filename=%s "
 "--export 
type=vhost-user-blk,id=disk%d,addr.type=unix,addr.path=%s,"
-"node-name=disk%i,writable=on ",
-i, img_path, i, sock_path, i);
+"node-name=disk%i,writable=on,num-queues=%d ",
+i, img_path, i, sock_path, i, num_queues);
 
 g_string_append_printf(cmd_line, "-chardev socket,id=char%d,path=%s ",
i + 1, sock_path);
@@ -705,7 +767,7 @@ static char *start_vhost_user_blk(GString *cmd_line, int 
vus_instances)
 
 static void *vhost_user_blk_test_setup(GString *cmd_line, void *arg)
 {
-start_vhost_user_blk(cmd_line, 1);
+start_vhost_user_blk(cmd_line, 1, 1);
 return arg;
 }
 
@@ -719,7 +781,13 @@ static void *vhost_user_blk_test_setup(GString *cmd_line, 
void *arg)
 static void *vhost_user_blk_hotplug_test_setup(GString *cmd_line, void *arg)
 {
 /* "-chardev socket,id=char2" is used for pci_hotplug*/
-start_vhost_user_blk(cmd_line, 2);
+start_vhost_user_blk(cmd_line, 2, 1);
+return arg;
+}
+
+static void *vhost_user_blk_multiqueue_test_setup(GString *cmd_line, void

[PULL v2 28/38] configure: introduce --enable-vhost-user-blk-server

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

Make it possible to compile out the vhost-user-blk server. It is enabled
by default on Linux.

Note that vhost-user-server.c depends on libvhost-user, which requires
CONFIG_LINUX. The CONFIG_VHOST_USER dependency was erroneous since that
option controls vhost-user frontends (previously known as "master") and
not device backends (previously known as "slave").

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-3-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 configure| 15 +++
 block/export/export.c|  4 ++--
 block/export/meson.build |  2 +-
 util/meson.build |  2 +-
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index 2c3c69f118..b5e8f5f72c 100755
--- a/configure
+++ b/configure
@@ -329,6 +329,7 @@ vhost_crypto=""
 vhost_scsi=""
 vhost_vsock=""
 vhost_user=""
+vhost_user_blk_server=""
 vhost_user_fs=""
 kvm="auto"
 hax="auto"
@@ -1246,6 +1247,10 @@ for opt do
   ;;
   --enable-vhost-vsock) vhost_vsock="yes"
   ;;
+  --disable-vhost-user-blk-server) vhost_user_blk_server="no"
+  ;;
+  --enable-vhost-user-blk-server) vhost_user_blk_server="yes"
+  ;;
   --disable-vhost-user-fs) vhost_user_fs="no"
   ;;
   --enable-vhost-user-fs) vhost_user_fs="yes"
@@ -1791,6 +1796,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   vhost-cryptovhost-user-crypto backend support
   vhost-kernelvhost kernel backend support
   vhost-user  vhost-user backend support
+  vhost-user-blk-servervhost-user-blk server support
   vhost-vdpa  vhost-vdpa kernel backend support
   spice   spice
   rbd rados block device (rbd)
@@ -2382,6 +2388,12 @@ if test "$vhost_net" = ""; then
   test "$vhost_kernel" = "yes" && vhost_net=yes
 fi
 
+# libvhost-user is Linux-only
+test "$vhost_user_blk_server" = "" && vhost_user_blk_server=$linux
+if test "$vhost_user_blk_server" = "yes" && test "$linux" = "no"; then
+  error_exit "--enable-vhost-user-blk-server is only available on Linux"
+fi
+
 ##
 # pkg-config probe
 
@@ -6275,6 +6287,9 @@ fi
 if test "$vhost_vdpa" = "yes" ; then
   echo "CONFIG_VHOST_VDPA=y" >> $config_host_mak
 fi
+if test "$vhost_user_blk_server" = "yes" ; then
+  echo "CONFIG_VHOST_USER_BLK_SERVER=y" >> $config_host_mak
+fi
 if test "$vhost_user_fs" = "yes" ; then
   echo "CONFIG_VHOST_USER_FS=y" >> $config_host_mak
 fi
diff --git a/block/export/export.c b/block/export/export.c
index c3478c6c97..bad6f21b1c 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -22,13 +22,13 @@
 #include "qapi/qapi-commands-block-export.h"
 #include "qapi/qapi-events-block-export.h"
 #include "qemu/id.h"
-#if defined(CONFIG_LINUX) && defined(CONFIG_VHOST_USER)
+#ifdef CONFIG_VHOST_USER_BLK_SERVER
 #include "vhost-user-blk-server.h"
 #endif
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 _exp_nbd,
-#if defined(CONFIG_LINUX) && defined(CONFIG_VHOST_USER)
+#ifdef CONFIG_VHOST_USER_BLK_SERVER
 _exp_vhost_user_blk,
 #endif
 };
diff --git a/block/export/meson.build b/block/export/meson.build
index 9fb4fbf81d..19526435d8 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -1,2 +1,2 @@
 blockdev_ss.add(files('export.c'))
-blockdev_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: 
files('vhost-user-blk-server.c'))
+blockdev_ss.add(when: 'CONFIG_VHOST_USER_BLK_SERVER', if_true: 
files('vhost-user-blk-server.c'))
diff --git a/util/meson.build b/util/meson.build
index c5159ad79d..f359af0d46 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -66,7 +66,7 @@ if have_block
   util_ss.add(files('main-loop.c'))
   util_ss.add(files('nvdimm-utils.c'))
   util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 
'qemu-coroutine-io.c'))
-  util_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: [
+  util_ss.add(when: 'CONFIG_LINUX', if_true: [
 files('vhost-user-server.c'), vhost_user
   ])
   util_ss.add(files('block-helpers.c'))
-- 
MST

[PATCH v4 2/5] target/riscv: Set the virtualised MMU mode when doing hyp accesses

2020-11-03 Thread Alistair Francis

When performing the hypervisor load/store operations set the MMU mode to
indicate that we are virtualised.

Signed-off-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---
 target/riscv/op_helper.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
index e20d56dcb8..548c5851ec 100644
--- a/target/riscv/op_helper.c
+++ b/target/riscv/op_helper.c
@@ -235,30 +235,31 @@ target_ulong helper_hyp_load(CPURISCVState *env, 
target_ulong address,
 (env->priv == PRV_U && !riscv_cpu_virt_enabled(env) &&
 get_field(env->hstatus, HSTATUS_HU))) {
 target_ulong pte;
+int mmu_idx = cpu_mmu_index(env, false) | 
TB_FLAGS_PRIV_HYP_ACCESS_MASK;
 
 riscv_cpu_set_two_stage_lookup(env, true);
 
 switch (memop) {
 case MO_SB:
-pte = cpu_ldsb_data_ra(env, address, GETPC());
+pte = cpu_ldsb_mmuidx_ra(env, address, mmu_idx, GETPC());
 break;
 case MO_UB:
-pte = cpu_ldub_data_ra(env, address, GETPC());
+pte = cpu_ldub_mmuidx_ra(env, address, mmu_idx, GETPC());
 break;
 case MO_TESW:
-pte = cpu_ldsw_data_ra(env, address, GETPC());
+pte = cpu_ldsw_mmuidx_ra(env, address, mmu_idx, GETPC());
 break;
 case MO_TEUW:
-pte = cpu_lduw_data_ra(env, address, GETPC());
+pte = cpu_lduw_mmuidx_ra(env, address, mmu_idx, GETPC());
 break;
 case MO_TESL:
-pte = cpu_ldl_data_ra(env, address, GETPC());
+pte = cpu_ldl_mmuidx_ra(env, address, mmu_idx, GETPC());
 break;
 case MO_TEUL:
-pte = cpu_ldl_data_ra(env, address, GETPC());
+pte = cpu_ldl_mmuidx_ra(env, address, mmu_idx, GETPC());
 break;
 case MO_TEQ:
-pte = cpu_ldq_data_ra(env, address, GETPC());
+pte = cpu_ldq_mmuidx_ra(env, address, mmu_idx, GETPC());
 break;
 default:
 g_assert_not_reached();
@@ -284,23 +285,25 @@ void helper_hyp_store(CPURISCVState *env, target_ulong 
address,
 (env->priv == PRV_S && !riscv_cpu_virt_enabled(env)) ||
 (env->priv == PRV_U && !riscv_cpu_virt_enabled(env) &&
 get_field(env->hstatus, HSTATUS_HU))) {
+int mmu_idx = cpu_mmu_index(env, false) | 
TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+
 riscv_cpu_set_two_stage_lookup(env, true);
 
 switch (memop) {
 case MO_SB:
 case MO_UB:
-cpu_stb_data_ra(env, address, val, GETPC());
+cpu_stb_mmuidx_ra(env, address, val, mmu_idx, GETPC());
 break;
 case MO_TESW:
 case MO_TEUW:
-cpu_stw_data_ra(env, address, val, GETPC());
+cpu_stw_mmuidx_ra(env, address, val, mmu_idx, GETPC());
 break;
 case MO_TESL:
 case MO_TEUL:
-cpu_stl_data_ra(env, address, val, GETPC());
+cpu_stl_mmuidx_ra(env, address, val, mmu_idx, GETPC());
 break;
 case MO_TEQ:
-cpu_stq_data_ra(env, address, val, GETPC());
+cpu_stq_mmuidx_ra(env, address, val, mmu_idx, GETPC());
 break;
 default:
 g_assert_not_reached();
@@ -326,15 +329,16 @@ target_ulong helper_hyp_x_load(CPURISCVState *env, 
target_ulong address,
 (env->priv == PRV_U && !riscv_cpu_virt_enabled(env) &&
 get_field(env->hstatus, HSTATUS_HU))) {
 target_ulong pte;
+int mmu_idx = cpu_mmu_index(env, false) | 
TB_FLAGS_PRIV_HYP_ACCESS_MASK;
 
 riscv_cpu_set_two_stage_lookup(env, true);
 
 switch (memop) {
 case MO_TEUW:
-pte = cpu_lduw_data_ra(env, address, GETPC());
+pte = cpu_lduw_mmuidx_ra(env, address, mmu_idx, GETPC());
 break;
 case MO_TEUL:
-pte = cpu_ldl_data_ra(env, address, GETPC());
+pte = cpu_ldl_mmuidx_ra(env, address, mmu_idx, GETPC());
 break;
 default:
 g_assert_not_reached();
-- 
2.28.0

[PULL v2 14/38] virtio-iommu: Store memory region in endpoint struct

2020-11-03 Thread Michael S. Tsirkin

From: Jean-Philippe Brucker 

Store the memory region associated to each endpoint into the endpoint
structure, to allow efficient memory notification on map/unmap.

Acked-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20201030180510.747225-3-jean-phili...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-iommu.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 4c8f3909b7..a5c2d69aad 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -49,6 +49,7 @@ typedef struct VirtIOIOMMUDomain {
 typedef struct VirtIOIOMMUEndpoint {
 uint32_t id;
 VirtIOIOMMUDomain *domain;
+IOMMUMemoryRegion *iommu_mr;
 QLIST_ENTRY(VirtIOIOMMUEndpoint) next;
 } VirtIOIOMMUEndpoint;
 
@@ -137,16 +138,19 @@ static VirtIOIOMMUEndpoint 
*virtio_iommu_get_endpoint(VirtIOIOMMU *s,
   uint32_t ep_id)
 {
 VirtIOIOMMUEndpoint *ep;
+IOMMUMemoryRegion *mr;
 
 ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(ep_id));
 if (ep) {
 return ep;
 }
-if (!virtio_iommu_mr(s, ep_id)) {
+mr = virtio_iommu_mr(s, ep_id);
+if (!mr) {
 return NULL;
 }
 ep = g_malloc0(sizeof(*ep));
 ep->id = ep_id;
+ep->iommu_mr = mr;
 trace_virtio_iommu_get_endpoint(ep_id);
 g_tree_insert(s->endpoints, GUINT_TO_POINTER(ep_id), ep);
 return ep;
@@ -910,9 +914,14 @@ static gboolean reconstruct_endpoints(gpointer key, 
gpointer value,
 VirtIOIOMMU *s = (VirtIOIOMMU *)data;
 VirtIOIOMMUDomain *d = (VirtIOIOMMUDomain *)value;
 VirtIOIOMMUEndpoint *iter;
+IOMMUMemoryRegion *mr;
 
 QLIST_FOREACH(iter, >endpoint_list, next) {
+mr = virtio_iommu_mr(s, iter->id);
+assert(mr);
+
 iter->domain = d;
+iter->iommu_mr = mr;
 g_tree_insert(s->endpoints, GUINT_TO_POINTER(iter->id), iter);
 }
 return false; /* continue the domain traversal */
-- 
MST

[PULL v2 35/38] vhost-user-blk-test: rename destroy_drive() to destroy_file()

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

The function is used not just for image files but also for UNIX domain
sockets (QMP monitor and vhost-user-blk). Reflect that in the name.

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-10-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/vhost-user-blk-test.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/vhost-user-blk-test.c 
b/tests/qtest/vhost-user-blk-test.c
index 31f2335f97..f05f14c192 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -658,7 +658,8 @@ static const char *qtest_qemu_storage_daemon_binary(void)
 return qemu_storage_daemon_bin;
 }
 
-static void drive_destroy(void *path)
+/* g_test_queue_destroy() cleanup function for files */
+static void destroy_file(void *path)
 {
 unlink(path);
 g_free(path);
@@ -678,7 +679,7 @@ static char *drive_create(void)
 g_assert_cmpint(ret, ==, 0);
 close(fd);
 
-g_test_queue_destroy(drive_destroy, t_path);
+g_test_queue_destroy(destroy_file, t_path);
 return t_path;
 }
 
@@ -717,7 +718,7 @@ static char *start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
 
 qmp_fd = mkstemp(qmp_sock_path);
 g_assert_cmpint(qmp_fd, >=, 0);
-g_test_queue_destroy(drive_destroy, qmp_sock_path);
+g_test_queue_destroy(destroy_file, qmp_sock_path);
 
 g_string_append_printf(storage_daemon_command,
 "exec %s "
@@ -731,7 +732,7 @@ static char *start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
 sock_path = g_strdup(sock_path_tempate);
 fd = mkstemp(sock_path);
 g_assert_cmpint(fd, >=, 0);
-g_test_queue_destroy(drive_destroy, sock_path);
+g_test_queue_destroy(drive_file, sock_path);
 /* create image file */
 img_path = drive_create();
 g_string_append_printf(storage_daemon_command,
-- 
MST

[PATCH v4 1/5] target/riscv: Add a virtualised MMU Mode

2020-11-03 Thread Alistair Francis

Add a new MMU mode that includes the current virt mode.

Signed-off-by: Alistair Francis 
---
 target/riscv/cpu-param.h  | 11 ++-
 target/riscv/cpu.h|  4 +++-
 target/riscv/cpu_helper.c |  2 +-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu-param.h b/target/riscv/cpu-param.h
index 664fc1d371..e4cf3c01eb 100644
--- a/target/riscv/cpu-param.h
+++ b/target/riscv/cpu-param.h
@@ -18,6 +18,15 @@
 # define TARGET_VIRT_ADDR_SPACE_BITS 32 /* sv32 */
 #endif
 #define TARGET_PAGE_BITS 12 /* 4 KiB Pages */
-#define NB_MMU_MODES 4
+/*
+ * The current MMU Modes are:
+ *  - U mode 0b000
+ *  - S mode 0b001
+ *  - M mode 0b011
+ *  - U mode HLV/HLVX/HSV 0b100
+ *  - S mode HLV/HLVX/HSV 0b101
+ *  - M mode HLV/HLVX/HSV 0b111
+ */
+#define NB_MMU_MODES 6
 
 #endif
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 87b68affa8..5d8e54c426 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -363,7 +363,9 @@ void QEMU_NORETURN riscv_raise_exception(CPURISCVState *env,
 target_ulong riscv_cpu_get_fflags(CPURISCVState *env);
 void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
 
-#define TB_FLAGS_MMU_MASK   3
+#define TB_FLAGS_MMU_MASK   7
+#define TB_FLAGS_PRIV_MMU_MASK3
+#define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 2)
 #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
 
 typedef CPURISCVState CPUArchState;
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 3eb3a034db..9dfa7af401 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -323,7 +323,7 @@ static int get_physical_address(CPURISCVState *env, hwaddr 
*physical,
  * (riscv_cpu_do_interrupt) is correct */
 MemTxResult res;
 MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
-int mode = mmu_idx;
+int mode = mmu_idx & TB_FLAGS_PRIV_MMU_MASK;
 bool use_background = false;
 
 /*
-- 
2.28.0

[PULL v2 11/38] hw/virtio/vhost-backend: Fix Coverity CID 1432871

2020-11-03 Thread Michael S. Tsirkin

From: Philippe Mathieu-Daudé 

Fix uninitialized value issues reported by Coverity:

  Field 'msg.reserved' is uninitialized when calling write().

While the 'struct vhost_msg' does not have a 'reserved' field,
we still initialize it to have the two parts of the function
consistent.

Reported-by: Coverity (CID 1432864: UNINIT)
Fixes: c471ad0e9bd ("vhost_net: device IOTLB support")
Reviewed-by: Peter Maydell 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20201103063541.2463363-1-phi...@redhat.com>
Reviewed-by: Stefano Garzarella 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/vhost-backend.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 88c8ecc9e0..222bbcc62d 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -257,7 +257,7 @@ static int vhost_kernel_send_device_iotlb_msg(struct 
vhost_dev *dev,
   struct vhost_iotlb_msg *imsg)
 {
 if (dev->backend_cap & (1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2)) {
-struct vhost_msg_v2 msg;
+struct vhost_msg_v2 msg = {};
 
 msg.type = VHOST_IOTLB_MSG_V2;
 msg.iotlb = *imsg;
@@ -267,7 +267,7 @@ static int vhost_kernel_send_device_iotlb_msg(struct 
vhost_dev *dev,
 return -EFAULT;
 }
 } else {
-struct vhost_msg msg;
+struct vhost_msg msg = {};
 
 msg.type = VHOST_IOTLB_MSG;
 msg.iotlb = *imsg;
-- 
MST

[PULL v2 32/38] test: new qTest case to test the vhost-user-blk-server

2020-11-03 Thread Michael S. Tsirkin

From: Coiby Xu 

This test case has the same tests as tests/virtio-blk-test.c except for
tests have block_resize. Since vhost-user server can only server one
client one time, two instances of vhost-user-blk-server are started by
qemu-storage-daemon for the hotplug test.

In order to not block scripts/tap-driver.pl, vhost-user-blk-server will
send "quit" command to qemu-storage-daemon's QMP monitor. So a function
is added to libqtest.c to establish socket connection with socket
server.

Suggested-by: Thomas Huth 
Signed-off-by: Coiby Xu 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Marc-André Lureau 
Message-id: 20200918080912.321299-7-coiby...@gmail.com
[Update meson.build to only test when CONFIG_TOOLS has built
qemu-storage-daemon. This prevents CI failures with --disable-tools.
Also bump RAM to 256 MB because that is the minimum RAM granularity on
ppc64 spapr machines.
--Stefan]
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-7-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/libqos/libqtest.h   |  17 +
 tests/qtest/libqos/vhost-user-blk.h |  48 ++
 tests/qtest/libqos/vhost-user-blk.c | 129 +
 tests/qtest/libqtest.c  |  36 +-
 tests/qtest/vhost-user-blk-test.c   | 751 
 tests/qtest/libqos/meson.build  |   1 +
 tests/qtest/meson.build |   2 +
 7 files changed, 982 insertions(+), 2 deletions(-)
 create mode 100644 tests/qtest/libqos/vhost-user-blk.h
 create mode 100644 tests/qtest/libqos/vhost-user-blk.c
 create mode 100644 tests/qtest/vhost-user-blk-test.c

diff --git a/tests/qtest/libqos/libqtest.h b/tests/qtest/libqos/libqtest.h
index 5c959f1853..241b5f89fb 100644
--- a/tests/qtest/libqos/libqtest.h
+++ b/tests/qtest/libqos/libqtest.h
@@ -132,6 +132,23 @@ void qtest_qmp_send(QTestState *s, const char *fmt, ...)
 void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
 GCC_FMT_ATTR(2, 3);
 
+/**
+ * qtest_socket_client:
+ * @server_socket_path: the socket server's path
+ *
+ * Connect to a socket server.
+ */
+int qtest_socket_client(char *server_socket_path);
+
+/**
+ * qtest_create_state_with_qmp_fd:
+ * @fd: socket fd
+ *
+ * Wrap socket fd in QTestState to make use of qtest_qmp*
+ * functions
+ */
+QTestState *qtest_create_state_with_qmp_fd(int fd);
+
 /**
  * qtest_vqmp_fds:
  * @s: #QTestState instance to operate on.
diff --git a/tests/qtest/libqos/vhost-user-blk.h 
b/tests/qtest/libqos/vhost-user-blk.h
new file mode 100644
index 00..2a03456a45
--- /dev/null
+++ b/tests/qtest/libqos/vhost-user-blk.h
@@ -0,0 +1,48 @@
+/*
+ * libqos driver framework
+ *
+ * Based on tests/qtest/libqos/virtio-blk.c
+ *
+ * Copyright (c) 2020 Coiby Xu 
+ *
+ * Copyright (c) 2018 Emanuele Giuseppe Esposito 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2 as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#ifndef TESTS_LIBQOS_VHOST_USER_BLK_H
+#define TESTS_LIBQOS_VHOST_USER_BLK_H
+
+#include "qgraph.h"
+#include "virtio.h"
+#include "virtio-pci.h"
+
+typedef struct QVhostUserBlk QVhostUserBlk;
+typedef struct QVhostUserBlkPCI QVhostUserBlkPCI;
+typedef struct QVhostUserBlkDevice QVhostUserBlkDevice;
+
+struct QVhostUserBlk {
+QVirtioDevice *vdev;
+};
+
+struct QVhostUserBlkPCI {
+QVirtioPCIDevice pci_vdev;
+QVhostUserBlk blk;
+};
+
+struct QVhostUserBlkDevice {
+QOSGraphObject obj;
+QVhostUserBlk blk;
+};
+
+#endif
diff --git a/tests/qtest/libqos/vhost-user-blk.c 
b/tests/qtest/libqos/vhost-user-blk.c
new file mode 100644
index 00..58c7e1eb69
--- /dev/null
+++ b/tests/qtest/libqos/vhost-user-blk.c
@@ -0,0 +1,129 @@
+/*
+ * libqos driver framework
+ *
+ * Based on tests/qtest/libqos/virtio-blk.c
+ *
+ * Copyright (c) 2020 Coiby Xu 
+ *
+ * Copyright (c) 2018 Emanuele Giuseppe Esposito 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2.1 as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"

[PULL v2 12/38] hw/smbios: Fix leaked fd in save_opt_one() error path

2020-11-03 Thread Michael S. Tsirkin

From: Philippe Mathieu-Daudé 

Fix the following Coverity issue (RESOURCE_LEAK):

  CID 1432879: Resource leak

Handle variable fd going out of scope leaks the handle.

Replace a close() call by qemu_close() since the handle is
opened with qemu_open().

Fixes: bb99f4772f5 ("hw/smbios: support loading OEM strings values from a file")
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20201030152742.1553968-1-phi...@redhat.com>
Reviewed-by: Stefano Garzarella 
Reviewed-by: Laszlo Ersek 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/smbios/smbios.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 8b30906e50..6a3d39793b 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -988,16 +988,18 @@ static int save_opt_one(void *opaque,
 if (ret < 0) {
 error_setg(errp, "Unable to read from %s: %s",
value, strerror(errno));
+qemu_close(fd);
 return -1;
 }
 if (memchr(buf, '\0', ret)) {
 error_setg(errp, "NUL in OEM strings value in %s", value);
+qemu_close(fd);
 return -1;
 }
 g_byte_array_append(data, (guint8 *)buf, ret);
 }
 
-close(fd);
+qemu_close(fd);
 
 *opt->dest = g_renew(char *, *opt->dest, (*opt->ndest) + 1);
 (*opt->dest)[*opt->ndest] = (char *)g_byte_array_free(data,  FALSE);
-- 
MST

[PULL v2 25/38] Revert "vhost-blk: set features before setting inflight feature"

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

This reverts commit adb29c027341ba095a3ef4beef6aaef86d3a520e.

The commit broke -device vhost-user-blk-pci because the
vhost_dev_prepare_inflight() function it introduced segfaults in
vhost_dev_set_features() when attempting to access struct vhost_dev's
vdev pointer before it has been assigned.

To reproduce the segfault simply launch a vhost-user-blk device with the
contrib vhost-user-blk device backend:

  $ build/contrib/vhost-user-blk/vhost-user-blk -s /tmp/vhost-user-blk.sock -r 
-b /var/tmp/foo.img
  $ build/qemu-system-x86_64 \
-device vhost-user-blk-pci,id=drv0,chardev=char1,addr=4.0 \
-object memory-backend-memfd,id=mem,size=1G,share=on \
-M memory-backend=mem,accel=kvm \
-chardev socket,id=char1,path=/tmp/vhost-user-blk.sock
  Segmentation fault (core dumped)

Cc: Jin Yu 
Cc: Raphael Norwitz 
Cc: Michael S. Tsirkin 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201102165709.232180-1-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/vhost.h |  1 -
 hw/block/vhost-user-blk.c |  6 --
 hw/virtio/vhost.c | 18 --
 3 files changed, 25 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 839bfb153c..94585067f7 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -141,7 +141,6 @@ void vhost_dev_reset_inflight(struct vhost_inflight 
*inflight);
 void vhost_dev_free_inflight(struct vhost_inflight *inflight);
 void vhost_dev_save_inflight(struct vhost_inflight *inflight, QEMUFile *f);
 int vhost_dev_load_inflight(struct vhost_inflight *inflight, QEMUFile *f);
-int vhost_dev_prepare_inflight(struct vhost_dev *hdev);
 int vhost_dev_set_inflight(struct vhost_dev *dev,
struct vhost_inflight *inflight);
 int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index f67b29bbf3..a076b1e54d 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -131,12 +131,6 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
 
 s->dev.acked_features = vdev->guest_features;
 
-ret = vhost_dev_prepare_inflight(>dev);
-if (ret < 0) {
-error_report("Error set inflight format: %d", -ret);
-goto err_guest_notifiers;
-}
-
 if (!s->inflight->addr) {
 ret = vhost_dev_get_inflight(>dev, s->queue_size, s->inflight);
 if (ret < 0) {
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f2482378c6..79b2be20df 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1645,24 +1645,6 @@ int vhost_dev_load_inflight(struct vhost_inflight 
*inflight, QEMUFile *f)
 return 0;
 }
 
-int vhost_dev_prepare_inflight(struct vhost_dev *hdev)
-{
-int r;
- 
-if (hdev->vhost_ops->vhost_get_inflight_fd == NULL ||
-hdev->vhost_ops->vhost_set_inflight_fd == NULL) {
-return 0;
-}
- 
-r = vhost_dev_set_features(hdev, hdev->log_enabled);
-if (r < 0) {
-VHOST_OPS_DEBUG("vhost_dev_prepare_inflight failed");
-return r;
-}
-
-return 0;
-}
-
 int vhost_dev_set_inflight(struct vhost_dev *dev,
struct vhost_inflight *inflight)
 {
-- 
MST

[PULL v2 26/38] vhost-blk: set features before setting inflight feature

2020-11-03 Thread Michael S. Tsirkin

From: Jin Yu 

Virtqueue has split and packed, so before setting inflight,
you need to inform the back-end virtqueue format.

Signed-off-by: Jin Yu 
Acked-by: Raphael Norwitz 
Message-Id: <20201103123617.28256-1-jin...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/vhost.h |  1 +
 hw/block/vhost-user-blk.c |  6 ++
 hw/virtio/vhost.c | 20 
 3 files changed, 27 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 94585067f7..4a8bc75415 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -141,6 +141,7 @@ void vhost_dev_reset_inflight(struct vhost_inflight 
*inflight);
 void vhost_dev_free_inflight(struct vhost_inflight *inflight);
 void vhost_dev_save_inflight(struct vhost_inflight *inflight, QEMUFile *f);
 int vhost_dev_load_inflight(struct vhost_inflight *inflight, QEMUFile *f);
+int vhost_dev_prepare_inflight(struct vhost_dev *hdev, VirtIODevice *vdev);
 int vhost_dev_set_inflight(struct vhost_dev *dev,
struct vhost_inflight *inflight);
 int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index a076b1e54d..2dd3d93ca0 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -131,6 +131,12 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
 
 s->dev.acked_features = vdev->guest_features;
 
+ret = vhost_dev_prepare_inflight(>dev, vdev);
+if (ret < 0) {
+error_report("Error set inflight format: %d", -ret);
+goto err_guest_notifiers;
+}
+
 if (!s->inflight->addr) {
 ret = vhost_dev_get_inflight(>dev, s->queue_size, s->inflight);
 if (ret < 0) {
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 79b2be20df..614ccc2bcb 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1645,6 +1645,26 @@ int vhost_dev_load_inflight(struct vhost_inflight 
*inflight, QEMUFile *f)
 return 0;
 }
 
+int vhost_dev_prepare_inflight(struct vhost_dev *hdev, VirtIODevice *vdev)
+{
+int r;
+
+if (hdev->vhost_ops->vhost_get_inflight_fd == NULL ||
+hdev->vhost_ops->vhost_set_inflight_fd == NULL) {
+return 0;
+}
+
+hdev->vdev = vdev;
+
+r = vhost_dev_set_features(hdev, hdev->log_enabled);
+if (r < 0) {
+VHOST_OPS_DEBUG("vhost_dev_prepare_inflight failed");
+return r;
+}
+
+return 0;
+}
+
 int vhost_dev_set_inflight(struct vhost_dev *dev,
struct vhost_inflight *inflight)
 {
-- 
MST

[PULL v2 36/38] vhost-user-blk-test: close fork child file descriptors

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

Do not leave stdin, stdout, stderr open after fork. stdout is the
tap-driver.pl pipe. If we keep the pipe open then tap-driver.pl will not
detect that qos-test has terminated and it will hang.

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-11-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/vhost-user-blk-test.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tests/qtest/vhost-user-blk-test.c 
b/tests/qtest/vhost-user-blk-test.c
index f05f14c192..15daf8ccbc 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -749,6 +749,17 @@ static char *start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
storage_daemon_command->str);
 pid_t pid = fork();
 if (pid == 0) {
+/*
+ * Close standard file descriptors so tap-driver.pl pipe detects when
+ * our parent terminates.
+ */
+close(0);
+close(1);
+close(2);
+open("/dev/null", O_RDONLY);
+open("/dev/null", O_WRONLY);
+open("/dev/null", O_WRONLY);
+
 execlp("/bin/sh", "sh", "-c", storage_daemon_command->str, NULL);
 exit(1);
 }
-- 
MST

[PATCH v4 5/5] target/riscv: Split the Hypervisor execute load helpers

2020-11-03 Thread Alistair Francis

Split the hypervisor execute load functions into two seperate functions.
This avoids us having to pass the memop to the C helper functions.

Signed-off-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---
 target/riscv/helper.h   |  3 ++-
 target/riscv/op_helper.c| 36 +++--
 target/riscv/insn_trans/trans_rvh.c.inc | 20 +-
 3 files changed, 17 insertions(+), 42 deletions(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index ee35311052..939731c345 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -81,7 +81,8 @@ DEF_HELPER_1(tlb_flush, void, env)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_1(hyp_tlb_flush, void, env)
 DEF_HELPER_1(hyp_gvma_tlb_flush, void, env)
-DEF_HELPER_4(hyp_x_load, tl, env, tl, tl, tl)
+DEF_HELPER_2(hyp_hlvx_hu, tl, env, tl)
+DEF_HELPER_2(hyp_hlvx_wu, tl, env, tl)
 #endif
 
 /* Vector functions */
diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
index 980d4f39e1..d55def76cf 100644
--- a/target/riscv/op_helper.c
+++ b/target/riscv/op_helper.c
@@ -227,36 +227,18 @@ void helper_hyp_gvma_tlb_flush(CPURISCVState *env)
 helper_hyp_tlb_flush(env);
 }
 
-target_ulong helper_hyp_x_load(CPURISCVState *env, target_ulong address,
-   target_ulong attrs, target_ulong memop)
+target_ulong helper_hyp_hlvx_hu(CPURISCVState *env, target_ulong address)
 {
-if (env->priv == PRV_M ||
-(env->priv == PRV_S && !riscv_cpu_virt_enabled(env)) ||
-(env->priv == PRV_U && !riscv_cpu_virt_enabled(env) &&
-get_field(env->hstatus, HSTATUS_HU))) {
-target_ulong pte;
-int mmu_idx = cpu_mmu_index(env, false) | 
TB_FLAGS_PRIV_HYP_ACCESS_MASK;
-
-switch (memop) {
-case MO_TEUW:
-pte = cpu_lduw_mmuidx_ra(env, address, mmu_idx, GETPC());
-break;
-case MO_TEUL:
-pte = cpu_ldl_mmuidx_ra(env, address, mmu_idx, GETPC());
-break;
-default:
-g_assert_not_reached();
-}
+int mmu_idx = cpu_mmu_index(env, true) | TB_FLAGS_PRIV_HYP_ACCESS_MASK;
 
-return pte;
-}
+return cpu_lduw_mmuidx_ra(env, address, mmu_idx, GETPC());
+}
 
-if (riscv_cpu_virt_enabled(env)) {
-riscv_raise_exception(env, RISCV_EXCP_VIRT_INSTRUCTION_FAULT, GETPC());
-} else {
-riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
-}
-return 0;
+target_ulong helper_hyp_hlvx_wu(CPURISCVState *env, target_ulong address)
+{
+int mmu_idx = cpu_mmu_index(env, true) | TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+
+return cpu_ldl_mmuidx_ra(env, address, mmu_idx, GETPC());
 }
 
 #endif /* !CONFIG_USER_ONLY */
diff --git a/target/riscv/insn_trans/trans_rvh.c.inc 
b/target/riscv/insn_trans/trans_rvh.c.inc
index cc197e7186..ce7ed5affb 100644
--- a/target/riscv/insn_trans/trans_rvh.c.inc
+++ b/target/riscv/insn_trans/trans_rvh.c.inc
@@ -277,20 +277,16 @@ static bool trans_hlvx_hu(DisasContext *ctx, arg_hlvx_hu 
*a)
 #ifndef CONFIG_USER_ONLY
 TCGv t0 = tcg_temp_new();
 TCGv t1 = tcg_temp_new();
-TCGv mem_idx = tcg_temp_new();
-TCGv memop = tcg_temp_new();
+
+check_access(ctx);
 
 gen_get_gpr(t0, a->rs1);
-tcg_gen_movi_tl(mem_idx, ctx->mem_idx);
-tcg_gen_movi_tl(memop, MO_TEUW);
 
-gen_helper_hyp_x_load(t1, cpu_env, t0, mem_idx, memop);
+gen_helper_hyp_hlvx_hu(t1, cpu_env, t0);
 gen_set_gpr(a->rd, t1);
 
 tcg_temp_free(t0);
 tcg_temp_free(t1);
-tcg_temp_free(mem_idx);
-tcg_temp_free(memop);
 return true;
 #else
 return false;
@@ -303,20 +299,16 @@ static bool trans_hlvx_wu(DisasContext *ctx, arg_hlvx_wu 
*a)
 #ifndef CONFIG_USER_ONLY
 TCGv t0 = tcg_temp_new();
 TCGv t1 = tcg_temp_new();
-TCGv mem_idx = tcg_temp_new();
-TCGv memop = tcg_temp_new();
+
+check_access(ctx);
 
 gen_get_gpr(t0, a->rs1);
-tcg_gen_movi_tl(mem_idx, ctx->mem_idx);
-tcg_gen_movi_tl(memop, MO_TEUL);
 
-gen_helper_hyp_x_load(t1, cpu_env, t0, mem_idx, memop);
+gen_helper_hyp_hlvx_wu(t1, cpu_env, t0);
 gen_set_gpr(a->rd, t1);
 
 tcg_temp_free(t0);
 tcg_temp_free(t1);
-tcg_temp_free(mem_idx);
-tcg_temp_free(memop);
 return true;
 #else
 return false;
-- 
2.28.0

[PULL v2 21/38] virtio-iommu: Set supported page size mask

2020-11-03 Thread Michael S. Tsirkin

From: Bharat Bhushan 

The virtio-iommu device can deal with arbitrary page sizes for virtual
endpoints, but for endpoints assigned with VFIO it must follow the page
granule used by the host IOMMU driver.

Implement the interface to set the vIOMMU page size mask, called by VFIO
for each endpoint. We assume that all host IOMMU drivers use the same
page granule (the host page granule). Override the page_size_mask field
in the virtio config space.

Signed-off-by: Bharat Bhushan 
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20201030180510.747225-10-jean-phili...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-iommu.c | 50 
 hw/virtio/trace-events   |  1 +
 2 files changed, 51 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 78e07aa40a..fc5c75d693 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -899,6 +899,55 @@ static int 
virtio_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu_mr,
 return 0;
 }
 
+/*
+ * The default mask (TARGET_PAGE_MASK) is the smallest supported guest granule,
+ * for example 0xf000. When an assigned device has page size
+ * restrictions due to the hardware IOMMU configuration, apply this restriction
+ * to the mask.
+ */
+static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
+   uint64_t new_mask,
+   Error **errp)
+{
+IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+VirtIOIOMMU *s = sdev->viommu;
+uint64_t cur_mask = s->config.page_size_mask;
+
+trace_virtio_iommu_set_page_size_mask(mr->parent_obj.name, cur_mask,
+  new_mask);
+
+if ((cur_mask & new_mask) == 0) {
+error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
+   " is incompatible with mask 0x%"PRIx64, cur_mask, new_mask);
+return -1;
+}
+
+/*
+ * After the machine is finalized, we can't change the mask anymore. If by
+ * chance the hotplugged device supports the same granule, we can still
+ * accept it. Having a different masks is possible but the guest will use
+ * sub-optimal block sizes, so warn about it.
+ */
+if (qdev_hotplug) {
+int new_granule = ctz64(new_mask);
+int cur_granule = ctz64(cur_mask);
+
+if (new_granule != cur_granule) {
+error_setg(errp, "virtio-iommu page mask 0x%"PRIx64
+   " is incompatible with mask 0x%"PRIx64, cur_mask,
+   new_mask);
+return -1;
+} else if (new_mask != cur_mask) {
+warn_report("virtio-iommu page mask 0x%"PRIx64
+" does not match 0x%"PRIx64, cur_mask, new_mask);
+}
+return 0;
+}
+
+s->config.page_size_mask &= new_mask;
+return 0;
+}
+
 static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -1130,6 +1179,7 @@ static void 
virtio_iommu_memory_region_class_init(ObjectClass *klass,
 imrc->translate = virtio_iommu_translate;
 imrc->replay = virtio_iommu_replay;
 imrc->notify_flag_changed = virtio_iommu_notify_flag_changed;
+imrc->iommu_set_page_size_mask = virtio_iommu_set_page_size_mask;
 }
 
 static const TypeInfo virtio_iommu_info = {
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 982d0002a6..2060a144a2 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -109,6 +109,7 @@ virtio_iommu_fill_resv_property(uint32_t devid, uint8_t 
subtype, uint64_t start,
 virtio_iommu_notify_map(const char *name, uint64_t virt_start, uint64_t 
virt_end, uint64_t phys_start, uint32_t flags) "mr=%s virt_start=0x%"PRIx64" 
virt_end=0x%"PRIx64" phys_start=0x%"PRIx64" flags=%d"
 virtio_iommu_notify_unmap(const char *name, uint64_t virt_start, uint64_t 
virt_end) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
 virtio_iommu_remap(const char *name, uint64_t virt_start, uint64_t virt_end, 
uint64_t phys_start) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64" 
phys_start=0x%"PRIx64
+virtio_iommu_set_page_size_mask(const char *name, uint64_t old, uint64_t new) 
"mr=%s old_mask=0x%"PRIx64" new_mask=0x%"PRIx64
 virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
 virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
 
-- 
MST

[PULL v2 30/38] block/export: fix vhost-user-blk get_config() information leak

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

Refuse get_config() requests in excess of sizeof(struct virtio_blk_config).

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-5-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 block/export/vhost-user-blk-server.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index 33cc0818b8..62672d1cb9 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -266,6 +266,9 @@ vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t 
len)
 {
 VuServer *server = container_of(vu_dev, VuServer, vu_dev);
 VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
+
+g_return_val_if_fail(len <= sizeof(struct virtio_blk_config), -1);
+
 memcpy(config, >blkcfg, len);
 return 0;
 }
-- 
MST

[PULL v2 20/38] vfio: Set IOMMU page size as per host supported page size

2020-11-03 Thread Michael S. Tsirkin

From: Bharat Bhushan 

Set IOMMU supported page size mask same as host Linux supported page
size mask.

Acked-by: Alex Williamson 
Reviewed-by: Eric Auger 
Signed-off-by: Bharat Bhushan 
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20201030180510.747225-9-jean-phili...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/vfio/common.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e18ea2cf91..35895b18a6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -789,6 +789,14 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 int128_get64(llend),
 iommu_idx);
 
+ret = memory_region_iommu_set_page_size_mask(giommu->iommu,
+ container->pgsizes,
+ );
+if (ret) {
+g_free(giommu);
+goto fail;
+}
+
 ret = memory_region_register_iommu_notifier(section->mr, >n,
 );
 if (ret) {
-- 
MST

[PATCH v4 3/5] target/riscv: Remove the HS_TWO_STAGE flag

2020-11-03 Thread Alistair Francis

The HS_TWO_STAGE flag is no longer required as the MMU index contains
the information if we are performing a two stage access.

Signed-off-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.h|  3 +-
 target/riscv/cpu_bits.h   |  1 -
 target/riscv/cpu_helper.c | 60 ---
 target/riscv/op_helper.c  | 12 
 4 files changed, 25 insertions(+), 51 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 5d8e54c426..0cf48a1521 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -323,8 +323,7 @@ bool riscv_cpu_virt_enabled(CPURISCVState *env);
 void riscv_cpu_set_virt_enabled(CPURISCVState *env, bool enable);
 bool riscv_cpu_force_hs_excep_enabled(CPURISCVState *env);
 void riscv_cpu_set_force_hs_excep(CPURISCVState *env, bool enable);
-bool riscv_cpu_two_stage_lookup(CPURISCVState *env);
-void riscv_cpu_set_two_stage_lookup(CPURISCVState *env, bool enable);
+bool riscv_cpu_two_stage_lookup(int mmu_idx);
 int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch);
 hwaddr riscv_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 void  riscv_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index daedad8691..24b24c69c5 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -469,7 +469,6 @@
  * page table fault.
  */
 #define FORCE_HS_EXCEP  2
-#define HS_TWO_STAGE4
 
 /* RV32 satp CSR field masks */
 #define SATP32_MODE 0x8000
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 9dfa7af401..a2787b1d48 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -207,22 +207,9 @@ void riscv_cpu_set_force_hs_excep(CPURISCVState *env, bool 
enable)
 env->virt = set_field(env->virt, FORCE_HS_EXCEP, enable);
 }
 
-bool riscv_cpu_two_stage_lookup(CPURISCVState *env)
+bool riscv_cpu_two_stage_lookup(int mmu_idx)
 {
-if (!riscv_has_ext(env, RVH)) {
-return false;
-}
-
-return get_field(env->virt, HS_TWO_STAGE);
-}
-
-void riscv_cpu_set_two_stage_lookup(CPURISCVState *env, bool enable)
-{
-if (!riscv_has_ext(env, RVH)) {
-return;
-}
-
-env->virt = set_field(env->virt, HS_TWO_STAGE, enable);
+return mmu_idx & TB_FLAGS_PRIV_HYP_ACCESS_MASK;
 }
 
 int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint32_t interrupts)
@@ -333,7 +320,7 @@ static int get_physical_address(CPURISCVState *env, hwaddr 
*physical,
  * was called. Background registers will be used if the guest has
  * forced a two stage translation to be on (in HS or M mode).
  */
-if (riscv_cpu_two_stage_lookup(env) && access_type != MMU_INST_FETCH) {
+if (!riscv_cpu_virt_enabled(env) && riscv_cpu_two_stage_lookup(mmu_idx)) {
 use_background = true;
 }
 
@@ -572,7 +559,7 @@ restart:
 
 static void raise_mmu_exception(CPURISCVState *env, target_ulong address,
 MMUAccessType access_type, bool pmp_violation,
-bool first_stage)
+bool first_stage, bool two_stage)
 {
 CPUState *cs = env_cpu(env);
 int page_fault_exceptions;
@@ -595,8 +582,7 @@ static void raise_mmu_exception(CPURISCVState *env, 
target_ulong address,
 }
 break;
 case MMU_DATA_LOAD:
-if ((riscv_cpu_virt_enabled(env) || riscv_cpu_two_stage_lookup(env)) &&
-!first_stage) {
+if (two_stage && !first_stage) {
 cs->exception_index = RISCV_EXCP_LOAD_GUEST_ACCESS_FAULT;
 } else {
 cs->exception_index = page_fault_exceptions ?
@@ -604,8 +590,7 @@ static void raise_mmu_exception(CPURISCVState *env, 
target_ulong address,
 }
 break;
 case MMU_DATA_STORE:
-if ((riscv_cpu_virt_enabled(env) || riscv_cpu_two_stage_lookup(env)) &&
-!first_stage) {
+if (two_stage && !first_stage) {
 cs->exception_index = RISCV_EXCP_STORE_GUEST_AMO_ACCESS_FAULT;
 } else {
 cs->exception_index = page_fault_exceptions ?
@@ -696,6 +681,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 int prot, prot2;
 bool pmp_violation = false;
 bool first_stage_error = true;
+bool two_stage_lookup = false;
 int ret = TRANSLATE_FAIL;
 int mode = mmu_idx;
 target_ulong tlb_size = 0;
@@ -715,11 +701,12 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 access_type != MMU_INST_FETCH &&
 get_field(env->mstatus, MSTATUS_MPRV) &&
 get_field(env->mstatus, MSTATUS_MPV)) {
-riscv_cpu_set_two_stage_lookup(env, true);
+two_stage_lookup = true;
 }
 
 if (riscv_cpu_virt_enabled(env) ||
-(riscv_cpu_two_stage_lookup(env) && access_type != MMU_INST_FETCH)) {
+((riscv_cpu_two_stage_lookup(mmu_idx) || two_stage_lookup) &&
+ access_type != MMU_INST_FETCH)) {
 /* Two stage

[PULL v2 19/38] memory: Add interface to set iommu page size mask

2020-11-03 Thread Michael S. Tsirkin

From: Bharat Bhushan 

Allow to set the page size mask supported by an iommu memory region.
This enables a vIOMMU to communicate the page size granule supported by
an assigned device, on hosts that use page sizes greater than 4kB.

Acked-by: Peter Xu 
Reviewed-by: Eric Auger 
Signed-off-by: Bharat Bhushan 
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20201030180510.747225-8-jean-phili...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/exec/memory.h | 38 ++
 softmmu/memory.c  | 13 +
 2 files changed, 51 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index aff6ef7605..0f3e6bcd5e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -397,6 +397,32 @@ struct IOMMUMemoryRegionClass {
  * @iommu: the IOMMUMemoryRegion
  */
 int (*num_indexes)(IOMMUMemoryRegion *iommu);
+
+/**
+ * @iommu_set_page_size_mask:
+ *
+ * Restrict the page size mask that can be supported with a given IOMMU
+ * memory region. Used for example to propagate host physical IOMMU page
+ * size mask limitations to the virtual IOMMU.
+ *
+ * Optional method: if this method is not provided, then the default global
+ * page mask is used.
+ *
+ * @iommu: the IOMMUMemoryRegion
+ *
+ * @page_size_mask: a bitmask of supported page sizes. At least one bit,
+ * representing the smallest page size, must be set. Additional set bits
+ * represent supported block sizes. For example a host physical IOMMU that
+ * uses page tables with a page size of 4kB, and supports 2MB and 4GB
+ * blocks, will set mask 0x40201000. A granule of 4kB with indiscriminate
+ * block sizes is specified with mask 0xf000.
+ *
+ * Returns 0 on success, or a negative error. In case of failure, the error
+ * object must be created.
+ */
+ int (*iommu_set_page_size_mask)(IOMMUMemoryRegion *iommu,
+ uint64_t page_size_mask,
+ Error **errp);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -1409,6 +1435,18 @@ int memory_region_iommu_attrs_to_index(IOMMUMemoryRegion 
*iommu_mr,
  */
 int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr);
 
+/**
+ * memory_region_iommu_set_page_size_mask: set the supported page
+ * sizes for a given IOMMU memory region
+ *
+ * @iommu_mr: IOMMU memory region
+ * @page_size_mask: supported page size mask
+ * @errp: pointer to Error*, to store an error if it happens.
+ */
+int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
+   uint64_t page_size_mask,
+   Error **errp);
+
 /**
  * memory_region_name: get a memory region's name
  *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 21d533d8ed..71951fe4dc 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1841,6 +1841,19 @@ static int 
memory_region_update_iommu_notify_flags(IOMMUMemoryRegion *iommu_mr,
 return ret;
 }
 
+int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
+   uint64_t page_size_mask,
+   Error **errp)
+{
+IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+int ret = 0;
+
+if (imrc->iommu_set_page_size_mask) {
+ret = imrc->iommu_set_page_size_mask(iommu_mr, page_size_mask, errp);
+}
+return ret;
+}
+
 int memory_region_register_iommu_notifier(MemoryRegion *mr,
   IOMMUNotifier *n, Error **errp)
 {
-- 
MST

[PULL v2 29/38] block/export: make vhost-user-blk config space little-endian

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

VIRTIO 1.0 devices have little-endian configuration space. The
vhost-user-blk-server.c code already uses little-endian for virtqueue
processing but not for the configuration space fields. Fix this so the
vhost-user-blk export works on big-endian hosts.

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-4-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 block/export/vhost-user-blk-server.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index 41f4933d6e..33cc0818b8 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -264,7 +264,6 @@ static uint64_t vu_blk_get_protocol_features(VuDev *dev)
 static int
 vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
 {
-/* TODO blkcfg must be little-endian for VIRTIO 1.0 */
 VuServer *server = container_of(vu_dev, VuServer, vu_dev);
 VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
 memcpy(config, >blkcfg, len);
@@ -343,18 +342,18 @@ vu_blk_initialize_config(BlockDriverState *bs,
  uint32_t blk_size,
  uint16_t num_queues)
 {
-config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
-config->blk_size = blk_size;
-config->size_max = 0;
-config->seg_max = 128 - 2;
-config->min_io_size = 1;
-config->opt_io_size = 1;
-config->num_queues = num_queues;
-config->max_discard_sectors = 32768;
-config->max_discard_seg = 1;
-config->discard_sector_alignment = config->blk_size >> 9;
-config->max_write_zeroes_sectors = 32768;
-config->max_write_zeroes_seg = 1;
+config->capacity = cpu_to_le64(bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
+config->blk_size = cpu_to_le32(blk_size);
+config->size_max = cpu_to_le32(0);
+config->seg_max = cpu_to_le32(128 - 2);
+config->min_io_size = cpu_to_le16(1);
+config->opt_io_size = cpu_to_le32(1);
+config->num_queues = cpu_to_le16(num_queues);
+config->max_discard_sectors = cpu_to_le32(32768);
+config->max_discard_seg = cpu_to_le32(1);
+config->discard_sector_alignment = cpu_to_le32(config->blk_size >> 9);
+config->max_write_zeroes_sectors = cpu_to_le32(32768);
+config->max_write_zeroes_seg = cpu_to_le32(1);
 }
 
 static void vu_blk_exp_request_shutdown(BlockExport *exp)
-- 
MST

[PULL v2 08/38] hw/acpi : Don't use '#' flag of printf format

2020-11-03 Thread Michael S. Tsirkin

From: Xinhao Zhang 

Fix code style. Don't use '#' flag of printf format ('%#') in
format strings, use '0x' prefix instead

Signed-off-by: Xinhao Zhang 
Signed-off-by: Kai Deng 
Message-Id: <20201103102634.273021-1-zhangxinh...@huawei.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/acpi/nvdimm.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 8f7cc16add..8ad5516142 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -556,7 +556,7 @@ static void nvdimm_dsm_func_read_fit(NVDIMMState *state, 
NvdimmDsmIn *in,
 
 fit = fit_buf->fit;
 
-nvdimm_debug("Read FIT: offset %#x FIT size %#x Dirty %s.\n",
+nvdimm_debug("Read FIT: offset 0x%x FIT size 0x%x Dirty %s.\n",
  read_fit->offset, fit->len, fit_buf->dirty ? "Yes" : "No");
 
 if (read_fit->offset > fit->len) {
@@ -664,7 +664,7 @@ static void nvdimm_dsm_label_size(NVDIMMDevice *nvdimm, 
hwaddr dsm_mem_addr)
 label_size = nvdimm->label_size;
 mxfer = nvdimm_get_max_xfer_label_size();
 
-nvdimm_debug("label_size %#x, max_xfer %#x.\n", label_size, mxfer);
+nvdimm_debug("label_size 0x%x, max_xfer 0x%x.\n", label_size, mxfer);
 
 label_size_out.func_ret_status = 
cpu_to_le32(NVDIMM_DSM_RET_STATUS_SUCCESS);
 label_size_out.label_size = cpu_to_le32(label_size);
@@ -680,19 +680,19 @@ static uint32_t nvdimm_rw_label_data_check(NVDIMMDevice 
*nvdimm,
 uint32_t ret = NVDIMM_DSM_RET_STATUS_INVALID;
 
 if (offset + length < offset) {
-nvdimm_debug("offset %#x + length %#x is overflow.\n", offset,
+nvdimm_debug("offset 0x%x + length 0x%x is overflow.\n", offset,
  length);
 return ret;
 }
 
 if (nvdimm->label_size < offset + length) {
-nvdimm_debug("position %#x is beyond label data (len = %" PRIx64 
").\n",
+nvdimm_debug("position 0x%x is beyond label data (len = %" PRIx64 
").\n",
  offset + length, nvdimm->label_size);
 return ret;
 }
 
 if (length > nvdimm_get_max_xfer_label_size()) {
-nvdimm_debug("length (%#x) is larger than max_xfer (%#x).\n",
+nvdimm_debug("length (0x%x) is larger than max_xfer (0x%x).\n",
  length, nvdimm_get_max_xfer_label_size());
 return ret;
 }
@@ -716,7 +716,7 @@ static void nvdimm_dsm_get_label_data(NVDIMMDevice *nvdimm, 
NvdimmDsmIn *in,
 get_label_data->offset = le32_to_cpu(get_label_data->offset);
 get_label_data->length = le32_to_cpu(get_label_data->length);
 
-nvdimm_debug("Read Label Data: offset %#x length %#x.\n",
+nvdimm_debug("Read Label Data: offset 0x%x length 0x%x.\n",
  get_label_data->offset, get_label_data->length);
 
 status = nvdimm_rw_label_data_check(nvdimm, get_label_data->offset,
@@ -755,7 +755,7 @@ static void nvdimm_dsm_set_label_data(NVDIMMDevice *nvdimm, 
NvdimmDsmIn *in,
 set_label_data->offset = le32_to_cpu(set_label_data->offset);
 set_label_data->length = le32_to_cpu(set_label_data->length);
 
-nvdimm_debug("Write Label Data: offset %#x length %#x.\n",
+nvdimm_debug("Write Label Data: offset 0x%x length 0x%x.\n",
  set_label_data->offset, set_label_data->length);
 
 status = nvdimm_rw_label_data_check(nvdimm, set_label_data->offset,
@@ -838,7 +838,7 @@ nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, 
unsigned size)
 NvdimmDsmIn *in;
 hwaddr dsm_mem_addr = val;
 
-nvdimm_debug("dsm memory address %#" HWADDR_PRIx ".\n", dsm_mem_addr);
+nvdimm_debug("dsm memory address 0x%" HWADDR_PRIx ".\n", dsm_mem_addr);
 
 /*
  * The DSM memory is mapped to guest address space so an evil guest
@@ -852,11 +852,11 @@ nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, 
unsigned size)
 in->function = le32_to_cpu(in->function);
 in->handle = le32_to_cpu(in->handle);
 
-nvdimm_debug("Revision %#x Handler %#x Function %#x.\n", in->revision,
+nvdimm_debug("Revision 0x%x Handler 0x%x Function 0x%x.\n", in->revision,
  in->handle, in->function);
 
 if (in->revision != 0x1 /* Currently we only support DSM Spec Rev1. */) {
-nvdimm_debug("Revision %#x is not supported, expect %#x.\n",
+nvdimm_debug("Revision 0x%x is not supported, expect 0x%x.\n",
  in->revision, 0x1);
 nvdimm_dsm_no_payload(NVDIMM_DSM_RET_STATUS_UNSUPPORT, dsm_mem_addr);
 goto exit;
-- 
MST

[PATCH v4 0/5] Fix the Hypervisor access functions

2020-11-03 Thread Alistair Francis

Richard pointed out that the Hypervisor access functions don't work
correctly, see:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg751540.html.
This seris fixes them up by adding a new MMU index for the virtualised
state.
v4:
 - Consolidate the inline helper to a helper function
 - Actually don't have a virtualised MMU index
v3:
 - Don't use an external helper
v2:
 - Use only 6 MMU modes instead of 8

Alistair Francis (5):
  target/riscv: Add a virtualised MMU Mode
  target/riscv: Set the virtualised MMU mode when doing hyp accesses
  target/riscv: Remove the HS_TWO_STAGE flag
  target/riscv: Remove the hyp load and store functions
  target/riscv: Split the Hypervisor execute load helpers

 target/riscv/cpu-param.h|  11 +-
 target/riscv/cpu.h  |  19 +++-
 target/riscv/cpu_bits.h |   1 -
 target/riscv/helper.h   |   5 +-
 target/riscv/cpu_helper.c   |  62 +-
 target/riscv/op_helper.c| 124 +---
 target/riscv/translate.c|   2 +
 target/riscv/insn_trans/trans_rvh.c.inc | 143 +---
 8 files changed, 112 insertions(+), 255 deletions(-)

-- 
2.28.0

[PULL v2 17/38] virtio-iommu: Add replay() memory region callback

2020-11-03 Thread Michael S. Tsirkin

From: Bharat Bhushan 

Implement the replay callback to setup all mappings for a new memory
region.

Signed-off-by: Bharat Bhushan 
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20201030180510.747225-6-jean-phili...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-iommu.c | 40 
 hw/virtio/trace-events   |  1 +
 2 files changed, 41 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 7b64892351..985257c88f 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -847,6 +847,45 @@ static gint int_cmp(gconstpointer a, gconstpointer b, 
gpointer user_data)
 return (ua > ub) - (ua < ub);
 }
 
+static gboolean virtio_iommu_remap(gpointer key, gpointer value, gpointer data)
+{
+VirtIOIOMMUMapping *mapping = (VirtIOIOMMUMapping *) value;
+VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
+IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
+
+trace_virtio_iommu_remap(mr->parent_obj.name, interval->low, 
interval->high,
+ mapping->phys_addr);
+virtio_iommu_notify_map(mr, interval->low, interval->high,
+mapping->phys_addr, mapping->flags);
+return false;
+}
+
+static void virtio_iommu_replay(IOMMUMemoryRegion *mr, IOMMUNotifier *n)
+{
+IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+VirtIOIOMMU *s = sdev->viommu;
+uint32_t sid;
+VirtIOIOMMUEndpoint *ep;
+
+sid = virtio_iommu_get_bdf(sdev);
+
+qemu_mutex_lock(>mutex);
+
+if (!s->endpoints) {
+goto unlock;
+}
+
+ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
+if (!ep || !ep->domain) {
+goto unlock;
+}
+
+g_tree_foreach(ep->domain->mappings, virtio_iommu_remap, mr);
+
+unlock:
+qemu_mutex_unlock(>mutex);
+}
+
 static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -1076,6 +1115,7 @@ static void 
virtio_iommu_memory_region_class_init(ObjectClass *klass,
 IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
 
 imrc->translate = virtio_iommu_translate;
+imrc->replay = virtio_iommu_replay;
 }
 
 static const TypeInfo virtio_iommu_info = {
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index b87a397406..ea3c3b25ad 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -108,6 +108,7 @@ virtio_iommu_report_fault(uint8_t reason, uint32_t flags, 
uint32_t endpoint, uin
 virtio_iommu_fill_resv_property(uint32_t devid, uint8_t subtype, uint64_t 
start, uint64_t end) "dev= %d, type=%d start=0x%"PRIx64" end=0x%"PRIx64
 virtio_iommu_notify_map(const char *name, uint64_t virt_start, uint64_t 
virt_end, uint64_t phys_start, uint32_t flags) "mr=%s virt_start=0x%"PRIx64" 
virt_end=0x%"PRIx64" phys_start=0x%"PRIx64" flags=%d"
 virtio_iommu_notify_unmap(const char *name, uint64_t virt_start, uint64_t 
virt_end) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
+virtio_iommu_remap(const char *name, uint64_t virt_start, uint64_t virt_end, 
uint64_t phys_start) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64" 
phys_start=0x%"PRIx64
 
 # virtio-mem.c
 virtio_mem_send_response(uint16_t type) "type=%" PRIu16
-- 
MST

[PULL v2 05/38] memory-device: Support big alignment requirements

2020-11-03 Thread Michael S. Tsirkin

From: David Hildenbrand 

Let's warn instead of bailing out - the worst thing that can happen is
that we'll fail hot/coldplug later. The user got warned, and this should
be rare.

This will be necessary for memory devices with rather big (user-defined)
alignment requirements - say a virtio-mem device with a 2G block size -
which will become important, for example, when supporting vfio in the
future.

Reviewed-by: Pankaj Gupta 
Cc: "Michael S. Tsirkin" 
Cc: Wei Yang 
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
Message-Id: <20201008083029.9504-5-da...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/mem/memory-device.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 4bc9cf0917..8a736f1a26 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -119,9 +119,10 @@ static uint64_t memory_device_get_free_addr(MachineState 
*ms,
 
 /* start of address space indicates the maximum alignment we expect */
 if (!QEMU_IS_ALIGNED(range_lob(), align)) {
-error_setg(errp, "the alignment (0x%" PRIx64 ") is not supported",
-   align);
-return 0;
+warn_report("the alignment (0x%" PRIx64 ") exceeds the expected"
+" maximum alignment, memory will get fragmented and not"
+" all 'maxmem' might be usable for memory devices.",
+align);
 }
 
 memory_device_check_addable(ms, size, );
@@ -151,7 +152,7 @@ static uint64_t memory_device_get_free_addr(MachineState 
*ms,
 return 0;
 }
 } else {
-if (range_init(, range_lob(), size)) {
+if (range_init(, QEMU_ALIGN_UP(range_lob(), align), size)) {
 error_setg(errp, "can't add memory device, device too big");
 return 0;
 }
-- 
MST

[PULL v2 18/38] virtio-iommu: Add notify_flag_changed() memory region callback

2020-11-03 Thread Michael S. Tsirkin

From: Bharat Bhushan 

Add notify_flag_changed() to notice when memory listeners are added and
removed.

Acked-by: Eric Auger 
Signed-off-by: Bharat Bhushan 
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20201030180510.747225-7-jean-phili...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-iommu.c | 14 ++
 hw/virtio/trace-events   |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 985257c88f..78e07aa40a 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -886,6 +886,19 @@ unlock:
 qemu_mutex_unlock(>mutex);
 }
 
+static int virtio_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu_mr,
+IOMMUNotifierFlag old,
+IOMMUNotifierFlag new,
+Error **errp)
+{
+if (old == IOMMU_NOTIFIER_NONE) {
+trace_virtio_iommu_notify_flag_add(iommu_mr->parent_obj.name);
+} else if (new == IOMMU_NOTIFIER_NONE) {
+trace_virtio_iommu_notify_flag_del(iommu_mr->parent_obj.name);
+}
+return 0;
+}
+
 static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -1116,6 +1129,7 @@ static void 
virtio_iommu_memory_region_class_init(ObjectClass *klass,
 
 imrc->translate = virtio_iommu_translate;
 imrc->replay = virtio_iommu_replay;
+imrc->notify_flag_changed = virtio_iommu_notify_flag_changed;
 }
 
 static const TypeInfo virtio_iommu_info = {
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index ea3c3b25ad..982d0002a6 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -109,6 +109,8 @@ virtio_iommu_fill_resv_property(uint32_t devid, uint8_t 
subtype, uint64_t start,
 virtio_iommu_notify_map(const char *name, uint64_t virt_start, uint64_t 
virt_end, uint64_t phys_start, uint32_t flags) "mr=%s virt_start=0x%"PRIx64" 
virt_end=0x%"PRIx64" phys_start=0x%"PRIx64" flags=%d"
 virtio_iommu_notify_unmap(const char *name, uint64_t virt_start, uint64_t 
virt_end) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
 virtio_iommu_remap(const char *name, uint64_t virt_start, uint64_t virt_end, 
uint64_t phys_start) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64" 
phys_start=0x%"PRIx64
+virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
+virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
 
 # virtio-mem.c
 virtio_mem_send_response(uint16_t type) "type=%" PRIu16
-- 
MST

[PULL v2 27/38] libvhost-user: follow QEMU comment style

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-2-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 contrib/libvhost-user/libvhost-user.h | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.h 
b/contrib/libvhost-user/libvhost-user.h
index 3bbeae8587..a1539dbb69 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -392,7 +392,8 @@ struct VuDev {
 bool broken;
 uint16_t max_queues;
 
-/* @read_msg: custom method to read vhost-user message
+/*
+ * @read_msg: custom method to read vhost-user message
  *
  * Read data from vhost_user socket fd and fill up
  * the passed VhostUserMsg *vmsg struct.
@@ -409,15 +410,19 @@ struct VuDev {
  *
  */
 vu_read_msg_cb read_msg;
-/* @set_watch: add or update the given fd to the watch set,
- * call cb when condition is met */
+
+/*
+ * @set_watch: add or update the given fd to the watch set,
+ * call cb when condition is met.
+ */
 vu_set_watch_cb set_watch;
 
 /* @remove_watch: remove the given fd from the watch set */
 vu_remove_watch_cb remove_watch;
 
-/* @panic: encountered an unrecoverable error, you may try to
- * re-initialize */
+/*
+ * @panic: encountered an unrecoverable error, you may try to re-initialize
+ */
 vu_panic_cb panic;
 const VuDevIface *iface;
 
-- 
MST

[PULL v2 04/38] virtio-mem: Probe THP size to determine default block size

2020-11-03 Thread Michael S. Tsirkin

From: David Hildenbrand 

Let's allow a minimum block size of 1 MiB in all configurations. Select
the default block size based on
- The page size of the memory backend.
- The THP size if the memory backend size corresponds to the real host
  page size.
- The global minimum of 1 MiB.
and warn if something smaller is configured by the user.

VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
THP size unconditionally.

For now we only support virtio-mem on x86-64 - there isn't a user-visible
change (x86-64 only supports 2 MiB THP on the PMD level) - the default
was, and will be 2 MiB.

If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
expect it to be more transparent - e.g., to only optimize fully populated
ranges unless explicitly told /configured otherwise (in contrast to PMD
THP).

Reviewed-by: Pankaj Gupta 
Cc: "Michael S. Tsirkin" 
Cc: Wei Yang 
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
Message-Id: <20201008083029.9504-4-da...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-mem.c | 105 +++--
 1 file changed, 101 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 461ac68ee8..655824ff81 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -33,10 +33,83 @@
 #include "trace.h"
 
 /*
- * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
- * memory (e.g., 2MB on x86_64).
+ * Let's not allow blocks smaller than 1 MiB, for example, to keep the tracking
+ * bitmap small.
  */
-#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)QEMU_VMALLOC_ALIGN)
+#define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
+
+#if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
+defined(__powerpc64__)
+#define VIRTIO_MEM_DEFAULT_THP_SIZE ((uint32_t)(2 * MiB))
+#else
+/* fallback to 1 MiB (e.g., the THP size on s390x) */
+#define VIRTIO_MEM_DEFAULT_THP_SIZE VIRTIO_MEM_MIN_BLOCK_SIZE
+#endif
+
+/*
+ * We want to have a reasonable default block size such that
+ * 1. We avoid splitting THPs when unplugging memory, which degrades
+ *performance.
+ * 2. We avoid placing THPs for plugged blocks that also cover unplugged
+ *blocks.
+ *
+ * The actual THP size might differ between Linux kernels, so we try to probe
+ * it. In the future (if we ever run into issues regarding 2.), we might want
+ * to disable THP in case we fail to properly probe the THP size, or if the
+ * block size is configured smaller than the THP size.
+ */
+static uint32_t thp_size;
+
+#define HPAGE_PMD_SIZE_PATH 
"/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
+static uint32_t virtio_mem_thp_size(void)
+{
+gchar *content = NULL;
+const char *endptr;
+uint64_t tmp;
+
+if (thp_size) {
+return thp_size;
+}
+
+/*
+ * Try to probe the actual THP size, fallback to (sane but eventually
+ * incorrect) default sizes.
+ */
+if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, , NULL, NULL) &&
+!qemu_strtou64(content, , 0, ) &&
+(!endptr || *endptr == '\n')) {
+/*
+ * Sanity-check the value, if it's too big (e.g., aarch64 with 64k base
+ * pages) or weird, fallback to something smaller.
+ */
+if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
+warn_report("Read unsupported THP size: %" PRIx64, tmp);
+} else {
+thp_size = tmp;
+}
+}
+
+if (!thp_size) {
+thp_size = VIRTIO_MEM_DEFAULT_THP_SIZE;
+warn_report("Could not detect THP size, falling back to %" PRIx64
+"  MiB.", thp_size / MiB);
+}
+
+g_free(content);
+return thp_size;
+}
+
+static uint64_t virtio_mem_default_block_size(RAMBlock *rb)
+{
+const uint64_t page_size = qemu_ram_pagesize(rb);
+
+/* We can have hugetlbfs with a page size smaller than the THP size. */
+if (page_size == qemu_real_host_page_size) {
+return MAX(page_size, virtio_mem_thp_size());
+}
+return MAX(page_size, VIRTIO_MEM_MIN_BLOCK_SIZE);
+}
+
 /*
  * Size the usable region bigger than the requested size if possible. Esp.
  * Linux guests will only add (aligned) memory blocks in case they fully
@@ -443,10 +516,23 @@ static void virtio_mem_device_realize(DeviceState *dev, 
Error **errp)
 rb = vmem->memdev->mr.ram_block;
 page_size = qemu_ram_pagesize(rb);
 
+/*
+ * If the block size wasn't configured by the user, use a sane default. 
This
+ * allows using hugetlbfs backends of any page size without manual
+ * intervention.
+ */
+if (!vmem->block_size) {
+vmem->block_size = virtio_mem_default_block_size(rb);
+}
+
 if (vmem->block_size < page_size) {
 error_setg(errp, "'%s' property has to be at least the page size (0x%"
PRIx64 ")", VIRTIO_MEM_BLOCK_SIZE_PROP,

[PULL v2 15/38] virtio-iommu: Add memory notifiers for map/unmap

2020-11-03 Thread Michael S. Tsirkin

From: Bharat Bhushan 

Extend VIRTIO_IOMMU_T_MAP/UNMAP request to notify memory listeners. It
will call VFIO notifier to map/unmap regions in the physical IOMMU.

Signed-off-by: Bharat Bhushan 
Signed-off-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20201030180510.747225-4-jean-phili...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-iommu.c | 56 
 hw/virtio/trace-events   |  2 ++
 2 files changed, 58 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index a5c2d69aad..7dd15c5eac 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -125,6 +125,51 @@ static gint interval_cmp(gconstpointer a, gconstpointer b, 
gpointer user_data)
 }
 }
 
+static void virtio_iommu_notify_map(IOMMUMemoryRegion *mr, hwaddr virt_start,
+hwaddr virt_end, hwaddr paddr,
+uint32_t flags)
+{
+IOMMUTLBEntry entry;
+IOMMUAccessFlags perm = IOMMU_ACCESS_FLAG(flags & VIRTIO_IOMMU_MAP_F_READ,
+  flags & 
VIRTIO_IOMMU_MAP_F_WRITE);
+
+if (!(mr->iommu_notify_flags & IOMMU_NOTIFIER_MAP) ||
+(flags & VIRTIO_IOMMU_MAP_F_MMIO) || !perm) {
+return;
+}
+
+trace_virtio_iommu_notify_map(mr->parent_obj.name, virt_start, virt_end,
+  paddr, perm);
+
+entry.target_as = _space_memory;
+entry.addr_mask = virt_end - virt_start;
+entry.iova = virt_start;
+entry.perm = perm;
+entry.translated_addr = paddr;
+
+memory_region_notify_iommu(mr, 0, entry);
+}
+
+static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr virt_start,
+  hwaddr virt_end)
+{
+IOMMUTLBEntry entry;
+
+if (!(mr->iommu_notify_flags & IOMMU_NOTIFIER_UNMAP)) {
+return;
+}
+
+trace_virtio_iommu_notify_unmap(mr->parent_obj.name, virt_start, virt_end);
+
+entry.target_as = _space_memory;
+entry.addr_mask = virt_end - virt_start;
+entry.iova = virt_start;
+entry.perm = IOMMU_NONE;
+entry.translated_addr = 0;
+
+memory_region_notify_iommu(mr, 0, entry);
+}
+
 static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
 {
 if (!ep->domain) {
@@ -315,6 +360,7 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
 VirtIOIOMMUDomain *domain;
 VirtIOIOMMUInterval *interval;
 VirtIOIOMMUMapping *mapping;
+VirtIOIOMMUEndpoint *ep;
 
 if (flags & ~VIRTIO_IOMMU_MAP_F_MASK) {
 return VIRTIO_IOMMU_S_INVAL;
@@ -344,6 +390,11 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
 
 g_tree_insert(domain->mappings, interval, mapping);
 
+QLIST_FOREACH(ep, >endpoint_list, next) {
+virtio_iommu_notify_map(ep->iommu_mr, virt_start, virt_end, phys_start,
+flags);
+}
+
 return VIRTIO_IOMMU_S_OK;
 }
 
@@ -356,6 +407,7 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
 VirtIOIOMMUMapping *iter_val;
 VirtIOIOMMUInterval interval, *iter_key;
 VirtIOIOMMUDomain *domain;
+VirtIOIOMMUEndpoint *ep;
 int ret = VIRTIO_IOMMU_S_OK;
 
 trace_virtio_iommu_unmap(domain_id, virt_start, virt_end);
@@ -373,6 +425,10 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
 uint64_t current_high = iter_key->high;
 
 if (interval.low <= current_low && interval.high >= current_high) {
+QLIST_FOREACH(ep, >endpoint_list, next) {
+virtio_iommu_notify_unmap(ep->iommu_mr, current_low,
+  current_high);
+}
 g_tree_remove(domain->mappings, iter_key);
 trace_virtio_iommu_unmap_done(domain_id, current_low, 
current_high);
 } else {
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index cf1e59de30..b87a397406 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -106,6 +106,8 @@ virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
 virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t 
sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
 virtio_iommu_report_fault(uint8_t reason, uint32_t flags, uint32_t endpoint, 
uint64_t addr) "FAULT reason=%d flags=%d endpoint=%d address =0x%"PRIx64
 virtio_iommu_fill_resv_property(uint32_t devid, uint8_t subtype, uint64_t 
start, uint64_t end) "dev= %d, type=%d start=0x%"PRIx64" end=0x%"PRIx64
+virtio_iommu_notify_map(const char *name, uint64_t virt_start, uint64_t 
virt_end, uint64_t phys_start, uint32_t flags) "mr=%s virt_start=0x%"PRIx64" 
virt_end=0x%"PRIx64" phys_start=0x%"PRIx64" flags=%d"
+virtio_iommu_notify_unmap(const char *name, uint64_t virt_start, uint64_t 
virt_end) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
 
 # virtio-mem.c
 virtio_mem_send_response(uint16_t type) "type=%" PRIu16
-- 
MST

[PULL v2 24/38] net: Add vhost-vdpa in show_netdevs()

2020-11-03 Thread Michael S. Tsirkin

From: Cindy Lu 

Fix the bug that while Check qemu supported netdev,
there is no vhost-vdpa

Signed-off-by: Cindy Lu 
Message-Id: <20201016030909.9522-2-l...@redhat.com>
Acked-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 net/net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/net.c b/net/net.c
index 7a2a0fb5ac..794c652282 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1049,6 +1049,9 @@ static void show_netdevs(void)
 #endif
 #ifdef CONFIG_POSIX
 "vhost-user",
+#endif
+#ifdef CONFIG_VHOST_VDPA
+"vhost-vdpa",
 #endif
 };
 
-- 
MST

[PULL v2 38/38] vhost-user-blk-test: fix races by using fd passing

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

Pass the QMP and vhost-user-blk server sockets as file descriptors. That
way the sockets are already open and in a listen state when the QEMU
process is launched.

This solves the race with qemu-storage-daemon startup where the UNIX
domain sockets may not be ready yet when QEMU attempts to connect. It
also saves us sleeping for 1 second if the qemu-storage-daemon QMP
socket is not ready yet.

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-13-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/vhost-user-blk-test.c | 42 +++
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/tests/qtest/vhost-user-blk-test.c 
b/tests/qtest/vhost-user-blk-test.c
index 0d056cc189..9589f90b14 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -683,8 +683,22 @@ static char *drive_create(void)
 return t_path;
 }
 
-static char sock_path_tempate[] = "/tmp/qtest.vhost_user_blk.XX";
-static char qmp_sock_path_tempate[] = "/tmp/qtest.vhost_user_blk.qmp.XX";
+static char *create_listen_socket(int *fd)
+{
+int tmp_fd;
+char *path;
+
+/* No race because our pid makes the path unique */
+path = g_strdup_printf("/tmp/qtest-%d-sock.XX", getpid());
+tmp_fd = mkstemp(path);
+g_assert_cmpint(tmp_fd, >=, 0);
+close(tmp_fd);
+unlink(path);
+
+*fd = qtest_socket_server(path);
+g_test_queue_destroy(destroy_file, path);
+return path;
+}
 
 static void quit_storage_daemon(void *qmp_test_state)
 {
@@ -709,37 +723,33 @@ static void start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
  int num_queues)
 {
 const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
-int fd, qmp_fd, i;
+int qmp_fd, i;
 QTestState *qmp_test_state;
 gchar *img_path;
-char *sock_path = NULL;
-char *qmp_sock_path = g_strdup(qmp_sock_path_tempate);
+char *qmp_sock_path;
 GString *storage_daemon_command = g_string_new(NULL);
 
-qmp_fd = mkstemp(qmp_sock_path);
-g_assert_cmpint(qmp_fd, >=, 0);
-g_test_queue_destroy(destroy_file, qmp_sock_path);
+qmp_sock_path = create_listen_socket(_fd);
 
 g_string_append_printf(storage_daemon_command,
 "exec %s "
-"--chardev socket,id=qmp,path=%s,server,nowait --monitor 
chardev=qmp ",
-vhost_user_blk_bin, qmp_sock_path);
+"--chardev socket,id=qmp,fd=%d,server,nowait --monitor chardev=qmp 
",
+vhost_user_blk_bin, qmp_fd);
 
 g_string_append_printf(cmd_line,
 " -object memory-backend-memfd,id=mem,size=256M,share=on -M 
memory-backend=mem ");
 
 for (i = 0; i < vus_instances; i++) {
-sock_path = g_strdup(sock_path_tempate);
-fd = mkstemp(sock_path);
-g_assert_cmpint(fd, >=, 0);
-g_test_queue_destroy(drive_file, sock_path);
+int fd;
+char *sock_path = create_listen_socket();
+
 /* create image file */
 img_path = drive_create();
 g_string_append_printf(storage_daemon_command,
 "--blockdev driver=file,node-name=disk%d,filename=%s "
-"--export 
type=vhost-user-blk,id=disk%d,addr.type=unix,addr.path=%s,"
+"--export type=vhost-user-blk,id=disk%d,addr.type=fd,addr.str=%d,"
 "node-name=disk%i,writable=on,num-queues=%d ",
-i, img_path, i, sock_path, i, num_queues);
+i, img_path, i, fd, i, num_queues);
 
 g_string_append_printf(cmd_line, "-chardev socket,id=char%d,path=%s ",
i + 1, sock_path);
-- 
MST

[PULL v2 13/38] virtio-iommu: Fix virtio_iommu_mr()

2020-11-03 Thread Michael S. Tsirkin

From: Jean-Philippe Brucker 

Due to an invalid mask, virtio_iommu_mr() may return the wrong memory
region. It hasn't been too problematic so far because the function was
only used to test existence of an endpoint, but that is about to change.

Fixes: cfb42188b24d ("virtio-iommu: Implement attach/detach command")
Cc: QEMU Stable 
Acked-by: Eric Auger 
Reviewed-by: Peter Xu 
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20201030180510.747225-2-jean-phili...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 21ec63b108..4c8f3909b7 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -101,7 +101,7 @@ static IOMMUMemoryRegion *virtio_iommu_mr(VirtIOIOMMU *s, 
uint32_t sid)
 bus_n = PCI_BUS_NUM(sid);
 iommu_pci_bus = iommu_find_iommu_pcibus(s, bus_n);
 if (iommu_pci_bus) {
-devfn = sid & PCI_DEVFN_MAX;
+devfn = sid & (PCI_DEVFN_MAX - 1);
 dev = iommu_pci_bus->pbdev[devfn];
 if (dev) {
 return >iommu_mr;
-- 
MST

[PULL v2 03/38] virtio-mem: Make sure "usable_region_size" is always multiples of the block size

2020-11-03 Thread Michael S. Tsirkin

From: David Hildenbrand 

The spec states:
  "The device MUST set addr, region_size, usable_region_size, plugged_size,
   requested_size to multiples of block_size."

With block sizes > 256MB, we currently wouldn't guarantee that for the
usable_region_size.

Note that we cannot exceed the region_size, as we already enforce the
alignment there properly.

Fixes: 910b25766b33 ("virtio-mem: Paravirtualized memory hot(un)plug")
Cc: "Michael S. Tsirkin" 
Cc: Wei Yang 
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
Message-Id: <20201008083029.9504-3-da...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-mem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 70200b4eac..461ac68ee8 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -227,6 +227,9 @@ static void virtio_mem_resize_usable_region(VirtIOMEM *vmem,
 uint64_t newsize = MIN(memory_region_size(>memdev->mr),
requested_size + VIRTIO_MEM_USABLE_EXTENT);
 
+/* The usable region size always has to be multiples of the block size. */
+newsize = QEMU_ALIGN_UP(newsize, vmem->block_size);
+
 if (!requested_size) {
 newsize = 0;
 }
-- 
MST

[PULL v2 09/38] hw/acpi : add space before the open parenthesis '('

2020-11-03 Thread Michael S. Tsirkin

From: Xinhao Zhang 

Fix code style. Space required before the open parenthesis '('.

Signed-off-by: Xinhao Zhang 
Signed-off-by: Kai Deng 
Message-Id: <20201103102634.273021-2-zhangxinh...@huawei.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/acpi/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index ade9158cbf..2c0c83221f 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -558,7 +558,7 @@ static void acpi_pm1_cnt_write(ACPIREGS *ar, uint16_t val)
 if (val & ACPI_BITMASK_SLEEP_ENABLE) {
 /* change suspend type */
 uint16_t sus_typ = (val >> 10) & 7;
-switch(sus_typ) {
+switch (sus_typ) {
 case 0: /* soft power off */
 qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
 break;
-- 
MST

[PULL v2 34/38] libqtest: add qtest_socket_server()

2020-11-03 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

There is a qtest_socket_client() API. Add an equivalent
qtest_socket_server() API that returns a new UNIX domain socket in the
listen state. The code for this was already there but only used
internally in init_socket().

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201027173528.213464-9-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/libqos/libqtest.h |  8 +++
 tests/qtest/libqtest.c| 40 ---
 2 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/tests/qtest/libqos/libqtest.h b/tests/qtest/libqos/libqtest.h
index 241b5f89fb..699be8c2a2 100644
--- a/tests/qtest/libqos/libqtest.h
+++ b/tests/qtest/libqos/libqtest.h
@@ -132,6 +132,14 @@ void qtest_qmp_send(QTestState *s, const char *fmt, ...)
 void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
 GCC_FMT_ATTR(2, 3);
 
+/**
+ * qtest_socket_server:
+ * @socket_path: the UNIX domain socket path
+ *
+ * Create and return a listen socket file descriptor, or abort on failure.
+ */
+int qtest_socket_server(const char *socket_path);
+
 /**
  * qtest_socket_client:
  * @server_socket_path: the socket server's path
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index ab34075f2b..d652ffc90d 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -82,24 +82,8 @@ static void qtest_client_set_rx_handler(QTestState *s, 
QTestRecvFn recv);
 
 static int init_socket(const char *socket_path)
 {
-struct sockaddr_un addr;
-int sock;
-int ret;
-
-sock = socket(PF_UNIX, SOCK_STREAM, 0);
-g_assert_cmpint(sock, !=, -1);
-
-addr.sun_family = AF_UNIX;
-snprintf(addr.sun_path, sizeof(addr.sun_path), "%s", socket_path);
+int sock = qtest_socket_server(socket_path);
 qemu_set_cloexec(sock);
-
-do {
-ret = bind(sock, (struct sockaddr *), sizeof(addr));
-} while (ret == -1 && errno == EINTR);
-g_assert_cmpint(ret, !=, -1);
-ret = listen(sock, 1);
-g_assert_cmpint(ret, !=, -1);
-
 return sock;
 }
 
@@ -638,6 +622,28 @@ QTestState *qtest_create_state_with_qmp_fd(int fd)
 return qmp_test_state;
 }
 
+int qtest_socket_server(const char *socket_path)
+{
+struct sockaddr_un addr;
+int sock;
+int ret;
+
+sock = socket(PF_UNIX, SOCK_STREAM, 0);
+g_assert_cmpint(sock, !=, -1);
+
+addr.sun_family = AF_UNIX;
+snprintf(addr.sun_path, sizeof(addr.sun_path), "%s", socket_path);
+
+do {
+ret = bind(sock, (struct sockaddr *), sizeof(addr));
+} while (ret == -1 && errno == EINTR);
+g_assert_cmpint(ret, !=, -1);
+ret = listen(sock, 1);
+g_assert_cmpint(ret, !=, -1);
+
+return sock;
+}
+
 int qtest_socket_client(char *server_socket_path)
 {
 struct sockaddr_un serv_addr;
-- 
MST

[PULL v2 06/38] memory-device: Add get_min_alignment() callback

2020-11-03 Thread Michael S. Tsirkin

From: David Hildenbrand 

Add a callback that can be used to express additional alignment
requirements (exceeding the ones from the memory region).

Will be used by virtio-mem to express special alignment requirements due
to manually configured, big block sizes (e.g., 1GB with an ordinary
memory-backend-ram). This avoids failing later when realizing, because
auto-detection wasn't able to assign a properly aligned address.

Reviewed-by: Pankaj Gupta 
Cc: "Michael S. Tsirkin" 
Cc: Wei Yang 
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
Message-Id: <20201008083029.9504-6-da...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/mem/memory-device.h | 10 ++
 hw/mem/memory-device.c | 11 +--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
index 30d7e99f52..48d2611fc5 100644
--- a/include/hw/mem/memory-device.h
+++ b/include/hw/mem/memory-device.h
@@ -88,6 +88,16 @@ struct MemoryDeviceClass {
  */
 MemoryRegion *(*get_memory_region)(MemoryDeviceState *md, Error **errp);
 
+/*
+ * Optional: Return the desired minimum alignment of the device in guest
+ * physical address space. The final alignment is computed based on this
+ * alignment and the alignment requirements of the memory region.
+ *
+ * Called when plugging the memory device to detect the required alignment
+ * during address assignment.
+ */
+uint64_t (*get_min_alignment)(const MemoryDeviceState *md);
+
 /*
  * Translate the memory device into #MemoryDeviceInfo.
  */
diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 8a736f1a26..cf0627fd01 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -259,7 +259,7 @@ void memory_device_pre_plug(MemoryDeviceState *md, 
MachineState *ms,
 {
 const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
 Error *local_err = NULL;
-uint64_t addr, align;
+uint64_t addr, align = 0;
 MemoryRegion *mr;
 
 mr = mdc->get_memory_region(md, _err);
@@ -267,7 +267,14 @@ void memory_device_pre_plug(MemoryDeviceState *md, 
MachineState *ms,
 goto out;
 }
 
-align = legacy_align ? *legacy_align : memory_region_get_alignment(mr);
+if (legacy_align) {
+align = *legacy_align;
+} else {
+if (mdc->get_min_alignment) {
+align = mdc->get_min_alignment(md);
+}
+align = MAX(align, memory_region_get_alignment(mr));
+}
 addr = mdc->get_addr(md);
 addr = memory_device_get_free_addr(ms, !addr ? NULL : , align,
memory_region_size(mr), _err);
-- 
MST

[PULL v2 23/38] vhost-vdpa: Add qemu_close in vhost_vdpa_cleanup

2020-11-03 Thread Michael S. Tsirkin

From: Cindy Lu 

fix the bug that fd will still open after the cleanup

Signed-off-by: Cindy Lu 
Message-Id: <20201016030909.9522-1-l...@redhat.com>
Acked-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 net/vhost-vdpa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 99c476db8c..fe659ec9e2 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -145,6 +145,10 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
 g_free(s->vhost_net);
 s->vhost_net = NULL;
 }
+ if (s->vhost_vdpa.device_fd >= 0) {
+qemu_close(s->vhost_vdpa.device_fd);
+s->vhost_vdpa.device_fd = -1;
+}
 }
 
 static bool vhost_vdpa_has_vnet_hdr(NetClientState *nc)
-- 
MST

[PULL v2 02/38] virtio-mem: Make sure "addr" is always multiples of the block size

2020-11-03 Thread Michael S. Tsirkin

From: David Hildenbrand 

The spec states:
  "The device MUST set addr, region_size, usable_region_size, plugged_size,
   requested_size to multiples of block_size."

In some cases, we currently don't guarantee that for "addr": For example,
when starting a VM with 4 GiB boot memory and a virtio-mem device with a
block size of 2 GiB, "memaddr"/"addr" will be auto-assigned to
0x14000 (5 GiB).

We'll try to improve auto-assignment for memory devices next, to avoid
bailing out in case memory device code selects a bad address.

Note: The Linux driver doesn't support such big block sizes yet.

Reviewed-by: Pankaj Gupta 
Fixes: 910b25766b33 ("virtio-mem: Paravirtualized memory hot(un)plug")
Cc: "Michael S. Tsirkin" 
Cc: Wei Yang 
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
Message-Id: <20201008083029.9504-2-da...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-mem.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 7c8ca9f28b..70200b4eac 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -449,6 +449,11 @@ static void virtio_mem_device_realize(DeviceState *dev, 
Error **errp)
")", VIRTIO_MEM_REQUESTED_SIZE_PROP,
VIRTIO_MEM_BLOCK_SIZE_PROP, vmem->block_size);
 return;
+} else if (!QEMU_IS_ALIGNED(vmem->addr, vmem->block_size)) {
+error_setg(errp, "'%s' property has to be multiples of '%s' (0x%" 
PRIx64
+   ")", VIRTIO_MEM_ADDR_PROP, VIRTIO_MEM_BLOCK_SIZE_PROP,
+   vmem->block_size);
+return;
 } else if (!QEMU_IS_ALIGNED(memory_region_size(>memdev->mr),
 vmem->block_size)) {
 error_setg(errp, "'%s' property memdev size has to be multiples of"
-- 
MST

[PULL v2 22/38] vfio: Don't issue full 2^64 unmap

2020-11-03 Thread Michael S. Tsirkin

From: Jean-Philippe Brucker 

IOMMUs may declare memory regions spanning from 0 to UINT64_MAX. When
attempting to deal with such region, vfio_listener_region_del() passes a
size of 2^64 to int128_get64() which throws an assertion failure.  Even
ignoring this, the VFIO_IOMMU_DMA_MAP ioctl cannot handle this size
since the size field is 64-bit. Split the request in two.

Acked-by: Alex Williamson 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20201030180510.747225-11-jean-phili...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/vfio/common.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 35895b18a6..c1fdbf17f2 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -950,6 +950,17 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 }
 
 if (try_unmap) {
+if (int128_eq(llsize, int128_2_64())) {
+/* The unmap ioctl doesn't accept a full 64-bit span. */
+llsize = int128_rshift(llsize, 1);
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
+if (ret) {
+error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iova, int128_get64(llsize), ret);
+}
+iova += int128_get64(llsize);
+}
 ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
 if (ret) {
 error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-- 
MST

[PULL v2 07/38] virito-mem: Implement get_min_alignment()

2020-11-03 Thread Michael S. Tsirkin

From: David Hildenbrand 

The block size determines the alignment requirements. Implement
get_min_alignment() of the TYPE_MEMORY_DEVICE interface.

This allows auto-assignment of a properly aligned address in guest
physical address space. For example, when specifying a 2GB block size
for a virtio-mem device with 10GB with a memory setup "-m 4G, 20G",
we'll no longer fail when realizing.

Reviewed-by: Pankaj Gupta 
Cc: "Michael S. Tsirkin" 
Cc: Wei Yang 
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
Message-Id: <20201008083029.9504-7-da...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-mem-pci.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
index 913f4a3326..fa5395cd88 100644
--- a/hw/virtio/virtio-mem-pci.c
+++ b/hw/virtio/virtio-mem-pci.c
@@ -76,6 +76,12 @@ static void virtio_mem_pci_fill_device_info(const 
MemoryDeviceState *md,
 info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM;
 }
 
+static uint64_t virtio_mem_pci_get_min_alignment(const MemoryDeviceState *md)
+{
+return object_property_get_uint(OBJECT(md), VIRTIO_MEM_BLOCK_SIZE_PROP,
+_abort);
+}
+
 static void virtio_mem_pci_size_change_notify(Notifier *notifier, void *data)
 {
 VirtIOMEMPCI *pci_mem = container_of(notifier, VirtIOMEMPCI,
@@ -110,6 +116,7 @@ static void virtio_mem_pci_class_init(ObjectClass *klass, 
void *data)
 mdc->get_plugged_size = virtio_mem_pci_get_plugged_size;
 mdc->get_memory_region = virtio_mem_pci_get_memory_region;
 mdc->fill_device_info = virtio_mem_pci_fill_device_info;
+mdc->get_min_alignment = virtio_mem_pci_get_min_alignment;
 }
 
 static void virtio_mem_pci_instance_init(Object *obj)
-- 
MST

[PULL v2 01/38] pc: comment style fixup

2020-11-03 Thread Michael S. Tsirkin

Fix up checkpatch comment style warnings.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Chen Qun 
---
 hw/i386/pc.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5e6c0023e0..17b514d1da 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1149,10 +1149,11 @@ void pc_basic_device_init(struct PCMachineState *pcms,
 error_report("couldn't create HPET device");
 exit(1);
 }
-/* For pc-piix-*, hpet's intcap is always IRQ2. For pc-q35-1.7
-* and earlier, use IRQ2 for compat. Otherwise, use IRQ16~23,
-* IRQ8 and IRQ2.
-*/
+/*
+ * For pc-piix-*, hpet's intcap is always IRQ2. For pc-q35-1.7 and
+ * earlier, use IRQ2 for compat. Otherwise, use IRQ16~23, IRQ8 and
+ * IRQ2.
+ */
 uint8_t compat = object_property_get_uint(OBJECT(hpet),
 HPET_INTCAP, NULL);
 if (!compat) {
-- 
MST

[PULL v2 00/38] pc,pci,vhost,virtio: fixes

2020-11-03 Thread Michael S. Tsirkin

Sending v2 since v1 was borken on 32 bit.

The following changes since commit c7a7a877b716cf14848f1fd5c754d293e2f8d852:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201102' 
into staging (2020-11-03 10:38:05 +)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to 5676edd6d984b0696d206faa230c8da15e271044:

  vhost-user-blk-test: fix races by using fd passing (2020-11-03 16:39:06 -0500)


pc,pci,vhost,virtio: fixes

Lots of fixes all over the place.
virtio-mem and virtio-iommu patches are kind of fixes but
it seems better to just make them behave sanely than
try to educate users about the limitations ...

Signed-off-by: Michael S. Tsirkin 


Bharat Bhushan (7):
  virtio-iommu: Add memory notifiers for map/unmap
  virtio-iommu: Call memory notifiers in attach/detach
  virtio-iommu: Add replay() memory region callback
  virtio-iommu: Add notify_flag_changed() memory region callback
  memory: Add interface to set iommu page size mask
  vfio: Set IOMMU page size as per host supported page size
  virtio-iommu: Set supported page size mask

Cindy Lu (2):
  vhost-vdpa: Add qemu_close in vhost_vdpa_cleanup
  net: Add vhost-vdpa in show_netdevs()

Coiby Xu (1):
  test: new qTest case to test the vhost-user-blk-server

David Hildenbrand (6):
  virtio-mem: Make sure "addr" is always multiples of the block size
  virtio-mem: Make sure "usable_region_size" is always multiples of the 
block size
  virtio-mem: Probe THP size to determine default block size
  memory-device: Support big alignment requirements
  memory-device: Add get_min_alignment() callback
  virito-mem: Implement get_min_alignment()

Jean-Philippe Brucker (3):
  virtio-iommu: Fix virtio_iommu_mr()
  virtio-iommu: Store memory region in endpoint struct
  vfio: Don't issue full 2^64 unmap

Jin Yu (1):
  vhost-blk: set features before setting inflight feature

Michael S. Tsirkin (1):
  pc: comment style fixup

Philippe Mathieu-Daudé (2):
  hw/virtio/vhost-backend: Fix Coverity CID 1432871
  hw/smbios: Fix leaked fd in save_opt_one() error path

Stefan Hajnoczi (12):
  Revert "vhost-blk: set features before setting inflight feature"
  libvhost-user: follow QEMU comment style
  configure: introduce --enable-vhost-user-blk-server
  block/export: make vhost-user-blk config space little-endian
  block/export: fix vhost-user-blk get_config() information leak
  contrib/vhost-user-blk: fix get_config() information leak
  tests/qtest: add multi-queue test case to vhost-user-blk-test
  libqtest: add qtest_socket_server()
  vhost-user-blk-test: rename destroy_drive() to destroy_file()
  vhost-user-blk-test: close fork child file descriptors
  vhost-user-blk-test: drop unused return value
  vhost-user-blk-test: fix races by using fd passing

Xinhao Zhang (3):
  hw/acpi : Don't use '#' flag of printf format
  hw/acpi : add space before the open parenthesis '('
  hw/acpi : add spaces around operator

 configure   |  15 +
 contrib/libvhost-user/libvhost-user.h   |  15 +-
 include/exec/memory.h   |  38 ++
 include/hw/mem/memory-device.h  |  10 +
 include/hw/virtio/vhost.h   |   2 +-
 tests/qtest/libqos/libqtest.h   |  25 +
 tests/qtest/libqos/vhost-user-blk.h |  48 ++
 block/export/export.c   |   4 +-
 block/export/vhost-user-blk-server.c|  28 +-
 contrib/vhost-user-blk/vhost-user-blk.c |   2 +
 hw/acpi/core.c  |   2 +-
 hw/acpi/nvdimm.c|  20 +-
 hw/acpi/pcihp.c |   2 +-
 hw/block/vhost-user-blk.c   |   2 +-
 hw/i386/pc.c|   9 +-
 hw/mem/memory-device.c  |  20 +-
 hw/smbios/smbios.c  |   4 +-
 hw/vfio/common.c|  19 +
 hw/virtio/vhost-backend.c   |   4 +-
 hw/virtio/vhost.c   |   8 +-
 hw/virtio/virtio-iommu.c| 205 +++-
 hw/virtio/virtio-mem-pci.c  |   7 +
 hw/virtio/virtio-mem.c  | 113 -
 net/net.c   |   3 +
 net/vhost-vdpa.c|   4 +
 softmmu/memory.c|  13 +
 tests/qtest/libqos/vhost-user-blk.c | 129 +
 tests/qtest/libqtest.c  |  76 ++-
 tests/qtest/vhost-user-blk-test.c   | 843 
 block/export/meson.build|   2 +-
 hw/virtio/trace-events  |   6 +
 tests/qtest/libqos/meson.build  |   1 +
 tests/qtest/meson.build |   2 +
 util/meson.build

Re: [PATCH] hw/intc: Fix incorrect calculation of core in liointc_read() and liointc_write()

2020-11-03 Thread chen huacai

Hi, Philippe and Jiaxun,

On Wed, Nov 4, 2020 at 1:17 AM Philippe Mathieu-Daudé  wrote:
>
> On 11/3/20 4:40 PM, Jiaxun Yang wrote:
> > 于 2020年11月3日 GMT+08:00 下午8:28:27, "Philippe Mathieu-Daudé" 
> >  写到:
> >> On 11/3/20 10:32 AM, AlexChen wrote:
> >>> According to the loongson spec
> >>> (http://www.loongson.cn/uploadfile/cpu/3B1500/Loongson_3B1500_cpu_user_1.pdf)
> >>> and the macro definition(#define R_PERCORE_ISR(x) (0x40 + 0x8 * x)), we 
> >>> know
> >>> that the ISR size of per CORE is 8, so here we need to divide
> >>> (addr - R_PERCORE_ISR(0)) by 8, not 4.
> >>>
> >>> Reported-by: Euler Robot 
> >>> Signed-off-by: Alex Chen 
> >>> ---
> >>>  hw/intc/loongson_liointc.c | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> For a model added in 2020, its code style is a bit
> >> disappointing (leading to that kind of hidden bugs).
> >> I'm even surprised it passed the review process.
> >>
> >> Thanks for the fix.
> >>
> >> Reviewed-by: Philippe Mathieu-Daudé 
> >
> > It was my proof of concept code.
>
> Don't worry Jiaxun, this comment is not to you, but to
> the QEMU community as a whole. We should have asked
> improvements during review, and explain what could be
> improve, what is not the best style but acceptable,
> and what is good.
>
> > Any suggestions on enhancement?
> > I'll have some free time afterwards so probably can do something.
>
> It is a bit awkward to not comment on a patch (diff).
> Comment I'd have made:
>
> - Add definition for 0x8 magic value in R_PERCORE_ISR()
> - Replace R_PERCORE_ISR() macro my function
> - Replace dead D() code by trace events
> - Use a simple 32-bit implementation (pic_ops.impl.min/max = 4)
>   to let the generic memory code deal with the 8-bit accesses
>   to mapper[].
>
> If you have time, today what would be more useful is to have
> tests for the Loongson-3 board.
As you suggested, LOONGSON_LIOINTC will be used in loongson3_virt.c
and it is defined in a .c file now, so this is a chance for me to
rework liointc.

Huacai
>
> You can see some examples with the Malta board in the repository:
>
> $ git-grep malta tests/acceptance/
>
> Regards,
>
> Phil.
>


-- 
Huacai Chen

Re: [PATCH V16 1/6] target/mips: Fix PageMask with variable page size

2020-11-03 Thread Huacai Chen

Hi, Phillippe,

On Tue, Nov 3, 2020 at 10:53 PM Philippe Mathieu-Daudé  wrote:
>
> On 10/30/20 11:25 AM, Huacai Chen wrote:
> > From: Jiaxun Yang 
> >
> > Our current code assumed the target page size is always 4k
> > when handling PageMask and VPN2, however, variable page size
> > was just added to mips target and that's no longer true.
> >
> > Fixes: ee3863b9d414 ("target/mips: Support variable page size")
> > Signed-off-by: Jiaxun Yang 
> > Signed-off-by: Huacai Chen 
> > ---
> >  target/mips/cp0_helper.c | 36 +---
> >  target/mips/cpu.h|  1 +
> >  2 files changed, 30 insertions(+), 7 deletions(-)
> >
> > diff --git a/target/mips/cp0_helper.c b/target/mips/cp0_helper.c
> > index 12143ac..d90ddd9 100644
> > --- a/target/mips/cp0_helper.c
> > +++ b/target/mips/cp0_helper.c
> > @@ -892,13 +892,31 @@ void helper_mtc0_memorymapid(CPUMIPSState *env, 
> > target_ulong arg1)
> >
> >  void update_pagemask(CPUMIPSState *env, target_ulong arg1, int32_t 
> > *pagemask)
> >  {
> > -uint64_t mask = arg1 >> (TARGET_PAGE_BITS + 1);
> > -if (!(env->insn_flags & ISA_MIPS32R6) || (arg1 == ~0) ||
> > -(mask == 0x || mask == 0x0003 || mask == 0x000F ||
> > - mask == 0x003F || mask == 0x00FF || mask == 0x03FF ||
> > - mask == 0x0FFF || mask == 0x3FFF || mask == 0x)) {
> > -env->CP0_PageMask = arg1 & (0x1FFF & (TARGET_PAGE_MASK << 1));
> > +unsigned long mask;
> > +int maskbits;
> > +
> > +if (env->insn_flags & ISA_MIPS32R6) {
> > +return;
> > +}
> > +/* Don't care MASKX as we don't support 1KB page */
> > +mask = extract32((uint32_t)arg1, CP0PM_MASK, 16);
> > +maskbits = find_first_zero_bit(, 32);
> > +
> > +/* Ensure no more set bit after first zero */
> > +if (mask >> maskbits) {
> > +goto invalid;
> > +}
> > +/* We don't support VTLB entry smaller than target page */
> > +if ((maskbits + 12) < TARGET_PAGE_BITS) {
> > +goto invalid;
> >  }
> > +env->CP0_PageMask = mask << CP0PM_MASK;
> > +
> > +return;
> > +
> > +invalid:
> > +/* When invalid, set to default target page size. */
> > +env->CP0_PageMask = (~TARGET_PAGE_MASK >> 12) << CP0PM_MASK;
> >  }
>
> I was going to queue this patch for 5.2-rc1, but it fails all
> I6400 tests...
>
>   Linux version 4.7.0-rc1-dirty (root@2e66df87a9ff) (gcc version 6.3.0
> 20170516 (Debian 6.3.0-18) ) #1 SMP Sat Feb 1 18:38:13 UTC 2020
>   GCRs appear to have been moved (expected them at 0x1fbf8000)!
>   earlycon: uart8250 at I/O port 0x3f8 (options '38400n8')
>   bootconsole [uart8250] enabled
>   MIPS CPS SMP unable to proceed without a CM
>   CPU0 revision is: 0001a900 (MIPS I6400)
>   FPU revision is: 20f30300
>   MSA revision is: 0300
>   MIPS: machine is mti,malta
>   Software DMA cache coherency enabled
>   Determined physical RAM map:
>   memory: 0800 @  (usable)
>   Zone ranges:
>   DMA  [mem 0x-0x00ff]
>   DMA32[mem 0x0100-0x]
>   Normal   empty
>   Movable zone start for each node
>   Early memory node ranges
>   node   0: [mem 0x-0x07ff]
>   Initmem setup node 0 [mem 0x-0x07ff]
>   Primary instruction cache 64kB, VIPT, 4-way, linesize 64 bytes.
>   Primary data cache 64kB, 4-way, VIPT, no aliases, linesize 64 bytes
>   percpu: Embedded 5 pages/cpu @98000107c000 s29664 r8192 d44064 u81920
>   Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 8163
>   Kernel command line: clocksource=GIC console=tty0 console=ttyS0
>   PID hash table entries: 512 (order: -2, 4096 bytes)
>   Dentry cache hash table entries: 16384 (order: 3, 131072 bytes)
>   Inode-cache hash table entries: 8192 (order: 2, 65536 bytes)
>   Kernel panic - not syncing: MMU doesn't support PAGE_SIZE=0x4000
OK, let's investigate it.

Huacai

Re: [PATCH V16 5/6] hw/mips: Add Loongson-3 machine support

2020-11-03 Thread Huacai Chen

Hi, Philippe,

On Wed, Nov 4, 2020 at 4:23 AM Philippe Mathieu-Daudé  wrote:
>
> Hi Huacai,
>
> On 10/30/20 11:25 AM, Huacai Chen wrote:
> > Add Loongson-3 based machine support, it use liointc as the interrupt
> > controler and use GPEX as the pci controller. Currently it can work with
> > both TCG and KVM.
> >
> > As the machine model is not based on any exiting physical hardware, the
> > name of the machine is "loongson3-virt". It may be superseded in future
> > by a real machine model. If this happens, then a regular deprecation
> > procedure shall occur for "loongson3-virt" machine.
> >
> > We now already have a full functional Linux kernel (based on Linux-5.4.x
> > LTS, the kvm host side and guest side have both been upstream for Linux-
> > 5.9, but Linux-5.9 has not been released yet) here:
> >
> > https://github.com/chenhuacai/linux
> >
> > Of course the upstream kernel is also usable (though it is "unstable"
> > now):
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >
> > How to use QEMU/Loongson-3?
> > 1, Download kernel source from the above URL;
> > 2, Build a kernel with arch/mips/configs/loongson3_defconfig;
> > 3, Boot a Loongson-3A4000 host with this kernel (for KVM mode);
> > 4, Build QEMU-master with this patchset;
> > 5, modprobe kvm (only necessary for KVM mode);
> > 6, Use QEMU with TCG:
> >qemu-system-mips64el -M loongson3-virt,accel=tcg -cpu 
> > Loongson-3A1000 -kernel  -append ...
> >Use QEMU with KVM:
> >qemu-system-mips64el -M loongson3-virt,accel=kvm -cpu 
> > Loongson-3A4000 -kernel  -append ...
> >
> >The "-cpu" parameter is optional here and QEMU will use the correct type 
> > for TCG/KVM automatically.
> >
> > Signed-off-by: Huacai Chen 
> > Co-developed-by: Jiaxun Yang 
> > Signed-off-by: Jiaxun Yang 
> > ---
> >  default-configs/devices/mips64el-softmmu.mak |   1 +
> >  hw/mips/Kconfig  |  12 +
> >  hw/mips/loongson3_virt.c | 620 
> > +++
> >  hw/mips/meson.build  |   2 +-
> >  4 files changed, 634 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/mips/loongson3_virt.c
> ...
> > +static inline void loongson3_virt_devices_init(MachineState *machine, 
> > DeviceState *pic)
> > +{
> > +int i;
> > +qemu_irq irq;
> > +PCIBus *pci_bus;
> > +DeviceState *dev;
> > +MemoryRegion *pio_alias;
> > +MemoryRegion *mmio_alias, *mmio_reg;
> > +MemoryRegion *ecam_alias, *ecam_reg;
> > +
> > +dev = qdev_new(TYPE_GPEX_HOST);
> > +sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal);
> > +pci_bus = PCI_HOST_BRIDGE(dev)->bus;
> > +
> > +ecam_alias = g_new0(MemoryRegion, 1);
> > +ecam_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
> > +memory_region_init_alias(ecam_alias, OBJECT(dev), "pcie-ecam",
> > + ecam_reg, 0, 
> > virt_memmap[VIRT_PCIE_ECAM].size);
> > +memory_region_add_subregion(get_system_memory(),
> > +virt_memmap[VIRT_PCIE_ECAM].base, 
> > ecam_alias);
> > +
> > +mmio_alias = g_new0(MemoryRegion, 1);
> > +mmio_reg = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 1);
> > +memory_region_init_alias(mmio_alias, OBJECT(dev), "pcie-mmio",
> > + mmio_reg, virt_memmap[VIRT_PCIE_MMIO].base,
> > + virt_memmap[VIRT_PCIE_MMIO].size);
> > +memory_region_add_subregion(get_system_memory(),
> > +virt_memmap[VIRT_PCIE_MMIO].base, 
> > mmio_alias);
> > +
> > +pio_alias = g_new0(MemoryRegion, 1);
> > +memory_region_init_alias(pio_alias, OBJECT(dev), "pcie-pio",
> > + get_system_io(), 0, 
> > virt_memmap[VIRT_PCIE_PIO].size);
> > +memory_region_add_subregion(get_system_memory(),
> > +virt_memmap[VIRT_PCIE_PIO].base, 
> > pio_alias);
> > +sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, 
> > virt_memmap[VIRT_PCIE_PIO].base);
> > +
> > +for (i = 0; i < GPEX_NUM_IRQS; i++) {
> > +irq = qdev_get_gpio_in(pic, PCIE_IRQ_BASE + i);
> > +sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, irq);
> > +gpex_set_irq_num(GPEX_HOST(dev), i, PCIE_IRQ_BASE + i);
> > +}
> > +
> > +pci_vga_init(pci_bus);
> > +
> > +if (defaults_enabled()) {
> > +pci_create_simple(pci_bus, -1, "pci-ohci");
> > +usb_create_simple(usb_bus_find(-1), "usb-kbd");
> > +usb_create_simple(usb_bus_find(-1), "usb-tablet");
> > +}
> > +
> > +for (i = 0; i < nb_nics; i++) {
> > +NICInfo *nd = _table[i];
> > +
> > +if (!nd->model) {
> > +nd->model = g_strdup("virtio");
> > +}
> > +
> > +pci_nic_init_nofail(nd, pci_bus, nd->model, NULL);
> > +}
> > +}
> > +
> > +static void mips_loongson3_virt_init(MachineState *machine)
> > +{
> > +int i;
> > +long bios_size;
> > +MIPSCPU *cpu;
> > +Clock

Re: VFIO Migration

2020-11-03 Thread Jason Wang




On 2020/11/3 下午8:15, Stefan Hajnoczi wrote:

On Tue, Nov 03, 2020 at 04:46:53PM +0800, Jason Wang wrote:

On 2020/11/2 下午7:11, Stefan Hajnoczi wrote:

There is discussion about VFIO migration in the "Re: Out-of-Process
Device Emulation session at KVM Forum 2020" thread. The current status
is that Kirti proposed a VFIO device region type for saving and loading
device state. There is currently no guidance on migrating between
different device versions or device implementations from different
vendors. This is known to be non-trivial and raised discussion about
whether it should really be handled by VFIO or centralized in QEMU.

Below is a document that describes how to ensure migration compatibility
in VFIO. It does not require changes to the VFIO migration interface. It
can be used for both VFIO/mdev kernel devices and vfio-user devices.

The idea is that the device state blob is opaque to the VMM but the same
level of migration compatibility that exists today is still available.


So if we can't mandate this or there's no way to validate this. Vendor is
still free to implement their own protocol which could lead a lot of
maintaining burden.

Yes, the device state representation is their responsibility. We can't
do that for them since they define the hardware interface and internal
state.

As Michael and Paolo have mentioned in the other thread, we can provide
guidelines and standardize common aspects.


Migration can fail if loading the device state is not possible. It should fail
early with a clear error message. It must not appear to complete but leave the
device inoperable due to a migration problem.


For VFIO-user, how management know that a VM can be migrated from src to
dst? For kernel, we have sysfs.

vfio-user devices will normally be instantiated in one of two ways:

1. Launching a device backend and passing command-line parameters:

  $ my-nic --socket-path /tmp/my-nic-vfio-user.sock \
   --model https://vendor-a.com/my-nic \
  --rss on

Here "model" is the device model URL. The program could support
multiple device models.

The "rss" device configuration parameter enables Receive Side Scaling
(RSS) as an example of a configuration parameter.

2. Creating a device using an RPC interface:

  (qemu) device-add my-nic,rss=on

If the device instantiation succeeds then it is safe to live migrate.
The device is exposing the desired hardware interface and expecting the
right device state representation.



Does this mean there will still be a "my-nic" stub in qemu? (I thought 
it should be a generic one like device-add "vfio-user-pci")






The rest of this document describes how these requirements can be met.

Device Models
-
Devices have a *hardware interface* consisting of hardware registers,
interrupts, and so on.

The hardware interface together with the device state representation is called
a *device model*. Device models can be assigned URIs such as
https://qemu.org/devices/e1000e to uniquely identify them.


It looks worse than "pci://vendor_id.device_id.subvendor_id.subdevice_id".
"e1000e" means a lot of different 8275X implementations that have subtle but
easy to be ignored differences.

If you wish to reflect those differences in the device model URI then
you can use:

   
https://qemu.org/devices/pci

Another option is to use device configuration parameters to express
differences.

The important thing is that this device model URI has one owner. No one
else will use qemu.org. There can be many different e1000e device model
URIs, if necessary (with slightly different hardware interfaces and/or
device state representations). This avoids collisions.


And is it possible to have a list of URIs here?

A device implementation (mdev driver, vfio-user device backend, etc) may
support multiple device model URIs.

A device instance has an immutable device model URI and list of
configuration parameters. In other words, once the device is created its
ABI is fixed for the lifetime of the device. A new device instance can
be configured by powering off the machine, hotplug, etc.


Multiple implementations of a device model may exist. They are they are
interchangeable if they follow the same hardware interface and device
state representation.

Multiple implementations of the same hardware interface may exist with
different device state representations, in which case the device models are not
interchangeable and must be assigned different URIs.

Migration is only possible when the same device model is supported by the
*source* and the *destination* devices.

Device Configuration

Device models may have parameters that affect the hardware interface or device
state representation. For example, a network card may have a configurable
address filtering table size parameter called ``rx-filter-size``. A
device state saved with ``rx-filter-size=32`` cannot be safely loaded
into a device with ``rx-filter-size=0``, because

Re: [RFC PATCH 6/6] docs: Added eBPF documentation.

2020-11-03 Thread Jason Wang




On 2020/11/3 上午2:51, Andrew Melnychenko wrote:

From: Andrew 

Also, added maintainers information.

Signed-off-by: Yuri Benditovich 
Signed-off-by: Andrew Melnychenko 
---
  MAINTAINERS   |   6 +++
  docs/ebpf.rst |  29 +++
  docs/ebpf_rss.rst | 129 ++
  3 files changed, 164 insertions(+)
  create mode 100644 docs/ebpf.rst
  create mode 100644 docs/ebpf_rss.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 2c22bbca5a..464b3f3c95 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3111,6 +3111,12 @@ S: Maintained
  F: hw/semihosting/
  F: include/hw/semihosting/
  
+EBPF:

+M: Andrew Melnychenko 
+M: Yuri Benditovich 
+S: Maintained
+F: ebpf/*
+
  Build and test automation
  -
  Build and test automation
diff --git a/docs/ebpf.rst b/docs/ebpf.rst
new file mode 100644
index 00..e45d085432
--- /dev/null
+++ b/docs/ebpf.rst
@@ -0,0 +1,29 @@
+===
+eBPF qemu support
+===
+
+eBPF support (CONFIG_EBPF) is enabled automatically by 'configure' script
+if 'bpf' system call is available.
+To disable eBPF support use './configure --disable-bpf'
+
+Basic eBPF functionality is located in ebpf/ebpf.c and ebpf/ebpf.h.
+There are basic functions to load the eBPF program into the kernel.
+Mostly, functions name are self-explanatory:
+
+- `bpf_create_map()`, `bpf_lookup_element()`, `bpf_update_element()`, 
`bpf_delete_element()` - manages eBPF maps. On error, a basic error message 
would be reported and returned -1. On success, 0 would be 
returned(`bpf_create_map()` returns map's file descriptor).
+- `bpf_prog_load()` - load the program. The program has to have proper map 
file descriptors if there are used. On error - the log eBPF would be reported. 
On success, the program file descriptor returned.
+- `bpf_fixup_mapfd()` - would place map file descriptor into the program 
according to 'relocate array' of 'struct fixup_mapfd_t'. The function would 
return how many instructions were 'fixed' aka how many relocations was occurred.
+
+Simplified workflow would look like this:
+
+.. code:: C
+
+int map1 = bpf_create_map(...);
+int map2 = bpf_create_map(...);
+
+bpf_fixup_mapfd(, ARRAY_SIZE(), , 
ARRAY_SIZE(), , map1);
+bpf_fixup_mapfd(, ARRAY_SIZE(), , 
ARRAY_SIZE(), , map2);
+
+int prog = bpf_prog_load(, , 
ARRAY_SIZE(), "GPL");
+
+See the bpf(2) for details.
diff --git a/docs/ebpf_rss.rst b/docs/ebpf_rss.rst
new file mode 100644
index 00..96fee391b8
--- /dev/null
+++ b/docs/ebpf_rss.rst
@@ -0,0 +1,129 @@
+===
+eBPF RSS virtio-net support
+===
+
+RSS(Receive Side Scaling) is used to distribute network packets to guest 
virtqueues
+by calculating packet hash. Usually every queue is processed then by a 
specific guest CPU core.
+
+For now there are 2 RSS implementations in qemu:
+- 'software' RSS (functions if qemu receives network packets, i.e. vhost=off)
+- eBPF RSS (can function with also with vhost=on)
+
+If steering BPF is not set for kernel's TUN module, the TUN uses automatic 
selection
+of rx virtqueue based on lookup table built according to calculated symmetric 
hash
+of transmitted packets.
+If steering BPF is set for TUN the BPF code calculates the hash of packet 
header and
+returns the virtqueue number to place the packet to.
+
+Simplified decision formula:
+
+.. code:: C
+
+queue_index = indirection_table[hash()%]
+
+
+Not for all packets, the hash can/should be calculated.
+
+Note: currently, eBPF RSS does not support hash reporting.
+
+eBPF RSS turned on by different combinations of vhost-net, vitrio-net and tap 
configurations:
+
+- eBPF is used:
+
+tap,vhost=off & virtio-net-pci,rss=on,hash=off
+
+- eBPF is used:
+
+tap,vhost=on & virtio-net-pci,rss=on,hash=off
+
+- 'software' RSS is used:
+
+tap,vhost=off & virtio-net-pci,rss=on,hash=on
+
+- eBPF is used, hash population feature is not reported to the guest:
+
+tap,vhost=on & virtio-net-pci,rss=on,hash=on
+
+If CONFIG_EBPF is not set then only 'software' RSS is supported.
+Also 'software' RSS, as a fallback, is used if the eBPF program failed to load 
or set to TUN.
+
+RSS eBPF program
+
+
+RSS program located in ebpf/tun_rss_steering.h as an array of 'struct 
bpf_insn'.
+So the program is part of the qemu binary.
+Initially, the eBPF program was compiled by clang and source code located at 
ebpf/rss.bpf.c.
+Prerequisites to recompile the eBPF program (regenerate 
ebpf/tun_rss_steering.h):
+
+llvm, clang, kernel source tree, python3 + (pip3 pyelftools)
+Adjust 'linuxhdrs' in Makefile.ebpf to reflect the location of the 
kernel source tree
+
+$ cd ebpf
+$ make -f Makefile.ebpf
+
+Note the python script for convertation from eBPF ELF object to '.h' file - 
Ebpf_to_C.py:
+
+$ python EbpfElf_to_C.py rss.bpf.o tun_rss_steering
+
+The first argument of the script

Re: [RFC PATCH 5/6] virtio-net: Added eBPF RSS to virtio-net.

2020-11-03 Thread Jason Wang




On 2020/11/3 上午2:51, Andrew Melnychenko wrote:

From: Andrew 

When RSS is enabled the device tries to load the eBPF program
to select RX virtqueue in the TUN. If eBPF can be loaded
the RSS will function also with vhost (works with kernel 5.8 and later).
Software RSS is used as a fallback with vhost=off when eBPF can't be loaded
or when hash population requested by the guest.

Signed-off-by: Yuri Benditovich 
Signed-off-by: Andrew Melnychenko 
---
  hw/net/vhost_net.c |   2 +
  hw/net/virtio-net.c| 120 +++--
  include/hw/virtio/virtio-net.h |   4 ++
  net/vhost-vdpa.c   |   2 +
  4 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 24d555e764..16124f99c3 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -71,6 +71,8 @@ static const int user_feature_bits[] = {
  VIRTIO_NET_F_MTU,
  VIRTIO_F_IOMMU_PLATFORM,
  VIRTIO_F_RING_PACKED,
+VIRTIO_NET_F_RSS,
+VIRTIO_NET_F_HASH_REPORT,
  
  /* This bit implies RARP isn't sent by QEMU out of band */

  VIRTIO_NET_F_GUEST_ANNOUNCE,
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 277289d56e..afcc3032ec 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -698,6 +698,19 @@ static void virtio_net_set_queues(VirtIONet *n)
  
  static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
  
+static uint64_t fix_ebpf_vhost_features(uint64_t features)

+{
+/* If vhost=on & CONFIG_EBPF doesn't set - disable RSS feature */
+uint64_t ret = features;
+#ifndef CONFIG_EBPF
+virtio_clear_feature(, VIRTIO_NET_F_RSS);
+#endif
+/* for now, there is no solution for populating the hash from eBPF */
+virtio_clear_feature(, VIRTIO_NET_F_HASH_REPORT);



I think we probably need to to something reverse since RSS is under the 
control on qemu cli, disable features like this may break migration.


We need disable vhost instead when:

1) eBPF is not supported but RSS is required from command line

or

2) HASH_REPORT is required from command line



+
+return ret;
+}
+
  static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
  Error **errp)
  {
@@ -732,9 +745,9 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features,
  return features;
  }
  
-virtio_clear_feature(, VIRTIO_NET_F_RSS);

-virtio_clear_feature(, VIRTIO_NET_F_HASH_REPORT);
-features = vhost_net_get_features(get_vhost_net(nc->peer), features);
+features = fix_ebpf_vhost_features(
+vhost_net_get_features(get_vhost_net(nc->peer), features));
+
  vdev->backend_features = features;
  
  if (n->mtu_bypass_backend &&

@@ -1169,12 +1182,75 @@ static int virtio_net_handle_announce(VirtIONet *n, 
uint8_t cmd,
  }
  }
  
+static void virtio_net_unload_epbf_rss(VirtIONet *n);

+
  static void virtio_net_disable_rss(VirtIONet *n)
  {
  if (n->rss_data.enabled) {
  trace_virtio_net_rss_disable();
  }
  n->rss_data.enabled = false;
+
+if (!n->rss_data.enabled_software_rss && ebpf_rss_is_loaded(>ebpf_rss)) 
{
+virtio_net_unload_epbf_rss(n);
+}
+}
+
+static bool virtio_net_attach_steering_ebpf(NICState *nic, int prog_fd)
+{
+NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
+if (nc == NULL || nc->info->set_steering_ebpf == NULL) {
+return false;
+}
+
+return nc->info->set_steering_ebpf(nc, prog_fd);
+}
+
+static void rss_data_to_rss_config(struct VirtioNetRssData *data,
+   struct EBPFRSSConfig *config)
+{
+config->redirect = data->redirect;
+config->populate_hash = data->populate_hash;
+config->hash_types = data->hash_types;
+config->indirections_len = data->indirections_len;
+config->default_queue = data->default_queue;
+}
+
+static bool virtio_net_load_epbf_rss(VirtIONet *n)
+{
+struct EBPFRSSConfig config = {};
+
+if (!n->rss_data.enabled) {
+if (ebpf_rss_is_loaded(>ebpf_rss)) {
+ebpf_rss_unload(>ebpf_rss);
+}
+return true;
+}
+
+if (!ebpf_rss_is_loaded(>ebpf_rss) && !ebpf_rss_load(>ebpf_rss)) {
+return false;
+}
+
+rss_data_to_rss_config(>rss_data, );
+
+if (!ebpf_rss_set_all(>ebpf_rss, ,
+  n->rss_data.indirections_table, n->rss_data.key)) {
+ebpf_rss_unload(>ebpf_rss);
+return false;
+}
+
+if (!virtio_net_attach_steering_ebpf(n->nic, n->ebpf_rss.program_fd)) {
+ebpf_rss_unload(>ebpf_rss);
+return false;
+}
+
+return true;
+}
+
+static void virtio_net_unload_epbf_rss(VirtIONet *n)
+{
+virtio_net_attach_steering_ebpf(n->nic, -1);
+ebpf_rss_unload(>ebpf_rss);
  }
  
  static uint16_t virtio_net_handle_rss(VirtIONet *n,

@@ -1208,6 +1284,7 @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
  err_value =

RE: [PATCH 1/6] target/xtensa: fix uninitialized variable warning

2020-11-03 Thread Chenqun (kuhn)

> -Original Message-
> From: Max Filippov [mailto:jcmvb...@gmail.com]
> Sent: Tuesday, November 3, 2020 5:22 PM
> To: Chenqun (kuhn) 
> Cc: qemu-devel ; QEMU Trivial
> ; Zhanghailiang
> ; ganqixin ; Euler
> Robot 
> Subject: Re: [PATCH 1/6] target/xtensa: fix uninitialized variable warning
> 
> On Mon, Nov 2, 2020 at 5:52 PM Chen Qun 
> wrote:
> >
> > The compiler cannot determine whether the return values of the
> > xtensa_operand_is_register(isa, opc, opnd)  and
> xtensa_operand_is_visible(isa, opc, opnd) functions are the same.
> 
> It doesn't have to because 1) they definitely are not the same, but
> 2) it doesn't matter.
> 
> > So,it assumes that 'rf' is not assigned, but it's used.
> 
> The assumption is wrong. rf is used under the 'if (register_file)'
> condition and register_file is initialized to NULL and then set to something
> non-NULL based on the value of rf here:
> 
Hi Max,
  Yeah, your analysis is correct. This rf is used only when register_file is 
non-NULL. When this condition is met, the rf must have been assigned a value.
The GCC 9.3 compilation I use contains the Wmaybe-uninitialized parameter by 
default. It cannot recognize this complex logic judgment.
This warning may be frequently encountered by developers who compile this part 
of code.

> 958 if (xtensa_operand_is_register(isa, opc, opnd)) {
> 959 rf = xtensa_operand_regfile(isa, opc, opnd);
> 960 register_file = dc->config->regfile[rf];
> 
> > The compiler showed warning:
> > target/xtensa/translate.c: In function ‘disas_xtensa_insn’:
> > target/xtensa/translate.c:985:43: warning: ‘rf’ may be used uninitialized in
> this function [-Wmaybe-uninitialized]
> >   985 | arg[vopnd].num_bits =
> xtensa_regfile_num_bits(isa, rf);
> >   |
> ^~~~
> >
> > Add a default value for 'rf' to prevented the warning.
> 
> I don't see it doing default build with gcc 8.3. But then I don't see
> -Wmaybe-uninitialized in the compiler command line either.
> 
Maybe it's available after GCC9, or some CFLAG configuration.

The -Wmaybe-uninitialized parameter has this description：
"These warnings are only possible in optimizing compilation, because otherwise 
GCC does not keep track of the state of variables."
From：https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#Warning-Options

I have tried to configure only "-O2 -fexceptions" for the CFLAG on GCC9, and 
this warning will occur.

> > Reported-by: Euler Robot 
> > Signed-off-by: Chen Qun 
> > ---
> > Cc: Max Filippov 
> > ---
> >  target/xtensa/translate.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
> > index 944a157747..eea851bbe7 100644
> > --- a/target/xtensa/translate.c
> > +++ b/target/xtensa/translate.c
> > @@ -953,7 +953,7 @@ static void disas_xtensa_insn(CPUXtensaState *env,
> > DisasContext *dc)
> >
> >  for (opnd = vopnd = 0; opnd < opnds; ++opnd) {
> >  void **register_file = NULL;
> > -xtensa_regfile rf;
> > +xtensa_regfile rf = -1;
> 
> Please use XTENSA_UNDEFINED instead if you still think this is worth changing.
> 
I don't think it's wrong, it's just a bit of trouble for the compiler :)

Thanks,
Chen Qun

Re: [RFC PATCH 1/6] net: Added SetSteeringEBPF method for NetClientState.

2020-11-03 Thread Jason Wang




On 2020/11/3 上午2:51, Andrew Melnychenko wrote:

From: Andrew 

For now, that method supported only by Linux TAP.
Linux TAP uses TUNSETSTEERINGEBPF ioctl.
TUNSETSTEERINGBPF was added 3 years ago.
Qemu checks if it was defined before using.

Signed-off-by: Andrew Melnychenko 
---
  include/net/net.h |  2 ++
  net/tap-bsd.c |  5 +
  net/tap-linux.c   | 19 +++
  net/tap-solaris.c |  5 +
  net/tap-stub.c|  5 +
  net/tap.c |  9 +
  net/tap_int.h |  1 +
  7 files changed, 46 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 897b2d7595..d8a41fb010 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -60,6 +60,7 @@ typedef int (SetVnetBE)(NetClientState *, bool);
  typedef struct SocketReadState SocketReadState;
  typedef void (SocketReadStateFinalize)(SocketReadState *rs);
  typedef void (NetAnnounce)(NetClientState *);
+typedef bool (SetSteeringEBPF)(NetClientState *, int);
  
  typedef struct NetClientInfo {

  NetClientDriver type;
@@ -81,6 +82,7 @@ typedef struct NetClientInfo {
  SetVnetLE *set_vnet_le;
  SetVnetBE *set_vnet_be;
  NetAnnounce *announce;
+SetSteeringEBPF *set_steering_ebpf;
  } NetClientInfo;
  
  struct NetClientState {

diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 77aaf674b1..4f64f31e98 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -259,3 +259,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
  {
  return -1;
  }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+return -1;
+}
diff --git a/net/tap-linux.c b/net/tap-linux.c
index b0635e9e32..196373019f 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -31,6 +31,7 @@
  
  #include 

  #include 
+#include  /* TUNSETSTEERINGEBPF */
  
  #include "qapi/error.h"

  #include "qemu/error-report.h"
@@ -316,3 +317,21 @@ int tap_fd_get_ifname(int fd, char *ifname)
  pstrcpy(ifname, sizeof(ifr.ifr_name), ifr.ifr_name);
  return 0;
  }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+#ifdef TUNSETSTEERINGEBPF



I'm not sure how much this can help.

But looking at tap-linux.h, I wonder do we need to pull TUN/TAP uapi 
headers.


Thanks



+if (ioctl(fd, TUNSETSTEERINGEBPF, (void *) _fd) != 0) {
+error_report("Issue while setting TUNSETSTEERINGEBPF:"
+" %s with fd: %d, prog_fd: %d",
+strerror(errno), fd, prog_fd);
+
+   return -1;
+}
+
+return 0;
+#else
+error_report("TUNSETSTEERINGEBPF is not supported");
+return -1;
+#endif
+}
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 0475a58207..d85224242b 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -255,3 +255,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
  {
  return -1;
  }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+return -1;
+}
diff --git a/net/tap-stub.c b/net/tap-stub.c
index de525a2e69..a0fa25804b 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -85,3 +85,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
  {
  return -1;
  }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+return -1;
+}
diff --git a/net/tap.c b/net/tap.c
index c46ff66184..81f50017bd 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -337,6 +337,14 @@ static void tap_poll(NetClientState *nc, bool enable)
  tap_write_poll(s, enable);
  }
  
+static bool tap_set_steering_ebpf(NetClientState *nc, int prog_fd)

+{
+TAPState *s = DO_UPCAST(TAPState, nc, nc);
+assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
+
+return tap_fd_set_steering_ebpf(s->fd, prog_fd) == 0;
+}
+
  int tap_get_fd(NetClientState *nc)
  {
  TAPState *s = DO_UPCAST(TAPState, nc, nc);
@@ -362,6 +370,7 @@ static NetClientInfo net_tap_info = {
  .set_vnet_hdr_len = tap_set_vnet_hdr_len,
  .set_vnet_le = tap_set_vnet_le,
  .set_vnet_be = tap_set_vnet_be,
+.set_steering_ebpf = tap_set_steering_ebpf,
  };
  
  static TAPState *net_tap_fd_init(NetClientState *peer,

diff --git a/net/tap_int.h b/net/tap_int.h
index 225a49ea48..547f8a5a28 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -44,5 +44,6 @@ int tap_fd_set_vnet_be(int fd, int vnet_is_be);
  int tap_fd_enable(int fd);
  int tap_fd_disable(int fd);
  int tap_fd_get_ifname(int fd, char *ifname);
+int tap_fd_set_steering_ebpf(int fd, int prog_fd);
  
  #endif /* NET_TAP_INT_H */

Re: [PATCH-for-5.2 2/3] gitlab-ci: Add a job to cover the --without-default-devices config

2020-11-03 Thread Stefano Stabellini

On Tue, 3 Nov 2020, Philippe Mathieu-Daudé wrote:
> I forgot to Cc the 9pfs & Xen maintainers, doing it now ;)
> 
> On 11/3/20 6:01 PM, Philippe Mathieu-Daudé wrote:
> > On 11/3/20 5:52 PM, Daniel P. Berrangé wrote:
> >> On Tue, Nov 03, 2020 at 05:46:03PM +0100, Philippe Mathieu-Daudé wrote:
> >>> We test './configure --without-default-devices' since commit
> >>> 20885b5b169 (".travis.yml: test that no-default-device builds
> >>> do not regress") in Travis-CI.
> >>>
> >>> As we prefer to use GitLab-CI, add the equivalent job there.
> >>>
> >>> One minor difference: the GitLab Ubuntu docker image has the
> >>> Xen devel packages installed. As it is automatically selected,
> >>> we need to disable it with the --disable-xen option, else the
> >>> build fails:
> >>>
> >>>   /usr/bin/ld: libcommon.fa.p/hw_xen_xen-legacy-backend.c.o: in function 
> >>> `xen_be_register_common':
> >>>   hw/xen/xen-legacy-backend.c:754: undefined reference to `xen_9pfs_ops'
> >>>   /usr/bin/ld: libcommon.fa.p/fsdev_qemu-fsdev.c.o:(.data.rel+0x8): 
> >>> undefined reference to `local_ops'
> >>>   /usr/bin/ld: libcommon.fa.p/fsdev_qemu-fsdev.c.o:(.data.rel+0x20): 
> >>> undefined reference to `synth_ops'
> >>>   /usr/bin/ld: libcommon.fa.p/fsdev_qemu-fsdev.c.o:(.data.rel+0x38): 
> >>> undefined reference to `proxy_ops'
> >>>   collect2: error: ld returned 1 exit status
> >>
> >> Surely this is a build bug we need to fix rather than ignore in CI ?
> > 
> > Well it predates this series, so nobody really cared
> > (thus I wonder if it makes sense to invest resources
> > there).
> > 
> > Anyway I can have a look after 5.2-rc1.

Actually I care about Xen and 9pfs support, it is one of the few
combinations that I use regularly and it is even enabled in the Xilinx
product I look after. But admittedly I don't test QEMU master as much as
I should. With the recent changes to the build system it is not very
suprising that there are some issues. It would be great to have a Xen
and 9pfs test in the gitlab CI-loop.

FYI I tried to build the latest QEMU on Alpine Linux 3.12 ARM64 and I
get:

  ninja: unknown tool 'query'

Even after rebuilding ninja master by hand. Any ideas? I don't know much
about ninja.

So I gave up on that and I spinned up a Debian Buster x86 container for
this build. That one got past the "ninja: unknown tool 'query'" error.
The build completed without problems to the end.

Either way I can't reproduce the build error above.

Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net

2020-11-03 Thread Jason Wang




On 2020/11/3 下午7:56, Daniel P. Berrangé wrote:

On Tue, Nov 03, 2020 at 12:32:43PM +0200, Yuri Benditovich wrote:

On Tue, Nov 3, 2020 at 11:02 AM Jason Wang  wrote:


On 2020/11/3 上午2:51, Andrew Melnychenko wrote:

Basic idea is to use eBPF to calculate and steer packets in TAP.
RSS(Receive Side Scaling) is used to distribute network packets to guest

virtqueues

by calculating packet hash.
eBPF RSS allows us to use RSS with vhost TAP.

This set of patches introduces the usage of eBPF for packet steering
and RSS hash calculation:
* RSS(Receive Side Scaling) is used to distribute network packets to
guest virtqueues by calculating packet hash
* eBPF RSS suppose to be faster than already existing 'software'
implementation in QEMU
* Additionally adding support for the usage of RSS with vhost

Supported kernels: 5.8+

Implementation notes:
Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
Added eBPF support to qemu directly through a system call, see the
bpf(2) for details.
The eBPF program is part of the qemu and presented as an array of bpf
instructions.
The program can be recompiled by provided Makefile.ebpf(need to adjust
'linuxhdrs'),
although it's not required to build QEMU with eBPF support.
Added changes to virtio-net and vhost, primary eBPF RSS is used.
'Software' RSS used in the case of hash population and as a fallback

option.

For vhost, the hash population feature is not reported to the guest.

Please also see the documentation in PATCH 6/6.

I am sending those patches as RFC to initiate the discussions and get
feedback on the following points:
* Fallback when eBPF is not supported by the kernel


Yes, and it could also a lacking of CAP_BPF.



* Live migration to the kernel that doesn't have eBPF support


Is there anything that we needs special treatment here?

Possible case: rss=on, vhost=on, source system with kernel 5.8 (everything

works) -> dest. system 5.6 (bpf does not work), the adapter functions, but
all the steering does not use proper queues.





* Integration with current QEMU build


Yes, a question here:

1) Any reason for not using libbpf, e.g it has been shipped with some
distros


We intentionally do not use libbpf, as it present only on some distros.
We can switch to libbpf, but this will disable bpf if libbpf is not
installed

If we were modifying existing funtionality then introducing a dep on
libbpf would be a problem as you'd be breaking existing QEMU users
on distros without libbpf.

This is brand new functionality though, so it is fine to place a
requirement on libbpf. If distros don't ship that library and they
want BPF features in QEMU, then those distros should take responsibility
for adding libbpf to their package set.


2) It would be better if we can avoid shipping bytecodes



This creates new dependencies: llvm + clang + ...
We would prefer byte code and ability to generate it if prerequisites are
installed.

I've double checked with Fedora, and generating the BPF program from
source is a mandatory requirement for QEMU. Pre-generated BPF bytecode
is not permitted.

There was also a question raised about the kernel ABI compatibility
for BPF programs ?

   https://lwn.net/Articles/831402/

   "The basic problem is that when BPF is compiled, it uses a set
of kernel headers that describe various kernel data structures
for that particular version, which may be different from those
on the kernel where the program is run. Until relatively recently,
that was solved by distributing the BPF as C code along with the
Clang compiler to build the BPF on the system where it was going
to be run."

Is this not an issue for QEMU's usage of BPF here ?



That's good point. Actually, DPDK ships RSS bytecodes but I don't know 
it works.


But as mentioned in the link, if we generate the code with BTF that 
would be fine.


Thanks




The dependancy on llvm is unfortunate for people who build with GCC,
but at least they can opt-out via a configure switch if they really
want to. As that LWN article notes, GCC will gain BPF support


Regards,
Daniel

Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net

2020-11-03 Thread Jason Wang

On 2020/11/3 下午6:32, Yuri Benditovich wrote:

On Tue, Nov 3, 2020 at 11:02 AM Jason Wang > wrote:

On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> Basic idea is to use eBPF to calculate and steer packets in TAP.
> RSS(Receive Side Scaling) is used to distribute network packets
to guest virtqueues
> by calculating packet hash.
> eBPF RSS allows us to use RSS with vhost TAP.
>
> This set of patches introduces the usage of eBPF for packet steering
> and RSS hash calculation:
> * RSS(Receive Side Scaling) is used to distribute network packets to
> guest virtqueues by calculating packet hash
> * eBPF RSS suppose to be faster than already existing 'software'
> implementation in QEMU
> * Additionally adding support for the usage of RSS with vhost
>
> Supported kernels: 5.8+
>
> Implementation notes:
> Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> Added eBPF support to qemu directly through a system call, see the
> bpf(2) for details.
> The eBPF program is part of the qemu and presented as an array
of bpf
> instructions.
> The program can be recompiled by provided Makefile.ebpf(need to
adjust
> 'linuxhdrs'),
> although it's not required to build QEMU with eBPF support.
> Added changes to virtio-net and vhost, primary eBPF RSS is used.
> 'Software' RSS used in the case of hash population and as a
fallback option.
> For vhost, the hash population feature is not reported to the guest.
>
> Please also see the documentation in PATCH 6/6.
>
> I am sending those patches as RFC to initiate the discussions
and get
> feedback on the following points:
> * Fallback when eBPF is not supported by the kernel

Yes, and it could also a lacking of CAP_BPF.

> * Live migration to the kernel that doesn't have eBPF support

Is there anything that we needs special treatment here?

Possible case: rss=on, vhost=on, source system with kernel 5.8 
(everything works) -> dest. system 5.6 (bpf does not work), the 
adapter functions, but all the steering does not use proper queues.

Right, I think we need to disable vhost on dest.

> * Integration with current QEMU build

Yes, a question here:

1) Any reason for not using libbpf, e.g it has been shipped with some
distros

We intentionally do not use libbpf, as it present only on some distros.
We can switch to libbpf, but this will disable bpf if libbpf is not 
installed

That's better I think.

2) It would be better if we can avoid shipping bytecodes

This creates new dependencies: llvm + clang + ...
We would prefer byte code and ability to generate it if prerequisites 
are installed.

It's probably ok if we treat the bytecode as a kind of firmware.

But in the long run, it's still worthwhile consider the qemu source is 
used for development and llvm/clang should be a common requirement for 
generating eBPF bytecode for host.

> * Additional usage for eBPF for packet filtering

Another interesting topics in to implement mac/vlan filters. And
in the
future, I plan to add mac based steering. All of these could be
done via
eBPF.

No problem, we can cooperate if needed

>
> Know issues:
> * hash population not supported by eBPF RSS: 'software' RSS used

Is this because there's not way to write to vnet header in
STERRING BPF?

Yes. We plan to submit changes for kernel to cooperate with BPF and 
populate the hash, this work is in progress

That would require a new type of eBPF program and may need some work on 
verifier.

Btw, macvtap is still lacking even steering ebpf program. Would you want 
to post a patch to support that?

> as a fallback, also, hash population feature is not reported to
guests
> with vhost.
> * big-endian BPF support: for now, eBPF is disabled for
big-endian systems.

Are there any blocker for this?

No, can be added in v2

Cool.

Thanks

Just some quick questions after a glance of the codes. Will go
through
them tomorrow.

Thanks

>
> Andrew (6):
>    Added SetSteeringEBPF method for NetClientState.
>    ebpf: Added basic eBPF API.
>    ebpf: Added eBPF RSS program.
>    ebpf: Added eBPF RSS loader.
>    virtio-net: Added eBPF RSS to virtio-net.
>    docs: Added eBPF documentation.
>
>   MAINTAINERS                    |   6 +
>   configure                      |  36 +++
>   docs/ebpf.rst                  |  29 ++
>   docs/ebpf_rss.rst              | 129 
>   ebpf/EbpfElf_to_C.py           |  67 
>   ebpf/Makefile.ebpf             |  38 +++
>   ebpf/ebpf-stub.c               |  28 ++
>   ebpf/ebpf.c                    | 107 +++
>   ebpf/ebpf.h                    |  35 +++
>   ebpf/ebpf_rss.c                | 178

Re: [PATCH 00/15] python: absorb scripts/qmp/qom-* tooling

2020-11-03 Thread John Snow


On 10/21/20 2:51 PM, John Snow wrote:

Based-on: <20201020193555.1493936-1-js...@redhat.com>
   [PATCH v3 00/15] python: create installable package



Thanks for the early look. This is superseded by:

[PATCH v2 00/72] python: move scripts/qmp to python/qemu/qmp


This is a bit of a demonstration of the direction I want to take our
python tooling, and how it *might* work.

By moving items from ./scripts/*.py over to ./python/qemu/* somewhere,
they can be checked with the same isort/flake8/pylint/mypy tooling as
everything else. This will help prevent regressions.

I would like to, over time, move all applicable python scripts from
./scripts to ./python. That will be a long, gradual stream of changes,
but the more we do it, the better off we'll be for these tools.

Reviewer notes:

- I just rewrote qom-xxx entirely, though it is based on the original
   scripts. Doing it brick by brick was too slow.

- I added a symlink to the qom-fuse file under the python/ tree so I
   could check it with the usual linters. This causes some future
   knowledge to bleed through in a few places; notably I update the
   python setup.cfg several times in the middle of the series where it
   doesn't seem like that should have an effect.

- qom-fuse disappears from the tree for a single commit, but that
   preserves git-blame history. Best I could do.

John Snow (15):
   python/qmp: Add qom script rewrites
   python/qmp: add qom script entry points
   scripts/qmp: redirect qom-xxx scripts to python/qemu/qmp/
   scripts/qom-fuse: apply isort rules
   scripts/qom-fuse: apply flake8 rules
   python: Add 'fh' to known-good variable names
   scripts/qom-fuse: Apply pylint rules
   scripts/qom-fuse: Add docstrings
   scripts/qom-fuse: Convert to QOMCommand
   scripts/qom-fuse: use QOMCommand.qom_list()
   scripts/qom-fuse: ensure QOMFuse.read always returns bytes
   scripts/qom-fuse: add static type hints
   scripts/qom-fuse: move to python/qemu/qmp/qom_fuse.py
   scripts/qom-fuse: add redirection shim to python/qemu/qmp/qom-fuse.py
   python: add fuse command to 'qom' tools

  python/qemu/qmp/qom.py| 217 ++
  python/qemu/qmp/qom_common.py | 153 
  python/qemu/qmp/qom_fuse.py   | 207 
  python/setup.cfg  |  24 +++-
  scripts/qmp/qom-fuse  | 144 +-
  scripts/qmp/qom-get   |  66 +--
  scripts/qmp/qom-list  |  63 +-
  scripts/qmp/qom-set   |  63 +-
  scripts/qmp/qom-tree  |  74 +---
  9 files changed, 618 insertions(+), 393 deletions(-)
  create mode 100644 python/qemu/qmp/qom.py
  create mode 100644 python/qemu/qmp/qom_common.py
  create mode 100644 python/qemu/qmp/qom_fuse.py



NACK

[PATCH v2 70/72] scripts/qmp-shell: move to python/qemu/qmp/qmp_shell.py

2020-11-03 Thread John Snow

The script will be unavailable for a commit or two, which will help
preserve development history attached to the new file. A forwarder will
be added shortly afterwards.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell => python/qemu/qmp/qmp_shell.py | 3 ---
 1 file changed, 3 deletions(-)
 rename scripts/qmp/qmp-shell => python/qemu/qmp/qmp_shell.py (99%)
 mode change 100755 => 100644

diff --git a/scripts/qmp/qmp-shell b/python/qemu/qmp/qmp_shell.py
old mode 100755
new mode 100644
similarity index 99%
rename from scripts/qmp/qmp-shell
rename to python/qemu/qmp/qmp_shell.py
index 15aedb80c2af..337acfce2d26
--- a/scripts/qmp/qmp-shell
+++ b/python/qemu/qmp/qmp_shell.py
@@ -1,4 +1,3 @@
-#!/usr/bin/env python3
 #
 # Copyright (C) 2009, 2010 Red Hat Inc.
 #
@@ -96,8 +95,6 @@
 Sequence,
 )
 
-
-sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
 from qemu import qmp
 from qemu.qmp import QMPMessage
 
-- 
2.26.2

[PATCH v2 69/72] scripts/qmp-shell: add docstrings

2020-11-03 Thread John Snow

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 1a8a4ba18ab4..15aedb80c2af 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -106,15 +106,20 @@ LOG = logging.getLogger(__name__)
 
 
 class QMPCompleter:
+"""
+QMPCompleter provides a readline library tab-complete behavior.
+"""
 # NB: Python 3.9+ will probably allow us to subclass list[str] directly,
 # but pylint as of today does not know that List[str] is simply 'list'.
 def __init__(self) -> None:
 self._matches: List[str] = []
 
 def append(self, value: str) -> None:
+"""Append a new valid completion to the list of possibilities."""
 return self._matches.append(value)
 
 def complete(self, text: str, state: int) -> Optional[str]:
+"""readline.set_completer() callback implementation."""
 for cmd in self._matches:
 if cmd.startswith(text):
 if state == 0:
@@ -124,7 +129,9 @@ class QMPCompleter:
 
 
 class QMPShellError(qmp.QMPError):
-pass
+"""
+QMP Shell Base error class.
+"""
 
 
 class FuzzyJSON(ast.NodeTransformer):
@@ -137,6 +144,9 @@ class FuzzyJSON(ast.NodeTransformer):
 @classmethod
 def visit_Name(cls,  # pylint: disable=invalid-name
node: ast.Name) -> ast.AST:
+"""
+Transform Name nodes with certain values into Constant (keyword) nodes.
+"""
 if node.id == 'true':
 return ast.Constant(value=True)
 if node.id == 'false':
@@ -147,6 +157,13 @@ class FuzzyJSON(ast.NodeTransformer):
 
 
 class QMPShell(qmp.QEMUMonitorProtocol):
+"""
+QMPShell provides a basic readline-based QMP shell.
+
+:param address: Address of the QMP server.
+:param pretty: Pretty-print QMP messages.
+:param verbose: Echo outgoing QMP messages to console.
+"""
 def __init__(self, address: qmp.SocketAddrT,
  pretty: bool = False, verbose: bool = False):
 super().__init__(address)
@@ -333,6 +350,9 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 
 def show_banner(self,
 msg: str = 'Welcome to the QMP low-level shell!') -> None:
+"""
+Print to stdio a greeting, and the QEMU version if available.
+"""
 print(msg)
 if not self._greeting:
 print('Connected')
@@ -342,6 +362,9 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 
 @property
 def prompt(self) -> str:
+"""
+Return the current shell prompt, including a trailing space.
+"""
 if self._transmode:
 return 'TRANS> '
 return '(QEMU) '
@@ -367,6 +390,9 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 return self._execute_cmd(cmdline)
 
 def repl(self) -> Iterator[None]:
+"""
+Return an iterator that implements the REPL.
+"""
 self.show_banner()
 while self.read_exec_command():
 yield
@@ -374,6 +400,13 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 
 
 class HMPShell(QMPShell):
+"""
+HMPShell provides a basic readline-based HMP shell, tunnelled via QMP.
+
+:param address: Address of the QMP server.
+:param pretty: Pretty-print QMP messages.
+:param verbose: Echo outgoing QMP messages to console.
+"""
 def __init__(self, address: qmp.SocketAddrT,
  pretty: bool = False, verbose: bool = False):
 super().__init__(address, pretty, verbose)
@@ -451,11 +484,15 @@ class HMPShell(QMPShell):
 
 
 def die(msg: str) -> NoReturn:
+"""Write an error to stderr, then exit with a return code of 1."""
 sys.stderr.write('ERROR: %s\n' % msg)
 sys.exit(1)
 
 
 def main() -> None:
+"""
+qmp-shell entry point: parse command line arguments and start the REPL.
+"""
 parser = argparse.ArgumentParser()
 parser.add_argument('-H', '--hmp', action='store_true',
 help='Use HMP interface')
-- 
2.26.2

[PATCH v2 68/72] scripts/qmp-shell: make QMPShellError inherit QMPError

2020-11-03 Thread John Snow

This way, all exceptions from the qemu.qmp namespace all derive from
QMPError.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 40ff9e0a82bd..1a8a4ba18ab4 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -123,7 +123,7 @@ class QMPCompleter:
 return None
 
 
-class QMPShellError(Exception):
+class QMPShellError(qmp.QMPError):
 pass
 
 
-- 
2.26.2

[PATCH v2 57/72] scripts/qmp-shell: add mypy types

2020-11-03 Thread John Snow

As per my usual, this patch is annotations only. Any changes with side
effects are done elsewhere.

Note: pylint does not understand the subscripts for Collection in Python 3.6,
so use the stronger Sequence type as a workaround.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 67 ++-
 1 file changed, 41 insertions(+), 26 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 670361322c51..2d0e85b5f723 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -72,10 +72,18 @@ import os
 import re
 import readline
 import sys
+from typing import (
+Iterator,
+List,
+NoReturn,
+Optional,
+Sequence,
+)
 
 
 sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
 from qemu import qmp
+from qemu.qmp import QMPMessage
 
 
 class QMPCompleter:
@@ -122,25 +130,26 @@ class FuzzyJSON(ast.NodeTransformer):
 # TODO: QMPShell's interface is a bit ugly (eg. _fill_completion() and
 #   _execute_cmd()). Let's design a better one.
 class QMPShell(qmp.QEMUMonitorProtocol):
-def __init__(self, address, pretty=False, verbose=False):
+def __init__(self, address: str, pretty: bool = False,
+ verbose: bool = False):
 super().__init__(self.parse_address(address))
-self._greeting = None
+self._greeting: Optional[QMPMessage] = None
 self._completer = QMPCompleter()
 self._pretty = pretty
 self._transmode = False
-self._actions = list()
+self._actions: List[QMPMessage] = []
 self._histfile = os.path.join(os.path.expanduser('~'),
   '.qmp-shell_history')
 self.verbose = verbose
 
-def _fill_completion(self):
+def _fill_completion(self) -> None:
 cmds = self.cmd('query-commands')
 if 'error' in cmds:
 return
 for cmd in cmds['return']:
 self._completer.append(cmd['name'])
 
-def __completer_setup(self):
+def __completer_setup(self) -> None:
 self._completer = QMPCompleter()
 self._fill_completion()
 readline.set_history_length(1024)
@@ -157,14 +166,14 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 print(f"Failed to read history '{self._histfile}': {err!s}")
 atexit.register(self.__save_history)
 
-def __save_history(self):
+def __save_history(self) -> None:
 try:
 readline.write_history_file(self._histfile)
 except IOError as err:
 print(f"Failed to save history file '{self._histfile}': {err!s}")
 
 @classmethod
-def __parse_value(cls, val):
+def __parse_value(cls, val: str) -> object:
 try:
 return int(val)
 except ValueError:
@@ -189,7 +198,9 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 pass
 return val
 
-def __cli_expr(self, tokens, parent):
+def __cli_expr(self,
+   tokens: Sequence[str],
+   parent: qmp.QMPObject) -> None:
 for arg in tokens:
 (key, sep, val) = arg.partition('=')
 if sep != '=':
@@ -215,7 +226,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 raise QMPShellError(f'Cannot set "{key}" multiple times')
 parent[optpath[-1]] = value
 
-def __build_cmd(self, cmdline):
+def __build_cmd(self, cmdline: str) -> Optional[QMPMessage]:
 """
 Build a QMP input object from a user provided command-line in the
 following format:
@@ -224,6 +235,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 """
 argument_regex = r'''(?:[^\s"']|"(?:\\.|[^"])*"|'(?:\\.|[^'])*')+'''
 cmdargs = re.findall(argument_regex, cmdline)
+qmpcmd: QMPMessage
 
 # Transactional CLI entry/exit:
 if cmdargs[0] == 'transaction(':
@@ -261,14 +273,14 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 self.__cli_expr(cmdargs[1:], qmpcmd['arguments'])
 return qmpcmd
 
-def _print(self, qmp_message):
+def _print(self, qmp_message: object) -> None:
 indent = None
 if self._pretty:
 indent = 4
 jsobj = json.dumps(qmp_message, indent=indent, sort_keys=self._pretty)
 print(str(jsobj))
 
-def _execute_cmd(self, cmdline):
+def _execute_cmd(self, cmdline: str) -> bool:
 try:
 qmpcmd = self.__build_cmd(cmdline)
 except Exception as err:
@@ -288,11 +300,12 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 self._print(resp)
 return True
 
-def connect(self, negotiate: bool = True):
+def connect(self, negotiate: bool = True) -> None:
 self._greeting = super().connect(negotiate)
 self.__completer_setup()
 
-def show_banner(self, msg='Welcome to the QMP low-level shell!'):
+def show_banner(self,
+msg: str = 'Welcome to the QMP low-level shell!') -> None:
 print(msg)

[PATCH v2 55/72] scripts/qmp-shell: initialize completer early

2020-11-03 Thread John Snow

Add an empty completer as a more type-safe placeholder instead of
'None'.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 73694035b203..670361322c51 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -125,7 +125,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 def __init__(self, address, pretty=False, verbose=False):
 super().__init__(self.parse_address(address))
 self._greeting = None
-self._completer = None
+self._completer = QMPCompleter()
 self._pretty = pretty
 self._transmode = False
 self._actions = list()
-- 
2.26.2

[PATCH v2 56/72] python/qmp: add QMPObject type alias

2020-11-03 Thread John Snow

This is meant to represent any generic object seen in a QMPMessage, not
just the root object itself.

Signed-off-by: John Snow 
---
 python/qemu/qmp/__init__.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/python/qemu/qmp/__init__.py b/python/qemu/qmp/__init__.py
index a6e1a7b85775..ba0d2281d678 100644
--- a/python/qemu/qmp/__init__.py
+++ b/python/qemu/qmp/__init__.py
@@ -41,6 +41,9 @@
 #: QMPReturnValue is the 'return' value of a command.
 QMPReturnValue = object
 
+#: QMPObject is any object in a QMP message.
+QMPObject = Dict[str, object]
+
 # QMPMessage can be outgoing commands or incoming events/returns.
 # QMPReturnValue is usually a dict/json object, but due to QAPI's
 # 'returns-whitelist', it can actually be anything.
-- 
2.26.2

[PATCH v2 63/72] scripts/qmp-shell: remove TODO

2020-11-03 Thread John Snow

We still want to revamp qmp-shell again, but there's much more to the
idea than the comment now intuits. Remove it.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 2 --
 1 file changed, 2 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 0199a13a3428..3c32b576a37d 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -130,8 +130,6 @@ class FuzzyJSON(ast.NodeTransformer):
 return node
 
 
-# TODO: QMPShell's interface is a bit ugly (eg. _fill_completion() and
-#   _execute_cmd()). Let's design a better one.
 class QMPShell(qmp.QEMUMonitorProtocol):
 def __init__(self, address: qmp.SocketAddrT,
  pretty: bool = False, verbose: bool = False):
-- 
2.26.2

[PATCH v2 54/72] scripts/qmp-shell: refactor QMPCompleter

2020-11-03 Thread John Snow

list is a generic type, but we expect to use strings directly. We could
subclass list[str], but pylint does not presently understand that
invocation.

Change this class to envelop a list instead of *being* a list, for
simpler typing.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 847d34890f97..73694035b203 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -78,9 +78,17 @@ sys.path.append(os.path.join(os.path.dirname(__file__), 
'..', '..', 'python'))
 from qemu import qmp
 
 
-class QMPCompleter(list):
-def complete(self, text, state):
-for cmd in self:
+class QMPCompleter:
+# NB: Python 3.9+ will probably allow us to subclass list[str] directly,
+# but pylint as of today does not know that List[str] is simply 'list'.
+def __init__(self) -> None:
+self._matches: List[str] = []
+
+def append(self, value: str) -> None:
+return self._matches.append(value)
+
+def complete(self, text: str, state: int) -> Optional[str]:
+for cmd in self._matches:
 if cmd.startswith(text):
 if state == 0:
 return cmd
-- 
2.26.2

Re: [PATCH v2 00/16] qapi: static typing conversion, pt3

2020-11-03 Thread John Snow


On 10/26/20 5:36 PM, John Snow wrote:

based-on: <20201026194251.11075-1-js...@redhat.com>
   [PATCH v2 00/11] qapi: static typing conversion, pt2


Ping,

This series can be reviewed independently of pt2, so I encourage you to 
try if you have the time.




Hi, this series adds static type hints to the QAPI module.
This is part three, and it focuses on expr.py.

Part 3: https://gitlab.com/jsnow/qemu/-/tree/python-qapi-cleanup-pt3
Everything: https://gitlab.com/jsnow/qemu/-/tree/python-qapi-cleanup-pt6

- Requires Python 3.6+
- Requires mypy 0.770 or newer (for type analysis only)
- Requires pylint 2.6.0 or newer (for lint checking only)

Type hints are added in patches that add *only* type hints and change no
other behavior. Any necessary changes to behavior to accommodate typing
are split out into their own tiny patches.

Every commit should pass with:
  - flake8 qapi/
  - pylint --rcfile=qapi/pylintrc qapi/
  - mypy --config-file=qapi/mypy.ini qapi/

V2:
  - Rebased on the latest V2
002/16:[0001] [FC] 'qapi/expr.py: Check for dict instead of OrderedDict'
  - Import order differences
007/16:[0006] [FC] 'qapi/expr.py: Add casts in a few select cases'
  - Import order differences
008/16:[0006] [FC] 'qapi/expr.py: add type hint annotations'
  - Import order differents
012/16:[0066] [FC] 'qapi/expr.py: Add docstrings'
  - Various docstring changes for Sphinx
014/16:[0004] [FC] 'qapi/expr.py: Use tuples instead of lists for static data'
  - Change to accommodate new 'coroutine' key
015/16:[0006] [FC] 'qapi/expr.py: move related checks inside check_xxx 
functions'
  - Fix check order (ehabkost)

John Snow (16):
   qapi/expr.py: Remove 'info' argument from nested check_if_str
   qapi/expr.py: Check for dict instead of OrderedDict
   qapi/expr.py: constrain incoming expression types
   qapi/expr.py: Add assertion for union type 'check_dict'
   qapi/expr.py: move string check upwards in check_type
   qapi/expr.py: Check type of 'data' member
   qapi/expr.py: Add casts in a few select cases
   qapi/expr.py: add type hint annotations
   qapi/expr.py: rewrite check_if
   qapi/expr.py: Remove single-letter variable
   qapi/expr.py: enable pylint checks
   qapi/expr.py: Add docstrings
   qapi/expr.py: Modify check_keys to accept any Iterable
   qapi/expr.py: Use tuples instead of lists for static data
   qapi/expr.py: move related checks inside check_xxx functions
   qapi/expr.py: Use an expression checker dispatch table

  scripts/qapi/expr.py  | 447 +++---
  scripts/qapi/mypy.ini |   5 -
  scripts/qapi/pylintrc |   1 -
  3 files changed, 334 insertions(+), 119 deletions(-)

[PATCH v2 66/72] scripts/qmp-shell: convert usage comment to docstring

2020-11-03 Thread John Snow

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 128 --
 1 file changed, 72 insertions(+), 56 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 8d5845ab4815..82fe16cff820 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -1,7 +1,5 @@
 #!/usr/bin/env python3
 #
-# Low-level QEMU shell on top of QMP.
-#
 # Copyright (C) 2009, 2010 Red Hat Inc.
 #
 # Authors:
@@ -10,60 +8,78 @@
 # This work is licensed under the terms of the GNU GPL, version 2.  See
 # the COPYING file in the top-level directory.
 #
-# Usage:
-#
-# Start QEMU with:
-#
-# # qemu [...] -qmp unix:./qmp-sock,server
-#
-# Run the shell:
-#
-# $ qmp-shell ./qmp-sock
-#
-# Commands have the following format:
-#
-#< command-name > [ arg-name1=arg1 ] ... [ arg-nameN=argN ]
-#
-# For example:
-#
-# (QEMU) device_add driver=e1000 id=net1
-# {u'return': {}}
-# (QEMU)
-#
-# key=value pairs also support Python or JSON object literal subset notations,
-# without spaces. Dictionaries/objects {} are supported as are arrays [].
-#
-#example-command arg-name1={'key':'value','obj'={'prop':"value"}}
-#
-# Both JSON and Python formatting should work, including both styles of
-# string literal quotes. Both paradigms of literal values should work,
-# including null/true/false for JSON and None/True/False for Python.
-#
-#
-# Transactions have the following multi-line format:
-#
-#transaction(
-#action-name1 [ arg-name1=arg1 ] ... [arg-nameN=argN ]
-#...
-#action-nameN [ arg-name1=arg1 ] ... [arg-nameN=argN ]
-#)
-#
-# One line transactions are also supported:
-#
-#transaction( action-name1 ... )
-#
-# For example:
-#
-# (QEMU) transaction(
-# TRANS> block-dirty-bitmap-add node=drive0 name=bitmap1
-# TRANS> block-dirty-bitmap-clear node=drive0 name=bitmap0
-# TRANS> )
-# {"return": {}}
-# (QEMU)
-#
-# Use the -v and -p options to activate the verbose and pretty-print options,
-# which will echo back the properly formatted JSON-compliant QMP that is being
-# sent to QEMU, which is useful for debugging and documentation generation.
+
+"""
+Low-level QEMU shell on top of QMP.
+
+usage: qmp-shell [-h] [-H] [-N] [-v] [-p] qmp_server
+
+positional arguments:
+  qmp_server< UNIX socket path | TCP address:port >
+
+optional arguments:
+  -h, --helpshow this help message and exit
+  -H, --hmp Use HMP interface
+  -N, --skip-negotiation
+Skip negotiate (for qemu-ga)
+  -v, --verbose Verbose (echo commands sent and received)
+  -p, --pretty  Pretty-print JSON
+
+
+Start QEMU with:
+
+# qemu [...] -qmp unix:./qmp-sock,server
+
+Run the shell:
+
+$ qmp-shell ./qmp-sock
+
+Commands have the following format:
+
+   < command-name > [ arg-name1=arg1 ] ... [ arg-nameN=argN ]
+
+For example:
+
+(QEMU) device_add driver=e1000 id=net1
+{'return': {}}
+(QEMU)
+
+key=value pairs also support Python or JSON object literal subset notations,
+without spaces. Dictionaries/objects {} are supported as are arrays [].
+
+   example-command arg-name1={'key':'value','obj'={'prop':"value"}}
+
+Both JSON and Python formatting should work, including both styles of
+string literal quotes. Both paradigms of literal values should work,
+including null/true/false for JSON and None/True/False for Python.
+
+
+Transactions have the following multi-line format:
+
+   transaction(
+   action-name1 [ arg-name1=arg1 ] ... [arg-nameN=argN ]
+   ...
+   action-nameN [ arg-name1=arg1 ] ... [arg-nameN=argN ]
+   )
+
+One line transactions are also supported:
+
+   transaction( action-name1 ... )
+
+For example:
+
+(QEMU) transaction(
+TRANS> block-dirty-bitmap-add node=drive0 name=bitmap1
+TRANS> block-dirty-bitmap-clear node=drive0 name=bitmap0
+TRANS> )
+{"return": {}}
+(QEMU)
+
+Use the -v and -p options to activate the verbose and pretty-print options,
+which will echo back the properly formatted JSON-compliant QMP that is being
+sent to QEMU, which is useful for debugging and documentation generation.
+"""
+
 import argparse
 import ast
 import json
-- 
2.26.2

[PATCH v2 41/72] scripts/qmp-shell: fix shell history exception handling

2020-11-03 Thread John Snow

We want to remove exceptions that are too broad here; we only want to
catch IOErrors that get raised as a direct result of the open call.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 2fd677e3dabd..6ad9384c1804 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -67,7 +67,6 @@
 
 import ast
 import atexit
-import errno
 import json
 import os
 import re
@@ -163,19 +162,17 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 readline.set_completer_delims('')
 try:
 readline.read_history_file(self._histfile)
-except Exception as e:
-if isinstance(e, IOError) and e.errno == errno.ENOENT:
-# File not found. No problem.
-pass
-else:
-print("Failed to read history '%s'; %s" % (self._histfile, e))
+except FileNotFoundError:
+pass
+except IOError as err:
+print(f"Failed to read history '{self._histfile}': {err!s}")
 atexit.register(self.__save_history)
 
 def __save_history(self):
 try:
 readline.write_history_file(self._histfile)
-except Exception as e:
-print("Failed to save history file '%s'; %s" % (self._histfile, e))
+except IOError as err:
+print(f"Failed to save history file '{self._histfile}': {err!s}")
 
 @classmethod
 def __parse_value(cls, val):
-- 
2.26.2

[PATCH v2 61/72] scripts/qmp-shell: Use context manager instead of atexit

2020-11-03 Thread John Snow

We can invoke the shell history writing when we leave the QMPShell scope
instead of relying on atexit. Doing so may be preferable to avoid global
state being registered from within a class instead of from the
application logic directly.

Use QMP's context manager to hook this history saving at close time,
which gets invoked when we leave the context block.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 33 ++---
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index f14fe211cca4..ec028d662e8e 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -66,7 +66,6 @@
 # sent to QEMU, which is useful for debugging and documentation generation.
 import argparse
 import ast
-import atexit
 import json
 import os
 import re
@@ -142,6 +141,11 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 self.pretty = pretty
 self.verbose = verbose
 
+def close(self) -> None:
+# Hook into context manager of parent to save shell history.
+self._save_history()
+super().close()
+
 def _fill_completion(self) -> None:
 cmds = self.cmd('query-commands')
 if 'error' in cmds:
@@ -164,9 +168,8 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 pass
 except IOError as err:
 print(f"Failed to read history '{self._histfile}': {err!s}")
-atexit.register(self.__save_history)
 
-def __save_history(self) -> None:
+def _save_history(self) -> None:
 try:
 readline.write_history_file(self._histfile)
 except IOError as err:
@@ -448,25 +451,25 @@ def main() -> None:
 parser.error("QMP socket or TCP address must be specified")
 
 shell_class = HMPShell if args.hmp else QMPShell
+
 try:
 address = shell_class.parse_address(args.qmp_server)
 except qmp.QMPBadPortError:
 parser.error(f"Bad port number: {args.qmp_server}")
 return  # pycharm doesn't know error() is noreturn
 
-qemu = shell_class(address, args.pretty, args.verbose)
+with shell_class(address, args.pretty, args.verbose) as qemu:
+try:
+qemu.connect(negotiate=not args.skip_negotiation)
+except qmp.QMPConnectError:
+die("Didn't get QMP greeting message")
+except qmp.QMPCapabilitiesError:
+die("Couldn't negotiate capabilities")
+except OSError as err:
+die(f"Couldn't connect to {args.qmp_server}: {err!s}")
 
-try:
-qemu.connect(negotiate=not args.skip_negotiation)
-except qmp.QMPConnectError:
-die("Didn't get QMP greeting message")
-except qmp.QMPCapabilitiesError:
-die("Couldn't negotiate capabilities")
-except OSError as err:
-die(f"Couldn't connect to {args.qmp_server}: {err!s}")
-
-for _ in qemu.repl():
-pass
+for _ in qemu.repl():
+pass
 
 
 if __name__ == '__main__':
-- 
2.26.2

[PATCH v2 64/72] scripts/qmp-shell: Fix empty-transaction invocation

2020-11-03 Thread John Snow

calling "transaction( )" is pointless, but valid. Rework the parser to
allow this kind of invocation. This helps clean up exception handling
later by removing accidental breakages of the parser that aren't
explicitly forbidden.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 3c32b576a37d..78e4eae00754 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -244,11 +244,14 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 cmdargs = re.findall(argument_regex, cmdline)
 qmpcmd: QMPMessage
 
-# Transactional CLI entry/exit:
-if cmdargs[0] == 'transaction(':
+# Transactional CLI entry:
+if cmdargs and cmdargs[0] == 'transaction(':
 self._transmode = True
+self._actions = []
 cmdargs.pop(0)
-elif cmdargs[0] == ')' and self._transmode:
+
+# Transactional CLI exit:
+if cmdargs and cmdargs[0] == ')' and self._transmode:
 self._transmode = False
 if len(cmdargs) > 1:
 msg = 'Unexpected input after close of Transaction sub-shell'
@@ -257,15 +260,14 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 'execute': 'transaction',
 'arguments': {'actions': self._actions}
 }
-self._actions = list()
 return qmpcmd
 
-# Nothing to process?
+# No args, or no args remaining
 if not cmdargs:
 return None
 
-# Parse and then cache this Transactional Action
 if self._transmode:
+# Parse and cache this Transactional Action
 finalize = False
 action = {'type': cmdargs[0], 'data': {}}
 if cmdargs[-1] == ')':
-- 
2.26.2

[PATCH v2 62/72] scripts/qmp-shell: use logging to show warnings

2020-11-03 Thread John Snow

Perfect candidate is non-fatal shell history messages.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index ec028d662e8e..0199a13a3428 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -67,6 +67,7 @@
 import argparse
 import ast
 import json
+import logging
 import os
 import re
 import readline
@@ -85,6 +86,9 @@ from qemu import qmp
 from qemu.qmp import QMPMessage
 
 
+LOG = logging.getLogger(__name__)
+
+
 class QMPCompleter:
 # NB: Python 3.9+ will probably allow us to subclass list[str] directly,
 # but pylint as of today does not know that List[str] is simply 'list'.
@@ -167,13 +171,15 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 except FileNotFoundError:
 pass
 except IOError as err:
-print(f"Failed to read history '{self._histfile}': {err!s}")
+msg = f"Failed to read history '{self._histfile}': {err!s}"
+LOG.warning(msg)
 
 def _save_history(self) -> None:
 try:
 readline.write_history_file(self._histfile)
 except IOError as err:
-print(f"Failed to save history file '{self._histfile}': {err!s}")
+msg = f"Failed to save history file '{self._histfile}': {err!s}"
+LOG.warning(msg)
 
 @classmethod
 def __parse_value(cls, val: str) -> object:
-- 
2.26.2

[PATCH v2 72/72] scripts/qmp-shell: add redirection shim

2020-11-03 Thread John Snow

qmp-shell has a new home, add a redirect for a little while as the dust
settles.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 11 +++
 1 file changed, 11 insertions(+)
 create mode 100755 scripts/qmp/qmp-shell

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
new file mode 100755
index ..4a20f97db708
--- /dev/null
+++ b/scripts/qmp/qmp-shell
@@ -0,0 +1,11 @@
+#!/usr/bin/env python3
+
+import os
+import sys
+
+sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
+from qemu.qmp import qmp_shell
+
+
+if __name__ == '__main__':
+qmp_shell.main()
-- 
2.26.2

[PATCH v2 71/72] python: add qmp-shell entry point

2020-11-03 Thread John Snow

now 'qmp-shell' should be available from the command line when
installing the python package.

Signed-off-by: John Snow 
---
 python/setup.cfg | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/setup.cfg b/python/setup.cfg
index f2f54bcaefe8..230fd870ea91 100644
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -42,6 +42,7 @@ console_scripts =
 qom-tree = qemu.qmp.qom:QOMTree.entry_point
 qom-fuse = qemu.qmp.qom_fuse:QOMFuse.entry_point
 qemu-ga-client = qemu.qmp.qemu_ga_client:main
+qmp-shell = qemu.qmp.qmp_shell:main
 
 [flake8]
 extend-ignore = E722  # Prefer pylint's bare-except checks to flake8's
-- 
2.26.2

[PATCH v2 37/72] scripts/qmp-shell: use triple-double-quote docstring style

2020-11-03 Thread John Snow

(2014 me had never written python before.)

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 7233c4e00b89..73b2c099f0f0 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -98,9 +98,12 @@ class QMPShellBadPort(QMPShellError):
 
 
 class FuzzyJSON(ast.NodeTransformer):
-'''This extension of ast.NodeTransformer filters literal "true/false/null"
+"""
+This extension of ast.NodeTransformer filters literal "true/false/null"
 values in an AST and replaces them by proper "True/False/None" values that
-Python can properly evaluate.'''
+Python can properly evaluate.
+"""
+
 @classmethod
 def visit_Name(cls, node):
 if node.id == 'true':
-- 
2.26.2

[PATCH v2 60/72] python/qmp: return generic type from context manager

2020-11-03 Thread John Snow

__enter__ can be invoked from a subclass, so it needs a more specific
type.

Signed-off-by: John Snow 
---
 python/qemu/qmp/__init__.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/qemu/qmp/__init__.py b/python/qemu/qmp/__init__.py
index ba0d2281d678..376954cb6d27 100644
--- a/python/qemu/qmp/__init__.py
+++ b/python/qemu/qmp/__init__.py
@@ -30,6 +30,7 @@
 TextIO,
 Tuple,
 Type,
+TypeVar,
 Union,
 cast,
 )
@@ -220,7 +221,9 @@ def __get_events(self, wait: Union[bool, float] = False) -> 
None:
 if ret is None:
 raise QMPConnectError("Error while reading from socket")
 
-def __enter__(self) -> 'QEMUMonitorProtocol':
+T = TypeVar('T')
+
+def __enter__(self: T) -> T:
 # Implement context manager enter function.
 return self
 
-- 
2.26.2

[PATCH v2 42/72] scripts/qmp-shell: explicitly chain exception context

2020-11-03 Thread John Snow

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 6ad9384c1804..b3ab735397d4 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -138,8 +138,8 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 if len(addr) == 2:
 try:
 port = int(addr[1])
-except ValueError:
-raise QMPShellBadPort
+except ValueError as err:
+raise QMPShellBadPort from err
 return addr[0], port
 # socket path
 return arg
-- 
2.26.2

[PATCH v2 58/72] scripts/qmp-shell: Accept SocketAddrT instead of string

2020-11-03 Thread John Snow

Don't extend QEMUMonitorProtocol but change the argument types; move the
string parsing just outside of the class.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 2d0e85b5f723..b465c7f9e2d6 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -130,9 +130,9 @@ class FuzzyJSON(ast.NodeTransformer):
 # TODO: QMPShell's interface is a bit ugly (eg. _fill_completion() and
 #   _execute_cmd()). Let's design a better one.
 class QMPShell(qmp.QEMUMonitorProtocol):
-def __init__(self, address: str, pretty: bool = False,
- verbose: bool = False):
-super().__init__(self.parse_address(address))
+def __init__(self, address: qmp.SocketAddrT,
+ pretty: bool = False, verbose: bool = False):
+super().__init__(address)
 self._greeting: Optional[QMPMessage] = None
 self._completer = QMPCompleter()
 self._pretty = pretty
@@ -347,7 +347,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 
 
 class HMPShell(QMPShell):
-def __init__(self, address: str,
+def __init__(self, address: qmp.SocketAddrT,
  pretty: bool = False, verbose: bool = False):
 super().__init__(address, pretty, verbose)
 self.__cpu_index = 0
@@ -450,11 +450,13 @@ def main() -> None:
 
 shell_class = HMPShell if args.hmp else QMPShell
 try:
-qemu = shell_class(args.qmp_server, args.pretty, args.verbose)
+address = shell_class.parse_address(args.qmp_server)
 except qmp.QMPBadPortError:
 parser.error(f"Bad port number: {args.qmp_server}")
 return  # pycharm doesn't know error() is noreturn
 
+qemu = shell_class(address, args.pretty, args.verbose)
+
 try:
 qemu.connect(negotiate=not args.skip_negotiation)
 except qmp.QMPConnectError:
-- 
2.26.2

[PATCH v2 67/72] scripts/qmp-shell: remove double-underscores

2020-11-03 Thread John Snow

They're not needed; single underscore is enough to express intent that
these methods are "internal". double underscore is used as a weak name
mangling, but that isn't beneficial for us here.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 52 +--
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index 82fe16cff820..40ff9e0a82bd 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -171,7 +171,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 for cmd in cmds['return']:
 self._completer.append(cmd['name'])
 
-def __completer_setup(self) -> None:
+def _completer_setup(self) -> None:
 self._completer = QMPCompleter()
 self._fill_completion()
 readline.set_history_length(1024)
@@ -196,7 +196,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 LOG.warning(msg)
 
 @classmethod
-def __parse_value(cls, val: str) -> object:
+def _parse_value(cls, val: str) -> object:
 try:
 return int(val)
 except ValueError:
@@ -221,9 +221,9 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 pass
 return val
 
-def __cli_expr(self,
-   tokens: Sequence[str],
-   parent: qmp.QMPObject) -> None:
+def _cli_expr(self,
+  tokens: Sequence[str],
+  parent: qmp.QMPObject) -> None:
 for arg in tokens:
 (key, sep, val) = arg.partition('=')
 if sep != '=':
@@ -231,7 +231,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 f"Expected a key=value pair, got '{arg!s}'"
 )
 
-value = self.__parse_value(val)
+value = self._parse_value(val)
 optpath = key.split('.')
 curpath = []
 for path in optpath[:-1]:
@@ -249,7 +249,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 raise QMPShellError(f'Cannot set "{key}" multiple times')
 parent[optpath[-1]] = value
 
-def __build_cmd(self, cmdline: str) -> Optional[QMPMessage]:
+def _build_cmd(self, cmdline: str) -> Optional[QMPMessage]:
 """
 Build a QMP input object from a user provided command-line in the
 following format:
@@ -289,13 +289,13 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 if cmdargs[-1] == ')':
 cmdargs.pop(-1)
 finalize = True
-self.__cli_expr(cmdargs[1:], action['data'])
+self._cli_expr(cmdargs[1:], action['data'])
 self._actions.append(action)
-return self.__build_cmd(')') if finalize else None
+return self._build_cmd(')') if finalize else None
 
 # Standard command: parse and return it to be executed.
 qmpcmd = {'execute': cmdargs[0], 'arguments': {}}
-self.__cli_expr(cmdargs[1:], qmpcmd['arguments'])
+self._cli_expr(cmdargs[1:], qmpcmd['arguments'])
 return qmpcmd
 
 def _print(self, qmp_message: object) -> None:
@@ -306,7 +306,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 
 def _execute_cmd(self, cmdline: str) -> bool:
 try:
-qmpcmd = self.__build_cmd(cmdline)
+qmpcmd = self._build_cmd(cmdline)
 except QMPShellError as err:
 print(
 f"Error while parsing command line: {err!s}\n"
@@ -329,7 +329,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 
 def connect(self, negotiate: bool = True) -> None:
 self._greeting = super().connect(negotiate)
-self.__completer_setup()
+self._completer_setup()
 
 def show_banner(self,
 msg: str = 'Welcome to the QMP low-level shell!') -> None:
@@ -377,10 +377,10 @@ class HMPShell(QMPShell):
 def __init__(self, address: qmp.SocketAddrT,
  pretty: bool = False, verbose: bool = False):
 super().__init__(address, pretty, verbose)
-self.__cpu_index = 0
+self._cpu_index = 0
 
-def __cmd_completion(self) -> None:
-for cmd in self.__cmd_passthrough('help')['return'].split('\r\n'):
+def _cmd_completion(self) -> None:
+for cmd in self._cmd_passthrough('help')['return'].split('\r\n'):
 if cmd and cmd[0] != '[' and cmd[0] != '\t':
 name = cmd.split()[0]  # drop help text
 if name == 'info':
@@ -396,22 +396,22 @@ class HMPShell(QMPShell):
 self._completer.append(name)
 self._completer.append('help ' + name)  # help completion
 
-def __info_completion(self) -> None:
-for cmd in self.__cmd_passthrough('info')['return'].split('\r\n'):
+def _info_completion(self) -> None:
+for cmd in self._cmd_passthrough('info')['return'].split('\r\n'):
 if cmd:
 self._completer.append('info ' + cmd.split()[1])
 
-def __other_completion(self) ->

[PATCH v2 53/72] scripts/qmp-shell: Fix "FuzzyJSON" parser

2020-11-03 Thread John Snow

I'm not sure when this regressed (Or maybe if it was ever working right
to begin with?), but the Python AST requires you to change "Names" to
"Constants" in order to truly convert `false` to `False`.

Signed-off-by: John Snow 
---
 scripts/qmp/qmp-shell | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index aa148517a864..847d34890f97 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -95,18 +95,19 @@ class QMPShellError(Exception):
 class FuzzyJSON(ast.NodeTransformer):
 """
 This extension of ast.NodeTransformer filters literal "true/false/null"
-values in an AST and replaces them by proper "True/False/None" values that
-Python can properly evaluate.
+values in a Python AST and replaces them by proper "True/False/None" values
+that Python can properly evaluate.
 """
 
 @classmethod
-def visit_Name(cls, node):  # pylint: disable=invalid-name
+def visit_Name(cls,  # pylint: disable=invalid-name
+   node: ast.Name) -> ast.AST:
 if node.id == 'true':
-node.id = 'True'
+return ast.Constant(value=True)
 if node.id == 'false':
-node.id = 'False'
+return ast.Constant(value=False)
 if node.id == 'null':
-node.id = 'None'
+return ast.Constant(value=None)
 return node
 
 
@@ -174,10 +175,9 @@ class QMPShell(qmp.QEMUMonitorProtocol):
 # Try once again as FuzzyJSON:
 try:
 tree = ast.parse(val, mode='eval')
-return ast.literal_eval(FuzzyJSON().visit(tree))
-except SyntaxError:
-pass
-except ValueError:
+transformed = FuzzyJSON().visit(tree)
+return ast.literal_eval(transformed)
+except (SyntaxError, ValueError):
 pass
 return val
 
-- 
2.26.2

1 2 3 4 5 6 >

1 - 100 of 510 matches

Mail list logo