date:20200316

Re: [PATCH] target/i386: Add ARCH_CAPABILITIES related bits into Icelake-Server CPU model

2020-03-16 Thread Paolo Bonzini

On 16/03/20 06:33, Xiaoyao Li wrote:
> Current Icelake-Server CPU model lacks all the features enumerated by
> MSR_IA32_ARCH_CAPABILITIES.
> 
> Add them, so that guests using the "Icelake-Server" model can see all of them.
> 
> Signed-off-by: Xiaoyao Li 
> ---
>  target/i386/cpu.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 92fafa265914..5f09d114e1c2 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -3425,7 +3425,12 @@ static X86CPUDefinition builtin_x86_defs[] = {
>  CPUID_7_0_ECX_AVX512VNNI | CPUID_7_0_ECX_AVX512BITALG |
>  CPUID_7_0_ECX_AVX512_VPOPCNTDQ | CPUID_7_0_ECX_LA57,
>  .features[FEAT_7_0_EDX] =
> -CPUID_7_0_EDX_SPEC_CTRL | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
> +CPUID_7_0_EDX_SPEC_CTRL | CPUID_7_0_EDX_ARCH_CAPABILITIES |
> +CPUID_7_0_EDX_SPEC_CTRL_SSBD,
> +.features[FEAT_ARCH_CAPABILITIES] =
> +MSR_ARCH_CAP_RDCL_NO | MSR_ARCH_CAP_IBRS_ALL |
> +MSR_ARCH_CAP_SKIP_L1DFL_VMENTRY | MSR_ARCH_CAP_MDS_NO |
> +MSR_ARCH_CAP_PSCHANGE_MC_NO | MSR_ARCH_CAP_TAA_NO,
>  /* Missing: XSAVES (not supported by some Linux versions,
>  * including v4.1 to v4.12).
>  * KVM doesn't yet expose any XSAVES state save component,
> 

Hi Xiaoyao,

you need to add them as a new version of the CPU model.

Paolo
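The bits named in the patch occupy fixed positions in the IA32_ARCH_CAPABILITIES MSR. The C sketch below is illustrative only. The macro and function names are hypothetical, not QEMU's. The bit positions are assumed from Intel's documentation as mirrored in Linux's msr-index.h. The sketch computes the raw MSR value a guest of this model would read if exactly the six proposed bits were exposed.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_ARCH_CAPABILITIES (MSR 0x10A) bit positions, assumed from Intel's
 * documentation as mirrored in Linux's msr-index.h. The names are local to
 * this sketch and are not QEMU's MSR_ARCH_CAP_* definitions. */
#define ARCH_CAP_RDCL_NO            (1ULL << 0)
#define ARCH_CAP_IBRS_ALL           (1ULL << 1)
#define ARCH_CAP_RSBA               (1ULL << 2)
#define ARCH_CAP_SKIP_L1DFL_VMENTRY (1ULL << 3)
#define ARCH_CAP_SSB_NO             (1ULL << 4)
#define ARCH_CAP_MDS_NO             (1ULL << 5)
#define ARCH_CAP_PSCHANGE_MC_NO     (1ULL << 6)
#define ARCH_CAP_TSX_CTRL_MSR       (1ULL << 7)
#define ARCH_CAP_TAA_NO             (1ULL << 8)

/* The six bits the Icelake-Server patch adds to FEAT_ARCH_CAPABILITIES. */
static uint64_t icelake_server_arch_caps(void)
{
    return ARCH_CAP_RDCL_NO | ARCH_CAP_IBRS_ALL |
           ARCH_CAP_SKIP_L1DFL_VMENTRY | ARCH_CAP_MDS_NO |
           ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TAA_NO;
}

/* Nonzero if every bit in 'required' is set in 'caps'. */
static int arch_caps_has(uint64_t caps, uint64_t required)
{
    return (caps & required) == required;
}
```

Under these assumptions the mask works out to 0x16b. RSBA, SSB_NO and TSX_CTRL_MSR are listed only for contrast and are not part of the proposed set.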

Re: [PATCH] checkpatch: enforce process for expected files

2020-03-16 Thread Stefan Hajnoczi

On Sun, Mar 15, 2020 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> If the process documented in tests/qtest/bios-tables-test.c
> is followed, then the same patch never touches both expected
> files and code. Teach checkpatch to enforce this rule.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> Peter, Igor what do you think?

Minor comments below:

Reviewed-by: Stefan Hajnoczi 

> 
>  scripts/checkpatch.pl | 24 
>  1 file changed, 24 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index b27e4ff5e9..96583e3fff 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -35,6 +35,8 @@ my $summary_file = 0;
>  my $root;
>  my %debug;
>  my $help = 0;
> +my $testexpected;
> +my $nontestexpected;

If you respin, please add acpi to these variable names since they are
specific to acpi.

>  
>  sub help {
>   my ($exitcode) = @_;
> @@ -1256,6 +1258,26 @@ sub WARN {
>   }
>  }
>  
> +# According to tests/qtest/bios-tables-test.c: do not
> +# change expected file in the same commit with adding test
> +sub checkfilename {
> + my ($name) = @_;

There is a tab instead of spaces here that could be fixed if you decide
to respin.

> +if ($name =~ m#^tests/data/acpi/# and
> +# make exception for a shell script that rebuilds the files
> +not $name =~ m#^\.sh$# or
> +$name =~ m#^tests/qtest/bios-tables-test-allowed-diff.h$#) {
> +$testexpected = $name;
> +} else {
> +$nontestexpected = $name;
> +}
> +if (defined $testexpected and defined $nontestexpected) {
> +ERROR("Do not add expected files together with tests, " .
> +  "follow instructions in " .
> +  "tests/qtest/bios-tables-test.c: both " .
> +  $testexpected . " and " . $nontestexpected . " found\n");
> +}
> +}
> +
>  sub process {
>   my $filename = shift;
>  
> @@ -1431,9 +1453,11 @@ sub process {
>   if ($line =~ /^diff --git.*?(\S+)$/) {
>   $realfile = $1;
>   $realfile =~ s@^([^/]*)/@@ if (!$file);
> +checkfilename($realfile);
>   } elsif ($line =~ /^\+\+\+\s+(\S+)/) {
>   $realfile = $1;
>   $realfile =~ s@^([^/]*)/@@ if (!$file);
> +checkfilename($realfile);

The surrounding lines in this hunk use tab indentation, not spaces.
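A rough C rendering of the rule that checkfilename() is meant to enforce may help when reviewing the Perl. This is a sketch, not checkpatch code, and the helper and struct names are hypothetical. One difference is deliberate. The Perl test m#^\.sh$# is anchored at both ends, so it only matches a path that is literally ".sh". The sketch assumes the comment's intent, which is to exempt the shell script under tests/data/acpi/, and uses a suffix check instead. As in Perl, where `not` binds tighter than `and`, which binds tighter than `or`, the condition groups as (A and not B) or C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

static bool ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* True for ACPI expected-table files (minus shell scripts) and for the
 * allowed-diff header, mirroring the Perl branch that sets $testexpected. */
static bool is_acpi_expected_file(const char *name)
{
    return (starts_with(name, "tests/data/acpi/") && !ends_with(name, ".sh")) ||
           strcmp(name, "tests/qtest/bios-tables-test-allowed-diff.h") == 0;
}

/* Most recent file of each kind seen in the current patch. */
struct acpi_patch_state {
    const char *expected;
    const char *other;
};

/* Record one file name; returns true once the patch mixes both kinds,
 * which is when the Perl version reports its ERROR. */
static bool acpi_record_file(struct acpi_patch_state *st, const char *name)
{
    if (is_acpi_expected_file(name)) {
        st->expected = name;
    } else {
        st->other = name;
    }
    return st->expected != NULL && st->other != NULL;
}
```

As in the patch, an exempted script is counted as a non-expected file. A patch touching both an expected blob and the script would therefore still be flagged.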



[PATCH] travis.yml: Set G_MESSAGES_DEBUG to report GLib errors

2020-03-16 Thread Philippe Mathieu-Daudé

Since commit f5852efa293, we can display GLib errors with the QEMU
error reporting API. Set G_MESSAGES_DEBUG to the 'error' level, as this
helps in understanding failures of QEMU calls to GLib on Travis-CI.

Signed-off-by: Philippe Mathieu-Daudé 
---
 .travis.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.travis.yml b/.travis.yml
index b92798ac3b..ccf68aa9ab 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -79,6 +79,7 @@ env:
 - MAIN_SOFTMMU_TARGETS="aarch64-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu"
 - CCACHE_SLOPPINESS="include_file_ctime,include_file_mtime"
 - CCACHE_MAXSIZE=1G
+- G_MESSAGES_DEBUG=error
 
 
 git:
-- 
2.21.1

[Bug 1866870] Re: KVM Guest pauses after upgrade to Ubuntu 20.04

2020-03-16 Thread Christian Ehrhardt 

Thanks David!

While bisecting upstream git with just "-cpu Penryn", we have seen that it
always works there. So it might be an interaction between some Ubuntu
build/packaging/configure detail and these old chips.

While we still can't be sure whether the VMX warnings are a red herring,
chances are that only "-cpu Penryn,vmx=on" will trigger the issue.
Andreas will test and bisect with that once he is back online, and then
we will see if that is any different.

I also built an Ubuntu-esque 4.2 with the Penryn changes of [1]
reverted, just to complete the interim picture of our testing. It is
available for testing at [2]. Furthermore, I added an Ubuntu build with
rather crude reverts of almost all VMX-related 4.2 changes at [3].

[1]: https://git.qemu.org/?p=qemu.git;a=commit;h=0723cc8a5558c94388db75ae1f4991314914edd3
[2]: https://launchpad.net/~paelzer/+archive/ubuntu/bug-1866870-qemu-penryn-crash
[3]: https://launchpad.net/~paelzer/+archive/ubuntu/bug-1866870-qemu-penryn-crash-fullreverts

-- 
https://bugs.launchpad.net/bugs/1866870

Title:
  KVM Guest pauses after upgrade to Ubuntu 20.04

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  Symptom:
  Error unpausing domain: internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required

  Traceback (most recent call last):
    File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
      callback(asyncjob, *args, **kwargs)
    File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
      callback(*args, **kwargs)
    File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 66, in newfn
      ret = fn(self, *args, **kwargs)
    File "/usr/share/virt-manager/virtManager/object/domain.py", line 1311, in resume
      self._backend.resume()
    File "/usr/lib/python3/dist-packages/libvirt.py", line 2174, in resume
      if ret == -1: raise libvirtError ('virDomainResume() failed', dom=self)
  libvirt.libvirtError: internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required

  
  ---

  As outlined here:
  https://bugs.launchpad.net/qemu/+bug/1813165/comments/15

  After upgrade, all KVM guests are in a default pause state. Even after
  forcing them off via virsh, and restarting them the guests are paused.

  These Guests are not nested.

  A lot of diagnostic information is outlined in the previous bug
  report linked above. The solution mentioned in that report had
  allegedly been integrated into the downstream updates.


[PATCH v7 3/4] qcow2: add zstd cluster compression

2020-03-16 Thread Denis Plotnikov

zstd significantly reduces cluster compression time.
It provides better compression performance while maintaining
the same compression ratio as zlib, which is currently the
only compression method available.

The performance test results:
The test compresses and decompresses a qcow2 image containing a
freshly installed RHEL 7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted on a brd (RAM block device) disk to reduce
the influence of the disk subsystem on the test results.
The results are given in seconds.

compress cmd:
  time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
  src.img [zlib|zstd]_compressed.img
decompress cmd:
  time ./qemu-img convert -O qcow2
  [zlib|zstd]_compressed.img uncompressed.img

              compression           decompression
          zlib       zstd           zlib       zstd

real      65.5       16.3 (-75 %)    1.9        1.6 (-16 %)
user      65.0       15.8            5.3        2.5
sys        3.3        0.2            2.0        2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G
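You can check the relative figures in the table against the absolute timings. The calculation below is plain arithmetic on the numbers above. The ratio assumes that the 2.2G and 1.4G sizes use the same unit.

```latex
\begin{aligned}
\text{compression (real):}\quad & \frac{65.5 - 16.3}{65.5} = \frac{49.2}{65.5} \approx 0.751 \;\Rightarrow\; -75\,\% \\
\text{decompression (real):}\quad & \frac{1.9 - 1.6}{1.9} = \frac{0.3}{1.9} \approx 0.158 \;\Rightarrow\; -16\,\% \\
\text{compression ratio:}\quad & \frac{2.2\,\mathrm{G}}{1.4\,\mathrm{G}} \approx 1.571 \;\Rightarrow\; 1.57
\end{aligned}
```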

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
QAPI part:
Acked-by: Markus Armbruster 
---
 docs/interop/qcow2.txt |  20 +++
 configure  |   2 +-
 qapi/block-core.json   |   3 +-
 block/qcow2-threads.c  | 124 +
 block/qcow2.c  |   7 +++
 5 files changed, 154 insertions(+), 2 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 5597e24474..9048114445 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -208,6 +208,7 @@ version 2.
 
 Available compression type values:
 0: zlib 
+1: zstd 
 
 
 === Header padding ===
@@ -575,11 +576,30 @@ Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
 Another compressed cluster may map to the tail of the final
 sector used by this compressed cluster.
 
+The layout of the compressed data depends on the compression
+type used for the image (see compressed cluster layout).
+
 If a cluster is unallocated, read requests shall read the data from the backing
 file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+=== Compressed Cluster Layout ===
+
+The compressed cluster data has a layout depending on the compression
+type used for the image, as follows:
+
+Compressed data layout for the available compression types:
+data_space_length - data chunk length available to store a compressed cluster.
+(for more details see "Compressed Clusters Descriptor")
+x = data_space_length - 1
+
+0:  (default)  zlib :
+Byte  0 -  x: the compressed data content
+  all the space provided used for compressed data
+1:  zstd :
+Byte  0 -  3: the length of compressed data in bytes
+  4 -  x: the compressed data content
 
 == Snapshots ==
 
diff --git a/configure b/configure
index caa65f5883..b2a0aa241a 100755
--- a/configure
+++ b/configure
@@ -1835,7 +1835,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   lzfse   support of lzfse compression library
   (for reading lzfse-compressed dmg images)
   zstd    support for zstd compression library
-  (for migration compression)
+  (for migration compression and qcow2 cluster compression)
   seccomp seccomp support
   coroutine-pool  coroutine freelist (better performance)
   glusterfs   GlusterFS backend
diff --git a/qapi/block-core.json b/qapi/block-core.json
index a306484973..8953451818 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4401,11 +4401,12 @@
 # Compression type used in qcow2 image file
 #
 # @zlib: zlib compression, see 
+# @zstd: zstd compression, see 
 #
 # Since: 5.0
 ##
 { 'enum': 'Qcow2CompressionType',
-  'data': [ 'zlib' ] }
+  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
 
 ##
 # @BlockdevCreateOptionsQcow2:
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..b2d1c6d395 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
 #define ZLIB_CONST
 #include 
 
+#ifdef CONFIG_ZSTD
+#include 
+#include 
+#endif
+
 #include "qcow2.h"
 #include "block/thread-pool.h"
 #include "crypto.h"
@@ -166,6 +171,115 @@ static ssize_t
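The zstd layout added to docs/interop/qcow2.txt above prefixes the compressed data with a 4-byte length. The minimal C sketch below shows how a reader might locate the payload inside a compressed cluster's data space. It assumes that the length is stored big-endian. That is an assumption of this sketch: qcow2 header fields are big-endian, but the spec text in this patch does not say so. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value (endianness assumed, see above). */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Locate the zstd payload inside a compressed cluster's data space.
 * Layout (compression type 1): bytes 0-3 hold the payload length and
 * bytes 4..x hold the compressed data, where x = data_space_length - 1.
 * Returns 0 and fills *payload / *payload_len on success, or -1 if the
 * recorded length does not fit in the available space.
 */
static int zstd_cluster_payload(const uint8_t *data, size_t data_space_length,
                                const uint8_t **payload, size_t *payload_len)
{
    uint32_t len;

    if (data_space_length < 4) {
        return -1;              /* no room even for the length field */
    }
    len = load_be32(data);
    if (len > data_space_length - 4) {
        return -1;              /* corrupt or truncated cluster */
    }
    *payload = data + 4;
    *payload_len = len;
    return 0;
}
```

A real reader would then pass the payload and its length to zstd's ZSTD_decompress() together with a cluster-sized output buffer. The bounds check matters because the length field comes from the image file and cannot be trusted.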

Re: [PATCH] cpus: avoid getting stuck in pause_all_vcpus due to race

2020-03-16 Thread Paolo Bonzini

On 16/03/20 09:37, Longpeng(Mike) wrote:
> From: Longpeng 
> 
> We found an issue when repeatedly rebooting the guest during migration: it
> causes the migration thread never to be woken up again.
> 
>                                       |
>                                       |
> LOCK BQL                              |
> ...                                   |
> main_loop_should_exit                 |
>  pause_all_vcpus                      |
>   1. set all cpus ->stop=true         |
>      and then kick                    |
>   2. return if all cpus is paused     |
>      (by '->stopped == true'), else   |
>   3. qemu_cond_wait [BQL UNLOCK]      |
>                                       |LOCK BQL
>                                       |...
>                                       |do_vm_stop
>                                       | pause_all_vcpus
>                                       |  (A)set all cpus ->stop=true
>                                       |     and then kick
>                                       |  (B)return if all cpus is paused
>                                       |     (by '->stopped == true'), else
>                                       |  (C)qemu_cond_wait [BQL UNLOCK]
>   4. be waken up and LOCK BQL         |  (D)be waken up BUT wait for  BQL
>   5. goto 2.                          |
>      (BQL is still LOCKed)            |
>  resume_all_vcpus                     |
>   1. set all cpus ->stop=false        |
>      and ->stopped=false              |
> ...                                   |
> BQL UNLOCK                            |  (E)LOCK BQL
>                                       |  (F)goto B. [but stopped is false now!]
>                                       |Finally, sleep at step 3 forever.
> 
> As suggested by Paolo, resume_all_vcpus should notice this race, so we need
> to move the change of runstate before pause_all_vcpus in do_vm_stop() and
> ignore the resume request if runstate is not running.
> 
> Cc: Paolo Bonzini 
> Cc: Dr . David Alan Gilbert 
> Cc: Richard Henderson 
> Signed-off-by: Longpeng 

Queued, thanks!

Paolo

> ---
>  cpus.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/cpus.c b/cpus.c
> index b4f8b84..ef441bd 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1026,9 +1026,9 @@ static int do_vm_stop(RunState state, bool send_stop)
>  int ret = 0;
>  
>  if (runstate_is_running()) {
> +runstate_set(state);
>  cpu_disable_ticks();
>  pause_all_vcpus();
> -runstate_set(state);
>  vm_state_notify(0, state);
>  if (send_stop) {
>  qapi_event_send_stop();
> @@ -1899,6 +1899,10 @@ void resume_all_vcpus(void)
>  {
>  CPUState *cpu;
>  
> +if (!runstate_is_running()) {
> +return;
> +}
> +
>  qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
>  CPU_FOREACH(cpu) {
>  cpu_resume(cpu);
>
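A toy, single-threaded model of the interleaving in the diagram shows the combined effect of the two hunks. This sketch makes heavy simplifying assumptions: one vCPU that acknowledges a stop request immediately, no BQL and no condition variable. Its function names are hypothetical. Because do_vm_stop() now changes the runstate before pausing, the main thread's later resume_all_vcpus() sees a non-running state and returns early. The check at step (B) therefore still finds the vCPU stopped.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the run state and one vCPU's flags. */
enum run_state { RUN_STATE_RUNNING, RUN_STATE_PAUSED };

static enum run_state current_state = RUN_STATE_RUNNING;
static bool vcpu_stop;      /* stop requested, like CPUState::stop */
static bool vcpu_stopped;   /* stop acknowledged, like CPUState::stopped */

static bool runstate_is_running(void)
{
    return current_state == RUN_STATE_RUNNING;
}

/* Step (A): request a stop; in this model the vCPU acknowledges at once. */
static void request_pause(void)
{
    vcpu_stop = true;
    vcpu_stopped = true;
}

/* Step (B): pause_all_vcpus() may return only once the vCPU is stopped. */
static bool all_vcpus_paused(void)
{
    return vcpu_stopped;
}

/* resume_all_vcpus() with the patch's early return. */
static void resume_all_vcpus_model(void)
{
    if (!runstate_is_running()) {
        return;                 /* a stop is in progress: ignore resume */
    }
    vcpu_stop = false;
    vcpu_stopped = false;
}

/* do_vm_stop() with the patch: runstate changes before pausing. */
static void do_vm_stop_begin(enum run_state state)
{
    if (runstate_is_running()) {
        current_state = state;
        request_pause();
    }
}
```

Without the early return in resume_all_vcpus_model(), the same sequence would clear vcpu_stopped. That is the state in which step (F) goes back to sleep forever.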

Re: [PATCH v6 1/4] qcow2: introduce compression type feature

2020-03-16 Thread Denis Plotnikov


Thanks for the comments.
I'll make the fixes accordingly and resend the series shortly.

Denis

On 14.03.2020 00:40, Eric Blake wrote:

On 3/12/20 4:22 AM, Denis Plotnikov wrote:

The patch adds some preparation parts for the incompatible compression type
feature to qcow2, allowing the use of different compression methods for
(de)compressing image clusters.

It is implied that the compression type is set at image creation and
can be changed later only by image conversion; thus the compression type
defines the single compression algorithm used for the image and,
therefore, for all image clusters.

The goal of the feature is to add support for other compression methods
to qcow2, for example ZSTD, which is more effective at compression
than ZLIB.


The default compression is ZLIB. Images created with ZLIB compression type
are backward compatible with older qemu versions.

Adding the compression type breaks a number of tests, because the
compression type is now reported on image creation and the qcow2 header
changes in size and offsets.

The tests are fixed in the following ways:
 * filter out compression_type for all the tests


Presumably this filter is optional, and we will not use it on the 
specific new tests that prove zstd compression works - but that should 
be later in the series, so for this patch it is okay.



 * fix header size, feature table size and backing file offset
   affected tests: 031, 036, 061, 080
   header_size +=8: 1 byte compression type
    7 bytes padding
   feature_table += 48: incompatible feture compression type


feature

   backing_file_offset += 56 (8 + 48 -> header_change + fature_table_change)


feature

(interesting that you have two different changed spellings ;)

 * add "compression type" for test output matching when it isn't filtered
   affected tests: 049, 060, 061, 065, 144, 182, 242, 255


Or maybe the comment above should be changed to "many tests" rather 
than "all the tests".




Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
  qapi/block-core.json |  22 +-
  block/qcow2.h    |  20 -
  include/block/block_int.h    |   1 +
  block/qcow2.c    | 121 +++
  tests/qemu-iotests/031.out   |  14 ++--
  tests/qemu-iotests/036.out   |   4 +-
  tests/qemu-iotests/049.out   | 102 +-
  tests/qemu-iotests/060.out   |   1 +
  tests/qemu-iotests/061.out   |  34 +
  tests/qemu-iotests/065   |  28 ---
  tests/qemu-iotests/080   |   2 +-
  tests/qemu-iotests/144.out   |   4 +-
  tests/qemu-iotests/182.out   |   2 +-
  tests/qemu-iotests/242.out   |   5 ++
  tests/qemu-iotests/255.out   |   8 +-
  tests/qemu-iotests/common.filter |   3 +-
  16 files changed, 275 insertions(+), 96 deletions(-)




+++ b/block/qcow2.h
@@ -146,8 +146,16 @@ typedef struct QCowHeader {
    uint32_t refcount_order;
  uint32_t header_length;
+
+    /* Additional fields */
+    uint8_t  compression_type;
+
+    /* header must be a multiple of 8 */
+    uint8_t  padding[7];


Why two spaces after uint8_t (twice)?



@@ -369,6 +380,13 @@ typedef struct BDRVQcow2State {
    bool metadata_preallocation_checked;
  bool metadata_preallocation;
+    /*
+ * Compression type used for the image. Default: 0 - ZLIB
+ * The image compression type is set on image creation.
+ * The only way to change the compression type is to convert the image
+ * with the desired compression type set


Missing trailing '.'.  Maybe someday we can get 'qemu-img amend' to 
also adjust the compression type in-place; if that's something we 
think we might do, then this could be better worded as "For now, the 
only way to change...".



+++ b/block/qcow2.c
@@ -1242,6 +1242,48 @@ static int qcow2_update_options(BlockDriverState *bs, QDict *options,

  return ret;
  }
  +static int validate_compression_type(BDRVQcow2State *s, Error **errp)



+
+static int qcow2_compression_type_from_format(const char *ct)
+{
+    if (g_str_equal(ct, "zlib")) {
+    return QCOW2_COMPRESSION_TYPE_ZLIB;
+    } else {
+    return -EINVAL;
+    }


Why are you open-coding this?

qapi_enum_parse(_lookup, ct, -1, errp)

should do what you use this for, and automatically updates itself when 
you add zstd to the qapi enum later.
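The self-contained C sketch below illustrates Eric's suggestion with table-driven enum parsing. It is not the QAPI API. The enum, the table and enum_parse() are hypothetical stand-ins for what the QAPI generator and qapi_enum_parse() provide. The real function can also report an error through errp, which this sketch omits. The point is that adding "zstd" to the table is the only change the parser needs in order to accept it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the QAPI-generated enum and lookup table. */
typedef enum {
    QCOW2_COMPRESSION_TYPE_ZLIB,
    QCOW2_COMPRESSION_TYPE_ZSTD,
    QCOW2_COMPRESSION_TYPE__MAX,
} Qcow2CompressionType;

static const char *const qcow2_compression_type_names[] = {
    [QCOW2_COMPRESSION_TYPE_ZLIB] = "zlib",
    [QCOW2_COMPRESSION_TYPE_ZSTD] = "zstd",
};

/* Table-driven parse in the spirit of qapi_enum_parse(): returns the
 * index of 'name', or 'def' if 'name' is NULL or not in the table. */
static int enum_parse(const char *const *names, int count,
                      const char *name, int def)
{
    if (name == NULL) {
        return def;
    }
    for (int i = 0; i < count; i++) {
        if (strcmp(names[i], name) == 0) {
            return i;
        }
    }
    return def;
}
```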



@@ -3401,6 +3493,8 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)

  .refcount_table_offset  = cpu_to_be64(cluster_size),
  .refcount_table_clusters    = cpu_to_be32(1),
  .refcount_order = cpu_to_be32(refcount_order),
+    /* don't deal with endians since compression_type is 1 byte long */


endianness


+    .compression_type   = compression_type,
  .header_length  = cpu_to_be32(sizeof(*header)),
  };
  @@ -5516,6 +5631,12

[PATCH v3 2/4] linux-user, aarch64: sync syscall numbers with kernel v5.5

2020-03-16 Thread Laurent Vivier

Use helper script scripts/gensyscalls.sh to generate the file.

This changes TARGET_NR_fstatat64 to TARGET_NR_newfstatat, which is correct
because the definitions from Linux are:

arch/arm64/include/uapi/asm/unistd.h

  #define __ARCH_WANT_NEW_STAT

include/uapi/asm-generic/unistd.h

  #if defined(__ARCH_WANT_NEW_STAT) || defined(__ARCH_WANT_STAT64)
  #define __NR3264_fstatat 79
  __SC_3264(__NR3264_fstatat, sys_fstatat64, sys_newfstatat)
  #define __NR3264_fstat 80
  __SC_3264(__NR3264_fstat, sys_fstat64, sys_newfstat)
  #endif
  ...
  #if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
  ...
  #if defined(__ARCH_WANT_NEW_STAT) || defined(__ARCH_WANT_STAT64)
  #define __NR_newfstatat __NR3264_fstatat
  #define __NR_fstat __NR3264_fstat
  #endif
  ...

Add syscalls 286 (preadv2) to 435 (clone3).

Signed-off-by: Laurent Vivier 
Reviewed-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---

Notes:
v2: add comments suggested by Taylor

 linux-user/aarch64/syscall_nr.h | 34 -
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/linux-user/aarch64/syscall_nr.h b/linux-user/aarch64/syscall_nr.h
index f00ffd7fb82f..85de000b2490 100644
--- a/linux-user/aarch64/syscall_nr.h
+++ b/linux-user/aarch64/syscall_nr.h
@@ -1,7 +1,8 @@
 /*
  * This file contains the system call numbers.
+ * Do not modify.
+ * This file is generated by scripts/gensyscalls.sh
  */
-
 #ifndef LINUX_USER_AARCH64_SYSCALL_NR_H
 #define LINUX_USER_AARCH64_SYSCALL_NR_H
 
@@ -84,7 +85,7 @@
 #define TARGET_NR_splice 76
 #define TARGET_NR_tee 77
 #define TARGET_NR_readlinkat 78
-#define TARGET_NR_fstatat64 79
+#define TARGET_NR_newfstatat 79
 #define TARGET_NR_fstat 80
 #define TARGET_NR_sync 81
 #define TARGET_NR_fsync 82
@@ -254,8 +255,8 @@
 #define TARGET_NR_prlimit64 261
 #define TARGET_NR_fanotify_init 262
 #define TARGET_NR_fanotify_mark 263
-#define TARGET_NR_name_to_handle_at 264
-#define TARGET_NR_open_by_handle_at 265
+#define TARGET_NR_name_to_handle_at 264
+#define TARGET_NR_open_by_handle_at 265
 #define TARGET_NR_clock_adjtime 266
 #define TARGET_NR_syncfs 267
 #define TARGET_NR_setns 268
@@ -276,5 +277,28 @@
 #define TARGET_NR_membarrier 283
 #define TARGET_NR_mlock2 284
 #define TARGET_NR_copy_file_range 285
+#define TARGET_NR_preadv2 286
+#define TARGET_NR_pwritev2 287
+#define TARGET_NR_pkey_mprotect 288
+#define TARGET_NR_pkey_alloc 289
+#define TARGET_NR_pkey_free 290
+#define TARGET_NR_statx 291
+#define TARGET_NR_io_pgetevents 292
+#define TARGET_NR_rseq 293
+#define TARGET_NR_kexec_file_load 294
+#define TARGET_NR_pidfd_send_signal 424
+#define TARGET_NR_io_uring_setup 425
+#define TARGET_NR_io_uring_enter 426
+#define TARGET_NR_io_uring_register 427
+#define TARGET_NR_open_tree 428
+#define TARGET_NR_move_mount 429
+#define TARGET_NR_fsopen 430
+#define TARGET_NR_fsconfig 431
+#define TARGET_NR_fsmount 432
+#define TARGET_NR_fspick 433
+#define TARGET_NR_pidfd_open 434
+#define TARGET_NR_clone3 435
+#define TARGET_NR_syscalls 436
+
+#endif /* LINUX_USER_AARCH64_SYSCALL_NR_H */
 
-#endif
-- 
2.24.1
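The header excerpt in the commit message can be reduced to a small preprocessor sketch that shows why slot 79 is named newfstatat on arm64. The SKETCH_-prefixed names are hypothetical stand-ins for the kernel's __ARCH_WANT_NEW_STAT, __BITS_PER_LONG and __NR3264_* macros. The 32-bit branch is simplified.

```c
#include <assert.h>

#define SKETCH_ARCH_WANT_NEW_STAT 1   /* arm64's unistd.h wants "new" stat */
#define SKETCH_BITS_PER_LONG      64  /* aarch64 is an LP64 ABI */

#if SKETCH_ARCH_WANT_NEW_STAT
/* Generic slots shared by 32- and 64-bit ABIs. */
#define SKETCH_NR3264_fstatat 79
#define SKETCH_NR3264_fstat   80
#endif

#if SKETCH_BITS_PER_LONG == 64 && SKETCH_ARCH_WANT_NEW_STAT
/* 64-bit: the generic slots get the "new" stat names. */
#define SKETCH_NR_newfstatat SKETCH_NR3264_fstatat
#define SKETCH_NR_fstat      SKETCH_NR3264_fstat
#elif SKETCH_BITS_PER_LONG == 32
/* 32-bit (simplified): the same slot would be called fstatat64. */
#define SKETCH_NR_fstatat64  SKETCH_NR3264_fstatat
#endif

/* The value QEMU's aarch64 table should record as TARGET_NR_newfstatat. */
static const int sketch_target_nr_newfstatat = SKETCH_NR_newfstatat;
```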

Re: [PATCH v2 3/8] qapi/misc: Restrict balloon-related commands to machine code

2020-03-16 Thread David Hildenbrand

On 16.03.20 01:03, Philippe Mathieu-Daudé wrote:
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  qapi/machine.json  | 83 ++
>  qapi/misc.json | 83 --
>  include/sysemu/balloon.h   |  2 +-
>  balloon.c  |  2 +-
>  hw/virtio/virtio-balloon.c |  2 +-
>  monitor/hmp-cmds.c |  1 +
>  6 files changed, 87 insertions(+), 86 deletions(-)
> 
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 07ffc07ba2..c096efbea3 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -915,3 +915,86 @@
>'data': 'NumaOptions',
>'allow-preconfig': true
>  }
> +
> +##
> +# @balloon:
> +#
> +# Request the balloon driver to change its balloon size.
> +#
> +# @value: the target size of the balloon in bytes

Not related to your patch. The description of most of this stuff is wrong.

It's not the target size of the balloon, it's the target logical size of
the VM (logical_vm_size = vm_ram_size - balloon_size)

-> balloon_size = vm_ram_size - @value

E.g., "balloon 1024" (in MiB) with a 3 GiB guest means "please inflate the
balloon to 2048 MiB"
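David's point comes down to one subtraction. The minimal C sketch below uses hypothetical helper names and assumes that all sizes use the same unit. It also assumes that a target larger than guest RAM simply means an empty balloon; that clamp is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Per the review: @value and @actual describe the guest's logical size,
 *   logical_vm_size = vm_ram_size - balloon_size
 * so the balloon size implied by a request is vm_ram_size - value.
 */
static uint64_t balloon_target_size(uint64_t vm_ram_size, uint64_t value)
{
    /* Assumed clamp: a target at or above RAM size means no ballooning. */
    return value >= vm_ram_size ? 0 : vm_ram_size - value;
}

/* Logical size reported back (what @actual holds, per the review). */
static uint64_t logical_vm_size(uint64_t vm_ram_size, uint64_t balloon_size)
{
    return vm_ram_size - balloon_size;
}
```

With the numbers from the example, balloon_target_size(3072, 1024) is 2048. logical_vm_size(3072, 2048) then gives back the requested 1024.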

> +#
> +# Returns: - Nothing on success
> +#  - If the balloon driver is enabled but not functional because the KVM
> +#kernel module cannot support it, KvmMissingCap
> +#  - If no balloon device is present, DeviceNotActive
> +#
> +# Notes: This command just issues a request to the guest.  When it returns,
> +#the balloon size may not have changed.  A guest can change the balloon
> +#size independent of this command.
> +#
> +# Since: 0.14.0
> +#
> +# Example:
> +#
> +# -> { "execute": "balloon", "arguments": { "value": 536870912 } }
> +# <- { "return": {} }
> +#
> +##
> +{ 'command': 'balloon', 'data': {'value': 'int'} }
> +
> +##
> +# @BalloonInfo:
> +#
> +# Information about the guest balloon device.
> +#
> +# @actual: the number of bytes the balloon currently contains

Ditto

@actual is the logical size of the VM (logical_vm_size = vm_ram_size -
balloon_size)

> +#
> +# Since: 0.14.0
> +#
> +##
> +{ 'struct': 'BalloonInfo', 'data': {'actual': 'int' } }
> +
> +##
> +# @query-balloon:
> +#
> +# Return information about the balloon device.
> +#
> +# Returns: - @BalloonInfo on success
> +#  - If the balloon driver is enabled but not functional because the KVM
> +#kernel module cannot support it, KvmMissingCap
> +#  - If no balloon device is present, DeviceNotActive
> +#
> +# Since: 0.14.0
> +#
> +# Example:
> +#
> +# -> { "execute": "query-balloon" }
> +# <- { "return": {
> +#  "actual": 1073741824,
> +#   }
> +#}
> +#
> +##
> +{ 'command': 'query-balloon', 'returns': 'BalloonInfo' }
> +
> +##
> +# @BALLOON_CHANGE:
> +#
> +# Emitted when the guest changes the actual BALLOON level. This value is
> +# equivalent to the @actual field return by the 'query-balloon' command
> +#
> +# @actual: actual level of the guest memory balloon in bytes

Ditto

@actual is the logical size of the VM (vm_ram_size - balloon_size)


Most probably we want to pull this description fix out. #badinterface

-- 
Thanks,

David / dhildenb

Re: [PATCH] target/rx/cpu: Use address_space_ldl() to read reset vector address

2020-03-16 Thread Peter Maydell

On Sun, 15 Mar 2020 at 13:49, Philippe Mathieu-Daudé  wrote:
>
> From: Philippe Mathieu-Daudé 
>
> The RX code flash is not a Masked ROM but an EEPROM (electrically
> erasable programmable flash memory).
> When implementing the flash hardware, the rom_ptr() returns NULL
> and the reset vector is not set.
> Instead, use the address_space ld/st API to fetch the reset vector
> address from the code flash.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Based-on: <20200315132810.7022-1-f4...@amsat.org>
>
> The same issue might occur in Cortex-M arm_cpu_reset()

rom_ptr() does not mean "I'm trying to get this from ROM",
it means "I'm trying to get this from a user-supplied ELF
file or similar which hasn't been loaded into guest memory
yet". (This is a workaround for a reset ordering issue where
CPU reset happens before rom_reset() runs.)

Removing the usage of rom_ptr() altogether here doesn't
look right -- have you tested the case where the initial
reset vector contents are provided via -kernel or
-device loader,... ?

thanks
-- PMM

[PATCH v4 3/6] virtio-net: implement RX RSS processing

2020-03-16 Thread Yuri Benditovich

If VIRTIO_NET_F_RSS is negotiated and RSS is enabled, process
incoming packets, calculate each packet's hash and place the
packet into the respective RX virtqueue.

Signed-off-by: Yuri Benditovich 
---
 hw/net/virtio-net.c| 88 +-
 include/hw/virtio/virtio-net.h |  1 +
 2 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 6d21922746..de2d68d4ca 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -42,6 +42,7 @@
 #include "trace.h"
 #include "monitor/qdev.h"
 #include "hw/pci/pci.h"
+#include "net_rx_pkt.h"
 
 #define VIRTIO_NET_VM_VERSION    11
 
@@ -1598,8 +1599,80 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
 return 0;
 }
 
+static uint8_t virtio_net_get_hash_type(bool isip4,
+bool isip6,
+bool isudp,
+bool istcp,
+uint32_t types)
+{
+if (isip4) {
+if (istcp && (types & VIRTIO_NET_RSS_HASH_TYPE_TCPv4)) {
+return NetPktRssIpV4Tcp;
+}
+if (isudp && (types & VIRTIO_NET_RSS_HASH_TYPE_UDPv4)) {
+return NetPktRssIpV4Udp;
+}
+if (types & VIRTIO_NET_RSS_HASH_TYPE_IPv4) {
+return NetPktRssIpV4;
+}
+} else if (isip6) {
+uint32_t mask = VIRTIO_NET_RSS_HASH_TYPE_TCP_EX |
+VIRTIO_NET_RSS_HASH_TYPE_TCPv6;
+
+if (istcp && (types & mask)) {
+return (types & VIRTIO_NET_RSS_HASH_TYPE_TCP_EX) ?
+NetPktRssIpV6TcpEx : NetPktRssIpV6Tcp;
+}
+mask = VIRTIO_NET_RSS_HASH_TYPE_UDP_EX | VIRTIO_NET_RSS_HASH_TYPE_UDPv6;
+if (isudp && (types & mask)) {
+return (types & VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) ?
+NetPktRssIpV6UdpEx : NetPktRssIpV6Udp;
+}
+mask = VIRTIO_NET_RSS_HASH_TYPE_IP_EX | VIRTIO_NET_RSS_HASH_TYPE_IPv6;
+if (types & mask) {
+return (types & VIRTIO_NET_RSS_HASH_TYPE_IP_EX) ?
+NetPktRssIpV6Ex : NetPktRssIpV6;
+}
+}
+return 0xff;
+}
+
+static int virtio_net_process_rss(NetClientState *nc, const uint8_t *buf,
+  size_t size)
+{
+VirtIONet *n = qemu_get_nic_opaque(nc);
+unsigned int index = nc->queue_index, new_index;
+struct NetRxPkt *pkt = n->rx_pkt;
+uint8_t net_hash_type;
+uint32_t hash;
+bool isip4, isip6, isudp, istcp;
+
+net_rx_pkt_set_protocols(pkt, buf + n->host_hdr_len,
+ size - n->host_hdr_len);
+net_rx_pkt_get_protocols(pkt, , , , );
+if (isip4 && (net_rx_pkt_get_ip4_info(pkt)->fragment)) {
+istcp = isudp = false;
+}
+if (isip6 && (net_rx_pkt_get_ip6_info(pkt)->fragment)) {
+istcp = isudp = false;
+}
+net_hash_type = virtio_net_get_hash_type(isip4, isip6, isudp, istcp,
+ n->rss_data.hash_types);
+if (net_hash_type > NetPktRssIpV6UdpEx) {
+return n->rss_data.default_queue;
+}
+
+hash = net_rx_pkt_calc_rss_hash(pkt, net_hash_type, n->rss_data.key);
+new_index = hash & (n->rss_data.indirections_len - 1);
+new_index = n->rss_data.indirections_table[new_index];
+if (index == new_index) {
+return -1;
+}
+return new_index;
+}
+
 static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
-  size_t size)
+  size_t size, bool no_rss)
 {
 VirtIONet *n = qemu_get_nic_opaque(nc);
 VirtIONetQueue *q = virtio_net_get_subqueue(nc);
@@ -1613,6 +1686,14 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
 return -1;
 }
 
+if (!no_rss && n->rss_data.enabled) {
+int index = virtio_net_process_rss(nc, buf, size);
+if (index >= 0) {
+NetClientState *nc2 = qemu_get_subqueue(n->nic, index);
+return virtio_net_receive_rcu(nc2, buf, size, true);
+}
+}
+
 /* hdr_len refers to the header we supply to the guest */
 if (!virtio_net_has_buffers(q, size + n->guest_hdr_len - n->host_hdr_len)) {
 return 0;
@@ -1707,7 +1788,7 @@ static ssize_t virtio_net_do_receive(NetClientState *nc, const uint8_t *buf,
 {
 RCU_READ_LOCK_GUARD();
 
-return virtio_net_receive_rcu(nc, buf, size);
+return virtio_net_receive_rcu(nc, buf, size, false);
 }
 
 static void virtio_net_rsc_extract_unit4(VirtioNetRscChain *chain,
@@ -3283,6 +3364,8 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 
 QTAILQ_INIT(>rsc_chains);
 n->qdev = dev;
+
+net_rx_pkt_init(>rx_pkt, false);
 }
 
 static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
@@ -3320,6 +3403,7 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error
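The queue-selection step at the end of virtio_net_process_rss() can be isolated as follows. This sketch uses a hypothetical, trimmed-down struct rather than QEMU's rss_data. It shows that hash & (indirections_len - 1) is a valid table index only when the table length is a power of two. It also shows that packets without an applicable hash type go to the default queue. The patch additionally returns -1 when the selected queue is the one the packet already arrived on; the sketch leaves that out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down RSS state mirroring the rss_data fields
 * the patch reads. */
struct rss_sketch {
    const uint16_t *indirections_table;
    uint32_t indirections_len;   /* must be a power of two */
    uint16_t default_queue;
};

/*
 * Choose the RX queue for a packet. 'hash_ok' is false when no enabled
 * hash type applies (the patch's 0xff case), e.g. a non-IP packet.
 * Masking with (len - 1) equals hash % len only because len is a power
 * of two.
 */
static uint16_t rss_select_queue(const struct rss_sketch *rss,
                                 uint32_t hash, int hash_ok)
{
    assert(rss->indirections_len != 0 &&
           (rss->indirections_len & (rss->indirections_len - 1)) == 0);
    if (!hash_ok) {
        return rss->default_queue;
    }
    return rss->indirections_table[hash & (rss->indirections_len - 1)];
}
```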

[PATCH v4 5/6] virtio-net: reference implementation of hash report

2020-03-16 Thread Yuri Benditovich

Offer VIRTIO_NET_F_HASH_REPORT if it is specified in the device
parameters.
If VIRTIO_NET_F_HASH_REPORT is set, the device extends its
configuration space. If the feature is negotiated, the packet
layout is extended to accommodate the hash information. In this
case, deliver the packet's hash value and report type in the
virtio header extension.
For configuration, use the same procedure as is already used
for RSS. We add two fields to rss_data that control what the
device does with the calculated hash if rss_data.enabled is set.
If the field 'populate' is set, the hash is stored in the packet;
if the field 'redirect' is set, the hash is used to decide which
queue to place the packet in.

Signed-off-by: Yuri Benditovich 
---
 hw/net/virtio-net.c| 99 +++---
 include/hw/virtio/virtio-net.h |  2 +
 2 files changed, 81 insertions(+), 20 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index de2d68d4ca..a0614ad4e6 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -184,7 +184,7 @@ static VirtIOFeature feature_sizes[] = {
  .end = endof(struct virtio_net_config, mtu)},
 {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
  .end = endof(struct virtio_net_config, duplex)},
-{.flags = 1ULL << VIRTIO_NET_F_RSS,
+{.flags = (1ULL << VIRTIO_NET_F_RSS) | (1ULL << VIRTIO_NET_F_HASH_REPORT),
  .end = endof(struct virtio_net_config_with_rss, supported_hash_types)},
 {}
 };
@@ -218,7 +218,8 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 netcfg.cfg.duplex = n->net_conf.duplex;
 netcfg.rss_max_key_size = VIRTIO_NET_RSS_MAX_KEY_SIZE;
 virtio_stw_p(vdev, _max_indirection_table_length,
- VIRTIO_NET_RSS_MAX_TABLE_LEN);
+ virtio_host_has_feature(vdev, VIRTIO_NET_F_RSS) ?
+ VIRTIO_NET_RSS_MAX_TABLE_LEN : 1);
 virtio_stl_p(vdev, _hash_types,
  VIRTIO_NET_RSS_SUPPORTED_HASHES);
 memcpy(config, , n->config_size);
@@ -644,7 +645,7 @@ static int peer_has_ufo(VirtIONet *n)
 }
 
 static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
-   int version_1)
+   int version_1, int hash_report)
 {
 int i;
 NetClientState *nc;
@@ -652,7 +653,10 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
 n->mergeable_rx_bufs = mergeable_rx_bufs;
 
 if (version_1) {
-n->guest_hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+n->guest_hdr_len = hash_report ?
+sizeof(struct virtio_net_hdr_v1_hash) :
+sizeof(struct virtio_net_hdr_mrg_rxbuf);
+n->rss_data.populate_hash = true;
 } else {
 n->guest_hdr_len = n->mergeable_rx_bufs ?
 sizeof(struct virtio_net_hdr_mrg_rxbuf) :
@@ -773,6 +777,8 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
 virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
 virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
 virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
+
+virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
 }
 
 if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {
@@ -785,6 +791,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
 }
 
 virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
+virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
 features = vhost_net_get_features(get_vhost_net(nc->peer), features);
 vdev->backend_features = features;
 
@@ -951,12 +958,15 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
virtio_has_feature(features,
   VIRTIO_NET_F_MRG_RXBUF),
virtio_has_feature(features,
-  VIRTIO_F_VERSION_1));
+  VIRTIO_F_VERSION_1),
+   virtio_has_feature(features,
+  VIRTIO_NET_F_HASH_REPORT));
 
 n->rsc4_enabled = virtio_has_feature(features, VIRTIO_NET_F_RSC_EXT) &&
 virtio_has_feature(features, VIRTIO_NET_F_GUEST_TSO4);
 n->rsc6_enabled = virtio_has_feature(features, VIRTIO_NET_F_RSC_EXT) &&
 virtio_has_feature(features, VIRTIO_NET_F_GUEST_TSO6);
+n->rss_data.redirect = virtio_has_feature(features, VIRTIO_NET_F_RSS);
 
 if (n->has_vnet_hdr) {
 n->curr_guest_offloads =
@@ -1230,7 +1240,9 @@ static void virtio_net_disable_rss(VirtIONet *n)
 }
 
 static uint16_t virtio_net_handle_rss(VirtIONet *n,
-  struct iovec *iov, unsigned int iov_cnt)
+  struct iovec *iov,
+  unsigned int iov_cnt,
+  bool do_rss)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(n);

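The header-length choice that virtio_net_set_mrg_rx_bufs() makes in the hunk above can be modelled in isolation. The sketch below uses local stand-in structs. The sketch_* names are hypothetical and are not the QEMU or Linux definitions. The structs follow the field order of virtio_net_hdr, virtio_net_hdr_mrg_rxbuf, virtio_net_hdr_v1 and the patch's virtio_net_hdr_v1_hash. The sketch assumes a common ABI on which these fields pack without padding. Under those assumptions it shows how negotiating VIRTIO_NET_F_HASH_REPORT grows the per-packet header from 12 to 20 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the virtio-net header layouts (sketch only). */
struct sketch_net_hdr {                 /* like struct virtio_net_hdr */
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

struct sketch_net_hdr_mrg_rxbuf {       /* like struct virtio_net_hdr_mrg_rxbuf */
    struct sketch_net_hdr hdr;
    uint16_t num_buffers;
};

struct sketch_net_hdr_v1 {              /* like struct virtio_net_hdr_v1 */
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
    uint16_t num_buffers;
};

struct sketch_net_hdr_v1_hash {         /* like the patch's virtio_net_hdr_v1_hash */
    struct sketch_net_hdr_v1 hdr;
    uint32_t hash_value;
    uint16_t hash_report;
    uint16_t padding;
};

/* Mirrors the branch structure of virtio_net_set_mrg_rx_bufs() in the patch. */
static size_t sketch_guest_hdr_len(bool version_1, bool mergeable_rx_bufs,
                                   bool hash_report)
{
    if (version_1) {
        return hash_report ? sizeof(struct sketch_net_hdr_v1_hash)
                           : sizeof(struct sketch_net_hdr_mrg_rxbuf);
    }
    return mergeable_rx_bufs ? sizeof(struct sketch_net_hdr_mrg_rxbuf)
                             : sizeof(struct sketch_net_hdr);
}
```

In the patch, the hash_report flag only affects the VIRTIO_F_VERSION_1 branch. When VIRTIO_F_VERSION_1 is not negotiated, the length depends only on mergeable RX buffers, as before.
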
[PATCH v4 1/6] virtio-net: introduce RSS and hash report features

2020-03-16 Thread Yuri Benditovich

Signed-off-by: Yuri Benditovich 
---
 hw/net/virtio-net.c | 65 +
 1 file changed, 65 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 3627bb1717..90b01221e9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -71,6 +71,71 @@
 #define VIRTIO_NET_IP6_ADDR_SIZE   32  /* ipv6 saddr + daddr */
 #define VIRTIO_NET_MAX_IP6_PAYLOAD VIRTIO_NET_MAX_TCP_PAYLOAD
 
+/* TODO: remove after virtio-net header update */
+#if !defined(VIRTIO_NET_RSS_HASH_TYPE_IPv4)
+#define VIRTIO_NET_F_HASH_REPORT    57  /* Supports hash report */
+#define VIRTIO_NET_F_RSS            60  /* Supports RSS RX steering */
+
+/* supported/enabled hash types */
+#define VIRTIO_NET_RSS_HASH_TYPE_IPv4  (1 << 0)
+#define VIRTIO_NET_RSS_HASH_TYPE_TCPv4 (1 << 1)
+#define VIRTIO_NET_RSS_HASH_TYPE_UDPv4 (1 << 2)
+#define VIRTIO_NET_RSS_HASH_TYPE_IPv6  (1 << 3)
+#define VIRTIO_NET_RSS_HASH_TYPE_TCPv6 (1 << 4)
+#define VIRTIO_NET_RSS_HASH_TYPE_UDPv6 (1 << 5)
+#define VIRTIO_NET_RSS_HASH_TYPE_IP_EX (1 << 6)
+#define VIRTIO_NET_RSS_HASH_TYPE_TCP_EX (1 << 7)
+#define VIRTIO_NET_RSS_HASH_TYPE_UDP_EX (1 << 8)
+
+struct virtio_net_config_with_rss {
+struct virtio_net_config cfg;
+/* maximum size of RSS key */
+uint8_t rss_max_key_size;
+/* maximum number of indirection table entries */
+uint16_t rss_max_indirection_table_length;
+/* bitmask of supported VIRTIO_NET_RSS_HASH_ types */
+uint32_t supported_hash_types;
+} QEMU_PACKED;
+
+struct virtio_net_hdr_v1_hash {
+struct virtio_net_hdr_v1 hdr;
+uint32_t hash_value;
+#define VIRTIO_NET_HASH_REPORT_NONE     0
+#define VIRTIO_NET_HASH_REPORT_IPv4     1
+#define VIRTIO_NET_HASH_REPORT_TCPv4    2
+#define VIRTIO_NET_HASH_REPORT_UDPv4    3
+#define VIRTIO_NET_HASH_REPORT_IPv6     4
+#define VIRTIO_NET_HASH_REPORT_TCPv6    5
+#define VIRTIO_NET_HASH_REPORT_UDPv6    6
+#define VIRTIO_NET_HASH_REPORT_IPv6_EX  7
+#define VIRTIO_NET_HASH_REPORT_TCPv6_EX 8
+#define VIRTIO_NET_HASH_REPORT_UDPv6_EX 9
+uint16_t hash_report;
+uint16_t padding;
+};
+
+/*
+ * The command VIRTIO_NET_CTRL_MQ_RSS_CONFIG has the same effect as
+ * VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET does and additionally configures
+ * the receive steering to use a hash calculated for incoming packet
+ * to decide on receive virtqueue to place the packet. The command
+ * also provides parameters to calculate a hash and receive virtqueue.
+ */
+struct virtio_net_rss_config {
+uint32_t hash_types;
+uint16_t indirection_table_mask;
+uint16_t unclassified_queue;
+uint16_t indirection_table[1/* + indirection_table_mask */];
+uint16_t max_tx_vq;
+uint8_t hash_key_length;
+uint8_t hash_key_data[/* hash_key_length */];
+};
+
+#define VIRTIO_NET_CTRL_MQ_RSS_CONFIG  1
+#define VIRTIO_NET_CTRL_MQ_HASH_CONFIG 2
+
+#endif
+
 /* Purge coalesced packets timer interval, This value affects the performance
a lot, and should be tuned carefully, '30'(300us) is the recommended
value to pass the WHQL test, '5' can gain 2x netperf throughput with
-- 
2.17.1

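The RSS fields defined in this patch drive receive-queue selection. The following is a hedged sketch of the steering rule as the virtio specification describes it; it is not the code of this series. A packet that matches no enabled hash type goes to unclassified_queue. Otherwise the calculated hash is masked with indirection_table_mask and used as an index into indirection_table. The sketch_* names and the fixed maximum table size are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_TABLE_LEN 128   /* assumed device limit for this sketch */

struct sketch_rss_state {
    uint32_t hash_types;             /* enabled VIRTIO_NET_RSS_HASH_TYPE_* bits */
    uint16_t indirection_table_mask; /* table length minus one */
    uint16_t unclassified_queue;     /* queue for packets with no enabled hash type */
    uint16_t indirection_table[SKETCH_MAX_TABLE_LEN];
};

/* Reject masks whose table length is not a power of two or is too large. */
static bool sketch_rss_config_valid(const struct sketch_rss_state *rss)
{
    uint32_t len = (uint32_t)rss->indirection_table_mask + 1;

    return len <= SKETCH_MAX_TABLE_LEN && (len & (len - 1)) == 0;
}

/* Pick the receive queue for one packet; 'classified' is false when none
 * of the enabled hash types applies to it. */
static uint16_t sketch_select_rx_queue(const struct sketch_rss_state *rss,
                                       bool classified, uint32_t hash)
{
    if (!classified) {
        return rss->unclassified_queue;
    }
    return rss->indirection_table[hash & rss->indirection_table_mask];
}
```

The mask is applied before indexing. Validating it once, when the configuration command is processed, is therefore enough to keep every later lookup in bounds.
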
[PATCH v4 2/6] virtio-net: implement RSS configuration command

2020-03-16 Thread Yuri Benditovich

Optionally report RSS feature.
Handle RSS configuration command and keep RSS parameters
in virtio-net device context.

Signed-off-by: Yuri Benditovich 
---
 hw/net/trace-events            |   3 +
 hw/net/virtio-net.c            | 189 +
 include/hw/virtio/virtio-net.h |  13 +++
 3 files changed, 185 insertions(+), 20 deletions(-)

diff --git a/hw/net/trace-events b/hw/net/trace-events
index a1da98a643..a84b9c3d9f 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -371,6 +371,9 @@ virtio_net_announce_notify(void) ""
 virtio_net_announce_timer(int round) "%d"
 virtio_net_handle_announce(int round) "%d"
 virtio_net_post_load_device(void)
+virtio_net_rss_disable(void)
+virtio_net_rss_error(const char *msg, uint32_t value) "%s, value 0x%08x"
+virtio_net_rss_enable(uint32_t p1, uint16_t p2, uint8_t p3) "hashes 0x%x, table of %d, key of %d"
 
 # tulip.c
 tulip_reg_write(uint64_t addr, const char *name, int size, uint64_t val) "addr 0x%02"PRIx64" (%s) size %d value 0x%08"PRIx64
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 90b01221e9..6d21922746 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -142,6 +142,16 @@ struct virtio_net_rss_config {
tso/gso/gro 'off'. */
 #define VIRTIO_NET_RSC_DEFAULT_INTERVAL 30
 
+#define VIRTIO_NET_RSS_SUPPORTED_HASHES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \
+ VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \
+ VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \
+ VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \
+ VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \
+ VIRTIO_NET_RSS_HASH_TYPE_UDPv6 | \
+ VIRTIO_NET_RSS_HASH_TYPE_IP_EX | \
+ VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
+ VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)
+
 /* temporary until standard header include it */
 #if !defined(VIRTIO_NET_HDR_F_RSC_INFO)
 
@@ -173,6 +183,8 @@ static VirtIOFeature feature_sizes[] = {
  .end = endof(struct virtio_net_config, mtu)},
 {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
  .end = endof(struct virtio_net_config, duplex)},
+{.flags = 1ULL << VIRTIO_NET_F_RSS,
+ .end = endof(struct virtio_net_config_with_rss, supported_hash_types)},
 {}
 };
 
@@ -195,28 +207,33 @@ static int vq2q(int queue_index)
 static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
-struct virtio_net_config netcfg;
-
-virtio_stw_p(vdev, &netcfg.status, n->status);
-virtio_stw_p(vdev, &netcfg.max_virtqueue_pairs, n->max_queues);
-virtio_stw_p(vdev, &netcfg.mtu, n->net_conf.mtu);
-memcpy(netcfg.mac, n->mac, ETH_ALEN);
-virtio_stl_p(vdev, &netcfg.speed, n->net_conf.speed);
-netcfg.duplex = n->net_conf.duplex;
+struct virtio_net_config_with_rss netcfg;
+
+virtio_stw_p(vdev, &netcfg.cfg.status, n->status);
+virtio_stw_p(vdev, &netcfg.cfg.max_virtqueue_pairs, n->max_queues);
+virtio_stw_p(vdev, &netcfg.cfg.mtu, n->net_conf.mtu);
+memcpy(netcfg.cfg.mac, n->mac, ETH_ALEN);
+virtio_stl_p(vdev, &netcfg.cfg.speed, n->net_conf.speed);
+netcfg.cfg.duplex = n->net_conf.duplex;
+netcfg.rss_max_key_size = VIRTIO_NET_RSS_MAX_KEY_SIZE;
+virtio_stw_p(vdev, &netcfg.rss_max_indirection_table_length,
+ VIRTIO_NET_RSS_MAX_TABLE_LEN);
+virtio_stl_p(vdev, &netcfg.supported_hash_types,
+ VIRTIO_NET_RSS_SUPPORTED_HASHES);
 memcpy(config, &netcfg, n->config_size);
 }
 
 static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
-struct virtio_net_config netcfg = {};
+struct virtio_net_config_with_rss netcfg = {};
 
 memcpy(&netcfg, config, n->config_size);
 
 if (!virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR) &&
 !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1) &&
-memcmp(netcfg.mac, n->mac, ETH_ALEN)) {
-memcpy(n->mac, netcfg.mac, ETH_ALEN);
+memcmp(netcfg.cfg.mac, n->mac, ETH_ALEN)) {
+memcpy(n->mac, netcfg.cfg.mac, ETH_ALEN);
 qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
 }
 }
@@ -766,6 +783,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
 return features;
 }
 
+virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
 features = vhost_net_get_features(get_vhost_net(nc->peer), features);
 vdev->backend_features = features;
 
@@ -925,6 +943,7 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
 }
 
 virtio_net_set_multiqueue(n,
+  virtio_has_feature(features, VIRTIO_NET_F_RSS) ||
   virtio_has_feature(features, VIRTIO_NET_F_MQ));
 
 virtio_net_set_mrg_rx_bufs(n,
@@ -1201,25 +1220,152 @@ static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
 }
 }
 
+static void virtio_net_disable_rss(VirtIONet *n)
+{
+

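The feature_sizes[] entries touched by this series decide how much of struct virtio_net_config_with_rss the guest can see. The config space is as large as the furthest field needed by any offered feature. The code below is a simplified model of that rule; QEMU's own helper may apply additional rules. It uses hypothetical sketch_* stand-ins for the structs and a sketch_endof() macro in place of QEMU's endof(). It also relies on the GCC/Clang packed attribute to match the packed config layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's endof(): offset just past 'field'. */
#define sketch_endof(type, field) \
    (offsetof(type, field) + sizeof(((type *)0)->field))

/* Packed stand-ins for virtio_net_config and virtio_net_config_with_rss. */
struct __attribute__((packed)) sketch_net_config {
    uint8_t  mac[6];
    uint16_t status;
    uint16_t max_virtqueue_pairs;
    uint16_t mtu;
    uint32_t speed;
    uint8_t  duplex;
};

struct __attribute__((packed)) sketch_net_config_with_rss {
    struct sketch_net_config cfg;
    uint8_t  rss_max_key_size;
    uint16_t rss_max_indirection_table_length;
    uint32_t supported_hash_types;
};

/* virtio-net feature bit numbers (HASH_REPORT and RSS as defined in the patch). */
#define SKETCH_F_MTU           3
#define SKETCH_F_MAC           5
#define SKETCH_F_HASH_REPORT  57
#define SKETCH_F_RSS          60
#define SKETCH_F_SPEED_DUPLEX 63

struct sketch_feature_size {
    uint64_t flags;   /* any of these feature bits needs the config up to 'end' */
    size_t end;
};

static const struct sketch_feature_size sketch_feature_sizes[] = {
    { 1ULL << SKETCH_F_MAC, sketch_endof(struct sketch_net_config, mac) },
    { 1ULL << SKETCH_F_MTU, sketch_endof(struct sketch_net_config, mtu) },
    { 1ULL << SKETCH_F_SPEED_DUPLEX,
      sketch_endof(struct sketch_net_config, duplex) },
    { (1ULL << SKETCH_F_RSS) | (1ULL << SKETCH_F_HASH_REPORT),
      sketch_endof(struct sketch_net_config_with_rss, supported_hash_types) },
    { 0, 0 }                                   /* terminator */
};

/* Config size = furthest 'end' among entries matching the offered features. */
static size_t sketch_config_size(uint64_t host_features)
{
    size_t size = 0;

    for (size_t i = 0; sketch_feature_sizes[i].flags != 0; i++) {
        if ((host_features & sketch_feature_sizes[i].flags) &&
            sketch_feature_sizes[i].end > size) {
            size = sketch_feature_sizes[i].end;
        }
    }
    return size;
}
```

Because RSS and HASH_REPORT share one entry, either feature on its own exposes the whole 24-byte RSS-extended config. This is why patch 5/6 advertises a table length of 1 when only hash reporting is offered.
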
Re: [PATCH] tools/virtiofsd: add support for --socket-group

2020-03-16 Thread Daniel P . Berrangé

On Sat, Mar 14, 2020 at 02:33:25PM +0100, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Mar 12, 2020 at 11:49 AM Daniel P. Berrangé  
> wrote:
> >
> > On Thu, Mar 12, 2020 at 10:41:42AM +, Alex Bennée wrote:
> > > If you like running QEMU as a normal user (very common for TCG runs)
> > > but you have to run virtiofsd as a root user you run into connection
> > > problems. Adding support for an optional --socket-group allows the
> > > users to keep using the command line.
> >
> > If we're going to support this, then I think we need to put it in
> > the vhost-user.rst specification so we standardize across backends.
> >
> >
> 
> Perhaps. Otoh, I wonder if the backend spec should be more limited to
> arguments/introspection that are used by programs.
> 
> In this case, I even consider --socket-path to be unnecessary, as a
> management layer can/should provide a preopened & setup fd directly.
> 
> What do you think?

I think there's value in standardization even if it is an option targeted
at human admins rather than machine usage. You are right though that
something like libvirt would never use --socket-group or --socket-path.
Even admins would benefit if all programs followed the same naming for
these. We could document such options as "SHOULD" rather than "MUST".
IOW, we don't mandate --socket-group, but if you're going to provide a
way to control the socket group, this option should be used.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

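To make the --socket-group discussion concrete, consider a daemon running as root. After binding its listening socket, it would hand the socket file to the chosen group and restrict the file's mode. An unprivileged QEMU in that group can then connect. The sketch below is hypothetical and is not virtiofsd's implementation. It takes a numeric gid; resolving a group name (for example with getgrnam()) is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a listening UNIX socket at 'path', then give
 * it to group 'gid' with mode 0660 so members of that group can connect.
 * Returns the listening fd, or -1 on error.
 */
static int listen_unix_with_group(const char *path, gid_t gid)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        return -1;                          /* path too long for sun_path */
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    unlink(path);                           /* drop a stale socket, if any */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        chown(path, (uid_t)-1, gid) < 0 ||  /* keep owner, change group */
        chmod(path, 0660) < 0 ||            /* rw for owner and group only */
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Between bind() and chmod() there is a short window in which the socket has umask-derived permissions. A real daemon might tighten the umask around bind() to close it.
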
[Bug 1867601] [NEW] test-char not concurrent with unix socket

2020-03-16 Thread Philippe Mathieu-Daudé

Public bug reported:

'make check-unit' might fail when running multiple tests in parallel.

This apparently occurred on the OSX CI:
https://travis-ci.org/github/philmd/qemu/jobs/662357430

My guess is that concurrent runs use the same UNIX socket path:

static SocketAddress unixaddr = {
    .type = SOCKET_ADDRESS_TYPE_UNIX,
    .u.q_unix.path = (char *)"test-char.sock",
};

Note that other tests in this file use g_dir_make_tmp().
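
One way to make the test safe for parallel runs is to build the socket path inside a per-process temporary directory, in the spirit of the g_dir_make_tmp() calls mentioned above. So that it stands alone, this sketch uses POSIX mkdtemp() in place of GLib's g_dir_make_tmp(). The helper name and the directory template are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a private directory and build
 * "<dir>/test-char.sock" inside it, so parallel runs never share a path.
 * Returns 0 on success; the caller must unlink(path) and rmdir(dir).
 */
static int make_private_sock_path(char *dir, size_t dirlen,
                                  char *path, size_t pathlen)
{
    static const char tmpl[] = "/tmp/test-char-XXXXXX";
    int n;

    if (sizeof(tmpl) > dirlen) {
        return -1;
    }
    memcpy(dir, tmpl, sizeof(tmpl));    /* mkdtemp() rewrites the XXXXXX */
    if (!mkdtemp(dir)) {
        return -1;
    }
    n = snprintf(path, pathlen, "%s/test-char.sock", dir);
    if (n < 0 || (size_t)n >= pathlen) {
        rmdir(dir);                     /* path buffer too small */
        return -1;
    }
    return 0;
}
```

The resulting path could then be assigned to unixaddr.u.q_unix.path before the socket is created, instead of the fixed "test-char.sock".
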

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1867601

Title:
  test-char not concurrent with unix socket

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1867601/+subscriptions

[PATCH v7 0/4] qcow2: Implement zstd cluster compression method

2020-03-16 Thread Denis Plotnikov

v7:
   * use qapi_enum_parse instead of the open-coding [Eric]
   * fix wording, typos and spelling [Eric]

v6:
   * "block/qcow2-threads: fix qcow2_decompress" is removed from the series
  since it has been accepted by Max already
   * add compile-time checking that Qcow2Header is a multiple of 8 [Max, Alberto]
   * report an error on qcow2 amending when the compression type is actually changed [Max]
   * remove the extra space and the extra new line [Max]
   * re-arrange acks and signed-off-s [Vladimir]

v5:
   * replace -ENOTSUP with abort in qcow2_co_decompress [Vladimir]
   * set cluster size for all test cases in the beginning of the 287 test

v4:
   * the series is rebased on top of 01 "block/qcow2-threads: fix qcow2_decompress"
   * 01 is just a no-change resend to avoid extra dependencies. Still, it may be merged separately

v3:
   * remove redundant max compression type value check [Vladimir, Eric]
 (the switch below checks everything)
   * prevent compression type changing on "qemu-img amend" [Vladimir]
   * remove zstd config setting, since it has been added already by
 "migration" patches [Vladimir]
   * change the compression type error message [Vladimir] 
   * fix alignment and 80-chars exceeding [Vladimir]

v2:
   * rework compression type setting [Vladimir]
   * squash iotest changes into the compression type introduction patch [Vladimir, Eric]
   * fix zstd availability checking in zstd iotest [Vladimir]
   * remove unnecessary casting [Eric]
   * remove redundant checks [Eric]
   * fix compressed cluster layout in qcow2 spec [Vladimir]
   * fix wording [Eric, Vladimir]
   * fix compression type filtering in iotests [Eric]

v1:
   the initial series

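The v6 item about a compile-time check that Qcow2Header is a multiple of 8 can be illustrated with a static assertion. The struct below is a hypothetical stand-in, not QEMU's Qcow2Header. It models a header tail that gains a one-byte compression_type field plus explicit padding. The 0 = zlib, 1 = zstd values in the comment are an assumption about the spec. QEMU code would typically express such a check with QEMU_BUILD_BUG_ON rather than C11 _Static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical header tail, not QEMU's Qcow2Header: a one-byte
 * compression_type field followed by explicit padding so that the
 * structure size stays a multiple of 8.
 */
typedef struct SketchHeaderTail {
    uint64_t incompatible_features;
    uint64_t compatible_features;
    uint64_t autoclear_features;
    uint32_t refcount_order;
    uint32_t header_length;
    uint8_t  compression_type;   /* assumed: 0 = zlib, 1 = zstd */
    uint8_t  padding[7];         /* up to the next multiple of 8 bytes */
} SketchHeaderTail;

/* Fails at compile time if a later change adds a field without
 * adjusting the padding. */
_Static_assert(sizeof(SketchHeaderTail) % 8 == 0,
               "header size must be a multiple of 8");
```

Making the padding explicit, rather than relying on the compiler's tail padding, keeps the on-disk layout visible in the struct definition.
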
Denis Plotnikov (4):
  qcow2: introduce compression type feature
  qcow2: rework the cluster compression routine
  qcow2: add zstd cluster compression
  iotests: 287: add qcow2 compression type test

 docs/interop/qcow2.txt           |  20
 configure                        |   2 +-
 qapi/block-core.json             |  23 +++-
 block/qcow2.h                    |  20 +++-
 include/block/block_int.h        |   1 +
 block/qcow2-threads.c            | 195 +--
 block/qcow2.c                    | 120 +++
 tests/qemu-iotests/031.out       |  14 +--
 tests/qemu-iotests/036.out       |   4 +-
 tests/qemu-iotests/049.out       | 102
 tests/qemu-iotests/060.out       |   1 +
 tests/qemu-iotests/061.out       |  34 +++---
 tests/qemu-iotests/065           |  28 +++--
 tests/qemu-iotests/080           |   2 +-
 tests/qemu-iotests/144.out       |   4 +-
 tests/qemu-iotests/182.out       |   2 +-
 tests/qemu-iotests/242.out       |   5 +
 tests/qemu-iotests/255.out       |   8 +-
 tests/qemu-iotests/287           | 128
 tests/qemu-iotests/287.out       |  43 +++
 tests/qemu-iotests/common.filter |   3 +-
 tests/qemu-iotests/group         |   1 +
 22 files changed, 652 insertions(+), 108 deletions(-)
 create mode 100755 tests/qemu-iotests/287
 create mode 100644 tests/qemu-iotests/287.out

-- 
2.17.0

Re: [PATCH v9 02/10] scripts: Coccinelle script to use ERRP_AUTO_PROPAGATE()

2020-03-16 Thread Markus Armbruster

Vladimir Sementsov-Ogievskiy  writes:

> On 14.03.2020 00:54, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
>>
>>> 13.03.2020 18:42, Markus Armbruster wrote:
 Vladimir Sementsov-Ogievskiy  writes:

> 12.03.2020 19:36, Markus Armbruster wrote:
>> I may have a second look tomorrow with fresher eyes, but let's get this
>> out now as is.
>>
>> Vladimir Sementsov-Ogievskiy  writes:
>>
>>> The script adds ERRP_AUTO_PROPAGATE macro invocations where appropriate
>>> and makes the corresponding changes in code (see include/qapi/error.h
>>> for details).
>>>
>>> Usage example:
>>> spatch --sp-file scripts/coccinelle/auto-propagated-errp.cocci \
>>> --macro-file scripts/cocci-macro-file.h --in-place --no-show-diff \
>>> --max-width 80 FILES...
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>
>>> Cc: Eric Blake 
>>> Cc: Kevin Wolf 
>>> Cc: Max Reitz 
>>> Cc: Greg Kurz 
>>> Cc: Christian Schoenebeck 
>>> Cc: Stefano Stabellini 
>>> Cc: Anthony Perard 
>>> Cc: Paul Durrant 
>>> Cc: Stefan Hajnoczi 
>>> Cc: "Philippe Mathieu-Daudé" 
>>> Cc: Laszlo Ersek 
>>> Cc: Gerd Hoffmann 
>>> Cc: Stefan Berger 
>>> Cc: Markus Armbruster 
>>> Cc: Michael Roth 
>>> Cc: qemu-devel@nongnu.org
>>> Cc: qemu-bl...@nongnu.org
>>> Cc: xen-de...@lists.xenproject.org
>>>
>>> scripts/coccinelle/auto-propagated-errp.cocci | 327 ++
>>> include/qapi/error.h  |   3 +
>>> MAINTAINERS   |   1 +
>>> 3 files changed, 331 insertions(+)
>>> create mode 100644 scripts/coccinelle/auto-propagated-errp.cocci
>>>
>>> diff --git a/scripts/coccinelle/auto-propagated-errp.cocci b/scripts/coccinelle/auto-propagated-errp.cocci
>>> new file mode 100644
>>> index 00..7dac2dcfa4
>>> --- /dev/null
>>> +++ b/scripts/coccinelle/auto-propagated-errp.cocci
>>> @@ -0,0 +1,327 @@
>>> +// Use ERRP_AUTO_PROPAGATE (see include/qapi/error.h)
>>> +//
>>> +// Copyright (c) 2020 Virtuozzo International GmbH.
>>> +//
>>> +// This program is free software; you can redistribute it and/or
>>> +// modify it under the terms of the GNU General Public License as
>>> +// published by the Free Software Foundation; either version 2 of the
>>> +// License, or (at your option) any later version.
>>> +//
>>> +// This program is distributed in the hope that it will be useful,
>>> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> +// GNU General Public License for more details.
>>> +//
>>> +// You should have received a copy of the GNU General Public License
>>> +// along with this program.  If not, see
>>> +// .
>>> +//
>>> +// Usage example:
>>> +// spatch --sp-file scripts/coccinelle/auto-propagated-errp.cocci \
>>> +//  --macro-file scripts/cocci-macro-file.h --in-place \
>>> +//  --no-show-diff --max-width 80 FILES...
>>> +//
>>> +// Note: --max-width 80 is needed because coccinelle default is less
>>> +// than 80, and without this parameter coccinelle may reindent some
>>> +// lines which fit into 80 characters but not to coccinelle default,
>>> +// which in turn produces extra patch hunks for no reason.
>>
>> This is about unwanted reformatting of parameter lists due to the ___
>> chaining hack.  --max-width 80 makes that less likely, but not
>> impossible.
>>
>> We can search for unwanted reformatting of parameter lists.  I think
>> grepping diffs for '^\+.*Error \*\*' should do the trick.  For the whole
>> tree, I get one false positive (not a parameter list), and one hit:
>>
>>@@ -388,8 +388,10 @@ static void object_post_init_with_type(O
>> }
>> }
>>
>>-void object_apply_global_props(Object *obj, const GPtrArray *props, Error **errp)
>>+void object_apply_global_props(Object *obj, const GPtrArray *props,
>>+   Error **errp)
>> {
>>+ERRP_AUTO_PROPAGATE();
>> int i;
>>
>> if (!props) {
>>
>> Reformatting, but not unwanted.
>
> Yes, I saw it. This line is 81 characters long, so it's OK to fix it in the
> same hunk as the ERRP_AUTO_PROPAGATE addition, even in a non-automatic patch.

 Agree.

>>
>> The --max-width 80 hack is good enough for me.
>>
>> It does result in slightly long transformed lines, e.g. this one in
>> replication.c:
>>
>>@@ -113,7 +113,7 @@ static int replication_open(BlockDriverS
>> s->mode

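This note is for readers new to the macro the script inserts. As include/qapi/error.h in this series describes it, ERRP_AUTO_PROPAGATE() lets a function dereference errp safely. It does so by substituting a local Error pointer and propagating that error to the caller's errp on return. The toy model below imitates the idea with the GCC/Clang cleanup attribute. Every name in it is hypothetical. The real macro substitutes the local pointer only in specific cases (see include/qapi/error.h); this toy always substitutes it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy error object; QEMU's Error is opaque and richer. */
typedef struct ToyError {
    char msg[64];
} ToyError;

/* Toy error_setg(): record an error only if the caller asked for one. */
static void toy_error_setg(ToyError **errp, const char *msg)
{
    if (errp) {
        ToyError *err = calloc(1, sizeof(*err));
        if (!err) {
            abort();
        }
        snprintf(err->msg, sizeof(err->msg), "%s", msg);
        *errp = err;
    }
}

typedef struct ToyErrorPropagator {
    ToyError *local_err;   /* where callees actually report */
    ToyError **errp;       /* the caller's original errp (may be NULL) */
} ToyErrorPropagator;

/* Runs automatically when the propagator goes out of scope. */
static void toy_error_propagator_cleanup(ToyErrorPropagator *prop)
{
    if (prop->local_err) {
        if (prop->errp) {
            *prop->errp = prop->local_err;   /* hand the error to the caller */
        } else {
            free(prop->local_err);           /* caller did not want it */
        }
    }
}

#define TOY_ERRP_AUTO_PROPAGATE()                                  \
    ToyErrorPropagator toy_errp_prop                               \
        __attribute__((cleanup(toy_error_propagator_cleanup))) =   \
        { .local_err = NULL, .errp = errp };                       \
    errp = &toy_errp_prop.local_err

static void toy_step(int fail, ToyError **errp)
{
    if (fail) {
        toy_error_setg(errp, "step failed");
    }
}

/* Without the macro, "if (*errp)" would crash when the caller passes NULL. */
static int toy_two_steps(int fail, ToyError **errp)
{
    TOY_ERRP_AUTO_PROPAGATE();

    toy_step(fail, errp);
    if (*errp) {
        return -1;
    }
    toy_step(0, errp);
    return 0;
}
```

Because the cleanup runs on every return path, the function body needs no explicit error_propagate() call before each return.
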
Re: [PATCH v7 3/5] virtio-iommu: Call iommu notifier for attach/detach

2020-03-16 Thread Auger Eric

Hi Bharat,

On 3/16/20 9:58 AM, Bharat Bhushan wrote:
> Hi Eric,
> 
> On Mon, Mar 16, 2020 at 1:15 PM Bharat Bhushan  
> wrote:
>>
>> Hi Eric,
>>
>> On Mon, Mar 16, 2020 at 1:02 PM Auger Eric  wrote:
>>>
>>> Hi Bharat,
>>>
>>> On 3/16/20 7:41 AM, Bharat Bhushan wrote:
 Hi Eric,

 On Fri, Mar 13, 2020 at 8:11 PM Auger Eric  wrote:
>
> Hi Bharat
>
> On 3/13/20 8:48 AM, Bharat Bhushan wrote:
>> iommu-notifier are called when a device is attached
> IOMMU notifiers
>> or detached to as address-space.
>> This is needed for VFIO.
> and vhost for detach
>>
>> Signed-off-by: Bharat Bhushan 
>> ---
>>  hw/virtio/virtio-iommu.c | 47 
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index e51344a53e..2006f72901 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -49,6 +49,7 @@ typedef struct VirtIOIOMMUEndpoint {
>>  uint32_t id;
>>  VirtIOIOMMUDomain *domain;
>>  QLIST_ENTRY(VirtIOIOMMUEndpoint) next;
>> +VirtIOIOMMU *viommu;
> This needs special care on post-load. When migrating the EPs, only the id
> is migrated. On post-load you need to set viommu as it is done for
> domain. Migration is allowed with vhost.

 OK, I have not tried vhost/migration. The change below sets viommu when
 reconstructing the endpoint.
>>>
>>>
>>> Yes I think this should be OK.
>>>
>>> In the end I gave the series a try with vhost/vfio. With vhost it works
>>> (not with recent kernel though, but the issue may be related to kernel).
>>> With VFIO however it does not for me.
>>>
>>> The first issue is: your guest can use 4K pages while your host uses 64KB
>>> pages. In that case VFIO_DMA_MAP will fail with -EINVAL. We must devise
>>> a way to pass the host settings to the VIRTIO-IOMMU device.
>>>
>>> Even with 64KB pages, it did not work for me. I obviously don't get the
>>> storm of VFIO_DMA_MAP failures, but I do get some, most probably due to
>>> some wrong notifications somewhere. I will try to investigate on my side.
>>>
>>> Did you test with VFIO on your side?
>>
>> I did not try different page sizes; I only tested with 4K page size.
>>
>> Yes, it works: I tested with two network devices assigned to the VM, and
>> both interfaces work.
>>
>> First I will try with 64k page size.
> 
> 64K page size does not work for me either.
>
> I think we are not passing the correct page_size_mask here
> (config.page_size_mask is set to TARGET_PAGE_MASK (which is
> 0xf000)).
I guess you mean with guest using 4K and host using 64K.
> 
> We need to set this correctly as per host page size, correct?
Yes that's correct. We need to put in place a control path to retrieve
the page settings on host through VFIO to inform the virtio-iommu device.

Besides this issue, did you try with 64kB on host and guest?

Thanks

Eric
> 
> Thanks
> -Bharat
> 
>>
>> Thanks
>> -Bharat
>>
>>>
>>> Thanks
>>>
>>> Eric

 @@ -984,6 +973,7 @@ static gboolean reconstruct_endpoints(gpointer key, gpointer value,

  QLIST_FOREACH(iter, &d->endpoint_list, next) {
  iter->domain = d;
 +   iter->viommu = s;
  g_tree_insert(s->endpoints, GUINT_TO_POINTER(iter->id), iter);
  }
  return false; /* continue the domain traversal */

>>  } VirtIOIOMMUEndpoint;
>>
>>  typedef struct VirtIOIOMMUInterval {
>> @@ -155,8 +156,44 @@ static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr iova,
>>  memory_region_notify_iommu(mr, 0, entry);
>>  }
>>
>> +static gboolean virtio_iommu_mapping_unmap(gpointer key, gpointer value,
>> +   gpointer data)
>> +{
>> +VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
>> +IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
>> +
>> +virtio_iommu_notify_unmap(mr, interval->low,
>> +  interval->high - interval->low + 1);
>> +
>> +return false;
>> +}
>> +
>> +static gboolean virtio_iommu_mapping_map(gpointer key, gpointer value,
>> + gpointer data)
>> +{
>> +VirtIOIOMMUMapping *mapping = (VirtIOIOMMUMapping *) value;
>> +VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
>> +IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
>> +
>> +virtio_iommu_notify_map(mr, interval->low, mapping->phys_addr,
>> +interval->high - interval->low + 1);
>> +
>> +return false;
>> +}
>> +
>>  static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
>>  {
>> +VirtioIOMMUNotifierNode *node;
>> +VirtIOIOMMU *s = ep->viommu;
>> +VirtIOIOMMUDomain *domain =

Re: [PATCH 4/8] hw/ide: Move MAX_IDE_BUS define to one header

2020-03-16 Thread Philippe Mathieu-Daudé


On 3/16/20 7:53 AM, Markus Armbruster wrote:

BALATON Zoltan  writes:


There are several definitions of MAX_IDE_BUS in different boards (some
of them unused) with the same value. Move it to include/hw/ide/internal.h
to have it in a central place.

Signed-off-by: BALATON Zoltan 


This one feels a bit questionable.

The number of (PATA) IDE buses provided by a host bus adapter depends on
the HBA.  It happens to be 2 for all HBAs we implement, but it could
really be anything.

Similar for SATA, where the common number is 6, but could really be
anything.  I can't see offhand whether any HBA we implement provides a
different number.

By moving MAX_IDE_BUS to include/hw/ide/internal.h, you bake the
accidental commonality into the interface to the IDE core.  I'd prefer
not to.


I agree with Markus here (I kept this commit tagged because I was 
thinking the same but didn't know how to express it correctly. Thanks 
Markus!).

Re: [PATCH v7 3/5] virtio-iommu: Call iommu notifier for attach/detach

2020-03-16 Thread Bharat Bhushan

Hi Eric,

On Mon, Mar 16, 2020 at 2:35 PM Auger Eric  wrote:
>
> Hi Bharat,
>
> On 3/16/20 9:58 AM, Bharat Bhushan wrote:
> > Hi Eric,
> >
> > On Mon, Mar 16, 2020 at 1:15 PM Bharat Bhushan  
> > wrote:
> >>
> >> Hi Eric,
> >>
> >> On Mon, Mar 16, 2020 at 1:02 PM Auger Eric  wrote:
> >>>
> >>> Hi Bharat,
> >>>
> >>> On 3/16/20 7:41 AM, Bharat Bhushan wrote:
>  Hi Eric,
> 
>  On Fri, Mar 13, 2020 at 8:11 PM Auger Eric  wrote:
> >
> > Hi Bharat
> >
> > On 3/13/20 8:48 AM, Bharat Bhushan wrote:
> >> iommu-notifier are called when a device is attached
> > IOMMU notifiers
> >> or detached to as address-space.
> >> This is needed for VFIO.
> > and vhost for detach
> >>
> >> Signed-off-by: Bharat Bhushan 
> >> ---
> >>  hw/virtio/virtio-iommu.c | 47 
> >>  1 file changed, 47 insertions(+)
> >>
> >> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> >> index e51344a53e..2006f72901 100644
> >> --- a/hw/virtio/virtio-iommu.c
> >> +++ b/hw/virtio/virtio-iommu.c
> >> @@ -49,6 +49,7 @@ typedef struct VirtIOIOMMUEndpoint {
> >>  uint32_t id;
> >>  VirtIOIOMMUDomain *domain;
> >>  QLIST_ENTRY(VirtIOIOMMUEndpoint) next;
> >> +VirtIOIOMMU *viommu;
> > This needs special care on post-load. When migrating the EPs, only the id
> > is migrated. On post-load you need to set viommu as it is done for
> > domain. Migration is allowed with vhost.
> 
>  OK, I have not tried vhost/migration. The change below sets viommu when
>  reconstructing the endpoint.
> >>>
> >>>
> >>> Yes I think this should be OK.
> >>>
> >>> In the end I gave the series a try with vhost/vfio. With vhost it works
> >>> (not with recent kernel though, but the issue may be related to kernel).
> >>> With VFIO however it does not for me.
> >>>
> >>> The first issue is: your guest can use 4K pages while your host uses 64KB
> >>> pages. In that case VFIO_DMA_MAP will fail with -EINVAL. We must devise
> >>> a way to pass the host settings to the VIRTIO-IOMMU device.
> >>>
> >>> Even with 64KB pages, it did not work for me. I obviously don't get the
> >>> storm of VFIO_DMA_MAP failures, but I do get some, most probably due to
> >>> some wrong notifications somewhere. I will try to investigate on my side.
> >>>
> >>> Did you test with VFIO on your side?
> >>
> >> I did not try different page sizes; I only tested with 4K page size.
> >>
> >> Yes, it works: I tested with two network devices assigned to the VM, and
> >> both interfaces work.
> >>
> >> First I will try with 64k page size.
> >
> > 64K page size does not work for me either.
> >
> > I think we are not passing the correct page_size_mask here
> > (config.page_size_mask is set to TARGET_PAGE_MASK (which is
> > 0xf000)).
> I guess you mean with guest using 4K and host using 64K.
> >
> > We need to set this correctly as per host page size, correct?
> Yes that's correct. We need to put in place a control path to retrieve
> the page settings on host through VFIO to inform the virtio-iommu device.
>
> Besides this issue, did you try with 64kB on host and guest?

I tried the following:
  - 4k host and 4k guest: it works with the v7 version
  - 64k host and 64k guest: it does not work with v7; after hard-coding
    config.page_size_mask to 0x it works

Thanks
-Bharat

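The 4K-guest/64K-host failure discussed above comes down to the granule implied by page_size_mask. The smallest page size the device advertises is the lowest set bit of the mask, and map requests are expected to be aligned to it. The simplified check below uses hypothetical helper names and does not reproduce the kernel's VFIO logic. It shows that a request valid under a 4K mask is rejected under a 64K one, which is consistent with the -EINVAL seen from VFIO_DMA_MAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smallest page size advertised by a page_size_mask: its lowest set bit. */
static uint64_t sketch_min_page_size(uint64_t page_size_mask)
{
    return page_size_mask & (~page_size_mask + 1);
}

/* Would a map request of 'size' bytes at 'iova' fit the advertised granule? */
static bool sketch_map_request_ok(uint64_t page_size_mask,
                                  uint64_t iova, uint64_t size)
{
    uint64_t granule = sketch_min_page_size(page_size_mask);

    if (granule == 0 || size == 0) {
        return false;              /* empty mask or empty request */
    }
    return (iova & (granule - 1)) == 0 && (size & (granule - 1)) == 0;
}
```

A control path like the one Eric suggests would let the device advertise the host's mask instead of TARGET_PAGE_MASK. The guest would then only issue requests that pass this kind of check.
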
[PATCH] cpus: avoid stucking in pause_all_vcpus due to race

2020-03-16 Thread Longpeng(Mike)

From: Longpeng 

We found an issue when the guest reboots repeatedly during migration: it
causes the migration thread never to be woken up again.

                                      |
                                      |
LOCK BQL                              |
...                                   |
main_loop_should_exit                 |
 pause_all_vcpus                      |
  1. set all cpus ->stop=true         |
     and then kick                    |
  2. return if all cpus are paused    |
     (by '->stopped == true'), else   |
  3. qemu_cond_wait [BQL UNLOCK]      |
                                      |LOCK BQL
                                      |...
                                      |do_vm_stop
                                      | pause_all_vcpus
                                      |  (A)set all cpus ->stop=true
                                      |     and then kick
                                      |  (B)return if all cpus are paused
                                      |     (by '->stopped == true'), else
                                      |  (C)qemu_cond_wait [BQL UNLOCK]
  4. be woken up and LOCK BQL         |  (D)be woken up BUT wait for BQL
  5. goto 2.                          |
     (BQL is still LOCKed)            |
 resume_all_vcpus                     |
  1. set all cpus ->stop=false        |
     and ->stopped=false              |
...                                   |
BQL UNLOCK                            |  (E)LOCK BQL
                                      |  (F)goto B. [but stopped is false now!]
                                      |Finally, sleep at step 3 forever.

As suggested by Paolo, resume_all_vcpus should notice this race, so we need
to move the change of runstate before pause_all_vcpus in do_vm_stop() and
ignore the resume request if runstate is not running.

Cc: Paolo Bonzini 
Cc: Dr . David Alan Gilbert 
Cc: Richard Henderson 
Signed-off-by: Longpeng 
---
 cpus.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index b4f8b84..ef441bd 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1026,9 +1026,9 @@ static int do_vm_stop(RunState state, bool send_stop)
 int ret = 0;
 
 if (runstate_is_running()) {
+runstate_set(state);
 cpu_disable_ticks();
 pause_all_vcpus();
-runstate_set(state);
 vm_state_notify(0, state);
 if (send_stop) {
 qapi_event_send_stop();
@@ -1899,6 +1899,10 @@ void resume_all_vcpus(void)
 {
 CPUState *cpu;
 
+if (!runstate_is_running()) {
+return;
+}
+
 qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
 CPU_FOREACH(cpu) {
 cpu_resume(cpu);
-- 
1.8.3.1
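To make the ordering argument in this patch concrete, here is a minimal single-threaded model in C. It is a sketch under stated assumptions, not QEMU code: `VM`, `VCpu`, `begin_vm_stop()` and `all_vcpus_paused()` are hypothetical, and the model only mirrors the `->stop`/`->stopped` flags plus the runstate check the patch relies on. Because the stop path now publishes the new runstate before pausing, a racing `resume_all_vcpus()` finds the VM not running and leaves the flags untouched, so when the stopping thread re-checks at step (B) it still finds every vCPU stopped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's runstate and per-vCPU flags. */
typedef enum { RUN_STATE_RUNNING, RUN_STATE_PAUSED } RunState;

typedef struct {
    bool stop;     /* pause requested */
    bool stopped;  /* vCPU has acknowledged the pause */
} VCpu;

typedef struct {
    RunState state;
    VCpu cpus[4];
    size_t ncpus;
} VM;

static bool runstate_is_running(const VM *vm)
{
    return vm->state == RUN_STATE_RUNNING;
}

/* Models resume_all_vcpus() with the early return added by the patch. */
static void resume_all_vcpus(VM *vm)
{
    if (!runstate_is_running(vm)) {
        return; /* a stop is in progress: do not undo it */
    }
    for (size_t i = 0; i < vm->ncpus; i++) {
        vm->cpus[i].stop = false;
        vm->cpus[i].stopped = false;
    }
}

/* Models the start of do_vm_stop(): the runstate is set before pausing. */
static void begin_vm_stop(VM *vm, RunState state)
{
    vm->state = state;
    for (size_t i = 0; i < vm->ncpus; i++) {
        vm->cpus[i].stop = true;
        vm->cpus[i].stopped = true; /* pretend every vCPU acknowledged */
    }
}

/* The check done at step (B): are all vCPUs still paused? */
static bool all_vcpus_paused(const VM *vm)
{
    for (size_t i = 0; i < vm->ncpus; i++) {
        if (!vm->cpus[i].stopped) {
            return false;
        }
    }
    return true;
}
```

A racing resume after `begin_vm_stop()` is now a no-op, while a resume issued when the VM really is running still clears the flags as before.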

Re: [PATCH 6/8] hw/ide: Do ide_drive_get() within pci_ide_create_devs()

2020-03-16 Thread Philippe Mathieu-Daudé


On 3/16/20 7:23 AM, Markus Armbruster wrote:
> Paolo Bonzini  writes:
>
>> On 13/03/20 23:16, BALATON Zoltan wrote:
>>>> +    pci_dev = pci_create_simple(pci_bus, -1, "cmd646-ide");
>>>> +    pci_ide_create_devs(pci_dev);
>>>
>>> Additionally, I think it may also make sense to move pci_ide_create_devs
>>> call into the realize methods of these IDE controllers so boards do not
>>> need to do it explicitly. These calls always follow the creation of the
>>> device immediately so could just be done internally in IDE device and
>>> simplify it further. I can attempt to prepare additional patches for
>>> that but first I'd like to hear if anyone has anything against that to
>>> avoid doing useless work.
>>
>> No, it's better to do it separately.  I think that otherwise you could
>> add another IDE controller with -device, and both controllers would try
>> to add the drives.
>
> Correct.
>
> Creating device frontends for -drive if=ide is the board's job.  Boards
> may delegate to suitable helpers.  I'd very much prefer these helpers
> not to live with device model code.  Board and device model code should
> be cleanly separated to reduce the temptation to muddle their
> responsibilities.  It's separation of concerns.
>
> I actually wish we had separate sub-trees for boards and devices instead
> of keeping both in hw/.

Never too late!

To be clear, you suggest:

- one dir with machines, boards, system-on-module
- one dir with devices, cpu, system-on-chips

Correct?

>> Basically, separating the call means that only automatically added
>> controllers obey "if=ide".

Re: [PULL 132/136] mem-prealloc: optimize large guest startup

2020-03-16 Thread Laurent Vivier

Hi,

a bug has been reported in launchpad for this patch:

[Regression]Powerpc kvm guest unable to start with hugepage backed
memory
https://bugs.launchpad.net/qemu/+bug/1866962

Thanks,
Laurent

Le 25/02/2020 à 13:07, Paolo Bonzini a écrit :
> From: bauerchen 
> 
> [desc]:
> Large memory VMs start slowly when using -mem-prealloc, and
> there are some areas to optimize in the current method:
> 
> 1. mmap is used to allocate the thread stacks while the page
> clearing threads are being created, and it tries to take
> mm->mmap_sem for write, but the clearing threads already hold it
> for read; this contention makes thread creation very slow;
> 
> 2. the method of calculating pages per thread is poor: if we use
> 64 threads to split 160 hugepages, 63 threads clear 2 pages and
> 1 thread clears 34 pages, so the overall speed is very slow;
> 
> To solve the first problem, we add a mutex in the thread function and
> start all threads only after all of them have been created.
> For the second problem, we spread the remainder over the threads: with
> 160 hugepages and 64 threads, 32 threads clear 3 pages
> and 32 threads clear 2 pages.
> 
> [test]:
> 320G 84c VM start time can be reduced to 10s
> 680G 84c VM start time can be reduced to 18s
> 
> Signed-off-by: bauerchen 
> Reviewed-by: Pan Rui 
> Reviewed-by: Ivan Ren 
> [Simplify computation of the number of pages per thread. - Paolo]
> Signed-off-by: Paolo Bonzini 
> ---
>  util/oslib-posix.c | 32 
>  1 file changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 5a291cc..897e8f3 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -76,6 +76,10 @@ static MemsetThread *memset_thread;
>  static int memset_num_threads;
>  static bool memset_thread_failed;
>  
> +static QemuMutex page_mutex;
> +static QemuCond page_cond;
> +static bool threads_created_flag;
> +
>  int qemu_get_thread_id(void)
>  {
>  #if defined(__linux__)
> @@ -403,6 +407,17 @@ static void *do_touch_pages(void *arg)
>  MemsetThread *memset_args = (MemsetThread *)arg;
>  sigset_t set, oldset;
>  
> +/*
> + * On Linux, the page faults from the loop below can cause mmap_sem
> + * contention with allocation of the thread stacks.  Do not start
> + * clearing until all threads have been created.
> + */
> +qemu_mutex_lock(&page_mutex);
> +while(!threads_created_flag){
> +qemu_cond_wait(&page_cond, &page_mutex);
> +}
> +qemu_mutex_unlock(&page_mutex);
> +
>  /* unblock SIGBUS */
>  sigemptyset(&set);
>  sigaddset(&set, SIGBUS);
> @@ -451,27 +466,28 @@ static inline int get_memset_num_threads(int smp_cpus)
>  static bool touch_all_pages(char *area, size_t hpagesize, size_t numpages,
>  int smp_cpus)
>  {
> -size_t numpages_per_thread;
> -size_t size_per_thread;
> +size_t numpages_per_thread, leftover;
>  char *addr = area;
>  int i = 0;
>  
>  memset_thread_failed = false;
> +threads_created_flag = false;
>  memset_num_threads = get_memset_num_threads(smp_cpus);
>  memset_thread = g_new0(MemsetThread, memset_num_threads);
> -numpages_per_thread = (numpages / memset_num_threads);
> -size_per_thread = (hpagesize * numpages_per_thread);
> +numpages_per_thread = numpages / memset_num_threads;
> +leftover = numpages % memset_num_threads;
>  for (i = 0; i < memset_num_threads; i++) {
>  memset_thread[i].addr = addr;
> -memset_thread[i].numpages = (i == (memset_num_threads - 1)) ?
> -numpages : numpages_per_thread;
> +memset_thread[i].numpages = numpages_per_thread + (i < leftover);
>  memset_thread[i].hpagesize = hpagesize;
>  qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
> do_touch_pages, &memset_thread[i],
> QEMU_THREAD_JOINABLE);
> -addr += size_per_thread;
> -numpages -= numpages_per_thread;
> +addr += memset_thread[i].numpages * hpagesize;
>  }
> +threads_created_flag = true;
> +qemu_cond_broadcast(&page_cond);
> +
>  for (i = 0; i < memset_num_threads; i++) {
>  qemu_thread_join(&memset_thread[i].pgthread);
>  }
>
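The page-splitting arithmetic described in the commit message can be checked with a small standalone C sketch. The function names are hypothetical: `old_pages_for_thread()` mirrors the removed loop (the last thread absorbs the whole remainder) and `new_pages_for_thread()` mirrors `numpages_per_thread + (i < leftover)` from the patch. It deliberately leaves out the thread start gate and the actual page touching.

```c
#include <stddef.h>

/* Old scheme: every thread gets numpages / nthreads pages, and the last
 * thread also takes everything that is left over. */
static size_t old_pages_for_thread(size_t numpages, size_t nthreads, size_t i)
{
    size_t per_thread = numpages / nthreads;
    return (i == nthreads - 1) ? numpages - per_thread * (nthreads - 1)
                               : per_thread;
}

/* New scheme: the remainder is spread one page at a time over the first
 * `leftover` threads. */
static size_t new_pages_for_thread(size_t numpages, size_t nthreads, size_t i)
{
    size_t per_thread = numpages / nthreads;
    size_t leftover = numpages % nthreads;
    return per_thread + (i < leftover);
}

/* Largest per-thread share, which roughly bounds the wall-clock time
 * when all threads clear pages in parallel. */
static size_t max_share(size_t (*fn)(size_t, size_t, size_t),
                        size_t numpages, size_t nthreads)
{
    size_t max = 0;
    for (size_t i = 0; i < nthreads; i++) {
        size_t n = fn(numpages, nthreads, i);
        if (n > max) {
            max = n;
        }
    }
    return max;
}
```

With 160 pages and 64 threads, the largest share drops from 34 pages to 3 while the total stays at 160.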

[PATCH v3 3/4] linux-user, nios2: sync syscall numbers with kernel v5.5

2020-03-16 Thread Laurent Vivier

Use helper script scripts/gensyscalls.sh to generate the file.

This adds the missing TARGET_NR_llseek and removes syscalls 1024
to 1079.

Add new syscalls from 288 (pkey_mprotect) to 434 (pidfd_open)

Signed-off-by: Laurent Vivier 
Reviewed-by: Alistair Francis 
---

Notes:
v2: add comments suggested by Taylor

 linux-user/nios2/syscall_nr.h | 650 +-
 1 file changed, 320 insertions(+), 330 deletions(-)

diff --git a/linux-user/nios2/syscall_nr.h b/linux-user/nios2/syscall_nr.h
index 8fb87864ca0b..32d485dc9ae8 100644
--- a/linux-user/nios2/syscall_nr.h
+++ b/linux-user/nios2/syscall_nr.h
@@ -1,334 +1,324 @@
+/*
+ * This file contains the system call numbers.
+ * Do not modify.
+ * This file is generated by scripts/gensyscalls.sh
+ */
 #ifndef LINUX_USER_NIOS2_SYSCALL_NR_H
 #define LINUX_USER_NIOS2_SYSCALL_NR_H
 
-#define TARGET_NR_io_setup  0
-#define TARGET_NR_io_destroy 1
-#define TARGET_NR_io_submit 2
-#define TARGET_NR_io_cancel 3
-#define TARGET_NR_io_getevents  4
-#define TARGET_NR_setxattr  5
-#define TARGET_NR_lsetxattr 6
-#define TARGET_NR_fsetxattr 7
-#define TARGET_NR_getxattr  8
-#define TARGET_NR_lgetxattr 9
-#define TARGET_NR_fgetxattr 10
-#define TARGET_NR_listxattr 11
-#define TARGET_NR_llistxattr 12
-#define TARGET_NR_flistxattr 13
-#define TARGET_NR_removexattr   14
-#define TARGET_NR_lremovexattr  15
-#define TARGET_NR_fremovexattr  16
-#define TARGET_NR_getcwd 17
-#define TARGET_NR_lookup_dcookie 18
-#define TARGET_NR_eventfd2  19
-#define TARGET_NR_epoll_create1 20
-#define TARGET_NR_epoll_ctl 21
-#define TARGET_NR_epoll_pwait   22
-#define TARGET_NR_dup   23
-#define TARGET_NR_dup3  24
-#define TARGET_NR_fcntl64   25
-#define TARGET_NR_inotify_init1 26
-#define TARGET_NR_inotify_add_watch 27
-#define TARGET_NR_inotify_rm_watch  28
-#define TARGET_NR_ioctl 29
-#define TARGET_NR_ioprio_set 30
-#define TARGET_NR_ioprio_get 31
-#define TARGET_NR_flock 32
-#define TARGET_NR_mknodat   33
-#define TARGET_NR_mkdirat   34
-#define TARGET_NR_unlinkat  35
-#define TARGET_NR_symlinkat 36
-#define TARGET_NR_linkat 37
-#define TARGET_NR_renameat  38
-#define TARGET_NR_umount2   39
-#define TARGET_NR_mount 40
-#define TARGET_NR_pivot_root 41
-#define TARGET_NR_nfsservctl 42
-#define TARGET_NR_statfs64  43
-#define TARGET_NR_fstatfs64 44
-#define TARGET_NR_truncate64 45
-#define TARGET_NR_ftruncate64   46
-#define TARGET_NR_fallocate 47
-#define TARGET_NR_faccessat 48
-#define TARGET_NR_chdir 49
-#define TARGET_NR_fchdir 50
-#define TARGET_NR_chroot 51
-#define TARGET_NR_fchmod 52
-#define TARGET_NR_fchmodat  53
-#define TARGET_NR_fchownat  54
-#define TARGET_NR_fchown 55
-#define TARGET_NR_openat 56
-#define TARGET_NR_close 57
-#define TARGET_NR_vhangup   58
-#define TARGET_NR_pipe2 59
-#define TARGET_NR_quotactl  60
-#define TARGET_NR_getdents64 61
-#define TARGET_NR_read  63
-#define TARGET_NR_write 64
-#define TARGET_NR_readv 65
-#define TARGET_NR_writev 66
-#define TARGET_NR_pread64   67
-#define TARGET_NR_pwrite64  68
-#define TARGET_NR_preadv 69
-#define TARGET_NR_pwritev   70
-#define TARGET_NR_sendfile64 71
-#define TARGET_NR_pselect6  72
-#define TARGET_NR_ppoll 73
-#define TARGET_NR_signalfd4 74
-#define TARGET_NR_vmsplice  75
-#define TARGET_NR_splice 76
-#define TARGET_NR_tee   77
-#define TARGET_NR_readlinkat 78
-#define TARGET_NR_fstatat64 79
-#define TARGET_NR_fstat64   80
-#define TARGET_NR_sync  81
-#define TARGET_NR_fsync 82
-#define TARGET_NR_fdatasync 83
-#define TARGET_NR_sync_file_range   84
-#define TARGET_NR_timerfd_create 85
-#define TARGET_NR_timerfd_settime   86
-#define

Re: [PATCH v9] s390x: protvirt: Fence huge pages

2020-03-16 Thread Janosch Frank

On 3/13/20 9:21 AM, Christian Borntraeger wrote:
> 
> 
> On 12.03.20 17:25, Janosch Frank wrote:
>> Let's bail out of the protected transition if we detect that huge
>> pages might be in use.
>>
>> Signed-off-by: Janosch Frank 
>> ---
>>
>> I'd like to squash this into the unpack patch to give a proper error
>> message if we try to transition into the protected mode while being
>> backed by huge pages. 
> 
> Looks good.
> But maybe we can do it better. Why not reverse the logic and,
> instead of having kvm_s390_get_hpage_1m, define a protvirt_allowed
> that as of today only returns hugepages != 1:
> Then we could (for kvm-stub.c) also say protvirt_allowed=false;
> And if other reasons come along we can extend.
> 
> We could also keep this patch separate, does not really matter.

The *_allowed() functions are all based on the machine and part of
s390-virtio-ccw.c so having one in kvm.c looks strange.

!protvirt_allowed could have any number of reasons in the future; I
introduced this patch to give a specific error message that can help the
user choose the right options when investigating the error.

Other ideas or a revised one?

> 
>>
>> ---
>>  hw/s390x/ipl.h | 16 
>>  hw/s390x/s390-virtio-ccw.c |  1 -
>>  target/s390x/diag.c| 23 ---
>>  target/s390x/kvm-stub.c|  5 +
>>  target/s390x/kvm.c |  5 +
>>  target/s390x/kvm_s390x.h   |  1 +
>>  6 files changed, 35 insertions(+), 16 deletions(-)
>>
>> diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
>> index af5bb130a6334821..95e3183c9cccf8b6 100644
>> --- a/hw/s390x/ipl.h
>> +++ b/hw/s390x/ipl.h
>> @@ -185,6 +185,22 @@ struct S390IPLState {
>>  typedef struct S390IPLState S390IPLState;
>>  QEMU_BUILD_BUG_MSG(offsetof(S390IPLState, iplb) & 3, "alignment of iplb wrong");
>>  
>> +#define DIAG_308_RC_OK  0x0001
>> +#define DIAG_308_RC_NO_CONF 0x0102
>> +#define DIAG_308_RC_INVALID 0x0402
>> +#define DIAG_308_RC_NO_PV_CONF  0x0902
>> +#define DIAG_308_RC_INVAL_FOR_PV 0x0a02
>> +
>> +#define DIAG308_RESET_MOD_CLR   0
>> +#define DIAG308_RESET_LOAD_NORM 1
>> +#define DIAG308_LOAD_CLEAR  3
>> +#define DIAG308_LOAD_NORMAL_DUMP 4
>> +#define DIAG308_SET 5
>> +#define DIAG308_STORE   6
>> +#define DIAG308_PV_SET  8
>> +#define DIAG308_PV_STORE 9
>> +#define DIAG308_PV_START 10
>> +
>>  #define S390_IPL_TYPE_FCP 0x00
>>  #define S390_IPL_TYPE_CCW 0x02
>>  #define S390_IPL_TYPE_PV 0x05
>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>> index ebdaaa3a001f6e8c..d32f35c7f47b9c1d 100644
>> --- a/hw/s390x/s390-virtio-ccw.c
>> +++ b/hw/s390x/s390-virtio-ccw.c
>> @@ -361,7 +361,6 @@ out_err:
>>  return rc;
>>  }
>>  
>> -#define DIAG_308_RC_INVAL_FOR_PV 0x0a02
>>  static void s390_machine_inject_pv_error(CPUState *cs)
>>  {
>>  int r1 = (cs->kvm_run->s390_sieic.ipa & 0x00f0) >> 4;
>> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
>> index b245e557037ded06..b1ca81633b83bbdc 100644
>> --- a/target/s390x/diag.c
>> +++ b/target/s390x/diag.c
>> @@ -21,6 +21,7 @@
>>  #include "hw/s390x/ipl.h"
>>  #include "hw/s390x/s390-virtio-ccw.h"
>>  #include "hw/s390x/pv.h"
>> +#include "kvm_s390x.h"
>>  
>>  int handle_diag_288(CPUS390XState *env, uint64_t r1, uint64_t r3)
>>  {
>> @@ -50,21 +51,6 @@ int handle_diag_288(CPUS390XState *env, uint64_t r1, uint64_t r3)
>>  return diag288_class->handle_timer(diag288, func, timeout);
>>  }
>>  
>> -#define DIAG_308_RC_OK  0x0001
>> -#define DIAG_308_RC_NO_CONF 0x0102
>> -#define DIAG_308_RC_INVALID 0x0402
>> -#define DIAG_308_RC_NO_PV_CONF  0x0902
>> -
>> -#define DIAG308_RESET_MOD_CLR   0
>> -#define DIAG308_RESET_LOAD_NORM 1
>> -#define DIAG308_LOAD_CLEAR  3
>> -#define DIAG308_LOAD_NORMAL_DUMP 4
>> -#define DIAG308_SET 5
>> -#define DIAG308_STORE   6
>> -#define DIAG308_PV_SET  8
>> -#define DIAG308_PV_STORE 9
>> -#define DIAG308_PV_START 10
>> -
>>  static int diag308_parm_check(CPUS390XState *env, uint64_t r1, uint64_t addr,
>>uintptr_t ra, bool write)
>>  {
>> @@ -166,6 +152,13 @@ out:
>>  return;
>>  }
>>  
>> +if (kvm_s390_get_hpage_1m()) {
>> +error_report("Protected VMs can currently not be backed with "
>> + "huge pages");
>> +env->regs[r1 + 1] = DIAG_308_RC_INVAL_FOR_PV;
>> +return;
>> +}
>> +
>>  s390_ipl_reset_request(cs, S390_RESET_PV);
>>  break;
>>  default:
>> diff --git a/target/s390x/kvm-stub.c b/target/s390x/kvm-stub.c
>> index c4cd497f850eb9c7..aa185017a2a886ca 100644
>> --- a/target/s390x/kvm-stub.c
>> +++ b/target/s390x/kvm-stub.c
>> @@ -39,6 +39,11 @@ int kvm_s390_vcpu_interrupt_post_load(S390CPU *cpu)
>>  return 0;
>>

[PATCH v2] target/i386: Add ARCH_CAPABILITIES related bits into Icelake-Server CPU model

2020-03-16 Thread Xiaoyao Li

Current Icelake-Server CPU model lacks all the features enumerated by
MSR_IA32_ARCH_CAPABILITIES.

Add them so that guests using the "Icelake-Server" CPU model can see all of them.

Signed-off-by: Xiaoyao Li 
---
v2:
 - Add it as a new version.
---
 target/i386/cpu.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 92fafa265914..5fba6a2ad6b3 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3496,6 +3496,19 @@ static X86CPUDefinition builtin_x86_defs[] = {
 { /* end of list */ }
 },
 },
+{
+.version = 3,
+.props = (PropValue[]) {
+{ "arch-capabilities", "on" },
+{ "rdctl-no", "on" },
+{ "ibrs-all", "on" },
+{ "skip-l1dfl-vmentry", "on" },
+{ "mds-no", "on" },
+{ "pschange-mc-no", "on" },
+{ "taa-no", "on" },
+{ /* end of list */ }
+},
+},
 { /* end of list */ }
 }
 },
-- 
2.20.1

[PATCH v3 4/4] linux-user, openrisc: sync syscall numbers with kernel v5.5

2020-03-16 Thread Laurent Vivier

Use helper script scripts/gensyscalls.sh to generate the file.

Add TARGET_NR_or1k_atomic
Remove useless comments and blank lines.
Define the __NR_XXX64 syscalls directly rather than using the
intermediate __NR3264 definition.

Remove wrong cut'n'paste (like "#ifdef __ARCH_WANT_SYNC_FILE_RANGE2")

Add new syscalls from 286 (preadv) to 434 (pidfd_open).

Remove obsolete syscalls 1024 (open) to 1079 (fork).

Signed-off-by: Laurent Vivier 
Reviewed-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---

Notes:
v2: add comments suggested by Taylor

 linux-user/openrisc/syscall_nr.h | 309 +++
 1 file changed, 62 insertions(+), 247 deletions(-)

diff --git a/linux-user/openrisc/syscall_nr.h b/linux-user/openrisc/syscall_nr.h
index 7763dbcfd8b3..340383beb2c6 100644
--- a/linux-user/openrisc/syscall_nr.h
+++ b/linux-user/openrisc/syscall_nr.h
@@ -1,13 +1,17 @@
+/*
+ * This file contains the system call numbers.
+ * Do not modify.
+ * This file is generated by scripts/gensyscalls.sh
+ */
 #ifndef LINUX_USER_OPENRISC_SYSCALL_NR_H
 #define LINUX_USER_OPENRISC_SYSCALL_NR_H
 
 #define TARGET_NR_io_setup 0
+#define TARGET_NR_or1k_atomic TARGET_NR_arch_specific_syscall
 #define TARGET_NR_io_destroy 1
 #define TARGET_NR_io_submit 2
 #define TARGET_NR_io_cancel 3
 #define TARGET_NR_io_getevents 4
-
-/* fs/xattr.c */
 #define TARGET_NR_setxattr 5
 #define TARGET_NR_lsetxattr 6
 #define TARGET_NR_fsetxattr 7
@@ -20,63 +24,36 @@
 #define TARGET_NR_removexattr 14
 #define TARGET_NR_lremovexattr 15
 #define TARGET_NR_fremovexattr 16
-
-/* fs/dcache.c */
 #define TARGET_NR_getcwd 17
-
-/* fs/cookies.c */
 #define TARGET_NR_lookup_dcookie 18
-
-/* fs/eventfd.c */
 #define TARGET_NR_eventfd2 19
-
-/* fs/eventpoll.c */
 #define TARGET_NR_epoll_create1 20
 #define TARGET_NR_epoll_ctl 21
 #define TARGET_NR_epoll_pwait 22
-
-/* fs/fcntl.c */
 #define TARGET_NR_dup 23
 #define TARGET_NR_dup3 24
-#define TARGET_NR_3264_fcntl 25
-
-/* fs/inotify_user.c */
+#define TARGET_NR_fcntl64 25
 #define TARGET_NR_inotify_init1 26
 #define TARGET_NR_inotify_add_watch 27
 #define TARGET_NR_inotify_rm_watch 28
-
-/* fs/ioctl.c */
 #define TARGET_NR_ioctl 29
-
-/* fs/ioprio.c */
 #define TARGET_NR_ioprio_set 30
 #define TARGET_NR_ioprio_get 31
-
-/* fs/locks.c */
 #define TARGET_NR_flock 32
-
-/* fs/namei.c */
 #define TARGET_NR_mknodat 33
 #define TARGET_NR_mkdirat 34
 #define TARGET_NR_unlinkat 35
 #define TARGET_NR_symlinkat 36
 #define TARGET_NR_linkat 37
 #define TARGET_NR_renameat 38
-
-/* fs/namespace.c */
 #define TARGET_NR_umount2 39
 #define TARGET_NR_mount 40
 #define TARGET_NR_pivot_root 41
-
-/* fs/nfsctl.c */
 #define TARGET_NR_nfsservctl 42
-
-/* fs/open.c */
-#define TARGET_NR_3264_statfs 43
-#define TARGET_NR_3264_fstatfs 44
-#define TARGET_NR_3264_truncate 45
-#define TARGET_NR_3264_ftruncate 46
-
+#define TARGET_NR_statfs64 43
+#define TARGET_NR_fstatfs64 44
+#define TARGET_NR_truncate64 45
+#define TARGET_NR_ftruncate64 46
 #define TARGET_NR_fallocate 47
 #define TARGET_NR_faccessat 48
 #define TARGET_NR_chdir 49
@@ -89,18 +66,10 @@
 #define TARGET_NR_openat 56
 #define TARGET_NR_close 57
 #define TARGET_NR_vhangup 58
-
-/* fs/pipe.c */
 #define TARGET_NR_pipe2 59
-
-/* fs/quota.c */
 #define TARGET_NR_quotactl 60
-
-/* fs/readdir.c */
 #define TARGET_NR_getdents64 61
-
-/* fs/read_write.c */
-#define TARGET_NR_3264_lseek 62
+#define TARGET_NR_llseek 62
 #define TARGET_NR_read 63
 #define TARGET_NR_write 64
 #define TARGET_NR_readv 65
@@ -109,85 +78,42 @@
 #define TARGET_NR_pwrite64 68
 #define TARGET_NR_preadv 69
 #define TARGET_NR_pwritev 70
-
-/* fs/sendfile.c */
-#define TARGET_NR_3264_sendfile 71
-
-/* fs/select.c */
+#define TARGET_NR_sendfile64 71
 #define TARGET_NR_pselect6 72
 #define TARGET_NR_ppoll 73
-
-/* fs/signalfd.c */
 #define TARGET_NR_signalfd4 74
-
-/* fs/splice.c */
 #define TARGET_NR_vmsplice 75
 #define TARGET_NR_splice 76
 #define TARGET_NR_tee 77
-
-/* fs/stat.c */
 #define TARGET_NR_readlinkat 78
-#define TARGET_NR_3264_fstatat 79
-#define TARGET_NR_3264_fstat 80
-
-/* fs/sync.c */
+#define TARGET_NR_fstatat64 79
+#define TARGET_NR_fstat64 80
 #define TARGET_NR_sync 81
 #define TARGET_NR_fsync 82
 #define TARGET_NR_fdatasync 83
-
-#ifdef __ARCH_WANT_SYNC_FILE_RANGE2
-#define TARGET_NR_sync_file_range2 84
-#else
 #define TARGET_NR_sync_file_range 84
-#endif
-
-/* fs/timerfd.c */
 #define TARGET_NR_timerfd_create 85
 #define TARGET_NR_timerfd_settime 86
 #define TARGET_NR_timerfd_gettime 87
-
-/* fs/utimes.c */
 #define TARGET_NR_utimensat 88
-
-/* kernel/acct.c */
 #define TARGET_NR_acct 89
-
-/* kernel/capability.c */
 #define TARGET_NR_capget 90
 #define TARGET_NR_capset 91
-
-/* kernel/exec_domain.c */
 #define TARGET_NR_personality 92
-
-/* kernel/exit.c */
 #define TARGET_NR_exit 93
 #define TARGET_NR_exit_group 94
 #define TARGET_NR_waitid 95
-
-/* kernel/fork.c */
 #define TARGET_NR_set_tid_address 96
 #define TARGET_NR_unshare 97
-
-/* kernel/futex.c */
 #define

Re: [PATCH v4 2/3] mac_via: fix incorrect creation of mos6522 device in mac_via

2020-03-16 Thread Paolo Bonzini

On 16/03/20 07:03, Markus Armbruster wrote:
> Paolo Bonzini  writes:
> 
>> On 15/03/20 15:56, Markus Armbruster wrote:

>>>> The question is why they are not, i.e. where does the above reasoning
>>>> break.
>>> I don't know.  But let's for the sake of the argument assume this
>>> actually worked.  Asking for help in the monitor then *still* has side
>>> effects visible in the time span between .instance_init() and
>>> finalization.
>>>
>>> Why is that harmless?
>>
>> I don't really have an answer, but if that is a problem we could change
>> "info qtree" to skip non-realized devices.
> 
> Can we convince ourselves that "info qtree" is the *only* way to observe
> these side effects?

There is of course qom-get/qom-set, but those _should_ show this side
effect.

If we decide that "info qtree" should only show devices visible to the
guest (as opposed to all objects that have been created), then "show
only realized devices" is not even a hack but the correct implementation
of the concept.

Paolo

> If yes, a hack to ignore unrealized devices "fixes" the problem.
> 
> If no, it sweeps it under the rug.

Re: [PULL 132/136] mem-prealloc: optimize large guest startup

2020-03-16 Thread Paolo Bonzini

On 16/03/20 09:42, Laurent Vivier wrote:
> Hi,
> 
> a bug has been reported in launchpad for this patch:
> 
> [Regression]Powerpc kvm guest unable to start with hugepage backed
> memory
> https://bugs.launchpad.net/qemu/+bug/1866962

Indeed, I'm sending the pull request with the fix today.  Sorry for the
breakage.

Paolo

Re: [PATCH] docs/conf.py: Raise ConfigError for bad Sphinx Python version

2020-03-16 Thread Peter Maydell

On Fri, 13 Mar 2020 at 22:30, John Snow  wrote:
> When was ConfigError introduced, and what's our minimum Sphinx version?
> (Hm, looks like it's not versioned, so I'll trust it's been around a while.)

Yeah, it's been around a long time; our minimum Sphinx version is 1.3.

thanks
-- PMM

Re: [PATCH 0/2] Fix Cooperlake CPU model

2020-03-16 Thread Paolo Bonzini

On 16/03/20 02:39, Zhang, Cathy wrote:
> On 1/7/2020 9:31 PM, Paolo Bonzini wrote:
>> On 25/12/19 07:30, Xiaoyao Li wrote:
>>> Current Cooperlake CPU model lacks VMX features which were introduced
>>> by Paolo several months ago, and it also lacks 2 security features in
>>> MSR_IA32_ARCH_CAPABILITIES disclosed recently.
>>>
>>> Xiaoyao Li (2):
>>>    target/i386: Add new bit definitions of MSR_IA32_ARCH_CAPABILITIES
>>>    target/i386: Add missed features to Cooperlake CPU model
>>>
>>>   target/i386/cpu.c | 51 ++-
>>>   target/i386/cpu.h | 13 +++-
>>>   2 files changed, 58 insertions(+), 6 deletions(-)
>>>
>> Queued, thanks.
>>
>> Paolo
> 
> Hi Paolo,
> 
> May I ask whether you will put all the patches for the Cooper Lake
> CPU model into QEMU v5.0-rc0?

These are included already:

commit b952544fe8a061f0c0cccfd50a58220bc6ac94da
Merge: dc65a5bdc9 083b266f69
Author: Peter Maydell 
Date:   Fri Jan 10 17:16:49 2020 +

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Compat machines fix (Denis)
* Command line parsing fixes (Michal, Peter, Xiaoyao)
* Cooperlake CPU model fixes (Xiaoyao)
* i386 gdb fix (mkdolata)
* IOEventHandler cleanup (Philippe)
* icount fix (Pavel)
* RR support for random number sources (Pavel)
* Kconfig fixes (Philippe)

Paolo

[PATCH v7 1/4] qcow2: introduce compression type feature

2020-03-16 Thread Denis Plotnikov

This patch adds the preparatory parts of the incompatible compression type
feature to qcow2, allowing different compression methods to be used for
(de)compressing image clusters.

It is implied that the compression type is set at image creation and
can only be changed later by image conversion; thus the compression type
defines the single compression algorithm used for the image, and
therefore for all image clusters.

The goal of the feature is to add support for other compression methods
to qcow2, for example ZSTD, which compresses more effectively than ZLIB.

The default compression is ZLIB. Images created with the ZLIB compression
type are backward compatible with older QEMU versions.

Adding the compression type breaks a number of tests because the
compression type is now reported on image creation, and the size and
offsets in the qcow2 header change.

The tests are fixed in the following ways:
* filter out compression_type for many tests
* fix header size, feature table size and backing file offset
  affected tests: 031, 036, 061, 080
  header_size += 8: 1 byte compression type
                    7 bytes padding
  feature_table += 48: incompatible feature compression type
  backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change)
* add "compression type" for test output matching when it isn't filtered
  affected tests: 049, 060, 061, 065, 144, 182, 242, 255
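The `header_size += 8` arithmetic above follows from appending one byte plus seven bytes of padding to the 104-byte qcow2 v3 header. The sketch below is an illustrative C struct, not QEMU's `QCowHeader` definition: the field list follows the qcow2 v3 header layout, it relies on the GCC/Clang `packed` attribute, and it ignores the on-disk big-endian byte order because only sizes and offsets matter here. The names `qcow2_header_sketch` and `INCOMPAT_COMPRESSION*` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative qcow2 v3 header with the new field appended.
 * On disk all multi-byte fields are big-endian; only layout matters here. */
struct qcow2_header_sketch {
    uint32_t magic;
    uint32_t version;
    uint64_t backing_file_offset;
    uint32_t backing_file_size;
    uint32_t cluster_bits;
    uint64_t size;
    uint32_t crypt_method;
    uint32_t l1_size;
    uint64_t l1_table_offset;
    uint64_t refcount_table_offset;
    uint32_t refcount_table_clusters;
    uint32_t nb_snapshots;
    uint64_t snapshots_offset;
    uint64_t incompatible_features;
    uint64_t compatible_features;
    uint64_t autoclear_features;
    uint32_t refcount_order;
    uint32_t header_length;

    /* Additional fields */
    uint8_t compression_type;
    uint8_t padding[7];   /* keeps the header a multiple of 8 bytes */
} __attribute__((packed));

/* C11 counterpart of QEMU_BUILD_BUG_ON(sizeof(QCowHeader) % 8 != 0). */
_Static_assert(sizeof(struct qcow2_header_sketch) % 8 == 0,
               "header must be a multiple of 8 bytes");

/* Incompatible-feature bit 3 as used by the patch. */
enum {
    INCOMPAT_COMPRESSION_BITNR = 3,
    INCOMPAT_COMPRESSION = 1 << INCOMPAT_COMPRESSION_BITNR,
};
```

The new field starts right where the v3 header used to end (offset 104), and the padding brings the total to 112 bytes.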

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json |  22 +-
 block/qcow2.h|  20 +-
 include/block/block_int.h|   1 +
 block/qcow2.c| 113 +++
 tests/qemu-iotests/031.out   |  14 ++--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 ++--
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  34 ++
 tests/qemu-iotests/065   |  28 +---
 tests/qemu-iotests/080   |   2 +-
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/common.filter |   3 +-
 16 files changed, 267 insertions(+), 96 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 85e27bb61f..a306484973 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -78,6 +78,8 @@
 #
 # @bitmaps: A list of qcow2 bitmap details (since 4.0)
 #
+# @compression-type: the image cluster compression method (since 5.0)
+#
 # Since: 1.7
 ##
 { 'struct': 'ImageInfoSpecificQCow2',
@@ -89,7 +91,8 @@
   '*corrupt': 'bool',
   'refcount-bits': 'int',
   '*encrypt': 'ImageInfoSpecificQCow2Encryption',
-  '*bitmaps': ['Qcow2BitmapInfo']
+  '*bitmaps': ['Qcow2BitmapInfo'],
+  'compression-type': 'Qcow2CompressionType'
   } }
 
 ##
@@ -4392,6 +4395,18 @@
   'data': [ 'v2', 'v3' ] }
 
 
+##
+# @Qcow2CompressionType:
+#
+# Compression type used in qcow2 image file
+#
+# @zlib: zlib compression, see 
+#
+# Since: 5.0
+##
+{ 'enum': 'Qcow2CompressionType',
+  'data': [ 'zlib' ] }
+
 ##
 # @BlockdevCreateOptionsQcow2:
 #
@@ -4415,6 +4430,8 @@
 # allowed values: off, falloc, full, metadata)
 # @lazy-refcounts: True if refcounts may be updated lazily (default: off)
 # @refcount-bits: Width of reference counts in bits (default: 16)
+# @compression-type: The image cluster compression method
+#(default: zlib, since 5.0)
 #
 # Since: 2.12
 ##
@@ -4430,7 +4447,8 @@
 '*cluster-size':'size',
 '*preallocation':   'PreallocMode',
 '*lazy-refcounts':  'bool',
-'*refcount-bits':   'int' } }
+'*refcount-bits':   'int',
+'*compression-type':'Qcow2CompressionType' } }
 
 ##
 # @BlockdevCreateOptionsQed:
diff --git a/block/qcow2.h b/block/qcow2.h
index 0942126232..cb6bf2ab83 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -146,8 +146,16 @@ typedef struct QCowHeader {
 
 uint32_t refcount_order;
 uint32_t header_length;
+
+/* Additional fields */
+uint8_t compression_type;
+
+/* header must be a multiple of 8 */
+uint8_t padding[7];
 } QEMU_PACKED QCowHeader;
 
+QEMU_BUILD_BUG_ON(sizeof(QCowHeader) % 8 != 0);
+
 typedef struct QEMU_PACKED QCowSnapshotHeader {
 /* header is 8 byte aligned */
 uint64_t l1_table_offset;
@@ -216,13 +224,16 @@ enum {
 QCOW2_INCOMPAT_DIRTY_BITNR  = 0,
 QCOW2_INCOMPAT_CORRUPT_BITNR= 1,
 QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
+QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
 QCOW2_INCOMPAT_DIRTY= 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
 QCOW2_INCOMPAT_CORRUPT  = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
 QCOW2_INCOMPAT_DATA_FILE= 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
+QCOW2_INCOMPAT_COMPRESSION  = 1 <<

Re: [PATCH 1/2] block: bdrv_set_backing_bs: fix use-after-free

2020-03-16 Thread Philippe Mathieu-Daudé


On 3/16/20 7:06 AM, Vladimir Sementsov-Ogievskiy wrote:

A use-after-free is possible: bdrv_unref_child() leaves bs->backing
freed but not NULL. bdrv_attach_child() may run a nested polling loop
due to drain, and the freed pointer may then be accessed.

I've reproduced the following crash in iotest 30 with modified code. It
does not reproduce on master, but still seems possible:

 #0  __strcmp_avx2 () at /lib64/libc.so.6
 #1  bdrv_backing_overridden (bs=0x55c9d3cc2060) at block.c:6350
 #2  bdrv_refresh_filename (bs=0x55c9d3cc2060) at block.c:6404
 #3  bdrv_backing_attach (c=0x55c9d48e5520) at block.c:1063
 #4  bdrv_replace_child_noperm
 (child=child@entry=0x55c9d48e5520,
 new_bs=new_bs@entry=0x55c9d3cc2060) at block.c:2290
 #5  bdrv_replace_child
 (child=child@entry=0x55c9d48e5520,
 new_bs=new_bs@entry=0x55c9d3cc2060) at block.c:2320
 #6  bdrv_root_attach_child
 (child_bs=child_bs@entry=0x55c9d3cc2060,
 child_name=child_name@entry=0x55c9d241d478 "backing",
 child_role=child_role@entry=0x55c9d26ecee0 ,
 ctx=, perm=, shared_perm=21,
 opaque=0x55c9d3c5a3d0, errp=0x7ffd117108e0) at block.c:2424
 #7  bdrv_attach_child
 (parent_bs=parent_bs@entry=0x55c9d3c5a3d0,
 child_bs=child_bs@entry=0x55c9d3cc2060,
 child_name=child_name@entry=0x55c9d241d478 "backing",
 child_role=child_role@entry=0x55c9d26ecee0 ,
 errp=errp@entry=0x7ffd117108e0) at block.c:5876
 #8  in bdrv_set_backing_hd
 (bs=bs@entry=0x55c9d3c5a3d0,
 backing_hd=backing_hd@entry=0x55c9d3cc2060,
 errp=errp@entry=0x7ffd117108e0)
 at block.c:2576
 #9  stream_prepare (job=0x55c9d49d84a0) at block/stream.c:150
 #10 job_prepare (job=0x55c9d49d84a0) at job.c:761
 #11 job_txn_apply (txn=, fn=) at
 job.c:145
 #12 job_do_finalize (job=0x55c9d49d84a0) at job.c:778
 #13 job_completed_txn_success (job=0x55c9d49d84a0) at job.c:832
 #14 job_completed (job=0x55c9d49d84a0) at job.c:845
 #15 job_completed (job=0x55c9d49d84a0) at job.c:836
 #16 job_exit (opaque=0x55c9d49d84a0) at job.c:864
 #17 aio_bh_call (bh=0x55c9d471a160) at util/async.c:117
 #18 aio_bh_poll (ctx=ctx@entry=0x55c9d3c46720) at util/async.c:117
 #19 aio_poll (ctx=ctx@entry=0x55c9d3c46720,
 blocking=blocking@entry=true)
 at util/aio-posix.c:728
 #20 bdrv_parent_drained_begin_single (poll=true, c=0x55c9d3d558f0)
 at block/io.c:121
 #21 bdrv_parent_drained_begin_single (c=c@entry=0x55c9d3d558f0,
 poll=poll@entry=true)
 at block/io.c:114
 #22 bdrv_replace_child_noperm
 (child=child@entry=0x55c9d3d558f0,
 new_bs=new_bs@entry=0x55c9d3d27300) at block.c:2258
 #23 bdrv_replace_child
 (child=child@entry=0x55c9d3d558f0,
 new_bs=new_bs@entry=0x55c9d3d27300) at block.c:2320
 #24 bdrv_root_attach_child
 (child_bs=child_bs@entry=0x55c9d3d27300,
 child_name=child_name@entry=0x55c9d241d478 "backing",
 child_role=child_role@entry=0x55c9d26ecee0 ,
 ctx=, perm=, shared_perm=21,
 opaque=0x55c9d3cc2060, errp=0x7ffd11710c60) at block.c:2424
 #25 bdrv_attach_child
 (parent_bs=parent_bs@entry=0x55c9d3cc2060,
 child_bs=child_bs@entry=0x55c9d3d27300,
 child_name=child_name@entry=0x55c9d241d478 "backing",
 child_role=child_role@entry=0x55c9d26ecee0 ,
 errp=errp@entry=0x7ffd11710c60) at block.c:5876
 #26 bdrv_set_backing_hd
 (bs=bs@entry=0x55c9d3cc2060,
 backing_hd=backing_hd@entry=0x55c9d3d27300,
 errp=errp@entry=0x7ffd11710c60)
 at block.c:2576
 #27 stream_prepare (job=0x55c9d495ead0) at block/stream.c:150
 ...



Apparently:
Fixes: 12fa4af61f (block: Add Error parameter to bdrv_set_backing_hd)
Right?


Signed-off-by: Vladimir Sementsov-Ogievskiy 


Reviewed-by: Philippe Mathieu-Daudé 


---
  block.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 957630b1c5..a862ce4df9 100644
--- a/block.c
+++ b/block.c
@@ -2735,10 +2735,10 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
  
  if (bs->backing) {

  bdrv_unref_child(bs, bs->backing);
+bs->backing = NULL;
  }
  
  if (!backing_hd) {

-bs->backing = NULL;
  goto out;
  }
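Judging from the backtrace above, attaching the new backing child can run a nested aio_poll(), and a job callback reached from there appears to look at bs->backing while it still points to the child that bdrv_unref_child() has just released. The fix clears the pointer right after the unref, on both paths. A minimal sketch of that ordering, assuming hypothetical Child/Node types and helpers that stand in for BdrvChild, BlockDriverState and the nested poll (not QEMU's API):

```c
#include <stdlib.h>

/* Hypothetical stand-ins for BdrvChild and BlockDriverState. */
typedef struct Child {
    int id;
} Child;

typedef struct Node {
    Child *backing;
} Node;

/* Records what code re-entered during the attach step observed. */
static const Child *seen_during_attach;

/* Stands in for a nested aio_poll() running another job's callback
 * that inspects node->backing while the new child is attached. */
static void nested_poll(const Node *n)
{
    seen_during_attach = n->backing;
}

/* Mirrors the patched control flow of bdrv_set_backing_hd(). */
static void set_backing(Node *n, Child *new_child)
{
    if (n->backing) {
        free(n->backing);      /* like bdrv_unref_child() */
        n->backing = NULL;     /* the fix: no dangling pointer remains */
    }

    if (!new_child) {
        return;                /* like "goto out" */
    }

    nested_poll(n);            /* attaching may poll and re-enter */
    n->backing = new_child;
}
```

Without the `n->backing = NULL` line, `nested_poll()` would observe a pointer to freed memory whenever an existing backing child is replaced.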

Re: [PATCH v7 3/5] virtio-iommu: Call iommu notifier for attach/detach

2020-03-16 Thread Bharat Bhushan

Hi Eric,

On Mon, Mar 16, 2020 at 1:15 PM Bharat Bhushan  wrote:
>
> Hi Eric,
>
> On Mon, Mar 16, 2020 at 1:02 PM Auger Eric  wrote:
> >
> > Hi Bharat,
> >
> > On 3/16/20 7:41 AM, Bharat Bhushan wrote:
> > > Hi Eric,
> > >
> > > On Fri, Mar 13, 2020 at 8:11 PM Auger Eric  wrote:
> > >>
> > >> Hi Bharat
> > >>
> > >> On 3/13/20 8:48 AM, Bharat Bhushan wrote:
> > >>> iommu-notifier are called when a device is attached
> > >> IOMMU notifiers
> > >>> or detached to as address-space.
> > >>> This is needed for VFIO.
> > >> and vhost for detach
> > >>>
> > >>> Signed-off-by: Bharat Bhushan 
> > >>> ---
> > >>>  hw/virtio/virtio-iommu.c | 47 
> > >>>  1 file changed, 47 insertions(+)
> > >>>
> > >>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> > >>> index e51344a53e..2006f72901 100644
> > >>> --- a/hw/virtio/virtio-iommu.c
> > >>> +++ b/hw/virtio/virtio-iommu.c
> > >>> @@ -49,6 +49,7 @@ typedef struct VirtIOIOMMUEndpoint {
> > >>>  uint32_t id;
> > >>>  VirtIOIOMMUDomain *domain;
> > >>>  QLIST_ENTRY(VirtIOIOMMUEndpoint) next;
> > >>> +VirtIOIOMMU *viommu;
> > >> This needs special care on post-load. When migrating the EPs, only the id
> > >> is migrated. On post-load you need to set viommu as is done for the
> > >> domain. Migration is allowed with vhost.
> > >
> > > OK, I have not tried vhost/migration. The change below sets viommu when
> > > reconstructing the endpoint.
> >
> >
> > Yes I think this should be OK.
> >
> > In the end I gave the series a try with vhost/vfio. With vhost it works
> > (not with a recent kernel though, but the issue may be kernel-related).
> > With VFIO, however, it does not work for me.
> >
> > The first issue is that your guest can use 4K pages while your host uses
> > 64KB pages. In that case VFIO_DMA_MAP fails with -EINVAL. We must devise
> > a way to pass the host settings to the VIRTIO-IOMMU device.
> >
> > Even with 64KB pages, it did not work for me. Obviously I no longer get the
> > storm of VFIO_DMA_MAP failures, but I still get some, most probably due to
> > wrong notifications somewhere. I will try to investigate on my side.
> >
> > Did you test with VFIO on your side?
>
> I did not try different page sizes; I only tested with a 4K page size.
>
> Yes, it works. I tested with two network devices assigned to the VM, and
> both interfaces work.
>
> First I will try with 64k page size.

A 64K page size does not work for me either.

I think we are not passing the correct page_size_mask here
(config.page_size_mask is set to TARGET_PAGE_MASK (which is
0xf000)).

We need to set this according to the host page size, correct?
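To make the mismatch concrete: in the virtio-iommu config, page_size_mask is a bitmap of supported page sizes, and its lowest set bit is the smallest granule the driver may map. The sketch below assumes a hypothetical helper that derives the mask from the host page size (none of these functions exist in QEMU) and shows why a 4K-granular guest mapping cannot be satisfied on a 64K host:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: advertise every page size >= the host page size.
 * Each set bit is a supported page size (bit 12 = 4 KiB, bit 16 = 64 KiB),
 * so the lowest set bit is the mapping granule. */
static uint64_t page_size_mask_for_host(uint64_t host_page_size)
{
    return ~(host_page_size - 1);
}

/* Smallest supported page size: isolate the lowest set bit. */
static uint64_t granule_of(uint64_t page_size_mask)
{
    return page_size_mask & -page_size_mask;
}

/* A MAP request [iova, iova + size) is only usable by a host backend
 * such as VFIO if both its start and size are multiples of the granule. */
static bool map_fits_granule(uint64_t page_size_mask, uint64_t iova,
                             uint64_t size)
{
    uint64_t granule = granule_of(page_size_mask);

    return (iova % granule) == 0 && (size % granule) == 0;
}
```

If the mask follows the guest page size instead, a 64K host is asked to map 4K-aligned ranges, which would be consistent with the -EINVAL failures described above.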

Thanks
-Bharat

>
> Thanks
> -Bharat
>
> >
> > Thanks
> >
> > Eric
> > >
> > > @@ -984,6 +973,7 @@ static gboolean reconstruct_endpoints(gpointer key, gpointer value,
> > >
> > >  QLIST_FOREACH(iter, &d->endpoint_list, next) {
> > >  iter->domain = d;
> > > +   iter->viommu = s;
> > >  g_tree_insert(s->endpoints, GUINT_TO_POINTER(iter->id), iter);
> > >  }
> > >  return false; /* continue the domain traversal */
> > >
> > >>>  } VirtIOIOMMUEndpoint;
> > >>>
> > >>>  typedef struct VirtIOIOMMUInterval {
> > >>> @@ -155,8 +156,44 @@ static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr iova,
> > >>>  memory_region_notify_iommu(mr, 0, entry);
> > >>>  }
> > >>>
> > >>> +static gboolean virtio_iommu_mapping_unmap(gpointer key, gpointer value,
> > >>> +   gpointer data)
> > >>> +{
> > >>> +VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
> > >>> +IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
> > >>> +
> > >>> +virtio_iommu_notify_unmap(mr, interval->low,
> > >>> +  interval->high - interval->low + 1);
> > >>> +
> > >>> +return false;
> > >>> +}
> > >>> +
> > >>> +static gboolean virtio_iommu_mapping_map(gpointer key, gpointer value,
> > >>> + gpointer data)
> > >>> +{
> > >>> +VirtIOIOMMUMapping *mapping = (VirtIOIOMMUMapping *) value;
> > >>> +VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
> > >>> +IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
> > >>> +
> > >>> +virtio_iommu_notify_map(mr, interval->low, mapping->phys_addr,
> > >>> +interval->high - interval->low + 1);
> > >>> +
> > >>> +return false;
> > >>> +}
> > >>> +
> > >>>  static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
> > >>>  {
> > >>> +VirtioIOMMUNotifierNode *node;
> > >>> +VirtIOIOMMU *s = ep->viommu;
> > >>> +VirtIOIOMMUDomain *domain = ep->domain;
> > >>> +
> > >>> +QLIST_FOREACH(node, &s->notifiers_list, next) {
> > >>> +if (ep->id == node->iommu_dev->devfn) {
> > >>> +g_tree_foreach(domain->mappings, virtio_iommu_mapping_unmap,
> > >>> +   &node->iommu_dev->iommu_mr);
> > >> I

[PATCH v3 1/4] scripts: add a script to generate syscall_nr.h

2020-03-16 Thread Laurent Vivier

This script is needed for targets whose syscall numbers are generated
from asm-generic/unistd.h.

Signed-off-by: Laurent Vivier 
Reviewed-by: Alistair Francis 
Reviewed-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---

Notes:
v3: remove useless upper command
v2: add comments suggested by Taylor

 scripts/gensyscalls.sh | 102 +
 1 file changed, 102 insertions(+)
 create mode 100755 scripts/gensyscalls.sh

diff --git a/scripts/gensyscalls.sh b/scripts/gensyscalls.sh
new file mode 100755
index ..b7b8456f6312
--- /dev/null
+++ b/scripts/gensyscalls.sh
@@ -0,0 +1,102 @@
+#!/bin/sh
+#
+# Update syscall_nr.h files from linux headers asm-generic/unistd.h
+#
+# This code is licensed under the GPL version 2 or later.  See
+# the COPYING file in the top-level directory.
+#
+
+linux="$1"
+output="$2"
+
+TMP=$(mktemp -d)
+
+if [ "$linux" = "" ] ; then
+echo "Needs path to linux source tree" 1>&2
+exit 1
+fi
+
+if [ "$output" = "" ] ; then
+output="$PWD"
+fi
+
+upper()
+{
+echo "$1" | tr "[:lower:]" "[:upper:]" | tr "[:punct:]" "_"
+}
+
+qemu_arch()
+{
+case "$1" in
+arm64)
+echo "aarch64"
+;;
+*)
+echo "$1"
+;;
+esac
+}
+
+read_includes()
+{
+arch=$1
+bits=$2
+
+ cpp -P -nostdinc -fdirectives-only \
+-D_UAPI_ASM_$(upper ${arch})_BITSPERLONG_H \
+-D__BITS_PER_LONG=${bits} \
+-I${linux}/arch/${arch}/include/uapi/ \
+-I${linux}/include/uapi \
+-I${TMP} \
+"${linux}/arch/${arch}/include/uapi/asm/unistd.h"
+}
+
+filter_defines()
+{
+grep -e "#define __NR_" -e "#define __NR3264"
+}
+
+rename_defines()
+{
+sed "s/ __NR_/ TARGET_NR_/g;s/(__NR_/(TARGET_NR_/g"
+}
+
+evaluate_values()
+{
+sed "s/#define TARGET_NR_/QEMU TARGET_NR_/" | \
+cpp -P -nostdinc | \
+sed "s/^QEMU /#define /"
+}
+
+generate_syscall_nr()
+{
+arch=$1
+bits=$2
+file="$3"
+guard="$(upper LINUX_USER_$(qemu_arch $arch)_$(basename "$file"))"
+
+(echo "/*"
+echo " * This file contains the system call numbers."
+echo " * Do not modify."
+echo " * This file is generated by scripts/gensyscalls.sh"
+echo " */"
+echo "#ifndef ${guard}"
+echo "#define ${guard}"
+echo
+read_includes $arch $bits | filter_defines | rename_defines | \
+evaluate_values | sort -n -k 3
+echo
+echo "#endif /* ${guard} */"
+echo) > "$file"
+}
+
+mkdir "$TMP/asm"
+> "$TMP/asm/bitsperlong.h"
+
+generate_syscall_nr arm64 64 "$output/linux-user/aarch64/syscall_nr.h"
+generate_syscall_nr nios2 32 "$output/linux-user/nios2/syscall_nr.h"
+generate_syscall_nr openrisc 32 "$output/linux-user/openrisc/syscall_nr.h"
+
+generate_syscall_nr riscv 32 "$output/linux-user/riscv/syscall32_nr.h"
+generate_syscall_nr riscv 64 "$output/linux-user/riscv/syscall64_nr.h"
+rm -fr "$TMP"
-- 
2.24.1
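As a worked example of the include-guard computation in generate_syscall_nr(): upper() uppercases its argument and turns every punctuation character into '_', and the guard is built from the QEMU architecture name plus the basename of the output file. The C sketch below mirrors that tr pipeline for illustration only; make_guard() is a hypothetical helper, not part of the patch.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Same mapping as qemu_arch() in the script. */
static const char *qemu_arch(const char *arch)
{
    return strcmp(arch, "arm64") == 0 ? "aarch64" : arch;
}

/* Mirrors: upper LINUX_USER_$(qemu_arch $arch)_$(basename "$file"),
 * i.e. tr "[:lower:]" "[:upper:]" | tr "[:punct:]" "_" */
static void make_guard(char *out, size_t out_len,
                       const char *arch, const char *file_basename)
{
    snprintf(out, out_len, "LINUX_USER_%s_%s", qemu_arch(arch), file_basename);
    for (char *p = out; *p; p++) {
        if (ispunct((unsigned char)*p)) {
            *p = '_';
        } else {
            *p = (char)toupper((unsigned char)*p);
        }
    }
}
```

For `generate_syscall_nr arm64 64 "$output/linux-user/aarch64/syscall_nr.h"` this yields LINUX_USER_AARCH64_SYSCALL_NR_H; because basename is applied, the guard does not depend on the output directory.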

[PATCH v3 0/4] linux-user: generate syscall_nr.h from linux unistd.h

2020-03-16 Thread Laurent Vivier

This series adds a script to generate syscall_nr.h for
architectures that use asm-generic/unistd.h rather than syscall.tbl.

The script uses several cpp passes and filters the result with a
grep/sed/tr sequence. The result must be checked before being used,
which is why the script is not run automatically.

I have run the script, checked the output and added the new files for
arm64, nios2 and openrisc.

I don't include the result for riscv, as Alistair is already working on a
series for this architecture, and it needs some changes in syscall.c
because some syscalls are not defined.

We also need to add the _time64 variants of the syscalls added by the
syscall_nr.h update.

Based-on: <20200310103403.3284090-1-laur...@vivier.eu>

v3: remove useless upper command
v2: add comments suggested by Taylor

Laurent Vivier (4):
  scripts: add a script to generate syscall_nr.h
  linux-user, aarch64: sync syscall numbers with kernel v5.5
  linux-user,nios2: sync syscall numbers with kernel v5.5
  linux-user, openrisc: sync syscall numbers with kernel v5.5

 linux-user/aarch64/syscall_nr.h  |  34 +-
 linux-user/nios2/syscall_nr.h| 650 +++
 linux-user/openrisc/syscall_nr.h | 309 +++
 scripts/gensyscalls.sh   | 102 +
 4 files changed, 513 insertions(+), 582 deletions(-)
 create mode 100755 scripts/gensyscalls.sh

-- 
2.24.1

[PATCH v4 0/6] reference implementation of RSS and hash report

2020-03-16 Thread Yuri Benditovich

Support for VIRTIO_NET_F_RSS and VIRTIO_NET_F_HASH_REPORT
features in QEMU, for reference purposes.
Implements Toeplitz hash calculation for incoming
packets according to the configuration provided by the driver.
Uses the calculated hash to choose the receive virtqueue
and/or reports the hash in the virtio header.
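For readers unfamiliar with RSS, here is a minimal sketch of the two steps described above. The first is the Toeplitz hash as described in Microsoft's RSS documentation: a 32-bit window slides over the secret key one bit per input bit, and the window is XORed into the result for every set input bit. The second is an indirection-table lookup. This is a generic illustration, not the code from the series; masking the hash with `table_len - 1` assumes a power-of-two table length, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash over data[0..data_len). The key should provide at least
 * data_len + 4 bytes; missing key bytes are treated as zero here. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    uint32_t result = 0;
    uint32_t window = 0;
    size_t i;

    /* The window starts as the leftmost 32 bits of the key. */
    for (i = 0; i < 4; i++) {
        window = (window << 8) | (i < key_len ? key[i] : 0);
    }

    for (i = 0; i < data_len; i++) {
        /* Key byte whose bits are shifted into the window next. */
        uint8_t next = (i + 4 < key_len) ? key[i + 4] : 0;

        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit)) {
                result ^= window;
            }
            window = (window << 1) | ((next >> bit) & 1u);
        }
    }
    return result;
}

/* Pick the receive queue; table_len is assumed to be a power of two. */
static uint16_t rss_pick_queue(uint32_t hash, const uint16_t *table,
                               uint32_t table_len)
{
    return table[hash & (table_len - 1)];
}
```

Because each set input bit contributes one key window, the hash is linear over XOR: hashing `a ^ b` gives the same result as XORing the hashes of `a` and `b`.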

Changes from v3
Use pointer for indirections table instead of array
Cosmetic changes per v3 review

Changes from v2:
Implemented migration support
Added implementation of hash report
Changed reporting of error during processing of command
(per review of v2)
Cosmetic changes per v2 review

Yuri Benditovich (6):
  virtio-net: introduce RSS and hash report features
  virtio-net: implement RSS configuration command
  virtio-net: implement RX RSS processing
  tap: allow extended virtio header with hash info
  virtio-net: reference implementation of hash report
  virtio-net: add migration support for RSS and hash report

 hw/net/trace-events|   3 +
 hw/net/virtio-net.c| 437 +++--
 include/hw/virtio/virtio-net.h |  16 ++
 net/tap.c  |  11 +-
 4 files changed, 439 insertions(+), 28 deletions(-)

-- 
2.17.1

Re: [PATCH 0/2] Fix Cooperlake CPU model

2020-03-16 Thread Paolo Bonzini

On 16/03/20 11:19, Zhang, Cathy wrote:
> Yes, I see they are already in master, but not in v4.2 yet, so will they
> be in the next release v5.0?

Yes, that's what master will become.

Paolo

[PATCH v4 6/6] virtio-net: add migration support for RSS and hash report

2020-03-16 Thread Yuri Benditovich

Save and restore RSS/hash report configuration.

Signed-off-by: Yuri Benditovich 
---
 hw/net/virtio-net.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index a0614ad4e6..f343762a0f 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2842,6 +2842,13 @@ static int virtio_net_post_load_device(void *opaque, int version_id)
 }
 }
 
+if (n->rss_data.enabled) {
+trace_virtio_net_rss_enable(n->rss_data.hash_types,
+n->rss_data.indirections_len,
+sizeof(n->rss_data.key));
+} else {
+trace_virtio_net_rss_disable();
+}
 return 0;
 }
 
@@ -3019,6 +3026,24 @@ static const VMStateDescription vmstate_virtio_net_has_vnet = {
 },
 };
 
+static const VMStateDescription vmstate_rss = {
+.name  = "vmstate_rss",
+.fields = (VMStateField[]) {
+VMSTATE_BOOL(enabled, VirtioNetRssData),
+VMSTATE_BOOL(redirect, VirtioNetRssData),
+VMSTATE_BOOL(populate_hash, VirtioNetRssData),
+VMSTATE_UINT32(hash_types, VirtioNetRssData),
+VMSTATE_UINT32(indirections_len, VirtioNetRssData),
+VMSTATE_UINT16(default_queue, VirtioNetRssData),
+VMSTATE_UINT8_ARRAY(key, VirtioNetRssData,
+VIRTIO_NET_RSS_MAX_KEY_SIZE),
+VMSTATE_VARRAY_UINT32_ALLOC(indirections_table, VirtioNetRssData,
+indirections_len, 0,
+vmstate_info_uint16, uint16_t),
+VMSTATE_END_OF_LIST()
+},
+};
+
 static const VMStateDescription vmstate_virtio_net_device = {
 .name = "virtio-net-device",
 .version_id = VIRTIO_NET_VM_VERSION,
@@ -3067,6 +3092,7 @@ static const VMStateDescription vmstate_virtio_net_device = {
  vmstate_virtio_net_tx_waiting),
 VMSTATE_UINT64_TEST(curr_guest_offloads, VirtIONet,
 has_ctrl_guest_offloads),
+VMSTATE_STRUCT(rss_data, VirtIONet, 1, vmstate_rss, VirtioNetRssData),
 VMSTATE_END_OF_LIST()
},
 };
-- 
2.17.1

Re: [PATCH v2 2/3] acpi: Add Windows ACPI Emulated Device Table (WAET)

2020-03-16 Thread Igor Mammedov

On Fri, 13 Mar 2020 16:50:08 +0200
Liran Alon  wrote:

> Microsoft introduced this ACPI table to let Windows guests skip
> various workarounds for device errata, as the virtual device emulated
> by the VMM may not have the errata.
> 
> Currently, WAET allows the hypervisor to inform the guest about two
> specific behaviors: one for the RTC and the other for the ACPI PM timer.
> 
> Windows has supported WAET since Windows Vista. This ACPI table is
> also exposed by default by other common hypervisors, including
> VMware, GCP and AWS.
> 
> This patch adds the WAET ACPI table to QEMU.
> 
> We set the "ACPI PM timer good" bit in the "Emulated Device Flags" field
> to indicate that the ACPI PM timer has been enhanced so that it does not
> require multiple reads to obtain a reliable value.
> This improves the performance of Windows guests that use the
> ACPI PM timer by avoiding unnecessary VMExits caused by these multiple
> reads.
> 
> Co-developed-by: Elad Gabay 
> Signed-off-by: Liran Alon 

Reviewed-by: Igor Mammedov 

> ---
>  hw/i386/acpi-build.c | 31 +++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 9c4e46fa7466..1c3a2e8fcb3c 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2512,6 +2512,34 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
>  build_header(linker, table_data, (void *)(table_data->data + dmar_start),
>   "DMAR", table_data->len - dmar_start, 1, NULL, NULL);
>  }
> +
> +/*
> + * Windows ACPI Emulated Devices Table
> + * (Version 1.0 - April 6, 2009)
> + * Spec: http://download.microsoft.com/download/7/E/7/7E7662CF-CBEA-470B-A97E-CE7CE0D98DC2/WAET.docx
> + *
> + * Helpful to speedup Windows guests and ignored by others.
> + */
> +static void
> +build_waet(GArray *table_data, BIOSLinker *linker)
> +{
> +int waet_start = table_data->len;
> +
> +/* WAET header */
> +acpi_data_push(table_data, sizeof(AcpiTableHeader));
> +/*
> + * Set "ACPI PM timer good" flag.
> + *
> + * Tells Windows guests that our ACPI PM timer is reliable in the
> + * sense that guest can read it only once to obtain a reliable value.
> + * Which avoids costly VMExits caused by guest re-reading it unnecessarily.
> + */
> +build_append_int_noprefix(table_data, 1 << 1 /* ACPI PM timer good */, 4);
This should work, but I'd use (1UL << 1) if you need to respin.


> +
> +build_header(linker, table_data, (void *)(table_data->data + waet_start),
> + "WAET", table_data->len - waet_start, 1, NULL, NULL);
> +}
> +
>  /*
>   *   IVRS table as specified in AMD IOMMU Specification v2.62, Section 5.2
>   *   accessible here http://support.amd.com/TechDocs/48882_IOMMU.pdf
> @@ -2859,6 +2887,9 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>machine->nvdimms_state, machine->ram_slots);
>  }
>  
> +acpi_add_table(table_offsets, tables_blob);
> +build_waet(tables_blob, tables->linker);
> +
>  /* Add tables supplied by user (if any) */
>  for (u = acpi_table_first(); u; u = acpi_table_next(u)) {
>  unsigned len = acpi_table_len(u);
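For reference, the resulting table is small. It consists of the standard 36-byte ACPI header followed by the 4-byte Emulated Device Flags field, 40 bytes in total, where bit 0 means "RTC good" and bit 1 means "ACPI PM timer good". The standalone sketch below lays out those bytes and fills in the header checksum so that all bytes sum to zero modulo 256, as ACPI requires for every table. build_waet_bytes() is a hypothetical helper. It does not use QEMU's BIOSLinker or build_header(), and it leaves the OEM and creator fields zeroed.

```c
#include <stdint.h>
#include <string.h>

#define ACPI_HEADER_LEN    36
#define WAET_LEN           (ACPI_HEADER_LEN + 4)
#define WAET_RTC_GOOD      (1u << 0)
#define WAET_PM_TIMER_GOOD (1u << 1)

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Hypothetical helper: lay out a WAET table in buf[WAET_LEN]. */
static void build_waet_bytes(uint8_t *buf, uint32_t flags)
{
    uint8_t sum = 0;

    memset(buf, 0, WAET_LEN);
    memcpy(buf, "WAET", 4);                 /* Signature */
    put_le32(buf + 4, WAET_LEN);            /* Length */
    buf[8] = 1;                             /* Revision */
    put_le32(buf + ACPI_HEADER_LEN, flags); /* Emulated Device Flags */

    /* Checksum at offset 9: all bytes must sum to 0 modulo 256. */
    for (int i = 0; i < WAET_LEN; i++) {
        sum += buf[i];
    }
    buf[9] = (uint8_t)(0 - sum);
}
```

Calling `build_waet_bytes(buf, WAET_PM_TIMER_GOOD)` puts the same `1 << 1` value that the patch appends right after the header.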

[PATCH v2 0/2] thread: add lock guard macros

2020-03-16 Thread Stefan Hajnoczi

Lock guards automatically call qemu_(rec_)mutex_unlock() when returning from a
function or leaving a lexical scope.  This simplifies code and
eliminates leaks (especially in error code paths).

This series adds lock guards for QemuMutex and QemuRecMutex.  It does not
convert the entire tree but includes example conversions.
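The underlying idea can be illustrated with the GCC/Clang cleanup attribute, which runs a function when a variable goes out of scope. The sketch below uses a toy mutex and a hypothetical SCOPED_LOCK() macro rather than QemuMutex or the macros added by this series, so names and details differ. It only shows why an early return can no longer leak a held lock.

```c
#include <stdbool.h>

/* Toy mutex standing in for QemuMutex; it only tracks its state. */
typedef struct {
    bool locked;
} ToyMutex;

static void toy_lock(ToyMutex *m)   { m->locked = true; }
static void toy_unlock(ToyMutex *m) { m->locked = false; }

/* Called automatically when the guard variable leaves scope. */
static void toy_guard_release(ToyMutex **guard)
{
    if (*guard) {
        toy_unlock(*guard);
    }
}

static ToyMutex *toy_guard_acquire(ToyMutex *m)
{
    toy_lock(m);
    return m;
}

/* Hypothetical guard macro: one guard per scope, GCC/Clang only. */
#define SCOPED_LOCK(m) \
    ToyMutex *scoped_guard_ __attribute__((cleanup(toy_guard_release))) = \
        toy_guard_acquire(m)

static ToyMutex counter_lock;
static int counter;

/* Both return paths unlock counter_lock without an explicit call. */
static int bump_if_below(int limit)
{
    SCOPED_LOCK(&counter_lock);

    if (counter >= limit) {
        return -1;
    }
    return ++counter;
}
```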

Stefan Hajnoczi (2):
  lockable: add lock guards
  lockable: add QemuRecMutex support

 include/qemu/lockable.h | 67 +
 plugins/core.c  |  7 ++---
 plugins/loader.c| 16 +-
 util/qemu-timer.c   | 23 +++---
 4 files changed, 89 insertions(+), 24 deletions(-)

-- 
2.24.1

Re: [PATCH v4 2/3] mac_via: fix incorrect creation of mos6522 device in mac_via

2020-03-16 Thread Markus Armbruster

Paolo Bonzini  writes:

> On 15/03/20 15:56, Markus Armbruster wrote:
>>>
>>> The question is why they are not, i.e. where does the above reasoning break.
>> I don't know.  But let's assume, for the sake of argument, that this
>> actually worked.  Asking for help in the monitor then *still* has side
>> effects visible in the time span between .instance_init() and
>> finalization.
>> 
>> Why is that harmless?
>
> I don't really have an answer, but if that is a problem we could change
> "info qtree" to skip non-realized devices.

Can we convince ourselves that "info qtree" is the *only* way to observe
these side effects?

If yes, a hack to ignore unrealized devices "fixes" the problem.

If no, it sweeps it under the rug.

Re: [PATCH 6/8] hw/ide: Do ide_drive_get() within pci_ide_create_devs()

2020-03-16 Thread Markus Armbruster

Paolo Bonzini  writes:

> On 13/03/20 23:16, BALATON Zoltan wrote:
>>>
>>> +    pci_dev = pci_create_simple(pci_bus, -1, "cmd646-ide");
>>> +    pci_ide_create_devs(pci_dev);
>> 
>> Additionally, I think it may also make sense to move the pci_ide_create_devs
>> call into the realize methods of these IDE controllers so boards do not
>> need to do it explicitly. These calls always immediately follow the creation
>> of the device, so they could just be done internally in the IDE device,
>> simplifying it further. I can attempt to prepare additional patches for
>> that, but first I'd like to hear if anyone has anything against it, to
>> avoid doing useless work.
>
> No, it's better to do it separately.  I think that otherwise you could
> add another IDE controller with -device, and both controllers would try
> to add the drives.

Correct.

Creating device frontends for -drive if=ide is the board's job.  Boards
may delegate to suitable helpers.  I'd very much prefer these helpers
not to live with device model code.  Board and device model code should
be cleanly separated to reduce the temptation to muddle their
responsibilities.  It's separation of concerns.

I actually wish we had separate sub-trees for boards and devices instead
of keeping both in hw/.

> Basically, separating the call means that only automatically added
> controllers obey "if=ide".

Re: [PATCH v7 2/5] virtio-iommu: Add iommu notifier for map/unmap

2020-03-16 Thread Bharat Bhushan

Hi Eric,

On Fri, Mar 13, 2020 at 7:55 PM Auger Eric  wrote:
>
> Hi Bharat,
> On 3/13/20 8:48 AM, Bharat Bhushan wrote:
> > This patch extends VIRTIO_IOMMU_T_MAP/UNMAP request to
> > notify registered iommu-notifier. Which will call vfio
> s/iommu-notifier/iommu-notifiers
> > notifier to map/unmap region in iommu.
> can be any notifier (vhost/vfio).
> >
> > Signed-off-by: Bharat Bhushan 
> > Signed-off-by: Eric Auger 
> > ---
> >  hw/virtio/trace-events   |  2 +
> >  hw/virtio/virtio-iommu.c | 66 +++-
> >  include/hw/virtio/virtio-iommu.h |  6 +++
> >  3 files changed, 73 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index e83500bee9..d94a1cd8a3 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -73,3 +73,5 @@ virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
> >  virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
> >  virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
> >  virtio_iommu_report_fault(uint8_t reason, uint32_t flags, uint32_t endpoint, uint64_t addr) "FAULT reason=%d flags=%d endpoint=%d address =0x%"PRIx64
> > +virtio_iommu_notify_map(const char *name, uint64_t iova, uint64_t paddr, uint64_t map_size) "mr=%s iova=0x%"PRIx64" pa=0x%" PRIx64" size=0x%"PRIx64
> > +virtio_iommu_notify_unmap(const char *name, uint64_t iova, uint64_t map_size) "mr=%s iova=0x%"PRIx64" size=0x%"PRIx64
> > diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> > index 4cee8083bc..e51344a53e 100644
> > --- a/hw/virtio/virtio-iommu.c
> > +++ b/hw/virtio/virtio-iommu.c
> > @@ -123,6 +123,38 @@ static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
> >  }
> >  }
> >
> > +static void virtio_iommu_notify_map(IOMMUMemoryRegion *mr, hwaddr iova,
> > +hwaddr paddr, hwaddr size)
> > +{
> > +IOMMUTLBEntry entry;
> > +
> > +entry.target_as = &address_space_memory;
> > +entry.addr_mask = size - 1;
> > +
> > +entry.iova = iova;
> > +trace_virtio_iommu_notify_map(mr->parent_obj.name, iova, paddr, size);
> > +entry.perm = IOMMU_RW;
> > +entry.translated_addr = paddr;
> > +
> > +memory_region_notify_iommu(mr, 0, entry);
> > +}
> > +
> > +static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr iova,
> > +  hwaddr size)
> > +{
> > +IOMMUTLBEntry entry;
> > +
> > +entry.target_as = &address_space_memory;
> > +entry.addr_mask = size - 1;
> > +
> > +entry.iova = iova;
> > +trace_virtio_iommu_notify_unmap(mr->parent_obj.name, iova, size);
> > +entry.perm = IOMMU_NONE;
> > +entry.translated_addr = 0;
> > +
> > +memory_region_notify_iommu(mr, 0, entry);
> > +}
> > +
> >  static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
> >  {
> >  if (!ep->domain) {
> > @@ -307,9 +339,12 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
> >  uint64_t virt_start = le64_to_cpu(req->virt_start);
> >  uint64_t virt_end = le64_to_cpu(req->virt_end);
> >  uint32_t flags = le32_to_cpu(req->flags);
> > +hwaddr size = virt_end - virt_start + 1;
> > +VirtioIOMMUNotifierNode *node;
> >  VirtIOIOMMUDomain *domain;
> >  VirtIOIOMMUInterval *interval;
> >  VirtIOIOMMUMapping *mapping;
> > +VirtIOIOMMUEndpoint *ep;
> >
> >  if (flags & ~VIRTIO_IOMMU_MAP_F_MASK) {
> >  return VIRTIO_IOMMU_S_INVAL;
> > @@ -339,9 +374,37 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
> >
> >  g_tree_insert(domain->mappings, interval, mapping);
> >
> > +/* All devices in an address-space share mapping */
> > +QLIST_FOREACH(node, &s->notifiers_list, next) {
> > +QLIST_FOREACH(ep, &domain->endpoint_list, next) {
> > +if (ep->id == node->iommu_dev->devfn) {
> > +virtio_iommu_notify_map(&node->iommu_dev->iommu_mr,
> > +virt_start, phys_start, size);
> > +}
> > +}
> > +}
> > +
> >  return VIRTIO_IOMMU_S_OK;
> >  }
> >
> > +static void virtio_iommu_remove_mapping(VirtIOIOMMU *s, VirtIOIOMMUDomain *domain,
> > +VirtIOIOMMUInterval *interval)
> > +{
> > +VirtioIOMMUNotifierNode *node;
> > +VirtIOIOMMUEndpoint *ep;
> > +
> > +QLIST_FOREACH(node, &s->notifiers_list, next) {
> > +QLIST_FOREACH(ep, &domain->endpoint_list, next) {
> > +if (ep->id == node->iommu_dev->devfn) {
> > +virtio_iommu_notify_unmap(&node->iommu_dev->iommu_mr,
> > +  interval->low,
> > +  interval->high - interval->low + 1);
> > +}
> > +}
> > +}
> > +g_tree_remove(domain->mappings, (gpointer)(interval));
> > +}
> What about

Re: [PATCH 0/8] Misc hw/ide legacy clean up

2020-03-16 Thread Markus Armbruster

BALATON Zoltan  writes:

> These are some clean ups to remove more legacy init functions and
> lessen dependence on include/hw/ide.h with some simplifications in
> board code. There should be no functional change.

PATCH 1 could quote precedence more clearly in the commit message, but
that's a detail.

I don't like PATCH 4.

PATCH 1-3,5-8:
Reviewed-by: Markus Armbruster

Re: [PATCH v5 08/26] nvme: refactor device realization

2020-03-16 Thread Klaus Birkelund Jensen

On Feb 12 11:27, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:51 +0100, Klaus Jensen wrote:
> > This patch splits up nvme_realize into multiple individual functions,
> > each initializing a different subset of the device.
> > 
> > Signed-off-by: Klaus Jensen 
> > ---
> >  hw/block/nvme.c | 175 +++-
> >  hw/block/nvme.h |  21 ++
> >  2 files changed, 133 insertions(+), 63 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index e1810260d40b..81514eaef63a 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -44,6 +44,7 @@
> >  #include "nvme.h"
> >  
> >  #define NVME_SPEC_VER 0x00010201
> > +#define NVME_MAX_QS PCI_MSIX_FLAGS_QSIZE
> >  
> >  #define NVME_GUEST_ERR(trace, fmt, ...) \
> >  do { \
> > @@ -1325,67 +1326,106 @@ static const MemoryRegionOps nvme_cmb_ops = {
> >  },
> >  };
> >  
> > -static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> > +static int nvme_check_constraints(NvmeCtrl *n, Error **errp)
> >  {
> > -NvmeCtrl *n = NVME(pci_dev);
> > -NvmeIdCtrl *id = &n->id_ctrl;
> > -
> > -int i;
> > -int64_t bs_size;
> > -uint8_t *pci_conf;
> > -
> > -if (!n->params.num_queues) {
> > -error_setg(errp, "num_queues can't be zero");
> > -return;
> > -}
> > +NvmeParams *params = &n->params;
> >  
> >  if (!n->conf.blk) {
> > -error_setg(errp, "drive property not set");
> > -return;
> > +error_setg(errp, "nvme: block backend not configured");
> > +return 1;
> As a matter of taste, negative values indicate an error, and 0 is the success value.
> In the Linux kernel this is even an official rule.
> >  }

Fixed.

> >  
> > -bs_size = blk_getlength(n->conf.blk);
> > -if (bs_size < 0) {
> > -error_setg(errp, "could not get backing file size");
> > -return;
> > +if (!params->serial) {
> > +error_setg(errp, "nvme: serial not configured");
> > +return 1;
> >  }
> >  
> > -if (!n->params.serial) {
> > -error_setg(errp, "serial property not set");
> > -return;
> > +if ((params->num_queues < 1 || params->num_queues > NVME_MAX_QS)) {
> > +error_setg(errp, "nvme: invalid queue configuration");
> Maybe something like "nvme: invalid queue count specified, should be between 1 and ..."?
> > +return 1;
> >  }

Fixed.

> > +
> > +return 0;
> > +}
> > +
> > +static int nvme_init_blk(NvmeCtrl *n, Error **errp)
> > +{
> >  blkconf_blocksizes(&n->conf);
> >  if (!blkconf_apply_backend_options(&n->conf, blk_is_read_only(n->conf.blk),
> > -   false, errp)) {
> > -return;
> > +false, errp)) {
> > +return 1;
> >  }
> >  
> > -pci_conf = pci_dev->config;
> > -pci_conf[PCI_INTERRUPT_PIN] = 1;
> > -pci_config_set_prog_interface(pci_dev->config, 0x2);
> > -pci_config_set_class(pci_dev->config, PCI_CLASS_STORAGE_EXPRESS);
> > -pcie_endpoint_cap_init(pci_dev, 0x80);
> > +return 0;
> > +}
> >  
> > +static void nvme_init_state(NvmeCtrl *n)
> > +{
> >  n->num_namespaces = 1;
> >  n->reg_size = pow2ceil(0x1004 + 2 * (n->params.num_queues + 1) * 4);
> 
> Isn't that wrong?
> The first 4K of MMIO (0x1000) holds the registers, and that is followed by the doorbells,
> and each doorbell takes 8 bytes (assuming the regular doorbell stride).
> So n->params.num_queues + 1 should be the total number of queues, thus the 0x1004
> should be 0x1000 IMHO.
> I might be missing some rounding magic here though.
> 

Yeah, I think you are right. It all becomes slightly more fishy because
the num_queues device parameter is 1-based and accounts for the admin
queue pair.

But in Get/Set Features, the value has to be 0-based and only account
for the I/O queues, so we need to subtract 2 from the value. It's
confusing all around.

Since the admin queue pair isn't really optional, I think it would be
better to introduce a new max_ioqpairs parameter that is 1-based,
counts the number of pairs and, obviously, only accounts for the I/O
queues.

I guess we need to keep the num_queues parameter around for
compatibility.

The doorbells are only 4 bytes btw, but the calculation still looks
wrong. With a max_ioqpairs parameter in place, the reg_size should be

pow2ceil(0x1008 + 2 * (n->params.max_ioqpairs) * 4)

Right? That's 0x1000 for the core registers, 8 bytes for the SQ/CQ
doorbells for the admin queue pair, and then room for the I/O queue
pairs.

I added a patch for this in v6.
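The formula can be sanity-checked against the doorbell layout in the NVMe specification. With CAP.DSTRD = 0, the SQ y tail doorbell sits at 0x1000 + (2y) * 4 and the CQ y head doorbell at 0x1000 + (2y + 1) * 4, so the last doorbell ends at 0x1008 + 8 * max_ioqpairs. The helpers below are illustrative only. max_ioqpairs follows the proposed parameter, and the function names are hypothetical.

```c
#include <stdint.h>

/* Round up to the next power of two, like QEMU's pow2ceil(). */
static uint64_t pow2ceil_u64(uint64_t value)
{
    uint64_t p = 1;

    while (p < value) {
        p <<= 1;
    }
    return p;
}

/* Doorbell offsets for queue y (0 = admin), assuming CAP.DSTRD = 0. */
static uint64_t sq_tail_doorbell(uint32_t y) { return 0x1000 + (2ull * y) * 4; }
static uint64_t cq_head_doorbell(uint32_t y) { return 0x1000 + (2ull * y + 1) * 4; }

/* Register space covering the core registers plus the admin and I/O
 * doorbells; the last doorbell is the CQ head of queue max_ioqpairs. */
static uint64_t nvme_reg_size(uint32_t max_ioqpairs)
{
    /* Equals 0x1008 + 2 * max_ioqpairs * 4. */
    uint64_t end = cq_head_doorbell(max_ioqpairs) + 4;

    return pow2ceil_u64(end);
}
```

For example, with 511 I/O queue pairs the doorbells end exactly at 0x2000, and one more pair pushes the size to 0x4000.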

> > -n->ns_size = bs_size / (uint64_t)n->num_namespaces;
> > -
> >  n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
> >  n->sq = g_new0(NvmeSQueue *, n->params.num_queues);
> >  n->cq = g_new0(NvmeCQueue *, n->params.num_queues);
> > +}
> >  
> > -memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n,
> > -  "nvme", n->reg_size);
> > +static void nvme_init_cmb(NvmeCtrl *n,

Re: [PATCH v5 17/26] nvme: allow multiple aios per command

2020-03-16 Thread Klaus Birkelund Jensen

On Feb 12 13:48, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:51 +0100, Klaus Jensen wrote:
> > This refactors how the device issues asynchronous block backend
> > requests. The NvmeRequest now holds a queue of NvmeAIOs that are
> > associated with the command. This allows multiple aios to be issued for
> > a command. Only when all requests have been completed will the device
> > post a completion queue entry.
> > 
> > Because the device is currently guaranteed to only issue a single aio
> > request per command, the benefit is not immediately obvious. But this
> > functionality is required to support metadata, the dataset management
> > command and other features.
> 
> I don't know which strategy will be chosen for supporting metadata
> (QEMU doesn't have any notion of metadata in the block layer), but for
> dataset management you are right. The Dataset Management command can
> contain a table of ranges to discard (although in practice I have not
> seen any driver put more than one entry there).
> 

The strategy is different depending on how the metadata is transferred
between host and device. For the "separate buffer" case, metadata is
transferred using a separate memory pointer in the nvme command (MPTR).
In this case the metadata is kept separately on a new blockdev attached
to the namespace.

In the other case, metadata is transferred as part of an extended lba
(say 512 + 8 bytes) and kept inline on the main namespace blockdev. This
is challenging for QEMU as it breaks interoperability of the image with
other devices. But that is a discussion for a fresh RFC ;)

Note that the support for multiple AIOs is also used for DULBE support
down the line when I get around to posting those patches. So this is
preparatory for a lot of features that require persistent state across
device power-off.
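The completion rule described in the commit message is to post the CQE only once every AIO attached to the request has finished. That rule boils down to a per-request counter. The minimal sketch below uses hypothetical types, not the NvmeRequest/NvmeAIO structures from the patch, which keep a QTAILQ of AIOs instead. Here the first error seen is kept as the command status.

```c
#include <stdbool.h>

/* Hypothetical, simplified request: counts AIOs still in flight. */
typedef struct {
    int outstanding;
    int status;         /* 0 = success, otherwise the first error seen */
    bool cqe_posted;
} Request;

static void req_post_cqe(Request *req)
{
    req->cqe_posted = true;   /* stands in for enqueueing the CQE */
}

/* Called once per AIO before it is submitted. */
static void req_register_aio(Request *req)
{
    req->outstanding++;
}

/* AIO completion callback: record the first error, and complete the
 * command only when the last outstanding AIO has finished. */
static void req_aio_done(Request *req, int ret)
{
    if (ret < 0 && req->status == 0) {
        req->status = ret;
    }
    if (--req->outstanding == 0) {
        req_post_cqe(req);
    }
}
```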

> 
> > 
> > Signed-off-by: Klaus Jensen 
> > Signed-off-by: Klaus Jensen 
> > ---
> >  hw/block/nvme.c   | 449 +-
> >  hw/block/nvme.h   | 134 +++--
> >  hw/block/trace-events |   8 +
> >  3 files changed, 480 insertions(+), 111 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 334265efb21e..e97da35c4ca1 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -19,7 +19,8 @@
> >   *  -drive file=,if=none,id=
> >   *  -device nvme,drive=,serial=,id=, \
> >   *  cmb_size_mb=, \
> > - *  num_queues=
> > + *  num_queues=, \
> > + *  mdts=
> 
> Could you split the mdts checks into a separate patch? This is not related to the series.

Absolutely. Done.

> 
> >   *
> >   * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
> >   * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
> > @@ -57,6 +58,7 @@
> >  } while (0)
> >  
> >  static void nvme_process_sq(void *opaque);
> > +static void nvme_aio_cb(void *opaque, int ret);
> >  
> >  static inline void *nvme_addr_to_cmb(NvmeCtrl *n, hwaddr addr)
> >  {
> > @@ -341,6 +343,107 @@ static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> >  return status;
> >  }
> >  
> > +static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> > +{
> > +NvmeNamespace *ns = req->ns;
> > +
> > +uint32_t len = req->nlb << nvme_ns_lbads(ns);
> > +uint64_t prp1 = le64_to_cpu(cmd->prp1);
> > +uint64_t prp2 = le64_to_cpu(cmd->prp2);
> > +
> > +return nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, len, req);
> > +}
> 
> Same here, this is another nice refactoring and it should be in a separate patch.

Done.

> 
> > +
> > +static void nvme_aio_destroy(NvmeAIO *aio)
> > +{
> > +g_free(aio);
> > +}
> > +
> > +static inline void nvme_req_register_aio(NvmeRequest *req, NvmeAIO *aio,
> > +NvmeAIOOp opc)
> > +{
> > +aio->opc = opc;
> > +
> > +trace_nvme_dev_req_register_aio(nvme_cid(req), aio, blk_name(aio->blk),
> > +aio->offset, aio->len, nvme_aio_opc_str(aio), req);
> > +
> > +if (req) {
> > +QTAILQ_INSERT_TAIL(&req->aio_tailq, aio, tailq_entry);
> > +}
> > +}
> > +
> > +static void nvme_aio(NvmeAIO *aio)
> Function name not clear to me. Maybe change this to something like 
> nvme_submit_aio.

Fixed.

> > +{
> > +BlockBackend *blk = aio->blk;
> > +BlockAcctCookie *acct = &aio->acct;
> > +BlockAcctStats *stats = blk_get_stats(blk);
> > +
> > +bool is_write, dma;
> > +
> > +switch (aio->opc) {
> > +case NVME_AIO_OPC_NONE:
> > +break;
> > +
> > +case NVME_AIO_OPC_FLUSH:
> > +block_acct_start(stats, acct, 0, BLOCK_ACCT_FLUSH);
> > +aio->aiocb = blk_aio_flush(blk, nvme_aio_cb, aio);
> > +break;
> > +
> > +case NVME_AIO_OPC_WRITE_ZEROES:
> > +block_acct_start(stats, acct, aio->len, BLOCK_ACCT_WRITE);
> > +aio->aiocb = blk_aio_pwrite_zeroes(blk, aio->offset, aio->len,
> > +BDRV_REQ_MAY_UNMAP, nvme_aio_cb, aio);
> > +break;
> > +
> > +case

[PATCH v7 2/4] qcow2: rework the cluster compression routine

2020-03-16 Thread Denis Plotnikov

The patch enables processing the image compression type defined
for the image and chooses an appropriate method for image clusters
(de)compression.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/qcow2-threads.c | 71 ---
 1 file changed, 60 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index a68126f291..7dbaf53489 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -74,7 +74,9 @@ typedef struct Qcow2CompressData {
 } Qcow2CompressData;
 
 /*
- * qcow2_compress()
+ * qcow2_zlib_compress()
+ *
+ * Compress @src_size bytes of data using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -83,8 +85,8 @@ typedef struct Qcow2CompressData {
  *  -ENOMEM destination buffer is not enough to store compressed data
  *  -EIO    on any other error
  */
-static ssize_t qcow2_compress(void *dest, size_t dest_size,
-  const void *src, size_t src_size)
+static ssize_t qcow2_zlib_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
 {
 ssize_t ret;
 z_stream strm;
@@ -119,10 +121,10 @@ static ssize_t qcow2_compress(void *dest, size_t dest_size,
 }
 
 /*
- * qcow2_decompress()
+ * qcow2_zlib_decompress()
  *
  * Decompress some data (not more than @src_size bytes) to produce exactly
- * @dest_size bytes.
+ * @dest_size bytes using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -130,8 +132,8 @@ static ssize_t qcow2_compress(void *dest, size_t dest_size,
  * Returns: 0 on success
  *  -EIO on fail
  */
-static ssize_t qcow2_decompress(void *dest, size_t dest_size,
-const void *src, size_t src_size)
+static ssize_t qcow2_zlib_decompress(void *dest, size_t dest_size,
+ const void *src, size_t src_size)
 {
 int ret;
 z_stream strm;
@@ -191,20 +193,67 @@ qcow2_co_do_compress(BlockDriverState *bs, void *dest, size_t dest_size,
 return arg.ret;
 }
 
+/*
+ * qcow2_co_compress()
+ *
+ * Compress @src_size bytes of data using the compression
+ * method defined by the image compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
   const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_compress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_compress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
+/*
+ * qcow2_co_decompress()
+ *
+ * Decompress some data (not more than @src_size bytes) to produce exactly
+ * @dest_size bytes using the compression method defined by the image
+ * compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: 0 on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
 const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_decompress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_decompress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
 
-- 
2.17.0
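The dispatch this patch adds to qcow2_co_compress() and qcow2_co_decompress() boils down to picking a function pointer from the image's compression type and aborting on anything unknown. A minimal, self-contained sketch of that pattern follows; CompressionType, copy_compress(), pick_compress_fn() and do_compress() are hypothetical names, and the "codec" merely copies bytes so the example runs without zlib.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-ins for the qcow2 compression type and callback. */
typedef enum {
    COMPRESSION_TYPE_ZLIB = 0,
    COMPRESSION_TYPE__MAX,
} CompressionType;

typedef ssize_t (*CompressFunc)(void *dest, size_t dest_size,
                                const void *src, size_t src_size);

/* Placeholder codec that only copies bytes; a real one would call zlib. */
static ssize_t copy_compress(void *dest, size_t dest_size,
                             const void *src, size_t src_size)
{
    if (src_size > dest_size) {
        return -ENOMEM;          /* same convention as qcow2_zlib_compress */
    }
    memcpy(dest, src, src_size);
    return (ssize_t)src_size;
}

/* Map the image's compression type to a codec. */
static CompressFunc pick_compress_fn(CompressionType type)
{
    switch (type) {
    case COMPRESSION_TYPE_ZLIB:
        return copy_compress;
    default:
        /* Assumes the type was validated when the image was opened. */
        abort();
    }
}

static ssize_t do_compress(CompressionType type, void *dest, size_t dest_size,
                           const void *src, size_t src_size)
{
    CompressFunc fn = pick_compress_fn(type);

    assert(fn != NULL);
    return fn(dest, dest_size, src, src_size);
}
```

Adding a new method then means one more enum value and one more case label, while the coroutine plumbing in qcow2_co_do_compress() stays untouched.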

Re: [PATCH v7 3/5] virtio-iommu: Call iommu notifier for attach/detach

2020-03-16 Thread Bharat Bhushan

Hi Eric,

On Fri, Mar 13, 2020 at 8:11 PM Auger Eric  wrote:
>
> Hi Bharat
>
> On 3/13/20 8:48 AM, Bharat Bhushan wrote:
> > iommu-notifier are called when a device is attached
> IOMMU notifiers
> > or detached to as address-space.
> > This is needed for VFIO.
> and vhost for detach
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  hw/virtio/virtio-iommu.c | 47 
> >  1 file changed, 47 insertions(+)
> >
> > diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> > index e51344a53e..2006f72901 100644
> > --- a/hw/virtio/virtio-iommu.c
> > +++ b/hw/virtio/virtio-iommu.c
> > @@ -49,6 +49,7 @@ typedef struct VirtIOIOMMUEndpoint {
> >  uint32_t id;
> >  VirtIOIOMMUDomain *domain;
> >  QLIST_ENTRY(VirtIOIOMMUEndpoint) next;
> > +VirtIOIOMMU *viommu;
> This needs special care on post-load. When migrating the EPs, only the id
> is migrated. On post-load you need to set viommu as it is done for
> domain. Migration is allowed with vhost.

OK, I have not tried vhost/migration. The change below sets viommu when
reconstructing the endpoints.

@@ -984,6 +973,7 @@ static gboolean reconstruct_endpoints(gpointer key, gpointer value,

 QLIST_FOREACH(iter, &d->endpoint_list, next) {
 iter->domain = d;
+   iter->viommu = s;
 g_tree_insert(s->endpoints, GUINT_TO_POINTER(iter->id), iter);
 }
 return false; /* continue the domain traversal */
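Eric's point is that only the endpoint id travels in the migration stream, so every pointer an endpoint holds must be rebuilt on the destination. A minimal sketch of that post-load fixup, using hypothetical, simplified types rather than the QEMU structures (the real code walks a GTree of domains with g_tree_foreach()):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the virtio-iommu state. */
struct Viommu;
struct Domain;

typedef struct Endpoint {
    uint32_t id;                 /* the only field that is migrated */
    struct Domain *domain;       /* rebuilt after load */
    struct Viommu *viommu;       /* rebuilt after load */
    struct Endpoint *next;       /* next endpoint in the same domain */
} Endpoint;

typedef struct Domain {
    uint32_t id;
    Endpoint *endpoint_list;
} Domain;

typedef struct Viommu {
    Domain *domains;
    size_t nb_domains;
} Viommu;

/* Re-establish every back-pointer that the migration stream does not carry. */
static void fixup_endpoints_post_load(Viommu *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->nb_domains; i++) {
        Domain *d = &s->domains[i];

        for (Endpoint *ep = d->endpoint_list; ep; ep = ep->next) {
            ep->domain = d;
            ep->viommu = s;
        }
    }
}
```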

> >  } VirtIOIOMMUEndpoint;
> >
> >  typedef struct VirtIOIOMMUInterval {
> > @@ -155,8 +156,44 @@ static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr iova,
> >  memory_region_notify_iommu(mr, 0, entry);
> >  }
> >
> > +static gboolean virtio_iommu_mapping_unmap(gpointer key, gpointer value,
> > +   gpointer data)
> > +{
> > +VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
> > +IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
> > +
> > +virtio_iommu_notify_unmap(mr, interval->low,
> > +  interval->high - interval->low + 1);
> > +
> > +return false;
> > +}
> > +
> > +static gboolean virtio_iommu_mapping_map(gpointer key, gpointer value,
> > + gpointer data)
> > +{
> > +VirtIOIOMMUMapping *mapping = (VirtIOIOMMUMapping *) value;
> > +VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
> > +IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
> > +
> > +virtio_iommu_notify_map(mr, interval->low, mapping->phys_addr,
> > +interval->high - interval->low + 1);
> > +
> > +return false;
> > +}
> > +
> >  static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
> >  {
> > +VirtioIOMMUNotifierNode *node;
> > +VirtIOIOMMU *s = ep->viommu;
> > +VirtIOIOMMUDomain *domain = ep->domain;
> > +
> > +QLIST_FOREACH(node, &s->notifiers_list, next) {
> > +if (ep->id == node->iommu_dev->devfn) {
> > +g_tree_foreach(domain->mappings, virtio_iommu_mapping_unmap,
> > +   &node->iommu_dev->iommu_mr);
> I understand this should do the job for domain removal

I did not get the comment; are you saying we should do this on domain removal?

> > +}
> > +}
> > +
> >  if (!ep->domain) {
> >  return;
> >  }
> > @@ -178,6 +215,7 @@ static VirtIOIOMMUEndpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
> >  }
> >  ep = g_malloc0(sizeof(*ep));
> >  ep->id = ep_id;
> > +ep->viommu = s;
> >  trace_virtio_iommu_get_endpoint(ep_id);
> >  g_tree_insert(s->endpoints, GUINT_TO_POINTER(ep_id), ep);
> >  return ep;
> > @@ -272,6 +310,7 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
> >  {
> >  uint32_t domain_id = le32_to_cpu(req->domain);
> >  uint32_t ep_id = le32_to_cpu(req->endpoint);
> > +VirtioIOMMUNotifierNode *node;
> >  VirtIOIOMMUDomain *domain;
> >  VirtIOIOMMUEndpoint *ep;
> >
> > @@ -299,6 +338,14 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
> >
> >  ep->domain = domain;
> >
> > +/* Replay existing address space mappings on the associated memory region */
> maybe use the "domain" terminology here.

ok,

Thanks
-Bharat

> > +QLIST_FOREACH(node, &s->notifiers_list, next) {
> > +if (ep_id == node->iommu_dev->devfn) {
> > +g_tree_foreach(domain->mappings, virtio_iommu_mapping_map,
> > +   &node->iommu_dev->iommu_mr);
> > +}
> > +}
> > +
> >  return VIRTIO_IOMMU_S_OK;
> >  }
> >
> >
> Thanks
>
> Eric
>

Re: [PATCH v2 1/8] target/i386: Restrict X86CPUFeatureWord to X86 targets

2020-03-16 Thread Philippe Mathieu-Daudé


On 3/16/20 1:29 AM, Aleksandar Markovic wrote:



On Monday, March 16, 2020, Philippe Mathieu-Daudé wrote:


Move out x86-specific structures from generic machine code.


Philippe,

I am having a hard time understanding what this patch achieves. Is this
pure code moving/reorganization? What is the logical connection between
this patch and the whole series (which is about removing unneeded
building for user mode)? How does this patch affect build time for user
mode?


This code ends up in all linux-user binaries:

$ make clean mipsel-linux-user/all
$ fgrep -r X86CPURegister32
Binary file qapi/qapi-visit-machine.o matches
qapi/qapi-visit-machine.h:void visit_type_X86CPURegister32(Visitor *v, const char *name, X86CPURegister32 *obj, Error **errp);
qapi/qapi-types-machine.c:const QEnumLookup X86CPURegister32_lookup = {
qapi/qapi-types-machine.h:typedef enum X86CPURegister32 {
qapi/qapi-types-machine.h:} X86CPURegister32;
qapi/qapi-types-machine.h:#define X86CPURegister32_str(val) \
qapi/qapi-types-machine.h:qapi_enum_lookup(&X86CPURegister32_lookup, (val))
qapi/qapi-types-machine.h:extern const QEnumLookup X86CPURegister32_lookup;
qapi/qapi-types-machine.h:X86CPURegister32 cpuid_register;
Binary file qapi/qapi-events-machine.o matches
Binary file qapi/qapi-types-machine.o matches
qapi/qapi-doc.texi:@deftp {Enum} X86CPURegister32
qapi/qapi-doc.texi:@item @code{cpuid-register: X86CPURegister32}
qapi/qapi-visit-machine.c:void visit_type_X86CPURegister32(Visitor *v, const char *name, X86CPURegister32 *obj, Error **errp)
qapi/qapi-visit-machine.c:visit_type_enum(v, name, &value, &X86CPURegister32_lookup, errp);
qapi/qapi-visit-machine.c:visit_type_X86CPURegister32(v, "cpuid-register", &obj->cpuid_register, &err);
Binary file hw/core/qdev.o matches
Binary file hw/core/cpu.o matches
Binary file libqemuutil.a matches
Binary file mipsel-linux-user/qemu-mipsel matches
\---/

By restricting this structure to the x86 architecture, we spend less time
compiling unused code for other architectures, and the resulting binaries
are smaller too.




Sincerely,
Aleksandar


Acked-by: Richard Henderson <richard.hender...@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <phi...@redhat.com>
---
  qapi/machine-target.json   | 45 ++
  qapi/machine.json          | 42 ---
  target/i386/cpu.c          |  2 +-
  target/i386/machine-stub.c | 22 +++
  target/i386/Makefile.objs  |  3 ++-
  5 files changed, 70 insertions(+), 44 deletions(-)
  create mode 100644 target/i386/machine-stub.c

diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index f2c82949d8..fb7a4b7850 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -3,6 +3,51 @@
  # This work is licensed under the terms of the GNU GPL, version 2 or later.
  # See the COPYING file in the top-level directory.

+##
+# @X86CPURegister32:
+#
+# A X86 32-bit register
+#
+# Since: 1.5
+##
+{ 'enum': 'X86CPURegister32',
+  'data': [ 'EAX', 'EBX', 'ECX', 'EDX', 'ESP', 'EBP', 'ESI', 'EDI' ],
+  'if': 'defined(TARGET_I386)' }
+
+##
+# @X86CPUFeatureWordInfo:
+#
+# Information about a X86 CPU feature word
+#
+# @cpuid-input-eax: Input EAX value for CPUID instruction for that feature word
+#
+# @cpuid-input-ecx: Input ECX value for CPUID instruction for that
+#                   feature word
+#
+# @cpuid-register: Output register containing the feature bits
+#
+# @features: value of output register, containing the feature bits
+#
+# Since: 1.5
+##
+{ 'struct': 'X86CPUFeatureWordInfo',
+  'data': { 'cpuid-input-eax': 'int',
+            '*cpuid-input-ecx': 'int',
+            'cpuid-register': 'X86CPURegister32',
+            'features': 'int' },
+  'if': 'defined(TARGET_I386)' }
+
+##
+# @DummyForceArrays:
+#
+# Not used by QMP; hack to let us use X86CPUFeatureWordInfoList internally
+#
+# Since: 2.5
+##
+{ 'struct': 'DummyForceArrays',
+  'data': { 'unused': ['X86CPUFeatureWordInfo'] },
+  'if': 'defined(TARGET_I386)' }
+
  ##
  # @CpuModelInfo:
  #
diff --git a/qapi/machine.json b/qapi/machine.json
index 6c11e3cf3a..de05730704 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -505,48 +505,6 @@
     'dst': 'uint16',
     'val': 'uint8' }}

-##
-# @X86CPURegister32:
-#
-# A X86 32-bit register
-#
-# Since: 1.5
-##
-{ 'enum': 'X86CPURegister32',
-  'data': [ 'EAX', 'EBX', 'ECX', 'EDX', 'ESP', 'EBP', 'ESI', 'EDI' ] }
-
-##
-# @X86CPUFeatureWordInfo:
-#
-# Information about a X86 CPU feature word
-#
-# @cpuid-input-eax: Input EAX value for CPUID instruction

Re: [PATCH v7 3/5] virtio-iommu: Call iommu notifier for attach/detach

2020-03-16 Thread Bharat Bhushan

Hi Eric,

On Mon, Mar 16, 2020 at 1:02 PM Auger Eric  wrote:
>
> Hi Bharat,
>
> On 3/16/20 7:41 AM, Bharat Bhushan wrote:
> > Hi Eric,
> >
> > On Fri, Mar 13, 2020 at 8:11 PM Auger Eric  wrote:
> >>
> >> Hi Bharat
> >>
> >> On 3/13/20 8:48 AM, Bharat Bhushan wrote:
> >>> iommu-notifier are called when a device is attached
> >> IOMMU notifiers
> >>> or detached to as address-space.
> >>> This is needed for VFIO.
> >> and vhost for detach
> >>>
> >>> Signed-off-by: Bharat Bhushan 
> >>> ---
> >>>  hw/virtio/virtio-iommu.c | 47 
> >>>  1 file changed, 47 insertions(+)
> >>>
> >>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> >>> index e51344a53e..2006f72901 100644
> >>> --- a/hw/virtio/virtio-iommu.c
> >>> +++ b/hw/virtio/virtio-iommu.c
> >>> @@ -49,6 +49,7 @@ typedef struct VirtIOIOMMUEndpoint {
> >>>  uint32_t id;
> >>>  VirtIOIOMMUDomain *domain;
> >>>  QLIST_ENTRY(VirtIOIOMMUEndpoint) next;
> >>> +VirtIOIOMMU *viommu;
> >> This needs special care on post-load. When migrating the EPs, only the id
> >> is migrated. On post-load you need to set viommu as it is done for
> >> domain. Migration is allowed with vhost.
> >
> > OK, I have not tried vhost/migration. The change below sets viommu when
> > reconstructing the endpoints.
>
>
> Yes I think this should be OK.
>
> In the end I gave the series a try with vhost/VFIO. With vhost it works
> (not with a recent kernel though, but the issue may be related to the kernel).
> With VFIO, however, it does not work for me.
>
> The first issue is that your guest may use 4K pages while your host uses
> 64KB pages. In that case VFIO_DMA_MAP will fail with -EINVAL. We must devise
> a way to pass the host settings to the VIRTIO-IOMMU device.
>
> Even with 64KB pages, it did not work for me. I obviously do not get the
> storm of VFIO_DMA_MAP failures, but I still get some, most probably due to
> some wrong notifications somewhere. I will try to investigate on my side.
>
> Did you test with VFIO on your side?

I did not try different page sizes; I only tested with a 4K page size.

Yes, it works: I tested with two network devices assigned to the VM, and both interfaces work.

I will first try with a 64K page size.

Thanks
-Bharat

>
> Thanks
>
> Eric
> >
> > @@ -984,6 +973,7 @@ static gboolean reconstruct_endpoints(gpointer key, gpointer value,
> >
> >  QLIST_FOREACH(iter, &d->endpoint_list, next) {
> >  iter->domain = d;
> > +   iter->viommu = s;
> >  g_tree_insert(s->endpoints, GUINT_TO_POINTER(iter->id), iter);
> >  }
> >  return false; /* continue the domain traversal */
> >
> >>>  } VirtIOIOMMUEndpoint;
> >>>
> >>>  typedef struct VirtIOIOMMUInterval {
> >>> @@ -155,8 +156,44 @@ static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr iova,
> >>>  memory_region_notify_iommu(mr, 0, entry);
> >>>  }
> >>>
> >>> +static gboolean virtio_iommu_mapping_unmap(gpointer key, gpointer value,
> >>> +   gpointer data)
> >>> +{
> >>> +VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
> >>> +IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
> >>> +
> >>> +virtio_iommu_notify_unmap(mr, interval->low,
> >>> +  interval->high - interval->low + 1);
> >>> +
> >>> +return false;
> >>> +}
> >>> +
> >>> +static gboolean virtio_iommu_mapping_map(gpointer key, gpointer value,
> >>> + gpointer data)
> >>> +{
> >>> +VirtIOIOMMUMapping *mapping = (VirtIOIOMMUMapping *) value;
> >>> +VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
> >>> +IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
> >>> +
> >>> +virtio_iommu_notify_map(mr, interval->low, mapping->phys_addr,
> >>> +interval->high - interval->low + 1);
> >>> +
> >>> +return false;
> >>> +}
> >>> +
> >>>  static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
> >>>  {
> >>> +VirtioIOMMUNotifierNode *node;
> >>> +VirtIOIOMMU *s = ep->viommu;
> >>> +VirtIOIOMMUDomain *domain = ep->domain;
> >>> +
> >>> +QLIST_FOREACH(node, &s->notifiers_list, next) {
> >>> +if (ep->id == node->iommu_dev->devfn) {
> >>> +g_tree_foreach(domain->mappings, virtio_iommu_mapping_unmap,
> >>> +   &node->iommu_dev->iommu_mr);
> >> I understand this should do the job for domain removal
> >
> > I did not get the comment; are you saying we should do this on domain removal?
> see my reply on 2/5
>
> Note the above code should be moved after the check of !ep->domain below

Oh yes, will move it.

Thanks
-Bharat
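The reordering matters because ep->domain is NULL for an endpoint that is not attached, while the notifier loop reads domain->mappings. A simplified sketch of the safe ordering, with hypothetical types and a counting stub standing in for memory_region_notify_iommu(); matching notifiers by devfn is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types; not the QEMU definitions. */
typedef struct Mapping {
    uint64_t low, high;          /* inclusive IOVA interval */
    struct Mapping *next;
} Mapping;

typedef struct Domain {
    Mapping *mappings;
} Domain;

typedef struct Endpoint {
    uint32_t id;
    Domain *domain;              /* NULL while the endpoint is detached */
} Endpoint;

/* Stand-in for the unmap notification: count the bytes it covers. */
static uint64_t unmapped_bytes;

static void notify_unmap(uint64_t iova, uint64_t size)
{
    (void)iova;
    unmapped_bytes += size;
}

static void detach_endpoint(Endpoint *ep)
{
    assert(ep != NULL);
    /* Test first: reading ep->domain->mappings before this check
     * would dereference NULL for a detached endpoint. */
    if (!ep->domain) {
        return;
    }
    for (Mapping *m = ep->domain->mappings; m; m = m->next) {
        notify_unmap(m->low, m->high - m->low + 1);
    }
    ep->domain = NULL;
}
```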

> >
> >>> +}
> >>> +}
> >>> +
> >>>  if (!ep->domain) {
> >>>  return;
> >>>  }
> >>> @@ -178,6 +215,7 @@ static VirtIOIOMMUEndpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
> >>>  }
> >>>  ep = g_malloc0(sizeof(*ep));
> >>>  ep->id = ep_id;
> >>> +ep->viommu = s;
> >>>

Re: [PATCH v5 57/60] target/riscv: vector slide instructions

2020-03-16 Thread LIU Zhiwei





On 2020/3/15 13:16, Richard Henderson wrote:

On 3/12/20 7:58 AM, LIU Zhiwei wrote:

+#define GEN_VEXT_VSLIDEUP_VX(NAME, ETYPE, H, CLEAR_FN)\
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
+CPURISCVState *env, uint32_t desc)\
+{ \
+uint32_t mlen = vext_mlen(desc);  \
+uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
+uint32_t vm = vext_vm(desc);  \
+uint32_t vl = env->vl;\
+uint32_t offset = s1, i;  \
+  \
+if (offset > vl) {\
+offset = vl;  \
+} \

This isn't right.


+for (i = 0; i < vl; i++) {\
+if (((i < offset)) || (!vm && !vext_elem_mask(v0, mlen, i))) {\
+continue; \
+} \
+*((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));  \
+} \
+if (i == 0) { \
+return;   \
+} \

You need to eliminate vl == 0 first, not last.
Then

 for (i = offset; i < vl; i++)

The types of i and vl need to be extended to target_ulong, so that you don't
incorrectly crop the input offset.

It may be worth special-casing vm=1, or hoisting it out of the loop.  The
operation becomes a memcpy (at least for little-endian) at that point.  See
swap_memmove in arm/sve_helper.c.
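The vslideup suggestions above (reject vl == 0 up front, start the loop at offset, keep the offset at full target_ulong width, and turn the unmasked case into a single copy) can be sketched as follows. This is a simplified, hypothetical helper: elements are host-order uint32_t (no H() swizzling), the mask is one bool per element, target_ulong is assumed to be 64 bits, and tail clearing is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t target_ulong;   /* assumption: a 64-bit target */

/* Hypothetical simplified vslideup.vx body; tail clearing is omitted. */
static void vslideup_u32(uint32_t *vd, const bool *v0, target_ulong s1,
                         const uint32_t *vs2, target_ulong vl, bool vm)
{
    target_ulong offset = s1;    /* full width, so large offsets are not cropped */
    target_ulong i;

    assert(vm || v0 != NULL);
    if (vl == 0) {
        return;                  /* checked first, not after the loop */
    }
    if (offset >= vl) {
        return;                  /* no active element has a source */
    }
    if (vm) {
        /* Unmasked: one contiguous copy (vd and vs2 must not overlap). */
        memcpy(vd + offset, vs2, (vl - offset) * sizeof(*vd));
        return;
    }
    for (i = offset; i < vl; i++) {
        if (v0[i]) {
            vd[i] = vs2[i - offset];
        }
    }
}
```

With a 32-bit offset, the third case in the test below (offset 2^33) would wrap to 0 and wrongly copy the whole vector.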



+#define GEN_VEXT_VSLIDEDOWN_VX(NAME, ETYPE, H, CLEAR_FN)  \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
+CPURISCVState *env, uint32_t desc)\
+{ \
+uint32_t mlen = vext_mlen(desc);  \
+uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
+uint32_t vm = vext_vm(desc);  \
+uint32_t vl = env->vl;\
+uint32_t offset = s1, i;  \
+  \
+for (i = 0; i < vl; i++) {\
+if (!vm && !vext_elem_mask(v0, mlen, i)) {\
+continue; \
+} \
+if (i + offset < vlmax) { \
+*((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + offset));  \

Again, eliminate vl == 0 first.  In fact, why don't we make that a global
request for all of the patches for the next revision.  Checking for i == 0 last
is silly, and checks for the zero twice: once in the loop bounds and again at
the end.

It is probably worth changing the loop bounds to

 if (offset >= vlmax) {
max = 0;
 } else {
max = MIN(vl, vlmax - offset);
 }
 for (i = 0; i < max; ++i)



+} else {  \
+*((ETYPE *)vd + H(i)) = 0;\
+}

Which lets these zeros merge into...


+for (; i < vlmax; i++) {  \
+CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));  \
+} \

These zeros.
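The suggested vslidedown loop bounds can be sketched like this, under the same hypothetical simplifications (host-order uint32_t elements, one bool per mask element, vm true meaning unmasked, vl <= vlmax). With vm set, the out-of-range zeros and the tail zeros form one contiguous range; with a mask, inactive elements in [max, vl) must be preserved, so they are handled separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t target_ulong;   /* assumption: a 64-bit target */

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Hypothetical simplified vslidedown.vx body, including tail clearing. */
static void vslidedown_u32(uint32_t *vd, const bool *v0, target_ulong s1,
                           const uint32_t *vs2, target_ulong vl,
                           target_ulong vlmax, bool vm)
{
    target_ulong offset = s1;
    target_ulong max, i;

    assert(vl <= vlmax && (vm || v0 != NULL));
    if (vl == 0) {
        return;
    }
    /* Only elements [0, max) have a source element inside vs2. */
    max = offset >= vlmax ? 0 : MIN(vl, vlmax - offset);
    for (i = 0; i < max; ++i) {
        if (vm || v0[i]) {
            vd[i] = vs2[i + offset];
        }
    }
    if (vm) {
        /* Out-of-range zeros and tail zeros merge into one range. */
        memset(vd + max, 0, (vlmax - max) * sizeof(*vd));
        return;
    }
    /* Masked: active elements in [max, vl) become zero, inactive ones stay. */
    for (; i < vl; ++i) {
        if (v0[i]) {
            vd[i] = 0;
        }
    }
    memset(vd + vl, 0, (vlmax - vl) * sizeof(*vd));
}
```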


+#define GEN_VEXT_VSLIDE1UP_VX(NAME, ETYPE, H, CLEAR_FN)   \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
+CPURISCVState *env, uint32_t desc)\
+{ \
+uint32_t mlen = vext_mlen(desc);  \
+uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
+uint32_t vm = vext_vm(desc);  \
+uint32_t vl = env->vl;\
+uint32_t i;

Re: [PATCH v5 15/26] nvme: bump supported specification to 1.3

2020-03-16 Thread Klaus Birkelund Jensen

On Feb 12 12:35, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:51 +0100, Klaus Jensen wrote:
> > Add new fields to the Identify Controller and Identify Namespace data
> > structures according to NVM Express 1.3d.
> > 
> > NVM Express 1.3d requires the following additional features:
> >   - addition of the Namespace Identification Descriptor List (CNS 03h)
> > for the Identify command
> >   - support for returning Command Sequence Error if a Set Features
> > command is submitted for the Number of Queues feature after any I/O
> > queues have been created.
> >   - The addition of the Log Specific Field (LSP) in the Get Log Page
> > command.
> 
> > 
> > Signed-off-by: Klaus Jensen 
> > ---
> >  hw/block/nvme.c   | 57 ---
> >  hw/block/nvme.h   |  1 +
> >  hw/block/trace-events |  3 ++-
> >  include/block/nvme.h  | 20 ++-
> >  4 files changed, 71 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 900732bb2f38..4acfc85b56a2 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -9,7 +9,7 @@
> >   */
> >  
> >  /**
> > - * Reference Specification: NVM Express 1.2.1
> > + * Reference Specification: NVM Express 1.3d
> >   *
> >   *   https://nvmexpress.org/resources/specifications/
> >   */
> > @@ -43,7 +43,7 @@
> >  #include "trace.h"
> >  #include "nvme.h"
> >  
> > -#define NVME_SPEC_VER 0x00010201
> > +#define NVME_SPEC_VER 0x00010300
> >  #define NVME_MAX_QS PCI_MSIX_FLAGS_QSIZE
> >  #define NVME_TEMPERATURE 0x143
> >  #define NVME_TEMPERATURE_WARNING 0x157
> > @@ -735,6 +735,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >  uint32_t dw12 = le32_to_cpu(cmd->cdw12);
> >  uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> >  uint8_t  lid = dw10 & 0xff;
> > +uint8_t  lsp = (dw10 >> 8) & 0xf;
> >  uint8_t  rae = (dw10 >> 15) & 0x1;
> >  uint32_t numdl, numdu;
> >  uint64_t off, lpol, lpou;
> > @@ -752,7 +753,7 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >  return NVME_INVALID_FIELD | NVME_DNR;
> >  }
> >  
> > -trace_nvme_dev_get_log(nvme_cid(req), lid, rae, len, off);
> > +trace_nvme_dev_get_log(nvme_cid(req), lid, lsp, rae, len, off);
> >  
> >  switch (lid) {
> >  case NVME_LOG_ERROR_INFO:
> > @@ -863,6 +864,8 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeCmd *cmd)
> >  cq = g_malloc0(sizeof(*cq));
> >  nvme_init_cq(cq, n, prp1, cqid, vector, qsize + 1,
> >  NVME_CQ_FLAGS_IEN(qflags));
> Code alignment on that '('
> > +
> > +n->qs_created = true;
> This should also be done in nvme_create_sq.

No, because you can't create an SQ without a matching CQ:

    if (unlikely(!cqid || nvme_check_cqid(n, cqid))) {
        trace_nvme_dev_err_invalid_create_sq_cqid(cqid);
        return NVME_INVALID_CQID | NVME_DNR;
    }

So if there is a matching CQ, qs_created is already true.
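The invariant relied on here is that Create I/O Submission Queue requires an existing I/O CQ, so setting the flag in the CQ path alone covers both. A minimal sketch of how that flag then gates Set Features (Number of Queues), with hypothetical names and a heavily reduced controller; 0x0c is the NVMe generic status code for Command Sequence Error:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SC_SUCCESS        0x00
#define SC_CMD_SEQ_ERROR  0x0c   /* NVMe generic status: Command Sequence Error */
#define MAX_QID           4      /* hypothetical tiny queue id space */

/* Hypothetical, heavily reduced controller state. */
typedef struct Ctrl {
    bool qs_created;             /* set when the first I/O CQ is created */
    bool cq_exists[MAX_QID];     /* index = CQ id; 0 is the admin CQ */
} Ctrl;

static void create_io_cq(Ctrl *n, unsigned cqid)
{
    assert(cqid > 0 && cqid < MAX_QID);
    n->cq_exists[cqid] = true;
    n->qs_created = true;
}

static bool create_io_sq(Ctrl *n, unsigned cqid)
{
    /* An I/O SQ must name an existing I/O CQ... */
    if (cqid == 0 || cqid >= MAX_QID || !n->cq_exists[cqid]) {
        return false;
    }
    /* ...so the CQ path has already set the flag. */
    assert(n->qs_created);
    return true;
}

static uint16_t set_features_num_queues(const Ctrl *n)
{
    return n->qs_created ? SC_CMD_SEQ_ERROR : SC_SUCCESS;
}
```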

> >  return NVME_SUCCESS;
> >  }
> >  
> > @@ -924,6 +927,47 @@ static uint16_t nvme_identify_ns_list(NvmeCtrl *n, NvmeIdentify *c)
> >  return ret;
> >  }
> >  
> > +static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeCmd *c)
> > +{
> > +static const int len = 4096;
> The spec caps the Identify payload size to 4K,
> thus this should go to nvme.h

Done.

> > +
> > +struct ns_descr {
> > +uint8_t nidt;
> > +uint8_t nidl;
> > +uint8_t rsvd2[2];
> > +uint8_t nid[16];
> > +};
> This is also part of the spec, thus should
> move to nvme.h
> 

Done - and cleaned up.

> > +
> > +uint32_t nsid = le32_to_cpu(c->nsid);
> > +uint64_t prp1 = le64_to_cpu(c->prp1);
> > +uint64_t prp2 = le64_to_cpu(c->prp2);
> > +
> > +struct ns_descr *list;
> > +uint16_t ret;
> > +
> > +trace_nvme_dev_identify_ns_descr_list(nsid);
> > +
> > +if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
> > +trace_nvme_dev_err_invalid_ns(nsid, n->num_namespaces);
> > +return NVME_INVALID_NSID | NVME_DNR;
> > +}
> > +
> > +/*
> > + * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data
> > + * structure, a Namespace UUID (nidt = 0x3) must be reported in the
> > + * Namespace Identification Descriptor. Add a very basic Namespace UUID
> > + * here.
> A per-namespace UUID QEMU property would be very nice to have, so that
> the UUID is at least somewhat unique.
> I think the Linux kernel might complain if it detects namespaces with
> duplicate UUIDs.

It will be "unique" per controller (because it's just the namespace id).
The spec also says that it should be fixed for the lifetime of the
namespace, but I'm not sure how to ensure that without keeping that
state on disk somehow. I have a solution for this in a later series, but
for now, I think this is ok.
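Each Namespace Identification Descriptor is a 4-byte header (NIDT, NIDL, two reserved bytes) followed by NIDL bytes of identifier. A minimal sketch of filling the 4 KiB Identify payload with a single UUID descriptor derived from the namespace id, as described above; build_ns_descr_list() and the big-endian placement of the nsid are hypothetical choices, and a value built this way is only unique within one controller:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IDENTIFY_DATA_SIZE 4096   /* Identify payloads are 4 KiB */
#define NIDT_UUID          0x3    /* Namespace Identifier Type: UUID */
#define NIDL_UUID          16     /* a UUID is 16 bytes long */

/* One Namespace Identification Descriptor carrying a UUID. */
typedef struct NsIdDescrUuid {
    uint8_t nidt;
    uint8_t nidl;
    uint8_t rsvd2[2];
    uint8_t nid[NIDL_UUID];
} NsIdDescrUuid;

_Static_assert(sizeof(NsIdDescrUuid) == 20, "4-byte header + 16-byte UUID");

/* Hypothetical: derive a per-controller "UUID" from the namespace id. */
static void build_ns_descr_list(uint8_t *buf, uint32_t nsid)
{
    NsIdDescrUuid d = { .nidt = NIDT_UUID, .nidl = NIDL_UUID };

    /* nsid stored big-endian in the first four bytes, the rest zero. */
    d.nid[0] = (uint8_t)(nsid >> 24);
    d.nid[1] = (uint8_t)(nsid >> 16);
    d.nid[2] = (uint8_t)(nsid >> 8);
    d.nid[3] = (uint8_t)nsid;

    /* Zero the whole payload so everything after the descriptor is 0. */
    memset(buf, 0, IDENTIFY_DATA_SIZE);
    memcpy(buf, &d, sizeof(d));
}
```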

But since we actually support multiple controllers, there certainly is
an issue here. Maybe we can

Re: [PATCH v5 14/26] nvme: make sure ncqr and nsqr is valid

2020-03-16 Thread Klaus Birkelund Jensen

On Feb 12 12:30, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:51 +0100, Klaus Jensen wrote:
> > 0xffff is not an allowed value for NCQR and NSQR in Set Features on
> > Number of Queues.
> > 
> > Signed-off-by: Klaus Jensen 
> > ---
> >  hw/block/nvme.c | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 30c5b3e7a67d..900732bb2f38 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1133,6 +1133,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >  blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
> >  break;
> >  case NVME_NUMBER_OF_QUEUES:
> > +if ((dw11 & 0xffff) == 0xffff || ((dw11 >> 16) & 0xffff) == 0xffff) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> Very minor nitpick: since this spec requirement is not obvious, a 
> quote/reference to the spec
> would be nice to have here. 
> 

Added.
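For reference, the check above decodes dword 11 of Set Features (Number of Queues): NSQR sits in bits 15:0 and NCQR in bits 31:16, both 0's based. A minimal sketch of that decoding with a hypothetical helper name; 0x02 is the NVMe generic status code for Invalid Field in Command:

```c
#include <assert.h>
#include <stdint.h>

#define SC_SUCCESS        0x00
#define SC_INVALID_FIELD  0x02   /* NVMe generic status: Invalid Field in Command */

/* Hypothetical decoder for Set Features / Number of Queues, dword 11. */
static uint16_t check_num_queues(uint32_t dw11, uint32_t *nsq, uint32_t *ncq)
{
    uint16_t nsqr = dw11 & 0xffff;          /* bits 15:0 */
    uint16_t ncqr = (dw11 >> 16) & 0xffff;  /* bits 31:16 */

    /* 0xffff would ask for 65536 queues, which the spec does not allow. */
    if (nsqr == 0xffff || ncqr == 0xffff) {
        return SC_INVALID_FIELD;
    }
    *nsq = (uint32_t)nsqr + 1;   /* convert from 0's based */
    *ncq = (uint32_t)ncqr + 1;
    return SC_SUCCESS;
}
```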

> > +
> >  trace_nvme_dev_setfeat_numq((dw11 & 0xffff) + 1,
> >  ((dw11 >> 16) & 0xffff) + 1, n->params.num_queues - 1,
> >  n->params.num_queues - 1);
> 
> Reviewed-by: Maxim Levitsky 
> 
> Best regards,
>   Maxim Levitsky
>

Re: [PATCH v7 2/5] virtio-iommu: Add iommu notifier for map/unmap

2020-03-16 Thread Auger Eric

Hi Bharat,

On 3/16/20 7:36 AM, Bharat Bhushan wrote:
> Hi Eric,
> 
> On Fri, Mar 13, 2020 at 7:55 PM Auger Eric  wrote:
>>
>> Hi Bharat,
>> On 3/13/20 8:48 AM, Bharat Bhushan wrote:
>>> This patch extends VIRTIO_IOMMU_T_MAP/UNMAP request to
>>> notify registered iommu-notifier. Which will call vfio
>> s/iommu-notifier/iommu-notifiers
>>> notifier to map/unmap region in iommu.
>> can be any notifier (vhost/vfio).
>>>
>>> Signed-off-by: Bharat Bhushan 
>>> Signed-off-by: Eric Auger 
>>> ---
>>>  hw/virtio/trace-events   |  2 +
>>>  hw/virtio/virtio-iommu.c | 66 +++-
>>>  include/hw/virtio/virtio-iommu.h |  6 +++
>>>  3 files changed, 73 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>>> index e83500bee9..d94a1cd8a3 100644
>>> --- a/hw/virtio/trace-events
>>> +++ b/hw/virtio/trace-events
>>> @@ -73,3 +73,5 @@ virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
>>>  virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
>>>  virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
>>>  virtio_iommu_report_fault(uint8_t reason, uint32_t flags, uint32_t endpoint, uint64_t addr) "FAULT reason=%d flags=%d endpoint=%d address =0x%"PRIx64
>>> +virtio_iommu_notify_map(const char *name, uint64_t iova, uint64_t paddr, uint64_t map_size) "mr=%s iova=0x%"PRIx64" pa=0x%" PRIx64" size=0x%"PRIx64
>>> +virtio_iommu_notify_unmap(const char *name, uint64_t iova, uint64_t map_size) "mr=%s iova=0x%"PRIx64" size=0x%"PRIx64
>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>> index 4cee8083bc..e51344a53e 100644
>>> --- a/hw/virtio/virtio-iommu.c
>>> +++ b/hw/virtio/virtio-iommu.c
>>> @@ -123,6 +123,38 @@ static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
>>>  }
>>>  }
>>>
>>> +static void virtio_iommu_notify_map(IOMMUMemoryRegion *mr, hwaddr iova,
>>> +hwaddr paddr, hwaddr size)
>>> +{
>>> +IOMMUTLBEntry entry;
>>> +
>>> +entry.target_as = &address_space_memory;
>>> +entry.addr_mask = size - 1;
>>> +
>>> +entry.iova = iova;
>>> +trace_virtio_iommu_notify_map(mr->parent_obj.name, iova, paddr, size);
>>> +entry.perm = IOMMU_RW;
>>> +entry.translated_addr = paddr;
>>> +
>>> +memory_region_notify_iommu(mr, 0, entry);
>>> +}
>>> +
>>> +static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr iova,
>>> +  hwaddr size)
>>> +{
>>> +IOMMUTLBEntry entry;
>>> +
>>> +entry.target_as = &address_space_memory;
>>> +entry.addr_mask = size - 1;
>>> +
>>> +entry.iova = iova;
>>> +trace_virtio_iommu_notify_unmap(mr->parent_obj.name, iova, size);
>>> +entry.perm = IOMMU_NONE;
>>> +entry.translated_addr = 0;
>>> +
>>> +memory_region_notify_iommu(mr, 0, entry);
>>> +}
>>> +
>>>  static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
>>>  {
>>>  if (!ep->domain) {
>>> @@ -307,9 +339,12 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
>>>  uint64_t virt_start = le64_to_cpu(req->virt_start);
>>>  uint64_t virt_end = le64_to_cpu(req->virt_end);
>>>  uint32_t flags = le32_to_cpu(req->flags);
>>> +hwaddr size = virt_end - virt_start + 1;
>>> +VirtioIOMMUNotifierNode *node;
>>>  VirtIOIOMMUDomain *domain;
>>>  VirtIOIOMMUInterval *interval;
>>>  VirtIOIOMMUMapping *mapping;
>>> +VirtIOIOMMUEndpoint *ep;
>>>
>>>  if (flags & ~VIRTIO_IOMMU_MAP_F_MASK) {
>>>  return VIRTIO_IOMMU_S_INVAL;
>>> @@ -339,9 +374,37 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
>>>
>>>  g_tree_insert(domain->mappings, interval, mapping);
>>>
>>> +/* All devices in an address-space share mapping */
>>> +QLIST_FOREACH(node, &s->notifiers_list, next) {
>>> +QLIST_FOREACH(ep, &domain->endpoint_list, next) {
>>> +if (ep->id == node->iommu_dev->devfn) {
>>> +virtio_iommu_notify_map(&node->iommu_dev->iommu_mr,
>>> +virt_start, phys_start, size);
>>> +}
>>> +}
>>> +}
>>> +
>>>  return VIRTIO_IOMMU_S_OK;
>>>  }
>>>
>>> +static void virtio_iommu_remove_mapping(VirtIOIOMMU *s, VirtIOIOMMUDomain *domain,
>>> +VirtIOIOMMUInterval *interval)
>>> +{
>>> +VirtioIOMMUNotifierNode *node;
>>> +VirtIOIOMMUEndpoint *ep;
>>> +
>>> +QLIST_FOREACH(node, &s->notifiers_list, next) {
>>> +QLIST_FOREACH(ep, &domain->endpoint_list, next) {
>>> +if (ep->id == node->iommu_dev->devfn) {
>>> +virtio_iommu_notify_unmap(&node->iommu_dev->iommu_mr,
>>> +  interval->low,
>>> +  interval->high - interval->low + 1);
>>> +}
>>> +}
>>> +}
>>> +g_tree_remove(domain->mappings,

Re: [PATCH v7 3/5] virtio-iommu: Call iommu notifier for attach/detach

2020-03-16 Thread Auger Eric

Hi Bharat,

On 3/16/20 7:41 AM, Bharat Bhushan wrote:
> Hi Eric,
> 
> On Fri, Mar 13, 2020 at 8:11 PM Auger Eric  wrote:
>>
>> Hi Bharat
>>
>> On 3/13/20 8:48 AM, Bharat Bhushan wrote:
>>> iommu-notifier are called when a device is attached
>> IOMMU notifiers
>>> or detached to as address-space.
>>> This is needed for VFIO.
>> and vhost for detach
>>>
>>> Signed-off-by: Bharat Bhushan 
>>> ---
>>>  hw/virtio/virtio-iommu.c | 47 
>>>  1 file changed, 47 insertions(+)
>>>
>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>> index e51344a53e..2006f72901 100644
>>> --- a/hw/virtio/virtio-iommu.c
>>> +++ b/hw/virtio/virtio-iommu.c
>>> @@ -49,6 +49,7 @@ typedef struct VirtIOIOMMUEndpoint {
>>>  uint32_t id;
>>>  VirtIOIOMMUDomain *domain;
>>>  QLIST_ENTRY(VirtIOIOMMUEndpoint) next;
>>> +VirtIOIOMMU *viommu;
>> This needs special care on post-load. When migrating the EPs, only the id
>> is migrated. On post-load you need to set viommu, as is done for the
>> domain. Migration is allowed with vhost.
> 
> OK, I have not tried vhost/migration. The change below sets viommu when
> reconstructing an endpoint.


Yes I think this should be OK.

In the end I gave the series a try with vhost/VFIO. With vhost it works
(not with a recent kernel though, but that issue may be kernel-related).
With VFIO, however, it does not work for me.

The first issue is that your guest can use 4KB pages while your host uses
64KB pages. In that case VFIO_DMA_MAP fails with -EINVAL. We must devise
a way to pass the host settings to the VIRTIO-IOMMU device.
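To make the page-size mismatch concrete, here is a minimal sketch, assuming the host IOMMU page size is known and that VFIO rejects any mapping whose IOVA or size is not a multiple of it (which is what the -EINVAL above suggests). mapping_fits_host_pages() is a hypothetical helper, not part of the series; the inclusive virt_end follows the `virt_end - virt_start + 1` size computation in virtio_iommu_map().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the series): can the inclusive guest mapping
 * [virt_start, virt_end] be forwarded to the host unchanged? Both its start
 * and its size must be multiples of the host IOMMU page size.
 */
bool mapping_fits_host_pages(uint64_t virt_start, uint64_t virt_end,
                             uint64_t host_page_size)
{
    /* Precondition: the page size is a non-zero power of two. */
    assert(host_page_size && !(host_page_size & (host_page_size - 1)));

    uint64_t size = virt_end - virt_start + 1; /* virt_end is inclusive */
    uint64_t mask = host_page_size - 1;

    return (virt_start & mask) == 0 && (size & mask) == 0;
}
```

With 64KB host pages, a single 4KB guest mapping such as [0x1000, 0x1fff] fails this check, while [0x10000, 0x1ffff] passes.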

Even with 64KB pages it did not work for me. I obviously do not get the
storm of VFIO_DMA_MAP failures, but I still get some, most probably due to
wrong notifications somewhere. I will try to investigate on my side.

Did you test with VFIO on your side?

Thanks

Eric
> 
> @@ -984,6 +973,7 @@ static gboolean reconstruct_endpoints(gpointer key, gpointer value,
> 
>  QLIST_FOREACH(iter, >endpoint_list, next) {
>  iter->domain = d;
> +   iter->viommu = s;
>  g_tree_insert(s->endpoints, GUINT_TO_POINTER(iter->id), iter);
>  }
>  return false; /* continue the domain traversal */
> 
>>>  } VirtIOIOMMUEndpoint;
>>>
>>>  typedef struct VirtIOIOMMUInterval {
>>> @@ -155,8 +156,44 @@ static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr iova,
>>>  memory_region_notify_iommu(mr, 0, entry);
>>>  }
>>>
>>> +static gboolean virtio_iommu_mapping_unmap(gpointer key, gpointer value,
>>> +   gpointer data)
>>> +{
>>> +VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
>>> +IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
>>> +
>>> +virtio_iommu_notify_unmap(mr, interval->low,
>>> +  interval->high - interval->low + 1);
>>> +
>>> +return false;
>>> +}
>>> +
>>> +static gboolean virtio_iommu_mapping_map(gpointer key, gpointer value,
>>> + gpointer data)
>>> +{
>>> +VirtIOIOMMUMapping *mapping = (VirtIOIOMMUMapping *) value;
>>> +VirtIOIOMMUInterval *interval = (VirtIOIOMMUInterval *) key;
>>> +IOMMUMemoryRegion *mr = (IOMMUMemoryRegion *) data;
>>> +
>>> +virtio_iommu_notify_map(mr, interval->low, mapping->phys_addr,
>>> +interval->high - interval->low + 1);
>>> +
>>> +return false;
>>> +}
>>> +
>>>  static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep)
>>>  {
>>> +VirtioIOMMUNotifierNode *node;
>>> +VirtIOIOMMU *s = ep->viommu;
>>> +VirtIOIOMMUDomain *domain = ep->domain;
>>> +
>>> +QLIST_FOREACH(node, >notifiers_list, next) {
>>> +if (ep->id == node->iommu_dev->devfn) {
>>> +g_tree_foreach(domain->mappings, virtio_iommu_mapping_unmap,
>>> +   >iommu_dev->iommu_mr);
>> I understand this should do the job for domain removal
> 
> I did not get the comment; are you saying we should do this on domain removal?
see my reply on 2/5

Note that the above code should be moved after the !ep->domain check below.
> 
>>> +}
>>> +}
>>> +
>>>  if (!ep->domain) {
>>>  return;
>>>  }
>>> @@ -178,6 +215,7 @@ static VirtIOIOMMUEndpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
>>>  }
>>>  ep = g_malloc0(sizeof(*ep));
>>>  ep->id = ep_id;
>>> +ep->viommu = s;
>>>  trace_virtio_iommu_get_endpoint(ep_id);
>>>  g_tree_insert(s->endpoints, GUINT_TO_POINTER(ep_id), ep);
>>>  return ep;
>>> @@ -272,6 +310,7 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>>>  {
>>>  uint32_t domain_id = le32_to_cpu(req->domain);
>>>  uint32_t ep_id = le32_to_cpu(req->endpoint);
>>> +VirtioIOMMUNotifierNode *node;
>>>  VirtIOIOMMUDomain *domain;
>>>  VirtIOIOMMUEndpoint *ep;
>>>
>>> @@ -299,6 +338,14 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>>>
>>>  ep->domain = domain;
>>>
>>> +/*

Re: [PATCH v2 00/12] user-mode: Prune build dependencies (part 1)

2020-03-16 Thread Philippe Mathieu-Daudé


On 3/16/20 1:16 AM, Aleksandar Markovic wrote:



On Monday, March 16, 2020, Philippe Mathieu-Daudé > wrote:


This is the first part of a series reducing user-mode
dependencies. By stripping out unused code, the build
and testing time is reduced (as is space used by objects).

Part 1:
- reduce user-mode object list
- remove some migration code from user-mode
- remove cpu_get_crash_info()


What is the purpose of dividing into parts? What is the content of the other
parts, and when do you plan to submit them? A series is usually a
stand-alone, complete logical unit; why did you decide to submit the
"parts" separately (just curious)?


Big series are hard to digest and scare reviewers. Peter told me twice
that his rule of thumb is to split a series if it gets bigger than 20 patches
(and a patch if it modifies more than 200 lines). He also recently said he
skipped reviewing a ~32-patch series of mine because it was too big. I
don't want other reviewers to do that either, so I try to keep series to
20 patches or fewer.


Each series can be applied independently, except for the last patch of the
3rd part (qapi: Restrict code generated for user-mode), which is the one
that really cuts down user-mode code by avoiding pulling in system-mode
symbols.


First part is generic, second part is QAPI-related, and third part 
concerns hw/core/qdev-properties.c. Each part is covered by different 
maintainers.




Does this series affect executables' size, or cut build times only?


Both. It will save us CI testing time, save time for distributions
packaging linux-user-only builds, and produce smaller binaries.
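The size reduction comes from not compiling system-mode-only code into user-mode binaries at all. A rough sketch of the pattern behind patches such as "Restrict CpuClass::get_crash_info() to system-mode", assuming QEMU's CONFIG_USER_ONLY macro (defined for user-mode builds); the Demo* names are hypothetical and far simpler than the real CPUClass:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for CPUClass. */
typedef struct DemoCPUClass {
    const char *name;
#ifndef CONFIG_USER_ONLY
    /* System-mode only: this member does not exist in user-mode builds. */
    int (*get_crash_info)(void);
#endif
} DemoCPUClass;

#ifndef CONFIG_USER_ONLY
static int demo_get_crash_info(void)
{
    return 42; /* placeholder value */
}
#endif

void demo_cpu_class_init(DemoCPUClass *cc)
{
    assert(cc != NULL);
    cc->name = "demo";
#ifndef CONFIG_USER_ONLY
    cc->get_crash_info = demo_get_crash_info;
#endif
}

/* Returns 1 if the system-mode hook is present in this build, else 0. */
int demo_has_crash_info(const DemoCPUClass *cc)
{
#ifdef CONFIG_USER_ONLY
    (void)cc;
    return 0;
#else
    return cc->get_crash_info != NULL;
#endif
}
```

Built with -DCONFIG_USER_ONLY, neither the member nor demo_get_crash_info() is compiled in, so anything they would reference does not need to be linked either.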




Thanks,
Aleksandar

Since v1:
- Addressed Laurent/Richard review comments
- Removed 'exec: Drop redundant #ifdeffery'
- Removed 'target: Restrict write_elfXX_note() handlers to system-mode'

v1:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg688456.html


Philippe Mathieu-Daudé (12):
   Makefile: Only build virtiofsd if system-mode is enabled
   configure: Avoid building TCG when not needed
   tests/Makefile: Only display TCG-related tests when TCG is available
   tests/Makefile: Restrict some softmmu-only tests
   util/Makefile: Reduce the user-mode object list
   stubs/Makefile: Reduce the user-mode object list
   target/riscv/cpu: Restrict CPU migration to system-mode
   exec: Assert CPU migration is not used on user-only build
   arch_init: Remove unused 'qapi-commands-misc.h' include
   target/i386: Restrict CpuClass::get_crash_info() to system-mode
   target/s390x: Restrict CpuClass::get_crash_info() to system-mode
   hw/core: Restrict CpuClass::get_crash_info() to system-mode

  configure              |  4 +++
  Makefile               |  2 +-
  include/hw/core/cpu.h  |  7 -
  arch_init.c            |  1 -
  exec.c                 |  4 ++-
  hw/core/cpu.c          |  2 ++
  target/i386/cpu.c      |  6 -
  target/riscv/cpu.c     |  6 +++--
  target/s390x/cpu.c     | 12 -
  stubs/Makefile.objs    | 52 +
  tests/Makefile.include | 18 +++--
  util/Makefile.objs     | 59 +++---
  12 files changed, 108 insertions(+), 65 deletions(-)

-- 
2.21.1

Re: [PATCH v5 09/26] nvme: add temperature threshold feature

2020-03-16 Thread Klaus Birkelund Jensen

On Feb 12 11:31, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:51 +0100, Klaus Jensen wrote:
> > It might seem weird to implement this feature for an emulated device,
> > but it is mandatory to support and the feature is useful for testing
> > asynchronous event request support, which will be added in a later
> > patch.
> 
> Absolutely, but as the old saying goes, rules are rules.
> At least, in defense of the spec, making this mandatory
> forced vendors to actually report some statistics about
> the device in a neutral format, as opposed to yet another
> proprietary vendor thing (I am talking about the SMART log page).
> 
> > 
> > Signed-off-by: Klaus Jensen 
> 
> I noticed that you sign off some patches with your @samsung.com email
> and some with @cnexlabs.com.
> Is there a reason for that?

Yeah. Some of this code was written while I was at CNEX Labs. I've since
moved to Samsung. But credit where credit's due.

> 
> 
> > ---
> >  hw/block/nvme.c  | 50 
> >  hw/block/nvme.h  |  2 ++
> >  include/block/nvme.h |  7 ++-
> >  3 files changed, 58 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 81514eaef63a..f72348344832 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -45,6 +45,9 @@
> >  
> >  #define NVME_SPEC_VER 0x00010201
> >  #define NVME_MAX_QS PCI_MSIX_FLAGS_QSIZE
> > +#define NVME_TEMPERATURE 0x143
> > +#define NVME_TEMPERATURE_WARNING 0x157
> > +#define NVME_TEMPERATURE_CRITICAL 0x175
> >  
> >  #define NVME_GUEST_ERR(trace, fmt, ...) \
> >  do { \
> > @@ -798,9 +801,31 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeCmd *cmd)
> >  static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >  {
> >  uint32_t dw10 = le32_to_cpu(cmd->cdw10);
> > +uint32_t dw11 = le32_to_cpu(cmd->cdw11);
> >  uint32_t result;
> >  
> >  switch (dw10) {
> > +case NVME_TEMPERATURE_THRESHOLD:
> > +result = 0;
> > +
> > +/*
> > + * The controller only implements the Composite Temperature sensor, so
> > + * return 0 for all other sensors.
> > + */
> > +if (NVME_TEMP_TMPSEL(dw11)) {
> > +break;
> > +}
> > +
> > +switch (NVME_TEMP_THSEL(dw11)) {
> > +case 0x0:
> > +result = cpu_to_le16(n->features.temp_thresh_hi);
> > +break;
> > +case 0x1:
> > +result = cpu_to_le16(n->features.temp_thresh_low);
> > +break;
> > +}
> > +
> > +break;
> >  case NVME_VOLATILE_WRITE_CACHE:
> >  result = blk_enable_write_cache(n->conf.blk);
> >  trace_nvme_dev_getfeat_vwcache(result ? "enabled" : "disabled");
> > @@ -845,6 +870,23 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >  uint32_t dw11 = le32_to_cpu(cmd->cdw11);
> >  
> >  switch (dw10) {
> > +case NVME_TEMPERATURE_THRESHOLD:
> > +if (NVME_TEMP_TMPSEL(dw11)) {
> > +break;
> > +}
> > +
> > +switch (NVME_TEMP_THSEL(dw11)) {
> > +case 0x0:
> > +n->features.temp_thresh_hi = NVME_TEMP_TMPTH(dw11);
> > +break;
> > +case 0x1:
> > +n->features.temp_thresh_low = NVME_TEMP_TMPTH(dw11);
> > +break;
> > +default:
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +break;
> >  case NVME_VOLATILE_WRITE_CACHE:
> >  blk_set_enable_write_cache(n->conf.blk, dw11 & 1);
> >  break;
> > @@ -1366,6 +1408,9 @@ static void nvme_init_state(NvmeCtrl *n)
> >  n->namespaces = g_new0(NvmeNamespace, n->num_namespaces);
> >  n->sq = g_new0(NvmeSQueue *, n->params.num_queues);
> >  n->cq = g_new0(NvmeCQueue *, n->params.num_queues);
> > +
> > +n->temperature = NVME_TEMPERATURE;
> 
> This appears not to be used in the patch.
> I think you should move that to the next patch that
> adds the get log page support.
> 

Fixed.

> > +n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
> >  }
> >  
> >  static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
> > @@ -1447,6 +1492,11 @@ static void nvme_init_ctrl(NvmeCtrl *n)
> >  id->acl = 3;
> >  id->frmw = 7 << 1;
> >  id->lpa = 1 << 0;
> > +
> > +/* recommended default value (~70 C) */
> > +id->wctemp = cpu_to_le16(NVME_TEMPERATURE_WARNING);
> > +id->cctemp = cpu_to_le16(NVME_TEMPERATURE_CRITICAL);
> > +
> >  id->sqes = (0x6 << 4) | 0x6;
> >  id->cqes = (0x4 << 4) | 0x4;
> >  id->nn = cpu_to_le32(n->num_namespaces);
> > diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> > index a867bdfabafd..1518f32557a3 100644
> > --- a/hw/block/nvme.h
> > +++ b/hw/block/nvme.h
> > @@ -108,6 +108,7 @@ typedef struct NvmeCtrl {
> >  uint64_t        irq_status;
> >  uint64_t        host_timestamp; /* Timestamp sent by the host */
> >
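For reference, NVMe reports temperatures in Kelvin, so NVME_TEMPERATURE (0x143), NVME_TEMPERATURE_WARNING (0x157) and NVME_TEMPERATURE_CRITICAL (0x175) are 323 K, 343 K and 373 K, roughly 50, 70 and 100 degrees Celsius. The sketch below builds and decodes the CDW11 of the Temperature Threshold feature, assuming the layout that the NVME_TEMP_* macros in the patch presumably decode (TMPTH in bits 15:0, TMPSEL in bits 19:16, THSEL in bits 21:20); the DEMO_* and demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CDW11 layout for the Temperature Threshold feature. */
#define DEMO_TEMP_TMPTH(dw11)  ((uint16_t)((dw11) & 0xffff)) /* threshold, Kelvin */
#define DEMO_TEMP_TMPSEL(dw11) (((dw11) >> 16) & 0xf)        /* 0 = Composite */
#define DEMO_TEMP_THSEL(dw11)  (((dw11) >> 20) & 0x3)        /* 0 = over, 1 = under */

/* Build a CDW11 value as a host driver would (hypothetical helper). */
uint32_t demo_temp_cdw11(uint16_t kelvin, unsigned tmpsel, unsigned thsel)
{
    assert(tmpsel <= 0xf && thsel <= 0x3);
    return (uint32_t)kelvin | ((uint32_t)tmpsel << 16) | ((uint32_t)thsel << 20);
}

/* Approximate conversion to whole degrees Celsius (ignores the 0.15 K). */
int demo_kelvin_to_celsius(uint16_t kelvin)
{
    return (int)kelvin - 273;
}
```

A Set Features call with THSEL = 1 would therefore land in the `case 0x1` branch and update temp_thresh_low.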

Re: [PATCH v5 10/26] nvme: add support for the get log page command

2020-03-16 Thread Klaus Birkelund Jensen

On Feb 12 11:35, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:51 +0100, Klaus Jensen wrote:
> > Add support for the Get Log Page command and basic implementations of
> > the mandatory Error Information, SMART / Health Information and Firmware
> > Slot Information log pages.
> > 
> > In violation of the specification, the SMART / Health Information log
> > page does not persist information over the lifetime of the controller
> > because the device has no place to store such persistent state.
> Yeah, not the end of the world.
> > 
> > Note that the LPA field in the Identify Controller data structure
> > intentionally has bit 0 cleared because there is no namespace specific
> > information in the SMART / Health information log page.
> Makes sense.
> > 
> > Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> > Section 5.10 ("Get Log Page command").
> > 
> > Signed-off-by: Klaus Jensen 
> > ---
> >  hw/block/nvme.c   | 122 +-
> >  hw/block/nvme.h   |  10 
> >  hw/block/trace-events |   2 +
> >  include/block/nvme.h  |   2 +-
> >  4 files changed, 134 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index f72348344832..468c36918042 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -569,6 +569,123 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeCmd *cmd)
> >  return NVME_SUCCESS;
> >  }
> >  
> > +static uint16_t nvme_smart_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> > +uint64_t off, NvmeRequest *req)
> > +{
> > +uint64_t prp1 = le64_to_cpu(cmd->prp1);
> > +uint64_t prp2 = le64_to_cpu(cmd->prp2);
> > +uint32_t nsid = le32_to_cpu(cmd->nsid);
> > +
> > +uint32_t trans_len;
> > +time_t current_ms;
> > +uint64_t units_read = 0, units_written = 0, read_commands = 0,
> > +write_commands = 0;
> > +NvmeSmartLog smart;
> > +BlockAcctStats *s;
> > +
> > +if (nsid && nsid != 0x) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +s = blk_get_stats(n->conf.blk);
> > +
> > +units_read = s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
> > +units_written = s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
> > +read_commands = s->nr_ops[BLOCK_ACCT_READ];
> > +write_commands = s->nr_ops[BLOCK_ACCT_WRITE];
> > +
> > +if (off > sizeof(smart)) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +trans_len = MIN(sizeof(smart) - off, buf_len);
> > +
> > +memset(, 0x0, sizeof(smart));
> > +
> > +smart.data_units_read[0] = cpu_to_le64(units_read / 1000);
> > +smart.data_units_written[0] = cpu_to_le64(units_written / 1000);
> > +smart.host_read_commands[0] = cpu_to_le64(read_commands);
> > +smart.host_write_commands[0] = cpu_to_le64(write_commands);
> > +
> > +smart.temperature[0] = n->temperature & 0xff;
> > +smart.temperature[1] = (n->temperature >> 8) & 0xff;
> > +
> > +if ((n->temperature > n->features.temp_thresh_hi) ||
> > +(n->temperature < n->features.temp_thresh_low)) {
> > +smart.critical_warning |= NVME_SMART_TEMPERATURE;
> > +}
> > +
> > +current_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
> > +smart.power_on_hours[0] = cpu_to_le64(
> > +(((current_ms - n->starttime_ms) / 1000) / 60) / 60);
> > +
> > +return nvme_dma_read_prp(n, (uint8_t *)  + off, trans_len, prp1,
> > +prp2);
> > +}
> Looks OK.
> > +
> > +static uint16_t nvme_fw_log_info(NvmeCtrl *n, NvmeCmd *cmd, uint32_t buf_len,
> > +uint64_t off, NvmeRequest *req)
> > +{
> > +uint32_t trans_len;
> > +uint64_t prp1 = le64_to_cpu(cmd->prp1);
> > +uint64_t prp2 = le64_to_cpu(cmd->prp2);
> > +NvmeFwSlotInfoLog fw_log;
> > +
> > +if (off > sizeof(fw_log)) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +memset(_log, 0, sizeof(NvmeFwSlotInfoLog));
> > +
> > +trans_len = MIN(sizeof(fw_log) - off, buf_len);
> > +
> > +return nvme_dma_read_prp(n, (uint8_t *) _log + off, trans_len, prp1,
> > +prp2);
> > +}
> Looks OK
> > +
> > +static uint16_t nvme_get_log(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> > +{
> > +uint32_t dw10 = le32_to_cpu(cmd->cdw10);
> > +uint32_t dw11 = le32_to_cpu(cmd->cdw11);
> > +uint32_t dw12 = le32_to_cpu(cmd->cdw12);
> > +uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> > +uint8_t  lid = dw10 & 0xff;
> > +uint8_t  rae = (dw10 >> 15) & 0x1;
> > +uint32_t numdl, numdu;
> > +uint64_t off, lpol, lpou;
> > +size_t   len;
> > +
> > +numdl = (dw10 >> 16);
> > +numdu = (dw11 & 0x);
> > +lpol = dw12;
> > +lpou = dw13;
> > +
> > +len = (((numdu << 16) | numdl) + 1) << 2;
> > +off = (lpou << 32ULL) | lpol;
> > +
> > +if (off & 0x3) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> 
> Good. 
> Note that there are plenty of other places in the driver

Re: [PATCH v5 12/26] nvme: add missing mandatory features

2020-03-16 Thread Klaus Birkelund Jensen

On Feb 12 12:27, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:51 +0100, Klaus Jensen wrote:
> > Add support for returning a reasonable response to Get/Set Features of
> > mandatory features.
> > 
> > Signed-off-by: Klaus Jensen 
> > ---
> >  hw/block/nvme.c   | 57 ---
> >  hw/block/trace-events |  2 ++
> >  include/block/nvme.h  |  3 ++-
> >  3 files changed, 58 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index a186d95df020..3267ee2de47a 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1008,7 +1008,15 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >  uint32_t dw11 = le32_to_cpu(cmd->cdw11);
> >  uint32_t result;
> >  
> > +trace_nvme_dev_getfeat(nvme_cid(req), dw10);
> > +
> >  switch (dw10) {
> > +case NVME_ARBITRATION:
> > +result = cpu_to_le32(n->features.arbitration);
> > +break;
> > +case NVME_POWER_MANAGEMENT:
> > +result = cpu_to_le32(n->features.power_mgmt);
> > +break;
> >  case NVME_TEMPERATURE_THRESHOLD:
> >  result = 0;
> >  
> > @@ -1029,6 +1037,9 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >  break;
> >  }
> >  
> > +break;
> > +case NVME_ERROR_RECOVERY:
> > +result = cpu_to_le32(n->features.err_rec);
> >  break;
> >  case NVME_VOLATILE_WRITE_CACHE:
> >  result = blk_enable_write_cache(n->conf.blk);
> 
> This is existing code, but I'd still like to point out that the endianness
> conversion is missing.

Fixed.

> Also, we need to think about whether we should flush when the write cache
> is disabled.
> I don't know that area well enough yet.
> 

Looking at the block layer code, it just sets a flag when disabling, but
subsequent requests will have BDRV_REQ_FUA set. So, to make sure that
data already sitting in the cache gets written out, let's do a flush.

> > @@ -1041,6 +1052,19 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >  break;
> >  case NVME_TIMESTAMP:
> >  return nvme_get_feature_timestamp(n, cmd);
> > +case NVME_INTERRUPT_COALESCING:
> > +result = cpu_to_le32(n->features.int_coalescing);
> > +break;
> > +case NVME_INTERRUPT_VECTOR_CONF:
> > +if ((dw11 & 0x) > n->params.num_queues) {
> Looks like it should be >=, since the interrupt vector is a zero-based index while num_queues is a count.

Fixed in other patch.

> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +result = cpu_to_le32(n->features.int_vector_config[dw11 & 0x]);
> > +break;
> > +case NVME_WRITE_ATOMICITY:
> > +result = cpu_to_le32(n->features.write_atomicity);
> > +break;
> >  case NVME_ASYNCHRONOUS_EVENT_CONF:
> >  result = cpu_to_le32(n->features.async_config);
> >  break;
> > @@ -1076,6 +1100,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >  uint32_t dw10 = le32_to_cpu(cmd->cdw10);
> >  uint32_t dw11 = le32_to_cpu(cmd->cdw11);
> >  
> > +trace_nvme_dev_setfeat(nvme_cid(req), dw10, dw11);
> > +
> >  switch (dw10) {
> >  case NVME_TEMPERATURE_THRESHOLD:
> >  if (NVME_TEMP_TMPSEL(dw11)) {
> > @@ -1116,6 +1142,13 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> >  case NVME_ASYNCHRONOUS_EVENT_CONF:
> >  n->features.async_config = dw11;
> >  break;
> > +case NVME_ARBITRATION:
> > +case NVME_POWER_MANAGEMENT:
> > +case NVME_ERROR_RECOVERY:
> > +case NVME_INTERRUPT_COALESCING:
> > +case NVME_INTERRUPT_VECTOR_CONF:
> > +case NVME_WRITE_ATOMICITY:
> > +return NVME_FEAT_NOT_CHANGABLE | NVME_DNR;
> >  default:
> >  trace_nvme_dev_err_invalid_setfeat(dw10);
> >  return NVME_INVALID_FIELD | NVME_DNR;
> > @@ -1689,6 +1722,21 @@ static void nvme_init_state(NvmeCtrl *n)
> >  n->temperature = NVME_TEMPERATURE;
> >  n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
> >  n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
> > +
> > +/*
> > + * There is no limit on the number of commands that the controller may
> > + * launch at one time from a particular Submission Queue.
> > + */
> > +n->features.arbitration = 0x7;
> A #define in nvme.h stating that 0x7 means no burst limit would be nice.
> 

Done.

> > +
> > +n->features.int_vector_config = g_malloc0_n(n->params.num_queues,
> > +sizeof(*n->features.int_vector_config));
> > +
> > +/* disable coalescing (not supported) */
> > +for (int i = 0; i < n->params.num_queues; i++) {
> > +n->features.int_vector_config[i] = i | (1 << 16);
> Same here

Done.

> > +}
> > +
> >  n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
> >  }
> >  
> > @@ -1782,15 +1830,17 @@ static void
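The values the review asks to name are easy to check in isolation: an Arbitration Burst (bits 2:0) of 111b means no limit, and Interrupt Vector Configuration carries the vector in bits 15:0 with Coalescing Disable in bit 16, which is what `i | (1 << 16)` sets. A sketch with hypothetical DEMO_* and demo_* names (not necessarily what the next revision uses), including the zero-based bound discussed above, written as a validity check:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the values the review asked to #define. */
#define DEMO_ARB_AB_NOLIMIT           0x7u       /* Arbitration Burst: no limit */
#define DEMO_INTVC_COALESCING_DISABLE (1u << 16) /* CD bit */

/* Interrupt Vector Configuration value for vector iv with coalescing off. */
uint32_t demo_int_vector_config(uint16_t iv)
{
    return (uint32_t)iv | DEMO_INTVC_COALESCING_DISABLE;
}

/* Vectors index an array of num_queues entries, so valid ones are 0..num_queues-1. */
int demo_iv_valid(uint32_t dw11, uint32_t num_queues)
{
    assert(num_queues > 0);
    return (dw11 & 0xffff) < num_queues;
}
```

Rejecting when the vector is `>= num_queues` is the same condition as `!demo_iv_valid()`.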

[PATCH v7 4/4] iotests: 287: add qcow2 compression type test

2020-03-16 Thread Denis Plotnikov

The test checks that the qcow2 requirements for the compression type
feature are fulfilled and that the zstd compression type works.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/287 | 128 +
 tests/qemu-iotests/287.out |  43 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 172 insertions(+)
 create mode 100755 tests/qemu-iotests/287
 create mode 100644 tests/qemu-iotests/287.out

diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
new file mode 100755
index 00..49d15b3d43
--- /dev/null
+++ b/tests/qemu-iotests/287
@@ -0,0 +1,128 @@
+#!/usr/bin/env bash
+#
+# Test case for an image using zstd compression
+#
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=dplotni...@virtuozzo.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# standard environment
+. ./common.rc
+. ./common.filter
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+# for all the cases
+CLUSTER_SIZE=65536
+
+# Check if we can run this test.
+
+IMGOPTS='compression_type=zstd' _make_test_img 64M | grep "Invalid parameter 'zstd'" 2>&1 1>/dev/null
+
+ZSTD_SUPPORTED=$?
+
+if (($ZSTD_SUPPORTED==0)); then
+_notrun "ZSTD is disabled"
+fi
+
+# Test: when compression is zlib the incompatible bit is unset
+echo
+echo "=== Testing compression type incompatible bit setting for zlib ==="
+echo
+
+IMGOPTS='compression_type=zlib' _make_test_img 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+# Test: when compression differs from zlib the incompatible bit is set
+echo
+echo "=== Testing compression type incompatible bit setting for zstd ==="
+echo
+
+IMGOPTS='compression_type=zstd' _make_test_img 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+# Test: an image can't be opened if compression type is zlib and
+#   incompatible feature compression type is set
+echo
+echo "=== Testing zlib with incompatible bit set  ==="
+echo
+
+IMGOPTS='compression_type=zlib' _make_test_img 64M
+$PYTHON qcow2.py "$TEST_IMG" set-feature-bit incompatible 3
+# to make sure the bit was actually set
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+$QEMU_IMG info "$TEST_IMG" >/dev/null 2>&1
+if (($?==0)); then
+echo "Error: The image opened successfully. The image must not be opened"
+fi
+
+# Test: an image can't be opened if compression type is NOT zlib and
+#   incompatible feature compression type is UNSET
+echo
+echo "=== Testing zstd with incompatible bit unset  ==="
+echo
+
+IMGOPTS='compression_type=zstd' _make_test_img 64M
+$PYTHON qcow2.py "$TEST_IMG" set-header incompatible_features 0
+# to make sure the bit was actually unset
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+$QEMU_IMG info "$TEST_IMG" >/dev/null 2>&1
+if (($?==0)); then
+echo "Error: The image opened successfully. The image must not be opened"
+fi
+# Test: check compression type values
+echo
+echo "=== Testing compression type values  ==="
+echo
+# zlib=0
+IMGOPTS='compression_type=zlib' _make_test_img 64M
+od -j104 -N1 -An -vtu1 "$TEST_IMG"
+
+# zstd=1
+IMGOPTS='compression_type=zstd' _make_test_img 64M
+od -j104 -N1 -An -vtu1 "$TEST_IMG"
+
+# Test: using zstd compression, write to and read from an image
+echo
+echo "=== Testing reading and writing with zstd ==="
+echo
+
+IMGOPTS='compression_type=zstd' _make_test_img 64M
+$QEMU_IO -c "write -c -P 0xAC 65536 64k " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAC 65536 65536 " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -v 131070 8 " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -v 65534 8" "$TEST_IMG" | _filter_qemu_io
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/287.out b/tests/qemu-iotests/287.out
new file mode 100644
index 00..8e51c3078d
--- /dev/null
+++ b/tests/qemu-iotests/287.out
@@ -0,0 +1,43 @@
+QA output created by 287
+
+=== Testing compression type incompatible bit setting for zlib ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
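The consistency rule that test 287 exercises can also be expressed directly on the header bytes. A minimal sketch, assuming the qcow2 v3 layout: big-endian incompatible_features at byte 72, big-endian header_length at byte 100, the one-byte compression_type at byte 104 (matching the `od -j104` calls above), and the compression-type incompatible bit being bit 3 (as set by `set-feature-bit incompatible 3`). The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QCOW2_INCOMPAT_OFFSET    72  /* incompatible_features, BE u64 */
#define DEMO_QCOW2_HDR_LEN_OFFSET     100 /* header_length, BE u32 */
#define DEMO_QCOW2_COMPR_TYPE_OFFSET  104 /* compression_type, one byte */
#define DEMO_QCOW2_INCOMPAT_COMPR_BIT (1ull << 3)

static uint64_t demo_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

static uint32_t demo_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* True if the compression type and its incompatible bit agree. */
bool demo_qcow2_compression_consistent(const uint8_t *hdr, size_t hdr_len)
{
    assert(hdr_len >= DEMO_QCOW2_COMPR_TYPE_OFFSET);
    uint64_t incompat = demo_be64(hdr + DEMO_QCOW2_INCOMPAT_OFFSET);
    uint32_t header_length = demo_be32(hdr + DEMO_QCOW2_HDR_LEN_OFFSET);

    /* An absent compression_type field means zlib (0). */
    uint8_t type = 0;
    if (header_length > DEMO_QCOW2_COMPR_TYPE_OFFSET &&
        hdr_len > DEMO_QCOW2_COMPR_TYPE_OFFSET) {
        type = hdr[DEMO_QCOW2_COMPR_TYPE_OFFSET];
    }

    bool bit_set = (incompat & DEMO_QCOW2_INCOMPAT_COMPR_BIT) != 0;
    /* Bit set requires a non-zlib type; bit clear requires zlib. */
    return bit_set ? type != 0 : type == 0;
}
```

The two inconsistent cases correspond to the "zlib with incompatible bit set" and "zstd with incompatible bit unset" tests, which expect `qemu-img info` to fail.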

[PATCH 1/2] block: bdrv_set_backing_bs: fix use-after-free

2020-03-16 Thread Vladimir Sementsov-Ogievskiy

A use-after-free is possible: bdrv_unref_child() leaves
bs->backing freed but not NULL. bdrv_attach_child() may start a nested
polling loop due to drain, and then the freed pointer may be accessed.

I've triggered the following crash in iotest 30 with modified code. It
does not reproduce on master, but it still seems possible:

#0  __strcmp_avx2 () at /lib64/libc.so.6
#1  bdrv_backing_overridden (bs=0x55c9d3cc2060) at block.c:6350
#2  bdrv_refresh_filename (bs=0x55c9d3cc2060) at block.c:6404
#3  bdrv_backing_attach (c=0x55c9d48e5520) at block.c:1063
#4  bdrv_replace_child_noperm
(child=child@entry=0x55c9d48e5520,
new_bs=new_bs@entry=0x55c9d3cc2060) at block.c:2290
#5  bdrv_replace_child
(child=child@entry=0x55c9d48e5520,
new_bs=new_bs@entry=0x55c9d3cc2060) at block.c:2320
#6  bdrv_root_attach_child
(child_bs=child_bs@entry=0x55c9d3cc2060,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 ,
ctx=, perm=, shared_perm=21,
opaque=0x55c9d3c5a3d0, errp=0x7ffd117108e0) at block.c:2424
#7  bdrv_attach_child
(parent_bs=parent_bs@entry=0x55c9d3c5a3d0,
child_bs=child_bs@entry=0x55c9d3cc2060,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 ,
errp=errp@entry=0x7ffd117108e0) at block.c:5876
#8  in bdrv_set_backing_hd
(bs=bs@entry=0x55c9d3c5a3d0,
backing_hd=backing_hd@entry=0x55c9d3cc2060,
errp=errp@entry=0x7ffd117108e0)
at block.c:2576
#9  stream_prepare (job=0x55c9d49d84a0) at block/stream.c:150
#10 job_prepare (job=0x55c9d49d84a0) at job.c:761
#11 job_txn_apply (txn=, fn=) at
job.c:145
#12 job_do_finalize (job=0x55c9d49d84a0) at job.c:778
#13 job_completed_txn_success (job=0x55c9d49d84a0) at job.c:832
#14 job_completed (job=0x55c9d49d84a0) at job.c:845
#15 job_completed (job=0x55c9d49d84a0) at job.c:836
#16 job_exit (opaque=0x55c9d49d84a0) at job.c:864
#17 aio_bh_call (bh=0x55c9d471a160) at util/async.c:117
#18 aio_bh_poll (ctx=ctx@entry=0x55c9d3c46720) at util/async.c:117
#19 aio_poll (ctx=ctx@entry=0x55c9d3c46720,
blocking=blocking@entry=true)
at util/aio-posix.c:728
#20 bdrv_parent_drained_begin_single (poll=true, c=0x55c9d3d558f0)
at block/io.c:121
#21 bdrv_parent_drained_begin_single (c=c@entry=0x55c9d3d558f0,
poll=poll@entry=true)
at block/io.c:114
#22 bdrv_replace_child_noperm
(child=child@entry=0x55c9d3d558f0,
new_bs=new_bs@entry=0x55c9d3d27300) at block.c:2258
#23 bdrv_replace_child
(child=child@entry=0x55c9d3d558f0,
new_bs=new_bs@entry=0x55c9d3d27300) at block.c:2320
#24 bdrv_root_attach_child
(child_bs=child_bs@entry=0x55c9d3d27300,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 ,
ctx=, perm=, shared_perm=21,
opaque=0x55c9d3cc2060, errp=0x7ffd11710c60) at block.c:2424
#25 bdrv_attach_child
(parent_bs=parent_bs@entry=0x55c9d3cc2060,
child_bs=child_bs@entry=0x55c9d3d27300,
child_name=child_name@entry=0x55c9d241d478 "backing",
child_role=child_role@entry=0x55c9d26ecee0 ,
errp=errp@entry=0x7ffd11710c60) at block.c:5876
#26 bdrv_set_backing_hd
(bs=bs@entry=0x55c9d3cc2060,
backing_hd=backing_hd@entry=0x55c9d3d27300,
errp=errp@entry=0x7ffd11710c60)
at block.c:2576
#27 stream_prepare (job=0x55c9d495ead0) at block/stream.c:150
...

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 957630b1c5..a862ce4df9 100644
--- a/block.c
+++ b/block.c
@@ -2735,10 +2735,10 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
 
 if (bs->backing) {
 bdrv_unref_child(bs, bs->backing);
+bs->backing = NULL;
 }
 
 if (!backing_hd) {
-bs->backing = NULL;
 goto out;
 }
 
-- 
2.21.0
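The fix boils down to never leaving a dangling pointer in place across code that may re-enter. A minimal model with hypothetical Demo* types (not QEMU's BlockDriverState or BdrvChild), where demo_nested_poll() stands in for the drain-induced nested aio_poll() seen in the backtrace:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoChild { int refcnt; } DemoChild;
typedef struct DemoBDS {
    DemoChild *backing;
    int observed_backing; /* what the re-entrant code saw */
} DemoBDS;

/* Drop a reference; free the child when it reaches zero. */
static void demo_unref_child(DemoChild *c)
{
    if (--c->refcnt == 0) {
        free(c);
    }
}

/* Stand-in for a nested event loop that runs other code touching bs. */
static void demo_nested_poll(DemoBDS *bs)
{
    /* Safe only because bs->backing is either NULL or still valid here. */
    bs->observed_backing = bs->backing != NULL;
}

void demo_set_backing(DemoBDS *bs, DemoChild *new_backing)
{
    if (bs->backing) {
        demo_unref_child(bs->backing);
        bs->backing = NULL; /* the fix: clear it before anything can re-enter */
    }
    if (!new_backing) {
        return;
    }
    demo_nested_poll(bs); /* models the drain inside bdrv_attach_child() */
    new_backing->refcnt++;
    bs->backing = new_backing;
}
```

Without the `bs->backing = NULL` line, demo_nested_poll() would read a pointer to freed memory whenever the old backing child was replaced.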

Re: [PATCH v5 16/26] nvme: refactor prp mapping

2020-03-16 Thread Klaus Birkelund Jensen

On Feb 12 13:44, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:51 +0100, Klaus Jensen wrote:
> > Refactor nvme_map_prp and allow PRPs to be located in the CMB. The logic
> > ensures that if some of the PRP is in the CMB, all of it must be located
> > there, as per the specification.
> 
> To be honest, this looks like a bugfix rather than a refactoring
> (the old code simply assumed that if the first PRP entry is in the CMB, the rest are too).

I split it up into a separate bugfix patch.

> > 
> > Also combine nvme_dma_{read,write}_prp into a single nvme_dma_prp that
> > takes an additional DMADirection parameter.
> 
> To be honest, 'nvme_dma_prp' was not a clear function name to me at first glance.
> Could you rename this to nvme_dma_prp_rw or so? (Although even that does not
> clearly convey that it reads/writes the data to/from the guest memory areas
> defined by the PRP list.)
> Also, could you split this change into a new patch?
> 

Splitting into new patch.

> > 
> > Signed-off-by: Klaus Jensen 
> > Signed-off-by: Klaus Jensen 
> Now you even use both of your addresses :-)
> 
> > ---
> >  hw/block/nvme.c   | 245 +++---
> >  hw/block/nvme.h   |   2 +-
> >  hw/block/trace-events |   1 +
> >  include/block/nvme.h  |   1 +
> >  4 files changed, 160 insertions(+), 89 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 4acfc85b56a2..334265efb21e 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -58,6 +58,11 @@
> >  
> >  static void nvme_process_sq(void *opaque);
> >  
> > +static inline void *nvme_addr_to_cmb(NvmeCtrl *n, hwaddr addr)
> > +{
> > +return >cmbuf[addr - n->ctrl_mem.addr];
> > +}
> 
> For my taste, I would put this together with the patch that
> added nvme_addr_is_cmb. I know that some people are against
> this, citing the rule that code you add should be used
> in the same patch. Your call.
> 
> Regardless of this, I also prefer to put refactoring patches first in the series.
> 
> > +
> >  static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
> >  {
> >  hwaddr low = n->ctrl_mem.addr;
> > @@ -152,138 +157,187 @@ static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
> >  }
> >  }
> >  
> > -static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
> > - uint64_t prp2, uint32_t len, NvmeCtrl *n)
> > +static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
> > +uint64_t prp1, uint64_t prp2, uint32_t len, NvmeRequest *req)
> 
> Split line alignment (it was correct before).
> Also, while refactoring, it would be great to add some documentation
> to this and a few more functions, since it's not immediately clear what
> they do.
> 
> 
> >  {
> >  hwaddr trans_len = n->page_size - (prp1 % n->page_size);
> >  trans_len = MIN(len, trans_len);
> >  int num_prps = (len >> n->page_bits) + 1;
> > +uint16_t status = NVME_SUCCESS;
> > +bool is_cmb = false;
> > +bool prp_list_in_cmb = false;
> > +
> > +trace_nvme_dev_map_prp(nvme_cid(req), req->cmd.opcode, trans_len, len,
> > +prp1, prp2, num_prps);
> >  
> >  if (unlikely(!prp1)) {
> >  trace_nvme_dev_err_invalid_prp();
> >  return NVME_INVALID_FIELD | NVME_DNR;
> > -} else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
> > -   prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
> > -qsg->nsg = 0;
> > +}
> > +
> > +if (nvme_addr_is_cmb(n, prp1)) {
> > +is_cmb = true;
> > +
> >  qemu_iovec_init(iov, num_prps);
> > -qemu_iovec_add(iov, (void *)&n->cmbuf[prp1 - n->ctrl_mem.addr], trans_len);
> > +
> > +/*
> > + * PRPs do not cross page boundaries, so if the start address (here,
> > + * prp1) is within the CMB, it cannot cross outside the controller
> > + * memory buffer range. This is ensured by
> > + *
> > + *   len = n->page_size - (addr % n->page_size)
> > + *
> > + * Thus, we can directly add to the iovec without risking an out of
> > + * bounds access. This also holds for the remaining qemu_iovec_add
> > + * calls.
> > + */
> > +qemu_iovec_add(iov, nvme_addr_to_cmb(n, prp1), trans_len);
> >  } else {
> >  pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
> >  qemu_sglist_add(qsg, prp1, trans_len);
> >  }
> > +
> >  len -= trans_len;
> >  if (len) {
> >  if (unlikely(!prp2)) {
> >  trace_nvme_dev_err_invalid_prp2_missing();
> > +status = NVME_INVALID_FIELD | NVME_DNR;
> >  goto unmap;
> >  }
> > +
> >  if (len > n->page_size) {
> >  uint64_t prp_list[n->max_prp_ents];
> >  uint32_t nents, prp_trans;
> >  int i = 0;
> >  
> > +if (nvme_addr_is_cmb(n, prp2)) {
> > +prp_list_in_cmb
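The comment in the nvme_map_prp hunk above relies on the first transfer, of length page_size - (prp1 % page_size) capped at len, never crossing a page boundary. A minimal sketch of that computation; first_prp_len is a hypothetical helper, not part of hw/block/nvme.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Length of the transfer described by prp1: from prp1 up to the next
 * page boundary, but never more than the total length len.
 */
static uint64_t first_prp_len(uint64_t prp1, uint64_t page_size, uint64_t len)
{
    assert(page_size != 0);
    uint64_t to_boundary = page_size - (prp1 % page_size);

    return len < to_boundary ? len : to_boundary;
}
```

Because the offset within the page plus the returned length is at most page_size, the first chunk always stays within the page that prp1 points into.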

[PATCH 2/2] block/qcow2: zero data_file child after free

2020-03-16 Thread Vladimir Sementsov-Ogievskiy

A NULL data_file doesn't seem to be a correct state, but it's better
than a dangling pointer and simpler to debug.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index d44b45633d..6cdefe059f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1758,6 +1758,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 g_free(s->image_data_file);
 if (has_data_file(bs)) {
 bdrv_unref_child(bs, s->data_file);
+s->data_file = NULL;
 }
 g_free(s->unknown_header_fields);
 cleanup_unknown_header_ext(bs);
@@ -2621,6 +2622,7 @@ static void qcow2_close(BlockDriverState *bs)
 
 if (has_data_file(bs)) {
 bdrv_unref_child(bs, s->data_file);
+s->data_file = NULL;
 }
 
 qcow2_refcount_close(bs);
-- 
2.21.0

[PATCH 0/2] zero pointer after bdrv_unref_child

2020-03-16 Thread Vladimir Sementsov-Ogievskiy

Hi all!

I hit a use-after-free of the bs->backing pointer after bdrv_unref_child()
in bdrv_set_backing_hd().

Fix it, and do the same for s->data_file in qcow2.c.

I'm not sure that this is the full fix. Is it safe to keep bs->backing
during bdrv_unref_child itself? Is it safe to keep bs->backing during
the loop that unrefs all children in bdrv_close?
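Both patches follow the same pattern: clear the parent's pointer as part of dropping the reference, so a later access finds NULL instead of freed memory. A minimal sketch with hypothetical stand-in types (not QEMU's BlockDriverState or BdrvChild):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a block node and its child link. */
struct child {
    int refcnt;
};

struct parent {
    struct child *data_file;
};

/* Drop one reference; free the child when the count reaches zero. */
static void child_unref(struct child *c)
{
    if (c && --c->refcnt == 0) {
        free(c);
    }
}

/*
 * Release the child and clear the parent's pointer in one step, so a
 * later access sees NULL (easy to debug) instead of freed memory.
 */
static void parent_drop_child(struct parent *p)
{
    child_unref(p->data_file);
    p->data_file = NULL;
}
```

Whether the field should instead be cleared before the unref call, so that it is never visible while the child is being torn down, is the open question raised above; the sketch does not settle it.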


Vladimir Sementsov-Ogievskiy (2):
  block: bdrv_set_backing_bs: fix use-after-free
  block/qcow2: zero data_file child after free

 block.c   | 2 +-
 block/qcow2.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

-- 
2.21.0

Re: [PATCH 1/8] hw/ide: Get rid of piix3_init functions

2020-03-16 Thread Markus Armbruster

BALATON Zoltan  writes:

> This removes the pci_piix3_ide_init() and pci_piix3_xen_ide_init()
> functions, similar to the cleanup done to other ide devices.

Got a commit hash for "done to other ide devices"?

> Signed-off-by: BALATON Zoltan

Re: [PATCH 4/8] hw/ide: Move MAX_IDE_BUS define to one header

2020-03-16 Thread Markus Armbruster

BALATON Zoltan  writes:

> There are several definitions of MAX_IDE_BUS in different boards (some
> of them unused) with the same value. Move it to include/hw/ide/internal.h
> to have it in a central place.
>
> Signed-off-by: BALATON Zoltan 

This one feels a bit questionable.

The number of (PATA) IDE buses provided by a host bus adapter depends on
the HBA.  It happens to be 2 for all HBAs we implement, but it could
really be anything.

Similarly for SATA, where the common number is 6, but could really be
anything.  I can't see offhand whether any HBA we implement provides a
different number.

By moving MAX_IDE_BUS to include/hw/ide/internal.h, you bake the
accidental commonality into the interface to the IDE core.  I'd prefer
not to.
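One way to keep the bus count a property of each host bus adapter, in line with the objection above, is a per-adapter table instead of a MAX_IDE_BUS shared through the IDE core. This is only a sketch: the adapter names and the struct are hypothetical and do not correspond to QEMU's device models.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-adapter description; names and counts are examples. */
struct ide_hba_info {
    const char *name;
    int nb_buses;      /* buses this adapter model provides */
};

static const struct ide_hba_info ide_hbas[] = {
    { "example-pata-hba", 2 },   /* primary and secondary channel */
    { "example-sata-hba", 6 },   /* one bus per port */
};

/* Look up the bus count for an adapter model; -1 if unknown. */
static int ide_hba_bus_count(const char *name)
{
    for (size_t i = 0; i < sizeof(ide_hbas) / sizeof(ide_hbas[0]); i++) {
        if (strcmp(ide_hbas[i].name, name) == 0) {
            return ide_hbas[i].nb_buses;
        }
    }
    return -1;
}
```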

Re: [PATCH v9 02/10] scripts: Coccinelle script to use ERRP_AUTO_PROPAGATE()

2020-03-16 Thread Vladimir Sementsov-Ogievskiy





On 14.03.2020 00:54, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


13.03.2020 18:42, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


12.03.2020 19:36, Markus Armbruster wrote:

I may have a second look tomorrow with fresher eyes, but let's get this
out now as is.

Vladimir Sementsov-Ogievskiy  writes:


Script adds ERRP_AUTO_PROPAGATE macro invocation where appropriate and
does corresponding changes in code (look for details in
include/qapi/error.h)

Usage example:
spatch --sp-file scripts/coccinelle/auto-propagated-errp.cocci \
--macro-file scripts/cocci-macro-file.h --in-place --no-show-diff \
--max-width 80 FILES...

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Cc: Eric Blake 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Greg Kurz 
Cc: Christian Schoenebeck 
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paul Durrant 
Cc: Stefan Hajnoczi 
Cc: "Philippe Mathieu-Daudé" 
Cc: Laszlo Ersek 
Cc: Gerd Hoffmann 
Cc: Stefan Berger 
Cc: Markus Armbruster 
Cc: Michael Roth 
Cc: qemu-devel@nongnu.org
Cc: qemu-bl...@nongnu.org
Cc: xen-de...@lists.xenproject.org

scripts/coccinelle/auto-propagated-errp.cocci | 327 ++
include/qapi/error.h  |   3 +
MAINTAINERS   |   1 +
3 files changed, 331 insertions(+)
create mode 100644 scripts/coccinelle/auto-propagated-errp.cocci

diff --git a/scripts/coccinelle/auto-propagated-errp.cocci b/scripts/coccinelle/auto-propagated-errp.cocci
new file mode 100644
index 00..7dac2dcfa4
--- /dev/null
+++ b/scripts/coccinelle/auto-propagated-errp.cocci
@@ -0,0 +1,327 @@
+// Use ERRP_AUTO_PROPAGATE (see include/qapi/error.h)
+//
+// Copyright (c) 2020 Virtuozzo International GmbH.
+//
+// This program is free software; you can redistribute it and/or
+// modify it under the terms of the GNU General Public License as
+// published by the Free Software Foundation; either version 2 of the
+// License, or (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this program.  If not, see
+// .
+//
+// Usage example:
+// spatch --sp-file scripts/coccinelle/auto-propagated-errp.cocci \
+//  --macro-file scripts/cocci-macro-file.h --in-place \
+//  --no-show-diff --max-width 80 FILES...
+//
+// Note: --max-width 80 is needed because the coccinelle default is less
+// than 80, and without this parameter coccinelle may reindent some
+// lines which fit into 80 characters but not into the coccinelle default,
+// which in turn produces extra patch hunks for no reason.


This is about unwanted reformatting of parameter lists due to the ___
chaining hack.  --max-width 80 makes that less likely, but not
impossible.

We can search for unwanted reformatting of parameter lists.  I think
grepping diffs for '^\+.*Error \*\*' should do the trick.  For the whole
tree, I get one false positive (not a parameter list), and one hit:

   @@ -388,8 +388,10 @@ static void object_post_init_with_type(O
}
}

   -void object_apply_global_props(Object *obj, const GPtrArray *props, Error **errp)
   +void object_apply_global_props(Object *obj, const GPtrArray *props,
   +   Error **errp)
{
   +ERRP_AUTO_PROPAGATE();
int i;

if (!props) {

Reformatting, but not unwanted.
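The diff search described above can be automated. A sketch using the POSIX regex API with an extended-regex form of the same pattern (in ERE, a backslash before + or * matches that character literally); adds_error_param is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if a unified-diff line is an added line (starts with '+')
 * that contains "Error **".
 */
static bool adds_error_param(const char *line)
{
    regex_t re;
    bool match;

    if (regcomp(&re, "^\\+.*Error \\*\\*", REG_EXTENDED | REG_NOSUB) != 0) {
        return false;   /* pattern failed to compile */
    }
    match = regexec(&re, line, 0, NULL, 0) == 0;
    regfree(&re);
    return match;
}
```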


Yes, I saw it. This line is 81 characters long, so it's OK to fix it in the
same hunk as the ERRP_AUTO_PROPAGATE addition, even in a non-automatic patch.


Agree.



The --max-width 80 hack is good enough for me.

It does result in slightly long transformed lines, e.g. this one in
replication.c:

   @@ -113,7 +113,7 @@ static int replication_open(BlockDriverS
s->mode = REPLICATION_MODE_PRIMARY;
top_id = qemu_opt_get(opts, REPLICATION_TOP_ID);
if (top_id) {
   -error_setg(&local_err, "The primary side does not support option top-id");
   +error_setg(errp, "The primary side does not support option top-id");
goto fail;
}
} else if (!strcmp(mode, "secondary")) {

v8 did break this line (that's how I found it).  However, v9 still
shortens the line, just not below the target.  All your + lines look
quite unlikely to lengthen lines.  Let's not worry about this.


+// Switch unusual Error ** parameter names to errp
+// (this is necessary to use ERRP_AUTO_PROPAGATE).
+//
+// Disable optional_qualifier to skip functions with
+// "Error *const *errp" parameter.
+//
+// Skip functions with "assert(_errp && *_errp)" statement, because
+// that signals unusual semantics,

Re: [PATCH v5 21/26] nvme: add support for scatter gather lists

2020-03-16 Thread Klaus Birkelund Jensen

On Feb 12 14:07, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:52 +0100, Klaus Jensen wrote:
> > For now, support the Data Block, Segment and Last Segment descriptor
> > types.
> > 
> > See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").
> > 
> > Signed-off-by: Klaus Jensen 
> > Acked-by: Fam Zheng 
> > ---
> >  block/nvme.c  |  18 +-
> >  hw/block/nvme.c   | 375 +++---
> >  hw/block/trace-events |   4 +
> >  include/block/nvme.h  |  62 ++-
> >  4 files changed, 389 insertions(+), 70 deletions(-)
> > 
> > diff --git a/block/nvme.c b/block/nvme.c
> > index d41c4bda6e39..521f521054d5 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -446,7 +446,7 @@ static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
> >  error_setg(errp, "Cannot map buffer for DMA");
> >  goto out;
> >  }
> > -cmd.prp1 = cpu_to_le64(iova);
> > +cmd.dptr.prp.prp1 = cpu_to_le64(iova);
> >  
> >  if (nvme_cmd_sync(bs, s->queues[0], &cmd)) {
> >  error_setg(errp, "Failed to identify controller");
> > @@ -545,7 +545,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
> >  }
> >  cmd = (NvmeCmd) {
> >  .opcode = NVME_ADM_CMD_CREATE_CQ,
> > -.prp1 = cpu_to_le64(q->cq.iova),
> > +.dptr.prp.prp1 = cpu_to_le64(q->cq.iova),
> >  .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)),
> >  .cdw11 = cpu_to_le32(0x3),
> >  };
> > @@ -556,7 +556,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
> >  }
> >  cmd = (NvmeCmd) {
> >  .opcode = NVME_ADM_CMD_CREATE_SQ,
> > -.prp1 = cpu_to_le64(q->sq.iova),
> > +.dptr.prp.prp1 = cpu_to_le64(q->sq.iova),
> >  .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)),
> >  .cdw11 = cpu_to_le32(0x1 | (n << 16)),
> >  };
> > @@ -906,16 +906,16 @@ try_map:
> >  case 0:
> >  abort();
> >  case 1:
> > -cmd->prp1 = pagelist[0];
> > -cmd->prp2 = 0;
> > +cmd->dptr.prp.prp1 = pagelist[0];
> > +cmd->dptr.prp.prp2 = 0;
> >  break;
> >  case 2:
> > -cmd->prp1 = pagelist[0];
> > -cmd->prp2 = pagelist[1];
> > +cmd->dptr.prp.prp1 = pagelist[0];
> > +cmd->dptr.prp.prp2 = pagelist[1];
> >  break;
> >  default:
> > -cmd->prp1 = pagelist[0];
> > -cmd->prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
> > +cmd->dptr.prp.prp1 = pagelist[0];
> > +cmd->dptr.prp.prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
> >  break;
> >  }
> >  trace_nvme_cmd_map_qiov(s, cmd, req, qiov, entries);
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 204ae1d33234..a91c60fdc111 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -75,8 +75,10 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
> >  
> >  static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
> >  {
> > -if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
> > -memcpy(buf, (void *) &n->cmbuf[addr - n->ctrl_mem.addr], size);
> > +hwaddr hi = addr + size;
> Are you sure you don't want to check for overflow here?
> It's a theoretical issue, since addr would have to be close to the top of
> the 64-bit range, but for things like this I check very defensively.
> 

The use of nvme_addr_read in map_prp simply cannot overflow, due to how
the size is calculated, but for SGLs it's different. However, the overflow
is already checked in map_sgl, because we have to return a special error
code in that case.

On the other hand, there may be future callers of nvme_addr_read that do
not check this, so I'll re-add it.
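For a re-added check, one overflow-safe way to test that [addr, addr + size) lies inside a window is to compare against the remaining space instead of computing addr + size. A sketch with a hypothetical helper name and plain integer types rather than NvmeCtrl:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [addr, addr + size) lies entirely within [base, base + win_size).
 * No end address is computed, so a huge addr/size pair cannot wrap around
 * to a small value that happens to land inside the window.
 */
static bool range_in_window(uint64_t base, uint64_t win_size,
                            uint64_t addr, uint64_t size)
{
    if (addr < base) {
        return false;
    }
    uint64_t off = addr - base;   /* cannot underflow: addr >= base */

    return off <= win_size && size <= win_size - off;
}
```

Note that testing the exclusive end address with a helper of the form addr >= low && addr < hi, as in the quoted nvme_map_addr_cmb hunk, appears to reject a range that ends exactly at the top of the window; comparing against the remaining space accepts that case.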

> > +
> > +if (n->cmbsz && nvme_addr_is_cmb(n, addr) && nvme_addr_is_cmb(n, hi)) {
> Here you fix the bug I mentioned in patch 6. I suggest moving the fix there.

Done.

> > +memcpy(buf, nvme_addr_to_cmb(n, addr), size);
> >  return 0;
> >  }
> >  
> > @@ -159,6 +161,48 @@ static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
> >  }
> >  }
> >  
> > +static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
> > +size_t len)
> > +{
> > +if (!nvme_addr_is_cmb(n, addr) || !nvme_addr_is_cmb(n, addr + len)) {
> > +return NVME_DATA_TRANSFER_ERROR;
> > +}
> > +
> > +qemu_iovec_add(iov, nvme_addr_to_cmb(n, addr), len);
> > +
> > +return NVME_SUCCESS;
> > +}
> > +
> > +static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
> > +hwaddr addr, size_t len)
> > +{
> > +bool addr_is_cmb = nvme_addr_is_cmb(n, addr);
> > +
> > +if (addr_is_cmb) {
> > +if (qsg->sg) {
> > +return NVME_INVALID_USE_OF_CMB | NVME_DNR;
> > +}
> > +
> > +if (!iov->iov) {
> > +qemu_iovec_init(iov, 1);
> > +}
> > +
> > +

Re: [PATCH v5 20/26] nvme: handle dma errors

2020-03-16 Thread Klaus Birkelund Jensen

On Feb 12 13:52, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:52 +0100, Klaus Jensen wrote:
> > Handling DMA errors gracefully is required for the device to pass the
> > block/011 test ("disable PCI device while doing I/O") in the blktests
> > suite.
> > 
> > With this patch the device passes the test by retrying "critical"
> > transfers (posting of completion entries and processing of submission
> > queue entries).
> > 
> > If DMA errors occur at any other point in the execution of the command
> > (say, while mapping the PRPs), the command is aborted with a Data
> > Transfer Error status code.
> > 
> > Signed-off-by: Klaus Jensen 
> > ---
> >  hw/block/nvme.c   | 42 +-
> >  hw/block/trace-events |  2 ++
> >  include/block/nvme.h  |  2 +-
> >  3 files changed, 36 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index f8c81b9e2202..204ae1d33234 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -73,14 +73,14 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
> >  return addr >= low && addr < hi;
> >  }
> >  
> > -static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
> > +static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
> >  {
> >  if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
> >  memcpy(buf, (void *) &n->cmbuf[addr - n->ctrl_mem.addr], size);
> > -return;
> > +return 0;
> >  }
> >  
> > -pci_dma_read(&n->parent_obj, addr, buf, size);
> > +return pci_dma_read(&n->parent_obj, addr, buf, size);
> >  }
> >  
> >  static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
> > @@ -168,6 +168,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
> >  uint16_t status = NVME_SUCCESS;
> >  bool is_cmb = false;
> >  bool prp_list_in_cmb = false;
> > +int ret;
> >  
> >  trace_nvme_dev_map_prp(nvme_cid(req), req->cmd.opcode, trans_len, len,
> >  prp1, prp2, num_prps);
> > @@ -218,7 +219,12 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
> >  
> >  nents = (len + n->page_size - 1) >> n->page_bits;
> >  prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
> > -nvme_addr_read(n, prp2, (void *) prp_list, prp_trans);
> > +ret = nvme_addr_read(n, prp2, (void *) prp_list, prp_trans);
> > +if (ret) {
> > +trace_nvme_dev_err_addr_read(prp2);
> > +status = NVME_DATA_TRANSFER_ERROR;
> > +goto unmap;
> > +}
> >  while (len != 0) {
> >  uint64_t prp_ent = le64_to_cpu(prp_list[i]);
> >  
> > @@ -237,7 +243,13 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
> >  i = 0;
> >  nents = (len + n->page_size - 1) >> n->page_bits;
> > prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
> > -nvme_addr_read(n, prp_ent, (void *) prp_list, prp_trans);
> > +ret = nvme_addr_read(n, prp_ent, (void *) prp_list,
> > +prp_trans);
> > +if (ret) {
> > +trace_nvme_dev_err_addr_read(prp_ent);
> > +status = NVME_DATA_TRANSFER_ERROR;
> > +goto unmap;
> > +}
> >  prp_ent = le64_to_cpu(prp_list[i]);
> >  }
> >  
> > @@ -443,6 +455,7 @@ static void nvme_post_cqes(void *opaque)
> >  NvmeCQueue *cq = opaque;
> >  NvmeCtrl *n = cq->ctrl;
> >  NvmeRequest *req, *next;
> > +int ret;
> >  
> >  QTAILQ_FOREACH_SAFE(req, &cq->req_list, entry, next) {
> >  NvmeSQueue *sq;
> > @@ -452,15 +465,21 @@ static void nvme_post_cqes(void *opaque)
> >  break;
> >  }
> >  
> > -QTAILQ_REMOVE(&cq->req_list, req, entry);
> >  sq = req->sq;
> >  req->cqe.status = cpu_to_le16((req->status << 1) | cq->phase);
> >  req->cqe.sq_id = cpu_to_le16(sq->sqid);
> >  req->cqe.sq_head = cpu_to_le16(sq->head);
> >  addr = cq->dma_addr + cq->tail * n->cqe_size;
> > -nvme_inc_cq_tail(cq);
> > -pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
> > +ret = pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
> >  sizeof(req->cqe));
> > +if (ret) {
> > +trace_nvme_dev_err_addr_write(addr);
> > +timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> > +100 * SCALE_MS);
> > +break;
> > +}
> > +QTAILQ_REMOVE(&cq->req_list, req, entry);
> > +nvme_inc_cq_tail(cq);
> >  nvme_req_clear(req);
> >  QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
> >  }
> > @@ -1588,7 +1607,12 @@ static void nvme_process_sq(void *opaque)
> >  
>
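The reordering in the nvme_post_cqes hunk above (write the completion entry first, and only dequeue the request and advance the tail once the write succeeds, otherwise re-arm the timer) can be modelled in isolation. A toy sketch: toy_cq, toy_post_cqes and flaky_write are hypothetical, and the retry flag stands in for timer_mod().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CQ_CAP 8

/* Toy model: pending completions plus the CQ tail index. */
struct toy_cq {
    int pending[TOY_CQ_CAP];
    size_t head, count;
    unsigned tail;          /* advanced only after a successful write */
    bool retry_scheduled;   /* stands in for re-arming cq->timer */
};

/* Stands in for pci_dma_write(); non-zero means the DMA failed. */
typedef int (*dma_write_fn)(unsigned slot, int cqe);

static void toy_post_cqes(struct toy_cq *cq, dma_write_fn dma_write)
{
    while (cq->count > 0) {
        int cqe = cq->pending[cq->head];

        if (dma_write(cq->tail, cqe) != 0) {
            /* Keep the entry queued and try again later. */
            cq->retry_scheduled = true;
            return;
        }
        /* Only now dequeue the request and advance the tail. */
        cq->head = (cq->head + 1) % TOY_CQ_CAP;
        cq->count--;
        cq->tail++;
    }
}

/* Test double: fails the first fail_budget writes, then succeeds. */
static int fail_budget;

static int flaky_write(unsigned slot, int cqe)
{
    (void)slot;
    (void)cqe;
    if (fail_budget > 0) {
        fail_budget--;
        return -1;
    }
    return 0;
}
```

A failed write leaves both the queue and the tail untouched, so the guest never sees a tail that points past an entry that was not written.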

Re: [PATCH v5 22/26] nvme: support multiple namespaces

2020-03-16 Thread Klaus Birkelund Jensen

On Feb 12 14:34, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:52 +0100, Klaus Jensen wrote:
> > This adds support for multiple namespaces by introducing a new 'nvme-ns'
> > device model. The nvme device creates a bus named from the device name
> > ('id'). The nvme-ns devices then connect to this and register
> > themselves with the nvme device.
> > 
> > This changes how an nvme device is created. Example with two namespaces:
> > 
> >   -drive file=nvme0n1.img,if=none,id=disk1
> >   -drive file=nvme0n2.img,if=none,id=disk2
> >   -device nvme,serial=deadbeef,id=nvme0
> >   -device nvme-ns,drive=disk1,bus=nvme0,nsid=1
> >   -device nvme-ns,drive=disk2,bus=nvme0,nsid=2
> > 
> > The drive property is kept on the nvme device to keep the change
> > backward compatible, but the property is now optional. Specifying a
> > drive for the nvme device will always create the namespace with nsid 1.
> Very reasonable way to do it. 
> > 
> > Signed-off-by: Klaus Jensen 
> > Signed-off-by: Klaus Jensen 
> > ---
> >  hw/block/Makefile.objs |   2 +-
> >  hw/block/nvme-ns.c | 158 +++
> >  hw/block/nvme-ns.h |  60 +++
> >  hw/block/nvme.c| 235 +
> >  hw/block/nvme.h|  47 -
> >  hw/block/trace-events  |   6 +-
> >  6 files changed, 389 insertions(+), 119 deletions(-)
> >  create mode 100644 hw/block/nvme-ns.c
> >  create mode 100644 hw/block/nvme-ns.h
> > 
> > diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
> > index 28c2495a00dc..45f463462f1e 100644
> > --- a/hw/block/Makefile.objs
> > +++ b/hw/block/Makefile.objs
> > @@ -7,7 +7,7 @@ common-obj-$(CONFIG_PFLASH_CFI02) += pflash_cfi02.o
> >  common-obj-$(CONFIG_XEN) += xen-block.o
> >  common-obj-$(CONFIG_ECC) += ecc.o
> >  common-obj-$(CONFIG_ONENAND) += onenand.o
> > -common-obj-$(CONFIG_NVME_PCI) += nvme.o
> > +common-obj-$(CONFIG_NVME_PCI) += nvme.o nvme-ns.o
> >  common-obj-$(CONFIG_SWIM) += swim.o
> >  
> >  obj-$(CONFIG_SH4) += tc58128.o
> > diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> > new file mode 100644
> > index ..0e5be44486f4
> > --- /dev/null
> > +++ b/hw/block/nvme-ns.c
> > @@ -0,0 +1,158 @@
> > +#include "qemu/osdep.h"
> > +#include "qemu/units.h"
> > +#include "qemu/cutils.h"
> > +#include "qemu/log.h"
> > +#include "hw/block/block.h"
> > +#include "hw/pci/msix.h"
> Do you need this include?

No, I needed hw/pci/pci.h instead :)

> > +#include "sysemu/sysemu.h"
> > +#include "sysemu/block-backend.h"
> > +#include "qapi/error.h"
> > +
> > +#include "hw/qdev-properties.h"
> > +#include "hw/qdev-core.h"
> > +
> > +#include "nvme.h"
> > +#include "nvme-ns.h"
> > +
> > +static int nvme_ns_init(NvmeNamespace *ns)
> > +{
> > +NvmeIdNs *id_ns = &ns->id_ns;
> > +
> > +id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
> > +id_ns->nuse = id_ns->ncap = id_ns->nsze =
> > +cpu_to_le64(nvme_ns_nlbas(ns));
> Nitpick: To be honest, I don't really like that chain assignment,
> especially since it forces the line to wrap, but that is just my
> personal taste.

Fixed, and also added a comment as to why they are the same.

> > +
> > +return 0;
> > +}
> > +
> > +static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, NvmeIdCtrl *id,
> > +Error **errp)
> > +{
> > +uint64_t perm, shared_perm;
> > +
> > +Error *local_err = NULL;
> > +int ret;
> > +
> > +perm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
> > +shared_perm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
> > +BLK_PERM_GRAPH_MOD;
> > +
> > +ret = blk_set_perm(ns->blk, perm, shared_perm, &local_err);
> > +if (ret) {
> > +error_propagate_prepend(errp, local_err, "blk_set_perm: ");
> > +return ret;
> > +}
> 
> You should consider using blkconf_apply_backend_options.
> Take a look at for example virtio_blk_device_realize.
> That will give you support for read only block devices as well.

So, yeah. There is a reason for this. And I will add that as a comment,
but I will write it here for posterity.

The problem is when the nvme-ns device starts getting more than just a
single drive attached (I have patches ready that will add a "metadata"
and a "state" drive). The blkconf_ functions work on a BlockConf that
embeds a BlockBackend, so you can't have one BlockConf with multiple
BlockBackends. That is why I'm kinda copying the "good parts" of
the blkconf_apply_backend_options code here.

> 
> I have only touched the area of block permissions once,
> so I'd prefer someone from the block layer to review this as well.
> 
> > +
> > +ns->size = blk_getlength(ns->blk);
> > +if (ns->size < 0) {
> > +error_setg_errno(errp, -ns->size, "blk_getlength");
> > +return 1;
> > +}
> > +
> > +switch (n->conf.wce) {
> > +case ON_OFF_AUTO_ON:
> > +n->features.volatile_wc = 1;
> > +break;
> > +case ON_OFF_AUTO_OFF:
> > +n->features.volatile_wc = 0;
> > +case
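In the scheme described in the nvme-ns patch, each namespace device registers itself with its controller under an nsid, and the legacy drive property maps to nsid 1. A toy sketch of such a registry that rejects nsid 0 and duplicate nsids; the types, names and the 256-namespace limit are assumptions, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_NAMESPACES 256   /* hypothetical limit */

struct toy_ns {
    uint32_t nsid;
};

struct toy_ctrl {
    /* Indexed by nsid; slot 0 stays unused because nsid 0 is invalid. */
    struct toy_ns *namespaces[TOY_MAX_NAMESPACES + 1];
};

/* Attach ns to the controller; 0 on success, -1 on a bad or taken nsid. */
static int toy_register_ns(struct toy_ctrl *n, struct toy_ns *ns)
{
    if (ns->nsid == 0 || ns->nsid > TOY_MAX_NAMESPACES) {
        return -1;
    }
    if (n->namespaces[ns->nsid] != NULL) {
        return -1;
    }
    n->namespaces[ns->nsid] = ns;
    return 0;
}
```

In this toy model, a controller that registers its legacy drive as nsid 1 first would cause a later namespace device asking for nsid 1 to be rejected.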

Re: [PATCH 0/2] Fix Cooperlake CPU model

2020-03-16 Thread Zhang, Cathy


On 3/16/2020 4:41 PM, Paolo Bonzini wrote:

On 16/03/20 02:39, Zhang, Cathy wrote:

On 1/7/2020 9:31 PM, Paolo Bonzini wrote:

On 25/12/19 07:30, Xiaoyao Li wrote:

Current Cooperlake CPU model lacks VMX features which were introduced
by Paolo several months ago, and it also lacks 2 security features in
MSR_IA32_ARCH_CAPABILITIES that were disclosed recently.

Xiaoyao Li (2):
    target/i386: Add new bit definitions of MSR_IA32_ARCH_CAPABILITIES
    target/i386: Add missed features to Cooperlake CPU model

   target/i386/cpu.c | 51 ++-
   target/i386/cpu.h | 13 +++-
   2 files changed, 58 insertions(+), 6 deletions(-)


Queued, thanks.

Paolo

Hi Paolo,

May I ask whether you will put all the patches for the Cooper Lake
CPU model into QEMU v5.0-rc0?

These are included already:

commit b952544fe8a061f0c0cccfd50a58220bc6ac94da
Merge: dc65a5bdc9 083b266f69
Author: Peter Maydell 
Date:   Fri Jan 10 17:16:49 2020 +

 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
 
 * Compat machines fix (Denis)

 * Command line parsing fixes (Michal, Peter, Xiaoyao)
 * Cooperlake CPU model fixes (Xiaoyao)
 * i386 gdb fix (mkdolata)
 * IOEventHandler cleanup (Philippe)
 * icount fix (Pavel)
 * RR support for random number sources (Pavel)
 * Kconfig fixes (Philippe)
 
Paolo
Yes, I see they are already in master, but not in v4.2, so will they
be in the next release, v5.0?

Re: [PATCH v2] python/qemu/qmp.py: QMP debug with VM label

2020-03-16 Thread Oksana Voshchana

Hi Eduardo
I'm already fixing it.

Thank you,

On Sun, Mar 15, 2020 at 5:39 PM Eduardo Habkost  wrote:

> On Thu, Mar 12, 2020 at 04:05:47PM +0200, Oksana Vohchana wrote:
> > QEMUMachine writes some messages to the default logger.
> > But it is sometimes hard to read the output if we send requests to
> > more than one VM.
> > This patch adds a label to the logger in debug mode.
> >
> > Signed-off-by: Oksana Vohchana 
> >
> > ---
> > v2:
> >  - Instead of showing the label in the message, it provides the label
> >    only in the debug logger information
> > ---
> >  python/qemu/machine.py | 2 +-
> >  python/qemu/qmp.py | 5 -
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> > index 183d8f3d38..d0aa774c1c 100644
> > --- a/python/qemu/machine.py
> > +++ b/python/qemu/machine.py
> > @@ -270,7 +270,7 @@ class QEMUMachine(object):
> >  self._vm_monitor = os.path.join(self._sock_dir,
> >  self._name + "-monitor.sock")
> >  self._remove_files.append(self._vm_monitor)
> > -self._qmp = qmp.QEMUMonitorProtocol(self._vm_monitor, server=True)
> > +self._qmp = qmp.QEMUMonitorProtocol(self._vm_monitor, server=True, nickname=self._name)
> >
> >  def _post_launch(self):
> >  if self._qmp:
> > diff --git a/python/qemu/qmp.py b/python/qemu/qmp.py
> > index f40586eedd..d58b18c304 100644
> > --- a/python/qemu/qmp.py
> > +++ b/python/qemu/qmp.py
> > @@ -46,7 +46,7 @@ class QEMUMonitorProtocol:
> >  #: Logger object for debugging messages
> >  logger = logging.getLogger('QMP')
>
> This will create a single logger instance.
>
> >
> > -def __init__(self, address, server=False):
> > +def __init__(self, address, server=False, nickname=None):
> >  """
> >  Create a QEMUMonitorProtocol class.
> >
> > @@ -62,6 +62,7 @@ class QEMUMonitorProtocol:
> >  self.__address = address
> >  self.__sock = self.__get_sock()
> >  self.__sockfile = None
> > +self._nickname = nickname
> >  if server:
> >  self.__sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
> >  self.__sock.bind(self.__address)
> > @@ -188,6 +189,8 @@ class QEMUMonitorProtocol:
> >  @return QMP response as a Python dict or None if the connection has
> >  been closed
> >  """
> > +if self._nickname:
> > +self.logger.name = 'QMP.{}'.format(self._nickname)
>
> This will change the name of that single instance and affect
> every single QEMUMonitorProtocol object.  Please don't do that.
>
> You can just do:
>
> self.logger = logging.getLogger('QMP').getChild(self._nickname)
>
> at __init__().
>
>
> >  self.logger.debug(">>> %s", qmp_cmd)
> >  try:
> >  self.__sock.sendall(json.dumps(qmp_cmd).encode('utf-8'))
> > --
> > 2.21.1
> >
>
> --
> Eduardo
>
>

Re: [PATCH] softmmu/vl.c: Handle '-cpu help' and '-device help' before 'no default machine'

2020-03-16 Thread Kashyap Chamarthy

[Cc: Markus; he'd be pleasantly surprised by this, if he hadn't already
noticed it, as he was also mildly annoyed about this the other day.]

On Fri, Mar 13, 2020 at 05:24:47PM +, Peter Maydell wrote:
> Currently if you try to ask for the list of CPUs for a target
> architecture which does not specify a default machine type
> you just get an error:
> 
>   $ qemu-system-arm -cpu help
>   qemu-system-arm: No machine specified, and there is no default
>   Use -machine help to list supported machines

I just applied the patch and built QEMU.

With `qemu-system-arm`:

$> ./arm-softmmu/qemu-system-arm -cpu help | head -5
Available CPUs:
  arm1026
  arm1136
  arm1136-r2
  arm1176

$> ./arm-softmmu/qemu-system-arm -device help | head -5
Controller/Bridge/Hub devices:
name "i82801b11-bridge", bus PCI
name "ioh3420", bus PCI, desc "Intel IOH device id 3420 PCIE Root Port"
name "pci-bridge", bus PCI, desc "Standard PCI Bridge"
name "pci-bridge-seat", bus PCI, desc "Standard PCI Bridge (multiseat)"

With `qemu-system-aarch64`:

$> ./aarch64-softmmu/qemu-system-aarch64 -cpu help | head -5
Available CPUs:
  arm1026
  arm1136
  arm1136-r2
  arm1176

$> ./aarch64-softmmu/qemu-system-aarch64 -device help | head -5
Controller/Bridge/Hub devices:
name "i82801b11-bridge", bus PCI
name "ioh3420", bus PCI, desc "Intel IOH device id 3420 PCIE Root Port"
name "pci-bridge", bus PCI, desc "Standard PCI Bridge"
name "pci-bridge-seat", bus PCI, desc "Standard PCI Bridge (multiseat)"

> Since the list of CPUs doesn't depend on the machine, this is
> unnecessarily unhelpful. "-device help" has a similar problem.
> 
> Move the checks for "did the user ask for -cpu help or -device help"
> up so they precede the select_machine() call which checks that the
> user specified a valid machine type.
> 
> Signed-off-by: Peter Maydell 

Tested-by: Kashyap Chamarthy 

> ---
> This has been on-and-off irritating me for years, and it's
> embarrassing how simple the fix turns out to be...
> ---
>  softmmu/vl.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)

[...] 

Thanks. :)

-- 
/kashyap
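The ordering change described in the patch can be illustrated with a toy startup routine that answers machine-independent help requests before insisting on a machine type. This is a simplified sketch with hypothetical names, not the code in softmmu/vl.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified view of the parsed command line. */
struct toy_opts {
    const char *cpu_model;   /* value of -cpu, or NULL */
    const char *device;      /* value of -device, or NULL */
    const char *machine;     /* value of -machine, or NULL */
};

enum toy_result { TOY_LISTED_HELP, TOY_NO_MACHINE, TOY_STARTED };

static bool toy_is_help(const char *s)
{
    return s != NULL && strcmp(s, "help") == 0;
}

/*
 * Machine-independent help requests are answered first; only then is a
 * machine type required. Swapping the two checks reproduces the old
 * "No machine specified, and there is no default" error for -cpu help.
 */
static enum toy_result toy_startup(const struct toy_opts *o,
                                   const char *default_machine)
{
    if (toy_is_help(o->cpu_model) || toy_is_help(o->device)) {
        return TOY_LISTED_HELP;   /* would print the list and exit */
    }
    if (o->machine == NULL && default_machine == NULL) {
        return TOY_NO_MACHINE;
    }
    return TOY_STARTED;
}
```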

[PULL 0/6] Audio 20200316 patches

2020-03-16 Thread Gerd Hoffmann

The following changes since commit 61c265f0660ee476985808c8aa7915617c44fd53:

  Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20200313a' into staging (2020-03-13 10:33:04 +)

are available in the Git repository at:

  git://git.kraxel.org/qemu tags/audio-20200316-pull-request

for you to fetch changes up to 49f77e6faf36cddd84417f9080462413acdbcc27:

  audio: add audiodev format=f32 option documentation (2020-03-16 10:18:07 +0100)


audio: float fixes



Volker Rümelin (6):
  qapi/audio: add documentation for AudioFormat
  audio: change naming scheme of FLOAT_CONV macros
  audio: consistency changes
  audio: change mixing engine float range to [-1.f, 1.f]
  audio: fix saturation nonlinearity in clip_* functions
  audio: add audiodev format=f32 option documentation

 audio/mixeng_template.h | 22 ++
 audio/mixeng.c  | 26 +-
 qapi/audio.json | 14 ++
 qemu-options.hx |  4 ++--
 4 files changed, 39 insertions(+), 27 deletions(-)

-- 
2.18.2
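The series moves the mixing engine's float range to [-1.f, 1.f] and fixes saturation in the clip_* functions. As a rough illustration only (this is not the mixeng code from the series), one common way to convert signed 16-bit samples to that range and back with saturation:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Map a signed 16-bit sample into [-1.0f, 1.0f) by dividing by 2^15. */
static float s16_to_float(int16_t x)
{
    return (float)x / 32768.0f;
}

/*
 * Convert back with saturation: out-of-range input is clamped instead
 * of wrapping. Inputs in [32767/32768, 1.0) already truncate to 32767.
 */
static int16_t float_to_s16_clip(float f)
{
    if (isnan(f)) {
        return 0;            /* assumption: treat NaN as silence */
    }
    if (f >= 1.0f) {
        return INT16_MAX;
    }
    if (f <= -1.0f) {
        return INT16_MIN;
    }
    return (int16_t)(f * 32768.0f);
}
```

Dividing by a power of two keeps the conversion exact for every int16_t value, so the round trip is lossless; the trade-off is that the positive side tops out at 32767/32768 rather than 1.0.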

Re: [PATCH v5 07/50] multi-process: define mpqemu-link object

2020-03-16 Thread Stefan Hajnoczi

On Tue, Mar 10, 2020 at 11:26:23AM -0700, Elena Ufimtseva wrote:
> On Tue, Mar 10, 2020 at 04:09:41PM +, Stefan Hajnoczi wrote:
> > On Mon, Feb 24, 2020 at 03:54:58PM -0500, Jagannathan Raman wrote:
> > > +msg->num_fds = 0;
> > > +for (chdr = CMSG_FIRSTHDR(); chdr != NULL;
> > > + chdr = CMSG_NXTHDR(, chdr)) {
> > > +if ((chdr->cmsg_level == SOL_SOCKET) &&
> > > +(chdr->cmsg_type == SCM_RIGHTS)) {
> > > +fdsize = chdr->cmsg_len - CMSG_LEN(0);
> > > +msg->num_fds = fdsize / sizeof(int);
> > > +if (msg->num_fds > REMOTE_MAX_FDS) {
> > > +/*
> > > + * TODO: Security issue detected. Sender never sends more
> > > + * than REMOTE_MAX_FDS. This condition should be 
> > > signaled to
> > > + * the admin
> > > + */
> > 
> > This TODO doesn't seem actionable.  The error is already handled.
> > 
> > > +qemu_log_mask(LOG_REMOTE_DEBUG,
> > > +  "%s: Max FDs exceeded\n", __func__);
> > > +return -ERANGE;
> > 
> > The mutex must be released.
> 
> Thank you! Will fix this and above.
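Here is a minimal, self-contained sketch of the receive path quoted above. It assumes plain POSIX sockets rather than the mpqemu-link code. It counts SCM_RIGHTS descriptors with CMSG_FIRSTHDR()/CMSG_NXTHDR() and rejects too many. MAX_FDS, recv_with_fds() and send_one_fd() are hypothetical names. This sketch holds no lock, so each early return is safe as written. In the patch under review, the same -ERANGE path must also release the mutex.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_FDS 8   /* hypothetical limit standing in for REMOTE_MAX_FDS */

/* Control buffer big enough for MAX_FDS descriptors, suitably aligned. */
union fd_control {
    char buf[CMSG_SPACE(sizeof(int) * MAX_FDS)];
    struct cmsghdr align;
};

/*
 * Receive one data byte plus any SCM_RIGHTS descriptors from `sock`.
 * Returns the number of fds copied into fds[], or a negative errno.
 * Assumes at most one SCM_RIGHTS control message per call.
 */
static int recv_with_fds(int sock, int fds[MAX_FDS])
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_control control;
    struct msghdr mh;
    struct cmsghdr *c;
    int nfds = 0;

    memset(&mh, 0, sizeof(mh));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = control.buf;
    mh.msg_controllen = sizeof(control.buf);

    if (recvmsg(sock, &mh, 0) < 0) {
        return -errno;
    }
    /*
     * The kernel truncates control data that does not fit, so "too many
     * fds" normally shows up as MSG_CTRUNC. (A full implementation would
     * also close any descriptors that were still delivered.)
     */
    if (mh.msg_flags & MSG_CTRUNC) {
        return -ERANGE;
    }
    for (c = CMSG_FIRSTHDR(&mh); c != NULL; c = CMSG_NXTHDR(&mh, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
            size_t n = (c->cmsg_len - CMSG_LEN(0)) / sizeof(int);

            if (n > MAX_FDS) {           /* defensive, mirrors the patch */
                return -ERANGE;
            }
            memcpy(fds, CMSG_DATA(c), n * sizeof(int));
            nfds = (int)n;
        }
    }
    return nfds;
}

/* Companion sender used to exercise recv_with_fds(). */
static int send_one_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_control control;
    struct msghdr mh;
    struct cmsghdr *c;

    memset(&mh, 0, sizeof(mh));
    memset(&control, 0, sizeof(control));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = control.buf;
    mh.msg_controllen = CMSG_SPACE(sizeof(int));

    c = CMSG_FIRSTHDR(&mh);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &mh, 0) == 1 ? 0 : -errno;
}
```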

I have posted a patch series that adds lock guards (automatic unlocking)
to prevent cases like this in the future:
https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg04628.html

You can use the QEMU_LOCK_GUARD() and/or WITH_QEMU_LOCK_GUARD() macros
to avoid the need for manual qemu_mutex_unlock() calls.
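The lock-guard idea can be illustrated with a small sketch. It assumes GCC or Clang (for __attribute__((cleanup))) and POSIX threads. SCOPED_MUTEX_LOCK, table_add() and table_lock are hypothetical names. This is not the QEMU_LOCK_GUARD()/WITH_QEMU_LOCK_GUARD() implementation from the linked series; it only demonstrates the same concept.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Scope-based lock guard built on the GCC/Clang cleanup attribute:
 * the unlock runs automatically when the guard variable goes out of
 * scope, so an early return cannot leave the mutex held.
 */
static void mutex_guard_release(pthread_mutex_t **m)
{
    pthread_mutex_unlock(*m);
}

/* One guard per scope: the guard variable name is fixed. */
#define SCOPED_MUTEX_LOCK(m)                                   \
    pthread_mutex_t *scoped_guard_                             \
        __attribute__((cleanup(mutex_guard_release))) =        \
            (pthread_mutex_lock(m), (m))

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table_count;

/* Hypothetical function with an early error return, like the -ERANGE path. */
static int table_add(int max)
{
    SCOPED_MUTEX_LOCK(&table_lock);
    if (table_count >= max) {
        return -1;                   /* guard releases the mutex here... */
    }
    table_count++;
    return 0;                        /* ...and here */
}
```

The unlock is tied to the guard variable's scope, so the `return -1` path leaves table_lock free. That is exactly the class of bug pointed out in the -ERANGE branch.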

Stefan



Re: [PATCH] modules: load modules from versioned /var/run dir

2020-03-16 Thread Stefan Hajnoczi

On Tue, Mar 10, 2020 at 12:47:49PM +0100, Christian Ehrhardt wrote:
> On Tue, Mar 10, 2020 at 10:39 AM Stefan Hajnoczi  wrote:
> > On Fri, Mar 06, 2020 at 02:26:48PM +0100, Christian Ehrhardt wrote:
> And finally this has to be considered an "offer" by qemu to the packagers
> to fix a real field issue.
> The packaging does not "have to" exploit this, every Distro is free to just
> ignore it.

I understand.  My intention is just to draw the attention of the right
people so that other distros are aware of the problem and improvements
can be discussed.

From my own perspective it seems fine to merge this or a similar patch
into qemu.git.

Stefan


[PATCH 0/2] avoid integer overflow

2020-03-16 Thread Yifei Jiang

The default type of an integer constant is "int". When such a constant is
shifted left, the result may exceed 32 bits, resulting in integer overflow.
So the constant type needs to be changed to "long".
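The following is a generic illustration of the overflow described above; it is not the actual hunks in cputlb.c or tcg-op-gvec.c. low_mask() is a hypothetical helper. The cover letter switches the constants to "long". However, long is only 32 bits on LLP64 targets such as 64-bit Windows, so this sketch uses a fixed-width uint64_t constant instead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: a mask with the low `bits` bits set.
 * `(1 << bits) - 1` would shift a 32-bit int, which is undefined
 * behaviour once the result no longer fits (bits >= 31 for a signed int).
 * Widening the constant first keeps the whole shift in 64-bit arithmetic.
 */
static inline uint64_t low_mask(unsigned bits)
{
    assert(bits < 64);               /* shifting by >= 64 is undefined too */
    return (UINT64_C(1) << bits) - 1;
}
```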

Yifei Jiang (2):
  tcg: avoid integer overflow
  accel/tcg: avoid integer overflow

 accel/tcg/cputlb.c |  6 +++---
 tcg/tcg-op-gvec.c  | 18 +-
 tcg/tcg-op-vec.c   |  4 ++--
 3 files changed, 14 insertions(+), 14 deletions(-)

-- 
2.19.1

[PULL 2/6] audio: change naming scheme of FLOAT_CONV macros

2020-03-16 Thread Gerd Hoffmann

From: Volker Rümelin 

This patch changes the naming scheme of the FLOAT_CONV_TO and
FLOAT_CONV_FROM macros to the scheme used in mixeng_template.h.

Signed-off-by: Volker Rümelin 
Message-id: 20200308193321.20668-2-vr_q...@t-online.de
Signed-off-by: Gerd Hoffmann 
---
 audio/mixeng.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/audio/mixeng.c b/audio/mixeng.c
index c14b0d874ce5..b57fad83bf3b 100644
--- a/audio/mixeng.c
+++ b/audio/mixeng.c
@@ -268,17 +268,17 @@ f_sample *mixeng_clip[2][2][2][3] = {
 };
 
 #ifdef FLOAT_MIXENG
-#define FLOAT_CONV_TO(x) (x)
-#define FLOAT_CONV_FROM(x) (x)
+#define CONV_NATURAL_FLOAT(x) (x)
+#define CLIP_NATURAL_FLOAT(x) (x)
 #else
 static const float float_scale = UINT_MAX;
-#define FLOAT_CONV_TO(x) ((x) * float_scale)
+#define CONV_NATURAL_FLOAT(x) ((x) * float_scale)
 
 #ifdef RECIPROCAL
 static const float float_scale_reciprocal = 1.f / UINT_MAX;
-#define FLOAT_CONV_FROM(x) ((x) * float_scale_reciprocal)
+#define CLIP_NATURAL_FLOAT(x) ((x) * float_scale_reciprocal)
 #else
-#define FLOAT_CONV_FROM(x) ((x) / float_scale)
+#define CLIP_NATURAL_FLOAT(x) ((x) / float_scale)
 #endif
 #endif
 
@@ -288,7 +288,7 @@ static void conv_natural_float_to_mono(struct st_sample 
*dst, const void *src,
 float *in = (float *)src;
 
 while (samples--) {
-dst->r = dst->l = FLOAT_CONV_TO(*in++);
+dst->r = dst->l = CONV_NATURAL_FLOAT(*in++);
 dst++;
 }
 }
@@ -299,8 +299,8 @@ static void conv_natural_float_to_stereo(struct st_sample 
*dst, const void *src,
 float *in = (float *)src;
 
 while (samples--) {
-dst->l = FLOAT_CONV_TO(*in++);
-dst->r = FLOAT_CONV_TO(*in++);
+dst->l = CONV_NATURAL_FLOAT(*in++);
+dst->r = CONV_NATURAL_FLOAT(*in++);
 dst++;
 }
 }
@@ -316,7 +316,7 @@ static void clip_natural_float_from_mono(void *dst, const 
struct st_sample *src,
 float *out = (float *)dst;
 
 while (samples--) {
-*out++ = FLOAT_CONV_FROM(src->l) + FLOAT_CONV_FROM(src->r);
+*out++ = CLIP_NATURAL_FLOAT(src->l) + CLIP_NATURAL_FLOAT(src->r);
 src++;
 }
 }
@@ -327,8 +327,8 @@ static void clip_natural_float_from_stereo(
 float *out = (float *)dst;
 
 while (samples--) {
-*out++ = FLOAT_CONV_FROM(src->l);
-*out++ = FLOAT_CONV_FROM(src->r);
+*out++ = CLIP_NATURAL_FLOAT(src->l);
+*out++ = CLIP_NATURAL_FLOAT(src->r);
 src++;
 }
 }
-- 
2.18.2

Re: [PATCH v2 0/6] mostly changes related to audio float samples

2020-03-16 Thread Gerd Hoffmann

On Sun, Mar 08, 2020 at 08:29:05PM +0100, Volker Rümelin wrote:
> v2:
> - "qapi/audio: add documentation for AudioFormat"
>   Markus suggested correcting a spelling mistake.
> 
> - "audio: add audiodev format=f32 option documentation"
>   New patch.

Pull request sent.

thanks,
  Gerd

[PATCH 01/11] MAINTAINERS: Fix KVM path expansion glob

2020-03-16 Thread Philippe Mathieu-Daudé

The KVM files have been moved from target-ARCH/ to the target/ARCH/
folder in commit fcf5ef2a. Fix the pathname expansion.

Fixes: fcf5ef2a ("Move target-* CPU file into a target/ folder")
Signed-off-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 32867bc636..7898e338f6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -353,7 +353,7 @@ Overall KVM CPUs
 M: Paolo Bonzini 
 L: k...@vger.kernel.org
 S: Supported
-F: */kvm.*
+F: */*/kvm*
 F: accel/kvm/
 F: accel/stubs/kvm-stub.c
 F: include/hw/kvm/
-- 
2.21.1

Re: [PATCH 0/5] QEMU Gating CI

2020-03-16 Thread Cleber Rosa




- Original Message -
> From: "Peter Maydell" 
> To: "Cleber Rosa" 
> Cc: "Alex Bennée" , "QEMU Developers" 
> , "Fam Zheng" ,
> "Eduardo Habkost" , "Beraldo Leal" , 
> "Philippe Mathieu-Daudé"
> , "Thomas Huth" , "Wainer dos Santos 
> Moschetta" , "Erik
> Skultety" , "Willian Rampazzo" , 
> "Wainer Moschetta" 
> Sent: Monday, March 16, 2020 7:57:33 AM
> Subject: Re: [PATCH 0/5] QEMU Gating CI
> 
> On Thu, 12 Mar 2020 at 22:16, Cleber Rosa  wrote:
> > The quick answer is:
> >
> >  $ git push g...@gitlab.com:qemu-project/qemu.git my-branch:staging
> 
> So I did this bit...
> 
> > Once that push happens, you could use:
> >
> >  $ contrib/ci/scripts/gitlab-pipeline-status --verbose --wait
> 
> ...but this script just says:
> 
> $ ./contrib/ci/scripts/gitlab-pipeline-status --verbose --wait
> ERROR: No pipeline found
> failure
> 

Hi Peter,

A few possible reasons come to my mind:

1) It usually takes a few seconds after the push for the pipeline to

2) If you've pushed to a repo different than gitlab.com/qemu-project/qemu,
   you'd have to tweak the project ID (-p|--project-id).

3) The local branch is not called "staging", so the script can not find the
   commit ID, in that case you can use -c|--commit.

> thanks
> -- PMM
> 
> 

Please let me know if any of these points helps.

Cheers,
- Cleber.

[PATCH v4] python/qemu/qmp.py: QMP debug with VM label

2020-03-16 Thread Oksana Vohchana

QEMUMachine writes some messages to the default logger.
But it is sometimes hard to read the output when there are requests to
more than one VM.
This patch adds a label to the logger in debug mode.

Signed-off-by: Oksana Vohchana 
---
v2:
 - Instead of showing the label in the message, provide the label
   only in the debug logger information.
v3:
 - Fixes coding style problems.
v4:
 - Use a suffix method to get a child logger from the parent.
---
 python/qemu/machine.py | 3 ++-
 python/qemu/qmp.py | 5 -
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 183d8f3d38..f53abfa492 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -270,7 +270,8 @@ class QEMUMachine(object):
 self._vm_monitor = os.path.join(self._sock_dir,
 self._name + "-monitor.sock")
 self._remove_files.append(self._vm_monitor)
-self._qmp = qmp.QEMUMonitorProtocol(self._vm_monitor, server=True)
+self._qmp = qmp.QEMUMonitorProtocol(self._vm_monitor, server=True,
+nickname=self._name)
 
 def _post_launch(self):
 if self._qmp:
diff --git a/python/qemu/qmp.py b/python/qemu/qmp.py
index f40586eedd..d6c9b2f4b1 100644
--- a/python/qemu/qmp.py
+++ b/python/qemu/qmp.py
@@ -46,7 +46,7 @@ class QEMUMonitorProtocol:
 #: Logger object for debugging messages
 logger = logging.getLogger('QMP')
 
-def __init__(self, address, server=False):
+def __init__(self, address, server=False, nickname=None):
 """
 Create a QEMUMonitorProtocol class.
 
@@ -62,6 +62,9 @@ class QEMUMonitorProtocol:
 self.__address = address
 self.__sock = self.__get_sock()
 self.__sockfile = None
+self._nickname = nickname
+if self._nickname:
+self.logger = logging.getLogger('QMP').getChild(self._nickname)
 if server:
 self.__sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
 self.__sock.bind(self.__address)
-- 
2.21.1

[PATCH v4 4/6] tap: allow extended virtio header with hash info

2020-03-16 Thread Yuri Benditovich

Signed-off-by: Yuri Benditovich 
---
 net/tap.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/tap.c b/net/tap.c
index 6207f61f84..47de7fdeb6 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -63,6 +63,14 @@ typedef struct TAPState {
 Notifier exit;
 } TAPState;
 
+/* TODO: remove when virtio_net.h updated */
+struct virtio_net_hdr_v1_hash {
+struct virtio_net_hdr_v1 hdr;
+uint32_t hash_value;
+uint16_t hash_report;
+uint16_t padding;
+};
+
 static void launch_script(const char *setup_script, const char *ifname,
   int fd, Error **errp);
 
@@ -254,7 +262,8 @@ static void tap_set_vnet_hdr_len(NetClientState *nc, int 
len)
 
 assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
 assert(len == sizeof(struct virtio_net_hdr_mrg_rxbuf) ||
-   len == sizeof(struct virtio_net_hdr));
+   len == sizeof(struct virtio_net_hdr) ||
+   len == sizeof(struct virtio_net_hdr_v1_hash));
 
 tap_fd_set_vnet_hdr_len(s->fd, len);
 s->host_vnet_hdr_len = len;
-- 
2.17.1
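The updated assert() in tap_set_vnet_hdr_len() accepts three header sizes. A sketch like the one below can check them. It assumes the usual virtio-net layouts:

- a 10-byte legacy header,
- a 12-byte header that adds num_buffers,
- the 20-byte hash variant the patch defines.

The struct names and vnet_hdr_len_is_valid() are local, hypothetical stand-ins, not the <linux/virtio_net.h> definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the header layouts (little-endian fields as plain ints). */
struct vnet_hdr {                    /* like struct virtio_net_hdr: 10 bytes */
    uint8_t flags;
    uint8_t gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

struct vnet_hdr_mrg_rxbuf {          /* adds num_buffers: 12 bytes */
    struct vnet_hdr hdr;
    uint16_t num_buffers;
};

struct vnet_hdr_v1_hash {            /* mirrors the struct added in net/tap.c */
    struct vnet_hdr_mrg_rxbuf hdr;   /* the v1 header has the same 12-byte layout */
    uint32_t hash_value;
    uint16_t hash_report;
    uint16_t padding;
};

/* Same acceptance rule as the assert() in tap_set_vnet_hdr_len(). */
static int vnet_hdr_len_is_valid(size_t len)
{
    return len == sizeof(struct vnet_hdr_mrg_rxbuf) ||
           len == sizeof(struct vnet_hdr) ||
           len == sizeof(struct vnet_hdr_v1_hash);
}
```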

Re: [PATCH v3 0/4] linux-user: generate syscall_nr.h from linux unistd.h

2020-03-16 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200316085620.309769-1-laur...@vivier.eu/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

PASS 1 fdc-test /x86_64/fdc/cmos
PASS 2 fdc-test /x86_64/fdc/no_media_on_start
PASS 3 fdc-test /x86_64/fdc/read_without_media
==6255==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 fdc-test /x86_64/fdc/media_change
PASS 5 fdc-test /x86_64/fdc/sense_interrupt
PASS 6 fdc-test /x86_64/fdc/relative_seek
---
PASS 32 test-opts-visitor /visitor/opts/range/beyond
PASS 33 test-opts-visitor /visitor/opts/dict/unvisited
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-coroutine" 
==6310==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
==6310==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 
0x7ffefb716000; bottom 0x7fc6a702; size: 0x0038546f6000 (241934753792)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-coroutine /basic/no-dangling-access
---
PASS 12 fdc-test /x86_64/fdc/read_no_dma_19
PASS 13 fdc-test /x86_64/fdc/fuzz-registers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img 
tests/qtest/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="ide-test" 
==6333==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 ide-test /x86_64/ide/identify
PASS 14 test-aio /aio/timer/schedule
==6325==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
==6339==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 15 test-aio /aio/coroutine/queue-chaining
PASS 16 test-aio /aio-gsource/flush
PASS 17 test-aio /aio-gsource/bh/schedule
---
PASS 26 test-aio /aio-gsource/event/flush
PASS 27 test-aio /aio-gsource/event/wait/no-flush-cb
PASS 2 ide-test /x86_64/ide/flush
==6345==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 3 ide-test /x86_64/ide/bmdma/simple_rw
PASS 28 test-aio /aio-gsource/timer/schedule
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-aio-multithread -m=quick -k --tap < /dev/null | 
./scripts/tap-driver.pl --test-name="test-aio-multithread" 
==6351==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 ide-test /x86_64/ide/bmdma/trim
==6357==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-aio-multithread /aio/multi/lifecycle
==6360==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 2 test-aio-multithread /aio/multi/schedule
PASS 3 test-aio-multithread /aio/multi/mutex/contended
PASS 4 test-aio-multithread /aio/multi/mutex/handoff
==6392==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 5 test-aio-multithread /aio/multi/mutex/mcs
PASS 6 test-aio-multithread /aio/multi/mutex/pthread
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-throttle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-throttle" 
==6404==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-throttle /throttle/leak_bucket
PASS 2 test-throttle /throttle/compute_wait
PASS 3 test-throttle /throttle/init
---
PASS 14 test-throttle /throttle/config/max
PASS 15 test-throttle /throttle/config/iops_size
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-thread-pool -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-thread-pool" 
==6408==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-thread-pool /thread-pool/submit
PASS 2 test-thread-pool /thread-pool/submit-aio
PASS 3 test-thread-pool /thread-pool/submit-co
PASS 4 test-thread-pool /thread-pool/submit-many
PASS 5 test-thread-pool /thread-pool/cancel
==6475==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may

[PATCH 03/11] MAINTAINERS: Add an entry for the HAX accelerator

2020-03-16 Thread Philippe Mathieu-Daudé

Signed-off-by: Philippe Mathieu-Daudé 
---
Cc: Sergio Andres Gomez Del Real 
Cc: Vincent Palatin 
Cc: Yu Ning 
Cc: Tao Wu 
Cc: haxm-t...@intel.com
Cc: Colin Xu 
Cc: Hang Yuan 
Cc: David Chou 
Cc: Wenchao Wang 
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 08d9556ab2..7ec42a18f7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -414,6 +414,12 @@ S: Maintained
 F: include/sysemu/accel.h
 F: accel/stubs/Makefile.objs
 
+HAX Accelerator
+S: Orphan
+F: accel/stubs/hax-stub.c
+F: target/i386/hax-all.c
+F: include/sysemu/hax.h
+
 WHPX CPUs
 M: Sunil Muthuswamy 
 S: Supported
-- 
2.21.1

Re: [PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-16 Thread Stefan Hajnoczi

On Wed, Mar 11, 2020 at 11:08:27PM -0700, Klaus Birkelund Jensen wrote:
> On Mar 11 15:54, Andrzej Jakowski wrote:
> > On 3/11/20 2:20 AM, Stefan Hajnoczi wrote:
> > > Please try:
> > > 
> > >   $ git grep pmem
> > > 
> > > backends/hostmem-file.c is the backend that can be used and the
> > > pmem_persist() API can be used to flush writes.
> > 
> > I've reworked this patch into hostmem-file type of backend.
> > From simple tests in virtual machine: writing to PMR region
> > and then reading from it after VM power cycle I have observed that
> > there is no persistency.

Sounds like an integration bug.  QEMU's NVDIMM emulation uses
HostMemoryBackend and file contents survive guest reboot.

If you would like help debugging this, please post a link to the code
and the command-line that you are using.

> > I guess that persistent behavior can be achieved if the memory backend file
> > resides on actual persistent memory in the VMM. I haven't found a mechanism to
> > persist the memory backend file when it resides in a file system on block
> > storage. My original mmap + msync based solution worked well there.
> > I believe the main problem with mmap was the "ifdef _WIN32" that made it
> > platform specific; without it, patchew CI complained.
> > Is there a way that I could rework mmap + msync solution so it would fit
> > into qemu design?
> > 
> 
> Hi Andrzej,
> 
> Thanks for working on this!
> 
> FWIW, I have implemented other stuff for the NVMe device that requires
> persistent storage (e.g. LBA allocation tracking for DULBE support). I
> used the approach of adding an additional blockdev and simply use the
> qemu block layer. This would also make it work on WIN32. And if we just
> set bit 0 in PMRWBM and disable the write cache on the blockdev we
> should be good on the durability requirements.
>
> Unfortunately, I do not see (or know, maybe Stefan has an idea?) an easy
> way of using the MemoryRegionOps nicely with async block backend i/o. so
> we either have to use blocking I/O or fire and forget aio. Or, we can
> maybe keep bit 1 set in PMRWBM and force a blocking blk_flush on PMRSTS
> read.

QEMU's block layer does not support persistent memory semantics and
doesn't support mmap.  It's fine for storing state from device emulation
code, but if the guest itself requires memory load/store access to the
data then the QEMU block layer does not provide that.

For PMR I think HostMemoryBackend is the best fit.

Stefan
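For comparison with the HostMemoryBackend suggestion, the sketch below shows roughly what the mmap + msync pattern mentioned earlier in the thread looks like. It is a generic POSIX example that assumes a regular file descriptor. write_via_mapping() is a hypothetical name, and this is not QEMU code. The sketch also shows why the approach is platform specific: mmap() and msync() are not available on Windows, which is the _WIN32 problem raised above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write `len` bytes into a file through a shared mapping and flush them
 * with msync(MS_SYNC). Returns 0 on success, -1 on failure.
 */
static int write_via_mapping(int fd, const void *data, size_t len)
{
    void *p;

    if (ftruncate(fd, (off_t)len) != 0) {     /* size the backing file */
        return -1;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    memcpy(p, data, len);                     /* plain load/store access */
    if (msync(p, len, MS_SYNC) != 0) {        /* write dirty pages back */
        munmap(p, len);
        return -1;
    }
    return munmap(p, len);
}
```

msync(MS_SYNC) waits until the dirty pages have been written to the file. Whether the data survives a host power loss still depends on the underlying storage and its write cache.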



Re: [PATCH 0/5] QEMU Gating CI

2020-03-16 Thread Peter Maydell

On Thu, 12 Mar 2020 at 22:16, Cleber Rosa  wrote:
> The quick answer is:
>
>  $ git push g...@gitlab.com:qemu-project/qemu.git my-branch:staging

So I did this bit...

> Once that push happens, you could use:
>
>  $ contrib/ci/scripts/gitlab-pipeline-status --verbose --wait

...but this script just says:

$ ./contrib/ci/scripts/gitlab-pipeline-status --verbose --wait
ERROR: No pipeline found
failure

thanks
-- PMM

[Bug 1866870] Re: KVM Guest pauses after upgrade to Ubuntu 20.04

2020-03-16 Thread Boris Derzhavets

Verification new packages to be installed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1866870

Title:
  KVM Guest pauses after upgrade to Ubuntu 20.04

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  Symptom:
  Error unpausing domain: internal error: unable to execute QEMU command 
'cont': Resetting the Virtual Machine is required

  Traceback (most recent call last):
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in 
cb_wrapper
  callback(asyncjob, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
  callback(*args, **kwargs)
File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 
66, in newfn
  ret = fn(self, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/object/domain.py", line 1311, in 
resume
  self._backend.resume()
File "/usr/lib/python3/dist-packages/libvirt.py", line 2174, in resume
  if ret == -1: raise libvirtError ('virDomainResume() failed', dom=self)
  libvirt.libvirtError: internal error: unable to execute QEMU command 'cont': 
Resetting the Virtual Machine is required

  
  ---

  As outlined here:
  https://bugs.launchpad.net/qemu/+bug/1813165/comments/15

  After upgrade, all KVM guests are in a default pause state. Even after
  forcing them off via virsh, and restarting them the guests are paused.

  These Guests are not nested.

  A lot of diagnostic information is outlined in the previous bug
  report linked above. The solution mentioned in the previous report has
  allegedly been integrated into the downstream updates.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1866870/+subscriptions

Re: [PATCH 04/11] MAINTAINERS: Add an entry for the HVF accelerator

2020-03-16 Thread Philippe Mathieu-Daudé


Hi Roman,

On 3/16/20 1:12 PM, Roman Bolshakov wrote:

> Hi Philippe,
>
> I can take the ownership if nobody wants it. At the moment I'm working
> on APIC for HVF to get kvm-unit-tests fixed.
>
> Next items on the list (in no particular order):
> * MMX emulation
> * SSE emulation
> * qxl display
> * gdb stub
> * virtio-gpu/virgil running on metal
> * VFIO-PCI based on macOS user-space DriverKit framework


Glad to hear :)
I suppose Paolo will be happy to have someone caring about HVF.
Do you mind sending a patch to step in?

Thanks,

Phil.



> Best regards,
> Roman
>
> On Mon, Mar 16, 2020 at 01:00:42PM +0100, Philippe Mathieu-Daudé wrote:
> > Signed-off-by: Philippe Mathieu-Daudé
> > ---
> > Cc: Reviewed-by: Nikita Leshenko
> > Cc: Sergio Andres Gomez Del Real
> > Cc: Roman Bolshakov
> > Cc: Patrick Colp
> > Cc: Cameron Esfahani
> > Cc: Liran Alon
> > Cc: Heiher
> > ---
> >  MAINTAINERS | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 7ec42a18f7..bcf40afb85 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -420,6 +420,12 @@ F: accel/stubs/hax-stub.c
> >  F: target/i386/hax-all.c
> >  F: include/sysemu/hax.h
> >
> > +HVF Accelerator
> > +S: Orphan
> > +F: accel/stubs/hvf-stub.c
> > +F: target/i386/hvf/hvf.c
> > +F: include/sysemu/hvf.h
> > +
> >  WHPX CPUs
> >  M: Sunil Muthuswamy
> >  S: Supported
> > --
> > 2.21.1

[PULL 6/6] audio: add audiodev format=f32 option documentation

2020-03-16 Thread Gerd Hoffmann

From: Volker Rümelin 

The documentation for the -audiodev format=f32 option was missing.

Signed-off-by: Volker Rümelin 
Message-id: 20200308193321.20668-6-vr_q...@t-online.de
Signed-off-by: Gerd Hoffmann 
---
 qemu-options.hx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 1d8f852d8969..962a5ebaa67a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -551,7 +551,7 @@ DEF("audiodev", HAS_ARG, QEMU_OPTION_audiodev,
 "in|out.frequency= frequency to use with fixed settings\n"
 "in|out.channels= number of channels to use with fixed 
settings\n"
 "in|out.format= sample format to use with fixed settings\n"
-"valid values: s8, s16, s32, u8, u16, u32\n"
+"valid values: s8, s16, s32, u8, u16, u32, f32\n"
 "in|out.voices= number of voices to use\n"
 "in|out.buffer-length= length of buffer in microseconds\n"
 "-audiodev none,id=id,[,prop[=value][,...]]\n"
@@ -647,7 +647,7 @@ SRST
 ``in|out.format=format``
 Specify the sample format to use when using fixed-settings.
 Valid values are: ``s8``, ``s16``, ``s32``, ``u8``, ``u16``,
-``u32``. Default is ``s16``.
+``u32``, ``f32``. Default is ``s16``.
 
 ``in|out.voices=voices``
 Specify the number of voices to use. Default is 1.
-- 
2.18.2

[PULL 5/6] audio: fix saturation nonlinearity in clip_* functions

2020-03-16 Thread Gerd Hoffmann

From: Volker Rümelin 

The current positive limit for the saturation nonlinearity is
only correct if the type of the result has 8 bits or less.

Signed-off-by: Volker Rümelin 
Message-id: 20200308193321.20668-5-vr_q...@t-online.de
Signed-off-by: Gerd Hoffmann 
---
 audio/mixeng_template.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/audio/mixeng_template.h b/audio/mixeng_template.h
index fc8e1d4d9ebf..bc8509e423f6 100644
--- a/audio/mixeng_template.h
+++ b/audio/mixeng_template.h
@@ -83,10 +83,9 @@ static inline int64_t glue (conv_, ET) (IN_T v)
 
 static inline IN_T glue (clip_, ET) (int64_t v)
 {
-if (v >= 0x7f000000) {
+if (v >= 0x7fffffffLL) {
 return IN_MAX;
-}
-else if (v < -2147483648LL) {
+} else if (v < -2147483648LL) {
 return IN_MIN;
 }
 
-- 
2.18.2
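The saturation issue can be shown with a standalone helper. clip_to_s32() is hypothetical and is not the mixeng_template.h code; it only clamps a 64-bit intermediate to the int32_t range. The point is the upper bound. A threshold below INT32_MAX (for example 0x7f000000) clamps the values between it and 0x7fffffff too early. Assume, for illustration, that the clipped value is later shifted right by 24 to produce an 8-bit sample. Then that whole range maps to 0x7f anyway, so the error is invisible. With 16- or 32-bit output, the error is visible.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a 64-bit intermediate to [INT32_MIN, INT32_MAX]. */
static inline int32_t clip_to_s32(int64_t v)
{
    if (v >= INT32_MAX) {            /* the upper bound must be 0x7fffffff */
        return INT32_MAX;
    } else if (v < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)v;               /* in range: pass through unchanged */
}
```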

[PATCH v2 2/2] lockable: add QemuRecMutex support

2020-03-16 Thread Stefan Hajnoczi

The polymorphic locking macros don't support QemuRecMutex yet.  Add it
so that lock guards can be used with QemuRecMutex.

Convert TCG plugin functions that benefit from these macros.  Manual
qemu_rec_mutex_lock/unlock() callers are left unmodified in cases where
clarity would not improve by switching to the macros.

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/lockable.h |  2 ++
 plugins/core.c  |  7 +++
 plugins/loader.c| 16 
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h
index 2b52c7c1e5..44b3f4be72 100644
--- a/include/qemu/lockable.h
+++ b/include/qemu/lockable.h
@@ -50,6 +50,7 @@ qemu_make_lockable(void *x, QemuLockable *lockable)
 #define QEMU_LOCK_FUNC(x) ((QemuLockUnlockFunc *)\
 QEMU_GENERIC(x,  \
  (QemuMutex *, qemu_mutex_lock), \
+ (QemuRecMutex *, qemu_rec_mutex_lock), \
  (CoMutex *, qemu_co_mutex_lock),\
  (QemuSpin *, qemu_spin_lock),   \
  unknown_lock_type))
@@ -57,6 +58,7 @@ qemu_make_lockable(void *x, QemuLockable *lockable)
 #define QEMU_UNLOCK_FUNC(x) ((QemuLockUnlockFunc *)  \
 QEMU_GENERIC(x,  \
  (QemuMutex *, qemu_mutex_unlock),   \
+ (QemuRecMutex *, qemu_rec_mutex_unlock), \
  (CoMutex *, qemu_co_mutex_unlock),  \
  (QemuSpin *, qemu_spin_unlock), \
  unknown_lock_type))
diff --git a/plugins/core.c b/plugins/core.c
index ed863011ba..51bfc94787 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -15,6 +15,7 @@
 #include "qemu/error-report.h"
 #include "qemu/config-file.h"
 #include "qapi/error.h"
+#include "qemu/lockable.h"
 #include "qemu/option.h"
 #include "qemu/rcu_queue.h"
 #include "qemu/xxhash.h"
@@ -150,11 +151,11 @@ do_plugin_register_cb(qemu_plugin_id_t id, enum 
qemu_plugin_event ev,
 {
 struct qemu_plugin_ctx *ctx;
 
-qemu_rec_mutex_lock();
+QEMU_LOCK_GUARD();
 ctx = plugin_id_to_ctx_locked(id);
 /* if the plugin is on its way out, ignore this request */
 if (unlikely(ctx->uninstalling)) {
-goto out_unlock;
+return;
 }
 if (func) {
 struct qemu_plugin_cb *cb = ctx->callbacks[ev];
@@ -178,8 +179,6 @@ do_plugin_register_cb(qemu_plugin_id_t id, enum 
qemu_plugin_event ev,
 } else {
 plugin_unregister_cb__locked(ctx, ev);
 }
- out_unlock:
-qemu_rec_mutex_unlock();
 }
 
 void plugin_register_cb(qemu_plugin_id_t id, enum qemu_plugin_event ev,
diff --git a/plugins/loader.c b/plugins/loader.c
index 15fc7e5515..685d334e1a 100644
--- a/plugins/loader.c
+++ b/plugins/loader.c
@@ -19,6 +19,7 @@
 #include "qemu/error-report.h"
 #include "qemu/config-file.h"
 #include "qapi/error.h"
+#include "qemu/lockable.h"
 #include "qemu/option.h"
 #include "qemu/rcu_queue.h"
 #include "qemu/qht.h"
@@ -367,15 +368,14 @@ void plugin_reset_uninstall(qemu_plugin_id_t id,
 struct qemu_plugin_reset_data *data;
 struct qemu_plugin_ctx *ctx;
 
-qemu_rec_mutex_lock();
-ctx = plugin_id_to_ctx_locked(id);
-if (ctx->uninstalling || (reset && ctx->resetting)) {
-qemu_rec_mutex_unlock();
-return;
+WITH_QEMU_LOCK_GUARD() {
+ctx = plugin_id_to_ctx_locked(id);
+if (ctx->uninstalling || (reset && ctx->resetting)) {
+return;
+}
+ctx->resetting = reset;
+ctx->uninstalling = !reset;
 }
-ctx->resetting = reset;
-ctx->uninstalling = !reset;
-qemu_rec_mutex_unlock();
 
 data = g_new(struct qemu_plugin_reset_data, 1);
 data->ctx = ctx;
-- 
2.24.1
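The QEMU_LOCK_FUNC()/QEMU_UNLOCK_FUNC() macros select a function based on the static type of the lock pointer, and this patch adds a QemuRecMutex * entry to that table. The same dispatch idea can be sketched with standard C11 _Generic. SimpleLock, RecLock and LOCK() are hypothetical, and QEMU_GENERIC() itself may be implemented differently.

```c
#include <assert.h>

/* Two hypothetical lock types, each with its own lock function. */
typedef struct { int held; } SimpleLock;
typedef struct { int depth; } RecLock;

static void simple_lock(SimpleLock *l) { l->held = 1; }
static void rec_lock(RecLock *l) { l->depth++; }   /* recursive: count depth */

/*
 * Choose the lock function from the pointer's static type. A pointer type
 * with no association fails to compile here; the QEMU macros in the diff
 * instead fall back to unknown_lock_type.
 */
#define LOCK(x) _Generic((x),          \
        SimpleLock *: simple_lock,      \
        RecLock *: rec_lock)(x)
```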

[PATCH 06/11] accel/Kconfig: Extract accel selectors into their own config

2020-03-16 Thread Philippe Mathieu-Daudé

Move the accel selectors from the global Kconfig.host to their
own Kconfig file.

Signed-off-by: Philippe Mathieu-Daudé 
---
 Makefile  | 1 +
 Kconfig.host  | 7 ---
 accel/Kconfig | 6 ++
 3 files changed, 7 insertions(+), 7 deletions(-)
 create mode 100644 accel/Kconfig

diff --git a/Makefile b/Makefile
index d83a94bc53..d1e2ec10e7 100644
--- a/Makefile
+++ b/Makefile
@@ -419,6 +419,7 @@ MINIKCONF_ARGS = \
 CONFIG_PVRDMA=$(CONFIG_PVRDMA)
 
 MINIKCONF_INPUTS = $(SRC_PATH)/Kconfig.host \
+   $(SRC_PATH)/accel/Kconfig \
$(SRC_PATH)/hw/Kconfig
 MINIKCONF_DEPS = $(MINIKCONF_INPUTS) \
  $(wildcard $(SRC_PATH)/hw/*/Kconfig)
diff --git a/Kconfig.host b/Kconfig.host
index 55136e037d..a6d871c399 100644
--- a/Kconfig.host
+++ b/Kconfig.host
@@ -2,9 +2,6 @@
 # down to Kconfig.  See also MINIKCONF_ARGS in the Makefile:
 # these two need to be kept in sync.
 
-config KVM
-bool
-
 config LINUX
 bool
 
@@ -31,10 +28,6 @@ config VHOST_KERNEL
 bool
 select VHOST
 
-config XEN
-bool
-select FSDEV_9P if VIRTFS
-
 config VIRTFS
 bool
 
diff --git a/accel/Kconfig b/accel/Kconfig
new file mode 100644
index 00..c21802bb49
--- /dev/null
+++ b/accel/Kconfig
@@ -0,0 +1,6 @@
+config KVM
+bool
+
+config XEN
+bool
+select FSDEV_9P if VIRTFS
-- 
2.21.1

[PATCH 05/11] Makefile: Write MINIKCONF variables as one entry per line

2020-03-16 Thread Philippe Mathieu-Daudé

Having one entry per line helps reviews/refactors. As we are
going to modify the MINIKCONF variables, split them now to
ease further review.

Signed-off-by: Philippe Mathieu-Daudé 
---
 Makefile | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 7df22fcc5d..d83a94bc53 100644
--- a/Makefile
+++ b/Makefile
@@ -418,8 +418,10 @@ MINIKCONF_ARGS = \
 CONFIG_LINUX=$(CONFIG_LINUX) \
 CONFIG_PVRDMA=$(CONFIG_PVRDMA)
 
-MINIKCONF_INPUTS = $(SRC_PATH)/Kconfig.host $(SRC_PATH)/hw/Kconfig
-MINIKCONF_DEPS = $(MINIKCONF_INPUTS) $(wildcard $(SRC_PATH)/hw/*/Kconfig)
+MINIKCONF_INPUTS = $(SRC_PATH)/Kconfig.host \
+   $(SRC_PATH)/hw/Kconfig
+MINIKCONF_DEPS = $(MINIKCONF_INPUTS) \
+ $(wildcard $(SRC_PATH)/hw/*/Kconfig)
 MINIKCONF = $(PYTHON) $(SRC_PATH)/scripts/minikconf.py \
 
 $(SUBDIR_DEVICES_MAK): %/config-devices.mak: default-configs/%.mak 
$(MINIKCONF_DEPS) $(BUILD_DIR)/config-host.mak
-- 
2.21.1

[PATCH 00/11] accel: Allow targets to use Kconfig, disable semihosting by default

2020-03-16 Thread Philippe Mathieu-Daudé

This series includes generic patches I took out of the KVM/ARM-specific
series, which will follow.

- List orphan accelerators in MAINTAINERS
- Add accel/Kconfig
- Allow targets to use their own Kconfig
- Enforce semihosting on ARM/LM32/MIPS, disable it elsewhere

Previous RFC for semihosting posted earlier:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg631218.html

Philippe Mathieu-Daudé (11):
  MAINTAINERS: Fix KVM path expansion glob
  MAINTAINERS: Add an 'overall' entry for accelerators
  MAINTAINERS: Add an entry for the HAX accelerator
  MAINTAINERS: Add an entry for the HVF accelerator
  Makefile: Write MINIKCONF variables as one entry per line
  accel/Kconfig: Extract accel selectors into their own config
  accel/Kconfig: Add the TCG selector
  target/Kconfig: Allow targets to use Kconfig
  target/mips: Always enable CONFIG_SEMIHOSTING
  target/arm: Always enable CONFIG_SEMIHOSTING
  hw/semihosting: Make the feature depend of TCG, and allow to disable
it

 Makefile                                      | 10 ++--
 default-configs/aarch64-linux-user-common.mak |  4
 default-configs/aarch64-linux-user.mak        |  2 ++
 default-configs/aarch64_be-linux-user.mak     |  2 ++
 default-configs/arm-linux-user-common.mak     |  4
 default-configs/arm-linux-user.mak            |  2 ++
 default-configs/arm-softmmu.mak               |  4 +++-
 default-configs/armeb-linux-user.mak          |  2 ++
 default-configs/mips-linux-user-common.mak    |  4
 default-configs/mips-linux-user.mak           |  2 ++
 default-configs/mips64-linux-user.mak         |  2 ++
 default-configs/mips64el-linux-user.mak       |  2 ++
 default-configs/mipsel-linux-user.mak         |  2 ++
 default-configs/mipsn32-linux-user.mak        |  2 ++
 default-configs/mipsn32el-linux-user.mak      |  2 ++
 Kconfig.host                                  |  7 --
 MAINTAINERS                                   | 23 ++-
 accel/Kconfig                                 |  9
 hw/semihosting/Kconfig                        |  4 +++-
 target/Kconfig                                |  1 +
 20 files changed, 78 insertions(+), 12 deletions(-)
 create mode 100644 default-configs/aarch64-linux-user-common.mak
 create mode 100644 default-configs/arm-linux-user-common.mak
 create mode 100644 default-configs/mips-linux-user-common.mak
 create mode 100644 accel/Kconfig
 create mode 100644 target/Kconfig

-- 
2.21.1

Re: [PATCH 0/5] QEMU Gating CI

2020-03-16 Thread Peter Maydell

On Mon, 16 Mar 2020 at 12:04, Cleber Rosa  wrote:
> A few possible reasons come to my mind:
>
> 1) It usually takes a few seconds after the push for the pipeline to
>
> 2) If you've pushed to a repo different than gitlab.com/qemu-project/qemu,
>    you'd have to tweak the project ID (-p|--project-id).
>
> 3) The local branch is not called "staging", so the script can not find the
>    commit ID, in that case you can use -c|--commit.

Yes, the local branch is something else for the purposes of
testing this series. But using --commit doesn't work either:

$ ./contrib/ci/scripts/gitlab-pipeline-status --verbose \
    --commit 81beaaab0851fe8c4db971 --wait
ERROR: No pipeline found
failure

On the web UI:
https://gitlab.com/qemu-project/qemu/pipelines
the pipelines are marked "stuck" (I don't know why there
are two of them for the same commit); drilling down,
the build part has completed but all the test parts are
pending with "This job is stuck because you don't have
any active runners online with any of these tags assigned
to them" type messages.
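The "stuck" state follows from how GitLab matches jobs to runners: per GitLab's CI documentation for the `tags` keyword, a runner only picks up a job if the runner carries every tag the job lists. A minimal C sketch of that subset check follows; the function names and the example tags are made up for illustration and are not QEMU's actual runner tags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if tag appears among the n entries of tags. */
static bool has_tag(const char *const *tags, size_t n, const char *tag)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tags[i], tag) == 0) {
            return true;
        }
    }
    return false;
}

/* A runner can take a job only if the job's tags are a subset of the
 * runner's tags; otherwise the job stays pending ("stuck"). */
static bool runner_can_run(const char *const *job_tags, size_t n_job,
                           const char *const *runner_tags, size_t n_runner)
{
    for (size_t i = 0; i < n_job; i++) {
        if (!has_tag(runner_tags, n_runner, job_tags[i])) {
            return false;
        }
    }
    return true;
}
```

With no online runner satisfying `runner_can_run()` for a test job, that job never starts, while the untagged build jobs can still run on shared runners.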

thanks
-- PMM

[PATCH 07/11] accel/Kconfig: Add the TCG selector

2020-03-16 Thread Philippe Mathieu-Daudé

Expose the CONFIG_TCG selector to let minikconf.py use it.

When building with --disable-tcg, this helps deselect
devices that are TCG-dependent.
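The Makefile hunk passes `CONFIG_TCG=$(CONFIG_TCG)` on the minikconf command line, so a --disable-tcg build presumably passes an empty value. The C sketch below shows one way such an argument could be interpreted, assuming only `y` enables a symbol (an empty value means disabled), and how a device that depends on TCG would then be deselected. `parse_assignment()` and `device_enabled()` are hypothetical helpers; minikconf.py itself is Python and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Parse "NAME=value" for the given symbol name into *enabled.
 * Returns false if arg does not assign that symbol.
 * Assumption: only "y" enables it, so "CONFIG_TCG=" means disabled. */
static bool parse_assignment(const char *arg, const char *name, bool *enabled)
{
    size_t len = strlen(name);

    if (strncmp(arg, name, len) != 0 || arg[len] != '=') {
        return false;
    }
    *enabled = strcmp(arg + len + 1, "y") == 0;
    return true;
}

/* A device declared as depending on TCG is deselected when TCG is off,
 * even if a default config requested it. */
static bool device_enabled(bool requested, bool tcg)
{
    return requested && tcg;
}
```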

Signed-off-by: Philippe Mathieu-Daudé 
---
 Makefile  | 1 +
 accel/Kconfig | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/Makefile b/Makefile
index d1e2ec10e7..1cf9d76ce7 100644
--- a/Makefile
+++ b/Makefile
@@ -405,6 +405,7 @@ endif
 MINIKCONF_ARGS = \
 $(CONFIG_MINIKCONF_MODE) \
 $@ $*/config-devices.mak.d $< $(MINIKCONF_INPUTS) \
+CONFIG_TCG=$(CONFIG_TCG) \
 CONFIG_KVM=$(CONFIG_KVM) \
 CONFIG_SPICE=$(CONFIG_SPICE) \
 CONFIG_IVSHMEM=$(CONFIG_IVSHMEM) \
diff --git a/accel/Kconfig b/accel/Kconfig
index c21802bb49..2ad94a3839 100644
--- a/accel/Kconfig
+++ b/accel/Kconfig
@@ -1,3 +1,6 @@
+config TCG
+bool
+
 config KVM
 bool
 
-- 
2.21.1

[PATCH v2] hw/scsi/vmw_pvscsi: Remove assertion for kick after reset

2020-03-16 Thread Liran Alon

From: Elazar Leibovich 

When running an Ubuntu guest with kernel 3.13.0-65-generic, QEMU sometimes crashes
during guest ACPI reset. It crashes on assert(s->rings_info_valid)
in pvscsi_process_io().

Analyzing the crash revealed that it happens when userspace issues
a sync during a reboot syscall.

Below are backtraces we gathered from the guests.

Guest backtrace when issuing PVSCSI_CMD_ADAPTER_RESET:
pci_device_shutdown
device_shutdown
init_pid_ns
init_pid_ns
kernel_power_off
SYSC_reboot

Guest backtrace when issuing PVSCSI_REG_OFFSET_KICK_RW_IO:
scsi_done
scsi_dispatch_cmd
blk_add_timer
scsi_request_fn
elv_rb_add
__blk_run_queue
queue_unplugged
blk_flush_plug_list
blk_finish_plug
ext4_writepages
set_next_entity
do_writepages
__filemap_fdatawrite_range
filemap_write_and_wait_range
ext4_sync_file
ext4_sync_file
do_fsync
sys_fsync

Since QEMU's pvscsi emulation should imitate the VMware pvscsi device,
we decided to imitate VMware's behavior in this case.

To check VMware behavior, we wrote a kernel module that issues
a reset to the pvscsi device and then issues a kick. We ran it on
VMware ESXi 6.5 and it seems that it simply ignores the kick.
Hence, we decided to ignore the kick as well.
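The intended behaviour can be illustrated with a self-contained C model: an adapter reset invalidates the rings, and a later kick is ignored with a warning instead of tripping an assertion. The struct and function names below are hypothetical stand-ins for `PVSCSIState` and `pvscsi_process_io()`, and the model prints to stderr where the real patch calls `qemu_log()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified device state. */
struct fake_pvscsi {
    bool rings_info_valid;   /* set once the guest configures the rings */
    unsigned pending;        /* requests waiting in the (fake) ring */
    unsigned processed;      /* requests handled so far */
};

/* Adapter reset tears down the ring configuration. */
static void fake_reset(struct fake_pvscsi *s)
{
    s->rings_info_valid = false;
}

/* Handle a kick: drain the ring, or ignore the kick when no valid
 * rings are configured (the early return replaces the old assert). */
static void fake_process_io(struct fake_pvscsi *s)
{
    if (!s->rings_info_valid) {
        fprintf(stderr, "WARNING: PVSCSI: Cannot process I/O when "
                        "rings are not valid.\n");
        return;
    }
    while (s->pending > 0) {
        s->pending--;
        s->processed++;
    }
}
```

A kick that races with the reset, as in the fsync-during-reboot backtraces above, then leaves the device state untouched rather than aborting QEMU.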

Signed-off-by: Elazar Leibovich 
Signed-off-by: Liran Alon 
---
 hw/scsi/vmw_pvscsi.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
index c91352cf46de..2491f204ddd7 100644
--- a/hw/scsi/vmw_pvscsi.c
+++ b/hw/scsi/vmw_pvscsi.c
@@ -29,6 +29,7 @@
 #include "qapi/error.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
+#include "qemu/log.h"
 #include "hw/scsi/scsi.h"
 #include "migration/vmstate.h"
 #include "scsi/constants.h"
@@ -719,7 +720,12 @@ pvscsi_process_io(PVSCSIState *s)
 PVSCSIRingReqDesc descr;
 hwaddr next_descr_pa;
 
-assert(s->rings_info_valid);
+if (!s->rings_info_valid) {
+qemu_log("WARNING: PVSCSI: Cannot process I/O when "
+ "rings are not valid.\n");
+return;
+}
+
 while ((next_descr_pa = pvscsi_ring_pop_req_descr(&s->rings)) != 0) {
 
 /* Only read after production index verification */
-- 
2.20.1
