Re: [Qemu-devel] [PATCH v3 09/18] sdcard: handles more commands in SPI mode

2018-01-31 Thread Philippe Mathieu-Daudé
On 01/31/2018 09:11 PM, Alistair Francis wrote:
> On Mon, Jan 22, 2018 at 7:30 PM, Philippe Mathieu-Daudé  
> wrote:
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/sd/sd.c | 57 ++---
>>  1 file changed, 54 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
>> index 2eca999bc3..07424aa56e 100644
>> --- a/hw/sd/sd.c
>> +++ b/hw/sd/sd.c
>> @@ -1390,9 +1390,11 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
>> SDRequest req)
>>
>>  /* Application specific commands (Class 8) */
>>  case 55:   /* CMD55:  APP_CMD */
>> -if (sd->rca != rca)
>> -return sd_r0;
>> -
>> +if (!sd->spi) {
>> +if (sd->rca != rca) {
>> +return sd_r0;
>> +}
>> +}
>>  sd->expecting_acmd = true;
>>  sd->card_status |= APP_CMD;
>>  return sd_r1;
>> @@ -1412,6 +1414,18 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
>> SDRequest req)
>>  }
>>  break;
>>
>> +case 58:/* CMD58:   READ_OCR (SPI) */
>> +if (!sd->spi) {
>> +goto bad_cmd;
>> +}
>> +return sd_r3;
>> +
>> +case 59:/* CMD59:   CRC_ON_OFF (SPI) */
>> +if (!sd->spi) {
>> +goto bad_cmd;
>> +}
>> +goto unimplemented_cmd;
>> +
>>  default:
>>  bad_cmd:
>>  qemu_log_mask(LOG_GUEST_ERROR, "SD: Unknown CMD%i\n", req.cmd);
>> @@ -1436,6 +1450,9 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
>>  sd->card_status |= APP_CMD;
>>  switch (req.cmd) {
>>  case 6:/* ACMD6:  SET_BUS_WIDTH */
>> +if (sd->spi) {
>> +goto unimplemented_cmd;
>> +}
>>  switch (sd->state) {
>>  case sd_transfer_state:
>>  sd->sd_status[0] &= 0x3f;
>> @@ -1460,6 +1477,12 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
>>  }
>>  break;
>>
>> +case 18:
> 
> These should all have a comment describing what they are and for the
> others as well.

This is backported from a patch of you or Peter Crosthwaite =)

I'll check the specs and add documentation.

> 
>> +if (sd->spi) {
>> +goto unimplemented_cmd;
>> +}
>> +break;
>> +
>>  case 22:   /* ACMD22: SEND_NUM_WR_BLOCKS */
>>  switch (sd->state) {
>>  case sd_transfer_state:
>> @@ -1485,6 +1508,19 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
>>  }
>>  break;
>>
>> +case 25:
>> +case 26:
>> +if (sd->spi) {
>> +goto unimplemented_cmd;
>> +}
>> +break;
>> +
>> +case 38:
>> +if (sd->spi) {
>> +goto unimplemented_cmd;
>> +}
>> +break;
>> +
>>  case 41:   /* ACMD41: SD_APP_OP_COND */
>>  if (sd->spi) {
>>  /* SEND_OP_CMD */
>> @@ -1542,6 +1578,12 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
>>  }
>>  break;
>>
>> +case 43 ... 49:
>> +if (sd->spi) {
>> +goto unimplemented_cmd;
>> +}
>> +break;
>> +
>>  case 51:   /* ACMD51: SEND_SCR */
>>  switch (sd->state) {
>>  case sd_transfer_state:
>> @@ -1555,9 +1597,18 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
>>  }
>>  break;
>>
>> +case 55:/* Not exist */
>> +break;
>> +
>>  default:
>>  /* Fall back to standard commands.  */
>>  return sd_normal_command(sd, req);
>> +
>> +unimplemented_cmd:
>> +/* Commands that are recognised but not yet implemented in SPI 
>> mode.  */
> 
> This should be unimplemented_spi_cmd then, to be more clear.

Good idea.

> 
> Alistair
> 
>> +qemu_log_mask(LOG_UNIMP, "SD: CMD%i not implemented in SPI mode\n",
>> +  req.cmd);
>> +return sd_illegal;
>>  }
>>
>>  qemu_log_mask(LOG_GUEST_ERROR, "SD: ACMD%i in a wrong state\n", 
>> req.cmd);
>> --
>> 2.15.1
>>
>>



Re: [Qemu-devel] [PATCH v2 04/19] Include qapi/error.h exactly where needed

2018-01-31 Thread Markus Armbruster
Cc: Gerd for an idea; please skip to bottom of the message.

Eric Blake  writes:

> On 01/31/2018 08:48 AM, Markus Armbruster wrote:
>> This cleanup makes the number of objects depending on qapi/error.h
>> drop from 1910 (out of 4743) to 1612 in my "build everything" tree.
>> 
>> While there, separate #include from file comment with a blank line,
>> and drop a useless comment on why qemu/osdep.h is included first.
>> 
>> Signed-off-by: Markus Armbruster 
>> Reviewed-by: Eric Blake 
>> Reviewed-by: Philippe Mathieu-Daudé 
>> ---
>
>> +++ b/block/write-threshold.c
>> @@ -16,6 +16,7 @@
>>  #include "block/write-threshold.h"
>>  #include "qemu/notify.h"
>>  #include "qapi-event.h"
>> +#include "qapi/error.h"
>>  #include "qmp-commands.h"
>>  
>>  
>
> Worth squashing a double blank line while here?

Yes.  Same for the other too.

>> +++ b/hw/pci-bridge/i82801b11.c
>> @@ -44,7 +44,6 @@
>>  #include "qemu/osdep.h"
>>  #include "hw/pci/pci.h"
>>  #include "hw/i386/ich9.h"
>> -#include "qapi/error.h"
>>  
>>  
>>  
>> /*/
>
> And here
>
>> +++ b/include/crypto/random.h
>> @@ -22,7 +22,6 @@
>>  #define QCRYPTO_RANDOM_H
>>  
>>  #include "qemu-common.h"
>> -#include "qapi/error.h"
>>  
>>  
>>  /**
>> diff --git a/include/crypto/xts.h b/include/crypto/xts.h
>> index da32ab82b6..719971afb7 100644
>> --- a/include/crypto/xts.h
>> +++ b/include/crypto/xts.h
>> @@ -27,7 +27,6 @@
>>  #define QCRYPTO_XTS_H
>>  
>>  #include "qemu-common.h"
>> -#include "qapi/error.h"
>>  
>>  
>>  #define XTS_BLOCK_SIZE 16
>
> And here
>
>> +++ b/include/ui/console.h
>> @@ -6,7 +6,9 @@
>>  #include "qapi/qmp/qdict.h"
>>  #include "qemu/notify.h"
>>  #include "qemu/error-report.h"
>> +#ifndef CONFIG_VNC
>>  #include "qapi/error.h"
>> +#endif
>
> Interesting conditional; it's because of the static inline fallbacks
> that are also conditional.  Makes sense.  An alternative would be having
> the fallbacks be in a .c rather than static inline in the .h, but it
> doesn't bother me enough to worry about it.

In my opinion, they should really, really be made a stub so we don't
have to include qapi/error.h in any configuration, but right now this
simple ifdeffery is all I can do.

> My R-b stands whether or not you make more whitespace tweaks.

Thanks!



Re: [Qemu-devel] [PATCH v2 02/19] Clean up includes

2018-01-31 Thread Markus Armbruster
Fam Zheng  writes:

> On Wed, Jan 31, 2018 at 11:48 PM, Thomas Huth  wrote:
>> On 31.01.2018 15:48, Markus Armbruster wrote:
>>> Clean up includes so that osdep.h is included first and headers
>>> which it implies are not included manually.
>>>
>>> This commit was created with scripts/clean-includes, with the change
>>> to target/s390x/gen-features.c manually reverted, and blank lines
>>> around deletions collapsed.
>>
>> Reviewed-by: Thomas Huth 
>>
>> I wonder whether it would make sense to add a check based on
>> scripts/clean-includes to patchew already, so that these includes do not
>> sneak in so easily again...?
>
> (Having not looked at the series), are we clean with this series
> applied? If so it makes a good point to do so.

It isn't, but I guess it could be made clean with a bit of work both on
sources and the script.



Re: [Qemu-devel] [PATCH v2 02/19] Clean up includes

2018-01-31 Thread Markus Armbruster
Eric Blake  writes:

> On 01/31/2018 08:48 AM, Markus Armbruster wrote:
>> Clean up includes so that osdep.h is included first and headers
>> which it implies are not included manually.
>> 
>> This commit was created with scripts/clean-includes, with the change
>> to target/s390x/gen-features.c manually reverted, and blank lines
>> around deletions collapsed.
>
> Is it worth tweaking scripts/clean-includes to add gen-features.c to the
> whitelist of special cases (alongside *.inc.c, perhaps)?

Probably, but I'm in a bit of a time squeeze right now.

A naming convention for .c files that only compile when included into
something else would make sense.  Hmm, like *gasp* .h?

Offenders include crypto/cipher-*.c.  "git-grep '#include.*\.c'" coughs
up more candidates.

>> Signed-off-by: Markus Armbruster 
>> ---
>
> Reviewed-by: Eric Blake 

Thanks!



Re: [Qemu-devel] [PATCH v2 01/19] Use #include "..." for our own headers, <...> for others

2018-01-31 Thread Markus Armbruster
Eric Blake  writes:

> On 01/31/2018 08:48 AM, Markus Armbruster wrote:
>> System headers should be included with <...>, our own headers with
>> "...".  Offenders tracked down with an ugly, brittle and probably
>> buggy Perl script.  Previous iteration was commit a9c94277f0.
>> 
>> Put the cleaned up system header includes first, except for the ones
>> the next commit will delete.
>> 
>> While there, separate #include from file comment with a blank line,
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>
>> +++ b/target/i386/hvf/x86_mmu.c
>> @@ -15,18 +15,17 @@
>>   * You should have received a copy of the GNU Lesser General Public
>>   * License along with this program; if not, see 
>> .
>>   */
>> +
>>  #include "qemu/osdep.h"
>> +#include 
>
>  is an obsolete spelling for the now-universal .  We
> should NEVER need to include it; scripts/clean-includes should be taught
> to blacklist this one.

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the Universe trying to
produce bigger and better idiots.  So far, the Universe is winning."

My point is: if we extended our tooling every time we see a mistake,
we'll drown in tooling.  I think we should limit ourselves to *common*
mistakes.  Is this one common?

> But that's a matter for another patch; this one does exactly what it
> promises, and was mechanical (even if the brittle perl script isn't
> published), so it's best to keep it as-is.

I append my Perl script.

Its simpleminded approach to locating headers in the source tree fails
frequently.  It should really use the compiler to find them.

I reviewed inclusions of headers the script flags as "(included
inconsistently)".  I also reviewed "(included with <>)" where the script
can find a header in the tree.  I patched only obvious mistakes.  There
are a few unobvious cases: headers in tree with the same name as a
system header, e.g. elf.h, io.h.

> Reviewed-by: Eric Blake 

Thanks!


#!/usr/bin/perl -w

use strict;

my %ihdr = ();
my %idir = ();
my @sall = ();
my @sinc = ();

open(my $fh, "-|", "git grep '^[ \t]*#[ \t]*include[ \t]'")
or die "can't run git grep: $!";
while (<$fh>) {
m,^(([^:]*)/)?[^:/]*:[ \t]*\#[ \t]*include[ \t]*(["<])([^">]*), or next;
my $dir = $2 // ".";
my $delim = $3;
my $h = $4;
$ihdr{$h} |= 1 << ($delim eq "<");
if (exists $idir{$h}) {
my $aref = $idir{$h};
push @$aref, $dir unless grep($_ eq $dir, @$aref);
} else {
$idir{$h} = [$dir];
}
}
close ($fh);

open($fh, "-|", "git ls-tree -r --name-only HEAD")
or die "can't run git ls-tree: $!";
while (<$fh>) {
chomp;
push @sall, $_;
}
close ($fh);

@sinc = grep(/^include\//, @sall);

sub pr {
my ($h, $fn, $src) = @_;

print "$h -> $fn";
if ($ihdr{$h} == 3) {
print " (included inconsistently)";
} elsif ($src) {
print " (included with <>)" if ($ihdr{$h} != 1);
} else {
print " (included with \"\")" if ($ihdr{$h} != 2);
}
print "\n";
}

for my $h (keys %ihdr) {
$h =~ m,^(\.\./)*(include/)?(.*), or die;
my $hh = $3;
my @fn = grep(/^include\/\Q$hh\E$/, @sinc);
if (@fn) {
pr($h, $fn[0], 1);
next;
}
@fn = grep(/^\Q$hh\E$/, @sall);
if (@fn) {
pr($h, $fn[0], 1);
next;
}
for my $dir (@{$idir{$h}}) {
next if $dir eq ".";
print "## $dir/$hh\n";
@fn = grep(/^\Q$dir\/$hh\E$/, @sall);
if (@fn) {
pr($h, $fn[0], 1);
} else {
pr($h, "? (in $dir)", 0);
}
}
}



Re: [Qemu-devel] [PATCH v2 01/19] Use #include "..." for our own headers, <...> for others

2018-01-31 Thread Markus Armbruster
Thomas Huth  writes:

> On 31.01.2018 15:48, Markus Armbruster wrote:
>> System headers should be included with <...>, our own headers with
>> "...". Offenders tracked down with an ugly, brittle and probably
>> buggy Perl script.  Previous iteration was commit a9c94277f0.
>> 
>> Put the cleaned up system header includes first, except for the ones
>> the next commit will delete.
>
> That's a little bit of code churn ... why not delete them here
> immediately, or simply ignore these headers here and just delete them in
> the next patch?

Ignore won't do, as scripts/clean-includes won't find them then.

Delete is possible, but requires still more explanation in the commit
message.  Worthwhile?

>> While there, separate #include from file comment with a blank line,
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
> [...]
>> diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
>> index 0570f597ec..9968974fbe 100644
>> --- a/target/s390x/gen-features.c
>> +++ b/target/s390x/gen-features.c
>> @@ -13,8 +13,8 @@
>>   */
>>  
>>  
>
> While you're at it, remove one of the two blank lines?

Yes.

>> -#include "inttypes.h"
>> -#include "stdio.h"
>> +#include 
>> +#include 
>>  #include "cpu_features_def.h"
>
> Above remarks are just nits, anyway patch looks fine, so:
>
> Reviewed-by: Thomas Huth 

Thanks!



Re: [Qemu-devel] [PATCH] vcpu: join vcpu thread after it exiting

2018-01-31 Thread no-reply
Hi,

This series failed docker-build@min-glib build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

Type: series
Message-id: 1517463523-21394-1-git-send-email-li...@zju.edu.cn
Subject: [Qemu-devel] [PATCH] vcpu: join vcpu thread after it exiting

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-build@min-glib
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
7e25138ab9 vcpu: join vcpu thread after it exiting

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-90o96dw9/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   min-glib
  GEN 
/var/tmp/patchew-tester-tmp-90o96dw9/src/docker-src.2018-02-01-01.39.11.26281/qemu.tar
Cloning into 
'/var/tmp/patchew-tester-tmp-90o96dw9/src/docker-src.2018-02-01-01.39.11.26281/qemu.tar.vroot'...
done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 
'/var/tmp/patchew-tester-tmp-90o96dw9/src/docker-src.2018-02-01-01.39.11.26281/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered 
for path 'ui/keycodemapdb'
Cloning into 
'/var/tmp/patchew-tester-tmp-90o96dw9/src/docker-src.2018-02-01-01.39.11.26281/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out 
'10739aa26051a5d49d88132604539d3ed085e72e'
  COPYRUNNER
RUN test-build in qemu:min-glib 
Environment variables:
HOSTNAME=213cecf9d462
MAKEFLAGS= -j8
J=8
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
TARGET_LIST=
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
FEATURES= dtc
DEBUG=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/tmp/qemu-test/install
No C++ compiler available; disabling C++ specific optional code
Install prefix/tmp/qemu-test/install
BIOS directory/tmp/qemu-test/install/share/qemu
firmware path /tmp/qemu-test/install/share/qemu-firmware
binary directory  /tmp/qemu-test/install/bin
library directory /tmp/qemu-test/install/lib
module directory  /tmp/qemu-test/install/lib/qemu
libexec directory /tmp/qemu-test/install/libexec
include directory /tmp/qemu-test/install/include
config directory  /tmp/qemu-test/install/etc
local state directory   /tmp/qemu-test/install/var
Manual directory  /tmp/qemu-test/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path   /tmp/qemu-test/src
GIT binarygit
GIT submodules
C compilercc
Host C compiler   cc
C++ compiler  
Objective-C compiler cc
ARFLAGS   rv
CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS   -I/usr/include/pixman-1   -I$(SRC_PATH)/dtc/libfdt -pthread 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels 
-Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -fstack-protector-all -Wno-missing-braces
LDFLAGS   -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make  make
install   install
pythonpython -B
smbd  /usr/sbin/smbd
module supportno
host CPU  x86_64
host big endian   no
target list   x86_64-softmmu aarch64-softmmu
gprof enabled no
sparse enabledno
strip binariesyes
profiler  no
static build  no
SDL support   yes (1.2.14)
GTK support   no 
GTK GL supportno
VTE support   no 
TLS priority  NORMAL
GNUTLS supportno
GNUTLS rndno
libgcrypt no
libgcrypt kdf no
nettleno 
nettle kdfno
libtasn1  no
curses supportno
virgl support no
curl support  no
mingw32 support   no
Audio drivers oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS supportno
Multipath support no
VNC support   yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support   no
brlapi supportno
bluez  supportno
Documentation no
PIE   yes
vde support   no
netmap supportno
Linux AIO support no
ATTR/XATTR support yes
Install blobs yes
KVM support   yes
HAX support   no
HVF support   no
TCG 

Re: [Qemu-devel] Can not hotplug device to pci-to-pci bridge using machine type older than 1.7

2018-01-31 Thread Fei Li


On 01/29/2018 09:33 PM, Igor Mammedov wrote:

On Thu, 25 Jan 2018 16:01:39 +0800
Fei Li  wrote:


Hi,

After I hotplug a pci device to a pci2pci bridge (I use "pci.1") with
machine type 1.7 or older,
I can not see the pci device via `lspci` in the guest, but can see it
via `info qtree` in the hmp.
What's more, after I reboot the guest or hotplug another pci device to
pci.0, I can see the
previous hotplugged pci device via `lspci`.

I use qemu v2.9.* (but machine type =1.7 or 1.6) and kernel 4.4.
I also do the test on self-build qemu v1.7, v1.6 and kernel 4.4, the
same phenomenon.
My qemu command line is as follows:
/usr/bin/qemu-system-x86_64 -m 1024 -machine
pc-i440fx-1.7,accel=kvm,kernel_irqchip=on -enable-kvm \
-smp sockets=1,cores=1,threads=1 -no-user-config -cpu Skylake-Client \
-name "pci-hp-qemu" -monitor telnet:127.0.0.1:8848,server,nowait \
-drive file=/opt/pcihp.qcow2,if=none,id=drive0,format=qcow2 \
-device virtio-blk-pci,drive=drive0,id=sata0,bootindex=1 \
-device pci-bridge,chassis_nr=1 \
-device virtio-net-pci,bus=pci.1,addr=01,id=netdev1 \
-device virtio-net-pci,bus=pci.0,addr=07,id=netdev2 \
-boot order=c
...
(qemu) device_add virtio-net-pci,bus=pci.1,addr=12,id=netdev12

Checking the qemu commit log, I see qemu v2.0 introduces the acpi based
pci hotplug
support and enables hotplug for pci devices behind pci2pci bridges
(upstream commit 99fd437d, 9e047b98 and db4728e6).

I also checked qemu v2.9 code, and notice there's a judge: if
use_acpi_pci_hotplug is true (MT >= 2.0),
the code will update the hotplug_handler for *all buses* to PIIX4PMState
and uses
acpi_pcihp_device_plug_cb as the hp callback; else, for MT<=1.7, the
code will only update the
hotplug_handler for pci.0 and leave other buses continue to use its
default hp_handler: the
pci2pci bridge and default callback: pci_bridge_dev_hotplug_cb =>
shpc_device_hotplug_cb.
With the latter shpc* hotplug callback, hotplug a pci device to a
pci2pci bridge, like pci.1, I can
not be seen it in the guest via `lspci`.

I am just wondering whether the qemu code does not support the shpc
hotplug to pci2pci bridge,

Last time, it used to 'work' i.e. guest is notified but fails to init
hotplugged device due to lack of IO resources.

I'd suggest to start looking at issue from guest side and see what
SHPC driver does to figure out if something in QEMU/firmware needs
to be fixed.

Sorry for my late reply..
Thanks very much for giving me the clear direction. I am afraid I need more
time to locate the reason. Considering pcihp to pci2pci bridge with old MT
(<=1.7) is a rare scenario, I'd like to continue investigating as long 
as I have

enough free time. :)

Again, thanks for your kindly suggestions.
Fei




or the shpc spec does not support this.
Could someone shed some light on me? Thanks a lot!

Have a nice day
Fei









Re: [Qemu-devel] [PATCH] vcpu: join vcpu thread after it exiting

2018-01-31 Thread no-reply
Hi,

This series failed docker-quick@centos6 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

Type: series
Message-id: 1517463523-21394-1-git-send-email-li...@zju.edu.cn
Subject: [Qemu-devel] [PATCH] vcpu: join vcpu thread after it exiting

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
7e25138ab9 vcpu: join vcpu thread after it exiting

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-fq304xn_/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   centos6
  GEN 
/var/tmp/patchew-tester-tmp-fq304xn_/src/docker-src.2018-02-01-01.28.13.6075/qemu.tar
Cloning into 
'/var/tmp/patchew-tester-tmp-fq304xn_/src/docker-src.2018-02-01-01.28.13.6075/qemu.tar.vroot'...
done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 
'/var/tmp/patchew-tester-tmp-fq304xn_/src/docker-src.2018-02-01-01.28.13.6075/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered 
for path 'ui/keycodemapdb'
Cloning into 
'/var/tmp/patchew-tester-tmp-fq304xn_/src/docker-src.2018-02-01-01.28.13.6075/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out 
'10739aa26051a5d49d88132604539d3ed085e72e'
  COPYRUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
bison-2.4.1-5.el6.x86_64
bzip2-devel-1.0.5-7.el6_0.x86_64
ccache-3.1.6-2.el6.x86_64
csnappy-devel-0-6.20150729gitd7bc683.el6.x86_64
flex-2.5.35-9.el6.x86_64
gcc-4.4.7-18.el6.x86_64
gettext-0.17-18.el6.x86_64
git-1.7.1-9.el6_9.x86_64
glib2-devel-2.28.8-9.el6.x86_64
libepoxy-devel-1.2-3.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
librdmacm-devel-1.0.21-0.el6.x86_64
lzo-devel-2.03-3.1.el6_5.1.x86_64
make-3.81-23.el6.x86_64
mesa-libEGL-devel-11.0.7-4.el6.x86_64
mesa-libgbm-devel-11.0.7-4.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
spice-glib-devel-0.26-8.el6.x86_64
spice-server-devel-0.12.4-16.el6.x86_64
tar-1.23-15.el6_8.x86_64
vte-devel-0.25.1-9.el6.x86_64
xen-devel-4.6.6-2.el6.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=bison bzip2-devel ccache csnappy-devel flex g++
 gcc gettext git glib2-devel libepoxy-devel libfdt-devel
 librdmacm-devel lzo-devel make mesa-libEGL-devel 
mesa-libgbm-devel pixman-devel SDL-devel spice-glib-devel 
spice-server-devel tar vte-devel xen-devel zlib-devel
HOSTNAME=3a375ebf79d8
MAKEFLAGS= -j8
J=8
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
TARGET_LIST=
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
FEATURES= dtc
DEBUG=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/tmp/qemu-test/install
No C++ compiler available; disabling C++ specific optional code
Install prefix/tmp/qemu-test/install
BIOS directory/tmp/qemu-test/install/share/qemu
firmware path /tmp/qemu-test/install/share/qemu-firmware
binary directory  /tmp/qemu-test/install/bin
library directory /tmp/qemu-test/install/lib
module directory  /tmp/qemu-test/install/lib/qemu
libexec directory /tmp/qemu-test/install/libexec
include directory /tmp/qemu-test/install/include
config directory  /tmp/qemu-test/install/etc
local state directory   /tmp/qemu-test/install/var
Manual directory  /tmp/qemu-test/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path   /tmp/qemu-test/src
GIT binarygit
GIT submodules
C compilercc
Host C compiler   cc
C++ compiler  
Objective-C compiler cc
ARFLAGS   rv
CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS   -I/usr/include/pixman-1   -I$(SRC_PATH)/dtc/libfdt -pthread 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -DNCURSES_WIDECHAR   
-fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 
-D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef 
-Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv  
-Wendif-labels -Wno-missing-include-dirs -Wempty-body -Wnested-externs 
-Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers 
-Wold-style-declaration -Wold-style-definition -Wtype-limits 
-fstack-protector-all -Wno-missing-braces  -I/usr/include/libpng12   

[Qemu-devel] [PATCH] virtio-blk: enable multiple vectors when using multiple I/O queues

2018-01-31 Thread Changpeng Liu
Currently virtio-pci driver hardcoded 2 vectors for virtio-blk device,
for multiple I/O queues scenario, all the I/O queues will share one
interrupt vector, while here, enable multiple vectors according to
the number of I/O queues.

Signed-off-by: Changpeng Liu 
---
 hw/virtio/virtio-pci.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 9ae10f0..379b00c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1932,7 +1932,7 @@ static Property virtio_blk_pci_properties[] = {
 DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
 DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
 VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
-DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
+DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 
DEV_NVECTORS_UNSPECIFIED),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1941,6 +1941,10 @@ static void virtio_blk_pci_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 VirtIOBlkPCI *dev = VIRTIO_BLK_PCI(vpci_dev);
 DeviceState *vdev = DEVICE(>vdev);
 
+if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
+vpci_dev->nvectors = dev->vdev.conf.num_queues + 1;
+}
+
 qdev_set_parent_bus(vdev, BUS(_dev->bus));
 object_property_set_bool(OBJECT(vdev), true, "realized", errp);
 }
@@ -1983,7 +1987,7 @@ static const TypeInfo virtio_blk_pci_info = {
 
 static Property vhost_user_blk_pci_properties[] = {
 DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
-DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
+DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 
DEV_NVECTORS_UNSPECIFIED),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1992,6 +1996,10 @@ static void vhost_user_blk_pci_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 VHostUserBlkPCI *dev = VHOST_USER_BLK_PCI(vpci_dev);
 DeviceState *vdev = DEVICE(>vdev);
 
+if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
+vpci_dev->nvectors = dev->vdev.num_queues + 1;
+}
+
 qdev_set_parent_bus(vdev, BUS(_dev->bus));
 object_property_set_bool(OBJECT(vdev), true, "realized", errp);
 }
-- 
1.9.3




[Qemu-devel] [PATCH v3 2/2] i386: Add support to get/set/migrate Intel Processor Trace feature

2018-01-31 Thread Luwei Kang
From: Chao Peng 

Add Intel Processor Trace related definition. It also add
corresponding part to kvm_get/set_msr and vmstate.

Signed-off-by: Chao Peng 
Signed-off-by: Luwei Kang 
---
 target/i386/cpu.h | 22 ++
 target/i386/kvm.c | 51 +++
 target/i386/machine.c | 38 ++
 3 files changed, 111 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7facc8b..164d17f 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -419,6 +419,21 @@ typedef enum X86Seg {
 #define MSR_MC0_ADDR0x402
 #define MSR_MC0_MISC0x403
 
+#define MSR_IA32_RTIT_OUTPUT_BASE   0x560
+#define MSR_IA32_RTIT_OUTPUT_MASK   0x561
+#define MSR_IA32_RTIT_CTL   0x570
+#define MSR_IA32_RTIT_STATUS0x571
+#define MSR_IA32_RTIT_CR3_MATCH 0x572
+#define MSR_IA32_RTIT_ADDR0_A   0x580
+#define MSR_IA32_RTIT_ADDR0_B   0x581
+#define MSR_IA32_RTIT_ADDR1_A   0x582
+#define MSR_IA32_RTIT_ADDR1_B   0x583
+#define MSR_IA32_RTIT_ADDR2_A   0x584
+#define MSR_IA32_RTIT_ADDR2_B   0x585
+#define MSR_IA32_RTIT_ADDR3_A   0x586
+#define MSR_IA32_RTIT_ADDR3_B   0x587
+#define MAX_RTIT_ADDRS  8
+
 #define MSR_EFER0xc080
 
 #define MSR_EFER_SCE   (1 << 0)
@@ -1158,6 +1173,13 @@ typedef struct CPUX86State {
 uint64_t msr_hv_stimer_config[HV_STIMER_COUNT];
 uint64_t msr_hv_stimer_count[HV_STIMER_COUNT];
 
+uint64_t msr_rtit_ctrl;
+uint64_t msr_rtit_status;
+uint64_t msr_rtit_output_base;
+uint64_t msr_rtit_output_mask;
+uint64_t msr_rtit_cr3_match;
+uint64_t msr_rtit_addrs[MAX_RTIT_ADDRS];
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index f9f4cd1..097c953 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1811,6 +1811,25 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_add(cpu, MSR_MTRRphysMask(i), mask);
 }
 }
+if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {
+int addr_num = kvm_arch_get_supported_cpuid(kvm_state,
+0x14, 1, R_EAX) & 0x7;
+
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_CTL,
+env->msr_rtit_ctrl);
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_STATUS,
+env->msr_rtit_status);
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_OUTPUT_BASE,
+env->msr_rtit_output_base);
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_OUTPUT_MASK,
+env->msr_rtit_output_mask);
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_CR3_MATCH,
+env->msr_rtit_cr3_match);
+for (i = 0; i < addr_num; i++) {
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_ADDR0_A + i,
+env->msr_rtit_addrs[i]);
+}
+}
 
 /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
  *   kvm_put_msr_feature_control. */
@@ -2124,6 +2143,20 @@ static int kvm_get_msrs(X86CPU *cpu)
 }
 }
 
+if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {
+int addr_num =
+kvm_arch_get_supported_cpuid(kvm_state, 0x14, 1, R_EAX) & 0x7;
+
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_CTL, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_STATUS, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_OUTPUT_BASE, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_OUTPUT_MASK, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_CR3_MATCH, 0);
+for (i = 0; i < addr_num; i++) {
+kvm_msr_entry_add(cpu, MSR_IA32_RTIT_ADDR0_A + i, 0);
+}
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
 if (ret < 0) {
 return ret;
@@ -2364,6 +2397,24 @@ static int kvm_get_msrs(X86CPU *cpu)
 case MSR_IA32_SPEC_CTRL:
 env->spec_ctrl = msrs[i].data;
 break;
+case MSR_IA32_RTIT_CTL:
+env->msr_rtit_ctrl = msrs[i].data;
+break;
+case MSR_IA32_RTIT_STATUS:
+env->msr_rtit_status = msrs[i].data;
+break;
+case MSR_IA32_RTIT_OUTPUT_BASE:
+env->msr_rtit_output_base = msrs[i].data;
+break;
+case MSR_IA32_RTIT_OUTPUT_MASK:
+env->msr_rtit_output_mask = msrs[i].data;
+break;
+case MSR_IA32_RTIT_CR3_MATCH:
+env->msr_rtit_cr3_match = msrs[i].data;
+break;
+case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
+env->msr_rtit_addrs[index - MSR_IA32_RTIT_ADDR0_A] = 

[Qemu-devel] [PATCH v3 1/2] i386: Add Intel Processor Trace feature support

2018-01-31 Thread Luwei Kang
From: Chao Peng 

Expose Intel Processor Trace feature to guest.

To make Intel PT live migration safe and get same CPUID information
with same CPU model on diffrent host. CPUID[14] is constant in this
patch. Intel PT use EPT is first supported in IceLake, the CPUID[14]
get on this machine as default value. Intel PT would be disabled
If any machine don't support this minial feature list.

Signed-off-by: Chao Peng 
Signed-off-by: Luwei Kang 
---
 target/i386/cpu.c | 53 +++--
 target/i386/cpu.h |  1 +
 target/i386/kvm.c | 23 +++
 3 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a49d222..aaa427a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -172,7 +172,14 @@
 #define L2_ITLB_4K_ASSOC   4
 #define L2_ITLB_4K_ENTRIES   512
 
-
+/* CPUID Leaf 0x14 constants: */
+#define INTLE_PT_MAX_SUBLEAF 0x1
+#define INTEL_PT_MINIMAL_EBX 0xf
+#define INTEL_PT_MINIMAL_ECX 0x7
+#define INTLE_PT_ADDR_RANGES_NUM 0x2 /* Number of configurable address ranges 
*/
+#define INTEL_PT_MTC_BITMAP  (0x0249 << 16) /* Support ART(0,3,6,9) */
+#define INTEL_PT_CYCLE_BITMAP0x1fff /* Support 0,2^(0~11) */
+#define INTEL_PT_PSB_BITMAP  (0x003f << 16) /* Support 
2K,4K,8K,16K,32K,64K */
 
 static void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
  uint32_t vendor2, uint32_t vendor3)
@@ -427,7 +434,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 NULL, NULL, "mpx", NULL,
 "avx512f", "avx512dq", "rdseed", "adx",
 "smap", "avx512ifma", "pcommit", "clflushopt",
-"clwb", NULL, "avx512pf", "avx512er",
+"clwb", "intel-pt", "avx512pf", "avx512er",
 "avx512cd", "sha-ni", "avx512bw", "avx512vl",
 },
 .cpuid_eax = 7,
@@ -3452,6 +3459,27 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 }
 break;
 }
+case 0x14: {
+/* Intel Processor Trace Enumeration */
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+if (!(env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) ||
+!kvm_enabled()) {
+break;
+}
+
+if (count == 0) {
+*eax = INTLE_PT_MAX_SUBLEAF;
+*ebx = INTEL_PT_MINIMAL_EBX;
+*ecx = INTEL_PT_MINIMAL_ECX;
+} else if (count == 1) {
+*eax = INTEL_PT_MTC_BITMAP | INTLE_PT_ADDR_RANGES_NUM;
+*ebx = INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP;
+}
+break;
+}
 case 0x4000:
 /*
  * CPUID code in kvm_arch_init_vcpu() ignores stuff
@@ -4082,6 +4110,27 @@ static int x86_cpu_filter_features(X86CPU *cpu)
 }
 }
 
+if (env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) {
+KVMState *s = CPU(cpu)->kvm_state;
+uint32_t eax_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EAX);
+uint32_t ebx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EBX);
+uint32_t ecx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_ECX);
+uint32_t eax_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EAX);
+uint32_t ebx_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EBX);
+
+if (!eax_0 ||
+   ((ebx_0 & INTEL_PT_MINIMAL_EBX) != INTEL_PT_MINIMAL_EBX) ||
+   ((ecx_0 & INTEL_PT_MINIMAL_ECX) != INTEL_PT_MINIMAL_ECX) ||
+   ((eax_1 & (INTLE_PT_ADDR_RANGES_NUM | INTEL_PT_MTC_BITMAP)) !=
+(INTLE_PT_ADDR_RANGES_NUM | INTEL_PT_MTC_BITMAP)) ||
+   ((ebx_1 & (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) !=
+(INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP))) {
+env->features[FEAT_7_0_EBX] &= ~CPUID_7_0_EBX_INTEL_PT;
+cpu->filtered_features[FEAT_7_0_EBX] |= CPUID_7_0_EBX_INTEL_PT;
+rv = 1;
+}
+}
+
 return rv;
 }
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f91e37d..7facc8b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -644,6 +644,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EBX_PCOMMIT  (1U << 22) /* Persistent Commit */
 #define CPUID_7_0_EBX_CLFLUSHOPT (1U << 23) /* Flush a Cache Line Optimized */
 #define CPUID_7_0_EBX_CLWB (1U << 24) /* Cache Line Write Back */
+#define CPUID_7_0_EBX_INTEL_PT (1U << 25) /* Intel Processor Trace */
 #define CPUID_7_0_EBX_AVX512PF (1U << 26) /* AVX-512 Prefetch */
 #define CPUID_7_0_EBX_AVX512ER (1U << 27) /* AVX-512 Exponential and 
Reciprocal */
 #define CPUID_7_0_EBX_AVX512CD (1U << 28) /* AVX-512 Conflict Detection */
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index ad4b159..f9f4cd1 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -865,6 +865,29 @@ int kvm_arch_init_vcpu(CPUState *cs)
   

[Qemu-devel] [PATCH] vcpu: join vcpu thread after it exiting

2018-01-31 Thread linzhecheng
As we create vcpu thread with QEMU_THREAD_JOINABLE mode,
we should join it after it exiting to cleanup resources.

Signed-off-by: linzhecheng 

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f290f48..6ff71e4 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -282,9 +282,9 @@ err:
 
 static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
 {
-struct KVMParkedVcpu *cpu;
+struct KVMParkedVcpu *cpu, *next_cpu;
 
-QLIST_FOREACH(cpu, >kvm_parked_vcpus, node) {
+QLIST_FOREACH_SAFE(cpu, >kvm_parked_vcpus, node, *next_cpu) {
 if (cpu->vcpu_id == vcpu_id) {
 int kvm_fd;
 
diff --git a/cpus.c b/cpus.c
index 2cb0af9..1890bfe 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1205,6 +1205,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
 cpu->created = false;
 qemu_cond_signal(_cpu_cond);
 qemu_mutex_unlock_iothread();
+rcu_unregister_thread();
 return NULL;
 }
 
@@ -1759,6 +1760,7 @@ void cpu_remove_sync(CPUState *cpu)
 cpu_remove(cpu);
 while (cpu->created) {
 qemu_cond_wait(_cpu_cond, _global_mutex);
+qemu_thread_join(cpu->thread);
 }
 }
 
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-01-31 Thread Marcel Apfelbaum
On 01/02/2018 4:22, Michael S. Tsirkin wrote:
> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
>> On Wed, Jan 31, 2018 at 11:10:07PM +0200, Michael S. Tsirkin wrote:
>>> On Wed, Jan 31, 2018 at 06:40:59PM -0200, Eduardo Habkost wrote:
 On Wed, Jan 17, 2018 at 11:54:18AM +0200, Marcel Apfelbaum wrote:
> Currently only file backed memory backend can
> be created with a "share" flag in order to allow
> sharing guest RAM with other processes in the host.
>
> Add the "share" flag also to RAM Memory Backend
> in order to allow remapping parts of the guest RAM
> to different host virtual addresses. This is needed
> by the RDMA devices in order to remap non-contiguous
> QEMU virtual addresses to a contiguous virtual address range.
>

 Why do we need to make this configurable?  Would anything break
 if MAP_SHARED was always used if possible?
>>>
>>> See Documentation/vm/numa_memory_policy.txt for a list
>>> of complications.
>>
>> Ew.
>>
>>>
>>> Maybe we should more of an effort to detect and report these
>>> issues.
>>
>> Probably.  Having other features breaking silently when using
>> pvrdma doesn't sound good.  We must at least document those
>> problems in the documentation for memory-backend-ram.
>>
>> BTW, what's the root cause for requiring HVAs in the buffer?
> 
> It's a side effect of the kernel/userspace API which always wants
> a single HVA/len pair to map memory for the application.
> 
> 

Hi Eduardo and Michael,

>>  Can
>> this be fixed?
> 
> I think yes.  It'd need to be a kernel patch for the RDMA subsystem
> mapping an s/g list with actual memory. The HVA/len pair would then just
> be used to refer to the region, without creating the two mappings.
> 
> Something like splitting the register mr into
> 
> mr = create mr (va/len) - allocate a handle and record the va/len
> 
> addmemory(mr, offset, hva, len) - pin memory
> 
> register mr - pass it to HW
> 
> As a nice side effect we won't burn so much virtual address space.
>

We would still need a contiguous virtual address space range (for post-send)
which we don't have since guest contiguous virtual address space
will always end up as non-contiguous host virtual address space.

I am not sure the RDMA HW can handle a large VA with holes.

An alternative would be 0-based MR, QEMU intercepts the post-send
operations and can substract the guest VA base address.
However I didn't see the implementation in kernel for 0 based MRs
and also the RDMA maintainer said it would work for local keys
and not for remote keys.

> This will fix rdma with hugetlbfs as well which is currently broken.
> 
> 

There is already a discussion on the linux-rdma list:
https://www.spinics.net/lists/linux-rdma/msg60079.html
But it will take some (actually a lot of) time, we are currently talking about
a possible API. And it does not solve the re-mapping...

Thanks,
Marcel

>> -- 
>> Eduardo




[Qemu-devel] [PULL 1/2] block/ssh: fix possible segmentation fault when .desc is not null-terminated

2018-01-31 Thread Jeff Cody
From: Murilo Opsfelder Araujo 

This patch prevents a possible segmentation fault when .desc members are checked
against NULL.

The ssh_runtime_opts was added by commit
8a6a80896d6af03b8ee0c17cdf37219eca2588a7 ("block/ssh: Use QemuOpts for runtime
options").

This fix was inspired by
http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg00883.html.

Fixes: 8a6a80896d6af03b8ee0c17cdf37219eca2588a7 ("block/ssh: Use QemuOpts for 
runtime options")
Cc: Max Reitz 
Cc: Eric Blake 
Signed-off-by: Murilo Opsfelder Araujo 
Reviewed-by: Eric Blake 
Reviewed-by: Jeff Cody 
Signed-off-by: Jeff Cody 
---
 block/ssh.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/ssh.c b/block/ssh.c
index b049a16..8890a0c 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -556,6 +556,7 @@ static QemuOptsList ssh_runtime_opts = {
 .type = QEMU_OPT_STRING,
 .help = "Defines how and what to check the host key against",
 },
+{ /* end of list */ }
 },
 };
 
-- 
2.9.5




[Qemu-devel] [PULL 2/2] iotests: Make 200 run on tmpfs

2018-01-31 Thread Jeff Cody
From: Max Reitz 

200 currently fails on tmpfs because it sets cache=none.  However,
without that (and aio=native), the test still works now and it fails
before Jeff's series (on fc7dbc119e0852a70dc9fa68bb41a318e49e4cd6).  So
we can probably remove the aio=native safely, and replace cache=none by
cache=$CACHEMODE.

Signed-off-by: Max Reitz 
Reviewed-by: Jeff Cody 
Message-id: 20180117135015.15051-1-mre...@redhat.com
Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/200 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
index d8787dd..ddbdedc 100755
--- a/tests/qemu-iotests/200
+++ b/tests/qemu-iotests/200
@@ -60,7 +60,7 @@ qemu_comm_method="qmp"
 _launch_qemu -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 \
  -object iothread,id=iothread0 \
  -device 
virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0,iothread=iothread0 \
- -drive 
file="${TEST_IMG}",media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=$IMGFMT
 \
+ -drive 
file="${TEST_IMG}",media=disk,if=none,cache=$CACHEMODE,id=drive_sysdisk,format=$IMGFMT
 \
  -device 
scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0
 h1=$QEMU_HANDLE
 
-- 
2.9.5




[Qemu-devel] [PULL 0/2] Block patches

2018-01-31 Thread Jeff Cody
The following changes since commit b05631954d6dfe93340d516660397e2c1a2a5dd6:

  Merge remote-tracking branch 'remotes/rth/tags/pull-hppa-20180131' into 
staging (2018-01-31 15:50:29 +)

are available in the git repository at:

  git://github.com/codyprime/qemu-kvm-jtc.git tags/block-pull-request

for you to fetch changes up to 45a79646ea746fa3f32083d0aa70512aae29f6b3:

  iotests: Make 200 run on tmpfs (2018-01-31 22:37:00 -0500)


Block patches


Max Reitz (1):
  iotests: Make 200 run on tmpfs

Murilo Opsfelder Araujo (1):
  block/ssh: fix possible segmentation fault when .desc is not
null-terminated

 block/ssh.c| 1 +
 tests/qemu-iotests/200 | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

-- 
2.9.5




Re: [Qemu-devel] [PATCH v4 0/6] nvdimm: support MAP_SYNC for memory-backend-file

2018-01-31 Thread Dan Williams
[ adding Michal and lsf-pci ]

On Wed, Jan 31, 2018 at 7:02 PM, Dan Williams  wrote:
> On Wed, Jan 31, 2018 at 6:29 PM, Haozhong Zhang
>  wrote:
>> + vfio maintainer Alex Williamson in case my understanding of vfio is 
>> incorrect.
>>
>> On 01/31/18 16:32 -0800, Dan Williams wrote:
>>> On Wed, Jan 31, 2018 at 4:24 PM, Haozhong Zhang
>>>  wrote:
>>> > On 01/31/18 16:08 -0800, Dan Williams wrote:
>>> >> On Wed, Jan 31, 2018 at 4:02 PM, Haozhong Zhang
>>> >>  wrote:
>>> >> > On 01/31/18 14:25 -0800, Dan Williams wrote:
>>> >> >> On Tue, Jan 30, 2018 at 10:02 PM, Haozhong Zhang
>>> >> >>  wrote:
>>> >> >> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
>>> >> >> > guarantee the write persistence to mmap'ed files supporting DAX 
>>> >> >> > (e.g.,
>>> >> >> > files on ext4/xfs file system mounted with '-o dax').
>>> >> >>
>>> >> >> Wait, MAP_SYNC does not guarantee persistence. It makes sure that the
>>> >> >> metadata is in sync after a fault. However, that does not make
>>> >> >> filesystem-DAX safe for use with QEMU, because we still need to
>>> >> >> coordinate DMA with fileystem operations. There is no way to do that
>>> >> >> coordination from within a guest. QEMU needs to use device-dax if the
>>> >> >> guest might ever perform DMA to a virtual-pmem range. See this patch
>>> >> >> set for more details on the DAX vs DMA problem [1]. I think we need to
>>> >> >> enforce this in the host kernel. I.e. do not allow file backed DAX
>>> >> >> pages to be mapped in EPT entries unless / until we have a solution to
>>> >> >> the DMA synchronization problem. Apologies for not noticing this
>>> >> >> earlier.
>>> >> >
>>> >> > QEMU does not truncate or punch holes of the file once it has been
>>> >> > mmap()'ed. Does the problem [1] still exist in such case?
>>> >>
>>> >> Something else on the system might. The only agent that could enforce
>>> >> protection is the kernel, and the kernel will likely just disallow
>>> >> passing addresses from filesystem-dax vmas through to a guest
>>> >> altogether. I think there's even a problem in the non-DAX case unless
>>> >> KVM is pinning pages while they are handed out to a guest. The problem
>>> >> is that we don't have a page cache page to pin in the DAX case.
>>> >>
>>> >
>>> > Does it mean any user-space code like
>>> >   ptr = mmap(..., fd, ...); // fd refers to a file on DAX filesystem
>>> >   // make DMA to ptr
>>> > is unsafe?
>>>
>>> Yes, it is currently unsafe because there is no coordination with the
>>> filesytem if it decides to make block layout changes. We can fix that
>>> in the non-virtualization case by having the filesystem wait for DMA
>>> completion callbacks (i.e. what for all pages to be idle), but as far
>>> as I can see we can't do the same coordination for DMA initiated by a
>>> guest device driver.
>>>
>>
>> I think that fix [1] also works for KVM/QEMU. The guest DMA are
>> performed on two types of devices:
>>
>> 1. For emulated devices, the guest DMA requests are trapped and
>>actually performed by QEMU on the host side. The host side fix [1]
>>can cover this case.
>>
>> 2. For passthrough devices, vfio pins all pages, including those
>>backed by dax mode files, used by the guest if any device is
>>passthroughed to it. If I read the commit message in [2] correctly,
>>operations that change the page-to-file offset association of pages
>>from dax mode files will be deferred until the reference count of
>>the affected pages becomes 1.  That is, if any passthrough device
>>is used with a VM, the changes of page-to-file offset will not be
>>able to happen until the VM is shutdown, so the fix [1] still takes
>>effect here.
>
> This sounds like a longterm mapping under control of vfio and not the
> filesystem. See get_user_pages_longterm(), it is a problem if pages
> are pinned indefinitely especially DAX. It sounds like vfio is in the
> same boat as RDMA and cannot support long lived pins of DAX pages. As
> of 4.15 RDMA to filesystem-DAX pages has been disabled. The eventual
> fix will be to create a "memory-registration with lease" semantic
> available for RDMA so that the kernel can forcibly revoke page pinning
> to perform physical layout changes. In the near it seems
> vaddr_get_pfn() needs to be fixed to use get_user_pages_longterm() so
> that filesystem-dax mappings are explicitly disallowed.
>
>> Another question is how a user-space application (e.g., QEMU) knows
>> whether it's safe to mmap a file on the DAX file system?
>
> I think we fix vaddr_get_pfn() to start failing for DAX mappings
> unless/until we can add a "with lease" mechanism. Userspace will know
> when it is safe again when vfio stops failing.

Btw, there is an LSF/MM topic proposal on this subject [1].

[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-January/013935.html



Re: [Qemu-devel] [PATCH v4 0/6] nvdimm: support MAP_SYNC for memory-backend-file

2018-01-31 Thread Dan Williams
On Wed, Jan 31, 2018 at 6:29 PM, Haozhong Zhang
 wrote:
> + vfio maintainer Alex Williamson in case my understanding of vfio is 
> incorrect.
>
> On 01/31/18 16:32 -0800, Dan Williams wrote:
>> On Wed, Jan 31, 2018 at 4:24 PM, Haozhong Zhang
>>  wrote:
>> > On 01/31/18 16:08 -0800, Dan Williams wrote:
>> >> On Wed, Jan 31, 2018 at 4:02 PM, Haozhong Zhang
>> >>  wrote:
>> >> > On 01/31/18 14:25 -0800, Dan Williams wrote:
>> >> >> On Tue, Jan 30, 2018 at 10:02 PM, Haozhong Zhang
>> >> >>  wrote:
>> >> >> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
>> >> >> > guarantee the write persistence to mmap'ed files supporting DAX 
>> >> >> > (e.g.,
>> >> >> > files on ext4/xfs file system mounted with '-o dax').
>> >> >>
>> >> >> Wait, MAP_SYNC does not guarantee persistence. It makes sure that the
>> >> >> metadata is in sync after a fault. However, that does not make
>> >> >> filesystem-DAX safe for use with QEMU, because we still need to
>> >> >> coordinate DMA with fileystem operations. There is no way to do that
>> >> >> coordination from within a guest. QEMU needs to use device-dax if the
>> >> >> guest might ever perform DMA to a virtual-pmem range. See this patch
>> >> >> set for more details on the DAX vs DMA problem [1]. I think we need to
>> >> >> enforce this in the host kernel. I.e. do not allow file backed DAX
>> >> >> pages to be mapped in EPT entries unless / until we have a solution to
>> >> >> the DMA synchronization problem. Apologies for not noticing this
>> >> >> earlier.
>> >> >
>> >> > QEMU does not truncate or punch holes of the file once it has been
>> >> > mmap()'ed. Does the problem [1] still exist in such case?
>> >>
>> >> Something else on the system might. The only agent that could enforce
>> >> protection is the kernel, and the kernel will likely just disallow
>> >> passing addresses from filesystem-dax vmas through to a guest
>> >> altogether. I think there's even a problem in the non-DAX case unless
>> >> KVM is pinning pages while they are handed out to a guest. The problem
>> >> is that we don't have a page cache page to pin in the DAX case.
>> >>
>> >
>> > Does it mean any user-space code like
>> >   ptr = mmap(..., fd, ...); // fd refers to a file on DAX filesystem
>> >   // make DMA to ptr
>> > is unsafe?
>>
>> Yes, it is currently unsafe because there is no coordination with the
>> filesytem if it decides to make block layout changes. We can fix that
>> in the non-virtualization case by having the filesystem wait for DMA
>> completion callbacks (i.e. what for all pages to be idle), but as far
>> as I can see we can't do the same coordination for DMA initiated by a
>> guest device driver.
>>
>
> I think that fix [1] also works for KVM/QEMU. The guest DMA are
> performed on two types of devices:
>
> 1. For emulated devices, the guest DMA requests are trapped and
>actually performed by QEMU on the host side. The host side fix [1]
>can cover this case.
>
> 2. For passthrough devices, vfio pins all pages, including those
>backed by dax mode files, used by the guest if any device is
>passthroughed to it. If I read the commit message in [2] correctly,
>operations that change the page-to-file offset association of pages
>from dax mode files will be deferred until the reference count of
>the affected pages becomes 1.  That is, if any passthrough device
>is used with a VM, the changes of page-to-file offset will not be
>able to happen until the VM is shutdown, so the fix [1] still takes
>effect here.

This sounds like a longterm mapping under control of vfio and not the
filesystem. See get_user_pages_longterm(), it is a problem if pages
are pinned indefinitely especially DAX. It sounds like vfio is in the
same boat as RDMA and cannot support long lived pins of DAX pages. As
of 4.15 RDMA to filesystem-DAX pages has been disabled. The eventual
fix will be to create a "memory-registration with lease" semantic
available for RDMA so that the kernel can forcibly revoke page pinning
to perform physical layout changes. In the near it seems
vaddr_get_pfn() needs to be fixed to use get_user_pages_longterm() so
that filesystem-dax mappings are explicitly disallowed.

> Another question is how a user-space application (e.g., QEMU) knows
> whether it's safe to mmap a file on the DAX file system?

I think we fix vaddr_get_pfn() to start failing for DAX mappings
unless/until we can add a "with lease" mechanism. Userspace will know
when it is safe again when vfio stops failing.



Re: [Qemu-devel] Windows balloon driver PFN issue

2018-01-31 Thread Peter Xu
On Thu, Feb 01, 2018 at 04:24:40AM +0200, Michael S. Tsirkin wrote:
> On Thu, Feb 01, 2018 at 10:18:53AM +0800, Peter Xu wrote:
> > On Wed, Jan 31, 2018 at 04:03:12PM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Jan 31, 2018 at 05:28:35PM +0800, Peter Xu wrote:
> > > > Hi, Michael and the list,
> > > > 
> > > > I observed this on windows 8 enterprise guests, when doing memory 
> > > > ballooning:
> > > > 
> > > > 23892@1517298572.328354:virtio_balloon_to_target balloon target: 
> > > > 0x8000 num_pages: 524288
> > > > 23892@1517298638.542819:virtio_balloon_get_config num_pages: 524288 
> > > > actual: 0
> > > > 23892@1517298638.542974:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x174604000
> > > > 23892@1517298638.543059:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x0
> > > > 23892@1517298638.543135:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x17460a000
> > > > 23892@1517298638.543140:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x0
> > > > 23892@1517298638.543143:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x17460b000
> > > > 23892@1517298638.543146:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x0
> > > > 23892@1517298638.543148:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x17460c000
> > > > 23892@1517298638.543152:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x0
> > > > 23892@1517298638.543154:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x17460d000
> > > > 23892@1517298638.543159:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x0
> > > > 23892@1517298638.543162:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x17460e000
> > > > 23892@1517298638.543165:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x0
> > > > 23892@1517298638.543167:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x17460f000
> > > > 23892@1517298638.543170:virtio_balloon_handle_output section name: 
> > > > pc.ram gpa: 0x0
> > > > ...
> > > > 
> > > > I think it's very possible that these zero addresses (please let me
> > > > know what the first 4K page is used for if anyone knows, since IIUC
> > > > that's what we throw away now) are half of the 64bit PFN.  Or say, not
> > > > sure whether this means a windows guest driver bug that is using
> > > > 64bits for PFN rather than 32bits (and I suppose the protocol is using
> > > > 32bit for PFNs).
> > > > 
> > > > Michael, do you know what to do with this?
> > > > 
> > > > Thanks,
> > > 
> > > PFN is GPA>>12.  Do you have more than 1<<44 bytes of memory in this VM 
> > > then?
> > 
> > No.  But isn't it still not good to drop the page at offset zero (and
> > drop it NNN times)?
> 
> Absolutely - looks like a bug. I just don't know why does this happen.

IMHO if we are using a PFN array like this:

   u64 pfn_array[];

In the windows guest driver, then we'll see this (as mentioned
above).  But for sure this is wild guess of mine.

> 
> >  And I'm not sure what will happen if guest has
> > 1<<44 bytes; then we'll possibly drop very random addresses since a
> > real address will be splitted?
> > 
> > Thanks,
> 
> The balloon won't work, period. There was an interface change to fix
> that but implementation issues remained and contributor seems to be busy
> with page hints.

Okay.  So IIUC this is already a known issue for the driver owner.
Then it seems that there's nothing more I can do for now...

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v4 0/6] nvdimm: support MAP_SYNC for memory-backend-file

2018-01-31 Thread Haozhong Zhang
+ vfio maintainer Alex Williamson in case my understanding of vfio is incorrect.

On 01/31/18 16:32 -0800, Dan Williams wrote:
> On Wed, Jan 31, 2018 at 4:24 PM, Haozhong Zhang
>  wrote:
> > On 01/31/18 16:08 -0800, Dan Williams wrote:
> >> On Wed, Jan 31, 2018 at 4:02 PM, Haozhong Zhang
> >>  wrote:
> >> > On 01/31/18 14:25 -0800, Dan Williams wrote:
> >> >> On Tue, Jan 30, 2018 at 10:02 PM, Haozhong Zhang
> >> >>  wrote:
> >> >> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> >> >> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> >> >> > files on ext4/xfs file system mounted with '-o dax').
> >> >>
> >> >> Wait, MAP_SYNC does not guarantee persistence. It makes sure that the
> >> >> metadata is in sync after a fault. However, that does not make
> >> >> filesystem-DAX safe for use with QEMU, because we still need to
> >> >> coordinate DMA with fileystem operations. There is no way to do that
> >> >> coordination from within a guest. QEMU needs to use device-dax if the
> >> >> guest might ever perform DMA to a virtual-pmem range. See this patch
> >> >> set for more details on the DAX vs DMA problem [1]. I think we need to
> >> >> enforce this in the host kernel. I.e. do not allow file backed DAX
> >> >> pages to be mapped in EPT entries unless / until we have a solution to
> >> >> the DMA synchronization problem. Apologies for not noticing this
> >> >> earlier.
> >> >
> >> > QEMU does not truncate or punch holes of the file once it has been
> >> > mmap()'ed. Does the problem [1] still exist in such case?
> >>
> >> Something else on the system might. The only agent that could enforce
> >> protection is the kernel, and the kernel will likely just disallow
> >> passing addresses from filesystem-dax vmas through to a guest
> >> altogether. I think there's even a problem in the non-DAX case unless
> >> KVM is pinning pages while they are handed out to a guest. The problem
> >> is that we don't have a page cache page to pin in the DAX case.
> >>
> >
> > Does it mean any user-space code like
> >   ptr = mmap(..., fd, ...); // fd refers to a file on DAX filesystem
> >   // make DMA to ptr
> > is unsafe?
> 
> Yes, it is currently unsafe because there is no coordination with the
> filesytem if it decides to make block layout changes. We can fix that
> in the non-virtualization case by having the filesystem wait for DMA
> completion callbacks (i.e. what for all pages to be idle), but as far
> as I can see we can't do the same coordination for DMA initiated by a
> guest device driver.
> 

I think that fix [1] also works for KVM/QEMU. The guest DMA are
performed on two types of devices:

1. For emulated devices, the guest DMA requests are trapped and
   actually performed by QEMU on the host side. The host side fix [1]
   can cover this case.

2. For passthrough devices, vfio pins all pages, including those
   backed by dax mode files, used by the guest if any device is
   passthroughed to it. If I read the commit message in [2] correctly,
   operations that change the page-to-file offset association of pages
   from dax mode files will be deferred until the reference count of
   the affected pages becomes 1.  That is, if any passthrough device
   is used with a VM, the changes of page-to-file offset will not be
   able to happen until the VM is shutdown, so the fix [1] still takes
   effect here.

Another question is how a user-space application (e.g., QEMU) knows
whether it's safe to mmap a file on the DAX file system?

[1] https://lists.01.org/pipermail/linux-nvdimm/2017-December/013704.html
[2] https://lists.01.org/pipermail/linux-nvdimm/2017-December/013713.html


Thanks,
Haozhong



Re: [Qemu-devel] [PATCH v2 02/19] Clean up includes

2018-01-31 Thread Fam Zheng
On Wed, Jan 31, 2018 at 11:48 PM, Thomas Huth  wrote:
> On 31.01.2018 15:48, Markus Armbruster wrote:
>> Clean up includes so that osdep.h is included first and headers
>> which it implies are not included manually.
>>
>> This commit was created with scripts/clean-includes, with the change
>> to target/s390x/gen-features.c manually reverted, and blank lines
>> around deletions collapsed.
>
> Reviewed-by: Thomas Huth 
>
> I wonder whether it would make sense to add a check based on
> scripts/clean-includes to patchew already, so that these includes do not
> sneak in so easily again...?

(Having not looked at the series), are we clean with this series
applied? If so it makes a good point to do so.

Fam

>
>  Thomas



Re: [Qemu-devel] [PATCH v5 2/5] migration: Set the migration tcp port

2018-01-31 Thread Peter Xu
On Wed, Jan 31, 2018 at 01:35:33PM +0100, Juan Quintela wrote:
> Peter Xu  wrote:
> > On Mon, Jan 29, 2018 at 01:17:51PM +0100, Juan Quintela wrote:
> >> We can set the port parameter as zero.  This patch lets us know what
> >> port the system was choosen for us.  Now we can migrate to this place.
> >> 
> >> Signed-off-by: Juan Quintela 
> >> 
> >> --
> >> 
> >> This was migrate_set_uri(), but as we only need the tcp_port, change
> >> to that one.
> >> ---
> >>  migration/migration.c | 10 ++
> >>  migration/migration.h |  2 ++
> >>  migration/socket.c| 35 ++-
> >>  3 files changed, 42 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index eb6958dcda..53818a87af 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -245,6 +245,16 @@ void migrate_send_rp_req_pages(MigrationIncomingState 
> >> *mis, const char *rbname,
> >>  }
> >>  }
> >>  
> >> +void migrate_set_port(const uint16_t port, Error **errp)
> >> +{
> >> +MigrateSetParameters p = {
> >> +.has_x_tcp_port = true,
> >> +.x_tcp_port = port,
> >> +};
> >> +
> >> +qmp_migrate_set_parameters(, errp);
> >
> > If we are calling qmp_migrate_set_parameters() here, does it mean that
> > user can also set this parameter via QMP?
> 
> Yeap.  We do that, or we invent yet another mechanism to update the
> tcp_port parameter :-(
> 
> You can't have both.
> 
> if the user modifies it, it just shots itself it its feet, no?

How about set this directly? :)  Like:

  migrate_get_current()->parameters.x_tcp_port = xxx;

Maybe also add a comment showing that this is a special case.  I just
feel strange that if user can set it, but I'll follow your decision
even if you really want to keep the qmp_*() call since after all this
is for debugging, and QEMU itself won't use this value now.

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v2] Small IPMI fixes

2018-01-31 Thread Michael S. Tsirkin
On Wed, Jan 31, 2018 at 06:52:25PM +0100, Greg Kurz wrote:
> On Tue, 23 Jan 2018 10:55:15 -0600
> Corey Minyard  wrote:
> 
> > Michael?  Anyone?  Can we get these in?
> > 
> 
> Since you're adding yourself as a maintainer for IPMI, why not just sending
> a pull request ?

Alternatively if you want others to merge it, pls Cc the relevant people.

> > Thanks
> > 
> > -corey
> > 
> > On 01/15/2018 06:57 PM, miny...@acm.org wrote:
> > > These are some fixes I've had for a while (mostly).  They are small
> > > fixes and additions for various things.  It would be nice to be able
> > > to get these into the upcoming release.
> > >
> > > I also add myself as the maintainer for the IPMI code, since that
> > > seemed appropriate.
> > >
> > > Changes in v2:
> > >
> > > I removed the "other" fixes (the one to vl.c) from the previous patch
> > > set, these are all IPMI-related changes.  I also moved adding myself
> > > as the IPMI maintainer to the first patch, per Marc-André.
> > >  
> > 
> > 



Re: [Qemu-devel] Windows balloon driver PFN issue

2018-01-31 Thread Michael S. Tsirkin
On Thu, Feb 01, 2018 at 10:18:53AM +0800, Peter Xu wrote:
> On Wed, Jan 31, 2018 at 04:03:12PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Jan 31, 2018 at 05:28:35PM +0800, Peter Xu wrote:
> > > Hi, Michael and the list,
> > > 
> > > I observed this on windows 8 enterprise guests, when doing memory 
> > > ballooning:
> > > 
> > > 23892@1517298572.328354:virtio_balloon_to_target balloon target: 
> > > 0x8000 num_pages: 524288
> > > 23892@1517298638.542819:virtio_balloon_get_config num_pages: 524288 
> > > actual: 0
> > > 23892@1517298638.542974:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x174604000
> > > 23892@1517298638.543059:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x0
> > > 23892@1517298638.543135:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x17460a000
> > > 23892@1517298638.543140:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x0
> > > 23892@1517298638.543143:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x17460b000
> > > 23892@1517298638.543146:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x0
> > > 23892@1517298638.543148:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x17460c000
> > > 23892@1517298638.543152:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x0
> > > 23892@1517298638.543154:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x17460d000
> > > 23892@1517298638.543159:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x0
> > > 23892@1517298638.543162:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x17460e000
> > > 23892@1517298638.543165:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x0
> > > 23892@1517298638.543167:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x17460f000
> > > 23892@1517298638.543170:virtio_balloon_handle_output section name: pc.ram 
> > > gpa: 0x0
> > > ...
> > > 
> > > I think it's very possible that these zero addresses (please let me
> > > know what the first 4K page is used for if anyone knows, since IIUC
> > > that's what we throw away now) are half of the 64bit PFN.  Or say, not
> > > sure whether this means a windows guest driver bug that is using
> > > 64bits for PFN rather than 32bits (and I suppose the protocol is using
> > > 32bit for PFNs).
> > > 
> > > Michael, do you know what to do with this?
> > > 
> > > Thanks,
> > 
> > PFN is GPA>>12.  Do you have more than 1<<44 bytes of memory in this VM 
> > then?
> 
> No.  But isn't it still not good to drop the page at offset zero (and
> drop it NNN times)?

Absolutely - looks like a bug. I just don't know why does this happen.

>  And I'm not sure what will happen if guest has
> 1<<44 bytes; then we'll possibly drop very random addresses since a
> real address will be splitted?
> 
> Thanks,

The balloon won't work, period. There was an interface change to fix
that but implementation issues remained and contributor seems to be busy
with page hints.

> -- 
> Peter Xu



Re: [Qemu-devel] [RFC PATCH qemu v6] vfio-pci: Allow mmap of MSIX BAR

2018-01-31 Thread Alexey Kardashevskiy
On 01/02/18 05:57, Alex Williamson wrote:
> On Wed, 31 Jan 2018 18:24:45 +1100
> Alexey Kardashevskiy  wrote:
> 
>> This makes use of a new VFIO_REGION_INFO_CAP_MSIX_MAPPABLE capability
>> which tells that a region with MSIX data can be mapped entirely, i.e.
>> the VFIO PCI driver won't prevent MSIX vectors area from being mapped.
>>
>> With this change, all BARs are mapped in a single chunk and MSIX vectors
>> are emulated on top unless the machine requests not to by defining and
>> enabling a new "vfio-no-msix-emulation" property. At the moment only
>> sPAPR machine does so - it prohibits MSIX emulation and does not allow
>> enabling it as it does not define the "set" callback for the new property;
>> the new property also does not appear in "-machine pseries,help".
>>
>> This change may affect machines which are using MSIX emulation and
>> mapping MMIO RAM regions for DMA as previously MSIX containing BAR
>> would be split in chunks aligned to the system page size and we would
>> only see aligned RAM memory sections in vfio_listener_region_add/del.
>> As the system page size is usually is equal or bigger than the IOMMU
>> page size, vfio_dma_map() did not have a legitimate reason to fail so
>> mapping failures were fatal. Note that this behaviour silently skips
>> parts of BARs adjacent to the MSIX data.
>>
>> With this change, we map entire MSIX BAR and emulate MSIX on top of it
>> so vfio_listener_region_add/del may be called with unaligned MMIO RAM
>> regions and vfio_dma_map() will fail on these. In order to mitigate
>> this, this changes vfio_listener_region_add() not to try vfio_dma_map() if
>> the memory section is not aligned to the minimal IOMMU page; an error
>> is printed instead. This also treats all errors from vfio_dma_map()
>> non fatal when the listener is called on the MMIO RAM region.
>> vfio_listener_region_del() is adjusted accordingly.
>>
>> If the amount of errors printed is overwhelming, the MSIX relocation
>> could be used to avoid excessive error output.
>>
>> This requires the kernel change - "vfio-pci: Allow mapping MSIX BAR" -
>> for the new capability: https://www.spinics.net/lists/kvm/msg160282.html
> 
> Looks like there are at least 3 or 4 patches here:

RFC...


> A) Update linux-headers (gated by v4.16-rc1)

Sure, I just do not see it in any upstream yet.


> B) vfio_is_cap_present
> C) check alignment on region_add/del vs hostwin, non-fatal ram_device


Having B) without C) will break bisectability. Having C) before B) will not
trigger any new code paths so it will break bisectability too. I'd have B)
and C) in a single patch.

> D) platform directed disabling of MSI-X MemoryRegion

Sure.


>  
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>> Changes:
>> v6:
>> * dropped that stupid p2p property patch
>>
>> v5:
>> * rebased on top of 'p2p' proposed patch
>>
>> v4:
>> * silenced dma map errors if unaligned mapping is attempted - they are going
>> to fail anyway
>>
>> v3:
>> * vfio_listener_region_add() won't make qemu exit if failed on MMIO MR
>> ---
>>  include/hw/vfio/vfio-common.h |  1 +
>>  linux-headers/linux/vfio.h|  5 
>>  hw/ppc/spapr.c|  7 +
>>  hw/vfio/common.c  | 66 
>> +++
>>  hw/vfio/pci.c | 10 +++
>>  5 files changed, 83 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index f3a2ac9..927d600 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -171,6 +171,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
>>   struct vfio_region_info **info);
>>  int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
>>   uint32_t subtype, struct vfio_region_info 
>> **info);
>> +bool vfio_is_cap_present(VFIODevice *vbasedev, uint16_t cap_type, int 
>> region);
>>  #endif
>>  extern const MemoryListener vfio_prereg_listener;
>>  
>> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
>> index 4312e96..b45182e 100644
>> --- a/linux-headers/linux/vfio.h
>> +++ b/linux-headers/linux/vfio.h
>> @@ -301,6 +301,11 @@ struct vfio_region_info_cap_type {
>>  #define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG  (2)
>>  #define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG   (3)
>>  
>> +/*
>> + * The MSIX mappable capability informs that MSIX data of a BAR can be 
>> mmapped.
>> + */
>> +#define VFIO_REGION_INFO_CAP_MSIX_MAPPABLE  3
>> +
>>  /**
>>   * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
>>   *  struct vfio_irq_info)
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 32a876b..6d333e2 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2830,6 +2830,11 @@ static void spapr_set_modern_hotplug_events(Object 
>> *obj, bool value,
>>  spapr->use_hotplug_event_source = value;
>>  }
>>  
>> +static bool 

Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-01-31 Thread Michael S. Tsirkin
On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> On Wed, Jan 31, 2018 at 11:10:07PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Jan 31, 2018 at 06:40:59PM -0200, Eduardo Habkost wrote:
> > > On Wed, Jan 17, 2018 at 11:54:18AM +0200, Marcel Apfelbaum wrote:
> > > > Currently only file backed memory backend can
> > > > be created with a "share" flag in order to allow
> > > > sharing guest RAM with other processes in the host.
> > > > 
> > > > Add the "share" flag also to RAM Memory Backend
> > > > in order to allow remapping parts of the guest RAM
> > > > to different host virtual addresses. This is needed
> > > > by the RDMA devices in order to remap non-contiguous
> > > > QEMU virtual addresses to a contiguous virtual address range.
> > > > 
> > > 
> > > Why do we need to make this configurable?  Would anything break
> > > if MAP_SHARED was always used if possible?
> > 
> > See Documentation/vm/numa_memory_policy.txt for a list
> > of complications.
> 
> Ew.
> 
> > 
> > Maybe we should more of an effort to detect and report these
> > issues.
> 
> Probably.  Having other features breaking silently when using
> pvrdma doesn't sound good.  We must at least document those
> problems in the documentation for memory-backend-ram.
> 
> BTW, what's the root cause for requiring HVAs in the buffer?

It's a side effect of the kernel/userspace API which always wants
a single HVA/len pair to map memory for the application.


>  Can
> this be fixed?

I think yes.  It'd need to be a kernel patch for the RDMA subsystem
mapping an s/g list with actual memory. The HVA/len pair would then just
be used to refer to the region, without creating the two mappings.

Something like splitting the register mr into

mr = create mr (va/len) - allocate a handle and record the va/len

addmemory(mr, offset, hva, len) - pin memory

register mr - pass it to HW

As a nice side effect we won't burn so much virtual address space.

This will fix rdma with hugetlbfs as well which is currently broken.


> -- 
> Eduardo



[Qemu-devel] [PATCH v3] docs: Add docs/devel/testing.rst

2018-01-31 Thread Fam Zheng
To make our efforts on QEMU testing easier to consume by contributors,
let's add a document. For example, Patchew reports build errors on
patches that should be relatively easy to reproduce with a few steps, and
it is much nicer if there is such a documentation that it can refer to.

This focuses on how to run existing tests and how to write new test
cases, without going into the frameworks themselves.

The VM based testing section is moved from tests/vm/README which now
is a single line pointing to the new doc.

Signed-off-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 

---

v3: Fix "development", "reclamation" and "tests". [Eric]
Fix "libqtest". [Andrew]
Add Stefan's reviewed-by.

v2: Fix spelling errors and improve wordings. [Stefan, Eric, Thomas]
Example test name "test-foo -> foo-test". The mass renaming will be
done in another series. [Thomas, Eric]
Document how to debug unit tests and qtests. [Eric]
Document group permission setup for docker, and the privilege risks.
[Alex]
Update tests/vm/README.
---
 docs/devel/testing.rst | 486 +
 tests/vm/README|  90 +
 2 files changed, 487 insertions(+), 89 deletions(-)
 create mode 100644 docs/devel/testing.rst

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
new file mode 100644
index 00..0ca1a2d4b5
--- /dev/null
+++ b/docs/devel/testing.rst
@@ -0,0 +1,486 @@
+===
+Testing in QEMU
+===
+
+This document describes the testing infrastructure in QEMU.
+
+Testing with "make check"
+=
+
+The "make check" testing family includes most of the C based tests in QEMU. For
+a quick help, run ``make check-help`` from the source tree.
+
+The usual way to run these tests is:
+
+.. code::
+
+  make check
+
+which includes QAPI schema tests, unit tests, and QTests. Different sub-types
+of "make check" tests will be explained below.
+
+Before running tests, it is best to build QEMU programs first. Some tests
+expect the executables to exist and will fail with obscure messages if they
+cannot find them.
+
+Unit tests
+--
+
+Unit tests, which can be invoked with ``make check-unit``, are simple C tests
+that typically link to individual QEMU object files and exercise them by
+calling exported functions.
+
+If you are writing new code in QEMU, consider adding a unit test, especially
+for utility modules that are relatively stateless or have few dependencies. To
+add a new unit test:
+
+1. Create a new source file. For example, ``tests/foo-test.c``.
+
+2. Write the test. Normally you would include the header file which exports
+   the module API, then verify the interface behaves as expected from your
+   test. The test code should be organized with the glib testing framework.
+   Copying and modifying an existing test is usually a good idea.
+
+3. Add the test to ``tests/Makefile.include``. First, name the unit test
+   program and add it to ``$(check-unit-y)``; then add a rule to build the
+   executable. Optionally, you can add a magical variable to support ``gcov``.
+   For example:
+
+.. code::
+
+  check-unit-y += tests/foo-test$(EXESUF)
+  tests/foo-test$(EXESUF): tests/foo-test.o $(test-util-obj-y)
+  ...
+  gcov-files-foo-test-y = util/foo.c
+
+Since unit tests don't require environment variables, the simplest way to debug
+a unit test failure is often directly invoking it or even running it under
+``gdb``. However there can still be differences in behavior between ``make``
+invocations and your manual run, due to ``$MALLOC_PERTURB_`` environment
+variable (which affects memory reclamation and catches invalid pointers better)
+and gtester options. If necessary, you can run
+
+.. code::
+  make check-unit V=1
+
+and copy the actual command line which executes the unit test, then run
+it from the command line.
+
+QTest
+-
+
+QTest is a device emulation testing framework.  It can be very useful to test
+device models; it could also control certain aspects of QEMU (such as virtual
+clock stepping), with a special purpose "qtest" protocol.  Refer to the
+documentation in ``qtest.c`` for more details of the protocol.
+
+QTest cases can be executed with
+
+.. code::
+
+   make check-qtest
+
+The QTest library is implemented by ``tests/libqtest.c`` and the API is defined
+in ``tests/libqtest.h``.
+
+Consider adding a new QTest case when you are introducing a new virtual
+hardware, or extending one if you are adding functionalities to an existing
+virtual device.
+
+On top of libqtest, a higher level library, ``libqos``, was created to
+encapsulate common tasks of device drivers, such as memory management and
+communicating with system buses or devices. Many virtual device tests use
+libqos instead of directly calling into libqtest.
+
+Steps to add a new QTest case are:
+
+1. Create a new source file for the test. (More than one file can be added as
+   necessary.) For 

Re: [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel

2018-01-31 Thread Liang Li
On Tue, Jan 30, 2018 at 03:18:31PM -0500, John Snow wrote:
> 
> 
> On 01/30/2018 03:38 AM, Liang Li wrote:
>> When doing drive mirror to a low speed shared storage, if there was heavy
>> BLK IO write workload in VM after the 'ready' event, drive mirror block job
>> can't be canceled immediately, it would keep running until the heavy BLK IO
>> workload stopped in the VM.
>> 
>> Because libvirt depends on block-job-cancel for block live migration, the
>> current block-job-cancel has the semantic to make sure data is in sync after
>> the 'ready' event.  This semantic can't meet some requirement, for example,
>> people may use drive mirror for realtime backup while need the ability of
>> block live migration. If drive mirror can't not be cancelled immediately,
>> it means block live migration need to wait, because libvirt make use drive
>> mirror to implement block live migration and only one drive mirror block
>> job is allowed at the same time for a give block dev.
>> 
>> We need a new interface for 'force cancel', which could quit block job
>> immediately if don't care about whether data is in sync or not.
>> 
>> 'force' is not used by libvirt currently, to make things simple, change
>> it's semantic slightly, hope it will not break some use case which need its
>> original semantic.
>> 
>> Cc: Paolo Bonzini 
>> Cc: Jeff Cody 
>> Cc: Kevin Wolf 
>> Cc: Max Reitz 
>> Cc: Eric Blake 
>> Cc: John Snow 
>> Reported-by: Huaitong Han 
>> Signed-off-by: Huaitong Han 
>> Signed-off-by: Liang Li 
>> ---
>> block/mirror.c|  9 +++--
>> blockdev.c|  4 ++--
>> blockjob.c| 11 ++-
>> hmp-commands.hx   |  3 ++-
>> include/block/blockjob.h  |  9 -
>> qapi/block-core.json  |  6 --
>> tests/test-blockjob-txn.c |  8 
>> 7 files changed, 29 insertions(+), 21 deletions(-)
>> 
>> diff --git a/block/mirror.c b/block/mirror.c
>> index c9badc1..c22dff9 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -869,11 +869,8 @@ static void coroutine_fn mirror_run(void *opaque)
>> 
>> ret = 0;
>> trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
>> -if (!s->synced) {
>> -block_job_sleep_ns(>common, delay_ns);
>> -if (block_job_is_cancelled(>common)) {
>> -break;
>> -}
>> +if (block_job_is_cancelled(>common) && s->common.force) {
>> +break;
> 
> what's the justification for removing the sleep in the case that
> !s->synced && !block_job_is_cancelled(...) ?
> 
if !block_job_is_cancelled() satisfied, the code in 'if (!should_complete) {}'
will execute, there is a block_job_sleep_ns there.

block_job_sleep_ns is for rate throttling, if there is no more data to sync, 
sleep is not needed, right?

>> } else if (!should_complete) {
>> delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
>> block_job_sleep_ns(>common, delay_ns);
>> @@ -887,7 +884,7 @@ immediate_exit:
>>  * or it was cancelled prematurely so that we do not guarantee that
>>  * the target is a copy of the source.
>>  */
>> -assert(ret < 0 || (!s->synced && 
>> block_job_is_cancelled(>common)));
>> +assert(ret < 0 || block_job_is_cancelled(>common));
> 
> This assertion gets weaker in the case where force isn't provided, is
> that desired?
> 
yes. if force quit is used, the following condition can be true

(ret >= 0) && (s->synced) && (block_job_is_cancelled(>common)) 

so the above assert should be changed, or it will be failed.

>> assert(need_drain);
>> mirror_wait_for_all_io(s);
>> }
>> diff --git a/blockdev.c b/blockdev.c
>> index 8e977ee..039f156 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -145,7 +145,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
>> aio_context_acquire(aio_context);
>> 
>> if (bs->job) {
>> -block_job_cancel(bs->job);
>> +block_job_cancel(bs->job, false);
>> }
>> 
>> aio_context_release(aio_context);
>> @@ -3802,7 +3802,7 @@ void qmp_block_job_cancel(const char *device,
>> }
>> 
>> trace_qmp_block_job_cancel(job);
>> -block_job_cancel(job);
>> +block_job_cancel(job, force);
>> out:
>> aio_context_release(aio_context);
>> }
>> diff --git a/blockjob.c b/blockjob.c
>> index f5cea84..0aacb50 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -365,7 +365,7 @@ static void block_job_completed_single(BlockJob *job)
>> block_job_unref(job);
>> }
>> 
>> -static void block_job_cancel_async(BlockJob *job)
>> +static void block_job_cancel_async(BlockJob *job, bool force)
>> {
>> if (job->iostatus != BLOCK_DEVICE_IO_STATUS_OK) {
>> block_job_iostatus_reset(job);
>> @@ -376,6 

Re: [Qemu-devel] Windows balloon driver PFN issue

2018-01-31 Thread Peter Xu
On Wed, Jan 31, 2018 at 04:03:12PM +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 31, 2018 at 05:28:35PM +0800, Peter Xu wrote:
> > Hi, Michael and the list,
> > 
> > I observed this on windows 8 enterprise guests, when doing memory 
> > ballooning:
> > 
> > 23892@1517298572.328354:virtio_balloon_to_target balloon target: 0x8000 
> > num_pages: 524288
> > 23892@1517298638.542819:virtio_balloon_get_config num_pages: 524288 actual: > > 0
> > 23892@1517298638.542974:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x174604000
> > 23892@1517298638.543059:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x0
> > 23892@1517298638.543135:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x17460a000
> > 23892@1517298638.543140:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x0
> > 23892@1517298638.543143:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x17460b000
> > 23892@1517298638.543146:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x0
> > 23892@1517298638.543148:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x17460c000
> > 23892@1517298638.543152:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x0
> > 23892@1517298638.543154:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x17460d000
> > 23892@1517298638.543159:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x0
> > 23892@1517298638.543162:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x17460e000
> > 23892@1517298638.543165:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x0
> > 23892@1517298638.543167:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x17460f000
> > 23892@1517298638.543170:virtio_balloon_handle_output section name: pc.ram 
> > gpa: 0x0
> > ...
> > 
> > I think it's very possible that these zero addresses (please let me
> > know what the first 4K page is used for if anyone knows, since IIUC
> > that's what we throw away now) are half of the 64bit PFN.  Or say, not
> > sure whether this means a windows guest driver bug that is using
> > 64bits for PFN rather than 32bits (and I suppose the protocol is using
> > 32bit for PFNs).
> > 
> > Michael, do you know what to do with this?
> > 
> > Thanks,
> 
> PFN is GPA>>12.  Do you have more than 1<<44 bytes of memory in this VM then?

No.  But isn't it still not good to drop the page at offset zero (and
drop it NNN times)?  And I'm not sure what will happen if guest has
1<<44 bytes; then we'll possibly drop very random addresses since a
real address will be splitted?

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v2] docs: Add docs/devel/testing.rst

2018-01-31 Thread Fam Zheng
On Wed, Jan 31, 2018 at 10:24 PM, Eric Blake  wrote:
> On 01/30/2018 09:28 PM, Fam Zheng wrote:
>> To make our efforts on QEMU testing easier to consume by contributors,
>> let's add a document. For example, Patchew reports build errors on
>> patches that should be relatively easy to reproduce with a few steps, and
>> it is much nicer if there is such a documentation that it can refer to.
>>
>> This focuses on how to run existing tests and how to write new test
>> cases, without going into the frameworks themselves.
>>
>> The VM based testing section is moved from tests/vm/README which now
>> is a single line pointing to the new doc.
>>
>> Signed-off-by: Fam Zheng 
>>
>
>> +Testing with "make check"
>> +=
>> +
>> +The "make check" testing family includes most of the C based tests in QEMU. 
>> For
>> +a quick help, run ``make check-help`` from the source tree.
>> +
>> +The usual way to run these tests is:
>> +
>> +.. code::
>> +
>> +  make check
>> +
>> +which includes QAPI schema tests, unit tests, and QTests. Different 
>> sub-types
>> +of "make check" testings will be explained below.
>
> s/testings/tests/ or even s/testings//
>
>
>> +Since unit tests don't require environment variables, the simplest way to 
>> debug
>> +a unit test failure is often directly invoking it or even running it under
>> +``gdb``. However there can still be differences in behavior between ``make``
>> +invocations and your manual run, due to ``$MALLOC_PERTURB_`` environment
>> +variable (which affects memory reclaimation and catches invalid pointers
>
> s/reclaimation/reclamation/
>
>
>> +  $ sudo groupadd docker
>> +  $ sudo usermod $USER -G docker
>> +  $ sudo chown :docker /var/run/docker.sock
>> +
>> +Note that any one of above configurations makes it possible for the user to
>> +exploit the whole host with Docker bind mounting or other privileged
>> +operations.  So only do it on developement machines.
>
> s/developement/development/

Thank you! Will update. I should have run it through the Grammarly
check one more time. :)

Fam



Re: [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel

2018-01-31 Thread Liang Li
On Tue, Jan 30, 2018 at 08:20:03AM -0600, Eric Blake wrote:
> On 01/30/2018 02:38 AM, Liang Li wrote:
>> When doing drive mirror to a low speed shared storage, if there was heavy
>> BLK IO write workload in VM after the 'ready' event, drive mirror block job
>> can't be canceled immediately, it would keep running until the heavy BLK IO
>> workload stopped in the VM.
> 
> So far so good.   But the grammar and explanation in the rest of the
> commit is a bit hard to read; let me give a shot at an alternative wording:
> 
> Libvirt depends on the current block-job-cancel semantics, which is that
> when used without a flag after the 'ready' event, the command blocks
> until data is in sync.  However, these semantics are awkward in other
> situations, for example, people may use drive mirror for realtime
> backups while still wanting to use block live migration.  Libvirt cannot
> start a block live migration while another drive mirror is in progress,
> but the user would rather abandon the backup attempt as broken and
> proceed with the live migration than be stuck waiting for the current
> drive mirror backup to finish.
> 
> The drive-mirror command already includes a 'force' flag, which libvirt
> does not use, although it documented the flag as only being useful to
> quit a job which is paused.  However, since quitting a paused job has
> the same effect as abandoning a backup in a non-paused job (namely, the
> destination file is not in sync, and the command completes immediately),
> we can just improve the documentation to make the force flag obviously
> useful.
> 

much better, will include in the v2. Thanks!
>> 
>> Cc: Paolo Bonzini 
>> Cc: Jeff Cody 
>> Cc: Kevin Wolf 
>> Cc: Max Reitz 
>> Cc: Eric Blake 
>> Cc: John Snow 
>> Reported-by: Huaitong Han 
>> Signed-off-by: Huaitong Han 
>> Signed-off-by: Liang Li 
>> ---
> 
> 
>> +++ b/hmp-commands.hx
>> @@ -106,7 +106,8 @@ ETEXI
>> .args_type  = "force:-f,device:B",
>> .params = "[-f] device",
>> .help   = "stop an active background block operation (use -f"
>> -  "\n\t\t\t if the operation is currently paused)",
>> +  "\n\t\t\t if you want to abort the operation 
>> immediately"
>> +  "\n\t\t\t instead of keep running until data is in 
>> sync )",
> 
> s/sync )/sync)/
> 

done
>> .cmd= hmp_block_job_cancel,
>> },
>> 
>> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
>> index 00403d9..4a96c42 100644
>> --- a/include/block/blockjob.h
>> +++ b/include/block/blockjob.h
>> @@ -63,6 +63,12 @@ typedef struct BlockJob {
>> bool cancelled;
>> 
>> /**
>> + * Set to true if the job should be abort immediately without waiting
> 
> s/be //

done
> 
>> + * for data is in sync.
> 
> s/is/to be/
> 

done
>> + */
>> +bool force;
>> +
>> +/**
>>  * Counter for pause request. If non-zero, the block job is either 
>> paused,
>>  * or if busy == true will pause itself as soon as possible.
>>  */
>> @@ -218,10 +224,11 @@ void block_job_start(BlockJob *job);
>> /**
>>  * block_job_cancel:
>>  * @job: The job to be canceled.
>> + * @force: Quit a job without waiting data is in sync.
> 
> s/data is/for data to be/
> 

done
>> +++ b/qapi/block-core.json
>> @@ -2098,8 +2098,10 @@
>> #  the name of the parameter), but since QEMU 2.7 it can have
>> #  other values.
>> #
>> -# @force: whether to allow cancellation of a paused job (default
>> -# false).  Since 1.3.
>> +# @force: #optional whether to allow cancellation a job without waiting 
>> data is
> 
> The '#optional' tag should no longer be added.
> 
>> +# in sync, please not that since 2.12 it's semantic is not exactly 
>> the
>> +# same as before, from 1.3 to 2.11 it means whether to allow 
>> cancellation
>> +# of a paused job (default false).  Since 1.3.
> 
> Reads awkwardly.  I suggest:
> 
> @force: If true, and the job has already emitted the event
> BLOCK_JOB_READY, abandon the job immediately (even if it is paused)
> instead of waiting for the destination to complete its final
> synchronization (since 1.3)
> 

much more clear
> 
>> +++ b/tests/test-blockjob-txn.c
>> @@ -125,7 +125,7 @@ static void test_single_job(int expected)
>> block_job_start(job);
>> 
>> if (expected == -ECANCELED) {
>> -block_job_cancel(job);
>> +block_job_cancel(job, false);
>> }
>> 
>> while (result == -EINPROGRESS) {
>> @@ -173,10 +173,10 @@ static void test_pair_jobs(int expected1, int 
>> expected2)
>> block_job_txn_unref(txn);
>> 
>> if (expected1 == -ECANCELED) {
>> -block_job_cancel(job1);
>> +block_job_cancel(job1, false);
>> }
>> if (expected2 == -ECANCELED) {
>> -   

Re: [Qemu-devel] vhost-user question

2018-01-31 Thread jack.chen
thanks,but I really can not understand how the fd works,can someone
explain it or give me some  reference material??


2018-02-01 1:31 GMT+08:00 Dr. David Alan Gilbert :
> * jack.chen (zhun...@gmail.com) wrote:
>> hello,I am confused when I read vhost-user source code in qemu.I know
>> vhost-user app shared memory with qemu by mmap,but why it can use fd which
>> is belong to qemu?
>> relative code:
>>  qemu code in function vhost_user_set_mem_table
>> fd = memory_region_get_fd(mr);
>> if (fd > 0) {
>> msg.payload.memory.regions[fd_num].userspace_addr =
>> reg->userspace_addr;
>> msg.payload.memory.regions[fd_num].memory_size  =
>> reg->memory_size;
>> msg.payload.memory.regions[fd_num].guest_phys_addr =
>> reg->guest_phys_addr;
>> msg.payload.memory.regions[fd_num].mmap_offset = offset;
>> assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
>> fds[fd_num++] = fd;
>> }
>>
>> ……
>> DPDK code in vhost_user_set_mem_table
>>
>> mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
>> MAP_SHARED | MAP_POPULATE, fd, 0);
>> ……
>>
>> thanks a lot!
>
> Because that's how the dpdk/vhost-user binary knows what to mmap;
> each fd corresponds to the backing file of the memory area that's being
> shared.  This way the dpdk/vhost doesn't need to open those files itself
> or try and match the exact memory configuration of qemu; QEMU just gives
> it the exact thing it needs to mmap - which is just the fd and offsets.
>
> Dave
>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH] sii3112: Change angle brackets to quotes in #include lines

2018-01-31 Thread David Gibson
On Tue, Jan 30, 2018 at 02:10:10PM +0100, BALATON Zoltan wrote:
> This matches what other files do for qemu includes
> 
> Signed-off-by: BALATON Zoltan 

Applied, thanks.

> ---
>  hw/ide/sii3112.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
> index 17aa930..e3896c6 100644
> --- a/hw/ide/sii3112.c
> +++ b/hw/ide/sii3112.c
> @@ -12,8 +12,8 @@
>   * http://wiki.osdev.org/User:Quok/Silicon_Image_Datasheets
>   */
>  
> -#include 
> -#include 
> +#include "qemu/osdep.h"
> +#include "hw/ide/pci.h"
>  #include "trace.h"
>  
>  #define TYPE_SII3112_PCI "sii3112"

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH] iotests: Fix CID for VMDK afl image

2018-01-31 Thread Fam Zheng
On Thu, Feb 1, 2018 at 2:58 AM, Max Reitz  wrote:
> On 2018-01-30 07:25, Fam Zheng wrote:
>> This reverts commit 76bf133c4 which updated the reference output, and
>> fixed the reference image, because the code path we want to exercise is
>> actually the invalid image size.
>>
>> The descriptor block in the image, which includes the CID to verify, has been
>> invalid since the reference image was added. Since commit 9877860e7bd we 
>> report
>> this error earlier than the "file too large", so 059.out mismatches.
>>
>> The binary change is generated along the operations of:
>>
>>   $ bunzip2 afl9.vmdk.bz2
>>   $ qemu-img create -f vmdk fix.vmdk 1G
>>   $ dd if=afl9.vmdk.bz2 of=fix.vmdk bs=512 count=1 conv=notrunc
>>   $ mv fix.vmdk afl9.vmdk
>>   $ bzip2 afl9.vmdk
>>
>> Signed-off-by: Fam Zheng 
>>
>> ---
>>
>> v2: Fix commit message "qcow2 -> vmdk". [Kevin]
>> Revert 76bf133c4.
>
> H, now this fails again on my 32 bit build. :-(
>
> The issue there is that you get a "Cannot allocate memory" when trying
> to open the file.  My current fix was 2291712c39111a732 which simply
> converted that to "Invalid argument", but now it's getting a bit more
> complicated...  Should I just continue to play the game and check the
> output for "Cannot allocate memory" and print exactly what the reference
> output is expecting...?

Ahhh. OK, then, with a big comment.

I'd say let's just _notrun on 32 bit.

Fam



Re: [Qemu-devel] [PATCH v3 18/18] sdcard: add an enum for the SD PHY Spec version

2018-01-31 Thread Alistair Francis
On Mon, Jan 22, 2018 at 7:30 PM, Philippe Mathieu-Daudé  wrote:
> So far this device intends to model the Spec v1.10
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/sd/sd.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 1f6c4ce2a4..9880a5d090 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -47,6 +47,11 @@
>
>  //#define DEBUG_SD 1
>
> +typedef enum {
> +SD_PHY_SPEC_VER_1_10 = 110,
> +SD_PHY_SPEC_VER_2_00 = 200, /* not yet supported */
> +} sd_phy_spec_ver_t;
> +
>  typedef enum {
>  sd_r0 = 0,/* no response */
>  sd_r1,/* normal response command */
> @@ -122,6 +127,7 @@ struct SDState {
>  qemu_irq inserted_cb;
>  QEMUTimer *ocr_power_timer;
>  const char *proto_name;
> +int spec_version;
>  bool enable;
>  uint8_t dat_lines;
>  bool cmd_line;
> @@ -2191,6 +2197,7 @@ static void sd_realize(DeviceState *dev, Error **errp)
>  int ret;
>
>  sd->proto_name = sd->spi ? "SPI" : "SD";
> +sd->spec_version = SD_PHY_SPEC_VER_1_10;
>
>  if (sd->blk && blk_is_read_only(sd->blk)) {
>  error_setg(errp, "Cannot use read-only drive as SD card");
> --
> 2.15.1
>
>



Re: [Qemu-devel] [PATCH v3 13/18] sdcard: simplify SEND_IF_COND (CMD8)

2018-01-31 Thread Alistair Francis
On Mon, Jan 22, 2018 at 7:30 PM, Philippe Mathieu-Daudé  wrote:
> replace switch(single case) -> if()
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/sd/sd.c | 26 +++---
>  1 file changed, 11 insertions(+), 15 deletions(-)
>
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index b5c947df62..707c294169 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1015,23 +1015,19 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>
>  case 8:/* CMD8:   SEND_IF_COND */
>  /* Physical Layer Specification Version 2.00 command */
> -switch (sd->state) {
> -case sd_idle_state:
> -sd->vhs = 0;
> -
> -/* No response if not exactly one VHS bit is set.  */
> -if (!(req.arg >> 8) || (req.arg >> (ctz32(req.arg & ~0xff) + 
> 1))) {
> -return sd->spi ? sd_r7 : sd_r0;
> -}
> -
> -/* Accept.  */
> -sd->vhs = req.arg;
> -return sd_r7;
> -
> -default:
> +if (sd->state != sd_idle_state) {
>  break;
>  }
> -break;
> +sd->vhs = 0;
> +
> +/* No response if not exactly one VHS bit is set.  */
> +if (!(req.arg >> 8) || (req.arg >> (ctz32(req.arg & ~0xff) + 1))) {
> +return sd->spi ? sd_r7 : sd_r0;
> +}
> +
> +/* Accept.  */
> +sd->vhs = req.arg;
> +return sd_r7;
>
>  case 9:/* CMD9:   SEND_CSD */
>  switch (sd->state) {
> --
> 2.15.1
>
>



Re: [Qemu-devel] [PATCH v4 0/6] nvdimm: support MAP_SYNC for memory-backend-file

2018-01-31 Thread Dan Williams
On Wed, Jan 31, 2018 at 4:24 PM, Haozhong Zhang
 wrote:
> On 01/31/18 16:08 -0800, Dan Williams wrote:
>> On Wed, Jan 31, 2018 at 4:02 PM, Haozhong Zhang
>>  wrote:
>> > On 01/31/18 14:25 -0800, Dan Williams wrote:
>> >> On Tue, Jan 30, 2018 at 10:02 PM, Haozhong Zhang
>> >>  wrote:
>> >> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
>> >> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
>> >> > files on ext4/xfs file system mounted with '-o dax').
>> >>
>> >> Wait, MAP_SYNC does not guarantee persistence. It makes sure that the
>> >> metadata is in sync after a fault. However, that does not make
>> >> filesystem-DAX safe for use with QEMU, because we still need to
>> >> coordinate DMA with fileystem operations. There is no way to do that
>> >> coordination from within a guest. QEMU needs to use device-dax if the
>> >> guest might ever perform DMA to a virtual-pmem range. See this patch
>> >> set for more details on the DAX vs DMA problem [1]. I think we need to
>> >> enforce this in the host kernel. I.e. do not allow file backed DAX
>> >> pages to be mapped in EPT entries unless / until we have a solution to
>> >> the DMA synchronization problem. Apologies for not noticing this
>> >> earlier.
>> >
>> > QEMU does not truncate or punch holes of the file once it has been
>> > mmap()'ed. Does the problem [1] still exist in such case?
>>
>> Something else on the system might. The only agent that could enforce
>> protection is the kernel, and the kernel will likely just disallow
>> passing addresses from filesystem-dax vmas through to a guest
>> altogether. I think there's even a problem in the non-DAX case unless
>> KVM is pinning pages while they are handed out to a guest. The problem
>> is that we don't have a page cache page to pin in the DAX case.
>>
>
> Does it mean any user-space code like
>   ptr = mmap(..., fd, ...); // fd refers to a file on DAX filesystem
>   // make DMA to ptr
> is unsafe?

Yes, it is currently unsafe because there is no coordination with the
filesytem if it decides to make block layout changes. We can fix that
in the non-virtualization case by having the filesystem wait for DMA
completion callbacks (i.e. what for all pages to be idle), but as far
as I can see we can't do the same coordination for DMA initiated by a
guest device driver.



Re: [Qemu-devel] [PATCH v3 11/18] sdcard: check the card is in correct state for APP CMD (CMD55)

2018-01-31 Thread Alistair Francis
On Mon, Jan 22, 2018 at 7:30 PM, Philippe Mathieu-Daudé  wrote:
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/sd/sd.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index bbf9496e8a..434d1fbc47 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1389,6 +1389,14 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>
>  /* Application specific commands (Class 8) */
>  case 55:   /* CMD55:  APP_CMD */
> +switch (sd->state) {
> +case sd_ready_state:
> +case sd_identification_state:
> +case sd_inactive_state:
> +return sd_illegal;
> +default:
> +break;
> +}
>  if (!sd->spi) {
>  if (sd->rca != rca) {
>  return sd_r0;
> --
> 2.15.1
>
>



Re: [Qemu-devel] [PATCH v3 12/18] sdcard: warn if host uses an incorrect address for APP CMD (CMD55)

2018-01-31 Thread Alistair Francis
On Mon, Jan 22, 2018 at 7:30 PM, Philippe Mathieu-Daudé  wrote:
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/sd/sd.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 434d1fbc47..b5c947df62 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1394,6 +1394,11 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>  case sd_identification_state:
>  case sd_inactive_state:
>  return sd_illegal;
> +case sd_idle_state:
> +if (rca) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "SD: illegal RCA 0x%04x for APP_CMD\n", 
> req.cmd);
> +}
>  default:
>  break;
>  }
> --
> 2.15.1
>
>



Re: [Qemu-devel] [PATCH v2] docs: Add docs/devel/testing.rst

2018-01-31 Thread Fam Zheng
On Wed, Jan 31, 2018 at 9:49 PM, Andrew Jones  wrote:
> On Wed, Jan 31, 2018 at 11:28:00AM +0800, Fam Zheng wrote:
>> +On top of libqtest, a higher level library, ``libqos``, was created to
>> +encapsulate common tasks of device drivers, such as memory management and
>> +communicating with system buses or devices. Many virtual device tests use
>> +libqos instead of directly calling into libqos.
>^^ libqtest?

Yes, I've apparently made a mistake.

Fam

>
> Thanks,
> drew



Re: [Qemu-devel] [PATCH v4 0/6] nvdimm: support MAP_SYNC for memory-backend-file

2018-01-31 Thread Haozhong Zhang
On 01/31/18 16:08 -0800, Dan Williams wrote:
> On Wed, Jan 31, 2018 at 4:02 PM, Haozhong Zhang
>  wrote:
> > On 01/31/18 14:25 -0800, Dan Williams wrote:
> >> On Tue, Jan 30, 2018 at 10:02 PM, Haozhong Zhang
> >>  wrote:
> >> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> >> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> >> > files on ext4/xfs file system mounted with '-o dax').
> >>
> >> Wait, MAP_SYNC does not guarantee persistence. It makes sure that the
> >> metadata is in sync after a fault. However, that does not make
> >> filesystem-DAX safe for use with QEMU, because we still need to
> >> coordinate DMA with fileystem operations. There is no way to do that
> >> coordination from within a guest. QEMU needs to use device-dax if the
> >> guest might ever perform DMA to a virtual-pmem range. See this patch
> >> set for more details on the DAX vs DMA problem [1]. I think we need to
> >> enforce this in the host kernel. I.e. do not allow file backed DAX
> >> pages to be mapped in EPT entries unless / until we have a solution to
> >> the DMA synchronization problem. Apologies for not noticing this
> >> earlier.
> >
> > QEMU does not truncate or punch holes of the file once it has been
> > mmap()'ed. Does the problem [1] still exist in such case?
> 
> Something else on the system might. The only agent that could enforce
> protection is the kernel, and the kernel will likely just disallow
> passing addresses from filesystem-dax vmas through to a guest
> altogether. I think there's even a problem in the non-DAX case unless
> KVM is pinning pages while they are handed out to a guest. The problem
> is that we don't have a page cache page to pin in the DAX case.
> 

Does it mean any user-space code like
  ptr = mmap(..., fd, ...); // fd refers to a file on DAX filesystem
  // make DMA to ptr
is unsafe?



Re: [Qemu-devel] [PATCH v3 10/18] sdcard: handle CMD54 (SDIO)

2018-01-31 Thread Alistair Francis
On Mon, Jan 22, 2018 at 7:30 PM, Philippe Mathieu-Daudé  wrote:
> Linux uses it to poll the bus before polling for a card.
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/sd/sd.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 07424aa56e..bbf9496e8a 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1378,9 +1378,8 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>  }
>  break;
>
> -case 52:
> -case 53:
> -/* CMD52, CMD53: reserved for SDIO cards
> +case 52 ... 54:
> +/* CMD52, CMD53, CMD54: reserved for SDIO cards
>   * (see the SDIO Simplified Specification V2.0)
>   * Handle as illegal command but do not complain
>   * on stderr, as some OSes may use these in their
> --
> 2.15.1
>
>



Re: [Qemu-devel] [PATCH v3 09/18] sdcard: handles more commands in SPI mode

2018-01-31 Thread Alistair Francis
On Mon, Jan 22, 2018 at 7:30 PM, Philippe Mathieu-Daudé  wrote:
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/sd/sd.c | 57 ++---
>  1 file changed, 54 insertions(+), 3 deletions(-)
>
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 2eca999bc3..07424aa56e 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1390,9 +1390,11 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>
>  /* Application specific commands (Class 8) */
>  case 55:   /* CMD55:  APP_CMD */
> -if (sd->rca != rca)
> -return sd_r0;
> -
> +if (!sd->spi) {
> +if (sd->rca != rca) {
> +return sd_r0;
> +}
> +}
>  sd->expecting_acmd = true;
>  sd->card_status |= APP_CMD;
>  return sd_r1;
> @@ -1412,6 +1414,18 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>  }
>  break;
>
> +case 58:/* CMD58:   READ_OCR (SPI) */
> +if (!sd->spi) {
> +goto bad_cmd;
> +}
> +return sd_r3;
> +
> +case 59:/* CMD59:   CRC_ON_OFF (SPI) */
> +if (!sd->spi) {
> +goto bad_cmd;
> +}
> +goto unimplemented_cmd;
> +
>  default:
>  bad_cmd:
>  qemu_log_mask(LOG_GUEST_ERROR, "SD: Unknown CMD%i\n", req.cmd);
> @@ -1436,6 +1450,9 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
>  sd->card_status |= APP_CMD;
>  switch (req.cmd) {
>  case 6:/* ACMD6:  SET_BUS_WIDTH */
> +if (sd->spi) {
> +goto unimplemented_cmd;
> +}
>  switch (sd->state) {
>  case sd_transfer_state:
>  sd->sd_status[0] &= 0x3f;
> @@ -1460,6 +1477,12 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
>  }
>  break;
>
> +case 18:

These should all have a comment describing what they are and for the
others as well.

> +if (sd->spi) {
> +goto unimplemented_cmd;
> +}
> +break;
> +
>  case 22:   /* ACMD22: SEND_NUM_WR_BLOCKS */
>  switch (sd->state) {
>  case sd_transfer_state:
> @@ -1485,6 +1508,19 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
>  }
>  break;
>
> +case 25:
> +case 26:
> +if (sd->spi) {
> +goto unimplemented_cmd;
> +}
> +break;
> +
> +case 38:
> +if (sd->spi) {
> +goto unimplemented_cmd;
> +}
> +break;
> +
>  case 41:   /* ACMD41: SD_APP_OP_COND */
>  if (sd->spi) {
>  /* SEND_OP_CMD */
> @@ -1542,6 +1578,12 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
>  }
>  break;
>
> +case 43 ... 49:
> +if (sd->spi) {
> +goto unimplemented_cmd;
> +}
> +break;
> +
>  case 51:   /* ACMD51: SEND_SCR */
>  switch (sd->state) {
>  case sd_transfer_state:
> @@ -1555,9 +1597,18 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
>  }
>  break;
>
> +case 55:/* Not exist */
> +break;
> +
>  default:
>  /* Fall back to standard commands.  */
>  return sd_normal_command(sd, req);
> +
> +unimplemented_cmd:
> +/* Commands that are recognised but not yet implemented in SPI mode. 
>  */

This should be unimplemented_spi_cmd then, to be more clear.

Alistair

> +qemu_log_mask(LOG_UNIMP, "SD: CMD%i not implemented in SPI mode\n",
> +  req.cmd);
> +return sd_illegal;
>  }
>
>  qemu_log_mask(LOG_GUEST_ERROR, "SD: ACMD%i in a wrong state\n", req.cmd);
> --
> 2.15.1
>
>



Re: [Qemu-devel] [RFC v3 00/14] blockjobs: add explicit job management

2018-01-31 Thread John Snow


On 01/26/2018 09:05 PM, John Snow wrote:
> For jobs that complete when a monitor isn't looking, there's no way to
> tell what the job's final return code was. We need to allow jobs to
> remain in the list until queried for reliable management.
> 
> Furthermore, it's not viable to have graph changes when the monitor
> isn't looking either. We need at least another event for that.
> 
> This series is a rough sketch for the WAITING, PENDING and CONCLUDED
> events that accompany an expanded job management scheme that a management
> client can opt-in to.
> 
> V3:
>  - Added WAITING and PENDING events
>  - Added block_job_finalize verb
>  - Added .pending() callback for jobs
>  - Tweaked how .commit/.abort work
> 
> V2:
>  - Added tests!
>  - Changed property name (Jeff, Paolo)
> 
> RFC / Known problems:
> - I need a lot more tests.
> - Jobs need to actually implement their .pending callback for this to be of 
> any
>   actual use.
> - Mirror needs to be refactored to use the commit/abort/pending/clean 
> callbacks
>   to fulfill the promise made by "no graph changes without user authorization"
>   that PENDING is supposed to offer
> - Jobs beyond backup will be able to opt-in to the new management scheme in 
> the
>   next version.
> - V4 will include a forced synchronicity for jobs in a transaction; i.e. all
>   jobs will be forced to use either the old or new styles, but not a mix.
> 
> Please take a look and shout loudly if I'm wandering in the wrong direction.
> 
> 
> 
> For convenience, this branch is available at:
> https://github.com/jnsnow/qemu.git branch block-job-reap
> https://github.com/jnsnow/qemu/tree/block-job-reap
> 
> This version is tagged block-job-reap-v3:
> https://github.com/jnsnow/qemu/releases/tag/block-job-reap-v3
> 
> John Snow (14):
>   blockjobs: add manual property
>   blockjobs: Add status enum
>   blockjobs: add state transition table
>   blockjobs: RFC add block_job_verb permission table
>   blockjobs: add block_job_dismiss
>   blockjobs: add CONCLUDED state
>   blockjobs: ensure abort is called for cancelled jobs
>   blockjobs: add commit, abort, clean helpers
>   blockjobs: add prepare callback
>   blockjobs: Add waiting event
>   blockjobs: add PENDING status and event
>   blockjobs: add block-job-finalize
>   blockjobs: Expose manual property
>   iotests: test manual job dismissal
> 
>  block/backup.c   |  22 ++--
>  block/commit.c   |   2 +-
>  block/mirror.c   |   2 +-
>  block/replication.c  |   5 +-
>  block/stream.c   |   2 +-
>  block/trace-events   |   2 +
>  blockdev.c   |  42 ++-
>  blockjob.c   | 279 
> ---
>  include/block/block_int.h|   9 +-
>  include/block/blockjob.h |  38 ++
>  include/block/blockjob_int.h |  12 +-
>  qapi/block-core.json | 135 +++--
>  tests/qemu-iotests/056   | 241 +
>  tests/qemu-iotests/056.out   |   4 +-
>  tests/test-bdrv-drain.c  |   4 +-
>  tests/test-blockjob-txn.c|   2 +-
>  tests/test-blockjob.c|   4 +-
>  17 files changed, 750 insertions(+), 55 deletions(-)
> 

NACK

There are a lot of changes I've already made to this series based on
Kevin's recommendations; please wait for the next revision.

And a lot of bugs in this series I've already found.

--js



Re: [Qemu-devel] [PATCH v4 0/6] nvdimm: support MAP_SYNC for memory-backend-file

2018-01-31 Thread Dan Williams
On Wed, Jan 31, 2018 at 4:02 PM, Haozhong Zhang
 wrote:
> On 01/31/18 14:25 -0800, Dan Williams wrote:
>> On Tue, Jan 30, 2018 at 10:02 PM, Haozhong Zhang
>>  wrote:
>> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
>> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
>> > files on ext4/xfs file system mounted with '-o dax').
>>
>> Wait, MAP_SYNC does not guarantee persistence. It makes sure that the
>> metadata is in sync after a fault. However, that does not make
>> filesystem-DAX safe for use with QEMU, because we still need to
>> coordinate DMA with fileystem operations. There is no way to do that
>> coordination from within a guest. QEMU needs to use device-dax if the
>> guest might ever perform DMA to a virtual-pmem range. See this patch
>> set for more details on the DAX vs DMA problem [1]. I think we need to
>> enforce this in the host kernel. I.e. do not allow file backed DAX
>> pages to be mapped in EPT entries unless / until we have a solution to
>> the DMA synchronization problem. Apologies for not noticing this
>> earlier.
>
> QEMU does not truncate or punch holes of the file once it has been
> mmap()'ed. Does the problem [1] still exist in such case?

Something else on the system might. The only agent that could enforce
protection is the kernel, and the kernel will likely just disallow
passing addresses from filesystem-dax vmas through to a guest
altogether. I think there's even a problem in the non-DAX case unless
KVM is pinning pages while they are handed out to a guest. The problem
is that we don't have a page cache page to pin in the DAX case.



Re: [Qemu-devel] [PATCH v4 0/6] nvdimm: support MAP_SYNC for memory-backend-file

2018-01-31 Thread Haozhong Zhang
On 01/31/18 14:25 -0800, Dan Williams wrote:
> On Tue, Jan 30, 2018 at 10:02 PM, Haozhong Zhang
>  wrote:
> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > files on ext4/xfs file system mounted with '-o dax').
> 
> Wait, MAP_SYNC does not guarantee persistence. It makes sure that the
> metadata is in sync after a fault. However, that does not make
> filesystem-DAX safe for use with QEMU, because we still need to
> coordinate DMA with fileystem operations. There is no way to do that
> coordination from within a guest. QEMU needs to use device-dax if the
> guest might ever perform DMA to a virtual-pmem range. See this patch
> set for more details on the DAX vs DMA problem [1]. I think we need to
> enforce this in the host kernel. I.e. do not allow file backed DAX
> pages to be mapped in EPT entries unless / until we have a solution to
> the DMA synchronization problem. Apologies for not noticing this
> earlier.

QEMU does not truncate or punch holes of the file once it has been
mmap()'ed. Does the problem [1] still exist in such case?

Thanks,
Haozhong

> 
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-December/013704.html



Re: [Qemu-devel] [RfC PATCH v2 4/5] vfio/display: core & wireup

2018-01-31 Thread Alex Williamson
On Wed, 31 Jan 2018 23:42:49 +
"Zhang, Tina"  wrote:
> > @@ -2977,6 +2984,8 @@ static void vfio_instance_init(Object *obj)  static
> > Property vfio_pci_dev_properties[] = {
> >  DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
> >  DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice, vbasedev.sysfsdev),
> > +DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
> > +display, ON_OFF_AUTO_AUTO),  
> Not "x-display"?


x-display would mean that it's an unsupported, experimental option that
libvirt wouldn't enable and downstreams might consider unsupported.
Don't we want this to be a supported feature enabled through libvirt?
Thanks,

Alex



Re: [Qemu-devel] [RfC PATCH v2 4/5] vfio/display: core & wireup

2018-01-31 Thread Zhang, Tina


> -Original Message-
> From: Gerd Hoffmann [mailto:kra...@redhat.com]
> Sent: Wednesday, January 31, 2018 8:12 PM
> To: qemu-devel@nongnu.org
> Cc: Alex Williamson ; Zhang, Tina
> ; intel-gvt-...@lists.freedesktop.org; Kirti Wankhede
> ; Gerd Hoffmann 
> Subject: [RfC PATCH v2 4/5] vfio/display: core & wireup
> 
> Infrastructure for display support.  Must be enabled using 'display' property.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/vfio/pci.h |  3 +++
>  hw/vfio/display.c | 51
> +++
>  hw/vfio/pci.c |  9 +
>  hw/vfio/Makefile.objs |  2 +-
>  4 files changed, 64 insertions(+), 1 deletion(-)  create mode 100644
> hw/vfio/display.c
> 
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index a8fb3b3422..8f3295188c
> 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -130,6 +130,7 @@ typedef struct VFIOPCIDevice {  #define
> VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT 2  #define
> VFIO_FEATURE_ENABLE_IGD_OPREGION \
>  (1 << VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT)
> +OnOffAuto display;
>  int32_t bootindex;
>  uint32_t igd_gms;
>  uint8_t pm_cap;
> @@ -169,4 +170,6 @@ int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> struct vfio_region_info *info,
> Error **errp);
> 
> +int vfio_display_probe(VFIOPCIDevice *vdev, Error **errp);
> +
>  #endif /* HW_VFIO_VFIO_PCI_H */
> diff --git a/hw/vfio/display.c b/hw/vfio/display.c new file mode 100644 index
> 00..4ba93ea251
> --- /dev/null
> +++ b/hw/vfio/display.c
> @@ -0,0 +1,51 @@
> +/*
> + * display support for mdev based vgpu devices
> + *
> + * Copyright Red Hat, Inc. 2017
> + *
> + * Authors:
> + *Gerd Hoffmann
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> +See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include 
> +#include 
> +
> +#include "sysemu/sysemu.h"
> +#include "ui/console.h"
> +#include "pci.h"
> +
> +int vfio_display_probe(VFIOPCIDevice *vdev, Error **errp) {
> +struct vfio_device_gfx_plane_info probe;
> +int ret;
> +
> +memset(, 0, sizeof(probe));
> +probe.argsz = sizeof(probe);
> +probe.flags = VFIO_GFX_PLANE_TYPE_PROBE |
> VFIO_GFX_PLANE_TYPE_DMABUF;
> +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_QUERY_GFX_PLANE, );
> +if (ret == 0) {
> +error_setg(errp, "vfio-display: dmabuf support not implemented yet");
> +return -1;
> +}
> +
> +memset(, 0, sizeof(probe));
> +probe.argsz = sizeof(probe);
> +probe.flags = VFIO_GFX_PLANE_TYPE_PROBE |
> VFIO_GFX_PLANE_TYPE_REGION;
> +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_QUERY_GFX_PLANE, );
> +if (ret == 0) {
> +error_setg(errp, "vfio-display: region support not implemented yet");
> +return -1;
> +}
> +
> +if (vdev->display == ON_OFF_AUTO_AUTO) {
> +/* not an error in automatic mode */
> +return 0;
> +}
> +
> +error_setg(errp, "vfio: device doesn't support any (known) display 
> method");
> +return -1;
> +}
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 2c71295125..cd5310c7ce
> 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2873,6 +2873,13 @@ static void vfio_realize(PCIDevice *pdev, Error
> **errp)
>  }
>  }
> 
> +if (vdev->display != ON_OFF_AUTO_OFF) {
> +ret = vfio_display_probe(vdev, errp);
> +if (ret) {
> +goto out_teardown;
> +}
> +}
> +
>  vfio_register_err_notifier(vdev);
>  vfio_register_req_notifier(vdev);
>  vfio_setup_resetfn_quirk(vdev);
> @@ -2977,6 +2984,8 @@ static void vfio_instance_init(Object *obj)  static
> Property vfio_pci_dev_properties[] = {
>  DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
>  DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice, vbasedev.sysfsdev),
> +DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
> +display, ON_OFF_AUTO_AUTO),
Not "x-display"?

BR,
Tina
>  DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
> intx.mmap_timeout, 1100),
>  DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features, diff --git
> a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs index
> c3ab9097f1..a2e7a0a7cf 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -1,6 +1,6 @@
>  ifeq ($(CONFIG_LINUX), y)
>  obj-$(CONFIG_SOFTMMU) += common.o
> -obj-$(CONFIG_PCI) += pci.o pci-quirks.o
> +obj-$(CONFIG_PCI) += pci.o pci-quirks.o display.o
>  obj-$(CONFIG_VFIO_CCW) += ccw.o
>  obj-$(CONFIG_SOFTMMU) += platform.o
>  obj-$(CONFIG_VFIO_XGMAC) += calxeda-xgmac.o
> --
> 2.9.3




Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-01-31 Thread Eduardo Habkost
On Wed, Jan 31, 2018 at 11:10:07PM +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 31, 2018 at 06:40:59PM -0200, Eduardo Habkost wrote:
> > On Wed, Jan 17, 2018 at 11:54:18AM +0200, Marcel Apfelbaum wrote:
> > > Currently only file backed memory backend can
> > > be created with a "share" flag in order to allow
> > > sharing guest RAM with other processes in the host.
> > > 
> > > Add the "share" flag also to RAM Memory Backend
> > > in order to allow remapping parts of the guest RAM
> > > to different host virtual addresses. This is needed
> > > by the RDMA devices in order to remap non-contiguous
> > > QEMU virtual addresses to a contiguous virtual address range.
> > > 
> > 
> > Why do we need to make this configurable?  Would anything break
> > if MAP_SHARED was always used if possible?
> 
> See Documentation/vm/numa_memory_policy.txt for a list
> of complications.

Ew.

> 
> Maybe we should more of an effort to detect and report these
> issues.

Probably.  Having other features breaking silently when using
pvrdma doesn't sound good.  We must at least document those
problems in the documentation for memory-backend-ram.

BTW, what's the root cause for requiring HVAs in the buffer?  Can
this be fixed?

-- 
Eduardo



Re: [Qemu-devel] [PATCH v5] cocoa.m: Add ability for user to specify mouse ungrab key

2018-01-31 Thread Programmingkid

> On Jan 31, 2018, at 4:03 AM, Gerd Hoffmann  wrote:
> 
>>> What about the other hotkeys?
>>> 
>>> There is fullscreen.  Ctrl-Alt-F for SDL and GTK.  Cmd-F for cocoa, but
>>> it works only if the grab is not active.
>> 
>> If has to be that way because the meta (command) key is sent to the
>> guest when the mouse grab is in effect. I actually suggest the SDL and
>> GTK interfaces send the meta (windows key on PC keyboard) key to the
>> guest when a mouse grab is in effect. This way guest operating systems
>> like Mac OS X can use keyboard shortcuts. 
> 
> Well, the modifier key changes are actually sent to the guest, but
> usually they have no effect there.
> 
> So, if you press Ctrl-Alt-F, the guest will see the keydown and keyup
> events for ctrl and alt.  It will not see the 'f', because SDL hijacks
> that.

Maybe the f could be sent to the guest directly via 
qemu_input_event_send_key_qcode().

> 
>>> Console select (Ctrl-Alt-), works for SDL and GTK.  When I read the
>>> code correctly it should work for cocoa the same way, but it doesn't
>>> work for me.  Dunno why. 
>> 
>> I know this code was broken in cocoa for a while but I made the patch
>> that fixes this problem. Console selection does work correctly using a
>> recent git version of QEMU.
> 
> Just pulled latest master and recompiled, still not working for me.
> This is sierra in a vm.

What software do you use to run the VM? It could be possible the VM software is 
interfering with the ungrab keys.

> Unrelated side note: hvf doesn't work for me too (nested on kvm).

I haven't played with this new feature yet. Maybe I will try it out one day.

> 
>>> Quit. Ctrl-Alt-Q on gtk.  Cmd-Q on cocoa, again only working without
>>> keyboard grab.  Nothing on SDL.  Just closing the window to quit works
>>> on GTK and SDL, both have a switch to turn it off.
>> 
>> I know Command-Q is the standard Macintosh keyboard short for quitting
>> an application. With GTK I would imagine it would depend on how the
>> host operating system expects applications to be built. I don't think
>> Windows and Linux have a standard keyboard shortcut for quitting an
>> application. 
> 
> On Linux Ctrl-Q seems to be common (maybe only in the gnome world).
> On Windows Alt-F4 should be standard.
> 
> Qemu picked two-modifier hot-keys (like Ctrl-Alt-Q) to avoid overlaps
> with guest hot keys, so both qemu and guest hotkeys can be used without
> having to deal with keyboard/mouse grabs all the time.
> 
>> Ok I see what you want:
>> 
>> -display 
>> gtk,hotkey-modifiers=ctrl+shift,hotkey-grab=f12,hotkey-fullscreen=f11
>> 
>> -display cocoa,hotkey-modifiers=Command,hotkey-grab=f13,hotkey-fullscreen=f14
> 
> I think that would be "super" not "Command" because that is the name for
> the key in the pc world so this is what the key got as QKeyCode name.
> 
>> I assume hotkey-modifiers is used to set the meta key.
> 
> Yes.
> 
>> Is hotkey-grab the key used to ungrab the mouse?
> 
> Yes (keyboard grab too).
> 
>> If it is shouldn't it be called hotkey-ungrab?
> 
> Good question.  On gtk/sdl Ctrl-Alt-G actually toggles the grab, i.e.
> can do *both* grab and ungrab.  But I expect most users use it for
> ungrab only, due to mouse clicks activating the grab too.
> 
 Example usage:  -ungrab home
-ungrab shift-ctrl
>>> 
>>> Modifier-only hotkeys are tricky with gtk (doable, but no support for
>>> that in the toolkit).
>>> 
-ungrab ctrl-x
-ungrab pgup-pgdn
>>> 
>>> Really?  Two non-modifier keys?
>> Yes. The ungrab sequence could be a-b-c and still allow these keys to be 
>> used in the guest.
>> 
>>> How is that implemented?  Do you queue
>>> up the pgup keypress, waiting to see whenever pgdn is pressed too, then
>>> only in case that didn't happen forward the queued pgup key to the guest?
>> 
>> Basically there is a array that acts as a check list. It checks off
>> keys that belong to the ungrab sequence as they are detected. Once a
>> non-ungrab key is detected, the check list is cleared. If all the
>> ungrab keys are detected the ungrab code is executed. This only
>> happens on keyup events. That way if Control-ALT were the ungrab keys,
>> sending Control-ALT-Delete to the guest is still possible because
>> these are the keys detected on the keyup event. The Delete key would
>> have cleared the check list. Daniel Berrange is the one I can thank
>> for this idea. He might be able to explain it better than me.
> 
> Hmm, ok.
> 
> Doing the same for gtk would basically imply to not use any toolkit
> support for hotkeys ...
> 
> It'll also become more difficuilt if we use that for multiple hotkeys.
> 
> But possibly we can share the code across all UIs, now that keycodemapdb
> is used by qemu.  So first translate the keycode from the UI toolkit to
> a QKeyCode, then feed that into shared hotkey detection code.

Sounds like a good idea. I will be more than happy to help. My cocoa code
uses sets to implement keeping track 

Re: [Qemu-devel] MTTCG External Halt

2018-01-31 Thread Alistair Francis
On Wed, Jan 31, 2018 at 12:32 PM, Alex Bennée  wrote:
>
> Alistair Francis  writes:
>
>> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini  wrote:
>>> On 30/01/2018 18:56, Alistair Francis wrote:

 I don't have a good solution though, as setting CPU_INTERRUPT_RESET
 doesn't help (that isn't handled while we are halted) and
 async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
 want.

 I've ever tried pausing all CPUs before reseting the CPU and them
 resuming them all but that doesn't seem to to work either.
>>>
>>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>>> takes care of stopping all other CPUs while the function runs.
>>>
 Is there
 anything I'm missing? Is there no reliable way to reset a CPU?
>>>
>>> What do you mean by reliable?  Executing no instruction after the one
>>> you were at?
>>
>> The reset is called by a GPIO line, so I need the reset to be called
>> basically as quickly as the GPIO line changes. The async_ and
>> async_safe_ functions seem to not run quickly enough, even if I run a
>> process_work_queue() function afterwards.
>>
>> Is there a way to kick the CPU to act on the async_*?
>
> Define quickly enough? The async_(safe) functions kick the vCPUs so they
> will all exit the run loop as they enter the next TB (even if they loop
> to themselves).

We have a special power controller CPU that wakes all the CPUs up and
at boot the async_* functions don't wake the CPUs up. If I just use
the cpu_rest() function directly everything starts fine (but then I
hit issues later).

If I forcefully run process_queued_cpu_work() then I can get the CPUs
up, but I don't think that is the right solution.

>
> From an external vCPUs point of view those extra instructions have
> already executed. If the resetting vCPU needs them to have reset by the
> time it executes it's next instruction it should either cpu_loop_exit at
> that point or ensure it is the last instruction in it's TB (which is
> what we do for the MMU flush cases in ARM, they all end the TB at that
> point).

cpu_loop_exit() sounds like it would help, but as I'm not in the CPU
context it just seg faults.

Alistair

>
>
>>
>> Thanks,
>> Alistair
>>
>>>
>>> Paolo
>>>
>
>
> --
> Alex Bennée
>



Re: [Qemu-devel] [PATCH v4 0/6] nvdimm: support MAP_SYNC for memory-backend-file

2018-01-31 Thread Dan Williams
On Tue, Jan 30, 2018 at 10:02 PM, Haozhong Zhang
 wrote:
> Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> files on ext4/xfs file system mounted with '-o dax').

Wait, MAP_SYNC does not guarantee persistence. It makes sure that the
metadata is in sync after a fault. However, that does not make
filesystem-DAX safe for use with QEMU, because we still need to
coordinate DMA with fileystem operations. There is no way to do that
coordination from within a guest. QEMU needs to use device-dax if the
guest might ever perform DMA to a virtual-pmem range. See this patch
set for more details on the DAX vs DMA problem [1]. I think we need to
enforce this in the host kernel. I.e. do not allow file backed DAX
pages to be mapped in EPT entries unless / until we have a solution to
the DMA synchronization problem. Apologies for not noticing this
earlier.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-December/013704.html



Re: [Qemu-devel] [PATCH] hw/audio/sb16.c: Convert file to new logging API

2018-01-31 Thread Programmingkid

> On Jan 31, 2018, at 4:43 PM, Programmingkid  wrote:
> 
>> 
>> On Jan 31, 2018, at 4:22 AM, Gerd Hoffmann  wrote:
>> 
>> On Tue, Jan 30, 2018 at 10:30:46AM -0500, Programmingkid wrote:
>>> 
 On Jan 30, 2018, at 4:41 AM, Gerd Hoffmann  wrote:
 
> @@ -148,15 +142,16 @@ static int irq_of_magic (int magic)
> #if 0
> static void log_dsp (SB16State *dsp)
> {
> -ldebug ("%s:%s:%d:%s:dmasize=%d:freq=%d:const=%d:speaker=%d\n",
> -dsp->fmt_stereo ? "Stereo" : "Mono",
> -dsp->fmt_signed ? "Signed" : "Unsigned",
> -dsp->fmt_bits,
> -dsp->dma_auto ? "Auto" : "Single",
> -dsp->block_size,
> -dsp->freq,
> -dsp->time_const,
> -dsp->speaker);
> +qemu_log_mask(LOG_UNIMP, "%s:%s:%d:%s:dmasize=%d:freq=%d:const=%d:"
> +  "speaker=%d\n",
> +  dsp->fmt_stereo ? "Stereo" : "Mono",
> +  dsp->fmt_signed ? "Signed" : "Unsigned",
> +  dsp->fmt_bits,
> +  dsp->dma_auto ? "Auto" : "Single",
> +  dsp->block_size,
> +  dsp->freq,
> +  dsp->time_const,
> +  dsp->speaker);
> }
> #endif
 
 Hmm, dead code.  Any places which call log_dsp() ?
>>> 
>>> There are several places but they are all dead code. Do you want this 
>>> removed?
>> 
>> So it is still used but not compiled in by default.
>> 
>> Looking at all of this again, I think "qemu_log_mask(LOG_UNIMP, ...)"
>> should only be used in case there *really* is some soundblaster feature
>> unimplemented.  Otherwise converting to a tracepoint is probably more
>> useful, so the debug messages can be enabled/disabled individually at
>> runtime.
>> 
>> The "#if 0" seems to be there to limit the flood of debug messages.  So
>> once they all are converted to trace points this is not needed any more
>> and I think we should just enable all this code.
>> 
>> dolog() seems to be mostly cases where the code warns about
>> unimplemented featues or incomplete emulation, so using LOG_UNIMP looks
>> sensible.
> 
> Simply replacing dolog() with "qemu_log_mask(LOG_UNIMP," seems like what was 
> desired.
> 
>> ldebug() should probably trace points.
> 
> I need help with this. I tried replacing ldebug() with 
> "qemu_log_mask(LOG_TRACE,", but that caused a lot of messages to be printed 
> to the terminal. It is odd because I thought I had to use "-d trace" to make 
> it print. Do you have another suggestion on what to replace ldebug() with? 
> 
>> Probably makes sense to split this patch up into a small series.
> 
> The first patch will be the replacing dolog() with qemu_log_mask(LOG_UNIMP. 
> The second patch will be replacing ldebug() with what you suggest.

After doing a little more research I found this code in log.c:

/* enable or disable low levels log */
void qemu_set_log(int log_flags)
{
qemu_loglevel = log_flags;
#ifdef CONFIG_TRACE_LOG
qemu_loglevel |= LOG_TRACE;
#endif

It looks like no matter what the trace level that is set, LOG_TRACE will always 
be used. I think this might be a mistake. The configure script will set 
CONFIG_TRACE_LOG if the backend is log. This appears to be the default 
behavior. This will lead to LOG_TRACE being set in util/log.c. 


Re: [Qemu-devel] [PATCH] hw/audio/sb16.c: Convert file to new logging API

2018-01-31 Thread Programmingkid

> On Jan 31, 2018, at 4:22 AM, Gerd Hoffmann  wrote:
> 
> On Tue, Jan 30, 2018 at 10:30:46AM -0500, Programmingkid wrote:
>> 
>>> On Jan 30, 2018, at 4:41 AM, Gerd Hoffmann  wrote:
>>> 
 @@ -148,15 +142,16 @@ static int irq_of_magic (int magic)
 #if 0
 static void log_dsp (SB16State *dsp)
 {
 -ldebug ("%s:%s:%d:%s:dmasize=%d:freq=%d:const=%d:speaker=%d\n",
 -dsp->fmt_stereo ? "Stereo" : "Mono",
 -dsp->fmt_signed ? "Signed" : "Unsigned",
 -dsp->fmt_bits,
 -dsp->dma_auto ? "Auto" : "Single",
 -dsp->block_size,
 -dsp->freq,
 -dsp->time_const,
 -dsp->speaker);
 +qemu_log_mask(LOG_UNIMP, "%s:%s:%d:%s:dmasize=%d:freq=%d:const=%d:"
 +  "speaker=%d\n",
 +  dsp->fmt_stereo ? "Stereo" : "Mono",
 +  dsp->fmt_signed ? "Signed" : "Unsigned",
 +  dsp->fmt_bits,
 +  dsp->dma_auto ? "Auto" : "Single",
 +  dsp->block_size,
 +  dsp->freq,
 +  dsp->time_const,
 +  dsp->speaker);
 }
 #endif
>>> 
>>> Hmm, dead code.  Any places which call log_dsp() ?
>> 
>> There are several places but they are all dead code. Do you want this 
>> removed?
> 
> So it is still used but not compiled in by default.
> 
> Looking at all of this again, I think "qemu_log_mask(LOG_UNIMP, ...)"
> should only be used in case there *really* is some soundblaster feature
> unimplemented.  Otherwise converting to a tracepoint is probably more
> useful, so the debug messages can be enabled/disabled individually at
> runtime.
> 
> The "#if 0" seems to be there to limit the flood of debug messages.  So
> once they all are converted to trace points this is not needed any more
> and I think we should just enable all this code.
> 
> dolog() seems to be mostly cases where the code warns about
> unimplemented featues or incomplete emulation, so using LOG_UNIMP looks
> sensible.

Simply replacing dolog() with "qemu_log_mask(LOG_UNIMP," seems like what was 
desired.

> ldebug() should probably trace points.

I need help with this. I tried replacing ldebug() with 
"qemu_log_mask(LOG_TRACE,", but that caused a lot of messages to be printed to 
the terminal. It is odd because I thought I had to use "-d trace" to make it 
print. Do you have another suggestion on what to replace ldebug() with? 

> Probably makes sense to split this patch up into a small series.

The first patch will be the replacing dolog() with qemu_log_mask(LOG_UNIMP. The 
second patch will be replacing ldebug() with what you suggest.


Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-01-31 Thread Michael S. Tsirkin
On Wed, Jan 31, 2018 at 06:40:59PM -0200, Eduardo Habkost wrote:
> On Wed, Jan 17, 2018 at 11:54:18AM +0200, Marcel Apfelbaum wrote:
> > Currently only file backed memory backend can
> > be created with a "share" flag in order to allow
> > sharing guest RAM with other processes in the host.
> > 
> > Add the "share" flag also to RAM Memory Backend
> > in order to allow remapping parts of the guest RAM
> > to different host virtual addresses. This is needed
> > by the RDMA devices in order to remap non-contiguous
> > QEMU virtual addresses to a contiguous virtual address range.
> > 
> 
> Why do we need to make this configurable?  Would anything break
> if MAP_SHARED was always used if possible?

See Documentation/vm/numa_memory_policy.txt for a list
of complications.

Maybe we should more of an effort to detect and report these
issues.

> 
> > Moved the "share" flag to the Host Memory base class,
> > modified phys_mem_alloc to include the new parameter
> > and a new interface memory_region_init_ram_shared_nomigrate.
> > 
> > There are no functional changes if the new flag is not used.
> > 
> > Signed-off-by: Marcel Apfelbaum 
> [...]
> 
> -- 
> Eduardo



Re: [Qemu-devel] [PATCH V8 1/4] mem: add share parameter to memory-backend-ram

2018-01-31 Thread Eduardo Habkost
On Wed, Jan 17, 2018 at 11:54:18AM +0200, Marcel Apfelbaum wrote:
> Currently only file backed memory backend can
> be created with a "share" flag in order to allow
> sharing guest RAM with other processes in the host.
> 
> Add the "share" flag also to RAM Memory Backend
> in order to allow remapping parts of the guest RAM
> to different host virtual addresses. This is needed
> by the RDMA devices in order to remap non-contiguous
> QEMU virtual addresses to a contiguous virtual address range.
> 

Why do we need to make this configurable?  Would anything break
if MAP_SHARED was always used if possible?


> Moved the "share" flag to the Host Memory base class,
> modified phys_mem_alloc to include the new parameter
> and a new interface memory_region_init_ram_shared_nomigrate.
> 
> There are no functional changes if the new flag is not used.
> 
> Signed-off-by: Marcel Apfelbaum 
[...]

-- 
Eduardo



Re: [Qemu-devel] MTTCG External Halt

2018-01-31 Thread Alex Bennée

Alistair Francis  writes:

> On Tue, Jan 30, 2018 at 8:26 PM, Paolo Bonzini  wrote:
>> On 30/01/2018 18:56, Alistair Francis wrote:
>>>
>>> I don't have a good solution though, as setting CPU_INTERRUPT_RESET
>>> doesn't help (that isn't handled while we are halted) and
>>> async_run_on_cpu()/run_on_cpu() doesn't reliably reset the CPU when we
>>> want.
>>>
>>> I've ever tried pausing all CPUs before reseting the CPU and them
>>> resuming them all but that doesn't seem to to work either.
>>
>> async_safe_run_on_cpu would be like async_run_on_cpu, except that it
>> takes care of stopping all other CPUs while the function runs.
>>
>>> Is there
>>> anything I'm missing? Is there no reliable way to reset a CPU?
>>
>> What do you mean by reliable?  Executing no instruction after the one
>> you were at?
>
> The reset is called by a GPIO line, so I need the reset to be called
> basically as quickly as the GPIO line changes. The async_ and
> async_safe_ functions seem to not run quickly enough, even if I run a
> process_work_queue() function afterwards.
>
> Is there a way to kick the CPU to act on the async_*?

Define quickly enough? The async_(safe) functions kick the vCPUs so they
will all exit the run loop as they enter the next TB (even if they loop
to themselves).

>From an external vCPUs point of view those extra instructions have
already executed. If the resetting vCPU needs them to have reset by the
time it executes it's next instruction it should either cpu_loop_exit at
that point or ensure it is the last instruction in it's TB (which is
what we do for the MMU flush cases in ARM, they all end the TB at that
point).


>
> Thanks,
> Alistair
>
>>
>> Paolo
>>


--
Alex Bennée



Re: [Qemu-devel] [PATCH v3 20/39] qcow2: Update qcow2_get_cluster_offset() to support L2 slices

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> qcow2_get_cluster_offset() checks how many contiguous bytes are
> available at a given offset. The returned number of bytes is limited
> by the amount that can be addressed without having to load more than
> one L2 table.
> 
> Since we'll be loading L2 slices instead of full tables this patch
> changes the limit accordingly using the size of the L2 slice for the
> calculations instead of the full table size.
> 
> One consequence of this is that with small L2 slices operations such
> as 'qemu-img map' will need to iterate in more steps because each
> qcow2_get_cluster_offset() call will potentially return a smaller
> number. However the code is already prepared for that so this doesn't
> break semantics.
> 
> The l2_table variable is also renamed to l2_slice to reflect this, and
> offset_to_l2_index() is replaced with offset_to_l2_slice_index().
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 30 +++---
>  1 file changed, 15 insertions(+), 15 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RfC PATCH v2 0/5] vfio: add display support

2018-01-31 Thread Alex Williamson
On Wed, 31 Jan 2018 13:12:12 +0100
Gerd Hoffmann  wrote:

> This series adds support for a vgpu display to the qemu vfio code.
> For now only regions are supported, dmabufs will follow later.
> 
> The vfio API update is done, queued in drm-next, should land in the
> upstream kernel during the 4.16 merge window.  So the 4.16-rc1 kernel
> header sync should bring the header changes needed for this series.
> 
> Patch #1 of this series has the vfio.h updates too, for testing
> convinience, but I don't plan to include that patch in the final
> patch submission.
> 
> plese test and review,

Hi Gerd,

The vfio bits look reasonable to me, it'd be nice if we could operate
on the VFIODevice rather than VFIOPCIDevice to make this universal for
all vfio devices, but that's just a nit since non-PCI graphics devices
may never come to fruition.  As for testing, region support was
included for NVIDIA, dmabuf for Intel... is this testing request mainly
for the Kirti and others at NVIDIA?  Thanks,

Alex



[Qemu-devel] [PATCH] tests/migration: Add source to PC boot block

2018-01-31 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The boot block used in the migration test is currently only
shipped as a hex (with the source in the git commit message),
change this to actually include the source.

A makefile rule is added, but the expectation is that
the generated hex is shipped as well as the .s, so that
there's no requirement to have just the right assembler etc.

Signed-off-by: Dr. David Alan Gilbert 
---
 tests/Makefile.include  | 18 
 tests/migration-test.c  | 46 +--
 tests/migration/x86-a-b-bootblock.h | 52 +
 tests/migration/x86-a-b-bootblock.s | 92 +
 4 files changed, 163 insertions(+), 45 deletions(-)
 create mode 100644 tests/migration/x86-a-b-bootblock.h
 create mode 100644 tests/migration/x86-a-b-bootblock.s

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 851aafe9d1..2a0889479c 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -831,6 +831,24 @@ tests/migration/initrd-stress.img: 
tests/migration/stress$(EXESUF)
rm $(INITRD_WORK_DIR)/init
rmdir $(INITRD_WORK_DIR)
 
+ASM_WORK_DIR=tests/asm-temp
+$(SRC_PATH)/tests/migration/x86-a-b-bootblock.h: 
$(SRC_PATH)/tests/migration/x86-a-b-bootblock.s
+   mkdir $(ASM_WORK_DIR)
+   as --32 -march=i486 $< -o $(ASM_WORK_DIR).o
+   objcopy -O binary $(ASM_WORK_DIR).o $(ASM_WORK_DIR).boot
+   dd if=$(ASM_WORK_DIR).boot of=$(ASM_WORK_DIR).bootsect bs=256 count=2 
skip=124
+   xxd -i $(ASM_WORK_DIR).bootsect | \
+sed -e 's/tests_asm_temp_bootsect/bootsect/' -e 's/.*int.*//' > 
$(ASM_WORK_DIR).hex
+   echo -e "/* This file is automatically generated from\n" \
+ " * tests/migration/x86-a-b-bootblock.s, edit that and then\n" \
+ " * make tests/migration/x86-a-b-bootblock.h to update,\n" \
+ " * and then remember to send both in your patch submission.\n" \
+ " */\n" | cat - $(ASM_WORK_DIR).hex > $@
+   rm $(ASM_WORK_DIR).hex $(ASM_WORK_DIR).bootsect $(ASM_WORK_DIR).boot
+   rm $(ASM_WORK_DIR).o
+   rmdir $(ASM_WORK_DIR)
+
+
 ifeq ($(CONFIG_POSIX),y)
 LIBS += -lutil
 endif
diff --git a/tests/migration-test.c b/tests/migration-test.c
index 799e24ebc6..7550fd8d56 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -79,51 +79,7 @@ static const char *tmpfs;
 /* A simple PC boot sector that modifies memory (1-100MB) quickly
  * outputing a 'B' every so often if it's still running.
  */
-unsigned char bootsect[] = {
-  0xfa, 0x0f, 0x01, 0x16, 0x74, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
-  0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
-  0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
-  0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10,
-  0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40,
-  0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x75, 0xe9, 0x66, 0xb8, 0x42, 0x00, 0x66,
-  0xba, 0xf8, 0x03, 0xee, 0xeb, 0xde, 0x66, 0x90, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00,
-  0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x5c, 0x7c,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 

Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64

2018-01-31 Thread Ard Biesheuvel
On 31 January 2018 at 19:12, Christoffer Dall
 wrote:
> On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel
>  wrote:
>> On 31 January 2018 at 17:39, Christoffer Dall
>>  wrote:
>>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel
>>>  wrote:
 On 31 January 2018 at 16:53, Christoffer Dall
  wrote:
> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel
>  wrote:
>> On 31 January 2018 at 09:53, Christoffer Dall
>>  wrote:
>>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote:
 On 29/01/18 10:04, Peter Maydell wrote:
 > On 29 January 2018 at 09:53, Dr. David Alan Gilbert 
 >  wrote:
 >> * Peter Maydell (peter.mayd...@linaro.org) wrote:
 >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert 
 >>>  wrote:
  * Peter Maydell (peter.mayd...@linaro.org) wrote:
 > I think the correct fix here is that your test code should turn
 > its MMU on. Trying to treat guest RAM as uncacheable doesn't work
 > for Arm KVM guests (for the same reason that VGA device video 
 > memory
 > doesn't work). If it's RAM your guest has to arrange to map it as
 > Normal Cacheable, and then everything should work fine.
 
  Does this cause problems with migrating at just the wrong point 
  during
  a VM boot?
 >>>
 >>> It wouldn't surprise me if it did, but I don't think I've ever
 >>> tried to provoke that problem...
 >>
 >> If you think it'll get the RAM contents wrong, it might be best to 
 >> fail
 >> the migration if you can detect the cache is disabled in the guest.
 >
 > I guess QEMU could look at the value of the "MMU disabled/enabled" 
 > bit
 > in the guest's system registers, and refuse migration if it's off...
 >
 > (cc'd Marc, Christoffer to check that I don't have the wrong end
 > of the stick about how thin the ice is in the period before the
 > guest turns on its MMU...)

 Once MMU and caches are on, we should be in a reasonable place for QEMU
 to have a consistent view of the memory. The trick is to prevent the
 vcpus from changing that. A guest could perfectly turn off its MMU at
 any given time if it needs to (and it is actually required on some HW 
 if
 you want to mitigate headlining CVEs), and KVM won't know about that.

>>>
>>> (Clarification: KVM can detect this is it bother to check the VCPU's
>>> system registers, but we don't trap to KVM when the VCPU turns off its
>>> caches, right?)
>>>
 You may have to pause the vcpus before starting the migration, or
 introduce a new KVM feature that would automatically pause a vcpu that
 is trying to disable its MMU while the migration is on. This would
 involve trapping all the virtual memory related system registers, with
 an obvious cost. But that cost would be limited to the time it takes to
 migrate the memory, so maybe that's acceptable.

>>> Is that even sufficient?
>>>
>>> What if the following happened. (1) guest turns off MMU, (2) guest
>>> writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads
>>> guest ram.  QEMU's view of guest ram is now incorrect (stale,
>>> incoherent, ...).
>>>
>>> I'm also not really sure if pausing one VCPU because it turned off its
>>> MMU will go very well when trying to migrate a large VM (wouldn't this
>>> ask for all the other VCPUs beginning to complain that the stopped VCPU
>>> appears to be dead?).  As a short-term 'fix' it's probably better to
>>> refuse migration if you detect that a VCPU had begun turning off its
>>> MMU.
>>>
>>> On the larger scale of thins; this appears to me to be another case of
>>> us really needing some way to coherently access memory between QEMU and
>>> the VM, but in the case of the VCPU turning off the MMU prior to
>>> migration, we don't even know where it may have written data, and I'm
>>> therefore not really sure what the 'proper' solution would be.
>>>
>>> (cc'ing Ard who has has thought about this problem before in the context
>>> of UEFI and VGA.)
>>>
>>
>> Actually, the VGA case is much simpler because the host is not
>> expected to write to the framebuffer, only read from it, and the guest
>> is not expected to create a cacheable mapping for it, so any
>> incoherency can be trivially solved by cache invalidation on the host
>> side. (Note that this has nothing to do 

Re: [Qemu-devel] [PATCH v3 19/39] qcow2: Update get_cluster_table() to support L2 slices

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This patch updates get_cluster_table() to return L2 slices instead of
> full L2 tables.
> 
> The code itself needs almost no changes, it only needs to call
> offset_to_l2_slice_index() instead of offset_to_l2_index(). This patch
> also renames all the relevant variables and the documentation.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 18/39] qcow2: Refactor get_cluster_table()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> After the previous patch we're now always using l2_load() in
> get_cluster_table() regardless of whether a new L2 table has to be
> allocated or not.
> 
> This patch refactors that part of the code to use one single l2_load()
> call.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 21 +++--
>  1 file changed, 7 insertions(+), 14 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 2a53c1dc5f..0c0cab85e8 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -695,15 +695,7 @@ static int get_cluster_table(BlockDriverState *bs, 
> uint64_t offset,
>  return -EIO;
>  }
>  
> -/* seek the l2 table of the given l2 offset */
> -
> -if (s->l1_table[l1_index] & QCOW_OFLAG_COPIED) {
> -/* load the l2 table in memory */
> -ret = l2_load(bs, offset, l2_offset, _table);
> -if (ret < 0) {
> -return ret;
> -}
> -} else {
> +if (!(s->l1_table[l1_index] & QCOW_OFLAG_COPIED)) {
>  /* First allocate a new L2 table (and do COW if needed) */
>  ret = l2_allocate(bs, l1_index);
>  if (ret < 0) {
> @@ -719,11 +711,12 @@ static int get_cluster_table(BlockDriverState *bs, 
> uint64_t offset,
>  /* Get the offset of the newly-allocated l2 table */
>  l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
>  assert(offset_into_cluster(s, l2_offset) == 0);
> -/* Load the l2 table in memory */
> -ret = l2_load(bs, offset, l2_offset, _table);
> -if (ret < 0) {
> -return ret;
> -}
> +}
> +
> +/* load the l2 table in memory */
> +ret = l2_load(bs, offset, l2_offset, _table);
> +if (ret < 0) {
> +return ret;
>  }

I'd pull the l2_offset assignment (and alignment check) down below the
QCOW_OFLAG_COPIED check, saving us another two lines (!!1!) which we
could then spend on an

assert(s->l1_table[l1_index] & QCOW_OFLAG_COPIED);

Max

>  
>  /* find the cluster offset for the given disk offset */
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 17/39] qcow2: Update l2_allocate() to support L2 slices

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This patch updates l2_allocate() to support the qcow2 cache returning
> L2 slices instead of full L2 tables.
> 
> The old code simply gets an L2 table from the cache and initializes it
> with zeroes or with the contents of an existing table. With a cache
> that returns slices instead of tables the idea remains the same, but
> the code must now iterate over all the slices that are contained in an
> L2 table.
> 
> Since now we're operating with slices the function can no longer
> return the newly-allocated table, so it's up to the caller to retrieve
> the appropriate L2 slice after calling l2_allocate() (note that with
> this patch the caller is still loading full L2 tables, but we'll deal
> with that in a separate patch).
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 56 
> +++
>  1 file changed, 34 insertions(+), 22 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 57349928a9..2a53c1dc5f 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -264,11 +264,12 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int 
> l1_index)
>   *
>   */
>  
> -static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
> +static int l2_allocate(BlockDriverState *bs, int l1_index)
>  {
>  BDRVQcow2State *s = bs->opaque;
>  uint64_t old_l2_offset;
> -uint64_t *l2_table = NULL;
> +uint64_t *l2_slice = NULL;
> +unsigned slice, slice_size2, n_slices;

I'd personally prefer size_t, but oh well.

(And also maybe slice_size_bytes or slice_bytes instead of the
not-so-intuitive slice_size2.  I know we're using *_size2 in other
places, but that's bad enough as it is.)

Overall (with that fixed or not, and with the spelling fixed as pointed
out by Eric);

Reviewed-by: Max Reitz 

However, I'm wondering whether this is the best approach.  The old L2
table is probably not going to be used after this function, so we're
basically polluting the cache here.  That was bad enough so far, but now
that actually means wasting multiple cache entries on it.

Sure, the code is simpler this way.  But maybe it would still be better
to manually copy the data over from the old offset...  (As long as it's
not much more complicated.)

Max

>  int64_t l2_offset;
>  int ret;
>  



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 16/39] qcow2: Prepare l2_allocate() for adding L2 slice support

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Adding support for L2 slices to l2_allocate() needs (among other
> things) an extra loop that iterates over all slices of a new L2 table.
> 
> Putting all changes in one patch would make it hard to read because
> all semantic changes would be mixed with pure indentation changes.
> 
> To make things easier this patch simply creates a new block and
> changes the indentation of all lines of code inside it. Thus, all
> modifications in this patch are cosmetic. There are no semantic
> changes and no variables are renamed yet. The next patch will take
> care of that.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 55 
> ---
>  1 file changed, 30 insertions(+), 25 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 15/39] qcow2: Update l2_load() to support L2 slices

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Each entry in the qcow2 L2 cache stores a full L2 table (which uses a
> complete cluster in the qcow2 image). A cluster is usually too large
> to be used efficiently as the size for a cache entry, so we want to
> decouple both values by allowing smaller cache entries. Therefore the
> qcow2 L2 cache will no longer return full L2 tables but slices
> instead.
> 
> This patch updates l2_load() so it can handle L2 slices correctly.
> Apart from the offset of the L2 table (which we already had) we also
> need the guest offset in order to calculate which one of the slices
> we need.
> 
> An L2 slice has currently the same size as an L2 table (one cluster),
> so for now this function will load exactly the same data as before.
> 
> This patch also removes a stale comment about the return value being
> a pointer to the L2 table. This function returns an error code since
> 55c17e9821c474d5fcdebdc82ed2fc096777d611.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 28 +---
>  1 file changed, 17 insertions(+), 11 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 14/39] qcow2: Add offset_to_l2_slice_index()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Similar to offset_to_l2_index(), this function takes a guest offset
> and returns the index in the L2 slice that contains its L2 entry.
> 
> An L2 slice has currently the same size as an L2 table (one cluster),
> so both functions return the same value for now.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2.h | 5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 13/39] qcow2: Add l2_slice_size field to BDRVQcow2State

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> The BDRVQcow2State structure contains an l2_size field, which stores
> the number of 64-bit entries in an L2 table.
> 
> For efficiency reasons we want to be able to load slices instead of
> full L2 tables, so we need to know how many entries an L2 slice can
> hold.
> 
> An L2 slice is the portion of an L2 table that is loaded by the qcow2
> cache. At the moment that cache can only load complete tables,
> therefore an L2 slice has the same size as an L2 table (one cluster)
> and l2_size == l2_slice_size.
> 
> Later we'll allow smaller slices, but until then we have to use this
> new l2_slice_size field to make the rest of the code ready for that.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2.c | 3 +++
>  block/qcow2.h | 1 +
>  2 files changed, 4 insertions(+)

Am I missing something or does this patch miss setting l2_slice_size in
qcow2_do_open()?

Max

> diff --git a/block/qcow2.c b/block/qcow2.c
> index e2d4bf7ad5..78f067cae7 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -805,6 +805,7 @@ static void read_cache_sizes(BlockDriverState *bs, 
> QemuOpts *opts,
>  typedef struct Qcow2ReopenState {
>  Qcow2Cache *l2_table_cache;
>  Qcow2Cache *refcount_block_cache;
> +int l2_slice_size; /* Number of entries in a slice of the L2 table */
>  bool use_lazy_refcounts;
>  int overlap_check;
>  bool discard_passthrough[QCOW2_DISCARD_MAX];
> @@ -886,6 +887,7 @@ static int qcow2_update_options_prepare(BlockDriverState 
> *bs,
>  }
>  }
>  
> +r->l2_slice_size = s->cluster_size / sizeof(uint64_t);
>  r->l2_table_cache = qcow2_cache_create(bs, l2_cache_size);
>  r->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size);
>  if (r->l2_table_cache == NULL || r->refcount_block_cache == NULL) {
> @@ -1049,6 +1051,7 @@ static void 
> qcow2_update_options_commit(BlockDriverState *bs,
>  }
>  s->l2_table_cache = r->l2_table_cache;
>  s->refcount_block_cache = r->refcount_block_cache;
> +s->l2_slice_size = r->l2_slice_size;
>  
>  s->overlap_check = r->overlap_check;
>  s->use_lazy_refcounts = r->use_lazy_refcounts;
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 0559afbc63..e0aee88811 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -251,6 +251,7 @@ typedef struct BDRVQcow2State {
>  int cluster_bits;
>  int cluster_size;
>  int cluster_sectors;
> +int l2_slice_size;
>  int l2_bits;
>  int l2_size;
>  int l1_size;
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 12/39] qcow2: Add offset_to_l1_index()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> Similar to offset_to_l2_index(), this function returns the index in
> the L1 table for a given guest offset. This is only used in a couple
> of places and it's not a particularly complex calculation, but it
> makes the code a bit more readable.
> 
> Although in the qcow2_get_cluster_offset() case the old code was
> taking advantage of the l1_bits variable, we're going to get rid of
> the other uses of l1_bits in a later patch anyway, so it doesn't make
> sense to keep it just for this.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 4 ++--
>  block/qcow2.h | 5 +
>  2 files changed, 7 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 11/39] qcow2: Remove BDS parameter from qcow2_cache_is_table_offset()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to pass it
> to qcow2_cache_get_table_addr(). This is no longer necessary so this
> parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c| 3 +--
>  block/qcow2-refcount.c | 6 +++---
>  block/qcow2.h  | 3 +--
>  3 files changed, 5 insertions(+), 7 deletions(-)

I was hoping you'd smuggle something funny into one of the commit
messages of these patches to test whether we'd actually read them. :(

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 10/39] qcow2: Remove BDS parameter from qcow2_cache_discard()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to pass it
> to qcow2_cache_get_table_idx() and qcow2_cache_table_release(). This
> is no longer necessary so this parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c| 2 +-
>  block/qcow2-refcount.c | 6 +++---
>  block/qcow2.h  | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] MTTCG External Halt

2018-01-31 Thread Alistair Francis
On Wed, Jan 31, 2018 at 10:59 AM, Peter Maydell
 wrote:
> On 31 January 2018 at 18:51, Alistair Francis  wrote:
>> On Wed, Jan 31, 2018 at 10:48 AM, Peter Maydell
>>  wrote:
>>> On 31 January 2018 at 18:17, Alistair Francis  wrote:
 On Wed, Jan 31, 2018 at 9:13 AM, Paolo Bonzini  wrote:
> cpu->halted = false likewise should not be needed here, but you cannot
> just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
> interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
> cpu_handle_halt will clear cpu->halted.

 The problem with that is that I hit this assert for ARM CPUs:

 qemu-system-aarch64: ./target/arm/cpu.h:1446: arm_el_is_aa64:
 Assertion `el >= 1 && el <= 3' failed.
>>>
>>> Backtrace from when you hit that might be useful...
>>
>> Here it is:
>>
>> (gdb) bt
>> #0  0x71a030bb in __GI_raise (sig=sig@entry=6) at
>> ../sysdeps/unix/sysv/linux/raise.c:51
>> #1  0x71a04f5d in __GI_abort () at abort.c:90
>> #2  0x719faf17 in __assert_fail_base (fmt=,
>> assertion=assertion@entry=0x55cf86c4 "el >= 1 && el <= 3",
>> file=file@entry=0x55cf8660
>> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
>> function=function@entry=0x55d314e8 <__PRETTY_FUNCTION__.24916>
>> "arm_el_is_aa64") at assert.c:92
>> #3  0x719fafc2 in __GI___assert_fail
>> (assertion=assertion@entry=0x55cf86c4 "el >= 1 && el <= 3",
>> file=file@entry=0x55cf8660
>> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
>> function=function@entry=0x55d314e8 <__PRETTY_FUNCTION__.24916>
>> "arm_el_is_aa64") at assert.c:101
>> #4  0x557eb872 in arm_el_is_aa64 (el=0, env=0x5723c7f8) at
>> /scratch/alistai/master-qemu/target/arm/cpu.h:1446
>> #5  0x55951233 in arm_el_is_aa64 (el=0, env=0x5723c7f8) at
>> /scratch/alistai/master-qemu/target/arm/cpu.h:1838
>> #6  0x55951233 in arm_cpu_do_interrupt (cs=0x57234550) at
>> /scratch/alistai/master-qemu/target/arm/helper.c:8020
>
> The problem is here (or further down the callstack) -- you
> definitely don't want to be trying to take an interrupt from
> the guest's perspective, which is what arm_cpu_do_interrupt()
> is for...
>
> This is probably happening because cpu->exception_index isn't
> right at this point (though the arm code has a habit of leaving
> it set to whatever its value was last...)

Ok, adding a cpu->exception_index = -1 seems to fix the assert.

Thanks for that Peter.

Now I'm just left with a hang :(

Alistair

>
>> #7  0x5585e75b in cpu_handle_exception (ret=> pointer>, cpu=0x56c64200)
>> at /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:532
>> #8  0x5585e75b in cpu_exec (cpu=cpu@entry=0x57234550) at
>> /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:748
>> #9  0x5582d963 in tcg_cpu_exec (cpu=0x57234550) at
>> /scratch/alistai/master-qemu/cpus.c:1297
>> #10 0x5582d963 in qemu_tcg_cpu_thread_fn (arg=0x57234550)
>> at /scratch/alistai/master-qemu/cpus.c:1502
>> #11 0x71db37fc in start_thread (arg=0x7ffef6b43700) at
>> pthread_create.c:465
>> #12 0x71ae0b5f in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> thanks
> -- PMM



Re: [Qemu-devel] [PATCH v3 09/39] qcow2: Remove BDS parameter from qcow2_cache_clean_unused()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to pass it
> to qcow2_cache_table_release(). This is no longer necessary so this
> parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c | 2 +-
>  block/qcow2.c   | 4 ++--
>  block/qcow2.h   | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 08/39] qcow2: Remove BDS parameter from qcow2_cache_destroy()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was never using the BlockDriverState parameter so it can
> be safely removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c |  2 +-
>  block/qcow2.c   | 16 
>  block/qcow2.h   |  2 +-
>  3 files changed, 10 insertions(+), 10 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 07/39] qcow2: Remove BDS parameter from qcow2_cache_put()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to pass it
> to qcow2_cache_get_table_idx(). This is no longer necessary so this
> parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c|  2 +-
>  block/qcow2-cluster.c  | 28 ++--
>  block/qcow2-refcount.c | 30 +++---
>  block/qcow2.h  |  2 +-
>  4 files changed, 31 insertions(+), 31 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 06/39] qcow2: Remove BDS parameter from qcow2_cache_entry_mark_dirty()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to pass it
> to qcow2_cache_get_table_idx(). This is no longer necessary so this
> parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c|  3 +--
>  block/qcow2-cluster.c  | 12 ++--
>  block/qcow2-refcount.c | 14 ++
>  block/qcow2.h  |  3 +--
>  4 files changed, 14 insertions(+), 18 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 05/39] qcow2: Remove BDS parameter from qcow2_cache_table_release()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to get the
> cache table size (since it was equal to the cluster size). This is no
> longer necessary so this parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 04/39] qcow2: Remove BDS parameter from qcow2_cache_get_table_idx()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to get the
> cache table size (since it was equal to the cluster size). This is no
> longer necessary so this parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 03/39] qcow2: Remove BDS parameter from qcow2_cache_get_table_addr()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function was only using the BlockDriverState parameter to get the
> cache table size (since it was equal to the cluster size). This is no
> longer necessary so this parameter can be removed.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 02/39] qcow2: Add table size field to Qcow2Cache

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> The table size in the qcow2 cache is currently equal to the cluster
> size. This doesn't allow us to use the cache memory efficiently,
> particularly with large cluster sizes, so we need to be able to have
> smaller cache tables that are independent from the cluster size. This
> patch adds a new field to Qcow2Cache that we can use instead of the
> cluster size.
> 
> The current table size is still being initialized to the cluster size,
> so there are no semantic changes yet, but this patch will allow us to
> prepare the rest of the code and simplify a few function calls.
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cache.c | 29 ++---
>  1 file changed, 14 insertions(+), 15 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 39/39] iotests: Add l2-cache-entry-size to iotest 137

2018-01-31 Thread Eric Blake
On 01/26/2018 09:00 AM, Alberto Garcia wrote:
> This test tries reopening a qcow2 image with valid and invalid
> options. This patch adds l2-cache-entry-size to the set.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/137 | 5 +
>  tests/qemu-iotests/137.out | 2 ++
>  2 files changed, 7 insertions(+)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 38/39] iotests: Test downgrading an image using a small L2 slice size

2018-01-31 Thread Eric Blake
On 01/26/2018 09:00 AM, Alberto Garcia wrote:
> expand_zero_clusters_in_l1() is used when downgrading qcow2 images
> from v3 to v2 (compat=0.10). This is one of the functions that needed
> more changes to support L2 slices, so this patch extends iotest 061 to
> test downgrading a qcow2 image using a smaller slice size.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/061 | 16 
>  tests/qemu-iotests/061.out | 61 
> ++
>  2 files changed, 77 insertions(+)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 37/39] iotests: Test valid values of l2-cache-entry-size

2018-01-31 Thread Eric Blake
On 01/26/2018 09:00 AM, Alberto Garcia wrote:
> The l2-cache-entry-size setting can only contain values that are
> powers of two between 512 and the cluster size.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  tests/qemu-iotests/103 | 17 +
>  tests/qemu-iotests/103.out |  3 +++
>  2 files changed, 20 insertions(+)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 36/39] qcow2: Allow configuring the L2 slice size

2018-01-31 Thread Eric Blake
On 01/26/2018 09:00 AM, Alberto Garcia wrote:
> Now that the code is ready to handle L2 slices we can finally add an
> option to allow configuring their size.
> 
> An L2 slice is the portion of an L2 table that is read by the qcow2
> cache. Until now the cache was always reading full L2 tables, and
> since the L2 table size is equal to the cluster size this was not very
> efficient with large clusters. Here's a more detailed explanation of
> why it makes sense to have smaller cache entries in order to load L2
> data:
> 
>https://lists.gnu.org/archive/html/qemu-block/2017-09/msg00635.html
> 
> This patch introduces a new command-line option to the qcow2 driver
> named l2-cache-entry-size (cf. l2-cache-size). The cache entry size
> has the same restrictions as the cluster size: it must be a power of
> two and it has the same range of allowed values, with the additional
> requirement that it must not be larger than the cluster size.
> 
> The L2 cache entry size (L2 slice size) remains equal to the cluster
> size for now by default, so this feature must be explicitly enabled.
> Although my tests show that 4KB slices consistently improve
> performance and give the best results, let's wait and make more tests
> with different cluster sizes before deciding on an optimal default.
> 
> Now that the cache entry size is not necessarily equal to the cluster
> size we need to reflect that in the MIN_L2_CACHE_SIZE documentation.
> That minimum value is a requirement of the COW algorithm: we need to
> read two L2 slices (and not two L2 tables) in order to do COW, see
> l2_allocate() for the actual code.
> 
> Signed-off-by: Alberto Garcia 
> ---
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v3 01/39] qcow2: Fix documentation of get_cluster_table()

2018-01-31 Thread Max Reitz
On 2018-01-26 15:59, Alberto Garcia wrote:
> This function has not been returning the offset of the L2 table since
> commit 3948d1d4876065160583e79533bf604481063833
> 
> Signed-off-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> ---
>  block/qcow2-cluster.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64

2018-01-31 Thread Christoffer Dall
On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel
 wrote:
> On 31 January 2018 at 17:39, Christoffer Dall
>  wrote:
>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel
>>  wrote:
>>> On 31 January 2018 at 16:53, Christoffer Dall
>>>  wrote:
 On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel
  wrote:
> On 31 January 2018 at 09:53, Christoffer Dall
>  wrote:
>> On Mon, Jan 29, 2018 at 10:32:12AM +, Marc Zyngier wrote:
>>> On 29/01/18 10:04, Peter Maydell wrote:
>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert 
>>> >  wrote:
>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert 
>>> >>>  wrote:
>>>  * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>> > I think the correct fix here is that your test code should turn
>>> > its MMU on. Trying to treat guest RAM as uncacheable doesn't work
>>> > for Arm KVM guests (for the same reason that VGA device video 
>>> > memory
>>> > doesn't work). If it's RAM your guest has to arrange to map it as
>>> > Normal Cacheable, and then everything should work fine.
>>> 
>>>  Does this cause problems with migrating at just the wrong point 
>>>  during
>>>  a VM boot?
>>> >>>
>>> >>> It wouldn't surprise me if it did, but I don't think I've ever
>>> >>> tried to provoke that problem...
>>> >>
>>> >> If you think it'll get the RAM contents wrong, it might be best to 
>>> >> fail
>>> >> the migration if you can detect the cache is disabled in the guest.
>>> >
>>> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit
>>> > in the guest's system registers, and refuse migration if it's off...
>>> >
>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end
>>> > of the stick about how thin the ice is in the period before the
>>> > guest turns on its MMU...)
>>>
>>> Once MMU and caches are on, we should be in a reasonable place for QEMU
>>> to have a consistent view of the memory. The trick is to prevent the
>>> vcpus from changing that. A guest could perfectly turn off its MMU at
>>> any given time if it needs to (and it is actually required on some HW if
>>> you want to mitigate headlining CVEs), and KVM won't know about that.
>>>
>>
>> (Clarification: KVM can detect this is it bother to check the VCPU's
>> system registers, but we don't trap to KVM when the VCPU turns off its
>> caches, right?)
>>
>>> You may have to pause the vcpus before starting the migration, or
>>> introduce a new KVM feature that would automatically pause a vcpu that
>>> is trying to disable its MMU while the migration is on. This would
>>> involve trapping all the virtual memory related system registers, with
>>> an obvious cost. But that cost would be limited to the time it takes to
>>> migrate the memory, so maybe that's acceptable.
>>>
>> Is that even sufficient?
>>
>> What if the following happened. (1) guest turns off MMU, (2) guest
>> writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads
>> guest ram.  QEMU's view of guest ram is now incorrect (stale,
>> incoherent, ...).
>>
>> I'm also not really sure if pausing one VCPU because it turned off its
>> MMU will go very well when trying to migrate a large VM (wouldn't this
>> ask for all the other VCPUs beginning to complain that the stopped VCPU
>> appears to be dead?).  As a short-term 'fix' it's probably better to
>> refuse migration if you detect that a VCPU had begun turning off its
>> MMU.
>>
>> On the larger scale of thins; this appears to me to be another case of
>> us really needing some way to coherently access memory between QEMU and
>> the VM, but in the case of the VCPU turning off the MMU prior to
>> migration, we don't even know where it may have written data, and I'm
>> therefore not really sure what the 'proper' solution would be.
>>
>> (cc'ing Ard who has has thought about this problem before in the context
>> of UEFI and VGA.)
>>
>
> Actually, the VGA case is much simpler because the host is not
> expected to write to the framebuffer, only read from it, and the guest
> is not expected to create a cacheable mapping for it, so any
> incoherency can be trivially solved by cache invalidation on the host
> side. (Note that this has nothing to do with DMA coherency, but only
> with PCI MMIO BARs that are backed by DRAM in the host)

 In case of the running guest, the host will also only read from the
 cached mapping.  Of 

Re: [Qemu-devel] [PATCH v2] block: maintain persistent disabled bitmaps

2018-01-31 Thread Max Reitz
On 2018-01-29 19:43, Max Reitz wrote:
> On 2018-01-22 11:41, Vladimir Sementsov-Ogievskiy wrote:
>> To maintain load/store disabled bitmap there is new approach:
>>
>>  - deprecate @autoload flag of block-dirty-bitmap-add, make it ignored
>>  - store enabled bitmaps as "auto" to qcow2
>>  - store disabled bitmaps without "auto" flag to qcow2
>>  - on qcow2 open load "auto" bitmaps as enabled and others
>>as disabled (except in_use bitmaps)
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> Reviewed-by: John Snow 
>> ---
> 
> Thanks, looks very reasonable.  Applied to my block branch:
> 
> https://github.com/XanClic/qemu/commits/block

...aaand I've only just now seen that iotest 176 will need to be fixed
along with this, so I'm going to unqueue this patch for now.

And when I'm already at it: Should we add deprecation information to
qemu-doc.texi?

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] hbitmap: fix missing restore count when finish deserialization

2018-01-31 Thread Max Reitz
On 2018-01-31 20:01, John Snow wrote:
> 
> 
> On 01/31/2018 01:54 PM, Max Reitz wrote:
>> On 2018-01-18 11:58, Liang Li wrote:
>>> The .count of HBitmap is forgot to set in function
>>> hbitmap_deserialize_finish, let's set it to the right value.
>>>
>>> Cc: Vladimir Sementsov-Ogievskiy 
>>> Cc: Fam Zheng 
>>> Cc: Max Reitz 
>>> Cc: John Snow 
>>> Signed-off-by: weiping zhang 
>>> Signed-off-by: Liang Li 
>>> ---
>>>  util/hbitmap.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>>> index 289778a..58a2c93 100644
>>> --- a/util/hbitmap.c
>>> +++ b/util/hbitmap.c
>>> @@ -630,6 +630,7 @@ void hbitmap_deserialize_finish(HBitmap *bitmap)
>>>  }
>>>  
>>>  bitmap->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
>>> +bitmap->count = hb_count_between(bitmap, 0, bitmap->size - 1);
>>>  }
>>>  
>>>  void hbitmap_free(HBitmap *hb)
>>
>> Actually CC-ing John...
>>
>> (Looks good to me, though.)
>>
>> Max
>>
> 
> Staged already, sorry. Will send the PR this Friday.
> 
> https://github.com/jnsnow/qemu/commit/78ad6913bd34a54f658d5182990fe149614d6402

OK, good. :-)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] hbitmap: fix missing restore count when finish deserialization

2018-01-31 Thread John Snow


On 01/31/2018 01:54 PM, Max Reitz wrote:
> On 2018-01-18 11:58, Liang Li wrote:
>> The .count of HBitmap is forgot to set in function
>> hbitmap_deserialize_finish, let's set it to the right value.
>>
>> Cc: Vladimir Sementsov-Ogievskiy 
>> Cc: Fam Zheng 
>> Cc: Max Reitz 
>> Cc: John Snow 
>> Signed-off-by: weiping zhang 
>> Signed-off-by: Liang Li 
>> ---
>>  util/hbitmap.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/util/hbitmap.c b/util/hbitmap.c
>> index 289778a..58a2c93 100644
>> --- a/util/hbitmap.c
>> +++ b/util/hbitmap.c
>> @@ -630,6 +630,7 @@ void hbitmap_deserialize_finish(HBitmap *bitmap)
>>  }
>>  
>>  bitmap->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
>> +bitmap->count = hb_count_between(bitmap, 0, bitmap->size - 1);
>>  }
>>  
>>  void hbitmap_free(HBitmap *hb)
> 
> Actually CC-ing John...
> 
> (Looks good to me, though.)
> 
> Max
> 

Staged already, sorry. Will send the PR this Friday.

https://github.com/jnsnow/qemu/commit/78ad6913bd34a54f658d5182990fe149614d6402



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] MTTCG External Halt

2018-01-31 Thread Peter Maydell
On 31 January 2018 at 18:51, Alistair Francis  wrote:
> On Wed, Jan 31, 2018 at 10:48 AM, Peter Maydell
>  wrote:
>> On 31 January 2018 at 18:17, Alistair Francis  wrote:
>>> On Wed, Jan 31, 2018 at 9:13 AM, Paolo Bonzini  wrote:
 cpu->halted = false likewise should not be needed here, but you cannot
 just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
 interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
 cpu_handle_halt will clear cpu->halted.
>>>
>>> The problem with that is that I hit this assert for ARM CPUs:
>>>
>>> qemu-system-aarch64: ./target/arm/cpu.h:1446: arm_el_is_aa64:
>>> Assertion `el >= 1 && el <= 3' failed.
>>
>> Backtrace from when you hit that might be useful...
>
> Here it is:
>
> (gdb) bt
> #0  0x71a030bb in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x71a04f5d in __GI_abort () at abort.c:90
> #2  0x719faf17 in __assert_fail_base (fmt=,
> assertion=assertion@entry=0x55cf86c4 "el >= 1 && el <= 3",
> file=file@entry=0x55cf8660
> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
> function=function@entry=0x55d314e8 <__PRETTY_FUNCTION__.24916>
> "arm_el_is_aa64") at assert.c:92
> #3  0x719fafc2 in __GI___assert_fail
> (assertion=assertion@entry=0x55cf86c4 "el >= 1 && el <= 3",
> file=file@entry=0x55cf8660
> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
> function=function@entry=0x55d314e8 <__PRETTY_FUNCTION__.24916>
> "arm_el_is_aa64") at assert.c:101
> #4  0x557eb872 in arm_el_is_aa64 (el=0, env=0x5723c7f8) at
> /scratch/alistai/master-qemu/target/arm/cpu.h:1446
> #5  0x55951233 in arm_el_is_aa64 (el=0, env=0x5723c7f8) at
> /scratch/alistai/master-qemu/target/arm/cpu.h:1838
> #6  0x55951233 in arm_cpu_do_interrupt (cs=0x57234550) at
> /scratch/alistai/master-qemu/target/arm/helper.c:8020

The problem is here (or further down the callstack) -- you
definitely don't want to be trying to take an interrupt from
the guest's perspective, which is what arm_cpu_do_interrupt()
is for...

This is probably happening because cpu->exception_index isn't
right at this point (though the arm code has a habit of leaving
it set to whatever its value was last...)

> #7  0x5585e75b in cpu_handle_exception (ret= pointer>, cpu=0x56c64200)
> at /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:532
> #8  0x5585e75b in cpu_exec (cpu=cpu@entry=0x57234550) at
> /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:748
> #9  0x5582d963 in tcg_cpu_exec (cpu=0x57234550) at
> /scratch/alistai/master-qemu/cpus.c:1297
> #10 0x5582d963 in qemu_tcg_cpu_thread_fn (arg=0x57234550)
> at /scratch/alistai/master-qemu/cpus.c:1502
> #11 0x71db37fc in start_thread (arg=0x7ffef6b43700) at
> pthread_create.c:465
> #12 0x71ae0b5f in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

thanks
-- PMM



Re: [Qemu-devel] [PATCH] iotests: Fix CID for VMDK afl image

2018-01-31 Thread Max Reitz
On 2018-01-30 07:25, Fam Zheng wrote:
> This reverts commit 76bf133c4 which updated the reference output, and
> fixed the reference image, because the code path we want to exercise is
> actually the invalid image size.
> 
> The descriptor block in the image, which includes the CID to verify, has been
> invalid since the reference image was added. Since commit 9877860e7bd we 
> report
> this error earlier than the "file too large", so 059.out mismatches.
> 
> The binary change is generated along the operations of:
> 
>   $ bunzip2 afl9.vmdk.bz2
>   $ qemu-img create -f vmdk fix.vmdk 1G
>   $ dd if=afl9.vmdk.bz2 of=fix.vmdk bs=512 count=1 conv=notrunc
>   $ mv fix.vmdk afl9.vmdk
>   $ bzip2 afl9.vmdk
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v2: Fix commit message "qcow2 -> vmdk". [Kevin]
> Revert 76bf133c4.

H, now this fails again on my 32 bit build. :-(

The issue there is that you get a "Cannot allocate memory" when trying
to open the file.  My current fix was 2291712c39111a732 which simply
converted that to "Invalid argument", but now it's getting a bit more
complicated...  Should I just continue to play the game and check the
output for "Cannot allocate memory" and print exactly what the reference
output is expecting...?

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC PATCH qemu v6] vfio-pci: Allow mmap of MSIX BAR

2018-01-31 Thread Alex Williamson
On Wed, 31 Jan 2018 18:24:45 +1100
Alexey Kardashevskiy  wrote:

> This makes use of a new VFIO_REGION_INFO_CAP_MSIX_MAPPABLE capability
> which tells that a region with MSIX data can be mapped entirely, i.e.
> the VFIO PCI driver won't prevent MSIX vectors area from being mapped.
> 
> With this change, all BARs are mapped in a single chunk and MSIX vectors
> are emulated on top unless the machine requests not to by defining and
> enabling a new "vfio-no-msix-emulation" property. At the moment only
> sPAPR machine does so - it prohibits MSIX emulation and does not allow
> enabling it as it does not define the "set" callback for the new property;
> the new property also does not appear in "-machine pseries,help".
> 
> This change may affect machines which are using MSIX emulation and
> mapping MMIO RAM regions for DMA as previously MSIX containing BAR
> would be split in chunks aligned to the system page size and we would
> only see aligned RAM memory sections in vfio_listener_region_add/del.
> As the system page size is usually is equal or bigger than the IOMMU
> page size, vfio_dma_map() did not have a legitimate reason to fail so
> mapping failures were fatal. Note that this behaviour silently skips
> parts of BARs adjacent to the MSIX data.
> 
> With this change, we map entire MSIX BAR and emulate MSIX on top of it
> so vfio_listener_region_add/del may be called with unaligned MMIO RAM
> regions and vfio_dma_map() will fail on these. In order to mitigate
> this, this changes vfio_listener_region_add() not to try vfio_dma_map() if
> the memory section is not aligned to the minimal IOMMU page; an error
> is printed instead. This also treats all errors from vfio_dma_map()
> non fatal when the listener is called on the MMIO RAM region.
> vfio_listener_region_del() is adjusted accordingly.
> 
> If the amount of errors printed is overwhelming, the MSIX relocation
> could be used to avoid excessive error output.
> 
> This requires the kernel change - "vfio-pci: Allow mapping MSIX BAR" -
> for the new capability: https://www.spinics.net/lists/kvm/msg160282.html

Looks like there are at least 3 or 4 patches here:

A) Update linux-headers (gated by v4.16-rc1)
B) vfio_is_cap_present
C) check alignment on region_add/del vs hostwin, non-fatal ram_device
D) platform directed disabling of MSI-X MemoryRegion

 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> Changes:
> v6:
> * dropped that stupid p2p property patch
> 
> v5:
> * rebased on top of 'p2p' proposed patch
> 
> v4:
> * silenced dma map errors if unaligned mapping is attempted - they are going
> to fail anyway
> 
> v3:
> * vfio_listener_region_add() won't make qemu exit if failed on MMIO MR
> ---
>  include/hw/vfio/vfio-common.h |  1 +
>  linux-headers/linux/vfio.h|  5 
>  hw/ppc/spapr.c|  7 +
>  hw/vfio/common.c  | 66 
> +++
>  hw/vfio/pci.c | 10 +++
>  5 files changed, 83 insertions(+), 6 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index f3a2ac9..927d600 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -171,6 +171,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
>   struct vfio_region_info **info);
>  int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
>   uint32_t subtype, struct vfio_region_info 
> **info);
> +bool vfio_is_cap_present(VFIODevice *vbasedev, uint16_t cap_type, int 
> region);
>  #endif
>  extern const MemoryListener vfio_prereg_listener;
>  
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index 4312e96..b45182e 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -301,6 +301,11 @@ struct vfio_region_info_cap_type {
>  #define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG   (2)
>  #define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG(3)
>  
> +/*
> + * The MSIX mappable capability informs that MSIX data of a BAR can be 
> mmapped.
> + */
> +#define VFIO_REGION_INFO_CAP_MSIX_MAPPABLE   3
> +
>  /**
>   * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
>   *   struct vfio_irq_info)
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 32a876b..6d333e2 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2830,6 +2830,11 @@ static void spapr_set_modern_hotplug_events(Object 
> *obj, bool value,
>  spapr->use_hotplug_event_source = value;
>  }
>  
> +static bool spapr_get_msix_emulation(Object *obj, Error **errp)
> +{
> +return true;
> +}
> +
>  static char *spapr_get_resize_hpt(Object *obj, Error **errp)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> @@ -2911,6 +2916,8 @@ static void spapr_instance_init(Object *obj)
>  object_property_set_description(obj, "vsmt",
>  "Virtual SMT: KVM behaves as 

Re: [Qemu-devel] MTTCG External Halt

2018-01-31 Thread Alistair Francis
On Wed, Jan 31, 2018 at 10:51 AM, Alistair Francis  wrote:
> On Wed, Jan 31, 2018 at 10:48 AM, Peter Maydell
>  wrote:
>> On 31 January 2018 at 18:17, Alistair Francis  wrote:
>>> On Wed, Jan 31, 2018 at 9:13 AM, Paolo Bonzini  wrote:
 cpu->halted = false likewise should not be needed here, but you cannot
 just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
 interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
 cpu_handle_halt will clear cpu->halted.
>>>
>>> The problem with that is that I hit this assert for ARM CPUs:
>>>
>>> qemu-system-aarch64: ./target/arm/cpu.h:1446: arm_el_is_aa64:
>>> Assertion `el >= 1 && el <= 3' failed.
>>
>> Backtrace from when you hit that might be useful...
>
> Here it is:
>
> (gdb) bt
> #0  0x71a030bb in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x71a04f5d in __GI_abort () at abort.c:90
> #2  0x719faf17 in __assert_fail_base (fmt=,
> assertion=assertion@entry=0x55cf86c4 "el >= 1 && el <= 3",
> file=file@entry=0x55cf8660
> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
> function=function@entry=0x55d314e8 <__PRETTY_FUNCTION__.24916>
> "arm_el_is_aa64") at assert.c:92
> #3  0x719fafc2 in __GI___assert_fail
> (assertion=assertion@entry=0x55cf86c4 "el >= 1 && el <= 3",
> file=file@entry=0x55cf8660
> "/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
> function=function@entry=0x55d314e8 <__PRETTY_FUNCTION__.24916>
> "arm_el_is_aa64") at assert.c:101
> #4  0x557eb872 in arm_el_is_aa64 (el=0, env=0x5723c7f8) at
> /scratch/alistai/master-qemu/target/arm/cpu.h:1446
> #5  0x55951233 in arm_el_is_aa64 (el=0, env=0x5723c7f8) at
> /scratch/alistai/master-qemu/target/arm/cpu.h:1838
> #6  0x55951233 in arm_cpu_do_interrupt (cs=0x57234550) at
> /scratch/alistai/master-qemu/target/arm/helper.c:8020
> #7  0x5585e75b in cpu_handle_exception (ret= pointer>, cpu=0x56c64200)
> at /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:532
> #8  0x5585e75b in cpu_exec (cpu=cpu@entry=0x57234550) at
> /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:748
> #9  0x5582d963 in tcg_cpu_exec (cpu=0x57234550) at
> /scratch/alistai/master-qemu/cpus.c:1297
> #10 0x5582d963 in qemu_tcg_cpu_thread_fn (arg=0x57234550)
> at /scratch/alistai/master-qemu/cpus.c:1502
> #11 0x71db37fc in start_thread (arg=0x7ffef6b43700) at
> pthread_create.c:465
> #12 0x71ae0b5f in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This diff works around it, at least for now:

diff --git a/target/arm/helper.c b/target/arm/helper.c
index eebc898b37..06b40809d9 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8015,6 +8015,10 @@ void arm_cpu_do_interrupt(CPUState *cs)
 return;
 }

+if (is_a64(env) && new_el == 0) {
+return;
+}
+
 assert(!excp_is_internal(cs->exception_index));
 if (arm_el_is_aa64(env, new_el)) {
 arm_cpu_do_interrupt_aarch64(cs);


>
> Alistair
>
>>
>> thanks
>> -- PMM



Re: [Qemu-devel] [PATCH] hbitmap: fix missing restore count when finish deserialization

2018-01-31 Thread Max Reitz
On 2018-01-18 11:58, Liang Li wrote:
> The .count of HBitmap is forgot to set in function
> hbitmap_deserialize_finish, let's set it to the right value.
> 
> Cc: Vladimir Sementsov-Ogievskiy 
> Cc: Fam Zheng 
> Cc: Max Reitz 
> Cc: John Snow 
> Signed-off-by: weiping zhang 
> Signed-off-by: Liang Li 
> ---
>  util/hbitmap.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index 289778a..58a2c93 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -630,6 +630,7 @@ void hbitmap_deserialize_finish(HBitmap *bitmap)
>  }
>  
>  bitmap->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
> +bitmap->count = hb_count_between(bitmap, 0, bitmap->size - 1);
>  }
>  
>  void hbitmap_free(HBitmap *hb)

Actually CC-ing John...

(Looks good to me, though.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] MTTCG External Halt

2018-01-31 Thread Alistair Francis
On Wed, Jan 31, 2018 at 10:48 AM, Peter Maydell
 wrote:
> On 31 January 2018 at 18:17, Alistair Francis  wrote:
>> On Wed, Jan 31, 2018 at 9:13 AM, Paolo Bonzini  wrote:
>>> cpu->halted = false likewise should not be needed here, but you cannot
>>> just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
>>> interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
>>> cpu_handle_halt will clear cpu->halted.
>>
>> The problem with that is that I hit this assert for ARM CPUs:
>>
>> qemu-system-aarch64: ./target/arm/cpu.h:1446: arm_el_is_aa64:
>> Assertion `el >= 1 && el <= 3' failed.
>
> Backtrace from when you hit that might be useful...

Here it is:

(gdb) bt
#0  0x71a030bb in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:51
#1  0x71a04f5d in __GI_abort () at abort.c:90
#2  0x719faf17 in __assert_fail_base (fmt=,
assertion=assertion@entry=0x55cf86c4 "el >= 1 && el <= 3",
file=file@entry=0x55cf8660
"/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
function=function@entry=0x55d314e8 <__PRETTY_FUNCTION__.24916>
"arm_el_is_aa64") at assert.c:92
#3  0x719fafc2 in __GI___assert_fail
(assertion=assertion@entry=0x55cf86c4 "el >= 1 && el <= 3",
file=file@entry=0x55cf8660
"/scratch/alistai/master-qemu/target/arm/cpu.h", line=line@entry=1446,
function=function@entry=0x55d314e8 <__PRETTY_FUNCTION__.24916>
"arm_el_is_aa64") at assert.c:101
#4  0x557eb872 in arm_el_is_aa64 (el=0, env=0x5723c7f8) at
/scratch/alistai/master-qemu/target/arm/cpu.h:1446
#5  0x55951233 in arm_el_is_aa64 (el=0, env=0x5723c7f8) at
/scratch/alistai/master-qemu/target/arm/cpu.h:1838
#6  0x55951233 in arm_cpu_do_interrupt (cs=0x57234550) at
/scratch/alistai/master-qemu/target/arm/helper.c:8020
#7  0x5585e75b in cpu_handle_exception (ret=, cpu=0x56c64200)
at /scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:532
#8  0x5585e75b in cpu_exec (cpu=cpu@entry=0x57234550) at
/scratch/alistai/master-qemu/accel/tcg/cpu-exec.c:748
#9  0x5582d963 in tcg_cpu_exec (cpu=0x57234550) at
/scratch/alistai/master-qemu/cpus.c:1297
#10 0x5582d963 in qemu_tcg_cpu_thread_fn (arg=0x57234550)
at /scratch/alistai/master-qemu/cpus.c:1502
#11 0x71db37fc in start_thread (arg=0x7ffef6b43700) at
pthread_create.c:465
#12 0x71ae0b5f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Alistair

>
> thanks
> -- PMM



Re: [Qemu-devel] MTTCG External Halt

2018-01-31 Thread Peter Maydell
On 31 January 2018 at 18:17, Alistair Francis  wrote:
> On Wed, Jan 31, 2018 at 9:13 AM, Paolo Bonzini  wrote:
>> cpu->halted = false likewise should not be needed here, but you cannot
>> just clear CPU_INTERRUPT_HALT either.  You need to set a *different*
>> interrupt request bit (the dummy CPU_INTERRUPT_EXITTB will do) and
>> cpu_handle_halt will clear cpu->halted.
>
> The problem with that is that I hit this assert for ARM CPUs:
>
> qemu-system-aarch64: ./target/arm/cpu.h:1446: arm_el_is_aa64:
> Assertion `el >= 1 && el <= 3' failed.

Backtrace from when you hit that might be useful...

thanks
-- PMM



Re: [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas

2018-01-31 Thread Eric Blake
On 01/31/2018 12:35 PM, Max Reitz wrote:

>>> Not sure how useful 2 is, though.  (I don't know.  I always hear about
>>> people wanting to optimize for such a case where a backing file is
>>> shorter than the overlay, but I can't imagine a real use case for that.)
>>
>> I can; here's what's happened to me personally.  I created an image, and
>> took internal snapshots (yeah, I know those aren't in fashion these
>> days, but this was long ago).  Later, I ran out of space.  I wanted to
>> resize the image, but am not convinced whether resizing the image will
>> play nicely with the internal snapshots (in fact, I don't recall
>> off-hand whether this was something we prevented in the past and now
>> support, or if it is still unsupported now...) - so the easiest way is
>> to create a larger overlay image.
> 
> But you were convinced that creating an overlay would play nicely with
> the internal snapshots? ;-)

Yes. As long as I don't mind pointing back to the original file (rather
than the overlay) at the point I attempt to revert to the internal
snapshot, then loading the snapshot doesn't have to worry about a size
change.

> 
> I'm not sure whether that sounds like a use case I'd want to optimize
> for, but, well.

I guess it boils down to whether the optimization is a low-maintenance
freebie.  There's no reason to reject the optimization if a patch
demonstrates it is easy to support and it has iotests coverage.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Bug 1739304] Re: Passing a directory to (eg.) -cdrom results in misleading error message

2018-01-31 Thread John Snow


On 01/31/2018 01:36 PM, Michal Suchánek wrote:
> On Wed, 20 Dec 2017 21:57:00 -0500
> John Snow  wrote:
> 
>> On 12/20/2017 05:31 AM, Dr. David Alan Gilbert wrote:
>>> Yep, can repeat it here, it seems pretty random which error it
>>> gives:
>>>
>>> [dgilbert@dgilbert-t530 try]$ ./x86_64-softmmu/qemu-system-x86_64
>>> -cdrom /tmp qemu-system-x86_64: -cdrom /tmp: Could not refresh
>>> total sector count: Invalid argument [dgilbert@dgilbert-t530
>>> try]$ ./x86_64-softmmu/qemu-system-x86_64 -cdrom /var
>>> qemu-system-x86_64: -cdrom /var: Could not read image for
>>> determining its format: File too large
>>>
>>>
>>> ** Changed in: qemu
>>>Status: New => Confirmed
>>>   
>>
>> Looks like directories play funny games.
>>
> 
> ...
> 
>> specifically:
>>
>> (gdb) s
>> 1963 size = lseek(s->fd, 0, SEEK_END);
>> (gdb) s
>> 1964 if (size < 0) {
>> (gdb) print size
>> $37 = 9223372036854775807
>>
>> cool, cool, cool. This value is 0x7fff and errno isn't
>> set. cool and good function.
> 
> Indeed: The behavior of lseek() on devices which are incapable of
> seeking is implementation-defined.
> 
>>
>> so, lseek on a folder returns crazy nonsense. Perhaps we ought to
>> actually specifically disallow folders, we don't appear to.
> 
> It probably returns -1 which it is supposed to do on error. It should
> also set errno in that case, though.
> 
> So this is probably bug in the error handling code in lseek.
> 
> Thanks
> 
> Michal
> 

Thanks. It looks like we can't make stronger guarantees about the
behavior of lseek, so I submitted a patch to ratchet down QEMU's
acceptable file types:

https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg05055.html

Thanks,
--John



Re: [Qemu-devel] [PATCH v4] file-posix: specify expected filetypes

2018-01-31 Thread John Snow
ping

On 01/19/2018 06:03 PM, Eric Blake wrote:
> On 01/19/2018 04:47 PM, John Snow wrote:
>> Adjust each caller of raw_open_common to specify if they are expecting
>> host and character devices or not. Tighten expectations of file types upon
>> open in the common code and refuse types that are not expected.
>>
>> This has two effects:
>>
>> (1) Character and block devices are now considered deprecated for the
>> 'file' driver, which expects only S_IFREG, and
>> (2) no file-posix driver (file, host_cdrom, or host_device) can open
>> directories now.
>>
>> I don't think there's a legitimate reason to open directories as if
>> they were files. This prevents QEMU from opening and attempting to probe
>> a directory inode, which can break in exciting ways. One of those ways
>> is lseek on ext4/xfs, which will return 0x7fff as the file
>> size instead of EISDIR. This can coax QEMU into responding with a
>> confusing "file too big" instead of "Hey, that's not a file".
>>
>> See: https://bugs.launchpad.net/qemu/+bug/1739304/
>> Signed-off-by: John Snow 
>> ---
> 
> Reviewed-by: Eric Blake 
> 



  1   2   3   >