Re: [PATCH] chardev/char-socket: Fix TLS io channels sending too much data to the backend

2024-02-29 Thread Antoine Damhet
On Thu, Feb 29, 2024 at 11:43:37AM +0100, Thomas Huth wrote:
> Commit ffda5db65a ("io/channel-tls: fix handling of bigger read buffers")
> changed the behavior of the TLS io channels to schedule a second reading
> attempt if there is still incoming data pending. This caused a regression
> with backends like the sclpconsole that check in their read function that
> the sender does not try to write more bytes to it than the device can
> currently handle.
> 
> The problem can be reproduced like this:
> 
>  1) In one terminal, do this:
> 
>   mkdir qemu-pki
>   cd qemu-pki
>   openssl genrsa 2048 > ca-key.pem
>   openssl req -new -x509 -nodes -days 365000 -key ca-key.pem -out ca-cert.pem
>   # enter some dummy value for the cert
>   openssl genrsa 2048 > server-key.pem
>   openssl req -new -x509 -nodes -days 365000 -key server-key.pem \
> -out server-cert.pem
>   # enter some other dummy values for the cert
> 
>   gnutls-serv --echo --x509cafile ca-cert.pem --x509keyfile server-key.pem \
>   --x509certfile server-cert.pem -p 8338
> 
>  2) In another terminal, do this:
> 
>   wget 
> https://download.fedoraproject.org/pub/fedora-secondary/releases/39/Cloud/s390x/images/Fedora-Cloud-Base-39-1.5.s390x.qcow2
> 
>   qemu-system-s390x -nographic -nodefaults \
> -hda Fedora-Cloud-Base-39-1.5.s390x.qcow2 \
> -object 
> tls-creds-x509,id=tls0,endpoint=client,verify-peer=false,dir=$PWD/qemu-pki \
> -chardev socket,id=tls_chardev,host=localhost,port=8338,tls-creds=tls0 \
> -device sclpconsole,chardev=tls_chardev,id=tls_serial
> 
> QEMU then aborts after a second or two with:
> 
>   qemu-system-s390x: ../hw/char/sclpconsole.c:73: chr_read: Assertion
>`size <= SIZE_BUFFER_VT220 - scon->iov_data_len' failed.
>  Aborted (core dumped)
> 
> It looks like the second read does not trigger the chr_can_read() function
> to be called before the second read, which should normally always be done
> before sending bytes to a character device to see how much it can handle,
> so the s->max_size in tcp_chr_read() still contains the old value from the
> previous read. Let's make sure that we use the up-to-date value by calling
> tcp_chr_read_poll() again here.
> 
> Fixes: ffda5db65a ("io/channel-tls: fix handling of bigger read buffers")
> Buglink: https://issues.redhat.com/browse/RHEL-24614
> Reviewed-by: Daniel P. Berrangé 

Reviewed-by: Antoine Damhet 
Tested-by: Antoine Damhet 

> Signed-off-by: Thomas Huth 
> ---
>  Sorry if you've got this mail twice - I forgot to CC: qemu-devel when
>  I sent it out the first time ... *facepalm*
> 
>  chardev/char-socket.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 67e3334423..8a0406cc1e 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -496,9 +496,9 @@ static gboolean tcp_chr_read(QIOChannel *chan, 
> GIOCondition cond, void *opaque)
>  s->max_size <= 0) {
>  return TRUE;
>  }
> -len = sizeof(buf);
> -if (len > s->max_size) {
> -len = s->max_size;
> +len = tcp_chr_read_poll(opaque);
> +if (len > sizeof(buf)) {
> +len = sizeof(buf);
>  }
>  size = tcp_chr_recv(chr, (void *)buf, len);
>  if (size == 0 || (size == -1 && errno != EAGAIN)) {
> -- 
> 2.44.0
> 

-- 
Antoine 'xdbob' Damhet


signature.asc
Description: PGP signature


Re: [PATCH] io/channel-tls: plug memory leakage on GSource

2023-03-07 Thread Antoine Damhet
Nice catch !

On Mon, Mar 06, 2023 at 08:15:21PM -0300, Matheus Tavares Bernardino wrote:
> This leakage can be seen through test-io-channel-tls:
> 
> $ ../configure --target-list=aarch64-softmmu --enable-sanitizers
> $ make ./tests/unit/test-io-channel-tls
> $ ./tests/unit/test-io-channel-tls
> 
> Indirect leak of 104 byte(s) in 1 object(s) allocated from:
> #0 0x7f81d1725808 in __interceptor_malloc 
> ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
> #1 0x7f81d135ae98 in g_malloc 
> (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57e98)
> #2 0x55616c5d4c1b in object_new_with_propv ../qom/object.c:795
> #3 0x55616c5d4a83 in object_new_with_props ../qom/object.c:768
> #4 0x55616c5c5415 in test_tls_creds_create 
> ../tests/unit/test-io-channel-tls.c:70
> #5 0x55616c5c5a6b in test_io_channel_tls 
> ../tests/unit/test-io-channel-tls.c:158
> #6 0x7f81d137d58d  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d)
> 
> Indirect leak of 32 byte(s) in 1 object(s) allocated from:
> #0 0x7f81d1725a06 in __interceptor_calloc 
> ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
> #1 0x7f81d1472a20 in gnutls_dh_params_init 
> (/lib/x86_64-linux-gnu/libgnutls.so.30+0x46a20)
> #2 0x55616c6485ff in qcrypto_tls_creds_x509_load 
> ../crypto/tlscredsx509.c:634
> #3 0x55616c648ba2 in qcrypto_tls_creds_x509_complete 
> ../crypto/tlscredsx509.c:694
> #4 0x55616c5e1fea in user_creatable_complete ../qom/object_interfaces.c:28
> #5 0x55616c5d4c8c in object_new_with_propv ../qom/object.c:807
> #6 0x55616c5d4a83 in object_new_with_props ../qom/object.c:768
> #7 0x55616c5c5415 in test_tls_creds_create 
> ../tests/unit/test-io-channel-tls.c:70
> #8 0x55616c5c5a6b in test_io_channel_tls 
> ../tests/unit/test-io-channel-tls.c:158
> #9 0x7f81d137d58d  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d)
> 
> ...
> 
> SUMMARY: AddressSanitizer: 49143 byte(s) leaked in 184 allocation(s).
> 
> The docs for `g_source_add_child_source(source, child_source)` says
> "source will hold a reference on child_source while child_source is
> attached to it." Therefore, we should unreference the child source at
> `qio_channel_tls_read_watch()` after attaching it to `source`. With this
> change, ./tests/unit/test-io-channel-tls shows no leakages.
> 
> Signed-off-by: Matheus Tavares Bernardino 

Reviewed-by: Antoine Damhet 

> ---
>  io/channel-tls.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/io/channel-tls.c b/io/channel-tls.c
> index 8052945ba0..5a7a3d48d6 100644
> --- a/io/channel-tls.c
> +++ b/io/channel-tls.c
> @@ -446,6 +446,7 @@ qio_channel_tls_read_watch(QIOChannelTLS *tioc, GSource 
> *source)
>  object_ref(OBJECT(tioc));
>  
>  g_source_add_child_source(source, child);
> +g_source_unref(child);
>  }
>  
>  static GSource *qio_channel_tls_create_watch(QIOChannel *ioc,
> -- 
> 2.37.2
> 

-- 
Antoine 'xdbob' Damhet


signature.asc
Description: PGP signature


Re: [PATCH 2/2] io/channel-tls: fix handling of bigger read buffers

2022-12-16 Thread Antoine Damhet
On Wed, Nov 16, 2022 at 10:52:20AM +, Daniel P. Berrangé wrote:
> On Tue, Nov 15, 2022 at 03:23:29PM +0100, antoine.dam...@shadow.tech wrote:
> > From: Antoine Damhet 
> > 
> > Since the TLS backend can read more data from the underlying QIOChannel
> > we introduce a minimal child GSource to notify if we still have more
> > data available to be read.
> > 
> > Signed-off-by: Antoine Damhet 
> > Signed-off-by: Charles Frey 
> > ---
> >  io/channel-tls.c | 66 +++-
> >  1 file changed, 65 insertions(+), 1 deletion(-)
> 
> Reviewed-by: Daniel P. Berrangé 

Thanks,

Now that the 7.2.0 is released, can we hope to get this queued ? If not
what should I do ?

Best regards,

-- 
Antoine 'xdbob' Damhet

> 
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
> 


signature.asc
Description: PGP signature


[PATCH 2/2] io/channel-tls: fix handling of bigger read buffers

2022-11-15 Thread antoine . damhet
From: Antoine Damhet 

Since the TLS backend can read more data from the underlying QIOChannel
we introduce a minimal child GSource to notify if we still have more
data available to be read.

Signed-off-by: Antoine Damhet 
Signed-off-by: Charles Frey 
---
 io/channel-tls.c | 66 +++-
 1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 4ce890a538..4f2b8828f9 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -388,12 +388,76 @@ static void qio_channel_tls_set_aio_fd_handler(QIOChannel 
*ioc,
 qio_channel_set_aio_fd_handler(tioc->master, ctx, io_read, io_write, 
opaque);
 }
 
+typedef struct QIOChannelTLSSource QIOChannelTLSSource;
+struct QIOChannelTLSSource {
+GSource parent;
+QIOChannelTLS *tioc;
+};
+
+static gboolean
+qio_channel_tls_source_check(GSource *source)
+{
+QIOChannelTLSSource *tsource = (QIOChannelTLSSource *)source;
+
+return qcrypto_tls_session_check_pending(tsource->tioc->session) > 0;
+}
+
+static gboolean
+qio_channel_tls_source_prepare(GSource *source, gint *timeout)
+{
+*timeout = -1;
+return qio_channel_tls_source_check(source);
+}
+
+static gboolean
+qio_channel_tls_source_dispatch(GSource *source, GSourceFunc callback,
+gpointer user_data)
+{
+return G_SOURCE_CONTINUE;
+}
+
+static void
+qio_channel_tls_source_finalize(GSource *source)
+{
+QIOChannelTLSSource *tsource = (QIOChannelTLSSource *)source;
+
+object_unref(OBJECT(tsource->tioc));
+}
+
+static GSourceFuncs qio_channel_tls_source_funcs = {
+qio_channel_tls_source_prepare,
+qio_channel_tls_source_check,
+qio_channel_tls_source_dispatch,
+qio_channel_tls_source_finalize
+};
+
+static void
+qio_channel_tls_read_watch(QIOChannelTLS *tioc, GSource *source)
+{
+GSource *child;
+QIOChannelTLSSource *tlssource;
+
+child = g_source_new(_channel_tls_source_funcs,
+  sizeof(QIOChannelTLSSource));
+tlssource = (QIOChannelTLSSource *)child;
+
+tlssource->tioc = tioc;
+object_ref(OBJECT(tioc));
+
+g_source_add_child_source(source, child);
+}
+
 static GSource *qio_channel_tls_create_watch(QIOChannel *ioc,
  GIOCondition condition)
 {
 QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
+GSource *source = qio_channel_create_watch(tioc->master, condition);
+
+if (condition & G_IO_IN) {
+qio_channel_tls_read_watch(tioc, source);
+}
 
-return qio_channel_create_watch(tioc->master, condition);
+return source;
 }
 
 QCryptoTLSSession *
-- 
2.38.1




[PATCH 1/2] crypto: TLS: introduce `check_pending`

2022-11-15 Thread antoine . damhet
From: Antoine Damhet 

The new `qcrypto_tls_session_check_pending` function allows the caller
to know if data have already been consumed from the backend and is
already available.

Signed-off-by: Antoine Damhet 
---
 crypto/tlssession.c | 14 ++
 include/crypto/tlssession.h | 11 +++
 2 files changed, 25 insertions(+)

diff --git a/crypto/tlssession.c b/crypto/tlssession.c
index b302d835d2..1e98f44e0d 100644
--- a/crypto/tlssession.c
+++ b/crypto/tlssession.c
@@ -493,6 +493,13 @@ qcrypto_tls_session_read(QCryptoTLSSession *session,
 }
 
 
+size_t
+qcrypto_tls_session_check_pending(QCryptoTLSSession *session)
+{
+return gnutls_record_check_pending(session->handle);
+}
+
+
 int
 qcrypto_tls_session_handshake(QCryptoTLSSession *session,
   Error **errp)
@@ -615,6 +622,13 @@ qcrypto_tls_session_read(QCryptoTLSSession *sess,
 }
 
 
+size_t
+qcrypto_tls_session_check_pending(QCryptoTLSSession *session)
+{
+return 0;
+}
+
+
 int
 qcrypto_tls_session_handshake(QCryptoTLSSession *sess,
   Error **errp)
diff --git a/include/crypto/tlssession.h b/include/crypto/tlssession.h
index 15b9cef086..571049bd0e 100644
--- a/include/crypto/tlssession.h
+++ b/include/crypto/tlssession.h
@@ -248,6 +248,17 @@ ssize_t qcrypto_tls_session_read(QCryptoTLSSession *sess,
  char *buf,
  size_t len);
 
+/**
+ * qcrypto_tls_session_check_pending:
+ * @sess: the TLS session object
+ *
+ * Check if there are unread data in the TLS buffers that have
+ * already been read from the underlying data source.
+ *
+ * Returns: the number of bytes available or zero
+ */
+size_t qcrypto_tls_session_check_pending(QCryptoTLSSession *sess);
+
 /**
  * qcrypto_tls_session_handshake:
  * @sess: the TLS session object
-- 
2.38.1




[PATCH 0/2] TLS: fix read stall with large buffers

2022-11-15 Thread antoine . damhet
From: Antoine Damhet 

At least with the TCP backend the tls implementation can stall because
the current notification mechanism is based on the readyness status of
the underlying file descriptor but gnutls can read all the data
available while the consumer has only handled part (eg: the TCP
implementation is capped to 4096 bytes per read).

We encountered the bug on the real world using encrypted
USB-redirection but we could reproduce it with virtio-serial.

On the host side:

```
$ mkdir /tmp/tls-test
$ echo 'test:8e6da54b954357236ec2eed1df0dc3542163dd03bf9c94f2d90781a3a3e443c9' 
> /tmp/tls-test/keys.psk
$ qemu-system-x86_64 -m 4G -enable-kvm -device virtio-serial \
-object tls-creds-psk,id=tls0,endpoint=server,dir=/tmp/tls-test \
-chardev 
socket,server=on,wait=off,id=serial-sock,port=4242,host=127.0.0.1,tls-creds=tls0
 \
-device virtserialport,chardev=serial-sock,name=test.serial \
-cdrom archlinux-2022.11.01-x86_64.iso
# The sleep is to keep the socket open during the test
$ python -c 'print(8192 * "a"); import time; time.sleep(600);' | \
openssl s_client -connect 127.0.0.1:4242 \
-psk 8e6da54b954357236ec2eed1df0dc3542163dd03bf9c94f2d90781a3a3e443c9 \
-psk_identity test
```

On the guest side:

```
# The socket is forced open, we stop cat manually
$ cat /dev/virtio-ports/test.serial > test.data
^C
# only part of the data was readable
$ wc -c test.data
4096
```

Antoine Damhet (2):
  crypto: TLS: introduce `check_pending`
  io/channel-tls: fix handling of bigger read buffers

 crypto/tlssession.c | 14 
 include/crypto/tlssession.h | 11 +++
 io/channel-tls.c| 66 -
 3 files changed, 90 insertions(+), 1 deletion(-)

-- 
2.38.1




Re: [PATCH] block/block-backend: blk_set_enable_write_cache is IO_CODE

2022-10-27 Thread Antoine Damhet
Thanks, it works for us

On Thu, Oct 27, 2022 at 03:27:26AM -0400, Emanuele Giuseppe Esposito wrote:
> blk_set_enable_write_cache() is defined as GLOBAL_STATE_CODE
> but can be invoked from iothreads when handling scsi requests.
> This triggers an assertion failure:
> 
>  0x7fd6c3515ce1 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>  0x7fd6c34ff537 in abort () from /lib/x86_64-linux-gnu/libc.so.6
>  0x7fd6c34ff40f in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>  0x7fd6c350e662 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
>  0x56149e2cea03 in blk_set_enable_write_cache (wce=true, 
> blk=0x5614a01c27f0)
>at ../src/block/block-backend.c:1949
>  0x56149e2d0a67 in blk_set_enable_write_cache (blk=0x5614a01c27f0,
>wce=) at ../src/block/block-backend.c:1951
>  0x56149dfe9c59 in scsi_disk_apply_mode_select (p=0x7fd6b400c00e "\004",
>page=, s=) at ../src/hw/scsi/scsi-disk.c:1520
>  mode_select_pages (change=true, len=18, p=0x7fd6b400c00e "\004", 
> r=0x7fd6b4001ff0)
>at ../src/hw/scsi/scsi-disk.c:1570
>  scsi_disk_emulate_mode_select (inbuf=, r=0x7fd6b4001ff0) at
>../src/hw/scsi/scsi-disk.c:1640
>  scsi_disk_emulate_write_data (req=0x7fd6b4001ff0) at 
> ../src/hw/scsi/scsi-disk.c:1934
>  0x56149e18ff16 in virtio_scsi_handle_cmd_req_submit (req=,
>req=, s=0x5614a12f16b0) at ../src/hw/scsi/virtio-scsi.c:719
>  virtio_scsi_handle_cmd_vq (vq=0x7fd6bab92140, s=0x5614a12f16b0) at
>../src/hw/scsi/virtio-scsi.c:761
>  virtio_scsi_handle_cmd (vq=, vdev=) at
>../src/hw/scsi/virtio-scsi.c:775
>  virtio_scsi_handle_cmd (vdev=0x5614a12f16b0, vq=0x7fd6bab92140) at
>../src/hw/scsi/virtio-scsi.c:765
>  0x56149e1a8aa6 in virtio_queue_notify_vq (vq=0x7fd6bab92140) at
>../src/hw/virtio/virtio.c:2365
>  0x56149e3ccea5 in aio_dispatch_handler (ctx=ctx@entry=0x5614a01babe0,
>node=) at ../src/util/aio-posix.c:369
>  0x56149e3cd868 in aio_dispatch_ready_handlers (ready_list=0x7fd6c09b2680,
>ctx=0x5614a01babe0) at ../src/util/aio-posix.c:399
>  aio_poll (ctx=0x5614a01babe0, blocking=blocking@entry=true) at
>../src/util/aio-posix.c:713
>  0x56149e2a7796 in iothread_run (opaque=opaque@entry=0x56149ffde500) at
>../src/iothread.c:67
>  0x56149e3d0859 in qemu_thread_start (args=0x7fd6c09b26f0) at
>../src/util/qemu-thread-posix.c:504
>  0x7fd6c36b9ea7 in start_thread () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
>  0x7fd6c35d9aef in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> Changing GLOBAL_STATE_CODE in IO_CODE is allowed, since GSC callers are
> allowed to call IO_CODE.
> 
> Resolves: #1272
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

Tested-by: Antoine Damhet 

> ---
>  block/block-backend.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)



-- 
Antoine 'xdbob' Damhet


signature.asc
Description: PGP signature


Re: [DISCUSSION] Allow ACPI default OEM ID and OEM table ID fields to be set.

2020-11-26 Thread Antoine Damhet
On Thu, Nov 26, 2020 at 12:09:13PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 26, 2020 at 05:34:50PM +0100, Antoine Damhet wrote:
> > On Thu, Nov 26, 2020 at 08:29:41AM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Nov 26, 2020 at 01:50:12PM +0100, Antoine Damhet wrote:
> > > > On Thu, Nov 26, 2020 at 06:09:11AM -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Nov 25, 2020 at 09:13:22PM +0100, Antoine Damhet wrote:
> > > > > > On Wed, Nov 25, 2020 at 11:04:55AM -0500, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Nov 25, 2020 at 01:32:51PM +, Richard W.M. Jones 
> > > > > > > wrote:
> > > > > > > > On Wed, Nov 25, 2020 at 02:27:11PM +0100, Antoine Damhet wrote:
> > 
> > [...]
> > 
> > > Exactly so I ask myself whether it's worth it, their next version
> > > will check CPUID and then where are we?
> > 
> > Then I guess they will have to admit that they are purposefully blocking
> > VM use and it's not our problem anymore.
> > 
> > > But maybe it's time we just changed all these IDs to e.g. QEMU.
> > > We are very far from bochs generated tables by now.
> > 
> > That's a good idea, but I still think they should be user override-able
> > (unless you think it would be a heavy maintenance burden, in that case
> > you are king in your castle :D )
> 
> Fundamentally there's a chance that if you get unlucky
> with your OEM ID some software somewhere will look for
> it and try to activate some vendor specific behaviour.
> So giving users full control here isn't all that safe ...
> 
> Question btw, are you also supplying a SLIC table?

In production we do not, I've only toyed with adding a dummy table to
see if the `oem_id` and `oem_table_id` there were enough.

> 
> 
> > > Question is will this cause annoyances with e.g. windows guests?
> > 
> > Windows 10 guests seems unaffected, I cannot say for the other
> > versions/servers editions.
> > 
> > > Igor what's your experience with this?
> > 
> > [...]
> > 
> > -- 
> > Antoine 'xdbob' Damhet
> 

-- 
Antoine 'xdbob' Damhet



Re: [DISCUSSION] Allow ACPI default OEM ID and OEM table ID fields to be set.

2020-11-26 Thread Antoine Damhet
On Thu, Nov 26, 2020 at 08:29:41AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 26, 2020 at 01:50:12PM +0100, Antoine Damhet wrote:
> > On Thu, Nov 26, 2020 at 06:09:11AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Nov 25, 2020 at 09:13:22PM +0100, Antoine Damhet wrote:
> > > > On Wed, Nov 25, 2020 at 11:04:55AM -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Nov 25, 2020 at 01:32:51PM +, Richard W.M. Jones wrote:
> > > > > > On Wed, Nov 25, 2020 at 02:27:11PM +0100, Antoine Damhet wrote:

[...]

> Exactly so I ask myself whether it's worth it, their next version
> will check CPUID and then where are we?

Then I guess they will have to admit that they are purposefully blocking
VM use and it's not our problem anymore.

> But maybe it's time we just changed all these IDs to e.g. QEMU.
> We are very far from bochs generated tables by now.

That's a good idea, but I still think they should be user override-able
(unless you think it would be a heavy maintenance burden, in that case
you are king in your castle :D )

> Question is will this cause annoyances with e.g. windows guests?

Windows 10 guests seems unaffected, I cannot say for the other
versions/servers editions.

> Igor what's your experience with this?

[...]

-- 
Antoine 'xdbob' Damhet



Re: [DISCUSSION] Allow ACPI default OEM ID and OEM table ID fields to be set.

2020-11-26 Thread Antoine Damhet
On Thu, Nov 26, 2020 at 06:09:11AM -0500, Michael S. Tsirkin wrote:
> On Wed, Nov 25, 2020 at 09:13:22PM +0100, Antoine Damhet wrote:
> > On Wed, Nov 25, 2020 at 11:04:55AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Nov 25, 2020 at 01:32:51PM +, Richard W.M. Jones wrote:
> > > > On Wed, Nov 25, 2020 at 02:27:11PM +0100, Antoine Damhet wrote:

[...]

> > 
> > I'm sorry I cannot give you the name of the crashing software due to a
> > company policy. But I can tell you that if either `BOCHS ` or `BXPC` is
> > present in any of the tables it will crash. Any (or at least the few
> > that I threw at it) other string will work so it seems it's some kind
> > of DRM-related hypervisor detection.
> 
> Hmm I'm not sure how far we want to go with this. If software vendors
> want to detect a hypervisor there will always be a way.
> How are we sure we are not starting an arms race here?

We can't but IMHO, as long as we stay within the specs we should be OK.
There are far more obvious checks like the `CPUID[0x1].ECX[31]` which
would destroy most of the PV features in a proprietary OS like Windows
if disabled.

Worst case scenario they would do timing-based detection and that would
be insane to defeat. As for the `Shadow` virtual machines we try to
"play" fair by exposing deterministic values (for example `Shadow` and
`Blade` are clearly exposed in SMBIOS) and don't hide the fact that we
are a virtual machine, so we are easy to ban if the vendor really wishes
to.

> 
> Also which of the IDs matter?  OEMID? OEM Table ID? Creator ID?

I just checked for the Creator ID and it also crash, my guess is that
they dump the tables and look for `BOSH` and `BXPC` patterns anywhere.

PS: we reached-out to the software-vendor which did not acknowledge
banning VMs but added an entry to their FAQ saying that VMs were not
supported.

> 
> 
> > As for the uniqueness of the table IDs, I guess it would be sane to keep
> > the same pattern (id+table sig) but allowing the first 4 bytes to be
> > overridden.
> > 
> > [...]
> 
> It's certainly possible, it's just very specific to just this DRM scheme.
> Not sure what's a better way to do it:
>   qemu -acpidefault oem_id=ABCD,oem_table_id=EFGHIJKL
> is probably going too far since then table IDs are not unique.
> 
> Also I'd probably use machine properties for this, the need here
> is baroque enough that we don't want a dedicated option.
> 
> > 
> > -- 
> > Antoine 'xdbob' Damhet
> 

-- 
Antoine 'xdbob' Damhet



Re: [DISCUSSION] Allow ACPI default OEM ID and OEM table ID fields to be set.

2020-11-25 Thread Antoine Damhet
On Wed, Nov 25, 2020 at 11:04:55AM -0500, Michael S. Tsirkin wrote:
> On Wed, Nov 25, 2020 at 01:32:51PM +, Richard W.M. Jones wrote:
> > On Wed, Nov 25, 2020 at 02:27:11PM +0100, Antoine Damhet wrote:
> > > Hello,
> > > 
> > > We recently found out that some softwares are effectively crashing
> > > when they detect qemu's `OEM ID` or `OEM table ID` in the ACPI tables.
> > > 
> > > I see no reason not to expose the setting to the user/command-line. A
> > > previous patch has been submitted in 2015[1] but did not get through
> > > because (if I understand correctly) using the IDs on the `SLIC`, `BXPC`
> > > and `RSDT` tables were enough at the time.
> > > 
> > > If you agree, I am willing to forward port the patches of M. Jones but I
> > > need to ask how it would work `Signed-Off`-wise ?
> > 
> > On this point, the patch I sent was actually written by
> > Michael Tokarev, I was only trying to get them upstream.
> > 
> > Rich.
> 
> I think at least one of the issues is that e.g. UEFI at least
> seems to assume unique OEM table IDs e.g. for SSDTs.
> 
> So let's try to be more specific please, which software
> crashes, what does it want to see and in which table.

I'm sorry I cannot give you the name of the crashing software due to a
company policy. But I can tell you that if either `BOCHS ` or `BXPC` is
present in any of the tables it will crash. Any (or at least the few
that I threw at it) other string will work so it seems it's some kind
of DRM-related hypervisor detection.

As for the uniqueness of the table IDs, I guess it would be sane to keep
the same pattern (id+table sig) but allowing the first 4 bytes to be
overridden.

[...]

-- 
Antoine 'xdbob' Damhet



[DISCUSSION] Allow ACPI default OEM ID and OEM table ID fields to be set.

2020-11-25 Thread Antoine Damhet
Hello,

We recently found out that some softwares are effectively crashing
when they detect qemu's `OEM ID` or `OEM table ID` in the ACPI tables.

I see no reason not to expose the setting to the user/command-line. A
previous patch has been submitted in 2015[1] but did not get through
because (if I understand correctly) using the IDs on the `SLIC`, `BXPC`
and `RSDT` tables were enough at the time.

If you agree, I am willing to forward port the patches of M. Jones but I
need to ask how it would work `Signed-Off`-wise ?

Thanks in advance for your time,

PS: the softwares will crash if the signature is found in any of the
exposed tables.

[1]: 
https://lore.kernel.org/qemu-devel/1441220618-4750-1-git-send-email-rjo...@redhat.com/

-- 
Antoine 'xdbob' Damhet



Re: [PATCH v2] target/i386: always create kvmclock device

2020-09-23 Thread Antoine Damhet
Hi,

The patch doesn't apply cleanly to master but it works (patched ->
patched working, patched -> unpatched: old behavior, unpatched ->
patched: old behavior)

Thanks to everyone for the swift responses :)

On Tue, Sep 22, 2020 at 05:19:34PM +0200, Vitaly Kuznetsov wrote:
> QEMU's kvmclock device is only created when KVM PV feature bits for
> kvmclock (KVM_FEATURE_CLOCKSOURCE/KVM_FEATURE_CLOCKSOURCE2) are
> exposed to the guest. With 'kvm=off' cpu flag the device is not
> created and we don't call KVM_GET_CLOCK/KVM_SET_CLOCK upon migration.
> It was reported that without these call at least Hyper-V TSC page
> clocksouce (which can be enabled independently) gets broken after
> migration.
> 
> Switch to creating kvmclock QEMU device unconditionally, it seems
> to always make sense to call KVM_GET_CLOCK/KVM_SET_CLOCK on migration.
> Use KVM_CAP_ADJUST_CLOCK check instead of CPUID feature bits.
> 
> Reported-by: Antoine Damhet 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Vitaly Kuznetsov 

Tested-by: Antoine Damhet 

> ---
>  hw/i386/kvm/clock.c| 7 +--
>  hw/i386/microvm.c  | 2 +-
>  hw/i386/pc.c   | 1 +
>  hw/i386/pc_piix.c  | 7 +--
>  hw/i386/pc_q35.c   | 5 -
>  include/hw/i386/pc.h   | 3 +++
>  include/hw/kvm/clock.h | 4 ++--
>  target/i386/kvm.c  | 5 +
>  target/i386/kvm_i386.h | 1 +
>  9 files changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 64283358f91d..30cf53393ee1 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -328,11 +328,14 @@ static const TypeInfo kvmclock_info = {
>  };
>  
>  /* Note: Must be called after VCPU initialization. */
> -void kvmclock_create(void)
> +void kvmclock_create(bool create_always)
>  {
>  X86CPU *cpu = X86_CPU(first_cpu);
>  
> -if (kvm_enabled() &&
> +if (!kvm_enabled() || !kvm_has_adjust_clock())
> +return;
> +
> +if (create_always ||
>  cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
> (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
>  sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
> index 81d0888930d1..fd0b84109154 100644
> --- a/hw/i386/microvm.c
> +++ b/hw/i386/microvm.c
> @@ -119,7 +119,7 @@ static void microvm_devices_init(MicrovmMachineState *mms)
>  
>  ioapic_init_gsi(gsi_state, "machine");
>  
> -kvmclock_create();
> +kvmclock_create(true);
>  
>  for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>  sysbus_create_simple("virtio-mmio",
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index d11daacc23cf..0e036ef9c15f 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1956,6 +1956,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
> *data)
>  pcmc->acpi_data_size = 0x2 + 0x8000;
>  pcmc->linuxboot_dma_enabled = true;
>  pcmc->pvh_enabled = true;
> +pcmc->kvmclock_create_always = true;
>  assert(!mc->get_hotplug_handler);
>  mc->get_hotplug_handler = pc_get_hotplug_handler;
>  mc->hotplug_allowed = pc_hotplug_allowed;
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 32b1453e6a82..1a68338c737b 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -158,8 +158,8 @@ static void pc_init1(MachineState *machine,
>  
>  x86_cpus_init(x86ms, pcmc->default_cpu_version);
>  
> -if (kvm_enabled() && pcmc->kvmclock_enabled) {
> -kvmclock_create();
> +if (pcmc->kvmclock_enabled) {
> +kvmclock_create(pcmc->kvmclock_create_always);
>  }
>  
>  if (pcmc->pci_enabled) {
> @@ -440,11 +440,14 @@ DEFINE_I440FX_MACHINE(v5_2, "pc-i440fx-5.2", NULL,
>  
>  static void pc_i440fx_5_1_machine_options(MachineClass *m)
>  {
> +PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
> +
>  pc_i440fx_5_2_machine_options(m);
>  m->alias = NULL;
>  m->is_default = false;
>  compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
>  compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
> +pcmc->kvmclock_create_always = false;
>  }
>  
>  DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 0cb9c18cd44d..0dd59bd765b1 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -185,7 +185,7 @@ static void pc_q35_init(MachineState *machine)
>  
>  x86_cpus_init(x86ms, pcmc->default_cpu_version);
>  
> -kvmclock_create();
>

Re: [PATCH] target/i386: always create kvmclock device

2020-09-18 Thread Antoine Damhet
On Thu, Sep 17, 2020 at 06:44:10PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > >> >
> > >> > Shouldn't the old check used when machine type <= 5.1 in order to avoid
> > >> > migration incompatibility ?
> > >> 
> > >> Hm, when the check fails we just don't create the device and no error is
> > >> reported, so even if we have kvmclock data in the migration stream but
> > >> fail to create it migration will still succeed, right? (not a migration
> > >> expert here :-)
> > >
> > > When the migration stream is parsed, it'll try and find a "kvmclock"
> > > device to pass the data it's reading to; if one doesn't exist it'll
> > > fail.
> > 
> > This may happen with an older machine type when the destination is
> > running an unfixed QEMU and the source has the fix, right?
> 
> Yes I think so.
> 
> > The solution
> > would be to introduce a flag for older machine types (or for new ones)
> > like 'kvmclock_always'.
> 
> Yep sounds the normal answer.
> (You might want to try it first to trigger the bug)

So, I tried the patch and:

# patched -> patched

Everything working as expected

# patched -> unpatched

Migration failure with:

```
Unknown savevm section or instance 'kvmclock' 0. Make sure that your current VM 
setup matches your saved VM setup, including any hotplugged devices
load of migration failed: Invalid argument
```

# unpatched -> patched

The guest hangs upon arrival, I don't know which value is restored but
something is restored (and far enough from 0 to confuse Windows).

> 
> > > The other question is in the incoming direction from an older VM;
> > > you'll have a kvm clock created here, but you won't load the kvm clock
> > > state from the migration stream - what is this clock going to do?
> > 
> > This is not really a problem I believe: the clock was absent on the
> > source and things somehow worked for the guest so even if we don't
> > initialize kvmclock properly on the destination nothing bad is expected.
> 
> OK.
> 
> Dave

[...]

-- 
Antoine 'xdbob' Damhet


signature.asc
Description: PGP signature


Re: [PATCH] target/i386: always create kvmclock device

2020-09-17 Thread Antoine Damhet
On Thu, Sep 17, 2020 at 03:42:37PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > >
> > > Shouldn't the old check used when machine type <= 5.1 in order to avoid
> > > migration incompatibility ?
> > 
> > Hm, when the check fails we just don't create the device and no error is
> > reported, so even if we have kvmclock data in the migration stream but
> > fail to create it migration will still succeed, right? (not a migration
> > expert here :-)
> 
> When the migration stream is parsed, it'll try and find a "kvmclock"
> device to pass the data it's reading to; if one doesn't exist it'll
> fail.
> 
> The other question is in the incoming direction from an older VM;
> you'll have a kvm clock created here, but you won't load the kvm clock
> state from the migration stream - what is this clock going to do?

I guess that if the migration succeed (and the VMState keeps it's
initial value) the timestamp will be restored to 0 which is the current
behavior.

> 
> Dave


[...]

-- 
Antoine 'xdbob' Damhet


signature.asc
Description: PGP signature


Re: [PATCH] target/i386: always create kvmclock device

2020-09-17 Thread Antoine Damhet
On Thu, Sep 17, 2020 at 01:13:06PM +0200, Vitaly Kuznetsov wrote:
> QEMU's kvmclock device is only created when KVM PV feature bits for
> kvmclock (KVM_FEATURE_CLOCKSOURCE/KVM_FEATURE_CLOCKSOURCE2) are
> exposed to the guest. With 'kvm=off' cpu flag the device is not
> created and we don't call KVM_GET_CLOCK/KVM_SET_CLOCK upon migration.
> It was reported that without these call at least Hyper-V TSC page
> clocksouce (which can be enabled independently) gets broken after
> migration.
> 
> Switch to creating kvmclock QEMU device unconditionally, it seems
> to always make sense to call KVM_GET_CLOCK/KVM_SET_CLOCK on migration.
> Use KVM_CAP_ADJUST_CLOCK check instead of CPUID feature bits.
> 
> Reported-by: Antoine Damhet 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Vitaly Kuznetsov 
> ---
>  hw/i386/kvm/clock.c| 6 +-
>  target/i386/kvm.c  | 5 +
>  target/i386/kvm_i386.h | 1 +
>  3 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 64283358f91d..526c9ea5172b 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -330,11 +330,7 @@ static const TypeInfo kvmclock_info = {
>  /* Note: Must be called after VCPU initialization. */
>  void kvmclock_create(void)
>  {
> -X86CPU *cpu = X86_CPU(first_cpu);
> -
> -if (kvm_enabled() &&
> -cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
> -   (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
> +if (kvm_enabled() && kvm_has_adjust_clock()) {

Shouldn't the old check used when machine type <= 5.1 in order to avoid
migration incompatibility ?

>  sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
>  }
>  }
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 4a8b3a41c1bc..20b31b65307b 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -143,6 +143,11 @@ bool kvm_has_adjust_clock_stable(void)
>  return (ret == KVM_CLOCK_TSC_STABLE);
>  }
>  
> +bool kvm_has_adjust_clock(void)
> +{
> +return kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK);
> +}
> +
>  bool kvm_has_exception_payload(void)
>  {
>  return has_exception_payload;
> diff --git a/target/i386/kvm_i386.h b/target/i386/kvm_i386.h
> index 064b8798a26c..0fce4e51d2d6 100644
> --- a/target/i386/kvm_i386.h
> +++ b/target/i386/kvm_i386.h
> @@ -34,6 +34,7 @@
>  
>  bool kvm_allows_irq0_override(void);
>  bool kvm_has_smm(void);
> +bool kvm_has_adjust_clock(void);
>  bool kvm_has_adjust_clock_stable(void);
>  bool kvm_has_exception_payload(void);
>  void kvm_synchronize_all_tsc(void);
> -- 
> 2.25.4
> 
> 

-- 
Antoine 'xdbob' Damhet


signature.asc
Description: PGP signature


Re: [BUG] Migration hv_time rollback

2020-09-16 Thread Antoine Damhet
On Wed, Sep 16, 2020 at 02:50:56PM +0200, Vitaly Kuznetsov wrote:
[...]

> >>
> >
> >
> > Oh, I think I see what's going on. When you add 'kvm=off'
> > cpu->env.features[FEAT_KVM] is reset (see x86_cpu_expand_features()) so
> > kvmclock QEMU device is not created and nobody calls KVM_SET_CLOCK on
> > migration.
> >
> > In case we really want to support 'kvm=off' I think we can add Hyper-V
> > features check here along with KVM, this should do the job.
> 
> Does the untested
> 
> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 64283358f91d..e03b2ca6d8f6 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -333,8 +333,9 @@ void kvmclock_create(void)
>  X86CPU *cpu = X86_CPU(first_cpu);
>  
>  if (kvm_enabled() &&
> -cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
> -   (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
> +((cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
> + (1ULL << 
> KVM_FEATURE_CLOCKSOURCE2))) ||
> + (cpu->env.features[FEAT_HYPERV_EAX] & 
> HV_TIME_REF_COUNT_AVAILABLE))) {
>  sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
>  }
>  }
> 
> help?

It appears to work :)

> 
> (I don't think we need to remove all 'if (kvm_enabled())' checks from
> machine types as 'kvm=off' should not be related).

Indeed (I didn't look at the macro, it was just quick & dirty).

> 
> -- 
> Vitaly
> 
> 

-- 
Antoine 'xdbob' Damhet


signature.asc
Description: PGP signature


Re: [BUG] Migration hv_time rollback

2020-09-16 Thread Antoine Damhet
On Wed, Sep 16, 2020 at 01:59:43PM +0200, Vitaly Kuznetsov wrote:
> "Dr. David Alan Gilbert"  writes:
> 
> > cc'ing in Vitaly who knows about the hv stuff.
> >
> 
> cc'ing Marcelo who knows about clocksources :-)
> 
> > * Antoine Damhet (antoine.dam...@blade-group.com) wrote:
> >> Hi,
> >> 
> >> We are experiencing timestamp rollbacks during live-migration of
> >> Windows 10 guests
> 
> Are you migrating to the same hardware (with the same TSC frequency)? Is
> TSC used as the clocksource on the host?

Yes we are migrating to the exact same hardware. And yes TSC is used as
a clocksource in the host (but the bug is still happening with `hpet` as
a clocksource).

> 
> >>  with the following qemu configuration (linux 5.4.46
> >> and qemu master):
> >> ```
> >> $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
> >> ```
> 
> Out of pure curiosity, what's the purpose of doing 'kvm=off'? Windows is
> not going to check for KVM identification anyway so we pretend we're
> Hyper-V. 

Some softwares explicitly checks for the presence of KVM and then crash
if they find it in CPUID :/

> 
> Also, have you tried adding more Hyper-V enlightenments? 

Yes, I published a stripped-down command-line for a minimal reproducer
but even `hv-frequencies` and `hv-reenlightenment` don't help.

> 
> >
> > How big a jump are you seeing, and how did you notice it in the guest?
> >
> > Dave
> >
> >> I have tracked the bug to the fact that `kvmclock` is not exposed and
> >> disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
> >> 
> >> I think we should enable the `kvmclock` (qemu device) if `hv-time` is
> >> present and add Hyper-V support for the `kvmclock_current_nsec`
> >> function.
> 
> AFAICT kvmclock_current_nsec() checks whether kvmclock was enabled by
> the guest:
> 
>if (!(env->system_time_msr & 1ULL)) {
> /* KVM clock not active */
> return 0;
> }
> 
> and this is (and way) always false for Windows guests.

Hooo, I missed this piece. When is `clock_is_reliable` expected to be
false ? Because if it is I still think we should be able to query at
least `HV_X64_MSR_REFERENCE_TSC`

> 
> >> 
> >> I'm asking for advice because I am unsure this is the _right_ approach
> >> and how to keep migration compatibility between qemu versions.
> >> 
> >> Thank you all,
> >> 
> >> -- 
> >> Antoine 'xdbob' Damhet
> 
> -- 
> Vitaly
> 

-- 
Antoine 'xdbob' Damhet


signature.asc
Description: PGP signature


Re: [BUG] Migration hv_time rollback

2020-09-16 Thread Antoine Damhet
On Wed, Sep 16, 2020 at 12:29:56PM +0100, Dr. David Alan Gilbert wrote:
> cc'ing in Vitaly who knows about the hv stuff.

Thanks

> 
> * Antoine Damhet (antoine.dam...@blade-group.com) wrote:
> > Hi,
> > 
> > We are experiencing timestamp rollbacks during live-migration of
> > Windows 10 guests with the following qemu configuration (linux 5.4.46
> > and qemu master):
> > ```
> > $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
> > ```
> 
> How big a jump are you seeing, and how did you notice it in the guest?

I'm seeing jumps of about the guest uptime (indicating a reset of the
counter). It's expected because we won't call `KVM_SET_CLOCK` to
restore any value.

We first noticed it because after some migrations `dwm.exe` crashes with
the "(NTSTATUS) 0x8898009b - QueryPerformanceCounter returned a time in
the past." error code.

I can also confirm the following hack makes the behavior disappear:

```
diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 64283358f9..f334bdf35f 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -332,11 +332,7 @@ void kvmclock_create(void)
 {
 X86CPU *cpu = X86_CPU(first_cpu);

-if (kvm_enabled() &&
-cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
-   (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
-sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
-}
+sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
 }

 static void kvmclock_register_types(void)
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 32b1453e6a..11d980ba85 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -158,9 +158,7 @@ static void pc_init1(MachineState *machine,

 x86_cpus_init(x86ms, pcmc->default_cpu_version);

-if (kvm_enabled() && pcmc->kvmclock_enabled) {
-kvmclock_create();
-}
+kvmclock_create();

 if (pcmc->pci_enabled) {
 pci_memory = g_new(MemoryRegion, 1);
```

> 
> Dave
> 
> > I have tracked the bug to the fact that `kvmclock` is not exposed and
> > disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
> > 
> > I think we should enable the `kvmclock` (qemu device) if `hv-time` is
> > present and add Hyper-V support for the `kvmclock_current_nsec`
> > function.
> > 
> > I'm asking for advice because I am unsure this is the _right_ approach
> > and how to keep migration compatibility between qemu versions.
> > 
> > Thank you all,
> > 
> > -- 
> > Antoine 'xdbob' Damhet
> 
> 
> -- 
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 

-- 
Antoine 'xdbob' Damhet


signature.asc
Description: PGP signature


[BUG] Migration hv_time rollback

2020-09-16 Thread Antoine Damhet
Hi,

We are experiencing timestamp rollbacks during live-migration of
Windows 10 guests with the following qemu configuration (linux 5.4.46
and qemu master):
```
$ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
```

I have tracked the bug to the fact that `kvmclock` is not exposed and
disabled from qemu PoV but is in fact used by `hv-time` (in KVM).

I think we should enable the `kvmclock` (qemu device) if `hv-time` is
present and add Hyper-V support for the `kvmclock_current_nsec`
function.

I'm asking for advice because I am unsure this is the _right_ approach
and how to keep migration compatibility between qemu versions.

Thank you all,

-- 
Antoine 'xdbob' Damhet


signature.asc
Description: PGP signature


Re: [PATCH RESEND] file-posix: Handle `EINVAL` fallocate return value

2020-07-20 Thread Antoine Damhet
On Mon, Jul 20, 2020 at 04:07:26PM +0200, Kevin Wolf wrote:
> Am 17.07.2020 um 15:56 hat antoine.dam...@blade-group.com geschrieben:
> > From: Antoine Damhet 
> > 
> > The `detect-zeroes=unmap` option may issue unaligned
> > `FALLOC_FL_PUNCH_HOLE` requests, raw block devices can (and will) return
> > `EINVAL`, qemu should then write the zeroes to the blockdev instead of
> > issuing an `IO_ERROR`.
> > 
> > Signed-off-by: Antoine Damhet 
> 
> Do you have a simple reproducer for this? I tried it with something like
> this (also with a LV instead of loop, but it didn't really make a
> difference):
> 
> $ ./qemu-io -c 'write -P 0 42 1234' --image-opts 
> driver=host_device,filename=/dev/loop0,cache.direct=on,detect-zeroes=on
> wrote 1234/1234 bytes at offset 42
> 1.205 KiB, 1 ops; 00.00 sec (2.021 MiB/sec and 1717.5697 ops/sec)
> 
> So I don't seem to run into an error.

```
$ qemu-io -c 'write -P 0 42 1234' --image-opts 
driver=host_device,filename=/dev/loop0,detect-zeroes=unmap
write failed: Invalid argument
```

This seems do do the trick :) (We triggered the bug with Windows 10
guests and with an iSCSI drive so it was hardly a simple reproducer).

> 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 8067e238cb..b2fabcc1b8 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1620,7 +1620,11 @@ static int handle_aiocb_write_zeroes_unmap(void 
> > *opaque)
> >  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> >  int ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | 
> > FALLOC_FL_KEEP_SIZE,
> > aiocb->aio_offset, aiocb->aio_nbytes);
> > -if (ret != -ENOTSUP) {
> > +switch (ret) {
> > +case -ENOTSUP:
> > +case -EINVAL:
> > +break;
> > +default:
> >  return ret;
> >  }
> >  #endif
> 
> This means that we fall back to BLKZEROOUT in case of -EINVAL. Does this
> return a better error code in the relevant cases, or did you just happen
> to test a case where it was skipped or returned -ENOTSUP?

I guess I misinterpreted the comment before calling
`handle_aiocb_write_zeroes`.

The codepath is:
* handle_aiocb_write_zeroes_unmap -> handle_aiocb_write_zeroes -> 
handle_aiocb_write_zeroes_block

In witch the code will return `-ENOSTUP` (`!s->has_write_zeroes`) and
never fall back to `BLKZEROOUT`.

So it's working as I expected but now I am unsure that my fix is the
right thing to do, what do you think ?

> 
> Kevin
> 
> 

-- 
Antoine 'xdbob' Damhet


signature.asc
Description: PGP signature


[PATCH] file-posix: Handle `EINVAL` fallocate return value

2020-07-17 Thread Antoine Damhet
From: Antoine Damhet 

The `detect-zeroes=unmap` option may issue unaligned
`FALLOC_FL_PUNCH_HOLE` requests, raw block devices can (and will) return
`EINVAL`, qemu should then write the zeroes to the blockdev instead of
issuing an `IO_ERROR`.

Signed-off-by: Antoine Damhet 
---
 block/file-posix.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 8067e238cb..b2fabcc1b8 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1620,7 +1620,11 @@ static int handle_aiocb_write_zeroes_unmap(void *opaque)
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
 int ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
aiocb->aio_offset, aiocb->aio_nbytes);
-if (ret != -ENOTSUP) {
+switch (ret) {
+case -ENOTSUP:
+case -EINVAL:
+break;
+default:
 return ret;
 }
 #endif
-- 
2.27.0




[PATCH RESEND] file-posix: Handle `EINVAL` fallocate return value

2020-07-17 Thread antoine . damhet
From: Antoine Damhet 

The `detect-zeroes=unmap` option may issue unaligned
`FALLOC_FL_PUNCH_HOLE` requests, raw block devices can (and will) return
`EINVAL`, qemu should then write the zeroes to the blockdev instead of
issuing an `IO_ERROR`.

Signed-off-by: Antoine Damhet 
---

I am resending this patch because:
1. I sent the first version from the wrong email address
2. I forgot to send it to the devel mailing list

Please forgive me for the inconvenience.

 block/file-posix.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 8067e238cb..b2fabcc1b8 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1620,7 +1620,11 @@ static int handle_aiocb_write_zeroes_unmap(void *opaque)
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
 int ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
aiocb->aio_offset, aiocb->aio_nbytes);
-if (ret != -ENOTSUP) {
+switch (ret) {
+case -ENOTSUP:
+case -EINVAL:
+break;
+default:
 return ret;
 }
 #endif
-- 
2.27.0