[PATCH] hostmem: Add clear option to file backend

2023-03-02 Thread Fam Zheng
This adds a memset to clear the backing memory. This is useful in the
case of PMEM DAX to drop dirty data, if the backing memory is handed
over from a previous application or firmware which didn't clean up
before exiting.

Signed-off-by: Fam Zheng 
---
 backends/hostmem-file.c | 20 ++++++++++++++++++++
 include/exec/memory.h   |  3 +++
 qapi/qom.json   |  3 +++
 qemu-options.hx |  4 +++-
 softmmu/physmem.c   |  6 +-
 5 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..f7468d24ce 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -30,6 +30,7 @@ struct HostMemoryBackendFile {
 bool discard_data;
 bool is_pmem;
 bool readonly;
+bool clear;
 };
 
 static void
@@ -56,6 +57,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 ram_flags = backend->share ? RAM_SHARED : 0;
 ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
 ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
+ram_flags |= fb->clear ? RAM_CLEAR : 0;
memory_region_init_ram_from_file(&fb->mr, OBJECT(backend), name,
  backend->size, fb->align, ram_flags,
  fb->mem_path, fb->readonly, errp);
@@ -168,6 +170,21 @@ static void file_memory_backend_set_readonly(Object *obj, bool value,
 fb->readonly = value;
 }
 
+static bool file_memory_backend_get_clear(Object *obj, Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+
+return fb->clear;
+}
+
+static void file_memory_backend_set_clear(Object *obj, bool value,
+ Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj);
+
+fb->clear = value;
+}
+
 static void file_backend_unparent(Object *obj)
 {
 HostMemoryBackend *backend = MEMORY_BACKEND(obj);
@@ -204,6 +221,9 @@ file_backend_class_init(ObjectClass *oc, void *data)
 object_class_property_add_bool(oc, "readonly",
 file_memory_backend_get_readonly,
 file_memory_backend_set_readonly);
+object_class_property_add_bool(oc, "clear",
+file_memory_backend_get_clear,
+file_memory_backend_set_clear);
 }
 
 static void file_backend_instance_finalize(Object *o)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..3345db5241 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -232,6 +232,9 @@ typedef struct IOMMUTLBEvent {
 /* RAM that isn't accessible through normal means. */
 #define RAM_PROTECTED (1 << 8)
 
+/* Clear these pages when mapping */
+#define RAM_CLEAR (1 << 9)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
IOMMUNotifierFlag flags,
hwaddr start, hwaddr end,
diff --git a/qapi/qom.json b/qapi/qom.json
index 30e76653ad..2c4aa5b0d5 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -629,6 +629,8 @@
 # specify the required alignment via this option.
 # 0 selects a default alignment (currently the page size). (default: 0)
 #
+# @clear: if true, the memory is memset to 0 during init. (default: false)
+#
 # @discard-data: if true, the file contents can be destroyed when QEMU exits,
 #to avoid unnecessarily flushing data to the backing file. Note
 #that ``discard-data`` is only an optimization, and QEMU might
@@ -649,6 +651,7 @@
 { 'struct': 'MemoryBackendFileProperties',
   'base': 'MemoryBackendProperties',
   'data': { '*align': 'size',
+'*clear': 'bool',
 '*discard-data': 'bool',
 'mem-path': 'str',
 '*pmem': { 'type': 'bool', 'if': 'CONFIG_LIBPMEM' },
diff --git a/qemu-options.hx b/qemu-options.hx
index beeb4475ba..6c8345c62e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4859,7 +4859,7 @@ SRST
 they are specified. Note that the 'id' property must be set. These
 objects are placed in the '/objects' path.
 
-``-object memory-backend-file,id=id,size=size,mem-path=dir,share=on|off,discard-data=on|off,merge=on|off,dump=on|off,prealloc=on|off,host-nodes=host-nodes,policy=default|preferred|bind|interleave,align=align,readonly=on|off``
+``-object memory-backend-file,id=id,size=size,mem-path=dir,share=on|off,clear=on|off,discard-data=on|off,merge=on|off,dump=on|off,prealloc=on|off,host-nodes=host-nodes,policy=default|preferred|bind|interleave,align=align,readonly=on|off``
 Creates a memory file backend object, which can be used to back
 the guest RAM with huge pages.
 
@@ -4886,6 +4886,8 @@ SRST
 Documentation/vm/numa\_memory\_policy.txt on the Linux kernel
 source tree for additional details.
 
+Setting the ``clear`` boolean option to on will make QEMU memset the backing memory to zero.
+
 Setting the ``discard-data`` boolean option to on in
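
The softmmu/physmem.c hunk listed in the diffstat is not shown above, so the
following is only a hedged sketch of how the allocation path presumably
consumes RAM_CLEAR, using QEMU's internal ramblock_ptr() helper from
include/exec/ram_addr.h:

/* Sketch only: the actual softmmu/physmem.c hunk is an assumption. */
static void ram_block_apply_clear(RAMBlock *new_block, uint32_t ram_flags)
{
    if (ram_flags & RAM_CLEAR) {
        /* Drop stale contents handed over by a previous application or
         * firmware; for PMEM backing the zeroes would presumably also
         * need to be persisted (e.g. via pmem_persist() or msync()). */
        memset(ramblock_ptr(new_block, 0), 0, new_block->used_length);
    }
}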

Re: Attaching qcow2 images to containers

2022-05-19 Thread Fam Zheng
On 2022-05-18 07:30, Stefan Hajnoczi wrote:
> Hi Kirill,
> I saw your "[PATCH 0/4] dm: Introduce dm-qcow2 driver to attach QCOW2
> files as block device" patch series:
> https://lore.kernel.org/linux-kernel/ykme5zs2cpxun...@infradead.org/T/
> 
> There has been recent work in vDPA (VIRTIO Data Path Acceleration) to
> achieve similar functionality. The qemu-storage-daemon VDUSE export
> attaches a virtio-blk device to the host kernel and QEMU's qcow2
> implementation can be used:
> https://patchew.org/QEMU/20220504074051.90-1-xieyon...@bytedance.com/
> 
> A container can then access this virtio-blk device (/dev/vda). Note that
> the virtio-blk device is implemented in software using vDPA/VDUSE, there
> is no virtio-pci device.
> 
> As a quick comparison with a dm-qcow2 target, this approach keeps the
> qcow2 code in QEMU userspace and can take advantage of QEMU block layer
> features (storage migration/mirroring/backup, snapshots, etc). On the
> other hand, it's likely to be more heavyweight because bounce buffers
> are required in VDUSE for security reasons, there is a separate
> userspace process involved, and there's the virtio_blk.ko driver and an
> emulated virtio-blk device involved.
> 
> Another similar feature that was recently added to QEMU is the
> qemu-storage-daemon FUSE export:
> 
>   $ qemu-storage-daemon \
> --blockdev file,filename=test.img,node-name=drive0 \
>   --export fuse,node-name=drive0,id=fuse0,mountpoint=/tmp/foo
>   $ ls -alF /tmp/foo
>   -r. 1 me me 10737418240 May 18 07:22 /tmp/foo
> 
> This exports a disk image as a file via FUSE. Programs can access it
> like a regular file and qemu-storage-daemon will do the qcow2 I/O on the
> underlying file.
> 
> I wanted to mention these options for exposing qcow2 disk images to
> processes/containers on the host. Depending on your use cases they might
> be interesting. Performance comparisons against VDUSE and FUSE exports
> would be interesting since these new approaches seem to be replacing
> qemu-nbd.

In addition, there was also qemu-tcmu (more of a PoC compared to the other options):

https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg04408.html

Fam

> 
> Can you share more about your use cases for the dm-qcow2 target? It
> could be useful for everyone I've CCed to be aware of various efforts in
> this area.
> 
> Thanks,
> Stefan





Re: [PATCH] Remove Ubuntu 18.04 support from the repository

2022-05-11 Thread Fam Zheng
On 2022-05-11 13:13, Philippe Mathieu-Daudé via wrote:
> +Fam
> On Wed, May 11, 2022 at 1:03 PM Thomas Huth  wrote:
> > On 11/05/2022 12.46, Philippe Mathieu-Daudé wrote:
> > > +Robert
> > >
> > >   On Wed, May 11, 2022 at 11:30 AM Daniel P. Berrangé
> > >  wrote:
> > >>
> > >> On Tue, May 10, 2022 at 09:56:12PM +0200, Thomas Huth wrote:
> > >>> According to our "Supported build platforms" policy, we now do not 
> > >>> support
> > >>> Ubuntu 18.04 anymore. Remove the related files and entries from our CI.
> > >>>
> > >>> Signed-off-by: Thomas Huth 
> > >>> ---
> > >>>   Seems like nobody touched the 18.04-based tests/vm/ubuntu* files in a
> > >>>   very long time, so I assume these are not used anymore and can completely
> > >>>   be removed now.
> > >>
> > >> Or it could mean that they are working fine and so haven't needed
> > >> changes...
> > >
> > > Yes :)
> >
> > At least for me "make vm-build-ubuntu.aarch64" is only failing with ssh
> > timeouts (on my x86 laptop) ... is this really supposed to work with TCG, or
> > is this KVM (on arm hosts) only?
> 
> Yes this timeout code is not working. I suppose it is tied to the TCG host 
> perf.
> I suggested a pair of patches to increase it, but back then Fam didn't accept
> them because IIRC these VMs were used by patchew (prior to GitLab).
> Today we have a better framework for testing, so I wouldn't use this script on
> CI, but it is still valuable for manual testing.

Yes I believe these are not needed by patchew anymore.

Fam




Re: [PATCH v4] Use io_uring_register_ring_fd() to skip fd operations

2022-04-22 Thread Fam Zheng
On 2022-04-22 09:52, Daniel P. Berrangé wrote:
> On Fri, Apr 22, 2022 at 09:34:28AM +0100, Fam Zheng wrote:
> > On 2022-04-22 00:36, Sam Li wrote:
> > > Linux recently added a new io_uring(7) optimization API that QEMU
> > > doesn't take advantage of yet. The liburing library that QEMU uses
> > > has added a corresponding new API calling io_uring_register_ring_fd().
> > > When this API is called after creating the ring, the io_uring_submit()
> > > library function passes a flag to the io_uring_enter(2) syscall
> > > allowing it to skip the ring file descriptor fdget()/fdput()
> > > operations. This saves some CPU cycles.
> > > 
> > > Signed-off-by: Sam Li 
> > > ---
> > >  block/io_uring.c | 10 +-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block/io_uring.c b/block/io_uring.c
> > > index 782afdb433..5247fb79e2 100644
> > > --- a/block/io_uring.c
> > > +++ b/block/io_uring.c
> > > @@ -435,8 +435,16 @@ LuringState *luring_init(Error **errp)
> > >  }
> > >  
> > >  ioq_init(&s->io_q);
> > > -return s;
> > > +if (io_uring_register_ring_fd(&s->ring) < 0) {
> > > +/*
> > > + * Only warn about this error: we will fallback to the non-optimized
> > > + * io_uring operations.
> > > + */
> > > +error_reportf_err(*errp,
> > > + "failed to register linux io_uring ring file 
> > > descriptor");
> > 
> > IIUC errp can be NULL, so let's not dereference it without checking. So,
> > just use error_report?
> 
> Plenty of people will be running kernels that lack the new feature,
> so this "failure" will be an expected scenario. We shouldn't be
> spamming the logs with any error or warning message. Assuming  QEMU
> remains fully functional, merely not as optimized, we should be
> totally silent.

Functionally, that's a very valid point. But performance-wise, is it good to
have some visibility of this? People choose io_uring over the other options
almost certainly for performance, and this issue matters quite a bit there.

Fam

> 
> At most stick in a 'trace' point so we can record whether the
> optimization is present.
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
> 
> 
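
For reference, a minimal sketch of the silent-fallback approach Daniel
describes, assuming liburing's io_uring_register_ring_fd() and a
hypothetical trace point (it would need a matching block/trace-events
entry; the name here is made up):

/* Sketch only: fall back silently when the kernel lacks the feature. */
static void luring_try_register_ring_fd(LuringState *s)
{
    int ret = io_uring_register_ring_fd(&s->ring);

    if (ret < 0) {
        /* Expected on older kernels/liburing: not an error, just trace it. */
        trace_luring_register_ring_fd_failed(ret);  /* hypothetical trace point */
    }
}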




Re: [PATCH v4] Use io_uring_register_ring_fd() to skip fd operations

2022-04-22 Thread Fam Zheng
On 2022-04-22 00:36, Sam Li wrote:
> Linux recently added a new io_uring(7) optimization API that QEMU
> doesn't take advantage of yet. The liburing library that QEMU uses
> has added a corresponding new API calling io_uring_register_ring_fd().
> When this API is called after creating the ring, the io_uring_submit()
> library function passes a flag to the io_uring_enter(2) syscall
> allowing it to skip the ring file descriptor fdget()/fdput()
> operations. This saves some CPU cycles.
> 
> Signed-off-by: Sam Li 
> ---
>  block/io_uring.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 782afdb433..5247fb79e2 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -435,8 +435,16 @@ LuringState *luring_init(Error **errp)
>  }
>  
> >  ioq_init(&s->io_q);
> -return s;
> > +if (io_uring_register_ring_fd(&s->ring) < 0) {
> +/*
> + * Only warn about this error: we will fallback to the non-optimized
> + * io_uring operations.
> + */
> +error_reportf_err(*errp,
> + "failed to register linux io_uring ring file 
> descriptor");

IIUC errp can be NULL, so let's not dereference it without checking. So, just
use error_report?

Fam

> +}
>  
> +return s;
>  }
>  
>  void luring_cleanup(LuringState *s)
> -- 
> Use error_reportf_err to avoid memory leak due to not freeing error
> object.
> --
> 2.35.1
> 
> 



Re: [PATCH v3] Use io_uring_register_ring_fd() to skip fd operations

2022-04-21 Thread Fam Zheng
On 2022-04-19 07:33, Sam Li wrote:
> Linux recently added a new io_uring(7) optimization API that QEMU
> doesn't take advantage of yet. The liburing library that QEMU uses
> has added a corresponding new API calling io_uring_register_ring_fd().
> When this API is called after creating the ring, the io_uring_submit()
> library function passes a flag to the io_uring_enter(2) syscall
> allowing it to skip the ring file descriptor fdget()/fdput()
> operations. This saves some CPU cycles.
> 
> Signed-off-by: Sam Li 
> ---
>  block/io_uring.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 782afdb433..51f4834b69 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -435,8 +435,16 @@ LuringState *luring_init(Error **errp)
>  }
>  
> >  ioq_init(&s->io_q);
> -return s;
> > +if (io_uring_register_ring_fd(&s->ring) < 0) {
> +/*
> + * Only warn about this error: we will fallback to the non-optimized
> + * io_uring operations.
> + */
> +error_setg_errno(errp, errno,
> + "failed to register linux io_uring ring file 
> descriptor");
> +}
>  
> +return s;

As a general convention, I don't think the errp is going to get proper handling
by the callers if non-NULL is returned like here. IOW a matching error_free is
never called, and this is a memory leak?

I guess error_report is better?

Fam

>  }
>  
>  void luring_cleanup(LuringState *s)
> -- 
> 2.35.1
> 
> 
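
A sketch of the pattern suggested in the two reviews above: error_report()
writes straight to the error stream, so it neither dereferences a
possibly-NULL errp nor allocates an Error object that no caller would free.
A minimal sketch, placed where the patch puts its check:

/* In luring_init(), after ioq_init(&s->io_q): */
if (io_uring_register_ring_fd(&s->ring) < 0) {
    /* Non-fatal: we simply lose the fdget()/fdput() optimization. */
    error_report("failed to register io_uring ring file descriptor");
}
return s;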



Re: [PATCH 3/6] scsi-disk: add MODE_PAGE_APPLE quirk for Macintosh

2022-04-21 Thread Fam Zheng
On 2022-04-21 07:51, Mark Cave-Ayland wrote:
> One of the mechanisms MacOS uses to identify drives compatible with MacOS is to
> send a custom MODE SELECT command for page 0x30 to the drive. The response to
> this is a hard-coded manufacturer string which must match in order for the
> drive to be usable within MacOS.
> 
> Add an implementation of the MODE SELECT page 0x30 response guarded by a newly
> defined SCSI_DISK_QUIRK_MODE_PAGE_APPLE quirk bit so that drives attached
> to non-Apple machines function exactly as before.
> 
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/scsi/scsi-disk.c  | 19 +++
>  include/hw/scsi/scsi.h   |  3 +++
>  include/scsi/constants.h |  1 +
>  3 files changed, 23 insertions(+)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index d89cdd4e4a..37013756d5 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -1085,6 +1085,7 @@ static int mode_sense_page(SCSIDiskState *s, int page, uint8_t **p_outbuf,
>  [MODE_PAGE_R_W_ERROR]  = (1 << TYPE_DISK) | (1 << TYPE_ROM),
>  [MODE_PAGE_AUDIO_CTL]  = (1 << TYPE_ROM),
>  [MODE_PAGE_CAPABILITIES]   = (1 << TYPE_ROM),
> +[MODE_PAGE_APPLE]  = (1 << TYPE_ROM),
>  };
>  
>  uint8_t *p = *p_outbuf + 2;
> @@ -1229,6 +1230,22 @@ static int mode_sense_page(SCSIDiskState *s, int page, uint8_t **p_outbuf,
>  p[19] = (16 * 176) & 0xff;
>  break;
>  
> + case MODE_PAGE_APPLE:
> +if (s->qdev.type == TYPE_DISK &&
> +(s->quirks & SCSI_DISK_QUIRK_MODE_PAGE_APPLE)) {

This is always false. SCSI_DISK_QUIRK_MODE_PAGE_APPLE is defined 0.

You need (1 << SCSI_DISK_QUIRK_MODE_PAGE_APPLE) instead.

> +
> +length = 0x24;
> +if (page_control == 1) { /* Changeable Values */
> +break;
> +}
> +
> +memset(p, 0, length);
> +strcpy((char *)p + 8, "APPLE COMPUTER, INC   ");
> +break;
> +} else {
> +return -1;
> +}
> +
>  default:
>  return -1;
>  }
> @@ -3042,6 +3059,8 @@ static Property scsi_hd_properties[] = {
>  DEFINE_PROP_UINT16("rotation_rate", SCSIDiskState, rotation_rate, 0),
>  DEFINE_PROP_INT32("scsi_version", SCSIDiskState, qdev.default_scsi_version,
>5),
> +DEFINE_PROP_BIT("quirk_mode_page_apple", SCSIDiskState, quirks,
> +SCSI_DISK_QUIRK_MODE_PAGE_APPLE, 0),
>  DEFINE_BLOCK_CHS_PROPERTIES(SCSIDiskState, qdev.conf),
>  DEFINE_PROP_END_OF_LIST(),
>  };
> diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
> index 1ffb367f94..f629706250 100644
> --- a/include/hw/scsi/scsi.h
> +++ b/include/hw/scsi/scsi.h
> @@ -226,4 +226,7 @@ SCSIDevice *scsi_device_get(SCSIBus *bus, int channel, int target, int lun);
>  /* scsi-generic.c. */
>  extern const SCSIReqOps scsi_generic_req_ops;
>  
> +/* scsi-disk.c */
> +#define SCSI_DISK_QUIRK_MODE_PAGE_APPLE 0
> +
>  #endif
> diff --git a/include/scsi/constants.h b/include/scsi/constants.h
> index 2a32c08b5e..21ca7b50cd 100644
> --- a/include/scsi/constants.h
> +++ b/include/scsi/constants.h
> @@ -234,6 +234,7 @@
>  #define MODE_PAGE_FAULT_FAIL  0x1c
>  #define MODE_PAGE_TO_PROTECT  0x1d
>  #define MODE_PAGE_CAPABILITIES0x2a
> +#define MODE_PAGE_APPLE   0x30
>  #define MODE_PAGE_ALLS0x3f
>  /* Not in Mt. Fuji, but in ATAPI 2.6 -- deprecated now in favor
>   * of MODE_PAGE_SENSE_POWER */
> -- 
> 2.20.1
> 
> 

Fam
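
To spell the bug out: DEFINE_PROP_BIT takes a bit index, so the property
sets bit 0 of s->quirks, while the test above ANDs with the index itself,
which is 0. A standalone illustration:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SCSI_DISK_QUIRK_MODE_PAGE_APPLE 0   /* a bit index, not a mask */

int main(void)
{
    /* What DEFINE_PROP_BIT("quirk_mode_page_apple", ..., 0) sets: */
    uint32_t quirks = 1u << SCSI_DISK_QUIRK_MODE_PAGE_APPLE;

    bool wrong = quirks & SCSI_DISK_QUIRK_MODE_PAGE_APPLE;         /* x & 0: always false */
    bool right = quirks & (1u << SCSI_DISK_QUIRK_MODE_PAGE_APPLE); /* true when bit is set */

    printf("wrong=%d right=%d\n", wrong, right);  /* prints wrong=0 right=1 */
    return 0;
}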



Re: [PATCH V6 21/27] vfio-pci: cpr part 3 (intx)

2022-04-12 Thread Fam Zheng
On 2022-04-11 12:23, Steven Sistare wrote:
> On 3/29/2022 7:03 AM, Fam Zheng wrote:
> > On 2021-08-06 14:43, Steve Sistare wrote:
> >> Preserve vfio INTX state across cpr restart.  Preserve VFIOINTx fields as
> >> follows:
> >>   pin : Recover this from the vfio config in kernel space
> >>   interrupt : Preserve its eventfd descriptor across exec.
> >>   unmask : Ditto
> >>   route.irq : This could perhaps be recovered in vfio_pci_post_load by
> >> calling pci_device_route_intx_to_irq(pin), whose implementation reads
> >> config space for a bridge device such as ich9.  However, there is no
> >> guarantee that the bridge vmstate is read before vfio vmstate.  Rather
> >> than fiddling with MigrationPriority for vmstate handlers, explicitly
> >> save route.irq in vfio vmstate.
> >>   pending : save in vfio vmstate.
> >>   mmap_timeout, mmap_timer : Re-initialize
> >>   bool kvm_accel : Re-initialize
> >>
> >> In vfio_realize, defer calling vfio_intx_enable until the vmstate
> >> is available, in vfio_pci_post_load.  Modify vfio_intx_enable and
> >> vfio_intx_kvm_enable to skip vfio initialization, but still perform
> >> kvm initialization.
> >>
> >> Signed-off-by: Steve Sistare 
> > 
> > Hi Steve,
> > 
> > Not directly related to this patch, but since the context is close: it looks
> > like this series only takes care of exec restart mode of vfio-pci, have you
> > had any thoughts on kexec reboot mode with vfio-pci?
> > 
> > The general idea is if DMAR context is not lost during kexec, we should be
> > able to set up irqfds again and things will just work?
> > 
> > Fam
> 
> Hi Fam,
>   I have thought about that use case, but only in general terms.
> IMO it best fits in the cpr framework as a new mode (rather than as 
> a new -restore command line argument).  

Yes I think that is better, I will try that.

> 
> In your code below, you would have fewer code changes if you set 
> 'reused = true' for the new mode, rather than testing both 'reused and 
> restored' 
> at multiple sites. Lastly, I cleaned up the vector handling somewhat from V6 
> to V7, so you may want to try your code using V7 as a base.

I am cleaning up the kernel patches and will post both parts once ready.

Fam



Re: [PATCH V6 21/27] vfio-pci: cpr part 3 (intx)

2022-03-29 Thread Fam Zheng
On 2021-08-06 14:43, Steve Sistare wrote:
> Preserve vfio INTX state across cpr restart.  Preserve VFIOINTx fields as
> follows:
>   pin : Recover this from the vfio config in kernel space
>   interrupt : Preserve its eventfd descriptor across exec.
>   unmask : Ditto
>   route.irq : This could perhaps be recovered in vfio_pci_post_load by
> calling pci_device_route_intx_to_irq(pin), whose implementation reads
> config space for a bridge device such as ich9.  However, there is no
> guarantee that the bridge vmstate is read before vfio vmstate.  Rather
> than fiddling with MigrationPriority for vmstate handlers, explicitly
> save route.irq in vfio vmstate.
>   pending : save in vfio vmstate.
>   mmap_timeout, mmap_timer : Re-initialize
>   bool kvm_accel : Re-initialize
> 
> In vfio_realize, defer calling vfio_intx_enable until the vmstate
> is available, in vfio_pci_post_load.  Modify vfio_intx_enable and
> vfio_intx_kvm_enable to skip vfio initialization, but still perform
> kvm initialization.
> 
> Signed-off-by: Steve Sistare 

Hi Steve,

Not directly related to this patch, but since the context is close: it looks
like this series only takes care of exec restart mode of vfio-pci, have you had
any thoughts on kexec reboot mode with vfio-pci?

The general idea is if DMAR context is not lost during kexec, we should be able
to set up irqfds again and things will just work?

Fam

--

PS some more info below:

I have some local kernel patches to kexec reboot most part of the host kernel
while keeping IOMMU DMAR tables in a valid state; with that, not many extra
things are needed in addition to restore it. A PoC is like below (I can share
more details of the kernel changes if this patch makes any sense):


commit f8951e58be86bd6e37f816394a9a73f28d8059fc
Author: Fam Zheng 
Date:   Mon Mar 21 13:19:49 2022 +

cpr: Add live-update support to vfio-pci devices

In cpr-save, always serialize the vfio-pci states.

In cpr-load, add a '-restore' mode that will do
VFIO_GROUP_GET_DEVICE_FD_INTACT and skip DMAR setup, somewhat similar to
    the current cpr exec mode.

Signed-off-by: Fam Zheng 

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 73f4259556..e36f0ef97d 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -584,10 +584,15 @@ void msix_init_vector_notifiers(PCIDevice *dev,
 MSIVectorReleaseNotifier release_notifier,
 MSIVectorPollNotifier poll_notifier)
 {
+int vector;
+
 assert(use_notifier && release_notifier);
 dev->msix_vector_use_notifier = use_notifier;
 dev->msix_vector_release_notifier = release_notifier;
 dev->msix_vector_poll_notifier = poll_notifier;
+for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
+msix_handle_mask_update(dev, vector, true);
+}
 }
 
 int msix_set_vector_notifiers(PCIDevice *dev,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 605ffbb5d0..f1240410a8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -2066,6 +2066,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 bool reused;
 VFIOAddressSpace *space;
 
+if (restore) {
+return 0;
+}
 space = vfio_get_address_space(as);
 fd = cpr_find_fd("vfio_container_for_group", group->groupid);
 reused = (fd > 0);
@@ -2486,7 +2489,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
 fd = cpr_find_fd(name, 0);
 reused = (fd >= 0);
 if (!reused) {
-fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+int op = restore ? VFIO_GROUP_GET_DEVICE_FD_INTACT : VFIO_GROUP_GET_DEVICE_FD;
+fd = ioctl(group->fd, op, name);
 }
 
 if (fd < 0) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e32513c668..9da5f93228 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -361,7 +361,7 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
  * Do not alter interrupt state during vfio_realize and cpr-load.  The
  * reused flag is cleared thereafter.
  */
-if (!vdev->pdev.reused) {
+if (!vdev->pdev.reused && !restore) {
 vfio_disable_interrupts(vdev);
 }
 
@@ -388,7 +388,7 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
fd = event_notifier_get_fd(&vdev->intx.interrupt);
 qemu_set_fd_handler(fd, vfio_intx_interrupt, NULL, vdev);
 
-if (vdev->pdev.reused) {
+if (vdev->pdev.reused && !restore) {
vfio_intx_reenable_kvm(vdev, &err);
 goto finish;
 }
@@ -2326,6 +2326,9 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 int ret, i, count;
 bool multi = false;
 
+if (restore) {
+return 0;
+}
 trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
 
 if (!single) {
@@

Re: [PATCH 3/4] python/qmp-shell: relicense as LGPLv2+

2022-03-29 Thread Fam Zheng
On 2022-03-25 16:04, John Snow wrote:
> qmp-shell is presently licensed as GPLv2 (only). I intend to include
> this tool as an add-on to an LGPLv2+ library package hosted on
> PyPI.org. I've selected LGPLv2+ to maximize compatibility with other
> licenses while retaining a copyleft license.
> 
> To keep licensing matters simple, I'd like to relicense this tool as
> LGPLv2+ as well in order to keep the resultant license of the hosted
> release files simple -- even if library users won't "link against" this
> command line tool.
> 
> Therefore, I am asking permission from the current authors of this
> tool to loosen the license. At present, those people are:
> 
> - John Snow (me!), 411/609
> - Luiz Capitulino, Author, 97/609
> - Daniel Berrangé, 81/609
> - Eduardo Habkost, 10/609
> - Marc-André Lureau, 6/609
> - Fam Zheng, 3/609
> - Cleber Rosa, 1/609
> 
> (All of which appear to have been written under redhat.com addresses.)
> 
> Eduardo's fixes are largely automated from 2to3 conversion tools and may
> not necessarily constitute authorship, but his signature would put to
> rest any questions.
> 
> Cleber's changes concern a single import statement change. Also won't
> hurt to ask.
> 
> CC: Luiz Capitulino 
> CC: Daniel Berrange 
> CC: Eduardo Habkost 
> CC: Marc-André Lureau 
> CC: Fam Zheng 
> CC: Cleber Rosa 
> 
> Signed-off-by: John Snow 

No longer wearing that hat any more so maybe my reply doesn't matter, but since
I'm Cc'ed with my new address, I am personally happy with the re-licensing:

Acked-by: Fam Zheng 



Re: [PATCH v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device

2021-06-22 Thread Fam Zheng



> On 22 Jun 2021, at 13:42, Philippe Mathieu-Daudé  wrote:
> 
> On 6/22/21 10:06 AM, Philippe Mathieu-Daudé wrote:
>> On 6/22/21 9:29 AM, Philippe Mathieu-Daudé wrote:
>>> On 6/21/21 5:36 PM, Fam Zheng wrote:
>>>>> On 21 Jun 2021, at 16:13, Philippe Mathieu-Daudé  
>>>>> wrote:
>>>>> On 6/21/21 3:18 PM, Fam Zheng wrote:
>>>>>>> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé  
>>>>>>> wrote:
>>>>>>> 
>>>>>>> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
>>>>>>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
>>>>>>> -ENOMEM in case of error. The driver was correctly handling the
>>>>>>> error path to recycle its volatile IOVA mappings.
>>>>>>> 
>>>>>>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
>>>>>>> DMA mappings per container", April 2019) added the -ENOSPC error to
>>>>>>> signal the user exhausted the DMA mappings available for a container.
>>>>>>> 
>>>>>>> The block driver started to mis-behave:
>>>>>>> 
>>>>>>> qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>>>>>>> (qemu)
>>>>>>> (qemu) info status
>>>>>>> VM status: paused (io-error)
>>>>>>> (qemu) c
>>>>>>> VFIO_MAP_DMA failed: No space left on device
>>>>>>> qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: 
>>>>>>> Assertion `ctx == blk->ctx' failed.
>>>>>> 
>>>>>> Hi Phil,
>>>>>> 
>>>>>> 
>>>>>> The diff looks good to me, but I’m not sure what exactly caused the 
>>>>>> assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC 
>>>>>> before, so it should be treated as a general case. What am I missing?
>>>>> 
>>>>> Good catch, ENOSPC ends setting BLOCK_DEVICE_IO_STATUS_NOSPACE
>>>>> -> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mapping
>>>>> exhausted. I don't understand the full "VM resume" path, but this
>>>>> is not what we want (IO_NOSPACE is to warn the operator to add
>>>>> more storage and resume, which is pointless in our case, resuming
>>>>> won't help until we flush the mappings).
>>>>> 
>>>>> IIUC what we want is return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED.
>>>> 
>>>> I agree with that. It just makes me feel there’s another bug in the 
>>>> resuming code path. Can you get a backtrace?
>>> 
>>> It seems the resuming code path bug has been fixed elsewhere:
>>> 
>>> (qemu) info status
>>> info status
>>> VM status: paused (io-error)
>>> (qemu) c
>>> c
>>> 2021-06-22T07:27:00.745466Z qemu-system-x86_64: VFIO_MAP_DMA failed: No
>>> space left on device
>>> (qemu) info status
>>> info status
>>> VM status: paused (io-error)
>>> (qemu) c
>>> c
>>> 2021-06-22T07:27:12.458137Z qemu-system-x86_64: VFIO_MAP_DMA failed: No
>>> space left on device
>>> (qemu) c
>>> c
>>> 2021-06-22T07:27:13.439167Z qemu-system-x86_64: VFIO_MAP_DMA failed: No
>>> space left on device
>>> (qemu) c
>>> c
>>> 2021-06-22T07:27:14.272071Z qemu-system-x86_64: VFIO_MAP_DMA failed: No
>>> space left on device
>>> (qemu)
>>> 
>> 
>> I tested all releases up to v4.1.0 and could not trigger the
>> blk_get_aio_context() assertion. Building using --enable-debug.
>> IIRC Gentoo is more aggressive, so I'll restart using -O2.
> 
> Took 4h30 to test all releases with -O3, couldn't reproduce :(
> 
> I wish I hadn't postponed writing an Ansible test script...
> 
> On v1 Michal said he doesn't have access to the machine anymore,
> so I'll assume the other issue got fixed elsewhere.
> 
> 

Cool, then I think it’s just the commit message that should be updated.

Fam




Re: [PATCH v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device

2021-06-21 Thread Fam Zheng



> On 21 Jun 2021, at 16:13, Philippe Mathieu-Daudé  wrote:
> 
> On 6/21/21 3:18 PM, Fam Zheng wrote:
>> 
>> 
>>> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé  wrote:
>>> 
>>> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
>>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
>>> -ENOMEM in case of error. The driver was correctly handling the
>>> error path to recycle its volatile IOVA mappings.
>>> 
>>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
>>> DMA mappings per container", April 2019) added the -ENOSPC error to
>>> signal the user exhausted the DMA mappings available for a container.
>>> 
>>> The block driver started to mis-behave:
>>> 
>>> qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>>> (qemu)
>>> (qemu) info status
>>> VM status: paused (io-error)
>>> (qemu) c
>>> VFIO_MAP_DMA failed: No space left on device
>>> qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: 
>>> Assertion `ctx == blk->ctx' failed.
>> 
>> Hi Phil,
>> 
>> 
>> The diff looks good to me, but I’m not sure what exactly caused the 
>> assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC 
>> before, so it should be treated as a general case. What am I missing?
> 
> Good catch, ENOSPC ends setting BLOCK_DEVICE_IO_STATUS_NOSPACE
> -> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mapping
> exhausted. I don't understand the full "VM resume" path, but this
> is not what we want (IO_NOSPACE is to warn the operator to add
> more storage and resume, which is pointless in our case, resuming
> won't help until we flush the mappings).
> 
> IIUC what we want is return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED.

I agree with that. It just makes me feel there’s another bug in the resuming 
code path. Can you get a backtrace?

Fam
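
For reference, the status mapping being discussed, simplified from what
QEMU's block layer does in blk_iostatus_set_err() (a sketch, not the exact
code):

/* ENOSPC pauses the VM (io-error / werror=enospc) so an operator can add
 * storage; any other error reports BLOCK_DEVICE_IO_STATUS_FAILED.
 * Returning ENOMEM from block/nvme.c therefore avoids a pointless pause
 * when what is exhausted is VFIO DMA mappings, not disk space. */
static BlockDeviceIoStatus classify_io_error(int error)
{
    return error == ENOSPC ? BLOCK_DEVICE_IO_STATUS_NOSPACE
                           : BLOCK_DEVICE_IO_STATUS_FAILED;
}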


Re: [PATCH v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device

2021-06-21 Thread Fam Zheng



> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé  wrote:
> 
> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
> -ENOMEM in case of error. The driver was correctly handling the
> error path to recycle its volatile IOVA mappings.
> 
> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
> DMA mappings per container", April 2019) added the -ENOSPC error to
> signal the user exhausted the DMA mappings available for a container.
> 
> The block driver started to mis-behave:
> 
>  qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>  (qemu)
>  (qemu) info status
>  VM status: paused (io-error)
>  (qemu) c
>  VFIO_MAP_DMA failed: No space left on device
>  qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: 
> Assertion `ctx == blk->ctx' failed.

Hi Phil,
 

The diff looks good to me, but I’m not sure what exactly caused the assertion 
failure. There is `if (r) { goto fail; }` that handles -ENOSPC before, so it 
should be treated as a general case. What am I missing?

Reviewed-by: Fam Zheng 

> 
> Fix by handling the new -ENOSPC error (when DMA mappings are
> exhausted) without any distinction to the current -ENOMEM error,
> so we don't change the behavior on old kernels where the CVE-2019-3882
> fix is not present.
> 
> An easy way to reproduce this bug is to restrict the DMA mapping
> limit (65535 by default) when loading the VFIO IOMMU module:
> 
>  # modprobe vfio_iommu_type1 dma_entry_limit=666
> 
> Cc: qemu-sta...@nongnu.org
> Reported-by: Michal Prívozník 
> Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
> Buglink: https://bugs.launchpad.net/qemu/+bug/186
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> v2: KISS checking both errors undistinguishedly (Maxim)
> ---
> block/nvme.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 2b5421e7aa6..c3d2a49866c 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -1030,7 +1030,7 @@ try_map:
> r = qemu_vfio_dma_map(s->vfio,
>   qiov->iov[i].iov_base,
>   len, true, &iova);
> -if (r == -ENOMEM && retry) {
> +if ((r == -ENOMEM || r == -ENOSPC) && retry) {
> retry = false;
> trace_nvme_dma_flush_queue_wait(s);
> if (s->dma_map_count) {
> -- 
> 2.31.1
> 
> 




Re: [PATCH 0/2] util/async: print leaked BH name when AioContext finalizes

2021-04-15 Thread Fam Zheng
On 2021-04-14 21:02, Stefan Hajnoczi wrote:
> Eric Ernst and I debugged a BH leak and it was more involved than it should 
> be.
> The problem is that BHs don't have a human-readable identifier, so low-level
> debugging techniques and inferences about the code are required to figure out
> which BH was leaked in production environments without easy debug access.
> 
> The leak ended up already being fixed upstream but let's improve diagnostics
> for leaked BHs so that this becomes quick and easy in the future.
> 
> Stefan Hajnoczi (2):
>   util/async: add a human-readable name to BHs for debugging
>   util/async: print leaked BH name when AioContext finalizes
> 
>  include/block/aio.h| 31 ---
>  include/qemu/main-loop.h   |  4 +++-
>  tests/unit/ptimer-test-stubs.c |  2 +-
>  util/async.c   | 25 +
>  util/main-loop.c   |  4 ++--
>  5 files changed, 55 insertions(+), 11 deletions(-)
> 
> -- 
> 2.30.2
> 

Reviewed-by: Fam Zheng 
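
The core idea of the series is to record a human-readable name when a BH is
created so a leak can be attributed to its callback. Roughly, with the names
here being illustrative rather than the exact API from the patches:

struct QEMUBH {
    QEMUBHFunc *cb;
    void *opaque;
    const char *name;  /* printed if the BH is still pending at finalize */
};

/* Call sites keep using aio_bh_new(); the macro captures the callback's
 * symbol name automatically. */
#define aio_bh_new(ctx, cb, opaque) \
    aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))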



Re: [PATCH v2] MAINTAINERS: Merge the Gitlab-CI section into the generic CI section

2021-03-10 Thread Fam Zheng
My time on QEMU is rather limited and I can hardly spend long looking at
testing patches. So it makes sense to drop me.

Acked-by: Fam Zheng 

Thanks!
Fam

On Mon, 8 Mar 2021 at 15:50, Thomas Huth  wrote:

> The status of the gitlab-CI files is currently somewhat confusing, and
> it is often not quite clear whether a patch should go via my tree or
> via the testing tree of Alex. That situation has grown historically...
> Initially, I was the only one using the gitlab-CI, just for my private
> repository there. But in the course of time, the gitlab-CI switched to
> use the containers from tests/docker/ (which is not part of the gitlab-CI
> section in the MAINTAINERS file), and QEMU now even switched to gitlab.com
> completely for the repository and will soon use it as its gating CI, too,
> so it makes way more sense if the gitlab-ci.yml files belong to the people
> who are owning the qemu-project on gitlab.com and take care of the gitlab
> CI there. Thus let's merge the gitlab-ci section into the common "test and
> build automation" section.
>
> While we're at it, I'm also removing the line with Fam there for now,
> since he was hardly active during the last years in this area anymore.
> If he ever gets more time for this part again in the future, we surely
> can add the line back again.
>
> Now to avoid that Alex is listed here alone, Philippe and I agreed to
> help as backup maintainers here, too.
>
> Signed-off-by: Thomas Huth 
> ---
>  v2: Keep Philippe and myself as maintainer instead of reviewer
>
>  MAINTAINERS | 21 +++--
>  1 file changed, 7 insertions(+), 14 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 26c9454823..5c4c179abb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3262,17 +3262,21 @@ F: include/hw/remote/iohub.h
>
>  Build and test automation
>  -------------------------
> -Build and test automation
> +Build and test automation, Linux Continuous Integration
>  M: Alex Bennée 
> -M: Fam Zheng 
> -R: Philippe Mathieu-Daudé 
> +M: Philippe Mathieu-Daudé 
> +M: Thomas Huth 
> +R: Wainer dos Santos Moschetta 
>  S: Maintained
>  F: .github/lockdown.yml
> +F: .gitlab-ci.yml
> +F: .gitlab-ci.d/
>  F: .travis.yml
>  F: scripts/ci/
>  F: tests/docker/
>  F: tests/vm/
>  F: scripts/archive-source.sh
> +W: https://gitlab.com/qemu-project/qemu/pipelines
>  W: https://travis-ci.org/qemu/qemu
>  W: http://patchew.org/QEMU/
>
> @@ -3289,17 +3293,6 @@ S: Maintained
>  F: .cirrus.yml
>  W: https://cirrus-ci.com/github/qemu/qemu
>
> -GitLab Continuous Integration
> -M: Thomas Huth 
> -M: Philippe Mathieu-Daudé 
> -M: Alex Bennée 
> -R: Wainer dos Santos Moschetta 
> -S: Maintained
> -F: .gitlab-ci.yml
> -F: .gitlab-ci.d/crossbuilds.yml
> -F: .gitlab-ci.d/*py
> -F: scripts/ci/gitlab-pipeline-status
> -
>  Guest Test Compilation Support
>  M: Alex Bennée 
>  R: Philippe Mathieu-Daudé 
> --
> 2.27.0
>
>
>


Re: [PATCH] block: Introduce zero-co:// and zero-aio://

2021-03-10 Thread Fam Zheng
On Wed, 10 Mar 2021 at 15:02, Max Reitz  wrote:

> On 10.03.21 15:17, f...@euphon.net wrote:
> > From: Fam Zheng 
> >
> > null-co:// has a read-zeroes=off default; when used in security
> > analysis, this can cause false positives because the driver doesn't
> > write to the read buffer.
> >
> > null-co:// has the highest possible performance as a block driver, so
> > let's keep it that way. This patch introduces zero-co:// and
> > zero-aio://, largely similar to null-*://, but with read-zeroes=on by
> > default, so it's more suitable in such cases than null-co://.
> >
> > Signed-off-by: Fam Zheng 
> > ---
> >   block/null.c | 91 
> >   1 file changed, 91 insertions(+)
>
> You’ll also need to make all tests that currently use null-{co,aio} use
> zero-{co,aio}, because none of those are performance tests (as far as
> I’m aware), so they all want a block driver that memset()s.
>
> (And that’s basically my problem with this approach; nearly everyone who
> uses null-* right now probably wants read-zeroes=on, so keeping null-*
> as it is means all of those users should be changed.  Sure, they were
> all wrong to not specify read-zeroes=on, but that’s how it is.  So while
> technically this patch is a compatible change, in contrast to the one
> making read-zeroes=on the default, in practice it absolutely isn’t.)
>
> Another problem arising from that is I can imagine that some
> distributions might have whitelisted null-co because many iotests
> implicitly depend on it, so the iotests will fail if they aren’t
> whitelisted.  Now they’d need to whitelist zero-co, too.  Not
> impossible, sure, but it’s work that would need to be done.
>
>
> My problem is this: We have a not-really problem, namely “valgrind and
> other tools complain”.  Philippe (and I guess me on IRC) proposed a
> simple solution whose only drawbacks (AFAIU) are:
>
> (1) When writing performance tests, you’ll then need to explicitly
> specify read-zeroes=off.  Existing performance tests using null-co will
> probably wrongly show degredation.  (Are there such tests, though?)
>
> (2) null will not quite conform to its name, strictly speaking.
> Frankly, I don’t know who’d care other than people who write those
> performance tests mentioned in (1).  I know I don’t care.
>
> (Technically: (3) We should look into all qemu tests that use null-co to
> see whether they test performance.  In practice, they don’t, so we don’t
> need to.)
>
> So AFAIU change the read-zeroes default would affect very few people, if
> any.  I see you care about (2), and you’re the maintainer, so I can’t
> say that there is no problem.  But it isn’t a practical one.
>
> So on the practical front, we get a small benefit (tools won’t complain)
> for a small drawback (having to remember read-zeroes=off for performance
> tests).
>
>
> Now you propose a change that has the following drawbacks, as I see it:
>
> (1) All non-performance tests using null-* should be changed to zero-*.
>   And those are quite a few tests, so this is some work.
>
> (2) Distributions that have whitelisted null-co now should whitelist
> zero-co, too.
>
> Not impossible, but I consider these much bigger practical drawbacks
> than making read-zeroes=on the default.  It’s actual work that must be
> done.  To me personally, these drawbacks far outweigh the benefit of
> having valgrind and other tools not complain.
>
>
> I can’t stop you from updating this patch to do (1), but it doesn’t make
> sense to me from a practical perspective.  It just doesn’t seem worth it
> to me.
>

But using null-co:// in tests is not wrong; the problem is that the caller
failed to initialize its buffer. IMO the valgrind issue should really be
fixed like this:

diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index dcbccee294..9078718445 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -55,7 +55,7 @@ struct partition {
static int guess_disk_lchs(BlockBackend *blk,
   int *pcylinders, int *pheads, int *psectors)
{
-uint8_t buf[BDRV_SECTOR_SIZE];
+uint8_t buf[BDRV_SECTOR_SIZE] = {};
int i, heads, sectors, cylinders;
struct partition *p;
uint32_t nr_sects;


Re: [PATCH] block: Introduce zero-co:// and zero-aio://

2021-03-10 Thread Fam Zheng
On Wed, 10 Mar 2021 at 14:41, Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> wrote:

> 10.03.2021 17:17, f...@euphon.net wrote:
> > From: Fam Zheng 
> >
> > null-co:// has a read-zeroes=off default; when used in security
> > analysis, this can cause false positives because the driver doesn't
> > write to the read buffer.
> >
> > null-co:// has the highest possible performance as a block driver, so
> > let's keep it that way. This patch introduces zero-co:// and
> > zero-aio://, largely similar to null-*://, but with read-zeroes=on by
> > default, so it's more suitable in such cases than null-co://.
> >
> > Signed-off-by: Fam Zheng 
> > ---
> >   block/null.c | 91 
> >   1 file changed, 91 insertions(+)
> >
> > diff --git a/block/null.c b/block/null.c
> > index cc9b1d4ea7..5de97e8fda 100644
> > --- a/block/null.c
> > +++ b/block/null.c
> > @@ -76,6 +76,30 @@ static void null_aio_parse_filename(const char *filename, QDict *options,
> >   }
> >   }
> >
> > +static void zero_co_parse_filename(const char *filename, QDict *options,
> > +   Error **errp)
> > +{
> > +/* This function only exists so that a zero-co:// filename is accepted
> > + * with the zero-co driver. */
> > +if (strcmp(filename, "zero-co://")) {
> > +error_setg(errp, "The only allowed filename for this driver is "
> > + "'zero-co://'");
> > +return;
> > +}
> > +}
> > +
> > +static void zero_aio_parse_filename(const char *filename, QDict *options,
> > +Error **errp)
> > +{
> > +/* This function only exists so that a zero-aio:// filename is accepted
> > + * with the zero-aio driver. */
> > +if (strcmp(filename, "zero-aio://")) {
> > +error_setg(errp, "The only allowed filename for this driver is "
> > + "'zero-aio://'");
> > +return;
> > +}
> > +}
> > +
> >   static int null_file_open(BlockDriverState *bs, QDict *options, int flags,
> > Error **errp)
> >   {
> > @@ -99,6 +123,29 @@ static int null_file_open(BlockDriverState *bs, QDict *options, int flags,
> >   return ret;
> >   }
> >
> > +static int zero_file_open(BlockDriverState *bs, QDict *options, int flags,
> > +  Error **errp)
> > +{
> > +QemuOpts *opts;
> > +BDRVNullState *s = bs->opaque;
> > +int ret = 0;
> > +
> > +opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
> > +qemu_opts_absorb_qdict(opts, options, &error_abort);
> > +s->length =
> > +qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 1 << 30);
> > +s->latency_ns =
> > +qemu_opt_get_number(opts, NULL_OPT_LATENCY, 0);
> > +if (s->latency_ns < 0) {
> > +error_setg(errp, "latency-ns is invalid");
> > +ret = -EINVAL;
> > +}
> > +s->read_zeroes = true;
> > +qemu_opts_del(opts);
> > +bs->supported_write_flags = BDRV_REQ_FUA;
> > +return ret;
> > +}
> > +
> >   static int64_t null_getlength(BlockDriverState *bs)
> >   {
> >   BDRVNullState *s = bs->opaque;
> > @@ -316,10 +363,54 @@ static BlockDriver bdrv_null_aio = {
> >   .strong_runtime_opts= null_strong_runtime_opts,
> >   };
> >
> > +static BlockDriver bdrv_zero_co = {
> > +.format_name= "zero-co",
> > +.protocol_name  = "zero-co",
> > +.instance_size  = sizeof(BDRVNullState),
> > +
> > +.bdrv_file_open = zero_file_open,
> > +.bdrv_parse_filename= zero_co_parse_filename,
> > +.bdrv_getlength = null_getlength,
> > +.bdrv_get_allocated_file_size = null_allocated_file_size,
> > +
> > +.bdrv_co_preadv = null_co_preadv,
> > +.bdrv_co_pwritev= null_co_pwritev,
> > +.bdrv_co_flush_to_disk  = null_co_flush,
> > +.bdrv_reopen_prepare= null_reopen_prepare,
> > +
> > +.bdrv_co_block_status   = null_co_block_status,
> > +
> > +.bdrv_refresh_filename  = null_refresh_filename,
> > +.strong_runtime_opts= null_strong_runtime_opts,
> > +};
> > +
> > +static BlockDriver bdrv_zero_aio = {
> > +.form

Re: [PATCH] block: Introduce zero-co:// and zero-aio://

2021-03-10 Thread Fam Zheng
On Wed, 10 Mar 2021 at 14:51, Philippe Mathieu-Daudé 
wrote:

> On 3/10/21 3:28 PM, Fam Zheng wrote:
> > On Wed, 10 Mar 2021 at 14:24, Philippe Mathieu-Daudé <phi...@redhat.com> wrote:
> >
> > On 3/10/21 3:17 PM, f...@euphon.net wrote:
> > > From: Fam Zheng <famzh...@amazon.com>
> > >
> > > null-co:// has a read-zeroes=off default; when used in security
> > > analysis, this can cause false positives because the driver doesn't
> > > write to the read buffer.
> > >
> > > null-co:// has the highest possible performance as a block driver, so
> > > let's keep it that way. This patch introduces zero-co:// and
> > > zero-aio://, largely similar to null-*://, but with read-zeroes=on by
> > > default, so it's more suitable in such cases than null-co://.
> >
> > Thanks!
> >
> > >
> > > Signed-off-by: Fam Zheng <f...@euphon.net>
> > > ---
> > >  block/null.c | 91
> > 
> > >  1 file changed, 91 insertions(+)
> >
> > > +static int zero_file_open(BlockDriverState *bs, QDict *options, int flags,
> > > +  Error **errp)
> > > +{
> > > +QemuOpts *opts;
> > > +BDRVNullState *s = bs->opaque;
> > > +int ret = 0;
> > > +
> > > +opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
> > > +qemu_opts_absorb_qdict(opts, options, &error_abort);
> > > +s->length =
> > > +qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 1 << 30);
> >
> > Please do not use this magic default value, let's always make the
> > size explicit.
> >
> > s->length = qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 0);
> > if (s->length < 0) {
> > error_setg(errp, "%s is invalid", BLOCK_OPT_SIZE);
> > ret = -EINVAL;
> > }
> >
> >
> > Doesn't that mean a bare
> >
> >   -blockdev zero-co://
> >
> > would have 0 bytes in size?
>
> Yes, this will display an error.
>
> Maybe better something like:
>
> if (s->length == 0) {
> error_append_hint(errp, "Usage: zero-co://,size=NUM");
> error_setg(errp, "size must be specified");
> ret = -EINVAL;
> } else if (s->length < 0) {
> error_setg(errp, "%s is invalid", BLOCK_OPT_SIZE);
> ret = -EINVAL;
> }
>
> >
> > I'm copying what null-co:// does today, and it's convenient for tests.
> > Why is a default size bad?
>
> We learned defaults are almost always bad because you cannot get
> rid of them. Also because we have reports of errors when using
> floppy and another one (can't remember) with null-co because of
> invalid size, and we have to explain "the default is 1GB, you have
> to make your size explicit".
>

Yeah I think that makes sense. I'll revise.

Thanks,
Fam


Re: [PATCH] block: Introduce zero-co:// and zero-aio://

2021-03-10 Thread Fam Zheng
On Wed, 10 Mar 2021 at 14:24, Philippe Mathieu-Daudé 
wrote:

> On 3/10/21 3:17 PM, f...@euphon.net wrote:
> > From: Fam Zheng 
> >
> > null-co:// has a read-zeroes=off default; when used in security
> > analysis, this can cause false positives because the driver doesn't
> > write to the read buffer.
> >
> > null-co:// has the highest possible performance as a block driver, so
> > let's keep it that way. This patch introduces zero-co:// and
> > zero-aio://, largely similar to null-*://, but with read-zeroes=on by
> > default, so it's more suitable in such cases than null-co://.
>
> Thanks!
>
> >
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/null.c | 91 
> >  1 file changed, 91 insertions(+)
>
> > +static int zero_file_open(BlockDriverState *bs, QDict *options, int flags,
> > +  Error **errp)
> > +{
> > +QemuOpts *opts;
> > +BDRVNullState *s = bs->opaque;
> > +int ret = 0;
> > +
> > +opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
> > +qemu_opts_absorb_qdict(opts, options, &error_abort);
> > +s->length =
> > +qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 1 << 30);
>
> Please do not use this magic default value, let's always make the
> size explicit.
>
> s->length = qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 0);
> if (s->length < 0) {
> error_setg(errp, "%s is invalid", BLOCK_OPT_SIZE);
> ret = -EINVAL;
> }


Doesn't that mean a bare

  -blockdev zero-co://

would have 0 bytes in size?

I'm copying what null-co:// does today, and it's convenient for tests. Why
is a default size bad?

Fam



>
>
> +s->latency_ns =
> > +qemu_opt_get_number(opts, NULL_OPT_LATENCY, 0);
> > +if (s->latency_ns < 0) {
> > +error_setg(errp, "latency-ns is invalid");
> > +ret = -EINVAL;
> > +}
> > +s->read_zeroes = true;
> > +qemu_opts_del(opts);
> > +bs->supported_write_flags = BDRV_REQ_FUA;
> > +return ret;
> > +}
>
> Otherwise LGTM :)
>
>
>


Re: [PATCH 0/3] block: Introduce the 'zeroes-co' driver to help security reports

2021-03-10 Thread Fam Zheng
On Wed, 10 Mar 2021 at 12:38, Philippe Mathieu-Daudé 
wrote:

> On 3/10/21 1:32 PM, Fam Zheng wrote:
> > On Wed, 10 Mar 2021 at 11:44, Philippe Mathieu-Daudé <phi...@redhat.com> wrote:
> >
> > Hi,
> >
> > This is an alternative approach to changing null-co driver
> > default 'read-zeroes' option to true:
> > https://www.mail-archive.com/qemu-block@nongnu.org/msg80873.html
> >
> > Instead we introduce yet another block driver with an explicit
> > name: 'zeroes-co'. We then clarify in secure-coding-practices.rst
> > that security reports have to be sent using this new driver.
> >
> > The 2nd patch is RFC because I won't spend time converting the
> > tests until the first patch is discussed, as I already spent enough
> > time doing that in the previous mentioned series.
> >
> > Regards,
> >
> > Phil.
> >
> > Philippe Mathieu-Daudé (3):
> >   block: Introduce the 'zeroes-co' driver
> >   tests/test-blockjob: Use zeroes-co instead of
> null-co,read-zeroes=on
> >   docs/secure-coding-practices: Describe null-co/zeroes-co block
> drivers
> >
> >  docs/devel/secure-coding-practices.rst |   7 +
> >  block/zeroes.c | 306
> +
> >
> > Why not add another BlockDriver struct to block/null.c and set the
> > read_zeroes field in the .bdrv_file_open callback? It would make the
> > patch much simpler.
>
> This is the first patch I wrote but then realized you are listed as
> null-co maintainer, so you might be uninterested in having this new
> driver in the same file. And declaring the prototypes public to
> reuse them seemed overkill.
>
>
I posted a patch for block/null.c to add the same interface, just as an
alternative to the first patch.

I think we all agree that both zeroed and non-zeroed read modes will be
supported going forward; the divergence is on the approach:

a) -blockdev null-co:// | -blockdev null-co://,read-zeroes=off,

if we make read-zeroes=on the default.

b) -blockdev null-co:// | -blockdev zero-co://,

if we keep the current default of null-co://, but introduce zero-co://
which has a clearer interface.

Fam


Re: [PATCH 0/3] block: Introduce the 'zeroes-co' driver to help security reports

2021-03-10 Thread Fam Zheng
On Wed, 10 Mar 2021 at 11:44, Philippe Mathieu-Daudé 
wrote:

> Hi,
>
> This is an alternative approach to changing null-co driver
> default 'read-zeroes' option to true:
> https://www.mail-archive.com/qemu-block@nongnu.org/msg80873.html
>
> Instead we introduce yet another block driver with an explicit
> name: 'zeroes-co'. We then clarify in secure-coding-practices.rst
> that security reports have to be sent using this new driver.
>
> The 2nd patch is RFC because I won't spend time converting the
> tests until the first patch is discussed, as I already spent enough
> time doing that in the previous mentioned series.
>
> Regards,
>
> Phil.
>
> Philippe Mathieu-Daudé (3):
>   block: Introduce the 'zeroes-co' driver
>   tests/test-blockjob: Use zeroes-co instead of null-co,read-zeroes=on
>   docs/secure-coding-practices: Describe null-co/zeroes-co block drivers
>
>  docs/devel/secure-coding-practices.rst |   7 +
>  block/zeroes.c | 306 +
>


Why not add another BlockDriver struct to block/null.c and set the
read_zeroes field in the .bdrv_file_open callback? It would make the patch
much simpler.

Fam


>  tests/test-blockjob.c  |   4 +-
>  block/meson.build  |   1 +
>  4 files changed, 315 insertions(+), 3 deletions(-)
>  create mode 100644 block/zeroes.c
>
> --
> 2.26.2
>
>
>
>
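
A minimal sketch of that suggestion: reuse block/null.c's state and
callbacks, and only force read_zeroes on in a separate open function (names
here are illustrative, not from an actual patch):

static int zero_co_open(BlockDriverState *bs, QDict *options, int flags,
                        Error **errp)
{
    BDRVNullState *s = bs->opaque;
    int ret = null_file_open(bs, options, flags, errp);

    s->read_zeroes = true;  /* the only difference from null-co */
    return ret;
}

static BlockDriver bdrv_zero_co = {
    .format_name    = "zero-co",
    .protocol_name  = "zero-co",
    .instance_size  = sizeof(BDRVNullState),
    .bdrv_file_open = zero_co_open,
    /* remaining callbacks shared with bdrv_null_co */
};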


Re: [PATCH v2 0/2] block: Use 'read-zeroes=true' mode by default with 'null-co' driver

2021-02-23 Thread Fam Zheng
On 2021-02-23 17:01, Max Reitz wrote:
> On 23.02.21 10:21, Fam Zheng wrote:
> > On 2021-02-22 18:55, Philippe Mathieu-Daudé wrote:
> > > On 2/22/21 6:35 PM, Fam Zheng wrote:
> > > > On 2021-02-19 15:09, Philippe Mathieu-Daudé wrote:
> > > > > On 2/19/21 12:07 PM, Max Reitz wrote:
> > > > > > On 13.02.21 22:54, Fam Zheng wrote:
> > > > > > > On 2021-02-11 15:26, Philippe Mathieu-Daudé wrote:
> > > > > > > > The null-co driver doesn't zeroize buffer in its default config,
> > > > > > > > because it is designed for testing and tests want to run fast.
> > > > > > > > However this confuses security researchers (access to uninit
> > > > > > > > buffers).
> > > > > > > 
> > > > > > > I'm a little surprised.
> > > > > > > 
> > > > > > > Is changing default the only way to fix this? I'm not opposed to
> > > > > > > changing the default but I'm not convinced this is the easiest 
> > > > > > > way.
> > > > > > > block/nvme.c also doesn't touch the memory, but defers to the 
> > > > > > > device
> > > > > > > DMA, why doesn't that confuse the security checker?
> > > > > 
> > > > > Generally speaking, there is a balance between security and 
> > > > > performance.
> > > > > We try to provide both, but when we can't, my understanding is 
> > > > > security
> > > > > is more important.
> > > > 
> > > > Why is hiding the code path behind a non-default more secure? What is
> > > > not secure now?
> > > 
> > > Se we are back to the problem of having default values.
> > > 
> > > I'd like to remove the default and have the option explicit,
> > > but qemu_opt_get_bool() expects a 'default' value.
> > > 
> > > Should we rename qemu_opt_get_bool() -> qemu_opt_get_bool_with_default()
> > > and add a simpler qemu_opt_get_bool()?
> > 
> > My point is I still don't get the full context of the problem this
> > series is trying to solve. If the problem is tools are confused, I would
> > like to understand why. What is the thing that matters here, exactly?
> 
> My personal opinion is that it isn’t even about tools, it’s just about the
> fact that operating on uninitialized data is something that should generally
> be avoided.  For the null drivers, there is a reason to allow it
> (performance testing), but that should be a special case, not the default.

How do we define uninitialized data? Are buffers passed to a successful
read(2) syscall initialized? We cannot know, because that's up to the
fs/driver/storage. The same applies to bdrv_pread().

In fact block/null.c doesn't operate on uninitialized data, we should
really fix guess_disk_lchs() and similar.

> 
> > But there has always been nullblk.ko in the kernel, which doesn't lie in
> > its name. If we change the default, we are no longer honest about what
> > the driver actually does.
> 
> Then we’re already lying, because if we model it after /dev/null, we should
> probably return -EIO on every read.

nullblk.ko is not /dev/null, it's /dev/nullb*:

https://www.kernel.org/doc/Documentation/block/null_blk.txt

Fam




Re: [PATCH v2 0/2] block: Use 'read-zeroes=true' mode by default with 'null-co' driver

2021-02-23 Thread Fam Zheng
On 2021-02-22 18:55, Philippe Mathieu-Daudé wrote:
> On 2/22/21 6:35 PM, Fam Zheng wrote:
> > On 2021-02-19 15:09, Philippe Mathieu-Daudé wrote:
> >> On 2/19/21 12:07 PM, Max Reitz wrote:
> >>> On 13.02.21 22:54, Fam Zheng wrote:
> >>>> On 2021-02-11 15:26, Philippe Mathieu-Daudé wrote:
> >>>>> The null-co driver doesn't zeroize buffer in its default config,
> >>>>> because it is designed for testing and tests want to run fast.
> >>>>> However this confuses security researchers (access to uninit
> >>>>> buffers).
> >>>>
> >>>> I'm a little surprised.
> >>>>
> >>>> Is changing default the only way to fix this? I'm not opposed to
> >>>> changing the default but I'm not convinced this is the easiest way.
> >>>> block/nvme.c also doesn't touch the memory, but defers to the device
> >>>> DMA, why doesn't that confuse the security checker?
> >>
> >> Generally speaking, there is a balance between security and performance.
> >> We try to provide both, but when we can't, my understanding is security
> >> is more important.
> > 
> > Why is hiding the code path behind a non-default more secure? What is
> > not secure now?
> 
> So we are back to the problem of having default values.
> 
> I'd like to remove the default and have the option explicit,
> but qemu_opt_get_bool() expects a 'default' value.
> 
> Should we rename qemu_opt_get_bool() -> qemu_opt_get_bool_with_default()
> and add a simpler qemu_opt_get_bool()?

My point is I still don't get the full context of the problem this
series is trying to solve. If the problem is tools are confused, I would
like to understand why. What is the thing that matters here, exactly?

But there's always been nullblk.ko in the kernel, which doesn't lie in its
name. If we change the default, we are not very honest any more about
what the driver actually does.

Even if null-co:// and null-aio:// are a bad idea, let's add
zero-co:// and zero-aio://, and deprecate null-*://.

Fam




Re: [PATCH v2 0/2] block: Use 'read-zeroes=true' mode by default with 'null-co' driver

2021-02-22 Thread Fam Zheng
On 2021-02-19 15:09, Philippe Mathieu-Daudé wrote:
> On 2/19/21 12:07 PM, Max Reitz wrote:
> > On 13.02.21 22:54, Fam Zheng wrote:
> >> On 2021-02-11 15:26, Philippe Mathieu-Daudé wrote:
> >>> The null-co driver doesn't zeroize buffer in its default config,
> >>> because it is designed for testing and tests want to run fast.
> >>> However this confuses security researchers (access to uninit
> >>> buffers).
> >>
> >> I'm a little surprised.
> >>
> >> Is changing default the only way to fix this? I'm not opposed to
> >> changing the default but I'm not convinced this is the easiest way.
> >> block/nvme.c also doesn't touch the memory, but defers to the device
> >> DMA, why doesn't that confuse the security checker?
> 
> Generally speaking, there is a balance between security and performance.
> We try to provide both, but when we can't, my understanding is security
> is more important.

Why is hiding the code path behind a non-default more secure? What is
not secure now?

Fam




Re: [PATCH v2 0/2] block: Use 'read-zeroes=true' mode by default with 'null-co' driver

2021-02-13 Thread Fam Zheng
On 2021-02-11 15:26, Philippe Mathieu-Daudé wrote:
> The null-co driver doesn't zeroize buffer in its default config,
> because it is designed for testing and tests want to run fast.
> However this confuses security researchers (access to uninit
> buffers).

I'm a little surprised.

Is changing default the only way to fix this? I'm not opposed to
changing the default but I'm not convinced this is the easiest way.
block/nvme.c also doesn't touch the memory, but defers to the device
DMA, why doesn't that confuse the security checker?

Can't we just somehow annotate it in a way that the checker can
understand (akin to how we provide Coverity models) and be done?
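
For Coverity that could be a model entry along these lines (sketch only;
the function name is illustrative, and __coverity_writeall__ is a
standard Coverity modeling primitive that marks a buffer as fully
written):

void null_co_read_model(void *buf, size_t bytes)
{
    /* Tell the checker a null-co read defines the whole buffer. */
    __coverity_writeall__(buf);
}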

Thanks,
Fam

> 
> A supposedly one-line patch which became a painful one, because
> there are so many different syntaxes to express the same usage:
> 
>  opt = qdict_new();
>  qdict_put_str(opt, "read-zeroes", "off");
>  null_bs = bdrv_open("null-co://", NULL, opt, BDRV_O_RDWR | BDRV_O_PROTOCOL,
>  _abort);
> 
> vm.qmp('blockdev-add', driver='null-co', read_zeroes=False, ...)
> 
> vm.add_drive_raw("id=drive0,driver=null-co,read-zeroes=off,if=none")
> 
> blk0 = { 'node-name': 'src',
> 'driver': 'null-co',
> 'read-zeroes': 'off' }
> 
> 'file': {
> 'driver': 'null-co',
> 'read-zeroes': False,
> }
> 
> "file": {
> "driver": "null-co",
> "read-zeroes": "off"
> }
> 
> { "execute": "blockdev-add",
>   "arguments": {
>   "driver": "null-co",
>   "read-zeroes": false,
>   "node-name": "disk0"
> }
> }
> 
> opts = {'driver': 'null-co,read-zeroes=off', 'node-name': 'root', 'size': 
> 1024}
> 
> qemu -drive driver=null-co,read-zeroes=off
> 
> qemu-io ... "json:{'driver': 'null-co', 'read-zeroes': false, 'size': 65536}"
> 
> qemu-img null-co://,read-zeroes=off
> 
> qemu-img ... -o 
> data_file="json:{'driver':'null-co',,'read-zeroes':false,,'size':'4294967296'}"
> 
> There are probably more.
> 
> Anyhow, the iotests I am not sure about, and which should be audited, are
> 056, 155 (I don't understand the syntax) and 162.
> 
> Please review,
> 
> Phil.
> 
> Philippe Mathieu-Daudé (2):
>   block: Explicit null-co uses 'read-zeroes=false'
>   block/null: Enable 'read-zeroes' mode by default
> 
>  docs/devel/testing.rst | 14 +++---
>  tests/qtest/fuzz/generic_fuzz_configs.h| 11 ++-
>  block/null.c   |  2 +-
>  tests/test-bdrv-drain.c| 10 --
>  tests/acceptance/virtio_check_params.py|  2 +-
>  tests/perf/block/qcow2/convert-blockstatus |  6 +++---
>  tests/qemu-iotests/040 |  2 +-
>  tests/qemu-iotests/041 | 12 
>  tests/qemu-iotests/051 |  2 +-
>  tests/qemu-iotests/051.out |  2 +-
>  tests/qemu-iotests/051.pc.out  |  4 ++--
>  tests/qemu-iotests/087 |  6 --
>  tests/qemu-iotests/118 |  2 +-
>  tests/qemu-iotests/133 |  2 +-
>  tests/qemu-iotests/153 |  8 
>  tests/qemu-iotests/184 |  2 ++
>  tests/qemu-iotests/184.out | 10 +-
>  tests/qemu-iotests/218 |  3 +++
>  tests/qemu-iotests/224 |  3 ++-
>  tests/qemu-iotests/224.out |  8 
>  tests/qemu-iotests/225 |  2 +-
>  tests/qemu-iotests/227 |  4 ++--
>  tests/qemu-iotests/227.out |  4 ++--
>  tests/qemu-iotests/228 |  2 +-
>  tests/qemu-iotests/235 |  1 +
>  tests/qemu-iotests/245 |  2 +-
>  tests/qemu-iotests/270 |  2 +-
>  tests/qemu-iotests/283 |  3 ++-
>  tests/qemu-iotests/283.out |  4 ++--
>  tests/qemu-iotests/299 |  1 +
>  tests/qemu-iotests/299.out |  2 +-
>  tests/qemu-iotests/300 |  4 ++--
>  32 files changed, 82 insertions(+), 60 deletions(-)
> 
> --
> 2.26.2
> 
> 
> 




Re: [PATCH] scsi: allow user to set werror as report

2020-11-03 Thread Fam Zheng
On Tue, 2020-11-03 at 14:12 +0800, Zihao Chang wrote:
> 'enospc' is the default for -drive, but QEMU allows the user to set the
> drive option werror. If werror of scsi-generic is set to 'report'
> by the user, QEMU will not allow the VM to start.
> 
> This patch allows the user to set werror to 'report' for scsi-generic.
> 
> Signed-off-by: Zihao Chang 
> ---
>  hw/scsi/scsi-generic.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> index 2cb23ca891..2730e37d63 100644
> --- a/hw/scsi/scsi-generic.c
> +++ b/hw/scsi/scsi-generic.c
> @@ -664,7 +664,8 @@ static void scsi_generic_realize(SCSIDevice *s,
> Error **errp)
>  return;
>  }
>  
> -if (blk_get_on_error(s->conf.blk, 0) != BLOCKDEV_ON_ERROR_ENOSPC) {
> +if (blk_get_on_error(s->conf.blk, 0) != BLOCKDEV_ON_ERROR_ENOSPC &&
> +blk_get_on_error(s->conf.blk, 0) != BLOCKDEV_ON_ERROR_REPORT) {
>  error_setg(errp, "Device doesn't support drive option werror");
>  return;
>  }

Accepting 'report' sounds sane to me; it matches what we actually
(always) do. Is the idea to allow users to spell it out explicitly on
the command line?

Reviewed-by: Fam Zheng 




Re: [PATCH] pci: check bus pointer before dereference

2020-10-30 Thread Fam Zheng
On Fri, 2020-10-30 at 05:01 -0400, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2020 at 10:32:42AM +0530, P J P wrote:
> > 
> > [+Paolo, +Fam Zheng - for scsi]
> > 
> > +-- On Mon, 28 Sep 2020, P J P wrote --+
> > > +-- On Wed, 16 Sep 2020, Peter Maydell wrote --+
> > > > On Wed, 16 Sep 2020 at 07:28, P J P  wrote:
> > > > > -> 
> > > > > https://ruhr-uni-bochum.sciebo.de/s/NNWP2GfwzYKeKwE?path=%2Flsi_nullptr1
> > > > > ==1183858==Hint: address points to the zero page.
> > > > > #0 pci_change_irq_level hw/pci/pci.c:259
> > > > > #1 pci_irq_handler hw/pci/pci.c:1445
> > > > > #2 pci_set_irq hw/pci/pci.c:1463
> > > > > #3 lsi_set_irq hw/scsi/lsi53c895a.c:488
> > > > > #4 lsi_update_irq hw/scsi/lsi53c895a.c:523
> > > > > #5 lsi_script_scsi_interrupt hw/scsi/lsi53c895a.c:554
> > > > > #6 lsi_execute_script hw/scsi/lsi53c895a.c:1149
> > > > > #7 lsi_reg_writeb hw/scsi/lsi53c895a.c:1984
> > > > > #8 lsi_io_write hw/scsi/lsi53c895a.c:2146
> > > 
> > > ...
> > > > Generally we don't bother to assert() that pointers that
> > > > shouldn't be NULL 
> > > > really are NULL immediately before dereferencing them, because
> > > > the 
> > > > dereference provides an equally easy-to-debug crash to the
> > > > assert, and so 
> > > > the assert doesn't provide anything extra. assert()ing that a
> > > > pointer is 
> > > > non-NULL is more useful if it is done in a place that
> > > > identifies the problem 
> > > > at an earlier and easier-to-debug point in execution rather
> > > > than at a later 
> > > > point which is distantly removed from the place where the bogus
> > > > pointer was 
> > > > introduced.
> > > 
> > > * The NULL dereference above occurs because the 'pci_dev->qdev->parent_bus'
> > >   address gets overwritten (with 0x0) during the scsi 'Memory Move' operation in
> > > 
> > >  ../hw/scsi/lsi53c895a.c
> > >   #define LSI_BUF_SIZE 4096
> > > 
> > > lsi_mmio_write
> > >  lsi_reg_writeb
> > >   lsi_execute_script
> > >static void lsi_memcpy(LSIState *s, ... int count=12MB)
> > >{
> > >   int n;
> > >   uint8_t buf[LSI_BUF_SIZE];
> > > 
> > >   while (count) {
> > > n = (count > LSI_BUF_SIZE) ? LSI_BUF_SIZE : count;
> > > lsi_mem_read(s, src, buf, n);   <== read from DMA memory
> > > lsi_mem_write(s, dest, buf, n); <== write to I/O memory
> > > src += n;
> > > dest += n;
> > > count -= n;
> > >  }
> > >}
> > > -> 
> > > https://www.manualslib.com/manual/1407578/Lsi-Lsi53c895a.html?page=254#manual
> > > 
> > > * Above loop moves data between DMA memory to i/o address space.
> > > 
> > > * Going through the manual above, it seems 'Memory Move' can move up to
> > >   16MB of data between memory spaces.
> > > 
> > > * I tried to see a suitable fix, but couldn't get one.
> > > 
> > >   - Limiting the 'count' value does not seem right, as values up to
> > >     16MB are allowed.
> > > 
> > >   - The manual above talks about moving data via 'dma_buf', but it
> > >     doesn't seem to be used here.
> > > 
> > > * During the above loop, the 'dest' address moves past its 'MemoryRegion
> > >   mr' and overwrites the adjacent 'mr' memory area, overwriting the
> > >   'parent_bus' value.

I agree with Igor: I don't understand how pci_dma_rw() writing into the
next mr can cause the NULL pointer dereference. parent_bus is in the QEMU
heap, but mr is backed by separate RAM areas, protected by boundary checks.

Is there a backtrace at the corruption time?

Fam





Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-10-22 Thread Fam Zheng
On Tue, 2020-10-20 at 09:34 +0800, Zhenyu Ye wrote:
> On 2020/10/19 21:25, Paolo Bonzini wrote:
> > On 19/10/20 14:40, Zhenyu Ye wrote:
> > > The kernel backtrace for io_submit in GUEST is:
> > > 
> > >   guest# ./offcputime -K -p `pgrep -nx fio`
> > >   b'finish_task_switch'
> > >   b'__schedule'
> > >   b'schedule'
> > >   b'io_schedule'
> > >   b'blk_mq_get_tag'
> > >   b'blk_mq_get_request'
> > >   b'blk_mq_make_request'
> > >   b'generic_make_request'
> > >   b'submit_bio'
> > >   b'blkdev_direct_IO'
> > >   b'generic_file_read_iter'
> > >   b'aio_read'
> > >   b'io_submit_one'
> > >   b'__x64_sys_io_submit'
> > >   b'do_syscall_64'
> > >   b'entry_SYSCALL_64_after_hwframe'
> > >   -fio (1464)
> > >   40031912
> > > 
> > > And Linux io_uring can avoid the latency problem.

Thanks for the info. What this tells us is basically that the number of
in-flight requests is high. It's sad that linux-aio is in practice
implemented as a blocking API.

A host-side backtrace will be of more help. Can you get that too?

Fam

> > 
> > What filesystem are you using?
> > 
> 
> On host, the VM image and disk images are based on ext4 filesystem.
> In guest, the '/' uses xfs filesystem, and the disks are raw devices.
> 
> guest# df -hT
> Filesystem  Type  Size  Used Avail Use% Mounted on
> devtmpfsdevtmpfs   16G 0   16G   0% /dev
> tmpfs   tmpfs  16G 0   16G   0% /dev/shm
> tmpfs   tmpfs  16G  976K   16G   1% /run
> /dev/mapper/fedora-root xfs   8.0G  3.2G  4.9G  40% /
> tmpfs   tmpfs  16G 0   16G   0% /tmp
> /dev/sda1   xfs  1014M  181M  834M  18% /boot
> tmpfs   tmpfs 3.2G 0  3.2G   0% /run/user/0
> 
> guest# lsblk
> NAMEMAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> sda   8:00  10G  0 disk
> ├─sda18:10   1G  0 part /boot
> └─sda28:20   9G  0 part
>   ├─fedora-root 253:00   8G  0 lvm  /
>   └─fedora-swap 253:10   1G  0 lvm  [SWAP]
> vda 252:00  10G  0 disk
> vdb 252:16   0  10G  0 disk
> vdc 252:32   0  10G  0 disk
> vdd 252:48   0  10G  0 disk
> 
> Thanks,
> Zhenyu
> 




Re: [PATCH 0/9] util/vfio-helpers: Improve debugging experience

2020-10-14 Thread Fam Zheng
On Wed, 2020-10-14 at 14:42 +0200, Philippe Mathieu-Daudé wrote:
> On 10/14/20 2:34 PM, Fam Zheng wrote:
> > On Wed, 2020-10-14 at 13:52 +0200, Philippe Mathieu-Daudé wrote:
> > > A bunch of boring patches that have been proven helpful
> > > while debugging.
> > > 
> > > Philippe Mathieu-Daudé (9):
> > >util/vfio-helpers: Improve reporting unsupported IOMMU type
> > >util/vfio-helpers: Trace PCI I/O config accesses
> > >util/vfio-helpers: Trace PCI BAR region info
> > >util/vfio-helpers: Trace where BARs are mapped
> > >util/vfio-helpers: Improve DMA trace events
> > >util/vfio-helpers: Convert vfio_dump_mapping to trace events
> > >util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error
> > >util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error
> > >util/vfio-helpers: Let qemu_vfio_verify_mappings() use
> > > error_report()
> > > 
> > >   include/qemu/vfio-helpers.h |  2 +-
> > >   block/nvme.c| 14 
> > >   util/vfio-helpers.c | 66 +-
> > > ---
> > > 
> > >   util/trace-events   | 10 ++++--
> > >   4 files changed, 54 insertions(+), 38 deletions(-)
> > > 
> > > -- 
> > > 2.26.2
> > > 
> > > 
> > > 
> > 
> > Modulo the g_strdup_printf() memleak I pointed out:
> > 
> > Reviewed-by: Fam Zheng 

I overlooked the auto-free qualifier, so that one is okay too!
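
For reference, g_autofree attaches GLib's cleanup attribute, so the
string is freed automatically when the variable goes out of scope and no
explicit g_free() is needed:

    g_autofree char *barname = g_strdup_printf("bar[%d]", index);
    /* ... use barname ...; freed automatically at end of scope */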

Fam

> 
> Thanks!
> 
> 




Re: [PATCH 0/9] util/vfio-helpers: Improve debugging experience

2020-10-14 Thread Fam Zheng
On Wed, 2020-10-14 at 13:52 +0200, Philippe Mathieu-Daudé wrote:
> A bunch of boring patches that have been proven helpful
> while debugging.
> 
> Philippe Mathieu-Daudé (9):
>   util/vfio-helpers: Improve reporting unsupported IOMMU type
>   util/vfio-helpers: Trace PCI I/O config accesses
>   util/vfio-helpers: Trace PCI BAR region info
>   util/vfio-helpers: Trace where BARs are mapped
>   util/vfio-helpers: Improve DMA trace events
>   util/vfio-helpers: Convert vfio_dump_mapping to trace events
>   util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error
>   util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error
>   util/vfio-helpers: Let qemu_vfio_verify_mappings() use
> error_report()
> 
>  include/qemu/vfio-helpers.h |  2 +-
>  block/nvme.c| 14 
>  util/vfio-helpers.c | 66 +
> 
>  util/trace-events   | 10 --
>  4 files changed, 54 insertions(+), 38 deletions(-)
> 
> -- 
> 2.26.2
> 
> 
> 

Modulo the g_strdup_printf() memleak I pointed out:

Reviewed-by: Fam Zheng 




Re: [PATCH 3/9] util/vfio-helpers: Trace PCI BAR region info

2020-10-14 Thread Fam Zheng
On Wed, 2020-10-14 at 13:52 +0200, Philippe Mathieu-Daudé wrote:
> For debug purpose, trace BAR regions info.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  util/vfio-helpers.c | 8 
>  util/trace-events   | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 1d4efafcaa4..cd6287c3a98 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -136,6 +136,7 @@ static inline void
> assert_bar_index_valid(QEMUVFIOState *s, int index)
>  
>  static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int index, Error
> **errp)
>  {
> +g_autofree char *barname = NULL;
>  assert_bar_index_valid(s, index);
>  s->bar_region_info[index] = (struct vfio_region_info) {
>  .index = VFIO_PCI_BAR0_REGION_INDEX + index,
> @@ -145,6 +146,10 @@ static int qemu_vfio_pci_init_bar(QEMUVFIOState
> *s, int index, Error **errp)
>  error_setg_errno(errp, errno, "Failed to get BAR region
> info");
>  return -errno;
>  }
> +barname = g_strdup_printf("bar[%d]", index);

Where is barname freed?

Fam

> +trace_qemu_vfio_region_info(barname, s-
> >bar_region_info[index].offset,
> +s->bar_region_info[index].size,
> +s-
> >bar_region_info[index].cap_offset);
>  
>  return 0;
>  }
> @@ -416,6 +421,9 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s,
> const char *device,
>  ret = -errno;
>  goto fail;
>  }
> +trace_qemu_vfio_region_info("config", s-
> >config_region_info.offset,
> +s->config_region_info.size,
> +s->config_region_info.cap_offset);
>  
>  for (i = 0; i < ARRAY_SIZE(s->bar_region_info); i++) {
>  ret = qemu_vfio_pci_init_bar(s, i, errp);
> diff --git a/util/trace-events b/util/trace-events
> index c048f85f828..4d40c74a21f 100644
> --- a/util/trace-events
> +++ b/util/trace-events
> @@ -87,3 +87,4 @@ qemu_vfio_dma_map(void *s, void *host, size_t size,
> bool temporary, uint64_t *io
>  qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
>  qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t
> region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d
> (region ofs 0x%"PRIx64" size %"PRId64")"
>  qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t
> region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d
> (region ofs 0x%"PRIx64" size %"PRId64")"
> +qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t
> size, uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size
> %"PRId64" cap_ofs %"PRId32




Re: [PATCH v2] gitlab: move linux-user plugins test across to gitlab

2020-10-02 Thread Fam Zheng
On Fri, 2020-10-02 at 11:32 +0100, Alex Bennée wrote:
> Even with the recent split moving beefier plugins into contrib and
> dropping them from the check-tcg tests we are still hitting time
> limits. This possibly points to a slowdown of --debug-tcg, but seeing
> as we are migrating stuff to gitlab we might as well move there and
> bump the timeout.
> 
> Signed-off-by: Alex Bennée 

Hi Alex,

Unrelated to the patch: do we have custom runners on GitLab? I'm
exploring ideas to run VM-based tests there.

Fam




Re: [PATCH 1/4] vmdk: fix maybe uninitialized warnings

2020-09-30 Thread Fam Zheng
On Wed, 2020-09-30 at 17:58 +0200, Christian Borntraeger wrote:
> Fedora 32 gcc 10 seems to give false positives:
> 
> Compiling C object libblock.fa.p/block_vmdk.c.o
> ../block/vmdk.c: In function ‘vmdk_parse_extents’:
> ../block/vmdk.c:587:5: error: ‘extent’ may be used uninitialized in
> this function [-Werror=maybe-uninitialized]
>   587 | g_free(extent->l1_table);
>   | ^~~~
> ../block/vmdk.c:754:17: note: ‘extent’ was declared here
>   754 | VmdkExtent *extent;
>   | ^~
> ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in
> this function [-Werror=maybe-uninitialized]
>   620 | ret = vmdk_init_tables(bs, extent, errp);
>   |   ^~
> ../block/vmdk.c:598:17: note: ‘extent’ was declared here
>   598 | VmdkExtent *extent;
>   | ^~
> ../block/vmdk.c:1178:39: error: ‘extent’ may be used uninitialized in
> this function [-Werror=maybe-uninitialized]
>  1178 | extent->flat_start_offset = flat_offset << 9;
>   | ~~^~
> ../block/vmdk.c: In function ‘vmdk_open_vmdk4’:
> ../block/vmdk.c:581:22: error: ‘extent’ may be used uninitialized in
> this function [-Werror=maybe-uninitialized]
>   581 | extent->l2_cache =
>   | ~^
>   582 | g_malloc(extent->entry_size * extent->l2_size *
> L2_CACHE_SIZE);
>   | ~
> ~
> ../block/vmdk.c:872:17: note: ‘extent’ was declared here
>   872 | VmdkExtent *extent;
>   | ^~
> ../block/vmdk.c: In function ‘vmdk_open’:
> ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in
> this function [-Werror=maybe-uninitialized]
>   620 | ret = vmdk_init_tables(bs, extent, errp);
>   |   ^~
> ../block/vmdk.c:598:17: note: ‘extent’ was declared here
>   598 | VmdkExtent *extent;
>   | ^~
> cc1: all warnings being treated as errors
> make: *** [Makefile.ninja:884: libblock.fa.p/block_vmdk.c.o] Error 1
> 
> Fix them by assigning a default value.
> 
> Signed-off-by: Christian Borntraeger 
> ---
>  block/vmdk.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 8ec62c7ab798..a00dc00eb47a 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -595,7 +595,7 @@ static int vmdk_open_vmfs_sparse(BlockDriverState
> *bs,
>  int ret;
>  uint32_t magic;
>  VMDK3Header header;
> -VmdkExtent *extent;
> +VmdkExtent *extent = NULL;
>  
>  ret = bdrv_pread(file, sizeof(magic), &header, sizeof(header));
>  if (ret < 0) {
> @@ -751,7 +751,7 @@ static int vmdk_open_se_sparse(BlockDriverState
> *bs,
>  int ret;
>  VMDKSESparseConstHeader const_header;
>  VMDKSESparseVolatileHeader volatile_header;
> -VmdkExtent *extent;
> +VmdkExtent *extent = NULL;
>  
>  ret = bdrv_apply_auto_read_only(bs,
>  "No write support for seSparse images available", errp);
> @@ -869,7 +869,7 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
>  uint32_t magic;
>  uint32_t l1_size, l1_entry_sectors;
>  VMDK4Header header;
> -VmdkExtent *extent;
> +VmdkExtent *extent = NULL;
>  BDRVVmdkState *s = bs->opaque;
>  int64_t l1_backup_offset = 0;
>  bool compressed;
> @@ -1088,7 +1088,7 @@ static int vmdk_parse_extents(const char *desc,
> BlockDriverState *bs,
>  BdrvChild *extent_file;
>  BdrvChildRole extent_role;
>  BDRVVmdkState *s = bs->opaque;
> -VmdkExtent *extent;
> +VmdkExtent *extent = NULL;
>  char extent_opt_prefix[32];
>  Error *local_err = NULL;
>  

Looks trivial, and correct.

Reviewed-by: Fam Zheng 





Re: [RFC 1/3] util/vfio-helpers: Collect IOVA reserved regions

2020-09-25 Thread Fam Zheng
On Fri, 2020-09-25 at 17:23 +0200, Auger Eric wrote:
> > > @@ -365,8 +430,12 @@ static int qemu_vfio_init_pci(QEMUVFIOState
> > > *s, const char *device,
> > >  if (ret) {
> > >  goto fail;
> > >  }
> > > +g_free(iommu_info);
> > >  return 0;
> > >  fail:
> > > +g_free(s->usable_iova_ranges);
> > 
> > Set s->usable_iova_ranges to NULL to avoid double free?
> 
> I think I did at the beginning of qemu_vfio_init_pci()

Yes, but I mean clearing the pointer will make calling
qemu_vfio_close() safe; there is also a g_free() on this one.
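
I.e. something like (sketch):

 fail:
    g_free(s->usable_iova_ranges);
    s->usable_iova_ranges = NULL; /* the g_free() in qemu_vfio_close()
                                   * then becomes a safe no-op */
    s->nb_iova_ranges = 0;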

Fam

> 
> Thanks
> 
> Eric
> > 
> > > +s->nb_iova_ranges = 0;
> > > +g_free(iommu_info);
> > >  close(s->group);
> > >  fail_container:
> > >  close(s->container);
> > > @@ -716,6 +785,8 @@ void qemu_vfio_close(QEMUVFIOState *s)
> > >  qemu_vfio_undo_mapping(s, &s->mappings[i], NULL);
> > >  }
> > >  ram_block_notifier_remove(&s->ram_notifier);
> > > +g_free(s->usable_iova_ranges);
> > > +s->nb_iova_ranges = 0;
> > >  qemu_vfio_reset(s);
> > >  close(s->device);
> > >  close(s->group);
> > > -- 
> > > 2.21.3
> > > 
> > > 
> > 
> > Fam
> > 
> 
> 




Re: [RFC 1/3] util/vfio-helpers: Collect IOVA reserved regions

2020-09-25 Thread Fam Zheng
On 2020-09-25 15:48, Eric Auger wrote:
> The IOVA allocator currently ignores host reserved regions.
> As a result some chosen IOVAs may collide with some of them,
> resulting in VFIO MAP_DMA errors later on. This happens on ARM,
> where the MSI reserved window [0x800, 0x810] is quickly encountered.
> Since the 5.4 kernel, VFIO returns the usable IOVA regions, so let's
> enumerate them, with the prospect of avoiding them later on.
> 
> Signed-off-by: Eric Auger 
> ---
>  util/vfio-helpers.c | 75 +++--
>  1 file changed, 73 insertions(+), 2 deletions(-)
> 
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 583bdfb36f..8e91beba95 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -40,6 +40,11 @@ typedef struct {
>  uint64_t iova;
>  } IOVAMapping;
>  
> +struct IOVARange {
> +uint64_t start;
> +uint64_t end;
> +};
> +
>  struct QEMUVFIOState {
>  QemuMutex lock;
>  
> @@ -49,6 +54,8 @@ struct QEMUVFIOState {
>  int device;
>  RAMBlockNotifier ram_notifier;
>  struct vfio_region_info config_region_info, bar_region_info[6];
> +struct IOVARange *usable_iova_ranges;
> +uint8_t nb_iova_ranges;
>  
>  /* These fields are protected by @lock */
>  /* VFIO's IO virtual address space is managed by splitting into a few
> @@ -236,6 +243,36 @@ static int qemu_vfio_pci_write_config(QEMUVFIOState *s, 
> void *buf, int size, int
>  return ret == size ? 0 : -errno;
>  }
>  
> +static void collect_usable_iova_ranges(QEMUVFIOState *s, void *first_cap)
> +{
> +struct vfio_iommu_type1_info_cap_iova_range *cap_iova_range;
> +struct vfio_info_cap_header *cap = first_cap;
> +void *offset = first_cap;
> +int i;
> +
> +while (cap->id != VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE) {
> +if (cap->next) {

This check looks reversed.

> +return;
> +}
> +offset += cap->next;

@offset is unused.

> +cap = (struct vfio_info_cap_header *)cap;

This assignment is a nop.

> +}
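
Putting the three comments together, the intended walk is presumably
something like this (untested sketch; note that cap->next is an offset
from the start of the iommu_info buffer, so the function would want the
base pointer rather than just first_cap):

    void *base = (void *)iommu_info;
    struct vfio_info_cap_header *cap = base + iommu_info->cap_offset;

    while (cap->id != VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE) {
        if (!cap->next) {
            return;   /* end of chain, capability not present */
        }
        cap = base + cap->next;
    }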
> +
> +cap_iova_range = (struct vfio_iommu_type1_info_cap_iova_range *)cap;
> +
> +s->nb_iova_ranges = cap_iova_range->nr_iovas;
> +if (s->nb_iova_ranges > 1) {
> +s->usable_iova_ranges =
> +g_realloc(s->usable_iova_ranges,
> +  s->nb_iova_ranges * sizeof(struct IOVARange));
> +}
> +
> +for (i = 0; i <  s->nb_iova_ranges; i++) {

s/  / /

> +s->usable_iova_ranges[i].start = 
> cap_iova_range->iova_ranges[i].start;
> +s->usable_iova_ranges[i].end = cap_iova_range->iova_ranges[i].end;
> +}
> +}
> +
>  static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
>Error **errp)
>  {
> @@ -243,10 +280,13 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const 
> char *device,
>  int i;
>  uint16_t pci_cmd;
>  struct vfio_group_status group_status = { .argsz = sizeof(group_status) 
> };
> -struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) 
> };
> +struct vfio_iommu_type1_info *iommu_info = NULL;
> +size_t iommu_info_size = sizeof(*iommu_info);
>  struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
>  char *group_file = NULL;
>  
> +s->usable_iova_ranges = NULL;
> +
>  /* Create a new container */
>  s->container = open("/dev/vfio/vfio", O_RDWR);
>  
> @@ -310,13 +350,38 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const 
> char *device,
>  goto fail;
>  }
>  
> +iommu_info = g_malloc0(iommu_info_size);
> +iommu_info->argsz = iommu_info_size;
> +
>  /* Get additional IOMMU info */
> -if (ioctl(s->container, VFIO_IOMMU_GET_INFO, &iommu_info)) {
> +if (ioctl(s->container, VFIO_IOMMU_GET_INFO, iommu_info)) {
>  error_setg_errno(errp, errno, "Failed to get IOMMU info");
>  ret = -errno;
>  goto fail;
>  }
>  
> +/*
> + * if the kernel does not report usable IOVA regions, choose
> + * the legacy [QEMU_VFIO_IOVA_MIN, QEMU_VFIO_IOVA_MAX -1] region
> + */
> +s->nb_iova_ranges = 1;
> +s->usable_iova_ranges = g_new0(struct IOVARange, 1);
> +s->usable_iova_ranges[0].start = QEMU_VFIO_IOVA_MIN;
> +s->usable_iova_ranges[0].end = QEMU_VFIO_IOVA_MAX - 1;
> +
> +if (iommu_info->argsz > iommu_info_size) {
> +void *first_cap;
> +
> +iommu_info_size = iommu_info->argsz;
> +iommu_info = g_realloc(iommu_info, iommu_info_size);
> +if (ioctl(s->container, VFIO_IOMMU_GET_INFO, iommu_info)) {
> +ret = -errno;
> +goto fail;
> +}
> +first_cap = (void *)iommu_info + iommu_info->cap_offset;
> +collect_usable_iova_ranges(s, first_cap);
> +}
> +
>  s->device = ioctl(s->group, VFIO_GROUP_GET_DEVICE_FD, device);
>  
>  if (s->device < 0) {
> @@ -365,8 +430,12 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const 
> char *device,
>  if (ret) 

Re: [PATCH v2 0/6] block/nvme: Map doorbells pages write-only, remove magic from nvme_init

2020-09-22 Thread Fam Zheng
On Tue, 2020-09-22 at 10:38 +0200, Philippe Mathieu-Daudé wrote:
> Instead of mapping 8K of I/O + doorbells as RW during the whole
> execution, map I/O temporarily at init, and map the doorbells WO.
> 
> Replace various magic values by slightly more explicit macros from
> "block/nvme.h".
> 
> Since v1: Fix uninitialized regs* (patchew)
> 
> Philippe Mathieu-Daudé (6):
>   util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar()
>   block/nvme: Map doorbells pages write-only
>   block/nvme: Reduce I/O registers scope
>   block/nvme: Drop NVMeRegs structure, directly use NvmeBar
>   block/nvme: Use register definitions from 'block/nvme.h'
>   block/nvme: Replace magic value by SCALE_MS definition
> 
>  include/qemu/vfio-helpers.h |  2 +-
>  block/nvme.c| 73 +
> 
>  util/vfio-helpers.c |  4 +-
>  3 files changed, 44 insertions(+), 35 deletions(-)
> 

Reviewed-by: Fam Zheng 




Re: [PATCH 2/6] block/nvme: Map doorbells pages write-only

2020-09-22 Thread Fam Zheng
On Tue, 2020-09-22 at 10:41 +0200, Philippe Mathieu-Daudé wrote:
> Hi Fam,
> 
> +Paolo?
> 
> On 9/22/20 10:18 AM, Fam Zheng wrote:
> > On Mon, 2020-09-21 at 18:29 +0200, Philippe Mathieu-Daudé wrote:
> > > Per the datasheet sections 3.1.13/3.1.14:
> > >   "The host should not read the doorbell registers."
> > > 
> > > As we don't need read access, map the doorbells with write-only
> > > permission. We keep a reference to this mapped address in the
> > > BDRVNVMeState structure.
> > 
> > Besides looking more correct in access mode, is there any side
> > effect
> > of WO mapping?
> 
> TBH I don't have enough knowledge to answer this question.
> I tested successfully on X86. I'm writing more tests.

The reason I'm asking is that x86 pages are either RO or RW. So
I'd like to see if there's a practical reason behind this patch (I have
no idea about the effects on MTRR and/or IOMMU).

Fam

> 
> > 
> > Fam
> > 
> > > 
> > > Signed-off-by: Philippe Mathieu-Daudé 
> > > ---
> > >  block/nvme.c | 29 +++--
> > >  1 file changed, 19 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/block/nvme.c b/block/nvme.c
> > > index 5a4dc6a722a..3c834da8fec 100644
> > > --- a/block/nvme.c
> > > +++ b/block/nvme.c
> > > @@ -31,7 +31,7 @@
> > >  #define NVME_SQ_ENTRY_BYTES 64
> > >  #define NVME_CQ_ENTRY_BYTES 16
> > >  #define NVME_QUEUE_SIZE 128
> > > -#define NVME_BAR_SIZE 8192
> > > +#define NVME_DOORBELL_SIZE 4096
> > >  
> > >  /*
> > >   * We have to leave one slot empty as that is the full queue
> > > case
> > > where
> > > @@ -84,10 +84,6 @@ typedef struct {
> > >  /* Memory mapped registers */
> > >  typedef volatile struct {
> > >  NvmeBar ctrl;
> > > -struct {
> > > -uint32_t sq_tail;
> > > -uint32_t cq_head;
> > > -} doorbells[];
> > >  } NVMeRegs;
> > >  
> > >  #define INDEX_ADMIN 0
> > > @@ -103,6 +99,11 @@ struct BDRVNVMeState {
> > >  AioContext *aio_context;
> > >  QEMUVFIOState *vfio;
> > >  NVMeRegs *regs;
> > > +/* Memory mapped registers */
> > > +volatile struct {
> > > +uint32_t sq_tail;
> > > +uint32_t cq_head;
> > > +} *doorbells;
> > >  /* The submission/completion queue pairs.
> > >   * [0]: admin queue.
> > >   * [1..]: io queues.
> > > @@ -247,14 +248,14 @@ static NVMeQueuePair
> > > *nvme_create_queue_pair(BDRVNVMeState *s,
> > >  error_propagate(errp, local_err);
> > >  goto fail;
> > >  }
> > > -q->sq.doorbell = &s->regs->doorbells[idx * s->doorbell_scale].sq_tail;
> > > +q->sq.doorbell = &s->doorbells[idx * s->doorbell_scale].sq_tail;
> > >  
> > >  nvme_init_queue(s, >cq, size, NVME_CQ_ENTRY_BYTES,
> > > _err);
> > >  if (local_err) {
> > >  error_propagate(errp, local_err);
> > >  goto fail;
> > >  }
> > > -q->cq.doorbell = &s->regs->doorbells[idx * s->doorbell_scale].cq_head;
> > > +q->cq.doorbell = &s->doorbells[idx * s->doorbell_scale].cq_head;
> > >  
> > >  return q;
> > >  fail:
> > > @@ -712,13 +713,12 @@ static int nvme_init(BlockDriverState *bs,
> > > const char *device, int namespace,
> > >  goto out;
> > >  }
> > >  
> > > -s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0,
> > > NVME_BAR_SIZE,
> > > +s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0,
> > > sizeof(NvmeBar),
> > >  PROT_READ | PROT_WRITE,
> > > errp);
> > >  if (!s->regs) {
> > >  ret = -EINVAL;
> > >  goto out;
> > >  }
> > > -
> > >  /* Perform initialize sequence as described in NVMe spec
> > > "7.6.1
> > >   * Initialization". */
> > >  
> > > @@ -748,6 +748,13 @@ static int nvme_init(BlockDriverState *bs,
> > > const
> > > char *device, int namespace,
> > >  }
> > >  }
> > >  
> > > +s->doorbells = qemu_vfio_pci_map_bar(s->vfio, 0,
> > > sizeof(NvmeBar),
> > > + NVME_DOORBELL_SIZE,
> > > PROT_WRITE, errp);
> > > +if (!s->doorbells) {
> > > +ret = -EINVAL;
> > > +goto out;
> > > +}
> > > +
> > >  /* Set up admin queue. */
> > >  s->queues = g_new(NVMeQueuePair *, 1);
> > >  s->queues[INDEX_ADMIN] = nvme_create_queue_pair(s,
> > > aio_context,
> > > 0,
> > > @@ -873,7 +880,9 @@ static void nvme_close(BlockDriverState *bs)
> > >  &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
> > >  false, NULL, NULL);
> > >  event_notifier_cleanup(&s->irq_notifier[MSIX_SHARED_IRQ_IDX]);
> > > -qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0,
> > > NVME_BAR_SIZE);
> > > +qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->doorbells,
> > > +sizeof(NvmeBar),
> > > NVME_DOORBELL_SIZE);
> > > +qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0,
> > > sizeof(NvmeBar));
> > >  qemu_vfio_close(s->vfio);
> > >  
> > >  g_free(s->device);
> 
> 




Re: [PATCH 2/6] block/nvme: Map doorbells pages write-only

2020-09-22 Thread Fam Zheng
On Mon, 2020-09-21 at 18:29 +0200, Philippe Mathieu-Daudé wrote:
> Per the datasheet sections 3.1.13/3.1.14:
>   "The host should not read the doorbell registers."
> 
> As we don't need read access, map the doorbells with write-only
> permission. We keep a reference to this mapped address in the
> BDRVNVMeState structure.

Besides looking more correct in access mode, is there any side effect
of WO mapping?

Fam

> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/nvme.c | 29 +++--
>  1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 5a4dc6a722a..3c834da8fec 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -31,7 +31,7 @@
>  #define NVME_SQ_ENTRY_BYTES 64
>  #define NVME_CQ_ENTRY_BYTES 16
>  #define NVME_QUEUE_SIZE 128
> -#define NVME_BAR_SIZE 8192
> +#define NVME_DOORBELL_SIZE 4096
>  
>  /*
>   * We have to leave one slot empty as that is the full queue case
> where
> @@ -84,10 +84,6 @@ typedef struct {
>  /* Memory mapped registers */
>  typedef volatile struct {
>  NvmeBar ctrl;
> -struct {
> -uint32_t sq_tail;
> -uint32_t cq_head;
> -} doorbells[];
>  } NVMeRegs;
>  
>  #define INDEX_ADMIN 0
> @@ -103,6 +99,11 @@ struct BDRVNVMeState {
>  AioContext *aio_context;
>  QEMUVFIOState *vfio;
>  NVMeRegs *regs;
> +/* Memory mapped registers */
> +volatile struct {
> +uint32_t sq_tail;
> +uint32_t cq_head;
> +} *doorbells;
>  /* The submission/completion queue pairs.
>   * [0]: admin queue.
>   * [1..]: io queues.
> @@ -247,14 +248,14 @@ static NVMeQueuePair
> *nvme_create_queue_pair(BDRVNVMeState *s,
>  error_propagate(errp, local_err);
>  goto fail;
>  }
> -q->sq.doorbell = &s->regs->doorbells[idx * s->doorbell_scale].sq_tail;
> +q->sq.doorbell = &s->doorbells[idx * s->doorbell_scale].sq_tail;
>  
>  nvme_init_queue(s, &q->cq, size, NVME_CQ_ENTRY_BYTES,
> &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
>  goto fail;
>  }
> -q->cq.doorbell = &s->regs->doorbells[idx * s->doorbell_scale].cq_head;
> +q->cq.doorbell = &s->doorbells[idx * s->doorbell_scale].cq_head;
>  
>  return q;
>  fail:
> @@ -712,13 +713,12 @@ static int nvme_init(BlockDriverState *bs,
> const char *device, int namespace,
>  goto out;
>  }
>  
> -s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, NVME_BAR_SIZE,
> +s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, sizeof(NvmeBar),
>  PROT_READ | PROT_WRITE, errp);
>  if (!s->regs) {
>  ret = -EINVAL;
>  goto out;
>  }
> -
>  /* Perform initialize sequence as described in NVMe spec "7.6.1
>   * Initialization". */
>  
> @@ -748,6 +748,13 @@ static int nvme_init(BlockDriverState *bs, const
> char *device, int namespace,
>  }
>  }
>  
> +s->doorbells = qemu_vfio_pci_map_bar(s->vfio, 0,
> sizeof(NvmeBar),
> + NVME_DOORBELL_SIZE,
> PROT_WRITE, errp);
> +if (!s->doorbells) {
> +ret = -EINVAL;
> +goto out;
> +}
> +
>  /* Set up admin queue. */
>  s->queues = g_new(NVMeQueuePair *, 1);
>  s->queues[INDEX_ADMIN] = nvme_create_queue_pair(s, aio_context,
> 0,
> @@ -873,7 +880,9 @@ static void nvme_close(BlockDriverState *bs)
>  &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
>  false, NULL, NULL);
>  event_notifier_cleanup(&s->irq_notifier[MSIX_SHARED_IRQ_IDX]);
> -qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0,
> NVME_BAR_SIZE);
> +qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->doorbells,
> +sizeof(NvmeBar), NVME_DOORBELL_SIZE);
> +qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0,
> sizeof(NvmeBar));
>  qemu_vfio_close(s->vfio);
>  
>  g_free(s->device);




Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-21 Thread Fam Zheng
On 2020-09-19 10:22, Zhenyu Ye wrote:
> On 2020/9/18 22:06, Fam Zheng wrote:
> > 
> > I can see how blocking in a slow io_submit can cause trouble for the main
> > thread. I think one way to fix it (until it's made truly async in new
> > kernels) is moving the io_submit call to the thread pool, wrapped in a
> > coroutine, perhaps.
> >
> 
> I'm not sure if any other operation will block the main thread, other
> than io_submit().

Then that's a problem with io_submit which should be fixed. Or more
precisely, that is a long-held lock that we should avoid in QEMU's event
loops.

> 
> > I'm not sure a QMP timeout is a complete solution, because we would still
> > suffer from a blocked state for a period in this exact situation, before
> > the timeout fires.
> 
> Anyway, the qmp timeout may be the last measure to prevent the VM
> soft lockup. 

Maybe, but I don't think baking such a workaround into the QMP API is a
good idea. No QMP command should be synchronously long running, so
having a timeout parameter is just a wrong design.

Thanks,

Fam



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-18 Thread Fam Zheng
On 2020-09-18 19:23, Zhenyu Ye wrote:
>   Thread 5 (LWP 4802):
>   #0  0x83086b54 in syscall () at /lib64/libc.so.6
>   #1  0x834598b8 in io_submit () at /lib64/libaio.so.1
>   #2  0xe851e89c in ioq_submit (s=0xfffd3c001bb0) at 
> ../block/linux-aio.c:299
>   #3  0xe851eb50 in laio_io_unplug (bs=0xef0f2340, 
> s=0xfffd3c001bb0)
>   at ../block/linux-aio.c:344
>   #4  0xe8559f1c in raw_aio_unplug (bs=0xef0f2340) at 
> ../block/file-posix.c:2063
>   #5  0xe8538344 in bdrv_io_unplug (bs=0xef0f2340) at 
> ../block/io.c:3135
>   #6  0xe8538360 in bdrv_io_unplug (bs=0xef0eb020) at 
> ../block/io.c:3140
>   #7  0xe8496104 in blk_io_unplug (blk=0xef0e8f20)
>   at ../block/block-backend.c:2147
>   #8  0xe830e1a4 in virtio_blk_handle_vq (s=0xf0374280, 
> vq=0x700fc1d8)
>   at ../hw/block/virtio-blk.c:796
>   #9  0xe82e6b68 in virtio_blk_data_plane_handle_output
>   (vdev=0xf0374280, vq=0x700fc1d8) at 
> ../hw/block/dataplane/virtio-blk.c:165
>   #10 0xe83878fc in virtio_queue_notify_aio_vq (vq=0x700fc1d8)
>   at ../hw/virtio/virtio.c:2325
>   #11 0xe838ab50 in virtio_queue_host_notifier_aio_poll 
> (opaque=0x700fc250)
>   at ../hw/virtio/virtio.c:3545
>   #12 0xe85fab3c in run_poll_handlers_once
>   (ctx=0xef0a87b0, now=77604310618960, timeout=0x73ffdf78)
>   at ../util/aio-posix.c:398
>   #13 0xe85fae5c in run_poll_handlers
>   (ctx=0xef0a87b0, max_ns=4000, timeout=0x73ffdf78) at 
> ../util/aio-posix.c:492
>   #14 0xe85fb078 in try_poll_mode (ctx=0xef0a87b0, 
> timeout=0x73ffdf78)
>   at ../util/aio-posix.c:535
>   #15 0xe85fb180 in aio_poll (ctx=0xef0a87b0, blocking=true)
>   at ../util/aio-posix.c:571
>   #16 0xe8027004 in iothread_run (opaque=0xeee79a00) at 
> ../iothread.c:73
>   #17 0xe85f269c in qemu_thread_start (args=0xef0a8d10)
>   at ../util/qemu-thread-posix.c:521
>   #18 0x831428bc in  () at /lib64/libpthread.so.0
>   #19 0x8308aa1c in  () at /lib64/libc.so.6

I can see how blocking in a slow io_submit can cause trouble for the main
thread. I think one way to fix it (until it's made truly async in new
kernels) is moving the io_submit call to the thread pool, wrapped in a
coroutine, perhaps.
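
A minimal sketch of that shape (thread_pool_submit_co() is the existing
API in util/thread-pool.c; the job struct and worker below are
illustrative, not the real linux-aio data layout):

typedef struct {
    io_context_t ctx;
    int nr;
    struct iocb **iocbs;
} SubmitJob;

static int io_submit_worker(void *opaque)
{
    SubmitJob *job = opaque;

    /* The potentially blocking syscall now runs in a pool thread,
     * not in the thread holding the AioContext lock. */
    return io_submit(job->ctx, job->nr, job->iocbs);
}

static int coroutine_fn ioq_submit_co(LinuxAioState *s, SubmitJob *job)
{
    /* Yields the calling coroutine until the worker returns, so the
     * event loop keeps running in the meantime. */
    return thread_pool_submit_co(aio_get_thread_pool(s->aio_context),
                                 io_submit_worker, job);
}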

I'm not sure a QMP timeout is a complete solution, because we would still
suffer from a blocked state for a period in this exact situation, before
the timeout fires.

Fam



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-17 Thread Fam Zheng
On 2020-09-17 16:44, Stefan Hajnoczi wrote:
> On Thu, Sep 17, 2020 at 03:36:57PM +0800, Zhenyu Ye wrote:
> > When the hang occurs, the QEMU is blocked at:
> > 
> > #0  0x95762b64 in ?? () from target:/usr/lib64/libpthread.so.0
> > #1  0x9575bd88 in pthread_mutex_lock () from 
> > target:/usr/lib64/libpthread.so.0
> > #2  0xbb1f5948 in qemu_mutex_lock_impl (mutex=0xcc8e1860,
> > file=0xbb4e1bd0 
> > "/Images/eillon/CODE/5-opensource/qemu/util/async.c", line=605)
> > #3  0xbb20acd4 in aio_context_acquire (ctx=0xcc8e1800)
> > #4  0xbb105e90 in bdrv_query_image_info (bs=0xcc934620,
> > p_info=0xccc41e18, errp=0xca669118)
> > #5  0xbb105968 in bdrv_block_device_info (blk=0xcdca19f0, 
> > bs=0xcc934620,
> > flat=false, errp=0xca6692b8)
> > #6  0xbb1063dc in bdrv_query_info (blk=0xcdca19f0, 
> > p_info=0xcd29c9a8,
> > errp=0xca6692b8)
> > #7  0xbb106c14 in qmp_query_block (errp=0x0)
> > #8  0xbacb8e6c in hmp_info_block (mon=0xca6693d0, 
> > qdict=0xcd089790)
> 
> Great, this shows that the main loop thread is stuck waiting for the
> AioContext lock.
> 
> Please post backtraces from all QEMU threads ((gdb) thread apply all bt)
> so we can figure out which thread is holding up the main loop.

I think that is reflected in the perf backtrace posted by Zhenyu already:

And in the host, the information of sys_enter_io_submit() is:

Samples: 3K of event 'syscalls:sys_enter_io_submit', Event count
(approx.): 3150
   Children  Self  Trace output
   -   66.70%  66.70%  ctx_id: 0x9c044000,
   nr: 0x0001, iocbpp: 0x9f7fad28
   0xae7f871c
   0xae8a27c4
   qemu_thread_start
   iothread_run
   aio_poll
   aio_dispatch_ready_handlers
   aio_dispatch_handler
   virtio_queue_host_notifier_aio_read
   virtio_queue_notify_aio_vq
   virtio_blk_data_plane_handle_output
   virtio_blk_handle_vq
   blk_io_unplug
   bdrv_io_unplug
   bdrv_io_unplug
   raw_aio_unplug
   laio_io_unplug
   syscall

So the iothread is blocked by a slow io_submit holding the AioContext
lock.

It would be interesting to know what in the kernel is blocking io_submit
from returning.

Fam



Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-17 Thread Fam Zheng
On 2020-09-17 15:36, Zhenyu Ye wrote:
> Hi Stefan,
> 
> On 2020/9/14 21:27, Stefan Hajnoczi wrote:
> >>
> >> Theoretically, everything running in an iothread is asynchronous. However,
> >> some 'asynchronous' actions are not non-blocking entirely, such as
> >> io_submit().  This will block while the iodepth is too big and I/O pressure
> >> is too high.  If we do some qmp actions, such as 'info block', at this 
> >> time,
> >> may cause vm soft lockup.  This series can make these qmp actions safer.
> >>
> >> I constructed the scene as follow:
> >> 1. create a vm with 4 disks, using iothread.
> >> 2. add press to the CPU on the host.  In my scene, the CPU usage exceeds 
> >> 95%.
> >> 3. add press to the 4 disks in the vm at the same time.  I used the fio and
> >> some parameters are:
> >>
> >> fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4
> >>
> >> 4. do block query actions, for example, by virsh:
> >>
> >>virsh qemu-monitor-command [vm name] --hmp info block
> >>
> >> Then the vm will soft lockup, the calltrace is:
> >>
> >> [  192.311393] watchdog: BUG: soft lockup - CPU#1 stuck for 42s! 
> >> [kworker/1:1:33]
> > 
> > Hi,
> > Sorry I haven't had time to investigate this myself yet.
> > 
> > Do you also have a QEMU backtrace when the hang occurs?
> > 
> > Let's find out if QEMU is stuck in the io_submit(2) syscall or whether
> > there's an issue in QEMU itself that causes the softlockup (for example,
> > aio_poll() with the global mutex held).
> 
> Sorry for the delay.
> 
> I traced the io_submit() within the guest. The maximum execution time exceeds 
> 16s:
> 
>   guest# perf trace -p `pidof -s fio` --duration 1
>   14.808 (3843.082 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
> 3861.336 (11470.608 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>15341.998 (479.283 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>15831.380 (3704.997 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>19547.869 (3412.839 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>22966.953 (1080.854 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>24062.689 (6939.813 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>31019.219 (532.147 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>31556.610 (3469.920 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>35038.885 (9007.175 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>44053.578 (16006.405 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>60068.944 (3068.748 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>63138.474 (13012.499 ms): io_getevents(ctx_id: 281472924459008, 
> min_nr: 511, nr: 511, events: 0x19a83150) = 511
>76191.073 (2872.657 ms): io_submit(ctx_id: 281472924459008, nr: 1, 
> iocbpp: 0x19a87160  ) = 1
>   ...
> 
> And in the host, the information of sys_enter_io_submit() is:
> 
>   Samples: 3K of event 'syscalls:sys_enter_io_submit', Event count 
> (approx.): 3150
>   Children  Self  Trace output
>   -   66.70%  66.70%  ctx_id: 0x9c044000, nr: 0x0001, iocbpp: 
> 0x9f7fad28
>   0xae7f871c
>   0xae8a27c4
>   qemu_thread_start
>   iothread_run
>   aio_poll
>   aio_dispatch_ready_handlers
>   aio_dispatch_handler
>   virtio_queue_host_notifier_aio_read
>   virtio_queue_notify_aio_vq
>   virtio_blk_data_plane_handle_output
>   virtio_blk_handle_vq
>   blk_io_unplug
>   bdrv_io_unplug
>   bdrv_io_unplug
>   raw_aio_unplug
>   laio_io_unplug
>   syscall
> 
> 
> When the hang occurs, the QEMU is blocked at:
> 
>   #0  0x95762b64 in ?? () from target:/usr/lib64/libpthread.so.0
>   #1  0x9575bd88 in pthread_mutex_lock () from 
> target:/usr/lib64/libpthread.so.0
>   #2  0xbb1f5948 in qemu_mutex_lock_impl (mutex=0xcc8e1860,
>   file=0xbb4e1bd0 
> "/Images/eillon/CODE/5-opensource/qemu/util/async.c", line=605)
>   #3  0xbb20acd4 in aio_context_acquire (ctx=0xcc8e1800)
>   #4  0xbb105e90 in bdrv_query_image_info (bs=0xcc934620,
>   p_info=0xccc41e18, errp=0xca669118)
>   #5  0xbb105968 in bdrv_block_device_info (blk=0xcdca19f0, 
> bs=0xcc934620,
>   flat=false, errp=0xca6692b8)
>   #6  0xbb1063dc in bdrv_query_info (blk=0xcdca19f0, 
> p_info=0xcd29c9a8,
>   errp=0xca6692b8)
>   #7  0xbb106c14 in qmp_query_block (errp=0x0)
>   

Re: [RFC PATCH v5 4/4] block/nvme: Use qemu_vfio_pci_init_msix_irqs() to initialize our IRQ

2020-09-09 Thread Fam Zheng
On 2020-09-08 20:03, Philippe Mathieu-Daudé wrote:
> Instead of initializing one MSIX IRQ with the generic
> qemu_vfio_pci_init_irq() function, use the MSIX specific one which
> ill allow us to use multiple IRQs. For now we provide an array of

s/ill/will/

> a single IRQ.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Fam




Re: [RFC PATCH v5 0/4] util/vfio-helpers: Add support for multiple IRQs

2020-09-09 Thread Fam Zheng
On 2020-09-08 20:03, Philippe Mathieu-Daudé wrote:
> This series intends to setup the VFIO helper to allow
> binding notifiers on different IRQs.
> 
> For the NVMe use case, we only care about MSIX interrupts.
> To not disrupt other users, introduce the qemu_vfio_pci_init_msix_irqs
> function to initialize multiple MSIX IRQs and attach eventfd to
> them.
> 
> Since RFC v4:
> - addressed Alex review comment:
>   check ioctl(VFIO_DEVICE_SET_IRQS) return value

Reviewed-by: Fam Zheng 




Re: [RFC PATCH v5 2/4] util/vfio-helpers: Report error when IOMMU page size is not supported

2020-09-09 Thread Fam Zheng
On 2020-09-08 20:03, Philippe Mathieu-Daudé wrote:
> This driver uses the host page size to align its memory regions,
> but this size is not always compatible with the IOMMU. Add a
> check that the size matches, and bail out, listing the sizes
> the IOMMU supports.
> 
> Example on Aarch64:
> 
>  $ qemu-system-aarch64 -M virt -drive 
> if=none,id=nvme0,file=nvme://0006:90:00.0/1,format=raw
>  qemu-system-aarch64: -drive 
> if=none,id=nvme0,file=nvme://0006:90:00.0/1,format=raw: Unsupported IOMMU 
> page size: 4 KiB
>  Available page size:
>   64 KiB
>   512 MiB
> 
> Suggested-by: Alex Williamson 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  util/vfio-helpers.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 55b4107ce69..6d9ec7d365c 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  #include 
>  #include 
>  #include "qapi/error.h"
> @@ -316,6 +317,25 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const 
> char *device,
>  ret = -errno;
>  goto fail;
>  }
> +if (!(iommu_info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
> +error_setg(errp, "Failed to get IOMMU page size info");
> +ret = -errno;

We don't have errno here, do we?

> +goto fail;
> +}
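
Since this branch is taken on a flags check rather than a failing
syscall, a fixed error code would be more accurate, e.g. (sketch):

    if (!(iommu_info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
        error_setg(errp, "Failed to get IOMMU page size info");
        ret = -EINVAL;   /* errno is stale at this point */
        goto fail;
    }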
> +if (!extract64(iommu_info.iova_pgsizes,
> +   ctz64(qemu_real_host_page_size), 1)) {
> +g_autofree char *host_page_size = 
> size_to_str(qemu_real_host_page_size);
> +error_setg(errp, "Unsupported IOMMU page size: %s", host_page_size);
> +error_append_hint(errp, "Available page size:\n");
> +for (int i = 0; i < 64; i++) {
> +if (extract64(iommu_info.iova_pgsizes, i, 1)) {
> +g_autofree char *iova_pgsizes = size_to_str(1UL << i);
> +error_append_hint(errp, " %s\n", iova_pgsizes);

Interesting... Since it's printing page sizes, which are fairly low level,
why not just print plain (hex) numbers?
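
E.g. (sketch):

    error_append_hint(errp, "Supported page size mask: 0x%" PRIx64 "\n",
                      (uint64_t)iommu_info.iova_pgsizes);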

Fam

> +}
> +}
> +ret = -EINVAL;
> +goto fail;
> +}
>  
>  s->device = ioctl(s->group, VFIO_GROUP_GET_DEVICE_FD, device);
>  
> -- 
> 2.26.2
> 
> 




Re: [PATCH] MAINTAINERS: add Stefan Hajnoczi as block/nvme.c maintainer

2020-09-08 Thread Fam Zheng
On 2020-09-07 12:16, Stefan Hajnoczi wrote:
> Development of the userspace NVMe block driver picked up again recently.
> After talking with Fam I am stepping up as block/nvme.c maintainer.
> Patches will be merged through my 'block' tree.
> 
> Cc: Kevin Wolf 
> Cc: Klaus Jensen 
> Cc: Fam Zheng 
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Fam Zheng 



Re: [PATCH 0/3] block/nvme: Use NvmeBar structure from "block/nvme.h"

2020-09-04 Thread Fam Zheng
On 2020-09-04 14:41, Philippe Mathieu-Daudé wrote:
> Cleanups in the NVMeRegs structure:
> - Use the already existing NvmeBar structure from "block/nvme.h"
> - Pair doorbell registers
> 
> Based-on: <20200903122803.405265-1-phi...@redhat.com>
> 
> Philippe Mathieu-Daudé (3):
>   block/nvme: Group controller registers in NVMeRegs structure
>   block/nvme: Use generic NvmeBar structure
>   block/nvme: Pair doorbell registers
> 
>  block/nvme.c | 43 +++
>  1 file changed, 15 insertions(+), 28 deletions(-)
> 
> -- 
> 2.26.2
> 
> 

Reviewed-by: Fam Zheng 




Re: [PATCH v2 00/20] nvme: support NVMe v1.3d, SGLs and multiple namespaces

2019-10-16 Thread Fam Zheng
On Tue, 10/15 12:38, Klaus Jensen wrote:
> Hi,
> 
> (Quick note to Fam): most of this series is irrelevant to you as the
> maintainer of the nvme block driver, but patch "nvme: add support for
> scatter gather lists" touches block/nvme.c due to changes in the shared
> NvmeCmd struct.

Yeah, that part looks sane to me. For the block/nvme.c bit:

Acked-by: Fam Zheng 




Re: [Qemu-devel] [PATCH] docker: add sanitizers back to clang build

2019-09-17 Thread Fam Zheng
On Thu, 09/12 19:07, John Snow wrote:
> 
> 
> On 9/11/19 9:52 PM, no-re...@patchew.org wrote:
> > Patchew URL: 
> > https://patchew.org/QEMU/20190912014442.5757-1-js...@redhat.com/
> > 
> > 
> > 
> > Hi,
> > 
> > This series seems to have some coding style problems. See output below for
> > more information:
> > 
> > Subject: [Qemu-devel] [PATCH] docker: add sanitizers back to clang build
> > Message-id: 20190912014442.5757-1-js...@redhat.com
> > Type: series
> > 
> > === TEST SCRIPT BEGIN ===
> > #!/bin/bash
> > git rev-parse base > /dev/null || exit 0
> > git config --local diff.renamelimit 0
> > git config --local diff.renames True
> > git config --local diff.algorithm histogram
> > ./scripts/checkpatch.pl --mailback base..
> > === TEST SCRIPT END ===
> > 
> > From https://github.com/patchew-project/qemu
> >  * [new tag] patchew/20190912014442.5757-1-js...@redhat.com -> 
> > patchew/20190912014442.5757-1-js...@redhat.com
> > Switched to a new branch 'test'
> > 96d44b9 docker: add sanitizers back to clang build
> > 
> > === OUTPUT BEGIN ===
> > ERROR: Missing Signed-off-by: line(s)
> 
> GDI.
> 
> I keep adding this to my configuration files, but it keeps "falling
> off", somehow.
> 
> I have some patches in the works for stgit where I'm going to work
> through some test cases for setting profile variables and try to fix this.
> 
> In the meantime:
> 
> Signed-off-by: John Snow 

Isn't this because you inserted a '---' line in the middle of the commit
message, so the part after it is ditched by 'git am'?
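
I.e. with a message laid out like this, everything below the first '---'
is treated as patch annotation and silently dropped by 'git am':

    docker: add sanitizers back to clang build

    Commit message body, kept by 'git am'.
    ---
    Signed-off-by: ...   <- below the separator, so it is dropped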

This feels a bit hard to catch, wondering what is in the works. :)

Fam




Re: [Qemu-devel] [Patchew-devel] [PATCH v15 0/5] linux-user: A set of miscellaneous patches

2019-06-28 Thread Fam Zheng
On Fri, 06/28 14:15, Laurent Vivier wrote:
> Dear Patchew developers,
> 
> Le 28/06/2019 à 11:49, no-re...@patchew.org a écrit :
> > Patchew URL: 
> > https://patchew.org/QEMU/1561712082-31441-1-git-send-email-aleksandar.marko...@rt-rk.com/
> > 
> > 
> > 
> > Hi,
> > 
> > This series failed build test on s390x host. Please find the details below.
> 
> To debug this kind of problem (at least to reproduce it), it may be
> interesting to know the OS release of the target build environment
> (glibc version, gcc version, ...) on which the build fails.
> 
> Is this possible to add this in the mail (or in the logs)?

I've moved the env/rpm/uname commands before the actual testing commands, so
the information can be collected even upon failures.

Fam

> 
> ...
> >   CC  microblaze-linux-user/linux-user/strace.o
> >   CC  mips64el-linux-user/linux-user/syscall.o
> >   LINKmicroblazeel-linux-user/qemu-microblazeel
> > /var/tmp/patchew-tester-tmp-tr0wvoyz/src/linux-user/syscall.c:322:16: 
> > error: conflicting types for ‘statx’
> >   322 | _syscall5(int, statx, int, dirfd, const char *, pathname, int, 
> > flags,
> >   |^
> > /var/tmp/patchew-tester-tmp-tr0wvoyz/src/linux-user/syscall.c:213:13: note: 
> > in definition of macro ‘_syscall5’
> 
> Thanks,
> Laurent
> 





Re: [Qemu-devel] [PATCH v5 09/12] block: add trace events for io_uring

2019-06-11 Thread Fam Zheng
On Mon, 06/10 19:19, Aarushi Mehta wrote:
> Signed-off-by: Aarushi Mehta 
> ---
>  block/io_uring.c   | 14 --
>  block/trace-events |  8 
>  2 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index f327c7ef96..47e027364a 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -17,6 +17,7 @@
>  #include "block/raw-aio.h"
>  #include "qemu/coroutine.h"
>  #include "qapi/error.h"
> +#include "trace.h"
>  
>  #define MAX_EVENTS 128
>  
> @@ -191,12 +192,15 @@ static int ioq_submit(LuringState *s)
>  
>  void luring_io_plug(BlockDriverState *bs, LuringState *s)
>  {
> +trace_luring_io_plug();
>  s->io_q.plugged++;
>  }
>  
>  void luring_io_unplug(BlockDriverState *bs, LuringState *s)
>  {
>  assert(s->io_q.plugged);
> +trace_luring_io_unplug(s->io_q.blocked, s->io_q.plugged,
> + s->io_q.in_queue, s->io_q.in_flight);
>  if (--s->io_q.plugged == 0 &&
>  !s->io_q.blocked && s->io_q.in_queue > 0) {
>  ioq_submit(s);
> @@ -217,6 +221,7 @@ void luring_io_unplug(BlockDriverState *bs, LuringState 
> *s)
>  static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
>  uint64_t offset, int type)
>  {
> +int ret;
> +struct io_uring_sqe *sqes = io_uring_get_sqe(&s->ring);
>  if (!sqes) {
> +sqes = &luringcb->sqeq;
> @@ -242,11 +247,14 @@ static int luring_do_submit(int fd, LuringAIOCB 
> *luringcb, LuringState *s,
>  }
>  io_uring_sqe_set_data(sqes, luringcb);
>  s->io_q.in_queue++;
> -
> +trace_luring_do_submit(s->io_q.blocked, s->io_q.plugged,
> +   s->io_q.in_queue, s->io_q.in_flight);
>  if (!s->io_q.blocked &&
>  (!s->io_q.plugged ||
>   s->io_q.in_flight + s->io_q.in_queue >= MAX_EVENTS)) {
> -return ioq_submit(s);
> +ret = ioq_submit(s);
> +trace_luring_do_submit_done(ret);
> +return ret;
>  }
>  return 0;
>  }
> @@ -294,6 +302,7 @@ LuringState *luring_init(Error **errp)
>  int rc;
>  LuringState *s;
>  s = g_malloc0(sizeof(*s));
> +trace_luring_init_state((void *)s, sizeof(*s));

The pointer type cast is unnecessary.

>  struct io_uring *ring = &s->ring;
>  rc =  io_uring_queue_init(MAX_EVENTS, ring, 0);
>  if (rc < 0) {
> @@ -311,4 +320,5 @@ void luring_cleanup(LuringState *s)
>  {
>  io_uring_queue_exit(&s->ring);
>  g_free(s);
> +trace_luring_cleanup_state();

Pass @s?

>  }
> diff --git a/block/trace-events b/block/trace-events
> index eab51497fc..c4564dcd96 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -60,6 +60,14 @@ qmp_block_stream(void *bs, void *job) "bs %p job %p"
>  file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int 
> type) "acb %p opaque %p offset %"PRId64" count %d type %d"
>  file_copy_file_range(void *bs, int src, int64_t src_off, int dst, int64_t 
> dst_off, int64_t bytes, int flags, int64_t ret) "bs %p src_fd %d offset 
> %"PRIu64" dst_fd %d offset %"PRIu64" bytes %"PRIu64" flags %d ret %"PRId64
>  
> +#io_uring.c
> +luring_init_state(void *s, size_t size) "s %p size %zu"
> +luring_cleanup_state(void) "s freed"
> +disable luring_io_plug(void) "plug"
> +disable luring_io_unplug(int blocked, int plugged, int queued, int inflight) 
> "blocked %d plugged %d queued %d inflight %d"
> +disable luring_do_submit(int blocked, int plugged, int queued, int inflight) 
> "blocked %d plugged %d queued %d inflight %d"
> +disable luring_do_submit_done(int ret) "submitted to kernel %d"
> +
>  # qcow2.c
>  qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset 
> 0x%" PRIx64 " bytes %d"
>  qcow2_writev_done_req(void *co, int ret) "co %p ret %d"
> -- 
> 2.17.1
> 

Fam




Re: [Qemu-devel] [PATCH v5 11/12] qemu-io: adds support for io_uring

2019-06-11 Thread Fam Zheng
On Mon, 06/10 19:19, Aarushi Mehta wrote:
> Signed-off-by: Aarushi Mehta 
> ---
>  qemu-io.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/qemu-io.c b/qemu-io.c
> index 8d5d5911cb..54b82151c4 100644
> --- a/qemu-io.c
> +++ b/qemu-io.c
> @@ -129,6 +129,7 @@ static void open_help(void)
>  " -n, -- disable host cache, short for -t none\n"
>  " -U, -- force shared permissions\n"
>  " -k, -- use kernel AIO implementation (on Linux only)\n"
> +" -i  -- use kernel io_uring (Linux 5.1+)\n"
>  " -t, -- use the given cache mode for the image\n"
>  " -d, -- use the given discard mode for the image\n"
>  " -o, -- options to be given to the block driver"
> @@ -188,6 +189,11 @@ static int open_f(BlockBackend *blk, int argc, char 
> **argv)
>  case 'k':
>  flags |= BDRV_O_NATIVE_AIO;
>  break;
> +#ifdef CONFIG_LINUX_IO_URING
> +case 'i':

Maybe printing an error message saying the feature is not compiled in is
slightly better than just saying the argument is unknown?

Fam

> +flags |= BDRV_O_IO_URING;
> +break;
> +#endif
>  case 't':
>  if (bdrv_parse_cache_mode(optarg, &flags, &writethrough) < 0) {
>  error_report("Invalid cache option: %s", optarg);
> @@ -290,6 +296,7 @@ static void usage(const char *name)
>  "  -C, --copy-on-read   enable copy-on-read\n"
>  "  -m, --misalign   misalign allocations for O_DIRECT\n"
>  "  -k, --native-aio use kernel AIO implementation (on Linux only)\n"
> +"  -i  --io_uring   use kernel io_uring (Linux 5.1+)\n"
>  "  -t, --cache=MODE use the given cache mode for the image\n"
>  "  -d, --discard=MODE   use the given discard mode for the image\n"
>  "  -T, --trace [[enable=]][,events=][,file=]\n"
> @@ -499,6 +506,7 @@ int main(int argc, char **argv)
>  { "copy-on-read", no_argument, NULL, 'C' },
>  { "misalign", no_argument, NULL, 'm' },
>  { "native-aio", no_argument, NULL, 'k' },
> +{ "io_uring", no_argument, NULL, 'i' },
>  { "discard", required_argument, NULL, 'd' },
>  { "cache", required_argument, NULL, 't' },
>  { "trace", required_argument, NULL, 'T' },
> @@ -566,6 +574,11 @@ int main(int argc, char **argv)
>  case 'k':
>  flags |= BDRV_O_NATIVE_AIO;
>  break;
> +#ifdef CONFIG_LINUX_IO_URING
> +case 'i':
> +flags |= BDRV_O_IO_URING;
> +break;
> +#endif
>  case 't':
>  if (bdrv_parse_cache_mode(optarg, &flags, &writethrough) < 0) {
>  error_report("Invalid cache option: %s", optarg);
> -- 
> 2.17.1
> 




Re: [Qemu-devel] [PATCH v5 04/12] block/io_uring: implements interfaces for io_uring

2019-06-11 Thread Fam Zheng
On Mon, 06/10 19:18, Aarushi Mehta wrote:
> Aborts when sqe fails to be set as sqes cannot be returned to the ring.
> 
> Signed-off-by: Aarushi Mehta 
> ---
>  MAINTAINERS |   7 +
>  block/Makefile.objs |   3 +
>  block/io_uring.c| 314 
>  include/block/aio.h |  16 +-
>  include/block/raw-aio.h |  12 ++
>  5 files changed, 351 insertions(+), 1 deletion(-)
>  create mode 100644 block/io_uring.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7be1225415..49f896796e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2516,6 +2516,13 @@ F: block/file-posix.c
>  F: block/file-win32.c
>  F: block/win32-aio.c
>  
> +Linux io_uring
> +M: Aarushi Mehta 
> +R: Stefan Hajnoczi 
> +L: qemu-bl...@nongnu.org
> +S: Maintained
> +F: block/io_uring.c
> +
>  qcow2
>  M: Kevin Wolf 
>  M: Max Reitz 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index ae11605c9f..8fde7a23a5 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -18,6 +18,7 @@ block-obj-y += block-backend.o snapshot.o qapi.o
>  block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
>  block-obj-$(CONFIG_POSIX) += file-posix.o
>  block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
> +block-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
>  block-obj-y += null.o mirror.o commit.o io.o create.o
>  block-obj-y += throttle-groups.o
>  block-obj-$(CONFIG_LINUX) += nvme.o
> @@ -61,5 +62,7 @@ block-obj-$(if $(CONFIG_LZFSE),m,n) += dmg-lzfse.o
>  dmg-lzfse.o-libs   := $(LZFSE_LIBS)
>  qcow.o-libs:= -lz
>  linux-aio.o-libs   := -laio
> +io_uring.o-cflags  := $(LINUX_IO_URING_CFLAGS)
> +io_uring.o-libs:= $(LINUX_IO_URING_LIBS)
>  parallels.o-cflags := $(LIBXML2_CFLAGS)
>  parallels.o-libs   := $(LIBXML2_LIBS)
> diff --git a/block/io_uring.c b/block/io_uring.c
> new file mode 100644
> index 00..f327c7ef96
> --- /dev/null
> +++ b/block/io_uring.c
> @@ -0,0 +1,314 @@
> +/*
> + * Linux io_uring support.
> + *
> + * Copyright (C) 2009 IBM, Corp.
> + * Copyright (C) 2009 Red Hat, Inc.
> + * Copyright (C) 2019 Aarushi Mehta
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "qemu/osdep.h"
> +#include 
> +#include "qemu-common.h"
> +#include "block/aio.h"
> +#include "qemu/queue.h"
> +#include "block/block.h"
> +#include "block/raw-aio.h"
> +#include "qemu/coroutine.h"
> +#include "qapi/error.h"
> +
> +#define MAX_EVENTS 128
> +
> +typedef struct LuringAIOCB {

I have to say it is a good name.

> +Coroutine *co;
> +struct io_uring_sqe sqeq;
> +ssize_t ret;
> +QEMUIOVector *qiov;
> +bool is_read;
> +QSIMPLEQ_ENTRY(LuringAIOCB) next;
> +} LuringAIOCB;
> +
> +typedef struct LuringQueue {
> +int plugged;
> +unsigned int in_queue;
> +unsigned int in_flight;
> +bool blocked;
> +QSIMPLEQ_HEAD(, LuringAIOCB) sq_overflow;
> +} LuringQueue;
> +
> +typedef struct LuringState {
> +AioContext *aio_context;
> +
> +struct io_uring ring;
> +
> +/* io queue for submit at batch.  Protected by AioContext lock. */
> +LuringQueue io_q;
> +
> +/* I/O completion processing.  Only runs in I/O thread.  */
> +QEMUBH *completion_bh;
> +} LuringState;
> +
> +/**
> + * ioq_submit:
> + * @s: AIO state
> + *
> + * Queues pending sqes and submits them
> + *
> + */
> +static int ioq_submit(LuringState *s);
> +
> +/**
> + * qemu_luring_process_completions:
> + * @s: AIO state
> + *
> + * Fetches completed I/O requests, consumes cqes and invokes their callbacks.
> + *
> + */
> +static void qemu_luring_process_completions(LuringState *s)
> +{
> +struct io_uring_cqe *cqes;
> +int ret;
> +
> +/*
> + * Request completion callbacks can run the nested event loop.
> + * Schedule ourselves so the nested event loop will "see" remaining
> + * completed requests and process them.  Without this, completion
> + * callbacks that wait for other requests using a nested event loop
> + * would hang forever.
> + */
> +qemu_bh_schedule(s->completion_bh);
> +
> +while (io_uring_peek_cqe(&s->ring, &cqes) == 0) {
> +if (!cqes) {
> +break;
> +}
> +LuringAIOCB *luringcb = io_uring_cqe_get_data(cqes);
> +ret = cqes->res;

Declarations should be at the beginning of the code block.

> +
> +if (ret == luringcb->qiov->size) {
> +ret = 0;
> +} else if (ret >= 0) {
> +/* Short Read/Write */
> +if (luringcb->is_read) {
> +/* Read, pad with zeroes */
> +qemu_iovec_memset(luringcb->qiov, ret, 0,
> +luringcb->qiov->size - ret);

Should you check that (ret < luringcb->qiov->size), since ret comes from an external source?

Either way, ret should be assigned 0, I think.

> +} else {
> +ret = -ENOSPC;;

s/;;/;/

> +}
> +}
> +luringcb->ret = ret;
> +
> +

Re: [Qemu-devel] [PATCH v5 02/12] qapi/block-core: add option for io_uring

2019-06-11 Thread Fam Zheng
On Mon, 06/10 19:18, Aarushi Mehta wrote:
> Option only enumerates for hosts that support it.
> 
> Signed-off-by: Aarushi Mehta 
> Reviewed-by: Stefan Hajnoczi 
> ---
>  qapi/block-core.json | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 1defcde048..db7eedd058 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2792,11 +2792,13 @@
>  #
>  # @threads: Use qemu's thread pool
>  # @native:  Use native AIO backend (only Linux and Windows)
> +# @io_uring:Use linux io_uring (since 4.1)
>  #
>  # Since: 2.9
>  ##
>  { 'enum': 'BlockdevAioOptions',
> -  'data': [ 'threads', 'native' ] }
> +  'data': [ 'threads', 'native',
> +{ 'name': 'io_uring', 'if': 'defined(CONFIG_LINUX_IO_URING)' } ] 
> }

Question: 'native' has a dependency on libaio but it doesn't have the
condition.  Is the inconsistency intended?

>  
>  ##
>  # @BlockdevCacheOptions:
> -- 
> 2.17.1
> 




Re: [Qemu-devel] [PATCH v2 5/5] block/nvme: add support for discard

2019-06-05 Thread Fam Zheng
On Wed, 04/17 22:53, Maxim Levitsky wrote:
> Signed-off-by: Maxim Levitsky 
> ---
>  block/nvme.c   | 80 ++
>  block/trace-events |  2 ++
>  2 files changed, 82 insertions(+)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 35b925899f..b83912c627 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -110,6 +110,7 @@ typedef struct {
>  bool plugged;
>  
>  bool supports_write_zeros;
> +bool supports_discard;
>  
>  CoMutex dma_map_lock;
>  CoQueue dma_flush_queue;
> @@ -462,6 +463,7 @@ static void nvme_identify(BlockDriverState *bs, int 
> namespace, Error **errp)
>  
>  
>  s->supports_write_zeros = (idctrl->oncs & NVME_ONCS_WRITE_ZEROS) != 0;
> +s->supports_discard = (idctrl->oncs & NVME_ONCS_DSM) != 0;
>  
>  memset(resp, 0, 4096);
>  
> @@ -1144,6 +1146,83 @@ static coroutine_fn int 
> nvme_co_pwrite_zeroes(BlockDriverState *bs,
>  }
>  
>  
> +static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
> +int64_t offset, int bytes)

While you respin, you can align the parameters.

> +{
> +BDRVNVMeState *s = bs->opaque;
> +NVMeQueuePair *ioq = s->queues[1];
> +NVMeRequest *req;
> +NvmeDsmRange *buf;
> +QEMUIOVector local_qiov;
> +int r;
> +
> +NvmeCmd cmd = {
> +.opcode = NVME_CMD_DSM,
> +.nsid = cpu_to_le32(s->nsid),
> +.cdw10 = 0, /*number of ranges - 0 based*/
> +.cdw11 = cpu_to_le32(1 << 2), /*deallocate bit*/
> +};
> +
> +NVMeCoData data = {
> +.ctx = bdrv_get_aio_context(bs),
> +.ret = -EINPROGRESS,
> +};
> +
> +if (!s->supports_discard) {
> +return -ENOTSUP;
> +}
> +
> +assert(s->nr_queues > 1);
> +
> +buf = qemu_try_blockalign0(bs, 4096);
> +if (!buf) {
> +return -ENOMEM;
> +}
> +
> +buf->nlb = bytes >> s->blkshift;
> +buf->slba = offset >> s->blkshift;

This buffer is for the device; do we need to do anything about the endianness?

> +buf->cattr = 0;
> +
> +qemu_iovec_init(&local_qiov, 1);
> +qemu_iovec_add(&local_qiov, buf, 4096);
> +
> +req = nvme_get_free_req(ioq);
> +assert(req);
> +
> +qemu_co_mutex_lock(&s->dma_map_lock);
> +r = nvme_cmd_map_qiov(bs, &cmd, req, &local_qiov);
> +qemu_co_mutex_unlock(&s->dma_map_lock);
> +
> +if (r) {
> +req->busy = false;
> +return r;
> +}
> +
> +trace_nvme_dsm(s, offset, bytes);
> +
> +nvme_submit_command(s, ioq, req, &cmd, nvme_rw_cb, &data);
> +
> +data.co = qemu_coroutine_self();
> +while (data.ret == -EINPROGRESS) {
> +qemu_coroutine_yield();
> +}
> +
> +qemu_co_mutex_lock(&s->dma_map_lock);
> +r = nvme_cmd_unmap_qiov(bs, &local_qiov);
> +qemu_co_mutex_unlock(&s->dma_map_lock);
> +if (r) {
> +return r;
> +}
> +
> +trace_nvme_dsm_done(s, offset, bytes, data.ret);
> +
> +qemu_iovec_destroy(&local_qiov);
> +qemu_vfree(buf);
> +return data.ret;
> +
> +}
> +
> +
>  static int nvme_reopen_prepare(BDRVReopenState *reopen_state,
> BlockReopenQueue *queue, Error **errp)
>  {
> @@ -1250,6 +1329,7 @@ static BlockDriver bdrv_nvme = {
>  .bdrv_co_pwritev  = nvme_co_pwritev,
>  
>  .bdrv_co_pwrite_zeroes= nvme_co_pwrite_zeroes,
> +.bdrv_co_pdiscard = nvme_co_pdiscard,
>  
>  .bdrv_co_flush_to_disk= nvme_co_flush,
>  .bdrv_reopen_prepare  = nvme_reopen_prepare,
> diff --git a/block/trace-events b/block/trace-events
> index 943a58569f..e55ac5c40b 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -148,6 +148,8 @@ nvme_write_zeros(void *s, uint64_t offset, uint64_t 
> bytes, int flags) "s %p offs
>  nvme_qiov_unaligned(const void *qiov, int n, void *base, size_t size, int 
> align) "qiov %p n %d base %p size 0x%zx align 0x%x"
>  nvme_prw_buffered(void *s, uint64_t offset, uint64_t bytes, int niov, int 
> is_write) "s %p offset %"PRId64" bytes %"PRId64" niov %d is_write %d"
>  nvme_rw_done(void *s, int is_write, uint64_t offset, uint64_t bytes, int 
> ret) "s %p is_write %d offset %"PRId64" bytes %"PRId64" ret %d"
> +nvme_dsm(void *s, uint64_t offset, uint64_t bytes) "s %p offset %"PRId64" 
> bytes %"PRId64""
> +nvme_dsm_done(void *s, uint64_t offset, uint64_t bytes, int ret) "s %p 
> offset %"PRId64" bytes %"PRId64" ret %d"
>  nvme_dma_map_flush(void *s) "s %p"
>  nvme_free_req_queue_wait(void *q) "q %p"
>  nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s 
> %p cmd %p req %p qiov %p entries %d"
> -- 
> 2.17.2
> 




Re: [Qemu-devel] [PATCH v2 4/5] block/nvme: add support for write zeros

2019-06-05 Thread Fam Zheng
On Wed, 04/17 22:53, Maxim Levitsky wrote:
> Signed-off-by: Maxim Levitsky 
> ---
>  block/nvme.c | 69 +++-
>  block/trace-events   |  1 +
>  include/block/nvme.h | 19 +++-
>  3 files changed, 87 insertions(+), 2 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 0b1da54574..35b925899f 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -109,6 +109,8 @@ typedef struct {
>  uint64_t max_transfer;
>  bool plugged;
>  
> +bool supports_write_zeros;
> +
>  CoMutex dma_map_lock;
>  CoQueue dma_flush_queue;
>  
> @@ -457,6 +459,10 @@ static void nvme_identify(BlockDriverState *bs, int 
> namespace, Error **errp)
>  s->max_transfer = MIN_NON_ZERO(s->max_transfer,
>s->page_size / sizeof(uint64_t) * s->page_size);
>  
> +
> +

Too many blank lines here.

> +s->supports_write_zeros = (idctrl->oncs & NVME_ONCS_WRITE_ZEROS) != 0;
> +
>  memset(resp, 0, 4096);
>  
>  cmd.cdw10 = 0;
> @@ -469,6 +475,11 @@ static void nvme_identify(BlockDriverState *bs, int 
> namespace, Error **errp)
>  s->nsze = le64_to_cpu(idns->nsze);
>  lbaf = &idns->lbaf[NVME_ID_NS_FLBAS_INDEX(idns->flbas)];
>  
> +if (NVME_ID_NS_DLFEAT_WRITE_ZEROS(idns->dlfeat) &&
> +NVME_ID_NS_DLFEAT_READ_BEHAVIOR(idns->dlfeat) ==
> +NVME_ID_NS_DLFEAT_READ_BEHAVIOR_ZEROS)
> +bs->supported_write_flags |= BDRV_REQ_MAY_UNMAP;
> +
>  if (lbaf->ms) {
>  error_setg(errp, "Namespaces with metadata are not yet supported");
>  goto out;
> @@ -763,6 +774,8 @@ static int nvme_file_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  int ret;
>  BDRVNVMeState *s = bs->opaque;
>  
> +bs->supported_write_flags = BDRV_REQ_FUA;
> +
>  opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
>  qemu_opts_absorb_qdict(opts, options, &error_abort);
>  device = qemu_opt_get(opts, NVME_BLOCK_OPT_DEVICE);
> @@ -791,7 +804,6 @@ static int nvme_file_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  goto fail;
>  }
>  }
> -bs->supported_write_flags = BDRV_REQ_FUA;
>  return 0;
>  fail:
>  nvme_close(bs);
> @@ -1080,6 +1092,58 @@ static coroutine_fn int nvme_co_flush(BlockDriverState 
> *bs)
>  }
>  
>  
> +static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
> +int64_t offset, int bytes, BdrvRequestFlags flags)
> +{
> +BDRVNVMeState *s = bs->opaque;
> +NVMeQueuePair *ioq = s->queues[1];
> +NVMeRequest *req;
> +
> +if (!s->supports_write_zeros) {
> +return -ENOTSUP;
> +}

Declaring local variables below statements is not allowed per the coding style.

> +
> +uint32_t cdw12 = ((bytes >> s->blkshift) - 1) & 0xFFFF;
> +
> +NvmeCmd cmd = {
> +.opcode = NVME_CMD_WRITE_ZEROS,
> +.nsid = cpu_to_le32(s->nsid),
> +.cdw10 = cpu_to_le32((offset >> s->blkshift) & 0xFFFFFFFF),
> +.cdw11 = cpu_to_le32(((offset >> s->blkshift) >> 32) & 0xFFFFFFFF),
> +};
> +
> +NVMeCoData data = {
> +.ctx = bdrv_get_aio_context(bs),
> +.ret = -EINPROGRESS,
> +};
> +
> +if (flags & BDRV_REQ_MAY_UNMAP) {
> +cdw12 |= (1 << 25);
> +}
> +
> +if (flags & BDRV_REQ_FUA) {
> +cdw12 |= (1 << 30);
> +}
> +
> +cmd.cdw12 = cpu_to_le32(cdw12);
> +
> +trace_nvme_write_zeros(s, offset, bytes, flags);
> +assert(s->nr_queues > 1);
> +req = nvme_get_free_req(ioq);
> +assert(req);
> +
> +nvme_submit_command(s, ioq, req, &cmd, nvme_rw_cb, &data);
> +
> +data.co = qemu_coroutine_self();
> +while (data.ret == -EINPROGRESS) {
> +qemu_coroutine_yield();
> +}
> +
> +trace_nvme_rw_done(s, true, offset, bytes, data.ret);
> +return data.ret;
> +}
> +
> +
>  static int nvme_reopen_prepare(BDRVReopenState *reopen_state,
> BlockReopenQueue *queue, Error **errp)
>  {
> @@ -1184,6 +1248,9 @@ static BlockDriver bdrv_nvme = {
>  
>  .bdrv_co_preadv   = nvme_co_preadv,
>  .bdrv_co_pwritev  = nvme_co_pwritev,
> +
> +.bdrv_co_pwrite_zeroes= nvme_co_pwrite_zeroes,
> +
>  .bdrv_co_flush_to_disk= nvme_co_flush,
>  .bdrv_reopen_prepare  = nvme_reopen_prepare,
>  
> diff --git a/block/trace-events b/block/trace-events
> index 7335a42540..943a58569f 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -144,6 +144,7 @@ nvme_submit_command_raw(int c0, int c1, int c2, int c3, 
> int c4, int c5, int c6,
>  nvme_handle_event(void *s) "s %p"
>  nvme_poll_cb(void *s) "s %p"
>  nvme_prw_aligned(void *s, int is_write, uint64_t offset, uint64_t bytes, int 
> flags, int niov) "s %p is_write %d offset %"PRId64" bytes %"PRId64" flags %d 
> niov %d"
> +nvme_write_zeros(void *s, uint64_t offset, uint64_t bytes, int flags) "s %p 
> offset %"PRId64" bytes %"PRId64" flags %d"
>  nvme_qiov_unaligned(const void *qiov, int n, void *base, 

Re: [Qemu-devel] [PATCH] block/file-posix: ignore fail on unlock bytes

2019-03-28 Thread Fam Zheng
On Wed, 03/27 15:49, Vladimir Sementsov-Ogievskiy wrote:
> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
> loosening permissions. However file-locking operations may fail even
> in this case, for example on NFS. And this leads to Qemu crash.
> 
> Let's ignore such errors, as we do already on permission update commit
> and abort.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/file-posix.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index db4cccbe51..403e67fe90 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -815,6 +815,20 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>  
>  switch (op) {
>  case RAW_PL_PREPARE:
> +if ((s->perm | new_perm) == s->perm &&
> +(~s->shared_perm | ~new_perm) == ~s->shared_perm)
> +{
> +/*
> + * We are going to unlock bytes, it should not fail. If fail,
> + * just report it and ignore, like we do for ABORT and COMMIT
> + * anyway.
> + */
> +ret = raw_check_lock_bytes(s->fd, new_perm, new_shared, 
> &local_err);
> +if (local_err) {
> +error_report_err(local_err);
> +}
> +return 0;
> +}
>  ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>         ~s->shared_perm | ~new_shared,
> false, errp);
> -- 
> 2.18.0
> 
> 

Reviewed-by: Fam Zheng 




Re: [Qemu-devel] [PATCH] docker: trivial changes to `make docker` help

2019-03-24 Thread Fam Zheng



> On Mar 22, 2019, at 05:25, Wainer dos Santos Moschetta  
> wrote:
> 
> Apply double quotes and period punctuation uniformly.
> 
> Signed-off-by: Wainer dos Santos Moschetta 
> ---
> tests/docker/Makefile.include | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
> index 60314d293a..c0e1bf57a3 100644
> --- a/tests/docker/Makefile.include
> +++ b/tests/docker/Makefile.include
> @@ -151,15 +151,15 @@ docker:
>   @echo
>   @echo 'docker:  Print this help.'
>   @echo 'docker-all-tests:Run all image/test combinations.'
> - @echo 'docker-TEST: Run TEST on all image combinations.'
> + @echo 'docker-TEST: Run "TEST" on all image combinations.'
>   @echo 'docker-clean:Kill and remove residual docker testing 
> containers.'
>   @echo 'docker-TEST@IMAGE:   Run "TEST" in container "IMAGE".'
>   @echo ' Note: "TEST" is one of the listed test 
> name,'
>   @echo ' or a script name under 
> $$QEMU_SRC/tests/docker/;'
> - @echo ' "IMAGE" is one of the listed container 
> name."'
> + @echo ' "IMAGE" is one of the listed container 
> name.'
>   @echo 'docker-image:Build all images.'
>   @echo 'docker-image-IMAGE:  Build image "IMAGE".'
> - @echo 'docker-run:  For manually running a "TEST" with 
> "IMAGE"'
> + @echo '    docker-run:  For manually running a "TEST" with 
> "IMAGE".'
>   @echo
>   @echo 'Available container images:'
>   @echo '$(DOCKER_IMAGES)'
> -- 
> 2.20.1
> 
> 

Reviewed-by: Fam Zheng 





Re: [Qemu-devel] Combining synchronous and asynchronous IO

2019-03-17 Thread Fam Zheng



> On Mar 15, 2019, at 01:31, Sergio Lopez  wrote:
> 
> Hi,
> 
> Our current AIO path does a great job at unloading the work from the VM,
> and combined with IOThreads provides a good performance in most
> scenarios. But it also comes with its costs, in both a longer execution
> path and the need of the intervention of the scheduler at various
> points.
> 
> There's one particular workload that suffers from this cost, and that's
> when you have just 1 or 2 cores on the Guest issuing synchronous
> requests. This happens to be a pretty common workload for some DBs and,
> in a general sense, on small VMs.
> 
> I did a quick'n'dirty implementation on top of virtio-blk to get some
> numbers. This comes from a VM with 4 CPUs running on an idle server,
> with a secondary virtio-blk disk backed by a null_blk device with a
> simulated latency of 30us.
> 
> - Average latency (us)
> 
> 
> |        | AIO+iothread | SIO+iothread |
> | 1 job  |  70  |  55  |
> | 2 jobs |  83  |  82  |
> | 4 jobs |  90  | 159  |
> 
> 
> In this case the intuition matches the reality, and synchronous IO wins
> when there's just 1 job issuing the requests, while it loses hard when
> there are 4.
> 
> While my first thought was implementing this as a tunable, turns out we
> have a hint about the nature of the workload in the number of the
> requests in the VQ. So I updated the code to use SIO if there's just 1
> request and AIO otherwise, with these results:
> 
> ---
> |        | AIO+iothread | SIO+iothread | AIO+SIO+iothread |
> | 1 job  |  70  |  55  |55|
> | 2 jobs |  83  |  82  |78|
> | 4 jobs |  90  | 159  |90|
> ---
> 
> This data makes me think this is something worth pursuing, but I'd like
> to hear your opinion on it.

Nice. In many cases coroutines just forward the raw read/write to the raw file
(no qcow2 LBA translation, backup, throttling, etc. in the data path), so being
able to transparently (and dynamically, since that condition can change at any
time for any request) bypass the block layer would be a very interesting idea
to explore. The challenge is how not to totally break existing features (e.g.
live snapshot and everything).

> 
> Thanks,
> Sergio.
> 





Re: [Qemu-devel] State of QEMU CI as we enter 4.0

2019-03-17 Thread Fam Zheng



> On Mar 15, 2019, at 23:12, Stefan Hajnoczi  wrote:
> 
> On Thu, Mar 14, 2019 at 03:57:06PM +, Alex Bennée wrote:
>> As we approach stabilisation for 4.0 I thought it would be worth doing a
>> review of the current state of CI and stimulate some discussion of where
>> it is working for us and what could be improved.
> 
> Thanks for this summary and for all the work that is being put into CI.
> 
> How should all sub-maintainers be checking their pull requests?
> 
> We should have information and a strict policy on minimum testing of
> pull requests.  Right now I imagine it varies a lot between
> sub-maintainers.
> 
> For my block pull requests I run qemu-iotests locally and also push to
> GitHub to trigger Travis CI.

Well, long story short, by pushing to gitlab.

If the patchew importer is changed to push to a gitlab repo that is watched by
the same set of gitlab runners (a setup supported by gitlab CI), all posted
patches can be tested the same way.

It’s a natural next step after we figure out how to automate things just for 
Peter's manual pre-merge testing, as long as the machine resources allow 
testing more subjects.

Testing private branches is much more costly, depending on the test set size
and the number of developers. Maybe it will have to be limited to maintainer
branches first.

Fam

> 
> Stefan





Re: [Qemu-devel] [PATCH] ci: store Patchew configuration in the tree

2019-03-15 Thread Fam Zheng
> +  #!/bin/sh
> +  test "$(uname -m)" = ppc64
> +s390x: |
> +  #!/bin/sh
> +  test "$(uname -m)" = s390x
> +
> +  tests:
> +asan:
> +  requirements: docker
> +  script: |
> +#!/bin/bash
> +time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 
> NETWORK=1
> +
> +docker-clang@ubuntu:
> +  requirements: docker
> +  script: |
> +#!/bin/bash
> +time make docker-test-clang@ubuntu SHOW_ENV=1 J=14 NETWORK=1
> +
> +docker-mingw@fedora:
> +  requirements: docker
> +  script: |
> +#!/bin/bash
> +time make docker-test-mingw@fedora SHOW_ENV=1 J=14 NETWORK=1
> +
> +docker-quick@centos7:
> +  requirements: docker
> +  script: |
> +#!/bin/bash
> +time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> +
> +checkpatch:
> +  script: |
> +#!/bin/bash
> +git rev-parse base > /dev/null || exit 0
> +git config --local diff.renamelimit 0
> +git config --local diff.renames True
> +git config --local diff.algorithm histogram
> +./scripts/checkpatch.pl --mailback base..
> +
> +FreeBSD:
> +  enabled: false
> +  requirements: x86-kvm
> +  script: |
> +#!/bin/bash
> +if qemu-system-x86_64 --help >/dev/null 2>&1; then
> +  QEMU=qemu-system-x86_64
> +elif /usr/libexec/qemu-kvm --help >/dev/null 2>&1; then
> +  QEMU=/usr/libexec/qemu-kvm
> +else
> +  exit 1
> +fi
> +make vm-build-freebsd J=14 QEMU=$QEMU
> +exit 0
> +
> +ppcle:
> +  enabled: false
> +  requirements: ppcle
> +  script: |
> +#!/bin/bash
> +INSTALL=$PWD/install
> +BUILD=$PWD/build
> +mkdir -p $BUILD $INSTALL
> +SRC=$PWD
> +cd $BUILD
> +$SRC/configure --prefix=$INSTALL
> +make -j14
> +
> +ppcbe:
> +  enabled: false
> +  requirements: ppcbe
> +  script: |
> +#!/bin/bash
> +INSTALL=$PWD/install
> +BUILD=$PWD/build
> +mkdir -p $BUILD $INSTALL
> +SRC=$PWD
> +cd $BUILD
> +$SRC/configure --prefix=$INSTALL
> +make -j14
> +
> +s390x:
> +  enabled: false
> +  requirements: ppcle
> +  script: |
> +#!/bin/bash
> +INSTALL=$PWD/install
> +BUILD=$PWD/build
> +mkdir -p $BUILD $INSTALL
> +SRC=$PWD
> +cd $BUILD
> +$SRC/configure --prefix=$INSTALL
> +make -j14
> -- 
> 2.20.1
> 
> 

Reviewed-by: Fam Zheng 





Re: [Qemu-devel] State of QEMU CI as we enter 4.0

2019-03-15 Thread Fam Zheng



> On Mar 15, 2019, at 17:58, Alex Bennée  wrote:
> 
> 
> Fam Zheng  writes:
> 
>>> On Mar 15, 2019, at 16:57, Alex Bennée  wrote:
>>> 
>>> I had installed the gitlab-runner from the Debian repo but it was out
>>> of date and didn't seem to work correctly.
>> 
>> If there can be a sidecar x86 box next to the test bot, it can be the
>> controller node which runs gitlab-runner, the test script (in
>> .gitlab-ci.yml) can then sshs into the actual env to run test
>> commands.
> 
> Sure although that just adds complexity compared to spinning up a box in
> the cloud ;-)

In the middle ground is one controller node and a number of heterogeneous
boxes it knows how to control with ssh.

(BTW the patchew tester only relies on vanilla python3 to work, though it
clearly suffers from insufficient manpower given the SLA we'll need on the
merge test. It's unfortunate that gitlab-runner is a binary.)

Fam



Re: [Qemu-devel] State of QEMU CI as we enter 4.0

2019-03-15 Thread Fam Zheng



> On Mar 15, 2019, at 16:57, Alex Bennée  wrote:
> 
> I had installed the gitlab-runner from the Debian repo but it was out
> of date and didn't seem to work correctly.

If there can be a sidecar x86 box next to the test bot, it can be the
controller node that runs gitlab-runner; the test script (in .gitlab-ci.yml)
can then ssh into the actual environment to run the test commands.

Fam



Re: [Qemu-devel] State of QEMU CI as we enter 4.0

2019-03-14 Thread Fam Zheng



> On Mar 15, 2019, at 02:22, Peter Maydell  wrote:
> 
> On Thu, 14 Mar 2019 at 15:57, Alex Bennée  wrote:
>> Testing in the Cloud
>> 
>> 
>> After BuildBot went out-of-service we have been relying heavily on Travis
>> as our primary CI platform. This has been creaking somewhat under the
>> strain and while we have a large test matrix its coverage is fairly
>> Ubuntu/x86 centric. However in recent months we've expanded and we now
>> have:
>> 
>>  - Shippable, cross compilers - catches a lot of 32/64 bit isms
>>  - Cirrus, FreeBSD and MacOS builds
>>  - GitLab, Alternative x86/Debian - iotests
> 
> Are any of these capable of replacing my ad-hoc collection
> of build test systems for testing merges ? I would quite like
> to be able to do that, because it would make it easier for
> other people to take over the process of handling pull requests
> when I'm away.
> 
> I think the main requirements for that would be:
> * covers full range of hosts[*]
> * can be asked to do a test build of a merge before
>   I push it to master
> * reliably completes all builds within say 90 minutes
>   of being asked to start
> 
> [+] I currently test:
> - windows crossbuilds
> - S390, AArch32, AArch64, PPC64 Linux
>   (SPARC currently disabled because of the migration-test flakiness)
> - OSX
> - FreeBSD, OpenBSD, NetBSD via the tests/vm setup
> - various x86-64 configs: from-clean; debug; clang; TCI; no-tcg;
>   linux-static (including 'make check-tcg’)

I think the gitlab CI architecture is quite capable of doing what you want
here. Some effort will be needed to set up the gitlab-runners in each of the
above environments, and I expect tweaking will be needed to get the automation
smooth, but it is fairly straightforward and manageable:

https://docs.gitlab.com/runner/

Fam

> 
> thanks
> -- PMM
> 





Re: [Qemu-devel] [Qemu-block] [PATCH] tcmu: Introduce qemu-tcmu utility

2019-03-12 Thread Fam Zheng



> On Mar 12, 2019, at 18:03, Kevin Wolf  wrote:
> 
> Am 12.03.2019 um 03:18 hat Fam Zheng geschrieben:
>>> On Mar 11, 2019, at 19:06, Kevin Wolf  wrote:
>>> Am 09.03.2019 um 02:46 hat Yaowei Bai geschrieben:
>>>> Thanks for explaining the background. It comes to my mind that actually we
>>>> talked about these two cases with Fam a bit long time ago and decided to
>>>> support both these two cases. The reason why we implement case2 first is
>>>> currently we care more about exporting new opened images and it's a bit
>>>> more convenient, exporting from a VM or QMP can be added in the later
>>>> release. Do you think it's reasonable/acceptable that we support both
>>>> cases and use case2 for normal new opened images and case1 for the
>>>> circumstances you mention above?
>>> 
>>> I would like to avoid a second code path because it comes with a
>>> maintenance cost.
>>> 
>>> Experience also tells that adding a new way to parse option strings will
>>> come back at us later because it we must always maintain compatibility
>>> with yet another format.
>> 
>> If the rule is that cfgstr strictly follows -blockdev syntax and the
>> parsing code is mostly shared, it shouldn’t be a problem, right? I
>> suppose this will limit the expressiveness of cfgstr but might be a
>> reasonable tradeoff between implementation complexity, user
>> friendliness and interface consistency.
> 
> Yes, if we could pass the cfgstr directly to an existing parser, that
> would make it less bad at least.
> 
> Of course, if we directly pass cfgstr to the -blockdev parser, it also
> means that we can't have additional options for configuring the
> BlockBackend (which could be part of the -export command line option for
> qemu-tcmu).
> 
> That could be addressed by wrapping another QAPI object around it that
> only contain BlockdevOptions as one field, but then you have to prefix
> every block node option if you use keyval visitor syntax.
> 
>> Another possibility is to use json: format in cfgstr for anything more
>> complex than a single -blockdev.
> 
> Parsing like -blockdev already includes JSON syntax (which is necessary
> to get the full expressiveness).
> 
> If you want to support the equivalent of multiple -blockdevs, you'd need
> to wrap the BlockdevOptions in a QAPI list.
> 
>>> So I would prefer not doing this and just passing command line options
>>> to qemu-tcmu, which can reuse the already existing code paths.
>> 
>> I think the effect of not supporting adding new blockdev from cfgstr
>> is that we have to resort to QMP to allow hot-adding targets. It’s not
>> necessarily bad, though I’m not sure hwo that aligns with Yaowei’s
>> development plan.
> 
> This is true, but we need a way to do this from QMP anyway. So the
> question is whether we want to introduce a second way to achieve the
> same thing a bit more conveniently. But I expect that hot-adding is
> something that is usually done with management software anyway, so does
> the convenience actually buy us much?

The real difference is whether the existing management software that would
adopt qemu-tcmu is built around the targetcli family or around QMP. I think
such management software must understand the targetcli interface to work;
talking QMP is an additional burden.

So IMO cfgstr can ideally be the only channel that the management software
interacts with, if we can reuse the existing QMP/QAPI code well enough.

Fam

> 
> Kevin





Re: [Qemu-devel] [Qemu-block] [PATCH] tcmu: Introduce qemu-tcmu utility

2019-03-11 Thread Fam Zheng



> On Mar 11, 2019, at 19:06, Kevin Wolf  wrote:
> 
> Am 09.03.2019 um 02:46 hat Yaowei Bai geschrieben:
>> Thanks for explaining the background. It comes to my mind that actually we
>> talked about these two cases with Fam a bit long time ago and decided to
>> support both these two cases. The reason why we implement case2 first is
>> currently we care more about exporting new opened images and it's a bit
>> more convenient, exporting from a VM or QMP can be added in the later
>> release. Do you think it's reasonable/acceptable that we support both
>> cases and use case2 for normal new opened images and case1 for the
>> circumstances you mention above?
> 
> I would like to avoid a second code path because it comes with a
> maintenance cost.
> 
> Experience also tells that adding a new way to parse option strings will
> come back at us later because it we must always maintain compatibility
> with yet another format.

If the rule is that cfgstr strictly follows -blockdev syntax and the parsing 
code is mostly shared, it shouldn’t be a problem, right? I suppose this will 
limit the expressiveness of cfgstr but might be a reasonable tradeoff between 
implementation complexity, user friendliness and interface consistency.

Another possibility is to use json: format in cfgstr for anything more complex 
than a single -blockdev.

> 
> So I would prefer not doing this and just passing command line options
> to qemu-tcmu, which can reuse the already existing code paths.

I think the effect of not supporting the addition of new blockdevs from cfgstr
is that we have to resort to QMP to allow hot-adding targets. That's not
necessarily bad, though I'm not sure how that aligns with Yaowei's development
plan.

Fam


> 
> Kevin
> 





Re: [Qemu-devel] [PATCH v4] tests: vm: auto_install OpenBSD

2019-02-27 Thread Fam Zheng
On Tue, 01/22 16:12, Daniel P. Berrangé wrote:
> > can't you just refresh that pre-built image to have SDL 2, based on your
> > auto-install patches, so we don't have to wait for that to merge.
> 
> Please could we just get this pre-built img updated so that we are
> not blocked on the SDL 1 removal anymore.

Oops. I was seriously distracted from the external mail folders recently...
I've updated the image; please test it.

Fam





Re: [Qemu-devel] [PATCH v4] tests: vm: auto_install OpenBSD

2019-01-13 Thread Fam Zheng



> On Jan 9, 2019, at 21:59, Daniel P. Berrangé  wrote:
> 
> On Wed, Jan 09, 2019 at 01:55:04PM +, Daniel P. Berrangé wrote:
>> On Wed, Nov 14, 2018 at 05:58:07PM +0800, Fam Zheng wrote:
>>> On Sun, 11/11 18:20, Brad Smith wrote:
>>>> ping.
>>> 
>>> Queued. Will send a pull request soon.
>> 
>> ping, this seems to have got lost, and is blocking us deleting SDL 1.2
>> support from QEMU.
> 
> Sigh, never mind, I found the PR is here:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00732.html

I had to drop it from the queue because Peter didn’t like the QEMU version 
requirement (for the slirp tftp parameters). :(

What is the blocker for deleting SDL 1.2? Is it breaking tests/vm/openbsd? If
so, maybe we can add --disable-sdl to its configure command.

Fam

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
> 





[Qemu-devel] [PULL 1/2] block/nvme: optimize the performance of nvme driver based on vfio-pci

2019-01-08 Thread Fam Zheng
From: Li Feng 

When the IO size is larger than 2 pages, we move the pointer one by
one in the pagelist, which is inefficient.

This is a simple benchmark result:

Before:
$ qemu-io -c 'write 0 1G' nvme://0000:00:04.0/1

wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.41 (424.504 MiB/sec and 0.4146 ops/sec)

 $ qemu-io -c 'read 0 1G' nvme://0000:00:04.0/1

read 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.03 (503.055 MiB/sec and 0.4913 ops/sec)

After:
$ qemu-io -c 'write 0 1G' nvme://0000:00:04.0/1

wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.17 (471.517 MiB/sec and 0.4605 ops/sec)

 $ qemu-io -c 'read 0 1G' nvme://0000:00:04.0/1

read 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:01.94 (526.770 MiB/sec and 0.5144 ops/sec)

Signed-off-by: Li Feng 
Message-Id: <20181101103807.25862-1-lifeng1...@gmail.com>
Signed-off-by: Fam Zheng 
---
 block/nvme.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 29294038fc..982097b5b1 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -837,7 +837,7 @@ try_map:
 }
 
 for (j = 0; j < qiov->iov[i].iov_len / s->page_size; j++) {
-pagelist[entries++] = iova + j * s->page_size;
+pagelist[entries++] = cpu_to_le64(iova + j * s->page_size);
 }
 trace_nvme_cmd_map_qiov_iov(s, i, qiov->iov[i].iov_base,
 qiov->iov[i].iov_len / s->page_size);
@@ -850,20 +850,16 @@ try_map:
 case 0:
 abort();
 case 1:
-cmd->prp1 = cpu_to_le64(pagelist[0]);
+cmd->prp1 = pagelist[0];
 cmd->prp2 = 0;
 break;
 case 2:
-cmd->prp1 = cpu_to_le64(pagelist[0]);
-cmd->prp2 = cpu_to_le64(pagelist[1]);;
+cmd->prp1 = pagelist[0];
+cmd->prp2 = pagelist[1];
 break;
 default:
-cmd->prp1 = cpu_to_le64(pagelist[0]);
-cmd->prp2 = cpu_to_le64(req->prp_list_iova);
-for (i = 0; i < entries - 1; ++i) {
-pagelist[i] = cpu_to_le64(pagelist[i + 1]);
-}
-pagelist[entries - 1] = 0;
+cmd->prp1 = pagelist[0];
+cmd->prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
 break;
 }
 trace_nvme_cmd_map_qiov(s, cmd, req, qiov, entries);
-- 
2.11.0





[Qemu-devel] [PULL 2/2] docker: Use a stable snapshot for Debian Sid

2019-01-08 Thread Fam Zheng
From: Philippe Mathieu-Daudé 

The Debian Sid repository is not guaranteed to be stable, as its
'unstable' name suggests :)

To allow quick testing, Debian maintainers might push packages
various times a day. Sometimes package dependencies might break,
which is annoying when using this repository for stable development
(which is not recommended, but Sid provides edge packages we use
for testing).

Debian provides repositories snapshots which are suitable for our
use. Pick a recent date that works. When required, update to newer
releases will be easy.

This fixes current issues with this image:

  $ make docker-image-debian-sid
  [...]
  The following packages have unmet dependencies:
   build-essential : Depends: dpkg-dev (>= 1.17.11) but it is not going to be 
installed
   git : Depends: perl but it is not going to be installed
 Depends: liberror-perl but it is not going to be installed
   pkg-config : Depends: libdpkg-perl but it is not going to be installed
   texinfo : Depends: perl (>= 5.26.2-6) but it is not going to be installed
 Depends: libtext-unidecode-perl but it is not going to be installed
 Depends: libxml-libxml-perl but it is not going to be installed
  E: Unable to correct problems, you have held broken packages.

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20181101183705.5422-1-phi...@redhat.com>
Signed-off-by: Fam Zheng 
---
 tests/docker/dockerfiles/debian-sid.docker | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/docker/dockerfiles/debian-sid.docker 
b/tests/docker/dockerfiles/debian-sid.docker
index 9a3d168705..4e4cda0ba5 100644
--- a/tests/docker/dockerfiles/debian-sid.docker
+++ b/tests/docker/dockerfiles/debian-sid.docker
@@ -13,6 +13,10 @@
 
 FROM debian:sid-slim
 
+# Use a snapshot known to work (see http://snapshot.debian.org/#Usage)
+ENV DEBIAN_SNAPSHOT_DATE "20181030"
+RUN sed -i "s%^deb \(https\?://\)deb.debian.org/debian/\? \(.*\)%deb 
[check-valid-until=no] 
\1snapshot.debian.org/archive/debian/${DEBIAN_SNAPSHOT_DATE} \2%" 
/etc/apt/sources.list
+
 # Duplicate deb line as deb-src
 RUN cat /etc/apt/sources.list | sed "s/^deb\ /deb-src /" >> 
/etc/apt/sources.list
 
-- 
2.11.0






[Qemu-devel] [PULL 0/2] Block/testing patches

2019-01-08 Thread Fam Zheng
The following changes since commit 147923b1a901a0370f83a0f4c58ec1baffef22f0:

  Merge remote-tracking branch 'remotes/kraxel/tags/usb-20190108-pull-request' 
into staging (2019-01-08 16:07:32 +)

are available in the git repository at:

  https://github.com/famz/qemu.git tags/staging-pull-request

for you to fetch changes up to 4ce58d861bf740bb893bd28b974675ad3d9f98b5:

  docker: Use a stable snapshot for Debian Sid (2019-01-09 09:38:34 +0800)


Block/testing patches

v2: Fix URL.
Drop BSD patch.



Li Feng (1):
  block/nvme: optimize the performance of nvme driver based on vfio-pci

Philippe Mathieu-Daudé (1):
  docker: Use a stable snapshot for Debian Sid

 block/nvme.c   | 16 ++--
 tests/docker/dockerfiles/debian-sid.docker |  4 
 2 files changed, 10 insertions(+), 10 deletions(-)

-- 
2.11.0






[Qemu-devel] [PULL 3/3] tests: vm: auto_install OpenBSD

2019-01-05 Thread Fam Zheng
From: Fam Zheng 

Upgrade OpenBSD to 6.4 using auto_install. In particular, drop SDL1 and
include SDL2.

Also do the build in $HOME since both /var/tmp and /tmp are tmpfs with
limited capacity.

Signed-off-by: Fam Zheng 

Message-Id: <20181031025712.18602-1-f...@redhat.com>
Signed-off-by: Fam Zheng 
---
 tests/vm/basevm.py |  6 ++--
 tests/vm/openbsd   | 85 ++
 2 files changed, 76 insertions(+), 15 deletions(-)

diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
index 5caf77d6b8..6fb446d4c5 100755
--- a/tests/vm/basevm.py
+++ b/tests/vm/basevm.py
@@ -68,8 +68,6 @@ class BaseVM(object):
 self._args = [ \
 "-nodefaults", "-m", "4G",
 "-cpu", "max",
-"-netdev", "user,id=vnet,hostfwd=:127.0.0.1:0-:22",
-"-device", "virtio-net-pci,netdev=vnet",
 "-vnc", "127.0.0.1:0,to=20",
 "-serial", "file:%s" % os.path.join(self._tmpdir, "serial.out")]
 if vcpus and vcpus > 1:
@@ -146,8 +144,10 @@ class BaseVM(object):
 "-device",
 "virtio-blk,drive=%s,serial=%s,bootindex=1" % 
(name, name)]
 
-def boot(self, img, extra_args=[]):
+def boot(self, img, extra_args=[], extra_usernet_args=""):
 args = self._args + [
+"-netdev", "user,id=vnet,hostfwd=:127.0.0.1:0-:22" + 
extra_usernet_args,
+"-device", "virtio-net-pci,netdev=vnet",
 "-device", "VGA",
 "-drive", "file=%s,if=none,id=drive0,cache=writeback" % img,
 "-device", "virtio-blk,drive=drive0,bootindex=0"]
diff --git a/tests/vm/openbsd b/tests/vm/openbsd
index cfe0572c59..99a7e98d80 100755
--- a/tests/vm/openbsd
+++ b/tests/vm/openbsd
@@ -14,6 +14,9 @@
 import os
 import sys
 import subprocess
+import time
+import atexit
+import tempfile
 import basevm
 
 class OpenBSDVM(basevm.BaseVM):
@@ -21,25 +24,83 @@ class OpenBSDVM(basevm.BaseVM):
 arch = "x86_64"
 BUILD_SCRIPT = """
 set -e;
-rm -rf /var/tmp/qemu-test.*
-cd $(mktemp -d /var/tmp/qemu-test.XX);
+rm -rf $HOME/qemu-test.*
+cd $(mktemp -d $HOME/qemu-test.XX);
 tar -xf /dev/rsd1c;
-./configure --cc=x86_64-unknown-openbsd6.1-gcc-4.9.4 
--python=python2.7 {configure_opts};
+./configure {configure_opts};
 gmake --output-sync -j{jobs} {verbose};
 # XXX: "gmake check" seems to always hang or fail
 #gmake --output-sync -j{jobs} check {verbose};
 """
 
+def _install_os(self, img):
+tmpdir = tempfile.mkdtemp()
+pxeboot = 
self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/pxeboot;,
+
sha256sum="d87ab39d941ff926d693943a927585945456ccedb76ea504a251b4b93cd4c266")
+bsd_rd = 
self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/bsd.rd;,
+
sha256sum="89505c683cbcd75582fe475e847ed53d89e2b8180c3e3d61f4eb4b76b5e11f5c")
+install = 
self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/install64.iso;,
+
sha256sum='81833b79e23dc0f961ac5fb34484bca66386deb3181ddb8236870fa4f488cdd2')
+subprocess.check_call(["qemu-img", "create", img, "32G"])
+subprocess.check_call(["cp", pxeboot, os.path.join(tmpdir, 
"auto_install")])
+subprocess.check_call(["cp", bsd_rd, os.path.join(tmpdir, "bsd")])
+
+self._gen_install_conf(tmpdir)
+# BOOTP filename being auto_install makes sure OpenBSD installer
+# not prompt for "auto install mode"
+usernet_args = ",tftp=%s,bootfile=/auto_install" % tmpdir
+usernet_args += ",tftp-server-name=10.0.2.4"
+usernet_args += ",guestfwd=tcp:10.0.2.4:80-cmd:cat %s" % \
+os.path.join(tmpdir, "install.conf")
+self.boot(img,
+  extra_args=["-boot", "once=n", "-no-reboot",
+  "-cdrom", install],
+  extra_usernet_args=usernet_args)
+self.wait()
+
+def _gen_install_conf(self, tmpdir):
+contents = """\
+HTTP/1.0 200 OK
+
+System hostname = qemu-openbsd
+Password for root = qemupass
+Public ssh key for root = {pub_key}
+Allow root ssh login = yes
+Network interfaces = vio0
+IPv4 address for vio0 = dhcp
+Setup a user = qemu
+Password for user = qemupa

[Qemu-devel] [PULL 1/3] block/nvme: optimize the performance of nvme driver based on vfio-pci

2019-01-05 Thread Fam Zheng
From: Li Feng 

When the IO size is larger than 2 pages, we move the pointer one by
one in the pagelist, which is inefficient.

This is a simple benchmark result:

Before:
$ qemu-io -c 'write 0 1G' nvme://0000:00:04.0/1

wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.41 (424.504 MiB/sec and 0.4146 ops/sec)

 $ qemu-io -c 'read 0 1G' nvme://0000:00:04.0/1

read 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.03 (503.055 MiB/sec and 0.4913 ops/sec)

After:
$ qemu-io -c 'write 0 1G' nvme://0000:00:04.0/1

wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:02.17 (471.517 MiB/sec and 0.4605 ops/sec)

 $ qemu-io -c 'read 0 1G' nvme://0000:00:04.0/1

read 1073741824/1073741824 bytes at offset 0
1 GiB, 1 ops; 0:00:01.94 (526.770 MiB/sec and 0.5144 ops/sec)

Signed-off-by: Li Feng 
Message-Id: <20181101103807.25862-1-lifeng1...@gmail.com>
Signed-off-by: Fam Zheng 
---
 block/nvme.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 29294038fc..982097b5b1 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -837,7 +837,7 @@ try_map:
 }
 
 for (j = 0; j < qiov->iov[i].iov_len / s->page_size; j++) {
-pagelist[entries++] = iova + j * s->page_size;
+pagelist[entries++] = cpu_to_le64(iova + j * s->page_size);
 }
 trace_nvme_cmd_map_qiov_iov(s, i, qiov->iov[i].iov_base,
 qiov->iov[i].iov_len / s->page_size);
@@ -850,20 +850,16 @@ try_map:
 case 0:
 abort();
 case 1:
-cmd->prp1 = cpu_to_le64(pagelist[0]);
+cmd->prp1 = pagelist[0];
 cmd->prp2 = 0;
 break;
 case 2:
-cmd->prp1 = cpu_to_le64(pagelist[0]);
-cmd->prp2 = cpu_to_le64(pagelist[1]);;
+cmd->prp1 = pagelist[0];
+cmd->prp2 = pagelist[1];
 break;
 default:
-cmd->prp1 = cpu_to_le64(pagelist[0]);
-cmd->prp2 = cpu_to_le64(req->prp_list_iova);
-for (i = 0; i < entries - 1; ++i) {
-pagelist[i] = cpu_to_le64(pagelist[i + 1]);
-}
-pagelist[entries - 1] = 0;
+cmd->prp1 = pagelist[0];
+cmd->prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
 break;
 }
 trace_nvme_cmd_map_qiov(s, cmd, req, qiov, entries);
-- 
2.11.0





[Qemu-devel] [PULL 2/3] docker: Use a stable snapshot for Debian Sid

2019-01-05 Thread Fam Zheng
From: Philippe Mathieu-Daudé 

The Debian Sid repository is not guaranteed to be stable, as its
'unstable' name suggests :)

To allow quick testing, Debian maintainers might push packages
various times a day. Sometimes package dependencies might break,
which is annoying when using this repository for stable development
(which is not recommended, but Sid provides edge packages we use
for testing).

Debian provides repositories snapshots which are suitable for our
use. Pick a recent date that works. When required, update to newer
releases will be easy.

This fixes current issues with this image:

  $ make docker-image-debian-sid
  [...]
  The following packages have unmet dependencies:
   build-essential : Depends: dpkg-dev (>= 1.17.11) but it is not going to be 
installed
   git : Depends: perl but it is not going to be installed
 Depends: liberror-perl but it is not going to be installed
   pkg-config : Depends: libdpkg-perl but it is not going to be installed
   texinfo : Depends: perl (>= 5.26.2-6) but it is not going to be installed
 Depends: libtext-unidecode-perl but it is not going to be installed
 Depends: libxml-libxml-perl but it is not going to be installed
  E: Unable to correct problems, you have held broken packages.

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20181101183705.5422-1-phi...@redhat.com>
Signed-off-by: Fam Zheng 
---
 tests/docker/dockerfiles/debian-sid.docker | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/docker/dockerfiles/debian-sid.docker 
b/tests/docker/dockerfiles/debian-sid.docker
index 9a3d168705..4e4cda0ba5 100644
--- a/tests/docker/dockerfiles/debian-sid.docker
+++ b/tests/docker/dockerfiles/debian-sid.docker
@@ -13,6 +13,10 @@
 
 FROM debian:sid-slim
 
+# Use a snapshot known to work (see http://snapshot.debian.org/#Usage)
+ENV DEBIAN_SNAPSHOT_DATE "20181030"
+RUN sed -i "s%^deb \(https\?://\)deb.debian.org/debian/\? \(.*\)%deb 
[check-valid-until=no] 
\1snapshot.debian.org/archive/debian/${DEBIAN_SNAPSHOT_DATE} \2%" 
/etc/apt/sources.list
+
 # Duplicate deb line as deb-src
 RUN cat /etc/apt/sources.list | sed "s/^deb\ /deb-src /" >> 
/etc/apt/sources.list
 
-- 
2.11.0






[Qemu-devel] [PULL 0/3] Block and testing patches

2019-01-05 Thread Fam Zheng
The following changes since commit 9b2e891ec5ccdb4a7d583b77988848282606fdea:

  Merge remote-tracking branch 'remotes/marcel/tags/rdma-pull-request' into 
staging (2018-12-22 11:25:31 +)

are available in the git repository at:

  https://f...@github.com/famz/qemu tags/block-and-testing-pull-request

for you to fetch changes up to 3baa6942169cce0f58b88484ad010742c382480a:

  tests: vm: auto_install OpenBSD (2019-01-02 20:51:15 +0800)


Pull request



Fam Zheng (1):
  tests: vm: auto_install OpenBSD

Li Feng (1):
  block/nvme: optimize the performance of nvme driver based on vfio-pci

Philippe Mathieu-Daudé (1):
  docker: Use a stable snapshot for Debian Sid

 block/nvme.c   | 16 +++---
 tests/docker/dockerfiles/debian-sid.docker |  4 ++
 tests/vm/basevm.py |  6 +--
 tests/vm/openbsd   | 85 +-
 4 files changed, 86 insertions(+), 25 deletions(-)

-- 
2.11.0






Re: [Qemu-devel] [PATCH] docker: Use a stable snapshot for Debian Sid

2019-01-01 Thread Fam Zheng



> On Dec 20, 2018, at 19:20, Philippe Mathieu-Daudé  wrote:
> 
> Hi Fam,
> 
> On 11/2/18 8:24 AM, Fam Zheng wrote:
>> On Fri, Nov 2, 2018 at 3:20 PM Philippe Mathieu-Daudé  
>> wrote:
>>> 
>>> Hi Fam,
>>> 
>>> Thanks for picking this.
>>> 
>>> On Fri, Nov 2, 2018 at 7:48 AM Fam Zheng  wrote:
>>>> On Thu, 11/01 19:37, Philippe Mathieu-Daudé wrote:
>>>>> The Debian Sid repository is not garanteed to be stable, as his
>>>>> 'unstable' name suggest :)
>>> 
>>> There is an error in "my be" -> "might be"...
>>> Do you mind to update the comment:
>>> 
>>>>> To allow quick testing, packages are pushed various time a day,
>>>>> which my be annoying when trying to use it for stable development
>>>>> (which is not recommended, but Sid provides edge packages we use
>>>>> for testing).
>>> 
>>> By:
>>> 
>>> To allow quick testing, Debian maintainers might push packages
>>> various time a day. Sometime package dependencies might break,
>>> which is annoying when using this repository for stable development
>>> (which is not recommended, but Sid provides edge packages we use
>>> for testing).
>> 
>> Sure, updated in my queue.
> 
> It seems your queue never hit master…
> 

Oh... I'll have to check it out later today.

Fam






Re: [Qemu-devel] Patchew down?

2019-01-01 Thread Fam Zheng



> On Dec 30, 2018, at 08:09, Philippe Mathieu-Daudé  wrote:
> 
> Hi,
> 
> I think patchew is having some trouble since at least 1 week:
> new series aren't added.

Thanks for the reminder, Phil. I fixed https on patchew.org but forgot to
switch the importer back from http. Now it's working again.

Fam

> 
> Regards,
> 
> Phil.





Re: [Qemu-devel] Question about aio_poll and glib aio_ctx_dispatch

2018-12-19 Thread Fam Zheng



> On Dec 20, 2018, at 06:58, Li Qiang  wrote:
> 
> Hello Paolo
> 
> Thanks for your kind reply.
> 
> Yes, aio_poll and aio_ctx_dispatch mostly run in different threads, though
> Sometimes they can run in a thread nested from Fam’s slides:
> →http://events17.linuxfoundation.org/sites/events/files/slides/Improving%20the%20QEMU%20Event%20Loop%20-%203.pdf
> 
> So you mean we can poll the same fd in two threads? If so, the 'fd' in
> 'qemu_aio_context' will be dispatched twice. I think there's
> something I'm misunderstanding.
> 
> Let’s take an example. In the ‘v9fs_reset’.
> 
> void v9fs_reset(V9fsState *s)
> {
>VirtfsCoResetData data = { .pdu = { .s = s }, .done = false };
>Coroutine *co;
> 
>while (!QLIST_EMPTY(>active_list)) {
>aio_poll(qemu_get_aio_context(), true);
>}
> 
>co = qemu_coroutine_create(virtfs_co_reset, &data);
>qemu_coroutine_enter(co);
> 
>while (!data.done) {
>aio_poll(qemu_get_aio_context(), true);
>}
> }
> 
> Here we use aio_poll to wait the pending action to complete.
> Here aio_poll will poll the fds in ‘qemu_aio_context’ in vcpu thread,
> However, the main loop is also poll the fds in ‘qemu_aio_context’.
> If some ‘fd’ in ‘qemu_aio_context’ has events, will the two threads be woken up?
> You say both will call aio_dispatch_handlers and timerlistgroup_run_timers.
> But it is the same fd, how can this happen?


I think in this case the BQL is used to synchronize them, so while v9fs_reset 
runs, the main loop doesn't.
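
A minimal toy model of that serialization (not QEMU code; in QEMU the lock is
the BQL, taken with qemu_mutex_lock_iothread(), and the main loop drops it only
while blocked in its poll): two threads poll the same eventfd, but each one
dispatches only while holding a shared mutex, so the handler can never run
twice concurrently. Linux-only; all names are made up. Build with
cc bql_toy.c -o bql_toy -lpthread:

#include <pthread.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;
static int efd;

static void *poller(void *name)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    uint64_t val;

    poll(&pfd, 1, 1000);          /* both threads may wake up here... */
    pthread_mutex_lock(&bql);     /* ...but only one dispatches at a time */
    if (read(efd, &val, sizeof(val)) == sizeof(val)) { /* non-blocking read */
        printf("%s dispatched the event\n", (const char *)name);
    }
    pthread_mutex_unlock(&bql);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    uint64_t one = 1;

    efd = eventfd(0, EFD_NONBLOCK);
    pthread_create(&a, NULL, poller, "main-loop");
    pthread_create(&b, NULL, poller, "vcpu-thread");
    write(efd, &one, sizeof(one)); /* one event, two pollers */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}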

Thanks,
Fam

> 
> Thanks,
> Li Qiang
> 
> 
> From: Paolo Bonzini
> Sent: December 20, 2018, 4:42
> To: Li Qiang; stefa...@redhat.com; f...@euphon.net; Qemu Developers; 李强
> Subject: Re: Question about aio_poll and glib aio_ctx_dispatch
> 
> On 19/12/18 11:05, Li Qiang wrote:
>> Sent it to qemu-devel.
>> 
>> Li Qiang <liq...@gmail.com> wrote on Wednesday, December 19, 2018 at 6:04 PM:
>> 
>>Hello Paolo, Stefan, Fam and all,
>> 
>>Here I have a question about 'aio_poll'.
>>IIUC the 'aio_poll' is (mostly) used for synchronous IO
>>as I see a lot of code like this: 
>>while(condition)
>> aio_poll();
>> 
>>However it seems the 'aio_poll' and 'aio_ctx_dispatch' both poll the fd.
>>So what happened when the 'fd' has events, which function will be
>>wakeup?
> 
> Roughly speaking, aio_poll is used for synchronous IO and within I/O
> threads; aio_ctx_dispatch is used within the main thread.
> 
> Both end up calling aio_dispatch_handlers and timerlistgroup_run_timers.
> 
> Paolo
> 





Re: [Qemu-devel] patchew.org using expired https certificate

2018-12-14 Thread Fam Zheng



> On Dec 14, 2018, at 18:13, Stefan Hajnoczi  wrote:
> 
> On Fri, Dec 14, 2018 at 09:54:06AM +, Peter Maydell wrote:
>> Connecting to https://patchew.org/ makes my browser complain:
>> 
>> # patchew.org uses an invalid security certificate.
>> # The certificate expired on 19 November 2017, 00:27:00 GMT.
>> # The current time is 14 December 2018, 09:52.
>> # Error code: SEC_ERROR_EXPIRED_CERTIFICATE
>> 
>> which is a bit odd, given I don't think this has been
>> broken since 2017. Did a cert-renewal script go wrong?
> 
> Hi Fam,
> Let's Encrypt isn't set up correctly on patchew.org.  Any ideas?

I tried to renew ahead of an upcoming expiration, but some misconfiguration and 
mistakes on my part stopped it from working today, and I’m unfortunately hitting 
a rate limit on certbot operations. I’ll fix it when the limit is cleared (next 
week, hopefully). Until then, we’ll have to use HTTP. :(

Fam

> 
> Also, is there a backup admin for the patchew VM at OSUOSL?  I have root
> access but I'd prefer for someone with patchew internals knowledge to be
> backup admin:
> https://wiki.qemu.org/AdminContacts
> 
> Stefan





Re: [Qemu-devel] [PATCH v6 00/37] ppc: support for the XIVE interrupt controller (POWER9)

2018-12-06 Thread Fam Zheng



> On Dec 6, 2018, at 14:14, Cédric Le Goater  wrote:
> 
> Hello,
> 
>> Your patch has style problems, please review.  If any of these errors
>> are false positives report them to the maintainer, see
>> CHECKPATCH in MAINTAINERS.
>> Checking PATCH 25/37: spapr/xive: add state synchronization with KVM...
>> Checking PATCH 26/37: spapr/xive: introduce a VM state change handler...
>> ERROR: spaces required around that '*' (ctx:WxV)
>> #38: FILE: hw/intc/spapr_xive_kvm.c:341:
>> + static void kvmppc_xive_sync_all(sPAPRXive *xive, Error **errp)
>> ^
>> 
>> total: 1 errors, 0 warnings, 135 lines checked
> 
> This looks like a false positive.

Right. Without looking at the checkpatch code, I suspect this is because 
sPAPRXive starts with a lower-case letter, which is not standard.
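
If that guess is right, the heuristic can be illustrated like this
(hypothetical declarations; the conforming spelling is only an example,
not a suggested rename):

/* checkpatch doesn't recognize a token starting with a lower-case letter
 * as a QEMU CamelCase type, so 'sPAPRXive *xive' parses as a binary '*'
 * with bad spacing: */
static void kvmppc_xive_sync_all(sPAPRXive *xive, Error **errp);  /* flagged */
/* with an upper-case first letter it is treated as a declaration: */
static void kvmppc_xive_sync_all(SpaprXive *xive, Error **errp);  /* clean */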

Fam

> 
> C.
> 





[Qemu-devel] [PATCH] MAINTAINERS: Update email address for Fam Zheng

2018-11-21 Thread Fam Zheng
Since I am about to change companies, update the email address in
MAINTAINERS to my personal one. Depending on how my responsibilities
change, I may eventually fade out of some of the maintained areas, but
that will be figured out afterward, or maybe I'll use the work email
later. For now, just do a search and replace.

Signed-off-by: Fam Zheng 
Signed-off-by: Fam Zheng 
---
 MAINTAINERS | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1032406c56..906d40d06d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1252,7 +1252,7 @@ T: git https://github.com/jasowang/qemu.git net
 
 SCSI
 M: Paolo Bonzini 
-R: Fam Zheng 
+R: Fam Zheng 
 S: Supported
 F: include/hw/scsi/*
 F: hw/scsi/*
@@ -1565,7 +1565,7 @@ T: git https://repo.or.cz/qemu/kevin.git block
 
 Block I/O path
 M: Stefan Hajnoczi 
-M: Fam Zheng 
+M: Fam Zheng 
 L: qemu-bl...@nongnu.org
 S: Supported
 F: util/async.c
@@ -1579,7 +1579,7 @@ T: git https://github.com/stefanha/qemu.git block
 
 Block SCSI subsystem
 M: Paolo Bonzini 
-R: Fam Zheng 
+R: Fam Zheng 
 L: qemu-bl...@nongnu.org
 S: Supported
 F: include/scsi/*
@@ -1611,7 +1611,7 @@ F: qapi/transaction.json
 T: git https://repo.or.cz/qemu/armbru.git block-next
 
 Dirty Bitmaps
-M: Fam Zheng 
+M: Fam Zheng 
 M: John Snow 
 L: qemu-bl...@nongnu.org
 S: Supported
@@ -1985,7 +1985,7 @@ F: tests/test-throttle.c
 L: qemu-bl...@nongnu.org
 
 UUID
-M: Fam Zheng 
+M: Fam Zheng 
 S: Supported
 F: util/uuid.c
 F: include/qemu/uuid.h
@@ -2116,7 +2116,7 @@ F: disas/tci.c
 Block drivers
 -
 VMDK
-M: Fam Zheng 
+M: Fam Zheng 
 L: qemu-bl...@nongnu.org
 S: Supported
 F: block/vmdk.c
@@ -2202,13 +2202,13 @@ F: block/gluster.c
 T: git https://github.com/codyprime/qemu-kvm-jtc.git block
 
 Null Block Driver
-M: Fam Zheng 
+M: Fam Zheng 
 L: qemu-bl...@nongnu.org
 S: Supported
 F: block/null.c
 
 NVMe Block Driver
-M: Fam Zheng 
+M: Fam Zheng 
 L: qemu-bl...@nongnu.org
 S: Supported
 F: block/nvme*
@@ -2339,7 +2339,7 @@ Build and test automation
 -
 Build and test automation
 M: Alex Bennée 
-M: Fam Zheng 
+M: Fam Zheng 
 R: Philippe Mathieu-Daudé 
 L: qemu-devel@nongnu.org
 S: Maintained
-- 
2.17.2




Re: [Qemu-devel] When AioHandler->is_external=true?

2018-11-20 Thread Fam Zheng
On Tue, 11/20 20:34, Dongli Zhang wrote:
> Hi,
> 
> Would you please help explain in which case AioHandler->is_external is true, 
> and
> when it is false?
> 
> I read about iothread and mainloop and I am a little bit confused about it.

VirtIO's ioeventfd is an example of is_external == true. It means the event
handler on this fd may initiate more I/O, such as reads/writes on a virtual
storage backend, so such handlers are specially taken care of at certain points
when we don't want more I/O requests to be processed, such as when a block job
is completing, or in the middle of a QMP transaction.
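
In sketch form (abridged from the aio API in include/block/aio.h of this era;
the handler name and the ctx/notifier/vdev variables here are made up, and
virtio's actual registration goes through a few more layers):

/* an ioeventfd registered as an "external" handler */
aio_set_fd_handler(ctx, event_notifier_get_fd(&notifier),
                   true /* is_external */,
                   handle_ioeventfd, NULL, NULL, vdev);

/* around a critical section, e.g. a drained section or a QMP
 * transaction: stop servicing guest-initiated I/O, while internal
 * (is_external == false) handlers such as job completions keep running */
aio_disable_external(ctx);
/* ... complete the block job / run the transaction ... */
aio_enable_external(ctx);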

Fam



Re: [Qemu-devel] [PATCH v4] tests: vm: auto_install OpenBSD

2018-11-14 Thread Fam Zheng
On Sun, 11/11 18:20, Brad Smith wrote:
> ping.

Queued. Will send a pull request soon.

Fam

> 
> On 10/30/2018 10:57 PM, Fam Zheng wrote:
> > Upgrade OpenBSD to 6.4 using auto_install. In particular, drop SDL1
> > and include SDL2.
> > 
> > Also do the build in $HOME since both /var/tmp and /tmp are tmpfs with
> > limited capacities.
> > 
> > Signed-off-by: Fam Zheng 
> > 
> > ---
> > 
> > v4: Use 6.4. [Brad]
> > ---
> >   tests/vm/basevm.py |  6 ++--
> >   tests/vm/openbsd   | 85 +++---
> >   2 files changed, 76 insertions(+), 15 deletions(-)
> > 
> > diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
> > index 5caf77d6b8..6fb446d4c5 100755
> > --- a/tests/vm/basevm.py
> > +++ b/tests/vm/basevm.py
> > @@ -68,8 +68,6 @@ class BaseVM(object):
> >   self._args = [ \
> >   "-nodefaults", "-m", "4G",
> >   "-cpu", "max",
> > -"-netdev", "user,id=vnet,hostfwd=:127.0.0.1:0-:22",
> > -"-device", "virtio-net-pci,netdev=vnet",
> >   "-vnc", "127.0.0.1:0,to=20",
> >   "-serial", "file:%s" % os.path.join(self._tmpdir, 
> > "serial.out")]
> >   if vcpus and vcpus > 1:
> > @@ -146,8 +144,10 @@ class BaseVM(object):
> >   "-device",
> >   "virtio-blk,drive=%s,serial=%s,bootindex=1" % 
> > (name, name)]
> > -def boot(self, img, extra_args=[]):
> > +def boot(self, img, extra_args=[], extra_usernet_args=""):
> >   args = self._args + [
> > +"-netdev", "user,id=vnet,hostfwd=:127.0.0.1:0-:22" + 
> > extra_usernet_args,
> > +"-device", "virtio-net-pci,netdev=vnet",
> >   "-device", "VGA",
> >   "-drive", "file=%s,if=none,id=drive0,cache=writeback" % img,
> >   "-device", "virtio-blk,drive=drive0,bootindex=0"]
> > diff --git a/tests/vm/openbsd b/tests/vm/openbsd
> > index cfe0572c59..99a7e98d80 100755
> > --- a/tests/vm/openbsd
> > +++ b/tests/vm/openbsd
> > @@ -14,6 +14,9 @@
> >   import os
> >   import sys
> >   import subprocess
> > +import time
> > +import atexit
> > +import tempfile
> >   import basevm
> >   class OpenBSDVM(basevm.BaseVM):
> > @@ -21,25 +24,83 @@ class OpenBSDVM(basevm.BaseVM):
> >   arch = "x86_64"
> >   BUILD_SCRIPT = """
> >   set -e;
> > -rm -rf /var/tmp/qemu-test.*
> > -cd $(mktemp -d /var/tmp/qemu-test.XX);
> > +rm -rf $HOME/qemu-test.*
> > +cd $(mktemp -d $HOME/qemu-test.XX);
> >   tar -xf /dev/rsd1c;
> > -./configure --cc=x86_64-unknown-openbsd6.1-gcc-4.9.4 
> > --python=python2.7 {configure_opts};
> > +./configure {configure_opts};
> >   gmake --output-sync -j{jobs} {verbose};
> >   # XXX: "gmake check" seems to always hang or fail
> >   #gmake --output-sync -j{jobs} check {verbose};
> >   """
> > +def _install_os(self, img):
> > +tmpdir = tempfile.mkdtemp()
> > +pxeboot = 
> > self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/pxeboot;,
> > +
> > sha256sum="d87ab39d941ff926d693943a927585945456ccedb76ea504a251b4b93cd4c266")
> > +bsd_rd = 
> > self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/bsd.rd;,
> > +
> > sha256sum="89505c683cbcd75582fe475e847ed53d89e2b8180c3e3d61f4eb4b76b5e11f5c")
> > +install = 
> > self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/install64.iso;,
> > +
> > sha256sum='81833b79e23dc0f961ac5fb34484bca66386deb3181ddb8236870fa4f488cdd2')
> > +subprocess.check_call(["qemu-img", "create", img, "32G"])
> > +subprocess.check_call(["cp", pxeboot, os.path.join(tmpdir, 
> > "auto_install")])
> > +subprocess.check_call(["cp", bsd_rd, os.path.join(tmpdir, "bsd")])
> > +
> > +self._gen_install_conf(tmpdir)
> > +# BOOTP filename being auto_inst

Re: [Qemu-devel] [PATCH v5 0/3] file-posix: Simplifications on image locking

2018-11-07 Thread Fam Zheng
On Thu, 10/11 15:21, Fam Zheng wrote:
> v5: Address Max's comments (Thanks for reviewing):
> - Clean up after test done.
> - Add rev-by to patch 1 and 2.

Ping?

Fam



Re: [Qemu-devel] [PATCH 0/2] target/arm: fix some ATS* bugs

2018-11-06 Thread Fam Zheng
On Tue, 11/06 01:50, no-re...@patchew.org wrote:
> ERROR: unknown option --with-gtkabi=3.0

Patchew is testing old series, whose branches are heavily lagging behind. This
configure option in the mingw docker test was recently dropped, so it's a
false positive.

Fam


> Try '/tmp/qemu-test/src/configure --help' for more information
> # QEMU configure log Tue Nov  6 09:50:16 UTC 2018
> # Configured with: '/tmp/qemu-test/src/configure' '--enable-werror' 
> '--target-list=x86_64-softmmu,aarch64-softmmu' 
> '--prefix=/tmp/qemu-test/install' '--python=/usr/bin/python3' 
> '--cross-prefix=x86_64-w64-mingw32-' '--enable-trace-backends=simple' 
> '--enable-gnutls' '--enable-nettle' '--enable-curl' '--enable-vnc' 
> '--enable-bzip2' '--enable-guest-agent' '--with-sdlabi=2.0' 
> '--with-gtkabi=3.0'
> #
> 
> funcs: do_compiler do_cc compile_object check_define main
> lines: 92 119 608 625 0
> x86_64-w64-mingw32-gcc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 
> -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef 
> -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
> -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
> config-temp/qemu-conf.c:2:2: error: #error __linux__ not defined
>  #error __linux__ not defined
>   ^
> 
> funcs: do_compiler do_cc compile_object check_define main
> lines: 92 119 608 627 0
> x86_64-w64-mingw32-gcc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 
> -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef 
> -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
> -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
> 
> funcs: do_compiler do_cc compile_object check_define main
> lines: 92 119 608 677 0
> x86_64-w64-mingw32-gcc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 
> -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef 
> -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
> -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
> config-temp/qemu-conf.c:2:2: error: #error __i386__ not defined
>  #error __i386__ not defined
>   ^
> 
> funcs: do_compiler do_cc compile_object check_define main
> lines: 92 119 608 679 0
> x86_64-w64-mingw32-gcc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 
> -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef 
> -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
> -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
> 
> funcs: do_compiler do_cc compile_object check_define main
> lines: 92 119 608 680 0
> x86_64-w64-mingw32-gcc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 
> -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef 
> -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
> -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
> config-temp/qemu-conf.c:2:2: error: #error __ILP32__ not defined
>  #error __ILP32__ not defined
>   ^
> 
> funcs: do_compiler do_cc compile_object check_include main
> lines: 92 119 616 771 0
> x86_64-w64-mingw32-gcc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 
> -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef 
> -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
> -c -o config-temp/qemu-conf.o config-temp/qemu-conf.c
> 
> funcs: do_compiler do_cc compile_prog main
> lines: 92 125 907 0
> x86_64-w64-mingw32-gcc -mthreads -D__USE_MINGW_ANSI_STDIO=1 
> -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 
> -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef 
> -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
> -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -g -liberty
> /usr/lib/gcc/x86_64-w64-mingw32/7.3.0/../../../../x86_64-w64-mingw32/bin/ld: 
> cannot find -liberty
> collect2: error: ld returned 1 exit status
> Failed to run 'configure'
> Traceback (most recent call last):
>   File "./tests/docker/docker.py", line 563, in 
> sys.exit(main())
>   File "./tests/docker/docker.py", line 560, in main
> return args.cmdobj.run(args, argv)
>   File "./tests/docker/docker.py", line 306, in run
> return Docker().run(argv, args.keep, quiet=args.quiet)
>   File "./tests/docker/docker.py", line 274, in run
> quiet=quiet)
>   File "./tests/docker/docker.py", line 181, in _do_check
> return subprocess.check_call(self._command + cmd, **kwargs)
>   File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call
> raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
> '--label', 'com.qemu.instance.uuid=5bcf7f1ce1a911e88b3152540069c830', '-u', 
> '1000', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 
> 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 
> 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
> 

Re: [Qemu-devel] [PATCH] docker: Use a stable snapshot for Debian Sid

2018-11-02 Thread Fam Zheng
On Fri, Nov 2, 2018 at 3:20 PM Philippe Mathieu-Daudé  wrote:
>
> Hi Fam,
>
> Thanks for picking this.
>
> On Fri, Nov 2, 2018 at 7:48 AM Fam Zheng  wrote:
> > On Thu, 11/01 19:37, Philippe Mathieu-Daudé wrote:
> > > The Debian Sid repository is not garanteed to be stable, as his
> > > 'unstable' name suggest :)
>
> There is an error in "my be" -> "might be"...
> Do you mind to update the comment:
>
> > > To allow quick testing, packages are pushed various time a day,
> > > which my be annoying when trying to use it for stable development
> > > (which is not recommended, but Sid provides edge packages we use
> > > for testing).
>
> By:
>
> To allow quick testing, Debian maintainers might push packages
> various time a day. Sometime package dependencies might break,
> which is annoying when using this repository for stable development
> (which is not recommended, but Sid provides edge packages we use
> for testing).

Sure, updated in my queue.

Fam



Re: [Qemu-devel] [PATCH] docker: Use a stable snapshot for Debian Sid

2018-11-02 Thread Fam Zheng
On Thu, 11/01 19:37, Philippe Mathieu-Daudé wrote:
> The Debian Sid repository is not garanteed to be stable, as his
> 'unstable' name suggest :)
> To allow quick testing, packages are pushed various time a day,
> which my be annoying when trying to use it for stable development
> (which is not recommended, but Sid provides edge packages we use
> for testing).
> 
> Debian provides repositories snapshots which are suitable for our
> use. Pick a recent date that works. When required, update to newer
> releases will be easy.
> 
> This fixes current issues with this image:
> 
>   $ make docker-image-debian-sid
>   [...]
>   The following packages have unmet dependencies:
>build-essential : Depends: dpkg-dev (>= 1.17.11) but it is not going to be 
> installed
>git : Depends: perl but it is not going to be installed
>  Depends: liberror-perl but it is not going to be installed
>pkg-config : Depends: libdpkg-perl but it is not going to be installed
>texinfo : Depends: perl (>= 5.26.2-6) but it is not going to be installed
>  Depends: libtext-unidecode-perl but it is not going to be 
> installed
>  Depends: libxml-libxml-perl but it is not going to be installed
>   E: Unable to correct problems, you have held broken packages.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  tests/docker/dockerfiles/debian-sid.docker | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tests/docker/dockerfiles/debian-sid.docker 
> b/tests/docker/dockerfiles/debian-sid.docker
> index 9a3d168705..4e4cda0ba5 100644
> --- a/tests/docker/dockerfiles/debian-sid.docker
> +++ b/tests/docker/dockerfiles/debian-sid.docker
> @@ -13,6 +13,10 @@
>  
>  FROM debian:sid-slim
>  
> +# Use a snapshot known to work (see http://snapshot.debian.org/#Usage)
> +ENV DEBIAN_SNAPSHOT_DATE "20181030"
> +RUN sed -i "s%^deb \(https\?://\)deb.debian.org/debian/\? \(.*\)%deb 
> [check-valid-until=no] 
> \1snapshot.debian.org/archive/debian/${DEBIAN_SNAPSHOT_DATE} \2%" 
> /etc/apt/sources.list
> +
>  # Duplicate deb line as deb-src
>  RUN cat /etc/apt/sources.list | sed "s/^deb\ /deb-src /" >> 
> /etc/apt/sources.list
>  
> -- 
> 2.17.2
> 

Queued, thanks!

Fam



Re: [Qemu-devel] [PATCH] block/nvme: optimize the performance of nvme driver based on vfio-pci

2018-11-02 Thread Fam Zheng
On Thu, 11/01 18:38, Li Feng wrote:
> When the IO size is larger than 2 pages, we move the pointer one by
> one in the pagelist, which is inefficient.
> 
> This is a simple benchmark result:
> 
> Before:
> $ qemu-io -c 'write 0 1G' nvme://:00:04.0/1
> 
> wrote 1073741824/1073741824 bytes at offset 0
> 1 GiB, 1 ops; 0:00:02.41 (424.504 MiB/sec and 0.4146 ops/sec)
> 
>  $ qemu-io -c 'read 0 1G' nvme://:00:04.0/1
> 
> read 1073741824/1073741824 bytes at offset 0
> 1 GiB, 1 ops; 0:00:02.03 (503.055 MiB/sec and 0.4913 ops/sec)
> 
> After:
> $ qemu-io -c 'write 0 1G' nvme://:00:04.0/1
> 
> wrote 1073741824/1073741824 bytes at offset 0
> 1 GiB, 1 ops; 0:00:02.17 (471.517 MiB/sec and 0.4605 ops/sec)
> 
>  $ qemu-io -c 'read 0 1G' nvme://:00:04.0/1
> 
> read 1073741824/1073741824 bytes at offset 0
> 1 GiB, 1 ops; 0:00:01.94 (526.770 MiB/sec and 0.5144 ops/sec)
> 
> Signed-off-by: Li Feng 
> ---
>  block/nvme.c | 16 ++--
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 29294038fc..982097b5b1 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -837,7 +837,7 @@ try_map:
>  }
>  
>  for (j = 0; j < qiov->iov[i].iov_len / s->page_size; j++) {
> -pagelist[entries++] = iova + j * s->page_size;
> +pagelist[entries++] = cpu_to_le64(iova + j * s->page_size);
>  }
>  trace_nvme_cmd_map_qiov_iov(s, i, qiov->iov[i].iov_base,
>  qiov->iov[i].iov_len / s->page_size);
> @@ -850,20 +850,16 @@ try_map:
>  case 0:
>  abort();
>  case 1:
> -cmd->prp1 = cpu_to_le64(pagelist[0]);
> +cmd->prp1 = pagelist[0];
>  cmd->prp2 = 0;
>  break;
>  case 2:
> -cmd->prp1 = cpu_to_le64(pagelist[0]);
> -cmd->prp2 = cpu_to_le64(pagelist[1]);;
> +cmd->prp1 = pagelist[0];
> +cmd->prp2 = pagelist[1];
>  break;
>  default:
> -cmd->prp1 = cpu_to_le64(pagelist[0]);
> -cmd->prp2 = cpu_to_le64(req->prp_list_iova);
> -for (i = 0; i < entries - 1; ++i) {
> -pagelist[i] = cpu_to_le64(pagelist[i + 1]);
> -}
> -pagelist[entries - 1] = 0;
> +cmd->prp1 = pagelist[0];
> +cmd->prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
>  break;
>  }
>  trace_nvme_cmd_map_qiov(s, cmd, req, qiov, entries);
> -- 
> 2.11.0
> 

Nice! Thanks. I've queued the patch.

Fam



Re: [Qemu-devel] [PATCH] tests/vm: Use subprocess.Popen() with to uncompress XZ files

2018-11-01 Thread Fam Zheng
On Sat, 10/13 02:30, Philippe Mathieu-Daudé wrote:
> Avoiding the file copy greatly speeds the process up.
> 
> Comparison with network file already cached, stopping after build_image():
> 
> Before:
>   $ time make vm-build-freebsd
>   real1m38.153s
>   user1m16.871s
>   sys 0m19.325s
> 
> After:
>   $ time make vm-build-freebsd
>   real0m13.512s
>   user0m9.520s
>   sys 0m3.685s
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  tests/vm/centos  | 5 ++---
>  tests/vm/freebsd | 9 ++---
>  tests/vm/netbsd  | 9 ++---
>  tests/vm/openbsd | 9 ++---
>  4 files changed, 8 insertions(+), 24 deletions(-)
> 
> diff --git a/tests/vm/centos b/tests/vm/centos
> index afd560c564..fa653c1650 100755
> --- a/tests/vm/centos
> +++ b/tests/vm/centos
> @@ -63,9 +63,8 @@ class CentosVM(basevm.BaseVM):
>  
>  def build_image(self, img):
>  cimg = 
> self._download_with_cache("https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1802.qcow2.xz;)
> -img_tmp = img + ".tmp"
> -subprocess.check_call(["cp", "-f", cimg, img_tmp + ".xz"])
> -subprocess.check_call(["xz", "-df", img_tmp + ".xz"])
> +with open(img, 'wb', 0644) as output:
> +subprocess.Popen(["xz", "--threads=0", "--decompress", 
> "--force", "--to-stdout", cimg], stdout=output)
>  subprocess.check_call(["qemu-img", "resize", img_tmp, "50G"])
>  self.boot(img_tmp, extra_args = ["-cdrom", 
> self._gen_cloud_init_iso()])
>  self.wait_ssh()
> diff --git a/tests/vm/freebsd b/tests/vm/freebsd
> index b6983127d0..7ec43a4c5c 100755
> --- a/tests/vm/freebsd
> +++ b/tests/vm/freebsd
> @@ -31,13 +31,8 @@ class FreeBSDVM(basevm.BaseVM):
>  def build_image(self, img):
>  cimg = 
> self._download_with_cache("http://download.patchew.org/freebsd-11.1-amd64.img.xz;,
>  
> sha256sum='adcb771549b37bc63826c501f05121a206ed3d9f55f49145908f7e1432d65891')
> -img_tmp_xz = img + ".tmp.xz"
> -img_tmp = img + ".tmp"
> -subprocess.check_call(["cp", "-f", cimg, img_tmp_xz])
> -subprocess.check_call(["xz", "-df", img_tmp_xz])
> -if os.path.exists(img):
> -os.remove(img)
> -os.rename(img_tmp, img)
> +with open(img, 'wb', 0644) as output:
> +subprocess.Popen(["xz", "--threads=0", "--decompress", 
> "--force", "--to-stdout", cimg], stdout=output)

What if xz is terminated before the image is fully populated? The next time make
is invoked we want build_image to start over, but with this change it won't.

Fam

>  
>  if __name__ == "__main__":
>  sys.exit(basevm.main(FreeBSDVM))
> diff --git a/tests/vm/netbsd b/tests/vm/netbsd
> index a4e25820d5..99023344b7 100755
> --- a/tests/vm/netbsd
> +++ b/tests/vm/netbsd
> @@ -31,13 +31,8 @@ class NetBSDVM(basevm.BaseVM):
>  def build_image(self, img):
>  cimg = 
> self._download_with_cache("http://download.patchew.org/netbsd-7.1-amd64.img.xz;,
>   
> sha256sum='b633d565b0eac3d02015cd0c81440bd8a7a8df8512615ac1ee05d318be015732')
> -img_tmp_xz = img + ".tmp.xz"
> -img_tmp = img + ".tmp"
> -subprocess.check_call(["cp", "-f", cimg, img_tmp_xz])
> -subprocess.check_call(["xz", "-df", img_tmp_xz])
> -if os.path.exists(img):
> -os.remove(img)
> -os.rename(img_tmp, img)
> +with open(img, 'wb', 0644) as output:
> +subprocess.Popen(["xz", "--threads=0", "--decompress", 
> "--force", "--to-stdout", cimg], stdout=output)
>  
>  if __name__ == "__main__":
>  sys.exit(basevm.main(NetBSDVM))
> diff --git a/tests/vm/openbsd b/tests/vm/openbsd
> index 52500ee52b..57604bb43d 100755
> --- a/tests/vm/openbsd
> +++ b/tests/vm/openbsd
> @@ -32,13 +32,8 @@ class OpenBSDVM(basevm.BaseVM):
>  def build_image(self, img):
>  cimg = 
> self._download_with_cache("http://download.patchew.org/openbsd-6.1-amd64.img.xz;,
>  
> sha256sum='8c6cedc483e602cfee5e04f0406c64eb99138495e8ca580bc0293bcf0640c1bf')
> -img_tmp_xz = img + ".tmp.xz"
> -img_tmp = img + ".tmp"
> -subprocess.check_call(["cp", "-f", cimg, img_tmp_xz])
> -subprocess.check_call(["xz", "-df", img_tmp_xz])
> -if os.path.exists(img):
> -os.remove(img)
> -os.rename(img_tmp, img)
> +with open(img, 'wb', 0644) as output:
> +subprocess.Popen(["xz", "--threads=0", "--decompress", 
> "--force", "--to-stdout", cimg], stdout=output)
>  
>  if __name__ == "__main__":
>  sys.exit(basevm.main(OpenBSDVM))
> -- 
> 2.19.1
> 



[Qemu-devel] [PATCH v3] file-posix: Use error API properly

2018-11-01 Thread Fam Zheng
Use error_report for situations that affect user operation (i.e. we're
actually returning an error), and warn_report/warn_report_err when some
less critical error happened but the user operation can still carry on.

For raw_normalize_devicepath, add Error parameter to propagate to
its callers.

Suggested-by: Markus Armbruster 
Signed-off-by: Fam Zheng 

---

v3: Use error_setg_errno. [Eric]

v2: Add Error ** to raw_normalize_devicepath. [Markus]
Use error_printf for splitting multi-sentence message. [Markus]
---
 block/file-posix.c | 39 ---
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 2da3a76355..e5606655b6 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -205,7 +205,7 @@ static int cdrom_reopen(BlockDriverState *bs);
 #endif
 
 #if defined(__NetBSD__)
-static int raw_normalize_devicepath(const char **filename)
+static int raw_normalize_devicepath(const char **filename, Error **errp)
 {
 static char namebuf[PATH_MAX];
 const char *dp, *fname;
@@ -214,8 +214,7 @@ static int raw_normalize_devicepath(const char **filename)
 fname = *filename;
 dp = strrchr(fname, '/');
 if (lstat(fname, &sb) < 0) {
-fprintf(stderr, "%s: stat failed: %s\n",
-fname, strerror(errno));
+error_setg_errno(errp, errno, "%s: stat failed", fname);
 return -errno;
 }
 
@@ -229,14 +228,13 @@ static int raw_normalize_devicepath(const char **filename)
 snprintf(namebuf, PATH_MAX, "%.*s/r%s",
 (int)(dp - fname), fname, dp + 1);
 }
-fprintf(stderr, "%s is a block device", fname);
 *filename = namebuf;
-fprintf(stderr, ", using %s\n", *filename);
+warn_report("%s is a block device, using %s", fname, *filename);
 
 return 0;
 }
 #else
-static int raw_normalize_devicepath(const char **filename)
+static int raw_normalize_devicepath(const char **filename, Error **errp)
 {
 return 0;
 }
@@ -461,9 +459,8 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 
 filename = qemu_opt_get(opts, "filename");
 
-ret = raw_normalize_devicepath(&filename);
+ret = raw_normalize_devicepath(&filename, errp);
 if (ret != 0) {
-error_setg_errno(errp, -ret, "Could not normalize device path");
 goto fail;
 }
 
@@ -492,11 +489,10 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 case ON_OFF_AUTO_ON:
 s->use_lock = true;
 if (!qemu_has_ofd_lock()) {
-fprintf(stderr,
-"File lock requested but OFD locking syscall is "
-"unavailable, falling back to POSIX file locks.\n"
-"Due to the implementation, locks can be lost "
-"unexpectedly.\n");
+warn_report("File lock requested but OFD locking syscall is "
+"unavailable, falling back to POSIX file locks");
+error_printf("Due to the implementation, locks can be lost "
+ "unexpectedly.\n");
 }
 break;
 case ON_OFF_AUTO_OFF:
@@ -805,7 +801,7 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
 /* Theoretically the above call only unlocks bytes and it cannot
  * fail. Something weird happened, report it.
  */
-error_report_err(local_err);
+warn_report_err(local_err);
 }
 break;
 case RAW_PL_COMMIT:
@@ -815,7 +811,7 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
 /* Theoretically the above call only unlocks bytes and it cannot
  * fail. Something weird happened, report it.
  */
-error_report_err(local_err);
+warn_report_err(local_err);
 }
 break;
 }
@@ -892,10 +888,8 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 /* If we cannot use fcntl, or fcntl failed, fall back to qemu_open() */
 if (rs->fd == -1) {
 const char *normalized_filename = state->bs->filename;
-ret = raw_normalize_devicepath(&normalized_filename);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "Could not normalize device path");
-} else {
+ret = raw_normalize_devicepath(&normalized_filename, errp);
+if (ret >= 0) {
 assert(!(rs->open_flags & O_CREAT));
 rs->fd = qemu_open(normalized_filename, rs->open_flags);
 if (rs->fd == -1) {
@@ -1775,7 +1769,7 @@ static int aio_worker(void *arg)
 ret = handle_aiocb_truncate(aiocb);
 break;
 default:
-fprintf(stderr, "invalid aio request (0x%x)\n", aiocb->aio_type);
+error_report("invalid aio request (0x%x)", aiocb-

Re: [Qemu-devel] [PATCH v2] file-posix: Use error API properly

2018-11-01 Thread Fam Zheng
On Wed, 10/31 15:51, Markus Armbruster wrote:
> Fam Zheng  writes:
> 
> > Use error_report for situations that affect user operation (i.e. we're
> > actually returning an error), and warn_report/warn_report_err when some
> > less critical error happened but the user operation can still carry on.
> >
> > For raw_normalize_devicepath, add Error parameter to propagate to
> > its callers.
> >
> > Suggested-by: Markus Armbruster 
> > Signed-off-by: Fam Zheng 
> 
> Reviewed-by: Markus Armbruster 
> 
> Kevin, if you'd prefer this to go through my tree, let me know.

Thanks for the review. I'll respin and use error_setg_errno as suggested by
Eric, if it's okay.

Fam



Re: [Qemu-devel] [PATCH] iotests: 082: Update output

2018-10-31 Thread Fam Zheng
On Wed, 10/31 16:04, Fam Zheng wrote:
> Commit 9cbef9d68e (qemu-option: improve qemu_opts_print_help() output)
> affected qemu-img help output, and broke this test case.
> 
> Update the output reference to fix it.
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> I'm once again looking at enabling iotests on patchew (via vm based
> tests), but immediately got blocked by this. :(

Please ignore this one, there's already another patch and comments buried deep
in my maildir. I'll peacefully wait for a fix then.

Fam



[Qemu-devel] [PATCH] iotests: 082: Update output

2018-10-31 Thread Fam Zheng
Commit 9cbef9d68e (qemu-option: improve qemu_opts_print_help() output)
affected qemu-img help output, and broke this test case.

Update the output reference to fix it.

Signed-off-by: Fam Zheng 

---

I'm once again looking at enabling iotests on patchew (via vm based
tests), but immediately got blocked by this. :(
---
 tests/qemu-iotests/082.out | 956 ++---
 1 file changed, 478 insertions(+), 478 deletions(-)

diff --git a/tests/qemu-iotests/082.out b/tests/qemu-iotests/082.out
index 19e9fb13ff..2672349e1d 100644
--- a/tests/qemu-iotests/082.out
+++ b/tests/qemu-iotests/082.out
@@ -44,171 +44,171 @@ cluster_size: 8192
 
 Testing: create -f qcow2 -o help TEST_DIR/t.qcow2 128M
 Supported options:
-size Virtual disk size
-compat   Compatibility level (0.10 or 1.1)
-backing_file File name of a base image
-backing_fmt  Image format of the base image
-encryption   Encrypt the image with format 'aes'. (Deprecated in favor of 
encrypt.format=aes)
-encrypt.format   Encrypt the image, format choices: 'aes', 'luks'
-encrypt.key-secret ID of secret providing qcow AES key or LUKS passphrase
-encrypt.cipher-alg Name of encryption cipher algorithm
-encrypt.cipher-mode Name of encryption cipher mode
-encrypt.ivgen-alg Name of IV generator algorithm
-encrypt.ivgen-hash-alg Name of IV generator hash algorithm
-encrypt.hash-alg Name of encryption hash algorithm
-encrypt.iter-time Time to spend in PBKDF in milliseconds
-cluster_size qcow2 cluster size
-preallocationPreallocation mode (allowed values: off, metadata, falloc, 
full)
-lazy_refcounts   Postpone refcount updates
-refcount_bitsWidth of a reference count entry in bits
-nocowTurn off copy-on-write (valid only on btrfs)
+backing_file=str - File name of a base image
+backing_fmt=str - Image format of the base image
+cluster_size=size - qcow2 cluster size
+compat=str - Compatibility level (0.10 or 1.1)
+encrypt.cipher-alg=str - Name of encryption cipher algorithm
+encrypt.cipher-mode=str - Name of encryption cipher mode
+encrypt.format=str - Encrypt the image, format choices: 'aes', 'luks'
+encrypt.hash-alg=str - Name of encryption hash algorithm
+encrypt.iter-time=num - Time to spend in PBKDF in milliseconds
+encrypt.ivgen-alg=str - Name of IV generator algorithm
+encrypt.ivgen-hash-alg=str - Name of IV generator hash algorithm
+encrypt.key-secret=str - ID of secret providing qcow AES key or LUKS passphrase
+encryption=bool (on/off) - Encrypt the image with format 'aes'. (Deprecated in 
favor of encrypt.format=aes)
+lazy_refcounts=bool (on/off) - Postpone refcount updates
+nocow=bool (on/off) - Turn off copy-on-write (valid only on btrfs)
+preallocation=str - Preallocation mode (allowed values: off, metadata, falloc, 
full)
+refcount_bits=num - Width of a reference count entry in bits
+size=size - Virtual disk size
 
 Testing: create -f qcow2 -o ? TEST_DIR/t.qcow2 128M
 Supported options:
-size Virtual disk size
-compat   Compatibility level (0.10 or 1.1)
-backing_file File name of a base image
-backing_fmt  Image format of the base image
-encryption   Encrypt the image with format 'aes'. (Deprecated in favor of 
encrypt.format=aes)
-encrypt.format   Encrypt the image, format choices: 'aes', 'luks'
-encrypt.key-secret ID of secret providing qcow AES key or LUKS passphrase
-encrypt.cipher-alg Name of encryption cipher algorithm
-encrypt.cipher-mode Name of encryption cipher mode
-encrypt.ivgen-alg Name of IV generator algorithm
-encrypt.ivgen-hash-alg Name of IV generator hash algorithm
-encrypt.hash-alg Name of encryption hash algorithm
-encrypt.iter-time Time to spend in PBKDF in milliseconds
-cluster_size qcow2 cluster size
-preallocationPreallocation mode (allowed values: off, metadata, falloc, 
full)
-lazy_refcounts   Postpone refcount updates
-refcount_bitsWidth of a reference count entry in bits
-nocowTurn off copy-on-write (valid only on btrfs)
+backing_file=str - File name of a base image
+backing_fmt=str - Image format of the base image
+cluster_size=size - qcow2 cluster size
+compat=str - Compatibility level (0.10 or 1.1)
+encrypt.cipher-alg=str - Name of encryption cipher algorithm
+encrypt.cipher-mode=str - Name of encryption cipher mode
+encrypt.format=str - Encrypt the image, format choices: 'aes', 'luks'
+encrypt.hash-alg=str - Name of encryption hash algorithm
+encrypt.iter-time=num - Time to spend in PBKDF in milliseconds
+encrypt.ivgen-alg=str - Name of IV generator algorithm
+encrypt.ivgen-hash-alg=str - Name of IV generator hash algorithm
+encrypt.key-secret=str - ID of secret providing qcow AES key or LUKS passphrase
+encryption=bool (on/off) - Encrypt the image with format 'aes'. (Deprecated in 
favor of encrypt.format=aes)
+lazy_refcounts=bool (on/off) - Postpone refcount updates
+nocow=bool (on/off) - Turn off copy-on-write (valid only on btrfs)
+preallocation=str - Preallocation mode (allowed values

[Qemu-devel] [PATCH v2] file-posix: Use error API properly

2018-10-30 Thread Fam Zheng
Use error_report for situations that affect user operation (i.e. we're
actually returning an error), and warn_report/warn_report_err when some
less critical error happened but the user operation can still carry on.

For raw_normalize_devicepath, add Error parameter to propagate to
its callers.

Suggested-by: Markus Armbruster 
Signed-off-by: Fam Zheng 

---

v2: Add Error ** to raw_normalize_devicepath. [Markus]
Use error_printf for splitting multi-sentence message. [Markus]
---
 block/file-posix.c | 39 ---
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 2da3a76355..b7f0316005 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -205,7 +205,7 @@ static int cdrom_reopen(BlockDriverState *bs);
 #endif
 
 #if defined(__NetBSD__)
-static int raw_normalize_devicepath(const char **filename)
+static int raw_normalize_devicepath(const char **filename, Error **errp)
 {
 static char namebuf[PATH_MAX];
 const char *dp, *fname;
@@ -214,8 +214,7 @@ static int raw_normalize_devicepath(const char **filename)
 fname = *filename;
 dp = strrchr(fname, '/');
 if (lstat(fname, &sb) < 0) {
-fprintf(stderr, "%s: stat failed: %s\n",
-fname, strerror(errno));
+error_setg(errp, "%s: stat failed: %s", fname, strerror(errno));
 return -errno;
 }
 
@@ -229,14 +228,13 @@ static int raw_normalize_devicepath(const char **filename)
 snprintf(namebuf, PATH_MAX, "%.*s/r%s",
 (int)(dp - fname), fname, dp + 1);
 }
-fprintf(stderr, "%s is a block device", fname);
 *filename = namebuf;
-fprintf(stderr, ", using %s\n", *filename);
+warn_report("%s is a block device, using %s", fname, *filename);
 
 return 0;
 }
 #else
-static int raw_normalize_devicepath(const char **filename)
+static int raw_normalize_devicepath(const char **filename, Error **errp)
 {
 return 0;
 }
@@ -461,9 +459,8 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 
 filename = qemu_opt_get(opts, "filename");
 
-ret = raw_normalize_devicepath(&filename);
+ret = raw_normalize_devicepath(&filename, errp);
 if (ret != 0) {
-error_setg_errno(errp, -ret, "Could not normalize device path");
 goto fail;
 }
 
@@ -492,11 +489,10 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 case ON_OFF_AUTO_ON:
 s->use_lock = true;
 if (!qemu_has_ofd_lock()) {
-fprintf(stderr,
-"File lock requested but OFD locking syscall is "
-"unavailable, falling back to POSIX file locks.\n"
-"Due to the implementation, locks can be lost "
-"unexpectedly.\n");
+warn_report("File lock requested but OFD locking syscall is "
+"unavailable, falling back to POSIX file locks");
+error_printf("Due to the implementation, locks can be lost "
+ "unexpectedly.\n");
 }
 break;
 case ON_OFF_AUTO_OFF:
@@ -805,7 +801,7 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
 /* Theoretically the above call only unlocks bytes and it cannot
  * fail. Something weird happened, report it.
  */
-error_report_err(local_err);
+warn_report_err(local_err);
 }
 break;
 case RAW_PL_COMMIT:
@@ -815,7 +811,7 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
 /* Theoretically the above call only unlocks bytes and it cannot
  * fail. Something weird happened, report it.
  */
-error_report_err(local_err);
+warn_report_err(local_err);
 }
 break;
 }
@@ -892,10 +888,8 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 /* If we cannot use fcntl, or fcntl failed, fall back to qemu_open() */
 if (rs->fd == -1) {
 const char *normalized_filename = state->bs->filename;
-ret = raw_normalize_devicepath(&normalized_filename);
-if (ret < 0) {
-error_setg_errno(errp, -ret, "Could not normalize device path");
-} else {
+ret = raw_normalize_devicepath(&normalized_filename, errp);
+if (ret >= 0) {
 assert(!(rs->open_flags & O_CREAT));
 rs->fd = qemu_open(normalized_filename, rs->open_flags);
 if (rs->fd == -1) {
@@ -1775,7 +1769,7 @@ static int aio_worker(void *arg)
 ret = handle_aiocb_truncate(aiocb);
 break;
 default:
-fprintf(stderr, "invalid aio request (0x%x)\n", aiocb->aio_type);
+error_report("invalid aio request (0x%x)", aiocb->aio_type);
 ret =

[Qemu-devel] [PATCH v4] tests: vm: auto_install OpenBSD

2018-10-30 Thread Fam Zheng
Upgrade OpenBSD to 6.4 using auto_install. In particular, drop SDL1
and include SDL2.

Also do the build in $HOME since both /var/tmp and /tmp are tmpfs with
limited capacities.

Signed-off-by: Fam Zheng 

---

v4: Use 6.4. [Brad]
---
 tests/vm/basevm.py |  6 ++--
 tests/vm/openbsd   | 85 +++---
 2 files changed, 76 insertions(+), 15 deletions(-)

diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
index 5caf77d6b8..6fb446d4c5 100755
--- a/tests/vm/basevm.py
+++ b/tests/vm/basevm.py
@@ -68,8 +68,6 @@ class BaseVM(object):
 self._args = [ \
 "-nodefaults", "-m", "4G",
 "-cpu", "max",
-"-netdev", "user,id=vnet,hostfwd=:127.0.0.1:0-:22",
-"-device", "virtio-net-pci,netdev=vnet",
 "-vnc", "127.0.0.1:0,to=20",
 "-serial", "file:%s" % os.path.join(self._tmpdir, "serial.out")]
 if vcpus and vcpus > 1:
@@ -146,8 +144,10 @@ class BaseVM(object):
 "-device",
 "virtio-blk,drive=%s,serial=%s,bootindex=1" % 
(name, name)]
 
-def boot(self, img, extra_args=[]):
+def boot(self, img, extra_args=[], extra_usernet_args=""):
 args = self._args + [
+"-netdev", "user,id=vnet,hostfwd=:127.0.0.1:0-:22" + 
extra_usernet_args,
+"-device", "virtio-net-pci,netdev=vnet",
 "-device", "VGA",
 "-drive", "file=%s,if=none,id=drive0,cache=writeback" % img,
 "-device", "virtio-blk,drive=drive0,bootindex=0"]
diff --git a/tests/vm/openbsd b/tests/vm/openbsd
index cfe0572c59..99a7e98d80 100755
--- a/tests/vm/openbsd
+++ b/tests/vm/openbsd
@@ -14,6 +14,9 @@
 import os
 import sys
 import subprocess
+import time
+import atexit
+import tempfile
 import basevm
 
 class OpenBSDVM(basevm.BaseVM):
@@ -21,25 +24,83 @@ class OpenBSDVM(basevm.BaseVM):
 arch = "x86_64"
 BUILD_SCRIPT = """
 set -e;
-rm -rf /var/tmp/qemu-test.*
-cd $(mktemp -d /var/tmp/qemu-test.XX);
+rm -rf $HOME/qemu-test.*
+cd $(mktemp -d $HOME/qemu-test.XX);
 tar -xf /dev/rsd1c;
-./configure --cc=x86_64-unknown-openbsd6.1-gcc-4.9.4 
--python=python2.7 {configure_opts};
+./configure {configure_opts};
 gmake --output-sync -j{jobs} {verbose};
 # XXX: "gmake check" seems to always hang or fail
 #gmake --output-sync -j{jobs} check {verbose};
 """
 
+def _install_os(self, img):
+tmpdir = tempfile.mkdtemp()
+pxeboot = 
self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/pxeboot;,
+
sha256sum="d87ab39d941ff926d693943a927585945456ccedb76ea504a251b4b93cd4c266")
+bsd_rd = 
self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/bsd.rd;,
+
sha256sum="89505c683cbcd75582fe475e847ed53d89e2b8180c3e3d61f4eb4b76b5e11f5c")
+install = 
self._download_with_cache("https://fastly.cdn.openbsd.org/pub/OpenBSD/6.4/amd64/install64.iso;,
+
sha256sum='81833b79e23dc0f961ac5fb34484bca66386deb3181ddb8236870fa4f488cdd2')
+subprocess.check_call(["qemu-img", "create", img, "32G"])
+subprocess.check_call(["cp", pxeboot, os.path.join(tmpdir, 
"auto_install")])
+subprocess.check_call(["cp", bsd_rd, os.path.join(tmpdir, "bsd")])
+
+self._gen_install_conf(tmpdir)
+# BOOTP filename being auto_install makes sure OpenBSD installer
+# not prompt for "auto install mode"
+usernet_args = ",tftp=%s,bootfile=/auto_install" % tmpdir
+usernet_args += ",tftp-server-name=10.0.2.4"
+usernet_args += ",guestfwd=tcp:10.0.2.4:80-cmd:cat %s" % \
+os.path.join(tmpdir, "install.conf")
+self.boot(img,
+  extra_args=["-boot", "once=n", "-no-reboot",
+  "-cdrom", install],
+  extra_usernet_args=usernet_args)
+self.wait()
+
+def _gen_install_conf(self, tmpdir):
+contents = """\
+HTTP/1.0 200 OK
+
+System hostname = qemu-openbsd
+Password for root = qemupass
+Public ssh key for root = {pub_key}
+Allow root ssh login = yes
+Network interfaces = vio0
+IPv4 address for vio0 = dhcp
+Setup a user = qemu
+Password for user = qemupass
+Public ssh key for user = {pub_key}
+What timezone are you in = US/Eastern
+Server = fastly.cdn.openbsd.org
+Use http = yes
+Default IPv4 r

[Qemu-devel] [PULL 7/9] tests/vm: Let kvm_available() work in cross environments

2018-10-26 Thread Fam Zheng
From: Philippe Mathieu-Daudé 

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20181013004034.6968-7-f4...@amsat.org>
Reviewed-by: Richard Henderson 
Signed-off-by: Fam Zheng 
---
 scripts/qemu.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/qemu.py b/scripts/qemu.py
index 9fc0be4828..bcd24aad82 100644
--- a/scripts/qemu.py
+++ b/scripts/qemu.py
@@ -27,6 +27,8 @@ LOG = logging.getLogger(__name__)
 
 
 def kvm_available(target_arch=None):
+if target_arch and target_arch != os.uname()[4]:
+return False
 return os.access("/dev/kvm", os.R_OK | os.W_OK)
 
 
-- 
2.17.1




[Qemu-devel] [PULL 8/9] tests/vm: Do not use -enable-kvm if HOST != TARGET architecture

2018-10-26 Thread Fam Zheng
From: Philippe Mathieu-Daudé 

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20181013004034.6968-8-f4...@amsat.org>
Reviewed-by: Richard Henderson 
Signed-off-by: Fam Zheng 
---
 tests/vm/basevm.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
index b2e0de2022..9f4794898a 100755
--- a/tests/vm/basevm.py
+++ b/tests/vm/basevm.py
@@ -74,7 +74,7 @@ class BaseVM(object):
 "-serial", "file:%s" % os.path.join(self._tmpdir, "serial.out")]
 if vcpus and vcpus > 1:
 self._args += ["-smp", str(vcpus)]
-if kvm_available():
+if kvm_available(self.arch):
 self._args += ["-enable-kvm"]
 else:
 logging.info("KVM not available, not using -enable-kvm")
-- 
2.17.1




[Qemu-devel] [PULL 0/9] Testing patches

2018-10-26 Thread Fam Zheng
The following changes since commit 808ebd66e467f77c0d1f8c6346235f81e9c99cf2:

  Merge remote-tracking branch 'remotes/riscv/tags/riscv-for-master-3.1-sf0' 
into staging (2018-10-25 17:41:03 +0100)

are available in the Git repository at:

  git://github.com/famz/qemu.git tags/testing-pull-request

for you to fetch changes up to 63a24c5e2354833a84f18bdf0e857fad8812f65b:

  tests/vm: Do not abuse parallelism when HOST != TARGET architecture 
(2018-10-26 22:03:21 +0800)


Testing patches

One fix for mingw build and some improvements in VM based testing, many thanks
to Paolo and Phil.



Paolo Bonzini (1):
  tests: docker: update test-mingw for GTK+ 2.0 removal

Philippe Mathieu-Daudé (8):
  tests/vm: Extract the kvm_available() handy function
  tests/vm: Do not abuse parallelism when KVM is not available
  tests/vm: Do not use the -smp option with a single cpu
  tests/vm: Display remaining seconds to wait for a VM to start
  tests/vm: Add a BaseVM::arch property
  tests/vm: Let kvm_available() work in cross environments
  tests/vm: Do not use -enable-kvm if HOST != TARGET architecture
  tests/vm: Do not abuse parallelism when HOST != TARGET architecture

 scripts/qemu.py |  6 ++
 tests/docker/test-mingw |  3 +--
 tests/vm/basevm.py  | 30 +-
 tests/vm/centos |  1 +
 tests/vm/freebsd|  1 +
 tests/vm/netbsd |  1 +
 tests/vm/openbsd|  1 +
 tests/vm/ubuntu.i386|  1 +
 8 files changed, 33 insertions(+), 11 deletions(-)

-- 
2.17.1




[Qemu-devel] [PULL 4/9] tests/vm: Do not use the -smp option with a single cpu

2018-10-26 Thread Fam Zheng
From: Philippe Mathieu-Daudé 

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20181013004034.6968-4-f4...@amsat.org>
Reviewed-by: Richard Henderson 
Signed-off-by: Fam Zheng 
---
 tests/vm/basevm.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
index 2bd32dc6ce..9415e7c33a 100755
--- a/tests/vm/basevm.py
+++ b/tests/vm/basevm.py
@@ -70,7 +70,7 @@ class BaseVM(object):
 "-device", "virtio-net-pci,netdev=vnet",
 "-vnc", "127.0.0.1:0,to=20",
 "-serial", "file:%s" % os.path.join(self._tmpdir, "serial.out")]
-if vcpus:
+if vcpus and vcpus > 1:
 self._args += ["-smp", str(vcpus)]
 if kvm_available():
 self._args += ["-enable-kvm"]
-- 
2.17.1



